Diapolo (OP)
|
|
January 13, 2012, 11:02:21 AM |
|
Hi blandead, thanks for your positive feedback. About the WORKSIZE, I read through AMD's OpenCL guide and they say it's best to use a multiple of 64 as WORKSIZE, because a wavefront on modern AMD GPUs runs 64 work-items in parallel. But it's not that hard to change the code to accept 32 as a divider ... I'll consider this. For the vec3 thing, I'm trying hard to figure out what the problem is, but so far I have no clue :-/.
Edit: The WORKSIZE can be set to any value smaller than the HW max (256 for current AMD GPUs) and which is divisible by 2, so 32 works ... I tried it, but it cripples wavefronts!
Edit 2: From the OpenCL manual: "If local_work_size is specified, the values specified in global_work_size[0],... global_work_size[work_dim - 1] must be evenly divisible by the corresponding values specified in local_work_size[0],... local_work_size[work_dim - 1]." So the global worksize needs to be evenly divisible by the supplied WORKSIZE, which seems to not be true for WORKSIZE=96 for Phoenix.
Edit 3: I edited the __init__.py to allow WORKSIZE to be any value that evenly divides the global worksize. This will be in the next release.
Dia
PS.: Guys if you wish to support my work keep donating, there were nearly no donations in the last weeks!
Sounds great, I'm looking forward to your next release! Even though wavefronts may get crippled a little, with worksize=192 on vectors4 I didn't see much of a difference in the number of shares output, that's why I'm hoping to try it with vectors3. I'll definitely be sending a donation your way tomorrow!
I took a deep look into Phoenix: the initial number of nonces to run per execution is 1 << AGGRESSION, so it is always a power of two and currently seems to always be evenly divisible by 64. A power of two has no factor of 3 and 192 = 3 * 64, so that value is NOT evenly divisible by 192, which makes 192 invalid as WORKSIZE. I'm not sure how to change this to allow 192 as a valid value without breaking other things in the code. Internal tests with my latest kernel show good results with "VECTORS2 WORKSIZE=128" and even with "VECTORS4 WORKSIZE=64" on VLIW5 GPUs, so perhaps 192 is not needed ... we will see. I'm currently working on release notes, stay tuned.
Dia
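The divisibility rule discussed in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the check that ships in the kernel's __init__.py: the function names and the `hw_max` parameter are hypothetical, the global worksize is taken to be exactly 1 << AGGRESSION as described above, and 256 is the hardware maximum quoted in the thread.

```python
def global_worksize(aggression):
    """Nonces per kernel execution as described in the post: 1 << AGGRESSION."""
    return 1 << aggression


def is_valid_worksize(worksize, aggression, hw_max=256):
    """Return True if worksize is within the (assumed) hardware limit and
    evenly divides the global worksize, as the OpenCL rule requires."""
    if worksize <= 0 or worksize > hw_max:
        return False
    return global_worksize(aggression) % worksize == 0


if __name__ == "__main__":
    # With AGGRESSION=10 the global worksize is 1024.
    for ws in (32, 64, 96, 128, 192, 256):
        print(ws, is_valid_worksize(ws, aggression=10))
```

Under these assumptions 32, 64, 128 and 256 pass, while 96 and 192 fail because both contain a factor of 3 that no power of two can supply.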
|
|
|
|
Diapolo (OP)
|
|
January 13, 2012, 11:44:43 AM |
|
A new version is ready for your testing pleasure:
Download version 2012-01-13: http://www.mediafire.com/download.php?2sqoj8obvp1q23p
highlights:
- the child has its name, I call it phatk_dia - would be nice if you guys use this in discussions to be clear what your kernel is
- faster on VLIW5 GPUs with VECTORS2 and VECTORS4
- more efficient on VLIW4 GPUs with VECTORS2 and a little faster with VECTORS4
- FASTLOOP defaults to false, so you don't need to supply FASTLOOP=false
- added an extended check for supplied WORKSIZE parameter
- removed a pyOpenCL finish() to reduce API overhead (could cause problems, but works here -> consider this beta till it proves stable)
Please report and give me all your coins :-D!
Edit: Please don't complain if this doesn't work well for non 2.6 SDK / Runtime versions, because this IS for 2.6 or later!
Dia
|
|
|
|
Fiyasko
Legendary
Offline
Activity: 1428
Merit: 1001
Okey Dokey Lokey
|
|
January 13, 2012, 05:30:35 PM Last edit: January 13, 2012, 05:43:03 PM by JackRabiit |
|
Phatk_Dia! Whooo! I personally think you've done enough "tweaks" to this kernel for it to become embedded with your name. How 'bout just making it PhatkD?
Ughhh.... Memclock seems to be mandatory to have set near 1000, otherwise I lose speed.
-k phatk AGGRESSION=10 VECTORS4 WORKSIZE=64
Dual 6870's
1000core 600mem=285mh/s @81°C fans at 90%
1000core 1000mem=317mh/s @88°C fans at 100%
I don't suppose there's a way to not have my mem clock up? 88°C is MURDER.
Also, I'm still having issues with the 2nd GPU having low usage. GPU1=99%, which is fucking beautiful, but GPU#2 never kisses 99% once, always dancing in 91-98%, 285-315mh/s.
|
|
|
|
Diapolo (OP)
|
|
January 13, 2012, 05:46:27 PM |
|
Is this version any faster for you? What were your results with the last version as a comparison? Are your cards x-fired? What happens if only the 2nd GPU is active for Phoenix? I'm not sure what makes card 2 behave like that. Do you have more rejects or more shares submitted with this version?
Dia
|
|
|
|
Fiyasko
Legendary
Offline
Activity: 1428
Merit: 1001
Okey Dokey Lokey
|
|
January 13, 2012, 05:49:22 PM Last edit: January 13, 2012, 06:03:30 PM by JackRabiit |
|
316.8mh/sec with previous kernel, 317.6mh/sec with new kernel.
No, this version does not appear to be notably faster for me. But, on the other hand, my comp feels like it's "mining cleaner", I can't really describe it.. Same speeds.. Less desktop lag...
Yes, my cards are Crossfired.
When I set just the 2nd GPU to PhatkD, it does what it should: it goes to 99% and gives out exactly the same as what GPU 1 does. But as soon as I enable GPU 1 to mine at the same time as GPU 2 (with GPU 2 starting first and running at 99%), it drops off to 98-92%, fluttering. I'll add a pic. Both cards are on different CPU cores, just in case.
I occasionally get smacked with a stale share RIGHT AWAY, but after that everything is normal.. and it's only occasional, and it's like "Star---OMFG INVALI--Running"
http://imageshack.us/f/718/28674354.png/ <-- Only useful info is the MSI window
|
|
|
|
Diapolo (OP)
|
|
January 13, 2012, 05:52:05 PM |
|
"Testing, expect an update in 6 mins."
6870s are VLIW5, so I'm hoping for good news.
Dia
|
|
|
|
Fiyasko
Legendary
Offline
Activity: 1428
Merit: 1001
Okey Dokey Lokey
|
|
January 13, 2012, 05:58:27 PM |
|
Well then sorry for the sad news :|
|
|
|
|
Diapolo (OP)
|
|
January 13, 2012, 06:00:51 PM |
|
Boooo :-/. Please try VECTORS2 and WORKSIZE=128, too. Phoenix should compile a new binary for this kernel version, but perhaps you could delete all the .elf files in the phatk directory to be sure.
Dia
|
|
|
|
Fiyasko
Legendary
Offline
Activity: 1428
Merit: 1001
Okey Dokey Lokey
|
|
January 13, 2012, 06:05:06 PM |
|
Deleted all .elf's.
VECTORS2 and WORKSIZE=128, 305mh/s
VECTORS4 and WORKSIZE=64, 317mh/s
Please note that I apparently MUST have my memclock at 1000 or I cannot reach these speeds. The problem is heat: if my mem is at 1000, then I can't run my core at 1000, it gets too hot.
With my mem at 600:
VECTORS2 and WORKSIZE=128, 283mh/s
VECTORS4 and WORKSIZE=64, 287mh/s
I currently have two different problems with running PhatkD:
2nd GPU dances and messes around
and mem clock must be at 1000 <-- Bullocks, that kills my cards
|
|
|
|
Fiyasko
Legendary
Offline
Activity: 1428
Merit: 1001
Okey Dokey Lokey
|
|
January 13, 2012, 06:17:23 PM |
|
Deleted all .elf's.
VECTORS2 and WORKSIZE=128, 307.8mh/s
VECTORS4 and WORKSIZE=64, 314.2mh/s
Please note that I apparently MUST have my memclock at 1000 or I cannot reach these speeds. The problem is heat: if my mem is at 1000, then I can't run my core at 1000, it gets too hot.
With my mem at 600:
VECTORS2 and WORKSIZE=128, 283mh/s
VECTORS4 and WORKSIZE=64, 287mh/s
I currently have two different problems with running PhatkD:
2nd GPU dances and messes around
and mem clock must be at 1000 <-- Bullocks, that kills my cards
!!!!!!!!!!! Just noticed... that after deleting all .elf's I've lost performance.... But it's like 2mhash/s and could simply be the fact that I'm using my comp while doing these tests.
Wtf.. Killing those .elf's made 128 run better, as well as made 64 run worse? That's gotta be inaccurate on my part...
|
|
|
|
Diapolo (OP)
|
|
January 13, 2012, 06:20:07 PM Last edit: January 13, 2012, 06:37:49 PM by Diapolo |
|
Ok, I'll let you first play around a bit, before asking for a performance comparison. I asked what's happening with the GPU2 usage "bug" if only one card is mining, does it go up to 99% then? Are the cards connected via a Crossfire bridge? What OS and driver are you on?
Edit: By the way, did you try to lower mem clock even more via MSI Afterburner and unofficial overclocking mode?
Dia
|
|
|
|
Fiyasko
Legendary
Offline
Activity: 1428
Merit: 1001
Okey Dokey Lokey
|
|
January 13, 2012, 07:14:50 PM Last edit: January 13, 2012, 07:57:45 PM by JackRabiit |
|
Ok, I'll let you first play around a bit, before asking for a performance comparison.
I asked, what's happening, if only one card is mining in terms of GPU2 usage "bug", does it go up to 99% then? <--- I said yeah, it works flawlessly when running alone
Are the cards connected via Crossfirebridge? <--- I said yes
What OS and driver are you on? <--- Win7 x64, SDK 2.6, Catalyst 12.1
Edit: By the way, did you try to lower mem clock even more via MSI Afterburner and unofficial overclocking mode? Dia
I never saw a good reason to drop my mem below 600, but I can't do it easily... I'll go do 1000core 315mem and post results, as well as 1000core 1000mem.
Using GUIminer+PhatkD, MSIa, sdk 2.6, cat 12.1, crossfired 6870's
1kcore 300mem=255mh/s 70°C Fans@70%
1kcore 1kmem=314.8mh/s 88°C Fans@100%
Using GUIminer+poclbm, MSIa, sdk 2.6, cat 12.1, crossfired 6870's
1kcore 500mem=307mh/s 77°C Fans@80%
1kcore 1kmem=307mh/s OverheatShutdown.
|
|
|
|
deepceleron
Legendary
Offline
Activity: 1512
Merit: 1036
|
|
January 13, 2012, 08:10:41 PM |
|
Benchmarks on a 5770 (VLIW5, 800 stream processors, 980MHz core [scales more like 5870 than 5830]), Catalyst 12.1a/SDK 2.6, Phoenix 1.7.3 exe, win7 x32:
Typical command line (single cpu affinity, realtime priority):
start /AFFINITY 08 /REALTIME phoenix.exe -v -u http://xx/ -k dia VECTORS4 AGGRESSION=12 FASTLOOP=False WORKSIZE=64
Results in MH/s:
| kernel | vectors | RAM clock | worksize: 64 | 128 | 256 |
| phatk2 | VECTORS4 | 1000MHz | 223.88 | 226.34 | 181.40 |
| phatk2 | VECTORS | 1000MHz | 197 | 205 | 195 |
| dia_new | VECTORS4 | 1000MHz | 223.28 | 225.48 | 195.75 |
| dia_new | VECTORS2 | 1000MHz | 215.71 | 220.37 | 212.23 |
| dia_last | VECTORS4 | 1000MHz | 207.27 | 200.41 | |
less MH/s than phatk2, peak performance at 1000MHz RAM...
|
|
|
|
Fiyasko
Legendary
Offline
Activity: 1428
Merit: 1001
Okey Dokey Lokey
|
|
January 13, 2012, 08:19:00 PM |
|
Benchmarks on a 5770 (VLIW5, 800 stream processors, 980MHz core [scales more like 5870 than 5830]), Catalyst 12.1a/SDK 2.6, Phoenix 1.7.3 exe, win7 x32:
Typical command line (single cpu affinity, realtime priority):
start /AFFINITY 08 /REALTIME phoenix.exe -v -u http://xx/ -k dia VECTORS4 AGGRESSION=12 FASTLOOP=False WORKSIZE=64 PLATFORM=0 DEVICE=0
Results in MH/s:
| kernel | vectors | RAM clock | worksize: 64 | 128 | 256 |
| phatk2 | VECTORS4 | 1000MHz | 223.88 | 226.34 | 181.40 |
| phatk2 | VECTORS | 1000MHz | 197 | 205 | 195 |
| dia_new | VECTORS4 | 1000MHz | 223.28 | 225.48 | 195.75 |
| dia_new | VECTORS2 | 1000MHz | 215.71 | 220.37 | 212.23 |
less MH/s than phatk2, peak performance at 1000MHz RAM...
Why is it so necessary for phatk kernel variations to have the memclock at 1k.... Some people can't deal with that extra heat...
|
|
|
|
deepceleron
Legendary
Offline
Activity: 1512
Merit: 1036
|
|
January 13, 2012, 08:26:06 PM |
|
This is something that has changed in SDK 2.6: the best performance, at the best settings after trying all options, comes at a GPU RAM speed of 1000MHz (stock speed for most cards) instead of at an underclock of 300MHz-370MHz. Version 2.6, included with driver 11.12 and 12.1, is significantly different from the previous SDKs in how it responds to worksizes, vector settings, and OpenCL programming. It is a benefit in that one doesn't need to oddly tweak memory speeds from stock to get the best performance (annoying to tell noobs over and over to underclock RAM), but bad in that this old quirk was actually an electricity saver if you did it.
|
|
|
|
Diapolo (OP)
|
|
January 13, 2012, 10:37:18 PM |
|
I really have a problem with these results, I simply don't understand why VLIW5 cards with different stream processor counts behave THAT differently. Take a look at my results with a 6550D (VLIW5 / 400 shaders / 800 MHz Mem via DDR3-1600):
phatk2 VECTORS WORKSIZE=128: 61.54 MH/s
phatk_dia VECTORS2 WORKSIZE=128: 67.15 MH/s
Anyone with 69XX hardware willing to test? It seems a bit quiet in here.
Dia
|
|
|
|
TurdHurdur
|
|
January 13, 2012, 11:26:28 PM |
|
Is FASTLOOP broken? Attempting to use it, I get:
Unhandled error in Deferred:
Unhandled Error
Traceback (most recent call last):
File "twisted\internet\defer.pyc", line 361, in callback
File "twisted\internet\defer.pyc", line 455, in _startRunCallbacks
File "twisted\internet\defer.pyc", line 542, in _runCallbacks
File "QueueReader.pyc", line 136, in preprocess
--- <exception caught here> ---
File "twisted\internet\defer.pyc", line 133, in maybeDeferred
File "kernels\phatk_dia\__init__.py", line 167, in <lambda>
File "kernels\phatk_dia\__init__.py", line 381, in preprocess
File "kernels\phatk_dia\__init__.py", line 377, in updateIterations
exceptions.UnboundLocalError: local variable 'EXP' referenced before assignment
|
|
|
|
Diapolo (OP)
|
|
January 14, 2012, 12:00:09 PM Last edit: January 14, 2012, 12:25:57 PM by Diapolo |
|
I wrote this in the first posting, yes it is broken currently! I'm looking into it. Are you sure it's needed for you?
Edit: self.loopExponent = int(max(0, EXP)) causes this error, but I'm not sure yet why this happens with my init and not the default one ...
Edit 2: The fix is to place another tabstop at the beginning of line 377, in front of self.loopExponent = int(max(0, EXP))! Wow, that's a stupid one. Will upload a fixed version later today.
Edit 3: It has to look like this in an editor:
    if not (rate <= 0):
        # calculate the number of iterations to run
        EXP = max(0, (math.log(rate)/math.log(2)) - (self.AGGRESSION - 8))
        # prevent switching between loop exponent sizes constantly
        if EXP > self.loopExponent + 0.54:
            EXP = round(EXP)
        elif EXP < self.loopExponent - 0.65:
            EXP = round(EXP)
        else:
            EXP = self.loopExponent
        self.loopExponent = int(max(0, EXP))
Dia
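To see why one missing level of indentation produces the UnboundLocalError from the traceback, here is a small self-contained sketch. It is not the Phoenix code: `LoopTuner`, `update_buggy`, `update_fixed` and passing `rate` as an argument are hypothetical stand-ins, and it is written for Python 3 rather than the Python 2 that Phoenix used. The working assumption is that the error shows up whenever the measured rate is still zero, so the `if` branch that assigns EXP never runs.

```python
import math


class LoopTuner:
    """Hypothetical stand-in for the kernel object; only the fields used here."""

    def __init__(self, aggression=10):
        self.AGGRESSION = aggression
        self.loopExponent = 0

    def _exponent(self, rate):
        # Same arithmetic as in the post: pick a loop exponent from the rate,
        # with hysteresis so it does not flip between sizes constantly.
        EXP = max(0, (math.log(rate) / math.log(2)) - (self.AGGRESSION - 8))
        if EXP > self.loopExponent + 0.54:
            return round(EXP)
        if EXP < self.loopExponent - 0.65:
            return round(EXP)
        return self.loopExponent

    def update_buggy(self, rate):
        if not (rate <= 0):
            EXP = self._exponent(rate)
        # One level too far left: runs even when EXP was never assigned.
        self.loopExponent = int(max(0, EXP))

    def update_fixed(self, rate):
        if not (rate <= 0):
            EXP = self._exponent(rate)
            # Inside the branch: only runs when EXP exists.
            self.loopExponent = int(max(0, EXP))


if __name__ == "__main__":
    tuner = LoopTuner()
    try:
        tuner.update_buggy(0)
    except UnboundLocalError as err:
        print("buggy version:", err)
    tuner.update_fixed(0)
    print("fixed version keeps loopExponent at", tuner.loopExponent)
```

With a rate of zero the buggy variant raises and the fixed variant leaves loopExponent untouched, which matches the one-tabstop fix described in Edit 2.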
|
|
|
|
|
deepceleron
Legendary
Offline
Activity: 1512
Merit: 1036
|
|
January 14, 2012, 01:45:39 PM |
|
| kernel | vectors | RAM clock | worksize: 64 | 128 | 256 |
| phatk2 | VECTORS | 1000MHz | 197 | 205 | 195 |
| dia_new | VECTORS2 | 1000MHz | 215.71 | 220.37 | 212.23 |
phatk2 VECTORS WORKSIZE=128: 61.54 MH/s
phatk_dia VECTORS2 WORKSIZE=128: 67.15 MH/s
That corresponds closely with the two-vector results I quote; however, in finding the highest output possible from a GPU, with VECTORS4 (@ 64 or 128, depending on card) phatk2 still ekes out a win for me.
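For anyone who wants to put numbers on "corresponds closely", the relative gains can be worked out from the figures quoted in this thread. The helper name `relative_gain` and the dictionary layout are just illustrative choices; the MH/s values are the WORKSIZE=128 results posted above.

```python
def relative_gain(new, old):
    """Percentage change of new over old (positive means new is faster)."""
    return (new - old) / old * 100.0


# (phatk_dia MH/s, phatk2 MH/s) at WORKSIZE=128, taken from the posts above
results = {
    "5770, two vectors": (220.37, 205.0),
    "6550D, two vectors": (67.15, 61.54),
    "5770, VECTORS4": (225.48, 226.34),
}

if __name__ == "__main__":
    for label, (dia, phatk2) in results.items():
        print(f"{label}: {relative_gain(dia, phatk2):+.1f}%")
```

This works out to roughly +7.5% on the 5770 and +9.1% on the 6550D for the two-vector case, while with VECTORS4 on the 5770 phatk2 stays about 0.4% ahead.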
|
|
|
|
|