Diapolo (OP)
December 27, 2011, 11:15:55 AM
Hi Diapolo,
do you think we can still have such big improvements like in the past?
Perhaps with AMD's Graphics Core Next architecture and new kernels, or the use of new OpenCL features, but I'm not able to write a new kernel from scratch. My current work is only to reorder some instructions in the kernel for better performance... no big deal.
Dia
Diapolo (OP)
December 27, 2011, 11:39:49 AM
Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9
Should restore some of the speed loss for 58XX owners who switched to SDK / runtime 2.6, and it's the best for 69XX owners, too.
Edit: Guys, try a setting of 64 for WORKSIZE; it showed good results for me, but it still depends on the card!
Dia
bulanula
December 27, 2011, 11:41:23 AM
Great work, mate!!! So this should be pretty ideal for 5870 cards, right? What is the best setup for owners of 5870s right now in terms of SDK version, kernel, miner, etc. (software)? Thanks!
Diapolo (OP)
December 27, 2011, 11:44:22 AM
Well, I sold my 5830, so I could only test on a 6950, but what I saw there was a boost with Phoenix 1.7 in comparison to CGMINER, which uses phatk 2.X (and that seems somewhat broken with SDK 2.6). If this is not the case for your setup, don't blame me.
Dia
cyberlync
December 27, 2011, 02:33:48 PM
Thanks for the work!
I can't get the file from Mediafire; I keep getting an error that says something about no servers with the file on them being available... Any possibility to upload it to an alternative place?
Thanks in advance.
Giving away your BTC's? Send 'em here: 1F7XgercyaXeDHiuq31YzrVK5YAhbDkJhf
Diapolo (OP)
December 27, 2011, 03:56:30 PM
Tell me which filehoster works for you and I can upload it there. But it should be possible to download it without registering!
Dia
conspirosphere.tk
December 27, 2011, 09:20:52 PM
Thanks, but no thanks. It makes about 405 MH/s on my 5870 @ 965/300, against 456 MH/s with phatk2 on Phoenix 1.7, same string. ATI driver 11.6.
cyberlync
December 27, 2011, 10:35:24 PM
Just checked again and it worked; they must have been updating some servers or whatever. Sorry to bother you. Thanks again, I will send a donation as soon as the client is done downloading the blocks.
Diapolo (OP)
December 27, 2011, 11:15:28 PM
Same string means you didn't supply VECTORS2 instead of VECTORS, right?
Dia
conspirosphere.tk
December 28, 2011, 08:18:27 AM
This is the string I used with your kernel to get barely 405 MH/s with Phoenix 1.7:
phoenix.exe -u http://***:***@btcguild.com:8332/ -k phatk DEVICE=0 VECTORS2 FASTLOOP=false AGGRESSION=7 WORKSIZE=64 -a 1000
And this is the string which gets me 455 MH/s with Phoenix 1.7:
phoenix.exe -u http://***:***@btcguild.com:8332/ -k phatk2 DEVICE=0 VECTORS FASTLOOP=false AGGRESSION=7 WORKSIZE=256 -a 1000
Diapolo (OP)
December 28, 2011, 09:52:16 AM
Thanks for sharing. On my machine phatk2 is slower with the 6950 (VLIW4) and the 6550D (VLIW5, Fusion APU). You have 11.6 as your driver!? And I said this kernel is for SDK / runtime 2.6, which means it's best for 11.12 / the 12.1 preview and newer!
Dia
naz86
December 28, 2011, 01:38:31 PM
NICE,
went from 243 MH/s to 245 MH/s on my 5830 at stock clock with parameters:
-k phatk DEVICE=0 VECTORS2 FASTLOOP=false AGGRESSION=6 WORKSIZE=128 -a 1000
disclaimer201
January 08, 2012, 03:35:15 PM
Interesting. I have SDK 2.6 and the 11.12 driver. With your kernel I get my better performance back, but the CPU bug is back too. Once I switch off Phoenix and switch poclbm back on, it's gone.
I assume there is no way to get rid of the bug AND keep the good performance?
deepceleron
January 08, 2012, 08:29:57 PM
If you didn't note it, I profiled the performance of Diapolo's kernel on my 5830 using both SDK 2.5 and the performance-robbing 2.6 here, with lots of options and RAM clock speeds. It might give insights into where to go with the kernel for current-driver OpenCL performance; the 58xx responds to worksize a bit differently than Diapolo's 5770.
Diapolo (OP)
January 08, 2012, 09:17:06 PM
For my setup (6950 + 6550D) the strange thing is that CGMINER (phatk 2.x) is slower, no matter how it's configured. The 6950 is quite a bit faster here, while for the 6550D the difference is only a few MH/s.
Dia
ssateneth
January 11, 2012, 08:20:16 AM
So I've got Cat 12.1, Crossfire 6870s at 1000 core / 600 mem, Phoenix 1.7.3 and 1.7.2, using Dia's most recent custom kernel, and it pegs my CPU core at 100%. It also doesn't display hashrates correctly, unless it actually is only getting 205 MH/s with:
-v -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=256
-k phatk AGGRESSION=11 FASTLOOP=false VECTORS2 WORKSIZE=128
-v -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=64
It just seems to shit all over my cards... and it also pegs my CPU core at up to 100%.
What the hell am I doing wrong? Even the default Phoenix kernel brings back the 100% CPU bug...
And yet GUIMiner's Poclbm with -v -w128 -f1 gives me 298 MH/s and no CPU bug... I really WANT to use this kernel, but why the hell is it failing?
Poclbm is apparently stupid easy to use, and as such I would assume it cannot do as much as other miners can. So I came to Phoenix; it was either that or CGMINER, and CGMINER ALSO brings back 100% CPU usage, but at least CGMINER gives me 294 MH/s.
Use AGGRESSION 10 or lower to avoid 100% CPU. You'll get about 40% CPU with 11 and 100% with 12+; 10 and lower will be maybe 4-5%.
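The aggression advice above can be pictured as the miner doubling the work handed to the GPU per kernel call with each AGGRESSION step, so the host spends more time waiting on long-running kernels. A toy sketch of that trade-off (the function name and base constant are made up for illustration, not Phoenix's actual internals):

```python
# Toy model of the AGGRESSION trade-off: each step doubles the batch of
# nonces the GPU crunches per kernel call. The base batch size here is
# illustrative; Phoenix's real constants differ.
def nonces_per_kernel_call(aggression: int, base: int = 1 << 10) -> int:
    # Bigger batches mean fewer, longer kernel calls, and more time the
    # host driver may busy-wait on completion (one plausible reading of
    # the 100% CPU reports in this thread).
    return base << aggression
```

On this model, stepping from AGGRESSION=10 to 12 quadruples the per-call batch, which is consistent with the jump from a few percent CPU to a fully pegged core.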
blandead
January 12, 2012, 03:57:25 AM
Hi Diapolo,
I just wanted to say thank you for your most recent kernel; the 2011-12-21 version is by far the best.
I'm getting 455 MH/s on my 6970 using Phoenix 1.7.1 (AGGRESSION=12 WORKSIZE=128 VECTORS2).
It easily averages 7 shares/minute, which is fantastic. I had to lower my memory clock to 200 MHz for optimal performance, as it actually runs slower above this speed; that's fine with me, as I can raise the core clock a bit higher with the lower temperature. (When using VECTORS4 I had to set the memory clock to 375 MHz to get similar performance, but then I had to lower the core clock and it wasn't worth it. A worksize of 256 becomes really slow after a while regardless of settings.)
So far this is the best kernel to use, much more efficient than any version of phatk2 if you find the right speed and settings. I wish VECTORS3 would work though, but with the way you calculate the worksize it's impossible.
I'm running the 12.1 preview drivers with the new 2.6 SDK. With a custom-forked poclbm, or even GUIMiner, I can set the worksize in increments of 32.
I've been trying to run VECTORS3 with WORKSIZE=192, as this should be divisible, and I've used this worksize before with phatk 2.2, where it's slightly faster for VECTORS4. But when I try to use your kernel with a different worksize (other than 64, 128, 256) it just gives me an error saying "this worksize is not valid", even though it should be possible; and with the worksizes it does accept, I don't think the rate divisor will accept them. I was hoping there was some way to allow worksizes in increments of 32, as I think WORKSIZE=192 would be a good test for VECTORS3.
Anyway, currently I'm just using VECTORS2, and so far the results are outstanding on my AMD 6970!
Diapolo (OP)
January 12, 2012, 06:28:01 AM Last edit: January 12, 2012, 01:02:05 PM by Diapolo
Hi blandead, thanks for your positive feedback. About the WORKSIZE: I read through AMD's OpenCL guide, and they say it's best to use a multiple of 64 as the WORKSIZE, because a wavefront on modern AMD GPUs runs 64 work-items in parallel. But it's not that hard to change the code to accept 32 as a divider... I'll consider this. For the vec3 thing, I'm trying hard to figure out what the problem is, but so far I have no clue :-/.
Edit: The WORKSIZE can be set to any value smaller than the HW max (256 for current AMD GPUs) which is divisible by 2, so 32 works... tried it, but it cripples wavefronts!
Edit 2: From the OpenCL manual: "If local_work_size is specified, the values specified in global_work_size[0],... global_work_size[work_dim - 1] must be evenly divisible by the corresponding values specified in local_work_size[0],... local_work_size[work_dim - 1]." So the global worksize needs to be evenly divisible by the supplied WORKSIZE, which seems to not be true for WORKSIZE=96 in Phoenix.
Edit 3: I edited the __init__.py to allow WORKSIZE to be any value that evenly divides the global worksize. This is in the next release.
Dia
PS: Guys, if you wish to support my work, keep donating; there were nearly no donations in the last weeks!
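The divisibility rule quoted from the OpenCL manual is easy to check on the host side before launching a kernel. A small sketch (the function names are my own for illustration, not Phoenix's):

```python
WAVEFRONT = 64          # work-items per wavefront on these AMD GPUs
HW_MAX_WORKSIZE = 256   # per-work-group hardware limit mentioned above

def worksize_is_valid(global_size: int, worksize: int) -> bool:
    """clEnqueueNDRangeKernel requires the global work size to be evenly
    divisible by the local work size (WORKSIZE), within the HW limit."""
    return 0 < worksize <= HW_MAX_WORKSIZE and global_size % worksize == 0

def wastes_wavefront_lanes(worksize: int) -> bool:
    # A WORKSIZE that is not a multiple of 64 leaves lanes of the last
    # wavefront in each work-group idle ("cripples wavefronts").
    return worksize % WAVEFRONT != 0
```

This also shows why WORKSIZE=96 or 192 can fail in practice: a power-of-two global size is never divisible by a worksize with a factor of 3.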
Fiyasko
January 12, 2012, 09:41:24 PM
Hohohoo! Thanks to the guy who told me to drop AGGRESSION to 10: no more CPU pegging! I just have one more problem: my 2nd GPU sits at about 91-98% usage, while the 1st GPU is at a lovely 99%... 'Sup? Both GPUs are on different CPU cores.
blandead
January 12, 2012, 11:48:11 PM
Sounds great, I'm looking forward to your next release! Even though the wavefront may get crippled a little, with WORKSIZE=192 on VECTORS4 I didn't see much of a difference in the number of shares output; that's why I'm hoping to try it with VECTORS3. I'll definitely be sending a donation your way tomorrow!