Bitcoin Forum
Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13  (Read 106673 times)
Diapolo (OP)
December 27, 2011, 11:15:55 AM
 #341

Hi Diapolo,

do you think we can still see improvements as big as in the past?

Perhaps with AMD's Graphics Core Next architecture and new kernels, or the use of new OpenCL features, but I'm not able to write a new kernel from scratch ;).
My current work is only to reorder some instructions in the kernel for better performance ... no big deal :D.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
December 27, 2011, 11:39:49 AM
 #342

Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9

Should restore some of the speed loss for 58XX owners, who switched to SDK / runtime 2.6 and is the best for 69XX owners, too.

Edit: Guys, try a setting of 64 for the WORKSIZE, it showed good results for me, but still depends on the card!

Dia

bulanula
December 27, 2011, 11:41:23 AM
 #343

Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9

Should restore some of the speed loss for 58XX owners, who switched to SDK / runtime 2.6 and is the best for 69XX owners, too.

Dia

Great work mate !!!

So this should be pretty ideal for 5870 cards right ? What is the best setup for owners of 5870s right now in terms of SDK version, kernel, miner etc. ( software ) ? Thanks !
Diapolo (OP)
December 27, 2011, 11:44:22 AM
 #344

Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9

Should restore some of the speed loss for 58XX owners, who switched to SDK / runtime 2.6 and is the best for 69XX owners, too.

Dia

Great work mate !!!

So this should be pretty ideal for 5870 cards right ? What is the best setup for owners of 5870s right now in terms of SDK version, kernel, miner etc. ( software ) ? Thanks !

Well, I sold my 5830, so I could only test on a 6950, but what I saw there was a boost with Phoenix 1.7 in comparison to CGMINER, which uses Phatk 2.X (which seems somewhat broken with SDK 2.6). If this is not the case for your setup, don't blame me :D.

Dia

cyberlync
December 27, 2011, 02:33:48 PM
 #345

Thanks for the work!

I can't get the file from mediafire, I keep getting an error that says something about no servers available with the file on them... Any possibility to upload to an alternative place?

Thanks in advance.

Giving away your BTC's? Send 'em here: 1F7XgercyaXeDHiuq31YzrVK5YAhbDkJhf
Diapolo (OP)
December 27, 2011, 03:56:30 PM
 #346

Thanks for the work!

I can't get the file from mediafire, I keep getting an error that says something about no servers available with the file on them... Any possibility to upload to an alternative place?

Thanks in advance.

Tell me which filehoster works for you and I can upload it there. But it should be possible to upload without registering!

Dia

conspirosphere.tk
December 27, 2011, 09:20:52 PM
 #347

Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9
Should restore some of the speed loss for 58XX owners, who switched to SDK / runtime 2.6 and is the best for 69XX owners, too.
Dia

Thanks, but no thanks. It makes about 405 Mhs on my 5870@965/300, against 456 Mhs with phatk2 on Phoenix 1.7, same string. Ati Driver 11.6.
cyberlync
December 27, 2011, 10:35:24 PM
 #348

Thanks for the work!

I can't get the file from mediafire, I keep getting an error that says something about no servers available with the file on them... Any possibility to upload to an alternative place?

Thanks in advance.

Tell me which filehoster works for you and I can upload it there. But it should be possible to upload without registering!

Dia

Just checked again and it worked, they must have been updating some servers or whatever, sorry to bother you. Thanks again, will send a donation as soon as the client is done downloading the blocks.

Diapolo (OP)
December 27, 2011, 11:15:28 PM
 #349

Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9
Should restore some of the speed loss for 58XX owners, who switched to SDK / runtime 2.6 and is the best for 69XX owners, too.
Dia

Thanks, but no thanks. It makes about 405 Mhs on my 5870@965/300, against 456 Mhs with phatk2 on Phoenix 1.7, same string. Ati Driver 11.6.

Same string means you didn't supply VECTORS2 instead of VECTORS, right?

Dia

conspirosphere.tk
December 28, 2011, 08:18:27 AM
 #350

Same string means you didn't supply VECTORS2 instead of VECTORS, right?
Dia

This is the string I used with your kernel to get barely 405 Mhs with Phoenix 1.7:
phoenix.exe -u http://***:***@btcguild.com:8332/ -k phatk DEVICE=0 VECTORS2 FASTLOOP=false AGGRESSION=7 WORKSIZE=64 -a 1000

And this is the string which makes me 455 Mhs with Phoenix 1.7:
phoenix.exe -u http://***:***@btcguild.com:8332/ -k phatk2 DEVICE=0 VECTORS FASTLOOP=false AGGRESSION=7 WORKSIZE=256 -a 1000
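
The two command lines above differ in more than the kernel name. A small sketch in plain Python (not part of Phoenix; `parse_kernel_options` and `diff_options` are hypothetical helpers, and this is not the parser Phoenix itself uses) that splits the kernel options into a dict and lists what changed between the two runs:

```python
def parse_kernel_options(tokens):
    """Split tokens like 'WORKSIZE=64' or 'VECTORS2' into a dict.

    Bare flags map to True. Illustrative only, not Phoenix's own parser.
    """
    options = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        options[key] = value if sep else True
    return options


def diff_options(a, b):
    """Return {key: (a_value, b_value)} for every key whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Kernel options copied from the two command lines (everything after '-k <kernel>').
phatk_dia = parse_kernel_options(
    "DEVICE=0 VECTORS2 FASTLOOP=false AGGRESSION=7 WORKSIZE=64".split())
phatk2 = parse_kernel_options(
    "DEVICE=0 VECTORS FASTLOOP=false AGGRESSION=7 WORKSIZE=256".split())

changed = diff_options(phatk_dia, phatk2)
for key, (left, right) in changed.items():
    print(f"{key}: {left} -> {right}")
```

Besides the kernel itself, the runs differ in the vector flag (VECTORS2 vs VECTORS) and in WORKSIZE (64 vs 256), so the 405 vs 455 Mhs figures do not isolate the kernel alone.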
Diapolo (OP)
December 28, 2011, 09:52:16 AM
 #351

Same string means you didn't supply VECTORS2 instead of VECTORS, right?
Dia

This is the string I used with your kernel to get barely 405 Mhs with Phoenix 1.7:
phoenix.exe -u http://***:***@btcguild.com:8332/ -k phatk DEVICE=0 VECTORS2 FASTLOOP=false AGGRESSION=7 WORKSIZE=64 -a 1000

And this is the string which makes me 455 Mhs with Phoenix 1.7:
phoenix.exe -u http://***:***@btcguild.com:8332/ -k phatk2 DEVICE=0 VECTORS FASTLOOP=false AGGRESSION=7 WORKSIZE=256 -a 1000

Thanks for sharing, on my machine phatk2 is slower with 6950 (VLIW4) and 6550D (VLIW5 - Fusion APU).
You have 11.6 as driver!? And I said this kernel is for SDK / Runtime 2.6, which means it's best for 11.12 / 12.1 preview and newer!

Dia

naz86
December 28, 2011, 01:38:31 PM
 #352

NICE,

went from 243mhs to 245 on my 5830 on stock clock with parameters:

-k phatk DEVICE=0 VECTORS2 FASTLOOP=false AGGRESSION=6 WORKSIZE=128 -a 1000
disclaimer201
January 08, 2012, 03:35:15 PM
 #353

Interesting. I have Sdk2.6 & 11.12 driver. With your kernel, I get back my better performance, but the cpu bug is back. Once I switch off phoenix and switch on poclbm again, it's gone.

I assume there is no way to get rid of the bug AND have the good performance back again?
deepceleron
January 08, 2012, 08:29:57 PM
 #354

If you didn't note it, I profiled the performance of Diapolo's kernel on my 5830 using both SDK 2.5 and the performance-robbing 2.6 here with lots of options and RAM clock speeds. It might give insights of where to go on the kernel for current driver OpenCL performance; the 58xx is a bit different than Diapolo's 5770 in how it responds to worksize.
Diapolo (OP)
January 08, 2012, 09:17:06 PM
 #355

If you didn't note it, I profiled the performance of Diapolo's kernel on my 5830 using both SDK 2.5 and the performance-robbing 2.6 here with lots of options and RAM clock speeds. It might give insights of where to go on the kernel for current driver OpenCL performance; the 58xx is a bit different than Diapolo's 5770 in how it responds to worksize.

For my setup (6950 + 6550D) the strange thing is that CGMINER (phatk2.x) is slower, no matter how it's configured. The 6950 is quite a bit faster with my kernel; for the 6550D the difference is only a few MH/s.

Dia

ssateneth
January 11, 2012, 08:20:16 AM
 #356

So I've got Cat 12.1
Crossfire 6870s, 1000 core, 600 mem
Phoenix 1.7.3 and 1.7.2
Using Dia's most recent custom kernel
And it pegs my CPU core at 100%
It also doesn't display hashrates correctly, unless it actually is only getting 205 MH/s with
-v -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=256
-k phatk AGGRESSION=11 FASTLOOP=false VECTORS2 WORKSIZE=128
-v -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=64

It just seems to shit all over my cards... and also pegs my CPU core up to 100%

What the hell am I doing wrong?
Even the default Phoenix kernel brings back the 100% CPU bug....

And yet, GUIminer poclbm with -v -w128 -f1 gives me 298 MH/s and no CPU bug....
Really, I WANT to use this kernel, but why the hell is it failing?

Poclbm is apparently stupid easy to use, and as such I would assume that it cannot do as much as other miners can.
So I came to Phoenix. It was either that or CG, and CG miner ALSO brings back 100% CPU usage, but at least CG miner gives me 294 MH/s

Use aggression 10 or lower to avoid 100% CPU. You'll get about 40% CPU with 11, and 100% with 12+. 10 and lower will be maybe 4-5%.

blandead
January 12, 2012, 03:57:25 AM
 #357

Hi Diapolo,

I just wanted to say thank you for your most recent kernel, the 2011-12-21 version is by far the best.

I'm getting 455 Mhash/sec on my 6970 using Phoenix 1.7.1 (Aggression=12 Worksize=128 VECTORS2)

It easily averages 7 shares/minute, which is fantastic. I had to lower my memory clock to 200 MHz for optimal performance, as it actually runs slower above this speed. That is fine with me, since the lower temperature lets me raise the core clock a bit higher. (When using VECTORS4 I had to set the memory clock to 375 MHz to get similar performance, but then I had to lower the clock speed and it wasn't worth it. A worksize of 256 becomes really slow after a while regardless of settings.)
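
As a rough cross-check of that share rate: assuming the pool hands out difficulty-1 shares (an assumption, the post does not state the share difficulty), one share takes 2^32 hashes on average. The helper name below is hypothetical, and the sketch only does the arithmetic:

```python
HASHES_PER_DIFF1_SHARE = 2 ** 32  # average hashes needed per difficulty-1 share


def expected_shares_per_minute(hashrate_mhs, share_difficulty=1.0):
    """Expected shares per minute for a hashrate given in Mhash/s."""
    hashes_per_minute = hashrate_mhs * 1e6 * 60
    return hashes_per_minute / (HASHES_PER_DIFF1_SHARE * share_difficulty)


rate = expected_shares_per_minute(455)
print(f"{rate:.2f} shares/minute")  # about 6.36
```

Under that assumption the long-run average for 455 Mhash/sec is a little over 6 shares/minute, so an observed average near 7 over a short window is plausible given luck.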

So far this is the best kernel to use, much more efficient than any version of phatk2 if you find the right speed and setting. I wish VECTORS3 would work though, but the way you calculate Worksize it's impossible.

I'm running the 12.1 preview drivers with the new 2.6 SDK. With a custom forked poclbm or even guiminer, I can set Worksize in increments of 32.

I've been trying to run VECTORS3 with Worksize=192 as this should be divisible, and I've used this worksize before with phatk2.2, which is slightly faster for VECTORS4.
But, when I try to use your kernel with a different Worksize (other than 64,128,256) it just gives me an error saying "this worksize is not valid" even though it should be possible, and with the Worksizes it does accept, I don't think the ratedivisor will accept them. I was hoping there was some way to allow worksize in 32 increments as I think Worksize=192 would be a good test for VECTORS3.

Anyways, currently I'm just using VECTORS2 and so far the results are outstanding on my AMD 6970!

Diapolo (OP)
January 12, 2012, 06:28:01 AM
Last edit: January 12, 2012, 01:02:05 PM by Diapolo
 #358

Hi blandead, thanks for your positive feedback. About the WORKSIZE: I read through AMD's OpenCL guide, which says it's best to use a multiple of 64 as WORKSIZE, because a wavefront on modern AMD GPUs runs 64 work-items in parallel. But it's not that hard to change the code to accept 32 as a divider ... I'll consider this. As for the vec3 thing, I'm trying hard to figure out what the problem is, but so far I have no clue :-/.
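
The multiple-of-64 advice can be illustrated with a deliberately simple model: a sketch that assumes each work-group is padded up to whole wavefronts of 64 work-items and ignores vector width, scheduling, and everything else the hardware does. The function name is hypothetical:

```python
import math

WAVEFRONT_SIZE = 64  # work-items per wavefront, as stated in the post


def wavefront_utilization(worksize, wavefront=WAVEFRONT_SIZE):
    """Fraction of wavefront lanes doing useful work for one work-group."""
    wavefronts_needed = math.ceil(worksize / wavefront)
    return worksize / (wavefronts_needed * wavefront)


for ws in (32, 64, 96, 128, 192, 256):
    print(f"WORKSIZE={ws:3d}: {wavefront_utilization(ws):.0%} of lanes used")
```

In this model WORKSIZE=32 fills only half a wavefront and 96 fills three quarters of two, while 64, 128, 192 and 256 fill whole wavefronts.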

Edit: The WORKSIZE can be set to any value that is divisible by 2 and does not exceed the HW max (256 for current AMD GPUs), so 32 works ... I tried it, but it cripples wavefronts!

Edit 2: From the OpenCL manual: "If local_work_size is specified, the values specified in global_work_size[0],... global_work_size[work_dim - 1] must be evenly divisible by the corresponding values specified in local_work_size[0],... local_work_size[work_dim - 1]." So the global worksize needs to be evenly divisible by the supplied WORKSIZE, which seems not to be true for WORKSIZE=96 in Phoenix.

Edit 3: I edited __init__.py to allow WORKSIZE to be any value that evenly divides the global worksize. This will be in the next release.
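
The rule from Edit 2 and the change described in Edit 3 can be sketched as a small check. This is not the actual code from Phoenix's __init__.py; the function name, the use of 256 as an inclusive limit, and the example global size are assumptions for illustration:

```python
MAX_WORKSIZE = 256  # hardware maximum quoted in the post for current AMD GPUs


def is_valid_worksize(worksize, global_size, max_worksize=MAX_WORKSIZE):
    """True if worksize is positive, within the limit, and divides global_size evenly."""
    if worksize <= 0 or worksize > max_worksize:
        return False
    return global_size % worksize == 0


global_size = 2 ** 22  # hypothetical power-of-two global work size
for ws in (32, 64, 96, 128, 192, 256):
    print(ws, is_valid_worksize(ws, global_size))
```

If the global work size is a power of two, as assumed here, then 96 and 192 are rejected while 32, 64, 128 and 256 pass, which would be consistent with the WORKSIZE=96 observation in Edit 2.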

Dia

PS: Guys, if you wish to support my work keep donating; there were almost no donations in the last few weeks!

Fiyasko
January 12, 2012, 09:41:24 PM
 #359

Hohohoo! Thank you to the guy who told me to drop aggro to 10, no more CPU pegging!
I just have one more problem: my 2nd GPU sits at about 91-98% usage, while the 1st GPU is at a lovely 99%... 'Sup? Both GPUs are on different CPU cores.

http://bitcoin-otc.com/viewratingdetail.php?nick=DingoRabiit&sign=ANY&type=RECV <-My Ratings
https://bitcointalk.org/index.php?topic=857670.0 GAWminers and associated things are not to be trusted, Especially the "mineral" exchange
blandead
January 12, 2012, 11:48:11 PM
 #360

Sounds great, I'm looking forward to your next release! Even though wavefront may get crippled a little, with worksize=192 on vectors4 I didn't see much of a difference in the number of shares output, that's why I'm hoping to try it with vectors3. I'll definitely be sending a donation your way tomorrow!