Bitcoin Forum
Author Topic: DiaKGCN kernel for CGMINER + Phoenix 2 (79XX / 78XX / 77XX / GCN) - 2012-05-25  (Read 27711 times)
d3m0n1q_733rz
February 02, 2012, 07:30:30 PM
 #21

Okay, I've downloaded the kernel and am trying it now.  So far, not bad.  The Vectors4 still has the whole issue with showing twice as many hashes as are actually computing, but I think that has to do with the init file as you said there were incompatibilities in the code when using the VECTORS4 option.  Also, why the (u) variable when using bitselect?
I like how you used the nonce here.  It seems that it could be better than using a series of if-else statements.
You've managed to keep the instructions low, but somehow the darn thing's not hashing faster.  Probably because it's not repeating the same task again and again for and with the same variables.  But, as you said, it's optimized for GCN so I have no idea.

Funroll_Loops, the theoretically quicker breakfast cereal!
Check out http://www.facebook.com/JupiterICT for all of your computing needs.  If you need it, we can get it.  We have solutions for your computing conundrums.  BTC accepted!  12HWUSguWXRCQKfkPeJygVR1ex5wbg3hAq
Diapolo (OP)
February 02, 2012, 07:50:46 PM
 #22

Okay, I've downloaded the kernel and am trying it now.  So far, not bad.  The Vectors4 still has the whole issue with showing twice as many hashes as are actually computing, but I think that has to do with the init file as you said there were incompatibilities in the code when using the VECTORS4 option.  Also, why the (u) variable when using bitselect?
I like how you used the nonce here.  It seems that it could be better than using a series of if-else statements.
You've managed to keep the instructions low, but somehow the darn thing's not hashing faster.  Probably because it's not repeating the same task again and again for and with the same variables.  But, as you said, it's optimized for GCN so I have no idea.

VEC4 is bugged until I say it is fixed, sorry :D. The (u) is a typecast: after round 64 I use some mixed scalar and vector values, and the cast is needed to bring them to the same type.
For me this is the fastest version on my 7970 ... but it seems no one cares to try it (on GCN cards).

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
sveetsnelda
February 02, 2012, 10:42:27 PM
 #23

I tried it, and I'm sorry that I haven't reported back.  Work has been chaotic.

The only way that I can get a similar hashrate compared to DiabloMiner with this kernel is to use a very high intensity (greater than 10).  By doing this though, CPU usage skyrockets and I burn up more wattage than the hashrate increase is worth.  I can make a few changes to the Poclbm kernel included with CGMiner though and get 96 percent of the performance of DiabloMiner while leaving the intensity at 9.  By using CGMiner, I am able to use backup pools, RPC, thermal controls, etc.  This more than makes up for the ~4 percent loss in performance.  I'm not at home right now to look at every change, but defining the Ch and Ma functions to use bitselect is basically all that was needed.

I'll try to send you a PM tonight with more details.
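
For readers unfamiliar with the trick, here is a minimal Python sketch of what "defining Ch and Ma to use bitselect" means. It emulates OpenCL's bitselect(a, b, c) on 32-bit integers and checks it against the textbook SHA-256 definitions. The helper names are illustrative only, and this is not the actual CGMiner or DiaKGCN kernel code.

```python
import random

MASK32 = 0xFFFFFFFF  # SHA-256 works on 32-bit words


def bitselect(a, b, c):
    """Emulate OpenCL bitselect: take the bit from b where c is 1, else from a."""
    return ((a & ~c) | (b & c)) & MASK32


def ch_reference(x, y, z):
    """Textbook SHA-256 Ch: x chooses between y and z."""
    return ((x & y) ^ (~x & z)) & MASK32


def ma_reference(x, y, z):
    """Textbook SHA-256 Maj: bitwise majority of x, y and z."""
    return ((x & y) ^ (x & z) ^ (y & z)) & MASK32


def ch_bitselect(x, y, z):
    # Where x is 1 take y, otherwise take z.
    return bitselect(z, y, x)


def ma_bitselect(x, y, z):
    # Where x and z disagree, y casts the deciding vote; otherwise take x.
    return bitselect(x, y, z ^ x)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(1000):
        x, y, z = (rng.getrandbits(32) for _ in range(3))
        assert ch_bitselect(x, y, z) == ch_reference(x, y, z)
        assert ma_bitselect(x, y, z) == ma_reference(x, y, z)
    print("bitselect forms match the reference definitions")
```

Each function collapses to a single select (plus one XOR for Ma), which is presumably why the change helps on hardware that has a matching instruction such as BFI_INT.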

14u2rp4AqFtN5jkwK944nn741FnfF714m7
blissfulyoshi
February 03, 2012, 12:41:43 AM
 #24

What I was asking about the name earlier in the previous thread is why the naming of this version changed?

current: diaggcn
thread title/previous: diakgcn

Oh well, minor thing, just changed the name of my inputs into phoenix. Keep up the good work.
Diapolo (OP)
February 03, 2012, 06:43:32 AM
 #25

What I was asking about the name earlier in the previous thread is why the naming of this version changed?

current: diaggcn
thread title/previous: diakgcn

Oh well, minor thing, just changed the name of my inputs into phoenix. Keep up the good work.

ROFL ... I made a typing error, wow that is hard. Will upload a fixed one ASAP :D. Sorry for the confusion, yesterday was a bit hard :D.

Update: Fixed my typo ;-), download is back!

Dia

Diapolo (OP)
February 03, 2012, 06:59:56 AM
 #26

I tried it, and I'm sorry that I haven't reported back.  Work has been chaotic.

The only way that I can get a similar hashrate compared to DiabloMiner with this kernel is to use a very high intensity (greater than 10).  By doing this though, CPU usage skyrockets and I burn up more wattage than the hashrate increase is worth.  I can make a few changes to the Poclbm kernel included with CGMiner though and get 96 percent of the performance of DiabloMiner while leaving the intensity at 9.  By using CGMiner, I am able to use backup pools, RPC, thermal controls, etc.  This more than makes up for the ~4 percent loss in performance.  I'm not at home right now to look at every change, but defining the Ch and Ma functions to use bitselect is basically all that was needed.

I'll try to send you a PM tonight with more details.

I really would like to port this one into CGMiner (or help in getting it ported), but I did not have the time to do so AND I guess I need help in doing commits for CGMiner. I will send a PM to Con, perhaps he is interested ...

By the way, I use AGGRESSION=12 with this kernel and get ~75% utilization on 1 core. Not good, but could be worse :D!

Dia

Diapolo (OP)
February 03, 2012, 10:43:52 AM
 #27

download current version:
http://www.filedropper.com/diakgcn03-02-2012_1

This release fixes the bugged VECTORS4 code, which works again (tested on 7970 and 6550D) and could speed things up for VLIW4 / VLIW5 GPUs with WORKSIZE=128, so just try it. There are no further changes for GCN in conjunction with VECTORS2 since 03-02-2012.

Dia

blissfulyoshi
February 03, 2012, 04:40:39 PM
 #28

More testing!!!!!

2.5
DiakGCN VECTORS WORKSIZE=128: 247MHps
DiakGCN VECTORS2 WORKSIZE=128: 280-281MHps
DiakGCN VECTORS4 WORKSIZE=128: 284MHps... It looks like your old Phoenix kernel is finally beaten for me. Now I just need to surpass my cgminer scores of 290MHps xD

CPU at 25-30% on my C2D.

Increases all across the board. Congratz!
d3m0n1q_733rz
February 04, 2012, 02:12:41 PM
 #29

Just tried your newer kernel and just about crapped myself.  I'm seeing some very competitive numbers with phatk2 now and I love the verbosity.  I see you decided to move the (u) values from the bitselect.  Did that help to speed things along?  I figured that if BFI_INT didn't have them, there was a major difference in something and one of them had to be slower.
I like how you used the global offset to your advantage.  (GOFFSET)
I am impressed.  You've been busy and I can see why.  If I was capable of hashing faster, I would totally send you some coin for your efforts.  Given a few months, I should.

d3m0n1q_733rz
February 04, 2012, 02:36:03 PM
 #30

Also, if I may, it doesn't look like uu needs to be set for GOFFSET as base doesn't appear to even be used.  I'm guessing that was your intention in the first place.

TurdHurdur
February 04, 2012, 04:32:12 PM
 #31

I get ~10Mhash/s more on my 5870 using:

Code:
VECTORS4 AGGRESSION=6 WORKSIZE=128 BFI_INT FASTLOOP

with this kernel for regular desktop usage.
Compared to my old phatk2 line:

Code:
VECTORS AGGRESSION=6 WORKSIZE=128 BFI_INT FASTLOOP


Though, this kernel doesn't seem to help my higher-aggression-set card (also a 5870) in my crossfire setup compared to your 2011-12-21 phatk_dia.
d3m0n1q_733rz
February 04, 2012, 04:52:20 PM
 #32

I get ~10Mhash/s more on my 5870 using:

Code:
VECTORS4 AGGRESSION=6 WORKSIZE=128 BFI_INT FASTLOOP

with this kernel for regular desktop usage.
Compared to my old phatk2 line:

Code:
VECTORS AGGRESSION=6 WORKSIZE=128 BFI_INT FASTLOOP


Though, this kernel doesn't seem to help my higher-aggression-set card (also a 5870) in my crossfire setup compared to your 2011-12-21 phatk_dia.

What did you get apples to apples?  As in using VECTORS4 with phatk2?

TurdHurdur
February 04, 2012, 05:36:07 PM
 #33

What did you get apples to apples?  As in using VECTORS4 with phatk2?

Oh, crap, significantly more. Guess I should've tried phatk2 with VECTORS4 before...

Edit: Blah, ignore above postings. I've been editing the kernel files.
Diapolo (OP)
February 04, 2012, 05:57:04 PM
 #34

download current version:
http://www.filedropper.com/diakgcn04-02-2012

This version features uint8 vectors support, which is activated via the VECTORS8 switch. This was beneficial on my VLIW5 6550D, but is pretty slow on GCN. Another switch, GOFFSET, was added, which can be used to disable the automatic usage of the global offset parameter (use GOFFSET=false to disable global offset). Perhaps it's faster for some to use the old way of generating the nonces in the kernel, so play around with it :).

Dia
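
A rough sketch of the two nonce schemes that GOFFSET switches between, written in Python rather than OpenCL. The function names and the nonce layout (each work item covering vec_width consecutive nonces) are assumptions for illustration and are not taken from the kernel source. With a global offset, the host passes the starting position when it enqueues the kernel and the work-item ID already includes it; without one, the kernel receives a base argument and adds it itself.

```python
MASK32 = 0xFFFFFFFF  # nonces are 32-bit values


def nonces_with_global_offset(global_offset, item_index, vec_width):
    """GOFFSET on (assumed layout): the work-item ID already includes the offset."""
    global_id = global_offset + item_index  # what get_global_id(0) would return
    first = global_id * vec_width
    return [(first + lane) & MASK32 for lane in range(vec_width)]


def nonces_with_base(base, item_index, vec_width):
    """GOFFSET=false (assumed layout): IDs start at zero, the kernel adds base."""
    global_id = item_index  # what get_global_id(0) would return
    first = base + global_id * vec_width
    return [(first + lane) & MASK32 for lane in range(vec_width)]


if __name__ == "__main__":
    vec_width = 4
    offset = 1000
    with_offset = nonces_with_global_offset(offset, 3, vec_width)
    with_base = nonces_with_base(offset * vec_width, 3, vec_width)
    print(with_offset, with_base)
```

Under these assumptions both schemes test the same nonces, so any speed difference would come from how the driver and compiler handle the extra kernel argument and addition, which is why it is worth benchmarking both.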

Diapolo (OP)
February 04, 2012, 05:57:43 PM
 #35

What did you get apples to apples?  As in using VECTORS4 with phatk2?

Oh, crap, significantly more. Guess I should've tried phatk2 with VECTORS4 before...

Edit: Blah, ignore above postings. I've been editing the kernel files.

Would you mind trying the VECTORS8 version and reporting back?

Dia

TurdHurdur
February 04, 2012, 06:45:40 PM
 #36

Would you mind trying the VECTORS8 version and reporting back?

Dia

I'm using Catalyst 12.1, 875/1225 clocks, same manufacturer/model 5870s on Windows 7.

https://bitcointalk.org/index.php?topic=6458.msg718648#msg718648 kernel:
Code:
VECTORS4 FASTLOOP=false AGGRESSION=10 WORKSIZE=128 BFI_INT

Max: ~400Mhash/s

Code:
VECTORS4 AGGRESSION=6 WORKSIZE=128 BFI_INT FASTLOOP

Max: ~390Mhash/s


Your new diakgcn kernel:

Code:
VECTORS8 FASTLOOP=false AGGRESSION=10 WORKSIZE=128 BFI_INT

Max: ~354Mhash/s

Code:
VECTORS8 AGGRESSION=6 WORKSIZE=128 BFI_INT FASTLOOP

Max: ~352Mhash/s
blissfulyoshi
February 04, 2012, 06:56:07 PM
 #37

I think that posting style looks nice, I'll copy it.

2.5 (6870 on 11.11)
Code:
VECTORS4 DEVICE=0 BFI_INT AGGRESSION=12 WORKSIZE=128
Average: 284Mhash/s

Code:
VECTORS8 DEVICE=0 BFI_INT AGGRESSION=12 WORKSIZE=128
Average: 284Mhash/s

2.6 (6870 on 11.12, 50MHz slower on the GPU clock than the one on 2.5)
Code:
VECTORS2 DEVICE=0 BFI_INT AGGRESSION=12 WORKSIZE=128
Average: 262Mhash/s

Code:
VECTORS4 DEVICE=0 BFI_INT AGGRESSION=12 WORKSIZE=128
Average: 275Mhash/s

Code:
VECTORS8 DEVICE=0 BFI_INT AGGRESSION=12 WORKSIZE=128
Average: 268Mhash/s
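
To make the 2.6 numbers above easier to compare, here is a small Python sketch that expresses each setting as a percentage relative to the fastest one. The figures are the averages quoted in this post; the function name is made up for illustration.

```python
# Average hashrates (Mhash/s) reported above for the 6870 on 11.12.
results_mhash = {"VECTORS2": 262, "VECTORS4": 275, "VECTORS8": 268}


def relative_to_best(results):
    """Return each setting's difference from the fastest one, in percent."""
    best = max(results.values())
    return {name: round(100.0 * (rate - best) / best, 1)
            for name, rate in results.items()}


if __name__ == "__main__":
    for name, delta in relative_to_best(results_mhash).items():
        print(f"{name}: {delta:+.1f}% vs best")
```

Differences of a few percent like these are small enough that clock speed and driver version (both of which differ between the 2.5 and 2.6 runs) should be kept fixed when comparing.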
d3m0n1q_733rz
February 05, 2012, 03:39:41 AM
 #38

VECTORS4 WORKSIZE=128 with GOFFSET=false 14.45 Mhash/s
VECTORS4 WORKSIZE=128 without GOFFSET=false 14.46 Mhash/s
VECTORS8 WORKSIZE=128 with GOFFSET=false 14.46 Mhash/s
VECTORS8 WORKSIZE=128 without GOFFSET=false 14.47 Mhash/s

VECTORS4 WORKSIZE=64 with GOFFSET=false 14.49 Mhash/s
VECTORS4 WORKSIZE=64 without GOFFSET=false 14.50 Mhash/s
VECTORS8 WORKSIZE=64 with GOFFSET=false 14.55 Mhash/s
VECTORS8 WORKSIZE=64 without GOFFSET=false 14.50 Mhash/s

VECTORS4 WORKSIZE=32 with GOFFSET=false 14.46 Mhash/s
VECTORS4 WORKSIZE=32 without GOFFSET=false 14.47 Mhash/s
VECTORS8 WORKSIZE=32 with GOFFSET=false 14.50 Mhash/s
VECTORS8 WORKSIZE=32 without GOFFSET=false 14.48 Mhash/s

*High fives*  Playing around with VECTORS8 has done some good.  ^_^  And hardly anyone believed me that using 256-bit vectors (eight 32-bit integers) could pay off.
I'm going to "try" to do something with the nonce code in phatk2 by copying the nonce code from your kernel and see what happens.  I really wouldn't have known how to do it without you.

d3m0n1q_733rz
February 05, 2012, 09:17:36 AM
 #39

Just so you know, the ATI cards are capable of handling vectors of up to 16 elements, as far as I'm aware.  I'm not going to try this right now, but it'll supposedly cut down on the number of work items that need to be run.  Higher-end cards will, of course, see better results than lower-end ones.  I don't know what the physical computing size is for the data, but it'll handle int16, which should be best for dedicated rigs as long as the worksize is dropped to about half of the hardware's limit, from what I see here.
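
As a back-of-the-envelope illustration of the vector widths discussed here, this Python sketch computes the size of each vector type and how many work items it takes to cover a full 32-bit nonce range. It assumes one nonce per vector lane, which is an assumption about the kernel layout rather than something confirmed in this thread.

```python
NONCE_SPACE = 2 ** 32  # number of nonces per block header
WORD_BITS = 32         # SHA-256 word size


def vector_stats(vec_width):
    """Return (bits per vector value, work items needed to cover all nonces)."""
    return vec_width * WORD_BITS, NONCE_SPACE // vec_width


if __name__ == "__main__":
    for width in (1, 2, 4, 8, 16):
        bits, items = vector_stats(width)
        print(f"width {width:2d}: {bits:3d} bits per value, {items} work items")
```

Note that fewer work items does not mean fewer hashes: each item simply computes more of them, and wider vectors generally need more registers per work item, so whether a wider type is faster has to be measured per card.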

Diapolo (OP)
February 05, 2012, 04:44:38 PM
 #40

Just so you know, the ATI cards are capable of handling vectors of up to 16 elements, as far as I'm aware.  I'm not going to try this right now, but it'll supposedly cut down on the number of work items that need to be run.  Higher-end cards will, of course, see better results than lower-end ones.  I don't know what the physical computing size is for the data, but it'll handle int16, which should be best for dedicated rigs as long as the worksize is dropped to about half of the hardware's limit, from what I see here.

I could implement uint16, which should be pretty straightforward, but massive vectorisation is really something GCN does not like currently.

Dia
