Bitcoin Forum
May 12, 2024, 12:49:04 AM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1]
  Print  
Author Topic: Regarding the theoretical maximum Performance of GPUs  (Read 990 times)
ElectricMucus (OP)
Legendary
*
Offline Offline

Activity: 1666
Merit: 1057


Marketing manager - GO MP


View Profile WWW
August 20, 2011, 04:59:37 PM
 #1

I am interested in how effective the ALUs on a card can be utilized and did some calculations:

Considering the the SHA256 loop there are the following things in there:

Operations:
1 not, 5 and, 7 xor
6 rotations by 2, 13, 22 and 6, 11, 25

makes 19

32 Bit words. (register access)
5 A, 2 B, 5 E, 1 F, 1, G

makes 14

4 additions, 2 LUT accesses
8 Memory accesses, 2 extra additions

makes 16
---------
49 total
run 64 times
------
3136 cylces


5970 with 3200 ALUs:

3200*725/3136 = 739.795918 mhash

Is this calculation correct or is there more/less done on the gpu?

Because according to this the code utilization would be nearly optimal which makes claims of awesome optimizations dubious, (ArtForz entry on the wiki for ex...)
1715474944
Hero Member
*
Offline Offline

Posts: 1715474944

View Profile Personal Message (Offline)

Ignore
1715474944
Reply with quote  #2

1715474944
Report to moderator
1715474944
Hero Member
*
Offline Offline

Posts: 1715474944

View Profile Personal Message (Offline)

Ignore
1715474944
Reply with quote  #2

1715474944
Report to moderator
Even if you use Bitcoin through Tor, the way transactions are handled by the network makes anonymity difficult to achieve. Do not expect your transactions to be anonymous unless you really know what you're doing.
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
1715474944
Hero Member
*
Offline Offline

Posts: 1715474944

View Profile Personal Message (Offline)

Ignore
1715474944
Reply with quote  #2

1715474944
Report to moderator
CanaryInTheMine
Donator
Legendary
*
Offline Offline

Activity: 2352
Merit: 1060


between a rock and a block!


View Profile
August 20, 2011, 06:24:27 PM
 #2

take a look at:

https://bitcointalk.org/index.php?topic=33817.0
ElectricMucus (OP)
Legendary
*
Offline Offline

Activity: 1666
Merit: 1057


Marketing manager - GO MP


View Profile WWW
August 20, 2011, 06:46:48 PM
 #3

Thanks, thats nearly the same result, obviously I forgot some things architecture specific to the cards.  Cool
ArtForz
Sr. Member
****
Offline Offline

Activity: 406
Merit: 257


View Profile
August 20, 2011, 06:59:55 PM
 #4

You realize a bitcoinhash is *2* sha256 blocks operations, right?
Well, not exactly 2 thanks to some optimizations possible
you can drop the last 3 rounds completely (they don't change H), and lose part of the previous round (you only need the E output of the 4th-to-last)
Initial rounds can be optimized as well, as the last DWORD of hMerkleRoot and nTime/nBits don't change between loops, so you can drop the equivalent of ~3 rounds there as well.
Same thing goes for optimizing/precalculating parts of the W mangling, as we're feeding in quite a bit of constants.
Register access is basically free on GPUs (they mask reg r/w by pipelining 4 "threads" on the shader pipeline).
Ch() can be done in 1 cycle, and Maj() in 2.
Also, what LUT accesses? just hardcode the K constants in the instruction stream.

So while you came up with a somewhat reasonable result, you did so by pure chance using invalid assumptions and numbers.

bitcoin: 1Fb77Xq5ePFER8GtKRn2KDbDTVpJKfKmpz
i0coin: jNdvyvd6v6gV3kVJLD7HsB5ZwHyHwAkfdw
ElectricMucus (OP)
Legendary
*
Offline Offline

Activity: 1666
Merit: 1057


Marketing manager - GO MP


View Profile WWW
August 20, 2011, 07:09:54 PM
 #5

Well thanks for the pointers, excuse my noobish rants. I'll be back once I understand what is being said  Kiss
CanaryInTheMine
Donator
Legendary
*
Offline Offline

Activity: 2352
Merit: 1060


between a rock and a block!


View Profile
August 20, 2011, 07:35:07 PM
 #6

You realize a bitcoinhash is *2* sha256 blocks operations, right?
Well, not exactly 2 thanks to some optimizations possible
you can drop the last 3 rounds completely (they don't change H), and lose part of the previous round (you only need the E output of the 4th-to-last)
Initial rounds can be optimized as well, as the last DWORD of hMerkleRoot and nTime/nBits don't change between loops, so you can drop the equivalent of ~3 rounds there as well.
Same thing goes for optimizing/precalculating parts of the W mangling, as we're feeding in quite a bit of constants.
Register access is basically free on GPUs (they mask reg r/w by pipelining 4 "threads" on the shader pipeline).
Ch() can be done in 1 cycle, and Maj() in 2.
Also, what LUT accesses? just hardcode the K constants in the instruction stream.

So while you came up with a somewhat reasonable result, you did so by pure chance using invalid assumptions and numbers.

ArtForz,  you are quite the legend on these forums...  Glad to see you here! Smiley
Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!