Bitcoin Forum
January 22, 2022, 06:26:35 AM *
News: Latest Bitcoin Core release: 22.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1] 2 »  All
  Print  
Author Topic: Theoretical minimum # of logic operations to perform double iterated SHA256?  (Read 3855 times)
Geremia
Sr. Member
****
Offline Offline

Activity: 481
Merit: 251


View Profile WWW
April 18, 2015, 10:55:55 PM
Merited by ETFbitcoin (1)
 #1

What is the theoretical minimum number of logical operations an ASIC needs to perform to compute double iterated SHA256, i.e., sha(sha(•))?

(cf. the Bitcoin StackExchange question)

BTC tip jar | my BTC wiki, BTC StackExchange | Tox ID: 65C3E8810738AD9D175234808FCB317A1103632903436203D45411AE97C03F54C34861AB6663
Join Kraken. | The best, free book on Bitcoin: Mastering Bitcoin
Nos cum prole pia benedicat Virgo Maria.
1642832795
Hero Member
*
Offline Offline

Posts: 1642832795

View Profile Personal Message (Offline)

Ignore
1642832795
Reply with quote  #2

1642832795
Report to moderator
1642832795
Hero Member
*
Offline Offline

Posts: 1642832795

View Profile Personal Message (Offline)

Ignore
1642832795
Reply with quote  #2

1642832795
Report to moderator
1642832795
Hero Member
*
Offline Offline

Posts: 1642832795

View Profile Personal Message (Offline)

Ignore
1642832795
Reply with quote  #2

1642832795
Report to moderator
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction. Advertise here.
1642832795
Hero Member
*
Offline Offline

Posts: 1642832795

View Profile Personal Message (Offline)

Ignore
1642832795
Reply with quote  #2

1642832795
Report to moderator
1642832795
Hero Member
*
Offline Offline

Posts: 1642832795

View Profile Personal Message (Offline)

Ignore
1642832795
Reply with quote  #2

1642832795
Report to moderator
Cryddit
Legendary
*
Offline Offline

Activity: 924
Merit: 1071


View Profile
April 19, 2015, 04:04:13 AM
 #2

SHA256 is sixty-four rounds comprising

384 32-bit additions (6 per round)
320 32-bit ORs (5 per round)
448 32-bit XORs (7 per round)

And a bunch of bit shifts, but bit shifts are free on an ASIC.  

SHA256D, which is what Bitcoin uses, is 128 rounds, comprising

768 additions,
640 ORs
896 XORs

And a bunch of bit shifts but bit shifts are free on an ASIC.

SHA256D is an interesting choice, actually; usually you don't see it except in a context where someone is worried about an extension attack - which doesn't really apply to the way Bitcoin uses it.
Geremia
Sr. Member
****
Offline Offline

Activity: 481
Merit: 251


View Profile WWW
April 19, 2015, 05:26:24 AM
 #3

SHA256D, which is what Bitcoin uses
SHA256D = double iterated SHA256?

BTC tip jar | my BTC wiki, BTC StackExchange | Tox ID: 65C3E8810738AD9D175234808FCB317A1103632903436203D45411AE97C03F54C34861AB6663
Join Kraken. | The best, free book on Bitcoin: Mastering Bitcoin
Nos cum prole pia benedicat Virgo Maria.
Geremia
Sr. Member
****
Offline Offline

Activity: 481
Merit: 251


View Profile WWW
April 19, 2015, 05:28:47 AM
 #4

And a bunch of bit shifts
How many, exactly?
but bit shifts are free on an ASIC.
What do you mean they "are free on an ASIC"?

BTC tip jar | my BTC wiki, BTC StackExchange | Tox ID: 65C3E8810738AD9D175234808FCB317A1103632903436203D45411AE97C03F54C34861AB6663
Join Kraken. | The best, free book on Bitcoin: Mastering Bitcoin
Nos cum prole pia benedicat Virgo Maria.
Cryddit
Legendary
*
Offline Offline

Activity: 924
Merit: 1071


View Profile
April 19, 2015, 05:35:36 AM
 #5

And a bunch of bit shifts
How many, exactly?
but bit shifts are free on an ASIC.
What do you mean they "are free on an ASIC"?

512 32-bit shifts for SHA256, 1024 for SHA256D. 8 per round.

But on an ASIC, Bit shifts are not logic operations.  Not even gates. They're just circuit traces that go at an angle instead of directly straight into the next array of gates.

And yes, SHA256D is SHA256D(oubled). 


Geremia
Sr. Member
****
Offline Offline

Activity: 481
Merit: 251


View Profile WWW
April 19, 2015, 06:05:43 AM
 #6

But on an ASIC, Bit shifts are not logic operations.  Not even gates. They're just circuit traces that go at an angle instead of directly straight into the next array of gates.
I'm sorry, but I'm not too familiar with circuit design. What do you mean by "circuit traces"?

BTC tip jar | my BTC wiki, BTC StackExchange | Tox ID: 65C3E8810738AD9D175234808FCB317A1103632903436203D45411AE97C03F54C34861AB6663
Join Kraken. | The best, free book on Bitcoin: Mastering Bitcoin
Nos cum prole pia benedicat Virgo Maria.
Geremia
Sr. Member
****
Offline Offline

Activity: 481
Merit: 251


View Profile WWW
April 19, 2015, 06:35:39 AM
 #7

The reason I ask is because I'd like to place a lower limit on the thermal expenditure of ASICs, using Landauer's principle.

So, for SHA256D, I gather there cannot be fewer than 73,728 bits (=32*(768+640+896)) written or erased per hash.
(assuming each addition, XOR, and OR operation involves no more than 32 bits written or erased)

So,
73,728 k T ln(2) = 2.29×10¯¹⁶ Joules,

assuming Boltzmann's constant k = 1.380 6504(24)×10¯²³ J K¯¹ and the circuit is at 324 K (which is the average of my ASIC).

If I'm hashing at 1.1 Thash/s, for example, I get that the power dissipated/required should be:

0.251 milliWatts

So, even the most efficient ASICs like the S5 are many orders of magnitude away from being as efficient as they could be. (Computers nowadays dissipate/require on the order of at least 500× the Landauer limit per elementary logic operation.)

BTC tip jar | my BTC wiki, BTC StackExchange | Tox ID: 65C3E8810738AD9D175234808FCB317A1103632903436203D45411AE97C03F54C34861AB6663
Join Kraken. | The best, free book on Bitcoin: Mastering Bitcoin
Nos cum prole pia benedicat Virgo Maria.
shorena
Copper Member
Legendary
*
Offline Offline

Activity: 1498
Merit: 1411


No I dont escrow anymore.


View Profile WWW
April 19, 2015, 08:26:56 AM
 #8

But on an ASIC, Bit shifts are not logic operations.  Not even gates. They're just circuit traces that go at an angle instead of directly straight into the next array of gates.
I'm sorry, but I'm not too familiar with circuit design. What do you mean by "circuit traces"?

Think of it like you would solder this with huge boxes[2] that contain your logic operations. Each of theses boxes has one or more inputs and outputs. E.g. an AND[1] would have 2 inputs and 1 output. You can consider the circuit traces as the cables you would use to connect the different boxes with eachother. A shift would be a cable that is not going straight, but a little to the left (or right) to use a different input. Leftmost (or rightmost) cable would cross all other cable to use the rightmost (or leftmost) input.


[1] sometimes pictures are better than just words: -> http://www.homofaciens.de/bilder/technik/logic-gates_008.gif

[2] http://www.jaurich-online.de/ebay/ojau/oj1359.jpg

Im not really here, its just your imagination.
InceptionCoin
Member
**
Offline Offline

Activity: 108
Merit: 10


View Profile
April 19, 2015, 09:24:08 AM
 #9

But on an ASIC, Bit shifts are not logic operations.  Not even gates. They're just circuit traces that go at an angle instead of directly straight into the next array of gates.
I'm sorry, but I'm not too familiar with circuit design. What do you mean by "circuit traces"?

I will try to explain it simplier than shorena:
imagine you have number 10 in 8bit integer, it means
0 0 0 0 1 0 1 0
Now, you need to shift right it, to get
0 0 0 0 0 1 0 1
In the cpu register you will put number from first bit  to second bit, from second to third etc, but in asic you can connect(actualy you shouldn't but nevermind)  registers like here:
Code:
_ _ _ _ _ _ _ _
 \ \ \ \ \ \ \
— — — — — — — —
And you get shifted number without replacing bits.

Skilled C++ and Python programmer. Looking around to create solid longterm coin by myself. Do you have any ideas? Feel free to PM me.
Meni Rosenfeld
Donator
Legendary
*
expert
Offline Offline

Activity: 2058
Merit: 1027



View Profile WWW
April 19, 2015, 11:27:52 AM
 #10

You should read up on https://en.wikipedia.org/wiki/Reversible_computing. You can in theory construct your logic gates in a way that will not erase information, and make arbitrarily long computations with a bounded energy expenditure. The notion that the Landauer's principle places a lower limit on the energy cost of computation is a myth.

1EofoZNBhWQ3kxfKnvWkhtMns4AivZArhr   |   Who am I?   |   bitcoin-otc WoT
Bitcoil - Exchange bitcoins for ILS (thread)   |   Israel Bitcoin community homepage (thread)
Analysis of Bitcoin Pooled Mining Reward Systems (thread, summary)  |   PureMining - Infinite-term, deterministic mining bond
btchris
Hero Member
*****
Offline Offline

Activity: 672
Merit: 503

a.k.a. gurnec on GitHub


View Profile WWW
April 19, 2015, 03:14:43 PM
 #11

SHA256 is sixty-four rounds comprising

384 32-bit additions (6 per round)
320 32-bit ORs (5 per round)
448 32-bit XORs (7 per round)

And a bunch of bit shifts, but bit shifts are free on an ASIC.  

Are you sure of those numbers? For example in a naive implementation, I'm counting 48*3 + 64*7 + 8 = 600 additions for a single SHA-256 block. There could be better ways of doing SHA-256 that don't naively follow the standard though... I wouldn't know....
Cryddit
Legendary
*
Offline Offline

Activity: 924
Merit: 1071


View Profile
April 19, 2015, 06:21:04 PM
 #12

SHA256 is sixty-four rounds comprising

384 32-bit additions (6 per round)
320 32-bit ORs (5 per round)
448 32-bit XORs (7 per round)

And a bunch of bit shifts, but bit shifts are free on an ASIC.  

Are you sure of those numbers? For example in a naive implementation, I'm counting 48*3 + 64*7 + 8 = 600 additions for a single SHA-256 block. There could be better ways of doing SHA-256 that don't naively follow the standard though... I wouldn't know....

Here's one round of SHA256, expressed as a pseudo-circuit diagram:

where the red pluses indicate 32-bit addition and the dark-blue boxes are defined as
,
,
,

(and of course the plus inside the circle is xor).

It looks like the implementation you linked to is using extra additions to feed into Ch and Ma - which is an artifact of not being able to directly split the circuit traces that come off the EFG and ABC registers respectively.  It, or something very like it, is probably the best you can do in software. 
btchris
Hero Member
*****
Offline Offline

Activity: 672
Merit: 503

a.k.a. gurnec on GitHub


View Profile WWW
April 19, 2015, 11:45:24 PM
 #13

48*3 + 64*7 + 8 = 600 additions for a single SHA-256 block
<clip>
Here's one round of SHA256, expressed as a pseudo-circuit diagram:

Thanks, that's a great diagram.

Please excuse my ignorance, but does the plus to the right of Ch have three inputs, and is this considered a single operation on an ASIC? I think that's the 6 vs 7 additions per round discrepancy.

Also, should the message expansion step be included (creating Wt, the 48*3 from above)?
Should the final hash value creation be included (the 8 from above)?
altcoinex
Sr. Member
****
Offline Offline

Activity: 293
Merit: 250


Director - www.cubeform.io


View Profile WWW
April 20, 2015, 12:40:04 AM
 #14

Wanted to throw in my tip of the hat to Cryddit as well for such a well detailed response.


                                     ╓╢╬╣╣╖
                                   ┌║██████║∩
                                   ]█████████
                                    ╜██████╝`
                                      ╙╜╜╜`
                                   ╓╥@@@@@@╥╓
         ╓╖@@╖,                 ,@║██████████╢@,                 ,╓@@╖╓
       ╓╢██████╢.              ╓╢███████████████╖               ║╢█████║╓
       ║█████████    ,,╓╓,,   ┌║█████████████████┐   ,,╓╓,,    ]█████████
       └╢██████║` ╓╢║██████╢║∩``╙╙╙╙╙╙╙╙╙╙╙╙╙╙╙╙╙`»╢╢██████╢║╖  ║███████╜
         "╜╜╜╜` ╖╢█████████╣╜                      └╢██████████@ `╜╜╜╜╜
               ║██████████╜                          ╙╢██████████
              ┌█████████╜                              ╙╢█████████
              └███████╨`                                 ╜████████
               ║████╨╜                                    `╢█████
                ╙╢╣╜                                        └╢█╜
                ,,                                            ,,
             ╓@║██┐                                          ┌██║@╓
            ╢██████                                          ]█████H
           ╢███████∩                                        ┌████████
  ╓@@@@╓   █████████                                        ║████████`  ╓@@@@╖
╓╢██████║. █████████∩                                      ┌█████████ ,║███████╖
██████████ └█████████                                      ██████████ ]█████████
`║██████╜`  └╢████████                                    ┌███████╣╜   ╙██████╨`
  `╙╜╜╙`      `╙╨╢████                                    █████╝╜`       `╙╜╜`
                      ]@╓                              ╓╖H
                      ███╢║@╓,                    ,╓@╢╢███`
                      ████████╢@╖╓.           ╓╖@║████████`
                      ]███████████╢║@╓,  ,╓@╢╢████████████
                       ╙╢█████████████╨` ╜██████████████╜
                         ╙╝╢███████║╜`    `╜║████████╝╜`
                     ,╓@@@╓  `²╙``             `╙²`  ╓@@@╖,
                    ║╢█████╢H                      ╓╢██████H
                    █████████                      █████████`
                    ╙╢██████╜                      ╙╢██████╜
                      └╨╩╝┘                          └╨╩╝╜
WINFLOW.
██
██
██
██
██
██
██
██
██
██
██
██
██
..
██
██
██
██
██
██
██
██
██
██
██
██
██
.
Cryddit
Legendary
*
Offline Offline

Activity: 924
Merit: 1071


View Profile
April 20, 2015, 03:39:27 AM
 #15


Please excuse my ignorance, but does the plus to the right of Ch have three inputs, and is this considered a single operation on an ASIC? I think that's the 6 vs 7 additions per round discrepancy.

Yes, I think that's probably it.  And yes, you can construct a circuit to do an addition of three values in one step on an ASIC.  And no, this is not you being ignorant, this is fairly obscure stuff.  ASICs follow their own weird set of rules and it's not quite the same ones that software follows.

Also, should the message expansion step be included (creating Wt, the 48*3 from above)?
Should the final hash value creation be included (the 8 from above)?

I believe that the message expansion step can be a near-NOP like bitshifting on an ASIC; you are right about the final hash value though, so I was off by eight.

Sergio_Demian_Lerner
Hero Member
*****
expert
Offline Offline

Activity: 549
Merit: 575


View Profile WWW
April 20, 2015, 04:03:59 AM
 #16

What is the theoretical minimum number of logical operations an ASIC needs to perform to compute double iterated SHA256, i.e., sha(sha(•))?

(cf. the Bitcoin StackExchange question)

Cryddit gave an estimation on the number of standard gate building blocks required for a Bitcoin ASIC (adders, logic gates)
However, adders require more space than OR gates, so generally the number of gates will be dominated by the number of adders. Also adders can be implemented in several ways,  with different delay/space trade-offs, so even if there could be a theoretical minimum number of gates, practically all implementations would use much more to reduce the delay.

More interesting, you can:

- Compute SHA^2 approximately, and get a better practically good SHA^2 ASIC for mining.
See https://bitslog.wordpress.com/2015/02/17/faster-sha-256-asics-using-carry-reduced-adders.

- Compute SHA^2 asynchronously (e.g. using asynchronous adders)

Last, it has not been proven that performing a complete SHA^2 evaluation is required on average to check that a changing header has a SHA^2 hash that is below the target value. In fact, several widely known optimizations have disprove it.
Geremia
Sr. Member
****
Offline Offline

Activity: 481
Merit: 251


View Profile WWW
April 20, 2015, 04:00:45 PM
 #17

The notion that the Landauer's principle places a lower limit on the energy cost of computation is a myth.
It has been experimentally verified:

Bérut, Antoine, Artak Arakelyan, Artyom Petrosyan, Sergio Ciliberto, Raoul Dillenschneider, and Eric Lutz. “Experimental Verification of Landauer/’s Principle Linking Information and Thermodynamics.” Nature 483, no. 7388 (March 8, 2012): 187–89. doi:10.1038/nature10872. (non-paywalled version)

BTC tip jar | my BTC wiki, BTC StackExchange | Tox ID: 65C3E8810738AD9D175234808FCB317A1103632903436203D45411AE97C03F54C34861AB6663
Join Kraken. | The best, free book on Bitcoin: Mastering Bitcoin
Nos cum prole pia benedicat Virgo Maria.
Peter R
Legendary
*
Offline Offline

Activity: 1162
Merit: 1007



View Profile
April 20, 2015, 04:33:26 PM
 #18

You should read up on https://en.wikipedia.org/wiki/Reversible_computing. You can in theory construct your logic gates in a way that will not erase information, and make arbitrarily long computations with a bounded energy expenditure. The notion that the Landauer's principle places a lower limit on the energy cost of computation is a myth.

Interesting.  Let's imagine that we made a circuit to test nonces for bitcoin mining using reversible gates.

Code:
       ______________
      -|            |-
      -|            |-
 i    -|            |-   o
 n    -|            |-   u
 p    -|  circuit   |-   t
 u    -|            |-   p
 t    -|            |-   u
      -|            |-   t
      -|            |-
       --------------

Assume that the input to the circuit is the blockheader which consists of 608 bits + the 32-bit nonce.  So we need 640 wires (bits) coming into our circuit at the input side.  

At the output, all we really care about is a yea or nay on whether the nonce satisfies the difficulty target, which could be represented by a single bit.  But that's not reversible because from one bit we can't go backwards and reproduce the 640-bit input.  To make the computation reversible, our circuit must also have at least 640 bits coming out of it.  
 
To use the circuit in practice, we apply our first nonce to the input, wait for the output to settle, and determine if the difficulty target was satisfied.  The answer is probably NO...  

So, now comes the irreversible part.  We need test another nonce, which means we must flip at least one bit in our input (the input includes the nonce).  So, as per Landauer's principle, this costs us energy1

   E > kT ln 2.

We have to do this many, many times until we find a nonce that satisfies the difficult target, expending energy at each step.

Another interesting property of our reversible circuit above is that it only performs the computation for infinitesimal energy input if we're willing to wait an infinitely long time to observe the output.  In general, even for reversible gates, there is an energy-time tradeoff2 for performing a specific computation:

   (energy consumed)*(computation time) > some constant

A successful Bitcoin miner is concerned with more than finding the correct nonce with the least energy expenditure.  He also wants to find it quickly!  And this will always cost him energy, regardless of whether he use reversible or irreversible gates.  


1If this causes M bits to get flipped at the output, does this then mean the circuit required an energy input E > M kT ln 2 to test the next nonce?  I'm not sure...

2Discovered by Charles Bennet (working at IBM with Rolf Landauer), refer to Feynman Lectures on Computation, Chapter 5.

Run Bitcoin Unlimited (www.bitcoinunlimited.info)
tl121
Sr. Member
****
Offline Offline

Activity: 278
Merit: 251


View Profile
April 20, 2015, 05:30:45 PM
 #19

You should read up on https://en.wikipedia.org/wiki/Reversible_computing. You can in theory construct your logic gates in a way that will not erase information, and make arbitrarily long computations with a bounded energy expenditure. The notion that the Landauer's principle places a lower limit on the energy cost of computation is a myth.

Interesting.  Let's imagine that we made a circuit to test nonces for bitcoin mining using reversible gates.

Code:
       ______________
      -|            |-
      -|            |-
 i    -|            |-   o
 n    -|            |-   u
 p    -|  circuit   |-   t
 u    -|            |-   p
 t    -|            |-   u
      -|            |-   t
      -|            |-
       --------------

Assume that the input to the circuit is the blockheader which consists of 608 bits + the 32-bit nonce.  So we need 640 wires (bits) coming into our circuit at the input side.  

At the output, all we really care about is a yea or nay on whether the nonce satisfies the difficulty target, which could be represented by a single bit.  But that's not reversible because from one bit we can't go backwards and reproduce the 640-bit input.  To make the computation reversible, our circuit must also have at least 640 bits coming out of it.  
 
To use the circuit in practice, we apply our first nonce to the input, wait for the output to settle, and determine if the difficulty target was satisfied.  The answer is probably NO...  

So, now comes the irreversible part.  We need test another nonce, which means we must flip at least one bit in our input (the input includes the nonce).  So, as per Landauer's principle, this costs us energy1

   E > kT ln 2.

We have to do this many, many times until we find a nonce that satisfies the difficult target, expending energy at each step.

Another interesting property of our reversible circuit above is that it only performs the computation for infinitesimal energy input if we're willing to wait an infinitely long time to observe the output.  In general, even for reversible gates, there is an energy-time tradeoff2 for performing a specific computation:

   (energy consumed)*(computation time) > some constant

A successful Bitcoin miner is concerned with more than finding the correct nonce with the least energy expenditure.  He also wants to find it quickly!  And this will always cost him energy, regardless of whether he use reversible or irreversible gates.  


1If this causes M bits to get flipped at the output, does this then mean the circuit required an energy input E > M kT ln 2 to test the next nonce?  I'm not sure...

2Discovered by Charles Bennet (working at IBM with Rolf Landauer), refer to Feynman Lectures on Computation, Chapter 5.

There is no need to output lots of nonce values for unsuccessful searches.  For example, one could build a reversible engine that would fully search a defined range and output a single bit that indicates whether the range contains a solution. This information could be used in a binary search to zero in on an exact solution.  Alternatively, the identity of promising ranges could be passed on to a conventional hash engine  which would search only guaranteed successful ranges.

Engineering details await detailed cost/performance specs for the "unobtanium" reversible devices.  Smiley

solex
Legendary
*
Offline Offline

Activity: 1078
Merit: 1000


100 satoshis -> ISO code


View Profile
April 20, 2015, 07:16:33 PM
Last edit: April 20, 2015, 07:39:30 PM by solex
 #20

You should read up on https://en.wikipedia.org/wiki/Reversible_computing. You can in theory construct your logic gates in a way that will not erase information, and make arbitrarily long computations with a bounded energy expenditure. The notion that the Landauer's principle places a lower limit on the energy cost of computation is a myth.

Indeed. It seems obvious that reversible computing is the future for Bitcoin mining ASICs, and could be the first major commercial application of the technology. I have been looking forward to seeing informed comment on it.

When I suggested this on reddit the idea got shot down.
http://www.reddit.com/r/Bitcoin/comments/22tegd/do_we_have_an_idea_of_the_power_in_watt_necessary/cgq6ydq

Pages: [1] 2 »  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!