Geremia (OP)


April 18, 2015, 10:55:55 PM 

What is the theoretical minimum number of logical operations an ASIC needs to perform to compute double iterated SHA256, i.e., sha(sha(•))? (cf. the Bitcoin StackExchange question)




Cryddit
Legendary
Offline
Activity: 924
Merit: 1129


April 19, 2015, 04:04:13 AM 

SHA256 is sixtyfour rounds comprising
384 32bit additions (6 per round) 320 32bit ORs (5 per round) 448 32bit XORs (7 per round)
And a bunch of bit shifts, but bit shifts are free on an ASIC.
SHA256D, which is what Bitcoin uses, is 128 rounds, comprising
768 additions, 640 ORs 896 XORs
And a bunch of bit shifts but bit shifts are free on an ASIC.
SHA256D is an interesting choice, actually; usually you don't see it except in a context where someone is worried about an extension attack  which doesn't really apply to the way Bitcoin uses it.




Geremia (OP)


April 19, 2015, 05:26:24 AM 

SHA256D, which is what Bitcoin uses SHA256D = double iterated SHA256?




Geremia (OP)


April 19, 2015, 05:28:47 AM 

And a bunch of bit shifts How many, exactly? but bit shifts are free on an ASIC. What do you mean they "are free on an ASIC"?




Cryddit
Legendary
Offline
Activity: 924
Merit: 1129


April 19, 2015, 05:35:36 AM 

And a bunch of bit shifts How many, exactly? but bit shifts are free on an ASIC. What do you mean they "are free on an ASIC"? 512 32bit shifts for SHA256, 1024 for SHA256D. 8 per round. But on an ASIC, Bit shifts are not logic operations. Not even gates. They're just circuit traces that go at an angle instead of directly straight into the next array of gates. And yes, SHA256D is SHA256D(oubled).




Geremia (OP)


April 19, 2015, 06:05:43 AM 

But on an ASIC, Bit shifts are not logic operations. Not even gates. They're just circuit traces that go at an angle instead of directly straight into the next array of gates. I'm sorry, but I'm not too familiar with circuit design. What do you mean by "circuit traces"?




Geremia (OP)


April 19, 2015, 06:35:39 AM 

The reason I ask is because I'd like to place a lower limit on the thermal expenditure of ASICs, using Landauer's principle. So, for SHA256D, I gather there cannot be fewer than 73,728 bits (=32*(768+640+896)) written or erased per hash. (assuming each addition, XOR, and OR operation involves no more than 32 bits written or erased) So, 73,728 k T ln(2) = 2.29×10¯¹⁶ Joules, assuming Boltzmann's constant k = 1.380 6504(24)×10¯²³ J K¯¹ and the circuit is at 324 K (which is the average of my ASIC). If I'm hashing at 1.1 Thash/s, for example, I get that the power dissipated/required should be: 0.251 milliWatts So, even the most efficient ASICs like the S5 are many orders of magnitude away from being as efficient as they could be. (Computers nowadays dissipate/require on the order of at least 500× the Landauer limit per elementary logic operation.)




shorena
Copper Member
Legendary
Offline
Activity: 1498
Merit: 1520
No I dont escrow anymore.


April 19, 2015, 08:26:56 AM 

But on an ASIC, Bit shifts are not logic operations. Not even gates. They're just circuit traces that go at an angle instead of directly straight into the next array of gates. I'm sorry, but I'm not too familiar with circuit design. What do you mean by "circuit traces"? Think of it like you would solder this with huge boxes[2] that contain your logic operations. Each of theses boxes has one or more inputs and outputs. E.g. an AND[1] would have 2 inputs and 1 output. You can consider the circuit traces as the cables you would use to connect the different boxes with eachother. A shift would be a cable that is not going straight, but a little to the left (or right) to use a different input. Leftmost (or rightmost) cable would cross all other cable to use the rightmost (or leftmost) input. [1] sometimes pictures are better than just words: > http://www.homofaciens.de/bilder/technik/logicgates_008.gif[2] http://www.jaurichonline.de/ebay/ojau/oj1359.jpg

Im not really here, its just your imagination.



InceptionCoin
Member
Offline
Activity: 108
Merit: 10


April 19, 2015, 09:24:08 AM 

But on an ASIC, Bit shifts are not logic operations. Not even gates. They're just circuit traces that go at an angle instead of directly straight into the next array of gates. I'm sorry, but I'm not too familiar with circuit design. What do you mean by "circuit traces"? I will try to explain it simplier than shorena: imagine you have number 10 in 8bit integer, it means 0 0 0 0 1 0 1 0 Now, you need to shift right it, to get 0 0 0 0 0 1 0 1 In the cpu register you will put number from first bit to second bit, from second to third etc, but in asic you can connect(actualy you shouldn't but nevermind) registers like here: _ _ _ _ _ _ _ _ \ \ \ \ \ \ \ — — — — — — — —
And you get shifted number without replacing bits.

Skilled C++ and Python programmer. Looking around to create solid longterm coin by myself. Do you have any ideas? Feel free to PM me.



Meni Rosenfeld
Donator
Legendary
Offline
Activity: 2058
Merit: 1054


April 19, 2015, 11:27:52 AM 

You should read up on https://en.wikipedia.org/wiki/Reversible_computing. You can in theory construct your logic gates in a way that will not erase information, and make arbitrarily long computations with a bounded energy expenditure. The notion that the Landauer's principle places a lower limit on the energy cost of computation is a myth.




btchris


April 19, 2015, 03:14:43 PM 

SHA256 is sixtyfour rounds comprising
384 32bit additions (6 per round) 320 32bit ORs (5 per round) 448 32bit XORs (7 per round)
And a bunch of bit shifts, but bit shifts are free on an ASIC.
Are you sure of those numbers? For example in a naive implementation, I'm counting 48*3 + 64*7 + 8 = 600 additions for a single SHA256 block. There could be better ways of doing SHA256 that don't naively follow the standard though... I wouldn't know....




Cryddit
Legendary
Offline
Activity: 924
Merit: 1129


April 19, 2015, 06:21:04 PM 

SHA256 is sixtyfour rounds comprising
384 32bit additions (6 per round) 320 32bit ORs (5 per round) 448 32bit XORs (7 per round)
And a bunch of bit shifts, but bit shifts are free on an ASIC.
Are you sure of those numbers? For example in a naive implementation, I'm counting 48*3 + 64*7 + 8 = 600 additions for a single SHA256 block. There could be better ways of doing SHA256 that don't naively follow the standard though... I wouldn't know.... Here's one round of SHA256, expressed as a pseudocircuit diagram: where the red pluses indicate 32bit addition and the darkblue boxes are defined as , , , (and of course the plus inside the circle is xor). It looks like the implementation you linked to is using extra additions to feed into Ch and Ma  which is an artifact of not being able to directly split the circuit traces that come off the EFG and ABC registers respectively. It, or something very like it, is probably the best you can do in software.




btchris


April 19, 2015, 11:45:24 PM 

48*3 + 64*7 + 8 = 600 additions for a single SHA256 block <clip> Here's one round of SHA256, expressed as a pseudocircuit diagram: Thanks, that's a great diagram. Please excuse my ignorance, but does the plus to the right of Ch have three inputs, and is this considered a single operation on an ASIC? I think that's the 6 vs 7 additions per round discrepancy. Also, should the message expansion step be included (creating W _{t}, the 48*3 from above)? Should the final hash value creation be included (the 8 from above)?




altcoinex


April 20, 2015, 12:40:04 AM 

Wanted to throw in my tip of the hat to Cryddit as well for such a well detailed response.

╓╢╬╣╣╖ ┌║██████║∩ ]█████████ ╜██████╝` ╙╜╜╜` ╓╥@@@@@@╥╓ ╓╖@@╖, ,@║██████████╢@, ,╓@@╖╓ ╓╢██████╢. ╓╢███████████████╖ ║╢█████║╓ ║█████████ ,,╓╓,, ┌║█████████████████┐ ,,╓╓,, ]█████████ └╢██████║` ╓╢║██████╢║∩``╙╙╙╙╙╙╙╙╙╙╙╙╙╙╙╙╙`»╢╢██████╢║╖ ║███████╜ "╜╜╜╜` ╖╢█████████╣╜ └╢██████████@ `╜╜╜╜╜ ║██████████╜ ╙╢██████████ ┌█████████╜ ╙╢█████████ └███████╨` ╜████████ ║████╨╜ `╢█████ ╙╢╣╜ └╢█╜ ,, ,, ╓@║██┐ ┌██║@╓ ╢██████ ]█████H ╢███████∩ ┌████████ ╓@@@@╓ █████████ ║████████` ╓@@@@╖ ╓╢██████║. █████████∩ ┌█████████ ,║███████╖ ██████████ └█████████ ██████████ ]█████████ `║██████╜` └╢████████ ┌███████╣╜ ╙██████╨` `╙╜╜╙` `╙╨╢████ █████╝╜` `╙╜╜` ]@╓ ╓╖H ███╢║@╓, ,╓@╢╢███` ████████╢@╖╓. ╓╖@║████████` ]███████████╢║@╓, ,╓@╢╢████████████ ╙╢█████████████╨` ╜██████████████╜ ╙╝╢███████║╜` `╜║████████╝╜` ,╓@@@╓ `²╙`` `╙²` ╓@@@╖, ║╢█████╢H ╓╢██████H █████████ █████████` ╙╢██████╜ ╙╢██████╜ └╨╩╝┘ └╨╩╝╜  WINFLOW  .  ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██
 .   .  ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██
 .  



Cryddit
Legendary
Offline
Activity: 924
Merit: 1129


April 20, 2015, 03:39:27 AM 

Please excuse my ignorance, but does the plus to the right of Ch have three inputs, and is this considered a single operation on an ASIC? I think that's the 6 vs 7 additions per round discrepancy.
Yes, I think that's probably it. And yes, you can construct a circuit to do an addition of three values in one step on an ASIC. And no, this is not you being ignorant, this is fairly obscure stuff. ASICs follow their own weird set of rules and it's not quite the same ones that software follows. Also, should the message expansion step be included (creating W_{t}, the 48*3 from above)? Should the final hash value creation be included (the 8 from above)?
I believe that the message expansion step can be a nearNOP like bitshifting on an ASIC; you are right about the final hash value though, so I was off by eight.




Sergio_Demian_Lerner


April 20, 2015, 04:03:59 AM 

What is the theoretical minimum number of logical operations an ASIC needs to perform to compute double iterated SHA256, i.e., sha(sha(•))? (cf. the Bitcoin StackExchange question) Cryddit gave an estimation on the number of standard gate building blocks required for a Bitcoin ASIC (adders, logic gates) However, adders require more space than OR gates, so generally the number of gates will be dominated by the number of adders. Also adders can be implemented in several ways, with different delay/space tradeoffs, so even if there could be a theoretical minimum number of gates, practically all implementations would use much more to reduce the delay. More interesting, you can:  Compute SHA^2 approximately, and get a better practically good SHA^2 ASIC for mining. See https://bitslog.wordpress.com/2015/02/17/fastersha256asicsusingcarryreducedadders.  Compute SHA^2 asynchronously (e.g. using asynchronous adders) Last, it has not been proven that performing a complete SHA^2 evaluation is required on average to check that a changing header has a SHA^2 hash that is below the target value. In fact, several widely known optimizations have disprove it.




Geremia (OP)


April 20, 2015, 04:00:45 PM 

The notion that the Landauer's principle places a lower limit on the energy cost of computation is a myth. It has been experimentally verified: Bérut, Antoine, Artak Arakelyan, Artyom Petrosyan, Sergio Ciliberto, Raoul Dillenschneider, and Eric Lutz. “ Experimental Verification of Landauer/’s Principle Linking Information and Thermodynamics.” Nature 483, no. 7388 (March 8, 2012): 187–89. doi: 10.1038/nature10872. ( nonpaywalled version)




Peter R
Legendary
Offline
Activity: 1162
Merit: 1007


April 20, 2015, 04:33:26 PM 

You should read up on https://en.wikipedia.org/wiki/Reversible_computing. You can in theory construct your logic gates in a way that will not erase information, and make arbitrarily long computations with a bounded energy expenditure. The notion that the Landauer's principle places a lower limit on the energy cost of computation is a myth. Interesting. Let's imagine that we made a circuit to test nonces for bitcoin mining using reversible gates. ______________     i   o n   u p  circuit  t u   p t   u   t   
Assume that the input to the circuit is the blockheader which consists of 608 bits + the 32bit nonce. So we need 640 wires (bits) coming into our circuit at the input side. At the output, all we really care about is a yea or nay on whether the nonce satisfies the difficulty target, which could be represented by a single bit. But that's not reversible because from one bit we can't go backwards and reproduce the 640bit input. To make the computation reversible, our circuit must also have at least 640 bits coming out of it. To use the circuit in practice, we apply our first nonce to the input, wait for the output to settle, and determine if the difficulty target was satisfied. The answer is probably NO... So, now comes the irreversible part. We need test another nonce, which means we must flip at least one bit in our input (the input includes the nonce). So, as per Landauer's principle, this costs us energy ^{1} E > kT ln 2. We have to do this many, many times until we find a nonce that satisfies the difficult target, expending energy at each step. Another interesting property of our reversible circuit above is that it only performs the computation for infinitesimal energy input if we're willing to wait an infinitely long time to observe the output. In general, even for reversible gates, there is an energytime tradeoff ^{2} for performing a specific computation: (energy consumed)*(computation time) > some constant A successful Bitcoin miner is concerned with more than finding the correct nonce with the least energy expenditure. He also wants to find it quickly! And this will always cost him energy, regardless of whether he use reversible or irreversible gates. ^{1}If this causes M bits to get flipped at the output, does this then mean the circuit required an energy input E > M kT ln 2 to test the next nonce? I'm not sure...^{2}Discovered by Charles Bennet (working at IBM with Rolf Landauer), refer to Feynman Lectures on Computation, Chapter 5.




tl121


April 20, 2015, 05:30:45 PM 

You should read up on https://en.wikipedia.org/wiki/Reversible_computing. You can in theory construct your logic gates in a way that will not erase information, and make arbitrarily long computations with a bounded energy expenditure. The notion that the Landauer's principle places a lower limit on the energy cost of computation is a myth. Interesting. Let's imagine that we made a circuit to test nonces for bitcoin mining using reversible gates. ______________     i   o n   u p  circuit  t u   p t   u   t   
Assume that the input to the circuit is the blockheader which consists of 608 bits + the 32bit nonce. So we need 640 wires (bits) coming into our circuit at the input side. At the output, all we really care about is a yea or nay on whether the nonce satisfies the difficulty target, which could be represented by a single bit. But that's not reversible because from one bit we can't go backwards and reproduce the 640bit input. To make the computation reversible, our circuit must also have at least 640 bits coming out of it. To use the circuit in practice, we apply our first nonce to the input, wait for the output to settle, and determine if the difficulty target was satisfied. The answer is probably NO... So, now comes the irreversible part. We need test another nonce, which means we must flip at least one bit in our input (the input includes the nonce). So, as per Landauer's principle, this costs us energy ^{1} E > kT ln 2. We have to do this many, many times until we find a nonce that satisfies the difficult target, expending energy at each step. Another interesting property of our reversible circuit above is that it only performs the computation for infinitesimal energy input if we're willing to wait an infinitely long time to observe the output. In general, even for reversible gates, there is an energytime tradeoff ^{2} for performing a specific computation: (energy consumed)*(computation time) > some constant A successful Bitcoin miner is concerned with more than finding the correct nonce with the least energy expenditure. He also wants to find it quickly! And this will always cost him energy, regardless of whether he use reversible or irreversible gates. ^{1}If this causes M bits to get flipped at the output, does this then mean the circuit required an energy input E > M kT ln 2 to test the next nonce? I'm not sure...^{2}Discovered by Charles Bennet (working at IBM with Rolf Landauer), refer to Feynman Lectures on Computation, Chapter 5. There is no need to output lots of nonce values for unsuccessful searches. For example, one could build a reversible engine that would fully search a defined range and output a single bit that indicates whether the range contains a solution. This information could be used in a binary search to zero in on an exact solution. Alternatively, the identity of promising ranges could be passed on to a conventional hash engine which would search only guaranteed successful ranges. Engineering details await detailed cost/performance specs for the "unobtanium" reversible devices.




solex
Legendary
Offline
Activity: 1078
Merit: 1002
100 satoshis > ISO code


April 20, 2015, 07:16:33 PM Last edit: April 20, 2015, 07:39:30 PM by solex 

You should read up on https://en.wikipedia.org/wiki/Reversible_computing. You can in theory construct your logic gates in a way that will not erase information, and make arbitrarily long computations with a bounded energy expenditure. The notion that the Landauer's principle places a lower limit on the energy cost of computation is a myth. Indeed. It seems obvious that reversible computing is the future for Bitcoin mining ASICs, and could be the first major commercial application of the technology. I have been looking forward to seeing informed comment on it. When I suggested this on reddit the idea got shot down. http://www.reddit.com/r/Bitcoin/comments/22tegd/do_we_have_an_idea_of_the_power_in_watt_necessary/cgq6ydq




