Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: Geremia on April 18, 2015, 10:55:55 PM

Title: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Geremia on April 18, 2015, 10:55:55 PM

What is the theoretical minimum number of logical operations an ASIC needs to perform to compute double iterated SHA256, i.e., sha(sha(•))?

(cf. the Bitcoin StackExchange question (https://bitcoin.stackexchange.com/q/36984/4334))
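
For concreteness, "double iterated SHA256" means running SHA-256 once and then hashing the resulting 32-byte digest again. A minimal Python sketch using the standard `hashlib` module (the function name `sha256d` is only illustrative):

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Return SHA256(SHA256(data)) as 32 raw bytes."""
    first_pass = hashlib.sha256(data).digest()
    return hashlib.sha256(first_pass).digest()


print(sha256d(b"hello").hex())
```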

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Cryddit on April 19, 2015, 04:04:13 AM

SHA256 is sixty-four rounds comprising

384 32-bit additions (6 per round)
320 32-bit ORs (5 per round)
448 32-bit XORs (7 per round)

And a bunch of bit shifts, but bit shifts are free on an ASIC.

SHA256D, which is what Bitcoin uses, is 128 rounds, comprising

768 additions,
640 ORs
896 XORs

And a bunch of bit shifts but bit shifts are free on an ASIC.

SHA256D is an interesting choice, actually; usually you don't see it except in a context where someone is worried about an extension attack - which doesn't really apply to the way Bitcoin uses it.
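
The totals above follow directly from the per-round figures. A short Python sketch of that tally, taking the per-round counts quoted in this post as given (they are this post's estimates, not a verified gate count) and treating each SHA256 pass as a single 64-round block, as the post does:

```python
ROUNDS_PER_PASS = 64
PER_ROUND = {"add": 6, "or": 5, "xor": 7}  # per-round counts as quoted above


def op_totals(passes: int) -> dict:
    """Multiply the per-round counts by the number of rounds in `passes` passes."""
    rounds = ROUNDS_PER_PASS * passes
    return {op: count * rounds for op, count in PER_ROUND.items()}


sha256_ops = op_totals(1)
sha256d_ops = op_totals(2)
print(sha256_ops)
print(sha256d_ops)
```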

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Geremia on April 19, 2015, 05:26:24 AM

Quote from: Cryddit on April 19, 2015, 04:04:13 AM

SHA256D, which is what Bitcoin uses

SHA256D = double iterated SHA256?

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Geremia on April 19, 2015, 05:28:47 AM

Quote from: Cryddit on April 19, 2015, 04:04:13 AM

And a bunch of bit shifts

How many, exactly?

Quote from: Cryddit on April 19, 2015, 04:04:13 AM

but bit shifts are free on an ASIC.

What do you mean they "are free on an ASIC"?

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Cryddit on April 19, 2015, 05:35:36 AM

Quote from: Geremia on April 19, 2015, 05:28:47 AM

Quote from: Cryddit on April 19, 2015, 04:04:13 AM

And a bunch of bit shifts

How many, exactly?

Quote from: Cryddit on April 19, 2015, 04:04:13 AM

but bit shifts are free on an ASIC.

What do you mean they "are free on an ASIC"?

512 32-bit shifts for SHA256, 1024 for SHA256D. 8 per round.

But on an ASIC, bit shifts are not logic operations. Not even gates. They're just circuit traces that go at an angle instead of directly straight into the next array of gates.

And yes, SHA256D is SHA256D(oubled).

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Geremia on April 19, 2015, 06:05:43 AM

Quote from: Cryddit on April 19, 2015, 05:35:36 AM

But on an ASIC, Bit shifts are not logic operations. Not even gates. They're just circuit traces that go at an angle instead of directly straight into the next array of gates.

I'm sorry, but I'm not too familiar with circuit design. What do you mean by "circuit traces"?

Title: calculating the lower limit on the thermal expenditure of ASICs
Post by: Geremia on April 19, 2015, 06:35:39 AM

The reason I ask is because I'd like to place a lower limit on the thermal expenditure of ASICs, using Landauer's principle (https://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5392446).

So, for SHA256D, I gather there cannot be fewer than 73,728 bits (=32*(768+640+896)) written or erased per hash.
(assuming each addition, XOR, and OR operation involves no more than 32 bits written or erased)

So,

73,728 k T ln(2) = 2.29×10⁻¹⁶ Joules,

assuming Boltzmann's constant k = 1.380 6504(24)×10⁻²³ J K⁻¹ and the circuit is at 324 K (which is the average temperature of my ASIC).

If I'm hashing at 1.1 Thash/s, for example, I get that the power dissipated/required should be:

0.251 milliwatts

So, even the most efficient ASICs like the S5 are many orders of magnitude away from being as efficient as they could be. (Computers nowadays dissipate/require on the order of at least 500× the Landauer limit per elementary logic operation.)
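
The arithmetic above can be re-run in a few lines. This is only a sketch of the estimate as stated in this post: the bit count, temperature and hash rate are the post's own assumptions, and the variable names are illustrative.

```python
import math

K_BOLTZMANN = 1.3806504e-23             # J/K, the value quoted in the post
TEMPERATURE = 324.0                     # K, the ASIC temperature assumed in the post
BITS_PER_HASH = 32 * (768 + 640 + 896)  # bits written or erased per SHA256D, per the post
HASH_RATE = 1.1e12                      # hashes per second (1.1 Thash/s)

energy_per_bit = K_BOLTZMANN * TEMPERATURE * math.log(2)  # Landauer bound per bit, in J
energy_per_hash = BITS_PER_HASH * energy_per_bit          # J per hash
power_watts = energy_per_hash * HASH_RATE                 # W at the given hash rate

print(f"{BITS_PER_HASH} bits, {energy_per_hash:.3e} J per hash, {power_watts * 1e3:.3f} mW")
```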

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: shorena on April 19, 2015, 08:26:56 AM

Quote from: Geremia on April 19, 2015, 06:05:43 AM

Quote from: Cryddit on April 19, 2015, 05:35:36 AM

But on an ASIC, Bit shifts are not logic operations. Not even gates. They're just circuit traces that go at an angle instead of directly straight into the next array of gates.

I'm sorry, but I'm not too familiar with circuit design. What do you mean by "circuit traces"?

Think of it as soldering together huge boxes[2] that contain your logic operations. Each of these boxes has one or more inputs and outputs; e.g. an AND[1] has 2 inputs and 1 output. You can consider the circuit traces as the cables you would use to connect the different boxes with each other. A shift would be a cable that does not go straight, but a little to the left (or right) to reach a different input. The leftmost (or rightmost) cable would cross all the other cables to reach the rightmost (or leftmost) input.

[1] sometimes pictures are better than just words: -> http://www.homofaciens.de/bilder/technik/logic-gates_008.gif

[2] http://www.jaurich-online.de/ebay/ojau/oj1359.jpg

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: InceptionCoin on April 19, 2015, 09:24:08 AM

Quote from: Geremia on April 19, 2015, 06:05:43 AM

Quote from: Cryddit on April 19, 2015, 05:35:36 AM

But on an ASIC, Bit shifts are not logic operations. Not even gates. They're just circuit traces that go at an angle instead of directly straight into the next array of gates.

I'm sorry, but I'm not too familiar with circuit design. What do you mean by "circuit traces"?

I will try to explain it more simply than shorena:
imagine you have the number 10 as an 8-bit integer, which is
0 0 0 0 1 0 1 0
Now you need to shift it right, to get
0 0 0 0 0 1 0 1
In a CPU register you would copy the value of the first bit into the second bit, the second into the third, etc., but in an ASIC you can connect (actually you shouldn't, but never mind) the registers like this:

Code:

 _ _ _ _ _ _ _ _
 \ \ \ \ \ \ \ 
— — — — — — — —

And you get the shifted number without moving any bits.
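
The same idea in a small Python sketch: a shift or rotate is modelled purely as a re-indexing of wires, with no logic applied to the bit values. The helper names are illustrative and the bit lists are written most significant bit first, like the rows above.

```python
def to_bits(value: int, width: int = 8) -> list:
    """Most significant bit first, like the rows drawn above."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def from_bits(bits: list) -> int:
    out = 0
    for bit in bits:
        out = (out << 1) | bit
    return out


def shift_right_by_wiring(bits: list) -> list:
    """Output wire i is connected to input wire i-1; output wire 0 is tied to 0."""
    return [0] + bits[:-1]


def rotate_right_by_wiring(bits: list) -> list:
    """Same wiring, except the last input wire wraps around to output wire 0."""
    return [bits[-1]] + bits[:-1]


print(to_bits(10), shift_right_by_wiring(to_bits(10)))
```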

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Meni Rosenfeld on April 19, 2015, 11:27:52 AM

You should read up on https://en.wikipedia.org/wiki/Reversible_computing. You can in theory construct your logic gates in a way that will not erase information, and make arbitrarily long computations with a bounded energy expenditure. The notion that Landauer's principle places a lower limit on the energy cost of computation is a myth.

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: btchris on April 19, 2015, 03:14:43 PM

Quote from: Cryddit on April 19, 2015, 04:04:13 AM

SHA256 is sixty-four rounds comprising

384 32-bit additions (6 per round)
320 32-bit ORs (5 per round)
448 32-bit XORs (7 per round)

And a bunch of bit shifts, but bit shifts are free on an ASIC.

Are you sure of those numbers? For example in a naive implementation, I'm counting 48*3 (https://github.com/gurnec/HashCheck/blob/e08a5683a43fecd08e8127debf81d6c1228c5990/libs/sha256.c#L81-L83) + 64*7 (https://github.com/gurnec/HashCheck/blob/e08a5683a43fecd08e8127debf81d6c1228c5990/libs/sha256.c#L98-L109) + 8 (https://github.com/gurnec/HashCheck/blob/e08a5683a43fecd08e8127debf81d6c1228c5990/libs/sha256.c#L112-L119) = 600 additions for a single SHA-256 block. There could be better ways of doing SHA-256 that don't naively follow the standard though... I wouldn't know....

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Cryddit on April 19, 2015, 06:21:04 PM

Quote from: btchris on April 19, 2015, 03:14:43 PM

Quote from: Cryddit on April 19, 2015, 04:04:13 AM

SHA256 is sixty-four rounds comprising

384 32-bit additions (6 per round)
320 32-bit ORs (5 per round)
448 32-bit XORs (7 per round)

And a bunch of bit shifts, but bit shifts are free on an ASIC.

Here's one round of SHA256, expressed as a pseudo-circuit diagram:
http://upload.wikimedia.org/wikipedia/commons/thumb/7/7d/SHA-2.svg/400px-SHA-2.svg.png
where the red pluses indicate 32-bit addition and the dark-blue boxes are defined as
http://upload.wikimedia.org/math/2/2/d/22d25335571877dfffa0e3b92b567a74.png,
http://upload.wikimedia.org/math/d/c/3/dc385330f79942f1ccced410f8ca0374.png,
http://upload.wikimedia.org/math/c/6/5/c65cf353faa9befd97e4e90df3786bd3.png,
http://upload.wikimedia.org/math/5/0/f/50ff80081e37aeeb2ed0f01fae51af01.png
(and of course the plus inside the circle is xor).

It looks like the implementation you linked to is using extra additions to feed into Ch and Ma - which is an artifact of not being able to directly split the circuit traces that come off the EFG and ABC registers respectively. It, or something very like it, is probably the best you can do in software.
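
To make the diagram concrete, here is a software-style Python sketch of the SHA-256 compression function with an addition counter. It is not an ASIC netlist: every k-operand sum is counted as k-1 two-input additions, the way software would do it, and that convention reproduces the 600 additions per block counted earlier in the thread. The helper names are illustrative. The constants are derived exactly from the cube and square roots of the first primes, and the result is checked against `hashlib`.

```python
import hashlib
import math

MASK = 0xFFFFFFFF
ADDITIONS = 0  # running count of two-input 32-bit additions


def add(*terms):
    """Sum 32-bit words mod 2**32; a k-operand sum counts as k-1 additions."""
    global ADDITIONS
    ADDITIONS += len(terms) - 1
    return sum(terms) & MASK


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def ch(e, f, g):
    return (e & f) ^ (~e & g & MASK)


def maj(a, b, c):
    return (a & b) ^ (a & c) ^ (b & c)


def big_sigma0(a):
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)


def big_sigma1(e):
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)


def small_sigma0(w):
    return rotr(w, 7) ^ rotr(w, 18) ^ (w >> 3)


def small_sigma1(w):
    return rotr(w, 17) ^ rotr(w, 19) ^ (w >> 10)


def first_primes(n):
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n):
    """Exact integer cube root by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(64)
K = [icbrt(p << 96) & MASK for p in PRIMES]             # round constants
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]   # initial hash value


def compress(state, block):
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):  # message expansion: 48 words, 3 additions each
        w.append(add(small_sigma1(w[t - 2]), w[t - 7], small_sigma0(w[t - 15]), w[t - 16]))
    a, b, c, d, e, f, g, h = state
    for t in range(64):      # 64 rounds, 7 additions each
        t1 = add(h, big_sigma1(e), ch(e, f, g), K[t], w[t])
        t2 = add(big_sigma0(a), maj(a, b, c))
        h, g, f, e, d, c, b, a = g, f, e, add(d, t1), c, b, a, add(t1, t2)
    return [add(s, v) for s, v in zip(state, (a, b, c, d, e, f, g, h))]  # 8 final additions


def sha256(msg: bytes) -> bytes:
    padded = msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64) + (8 * len(msg)).to_bytes(8, "big")
    state = list(H0)
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return b"".join(s.to_bytes(4, "big") for s in state)


def additions_for(msg: bytes):
    """Hash msg and return (digest, number of two-input additions used)."""
    global ADDITIONS
    ADDITIONS = 0
    digest = sha256(msg)
    return digest, ADDITIONS


digest, count = additions_for(b"abc")
assert digest == hashlib.sha256(b"abc").digest()
print(count)
```

A hardware design that uses multi-operand adders would count these sums differently, so the figure is specific to this counting convention.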

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: btchris on April 19, 2015, 11:45:24 PM

Quote from: Cryddit on April 19, 2015, 06:21:04 PM

Quote from: btchris on April 19, 2015, 03:14:43 PM

48*3 (https://github.com/gurnec/HashCheck/blob/e08a5683a43fecd08e8127debf81d6c1228c5990/libs/sha256.c#L81-L83) + 64*7 (https://github.com/gurnec/HashCheck/blob/e08a5683a43fecd08e8127debf81d6c1228c5990/libs/sha256.c#L98-L109) + 8 (https://github.com/gurnec/HashCheck/blob/e08a5683a43fecd08e8127debf81d6c1228c5990/libs/sha256.c#L112-L119) = 600 additions for a single SHA-256 block
<clip>

Here's one round of SHA256, expressed as a pseudo-circuit diagram:

Thanks, that's a great diagram.

Please excuse my ignorance, but does the plus to the right of Ch have three inputs, and is this considered a single operation on an ASIC? I think that's the 6 vs 7 additions per round discrepancy.

Also, should the message expansion step be included (creating W_t, the 48*3 from above)?
Should the final hash value creation be included (the 8 from above)?

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: altcoinex on April 20, 2015, 12:40:04 AM

Wanted to throw in my tip of the hat to Cryddit as well for such a well detailed response.

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Cryddit on April 20, 2015, 03:39:27 AM

Quote from: btchris on April 19, 2015, 11:45:24 PM

Please excuse my ignorance, but does the plus to the right of Ch have three inputs, and is this considered a single operation on an ASIC? I think that's the 6 vs 7 additions per round discrepancy.

Yes, I think that's probably it. And yes, you can construct a circuit to do an addition of three values in one step on an ASIC. And no, this is not you being ignorant, this is fairly obscure stuff. ASICs follow their own weird set of rules and it's not quite the same ones that software follows.

Quote from: btchris on April 19, 2015, 11:45:24 PM

Also, should the message expansion step be included (creating W_t, the 48*3 from above)?
Should the final hash value creation be included (the 8 from above)?

I believe that the message expansion step can be a near-NOP like bitshifting on an ASIC; you are right about the final hash value though, so I was off by eight.
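
The post does not say how a three-input adder would be built. One common technique (an assumption here, not something stated in the thread) is a carry-save adder: three words are first reduced to two using only per-bit gates with no carry chain, and only then is a single carry-propagating addition performed. A minimal Python sketch with illustrative names:

```python
MASK = 0xFFFFFFFF


def carry_save(a: int, b: int, c: int) -> tuple:
    """3:2 compressor: reduce three 32-bit words to a sum word and a carry word
    using only per-bit XOR/AND/OR, with no carry propagation between bits."""
    partial_sum = a ^ b ^ c
    carries = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum & MASK, carries & MASK


def add3(a: int, b: int, c: int) -> int:
    """Three-operand addition mod 2**32 with one carry-propagating add."""
    partial_sum, carries = carry_save(a, b, c)
    return (partial_sum + carries) & MASK


print(hex(add3(0xFFFFFFFF, 0x00000001, 0x12345678)))
```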

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Sergio_Demian_Lerner on April 20, 2015, 04:03:59 AM

Quote from: Geremia on April 18, 2015, 10:55:55 PM

Cryddit gave an estimate of the number of standard gate building blocks required for a Bitcoin ASIC (adders, logic gates).
However, adders require more space than OR gates, so generally the number of gates will be dominated by the number of adders. Also, adders can be implemented in several ways, with different delay/space trade-offs, so even if there were a theoretical minimum number of gates, practically all implementations would use many more to reduce the delay.

More interesting, you can:

- Compute SHA^2 approximately, and get a better practically good SHA^2 ASIC for mining.
See https://bitslog.wordpress.com/2015/02/17/faster-sha-256-asics-using-carry-reduced-adders.

- Compute SHA^2 asynchronously (e.g. using asynchronous adders)

Last, it has not been proven that performing a complete SHA^2 evaluation is required on average to check that a changing header has a SHA^2 hash that is below the target value. In fact, several widely known optimizations have disproved it.

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Geremia on April 20, 2015, 04:00:45 PM

Quote from: Meni Rosenfeld on April 19, 2015, 11:27:52 AM

The notion that the Landauer's principle places a lower limit on the energy cost of computation is a myth.

It has been experimentally verified:

Bérut, Antoine, Artak Arakelyan, Artyom Petrosyan, Sergio Ciliberto, Raoul Dillenschneider, and Eric Lutz. “Experimental Verification of Landauer’s Principle Linking Information and Thermodynamics (http://www.nature.com/nature/journal/v483/n7388/full/nature10872.html).” Nature 483, no. 7388 (March 8, 2012): 187–89. doi:10.1038/nature10872 (http://dx.doi.org/10.1038/nature10872). (non-paywalled version (http://moerwiki.us.to/misc/Physics%20papers%20and%20books/Modern%20Papers/B%c3%a9rut%20et%20al.%20-%202012%20-%20Experimental%20verification%20of%20Landauer%27s%20principle.pdf))

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Peter R on April 20, 2015, 04:33:26 PM

Quote from: Meni Rosenfeld on April 19, 2015, 11:27:52 AM

Interesting. Let's imagine that we made a circuit to test nonces for bitcoin mining using reversible gates.

Code:

        ______________
      -|            |-
      -|            |-
 i    -|            |-   o
 n    -|            |-   u
 p    -|  circuit   |-   t
 u    -|            |-   p
 t    -|            |-   u
      -|            |-   t
      -|            |-
       --------------

Assume that the input to the circuit is the blockheader which consists of 608 bits + the 32-bit nonce. So we need 640 wires (bits) coming into our circuit at the input side.

At the output, all we really care about is a yea or nay on whether the nonce satisfies the difficulty target, which could be represented by a single bit. But that's not reversible because from one bit we can't go backwards and reproduce the 640-bit input. To make the computation reversible, our circuit must also have at least 640 bits coming out of it.

To use the circuit in practice, we apply our first nonce to the input, wait for the output to settle, and determine if the difficulty target was satisfied. The answer is probably NO...

So, now comes the irreversible part. We need to test another nonce, which means we must flip at least one bit in our input (the input includes the nonce). So, as per Landauer's principle, this costs us energy¹

E > kT ln 2.

We have to do this many, many times until we find a nonce that satisfies the difficulty target, expending energy at each step.

Another interesting property of our reversible circuit above is that it only performs the computation for infinitesimal energy input if we're willing to wait an infinitely long time to observe the output. In general, even for reversible gates, there is an energy-time tradeoff² for performing a specific computation:

(energy consumed)*(computation time) > some constant

A successful Bitcoin miner is concerned with more than finding the correct nonce with the least energy expenditure. He also wants to find it quickly! And this will always cost him energy, regardless of whether he uses reversible or irreversible gates.

¹If this causes M bits to get flipped at the output, does this then mean the circuit required an energy input E > M kT ln 2 to test the next nonce? I'm not sure...

²Discovered by Charles Bennett (working at IBM with Rolf Landauer), refer to Feynman Lectures on Computation, Chapter 5.

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: tl121 on April 20, 2015, 05:30:45 PM

Quote from: Peter R on April 20, 2015, 04:33:26 PM

Quote from: Meni Rosenfeld on April 19, 2015, 11:27:52 AM

Interesting. Let's imagine that we made a circuit to test nonces for bitcoin mining using reversible gates.

Code:

        ______________
      -|            |-
      -|            |-
 i    -|            |-   o
 n    -|            |-   u
 p    -|  circuit   |-   t
 u    -|            |-   p
 t    -|            |-   u
      -|            |-   t
      -|            |-
       --------------

There is no need to output lots of nonce values for unsuccessful searches. For example, one could build a reversible engine that would fully search a defined range and output a single bit that indicates whether the range contains a solution. This information could be used in a binary search to zero in on an exact solution. Alternatively, the identity of promising ranges could be passed on to a conventional hash engine which would search only guaranteed successful ranges.

Engineering details await detailed cost/performance specs for the "unobtanium" reversible devices. :)
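
The search logic described above can be sketched in Python. The one-bit "engine" is replaced here by an ordinary brute-force function, which of course spends exactly the work a reversible engine is supposed to save, so this only illustrates the binary search itself. The header prefix, target and function names are hypothetical toy values, not real Bitcoin data.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def meets_target(prefix: bytes, nonce: int, target: int) -> bool:
    """Hash prefix + 4-byte nonce; compare the digest, read little-endian, with the target."""
    digest = sha256d(prefix + nonce.to_bytes(4, "little"))
    return int.from_bytes(digest, "little") <= target


def range_has_solution(prefix: bytes, lo: int, hi: int, target: int) -> bool:
    """Stand-in for the one-bit engine: is there a valid nonce in [lo, hi)?"""
    return any(meets_target(prefix, n, target) for n in range(lo, hi))


def find_nonce(prefix: bytes, lo: int, hi: int, target: int):
    """Binary search over [lo, hi) using only one-bit range queries."""
    if not range_has_solution(prefix, lo, hi, target):
        return None
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if range_has_solution(prefix, lo, mid, target):
            hi = mid  # a solution lies in the left half
        else:
            lo = mid  # otherwise it must lie in the right half
    return lo


TOY_PREFIX = bytes(76)        # hypothetical all-zero 76-byte header prefix
TOY_TARGET = (1 << 248) - 1   # very easy toy target: roughly 1 nonce in 256 qualifies
found = find_nonce(TOY_PREFIX, 0, 4096, TOY_TARGET)
print(found)
```

For a range of N nonces, the search needs about log2(N) one-bit answers after the initial query.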

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: solex on April 20, 2015, 07:16:33 PM

Quote from: Meni Rosenfeld on April 19, 2015, 11:27:52 AM

Indeed. It seems obvious that reversible computing is the future for Bitcoin mining ASICs, and could be the first major commercial application of the technology. I have been looking forward to seeing informed comment on it.

When I suggested this on reddit the idea got shot down.
http://www.reddit.com/r/Bitcoin/comments/22tegd/do_we_have_an_idea_of_the_power_in_watt_necessary/cgq6ydq

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Geremia on April 20, 2015, 10:50:42 PM

Quote from: solex on April 20, 2015, 07:16:33 PM

It seems obvious that reversible computing is the future for Bitcoin mining ASICs, and could be the first major commercial application of the technology.

From the Reddit page you linked: "No. Hashing by definition is irreversible and results in loss of information and therefore increased entropy."

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Peter R on April 20, 2015, 11:05:11 PM

Quote from: Geremia on April 20, 2015, 10:50:42 PM

Quote from: solex on April 20, 2015, 07:16:33 PM

It seems obvious that reversible computing is the future for Bitcoin mining ASICs, and could be the first major commercial application of the technology.

From the Reddit page you linked: "No. Hashing by definition is irreversible and results in loss of information and therefore increased entropy."

It's irreversible because the normal circuit for SHA256d discards information (and thus requires an energy input to move in the forward direction). For example, we hash the 640 bit blockheader to get a 256 bit digest; there's no way to work backwards with only 256 bits to reconstruct the 640 bit input (the output contains less information than the input). But I don't see why there wouldn't be another circuit that performs the hash function in a reversible way, by tracking all the information that is normally discarded. This circuit would output the hash value PLUS enough of other information that one could work backwards to reconstruct the inputs.
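
The thread does not name a specific reversible gate. A standard textbook example is the Toffoli gate, which computes an AND while keeping its inputs, so nothing is discarded and the inputs can always be recovered. A minimal Python sketch (names are illustrative):

```python
from itertools import product


def toffoli(a: int, b: int, c: int) -> tuple:
    """Controlled-controlled-NOT: passes a and b through, flips c when a and b are both 1."""
    return a, b, c ^ (a & b)


def reversible_and(a: int, b: int) -> tuple:
    """AND computed reversibly: the extra outputs (a, b) are the 'tracked' information."""
    return toffoli(a, b, 0)


all_inputs = list(product((0, 1), repeat=3))
all_outputs = [toffoli(*bits) for bits in all_inputs]
print(sorted(all_outputs) == all_inputs)  # the gate is a permutation of the 8 states
```

Applying the gate a second time restores the original inputs, which is what makes it reversible. An ordinary AND gate maps three different input pairs to the output 0 and so cannot be undone.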

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: solex on April 20, 2015, 11:08:07 PM

Quote from: Peter R on April 20, 2015, 11:05:11 PM

Quote from: Geremia on April 20, 2015, 10:50:42 PM

Quote from: solex on April 20, 2015, 07:16:33 PM

It seems obvious that reversible computing is the future for Bitcoin mining ASICs, and could be the first major commercial application of the technology.

From the Reddit page you linked: "No. Hashing by definition is irreversible and results in loss of information and therefore increased entropy."

Yes, that's what I was thinking. The final hash is irreversible, but not if the results of each step are known. The efficiency is gained by recycling logic gate inputs as they occur. Certainly a challenging design problem though.

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Cryddit on April 20, 2015, 11:27:44 PM

As I understand it, (and I could be wrong here) what is actually absolutely required to spend energy for, is the output. IOW, you could at least in theory design a system that answers the one-bit question, "is there a nonce meeting the difficulty target within <some range of nonces>" by actually spending the energy to write exactly one bit. Everything else can be reversible, so the greater the amount of computation you can do without any external effects required the more of it can be done "free" (albeit at ridiculously high complexity) but no matter what, you have to write the output.

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: DeathAndTaxes on April 21, 2015, 01:48:05 AM

Quote from: solex on April 20, 2015, 07:16:33 PM

Maybe in 50-80 years. Nobody has successfully implemented a 32 bit adder using reversible computing. If you think quantum computing is in its infancy well reversible computing hasn't even been born yet. Despite the theory being published in 1973 to date there has been pretty much no practical demonstration of implementing the most trivial of reversible circuits.

Part of the problem is that the circuit must be very insulated from the outside environment in order to remain reversible. To date this means a lot of very expensive near zero superconductors, but even esoteric problems like a stray cosmic ray striking your circuit can lead to large-scale leakage. To say the future of mining obviously involves reversible computing is sort of like suggesting Honda should stop researching hybrids and start researching hyperdrives because obviously on a long enough timeline some form of faster than light travel is an obvious requirement for a spacefaring civilization. :) There is a lot of potential improvement in classical computing. We are something like a million times less efficient than the thermodynamic limit.

Also, as Peter points out, there is still an energy-time tradeoff. If I gave you today a reversible miner which ran on near zero energy but had a hardware cost of $10,000 per GH, would you be interested? If the technology is more expensive on an amortized per-hash total lifecycle cost than classical computing, well, it doesn't really matter how little energy it uses.

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: DeathAndTaxes on April 21, 2015, 01:56:46 AM

Quote from: Cryddit on April 20, 2015, 11:27:44 PM

In theory yes but even proponents of reversible computing don't believe leakage will be that low. There is the energy cost required to perfectly isolate the circuit from the outside environment so even if your raw circuit was perfectly reversible the total system energy cost will be much higher.

Also theory is just theory. In theory it is possible for someone to make a miner with 5,000,000 G/J (instead of 1 G/J) using plain boring classical computing. Granted you aren't going to do it with 20nm silicon but in theory it can be done. Now 5,000,0000 PH/s well we know that is not possible without a massive reduction in the "work" needed to complete a single hash. "In theory" is a nice way of saying nobody has proved it impossible. :)

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: 2112 on April 22, 2015, 04:01:01 AM

Quote from: Sergio_Demian_Lerner on April 20, 2015, 04:03:59 AM

It is hard to guess what was the original intention of the question: mathematical/algebraic minimum or some sort of minimum for a defined implementation technology.

The typical goal for an implementation using silicon CMOS process would be timing closure, i.e. minimizing the time to compute the result. This is the reason why everyone is using carry-look-ahead adders that add 32 bits in parallel in a single clock cycle.

The other approach would be to use serial adders that add 32 bits in 32 clock cycles. Such an adder is just a pair of XOR gates. The problem with this approach is that the resultant clock rate is too high for the practical silicon-based semiconductor manufacturing processes. But if somebody considers e.g. GaAs manufacturing then the very deeply pipelined serial implementation is a viable alternative.
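
A bit-serial adder can be simulated in a few lines of Python. This is a sketch with illustrative names: one bit position is processed per "clock cycle", least significant bit first. The sum bit comes from the pair of XOR gates mentioned above. The simulation also includes the carry logic and the one-bit carry register that a full adder needs in addition to those two XORs.

```python
def serial_add(a: int, b: int, width: int = 32) -> int:
    """Add two words mod 2**width, one bit per simulated clock cycle."""
    carry = 0   # models the one-bit carry register
    result = 0
    for cycle in range(width):
        bit_a = (a >> cycle) & 1
        bit_b = (b >> cycle) & 1
        half = bit_a ^ bit_b                        # first XOR
        sum_bit = half ^ carry                      # second XOR
        carry = (bit_a & bit_b) | (carry & half)    # carry into the next cycle
        result |= sum_bit << cycle
    return result


print(hex(serial_add(0xFFFFFFFF, 0x00000002)))
```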

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: Geremia on April 22, 2015, 05:43:40 AM

Quote from: 2112 on April 22, 2015, 04:01:01 AM

It is hard to guess what was the original intention of the question: mathematical/algebraic minimum or some sort of minimum for a defined implementation technology.

either

Quote from: 2112 on April 22, 2015, 04:01:01 AM

The typical goal for an implementation using silicon CMOS process would be timing closure, i.e. minimizing the time to compute the result. This is the reason why everyone is using carry-look-ahead adders that add 32 bits in parallel in a single clock cycle.

The other approach would be to use serial adders that add 32 bits in 32 clock cycles. Such an adder is just a pair of XOR gates. The problem with this approach is that the resultant clock rate is too high for the practical silicon-based semiconductor manufacturing processes. But if somebody considers e.g. GaAs manufacturing then the very deeply pipelined serial implementation is a viable alternative.

interesting
thanks

Title: SHA-256 by PEN & PENCIL!
Post by: Geremia on April 27, 2015, 08:33:43 PM

Quote from: btchris on April 19, 2015, 11:45:24 PM

that's a great diagram

These blog posts were very helpful for me:
http://www.righto.com/2014/09/mining-bitcoin-with-pencil-and-paper.html
http://www.righto.com/2014/02/bitcoin-mining-hard-way-algorithms.html

Very impressive that he works through SHA-256 with pencil and paper!

Title: Re: Theoretical minimum # of logic operations to perform double iterated SHA256?
Post by: smolen on April 29, 2015, 08:00:41 PM

Quote from: Cryddit on April 19, 2015, 04:04:13 AM

SHA256D is an interesting choice, actually; usually you don't see it except in a context where someone is worried about an extension attack - which doesn't really apply to the way Bitcoin uses it.

Bitcoin mining would be vulnerable to "constant elimination attack" :) were plain SHA256 used instead of double one.

EDIT:
Geremia, take a look at Bitfury trick (https://bitcointalk.org/index.php?topic=183368.msg2266329#msg2266329)
This line saves some memory at the expense of extra computation:

Code:

ds <= er - cr;