June 08, 2013, 02:46:44 PM Last edit: June 08, 2013, 03:40:38 PM by J35st3r |
|
I'll take a punt at this (just to see if I can explain it, experts please step in to correct me)...
Your 32 bit chunk is incorrect (sorry). The algorithm works on 64 byte message blocks, and the final hash is 32 bytes.
The block header is 80 bytes of data. This is padded out to the next 64 byte boundary (hence 128 bytes total), plus a little housekeeping: a trailing 1 bit, then zeros, then the message length as a bit count, which is a constant, viz. 640 for the 80 bytes.
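To make that concrete, here's a little Python sketch of the padding (the all-zero header is just a stand-in for real block data):

[code]
import struct

header = bytes(80)                       # stand-in for a real 80-byte block header
bit_len = len(header) * 8                # 640

# SHA-256 padding: a single 1 bit (the 0x80 byte), zeros up to 8 bytes short of
# a 64-byte boundary, then the message length in bits as a 64-bit big-endian value
padded = header + b'\x80' + b'\x00' * 39 + struct.pack('>Q', bit_len)

assert len(padded) == 128                # i.e. two 64-byte SHA-256 blocks
[/code]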
The first 64 bytes of this 128 byte "message" are pre-hashed by the getwork server (bitcoind or the pool server), giving a 32 byte intermediate SHA256 state value, which is supplied as "midstate".
The job of the fpga is to take this midstate and use it to continue the hash over the second 64 bytes of that original 128 byte message. However most of this data is constant (the 1 bit followed by a bunch of zeros, then the length constant), plus the 32 bit nonce, which is generated onboard the fpga. This constant data is represented by that 384 bit value (it looks weird due to big vs little endian representation).
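Byte for byte, the second block the fpga hashes looks like this (Python sketch; the zeroed header tail is a placeholder, and the same constant looks "weird" in the Verilog source because of the endian swapping mentioned above):

[code]
import struct

header_tail = bytes(12)                  # placeholder for header bytes 64..75 (end of merkle root, time, bits)
nonce = 0                                # the 32-bit counter the fpga sweeps

# everything after the nonce is fixed: the 0x80 pad byte, 39 zero bytes and the
# 64-bit length field (640) -- 48 bytes, i.e. that 384 bit constant
PAD_CONST = b'\x80' + b'\x00' * 39 + struct.pack('>Q', 640)
assert len(PAD_CONST) * 8 == 384

second_block = header_tail + struct.pack('<I', nonce) + PAD_CONST
assert len(second_block) == 64           # exactly one SHA-256 message block
[/code]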
Anyway, the pair of sha256_transform modules just calculate the SHA256 double hash (hence a pair of them). The top 32 bits of the result are then compared against 32'h00000000 (or an equivalent constant if shortcuts are applied). This represents a diff 1 share, and if one is found it is reported back to the pool using the golden_nonce output.
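In software terms the whole per-nonce test boils down to something like this (hashlib sketch; a real miner would of course start from the midstate instead of rehashing the first block every time, and the all-zero header here is just a placeholder):

[code]
import hashlib, struct

def check_nonce(header76, nonce):
    """header76 = the first 76 header bytes (everything except the nonce)."""
    header = header76 + struct.pack('<I', nonce)          # nonce is little-endian in the header
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    # diff 1 share: the top 32 bits of the hash (the last 4 digest bytes, since
    # bitcoin treats the hash as a little-endian number) must all be zero
    return digest[28:] == b'\x00' * 4

header76 = bytes(76)                                      # placeholder header data
for nonce in range(100000):                               # tiny slice of the 2**32 space the fpga sweeps
    if check_nonce(header76, nonce):
        print("golden_nonce = %08x" % nonce)
        break
[/code]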
Perhaps the key to your question is that bitcoin uses a double SHA256 hash, so we need a double hash in the fpga. It's just some handy math that allows us to precalculate the midstate in the getwork server, saving one of the three compression steps per nonce (remember it's a 128 byte message, so the first hash needs two compressions and the second hash over the 32 byte result needs one more; with the midstate precalculated, the fpga only has to do two).
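The block counting works out like this (quick Python sketch of the standard SHA-256 padding rule):

[code]
def sha256_blocks(msg_len):
    # message + 0x80 byte + 8-byte length field, zero-padded up to a 64-byte multiple
    return (msg_len + 1 + 8 + 63) // 64

print(sha256_blocks(80))   # 2 -> two compressions for the first hash of the header
print(sha256_blocks(32))   # 1 -> one compression for the second hash of the digest
# three compressions per nonce in total; the midstate covers the first one,
# leaving the two that the pair of sha256_transform modules perform
[/code]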
Hope that makes some sense.
[Edit] Another possible confusion is that the "data" supplied to the fpga (the second 64 bytes) is mostly discarded, viz. only data_buf[95:0] is used. This is because the rest is already known (it's that 384 bit constant), so we can save a considerable amount of logic by using the constant rather than having to process a variable value from data[255:96].
Also, supplying the full 256 bits is a rather inefficient use of the comms channel (the compiler will optimise away some or all of the unused logic, so perhaps no effect on the LE utilisation, but I haven't checked this in detail), so some of the implementations just supply 96 bits via the comms channel rather than the full 512 bits for data.
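So the minimal work unit the host really has to ship to the fpga is just the midstate plus those 96 bits, something like this (sketch only; the exact field order and framing differ between the various serial protocols):

[code]
midstate  = bytes(32)      # placeholder: SHA-256 state after the first 64 header bytes
data_tail = bytes(12)      # placeholder: header bytes 64..75, i.e. data_buf[95:0]

work = midstate + data_tail
assert len(work) == 44     # 44 bytes per work unit, versus 32 + 64 = 96 bytes if
                           # the whole second block were sent across as well
[/code]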