Bitcoin Forum
April 16, 2024, 11:02:09 AM *
News: Latest Bitcoin Core release: 26.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1]
  Print  
Author Topic: sha256_top stop condition  (Read 4283 times)
phr33 (OP)
Full Member
***
Offline Offline

Activity: 226
Merit: 100


View Profile
April 05, 2012, 10:10:54 PM
 #1

I'm playing around with the open source verilog implementations of bitcoin miners. I see that Icarus reuse quite a bit of code from ztex. The sha256 modules in particular.

Since I don't have any FPGA big enough to hold that fully unrolled core, I though I replace it with a smaller one that requires more cycles. Anyway, I had a deeper look at the code at hand to see how it was built. I suspected that there would be at least some 'shortcuts' compared to the basic pseudo code I've seen on the block hashing algo.

"Double Sha256" is used, i.e. sha(sha(x)). And the basic idea is to vary a small postion of x (the nonce) until we find a final hash that is smaller than a certain level (i.e. there's a certain number of leading zeros).

So, I looked for some kind of stopping condition that resembles that, but found nothing. Instead the golder nonce is considered found when a portion of the second hash equals 0xa41f32e7. I smell shortcut, but can't get my head around how it works. Also the second sha-block only uses 61 rounds rather than the full 64.

Ok, so seeing that some 32-bit words are just shifted during the final rounds I can see how it might be possible to determine a stop condition before the entire hash is computed. But it's not obvious. At least not for me, without pen and paper  Cheesy

I tried to search for discussions on kernel optimizations etc, but couldn't find anything amongst the flood on discussions how to maximize you MHash/s using tolls someone else developed. I'm more interested in whats under the hood than earning a buck or two.

Anyone here how has got their hands dirty?

Perhaps some pointers on other threads / webpages where block hashing optimizations are descussed?

Thanks!

My BTC input: 1GAtPwoTGPQ35y9QugJueum5GzaEzLYjiQ
My GPG ID: B0CCFD4A
Transactions must be included in a block to be properly completed. When you send a transaction, it is broadcast to miners. Miners can then optionally include it in their next blocks. Miners will be more inclined to include your transaction if it has a higher transaction fee.
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
1713265329
Hero Member
*
Offline Offline

Posts: 1713265329

View Profile Personal Message (Offline)

Ignore
1713265329
Reply with quote  #2

1713265329
Report to moderator
1713265329
Hero Member
*
Offline Offline

Posts: 1713265329

View Profile Personal Message (Offline)

Ignore
1713265329
Reply with quote  #2

1713265329
Report to moderator
fpgaminer
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
April 05, 2012, 10:50:15 PM
Last edit: April 05, 2012, 11:04:44 PM by fpgaminer
 #2

Hi,

I maintain the Open Source FPGA Bitcoin Miner, so I may be able to help. Feel free to ask questions and I'll keep an eye on this thread.

Quote
I can see how it might be possible to determine a stop condition before the entire hash is computed. But it's not obvious.
There are no "stop conditions" with SHA-256 (1). You have to compute the whole thing, otherwise the hash is incomplete. As you have found, however, there is a small shortcut because of the way mining works.

Quote
And the basic idea is to vary a small postion of x (the nonce) until we find a final hash that is smaller than a certain level (i.e. there's a certain number of leading zeros).
For most miners, all you care about is finding what is called a difficulty 1 share (2). That means there are 32 bits of leading zeros. So, for optimized mining code we only care that the last 32-bits of the final hash are 0. As you saw, those last 32-bits are actually calculated in round 61, and get shifted 3 times in rounds 62, 63, and 64. We don't care about anything calculated in those last three rounds. Hence, the optimization is to skip them, and check the 5th 32-bit word of the hash produced after the 61st round.

You can see that in the code here. For the second run of SHA-256 it only uses 61 rounds, and reads the 5th word from the output (the code uses IDX(4), because the index starts from 0).

Quote
Instead the golder nonce is considered found when a portion of the second hash equals 0xa41f32e7.
There is another shortcut here. In SHA-256, after all 64 rounds have been computed, one last operation is performed. The individual 32-bit words of the input state are added to the individual 32-bits words of the last round's output. You can see that in The SHA2 wikipedia article, where it says "Add this chunk's hash to result so far."

Now, we're looking to see if the last word is equal to zero:

Code:
out[`IDX(7)] + input_state[`IDX(7)] == 0x00000000 ???

The input state to the second run of SHA-256 is a known value. The 8th word of which is 0x5be0cd19, so ...

Code:
out[`IDX(7)] + 0x5be0cd19 == 0x00000000 ???

Which is equivalent to:

Code:
out[`IDX(7)] == 0x00000000 - 0x5be0cd19 ???

Which is equivalent to:

Code:
out[`IDX(7)] == 0xa41f32e7 ???

I hope that explains those two optimizations. Again, feel free to ask questions if you're confused, or if you have other questions. I more or less wrote this quickly, so may have skipped some details.


NOTES
(1) I'm not sure what connotation you are giving to "stop conditions," so I will explain a little further to help clear any confusion. When mining, you are given a set of data to perform hashes on. As you noted, this is done by manipulating the nonce and hashing each time the nonce changes. What you really want to do is check all possible nonces. Even if you've already found a "golden nonce" (one which gives you a hash starting with 32 zeros), you need to keep searching for more. There could be anywhere between 0 and 2^32 solutions to a given block of work, so it is in your best interest to keep looking for more. Hence, there are no stop conditions in the sense of when to stop running your algorithm, other than having exhausted all possible nonces (at which point, you would get more work).

(2) This is because most miners are connected to a pool server, which accepts any hash meeting at least Difficulty 1 (32 leading zeros). It should be noted, however, that software designed around this optimization will still work for solo mining directly against bitcoind, because bitcoind will merely reject any share that doesn't meet the block difficulty (and every Difficulty 1 share has a possibility of doing so). Also, most FPGA or GPU miners have some high-level controlling software (written in Python or C) which can perform the real difficulty check if necessary. Having the hardware check for shares is really just a way of reducing bandwidth from the hardware to the controlling software.

fpgaminer
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
April 05, 2012, 11:14:40 PM
 #3

Quote
Since I don't have any FPGA big enough to hold that fully unrolled core, I though I replace it with a smaller one that requires more cycles.
Just in case you didn't know, designs like mine can be folded up to fit into smaller devices like the DE0-Nano:

https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/blob/master/src/fpgaminer_top.v
https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/blob/master/src/sha256_transform.v
https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/tree/master/projects/DE2_115_Unoptimized_Pipelined
https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/tree/master/projects/DE2_115_makomk_mod

phr33 (OP)
Full Member
***
Offline Offline

Activity: 226
Merit: 100


View Profile
April 06, 2012, 07:23:18 AM
 #4

OMG, I had missed your project completely. I got so excited when I found the icarus and ZTEX sources. No offence to them, but I had to spend some time cleanup their code for my eyes not to hurt. This. This is something else! Nice work!  Smiley

Spot on answers to my questions as well.

So one of my mistakes what that I forgot about pooled mining. Obviously the pools chop up the task. I kept thinking that the FPGA should solve the "full problem" rather than just "a part of it". So the FPGA (and opencl kernels) solve "Difficulty 1 blocks". Each such solution also has a chance of solving the full problem. The pool will reward you depending on how many Difficulty 1 solutions you submitted.
And as you say, there can be multiple solutions (nonce's) for each work block so you keep on working. Hence no "stop condition". If you were a solo-miner, solving the real problem, it would indeed be a stop-condition.
Thanks for reminding me on pooled mining!

And the final sha operation. My head was still inside the rounds,  so I missed it. No need to compute it. Nice.

Agh. So you already have a parameterized core that I can fold to my little DE1 board. Damn you, you leave no work for me!  Cheesy

Anyway, I will try out your code. I bet there will be a question or two. But I hope I can reach a point where I might actually contribute some. A project setup for the DE1 board if nothing else.

Cheers!

My BTC input: 1GAtPwoTGPQ35y9QugJueum5GzaEzLYjiQ
My GPG ID: B0CCFD4A
fpgaminer
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
April 06, 2012, 08:29:37 AM
 #5

Quote
This. This is something else! Nice work!
Thank you for the kind words. I am glad my explanation, and the project on github, was able to help you Smiley

Quote
Hence no "stop condition". If you were a solo-miner, solving the real problem, it would indeed be a stop-condition.
Ah, now I see what you meant by stop condition. Yes indeed, if a full difficulty hash is found, that would be a stop condition because the block would now change completely.

For pooled mining, you could also consider exhausting all possible nonces as a stop condition, since the miner must now ask (or should have asked preemptively) for new work. None except my latest firmware obeys this stop condition though, merely for simplicity's sake. Here is where the latest code obeys the stop condition: linky. That helps to save power when the FPGA gets starved of work.

Quote
Anyway, I will try out your code. I bet there will be a question or two.
Feel free to ask away! I'll try to keep an eye on this thread, but you can also contact me through PM, or github, or IRC or whatever Smiley I appreciate the questions, as they lend themselves to helping me (eventually) write documentation on all these gritty details. I'm beginning to make strides towards that end here on github, but ... it's a bit disheveled at the moment Tongue

Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!