Bitcoin Forum
Author Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013)  (Read 432941 times)
teknohog
June 05, 2011, 06:31:56 PM
 #141

That reminds me - have you managed to synthesize your code for a Spartan 6? I tried it, but it bailed out early on with a cryptic message about synthesis failing and no other information I could find. Rumour has it the Spartan 6 support may be more temperamental than for earlier generations. (Not that I have an FPGA to run this on anyway!)

No, I haven't tried it as I only have a Spartan 3E 500K. I have only been looking at the specs of Spartan 6 and others, so as to find the biggest number of logic units. Thanks for the idea though, testing the synthesis in advance would help us choose the best chip.

On another note, I have been estimating how my miner performs, based on how often a solution is found. I'm getting something like 3 to 4 Mhash/s at 100 MHz, which is much better than expected, but I may just be lucky. The mining script is updated to show these estimates, though with some more work you could get actual rates from the chip.
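For anyone wanting to reproduce that kind of estimate, here is a rough sketch. It assumes every reported solution is a difficulty-1 share (a hash whose top 32 bits are zero), so one turns up on average once per 2^32 hashes. The function name and the sample numbers are made up for illustration; they are not taken from the mining script.

```python
def estimate_hashrate(solutions, elapsed_seconds):
    """Estimated hashes per second from a count of difficulty-1 solutions."""
    # Assumption: a solution means the top 32 bits of the hash are zero,
    # which happens on average once every 2**32 attempts.
    expected_hashes_per_solution = 2 ** 32
    return solutions * expected_hashes_per_solution / float(elapsed_seconds)

# Example: 3 solutions seen in one hour
rate = estimate_hashrate(3, 3600)
print("%.2f Mhash/s" % (rate / 1e6))
```

With only a handful of solutions the estimate is very noisy, which fits the "may just be lucky" caveat.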

world famous math art | masternodes are bad, mmmkay?
Every sha(sha(sha(sha()))), every ho-o-o-old, still shines
makomk
June 05, 2011, 08:49:07 PM
 #142

I haven't managed to synthesize anything that performs decently on a Spartan 6 (it complains about a congested design that can't be routed), but ArtForz claims to have one of these running at 190MH/s.

Hmmmm. Supposedly the Spartan 6 lacks the long-distance routing fabric of the Virtex 6 chips, and both have a much higher ratio of logic to routing infrastructure than older generations. Of course, this isn't officially documented anywhere that I can find... FPGA manufacturers are annoyingly secretive.

Quad XC6SLX150 Board: 860 MHash/s or so.
SIGS ABOUT BUTTERFLY LABS ARE PAID ADS
Fhtagn
June 05, 2011, 09:29:51 PM
 #143

I've been looking for a good excuse to dust off my Verilog books and old Digilent Spartan 2e 200K board, maybe I can fit a serialized version on it.
I'd expect <1MH/s from that FPGA.

Thanks for the estimate; I'm glad to find a project with knowledgeable people involved.

At this point, for me, it's not about hashing speed. It's about gaining more FPGA/HDL experience. I'm already mining with a decent amount of dedicated GPUs.
I've always loved hardware design, but haven't had much time for it since college.

I'd eventually move up to more powerful devices.

I think that this project is an important one for Bitcoin. FPGAs and ASICs will provide a much more power efficient mining infrastructure. Being open sourced will, hopefully, put device manufacture ability into more hands. This bodes well for network security.

If this project scales to multi-chip designs and board runs, I'll do what I can to help in prototyping/testing.

Ladyada has put together a list of some board makers: http://www.ladyada.net/library/pcb/manufacturers.html
bitcent
June 06, 2011, 08:24:30 AM
 #144

Well, this is a bit earlier than I had wanted, but I will tweak and improve this as we go along.
...
Please feel free to give me feedback, suggestions, critiques, and of course to submit Pull requests.
...
June 2nd, 2011 - Flexible Unrolling Added
Thanks to the patch submitted by Udif, the code now supports a configurable amount of loop unrolling. The original design was fully unrolled, with 128 total round modules. By adjusting the CONFIG_LOOP_LOG2 Verilog define, you can choose to unroll to 64 round modules, 32, 16, 8, or 4. This makes the design smaller, at the equivalent cost of speed, which should allow it to run on many more FPGAs.
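As a back-of-the-envelope model of that size/speed trade-off (an assumption, not something read out of the Verilog): if CONFIG_LOOP_LOG2 = n leaves 128 / 2^n round modules and each module is reused 2^n times per hash, the design produces one hash every 2^n clocks. The 100 MHz clock below is just an example value.

```python
def unrolled_design(loop_log2, clock_hz):
    """Round modules and hash rate for a given CONFIG_LOOP_LOG2 (assumed model)."""
    round_modules = 128 >> loop_log2      # 128, 64, 32, 16, 8, 4 for n = 0..5
    clocks_per_hash = 1 << loop_log2      # assumption: each module is reused 2**n times
    return round_modules, clock_hz / float(clocks_per_hash)

for n in range(6):
    modules, rate = unrolled_design(n, 100e6)  # 100 MHz is an arbitrary example clock
    print("%d  %3d modules  %.2f MH/s" % (n, modules, rate / 1e6))
```

Real designs will land below these numbers once routing limits the achievable clock.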

FPGAminer - I've been following this thread for ~2 weeks now and looking at your TCL code for your miner (mine.tcl), and I am still trying to figure out *exactly* what goes into the FPGAs for hashing, and what comes out to be submitted.

It looks like the following takes place:

1) get_work() and send the following to the FPGA:
1 a) MIDSTATE - all 256 bits
1 b) DATA - *ONLY* 256bits [512-767] (DATA string characters 128-191)
1 c) HASH1, TARGET, and the remaining 75% of DATA are discarded. (?!?!)

2) wait up to 20 seconds for a result - [wait_for_golden_ticket 20]

3) upon finding a "golden ticket", submit_work to the bitcoin client containing:
3 a) original DATA string[0-151], plus
3 b) "golden ticket" nonce string replacing DATA characters[152-159], plus
3 c) original DATA string[160-255]
... in essence, the original data string with the 20th 32-bit all-zero data block replaced with the golden nonce.
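Step 3 can be sketched in a few lines. This assumes DATA is the 256-character hex string returned by getwork and that the nonce arrives as 8 hex characters already in the byte order the client expects; both helper names are hypothetical, and the all-zero DATA is a placeholder rather than a real response.

```python
def splice_nonce(data_hex, nonce_hex):
    """Return DATA with hex characters 152-159 replaced by the golden nonce."""
    if len(data_hex) != 256 or len(nonce_hex) != 8:
        raise ValueError("expected 256 hex chars of DATA and 8 of nonce")
    return data_hex[:152] + nonce_hex + data_hex[160:]

def words(data_hex):
    """Split DATA into its thirty-two 32-bit words (8 hex characters each)."""
    return [data_hex[i:i + 8] for i in range(0, len(data_hex), 8)]

# Placeholder DATA (all zeros), not a real getwork response
data = "0" * 256
solved = splice_nonce(data, "deadbeef")
print(words(solved)[19])   # the 20th word now holds the nonce
```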

Side note: it appears that the 18th 32-bit block is Unix seconds - since 01/01/1970 00:00:00.  Any other clues you can give about other fields?  :)  Maybe a link to the getwork() definition of returned data?

My questions are the following:
A) When does the FPGA "learn" of the target value to beat ... or does it ever? (Hardcoded?)
B) SHA256 requires 512bit chunks of data to hash over.  Is MIDSTATE really *right-in-the-middle* of a 64-round hash as opposed to just between 512bit chunks?
C) Exactly what gets hashed?  It looks like  the SHA256 engine is "primed" with MIDSTATE, and only gets 256bits of DATA to iterate with (ignoring the other 768bits of DATA).
D) If you only submit MIDSTATE and 256bits of DATA, how do we arrive at 128 round engines in the FPGA?

Any insight would be appreciated.  Especially if an explanation points to a more in-depth description of the algorithm.  (I've read every post here for ~2 weeks.)  I've also refreshed my memory at http://en.wikipedia.org/wiki/SHA-2

BTW - Thanks for all of your work!  GREAT JOB!
lebish
June 06, 2011, 08:41:51 AM
 #145

The insides of my heart are le melting. Epix!
kokjo
June 06, 2011, 08:44:40 AM
 #146

A) It does not learn anything; it just checks h[7] == 0, which is like saying the first 32 bits of the hash are zero.
B) The midstate is between 2 chunks. A block header is 80 bytes long, and 512 bits is 64 bytes. According to https://en.bitcoin.it/wiki/Protocol_specification#Block_Headers DATA[512:640] is the last 4 bytes of the Merkle root, the timestamp, "bits", and the nonce. That means the remaining data is only 4*4 = 16 bytes long; it is then padded until it fills a whole 64-byte (512-bit) chunk.
C) The midstate is the state of the sha256 engine after the first block of 512 bits; after that you hash the rest, up to 80 bytes (640 bits).
D) There are 2 passes of sha256, and one sha256 pass is 64 rounds. 2*64 = 128 rounds.
The way to calculate the hash of a block is: sha256(sha256(blockdata))

Hope that explains it. Feel free to drop some coins/cents at: 1EJXbMi5CjHeNmUQNpZgg72HWzEMX8tVja

"The whole problem with the world is that fools and fanatics are always so certain of themselves and wiser people so full of doubts." -Bertrand Russell
gentakin
June 06, 2011, 08:47:49 AM
 #147

I can only answer about the getwork semantics:

* MIDSTATE is the sha256 hash after hashing the first 512-bit chunk of DATA, that is: the first half of DATA. So it is between SHA256 chunks, not in the middle of a sha256 round. The nonce is stored in the second half of the header, so the first half is constant and doesn't need to be hashed all over again.
* DATA is the block header for which a hash must be found. It does contain the unix timestamp. It also contains the current target value, so that's probably where the FPGA learns it (or it doesn't care at all and this is checked on the tcl-side). The nonce is set to 0x00000000.
* HASH1 is always the same, afaik. It's supposed to be some state buffer... or not. Not sure. ;)

When submitting the block via getwork, the original DATA needs to be adjusted to contain the valid nonce instead of 0x00000000.


So what the FPGA probably does is:
* increment nonce for every loop and use it as hash input for the second chunk.
* take midstate as the result of the first sha-256 chunk, then apply the second sha256 round.
* as bitcoin applies sha256 twice on the block header, hash the resulting 256bit string again, taking another sha256 round.
* if the resulting hash is "valid" (h==0), store it for the TCL script.

You might be interested in https://en.bitcoin.it/wiki/Block_hashing_algorithm .
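To make the double hash and the h[7] check concrete, here is a small software sketch (Python 3, standard library only). It rebuilds the 80-byte header of the genesis block from its publicly known fields, hashes it twice, and tests whether the last 32-bit word of the final digest is zero. That word is what Bitcoin displays as the leading zeros, because the hash is printed byte-reversed. The helper name is made up, and this is not how the FPGA code is organised.

```python
import hashlib
import struct

def block_hash(header):
    """sha256(sha256(header)) for an 80-byte block header."""
    assert len(header) == 80
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()

# Genesis block header fields (public chain data), all little-endian on the wire
merkle_root = bytes.fromhex(
    "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b")[::-1]
header = struct.pack(
    "<I32s32sIII",
    1,               # version
    b"\x00" * 32,    # previous block hash (none for the genesis block)
    merkle_root,     # Merkle root
    1231006505,      # Unix timestamp
    0x1d00ffff,      # "bits" (compact target)
    2083236893,      # nonce
)

digest = block_hash(header)
h7_is_zero = digest[28:32] == b"\x00" * 4   # last state word of the outer hash
print(digest[::-1].hex(), h7_is_zero)
```

A hash that passes this test is only a candidate; the full comparison against the target still has to happen somewhere, for instance on the host side.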

edit: I'm too late, oh well. :D

1HNjbHnpu7S3UUNMF6J9yWTD597LgtUCxb
TheSeven
June 06, 2011, 09:01:00 AM
Last edit: June 06, 2011, 03:29:15 PM by TheSeven
 #148

D) there is 2 rounds of sha256, and one sha256 round is 64 rounds. 2*64=128rounds.
the way to calculate the hash of a block is: sha256(sha256(blockdata))
To be exact, a constant is appended to the inner sha256 hash to pad it to the 512 data bits needed for the outer hash.

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
kokjo
June 06, 2011, 09:09:56 AM
 #149

D) there is 2 rounds of sha256, and one sha256 round is 64 rounds. 2*64=128rounds.
the way to calculate the hash of a block is: sha256(sha256(blockdata))
To be exact, a constant is prepended to the inner sha256 hash to pad it to the 512 data bytes needed for the outer hash.
Yes: first a 1 bit, then a lot of 0 bits, and then the size (64 bits), until we reach 512; I know that. But it does not matter much how it's done, just that it is done... it is the padding I talked about.
Btw, it's not 512 bytes, it's 512 bits. A byte is 8 bits.

bitcent
June 06, 2011, 11:35:29 AM
 #150

D) there is 2 rounds of sha256, and one sha256 round is 64 rounds. 2*64=128rounds.
the way to calculate the hash of a block is: sha256(sha256(blockdata))
To be exact, a constant is prepended to the inner sha256 hash to pad it to the 512 data bytes needed for the outer hash.
Yes: first a 1 bit, then a lot of 0 bits, and then the size (64 bits), until we reach 512; I know that. But it does not matter much how it's done, just that it is done... it is the padding I talked about.
Btw, it's not 512 bytes, it's 512 bits. A byte is 8 bits.

kokjo, gentakin & TheSeven - Thanks for your replies.  The explanations (and LINKS!) helped me quickly understand what is happening, and in what state the FPGAs are getting the work.  (I knew it was some "midstate" but not exactly where.)  I'll spell it out here so you can check my understanding, and so the next noob can get a head start.

1) The first 1/2 of DATA is *already* hashed, and sitting in MIDSTATE.  Gotcha.  (So toss/ignore the first 1/2 of DATA for hash searching purposes.)

2) The next four 32bit long-words are:
2 a) the last 32bits of the Merkle root
2 b) unix time in seconds
2 c) "bits", the current difficulty target (in a compact encoding)
2 d) nonce 0x00000000, which we iterate through 2^32 combinations looking for golden tickets

3) The remaining 384bits are the SHA256 spec for padding, which states:
Code:
Pre-processing:
append the bit '1' to the message
append k bits '0', where k is the minimum number >= 0 such that the resulting message
    length (in bits) is congruent to 448 (mod 512)
append length of message (before pre-processing), in bits, as 64-bit big-endian integer
( http://en.wikipedia.org/wiki/SHA-2 )
3 a) Padding starts with 0x00000080, which big-little endian converts to 0x80000000, the high-order bit is the '1' appended to the message.
3 b) followed by all zeros until the last 64bits
3 c) last 64bits specify message length 0x00000000 0x80020000 which big-little endian converts to 0x00000000 0x00000280.  0x280 = 640bits = 80bytes, and ALL block headers are 80 bytes.  Check ... it all makes sense now.  (THANKS!)
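The three points above can be checked in software. The following Python 3 sketch uses a hand-written SHA-256 compression function (needed because hashlib does not expose the internal state), computes the midstate from the first 64 header bytes, builds the padded second chunk exactly as described (16 bytes of header, 0x80, zeros, 64-bit length 0x280), and then runs the outer hash. It works on raw header bytes, so the per-word byte swapping of the getwork strings is left out. All function names are illustrative, and the constants are derived from the square and cube roots of the first primes as the SHA-2 spec defines them rather than typed in.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def _primes(n):
    found, c = [], 2
    while len(found) < n:
        if all(c % p for p in found):
            found.append(c)
        c += 1
    return found

def _iroot(x, k):
    """Floor of the k-th root of x, by binary search on integers."""
    lo, hi = 0, 1 << (x.bit_length() // k + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Fractional bits of sqrt / cube root of the first primes (SHA-256 constants)
H0 = [_iroot(p << 64, 2) & MASK for p in _primes(8)]
K = [_iroot(p << 96, 3) & MASK for p in _primes(64)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, chunk):
    """One 64-round SHA-256 pass over a 64-byte chunk."""
    w = list(struct.unpack(">16I", chunk))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def header_hash_via_midstate(header):
    assert len(header) == 80
    midstate = compress(H0, header[:64])            # constant while the nonce changes
    # 16 bytes of header + 0x80 + 39 zero bytes + length (640 bits) = 64 bytes
    tail = header[64:] + b"\x80" + b"\x00" * 39 + struct.pack(">Q", 640)
    inner = struct.pack(">8I", *compress(midstate, tail))
    # Outer hash: 32-byte digest + 0x80 + 23 zero bytes + length (256 bits)
    outer_chunk = inner + b"\x80" + b"\x00" * 23 + struct.pack(">Q", 256)
    return struct.pack(">8I", *compress(H0, outer_chunk))

header = bytes(range(80))                            # dummy header, not a real block
expected = hashlib.sha256(hashlib.sha256(header).digest()).digest()
print(header_hash_via_midstate(header) == expected)
```

Per nonce, only the last two compress calls have to be repeated, which is where the 2 * 64 = 128 round modules come from.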

While trying to read the protocols and make sense of FPGAminer's code, I wrote a quick perl script to print out repeated getwork() responses in nice columns for analysis.  If anyone wants, I can post it.  Besides, it's kind of mesmerizing to watch.  :)

Thanks again, guys!
MoonBuggy
June 06, 2011, 02:29:10 PM
 #151

There's a (currently slim) possibility that I could secure some time on a few decent sized FPGA systems (although the owners are understandably wary about who they allow to play with their rather expensive equipment), but right now I'm not sure how worthwhile it would be to pursue; what kind of performance would be expected from a Convey HC-1ex (four Virtex 6 LX760s) or possibly an Xtreme Data XD-PCIE3000 with three Stratix IVs per card?

Any ballpark figures would be greatly appreciated, and there's a strong possibility that I'd need someone who knows their stuff to assist me in exchange for a share of the profits if it turns out to be plausible!
tantive
June 06, 2011, 03:29:45 PM
 #152

I now have a bitfile for the Atlys board (Spartan 6 - LX45) with depth:=2 and 50 MHz.

The only problem is that miner.py refuses to communicate over the serial port.
It detects the core, but when it starts "Measuring FPGA performance..." it produces a timeout: "Timed out waiting for FPGA to accept work"

@TheSeven: any idea how to debug or solve the problem? Is the miner.py code working for all depths and frequencies?
TheSeven
June 06, 2011, 03:40:02 PM
 #153

There's a (currently slim) possibility that I could secure some time on a few decent sized FPGA systems (although the owners are understandably wary about who they allow to play with their rather expensive equipment), but right now I'm not sure how worthwhile it would be to pursue; what kind of performance would be expected from a Convey HC-1ex (four Virtex 6 LX760s) or possibly an Xtreme Data XD-PCIE3000 with three Stratix IVs per card?

Any ballpark figures would be greatly appreciated, and there's a strong possibility that I'd need someone who knows their stuff to assist me in exchange for a share of the profits if it turns out to be plausible!
Ballpark estimate for the Convey machine would be 1-2GH/s. I'll know more after I attempt to synthesize a design. Do you know the speed grade of the FPGAs?
No idea about the Altera ones, ask fpgaminer :)

TheSeven
June 06, 2011, 03:45:38 PM
 #154

I now have a bitfile for the Atlys board (Spartan 6 - LX45) with depth:=2 and 50 MHz.

The only problem is that miner.py refuses to communicate over the serial port.
It detects the core, but when it starts "Measuring FPGA performance..." it produces a timeout: "Timed out waiting for FPGA to accept work"

@TheSeven: any idea how to debug or solve the problem? Is the miner.py code working for all depths and frequencies?
You'll need to adjust the pin locations for clk_in, rx and tx in the UCF file, and adjust the serial port's clock divider for the 50 MHz clock.
Replace "10000010001" with "00110110010" and "11000011001" with "01010001011" in uart.vhd.
And I should probably publish the new version of my miner, it now supports multiple pools, long polling, etc. :)
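For other clock frequencies, the constants can apparently be recomputed. The sketch below rests on inference, not on anything stated in uart.vhd: the numbers fit a 115200 baud link, an original 120 MHz clock, a first constant equal to the clock cycles per serial bit, and a second equal to 1.5 bit periods (the usual delay from the start-bit edge to the middle of the first data bit). The function name is hypothetical.

```python
def uart_constants(clock_hz, baud=115200, width=11):
    """Bit period and 1.5-bit delay in clock cycles, as zero-padded binary strings."""
    bit_period = clock_hz // baud        # clock cycles per serial bit (assumed meaning)
    start_delay = bit_period * 3 // 2    # 1.5 bit periods (assumed meaning)
    fmt = "0%db" % width
    return format(bit_period, fmt), format(start_delay, fmt)

print(uart_constants(120000000))  # matches the two original constants (1041, 1561)
print(uart_constants(50000000))   # 50 MHz values (434 and 651)
```

If the baud rate or the constant width in your copy of uart.vhd differs, adjust the defaults accordingly.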

tantive
June 06, 2011, 03:52:09 PM
 #155

damn, i missed that one...
TheSeven
June 06, 2011, 04:13:31 PM
 #156

And I should probably publish the new version of my miner, it now supports multiple pools, long polling, etc. :)
Here's the current version of PyFPGAMiner, with a demo config file: http://dl.dropbox.com/u/23683845/pyfpgaminer-0.0.1.zip
There are a lot more configuration options available, you can either reconstruct those from the source code or just ask me :)

Oh, and don't forget to donate if you like it :)
163PG9aNBj4ZaFzAK2LsRLvRzttww7vu6u

tantive
June 06, 2011, 04:25:30 PM
 #157

great, will have a look once the bitfile finally runs...

Communication is working now, but "Measuring FPGA performance..." takes a lot longer than the specified 120 secs before it times out...

I calculated around 4200 secs if the hw achieves 1 MH/s ... can that be?
TheSeven
June 06, 2011, 04:38:23 PM
 #158

great, will have a look once the bitfile finally runs...

Communication is working now, but "Measuring FPGA performance..." takes a lot longer than the specified 120 secs before it times out...

I calculated around 4200 secs if the hw achieves 1 MH/s ... can that be?
Yes, in the old miner version. The new one will take about a minute and it will also check whether the FPGA is working correctly.
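The figure of roughly 4200 seconds is consistent with one complete pass over the 32-bit nonce space. A quick sanity check, assuming the old measurement waits for a full sweep (the thread does not spell this out):

```python
def nonce_sweep_seconds(hashes_per_second):
    """Seconds needed to try all 2**32 nonce values once."""
    return 2 ** 32 / float(hashes_per_second)

print(round(nonce_sweep_seconds(1e6)))   # about 4295 s at 1 MH/s
```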

tantive
June 06, 2011, 05:04:00 PM
 #159

ok, gave it a try, after measuring the fpga performance it crashes:

miner.py line 385
 delta = (endtime - starttime).total_seconds() - 0.0145
AttributeError: 'datetime.timedelta' object has no attribute 'total_seconds'
TheSeven
June 06, 2011, 05:35:04 PM
 #160

Which python version is that? It's running fine for me with 2.7, but it's probably not 3.0-ready yet.
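For reference, timedelta.total_seconds() only exists from Python 2.7 (and 3.2) onwards, so that AttributeError points to an older interpreter such as 2.6. If upgrading is not an option, a small helper gives the same number; the helper name and the commented patch line are suggestions, not part of miner.py.

```python
from datetime import timedelta

def total_seconds(delta):
    """Equivalent of timedelta.total_seconds() for interpreters that lack it."""
    return delta.days * 86400 + delta.seconds + delta.microseconds / 1e6

# Hypothetical replacement for the failing line:
#   delta = total_seconds(endtime - starttime) - 0.0145
print(total_seconds(timedelta(minutes=2, microseconds=250000)))
```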
