Bitcoin Forum
Author Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013)  (Read 402443 times)
Freakin
June 16, 2011, 04:33:07 PM
 #241

One obvious thing to bear in mind is that, at some point, the pools will inevitably all increase the difficulty of each share in order to reduce the work required to check all the submitted shares. So any ASIC-based mining system needs to be capable of checking hashes against difficulty levels above the minimum, either in the ASIC itself or in the processor controlling it. It looks like BTC Guild is actually making this change right now - there's a notice on their website saying they're doubling the difficulty per share. (This also requires changes to the software controlling FPGA miners.)

Edit: The other is that there's a good chance pools will eventually move to making miners compute midstates locally, again to reduce their resource usage. That's almost certainly best handled on a controller CPU rather than the main hash-computation hardware. What's more, since the rate of midstate computation is roughly proportional to the number of gigahashes/sec, mining ASICs would probably mean both of these happen sooner than they otherwise would.
Both of these can easily be handled on the controller CPU up to an FPGA speed of several gigahashes, so this is just a firmware matter and not relevant for the actual ASIC.
Oh, and yes, my miner ignores the requested difficulty. This just means that it keeps sending difficulty 1 shares, which means that it refuses to reduce the server load and ends up with roughly half of the shares being rejected, but will work perfectly fine apart from that. While it should of course be fixed, this issue isn't critical.

Would a more generalized SHA256 ASIC have dual-purpose for a security firm or similar, assuming a controller could handle enough of the logic for hashing?  Or is the whole point of an ASIC to make it as highly specialized as possible to get optimal efficiency?
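
The share-difficulty check discussed above can be sketched as follows. This is only an illustration, not code from any miner in this thread: it assumes the usual Bitcoin difficulty-1 target (0xFFFF followed by 208 zero bits), and the names `share_target`, `hash_meets_difficulty` and `header_meets_difficulty` are made up for the example.

```python
import hashlib

# Difficulty-1 target: 0x00000000FFFF0000...0000 (assumed, the usual Bitcoin convention).
DIFF1_TARGET = 0xFFFF << 208


def share_target(difficulty: int) -> int:
    """Largest hash value that still counts as a share at this difficulty."""
    return DIFF1_TARGET // difficulty


def hash_meets_difficulty(hash_value: int, difficulty: int) -> bool:
    """Compare a 256-bit hash, already converted to an integer, with the target."""
    return hash_value <= share_target(difficulty)


def header_meets_difficulty(header80: bytes, difficulty: int) -> bool:
    """Double-SHA256 an 80-byte block header and check it against the target."""
    digest = hashlib.sha256(hashlib.sha256(header80).digest()).digest()
    # Bitcoin compares the digest as a little-endian 256-bit integer.
    return hash_meets_difficulty(int.from_bytes(digest, "little"), difficulty)
```

The difficulty-2 target is half the difficulty-1 target, so a miner that keeps submitting difficulty-1 shares to a difficulty-2 pool should see roughly half of them rejected. A check this small is cheap enough to run on the controller CPU, since it only runs once per candidate share.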
kokjo
June 16, 2011, 04:42:03 PM
 #242

Quote
Would a more generalized SHA256 ASIC have dual-purpose for a security firm or similar, assuming a controller could handle enough of the logic for hashing?  Or is the whole point of an ASIC to make it as highly specialized as possible to get optimal efficiency?

To make it highly specialized. :D

"The whole problem with the world is that fools and fanatics are always so certain of themselves and wiser people so full of doubts." -Bertrand Russell
fpgaminer
June 16, 2011, 10:43:34 PM
 #243

Quote
Would a more generalized SHA256 ASIC have dual-purpose for a security firm or similar, assuming a controller could handle enough of the logic for hashing?  Or is the whole point of an ASIC to make it as highly specialized as possible to get optimal efficiency?
Probably what kokjo said, but I wouldn't rule out the possibility of making it more generalized. The algorithm can be optimized specifically for Bitcoin, and there are benefits to that, including both increased ASIC performance and reduced cost. What one would have to do is determine how much interest non-Bitcoin parties would have in such a chip, and whether the cost benefits lost by supporting a more general approach are outweighed by the increased demand.

Also, it might be a good idea to keep the ASIC general simply because of risk. If Bitcoin fails to meet expectations during the production of the ASIC chips, the manufacturer would at least have chips that could possibly be sold into other markets.

On a somewhat related note, I'd like to stress that I strongly believe any ASIC development should be done in the open, with open-source/open-hardware licenses on as much of the process as possible, up to and including the free release of the masks. Now, that's a pretty bold statement, since the masks are very expensive. But if, for example, the entire venture is publicly funded, that might not be such a crazy idea.

inh
June 17, 2011, 04:42:12 AM
 #244

This board should be good enough to get playing with this, yea?

http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,836&Prod=ATLYS

edit: looks like 45k LUTs isn't nearly enough for a pipelined version. Might have to make my own board for this =/
fpgaminer
June 17, 2011, 04:43:19 PM
 #245

Quote
edit: looks like 45k LUTs isn't nearly enough for a pipelined version. Might have to make my own board for this =/
The current version of the code can unroll as much or as little as you want, so you can make it fit into 45K.

magik
June 17, 2011, 07:46:28 PM
 #246

just adding my 2 cents to this thread

At work I have a lot of Spartan-3E 1600s lying around - it's our main dev FPGA for our main product.

I initially wasted a day debugging the serial connection.  But after re-reading the thread, I saw you had a 120MHz clock source.  So just some counter adjustments for the UART's clock dividers fixed that.

Unfortunately, the Spartan 3 routing is horrible.  I've only been able to synthesize TheSeven's version of the code with a depth of 2 - and that barely missed timing when running it all at 50MHz (214 signals had a data path delay of 21.5 ns, against a 20 ns clock period at 50MHz).

Gatewise, it looks like the 3E 1600 part with a depth of 2 is using 23% of the FFs and 38% of the 4-input LUTs.  So I imagine gatewise it could hold a depth of 3, possibly 4.

But routing is another story.  Even a depth of 2 running at 50MHz barely works.  I could probably underclock it a bit more, but then I have to use the clk_dv output on the DCM with some odd scaling factors.

Using the pyminer, this system clocked in at a hashrate of 3.24 MH/s.

It may be possible to further optimize this for smaller devices by doing some more pipelining.  But I'm not sure how well that will fit into your parameterized SHA rounds.  But basically by looking at the static timing report, it looks like the longest path delays have to go through multiple adders.  A possible solution to synthesize this better on devices with lesser routing may be to split up these adders into multiple cycles.  You take a hit by using more cycles - but this may make routing a bit easier on the chip as well, and if that is the case, then you may be able to better unroll the SHA rounds and still obtain a good clock rate.
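
The timing and clock-divider arithmetic in this post can be checked with a few lines of Python. This is just a sketch of the numbers quoted above: the function names are made up, and the 115200 baud rate is an assumption, since the post does not say which rate the serial link uses.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a clock frequency in MHz."""
    return 1000.0 / freq_mhz


def fmax_mhz(worst_path_ns: float) -> float:
    """Highest clock frequency a given worst-case path delay allows."""
    return 1000.0 / worst_path_ns


def uart_divider(clock_hz: int, baud: int) -> int:
    """Clock cycles per UART bit, rounded to the nearest integer."""
    return round(clock_hz / baud)


# 50 MHz gives a 20 ns period; a 21.5 ns path misses it by 1.5 ns.
slack_ns = period_ns(50.0) - 21.5
print(f"slack at 50 MHz: {slack_ns:.1f} ns")
print(f"fastest clock for a 21.5 ns path: {fmax_mhz(21.5):.1f} MHz")

# Divider for the original 120 MHz clock source versus a 50 MHz one,
# at an assumed 115200 baud.
print(uart_divider(120_000_000, 115200), uart_divider(50_000_000, 115200))
```

Under these numbers the design would need to run at about 46.5 MHz or lower to meet timing. That matches the remark about underclocking a bit more.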
OrphanedGland
June 18, 2011, 01:50:40 AM
 #247

Extra pipelining can be inserted.  I have sent some code to fpgaminer that shows how it is done.
inh
June 18, 2011, 01:55:21 AM
 #248

Would anyone be willing to help me design a barebones dev board for the 150k LUT Spartan-6? I've done plenty of boards before for microcontrollers and whatnot, but I don't have the faintest clue what kind of support circuitry an FPGA would need. This could be a great asset to the community as well, since any retail board with a nice FPGA on it seems to be >$500, when these could be made for substantially less. I'm willing to offer up the design as open source once complete, and possibly even resell the boards (getting them made in bulk isn't expensive at all).
OrphanedGland
June 18, 2011, 02:16:57 AM
 #249

Normally the schematics posted by Xilinx are a good starting point... http://www.xilinx.com/support/documentation/boards_and_kits/XTP095_SP623_SCH_revC.pdf
inh
June 18, 2011, 02:47:47 AM
 #250

Thanks :) I spent all last night looking for exactly that :)
TheSeven
June 18, 2011, 05:51:56 AM
 #251

A good starting point, yes. But remember to add some more bypass caps and a stronger power supply, as bitcoin mining drives the FPGA close to its TDP, if not even out of spec. Cooling and power supply stability and efficiency will be the most important factors here. Account for at least 10 amps per FPGA!

The FPGA you would want to use would be one of the XC6SLX150 types without T; choose whichever package is suited best. The better speed grade will probably not improve timings enough to be worth it though.

Regarding I/Os: Apart from a clock generator (I'd think that anywhere from 20MHz to 100MHz should work fine, but that might need a closer look), you'll just need two I/Os connected to a serial port through a MAX232 or similar. This will limit reusability for other purposes, but lets you cut costs, which seems to be the goal here. If it doesn't complicate PCB routing, you might want to allow for LED drivers and a couple of status LEDs to be added (or maybe some headers exposing unused I/O pins, but that might increase ESD risks).

Oh, and remember that prototyping this isn't exactly cheap with the FPGAs costing around $200 each.

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
inh
June 18, 2011, 06:37:22 AM
 #252

Great info TheSeven, thank you :) Do you have any more specific information on the clock generators and bypass caps? I'm pretty new to this level of circuit design. It shouldn't be too hard to build a power supply that can drop +12V down to the 3.3V and 1.8V (and 1.2V?) that the FPGA needs. Also good to know I just need two I/O lines for the serial converter. In all honesty I'll probably use an FTDI serial-to-USB converter, since that's what I and pretty much everyone else will need to use after the board anyway.

I planned on basically making it a breakout board so that all pins could be used, but you bring up a valid point with the ESD risks. It would also be much simpler and more straightforward to just throw a couple of status LEDs on there and maybe a switch or two. Perhaps a small 8-pin header for other uses?

JTAG seems like it would be the best way to go so far as a programming interface is concerned.  Thoughts?

Oh, and fpgaminer, I'll gladly start a new thread if this is too far off topic from your original intent :)

Thanks for the help guys!
TheSeven
June 18, 2011, 01:30:29 PM
 #253

I have never used that FPGA series myself, so you might want to ask someone else who has more experience with them regarding power supply and clocking, for example ArtForz, who is working on building an FPGA cluster and seems to be pretty knowledgeable. Anyway, the 1.2V rail should be designed for at least 10 amps. As you don't need anything >3.3V, the input voltage range would ideally be something like 5-15V (standard barrel connector?) with an LDO for 3.3V and a POL switcher for 1.2V.
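
A rough power budget for the rails described above might look like the sketch below. Only the 1.2V / 10 A figure comes from this post; the 12V input, the 85% switcher efficiency and the 0.2 A load on the 3.3V rail are assumptions picked for illustration, and the function names are made up.

```python
def rail_power_w(volts: float, amps: float) -> float:
    """Power delivered by a rail."""
    return volts * amps


def input_current_a(load_w: float, vin: float, efficiency: float) -> float:
    """Input current a switching regulator draws to deliver load_w."""
    return load_w / (vin * efficiency)


def ldo_loss_w(vin: float, vout: float, amps: float) -> float:
    """Heat dissipated in a linear regulator, which drops the excess voltage."""
    return (vin - vout) * amps


core_w = rail_power_w(1.2, 10.0)             # 1.2V core rail at 10 A
core_in_a = input_current_a(core_w, 12.0, 0.85)  # assumed 12V input, 85% efficient
ldo_w = ldo_loss_w(12.0, 3.3, 0.2)           # assumed 0.2 A on the 3.3V rail

print(f"core rail: {core_w:.1f} W, drawing {core_in_a:.2f} A from 12V")
print(f"3.3V LDO dissipation: {ldo_w:.2f} W")
```

The LDO loss grows with both the input voltage and the I/O current, so an LDO on the 3.3V rail only stays reasonable while that rail carries a light load.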

Yeah, an FTDI might be the way to go. And JTAG surely is the way to go for programming. There's just no point in spending money on a flash chip for this application, as the end user will need to have a JTAG device anyway.

Oh, and don't forget the trivial things like mounting holes :)

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
dgtlmrln
June 20, 2011, 02:02:21 AM
 #254

First off, I'd like to say that I totally love this project. I've been out of the loop in terms of hardware design for the last 3-4 years and this project is giving me the motivation to get back into it :)

I'm running makomk's branch on a DE2 (not -115) at 80MHz with CONFIG_LOOP_LOG2=3. I should be able to run at at least 90MHz because I have plenty of slack but I haven't had the time to look into that yet. At the moment, setting a higher frequency causes "Place & Route" to do worse instead of better.

Anyways, my question is this: How fast am I hashing?

If I understand correctly, the frequency that I'm running at is approximately my hashing speed... So assuming fully unrolled, 80 MHz would give me 80 MHashes/s. However, with CONFIG_LOOP_LOG2=3, my hashing power should be 80 * (0.5 ** 3) or 10 MHashes/s. However, based on shares submitted to a pool,  I'm very roughly estimating ~25MHashes/s. Is there a way to get a better idea of how fast I'm hashing?

1Gnr3J1REUD6hh7Ti34814NsJnnTALACmZ
TheSeven
June 20, 2011, 02:18:25 AM
 #255

10MH/s sounds correct, and pool hashrate estimates being massively off is not something unusual. Average it over a couple of hours, and you should end up with 8-12MH/s.
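
To see why a short-term pool estimate can be so far off, here is a small sketch of the arithmetic. It assumes the common approximation that one difficulty-1 share corresponds to about 2^32 hashes on average, and treats share arrivals as a Poisson process; the function names are made up for the example.

```python
import math

HASHES_PER_SHARE = 2 ** 32  # approximate average work per difficulty-1 share


def theoretical_mhs(clock_mhz: float, loop_log2: int) -> float:
    """One hash per clock when fully unrolled, halved for each LOOP_LOG2 step."""
    return clock_mhz / 2 ** loop_log2


def expected_shares(rate_mhs: float, seconds: float) -> float:
    """Average number of difficulty-1 shares found in the given time."""
    return rate_mhs * 1e6 * seconds / HASHES_PER_SHARE


def estimate_mhs(shares: float, seconds: float) -> float:
    """Hash rate implied by a number of difficulty-1 shares over a time span."""
    return shares * HASHES_PER_SHARE / seconds / 1e6


def relative_error(shares: float) -> float:
    """Rough one-sigma relative error of a share-based estimate (Poisson)."""
    return 1.0 / math.sqrt(shares)


rate = theoretical_mhs(80.0, 3)
print(f"theoretical rate: {rate:.1f} MH/s")
for label, seconds in (("10 minutes", 600), ("4 hours", 4 * 3600)):
    n = expected_shares(rate, seconds)
    print(f"{label}: about {n:.1f} shares, error around {relative_error(n):.0%}")
```

At 10 MH/s a 10-minute window holds only one or two shares on average, so a single extra share moves the estimate a lot. Even over four hours the expected count is only a few dozen shares, which still leaves an uncertainty of well over 10%.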

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
fpgaminer
June 20, 2011, 02:19:56 AM
 #256

Quote
Anyways, my question is this: How fast am I hashing?
80 MHz at CONFIG_LOOP_LOG2=3 should be 10MH/s. If you're seeing 25MH/s at a pool then perhaps you've been lucky? Over what timespan is that average made?

Quote
First off, I'd like to say that I totally love this project. I've been out of the loop in terms of hardware design for the last 3-4 years and this project is giving me the motivation to get back into it
That's great! Makes me happy to spread the FPGA love ;D

OrphanedGland
June 20, 2011, 01:09:11 PM
 #257

Not sure if anyone here is using Stratix IV, but I have managed to compile a design with 4 SHA256 pairs for the EP4SE530 (2nd largest Stratix IV device).  The resource usage is at 68% and the clock rate is 195MHz, giving a total hash rate of 780MH/s.  I'm now trying to compile with 6 SHA256 pairs.  Not sure that will fit, but 5 definitely will, so a single Stratix IV is good for approximately 1GH/s.

The code can be found at: https://github.com/OrphanedGland/Open-Source-FPGA-Bitcoin-Miner
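
The figures in this post can be sanity-checked with a short sketch. It assumes each SHA256 pair produces one hash per clock (which matches 4 x 195 = 780) and, more speculatively, that resource usage scales linearly with the number of pairs; the function names are made up.

```python
def total_mhs(pairs: int, clock_mhz: float) -> float:
    """Total hash rate if every SHA256 pair yields one hash per clock."""
    return pairs * clock_mhz


def projected_usage_pct(measured_pct: float, measured_pairs: int, pairs: int) -> float:
    """Resource usage for a different pair count, assuming linear scaling."""
    return measured_pct / measured_pairs * pairs


for pairs in (4, 5, 6):
    rate = total_mhs(pairs, 195.0)
    usage = projected_usage_pct(68.0, 4, pairs)
    print(f"{pairs} pairs: {rate:.0f} MH/s, about {usage:.0f}% of the device")
```

Under the linear assumption 5 pairs land at roughly 85% usage and 975 MH/s, while 6 pairs would need slightly more than the whole device, in line with the doubt about whether 6 will fit. Real designs share some logic and may lose clock speed as the device fills up, so this is only a first-order estimate.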
makomk
June 20, 2011, 04:14:16 PM
 #258

Quote
I'm running makomk's branch on a DE2 (not -115) at 80MHz with CONFIG_LOOP_LOG2=3. I should be able to run at at least 90MHz because I have plenty of slack but I haven't had the time to look into that yet. At the moment, setting a higher frequency causes "Place & Route" to do worse instead of better.
Should be 10 megahash/sec in theory, yeah. I'm surprised you managed to reach 80MHz though. Just to check, you're not running one of my older branches that's broken with CONFIG_LOOP_LOG2!=0? They should all be clearly labelled and it would probably be obvious if you were because you wouldn't get any shares, but still...

Quote
Not sure if anyone here is using Stratix IV, but I have managed to compile a design with 4 SHA256 pairs for the EP4SE530 (2nd largest Stratix IV device).  The resource usage is at 68% and clock rate is 195MHz, to give a total hash rate of 780MH/s.  I'm now trying to compile with 6 SHA256 pairs.  Not sure that will fit, but 5 definitely will, so a single Stratix IV is good for approximately 1GH/s.

The code can be found at: https://github.com/OrphanedGland/Open-Source-FPGA-Bitcoin-Miner
Ooh, neat, thanks! Stratix IV is well outside my price range - and probably most people's, to be fair - but it's interesting to see what can be achieved in theory. Some very unusual optimisations there too.

Edit: Aha, that's what you meant by precomputing H+K+W! I'd unimaginatively named the equivalent register t1_part...
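
One plausible reading of "precomputing H+K+W" is sketched below in Python rather than HDL. In a SHA-256 round, T1 = h + Sigma1(e) + Ch(e,f,g) + K[t] + W[t], and the next round's h is simply this round's g, so the sum h + K + W for round t+1 can be formed one round early. This is an assumption about the optimisation, not a transcription of either author's Verilog or VHDL; all names here are invented for the example.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def _primes(n):
    out, c = [], 2
    while len(out) < n:
        if all(c % p for p in out):
            out.append(c)
        c += 1
    return out


def _icbrt(x):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Standard SHA-256 constants: fractional bits of cube/square roots of primes.
P = _primes(64)
K = [_icbrt(p << 96) & MASK for p in P]
H0 = [math.isqrt(p << 64) & MASK for p in P[:8]]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def schedule(block):
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    return w


def compress(state, block, precompute):
    w = schedule(block)
    a, b, c, d, e, f, g, h = state
    hkw = (h + K[0] + w[0]) & MASK  # only used when precompute is True
    for t in range(64):
        s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        if precompute:
            t1 = (hkw + s1 + ch) & MASK
            if t < 63:
                # Next round's h is this round's g, so its H+K+W is known now.
                hkw = (g + K[t + 1] + w[t + 1]) & MASK
        else:
            t1 = (h + K[t] + w[t] + s1 + ch) & MASK
        s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


# Single padded block for the message b"abc" (3 bytes = 24 bits).
block = b"abc" + b"\x80" + b"\x00" * 52 + struct.pack(">Q", 24)
plain = compress(H0, block, precompute=False)
early = compress(H0, block, precompute=True)
assert plain == early
assert b"".join(struct.pack(">I", x) for x in plain) == hashlib.sha256(b"abc").digest()
```

In software the two variants cost about the same. In hardware the point would be that the pre-added value sits in a register, so the adder chain on the critical path of each round gets shorter.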

Quad XC6SLX150 Board: 860 MHash/s or so.
SIGS ABOUT BUTTERFLY LABS ARE PAID ADS
dgtlmrln
June 21, 2011, 04:23:59 AM
 #259

I decided to do some testing to see if I could get a better estimate of my hashing power. I modified mine.tcl to poll my hashrate (based on a 10 minute average) from deepbit every time I time out searching for a golden ticket (so approximately every 20 seconds). I then average my polls to get an estimate.

After about 4 hours of running my estimate is ~13 MH/s. Looks like I was just getting lucky earlier :)

Quote
Should be 10 megahash/sec in theory, yeah. I'm surprised you managed to reach 80MHz though. Just to check, you're not running one of my older branches that's broken with CONFIG_LOOP_LOG2!=0? They should all be clearly labelled and it would probably be obvious if you were because you wouldn't get any shares, but still...

I believe I'm using your master branch. I did try one of your broken branches: I was able to get 100 MHz with CONFIG_LOOP_LOG2=2! I knew it didn't make any sense, but I went ahead and tried it anyways. Of course all my submits were rejected.

1Gnr3J1REUD6hh7Ti34814NsJnnTALACmZ
makomk
June 21, 2011, 11:25:17 AM
 #260

Quote
I believe I'm using your master branch. I did try one of your broken branches: I was able to get 100 MHz with CONFIG_LOOP_LOG2=2! I knew it didn't make any sense, but I went ahead and tried it anyways. Of course all my submits were rejected.
In that case, you're actually running fpgaminer's unmodified code. (There are technical reasons why I have that as the master branch on my git repository.) Still surprised you managed to reach 80 MHz though...

Edit: Also, at some point I should modify my newer changes to properly allow less loop unrolling and see how well that works. Not sure how effective it would be though, and it's a pain to get right.

Quad XC6SLX150 Board: 860 MHash/s or so.
SIGS ABOUT BUTTERFLY LABS ARE PAID ADS