Bitcoin Forum
Author Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013)  (Read 432989 times)
teknohog
June 08, 2011, 05:21:41 PM
 #181

All development boards I've seen so far, which would be suited, aren't cost effective because they contain lots of peripherals that we don't need.
If you want to go for a cost effective solution, you'll need to build a board yourself.
If you don't need a cost effective solution, choose a board with a huge FPGA (Spartan6 LX150, Virtex5 LX110 or some of the Altera ones which I don't really know), based on what your secondary application needs. Buying a development board just for mining won't pay off; it could serve as a prototype at best.

I've thought about this too -- not just because of the price, but simply having all those nice components I could not use while the thing is mining. There are some more minimal boards available, such as this one with an LX150, for about 400 euros with the necessary baseboard:

http://shop.trenz-electronic.de/catalog/product_info.php?cPath=1_65_143&products_id=917&osCsid=40823974778ae324bbd6778f2e17b289

Another problem is that the free-beer Xilinx software does not work with the largest chips, beyond LX45 or something.

TheSeven
June 08, 2011, 05:30:10 PM
 #182

Even after reading the whole thread, I'm still not quite sure which board would currently be the best for mining. Is the Cyclone IV based Terasic DE2-115 Development Board still the best choice?
All development boards I've seen so far, which would be suited, aren't cost effective because they contain lots of peripherals that we don't need.
If you want to go for a cost effective solution, you'll need to build a board yourself.
If you don't need a cost effective solution, choose a board with a huge FPGA (Spartan6 LX150, Virtex5 LX110 or some of the Altera ones which I don't really know), based on what your secondary application needs. Buying a development board just for mining won't pay off; it could serve as a prototype at best.
It would pay off in about 2 months, if the current price and difficulty hold.

595USD / 30USD/BTC / 50BTC/block * 567358shares/block * 4294967296hashes/share = 966591008532507 hashes until it pays off, at current difficulty.
966591008532507hashes / 80Mhashes/sec / 3600secs/hour / 24hours/day = 140 days.
Two months? And all this relies on the price keeping pace with the difficulty, and doesn't include any fees, downtime, power costs or the PC driving the board. Add another 10-20% for that.
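The payback arithmetic above can be reproduced in a few lines. This is a minimal sketch using only the figures quoted in the post ($595 board, $30/BTC, 50 BTC per block, difficulty 567358, 80 MH/s); it assumes one difficulty-1 share takes 2^32 hashes on average and, like the post, ignores fees, downtime and power.

```python
# Rough payback estimate, reproducing the arithmetic in the post.
# Inputs are the June 2011 figures quoted there; fees, downtime,
# power and the host PC are deliberately ignored.

HASHES_PER_SHARE = 2**32  # one difficulty-1 share takes ~2^32 hashes on average


def hashes_to_pay_off(price_usd, usd_per_btc, btc_per_block, difficulty):
    """Expected number of hashes needed to earn back price_usd."""
    blocks_needed = price_usd / usd_per_btc / btc_per_block
    shares_needed = blocks_needed * difficulty  # difficulty = shares per block
    return shares_needed * HASHES_PER_SHARE


def days_to_pay_off(total_hashes, hashes_per_second):
    """Convert a hash count into days of mining at a fixed rate."""
    return total_hashes / hashes_per_second / 3600 / 24


hashes = hashes_to_pay_off(595, 30, 50, 567358)  # DE2-115 price at $30/BTC
days = days_to_pay_off(hashes, 80e6)             # 80 MH/s board
print(f"{hashes:.0f} hashes, {days:.0f} days")
```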

TheSeven
June 08, 2011, 05:33:42 PM
 #183

I've thought about this too -- not just because of the price, but simply having all those nice components I could not use while the thing is mining. There are some more minimal boards available, such as this one with an LX150, for about 400 euros with the necessary baseboard:

http://shop.trenz-electronic.de/catalog/product_info.php?cPath=1_65_143&products_id=917&osCsid=40823974778ae324bbd6778f2e17b289

Another problem is that the free-beer Xilinx software does not work with the largest chips, beyond LX45 or something.

It isn't really clear though whether it's an LX100 or LX150.

kokjo
June 08, 2011, 05:43:27 PM
 #184

595USD / 30USD/BTC / 50BTC/block * 567358shares/block * 4294967296hashes/share = 966591008532507 hashes until it pays off, at current difficulty.
966591008532507hashes / 80Mhashes/sec / 3600secs/hour / 24hours/day = 140 days.
Two months? And all this relies on the price keeping pace with the difficulty, and doesn't include any fees, downtime, power costs or the PC driving the board. Add another 10-20% for that.
Sorry for the bad calculation :( I was thinking of the DE0-nano, priced at $79/$59.
My fault.
rb2k
June 08, 2011, 06:00:56 PM
 #185

595USD / 30USD/BTC / 50BTC/block * 567358shares/block * 4294967296hashes/share = 966591008532507 hashes until it pays off, at current difficulty.
966591008532507hashes / 80Mhashes/sec / 3600secs/hour / 24hours/day = 140 days.
Two months? And all this relies on the price keeping pace with the difficulty, and doesn't include any fees, downtime, power costs or the PC driving the board. Add another 10-20% for that.
Sorry for the bad calculation :( I was thinking of the DE0-nano, priced at $79/$59.
My fault.

What is the current state of the DE0-nano (how many MH/s, room for improvement)?
In general, I don't expect to get rich, but I'm a software engineer by trade and always happy about new gadgets :) If those gadgets paid for themselves sooner or later, I'd be happy about that too ;)
kokjo
June 08, 2011, 06:11:44 PM
 #186

595USD / 30USD/BTC / 50BTC/block * 567358shares/block * 4294967296hashes/share = 966591008532507 hashes until it pays off, at current difficulty.
966591008532507hashes / 80Mhashes/sec / 3600secs/hour / 24hours/day = 140 days.
Two months? And all this relies on the price keeping pace with the difficulty, and doesn't include any fees, downtime, power costs or the PC driving the board. Add another 10-20% for that.
Sorry for the bad calculation :( I was thinking of the DE0-nano, priced at $79/$59.
My fault.

What is the current state of the DE0-nano (how many MH/s, room for improvement)?
In general, I don't expect to get rich, but I'm a software engineer by trade and always happy about new gadgets :) If those gadgets paid for themselves sooner or later, I'd be happy about that too ;)
I am known for my bad calculations, but I think it might give 20 MH/s: it has ~20k LEs, while the DE2 does 80 MH/s and the fully unrolled design is about 80k LEs. Do the math yourself; mine might be wrong.
AnnihilaT
June 08, 2011, 11:23:24 PM
 #187

What rate would one of these be capable of?
 http://www.dinigroup.com/new/hpc.php

I also just received pricing on this cluster yesterday:
http://www.dinigroup.com/new/DNBFC_S12_12_Cluster.php

about 132K USD :)
This seems to be a rather inexpensive system actually. Should yield 30GH/s.

Which one would yield 30GH/s? The cluster or the single card?
nathanrees19
June 09, 2011, 12:03:43 AM
 #188

What is the current state of the DE0-nano (how many MH/s, room for improvement)?

It fits with the unrolling parameter set to 3 (just). This results in one hash per 8 clock cycles, or 6.25MH/s at 50MHz (Max 79 MHz). Depending on price/difficulty it would bring in $5-10 per month.
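A rough sketch of these DE0-nano numbers, assuming throughput is simply clock / 2^n for an unrolling parameter of n, and reusing the June 2011 difficulty (567358) and $30/BTC from earlier in the thread for the revenue estimate. It also contrasts kokjo's linear logic-element scaling (20 MH/s); one plausible reason that estimate overshoots is that unrolling only comes in powers of two, so a design that doesn't quite fit at one quarter size has to drop to one eighth.

```python
# Partially unrolled core: with CONFIG_LOOP_LOG2 = n it finishes one
# hash every 2**n clock cycles, so throughput is clock / 2**n.

def hash_rate(clock_hz, loop_log2):
    return clock_hz / 2**loop_log2


def btc_per_day(hashes_per_second, difficulty, btc_per_block=50):
    """Expected BTC/day: each block takes difficulty * 2^32 hashes on average."""
    blocks_per_day = hashes_per_second * 86400 / (difficulty * 2**32)
    return blocks_per_day * btc_per_block


# kokjo's linear estimate: scale the DE2-115's 80 MH/s by logic elements
naive = 80e6 * 20_000 / 80_000       # 20 MH/s

de0 = hash_rate(50e6, 3)             # 6.25 MH/s at 50 MHz
de0_fmax = hash_rate(79e6, 3)        # ~9.9 MH/s if it ran at 79 MHz

# Assumed June 2011 conditions from earlier in the thread: difficulty 567358, $30/BTC
usd_per_month = btc_per_day(de0, 567358) * 30 * 30
print(f"naive {naive / 1e6:.1f} MH/s, actual {de0 / 1e6:.2f} MH/s, "
      f"about ${usd_per_month:.0f}/month")
```

The monthly figure lands inside the $5-10 range given above; it scales directly with the BTC price and inversely with difficulty.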

As with any development board, it isn't a cost effective mining solution. It is, however, a good choice if you're looking to learn.
rethaw
June 09, 2011, 12:10:03 AM
 #189

Now I have: http://dl.dropbox.com/u/23683845/fpgaminer-virtex5.zip
You'll need to adjust the line "constant DEPTH : integer := 6;" (2^n pipeline stages) in top.vhd.

Hi All.  I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design.  But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240".  I would appreciate any guidance you could provide. Thanks.

TheSeven
June 09, 2011, 08:51:11 AM
 #190

What rate would one of these be capable of?
 http://www.dinigroup.com/new/hpc.php

I also just received pricing on this cluster yesterday:
http://www.dinigroup.com/new/DNBFC_S12_12_Cluster.php

about 132K USD :)
This seems to be a rather inexpensive system actually. Should yield 30GH/s.

Which one would yield 30GH/s? The cluster or the single card?
The cluster (190MH/s per XC6SLX150).

TheSeven
June 09, 2011, 08:53:33 AM
 #191

Now I have: http://dl.dropbox.com/u/23683845/fpgaminer-virtex5.zip
You'll need to adjust the line "constant DEPTH : integer := 6;" (2^n pipeline stages) in top.vhd.

Hi All.  I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design.  But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240".  I would appreciate any guidance you could provide. Thanks.
As you're probably not running at 120MHz you'll need to adjust the UART clock divider.
If you provide your clock frequency I can calculate the correct values for you.

Oh, and it would be interesting to know which Virtex 6 model this is, which frequency you can reach and how many LUTs/slices/FFs are used.

AnnihilaT
June 09, 2011, 09:01:31 AM
 #192

What rate would one of these be capable of?
 http://www.dinigroup.com/new/hpc.php

I also just received pricing on this cluster yesterday:
http://www.dinigroup.com/new/DNBFC_S12_12_Cluster.php

about 132K USD :)
This seems to be a rather inexpensive system actually. Should yield 30GH/s.

Which one would yield 30GH/s? The cluster or the single card?
The cluster (190MH/s per XC6SLX150).

I just got a price quote on the single card as well: 9,995 USD. If I understand you correctly, the card with 13 cores should yield about 2.5 GH/s. So at these rates it still seems cheaper to go with three 6990s at 750 USD apiece than with something like this. Power consumption is of course a factor, but if you are talking pure hashes per second per purchase dollar, these things don't compete yet. Or am I missing something?

1 x PCIe card with 13 cores = 9,995 USD -> 2560 MH/s
3 x 6990 = +/- 2350 USD -> 2100 MH/s
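The comparison above boils down to purchase price per MH/s. A minimal sketch using only the two listed figures; the cross-check assumes the 13 cores are XC6SLX150s at TheSeven's ~190 MH/s each, and power consumption is left out entirely.

```python
# Purchase price per unit of hash rate, using the figures listed in the post.
# Power, hosting and resale value are not included.

def usd_per_mhs(price_usd, mhs):
    return price_usd / mhs


options = {
    "1 x PCIe card, 13 FPGA cores": (9995, 2560),
    "3 x 6990": (2350, 2100),
}

# Cross-check (assumption): 13 cores at ~190 MH/s per XC6SLX150
card_estimate_mhs = 13 * 190  # 2470 MH/s, close to the 2560 MH/s figure

for name, (usd, mhs) in options.items():
    print(f"{name}: {usd_per_mhs(usd, mhs):.2f} USD per MH/s")
```

At roughly 3.90 versus 1.12 USD per MH/s, the GPUs win on purchase price by a factor of about 3.5, which is the point being made; the picture only changes once electricity dominates the cost.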

nathanrees19
June 09, 2011, 11:33:10 AM
 #193

Power consumption is of course a factor but if you are talking pure hashes per sec to purchase price these things don't yet compete. Or am I missing something?

When the difficulty is high enough that 6990s just barely turn a profit over the power cost, FPGA mining will become very cost effective.
makomk
June 09, 2011, 12:52:02 PM
Last edit: June 09, 2011, 01:05:32 PM by makomk
 #194

Thanks to the patch submitted by Udif, the code now supports a configurable amount of loop unrolling. The original design was fully unrolled, with 128 total round modules. By adjusting the CONFIG_LOOP_LOG2 Verilog define, you can choose to unroll to 64 round modules, 32, 16, 8, or 4. This makes the design smaller, at the equivalent cost of speed, which should allow it to run on many more FPGAs.

Hi. Having looked at the code I've got a question about the configurable loop unrolling. It appears from looking at sha256_transform.v that feedback is feeding the saved W and state into every stage of the hashing pipeline, not just the first, and I can't seem to see why this is necessary. What's more, if I'm reading what Quartus II is telling me correctly, doing this is costing me several MHz of clock speed and more importantly appears to be using fairly large amounts of logic resources. Is there any way to avoid this?

Edit: Ah, having actually read the comments I now understand. You're doing feedback separately at each stage of the pipeline, so each pipeline stage computes 2**DEPTH rounds and outputs at 1/(2**DEPTH) speed. Interesting. The trouble with this approach is that I don't think 512-bit wide muxes are exactly cheap.
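The trade-off behind CONFIG_LOOP_LOG2 can be tabulated. This sketch assumes the fully unrolled core's 128 round modules are two 64-round SHA-256 transforms and that each halving of the hardware doubles the cycles per hash; it does not model the per-stage feedback multiplexers discussed above, which add logic on top of these counts.

```python
# Loop unrolling configurations: the fully unrolled core has 128 round
# modules (two SHA-256 transforms of 64 rounds each). Each step of
# CONFIG_LOOP_LOG2 halves the hardware and halves the throughput.

TOTAL_ROUNDS = 128


def unroll_config(loop_log2):
    """Return (round modules instantiated, clock cycles per hash)."""
    if not 0 <= loop_log2 <= 5:  # 128 down to 4 modules, per the post
        raise ValueError("CONFIG_LOOP_LOG2 is expected to be 0..5")
    round_modules = TOTAL_ROUNDS >> loop_log2
    cycles_per_hash = 1 << loop_log2  # each module is reused 2**n times
    return round_modules, cycles_per_hash


for n in range(6):
    modules, cycles = unroll_config(n)
    print(f"CONFIG_LOOP_LOG2={n}: {modules:3d} round modules, "
          f"one hash per {cycles} cycle(s)")
```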

rethaw
June 09, 2011, 06:19:37 PM
 #195

Now I have: http://dl.dropbox.com/u/23683845/fpgaminer-virtex5.zip
You'll need to adjust the line "constant DEPTH : integer := 6;" (2^n pipeline stages) in top.vhd.

Hi All.  I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design.  But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240".  I would appreciate any guidance you could provide. Thanks.
As you're probably not running at 120MHz you'll need to adjust the UART clock divider.
If you provide your clock frequency I can calculate the correct values for you.

Oh, and it would be interesting to know which Virtex 6 model this is, which frequency you can reach and how many LUTs/slices/FFs are used.

The device is the xc6vlx240t.  I'm currently using the 200MHz clock on the device and have the MMCM set to 100MHz.  The MMCM supports up to 800MHz.

Here's the utilization with a depth of 5.

Device utilization summary:
---------------------------

Selected Device : 6vlx240tff1156-3


Slice Logic Utilization:
 Number of Slice Registers:           50042  out of  301440    16% 
 Number of Slice LUTs:                86029  out of  150720    57% 
    Number used as Logic:             86028  out of  150720    57% 
    Number used as Memory:                1  out of  58400     0% 
       Number used as SRL:                1

Slice Logic Distribution:
 Number of LUT Flip Flop pairs used:  86548
   Number with an unused Flip Flop:   36506  out of  86548    42% 
   Number with an unused LUT:           519  out of  86548     0% 
   Number of fully used LUT-FF pairs: 49523  out of  86548    57% 
   Number of unique control sets:        12

IO Utilization:
 Number of IOs:                           3
 Number of bonded IOBs:                   3  out of    600     0% 

Specific Feature Utilization:
 Number of BUFG/BUFGCTRLs:                4  out of     32    12%

TheSeven
June 09, 2011, 08:26:45 PM
 #196

Thanks to the patch submitted by Udif, the code now supports a configurable amount of loop unrolling. The original design was fully unrolled, with 128 total round modules. By adjusting the CONFIG_LOOP_LOG2 Verilog define, you can choose to unroll to 64 round modules, 32, 16, 8, or 4. This makes the design smaller, at the equivalent cost of speed, which should allow it to run on many more FPGAs.

Hi. Having looked at the code I've got a question about the configurable loop unrolling. It appears from looking at sha256_transform.v that feedback is feeding the saved W and state into every stage of the hashing pipeline, not just the first, and I can't seem to see why this is necessary. What's more, if I'm reading what Quartus II is telling me correctly, doing this is costing me several MHz of clock speed and more importantly appears to be using fairly large amounts of logic resources. Is there any way to avoid this?

Edit: Ah, having actually read the comments I now understand. You're doing feedback separately at each stage of the pipeline, so each pipeline stage computes 2**DEPTH rounds and outputs at 1/(2**DEPTH) speed. Interesting. The trouble with this approach is that I don't think 512-bit wide muxes are exactly cheap.
This obviously doesn't make sense on devices where the fully unrolled design fits, but on smaller parts it's the only way to make it work at all.

TheSeven
June 09, 2011, 08:31:35 PM
 #197

Now I have: http://dl.dropbox.com/u/23683845/fpgaminer-virtex5.zip
You'll need to adjust the line "constant DEPTH : integer := 6;" (2^n pipeline stages) in top.vhd.

Hi All.  I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design.  But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240".  I would appreciate any guidance you could provide. Thanks.
As you're probably not running at 120MHz you'll need to adjust the UART clock divider.
If you provide your clock frequency I can calculate the correct values for you.

Oh, and it would be interesting to know which Virtex 6 model this is, which frequency you can reach and how many LUTs/slices/FFs are used.

The device is the xc6vlx240t.  I'm currently using the 200MHz clock on the device and have the MMCM set to 100MHz.  The MMCM supports up to 800MHz.


How much headroom do you have frequency-wise?
As you have lots of spare flipflops, you should probably cut the pipeline stages into halves by doubly-registering their output (xst will balance some logic into this additional clock cycle). You should manage to cross 200MHz that way.
The UART clock divider values for 100MHz are "01101100100" and "10100010110".
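The two binary constants above decode to 868 and 1302. Since 100 MHz / 115200 ≈ 868 and 1302 ≈ 1.5 × 868, a plausible reading (an assumption, not confirmed by the source code) is that the serial link runs at 115200 baud, the first constant is one bit period and the second is one and a half bit periods, e.g. to sample the middle of the first data bit after the start bit. Under those assumptions, the values for any clock could be computed like this:

```python
# Compute UART clock divider constants for a given core clock.
# Assumptions (inferred, not taken from the project): 115200 baud,
# first constant = one bit period, second = 1.5 bit periods, 11 bits wide.

def uart_dividers(clock_hz, baud=115200, width=11):
    bit_period = round(clock_hz / baud)
    one_and_half = round(1.5 * clock_hz / baud)
    for value in (bit_period, one_and_half):
        if value >= 1 << width:  # would overflow the divider register
            raise ValueError(f"{value} does not fit in {width} bits")
    return format(bit_period, f"0{width}b"), format(one_and_half, f"0{width}b")


print(uart_dividers(100e6))  # -> ('01101100100', '10100010110')
```

Note that at 200 MHz the 1.5-bit value (2604) no longer fits in 11 bits, so a faster clock would also need a wider counter under these assumptions.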

deftx
June 10, 2011, 05:28:06 PM
 #198

It's likely going to be impossible for these devices to reach economies of scale anywhere near those of GPUs. These particular devices are marketed towards higher-end audiences anyhow, so I imagine there's more room to charge a premium.

I predict GPUs will almost always be more feasible because more of them are produced. The economic incentive to produce them, for both gamers and miners, will always be high, and that will drive prices down through both production efficiency and pressure on what sellers can charge.

There's always someone that can make that magic combination of components to drive the price down, so we'll see where it actually goes.
pdki
June 10, 2011, 08:08:35 PM
 #199

I think a real ASIC implementation of SHA-256 should easily be able to outrun GPUs by at least a factor of 100, because of better space efficiency and the simplicity of the logic involved.

Considering that
- you can manufacture one of these for ~2M€ and then get 1000s of these chips
- they will not consume much power
- they can be put on cheap boards, because no heavy IO is needed (graphics cards are expensive due to the heavy IO with RAM)

I am sure this will happen if Bitcoin really becomes established as a currency and USD exchange rates stay in the $10 range. If not, this would be a cheap option for aggressors like governments to take over the network: much easier and cheaper than trying to shut it down by law.
vx609e
June 11, 2011, 03:31:13 PM
Last edit: June 11, 2011, 07:41:07 PM by vx609e
 #200

Hi,

IMO, an ASIC implementation is the way to go. We already have decent RTL (those who contributed to it know who they are, and I thank you guys for this). With small modifications to the current RTL, we could easily daisy-chain many "cores" (the easiest implementation given the current state of the project is a token ring over UART...we only need to assign a specific address to each core).
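The daisy-chain idea can be illustrated with a toy model. Everything here (the `Frame` and `Core` classes, the address field, the forwarding rule) is a hypothetical sketch of a token ring over UART, not the project's actual protocol: each core keeps frames addressed to it and forwards everything else down the chain.

```python
# Hypothetical token-ring addressing over a daisy-chained UART: every core
# has an address, consumes frames addressed to it and forwards the rest.
# Frame layout and class names are illustrative only.
from dataclasses import dataclass, field


@dataclass
class Frame:
    address: int    # destination core address
    payload: bytes  # e.g. a work unit


@dataclass
class Core:
    address: int
    inbox: list = field(default_factory=list)

    def handle(self, frame):
        """Keep frames addressed to this core; pass everything else on."""
        if frame.address == self.address:
            self.inbox.append(frame.payload)
            return None
        return frame


def send_around_ring(cores, frame):
    """Push a frame through the chain; True if some core consumed it."""
    for core in cores:
        frame = core.handle(frame)
        if frame is None:
            return True
    return False  # frame came back to the host undelivered


chain = [Core(address=i) for i in range(20)]  # e.g. 20 chips on one board
delivered = send_around_ring(chain, Frame(7, b"work unit"))
print(delivered, chain[7].inbox)
```

The appeal of this scheme is that each chip needs only one serial input and one serial output, regardless of how many chips share the board.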

Let's say each manufactured chip would yield 100 MHash/s. We daisy-chain 20 per board (a board with 20 chips on it is not a big deal). That's 2 GHash/s right there. PCB design and manufacturing would be pretty straightforward. I volunteer for that.

The big question: how do we finance an ASIC project? And even more importantly: how do we get it done?

1) Outsource the FPGA2ASIC flow to http://www.icnexus.com.tw/product.php?id=25 (the first company I found...there have got to be many others). Get chips ASAP and limit the risks. With this forum, I'm sure we could get a small EE team together and do all the Synopsys, BIST, test scan, pad design, routing, etc. crap ourselves, but there are specialists out there who will do it for us...and the chances of success will be much higher with that approach. Being a 100% digital chip (+ regulator and PLL obviously), the project couldn't be easier for these guys (or whatever company gets the contract)...not to mention they are already in the business of FPGA2ASIC conversion.

2) Crowdfunding with kickstarter.com: if we can get 500 people to pre-order one 2 GHash/s board at $1000 apiece (a truly good deal IMO), we get a $500k budget to do #1. We need 10,000 chips. I think the budget makes sense if we spend $250k on design, $100k on chips ($10 apiece), $50k on tape-out (might be included in the design cost...we need to check with the contractor), $10k on PCBs and assembly, plus the rest for overhead. Once we get a real quote from a contractor, we can adjust the cost per board...all I'm putting here are ballpark figures to show the potential of this approach.
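A quick check that the ballpark figures add up, using only the numbers in the post (500 boards at $1000, 20 chips of 100 MHash/s per board, $10 per chip). The itemized costs leave $90k of the $500k for overhead.

```python
# Ballpark crowd-funding budget from the post (all figures USD).
boards = 500
board_price = 1000
chips_per_board = 20
mhs_per_chip = 100   # assumed per-chip rate from the post
chip_price = 10

budget = boards * board_price                      # 500,000
chips = boards * chips_per_board                   # 10,000 chips
board_ghs = chips_per_board * mhs_per_chip / 1000  # 2 GH/s per board

costs = {
    "design": 250_000,
    "chips": chips * chip_price,                   # 100,000
    "tape-out": 50_000,
    "PCBs and assembly": 10_000,
}
overhead = budget - sum(costs.values())            # what is left for overhead

print(f"budget ${budget:,}, {chips:,} chips, {board_ghs:g} GH/s per board")
print(f"itemized ${sum(costs.values()):,}, overhead ${overhead:,}")
```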

So far in my career, all I've done is deal with PCB, FPGA and ASIC designs...this project seems very realistic to me. But maybe I'm daydreaming...please bring me back to earth if I am.

Feedback, suggestions and comments very welcome.