Bitcoin Forum
Author Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013)  (Read 432886 times)
TheSeven
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


FPGA Mining LLC


View Profile WWW
June 08, 2011, 05:12:54 PM
 #181

Even after reading the whole thread, I'm still not quite sure which board would currently be the best for mining. Is the Cyclone IV based Terasic DE2-115 Development Board still the best choice?
All the development boards I've seen so far that would be suitable aren't cost effective, because they contain lots of peripherals that we don't need.
If you want to go for a cost effective solution, you'll need to build a board yourself.
If you don't need a cost effective solution, choose a board with a huge FPGA (Spartan6 LX150, Virtex5 LX110 or some of the Altera ones, which I don't really know), based on what your secondary application needs. Buying a development board just for mining won't pay off; it could serve as a prototype at best.

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
kokjo
Legendary
*
Offline Offline

Activity: 1050
Merit: 1000

You are WRONG!


View Profile
June 08, 2011, 05:15:46 PM
 #182

Even after reading the whole thread, I'm still not quite sure which board would currently be the best for mining. Is the Cyclone IV based Terasic DE2-115 Development Board still the best choice?
All the development boards I've seen so far that would be suitable aren't cost effective, because they contain lots of peripherals that we don't need.
If you want to go for a cost effective solution, you'll need to build a board yourself.
If you don't need a cost effective solution, choose a board with a huge FPGA (Spartan6 LX150, Virtex5 LX110 or some of the Altera ones, which I don't really know), based on what your secondary application needs. Buying a development board just for mining won't pay off; it could serve as a prototype at best.
It would pay off in about 2 months, if the current price and difficulty hold.

"The whole problem with the world is that fools and fanatics are always so certain of themselves and wiser people so full of doubts." -Bertrand Russell
teknohog
Sr. Member
****
Offline Offline

Activity: 519
Merit: 252


555


View Profile WWW
June 08, 2011, 05:21:41 PM
 #183

All the development boards I've seen so far that would be suitable aren't cost effective, because they contain lots of peripherals that we don't need.
If you want to go for a cost effective solution, you'll need to build a board yourself.
If you don't need a cost effective solution, choose a board with a huge FPGA (Spartan6 LX150, Virtex5 LX110 or some of the Altera ones, which I don't really know), based on what your secondary application needs. Buying a development board just for mining won't pay off; it could serve as a prototype at best.

I've thought about this too -- not just because of the price, but simply having all those nice components I could not use while the thing is mining. There are some more minimal boards available, such as this one with an LX150, for about 400 euros with the necessary baseboard:

http://shop.trenz-electronic.de/catalog/product_info.php?cPath=1_65_143&products_id=917&osCsid=40823974778ae324bbd6778f2e17b289

Another problem is that the free-beer Xilinx software does not work with the largest chips, beyond LX45 or something.

world famous math art | masternodes are bad, mmmkay?
Every sha(sha(sha(sha()))), every ho-o-o-old, still shines
TheSeven
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


FPGA Mining LLC


View Profile WWW
June 08, 2011, 05:30:10 PM
 #184

Even after reading the whole thread, I'm still not quite sure which board would currently be the best for mining. Is the Cyclone IV based Terasic DE2-115 Development Board still the best choice?
All the development boards I've seen so far that would be suitable aren't cost effective, because they contain lots of peripherals that we don't need.
If you want to go for a cost effective solution, you'll need to build a board yourself.
If you don't need a cost effective solution, choose a board with a huge FPGA (Spartan6 LX150, Virtex5 LX110 or some of the Altera ones, which I don't really know), based on what your secondary application needs. Buying a development board just for mining won't pay off; it could serve as a prototype at best.
It would pay off in about 2 months, if the current price and difficulty hold.

595USD / 30USD/BTC / 50BTC/block * 567358shares/block * 4294967296hashes/share = 966591008532507 hashes until it pays off, at current difficulty.
966591008532507hashes / 80Mhashes/sec / 3600secs/hour / 24hours/day = 140 days.
Two months? And all this relies on the price keeping pace with the difficulty, and doesn't include any fees, downtime, power costs or the PC driving the board. Add another 10-20% for that.
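
For anyone who wants to plug in their own numbers, here is the same arithmetic as a small Python sketch. The function and parameter names are made up for illustration; the inputs are the figures used above (595 USD board, 30 USD/BTC, 50 BTC per block, 567358 shares per block, 80 MH/s). Like the calculation above, it ignores fees, downtime, power costs and the host PC.

```python
# Hypothetical helper names; the constants come from the calculation above.
HASHES_PER_SHARE = 2 ** 32  # 4294967296 hashes per share


def hashes_to_break_even(board_usd, usd_per_btc, btc_per_block, shares_per_block):
    """Hashes needed until the mined coins are worth the board price."""
    blocks_needed = board_usd / usd_per_btc / btc_per_block
    return blocks_needed * shares_per_block * HASHES_PER_SHARE


def days_to_break_even(board_usd, usd_per_btc, btc_per_block,
                       shares_per_block, hashes_per_sec):
    """Days of non-stop mining at a constant price and difficulty."""
    hashes = hashes_to_break_even(board_usd, usd_per_btc,
                                  btc_per_block, shares_per_block)
    return hashes / hashes_per_sec / 3600 / 24


hashes = hashes_to_break_even(595, 30, 50, 567358)
days = days_to_break_even(595, 30, 50, 567358, 80e6)
print(round(days))  # 140
```

Changing the price, the difficulty or the hash rate in the last call shows how sensitive the payback time is to each of them.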

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
TheSeven
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


FPGA Mining LLC


View Profile WWW
June 08, 2011, 05:33:42 PM
 #185

I've thought about this too -- not just because of the price, but simply having all those nice components I could not use while the thing is mining. There are some more minimal boards available, such as this one with an LX150, for about 400 euros with the necessary baseboard:

http://shop.trenz-electronic.de/catalog/product_info.php?cPath=1_65_143&products_id=917&osCsid=40823974778ae324bbd6778f2e17b289

Another problem is that the free-beer Xilinx software does not work with the largest chips, beyond LX45 or something.

It isn't really clear though whether it's an LX100 or LX150.

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
kokjo
Legendary
*
Offline Offline

Activity: 1050
Merit: 1000

You are WRONG!


View Profile
June 08, 2011, 05:43:27 PM
 #186

595USD / 30USD/BTC / 50BTC/block * 567358shares/block * 4294967296hashes/share = 966591008532507 hashes until it pays off, at current difficulty.
966591008532507hashes / 80Mhashes/sec / 3600secs/hour / 24hours/day = 140 days.
Two months? And all this relies on the price keeping pace with the difficulty, and doesn't include any fees, downtime, power costs or the PC driving the board. Add another 10-20% for that.
Sorry for the bad calculation :( I was thinking about the DE0-nano, priced at $79/$59.
My fault.

"The whole problem with the world is that fools and fanatics are always so certain of themselves and wiser people so full of doubts." -Bertrand Russell
rb2k
Member
**
Offline Offline

Activity: 109
Merit: 10


View Profile
June 08, 2011, 06:00:56 PM
 #187

595USD / 30USD/BTC / 50BTC/block * 567358shares/block * 4294967296hashes/share = 966591008532507 hashes until it pays off, at current difficulty.
966591008532507hashes / 80Mhashes/sec / 3600secs/hour / 24hours/day = 140 days.
Two months? And all this relies on the price keeping pace with the difficulty, and doesn't include any fees, downtime, power costs or the PC driving the board. Add another 10-20% for that.
Sorry for the bad calculation :( I was thinking about the DE0-nano, priced at $79/$59.
My fault.

What is the current state of the DE0-nano (how many MH/s, room for improvement)?
In general, I don't expect to get rich, but I'm a software engineer by trade and always happy about new gadgets :) If those gadgets would finance themselves sooner or later, I'd also be happy about that ;)
kokjo
Legendary
*
Offline Offline

Activity: 1050
Merit: 1000

You are WRONG!


View Profile
June 08, 2011, 06:11:44 PM
 #188

595USD / 30USD/BTC / 50BTC/block * 567358shares/block * 4294967296hashes/share = 966591008532507 hashes until it pays off, at current difficulty.
966591008532507hashes / 80Mhashes/sec / 3600secs/hour / 24hours/day = 140 days.
Two months? And all this relies on the price keeping pace with the difficulty, and doesn't include any fees, downtime, power costs or the PC driving the board. Add another 10-20% for that.
Sorry for the bad calculation :( I was thinking about the DE0-nano, priced at $79/$59.
My fault.

What is the current state of the DE0-nano (how many MH/s, room for improvement)?
In general, I don't expect to get rich, but I'm a software engineer by trade and always happy about new gadgets :) If those gadgets would finance themselves sooner or later, I'd also be happy about that ;)
I am known for my bad calculations, but I think it might give 20 MH/s, since it has ~20k LEs. The DE2 does 80 MH/s, and the fully unrolled design is about 80k LEs. Do the math yourself; mine might be wrong.

"The whole problem with the world is that fools and fanatics are always so certain of themselves and wiser people so full of doubts." -Bertrand Russell
AnnihilaT
Full Member
***
Offline Offline

Activity: 210
Merit: 100



View Profile
June 08, 2011, 11:23:24 PM
 #189

What rate would one of these be capable of?
 http://www.dinigroup.com/new/hpc.php

I also just received pricing on this cluster yesterday:
http://www.dinigroup.com/new/DNBFC_S12_12_Cluster.php

about 132K USD :)
This seems to be a rather inexpensive system actually. Should yield 30GH/s.

Which one would yield 30GH/s? The cluster or the single card?
nathanrees19
Full Member
***
Offline Offline

Activity: 196
Merit: 100



View Profile
June 09, 2011, 12:03:43 AM
 #190

What is the current state of the DE0-nano (how many MH/s, room for improvement)?

It fits with the unrolling parameter set to 3 (just). This results in one hash per 8 clock cycles, or 6.25MH/s at 50MHz (Max 79 MHz). Depending on price/difficulty it would bring in $5-10 per month.

As with any development board, it isn't a cost effective mining solution. It is, however, a good choice if you're looking to learn.
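
A rough sketch of where these numbers come from, in Python. The function names are made up for illustration. The hash rate follows from one hash per 2^n clock cycles; the income estimate assumes the difficulty (567358) and price (30 USD/BTC, 50 BTC per block) quoted earlier in the thread, and ignores fees and power.

```python
# Hypothetical helper names; difficulty and price are the figures
# quoted earlier in the thread, not fixed properties of the design.

def hash_rate(clock_hz, unroll_param):
    """Hashes per second when one hash takes 2**unroll_param clock cycles."""
    return clock_hz / (2 ** unroll_param)


def usd_per_month(hashes_per_sec, difficulty, usd_per_btc,
                  btc_per_block=50, days=30):
    """Expected income over `days` at constant difficulty and price."""
    hashes = hashes_per_sec * 86400 * days
    blocks = hashes / (difficulty * 2 ** 32)
    return blocks * btc_per_block * usd_per_btc


rate_50 = hash_rate(50e6, 3)   # at the 50 MHz clock
rate_79 = hash_rate(79e6, 3)   # at the 79 MHz maximum
income = usd_per_month(rate_50, 567358, 30)

print(rate_50 / 1e6)  # 6.25
print(rate_79 / 1e6)  # 9.875
```

With those assumptions `income` comes out just under 10 USD per month, at the top of the $5-10 range; a lower price or a higher difficulty pulls it down.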
rethaw
Sr. Member
****
Offline Offline

Activity: 378
Merit: 255



View Profile
June 09, 2011, 12:10:03 AM
 #191

Now I have: http://dl.dropbox.com/u/23683845/fpgaminer-virtex5.zip
You'll need to adjust the line "constant DEPTH : integer := 6;" (2^n pipeline stages) in top.vhd.

Hi All.  I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design.  But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240".  I would appreciate any guidance you could provide. Thanks.

TheSeven
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


FPGA Mining LLC


View Profile WWW
June 09, 2011, 08:51:11 AM
 #192

What rate would one of these be capable of?
 http://www.dinigroup.com/new/hpc.php

I also just received pricing on this cluster yesterday:
http://www.dinigroup.com/new/DNBFC_S12_12_Cluster.php

about 132K USD :)
This seems to be a rather inexpensive system actually. Should yield 30GH/s.

Which one would yield 30GH/s? The cluster or the single card?
The cluster (190MH/s per XC6SLX150).

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
TheSeven
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


FPGA Mining LLC


View Profile WWW
June 09, 2011, 08:53:33 AM
 #193

Now I have: http://dl.dropbox.com/u/23683845/fpgaminer-virtex5.zip
You'll need to adjust the line "constant DEPTH : integer := 6;" (2^n pipeline stages) in top.vhd.

Hi All.  I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design.  But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240".  I would appreciate any guidance you could provide. Thanks.
As you're probably not running at 120MHz you'll need to adjust the UART clock divider.
If you provide your clock frequency I can calculate the correct values for you.

Oh, and it would be interesting to know which Virtex 6 model this is, which frequency you can reach and how many LUTs/slices/FFs are used.

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
AnnihilaT
Full Member
***
Offline Offline

Activity: 210
Merit: 100



View Profile
June 09, 2011, 09:01:31 AM
 #194

What rate would one of these be capable of?
 http://www.dinigroup.com/new/hpc.php

I also just received pricing on this cluster yesterday:
http://www.dinigroup.com/new/DNBFC_S12_12_Cluster.php

about 132K USD :)
This seems to be a rather inexpensive system actually. Should yield 30GH/s.

Which one would yield 30GH/s? The cluster or the single card?
The cluster (190MH/s per XC6SLX150).

I just got a price quote on the single card as well: 9,995 USD. If I understand you correctly, the card with 13 cores should yield about 2.5 GH/s. So at these rates it still seems cheaper to go with three 6990s at 750 USD apiece than something like this. Power consumption is of course a factor, but if you are talking pure hashes per second per purchase price, these things don't compete yet. Or am I missing something?

1 x PCIe card with 13 cores = 9,995 USD -> 2560 MH/s
3 x 6990 = +/- 2350 USD -> 2100 MH/s
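
To put the comparison on one scale, here is a quick Python sketch of purchase price per MH/s. The prices and hash rates are the ones quoted in this thread (including the 132K USD / 30 GH/s cluster); the names are made up for illustration, and power draw is deliberately left out.

```python
# Purchase price (USD) and hash rate (MH/s) as quoted in the thread.
# Power consumption is not modelled here.
options = {
    "PCIe card, 13 cores": (9995, 2560),
    "3 x 6990": (2350, 2100),
    "Cluster": (132000, 30000),
}


def usd_per_mhs(price_usd, mhs):
    """Up-front cost of one MH/s of hashing capacity."""
    return price_usd / mhs


costs = {name: usd_per_mhs(price, mhs) for name, (price, mhs) in options.items()}
for name, cost in costs.items():
    print(f"{name}: {cost:.2f} USD per MH/s")
```

On purchase price alone the three 6990s come out at roughly a third to a quarter of the cost per MH/s of either FPGA option.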

nathanrees19
Full Member
***
Offline Offline

Activity: 196
Merit: 100



View Profile
June 09, 2011, 11:33:10 AM
 #195

Power consumption is of course a factor, but if you are talking pure hashes per second per purchase price, these things don't compete yet. Or am I missing something?

When the difficulty is high enough that 6990s just barely turn a profit over the power cost, FPGA mining will become very cost effective.
makomk
Hero Member
*****
Offline Offline

Activity: 686
Merit: 564


View Profile
June 09, 2011, 12:52:02 PM
Last edit: June 09, 2011, 01:05:32 PM by makomk
 #196

Thanks to the patch submitted by Udif, the code now supports a configurable amount of loop unrolling. The original design was fully unrolled, with 128 total round modules. By adjusting the CONFIG_LOOP_LOG2 Verilog define, you can choose to unroll to 64 round modules, 32, 16, 8, or 4. This makes the design smaller, at the equivalent cost of speed, which should allow it to run on many more FPGAs.

Hi. Having looked at the code I've got a question about the configurable loop unrolling. It appears from looking at sha256_transform.v that feedback is feeding the saved W and state into every stage of the hashing pipeline, not just the first, and I can't seem to see why this is necessary. What's more, if I'm reading what Quartus II is telling me correctly, doing this is costing me several MHz of clock speed and more importantly appears to be using fairly large amounts of logic resources. Is there any way to avoid this?

Edit: Ah, having actually read the comments I now understand. You're doing feedback separately at each stage of the pipeline, so each pipeline stage computes 2**DEPTH rounds and outputs at 1/(2**DEPTH) speed. Interesting. The trouble with this approach is that I don't think 512-bit wide muxes are exactly cheap.
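
A small Python sketch of the trade-off as I understand it. The mapping from the define's value n to 128 / 2^n round modules is my reading of the description quoted above (128 fully unrolled; 64, 32, 16, 8 or 4 otherwise), not something checked against the Verilog, and the names are made up for illustration.

```python
# Assumption: CONFIG_LOOP_LOG2 = n keeps 128 / 2**n round modules, and each
# module is reused for 2**n rounds, so one hash comes out every 2**n cycles.
FULLY_UNROLLED_MODULES = 128


def unrolling_tradeoff(loop_log2):
    """Return (round modules instantiated, clock cycles per hash)."""
    modules = FULLY_UNROLLED_MODULES >> loop_log2
    cycles_per_hash = 1 << loop_log2
    return modules, cycles_per_hash


for n in range(6):
    modules, cycles = unrolling_tradeoff(n)
    print(f"CONFIG_LOOP_LOG2={n}: {modules} round modules, "
          f"1 hash per {cycles} cycles")
```

This only counts round modules; it does not model the extra muxes on the feedback path, which is the part I am worried about.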

Quad XC6SLX150 Board: 860 MHash/s or so.
SIGS ABOUT BUTTERFLY LABS ARE PAID ADS
rethaw
Sr. Member
****
Offline Offline

Activity: 378
Merit: 255



View Profile
June 09, 2011, 06:19:37 PM
 #197

Now I have: http://dl.dropbox.com/u/23683845/fpgaminer-virtex5.zip
You'll need to adjust the line "constant DEPTH : integer := 6;" (2^n pipeline stages) in top.vhd.

Hi All.  I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design.  But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240".  I would appreciate any guidance you could provide. Thanks.
As you're probably not running at 120MHz you'll need to adjust the UART clock divider.
If you provide your clock frequency I can calculate the correct values for you.

Oh, and it would be interesting to know which Virtex 6 model this is, which frequency you can reach and how many LUTs/slices/FFs are used.

The device is the xc6vlx240t.  I'm currently using the 200MHz clock on the device and have the MMCM set to 100MHz.  The MMCM supports up to 800MHz.

Here's the utilization with a depth of 5.

Device utilization summary:
---------------------------

Selected Device : 6vlx240tff1156-3


Slice Logic Utilization:
 Number of Slice Registers:           50042  out of  301440    16% 
 Number of Slice LUTs:                86029  out of  150720    57% 
    Number used as Logic:             86028  out of  150720    57% 
    Number used as Memory:                1  out of  58400     0% 
       Number used as SRL:                1

Slice Logic Distribution:
 Number of LUT Flip Flop pairs used:  86548
   Number with an unused Flip Flop:   36506  out of  86548    42% 
   Number with an unused LUT:           519  out of  86548     0% 
   Number of fully used LUT-FF pairs: 49523  out of  86548    57% 
   Number of unique control sets:        12

IO Utilization:
 Number of IOs:                           3
 Number of bonded IOBs:                   3  out of    600     0% 

Specific Feature Utilization:
 Number of BUFG/BUFGCTRLs:                4  out of     32    12%

TheSeven
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


FPGA Mining LLC


View Profile WWW
June 09, 2011, 08:26:45 PM
 #198

Thanks to the patch submitted by Udif, the code now supports a configurable amount of loop unrolling. The original design was fully unrolled, with 128 total round modules. By adjusting the CONFIG_LOOP_LOG2 Verilog define, you can choose to unroll to 64 round modules, 32, 16, 8, or 4. This makes the design smaller, at the equivalent cost of speed, which should allow it to run on many more FPGAs.

Hi. Having looked at the code I've got a question about the configurable loop unrolling. It appears from looking at sha256_transform.v that feedback is feeding the saved W and state into every stage of the hashing pipeline, not just the first, and I can't seem to see why this is necessary. What's more, if I'm reading what Quartus II is telling me correctly, doing this is costing me several MHz of clock speed and more importantly appears to be using fairly large amounts of logic resources. Is there any way to avoid this?

Edit: Ah, having actually read the comments I now understand. You're doing feedback separately at each stage of the pipeline, so each pipeline stage computes 2**DEPTH rounds and outputs at 1/(2**DEPTH) speed. Interesting. The trouble with this approach is that I don't think 512-bit wide muxes are exactly cheap.
This obviously doesn't make sense on devices where the fully unrolled design fits, but on smaller parts it's the only way to make it work at all.

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
TheSeven
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


FPGA Mining LLC


View Profile WWW
June 09, 2011, 08:31:35 PM
 #199

Now I have: http://dl.dropbox.com/u/23683845/fpgaminer-virtex5.zip
You'll need to adjust the line "constant DEPTH : integer := 6;" (2^n pipeline stages) in top.vhd.

Hi All.  I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design.  But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240".  I would appreciate any guidance you could provide. Thanks.
As you're probably not running at 120MHz you'll need to adjust the UART clock divider.
If you provide your clock frequency I can calculate the correct values for you.

Oh, and it would be interesting to know which Virtex 6 model this is, which frequency you can reach and how many LUTs/slices/FFs are used.

The device is the xc6vlx240t.  I'm currently using the 200MHz clock on the device and have the MMCM set to 100MHz.  The MMCM supports up to 800MHz.

Here's the utilization with a depth of 5.

Device utilization summary:
---------------------------

Selected Device : 6vlx240tff1156-3


Slice Logic Utilization:
 Number of Slice Registers:           50042  out of  301440    16% 
 Number of Slice LUTs:                86029  out of  150720    57% 
    Number used as Logic:             86028  out of  150720    57% 
    Number used as Memory:                1  out of  58400     0% 
       Number used as SRL:                1

Slice Logic Distribution:
 Number of LUT Flip Flop pairs used:  86548
   Number with an unused Flip Flop:   36506  out of  86548    42% 
   Number with an unused LUT:           519  out of  86548     0% 
   Number of fully used LUT-FF pairs: 49523  out of  86548    57% 
   Number of unique control sets:        12

IO Utilization:
 Number of IOs:                           3
 Number of bonded IOBs:                   3  out of    600     0% 

Specific Feature Utilization:
 Number of BUFG/BUFGCTRLs:                4  out of     32    12%

How much headroom do you have frequency-wise?
As you have lots of spare flipflops, you should probably cut the pipeline stages into halves by doubly-registering their output (xst will balance some logic into this additional clock cycle). You should manage to cross 200MHz that way.
The UART clock divider values for 100MHz are "01101100100" and "10100010110".
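
If you change the clock again, the values can be recomputed along these lines. This Python sketch rests on two assumptions that are not spelled out in this thread: that the UART runs at 115200 baud, and that the two constants are one bit period and one and a half bit periods in clock cycles. Both are inferred from the numbers above (868 and 1302 at 100MHz). The function name and the 11-bit width are illustrative.

```python
# Assumptions (inferred, not confirmed): 115200 baud; first value is one
# bit period, second is 1.5 bit periods; both are 11-bit constants.

def uart_dividers(clock_hz, baud=115200, width=11):
    """Return (bit period, 1.5 bit periods) as zero-padded binary strings."""
    bit_period = round(clock_hz / baud)
    one_and_half = round(1.5 * clock_hz / baud)
    for value in (bit_period, one_and_half):
        if value >= 1 << width:
            raise ValueError(f"{value} does not fit in {width} bits")
    return format(bit_period, f"0{width}b"), format(one_and_half, f"0{width}b")


dividers = uart_dividers(100_000_000)
print(dividers)  # ('01101100100', '10100010110')
```

If the result for your clock does not fit in 11 bits, the function raises an error rather than silently returning a wider constant.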

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
deftx
Newbie
*
Offline Offline

Activity: 9
Merit: 0


View Profile
June 10, 2011, 05:28:06 PM
 #200

It's likely going to be impossible for these devices to reach economies of scale anywhere near those of GPUs. These particular devices are marketed towards higher-end audiences anyhow, so I imagine there's more room to charge a higher price.

I predict GPUs will almost always be more feasible because more of them are produced. The economic incentive to produce them for both gamers and miners will always be high, and that will drive the price down, both through production efficiency and through the limits on what can be charged.

There's always someone that can make that magic combination of components to drive the price down, so we'll see where it actually goes.