TheSeven
|
 |
June 08, 2011, 05:12:54 PM |
|
Even after reading the whole thread, I'm still not quite sure which board would currently be the best for mining. Is the Cyclone IV based Terasic DE2-115 Development Board still the best choice?
All development boards I've seen so far, which would be suited, aren't cost effective because they contain lots of peripherals that we don't need. If you want to go for a cost effective solution, you'll need to build a board yourself. If you don't need a cost effective solution, choose a board with a huge FPGA (Spartan6 LX150, Virtex5 LX110 or some of the Altera ones which I don't really know), based on what your secondary application needs. Buying a development board just for mining won't pay off; it could serve as a prototype at best.
|
My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
|
|
|
kokjo
Legendary
Offline
Activity: 1050
Merit: 1000
You are WRONG!
|
 |
June 08, 2011, 05:15:46 PM |
|
Even after reading the whole thread, I'm still not quite sure which board would currently be the best for mining. Is the Cyclone IV based Terasic DE2-115 Development Board still the best choice?
All development boards I've seen so far, which would be suited, aren't cost effective because they contain lots of peripherals that we don't need. If you want to go for a cost effective solution, you'll need to build a board yourself. If you don't need a cost effective solution, choose a board with a huge FPGA (Spartan6 LX150, Virtex5 LX110 or some of the Altera ones which I don't really know), based on what your secondary application needs. Buying a development board just for mining won't pay off; it could serve as a prototype at best.
It would pay off in about 2 months, if the current price and difficulty hold.
|
"The whole problem with the world is that fools and fanatics are always so certain of themselves and wiser people so full of doubts." -Bertrand Russell
|
|
|
teknohog
|
 |
June 08, 2011, 05:21:41 PM |
|
All development boards I've seen so far, which would be suited, aren't cost effective because they contain lots of peripherals that we don't need. If you want to go for a cost effective solution, you'll need to build a board yourself. If you don't need a cost effective solution, choose a board with a huge FPGA (Spartan6 LX150, Virtex5 LX110 or some of the Altera ones which I don't really know), based on what your secondary application needs. Buying a development board just for mining won't pay off; it could serve as a prototype at best.
I've thought about this too -- not just because of the price, but also because I could not use all those nice components while the thing is mining. There are some more minimal boards available, such as this one with an LX150, for about 400 euros with the necessary baseboard: http://shop.trenz-electronic.de/catalog/product_info.php?cPath=1_65_143&products_id=917&osCsid=40823974778ae324bbd6778f2e17b289
Another problem is that the free-beer Xilinx software does not work with the largest chips, beyond LX45 or something.
|
|
|
|
TheSeven
|
 |
June 08, 2011, 05:30:10 PM |
|
Even after reading the whole thread, I'm still not quite sure which board would currently be the best for mining. Is the Cyclone IV based Terasic DE2-115 Development Board still the best choice?
All development boards I've seen so far, which would be suited, aren't cost effective because they contain lots of peripherals that we don't need. If you want to go for a cost effective solution, you'll need to build a board yourself. If you don't need a cost effective solution, choose a board with a huge FPGA (Spartan6 LX150, Virtex5 LX110 or some of the Altera ones which I don't really know), based on what your secondary application needs. Buying a development board just for mining won't pay off; it could serve as a prototype at best.
It would pay off in about 2 months, if the current price and difficulty hold.
595USD / 30USD/BTC / 50BTC/block * 567358shares/block * 4294967296hashes/share = 966591008532507 hashes until it pays off, at current difficulty. 966591008532507hashes / 80Mhashes/sec / 3600secs/hour / 24hours/day = 140 days. Two months? And all this relies on the price keeping pace with the difficulty, and didn't include any fees, downtimes, power costs or the PC driving the board. Account another 10-20% for that.
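The payback arithmetic above can be reproduced with a short Python sketch. The function and parameter names are illustrative and not from the thread; the inputs are the figures quoted in the post (595 USD board, 30 USD/BTC, 50 BTC/block, difficulty 567358, 80 MH/s). Like the original calculation, it ignores fees, downtime, power and the host PC.

```python
HASHES_PER_SHARE = 2**32  # 4294967296 hashes per share, as used in the post


def payback_days(board_usd, usd_per_btc, btc_per_block, shares_per_block, hashes_per_sec):
    """Days of mining needed to earn back the board price at constant price and difficulty."""
    blocks_needed = board_usd / usd_per_btc / btc_per_block
    hashes_needed = blocks_needed * shares_per_block * HASHES_PER_SHARE
    return hashes_needed / hashes_per_sec / 3600 / 24


days = payback_days(595, 30, 50, 567358, 80e6)
print(f"{days:.0f} days")  # about 140 days, matching the post

# Rough allowance for the extra 10-20% mentioned in the post.
print(f"{days * 1.1:.0f} to {days * 1.2:.0f} days with overhead")
```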
|
|
|
|
TheSeven
|
 |
June 08, 2011, 05:33:42 PM |
|
It isn't really clear though whether it's an LX100 or LX150.
|
|
|
|
kokjo
|
 |
June 08, 2011, 05:43:27 PM |
|
595USD / 30USD/BTC / 50BTC/block * 567358shares/block * 4294967296hashes/share = 966591008532507 hashes until it pays off, at current difficulty. 966591008532507hashes / 80Mhashes/sec / 3600secs/hour / 24hours/day = 140 days. Two months? And all this relies on the price keeping pace with the difficulty, and didn't include any fees, downtimes, power costs or the PC driving the board. Account another 10-20% for that.
Sorry for the bad calculation. I was thinking about the DE0-nano, priced at 79$/59$. My fault.
|
|
|
|
rb2k
Member

Offline
Activity: 109
Merit: 10
|
 |
June 08, 2011, 06:00:56 PM |
|
595USD / 30USD/BTC / 50BTC/block * 567358shares/block * 4294967296hashes/share = 966591008532507 hashes until it pays off, at current difficulty. 966591008532507hashes / 80Mhashes/sec / 3600secs/hour / 24hours/day = 140 days. Two months? And all this relies on the price keeping pace with the difficulty, and didn't include any fees, downtimes, power costs or the PC driving the board. Account another 10-20% for that.
Sorry for the bad calculation. I was thinking about the DE0-nano, priced at 79$/59$. My fault.
What is the current state of the DE0-nano (how many MH/s, room for improvement)? In general, I don't expect to get rich, but I'm a software engineer by trade and always happy about new gadgets. If those gadgets would finance themselves sooner or later, I'd also be happy about that.
|
|
|
|
kokjo
|
 |
June 08, 2011, 06:11:44 PM |
|
595USD / 30USD/BTC / 50BTC/block * 567358shares/block * 4294967296hashes/share = 966591008532507 hashes until it pays off, at current difficulty. 966591008532507hashes / 80Mhashes/sec / 3600secs/hour / 24hours/day = 140 days. Two months? And all this relies on the price keeping pace with the difficulty, and didn't include any fees, downtimes, power costs or the PC driving the board. Account another 10-20% for that.
Sorry for the bad calculation. I was thinking about the DE0-nano, priced at 79$/59$. My fault.
What is the current state of the DE0-nano (how many MH/s, room for improvement)? In general, I don't expect to get rich, but I'm a software engineer by trade and always happy about new gadgets. If those gadgets would finance themselves sooner or later, I'd also be happy about that.
I am known for my bad calculations, but I think it might give 20MH/s, since it has ~20k LEs. The DE2 does 80MH/s and the fully unrolled design is about 80k LEs. Do the math yourself; mine might be wrong.
|
|
|
|
AnnihilaT
|
 |
June 08, 2011, 11:23:24 PM |
|
This seems to be a rather inexpensive system actually. Should yield 30GH/s.
Which one would yield 30GH/s? The cluster or the single card?
|
|
|
|
nathanrees19
|
 |
June 09, 2011, 12:03:43 AM |
|
What is the current state of the DE0-nano (how many MH/s, room for improvement)?
It fits with the unrolling parameter set to 3 (just). This results in one hash per 8 clock cycles, or 6.25MH/s at 50MHz (Max 79 MHz). Depending on price/difficulty it would bring in $5-10 per month. As with any development board, it isn't a cost effective mining solution. It is, however, a good choice if you're looking to learn.
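The relation between clock speed and hash rate described above can be written out as a small Python sketch. The function name is illustrative; the only assumption is the one stated in the post, that an unrolling parameter of n means one hash every 2**n clock cycles.

```python
def hashrate_mhs(clock_mhz, loop_log2):
    """Hash rate in MH/s when one hash takes 2**loop_log2 clock cycles."""
    cycles_per_hash = 2 ** loop_log2
    return clock_mhz / cycles_per_hash


print(hashrate_mhs(50, 3))  # 6.25 MH/s at 50 MHz, as quoted in the post
print(hashrate_mhs(79, 3))  # rate at the quoted maximum clock of 79 MHz
```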
|
|
|
|
rethaw
|
 |
June 09, 2011, 12:10:03 AM |
|
Hi All. I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design. But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240". I would appreciate any guidance you could provide. Thanks.
|
|
|
|
TheSeven
|
 |
June 09, 2011, 08:51:11 AM |
|
This seems to be a rather inexpensive system actually. Should yield 30GH/s.
Which one would yield 30GH/s? The cluster or the single card?
The cluster (190MH/s per XC6SLX150).
|
|
|
|
TheSeven
|
 |
June 09, 2011, 08:53:33 AM |
|
Hi All. I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design. But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240". I would appreciate any guidance you could provide. Thanks.
As you're probably not running at 120MHz you'll need to adjust the UART clock divider. If you provide your clock frequency I can calculate the correct values for you. Oh, and it would be interesting to know which Virtex 6 model this is, which frequency you can reach and how many LUTs/slices/FFs are used.
|
|
|
|
AnnihilaT
|
 |
June 09, 2011, 09:01:31 AM |
|
This seems to be a rather inexpensive system actually. Should yield 30GH/s.
Which one would yield 30GH/s? The cluster or the single card?
The cluster (190MH/s per XC6SLX150).
I just got a price quote on the single card as well... 9,995 USD. If I understand you correctly, the card with 13 cores should yield about 2.5 GH/s. So at these rates it still seems cheaper to go with 3 6990s at 750 USD apiece than something like this. Power consumption is of course a factor, but if you are talking pure hashes per second per purchase price, these things don't yet compete. Or am I missing something?
1 x PCIe card with 13 cores = 9,995 USD -> 2560 MH/s
3 x 6990 = +/- 2350 USD -> 2100 MH/s
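To make the comparison explicit, here is a minimal Python sketch that turns the two lines above into purchase price per MH/s. It uses only the prices and hash rates quoted in the post; the function name is illustrative, and power cost is deliberately left out, as in the post.

```python
def usd_per_mhs(price_usd, mhs):
    """Purchase price per MH/s, ignoring power and other running costs."""
    return price_usd / mhs


# Figures as quoted in the post.
options = {
    "1 x PCIe card with 13 cores": (9995, 2560),
    "3 x 6990": (2350, 2100),
}

for name, (price, rate) in options.items():
    print(f"{name}: {usd_per_mhs(price, rate):.2f} USD per MH/s")
```

On these numbers the FPGA card costs roughly three and a half times as much per MH/s as the GPUs.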
|
|
|
|
nathanrees19
|
 |
June 09, 2011, 11:33:10 AM |
|
Power consumption is of course a factor, but if you are talking pure hashes per second per purchase price, these things don't yet compete. Or am I missing something?
When the difficulty is high enough that 6990s just barely turn a profit over the power cost, FPGA mining will become very cost effective.
|
|
|
|
makomk
|
 |
June 09, 2011, 12:52:02 PM Last edit: June 09, 2011, 01:05:32 PM by makomk |
|
Thanks to the patch submitted by Udif, the code now supports a configurable amount of loop unrolling. The original design was fully unrolled, with 128 total round modules. By adjusting the CONFIG_LOOP_LOG2 Verilog define, you can choose to unroll to 64 round modules, 32, 16, 8, or 4. This makes the design smaller, at the equivalent cost of speed, which should allow it to run on many more FPGAs.
Hi. Having looked at the code I've got a question about the configurable loop unrolling. It appears from looking at sha256_transform.v that feedback is feeding the saved W and state into every stage of the hashing pipeline, not just the first, and I can't seem to see why this is necessary. What's more, if I'm reading what Quartus II is telling me correctly, doing this is costing me several MHz of clock speed and, more importantly, appears to be using fairly large amounts of logic resources. Is there any way to avoid this?
Edit: Ah, having actually read the comments I now understand. You're doing feedback separately at each stage of the pipeline, so each pipeline stage computes 2**DEPTH rounds and outputs at 1/(2**DEPTH) speed. Interesting. The trouble with this approach is that I don't think 512-bit wide muxes are exactly cheap.
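The size/speed trade-off of the configurable unrolling can be tabulated with a short Python sketch. It only restates the numbers from the posts above (128 round modules when fully unrolled, halved for each step of CONFIG_LOOP_LOG2, with output at 1/(2**DEPTH) speed). It does not model the actual Verilog, and the function names are illustrative.

```python
FULLY_UNROLLED_MODULES = 128  # round modules in the original, fully unrolled design


def round_modules(loop_log2):
    """Number of round modules instantiated for a given CONFIG_LOOP_LOG2."""
    return FULLY_UNROLLED_MODULES >> loop_log2


def relative_throughput(loop_log2):
    """Hashes per clock relative to the fully unrolled design."""
    return 1 / 2 ** loop_log2


for depth in range(0, 6):
    print(depth, round_modules(depth), relative_throughput(depth))
```

Each step halves the number of round modules and the throughput. The feedback muxes mentioned in the post are an extra cost that this table does not capture.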
|
Quad XC6SLX150 Board: 860 MHash/s or so. SIGS ABOUT BUTTERFLY LABS ARE PAID ADS
|
|
|
rethaw
|
 |
June 09, 2011, 06:19:37 PM |
|
Hi All. I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design. But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240". I would appreciate any guidance you could provide. Thanks.
As you're probably not running at 120MHz you'll need to adjust the UART clock divider. If you provide your clock frequency I can calculate the correct values for you. Oh, and it would be interesting to know which Virtex 6 model this is, which frequency you can reach and how many LUTs/slices/FFs are used.
The device is the xc6vlx240t. I'm currently using the 200MHz clock on the device and have the MMCM set to 100MHz. The MMCM supports up to 800MHz. Here's the utilization with a depth of 5.
Device utilization summary:
---------------------------
Selected Device : 6vlx240tff1156-3
Slice Logic Utilization:
 Number of Slice Registers: 50042 out of 301440 16%
 Number of Slice LUTs: 86029 out of 150720 57%
    Number used as Logic: 86028 out of 150720 57%
    Number used as Memory: 1 out of 58400 0%
       Number used as SRL: 1
Slice Logic Distribution:
 Number of LUT Flip Flop pairs used: 86548
   Number with an unused Flip Flop: 36506 out of 86548 42%
   Number with an unused LUT: 519 out of 86548 0%
   Number of fully used LUT-FF pairs: 49523 out of 86548 57%
   Number of unique control sets: 12
IO Utilization:
 Number of IOs: 3
 Number of bonded IOBs: 3 out of 600 0%
Specific Feature Utilization:
 Number of BUFG/BUFGCTRLs: 4 out of 32 12%
|
|
|
|
TheSeven
|
 |
June 09, 2011, 08:26:45 PM |
|
Thanks to the patch submitted by Udif, the code now supports a configurable amount of loop unrolling. The original design was fully unrolled, with 128 total round modules. By adjusting the CONFIG_LOOP_LOG2 Verilog define, you can choose to unroll to 64 round modules, 32, 16, 8, or 4. This makes the design smaller, at the equivalent cost of speed, which should allow it to run on many more FPGAs.
Hi. Having looked at the code I've got a question about the configurable loop unrolling. It appears from looking at sha256_transform.v that feedback is feeding the saved W and state into every stage of the hashing pipeline, not just the first, and I can't seem to see why this is necessary. What's more, if I'm reading what Quartus II is telling me correctly, doing this is costing me several MHz of clock speed and, more importantly, appears to be using fairly large amounts of logic resources. Is there any way to avoid this?
Edit: Ah, having actually read the comments I now understand. You're doing feedback separately at each stage of the pipeline, so each pipeline stage computes 2**DEPTH rounds and outputs at 1/(2**DEPTH) speed. Interesting. The trouble with this approach is that I don't think 512-bit wide muxes are exactly cheap.
This obviously doesn't make sense on devices where the fully unrolled design fits, but on smaller parts it's the only way to make it work at all.
|
|
|
|
TheSeven
|
 |
June 09, 2011, 08:31:35 PM |
|
Hi All. I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design. But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240". I would appreciate any guidance you could provide. Thanks.
As you're probably not running at 120MHz you'll need to adjust the UART clock divider. If you provide your clock frequency I can calculate the correct values for you. Oh, and it would be interesting to know which Virtex 6 model this is, which frequency you can reach and how many LUTs/slices/FFs are used.
The device is the xc6vlx240t. I'm currently using the 200MHz clock on the device and have the MMCM set to 100MHz. The MMCM supports up to 800MHz. Here's the utilization with a depth of 5.
Device utilization summary:
---------------------------
Selected Device : 6vlx240tff1156-3
Slice Logic Utilization:
 Number of Slice Registers: 50042 out of 301440 16%
 Number of Slice LUTs: 86029 out of 150720 57%
    Number used as Logic: 86028 out of 150720 57%
    Number used as Memory: 1 out of 58400 0%
       Number used as SRL: 1
Slice Logic Distribution:
 Number of LUT Flip Flop pairs used: 86548
   Number with an unused Flip Flop: 36506 out of 86548 42%
   Number with an unused LUT: 519 out of 86548 0%
   Number of fully used LUT-FF pairs: 49523 out of 86548 57%
   Number of unique control sets: 12
IO Utilization:
 Number of IOs: 3
 Number of bonded IOBs: 3 out of 600 0%
Specific Feature Utilization:
 Number of BUFG/BUFGCTRLs: 4 out of 32 12%
How much headroom do you have frequency-wise? As you have lots of spare flipflops, you should probably cut the pipeline stages into halves by doubly-registering their output (xst will balance some logic into this additional clock cycle). You should manage to cross 200MHz that way. The UART clock divider values for 100MHz are "01101100100" and "10100010110".
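The two divider values quoted for 100MHz can be reproduced with a small Python sketch, under assumptions the thread does not state: that the UART runs at 115200 baud, that the first value is the number of clock cycles per bit, and that the second is 1.5 bit periods (a common way to sample the first data bit mid-bit after the start edge). The function name and the 11-bit width are inferred from the quoted strings and are not taken from the design's source.

```python
def uart_dividers(clock_hz, baud=115200, width=11):
    """Return (cycles per bit, cycles per 1.5 bits) as zero-padded binary strings.

    Assumes a 115200 baud link and 11-bit divider constants; both are guesses
    that happen to reproduce the values quoted for 100 MHz.
    """
    full_bit = round(clock_hz / baud)
    one_and_half_bits = round(1.5 * clock_hz / baud)
    return format(full_bit, f"0{width}b"), format(one_and_half_bits, f"0{width}b")


print(uart_dividers(100e6))  # ('01101100100', '10100010110'), i.e. 868 and 1302
```

If the assumptions hold, calling the same function with a different clock frequency gives the values to substitute for that clock.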
|
|
|
|
deftx
Newbie
Offline
Activity: 9
Merit: 0
|
 |
June 10, 2011, 05:28:06 PM |
|
It's likely going to be impossible for these devices to reach economies of scale anywhere near those of GPUs. These particular devices are marketed towards higher-end audiences anyhow, so I imagine there's more room to charge a higher price.
I predict GPUs will almost always be more feasible because more are produced. The economic incentive to produce them for both gamers and miners will always be high, and that will drive the price down, both through efficiency and through the price that can be charged.
There's always someone that can make that magic combination of components to drive the price down, so we'll see where it actually goes.
|
|
|
|
|