teknohog
June 08, 2011, 05:21:41 PM

All development boards I've seen so far, which would be suited, aren't cost effective because they contain lots of peripherals that we don't need. If you want to go for a cost effective solution, you'll need to build a board yourself. If you don't need a cost effective solution, choose a board with a huge FPGA (Spartan6 LX150, Virtex5 LX110 or some of the Altera ones which I don't really know), based on what your secondary application needs. Buying a development board just for mining won't pay off; it could serve as a prototype at best.
I've thought about this too, not just because of the price, but simply because of having all those nice components I could not use while the thing is mining. There are some more minimal boards available, such as this one with an LX150, for about 400 euros with the necessary baseboard: http://shop.trenz-electronic.de/catalog/product_info.php?cPath=1_65_143&products_id=917&osCsid=40823974778ae324bbd6778f2e17b289
Another problem is that the free-beer Xilinx software does not work with the largest chips, beyond LX45 or something.

TheSeven
June 08, 2011, 05:30:10 PM

Even after reading the whole thread, I'm still not quite sure which board would currently be the best for mining. Is the Cyclone IV based Terasic DE2-115 Development Board still the best choice?
All development boards I've seen so far, which would be suited, aren't cost effective because they contain lots of peripherals that we don't need. If you want to go for a cost effective solution, you'll need to build a board yourself. If you don't need a cost effective solution, choose a board with a huge FPGA (Spartan6 LX150, Virtex5 LX110 or some of the Altera ones which I don't really know), based on what your secondary application needs. Buying a development board just for mining won't pay off; it could serve as a prototype at best.
It would pay off in about 2 months, if the current price and difficulty hold.
595 USD / 30 USD/BTC / 50 BTC/block * 567358 shares/block * 4294967296 hashes/share = 966591008532507 hashes until it pays off, at current difficulty.
966591008532507 hashes / 80 Mhashes/sec / 3600 secs/hour / 24 hours/day = 140 days.
Two months? And all this relies on the price keeping pace with the difficulty, and doesn't include any fees, downtime, power costs or the PC driving the board. Allow another 10-20% for that.
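The payback arithmetic above can be reproduced in a few lines. This is a sketch under the post's own assumptions (595 USD board, 30 USD/BTC, 50 BTC/block, difficulty 567358, 80 MH/s) and, like the post, it ignores fees, downtime, power and the host PC. The function name `payback` is just for illustration.

```python
def payback(board_usd, usd_per_btc, btc_per_block, difficulty, hashrate_hs):
    """Return (hashes needed, days needed) to earn back board_usd at a fixed difficulty."""
    blocks = board_usd / usd_per_btc / btc_per_block   # fraction of one block reward needed
    hashes = blocks * difficulty * 2**32                # expected hashes per block = difficulty * 2^32
    days = hashes / hashrate_hs / 3600 / 24             # seconds -> hours -> days
    return hashes, days


hashes, days = payback(595, 30, 50, 567358, 80e6)
print(f"{hashes:.6e} hashes, {days:.1f} days")  # 9.665910e+14 hashes, 139.8 days
```

Other boards can be plugged in by changing only the price and hash rate arguments.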

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY

TheSeven
June 08, 2011, 05:33:42 PM

It isn't really clear though whether it's an LX100 or LX150.

kokjo
June 08, 2011, 05:43:27 PM

595USD / 30USD/BTC / 50BTC/block * 567358shares/block * 4294967296hashes/share = 966591008532507 hashes until it pays off, at current difficulty. 966591008532507hashes / 80Mhashes/sec / 3600secs/hour / 24hours/day = 140 days. Two months? And all this relies on the price keeping pace with the difficulty, and didn't include any fees, downtimes, power costs or the PC driving the board. Account another 10-20% for that.
Sorry for the bad calculation; I was thinking about the DE0-nano, priced at $79/$59. My fault.

"The whole problem with the world is that fools and fanatics are always so certain of themselves and wiser people so full of doubts." -Bertrand Russell

rb2k
June 08, 2011, 06:00:56 PM

595USD / 30USD/BTC / 50BTC/block * 567358shares/block * 4294967296hashes/share = 966591008532507 hashes until it pays off, at current difficulty. 966591008532507hashes / 80Mhashes/sec / 3600secs/hour / 24hours/day = 140 days. Two months? And all this relies on the price keeping pace with the difficulty, and didn't include any fees, downtimes, power costs or the PC driving the board. Account another 10-20% for that.
Sorry for the bad calculation; I was thinking about the DE0-nano, priced at $79/$59. My fault.
What is the current state of the DE0-nano (how many MH/s, room for improvement)? In general, I don't expect to get rich, but I'm a software engineer by trade and always happy about new gadgets. If those gadgets would finance themselves sooner or later, I'd also be happy about that.

kokjo
June 08, 2011, 06:11:44 PM

595USD / 30USD/BTC / 50BTC/block * 567358shares/block * 4294967296hashes/share = 966591008532507 hashes until it pays off, at current difficulty. 966591008532507hashes / 80Mhashes/sec / 3600secs/hour / 24hours/day = 140 days. Two months? And all this relies on the price keeping pace with the difficulty, and didn't include any fees, downtimes, power costs or the PC driving the board. Account another 10-20% for that.
Sorry for the bad calculation; I was thinking about the DE0-nano, priced at $79/$59. My fault.
What is the current state of the DE0-nano (how many MH/s, room for improvement)? In general, I don't expect to get rich, but I'm a software engineer by trade and always happy about new gadgets. If those gadgets would finance themselves sooner or later, I'd also be happy about that.
I am known for my bad calculations, but I think it might give 20 MH/s; it has ~20k LEs. The DE2 does 80 MH/s and the fully unrolled design is about 80k LEs. Do the math yourself; mine might be wrong.
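The estimate above is a simple proportional scaling: if the fully unrolled design needs about 80k LEs for 80 MH/s, a part with about 20k LEs gets about a quarter of that. A minimal sketch of that rule of thumb, using the LE counts and rates from the post; it assumes hash rate scales linearly with logic at the same clock, which a design that has to be folded to fit (and may be clocked lower) need not follow. The function name is hypothetical.

```python
def scaled_hashrate_mhs(target_les, ref_les=80_000, ref_mhs=80.0):
    """Naive estimate: hash rate proportional to available logic elements (same clock assumed)."""
    return ref_mhs * target_les / ref_les


print(scaled_hashrate_mhs(20_000))  # 20.0
```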

AnnihilaT
June 08, 2011, 11:23:24 PM

This seems to be a rather inexpensive system actually. Should yield 30GH/s.
Which one would yield 30GH/s? The cluster or the single card?

nathanrees19
June 09, 2011, 12:03:43 AM

What is the current state of the DE0-nano (how many MH/s, room for improvement)?
It fits with the unrolling parameter set to 3 (just). This results in one hash per 8 clock cycles, or 6.25MH/s at 50MHz (Max 79 MHz). Depending on price/difficulty it would bring in $5-10 per month. As with any development board, it isn't a cost effective mining solution. It is, however, a good choice if you're looking to learn.
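These figures follow from how the configurable loop unrolling trades area for speed: with CONFIG_LOOP_LOG2 = n, the 128 round modules of the fully unrolled design shrink to 128 / 2^n, and a hash comes out only every 2^n clock cycles. The sketch below assumes that the "unrolling parameter" mentioned here is CONFIG_LOOP_LOG2; the function name is made up for illustration.

```python
def folded_design(clock_mhz, loop_log2, total_rounds=128):
    """Return (round modules instantiated, MH/s) for a design folded by 2**loop_log2."""
    round_modules = total_rounds >> loop_log2      # 128, 64, 32, 16, 8, 4 for loop_log2 = 0..5
    cycles_per_hash = 1 << loop_log2               # each module is reused 2**loop_log2 times
    return round_modules, clock_mhz / cycles_per_hash


for mhz in (50, 79):
    print(mhz, "MHz:", folded_design(mhz, 3))
# 50 MHz: (16, 6.25)
# 79 MHz: (16, 9.875)
```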

rethaw
June 09, 2011, 12:10:03 AM

Hi All. I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design. But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240". I would appreciate any guidance you could provide. Thanks.

TheSeven
June 09, 2011, 08:51:11 AM

This seems to be a rather inexpensive system actually. Should yield 30GH/s. Which one would yield 30GH/s? The cluster or the single card?
The cluster (190MH/s per XC6SLX150).

TheSeven
June 09, 2011, 08:53:33 AM

Hi All. I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design. But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240". I would appreciate any guidance you could provide. Thanks.
As you're probably not running at 120MHz you'll need to adjust the UART clock divider. If you provide your clock frequency I can calculate the correct values for you. Oh, and it would be interesting which Virtex 6 model this is, which frequency you can reach and how many LUTs/slices/FFs are used.

AnnihilaT
June 09, 2011, 09:01:31 AM

This seems to be a rather inexpensive system actually. Should yield 30GH/s. Which one would yield 30GH/s? The cluster or the single card? The cluster (190MH/s per XC6SLX150).
I just got a price quote on the single card as well... 9,995 USD. If I understand you correctly, the card with 13 cores should yield about 2.5 GH/s. So at these rates it still seems cheaper to go with three 6990s at 750 USD apiece than something like this. Power consumption is of course a factor, but if you are talking pure hashes per sec to purchase price, these things don't yet compete. Or am I missing something?
1 x PCIe card with 13 cores = 9,995 USD -> 2560 MH/s
3 x 6990 = +/- 2350 USD -> 2100 MH/s
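The comparison boils down to hash rate per dollar of purchase price. A quick sketch using only the prices and hash rates quoted in the post; power and running costs are deliberately left out, as in the post.

```python
# Purchase price (USD) and hash rate (MH/s) as quoted in the post.
options = {
    "1 x PCIe card, 13 cores": (9995, 2560),
    "3 x 6990": (2350, 2100),
}

for name, (usd, mhs) in options.items():
    # Higher MH/s per USD means more hashing for the money, ignoring power.
    print(f"{name}: {mhs / usd:.3f} MH/s per USD, {usd / mhs:.2f} USD per MH/s")
```

On these numbers the GPUs deliver roughly three and a half times the hash rate per dollar, which is the point being made; the picture only changes once power cost enters.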

nathanrees19
June 09, 2011, 11:33:10 AM

Power consumption is of course a factor, but if you are talking pure hashes per sec to purchase price, these things don't yet compete. Or am I missing something?
When the difficulty is high enough that 6990s just barely turn a profit over the power cost, FPGA mining will become very cost effective.

makomk
June 09, 2011, 12:52:02 PM (last edit: June 09, 2011, 01:05:32 PM by makomk)

Thanks to the patch submitted by Udif, the code now supports a configurable amount of loop unrolling. The original design was fully unrolled, with 128 total round modules. By adjusting the CONFIG_LOOP_LOG2 Verilog define, you can choose to unroll to 64 round modules, 32, 16, 8, or 4. This makes the design smaller, at the equivalent cost of speed, which should allow it to run on many more FPGAs.
Hi. Having looked at the code I've got a question about the configurable loop unrolling. It appears from looking at sha256_transform.v that feedback is feeding the saved W and state into every stage of the hashing pipeline, not just the first, and I can't see why this is necessary. What's more, if I'm reading what Quartus II is telling me correctly, doing this is costing me several MHz of clock speed and, more importantly, appears to be using fairly large amounts of logic resources. Is there any way to avoid this? Edit: Ah, having actually read the comments I now understand. You're doing feedback separately at each stage of the pipeline, so each pipeline stage computes 2**DEPTH rounds and outputs at 1/(2**DEPTH) speed. Interesting. The trouble with this approach is that I don't think 512-bit wide muxes are exactly cheap.
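To see what the per-stage feedback means in terms of rounds, here is a software model of one SHA-256 compression (64 rounds) split into 64 >> n stages, where each stage feeds its output back into itself 2**n times before passing the state on, as described above. The mining core chains two such transforms (128 round modules when fully unrolled). This is a sketch for checking the round-to-stage mapping, not a model of the Verilog's timing or mux cost; the message schedule is precomputed for simplicity, and names such as `compress_folded` and `loop_log2` are illustrative. The result is checked against `hashlib` for a single-block message.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def first_primes(n):
    """Return the first n primes by trial division (fine for n = 64)."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n):
    """Largest integer x with x**3 <= n (integer Newton iteration from above)."""
    x = 1 << ((n.bit_length() + 2) // 3)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y


PRIMES = first_primes(64)
# SHA-256 constants: fractional bits of cube roots (K) and square roots (H0) of the primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def sha_round(state, k, w):
    """One SHA-256 round: the work a single round module does per clock."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + k + w) & MASK
    t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c))) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)


def schedule(block):
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w


def compress_folded(h_in, block, loop_log2):
    """64 rounds built from 64 >> loop_log2 stages, each reused 2**loop_log2 times."""
    w = schedule(block)
    reuse = 1 << loop_log2
    state = tuple(h_in)
    for stage in range(64 >> loop_log2):
        for i in range(reuse):          # feedback: the stage consumes its own output
            r = stage * reuse + i       # round index handled on this pass
            state = sha_round(state, K[r], w[r])
    return [(x + y) & MASK for x, y in zip(h_in, state)]


def sha256_one_block(msg, loop_log2):
    assert len(msg) <= 55, "single-block messages only"
    block = msg + b"\x80" + b"\x00" * (55 - len(msg)) + struct.pack(">Q", 8 * len(msg))
    return struct.pack(">8I", *compress_folded(H0, block, loop_log2))


for n in range(6):
    assert sha256_one_block(b"abc", n) == hashlib.sha256(b"abc").digest()
print("folded results match hashlib for loop_log2 = 0..5")
```

In software the folding is free; in hardware each reused stage needs a mux in front of it to choose between fresh input and fed-back state (plus the matching K and W), which is the cost raised above.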

Quad XC6SLX150 Board: 860 MHash/s or so. SIGS ABOUT BUTTERFLY LABS ARE PAID ADS

rethaw
June 09, 2011, 06:19:37 PM

Hi All. I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design. But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240". I would appreciate any guidance you could provide. Thanks.
As you're probably not running at 120MHz you'll need to adjust the UART clock divider. If you provide your clock frequency I can calculate the correct values for you. Oh, and it would be interesting which Virtex 6 model this is, which frequency you can reach and how many LUTs/slices/FFs are used.
The device is the xc6vlx240t. I'm currently using the 200MHz clock on the device and have the MMCM set to 100MHz. The MMCM supports up to 800MHz. Here's the utilization with a depth of 5.

Device utilization summary:
---------------------------
Selected Device : 6vlx240tff1156-3

Slice Logic Utilization:
 Number of Slice Registers: 50042 out of 301440 16%
 Number of Slice LUTs: 86029 out of 150720 57%
  Number used as Logic: 86028 out of 150720 57%
  Number used as Memory: 1 out of 58400 0%
   Number used as SRL: 1

Slice Logic Distribution:
 Number of LUT Flip Flop pairs used: 86548
 Number with an unused Flip Flop: 36506 out of 86548 42%
 Number with an unused LUT: 519 out of 86548 0%
 Number of fully used LUT-FF pairs: 49523 out of 86548 57%
 Number of unique control sets: 12

IO Utilization:
 Number of IOs: 3
 Number of bonded IOBs: 3 out of 600 0%

Specific Feature Utilization:
 Number of BUFG/BUFGCTRLs: 4 out of 32 12%

TheSeven
June 09, 2011, 08:26:45 PM

Thanks to the patch submitted by Udif, the code now supports a configurable amount of loop unrolling. The original design was fully unrolled, with 128 total round modules. By adjusting the CONFIG_LOOP_LOG2 Verilog define, you can choose to unroll to 64 round modules, 32, 16, 8, or 4. This makes the design smaller, at the equivalent cost of speed, which should allow it to run on many more FPGAs.
Hi. Having looked at the code I've got a question about the configurable loop unrolling. It appears from looking at sha256_transform.v that feedback is feeding the saved W and state into every stage of the hashing pipeline, not just the first, and I can't see why this is necessary. What's more, if I'm reading what Quartus II is telling me correctly, doing this is costing me several MHz of clock speed and, more importantly, appears to be using fairly large amounts of logic resources. Is there any way to avoid this? Edit: Ah, having actually read the comments I now understand. You're doing feedback separately at each stage of the pipeline, so each pipeline stage computes 2**DEPTH rounds and outputs at 1/(2**DEPTH) speed. Interesting. The trouble with this approach is that I don't think 512-bit wide muxes are exactly cheap.
This obviously doesn't make sense on devices where the fully unrolled design fits, but on smaller parts it's the only way to make it work at all.

TheSeven
June 09, 2011, 08:31:35 PM

Hi All. I am trying to implement the following on a Virtex 6. DCMs are no longer used on the Virtex 6 and have been replaced with MMCMs. So far I have swapped the DCM for an MMCM and am able to implement the design. But when I try to run the python script, it fails. I get a "Got bad message from FPGA: 240". I would appreciate any guidance you could provide. Thanks.
As you're probably not running at 120MHz you'll need to adjust the UART clock divider. If you provide your clock frequency I can calculate the correct values for you. Oh, and it would be interesting which Virtex 6 model this is, which frequency you can reach and how many LUTs/slices/FFs are used.
The device is the xc6vlx240t. I'm currently using the 200MHz clock on the device and have the MMCM set to 100MHz. The MMCM supports up to 800MHz. Here's the utilization with a depth of 5. Device utilization summary: --------------------------- Selected Device : 6vlx240tff1156-3 Slice Logic Utilization: Number of Slice Registers: 50042 out of 301440 16% Number of Slice LUTs: 86029 out of 150720 57% Number used as Logic: 86028 out of 150720 57% Number used as Memory: 1 out of 58400 0% Number used as SRL: 1 Slice Logic Distribution: Number of LUT Flip Flop pairs used: 86548 Number with an unused Flip Flop: 36506 out of 86548 42% Number with an unused LUT: 519 out of 86548 0% Number of fully used LUT-FF pairs: 49523 out of 86548 57% Number of unique control sets: 12 IO Utilization: Number of IOs: 3 Number of bonded IOBs: 3 out of 600 0% Specific Feature Utilization: Number of BUFG/BUFGCTRLs: 4 out of 32 12%
How much headroom do you have frequency-wise? As you have lots of spare flip-flops, you should probably cut the pipeline stages into halves by double-registering their output (XST will balance some logic into this additional clock cycle). You should manage to cross 200MHz that way. The UART clock divider values for 100MHz are "01101100100" and "10100010110".
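The two divider values given here appear to correspond to a 115200 baud UART at 100 MHz: 01101100100 is 868, about 100 MHz / 115200, and 10100010110 is 1302, i.e. 1.5 bit periods, which would be the delay from the start-bit edge to the middle of the first data bit. That interpretation, the 115200 baud rate and the 11-bit width are inferences from the numbers, not taken from the project source; the helper below is hypothetical and simply reproduces them for other clock frequencies.

```python
def uart_dividers(clock_hz, baud=115_200, width=11):
    """Return (bit-period divider, 1.5-bit-period divider) as zero-padded binary strings.

    Assumption: the UART counts clock cycles per bit and waits 1.5 bit periods
    after the start edge before sampling the first data bit.
    """
    bit = clock_hz / baud
    full = int(bit + 0.5)            # round half up
    first = int(1.5 * bit + 0.5)
    if first >= 1 << width:
        raise ValueError(f"divider {first} does not fit in {width} bits")
    return format(full, f"0{width}b"), format(first, f"0{width}b")


print(uart_dividers(100_000_000))  # ('01101100100', '10100010110')
```

At 200 MHz the 1.5-bit value (about 2604) would no longer fit in 11 bits, which the check reports; whether the real design's counters can simply be widened is something only the source can confirm.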

deftx
June 10, 2011, 05:28:06 PM

It's likely going to be impossible for these devices to reach economies of scale anywhere near those of GPUs. These particular devices are marketed towards higher-end audiences anyhow, so I imagine there's more room to charge a premium.
I predict GPUs will almost always be more feasible because more of them are produced. The economic incentive to produce them for both gamers and miners will always be high, and that will drive the price down, both through efficiency and through the price the market will bear.
There's always someone who can find that magic combination of components to drive the price down, so we'll see where it actually goes.

pdki
June 10, 2011, 08:08:35 PM

I think with a real ASIC hardware implementation of SHA-256 it should easily be possible to outrun GPUs by at least a factor of 100, because of better space efficiency and the simplicity of the logic involved.
Considering that:
- you can manufacture one of these for ~2M€ and then get 1000s of these chips
- they will not consume much power
- they can be put on cheap boards, because no heavy IO is needed (graphics cards are expensive due to the heavy IO with RAM)
I am sure this will happen if Bitcoin really establishes itself as a currency and USD exchange rates stay in the $10 range. If not, this would be a cheap option for aggressors like governments to take over the network. Much easier and cheaper than trying to shut it down by law.

vx609e
June 11, 2011, 03:31:13 PM (last edit: June 11, 2011, 07:41:07 PM by vx609e)

Hi,
IMO, an ASIC implementation is the way to go. We already have decent RTL (those who contributed to this know who they are and I thank you guys for this). With small modifications to the current RTL, we could easily daisy-chain many "cores" (the easiest implementation with the current state of the project is a token ring over UART; we only need to assign a specific address to each core).
Let's say each manufactured chip would yield 100 MHash/s. We daisy-chain 20 per board (a board with 20 chips on it is not a big deal). That's 2 GHash/s right there. PCB design and manufacturing would be pretty straightforward. I volunteer for that.
The big question: how do we finance an ASIC project? And even more importantly: how do we get it done?
1) Outsource the FPGA2ASIC flow to http://www.icnexus.com.tw/product.php?id=25 (the first company I found; there must be many others). Get chips ASAP and limit the risks. With this forum, I'm sure we could get a small EE team together and do all the Synopsys, BIST, test scan, pad design, routing, etc. crap ourselves, but there are specialists out there who will do it for us, and the chances of success will be much higher with that approach. Being a 100% digital chip (+ regulator and PLL obviously), the project couldn't be easier for these guys (or whatever company gets the contract), not to mention they are already in the business of FPGA2ASIC conversion.
2) Crowd funding with kickstarter.com: if we can get 500 people to pre-order one 2 GHash/s board at $1000 apiece (a truly good deal IMO), we get a $500k budget to do #1. We need 10,000 chips. I think the budget makes sense if we spend $250k on design, $100k on chips ($10 apiece), $50k for tape-out (might be included in the design cost; we need to check with the contractor), $10k on PCBs and assembly, plus the rest for overhead. Once we get a real quote from a contractor, we can adjust the cost per board. All I'm putting here are ballpark figures to show the potential of this approach.
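The ballpark numbers in this proposal can be checked for internal consistency. A quick sketch using only the figures in the post (500 pre-orders at $1000, 20 chips of 100 MHash/s per board, and the listed cost items); the overhead is simply whatever is left over.

```python
boards, price_usd = 500, 1000
chips_per_board, mhs_per_chip = 20, 100

budget = boards * price_usd                      # crowd-funded total
chips_needed = boards * chips_per_board          # one chip per slot on every board
board_mhs = chips_per_board * mhs_per_chip       # hash rate of one board

costs = {
    "design": 250_000,
    "chips": chips_needed * 10,                  # $10 per chip
    "tape-out": 50_000,
    "PCBs and assembly": 10_000,
}
overhead = budget - sum(costs.values())

print(f"budget ${budget:,}, {chips_needed:,} chips, {board_mhs} MH/s per board")
print(f"itemized ${sum(costs.values()):,}, leaving ${overhead:,} for overhead")
```

The figures are self-consistent: 10,000 chips, 2 GHash/s per board, and $90k left for overhead after $410k of itemized costs.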
So far in my career all I've done is deal with PCB, FPGA and ASIC designs; this project seems very realistic to me. But maybe I'm daydreaming; please bring me back to earth if I'm doing so. Feedback, suggestions and comments very welcome.