Bitcoin Forum
September 25, 2024, 08:49:00 AM *
  Show Posts
61  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Smaller Devices Now Supported!) on: June 14, 2011, 12:51:57 PM
Don't suppose anyone wants to help me buy this? http://www.dinigroup.com/new/DNBFC_S12_12_Cluster.php
62  Other / Beginners & Help / Re: Newbie restrictions (Please discuss forum policy here.) on: June 12, 2011, 04:18:24 PM
would like whitelisting please
That's nice and all but we would like some justification to put on the paperwork. The paper-pushers don't want us letting out people willy-nilly.

Well I wanted to contribute to the FPGA development thread.  As it stands I have created a new thread in the Newbies section, so most people probably won't read it.
63  Other / Beginners & Help / Re: Newbie restrictions (Please discuss forum policy here.) on: June 12, 2011, 04:09:09 PM
would like whitelisting please
64  Other / CPU/GPU Bitcoin mining hardware / Re: FPGA Development (SHA256 core) on: June 12, 2011, 03:43:44 PM
Wow - that's fairly impressive. I guess precalculating must pay off in a big way, though that's probably not really surprising if you think about it. Managed to get it submitting shares yet? (I'm also curious if you've found a clean way to handle the parts of W that can't be precomputed; it's obviously doable but the obvious ways are really messy.)

Also, you're right about the Cyclone FPGAs not being able to combine combinational functions with registers very effectively. All their registers are hard-wired to the output of the LUTs and other logic devices, which means that if you need to feed a register from somewhere else (like from the output of a register) you can't use the LUT attached to that register for anything else.

I wonder if this'd fit into the EP4CE75...

I haven't spent any time on optimizing W calcs, mainly because the worst case path delay is caused by calculation of the A parameter.  The H+K+W precalc is the simplest way to improve performance as H, K, W are all known in the previous stage.  I get slightly better performance gains by further pipelining the A and E equations, although this seems to benefit Cyclone more than Stratix IV, perhaps because of fast carry chains in the Stratix device?  The difficulty with pipelining the unrolled loop stages is that the equations for A/E change, and special cases need to be handled for the first and last few unrolled stages.

Also I haven't run this on an FPGA card yet, only simulated the core in ModelSim - still need to create a top level file similar to fpgaminers and cascade two of these SHA256 cores.  A fully unrolled and pipelined design will not fit in EP4CE75, you should be going for a partially unrolled solution.

Have you considered using carry-save adders to achieve faster clock speeds?  Using carry-save effectively pipelines long carry chains, and usually means you can achieve an adder throughput at the limiting clock speed that is achievable for 1 combinational LUT stage between each stage of pipeline registers.  I've found that the adder megafunctions included in Altera's tools cannot run as fast.

Seems like a worthy consideration
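
For anyone following along, here is a rough Python model of the carry-save idea from the quote above. It is only a bit-level sketch, not HDL and not anyone's actual miner code; the names `carry_save` and `add_many` are made up for illustration. Each 3:2 compressor step uses only per-bit logic (no carry ripple), and the single carry-propagating add happens once at the end.

```python
MASK = 0xFFFFFFFF  # SHA256 works on 32-bit words


def carry_save(x, y, z):
    """3:2 compressor: reduce three words to a sum word and a carry word.

    Every output bit depends only on the three input bits in the same
    column, so there is no carry chain inside this step.
    """
    s = x ^ y ^ z                                     # per-bit sum
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK   # per-bit carry, moved up one column
    return s, c


def add_many(words):
    """Add any number of 32-bit words modulo 2**32 using carry-save steps."""
    s, c = 0, 0
    for w in words:
        s, c = carry_save(s, c, w)
    # Only this final addition propagates carries across the word.
    return (s + c) & MASK
```

In a pipelined design the `s` and `c` words would be registered between stages and the final add could sit in its own stage; whether that beats the vendor adder on a given device is something the fitter report has to answer.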
65  Other / CPU/GPU Bitcoin mining hardware / Re: FPGA Development (SHA256 core) on: June 12, 2011, 02:26:31 PM
The resource usage is around 44K LE in a Stratix IV (EP4SE530H40C2) for a single core and the clock rate achieved is 240MHz.

Did you reach that frequency with or without placement constraints?

No placement constraints, and virtual pins defined.  Clock rate will probably drop when more of these are packed in but I would still expect > 200MHz on a full device.
66  Other / CPU/GPU Bitcoin mining hardware / Re: FPGA Development (SHA256 core) on: June 12, 2011, 02:17:19 PM

Interested to hear what other people have achieved in terms of clock rate and resource usage.

This post restriction seems a little bit...
Well, let's start a new Newbie FPGA thread here ;)

I am still struggling with a port of TheSeven's serial miner to a Lattice ECP33. The design fits, but the P&R refuses to do its work and says it's too complicated, frustrating...

The ECP33 seems like a slightly small device for this application.  What resource usage does the synthesis tool report? Any higher than 75-80% and you are gonna be causing the fitter grief.
67  Other / CPU/GPU Bitcoin mining hardware / Re: FPGA Development (SHA256 core) on: June 12, 2011, 02:11:38 PM
Wow - that's fairly impressive. I guess precalculating must pay off in a big way, though that's probably not really surprising if you think about it. Managed to get it submitting shares yet? (I'm also curious if you've found a clean way to handle the parts of W that can't be precomputed; it's obviously doable but the obvious ways are really messy.)

Also, you're right about the Cyclone FPGAs not being able to combine combinational functions with registers very effectively. All their registers are hard-wired to the output of the LUTs and other logic devices, which means that if you need to feed a register from somewhere else (like from the output of a register) you can't use the LUT attached to that register for anything else.

I wonder if this'd fit into the EP4CE75...

I haven't spent any time on optimizing W calcs, mainly because the worst case path delay is caused by calculation of the A parameter.  The H+K+W precalc is the simplest way to improve performance as H, K, W are all known in the previous stage.  I get slightly better performance gains by further pipelining the A and E equations, although this seems to benefit Cyclone more than Stratix IV, perhaps because of fast carry chains in the Stratix device?  The difficulty with pipelining the unrolled loop stages is that the equations for A/E change, and special cases need to be handled for the first and last few unrolled stages.

Also I haven't run this on an FPGA card yet, only simulated the core in ModelSim - still need to create a top level file similar to fpgaminers and cascade two of these SHA256 cores.  A fully unrolled and pipelined design will not fit in EP4CE75, you should be going for a partially unrolled solution.
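
To illustrate the H + K + W precalc mentioned above, here is a small Python model (a sketch only, not the VHDL). It relies on the standard SHA256 round, where the next round's H is the current round's G, so H + K + W for round t+1 can be formed while round t is still being computed. The function names and the random test words are made up for illustration; the K and W values here are placeholders, not the real SHA256 constants or a real message schedule.

```python
import random

MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def big_sigma0(a):
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)


def big_sigma1(e):
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)


def ch(e, f, g):
    return ((e & f) ^ (~e & g)) & MASK


def maj(a, b, c):
    return (a & b) ^ (a & c) ^ (b & c)


def finish_round(state, t1):
    """Shared tail of a round once T1 is known."""
    a, b, c, d, e, f, g, _h = state
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)


def run_direct(state, ks, ws):
    """Textbook rounds: H + K + W is added inside each round."""
    for k, w in zip(ks, ws):
        a, b, c, d, e, f, g, h = state
        t1 = (h + big_sigma1(e) + ch(e, f, g) + k + w) & MASK
        state = finish_round(state, t1)
    return state


def run_precalc(state, ks, ws):
    """Same rounds, but H + K + W arrives precomputed from the previous stage."""
    hkw = (state[7] + ks[0] + ws[0]) & MASK
    for t in range(len(ks)):
        # Next round's H is this round's G, so its H + K + W is already known.
        next_hkw = (state[6] + ks[t + 1] + ws[t + 1]) & MASK if t + 1 < len(ks) else 0
        e, f, g = state[4], state[5], state[6]
        t1 = (hkw + big_sigma1(e) + ch(e, f, g)) & MASK
        state = finish_round(state, t1)
        hkw = next_hkw
    return state


def random_words(n, seed):
    """Placeholder 32-bit words for testing; not real SHA256 constants."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(n)]
```

Both versions give the same state after 64 rounds for any inputs; the only difference is that the precalc version takes two additions out of the T1 path in each stage, which is where the clock rate gain would come from.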
68  Other / CPU/GPU Bitcoin mining hardware / Re: FPGA Development (SHA256 core) on: June 12, 2011, 09:51:26 AM
Nah, I am estimating $200k for 40GH/s, with capacity to make $65k a month at current difficulty if BTC hits $30 again.
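
For reference, the usual back-of-envelope behind a number like that, as a Python sketch. It assumes the standard relation that a block takes on average difficulty × 2^32 hashes, and the 50 BTC block reward that applied in 2011; the function names are made up for illustration, and the difficulty is left as an input rather than filled in.

```python
HASHES_PER_DIFFICULTY = 2 ** 32  # average hashes per block at difficulty 1


def monthly_revenue_usd(hash_rate_hs, difficulty, btc_price_usd,
                        block_reward_btc=50.0, days=30):
    """Expected mining revenue over `days` at constant difficulty and price."""
    seconds = days * 86400
    expected_blocks = hash_rate_hs * seconds / (difficulty * HASHES_PER_DIFFICULTY)
    return expected_blocks * block_reward_btc * btc_price_usd


def implied_difficulty(hash_rate_hs, revenue_usd, btc_price_usd,
                       block_reward_btc=50.0, days=30):
    """Invert the formula: the difficulty at which the revenue figure holds."""
    seconds = days * 86400
    return (hash_rate_hs * seconds * block_reward_btc * btc_price_usd
            / (revenue_usd * HASHES_PER_DIFFICULTY))
```

Calling `implied_difficulty(40e9, 65_000, 30)` shows what difficulty the $65k figure corresponds to, and `monthly_revenue_usd` shows how quickly the figure shrinks as difficulty rises.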
69  Other / CPU/GPU Bitcoin mining hardware / FPGA Development (SHA256 core) on: June 12, 2011, 07:34:27 AM
Locked out of the FPGA development thread due to the new 50 post restriction.

I mentioned in the main FPGA development thread (by fpgaminer) that I have coded an unrolled SHA256 using additional pipelining and was asked about resource usage.  I am now compiling with the subscription edition of Quartus II v11.0 with an evaluation license.  The resource usage is around 44K LE in a Stratix IV (EP4SE530H40C2) for a single core and the clock rate achieved is 240MHz.  This device should fit 4 SHA256 pairs with a hash rate approaching 1GH/s.  The EP4SE820 should be capable of 2GH/s.  I have found a system that uses 20 of these on a single card.  Yes, this would be expensive to buy (>$200K I assume) but much smaller and with less power usage / heat than a cluster of computers running 6990s, maybe as low as 1-2kW total?
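
The arithmetic behind the 1GH/s estimate, as a quick Python sketch. The assumption (mine, spelled out here) is that a fully unrolled, fully pipelined SHA256 pair finishes one double hash per clock once the pipeline is full; the function name is made up for illustration.

```python
def hash_rate_mhs(pairs, clock_mhz, hashes_per_clock=1.0):
    """Throughput in MH/s for `pairs` pipelined SHA256 pairs.

    hashes_per_clock=1.0 assumes a fully unrolled pipeline; a partially
    unrolled core would use a fraction here instead.
    """
    return pairs * clock_mhz * hashes_per_clock


# 4 pairs at the 240MHz reported for a single core
single_device = hash_rate_mhs(4, 240)

# Same device if the clock drops to 200MHz once the chip is full
full_device_low = hash_rate_mhs(4, 200)
```

At 240MHz that is 960MH/s, and at 200MHz it is 800MH/s, so the real figure depends on how much clock rate survives packing four pairs into one device.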

As for the cheaper Cyclone IV (EP4CE115F29I7), resource usage is 62K LE (38K combinational functions, 44K registers, it seems Cyclone cannot combine them effectively) and clock rate is 134MHz.  A single SHA256 pair with the additional pipelining would struggle to fit in this device, however I have another version which uses just the precalculation of H + K + W to improve clock rate, which will be smaller.  I have found a card with 27 Cyclone IIIs, wonder how much that would cost...

Interested to hear what other people have achieved in terms of clock rate and resource usage.
70  Bitcoin / Hardware / Re: Official Open Source FPGA Bitcoin Miner (Smaller Devices Now Supported!) on: June 11, 2011, 09:27:06 PM
Just coded a fully unrolled SHA256 in VHDL using two different approaches to maximize clock rate: a simple approach that involves precalculating H + K + W, and a more advanced approach that further pipelines each stage.  Initial compiles targeted Cyclone IV using web edition Quartus (which sucks), with the simple version achieving 110MHz and the advanced version 133MHz.  Will be interested to see the maximum clock rate that can be achieved on Stratix IV.