FPGA Development (SHA256 core)

Author

Topic: FPGA Development (SHA256 core) (Read 13562 times)

OrphanedGland (OP)

Member

Offline

Activity: 70
Merit: 10

FPGA Development (SHA256 core)

June 12, 2011, 07:34:27 AM

#1

Locked out of the FPGA development thread due to the new 50 post restriction.

I mentioned in the main FPGA development thread (by fpgaminer) that I have coded an unrolled sha256 using additional pipelining and was asked about resource usage. I am now compiling with the subscription edition Quartus II v11.0 with an evaluation license. The resource usage is around 44K LE in a Stratix IV (EP4SE530H40C2) for a single core and the clock rate achieved is 240MHz. This device should fit 4 SHA256 pairs with a hash rate approaching 1GH/s. The EP4SE820 should be capable of 2GH/s. I have found a system that uses 20 of these on a single card. Yes this would be expensive to buy (>$200K I assume) but much smaller size and less power usage / heat than a cluster of computers running 6990s, maybe as low as 1-2kW total?

As for the cheaper Cyclone IV (EP4CE115F29I7), resource usage is 62K LE (38K combinational functions, 44K registers, it seems Cyclone cannot combine them effectively) and clock rate is 134MHz. A single SHA256 pair with the additional pipelining would struggle to fit in this device, however I have another version which uses just the precalculation of H + K + W to improve clock rate, which will be smaller. I have found a card with 27 Cyclone IIIs, wonder how much that would cost...

Interested to hear what other people have achieved in terms of clock rate and resource usage.
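As a rough check on the figures in this post, the arithmetic can be sketched in Python. It assumes that a fully unrolled, fully pipelined SHA256 pair produces one double hash per clock cycle once the pipeline is full, which the post implies but does not state. The function name and parameters are illustrative only.

```python
def hash_rate(pairs_per_device, clock_hz, devices=1):
    """Hashes per second, assuming one double-SHA256 result per pair per clock."""
    return pairs_per_device * clock_hz * devices

# Figures from the post: 4 pairs per EP4SE530 at 240 MHz, 20 devices per card.
single = hash_rate(4, 240e6)
card = hash_rate(4, 240e6, devices=20)

print(f"{single / 1e9:.2f} GH/s per device")
print(f"{card / 1e9:.1f} GH/s per 20-device card")
```

Under that assumption a single device comes out at 0.96 GH/s, consistent with "approaching 1GH/s", provided the 240MHz clock holds once four pairs are packed in.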

Jonathan Ryan Owens

Donator
Sr. Member

Offline

Activity: 392
Merit: 252

Re: FPGA Development (SHA256 core)

June 12, 2011, 09:26:39 AM

#2

Quote from: OrphanedGland on June 12, 2011, 07:34:27 AM

Locked out of the FPGA development thread due to the new 50 post restriction.

I mentioned in the main FPGA development thread (by fpgaminer) that I have coded an unrolled sha256 using additional pipelining and was asked about resource usage. I am now compiling with the subscription edition Quartus II v11.0 with an evaluation license. The resource usage is around 44K LE in a Stratix IV (EP4SE530H40C2) for a single core and the clock rate achieved is 240MHz. This device should fit 4 SHA256 pairs with a hash rate approaching 1GH/s. The EP4SE820 should be capable of 2GH/s. I have found a system that uses 20 of these on a single card. Yes this would be expensive to buy (>$200K I assume) but much smaller size and less power usage / heat than a cluster of computers running 6990s, maybe as low as 1-2kW total?

As for the cheaper Cyclone IV (EP4CE115F29I7), resource usage is 62K LE (38K combinational functions, 44K registers, it seems Cyclone cannot combine them effectively) and clock rate is 134MHz. A single SHA256 pair with the additional pipelining would struggle to fit in this device, however I have another version which uses just the precalculation of H + K + W to improve clock rate, which will be smaller. I have found a card with 27 Cyclone IIIs, wonder how much that would cost...

Interested to hear what other people have achieved in terms of clock rate and resource usage.

$200k for 1 GH/s? That's a deal right there..

http://ringcoin.com

kokjo

Legendary

Offline

Activity: 1050
Merit: 1000

You are WRONG!

Re: FPGA Development (SHA256 core)

June 12, 2011, 09:28:07 AM

#3

Quote from: pigki on June 12, 2011, 09:26:39 AM

$200k for 1 GH/s? That's a deal right there..

no for 20GH/s, and very low power usage.

"The whole problem with the world is that fools and fanatics are always so certain of themselves and wiser people so full of doubts." -Bertrand Russell

Jonathan Ryan Owens

Donator
Sr. Member

Offline

Activity: 392
Merit: 252

Re: FPGA Development (SHA256 core)

June 12, 2011, 09:36:02 AM

#4

Quote from: kokjo on June 12, 2011, 09:28:07 AM

Quote from: pigki on June 12, 2011, 09:26:39 AM

$200k for 1 GH/s? That's a deal right there..

no for 20GH/s, and very low power usage.

So $200k for 20 GH/s.. Assuming that BTCUSD stayed above 15, it would only take a year to get your money back.

kokjo

Legendary

Offline

Activity: 1050
Merit: 1000

You are WRONG!

Re: FPGA Development (SHA256 core)

June 12, 2011, 09:45:20 AM

#5

Quote from: pigki on June 12, 2011, 09:36:02 AM

Quote from: kokjo on June 12, 2011, 09:28:07 AM

Quote from: pigki on June 12, 2011, 09:26:39 AM

$200k for 1 GH/s? That's a deal right there..

no for 20GH/s, and very low power usage.

So $200k for 20 GH/s.. Assuming that BTCUSD stayed above 15, it would only take a year to get your money back.

but after that you will get very rich

and with nearly no power usage, you don't have to pay for getting rich

OrphanedGland (OP)

Member

Offline

Activity: 70
Merit: 10

Re: FPGA Development (SHA256 core)

June 12, 2011, 09:51:26 AM

#6

Nah, I am estimating $200k for 40GH/s, with capacity to make $65k a month at current difficulty if btc hits $30 again.
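The monthly figure can be sanity-checked with the usual expected-reward formula: a block takes on average difficulty × 2^32 hashes to find. The sketch below is only an illustration. The 50 BTC block reward is what applied in 2011, but the difficulty value of 567,000 is an assumed, illustrative number that does not come from this thread, and pool fees, variance and rising difficulty are all ignored.

```python
def monthly_revenue_usd(hash_rate, difficulty, btc_usd, block_reward=50.0, days=30):
    """Expected USD earned over `days` at a fixed difficulty and price."""
    hashes_per_block = difficulty * 2**32      # expected hashes to find one block
    blocks_per_day = hash_rate * 86400 / hashes_per_block
    return blocks_per_day * block_reward * btc_usd * days

# 40 GH/s and $30/BTC are from the post; difficulty is an ASSUMED example value.
estimate = monthly_revenue_usd(40e9, difficulty=567_000, btc_usd=30.0)
print(f"${estimate:,.0f} per month")
```

With those inputs the estimate lands in the low-to-mid $60k range, the same ballpark as the $65k quoted, and it scales inversely with whatever difficulty is plugged in.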

Gregers

Newbie

Offline

Activity: 24
Merit: 0

Re: FPGA Development (SHA256 core)

June 12, 2011, 09:57:13 AM

#7

The FPGA stuff sounds like a really interesting investment and in the (unlikely?) event that the bitcoin stuff doesn't pan out then I'm sure you could sell the rig to an internet security company or something. 3 months to make your investment back really doesn't sound bad, though.

makomk

Hero Member

Offline

Activity: 686
Merit: 564

Re: FPGA Development (SHA256 core)

June 12, 2011, 10:56:21 AM

#8

Wow - that's fairly impressive. I guess precalculating must pay off in a big way, though that's probably not really surprising if you think about it. Managed to get it submitting shares yet? (I'm also curious if you've found a clean way to handle the parts of W that can't be precomputed; it's obviously doable but the obvious ways are really messy.)

Also, you're right about the Cyclone FPGAs not being able to combine combinational functions with registers very effectively. All their registers are hard-wired to the output of the LUTs and other logic devices, which means that if you need to feed a register from somewhere else (like from the output of a register) you can't use the LUT attached to that register for anything else.

I wonder if this'd fit into the EP4CE75...

Quad XC6SLX150 Board: 860 MHash/s or so.

OrphanedGland (OP)

Member

Offline

Activity: 70
Merit: 10

Re: FPGA Development (SHA256 core)

June 12, 2011, 02:11:38 PM
Last edit: June 12, 2011, 02:22:38 PM by OrphanedGland

#9

Quote from: makomk on June 12, 2011, 10:56:21 AM

Wow - that's fairly impressive. I guess precalculating must pay off in a big way, though that's probably not really surprising if you think about it. Managed to get it submitting shares yet? (I'm also curious if you've found a clean way to handle the parts of W that can't be precomputed; it's obviously doable but the obvious ways are really messy.)

Also, you're right about the Cyclone FPGAs not being able to combine combinational functions with registers very effectively. All their registers are hard-wired to the output of the LUTs and other logic devices, which means that if you need to feed a register from somewhere else (like from the output of a register) you can't use the LUT attached to that register for anything else.

I wonder if this'd fit into the EP4CE75...

I haven't spent any time on optimizing W calcs, mainly because the worst case path delay is caused by calculation of the A parameter. The H+K+W precalc is the simplest way to improve performance as H, K, W are all known in the previous stage. I get slightly better performance gains by further pipelining the A and E equations, although this seems to benefit Cyclone more than Stratix IV, perhaps because of fast carry chains in the Stratix device? The difficulty with pipelining the unrolled loop stages is that the equations for A/E change, and special cases need to be handled for the first and last few unrolled stages.

Also I haven't run this on an FPGA card yet, only simulated the core in ModelSim. I still need to create a top-level file similar to fpgaminer's and cascade two of these SHA256 cores. A fully unrolled and pipelined design will not fit in the EP4CE75; you should be going for a partially unrolled solution.
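The H+K+W precalculation mentioned above can be illustrated with a small software model. This is a sketch in Python, not the poster's HDL, and the function names (`compress_precalc`, `sha256_single_block`) are hypothetical. The idea it shows: the `h` input of round t is the `g` value of round t-1, and K and W are already known, so `h + K[t] + W[t]` can be formed one stage early and T1 then needs two additions instead of four.

```python
import hashlib
import struct
from math import isqrt

MASK = 0xFFFFFFFF

def primes(n):
    """First n primes by trial division (fine for small n)."""
    found, candidate = [], 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def icbrt(x):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Standard SHA-256 constants, derived from prime roots instead of typed out.
K = [icbrt(p << 96) & MASK for p in primes(64)]
H0 = [isqrt(p << 64) & MASK for p in primes(8)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def schedule(block):
    """Expand a 64-byte block into the 64-word message schedule W."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    return w

def compress_precalc(state, block):
    """SHA-256 compression with H+K+W formed one round ahead."""
    w = schedule(block)
    a, b, c, d, e, f, g, h = state
    hkw = (h + K[0] + w[0]) & MASK            # precalc for round 0
    for t in range(64):
        s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (hkw + s1 + ch) & MASK           # two additions instead of four
        s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (s0 + maj) & MASK
        if t < 63:
            # Next round's h is this round's g, so its H+K+W is available now.
            hkw = (g + K[t + 1] + w[t + 1]) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def sha256_single_block(msg):
    """Hash a message short enough (<= 55 bytes) to fit in one padded block."""
    assert len(msg) <= 55
    padded = msg + b"\x80" + b"\x00" * (55 - len(msg)) + struct.pack(">Q", 8 * len(msg))
    return b"".join(struct.pack(">I", x) for x in compress_precalc(H0, padded))

print(sha256_single_block(b"abc") == hashlib.sha256(b"abc").digest())
```

In hardware the `hkw` value would sit in its own pipeline register per stage. The model only checks that the reordering leaves the digest unchanged; it says nothing about the timing gained.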

OrphanedGland (OP)

Member

Offline

Activity: 70
Merit: 10

Re: FPGA Development (SHA256 core)

June 12, 2011, 02:17:19 PM

#10

Quote from: LazarusLong on June 12, 2011, 11:24:48 AM

Quote from: OrphanedGland on June 12, 2011, 07:34:27 AM

Interested to hear what other people have achieved in terms of clock rate and resource usage.

This post restriction seems a little bit...
Well, let's start a new Newbie FPGA thread here ;)

I am still struggling with a port of TheSeven's serial miner to a Lattice ECP33. The design fits, but P&R refuses to do its work and says it's too complicated. Frustrating...

The ECP33 seems like a slightly small device for this application. What resource usage does the synthesis tool report? Any higher than 75-80% and you are gonna be causing the fitter grief.

ijuz

Newbie

Offline

Activity: 6
Merit: 0

Re: FPGA Development (SHA256 core)

June 12, 2011, 02:19:05 PM

#11

Quote from: OrphanedGland on June 12, 2011, 07:34:27 AM

The resource usage is around 44K LE in a Stratix IV (EP4SE530H40C2) for a single core and the clock rate achieved is 240MHz.

Did you reach that frequency with or without placement constraints?

OrphanedGland (OP)

Member

Offline

Activity: 70
Merit: 10

Re: FPGA Development (SHA256 core)

June 12, 2011, 02:26:31 PM

#12

Quote from: ijuz on June 12, 2011, 02:19:05 PM

Quote from: OrphanedGland on June 12, 2011, 07:34:27 AM

The resource usage is around 44K LE in a Stratix IV (EP4SE530H40C2) for a single core and the clock rate achieved is 240MHz.

Did you reach that frequency with or without placement constraints?

No placement constraints, and virtual pins defined. Clock rate will probably drop when more of these are packed in but I would still expect > 200MHz on a full device.

mjoz

Member

Offline

Activity: 61
Merit: 10

Re: FPGA Development (SHA256 core)

June 12, 2011, 02:29:57 PM

#13

Quote from: kokjo on June 12, 2011, 09:28:07 AM

Quote from: pigki on June 12, 2011, 09:26:39 AM

$200k for 1 GH/s? That's a deal right there..

no for 20GH/s, and very low power usage.

For that kind of money you can buy about 200GH/s through noisy, power-consuming, heat-producing rigs. FPGA has a long way to go unless you're rich and have an irrational desire to go green regardless of the expense.

LCID Fire

Newbie

Offline

Activity: 1
Merit: 0

Re: FPGA Development (SHA256 core)

June 12, 2011, 02:35:24 PM

#14

I would be very interested in that code and perhaps find out whether we can port it to run on GPUs as well.
That's currently the area the miners run on CPU still.

ijuz

Newbie

Offline

Activity: 6
Merit: 0

Re: FPGA Development (SHA256 core)

June 12, 2011, 02:36:57 PM

#15

Quote from: OrphanedGland on June 12, 2011, 02:26:31 PM

No placement constraints, and virtual pins defined. Clock rate will probably drop when more of these are packed in but I would still expect > 200MHz on a full device.

Nice, Quartus seems to be much better in this regard than ISE.
I built a quarter-sized pipeline (no loopback, just to see how it behaves) for Virtex6 and it worked decently; a full-sized pipeline already took a _very_ long build time and the reachable frequency was pretty bad or edatastic.

ijuz

Newbie

Offline

Activity: 6
Merit: 0

Re: FPGA Development (SHA256 core)

June 12, 2011, 02:38:12 PM

#16

Quote from: LCID Fire on June 12, 2011, 02:35:24 PM

I would be very interested in that code and perhaps find out whether we can port it to run on GPUs as well.

How do you run Verilog code on a GPU? ;-)

mpfrank

Sr. Member

Offline

Activity: 247
Merit: 250

Cosmic Cubist

Re: FPGA Development (SHA256 core)

June 12, 2011, 02:40:45 PM

#17

Quote from: OrphanedGland on June 12, 2011, 02:11:38 PM

I haven't spent any time on optimizing W calcs, mainly because the worst case path delay is caused by calculation of the A parameter. The H+K+W precalc is the simplest way to improve performance as H, K, W are all known in the previous stage. I get slightly better performance gains by further pipelining the A and E equations, although this seems to benefit Cyclone more than Stratix IV, perhaps because of fast carry chains in the Stratix device? The difficulty with pipelining the unrolled loop stages is that the equations for A/E change, and special cases need to be handled for the first and last few unrolled stages.

Also I haven't run this on an FPGA card yet, only simulated the core in ModelSim. I still need to create a top-level file similar to fpgaminer's and cascade two of these SHA256 cores. A fully unrolled and pipelined design will not fit in the EP4CE75; you should be going for a partially unrolled solution.

Have you considered using carry-save adders to achieve faster clock speeds? Using carry-save effectively pipelines long carry chains, and usually means you can achieve an adder throughput at the limiting clock speed that is achievable for 1 combinational LUT stage between each stage of pipeline registers. I've found that the adder megafunctions included in Altera's tools cannot run as fast.
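For readers unfamiliar with carry-save addition, here is a minimal software sketch of the arithmetic. It is an illustration only, not code from anyone in the thread, and the names `csa` and `add_many_csa` are made up for this example. Each 3:2 compressor turns three operands into a sum word and a carry word using only bitwise logic, so no carry ripples across the 32 bits until the single final addition.

```python
MASK = 0xFFFFFFFF

def csa(x, y, z):
    """3:2 compressor: three 32-bit operands in, (sum, carry) out, no carry ripple."""
    partial_sum = x ^ y ^ z
    carry = (((x & y) | (x & z) | (y & z)) << 1) & MASK
    return partial_sum, carry

def add_many_csa(operands):
    """Reduce operands to two with CSAs, then do one carry-propagate add (mod 2**32)."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = csa(ops.pop(), ops.pop(), ops.pop())
        ops += [s, c]
    return sum(ops) & MASK

# Five-operand sum of the kind that appears in a SHA-256 round (arbitrary test values).
terms = [0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0x428A2F98, 0xFFFFFFFF]
print(add_many_csa(terms) == sum(terms) & MASK)
```

In a pipelined design the sum and carry words could be registered separately between stages and only resolved where a true binary value is needed. Whether that beats the dedicated carry chains on a given FPGA family would have to be measured.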

phillipsjk

Legendary

Offline

Activity: 1008
Merit: 1001

Let the chips fall where they may.

Re: FPGA Development (SHA256 core)

June 12, 2011, 03:24:42 PM
Last edit: June 12, 2011, 03:35:21 PM by phillipsjk

#18

Quote from: mjoz on June 12, 2011, 02:29:57 PM

For that kind of money you can buy about 200GH/s through noisy, power-consuming, heat-producing rigs. FPGA has a long way to go unless you're rich and have an irrational desire to go green regardless of the expense.

If you are generating your own power, the start-up costs are cheaper if you can reduce power usage significantly. I did the math for solar power in this post.

Quote

12 Watts is 288 Watt-hours (1.04 MJ) per day. A 12V battery would need a capacity of at least 24 Amp-hours to supply that much load all day (6 amp-hours for a 48V battery). You will want to be able to fully charge the battery in full sun during the day. To do this, the solar panels must charge the battery within ~8 hours (preferably 6). That will take at least 36Watts (assuming 100% battery efficiency) + the 12Watts you are constantly drawing (48 Watts). Round up to a 60Watt panel. For a 60Watt load, you need to multiply all those numbers by 5 (30 amp-hour 48 Volt battery, 300 Watts of panels).

For a 600 Watt load, multiply the 60 Watt load by 10: 300 amp-hour 48V battery, 3000 Watts of panels.

The beauty of it is that once the infrastructure is paid off, you have "free" (but limited) power. You can still keep power hungry machines using grid power on standby in case the network hash rate drops for whatever reason.

Edit: batteries would probably need replacing every 5 years.

PS: Solar panels are just an example. Once bitcoin mining goes industrial, we will see the large miners building mega-projects.
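The solar sizing arithmetic quoted in this post can be reproduced with a few lines of Python. This is a sketch under the same simplifications as the quote (100% battery efficiency, a full recharge within 8 hours of sun); the function name and the dictionary keys are illustrative.

```python
def solar_sizing(load_w, battery_v, charge_hours=8.0):
    """Battery and panel sizing for a constant load, ignoring all losses."""
    wh_per_day = load_w * 24                      # energy the load draws per day
    return {
        "wh_per_day": wh_per_day,
        "battery_ah": wh_per_day / battery_v,     # capacity to run a full day
        # Panels must recharge the battery in charge_hours while also feeding the load.
        "min_panel_w": wh_per_day / charge_hours + load_w,
    }

for load, volts in [(12, 12), (60, 48), (600, 48)]:
    print(load, "W load:", solar_sizing(load, volts))
```

For the 12 Watt case this gives 288 Wh per day, 24 Ah at 12V and a 48 Watt panel minimum, matching the quote. The 300 Watt and 3000 Watt panel figures in the quote come from scaling the rounded-up 60 Watt panel, so they sit above the unrounded minimums of 240 and 2400 Watts.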

James' OpenPGP public key fingerprint: EB14 9E5B F80C 1F2D 3EBE 0A2F B3DE 81FF 7B9D 5160

OrphanedGland (OP)

Member

Offline

Activity: 70
Merit: 10

Re: FPGA Development (SHA256 core)

June 12, 2011, 03:43:44 PM

#19

Quote from: mpfrank on June 12, 2011, 02:40:45 PM

Have you considered using carry-save adders to achieve faster clock speeds? Using carry-save effectively pipelines long carry chains, and usually means you can achieve an adder throughput at the limiting clock speed that is achievable for 1 combinational LUT stage between each stage of pipeline registers. I've found that the adder megafunctions included in Altera's tools cannot run as fast.

Seems like a worthy consideration

njloof

Member

Offline

Activity: 73
Merit: 10

Re: FPGA Development (SHA256 core)

June 12, 2011, 05:52:49 PM

#20

Quote from: eturnerx on June 12, 2011, 05:49:25 PM

Subscribe (don't mind me) :D

Am I smoking crack or is there a "notify" button that does this same thing without the threadbump?
