Bitcoin Forum
December 11, 2016, 06:19:29 AM *
News: Latest stable version of Bitcoin Core: 0.13.1  [Torrent].
 
   Home   Help Search Donate Login Register  
Pages: « 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 [38] 39 40 41 42 43 44 45 46 47 48 49 »
  Print  
Author Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013)  (Read 403246 times)
iidx
Jr. Member
*
Offline Offline

Activity: 35


View Profile
April 15, 2013, 07:24:50 AM
 #741

Looks good!  I tried to do the same thing on a V6 LX130T (use almost all DSPs and pipeline the rest of the LUT adders), but there aren't enough registers in that device for tx_w and tx_state delays Sad.  so many 512 and 256 bit registers...

BTW, what does Xpower report for that design at 400 MHz?
1481437169
Hero Member
*
Offline Offline

Posts: 1481437169

View Profile Personal Message (Offline)

Ignore
1481437169
Reply with quote  #2

1481437169
Report to moderator
1481437169
Hero Member
*
Offline Offline

Posts: 1481437169

View Profile Personal Message (Offline)

Ignore
1481437169
Reply with quote  #2

1481437169
Report to moderator
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction. Advertise here.
1481437169
Hero Member
*
Offline Offline

Posts: 1481437169

View Profile Personal Message (Offline)

Ignore
1481437169
Reply with quote  #2

1481437169
Report to moderator
fpgaminer
Hero Member
*****
Offline Offline

Activity: 546



View Profile WWW
April 15, 2013, 07:49:13 AM
 #742

Quote
BTW, what does Xpower report for that design at 400 MHz?
Vivado said ~8-9W, but I don't have it set up with the right information for it to make an accurate measurement.  Using my Kill-a-Watt I estimate about 15W.

I hacked support into MPBM for this new firmware, and she's happily mining away now.  Die temperature is 62C using just the stock cooling on the KC705.  Cool

kingcoin
Sr. Member
****
Offline Offline

Activity: 262


View Profile
April 15, 2013, 08:17:40 AM
 #743

I have just pushed the experimental KC705 code to the repo.  Here is the project.  This is a DSP48E1 based design, and I have compiled and run it at 400MH/s.  I

Great! Thank you. I thought it would be interesting to browse the DSP48 code to see how you can archive the impressive performance.
anomalies
Newbie
*
Offline Offline

Activity: 13



View Profile
April 17, 2013, 01:36:43 PM
 #744

hi, total newbs here.  Grin

just wanna ask since i got this fpga for free (my friends bought it and decide not use it for whatever reason), could i use this for BTC mining?

Genesys™ Virtex-5 FPGA Development Board
http://www.digilentinc.com/Products/Detail.cfm?Prod=GENESYS

thank you for your kind answer.

regards,
Reggie0
Member
**
Offline Offline

Activity: 102


View Profile
April 17, 2013, 03:02:41 PM
 #745

fpgaminer: is there any advantage using "{a,b,c}<={x,y,z};" instead of "a<=x;b<=y;c<=z;"  ?
(My opinion is it only helps to make more readable code.)
Reggie0
Member
**
Offline Offline

Activity: 102


View Profile
April 17, 2013, 05:20:49 PM
 #746

hi, total newbs here.  Grin

just wanna ask since i got this fpga for free (my friends bought it and decide not use it for whatever reason), could i use this for BTC mining?

Genesys™ Virtex-5 FPGA Development Board
http://www.digilentinc.com/Products/Detail.cfm?Prod=GENESYS

thank you for your kind answer.

regards,

Probably you can use it, but it will be slow, because 50k logic gate is not enough to use fully unrolled pipes. As i know Spartan-6 LX90T produces 90MH/s, and it has almost twice gates.
ihtfp
Newbie
*
Offline Offline

Activity: 12


View Profile
April 17, 2013, 05:27:20 PM
 #747

hi, total newbs here.  Grin

just wanna ask since i got this fpga for free (my friends bought it and decide not use it for whatever reason), could i use this for BTC mining?

Genesys™ Virtex-5 FPGA Development Board
http://www.digilentinc.com/Products/Detail.cfm?Prod=GENESYS

thank you for your kind answer.

regards,

hi, total newbs here.  Grin

just wanna ask since i got this fpga for free (my friends bought it and decide not use it for whatever reason), could i use this for BTC mining?

Genesys™ Virtex-5 FPGA Development Board
http://www.digilentinc.com/Products/Detail.cfm?Prod=GENESYS

thank you for your kind answer.

regards,

Probably you can use it, but it will be slow, because 50k logic gate is not enough to use fully unrolled pipes. As i know Spartan-6 LX90T produces 90MH/s, and it has almost twice gates.
I wouldn't use it.  This FPGA only has 28k flip flops.  The Spartan6 LX150 has 184k for comparison.   As Reggie0 said, you wouldn't be able to use fully unrolled logic.

ihtfp
Newbie
*
Offline Offline

Activity: 12


View Profile
April 17, 2013, 05:56:49 PM
 #748

Looks good!  I tried to do the same thing on a V6 LX130T (use almost all DSPs and pipeline the rest of the LUT adders), but there aren't enough registers in that device for tx_w and tx_state delays Sad.  so many 512 and 256 bit registers...


   If you are short on flip flops, have you considered using the BRAMs?  You would need 11 primitives (there are 264 in the LX130T) to make a by 792 bit wide memory.  You can set the BRAM to 'write first' mode, which will echo the data to the output.  The clk-to-out for unpipelined BRAM is ~2.0ns...slower than FF. 
   Since the BRAMs are dual port, you can use both sides of the memory (with different locked addresses), you can get enough storage for 48 stages of a fully unrolled algorithm.   
   I've never tried this, but was just thinking of how to make use of all the unused BRAM laying around.  I usually run out of LUTs, but need to rethink if this is worthwhile with the DSP48 implementation.

anomalies
Newbie
*
Offline Offline

Activity: 13



View Profile
April 17, 2013, 06:32:01 PM
 #749

@ihftp: thanks for the info.. now i know why he didn't use it.  Grin
i got 5 of those though.. shame, can't be fully utilised it.
iidx
Jr. Member
*
Offline Offline

Activity: 35


View Profile
April 17, 2013, 09:18:10 PM
 #750

I think the problem is linking 11 BRAMs together requires a lot of LUTs for address decode/routing since the BRAMs are arranged in columns throughout the chip.  Plus linking 11 together would probably result in a minimum period much higher than 2.0ns (2.0 ns is for 1 BRAM I think).

So, you would need 128 (hashers) * 11 (BRAMs) for one pipeline stage = 1408 total BRAMs.  Of course, you're not suggesting you use BRAM for all the delay.  However, I think the slices you would sacrifice to connect the BRAMs and create their address logic would be more expensive than just using the built in FFs or DMEMs (plus the speed hit).

I'm hoping by floor planning each hashing module I can get to quick speeds.  Currently the logic delay I am facing is only around ~2.0 ns, with the routes taking the rest.  So with some nice routing I would hopefully meet my target.

The V6LX130 isn't even as big as the S6 150, but at least is has DSP48s.

I may also need to cut down the PCIe link from 4x to 1x and reduce its performance settings to regain some of the space that is being used up.

IIDX

Looks good!  I tried to do the same thing on a V6 LX130T (use almost all DSPs and pipeline the rest of the LUT adders), but there aren't enough registers in that device for tx_w and tx_state delays Sad.  so many 512 and 256 bit registers...


   If you are short on flip flops, have you considered using the BRAMs?  You would need 11 primitives (there are 264 in the LX130T) to make a by 792 bit wide memory.  You can set the BRAM to 'write first' mode, which will echo the data to the output.  The clk-to-out for unpipelined BRAM is ~2.0ns...slower than FF. 
   Since the BRAMs are dual port, you can use both sides of the memory (with different locked addresses), you can get enough storage for 48 stages of a fully unrolled algorithm.   
   I've never tried this, but was just thinking of how to make use of all the unused BRAM laying around.  I usually run out of LUTs, but need to rethink if this is worthwhile with the DSP48 implementation.


ihtfp
Newbie
*
Offline Offline

Activity: 12


View Profile
April 17, 2013, 09:32:45 PM
 #751

IIDX,
 
    The addressing would be constant, so no decoding would be needed.  They would be tied off to constants.
    The 2.0ns is the clk-to-out time for a data output.  Since all outputs are in parallel, (each BRAM configured as x72, and grouped together to give very wide access),the individual BRAM bit delay would not change. No demuxing of outputs would be necessary.
    The number of BRAMs needed is only half what you show, since you can use both sides (Port A & Port B) independently (assign each side a fixed, but different address).
    Yes, you are right though re getting the data from the BRAMs to the LUTs needed for the computation.  There is a routing delay which is probably too large.
    Obviously this is not the optimum solution, only bringing it up as a last resort if available flip flops have expired.
Regards,
ihtfp


I think the problem is linking 11 BRAMs together requires a lot of LUTs for address decode/routing since the BRAMs are arranged in columns throughout the chip.  Plus linking 11 together would probably result in a minimum period much higher than 2.0ns (2.0 ns is for 1 BRAM I think).

So, you would need 128 (hashers) * 11 (BRAMs) for one pipeline stage = 1408 total BRAMs.  Of course, you're not suggesting you use BRAM for all the delay.  However, I think the slices you would sacrifice to connect the BRAMs and create their address logic would be more expensive than just using the built in FFs or DMEMs (plus the speed hit).

I'm hoping by floor planning each hashing module I can get to quick speeds.  Currently the logic delay I am facing is only around ~2.0 ns, with the routes taking the rest.  So with some nice routing I would hopefully meet my target.

The V6LX130 isn't even as big as the S6 150, but at least is has DSP48s.

I may also need to cut down the PCIe link from 4x to 1x and reduce its performance settings to regain some of the space that is being used up.

IIDX

Looks good!  I tried to do the same thing on a V6 LX130T (use almost all DSPs and pipeline the rest of the LUT adders), but there aren't enough registers in that device for tx_w and tx_state delays Sad.  so many 512 and 256 bit registers...


   If you are short on flip flops, have you considered using the BRAMs?  You would need 11 primitives (there are 264 in the LX130T) to make a by 792 bit wide memory.  You can set the BRAM to 'write first' mode, which will echo the data to the output.  The clk-to-out for unpipelined BRAM is ~2.0ns...slower than FF. 
   Since the BRAMs are dual port, you can use both sides of the memory (with different locked addresses), you can get enough storage for 48 stages of a fully unrolled algorithm.   
   I've never tried this, but was just thinking of how to make use of all the unused BRAM laying around.  I usually run out of LUTs, but need to rethink if this is worthwhile with the DSP48 implementation.


AJRGale
Hero Member
*****
Offline Offline

Activity: 728



View Profile
April 18, 2013, 02:39:54 AM
 #752

So, looking into this whole mining with fpga system, and this code you people are working on, what is the required Logic cells/gates required for a full roll out? also whats the smallest unit you can get it running on? (the bare minimal for a half roll out (what ever you call it?))

I just want to dip my toe into the FPGA mining with a cheap and nasty chip set Wink

just tell me to piddle off else where if its the wrong spot to ask
fpgaminer
Hero Member
*****
Offline Offline

Activity: 546



View Profile WWW
April 18, 2013, 05:14:42 AM
 #753

Quote
fpgaminer: is there any advantage using "{a,b,c}<={x,y,z};" instead of "a<=x;b<=y;c<=z;"  ?
(My opinion is it only helps to make more readable code.)
No advantage, no.  As you pointed out, it would only be for readability.

ihtfp
Newbie
*
Offline Offline

Activity: 12


View Profile
April 18, 2013, 04:56:34 PM
 #754

So, looking into this whole mining with fpga system, and this code you people are working on, what is the required Logic cells/gates required for a full roll out? also whats the smallest unit you can get it running on? (the bare minimal for a half roll out (what ever you call it?))

I just want to dip my toe into the FPGA mining with a cheap and nasty chip set Wink

just tell me to piddle off else where if its the wrong spot to ask

AJRGale,
    I think you'll want at least a Spartan6 LX150.  This is the cheapest device I would use.  I would only run a fully pipelined implementation -- one that can do one hash per clock cycle.  If you can get a hold of a Kintex7 or Virtex7 board you'll be a lot better because you can instantiate more miners. 
    fpgaminer has posted a lot of useful code on github.
    I don't speak Altera, so not sure on specific devices.
AJRGale
Hero Member
*****
Offline Offline

Activity: 728



View Profile
April 19, 2013, 03:21:57 PM
 #755

So, looking into this whole mining with fpga system, and this code you people are working on, what is the required Logic cells/gates required for a full roll out? also whats the smallest unit you can get it running on? (the bare minimal for a half roll out (what ever you call it?))

I just want to dip my toe into the FPGA mining with a cheap and nasty chip set Wink

just tell me to piddle off else where if its the wrong spot to ask

AJRGale,
    I think you'll want at least a Spartan6 LX150.  This is the cheapest device I would use.  I would only run a fully pipelined implementation -- one that can do one hash per clock cycle.  If you can get a hold of a Kintex7 or Virtex7 board you'll be a lot better because you can instantiate more miners. 
    fpgaminer has posted a lot of useful code on github.
    I don't speak Altera, so not sure on specific devices.


so 150K gates? like a Cyclone V? (no idea what gates to logic cells ratios really are) ...so that means 75K gates for half miner?

Ether way, cant find a Spartan6 LX150, can find a http://www.adafruit.com/products/451 "DE0-Nano - Altera Cyclone IV FPGA starter board "
a miner could run on it, buut, only the smallest one to what I've read out of here, at 5Mh/s...

maybe i should look at the code and work out how to use the dev suite, maybe it might tell me what it needs to run i have no idea what I'll be looking at though :/
senseless
Sr. Member
****
Offline Offline

Activity: 388



View Profile
April 19, 2013, 03:41:41 PM
 #756

I have just pushed the experimental KC705 code to the repo.

Thanks!

I ordered my AC701 today. I'm playing with the eval software now. The clocks run a little bit slower than the Kintex line, but it has nearly as many DSPs as the chip you're using. I have high hopes for a minimum of 600Mh/s and shooting for 800Mh/s. Initial compile showing 92% dsp usage, 43% lut usage, 67% memory lut usage and a clock of 345mhz or so. Should be able to squeeze another core in there.

I was wondering, did it really take them 2 weeks to process & ship your unit to you after ordering? That's a big yes. They're not going to ship my card for 2 weeks after ordering Sad . Maybe they've got a large order queue? Maybe each card is made to order? no idea. Seems a rather long time to wait though.

AJR,

If you're going to get into it I would highly recommend you get the 705 or the 701.

http://www.xilinx.com/products/boards-and-kits/EK-K7-KC705-G.htm
http://www.xilinx.com/products/boards-and-kits/EK-A7-AC701-G.htm

The 705 will have room for more hashers, but I believe the Artix chip may be more cost effective.


AJRGale
Hero Member
*****
Offline Offline

Activity: 728



View Profile
April 19, 2013, 05:24:32 PM
 #757

I have just pushed the experimental KC705 code to the repo.
....
AJR,

If you're going to get into it I would highly recommend you get the 705 or the 701.
...


that's outside my budget pricing.. would be nice though
if i can get a cheap and nasty going, getting half a dozen coins over the next few months, then i will get one
senseless
Sr. Member
****
Offline Offline

Activity: 388



View Profile
April 19, 2013, 05:52:43 PM
 #758

that's outside my budget pricing.. would be nice though
if i can get a cheap and nasty going, getting half a dozen coins over the next few months, then i will get one

My biggest problem was software. If you're going for used chips, make sure whatever chip you buy has free development software for it. Wanting licensed software for a dev board is one of the reason I bought the kit.
AJRGale
Hero Member
*****
Offline Offline

Activity: 728



View Profile
April 19, 2013, 08:24:30 PM
 #759

that's outside my budget pricing.. would be nice though
if i can get a cheap and nasty going, getting half a dozen coins over the next few months, then i will get one

My biggest problem was software. If you're going for used chips, make sure whatever chip you buy has free development software for it. Wanting licensed software for a dev board is one of the reason I bought the kit.

the nano i was looking at does have "two CDs with the software necessary to 'compile' and 'upload' code to the board. " but not sure if the EP4CE22F17C6N is usable
fpgaminer
Hero Member
*****
Offline Offline

Activity: 546



View Profile WWW
April 19, 2013, 08:53:05 PM
 #760

Quote
So I need to figure out what additional logic exists besides the SHA-256 module, how do they interact with each other and how do they interact with the SHA-256 modules?
First, please note that there are multiple "flavors" of the hashing code, and for the most part they are optimized for synthesis to FPGA targets.  I would highly suggest you hire an ASIC engineer who can take the time to understand SHA-256 and the needs of the Bitcoin proof of work mining algorithm himself.

Second, the SHA-256 hashing units do need a controlling unit.  As you can see in the modules you linked, there are a few top-level signals that are expected to be driven by a controller.  Most importantly rx_state and rx_input.  And you need a controller to check the results, and talk to the outside world.

This is the top-level module for one of the projects you linked to.  In there you will find the code that controls the sha256_transform instances, and how they are connected together.

Pages: « 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 [38] 39 40 41 42 43 44 45 46 47 48 49 »
  Print  
 
Jump to:  

Sponsored by , a Bitcoin-accepting VPN.
Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!