Bitcoin Forum
Author Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013)  (Read 403182 times)
kramble
April 19, 2013, 08:57:26 PM
 #761

the nano i was looking at does have "two CDs with the software necessary to 'compile' and 'upload' code to the board. " but not sure if the EP4CE22F17C6N is usable

The DE0-Nano is great for getting started with FPGAs, but it won't make you any useful coin. 5 MHash/sec is about right; it will go faster, but not without risk of overheating, and certainly no more than about 25 MHash/sec (using Makomk's modified power supply). To put that in context, 5 MHash/sec will currently earn approx 0.0003 bitcoin per day (and that falls by roughly 20% every 2 weeks as the difficulty increases).
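As a rough check on the earnings figure, here is a minimal Python sketch of the usual expected-income estimate (on average one block per difficulty * 2**32 hashes). The difficulty of 8,000,000 is an assumed round number for illustration, not a value from the thread; the 25 BTC block reward is the reward in force in April 2013. The function names are made up for this example.

```python
# Expected mining income for a given hash rate (illustrative sketch only).
# ASSUMPTION: difficulty=8_000_000 is a round placeholder, not quoted in the thread.

SECONDS_PER_DAY = 86_400


def expected_btc_per_day(hashrate_hps, difficulty, block_reward_btc=25.0):
    """On average, finding one block takes difficulty * 2**32 hashes."""
    hashes_per_day = hashrate_hps * SECONDS_PER_DAY
    expected_blocks = hashes_per_day / (difficulty * 2**32)
    return expected_blocks * block_reward_btc


def after_difficulty_rises(btc_per_day, periods, drop_per_period=0.20):
    """Income after `periods` two-week steps, each cutting income by ~20%."""
    return btc_per_day * (1.0 - drop_per_period) ** periods


if __name__ == "__main__":
    today = expected_btc_per_day(5e6, difficulty=8_000_000)
    print(f"5 MHash/sec today:      {today:.6f} BTC/day")
    print(f"after 4 weeks (2 steps): {after_difficulty_rises(today, 2):.6f} BTC/day")
```

With those assumed inputs the estimate lands close to the 0.0003 BTC/day quoted above; plug in the real difficulty to get a current figure.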

If you do decide to get a DE0-Nano, start with the DE2_70_Unoptimized_Pipelined project. You'll need to increase CONFIG_LOOP_LOG2 to 4 to get it to compile (that's one eighth of a core, I think). I cheated and edited the fpgaminer.qsf file directly to configure it for the EP4CE22, but it's probably safer to create a new project from scratch and add in the source files.

Mark

Github https://github.com/kramble BLC BkRaMaRkw3NeyzsZ2zUgXsNLogVVkQ1iPV
AJRGale
April 20, 2013, 06:22:28 AM
 #762

the nano i was looking at does have "two CDs with the software necessary to 'compile' and 'upload' code to the board. " but not sure if the EP4CE22F17C6N is usable

The DE0-Nano is great for getting started with FPGAs, but it won't make you any useful coin. 5 MHash/sec is about right; it will go faster, but not without risk of overheating, and certainly no more than about 25 MHash/sec (using Makomk's modified power supply). To put that in context, 5 MHash/sec will currently earn approx 0.0003 bitcoin per day (and that falls by roughly 20% every 2 weeks as the difficulty increases).

If you do decide to get a DE0-Nano, start with the DE2_70_Unoptimized_Pipelined project. You'll need to increase CONFIG_LOOP_LOG2 to 4 to get it to compile (that's one eighth of a core, I think). I cheated and edited the fpgaminer.qsf file directly to configure it for the EP4CE22, but it's probably safer to create a new project from scratch and add in the source files.

Mark

Heh, that's about $1 a month (at the ~$100/coin mark), so that thing is not going to break even in this lifetime, I think.

So I might have to go hunt for a second-hand Spartan-6 with 150K gates (or similar).

So the question is: will any 150K-gate FPGA work with the full miner, or is there something I'm missing? (E.g. http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,790&Prod=BASYS2 with 250K gates: slap on the full miner and bam, 1 hash per clock?)
mnemonix
April 20, 2013, 08:31:59 AM
 #763

Thanks for the work you put into the miner!

I ported the Xilinx_VHDL miner to the ML605 dev board.

It was actually straightforward: I replaced the DCM with the newer Virtex-6 equivalent, wired the pins to RS-232 and clock, adjusted the baud rate, and it ran instantly.

It does 200 MHash/sec, and the device is about 85% used ...
kramble
April 20, 2013, 08:52:43 AM
 #764

So the question is: will any 150K-gate FPGA work with the full miner, or is there something I'm missing? (E.g. http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,790&Prod=BASYS2 with 250K gates: slap on the full miner and bam, 1 hash per clock?)

NO!! Don't confuse gates with LEs (logic elements). Older FPGAs often quoted a gate count (such as the Spartan 3E with 250K gates that you linked to). Newer FPGAs quote a Logic Element (or Logic Cell) count (and Google tells me there are 12 gates to an LE). So a Spartan 6 LX150 with 147,443 logic cells roughly equates to 1.77 million gates by my calculation (I can't find any direct quote for the actual figure, so take that as very approximate). You can see the Spartan family spec at http://www.xilinx.com/support/documentation/data_sheets/ds160.pdf
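The conversion in the post is a single multiplication, sketched here in Python. The 12 gates per logic cell is the rule of thumb quoted above, not a datasheet value, and the function name is made up for this example.

```python
# Rough "equivalent gates" estimate from a logic cell count.
# ASSUMPTION: 12 gates per logic cell is only the rule of thumb from the post.

GATES_PER_LOGIC_CELL = 12


def approx_gates(logic_cells, gates_per_cell=GATES_PER_LOGIC_CELL):
    """Very approximate gate count for a device specified in logic cells."""
    return logic_cells * gates_per_cell


if __name__ == "__main__":
    lx150_cells = 147_443  # Spartan-6 LX150 logic cell count quoted in the post
    print(f"LX150: about {approx_gates(lx150_cells) / 1e6:.2f} million gates")
```

The point is the order of magnitude: a device sold as "250K gates" is far smaller than an LX150, whatever the exact conversion factor.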

The board you linked to will be (almost) useless for mining. You need to look for a purpose-built Spartan LX150 based miner and use the firmware (bitstream) that comes with it (and even then the economics look pretty grim).

If you want to compile your own bitstream for the Spartan series, you can download free software from the Xilinx web site http://www.xilinx.com/products/design-tools/ise-design-suite/ise-webpack.htm but beware that it is limited to the smaller devices (LX75 maximum I think, but do your own due diligence). You need the full (very expensive) version to compile for the LX150.

Regards
Mark

AJRGale
April 20, 2013, 10:02:25 AM
 #765

So the question is: will any 150K-gate FPGA work with the full miner, or is there something I'm missing? (E.g. http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,790&Prod=BASYS2 with 250K gates: slap on the full miner and bam, 1 hash per clock?)

NO!! Don't confuse gates with LEs (logic elements). Older FPGAs often quoted a gate count (such as the Spartan 3E with 250K gates that you linked to). Newer FPGAs quote a Logic Element (or Logic Cell) count (and Google tells me there are 12 gates to an LE). So a Spartan 6 LX150 with 147,443 logic cells roughly equates to 1.77 million gates by my calculation (I can't find any direct quote for the actual figure, so take that as very approximate). You can see the Spartan family spec at http://www.xilinx.com/support/documentation/data_sheets/ds160.pdf

The board you linked to will be (almost) useless for mining. You need to look for a purpose-built Spartan LX150 based miner and use the firmware (bitstream) that comes with it (and even then the economics look pretty grim).

If you want to compile your own bitstream for the Spartan series, you can download free software from the Xilinx web site http://www.xilinx.com/products/design-tools/ise-design-suite/ise-webpack.htm but beware that it is limited to the smaller devices (LX75 maximum I think, but do your own due diligence). You need the full (very expensive) version to compile for the LX150.

Regards
Mark

Ah, sorry for my newbishness; I've never played with one of these devices (blame the two companies for being so secretive unless you buy their $5000 suite).
My mistake. So when a company quotes a "gates" number, I have to look for ALMs, LEs, slices, etc. instead?

Basically I want to know what a fully unrolled miner fits on and how many LEs it needs; then I'll go to Digi-Key, look something up, and go from there.
minernb
April 20, 2013, 10:56:36 PM
 #766

Basically I want to know what a fully unrolled miner fits on and how many LEs it needs; then I'll go to Digi-Key, look something up, and go from there.

Hi,

The Altera DE1 has 18K LE.

The non-optimized version fits using a factor of 4 in the roll(?), for a total of 16K LEs used. I get 3.10 MH/s.
The makomk_mod version fits using a factor of 2 (but all work is rejected, I don't know why!). It reports 12 MH/s.
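The two figures above fit the usual relation for a partially rolled core, where one hash takes 2**CONFIG_LOOP_LOG2 clock cycles. The Python sketch below rests on two assumptions that the post does not state: that the "factor" is the CONFIG_LOOP_LOG2 setting, and that the core is clocked at 50 MHz.

```python
# Hash rate of a partially rolled SHA-256 core (illustrative sketch).
# ASSUMPTIONS: "factor" == CONFIG_LOOP_LOG2, and a 50 MHz core clock.


def hashrate_mhs(clock_mhz, loop_log2):
    """Each hash needs 2**loop_log2 clock cycles, so rate = clock / 2**loop_log2."""
    return clock_mhz / (1 << loop_log2)


if __name__ == "__main__":
    for loop_log2 in (0, 2, 4):
        rate = hashrate_mhs(50, loop_log2)
        print(f"CONFIG_LOOP_LOG2={loop_log2}: {rate:.3f} MH/s at 50 MHz")
```

Under those assumptions a factor of 4 gives about 3.1 MH/s and a factor of 2 about 12.5 MH/s, which lines up with the reported 3.10 and 12 MH/s.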


kramble
April 21, 2013, 08:30:24 AM
 #767

The makomk_mod version fits using a factor of 2 (but all work is rejected, I don't know why!). It reports 12 MH/s.

I had the same problem with the DE0-Nano (22k LE), this was Makomk's response ...

I've now started looking at the code in the DE2_115_makomk_mod branch, but I've hit a problem. The code compiles fine at CONFIG_LOOP_LOG2=2, 3 and 4, but it's producing the wrong hashes (I'm just running at 40 MHz for testing, not full blast) ... the mine.tcl script submits hashes to the pool, but they are all rejected!
Yeah, that branch doesn't work with CONFIG_LOOP_LOG2!=1. You probably want http://www.makomk.com/gitweb/?p=Open-Source-FPGA-Bitcoin-Miner.git;a=summary de0-nano-hax branch, projects/DE2_115_Unoptimized_Pipelined project. The voltage regulators are also indeed horribly inefficient on the DE0-nano.

I can't answer AJRGale's query about the LEs needed for a fully unrolled core, as I haven't built anything larger than a one-sixth core, which (just) fitted into the 22k LEs of an EP4CE22 on the Nano.

Regards
Mark

senseless
April 21, 2013, 10:40:32 AM
 #768

This is a DSP48E1 based design, and I have compiled and run it at 400MH/s.

Have you done any testing as to which adders provide the best increase to the Fmax? To get multiple cores in there, I'm going to need to pick and choose which adders to replace with DSPs and which not to. I'm currently at 66% LUT usage with 99% memory LUT and 108% DSP usage with 2 unrolled cores (I had one core do even nonces while the other does odd nonces to make life easy). I've been slowly working down the number of DSPs used per core to make it fit. I'm thinking it might be possible to get 3 full cores on the A7 200.

Does the DSP performance increase compound? If I change one adder over to DSP utilization and it gives a 10% fmax increase... would changing additional adders down the chain affect that 10%? or will that one adder always give a 10% boost? I'm wondering if it will be possible to go through the adders one by one and calculate the increase in frequency for each one to find which adders would be the most effectively utilized under DSP48 blocks to get the best timing.
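The even/odd nonce split mentioned in this post is a simple interleave, and it generalizes to any number of cores. A minimal Python sketch of the partitioning follows; the function name and the software framing are illustrative only, since in hardware each core would just add the core count to its own nonce counter every cycle.

```python
# Interleaved nonce assignment across cores (illustrative sketch).
# With num_cores=2, core 0 gets the even nonces and core 1 the odd ones.

NONCE_SPACE = 2**32  # Bitcoin's block header nonce is a 32-bit field


def nonces_for_core(core_index, num_cores, stop=NONCE_SPACE):
    """Nonces handled by one core: core_index, core_index + num_cores, ..."""
    if not 0 <= core_index < num_cores:
        raise ValueError("core_index must be in range(num_cores)")
    return range(core_index, stop, num_cores)


if __name__ == "__main__":
    print(list(nonces_for_core(0, 2, stop=8)))  # even nonces below 8
    print(list(nonces_for_core(1, 2, stop=8)))  # odd nonces below 8
    print(len(nonces_for_core(0, 3)))           # share of one core out of three
```

Because the ranges never overlap and together cover the whole nonce space, no work is duplicated and no coordination between cores is needed.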




anomalies
April 22, 2013, 01:55:25 AM
 #769

Hi, another question from a newbie...


Have any of you guys heard of Parallella? http://www.parallella.org
What do you guys think about it?
AJRGale
April 22, 2013, 04:21:40 AM
 #770

Hi, another question from a newbie...


Have any of you guys heard of Parallella? http://www.parallella.org
What do you guys think about it?

Ahh yes, that, my friend, is a completely different ball game to FPGAs.
I've been waiting for them to kick off; I want one to play with 64 threads per chip... mmmm
paszczakojad
April 24, 2013, 03:33:54 PM
 #771

This is a DSP48E1 based design, and I have compiled and run it at 400MH/s.

Have you done any testing as to which adders provide the best increase to the Fmax? To get multiple cores in there, I'm going to need to pick and choose which adders to replace with DSPs and which not to. I'm currently at 66% LUT usage with 99% memory LUT and 108% DSP usage with 2 unrolled cores (I had one core do even nonces while the other does odd nonces to make life easy). I've been slowly working down the number of DSPs used per core to make it fit. I'm thinking it might be possible to get 3 full cores on the A7 200.

Does the DSP performance increase compound? If I change one adder over to DSP utilization and it gives a 10% fmax increase... would changing additional adders down the chain affect that 10%? or will that one adder always give a 10% boost? I'm wondering if it will be possible to go through the adders one by one and calculate the increase in frequency for each one to find which adders would be the most effectively utilized under DSP48 blocks to get the best timing.


I compiled fpgaminer's DSP code on A7 200 and I got 356 MHz on -3 grade, 311 MHz on -2 grade and 262 MHz on -1. The -3 variant only exists in extended temperature version, so it's much more expensive - so the -2 is the best choice in my opinion.

The usage was 20% slice logic, 34% slice logic distribution and 92% DSP.

What were your results? I.e. what maximum clock do you get without DSP?

Now I'm trying to replace some DSPs with the adder IP core. I think the best candidates are those that don't use the PCIN input (because they are simpler), like dsp_e, dsp_wp and dsp_t1p. When I replaced dsp_e with an adder I got 302 MHz (-2 version), 23% logic, 37% distrib, 75% DSP. Then I replaced dsp_wp: 271 MHz, 24% logic, 38% distrib, 63% DSP. Compilation took over 5 hours, while it takes 30 min when using only DSPs. Then I replaced dsp_t1p, and the compilation is taking ages (it hasn't completed yet).

The estimate is that DSP usage will be 49%, so theoretically I should be able to fit two such cores. Even if I have to lower the clock to, say, 200 MHz, the total output would be 400 MH/s, which would be better than 311 MH/s with one DSP-only core.
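The trade-off in that last paragraph can be written down directly. This Python sketch assumes each fully unrolled core produces one hash per clock, so total throughput is cores times clock; the function names are illustrative.

```python
# Multi-core vs single-core throughput (illustrative sketch).
# ASSUMPTION: every core is fully unrolled, i.e. one hash per clock per core.


def total_mhs(cores, clock_mhz):
    """Total hash rate in MH/s for `cores` cores all running at clock_mhz."""
    return cores * clock_mhz


def breakeven_clock_mhz(single_core_clock_mhz, cores):
    """Clock at which `cores` cores match one core at single_core_clock_mhz."""
    return single_core_clock_mhz / cores


if __name__ == "__main__":
    print(total_mhs(1, 311))            # one DSP-only core at 311 MHz
    print(total_mhs(2, 200))            # two mixed cores at 200 MHz
    print(breakeven_clock_mhz(311, 2))  # two cores win above this clock
```

So two cores come out ahead as long as the mixed DSP/LUT design still closes timing above roughly half the single-core clock.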
fpgaminer
April 25, 2013, 12:14:13 AM
 #772

Quote
When I replaced dsp_e with adder I got 302 MHz
I find it odd that your Fmax is dropping when you replace the DSPs with LUTs.  You may want to fiddle around with Vivado's settings to make sure register retiming (or whatever Vivado calls it) is enabled.  Alternatively, implement the adders as two stages of 16 bits each, since the DSPs being replaced are two-stage (or three-stage) anyway.

Also, for dsp_t1p, it would be best to replace both dsp_t1p and compressor_t1p with a single LUT adder, since the LUT fabric can implement 3-way additions just as efficiently as 2-way additions.

paszczakojad
April 25, 2013, 05:10:48 AM
 #773

Quote
When I replaced dsp_e with adder I got 302 MHz
I find it odd that your Fmax is dropping when you replace the DSPs with LUTs.  You may want to fiddle around with Vivado's settings to make sure register retiming (or whatever Vivado calls it) is enabled.  Alternatively, implement the adders as two stages of 16 bits each, since the DSPs being replaced are two-stage (or three-stage) anyway.

I used 2-stage adders, because DSP adders worked in 2 cycles and I didn't want to debug too much. IP core generator recommended 3 cycles for the best performance - I'll try that next.

After replacing dsp_e, dsp_wp and dsp_t1p I got 46% DSPs used - so it's enough to fit two cores.
Khertan
May 03, 2013, 06:30:39 PM
 #774

I'm currently playing with the DE0-Nano code from kramble.

And I have a question: you said that running it faster than 40 MHz could damage an unmodified DE0-Nano, and I didn't understand why.

According to the Quartus PowerPlay Power Analyzer, the design at 50 MHz uses only 328 mW. That's around 273 mA, right? And it's supposed to support 500 mA, isn't it?

Did I miss something?

kramble
May 03, 2013, 08:32:43 PM
 #775

I'm currently playing with the DE0-Nano code from kramble.

And I have a question: you said that running it faster than 40 MHz could damage an unmodified DE0-Nano, and I didn't understand why.

According to the Quartus PowerPlay Power Analyzer, the design at 50 MHz uses only 328 mW. That's around 273 mA, right? And it's supposed to support 500 mA, isn't it?

Did I miss something?

No, I was just being conservative in case someone inexperienced just cranked it up to the max (and I was following the example of fpgaminer in his original readme). You can run it faster as long as you are happy the power supply will support it (I had a conversation with hardcore_fc a few months back about the regulators; it may be worth looking back over it). I am currently running one board at 170 MHz (with a hardwired external 1.2V core supply as described at www.makomk.com) and a second at 80 MHz on a conventional 3.3V external supply.

You are correct that a USB supply will probably be limited to 500 mA, but that is at 5 volts. I haven't played with the PowerPlay analyzer, but I would expect that it is reporting the power at the 1.2V FPGA core rail. You have to account for the other devices on the DE0-Nano board too.
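The two numbers being compared sit on different rails, which a couple of lines of Python make explicit. This is only a sketch: it assumes the 328 mW PowerPlay estimate is all drawn at the 1.2V core rail (which is how the 273 mA figure arises), and it says nothing about regulator losses or the other devices on the board.

```python
# Power and current on different rails (illustrative sketch).
# ASSUMPTION: the 328 mW estimate is entirely on the 1.2 V core rail.


def current_a(power_w, voltage_v):
    """Current drawn by a load of power_w at voltage_v (I = P / V)."""
    return power_w / voltage_v


def power_w(voltage_v, current_amps):
    """Power available from a supply (P = V * I)."""
    return voltage_v * current_amps


if __name__ == "__main__":
    core_current = current_a(0.328, 1.2)
    usb_budget = power_w(5.0, 0.5)
    print(f"core rail current: {core_current * 1000:.0f} mA at 1.2 V")
    print(f"USB power budget:  {usb_budget:.1f} W at 5 V")
```

So 273 mA at 1.2V and 500 mA at 5V are not directly comparable; what matters is total board power against the USB budget, after regulator losses.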

I just dug out some notes I made of measurements with the 3.3V supply: 40 MHz was 0.48A, 80 MHz 0.85A, 100 MHz 1.0A, 120 MHz 1.2A and 140 MHz 1.36A, so roughly 10 mA per MHz. The regulators were getting very hot at the higher speeds (even though I was pointing a fan at the board), hence my caution about running the DE0-Nano at these sorts of speeds. The regulators themselves are overtemperature protected, but looking at the datasheet, this only kicks in at T(junction) of 175C, while the max operating temperature is 125C. It also quotes 85C/Watt junction-ambient assuming a big chunk of PCB copper dedicated to heatsinking, so you can work out roughly what they can practically support.
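To make "work out roughly what they can practically support" concrete, here is a small Python sketch. It fits a straight line to the five measurements above and then computes how much heat a regulator can shed before its junction reaches the 125C operating limit at 85C/W. The 25C ambient temperature is an assumption, and the sketch does not model how the dissipation splits between the individual regulators.

```python
# Supply current vs clock, and regulator thermal headroom (illustrative sketch).
# ASSUMPTION: 25 C ambient. Measurement points are the ones quoted in the post.

MEASUREMENTS = [(40, 0.48), (80, 0.85), (100, 1.0), (120, 1.2), (140, 1.36)]  # (MHz, A)


def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_dissipation_w(t_junction_max_c, t_ambient_c, theta_ja_c_per_w):
    """Heat a part can dissipate before its junction reaches t_junction_max_c."""
    return (t_junction_max_c - t_ambient_c) / theta_ja_c_per_w


if __name__ == "__main__":
    slope, intercept = fit_line(MEASUREMENTS)
    print(f"about {slope * 1000:.1f} mA per MHz plus {intercept:.2f} A fixed")
    print(f"headroom: {max_dissipation_w(125, 25, 85):.2f} W per regulator")
```

The fit comes out a little under the rough 10 mA per MHz figure, with a fixed offset on top, and the thermal budget is only a bit over one watt per regulator, which is consistent with the caution in the post.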

Given the tiny returns from mining on the Nano, my opinion was that it's not worth risking the boards at the higher speeds. I'm happy with my current setup (as described above) as nothing is getting above 60C, but it's your call on your own stuff.

[EDIT] I should add that I'm using a serial interface to communicate with the boards, rather than the quartus_stp JTAG USB cable, which is why I can get away with a 3.3V external supply. If you are using the USB for communication, then an external 3.3V supply won't work, as the board will pull current from the USB instead (there are a couple of blocking diodes, so no harm should occur). You could use a 5V external supply to supplement the USB's 500 mA, but then it's all getting a bit Heath Robinson, and the onboard regulators are under more heat stress at 5V than at 3.3V. Oh, and the DE0-Nano manual says the minimum external supply is 3.6V (I just happened to have 3.3V to hand and it worked fine, but it's technically out of spec, so YMMV).

Regards
Mark

fpgaminer
May 03, 2013, 10:20:06 PM
 #776

I've been asked a few times about a mining script for the current KC705 firmware.  I wrote a plugin for Modular Python Bitcoin Miner.  Here's the message I sent to someone about it:

Quote
I uploaded the custom MPBM module, which is compatible with the current KC705 mining code, here:
https://mega.co.nz/#!Oh5HTDRB!C0RLYW4yZN8gbg38FfgLpzmKFcseOql3Xx1i_gXTfdM

You'll want to download a copy of MPBM's testing branch.  Then extract the above archive into
Code:
modules/fpgamining
such that you end up with:

Code:
modules/fpgamining/kc705_uart/__init__.py
modules/fpgamining/kc705_uart/kc705uartworker.py

Once you start MPBM, you can add a KC705 Worker by opening up the MPBM web interface (http://127.0.0.1:8832) and clicking the "Workers" button on the left.  On Windows, I ran MPBM under Cygwin, and the "Port" ended up being /dev/com2 for me.  The Baudrate is 115200.

~fpgaminer

I haven't had a chance to clean it up and put it on the repo yet.
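A quick way to confirm the archive was extracted into the right place is to check for the two files listed in the quoted message. The Python sketch below does only that; the helper name and the idea of passing the MPBM checkout directory as an argument are illustrative, not part of MPBM.

```python
# Check that the KC705 worker module sits where the quoted message says it should.
# The file list is copied from the post; the helper itself is hypothetical.
from pathlib import Path

EXPECTED_FILES = (
    "modules/fpgamining/kc705_uart/__init__.py",
    "modules/fpgamining/kc705_uart/kc705uartworker.py",
)


def missing_module_files(mpbm_root):
    """Return the expected files that are not present under mpbm_root."""
    root = Path(mpbm_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_module_files(".")  # run from the MPBM checkout directory
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("KC705 worker module looks correctly installed.")
```

If either file is reported missing, the archive was most likely extracted one directory level too high or too low.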

gingernuts
May 04, 2013, 12:01:41 AM
 #777

Looking at Digi-Key right now, for the chips you could actually buy today:

The small Kintex XC7K160T is $230-ish in -1 grade and $280-ish in -2 grade.
The biggest Artix XC7A200T is $200-ish in -1 grade and $270-ish in -2 grade, and both of these can be developed with the free WebPACK software.


The Kintex used on the KC705, the XC7K325T, is $1000-ish in the -1 grade and $1500-odd in the -2 grade (they have a $1200 one, but not in stock), and needs a full Vivado/ISE license to play with. Even if I were to buy a KC705 dev kit, I can't see how the 325T device is going to be good bang for the buck...

Interestingly, in a Kintex -> Artix migration guide Xilinx seem to reckon that a -1 grade Kintex is 1.6x as fast as a -1 Artix, so while the 7A200T looks like a winner in terms of price and slices/DSP modules, I'm wondering whether the Kintex XC7K160T might actually be the best value overall...
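The value comparison in that last paragraph can be put into numbers. This Python sketch takes the -1 grade prices and the 1.6x speed ratio from the post and assumes, purely for illustration, that the same number of mining cores fits on both parts. That ignores the difference in slice and DSP counts, so treat the result as a best case for the Kintex.

```python
# Relative speed per dollar for the two -1 grade parts (illustrative sketch).
# ASSUMPTION: the same design fits on both devices, so only clock speed differs.


def speed_per_dollar(relative_speed, price_usd):
    """Relative clock speed bought per dollar of chip price."""
    return relative_speed / price_usd


if __name__ == "__main__":
    artix = speed_per_dollar(1.0, 200)   # XC7A200T -1, about $200
    kintex = speed_per_dollar(1.6, 230)  # XC7K160T -1, about $230, 1.6x as fast
    print(f"Kintex vs Artix value ratio: {kintex / artix:.2f}")
```

A ratio above 1 favours the Kintex; if the Artix can hold more cores because of its larger slice and DSP count, the balance shifts back towards it.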


 
fpgaminer
May 05, 2013, 07:16:29 AM
 #778

For those with a VC707 devkit (Virtex 7), I've done a blind port of the KC705_experimental project:

https://github.com/fpgaminer/Open-Source-FPGA-Bitcoin-Miner/tree/master/projects/VC707_experimental

Bitstream: https://mega.co.nz/#!7x4nkS4b!O2aEv0Khp541jwY8FIwpiUeYstoXAOSyMqUKxhBMwKY

Completely untested.  Let me know if it works, or doesn't!

Khertan
May 05, 2013, 04:16:55 PM
 #779


Given the tiny returns from mining on the Nano, my opinion was that it's not worth risking the boards at the higher speeds. I'm happy with my current setup (as described above) as nothing is getting above 60C, but it's your call on your own stuff.

Regards
Mark

Thanks. Indeed, for bitcoin mining I won't risk burning my little Nano. I'm asking because I'm working on another project, and I want to understand things so that I don't burn it.
I'll try to monitor the USB power draw and the temperature.

At 40 MHz PowerPlay estimates 296 mA ... for the FPGA only, of course. But I've played with the settings to reduce power usage compared with your original code / project settings.
So it looks like PowerPlay underestimates the power usage.

Thanks a lot for your explanation.

xbaby
May 07, 2013, 06:07:36 AM
 #780

I'm trying to compile "projects/X6000_ztex_comm4" myself for the device "xc6slx150, speed -3", under Xilinx ISE v13.4, using the code from GitHub without any modification.

Using the default compile options from "xilinx_fpgaminer.xise", with the design goal set to "Timing Performance", placement failed. After changing the goal to "Minimum Runtime", the project compiled successfully, but the timing constraints can't be met. According to the PAR report, the clock speed is only 153 MHz (cycle 6.54ns). What optimization options do I need to use to achieve a clock speed above 190 MHz? Please help me, thanks very much.

Code:
+-------------------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
|                               |   Period    |       Actual Period       |      Timing Errors        |      Paths Analyzed       |
|           Constraint          | Requirement |-------------+-------------|-------------+-------------|-------------+-------------|
|                               |             |   Direct    | Derivative  |   Direct    | Derivative  |   Direct    | Derivative  |
+-------------------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
|TS_CLK_100MHZ                  |     10.000ns|      9.689ns|     13.082ns|            0|          633|         1456|      3690036|
| TS_dynamic_clk_blk_clkfx      |      5.000ns|      6.541ns|          N/A|          633|            0|      3690036|            0|
+-------------------------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+

Slice Logic Utilization:
  Number of Slice Registers:                84,129 out of 184,304   45%
    Number used as Flip Flops:              84,129
    Number used as Latches:                      0
    Number used as Latch-thrus:                  0
    Number used as AND/OR logics:                0
  Number of Slice LUTs:                     50,798 out of  92,152   55%
    Number used as logic:                   35,040 out of  92,152   38%
      Number using O6 output only:          15,507
      Number using O5 output only:             581
      Number using O5 and O6:               18,952
      Number used as ROM:                        0
    Number used as Memory:                   3,297 out of  21,680   15%
      Number used as Dual Port RAM:              0
      Number used as Single Port RAM:            0
      Number used as Shift Register:         3,297
        Number using O6 output only:           449
        Number using O5 output only:             0
        Number using O5 and O6:              2,848
    Number used exclusively as route-thrus: 12,461
      Number with same-slice register load: 12,036
      Number with same-slice carry load:       425
      Number with other load:                    0

Slice Logic Distribution:
  Number of occupied Slices:                15,049 out of  23,038   65%
  Nummber of MUXCYs used:                   22,144 out of  46,076   48%
  Number of LUT Flip Flop pairs used:       58,734
    Number with an unused Flip Flop:           959 out of  58,734    1%
    Number with an unused LUT:               7,936 out of  58,734   13%
    Number of fully used LUT-FF pairs:      49,839 out of  58,734   84%
    Number of slice register sites lost
      to control set restrictions:               0 out of 184,304    0%
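For reading the timing table above, it helps to convert between clock period and frequency. The Python sketch below uses the numbers from the report (5.000 ns required, 6.541 ns achieved) and the 190 MHz target from the post; the function names are illustrative.

```python
# Clock period <-> frequency conversions for the timing report (illustrative sketch).


def period_ns_to_mhz(period_ns):
    """Frequency in MHz for a clock period in ns (f = 1000 / T)."""
    return 1000.0 / period_ns


def mhz_to_period_ns(freq_mhz):
    """Clock period in ns for a frequency in MHz (T = 1000 / f)."""
    return 1000.0 / freq_mhz


if __name__ == "__main__":
    achieved_ns, required_ns, target_mhz = 6.541, 5.000, 190.0
    print(f"achieved:   {period_ns_to_mhz(achieved_ns):.1f} MHz")
    print(f"constraint: {period_ns_to_mhz(required_ns):.1f} MHz")
    target_ns = mhz_to_period_ns(target_mhz)
    print(f"{target_mhz:.0f} MHz needs a period of {target_ns:.3f} ns, "
          f"i.e. {achieved_ns - target_ns:.3f} ns must be trimmed from the critical path")
```

In other words, the 5.000 ns constraint on TS_dynamic_clk_blk_clkfx corresponds to 200 MHz, and the achieved 6.541 ns is where the 153 MHz figure comes from.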