Author Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013)  (Read 432941 times)
jonand
Newbie
*
Offline Offline

Activity: 12
Merit: 0


View Profile
August 12, 2011, 12:50:12 PM
 #461

I've been playing around with the xilinx-verilog port in the github repo and can confirm that it works just fine on the Xilinx Spartan-6 XC6SLX9 MicroBoard eval board from Avnet for $69.

I couldn't fit anything but unrolling = 5, so the speed is only a few Mhash/s, but I really recommend this little USB "dongle"-sized board to anyone who wants to have a look at FPGAs. It has a built-in USB JTAG cable and a USB-serial console, so nothing extra is required to get teknohog's code up and running on it.

Let me know if anyone wants the UCF file, I just took the pin numbers from the avnet UCF example and replaced the UCF from github.

It is also nice to replace the 7-segment display output with a simple LED output to see that your serial comm is working.

I tried clocks up to >90MHz and it runs just fine.

As a courtesy to teknohog I used his account (in the miner.py) when mining, so he got the reward for the few shares I found.

Oh, I actually used Windows 7 for this; I couldn't get the Digilent Xilinx cable drivers to work on my Linux box. It was easier to install Python on the Windows laptop.


You can also look at it this way: Learn FPGA programming, get a job and make more money than you most likely ever will on mining bitcoins! Smiley

jonand
Newbie
*
Offline Offline

Activity: 12
Merit: 0


View Profile
August 12, 2011, 02:21:50 PM
 #462

The teknohog xilinx-verilog port also works well with the Xilinx Spartan-6 LX45T eval board called "SP605".

The only trick is that TX and RX on the SiLabs USB-RS232 adapter are reversed, so you need to swap the pins in the UCF file to make a null modem.

After finding 7 blocks I have a hash rate of 13.1 ± 4.95 Mhash/s with a clock speed of 63 MHz and unrolling = 3.
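The error bar above is what you get from simple counting statistics: 13.1 / sqrt(7) ≈ 4.95. Below is a minimal sketch of that estimate, assuming the results being counted are difficulty-1 pool shares (each worth about 2^32 hashes on average) and that they arrive as a Poisson process. The function names are made up for illustration and are not taken from miner.py.

```python
import math

# Assumption: a difficulty-1 share represents about 2**32 hashes on average.
HASHES_PER_SHARE = 2 ** 32


def relative_error(shares):
    """1-sigma relative uncertainty of a Poisson count of `shares` events."""
    if shares <= 0:
        raise ValueError("need at least one share")
    return 1.0 / math.sqrt(shares)


def estimate_hashrate(shares, seconds):
    """Return (rate, sigma) in hashes per second from a share count."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    rate = shares * HASHES_PER_SHARE / seconds
    sigma = rate * relative_error(shares)
    return rate, sigma
```

With only 7 results the relative error is about 38%, so the estimate needs on the order of a hundred shares before it settles to within 10%.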


I'm not saying these boards are useful for any real life mining but if you need a working reference example this does the job.

You can test run these devices with the Xilinx ISE WebPACK license (which reports some stats back to Xilinx but has a $0 cost). The only thing you risk is a salesguy calling you and asking when you will be purchasing volume..  Tongue

Good luck, have fun!

jonand
Newbie
*
Offline Offline

Activity: 12
Merit: 0


View Profile
August 13, 2011, 09:44:45 PM
 #463

Makomk's sha256-transform.v improved the xilinx-verilog port a bit. unroll=2 is now working on the SP605 board with at least a 63 MHz hash_clk.

100 shares found and the speed seems to stabilize somewhere in the 15 Mhash/s region.
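A rough way to sanity-check these figures, assuming "unroll = n" here corresponds to the miner's log2 loop setting, so that one core spends 2^n clock cycles per hash attempt (n = 0 being the fully unrolled, one-hash-per-clock case). The helper name is hypothetical:

```python
def expected_mhash_per_s(clock_mhz, loop_log2, cores=1):
    """Ideal throughput in Mhash/s for a partially unrolled core.

    Assumes each core needs 2**loop_log2 clock cycles per hash attempt.
    """
    if loop_log2 < 0:
        raise ValueError("loop_log2 must be non-negative")
    return cores * clock_mhz / (1 << loop_log2)


# 63 MHz with unroll = 2, as reported for the SP605
print(expected_mhash_per_s(63, 2))
```

Under that assumption, 63 MHz at unroll = 2 gives 15.75 Mhash/s, which lines up with the "15 Mhash/s region" reported here, and 90 MHz at unroll = 5 gives about 2.8 Mhash/s, consistent with "a few Mhash/s" on the LX9.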

Anyone else running LX45?

Keninishna
Hero Member
*****
Offline Offline

Activity: 556
Merit: 500



View Profile
August 13, 2011, 10:05:07 PM
 #464

I'm interested in getting into fpgas, however where can I source a spartan-6 lx150 for cheap? on the hardware comparison site in the comments it says
Quote
3N 484-pin chip is ~$150, 0.67Mhash/$
NF6X
Member
**
Offline Offline

Activity: 98
Merit: 10



View Profile WWW
August 13, 2011, 11:06:38 PM
 #465

I'm interested in getting into fpgas, however where can I source a spartan-6 lx150 for cheap? on the hardware comparison site in the comments it says
Quote
3N 484-pin chip is ~$150, 0.67Mhash/$

That's about what they cost in single quantity, and as FPGAs of that size go, that is cheap. That price is just for the chip; it would need to be soldered onto a suitable board to be usable. It's in a 484-pin ball grid array package with 1mm ball pitch, which is a bit too advanced for most hobbyists to solder down themselves. Unfortunately, off-the-shelf development boards with LX150 chips on them are pretty expensive, partly because they support features of the device which add substantial cost to the board (such as having lots of layers in order to route out all 484 pins) that we don't need for the mining application. I don't have the link handy or remember the manufacturer's name at the moment, but the cheapest off-the-shelf LX150-based board that I've seen recently costs around $600-$700 if I recall correctly.

There's another thread in which folks are working on a fairly low-cost LX150-based board that's optimized for mining:

https://bitcointalk.org/index.php?topic=22426.0
NF6X
Member
**
Offline Offline

Activity: 98
Merit: 10



View Profile WWW
August 14, 2011, 01:29:15 AM
 #466

I found the Spartan 6 LX150 boards that I mentioned previously:

http://www.hdl.co.jp/en/index.php/xilinx-series1/spartan-6.html

They start at around $700 for LX150 boards (less for smaller parts). I don't know of any cheaper off-the-shelf Spartan 6 LX150-based boards right now, but I'd love to hear about any.
iidx
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
August 14, 2011, 02:57:42 AM
 #467

Hi Guys,

I had a bunch of spare stuff laying around at work, so I whipped up a mining configuration as an exercise.

Supplies:
6 Xilinx ML605 cards (XC6VLX240T, Virtex 6)
PCIe switch development kit (an external board that connects to your PC and has a ton of PCIe slots)
TI USB to GPIO pod (for the ML605 power supplies)

Starting with the Xilinx Verilog port of the code, I found that I could fit 2 instances of the LOOP = 0 core.  However, that wasn't enough for me.  I figured that if I used the DSP48s in the Virtex 6, I could fit at least 1 more in there.  With the 3 cores, I am using about 558 of the hardware multipliers.  Sadly, there aren't enough to fit a 4th in there.  I may be able to fit it in if I use a few fewer hardware multipliers per core, but it will be tight.  I ended up running the cores at 125 MHz, because that's the same speed my PCIe internal interface runs at.  There is more headroom available; 150 MHz is probably doable, but power will start to become a concern.

It turned out that the ML605's power supplies could supply enough power to run 3 cores, but the digital power managers were not set to allow the full rated current.  I re-programmed the power managers to allow for the rated current in order to get the 3 core version of my design working.  The designs use about 16-18 watts of power and 16-17A on the VCCint rail (1 V).  I had to supply additional cooling to make sure the power supplies didn't overheat (they have no cooling normally).

Next, I had to connect it to my PC for mining data.  Now, of course 6 serial ports weren't going to be the most elegant solution (and my PC actually had no serial ports).  I used an off the shelf PCIe core in conjunction with the Xilinx hard IP to connect the 3 bitcoin cores to the PC.  Sadly, the PCIe core is a licensed product, so I won't be able to share the source here.

The hardest part for me was last - I had to figure out what data to get from a mining pool, what to do with it and how to get it into the card.  I found some open source C# mining libraries (I need to credit the guy, but I don't have the code in front of me), modified them and wrote a mining program to feed and poll all of the cards.  It was a pain in the ass to get that finally working, but through analyzing a bunch of different mining software I figured it out.

But finally, my experiment is working @ 2250 Mhash/s and about 100w!  The cost is out of control, but since I had these cards laying around from other experiments, I figured I'd give it a shot.
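The headline numbers can be cross-checked from the per-card figures in this post. This is only a back-of-the-envelope sketch: it assumes each LOOP = 0 core completes one hash per clock, takes 16.5 A as the midpoint of the quoted 16-17 A VCCint range, and ignores every rail other than VCCint. The class and field names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Rig:
    cards: int
    cores_per_card: int
    clock_mhz: float
    vccint_volts: float
    vccint_amps: float  # assumed midpoint of the quoted range

    def hashrate_mhs(self):
        # One hash per clock per fully unrolled core (assumption).
        return self.cards * self.cores_per_card * self.clock_mhz

    def core_power_watts(self):
        # P = V * I on the core rail only; other rails are not counted.
        return self.cards * self.vccint_volts * self.vccint_amps


rig = Rig(cards=6, cores_per_card=3, clock_mhz=125.0,
          vccint_volts=1.0, vccint_amps=16.5)
print(rig.hashrate_mhs(), rig.core_power_watts())
```

Six cards with three cores at 125 MHz come out at 2250 Mhash/s, and the core rails alone at roughly 99 W, both in line with the figures quoted above.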

I'd be happy to contribute the modifications that use the DSP48s, but I can't actually distribute the PCIe DMA/PIO engine since it's licensed (not from Xilinx).  I'm happy to distribute the source for the software too, since I didn't really find a C#/.NET Windows version that suited my needs.

Questions and comments welcome!
newMeat1
Full Member
***
Offline Offline

Activity: 210
Merit: 100



View Profile
August 14, 2011, 03:34:38 AM
 #468

Can you humor me with a guess of how much this hardware would cost, iidx?

2250 Mhash/s- Wow! I wish I had that

NF6X
Member
**
Offline Offline

Activity: 98
Merit: 10



View Profile WWW
August 14, 2011, 04:25:42 AM
 #469

That ML605 hack sounds delightful! I love it!

Another group at my company has custom emulation platforms with more than 50 (!!) Virtex 5 parts each. I wish I could spend some quality time with one of those and make it slave away in the Bitcoin mines, but sadly, that's not going to happen. I could get away with stuff like that when I was working for a little startup instead of a megacorp, but then we couldn't afford toys like that back then.
iidx
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
August 14, 2011, 04:37:59 AM
Last edit: August 14, 2011, 04:56:36 AM by iidx
 #470

Can you humor me with a guess of how much this hardware would cost, iidx?

2250 Mhash/s- Wow! I wish I had that

It's pretty unreasonable; I think each of the cards was $2000 when I bought them about 6 months ago for a project.  Xilinx has "generously" reduced the price to $1795 now...  Not sure what the raw chip prices are, maybe $500 each in volume.

Quote
Another group at my company has custom emulation platforms with more than 50 (!!) Virtex 5 parts each. I wish I could spend some quality time with one of those and make it slave away in the Bitcoin mines, but sadly, that's not going to happen. I could get away with stuff like that when I was working for a little startup instead of a megacorp, but then we couldn't afford toys like that back then.

Wow, 50 devices!!  Maybe you can ask if you can do some performance testing for that group Cheesy

I actually have another board that has a Virtex 5 on it (ML555) that I thought about also trying to use.  However, it has a pretty small device and no heatsink, so it probably would be a bad idea and yield 125-150 MHash at best.
fpgaminer (OP)
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
August 14, 2011, 05:07:33 AM
 #471

Quote
Finally got around to coding some maximum clock speed improvements for users of smaller Cyclone III and IV devices - now available from my new partial-unroll-speed branch. Expected minimum device size and speed is roughly as follows:
More fantastic work, makomk!  Cool *applause*

Quote
I've been playing around with the xilinx-verilog port in the github repo and can confirm that it works just fine on the Xilinx Spartan-6 XC6SLX9 MicroBoard eval board from Avnet for $69.
Thank you for taking the time to share your experiences with all these mini eval boards, jonand. That's great information.

It's a shame that the LX9 MicroBoard uses a 324-ball landing pattern. It would be neat to re-solder an LX150 onto it, but the LX150 doesn't come in a 324 package :/

Quote
But finally, my experiment is working @ 2250 Mhash/s and about 100w!  The cost is out of control, but since I had these cards laying around from other experiments, I figured I'd give it a shot.
*drools* For reference, an AMD 5850 only gets ~350MH/s for ~150W.  Tongue

Keninishna
Hero Member
*****
Offline Offline

Activity: 556
Merit: 500



View Profile
August 14, 2011, 09:05:01 AM
 #472

I'm interested in getting into fpgas, however where can I source a spartan-6 lx150 for cheap? on the hardware comparison site in the comments it says
Quote
3N 484-pin chip is ~$150, 0.67Mhash/$

That's about what they cost in single quantity, and as FPGAs of that size go, that is cheap. That price is just for the chip; it would need to be soldered onto a suitable board to be usable. It's in a 484-pin ball grid array package with 1mm ball pitch, which is a bit too advanced for most hobbyists to solder down themselves. Unfortunately, off-the-shelf development boards with LX150 chips on them are pretty expensive, partly because they support features of the device which add substantial cost to the board (such as having lots of layers in order to route out all 484 pins) that we don't need for the mining application. I don't have the link handy or remember the manufacturer's name at the moment, but the cheapest off-the-shelf LX150-based board that I've seen recently costs around $600-$700 if I recall correctly.

There's another thread in which folks are working on a fairly low-cost LX150-based board that's optimized for mining:

https://bitcointalk.org/index.php?topic=22426.0

How about this universal board? http://www.hdl.co.jp/en/index.php/accessories/zkb-054.html It appears it'll take a socketed version of the chip and is only about 180$. If the chip can be sourced for 150-170$ It will still be expensive at 330$ but cheaper than 700$
Silverpike
Newbie
*
Offline Offline

Activity: 54
Merit: 0



View Profile
August 14, 2011, 09:50:29 AM
 #473

How about this universal board? http://www.hdl.co.jp/en/index.php/accessories/zkb-054.html It appears it'll take a socketed version of the chip and is only about 180$. If the chip can be sourced for 150-170$ It will still be expensive at 330$ but cheaper than 700$
You are a little off here.  This is not a board that can be used to mount FPGAs.  This board is designed specifically to aggregate the other FPGA dev boards this company sells onto one motherboard.  It won't help for finding a cheap host for raw FPGA parts.
Keninishna
Hero Member
*****
Offline Offline

Activity: 556
Merit: 500



View Profile
August 14, 2011, 10:26:00 AM
Last edit: August 14, 2011, 11:05:42 AM by Keninishna
 #474

How about this universal board? http://www.hdl.co.jp/en/index.php/accessories/zkb-054.html It appears it'll take a socketed version of the chip and is only about 180$. If the chip can be sourced for 150-170$ It will still be expensive at 330$ but cheaper than 700$
You are a little off here.  This is not a board that can be used to mount FPGAs.  This board is designed specifically to aggregate the other FPGA dev boards this company sells onto one motherboard.  It won't help for finding a cheap host for raw FPGA parts.


I had a feeling it wasn't that easy. Definitely a challenge to make fpgas feasible bitcoin miners. I found a site that offers a board for about 500$ http://shop.ztex.de/product_info.php?products_id=64&language=en

Holy moly take a look at this 12x LX150s on one pcie board. http://www.dinigroup.com/new/DNBFC_S12_PCIe.php  Grin

Also this page seems like a good reference http://www.fpga-faq.com/FPGA_Boards.shtml
ngzhang
Hero Member
*****
Offline Offline

Activity: 592
Merit: 501


We will stand and fight.


View Profile
August 14, 2011, 11:06:17 AM
Last edit: August 14, 2011, 11:44:00 AM by ngzhang
 #475

Hi guys.
I've been working hard on the XC6SLX150 -3N FPGA this week. I'm trying to design an FPGA computing unit for bitcoin mining and some other related projects.
Following the work in this thread:

https://bitcointalk.org/index.php?topic=22426.580
Modular FPGA Miner Hardware Design Development

I think that if we avoid expensive power modules (using discrete components instead, which is cheaper but makes design and testing much harder), find a way to buy cheap FPGAs, design well for manufacturability, have hundreds of people willing to buy it, and so on,
then I'm quite sure the daughter board can be manufactured for $400, all costs included.

But the question is: how many MH/s will 2 XC6SLX150 -3N chips finally give us? If we want FPGA mining to be a feasible choice, the MH/s per $ must be close to that of GPU mining.
One HD6870 (new HD5850s are now difficult to buy) costs about $180 and provides about 270 MH/s of hashing power, or about 1.5 MH/s per $. I think at least 1 MH/s per $ is necessary for FPGA mining.
Can we optimize the XC6SLX150 to about 200 MH/s? Is that possible?

That means 2 fully pipelined cores running at 100 MHz per FPGA. Note that the XC6SLX150 has about 60% of the logic resources of the XC6VLX240T and 180 DSP48A1s (the XC6VLX240T has 768 DSP48E1s).

If the above performance is possible, I can make the $400 dual-XC6SLX150 board a reality in 1-2 months.
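The arithmetic behind the 200 MH/s target can be written out as a small sketch. It uses only the prices and rates quoted in this post and assumes a fully pipelined core produces one hash per clock; the function names are made up for illustration.

```python
import math


def mhs_per_dollar(mhs, dollars):
    """Cost efficiency in MH/s per dollar."""
    return mhs / dollars


def required_mhs_per_fpga(target_mhs_per_dollar, board_cost, fpgas_per_board):
    """Hash rate each FPGA must deliver for the board to hit the target."""
    return target_mhs_per_dollar * board_cost / fpgas_per_board


def cores_needed(mhs_per_fpga, clock_mhz):
    """Fully pipelined cores needed, assuming one hash per clock per core."""
    return math.ceil(mhs_per_fpga / clock_mhz)


gpu = mhs_per_dollar(270, 180)               # HD6870 figures from the post
per_fpga = required_mhs_per_fpga(1.0, 400, 2)
print(gpu, per_fpga, cores_needed(per_fpga, 100))
```

A $400 board with two FPGAs needs 200 MH/s per chip to reach 1 MH/s per $, which is where the "2 fully pipelined cores at 100 MHz" requirement comes from.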
iidx
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
August 14, 2011, 08:00:50 PM
 #476

I don't know the usage for the single unrolled core on a S6 150, but here's the usage from my design in the V6 240T using 2 cores.  I'm guessing you'll want to use all 180 of those DSP48s to reduce the logic usage.  The number of slices and LUTs used is going to be close to your maximum capacity, but with 372 (186 per core) DSP48s used.

Code:
Device Utilization Summary:

Slice Logic Utilization:
  Number of Slice Registers:               101,697 out of 301,440   33%
    Number used as Flip Flops:              98,581
    Number used as Latches:                      1
    Number used as Latch-thrus:                  0
    Number used as AND/OR logics:            3,115
  Number of Slice LUTs:                     88,763 out of 150,720   58%
    Number used as logic:                   67,920 out of 150,720   45%
      Number using O6 output only:          32,057
      Number using O5 output only:           1,667
      Number using O5 and O6:               34,196
      Number used as ROM:                        0
    Number used as Memory:                   9,892 out of  58,400   16%
      Number used as Dual Port RAM:              0
      Number used as Single Port RAM:            0
      Number used as Shift Register:         9,892
        Number using O6 output only:         7,362
        Number using O5 output only:             0
        Number using O5 and O6:              2,530
    Number used exclusively as route-thrus: 10,951
      Number with same-slice register load: 10,889
      Number with same-slice carry load:        62
      Number with other load:                    0

Slice Logic Distribution:
  Number of occupied Slices:                27,898 out of  37,680   74%
  Number of LUT Flip Flop pairs used:      105,799
    Number with an unused Flip Flop:        28,962 out of 105,799   27%
    Number with an unused LUT:              17,036 out of 105,799   16%
    Number of fully used LUT-FF pairs:      59,801 out of 105,799   56%
    Number of slice register sites lost
      to control set restrictions:               0 out of 301,440    0%

Specific Feature Utilization:
  Number of RAMB36E1/FIFO36E1s:                 40 out of     416    9%
    Number using RAMB36E1 only:                 40
    Number using FIFO36E1 only:                  0
  Number of RAMB18E1/FIFO18E1s:                  0 out of     832    0%
  Number of BUFG/BUFGCTRLs:                      5 out of      32   15%
    Number used as BUFGs:                        5
    Number used as BUFGCTRLs:                    0
  Number of ILOGICE1/ISERDESE1s:                 0 out of     720    0%
  Number of OLOGICE1/OSERDESE1s:                 0 out of     720    0%
  Number of BSCANs:                              0 out of       4    0%
  Number of BUFHCEs:                             0 out of     144    0%
  Number of BUFIODQSs:                           0 out of      72    0%
  Number of BUFRs:                               0 out of      36    0%
  Number of CAPTUREs:                            0 out of       1    0%
  Number of DSP48E1s:                          372 out of     768   48%
  Number of EFUSE_USRs:                          0 out of       1    0%
  Number of FRAME_ECCs:                          0 out of       1    0%
  Number of GTXE1s:                              4 out of      20   20%
    Number of LOCed GTXE1s:                      4 out of       4  100%
  Number of IBUFDS_GTXE1s:                       1 out of      12    8%
  Number of ICAPs:                               0 out of       2    0%
  Number of IDELAYCTRLs:                         0 out of      18    0%
  Number of IODELAYE1s:                          0 out of     720    0%
  Number of MMCM_ADVs:                           1 out of      12    8%
  Number of PCIE_2_0s:                           1 out of       2   50%
    Number of LOCed PCIE_2_0s:                   1 out of       1  100%
  Number of STARTUPs:                            1 out of       1  100%
  Number of SYSMONs:                             0 out of       1    0%
  Number of TEMAC_SINGLEs:                       0 out of       4    0%
Silverpike
Newbie
*
Offline Offline

Activity: 54
Merit: 0



View Profile
August 14, 2011, 09:09:49 PM
 #477

But the question is: how many MH/s will 2 XC6SLX150 -3N chips finally give us? If we want FPGA mining to be a feasible choice, the MH/s per $ must be close to that of GPU mining.
One HD6870 (new HD5850s are now difficult to buy) costs about $180 and provides about 270 MH/s of hashing power, or about 1.5 MH/s per $. I think at least 1 MH/s per $ is necessary for FPGA mining.
Can we optimize the XC6SLX150 to about 200 MH/s? Is that possible?

200 MH/s is not possible on this part.  Artforz was able to tweak his to approx 118 MH/s, and that was with a well optimized design with overclocking.  This open-source design isn't terribly efficient, but actually produces a very good result of approx 100 MH/s on the Spartan 150 if the design can be routed at 100 MHz.

200 MH/s is simply way out of the question for an S6-LX150.
fpgaminer (OP)
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
August 14, 2011, 10:46:29 PM
 #478

Quote
200MH is simply way out of the question for an S6-LX150.
That won't stop me from trying  Grin

As far as I can tell with the poking around I've done so far, the current bottleneck on the S6-LX150 is the far dependencies caused by the W calculations. These references make it so that the rounds are not isolated, and so cannot be routed into a uniform chain. This forces ISE to do completely absurd routing, splattering the placement of a round's components across a good 1/4th of the chip. And that, obviously, leads to massive routing delays. On my last few compiles, the worst-case paths were >80% routing (8ns+ of routing, with 2ns of logic).
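To put the quoted path delays in perspective, here is a simplified estimate that treats the clock period as just logic delay plus routing delay, ignoring setup time, clock skew and jitter (so real timing reports will differ). The function names are hypothetical.

```python
def fmax_mhz(logic_ns, routing_ns):
    """Approximate maximum clock in MHz for a register-to-register path."""
    period_ns = logic_ns + routing_ns
    if period_ns <= 0:
        raise ValueError("path delay must be positive")
    return 1000.0 / period_ns


def routing_fraction(logic_ns, routing_ns):
    """Share of the path delay spent in routing."""
    return routing_ns / (logic_ns + routing_ns)


print(fmax_mhz(2.0, 8.0), routing_fraction(2.0, 8.0))
```

With 2 ns of logic and 8 ns of routing the path tops out around 100 MHz at 80% routing. If placement could halve the routing delay to 4 ns, the same logic would allow roughly 167 MHz under this simplified model, which is why the routing structure matters so much here.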

If W is buffered between each round as a 512-bit register, instead of chains of shift registers and BRAMs, then the rounds can be isolated, but ISE fails to Map such a design for reasons I have not yet nailed down. 512-bits*~100 is quite a lot of registers  Undecided
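For readers who want to see where the "far dependencies" come from, here is a software model of SHA-256 compression that carries W as a sliding window of 16 words (512 bits) from round to round. It is a sketch of the standard algorithm, not the project's Verilog: the round constants are derived from the first 64 primes as the SHA-256 specification defines them, the helper names are invented, and only a single padded block is handled (Bitcoin's double hash over the 80-byte header is not shown).

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def icbrt(n):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def first_primes(count):
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes


PRIMES = first_primes(64)
# Fractional parts of cube roots (K) and square roots (H0) of the primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]


def next_w(window):
    """window holds W[t-16..t-1]; return W[t].

    The taps at t-16, t-15, t-7 and t-2 are the far dependencies:
    every round needs words produced many rounds earlier.
    """
    w15, w2 = window[1], window[14]
    s0 = rotr(w15, 7) ^ rotr(w15, 18) ^ (w15 >> 3)
    s1 = rotr(w2, 17) ^ rotr(w2, 19) ^ (w2 >> 10)
    return (window[0] + s0 + window[9] + s1) & MASK


def compress(state, block):
    window = list(struct.unpack(">16I", block))  # the 512-bit W register
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[t] + window[0]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
        # Shift the window by one word: this is all a round hands to the next.
        window = window[1:] + [next_w(window)]
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256_single_block(msg):
    """SHA-256 of a message short enough to fit in one padded block."""
    if len(msg) > 55:
        raise ValueError("message must be at most 55 bytes")
    block = msg + b"\x80" + b"\x00" * (55 - len(msg)) + struct.pack(">Q", 8 * len(msg))
    return struct.pack(">8I", *compress(H0, block))


assert sha256_single_block(b"abc") == hashlib.sha256(b"abc").digest()
```

In this form each round only consumes the window from the previous round plus the eight working variables, which is the isolation being described. The cost in hardware is that every pipeline stage then carries its own 512-bit copy of the window.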

If I, or someone else, can find a way to isolate the rounds and put them into a more consistent chain, then I highly suspect that both performance and area will improve considerably.

I may create a "fake" design that focuses specifically on the W calculations (without digester rounds), and see if I can somehow get them routed into a sensible structure (even if it requires manual placement  Angry )

Silverpike
Newbie
*
Offline Offline

Activity: 54
Merit: 0



View Profile
August 14, 2011, 10:58:57 PM
 #479

Quote
200MH is simply way out of the question for an S6-LX150.
That won't stop me from trying  Grin
Good luck!  Wink

Quote
As far as I can tell with the poking around I've done so far, the current bottleneck on the S6-LX150 is the far dependencies caused by the W calculations. These references make it so that the rounds are not isolated, and so cannot be routed into a uniform chain. This forces ISE to do completely absurd routing, splattering the placement of a round's components across a good 1/4th of the chip. And that, obviously, leads to massive routing delays. On my last few compiles, the worst-case paths were >80% routing (8ns+ of routing, with 2ns of logic).
My criticism of this design (your design?)  is that there is too much pipelining.  If you have ever taken a computer architecture class, pipelining can be a very serious impediment to having a high speed design.  This sounds counter-intuitive, but the cost you pay for all those registers is very high.  This can be mitigated quite a bit on FPGAs, since registers are part of each CLB (and in a sense they can come for "free" if you have enough combinatorial logic).  This is certainly part of your routing problem.
Quote
If W is buffered between each round as a 512-bit register, instead of chains of shift registers and BRAMs, then the rounds can be isolated, but ISE fails to Map such a design for reasons I have not yet nailed down. 512-bits*~100 is quite a lot of registers  Undecided

If I, or someone else, can find a way to isolate the rounds and put them into a more consistent chain, then I highly suspect that both performance and area will improve considerably.

I may create a "fake" design that focuses specifically on the W calculations (without digester rounds), and see if I can somehow get them routed into a sensible structure (even if it requires manual placement  Angry )
The roadblock to having a high density FPGA design (in this case) is not your routing issues.  The logic you are using to compute the basic hashes is not optimal, and you have not spent any time trying to optimize for your critical path.  I would suggest you concentrate your efforts in this domain (hint hint).  Keep in mind that you are duplicating each round 128 times, so any logic savings per round is magnified by 128x.
fpgaminer (OP)
Hero Member
*****
Offline Offline

Activity: 560
Merit: 517



View Profile WWW
August 14, 2011, 11:27:54 PM
 #480

Quote
My criticism of this design (your design?)  is that there is too much pipelining.
Thank you for the criticism. I really do appreciate the feedback, and I am by no means an expert Smiley

My intuition is similar to yours, in that a more traditional serial design should achieve better utilization and performance on the Spartan-6 architecture. But it is very easy to underestimate the massive number of optimizations that occur in the fully unrolled design that is my current primary focus.

I have a functioning serial implementation, but so far my estimates for its total performance once put in parallel on the S6-LX150 are not exciting. Something like 120MH/s of performance. It's in the back of my mind, and there is plenty more work to be done in optimizing and perfecting it, but it hasn't shown me enough promise to warrant being in my mental spotlight like the unrolled design.

Quote
The logic you are using to compute the basic hashes is not optimal, and you have not spent any time trying to optimize for your critical path.
The current critical path is approximately two 3-way 32-bit adders implemented as 16 total slices, thanks to the Spartan-6 fast carry-lookahead chains. Is there a means of optimizing that logic that I have missed?
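For context, one textbook alternative to chaining carry-propagate adders is carry-save addition, where three operands are first reduced to two without any carry propagation and only the final add ripples. The sketch below just demonstrates the arithmetic in software with made-up helper names. It is not a claim that this maps better onto Spartan-6 slices than the ternary adders described above, and the synthesis tools may already do something equivalent.

```python
import random

MASK = 0xFFFFFFFF


def carry_save(a, b, c):
    """3:2 compressor: reduce three 32-bit words to a sum word and a carry word.

    Each bit position is an independent full adder, so there is no carry chain.
    """
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum & MASK, carry & MASK


def add3(a, b, c):
    """Three-operand add modulo 2**32 with a single carry-propagate step."""
    partial_sum, carry = carry_save(a, b, c)
    return (partial_sum + carry) & MASK


rng = random.Random(0)
for _ in range(1000):
    x, y, z = (rng.getrandbits(32) for _ in range(3))
    assert add3(x, y, z) == (x + y + z) & MASK
```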
