Bitcoin Forum
Poll
Question: Which FPGA shall be used on our prototype?
Xilinx Spartan 6 LX 150 - 17 (70.8%)
Altera Cyclone IV 75k - 7 (29.2%)
Total Voters: 24

Author Topic: Modular FPGA Miner Hardware Design Development  (Read 119223 times)
jonboy009
Newbie
*
Offline Offline

Activity: 11
Merit: 0


View Profile
June 28, 2011, 02:56:05 PM
 #101

Hi

Please check out http://forum.bitcoin.org/index.php?topic=23015 if you need any ideas on the computer interface side of your project.

At the bottom of the message is my take on an FPGA card. Could anyone please guide me as to what I may be missing on it?

thanks

jonboy
Olaf.Mandel
Member
**
Offline Offline

Activity: 70
Merit: 10


View Profile
June 28, 2011, 05:59:37 PM
 #102

What do you need an Arduino for? The FT2232 does I2C. Edit: Or whatever other USB-chip we decide upon. And I2C has not even been firmly included, yet AFAIK.

The 100mil post system is much worse than the DIMM at holding vertical cards. Only if you have the card parallel to the backplane and use a connector on each end of the elongated board will it be sufficiently stable (my opinion, but plug in a largish card on a few pins and then move it to get a feel).

The text for the pushbutton sounds like hot-plugging. I don't think we want to go there in the first revision: get it wrong and you have fried one or more cards. As for card detection: no mechanics needed: just short two pins in the connector on the FPGA card and the backplane can detect the presence of this connection. And you mentioned the Arduino again. If the backplane becomes "intelligent" in a later revision, I think a somewhat more powerful CPU would be in order to also handle Ethernet. Or use an even smaller MCU or even a CPLD if all you want is a bit of control logic...
Olaf.Mandel
Member
**
Offline Offline

Activity: 70
Merit: 10


View Profile
June 28, 2011, 06:25:28 PM
 #103

What do you need an Arduino for? The FT2232 does I2C. Edit: Or whatever other USB-chip we decide upon. And I2C has not even been firmly included, yet AFAIK.
[...]

Admittedly: that would be slow as hell: there is only one MPSSE, so the I2C needs to be done by bit-banging GPIOs. To read out an EEPROM once per boot, it should be good enough, though.
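
To put a rough number on "slow but good enough", here is a minimal sketch that counts the SCL clock cycles of one sequential EEPROM read and converts them to time. The EEPROM size, the single word-address byte and the bit-bang clock rates are all assumptions for illustration; none of them come from the thread, and start/stop conditions are ignored.

```python
# Rough timing estimate for one sequential I2C EEPROM read done by bit-banging.
# Every figure used in the example run is an assumption, not a measured value.

def i2c_read_bits(num_bytes: int, addr_bytes: int = 1) -> int:
    """SCL cycles for a sequential read; each byte costs 8 data bits plus 1 ACK bit."""
    write_phase = 9 * (1 + addr_bytes)  # device address + word address byte(s)
    read_phase = 9 * (1 + num_bytes)    # device address again + the data bytes
    return write_phase + read_phase


def read_time_seconds(num_bytes: int, scl_hz: float, addr_bytes: int = 1) -> float:
    """Time for the read at a given effective SCL rate (start/stop overhead ignored)."""
    if scl_hz <= 0:
        raise ValueError("scl_hz must be positive")
    return i2c_read_bits(num_bytes, addr_bytes) / scl_hz


if __name__ == "__main__":
    # Hypothetical 256-byte EEPROM at a few hypothetical bit-bang rates.
    for scl_hz in (1_000, 10_000, 100_000):
        t = read_time_seconds(256, scl_hz)
        print(f"256 bytes at {scl_hz} Hz SCL: {t:.3f} s")
```

Even at an effective 1 kHz clock the assumed 256-byte read takes only a couple of seconds, which fits the "once per boot" use case.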
jonboy009
Newbie
*
Offline Offline

Activity: 11
Merit: 0


View Profile
June 28, 2011, 06:56:36 PM
 #104

I was thinking that the Arduino could be used mainly for the SPI interface, but I will bow to the more electronics-minded people in the group.

At a later stage of the idea I was looking at a card designed like this:

http://img696.imageshack.us/img696/6639/fpga1.jpg

But once again, with no design or circuit skills, I will bow to the group.

Hot-plugging with the Arduino in SPI interface mode was an idea for the next card not powered by the motherboard to signal its need for power and prompt the Arduino to power up the 400 W PSU. Adding cards would only be done with the system turned off.

Ethernet - the motherboard has a 10/100/1000 link. I thought the data rate of the Bitcoin network was low?

Would there be far more data moving over the DIMM (USB or whatever chip is chosen) link than over the Ethernet?

jonboy

Olaf.Mandel
Member
**
Offline Offline

Activity: 70
Merit: 10


View Profile
June 28, 2011, 07:17:55 PM
 #105

[...]
The text for the pushbutton sounds like hot-plugging. I don't think we want to [go] there in the first revision: get it wrong and you have fried one or more cards. As for card detection: no mechanics needed: just short two pins in the connector on the FPGA card and the backplane can detect the presence of this connection. And you mentioned the Arduino again. If the backplane becomes "intelligent" in a later revision, I think a bit more powerful CPU would be in order to also handle Ethernet. Or use an even smaller MCU or even a CPLD if all you want is a bit of control logic...

I was thinking that the arduino could be used for the spi interface mainly but i will bow to the more electronic people in the group.

I am (mostly) trying to keep size and cost down. I must admit though, that I am strongly underwhelmed by this Arduino: many people do many things with it, but it seems like the wrong thing for this project. Not trying to start a flame war here...

[...]
Hot-plugging with the arduino in spi interface mode was an idea for the next card not powered by the motherboard to signal it's need for power and prompt the arduino to power up the 400watt psu. Adding cards wuld only be done with the system turned off.

What I write next is not fully decided yet, so these are my own thoughts: the local power-up (12V -> 1.2V etc.) is controlled by a pin of the USB chip. So as soon as the host computer starts up the chip (detects its presence on the USB bus), it will boot up the local power supplies. As for powering on the external ATX power supply, I don't think concrete plans have been made. It is possible to have the backplane do this the same way as for the individual FPGA cards (by USB detection). It is also possible to assume a "constant on": the laptop power supplies discussed here do not have an enable input.

Ethernet - the motherboard has a 10/100/1000 link. As i thought the data rate of the bitcoin network was low?

Would there be far more data moving over the dimm (usb or what ever chip is chosen) link than the ehternet?

Why are you assuming a large data rate? The connection is extremely slow: JTAG can be thought of as the acoustic coupler of the on-PCB data busses: every station understands what they say and it is robust, but not really fast. We don't need more for this project.

If you are referring to me mentioning Ethernet: there are some participants of this board who would love to design an Ethernet capable backplane that has its own host computer included. You basically plug it into power and network and it starts mining! It was decided to postpone this for the second version, though: once the USB connected version works, adding the CPU on the backplane is straightforward: there is no extra work involved by going over another iteration of boards and that way is safer.
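
On the data-rate question above, a back-of-the-envelope sketch may help. The nonce field of a block header is 32 bits, so one work unit keeps a miner busy for 2^32 hashes. The work-unit size (44 bytes: a 32-byte midstate plus 12 bytes of remaining header data) and the 4-byte result are assumptions for illustration, not something specified in this thread, and polling overhead is ignored.

```python
# Estimate of how little data a miner needs per second over its host link.
# work_bytes and result_bytes are illustrative assumptions.

NONCE_SPACE = 2 ** 32  # the nonce field in a block header is 32 bits wide


def seconds_per_work_unit(hash_rate_hz: float) -> float:
    """How long one work unit lasts if the miner scans the full nonce range."""
    if hash_rate_hz <= 0:
        raise ValueError("hash_rate_hz must be positive")
    return NONCE_SPACE / hash_rate_hz


def link_bits_per_second(hash_rate_hz: float,
                         work_bytes: int = 44,
                         result_bytes: int = 4) -> float:
    """Average payload rate, assuming one work unit in and one result out per scan."""
    bits_per_unit = 8 * (work_bytes + result_bytes)
    return bits_per_unit / seconds_per_work_unit(hash_rate_hz)


if __name__ == "__main__":
    rate = 100e6  # 100 Mhash/s, the order of magnitude discussed in this thread
    print(f"one work unit lasts {seconds_per_work_unit(rate):.1f} s")
    print(f"average payload: {link_bits_per_second(rate):.1f} bit/s")
```

Under these assumptions the payload is on the order of ten bits per second per FPGA, which is why a slow bus such as JTAG is not the bottleneck.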
jonboy009
Newbie
*
Offline Offline

Activity: 11
Merit: 0


View Profile
June 28, 2011, 07:37:27 PM
 #106

The question regarding Ethernet speed was asked in a different forum post some time ago. I think it was asked whether the data transferred was bigger than a BitTorrent transfer.

But once again I get speed and data transfer mixed up.

How are you planning to connect the DIMM bay board to the computer? EDIT - sorry, spotted my own error here, as you said it would be by USB.

jonboy
marcus_of_augustus
Legendary
*
Offline Offline

Activity: 3920
Merit: 2348


Eadem mutata resurgo


View Profile
June 28, 2011, 09:49:28 PM
Last edit: June 28, 2011, 10:23:22 PM by noone
 #107


How much? and how soon?

Price and availability? ...  Smiley

Someone has to draw the thing first! I will put my current (1FPGA) board up soonish for someone to convert to the desired board.

Yeah, I know that .... Just trying to make you aware that there are other considerations for a successful piece of kit (any product) besides the technical solution.

Edit: oops, see that you're already doing this ... butting out now.

Saying that I'd probably be in the market just for the interest aspect. What software is freely available and/or supplied?

Olaf.Mandel
Member
**
Offline Offline

Activity: 70
Merit: 10


View Profile
June 28, 2011, 09:58:24 PM
 #108

I tried benchmarking different FPGAs against each other (simulated only, no hardware in hand). For that I took the current code of OrphanedGland, removed the Altera-specific parts from top.vhd and tried compiling it. The interface to the outside world is just a shift register, nothing that would really be used. While I was at it, I added rolling up of the 128-step long pipeline at least for the first stage: divide the size by two for a cost of half the speed.

My problem is: I must have made a mistake somewhere, as even the code without my rolling modification gives ridiculously fast speeds. Normally I wouldn't complain  Smiley, but this seems a bit much! Can someone look at the code and tell me if they find the mistake?

In case I didn't make a mistake, or the mistake I made does not invalidate the speeds reported by the compiler, here are my results for Xilinx:

I always used the default settings after starting a fresh project. The one exception: I enabled "-write_timing_constraints" under Synthesize. Then I ran the Synthesize target. The later targets (Map & Route) were not run. If the LUT usage in the table is larger than 100%, the hash rate is, of course, purely academic.

Device             NUM_CORES  SHA256_SEL  ROLLUP  Slice LUT usage  Max freq [MHz]  Rate [Mhash/s]
XC6SLX45-3CSG324   1          1           1       143%             217             108
XC6SLX45-3CSG324   1          0           1       119%             157             78
XC6SLX75-3CSG484   1          1           0       166%             217             217
XC6SLX75-3CSG484   1          0           0       138%             156             156
XC6SLX75-3CSG484   1          1           1       84%              217             108
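
The rate column in the table above appears to follow from the clock and the roll-up factor: one hash per clock when fully unrolled, halved for each roll-up step. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and the linear scaling with NUM_CORES is an assumption, since every row in the table uses a single core.

```python
# Reproduce the "Rate [Mhash/s]" column from Max freq and ROLLUP.
# Scaling by num_cores is an assumption; the table only has NUM_CORES = 1.

def hash_rate_mhash(fmax_mhz: float, num_cores: int = 1, rollup: int = 0) -> float:
    """One hash per clock per core when unrolled; each roll-up step halves it."""
    return num_cores * fmax_mhz / (2 ** rollup)


if __name__ == "__main__":
    # (device, NUM_CORES, ROLLUP, Max freq [MHz]) taken from the table above.
    rows = [
        ("XC6SLX45-3CSG324", 1, 1, 217),
        ("XC6SLX45-3CSG324", 1, 1, 157),
        ("XC6SLX75-3CSG484", 1, 0, 217),
        ("XC6SLX75-3CSG484", 1, 0, 156),
        ("XC6SLX75-3CSG484", 1, 1, 217),
    ]
    for device, cores, rollup, fmax in rows:
        # The table lists whole numbers, so the fractional part is dropped here.
        print(device, int(hash_rate_mhash(fmax, cores, rollup)))
```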

Can someone who has the ISE Logic Edition or better do the simulation for the XC6SLX100 and XC6SLX150?

I didn't get anything for Altera: I don't know the software that well and there are errors when trying to estimate the clock frequency (Early Timing Estimate gets negative numbers).
TheSeven
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500


FPGA Mining LLC


View Profile WWW
June 29, 2011, 12:31:09 AM
 #109

Don't trust timing data provided at the synthesis stage. The data after place&route is what's relevant, and it is often largely different from the post-synthesis estimates. Assuming it doesn't get stuck during routing, which seems to happen more often than not for Spartan6 FPGAs.

I could try to do a full synthesis run for the bigger ones overnight if you share your ISE project with me Smiley


My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
Olaf.Mandel
Member
**
Offline Offline

Activity: 70
Merit: 10


View Profile
June 29, 2011, 06:45:34 AM
 #110

[...] Can someone look at the code and tell me if they find the mistake?
[...]

Don't trust timing data provided at the synthesis stage. The data after place&route is what's relevant, and it is often largely different from the post-synthesis estimates. Assuming it doesn't get stuck during routing, which seems to happen more often than not for Spartan6 FPGAs.

I could try to do a full synthesis run for the bigger ones overnight if you share your ISE project with me Smiley

It is only a four-letter link so you may have missed it, but the code is at https://rapidshare.com/files/1053611031/benchmark.zip. I already found the first bug in the ROLLUP, unfortunately:

Code:
--- benchmark/top.vhd
+++ benchmark/top.vhd
@@ -257 +257 @@
-        q_nonce(i)              <= q_nonce(0) + 2**ROLLUP * (i + NUM_CORES);
+        q_nonce(i)              <= q_nonce(0) + 2**ROLLUP * i + NUM_CORES;

The new clock speed for the one configuration I showed previously that fits is now 176MHz, so 88Mhash/s (still Synthesis only). "Implement Design" now running.
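
For readers who do not have top.vhd at hand, the two expressions in the diff differ only in grouping. The sketch below simply evaluates both forms in Python, where `**`, `*` and `+` bind in the same order as in VHDL. It makes no claim about what the rest of the design expects; the function names are made up for illustration.

```python
# Evaluate the two nonce-offset expressions from the diff for a few values.
# This only illustrates operator grouping, not the behaviour of the real design.

def offset_grouped(i: int, num_cores: int, rollup: int) -> int:
    """2**ROLLUP * (i + NUM_CORES): the form on the '-' line of the diff."""
    return 2 ** rollup * (i + num_cores)


def offset_ungrouped(i: int, num_cores: int, rollup: int) -> int:
    """2**ROLLUP * i + NUM_CORES: the form on the '+' line of the diff."""
    return 2 ** rollup * i + num_cores


if __name__ == "__main__":
    for rollup in (0, 1):
        for i in range(3):
            a = offset_grouped(i, num_cores=1, rollup=rollup)
            b = offset_ungrouped(i, num_cores=1, rollup=rollup)
            print(f"ROLLUP={rollup} i={i}: grouped={a} ungrouped={b}")
```

The two forms agree whenever ROLLUP is 0 and diverge as soon as ROLLUP is 1 or more, which is consistent with the bug only showing up in the rolled-up configurations.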
Olaf.Mandel
Member
**
Offline Offline

Activity: 70
Merit: 10


View Profile
June 29, 2011, 10:42:03 AM
 #111

Update on Altera speed: This is with using the default settings, in the "Early Timing Estimate with Synthesis" flow, looking at the Fmax value from the "Slow 1200mV 85C Model":

Device          NUM_CORES  SHA256_SEL  ROLLUP  Logic element usage  Max freq [MHz]  Rate [Mhash/s]
EP4CE115F23C7   1          0           0       96%                  87              87

And I only got Xilinx "Post Place & Route Static Timing" results when going for the smallest design on the largest chip I can look at (XC6SLX75, NUM_CORES=1, SHA256_SEL=0, ROLLUP=1) with the goal set to "Area reduction" (strategy 2). Even then the map step fails because the design is too dense. Here is the best clock it can find while leaving 100k signals unrouted:

Device             NUM_CORES  SHA256_SEL  ROLLUP  Slice LUT usage  Max freq [MHz]  Rate [Mhash/s]
XC6SLX75-3CSG484   1          0           1       71%              52              26

This data is too fragile to make a decision Xilinx <-> Altera. For example, the Altera chip I tried is a lot more expensive than the Xilinx one (exceeding the specified 200EUR limit, actually).

Experts?
BkkCoins
Hero Member
*****
Offline Offline

Activity: 784
Merit: 1009


firstbits:1MinerQ


View Profile WWW
June 29, 2011, 11:09:21 AM
 #112

I think it has already been shown to be difficult to route even the Spartan LX150 size, so you're not likely to get much usable speed on smaller ones.

lame.duck
Legendary
*
Offline Offline

Activity: 1270
Merit: 1000


View Profile
June 29, 2011, 12:12:43 PM
 #113

This data is too fragile to make a decision Xilinx <-> Altera. For example, the Altera chip I tried is a lot more expensive than the Xilinx one (exceeding the specified 200EUR limit, actually).

Experts?

I did a test compile for an ep3c120c8 and got 86% logic cells and an Fmax of 83 MHz. I did not check which is the next legal clock that a PLL could generate, nor did I turn optimisation on. I think I got similar results with the last design by fpgaminer.

If your rolled-up design really works (the last design by fpgaminer only works fully unrolled), you could also try filling this and that chip with miners to get an estimate of how many logic elements are needed, and use this as a scale factor for smaller and bigger chips. And we could calculate a matrix with the cost per MHash/s ...
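
A cost-per-Mhash/s matrix like the one suggested above could be computed along these lines. This is only a sketch: the function names are made up, and the price used in the example is the 200 EUR budget limit mentioned in this thread, standing in as a placeholder because no real chip prices have been posted. The rates are the synthesis-stage estimates quoted so far (87 and 83 MHz, assuming one hash per clock), so the output is illustrative at best.

```python
# Sketch of a cost-per-Mhash/s ranking. Prices here are placeholders, not quotes.
from typing import Dict, List, Tuple


def cost_per_mhash(price_eur: float, rate_mhash: float) -> float:
    """Price divided by hash rate; lower is better."""
    if rate_mhash <= 0:
        raise ValueError("rate_mhash must be positive")
    return price_eur / rate_mhash


def cost_matrix(candidates: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Map name -> (price_eur, rate_mhash) to a list sorted by cost per Mhash/s."""
    rows = [(name, cost_per_mhash(price, rate))
            for name, (price, rate) in candidates.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    BUDGET_EUR = 200.0  # placeholder: the budget limit from the thread, not a chip price
    candidates = {
        "EP4CE115 (early timing estimate)": (BUDGET_EUR, 87.0),
        "ep3c120c8 (test compile)": (BUDGET_EUR, 83.0),
    }
    for name, cost in cost_matrix(candidates):
        print(f"{name}: {cost:.2f} EUR per Mhash/s")
```

Once real prices and post-place-and-route clock rates are known, they can replace the placeholders without changing the calculation.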
O_Shovah (OP)
Sr. Member
****
Offline Offline

Activity: 410
Merit: 252


Watercooling the world of mining


View Profile
June 29, 2011, 08:15:10 PM
 #114

It seems that in the given price range (I guess it's just reasonable to limit this on a prototype) we will have to choose from one of the following.

Xilinx:

Full Spartan 6 series (especially the LX150, which is reported to have achieved up to 190 Mhash/s)

Plus earlier series (which seem to be uncompetitive regarding Mhash/€ efficiency)

Altera:

The Cyclone IV series below 100k LEs (more would break my price limit but may be discussed if it shows superior performance)

Earlier series (may be used if price is limiting or if it shows better programming characteristics)

If your  rolled up design really works (the last design by fpgaminer does only work  fully unrolled) you could also try to fill this and that chip full with miners to get a estimate how much  logic elements is needed  and  use this as a scale factor for  smaller and bigger chips. And we could calculate a matrix with the cost per MHash/s ...

I would appreciate such a list of performance and Mhash/€.

Status so far:

I've been talking to TheSeven and he proposed the Spartan 6 LX150. He is familiar with Xilinx software designs and may work on that part.

OrphanedGland proposed the Cyclone IV series (and also the Stratix series, but I see those as prohibitive regarding price).

fpgaminer has published miner software for the Altera Cyclone IV 115. I contacted him for further discussion of the subject and am currently waiting for his reply.


Please give your educated comment Smiley

PS:


I have my exams at university coming up.
So from now on I will only be able to work on the thread once a day until the beginning of August.
I think it will go on as smoothly as it did up to now.
Thanks a lot for all your work and ideas  Smiley


 

teknohog
Sr. Member
****
Offline Offline

Activity: 519
Merit: 252


555


View Profile WWW
June 30, 2011, 09:36:10 AM
 #115

I'm no expert on the hardware design, but there is at least one issue against the LX150: It requires commercial software to synthesize the design. The free web edition only supports Spartan 6 up to LX45, if I recall correctly.

Xilinx seems to package the requisite software with their dev kits, so it is likely that someone here has it and could distribute the generated bitfile. But it is a problem for this open source mentality if I cannot tinker with the device I own.

Another question, is there already an open source implementation that works on the LX150? We know there is one for the Cyclone IV, along with a nicer license policy of the synthesis software.

world famous math art | masternodes are bad, mmmkay?
Every sha(sha(sha(sha()))), every ho-o-o-old, still shines
Olaf.Mandel
Member
**
Offline Offline

Activity: 70
Merit: 10


View Profile
June 30, 2011, 10:59:13 AM
 #116

[...]
I'm no expert on the hardware design, but there is at least one issue against the LX150: It requires commercial software to synthesize the design. The free web edition only supports Spartan 6 up to LX45, if I recall correctly.
[...]

LX75, actually. That is why I was doing my compiles for that chip Smiley
grid
Newbie
*
Offline Offline

Activity: 42
Merit: 0


View Profile
June 30, 2011, 12:36:05 PM
 #117

For the standalone board version, perhaps this form-factor is interesting instead of the DIMM form-factor:

http://img825.imageshack.us/img825/2749/frontsidetwistunicellul.th.jpg
http://www.illuminatolabs.com/IlluminatoXMachina.htm

What I like about that type of design:
 - Ultra-modular - you can start with just one PCB FPGA node - provide power, use RS232 to communicate with PC
 - Minimum software development required for the one-node version. Scale the PoC by using a USB-to-multiple-RS232 hub.
 - Step up to PCB variants - use 2 PCB variants - one with an Atmel ASIC (Arduino?) providing USB connectivity to the PC and power to the FPGA nodes. Use I2C to communicate between the hub node and the FPGA worker nodes.
 - one Hub node could support as many worker nodes as the I2C and power distribution network allows (current, ohmic resistance of the interconnects)
 - Flat design - FPGAs can be easily fitted with heavy heatsinks since the weight is supported by a flat surface, or PCBs screw mounted to a flat surface. Don't care about temperature sensing with a glued-on heatsink.
 - You could implement support for many hubs on the I2C bus with failover capability

 - Low hardware cost? - minimum amount of components, cheap connectors. Cost probably comparable to the DIMM version.

The cons:
 - The hub node concept may be complicated to implement, but quite scalable and powerful if done correctly.
 - It won't be as elegant as the daughterboard-DIMM concept

Just some food for thought.

Disclaimer: I'm more of a hobbyist than an electronics professional. I'm a software guy.
phillipsjk
Legendary
*
Offline Offline

Activity: 1008
Merit: 1001

Let the chips fall where they may.


View Profile WWW
June 30, 2011, 03:21:35 PM
 #118

I don't want to kill the project via feature creep, but if the chips can be individually brought up via USB, can they be shut down as well?

Uses I have in mind:
  • Stand-by computing power for if the network hash rate drops.
  • Running from solar power; with the ability to shed load if the sun goes behind a cloud.
  • Replacing resistive heater with expensive elements that happen to generate bitcoin

James' OpenPGP public key fingerprint: EB14 9E5B F80C 1F2D 3EBE  0A2F B3DE 81FF 7B9D 5160
Olaf.Mandel
Member
**
Offline Offline

Activity: 70
Merit: 10


View Profile
June 30, 2011, 04:41:04 PM
 #119

I don't want to kill the project via feature-creep, but if the chips can be individually be brought up via USB, can they be shut-down as well?
[...]

A resounding "yes". The individual 12V -> 1.2V etc. power supplies we have discussed here all have an enable input: if the host interface chip (to USB or later Ethernet) deasserts that signal, the power supplies shut off and then only a very small quiescent current is drawn by them (plus the power for the interface chip, e.g. from USB). This has already been proposed, but no one commented on it before. I take your post as a "pro" vote.
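
With a per-card enable signal, the load-shedding use case (for example solar power) reduces to deciding how many enables to assert for a given power budget. The sketch below shows only that decision on the host side. The per-card wattage is a made-up placeholder, and how the enable lines would actually be driven through the USB chip is left out because it has not been decided in this thread.

```python
# Host-side load-shedding decision: which card enables to assert for a power budget.
# watts_per_card is a hypothetical figure; no card power draw has been measured yet.
from typing import List


def cards_to_enable(available_watts: float, watts_per_card: float, num_cards: int) -> int:
    """Largest number of cards that fits into the available power."""
    if watts_per_card <= 0:
        raise ValueError("watts_per_card must be positive")
    fits = int(available_watts // watts_per_card)
    return max(0, min(num_cards, fits))


def enable_states(available_watts: float, watts_per_card: float, num_cards: int) -> List[bool]:
    """Desired state of each card's enable input, lowest-numbered cards first."""
    n = cards_to_enable(available_watts, watts_per_card, num_cards)
    return [index < n for index in range(num_cards)]


if __name__ == "__main__":
    # Hypothetical: 10 cards at 15 W each, with the supply dropping over time.
    for watts in (200.0, 100.0, 20.0, 0.0):
        print(watts, enable_states(watts, watts_per_card=15.0, num_cards=10))
```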
Olaf.Mandel
Member
**
Offline Offline

Activity: 70
Merit: 10


View Profile
June 30, 2011, 04:58:41 PM
 #120

For the standalone board version, perhaps this form-factor is interesting instead of the DIMM form-factor:
[...]
 - Ultra-modular - you can start with just one PCB FPGA node - provide power, use RS232 to communicate with PC

Not RS232: you need an extra programming adapter then: either the interface chip on the hub node or the actual FPGA need to have firmware uploaded to them to understand RS232. Maybe you could bit-bang a different protocol over the connector, but that can't be very fast or portable (corrections?).

[...]
 - Step up to PCB-variants - use 2 PCB variants - one with a Atmel ASIC (Arduino?) providing USB connectivity to PC, and power to FPGA nodes. Use I2C to communicate between hub node and FPGA worker nodes.

Do you intend I2C as the only means of communication between the cards? Then my point from above is again valid: you need an extra cable / bus to transfer firmware to the FPGAs.

[...]
 - You could implement support for many hubs on the I2C bus with failover capability
[...]

We are outside my area of expertise, so correct me if I am wrong: does I2C allow for multiple paths between a master and any given slave? I would have expected a single path be important (maybe not for the 400kHz variant). If a single path is required, you would need to add I2C switches to each board to do the routing.

I must admit the 2D grid of FPGAs would look aesthetically pleasing. But having to replace a single board in the middle seems to be really tough! Actually, how would you even connect a larger set of boards (say, 5x5)? You would have to connect the boards to rows and then plug together the rows, 5 boards at a time!