jonboy009
Newbie
Offline
Activity: 11
Merit: 0
|
|
June 28, 2011, 02:56:05 PM |
|
Hi, please check out http://forum.bitcoin.org/index.php?topic=23015 if you need any ideas on the computer interface side of your project. At the bottom of the message is my take on an FPGA card. Could anyone guide me as to what I may be missing on it? Thanks, jonboy
|
|
|
|
Olaf.Mandel
Member
Offline
Activity: 70
Merit: 10
|
|
June 28, 2011, 05:59:37 PM |
|
What do you need an Arduino for? The FT2232 does I2C. Edit: Or whatever other USB-chip we decide upon. And I2C has not even been firmly included, yet AFAIK.
The 100mil post system is much worse than the DIMM at holding vertical cards. Only if you have the card parallel to the backplane and use a connector on each end of the elongated board will it be sufficiently stable (my opinion, but plug in a largish card on a few pins and then move it to get a feel).
The text for the pushbutton sounds like hot-plugging. I don't think we want to go there in the first revision: get it wrong and you have fried one or more cards. As for card detection: no mechanics needed: just short two pins in the connector on the FPGA card and the backplane can detect the presence of this connection. And you mentioned the Arduino again. If the backplane becomes "intelligent" in a later revision, I think a somewhat more powerful CPU would be in order to also handle Ethernet. Or use an even smaller MCU or even a CPLD if all you want is a bit of control logic...
|
|
|
|
Olaf.Mandel
Member
Offline
Activity: 70
Merit: 10
|
|
June 28, 2011, 06:25:28 PM |
|
What do you need an Arduino for? The FT2232 does I2C. Edit: Or whatever other USB-chip we decide upon. And I2C has not even been firmly included, yet AFAIK. [...]
Admittedly: that would be slow as hell: there is only one MPSSE, so the I2C needs to be done by bit-banging GPIOs. To read out an EEPROM once per boot, it should be good enough, though.
|
|
|
|
jonboy009
Newbie
Offline
Activity: 11
Merit: 0
|
|
June 28, 2011, 06:56:36 PM |
|
What do you need an Arduino for? The FT2232 does I2C. Edit: Or whatever other USB-chip we decide upon. And I2C has not even been firmly included, yet AFAIK.
The 100mil post system is much worse than the DIMM at holding vertical cards. Only if you have the card parallel to the backplane and use a connector on each end of the elongated board will it be sufficiently stable (my opinion, but plug in a largish card on a few pins and then move it to get a feel).
The text for the pushbutton sounds like hot-plugging. I don't think we want to go there in the first revision: get it wrong and you have fried one or more cards. As for card detection: no mechanics needed: just short two pins in the connector on the FPGA card and the backplane can detect the presence of this connection. And you mentioned the Arduino again. If the backplane becomes "intelligent" in a later revision, I think a somewhat more powerful CPU would be in order to also handle Ethernet. Or use an even smaller MCU or even a CPLD if all you want is a bit of control logic...
I was thinking that the Arduino could be used mainly for the SPI interface, but I will bow to the more electronics-minded people in the group. At a later stage of the idea I was looking at a card designed like this: http://img696.imageshack.us/img696/6639/fpga1.jpg
But once again, with no design or circuit skills, I will bow to the group. Hot-plugging with the Arduino in SPI interface mode was an idea for the next card not powered by the motherboard to signal its need for power and prompt the Arduino to power up the 400 watt PSU. Adding cards would only be done with the system turned off.
Ethernet: the motherboard has a 10/100/1000 link. I thought the data rate of the Bitcoin network was low? Would there be far more data moving over the DIMM (USB or whatever chip is chosen) link than over the Ethernet? jonboy
|
|
|
|
Olaf.Mandel
Member
Offline
Activity: 70
Merit: 10
|
|
June 28, 2011, 07:17:55 PM |
|
[...] The text for the pushbutton sounds like hot-plugging. I don't think we want to [go] there in the first revision: get it wrong and you have fried one or more cards. As for card detection: no mechanics needed: just short two pins in the connector on the FPGA card and the backplane can detect the presence of this connection. And you mentioned the Arduino again. If the backplane becomes "intelligent" in a later revision, I think a bit more powerful CPU would be in order to also handle Ethernet. Or use an even smaller MCU or even a CPLD if all you want is a bit of control logic...
I was thinking that the Arduino could be used mainly for the SPI interface, but I will bow to the more electronics-minded people in the group.
I am (mostly) trying to keep size and cost down. I must admit, though, that I am strongly underwhelmed by this Arduino: many people do many things with it, but it seems like the wrong thing for this project. Not trying to start a flame war here...
[...] Hot-plugging with the Arduino in SPI interface mode was an idea for the next card not powered by the motherboard to signal its need for power and prompt the Arduino to power up the 400 watt PSU. Adding cards would only be done with the system turned off.
What I write next is not fully decided yet, so these are my own thoughts: The local power-up (12V -> 1.2V etc.) is controlled by a pin of the USB chip. So as soon as the host computer starts up the chip (detects its presence on the USB bus), it will boot up the local power supplies. To power on the external ATX power supply, I don't think concrete plans have been made. It is possible to have the backplane do this the same way as for the individual FPGA cards (by USB detection). It is also possible to assume a "constant on": the laptop power supplies discussed here do not have an enable input.
Ethernet - the motherboard has a 10/100/1000 link. As I thought the data rate of the Bitcoin network was low?
Would there be far more data moving over the DIMM (USB or whatever chip is chosen) link than the Ethernet?
Why are you assuming a large data rate? The connection is extremely slow: JTAG can be thought of as the acoustic coupler of the on-PCB data busses: every station understands what they say and it is robust, but not really fast. We don't need more for this project. If you are referring to me mentioning Ethernet: there are some participants of this board who would love to design an Ethernet-capable backplane that has its own host computer included. You basically plug it into power and network and it starts mining! It was decided to postpone this for the second version, though: once the USB-connected version works, adding the CPU on the backplane is straightforward: there is no extra work involved in going through another iteration of boards, and that way is safer.
|
|
|
|
jonboy009
Newbie
Offline
Activity: 11
Merit: 0
|
|
June 28, 2011, 07:37:27 PM |
|
Ethernet - the motherboard has a 10/100/1000 link. As I thought the data rate of the Bitcoin network was low?
Would there be far more data moving over the DIMM (USB or whatever chip is chosen) link than the Ethernet?
Why are you assuming a large data rate? The connection is extremely slow: JTAG can be thought of as the acoustic coupler of the on-PCB data busses: every station understands what they say and it is robust, but not really fast. We don't need more for this project. If you are referring to me mentioning Ethernet: there are some participants of this board who would love to design an Ethernet-capable backplane that has its own host computer included. You basically plug it into power and network and it starts mining! It was decided to postpone this for the second version, though: once the USB-connected version works, adding the CPU on the backplane is straightforward: there is no extra work involved in going through another iteration of boards, and that way is safer.
The question regarding Ethernet speed was asked in a different forum post some time ago. I think it was asked whether the data transferred was bigger than a BitTorrent transfer. But once again I get speed and data transfer mixed up. How are you planning to connect the DIMM bay board to the computer? EDIT: sorry, spotted my own error here, as you said it would be by USB. jonboy
|
|
|
|
marcus_of_augustus
Legendary
Offline
Activity: 3920
Merit: 2349
Eadem mutata resurgo
|
|
June 28, 2011, 09:49:28 PM Last edit: June 28, 2011, 10:23:22 PM by noone |
|
How much? and how soon? Price and availability? ...
Someone has to draw the thing first! I will put my current (1FPGA) board up soonish for someone to convert to the desired board.
Yeah, I know that .... Just trying to make you aware that there are other considerations for a successful piece of kit (any product) besides the technical solution. Edit: oops, see that you're already doing this ... butting out now. Saying that, I'd probably be in the market just for the interest aspect. What software is freely available and/or supplied?
|
|
|
|
Olaf.Mandel
Member
Offline
Activity: 70
Merit: 10
|
|
June 28, 2011, 09:58:24 PM |
|
I tried benchmarking different FPGAs against each other (simulated only, no hardware in hand). For that I took the current code of OrphanedGland, removed the Altera-specific parts from top.vhd and tried compiling it. The interface to the outside world is just a shift register, nothing that would really be used. While I was at it, I added rolling up of the 128-step long pipeline, at least for the first stage: divide the size by two for a cost of half the speed.
My problem is: I must have made a mistake somewhere, as even the code without my rolling modification gives ridiculously fast speeds. Normally I wouldn't complain, but this seems a bit much! Can someone look at the code and tell me if they find the mistake?
In case I didn't make a mistake, or the mistake I made does not invalidate the speeds reported by the compiler, here are my results for Xilinx: I always used the default settings after starting a fresh project. The one exception: I enabled "-write_timing_constraints" under Synthesize. Then I ran the Synthesize target. The later targets (Map & Route) were not run. If the LUT usage in the table is larger than 100%, the hash rate is, of course, purely academic.
Device | NUM_CORES | SHA256_SEL | ROLLUP | Slice LUT usage | Max freq [MHz] | Rate [Mhash/s] |
XC6SLX45-3CSG324 | 1 | 1 | 1 | 143% | 217 | 108 |
XC6SLX45-3CSG324 | 1 | 0 | 1 | 119% | 157 | 78 |
XC6SLX75-3CSG484 | 1 | 1 | 0 | 166% | 217 | 217 |
XC6SLX75-3CSG484 | 1 | 0 | 0 | 138% | 156 | 156 |
XC6SLX75-3CSG484 | 1 | 1 | 1 | 84% | 217 | 108 |
Can someone who has the ISE Logic Edition or better do the simulation for the XC6SLX100 and XC6SLX150? I didn't get anything for Altera: I don't know the software that well and there are errors when trying to estimate the clock frequency (Early Timing Estimate gets negative numbers).
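For anyone who wants to check the Rate column, here is a rough Python sketch of the arithmetic behind it. It assumes that a fully unrolled core (ROLLUP=0) finishes one hash per clock and that each rollup step halves the throughput, as described above. The function name and the multiplication by NUM_CORES are my own assumptions for illustration, not something taken from top.vhd.

```python
def hash_rate_mhash(fmax_mhz: float, rollup: int, num_cores: int = 1) -> float:
    """Estimated hash rate in Mhash/s.

    Assumptions (not verified against top.vhd): one hash per clock for a
    fully unrolled core, throughput halved per ROLLUP step, and linear
    scaling with the number of cores.
    """
    return num_cores * fmax_mhz / (2 ** rollup)


# (device, ROLLUP, max freq in MHz) taken from the Xilinx table in the post
rows = [
    ("XC6SLX45-3CSG324", 1, 217),
    ("XC6SLX45-3CSG324", 1, 157),
    ("XC6SLX75-3CSG484", 0, 217),
    ("XC6SLX75-3CSG484", 0, 156),
    ("XC6SLX75-3CSG484", 1, 217),
]

for device, rollup, fmax in rows:
    # The table lists whole numbers, so truncate like the post does.
    print(device, rollup, fmax, int(hash_rate_mhash(fmax, rollup)))
```

Truncating the result reproduces the table values (217 MHz with ROLLUP=1 gives 108, 157 MHz gives 78). The numbers are only as good as the clock estimate that goes in.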
|
|
|
|
TheSeven
|
|
June 29, 2011, 12:31:09 AM |
|
I tried benchmarking different FPGAs against each other (simulated only, no hardware in hand). For that I took the current code of OrphanedGland, removed the Altera-specific parts from top.vhd and tried compiling it. The interface to the outside world is just a shift register, nothing that would really be used. While I was at it, I added rolling up of the 128-step long pipeline, at least for the first stage: divide the size by two for a cost of half the speed.
My problem is: I must have made a mistake somewhere, as even the code without my rolling modification gives ridiculously fast speeds. Normally I wouldn't complain, but this seems a bit much! Can someone look at the code and tell me if they find the mistake?
In case I didn't make a mistake, or the mistake I made does not invalidate the speeds reported by the compiler, here are my results for Xilinx: I always used the default settings after starting a fresh project. The one exception: I enabled "-write_timing_constraints" under Synthesize. Then I ran the Synthesize target. The later targets (Map & Route) were not run. If the LUT usage in the table is larger than 100%, the hash rate is, of course, purely academic.
Device | NUM_CORES | SHA256_SEL | ROLLUP | Slice LUT usage | Max freq [MHz] | Rate [Mhash/s] |
XC6SLX45-3CSG324 | 1 | 1 | 1 | 143% | 217 | 108 |
XC6SLX45-3CSG324 | 1 | 0 | 1 | 119% | 157 | 78 |
XC6SLX75-3CSG484 | 1 | 1 | 0 | 166% | 217 | 217 |
XC6SLX75-3CSG484 | 1 | 0 | 0 | 138% | 156 | 156 |
XC6SLX75-3CSG484 | 1 | 1 | 1 | 84% | 217 | 108 |
Can someone who has the ISE Logic Edition or better do the simulation for the XC6SLX100 and XC6SLX150? I didn't get anything for Altera: I don't know the software that well and there are errors when trying to estimate the clock frequency (Early Timing Estimate gets negative numbers).
Don't trust timing data provided at the synthesis stage. The data after place & route is what's relevant, and it often differs greatly from the post-synthesis estimates. That is assuming it doesn't get stuck during routing, which seems to happen more often than not for Spartan6 FPGAs. I could try to do a full synthesis run for the bigger ones overnight if you share your ISE project with me.
|
My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
|
|
|
Olaf.Mandel
Member
Offline
Activity: 70
Merit: 10
|
|
June 29, 2011, 06:45:34 AM |
|
[...] Can someone look at the code and tell me if they find the mistake? [...]
Don't trust timing data provided at the synthesis stage. The data after place&route is what's relevant, and it is often largely different from the post-synthesis estimates. Assuming it doesn't get stuck during routing, which seems to happen more often than not for Spartan6 FPGAs. I could try to do a full synthesis run for the bigger ones overnight if you share your ISE project with me
It is only a four-letter link so you may have missed it, but the code is at https://rapidshare.com/files/1053611031/benchmark.zip. I already found the first bug in the ROLLUP, unfortunately:
--- benchmark/top.vhd
+++ benchmark/top.vhd
@@ -257 +257 @@
- q_nonce(i) <= q_nonce(0) + 2**ROLLUP * (i + NUM_CORES);
+ q_nonce(i) <= q_nonce(0) + 2**ROLLUP * i + NUM_CORES;
The new clock speed for the one configuration I showed previously that fits is now 176 MHz, so 88 Mhash/s (still Synthesis only). "Implement Design" now running.
|
|
|
|
Olaf.Mandel
Member
Offline
Activity: 70
Merit: 10
|
|
June 29, 2011, 10:42:03 AM |
|
Update on Altera speed: This is with the default settings, in the "Early Timing Estimate with Synthesis" flow, looking at the Fmax value from the "Slow 1200mV 85C Model":
Device | NUM_CORES | SHA256_SEL | ROLLUP | Logic element usage | Max freq [MHz] | Rate [Mhash/s] |
EP4CE115F23C7 | 1 | 0 | 0 | 96% | 87 | 87 |
And I only got Xilinx "Post Place & Route Static Timing" when going for the smallest design on the largest chip I can look at (XC6SLX75, NUM_CORES=1, SHA256_SEL=0, ROLLUP=1) with the goal set to "Area reduction" (strategy 2). Even then the map step fails because the design is too dense. I give here the best clock it can find while having 100k signals unrouted:
Device | NUM_CORES | SHA256_SEL | ROLLUP | Slice LUT usage | Max freq [MHz] | Rate [Mhash/s] |
XC6SLX75-3CSG484 | 1 | 0 | 1 | 71% | 52 | 26 |
This data is too fragile to base a Xilinx <-> Altera decision on. For example, the Altera chip I tried is a lot more expensive than the Xilinx one (exceeding the specified 200EUR limit, actually). Experts?
|
|
|
|
BkkCoins
|
|
June 29, 2011, 11:09:21 AM |
|
I think it has already been shown to be difficult to route even the Spartan LX150 size, so you're not likely to get much usable speed on smaller ones.
|
|
|
|
lame.duck
Legendary
Offline
Activity: 1270
Merit: 1000
|
|
June 29, 2011, 12:12:43 PM |
|
This data is too fragile to make a decision Xilinx <-> Altera. For example, the Altera chip I tried is a lot more expensive than the Xilinx one (exceeding the specified 200EUR limit, actually).
Experts?
I did a test compile for an EP3C120C8 and got 86% logic cells, Fmax 83 MHz. I did not check which is the next legal clock that a PLL could generate, nor did I turn optimisation on. I think I got similar results with the last design by fpgaminer. If your rolled-up design really works (the last design by fpgaminer only works fully unrolled), you could also try to fill this and that chip full of miners to get an estimate of how many logic elements are needed, and use this as a scale factor for smaller and bigger chips. And we could calculate a matrix with the cost per Mhash/s ...
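A cost-per-Mhash/s matrix like this is easy to script once prices and measured rates are known. Below is a minimal Python sketch of the idea. The chip names, prices and rates in the example are placeholders made up for illustration, not real quotes or benchmark results, and the function names are hypothetical.

```python
def cost_per_mhash(price_eur: float, rate_mhash: float) -> float:
    """Price divided by hash rate, in EUR per Mhash/s."""
    if rate_mhash <= 0:
        raise ValueError("hash rate must be positive")
    return price_eur / rate_mhash


def cost_matrix(chips: dict) -> list:
    """Return (name, EUR per Mhash/s) pairs, cheapest first.

    `chips` maps a chip name to a (price in EUR, rate in Mhash/s) tuple.
    """
    table = [(name, cost_per_mhash(price, rate))
             for name, (price, rate) in chips.items()]
    return sorted(table, key=lambda entry: entry[1])


# PLACEHOLDER data: hypothetical chips, prices and rates.
example = {
    "chip A": (100.0, 80.0),
    "chip B": (150.0, 100.0),
}

for name, cost in cost_matrix(example):
    print(f"{name}: {cost:.2f} EUR per Mhash/s")
```

Rates should come from post-place-and-route timing rather than synthesis estimates, otherwise the ranking may not hold up.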
|
|
|
|
O_Shovah (OP)
Sr. Member
Offline
Activity: 410
Merit: 252
Watercooling the world of mining
|
|
June 29, 2011, 08:15:10 PM |
|
It seems that in the given price range (I guess it's just reasonable to limit this on a prototype) we will have to choose from one of the following.
Xilinx:
- Full Spartan 6 series (especially the LX150, which is reported to have achieved up to 190 Mhash/s)
- Plus earlier series (which seem to be uncompetitive regarding Mhash/€ efficiency)
Altera:
- The Cyclone IV series below 100k LEs (more would break my price limit but may be discussed if it shows superior performance)
- Earlier series (may be used if price is limiting or if they show better programming characteristics)
If your rolled-up design really works (the last design by fpgaminer only works fully unrolled), you could also try to fill this and that chip full of miners to get an estimate of how many logic elements are needed, and use this as a scale factor for smaller and bigger chips. And we could calculate a matrix with the cost per Mhash/s ...
I would appreciate such a list of performance and Mhash/€.
Status so far:
- I've been talking to TheSeven and he proposed the Spartan 6 LX150. He is familiar with Xilinx software designs and may work on that part.
- OrphanedGland proposed the Cyclone IV series (and also the Stratix series, but I see them as prohibitive regarding the prices).
- fpgaminer has published miner software for the Altera Cyclone IV 115. I contacted him for further discussion of the subject and am currently waiting for his reply.
Please give your educated comments.
PS: I have my exams at university coming up. So from now on I will only be able to work on the thread once a day until the beginning of August. I think it will go on as smoothly as it did up to now. Thanks a lot for all your work and ideas.
|
|
|
|
teknohog
|
|
June 30, 2011, 09:36:10 AM |
|
It seems that in the given price range (I guess it's just reasonable to limit this on a prototype) we will have to choose from one of the following.
Xilinx: full Spartan 6 series (especially the LX150, which is reported to have achieved up to 190 Mhash/s)
Plus earlier series (which seem to be uncompetitive regarding Mhash/€ efficiency)
Altera:
The Cyclone IV series below 100k LEs (more would break my price limit but may be discussed if it shows superior performance)
Earlier series (may be used if price is limiting or if they show better programming characteristics)
I'm no expert on the hardware design, but there is at least one issue against the LX150: It requires commercial software to synthesize the design. The free web edition only supports Spartan 6 up to LX45, if I recall correctly. Xilinx seems to package the requisite software with their dev kits, so it is likely that someone here has it and could distribute the generated bitfile. But it is a problem for this open source mentality if I cannot tinker with the device I own. Another question, is there already an open source implementation that works on the LX150? We know there is one for the Cyclone IV, along with a nicer license policy of the synthesis software.
|
|
|
|
Olaf.Mandel
Member
Offline
Activity: 70
Merit: 10
|
|
June 30, 2011, 10:59:13 AM |
|
[...] I'm no expert on the hardware design, but there is at least one issue against the LX150: It requires commercial software to synthesize the design. The free web edition only supports Spartan 6 up to LX45, if I recall correctly. [...]
LX75, actually. That is why I was doing my compiles for that chip
|
|
|
|
grid
Newbie
Offline
Activity: 42
Merit: 0
|
|
June 30, 2011, 12:36:05 PM |
|
For the standalone board version, perhaps this form-factor is interesting instead of the DIMM form-factor:
http://img825.imageshack.us/img825/2749/frontsidetwistunicellul.th.jpg
http://www.illuminatolabs.com/IlluminatoXMachina.htm
What I like about that type of design:
- Ultra-modular - you can start with just one PCB FPGA node - provide power, use RS232 to communicate with PC
- Minimum software development required for the one-node version. Scale the PoC by using a USB-to-multiple-RS232 hub.
- Step up to PCB-variants - use 2 PCB variants - one with an Atmel ASIC (Arduino?) providing USB connectivity to PC, and power to FPGA nodes. Use I2C to communicate between hub node and FPGA worker nodes.
- One hub node could support as many worker nodes as the I2C and power distribution network allows (current, ohmic resistance of the interconnects)
- Flat design - FPGAs can be easily fitted with heavy heatsinks since the weight is supported by a flat surface, or PCBs screw mounted to a flat surface. Don't care about temperature sensing with a glued-on heatsink.
- You could implement support for many hubs on the I2C bus with failover capability
- Low hardware cost? - minimum amount of components, cheap connectors. Cost probably comparable to the DIMM version.
The cons:
- The hub node concept may be complicated to implement, but quite scalable and powerful if done correctly.
- It won't be as elegant as the daughterboard-DIMM concept
Just some food for thought. Disclaimer: I'm more of a hobbyist than an electronics professional. I'm a software guy.
|
|
|
|
phillipsjk
Legendary
Offline
Activity: 1008
Merit: 1001
Let the chips fall where they may.
|
|
June 30, 2011, 03:21:35 PM |
|
I don't want to kill the project via feature creep, but if the chips can individually be brought up via USB, can they be shut down as well? Uses I have in mind:
- Stand-by computing power for if the network hash rate drops.
- Running from solar power; with the ability to shed load if the sun goes behind a cloud.
- Replacing resistive heater with expensive elements that happen to generate bitcoin
|
James' OpenPGP public key fingerprint: EB14 9E5B F80C 1F2D 3EBE 0A2F B3DE 81FF 7B9D 5160
|
|
|
Olaf.Mandel
Member
Offline
Activity: 70
Merit: 10
|
|
June 30, 2011, 04:41:04 PM |
|
I don't want to kill the project via feature creep, but if the chips can individually be brought up via USB, can they be shut down as well? [...]
A resounding "yes". The individual 12V -> 1.2V etc. power supplies we have discussed here all have an enable input: if the host interface chip (to USB or later Ethernet) deasserts that signal, the power supplies shut off and then only a very small quiescent current is drawn by them (plus the power for the interface chip, e.g. from USB). This has already been proposed, but no one commented on it before. I take your post as a "pro" vote.
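On the host side, the solar load-shedding case could then be as simple as deciding how many enable signals to keep asserted. A rough Python sketch, purely as an illustration: the function name, the per-card power figure and the available-power reading are all hypothetical, since none of these numbers have been fixed in this thread.

```python
def cards_to_enable(available_w: float, per_card_w: float, installed: int) -> int:
    """How many cards can stay powered within the available power budget.

    All inputs are assumptions supplied by the caller: `per_card_w` is the
    draw of one FPGA card when mining, `installed` the number of cards in
    the backplane.
    """
    if per_card_w <= 0:
        raise ValueError("per-card power must be positive")
    affordable = int(available_w // per_card_w)
    # Never report more cards than are plugged in, never a negative count.
    return max(0, min(installed, affordable))


# Hypothetical numbers: 8 cards at 12 W each, 50 W currently available.
n = cards_to_enable(available_w=50.0, per_card_w=12.0, installed=8)
enable_lines = [i < n for i in range(8)]  # True = assert that card's enable
print(n, enable_lines)
```

A real controller would also want some hysteresis so that cards are not switched on and off every time a small cloud passes.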
|
|
|
|
Olaf.Mandel
Member
Offline
Activity: 70
Merit: 10
|
|
June 30, 2011, 04:58:41 PM |
|
For the standalone board version, perhaps this form-factor is interesting instead of the DIMM form-factor: [...]
- Ultra-modular - you can start with just one PCB FPGA node - provide power, use RS232 to communicate with PC
Not RS232: you need an extra programming adapter then: either the interface chip on the hub node or the actual FPGA needs to have firmware uploaded to it to understand RS232. Maybe you could bit-bang a different protocol over the connector, but that can't be very fast or portable (corrections?).
[...] - Step up to PCB-variants - use 2 PCB variants - one with an Atmel ASIC (Arduino?) providing USB connectivity to PC, and power to FPGA nodes. Use I2C to communicate between hub node and FPGA worker nodes.
Do you intend I2C as the only means of communication between the cards? Then my point from above is again valid: you need an extra cable / bus to transfer firmware to the FPGAs.
[...] - You could implement support for many hubs on the I2C bus with failover capability [...]
We are outside my area of expertise, so correct me if I am wrong: does I2C allow for multiple paths between a master and any given slave? I would have expected a single path to be important (maybe not for the 400kHz variant). If a single path is required, you would need to add I2C switches to each board to do the routing.
I must admit the 2D grid of FPGAs would look aesthetically pleasing. But having to replace a single board in the middle seems to be really tough! Actually, how would you even connect a larger set of boards (say, 5x5)? You would have to connect the boards into rows and then plug together the rows, 5 boards at a time!
|
|
|
|
|