Bitcoin Forum
Poll
Question: Which FPGA shall be used on our prototype?
Xilinx Spartan 6 LX 150 - 17 (70.8%)
Altera Cyclone IV 75k - 7 (29.2%)
Total Voters: 24

Topic: Modular FPGA Miner Hardware Design Development
makomk
Hero Member
*****
Offline Offline

Activity: 686


View Profile
July 03, 2011, 12:51:55 AM
 #141

I don't suppose the rules for generating a bitstream are documented?

I don't think it is exactly rocket science. It would be of comparable difficulty to writing a compiler. Obviously from the CPU time used, these tools brute force many possibilities.
Harder than rocket science, I think. Not only is the bitstream format totally undocumented, but the algorithms required to map a design to an FPGA effectively are apparently really hairy, which is why the tools are slow and often temperamental. I hear simulated annealing is quite popular for the actual place-and-route stage...
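
As a rough illustration of the simulated annealing idea, here is a toy placer. It is only a sketch under simplifying assumptions (a small grid, two-pin nets, Manhattan wirelength as the only cost, a made-up cooling schedule). Real vendor place-and-route tools are far more involved, and none of the names below come from them.

```python
import math
import random

def wirelength(placement, nets):
    """Total Manhattan distance over all two-pin nets."""
    total = 0
    for a, b in nets:
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += abs(xa - xb) + abs(ya - yb)
    return total

def anneal_placement(num_cells, nets, grid_width, steps=20000, seed=0):
    """Toy annealer: swap two cells and keep the swap if it helps or,
    with a temperature-dependent probability, even if it hurts."""
    rng = random.Random(seed)
    # One grid slot per cell, assigned in random order to start with.
    placement = [(i % grid_width, i // grid_width) for i in range(num_cells)]
    rng.shuffle(placement)
    cost = wirelength(placement, nets)
    initial_cost = cost
    best_cost, best = cost, list(placement)
    temperature = 5.0  # hypothetical starting temperature
    for _ in range(steps):
        i, j = rng.sample(range(num_cells), 2)
        placement[i], placement[j] = placement[j], placement[i]
        new_cost = wirelength(placement, nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cost = new_cost
            if cost < best_cost:
                best_cost, best = cost, list(placement)
        else:
            # Rejected move: undo the swap.
            placement[i], placement[j] = placement[j], placement[i]
        # Hypothetical geometric cooling schedule with a floor.
        temperature = max(0.01, temperature * 0.9995)
    return initial_cost, best_cost, best

# A 16-cell chain on a 4x4 grid: cell k is wired to cell k+1.
nets = [(k, k + 1) for k in range(15)]
initial_cost, best_cost, best = anneal_placement(16, nets, grid_width=4)
print("wirelength before:", initial_cost, "after:", best_cost)
```

Even this toy needs tens of thousands of trial moves for 16 cells, which gives some feeling for why the real tools take hours on a full device.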

Edit: Oh, and of course if you generate an incorrect bitstream you'll probably blow up your expensive FPGA.

Quad XC6SLX150 Board: 860 MHash/s or so.
SIGS ABOUT BUTTERFLY LABS ARE PAID ADS
max3t
Newbie
*
Offline Offline

Activity: 25


View Profile
July 03, 2011, 09:09:09 AM
 #142

Please stop the off-topic talk about brute-forcing bitstreams with GPUs (that is, stop discussing it here). This thread is about the hardware part.

I'm sure that we'll find someone who could get us those bitstreams, if special software is needed. Please focus on the hardware part.

I don't expect anything, but I am listening at 18WN5YRGaBKGPus4n8QHuF7YnyzyDxMRQ6 ;)
Olaf.Mandel
Member
**
Offline Offline

Activity: 70


View Profile
July 03, 2011, 10:26:14 AM
 #143

We still need a decision on which FPGA to use. As there has been no new data, I thought I would at least copy some info from elsewhere ;). I took the performance data from the bitcoin wiki and changed the price to the cost of the chip alone (not the dev board price stated in the linked table). Prices may not be valid for single units, and each FPGA has been substituted with the cheapest comparable alternative in the smallest package.

Chip | Rate [MHash/s] | Power [W] | Price [EUR] | Rate/Price [MHash/s/EUR] | Rate/Power [MHash/J]
Altera EP4CE115F23C7N | 80 | 4.4 | 303.69 | 0.263 | 18.2
Altera EP4CE115F23C7N | 109 | - | 303.69 | 0.359 | -
Xilinx XC5VLX110-1FFG676C | 120 | - | 1126.51 | 0.107 | -
Xilinx XC3S500E-5CPG132C | 3.125 | 0.78 | 20.38 | 0.153 | 4

But seriously: isn't there someone who can give us some info on chip performance to wrap up this discussion? What I gave here is using different code and does not contain all chips of interest. Especially missing are the Altera EP4CE75F23C7 and the Xilinx XC6SLX75-3CSG484C through XC6SLX150-3CSG484C. The Altera and largest Xilinx are roughly comparable in price and the smaller Xilinx is the best that can be compiled with their free software.
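
For anyone who wants to extend the table, the two derived columns are plain ratios of the measured ones. A minimal sketch, using only the figures from the table above; the helper names `ratio` and `fmt` are made up for this example, and `None` stands for a value nobody has reported yet:

```python
def ratio(numerator, denominator):
    """Return numerator/denominator, or None when either value is unknown."""
    if numerator is None or denominator is None:
        return None
    return numerator / denominator

def fmt(value, digits):
    """Format a ratio, or '-' when it could not be computed."""
    return "-" if value is None else f"{value:.{digits}f}"

# (chip, rate in MHash/s, power in W, price in EUR), copied from the table
chips = [
    ("Altera EP4CE115F23C7N", 80.0, 4.4, 303.69),
    ("Altera EP4CE115F23C7N", 109.0, None, 303.69),
    ("Xilinx XC5VLX110-1FFG676C", 120.0, None, 1126.51),
    ("Xilinx XC3S500E-5CPG132C", 3.125, 0.78, 20.38),
]

for name, rate, power, price in chips:
    per_eur = ratio(rate, price)    # MHash/s per EUR
    per_joule = ratio(rate, power)  # (MHash/s) / W = MHash/J
    print(f"{name:28s} {fmt(per_eur, 3):>6s} {fmt(per_joule, 1):>6s}")
```

New chips can be added by appending a row to `chips` once someone has synthesis results for them.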
lame.duck
Legendary
*
Offline Offline

Activity: 1242


View Profile
July 03, 2011, 11:02:49 AM
 #144

For the numbers, I would take the unused resources into account.
I have an EP3C25 and an EP2C35, both running an 8-stage pipeline that requires 8 clock cycles per hash. If I could extend this scheme to a 12-stage pipeline, it would require only 6 cycles: 25% fewer cycles per hash, or about 33% more MHash/s, provided there is no impact on the clock cycle length.
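
A quick worked check of the pipeline arithmetic, as a sketch: the hash rate of a partially unrolled core is the clock rate divided by the cycles needed per hash. The 100 MHz clock is a placeholder assumption, not a measured figure for either chip, and it cancels out of the comparison anyway.

```python
def hash_rate_mhash(clock_mhz, cycles_per_hash):
    """MHash/s for a core that finishes one hash every cycles_per_hash clocks."""
    return clock_mhz / cycles_per_hash

clock_mhz = 100.0  # hypothetical clock, assumed unchanged by the longer pipeline

before = hash_rate_mhash(clock_mhz, 8)  # 8-stage pipeline, 8 cycles per hash
after = hash_rate_mhash(clock_mhz, 6)   # 12-stage pipeline, 6 cycles per hash

cycle_reduction = 1 - 6 / 8    # fraction of cycles saved per hash
speedup = after / before - 1   # fractional gain in hashes per second

print(f"cycles saved per hash: {cycle_reduction:.0%}")
print(f"hash rate gain:        {speedup:.1%}")
```

So going from 8 to 6 cycles saves a quarter of the cycles per hash, which works out to a gain of one third in hashes per second at the same clock.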
OrphanedGland
Member
**
Offline Offline

Activity: 71


View Profile
July 03, 2011, 11:23:51 AM
 #145

OK, well, if you need some direction, let me say that unless you are choosing the largest Cyclone IV or Spartan 6 device you are probably wasting your time. You will need to put up with the issue that a license is required to perform compiles. This can be overcome by people with licenses volunteering to perform compiles.
O_Shovah
Sr. Member
****
Offline Offline

Activity: 410


Watercooling the world of mining


View Profile
July 03, 2011, 11:40:44 AM
 #146

Would you consider it feasible to use both the Spartan 6 LX150 (~130€) and an Altera FPGA, e.g. the Cyclone IV E 75k (~175€) or the Cyclone IV GX 110k (~214€), on the first prototype stage?
It may require a different voltage supply for each of them, but I will look into that.

I think this would be a good chance to balance and test the different FPGAs, so we may develop an individually optimized software solution for each chip. After that we may decide on a final one for the series.


I hope on Monday I will get the chance to negotiate with Xilinx about the software problem.

OrphanedGland
Member
**
Offline Offline

Activity: 71


View Profile
July 03, 2011, 11:58:21 AM
 #147

Cheaper just to choose one.  Looks like LX150 is probably the best bet, but would be nice to have some compilation results.
makomk
Hero Member
*****
Offline Offline

Activity: 686


View Profile
July 03, 2011, 01:01:58 PM
 #148

But seriously: isn't there someone who can give us some info on chip performance to wrap up this discussion? What I gave here is using different code and does not contain all chips of interest. Especially missing are the Altera EP4CE75F23C7 and the Xilinx XC6SLX75-3CSG484C through XC6SLX150-3CSG484C. The Altera and largest Xilinx are roughly comparable in price and the smaller Xilinx is the best that can be compiled with their free software.
I think I managed to compile a 100 MHash/s design for the EP4CE75F23C7, though no-one has one to test it on and the device was almost totally full so I'm not sure if you'd be able to fit any extra control logic in. Bear in mind that the last digit is the speed grade (lower is faster for Cyclone IV). You can try it for yourself - fpgaminer committed my modified version in projects/DE2_115_makomk_mod, just change which device it targets and the clock speed. If you're careful and do the design right you could probably build a PCB that supported both the 75 and 115.

Edit: Also,
Ok well if you need some direction let me say that unless you are choosing the largest cyclone iv or spartan 6 device you are probably wasting your time.  You will need to put up with the issue that a license is required to perform compiles.  This can be overcome by people with licenses volunteering to perform compiles.
The free tools support all Cyclone I, II, III and IV devices, it's just the other ranges of FPGAs that are limited.

Olaf.Mandel
Member
**
Offline Offline

Activity: 70


View Profile
July 03, 2011, 01:31:30 PM
 #149

I just realised something when I read makomk's previous post (thanks for the info, by the way!): the unit MHash/s is not self-explanatory! To check once whether a nonce is golden, you need two SHA-256 calculations. When I gave my synthesis results previously, I interpreted 1 Hash as one calculation of sha(sha(.)). Does everyone do the same, or does your unit equate 2 Hash to sha(sha(.))?
makomk
Hero Member
*****
Offline Offline

Activity: 686


View Profile
July 03, 2011, 01:42:53 PM
 #150

I just realised something when I read makomk previous post (thanks for the info, by the way!): the unit MHash/s is not self-explanatory! To do one check if a nonce is golden or not, you need two calculations of a SHA-256 hash. When I gave my synthesis results previously, I interpreted 1Hash as one calculation of sha(sha(.)). Does everyone do the same or does your unit equate 2Hash to sha(sha(.)) ?
It seems to be standard to list the number of MHash as the number of total sha256(sha256(data)) operations just like you did - certainly that's what I've been doing. I get the impression this dates back to the early days of bitcoin. It's possible others haven't been doing it this way of course.
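
To spell out what counts as one "Hash" under this convention, here is a minimal software sketch of the nonce check. The header prefix and the easy target are dummy values for illustration, and the function names are made up. The sketch also assumes, as in the Bitcoin protocol, that the digest is read as a little-endian integer and must not exceed the target.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """One 'Hash' in the MHash/s sense: sha256(sha256(data))."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def is_golden(header: bytes, target: int) -> bool:
    """True if the double hash, read as a little-endian integer, is <= target."""
    return int.from_bytes(double_sha256(header), "little") <= target

def find_golden_nonce(prefix: bytes, target: int, max_nonce: int = 1_000_000):
    """Try nonces in order; each attempt costs exactly one double hash."""
    for nonce in range(max_nonce):
        if is_golden(prefix + nonce.to_bytes(4, "little"), target):
            return nonce
    return None

# Dummy 76-byte header prefix and a deliberately easy target (illustration only).
prefix = bytes(76)
easy_target = 2 ** 248 - 1  # roughly 1 in 256 hashes qualifies

nonce = find_golden_nonce(prefix, easy_target)
print("first golden nonce for the dummy header:", nonce)
```

Each loop iteration performs two SHA-256 computations but is counted as a single hash, which matches the way the MHash/s figures in this thread are quoted.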

Edit: Also, after a slightly tedious 4-hour build process, Fmax=109.29MHz and 97% resource usage for the fully-unrolled DE2_115_makomk_mod on the EP4CE75F29C7. Might be able to get it up to 110MHz with the right options, but I wouldn't bet on it.

Olaf.Mandel
Member
**
Offline Offline

Activity: 70


View Profile
July 03, 2011, 05:58:56 PM
 #151

[...]
It seems to be standard to list the number of MHash as the number of total sha256(sha256(data)) operations just like you did - certainly that's what I've been doing. I get the impression this dates back to the early days of bitcoin. It's possible others haven't been doing it this way of course.

Very good. In that case, and optimistically assuming that the missing interface logic can be added to makomk's code, the current table looks like this:

Chip | Rate [MHash/s] | Power [W] | Price [EUR] | Rate/Price [MHash/s/EUR] | Rate/Power [MHash/J]
Altera EP4CE75F23C7N | 109.29 | - | 174.47 | 0.626 | -
Altera EP4CE115F23C7N | 80 | 4.4 | 303.69 | 0.263 | 18.2
Altera EP4CE115F23C7N | 109 | - | 303.69 | 0.359 | -
Xilinx XC6SLX75-3CSG484C | ? | ? | 67.29 | ? | ?
Xilinx XC6SLX100-3CSG484C | ? | ? | 83.86 | ? | ?
Xilinx XC6SLX150-3CSG484C | ? | ? | 120.47 | ? | ?
Xilinx XC3S500E-5CPG132C | 3.125 | 0.78 | 20.38 | 0.153 | 4
Xilinx XC5VLX110-1FFG676C | 120 | - | 1126.51 | 0.107 | -

Please fill in what is missing.
O_Shovah
Sr. Member
****
Offline Offline

Activity: 410


Watercooling the world of mining


View Profile
July 03, 2011, 08:04:59 PM
 #152

Cheaper just to choose one.  Looks like LX150 is probably the best bet, but would be nice to have some compilation results.

I had a look into the documentation of both the Spartan 6 and the Cyclone IV series.
Xilinx Spartan 6: http://www.xilinx.com/support/documentation/data_sheets/ds162.pdf
Altera Cyclone IV: http://www.altera.com/literature/hb/cyclone-iv/cyiv-53001.pdf

As far as I can see, they both require a 2.5 V and a 1.2 V rail, so the only difference would be how much current each rail needs. (Could someone please verify that?)
The EPE power calculation makes it possible to estimate the power needed for each rail.

Edit: Also, after a slightly tedious 4-hour build process, Fmax=109.29MHz and 97% resource usage for the fully-unrolled DE2_115_makomk_mod on the EP4CE75F29C7. Might be able to get it up to 110MHz with the right options, but I wouldn't bet on it.

So I take it as given that it is possible to run a full miner core on the Altera Cyclone IV 75k. It shall therefore be one final candidate.

For increased performance later on, we might look into the Cyclone IV GX 150k device. Maybe someone could run a compilation for this one. I also checked the Altera website and found the FPGAs in their online shop to be cheaper than at Digi-Key in some cases. This might improve the economics.


If the power supplies of the Spartan 6 series and the Cyclone IV turn out to be really interchangeable, I see no reason not to try a prototype for both based on the same board.

Olaf.Mandel
Member
**
Offline Offline

Activity: 70


View Profile
July 03, 2011, 08:21:00 PM
 #153

[...]
As far as i see they both require a 2.5 V and a 1.2 V rail.So the only difference would be how much current each rail needs.(please someone verify that fact)

AFAIK: Correct.

[...]
If the power supply of the Spartan 6 series and the Cyclone IV turn out to be really interchangeable i see no reason not to try a prototype for both based on the same board.  

Same BOM, but different boards: the pinout is different. Though much of the development work is identical, redrawing the layout for the other chip is a lot of work.
Olaf.Mandel
Member
**
Offline Offline

Activity: 70


View Profile
July 03, 2011, 08:31:51 PM
 #154

New version of the table with lower Altera prices (assuming 1USD=0.6891EUR):

Chip | Rate [MHash/s] | Power [W] | Price [EUR] | Rate/Price [MHash/s/EUR] | Rate/Power [MHash/J]
Altera EP4CE75F23C7N | 109.29 | - | 156.75 | 0.697 | -
Altera EP4CE115F23C7N | 80 | 4.4 | 271.79 | 0.294 | 18.2
Altera EP4CE115F23C7N | 109 | - | 271.79 | 0.401 | -
Xilinx XC6SLX75-3CSG484C | ? | ? | 67.29 | ? | ?
Xilinx XC6SLX100-3CSG484C | ? | ? | 83.86 | ? | ?
Xilinx XC6SLX150-3CSG484C | ? | ? | 120.47 | ? | ?
Xilinx XC3S500E-5CPG132C | 3.125 | 0.78 | 20.38 | 0.153 | 4
Xilinx XC5VLX110-1FFG676C | 120 | - | 1126.51 | 0.107 | -

An intermediate result: in order to beat the Altera EP4CE75 (in terms of Rate/Price), the Xilinx XC6SLX75 must achieve more than 46.9MHash/s, and the XC6SLX150 must beat 84MHash/s.
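
The break-even figures follow from scaling the EP4CE75's rate-per-price to each Xilinx chip's price. A small sketch using only the prices and the 109.29 MHash/s figure from the table; the function name is made up for this example:

```python
def break_even_rate(reference_rate, reference_price, candidate_price):
    """Hash rate a candidate chip needs to match the reference chip's
    MHash/s per EUR at the candidate's own price."""
    return reference_rate / reference_price * candidate_price

# Reference: Altera EP4CE75F23C7N at 109.29 MHash/s and 156.75 EUR
ref_rate, ref_price = 109.29, 156.75

xilinx_prices = {
    "XC6SLX75-3CSG484C": 67.29,
    "XC6SLX100-3CSG484C": 83.86,
    "XC6SLX150-3CSG484C": 120.47,
}

for chip, price in xilinx_prices.items():
    needed = break_even_rate(ref_rate, ref_price, price)
    print(f"{chip}: must exceed {needed:.1f} MHash/s")
```

This compares chip cost only; board, power supply, and cooling costs are not part of the ratio.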
newMeat1
Full Member
***
Offline Offline

Activity: 210



View Profile
July 03, 2011, 08:35:44 PM
 #155

You guys might have seen my work on a Cyclone IV board on this thread:
http://forum.bitcoin.org/index.php?topic=9047.msg299381#msg299381

makomk convinced me since then that an EP4CE75 is the most efficient way to go. This seems to be supported by Olaf Mandel's table.

Just last night I waded through the documents and made a spreadsheet of the pinout. I'll sell it for 2 BTC; PM me if interested. It will get you started and save you several hours of boring, tedious, sometimes confusing work. It's for a JTAG-configured device with one clock input. I might also share a PCB design for several BTC.

Or is this one of those threads where a lot of talk happens, but no action? (excluding makomk, of course)

Unfortunately, it's not possible for one set of pads to support both the EP4CE75 and the EP4CE115: too many pins differ.

max3t
Newbie
*
Offline Offline

Activity: 25


View Profile
July 03, 2011, 08:43:50 PM
 #156

[...]In that case and optimistically assuming that the missing interface logic can be added to makomks code, the current table looks like this:[...]

I'm not quite sure, so excuse me if I'm wasting your time. You filled in "109.29 MHash/s" although "Fmax=109.29MHz" was reported (MHz instead of MHash/s). Or are they the same when fully unrolled?

Btw, congrats makomk, your result sounds great ;)

O_Shovah
Sr. Member
****
Offline Offline

Activity: 410


Watercooling the world of mining


View Profile
July 03, 2011, 08:46:59 PM
 #157

New version of the table with lower Altera prices (assuming 1USD=0.6891EUR):

Chip | Rate [MHash/s] | Power [W] | Price [EUR] | Rate/Price [MHash/s/EUR] | Rate/Power [MHash/J]
Altera EP4CE75F23C7N | 109.29 | - | 156.75 | 0.697 | -
Altera EP4CE115F23C7N | 80 | 4.4 | 271.79 | 0.294 | 18.2
Altera EP4CE115F23C7N | 109 | - | 271.79 | 0.401 | -
Xilinx XC6SLX75-3CSG484C | ? | ? | 67.29 | ? | ?
Xilinx XC6SLX100-3CSG484C | ? | ? | 83.86 | ? | ?
Xilinx XC6SLX150-3CSG484C | ? | ? | 120.47 | ? | ?
Xilinx XC3S500E-5CPG132C | 3.125 | 0.78 | 20.38 | 0.153 | 4
Xilinx XC5VLX110-1FFG676C | 120 | - | 1126.51 | 0.107 | -

Thank you, Olaf, for the table.

I will add it to the first post and update it if we get further results.


makomk
Hero Member
*****
Offline Offline

Activity: 686


View Profile
July 03, 2011, 09:32:18 PM
 #158

I'm not quite sure, so eventually excuse me for wasting time. You filled in "109.29 MHash/s" although "Fmax=109.29MHz" was reported. (MHz instead of MHash/s). Or are they the same when fully unrolled?
The fully unrolled design does one hash per clock cycle, so yeah, they are the same.
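
To make the MHz-to-MHash/s equivalence concrete, a small sketch: with one hash finished per clock cycle, the hash rate equals Fmax. The sweep-time part additionally assumes the core scans the full 32-bit nonce field of one piece of work, and the function names are made up for this example.

```python
NONCE_SPACE = 2 ** 32  # the block header nonce is a 32-bit field

def hash_rate(fmax_hz, hashes_per_cycle=1.0):
    """Hashes per second; a fully unrolled core completes one per cycle."""
    return fmax_hz * hashes_per_cycle

def nonce_sweep_seconds(rate_hashes_per_s):
    """Time to try every nonce for one piece of work at the given rate."""
    return NONCE_SPACE / rate_hashes_per_s

fmax_hz = 109.29e6  # Fmax reported for the EP4CE75 build
rate = hash_rate(fmax_hz)

print(f"{rate / 1e6:.2f} MHash/s")
print(f"full nonce sweep takes about {nonce_sweep_seconds(rate):.1f} s")
```

At that rate the core exhausts one work unit in well under a minute, so the interface logic has to deliver fresh work at least that often.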

OrphanedGland
Member
**
Offline Offline

Activity: 71


View Profile
July 04, 2011, 01:15:45 AM
 #159

Good to see you have allowed 3% for interface changes  Roll Eyes
TheSeven
Hero Member
*****
Offline Offline

Activity: 504


FPGA Mining LLC


View Profile WWW
July 04, 2011, 08:32:40 AM
 #160

makomk convinced me since then that an EP4CE75 is the most efficient way to go. This seems to be supported by Olaf Mandel's table.

What about the Xilinx XC6SLX150-3CSG484C? It's cheaper than the EP4CE75 and will definitely allow for higher hash rates.
As I have already mentioned multiple times, ArtForz (a bitcoin early adopter with a huge mining farm) claims to run 190 MH/s on that one, and I think we can trust him. Sadly, I haven't managed to reproduce this myself so far, as I have neither the time nor the processing power needed to do lots of synthesis runs to optimize it. He considered releasing the source code, though... We might just need to poke him a bit more to actually do that.

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY