makomk
|
|
July 03, 2011, 12:51:55 AM |
|
I don't suppose the rules for generating a bitstream are documented?
I don't think it is exactly rocket science. It would be of comparable difficulty to writing a compiler. Obviously from the CPU time used, these tools brute force many possibilities.
Harder than rocket science, I think. Not only is the bitstream format totally undocumented, but the algorithms required to map a design to an FPGA effectively are apparently really hairy, which is why the tools are slow and often temperamental. I hear simulated annealing is quite popular for the actual place-and-route stage... Edit: Oh, and of course if you generate an incorrect bitstream you'll probably blow up your expensive FPGA.
|
Quad XC6SLX150 Board: 860 MHash/s or so. SIGS ABOUT BUTTERFLY LABS ARE PAID ADS
|
|
|
max3t
Newbie
Offline
Activity: 25
Merit: 0
|
|
July 03, 2011, 09:09:09 AM |
|
Please stop the off-topic discussion about brute-forcing bitstreams with GPUs (that means: stop talking about it here). This thread is about the hardware part.
I'm sure that we'll find someone who could get us those bitstreams, if special software is needed. Please focus on the hardware part.
|
|
|
|
Olaf.Mandel
Member
Offline
Activity: 70
Merit: 10
|
|
July 03, 2011, 10:26:14 AM |
|
We still need a decision on which FPGA to use. As there has been no new data, I thought of at least copying information from elsewhere. I took the performance data from the bitcoin wiki and changed the price to what just the chip costs (not the dev board price as stated in the linked table). The price may not be valid for a single unit, and each FPGA has been substituted with the cheapest comparable alternative in the smallest package.
Chip | Rate [MHash/s] | Power [W] | Price [EUR] | Rate/Price [MHash/s/EUR] | Rate/Power [MHash/J]
Altera EP4CE115F23C7N | 80 | 4.4 | 303.69 | 0.263 | 18.2
Altera EP4CE115F23C7N | 109 | - | 303.69 | 0.359 | -
Xilinx XC5VLX110-1FFG676C | 120 | - | 1126.51 | 0.107 | -
Xilinx XC3S500E-5CPG132C | 3.125 | 0.78 | 20.38 | 0.153 | 4
But seriously: isn't there someone who can give us some info on chip performance to wrap up this discussion? What I gave here is using different code and does not contain all chips of interest. Especially missing are the Altera EP4CE75F23C7 and the Xilinx XC6SLX75-3CSG484C through XC6SLX150-3CSG484C. The Altera and largest Xilinx are roughly comparable in price and the smaller Xilinx is the best that can be compiled with their free software.
|
|
|
|
lame.duck
Legendary
Offline
Activity: 1270
Merit: 1000
|
|
July 03, 2011, 11:02:49 AM |
|
For the numbers, I would take the unused resources into account. I have an EP3C25 and an EP2C35, both running an 8-stage pipeline that requires 8 clock cycles. If I could extend this scheme to a 12-stage pipeline, it would only require 6 cycles, which is 25% fewer cycles per hash (about 33% more MHash/s) if there is no impact on the clock cycle length.
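A rough sketch of the arithmetic behind these pipeline numbers. It assumes the cycle count comes from spreading the 64 rounds of one SHA-256 compression evenly over the pipeline stages, and that one result leaves the pipeline every `cycles_per_hash` clocks; neither assumption is confirmed in the post, and the function names are made up for illustration.

```python
import math

# Assumption: the pipeline iterates over the 64 rounds of a SHA-256 compression.
SHA256_ROUNDS = 64


def cycles_per_hash(stages: int, rounds: int = SHA256_ROUNDS) -> int:
    """Clock cycles needed when `rounds` rounds are spread over `stages` stages."""
    return math.ceil(rounds / stages)


def relative_rate(old_stages: int, new_stages: int) -> float:
    """Throughput of the new pipeline relative to the old one at equal clock speed."""
    return cycles_per_hash(old_stages) / cycles_per_hash(new_stages)


for stages in (8, 12):
    print(stages, "stages ->", cycles_per_hash(stages), "cycles")

# Going from 8 to 12 stages at an unchanged clock period.
print(f"speed-up: {relative_rate(8, 12):.3f}x")
```

Under these assumptions, 6 cycles instead of 8 means 25% fewer cycles per result, which works out to roughly a third more throughput, provided the longer pipeline still fits the device and does not lower the achievable clock.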
|
|
|
|
OrphanedGland
Member
Offline
Activity: 70
Merit: 10
|
|
July 03, 2011, 11:23:51 AM |
|
Ok well if you need some direction let me say that unless you are choosing the largest Cyclone IV or Spartan 6 device you are probably wasting your time. You will need to put up with the issue that a license is required to perform compiles. This can be overcome by people with licenses volunteering to perform compiles.
|
|
|
|
O_Shovah (OP)
Sr. Member
Offline
Activity: 410
Merit: 252
Watercooling the world of mining
|
|
July 03, 2011, 11:40:44 AM |
|
Would you consider it feasible to use both the Spartan 6 LX150 (~130€) and an Altera FPGA, e.g. the Cyclone IV E 75k (~175€) or the Cyclone IV GX 110k (~214€), in the first prototype stage? It may require a different voltage supply for each of them, but I will look into that.
I think this would be a good chance to balance and test the different FPGAs, so we can develop an individually optimised software solution for each chip. After that we can decide on a final one for the series.
I hope that on Monday I will get the chance to negotiate with Xilinx about the software problem.
|
|
|
|
OrphanedGland
Member
Offline
Activity: 70
Merit: 10
|
|
July 03, 2011, 11:58:21 AM |
|
Cheaper just to choose one. Looks like LX150 is probably the best bet, but would be nice to have some compilation results.
|
|
|
|
makomk
|
|
July 03, 2011, 01:01:58 PM Last edit: July 03, 2011, 01:14:39 PM by makomk |
|
But seriously: isn't there someone who can give us some info on chip performance to wrap up this discussion? What I gave here is using different code and does not contain all chips of interest. Especially missing are the Altera EP4CE75F23C7 and the Xilinx XC6SLX75-3CSG484C through XC6SLX150-3CSG484C. The Altera and largest Xilinx are roughly comparable in price and the smaller Xilinx is the best that can be compiled with their free software.
I think I managed to compile a 100 MHash/s design for the EP4CE75F23C7, though no-one has one to test it on and the device was almost totally full, so I'm not sure if you'd be able to fit any extra control logic in. Bear in mind that the last digit is the speed grade (lower is faster for Cyclone IV). You can try it for yourself: fpgaminer committed my modified version in projects/DE2_115_makomk_mod, just change which device it targets and the clock speed. If you're careful and do the design right you could probably build a PCB that supported both the 75 and 115.
Edit: Also,
Ok well if you need some direction let me say that unless you are choosing the largest Cyclone IV or Spartan 6 device you are probably wasting your time. You will need to put up with the issue that a license is required to perform compiles. This can be overcome by people with licenses volunteering to perform compiles.
The free tools support all Cyclone I, II, III and IV devices, it's just the other ranges of FPGAs that are limited.
|
|
|
|
Olaf.Mandel
Member
Offline
Activity: 70
Merit: 10
|
|
July 03, 2011, 01:31:30 PM |
|
I just realised something when I read makomk's previous post (thanks for the info, by the way!): the unit MHash/s is not self-explanatory! To check once whether a nonce is golden or not, you need two calculations of a SHA-256 hash. When I gave my synthesis results previously, I interpreted 1 Hash as one calculation of sha(sha(.)). Does everyone do the same or does your unit equate 2 Hash to sha(sha(.))?
|
|
|
|
makomk
|
|
July 03, 2011, 01:42:53 PM Last edit: July 03, 2011, 05:43:05 PM by makomk |
|
I just realised something when I read makomk's previous post (thanks for the info, by the way!): the unit MHash/s is not self-explanatory! To check once whether a nonce is golden or not, you need two calculations of a SHA-256 hash. When I gave my synthesis results previously, I interpreted 1 Hash as one calculation of sha(sha(.)). Does everyone do the same or does your unit equate 2 Hash to sha(sha(.))?
It seems to be standard to list the number of MHash as the number of total sha256(sha256(data)) operations just like you did - certainly that's what I've been doing. I get the impression this dates back to the early days of bitcoin. It's possible others haven't been doing it this way of course. Edit: Also, after a slightly tedious 4-hour build process, Fmax=109.29MHz and 97% resource usage for the fully-unrolled DE2_115_makomk_mod on the EP4CE75F29C7. Might be able to get it up to 110MHz with the right options, but I wouldn't bet on it.
|
|
|
|
Olaf.Mandel
Member
Offline
Activity: 70
Merit: 10
|
|
July 03, 2011, 05:58:56 PM |
|
[...] It seems to be standard to list the number of MHash as the number of total sha256(sha256(data)) operations just like you did - certainly that's what I've been doing. I get the impression this dates back to the early days of bitcoin. It's possible others haven't been doing it this way of course.
Very good. In that case, and optimistically assuming that the missing interface logic can be added to makomk's code, the current table looks like this:
Chip | Rate [MHash/s] | Power [W] | Price [EUR] | Rate/Price [MHash/s/EUR] | Rate/Power [MHash/J]
Altera EP4CE75F23C7N | 109.29 | - | 174.47 | 0.626 | -
Altera EP4CE115F23C7N | 80 | 4.4 | 303.69 | 0.263 | 18.2
Altera EP4CE115F23C7N | 109 | - | 303.69 | 0.359 | -
Xilinx XC6SLX75-3CSG484C | ? | ? | 67.29 | ? | ?
Xilinx XC6SLX100-3CSG484C | ? | ? | 83.86 | ? | ?
Xilinx XC6SLX150-3CSG484C | ? | ? | 120.47 | ? | ?
Xilinx XC3S500E-5CPG132C | 3.125 | 0.78 | 20.38 | 0.153 | 4
Xilinx XC5VLX110-1FFG676C | 120 | - | 1126.51 | 0.107 | -
Please fill in what is missing.
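For anyone filling in new rows, the two derived columns are plain ratios of the measured ones. A minimal sketch follows; the helper names are hypothetical, the sample rows are copied from the table above, and `None` stands for a missing ("?" or "-") entry.

```python
from typing import Optional


def ratio(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    """Divide two table entries, propagating missing values."""
    if numerator is None or denominator is None:
        return None
    return numerator / denominator


def fmt(value: Optional[float], digits: int) -> str:
    """Render a derived value, or '?' when an input was missing."""
    return "?" if value is None else f"{value:.{digits}f}"


# (chip, rate [MHash/s], power [W], price [EUR]); values taken from the table.
ROWS = [
    ("Altera EP4CE75F23C7N", 109.29, None, 174.47),
    ("Altera EP4CE115F23C7N", 80.0, 4.4, 303.69),
    ("Xilinx XC6SLX150-3CSG484C", None, None, 120.47),
    ("Xilinx XC3S500E-5CPG132C", 3.125, 0.78, 20.38),
]

for chip, rate, power, price in ROWS:
    rate_per_price = ratio(rate, price)  # MHash/s per EUR
    rate_per_power = ratio(rate, power)  # (MHash/s) / W = MHash/J
    print(f"{chip} | {fmt(rate_per_price, 3)} | {fmt(rate_per_power, 1)}")
```

Rate/Power comes out in MHash/J because a watt is one joule per second, so the seconds cancel.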
|
|
|
|
O_Shovah (OP)
Sr. Member
Offline
Activity: 410
Merit: 252
Watercooling the world of mining
|
|
July 03, 2011, 08:04:59 PM |
|
Cheaper just to choose one. Looks like LX150 is probably the best bet, but would be nice to have some compilation results.
I had a look at the documentation of both the Spartan 6 and the Cyclone IV series.
Xilinx Spartan 6: http://www.xilinx.com/support/documentation/data_sheets/ds162.pdf
Altera Cyclone IV: http://www.altera.com/literature/hb/cyclone-iv/cyiv-53001.pdf
As far as I can see, they both require a 2.5 V and a 1.2 V rail, so the only difference would be how much current each rail needs (could someone please verify that?). The EPE power calculation gives the ability to estimate the power needed for each rail.
Edit: Also, after a slightly tedious 4-hour build process, Fmax=109.29MHz and 97% resource usage for the fully-unrolled DE2_115_makomk_mod on the EP4CE75F29C7. Might be able to get it up to 110MHz with the right options, but I wouldn't bet on it.
So I take it as given that it is possible to run a full miner core on the Altera Cyclone IV 75k. Therefore it shall be one final candidate. For increased performance later on we might have a look at the Cyclone IV GX 150k device. Maybe someone could run a compilation for this one. I also checked the Altera website and found the FPGAs in their online shop to be cheaper than at Digi-Key in some cases, which might improve the economics. If the power supplies of the Spartan 6 series and the Cyclone IV turn out to be really interchangeable, I see no reason not to try a prototype for both based on the same board.
|
|
|
|
Olaf.Mandel
Member
Offline
Activity: 70
Merit: 10
|
|
July 03, 2011, 08:21:00 PM |
|
[...] As far as i see they both require a 2.5 V and a 1.2 V rail. So the only difference would be how much current each rail needs. (please someone verify that fact)
AFAIK: Correct.
[...] If the power supply of the Spartan 6 series and the Cyclone IV turn out to be really interchangeable i see no reason not to try a prototype for both based on the same board.
Same BOM, but different boards: the pinout is different. Though much of the development work is identical, redrawing the layout for the other chip is a lot of work.
|
|
|
|
Olaf.Mandel
Member
Offline
Activity: 70
Merit: 10
|
|
July 03, 2011, 08:31:51 PM |
|
New version of the table with lower Altera prices (assuming 1USD=0.6891EUR):
Chip | Rate [MHash/s] | Power [W] | Price [EUR] | Rate/Price [MHash/s/EUR] | Rate/Power [MHash/J]
Altera EP4CE75F23C7N | 109.29 | - | 156.75 | 0.697 | -
Altera EP4CE115F23C7N | 80 | 4.4 | 271.79 | 0.294 | 18.2
Altera EP4CE115F23C7N | 109 | - | 271.79 | 0.401 | -
Xilinx XC6SLX75-3CSG484C | ? | ? | 67.29 | ? | ?
Xilinx XC6SLX100-3CSG484C | ? | ? | 83.86 | ? | ?
Xilinx XC6SLX150-3CSG484C | ? | ? | 120.47 | ? | ?
Xilinx XC3S500E-5CPG132C | 3.125 | 0.78 | 20.38 | 0.153 | 4
Xilinx XC5VLX110-1FFG676C | 120 | - | 1126.51 | 0.107 | -
An intermediate result: in order to beat the Altera EP4CE75 (in terms of Rate/Price), the Xilinx XC6SLX75 must achieve more than 46.9MHash/s, and the XC6SLX150 must beat 84MHash/s.
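The break-even figures follow from requiring the same Rate/Price as the reference chip. A small sketch of that calculation, with a hypothetical function name and the prices and rate taken from the table above:

```python
def break_even_rate(ref_rate: float, ref_price: float, candidate_price: float) -> float:
    """Hash rate a candidate chip needs to match the reference chip's Rate/Price."""
    return ref_rate / ref_price * candidate_price


# Reference: Altera EP4CE75 at 109.29 MHash/s and 156.75 EUR.
REF_RATE, REF_PRICE = 109.29, 156.75

# Candidate prices in EUR, as listed in the table.
CANDIDATES = {
    "Xilinx XC6SLX75-3CSG484C": 67.29,
    "Xilinx XC6SLX100-3CSG484C": 83.86,
    "Xilinx XC6SLX150-3CSG484C": 120.47,
}

for chip, price in CANDIDATES.items():
    needed = break_even_rate(REF_RATE, REF_PRICE, price)
    print(f"{chip}: more than {needed:.1f} MHash/s")
```

This only compares chip cost per hash rate; board cost, power, and whether the design actually fits are left out, as they are in the table.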
|
|
|
|
newMeat1
|
|
July 03, 2011, 08:35:44 PM Last edit: July 03, 2011, 08:58:39 PM by newMeat1 |
|
You guys might have seen my work on a Cyclone IV board on this thread: http://forum.bitcoin.org/index.php?topic=9047.msg299381#msg299381
makomk convinced me since then that an EP4CE75 is the most efficient way to go. This seems to be supported by Olaf Mandel's table. Just last night I waded through the documents and made a spreadsheet of the pinout. I'll sell it for 2 BTC, PM me if interested. It will get you started, and save you several hours of boring, tedious, sometimes confusing work. It's for a JTAG-configured device with one clock input. I might also share a PCB design for several BTC. Or is this one of those threads where a lot of talk happens, but no action? (excluding makomk, of course)
It's not possible for one set of pads to support both EP4CE75 and EP4CE115, unfortunately: too many different pins.
|
|
|
|
max3t
Newbie
Offline
Activity: 25
Merit: 0
|
|
July 03, 2011, 08:43:50 PM |
|
[...]In that case and optimistically assuming that the missing interface logic can be added to makomks code, the current table looks like this:[...]
I'm not quite sure, so excuse me if I'm wasting your time. You filled in "109.29 MHash/s" although "Fmax=109.29MHz" was reported (MHz instead of MHash/s). Or are they the same when fully unrolled? Btw congrats makomk, your result sounds great.
|
|
|
|
O_Shovah (OP)
Sr. Member
Offline
Activity: 410
Merit: 252
Watercooling the world of mining
|
|
July 03, 2011, 08:46:59 PM |
|
New version of the table with lower Altera prices (assuming 1USD=0.6891EUR):
Chip | Rate [MHash/s] | Power [W] | Price [EUR] | Rate/Price [MHash/s/EUR] | Rate/Power [MHash/J]
Altera EP4CE75F23C7N | 109.29 | - | 156.75 | 0.697 | -
Altera EP4CE115F23C7N | 80 | 4.4 | 271.79 | 0.294 | 18.2
Altera EP4CE115F23C7N | 109 | - | 271.79 | 0.401 | -
Xilinx XC6SLX75-3CSG484C | ? | ? | 67.29 | ? | ?
Xilinx XC6SLX100-3CSG484C | ? | ? | 83.86 | ? | ?
Xilinx XC6SLX150-3CSG484C | ? | ? | 120.47 | ? | ?
Xilinx XC3S500E-5CPG132C | 3.125 | 0.78 | 20.38 | 0.153 | 4
Xilinx XC5VLX110-1FFG676C | 120 | - | 1126.51 | 0.107 | -
Thank you Olaf for the table. I will add it to the first post and update it if we get further results.
|
|
|
|
makomk
|
|
July 03, 2011, 09:32:18 PM |
|
I'm not quite sure, so excuse me if I'm wasting your time. You filled in "109.29 MHash/s" although "Fmax=109.29MHz" was reported (MHz instead of MHash/s). Or are they the same when fully unrolled?
The fully unrolled design does one hash per clock cycle, so yeah, they are the same.
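Put as a formula, the hash rate is the clock frequency divided by the clock cycles spent per result. The sketch below uses a hypothetical helper; the value of 8 cycles per hash in the second call is just an illustrative partially rolled design, not a measured one.

```python
def hash_rate_mhash_per_s(fmax_mhz: float, cycles_per_hash: int = 1) -> float:
    """Hash rate in MHash/s for a core clocked at `fmax_mhz` MHz that needs
    `cycles_per_hash` clock cycles per sha256(sha256(data)) result."""
    if cycles_per_hash < 1:
        raise ValueError("cycles_per_hash must be at least 1")
    return fmax_mhz / cycles_per_hash


# Fully unrolled: one result per clock, so MHz and MHash/s coincide.
print(hash_rate_mhash_per_s(109.29))

# Illustrative rolled design needing 8 cycles per result at the same clock.
print(hash_rate_mhash_per_s(109.29, cycles_per_hash=8))
```

A rolled design uses far fewer logic resources, so several copies may fit in one chip; the per-core rate alone is not the whole comparison.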
|
|
|
|
OrphanedGland
Member
Offline
Activity: 70
Merit: 10
|
|
July 04, 2011, 01:15:45 AM |
|
Good to see you have allowed 3% for interface changes
|
|
|
|
TheSeven
|
|
July 04, 2011, 08:32:40 AM |
|
makomk convinced me since then that an EP4CE75 is the most efficient way to go. This seems to be supported by Olaf Mandel's table.
What about the Xilinx XC6SLX150-3CSG484C? It's cheaper than the EP4CE75 and will definitely allow for higher hash rates. As I already mentioned multiple times, ArtForz (a bitcoin early adopter with a huge mining farm) claims to run 190MH/s on that one, and I think we can trust him. Sadly I haven't managed to reproduce this myself so far, as I don't have the time nor the processing power needed to do lots of synthesis runs to optimize it. He considered releasing the source code though... We might just need to poke him a bit more to actually do that.
|
My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY
|
|
|
|