I have been working with the FPGA designs by fpgaminer and ztex for some time now. For a while, I was interested in seeing what might be possible with the relatively inexpensive Cyclone V, which was recently released by Altera (it looks like it might edge out the Xilinx LX150, but nothing is definitive yet). But even if it does wind up beating the LX150, it will still fall significantly short of the mark set by Butterfly Labs, so it's hard to get too excited about the idea.
Not exactly. It is just extremely complex work to get the LX150 design to run well. The other thing is that it is extremely hard to deal with FPGA vendors when demand is unstable. Building several racks already makes it possible to negotiate better deals, simply by showing the vendors what the finished product looks like, because single-board sales do not impress them much.
Obviously, a much larger leap is possible going to a full ASIC design, and it sounds like that may already be in the works based on what others here have said. However, undertaking a full custom ASIC design not only takes a fair bit of cash, but it is also fairly time consuming compared to alternatives. In the world of bitcoin, things can change very quickly so I can't see investing my money in a project which would have a payoff more than around 6 months out.
Obviously. But an Altera Hardcopy-based design would still lag behind a custom-designed ASIC of the same die size. I've even described in another topic how this should be done.
I took a slightly modified version of code originally developed by fpgaminer and subsequently enhanced by makomk. First, I built it for a Stratix IV device, and then for the corresponding Hardcopy IV device, the HC4E35FF1152, which has the largest number of H-Cells of any Hardcopy IV device. Due to memory limitations of my build machine, I was not able to compile more than 2 full miners on the chip (each miner needs about 1.25GB of memory for synthesis, and I haven't yet upgraded my development machine with the ~48GB of virtual memory that would be required to compile a fully populated device). However, some interesting extrapolations can be drawn from what I did do.
First of all, a single miner compiled for the HC4E35FF1152 uses 306,648 H-Cells with no optimizations enabled. Fmax is 316 MHz for the slow 85C model. But wait: the device has a total of 9,774,880 H-Cells, so you could theoretically fit 31.9 miners on it with no optimization. Assuming it's possible (with optimization) to fit 30 miners on the device and reach 300 MHz per miner, I get a 9 GH/s hash rate. Perhaps it would be more realistic to go with something like 25 miners on the device, though in that case it should be possible to get a slightly higher Fmax (say 325 MHz). That still gives over 8 GH/s.
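The arithmetic above can be reproduced with a short sketch. It assumes, as the estimate does, that each fully unrolled miner produces one hash per clock, so the chip's hash rate is simply miners × Fmax; routing, power, and packing losses are ignored. The helper names are hypothetical.

```python
def miners_by_area(total_hcells: int, hcells_per_miner: int) -> float:
    """Upper bound on miners per die from H-Cell count alone (ignores routing and packing losses)."""
    return total_hcells / hcells_per_miner


def hash_rate_ghs(miners: int, fmax_mhz: float) -> float:
    """Hash rate for fully unrolled miners, assuming one hash per clock per miner."""
    return miners * fmax_mhz / 1000.0  # MH/s -> GH/s


if __name__ == "__main__":
    # Figures quoted above for the HC4E35FF1152 with no optimizations.
    print(f"{miners_by_area(9_774_880, 306_648):.1f} miners")  # 31.9 miners
    print(f"{hash_rate_ghs(30, 300.0):.3f} GH/s")  # 9.000 GH/s
    print(f"{hash_rate_ghs(25, 325.0):.3f} GH/s")  # 8.125 GH/s
```

The same rule explains why trading a few miners for a higher clock barely changes the result: 25 × 325 MHz is within 10% of 30 × 300 MHz.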
I just took one of my versions optimized for Altera devices (an unrolled prototype, optimized this time for size rather than for clock as with the LX150) and got the following performance (after trying to place it on the chip):
Device EP4SE230...C3: fmax about 253 MHz; 4 unrolled miners; about 1 GH/s (not tried in hardware)
Device EP4SE530...C3: fmax about 249 MHz; 8 unrolled miners; about 2 GH/s per chip
Device 5CGXB... C7: fmax about 163 MHz; 2 unrolled miners; about 326 MH/s per chip
Quartus optimizations don't matter much here, however, as most of the gain comes from the logic layout within the design. Still, NEITHER OF THESE DESIGNS HAS BEEN VERIFIED IN HARDWARE, so one can only hope that the same performance holds in hardware given the extreme toggle rates. I expect some clock degradation on the Stratix IV chips and none on the Cyclone V because of the power consumed. What's nice about the Cyclone V compared to the LX150, for example, is that it consumes less than half the power while performance is nearly the same. A design like the one for the LX150 is unlikely to carry over because of routing limitations: an unrolled round needs more routing wires, while fewer resources are available for entering a LAB, so it is unlikely to reach the dense packing achievable on the LX150.
First tryout:
Device: HC4E25FF484; Slow 85C model: 321.34 MHz clock
Synthesis tool says: 208,723 H-Cells / 98,873 block memory bits
Fitter tool says: 371,641 H-Cells / 98,873 block memory bits
So which figure were you quoting: the one from the synthesis tool or the one from the fitter?
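The question matters because the two counts differ by a large factor. A minimal sketch, assuming the per-miner counts measured on the HC4E25FF484 carry over unchanged to the 9,774,880 H-Cell HC4E35FF1152 (an assumption; the fitter result can differ between devices):

```python
HC4E35_HCELLS = 9_774_880  # total H-Cells quoted in the first post

SYNTH_HCELLS = 208_723   # first tryout, synthesis estimate
FITTER_HCELLS = 371_641  # first tryout, after fitting


def miners_that_fit(total_hcells: int, hcells_per_miner: int) -> int:
    """Whole miners that fit by H-Cell count alone."""
    return total_hcells // hcells_per_miner


if __name__ == "__main__":
    print(miners_that_fit(HC4E35_HCELLS, SYNTH_HCELLS))   # 46
    print(miners_that_fit(HC4E35_HCELLS, FITTER_HCELLS))  # 26
    print(f"fitter/synthesis ratio: {FITTER_HCELLS / SYNTH_HCELLS:.2f}")  # 1.78
```

Sizing from synthesis figures would overstate the capacity by nearly 1.8× in this case.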
Then I started counting M9K blocks and found that there aren't enough of them: only about 5 miners can be put into the HC4E25FF484 that way, and I expect about 10 miners to fit into the 9M H-Cell chip. That's not good at all, as the M9K/H-Cell balance is completely different!
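Counting blocks this way amounts to taking the binding constraint among several resources. A minimal sketch: an M9K block holds 9,216 bits, so dividing a miner's block memory bits by 9,216 gives a lower bound on the M9K blocks it needs (the real count can be higher, since memories are mapped by width and depth, not by raw bits). The device's total M9K count is left as a parameter because it is not given here; both helpers are hypothetical.

```python
import math

M9K_BITS = 9_216  # capacity of one M9K block (8 Kbit data + 1 Kbit parity)


def min_m9k_blocks(block_memory_bits: int) -> int:
    """Lower bound on M9K blocks per miner; actual usage depends on RAM shapes."""
    return math.ceil(block_memory_bits / M9K_BITS)


def miners_limited_by(total_hcells: int, hcells_per_miner: int,
                      total_m9k: int, m9k_per_miner: int) -> int:
    """Miners that fit when both H-Cells and M9K blocks are limited."""
    by_logic = total_hcells // hcells_per_miner
    if m9k_per_miner == 0:
        return by_logic  # memory is not a constraint
    by_memory = total_m9k // m9k_per_miner
    return min(by_logic, by_memory)


if __name__ == "__main__":
    # First tryout: 98,873 block memory bits per miner.
    print(min_m9k_blocks(98_873))  # 11
```

With at least 11 blocks per miner, a device with N M9K blocks can hold at most N // 11 miners no matter how many H-Cells remain free.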
Second tryout (using fewer M9Ks by switching the hard-coded M9K altsyncram to altshift_taps):
Device: HC4E25FF484; Slow 85C model: 340 MHz
(but 362 MHz if the M9K limits are not counted)
Synthesis tool says: 212,327 H-Cells / 27,778 block memory bits
Fitter tool says: 382,488 H-Cells / 27,778 block memory bits
Please note that the clock increased; this is due to improved density. With this design, I would say about 10 of these miners would fit into the 5M chip and about 18 into the 9M chip, staying safe on clock by not packing it too densely.
Third tryout (removing M9K blocks completely, as it seems the design would be denser with everything implemented in FF pairs):
Device: HC4E25FF484; Slow 85C model: 367 MHz
Synthesis tool says: 277,929 H-Cells / 0 block memory bits
Fitter tool says: 516,059 H-Cells / 0 block memory bits
Looking at the floor plan suggests that about 8 of these miners would fit into the 5M chip and 14 into the 9M chip without causing serious problems. As you can see, I am _lowering_ the numbers significantly, because even if I manage to squeeze things in that tightly, the clocks could end up not like in the tools but more like 200-220 MHz...
So it will likely be about 6 GH/s per single HC4E25FF484 chip with a decent design, not 8 GH/s. However, that's without specific Hardcopy-related optimizations. If I spent an additional 2-3 months on it, my design could be improved to a potential of, say, 7-8 GH/s, but not by as much as on the Spartan-6. For the ztex or fpgaminer designs, I suppose it would be less, at about 4-5 GH/s per such chip.
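The conservative counts and the ~6 GH/s estimate can be checked with the same miners × Fmax rule. A sketch, assuming the fitter H-Cell counts from the HC4E25FF484 runs carry over to the 9,774,880 H-Cell chip and that each miner produces one hash per clock; `estimate` is a hypothetical helper:

```python
HC4E35_HCELLS = 9_774_880  # the "9M" H-Cell chip from the first post


def estimate(hcells_per_miner: int, fmax_mhz: float, planned_miners: int,
             total_hcells: int = HC4E35_HCELLS) -> tuple[int, float, float]:
    """Return (raw fit by area, fraction of H-Cells used by the plan, GH/s of the plan)."""
    raw_fit = total_hcells // hcells_per_miner
    utilization = planned_miners * hcells_per_miner / total_hcells
    ghs = planned_miners * fmax_mhz / 1000.0  # one hash per clock per miner
    return raw_fit, utilization, ghs


if __name__ == "__main__":
    # (fitter H-Cells, Slow 85C Fmax, conservative miner count on the 9M chip)
    tryouts = {
        "second (altshift_taps)": (382_488, 340.0, 18),
        "third (no M9K)": (516_059, 367.0, 14),
    }
    for name, (hcells, fmax, planned) in tryouts.items():
        raw, util, ghs = estimate(hcells, fmax, planned)
        print(f"{name}: raw fit {raw}, plan uses {util:.0%}, {ghs:.2f} GH/s")
```

The conservative plans leave roughly 26-30% of the H-Cells unused. Eighteen miners at 340 MHz give about 6.1 GH/s, in line with the ~6 GH/s estimate; as a pessimistic case, if the clock fell to the 200-220 MHz range mentioned above, the same 18 miners would deliver roughly 3.6-4.0 GH/s.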
Then I raise the following question: if we signed a contract to buy Artix-7 chips in quantities comparable to the ASIC development and production costs, would the prices be SIGNIFICANTLY lower? Say about $30 per chip while getting about 0.5-0.6 GH/s? If so, what's the point of going to an ASIC? This is even less than what is planned for the ASIC...
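The comparison can be made concrete by asking how many Artix-7 chips would match one Hardcopy chip. A sketch using the hypothetical $30 and 0.5-0.6 GH/s figures from this post against the ~6 GH/s per-chip estimate given earlier; it counts chip cost only and ignores boards, power, and cooling. Rates are kept in integer MH/s so the ceiling division is exact.

```python
def chips_to_match(target_mhs: int, per_chip_mhs: int) -> int:
    """Smallest number of chips whose combined rate reaches target_mhs."""
    return -(-target_mhs // per_chip_mhs)  # ceiling division on integers


if __name__ == "__main__":
    ARTIX7_PRICE_USD = 30   # hypothetical volume price from the post
    TARGET_MHS = 6_000      # ~6 GH/s estimated for one Hardcopy chip
    for per_chip in (500, 600):
        n = chips_to_match(TARGET_MHS, per_chip)
        print(f"{per_chip} MH/s per chip: {n} chips, ${n * ARTIX7_PRICE_USD} in silicon")
```

That works out to 10-12 chips, or $300-360 of FPGA silicon per ~6 GH/s, before any board-level costs.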
With an ASIC, however, that would be even easier to do, as it would prove to Xilinx that if someone invested $2M into building an ASIC, a market definitely exists, and they would lower their FPGA prices to stay competitive. So from a hardware design point of view the ASIC blows everything away, but from the financial point of view an sASIC does not offer much benefit compared to the FPGA vendors' internal costs, so this game could end in epic failure. Still, it would be a nice investment and a good result for the community overall, as the cost per MH/s would be lower; but I would rather choose a gradual, step-by-step evolution, as there's no need to hurry.
Do you really think the cost of silicon differs that much? What actually drives the cost here is the IP, the design, etc. Altera's internal cost to build Stratix IV and Cyclone V chips would scale with their silicon die area, but they sell them at different prices because the NRE costs for these chips are very different and they want to recover their R&D costs.
I would expect that the boards with an sASIC could be manufactured for around $1500 in reasonable quantities (500+), though this should only be taken as a ballpark figure as I have not yet been in contact with Altera to work out more exact costs of producing the sASICs. I'm assuming they would cost on the order of $1000 each. If fully populated/tested hardware were then sold for $2000 each, that would yield a minimum of 4 MH/$, which I think is better than anything else out there at the moment.
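The 4 MH/$ figure follows from the 8 GH/s lower estimate and the $2000 price. A minimal sketch; the prices are the ballpark assumptions stated above, not quotes from Altera:

```python
def mh_per_dollar(ghs: float, price_usd: float) -> float:
    """Hash rate bought per dollar, in MH/s per USD."""
    return ghs * 1000.0 / price_usd


if __name__ == "__main__":
    PRICE_USD = 2000       # assumed sale price per board
    BUILD_COST_USD = 1500  # assumed manufacturing cost per board (500+ units)
    for ghs in (8.0, 9.0):  # lower and upper estimates from the first post
        print(f"{ghs} GH/s -> {mh_per_dollar(ghs, PRICE_USD):.1f} MH/$")
    print(f"margin per board: ${PRICE_USD - BUILD_COST_USD}")
```

At the 9 GH/s upper estimate the same price gives 4.5 MH/$, so 4 MH/$ is the floor under these assumptions.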
Everything I've written above is very preliminary. I wanted to get a feel for the ballpark level of investment that would be required and the performance potential of a miner based on Altera Hardcopy. If, based on the above very tentative numbers, there is enough interest in pursuing this further, I would certainly be interested in playing a role. If not, then I'll go back to playing with Cyclone IVs and Vs for the fun of it.
I would say that all of this is extremely preliminary... I even doubt that you would know how tightly the chip could be filled until the first payment to Altera goes out. And even then, with such toggle rates on an sASIC (I suppose they design the chip with 12.5-20% toggle rates in mind), there could be problems powering the logic if you pack it too densely.