Algorithmically placed FPGA miner: 255MH/s/chip, supports all known boards

eldentyrell (OP)


October 26, 2011, 10:52:05 PM
Last edit: July 26, 2012, 08:36:25 AM by eldentyrell

I've been working on an unconventional design and doing 100%
algorithmic placement. This design contains three rings, each of
which is 64 stages of SHA-256. A nonce has to go through a ring
twice in order to be fully hashed, so each ring completes 0.5 hashes
per clock cycle.
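
To make the arithmetic concrete, here is a small sketch (mine, not part of the miner; the function name is made up) that turns per-ring clock rates into a chip hashrate under the 0.5 hashes/clock/ring figure above. It reproduces the 135MHz -> 202.5MH/s figure from the 9-Mar-2012 update below, and 170MHz -> 255MH/s.

```python
def chip_hashrate_mhs(ring_clocks_mhz, hashes_per_clock=0.5):
    """Chip hashrate in MH/s from a list of per-ring clocks in MHz.

    hashes_per_clock is 0.5 because a nonce passes through a
    64-stage ring twice before it is fully hashed.
    """
    return sum(f * hashes_per_clock for f in ring_clocks_mhz)


# Three rings all clocked the same:
print(chip_hashrate_mhs([135, 135, 135]))  # 202.5
print(chip_hashrate_mhs([170, 170, 170]))  # 255.0
```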

Update, 20-Jul-2012: TML-1.0 has been released. We are getting an average of 255MH/s/chip across a large farm of speed grade 2 chips (not the more-expensive grade 3 chips). These are nexus6 boards, whose power supply is much more powerful than the one on the ztex boards.

All known boards (ztex, modminer, icarus, x6500, enterpoint, the-nameless-board-of-rph, nexus6) are supported. You might need a JTAG cable. If you would prefer to use your board's proprietary USB connector, contact your board vendor for assistance.

Additional announcement: during the last week, tricone mining has secured funding for its next (i.e. post-TML) bitcoin project (no, not the joke press release). Unfortunately this means that between now and the end of August, further development of the TML will be at a reduced priority. We will continue to maintain the signcryption servers, fix bugs, and produce new bitstreams (we have a farm of servers that sweeps through combinations of the various Xilinx tool command line parameters), so you can expect some minor improvements in performance and a steadily-growing supply of bitstreams to choose from. However, new features will have to wait until 07-Sep-2012. In order to compensate our users for the unfortunate timing of this priority shift, the commission rate is set to 0% until at least 07-Sep-2012.

Please do not send me forum PMs; I find the PM system very cumbersome to use. My email address is in my profile.

______________________________________________________________________________

Update, 12-May-2012: Preliminary power numbers posted.

Update, 9-May-2012: New numbers posted. This is the first design resulting from a substantial re-architecting begun about a month ago (and, I expect, the last one). All the designs up to 145MHz were incremental improvements; the 170MHz design was a major change.

Update, 19-Mar-2012: sorry for the long delay. Had two major
problems: one was a hardware/simulator mismatch that could
only be debugged with git bisect (thank you Linus!), but each
build takes ~24 hours. The other problem was voltage sag on my crappy
PCBs (I'm not a PCB designer), which was why the actual speeds were
below what the Xilinx tools said they should be. I can now run at the
rated speed. Results of overclocking tests will be posted shortly. I
also have another build about to finish that should produce another
small bump in the design clock rate this weekend.

Update, 9-Mar-2012: new build finished last night, 140MHz
design rate, 135MHz actual clock rate, 202.5MH/s actual hashrate.
You'll notice that my actual clock rates are slower than my
design clock rates, meaning that I have to very slightly underclock
the design to get optimal error rates. I don't know why this is,
although I only have a single SG-3 chip and it's on one of the first
PCBs I made -- before I got my reflow oven tuned up.

Update, 8-Mar-2012: The results are finally out of the
totally-embarrassing region, so I think I'll start posting them. I
want to emphasize that these are very preliminary. I've only
been doing performance optimization on the full design for three weeks
now. As you can see, I'm getting these hashrates at 128MHz, while others
have demonstrated that the SHA-256 critical path on a Spartan-6 can
run well above 200MHz.

All power consumption numbers are measured at the 12V rail of an
ATX power supply (SATA connector), so they include any
inefficiencies introduced by the 12V->1.2V stepdown. Power
consumption measured at the wall will be higher; how much
depends on the power supply's efficiency.
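
If you want to turn a 12V-rail number into a rough at-the-wall number, a sketch like the following would do. The function name is made up, and the 10W and 80% figures in the example are purely illustrative, not measurements from my boards or any particular power supply.

```python
def wall_power_watts(rail_power_watts, psu_efficiency):
    """Estimate wall power from power measured at the 12V rail.

    psu_efficiency is the ATX supply's efficiency as a fraction
    (0 < efficiency <= 1). This is an assumed figure you have to
    supply yourself; it varies by supply and by load.
    """
    if not 0 < psu_efficiency <= 1:
        raise ValueError("psu_efficiency must be in (0, 1]")
    return rail_power_watts / psu_efficiency


# Illustrative only: 10W at the rail through an 80%-efficient supply.
estimate = wall_power_watts(10.0, 0.80)
print(round(estimate, 2))
```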

I use adaptive clocking (DCM_CLKGEN) at 1MHz granularity (yes, I
figured out how to fix the jitter issues) and a separate clock
adjustment for each ring. This means that a performance defect in
one part of the chip will force the slowdown of only one of the three
rings. This is another reason why I designed for three separate
half-pipelines instead of one full pipeline and one half-pipeline.
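
Here is a rough sketch of the idea of per-ring tuning, not the actual tuning logic on the chip. It sweeps each ring's clock in 1MHz steps and keeps the frequency that gives the most error-free hashes. Everything named here (tune_ring, make_fake_ring, the 100-160MHz sweep range, and the error model where a ring is clean up to a "knee" frequency and then loses 10% per extra MHz) is a made-up stand-in for real hardware measurements.

```python
def good_hashrate(freq_mhz, error_rate):
    """Error-free MH/s for one ring (0.5 hashes per clock)."""
    return 0.5 * freq_mhz * (1.0 - error_rate)


def tune_ring(measure_error_rate, lo_mhz, hi_mhz, step_mhz=1):
    """Sweep the clock and return (best_freq_mhz, best_good_mhs)."""
    best_f = lo_mhz
    best_rate = good_hashrate(lo_mhz, measure_error_rate(lo_mhz))
    for f in range(lo_mhz + step_mhz, hi_mhz + 1, step_mhz):
        rate = good_hashrate(f, measure_error_rate(f))
        if rate > best_rate:
            best_f, best_rate = f, rate
    return best_f, best_rate


def make_fake_ring(knee_mhz):
    """Hypothetical ring: no errors up to knee_mhz, then 10% more per MHz."""
    def measure(freq_mhz):
        return min(1.0, max(0.0, (freq_mhz - knee_mhz) * 0.10))
    return measure


# Three rings with different (invented) speed limits, tuned independently.
rings = [make_fake_ring(knee) for knee in (140, 135, 130)]
results = [tune_ring(ring, 100, 160) for ring in rings]
clocks = [f for f, _ in results]
total_mhs = sum(rate for _, rate in results)

print(clocks)      # [140, 135, 130]
print(total_mhs)   # 202.5
```

With these invented numbers the rings settle at 140, 135 and 130MHz for 202.5MH/s total. A single shared clock would have to run everything at the slowest ring's 130MHz, which gives only 195MH/s.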

The "actual" power and frequency numbers are produced by running the
design on an SG-3 chip at 1200mV, adapting the three clocks to
maximize (non-erroneous) hashrate, then averaging the three
frequencies. Most of my cluster is SG-2, but I quote the SG-3 numbers
since that's what everybody else quotes numbers for. I have digital
control of the voltage supply on my boards, so I'll try adaptive
overvolting at some point; the chips are rated up to 1320mV.

See the end of the thread for map output, or if you are
interested in licensing.

The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators. So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.
