1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

RandyFolds

Sr. Member

Offline

Activity: 448
Merit: 250

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 09:41:58 PM

#1061

Quote from: rjk on January 31, 2012, 08:55:50 PM

Quote from: yochdog on January 31, 2012, 08:31:11 PM

4-6 weeks. Shocked

I had to do it.....I'm sorry......

Bad yochdog! That line is reserved for use by RandyFold's god-size presence ^(TM) only.

Fixed that for ya...

Inspector 2211

Sr. Member

Offline

Activity: 448
Merit: 250

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 10:00:40 PM

#1062

Quote from: rjk on January 31, 2012, 09:18:52 PM

Quote from: 2112 on January 31, 2012, 09:04:11 PM

Let's not confuse unrolling with pipelining. Current open-source designs are fully unrolled, but I have yet to see a proper pipelined design that is open-sourced. Maybe pipelining is what BFL did? By pipelining the unrolled design they could significantly crank up the clock, since the FPGAs are limited more by the propagation delay in the signal routing than by the propagation delay in the actual logic.

My understanding is that the lack of pipelining is due to the lack of registers in an FPGA, is this correct or not?

A Spartan6-LX150 has 184000 flip-flops, and a double SHA-256 needs only 32768 of them: 128 stages x 256 bits wide = 32768. That fits easily if you have 184000 at your disposal.

Thus, I find it very hard to believe that current designs are not pipelined. Also, a typical design such as the ZTEX design achieves 200 MH/s with 200 MHz. Assuming it is not a fully pipelined design, that would mean that all 128 (or 125) stages have to percolate through in a mere 5 ns, because 5 ns is the clock period of 200 MHz. 40 ps (picoseconds) per stage? I don't think so.
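The arithmetic in this post can be checked with a short sketch. The constants are the figures quoted above (128 stages, 256-bit state, 184000 flip-flops, 200 MHz); the function names are made up for illustration, and the register count only covers the working state, not the message schedule or any other registers a real design would need.

```python
# Back-of-the-envelope check of the figures quoted in the post.
STAGES = 128                # rounds in a double SHA-256 (2 x 64), as stated in the post
STATE_BITS = 256            # width of the SHA-256 working state
FLIPFLOPS_LX150 = 184_000   # flip-flop count quoted in the post
CLOCK_HZ = 200e6            # 200 MHz, as quoted for the ZTEX design


def pipeline_registers(stages: int, width: int) -> int:
    """Flip-flops needed to register the working state after every stage."""
    return stages * width


def per_stage_budget_ps(clock_hz: float, stages: int) -> float:
    """Time per stage in picoseconds if ALL stages had to settle in one clock period."""
    period_ps = 1e12 / clock_hz
    return period_ps / stages


if __name__ == "__main__":
    regs = pipeline_registers(STAGES, STATE_BITS)
    print(f"state registers: {regs} ({regs / FLIPFLOPS_LX150:.1%} of the quoted flip-flops)")
    for stages in (128, 125):
        print(f"{stages} stages in one clock: {per_stage_budget_ps(CLOCK_HZ, stages):.1f} ps per stage")
```

The per-stage figure is the budget an unpipelined design would have to meet, which is the point of the argument: a few tens of picoseconds per round is not plausible, so the 200 MH/s at 200 MHz figure implies one hash leaving the pipeline per clock.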


2112

Legendary

Offline

Activity: 2128
Merit: 1076

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 10:01:08 PM
Last edit: January 31, 2012, 10:34:37 PM by 2112

#1063

Quote from: rjk on January 31, 2012, 09:18:52 PM

My understanding is that the lack of pipelining is due to the lack of registers in an FPGA, is this correct or not?

No, I don't think so. I think that the limitations are due to the heuristics used by FPGA synthesis tools. At least in Xilinxes the registers are essentially free. Pretty much each slice can have direct combinatorial outputs or registered outputs mixed with no restrictions.

I shouldn't have written "no pipelining". "Inflexible pipelining" would be more accurate. It is better to describe the level of unrolling and the level of pipelining as two somewhat independent variables.

I just looked again into the folder that I used to store the Verilog source code for Bitcoin hashers.

It seems like some of them are indeed pipelined, but the level of pipelining is equal to the level of unrolling. It seems like ztex uses 125-way unrolling and 125-way pipelining. So in a single clock the design computes one round each for the hashes of nonces N-124 to N. When nonce N is on the input, the output shows the final hash for nonce N-124.

In general a 125-way unrolled design can be pipelined anywhere from 1 to 125 stages.
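To make the N-124 latency concrete, here is a toy model of a fully pipelined, fully unrolled design. It is only a sketch: `toy_round` is a made-up stand-in, not the SHA-256 round function, and the `Pipeline` class is hypothetical. Each call to `clock` represents one clock edge, with a register after every stage.

```python
from typing import List, Optional, Tuple

MASK32 = 0xFFFFFFFF


def toy_round(value: int, stage: int) -> int:
    """Stand-in for one hashing round. NOT the real SHA-256 round function."""
    return (value * 33 + stage) & MASK32


class Pipeline:
    """One register per stage; every stage works on a different nonce each clock."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        # Each register holds (nonce, partially computed value) or None while filling.
        self.regs: List[Optional[Tuple[int, int]]] = [None] * depth

    def clock(self, nonce: int) -> Optional[Tuple[int, int]]:
        """Advance one clock edge and return the contents of the last register."""
        new_regs: List[Optional[Tuple[int, int]]] = [None] * self.depth
        new_regs[0] = (nonce, toy_round(nonce, 0))
        for i in range(1, self.depth):
            prev = self.regs[i - 1]
            if prev is not None:
                new_regs[i] = (prev[0], toy_round(prev[1], i))
        self.regs = new_regs
        return self.regs[-1]


def reference(nonce: int, depth: int) -> int:
    """All rounds applied to a single nonce, one after another."""
    value = nonce
    for stage in range(depth):
        value = toy_round(value, stage)
    return value


if __name__ == "__main__":
    pipe = Pipeline(125)
    result = None
    for n in range(200):
        result = pipe.clock(n)
    # With nonce 199 on the input, the output belongs to nonce 199 - 124.
    print(result, reference(199 - 124, 125))
```

Once the pipeline has filled, one finished result comes out per clock, which is how a 200 MHz clock gives 200 MH/s.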

There are also other possible ways of pipelining the SHA-256. For example the (W(i) + K(i)) expansion function uses a four-way adder: K(i) + S1(W(i-2)) + W(i-7) + S0(W(i-15)) + W(i-16). One could factor out the last two addends S0(W(i-15)) + W(i-16) and precompute them in the previous round as S0(W(i-14)) + W(i-15). Or even go two rounds deep and compute S0(W(i-13)) + W(i-14). And so forth.
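The precomputation idea can be checked in software. The sketch below uses the standard SHA-256 message schedule (small sigma functions as in the SHA-256 specification) and compares the direct recurrence against a variant that carries the S0(W(i-15)) + W(i-16) partial sum over from the previous step. The function names and the sample block are arbitrary choices for illustration, and the K(i) addend is left out because it belongs to the round function rather than the schedule.

```python
from typing import List

MASK32 = 0xFFFFFFFF


def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def small_sigma0(x: int) -> int:
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x: int) -> int:
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def schedule_direct(block: List[int]) -> List[int]:
    """W(i) = S1(W(i-2)) + W(i-7) + S0(W(i-15)) + W(i-16), all added in one step."""
    w = list(block)
    for i in range(16, 64):
        w.append((small_sigma1(w[i - 2]) + w[i - 7]
                  + small_sigma0(w[i - 15]) + w[i - 16]) & MASK32)
    return w


def schedule_precomputed(block: List[int]) -> List[int]:
    """Same schedule, but the last two addends are prepared one step early."""
    w = list(block)
    # Partial sum needed for i = 16, computed before the loop starts.
    partial = (small_sigma0(w[1]) + w[0]) & MASK32
    for i in range(16, 64):
        w.append((small_sigma1(w[i - 2]) + w[i - 7] + partial) & MASK32)
        # Prepare S0(W(i-14)) + W(i-15), which step i + 1 will consume.
        partial = (small_sigma0(w[i - 14]) + w[i - 15]) & MASK32
    return w


if __name__ == "__main__":
    sample = [(i * 0x9E3779B9) & MASK32 for i in range(16)]  # arbitrary test words
    print(schedule_direct(sample) == schedule_precomputed(sample))
```

In hardware the `partial` value would sit in its own register, so the critical path of each step shrinks from a four-operand sum to a three-operand one, at the cost of 32 extra flip-flops per stage.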

My guess is that the number of possible valid transformations overwhelms the synthesis tools and they blow up either on memory usage or time.

Anyway, those are just my speculations. I haven't spent much eyeball time analyzing the available code.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0

2112

Legendary

Offline

Activity: 2128
Merit: 1076

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 10:06:04 PM

#1064

Quote from: Inspector 2211 on January 31, 2012, 10:00:40 PM

Thus, I find it very hard to believe that current designs are not pipelined.

Yeah, you are right and I was wrong. It seems like the N-way unrolled designs are also N-way pipelined. But the degree of pipelining doesn't have to equal the degree of unrolling.


2112

Legendary

Offline

Activity: 2128
Merit: 1076

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 10:32:56 PM
Last edit: February 01, 2012, 12:18:41 AM by 2112

#1065

Quote from: 2112 on January 31, 2012, 10:01:08 PM

My guess is that the number of possible valid transformations overwhelms the synthesis tools and they blow up either on memory usage or time.

I apologize, I'm having problems posting and editing the posts.

There are just so many logically equivalent ways to synthesize the SHA-256. For example somebody earlier posted a snippet of his synthesis where he used the adders in the DSP blocks on the Virtex 6 chip. For this to be really beneficial on Spartan 6 chips one has to write location-dependent Verilog: when near a DSP block use its adder, when far away synthesize the adder using local slice resources.

The number of available trade-offs is immense.

And thus far I have talked only about synthesis. But the full working design requires two more steps: place and route. This opens another set of dimensions that need to be explored for optimization.

One guy here on this forum is working on a design where he wrote a Java program to generate a Verilog program that does hashing. The Verilog is all location-constrained to the particular slices.

Somebody else posted code that explicitly uses ternary adders Y = A + B + C. As far as I know Xilinx ISE will always synthesize adder trees Y = (A + B) + C or Y = A + (B + C) or Y = (A + C) + B.
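For readers who have not met three-operand adders: the sketch below shows, in software, that a tree of two ordinary adds and a carry-save style reduction give the same 32-bit result. This is generic arithmetic only. It makes no claim about how ISE actually maps a ternary adder onto Spartan-6 slices, and the function names are made up.

```python
import random

MASK32 = 0xFFFFFFFF


def add3_tree(a: int, b: int, c: int) -> int:
    """Y = (A + B) + C: two carry-propagating 32-bit adds in series."""
    return (((a + b) & MASK32) + c) & MASK32


def add3_carry_save(a: int, b: int, c: int) -> int:
    """Reduce three operands to two without carry propagation, then add once."""
    partial_sum = a ^ b ^ c                        # per-bit sum, carries ignored
    carries = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, shifted into place
    return (partial_sum + carries) & MASK32


if __name__ == "__main__":
    rng = random.Random(2112)
    for _ in range(5):
        a, b, c = (rng.getrandbits(32) for _ in range(3))
        print(hex(add3_tree(a, b, c)), hex(add3_carry_save(a, b, c)))
```

The results are identical for every input; what differs in hardware is how many carry chains the signal has to ripple through, which is why the choice matters for the clock rate.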

On some other site I've found an implementation that pipelines rounds in pairs: 128-way unrolled Bitcoin hash would've had 64-way pipelining. Again, it wasn't for Spartan 6, but some other Xilinx chip.


rjk

Sr. Member

Offline

Activity: 462
Merit: 250

1ngldh

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 10:39:19 PM

#1066

Quote from: 2112 on January 31, 2012, 10:01:08 PM

My guess is that the number of possible valid transformations overwhelms the synthesis tools and they blow up either on memory usage or time.

So if I could get my hands on a 4x hex core server with 256GB of RAM, you FPGA guys would love me long time?

Mining Rig Extraordinaire - the Trenton BPX6806 18-slot PCIe backplane [PICS] Dead project is dead, all hail the coming of the mighty ASIC!

DiabloD3

Legendary

Offline

Activity: 1162
Merit: 1000

DiabloMiner author

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 10:42:00 PM

#1067

Quote from: rjk on January 31, 2012, 10:39:19 PM

Quote from: 2112 on January 31, 2012, 10:01:08 PM

My guess is that the number of possible valid transformations overwhelms the synthesis tools and they blow up either on memory usage or time.

So if I could get my hands on a 4x hex core server with 256GB of RAM, you FPGA guys would love me long time?

They may even use lube.

DiabloMiner

RandyFolds

Sr. Member

Offline

Activity: 448
Merit: 250

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 10:48:30 PM

#1068

Quote from: DiabloD3 on January 31, 2012, 09:08:21 AM

Quote from: RandyFolds on January 30, 2012, 11:39:12 PM

To the FPGA guys here: Why is it 'rolled' and not 'furled'? It seems way more appropriate.

Because it's always been unrolling loops. Ears are unfurled, loops are unrolled.

And just because "it's always been that way", it's ok? You don't happen to live in Alabama, now, do you? Tongue

2112

Legendary

Offline

Activity: 2128
Merit: 1076

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 11:07:45 PM

#1069

Quote from: rjk on January 31, 2012, 10:39:19 PM

So if I could get my hands on a 4x hex core server with 256GB of RAM, you FPGA guys would love me long time?

I recall somebody posting a screenshot of a control session for an Amazon EC2 farm containing over 50 machines doing the Xilinx design. I really don't think that using more brute-force would be helpful.

The SHA-family of algorithms are very regular and pretty much every bit depends on every bit. This hits a weak spot in the global optimization algorithm used by the FPGA tools.

I think that the way forward goes through the use of specialized synthesis tools that don't make generic assumptions about what kind of circuitry is being synthesized.


RandyFolds

Sr. Member

Offline

Activity: 448
Merit: 250

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 11:10:53 PM

#1070

Quote from: 2112 on January 31, 2012, 11:07:45 PM

Quote from: rjk on January 31, 2012, 10:39:19 PM

So if I could get my hands on a 4x hex core server with 256GB of RAM, you FPGA guys would love me long time?

Anyone remember that thread where a guy had some crazy graphic utility for FPGA design? I didn't understand a lick of what everyone was talking about, but it seemed that he was hand plotting it, and the pictures were awesome...

fizzisist

Hero Member

Offline

Activity: 720
Merit: 528

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 11:13:58 PM

#1071

Quote from: RandyFolds on January 31, 2012, 11:10:53 PM

https://bitcointalk.org/index.php?topic=49971

fizzisist.com | Price Image Generator | FPGAMining.com

makomk

Hero Member

Offline

Activity: 686
Merit: 564

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 11:24:48 PM

#1072

Quote from: 2112 on January 31, 2012, 10:01:08 PM

It seems like some of them are indeed pipelined, but the level of pipelining is equal to the level of unrolling. It seems like ztex uses 125-way unrolling and 125-way pipelining. So in a single clock the design computes one round each for the hashes of nonces N-124 to N. When nonce N is on the input, the output shows the final hash for nonce N-124.

In general a 125-way unrolled design can be pipelined anywhere from 1 to 125 stages.

ztex's latest code actually has two pipeline stages for every SHA-256 round, which is partly why it's so much faster; ISE has trouble routing the design efficiently. It varies as to how much sense this makes though. Also, the FPGA synthesis tools support something called register rebalancing where they move the registers that divide up the calculations into pipeline stages backwards and forwards in order to get the best speed, so it's not necessarily a simple question of one (or two) pipeline stages per round.
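Register rebalancing can be pictured with a one-dimensional toy: a chain of operations with known delays, and a fixed number of pipeline stages to split them into. The sketch below brute-forces where the registers should go so that the slowest stage is as fast as possible. The delay numbers are invented, and real tools retime a full netlist graph with routing-aware timing models, so treat this purely as an illustration.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def stage_delays(delays: Sequence[int], cuts: Sequence[int]) -> List[int]:
    """Total delay of each stage when a register is placed before each index in cuts."""
    bounds = [0, *cuts, len(delays)]
    return [sum(delays[a:b]) for a, b in zip(bounds, bounds[1:])]


def best_cuts(delays: Sequence[int], stages: int) -> Tuple[int, Tuple[int, ...]]:
    """Try every register placement and return (worst stage delay, cut positions)."""
    best = None
    for cuts in combinations(range(1, len(delays)), stages - 1):
        worst = max(stage_delays(delays, cuts))
        if best is None or worst < best[0]:
            best = (worst, cuts)
    if best is None:
        raise ValueError("not enough operations for the requested number of stages")
    return best


if __name__ == "__main__":
    # Hypothetical delays of six chained operations, in arbitrary time units.
    delays = [4, 1, 1, 2, 2, 2]
    naive = (2, 4)  # a register after every second operation
    print("naive stages:     ", stage_delays(delays, naive))
    worst, cuts = best_cuts(delays, 3)
    print("rebalanced stages:", stage_delays(delays, cuts), "cuts at", cuts)
```

The clock period is set by the slowest stage, so moving a register by one operation can raise the achievable frequency without changing what is computed.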

Quote from: 2112 on January 31, 2012, 10:32:56 PM

Somebody else posted code that explicitly uses ternary adders Y = A + B + C. As far as I know Xilinx ISE will always synthesize adder trees Y = (A + B) + C or Y = A + (B + C) or Y = (A + C) + B.

Actually, I seem to recall that it's quite happy to automatically use ternary adders on Spartan-6.

Quad XC6SLX150 Board: 860 MHash/s or so.
SIGS ABOUT BUTTERFLY LABS ARE PAID ADS

RandyFolds

Sr. Member

Offline

Activity: 448
Merit: 250

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 11:32:53 PM

#1073

Quote from: fizzisist on January 31, 2012, 11:13:58 PM

Quote from: RandyFolds on January 31, 2012, 11:10:53 PM

https://bitcointalk.org/index.php?topic=49971

That one! What is it?

2112

Legendary

Offline

Activity: 2128
Merit: 1076

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 11:46:08 PM
Last edit: February 01, 2012, 12:05:40 AM by 2112

#1074

Quote from: fizzisist on January 31, 2012, 11:13:58 PM

https://bitcointalk.org/index.php?topic=49971

I don't think he did manual placement using his hands and mouse. From my understanding he wrote a Java program that produced Verilog with explicit location constraints as well as a Tcl script that controlled the fpgaeditor to put the finishing touches on the signal routing.

Quote from: RandyFolds on January 31, 2012, 11:32:53 PM

That one! What is it?

That is "fpgaeditor" from Xilinx ISE. This is actually how FPGA design started. If I recall correctly, the suite was called X-ACT, not ISE. It certainly wasn't called X-Acto, although it felt like using one. Automatic circuit synthesis was an expensive upgrade. I have no personal experience beyond peeking over coworkers' shoulders.


yochdog

Legendary

Offline

Activity: 2044
Merit: 1000

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

February 01, 2012, 04:48:34 AM

#1075

Quote from: rjk on January 31, 2012, 08:55:50 PM

Quote from: yochdog on January 31, 2012, 08:31:11 PM

4-6 weeks. Shocked

I had to do it.....I'm sorry......

Bad yochdog! That line is reserved for use by RandyFold's god-like presence ^(TM) only.

I wish I could "like" this.

I am a trusted trader! Ask Inaba, Luo Demin, Vanderbleek, Sannyasi, Episking, Miner99er, Isepick, Amazingrando, Cablez, ColdHardMetal, Dextryn, MB300sd, Robocoder, gnar1ta$ and many others!

RandyFolds

Sr. Member

Offline

Activity: 448
Merit: 250

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

February 01, 2012, 04:22:50 PM

#1076

Well......it's February.

DeathAndTaxes

Donator
Legendary

Offline

Activity: 1218
Merit: 1187

Gerald Davis

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

February 01, 2012, 04:57:48 PM

#1077

I got my tracking #!

Er wait that was just a tracking # for dog food from petflow. Sorry.

Inaba

Legendary

Offline

Activity: 1260
Merit: 1000

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

February 01, 2012, 05:00:15 PM

#1078

Petflow is the shiznit. No more lugging 50lb bags around!

If you're searching these lines for a point, you've probably missed it. There was never anything there in the first place.

DeathAndTaxes

Donator
Legendary

Offline

Activity: 1218
Merit: 1187

Gerald Davis

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

February 01, 2012, 05:02:00 PM
Last edit: February 01, 2012, 09:48:45 PM by DeathAndTaxes

#1079

Quote from: Inaba on February 01, 2012, 05:00:15 PM

Petflow is the shiznit. No more lugging 50lb bags around!

Isn't it. I like the scheduler too. Set it & forget it.

kano

Legendary

Offline

Activity: 4732
Merit: 1904

Linux since 1997 RedHat 4

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

February 01, 2012, 08:15:45 PM

#1080

So I'm sure people have emailed Sonny and asked where their single(s) is/are.
Anyone gonna post the reply he gave this time?
Since they have redone the power on the board they should also have some new performance figures .... anyone got them yet?

Pool: https://kano.is - low 0.5% fee PPLNS 3 Days - Most reliable Solo with ONLY 0.5% fee Bitcointalk thread: Forum
Discord support invite at https://kano.is/ Majority developer of the ckpool code - k for kano
The ONLY active original developer of cgminer. Original master git: https://github.com/kanoi/cgminer
