Bitcoin Forum
August 17, 2017, 05:41:55 PM *
Author Topic: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)  (Read 143926 times)
RandyFolds
Sr. Member
****
Offline Offline

Activity: 448



View Profile
January 31, 2012, 09:41:58 PM
 #1061

4-6 weeks.   Shocked



I had to do it.....I'm sorry......
Bad yochdog! That line is reserved for use by RandyFold's god-size presence (TM) only.

Fixed that for ya...
Inspector 2211
Sr. Member
****
Offline Offline

Activity: 404



View Profile
January 31, 2012, 10:00:40 PM
 #1062

Let's not confuse unrolling with pipelining. Current open-source designs are fully unrolled, but I have yet to see a proper pipelined design that is open-sourced. Maybe pipelining is what BFL did? By pipelining the unrolled design they could significantly crank up the clock, since the FPGAs are limited more by the propagation delay in the signal routing than by the propagation delay in the actual logic.
My understanding is that the lack of pipelining is due to the lack of registers in an FPGA, is this correct or not?

A Spartan6-LX150 has 184000 flip-flops, and for a double SHA-256 only 32768 flip-flops are needed: 128 stages x 256 width = 32768. Fits easily if you have 184000 at your disposal.

Thus, I find it very hard to believe that current designs are not pipelined. Also, a typical design such as the ZTEX design achieves 200 MH/s with 200 MHz. Assuming it is not a fully pipelined design, that would mean that all 128 (or 125) stages have to percolate through in a mere 5 ns, because 5 ns is the clock period of 200 MHz. 40 ps (picoseconds) per stage? I don't think so.
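
The arithmetic above can be checked in a few lines. This is only a sketch of the post's own estimate: it counts the 256-bit working state per stage, as the post does, and uses the flip-flop figure quoted there. The variable names are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted in the post.
STAGES = 128                   # 2 x 64 rounds for a double SHA-256
STATE_BITS = 256               # width of the working state per stage
FLIPFLOPS_AVAILABLE = 184_000  # Spartan6-LX150 figure quoted in the post

flipflops_needed = STAGES * STATE_BITS
fraction_used = flipflops_needed / FLIPFLOPS_AVAILABLE

# 200 MHz clock, expressed in integer picoseconds to avoid float rounding.
CLOCK_HZ = 200_000_000
clock_period_ps = 10**12 // CLOCK_HZ

# If all 125 rounds had to settle inside one clock period (no pipelining):
per_stage_ps = clock_period_ps // 125

print(f"flip-flops needed: {flipflops_needed} ({fraction_used:.1%} of the chip)")
print(f"clock period: {clock_period_ps} ps, budget per round: {per_stage_ps} ps")
```

A budget of tens of picoseconds per round is far below what FPGA logic and routing can deliver, which is the point of the argument.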

2112
Legendary
*
Offline Offline

Activity: 1918



View Profile
January 31, 2012, 10:01:08 PM
 #1063

My understanding is that the lack of pipelining is due to the lack of registers in an FPGA, is this correct or not?
No, I don't think so. I think that the limitations are due to the heuristics used by FPGA synthesis tools. At least in Xilinxes the registers are essentially free. Pretty much each slice can have direct combinatorial outputs or registered outputs mixed with no restrictions.

I shouldn't have written about no pipelining. The more accurate term would be inflexible pipelining. It would be better to describe the level of unrolling and the level of pipelining as two variables that are somewhat independent.

I just looked again into the folder that I used to store the Verilog source code for Bitcoin hashers.

It seems like some of them are indeed pipelined, but the level of pipelining is equal to the level of unrolling. It seems like ztex uses 125-way unrolling and 125-way pipelining.  So in a single clock the design computes one round each for the hashes of nonces N-124 through N. When nonce N is on the input the output shows the final hash for nonce N-124.

In general a 125-way unrolled design can be pipelined anywhere from 1 to 125 stages.
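
The N versus N-124 relationship can be sketched with a toy model. This is not taken from any of the Verilog sources; it just treats the pipeline as a shift register of in-flight nonces, and `run_pipeline` is a hypothetical helper name.

```python
from collections import deque

def run_pipeline(nonces, depth):
    """Model a `depth`-stage pipeline as a shift register of in-flight nonces.

    On every clock tick one nonce enters the first stage. The last stage holds
    the nonce whose hash is finished on that tick, or None while the pipeline
    is still filling. Returns a list of (nonce_at_input, nonce_at_output).
    """
    stages = deque([None] * depth, maxlen=depth)
    trace = []
    for n in nonces:
        stages.appendleft(n)          # new nonce in; the oldest one drops off
        trace.append((n, stages[-1]))
    return trace

trace = run_pipeline(range(200), depth=125)
print(trace[123])   # pipeline still filling
print(trace[124])   # first finished hash appears
print(trace[199])   # input and output are 124 nonces apart
```

With a smaller `depth` for the same 125 unrolled rounds, each stage would contain several rounds of combinatorial logic, so the clock would have to be slower but the latency in ticks would shrink.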

There are also other possible ways of pipelining the SHA-256. For example, the (W(i) + K(i)) expansion function sums five terms: K(i) + S1(W(i-2)) + W(i-7) + S0(W(i-15)) + W(i-16). One could factor out the last two addends S0(W(i-15)) + W(i-16) and precompute them in the previous round as S0(W(i-14)) + W(i-15). Or even go two rounds deep and compute S0(W(i-13)) + W(i-14). And so forth.
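
A minimal software sketch of that idea, assuming the standard SHA-256 message schedule (K(i) is left out since it is a constant per round). The function names are hypothetical; the second version carries the partial sum S0(W(i-15)) + W(i-16) over from the previous iteration, the way a pipeline register would.

```python
MASK = 0xFFFFFFFF  # all arithmetic is modulo 2**32

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def s0(x):
    """SHA-256 small sigma 0."""
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def s1(x):
    """SHA-256 small sigma 1."""
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def schedule_direct(block_words):
    """Textbook expansion: four addends summed in the same round."""
    w = list(block_words)
    for i in range(16, 64):
        w.append((s1(w[i - 2]) + w[i - 7] + s0(w[i - 15]) + w[i - 16]) & MASK)
    return w

def schedule_precomputed(block_words):
    """Same expansion, with the last two addends computed one round early."""
    w = list(block_words)
    partial = (s0(w[1]) + w[0]) & MASK  # addends needed by round 16
    for i in range(16, 64):
        w.append((s1(w[i - 2]) + w[i - 7] + partial) & MASK)
        # Prepare round i+1: S0(W(i-14)) + W(i-15), both already known.
        partial = (s0(w[i - 14]) + w[i - 15]) & MASK
    return w

words = [(i * 0x01234567 + 0x89ABCDEF) & MASK for i in range(16)]
print(schedule_direct(words) == schedule_precomputed(words))
```

Both versions produce the same words; the difference only matters in hardware, where the second form shortens the adder chain inside one clock at the cost of an extra 32-bit register.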

My guess is that the number of possible valid transformations overwhelms the synthesis tools and they blow up either on memory usage or time.

Anyway, those are just my speculations. I haven't spent much eyeball time analyzing the available codes.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
2112
Legendary
*
Offline Offline

Activity: 1918



View Profile
January 31, 2012, 10:06:04 PM
 #1064

Thus, I find it very hard to believe that current designs are not pipelined.
Yeah, you are right and I was wrong. It seems like the N-way unrolled designs are also N-way pipelined. But the degree of pipelining doesn't have to equal the degree of unrolling.

2112
Legendary
*
Offline Offline

Activity: 1918



View Profile
January 31, 2012, 10:32:56 PM
 #1065

My guess is that the number of possible valid transformations overwhelms the synthesis tools and they blow up either on memory usage or time.
I apologize, I'm having problems posting and editing the posts.

There are just so many logically equivalent ways to synthesize the SHA-256. For example, somebody earlier posted a snippet of his synthesis where he used the adders in the DSP blocks on the Virtex 6 chip. For this to be really beneficial on Spartan 6 chips one has to write location-dependent Verilog: when near a DSP block use its adder, when far away synthesize the adder using local slice resources.

The number of available trade-offs is immense.

And thus far I have talked only about synthesis. But the full working design requires two more steps: place and route. This opens another set of dimensions that need to be explored for optimization.

One guy here on this forum is working on a design where he wrote a Java program to generate a Verilog program that does hashing. The Verilog is all location-constrained to the particular slices.

Somebody else posted code that explicitly uses ternary adders Y = A + B + C. As far as I know Xilinx ISE will always synthesize adder trees Y = (A + B) + C or Y = A + (B + C) or Y = (A + C) + B.
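
To illustrate why all of these forms are interchangeable, here is a small sketch. The carry-save step is a generic textbook way of adding three numbers at once; it is an assumption for illustration and not a claim about how Xilinx slices or ISE actually implement a ternary adder. The function names are hypothetical.

```python
MASK = 0xFFFFFFFF  # 32-bit words, as in SHA-256

def adder_tree(a, b, c):
    """Two chained binary adders: Y = (A + B) + C."""
    return (((a + b) & MASK) + c) & MASK

def ternary_add(a, b, c):
    """Three-input add via one carry-save stage and one ordinary adder.

    Per bit position, a full adder yields a sum bit (XOR of the inputs) and
    a carry bit (majority of the inputs), which belongs one position higher.
    """
    sum_bits = a ^ b ^ c
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1
    return (sum_bits + carry_bits) & MASK

a, b, c = 0xDEADBEEF, 0x12345678, 0xCAFEBABE
print(hex(ternary_add(a, b, c)))
print(ternary_add(a, b, c) == adder_tree(a, b, c) == adder_tree(c, a, b))
```

The results agree for every grouping, so the choice between them is purely about delay and slice usage, which is exactly the kind of trade-off the synthesis tool has to decide.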

On some other site I've found an implementation that pipelines rounds in pairs: a 128-way unrolled Bitcoin hash would've had 64-way pipelining. Again, it wasn't for Spartan 6, but some other Xilinx chip.

rjk
Sr. Member
****
Offline Offline

Activity: 434


1ngldh


View Profile
January 31, 2012, 10:39:19 PM
 #1066

My guess is that the number of possible valid transformations overwhelms the synthesis tools and they blow up either on memory usage or time.
So if I could get my hands on a 4x hex core server with 256GB of RAM, you FPGA guys would love me long time?

Mining Rig Extraordinaire - the Trenton BPX6806 18-slot PCIe backplane [PICS] Dead project is dead, all hail the coming of the mighty ASIC!
DiabloD3
Legendary
*
Offline Offline

Activity: 1162


DiabloMiner author


View Profile WWW
January 31, 2012, 10:42:00 PM
 #1067

My guess is that the number of possible valid transformations overwhelms the synthesis tools and they blow up either on memory usage or time.
So if I could get my hands on a 4x hex core server with 256GB of RAM, you FPGA guys would love me long time?

They may even use lube.

RandyFolds
Sr. Member
****
Offline Offline

Activity: 448



View Profile
January 31, 2012, 10:48:30 PM
 #1068

To the FPGA guys here: Why is it 'rolled' and not 'furled'? It seems way more appropriate.

Because it's always been unrolling loops. Ears are unfurled, loops are unrolled.

And just because "it's always been that way", it's ok? You don't happen to live in Alabama, now, do you? Tongue
2112
Legendary
*
Offline Offline

Activity: 1918



View Profile
January 31, 2012, 11:07:45 PM
 #1069

So if I could get my hands on a 4x hex core server with 256GB of RAM, you FPGA guys would love me long time?
I recall somebody posting a screenshot of a control session for an Amazon EC2 farm containing over 50 machines doing the Xilinx design. I really don't think that using more brute-force would be helpful.

The SHA family of algorithms is very regular and pretty much every bit depends on every bit. This hits a weak spot in the global optimization algorithm used by the FPGA tools.

I think that the way forward goes through the use of specialized synthesis tools that don't make generic assumptions about what kind of circuitry is being synthesized.

RandyFolds
Sr. Member
****
Offline Offline

Activity: 448



View Profile
January 31, 2012, 11:10:53 PM
 #1070

So if I could get my hands on a 4x hex core server with 256GB of RAM, you FPGA guys would love me long time?
I recall somebody posting a screenshot of a control session for an Amazon EC2 farm containing over 50 machines doing the Xilinx design. I really don't think that using more brute-force would be helpful.

The SHA family of algorithms is very regular and pretty much every bit depends on every bit. This hits a weak spot in the global optimization algorithm used by the FPGA tools.

I think that the way forward goes through the use of specialized synthesis tools that don't make generic assumptions about what kind of circuitry is being synthesized.


Anyone remember that thread where a guy had some crazy graphic utility for FPGA design? I didn't understand a lick of what everyone was talking about, but it seemed that he was hand plotting it, and the pictures were awesome...
fizzisist
Hero Member
*****
Offline Offline

Activity: 720



View Profile WWW
January 31, 2012, 11:13:58 PM
 #1071

Anyone remember that thread where a guy had some crazy graphic utility for FPGA design? I didn't understand a lick of what everyone was talking about, but it seemed that he was hand plotting it, and the pictures were awesome...

https://bitcointalk.org/index.php?topic=49971

makomk
Hero Member
*****
Offline Offline

Activity: 686


View Profile
January 31, 2012, 11:24:48 PM
 #1072

It seems like some of them are indeed pipelined, but the level of pipelining is equal to the level of unrolling. It seems like ztex uses 125-way unrolling and 125-way pipelining.  So in a single clock the design computes one round each for the hashes of nonces N-124 through N. When nonce N is on the input the output shows the final hash for nonce N-124.

In general a 125-way unrolled design can be pipelined anywhere from 1 to 125 stages.
ztex's latest code actually has two pipeline stages for every SHA-256 round, which is partly why it's so much faster; ISE has trouble routing the design efficiently. It varies as to how much sense this makes though. Also, the FPGA synthesis tools support something called register rebalancing where they move the registers that divide up the calculations into pipeline stages backwards and forwards in order to get the best speed, so it's not necessarily a simple question of one (or two) pipeline stages per round.

Somebody else posted code that explicitly uses ternary adders Y = A + B + C. As far as I know Xilinx ISE will always synthesize adder trees Y = (A + B) + C or Y = A + (B + C) or Y = (A + C) + B.
Actually, I seem to recall that it's quite happy to automatically use ternary adders on Spartan-6.

Quad XC6SLX150 Board: 860 MHash/s or so.
SIGS ABOUT BUTTERFLY LABS ARE PAID ADS
RandyFolds
Sr. Member
****
Offline Offline

Activity: 448



View Profile
January 31, 2012, 11:32:53 PM
 #1073

Anyone remember that thread where a guy had some crazy graphic utility for FPGA design? I didn't understand a lick of what everyone was talking about, but it seemed that he was hand plotting it, and the pictures were awesome...

https://bitcointalk.org/index.php?topic=49971

That one! What is it?
2112
Legendary
*
Offline Offline

Activity: 1918



View Profile
January 31, 2012, 11:46:08 PM
 #1074

I don't think he did manual placement using his hands and mouse. From my understanding he wrote a Java program that produced Verilog with explicit location constraints as well as a Tcl script that controlled the fpgaeditor to put the finishing touches on the signal routing.
That one! What is it?
That is "fpgaeditor" from Xilinx ISE. This is actually how FPGA started. If I recall correctly, the suite was called X-ACT, not ISE. But for sure it wasn't called X-Acto, although it felt like using one. The automatic circuit synthesis was an expensive upgrade. I have no personal experience beyond peeking over coworkers' shoulders.

yochdog
Legendary
*
Offline Offline

Activity: 1876



View Profile
February 01, 2012, 04:48:34 AM
 #1075

4-6 weeks.   Shocked



I had to do it.....I'm sorry......
Bad yochdog! That line is reserved for use by RandyFold's god-like presence (TM) only.

I wish I could "like" this. 

I am a trusted trader!  Ask Inaba, Luo Demin, Vanderbleek, Sannyasi, Episking, Miner99er, Isepick, Amazingrando, Cablez, ColdHardMetal, Dextryn, MB300sd, Robocoder, gnar1ta$ and many others!
RandyFolds
Sr. Member
****
Offline Offline

Activity: 448



View Profile
February 01, 2012, 04:22:50 PM
 #1076

Well......it's February.
DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218


Gerald Davis


View Profile
February 01, 2012, 04:57:48 PM
 #1077

I got my tracking #!

Er wait that was just a tracking # for dog food from petflow.  Sorry.
Inaba
Legendary
*
Offline Offline

Activity: 1260



View Profile WWW
February 01, 2012, 05:00:15 PM
 #1078

Petflow is the shiznit.  No more lugging 50lb bags around!

If you're searching these lines for a point, you've probably missed it.  There was never anything there in the first place.
DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218


Gerald Davis


View Profile
February 01, 2012, 05:02:00 PM
 #1079

Petflow is the shiznit.  No more lugging 50lb bags around!

Isn't it.  I like the scheduler too.  Set it & forget it.
kano
Legendary
*
Offline Offline

Activity: 2184


Linux since 1997 RedHat 4


View Profile
February 01, 2012, 08:15:45 PM
 #1080

So I'm sure people have emailed Sonny and asked where their single(s) is/are.
Anyone gonna post the reply he gave this time?
Since they have redone the power on the board they should also have some new performance figures .... anyone got them yet?

Pool: https://kano.is Here on Bitcointalk: Forum BTC: 1KanoPb8cKYqNrswjaA8cRDk4FAS9eDMLU
FreeNode IRC: irc.freenode.net channel #kano.is Majority developer of the ckpool code
Help keep Bitcoin secure by mining on pools with full block verification on all blocks - and NO empty blocks!