1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

antirack

Hero Member

Activity: 489
Merit: 500

Immersionist

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 06:32:23 AM
Last edit: January 31, 2012, 06:46:38 AM by antirack

#1041

You guys are just too spoiled I guess ;)

Companies intentionally promise stuff that they cannot deliver, and they know up front that they will not. Look at Sony. They do it all the time, and they are certainly not an exception. One example:

When they announced the PSP years ago, they said WE'RE GONNA RULE THE WORLD WITH THIS SHIT AND GRAN TURISMO WILL BE AVAILABLE FROM DAY ONE. THROW AWAY YOUR STUPID NINTENDOS. They used this title for months and months to push the PSP to the masses, always saying 'ok, it didn't come out on launch day but it will be out soon, get ready'. Even after a year they still said 'coming soon'.

That stupid title (I mean how difficult can it be to port a game that existed for years on other consoles) was delayed for more than 5 years. It didn't even come out for the original PSP after all.

Sony Releases Stupid Piece Of Shit That Doesn't Fucking Work (Onion News Network)
http://www.youtube.com/watch?v=8AyVh1_vWYQ

kano

Legendary

Activity: 4494
Merit: 1808

Linux since 1997 RedHat 4

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 07:23:23 AM

#1042

... and what is the general opinion of Sony due to this?

So what should the general opinion of BFL be?

Pool: https://kano.is - low 0.5% fee PPLNS 3 Days - Most reliable Solo with ONLY 0.5% fee Bitcointalk thread: Forum
Discord support invite at https://kano.is/ Majority developer of the ckpool code - k for kano
The ONLY active original developer of cgminer. Original master git: https://github.com/kanoi/cgminer

kimmeriets

Legendary

Activity: 1064
Merit: 1000

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 08:16:46 AM

#1043

Quote from: bulanula on December 22, 2011, 01:37:33 PM

We need to know what the chip under the hood is.

Couple of reasons :

-maybe these were sourced from Libya / Egypt and there may be some ethical issues there

What ethical issues could there possibly be with Libya and Egypt? :))))) Don't make me laugh.

DiabloD3

Legendary

Activity: 1162
Merit: 1000

DiabloMiner author

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 09:08:21 AM

#1044

Quote from: RandyFolds on January 30, 2012, 11:39:12 PM

To the FPGA guys here: Why is it 'rolled' and not 'furled'? It seems way more appropriate.

Because it's always been unrolling loops. Ears are unfurled, loops are unrolled.

DiabloMiner

kano

Legendary

Activity: 4494
Merit: 1808

Linux since 1997 RedHat 4

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 09:45:40 AM

#1045

Quote from: DiabloD3 on January 31, 2012, 09:08:21 AM

Quote from: RandyFolds on January 30, 2012, 11:39:12 PM

To the FPGA guys here: Why is it 'rolled' and not 'furled'? It seems way more appropriate.

Because it's always been unrolling loops. Ears are unfurled, loops are unrolled.

Actually - every time I see comments about unrolling the sha256 code I wonder how you would do it any way but unrolled.

The only 'rolled' option I can think of is to make a very small part of the FPGA just be P() and use it 122 times (in 2 loops) yet that would be senseless since I'd imagine it would be a MUCH slower way to do it? ...

DiabloD3

Legendary

Activity: 1162
Merit: 1000

DiabloMiner author

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 09:54:05 AM

#1046

Quote from: kano on January 31, 2012, 09:45:40 AM

Quote from: DiabloD3 on January 31, 2012, 09:08:21 AM

Quote from: RandyFolds on January 30, 2012, 11:39:12 PM

To the FPGA guys here: Why is it 'rolled' and not 'furled'? It seems way more appropriate.

Because it's always been unrolling loops. Ears are unfurled, loops are unrolled.

Actually - every time I see comments about unrolling the sha256 code I wonder how you would do it any way but unrolled.

The only 'rolled' option I can think of is to make a very small part of the FPGA just be P() and use it 122 times (in 2 loops) yet that would be senseless since I'd imagine it would be a MUCH slower way to do it? ...

The "dumb" way is to have one function (in the case of an FPGA, one circuit), and rotate the variables/registers in and out of the function. You have the code compiled/circuit implemented exactly once. This would actually be superior for FPGA _if_ they had enough registers, but they don't.

This is extremely slow for basically any implementation, and it also screws over the fact we essentially have 5 or more parallel ops at any given time in the way Bitcoin can optimize* the first, oh, 250 ops (depending on how certain things are implemented, of course).

* In OpenCL, due to all the shortcuts calculating stuff in the host, it starts out as two unrelated chains that eventually merge. The ability to pack VLIW5 here is pretty goddamned handy, makes optimization a much easier job.

kano

Legendary

Activity: 4494
Merit: 1808

Linux since 1997 RedHat 4

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 10:08:41 AM

#1047

... and an FPGA question

Is the FPGA process somewhat similar to something like dominoes falling where the data steps through the FPGA and presents an answer at the other end?
If so, could that stepping process effectively always be active at each step - i.e. input data is fed into the step process once each step (or each 2nd step if there is an overlap issue), so thus there would be output data happening each step time (or each 2x step time)?
Is that how it actually works? Or is it a once through per input then when it outputs, another input?
(yes I guess I really know nothing about how these things actually process internally)

Coz if it does step but the current implementation is once through per input, but it was actually possible to do one input per step (or per 2 steps), that would effectively multiply the processing power almost by the number of steps (or half the number of steps) in the process.

Energizer

Sr. Member

Activity: 273
Merit: 250

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 12:25:02 PM

#1048

Quote from: kano on January 31, 2012, 10:08:41 AM

... and an FPGA question

Is the FPGA process somewhat similar to something like dominoes falling where the data steps through the FPGA and presents an answer at the other end?
If so, could that stepping process effectively always be active at each step - i.e. input data is fed into the step process once each step (or each 2nd step if there is an overlap issue), so thus there would be output data happening each step time (or each 2x step time)?
Is that how it actually works? Or is it a once through per input then when it outputs, another input?
(yes I guess I really know nothing about how these things actually process internally)

Coz if it does step but the current implementation is once through per input, but it was actually possible to do one input per step (or per 2 steps), that would effectively multiply the processing power almost by the number of steps (or half the number of steps) in the process.

It seems you are new to FPGA programming. You would find this video helpful: "Intro to FPGAs for Software Engineers":

http://www.youtube.com/watch?v=gsTpLtEEobE&feature=related

DeathAndTaxes

Donator
Legendary

Activity: 1218
Merit: 1079

Gerald Davis

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 12:46:58 PM

#1049

Quote from: kano on January 31, 2012, 09:45:40 AM

Actually - every time I see comments about unrolling the sha256 code I wonder how you would do it any way but unrolled.

This is 1 round of the SHA-256 loop.

It is executed 64 times. So a fully looped version of SHA-256 code (in C# or on an FPGA, it doesn't matter) would be something like this in pseudocode:

Initialize all variables and inputs; round = 1
while round <= 64
{
    perform SHA-256 round
    round++
}
record output
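The looped pseudocode above can be made runnable. What follows is a minimal Python sketch written for this thread (not anyone's miner or FPGA code); the helper names `first_primes`, `icbrt`, `compress` and `sha256` are hypothetical. The 64 rounds are one round body reused in a `for` loop, with the eight working variables rotated back in each pass, which is the "rolled" form. Constants are derived from the primes rather than typed in, and the result is checked against `hashlib`.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def first_primes(n):
    """Return the first n primes by trial division (fine for n = 64)."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n):
    """Exact integer cube root (floor) via binary search."""
    lo, hi = 0, 1
    while hi ** 3 <= n:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return lo


PRIMES = first_primes(64)
# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]
# Initial hash values: first 32 fractional bits of the square roots of the first 8 primes.
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """One SHA-256 compression written 'rolled': one round body reused 64 times."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):  # "while round <= 64": the same round logic every pass
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        # Rotate the working variables back into the (single) round function.
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(message):
    """Pad the message, then feed each 64-byte chunk through compress()."""
    bit_len = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", bit_len)
    state = list(H0)
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)


print(sha256(b"abc") == hashlib.sha256(b"abc").digest())  # True
```

In hardware terms, this is the "one circuit, data cycled through it 64 times" arrangement; it is compact but needs 64 passes per compression.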

Unrolling in any programming language is the process of converting a looping structure to a flat structure. Many compilers for high-level languages even do it automatically for speed.
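As a toy illustration of the same transformation on a trivial loop (a hypothetical example, not SHA-256), here is a looped sum next to a fully unrolled one for a fixed input size. Roughly speaking, the unrolled form in hardware corresponds to eight adders' worth of logic instead of one adder reused eight times.

```python
def sum_rolled(values):
    """Looped form: one add 'unit' reused once per element."""
    total = 0
    i = 0
    while i < len(values):
        total += values[i]
        i += 1
    return total


def sum_unrolled_8(v):
    """Fully unrolled form for exactly 8 inputs: no loop counter, no branch."""
    return v[0] + v[1] + v[2] + v[3] + v[4] + v[5] + v[6] + v[7]


data = [3, 1, 4, 1, 5, 9, 2, 6]
print(sum_rolled(data), sum_unrolled_8(data))  # 31 31
```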

Fully unrolled involves no looping structure at all. In Bitcoin, since there is a double hash, fully unrolled means input -> flat logic -> double-hash output.

Quote from: kano on January 31, 2012, 09:45:40 AM

The only 'rolled' option I can think of is to make a very small part of the FPGA just be P() and use it 122 times (in 2 loops) yet that would be senseless since I'd imagine it would be a MUCH slower way to do it? ...

Quote from: kano on January 31, 2012, 10:08:41 AM

... and an FPGA question

Is the FPGA process somewhat similar to something like dominoes falling where the data steps through the FPGA and presents an answer at the other end?
If so, could that stepping process effectively always be active at each step - i.e. input data is fed into the step process once each step (or each 2nd step if there is an overlap issue), so thus there would be output data happening each step time (or each 2x step time)?
Is that how it actually works? Or is it a once through per input then when it outputs, another input?
(yes I guess I really know nothing about how these things actually process internally)

Coz if it does step but the current implementation is once through per input, but it was actually possible to do one input per step (or per 2 steps), that would effectively multiply the processing power almost by the number of steps (or half the number of steps) in the process.

makomk

Hero Member

Activity: 686
Merit: 564

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 01:11:44 PM
Last edit: January 31, 2012, 04:25:08 PM by makomk

#1050

Quote from: DeathAndTaxes on January 31, 2012, 12:46:58 PM

It is executed 64 times. So a fully looped version of SHA-256 code (in C# or on an FPGA, it doesn't matter) would be something like this in pseudocode:

Initialize all variables and inputs; round = 1
while round <= 64
{
    perform SHA-256 round
    round++
}
record output

Which is actually a reasonably sensible way of implementing SHA-256 in an FPGA if you just want to have an efficient way to hash arbitrary pieces of data rather than do something like Bitcoin mining or password cracking. If you're only hashing a single message there's no parallelism that can be exploited - you can't start work on any chunk of the message until you've completely hashed the previous chunks - and no reason to unroll the hashing. That's partly why off-the-shelf SHA-256 cores aren't much use for Bitcoin mining.
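A minimal sketch of that data dependency using Python's standard `hashlib`, with a made-up 76-byte header prefix (filler bytes, not a real block). Within one message, the second 64-byte chunk can only continue from the state left by the first. Mining, by contrast, hashes many separate messages that share that first chunk, so the shared state can be reused (`hashlib`'s `copy()` stands in for the midstate) and the per-nonce work is independent. Note that `hashlib` does not promise it has actually run the first compression when `copy()` is called; the sketch shows the dependency, not the FPGA optimisation itself.

```python
import hashlib
import struct

# Hypothetical 76-byte header prefix (version, prev hash, merkle root, time,
# bits); arbitrary filler bytes, not a real block.
HEADER_76 = bytes(range(76))


def double_sha256(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


# The first 64-byte chunk does not contain the nonce, so absorb it once.
prefix_state = hashlib.sha256(HEADER_76[:64])


def hash_nonce(nonce):
    """Continue from the shared prefix; only the last 16 bytes differ per nonce."""
    h = prefix_state.copy()
    h.update(HEADER_76[64:] + struct.pack("<I", nonce))  # nonce is little-endian
    return hashlib.sha256(h.digest()).digest()


# Different nonces are independent messages, so any of them could be hashed in
# parallel, while chunk 2 of any single message must wait for chunk 1.
results = [hash_nonce(n) for n in range(4)]
print(all(r == double_sha256(HEADER_76 + struct.pack("<I", n))
          for n, r in enumerate(results)))  # True
```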

Quad XC6SLX150 Board: 860 MHash/s or so.
SIGS ABOUT BUTTERFLY LABS ARE PAID ADS

heavyb

Full Member

Activity: 217
Merit: 100

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 01:44:14 PM

#1051

Anyone have tracking numbers or the singles in hand yet? I am anxiously waiting to hear about this; if it is legit, I am going to buy.

http://oneminuteslow.com/bitcoin/100-20.png

I build optimized custom mining systems at www.bitcoinsystems.com

kano

Legendary

Offline

Activity: 4494
Merit: 1808

Linux since 1997 RedHat 4

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 03:02:37 PM

#1052

lulz - OK I need to repeat the question ...

But firstly, I know the sha256 code very well - otherwise I wouldn't have mentioned P() ... here's a fully unrolled C sha256, optimised as far as is useful for gcc's -O2, that I generated myself quite a while back ... and yes it works.
It is generated and optimised by code I wrote to produce that entire file.
You cannot actually optimise it any better in C and gain anything but an extremely minor performance increase when you use -O2 with gcc on this.

http://pastebin.com/sxdVSJF1

Also as I said, 122, not 128 (or 178), because: the first 64 rounds are constant over a nonce range (commonly called the midstate); the first 3 rounds of the second 64 don't need to be done inside the loop (they can also be precomputed with the midstate); and the last 3 rounds of the third 64 never need to be done at all with bitcoin.
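The arithmetic behind that count, as a small Python tally. The breakdown (midstate, 3 precomputed rounds, 3 skipped tail rounds) is taken from the post above; the function name is hypothetical. It also shows why the first 3 rounds of the second chunk can be hoisted out of the loop: header bytes 64..79 form message words W[0..3] of that chunk, and the nonce is the last of them, W[3], which round 3 is the first to use.

```python
ROUNDS = 64  # rounds per SHA-256 compression


def rounds_per_nonce():
    """Tally of the per-nonce rounds following the breakdown in the post above."""
    # 80-byte header -> 2 compressions; second SHA-256 of the 32-byte digest -> 1.
    total = 3 * ROUNDS
    midstate = ROUNDS   # chunk 1 never contains the nonce
    precomputed = 3     # rounds 0-2 of chunk 2 use W[0..2], not the nonce
    never_needed = 3    # last 3 rounds of the final compression (per the post)
    return total - midstate - precomputed - never_needed


# Header bytes 64..79 form W[0..3] of chunk 2; the nonce is bytes 76..79.
nonce_word_index = (76 - 64) // 4
print(rounds_per_nonce(), nonce_word_index)  # 122 3
```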

My question is how do FPGAs work internally (if that youtube video really does answer this, oh well, but I've yet to see anything useful on youtube in my life that couldn't be replaced by a TINY web page of text so I ignore youtube links)

As I asked before, do they execute in a manner like dominoes where the clock process advances data through the FPGA in steps?

My googling on the subject suggests this is correct - but I was curious if anyone here knew that much about the internal workings of FPGA and could confirm or otherwise explain that.

Also, that would mean that inside the FPGA there would be something like the discrete steps to hash a nonce range, and thus I was wondering if they really do hash multiple nonces at the same time, each one 1 or 2 discrete steps behind the previous nonce.
Thus if the clock steps data through at X cycles per second, and the process is 1300 steps (a random number of my choosing), you aren't just waiting 1300 cycles for each nonce calculation, you are actually waiting (assuming each nonce is 2 steps apart) 2 cycles for each nonce result but with a startup time of 1300 cycles before the first nonce result comes out.
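The latency-versus-throughput arithmetic in that question, as a short Python sketch. The 1300-step depth and 2-step spacing are the illustrative numbers from the post, not real FPGA figures, and the function names are hypothetical.

```python
def cycles_unpipelined(n_nonces, depth):
    """Each nonce must fully exit before the next one enters."""
    return n_nonces * depth


def cycles_pipelined(n_nonces, depth, interval):
    """A new nonce enters every `interval` cycles; the first result appears
    after `depth` cycles, then one more every `interval` cycles."""
    if n_nonces == 0:
        return 0
    return depth + (n_nonces - 1) * interval


# Using the illustrative 1300-step figure and 2-step spacing from the post.
print(cycles_unpipelined(1000, 1300), cycles_pipelined(1000, 1300, 2))  # 1300000 3298
```

For large nonce ranges the startup latency becomes negligible and the throughput approaches one result per `interval` cycles.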

Inspector 2211

Sr. Member

Activity: 448
Merit: 250

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 03:29:17 PM

#1053

Quote from: kano on January 31, 2012, 03:02:37 PM

lulz - OK I need to repeat the question ...

But firstly, I know the sha256 code very well - otherwise I wouldn't have mentioned P() ... here's a fully unrolled C sha256, optimised as far as is useful for gcc's -O2, that I generated myself quite a while back ... and yes it works.
It is generated and optimised by code I wrote to produce that entire file.
You cannot actually optimise it any better in C and gain anything but an extremely minor performance increase when you use -O2 with gcc on this.

http://pastebin.com/sxdVSJF1

Also as I said, 122, not 128 (or 178), because: the first 64 rounds are constant over a nonce range (commonly called the midstate); the first 3 rounds of the second 64 don't need to be done inside the loop (they can also be precomputed with the midstate); and the last 3 rounds of the third 64 never need to be done at all with bitcoin.

My question is how do FPGAs work internally (if that youtube video really does answer this, oh well, but I've yet to see anything useful on youtube in my life that couldn't be replaced by a TINY web page of text so I ignore youtube links)

As I asked before, do they execute in a manner like dominoes where the clock process advances data through the FPGA in steps?

My googling on the subject suggests this is correct - but I was curious if anyone here knew that much about the internal workings of FPGA and could confirm or otherwise explain that.

Also, that would mean that inside the FPGA there would be something like the discrete steps to hash a nonce range, and thus I was wondering if they really do hash multiple nonces at the same time, each one 1 or 2 discrete steps behind the previous nonce.
Thus if the clock steps data through at X cycles per second, and the process is 1300 steps (a random number of my choosing), you aren't just waiting 1300 cycles for each nonce calculation, you are actually waiting (assuming each nonce is 2 steps apart) 2 cycles for each nonce result but with a startup time of 1300 cycles before the first nonce result comes out.

Yes, current FPGA designs are fully pipelined, as long as they *fit* into the FPGA, and thus you get a hash rate of 200 MH/s at a clock frequency of 200 MHz. And it's not 1300 cycles, but literally 122 (or 128 or something like that).
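The relationship in that answer (one hash per clock once the pipeline is full) as a one-line Python formula; the function name is hypothetical and the second case is just an illustrative comparison.

```python
def hash_rate(clock_hz, cycles_per_hash):
    """Steady-state hashes per second once the pipeline is full."""
    return clock_hz / cycles_per_hash


print(hash_rate(200e6, 1) / 1e6)  # 200.0 (MH/s): fully pipelined, one hash per clock
print(hash_rate(200e6, 2) / 1e6)  # 100.0: a design that needs two clocks per hash
```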

malevolent

can into space
Legendary

Activity: 3472
Merit: 1721

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 07:48:42 PM

#1054

Quote from: heavyb on January 31, 2012, 01:44:14 PM

Anyone have tracking numbers or the singles in hand yet? I am anxiously waiting to hear about this; if it is legit, I am going to buy.

Signature space available for rent.

Epoch

Legendary

Activity: 922
Merit: 1003

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 07:51:42 PM

#1055

Quote from: malevolent on January 31, 2012, 07:48:42 PM

Quote from: heavyb on January 31, 2012, 01:44:14 PM

Anyone have tracking numbers or the singles in hand yet? I am anxiously waiting to hear about this; if it is legit, I am going to buy.

The lack of response suggests "no". :(

yochdog

Legendary

Activity: 2044
Merit: 1000

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 08:31:11 PM

#1056

Quote from: Epoch on January 31, 2012, 07:51:42 PM

Quote from: malevolent on January 31, 2012, 07:48:42 PM

Quote from: heavyb on January 31, 2012, 01:44:14 PM

Anyone have tracking numbers or the singles in hand yet? I am anxiously waiting to hear about this; if it is legit, I am going to buy.

The lack of response suggests "no". :(

4-6 weeks. :o

I had to do it.....I'm sorry......

I am a trusted trader! Ask Inaba, Luo Demin, Vanderbleek, Sannyasi, Episking, Miner99er, Isepick, Amazingrando, Cablez, ColdHardMetal, Dextryn, MB300sd, Robocoder, gnar1ta$ and many others!

rjk

Sr. Member

Activity: 448
Merit: 250

1ngldh

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 08:55:50 PM

#1057

Quote from: yochdog on January 31, 2012, 08:31:11 PM

4-6 weeks. :o

I had to do it.....I'm sorry......

Bad yochdog! That line is reserved for use by RandyFolds' god-like presence™ only.

Mining Rig Extraordinaire - the Trenton BPX6806 18-slot PCIe backplane [PICS] Dead project is dead, all hail the coming of the mighty ASIC!

Costia

Newbie

Activity: 28
Merit: 0

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 08:59:33 PM

#1058

Can they trademark "4 to 6 weeks" by now?

2112

Legendary

Offline

Activity: 2128
Merit: 1068

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 09:04:11 PM
Last edit: January 31, 2012, 11:14:37 PM by 2112

#1059

Quote from: Inspector 2211 on January 31, 2012, 03:29:17 PM

Yes, current FPGA designs are fully pipelined, as long as they *fit* into the FPGA, and thus you get a hash rate of 200 MH/s at a clock frequency of 200 MHz. And it's not 1300 cycles, but literally 122 (or 128 or something like that).

Let's not confuse unrolling with pipelining. Current open-source designs are fully unrolled, but I have yet to see a ~~proper~~ general pipelined design that is open-sourced. Maybe a different scheme of pipelining and unrolling is what BFL did? By flexibly pipelining the unrolled design they could significantly crank up the clock, since FPGAs are limited more by the propagation delay in the signal routing than by the propagation delay in the actual logic.

Quote from: kano on January 31, 2012, 03:02:37 PM

As I asked before, do they execute in a manner like dominoes where the clock process advances data through the FPGA in steps?

No, this is a horrible analogy.

To understand the unrolling as applied to the logic design you need to understand the difference between combinatorial and sequential logic.

In combinatorial logic the outputs are simply a function of inputs.

In sequential logic the outputs are a function of inputs and the internal state. All the current FPGA hashing appliances use a variant of sequential logic called synchronous sequential logic: there is a dedicated clock input and a change on the clock is when the internal state gets updated.
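A toy Python model of the two kinds of logic (a hypothetical 8-bit adder and accumulator, nothing Bitcoin-specific): the combinational function's output depends only on its current inputs, while the synchronous sequential block holds a register that changes only when `clock()` is called, i.e. on the clock edge.

```python
def combinational_add(a, b):
    """Combinational: output depends only on the current inputs."""
    return (a + b) & 0xFF


class Accumulator:
    """Synchronous sequential: output depends on inputs AND stored state,
    and the state only changes on a clock edge."""

    def __init__(self):
        self.state = 0  # the register

    def output(self, x):
        # Combinational logic in front of the register (visible immediately).
        return combinational_add(self.state, x)

    def clock(self, x):
        # Clock edge: latch the current combinational output into the register.
        self.state = self.output(x)


acc = Accumulator()
for x in (5, 7, 9):
    acc.clock(x)
print(combinational_add(5, 7), acc.state)  # 12 21
```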

Fully unrolled (128-way or 125-way) Bitcoin hash means that the logic that computes it is fully combinatorial: there is no internal state used inside the cascade of the two SHA-256 hashers. The clock is still used in the fully unrolled design: to increment the nonce counter and to sample the zero-comparator at the output of the hasher.
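A behavioural sketch of that fully unrolled arrangement in Python, under loud assumptions: `hashlib`'s double SHA-256 stands in for the combinatorial cascade, the header prefix is an all-zero placeholder, and the "zero-comparator" checks a toy number of zero bytes rather than a real target. The only register is the nonce counter; each loop iteration is one clock edge.

```python
import hashlib
import struct

HEADER_76 = bytes(76)  # hypothetical all-zero header prefix, not a real block


def hasher(header_76, nonce):
    """Stands in for the fully combinatorial double-SHA-256 cascade."""
    data = header_76 + struct.pack("<I", nonce)
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def zero_comparator(digest, zero_bytes):
    # Bitcoin compares the hash as a little-endian number, so "leading zeros"
    # are the LAST bytes of the digest. zero_bytes is a toy difficulty.
    return digest[-zero_bytes:] == bytes(zero_bytes)


def run(clock_cycles, zero_bytes=1):
    nonce_reg = 0  # the only register: the nonce counter
    hits = []
    for _ in range(clock_cycles):
        # Between edges the cascade settles to hasher(nonce_reg) ...
        if zero_comparator(hasher(HEADER_76, nonce_reg), zero_bytes):
            hits.append(nonce_reg)  # ... the clock edge samples the comparator
        nonce_reg = (nonce_reg + 1) & 0xFFFFFFFF  # ... and increments the nonce
    return hits


print(run(2000))
```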

64-way unrolled Bitcoin hash means that there is one internal state register that stores the intermediate state. During the odd clock cycles it does a single SHA-256 of the input (midstate and nonce) and stores it in the internal state. During the even clock cycles it does a single SHA-256 of the internal state and presents it on the output. This circuit is only about half the size of the above circuit.
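A Python sketch of that two-phase scheme, assuming `hashlib.sha256` as a stand-in for the one single-SHA-256 block of logic (a real design would work from the midstate and the second chunk, which this abstracts away) and a placeholder header prefix. The same `sha256_stage` is used on both the odd and the even cycle, so each double hash costs two clocks.

```python
import hashlib
import struct

HEADER_76 = bytes(76)  # hypothetical header prefix, not a real block


def sha256_stage(data):
    """Stands in for the ONE single-SHA-256 block of logic in this design."""
    return hashlib.sha256(data).digest()


def run_two_phase(nonces):
    """Odd cycles: hash the input into the state register.
    Even cycles: hash the register through the SAME block to the output."""
    outputs, state_reg, cycle = [], None, 0
    for nonce in nonces:
        cycle += 1  # odd cycle
        state_reg = sha256_stage(HEADER_76 + struct.pack("<I", nonce))
        cycle += 1  # even cycle
        outputs.append(sha256_stage(state_reg))
    return outputs, cycle


outs, cycles = run_two_phase(range(5))
print(cycles)  # 10: two clocks per double hash
```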

Now, a fully unrolled and pipelined design would be about the same size as the above fully unrolled design, but it would have some internal state registers. If there is one level of pipelining, then in each clock cycle the first half of the circuit would compute a single SHA-256 for nonce "N" and the second half of the circuit would compute a single SHA-256 for nonce "N-1".
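A Python sketch of one level of pipelining, with the same stand-ins as before (`hashlib` for each SHA-256 half, a placeholder header prefix, hypothetical function names). In every simulated cycle the second half finishes the nonce held in the pipeline register while the first half starts the next nonce; after one cycle of latency, one result comes out per cycle.

```python
import hashlib
import struct

HEADER_76 = bytes(76)  # hypothetical header prefix, not a real block


def first_half(nonce):
    """Logic before the pipeline register: first SHA-256."""
    return hashlib.sha256(HEADER_76 + struct.pack("<I", nonce)).digest()


def second_half(digest):
    """Logic after the pipeline register: second SHA-256."""
    return hashlib.sha256(digest).digest()


def run_pipelined(n_cycles):
    pipe_reg = None  # holds (nonce, first hash) between the two halves
    outputs = []     # (nonce, final hash) in completion order
    for nonce in range(n_cycles):  # one new nonce enters every cycle
        # Both halves work in the same cycle: nonce N in the first half,
        # nonce N-1 (from the register) in the second half.
        if pipe_reg is not None:
            prev_nonce, mid = pipe_reg
            outputs.append((prev_nonce, second_half(mid)))
        pipe_reg = (nonce, first_half(nonce))  # clock edge updates the register
    return outputs


out = run_pipelined(6)
print(len(out), out[0][0])  # 5 0: one cycle of latency, then one hash per cycle
```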

In an unpipelined design the input signals have to race through the full 128 (or 125) rounds of SHA-256 to the zero-comparator. With a one-level pipelined design, the inputs have to race through the 64 rounds of the first SHA-256 to the internal register, simultaneously with another set of signals racing through 64 (or 61) rounds from the internal register to the zero-comparator. Simplistically, one could say that the clock rate of this design could be almost double the clock rate of the non-pipelined design.
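The "almost double" estimate as a Python calculation, under the simplifying assumption (hypothetical, and optimistic) that every round costs the same delay and that register setup and clock-to-output overheads are negligible. The maximum clock is set by the slowest register-to-register path: 125 rounds unpipelined versus the slower of the 64-round and 61-round halves.

```python
def relative_fmax(longest_path_rounds):
    """Clock rate is limited by the slowest register-to-register path.
    Assumes (hypothetically) every round costs the same delay and ignores
    register setup/clock-to-out overhead."""
    return 1.0 / longest_path_rounds


unpipelined = relative_fmax(125)        # input -> comparator in one path
pipelined = relative_fmax(max(64, 61))  # slower of the two halves
print(round(pipelined / unpipelined, 3))  # 1.953
```

Real routing delays are not uniform per round, so this is only the "simplistic" bound described in the post.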

Again, I haven't seen anyone publishing open-source designs that are ~~both~~ independently unrolled and pipelined. I think this is due to the limitations of the FPGA synthesis tools. They either require tens or hundreds of GB of RAM or months of CPU time.

Anyway, kano, I suggest that you throw away your domino set. Get yourself a free version of Xilinx ISE or Altera Quartus and use them to play the FPGA design game. It is like playing Tetris, chess & contract bridge all on the same board.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0

rjk

Sr. Member

Activity: 448
Merit: 250

1ngldh

Re: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)

January 31, 2012, 09:18:52 PM

#1060

Quote from: 2112 on January 31, 2012, 09:04:11 PM

Let's not confuse unrolling with pipelining. Current open-source designs are fully unrolled, but I have yet to see a proper pipelined design that is open-sourced. Maybe pipelining is what BFL did? By pipelining the unrolled design they could significantly crank up the clock, since FPGAs are limited more by the propagation delay in the signal routing than by the propagation delay in the actual logic.

My understanding is that the lack of pipelining is due to the lack of registers in an FPGA, is this correct or not?
