Bitcoin Forum
Author Topic: 1GH/s, 20w, $700 (was $500) — Butterflylabs, is it for real? (Part 2)  (Read 146879 times)
antirack
Hero Member
*****
Offline Offline

Activity: 489
Merit: 500

Immersionist


View Profile
January 31, 2012, 06:32:23 AM
Last edit: January 31, 2012, 06:46:38 AM by antirack
 #1041

You guys are just too spoiled, I guess ;) Companies intentionally promise stuff that they cannot deliver, and they know up front that they will not. Look at Sony. They do it all the time, and they are certainly not an exception. One example:

When they announced the PSP years ago, they said WE GONNA RULE THE WORLD WITH THIS SHIT AND GRAN TURISMO WILL BE AVAILABLE FROM DAY ONE. THROW AWAY YOUR STUPID NINTENDOS. They used this title for months and months to push the PSP to the masses, always saying 'ok it didn't come out at launch day but it will be out soon, get ready'. Even after a year they still said 'coming soon'.

That stupid title (I mean how difficult can it be to port a game that existed for years on other consoles) was delayed for more than 5 years. It didn't even come out for the original PSP after all.

Sony Releases Stupid Piece Of Shit That Doesn't Fucking Work (Onion News Network)
http://www.youtube.com/watch?v=8AyVh1_vWYQ
kano
Legendary
*
Offline Offline

Activity: 4494
Merit: 1808


Linux since 1997 RedHat 4


View Profile
January 31, 2012, 07:23:23 AM
 #1042

... and what is the general opinion of Sony due to this?

So what should the general opinion of BFL be?

Pool: https://kano.is - low 0.5% fee PPLNS 3 Days - Most reliable Solo with ONLY 0.5% fee   Bitcointalk thread: Forum
Discord support invite at https://kano.is/ Majority developer of the ckpool code - k for kano
The ONLY active original developer of cgminer. Original master git: https://github.com/kanoi/cgminer
kimmeriets
Legendary
*
Offline Offline

Activity: 1064
Merit: 1000


View Profile
January 31, 2012, 08:16:46 AM
 #1043

We need to know what the chip under the hood is.

Couple of reasons :

-maybe these were sourced from Libya / Egypt and there may be some ethical issues there


I wonder what ethical issues you have in Libya and Egypt? ))))) not fun my sneakers
DiabloD3
Legendary
*
Offline Offline

Activity: 1162
Merit: 1000


DiabloMiner author


View Profile WWW
January 31, 2012, 09:08:21 AM
 #1044

To the FPGA guys here: Why is it 'rolled' and not 'furled'? It seems way more appropriate.

Because it's always been unrolling loops. Ears are unfurled, loops are unrolled.

kano
Legendary
*
Offline Offline

Activity: 4494
Merit: 1808


Linux since 1997 RedHat 4


View Profile
January 31, 2012, 09:45:40 AM
 #1045

To the FPGA guys here: Why is it 'rolled' and not 'furled'? It seems way more appropriate.

Because it's always been unrolling loops. Ears are unfurled, loops are unrolled.
Actually - every time I see comments about unrolling the sha256 code I wonder how you would do it any way but unrolled.

The only 'rolled' option I can think of is to make a very small part of the FPGA just be P() and use it 122 times (in 2 loops) yet that would be senseless since I'd imagine it would be a MUCH slower way to do it? ...

DiabloD3
Legendary
*
Offline Offline

Activity: 1162
Merit: 1000


DiabloMiner author


View Profile WWW
January 31, 2012, 09:54:05 AM
 #1046

To the FPGA guys here: Why is it 'rolled' and not 'furled'? It seems way more appropriate.

Because it's always been unrolling loops. Ears are unfurled, loops are unrolled.
Actually - every time I see comments about unrolling the sha256 code I wonder how you would do it any way but unrolled.

The only 'rolled' option I can think of is to make a very small part of the FPGA just be P() and use it 122 times (in 2 loops) yet that would be senseless since I'd imagine it would be a MUCH slower way to do it? ...

The "dumb" way is to have one function (in the case of an FPGA, one circuit), and rotate the variables/registers in and out of the function. You have the code compiled/circuit implemented exactly once. This would actually be superior for FPGA _if_ they had enough registers, but they don't.

This is extremely slow for basically any implementation, and it also screws over the fact we essentially have 5 or more parallel ops at any given time in the way Bitcoin can optimize* the first, oh, 250 ops (depending on how certain things are implemented, of course).

* In OpenCL, due to all the shortcuts calculating stuff in the host, it starts out as two unrelated chains that eventually merge. The ability to pack VLIW5 here is pretty goddamned handy, makes optimization a much easier job.

kano
Legendary
*
Offline Offline

Activity: 4494
Merit: 1808


Linux since 1997 RedHat 4


View Profile
January 31, 2012, 10:08:41 AM
 #1047

... and an FPGA question Smiley
Is the FPGA process somewhat similar to something like dominoes falling where the data steps through the FPGA and presents an answer at the other end?
If so, could that stepping process effectively always be active at each step - i.e. input data is fed into the step process once each step (or each 2nd step if there is an overlap issue), so thus there would be output data happening each step time (or each 2x step time)?
Is that how it actually works? Or is it a once through per input then when it outputs, another input?
(yes I guess I really know nothing about how these things actually process internally)

Coz if it does step but the current implementation is once through per input, but it was actually possible to do one input per step (or per 2 steps), that would effectively multiply the processing power almost by the number of steps (or half the number of steps) in the process.

Energizer
Sr. Member
****
Offline Offline

Activity: 273
Merit: 250



View Profile
January 31, 2012, 12:25:02 PM
 #1048

... and an FPGA question Smiley
Is the FPGA process somewhat similar to something like dominoes falling where the data steps through the FPGA and presents an answer at the other end?
If so, could that stepping process effectively always be active at each step - i.e. input data is fed into the step process once each step (or each 2nd step if there is an overlap issue), so thus there would be output data happening each step time (or each 2x step time)?
Is that how it actually works? Or is it a once through per input then when it outputs, another input?
(yes I guess I really know nothing about how these things actually process internally)

Coz if it does step but the current implementation is once through per input, but it was actually possible to do one input per step (or per 2 steps), that would effectively multiply the processing power almost by the number of steps (or half the number of steps) in the process.

It seems you are new to FPGA programming. You would find this video helpful: "Intro to FPGAs for Software Engineers":

http://www.youtube.com/watch?v=gsTpLtEEobE&feature=related
DeathAndTaxes
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1079


Gerald Davis


View Profile
January 31, 2012, 12:46:58 PM
 #1049

Actually - every time I see comments about unrolling the sha256 code I wonder how you would do it any way but unrolled.

This is 1 round of the SHA-256 loop.



It is executed 64 times.  So a fully looped version of SHA-256 code (in C# or on an FPGA, it doesn't matter) would be something like this in pseudocode:

Initialize all variables, and inputs, round = 1
while round <=64
(
perform SHA-256 round
round++
)
record output

Unrolling in any programming language is the process of converting a looping structure to a flat structure.  Even many high level programming languages do it for speed/optimization.

Fully unrolled involves no looping structure at all.  In Bitcoin since there is a double hash fully unrolled means input -> flat logic -> double hash output.
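The looped and flat forms described above can be sketched in software. This is a minimal sketch, assuming a single-block message of at most 55 bytes; the helper names (`rounds_rolled`, `rounds_unrolled`, `sha256_short`) are made up for illustration and are not taken from any miner. The unrolled version is generated as 64 flat statements, loosely in the spirit of a generated source file, and both are checked against `hashlib`.

```python
import hashlib
import struct
from math import isqrt

MASK = 0xFFFFFFFF

def first_primes(n):
    """Return the first n primes by trial division (fine for n = 64)."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n):
    """Exact integer cube root by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]
# Initial state: first 32 fractional bits of the square roots of the first 8 primes.
H0 = [isqrt(p << 64) & MASK for p in PRIMES[:8]]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def schedule(block):
    """Expand a 64-byte block into the 64 message-schedule words."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w

def sha256_round(s, k, w):
    """One SHA-256 round: eight state words in, eight state words out."""
    a, b, c, d, e, f, g, h = s
    t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
          + ((e & f) ^ (~e & g)) + k + w) & MASK
    t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
          + ((a & b) ^ (a & c) ^ (b & c))) & MASK
    return [(t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g]

def rounds_rolled(s, w):
    """Rolled: one round body, reused 64 times by a loop."""
    for i in range(64):
        s = sha256_round(s, K[i], w[i])
    return s

# Unrolled: the same 64 rounds written out flat, with constants inlined.
UNROLLED_SRC = "def rounds_unrolled(s, w):\n" + "".join(
    f"    s = sha256_round(s, {K[i]:#010x}, w[{i}])\n" for i in range(64)
) + "    return s\n"
_ns = {"sha256_round": sha256_round}
exec(UNROLLED_SRC, _ns)
rounds_unrolled = _ns["rounds_unrolled"]

def sha256_short(msg, rounds):
    """SHA-256 of a message that fits in one padded block (<= 55 bytes)."""
    assert len(msg) <= 55
    block = msg + b"\x80" + b"\x00" * (55 - len(msg)) + struct.pack(">Q", 8 * len(msg))
    s = rounds(list(H0), schedule(block))
    return struct.pack(">8I", *[(x + y) & MASK for x, y in zip(H0, s)])

if __name__ == "__main__":
    for rounds in (rounds_rolled, rounds_unrolled):
        print(sha256_short(b"abc", rounds) == hashlib.sha256(b"abc").digest())
```

In software the two forms compute the same thing and differ only in loop overhead. On an FPGA the flat form corresponds to 64 separate copies of the round circuit rather than one copy that is reused.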

makomk
Hero Member
*****
Offline Offline

Activity: 686
Merit: 564


View Profile
January 31, 2012, 01:11:44 PM
Last edit: January 31, 2012, 04:25:08 PM by makomk
 #1050

It is executed 64 times.  So a fully looped version of SHA-256 code (in C# or on an FPGA, it doesn't matter) would be something like this in pseudocode:

Initialize all variables, and inputs, round = 1
while round <=64
(
perform SHA-256 round
round++
)
record output
Which is actually a reasonably sensible way of implementing SHA-256 in an FPGA if you just want to have an efficient way to hash arbitrary pieces of data rather than do something like Bitcoin mining or password cracking. If you're only hashing a single message there's no parallelism that can be exploited - you can't start work on any chunk of the message until you've completely hashed the previous chunks - and no reason to unroll the hashing. That's partly why off the shelf SHA-256 cores aren't much use for Bitcoin mining.

Quad XC6SLX150 Board: 860 MHash/s or so.
SIGS ABOUT BUTTERFLY LABS ARE PAID ADS
heavyb
Full Member
***
Offline Offline

Activity: 217
Merit: 100



View Profile WWW
January 31, 2012, 01:44:14 PM
 #1051

Anyone have tracking numbers or the singles in hand yet? I am anxiously waiting to hear about this; if it is legit I am going to buy.

kano
Legendary
*
Offline Offline

Activity: 4494
Merit: 1808


Linux since 1997 RedHat 4


View Profile
January 31, 2012, 03:02:37 PM
 #1052

lulz - OK I need to repeat the question ...

But firstly, I know the sha256 code very well - otherwise I wouldn't have mentioned P() ... here's a fully unrolled, optimised as high as possibly needed for gcc's -O2, C sha256 that I generated myself quite a while back ... and yes it works.
It is generated and optimised by code I wrote to produce that entire file.
You cannot actually optimise it any better in C and gain anything but an extremely minor performance increase when you use -O2 with gcc on this.

http://pastebin.com/sxdVSJF1

Also as I said, 122, not 128 (or 178) coz: the 1st 64 is constant over a nonce range (commonly the midstate), you don't need to do the first 3 of the 2nd 64 inside the loop (also in the midstate) nor the last 3 ever of the 3rd 64 with bitcoin
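The midstate idea can be sketched with `hashlib`, whose hash objects support `copy()`. This is only a software illustration, assuming the usual 80-byte header layout (76 bytes followed by a 4-byte little-endian nonce). The function name `scan_nonces` and the dummy header bytes are placeholders, and `hashlib` does not expose whether it has already run the first block internally, so `copy()` merely stands in for a saved midstate. The round count at the end just restates the arithmetic from the post.

```python
import hashlib
import struct

def double_sha256(data):
    """Reference: SHA-256 applied twice, with no reuse of earlier work."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def scan_nonces(header76, nonces):
    """Yield (nonce, double hash), absorbing the first 64 bytes only once."""
    assert len(header76) == 76
    midstate = hashlib.sha256(header76[:64])  # nonce-independent first block
    tail = header76[64:]                      # last 12 bytes before the nonce
    for nonce in nonces:
        h = midstate.copy()                   # reuse the work on the first block
        h.update(tail + struct.pack("<I", nonce))
        yield nonce, hashlib.sha256(h.digest()).digest()

# Rounds left inside the per-nonce loop, using the counts given in the post:
# the first 64 are constant, then 64 - 3 and 64 - 3 remain.
ROUNDS_PER_BLOCK = 64
rounds_in_loop = (ROUNDS_PER_BLOCK - 3) + (ROUNDS_PER_BLOCK - 3)

if __name__ == "__main__":
    dummy_header = bytes(range(76))           # placeholder bytes, not a real block
    for nonce, digest in scan_nonces(dummy_header, range(3)):
        print(nonce, digest.hex())
    print(rounds_in_loop)
```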

My question is how do FPGA's work internally (if that youtube video really does answer this, oh well, but I've yet to see anything useful on youtube in my life that couldn't be replaced by a TINY web page of text so I ignore youtube links)

As I asked before, do they execute in a manner like dominoes where the clock process advances data through the FPGA in steps?

My googling on the subject suggests this is correct - but I was curious if anyone here knew that much about the internal workings of FPGA and could confirm or otherwise explain that.

Also, that would mean that inside the FPGA there would be something like the discrete steps to hash a nonce range and thus I was wondering if they do really actually hash multiple nonce at the same time each one 1 or 2 discrete steps behind the previous nonce.
Thus if the clock steps data through at X cycles per second, and the process is 1300 steps (a random number of my choosing), you aren't just waiting 1300 cycles for each nonce calculation, you are actually waiting (assuming each nonce is 2 steps apart) 2 cycles for each nonce result but with a startup time of 1300 cycles before the first nonce result comes out.

Inspector 2211
Sr. Member
****
Offline Offline

Activity: 448
Merit: 250



View Profile
January 31, 2012, 03:29:17 PM
 #1053

lulz - OK I need to repeat the question ...

But firstly, I know the sha256 code very well - otherwise I wouldn't have mentioned P() ... here's a fully unrolled, optimised as high as possibly needed for gcc's -O2, C sha256 that I generated myself quite a while back ... and yes it works.
It is generated and optimised by code I wrote to produce that entire file.
You cannot actually optimise it any better in C and gain anything but an extremely minor performance increase when you use -O2 with gcc on this.

http://pastebin.com/sxdVSJF1

Also as I said, 122, not 128 (or 178) coz: the 1st 64 is constant over a nonce range (commonly the midstate), you don't need to do the first 3 of the 2nd 64 inside the loop (also in the midstate) nor the last 3 ever of the 3rd 64 with bitcoin

My question is how do FPGA's work internally (if that youtube video really does answer this, oh well, but I've yet to see anything useful on youtube in my life that couldn't be replaced by a TINY web page of text so I ignore youtube links)

As I asked before, do they execute in a manner like dominoes where the clock process advances data through the FPGA in steps?

My googling on the subject suggests this is correct - but I was curious if anyone here knew that much about the internal workings of FPGA and could confirm or otherwise explain that.

Also, that would mean that inside the FPGA there would be something like the discrete steps to hash a nonce range and thus I was wondering if they do really actually hash multiple nonce at the same time each one 1 or 2 discrete steps behind the previous nonce.
Thus if the clock steps data through at X cycles per second, and the process is 1300 steps (a random number of my choosing), you aren't just waiting 1300 cycles for each nonce calculation, you are actually waiting (assuming each nonce is 2 steps apart) 2 cycles for each nonce result but with a startup time of 1300 cycles before the first nonce result comes out.

Yes, current FPGA designs are fully pipelined, as long as they *fit* into the FPGA, and thus you get a hash rate of 200 MH/s at a clock frequency of 200 MHz. And it's not 1300 cycles, but literally 122 (or 128 or something like that).
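A toy cycle-by-cycle model may make the latency versus throughput point concrete. It is a sketch only: `simulate_pipeline` and `hash_rate` are hypothetical names, each stage is assumed to take exactly one clock, and nothing here models real FPGA timing.

```python
def simulate_pipeline(stages, nonces):
    """Clock a pipeline with `stages` stages; return (cycle, nonce) per result."""
    pending = list(nonces)
    total = len(pending)
    pipe = [None] * stages            # pipe[k] is the nonce currently in stage k
    results = []
    cycle = 0
    while len(results) < total:
        cycle += 1
        incoming = pending.pop(0) if pending else None
        pipe = [incoming] + pipe[:-1]  # every stage hands its work to the next
        if pipe[-1] is not None:       # the last stage finished a nonce this cycle
            results.append((cycle, pipe[-1]))
    return results

def hash_rate(clock_hz, cycles_per_hash=1):
    """Steady-state results per second once the pipeline is full."""
    return clock_hz / cycles_per_hash

if __name__ == "__main__":
    results = simulate_pipeline(122, range(5))
    print(results[0])            # first result appears after the fill latency
    print(results[-1])           # later results follow one clock apart
    print(hash_rate(200e6))      # one hash per clock at 200 MHz
```

In this model the first result appears after `stages` clocks and every later one arrives a single clock after the previous, which is why the fill latency barely matters over a long nonce range.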

malevolent
can into space
Legendary
*
Offline Offline

Activity: 3472
Merit: 1721



View Profile
January 31, 2012, 07:48:42 PM
 #1054

Anyone have tracking numbers or the singles in hand yet? I am anxiously waiting to hear about this; if it is legit I am going to buy.

 Roll Eyes

Signature space available for rent.
Epoch
Legendary
*
Offline Offline

Activity: 922
Merit: 1003



View Profile
January 31, 2012, 07:51:42 PM
 #1055

Anyone have tracking numbers or the singles in hand yet? I am anxiously waiting to hear about this; if it is legit I am going to buy.

 Roll Eyes

The lack of response suggests "no".  Sad
yochdog
Legendary
*
Offline Offline

Activity: 2044
Merit: 1000



View Profile
January 31, 2012, 08:31:11 PM
 #1056

Anyone have tracking numbers or the singles in hand yet? I am anxiously waiting to hear about this; if it is legit I am going to buy.

 Roll Eyes

The lack of response suggests "no".  Sad

4-6 weeks.   Shocked



I had to do it.....I'm sorry......

I am a trusted trader!  Ask Inaba, Luo Demin, Vanderbleek, Sannyasi, Episking, Miner99er, Isepick, Amazingrando, Cablez, ColdHardMetal, Dextryn, MB300sd, Robocoder, gnar1ta$ and many others!
rjk
Sr. Member
****
Offline Offline

Activity: 448
Merit: 250


1ngldh


View Profile
January 31, 2012, 08:55:50 PM
 #1057

4-6 weeks.   Shocked



I had to do it.....I'm sorry......
Bad yochdog! That line is reserved for use by RandyFold's god-like presence (TM) only.

Mining Rig Extraordinaire - the Trenton BPX6806 18-slot PCIe backplane [PICS] Dead project is dead, all hail the coming of the mighty ASIC!
Costia
Newbie
*
Offline Offline

Activity: 28
Merit: 0



View Profile
January 31, 2012, 08:59:33 PM
 #1058

can they trademark "4 to 6 weeks" by now ?
2112
Legendary
*
Offline Offline

Activity: 2128
Merit: 1068



View Profile
January 31, 2012, 09:04:11 PM
Last edit: January 31, 2012, 11:14:37 PM by 2112
 #1059

Yes, current FPGA designs are fully pipelined, as long as they *fit* into the FPGA, and thus you get a hash rate of 200 MH/s at a clock frequency of 200 MHz. And it's not 1300 cycles, but literally 122 (or 128 or something like that).
Let's not confuse unrolling with pipelining. Current open-source designs are fully unrolled, but I have yet to see a general pipelined design that is open-sourced. Maybe a different scheme of pipelining and unrolling is what BFL did? By flexibly pipelining the unrolled design they could significantly crank up the clock, since FPGAs are limited more by the propagation delay in the signal routing than by the propagation delay in the actual logic.

As I asked before, do they execute in a manner like dominoes where the clock process advances data through the FPGA in steps?

No, this is a horrible analogy.

To understand the unrolling as applied to the logic design you need to understand the difference between combinatorial and sequential logic.

In combinatorial logic the outputs are simply a function of inputs.

In sequential logic the outputs are a function of inputs and the internal state. All the current FPGA hashing appliances use a variant of sequential logic called synchronous sequential logic: there is a dedicated clock input and a change on the clock is when the internal state gets updated.
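For software people, the distinction can be loosely mirrored in Python. This is an analogy only, with made-up names: a pure function stands in for combinatorial logic, and an object whose state changes only when `clock()` is called stands in for synchronous sequential logic.

```python
def adder_combinational(a, b):
    """Combinatorial: the output is a function of the current inputs only."""
    return (a + b) & 0xFFFFFFFF

class NonceCounter:
    """Synchronous sequential: the state is updated only on a clock event."""

    def __init__(self):
        self.state = 0

    def output(self):
        # Reading the output never changes the state.
        return self.state

    def clock(self):
        # The next state is computed by combinatorial logic from the current state.
        self.state = adder_combinational(self.state, 1)

if __name__ == "__main__":
    counter = NonceCounter()
    for _ in range(3):
        counter.clock()
    print(counter.output())
```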

Fully unrolled (128-way or 125-way) Bitcoin hash means that the logic that computes it is fully combinatorial; there is no internal state used inside the cascade of the two SHA-256 hashers. The clock is still used in the fully-unrolled design: to increment the nonce counter and to sample the zero-comparator at the output of the hasher.

64-way unrolled Bitcoin hash means that there is one internal state register that stores the intermediate state. During the odd clock cycles it does a single SHA-256 of the input (midstate and nonce) and stores it in the internal state. During the even clock cycles it does a single SHA-256 of the internal state and presents it on the output. This circuit is only about half the size of the above circuit.

Now a fully unrolled and pipelined design would be about the same size as the above fully-unrolled design, but it would have some internal state registers. If there is one level of pipelining, then in each clock cycle the first half of the circuit would compute a single SHA-256 for nonce "N" and the second half of the circuit would compute a single SHA-256 for nonce "N-1".

In the unpipelined design the input signals have to race through the full 128 (or 125) rounds of SHA-256 to the zero-comparator. With the one-level pipelined design the inputs have to race through 64 rounds of the first SHA-256 to the internal register, simultaneously with another set of signals racing through 64 (or 61) rounds from the internal register to the zero-comparator. Simplistically, one could say that the clock rate of this design could be almost double the clock rate of the non-pipelined design.
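The "almost double" estimate can be checked with a back-of-the-envelope model. It is a simplistic sketch: the per-round delay of 1 ns is an invented placeholder, routing and register overhead default to zero, and `max_clock_hz` is a hypothetical helper, not something from a synthesis tool.

```python
def max_clock_hz(rounds_per_stage, round_delay_s, register_overhead_s=0.0):
    """Clock limit set by the slowest stage, i.e. the longest run of rounds."""
    critical_path_s = max(rounds_per_stage) * round_delay_s + register_overhead_s
    return 1.0 / critical_path_s

ROUND_DELAY_S = 1e-9  # hypothetical delay of one SHA-256 round, for illustration

unpipelined = max_clock_hz([125], ROUND_DELAY_S)       # one long combinatorial path
one_level = max_clock_hz([64, 61], ROUND_DELAY_S)      # a register after the first hash
speedup = one_level / unpipelined                      # 125 / 64 in this model

if __name__ == "__main__":
    print(f"unpipelined: {unpipelined / 1e6:.2f} MHz")
    print(f"one level:   {one_level / 1e6:.2f} MHz")
    print(f"speedup:     {speedup:.3f}x")
```

The speedup falls short of 2x because the two halves are unequal (64 versus 61 rounds), so the longer half sets the clock. Any real register overhead would reduce it further.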

Again, I haven't seen anyone publishing open-source designs that are both independently unrolled and pipelined. I think this is due to the limitations of the FPGA synthesis tools. They either require tens or hundreds of GB of RAM or months of CPU time.

Anyway, kano, I suggest that you throw away your domino set. Get yourself a free version of Xilinx ISE or Altera Quartus and use them to play FPGA design game. It is like playing Tetris, chess & contract bridge all on the same board.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
rjk
Sr. Member
****
Offline Offline

Activity: 448
Merit: 250


1ngldh


View Profile
January 31, 2012, 09:18:52 PM
 #1060

Lets not confuse unrolling with pipelining. Current open-source designs are fully-unrolled, but I have yet to see a proper pipelined design that is open-sourced. Maybe pipelining is what BFL did? By pipelining the unrolled design they could significantly crank up the clock, since the FPGAs are limited more by the propagation delay in the signal routing than in the propagation delay in the actual logic.
My understanding is that the lack of pipelining is due to the lack of registers in an FPGA, is this correct or not?
