Inspector 2211
March 10, 2012, 05:23:58 PM

BFL Single, watch out below.
What makes you think this cannot similarly be applied to the Single (even after a hardware modification)?
1. The consensus on this forum is that the BFL Single uses Altera FPGAs of an unknown type (Stratix?). One would first want to determine the exact FPGA being used before speculating whether DSP blocks could be used to similar benefit. Without knowing the exact FPGA make and model, it is far too early to state that DSP blocks could be used there; that particular FPGA may not even have DSP blocks.
or
2. Maybe they are already using this trick, and that's their secret sauce which allows them to reach 830 MH/s with only two FPGAs.
Just my 2 cents.

bulanula
March 10, 2012, 05:25:51 PM

That is what I meant. It seems this guy has found a way to speed up the hash rate using DSPs, so what is so hard to understand, Turbor and gigavps? I was asking why BFL couldn't also do this "trick", and it is a valid question. A bitstream or FPGA "trick" could likely be applied to a range of different FPGA hardware, because the basic operating principles are the same for all FPGAs. I'm no expert, but I understand (reasonably well) how FPGAs work, and this DSP trick allows you to fit three SHA-256 loops into the same chip (the cheap Spartan-6 ones) that previously only fit two.

Inspector 2211
March 10, 2012, 05:45:38 PM

That is what I meant. It seems this guy has found a way to speed up the hashrate using DSPs so what is so hard to understand Turbor and gigavps?
He doesn't use DSPs throughout (there are not enough DSPs to go around), only at the most critical spots, i.e. where the adders feed into longlines. That's the brilliance of it, and it's the design idea I had completely missed before.
I was asking why couldn't BFL also do this "trick" and a valid question indeed. One bitstream or FPGA "trick" likely could be applied on a range of different FPGA hardware because the basic operating principles are the same for all FPGAs etc.
Agreed.
I'm no expert but I understand (reasonably well) how FPGA works and this DSP trick allows you to do 3 loops of SHA256 in the same chip (cheap Spartan 6 ones) that previously only allowed us to do 2 loops etc.
No, this DSP trick has nothing to do with being able to squeeze three SHA-256 instances into an FPGA; you can do that with plain old steam-powered ripple-carry adders. Using DSPs in a few strategic places, however, keeps the critical path (a deadly combination of two 32-bit adder stages and one longline) well below 5 ns, whereas with ripple-carry adders alone it can barely make 5 ns.

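Why ripple-carry adders set the clock limit can be seen from a bit-level model. The sketch below is plain Python written for illustration, not code from any design in this thread: it adds two words the way a ripple-carry adder does and reports the longest distance a carry travels. Static timing analysis cannot rely on typical data, so it has to budget for the worst case, where a carry generated at bit 0 ripples through all 32 bits; the critical path described above chains two such adders and a longline into one clock cycle. The name `ripple_add` and the chain-length measure are assumptions made for this example.

```python
def ripple_add(a: int, b: int, width: int = 32) -> tuple[int, int]:
    """Add two unsigned words bit by bit, as a ripple-carry adder does.

    Returns (sum modulo 2**width, longest carry chain). A carry chain starts
    at a bit where both inputs are 1 (carry generate) and grows by one for
    each higher bit where exactly one input is 1 (carry propagate). In
    hardware, every bit in the chain adds one carry-logic delay.
    """
    carry = 0
    total = 0
    run = 0       # length of the carry chain currently rippling upward
    longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        generate = x & y
        propagate = x ^ y
        if generate:
            run = 1              # a fresh carry starts at this bit
        elif propagate and carry:
            run += 1             # the incoming carry ripples one bit further
        else:
            run = 0              # no carry leaves this bit
        longest = max(longest, run)
        carry = generate | (propagate & carry)
    return total, longest


if __name__ == "__main__":
    # Worst case: a carry generated at bit 0 ripples through all 32 bits.
    print(ripple_add(0xFFFFFFFF, 0x00000001))
    # Typical operands give much shorter chains, but timing analysis
    # must still allow for the worst case.
    print(ripple_add(0x12345678, 0x9ABCDEF0))
```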
BFL-Engineer
March 10, 2012, 05:45:44 PM

Number of DSP48A1s: 30 out of 180 16%
Aha! Interesting. When uncle Moshe (Gavrielov) gives you DSPs, make DSPeade. Thank you for providing an important puzzle piece on how Dr. Tyrell does it. The multiplier in the DSP48-block is not needed in SHA-256, hence what he obviously uses is the 18-bit adder BCOUT = B + D. He uses 30 DSP blocks, 10 per red / green / blue SHA-256 instance. For a 32-bit adder, two 18-bit adders BCOUT = B + D are needed. Thus, he can implement five 32-bit adders per SHA instance. So, why not just use [slow] 32-bit ripple adders everywhere, and use a few [very fast] DSP adders in some places? The answer is, IMHO, that he uses the fast DSP adders only where they feed into longlines. Were he to use normal ripple adders where he feeds into longlines, the aggregate delay would limit the design to a 5 ns clock cycle. Using the fast DSP adders will allow this design, when properly fine-tuned, to march into 4 ns clock cycle territory, for a total MH/s number of approximately 125 MH/s or approximately 375 MH/s per Spartan6-150. BFL Single, watch out below.
I remember ngzhang mentioned that going to 200 MHz on the chips was not recommended (the chips got very hot), and he gave out a bitstream with a "Use at your own risk" warning. Three loops on the same chip suggests that a far greater number of registers is being used. Since each stage's toggle rate approaches 50% (the idea behind digest functions is that their toggle rate must approach 50% in each stage to be effective, and that is the case in SHA-256), I wonder how hot the chips will get at high frequencies approaching 180 MHz or 190 MHz...
Good luck,

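The count above (two 18-bit adders per 32-bit adder, so five 32-bit adders from ten DSP blocks) rests on splitting each 32-bit word into two halves and passing the carry from the low half into the high half. The sketch below shows only that arithmetic, in Python; `add18` is a hypothetical stand-in for one narrow adder and does not model the DSP48A1's actual ports or how a carry would be routed between blocks. Note also that eldentyrell says later in the thread that he uses the DSP48s as FIFOs, not as adders.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def add18(x: int, y: int, carry_in: int = 0) -> int:
    """Hypothetical stand-in for one narrow hardware adder (18-bit result)."""
    return (x + y + carry_in) & 0x3FFFF


def add32_split(a: int, b: int) -> int:
    """32-bit modular addition built from two narrow adders.

    The low adder sums the low 16 bits of each operand; bit 16 of its result
    is the carry into the high adder, which sums the upper 16 bits.
    """
    low = add18(a & MASK16, b & MASK16)
    carry = (low >> 16) & 1
    high = add18((a >> 16) & MASK16, (b >> 16) & MASK16, carry)
    return ((high & MASK16) << 16) | (low & MASK16)


if __name__ == "__main__":
    a, b = 0x6A09E667, 0xBB67AE85  # two SHA-256 initial hash words as sample inputs
    # Both values printed should be identical.
    print(hex(add32_split(a, b)), hex((a + b) & MASK32))
```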
Inspector 2211
March 10, 2012, 05:58:47 PM Last edit: March 10, 2012, 11:41:39 PM by Inspector 2211

I remember ngzhang mentioned that going to 200MHz on chips was not suggested (chips got so hot), and he gave out a bitstream with a "Use at your own risk". Three loops on the same chip suggests far greater number of Registers is being used. Since each stage toggle rate approaches 50% (This idea behind Digest functions is that their toggle-rate must approach 50% in each stage to be effective, and so is the case in SHA256), I wonder how hot the chips will get in high frequencies, approaching 180MHz or 190MHz...
Hahaha, funny that you mention it. You guys have found that out the hard way, haven't you? (By the way, I have a total of 12 BFL Singles on order, so I'm not anti-BFL at all.) But you are correct and you raise a valid point. At 200 MH/s, Dr. Tyrell's design dissipates about 8 W, so it's fair to assume that it dissipates 12 W at 300 MH/s, which is probably stretching the limits of a tiny 20 mm x 20 mm plastic package like that. You can mount a big cooler on it, but there is still a thermal resistance between the FPGA die and the cooler. Maybe these devices have to be run inside a freezer to consistently achieve a hash rate of 300 MH/s and beyond. Time will tell.

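The throughput and power figures in this exchange can be checked with a few lines of arithmetic. A minimal sketch, assuming (as the quoted estimate of about 125 MH/s per SHA-256 instance at a 4 ns clock implies) that each instance produces one Bitcoin hash every two clock cycles, and that dynamic power scales linearly with clock rate at fixed voltage and toggle rate, with static power ignored. `CLOCKS_PER_HASH` and the function names are illustrative, not taken from any of the designs discussed.

```python
# Back-of-envelope throughput and power for an FPGA SHA-256 miner.
# Assumption: each SHA-256 instance yields one Bitcoin hash every
# CLOCKS_PER_HASH cycles (implied by "4 ns -> ~125 MH/s per instance").
CLOCKS_PER_HASH = 2


def clock_mhz(period_ns: float) -> float:
    """Clock frequency in MHz for a given clock period in nanoseconds."""
    return 1000.0 / period_ns


def hash_rate_mhs(period_ns: float, instances: int,
                  clocks_per_hash: int = CLOCKS_PER_HASH) -> float:
    """Total hash rate in MH/s for `instances` SHA-256 cores in parallel."""
    return clock_mhz(period_ns) / clocks_per_hash * instances


def scaled_power_w(p_ref_w: float, rate_ref: float, rate_new: float) -> float:
    """Scale a measured power linearly with clock (equivalently, hash) rate.

    Assumes fixed voltage and toggle rate and ignores static power.
    """
    return p_ref_w * rate_new / rate_ref


if __name__ == "__main__":
    for period in (5.0, 4.0):
        print(f"{period} ns -> {clock_mhz(period):.0f} MHz, "
              f"{hash_rate_mhs(period, 1):.0f} MH/s per instance, "
              f"{hash_rate_mhs(period, 3):.0f} MH/s for three instances")
    print(f"8 W at 200 MH/s scales to "
          f"{scaled_power_w(8.0, 200.0, 300.0):.0f} W at 300 MH/s")
```

Under these assumptions a 5 ns clock gives 300 MH/s for three instances and a 4 ns clock gives 375 MH/s; real power would be somewhat higher than the linear estimate once static power is included.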
Wandering Albatross
Member
Offline
Activity: 70
Merit: 10
March 10, 2012, 08:54:32 PM

Maybe these devices have to be run inside a freezer to successfully achieve a consistent hash rate of 300 MH/s and beyond.
Maybe a Peltier device would suffice.

BTC: 1JgPAC8RVeh7RXqzmeL8xt3fvYahRXL3fP

kakobrekla
March 10, 2012, 08:56:05 PM

Maybe these devices have to be run inside a freezer to successfully achieve a consistent hash rate of 300 MH/s and beyond.
Maybe a Peltier device would suffice.
Yeah, 'cause those are free and need no power to run!

DeepBit
Donator
Hero Member
Offline
Activity: 532
Merit: 501
We have cookies
March 10, 2012, 09:22:16 PM

Peltiers are cool and handy, but for big mining operations phase-change heat pumps will be FAR more efficient. Remember that Peltier modules consume a lot of current and act as almost 200%-efficient heaters on the other side :)

Welcome to my bitcoin mining pool: https://deepbit.net ~ 3600 GH/s, Both payment schemes, instant payout, no invalid blocks ! Coming soon: ICBIT Trading platform

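The "almost 200% efficient heater" remark follows from an energy balance: a thermoelectric module rejects on its hot side both the heat it pumps away from the chip and all of its own electrical input. A minimal sketch, assuming a coefficient of performance (COP, heat pumped per watt of electricity) of 1.0; that value is simply the one that reproduces the 200% figure, not a measured property of any particular module, and the 12 W load is the chip estimate from earlier in the thread. At that COP the cooler draws as much power as the chip it cools, which is also the arithmetic behind "wasting half the electricity on cooling".

```python
def peltier_balance(q_cold_w: float, cop: float) -> tuple[float, float]:
    """Energy balance for a thermoelectric (Peltier) cooler.

    q_cold_w: heat pumped away from the cold side (e.g. the FPGA), in watts.
    cop:      coefficient of performance, q_cold / electrical input (assumed).
    Returns (electrical input, heat rejected on the hot side), in watts.
    """
    if cop <= 0:
        raise ValueError("cop must be positive")
    p_in = q_cold_w / cop
    # Everything pumped from the chip plus the module's own input power
    # leaves through the hot side.
    q_hot = q_cold_w + p_in
    return p_in, q_hot


if __name__ == "__main__":
    p_in, q_hot = peltier_balance(12.0, cop=1.0)  # 12 W chip, assumed COP of 1
    print(f"electrical input {p_in} W, hot side {q_hot} W, "
          f"hot side / input = {q_hot / p_in:.0%}")
```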
Wandering Albatross
March 10, 2012, 10:31:06 PM

Peltiers are cool and handy, but for big mining operations phase-change heat pumps will be FAR more efficient. Remember that Peltier modules consume a lot of current and act almost as 200% efficiency heaters on the other side
How would a phase-change heat pump work for cooling a chip? Is there an off-the-shelf device, or would it be DIY? For the Peltier, I realize power is needed and heat is also produced, but if your goal is to cool the chip it would work for that, and Peltier makers target this application of their products, e.g. Micropelt. Perhaps a better path to efficiency is to generate your own cheaper power and to produce less heat by running miners at a lower frequency.

TheSeven
March 10, 2012, 10:48:35 PM

Peltiers are cool and handy, but for big mining operations phase-change heat pumps will be FAR more efficient. Remember that Peltier modules consume a lot of current and act almost as 200% efficiency heaters on the other side
How would a phase-change heat pump work for cooling a chip? Is there an off-the-shelf device? Or would it be DIY? For the peltier I realize power is needed and heat is also produced but if your goal is to cool the chip it would work for that and peltier makers target this application of their products. e.g. micropelt Perhaps a better way to efficiency is to create your own cheaper power and don't create as much heat by running miners at lower freq.
What about just using a BFL Single instead of wasting half the electricity on cooling?

My tip jar: 13kwqR7B4WcSAJCYJH1eXQcxG5vVUwKAqY

DeepBit
March 10, 2012, 10:49:04 PM

Peltiers are cool and handy, but for big mining operations phase-change heat pumps will be FAR more efficient. Remember that Peltier modules consume a lot of current and act almost as 200% efficiency heaters on the other side :)
How would a phase-change heat pump work for cooling a chip? Is there an off-the-shelf device? Or would it be DIY?
Any good solution for that will be some kind of DIY; otherwise it will be either unsuitable or expensive. For phase change I would use two loops: one with ordinary liquid coolant passing through many special waterblocks, and a second with freon for cooling the first loop (if we need lower-than-ambient temperatures, of course).

Gomeler
March 11, 2012, 12:11:38 AM

Very interesting thread, but I can actually chime in on this conversation about using a phase-change system to remove heat. If you're talking about a traditional single-stage gas system like the one in your refrigerator, forget about it. The piping required plus the MINUSCULE load on each cold head would make this cost-prohibitive. Your best bet would be to repurpose a mini-fridge, use a proper condenser, add a TXV (thermostatic expansion valve) for refrigerant metering, use something like R-134a or n-butane/isobutane, and aim for evaporator temperatures in the 0 to 20 °C range. Then stick with your dinky little heatsinks and fans and don't worry about having to mill expensive evaporators for such a small heat load. Mini-fridges or even a deep chest freezer would be the perfect insulated box to work with.

The issue is that such systems are designed to remove heat from a load that doesn't generate additional heat; without modification you will kill a freezer or fridge. That's where the replacement condenser and TXV come into play. Make the compressor happy and you'll have shockingly low compressor loads and could very well run these FPGAs at astonishing speeds. That all being said, compressed gases are fun but can easily explode in your face with dire consequences if you aren't careful. There are plenty of forums out there for amateur refrigeration; take a look at some of the things people have made and consider the tool costs. My own set of tools and gases would buy a number of FPGAs and likely make me more money in the process.

eldentyrell (OP)
Donator
Legendary
Offline
Activity: 980
Merit: 1004
felonious vagrancy, personified
March 20, 2012, 01:59:09 AM Last edit: March 20, 2012, 02:37:17 AM by eldentyrell

Number of DSP48A1s: 30 out of 180 16%
Aha! Interesting. When uncle Moshe (Gavrielov) gives you DSPs, make DSPeade.
This isn't my "secret sauce", but it is unique to my design. When I run out of SRL16s in the places where I need them, I use the DSP48s as 32-bit-wide 16-bit-wide, 6-bit-deep FIFOs. Useful trick.

The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators. So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.

eldentyrell (OP)
March 20, 2012, 02:00:55 AM

The multiplier in the DSP48-block is not needed in SHA-256, hence what he obviously uses is the 18-bit adder BCOUT = B + D.
Nah; I use the DSP48s as big fat FIFOs; they have lots of registers inside and if you configure them right everything's a no-op.

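Functionally, a DSP48 used this way (like an SRL16) behaves as a fixed-depth delay line: one word goes in every clock and comes out unchanged a fixed number of clocks later, so the added latency equals the depth. The sketch below models only that input/output behaviour in Python; the class name, the zero reset values and the default 32-bit width are assumptions, and the thread gives no detail on which DSP48A1 registers are chained.

```python
from collections import deque


class DelayLine:
    """Fixed-depth register pipeline modelled at the clock-cycle level.

    Each call to clock() shifts one word in and returns the word that was
    shifted in `depth` calls earlier, unchanged. Registers start at zero.
    """

    def __init__(self, depth: int, width: int = 32) -> None:
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.mask = (1 << width) - 1
        # Oldest word on the left; maxlen makes append() drop it automatically.
        self.regs = deque([0] * depth, maxlen=depth)

    def clock(self, word: int) -> int:
        """Advance one clock cycle: shift `word` in, return the oldest word."""
        out = self.regs[0]
        self.regs.append(word & self.mask)
        return out


if __name__ == "__main__":
    fifo = DelayLine(depth=6)  # depth chosen for illustration only
    # The first six outputs are reset zeros; each input reappears six clocks later.
    print([fifo.clock(w) for w in range(1, 10)])
```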
rjk
Sr. Member
Offline
Activity: 448
Merit: 250
1ngldh
March 20, 2012, 02:01:58 AM

The multiplier in the DSP48-block is not needed in SHA-256, hence what he obviously uses is the 18-bit adder BCOUT = B + D.
Nah; I use the DSP48s as big fat FIFOs; they have lots of registers inside and if you configure them right everything's a no-op.
Interesting use case; so essentially no added latency using them this way?

eldentyrell (OP)
March 20, 2012, 02:02:32 AM

BFL Single, watch out below.
What makes you think this cannot similarly be applied to the single (even after a hardware modification)
The BFL Single definitely isn't a Spartan-6. BTW, I will offer a 10 BTC bounty to anybody who posts the JTAG IDCODE readout from the BFL Single, merely to satisfy my curiosity. There was a JTAG header on the last PCB I saw them post.

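For anyone chasing the bounty: a JTAG IDCODE is a 32-bit value whose layout is fixed by IEEE 1149.1 (4-bit version, 16-bit part number, 11-bit JEDEC manufacturer identity, and a 1 in bit 0), so even the raw hex identifies the vendor. The sketch below decodes those fields in Python; the manufacturer table lists only Xilinx (0x049) and Altera (0x06E) for illustration, and the example values are made-up inputs whose low bits follow those two vendors' patterns, not readouts from any BFL hardware.

```python
# JEDEC manufacturer identities as they appear in IDCODE bits 11..1.
# Only two vendors are listed here, for illustration.
MANUFACTURERS = {0x049: "Xilinx", 0x06E: "Altera"}


def decode_idcode(idcode: int) -> dict:
    """Split a 32-bit JTAG IDCODE into its IEEE 1149.1 fields."""
    if not idcode & 1:
        raise ValueError("bit 0 of an IDCODE is always 1")
    manufacturer_id = (idcode >> 1) & 0x7FF
    return {
        "version": (idcode >> 28) & 0xF,
        "part_number": (idcode >> 12) & 0xFFFF,
        "manufacturer_id": manufacturer_id,
        "manufacturer": MANUFACTURERS.get(manufacturer_id, "unknown"),
    }


if __name__ == "__main__":
    for code in (0x0401D093, 0x020F30DD):  # made-up example values
        print(hex(code), decode_idcode(code))
```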
eldentyrell (OP)
March 20, 2012, 02:08:09 AM

Is there a way to port it to Ztex or other FPGA boards?
Yes.

eldentyrell (OP)
March 20, 2012, 02:09:34 AM

Sorry for falling off the radar there. Real life, quality time with git bisect, and some voltage drop issues on my own boards conspired to slow things down the last week or so.

c_k
Donator
Full Member
Offline
Activity: 242
Merit: 100
March 20, 2012, 05:42:02 AM

Ooh, a new speed update! What do we have to do to encourage you to make this available for everyone to use? Even if only as binary blobs initially. X6500s would be awesome with this.

Energizer
March 20, 2012, 01:49:25 PM

I totally agree with you, catfish! You can count me in such a club! But first of all, Dr. Tyrell should tell us whether or not he is willing to sell his bitstream to us.
If he is not, we could then fund the open-source project to speed up its development!