Bitcoin Forum
Author Topic: BFL SC Die Guestimation/Speculation  (Read 4086 times)
tacotime (OP)
September 10, 2012, 12:41:43 AM
 #1

I'm not an electrical engineer, so probably someone who is will shit all over this.  Anyway.

I think these chips are small and probably on 90-130 nm technology.

SHA256 hashing requires about 13,500 logic gates per circuit or 27,000 transistors.  An AMD K8 130 nm CPU has about 106M transistors in 100mm^2 with a TDP of 60W, so we could fit about 3926 SHA256 hashing circuits on one of these ASIC dies.  These hashing units run at 65 cycles per hash; we would expect from an immature 130 nm process for the ASIC that clock rates of 1 GHz would be achievable.  This would mean 14.5 MH/s per hashing circuit or 56.9 GH/s per 100mm^2 die with a 60W power consumption.
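
A rough sketch for checking the arithmetic above. The constants are the figures quoted in the post; the function name `die_hashrate` is made up for illustration. With exactly 65 cycles per hash at 1 GHz this comes out at about 15.4 MH/s per circuit and about 60 GH/s per die, slightly above the 14.5 MH/s and 56.9 GH/s quoted, so the post may have rounded or assumed a few more cycles per hash.

```python
# Back-of-envelope die estimate using the figures quoted in the post.
TRANSISTORS_PER_DIE = 106e6        # AMD K8 at 130 nm, ~100 mm^2
TRANSISTORS_PER_CIRCUIT = 27_000   # ~13,500 gates x 2 transistors (the post's assumption)
CYCLES_PER_HASH = 65
CLOCK_HZ = 1e9                     # assumed 1 GHz clock


def die_hashrate(transistors_per_die, transistors_per_circuit, clock_hz, cycles_per_hash):
    """Return (circuits per die, hashes/s per circuit, hashes/s per die)."""
    circuits = transistors_per_die / transistors_per_circuit
    per_circuit = clock_hz / cycles_per_hash
    return circuits, per_circuit, circuits * per_circuit


circuits, per_circuit, total = die_hashrate(
    TRANSISTORS_PER_DIE, TRANSISTORS_PER_CIRCUIT, CLOCK_HZ, CYCLES_PER_HASH
)
print(f"{circuits:.0f} circuits, {per_circuit / 1e6:.1f} MH/s each, {total / 1e9:.1f} GH/s per die")
```

This ignores I/O, routing and control logic, so it is an upper bound at best.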

How does that compare to what has been given to us by BFL?  The BFL single, which is assumed to be a single die, is rated at 40GH/s.  With the above ignoring crucial things like transistors for I/O, it does indeed seem possible that the BFL single can provide 40GH/s at 60W on a 130 nm process.

Code:
XMR: 44GBHzv6ZyQdJkjqZje6KLZ3xSyN1hBSFAnLP6EAqJtCRVzMzZmeXTC2AHKDS9aEDTRKmo6a6o9r9j86pYfhCWDkKjbtcns
YokoToriyama
September 10, 2012, 12:52:26 AM
 #2

sounds good to me.
tytus
September 10, 2012, 08:11:05 AM
 #3

A CPU has a much smaller toggle rate [it will use much less power than the hashing chip ... maybe even 10 times less]. Getting 1 GHz on 130nm is probably also not easy.
Frizz23
September 10, 2012, 08:18:56 AM
 #4

... it does indeed seem possible that the BFL single can provide 40GH/s at 60W on a 130 nm process.

Assuming your data is correct, one Jalapeno (3.5GH/s) would need 5.25 Watt.

But a USB port provides only 2.5 Watt.

Hmmm ...

||bit
September 10, 2012, 11:35:26 AM
 #5

... it does indeed seem possible that the BFL single can provide 40GH/s at 60W on a 130 nm process.

Assuming your data is correct, one Jalapeno (3.5GH/s) would need 5.25 Watt.

But a USB port provides only 2.5 Watt.

Hmmm ...

That would be with the end values of what he considers possible. But assuming his values of 56.9GH/s and ~60W, it is 3.7W. Still high. Maybe they have two USB cables, or maybe, since it's referred to as a "coffee warmer", they intended to overdrive it to make more heat? ;-)
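
For anyone who wants to replay the two numbers in this exchange, here is a minimal sketch assuming power scales linearly with hash rate. The helper name `scaled_power_w` is invented for this example, and the 2.5 W budget is the 5 V x 0.5 A USB limit mentioned above.

```python
USB_BUDGET_W = 2.5  # 5 V x 0.5 A, the USB port limit cited in the thread


def scaled_power_w(target_ghs, ref_ghs, ref_watts):
    """Scale a reference power figure linearly to a different hash rate."""
    return target_ghs * ref_watts / ref_ghs


# Jalapeno (3.5 GH/s) scaled from the BFL single rating (40 GH/s at 60 W)
jalapeno_vs_single = scaled_power_w(3.5, 40.0, 60.0)
# Jalapeno scaled from the 56.9 GH/s at 60 W die estimate
jalapeno_vs_estimate = scaled_power_w(3.5, 56.9, 60.0)

for label, watts in [("vs. 40 GH/s", jalapeno_vs_single), ("vs. 56.9 GH/s", jalapeno_vs_estimate)]:
    print(f"{label}: {watts:.2f} W, over USB budget: {watts > USB_BUDGET_W}")
```

Either way the result lands above what a single USB port can supply.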

||bit
tacotime (OP)
September 10, 2012, 04:38:55 PM
Last edit: September 10, 2012, 06:44:56 PM by tacotime
 #6

A CPU has a much smaller toggle rate [it will use much less power than the hashing chip ... maybe even 10 times less]. Getting 1 GHz on 130nm is probably also not easy.

Okay, I read some notes on this here and it makes sense.  Basically [for people reading about this too], the toggle rate is the fraction of transistors changing state per clock cycle, designated t below.

The net toggle rate n is such that
n = f × t
where f is the clock frequency and t is the ratio of transistors switching state per clock cycle.  For a SHA256 ASIC, we would assume this to be 98% or 0.98, whereas for a typical CPU or GPU this would be much less as SHA256 hashing would not require state changes for transistors in components not used, such as the FPU.

The total power consumption p of the ASIC is then
p = n × transistor dissipation factor (constant for the same process, e.g. 130 nm, but becomes a smaller constant as the fabrication shrinks)
because electricity is wasted (dissipated) with every state transition for the transistor.

Thus, as tytus stated, this chip hashing at full speed would likely require some 2x-10x more power, making the TDP 120-600 W.  However, if the process were 28-45nm, a TDP close to that quoted by BFL may be possible, but it seems unlikely they have access to such an advanced fabrication method (which would most likely require tens of millions of dollars in investment to get off the ground).
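
The model above can be sketched in a few lines. This is only an illustration: the CPU toggle ratios (0.49 and 0.098) are assumed values picked to reproduce the 2x-10x range, not measured numbers, and the function names are made up.

```python
def dynamic_power(freq_hz, toggle_ratio, dissipation_factor):
    """p = n * k, where n = f * t is the net toggle rate and k is the per-process constant."""
    return freq_hz * toggle_ratio * dissipation_factor


def scaled_tdp(cpu_tdp_w, cpu_toggle, asic_toggle=0.98):
    """Scale a CPU's TDP to an ASIC on the same process and clock, by toggle ratio only."""
    return cpu_tdp_w * asic_toggle / cpu_toggle


# Assumed CPU toggle ratios; 0.98 for the ASIC is the figure from the post.
low = scaled_tdp(60.0, cpu_toggle=0.49)    # ASIC toggles 2x as much as the CPU
high = scaled_tdp(60.0, cpu_toggle=0.098)  # ASIC toggles 10x as much as the CPU
print(f"Estimated ASIC TDP: {low:.0f}-{high:.0f} W")
```

Since f and the dissipation factor cancel when comparing two chips on the same process and clock, only the ratio of toggle rates matters here.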

See also: http://cis.poly.edu/cs2214rvs/powers03.htm

So, the power claims by BFL are actually dubious... which is not surprising, given the 6x difference in power for their BFL single unit between quoted and actual.  This is all pretty suspicious.  If BFL really wanted to keep the user informed, they could at least give the process they're working on, the transistor count, fab location, etc like a real company that does chip fab, but so far all they're giving us is a place to send money to them and a hashing speed.

Further, BFL may risk patent infringement if they use published technical methods for the ASIC circuit designs.  But since BFL has said mostly nothing about the way it works, who knows?

edit: 10 BTC says they're just reselling CAST's ASICs for SHA256


Code:
ASIC Technology
                  max f (MHz)    Logic Area (um2)   Number of eq. gates
UMC 0.18 μm       280            250,040            20.5 K
TSMC 0.09 μm      500            50,800             18.0 K

So there you go, it's probably on TSMC 90 nm.  If this is the case then the 2.5 W of draw for the jalapeno is probably not true at all.

tytus
September 10, 2012, 09:35:56 PM
 #7

1. The Jalapeno will not be powered by USB alone
2. FPGAs use 45nm (or lower) technology, and so do their hardcopies

=> they will probably use an Altera HardCopy, as the current models are based on Altera FPGAs.

But the power problem remains. The hardcopies are apparently not more power efficient. This calls into question the idea of a Mini Rig SC, as it would consume 40 x 1kW [if it were based on FPGAs ... and if hardcopies consume as much power].
tacotime (OP)
September 11, 2012, 02:13:17 AM
 #8

1. The Jalapeno will not be powered by USB alone
2. FPGAs use 45nm (or lower) technology, and so do their hardcopies

=> they will probably use an Altera HardCopy, as the current models are based on Altera FPGAs.

But the power problem remains. The hardcopies are apparently not more power efficient. This calls into question the idea of a Mini Rig SC, as it would consume 40 x 1kW [if it were based on FPGAs ... and if hardcopies consume as much power].

If Altera HardCopy is used it will be on 28nm, with a maximum of 11.5M gates, or a maximum hash rate of 12.35 GH/s at 1 GHz, but these run at 400-700 MHz typically.

If this really is the case, the power usage will not be much less than the corresponding ASIC.  Altera themselves state, "Average of 50% performance improvement over corresponding FPGA, average of 40% less power consumption compared to corresponding FPGA."  Thus, from a hash/s/w standpoint, the ASIC would be about 200% greater than the corresponding FPGA.  A by-hand design like that of CAST's ASIC would be the only ASIC able to really deliver the kind of power consumption BFL has been hinting at.
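
A small sketch of where these HardCopy numbers come from, reusing the 13,500 gates per circuit and 14.5 MH/s per circuit at 1 GHz from earlier in the thread. The function names are invented for the example. Taking Altera's two averages at face value, the hash/s/W ratio works out to 1.5 / 0.6 = 2.5x the FPGA, which is in the same ballpark as the rough figure above but not identical to it.

```python
GATES_PER_CIRCUIT = 13_500        # per SHA256 circuit, from the first post
MHS_PER_CIRCUIT_AT_1GHZ = 14.5    # per-circuit rate used earlier in the thread


def hardcopy_hashrate_ghs(total_gates, clock_ghz=1.0):
    """Hash rate in GH/s if every gate went into hashing circuits."""
    circuits = total_gates / GATES_PER_CIRCUIT
    return circuits * MHS_PER_CIRCUIT_AT_1GHZ * clock_ghz / 1000.0


def efficiency_gain(perf_improvement, power_reduction):
    """Ratio of hashes per joule: (1 + perf gain) / (1 - power saving)."""
    return (1.0 + perf_improvement) / (1.0 - power_reduction)


print(f"At 1 GHz:       {hardcopy_hashrate_ghs(11.5e6):.2f} GH/s")
print(f"At 400-700 MHz: {hardcopy_hashrate_ghs(11.5e6, 0.4):.2f}-{hardcopy_hashrate_ghs(11.5e6, 0.7):.2f} GH/s")
print(f"HardCopy vs. FPGA efficiency: {efficiency_gain(0.5, 0.4):.2f}x")
```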

mrb
September 11, 2012, 03:43:10 AM
 #9

tacotime: I think the power claims made by BFL are absolutely plausible. 700 Mhash/Joule is doable at 65nm, check the math here: https://bitcointalk.org/index.php?topic=95762.0
tacotime (OP)
September 11, 2012, 02:46:38 PM
Last edit: September 11, 2012, 04:47:11 PM by tacotime
 #10

tacotime: I think the power claims made by BFL are absolutely plausible. 700 Mhash/Joule is doable at 65nm, check the math here: https://bitcointalk.org/index.php?topic=95762.0


Based on the 130 nm technology in the paper there (as far as I can tell the only real experimental data) and the clock rates they've given, you'd be looking at 6 GH/s with a TDP of ~90 W (100mm^2 die) considering how much space the hashing unit in the study takes up on the die.  That'd be 66.7 MH/s/w.  At 65 nm you're moving to maybe three times the efficiency (real life examples: AMD K8 vs. early Core2Duo), or 200 MH/s/w.  Hence, you should NOT be able to achieve 700 MH/s/w without moving to 32 nm or below (even then it's likely below 700 MH/s/w).
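
The efficiency figures in this paragraph can be reproduced with a short sketch. The 3x gain from 130 nm to 65 nm is the rough assumption stated above, and the helper name `mhs_per_watt` is made up for the example.

```python
def mhs_per_watt(ghs, watts):
    """Convert a GH/s rating and a power draw into MH/s per watt."""
    return ghs * 1000.0 / watts


SHRINK_GAIN_130_TO_65 = 3.0  # assumed efficiency gain, per the K8 vs. Core2Duo comparison
BFL_IMPLIED = 700.0          # MH/s/W figure under discussion

at_130nm = mhs_per_watt(6.0, 90.0)
at_65nm = at_130nm * SHRINK_GAIN_130_TO_65

print(f"130 nm: {at_130nm:.1f} MH/s/W")
print(f"65 nm:  {at_65nm:.1f} MH/s/W")
print(f"Shortfall vs. 700 MH/s/W: {BFL_IMPLIED / at_65nm:.1f}x")
```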

As I've said above, even Altera themselves have stated that the ASICs produced from their FPGAs are not more efficient than the FPGAs by an order of magnitude.  The likelihood is higher that BFL's SC mining ASICs will perform somewhere in the vicinity of 100-200 MH/s/w.

Why do you think that BFL hasn't been talking about power consumption up to now?  Probably because they know it's unlikely they'll deliver to the hype of their rumours.

Cost to produce a 100mm^2 die on 45 nm technology that gets an estimated 15-30 GH/s at ~200 MHz is also probably $100-200.  Likely the reason in that study that they couldn't be clocked higher is incredible power consumption/heat dissipation.

runeks
September 11, 2012, 05:22:23 PM
 #11

Nice thread. Good to see some more educated guesses.

Cost to produce a 100mm^2 die on 45 nm technology that gets an estimated 15-30 GH/s at ~200 MHz is also probably $100-200.  Likely the reason in that study that they couldn't be clocked higher is incredible power consumption/heat dissipation.
Where do you get this figure? As far as I can gather (disclaimer: not a hardware guy either), price per wafer makes no sense. If you produce one wafer it's several million dollars. If you produce a million wafers it's more like $5 per wafer. I.e. the marginal cost of churning out the chips is tiny compared to the NRE costs.
Frizz23
September 11, 2012, 06:45:35 PM
 #12

... If you produce one wafer it's several millions dollars. ...

Setup cost is high - but not in the millions.

Quote
If you produce a million wafers it's more like $5 per wafer.

No. It's more like 50.000$ per lot (=50 wafers).

At least this is what I can remember when working for a semiconductor manufacturer (ASICS). But that's 20 years ago ... I'm an old fart now.

tacotime (OP)
September 11, 2012, 07:49:39 PM
 #13

No. It's more like 50.000$ per lot (=50 wafers).

At least this is what I can remember when working for a semiconductor manufacturer (ASICS). But that's 20 years ago ... I'm an old fart now.

Values like these and the retail values of GPU/CPU dies on these technologies are where I'm pulling it from.  From a $2000 150mm wafer you would get ~200 100mm^2 dies, of which 70-80% may be usable, for a yield of about 150 usable dies.  This is $13 each to the company producing the ASICs, who would presumably need to test them for fidelity and mark them up before sending them off to BFL/whoever, plus assembly, R&D and whatever other setup overhead.  It is presumed that BFL would be ordering these wafers by the hundreds.
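
Here is a minimal sketch of the per-die cost estimate, using the wafer price, die count and yield from the paragraph above. The function names are hypothetical. As a sanity check it also computes the pure-area upper bound on dies per wafer, which for a 150 mm wafer and 100 mm^2 dies is a bit under 200 even before edge losses.

```python
import math


def max_dies_by_area(wafer_diameter_mm, die_area_mm2):
    """Upper bound on dies per wafer from area alone (ignores edge loss and scribe lines)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2.0) ** 2
    return wafer_area / die_area_mm2


def cost_per_good_die(wafer_cost, dies_per_wafer, yield_fraction):
    """Wafer cost spread over the dies that pass test."""
    return wafer_cost / (dies_per_wafer * yield_fraction)


area_bound = max_dies_by_area(150.0, 100.0)
quoted = cost_per_good_die(2000.0, 200, 0.75)             # figures from the post
conservative = cost_per_good_die(2000.0, int(area_bound), 0.75)

print(f"Area bound: {area_bound:.0f} dies per wafer")
print(f"Cost per good die (200 dies, 75% yield): ${quoted:.2f}")
print(f"Cost per good die (area bound, 75% yield): ${conservative:.2f}")
```

Either way the raw silicon cost per die stays in the low tens of dollars; NRE, packaging and test are not included.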

See more here:
http://smithsonianchips.si.edu/ice/cd/CEICM/SECTION7.pdf
http://www.overclockers.com/forums/showthread.php?t=550542

mrb
September 11, 2012, 08:51:06 PM
Last edit: September 12, 2012, 01:02:19 AM by mrb
 #14

tacotime: I think the power claims made by BFL are absolutely plausible. 700 Mhash/Joule is doable at 65nm, check the math here: https://bitcointalk.org/index.php?topic=95762.0

Based on the 130 nm technology in the paper there (as far as I can tell the only real experimental data) and the clock rates they've given, you'd be looking at 6 GH/s with a TDP of ~90 W (100mm^2 die) considering how much space the hashing unit in the study takes up on the die.  That'd be 66.7 MH/s/w.  At 65 nm you're moving to maybe three times the efficiency (real life examples: AMD K8 vs. early Core2Duo), or 200 MH/s/w.  Hence, you should NOT be able to achieve 700 MH/s/w without moving to 32 nm or below (even then it's likely below 700 MH/s/w).

The problem is that you start your calculations from non-optimal numbers ("66.7 Mh/J").

Virginia Tech 130nm simulations estimated 75 Mh/J (13.42 mJ/Gbits); real chips came very, very close: 73 Mh/J (13.76 mJ/Gbits). The reason simulations predict such accurate numbers is that SHA-256 has a very predictable gate toggle rate.
Bitfountain 130nm simulations estimate 122 Mh/J; therefore real chips are very likely to achieve the same.

Then, even based on your very conservative estimate of a 3x efficiency gain when moving from 130nm to 65nm, Bitfountain's numbers should translate to 122 x 3 ≈ 370 Mh/J, which is already in the rough (~2x) ballpark of BFL's inferred claim of 700 Mh/J...
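
The comparison above, as a quick sketch. The inputs are the figures quoted in this post; the variable names are illustrative only.

```python
BITFOUNTAIN_130NM = 122.0   # Mh/J, simulated
SHRINK_GAIN = 3.0           # 130 nm -> 65 nm, the conservative factor from the quoted post
BFL_INFERRED = 700.0        # Mh/J, inferred from BFL's claims

projected_65nm = BITFOUNTAIN_130NM * SHRINK_GAIN
gap = BFL_INFERRED / projected_65nm

print(f"Projected at 65 nm: {projected_65nm:.0f} Mh/J")
print(f"BFL's inferred figure is {gap:.2f}x that")
```

Starting from 122 Mh/J instead of 66.7 Mh/J narrows the gap to BFL's figure from about 3.5x to about 2x.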
2112
September 11, 2012, 11:24:55 PM
 #15

of which 70-80% may be usable or a yield of 150 usable dies.
The manufacturing yield on a Bitcoin mining chip will be either 0% or 100%. The structure is so repetitive and the failure modes are inconsequential: coin mining is essentially buying lottery tickets really fast. On top of that, almost every SHA256 implementation is essentially self-testing: it either always works or never works, so the post-manufacturing tests are trivial.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
lame.duck
September 12, 2012, 09:47:39 AM
 #16

While testing should in fact be trivial, that says nothing about the yield, which depends on how well the fab can produce the chips. There could be 'dust' particles that produce defective dies, or the metallization may not be homogeneous over the wafer, etc. As far as I know, dies near the center of a wafer tend to be of 'better' quality.
squid
September 12, 2012, 10:08:49 AM
 #17

No. It's more like 50.000$ per lot (=50 wafers).

At least this is what I can remember when working for a semiconductor manufacturer (ASICS). But that's 20 years ago ... I'm an old fart now.

Values like these and the retail values of GPU/CPU dies on these technologies are where I'm pulling it from.  From a $2000 150mm wafer you would get ~200 100mm^2 dies, of which 70-80% may be usable, for a yield of about 150 usable dies.  This is $13 each to the company producing the ASICs, who would presumably need to test them for fidelity and mark them up before sending them off to BFL/whoever, plus assembly, R&D and whatever other setup overhead.  It is presumed that BFL would be ordering these wafers by the hundreds.

See more here:
http://smithsonianchips.si.edu/ice/cd/CEICM/SECTION7.pdf
http://www.overclockers.com/forums/showthread.php?t=550542

Since when does cost of wafer = cost of device? Wafers are cheap compared to other pieces required to fabricate devices.
eldentyrell
September 22, 2012, 05:55:41 PM
Last edit: September 22, 2012, 07:45:38 PM by eldentyrell
 #18

SHA256 hashing requires about 13,500 logic gates per circuit or 27,000 transistors.

Transistor/gate counts aren't really useful anymore.  There's a saying in the industry "you pay for the wires, we throw in the transistors for free".  Interconnect is everything.  Unfortunately people are seduced by the fact that transistors come in discrete units so you can count how many of them you have.  Interconnect costs are more subtle.

Transistor/gate counts are really only useful if you're comparing standard-cell designs pushed through the same toolchain.  Otherwise the choice of synthesis tool matters way more than the gate count.

It's utterly pointless to compare a standard-cell design to a full-custom design using transistor count.  Even between full-custom designs it's normal to see a 4x variation in area based on the foresight of the architect and the skill of the layout designer.  By the way, BFL doesn't use the phrase "full custom" to mean the same thing it means in the industry.

Also keep in mind that unlike FPGA gates, VLSI gates come in all different sizes.  There are "strong" NAND gates that are 64x (or more) as large as the weakest NAND gates, yet they still count as one gate or four transistors!

The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators.  So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.
Inaba
September 23, 2012, 12:53:28 AM
 #19

Quote
By the way, BFL doesn't use the phrase "full custom" to mean the same thing it means in the industry.

We don't?  Please elaborate. (I'm serious, I'm not being snarky.  If we/I am using it incorrectly, then I would like to use the proper term.)


If you're searching these lines for a point, you've probably missed it.  There was never anything there in the first place.
eldentyrell
September 23, 2012, 03:35:53 AM
 #20


Quote
By the way, BFL doesn't use the phrase "full custom" to mean the same thing it means in the industry.

We don't?  Please elaborate. (I'm serious, I'm not being snarky.  If we/I am using it incorrectly, then I would like to use the proper term.)

Standard-cell ASICs and synthesis-flow ASICs are not considered full-custom chips.

The phrase "fully custom" is a BFL-ism that sounds a lot like "truthiness" :)  In fact, the third Google hit for "fully custom asic" on the entire interweb is BFL, which ought to be a hint that it is a contortion of the usual industry terminology...

The printing press heralded the end of the Dark Ages and made the Enlightenment possible, but it took another three centuries before any country managed to put freedom of the press beyond the reach of legislators.  So it may take a while before cryptocurrencies are free of the AML-NSA-KYC surveillance plague.