UPDATED 21-Jul-2013: added column showing delivery/verification status. "Verified" means by an independent third party. "Delivered" means at least a few have been sold in arm's-length transactions (i.e. not special favors to developers or reviewers).
UPDATED: 22-Jun-2013 changed BFL numbers from post-tapeout claim (7.5GH/s) to actual measurement (4GH/s).
UPDATED: 21-Jun-2013 added Bitfury figures, 65nm corrected to 55nm (and fixed an arithmetic error).
Known Figures

Design | Hashrate | Device | Process node, λ | Area | η (H*pm/s) | status |
Bitfury 55nm | 2 GH/s | Custom | 55nm, 27.5nm | 14.44 mm2 | 2,880.45 | verified |
Avalon | 275 MH/s | Custom | 110nm, 55nm | 16.13 mm2 | 2,836.52 | verified, delivered |
BFL SC | 4.0 GH/s | Custom | 65nm, 32.5nm | 56.25 mm2 | 2,441.11 | verified, delivered |
Bitfury Spartan-6 | 300 MH/s | Spartan-6 | 45nm, 22.5nm | 120 mm2 | 28.47 | delivered |
Tricone | 255 MH/s | Spartan-6 | 45nm, 22.5nm | 120 mm2 | 24.20 | verified, delivered |
Ztex | 210 MH/s | Spartan-6 | 45nm, 22.5nm | 120 mm2 | 19.75 | verified, delivered |
BFL_MiniRig_1Card | 1.388 GH/s | 2 x Altera Arria II EP2AGX260 | 40nm, 20nm | 2 x 306.25 mm2 | 18.14 | verified, delivered |
ATI 5870 | 393 MH/s | Evergreen | 40nm, 20nm | 334 mm2 | 9.39 | verified, delivered |
BFL_Single | 832 MH/s | 2x EP3SL150F780 | 65nm, 32.5nm | ? | ? | verified, delivered |
Block Eruptor | ? | Custom | ?, ? | ? | conflicting data | announced |
Reclaimer | ? | Custom | ?, ? | ? | ? | announced |
I will list a chip in the table above when we have all of the following data:
- Hashrate, either in a claim from the manufacturer or a measurement by a third party
- Die size, either in an unambiguous claim by the manufacturer or a die photo from a third party
- Process node in an unambiguous claim by the manufacturer
- A plausible date by which independent verification will be possible.
Summary

As more and more announcements about bitcoin-specific chips come out, it would be useful to have a metric that compares the quality of the underlying design.

I recommend "hash-meters per second" as that metric. It is calculated by dividing the hashrate (in H/s) by the die area in square meters and then multiplying by the cube of the process's feature size in meters (half of the process node's "name", so a 90nm process has a 45nm feature size). If you use hash-picometers instead of hash-meters you wind up with reasonable-sized numbers.

Current GPUs and FPGAs get 8-24 H*pm/s; the three ASICs we have numbers for have η-factors around 2,400-2,800 H*pm/s -- roughly 100 times more efficient use of silicon than FPGAs and GPUs.
Migrating a design from one process to another by direct scaling --
when possible -- will not change this metric. Therefore it gives you a good idea of how the "rising tide" of semiconductor process technology will lift the various "boats".
Details

Process-invariant metrics factor out the contribution of capital to the end product, since the expenditure of capital can overwhelm the quality of the actual IP and give misleading projections of its future potential. A 28nm mask set costs at least 1000 times as much as a 350nm mask set, but migrating a design from 350nm to 28nm is not going to give you anywhere near 1000 times as much hashpower.
This metric probably does not matter for immediate end-user purchasing decisions -- MH/$ and MH/J matter more for that -- but for investors, designers, and long-range planning purposes it gives a better idea of how much "headroom" a given design has to improve
simply by throwing more money at it and using a more-expensive IC process. Alternatively, this can be seen as a measure of
how much of its performance is due to money having been thrown at it. That is important for investors -- and the line between presale-customers and investors is a bit blurry these days with all the recent announcements.
As semiconductor processes become more advanced, two important things happen:
1. The transistors get smaller (area).
2. The time required for transistors to turn on gets shorter (speed).
Area

Generally #1 (area) is indicated by the process name. For example, in a 90nm process the smallest transistor gates are 90nm long.
Chip designers refer to
half of this length (i.e. 45nm on a 90nm process) as the feature size. The feature size is half of a gate length because you can always place transistors on a grid whose squares are at least half the length of the smallest gate. Usually you get an even finer grid than that, but it's not universally guaranteed.
Therefore, to get a process-independent measure of the size of a circuit, measure the circuit's area (units: square meters) and divide that by the square of the feature size to get a unitless quantity. Well, almost unitless. Technically the units for a process's feature size are "meters per lambda" rather than meters, so the normalized area really comes out in square lambda, and the units for the final metric work out to (hash-meters) per (second*lambda-cubed).
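To make the normalization concrete, here is a quick Python sketch of the area-to-square-lambda conversion (die-size figure taken from the table above):

```
# Normalizing a die area to square lambda, using the 120 mm^2 Spartan-6
# estimate from the table above (45nm process, so lambda = 22.5nm).
area_m2 = 120e-6
feature_size = 22.5e-9                 # meters per lambda
print(area_m2 / feature_size ** 2)     # ~2.37e11 square lambda
```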
Speed

Semiconductor processes are also characterized by a measure called "tau", which is the RC time constant of the process. This is the time it takes a symmetric inverter to drive a wire high or low, assuming the wire has no load.

The raw tau factor ignores the load presented by wires and other gates, so some designers prefer to use the FO4, or normalized gate delay, instead. FO4 is the same measurement, except that each gate drives four copies of itself.
Unfortunately the tau and FO4 numbers can be hard to come by, and they frequently get mixed up with each other (one is listed where the other ought to be). Also, there is a bit of "wiggle room" in exactly how the RC circuit or loading is done, so it's common to see inconsistent numbers cited by different sources for the same process. Because of this, using tau or FO4 directly in a competitive metric is a bad idea: people will fight over which tau or FO4 numbers to use. A
previous proposal used gate delays as part of the metric, but I no longer recommend that metric since if it were to gain popularity it would inevitably lead to people playing games with the tau/FO4 numbers, picking and choosing whichever number cast their favorite product in the best light.
Fortunately, there is a fix. All we need here is a
relative comparison of two circuits. It turns out that both tau and FO4 scale more or less linearly with the gate length (and therefore with the feature size). So instead of converting hashes/sec into hashes/tau or hashes/FO4 we can use the feature size as a proxy for the gate delay time and
multiply the measure of hashes/sec by the feature size instead of multiplying by the tau/FO4 time.
The resulting number will be totally meaningless as an absolute quantity, but the ratio of this metric for two different circuits will still give the ratio of their performance on equivalent processes.
Formula

So the formula is:
(hashrate / area_in_square_lambda) * gate_switching_time
The units for this number are "hashes per square lambda" (or simply "hashes", if you treat lambda as dimensionless).
However remember that we're using feature_size (measured in meters per lambda) as a proxy for gate_switching_time since there is less wiggle room in how feature_size is measured and the two values tend to scale proportionally. This substitution gives us:
(hashrate / area_in_square_lambda) * feature_size
Since area_in_square_lambda is (area_in_square_meters / feature_size^2) we can substitute to get:

(hashrate / (area_in_square_meters / feature_size^2)) * feature_size
which is equivalent to

((hashrate * feature_size^2) / area_in_square_meters) * feature_size

collecting the occurrences of feature_size gives us:

(hashrate * feature_size^3) / area_in_square_meters

or alternatively:

(hashrate / area_in_square_meters) * feature_size^3
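For concreteness, here is a minimal Python sketch of the final formula (the function and argument names are mine, chosen arbitrarily):

```
def eta(hashrate_hps, die_area_m2, process_node_m):
    # Process-invariant hashing metric, in hash-meters per second (H*m/s).
    #   hashrate_hps:   device throughput in hashes per second
    #   die_area_m2:    die area in square meters
    #   process_node_m: the process "name" in meters, e.g. 90e-9 for a 90nm process
    feature_size = process_node_m / 2   # half the minimum gate length
    return (hashrate_hps / die_area_m2) * feature_size ** 3
```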
Example

The Bitfury hasher gets 300 MH/s:

300*10^6 H/s

It runs on a Spartan-6, which has a 300 mm^2 (i.e. 300*10^-6 m^2) die. Dividing the hashrate by the area in square meters gives:

1*10^12 H/(s*m^2)

This is why the Bitfury hasher is a convenient example -- by coincidence its hashrate in H/s happens to equal its die area in square millimeters, which makes the numbers simpler.

Multiplying the number above by the feature_size (22.5*10^-9 m) cubed (11390.625*10^-27 m^3) gives:

11390.625*10^-15 H*m/s

which is:

11.390625*10^-12 H*m/s

The SI prefix for 10^-12 is "pico", so the Bitfury hasher gets:

11.390 H*pm/s

(The table at the top of this post uses a tighter 120 mm^2 die-size estimate for the Spartan-6, which is why it lists a higher η for the same hasher.)
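Plugging the same numbers into the eta() sketch from the Formula section reproduces the result:

```
# Bitfury hasher on a Spartan-6, with the 300 mm^2 die-size estimate used in this example
print(eta(300e6, 300e-6, 45e-9) * 1e12)   # ~11.39 H*pm/s
```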
Summary

To compute the metric, take the overall throughput of the device (hashes/sec), divide by the chip area measured in square meters, and multiply by the cube of the process's feature size.

Shortcut: take the hashrate in gigahashes per second, divide by the area in mm^2, and multiply by the cube of the feature size (half the minimum gate length) in nanometers; a quick spot-check against the table is shown below.

This number can then be used to project the performance of the same design under the
huge assumption that the layout won't have to be changed radically.
This assumption is almost always false, but assuming the design is ported with the same level of skill and same amount of time as the original layout, it's unlikely to be wrong by a factor of two or more. So I would consider this metric to be useful for projecting the results of porting a design up to roughly a factor of 2x. That might sound bad, but at the moment we don't have anything better. It also gives you an idea of how efficiently you're utilizing the transistors; once I get the numbers I'm looking forward to seeing how huge the divergence is between CPUs/GPUs/FPGAs/ASICs.
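The shortcut works because the unit conversions (10^9 for GH, 10^-6 m^2 per mm^2, 10^-27 m^3 per nm^3) collapse to an overall factor of exactly 10^-12, so the result comes out directly in H*pm/s. Here is a quick Python spot-check against a few rows of the table at the top of this post (a sketch only, using the numbers as listed there):

```
def eta_pm(ghps, area_mm2, feature_nm):
    # Shortcut form: GH/s divided by mm^2, times the feature size in nm cubed,
    # gives the metric directly in hash-picometers per second (H*pm/s).
    return ghps / area_mm2 * feature_nm ** 3

print(eta_pm(2.0,   14.44, 27.5))   # Bitfury 55nm      -> ~2880
print(eta_pm(0.275, 16.13, 55.0))   # Avalon            -> ~2837
print(eta_pm(4.0,   56.25, 32.5))   # BFL SC            -> ~2441
print(eta_pm(0.300, 120.0, 22.5))   # Bitfury Spartan-6 -> ~28.5
```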
I propose to denote this metric by the Greek letter η, from which the Latin letter "H" arose. "H" is for hashpower, of course. The table at the top of this post lists some existing designs and their η-factor (I will update it periodically).
This metric does not take power consumption into account in any way. I believe there ought to be a separate process-independent metric for that.
If anybody can add information to the table, please post below. Getting die sizes can be difficult; I know the Spartan-6 die size above is a conservative estimate (it definitely isn't any bigger or it wouldn't fit in the csg484).