Announcing a modest technical program for QubitCoinActing as self-appointed technical lead, I created the
qubitcoin-project organisation (
https://github.com/qubitcoin-project) on github. Unlike an individual developer's account, an organisation account provides basic membership functions plus some support for team working. It is a practical means of effecting
collective ownership and control of the coin's key resources. Feel free to PM me with your github a/c name or send me email at
gjh@bel-epa.com.
As a first step to stability, I collected up all the QubitCoin code resources and committed them to the org's repository. This includes the original wallet code, the cpu miner and the p2pool code (which we should think about possibly running ourselves as a means of generating some operating/promotional budget.)
I also added the work I've been doing in parallel - reworking the graphics and upgrading the code to Bitcoin 0.8.6.
And so to the details of the announcement.
A modest technical programI've been looking into ways of developing the coin technically without changing the economic parameters. I've been making a few posts on Mooncoin in response to Moonies' suggestions of a change of pow algo and in the course of doing the background reading for the response, I found an interesting set of results from 2009
Classification of the SHA-3 candidates, including reported performance statistics in cpb (cycles per message block) for 256 and 512 output lengths (sometimes labelled "-32" or "-64") on 32 & 64-bit host architectures. I include it here because it's exemplary and I also need to reference the detail ....
Name_of_hash_function | 32-bit_arch | Class'n | 64-bit_arch | Class'n |
ARIRANG-256 | 20 | A | 55.3 | E |
ARIRANG-512 | 14.9 | AA | 11.2 | B |
AURORA-256 | 24.3 | B | 15.4 | B |
AURORA-512 | 46.9 | B | 27.4 | E |
BLAKE-32 | 28.3 | B | 16.7 | B |
BLAKE-64 | 61.7 | C | 12.3 | B |
Blender-32 | 105.8 | E | 105.8 | E |
Blender-64 | 122.4 | E | 164.2 | E |
BMW-256 | 8.6 | AA | 7.85 | AA |
BMW-512 | 13.37 | AA | 4.06 | AA |
Cheetah-256 | 15.3 | A | 10.5 | A |
Cheetah-512 | 83.8 | D | 15.6 | C |
Chi-256 | 49 | C | 26 | D |
Chi-512 | 78 | D | 16 | C |
CRUNCH-256 | 29.9 | C | 16.9 | B |
CRUNCH-512 | 86.4 | D | 46.9 | E |
CubeHash8/1-256 | 14 | A | 11 | A |
CubeHash16/1-512 | 14 | A | 11 | A |
DynamicSHA-256 | 27.9 | B | 27.9 | D |
DynamicSHA-512 | 47.2 | B | 47.2 | E |
DynamicSHA2-256 | 21.9 | B | 21.9 | C |
DynamicSHA2-512 | 67.3 | C | 67.1 | E |
ECHO-256 | 38 | D | 32 | D |
ECHO-256 | 83 | D | 66 | E |
Edon-R-256 | 9.1 | AA | 5.9 | AA |
Edon-R-512 | 13.7 | AA | 2.9 | AA |
EnRUPT-256 | 8.3 | AA | 8.3 | A |
EnRUPT-512 | 5.1 | AA | 5.1 | AA |
Essence-256 | 149.8 | E | 19.5 | B |
Essence-512 | 176.5 | E | 23.5 | D |
Fugue-256 | 36.2 | C | 61 | E |
Fugue-512 | 74.6 | D | 132.7 | E |
Grøstl-256 | 22.9 | B | 22.4 | D |
Grøstl-512 | 37.5 | A | 30.1 | E |
Hamsi-256 | 22 | B | 25 | D |
JH-256 | 21.3 | B | 16.8 | B |
JH-512 | 21.3 | AA | 16.8 | D |
Keccak-256 | 35.4 | C | 10.1 | A |
Keccak-512 | 68.9 | C | 20.3 | D |
LANE-256 | 40.4 | D | 25.6 | D |
LANE-512 | 152.2 | E | 145.3 | E |
Lesamnta-256 | 59.2 | E | 52.7 | E |
Lesamnta-512 | 54.5 | B | 51.2 | E |
Luffa-256 | 13.9 | AA | 13.4 | A |
Luffa-512 | 25.5 | AA | 23.2 | D |
Lux-256 | 16.7 | A | 28.2 | D |
Lux-512 | 14.9 | AA | 12.5 | B |
MD6-256 | 68 | E | 28 | D |
MD6-512 | 106 | D | 44 | E |
NaSHA-256 | 39 | D | 28.4 | D |
NaSHA-512 | 38.9 | A | 29.3 | E |
SANDstorm-256 | 62.5 | E | 36.5 | D |
SANDstorm-512 | 296.8 | E | 95.3 | E |
Sarmal-256 | 19.2 | A | 10 | A |
Sarmal-512 | 23.3 | AA | 12.6 | B |
SHA-256 | 29.3 | C | 20.1 | C |
SHA-512 | 55.2 | C | 13.1 | C |
Shabal-256 | 18.4 | A | 13.5 | A |
Shabal-512 | 18.4 | AA | 13.5 | C |
SHAvite-3-256 | 35.3 | C | 26.7 | C |
SHAvite-3-512 | 55 | B | 38.2 | E |
SIMD-256 | 12 | AA | 11 | A |
SIMD-512 | 118 | E | 85 | E |
Skein-256 | 21.6 | A | 7.6 | AA |
Skein-512 | 20.1 | AA | 6.1 | AA |
TIB3-256 | 12.9 | AA | 7.6 | A |
TIB3-512 | 17.5 | AA | 6.3 | AA |
Twister-256 | 35.8 | C | 15.8 | B |
Twister-512 | 39.6 | A | 17.5 | D |
Vortex-256 | 46.2 | D | 69.4 | E |
Vortex-512 | 56 | C | 90 | E |
I checked the qubit algos against the results and there seems some technical (and promotional) advantage to be gained by revising the qubit algo. Revisions are unexceptional in cryptography, pragmatism rules; the result matrix seems to suggest that two of the slower algos could be simply replaced by two faster ones. I've pulled the qubit-algo results out of the big table, so I can illustrate what and how much.
/*
512 | 256
32 64 | 32 64
luffa AA D | AA A
cubehash A A | A A
shavite B E | C C < bmw AA AA | AA AA
simd E E | AA A
echo D E | D D < skein A AA | AA AA
*/
I will have to dig deep into the crypto benchmarks (
http://bench.cr.yp.to/results-sha3.html) for contemporary figures but it looks like a starting point
Why leave SIMD unchanged despite its E-grade performance in 512bits? Because it's irrelevant. In the candidate submission packs, each algo's performance is described and for nearly all the NIST candidates, the 256-bit length implementation is faster than the 512-bit. Bitcoin itself (our code heritage) uses 256-bit hashes throughout. The choice of a 512-bit format can be traced back to SiFcoin, the first chained-algo altcoin and
QubitCoin's direct ancestor. The rationale for the decision is google-translated here:
Topic: Sifcoin (inflationary fork). Start 23/06/2013
hesh function for the signature block header changed from sha-256 (sha-256 ()), on the daisy chain of the candidates / finalists and winner sha-3. Blake, BMW, Groestl, JH, Keccak, Skein. All functions of the 512-bit, but the end result is truncated to 256 bits. [...] Complication chain to a length of 6 different hash functions and increased bit depth to intermediate 512 - an attempt to protect from further development of highly efficient Mh / s gpu-algorithms and theory, "simple" Gh / s devices)
If that strategy ever afforded any protection it has now long since expired and the declared “rationale” has since mutated into a rich tradition of security theatre and superstition that's well on its way to becoming out-and-out pantomime.
The performance of NIST candidates on ASIC and FPGA implementations
is what they were judged on. In Nov 2011
NIST were giving away all-in-one ASIC chips that carried implementations of all of the finalists plus a reference implementation of SHA2. The literature is very specific, the SHA3 session of
2010 CHES Workshop has some very good papers, e.g. “Fair and Comprehensive Methodology for Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates using FPGAs”.
I've been gaining invaluable insights by working through post-NIST review papers such as AlAhmad and Alshaikhli’s
“Broad View of Cryptographic Hash Functions” (
IJCSI International Journal of Computer Science Issues, Vol. 10, Issue 4, No 1, July 2013).
We also evaluated the impact of technology scaling on FPGA and ASIC, i.e. we estimated the impact of more advanced technology nodes on our results. For FPGAs, the scaling factors are generally hard to quantify because different FPGA families may have drastically different architectures. In [3], researchers have already demonstrated the influence of different technology nodes on the FPGA results for SHA-3 Round 2 candidates. For example, when moving from a 90nm Xilinx Spartan3E to a 65nm Xilinx Virtex-5, the basic logic element changes from 4-LUT to 6-LUT. In addition, the presence of hardened IP blocks, such as embedded memory (Block RAM), clocking management blocks and DSP functions, can lead to differences between two FPGAs within even the same technology node. Therefore, our comparisons of the 14 SHA3 designs in FPGA are specifically made for a Xilinx 65nm Virtex-5 FPGA. For other FPGA technologies, we recommend the use of an automated framework such as ATHENA [3].
As they say, “the devil is in the detail”. Following up the ATHENA reference, I found results for both
FPGA and
ASIC benchmarks for the SHA3 2nd- and 3rd-round candidates. They have rather a fine selection of options of ASIC benchmark:
Algorithm Group:
o Round 3 SHA-3 Candidates & SHA-2
o Round 2 SHA-3 Candidates & SHA-2
Implementation Type:
o 130nm Process, Virginia Tech, Optimization for Throughput/Area
o 65nm Process, ETH Zurich and GMU, Optimization for Throughput/Area
o 65nm Process, ETH Zurich, Optimization for Minimum Area at Throughput = 2.488 Gbit/s
Ranking:
o Throughput/Area
o Throughput
o Area
o Power
o Energy/Bit
All this was in full flood well before SiFcoin was launched in mid-2013; there was no justification for hand-wavy references to theoretical ASIC solutions.
Angles of approachWe can see an opportunity for couple of possibly useful speedups that might act to reduce the differential between CPU and GPU:
i) we might gain a speed-up by switching to a couple of allegedly-faster algos and
ii) we might be able to gain some performance improvement by abandoning the ineffective, unnecessarily burdensome 512-bit format and returning to 256-bit hashing.
If there is a performance penalty for using 512-bit hash lengths, then it's quite a high price to pay for unnecessary extra security - ECRYPT II's 2012 annual review of key lengths maintains that 256 bits will provide protection from 2014 to 2040 and the NSA Suite B recommends 256 bits for SECRET, reserving a whopping 384 bits for TOP SECRET.
However, it's not obvious that either course will be successful, contemporary hardware is considerably more sophisticated and performance varies by algo and architecture. There are extensive graphs of performance at
bench.cr.yp.to, in cycles per byte of message block, here's a scaled-down screenshot
Clearly, there's only one sane option, try it and see.
1. Reducing hash length to 256-bitsStraightforward to do, everything's limited to a single file:
hashblock.hI'm using an i3 2367M (1.40GHz) with AVX.
I used QubitCoin version v0.8.3.15 for tests
mainNet 512bit -> 256bit"hashespersec" : 33939,
"hashespersec" : 34336,
"hashespersec" : 35520,
"hashespersec" : 34927,
"hashespersec" : 35380,
"hashespersec" : 35380,
"hashespersec" : 35046,
"hashespersec" : 35155,
"hashespersec" : 35126,
"hashespersec" : 35019,
"hashespersec" : 32447,
"hashespersec" : 34966,
"hashespersec" : 34717,
After the change, running on testnet3
testNet 256bit -> 256bit"hashespersec" : 40336,
"hashespersec" : 40458,
"hashespersec" : 39867,
"hashespersec" : 39558,
"hashespersec" : 40691,
"hashespersec" : 40526,
"hashespersec" : 40566,
"hashespersec" : 38682,
"hashespersec" : 40358,
"hashespersec" : 39698,
"hashespersec" : 40623,
Seems clear enough, a noticeable speed-up, about the magnitude one might expect
2. Swapping algosAgain, quite straightforward,
just a couple of files affected:
After the algo swap, running on testnet3
revised qubittestNet 256bit -> 256bit"hashespersec" : 44704,
"hashespersec" : 44259,
"hashespersec" : 44392,
"hashespersec" : 44673,
"hashespersec" : 44580,
"hashespersec" : 44103,
"hashespersec" : 44552,
"hashespersec" : 44552,
"hashespersec" : 44113,
"hashespersec" : 44315,
"hashespersec" : 44660,
As above, apparently.
After migrating the code from its 0.8.3 branch to the 0.8.6 master, I was dismayed to see the hash rate drop to 20k. A brief investigation fingered the disabled KGW, so I swapped it out for the more favourably-viewed Digishield block retargeting - and the hash rate returned to its prior elevated level.
The next step is to invite people to make a direct comparison on their own machine: hashespersec for 0.8.3/4 client vs 0.8.6 client. The code is in the organisation's repository:
https://github.com/qubitcoin-project/QubitCoinQ2C, I hope to be able to make an OS X binary available.
To enable a client for testnet, you either call it on the command line with a -testnet=1 argument or add testnet=1 to the config file. There's a temporary testNetDNSSeed hard-coded in, so the client should find the node straight away.
With respect to DNSseeds: as far as I can ascertain, DNSSeeds (as a source of node IPs) don't actually need to be an node running on a server of their own, they (or it) can be simply implemented via common-or-garden DNS resolution, resolving to the address of a known server hosting a full node. It's horribly centralized but that's the practical reality of hard-coding DNSSeeds into the source.
The upshot of this is ... to be independent of addnode lists, the source code needs to include the IP address of at least one known, hosted node assigned with the responsibility for broadcasting the peers. In QubitCoin, this function was originally combined with a mining pool (and perhaps even the block explorer) all running off the same IP address but, as we are discovering, this degree of centralisation can be problematic when the centre collapses.
We (Ngaio and I) run a server (for as long as we can afford to pay for it) and we could host it all, just as krecu did. But sustaining that demands fiat, not q2c (even if we had any) and we're unsure whether the community is strong enough to support stumping up real $$$.
From a pragmatic standpoint, for as long as we have the bitcointalk and reddit threads, DNSSeeds are mainly a technical nicety which we can't afford right now but can hope to have again soon.
Now that there's some genuine basis for viewing Qubitcoin as a “technical altcoin”, the next order of business will be to address the issue of setting up an organisational structure. Hopefully, I'll have a little less to say.
Cheers
Graham