tcatm (OP)
|
|
July 30, 2010, 09:23:10 PM |
|
This patch calculates four hashes at a time on one core using vector instructions. A test program is included that validates the new hash function against the old one, so it should be correct. The patch is against 0.3.6. It improves khash/s by roughly 115%. http://pastebin.com/XN1JDb53
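As a rough illustration of the idea (a sketch, not the patch itself): four consecutive nonce candidates can be packed into the four 32-bit lanes of one 128-bit register, so each later vector operation advances four hash attempts at once. The helper name pack_nonces is hypothetical, and the scalar branch is only there so the sketch still builds on machines without SSE2.

```cpp
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 integer intrinsics
#endif

// Hypothetical helper: writes base, base+1, base+2, base+3 into out[0..3].
// With SSE2 this is a single packed 32-bit add on one 128-bit register.
void pack_nonces(uint32_t base, uint32_t out[4]) {
    assert(out != 0);
#if defined(__SSE2__)
    // _mm_set_epi32 takes the highest lane first, so lane 0 gets offset 0.
    const __m128i offset = _mm_set_epi32(3, 2, 1, 0);
    const __m128i nonces =
        _mm_add_epi32(_mm_set1_epi32(static_cast<int>(base)), offset);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), nonces);
#else
    for (uint32_t lane = 0; lane < 4; ++lane)
        out[lane] = base + lane;  // same result, one lane at a time
#endif
}
```

Calling pack_nonces(100, out) fills out with 100, 101, 102, 103; every vector instruction applied to such a register afterwards works on all four candidates.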
|
|
|
|
knightmb
|
|
July 30, 2010, 09:33:29 PM |
|
I take it that you've already tested the hash limit before performance starts to suffer against the stock code? I'm just curious myself.
|
Timekoin - The World's Most Energy Efficient Encrypted Digital Currency
|
|
|
tcatm (OP)
|
|
July 30, 2010, 09:47:22 PM |
|
Performance of the stock code (as measured by my test/benchmark program) is about 1500 khash/s. My code does 3500 khash/s. Both figures are for one core. It scales well because I do 128 hashes at once and keep the data structures small enough to fit in the CPU cache.
I have two local collision attacks which will squeeze out another 300 khash/s, but they are not stable yet.
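One way to picture the cache argument: if a batch of results is stored word-major (all first words, then all second words, and so on), the buffer stays tiny and a cheap pre-check reads one contiguous row. The sketch below assumes that layout and a batch size of 32 purely for illustration; kBatch, HashBatch and find_candidate are hypothetical names, and the zero-word test is a simplified stand-in for a real difficulty check.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Assumed batch size, for illustration only.
constexpr size_t kBatch = 32;

// Word-major layout: row w holds word w of every hash in the batch.
struct HashBatch {
    uint32_t word[8][kBatch];
};

// 8 words * 32 hashes * 4 bytes = 1024 bytes, far below typical L1 sizes.
static_assert(sizeof(HashBatch) == 1024, "unexpected batch footprint");

// Returns the index of the first hash whose last word is zero,
// or -1 if no hash in the batch passes this cheap pre-check.
int find_candidate(const HashBatch& batch) {
    for (size_t j = 0; j < kBatch; ++j)
        if (batch.word[7][j] == 0)
            return static_cast<int>(j);
    return -1;
}
```

Only the hashes that pass the pre-check need the more expensive byte swap and full comparison against the target.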
|
|
|
|
knightmb
|
|
July 30, 2010, 09:51:10 PM |
|
Awesome, I'll have to give it a try myself then.
|
|
|
|
tcatm (OP)
|
|
July 30, 2010, 10:00:24 PM |
|
Tell me if it works. Donations are welcome: 17asVKkzRGTFvvGH9dMGQaHe78xzfvgSSA
|
|
|
|
satoshi
Founder
|
|
July 31, 2010, 12:29:20 AM |
|
That's amazing...
So are you saying you use 128-bit registers to SIMD four 32-bit data at once? I've wondered about that for a long time, but I didn't think it would be possible due to addition carrying into the neighbour's value.
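On the carry concern: the SSE2 packed 32-bit add (_mm_add_epi32) wraps each 32-bit lane modulo 2^32 on its own and never propagates a carry into the neighbouring lane. A small sketch of that behaviour follows; the wrapper add_lanes is a hypothetical name, and the scalar branch only keeps the example buildable without SSE2.

```cpp
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 integer intrinsics
#endif

// Hypothetical wrapper: out[i] = a[i] + b[i] (mod 2^32) for four lanes.
void add_lanes(const uint32_t a[4], const uint32_t b[4], uint32_t out[4]) {
    assert(a != 0 && b != 0 && out != 0);
#if defined(__SSE2__)
    // Unaligned loads/stores so the caller's arrays need no special alignment.
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_add_epi32(va, vb));
#else
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];  // unsigned addition wraps per element
#endif
}
```

With a = {0xFFFFFFFF, 1, 2, 3} and b = {1, 1, 1, 1} the result is {0, 2, 3, 4}: lane 0 overflows and wraps to zero, and lane 1 is unaffected.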
|
|
|
|
knightmb
|
|
July 31, 2010, 04:49:33 AM |
|
Darn, it means the next release, the difficulty is going to have to increase to 1000 or so to keep up, LOL
|
|
|
|
tcatm (OP)
|
|
July 31, 2010, 10:12:38 AM |
|
Quote from satoshi:
"That's amazing...
So are you saying you use 128-bit registers to SIMD four 32-bit data at once? I've wondered about that for a long time, but I didn't think it would be possible due to addition carrying into the neighbour's value."
That's how it works: four 32-bit values in a 128-bit vector. They're calculated independently, but at the same time. By the way, why are you using this alignup<16> function when __attribute__ ((aligned (16))) will tell the compiler to align at compile time?
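To make the alignment comparison concrete, here is a minimal sketch of both approaches. The alignup template below follows the same round-up-the-pointer idea as the client's helper (written here with uintptr_t instead of a union), while the second buffer asks the compiler for the alignment directly. alignas(16) is the standard C++11 spelling of what __attribute__ ((aligned (16))) requests in GCC. The names is_aligned16, manual_aligned and static_aligned are illustrative only.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Round a pointer up to the next multiple of nBytes (a power of two).
// The caller must over-allocate by nBytes so the result stays in bounds.
template <size_t nBytes, typename T>
T* alignup(T* p) {
    uintptr_t n = reinterpret_cast<uintptr_t>(p);
    n = (n + (nBytes - 1)) & ~static_cast<uintptr_t>(nBytes - 1);
    return reinterpret_cast<T*>(n);
}

bool is_aligned16(const void* p) {
    return (reinterpret_cast<uintptr_t>(p) & 15u) == 0;
}

// Run-time approach: over-allocate by 16 bytes, then bump the pointer.
char raw_buffer[64 + 16];
char* manual_aligned() { return alignup<16>(raw_buffer); }

// Compile-time approach: the compiler places the object on a 16-byte boundary.
alignas(16) char static_buffer[64];
char* static_aligned() { return static_buffer; }
```

Both functions return a pointer that is safe for aligned 128-bit loads and stores; the compile-time version needs no spare bytes and no pointer arithmetic.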
|
|
|
|
em3rgentOrdr
|
|
July 31, 2010, 01:42:48 PM |
|
Hmm... I wasn't able to apply the patch (I'm a noobie). Here's the command I ran from bitcoin-0.3.6/src:
# patch < XN1JDb53.txt
Output:
1 out of 1 hunk ignored
(Stripping trailing CRs from patch.)
patching file main.cpp
Hunk #1 FAILED at 2555.
Hunk #2 FAILED at 2701.
2 out of 2 hunks FAILED
(Stripping trailing CRs from patch.)
patching file makefile.unix
Hunk #1 FAILED at 45.
Hunk #2 FAILED at 58.
What's the proper command to type into Linux? Or do you have Linux binaries?
|
"We will not find a solution to political problems in cryptography, but we can win a major battle in the arms race and gain a new territory of freedom for several years.
Governments are good at cutting off the heads of a centrally controlled networks, but pure P2P networks are holding their own."
|
|
|
tcatm (OP)
|
|
July 31, 2010, 02:18:03 PM |
|
The mean client would send all generated bitcoins to a certain address.
@em3rgentOrdr: I don't know why it fails, but it should be easy to apply the patch manually...
|
|
|
|
jgarzik
|
|
July 31, 2010, 05:18:30 PM |
|
Quote from em3rgentOrdr:
"hmm...I wasn't able to apply the patch (I'm a noobie). Here's the command I ran from bitcoin-0.3.6/src:
# patch < XN1JDb53.txt
Output:
1 out of 1 hunk ignored
(Stripping trailing CRs from patch.)
patching file main.cpp
Hunk #1 FAILED at 2555.
Hunk #2 FAILED at 2701.
2 out of 2 hunks FAILED
(Stripping trailing CRs from patch.)
patching file makefile.unix
Hunk #1 FAILED at 45.
Hunk #2 FAILED at 58."
It definitely does not apply to the SVN trunk. Maybe tcatm could post the main.cpp itself?
|
Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own. Visit bloq.com / metronome.io Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
|
|
|
tcatm (OP)
|
|
July 31, 2010, 05:40:27 PM |
|
Looks like pastebin.com messes up the patch...

diff --git a/cryptopp/sha256.cpp b/cryptopp/sha256.cpp
new file mode 100644
index 0000000..15f8be1
--- /dev/null
+++ b/cryptopp/sha256.cpp
@@ -0,0 +1,443 @@
+#include <string.h>
+#include <assert.h>
+
+#include <xmmintrin.h>
+#include <stdint.h>
+#include <stdio.h>
+
+#define NPAR 32
+
+static const unsigned int sha256_consts[] = {
+    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, /* 0 */
+    0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
+    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, /* 8 */
+    0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
+    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, /* 16 */
+    0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
+    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, /* 24 */
+    0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
+    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, /* 32 */
+    0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
+    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, /* 40 */
+    0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
+    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, /* 48 */
+    0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
+    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, /* 56 */
+    0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
+};
+
+
+static inline __m128i Ch(const __m128i b, const __m128i c, const __m128i d) {
+    return (b & c) ^ (~b & d);
+}
+
+static inline __m128i Maj(const __m128i b, const __m128i c, const __m128i d) {
+    return (b & c) ^ (b & d) ^ (c & d);
+}
+
+static inline __m128i ROTR(__m128i x, const int n) {
+    return _mm_srli_epi32(x, n) | _mm_slli_epi32(x, 32 - n);
+}
+
+static inline __m128i SHR(__m128i x, const int n) {
+    return _mm_srli_epi32(x, n);
+}
+
+/* SHA256 Functions */
+#define BIGSIGMA0_256(x) (ROTR((x), 2) ^ ROTR((x), 13) ^ ROTR((x), 22))
+#define BIGSIGMA1_256(x) (ROTR((x), 6) ^ ROTR((x), 11) ^ ROTR((x), 25))
+#define SIGMA0_256(x) (ROTR((x), 7) ^ ROTR((x), 18) ^ SHR((x), 3))
+#define SIGMA1_256(x) (ROTR((x), 17) ^ ROTR((x), 19) ^ SHR((x), 10))
+
+static inline __m128i load_epi32(const unsigned int x0, const unsigned int x1, const unsigned int x2, const unsigned int x3) {
+    return _mm_set_epi32(x0, x1, x2, x3);
+}
+
+static inline unsigned int store32(const __m128i x, int i) {
+    union { unsigned int ret[4]; __m128i x; } box;
+    box.x = x;
+    return box.ret[i];
+}
+
+static inline void store_epi32(const __m128i x, unsigned int *x0, unsigned int *x1, unsigned int *x2, unsigned int *x3) {
+    union { unsigned int ret[4]; __m128i x; } box;
+    box.x = x;
+    *x0 = box.ret[3]; *x1 = box.ret[2]; *x2 = box.ret[1]; *x3 = box.ret[0];
+}
+
+static inline __m128i SHA256_CONST(const int i) {
+    return _mm_set1_epi32(sha256_consts[i]);
+}
+
+#define add4(x0, x1, x2, x3) _mm_add_epi32(_mm_add_epi32(_mm_add_epi32(x0, x1), x2), x3)
+#define add5(x0, x1, x2, x3, x4) _mm_add_epi32(add4(x0, x1, x2, x3), x4)
+
+#define SHA256ROUND(a, b, c, d, e, f, g, h, i, w) \
+    T1 = add5(h, BIGSIGMA1_256(e), Ch(e, f, g), SHA256_CONST(i), w); \
+d = _mm_add_epi32(d, T1); \
+T2 = _mm_add_epi32(BIGSIGMA0_256(a), Maj(a, b, c)); \
+h = _mm_add_epi32(T1, T2);
+
+#define SHA256ROUND_lastd(a, b, c, d, e, f, g, h, i, w) \
+    T1 = add5(h, BIGSIGMA1_256(e), Ch(e, f, g), SHA256_CONST(i), w); \
+d = _mm_add_epi32(d, T1);
+//T2 = _mm_add_epi32(BIGSIGMA0_256(a), Maj(a, b, c));
+//h = _mm_add_epi32(T1, T2);
+
+#define SHA256ROUND_last(a, b, c, d, e, f, g, h, i, w) \
+    T1 = add5(h, BIGSIGMA1_256(e), Ch(e, f, g), SHA256_CONST(i), w); \
+T2 = _mm_add_epi32(BIGSIGMA0_256(a), Maj(a, b, c)); \
+h = _mm_add_epi32(T1, T2);
+
+static inline unsigned int swap(unsigned int value) {
+    __asm__ ("bswap %0" : "=r" (value) : "0" (value));
+    return value;
+}
+
+static inline unsigned int SWAP32(const void *addr) {
+    unsigned int value = (*((unsigned int *)(addr)));
+    __asm__ ("bswap %0" : "=r" (value) : "0" (value));
+    return value;
+}
+
+static inline void dumpreg(__m128i x, char *msg) {
+    union { unsigned int ret[4]; __m128i x; } box;
+    box.x = x ;
+    printf("%s
%08x %08x %08x %08x\n", msg, box.ret[0], box.ret[1], box.ret[2], box.ret[3]); +} + +#if 1 +#define dumpstate(i) printf("%s: %08x %08x %08x %08x %08x %08x %08x %08x %08x\n", \ + __func__, store32(w0, i), store32(a, i), store32(b, i), store32(c, i), store32(d, i), store32(e, i), store32(f, i), store32(g, i), store32(h, i)); +#else +#define dumpstate() +#endif +void Double_BlockSHA256(const void* pin, void* pad, const void *pre, unsigned int thash[8][NPAR], const void *init) +{ + unsigned int* In = (unsigned int*)pin; + unsigned int* Pad = (unsigned int*)pad; + unsigned int* hPre = (unsigned int*)pre; + unsigned int* hInit = (unsigned int*)init; + unsigned int i, j, k; + + /* vectors used in calculation */ + __m128i w0, w1, w2, w3, w4, w5, w6, w7; + __m128i w8, w9, w10, w11, w12, w13, w14, w15; + __m128i T1, T2; + __m128i a, b, c, d, e, f, g, h; + + /* nonce offset for vector */ + __m128i offset = load_epi32(0x00000003, 0x00000002, 0x00000001, 0x00000000); + + + for(k = 0; k<NPAR; k+=4) { + w0 = load_epi32(In[0], In[0], In[0], In[0]); + w1 = load_epi32(In[1], In[1], In[1], In[1]); + w2 = load_epi32(In[2], In[2], In[2], In[2]); + w3 = load_epi32(In[3], In[3], In[3], In[3]); + w4 = load_epi32(In[4], In[4], In[4], In[4]); + w5 = load_epi32(In[5], In[5], In[5], In[5]); + w6 = load_epi32(In[6], In[6], In[6], In[6]); + w7 = load_epi32(In[7], In[7], In[7], In[7]); + w8 = load_epi32(In[8], In[8], In[8], In[8]); + w9 = load_epi32(In[9], In[9], In[9], In[9]); + w10 = load_epi32(In[10], In[10], In[10], In[10]); + w11 = load_epi32(In[11], In[11], In[11], In[11]); + w12 = load_epi32(In[12], In[12], In[12], In[12]); + w13 = load_epi32(In[13], In[13], In[13], In[13]); + w14 = load_epi32(In[14], In[14], In[14], In[14]); + w15 = load_epi32(In[15], In[15], In[15], In[15]); + + /* hack nonce into lowest byte of w3 */ + __m128i k_vec = load_epi32(k, k, k, k); + w3 = _mm_add_epi32(w3, offset); + w3 = _mm_add_epi32(w3, k_vec); + + a = load_epi32(hPre[0], hPre[0], hPre[0], hPre[0]); + b = 
load_epi32(hPre[1], hPre[1], hPre[1], hPre[1]); + c = load_epi32(hPre[2], hPre[2], hPre[2], hPre[2]); + d = load_epi32(hPre[3], hPre[3], hPre[3], hPre[3]); + e = load_epi32(hPre[4], hPre[4], hPre[4], hPre[4]); + f = load_epi32(hPre[5], hPre[5], hPre[5], hPre[5]); + g = load_epi32(hPre[6], hPre[6], hPre[6], hPre[6]); + h = load_epi32(hPre[7], hPre[7], hPre[7], hPre[7]); + + SHA256ROUND(a, b, c, d, e, f, g, h, 0, w0); + SHA256ROUND(h, a, b, c, d, e, f, g, 1, w1); + SHA256ROUND(g, h, a, b, c, d, e, f, 2, w2); + SHA256ROUND(f, g, h, a, b, c, d, e, 3, w3); + SHA256ROUND(e, f, g, h, a, b, c, d, 4, w4); + SHA256ROUND(d, e, f, g, h, a, b, c, 5, w5); + SHA256ROUND(c, d, e, f, g, h, a, b, 6, w6); + SHA256ROUND(b, c, d, e, f, g, h, a, 7, w7); + SHA256ROUND(a, b, c, d, e, f, g, h, 8, w8); + SHA256ROUND(h, a, b, c, d, e, f, g, 9, w9); + SHA256ROUND(g, h, a, b, c, d, e, f, 10, w10); + SHA256ROUND(f, g, h, a, b, c, d, e, 11, w11); + SHA256ROUND(e, f, g, h, a, b, c, d, 12, w12); + SHA256ROUND(d, e, f, g, h, a, b, c, 13, w13); + SHA256ROUND(c, d, e, f, g, h, a, b, 14, w14); + SHA256ROUND(b, c, d, e, f, g, h, a, 15, w15); + + w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0); + SHA256ROUND(a, b, c, d, e, f, g, h, 16, w0); + w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1); + SHA256ROUND(h, a, b, c, d, e, f, g, 17, w1); + w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2); + SHA256ROUND(g, h, a, b, c, d, e, f, 18, w2); + w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3); + SHA256ROUND(f, g, h, a, b, c, d, e, 19, w3); + w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4); + SHA256ROUND(e, f, g, h, a, b, c, d, 20, w4); + w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5); + SHA256ROUND(d, e, f, g, h, a, b, c, 21, w5); + w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6); + SHA256ROUND(c, d, e, f, g, h, a, b, 22, w6); + w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7); + SHA256ROUND(b, c, d, e, f, g, h, a, 23, w7); + w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8); + 
SHA256ROUND(a, b, c, d, e, f, g, h, 24, w8); + w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9); + SHA256ROUND(h, a, b, c, d, e, f, g, 25, w9); + w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10); + SHA256ROUND(g, h, a, b, c, d, e, f, 26, w10); + w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11); + SHA256ROUND(f, g, h, a, b, c, d, e, 27, w11); + w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12); + SHA256ROUND(e, f, g, h, a, b, c, d, 28, w12); + w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13); + SHA256ROUND(d, e, f, g, h, a, b, c, 29, w13); + w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14); + SHA256ROUND(c, d, e, f, g, h, a, b, 30, w14); + w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15); + SHA256ROUND(b, c, d, e, f, g, h, a, 31, w15); + + w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0); + SHA256ROUND(a, b, c, d, e, f, g, h, 32, w0); + w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1); + SHA256ROUND(h, a, b, c, d, e, f, g, 33, w1); + w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2); + SHA256ROUND(g, h, a, b, c, d, e, f, 34, w2); + w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3); + SHA256ROUND(f, g, h, a, b, c, d, e, 35, w3); + w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4); + SHA256ROUND(e, f, g, h, a, b, c, d, 36, w4); + w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5); + SHA256ROUND(d, e, f, g, h, a, b, c, 37, w5); + w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6); + SHA256ROUND(c, d, e, f, g, h, a, b, 38, w6); + w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7); + SHA256ROUND(b, c, d, e, f, g, h, a, 39, w7); + w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8); + SHA256ROUND(a, b, c, d, e, f, g, h, 40, w8); + w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9); + SHA256ROUND(h, a, b, c, d, e, f, g, 41, w9); + w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10); + SHA256ROUND(g, h, a, b, c, d, e, f, 42, w10); + w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11); + SHA256ROUND(f, g, h, a, b, c, d, e, 43, w11); + w12 = 
add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12); + SHA256ROUND(e, f, g, h, a, b, c, d, 44, w12); + w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13); + SHA256ROUND(d, e, f, g, h, a, b, c, 45, w13); + w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14); + SHA256ROUND(c, d, e, f, g, h, a, b, 46, w14); + w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15); + SHA256ROUND(b, c, d, e, f, g, h, a, 47, w15); + + w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0); + SHA256ROUND(a, b, c, d, e, f, g, h, 48, w0); + w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1); + SHA256ROUND(h, a, b, c, d, e, f, g, 49, w1); + w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2); + SHA256ROUND(g, h, a, b, c, d, e, f, 50, w2); + w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3); + SHA256ROUND(f, g, h, a, b, c, d, e, 51, w3); + w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4); + SHA256ROUND(e, f, g, h, a, b, c, d, 52, w4); + w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5); + SHA256ROUND(d, e, f, g, h, a, b, c, 53, w5); + w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6); + SHA256ROUND(c, d, e, f, g, h, a, b, 54, w6); + w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7); + SHA256ROUND(b, c, d, e, f, g, h, a, 55, w7); + w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8); + SHA256ROUND(a, b, c, d, e, f, g, h, 56, w8); + w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9); + SHA256ROUND(h, a, b, c, d, e, f, g, 57, w9); + w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10); + SHA256ROUND(g, h, a, b, c, d, e, f, 58, w10); + w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11); + SHA256ROUND(f, g, h, a, b, c, d, e, 59, w11); + w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12); + SHA256ROUND(e, f, g, h, a, b, c, d, 60, w12); + w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13); + SHA256ROUND(d, e, f, g, h, a, b, c, 61, w13); + w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14); + SHA256ROUND(c, d, e, f, g, h, a, b, 62, w14); + w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), 
w15); + SHA256ROUND(b, c, d, e, f, g, h, a, 63, w15); + +#define store_load(x, i, dest) \ + w8 = load_epi32((hPre)[i], (hPre)[i], (hPre)[i], (hPre)[i]); \ + dest = _mm_add_epi32(w8, x); + + store_load(a, 0, w0); + store_load(b, 1, w1); + store_load(c, 2, w2); + store_load(d, 3, w3); + store_load(e, 4, w4); + store_load(f, 5, w5); + store_load(g, 6, w6); + store_load(h, 7, w7); + + w8 = load_epi32(Pad[8], Pad[8], Pad[8], Pad[8]); + w9 = load_epi32(Pad[9], Pad[9], Pad[9], Pad[9]); + w10 = load_epi32(Pad[10], Pad[10], Pad[10], Pad[10]); + w11 = load_epi32(Pad[11], Pad[11], Pad[11], Pad[11]); + w12 = load_epi32(Pad[12], Pad[12], Pad[12], Pad[12]); + w13 = load_epi32(Pad[13], Pad[13], Pad[13], Pad[13]); + w14 = load_epi32(Pad[14], Pad[14], Pad[14], Pad[14]); + w15 = load_epi32(Pad[15], Pad[15], Pad[15], Pad[15]); + + a = load_epi32(hInit[0], hInit[0], hInit[0], hInit[0]); + b = load_epi32(hInit[1], hInit[1], hInit[1], hInit[1]); + c = load_epi32(hInit[2], hInit[2], hInit[2], hInit[2]); + d = load_epi32(hInit[3], hInit[3], hInit[3], hInit[3]); + e = load_epi32(hInit[4], hInit[4], hInit[4], hInit[4]); + f = load_epi32(hInit[5], hInit[5], hInit[5], hInit[5]); + g = load_epi32(hInit[6], hInit[6], hInit[6], hInit[6]); + h = load_epi32(hInit[7], hInit[7], hInit[7], hInit[7]); + + SHA256ROUND(a, b, c, d, e, f, g, h, 0, w0); + SHA256ROUND(h, a, b, c, d, e, f, g, 1, w1); + SHA256ROUND(g, h, a, b, c, d, e, f, 2, w2); + SHA256ROUND(f, g, h, a, b, c, d, e, 3, w3); + SHA256ROUND(e, f, g, h, a, b, c, d, 4, w4); + SHA256ROUND(d, e, f, g, h, a, b, c, 5, w5); + SHA256ROUND(c, d, e, f, g, h, a, b, 6, w6); + SHA256ROUND(b, c, d, e, f, g, h, a, 7, w7); + SHA256ROUND(a, b, c, d, e, f, g, h, 8, w8); + SHA256ROUND(h, a, b, c, d, e, f, g, 9, w9); + SHA256ROUND(g, h, a, b, c, d, e, f, 10, w10); + SHA256ROUND(f, g, h, a, b, c, d, e, 11, w11); + SHA256ROUND(e, f, g, h, a, b, c, d, 12, w12); + SHA256ROUND(d, e, f, g, h, a, b, c, 13, w13); + SHA256ROUND(c, d, e, f, g, h, a, b, 14, w14); + 
SHA256ROUND(b, c, d, e, f, g, h, a, 15, w15); + + w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0); + SHA256ROUND(a, b, c, d, e, f, g, h, 16, w0); + w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1); + SHA256ROUND(h, a, b, c, d, e, f, g, 17, w1); + w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2); + SHA256ROUND(g, h, a, b, c, d, e, f, 18, w2); + w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3); + SHA256ROUND(f, g, h, a, b, c, d, e, 19, w3); + w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4); + SHA256ROUND(e, f, g, h, a, b, c, d, 20, w4); + w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5); + SHA256ROUND(d, e, f, g, h, a, b, c, 21, w5); + w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6); + SHA256ROUND(c, d, e, f, g, h, a, b, 22, w6); + w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7); + SHA256ROUND(b, c, d, e, f, g, h, a, 23, w7); + w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8); + SHA256ROUND(a, b, c, d, e, f, g, h, 24, w8); + w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9); + SHA256ROUND(h, a, b, c, d, e, f, g, 25, w9); + w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10); + SHA256ROUND(g, h, a, b, c, d, e, f, 26, w10); + w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11); + SHA256ROUND(f, g, h, a, b, c, d, e, 27, w11); + w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12); + SHA256ROUND(e, f, g, h, a, b, c, d, 28, w12); + w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13); + SHA256ROUND(d, e, f, g, h, a, b, c, 29, w13); + w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14); + SHA256ROUND(c, d, e, f, g, h, a, b, 30, w14); + w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15); + SHA256ROUND(b, c, d, e, f, g, h, a, 31, w15); + + w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0); + SHA256ROUND(a, b, c, d, e, f, g, h, 32, w0); + w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1); + SHA256ROUND(h, a, b, c, d, e, f, g, 33, w1); + w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2); + SHA256ROUND(g, h, a, b, c, d, e, f, 34, w2); + w3 = 
add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3); + SHA256ROUND(f, g, h, a, b, c, d, e, 35, w3); + w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4); + SHA256ROUND(e, f, g, h, a, b, c, d, 36, w4); + w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5); + SHA256ROUND(d, e, f, g, h, a, b, c, 37, w5); + w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6); + SHA256ROUND(c, d, e, f, g, h, a, b, 38, w6); + w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7); + SHA256ROUND(b, c, d, e, f, g, h, a, 39, w7); + w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8); + SHA256ROUND(a, b, c, d, e, f, g, h, 40, w8); + w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9); + SHA256ROUND(h, a, b, c, d, e, f, g, 41, w9); + w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10); + SHA256ROUND(g, h, a, b, c, d, e, f, 42, w10); + w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11); + SHA256ROUND(f, g, h, a, b, c, d, e, 43, w11); + w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12); + SHA256ROUND(e, f, g, h, a, b, c, d, 44, w12); + w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13); + SHA256ROUND(d, e, f, g, h, a, b, c, 45, w13); + w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14); + SHA256ROUND(c, d, e, f, g, h, a, b, 46, w14); + w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15); + SHA256ROUND(b, c, d, e, f, g, h, a, 47, w15); + + w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0); + SHA256ROUND(a, b, c, d, e, f, g, h, 48, w0); + w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1); + SHA256ROUND(h, a, b, c, d, e, f, g, 49, w1); + w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2); + SHA256ROUND(g, h, a, b, c, d, e, f, 50, w2); + w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3); + SHA256ROUND(f, g, h, a, b, c, d, e, 51, w3); + w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4); + SHA256ROUND(e, f, g, h, a, b, c, d, 52, w4); + w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5); + SHA256ROUND(d, e, f, g, h, a, b, c, 53, w5); + w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6); + 
SHA256ROUND(c, d, e, f, g, h, a, b, 54, w6); + w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7); + SHA256ROUND(b, c, d, e, f, g, h, a, 55, w7); + w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8); + SHA256ROUND(a, b, c, d, e, f, g, h, 56, w8); + w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9); + SHA256ROUND(h, a, b, c, d, e, f, g, 57, w9); + w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10); + SHA256ROUND(g, h, a, b, c, d, e, f, 58, w10); + w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11); + SHA256ROUND(f, g, h, a, b, c, d, e, 59, w11); + w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12); + SHA256ROUND(e, f, g, h, a, b, c, d, 60, w12); + w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13); + SHA256ROUND(d, e, f, g, h, a, b, c, 61, w13); + w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14); + SHA256ROUND(c, d, e, f, g, h, a, b, 62, w14); + w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15); + SHA256ROUND(b, c, d, e, f, g, h, a, 63, w15); + + /* store resulsts directly in thash */ +#define store_2(x,i) \ + w0 = load_epi32((hInit)[i], (hInit)[i], (hInit)[i], (hInit)[i]); \ + *(__m128i *)&(thash)[i][0+k] = _mm_add_epi32(w0, x); + + store_2(a, 0); + store_2(b, 1); + store_2(c, 2); + store_2(d, 3); + store_2(e, 4); + store_2(f, 5); + store_2(g, 6); + store_2(h, 7); + } + +} diff --git a/main.cpp b/main.cpp index ddc359a..d30d642 100755 --- a/main.cpp +++ b/main.cpp @@ -2555,8 +2555,10 @@ inline void SHA256Transform(void* pstate, void* pinput, const void* pinit) CryptoPP::SHA256::Transform((CryptoPP::word32*)pstate, (CryptoPP::word32*)pinput); } +// !!!! NPAR must match NPAR in cryptopp/sha256.cpp !!!! 
+#define NPAR 32 - +extern void Double_BlockSHA256(const void* pin, void* pout, const void *pinit, unsigned int hash[8][NPAR], const void *init2); void BitcoinMiner() @@ -2701,108 +2703,123 @@ void BitcoinMiner() uint256 hashTarget = CBigNum().SetCompact(pblock->nBits).getuint256(); uint256 hashbuf[2]; uint256& hash = *alignup<16>(hashbuf); + + // Cache for NPAR hashes + unsigned int thash[8][NPAR]; + + unsigned int j; loop { - SHA256Transform(&tmp.hash1, (char*)&tmp.block + 64, &midstate); - SHA256Transform(&hash, &tmp.hash1, pSHA256InitState); + Double_BlockSHA256((char*)&tmp.block + 64, &tmp.hash1, &midstate, thash, pSHA256InitState); - if (((unsigned short*)&hash)[14] == 0) + for(j = 0; j<NPAR; j++) { + if (thash[7][j] == 0) { - // Byte swap the result after preliminary check - for (int i = 0; i < sizeof(hash)/4; i++) - ((unsigned int*)&hash)[i] = ByteReverse(((unsigned int*)&hash)[i]); - - if (hash <= hashTarget) + // Byte swap the result after preliminary check + for (int i = 0; i < sizeof(hash)/4; i++) + ((unsigned int*)&hash)[i] = ByteReverse((unsigned int)thash[i][j]); + + if (hash <= hashTarget) + { + // Double_BlocSHA256 might only calculate parts of the hash. + // We'll insert the nonce and get the real hash. 
+ //pblock->nNonce = ByteReverse(tmp.block.nNonce + j); + //hash = pblock->GetHash(); + + pblock->nNonce = ByteReverse(tmp.block.nNonce + j); + assert(hash == pblock->GetHash()); + + //// debug print + printf("BitcoinMiner:\n"); + printf("proof-of-work found \n hash: %s \ntarget: %s\n", hash.GetHex().c_str(), hashTarget.GetHex().c_str()); + pblock->print(); + printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str()); + printf("generated %s\n", FormatMoney(pblock->vtx[0].vout[0].nValue).c_str()); + + SetThreadPriority(THREAD_PRIORITY_NORMAL); + CRITICAL_BLOCK(cs_main) { - pblock->nNonce = ByteReverse(tmp.block.nNonce); - assert(hash == pblock->GetHash()); - - //// debug print - printf("BitcoinMiner:\n"); - printf("proof-of-work found \n hash: %s \ntarget: %s\n", hash.GetHex().c_str(), hashTarget.GetHex().c_str()); - pblock->print(); - printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str()); - printf("generated %s\n", FormatMoney(pblock->vtx[0].vout[0].nValue).c_str()); - - SetThreadPriority(THREAD_PRIORITY_NORMAL); - CRITICAL_BLOCK(cs_main) - { - if (pindexPrev == pindexBest) - { - // Save key - if (!AddKey(key)) - return; - key.MakeNewKey(); - - // Track how many getdata requests this block gets - CRITICAL_BLOCK(cs_mapRequestCount) - mapRequestCount[pblock->GetHash()] = 0; - - // Process this block the same as if we had received it from another node - if (!ProcessBlock(NULL, pblock.release())) - printf("ERROR in BitcoinMiner, ProcessBlock, block not accepted\n"); - } - } - SetThreadPriority(THREAD_PRIORITY_LOWEST); - - Sleep(500); - break; + if (pindexPrev == pindexBest) + { + // Save key + if (!AddKey(key)) + return; + key.MakeNewKey(); + + // Track how many getdata requests this block gets + CRITICAL_BLOCK(cs_mapRequestCount) + mapRequestCount[pblock->GetHash()] = 0; + + // Process this block the same as if we had received it from another node + if (!ProcessBlock(NULL, pblock.release())) + printf("ERROR in BitcoinMiner, ProcessBlock, block not 
accepted\n"); + + } } - } + SetThreadPriority(THREAD_PRIORITY_LOWEST); - // Update nTime every few seconds - const unsigned int nMask = 0xffff; - if ((++tmp.block.nNonce & nMask) == 0) + Sleep(500); + break; + } + } + } + + // Update nonce + tmp.block.nNonce += NPAR; + + // Update nTime every few seconds + const unsigned int nMask = 0xffff; + if ((tmp.block.nNonce & nMask) == 0) + { + // Meter hashes/sec + static int64 nTimerStart; + static int nHashCounter; + if (nTimerStart == 0) + nTimerStart = GetTimeMillis(); + else + nHashCounter++; + if (GetTimeMillis() - nTimerStart > 4000) { - // Meter hashes/sec - static int64 nTimerStart; - static int nHashCounter; - if (nTimerStart == 0) - nTimerStart = GetTimeMillis(); - else - nHashCounter++; + static CCriticalSection cs; + CRITICAL_BLOCK(cs) + { if (GetTimeMillis() - nTimerStart > 4000) { - static CCriticalSection cs; - CRITICAL_BLOCK(cs) - { - if (GetTimeMillis() - nTimerStart > 4000) - { - double dHashesPerSec = 1000.0 * (nMask+1) * nHashCounter / (GetTimeMillis() - nTimerStart); - nTimerStart = GetTimeMillis(); - nHashCounter = 0; - string strStatus = strprintf(" %.0f khash/s", dHashesPerSec/1000.0); - UIThreadCall(bind(CalledSetStatusBar, strStatus, 0)); - static int64 nLogTime; - if (GetTime() - nLogTime > 30 * 60) - { - nLogTime = GetTime(); - printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str()); - printf("hashmeter %3d CPUs %6.0f khash/s\n", vnThreadsRunning[3], dHashesPerSec/1000.0); - } - } - } + double dHashesPerSec = 1000.0 * (nMask+1) * nHashCounter / (GetTimeMillis() - nTimerStart); + nTimerStart = GetTimeMillis(); + nHashCounter = 0; + string strStatus = strprintf(" %.0f khash/s", dHashesPerSec/1000.0); + UIThreadCall(bind(CalledSetStatusBar, strStatus, 0)); + static int64 nLogTime; + if (GetTime() - nLogTime > 30 * 60) + { + nLogTime = GetTime(); + printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str()); + printf("hashmeter %3d CPUs %6.0f khash/s\n", vnThreadsRunning[3], 
dHashesPerSec/1000.0); + } } - - // Check for stop or if block needs to be rebuilt - if (fShutdown) - return; - if (!fGenerateBitcoins) - return; - if (fLimitProcessors && vnThreadsRunning[3] > nLimitProcessors) - return; - if (vNodes.empty()) - break; - if (tmp.block.nNonce == 0) - break; - if (nTransactionsUpdated != nTransactionsUpdatedLast && GetTime() - nStart > 60) - break; - if (pindexPrev != pindexBest) - break; - - pblock->nTime = max(pindexPrev->GetMedianTimePast()+1, GetAdjustedTime()); - tmp.block.nTime = ByteReverse(pblock->nTime); + } } + + // Check for stop or if block needs to be rebuilt + if (fShutdown) + return; + if (!fGenerateBitcoins) + return; + if (fLimitProcessors && vnThreadsRunning[3] > nLimitProcessors) + return; + if (vNodes.empty()) + break; + if (tmp.block.nNonce == 0) + break; + if (nTransactionsUpdated != nTransactionsUpdatedLast && GetTime() - nStart > 60) + break; + if (pindexPrev != pindexBest) + break; + + pblock->nTime = max(pindexPrev->GetMedianTimePast()+1, GetAdjustedTime()); + tmp.block.nTime = ByteReverse(pblock->nTime); + } } } } diff --git a/makefile.unix b/makefile.unix index 597a0ea..8fb0aa6 100755 --- a/makefile.unix +++ b/makefile.unix @@ -45,7 +45,8 @@ OBJS= \ obj/rpc.o \ obj/init.o \ cryptopp/obj/sha.o \ - cryptopp/obj/cpu.o + cryptopp/obj/cpu.o \ + cryptopp/obj/sha256.o all: bitcoin @@ -58,18 +59,20 @@ obj/%.o: %.cpp $(HEADERS) headers.h.gch g++ -c $(CFLAGS) -DGUI -o $@ $< cryptopp/obj/%.o: cryptopp/%.cpp - g++ -c $(CFLAGS) -O3 -DCRYPTOPP_DISABLE_SSE2 -o $@ $< + g++ -c $(CFLAGS) -frename-registers -funroll-all-loops -fomit-frame-pointer -march=native -msse2 -msse3 -ffast-math -O3 -o $@ $< bitcoin: $(OBJS) obj/ui.o obj/uibase.o g++ $(CFLAGS) -o $@ $(LIBPATHS) $^ $(WXLIBS) $(LIBS) - obj/nogui/%.o: %.cpp $(HEADERS) g++ -c $(CFLAGS) -o $@ $< bitcoind: $(OBJS:obj/%=obj/nogui/%) g++ $(CFLAGS) -o $@ $(LIBPATHS) $^ $(LIBS) +test: cryptopp/obj/sha.o cryptopp/obj/sha256.o test.cpp + g++ $(CFLAGS) -o $@ $(LIBPATHS) $^ 
$(WXLIBS) $(LIBS) + clean: -rm -f obj/*.o diff --git a/test.cpp b/test.cpp new file mode 100755 index 0000000..7cab332 --- /dev/null +++ b/test.cpp @@ -0,0 +1,237 @@ +// Copyright (c) 2009-2010 Satoshi Nakamoto +// Distributed under the MIT/X11 software license, see the accompanying +// file license.txt or http://www.opensource.org/licenses/mit-license.php. +#include <assert.h> +#include <openssl/ecdsa.h> +#include <openssl/evp.h> +#include <openssl/rand.h> +#include <openssl/sha.h> +#include <openssl/ripemd.h> +#include <db_cxx.h> +#include <stdio.h> +#include <stdlib.h> +#include <math.h> +#include <limits.h> +#include <float.h> +#include <assert.h> +#include <memory> +#include <iostream> +#include <sstream> +#include <string> +#include <vector> +#include <list> +#include <deque> +#include <map> +#include <set> +#include <algorithm> +#include <numeric> +#include <boost/foreach.hpp> +#include <boost/lexical_cast.hpp> +#include <boost/tuple/tuple.hpp> +#include <boost/fusion/container/vector.hpp> +#include <boost/tuple/tuple_comparison.hpp> +#include <boost/tuple/tuple_io.hpp> +#include <boost/array.hpp> +#include <boost/bind.hpp> +#include <boost/function.hpp> +#include <boost/filesystem.hpp> +#include <boost/filesystem/fstream.hpp> +#include <boost/algorithm/string.hpp> +#include <boost/interprocess/sync/interprocess_mutex.hpp> +#include <boost/interprocess/sync/interprocess_recursive_mutex.hpp> +#include <boost/date_time/gregorian/gregorian_types.hpp> +#include <boost/date_time/posix_time/posix_time_types.hpp> +#include <sys/resource.h> +#include <sys/time.h> +using namespace std; +using namespace boost; +#include "cryptopp/sha.h" +#include "strlcpy.h" +#include "serialize.h" +#include "uint256.h" +#include "bignum.h" + +#undef printf + template <size_t nBytes, typename T> +T* alignup(T* p) +{ + union + { + T* ptr; + size_t n; + } u; + u.ptr = p; + u.n = (u.n + (nBytes-1)) & ~(nBytes-1); + return u.ptr; +} + +int FormatHashBlocks(void* pbuffer, unsigned int len) 
+{ + unsigned char* pdata = (unsigned char*)pbuffer; + unsigned int blocks = 1 + ((len + 8) / 64); + unsigned char* pend = pdata + 64 * blocks; + memset(pdata + len, 0, 64 * blocks - len); + pdata[len] = 0x80; + unsigned int bits = len * 8; + pend[-1] = (bits >> 0) & 0xff; + pend[-2] = (bits >> 8) & 0xff; + pend[-3] = (bits >> 16) & 0xff; + pend[-4] = (bits >> 24) & 0xff; + return blocks; +} + +using CryptoPP::ByteReverse; +static int detectlittleendian = 1; + +#define NPAR 32 + +extern void Double_BlockSHA256(const void* pin, void* pout, const void *pinit, unsigned int hash[8][NPAR], const void *init2); + +using CryptoPP::ByteReverse; + +static const unsigned int pSHA256InitState[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19}; + +inline void SHA256Transform(void* pstate, void* pinput, const void* pinit) +{ + memcpy(pstate, pinit, 32); + CryptoPP::SHA256::Transform((CryptoPP::word32*)pstate, (CryptoPP::word32*)pinput); +} + +void BitcoinTester(char *filename) +{ + printf("SHA256 test started\n"); + + struct tmpworkspace + { + struct unnamed2 + { + int nVersion; + uint256 hashPrevBlock; + uint256 hashMerkleRoot; + unsigned int nTime; + unsigned int nBits; + unsigned int nNonce; + } + block; + unsigned char pchPadding0[64]; + uint256 hash1; + unsigned char pchPadding1[64]; + }; + char tmpbuf[sizeof(tmpworkspace)+16]; + tmpworkspace& tmp = *(tmpworkspace*)alignup<16>(tmpbuf); + + + char line[180]; + ifstream fin(filename); + char *p; + unsigned long int totalhashes= 0; + unsigned long int found = 0; + clock_t start, end; + unsigned long int cpu_time_used; + unsigned int tnonce; + start = clock(); + + while( fin.getline(line, 180)) + { + string in(line); + //printf("%s\n", in.c_str()); + tmp.block.nVersion = strtol(in.substr(0,8).c_str(), &p, 16); + tmp.block.hashPrevBlock.SetHex(in.substr(8,64)); + tmp.block.hashMerkleRoot.SetHex(in.substr(64+8,64)); + tmp.block.nTime = strtol(in.substr(128+8,8).c_str(), &p, 16); 
+ tmp.block.nBits = strtol(in.substr(128+16,8).c_str(), &p, 16); + tnonce = strtol(in.substr(128+24,8).c_str(), &p, 16); + tmp.block.nNonce = tnonce; + + unsigned int nBlocks0 = FormatHashBlocks(&tmp.block, sizeof(tmp.block)); + unsigned int nBlocks1 = FormatHashBlocks(&tmp.hash1, sizeof(tmp.hash1)); + + // Byte swap all the input buffer + for (int i = 0; i < sizeof(tmp)/4; i++) + ((unsigned int*)&tmp)[i] = ByteReverse(((unsigned int*)&tmp)[i]); + + // Precalc the first half of the first hash, which stays constant + uint256 midstatebuf[2]; + uint256& midstate = *alignup<16>(midstatebuf); + SHA256Transform(&midstate, &tmp.block, pSHA256InitState); + + + uint256 hashTarget = CBigNum().SetCompact(ByteReverse(tmp.block.nBits)).getuint256(); + // printf("target %s\n", hashTarget.GetHex().c_str()); + uint256 hash; + uint256 hashbuf[2]; + uint256& refhash = *alignup<16>(hashbuf); + + unsigned int thash[8][NPAR]; + int done = 0; + unsigned int i, j; + + /* reference */ + SHA256Transform(&tmp.hash1, (char*)&tmp.block + 64, &midstate); + SHA256Transform(&refhash, &tmp.hash1, pSHA256InitState); + for (int i = 0; i < sizeof(refhash)/4; i++) + ((unsigned int*)&refhash)[i] = ByteReverse(((unsigned int*)&refhash)[i]); + + //printf("reference nonce %08x:\n%s\n\n", tnonce, refhash.GetHex().c_str()); + + tmp.block.nNonce = ByteReverse(tnonce) & 0xfffff000; + + + for(;;) + { + + Double_BlockSHA256((char*)&tmp.block + 64, &tmp.hash1, &midstate, thash, pSHA256InitState); + + for(i = 0; i<NPAR; i++) { + /* fast hash checking */ + if(thash[7][i] == 0) { + // printf("found something... 
"); + + for(j = 0; j<8; j++) ((unsigned int *)&hash)[j] = ByteReverse((unsigned int)thash[j][i]); + // printf("%s\n", hash.GetHex().c_str()); + + if (hash <= hashTarget) + { + found++; + if(tnonce == ByteReverse(tmp.block.nNonce + i) ) { + if(hash == refhash) { + printf("\r%lu", found); + totalhashes += NPAR; + done = 1; + } else { + printf("Hashes do not match!\n"); + } + } else { + printf("nonce does not match. %08x != %08x\n", tnonce, ByteReverse(tmp.block.nNonce + i)); + } + break; + } + } + } + if(done) break; + + tmp.block.nNonce+=NPAR; + totalhashes += NPAR; + if(tmp.block.nNonce == 0) { + printf("ERROR: Hash not found for:\n%s\n", in.c_str()); + return; + } + } + } + printf("\n"); + end = clock(); + cpu_time_used += (unsigned int)(end - start); + cpu_time_used /= ((CLOCKS_PER_SEC)/1000); + printf("found solutions = %lu\n", found); + printf("total hashes = %lu\n", totalhashes); + printf("total time = %lu ms\n", cpu_time_used); + printf("average speed: %lu khash/s\n", (totalhashes)/cpu_time_used); +} + +int main(int argc, char* argv[]) { + if(argc == 2) { + BitcoinTester(argv[1]); + } else + printf("Missing filename!\n"); + return 0; +}
|
|
|
|
nelisky
Legendary
Offline
Activity: 1540
Merit: 1002
|
|
July 31, 2010, 07:17:17 PM |
|
Had to patch manually, as I'm not using git for bitcoin and 'patch' doesn't munch this format, I guess. Anyway, I got almost double the speed on the OSX side (i5 2.4, now ~2400 from ~1400), but my Linux box on a Q6600 quad 2.4GHz was pumping ~2500 with 0.3.6 (from source) and now, with the patch, it's... ~2400. Do I need to tweak anything to take advantage of this?
|
|
|
|
nelisky
|
|
July 31, 2010, 08:34:06 PM |
|
ahm, let me correct myself: on the quad core linux, I went from ~4400 with svn trunk @ 119 to ~2400 with the patch... not exactly what I hoped for after the success in OSX.
|
|
|
|
aceat64
|
|
July 31, 2010, 08:57:49 PM |
|
ahm, let me correct myself: on the quad core linux, I went from ~4400 with svn trunk @ 119 to ~2400 with the patch... not exactly what I hoped for after the success in OSX.
I noticed the same: I went from about 4300 to 2100 when I tested it on Linux.
|
|
|
|
tcatm (OP)
|
|
July 31, 2010, 10:38:29 PM |
|
What CPUs are you running it on? Could you send me sha256.o (compiled object of the algorithm)?
|
|
|
|
nelisky
|
|
July 31, 2010, 11:18:07 PM |
|
I'm running on the Intel Q6600 2.4Ghz, how shall I get the file to you?
|
|
|
|
Mionione
Newbie
Offline
Activity: 10
Merit: 1
|
|
July 31, 2010, 11:30:11 PM |
|
Careful with __attribute__ ((aligned (16))): it doesn't work with local variables, because gcc doesn't align the stack.
|
|
|
|
tcatm (OP)
|
|
July 31, 2010, 11:37:02 PM |
|
I'm running on the Intel Q6600 2.4Ghz, how shall I get the file to you?
Yes. I will look at the assembler code. Maybe the compiler did something "wrong".
|
|
|
|
tcatm (OP)
|
|
August 01, 2010, 12:00:10 AM |
|
Patch against SVN. Maybe it'll work now... diff --git a/cryptopp/sha256.cpp b/cryptopp/sha256.cpp new file mode 100644 index 0000000..6735678 --- /dev/null +++ b/cryptopp/sha256.cpp @@ -0,0 +1,447 @@ +#include <string.h> +#include <assert.h> + +#include <xmmintrin.h> +#include <stdint.h> +#include <stdio.h> + +#define NPAR 32 + +static const unsigned int sha256_consts[] = { + 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, /* 0 */ + 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5, + 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, /* 8 */ + 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174, + 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, /* 16 */ + 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da, + 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, /* 24 */ + 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967, + 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, /* 32 */ + 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85, + 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, /* 40 */ + 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070, + 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, /* 48 */ + 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3, + 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, /* 56 */ + 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 +}; + + +static inline __m128i Ch(const __m128i b, const __m128i c, const __m128i d) { + return (b & c) ^ (~b & d); +} + +static inline __m128i Maj(const __m128i b, const __m128i c, const __m128i d) { + return (b & c) ^ (b & d) ^ (c & d); +} + +static inline __m128i ROTR(__m128i x, const int n) { + return _mm_srli_epi32(x, n) | _mm_slli_epi32(x, 32 - n); +} + +static inline __m128i SHR(__m128i x, const int n) { + return _mm_srli_epi32(x, n); +} + +/* SHA256 Functions */ +#define BIGSIGMA0_256(x) (ROTR((x), 2) ^ ROTR((x), 13) ^ ROTR((x), 22)) +#define BIGSIGMA1_256(x) (ROTR((x), 6) ^ ROTR((x), 11) ^ ROTR((x), 25)) +#define SIGMA0_256(x) (ROTR((x), 7) ^ ROTR((x), 18) ^ SHR((x), 3)) +#define SIGMA1_256(x) (ROTR((x), 17) ^ ROTR((x), 19) ^ SHR((x), 
10)) + +static inline __m128i load_epi32(const unsigned int x0, const unsigned int x1, const unsigned int x2, const unsigned int x3) { + return _mm_set_epi32(x0, x1, x2, x3); +} + +static inline unsigned int store32(const __m128i x, int i) { + union { unsigned int ret[4]; __m128i x; } box; + box.x = x; + return box.ret[i]; +} + +static inline void store_epi32(const __m128i x, unsigned int *x0, unsigned int *x1, unsigned int *x2, unsigned int *x3) { + union { unsigned int ret[4]; __m128i x; } box; + box.x = x; + *x0 = box.ret[3]; *x1 = box.ret[2]; *x2 = box.ret[1]; *x3 = box.ret[0]; +} + +static inline __m128i SHA256_CONST(const int i) { + return _mm_set1_epi32(sha256_consts[i]); +} + +#define add4(x0, x1, x2, x3) _mm_add_epi32(_mm_add_epi32(_mm_add_epi32(x0, x1), x2), x3) +#define add5(x0, x1, x2, x3, x4) _mm_add_epi32(add4(x0, x1, x2, x3), x4) + +#define SHA256ROUND(a, b, c, d, e, f, g, h, i, w) \ + T1 = add5(h, BIGSIGMA1_256(e), Ch(e, f, g), SHA256_CONST(i), w); \ +d = _mm_add_epi32(d, T1); \ +T2 = _mm_add_epi32(BIGSIGMA0_256(a), Maj(a, b, c)); \ +h = _mm_add_epi32(T1, T2); + +#define SHA256ROUND_lastd(a, b, c, d, e, f, g, h, i, w) \ + T1 = add5(h, BIGSIGMA1_256(e), Ch(e, f, g), SHA256_CONST(i), w); \ +d = _mm_add_epi32(d, T1); +//T2 = _mm_add_epi32(BIGSIGMA0_256(a), Maj(a, b, c)); +//h = _mm_add_epi32(T1, T2); + +#define SHA256ROUND_last(a, b, c, d, e, f, g, h, i, w) \ + T1 = add5(h, BIGSIGMA1_256(e), Ch(e, f, g), SHA256_CONST(i), w); \ +T2 = _mm_add_epi32(BIGSIGMA0_256(a), Maj(a, b, c)); \ +h = _mm_add_epi32(T1, T2); + +static inline unsigned int swap(unsigned int value) { + __asm__ ("bswap %0" : "=r" (value) : "0" (value)); + return value; +} + +static inline unsigned int SWAP32(const void *addr) { + unsigned int value = (*((unsigned int *)(addr))); + __asm__ ("bswap %0" : "=r" (value) : "0" (value)); + return value; +} + +static inline void dumpreg(__m128i x, char *msg) { + union { unsigned int ret[4]; __m128i x; } box; + box.x = x ; + printf("%s %08x %08x 
%08x %08x\n", msg, box.ret[0], box.ret[1], box.ret[2], box.ret[3]); +} + +#if 1 +#define dumpstate(i) printf("%s: %08x %08x %08x %08x %08x %08x %08x %08x %08x\n", \ + __func__, store32(w0, i), store32(a, i), store32(b, i), store32(c, i), store32(d, i), store32(e, i), store32(f, i), store32(g, i), store32(h, i)); +#else +#define dumpstate() +#endif +void Double_BlockSHA256(const void* pin, void* pad, const void *pre, unsigned int thash[9][NPAR], const void *init) +{ + unsigned int* In = (unsigned int*)pin; + unsigned int* Pad = (unsigned int*)pad; + unsigned int* hPre = (unsigned int*)pre; + unsigned int* hInit = (unsigned int*)init; + unsigned int i, j, k; + + /* vectors used in calculation */ + __m128i w0, w1, w2, w3, w4, w5, w6, w7; + __m128i w8, w9, w10, w11, w12, w13, w14, w15; + __m128i T1, T2; + __m128i a, b, c, d, e, f, g, h; + __m128i nonce; + + /* nonce offset for vector */ + __m128i offset = load_epi32(0x00000003, 0x00000002, 0x00000001, 0x00000000); + + + for(k = 0; k<NPAR; k+=4) { + w0 = load_epi32(In[0], In[0], In[0], In[0]); + w1 = load_epi32(In[1], In[1], In[1], In[1]); + w2 = load_epi32(In[2], In[2], In[2], In[2]); + //w3 = load_epi32(In[3], In[3], In[3], In[3]); nonce will be later hacked into the hash + w4 = load_epi32(In[4], In[4], In[4], In[4]); + w5 = load_epi32(In[5], In[5], In[5], In[5]); + w6 = load_epi32(In[6], In[6], In[6], In[6]); + w7 = load_epi32(In[7], In[7], In[7], In[7]); + w8 = load_epi32(In[8], In[8], In[8], In[8]); + w9 = load_epi32(In[9], In[9], In[9], In[9]); + w10 = load_epi32(In[10], In[10], In[10], In[10]); + w11 = load_epi32(In[11], In[11], In[11], In[11]); + w12 = load_epi32(In[12], In[12], In[12], In[12]); + w13 = load_epi32(In[13], In[13], In[13], In[13]); + w14 = load_epi32(In[14], In[14], In[14], In[14]); + w15 = load_epi32(In[15], In[15], In[15], In[15]); + + /* hack nonce into lowest byte of w3 */ + nonce = load_epi32(In[3], In[3], In[3], In[3]); + __m128i k_vec = load_epi32(k, k, k, k); + nonce = _mm_add_epi32(nonce, 
offset); + nonce = _mm_add_epi32(nonce, k_vec); + w3 = nonce; + + a = load_epi32(hPre[0], hPre[0], hPre[0], hPre[0]); + b = load_epi32(hPre[1], hPre[1], hPre[1], hPre[1]); + c = load_epi32(hPre[2], hPre[2], hPre[2], hPre[2]); + d = load_epi32(hPre[3], hPre[3], hPre[3], hPre[3]); + e = load_epi32(hPre[4], hPre[4], hPre[4], hPre[4]); + f = load_epi32(hPre[5], hPre[5], hPre[5], hPre[5]); + g = load_epi32(hPre[6], hPre[6], hPre[6], hPre[6]); + h = load_epi32(hPre[7], hPre[7], hPre[7], hPre[7]); + + SHA256ROUND(a, b, c, d, e, f, g, h, 0, w0); + SHA256ROUND(h, a, b, c, d, e, f, g, 1, w1); + SHA256ROUND(g, h, a, b, c, d, e, f, 2, w2); + SHA256ROUND(f, g, h, a, b, c, d, e, 3, w3); + SHA256ROUND(e, f, g, h, a, b, c, d, 4, w4); + SHA256ROUND(d, e, f, g, h, a, b, c, 5, w5); + SHA256ROUND(c, d, e, f, g, h, a, b, 6, w6); + SHA256ROUND(b, c, d, e, f, g, h, a, 7, w7); + SHA256ROUND(a, b, c, d, e, f, g, h, 8, w8); + SHA256ROUND(h, a, b, c, d, e, f, g, 9, w9); + SHA256ROUND(g, h, a, b, c, d, e, f, 10, w10); + SHA256ROUND(f, g, h, a, b, c, d, e, 11, w11); + SHA256ROUND(e, f, g, h, a, b, c, d, 12, w12); + SHA256ROUND(d, e, f, g, h, a, b, c, 13, w13); + SHA256ROUND(c, d, e, f, g, h, a, b, 14, w14); + SHA256ROUND(b, c, d, e, f, g, h, a, 15, w15); + + w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0); + SHA256ROUND(a, b, c, d, e, f, g, h, 16, w0); + w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1); + SHA256ROUND(h, a, b, c, d, e, f, g, 17, w1); + w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2); + SHA256ROUND(g, h, a, b, c, d, e, f, 18, w2); + w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3); + SHA256ROUND(f, g, h, a, b, c, d, e, 19, w3); + w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4); + SHA256ROUND(e, f, g, h, a, b, c, d, 20, w4); + w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5); + SHA256ROUND(d, e, f, g, h, a, b, c, 21, w5); + w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6); + SHA256ROUND(c, d, e, f, g, h, a, b, 22, w6); + w7 = add4(SIGMA1_256(w5), w0, 
SIGMA0_256(w8), w7); + SHA256ROUND(b, c, d, e, f, g, h, a, 23, w7); + w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8); + SHA256ROUND(a, b, c, d, e, f, g, h, 24, w8); + w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9); + SHA256ROUND(h, a, b, c, d, e, f, g, 25, w9); + w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10); + SHA256ROUND(g, h, a, b, c, d, e, f, 26, w10); + w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11); + SHA256ROUND(f, g, h, a, b, c, d, e, 27, w11); + w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12); + SHA256ROUND(e, f, g, h, a, b, c, d, 28, w12); + w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13); + SHA256ROUND(d, e, f, g, h, a, b, c, 29, w13); + w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14); + SHA256ROUND(c, d, e, f, g, h, a, b, 30, w14); + w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15); + SHA256ROUND(b, c, d, e, f, g, h, a, 31, w15); + + w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0); + SHA256ROUND(a, b, c, d, e, f, g, h, 32, w0); + w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1); + SHA256ROUND(h, a, b, c, d, e, f, g, 33, w1); + w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2); + SHA256ROUND(g, h, a, b, c, d, e, f, 34, w2); + w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3); + SHA256ROUND(f, g, h, a, b, c, d, e, 35, w3); + w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4); + SHA256ROUND(e, f, g, h, a, b, c, d, 36, w4); + w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5); + SHA256ROUND(d, e, f, g, h, a, b, c, 37, w5); + w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6); + SHA256ROUND(c, d, e, f, g, h, a, b, 38, w6); + w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7); + SHA256ROUND(b, c, d, e, f, g, h, a, 39, w7); + w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8); + SHA256ROUND(a, b, c, d, e, f, g, h, 40, w8); + w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9); + SHA256ROUND(h, a, b, c, d, e, f, g, 41, w9); + w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10); + SHA256ROUND(g, h, a, b, c, d, e, 
f, 42, w10); + w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11); + SHA256ROUND(f, g, h, a, b, c, d, e, 43, w11); + w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12); + SHA256ROUND(e, f, g, h, a, b, c, d, 44, w12); + w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13); + SHA256ROUND(d, e, f, g, h, a, b, c, 45, w13); + w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14); + SHA256ROUND(c, d, e, f, g, h, a, b, 46, w14); + w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15); + SHA256ROUND(b, c, d, e, f, g, h, a, 47, w15); + + w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0); + SHA256ROUND(a, b, c, d, e, f, g, h, 48, w0); + w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1); + SHA256ROUND(h, a, b, c, d, e, f, g, 49, w1); + w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2); + SHA256ROUND(g, h, a, b, c, d, e, f, 50, w2); + w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3); + SHA256ROUND(f, g, h, a, b, c, d, e, 51, w3); + w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4); + SHA256ROUND(e, f, g, h, a, b, c, d, 52, w4); + w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5); + SHA256ROUND(d, e, f, g, h, a, b, c, 53, w5); + w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6); + SHA256ROUND(c, d, e, f, g, h, a, b, 54, w6); + w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7); + SHA256ROUND(b, c, d, e, f, g, h, a, 55, w7); + w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8); + SHA256ROUND(a, b, c, d, e, f, g, h, 56, w8); + w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9); + SHA256ROUND(h, a, b, c, d, e, f, g, 57, w9); + w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10); + SHA256ROUND(g, h, a, b, c, d, e, f, 58, w10); + w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11); + SHA256ROUND(f, g, h, a, b, c, d, e, 59, w11); + w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12); + SHA256ROUND(e, f, g, h, a, b, c, d, 60, w12); + w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13); + SHA256ROUND(d, e, f, g, h, a, b, c, 61, w13); + w14 = add4(SIGMA1_256(w12), w7, 
SIGMA0_256(w15), w14); + SHA256ROUND(c, d, e, f, g, h, a, b, 62, w14); + w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15); + SHA256ROUND(b, c, d, e, f, g, h, a, 63, w15); + +#define store_load(x, i, dest) \ + w8 = load_epi32((hPre)[i], (hPre)[i], (hPre)[i], (hPre)[i]); \ + dest = _mm_add_epi32(w8, x); + + store_load(a, 0, w0); + store_load(b, 1, w1); + store_load(c, 2, w2); + store_load(d, 3, w3); + store_load(e, 4, w4); + store_load(f, 5, w5); + store_load(g, 6, w6); + store_load(h, 7, w7); + + w8 = load_epi32(Pad[8], Pad[8], Pad[8], Pad[8]); + w9 = load_epi32(Pad[9], Pad[9], Pad[9], Pad[9]); + w10 = load_epi32(Pad[10], Pad[10], Pad[10], Pad[10]); + w11 = load_epi32(Pad[11], Pad[11], Pad[11], Pad[11]); + w12 = load_epi32(Pad[12], Pad[12], Pad[12], Pad[12]); + w13 = load_epi32(Pad[13], Pad[13], Pad[13], Pad[13]); + w14 = load_epi32(Pad[14], Pad[14], Pad[14], Pad[14]); + w15 = load_epi32(Pad[15], Pad[15], Pad[15], Pad[15]); + + a = load_epi32(hInit[0], hInit[0], hInit[0], hInit[0]); + b = load_epi32(hInit[1], hInit[1], hInit[1], hInit[1]); + c = load_epi32(hInit[2], hInit[2], hInit[2], hInit[2]); + d = load_epi32(hInit[3], hInit[3], hInit[3], hInit[3]); + e = load_epi32(hInit[4], hInit[4], hInit[4], hInit[4]); + f = load_epi32(hInit[5], hInit[5], hInit[5], hInit[5]); + g = load_epi32(hInit[6], hInit[6], hInit[6], hInit[6]); + h = load_epi32(hInit[7], hInit[7], hInit[7], hInit[7]); + + SHA256ROUND(a, b, c, d, e, f, g, h, 0, w0); + SHA256ROUND(h, a, b, c, d, e, f, g, 1, w1); + SHA256ROUND(g, h, a, b, c, d, e, f, 2, w2); + SHA256ROUND(f, g, h, a, b, c, d, e, 3, w3); + SHA256ROUND(e, f, g, h, a, b, c, d, 4, w4); + SHA256ROUND(d, e, f, g, h, a, b, c, 5, w5); + SHA256ROUND(c, d, e, f, g, h, a, b, 6, w6); + SHA256ROUND(b, c, d, e, f, g, h, a, 7, w7); + SHA256ROUND(a, b, c, d, e, f, g, h, 8, w8); + SHA256ROUND(h, a, b, c, d, e, f, g, 9, w9); + SHA256ROUND(g, h, a, b, c, d, e, f, 10, w10); + SHA256ROUND(f, g, h, a, b, c, d, e, 11, w11); + SHA256ROUND(e, f, g, h, a, b, c, 
d, 12, w12); + SHA256ROUND(d, e, f, g, h, a, b, c, 13, w13); + SHA256ROUND(c, d, e, f, g, h, a, b, 14, w14); + SHA256ROUND(b, c, d, e, f, g, h, a, 15, w15); + + w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0); + SHA256ROUND(a, b, c, d, e, f, g, h, 16, w0); + w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1); + SHA256ROUND(h, a, b, c, d, e, f, g, 17, w1); + w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2); + SHA256ROUND(g, h, a, b, c, d, e, f, 18, w2); + w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3); + SHA256ROUND(f, g, h, a, b, c, d, e, 19, w3); + w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4); + SHA256ROUND(e, f, g, h, a, b, c, d, 20, w4); + w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5); + SHA256ROUND(d, e, f, g, h, a, b, c, 21, w5); + w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6); + SHA256ROUND(c, d, e, f, g, h, a, b, 22, w6); + w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7); + SHA256ROUND(b, c, d, e, f, g, h, a, 23, w7); + w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8); + SHA256ROUND(a, b, c, d, e, f, g, h, 24, w8); + w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9); + SHA256ROUND(h, a, b, c, d, e, f, g, 25, w9); + w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10); + SHA256ROUND(g, h, a, b, c, d, e, f, 26, w10); + w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11); + SHA256ROUND(f, g, h, a, b, c, d, e, 27, w11); + w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12); + SHA256ROUND(e, f, g, h, a, b, c, d, 28, w12); + w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13); + SHA256ROUND(d, e, f, g, h, a, b, c, 29, w13); + w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14); + SHA256ROUND(c, d, e, f, g, h, a, b, 30, w14); + w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15); + SHA256ROUND(b, c, d, e, f, g, h, a, 31, w15); + + w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0); + SHA256ROUND(a, b, c, d, e, f, g, h, 32, w0); + w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1); + SHA256ROUND(h, a, b, c, d, e, f, g, 33, w1); 
+ w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2); + SHA256ROUND(g, h, a, b, c, d, e, f, 34, w2); + w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3); + SHA256ROUND(f, g, h, a, b, c, d, e, 35, w3); + w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4); + SHA256ROUND(e, f, g, h, a, b, c, d, 36, w4); + w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5); + SHA256ROUND(d, e, f, g, h, a, b, c, 37, w5); + w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6); + SHA256ROUND(c, d, e, f, g, h, a, b, 38, w6); + w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7); + SHA256ROUND(b, c, d, e, f, g, h, a, 39, w7); + w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8); + SHA256ROUND(a, b, c, d, e, f, g, h, 40, w8); + w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9); + SHA256ROUND(h, a, b, c, d, e, f, g, 41, w9); + w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10); + SHA256ROUND(g, h, a, b, c, d, e, f, 42, w10); + w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11); + SHA256ROUND(f, g, h, a, b, c, d, e, 43, w11); + w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12); + SHA256ROUND(e, f, g, h, a, b, c, d, 44, w12); + w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13); + SHA256ROUND(d, e, f, g, h, a, b, c, 45, w13); + w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14); + SHA256ROUND(c, d, e, f, g, h, a, b, 46, w14); + w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15); + SHA256ROUND(b, c, d, e, f, g, h, a, 47, w15); + + w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0); + SHA256ROUND(a, b, c, d, e, f, g, h, 48, w0); + w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1); + SHA256ROUND(h, a, b, c, d, e, f, g, 49, w1); + w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2); + SHA256ROUND(g, h, a, b, c, d, e, f, 50, w2); + w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3); + SHA256ROUND(f, g, h, a, b, c, d, e, 51, w3); + w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4); + SHA256ROUND(e, f, g, h, a, b, c, d, 52, w4); + w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5); + 
SHA256ROUND(d, e, f, g, h, a, b, c, 53, w5); + w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6); + SHA256ROUND(c, d, e, f, g, h, a, b, 54, w6); + w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7); + SHA256ROUND(b, c, d, e, f, g, h, a, 55, w7); + w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8); + SHA256ROUND(a, b, c, d, e, f, g, h, 56, w8); + w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9); + SHA256ROUND(h, a, b, c, d, e, f, g, 57, w9); + w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10); + SHA256ROUND(g, h, a, b, c, d, e, f, 58, w10); + w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11); + SHA256ROUND(f, g, h, a, b, c, d, e, 59, w11); + w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12); + SHA256ROUND(e, f, g, h, a, b, c, d, 60, w12); + w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13); + SHA256ROUND(d, e, f, g, h, a, b, c, 61, w13); + w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14); + SHA256ROUND(c, d, e, f, g, h, a, b, 62, w14); + w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15); + SHA256ROUND(b, c, d, e, f, g, h, a, 63, w15); + + /* store resulsts directly in thash */ +#define store_2(x,i) \ + w0 = load_epi32((hInit)[i], (hInit)[i], (hInit)[i], (hInit)[i]); \ + *(__m128i *)&(thash)[i][0+k] = _mm_add_epi32(w0, x); + + store_2(a, 0); + store_2(b, 1); + store_2(c, 2); + store_2(d, 3); + store_2(e, 4); + store_2(f, 5); + store_2(g, 6); + store_2(h, 7); + *(__m128i *)&(thash)[8][0+k] = nonce; + } + +} diff --git a/main.cpp b/main.cpp index 0239915..50db1a3 100644 --- a/main.cpp +++ b/main.cpp @@ -2555,8 +2555,10 @@ inline void SHA256Transform(void* pstate, void* pinput, const void* pinit) CryptoPP::SHA256::Transform((CryptoPP::word32*)pstate, (CryptoPP::word32*)pinput); } +// !!!! NPAR must match NPAR in cryptopp/sha256.cpp !!!! 
+#define NPAR 32 - +extern void Double_BlockSHA256(const void* pin, void* pout, const void *pinit, unsigned int hash[9][NPAR], const void *init2); void BitcoinMiner() @@ -2701,108 +2703,128 @@ void BitcoinMiner() uint256 hashTarget = CBigNum().SetCompact(pblock->nBits).getuint256(); uint256 hashbuf[2]; uint256& hash = *alignup<16>(hashbuf); + + // Cache for NPAR hashes + unsigned int thash[9][NPAR] __attribute__ ((aligned (16))); + + unsigned int j; loop { - SHA256Transform(&tmp.hash1, (char*)&tmp.block + 64, &midstate); - SHA256Transform(&hash, &tmp.hash1, pSHA256InitState); + Double_BlockSHA256((char*)&tmp.block + 64, &tmp.hash1, &midstate, thash, pSHA256InitState); - if (((unsigned short*)&hash)[14] == 0) + for(j = 0; j<NPAR; j++) { + if (thash[7][j] == 0) { - // Byte swap the result after preliminary check - for (int i = 0; i < sizeof(hash)/4; i++) - ((unsigned int*)&hash)[i] = ByteReverse(((unsigned int*)&hash)[i]); - - if (hash <= hashTarget) + // Byte swap the result after preliminary check + for (int i = 0; i < sizeof(hash)/4; i++) + ((unsigned int*)&hash)[i] = ByteReverse((unsigned int)thash[i][j]); + + if (hash <= hashTarget) + { + // Double_BlocSHA256 might only calculate parts of the hash. + // We'll insert the nonce and get the real hash. 
+ //pblock->nNonce = ByteReverse(tmp.block.nNonce + j); + //hash = pblock->GetHash(); + + /* get nonce from hash */ + pblock->nNonce = ByteReverse((unsigned int)thash[8][j]); + assert(hash == pblock->GetHash()); + + //// debug print + printf("BitcoinMiner:\n"); + printf("proof-of-work found \n hash: %s \ntarget: %s\n", hash.GetHex().c_str(), hashTarget.GetHex().c_str()); + pblock->print(); + printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str()); + printf("generated %s\n", FormatMoney(pblock->vtx[0].vout[0].nValue).c_str()); + + SetThreadPriority(THREAD_PRIORITY_NORMAL); + CRITICAL_BLOCK(cs_main) { - pblock->nNonce = ByteReverse(tmp.block.nNonce); - assert(hash == pblock->GetHash()); - - //// debug print - printf("BitcoinMiner:\n"); - printf("proof-of-work found \n hash: %s \ntarget: %s\n", hash.GetHex().c_str(), hashTarget.GetHex().c_str()); - pblock->print(); - printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str()); - printf("generated %s\n", FormatMoney(pblock->vtx[0].vout[0].nValue).c_str()); - - SetThreadPriority(THREAD_PRIORITY_NORMAL); - CRITICAL_BLOCK(cs_main) - { - if (pindexPrev == pindexBest) - { - // Save key - if (!AddKey(key)) - return; - key.MakeNewKey(); - - // Track how many getdata requests this block gets - CRITICAL_BLOCK(cs_mapRequestCount) - mapRequestCount[pblock->GetHash()] = 0; - - // Process this block the same as if we had received it from another node - if (!ProcessBlock(NULL, pblock.release())) - printf("ERROR in BitcoinMiner, ProcessBlock, block not accepted\n"); - } + if (pindexPrev == pindexBest) + { + // Save key + if (!AddKey(key)) + return; + key.MakeNewKey(); + + // Track how many getdata requests this block gets + CRITICAL_BLOCK(cs_mapRequestCount) + mapRequestCount[pblock->GetHash()] = 0; + + // Process this block the same as if we had received it from another node + if (!ProcessBlock(NULL, pblock.release())) + printf("ERROR in BitcoinMiner, ProcessBlock, block not accepted\n"); + } 
SetThreadPriority(THREAD_PRIORITY_LOWEST); Sleep(500); break; } - } + SetThreadPriority(THREAD_PRIORITY_LOWEST); - // Update nTime every few seconds - const unsigned int nMask = 0xffff; - if ((++tmp.block.nNonce & nMask) == 0) + Sleep(500); + break; + } + } + } + + // Update nonce + tmp.block.nNonce += NPAR; + + // Update nTime every few seconds + const unsigned int nMask = 0xffff; + if ((tmp.block.nNonce & nMask) == 0) + { + // Meter hashes/sec + static int64 nTimerStart; + static int nHashCounter; + if (nTimerStart == 0) + nTimerStart = GetTimeMillis(); + else + nHashCounter++; + if (GetTimeMillis() - nTimerStart > 4000) { - // Meter hashes/sec - static int64 nTimerStart; - static int nHashCounter; - if (nTimerStart == 0) - nTimerStart = GetTimeMillis(); - else - nHashCounter++; + static CCriticalSection cs; + CRITICAL_BLOCK(cs) + { if (GetTimeMillis() - nTimerStart > 4000) { - static CCriticalSection cs; - CRITICAL_BLOCK(cs) - { - if (GetTimeMillis() - nTimerStart > 4000) - { - double dHashesPerSec = 1000.0 * (nMask+1) * nHashCounter / (GetTimeMillis() - nTimerStart); - nTimerStart = GetTimeMillis(); - nHashCounter = 0; - string strStatus = strprintf(" %.0f khash/s", dHashesPerSec/1000.0); - UIThreadCall(bind(CalledSetStatusBar, strStatus, 0)); - static int64 nLogTime; - if (GetTime() - nLogTime > 30 * 60) - { - nLogTime = GetTime(); - printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str()); - printf("hashmeter %3d CPUs %6.0f khash/s\n", vnThreadsRunning[3], dHashesPerSec/1000.0); - } - } - } + double dHashesPerSec = 1000.0 * (nMask+1) * nHashCounter / (GetTimeMillis() - nTimerStart); + nTimerStart = GetTimeMillis(); + nHashCounter = 0; + string strStatus = strprintf(" %.0f khash/s", dHashesPerSec/1000.0); + UIThreadCall(bind(CalledSetStatusBar, strStatus, 0)); + static int64 nLogTime; + if (GetTime() - nLogTime > 30 * 60) + { + nLogTime = GetTime(); + printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str()); + printf("hashmeter %3d CPUs %6.0f 
khash/s\n", vnThreadsRunning[3], dHashesPerSec/1000.0);
+                    }
                 }
-
-            // Check for stop or if block needs to be rebuilt
-            if (fShutdown)
-                return;
-            if (!fGenerateBitcoins)
-                return;
-            if (fLimitProcessors && vnThreadsRunning[3] > nLimitProcessors)
-                return;
-            if (vNodes.empty())
-                break;
-            if (tmp.block.nNonce == 0)
-                break;
-            if (nTransactionsUpdated != nTransactionsUpdatedLast && GetTime() - nStart > 60)
-                break;
-            if (pindexPrev != pindexBest)
-                break;
-
-            pblock->nTime = max(pindexPrev->GetMedianTimePast()+1, GetAdjustedTime());
-            tmp.block.nTime = ByteReverse(pblock->nTime);
+                }
             }
+
+                // Check for stop or if block needs to be rebuilt
+                if (fShutdown)
+                    return;
+                if (!fGenerateBitcoins)
+                    return;
+                if (fLimitProcessors && vnThreadsRunning[3] > nLimitProcessors)
+                    return;
+                if (vNodes.empty())
+                    break;
+                if (tmp.block.nNonce == 0)
+                    break;
+                if (nTransactionsUpdated != nTransactionsUpdatedLast && GetTime() - nStart > 60)
+                    break;
+                if (pindexPrev != pindexBest)
+                    break;
+
+                pblock->nTime = max(pindexPrev->GetMedianTimePast()+1, GetAdjustedTime());
+                tmp.block.nTime = ByteReverse(pblock->nTime);
+            }
         }
     }
 }
diff --git a/makefile.unix b/makefile.unix
index e965287..04dac86 100644
--- a/makefile.unix
+++ b/makefile.unix
@@ -41,7 +41,8 @@ OBJS= \
     obj/rpc.o \
     obj/init.o \
     cryptopp/obj/sha.o \
-    cryptopp/obj/cpu.o
+    cryptopp/obj/cpu.o \
+    cryptopp/obj/sha256.o

 all: bitcoin

@@ -51,7 +52,7 @@ obj/%.o: %.cpp $(HEADERS)
 	g++ -c $(CFLAGS) -DGUI -o $@ $<

 cryptopp/obj/%.o: cryptopp/%.cpp
-	g++ -c $(CFLAGS) -O3 -DCRYPTOPP_DISABLE_SSE2 -o $@ $<
+	g++ -c $(CFLAGS) -frename-registers -funroll-all-loops -fomit-frame-pointer -march=native -msse2 -msse3 -ffast-math -O3 -o $@ $<

 bitcoin: $(OBJS) obj/ui.o obj/uibase.o
 	g++ $(CFLAGS) -o $@ $^ $(WXLIBS) $(LIBS)
@@ -63,6 +64,9 @@ obj/nogui/%.o: %.cpp $(HEADERS)
 bitcoind: $(OBJS:obj/%=obj/nogui/%)
 	g++ $(CFLAGS) -o $@ $^ $(LIBS)

+test: cryptopp/obj/sha.o cryptopp/obj/sha256.o test.cpp
+	g++ $(CFLAGS) -o $@ $^ $(LIBS)
+
 clean:
 	-rm -f obj/*.o
diff --git a/test.cpp b/test.cpp
new file mode 100644
index 0000000..a55e972
--- /dev/null
+++ b/test.cpp
@@ -0,0 +1,221 @@
+// Copyright (c) 2009-2010 Satoshi Nakamoto
+// Distributed under the MIT/X11 software license, see the accompanying
+// file license.txt or http://www.opensource.org/licenses/mit-license.php.
+#include <assert.h>
+#include <openssl/ecdsa.h>
+#include <openssl/evp.h>
+#include <openssl/rand.h>
+#include <openssl/sha.h>
+#include <openssl/ripemd.h>
+#include <db_cxx.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <math.h>
+#include <limits.h>
+#include <float.h>
+#include <assert.h>
+#include <memory>
+#include <iostream>
+#include <sstream>
+#include <string>
+#include <vector>
+#include <list>
+#include <deque>
+#include <map>
+#include <set>
+#include <algorithm>
+#include <numeric>
+#include <boost/foreach.hpp>
+#include <boost/lexical_cast.hpp>
+#include <boost/tuple/tuple.hpp>
+#include <boost/fusion/container/vector.hpp>
+#include <boost/tuple/tuple_comparison.hpp>
+#include <boost/tuple/tuple_io.hpp>
+#include <boost/array.hpp>
+#include <boost/bind.hpp>
+#include <boost/function.hpp>
+#include <boost/filesystem.hpp>
+#include <boost/filesystem/fstream.hpp>
+#include <boost/algorithm/string.hpp>
+#include <boost/interprocess/sync/interprocess_mutex.hpp>
+#include <boost/interprocess/sync/interprocess_recursive_mutex.hpp>
+#include <boost/date_time/gregorian/gregorian_types.hpp>
+#include <boost/date_time/posix_time/posix_time_types.hpp>
+#include <sys/resource.h>
+#include <sys/time.h>
+using namespace std;
+using namespace boost;
+#include "cryptopp/sha.h"
+#include "strlcpy.h"
+#include "serialize.h"
+#include "uint256.h"
+#include "bignum.h"
+
+#undef printf
+
+int FormatHashBlocks(void* pbuffer, unsigned int len)
+{
+    unsigned char* pdata = (unsigned char*)pbuffer;
+    unsigned int blocks = 1 + ((len + 8) / 64);
+    unsigned char* pend = pdata + 64 * blocks;
+    memset(pdata + len, 0, 64 * blocks - len);
+    pdata[len] = 0x80;
+    unsigned int bits = len * 8;
+    pend[-1] = (bits >> 0) & 0xff;
+    pend[-2] = (bits >> 8) & 0xff;
+    pend[-3] = (bits >> 16) & 0xff;
+    pend[-4] = (bits >> 24) & 0xff;
+    return blocks;
+}
+
+using CryptoPP::ByteReverse;
+static int detectlittleendian = 1;
+
+#define NPAR 32
+
+extern void Double_BlockSHA256(const void* pin, void* pout, const void *pinit, unsigned int hash[9][NPAR], const void *init2);
+
+using CryptoPP::ByteReverse;
+
+static const unsigned int pSHA256InitState[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
+
+inline void SHA256Transform(void* pstate, void* pinput, const void* pinit)
+{
+    memcpy(pstate, pinit, 32);
+    CryptoPP::SHA256::Transform((CryptoPP::word32*)pstate, (CryptoPP::word32*)pinput);
+}
+
+void BitcoinTester(char *filename)
+{
+    printf("SHA256 test started\n");
+
+    struct tmpworkspace
+    {
+        struct unnamed2
+        {
+            int nVersion;
+            uint256 hashPrevBlock;
+            uint256 hashMerkleRoot;
+            unsigned int nTime;
+            unsigned int nBits;
+            unsigned int nNonce;
+        }
+        block;
+        unsigned char pchPadding0[64];
+        uint256 hash1;
+        unsigned char pchPadding1[64];
+    }
+    tmp __attribute__ ((aligned (16)));
+
+    char line[180];
+    ifstream fin(filename);
+    char *p;
+    unsigned long int totalhashes= 0;
+    unsigned long int found = 0;
+    clock_t start, end;
+    unsigned long int cpu_time_used = 0;
+    unsigned int tnonce;
+    start = clock();
+
+    while( fin.getline(line, 180))
+    {
+        string in(line);
+        //printf("%s\n", in.c_str());
+        tmp.block.nVersion = strtol(in.substr(0,8).c_str(), &p, 16);
+        tmp.block.hashPrevBlock.SetHex(in.substr(8,64));
+        tmp.block.hashMerkleRoot.SetHex(in.substr(64+8,64));
+        tmp.block.nTime = strtol(in.substr(128+8,8).c_str(), &p, 16);
+        tmp.block.nBits = strtol(in.substr(128+16,8).c_str(), &p, 16);
+        tnonce = strtol(in.substr(128+24,8).c_str(), &p, 16);
+        tmp.block.nNonce = tnonce;
+
+        unsigned int nBlocks0 = FormatHashBlocks(&tmp.block, sizeof(tmp.block));
+        unsigned int nBlocks1 = FormatHashBlocks(&tmp.hash1, sizeof(tmp.hash1));
+
+        // Byte swap all the input buffer
+        for (int i = 0; i < sizeof(tmp)/4; i++)
+            ((unsigned int*)&tmp)[i] = ByteReverse(((unsigned int*)&tmp)[i]);
+
+        // Precalc the first half of the first hash, which stays constant
+        uint256 midstate __attribute__ ((aligned(16)));
+        SHA256Transform(&midstate, &tmp.block, pSHA256InitState);
+
+
+        uint256 hashTarget = CBigNum().SetCompact(ByteReverse(tmp.block.nBits)).getuint256();
+        // printf("target %s\n", hashTarget.GetHex().c_str());
+        uint256 hash;
+        uint256 refhash __attribute__ ((aligned(16)));
+
+        unsigned int thash[9][NPAR] __attribute__ ((aligned (16)));
+        int done = 0;
+        unsigned int i, j;
+
+        /* reference */
+        SHA256Transform(&tmp.hash1, (char*)&tmp.block + 64, &midstate);
+        SHA256Transform(&refhash, &tmp.hash1, pSHA256InitState);
+        for (int i = 0; i < sizeof(refhash)/4; i++)
+            ((unsigned int*)&refhash)[i] = ByteReverse(((unsigned int*)&refhash)[i]);
+
+        //printf("reference nonce %08x:\n%s\n\n", tnonce, refhash.GetHex().c_str());
+
+        tmp.block.nNonce = ByteReverse(tnonce) & 0xfffff000;
+
+
+        for(;;)
+        {
+
+            Double_BlockSHA256((char*)&tmp.block + 64, &tmp.hash1, &midstate, thash, pSHA256InitState);
+
+            for(i = 0; i<NPAR; i++) {
+                /* fast hash checking */
+                if(thash[7][i] == 0) {
+                    // printf("found something... ");
+
+                    for(j = 0; j<8; j++) ((unsigned int *)&hash)[j] = ByteReverse((unsigned int)thash[j][i]);
+                    // printf("%s\n", hash.GetHex().c_str());
+
+                    if (hash <= hashTarget)
+                    {
+                        found++;
+                        if(tnonce == ByteReverse((unsigned int)thash[8][i]) ) {
+                            if(hash == refhash) {
+                                printf("\r%lu", found);
+                                totalhashes += NPAR;
+                                done = 1;
+                            } else {
+                                printf("Hashes do not match!\n");
+                            }
+                        } else {
+                            printf("nonce does not match. %08x != %08x\n", tnonce, ByteReverse(tmp.block.nNonce + i));
+                        }
+                        break;
+                    }
+                }
+            }
+            if(done) break;
+
+            tmp.block.nNonce+=NPAR;
+            totalhashes += NPAR;
+            if(tmp.block.nNonce == 0) {
+                printf("ERROR: Hash not found for:\n%s\n", in.c_str());
+                return;
+            }
+        }
+    }
+    printf("\n");
+    end = clock();
+    cpu_time_used += (unsigned int)(end - start);
+    cpu_time_used /= ((CLOCKS_PER_SEC)/1000);
+    printf("found solutions = %lu\n", found);
+    printf("total hashes = %lu\n", totalhashes);
+    printf("total time = %lu ms\n", cpu_time_used);
+    printf("average speed: %lu khash/s\n", (totalhashes)/cpu_time_used);
+}
+
+int main(int argc, char* argv[]) {
+    if(argc == 2) {
+        BitcoinTester(argv[1]);
+    } else
+        printf("Missing filename!\n");
+    return 0;
+}
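For anyone reading the test program, the padding step done by FormatHashBlocks can be tried in isolation. The following is only a minimal sketch and not part of the patch: the name PadSha256Blocks is made up for illustration, and it assumes the caller's buffer has room for the padded blocks (128 bytes for an 80-byte header, 64 bytes for a 32-byte hash). Like the function in the patch, it writes only the low 32 bits of the big-endian bit length.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical standalone version of the padding logic in FormatHashBlocks.
// Pads `len` message bytes in `buf` in place, SHA-256 style: a 0x80 marker,
// zero fill, then the message length in bits at the end of the last block.
// Returns the number of 64-byte blocks. The caller must supply a buffer of
// at least 64 * (1 + (len + 8) / 64) bytes.
unsigned int PadSha256Blocks(unsigned char* buf, unsigned int len)
{
    assert(buf != nullptr);
    unsigned int blocks = 1 + ((len + 8) / 64);
    unsigned char* end = buf + 64 * blocks;

    // Zero everything after the message, then set the 0x80 end marker.
    std::memset(buf + len, 0, 64 * blocks - len);
    buf[len] = 0x80;

    // Store the low 32 bits of the bit length, most significant byte first.
    unsigned int bits = len * 8;
    end[-1] = (bits >> 0) & 0xff;
    end[-2] = (bits >> 8) & 0xff;
    end[-3] = (bits >> 16) & 0xff;
    end[-4] = (bits >> 24) & 0xff;
    return blocks;
}
```

With an 80-byte block header this gives two 64-byte blocks, which is why the tester can precompute the midstate over the first block and only rehash the second block for each nonce.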
|
|
|
|
|