Bitcoin Forum
December 15, 2024, 09:51:08 PM *
News: Latest Bitcoin Core release: 28.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: « 1 2 [3]  All
  Print  
Author Topic: [BOUNTY] sha256 shader for Linux OSS video drivers (15 BTC pledged)  (Read 31373 times)
daybyter
Legendary
*
Offline Offline

Activity: 965
Merit: 1000


View Profile
February 12, 2012, 04:33:23 PM
 #41

Well, at the moment I'm only interested in getting something running on my old GeForce 7600gs, that's not supported by Cuda or OpenCL. Just to give you an idea: my cpu mines at about 300 kilo-hashes at the moment, so 1 Mega-hash would be an improvement for me... Smiley

But even if it works, you have to consider, that the performance would be poor since simple operations like bitshifting require quite some float-operations. And there's no way to really use the entire card, since brook compiles only fragment shaders _or_ vertex shaders. The vertex profile has even limited integer support, but I have only 5 of those shaders, that's why I guess Xaci's system with the floats makes more sense.

So it's a great learning project, but don't see any practical use of it, to be honest...

daybyter
Legendary
*
Offline Offline

Activity: 965
Merit: 1000


View Profile
February 14, 2012, 10:44:03 PM
 #42

Current status:

- Code is incomplete and buggy, but compiles

- The kernel is not optimized and especially the stream transport of the nonces to the kernel is not really implemented.

- Few issues: nonce is not in the first hash, I think some infos is not passed to the 2nd hash round. And I think the endianess of the hash vs difficulty is not the same.

- The block header and the difficulty are not set yet, since I'm testing other stuff now.

- BrookGPU runs into some sort of infinite loop, consumes up to 2 gb mem and is terminated then (no clue why yet).

- If had tons of problems with arrays, since brook wanted to convert array constants to brook-streams, which are not constant during the nonce yet. No-go, so I just split the array into single vars and wrote me scripts to generate the code (since all var are passed as values and not pointers it's not so much of an issue for now).

- Arrays as local vars are causing trouble, since Brook wants to align them in some way, that cgc doesn't like, so I split them up, too.

Just to give you an idea of the strange-looking code:

Code:
/**
 * Aminer - a bitcoin miner for various platforms.
 *
 * Andreas Rueckert <a_rueckert@gmx.net>
 *
 * A good part of this code is based on the GLSL sha256 code of xaci: https://bitcointalk.org/index.php?topic=4618.msg191488#msg191488
 */

/*
#pragma optionNV looplimit 32768
*/



/**
 * Some utility functions to process integers represented as float2.
 */

/**
 * Add 2 integers represented as float2.
 *
 * Do not let overflow happen with this function, or use sum_c instead!
 */
kernel float2 add( float2 a, float2 b) {
        float2 ret;

        ret.x = a.x + b.x;
        ret.y = a.y + b.y;

        if (ret.y >= 65536.0) {
                ret.y -= 65536.0;
                ret.x += 1.0;
        }

        if (ret.x >= 65536.0) {
                ret.x -= 65536.0;
}

        return ret;
}

/**
 * Shift an integer represented as a float2 by log2(shift).
 *
 * Note: shift should be a power of two, e.g. to shift 3 steps, use 2^3.
 */
kernel float2 shiftr( float2 a, float shift) {
        float2 ret;

ret.x = a.x / shift;

ret.y = floor( a.y / shift) + frac( ret.x) * 65536.0;

ret.x = floor( ret.x);

        return ret;
}

/**
 * Rotate an integer represented as a float2 by log2(shift).
 *
 * Note: shift should be a power of two, e.g. to rotate 3 steps, use 2^3.
 */
kernel float2 rotater( float2 a, float shift) {
        float2 ret;

ret.x = a.x / shift;  // Shipt words and keep fractions to shift those bits later.
ret.y = a.y / shift;

ret.y += frac( ret.x) * 65536.0;  // Shift low bits from x into y;
ret.x += frac( ret.y) * 65536.0;  // Rotate low bits from y into x;

ret.x = floor( ret.x);  // Cut shifted bits.
ret.y = floor( ret.y);

        return ret;
}

/**
 * Xor half of an integer, represented as a float.
 */
kernel float xor16( float a<>, float b<>) {

        float ret = 0;
        float fact = 32768.0;

        while (fact > 0) {
                if( ( ( a >= fact) || ( b >= fact)) && ( ( a < fact) || ( b < fact))) {
                  ret += fact;
}

                if( a >= fact) {
                  a -= fact;
}
                if (b >= fact) {
                  b -= fact;
}

                fact /= 2.0;
        }
        return ret;
}

/**
 * Xor a complete integer represetended as a float2.
 */
kernel float2 xor( float2 a<>, float2 b<>) {
       float2 ret = { xor16( a.x, b.x), xor16( a.y, b.y) };

       return ret;
}

/**
 * And operation on half of an integer, represented as a float.
 */
kernel float and16( float a<>, float b<>) {
        float ret = 0;
        float fact = 32768.0;

        while (fact > 0) {
                if( ( a >= fact) && ( b >= fact)) {
                  ret += fact;
}

                if( a >= fact) {
                  a -= fact;
}
                if (b >= fact) {
                  b -= fact;
}

                fact /= 2.0;
        }
        return ret;
}

/**
 * And operation on a full integer, represented as a float2.
 */
kernel float2 and( float2 a<>, float2 b<>) {
        float2 ret =  { and16( a.x, b.x), and16( a.y, b.y) };

        return ret;
}

/*
 * Logical complement ("not")
 */
kernel float2 not( float2 a<>) {
       float2 ret = { 65535.0 - a.x, 65535.0 - a.y};

       return ret;
}

/**
 * Swap the 2 words of an int.
 */
kernel swapw( float2 a) {
       float2 ret;

       ret.x = a.y;
       ret.y = a.x;

       return ret;
}

/**
 * Swap the 2 bytes in an 16-bit word.
 */
kernel float swapb( float a) {
       float ret = a / 256.0;

       ret += frac( ret) * 65536.0;

       return floor( ret);
}

/**
 * Swap the 4 bytes of a 4-byte integer;
 */
kernel float2 swapInt( float2 a) {
       float2 ret = swapw( a);

       ret.x = swapb( ret.x);
       ret.y = swapb( ret.y);

       return ret;
}

/**
 * Check if float2 integer a is smaller than float2 integer b.
 */
kernel float isSmaller( float2 a, float2 b) {
       if( ( a.x < b.x) || ( ( a.x == a.x) && ( a.y < b.y))) {
           return 1.0;
       } else {
           return 0.0;
       }
}

kernel float2 blend( float2 m16, float2 m15, float2 m07, float2 m02) {
        float2 s0 = xor( rotater( m15, 128.0), xor( rotater( swapw( m15), 4.0), shiftr( m15, 8)));
        float2 s1 = xor( rotater( swapw( m02), 2.0), xor( rotater( swapw( m02), 8.0), shiftr( m02, 1024.0)));

        return add( add( m16, s0), add( m07, s1));
}

kernel float2 s0( float2 a) {
        return xor( rotater( a, 4.0), xor( rotater( a, 8192.0), rotater( swapw( a), 64.0)));
}

kernel float2 s1( float2 a) {
        return xor( rotater( a, 64.0), xor( rotater( a, 2048.0), rotater( swapw( a), 512.0)));
}

kernel float2 ch( float2 a, float2 b, float2 c) {
        return xor( and( a, b), and( not( a), c));
}

kernel float2 maj( float2 a, float2 b, float2 c) {
        return xor( xor( and( a, b), and( a, c)), and( b, c));
}


/**
 * Let the kernel check a nonce for a given bitcoin block.
 * That's basically 2 rounds of sha256 and a difficulty check.
 *
 * @param nonce The nonce to test for the given block
 * @param block_header* The block header as a set of ints, since brcc always converts constant arrays to brook::stream here.
 * @param difficulty The difficulty as a 256-bit superlong int.
 *
 * @return result Return the nonce, if it is valid. Return -nonce if not.
 */
kernel void kernelMinerCheckNonce( float2 nonce<>
             , float2 block_header0
   , float2 block_header1
   , float2 block_header2
   , float2 block_header3
   , float2 block_header4
   , float2 block_header5
   , float2 block_header6
   , float2 block_header7
   , float2 block_header8
   , float2 block_header9
             , float2 block_header10
   , float2 block_header11
   , float2 block_header12
   , float2 block_header13
   , float2 block_header14
   , float2 block_header15
   , float2 block_header16
   , float2 block_header17
   , float2 block_header18
   , float2 block_header19
   , float2 decoded_difficulty0
   , float2 decoded_difficulty1
   , float2 decoded_difficulty2
   , float2 decoded_difficulty3
   , float2 decoded_difficulty4
   , float2 decoded_difficulty5
   , float2 decoded_difficulty6
   , float2 decoded_difficulty7
   , out float2 result<>) {

       // brcc seems to have problems with array alignment in fp40 model, so no arrays for now... :-(
       float2 k0,k1,k2,k3,k4,k5,k6,k7,k8,k9,k10,k11,k12,k13,k14,k15;
       float2 k16,k17,k18,k19,k20,k21,k22,k23,k24,k25,k26,k27,k28,k29,k30,k31;
       float2 k32,k33,k34,k35,k36,k37,k38,k39,k40,k41,k42,k43,k44,k45,k46,k47;
       float2 k48,k49,k50,k51,k52,k53,k54,k55,k56,k57,k58,k59,k60,k61,k62,k63;
       float2 h0,h1,h2,h3,h4,h5,h6,h7;
       float2 w0,w1,w2,w3,w4,w5,w6,w7,w8,w9,w10,w11,w12,w13,w14,w15;
       float2 w16,w17,w18,w19,w20,w21,w22,w23,w24,w25,w26,w27,w28,w29,w30,w31;
       float2 w32,w33,w34,w35,w36,w37,w38,w39,w40,w41,w42,w43,w44,w45,w46,w47;
       float2 w48,w49,w50,w51,w52,w53,w54,w55,w56,w57,w58,w59,w60,w61,w62,w63;
       float2 a,b,c,d,e,f,g,h;
       float2 t1, t2;

       // Initialize k
       // ( Code generated with the following c-program:
       /*
       #include <stdio.h>

       int main(int argc, char *agv[]) {
            unsigned int k[64] = { 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
                         0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
   0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
   0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
           0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
       0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
         0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
           0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 };
    int i;

        for( i = 0; i < 64; ++i) {
            printf( "k%d.x = %5d.0; k%d.y = %5d.0;\n", i, k[i] >> 16, i, k[i] & 0xffff);
            }
       }
       */
       
       k0.x = 17034.0; k0.y = 12184.0;
       k1.x = 28983.0; k1.y = 17553.0;
       k2.x = 46528.0; k2.y = 64463.0;
       k3.x = 59829.0; k3.y = 56229.0;
       k4.x = 14678.0; k4.y = 49755.0;
       k5.x = 23025.0; k5.y =  4593.0;
       k6.x = 37439.0; k6.y = 33444.0;
       k7.x = 43804.0; k7.y = 24277.0;
       k8.x = 55303.0; k8.y = 43672.0;
       k9.x =  4739.0; k9.y = 23297.0;
       k10.x =  9265.0; k10.y = 34238.0;
       k11.x = 21772.0; k11.y = 32195.0;
       k12.x = 29374.0; k12.y = 23924.0;
       k13.x = 32990.0; k13.y = 45566.0;
       k14.x = 39900.0; k14.y =  1703.0;
       k15.x = 49563.0; k15.y = 61812.0;
       k16.x = 58523.0; k16.y = 27073.0;
       k17.x = 61374.0; k17.y = 18310.0;
       k18.x =  4033.0; k18.y = 40390.0;
       k19.x =  9228.0; k19.y = 41420.0;
       k20.x = 11753.0; k20.y = 11375.0;
       k21.x = 19060.0; k21.y = 33962.0;
       k22.x = 23728.0; k22.y = 43484.0;
       k23.x = 30457.0; k23.y = 35034.0;
       k24.x = 38974.0; k24.y = 20818.0;
       k25.x = 43057.0; k25.y = 50797.0;
       k26.x = 45059.0; k26.y = 10184.0;
       k27.x = 48985.0; k27.y = 32711.0;
       k28.x = 50912.0; k28.y =  3059.0;
       k29.x = 54695.0; k29.y = 37191.0;
       k30.x =  1738.0; k30.y = 25425.0;
       k31.x =  5161.0; k31.y = 10599.0;
       k32.x = 10167.0; k32.y =  2693.0;
       k33.x = 11803.0; k33.y =  8504.0;
       k34.x = 19756.0; k34.y = 28156.0;
       k35.x = 21304.0; k35.y =  3347.0;
       k36.x = 25866.0; k36.y = 29524.0;
       k37.x = 30314.0; k37.y =  2747.0;
       k38.x = 33218.0; k38.y = 51502.0;
       k39.x = 37490.0; k39.y = 11397.0;
       k40.x = 41663.0; k40.y = 59553.0;
       k41.x = 43034.0; k41.y = 26187.0;
       k42.x = 49739.0; k42.y = 35696.0;
       k43.x = 51052.0; k43.y = 20899.0;
       k44.x = 53650.0; k44.y = 59417.0;
       k45.x = 54937.0; k45.y =  1572.0;
       k46.x = 62478.0; k46.y = 13701.0;
       k47.x =  4202.0; k47.y = 41072.0;
       k48.x =  6564.0; k48.y = 49430.0;
       k49.x =  7735.0; k49.y = 27656.0;
       k50.x = 10056.0; k50.y = 30540.0;
       k51.x = 13488.0; k51.y = 48309.0;
       k52.x = 14620.0; k52.y =  3251.0;
       k53.x = 20184.0; k53.y = 43594.0;
       k54.x = 23452.0; k54.y = 51791.0;
       k55.x = 26670.0; k55.y = 28659.0;
       k56.x = 29839.0; k56.y = 33518.0;
       k57.x = 30885.0; k57.y = 25455.0;
       k58.x = 33992.0; k58.y = 30740.0;
       k59.x = 36039.0; k59.y =   520.0;
       k60.x = 37054.0; k60.y = 65530.0;
       k61.x = 42064.0; k61.y = 27883.0;
       k62.x = 48889.0; k62.y = 41975.0;
       k63.x = 50801.0; k63.y = 30962.0;


       // Initialize h

       h0.x = 27145.0; h0.y = 58983.0;  // 0x6a09 0xe667
       h1.x = 47975.0; h1.y = 44677.0;  // 0xbb67 0xae85
       h2.x = 15470.0; h2.y = 62322.0;  // 0x3c6e 0xf372
       h3.x = 42319.0; h3.y = 62778.0;  // 0xa54f 0xf53a
       h4.x = 20750.0; h4.y = 21119.0;  // 0x510e 0x527f
       h5.x = 39685.0; h5.y = 26764.0;  // 0x9b05 0x688c
       h6.x =  8067.0; h6.y = 55723.0;  // 0x1f83 0xd9ab
       h7.x = 23520.0; h7.y = 52505.0;  // 0x5be0 0xcd19

       // For the following algorithm, see: http://en.wikipedia.org/wiki/SHA-2#Examples_of_SHA-2_variants

       // Initialize the first 16 w values
       // ToDo? Precompute this outside of the kernel, since the nonce is not in these 16 values?

       /*
        * Process the message in successive 512-bit chunks:
* break message into 512-bit chunks
* for each chunk
    *   break chunk into sixteen 32-bit big-endian words w[0..15]
*/
// Implementation:
w0 = swapInt( block_header0);
w1 = swapInt( block_header1);
w2 = swapInt( block_header2);
w3 = swapInt( block_header3);
w4 = swapInt( block_header4);
w5 = swapInt( block_header5);
w6 = swapInt( block_header6);
w7 = swapInt( block_header7);
w8 = swapInt( block_header8);
w9 = swapInt( block_header9);
w10 = swapInt( block_header10);
w11 = swapInt( block_header11);
w12 = swapInt( block_header12);
w13 = swapInt( block_header13);
w14 = swapInt( block_header14);
w15 = swapInt( block_header15);

/*
         * for( i = 16; i < 64; i++) {
        *   w[i] = blend( w[ i - 16], w[ i - 15], w[ i -7], w[ i - 2]);
        * }
*/
// Implementation
// (Generated with bash script: for i in {16..63}; do echo "w$i = blend( w$((i - 16)), w$((i - 15)), w$((i -7)), w$((i - 2)));"; done ):
w16 = blend( w0, w1, w9, w14);
w17 = blend( w1, w2, w10, w15);
w18 = blend( w2, w3, w11, w16);
w19 = blend( w3, w4, w12, w17);
w20 = blend( w4, w5, w13, w18);
w21 = blend( w5, w6, w14, w19);
w22 = blend( w6, w7, w15, w20);
w23 = blend( w7, w8, w16, w21);
w24 = blend( w8, w9, w17, w22);
w25 = blend( w9, w10, w18, w23);
w26 = blend( w10, w11, w19, w24);
w27 = blend( w11, w12, w20, w25);
w28 = blend( w12, w13, w21, w26);
w29 = blend( w13, w14, w22, w27);
w30 = blend( w14, w15, w23, w28);
w31 = blend( w15, w16, w24, w29);
w32 = blend( w16, w17, w25, w30);
w33 = blend( w17, w18, w26, w31);
w34 = blend( w18, w19, w27, w32);
w35 = blend( w19, w20, w28, w33);
w36 = blend( w20, w21, w29, w34);
w37 = blend( w21, w22, w30, w35);
w38 = blend( w22, w23, w31, w36);
w39 = blend( w23, w24, w32, w37);
w40 = blend( w24, w25, w33, w38);
w41 = blend( w25, w26, w34, w39);
w42 = blend( w26, w27, w35, w40);
w43 = blend( w27, w28, w36, w41);
w44 = blend( w28, w29, w37, w42);
w45 = blend( w29, w30, w38, w43);
w46 = blend( w30, w31, w39, w44);
w47 = blend( w31, w32, w40, w45);
w48 = blend( w32, w33, w41, w46);
w49 = blend( w33, w34, w42, w47);
w50 = blend( w34, w35, w43, w48);
w51 = blend( w35, w36, w44, w49);
w52 = blend( w36, w37, w45, w50);
w53 = blend( w37, w38, w46, w51);
w54 = blend( w38, w39, w47, w52);
w55 = blend( w39, w40, w48, w53);
w56 = blend( w40, w41, w49, w54);
w57 = blend( w41, w42, w50, w55);
w58 = blend( w42, w43, w51, w56);
w59 = blend( w43, w44, w52, w57);
w60 = blend( w44, w45, w53, w58);
w61 = blend( w45, w46, w54, w59);
w62 = blend( w46, w47, w55, w60);
w63 = blend( w47, w48, w56, w61);

/*
* Initialize hash value for this chunk:
    * a := h0
* b := h1
    * c := h2
    * d := h3
    * e := h4
    * f := h5
    * g := h6
    * h := h7
*/
// Implementation:
a = h0;
b = h1;
c = h2;
d = h3;
e = h4;
f = h5;
g = h6;
h = h7;

/*
* Main loop:
* for i from 0 to 63
         *     s0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
         *     maj := (a and b) xor (a and c) xor (b and c)
         *     t2 := s0 + maj
         *     s1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
         *     ch := (e and f) xor ((not e) and g)
         *     t1 := h + s1 + ch + k[i] + w[i]
*
*     h := g
         *     g := f
         *     f := e
         *     e := d + t1
         *     d := c
         *     c := b
         *     b := a
         *     a := t1 + t2
*/
// Implementation:
// ( Generated with bash script:
// for i in {0..63}; do echo "t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k$i + w$i;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;"; done
// )
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k0 + w0;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k1 + w1;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k2 + w2;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k3 + w3;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k4 + w4;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k5 + w5;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k6 + w6;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k7 + w7;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k8 + w8;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k9 + w9;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k10 + w10;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k11 + w11;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k12 + w12;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k13 + w13;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k14 + w14;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k15 + w15;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k16 + w16;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k17 + w17;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k18 + w18;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k19 + w19;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k20 + w20;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k21 + w21;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k22 + w22;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k23 + w23;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k24 + w24;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k25 + w25;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k26 + w26;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k27 + w27;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k28 + w28;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k29 + w29;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k30 + w30;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k31 + w31;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k32 + w32;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k33 + w33;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k34 + w34;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k35 + w35;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k36 + w36;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k37 + w37;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k38 + w38;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k39 + w39;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k40 + w40;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k41 + w41;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k42 + w42;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k43 + w43;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k44 + w44;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k45 + w45;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k46 + w46;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k47 + w47;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k48 + w48;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k49 + w49;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k50 + w50;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k51 + w51;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k52 + w52;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k53 + w53;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k54 + w54;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k55 + w55;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k56 + w56;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k57 + w57;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k58 + w58;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k59 + w59;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k60 + w60;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k61 + w61;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k62 + w62;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k63 + w63;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;

/*
* Add this chunk's hash to result so far:
    * h0 := h0 + a
    * h1 := h1 + b
    * h2 := h2 + c
    * h3 := h3 + d
    * h4 := h4 + e
    * h5 := h5 + f
    * h6 := h6 + g
    * h7 := h7 + h
*/
h0 = add( h0, a);
h1 = add( h1, b);
h2 = add( h2, c);
h3 = add( h3, d);
h4 = add( h4, e);
h5 = add( h5, f);
h6 = add( h6, g);
h7 = add( h7, h);

// The result of the first sha256 round should be now in h0..h7 as a big endian encoded int.

// So use it as the new input for w0..w15

// ToDo: check if the h-order should be reversed, like w0 = h15; w1 = h14; ...

w0 = h0; 
w1 = h1;
w2 = h2;
w3 = h3;
w4 = h4;
w5 = h5;
w6 = h6;
w7 = h7;
w8.x = 0.0; w8.y = 0.0;
w9 = w8;
w10 = w8;
w11 = w8;
w12 = w8;
w13 = w8;
w14 = w8;
w15 = w8;

// Re-initialize h for the new sha256 round

        h0.x = 27145.0; h0.y = 58983.0;  // 0x6a09 0xe667
        h1.x = 47975.0; h1.y = 44677.0;  // 0xbb67 0xae85
        h2.x = 15470.0; h2.y = 62322.0;  // 0x3c6e 0xf372
        h3.x = 42319.0; h3.y = 62778.0;  // 0xa54f 0xf53a
        h4.x = 20750.0; h4.y = 21119.0;  // 0x510e 0x527f
        h5.x = 39685.0; h5.y = 26764.0;  // 0x9b05 0x688c
        h6.x =  8067.0; h6.y = 55723.0;  // 0x1f83 0xd9ab
        h7.x = 23520.0; h7.y = 52505.0;  // 0x5be0 0xcd19
/*
         * for( i = 16; i < 64; i++) {
        *   w[i] = blend( w[ i - 16], w[ i - 15], w[ i -7], w[ i - 2]);
        * }
*/
// Implementation
// (Generated with bash script: for i in {16..63}; do echo "w$i = blend( w$((i - 16)), w$((i - 15)), w$((i -7)), w$((i - 2)));"; done ):
w16 = blend( w0, w1, w9, w14);
w17 = blend( w1, w2, w10, w15);
w18 = blend( w2, w3, w11, w16);
w19 = blend( w3, w4, w12, w17);
w20 = blend( w4, w5, w13, w18);
w21 = blend( w5, w6, w14, w19);
w22 = blend( w6, w7, w15, w20);
w23 = blend( w7, w8, w16, w21);
w24 = blend( w8, w9, w17, w22);
w25 = blend( w9, w10, w18, w23);
w26 = blend( w10, w11, w19, w24);
w27 = blend( w11, w12, w20, w25);
w28 = blend( w12, w13, w21, w26);
w29 = blend( w13, w14, w22, w27);
w30 = blend( w14, w15, w23, w28);
w31 = blend( w15, w16, w24, w29);
w32 = blend( w16, w17, w25, w30);
w33 = blend( w17, w18, w26, w31);
w34 = blend( w18, w19, w27, w32);
w35 = blend( w19, w20, w28, w33);
w36 = blend( w20, w21, w29, w34);
w37 = blend( w21, w22, w30, w35);
w38 = blend( w22, w23, w31, w36);
w39 = blend( w23, w24, w32, w37);
w40 = blend( w24, w25, w33, w38);
w41 = blend( w25, w26, w34, w39);
w42 = blend( w26, w27, w35, w40);
w43 = blend( w27, w28, w36, w41);
w44 = blend( w28, w29, w37, w42);
w45 = blend( w29, w30, w38, w43);
w46 = blend( w30, w31, w39, w44);
w47 = blend( w31, w32, w40, w45);
w48 = blend( w32, w33, w41, w46);
w49 = blend( w33, w34, w42, w47);
w50 = blend( w34, w35, w43, w48);
w51 = blend( w35, w36, w44, w49);
w52 = blend( w36, w37, w45, w50);
w53 = blend( w37, w38, w46, w51);
w54 = blend( w38, w39, w47, w52);
w55 = blend( w39, w40, w48, w53);
w56 = blend( w40, w41, w49, w54);
w57 = blend( w41, w42, w50, w55);
w58 = blend( w42, w43, w51, w56);
w59 = blend( w43, w44, w52, w57);
w60 = blend( w44, w45, w53, w58);
w61 = blend( w45, w46, w54, w59);
w62 = blend( w46, w47, w55, w60);
w63 = blend( w47, w48, w56, w61);

/*
* Initialize hash value for this chunk:
    * a := h0
* b := h1
    * c := h2
    * d := h3
    * e := h4
    * f := h5
    * g := h6
    * h := h7
*/
// Implementation:
a = h0;
b = h1;
c = h2;
d = h3;
e = h4;
f = h5;
g = h6;
h = h7;

/*
* Main loop:
* for i from 0 to 63
         *     s0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
         *     maj := (a and b) xor (a and c) xor (b and c)
         *     t2 := s0 + maj
         *     s1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
         *     ch := (e and f) xor ((not e) and g)
         *     t1 := h + s1 + ch + k[i] + w[i]
*
*     h := g
         *     g := f
         *     f := e
         *     e := d + t1
         *     d := c
         *     c := b
         *     b := a
         *     a := t1 + t2
*/
// Implementation:
// ( Generated with bash script:
// for i in {0..63}; do echo "t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k$i + w$i;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;"; done
// )
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k0 + w0;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k1 + w1;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k2 + w2;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k3 + w3;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k4 + w4;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k5 + w5;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k6 + w6;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k7 + w7;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k8 + w8;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k9 + w9;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k10 + w10;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k11 + w11;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k12 + w12;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k13 + w13;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k14 + w14;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k15 + w15;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k16 + w16;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k17 + w17;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k18 + w18;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k19 + w19;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k20 + w20;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k21 + w21;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k22 + w22;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k23 + w23;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k24 + w24;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k25 + w25;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k26 + w26;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k27 + w27;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k28 + w28;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k29 + w29;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k30 + w30;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k31 + w31;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k32 + w32;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k33 + w33;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k34 + w34;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k35 + w35;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k36 + w36;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k37 + w37;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k38 + w38;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k39 + w39;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k40 + w40;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k41 + w41;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k42 + w42;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k43 + w43;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k44 + w44;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k45 + w45;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k46 + w46;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k47 + w47;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k48 + w48;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k49 + w49;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k50 + w50;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k51 + w51;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k52 + w52;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k53 + w53;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k54 + w54;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k55 + w55;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k56 + w56;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k57 + w57;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k58 + w58;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k59 + w59;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k60 + w60;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k61 + w61;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k62 + w62;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
t2 = s0( a) + maj( a, b, c); t1 = h + s1( e) + ch( e, f, g) + k63 + w63;  h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;

/*
* Add this chunk's hash to result so far:
    * h0 := h0 + a
    * h1 := h1 + b
    * h2 := h2 + c
    * h3 := h3 + d
    * h4 := h4 + e
    * h5 := h5 + f
    * h6 := h6 + g
    * h7 := h7 + h
*/
h0 = add( h0, a);
h1 = add( h1, b);
h2 = add( h2, c);
h3 = add( h3, d);
h4 = add( h4, e);
h5 = add( h5, f);
h6 = add( h6, g);
h7 = add( h7, h);

// Compare the nonce with the decoded difficulty.

// ToDo: check for endianess!!!

if( isSmaller( h7, decoded_difficulty7)
    + isSmaller( h6, decoded_difficulty6)
    + isSmaller( h5, decoded_difficulty5)
    + isSmaller( h4, decoded_difficulty4)
    + isSmaller( h3, decoded_difficulty3)
    + isSmaller( h2, decoded_difficulty2)
    + isSmaller( h1, decoded_difficulty1)
    + isSmaller( h0, decoded_difficulty0) == 8.0) {
    result = nonce;  // Found a valid nonce!
} else {
    result.x = -nonce.x;  // Return -nonce to indicate, that this nonce was not valid...
    result.y = -nonce.y;
}

}

Ciao,
Andreas

K1773R
Legendary
*
Offline Offline

Activity: 1792
Merit: 1008


/dev/null


View Profile
October 30, 2012, 02:56:24 PM
 #43

bounty still "alive"?

[GPG Public Key]
BTC/DVC/TRC/FRC: 1K1773RbXRZVRQSSXe9N6N2MUFERvrdu6y ANC/XPM AK1773RTmRKtvbKBCrUu95UQg5iegrqyeA NMC: NK1773Rzv8b4ugmCgX789PbjewA9fL9Dy1 LTC: LKi773RBuPepQH8E6Zb1ponoCvgbU7hHmd EMC: EK1773RxUes1HX1YAGMZ1xVYBBRUCqfDoF BQC: bK1773R1APJz4yTgRkmdKQhjhiMyQpJgfN
jgarzik (OP)
Legendary
*
Offline Offline

Activity: 1596
Merit: 1100


View Profile
October 30, 2012, 04:34:25 PM
 #44

bounty still "alive"?

Yes.  I'll personally keep it alive, matching the $subject pledge (200 BTC).

Updated OP.


Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
K1773R
Legendary
*
Offline Offline

Activity: 1792
Merit: 1008


/dev/null


View Profile
November 01, 2012, 10:35:55 AM
Last edit: November 15, 2012, 10:34:27 PM by K1773R
 #45

EDIT: nvm Wink

[GPG Public Key]
BTC/DVC/TRC/FRC: 1K1773RbXRZVRQSSXe9N6N2MUFERvrdu6y ANC/XPM AK1773RTmRKtvbKBCrUu95UQg5iegrqyeA NMC: NK1773Rzv8b4ugmCgX789PbjewA9fL9Dy1 LTC: LKi773RBuPepQH8E6Zb1ponoCvgbU7hHmd EMC: EK1773RxUes1HX1YAGMZ1xVYBBRUCqfDoF BQC: bK1773R1APJz4yTgRkmdKQhjhiMyQpJgfN
PeanutPower
Member
**
Offline Offline

Activity: 89
Merit: 10



View Profile
March 13, 2013, 05:36:18 PM
 #46

any updates on this project?

I have a nice old Radeon HD 2600 to play with
MoonShadow
Legendary
*
Offline Offline

Activity: 1708
Merit: 1010



View Profile
March 13, 2013, 05:48:53 PM
 #47

I have an update.  I'm withdrawing any pledges that I have made here.  I no longer consider this project to be relevant or worthwhile.

"The powers of financial capitalism had another far-reaching aim, nothing less than to create a world system of financial control in private hands able to dominate the political system of each country and the economy of the world as a whole. This system was to be controlled in a feudalist fashion by the central banks of the world acting in concert, by secret agreements arrived at in frequent meetings and conferences. The apex of the systems was to be the Bank for International Settlements in Basel, Switzerland, a private bank owned and controlled by the world's central banks which were themselves private corporations. Each central bank...sought to dominate its government by its ability to control Treasury loans, to manipulate foreign exchanges, to influence the level of economic activity in the country, and to influence cooperative politicians by subsequent economic rewards in the business world."

- Carroll Quigley, CFR member, mentor to Bill Clinton, from 'Tragedy And Hope'
jgarzik (OP)
Legendary
*
Offline Offline

Activity: 1596
Merit: 1100


View Profile
March 13, 2013, 09:39:04 PM
 #48

ACK.  I left the 15 BTC pledge active, as I do consider the project still worthwhile... although of diminished importance in FPGA/ASIC era.

Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
PeanutPower
Member
**
Offline Offline

Activity: 89
Merit: 10



View Profile
March 27, 2013, 11:33:47 AM
 #49

$1200 eh Wink
Luke-Jr
Legendary
*
Offline Offline

Activity: 2576
Merit: 1186



View Profile
April 12, 2013, 12:08:34 AM
 #50

Slashdot: Open Source Radeon Gallium3D OpenCL Stack Adds Bitcoin Mining

In the process of working out the kinks now...

jenga
Newbie
*
Offline Offline

Activity: 26
Merit: 0


View Profile
September 02, 2013, 06:55:17 PM
 #51

Hi everyone.
I'm probably reviving an old thread and I apologize for it.

I made a GLSL implementation of the sha256d script I posted here:
https://bitcointalk.org/index.php?topic=286532

Using Opengl 3.3 and GLSL shaders 1.3 with built-in bitwise operations support I can almost reach cgminer performance.

As an example, in a machine I got a GeForce 9600GT. Pretty bad for mining, but shows very well how things went out.

https://en.bitcoin.it/wiki/Mining_hardware_comparison
The chart tells that the GPU gets 15.66 Mh/s. That's what a couple of benchmarks show:

9600GT -> ideally: 15.66 Mh/s
9600GT -> cgminer speed: 15.34 Mh/s
9600GT -> my GLSL script: 14.80 Mh/s

Not sure how this is going to help. The shader right now is naively translated from C code, but even if most of it can be optimized for glsl at most one can reach cgminer performance but not a Kh more.

I considered the possibility of using a combination of vertex/fragment/geometry shaders together but this didn't work out (how do you call 1M vertex shaders? With 1M GL_POINTS to be rasterized...).

Btw I post the code here just in case someone is interested.

Code:
#version 130
#pragma optionNV(unroll all)

uint ROTLEFT(in uint a, in int b) { return (a << b) | (a >> (32-b)); }
uint ROTRIGHT(in uint a, in int b) { return (a >> b) | (a << (32-b)); }

uint CH(in uint x,in uint y,in uint z) { return (x & y) ^ (~x & z); }
uint MAJ(in uint x,in uint y,in uint z) { return (x & y) ^ (x & z) ^ (y & z); }
uint EP0(in uint x) { return ROTRIGHT(x,2) ^ ROTRIGHT(x,13) ^ ROTRIGHT(x,22); }
uint EP1(in uint x) { return ROTRIGHT(x,6) ^ ROTRIGHT(x,11) ^ ROTRIGHT(x,25); }
uint SIG0(in uint x) { return ROTRIGHT(x,7) ^ ROTRIGHT(x,18) ^ (x >> 3); }
uint SIG1(in uint x) { return ROTRIGHT(x,17) ^ ROTRIGHT(x,19) ^ (x >> 10); }

uint k[64] = uint[64](
0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
);

uniform uint midstate[8];
uniform uint text[16];

void main() {

uint a,b,c,d,e,f,g,h,t1,t2,m[64];
uint ee,eee,eeee;
int i;

a = midstate[0];
b = midstate[1];
c = midstate[2];
d = midstate[3];
e = midstate[4];
f = midstate[5];
g = midstate[6];
h = midstate[7];

for (i = 0;  i < 16; i++) m[i] = text[i];

for (; i < 64; i++) m[i] = SIG1(m[i-2]) + m[i-7] + SIG0(m[i-15]) + m[i-16];

for (i = 0; i < 64; i++) {
t1 = h + EP1(e) + CH(e,f,g) + k[i] + m[i];
t2 = EP0(a) + MAJ(a,b,c);
h = g;
g = f;
f = e;
e = d + t1;
d = c;
c = b;
b = a;
a = t1 + t2;
}

m[0] = midstate[0] + a;
m[1] = midstate[1] + b;
m[2] = midstate[2] + c;
m[3] = midstate[3] + d;
m[4] = midstate[4] + e;
m[5] = midstate[5] + f;
m[6] = midstate[6] + g;
m[7] = midstate[7] + h;

a = 0x6a09e667U;
b = 0xbb67ae85U;
c = 0x3c6ef372U;
d = 0xa54ff53aU;
e = 0x510e527fU;
f = 0x9b05688cU;
g = 0x1f83d9abU;
h = 0x5be0cd19U;

m[8]  = 0x80000000U;
m[9]  = 0x00U;
m[10] = 0x00U;
m[11] = 0x00U;
m[12] = 0x00U;
m[13] = 0x00U;
m[14] = 0x00U;
m[15] = 0x100U;

for (i = 16; i < 64; i++) m[i] = SIG1(m[i-2]) + m[i-7] + SIG0(m[i-15]) + m[i-16];

for (i = 0; i < 57; i++) {
t1 = h + EP1(e) + CH(e,f,g) + k[i] + m[i];
t2 = EP0(a) + MAJ(a,b,c);
h = g;
g = f;
f = e;
e = d + t1;
d = c;
c = b;
b = a;
a = t1 + t2;
}

eeee = d + h + EP1(e) + CH(e,f,g) + 0x78a5636fU + m[57];
eee = c + g + EP1(eeee) + CH(eeee,e,f) + 0x84c87814U + m[58];
ee = b + f + EP1(eee) + CH(eee,eeee,e) + 0x8cc70208U + m[59];
h = a + e + EP1(ee) + CH(ee,eee,eeee) + 0x90befffaU + m[60];

if (0x5be0cd19U + h == 0x00U) {
gl_FragColor=vec4(0.0,1.0,0.0,1.0);
} else { gl_FragColor=vec4(1.0,0.0,0.0,1.0); }
}
teknohog
Sr. Member
****
Offline Offline

Activity: 520
Merit: 253


555


View Profile WWW
September 03, 2013, 11:59:39 AM
 #52

Here's another 10 BTC. With the recent USD price of bitcoins, I wouldn't even say "it's not much".

Retracting my offer, as GPUs are now useless for sha256 mining.

world famous math art | masternodes are bad, mmmkay?
Every sha(sha(sha(sha()))), every ho-o-o-old, still shines
jgarzik (OP)
Legendary
*
Offline Offline

Activity: 1596
Merit: 1100


View Profile
September 03, 2013, 02:14:55 PM
 #53

Here's another 10 BTC. With the recent USD price of bitcoins, I wouldn't even say "it's not much".

Retracting my offer, as GPUs are now useless for sha256 mining.

Yes, the OP was long ago updated to reflect current bounty status (15 BTC from me).


Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
Pages: « 1 2 [3]  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!