Bitcoin Forum
Author Topic: [BOUNTY] sha256 shader for Linux OSS video drivers (15 BTC pledged)  (Read 31373 times)
DiabloD3
May 20, 2011, 07:17:37 AM
 #21

:-(

Well, it's why OpenCL was invented in the first place. Using GLSL for even generic computing tasks that don't fit the OpenGL workflow is problematic and really not worth it.

I don't want to shoot down anyone's hopes, but Mesa already is growing OpenCL support for Gallium. What more could we possibly ask for?
Best wishes?
Quicker Gallium3D adoption?
Quicker Mesa development (along with Gallium3D)?
Better drivers (especially free drivers, which are now about 10x slower than their proprietary counterparts)?
More suitable SDKs?
AMD/Nvidia support for both developers and OpenCL itself (today Intel CPUs have better OpenCL support than Nvidia GPUs).
And yes, OpenCL/OpenGL ES is cool, at least in theory. heil glorious OpenMAX developer


Mesa and Gallium are open source projects, you can always start developing for them.

Basiley
May 20, 2011, 07:28:01 AM
 #22

I'm a terrific developer: I have never seriously written anything serious (the last time was years ago), so I'm hardly helpful for such a project,
except maybe for polishing docs or translation,
unless they need an AV engineer, a system administrator, a regional representative (for Russia in particular), etc.
Zamicol
June 09, 2011, 02:10:59 AM
 #23

I should probably retract my offer... I sold all my Bitcoins the other day.  I didn't even think about this until last night.  Seeing where this discussion went, I hope that isn't a problem. 
xaci
June 09, 2011, 05:01:57 PM
 #24

As Diablo has already pointed out, GLSL (version 1.2 and earlier) has no support for 32-bit integers or bitwise operators. GLSL 1.2 corresponds to OpenGL 2.1, which is what you'll currently get with FLOSS drivers (i.e. Mesa/Gallium). It is in theory possible to do equivalent calculations using float pairs (16 bits in each) and conditional-arithmetic equivalents for XOR, bit shift, rotation etc. (e.g. division by a power of two is the same as a right shift, rotation may be implemented by moving the fractional bits after a shift, and so on). Look, it *might* work, but there will be absolutely no gain at all. You'll be lucky if you get a few Mhash/s from it.
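
A minimal C sketch of the float-pair idea described above, meant only for checking the arithmetic on a CPU before moving it into a shader. The names (pair16, floor_pos, shr_pair, rotr_pair, rotr_any) are made up for this illustration and are not taken from any posted code. It assumes each half is a non-negative integer below 65536, so every intermediate value fits exactly in a float.

```c
#include <stdint.h>

/* One 32-bit word as two 16-bit halves, each held in a float (like a vec2). */
typedef struct { float hi, lo; } pair16;

/* floor() for non-negative values; avoids needing the math library. */
static float floor_pos(float x) { return (float)(uint32_t)x; }

static pair16 split(uint32_t v) {
    pair16 p;
    p.hi = (float)(v >> 16);
    p.lo = (float)(v & 0xFFFFu);
    return p;
}

static uint32_t join(pair16 p) {
    return ((uint32_t)p.hi << 16) | (uint32_t)p.lo;
}

/* Right shift by n bits (0 < n < 16): divide by 2^n, then move the bits
   that dropped out of the high half into the top of the low half. */
static pair16 shr_pair(pair16 a, unsigned n) {
    float s = (float)(1u << n);
    float qh = a.hi / s, ql = a.lo / s;
    pair16 r;
    r.hi = floor_pos(qh);
    r.lo = floor_pos(ql) + (qh - floor_pos(qh)) * 65536.0f;
    return r;
}

/* Rotate right by n bits (0 < n < 16): like the shift, but the bits that
   dropped out of the low half wrap around into the high half. */
static pair16 rotr_pair(pair16 a, unsigned n) {
    float s = (float)(1u << n);
    float qh = a.hi / s, ql = a.lo / s;
    pair16 r;
    r.hi = floor_pos(qh) + (ql - floor_pos(ql)) * 65536.0f;
    r.lo = floor_pos(ql) + (qh - floor_pos(qh)) * 65536.0f;
    return r;
}

/* Rotate right by n bits (0 <= n < 32): a 16-bit rotation is a swap of the
   halves, so larger rotations are a swap followed by a rotation by n - 16. */
static pair16 rotr_any(pair16 a, unsigned n) {
    if (n >= 16) {
        float t = a.hi;
        a.hi = a.lo;
        a.lo = t;
        n -= 16;
    }
    return n ? rotr_pair(a, n) : a;
}
```

Comparing join(rotr_any(split(x), n)) against a native 32-bit rotation for a handful of values is a cheap way to catch mistakes in the fractional-bit handling.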

GLSL versions 1.3 and above support 32-bit unsigned integers as well as bitwise operators. I'm in the process of writing a GLSL 1.3 shader, and it should be completed in a few days. The downside is that it requires (at least partial support for) OpenGL 3.0. There is no complete FLOSS OpenGL 3.0 implementation. AFAIK, the proprietary ATI/AMD driver has OpenGL 3.0 support for R600 and later. On the other hand, in practice, only the extension GL_EXT_gpu_shader4 is necessary, not the complete OpenGL 3.0 (at least I think that's right). If any of the FLOSS drivers implements that extension, then it should be possible to run my (yet to be written) shader on those drivers as well. In any case, the proprietary ATI/AMD driver should run it, which means it will become possible to mine on R600 and R700 hardware which does not support OpenCL.
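
For comparison, with real 32-bit unsigned integers and bitwise operators the SHA-256 building blocks become one-liners. The C sketch below shows the standard functions; the function names are chosen for this illustration. A GLSL 1.3 shader could presumably express the same thing with uint and the same operators, but that mapping is an assumption here, not tested shader code.

```c
#include <stdint.h>

/* Rotate right by n bits, 0 < n < 32. */
static uint32_t rotr32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32u - n));
}

/* "Choose": for each bit, take y where x is 1 and z where x is 0. */
static uint32_t ch(uint32_t x, uint32_t y, uint32_t z) {
    return (x & y) ^ (~x & z);
}

/* "Majority": each result bit is the majority vote of the three inputs. */
static uint32_t maj(uint32_t x, uint32_t y, uint32_t z) {
    return (x & y) ^ (x & z) ^ (y & z);
}

/* Round functions (e0 and e1 in the float-pair shader). */
static uint32_t big_sigma0(uint32_t x) {
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22);
}

static uint32_t big_sigma1(uint32_t x) {
    return rotr32(x, 6) ^ rotr32(x, 11) ^ rotr32(x, 25);
}

/* Message schedule functions (s0 and s1 inside blend). */
static uint32_t small_sigma0(uint32_t x) {
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

static uint32_t small_sigma1(uint32_t x) {
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10);
}
```

Each of these is a handful of native instructions, whereas the float-pair version needs a 16-step loop per XOR or AND.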

Unfortunately it will take a few days before I get access to hardware to test this out. I do have a HD3850, but nowhere to plug it in.

See below (sorry, couldn't attach it) for a ridiculous GLSL 1.2 shader that (partially) calculates SHA256 hashes. (NOTE: it's incomplete, and may even be incorrect since it's untested -- it also crashes my system which has an old Intel IGP with partial GLSL support)

Code:
#version 120

/*
32-bit integers are represented by a vec2. GLSL 1.20 integers may only have up
to 16-bit precision (in portable code), and they are likely to be implemented
with floats anyway. Instead we use float pairs, with 16 bits in each (although
floats have 24 bits of precision). A vec4 is also used instead of two vec2,
where possible.
*/

uniform vec4 data[8]; /* Second part of data */
uniform vec4 hash1[4]; /* Second part of hash1 */
uniform vec4 midstate[4];
uniform vec4 target[4];
uniform vec2 nonce_base;

/* Note: N is the width of the buffer and should only be between 1 and 2048 or
so. Preferably less -- around 128 or 256. */
uniform float N;

/* Note: offset is two independent floats, with values between 0 and N. */
varying vec2 varying_nonce_offset;

const vec4 stdstate[4] = vec4[](
vec4 (float (0x6a09), float (0xe667), float (0xbb67), float (0xae85)),
vec4 (float (0x3c6e), float (0xf372), float (0xa54f), float (0xf53a)),
vec4 (float (0x510e), float (0x527f), float (0x9b05), float (0x688c)),
vec4 (float (0x1f83), float (0xd9ab), float (0x5be0), float (0xcd19)));

const vec4 k[32] = vec4[](
vec4 (float (0x428a), float (0x2f98), float (0x7137), float (0x4491)),
vec4 (float (0xb5c0), float (0xfbcf), float (0xe9b5), float (0xdba5)),
vec4 (float (0x3956), float (0xc25b), float (0x59f1), float (0x11f1)),
vec4 (float (0x923f), float (0x82a4), float (0xab1c), float (0x5ed5)),

vec4 (float (0xd807), float (0xaa98), float (0x1283), float (0x5b01)),
vec4 (float (0x2431), float (0x85be), float (0x550c), float (0x7dc3)),
vec4 (float (0x72be), float (0x5d74), float (0x80de), float (0xb1fe)),
vec4 (float (0x9bdc), float (0x06a7), float (0xc19b), float (0xf174)),

vec4 (float (0xe49b), float (0x69c1), float (0xefbe), float (0x4786)),
vec4 (float (0x0fc1), float (0x9dc6), float (0x240c), float (0xa1cc)),
vec4 (float (0x2de9), float (0x2c6f), float (0x4a74), float (0x84aa)),
vec4 (float (0x5cb0), float (0xa9dc), float (0x76f9), float (0x88da)),

vec4 (float (0x983e), float (0x5152), float (0xa831), float (0xc66d)),
vec4 (float (0xb003), float (0x27c8), float (0xbf59), float (0x7fc7)),
vec4 (float (0xc6e0), float (0x0bf3), float (0xd5a7), float (0x9147)),
vec4 (float (0x06ca), float (0x6351), float (0x1429), float (0x2967)),

vec4 (float (0x27b7), float (0x0a85), float (0x2e1b), float (0x2138)),
vec4 (float (0x4d2c), float (0x6dfc), float (0x5338), float (0x0d13)),
vec4 (float (0x650a), float (0x7354), float (0x766a), float (0x0abb)),
vec4 (float (0x81c2), float (0xc92e), float (0x9272), float (0x2c85)),

vec4 (float (0xa2bf), float (0xe8a1), float (0xa81a), float (0x664b)),
vec4 (float (0xc24b), float (0x8b70), float (0xc76c), float (0x51a3)),
vec4 (float (0xd192), float (0xe819), float (0xd699), float (0x0624)),
vec4 (float (0xf40e), float (0x3585), float (0x106a), float (0xa070)),

vec4 (float (0x19a4), float (0xc116), float (0x1e37), float (0x6c08)),
vec4 (float (0x2748), float (0x774c), float (0x34b0), float (0xbcb5)),
vec4 (float (0x391c), float (0x0cb3), float (0x4ed8), float (0xaa4a)),
vec4 (float (0x5b9c), float (0xca4f), float (0x682e), float (0x6ff3)),

vec4 (float (0x748f), float (0x82ee), float (0x78a5), float (0x636f)),
vec4 (float (0x84c8), float (0x7814), float (0x8cc7), float (0x0208)),
vec4 (float (0x90be), float (0xfffa), float (0xa450), float (0x6ceb)),
vec4 (float (0xbef9), float (0xa3f7), float (0xc671), float (0x78f2)));

/* For right shifts (sftr) and rotations (rotr), divide by the appropriate power of 2. */

/* Do not let overflow happen with this function, or use sum_c instead! */
vec2 sum (vec2 a, vec2 b)
{
    vec2 ret;
    ret.x = a.x + b.x;
    ret.y = a.y + b.y;
    if (ret.y >= float(0x10000))
    {
        ret.y -= float(0x10000);
        ret.x += 1.0;
    }
    if (ret.x >= float(0x10000))
        ret.x -= float(0x10000);
    return ret;
}

vec2 sum_c (vec2 a, vec2 b, out float carry)
{
    vec2 ret;
    carry = 0.0; /* An out parameter is undefined unless it is written. */
    ret.x = a.x + b.x;
    ret.y = a.y + b.y;
    if (ret.y >= float(0x10000))
    {
        ret.y -= float(0x10000);
        ret.x += 1.0;
    }
    if (ret.x >= float(0x10000))
    {
        ret.x -= float(0x10000);
        carry = 1.0;
    }
    return ret;
}

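/* Product of two small non-negative integers. The result is only exact while
   a * b still fits in the float mantissa. */
vec2 prod (float a, float b)
{
    vec2 ret;
    ret.x = 0.0;
    ret.y = a * b;
    if (ret.y >= float(0x10000))
    {
        float c = floor (ret.y / float(0x10000));
        ret.x += c;
        ret.y -= c * float(0x10000);
    }
    return ret;
}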

/* Note: shift should be a power of two, e.g. to shift 3 steps, use 2^3. */
vec2 sftr (vec2 a, float shift)
{
    vec2 ret = a / shift;
    ret = vec2 (floor (ret.x), floor (ret.y) + fract (ret.x) * float (0x10000));
    return ret;
}

/* Note: shift should be a power of two, e.g. to rotate 3 steps, use 2^3. */
vec2 rotr (vec2 a, float shift)
{
    vec2 ret = a / shift;
    ret = floor (ret) + fract (ret.yx) * float (0x10000);
    return ret;
}

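float xor16 (float a, float b)
{
    float ret = 0.0;
    float fact = float (0x8000);
    /* Walk the 16 bits from most to least significant. Stopping at 1.0
       avoids useless iterations with fractional factors. */
    while (fact >= 1.0)
    {
        if ((a >= fact || b >= fact) && (a < fact || b < fact))
            ret += fact;

        if (a >= fact)
            a -= fact;
        if (b >= fact)
            b -= fact;

        fact /= 2.0;
    }
    return ret;
}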

vec2 xor (vec2 a, vec2 b)
{
return vec2 (xor16 (a.x, b.x), xor16 (a.y, b.y));
}

float and16 (float a, float b)
{
    float ret = 0.0;
    float fact = float (0x8000);
    while (fact >= 1.0)
    {
        /* TODO: This still does XOR. A real AND would add fact only when
           both a >= fact and b >= fact. */
        if ((a >= fact || b >= fact) && (a < fact || b < fact))
            ret += fact;

        if (a >= fact)
            a -= fact;
        if (b >= fact)
            b -= fact;

        fact /= 2.0;
    }
    return ret;
}

vec2 and (vec2 a, vec2 b)
{
return vec2 (and16 (a.x, b.x), and16 (a.y, b.y));
}

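/* Logical complement ("not").
   NOTE: this subtracts from 0x10000. The 16-bit complement of a word is
   0xFFFF - a, so each half comes out one too large as written. */
vec2 cpl (vec2 a)
{
    return vec2 (float (0x10000), float (0x10000)) - a;
}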

#define POW_2_01 2.0
#define POW_2_02 4.0
#define POW_2_03 8.0
#define POW_2_06 64.0
#define POW_2_07 128.0
#define POW_2_09 512.0
#define POW_2_10 1024.0
#define POW_2_11 2048.0
#define POW_2_13 8192.0

vec2 blend (vec2 m16, vec2 m15, vec2 m07, vec2 m02)
{
    vec2 s0 = xor (rotr (m15   , POW_2_07), xor (rotr (m15.yx, POW_2_02), sftr (m15, POW_2_03)));
    vec2 s1 = xor (rotr (m02.yx, POW_2_01), xor (rotr (m02.yx, POW_2_03), sftr (m02, POW_2_10)));
    return sum (sum (m16, s0), sum (m07, s1));
}

vec2 e0 (vec2 a)
{
    return xor (rotr (a, POW_2_02), xor (rotr (a, POW_2_13), rotr (a.yx, POW_2_06)));
}

vec2 e1 (vec2 a)
{
    return xor (rotr (a, POW_2_06), xor (rotr (a, POW_2_11), rotr (a.yx, POW_2_09)));
}

vec2 ch (vec2 a, vec2 b, vec2 c)
{
    return xor (and (a, b), and (cpl (a), c));
}

vec2 maj (vec2 a, vec2 b, vec2 c)
{
    return xor (xor (and (a, b), and (a, c)), and (b, c));
}

void main ()
{
    vec2 nonce_offset = floor (varying_nonce_offset);
    vec2 nonce = sum (nonce_base, sum(prod(nonce_offset.y, N), vec2 (0.0, nonce_offset.x)));

    vec4 w[24];
    vec4 hash0[4];
    vec4 tmp[4];
#define a (tmp[0].xy)
#define b (tmp[0].zw)
#define c (tmp[1].xy)
#define d (tmp[1].zw)
#define e (tmp[2].xy)
#define f (tmp[2].zw)
#define g (tmp[3].xy)
#define h (tmp[3].zw)
    vec2 t1, t2;

    /* TODO: Using midstate as state, calculate hash "hash0" of data with nonce applied */
    w[0].xy = blend (data[0].xy, data[0].zw, data[4].zw, data[7].xy);
    w[0].zw = blend (data[0].zw, data[1].xy, data[5].xy, data[7].zw);
    w[1].xy = blend (data[1].xy, data[1].zw, data[5].zw,    w[0].xy);
    w[1].zw = blend (data[1].zw, data[2].xy, data[6].xy,    w[0].zw);
    w[2].xy = blend (data[2].xy, data[2].zw, data[6].zw,    w[1].xy);
    /* NOTE: nonce is a vec2, so it has no z or w component. The nonce.zw
       swizzles in the next lines are rejected by a GLSL compiler as posted. */
    w[2].zw = blend (data[2].zw, nonce.xy,   data[7].xy,    w[1].zw);
    w[3].xy = blend (nonce.xy,   nonce.zw,   data[7].zw,    w[2].xy);
    w[3].zw = blend (nonce.zw,   data[4].xy,    w[0].xy,    w[2].zw);
    w[4].xy = blend (data[4].xy, data[4].zw,    w[0].zw,    w[3].xy);
    w[4].zw = blend (data[4].zw, data[5].xy,    w[1].xy,    w[3].zw);
    w[5].xy = blend (data[5].xy, data[5].zw,    w[1].zw,    w[4].xy);
    w[5].zw = blend (data[5].zw, data[6].xy,    w[2].xy,    w[4].zw);
    w[6].xy = blend (data[6].xy, data[6].zw,    w[2].zw,    w[5].xy);
    w[6].zw = blend (data[6].zw, data[7].xy,    w[3].xy,    w[5].zw);
    w[7].xy = blend (data[7].xy, data[7].zw,    w[3].zw,    w[6].xy);
    w[7].zw = blend (data[7].zw, w[0].xy,    w[4].xy,    w[6].zw);
    for (int i = 8; i < 24; ++i)
    {
        w[i].xy = blend (w[i-8].xy, w[i-8].zw, w[i-4].zw, w[i-1].xy);
        w[i].zw = blend (w[i-8].zw, w[i-7].xy, w[i-3].xy, w[i-1].zw);
    }
    tmp = midstate;

/* TODO: Add loop-unrolled of i = 0 to 3, where data is used instead of w. */
/*for (int i = 4; i < 32; i+=4)
{
t1 = sum (sum (sum (sum (h, e1(e)), ch(e,f,g)), k[i+0].xy), w[i-4+0].xy);
t2 = sum (e0(a), maj(a,b,c)); d = sum (d, t1); h = sum (t1, t2);
t1 = sum (sum (sum (sum (g, e1(d)), ch(d,e,f)), k[i+0].zw), w[i-4+0].zw);
t2 = sum (e0(h), maj(h,a,b)); c = sum (c, t1); g = sum (t1, t2);
t1 = sum (sum (sum (sum (f, e1(c)), ch(c,d,e)), k[i+1].xy), w[i-4+1].xy);
t2 = sum (e0(g), maj(g,h,a)); b = sum (b, t1); f = sum (t1, t2);
t1 = sum (sum (sum (sum (e, e1(b)), ch(b,c,d)), k[i+1].zw), w[i-4+1].zw);
t2 = sum (e0(f), maj(f,g,h)); a = sum (a, t1); e = sum (t1, t2);
t1 = sum (sum (sum (sum (d, e1(a)), ch(a,b,c)), k[i+2].xy), w[i-4+2].xy);
t2 = sum (e0(e), maj(e,f,g)); h = sum (h, t1); d = sum (t1, t2);
t1 = sum (sum (sum (sum (c, e1(h)), ch(h,a,b)), k[i+2].zw), w[i-4+2].zw);
t2 = sum (e0(d), maj(d,e,f)); g = sum (g, t1); c = sum (t1, t2);
t1 = sum (sum (sum (sum (b, e1(g)), ch(g,h,a)), k[i+3].xy), w[i-4+3].xy);
t2 = sum (e0(c), maj(c,d,e)); f = sum (f, t1); b = sum (t1, t2);
t1 = sum (sum (sum (sum (a, e1(f)), ch(f,g,h)), k[i+3].zw), w[i-4+3].zw);
t2 = sum (e0(b), maj(b,c,d)); e = sum (e, t1); a = sum (t1, t2);
}*/

/* TODO: More iterations... Copy-paste block and fix k-index and W-value. */

    for (int i = 0; i < 4; ++i)
    {
        hash0[i].xy = sum (midstate[i].xy, tmp[i].xy);
        hash0[i].zw = sum (midstate[i].zw, tmp[i].zw);
    }

    vec4 hash[4];
    /* TODO: Using stdstate as state, calculate the hash of (hash0, hash1) */

    /* TODO: Compare with target. */

    /* Placeholder output until the comparison exists: red encodes the parity
       of the low nonce word (0.0 for even, 1.0 for odd). */
    gl_FragColor = vec4 (mod (nonce.y, 2.0), 0.0, 0.0, 1.0);
}
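
Since the shader above is untested, a CPU reference is useful for comparing intermediate values. The following is a compact C sketch of the standard SHA-256 compression function, plus a helper that computes the state after the first 64 bytes of an 80-byte block header, which is what miners usually call the midstate. The names sha256_init, sha256_compress and header_midstate are chosen for this sketch. How exactly the shader's midstate and data uniforms are meant to be filled is an assumption, since the post does not spell it out.

```c
#include <stdint.h>

static const uint32_t K[64] = {
    0x428a2f98u, 0x71374491u, 0xb5c0fbcfu, 0xe9b5dba5u, 0x3956c25bu, 0x59f111f1u, 0x923f82a4u, 0xab1c5ed5u,
    0xd807aa98u, 0x12835b01u, 0x243185beu, 0x550c7dc3u, 0x72be5d74u, 0x80deb1feu, 0x9bdc06a7u, 0xc19bf174u,
    0xe49b69c1u, 0xefbe4786u, 0x0fc19dc6u, 0x240ca1ccu, 0x2de92c6fu, 0x4a7484aau, 0x5cb0a9dcu, 0x76f988dau,
    0x983e5152u, 0xa831c66du, 0xb00327c8u, 0xbf597fc7u, 0xc6e00bf3u, 0xd5a79147u, 0x06ca6351u, 0x14292967u,
    0x27b70a85u, 0x2e1b2138u, 0x4d2c6dfcu, 0x53380d13u, 0x650a7354u, 0x766a0abbu, 0x81c2c92eu, 0x92722c85u,
    0xa2bfe8a1u, 0xa81a664bu, 0xc24b8b70u, 0xc76c51a3u, 0xd192e819u, 0xd6990624u, 0xf40e3585u, 0x106aa070u,
    0x19a4c116u, 0x1e376c08u, 0x2748774cu, 0x34b0bcb5u, 0x391c0cb3u, 0x4ed8aa4au, 0x5b9cca4fu, 0x682e6ff3u,
    0x748f82eeu, 0x78a5636fu, 0x84c87814u, 0x8cc70208u, 0x90befffau, 0xa4506cebu, 0xbef9a3f7u, 0xc67178f2u
};

static uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32u - n)); }

/* Standard initial state (the stdstate constant in the shader). */
static void sha256_init(uint32_t st[8]) {
    static const uint32_t h0[8] = {
        0x6a09e667u, 0xbb67ae85u, 0x3c6ef372u, 0xa54ff53au,
        0x510e527fu, 0x9b05688cu, 0x1f83d9abu, 0x5be0cd19u
    };
    for (int i = 0; i < 8; i++) st[i] = h0[i];
}

/* Process one 64-byte block and update the state in place. */
static void sha256_compress(uint32_t st[8], const uint8_t block[64]) {
    uint32_t w[64], v[8];
    int i;
    for (i = 0; i < 16; i++)            /* words are loaded big-endian */
        w[i] = ((uint32_t)block[4 * i] << 24) | ((uint32_t)block[4 * i + 1] << 16) |
               ((uint32_t)block[4 * i + 2] << 8) | (uint32_t)block[4 * i + 3];
    for (i = 16; i < 64; i++) {         /* message schedule (blend in the shader) */
        uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    for (i = 0; i < 8; i++) v[i] = st[i];
    for (i = 0; i < 64; i++) {          /* v[0..7] play the roles of a..h */
        uint32_t a = v[0], e = v[4];
        uint32_t t1 = v[7] + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) +
                      ((e & v[5]) ^ (~e & v[6])) + K[i] + w[i];
        uint32_t t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) +
                      ((a & v[1]) ^ (a & v[2]) ^ (v[1] & v[2]));
        v[7] = v[6]; v[6] = v[5]; v[5] = v[4]; v[4] = v[3] + t1;
        v[3] = v[2]; v[2] = v[1]; v[1] = v[0]; v[0] = t1 + t2;
    }
    for (i = 0; i < 8; i++) st[i] += v[i];
}

/* State after the first 64 of the 80 header bytes. It does not depend on
   the nonce, so it can be computed once on the host. */
static void header_midstate(const uint8_t header[80], uint32_t mid[8]) {
    sha256_init(mid);
    sha256_compress(mid, header);
}
```

Because the nonce sits in the last 16 bytes of the header, only the second block and the second hash have to be recomputed per nonce, which is presumably why the shader takes midstate as a uniform.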
xf2_org
June 09, 2011, 06:51:08 PM
 #25

As Diablo has already pointed out, GLSL (version 1.2 and earlier) has no support for 32-bit integers or bitwise operators. GLSL 1.2 corresponds to OpenGL 2.1, which is what you'll currently get with FLOSS drivers (i.e. Mesa/Gallium). It is in theory possible to do equivalent calculations using float pairs (16 bits in each) and conditional-arithmetic equivalents for XOR, bit shift, rotation etc. (e.g. division by a power of two is the same as a right shift, rotation may be implemented by moving the fractional bits after a shift, and so on). Look, it *might* work, but there will be absolutely no gain at all. You'll be lucky if you get a few Mhash/s from it.

The bounty is for full-performance assembly that works on ATI 5870/5970 at a minimum, not slow GLSL.

Please ignore Diablo-D3, he has a talent for taking threads off-topic.

The bounty requires loading full performance binary code onto ATI hardware running full open source drivers/stack.

xaci
June 09, 2011, 07:22:19 PM
 #26

Well, I don't think an OpenGL 3 shader will be necessarily slow. The GLSL code is also compiled into a binary to be run on the GPU, after all. The OpenGL 3 shading language has enough features to do SHA256 without ugly hacks. It should in theory be just as fast as an OpenCL equivalent (I will admit that I'm not sure though. However, I believe it's worth a try -- especially for models where OpenCL is not an alternative). Do you know if the FLOSS drivers for ATI 5870/5970 (or other models) support GL_EXT_gpu_shader4?

And just in case it wasn't obvious; I didn't expect to collect a bounty for that code I posted.
LegitBit
June 09, 2011, 07:36:52 PM
 #27

Forgive me if I am totally wrong.. but don't ATI cards have specific tessellation units separate from the shader ALU's?

Tessellation is a math heavy algo, but nVidia cards even surpass ATI's in this case.

Is that because of tessellation requiring more iterations? Math isn't my strong suit, but I figure a probe in that direction might help.

DiabloD3
June 09, 2011, 10:39:13 PM
 #28

Forgive me if I am totally wrong.. but don't ATI cards have specific tessellation units separate from the shader ALU's?

Tessellation is a math heavy algo, but nVidia cards even surpass ATI's in this case.

Is that because of tessellation requiring more iterations? Math isn't my strong suit, but I figure a probe in that direction might help.

Both ATI and Nvidia have fixed-function hardware dedicated to tessellation. Nvidia 5xx tessellation performance is about the same as Radeon 5xxx/68xx performance, both of which are really inferior to 69xx performance.

error
June 10, 2011, 01:48:35 AM
 #29

Oh goody, now I can mine not only on my huge stack of 5850s, but also on the motherboard's embedded HD3300!

derjanb
July 08, 2011, 01:44:49 PM
 #30

I've created a WebGL bitcoin miner: http://forum.bitcoin.org/index.php?topic=27056.0

Maybe the shader can be reused for this?!
teknohog
December 14, 2011, 10:22:43 PM
 #31

Seems like regular OpenCL is on its way to the opensource Radeon drivers:

http://www.phoronix.com/scan.php?page=news_item&px=MTAyNTg

shakaru
December 16, 2011, 12:36:34 AM
 #32

Holy necroposting batman!

daybyter
February 04, 2012, 02:06:28 AM
 #33

@xaci: thanks a lot for your sha256 code. Do you know the Nvidia Cg compiler? It has 32-bit ints, AFAIK. I have a GeForce 7 card that is supported by cgc, but it only has OpenGL 2.1 support AFAIK (I can't check right now, since I'm at another machine). If I wrote a BrookGPU kernel and used the cgc compiler, I'd have 32-bit integers, right?

TIA,
Andreas

daybyter
February 07, 2012, 09:13:21 PM
 #34

Did anyone get the sha256 GLSL code to work?

So far I have been reading GLSL tutorials and hacked a test app together (from too many sources to recall all the authors... sorry :( ):

Code:
#include <stdio.h>                      //C standard IO
#include <stdlib.h>                     //C standard lib
#include <string.h>                     //C string lib

#include <GL/glew.h>                    //GLEW lib
#include <GL/glut.h>                    //GLUT lib


//Function from: http://www.evl.uic.edu/aej/594/code/ogl.cpp
//Read in a textfile (GLSL program)
// we need to pass it as a string to the GLSL driver
char *textFileRead(char *fn) {
  FILE *fp;
  char *content = NULL;
  
  int count=0;
  
  if (fn != NULL) {
    
    fp = fopen(fn,"rt");
    
    if (fp != NULL) {
      
      fseek(fp, 0, SEEK_END);
      count = ftell(fp);
      rewind(fp);
      
      if (count > 0) {
        content = (char *)malloc(sizeof(char) * (count+1));
        count = fread(content,sizeof(char),count,fp);
        content[count] = '\0';
      }
      fclose(fp);
      
    }
  }
  
  return content;
}

//Function from: http://www.evl.uic.edu/aej/594/code/ogl.cpp
//Write a string to a textfile
// we can use this to write to a text file
int textFileWrite(char *fn, char *s) {
  FILE *fp;
  int status = 0;
  
  if (fn != NULL) {
    fp = fopen(fn,"w");
    
    if (fp != NULL) {                  
      if (fwrite(s,sizeof(char),strlen(s),fp) == strlen(s))
        status = 1;
      fclose(fp);
    }
  }
  return(status);
}

/**
 * Setup shaders
 */
void setShaders() {
  char *my_fragment_shader_source;
  // char * my_vertex_shader_source;
  GLenum error;

  // The ARB shader-object API works with GLhandleARB handles, not GLenum.
  GLhandleARB my_program;
  // GLhandleARB my_vertex_shader;
  GLhandleARB my_fragment_shader;

  // Get the fragment shader source
  my_fragment_shader_source = textFileRead( "sha256.glsl");
  // my_vertex_shader_source = GetVertexShaderSource();
  if ( my_fragment_shader_source == NULL) {
    printf( "Could not read sha256.glsl\n");
    exit( 1);
  }

  // my_vertex_shader = glCreateShaderObjectARB(GL_VERTEX_SHADER_ARB);
  my_fragment_shader = glCreateShaderObjectARB(GL_FRAGMENT_SHADER_ARB);

  // Load Shader Sources
  // glShaderSourceARB(my_vertex_shader, 1, &my_vertex_shader_source, NULL);
  glShaderSourceARB( my_fragment_shader, 1, (const GLcharARB** )&my_fragment_shader_source, NULL);

  // Compile The Shaders
  // glCompileShaderARB(my_vertex_shader);
  glCompileShaderARB(my_fragment_shader);

  // Check for compile errors
  GLint compiled = 0;
  glGetObjectParameterivARB( my_fragment_shader, GL_OBJECT_COMPILE_STATUS_ARB, &compiled );

  if  ( !compiled ) {
    GLint maxLength = 0;

    // Stay within the ARB API here instead of mixing in the core
    // glGetShaderiv()/glGetShaderInfoLog() calls.
    glGetObjectParameterivARB( my_fragment_shader, GL_OBJECT_INFO_LOG_LENGTH_ARB, &maxLength);

    if ( maxLength > 0) {
      /* The maxLength includes the NULL character */
      char *fragmentInfoLog = malloc( maxLength * sizeof(char));

      glGetInfoLogARB( my_fragment_shader, maxLength, &maxLength, fragmentInfoLog);

      printf( "Compile error log: %s\n\n", fragmentInfoLog);

      /* Handle the error in an appropriate way such as displaying a message or writing to a log file. */
      /* In this simple program, we'll just leave */
      free( fragmentInfoLog);
    }
  }

  // Create Shader And Program Objects
  my_program = glCreateProgramObjectARB();

  if(( error=glGetError()) != GL_NO_ERROR) {
    exit( error);
  }

  // Attach The Shader Objects To The Program Object
  // glAttachObjectARB(my_program, my_vertex_shader);
  glAttachObjectARB(my_program, my_fragment_shader);

  // Link The Program Object
  glLinkProgramARB(my_program);

  // Use The Program Object Instead Of Fixed Function OpenGL
  glUseProgramObjectARB(my_program);
}

int main( int argc, char *argv[]) {

  glutInit(&argc, argv);
  //glutInitDisplayMode(GLUT_DEPTH | GLUT_DOUBLE | GLUT_RGBA);
  glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGBA);
  glutInitWindowPosition(100,100);
  glutInitWindowSize(320,320);
  glutCreateWindow("GPU");
  
  //  glutDisplayFunc(renderScene);
  // glutIdleFunc(renderScene);
  // glutReshapeFunc(changeSize);
  // glutKeyboardFunc(processNormalKeys);
  
  glewInit();
  if (glewIsSupported("GL_VERSION_2_1"))
    printf("Ready for OpenGL 2.1\n");
  else {
    printf("OpenGL 2.1 not supported\n");
    exit(1);
  }
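  // Note: GL_EXT_geometry_shader4 is a compile-time macro from glew.h, so it is always true here.
  // The runtime flag would be GLEW_EXT_geometry_shader4, but this test needs no geometry shader.
  if (GLEW_ARB_vertex_shader && GLEW_ARB_fragment_shader && GL_EXT_geometry_shader4)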
    printf("Ready for GLSL - vertex, fragment, and geometry units\n");
  else {
    printf("Not totally ready :( \n");
    exit(1);
  }

  setShaders();
  
  glutMainLoop();
  
  // just for compatibiliy purposes
  return 0;

  // glDeleteObjectARB( my_program);
  // glDeleteObjectARB( my_fragment_shader);
}

There are lots of bugs in this code, but at the moment I just want to compile the shader and start it to do further checks.

I also wrote a small makefile:
Code:
PROGRAM := glslminer

SOURCES := $(wildcard *.c)

CC = gcc
CCOPTS =
LINKEROPTS = -lGL -lGLEW -lglut

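.PHONY: all
all: $(PROGRAM)

# Libraries go after the sources so the linker can resolve their symbols.
$(PROGRAM): $(SOURCES)
	$(CC) $(CCOPTS) $(SOURCES) -o $(PROGRAM) $(LINKEROPTS)

.PHONY: clean
clean:
	rm -f $(PROGRAM)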

and when I compile and start the code as root (as a regular user, I don't get access to the Nvidia card here), I get:

Code:
localhost glsl # ./glslminer 
Ready for OpenGL 2.1
Ready for GLSL - vertex, fragment, and geometry units
Compile error log: 0(232) : error C1031: swizzle mask element not present in operand "zw"
0(233) : error C1031: swizzle mask element not present in operand "zw"

which seems to mean that some of the <something>.zw operations fail (I don't know the line number yet, since the newlines seem to get lost in my shader source import).

Anyone with more luck?

Ciao,
Andreas

daybyter
February 07, 2012, 09:50:36 PM
 #35

After some more debugging, it seems I've found the problem:

In line 210, nonce is declared as a vec2, so it has two elements, x and y. But in lines 232 and 233 (IIRC), nonce.zw is used for computation. That doesn't work, as nonce has no elements z and w. When I change those expressions to nonce.xy, the code compiles and it seems something even gets started, although I get no output so far. I will have to investigate that further and fix more issues in the test code.

Any help is really appreciated!

Ciao,
Andreas

ThiagoCMC
February 11, 2012, 11:35:26 PM
 #36

subscribing... I love Linux and its new video memory management, called GEM + KMS...

Mining with purely open source tools and drivers will be awesome!!

I wanna test this out!!
daybyter
February 12, 2012, 02:25:09 PM
 #37

Getting the GLSL code to work properly is really tricky for me. Here's a tutorial that describes some of the issues:

http://www.mathematik.tu-dortmund.de/~goeddeke/gpgpu/tutorial.html

You have to render the GLSL output to a texture and read it back to the host.

At this point, I'm not really sure how xaci wants to pass the header and the nonce to the shader. Is the header supposed to be variable in some way, too?

I'm trying to simplify things a bit for myself, so I translated some of the code to BrookGPU to get float2 streams. This might cause a performance hit, since I'm not sure yet what kind of texture Brook generates and passes to the GPU (I've found a posting that said it's a streamlength^2 * 4 * sizeof(float) texture, which would be really big).

So, as I'm trying to simplify things, I just assume the header is constant and pass an array of nonces to the kernel. The shader should then replace the header nonce with the current nonce and do the double sha256 computation. I guess I'll have to pass the decoded difficulty, too, but I'll see about that later...
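
The host-side part of that plan can be sketched in C. This is only an illustration under stated assumptions: "decoded difficulty" is read here as the 256-bit target expanded from the compact bits field of the header, and the function names set_nonce, decode_target and meets_target are invented for this sketch. The header layout used (bits at bytes 72..75, nonce at bytes 76..79, both little-endian) is the standard 80-byte Bitcoin block header.

```c
#include <stdint.h>
#include <string.h>

/* Write a 32-bit nonce into bytes 76..79 of the header, little-endian. */
static void set_nonce(uint8_t header[80], uint32_t nonce) {
    header[76] = (uint8_t)(nonce & 0xFFu);
    header[77] = (uint8_t)((nonce >> 8) & 0xFFu);
    header[78] = (uint8_t)((nonce >> 16) & 0xFFu);
    header[79] = (uint8_t)((nonce >> 24) & 0xFFu);
}

/* Expand the compact "bits" value into a 32-byte big-endian target.
   The top byte is a size in bytes, the low 3 bytes are the mantissa:
   target = mantissa * 256^(size - 3). */
static void decode_target(uint32_t bits, uint8_t target[32]) {
    int size = (int)(bits >> 24);
    uint32_t mantissa = bits & 0x007FFFFFu;
    memset(target, 0, 32);
    for (int i = 0; i < 3; i++) {
        /* Big-endian index of mantissa byte i (most significant first). */
        int pos = 32 - size + i;
        if (pos >= 0 && pos < 32)
            target[pos] = (uint8_t)(mantissa >> (8 * (2 - i)));
    }
}

/* A hash wins if, read as a 256-bit number, it does not exceed the target.
   Both arguments are expected as big-endian byte strings here. */
static int meets_target(const uint8_t hash_be[32], const uint8_t target_be[32]) {
    return memcmp(hash_be, target_be, 32) <= 0;
}
```

With this split, the kernel would only need the constant header words, the per-thread nonce and the expanded target. The final comparison could also stay on the host if reading back full hashes is acceptable.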

Ciao,
Andreas

daybyter
February 12, 2012, 04:09:18 PM
 #38

Completed 'and' and fixed a bug in the 'not' function. This is the Brook version, but it should be easy to port the changes back to GLSL if wanted:

Code:
/**
 * Some utility functions to process integers represented as float2.
 */

/**
 * Add 2 integers represented as float2.
 *
 * Do not let overflow happen with this function, or use sum_c instead!
 */
kernel float2 add( float2 a, float2 b) {
        float2 ret;

        ret.x = a.x + b.x;
        ret.y = a.y + b.y;

        if (ret.y >= 65536.0) {
                ret.y -= 65536.0;
                ret.x += 1.0;
        }

        if (ret.x >= 65536.0) {
                ret.x -= 65536.0;
        }

        return ret;
}

/**
 * Shift an integer represented as a float2 by log2(shift).
 *
 * Note: shift should be a power of two, e.g. to shift 3 steps, use 2^3.
 */
kernel float2 shiftr( float2 a, float shift) {
        float2 ret;

        ret.x = a.x / shift;

        ret.y = floor( a.y / shift) + frac( ret.x) * 65536.0;

        ret.x = floor( ret.x);

        return ret;
}

/**
 * Rotate an integer represented as a float2 by log2(shift).
 *
 * Note: shift should be a power of two, e.g. to rotate 3 steps, use 2^3.
 */
kernel float2 rotater( float2 a, float shift) {
        float2 q;
        float2 ret;

        q.x = a.x / shift;  // Shift words and keep fractions to move those bits later.
        q.y = a.y / shift;

        // Floor first, then add the bits coming from the other word, so that no
        // intermediate value has to hold integer and fraction bits at once.
        ret.x = floor( q.x) + frac( q.y) * 65536.0;  // Rotate low bits from y into x.
        ret.y = floor( q.y) + frac( q.x) * 65536.0;  // Shift low bits from x into y.

        return ret;
}

/**
 * Xor half of an integer, represented as a float.
 */
kernel float xor16( float a<>, float b<>) {

        float ret = 0;
        float fact = 32768.0;

        // Stop at 1.0: smaller factors cannot match any bit of a 16-bit value.
        while (fact >= 1.0) {
                if( ( ( a >= fact) || ( b >= fact)) && ( ( a < fact) || ( b < fact))) {
                        ret += fact;
                }

                if( a >= fact) {
                        a -= fact;
                }
                if (b >= fact) {
                        b -= fact;
                }

                fact /= 2.0;
        }
        return ret;
}

/**
 * Xor a complete integer represented as a float2.
 */
kernel float2 xor( float2 a<>, float2 b<>) {
       float2 ret = { xor16( a.x, b.x), xor16( a.y, b.y) };

       return ret;
}

/**
 * And operation on half of an integer, represented as a float.
 */
kernel float and16( float a<>, float b<>) {
        float ret = 0;
        float fact = 32768.0;

        // Stop at 1.0: smaller factors cannot match any bit of a 16-bit value.
        while (fact >= 1.0) {
                if( ( a >= fact) && ( b >= fact)) {
                        ret += fact;
                }

                if( a >= fact) {
                        a -= fact;
                }
                if (b >= fact) {
                        b -= fact;
                }

                fact /= 2.0;
        }
        return ret;
}

/**
 * And operation on a full integer, represented as a float2.
 */
kernel float2 and( float2 a<>, float2 b<>) {
        float2 ret =  { and16( a.x, b.x), and16( a.y, b.y) };

        return ret;
}

/*
 * Logical complement ("not")
 */
kernel float2 not( float2 a<>) {
       float2 ret = { 65535.0 - a.x, 65535.0 - a.y};

       return ret;
}

/**
 * Swap the 2 words of an int.
 */
kernel float2 swapw( float2 a) {
       float2 ret;

       ret.x = a.y;
       ret.y = a.x;

       return ret;
}

kernel float2 blend( float2 m16, float2 m15, float2 m07, float2 m02) {
        float2 s0 = xor( rotater( m15, 128.0), xor( rotater( swapw( m15), 4.0), shiftr( m15, 8.0)));
        float2 s1 = xor( rotater( swapw( m02), 2.0), xor( rotater( swapw( m02), 8.0), shiftr( m02, 1024.0)));

        return add( add( m16, s0), add( m07, s1));
}

kernel float2 e0( float2 a) {
        return xor( rotater( a, 4.0), xor( rotater( a, 8192.0), rotater( swapw( a), 64.0)));
}

kernel float2 e1( float2 a) {
        return xor( rotater( a, 64.0), xor( rotater( a, 2048.0), rotater( swapw( a), 512.0)));
}

kernel float2 ch( float2 a, float2 b, float2 c) {
        return xor( and( a, b), and( not( a), c));
}

kernel float2 maj( float2 a, float2 b, float2 c) {
        return xor( xor( and( a, b), and( a, c)), and( b, c));
}

This code compiles here at least. I don't know if it actually works, since I don't have the actual sha256 code in Brook yet.
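
One way to check these helpers without a GPU is to run the same compare-and-subtract loops on the CPU and compare them with native integer operators. The C sketch below does that for the 16-bit XOR, AND and NOT. The function names (xor16f, and16f, not16f, count_mismatches) are chosen for this illustration, and it is a stand-in for the Brook kernels, not a translation verified against the Brook compiler.

```c
#include <stdint.h>

/* XOR of two 16-bit values held in floats: a bit is set in the result when
   exactly one of the inputs has it set. */
static float xor16f(float a, float b) {
    float ret = 0.0f;
    for (float fact = 32768.0f; fact >= 1.0f; fact /= 2.0f) {
        int abit = a >= fact;
        int bbit = b >= fact;
        if (abit != bbit) ret += fact;
        if (abit) a -= fact;   /* strip the bit just examined */
        if (bbit) b -= fact;
    }
    return ret;
}

/* AND: a bit is set in the result only when both inputs have it set. */
static float and16f(float a, float b) {
    float ret = 0.0f;
    for (float fact = 32768.0f; fact >= 1.0f; fact /= 2.0f) {
        int abit = a >= fact;
        int bbit = b >= fact;
        if (abit && bbit) ret += fact;
        if (abit) a -= fact;
        if (bbit) b -= fact;
    }
    return ret;
}

/* NOT of a 16-bit value: subtract from 0xFFFF. */
static float not16f(float a) {
    return 65535.0f - a;
}

/* Compare against native operators on a grid of inputs; returns the number
   of mismatches found. */
static int count_mismatches(void) {
    int bad = 0;
    for (uint32_t a = 0; a < 65536u; a += 257u) {
        if ((uint32_t)not16f((float)a) != (~a & 0xFFFFu)) bad++;
        for (uint32_t b = 0; b < 65536u; b += 263u) {
            if ((uint32_t)xor16f((float)a, (float)b) != (a ^ b)) bad++;
            if ((uint32_t)and16f((float)a, (float)b) != (a & b)) bad++;
        }
    }
    return bad;
}
```

A mismatch count of zero on the CPU does not prove the GPU version right, since shader floats may round differently, but it does rule out logic errors in the bit loops.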

Ciao,
Andreas

bulanula
February 12, 2012, 04:12:48 PM
 #39

Getting the GLSL code to work properly is really tricky for me. Here's a tutorial that describes some of the issues:

http://www.mathematik.tu-dortmund.de/~goeddeke/gpgpu/tutorial.html

You have to render the GLSL output to a texture and read it back to the host.

At this point, I'm not really sure how xaci wants to pass the header and the nonce to the shader. Is the header supposed to be variable in some way, too?

I'm trying to simplify things a bit for myself, so I translated some of the code to BrookGPU to get float2 streams. This might cause a performance hit, since I'm not sure yet what kind of texture Brook generates and passes to the GPU (I've found a posting that said it's a streamlength^2 * 4 * sizeof(float) texture, which would be really big).

So, as I'm trying to simplify things, I just assume the header is constant and pass an array of nonces to the kernel. The shader should then replace the header nonce with the current nonce and do the double sha256 computation. I guess I'll have to pass the decoded difficulty, too, but I'll see about that later...

Ciao,
Andreas


If you can get this working you are my absolute HERO.

I absolutely DESPISE ATI and their proprietary BS drivers that always break. Once they fix X then Y comes up and once they fix Y then Z and X comes up etc.

It's a never ending cycle of desperation, at least for me.

Good luck !
Dusty
February 12, 2012, 04:29:26 PM
 #40

[ watching (mining on open source drivers would be awesome) ]
