Phatk2 Mod (Already seeing improvement!)

blandead

Newbie

Activity: 46
Merit: 0

Re: Phatk2 Mod (Already seeing improvement!)

February 13, 2012, 01:59:57 PM

#41

Quote from: d3m0n1q_733rz on February 13, 2012, 12:24:26 PM

Here's the most recent changed phatk2 kernel. I've decided to output any valid hashes it finds directly instead of having it write to and then read from nonce. It works, and it should (in theory) give more accepted hashes, since none get dropped and there is no overhead from another variable.

The stack overflow can be fixed with a barrier. Try putting this before the output; it only adds one ALU instruction and will prevent any overflows. Put in as many as you want to narrow down your problem.

barrier(CLK_GLOBAL_MEM_FENCE);

Also, can you please replace the useless "WORKSIZE" variable and just use "get_local_size(0)"?

The select() function only does the per-component selection when it is actually given a vector type variable; on a scalar it will just do "result = c ? b : a" instead of "result[i] = (MSB of c[i] is set) ? b[i] : a[i]".
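To make the scalar/vector difference concrete, here is a minimal host-side C sketch that emulates both behaviours. The function names `select_scalar` and `select_vector` are made up for illustration and are not part of OpenCL; the vector case is modelled with plain arrays.

```c
#include <stdint.h>

/* Scalar form: behaves like the C ternary, picking b when c is non-zero. */
static uint32_t select_scalar(uint32_t a, uint32_t b, uint32_t c) {
    return c ? b : a;
}

/* Vector form, emulated over plain arrays: component i takes b[i] only
 * when the most significant bit of c[i] is set, otherwise a[i]. */
static void select_vector(const uint32_t *a, const uint32_t *b,
                          const uint32_t *c, uint32_t *out, int n) {
    for (int i = 0; i < n; ++i) {
        out[i] = (c[i] & 0x80000000u) ? b[i] : a[i];
    }
}
```

Note that a condition value of 1 selects `b` in the scalar form but selects `a` in the vector form, which is why a comparison result (all bits set per true component) is the usual thing to pass as `c` for vectors.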

e21

Member

Activity: 105
Merit: 10

Re: Phatk2 Mod (Already seeing improvement!)

February 13, 2012, 07:39:05 PM

#42

Quote from: d3m0n1q_733rz on February 13, 2012, 12:24:26 PM

Code:

// This file is in the public domain

#ifdef VECTORS8
	typedef uint8 u;
#elif defined VECTORS4
	typedef uint4 u;
#elif defined VECTORS
	typedef uint2 u;
#else
	typedef uint u;
#endif

__constant uint K[64] = { 
	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

__constant uint ConstW[128] = {
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x80000000U, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000280U,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,

0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x80000000U, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000100U,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000
};

__constant uint H[8] = { 
	0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
};

#ifdef BITALIGN
	#pragma OPENCL EXTENSION cl_amd_media_ops : enable
	#define rot(x, y) amd_bitalign(x, x, (uint)(32 - y))
#else
	#define rot(x, y) rotate(x, (uint)y)
#endif

// Some AMD devices have the BFI_INT opcode, which behaves exactly like the
// SHA-256 Ch function, but provides it in exactly one instruction. If
// detected, use it for Ch. Otherwise, use bitselect() for Ch.

#ifdef BFI_INT
	// Well, slight problem... It turns out BFI_INT isn't actually exposed to
	// OpenCL (or CAL IL for that matter) in any way. However, there is 
	// a similar instruction, BYTE_ALIGN_INT, which is exposed to OpenCL via
	// amd_bytealign, takes the same inputs, and provides the same output. 
	// We can use that as a placeholder for BFI_INT and have the application 
	// patch it after compilation.

	// This is the BFI_INT function
	#define Ch(x, y, z) amd_bytealign(x,y,z)
	// Ma can also be implemented in terms of BFI_INT...
	#define Ma(z, x, y) amd_bytealign(z^x,y,x)
#else
	#define Ch(x, y, z) bitselect(z,y,x)
	#define Ma(x, y, z) bitselect(x,y,(z^x))
#endif

//Various intermediate calculations for each SHA round
#define s0(n) (S0(Vals[(0 + 128 - (n)) % 8]))
#define S0(n) (rot(n, 30u)^rot(n, 19u)^rot(n,10u))

#define s1(n) (S1(Vals[(4 + 128 - (n)) % 8]))
#define S1(n) (rot(n, 26u)^rot(n, 21u)^rot(n, 7u))

#define ch(n) Ch(Vals[(4 + 128 - (n)) % 8],Vals[(5 + 128 - (n)) % 8],Vals[(6 + 128 - (n)) % 8])
#define maj(n) Ma(Vals[(1 + 128 - (n)) % 8],Vals[(2 + 128 - (n)) % 8],Vals[(0 + 128 - (n)) % 8])

//t1 calc when W is already calculated
#define t1(n) K[(n) % 64] + Vals[(7 + 128 - (n)) % 8] +  W[(n)] + s1(n) + ch(n) 

//t1 calc which calculates W
#define t1W(n) K[(n) % 64] + Vals[(7 + 128 - (n)) % 8] +  W(n) + s1(n) + ch(n)

//Used for constant W Values (the compiler optimizes out zeros)
#define t1C(n) (K[(n) % 64]+ ConstW[(n)]) + Vals[(7 + 128 - (n)) % 8] + s1(n) + ch(n)

//t2 Calc
#define t2(n)  maj(n) + s0(n)

#define rotC(x,n) (x<<n | x >> (32-n))

//W calculation used for SHA round
#define W(n) (W[n] = P4(n) + P3(n) + P2(n) + P1(n))


//Partial W calculations (used for the beginning where only some values are nonzero)
#define P1(n) ((rot(W[(n)-2],15u)^rot(W[(n)-2],13u)^((W[(n)-2])>>10U)))
#define P2(n) ((rot(W[(n)-15],25u)^rot(W[(n)-15],14u)^((W[(n)-15])>>3U)))
#define p1(x) ((rot(x,15u)^rot(x,13u)^((x)>>10U)))
#define p2(x) ((rot(x,25u)^rot(x,14u)^((x)>>3U)))
#define P3(n)  W[n-7]
#define P4(n)  W[n-16]

//Partial Calcs for constant W values
#define P1C(n) ((rotC(ConstW[(n)-2],15)^rotC(ConstW[(n)-2],13)^((ConstW[(n)-2])>>10U)))
#define P2C(n) ((rotC(ConstW[(n)-15],25)^rotC(ConstW[(n)-15],14)^((ConstW[(n)-15])>>3U)))
#define P3C(x)  ConstW[x-7]
#define P4C(x)  ConstW[x-16]

//SHA round with built in W calc
#define sharoundW(n) Barrier1(n);  Vals[(3 + 128 - (n)) % 8] += t1W(n); Vals[(7 + 128 - (n)) % 8] = t1W(n) + t2(n);  

//SHA round without W calc
#define sharound(n)  Barrier2(n); Vals[(3 + 128 - (n)) % 8] += t1(n); Vals[(7 + 128 - (n)) % 8] = t1(n) + t2(n);

//SHA round for constant W values
#define sharoundC(n)  Barrier2(n); Vals[(3 + 128 - (n)) % 8] += t1C(n); Vals[(7 + 128 - (n)) % 8] = t1C(n) + t2(n);

//The compiler is stupid... I put this in there only to stop the compiler from (de)optimizing the order
#define Barrier1(n) t1 = t1C((n+1))
#define Barrier2(n) t1 = t1C((n))

__kernel
//removed this to allow detection of invalid work size
//__attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))
void search(	const uint state0, const uint state1, const uint state2, const uint state3,
						const uint state4, const uint state5, const uint state6, const uint state7,
						const uint B1, const uint C1, const uint D1,
						const uint F1, const uint G1, const uint H1,
			#ifndef GOFFSET
						const u base,
			#endif
						const uint W16, const uint W17,
						const uint PreVal4, const uint PreVal0,
						const uint PreW31, const uint PreW32,
						const uint PreW19, const uint PreW20,
						__global uint * output)
{

	u W[124];
	u Vals[8];

//Dummy Variable to prevent compiler from reordering between rounds
	u t1;

	W[16] = W16;
	W[17] = W17;

#ifdef VECTORS8
	#ifdef GOFFSET
	W[3] = ((uint)get_global_id(0) << 3) + (u)(0, 1, 2, 3, 4, 5, 6, 7);
	uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
	#else
	W[3] = base + (uint)(get_local_id(0)) * 8u + (uint)(get_group_id(0)) * (WORKSIZE * 8u);
	uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
	//Since only the 2 LSB is opposite between the nonces, we can save an instruction by flipping the 4 bits in W18 rather than the 1 bit in W3
	W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U, r ^ 0x8010000U, r ^ 0xA050000U, r ^ 0xC090000U, r ^ 0xE0D0000U};
	#endif
#elif defined VECTORS4
	#ifdef GOFFSET
	W[3] = ((uint)get_global_id(0) << 2) + (u)(0, 1, 2, 3);
	uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
	#else
	//Less dependencies to get both the local id and group id and then add them
	W[3] = base + (uint)(get_local_id(0)) * 4u + (uint)(get_group_id(0)) * (WORKSIZE * 4u);
	uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
	//Since only the 2 LSB is opposite between the nonces, we can save an instruction by flipping the 4 bits in W18 rather than the 1 bit in W3
	W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U};
	#endif
#elif defined VECTORS
	#ifdef GOFFSET
	W[3] = ((uint)get_global_id(0) << 1) + (u)(0, 1);
	uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
	#else
	W[3] = base + (uint)(get_local_id(0)) * 2u + (uint)(get_group_id(0)) * (WORKSIZE * 2u);
	uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
	W[18] = PreW20 + (u){r, r ^ 0x2004000U};
	#endif
#else
	#ifdef GOFFSET
	W[3] = ((uint)get_global_id(0));
	uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
	#else
	W[3] = base + get_local_id(0) + get_group_id(0) * (WORKSIZE);
	u r = rot(W[3],25u)^rot(W[3],14u)^((W[3])>>3U);
	W[18] = PreW20 + r;
	#endif
#endif
	//the order of the W calcs and Rounds is like this because the compiler needs help finding how to order the instructions

	//Vals[0]=state0;
	Vals[0] = PreVal0 + W[3];
	Vals[1]=B1;
	Vals[2]=C1;
	Vals[3]=D1;
	//Vals[4]=PreVal4;
	Vals[4] = PreVal4 + W[3];
	Vals[5]=F1;
	Vals[6]=G1;
	Vals[7]=H1;

	sharoundC(4);
	W[19] = PreW19 + W[3];
	sharoundC(5);
	W[20] = P4C(20) + P1(20);
	sharoundC(6);
	W[21] = P1(21);
	sharoundC(7);
	W[22] = P3C(22) + P1(22);
	sharoundC(8);
	W[23] = W[16] + P1(23);
	sharoundC(9);
	W[24] = W[17] + P1(24);
	sharoundC(10);
	W[25] = P3(25) + P1(25);
	W[26] = P3(26) + P1(26);
	sharoundC(11);
	W[27] = P3(27) + P1(27);
	W[28] = P3(28) + P1(28);
	sharoundC(12);
	W[29] = P3(29) + P1(29);
	sharoundC(13);
	W[30] = P3(30) + P2C(30) + P1(30);
	W[31] = PreW31 + (P3(31) + P1(31));
	sharoundC(14);
	W[32] = PreW32 + (P3(32) + P1(32));
	sharoundC(15);
	sharound(16);
	sharound(17);
	sharound(18);
	sharound(19);
	sharound(20);
	sharound(21);
	sharound(22);
	sharound(23);
	sharound(24);
	sharound(25);
	sharound(26);
	sharound(27);
	sharound(28);
	sharound(29);
	sharound(30);
	sharound(31);
	sharound(32);
	sharoundW(33);
	sharoundW(34);
	sharoundW(35);
	sharoundW(36);
	sharoundW(37);
	sharoundW(38);
	sharoundW(39);
	sharoundW(40);
	sharoundW(41);
	sharoundW(42);
	sharoundW(43);
	sharoundW(44);
	sharoundW(45);
	sharoundW(46);
	sharoundW(47);
	sharoundW(48);
	sharoundW(49);
	sharoundW(50);
	sharoundW(51);
	sharoundW(52);
	sharoundW(53);
	sharoundW(54);
	sharoundW(55);
	sharoundW(56);
	sharoundW(57);
	sharoundW(58);
	sharoundW(59);
	sharoundW(60);
	sharoundW(61);
	sharoundW(62);
	sharoundW(63);

	W[64]=state0+Vals[0];
	W[65]=state1+Vals[1];
	W[66]=state2+Vals[2];
	W[67]=state3+Vals[3];
	W[68]=state4+Vals[4];
	W[69]=state5+Vals[5];
	W[70]=state6+Vals[6];
	W[71]=state7+Vals[7];

	const u Temp = (0xb0edbdd0U + K[0]) +  W[64];
	Vals[0]=H[0];
	Vals[1]=H[1];
	Vals[2]=H[2];
	Vals[3]=0xa54ff53aU + Temp;
	Vals[4]=H[4];
	Vals[5]=H[5];
	Vals[6]=H[6];
	Vals[7]=0x08909ae5U + Temp;

#define P124(n) P2(n) + P1(n) + P4(n)

	W[80] = P2(80) + P4(80);
	sharound(65);
	W[81] = P1C(81) + P2(81) + P4(81);
	sharound(66);
	W[82] = P124(82);
	sharound(67);
	W[83] = P124(83);
	sharound(68);
	W[84] = P124(84);
	sharound(69);
	W[85] = P124(85);
	sharound(70);
	W[86] = P4(86) + P3C(86) + P2(86) + P1(86);
	sharound(71);
	W[87] = P4(87) + P3(87) + P2C(87) + P1(87);
	sharoundC(72);
	W[88] = P1(88) + P4C(88) + P3(88);
	sharoundC(73);
	W[89] = P3(89) + P1(89);
	sharoundC(74);
	W[90] = P3(90) + P1(90);
	sharoundC(75);
	W[91] = P3(91) + P1(91);
	sharoundC(76);
	W[92] = P3(92) + P1(92);
	sharoundC(77);
	W[93] = P3(93) + P1(93);
	W[94] = P3(94) + P2C(94) + P1(94);
	sharoundC(78);
	W[95] = P4C(95) + P3(95) + P2(95) + P1(95);
	sharoundC(79);
	sharound(80);
	sharound(81);
	sharound(82);
	sharound(83);
	sharound(84);
	sharound(85);
	sharound(86);
	sharound(87);
	sharound(88);
	sharound(89);
	sharound(90);
	sharound(91);
	sharound(92);
	sharound(93);
	sharound(94);
	sharound(95);
	sharoundW(96);
	sharoundW(97);
	sharoundW(98);
	sharoundW(99);
	sharoundW(100);
	sharoundW(101);
	sharoundW(102);
	sharoundW(103);
	sharoundW(104);
	sharoundW(105);
	sharoundW(106);
	sharoundW(107);
	sharoundW(108);
	sharoundW(109);
	sharoundW(110);
	sharoundW(111);
	sharoundW(112);
	sharoundW(113);
	sharoundW(114);
	sharoundW(115);
	sharoundW(116);
	sharoundW(117);
	sharoundW(118);
	sharoundW(119);
	sharoundW(120);
	sharoundW(121);
	sharoundW(122);

	u v = W[117] + W[108] + Vals[3] + Vals[7] + P2(124) + P1(124) + Ch((Vals[0] + Vals[4]) + (K[59] + W(123)) + s1(123)+ ch(123),Vals[1],Vals[2]); 
	u g = -(K[60] + H[7]) - S1((Vals[0] + Vals[4]) + (K[59] + W(123)) + s1(123)+ ch(123));

//	uint nonce = 0;
#ifdef VECTORS8
if (v.s0==g.s0)
	{
		output[WORKSIZE] = W[3].s0;
		output[get_local_id(0)] = W[3].s0;
	}
if (v.s1==g.s1)
	{
		output[WORKSIZE] = W[3].s1;
		output[get_local_id(0)] = W[3].s1;
	}
if (v.s2==g.s2)
	{
		output[WORKSIZE] = W[3].s2;
		output[get_local_id(0)] = W[3].s2;
	}
if (v.s3==g.s3)
	{
		output[WORKSIZE] = W[3].s3;
		output[get_local_id(0)] = W[3].s3;
	}
if (v.s4==g.s4)
	{
		output[WORKSIZE] = W[3].s4;
		output[get_local_id(0)] = W[3].s4;
	}
if (v.s5==g.s5)
	{
		output[WORKSIZE] = W[3].s5;
		output[get_local_id(0)] = W[3].s5;
	}
if (v.s6==g.s6)
	{
		output[WORKSIZE] = W[3].s6;
		output[get_local_id(0)] = W[3].s6;
	}
if (v.s7==g.s7)
	{
		output[WORKSIZE] = W[3].s7;
		output[get_local_id(0)] = W[3].s7;
	}
#elif defined VECTORS4
if (v.s0==g.s0)
	{
		output[WORKSIZE] = W[3].s0;
		output[get_local_id(0)] = W[3].s0;
	}
if (v.s1==g.s1)
	{
		output[WORKSIZE] = W[3].s1;
		output[get_local_id(0)] = W[3].s1;
	}
if (v.s2==g.s2)
	{
		output[WORKSIZE] = W[3].s2;
		output[get_local_id(0)] = W[3].s2;
	}
if (v.s3==g.s3)
	{
		output[WORKSIZE] = W[3].s3;
		output[get_local_id(0)] = W[3].s3;
	}

#elif defined VECTORS
if (v.s0==g.s0)
	{
		output[WORKSIZE] = W[3].s0;
		output[get_local_id(0)] = W[3].s0;
	}
if (v.s1==g.s1)
	{
		output[WORKSIZE] = W[3].s1;
		output[get_local_id(0)] = W[3].s1;
	}
	#else
		if (v==g)
	{
		output[WORKSIZE] = W[3];
		output[get_local_id(0)] = W[3];
	}
#endif
}

Is there an easy way to implement this mod? I copied the code over the contents of the phatk2 kernel.cl file, but I get a bunch of errors when I try to run Phoenix. Where do I get the init file, or how do I make one?

d3m0n1q_733rz (OP)

Sr. Member

Activity: 378
Merit: 250

Re: Phatk2 Mod (Already seeing improvement!)

February 14, 2012, 09:09:07 AM

#43

Quote from: blandead on February 13, 2012, 01:59:57 PM

Quote from: d3m0n1q_733rz on February 13, 2012, 12:24:26 PM

The only problems I've run into with stack overflows were with the VECTORS8 and GOFFSET=false implementation. And I'm all for cutting out the crap and replacing it with the original variable. We'll just toss in a comment that get_local_size(0) is the same as WORKSIZE after replacing it. It's better than a #define, in my opinion. I only use #define when there's a fairly large string of complex math to be repeated.
Going back to the stack overflow, I'm guessing that it's not able to keep up with multiple nonces being output in a row? *sigh* Of course. Thanks for the tip. Feel free to post any modified sections if you type them up. We'll just toss them in, try them out and make them a part of the kernel if they give better output.
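As a rough illustration of why get_local_size(0) can stand in for WORKSIZE, the sketch below mirrors the kernel's non-GOFFSET nonce indexing in plain C. It assumes the host launches the kernel with a work-group size equal to WORKSIZE and a global offset of zero; the function and parameter names are hypothetical and only `vec` (the vector width) and the index arithmetic come from the listing.

```c
#include <stdint.h>

/* First nonce of a work-item, in the style of the non-GOFFSET branch:
 * base + local_id * vec + group_id * (local_size * vec). */
static uint32_t nonce_from_local(uint32_t base, uint32_t local_id,
                                 uint32_t group_id, uint32_t local_size,
                                 uint32_t vec) {
    return base + local_id * vec + group_id * (local_size * vec);
}

/* Same value derived from a flat global index, as get_global_id(0) would
 * report when the global offset is zero. */
static uint32_t nonce_from_global(uint32_t base, uint32_t global_id,
                                  uint32_t vec) {
    return base + global_id * vec;
}
```

With `global_id = group_id * local_size + local_id` the two functions agree for every work-item, so the only thing the compile-time WORKSIZE adds is a constant the compiler can fold; the run-time query gives the same index as long as the launch size matches.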

Funroll_Loops, the theoretically quicker breakfast cereal!
d3m0n1q_733rz (OP)

Sr. Member

Activity: 378
Merit: 250

Re: Phatk2 Mod (Already seeing improvement!)

February 14, 2012, 01:37:51 PM

#44

Quote from: e21 on February 13, 2012, 07:39:05 PM

Quote from: d3m0n1q_733rz on February 13, 2012, 12:24:26 PM

(Full kernel listing quoted here; it is the same code shown in post #42 above.)

The init file is the same as the original for Phatk2 that came with Phoenix. I'll see about uploading the file to a filesharing site soon. ^_^

d3m0n1q_733rz (OP)

Sr. Member

Activity: 378
Merit: 250

Re: Phatk2 Mod (Already seeing improvement!)

February 19, 2012, 03:35:15 AM

#45

I'm working on another theory right now that could potentially multiply the output hashes without doing very much more work. However, it's just a theory and it'll take some coding to verify. I'll work on it either tonight or later tomorrow. If this works, you're gonna love it.

d3m0n1q_733rz (OP)

Sr. Member

Activity: 378
Merit: 250

Re: Phatk2 Mod (Already seeing improvement!)

February 20, 2012, 03:53:54 AM

#46

Well, big surprise--it didn't work. I made a rookie mistake of using a linear function on a non-linear equation.

Diapolo

Hero Member

Activity: 773
Merit: 500

Re: Phatk2 Mod (Already seeing improvement!)

February 20, 2012, 05:53:30 PM

#47

I think you have got a little bug in your VEC8 code.

This:

Code:

W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U, r ^ 0x8010000U, r ^ 0xA050000U, r ^ 0xC090000U, r ^ 0xE0D0000U};

should be replaced with:

Code:

W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U, r ^ 0x8010000U, r ^ 0xA014000U, r ^ 0xC018000U, r ^ 0xE01C000U};
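The corrected masks can be checked independently. The term `r` in the kernel is `rot(x,25) ^ rot(x,14) ^ (x >> 3)`, which is built only from rotations, shifts and XOR, so it distributes over XOR. Assuming the first lane's nonce has its low three bits clear (so adding the lane number is the same as XORing it in), the mask for lane i is just that term applied to i. The following C sketch, with made-up helper names, computes the masks:

```c
#include <stdint.h>

/* Rotate left, matching OpenCL's rotate() as used by the kernel's rot(). */
static uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32u - n));
}

/* The per-nonce term from the kernel: rot(x,25) ^ rot(x,14) ^ (x >> 3). */
static uint32_t sigma0(uint32_t x) {
    return rotl32(x, 25) ^ rotl32(x, 14) ^ (x >> 3);
}

/* XOR mask for a lane: because sigma0 uses only rotations, shifts and XOR,
 * sigma0(x ^ lane) == sigma0(x) ^ sigma0(lane). */
static uint32_t lane_mask(uint32_t lane) {
    return sigma0(lane);
}
```

For lanes 5, 6 and 7 this gives 0xA014000, 0xC018000 and 0xE01C000, matching the replacement line rather than the 0xA050000, 0xC090000 and 0xE0D0000 in the original.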

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo

d3m0n1q_733rz (OP)

Sr. Member

Activity: 378
Merit: 250

Re: Phatk2 Mod (Already seeing improvement!)

February 20, 2012, 08:31:39 PM

#48

Quote from: Diapolo on February 20, 2012, 05:53:30 PM

I think you have got a little bug in your VEC8 code.

This:

Code:

W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U, r ^ 0x8010000U, r ^ 0xA050000U, r ^ 0xC090000U, r ^ 0xE0D0000U};

should be replaced with:

Code:

W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U, r ^ 0x8010000U, r ^ 0xA014000U, r ^ 0xC018000U, r ^ 0xE01C000U};

Dia

Dang it, I fixed that before; I guess I must have reverted it on another foul-up while playing with the code. Thanks for pointing it out. I wondered why my code suddenly didn't work with that option. Either way, I'm still ending up with it spilling over into memory.

d3m0n1q_733rz (OP)

Sr. Member

Activity: 378
Merit: 250

Re: Phatk2 Mod (Already seeing improvement!)

February 22, 2012, 01:34:08 AM

#49

I'm really wanting to clean up the code and get rid of some of the multiple large vector variables that remain memory resident. I'll be overwriting them as it progresses through so as to lower the memory footprint and increase available vector sizes without the memory spill.
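One general way to overwrite values as the computation progresses is to keep only a 16-word rolling window of the SHA-256 message schedule instead of a full array. The C sketch below is an illustration of that idea under stated assumptions, not the change planned for this kernel: it uses scalar words rather than the kernel's vector type, the function names are invented, and the kernel's precomputed and constant W shortcuts are ignored.

```c
#include <stdint.h>

static uint32_t rotr32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32u - n));
}
static uint32_t small_sigma0(uint32_t x) {
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}
static uint32_t small_sigma1(uint32_t x) {
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10);
}

/* Reference: keep all 64 schedule words resident. */
static void schedule_full(const uint32_t in[16], uint32_t w[64]) {
    for (int i = 0; i < 16; ++i) w[i] = in[i];
    for (int i = 16; i < 64; ++i)
        w[i] = w[i - 16] + small_sigma0(w[i - 15]) + w[i - 7] + small_sigma1(w[i - 2]);
}

/* Rolling variant: word i overwrites slot i % 16, which held word i - 16
 * and is not needed again once word i has been formed. Returns word idx. */
static uint32_t schedule_rolling(const uint32_t in[16], int idx) {
    uint32_t w[16];
    for (int i = 0; i < 16; ++i) w[i] = in[i];
    for (int i = 16; i <= idx; ++i) {
        w[i & 15] += small_sigma0(w[(i - 15) & 15]) + w[(i - 7) & 15]
                   + small_sigma1(w[(i - 2) & 15]);
    }
    return w[idx & 15];
}
```

The rolling version holds 16 words where the full version holds 64, at the cost of requiring each round to consume its word before the slot is reused. Whether that actually avoids the spill on a given GPU depends on how the compiler allocates registers.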

Diapolo

Hero Member

Activity: 773
Merit: 500

Re: Phatk2 Mod (Already seeing improvement!)

February 22, 2012, 03:32:34 PM

#50

By the way, my latest changes to DiaKGCN have introduced the same problem, and VECTORS8 is now unusably slow ^^. Well, some changes just have to be reverted, right? ;)

Dia

d3m0n1q_733rz (OP)

Sr. Member

Activity: 378
Merit: 250

Re: Phatk2 Mod (Already seeing improvement!)

February 22, 2012, 10:26:08 PM

#51

Quote from: Diapolo on February 22, 2012, 03:32:34 PM

By the way, my latest changes to DiaKGCN have introduced the same problem, and VECTORS8 is now unusably slow ^^. Well, some changes just have to be reverted, right? ;)

Dia

Depends on what you're trying to do. If it's possible to asynchronously bring the vectors out of global memory, it could be possible to avoid reversions. But I think the reason why you were able to use VECTORS8 in the first place was because you were reusing your variables via += instead of creating new ones.

neo_rage

Full Member

Activity: 196
Merit: 100

Re: Phatk2 Mod (Already seeing improvement!)

March 11, 2012, 03:57:56 AM

#52

Any news?

ssateneth

Legendary

Activity: 1344
Merit: 1004

Re: Phatk2 Mod (Already seeing improvement!)

March 11, 2012, 06:58:33 AM

#53

Quote from: neo_rage on March 11, 2012, 03:57:56 AM

Any news?

Interested in this too. I would test the kernel, but it really has to be in a format where I can just put the kernel files where the other kernels are and use it. No copy-pasting of code and no special configuring inside the kernel (aside from kernel-specific options like setting GOFFSET to true), because I wouldn't know what to do. This is probably the case for other interested people too: they see you worked on phatk2, see positive results and want to use it, but have no clue how to implement it.

Can we please get a kernel.cl and __init__.py, preferably Phoenix 2.0 beta compatible?

d3m0n1q_733rz (OP)

Sr. Member

Activity: 378
Merit: 250

Re: Phatk2 Mod (Already seeing improvement!)

March 11, 2012, 10:11:53 PM

#54

Maybe, I'll have to work on it. Right now, it's been on the back burner while I get some medical things sorted out. If you or anyone else wants to work on it, it's up for grabs.
