Diapolo


February 06, 2012, 08:31:17 PM 

For those who want to try DiaKGCN with Phoenix 2, here is a preview: http://www.filedropper.com/diakgcnphoenix2preview_1

This can be used as config:

[cl:0:0]
autoconfigure = false
kernel = diakgcn
aggression = 12
vectors2 = true
vectors4 = false
vectors8 = false
worksize = 256

The kernel is the same as the last version for Phoenix 1.7.X; it only has some small fixes in the init. Please report bugs, glitches, ideas and results back to me, thanks!

Dia
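For anyone who wants to sanity-check such a config before starting the miner, here is a minimal sketch that reads the section with Python's standard configparser and converts the values to typed options. Using configparser is an assumption made for illustration (the thread does not show how Phoenix 2 parses its config), and load_device_options is a made-up helper name.

```python
import configparser

# Config text taken from the post above. Parsing it with the standard
# library is an assumption, not necessarily what Phoenix 2 does internally.
CONFIG_TEXT = """
[cl:0:0]
autoconfigure = false
kernel = diakgcn
aggression = 12
vectors2 = true
vectors4 = false
vectors8 = false
worksize = 256
"""

def load_device_options(text, section="cl:0:0"):
    """Hypothetical helper: return the device section as typed Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    dev = parser[section]
    return {
        "autoconfigure": dev.getboolean("autoconfigure"),
        "kernel": dev.get("kernel"),
        "aggression": dev.getint("aggression"),
        "vectors2": dev.getboolean("vectors2"),
        "vectors4": dev.getboolean("vectors4"),
        "vectors8": dev.getboolean("vectors8"),
        "worksize": dev.getint("worksize"),
    }

options = load_device_options(CONFIG_TEXT)
print(options)
```

A typo in a boolean or integer value makes getboolean or getint raise ValueError, which is an easy way to catch a broken line early.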







lodcrappo


February 06, 2012, 08:40:02 PM 

please add to phoenix overclocking support...
please no. don't turn phoenix into the monstrosity of some other miners. its simplicity is its beauty. do one thing, and do it well. when you throw everything possible into one program, you end up with too many compromises.




bulanula


February 06, 2012, 08:49:20 PM 

please add to phoenix overclocking support...
please no. don't turn phoenix into the monstrosity of some other miners. its simplicity is its beauty. do one thing, and do it well. when you throw everything possible into one program, you end up with too many compromises.
I 100% agree and support this decision. Remember the KISS principle. If you need OC, stick to the AMD API thing.




HendrikJan


February 06, 2012, 09:41:21 PM 

I did try the diakgcn kernel. Still about 10 Mh/s less than the older version.
Also would like to know what the "Rolling time" is.




Barlog


February 06, 2012, 09:46:56 PM 

You will need a new address for donations, aka 2PHoenix




d3m0n1q_733rz


February 06, 2012, 11:04:57 PM 

Alright, I have a register spill somewhere in here...can someone find it for me? I'm using 8 vectors to make better use of the 16 available to the HD79xx cards. But the code itself isn't made to handle it. I just can't figure out what part needs to be changed to make it capable.

// This file is in the public domain

#ifdef VECTORS8
    typedef uint8 u;
#elif defined VECTORS4
    typedef uint4 u;
#elif defined VECTORS
    typedef uint2 u;
#else
    typedef uint u;
#endif

__constant uint K[64] = { 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5, 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174, 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da, 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967, 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85, 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070, 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3, 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 };
__constant uint ConstW[128] = { 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x80000000U, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000280U, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x80000000U, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000100U, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000 };
__constant uint H[8] = { 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };

#ifdef BITALIGN
    #pragma OPENCL EXTENSION cl_amd_media_ops : enable
    #define rot(x, y) amd_bitalign(x, x, (uint)(32 - y))
#else
    #define rot(x, y) rotate(x, (uint)y)
#endif

// Some AMD devices have the BFI_INT opcode, which behaves exactly like the
// SHA-256 Ch function, but provides it in exactly one instruction. If
// detected, use it for Ch. Otherwise, use bitselect() for Ch.

#ifdef BFI_INT
    // Well, slight problem... It turns out BFI_INT isn't actually exposed to
    // OpenCL (or CAL IL for that matter) in any way. However, there is
    // a similar instruction, BYTE_ALIGN_INT, which is exposed to OpenCL via
    // amd_bytealign, takes the same inputs, and provides the same output.
    // We can use that as a placeholder for BFI_INT and have the application
    // patch it after compilation.

    // This is the BFI_INT function
    #define Ch(x, y, z) amd_bytealign(x,y,z)
    // Ma can also be implemented in terms of BFI_INT...
    #define Ma(z, x, y) amd_bytealign(z^x,y,x)
#else
    #define Ch(x, y, z) bitselect(z,y,x)
    #define Ma(x, y, z) bitselect(x,y,(z^x))
#endif

//Various intermediate calculations for each SHA round
#define s0(n) (S0(Vals[(128 - (n)) % 8]))
#define S0(n) (rot(n, 30u)^rot(n, 19u)^rot(n,10u))

#define s1(n) (S1(Vals[(132 - (n)) % 8]))
#define S1(n) (rot(n, 26u)^rot(n, 21u)^rot(n, 7u))

#define ch(n) Ch(Vals[(132 - (n)) % 8],Vals[(133 - (n)) % 8],Vals[(134 - (n)) % 8])
#define maj(n) Ma(Vals[(129 - (n)) % 8],Vals[(130 - (n)) % 8],Vals[(128 - (n)) % 8])

//t1 calc when W is already calculated
#define t1(n) K[(n) % 64] + Vals[(135 - (n)) % 8] + W[(n)] + s1(n) + ch(n)

//t1 calc which calculates W
#define t1W(n) K[(n) % 64] + Vals[(135 - (n)) % 8] + W(n) + s1(n) + ch(n)

//Used for constant W Values (the compiler optimizes out zeros)
#define t1C(n) (K[(n) % 64]+ ConstW[(n)]) + Vals[(135 - (n)) % 8] + s1(n) + ch(n)

//t2 Calc
#define t2(n) maj(n) + s0(n)

#define rotC(x,n) (x<<n | x >> (32-n))

//W calculation used for SHA round
#define W(n) (W[n] = P4(n) + P3(n) + P2(n) + P1(n))

//Partial W calculations (used for the beginning where only some values are nonzero)
#define P1(n) ((rot(W[(n)-2],15u)^rot(W[(n)-2],13u)^((W[(n)-2])>>10U)))
#define P2(n) ((rot(W[(n)-15],25u)^rot(W[(n)-15],14u)^((W[(n)-15])>>3U)))
#define p1(x) ((rot(x,15u)^rot(x,13u)^((x)>>10U)))
#define p2(x) ((rot(x,25u)^rot(x,14u)^((x)>>3U)))
#define P3(n) W[n-7]
#define P4(n) W[n-16]

//Partial Calcs for constant W values
#define P1C(n) ((rotC(ConstW[(n)-2],15)^rotC(ConstW[(n)-2],13)^((ConstW[(n)-2])>>10U)))
#define P2C(n) ((rotC(ConstW[(n)-15],25)^rotC(ConstW[(n)-15],14)^((ConstW[(n)-15])>>3U)))
#define P3C(x) ConstW[x-7]
#define P4C(x) ConstW[x-16]

//SHA round with built in W calc
#define sharoundW(n) Barrier1(n); Vals[(131 - (n)) % 8] += t1W(n); Vals[(135 - (n)) % 8] = t1W(n) + t2(n);

//SHA round without W calc
#define sharound(n) Barrier2(n); Vals[(131 - (n)) % 8] += t1(n); Vals[(135 - (n)) % 8] = t1(n) + t2(n);

//SHA round for constant W values
#define sharoundC(n) Barrier2(n); Vals[(131 - (n)) % 8] += t1C(n); Vals[(135 - (n)) % 8] = t1C(n) + t2(n);

//The compiler is stupid... I put this in there only to stop the compiler from (de)optimizing the order
#define Barrier1(n) t1 = t1C((n+1))
#define Barrier2(n) t1 = t1C((n))

__kernel
//removed this to allow detection of invalid work size
//__attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))
void search(
    const uint state0, const uint state1, const uint state2, const uint state3,
    const uint state4, const uint state5, const uint state6, const uint state7,
    const uint B1, const uint C1, const uint D1,
    const uint F1, const uint G1, const uint H1,
    const u base,
    const uint W16, const uint W17,
    const uint PreVal4, const uint PreVal0,
    const uint PreW31, const uint PreW32,
    const uint PreW19, const uint PreW20,
    __global uint * output)
{
    u W[124];
    u Vals[8];

    //Dummy Variable to prevent compiler from reordering between rounds
    u t1;

    W[16] = W16;
    W[17] = W17;

#ifdef VECTORS8
    //Modified from VECTORS4
    W[3] = base + (uint)(get_local_id(0)) * 8u + (uint)(get_group_id(0)) * (WORKSIZE * 8u);
    uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
    W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U, r ^ 0x8010000U, r ^ 0xA050000U, r ^ 0xC090000U, r ^ 0xE0D0000U};
#elif defined VECTORS4
    //Less dependencies to get both the local id and group id and then add them
    W[3] = base + (uint)(get_local_id(0)) * 4u + (uint)(get_group_id(0)) * (WORKSIZE * 4u);
    uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
    //Since only the 2 LSB is opposite between the nonces, we can save an instruction by flipping the 4 bits in W18 rather than the 1 bit in W3
    W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U};
#elif defined VECTORS
    W[3] = base + (uint)(get_local_id(0)) * 2u + (uint)(get_group_id(0)) * (WORKSIZE * 2u);
    uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
    W[18] = PreW20 + (u){r, r ^ 0x2004000U};
#else
    W[3] = base + get_local_id(0) + get_group_id(0) * (WORKSIZE);
    u r = rot(W[3],25u)^rot(W[3],14u)^((W[3])>>3U);
    W[18] = PreW20 + r;
#endif

    //the order of the W calcs and Rounds is like this because the compiler needs help finding how to order the instructions

    //Vals[0]=state0;
    Vals[0] = PreVal0 + W[3];
    Vals[1]=B1;
    Vals[2]=C1;
    Vals[3]=D1;
    //Vals[4]=PreVal4;
    Vals[4] = PreVal4 + W[3];
    Vals[5]=F1;
    Vals[6]=G1;
    Vals[7]=H1;

    sharoundC(4);
    W[19] = PreW19 + W[3];
    sharoundC(5);
    W[20] = P1(20) + P4C(20);
    sharoundC(6);
    W[21] = P1(21);
    sharoundC(7);
    W[22] = P1(22) + P3C(22);
    sharoundC(8);
    W[23] = W[16] + P1(23);
    sharoundC(9);
    W[24] = W[17] + P1(24);
    sharoundC(10);
    W[25] = P1(25) + P3(25);
    W[26] = P1(26) + P3(26);
    sharoundC(11);
    W[27] = P1(27) + P3(27);
    W[28] = P1(28) + P3(28);
    sharoundC(12);
    W[29] = P1(29) + P3(29);
    sharoundC(13);
    W[30] = P1(30) + P2C(30) + P3(30);
    W[31] = P1(31) + P3(31) + PreW31;
    sharoundC(14);
    W[32] = P1(32) + P3(32) + PreW32;
    sharoundC(15);
    sharound(16); sharound(17); sharound(18); sharound(19);
    sharound(20); sharound(21); sharound(22); sharound(23);
    sharound(24); sharound(25); sharound(26); sharound(27);
    sharound(28); sharound(29); sharound(30); sharound(31);
    sharound(32);
    sharoundW(33); sharoundW(34); sharoundW(35); sharoundW(36);
    sharoundW(37); sharoundW(38); sharoundW(39); sharoundW(40);
    sharoundW(41); sharoundW(42); sharoundW(43); sharoundW(44);
    sharoundW(45); sharoundW(46); sharoundW(47); sharoundW(48);
    sharoundW(49); sharoundW(50); sharoundW(51); sharoundW(52);
    sharoundW(53); sharoundW(54); sharoundW(55); sharoundW(56);
    sharoundW(57); sharoundW(58); sharoundW(59); sharoundW(60);
    sharoundW(61); sharoundW(62); sharoundW(63);

    W[64]=state0+Vals[0];
    W[65]=state1+Vals[1];
    W[66]=state2+Vals[2];
    W[67]=state3+Vals[3];
    W[68]=state4+Vals[4];
    W[69]=state5+Vals[5];
    W[70]=state6+Vals[6];
    W[71]=state7+Vals[7];

    Vals[0]=H[0];
    Vals[1]=H[1];
    Vals[2]=H[2];
    // Vals[3]=H[3];
    Vals[3] = 0xa54ff53aU + (0xb0edbdd0U + K[0]) + W[64];
    Vals[4]=H[4];
    Vals[5]=H[5];
    Vals[6]=H[6];
    // Vals[7]=H[7];
    Vals[7] = 0x08909ae5U + (0xb0edbdd0U + K[0]) + W[64];

    //const u Temp = (0xb0edbdd0U + K[0]) + W[64];

    //#define P124(n) P1(n) + P2(n) + P4(n)

    W[80] = + P2(80) + P4(80);
    sharound(65);
    W[81] = P1C(81) + P2(81) + P4(81);
    sharound(66);
    W[82] = P1(82) + P2(82) + P4(82);
    sharound(67);
    W[83] = P1(83) + P2(83) + P4(83);
    sharound(68);
    W[84] = P1(84) + P2(84) + P4(84);
    sharound(69);
    W[85] = P1(85) + P2(85) + P4(85);
    sharound(70);
    W[86] = P1(86) + P2(86) + P3C(86) + P4(86);
    sharound(71);
    W[87] = P1(87) + P2C(87) + P3(87) + P4(87);
    sharoundC(72);
    W[88] = P1(88) + P3(88) + P4C(88);
    sharoundC(73);
    W[89] = P1(89) + P3(89);
    sharoundC(74);
    W[90] = P1(90) + P3(90);
    sharoundC(75);
    W[91] = P1(91) + P3(91);
    sharoundC(76);
    W[92] = P1(92) + P3(92);
    sharoundC(77);
    W[93] = P1(93) + P3(93);
    W[94] = P1(94) + P2C(94) + P3(94);
    sharoundC(78);
    W[95] = P1(95) + P2(95) + P3(95) + P4C(95);
    sharoundC(79);
    sharound(80); sharound(81); sharound(82); sharound(83);
    sharound(84); sharound(85); sharound(86); sharound(87);
    sharound(88); sharound(89); sharound(90); sharound(91);
    sharound(92); sharound(93); sharound(94); sharound(95);
    sharoundW(96); sharoundW(97); sharoundW(98); sharoundW(99);
    sharoundW(100); sharoundW(101); sharoundW(102); sharoundW(103);
    sharoundW(104); sharoundW(105); sharoundW(106); sharoundW(107);
    sharoundW(108); sharoundW(109); sharoundW(110); sharoundW(111);
    sharoundW(112); sharoundW(113); sharoundW(114); sharoundW(115);
    sharoundW(116); sharoundW(117); sharoundW(118); sharoundW(119);
    sharoundW(120); sharoundW(121); sharoundW(122);

    u v = W[117] + W[108] + Vals[3] + Vals[7] + P2(124) + P1(124) + Ch((Vals[0] + Vals[4]) + (K[59] + W(123)) + s1(123)+ ch(123),Vals[1],Vals[2]);
    u g = -(K[60] + H[7]) - S1((Vals[0] + Vals[4]) + (K[59] + W(123)) + s1(123)+ ch(123));

    uint nonce = 0;

#ifdef VECTORS8
    if (v.s0 == g.s0) { nonce = W[3].s0; }
    if (v.s1 == g.s1) { nonce = W[3].s1; }
    if (v.s2 == g.s2) { nonce = W[3].s2; }
    if (v.s3 == g.s3) { nonce = W[3].s3; }
    if (v.s4 == g.s4) { nonce = W[3].s4; }
    if (v.s5 == g.s5) { nonce = W[3].s5; }
    if (v.s6 == g.s6) { nonce = W[3].s6; }
    if (v.s7 == g.s7) { nonce = W[3].s7; }
#elif defined VECTORS4
    if (v.s0 == g.s0) { nonce = W[3].s0; }
    if (v.s1 == g.s1) { nonce = W[3].s1; }
    if (v.s2 == g.s2) { nonce = W[3].s2; }
    if (v.s3 == g.s3) { nonce = W[3].s3; }
#elif defined VECTORS
    if (v.s0 == g.s0) { nonce = W[3].s0; }
    if (v.s1 == g.s1) { nonce = W[3].s1; }
#else
    if (v == g) { nonce = W[3]; }
#endif

    if(nonce)
    {
        //Faster to shift the nonce by 2 due to 4-DWORD addressing and does not add more collisions
        output[WORKSIZE] = nonce;
        output[get_local_id(0)] = nonce;
    }
}
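
For readers following the Ch/Ma macros in the kernel, here is a small Python sketch (not part of the kernel) that checks the non-BFI_INT fallbacks, Ch(x, y, z) = bitselect(z, y, x) and Ma(x, y, z) = bitselect(x, y, z ^ x), against the textbook SHA-256 choose and majority functions. The helper names are made up for illustration, and bitselect is modelled on the OpenCL built-in: each result bit comes from the first argument where the third argument's bit is 0, and from the second argument otherwise.

```python
import random

MASK = 0xFFFFFFFF  # work on 32-bit words, like the kernel's uint

def bitselect(a, b, c):
    """Model of OpenCL bitselect: bits of b where c is 1, bits of a elsewhere."""
    return ((a & ~c) | (b & c)) & MASK

def ch_kernel(x, y, z):
    # Same argument order as the kernel's fallback macro Ch(x, y, z)
    return bitselect(z, y, x)

def ma_kernel(x, y, z):
    # Same as the kernel's fallback macro Ma(x, y, z)
    return bitselect(x, y, z ^ x)

def ch_reference(x, y, z):
    # Textbook SHA-256 Ch: take y where x is 1, z where x is 0
    return ((x & y) ^ (~x & z)) & MASK

def maj_reference(x, y, z):
    # Textbook SHA-256 Maj: each bit is the majority of the three inputs
    return ((x & y) ^ (x & z) ^ (y & z)) & MASK

def macros_match(trials=1000, seed=0):
    """Compare the kernel-style macros with the reference on random words."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = (rng.getrandbits(32) for _ in range(3))
        if ch_kernel(x, y, z) != ch_reference(x, y, z):
            return False
        if ma_kernel(x, y, z) != maj_reference(x, y, z):
            return False
    return True

print(macros_match())
```

The majority trick works because wherever x and z agree the majority is that shared bit, and wherever they differ y decides.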

Funroll_Loops, the theoretically quicker breakfast cereal! Check out http://www.facebook.com/JupiterICT for all of your computing needs. If you need it, we can get it. We have solutions for your computing conundrums. BTC accepted! 12HWUSguWXRCQKfkPeJygVR1ex5wbg3hAq



Bananington


February 06, 2012, 11:48:50 PM 

Could we get a list of all accepted options? Kernel selection Worksize= Vectors, Vectors4, Vectors8, Vectors16 etc.

The Phoenix 2 core itself recognizes (not an exhaustive list):
# [cl:0:0] autoconfigure     # Allow the kernel to choose its own configuration variables?
# [cl:0:0] disabled          # Prevent mining on this device?
# [cl:0:0] kernel            # Can be used to manually choose a kernel to mine on.
# [cl:0:0] start_undetected  # Start even if autodetect doesn't find it?
# [general] autodetect       # A list of rules for what devices to autodetect.
# [general] backend          # The URL to mine on in the backend.
# [general] logfile          # Set this option to log to a file.
# [general] queuedelay       # Advanced - ???
# [general] queuesize        # Advanced - ???
# [general] ratesamples      # Advanced - number of samples to average for rate reporting
# [general] statusinterval   # Advanced - how long to delay between statusbar updates
# [general] verbose          # Enable verbose mode? (Shows debug messages)
# [web] bind                 # Bind the web/RPC server to a specific IP
# [web] disabled             # Disable the webserver altogether?
# [web] logbuffer            # Advanced - how many logs to remember in the getlogs() RPC call
# [web] password             # The password necessary for web/RPC login (username is ignored)
# [web] port                 # What port should the web/RPC server listen on?
# [web] root                 # Advanced - The root directory for the webserver.

The phatk2/opencl kernels recognize:
# [cl:0:0] vectors           # Enable two-way vectors?
# [cl:0:0] vectors4          # Enable four-way vectors?
# [cl:0:0] fastloop          # Advanced - fastloop optimization for low aggressions
# [cl:0:0] aggression        # Controls how hard Phoenix 2 hits the hardware
# [cl:0:0] worksize          # Advanced - controls size of individual executions
# [cl:0:0] bfi_int           # Enable BFI_INT optimization for Radeon cards that support it

[cl:0:0]
autoconfigure = False
BFI_INT
VECTORS4
WORKSIZE=64

What's wrong with this portion that it's not giving me the correct settings?

Try:
[cl:0:0]
autoconfigure = false # Not actually needed since autoconfiguration disables by default when you supply your own args
BFI_INT = true # Also the boolean options aren't case sensitive.
VECTORS4 = true
WORKSIZE = 64

I believe this should be edited into the first post for the late joiners.
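
To illustrate why the bare BFI_INT / VECTORS4 lines in the first config are a problem, here is a sketch using Python's standard configparser. Whether Phoenix 2 uses this module is an assumption; the point is only that an INI-style parser expects key = value pairs, and that this particular parser lowercases option names and accepts booleans in any case. parse_flags is a made-up helper name.

```python
import configparser

# The config as first posted: two options have no "= value" part.
BROKEN = """
[cl:0:0]
autoconfigure = False
BFI_INT
VECTORS4
WORKSIZE=64
"""

# The corrected form suggested in the reply (inline comments left out).
FIXED = """
[cl:0:0]
autoconfigure = false
BFI_INT = true
VECTORS4 = true
WORKSIZE = 64
"""

def parse_flags(text, section="cl:0:0"):
    """Hypothetical helper: return typed options, or None if the text does not parse."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.ParsingError:
        # Raised for lines such as a bare "BFI_INT" with no delimiter
        return None
    dev = parser[section]
    return {
        # Option names are lowercased by configparser on read and on lookup
        "bfi_int": dev.getboolean("bfi_int", fallback=False),
        "vectors4": dev.getboolean("vectors4", fallback=False),
        "worksize": dev.getint("worksize", fallback=None),
    }

print(parse_flags(BROKEN))
print(parse_flags(FIXED))
```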




Bananington


February 06, 2012, 11:59:03 PM 

How high can I turn up the aggression? Is 12 the max?




d3m0n1q_733rz


February 07, 2012, 12:29:36 AM 

Alright, I have a register spill somewhere in here...can someone find it for me? I'm using 8 vectors to make better use of the 16 available to the HD79xx cards. But the code itself isn't made to handle it. I just can't figure out what part needs to be changed to make it capable.

I think the problem's in this line:
W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U, r ^ 0x8010000U, r ^ 0xA050000U, r ^ 0xC090000U, r ^ 0xE0D0000U};
I probably used the wrong values on the second half.
Edit: Bingo! But still have a spill...
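
The lane constants in that W[18] line can be derived rather than guessed. A minimal Python sketch, assuming (as the VECTORS4 comment in the kernel suggests) that lane k of W[3] equals lane 0 with k XORed into its low bits, i.e. that lane 0's nonce is a multiple of 8: the expression rot(x,25)^rot(x,14)^(x>>3) is linear over XOR, so lane k only needs r ^ sigma0(k). The function names are illustrative and not taken from the kernel.

```python
MASK = 0xFFFFFFFF  # 32-bit words, like the kernel's uint

def rotl(x, n):
    """Rotate a 32-bit word left by n bits (what the kernel's rot() does)."""
    return ((x << n) | (x >> (32 - n))) & MASK

def sigma0(x):
    """rot(x,25) ^ rot(x,14) ^ (x >> 3), the expression used for r in the kernel."""
    return rotl(x, 25) ^ rotl(x, 14) ^ (x >> 3)

def lane_constants(lanes):
    """XOR constant for each vector lane, assuming lane k's nonce is lane 0's nonce ^ k."""
    return [sigma0(k) for k in range(lanes)]

for lane, constant in enumerate(lane_constants(8)):
    print(lane, hex(constant))
```

Under that assumption, lanes 1 to 4 reproduce the constants already in the posted line (0x2004000, 0x4008000, 0x600C000, 0x8010000), while lanes 5 to 7 come out as 0xA014000, 0xC018000 and 0xE01C000, which differ from the posted values.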




CFSworks


February 07, 2012, 05:11:55 AM 

Will this include automatic fan and gpu management like cgminer ?
What about killing mining thread on GPU that has a dead fan while I am away ?
While both of these features are really nice, there is no easy way to interface with these functions on the card through OpenCL. We can do it, and I've asked jedi95 about it. His stance on the issue is pretty much what lodcrappo said:
please no. don't turn phoenix into the monstrosity of some other miners. its simplicity is its beauty.
do one thing, and do it well.
when you throw everything possible into one program, you end up with too many compromises.
So, we're definitely not adding GPU management features to the Phoenix 2 core.
IMHO cgminer is good because of auto fan feature and support for backup pools and screw all that python BS.
Thank you !
It's important to keep in mind that cgminer and Phoenix are alternatives, not competitors. We're both on the same side here. If you like cgminer better, please, use it! Phoenix's purpose is to do things differently for those that don't prefer the way cgminer operates. (And, Python BS? I'm assuming you mean the dependencies you have to install to get Phoenix operational, since I don't see how choice of language affects the end user, especially when the core was written with speed in mind.)
I 100% agree and support this decision. Remember the KISS principle. If you need OC stick to AMD API thing.
Adding OC and fan management is very tempting, I must admit. A thought occurs to me: What if we renamed the "kernels" directory to "modules" or the like, and allowed non-kernel modules to be loaded into Phoenix as well? This allows a third party to easily develop a complete GPU management subsystem that integrates into Phoenix while still keeping the core slim and fast for those who prefer to do OC themselves.




lodcrappo


February 07, 2012, 05:18:12 AM 

Adding OC and fan management is very tempting, I must admit. A thought occurs to me: What if we renamed the "kernels" directory to "modules" or the like, and allowed non-kernel modules to be loaded into Phoenix as well? This allows a third party to easily develop a complete GPU management subsystem that integrates into Phoenix while still keeping the core slim and fast for those who prefer to do OC themselves.
as the developer of a popular mining farm management system that uses "best of breed" tools for each function, including phoenix for the mining client part, I think this is a great idea. Currently we have to hack the management code into each phoenix release, which isn't a big deal but having a standard way to interface would be much nicer.




wind
Member
Offline
Activity: 125
Merit: 10


February 07, 2012, 10:49:13 AM 

is it possible to add commandline parameters like phoenix 1.x does?

BTC : 1LhadV94a3GqFSFg7eDZiQqERSt78w4fKA



echris1


February 07, 2012, 12:50:21 PM 

Just switched all my miners to this after a drastic drop in cgminer performance (something to do with SDK, not sure)
Working great on my 2x6770, 6850 and 5770, back at max after a bit of tweaking.
I was just wondering how to add a backup pool to the conf file?
Keep up the good work!




HendrikJan
Member
Offline
Activity: 64
Merit: 10


February 07, 2012, 02:59:18 PM 

Adding OC and fan management is very tempting, I must admit. A thought occurs to me: What if we renamed the "kernels" directory to "modules" or the like, and allowed non-kernel modules to be loaded into Phoenix as well? This allows a third party to easily develop a complete GPU management subsystem that integrates into Phoenix while still keeping the core slim and fast for those who prefer to do OC themselves.
It could be a nice way to get the best of both worlds. But I still get less Mh/s with this 2.0 version. What could be the difference? Should I get the same result or does this version still need some tweaking?




Diapolo


February 07, 2012, 03:47:42 PM 

[general]
autodetect = +cl
backend = XYZ
verbose = true
[cl:0:0]
kernel = diakgcn
aggression = 12
vectors2 = true
vectors4 = false
vectors8 = false
worksize = 256
[cl:0:1]
disabled = true
[cl:0:2]
disabled = true
[web]
disabled = true
Above config generates "Detected [cl:0:0]: [Tahiti 0] using opencl (rating 2)", which I don't understand. Shouldn't it simply use the kernel specified and tell that autodetect had been overridden by own settings?
Another thing I don't understand is why getDevice() and autodetect() reside in kernels\opencl\__init__.py. I dislike the idea that these functions are derived from there, because opencl is simply another kernel folder. I think they should be placed somewhere else (have no good idea currently, but perhaps in PhoenixCore.py).
The supplied phatk2 version uses stuff in opencl\__init__.py, too, via "opencl = sys.modules['opencl']", which seems sort of not ideal. I think every kernel should specify its own options and stuff, even if they are the same. Your idea was perhaps to edit only one place if you add new changes, but for addon kernels like diakgcn I really have to specify my own options, which I would promote as a rule for all supplied or addon kernels to be better structured and to be independent of the opencl kernel folder. What do you think?
Another small change I would suggest for analyzeDevice() is your CPU detection code, which could be replaced with:
# Check if the device is a CPU
if device.get_info(cl.device_info.TYPE) == cl.device_type.CPU:
    return (1, {'name': name, 'aggression': 0}, [devid, 'cpu:0'])
Dia




Schwede65


February 07, 2012, 05:44:43 PM 

great work... just tested 1.75... and now only one process for all gpu's
Question of setting the backuppool:
beginning of phoenix.cfg:
[general]
verbose = True
autodetect = +cl -cpu
backend = http: 123:45@67.89.0123:8332/ # URL format is exactly as it was in Phoenix 1
backup = http: 456:78@90.12.0123:8332/
is that correctly done?




d3m0n1q_733rz


February 07, 2012, 07:26:38 PM 

Think you could add the ability to use plugins? That might shut up some of the people wanting more functions like overclocking, slowing hash rates based on core temps, restarting of crashed GPUs, pausing GPUs, etc. by letting them code things themselves. Just push P for plugins and start configuring away or access the config file for the plugin directly to change settings. Seems like a good solution.

Funroll_Loops, the theoretically quicker breakfast cereal! Check out http://www.facebook.com/JupiterICT for all of your computing needs. If you need it, we can get it. We have solutions for your computing conundrums. BTC accepted! 12HWUSguWXRCQKfkPeJygVR1ex5wbg3hAq



ssateneth
Legendary
Offline
Activity: 1330
Merit: 1004


February 07, 2012, 09:23:38 PM 

great work... just tested 1.75... and now only one process for all gpu's
Question of setting the backuppool:
beginning of phoenix.cfg:
[general]
verbose = True
autodetect = +cl -cpu
backend = http: 123:45@67.89.0123:8332/ # URL format is exactly as it was in Phoenix 1
backup = http: 456:78@90.12.0123:8332/
is that correctly done?
there is no backup pool support (atm)




jedi95


February 08, 2012, 02:33:32 AM 

Above config generates "Detected [cl:0:0]: [Tahiti 0] using opencl (rating 2)", which I don't understand. Shouldn't it simply use the kernel specified and tell that autodetect had been overridden by own settings?
That's because you still have autodetect = +cl in the config file. Any devices with specific settings defined in the config file will use those instead of autodetect. The autodetect messages are currently displayed even if the settings are overridden by the config file. This will be clarified in a future release by either hiding the autodetect message or changing the message to indicate that the user-defined settings were used.
Another thing I don't understand is why getDevice() and autodetect() reside in kernels\opencl\__init__.py. I dislike the idea that these functions are derived from there, because opencl is simply another kernel folder. I think they should be placed somewhere else (have no good idea currently, but perhaps in PhoenixCore.py).
The reason we have the device detection code at the kernel level is so that it can support any type of device. For example, the current FPGA miners don't have a standard API, which makes including this functionality in the Phoenix core a bad idea. We would have to add support for new devices into the Phoenix core. By doing these functions at the kernel level, it allows other developers to support new hardware with no changes to Phoenix itself.
The supplied phatk2 version uses stuff in opencl\__init__.py, too, via "opencl = sys.modules['opencl']", which seems sort of not ideal. I think every kernel should specify its own options and stuff, even if they are the same. Your idea was perhaps to edit only one place if you add new changes, but for addon kernels like diakgcn I really have to specify my own options, which I would promote as a rule for all supplied or addon kernels to be better structured and to be independent of the opencl kernel folder. What do you think?
Using functions from opencl for other kernels isn't required. This is simply how we decided to implement the supplied version of phatk2. Kernels DO NOT need to be implemented in this way.
Another small change I would suggest for analyzeDevice() is your CPU detection code, which could be replaced with:
# Check if the device is a CPU
if device.get_info(cl.device_info.TYPE) == cl.device_type.CPU:
    return (1, {'name': name, 'aggression': 0}, [devid, 'cpu:0'])
Thanks for this code, I will modify opencl/phatk2 to use this method of detecting CPUs.
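For readers trying to picture how kernel-level detection and config overrides fit together, here is a rough sketch. It is not the actual Phoenix implementation. It assumes that each kernel offers an analyzer returning a (rating, settings) pair, that the highest rating wins, that a rating of 0 means "cannot drive this device", and that a config section naming a kernel bypasses the ratings entirely. The device dictionaries and all function names are hypothetical, and the third element of the real analyzeDevice() return value is left out for brevity.

```python
def analyze_opencl(device):
    """Hypothetical analyzer for a generic OpenCL kernel."""
    if device["api"] != "cl":
        return (0, {})  # 0 = this kernel cannot drive the device
    if device["type"] == "cpu":
        return (1, {"name": device["name"], "aggression": 0})
    return (2, {"name": device["name"]})


def analyze_fpga(device):
    """Hypothetical analyzer a third party could ship for FPGA boards."""
    if device["api"] != "fpga":
        return (0, {})
    return (3, {"name": device["name"]})


ANALYZERS = {"opencl": analyze_opencl, "fpga": analyze_fpga}


def pick_kernel(device, analyzers, user_section=None):
    """Return (kernel_name, settings) for one device."""
    # Settings written by the user in the config file win over autodetect.
    if user_section and "kernel" in user_section:
        settings = dict(user_section)
        return settings.pop("kernel"), settings

    # Otherwise ask every kernel and keep the highest-rated answer.
    best = (None, 0, {})
    for name, analyze in analyzers.items():
        rating, settings = analyze(device)
        if rating > best[1]:
            best = (name, rating, settings)
    return best[0], best[2]
```

In this sketch, supporting new hardware means adding one more entry to ANALYZERS; pick_kernel() itself never changes, which is the point of keeping detection out of the core.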

Phoenix Miner developer Donations appreciated at: 1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU



baby_ghost
Member
Offline
Activity: 77
Merit: 10


February 08, 2012, 05:16:33 AM 

Here is my phoenix.cfg. Am I wrong in other sections? Miner speed is lower than with Phoenix 1.7.5.
[general]
verbose = True
autodetect = +cl -cpu # The rightmost parameter takes precedence. This enables all OpenCL devices, except those that are CPUs.
backend = http://vinhpk.06:thunganhi@api.bitcoin.cz:8332/ # URL format is exactly as it was in Phoenix 1
[web]
password = rpc_password # Set an RPC password to keep people from messing with your miners.
# If you want to configure miners yourself, edit and uncomment this section:
#[cl:0:0] # Or whatever ID you want to configure.
#autoconfigure = True # Do you still want autoconfiguration?
#disabled = False # Do you want to disable the miner?
#bfi_int = True # Any other kernel options...
#vectors = False # can go into this section.
phatk2 = True
VECTORS = True
BFI_INT = True
WORKSIZE = 256
AGGRESSION = 11
FASTLOOPS = false
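The comment on the autodetect line says that the rightmost parameter takes precedence. A small sketch of how such a rule string could be evaluated is shown here. It is an illustration only, not the parser Phoenix uses. It assumes each rule is a tag prefixed with + (enable) or - (disable), and that every device carries a set of tags such as "cl" or "cpu"; the function name and the tag sets are made up.

```python
def autodetect_enabled(rules, device_tags):
    """Decide whether a device should be auto-configured.

    `rules` is a string such as "+cl -cpu". Rules are applied left to
    right, so the rightmost rule that matches the device decides.
    """
    enabled = False  # devices matched by no rule stay disabled
    for token in rules.split():
        sign, tag = token[0], token[1:].lower()
        if sign not in ("+", "-"):
            raise ValueError("rule must start with + or -: %r" % token)
        if tag in device_tags:
            enabled = (sign == "+")
    return enabled
```

Under these assumptions, "+cl -cpu" enables an OpenCL GPU (tags "cl", "gpu") but not an OpenCL CPU device (tags "cl", "cpu"), while swapping the order to "-cpu +cl" would enable both, because +cl is then the last rule to match.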

Yahoo / Skype: khacvinhp



