Bitcoin Forum
November 19, 2024, 12:30:27 AM *
News: Check out the artwork 1Dq created to commemorate this forum's 15th anniversary
 
   Home   Help Search Login Register More  
Pages: « 1 2 [3] 4 5 6 7 8 9 »  All
  Print  
Author Topic: Phoenix 2 beta discussion  (Read 58074 times)
Diapolo
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
February 06, 2012, 08:31:17 PM
 #41

For the ones, who want to try DiaKGCN with Phoenix 2, here is a preview:
http://www.filedropper.com/diakgcnphoenix2preview_1

This can be used as config:

Code:
[cl:0:0]
autoconfigure = false
kernel = diakgcn
aggression = 12
vectors2 = true
vectors4 = false
vectors8 = false
worksize = 256

The kernel is the same as the last version for Phoenix 1.7.X, only has some small fixes in the init.

Please report bugs, glitches, ideas and results back to me, thanks!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
lodcrappo
Hero Member
*****
Offline Offline

Activity: 616
Merit: 506


View Profile
February 06, 2012, 08:40:02 PM
 #42

please add to phoenix overclocking support...

please no.  don't turn phoenix into the monstrosity of some other miners.  it's simplicity is it's beauty.

do one thing, and do it well.

when you throw everything possible into one program, you end up with too many compromises.

bulanula
Hero Member
*****
Offline Offline

Activity: 518
Merit: 500



View Profile
February 06, 2012, 08:49:20 PM
 #43

please add to phoenix overclocking support...

please no.  don't turn phoenix into the monstrosity of some other miners.  it's simplicity is it's beauty.

do one thing, and do it well.

when you throw everything possible into one program, you end up with too many compromises.



I 100% agree and support this decision. Remember the KISS principle. If you need OC stick to AMD API thing.
HendrikJan
Member
**
Offline Offline

Activity: 64
Merit: 10



View Profile
February 06, 2012, 09:41:21 PM
Last edit: February 06, 2012, 09:52:37 PM by HendrikJan
 #44

I did try the diakgcn kernel.
Still about 10Mh/s less then older version.

Also would like to know what the "Rolling time" is.

Referral link to a trading bot: https://www.cryptohopper.com/?atid=224
Barlog
Member
**
Offline Offline

Activity: 65
Merit: 10


View Profile
February 06, 2012, 09:46:56 PM
 #45

You will need new address for donations aka 2PHoenix  Cheesy
d3m0n1q_733rz
Sr. Member
****
Offline Offline

Activity: 378
Merit: 250



View Profile WWW
February 06, 2012, 11:04:57 PM
Last edit: February 07, 2012, 12:37:32 AM by d3m0n1q_733rz
 #46

Alright, I have a register spill somewhere in here...can someone find it for me?  I'm using 8 vectors to make better use of the 16 available to the HD79xx cards.  But the code itself isn't made to handle it.  I just can't figure out what part needs to be changed to make it capable.   Embarrassed

Code:
// This file is in the public domain

#ifdef VECTORS8
typedef uint8 u;
#elif defined VECTORS4
typedef uint4 u;
#elif defined VECTORS
typedef uint2 u;
#else
typedef uint u;
#endif

__constant uint K[64] = {
0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

__constant uint ConstW[128] = {
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x80000000U, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000280U,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,

0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x80000000U, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000100U,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000
};

__constant uint H[8] = {
0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
};

#ifdef BITALIGN
#pragma OPENCL EXTENSION cl_amd_media_ops : enable
#define rot(x, y) amd_bitalign(x, x, (uint)(32 - y))
#else
#define rot(x, y) rotate(x, (uint)y)
#endif

// Some AMD devices have the BFI_INT opcode, which behaves exactly like the
// SHA-256 Ch function, but provides it in exactly one instruction. If
// detected, use it for Ch. Otherwise, use bitselect() for Ch.

#ifdef BFI_INT
// Well, slight problem... It turns out BFI_INT isn't actually exposed to
// OpenCL (or CAL IL for that matter) in any way. However, there is
// a similar instruction, BYTE_ALIGN_INT, which is exposed to OpenCL via
// amd_bytealign, takes the same inputs, and provides the same output.
// We can use that as a placeholder for BFI_INT and have the application
// patch it after compilation.

// This is the BFI_INT function
#define Ch(x, y, z) amd_bytealign(x,y,z)
// Ma can also be implemented in terms of BFI_INT...
#define Ma(z, x, y) amd_bytealign(z^x,y,x)
#else
#define Ch(x, y, z) bitselect(z,y,x)
#define Ma(x, y, z) bitselect(x,y,(z^x))
#endif

//Various intermediate calculations for each SHA round
#define s0(n) (S0(Vals[(128 - (n)) % 8]))
#define S0(n) (rot(n, 30u)^rot(n, 19u)^rot(n,10u))

#define s1(n) (S1(Vals[(132 - (n)) % 8]))
#define S1(n) (rot(n, 26u)^rot(n, 21u)^rot(n, 7u))

#define ch(n) Ch(Vals[(132 - (n)) % 8],Vals[(133 - (n)) % 8],Vals[(134 - (n)) % 8])
#define maj(n) Ma(Vals[(129 - (n)) % 8],Vals[(130 - (n)) % 8],Vals[(128 - (n)) % 8])

//t1 calc when W is already calculated
#define t1(n) K[(n) % 64] + Vals[(135 - (n)) % 8] +  W[(n)] + s1(n) + ch(n)

//t1 calc which calculates W
#define t1W(n) K[(n) % 64] + Vals[(135 - (n)) % 8] +  W(n) + s1(n) + ch(n)

//Used for constant W Values (the compiler optimizes out zeros)
#define t1C(n) (K[(n) % 64]+ ConstW[(n)]) + Vals[(135 - (n)) % 8] + s1(n) + ch(n)

//t2 Calc
#define t2(n)  maj(n) + s0(n)

#define rotC(x,n) (x<<n | x >> (32-n))

//W calculation used for SHA round
#define W(n) (W[n] = P4(n) + P3(n) + P2(n) + P1(n))


//Partial W calculations (used for the begining where only some values are nonzero)
#define P1(n) ((rot(W[(n)-2],15u)^rot(W[(n)-2],13u)^((W[(n)-2])>>10U)))
#define P2(n) ((rot(W[(n)-15],25u)^rot(W[(n)-15],14u)^((W[(n)-15])>>3U)))
#define p1(x) ((rot(x,15u)^rot(x,13u)^((x)>>10U)))
#define p2(x) ((rot(x,25u)^rot(x,14u)^((x)>>3U)))
#define P3(n)  W[n-7]
#define P4(n)  W[n-16]

//Partial Calcs for constant W values
#define P1C(n) ((rotC(ConstW[(n)-2],15)^rotC(ConstW[(n)-2],13)^((ConstW[(n)-2])>>10U)))
#define P2C(n) ((rotC(ConstW[(n)-15],25)^rotC(ConstW[(n)-15],14)^((ConstW[(n)-15])>>3U)))
#define P3C(x)  ConstW[x-7]
#define P4C(x)  ConstW[x-16]

//SHA round with built in W calc
#define sharoundW(n) Barrier1(n);  Vals[(131 - (n)) % 8] += t1W(n); Vals[(135 - (n)) % 8] = t1W(n) + t2(n);  

//SHA round without W calc
#define sharound(n)  Barrier2(n); Vals[(131 - (n)) % 8] += t1(n); Vals[(135 - (n)) % 8] = t1(n) + t2(n);

//SHA round for constant W values
#define sharoundC(n)  Barrier2(n); Vals[(131 - (n)) % 8] += t1C(n); Vals[(135 - (n)) % 8] = t1C(n) + t2(n);

//The compiler is stupid... I put this in there only to stop the compiler from (de)optimizing the order
#define Barrier1(n) t1 = t1C((n+1))
#define Barrier2(n) t1 = t1C((n))

__kernel
//removed this to allow detection of invalid work size
//__attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))
void search( const uint state0, const uint state1, const uint state2, const uint state3,
const uint state4, const uint state5, const uint state6, const uint state7,
const uint B1, const uint C1, const uint D1,
const uint F1, const uint G1, const uint H1,
const u base,
const uint W16, const uint W17,
const uint PreVal4, const uint PreVal0,
const uint PreW31, const uint PreW32,
const uint PreW19, const uint PreW20,
__global uint * output)
{

u W[124];
u Vals[8];

//Dummy Variable to prevent compiler from reordering between rounds
u t1;

W[16] = W16;
W[17] = W17;

#ifdef VECTORS8

//Modified from VECTORS4
W[3] = base + (uint)(get_local_id(0)) * 8u + (uint)(get_group_id(0)) * (WORKSIZE * 8u);
uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U, r ^ 0x8010000U, r ^ 0xA050000U, r ^ 0xC090000U, r ^ 0xE0D0000U};

#elif defined VECTORS4
//Less dependencies to get both the local id and group id and then add them
W[3] = base + (uint)(get_local_id(0)) * 4u + (uint)(get_group_id(0)) * (WORKSIZE * 4u);
uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
//Since only the 2 LSB is opposite between the nonces, we can save an instruction by flipping the 4 bits in W18 rather than the 1 bit in W3
W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U};

#elif defined VECTORS
W[3] = base + (uint)(get_local_id(0)) * 2u + (uint)(get_group_id(0)) * (WORKSIZE * 2u);
uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
W[18] = PreW20 + (u){r, r ^ 0x2004000U};

#else
W[3] = base + get_local_id(0) + get_group_id(0) * (WORKSIZE);
u r = rot(W[3],25u)^rot(W[3],14u)^((W[3])>>3U);
W[18] = PreW20 + r;
#endif

//the order of the W calcs and Rounds is like this because the compiler needs help finding how to order the instructions


//Vals[0]=state0;
Vals[0] = PreVal0 + W[3];
Vals[1]=B1;
Vals[2]=C1;
Vals[3]=D1;
//Vals[4]=PreVal4;
Vals[4] = PreVal4 + W[3];
Vals[5]=F1;
Vals[6]=G1;
Vals[7]=H1;

sharoundC(4);
W[19] = PreW19 + W[3];
sharoundC(5);
W[20] = P1(20) + P4C(20);
sharoundC(6);
W[21] = P1(21);
sharoundC(7);
W[22] = P1(22) + P3C(22);
sharoundC(8);
W[23] = W[16] + P1(23);
sharoundC(9);
W[24] = W[17] + P1(24);
sharoundC(10);
W[25] = P1(25) + P3(25);
W[26] = P1(26) + P3(26);
sharoundC(11);
W[27] = P1(27) + P3(27);
W[28] = P1(28) + P3(28);
sharoundC(12);
W[29] = P1(29) + P3(29);
sharoundC(13);
W[30] = P1(30) + P2C(30) + P3(30);
W[31] = P1(31) + P3(31) + PreW31;
sharoundC(14);
W[32] = P1(32) + P3(32) + PreW32;
sharoundC(15);
sharound(16);
sharound(17);
sharound(18);
sharound(19);
sharound(20);
sharound(21);
sharound(22);
sharound(23);
sharound(24);
sharound(25);
sharound(26);
sharound(27);
sharound(28);
sharound(29);
sharound(30);
sharound(31);
sharound(32);
sharoundW(33);
sharoundW(34);
sharoundW(35);
sharoundW(36);
sharoundW(37);
sharoundW(38);
sharoundW(39);
sharoundW(40);
sharoundW(41);
sharoundW(42);
sharoundW(43);
sharoundW(44);
sharoundW(45);
sharoundW(46);
sharoundW(47);
sharoundW(48);
sharoundW(49);
sharoundW(50);
sharoundW(51);
sharoundW(52);
sharoundW(53);
sharoundW(54);
sharoundW(55);
sharoundW(56);
sharoundW(57);
sharoundW(58);
sharoundW(59);
sharoundW(60);
sharoundW(61);
sharoundW(62);
sharoundW(63);

W[64]=state0+Vals[0];
W[65]=state1+Vals[1];
W[66]=state2+Vals[2];
W[67]=state3+Vals[3];
W[68]=state4+Vals[4];
W[69]=state5+Vals[5];
W[70]=state6+Vals[6];
W[71]=state7+Vals[7];

Vals[0]=H[0];
Vals[1]=H[1];
Vals[2]=H[2];
// Vals[3]=H[3];
Vals[3] = 0xa54ff53aU + (0xb0edbdd0U + K[0]) +  W[64];
Vals[4]=H[4];
Vals[5]=H[5];
Vals[6]=H[6];
// Vals[7]=H[7];
Vals[7] = 0x08909ae5U + (0xb0edbdd0U + K[0]) +  W[64];

//const u Temp = (0xb0edbdd0U + K[0]) +  W[64];



//#define P124(n) P1(n) + P2(n) + P4(n)

W[80] = + P2(80) + P4(80);
sharound(65);
W[81] = P1C(81) + P2(81) + P4(81);
sharound(66);
W[82] = P1(82) + P2(82) + P4(82);
sharound(67);
W[83] = P1(83) + P2(83) + P4(83);
sharound(68);
W[84] = P1(84) + P2(84) + P4(84);
sharound(69);
W[85] = P1(85) + P2(85) + P4(85);
sharound(70);
W[86] = P1(86) + P2(86) + P3C(86) + P4(86);
sharound(71);
W[87] = P1(87) + P2C(87) + P3(87) + P4(87);
sharoundC(72);
W[88] =   P1(88) + P3(88) + P4C(88);
sharoundC(73);
W[89] = P1(89) + P3(89);
sharoundC(74);
W[90] = P1(90) + P3(90);
sharoundC(75);
W[91] = P1(91) + P3(91);
sharoundC(76);
W[92] = P1(92) + P3(92);
sharoundC(77);
W[93] = P1(93) + P3(93);
W[94] = P1(94) + P2C(94) + P3(94);
sharoundC(78);
W[95] = P1(95) + P2(95) + P3(95) + P4C(95);
sharoundC(79);
sharound(80);
sharound(81);
sharound(82);
sharound(83);
sharound(84);
sharound(85);
sharound(86);
sharound(87);
sharound(88);
sharound(89);
sharound(90);
sharound(91);
sharound(92);
sharound(93);
sharound(94);
sharound(95);
sharoundW(96);
sharoundW(97);
sharoundW(98);
sharoundW(99);
sharoundW(100);
sharoundW(101);
sharoundW(102);
sharoundW(103);
sharoundW(104);
sharoundW(105);
sharoundW(106);
sharoundW(107);
sharoundW(108);
sharoundW(109);
sharoundW(110);
sharoundW(111);
sharoundW(112);
sharoundW(113);
sharoundW(114);
sharoundW(115);
sharoundW(116);
sharoundW(117);
sharoundW(118);
sharoundW(119);
sharoundW(120);
sharoundW(121);
sharoundW(122);

u v = W[117] + W[108] + Vals[3] + Vals[7] + P2(124) + P1(124) + Ch((Vals[0] + Vals[4]) + (K[59] + W(123)) + s1(123)+ ch(123),Vals[1],Vals[2]);
u g = -(K[60] + H[7]) - S1((Vals[0] + Vals[4]) + (K[59] + W(123))  + s1(123)+ ch(123));

uint nonce = 0;

#ifdef VECTORS8
if (v.s0 == g.s0)
{
nonce = W[3].s0;
}
if (v.s1 == g.s1)
{
nonce = W[3].s1;
}
if (v.s2 == g.s2)
{
nonce = W[3].s2;
}
if (v.s3 == g.s3)
{
nonce = W[3].s3;
}
if (v.s4 == g.s4)
{
nonce = W[3].s4;
}
if (v.s5 == g.s5)
{
nonce = W[3].s5;
}
if (v.s6 == g.s6)
{
nonce = W[3].s6;
}
if (v.s7 == g.s7)
{
nonce = W[3].s7;
}
#elif defined VECTORS4
if (v.s0 == g.s0)
{
nonce = W[3].s0;
}
if (v.s1 == g.s1)
{
nonce = W[3].s1;
}
if (v.s2 == g.s2)
{
nonce = W[3].s2;
}
if (v.s3 == g.s3)
{
nonce = W[3].s3;
}
#elif defined VECTORS
if (v.s0 == g.s0)
{
nonce = W[3].s0;
}
if (v.s1 == g.s1)
{
nonce = W[3].s1;
}
#else
if (v == g)
{
nonce = W[3];
}
#endif
if(nonce)
{
//Faster to shift the nonce by 2 due to 4-DWORD addressing and does not add more collisions
output[WORKSIZE] = nonce;
output[get_local_id(0)] = nonce;
}
}

Funroll_Loops, the theoretically quicker breakfast cereal!
Check out http://www.facebook.com/JupiterICT for all of your computing needs.  If you need it, we can get it.  We have solutions for your computing conundrums.  BTC accepted!  12HWUSguWXRCQKfkPeJygVR1ex5wbg3hAq
Bananington
Sr. Member
****
Offline Offline

Activity: 1470
Merit: 428



View Profile
February 06, 2012, 11:48:50 PM
 #47

Could we get a list of all accepted options?
Kernel selection
Worksize=
Vectors, Vectors4, Vectors8, Vectors16
etc.

The Phoenix 2 core itself recognizes (not an exhaustive list):
Code:
# [cl:0:0] autoconfigure # Allow the kernel to choose its own configuration variables?
# [cl:0:0] disabled # Prevent mining on this device?
# [cl:0:0] kernel # Can be used to manually choose a kernel to mine on.
# [cl:0:0] start_undetected # Start even if autodetect doesn't find it?
# [general] autodetect # A list of rules for what devices to autodetect.
# [general] backend # The URL to mine on in the backend.
# [general] logfile # Set this option to log to a file.
# [general] queuedelay # Advanced - ???
# [general] queuesize # Advanced - ???
# [general] ratesamples # Advanced - number of samples to average for rate reporting
# [general] statusinterval # Advanced - how long to delay between statusbar updates
# [general] verbose # Enable verbose mode? (Shows debug messages)
# [web] bind # Bind the web/RPC server to a specific IP
# [web] disabled # Disable the webserver altogether?
# [web] logbuffer # Advanced - how many logs to remember in the getlogs() RPC call
# [web] password # The password necessary for web/RPC login (username is ignored)
# [web] port # What port should the web/RPC server listen on?
# [web] root # Advanced - The root directory for the webserver.

The phatk2/opencl kernels recognize:
Code:
# [cl:0:0] vectors # Enable two-way vectors?
# [cl:0:0] vectors4 # Enable four-way vectors?
# [cl:0:0] fastloop # Advanced - fastloop optimization for low aggressions
# [cl:0:0] aggression # Controls how hard Phoenix 2 hits the hardware
# [cl:0:0] worksize # Advanced - controls size of individual executions
# [cl:0:0] bfi_int # Enable BFI_INT optimization for Radeon cards that support it

[cl:0:0]
autoconfigure = False
BFI_INT VECTORS4 WORKSIZE=64

What's wrong with this portion that it's not giving me the correct settings?

Try:
Code:
[cl:0:0]
autoconfigure = false # Not actually needed since autoconfiguration disables by default when you supply your own args
BFI_INT = true # Also the boolean options aren't case sensitive.
VECTORS4 = true
WORKSIZE = 64

I believe this should be edited into the first post for the late joiners. Smiley

Bananington
Sr. Member
****
Offline Offline

Activity: 1470
Merit: 428



View Profile
February 06, 2012, 11:59:03 PM
 #48

How high can I turn up the aggression? Is 12 the max?

d3m0n1q_733rz
Sr. Member
****
Offline Offline

Activity: 378
Merit: 250



View Profile WWW
February 07, 2012, 12:29:36 AM
Last edit: February 07, 2012, 12:50:22 AM by d3m0n1q_733rz
 #49

Alright, I have a register spill somewhere in here...can someone find it for me?  I'm using 8 vectors to make better use of the 16 available to the HD79xx cards.  But the code itself isn't made to handle it.  I just can't figure out what part needs to be changed to make it capable.   Embarrassed

Code:
// This file is in the public domain

#ifdef VECTORS8
typedef uint8 u;
#elif defined VECTORS4
typedef uint4 u;
#elif defined VECTORS
typedef uint2 u;
#else
typedef uint u;
#endif

__constant uint K[64] = {
0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

__constant uint ConstW[128] = {
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x80000000U, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000280U,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,

0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x80000000U, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000100U,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000
};

__constant uint H[8] = {
0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
};

#ifdef BITALIGN
#pragma OPENCL EXTENSION cl_amd_media_ops : enable
#define rot(x, y) amd_bitalign(x, x, (uint)(32 - y))
#else
#define rot(x, y) rotate(x, (uint)y)
#endif

// Some AMD devices have the BFI_INT opcode, which behaves exactly like the
// SHA-256 Ch function, but provides it in exactly one instruction. If
// detected, use it for Ch. Otherwise, use bitselect() for Ch.

#ifdef BFI_INT
// Well, slight problem... It turns out BFI_INT isn't actually exposed to
// OpenCL (or CAL IL for that matter) in any way. However, there is
// a similar instruction, BYTE_ALIGN_INT, which is exposed to OpenCL via
// amd_bytealign, takes the same inputs, and provides the same output.
// We can use that as a placeholder for BFI_INT and have the application
// patch it after compilation.

// This is the BFI_INT function
#define Ch(x, y, z) amd_bytealign(x,y,z)
// Ma can also be implemented in terms of BFI_INT...
#define Ma(z, x, y) amd_bytealign(z^x,y,x)
#else
#define Ch(x, y, z) bitselect(z,y,x)
#define Ma(x, y, z) bitselect(x,y,(z^x))
#endif

//Various intermediate calculations for each SHA round
#define s0(n) (S0(Vals[(128 - (n)) % 8]))
#define S0(n) (rot(n, 30u)^rot(n, 19u)^rot(n,10u))

#define s1(n) (S1(Vals[(132 - (n)) % 8]))
#define S1(n) (rot(n, 26u)^rot(n, 21u)^rot(n, 7u))

#define ch(n) Ch(Vals[(132 - (n)) % 8],Vals[(133 - (n)) % 8],Vals[(134 - (n)) % 8])
#define maj(n) Ma(Vals[(129 - (n)) % 8],Vals[(130 - (n)) % 8],Vals[(128 - (n)) % 8])

//t1 calc when W is already calculated
#define t1(n) K[(n) % 64] + Vals[(135 - (n)) % 8] +  W[(n)] + s1(n) + ch(n)

//t1 calc which calculates W
#define t1W(n) K[(n) % 64] + Vals[(135 - (n)) % 8] +  W(n) + s1(n) + ch(n)

//Used for constant W Values (the compiler optimizes out zeros)
#define t1C(n) (K[(n) % 64]+ ConstW[(n)]) + Vals[(135 - (n)) % 8] + s1(n) + ch(n)

//t2 Calc
#define t2(n)  maj(n) + s0(n)

#define rotC(x,n) (x<<n | x >> (32-n))

//W calculation used for SHA round
#define W(n) (W[n] = P4(n) + P3(n) + P2(n) + P1(n))


//Partial W calculations (used for the begining where only some values are nonzero)
#define P1(n) ((rot(W[(n)-2],15u)^rot(W[(n)-2],13u)^((W[(n)-2])>>10U)))
#define P2(n) ((rot(W[(n)-15],25u)^rot(W[(n)-15],14u)^((W[(n)-15])>>3U)))
#define p1(x) ((rot(x,15u)^rot(x,13u)^((x)>>10U)))
#define p2(x) ((rot(x,25u)^rot(x,14u)^((x)>>3U)))
#define P3(n)  W[n-7]
#define P4(n)  W[n-16]

//Partial Calcs for constant W values
#define P1C(n) ((rotC(ConstW[(n)-2],15)^rotC(ConstW[(n)-2],13)^((ConstW[(n)-2])>>10U)))
#define P2C(n) ((rotC(ConstW[(n)-15],25)^rotC(ConstW[(n)-15],14)^((ConstW[(n)-15])>>3U)))
#define P3C(x)  ConstW[x-7]
#define P4C(x)  ConstW[x-16]

//SHA round with built in W calc
#define sharoundW(n) Barrier1(n);  Vals[(131 - (n)) % 8] += t1W(n); Vals[(135 - (n)) % 8] = t1W(n) + t2(n);  

//SHA round without W calc
#define sharound(n)  Barrier2(n); Vals[(131 - (n)) % 8] += t1(n); Vals[(135 - (n)) % 8] = t1(n) + t2(n);

//SHA round for constant W values
#define sharoundC(n)  Barrier2(n); Vals[(131 - (n)) % 8] += t1C(n); Vals[(135 - (n)) % 8] = t1C(n) + t2(n);

//The compiler is stupid... I put this in there only to stop the compiler from (de)optimizing the order
#define Barrier1(n) t1 = t1C((n+1))
#define Barrier2(n) t1 = t1C((n))

__kernel
//removed this to allow detection of invalid work size
//__attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))
void search( const uint state0, const uint state1, const uint state2, const uint state3,
const uint state4, const uint state5, const uint state6, const uint state7,
const uint B1, const uint C1, const uint D1,
const uint F1, const uint G1, const uint H1,
const u base,
const uint W16, const uint W17,
const uint PreVal4, const uint PreVal0,
const uint PreW31, const uint PreW32,
const uint PreW19, const uint PreW20,
__global uint * output)
{

u W[124];
u Vals[8];

//Dummy Variable to prevent compiler from reordering between rounds
u t1;

W[16] = W16;
W[17] = W17;

#ifdef VECTORS8

//Modified from VECTORS4
W[3] = base + (uint)(get_local_id(0)) * 8u + (uint)(get_group_id(0)) * (WORKSIZE * 8u);
uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U, r ^ 0x8010000U, r ^ 0xA050000U, r ^ 0xC090000U, r ^ 0xE0D0000U};

#elif defined VECTORS4
//Less dependencies to get both the local id and group id and then add them
W[3] = base + (uint)(get_local_id(0)) * 4u + (uint)(get_group_id(0)) * (WORKSIZE * 4u);
uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
//Since only the 2 LSB is opposite between the nonces, we can save an instruction by flipping the 4 bits in W18 rather than the 1 bit in W3
W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U};

#elif defined VECTORS
W[3] = base + (uint)(get_local_id(0)) * 2u + (uint)(get_group_id(0)) * (WORKSIZE * 2u);
uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
W[18] = PreW20 + (u){r, r ^ 0x2004000U};

#else
W[3] = base + get_local_id(0) + get_group_id(0) * (WORKSIZE);
u r = rot(W[3],25u)^rot(W[3],14u)^((W[3])>>3U);
W[18] = PreW20 + r;
#endif

//the order of the W calcs and Rounds is like this because the compiler needs help finding how to order the instructions


//Vals[0]=state0;
Vals[0] = PreVal0 + W[3];
Vals[1]=B1;
Vals[2]=C1;
Vals[3]=D1;
//Vals[4]=PreVal4;
Vals[4] = PreVal4 + W[3];
Vals[5]=F1;
Vals[6]=G1;
Vals[7]=H1;

sharoundC(4);
W[19] = PreW19 + W[3];
sharoundC(5);
W[20] = P1(20) + P4C(20);
sharoundC(6);
W[21] = P1(21);
sharoundC(7);
W[22] = P1(22) + P3C(22);
sharoundC(8);
W[23] = W[16] + P1(23);
sharoundC(9);
W[24] = W[17] + P1(24);
sharoundC(10);
W[25] = P1(25) + P3(25);
W[26] = P1(26) + P3(26);
sharoundC(11);
W[27] = P1(27) + P3(27);
W[28] = P1(28) + P3(28);
sharoundC(12);
W[29] = P1(29) + P3(29);
sharoundC(13);
W[30] = P1(30) + P2C(30) + P3(30);
W[31] = P1(31) + P3(31) + PreW31;
sharoundC(14);
W[32] = P1(32) + P3(32) + PreW32;
sharoundC(15);
sharound(16);
sharound(17);
sharound(18);
sharound(19);
sharound(20);
sharound(21);
sharound(22);
sharound(23);
sharound(24);
sharound(25);
sharound(26);
sharound(27);
sharound(28);
sharound(29);
sharound(30);
sharound(31);
sharound(32);
sharoundW(33);
sharoundW(34);
sharoundW(35);
sharoundW(36);
sharoundW(37);
sharoundW(38);
sharoundW(39);
sharoundW(40);
sharoundW(41);
sharoundW(42);
sharoundW(43);
sharoundW(44);
sharoundW(45);
sharoundW(46);
sharoundW(47);
sharoundW(48);
sharoundW(49);
sharoundW(50);
sharoundW(51);
sharoundW(52);
sharoundW(53);
sharoundW(54);
sharoundW(55);
sharoundW(56);
sharoundW(57);
sharoundW(58);
sharoundW(59);
sharoundW(60);
sharoundW(61);
sharoundW(62);
sharoundW(63);

W[64]=state0+Vals[0];
W[65]=state1+Vals[1];
W[66]=state2+Vals[2];
W[67]=state3+Vals[3];
W[68]=state4+Vals[4];
W[69]=state5+Vals[5];
W[70]=state6+Vals[6];
W[71]=state7+Vals[7];

Vals[0]=H[0];
Vals[1]=H[1];
Vals[2]=H[2];
// Vals[3]=H[3];
Vals[3] = 0xa54ff53aU + (0xb0edbdd0U + K[0]) +  W[64];
Vals[4]=H[4];
Vals[5]=H[5];
Vals[6]=H[6];
// Vals[7]=H[7];
Vals[7] = 0x08909ae5U + (0xb0edbdd0U + K[0]) +  W[64];

//const u Temp = (0xb0edbdd0U + K[0]) +  W[64];



//#define P124(n) P1(n) + P2(n) + P4(n)

W[80] = + P2(80) + P4(80);
sharound(65);
W[81] = P1C(81) + P2(81) + P4(81);
sharound(66);
W[82] = P1(82) + P2(82) + P4(82);
sharound(67);
W[83] = P1(83) + P2(83) + P4(83);
sharound(68);
W[84] = P1(84) + P2(84) + P4(84);
sharound(69);
W[85] = P1(85) + P2(85) + P4(85);
sharound(70);
W[86] = P1(86) + P2(86) + P3C(86) + P4(86);
sharound(71);
W[87] = P1(87) + P2C(87) + P3(87) + P4(87);
sharoundC(72);
W[88] =   P1(88) + P3(88) + P4C(88);
sharoundC(73);
W[89] = P1(89) + P3(89);
sharoundC(74);
W[90] = P1(90) + P3(90);
sharoundC(75);
W[91] = P1(91) + P3(91);
sharoundC(76);
W[92] = P1(92) + P3(92);
sharoundC(77);
W[93] = P1(93) + P3(93);
W[94] = P1(94) + P2C(94) + P3(94);
sharoundC(78);
W[95] = P1(95) + P2(95) + P3(95) + P4C(95);
sharoundC(79);
sharound(80);
sharound(81);
sharound(82);
sharound(83);
sharound(84);
sharound(85);
sharound(86);
sharound(87);
sharound(88);
sharound(89);
sharound(90);
sharound(91);
sharound(92);
sharound(93);
sharound(94);
sharound(95);
sharoundW(96);
sharoundW(97);
sharoundW(98);
sharoundW(99);
sharoundW(100);
sharoundW(101);
sharoundW(102);
sharoundW(103);
sharoundW(104);
sharoundW(105);
sharoundW(106);
sharoundW(107);
sharoundW(108);
sharoundW(109);
sharoundW(110);
sharoundW(111);
sharoundW(112);
sharoundW(113);
sharoundW(114);
sharoundW(115);
sharoundW(116);
sharoundW(117);
sharoundW(118);
sharoundW(119);
sharoundW(120);
sharoundW(121);
sharoundW(122);

u v = W[117] + W[108] + Vals[3] + Vals[7] + P2(124) + P1(124) + Ch((Vals[0] + Vals[4]) + (K[59] + W(123)) + s1(123)+ ch(123),Vals[1],Vals[2]);
u g = -(K[60] + H[7]) - S1((Vals[0] + Vals[4]) + (K[59] + W(123))  + s1(123)+ ch(123));

uint nonce = 0;

#ifdef VECTORS8
if (v.s0 == g.s0)
{
nonce = W[3].s0;
}
if (v.s1 == g.s1)
{
nonce = W[3].s1;
}
if (v.s2 == g.s2)
{
nonce = W[3].s2;
}
if (v.s3 == g.s3)
{
nonce = W[3].s3;
}
if (v.s4 == g.s4)
{
nonce = W[3].s4;
}
if (v.s5 == g.s5)
{
nonce = W[3].s5;
}
if (v.s6 == g.s6)
{
nonce = W[3].s6;
}
if (v.s7 == g.s7)
{
nonce = W[3].s7;
}
#elif defined VECTORS4
if (v.s0 == g.s0)
{
nonce = W[3].s0;
}
if (v.s1 == g.s1)
{
nonce = W[3].s1;
}
if (v.s2 == g.s2)
{
nonce = W[3].s2;
}
if (v.s3 == g.s3)
{
nonce = W[3].s3;
}
#elif defined VECTORS
if (v.s0 == g.s0)
{
nonce = W[3].s0;
}
if (v.s1 == g.s1)
{
nonce = W[3].s1;
}
#else
if (v == g)
{
nonce = W[3];
}
#endif
if(nonce)
{
//Faster to shift the nonce by 2 due to 4-DWORD addressing and does not add more collisions
output[WORKSIZE] = nonce;
output[get_local_id(0)] = nonce;
}
}
I think the problem's in this line:  W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U, r ^ 0x8010000U, r ^ 0xA050000U, r ^ 0xC090000U, r ^ 0xE0D0000U};
I probably used the wrong values on the second half.
Edit:  Bingo!  But still have a spill...

Funroll_Loops, the theoretically quicker breakfast cereal!
Check out http://www.facebook.com/JupiterICT for all of your computing needs.  If you need it, we can get it.  We have solutions for your computing conundrums.  BTC accepted!  12HWUSguWXRCQKfkPeJygVR1ex5wbg3hAq
CFSworks (OP)
Member
**
Offline Offline

Activity: 63
Merit: 10


View Profile
February 07, 2012, 05:11:55 AM
 #50

Will this include automatic fan and gpu management like cgminer ?

What about killing mining thread on GPU that has a dead fan while I am away ?

While both of these features are really nice, there is no easy way to interface with these functions on the card through OpenCL. We can do it, and I've asked jedi95 about it. His stance on the issue is pretty much what lodcrappo said:
please no.  don't turn phoenix into the monstrosity of some other miners.  it's simplicity is it's beauty.

do one thing, and do it well.

when you throw everything possible into one program, you end up with too many compromises.

So, we're definitely not adding GPU management features to the Phoenix 2 core.

IMHO cgminer is good because of auto fan feature and support for backup pools and screw all that python BS.

Thank you !

It's important to keep in mind that cgminer and Phoenix are alternatives, not competitors. We're both on the same side here. If you like cgminer better, please, use it! Smiley Phoenix's purpose is to do things differently for those that don't prefer the way cgminer operates. (And, Python BS? I'm assuming you mean the dependencies you have to install to get Phoenix operational, since I don't see how choice of language affects the end-user, especially when the core was written with speed in mind.)


I 100% agree and support this decision. Remember the KISS principle. If you need OC stick to AMD API thing.

Adding OC and fan management is very tempting, I must admit. A thought occurs to me: What if we renamed the "kernels" directory to "modules" or the like, and allowed non-kernel modules to be loaded into Phoenix as well? This allows a third party to easily develop a complete GPU management subsystem that integrates into Phoenix while still keeping the core slim and fast for those who prefer to do OC themselves.

Phoenix Miner developer

PGP/GPG key: FC5461A3
Personal donations: 1Abq88sPz2MjH4Yi8yZVCbfu1ZXRSP7id5
lodcrappo
Hero Member
*****
Offline Offline

Activity: 616
Merit: 506


View Profile
February 07, 2012, 05:18:12 AM
 #51



Adding OC and fan management is very tempting, I must admit. A thought occurs to me: What if we renamed the "kernels" directory to "modules" or the like, and allowed non-kernel modules to be loaded into Phoenix as well? This allows a third party to easily develop a complete GPU management subsystem that integrates into Phoenix while still keeping the core slim and fast for those who prefer to do OC themselves.

as the developer of a popular mining farm management system that uses "best of breed" tools for each function, including phoenix for the mining client part, I think this is a great idea.  Currently we have to hack the management code into each phoenix release, which isn't a big deal but having a standard way to interface would be much nicer.
wind
Member
**
Offline Offline

Activity: 125
Merit: 10


View Profile
February 07, 2012, 10:49:13 AM
 #52

is it possible to add commandline parameters like phoenix 1.x does?
echris1
Full Member
***
Offline Offline

Activity: 125
Merit: 100


View Profile
February 07, 2012, 12:50:21 PM
 #53

Just switched all my miners to this after a drastic drop in cgminer performance (something to do with SDK, not sure) 

Working great on my 2x6770, 6850 and 5770, back at max after a bit of tweaking.

I was just wondering how to add a backup pool to the conf file?

Keep up the good work!
HendrikJan
Member
**
Offline Offline

Activity: 64
Merit: 10



View Profile
February 07, 2012, 02:59:18 PM
 #54

Adding OC and fan management is very tempting, I must admit. A thought occurs to me: What if we renamed the "kernels" directory to "modules" or the like, and allowed non-kernel modules to be loaded into Phoenix as well? This allows a third party to easily develop a complete GPU management subsystem that integrates into Phoenix while still keeping the core slim and fast for those who prefer to do OC themselves.

It could be a nice way to get the best of both worlds.

But i still get less Mh/s with this 2.0 version.
What could be the difference?
Should i get the same result or does this version still need some tweaking?

Referral link to a trading bot: https://www.cryptohopper.com/?atid=224
Diapolo
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
February 07, 2012, 03:47:42 PM
 #55

Code:
[general]
autodetect = +cl
backend = XYZ
verbose = true

[cl:0:0]
kernel = diakgcn
aggression = 12
vectors2 = true
vectors4 = false
vectors8 = false
worksize = 256

[cl:0:1]
disabled = true

[cl:0:2]
disabled = true

[web]
disabled = true

Above config generates "Detected [cl:0:0]: [Tahiti 0] using opencl (rating 2)", which I don't understand. Shouldn't it simply use the kernel specified and tell that autodetect had been overridden by own settings. Another thing I don't understand is, why getDevice() and autodetect() reside in kernels\opencl\__init__.py I dislike the idea, that these functions are derived from there, because opencl is simply another kernel folder. I think they should be placed somewhere else (have no good idea currently, but perhaps in PhoenixCore.py).

The supplied phatk2 version uses stuff in opencl\__init__.py, too via "opencl = sys.modules['opencl']", which seems sort of not ideal. I think every kernel should specify his own options and stuff, even if they are the same. Your idea was perhaps to edit only one place, if you add new changes, but for addon kernels like diakgcn I really have to specify my own options, which I would promote as a rule for all supplied or addon kernels to be better structured and to be independend of the opencl kernel folder. What do you think?

Another small change I would suggest for analyzeDevice() is your CPU detection code, which could be replaced with:
Code:
# Check if the device is a CPU
if device.get_info(cl.device_info.TYPE) == cl.device_type.CPU:
return (1, {'name': name, 'aggression': 0}, [devid, 'cpu:0'])

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Schwede65
Sr. Member
****
Offline Offline

Activity: 309
Merit: 250


View Profile
February 07, 2012, 05:44:43 PM
 #56

great work... just tested 1.75... and now only one process for all gpu's

Question of setting the back-up-pool:

beginning of phoenix.cfg:

[general]
verbose = True
autodetect = +cl -cpu
backend = http: 123:45@67.89.0123:8332/ # URL format is exactly as it was in Phoenix 1
backup = http:  456:78@90.12.0123:8332/

is that correctly done?

d3m0n1q_733rz
Sr. Member
****
Offline Offline

Activity: 378
Merit: 250



View Profile WWW
February 07, 2012, 07:26:38 PM
 #57

Think you could add the ability to use plugins?  That might shut-up some of the people wanting more functions like overclocking, slowing hash rates based on core temps, restarting of crashed GPUs, pausing GPUs, etc. by letting them code things their self.  Just push P for plugins and start configuring away or access the config file for the plugin directly to change settings.  Seems like a good solution.

Funroll_Loops, the theoretically quicker breakfast cereal!
Check out http://www.facebook.com/JupiterICT for all of your computing needs.  If you need it, we can get it.  We have solutions for your computing conundrums.  BTC accepted!  12HWUSguWXRCQKfkPeJygVR1ex5wbg3hAq
ssateneth
Legendary
*
Offline Offline

Activity: 1344
Merit: 1004



View Profile
February 07, 2012, 09:23:38 PM
 #58

great work... just tested 1.75... and now only one process for all gpu's

Question of setting the back-up-pool:

beginning of phoenix.cfg:

[general]
verbose = True
autodetect = +cl -cpu
backend = http: 123:45@67.89.0123:8332/ # URL format is exactly as it was in Phoenix 1
backup = http:  456:78@90.12.0123:8332/

is that correctly done?



there is no backup pool support (atm)

jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
February 08, 2012, 02:33:32 AM
 #59

Above config generates "Detected [cl:0:0]: [Tahiti 0] using opencl (rating 2)", which I don't understand. Shouldn't it simply use the kernel specified and tell that autodetect had been overridden by own settings.

That's because you still have autodetect = +cl in the config file. Any devices with specific settings defined in the config file will use those instead of autodetect. The autodetect messages are currently displayed even if the settings are overridden by the config file. This will be clarified in a future release by either hiding the autodetect message or changing the message to indicate that the user-defined settings were used.

Another thing I don't understand is, why getDevice() and autodetect() reside in kernels\opencl\__init__.py I dislike the idea, that these functions are derived from there, because opencl is simply another kernel folder. I think they should be placed somewhere else (have no good idea currently, but perhaps in PhoenixCore.py).

The reason we have the device detection code at the kernel level is so that it can support any type of device. For example, the current FPGA miners don't have a standard API, which makes including this functionality in the Phoenix core a bad idea. We would have to add support for new devices into the Phoenix core. By doing these functions at the kernel level, it allows other developers to support new hardware with no changes to Phoenix itself.

The supplied phatk2 version uses stuff in opencl\__init__.py, too via "opencl = sys.modules['opencl']", which seems sort of not ideal. I think every kernel should specify his own options and stuff, even if they are the same. Your idea was perhaps to edit only one place, if you add new changes, but for addon kernels like diakgcn I really have to specify my own options, which I would promote as a rule for all supplied or addon kernels to be better structured and to be independend of the opencl kernel folder. What do you think?

Using functions from opencl for other kernels isn't required. This is simply how we decided to implement the supplied version of phatk2. Kernels DO NOT need to be implemented in this way.

Another small change I would suggest for analyzeDevice() is your CPU detection code, which could be replaced with:
Code:
# Check if the device is a CPU
if device.get_info(cl.device_info.TYPE) == cl.device_type.CPU:
return (1, {'name': name, 'aggression': 0}, [devid, 'cpu:0'])

Thanks for this code, I will modify opencl/phatk2 to use this method of detecting CPUs.

Phoenix Miner developer

Donations appreciated at:
1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
baby_ghost
Member
**
Offline Offline

Activity: 89
Merit: 10


View Profile
February 08, 2012, 05:16:33 AM
 #60

Here is my phoenix.cfg, may I wrong on other sections? Miner speed lower than Phoenix 1.7.5
Code:
[general]
verbose = True
autodetect = +cl -cpu # The rightmost parameter takes precedence. This enables all OpenCL devices, except those that are CPUs.
backend = http://vinhpk.06:thunganhi@api.bitcoin.cz:8332/ # URL format is exactly as it was in Phoenix 1

[web]
password = rpc_password # Set an RPC password to keep people from messing with your miners.

# If you want to configure miners yourself, edit and uncomment this section:
#[cl:0:0] # Or whatever ID you want to configure.
#autoconfigure = True # Do you still want autoconfiguration?
#disabled = False # Do you want to disable the miner?
#bfi_int = True    # Any other kernel options...
#vectors = False # can go into this section.
phatk2 = True
VECTORS = True
BFI_INT = True
WORKSIZE = 256
AGGRESSION = 11
FASTLOOPS = false
Pages: « 1 2 [3] 4 5 6 7 8 9 »  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!