Bitcoin Forum
January 19, 2025, 01:42:50 AM *
News: Community Awards voting is open
 
   Home   Help Search Login Register More  
Pages: « 1 ... 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 [56] 57 »
  Print  
Author Topic: Phoenix - Efficient, fast, modular miner  (Read 760900 times)
d3m0n1q_733rz
Sr. Member
****
Offline Offline

Activity: 378
Merit: 250



View Profile WWW
February 03, 2012, 09:36:55 AM
 #1101

I want to use this miner.

How do I install it on an Ubunut 11.10 32bit system with one 5850?
I could not found any installation advice, readme or howto.

Thank you ...

In general this is what you will need to do:

1. Install drivers + APP SDK
 There are many guides for this around, so I won't go into the details here. In general SDK versions 2.1, 2.4, 2.5 are good, and 2.2, 2.3, 2.6 are bad.

2. Install prerequisites:
 sudo apt-get install python-twisted python-numpy python-pyopencl

3. Download Phoenix
 You can do this one of 2 ways:
 A. Clone the Git repo using:
     git clone git@github.com:jedi95/Phoenix-Miner.git /home/username/phoenix
     Obviously you will want to modify the directory to suit your needs.
 B. Download
     https://github.com/jedi95/Phoenix-Miner/tarball/master
     Extract to the directory of your choice

4. Start Phoenix:
    ./phoenix.py -u http://username:password@pool.com:8332 -k phatk2 AGGRESSION=6 WORKSIZE=128 VECTORS DEVICE=0
    You will need modify the URL to connect to the pool of your choice. The above settings should be optimal for a 5850 running at normal memory clocks. (substitute WORKSIZE=256 if underclocking the memory)
On that last part, I recommend using VECTORS4 and WORKSIZE=64.
Out of curiosity, have you already unlocked the extra shaders on your card by flashing it to a 5870?  Using VECTORS4 will allow for the full 128-bit transfers which will increase throughput to it's theoretical limit of 122-128 Gb/s.  Granted that limit is easier to reach using floating points, but let's not get into that.
Play around with it and find out what works best.  But definitely unlock those shaders!

Funroll_Loops, the theoretically quicker breakfast cereal!
Check out http://www.facebook.com/JupiterICT for all of your computing needs.  If you need it, we can get it.  We have solutions for your computing conundrums.  BTC accepted!  12HWUSguWXRCQKfkPeJygVR1ex5wbg3hAq
FreeBit
Member
**
Offline Offline

Activity: 106
Merit: 10


View Profile
February 03, 2012, 02:18:08 PM
 #1102

In general this is what you will need to do:

1. Install drivers + APP SDK
 There are many guides for this around, so I won't go into the details here. In general SDK versions 2.1, 2.4, 2.5 are good, and 2.2, 2.3, 2.6 are bad.

2. Install prerequisites:
 sudo apt-get install python-twisted python-numpy python-pyopencl

....



Thank you. I did these steps. but ...

Quote
./phoenix.py -u http://xxx:xxx@xxx.com:8332 -k phatk2 AGGRESSION=6 WORKSIZE=128 VECTORS DEVICE=0

Quote
FATAL: Error inserting nvidia_current (/lib/modules/3.0.0-15-generic/updates/dkms/nvidia_current.ko): No such device
No device specified or device not found, use DEVICE=ID to specify one of the following

    
  • AMD Sempron(tm) 140 Processor
[0 Khash/sec] [0 Accepted] [0 Rejected]bb@hayeh:~/phoenix$


Quote
FATAL: Error inserting nvidia_current (/lib/modules/3.0.0-15-generic/updates/dkms/nvidia_current.ko): No such device
[03/02/2012 15:19:44] FATAL kernel error: Failed to apply BFI_INT patch to kernel! Is BFI_INT supported on this hardware?


I am reading about this, but I don't know and don't understand, what AMD APP, catalyst, aticonfig, blablabla are and what they are doing ...
d3m0n1q_733rz
Sr. Member
****
Offline Offline

Activity: 378
Merit: 250



View Profile WWW
February 03, 2012, 02:37:04 PM
Last edit: February 03, 2012, 02:50:27 PM by d3m0n1q_733rz
 #1103

Hey, small little thing I noticed about my mod to add VECTORS8 to the phatk2 kernel.  I just tried it on my CPU that would normally get around 775 KHash/sec and it shot up to around 4.06 MHash/sec.  There might be something here to modding CPU mining for underprivileged miners like myself.  I would be really grateful if someone could toss some nonce calculations into my mod for me.   Grin  Please?
I wanted to add that I'm aware that this is not a CPU miner.  But if the CPU handles 8 vectors (256-bit) faster than 4 or even 2 (64-bit), we may have found a good starting point for a new approach.

Funroll_Loops, the theoretically quicker breakfast cereal!
Check out http://www.facebook.com/JupiterICT for all of your computing needs.  If you need it, we can get it.  We have solutions for your computing conundrums.  BTC accepted!  12HWUSguWXRCQKfkPeJygVR1ex5wbg3hAq
deepceleron
Legendary
*
Offline Offline

Activity: 1512
Merit: 1036



View Profile WWW
February 03, 2012, 04:15:45 PM
 #1104

Out of curiosity, have you already unlocked the extra shaders on your card by flashing it to a 5870?
You are thinking of the 6xxx series. The 5850 is its own die and product that is architecturally different than 5830 and 5870 (which share a die and reference PCB). The 5770 is a similar VLIW5 with 800 shaders. None can be "unlocked".

Has anyone fried a 5770 using AGGRESSION=7?

The skinny: I was using poclbm, SDK2.1, and 5770 (stock clocks) when I thought I'd try out phoenix.  I saw the 'Recommended' settings for High-End cards and thought they were merely guidelines.  It would run for 5 minutes then it froze twice then it wouldn't even POST.  Bummer, because it had a twin.

It is likely that there is(was) some problem with that card, such as the heatsink not being mounted correctly and the GPU die overheating within seconds (or you had it ridiculously overclocked or overvolted). My 5770 runs fine at aggression 12 overclocked to 980MHz, cranking out 225+ MHash/s at 75% fan and 75C in a case.
pogi23
Member
**
Offline Offline

Activity: 266
Merit: 10


View Profile
February 03, 2012, 05:32:42 PM
 #1105

Has anyone fried a 5770 using AGGRESSION=7?

The skinny: I was using poclbm, SDK2.1, and 5770 (stock clocks) when I thought I'd try out phoenix.  I saw the 'Recommended' settings for High-End cards and thought they were merely guidelines.  It would run for 5 minutes then it froze twice then it wouldn't even POST.  Bummer, because it had a twin.

It is likely that there is(was) some problem with that card, such as the heatsink not being mounted correctly and the GPU die overheating within seconds (or you had it ridiculously overclocked or overvolted). My 5770 runs fine at aggression 12 overclocked to 980MHz, cranking out 225+ MHash/s at 75% fan and 75C in a case.
[/quote]

I think it must have been really bad luck as I was running it minutes before switching to phoenix without making any changes to the card.  Or it may have been on the way out but I couldn't see why since I kept it at stock 850 at a cool 59C.  Oh well, thanks for listening--I'm just grieving over the loss of a loved one.
Diapolo
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
February 03, 2012, 06:19:15 PM
Last edit: February 03, 2012, 11:23:54 PM by Diapolo
 #1106

Hey, small little thing I noticed about my mod to add VECTORS8 to the phatk2 kernel.  I just tried it on my CPU that would normally get around 775 KHash/sec and it shot up to around 4.06 MHash/sec.  There might be something here to modding CPU mining for underprivileged miners like myself.  I would be really grateful if someone could toss some nonce calculations into my mod for me.   Grin  Please?
I wanted to add that I'm aware that this is not a CPU miner.  But if the CPU handles 8 vectors (256-bit) faster than 4 or even 2 (64-bit), we may have found a good starting point for a new approach.

Now that's an interesting observation, hm perhaps I should add uint8 support, too ... just for the sake of it Cheesy.

Edit: Check this out!

AMD A8-3850 (CPU device) - default settings (AGGRESSION=5 WORKSIZE=1024) - DiaKGCN kernel with VECTORS8 support added:
no vectors: 2,6 MH/s
VECTORS2: 1,1 MH/s
VECTORS4: 7,4 MH/s
VECTORS8: 2,9 MH/s

I'll let it run for a while to check if valid nonces are found.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
d3m0n1q_733rz
Sr. Member
****
Offline Offline

Activity: 378
Merit: 250



View Profile WWW
February 04, 2012, 07:57:46 AM
 #1107

Hey, small little thing I noticed about my mod to add VECTORS8 to the phatk2 kernel.  I just tried it on my CPU that would normally get around 775 KHash/sec and it shot up to around 4.06 MHash/sec.  There might be something here to modding CPU mining for underprivileged miners like myself.  I would be really grateful if someone could toss some nonce calculations into my mod for me.   Grin  Please?
I wanted to add that I'm aware that this is not a CPU miner.  But if the CPU handles 8 vectors (256-bit) faster than 4 or even 2 (64-bit), we may have found a good starting point for a new approach.

Now that's an interesting observation, hm perhaps I should add uint8 support, too ... just for the sake of it Cheesy.

Edit: Check this out!

AMD A8-3850 (CPU device) - default settings (AGGRESSION=5 WORKSIZE=1024) - DiaKGCN kernel with VECTORS8 support added:
no vectors: 2,6 MH/s
VECTORS2: 1,1 MH/s
VECTORS4: 7,4 MH/s
VECTORS8: 2,9 MH/s

I'll let it run for a while to check if valid nonces are found.

Dia
I probably should have mentioned I'm using an Core2 Quad.  I'm surprised at what you have for the worksize.  I really don't know what good some of the CPU information is going to do, but I'm sort of hoping it'll put a little more oomph into CPU mining if it can be incorporated into CPU miners.  Right now, my CPU is mining faster than my GPU!  I'm seeing 15.6 with Ufasoft and only about 14.96 with phatk2.

Funroll_Loops, the theoretically quicker breakfast cereal!
Check out http://www.facebook.com/JupiterICT for all of your computing needs.  If you need it, we can get it.  We have solutions for your computing conundrums.  BTC accepted!  12HWUSguWXRCQKfkPeJygVR1ex5wbg3hAq
dishwara
Legendary
*
Offline Offline

Activity: 1855
Merit: 1016



View Profile
February 04, 2012, 10:32:10 AM
 #1108

phatk2 gives lesser than phatk for some unknown reason.

I get 380 with phatk2 & 414 with phatk on 5870 with 945/300
d3m0n1q_733rz
Sr. Member
****
Offline Offline

Activity: 378
Merit: 250



View Profile WWW
February 04, 2012, 12:41:28 PM
 #1109

phatk2 gives lesser than phatk for some unknown reason.

I get 380 with phatk2 & 414 with phatk on 5870 with 945/300
What are your settings?  I recommend using VECTORS and WORKSIZE=128 for that card at that memory speed.  If you increase the memory speed, you should get better results with VECTORS4 and WORKSIZE=64...in theory.

Funroll_Loops, the theoretically quicker breakfast cereal!
Check out http://www.facebook.com/JupiterICT for all of your computing needs.  If you need it, we can get it.  We have solutions for your computing conundrums.  BTC accepted!  12HWUSguWXRCQKfkPeJygVR1ex5wbg3hAq
dishwara
Legendary
*
Offline Offline

Activity: 1855
Merit: 1016



View Profile
February 04, 2012, 03:31:42 PM
Last edit: February 04, 2012, 03:58:47 PM by dishwara
 #1110

I don't know accurate values as i am using AOCLBF gui for phoenix.

My settings are vectors, HD5870, worksize 256, aggerrsion=12 BFI_INT.
Actually if i increase memory speed above 400 my system crashes.
TurdHurdur
Full Member
***
Offline Offline

Activity: 216
Merit: 100


View Profile
February 04, 2012, 03:53:45 PM
 #1111

I don't know accurate values as i am using AOCLBF gui for phoenix.

My settings are vectors, HD5870, worksize 256, aggerrsion=12 BFI_INT.
Actually if i increase memory speed above 400 my system crashes.

It'd be AGGRESSION=12 not "aggerrsion=12".
dishwara
Legendary
*
Offline Offline

Activity: 1855
Merit: 1016



View Profile
February 04, 2012, 03:59:31 PM
 #1112

ya, i have to tell correctly even in GUI AOCLBF
greatwolf
Full Member
***
Offline Offline

Activity: 230
Merit: 100


View Profile
February 05, 2012, 07:11:42 AM
 #1113

I'm running 1.7.2.  What does the following mean regarding a rejected share?
Code:
TypeError in RPC sendResult callback

I'm also getting this rejected share error with phoenix 1.7.4. What's the cause for this? Any possible fix or workarounds?
d3m0n1q_733rz
Sr. Member
****
Offline Offline

Activity: 378
Merit: 250



View Profile WWW
February 05, 2012, 08:39:43 AM
 #1114

Hey, small little thing I noticed about my mod to add VECTORS8 to the phatk2 kernel.  I just tried it on my CPU that would normally get around 775 KHash/sec and it shot up to around 4.06 MHash/sec.  There might be something here to modding CPU mining for underprivileged miners like myself.  I would be really grateful if someone could toss some nonce calculations into my mod for me.   Grin  Please?
I wanted to add that I'm aware that this is not a CPU miner.  But if the CPU handles 8 vectors (256-bit) faster than 4 or even 2 (64-bit), we may have found a good starting point for a new approach.

Now that's an interesting observation, hm perhaps I should add uint8 support, too ... just for the sake of it Cheesy.

Edit: Check this out!

AMD A8-3850 (CPU device) - default settings (AGGRESSION=5 WORKSIZE=1024) - DiaKGCN kernel with VECTORS8 support added:
no vectors: 2,6 MH/s
VECTORS2: 1,1 MH/s
VECTORS4: 7,4 MH/s
VECTORS8: 2,9 MH/s

I'll let it run for a while to check if valid nonces are found.

Dia
Once AVX2 comes out, we'll be seeing VECTORS8 higher with CPUs.  Anyhow, I've attempted to use Dia's GOFFSET=false code with phatk2, but I can't seem to match the two up.  Combining with VECTORS8 (once I can get the nonces figured out) and a worksize of 64 will probably end up with the fastest code for VLIW5 processors sporting 256-byte capability.  At least this held true for Dia's new kernel.  I enjoy trying to combine the best of all kernels into one to see what the result is, but translation of variables is a pain sometimes.
If anyone wants to take a stab at it, the results could be amazing.

Funroll_Loops, the theoretically quicker breakfast cereal!
Check out http://www.facebook.com/JupiterICT for all of your computing needs.  If you need it, we can get it.  We have solutions for your computing conundrums.  BTC accepted!  12HWUSguWXRCQKfkPeJygVR1ex5wbg3hAq
jedi95 (OP)
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
February 06, 2012, 04:32:38 AM
 #1115

Phoenix 2 beta has been released:
https://bitcointalk.org/index.php?topic=62765.0

Phoenix Miner developer

Donations appreciated at:
1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
d3m0n1q_733rz
Sr. Member
****
Offline Offline

Activity: 378
Merit: 250



View Profile WWW
February 06, 2012, 01:22:28 PM
 #1116

Last phatk2 kernel update for those with GCN (HD79xx) cards.  This is a mod that I've tossed together that allows the use of 8 out of the 16 available vectors in these cards' SIMD.  I don't know how well it'll work for you, but I need someone with a 79xx series card to test it out for me.  I'll port it over to Phoenix 2 later, but this is what I have now.  Play around with the settings if you would to "dial in" the right combination.  If anyone wants to donate bitcoin toward a GCN card so I can test the kernel mods myself as I go, I would appreciate it.  I'm trying to expand the miners to the full capabilities of the cards.  So, this means that I might be attempting a new kernel as well though I can't promise anything that big.  But I have some ideas on what I want to achieve and how.

Code:
// This file is taken and modified from the public-domain poclbm project, and
// I have therefore decided to keep it public-domain.
#ifdef VECTORS8
typedef uint8 u;
#elif defined VECTORS4
typedef uint4 u;
#elif defined VECTORS
typedef uint2 u;
#else
typedef uint u;
#endif

__constant uint K[64] = {
0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

__constant uint ConstW[128] = {
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x80000000U, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000280U,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,

0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x80000000U, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000100U,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,
0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000
};

__constant uint H[8] = {
0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
};


#ifdef BITALIGN
#pragma OPENCL EXTENSION cl_amd_media_ops : enable
#define rot(x, y) amd_bitalign(x, x, (uint)(32 - y))
#else
#define rot(x, y) rotate(x, (uint)y)
#endif

// This part is not from the stock poclbm kernel. It's part of an optimization
// added in the Phoenix Miner.

// Some AMD devices have the BFI_INT opcode, which behaves exactly like the
// SHA-256 Ch function, but provides it in exactly one instruction. If
// detected, use it for Ch. Otherwise, construct Ch out of simpler logical
// primitives.

#ifdef BFI_INT
// Well, slight problem... It turns out BFI_INT isn't actually exposed to
// OpenCL (or CAL IL for that matter) in any way. However, there is
// a similar instruction, BYTE_ALIGN_INT, which is exposed to OpenCL via
// amd_bytealign, takes the same inputs, and provides the same output.
// We can use that as a placeholder for BFI_INT and have the application
// patch it after compilation.

// This is the BFI_INT function
#define Ch(x, y, z) amd_bytealign(x,y,z)
// Ma can also be implemented in terms of BFI_INT...
#define Ma(z, x, y) amd_bytealign(z^x,y,x)
#else
#define Ch(x, y, z) bitselect(z,y,x)
#define Ma(x, y, z) bitselect(x,y,(z^x))
#endif

//Various intermediate calculations for each SHA round
#define s0(n) (S0(Vals[(0 + 128 - (n)) % 8]))
#define S0(n) (rot(n, 30u)^rot(n, 19u)^rot(n,10u))

#define s1(n) (S1(Vals[(4 + 128 - (n)) % 8]))
#define S1(n) (rot(n, 26u)^rot(n, 21u)^rot(n, 7u))

#define ch(n) Ch(Vals[(4 + 128 - (n)) % 8],Vals[(5 + 128 - (n)) % 8],Vals[(6 + 128 - (n)) % 8])
#define maj(n) Ma(Vals[(1 + 128 - (n)) % 8],Vals[(2 + 128 - (n)) % 8],Vals[(0 + 128 - (n)) % 8])

//t1 calc when W is already calculated
#define t1(n) K[(n) % 64] + Vals[(7 + 128 - (n)) % 8] +  W[(n)] + s1(n) + ch(n)

//t1 calc which calculates W
#define t1W(n) K[(n) % 64] + Vals[(7 + 128 - (n)) % 8] +  W(n) + s1(n) + ch(n)

//Used for constant W Values (the compiler optimizes out zeros)
#define t1C(n) (K[(n) % 64]+ ConstW[(n)]) + Vals[(7 + 128 - (n)) % 8] + s1(n) + ch(n)

//t2 Calc
#define t2(n)  maj(n) + s0(n)

#define rotC(x,n) (x<<n | x >> (32-n))

//W calculation used for SHA round
#define W(n) (W[n] = P4(n) + P3(n) + P2(n) + P1(n))


//Partial W calculations (used for the begining where only some values are nonzero)
#define P1(n) ((rot(W[(n)-2],15u)^rot(W[(n)-2],13u)^((W[(n)-2])>>10U)))
#define P2(n) ((rot(W[(n)-15],25u)^rot(W[(n)-15],14u)^((W[(n)-15])>>3U)))
#define p1(x) ((rot(x,15u)^rot(x,13u)^((x)>>10U)))
#define p2(x) ((rot(x,25u)^rot(x,14u)^((x)>>3U)))
#define P3(n)  W[n-7]
#define P4(n)  W[n-16]

//Partial Calcs for constant W values
#define P1C(n) ((rotC(ConstW[(n)-2],15)^rotC(ConstW[(n)-2],13)^((ConstW[(n)-2])>>10U)))
#define P2C(n) ((rotC(ConstW[(n)-15],25)^rotC(ConstW[(n)-15],14)^((ConstW[(n)-15])>>3U)))
#define P3C(x)  ConstW[x-7]
#define P4C(x)  ConstW[x-16]

//SHA round with built in W calc
#define sharoundW(n) Barrier1(n);  Vals[(3 + 128 - (n)) % 8] += t1W(n); Vals[(7 + 128 - (n)) % 8] = t1W(n) + t2(n);  

//SHA round without W calc
#define sharound(n)  Barrier2(n); Vals[(3 + 128 - (n)) % 8] += t1(n); Vals[(7 + 128 - (n)) % 8] = t1(n) + t2(n);

//SHA round for constant W values
#define sharoundC(n)  Barrier2(n); Vals[(3 + 128 - (n)) % 8] += t1C(n); Vals[(7 + 128 - (n)) % 8] = t1C(n) + t2(n);

//The compiler is stupid... I put this in there only to stop the compiler from (de)optimizing the order
#define Barrier1(n) t1 = t1C((n+1))
#define Barrier2(n) t1 = t1C((n))
//#define Barrier3(n) Barrier2(n)

__kernel
//removed this to allow detection of invalid work size
//__attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))
void search( const uint state0, const uint state1, const uint state2, const uint state3,
const uint state4, const uint state5, const uint state6, const uint state7,
const uint B1, const uint C1, const uint D1,
const uint F1, const uint G1, const uint H1,
const u base,
const uint W16, const uint W17,
const uint PreVal4, const uint PreVal0,
const uint PreW31, const uint PreW32,
const uint PreW19, const uint PreW20,
__global uint * output)
{

u W[124];
u Vals[8];

//Dummy Variable to prevent compiler from reordering between rounds
u t1;

//Vals[0]=state0;
Vals[1]=B1;
Vals[2]=C1;
Vals[3]=D1;
//Vals[4]=PreVal4;
Vals[5]=F1;
Vals[6]=G1;
Vals[7]=H1;

W[16] = W16;
W[17] = W17;

#ifdef VECTORS8
// #ifdef GOFFSET
// W[3] = ((uint)get_global_id(0) << 3) + (u)(0, 1, 2, 3, 4, 5, 6, 7);
// uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
// #else
W[3] = base + (uint)(get_local_id(0)) * 8u + (uint)(get_group_id(0)) * (WORKSIZE * 8u);
uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
//Since only the 2 LSB is opposite between the nonces, we can save an instruction by flipping the 4 bits in W18 rather than the 1 bit in W3
W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U, r ^ 0x8010000U, r ^ 0xA050000U, r ^ 0xC090000U, r ^ 0xE0D0000U};

#elif defined VECTORS4
//Less dependencies to get both the local id and group id and then add them
W[3] = base + (uint)(get_local_id(0)) * 4u + (uint)(get_group_id(0)) * (WORKSIZE * 4u);
uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
//Since only the 2 LSB is opposite between the nonces, we can save an instruction by flipping the 4 bits in W18 rather than the 1 bit in W3
W[18] = PreW20 + (u){r, r ^ 0x2004000U, r ^ 0x4008000U, r ^ 0x600C000U};
#elif defined VECTORS
W[3] = base + (uint)(get_local_id(0)) * 2u + (uint)(get_group_id(0)) * (WORKSIZE * 2u);
uint r = rot(W[3].s0,25u)^rot(W[3].s0,14u)^((W[3].s0)>>3U);
W[18] = PreW20 + (u){r, r ^ 0x2004000U};
#else
W[3] = base + get_local_id(0) + get_group_id(0) * (WORKSIZE);
u r = rot(W[3],25u)^rot(W[3],14u)^((W[3])>>3U);
W[18] = PreW20 + r;
#endif

//the order of the W calcs and Rounds is like this because the compiler needs help finding how to order the instructions

Vals[4] = PreVal4 + W[3];
Vals[0] = PreVal0 + W[3];

sharoundC(4);
W[19] = PreW19 + W[3];
sharoundC(5);
W[20] = P4C(20) + P1(20);
sharoundC(6);
W[21] = P1(21);
sharoundC(7);
W[22] = P3C(22) + P1(22);
sharoundC(8);
W[23] = W[16] + P1(23);
sharoundC(9);
W[24] = W[17] + P1(24);
sharoundC(10);
W[25] = P1(25) + P3(25);
W[26] = P1(26) + P3(26);
sharoundC(11);
W[27] = P1(27) + P3(27);
W[28] = P1(28) + P3(28);
sharoundC(12);
W[29] = P1(29) + P3(29);
sharoundC(13);
W[30] = P1(30) + P2C(30) + P3(30);
W[31] = PreW31 + (P1(31) + P3(31));
sharoundC(14);
W[32] = PreW32 + (P1(32) + P3(32));
sharoundC(15);
sharound(16);
sharound(17);
sharound(18);
sharound(19);
sharound(20);
sharound(21);
sharound(22);
sharound(23);
sharound(24);
sharound(25);
sharound(26);
sharound(27);
sharound(28);
sharound(29);
sharound(30);
sharound(31);
sharound(32);
sharoundW(33);
sharoundW(34);
sharoundW(35);
sharoundW(36);
sharoundW(37);
sharoundW(38);
sharoundW(39);
sharoundW(40);
sharoundW(41);
sharoundW(42);
sharoundW(43);
sharoundW(44);
sharoundW(45);
sharoundW(46);
sharoundW(47);
sharoundW(48);
sharoundW(49);
sharoundW(50);
sharoundW(51);
sharoundW(52);
sharoundW(53);
sharoundW(54);
sharoundW(55);
sharoundW(56);
sharoundW(57);
sharoundW(58);
sharoundW(59);
sharoundW(60);
sharoundW(61);
sharoundW(62);
sharoundW(63);

W[64]=state0+Vals[0];
W[65]=state1+Vals[1];
W[66]=state2+Vals[2];
W[67]=state3+Vals[3];
W[68]=state4+Vals[4];
W[69]=state5+Vals[5];
W[70]=state6+Vals[6];
W[71]=state7+Vals[7];

//const u Temp = (0xb0edbdd0U + K[0]) +  W[64];

Vals[0]=H[0];
Vals[1]=H[1];
Vals[2]=H[2];
Vals[3] = 0xa54ff53aU + (0xb0edbdd0U + K[0]) +  W[64];
//Vals[3]=H[3];
Vals[4]=H[4];
Vals[5]=H[5];
Vals[6]=H[6];
Vals[7] = 0x08909ae5U + (0xb0edbdd0U + K[0]) +  W[64];
//Vals[7]=H[7];


#define P124(n) P2(n) + P1(n) + P4(n)

W[64 + 16] = + P2(64 + 16) + P4(64 + 16);
sharound(64 + 1);
W[64 + 17] = P1C(64 + 17) + P2(64 + 17) + P4(64 + 17);
sharound(64 + 2);
W[64 + 18] = P124(64 + 18);
sharound(64 + 3);
W[64 + 19] = P124(64 + 19);
sharound(64 + 4);
W[64 + 20] = P124(64 + 20);
sharound(64 + 5);
W[64 + 21] = P124(64 + 21);
sharound(64 + 6);
W[64 + 22] = P4(64 + 22) + P3C(64 + 22) + P2(64 + 22) + P1(64 + 22);
sharound(64 + 7);
W[64 + 23] = P4(64 + 23) + P3(64 + 23) + P2C(64 + 23) + P1(64 + 23);
sharoundC(64 + 8);
W[64 + 24] =   P1(64 + 24) + P4C(64 + 24) + P3(64 + 24);
sharoundC(64 + 9);
W[64 + 25] = P3(64 + 25) + P1(64 + 25);
sharoundC(64 + 10);
W[64 + 26] = P3(64 + 26) + P1(64 + 26);
sharoundC(64 + 11);
W[64 + 27] = P3(64 + 27) + P1(64 + 27);
sharoundC(64 + 12);
W[64 + 28] = P3(64 + 28) + P1(64 + 28);
sharoundC(64 + 13);
W[64 + 29] = P1(64 + 29) + P3(64 + 29);
W[64 + 30] = P3(64 + 30) + P2C(64 + 30) + P1(64 + 30);
sharoundC(64 + 14);
W[64 + 31] = P4C(64 + 31) + P3(64 + 31) + P2(64 + 31) + P1(64 + 31);
sharoundC(64 + 15);
sharound(64 + 16);
sharound(64 + 17);
sharound(64 + 18);
sharound(64 + 19);
sharound(64 + 20);
sharound(64 + 21);
sharound(64 + 22);
sharound(64 + 23);
sharound(64 + 24);
sharound(64 + 25);
sharound(64 + 26);
sharound(64 + 27);
sharound(64 + 28);
sharound(64 + 29);
sharound(64 + 30);
sharound(64 + 31);
sharoundW(64 + 32);
sharoundW(64 + 33);
sharoundW(64 + 34);
sharoundW(64 + 35);
sharoundW(64 + 36);
sharoundW(64 + 37);
sharoundW(64 + 38);
sharoundW(64 + 39);
sharoundW(64 + 40);
sharoundW(64 + 41);
sharoundW(64 + 42);
sharoundW(64 + 43);
sharoundW(64 + 44);
sharoundW(64 + 45);
sharoundW(64 + 46);
sharoundW(64 + 47);
sharoundW(64 + 48);
sharoundW(64 + 49);
sharoundW(64 + 50);
sharoundW(64 + 51);
sharoundW(64 + 52);
sharoundW(64 + 53);
sharoundW(64 + 54);
sharoundW(64 + 55);
sharoundW(64 + 56);
sharoundW(64 + 57);
sharoundW(64 + 58);


u g = -(K[60] + H[7]) - S1((Vals[0] + Vals[4]) + (K[59] + W(59+64)) + s1(64+59)+ ch(59+64));
u v = W[117] + W[108] + Vals[3] + Vals[7] + P2(124) + P1(124) + Ch((Vals[0] + Vals[4]) + (K[59] + W(59+64)) + s1(64+59)+ ch(59+64),Vals[1],Vals[2]);

uint nonce = 0;

#ifdef VECTORS8
if (v.s0 == g.s0)
{
nonce = W[3].s0;
}
if (v.s1 == g.s1)
{
nonce = W[3].s1;
}
if (v.s2 == g.s2)
{
nonce = W[3].s2;
}
if (v.s3 == g.s3)
{
nonce = W[3].s3;
}
if (v.s4 == g.s4)
{
nonce = W[3].s4;
}
if (v.s5 == g.s5)
{
nonce = W[3].s5;
}
if (v.s6 == g.s6)
{
nonce = W[3].s6;
}
if (v.s7 == g.s7)
{
nonce = W[3].s7;
}
#elif defined VECTORS4
if (v.s0 == g.s0)
{
nonce = W[3].s0;
}
if (v.s1 == g.s1)
{
nonce = W[3].s1;
}
if (v.s2 == g.s2)
{
nonce = W[3].s2;
}
if (v.s3 == g.s3)
{
nonce = W[3].s3;
}
#elif defined VECTORS
if (v.s0 == g.s0)
{
nonce = W[3].s0;
}
if (v.s1 == g.s1)
{
nonce = W[3].s1;
}
#else
if (v == g)
{
nonce = W[3];
}
#endif
if(nonce)
{
//Faster to shift the nonce by 2 due to 4-DWORD addressing and does not add more collisions
output[OUTPUT_SIZE] = nonce;
output[get_local_id(0)] = nonce;
}
}

Credit goes to Diapolo for all of their help on the kernel.  Some of their ideas and coding style are placed into this.  GOFFSET is a work in progress here, but totally their idea.

Also, as this has created a new page,

Phoenix 2 beta has been released:
https://bitcointalk.org/index.php?topic=62765.0

Funroll_Loops, the theoretically quicker breakfast cereal!
Check out http://www.facebook.com/JupiterICT for all of your computing needs.  If you need it, we can get it.  We have solutions for your computing conundrums.  BTC accepted!  12HWUSguWXRCQKfkPeJygVR1ex5wbg3hAq
os2sam
Legendary
*
Offline Offline

Activity: 3586
Merit: 1099


Think for yourself


View Profile
March 12, 2012, 03:23:57 PM
 #1117

I recently started playing around with solo mining.  I'm using Phoenix 1.5 and am getting the following message

"Result didn't meet full difficulty, not sending"

I get this about 2 to 4 times per block.

Is this normal?
Thanks,
Sam

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
deepceleron
Legendary
*
Offline Offline

Activity: 1512
Merit: 1036



View Profile WWW
March 12, 2012, 04:59:16 PM
 #1118

I recently started playing around with solo mining.  I'm using Phoenix 1.5 and am getting the following message

"Result didn't meet full difficulty, not sending"

I get this about 2 to 4 times per block.

Is this normal?
Thanks,
Sam
That's quite an old version, 1.7.5 is current.
That message means either that the pool you are mining for has a higher difficulty setting for shares than normal, or more likely your miner kernel is returning bad hashes (too high GPU overclock, etc).
os2sam
Legendary
*
Offline Offline

Activity: 3586
Merit: 1099


Think for yourself


View Profile
March 12, 2012, 05:16:34 PM
 #1119

I recently started playing around with solo mining.  I'm using Phoenix 1.5 and am getting the following message

"Result didn't meet full difficulty, not sending"

I get this about 2 to 4 times per block.

Is this normal?
Thanks,
Sam
That's quite an old version, 1.7.5 is current.
That message means either that the pool you are mining for has a higher difficulty setting for shares than normal, or more likely your miner kernel is returning bad hashes (too high GPU overclock, etc).

Yep, its an old version.  I believe I had tried a newer version on this system and it didn't work.  But I'll re examine that.

Since this is mining against a local Bitcoin client which has the updated blockchain it should be mining the correct difficulty.  How would I go about verifying that?

This system is being used for other purposes so the GPU is set to factory default clocks with a very low aggression.
Thanks,
Sam

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
deepceleron
Legendary
*
Offline Offline

Activity: 1512
Merit: 1036



View Profile WWW
March 12, 2012, 06:35:18 PM
 #1120

I recently started playing around with solo mining.  I'm using Phoenix 1.5 and am getting the following message

"Result didn't meet full difficulty, not sending"

I get this about 2 to 4 times per block.

Is this normal?
Thanks,
Sam
That's quite an old version, 1.7.5 is current.
That message means either that the pool you are mining for has a higher difficulty setting for shares than normal, or more likely your miner kernel is returning bad hashes (too high GPU overclock, etc).

Yep, its an old version.  I believe I had tried a newer version on this system and it didn't work.  But I'll re examine that.

Since this is mining against a local Bitcoin client which has the updated blockchain it should be mining the correct difficulty.  How would I go about verifying that?

This system is being used for other purposes so the GPU is set to factory default clocks with a very low aggression.
Thanks,
Sam
The miner kernel is designed to find difficulty 1 hashes. If you are solo mining, then only 1/1,500,000th of these will meet the full difficulty and be a block solve. It may take many months to find a 50BTC block. If you are using P2PPool, this is normal as they use a higher difficulty than 1.
Pages: « 1 ... 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 [56] 57 »
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!