further improved phatk_dia kernel for Phoenix + SDK 2.6

dishwara

Legendary

Offline

Activity: 1855
Merit: 1016

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

November 04, 2011, 03:11:47 PM

#321

Sorry for not donating.
I tried & used your software when I was mining.
2 btc sent to the address in signature. 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM
Something is better than nothing.
Thank you for your valuable kernel.

gat3way

Sr. Member

Offline

Activity: 256
Merit: 250

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

November 05, 2011, 08:27:23 PM

#322

Quote from: Diapolo on November 04, 2011, 02:12:26 PM

I did, liked getting (positive) feedback and to be an interesting part of Bitcoin for a short period of time.
Dia

Still coding OpenCL stuff, Diapolo?

Diapolo (OP)

Hero Member

Offline

Activity: 769
Merit: 500

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

November 06, 2011, 10:22:05 AM

#323

Quote from: gat3way on November 05, 2011, 08:27:23 PM

Quote from: Diapolo on November 04, 2011, 02:12:26 PM

I did, liked getting (positive) feedback and to be an interesting part of Bitcoin for a short period of time.
Dia

Still coding OpenCL stuff, Diapolo?

I had no time to make any further progress. From time to time I visit AMD's OpenCL forum to stay a little up to date, but I'm currently not coding. The last thing I tried was to implement 3-component vectors in the kernel, but AMD's drivers still seem buggy there.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo

Diapolo (OP)

Hero Member

Offline

Activity: 769
Merit: 500

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

November 06, 2011, 10:22:30 AM

#324

Quote from: dishwara on November 04, 2011, 03:11:47 PM

Sorry for not donating.
I tried & used your software when I was mining.
2 btc sent to the address in signature. 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM
Something is better than nothing.
Thank you for your valuable kernel.

I received your donation, a warm thank you.

Dia


d3m0n1q_733rz

Sr. Member

Offline

Activity: 378
Merit: 250

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 08, 2011, 08:50:39 AM
Last edit: December 08, 2011, 09:01:41 AM by d3m0n1q_733rz

#325

Small change I could suggest just looking at some of the code. I notice that some variables use simple addition and subtraction a few times. For example:

Code:

// intermediate W calculations
#define P1(n) (rot(W[(n) - 2 - O], 15U) ^ rot(W[(n) - 2 - O], 13U) ^ (W[(n) - 2 - O] >> 10U))
#define P2(n) (rot(W[(n) - 15 - O], 25U) ^ rot(W[(n) - 15 - O], 14U) ^ (W[(n) - 15 - O] >> 3U))
#define P3(n) W[n - 7 - O]
#define P4(n) W[n - 16 - O]

// full W calculation
#define W(n) (W[n - O] = P4(n) + P3(n) + P2(n) + P1(n))

You'll notice that n - O comes up about three times in a row. Wouldn't it be better to combine n - O into its own variable and subtract from that, to reduce the number of variables that have to be read? After all, n - 7 - O and n - 16 - O share the same n - O base. If you combine n - O, it's just a simple "pull from buffer and subtract this number" problem.
I haven't tested it, but I imagine it could speed things along slightly.
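
To make the idea concrete, here is a minimal Python sketch (not the kernel's OpenCL code) of the W expansion written both ways: once with every index spelled out as n - k - O, as in the macros, and once with n - O pulled into a single base index. The function names expand_as_posted and expand_with_base are made up for illustration, and rot() is assumed to be a 32-bit rotate left, as the rotation amounts in the macros suggest.

```python
MASK = 0xFFFFFFFF  # keep every value inside 32 bits


def rot(x, n):
    """Rotate a 32-bit word left by n bits (assumed meaning of the kernel's rot())."""
    return ((x << n) | (x >> (32 - n))) & MASK


def expand_as_posted(words, O=0, total=64):
    """W expansion with each index written out as n - k - O, like the macros."""
    W = list(words) + [0] * (total - 16)
    for n in range(16 + O, total + O):
        p1 = rot(W[n - 2 - O], 15) ^ rot(W[n - 2 - O], 13) ^ (W[n - 2 - O] >> 10)
        p2 = rot(W[n - 15 - O], 25) ^ rot(W[n - 15 - O], 14) ^ (W[n - 15 - O] >> 3)
        W[n - O] = (W[n - 16 - O] + W[n - 7 - O] + p2 + p1) & MASK
    return W


def expand_with_base(words, O=0, total=64):
    """Same expansion, but n - O is computed once per step (the suggested change)."""
    W = list(words) + [0] * (total - 16)
    for n in range(16 + O, total + O):
        base = n - O          # combined index, computed a single time
        a = W[base - 2]       # each word is also read only once
        b = W[base - 15]
        p1 = rot(a, 15) ^ rot(a, 13) ^ (a >> 10)
        p2 = rot(b, 25) ^ rot(b, 14) ^ (b >> 3)
        W[base] = (W[base - 16] + W[base - 7] + p2 + p1) & MASK
    return W


if __name__ == "__main__":
    # First block of the padded message "abc": 0x61626380, zeros, then the bit length.
    block = [0x61626380] + [0] * 14 + [0x18]
    same = expand_as_posted(block, O=3) == expand_with_base(block, O=3)
    print("identical schedules:", same)
```

Both versions produce the same schedule, so the change is purely a rewrite of the index arithmetic. Assuming n and O are compile-time constants wherever the macros are used, whether it helps depends only on whether the compiler already folds n - k - O into a fixed index by itself.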

Funroll_Loops, the theoretically quicker breakfast cereal!
Check out http://www.facebook.com/JupiterICT for all of your computing needs. If you need it, we can get it. We have solutions for your computing conundrums. BTC accepted! 12HWUSguWXRCQKfkPeJygVR1ex5wbg3hAq

kano

Legendary

Offline

Activity: 4494
Merit: 1808

Linux since 1997 RedHat 4

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 08, 2011, 11:27:41 AM

#326

Quote from: d3m0n1q_733rz on December 08, 2011, 08:50:39 AM

Small change I could suggest just looking at some of the code. I notice that some variables use simple addition and subtraction a few times. For example:

Code:

// intermediate W calculations
#define P1(n) (rot(W[(n) - 2 - O], 15U) ^ rot(W[(n) - 2 - O], 13U) ^ (W[(n) - 2 - O] >> 10U))
#define P2(n) (rot(W[(n) - 15 - O], 25U) ^ rot(W[(n) - 15 - O], 14U) ^ (W[(n) - 15 - O] >> 3U))
#define P3(n) W[n - 7 - O]
#define P4(n) W[n - 16 - O]

// full W calculation
#define W(n) (W[n - O] = P4(n) + P3(n) + P2(n) + P1(n))

You'll notice that n - O comes up about three times in a row. Wouldn't it be better to combine n - O into its own variable and subtract from that, to reduce the number of variables that have to be read? After all, n - 7 - O and n - 16 - O share the same n - O base. If you combine n - O, it's just a simple "pull from buffer and subtract this number" problem.
I haven't tested it, but I imagine it could speed things along slightly.

Just looking at that from a standard compiler point of view
(I have no idea how good or bad or literal the OpenCL compiler is)
The compiler would probably notice that anyway if the OpenCL compiler was able to do basic optimisations.

Actually, Diapolo, do you know if it has an optimiser in it? (and how good it is?)
i.e. would that change suggested make a difference, or would the compiler work it out itself?
... just curious

Pool: https://kano.is - low 0.5% fee PPLNS 3 Days - Most reliable Solo with ONLY 0.5% fee Bitcointalk thread: Forum
Discord support invite at https://kano.is/ Majority developer of the ckpool code - k for kano
The ONLY active original developer of cgminer. Original master git: https://github.com/kanoi/cgminer

deepceleron

Legendary

Offline

Activity: 1512
Merit: 1028

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 08, 2011, 12:15:20 PM

#327

Yes, in fact there are some tweaks done in the code now to make the OpenCL compiler produce more optimized code than it normally does.

Diapolo (OP)

Hero Member

Offline

Activity: 769
Merit: 500

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 08, 2011, 06:25:19 PM

#328

Quote from: kano on December 08, 2011, 11:27:41 AM

Quote from: d3m0n1q_733rz on December 08, 2011, 08:50:39 AM

Small change I could suggest just looking at some of the code. I notice that some variables use simple addition and subtraction a few times. For example:

Code:

// intermediate W calculations
#define P1(n) (rot(W[(n) - 2 - O], 15U) ^ rot(W[(n) - 2 - O], 13U) ^ (W[(n) - 2 - O] >> 10U))
#define P2(n) (rot(W[(n) - 15 - O], 25U) ^ rot(W[(n) - 15 - O], 14U) ^ (W[(n) - 15 - O] >> 3U))
#define P3(n) W[n - 7 - O]
#define P4(n) W[n - 16 - O]

// full W calculation
#define W(n) (W[n - O] = P4(n) + P3(n) + P2(n) + P1(n))

You'll notice that n - O comes up about three times in a row. Wouldn't it be better to combine n - O into its own variable and subtract from that, to reduce the number of variables that have to be read? After all, n - 7 - O and n - 16 - O share the same n - O base. If you combine n - O, it's just a simple "pull from buffer and subtract this number" problem.
I haven't tested it, but I imagine it could speed things along slightly.

Just looking at that from a standard compiler point of view
(I have no idea how good or bad or literal the OpenCL compiler is)
The compiler would probably notice that anyway if the OpenCL compiler was able to do basic optimisations.

Actually, Diapolo, do you know if it has an optimiser in it? (and how good it is?)
i.e. would that change suggested make a difference, or would the compiler work it out itself?
... just curious

It's not a beneficial change, because the compiler optimizes this out anyway, and the current form keeps the code a bit more readable.
I'm pretty sure the easy optimizations are all done, but if you guys prove me wrong, that would be nice.

Dia


d3m0n1q_733rz

Sr. Member

Offline

Activity: 378
Merit: 250

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 09, 2011, 04:54:42 AM

#329

Is there a way to disassemble the compiled version into a readable format so that I can search a bit for things to optimize? I've learned never to leave to a compiler what you can do yourself. Sometimes compilers will take you at your word.


kano

Legendary

Offline

Activity: 4494
Merit: 1808

Linux since 1997 RedHat 4

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 09, 2011, 08:14:08 AM

#330

True - however, consider this little comparison ...
A reasonably simple version of sha256 in C, compiled with -O2, is almost twice as fast as the same code compiled without it.
(Yeah, I spent a couple of weeks recently playing with sha256 in C code and seeing what I could do with it ... and early on wondering why I was getting such bad results, until I noticed I had stupidly left out -O2 ...)
Their compiler may not be as good as gcc, but hopefully it's not much worse.

Of course, yes, do try. Many will be interested in your results.


gat3way

Sr. Member

Offline

Activity: 256
Merit: 250

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 09, 2011, 01:55:33 PM

#331

The OpenCL compiler does include constant folding as an optimization pass. It is an obvious optimization, so there is no need to try this.

d3m0n1q_733rz

Sr. Member

Offline

Activity: 378
Merit: 250

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 10, 2011, 11:50:49 PM

#332

Anyone know of a decompiler I can use to look at the compiled source? It'll help me remove unnecessary variables and the like. Granted, I'm only decent with assembly at the moment, but I wouldn't mind seeing the finished product when the optimizer takes hold.


Diapolo (OP)

Hero Member

Offline

Activity: 769
Merit: 500

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 11, 2011, 09:05:42 AM

#333

Quote from: d3m0n1q_733rz on December 10, 2011, 11:50:49 PM

Anyone know of a decompiler I can use to look at the compiled source? It'll help me remove unnecessary variables and the like. Granted, I'm only decent with assembly at the moment, but I wouldn't mind seeing the finished product when the optimizer takes hold.

Take a look at AMD APP KernelAnalyzer 1.9. It creates assembly-like output for OpenCL kernels and gives register information and the like. It's in the AMD APP SDK.

Dia


gat3way

Sr. Member

Offline

Activity: 256
Merit: 250

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 19, 2011, 09:54:19 PM

#334

Is anyone interested in keeping that kernel up to date with SDK 2.6? 3-component vectors are working now, and the kernel would need to be reordered a bit again to get better ALUPacking, as the compiler backend has apparently changed. I lost my interest in bitcoin, but it would be an interesting experiment. I believe pre-2.6 speeds can easily be regained.

Diapolo (OP)

Hero Member

Offline

Activity: 769
Merit: 500

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 20, 2011, 04:07:11 PM

#335

I made a few quick performance checks on a 6950 + a 6650D (APU) and it's weird. CGMINER is quite a bit slower with phatk2, compared to Phoenix 1.7 with my latest kernel on my rig.

For the 6950, CGMINER 2.0.8 is @ 330 MH/s with:

Code:

-I 8 -d 0 -v 2 -w 128 --auto-gpu --gpu-fan 25-50 --gpu-engine 800 --gpu-memclock 1250 --temp-target 70

For the 6950, Phoenix 1.7 is @ 355 MH/s with:

Code:

-a 50 -k phatk AGGRESSION=12 DEVICE=0 FASTLOOP=false VECTORS2 WORKSIZE=128

Am I missing something? Both run with 800 / 1250 and 2-component Vectors + Worksize of 128. I'm using SDK 2.6 installed with Cat 12.1 Preview!

Dia


Dexter770221

Legendary

Offline

Activity: 1029
Merit: 1000

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 20, 2011, 09:43:22 PM

#336

For 6950 and cgminer I have identical hashrate. But memclock is set to 690. Catalyst 11.9

Under development Modular UPGRADEABLE Miner (MUM). Looking for investors.
Changing one PCB with screwdriver and you have brand new miner in hand... Plug&Play, scalable from one module to thousands.

Diapolo (OP)

Hero Member

Offline

Activity: 769
Merit: 500

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 20, 2011, 10:23:48 PM

#337

Quote from: Dexter770221 on December 20, 2011, 09:43:22 PM

For 6950 and cgminer I have identical hashrate. But memclock is set to 690. Catalyst 11.9

Could you give Phoenix 1.7 with my latest version, posted in post #1, a try and report back?

Thanks,
Dia

Btw.: Is anyone able to help me get 3-component vectors to work? The kernel should be valid, but in __init__.py line 50

Code:

self.size = (nonceRange.size / rateDivisor) / self.iterations

it seems that

Code:

nonceRange.size / rateDivisor

(rateDivisor == 3 if VECTORS3 is used as kernel argument instead of VECTORS2) generates a problem, because nonceRange.size is a multiple of 256, which is generally not divisible by 3.
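
The arithmetic behind that problem can be sketched in a few lines of Python. This is only an illustration of the division, not Phoenix code: split_nonce_range and its parameter names are hypothetical, and it assumes plain integer inputs with the divisions rounding down (which is what / does for integers in Python 2).

```python
def split_nonce_range(size, rate_divisor, iterations=1):
    """Return (work items per iteration, nonces left uncovered).

    size         -- number of nonces in the range (a multiple of 256 here)
    rate_divisor -- nonces handled per work item (2 for VECTORS2, 3 for VECTORS3)
    iterations   -- number of kernel launches the range is spread over
    """
    lanes = size // rate_divisor            # rounds down when size % rate_divisor != 0
    per_iteration = lanes // iterations     # rounds down again
    covered = per_iteration * iterations * rate_divisor
    return per_iteration, size - covered


if __name__ == "__main__":
    size = 256 * 1024
    print(split_nonce_range(size, 2))  # 2-component vectors
    print(split_nonce_range(size, 3))  # 3-component vectors
```

With a divisor of 2 the range splits evenly. With a divisor of 3 the rounded-down result leaves some nonces at the end of the range that no work item touches, so any fix would need to either hand out ranges that are multiples of 3 or account for the remainder separately.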


ssateneth

Legendary

Offline

Activity: 1344
Merit: 1004

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 27, 2011, 03:21:39 AM

#338

So what's the latest kernel? 8-27? Or is there a secret newer version that I'm not seeing? According to the main page, there is an unreleased kernel that's faster than 8-27, which is also called current. Where can I get the current kernel?

I am a long time trusted user: Bitcointalk forum trust ratings, Bitcoin-OTC Ratings, eBay Feedback, and Localbitcoins public profile.

Diapolo (OP)

Hero Member

Offline

Activity: 769
Merit: 500

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 27, 2011, 09:37:44 AM

#339

Quote from: ssateneth on December 27, 2011, 03:21:39 AM

So what's the latest kernel? 8-27? Or is there a secret newer version that I'm not seeing? According to the main page, there is an unreleased kernel that's faster than 8-27, which is also called current. Where can I get the current kernel?

It's not released, because I had no time over Christmas ... I guess I can put it up later today or tomorrow.

Dia


naz86

Member

Offline

Activity: 111
Merit: 10

Re: further improved phatk OpenCL Kernel (> 4% increase) for Phoenix - 2011-08-27

December 27, 2011, 10:52:27 AM

#340

Hi Diapolo,

Do you think we can still see improvements as big as those in the past?