Bitcoin Forum

Other => Beginners & Help => Topic started by: Diapolo on July 01, 2011, 05:59:34 PM



Title: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13
Post by: Diapolo on July 01, 2011, 05:59:34 PM
If it works, please post here and consider a small donation @ 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x :).



Important (2012-01-13): The FASTLOOP=False parameter is not needed anymore, because FASTLOOP defaults to false in this version. Update: FASTLOOP=True works now, I uploaded a fixed version!

Important: since OpenCL SDK / Runtime version 2.6 AMD updated their OpenCL compiler, so that some older kernels and optimizations in them seem to not work anymore or are not needed anymore. In order to reflect this change I had to edit the kernel performance section of this thread.

Important: since version 2011-08-27 you don't need to supply the BFI_INT switch anymore. If your HW supports it, it's enabled automatically. To disable it use BFI_INT=false.

Important: since version 2011-08-04 (pre-release) you have to use the switch VECTORS2 instead of VECTORS. I made this change to be clear what vectors are used in the kernel (2- or 4-component). To use 4-component vectors use switch VECTORS4.

Important: since version 2011-07-17 a modified version of __init__.py (for the Phoenix miner) is included in this package and has to be used! The kernel won't work with other miners unless they are modified, see kernel.cl for further info.



This is the preferred switch for Phoenix with phatk_dia in order to achieve comparable performance:
Code:
-k phatk AGGRESSION=12 VECTORS2 WORKSIZE=128


Download version 2012-01-13: http://www.mediafire.com/?xzk6b1yvb24r4dg
Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9
Download version 2011-08-27: http://www.mediafire.com/?697r8t2pdk419ji
Download version 2011-08-11: http://www.mediafire.com/?s5c7h4r91r4ad4j
Download version 2011-08-04 (pre-release): http://www.mediafire.com/?upwwud7kfyx7788
Download version 2011-07-17: http://www.mediafire.com/?4zxdd5557243has
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6
Download version 2011-07-07: http://www.mediafire.com/?o7jfp60s7xefrg4
Download version 2011-07-06: http://www.mediafire.com/?f8b8q3w5u5p0ln0
Download version 2011-07-03: http://www.mediafire.com/?xlkcc08jvp5a43v
Download version 2011-07-01: http://www.mediafire.com/?5jmt7t0e83k3eox

Kernel performance (BFI_INT / VECTORS2 / WORKSIZE=128 / SDK 2.6 / APP KernelAnalyzer 1.11 - Cal 11.12 profile):
HD5870
2011-08-20: 22 GPR / 1427 ALU OPs / 66 CF OPs
2011-08-27: 22 GPR / 1426 ALU OPs / 66 CF OPs
2011-12-21: 20 GPR / 1400 ALU OPs / 66 CF OPs
2012-01-13: 21 GPR / 1394 ALU OPs / 67 CF OPs

HD6970
2011-08-20: 21 GPR / 1687 ALU OPs / 66 CF OPs
2011-08-27: 23 GPR / 1688 ALU OPs / 68 CF OPs
2011-12-21: 21 GPR / 1687 ALU OPs / 66 CF OPs
2012-01-13: 20 GPR / 1687 ALU OPs / 66 CF OPs



Kernel performance (BFI_INT / VECTORS2 / SDK 2.5 / APP KernelAnalyzer 1.9 - Cal 11.7 profile):
HD5870
original phatk 1.X: 1393 ALU OPs
2011-07-01: 1389 ALU OPs
2011-07-03: 1385 ALU OPs
2011-07-06: 1380 ALU OPs
2011-07-07: 1380 ALU OPs
2011-07-11: 1378 ALU OPs
2011-07-17: 1376 ALU OPs
2011-08-04 (pre-release): 1368 ALU OPs
2011-08-11: 1364 ALU OPs
2011-08-27: 1363 ALU OPs (30 less compared to original phatk 1.X)
HD6970
original phatk 1.X: 1707 ALU OPs
2011-07-01: 1710 ALU OPs
2011-07-03: 1706 ALU OPs
2011-07-06: 1702 ALU OPs
2011-07-07: 1702 ALU OPs
2011-07-11: 1701 ALU OPs
2011-07-17: 1699 ALU OPs
2011-08-04 (pre-release): 1689 ALU OPs
2011-08-11: 1687 ALU OPs
2011-08-27: 1687 ALU OPs (20 less compared to original phatk 1.X)



changelog:

2012-01-13
Kernel:
- modified: Disclaimer is now the same as in original Phoenix package
- removed: all (u) typecasts in front of scalars, where vectors and scalars were used together because per OpenCL definition this is not needed
- removed: all () brackets around n in the #define parts of the kernel
- removed: S0(), which is now again merged into s0()
- removed: brackets around the commands in t1W(), t1(), t2() and W(), to allow the compiler to reorder these
- added: the ; that was missing at the end of the W() function
- added: init variable B1addK6 used in round 6 to save an add -> THX to DiabloD3
- added: a (uint) typecast in front of get_local_id() and get_group_id() calls, because return value could be 64 bits long, which is not wanted
- modified: replaced all ma() + s0() or s0() + ma() calls with t2()
- modified: round 6 now uses the new B1addK6 variable
- modified: reordered W[] calculation for rounds 32, 91 and 92
- modified: rounds 121, 122 and 123 to not compute Vals[4], Vals[5] and Vals[6], because they are not needed for final computation of Vals[7] -> THX to jhajduk
- modified: removed + H[7] from round 124 and use -0xec9fcd13 to check for valid nonces
- added: result_r124 variable to take the result of the last round 124, this saves a few ALU OPs on VLIW5 GPUs
Python Init:
- modified: replaced spaces with tabs in the source code formatting (I really dislike this part in Python ^^)
- modified: a few comments and commands were reformatted for better readability or to be better understandable
- modified: FASTLOOP parameter now defaults to False, which means you don't need to supply FASTLOOP=False anymore
- removed: OUTPUT_SIZE is not used anymore so all references to it were removed
- modified: changed REVISION to 122
- modified: moved the WORKSIZE checks below the part that determines whether vectors are used and which ones;
            this takes into account that the global worksize passed to the kernel is influenced by vector usage and vector size
            (currently the use of FASTLOOP can break this, because of the "dynamic" number of iterations)
- added: some debug info about worksize and pyOpenCL is displayed at the start
- added: B1 + K[6] is passed as new kernel parameter
- modified: made enqueue_read_buffer() / enqueue_write_buffer() blocking and removed finish() after the read, as per AMDs recommendations
            to minimize API overhead

2011-08-27:
Kernel:
- added: code path for 3-component Vectors, activated via VECTORS3 (currently not usable, because of a bug in the AMD drivers up to Cat 11.8)
- removed: BITALIGN option from the kernel, BFI_INT is now used automatically, if the HW supports it (disabled via BFI_INT=false)
- modified: non BFI_INT Ch() function, which was broken in 2011-08-11 -> THX to Vince
- modified: kernel output buffer is now an ulong array and not an uint array
- removed: OUTPUT_SIZE argument is not passed and used in the kernel anymore
- modified: WORKSIZEx4, WORKSIZEx3 and WORKSIZEx2 arguments were merged into WORKSIZExVECSIZE
- modified: removed, reordered and added some brackets and type-casting stuff in the kernel
- modified: restored command order for round 108 - 123 to free a GPR
- modified: added H[7] into round 124 calculation
- modified: changed the checking for positive nonces again to cover the H[7] change
- modified: writing of nonces to output now uses 1 write for Vec2 and max. 2 writes for Vec4, because 2x uints are now encoded into 1x ulong
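The packing of two uints into one ulong can be sketched in Python as below. This is only an illustration: the helper names are made up, and which of the two values ends up in the upper 32 bits is an assumption, because the changelog only says that 2x uint are encoded into 1x ulong.

```python
MASK32 = 0xFFFFFFFF

def pack_nonces(lo, hi):
    """Pack two 32-bit values into one 64-bit value.

    Assumed layout (not confirmed by the changelog): hi in the upper
    32 bits, lo in the lower 32 bits.
    """
    return ((hi & MASK32) << 32) | (lo & MASK32)

def unpack_nonces(packed):
    """Split a 64-bit value back into its (lo, hi) 32-bit halves."""
    return packed & MASK32, (packed >> 32) & MASK32
```

With such a layout a 2-component vector needs one 64-bit write instead of two 32-bit writes, and the host side splits each value again with unpack_nonces().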
Python Init:
- added: code for 3-component Vectors, activated via VECTORS3 (currently not usable, because of a bug in the AMD drivers up to Cat 11.8)
- removed: BITALIGN option from the Python init, BFI_INT is now used automatically, if the HW supports it (disabled via BFI_INT=false)
- added: detection of maximum supported WORKSIZE per Device, which is used if no WORKSIZE is supplied, if supplied WORKSIZE > max. supported WORKSIZE
    or if WORKSIZE is not a power of 2
- added: code to decode the ulong from the output buffer into 2x uint and process the results
- modified: comments, code formatting and line breaks for better readability
- modified: output buffer size is now the WORKSIZE -> THX to Phateus
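The WORKSIZE fallback rule from the Python init list above could look roughly like this. It is a sketch under assumptions: choose_worksize() and its parameters are hypothetical names, and device_max stands for whatever maximum work-group size the device reports.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... and False for everything else."""
    return n > 0 and (n & (n - 1)) == 0

def choose_worksize(requested, device_max):
    """Return the WORKSIZE to use.

    Falls back to the device maximum if no WORKSIZE was supplied,
    if it exceeds the device maximum, or if it is not a power of 2.
    """
    if requested is None:
        return device_max
    if requested > device_max or not is_power_of_two(requested):
        return device_max
    return requested
```

For example, choose_worksize(128, 256) keeps 128, while choose_worksize(96, 256) and choose_worksize(512, 256) both fall back to 256.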

2011-08-11:
- modified: reverted a former change to the Ma() function to save an ALU OP for 69XX cards
- added: S0() and S1() function, which is a compiler help -> THX Phateus
- modified: a few brackets and layout of all helper functions for better readability and compatibility
- added: t2() function, which is (s0(n) + ma(n)) and saves a few GPRs -> THX Phateus and myself (had this in earlier, but removed it sometime ^^)
- modified: changed layout of kernel definition for better readability
- modified: all values which for example had a 10u now have a 10U (uppercase) to be consistent in the whole kernel
- modified: modified round 94 W calculation for better performance
- modified: round 108 - 123 now consists of 2 W() blocks followed by 2 sharoundW() blocks to save a GPR
- modified: changed the checking for positive nonces again to never create an invalid share and lower ALU OP usage

2011-08-04 (pre-release):
- added: user Vince into disclaimer -> THX Vince :)
- added: kernel is now able to work with 4-component vectors (switch VECTORS4) -> THX to Phateus
- modified: to use 2-component vectors I renamed the switch VECTORS to VECTORS2
- added: __attribute__((reqd_work_group_size(WORKSIZE, 1, 1))) -> THX to Phateus
- added: constants PreW31 and PreW32, which store P2() + P4() for round 31 and 32 -> THX to Phateus
- renamed - modified: W17_2 is now PreW19, W2 is now PreW18, PreVal4addT1 is now PreVal4 (= PreVal4 + T1), state0subT1 is now PreVal0 (= PreVal4 + state0)
- modified: base is now declared as u to save the addition of uint2(0, 1) or uint4(0, 1, 2, 3) for W_3 init -> THX to Phateus
- modified: nonce calculation now uses the local Work-Item ID, the group ID and the WORKSIZE instead of only the global Work-Item ID -> THX to Phateus
- added: saved a multiplication by passing WORKSIZEx2 and WORKSIZEx4 constants to the kernel
- modified: calculation for W[18 - O] was optimized so that P2(18) is only calculated for x component (if Vectors are used), because x and y only differ
       in the LSB and afterwards Bit 14 and 25 are rotated for W[18 - O].y -> THX to Phateus
- modified: saved an addition for Vals[0] init, because of the change to PreVal0
- modified: reordered code for round 4 - 95 to optimize for less ALU OPs used -> THX Phateus and myself ^^
- modified: ordering of variables in additions for Round 124 was changed to optimize for less ALU OPs used
- modified: rewrote the part where nonces are checked, if they are positive and where they are written into output buffer
       (saves 2 global writes per work-item and saves additional ALU OPs)
- modified: changed variables W_3, P2_18_x, P2_18 and nonce into a constant
- modified: changed code formatting for rounds 4 - 124 for better readability
- removed: some comments to cleanup the code
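As an illustration of the nonce calculation from local Work-Item ID, group ID and WORKSIZE, here is a small Python model. The exact formula used in kernel.cl is not quoted in this changelog, so the expression below is an assumption. It only shows how a precomputed WORKSIZE x vector size constant (like WORKSIZEx2 / WORKSIZEx4) saves a multiplication per work-item.

```python
MASK32 = 0xFFFFFFFF

def work_item_nonces(base, group_id, local_id, worksize, vecsize):
    """Nonces handled by one work-item (hypothetical model, not kernel code).

    worksize_x_vecsize would be computed once on the host and passed
    to the kernel as a constant.
    """
    worksize_x_vecsize = worksize * vecsize
    first = (base + group_id * worksize_x_vecsize + local_id * vecsize) & MASK32
    return [(first + i) & MASK32 for i in range(vecsize)]
```

With WORKSIZE=4 and 2-component vectors, group 0 covers base .. base + 7 and group 1 covers base + 8 .. base + 15, so every nonce is tested exactly once.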

2011-07-17:
- added: offset for W[] array to reduce its size -> THX to user Vince
- modified: function t1() renamed to t1W() / function sharound() renamed to sharoundW()
- added: function t1() and sharound() which are used where the W[] addition can be left out, because W[] == 0
    (I guess the compiler already does this optimization, but it doesn't hurt) -> THX to user Vince
- modified: P1() - P4() and W() to make use of the offset
- modified: quite a few kernel parameters have new values or were added (mixed ideas from User Vince with own ones)
    C1addK5: C1addK5 = C1 + K[5]: C1addK5 = C1 + 0x59f111f1
    D1: D1 = D1 + K[4] + W[4]: D1 = D1 + 0xe9b5dba5 + 0x80000000U
    W2: W2 + W16 in P1(): W2 = P1(18) + P4(18)
    W17_2: 0x80000000U in P2() = 0x11002000 + W17 in P1(): W17_2 = P1(19) + P2(19)
    PreValaddT1: PreValaddT1 = PreVal4 + T1
    T1substate0: T1substate0 = T1 - substate0
- added: variable W_3, which stores the first value formerly held in W[3]
- added: Temp variable used to speed up calculation for rounds 4 and 5
- modified: changed round 3 so that it's more efficient (uses: Vals[0] and Vals[4])
- modified: W[0] - W[14] are now kind of hard-coded or left out, where they were 0
- modified: optimized P1(18) + P2(18) + P4(18)
- modified: optimized P1(19) + P2(19) + P4(19)
- modified: optimized round 4 + 5
- modified: rounds 6 - 14 and 73 - 78 now use new sharound() without W[] addition
- modified: offset added for all parts, where W[] is used
- modified: W_3 is used as result instead of W[3] (W[3] is still used to generate random position in output buffer) -> THX to user Vince

2011-07-11:
- modified: constant H[7] has a new value (saves an addition in round 124)
- modified: non BFI_INT Ch() function now uses OpenCL built-in bitselect
- modified: reordered W[] calculations for round 18 - 30, 87 and 94
- modified: reordered calculation for round 5
- modified: W[] calculation for round 80 - 86 is now a block before sharound() is called
- removed: K[60] from round 124 (because of new H[7] value)
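A sketch of the arithmetic behind the new H[7] value, assuming it is simply the standard SHA-256 constants H[7] and K[60] added together modulo 2^32. The sum matches the 0xec9fcd13 that the 2012-01-13 changelog uses for the nonce check. The name FOLDED_H7 is made up for this example.

```python
MASK32 = 0xFFFFFFFF

H7 = 0x5be0cd19   # standard SHA-256 initial hash value H[7]
K60 = 0x90befffa  # standard SHA-256 round constant K[60]

# Both are constants, so they can be added once instead of in every hash.
FOLDED_H7 = (H7 + K60) & MASK32

# A later version drops the addition completely and compares against the
# negated constant instead: x + FOLDED_H7 == 0 (mod 2^32) exactly when
# x == -FOLDED_H7 (mod 2^32).
NEGATED = (-FOLDED_H7) & MASK32

print(hex(FOLDED_H7), hex(NEGATED))
```

The two constant additions in round 124 become one, and the comparison against the negated value removes that last addition as well.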

2011-07-07:
- removed: some large comments in the source were removed
- modified: Ma() function is now unique in the kernel, no matter if BFI_INT is used or not -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- added: Ch() function which uses OpenCL bitselect() command (but it's not active, so you are free to try it) -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- modified: u W[128] is replaced with u W[124] because no more than 124 values are used
- modified: initialisation for Vals[0], Vals[3], Vals[4] and Vals[7] is now processed in other places to save some unneeded writes to these variables
- fixed: some hex values, which were used in vector additions are now properly type-casted, which hopefully restores AMD APP SDK 2.1 compatibility
- modified: rounds 3, 4 and 5 were modified for better performance (guess this can be tuned, if I have a working KernelAnalyzer)

2011-07-06:
- modified: H[] constants were reordered (2 were not used because of earlier mods)
- added: ulong L constant added (its value doesn't fit into an uint)
- modified: new Ma() for non BFI_INT capable cards, should be faster -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- removed: t1W()
- modified: t1() reordered function calls for better performance
- modified: W() reordered function calls for better performance
- modified: sharound() removed writing to t1, now t1() is called twice, which makes this function FASTER (OpenCL compiler optimization)
- removed: sharound2() (if needed W() + sharound() is used instead)
- removed: partround() not needed because of another solution for round 3 and 124
- removed: t1 and t1W variables
- modified: rounds 3, 19, 30, 81, 87, 94 and 124 were modified for better performance

2011-07-03:
- removed: t2(), w(n), r0(x), r1(x), R0(n) and R1(n)
- renamed - modified: R(x) to W(x) plus now uses P1, P2, P3 and P4 directly
- modified: P1(x) and P2(x) to not use R1(x - 2), R0(x - 15) but do that directly
- modified: SHA rounds 31, 32, 47 - 61, 86, 87, 114 - 119 now use sharound2() instead of W() + sharound()
- modified: reordered code for SHA rounds 66 - 94 -> saw no decrease in performance -> better readability
- modified: SHA rounds 18, 19, 20, 80, 93, 94 now use a simpler calculation because of removed zero additions
--> 1x P1(x), 2x P2(x), 4x P3(x) and 2x P4(x) were removed which should give a little MHash/sec boost
- modified: sharound() so that a double execution of t1() is avoided -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- added: "u t1W" variable, which is used in sharound2() to avoid double execution of t1W()

2011-07-01:
Code:
Vals[7] = 0xb0edbdd0 + K[0] +  W[64] + 0x08909ae5U; -> Vals[7] = 0xfc08884d + W[64];
Vals[3] = 0xa54ff53a + 0xb0edbdd0 + K[0] +  W[64]; -> Vals[3] = 0x198c7e2a2 + W[64];
- removed the
Code:
Vals[7] += H[7]
addition and replaced the final if-statements in the Kernel
- reordered some W[n] = statements to remove some unneeded additions
- replaced all additions like 64 + 5 with the corresponding integer value (guess it was in there for readability reasons, so here it got worse :D)
- removed some unneeded brackets
- re-formatted for better readability
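The two folded constants above can be checked with a few lines of Python, assuming K[0] is the standard SHA-256 round constant 0x428a2f98. Note that the Vals[3] sum needs 33 bits, so on a 32-bit uint only the lower 32 bits survive. That is probably what a compiler warning about truncation for this line refers to.

```python
MASK32 = 0xFFFFFFFF

K0 = 0x428a2f98  # standard SHA-256 round constant K[0]

# Vals[7] = 0xb0edbdd0 + K[0] + W[64] + 0x08909ae5  ->  constant part:
vals7_const = 0xb0edbdd0 + K0 + 0x08909ae5

# Vals[3] = 0xa54ff53a + 0xb0edbdd0 + K[0] + W[64]  ->  constant part:
vals3_const = 0xa54ff53a + 0xb0edbdd0 + K0

print(hex(vals7_const))           # fits into 32 bits
print(hex(vals3_const))           # 33 bits wide
print(hex(vals3_const & MASK32))  # value after truncation to uint
```

Since all uint additions wrap around modulo 2^32 anyway, the truncated constant gives the same result as the 33-bit one.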

If it works, please post here and consider a small donation @ 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x :).

Thanks,
Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Bert on July 01, 2011, 06:42:49 PM
454 Mhash/sec to 456 Mhash/sec for me on two overclocked/underclocked 5870's. The reject ratio is about 1-2% which is what it was for me before. Still looking at the diff of the two files and scratching my head. Bitcoins on the way.

EDIT: My reject ratio down to about 0.5% *scratch*, maybe it is just a streak of good luck.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: maykelmoya on July 01, 2011, 06:52:39 PM
Not working for me. phoenix 1.48, ubuntu 11.04 64bits.

$ ./phoenix.py -v -u http://user:passwd@foomining.org:8332/ -k phatk DEVICE=0 VECTORS BFI_INT AGGRESSION=12
/usr/local/lib/python2.7/dist-packages/pyopencl-2011.1beta3-py2.7-linux-x86_64.egg/pyopencl/__init__.py:163: UserWarning: Build succeeded, but resulted in non-empty logs:
Build on <pyopencl.Device 'Cypress' at 0x24b01e0> succeeded, but said:
                                                                                                                                                             
/tmp/OCLmC5ovs.cl(1): warning: ignore unrecognized OpenCL extension                 
  #pragma OPENCL __FAST_RELAXED_MATH__
                 ^

/tmp/OCLmC5ovs.cl(234): warning: integer conversion resulted in truncation
        Vals[3] = 0x198c7e2a2 + W[64];                         
                  ^

  warn("Build succeeded, but resulted in non-empty logs:\n"+message)
[01/07/2011 14:45:48] Finding inner ELF...
[01/07/2011 14:45:48] Patching inner ELF...
[01/07/2011 14:45:48] Patching instructions...
[01/07/2011 14:45:48] BFI-patched 472 instructions...
[01/07/2011 14:45:48] Patch complete, returning to kernel...
[01/07/2011 14:45:48] Applied BFI_INT patch
[01/07/2011 14:45:48] FATAL kernel error: Failed to load OpenCL kernel!


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Bert on July 01, 2011, 07:02:34 PM
I'm the same setup as you (phoenix 1.48, ubuntu 11.04 64bits), but with ati-driver-installer-11-6-x86.x86_64.run (because it allows for better overclocking). Are you maybe using ati-driver-installer-11-5-x86.x86_64.run ?


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Diapolo on July 01, 2011, 07:17:08 PM
/tmp/OCLmC5ovs.cl(1): warning: ignore unrecognized OpenCL extension                  
  #pragma OPENCL __FAST_RELAXED_MATH__
                 ^
/tmp/OCLmC5ovs.cl(234): warning: integer conversion resulted in truncation
        Vals[3] = 0x198c7e2a2 + W[64];                        
                  ^

It could be that the pragma command has the wrong syntax; on Windows no OpenCL build log is shown, so I didn't see that warning. I will re-check the OpenCL man pages. For now you can safely remove the pragma line. -> Edit: The line should have no effect, so delete it. It is only for floating-point computations.

The second line seems to need a proper typecast perhaps a (u) is sufficient.

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Bert on July 01, 2011, 07:52:07 PM
I tested on a Windows machine as well, one also with a 5870 which is overclocked a little bit @ 975/337 with a modest AGGRESSION=8 and got an increase from 412Mhash/sec -> 415Mhash/sec. But I did have the original 3% update (http://forum.bitcoin.org/index.php?topic=22965.0) install on all machines previously.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: kookiekrak on July 01, 2011, 08:14:31 PM
can you host this on like mediafire or rapidshare or something?

that shitty file host you use is redirecting me to random sites before the 80 second countdown timer finishes.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Bert on July 01, 2011, 08:48:25 PM
that shitty file host you use is redirecting me to random sites before the 80 second countdown timer finishes.
I uploaded a 7zip'ed version of Diapolo's kernel.cl file here http://www.megaupload.com/?d=H606MS0O (not sure, but probably only good for 10 downloads using a free account)

Send donations to Diapolo @ 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM (see first post) it's their file and not mine.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Diapolo on July 01, 2011, 09:55:05 PM
Here is another link to Mediafire: http://www.mediafire.com/?5jmt7t0e83k3eox

That file has the #pragma removed and the typecast added. Can someone verify, if the OpenCL compiler warnings are gone now?

Edit: Is there a way to output the OpenCL compiler log from Phoenix in Windows, too?

Thanks,
Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: being on July 01, 2011, 10:12:00 PM
450->451

windows 7 32bit, poclbm, SDK 2.1, cat 11.6

Also noticed that the startup of the miner took longer with this kernel.

Thank you. ^^


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Diapolo on July 01, 2011, 10:19:02 PM
450->451

windows 7 32bit, poclbm, SDK 2.1, cat 11.6

Also noticed that the startup of the miner took longer with this kernel.

Thank you. ^^

Try to uninstall SDK 2.1 and use that Cat 11.7 preview here: http://developer.amd.com/Downloads/110619a-121104E.zip
It has a newer OpenCL runtime :). What card are you on?

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: skillerd on July 01, 2011, 11:12:34 PM
Nice, keep 'm coming :) 377->380 on 5870

Using -k phatk VECTORS BFI_INT WORKSIZE=128 AGGRESSION=11

Any tips to get some more? Using catalyst 11.4 and opencl 2.4, windows 7 (64bit)..


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Diapolo on July 02, 2011, 07:32:22 AM
Nice, keep 'm coming :) 377->380 on 5870

Using -k phatk VECTORS BFI_INT WORKSIZE=128 AGGRESSION=11

Any tips to get some more? Using catalyst 11.4 and opencl 2.4, windows 7 (64bit)..

That's nearly the setting I use for my 5870.

-k phatk AGGRESSION=12 BFI_INT DEVICE=0 FASTLOOP=false VECTORS WORKSIZE=128

Core is at 900 MHz and Mem at 200 MHz with 405 MHash/sec.

You could try Cat 11.7 with the 2.5 APP Runtime!

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Clipse on July 02, 2011, 10:55:07 PM
good grief this patch did wonders

It added extra 22mhash per 5850, amazed.

Wonder if there is more optimizations coming ;D


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: SeriousWorm on July 03, 2011, 12:20:32 AM
msi 6870 hawk @ 1000mHz/260mHz/1.3V, phoenix 1.5, catalyst 11.6, aggression 11
311 -> 312

Thanks!


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Diapolo on July 03, 2011, 07:47:12 AM
I don't want to beg, but perhaps someone could link this thread into the Mining -> Mining Software forum so the users there will see it :)?
Remember, if you like this and it speeds up your Hash calculations consider a small donation: 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM

I'm still working on this, so stay tuned :D!

Thanks,
Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: being on July 03, 2011, 08:28:31 AM
Currently running 11.7 drivers, 2.1 SDK and poclbm.
Did some testing.
With 11.7, SDK 2.5 this kernel runs better than the poclbm phatk kernel.
With 11.7, SDK 2.1 this kernel runs a bit worse than the poclbm phatk kernel.

And with 11.7, SDK 2.1 I get the highest Mhash/s. So I'll be still using the poclbm phatk kernel.
Pretty much same goes for 11.6, which seems to perform slightly slower than 11.7, but otherwise all same.

I'm not sure how or why it seemed as if your kernel was boosting my Mhash, when I first tried it, but with my current setup, unfortunately poclbm phatk wins. :P


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: 1MLyg5WVFSMifFjkrZiyGW2nw on July 03, 2011, 09:04:15 AM
Hello,

you might want to change this

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1(n); Vals[(135 - n) % 8] = t1(n) + t2(n); }

to

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1; Vals[(135 - n) % 8] = t1 + t2(n); }

This got me a 25% performance increase!
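To illustrate the idea (this is not the kernel code, t1() and t2() are simple stand-ins here): the first macro evaluates t1(n) three times, the second one evaluates it once and reuses the stored value. A compiler may already remove the repeated evaluations on its own, so the gain can differ between SDK versions and cards.

```python
MASK32 = 0xFFFFFFFF
calls = {"t1": 0}  # counts how often the stand-in t1() is evaluated

def t1(n):
    """Stand-in for the t1(n) macro (the real one depends on Vals and W)."""
    calls["t1"] += 1
    return (0x428a2f98 + n) & MASK32

def t2(n):
    """Stand-in for the t2(n) macro."""
    return (0x6a09e667 ^ n) & MASK32

def sharound_repeated(vals, n):
    tmp = t1(n)  # stored, but not reused below
    vals[(131 - n) % 8] = (vals[(131 - n) % 8] + t1(n)) & MASK32
    vals[(135 - n) % 8] = (t1(n) + t2(n)) & MASK32

def sharound_cached(vals, n):
    tmp = t1(n)  # evaluated once and reused
    vals[(131 - n) % 8] = (vals[(131 - n) % 8] + tmp) & MASK32
    vals[(135 - n) % 8] = (tmp + t2(n)) & MASK32
```

Both functions leave the same values in vals, only the number of t1() evaluations differs.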


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Diapolo on July 03, 2011, 09:38:10 AM
Hello,

you might want to change this

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1(n); Vals[(135 - n) % 8] = t1(n) + t2(n); }

to

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1; Vals[(135 - n) % 8] = t1 + t2(n); }

This got me a 25% performance increase!

Seems good, but it brings no gain for me ... weird. Will look into this and perhaps re-use your idea, if I may :).
Can only think of a good compiler optimization...

Thanks,
Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: saykor on July 03, 2011, 09:49:43 AM
Hello,

you might want to change this

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1(n); Vals[(135 - n) % 8] = t1(n) + t2(n); }

to

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1; Vals[(135 - n) % 8] = t1 + t2(n); }

This got me a 25% performance increase!

i not have this line. what version you use? can you upload it?


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: saykor on July 03, 2011, 09:56:20 AM
Currently running 11.7 drivers, 2.1 SDK and poclbm.
Did some testing.
With 11.7, SDK 2.5 this kernel runs better than the poclbm phatk kernel.
With 11.7, SDK 2.1 this kernel runs a bit worse than the poclbm phatk kernel.

And with 11.7, SDK 2.1 I get the highest Mhash/s. So I'll be still using the poclbm phatk kernel.
Pretty much same goes for 11.6, which seems to perform slightly slower than 11.7, but otherwise all same.

I'm not sure how or why it seemed as if your kernel was boosting my Mhash, when I first tried it, but with my current setup, unfortunately poclbm phatk wins. :P

hi,
where you find SDK 2.5? i see only 2.4


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Diapolo on July 03, 2011, 10:24:20 AM
New version online!

DL here: http://www.mediafire.com/?xlkcc08jvp5a43v

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Diapolo on July 03, 2011, 10:46:21 AM
Currently running 11.7 drivers, 2.1 SDK and poclbm.
Did some testing.
With 11.7, SDK 2.5 this kernel runs better than the poclbm phatk kernel.
With 11.7, SDK 2.1 this kernel runs a bit worse than the poclbm phatk kernel.

And with 11.7, SDK 2.1 I get the highest Mhash/s. So I'll be still using the poclbm phatk kernel.
Pretty much same goes for 11.6, which seems to perform slightly slower than 11.7, but otherwise all same.

I'm not sure how or why it seemed as if your kernel was boosting my Mhash, when I first tried it, but with my current setup, unfortunately poclbm phatk wins. :P

hi,
where you find SDK 2.5? i see only 2.4

Use Cat 11.7 preview!

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: 1MLyg5WVFSMifFjkrZiyGW2nw on July 03, 2011, 10:51:17 AM
Seems good, but it brings no gain for me ... weird. Will look into this and perhaps re-use your idea, if I may :).
Can only think of a good compiler optimization...

Thanks,
Dia

Your modifications don't improve my speed either  ::)

I'm using poclbm on Linux with the SDK version 2.4, card is 4670
currently 33.8 MHash/s with my change, hope I can get it up a bit more

i not have this line. what version you use? can you upload it?

http://uploading.com/files/69c85b51/phatk.cl/
what miner are you using?


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: saykor on July 03, 2011, 10:53:00 AM
http://uploading.com/files/69c85b51/phatk.cl/
what miner are you using?

i use Phoenix 1.5
i am with windows7 with 3 cards 5830
and windows server 2008 with 1 card 5830

i not have a file phatk.cl


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: 1MLyg5WVFSMifFjkrZiyGW2nw on July 03, 2011, 10:57:26 AM
http://uploading.com/files/69c85b51/phatk.cl/
what miner are you using?

i use Phoenix 1.5
i am with windows7 with 3 cards 5830
and windows server 2008 with 1 card 5830

i not have a file phatk.cl

The file is named kernel.cl in Phoenix, it's in the "kernels\poclbm" folder.

EDIT:
this version of the kernel doesn't have the #define sharound
if you want to try this, rename your kernel.cl to something else,
then rename the downloaded file to kernel.cl and restart phoenix


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: saykor on July 03, 2011, 11:05:13 AM
yes i do exactly what you told in edit. but it not run on windows os
Phoenix just crash after connect to the server


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: saykor on July 03, 2011, 11:18:30 AM
with last Diapolo's patch here the result for my cards

win7:
1th card +1
2th card +2
3th card +5

win2008
1th card +1


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: 1MLyg5WVFSMifFjkrZiyGW2nw on July 03, 2011, 11:36:19 AM
yes i do exactly what you told in edit. but it not run on windows os
Phoenix just crash after connect to the server
with last Diapolo's patch here the result for my cards

win7:
1th card +1
2th card +2
3th card +5

win2008
1th card +1

Strange, both work for me but with Diapolo's version I get 6 MH/s less, even though it also removes the redundant t1(n).



Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: mazmorbid on July 03, 2011, 11:50:38 AM
Thanks for the code change, adds around 2% to my 5870's  :)


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Bert on July 03, 2011, 02:37:03 PM
New version online!

DL here: http://www.mediafire.com/?xlkcc08jvp5a43v

Dia
Moved my 5870 cards, which were already using the previous version, from 456Mh/s to 456.75Mh/s, sent an additional donation, thanks.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: deepceleron on July 03, 2011, 02:59:59 PM

Use Cat 11.7 preview!

Dia

The ATI 110619a-121104E file doesn't include a 11.7 video driver for OSs other than Windows 7. It does have a new AMD APP SDK, version 2.5.684.24. The package also installs drivers for ATI TV Wonder 600 USB and Hydravision (WTF?), so if you want to try out the new SDK, be sure to do a custom install and uncheck the other stuff.

That being said, the new SDK makes no difference for me. I've ran 10.11 and 11.6 drivers on 2.4 and 2.5 SDK with identical benchmarks on all in a very repeatable setup. However your patch does make a difference! The improvement on my overclocked 5830 (1070/392, WinXPsp3 stripped, Sempron 2.7ghz):

Before: 340.21 Mhash/s
After: 341.59 Mhash/s
Improvement: 0.41%

Update: Total accept/reject rate for three miners running this kernel: 11644/253 = 97.9% efficiency, no change there.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: saykor on July 03, 2011, 04:24:03 PM

Use Cat 11.7 preview!

Dia

The ATI 110619a-121104E file doesn't include a "11.7" video driver. It does have a new AMD APP SDK, version 2.5.684.24. The package also installs drivers for ATI TV Wonder 600 USB and Hydravision (WTF?), so if you want to try out the new SDK, be sure to do a custom install and uncheck the other stuff.

That being said, the new SDK makes no difference. I've run 10.11 and 11.6 drivers on 2.4 and 2.5 SDK with identical benchmarks on all in a very repeatable setup. However, your patch does make a difference! The improvement on my overclocked 5830 (1070/392, WinXPsp3 stripped, Sempron 2.7ghz):

Before: 340.21 Mhash/s
After: 341.59 Mhash/s
Improvement: 0.41%

I haven't run it long enough to see if it increases rejects or stales, but I will report back. This is looking good (but it will take a week of mining with the patch to even make up for my 30 minutes downtime...)


What are your settings? How do you get 340 with a 5830? I max out at 307 with the same cards.
Do you know where to download the "11.7" video driver?

Thanks for your help


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: h2 on July 03, 2011, 04:33:47 PM
thanks a lot! brought me from 421 to 423 (x2) for my 5970 (already had the 3% Ma patch applied)

Everybody getting "FATAL kernel error: Failed to load OpenCL kernel!": run phoenix once with the sudo prefix. Afterwards you can launch it again without sudo.

greets


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: figvam on July 03, 2011, 05:17:28 PM
340 Mh/s -> 344 Mh/sec in peak on a 5850.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Soak on July 03, 2011, 05:48:18 PM
Try to uninstall SDK 2.1 and use that Cat 11.7 preview here: http://developer.amd.com/Downloads/110619a-121104E.zip

390 with new Catalyst and old kernel.
388 with new Catalyst and new kernel.

ATI Radeon 6970 on Windows 7.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: strictlyfocused on July 03, 2011, 06:03:31 PM
Went 232 to 233 on an MSI 5770   :D

Sending a donation your way!


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: sturle on July 03, 2011, 06:22:12 PM
Hello,

you might want to change this

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1(n); Vals[(135 - n) % 8] = t1(n) + t2(n); }

to

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1; Vals[(135 - n) % 8] = t1 + t2(n); }

This got me a 25% performance increase!
Seems good, but it brings no gain for me ... weird. Will look into this and perhaps re-use your idea, if I may :).
Can only think of a good compiler optimization...
It is not obvious that this will gain anything.  On the contrary, unless the compiler optimizes it away, it makes the second and third instructions dependent on the first.  This is bad on a GPU which issues 4 or 5 instructions in parallel every clock cycle.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Apopfis on July 03, 2011, 06:37:59 PM
http://i55.tinypic.com/2q0kq69.png

Testing my new Sapphire 5850
I need a new PSU and watercooling for the card to make it stable. Currently running the cards (overnight) steady at 412 Mh/s @ 1030 MHz core and the other one at ~360 Mh/s @ 950 core, because it only runs AGGRESSION=8. The PSU is a Corsair 450W, so I cannot OC & OV both cards to the max. For the first card this mod gave an increase from 401 -> 412 while the card is running @ 1030 core.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: rcocchiararo on July 03, 2011, 06:43:07 PM
downloaded newest file.

307 > 310
278 > 280

can't complain, it was free :P


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: 1MLyg5WVFSMifFjkrZiyGW2nw on July 03, 2011, 07:46:41 PM
Hello,

you might want to change this

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1(n); Vals[(135 - n) % 8] = t1(n) + t2(n); }

to

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1; Vals[(135 - n) % 8] = t1 + t2(n); }

This got me a 25% performance increase!
Seems good, but it brings no gain for me ... weird. Will look into this and perhaps re-use your idea, if I may :).
Can only think of a good compiler optimization...
It is not obvious that this will gain anything.  On the contrary, unless the compiler optimizes it away, it makes the second and third instructions dependent on the first.  This is bad on a GPU which issues 4 or 5 instructions in parallel every clock cycle.

Well, it worked for me, might be because I only have a slow 4670. However, the same change in sharound2 decreases performance.

Another thing that seems to run a little bit faster on cards without BFI_INT:
Code:
#define Ma(x, y, z) Ch((z^x), (y), (x))
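
The identity behind this define can be checked on the host side. The following is a small Python sketch (not kernel code; the function names are made up for illustration) that compares the textbook SHA-256 majority function with the Ch-based form above on 32-bit values. Where x and z differ, the majority bit is y; where they agree, it is x, which is exactly what Ch(z^x, y, x) selects.

```python
import random

MASK = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic


def ch(x, y, z):
    # SHA-256 "choose": take the bit from y where x is 1, from z where x is 0
    return ((x & y) ^ (~x & z)) & MASK


def ma(x, y, z):
    # SHA-256 "majority" in its textbook form
    return ((x & y) ^ (x & z) ^ (y & z)) & MASK


def ma_via_ch(x, y, z):
    # same argument order as the define in the post: Ch((z^x), (y), (x))
    return ch(z ^ x, y, x)


rng = random.Random(0)
triples = [tuple(rng.getrandbits(32) for _ in range(3)) for _ in range(1000)]
all_equal = all(ma(*t) == ma_via_ch(*t) for t in triples)
```

Whether the rewritten form is actually faster depends on the card and compiler; the sketch only shows that both forms compute the same value.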


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Diapolo on July 03, 2011, 08:07:14 PM
Hello,

you might want to change this

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1(n); Vals[(135 - n) % 8] = t1(n) + t2(n); }

to

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1; Vals[(135 - n) % 8] = t1 + t2(n); }

This got me a 25% performance increase!
Seems good, but it brings no gain for me ... weird. Will look into this and perhaps re-use your idea, if I may :).
Can only think of a good compiler optimization...
It is not obvious that this will gain anything.  On the contrary, unless the compiler optimizes it away, it makes the second and third instructions dependent on the first.  This is bad on a GPU which issues 4 or 5 instructions in parallel every clock cycle.

I wish I could use the AMD APP KernelAnalyzer, but it currently won't work, so I rely on the pure MHash/sec numbers that Phoenix gives me :-/. Anyway, I will perhaps try to follow your hint. If it's faster to do a calculation twice but have the instructions independent of each other, then okay :D.

My work is not over ;).

Thanks to the 3 people who sent a donation so far! It's a nice motivation. Did anyone repost this in the Mining Software forum? I am not allowed to post there ^^.

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: bitless on July 03, 2011, 09:55:19 PM
Good, but this

> - added: "u t1W" variable, which is used in sharound2() to avoid double execution of t1W()

may actually hurt the performance, in theory. If you're using more registers, at least some GPUs may not be able to run as many threads concurrently as they used to, thus slowing things down.



Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: plantucha on July 03, 2011, 10:29:58 PM
Hello,

you might want to change this

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1(n); Vals[(135 - n) % 8] = t1(n) + t2(n); }

to

Code:
#define sharound(n) { t1 = t1(n); Vals[(131 - n) % 8] += t1; Vals[(135 - n) % 8] = t1 + t2(n); }

This got me a 25% performance increase!
Seems good, but it brings no gain for me ... weird. Will look into this and perhaps re-use your idea, if I may :).
Can only think of a good compiler optimization...
It is not obvious that this will gain anything.  On the contrary, unless the compiler optimizes it away, it makes the second and third instructions dependent on the first.  This is bad on a GPU which issues 4 or 5 instructions in parallel every clock cycle.

I wish I could use the AMD APP KernelAnalyzer, but it currently won't work, so I rely on the pure MHash/sec numbers that Phoenix gives me :-/. Anyway, I will perhaps try to follow your hint. If it's faster to do a calculation twice but have the instructions independent of each other, then okay :D.

My work is not over ;).

Thanks to the 3 people who sent a donation so far! It's a nice motivation. Did anyone repost this in the Mining Software forum? I am not allowed to post there ^^.

Dia

The newbie rules are pretty strict here.
You have to spend more than 4 hours on this forum before you are able to post anywhere.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Diapolo on July 04, 2011, 04:32:05 AM
Good, but this

> - added: "u t1W" variable, which is used in sharound2() to avoid double execution of t1W()

may actually hurt the performance, in theory. If you're using more registers, at least some GPUs may not be able to run as many threads concurrently as they used to, thus slowing things down.

That one was removed a few hours after I added it, don't worry :). You can safely remove "u t1W;" and replace "t1W = t1W(n);" with "t1 = t1W(n);" in sharound2.

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: Alex AXe on July 04, 2011, 05:21:54 AM
360 -> 362 HD6950@900MHz  :)


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-03
Post by: r4in on July 04, 2011, 09:56:59 AM
Thanks a lot for this.

303 -> 309 @ radeon 6870 (1005/350) using phoenix with your kernel!


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-03
Post by: xurious on July 04, 2011, 02:02:56 PM
Was using some patch to get a few extra mh/s yesterday, but I just downloaded this new one and get about 6 more! Badass!

I need to find a way to stop having to implement all these changes across all my machines! :D

Thanks!


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-03
Post by: Diapolo on July 04, 2011, 02:06:36 PM
All those who are happy and gain a few MHash/sec make me proud and happy, too :). Keep up posting here!

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-03
Post by: erek on July 04, 2011, 04:11:29 PM
2x 6970s:  755 (old) -> 781 (new)


thanks!


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-03
Post by: DullJack on July 04, 2011, 06:19:10 PM
Very nice, will try this on my new rig.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-03
Post by: nebiki on July 04, 2011, 06:30:58 PM
went up from 385 to 398. didn't use the 3% thing before. thanks. now i'll have to look at the stale rates.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: Diapolo on July 06, 2011, 01:08:32 PM
New version is ready, DL here: http://www.mediafire.com/?f8b8q3w5u5p0ln0

Updated first post with changelog and performance info. This one should be a bit faster on 69XX cards than the original phatk, faster than all other phatk versions I did on 58XX and faster on non BFI_INT cards because of a change user 1MLyg5WVFSMifFjkrZiyGW2nw suggested!

Try, have fun, comment and donate :D.

Thanks,
Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: dewon on July 06, 2011, 03:20:00 PM
Went from 322 to 329 with hd 5830


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: Diapolo on July 06, 2011, 03:38:11 PM
Went from 322 to 329 with hd 5830

Great, seems like the HD5830 scales really well with my mod :).

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: 1MLyg5WVFSMifFjkrZiyGW2nw on July 06, 2011, 05:23:55 PM
New version is ready, DL here: http://www.mediafire.com/?f8b8q3w5u5p0ln0

Updated first post with changelog and performance info. This one should be a bit faster on 69XX cards than the original phatk, faster than all other phatk versions I did on 58XX and faster on non BFI_INT cards because of a change user 1MLyg5WVFSMifFjkrZiyGW2nw suggested!

Try, have fun, comment and donate :D.

Thanks,
Dia

Thanks, best version yet :D
Still not reached the 40 MHash/sec the wiki says my card could do  ???

Did you notice that Ma(x, y, z) is defined exactly the same now whether BFI_INT is enabled or not? Seems more elegant to me if moved out of the #ifdef. Also I tried to replace some #define's with functions, guessing that it would make it easier for a somewhat smart compiler to find repeatedly used terms and put them into registers. No performance improvement, but didn't hurt it either.

Also, OpenCL has a builtin Ch function, not faster for me but maybe for someone else:
#define Ch(x, y, z) bitselect(z, y, x)
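
For readers unsure about the argument order: OpenCL's bitselect(a, b, c) takes each result bit from a where c has a 0 bit and from b where c has a 1 bit. A minimal Python model of that behaviour (an illustration only; ch_reference and ch_bitselect are hypothetical names, not part of the kernel) shows that bitselect(z, y, x) matches the usual definition of Ch(x, y, z):

```python
import random

MASK = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic


def bitselect(a, b, c):
    # model of the OpenCL builtin: bit from a where c is 0, bit from b where c is 1
    return ((a & ~c) | (b & c)) & MASK


def ch_reference(x, y, z):
    # textbook SHA-256 Ch: (x AND y) XOR (NOT x AND z)
    return ((x & y) ^ (~x & z)) & MASK


def ch_bitselect(x, y, z):
    # argument order as in the define above
    return bitselect(z, y, x)


rng = random.Random(0)
triples = [tuple(rng.getrandbits(32) for _ in range(1000 and 3)) for _ in range(1000)]
all_equal = all(ch_reference(*t) == ch_bitselect(*t) for t in triples)
```

This only confirms the two forms are equivalent; whether the builtin is faster is up to the compiler and hardware.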


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: Diapolo on July 06, 2011, 07:30:15 PM
Thanks, best version yet :D
Still not reached the 40 MHash/sec the wiki says my card could do  ???

Did you notice that Ma(x, y, z) is defined exactly the same now whether BFI_INT is enabled or not? Seems more elegant to me if moved out of the #ifdef. Also I tried to replace some #define's with functions, guessing that it would make it easier for a somewhat smart compiler to find repeatedly used terms and put them into registers. No performance improvement, but didn't hurt it either.

Also, OpenCL has a builtin Ch function, not faster for me but maybe for someone else:
#define Ch(x, y, z) bitselect(z, y, x)


Thank YOU :) Another nice hint; even if it gives no boost, the code gets cleaner. I'm not sure about the #defines as functions. Could you post an example? My problem is that most variables are declared and defined inside the kernel function. So a function for sharound(), for example, needs Vals[] and others passed as parameters (copy or pointer).

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: 1MLyg5WVFSMifFjkrZiyGW2nw on July 06, 2011, 08:23:47 PM
Thank YOU :) Another nice hint; even if it gives no boost, the code gets cleaner. I'm not sure about the #defines as functions. Could you post an example? My problem is that most variables are declared and defined inside the kernel function. So a function for sharound(), for example, needs Vals[] and others passed as parameters (copy or pointer).

Dia

Yes I haven't changed those defines yet, mostly added my own intermediate functions:

Code:
// Ma can also be implemented in terms of Ch...
u Ma(u x, u y, u z) { return Ch(z^x, y, x); }

// Various intermediate calculations for each SHA round

u xrot2(u n, const uint r1, const uint r2) {
        return rot(n, r1) ^ rot(n, r2);
}

u xrot3(u n, const uint r1, const uint r2, const uint r3) {
        return xrot2(n, r1, r2) ^ rot(n, r3);
}

u xrrs(u n, const uint r1, const uint r2, const uint r3) {
        return xrot2(n, r1, r2) ^ (n >> r3);
}

#define s0(n) xrot3(Vals[(128-n) % 8], 30, 19, 10)
#define s1(n) xrot3(Vals[(132-n) % 8], 26, 21, 7)
#define ch(n) Ch(Vals[(132 - n) % 8], Vals[(133 - n) % 8], Vals[(134 - n) % 8])
#define ma(n) Ma(Vals[(129 - n) % 8], Vals[(130 - n) % 8], Vals[(128 - n) % 8])
#define t1(n) (K[n % 64] + Vals[(135 - n) % 8] + W[n] + s1(n) + ch(n))

// intermediate W calculations
#define P1(x) xrrs(W[x - 2], 15, 13, 10)
#define P2(x) xrrs(W[x - 15], 25, 14, 3)

Since there is no noticeable drop in hashrate, I assume the compiler is inlining these functions.
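
The rotation counts in these helpers look different from the ones in the SHA-256 standard because they rotate in the opposite direction. Assuming rot() is a left rotation (as OpenCL's rotate() builtin is), a left rotation by 30 equals a right rotation by 2, and so on. A small Python sketch, with the helper names copied from the code above and the sigma functions written out from the standard, checks that they agree:

```python
import random

MASK = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic


def rotl(x, n):
    # left rotation; assumed to be what rot() does in the kernel
    return ((x << n) | (x >> (32 - n))) & MASK


def rotr(x, n):
    # right rotation, as written in the SHA-256 standard
    return ((x >> n) | (x << (32 - n))) & MASK


def xrot2(n, r1, r2):
    return rotl(n, r1) ^ rotl(n, r2)


def xrot3(n, r1, r2, r3):
    return xrot2(n, r1, r2) ^ rotl(n, r3)


def xrrs(n, r1, r2, r3):
    return xrot2(n, r1, r2) ^ (n >> r3)


def big_sigma0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def small_sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


rng = random.Random(0)
values = [rng.getrandbits(32) for _ in range(1000)]
s0_ok = all(xrot3(v, 30, 19, 10) == big_sigma0(v) for v in values)   # s0(n)
s1_ok = all(xrot3(v, 26, 21, 7) == big_sigma1(v) for v in values)    # s1(n)
p1_ok = all(xrrs(v, 15, 13, 10) == small_sigma1(v) for v in values)  # P1(x)
p2_ok = all(xrrs(v, 25, 14, 3) == small_sigma0(v) for v in values)   # P2(x)
```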

Also, you can eliminate one extra assignment to Vals[4]:

Code:
//Vals[4] = PreVal4;
//...
#ifdef VECTORS
        Vals[4] = (W[3] = ((base + get_global_id(0)) << 1) + (uint2)(0, 1)) + PreVal4;
#else
        Vals[4] = (W[3] = base + get_global_id(0)) + PreVal4;
#endif
//...
//Vals[4] += W[3];


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: Diapolo on July 06, 2011, 09:01:00 PM
I really like this "let's do better"-game :). But for now I say good n8!

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: conspirosphere.tk on July 06, 2011, 11:28:17 PM
I don't have a benchmark, but according to Phoenix miner I went from about 160 Mh/s to 180+ using your patch with my 5770.
Good work!
Many thanks


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: zmcgrew on July 07, 2011, 12:54:13 AM
Just wanted to say thanks for the hard work, but today's (07/06/2011) kernel dropped me by about 2 Mh/s.

07/03/2011 got me ~300.8 Mh/s, but 07/06/2011 won't go above ~298.5 Mh/s.

I'm running on a 6870, Catalyst 11.6, SDK 2.4, and using the following: BFI_INT VECTORS AGGRESSION=13 WORKSIZE=128

Card is clocked at 960Mhz core, and 300 Mhz RAM.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: swivel on July 07, 2011, 05:15:47 AM
Nice work! Plugged in the 2011-07-06 kernel to phoenix and saw my 5850 jump from 348 Mhash/s to 354 Mhash/s.


Debian sid 64-bit
Catalyst 11.6 and AMD APP 2.4 SDK
phoenix 1.50 with VECTORS BFI_INT WORKSIZE=256 AGGRESSION=12
XFX 5850 BE 860 core 300 memory stock voltage fan speed at 55% temp at 61C



Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: Diapolo on July 07, 2011, 05:59:43 AM
Just wanted to say thanks for the hard work, but today's (07/06/2011) kernel dropped me by about 2 Mh/s.

07/03/2011 got me ~300.8 Mh/s, but 07/06/2011 won't go above ~298.5 Mh/s.

I'm running on a 6870, Catalyst 11.6, SDK 2.4, and using the following: BFI_INT VECTORS AGGRESSION=13 WORKSIZE=128

Card is clocked at 960Mhz core, and 300 Mhz RAM.

Could you raise your Mem clock to ~350 MHz and report back? What about a Worksize of 256? For 5830 cards this helps a lot.

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: gominoa on July 07, 2011, 07:16:09 AM
This doesn't compile when VECTORS is defined.

Quote
Build on <pyopencl.Device 'Cypress' at 0x34a3680>:

/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[19] = P4(19) + 0x11002000 + P1(19);
                         ^

I can't post on the mining thread, but this is the same error reported there.
Works fine without VECTORS defined.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: Vrekk on July 07, 2011, 08:01:12 AM
Got an increase from 425 to 435 :-) Thanks a bunch!! Sent a little something something your way/


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: Diapolo on July 07, 2011, 09:50:37 AM
This doesn't compile when VECTORS is defined.

Quote
Build on <pyopencl.Device 'Cypress' at 0x34a3680>:

/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[19] = P4(19) + 0x11002000 + P1(19);
                         ^

I can't post on the mining thread, but this is the same error reported there.
Works fine without VECTORS defined.

I'm looking into this; it seems to happen only with SDK 2.1!
In the other thread we are trying to nail it down ... if I find a solution, a fixed version will be uploaded.
If you don't mind a bit of fiddling in the code, you can try to change a few lines.

Code:
W[19] = P4(19) + (u)0x11002000 + P1(19);

W[30] = P3(30) + (u)0xA00055 + P1(30);

Vals[3] = (u)L + W[64];

W[81] = P4(81) + P2(81) + (u)0xA00000;

W[87] = P4(87) + P3(87) + (u)0x11002000 + P1(87);

W[94] = P3(94) + (u)0x400022 + P1(94);

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: SeriousWorm on July 07, 2011, 10:32:27 AM
Wow, I got a nice increase when I upped my memory to 350mhz.
6870 @ 980/350/1.25V:
310 mhash/sec, 10 aggression.
312 mhash/sec, 12 aggression.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: Diapolo on July 07, 2011, 11:03:11 AM
Wow, I got a nice increase when I upped my memory to 350mhz.
6870 @ 980/350/1.25V:
310 mhash/sec, 10 aggression.
312 mhash/sec, 12 aggression.


Latest kernel seems to be sensitive to higher Mem clock, thanks for verifying.

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: Diapolo on July 07, 2011, 03:20:02 PM
New version 2011-07-07 is ready: http://www.mediafire.com/?7j70gnmllgi9b73

This is mainly a bugfix release for SDK 2.1 with some code restructuring to save a few writes and additions. I cannot guarantee that this really works for 2.1, because I didn't test it. If you are unsure, wait for users to test it for you and consider applying this patch later!

By the way, I want to thank all of those who donated a few Bitcents to me, feels great!

Thanks,
Dia

PS.: If it works, please post here and consider a small donation @ 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM :).


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: conspirosphere.tk on July 07, 2011, 04:19:36 PM
This causes an immediate crash and closing of Phoenix miner 1.50 for me, so I'm reverting to your previous patch.
Donation sent.

update: it was my -f flag. Without it, it now works.

BTW: How do you get accurate measures of your Mhs??? My Phoenix miner oscillates between 170 and 190 Mhs.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: gominoa on July 07, 2011, 10:58:10 PM
New version 2011-07-07 works on SDK 2.1 w/ VECTORS.

Thanks


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: Bert on July 08, 2011, 02:31:17 AM
... snip ...
BTW: How do you get accurate measures of your Mhs??? My Phoenix miner oscillates between 170 and 190 Mhs.

I add "-a 50" to average the Mhash/sec over 50 samples. This overrides the default value of 10 and smooths out the jumps, but it is slower to converge to the real hash rate. With 5 times as many samples per average, the jumps get much smaller.


$ ./phoenix.py --help
Usage: phoenix.py -u URL [-k kernel] [kernel params]

Options:
  -h, --help            show this help message and exit
  -v, --verbose         show debug messages
  -k KERNEL, --kernel=KERNEL
                        the name of the kernel to use
  -u URL, --url=URL     the URL of the mining server to work for [REQUIRED]
  -q QUEUESIZE, --queuesize=QUEUESIZE
                        how many work units to keep queued at all times
  -a AVGSAMPLES, --avgsamples=AVGSAMPLES
                        how many samples to use for hashrate average
$
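
To see why a larger -a value steadies the display, here is a rough Python sketch of a rolling average over the last N samples. It is only an illustration under assumptions: it is not taken from Phoenix's source, the function name rolling_means is made up, and the "hashrate samples" are just random numbers between 170 and 190.

```python
import random
import statistics
from collections import deque


def rolling_means(samples, window):
    """Mean of the most recent `window` samples, emitted once the window is full."""
    recent = deque(maxlen=window)  # old samples drop out automatically
    means = []
    for sample in samples:
        recent.append(sample)
        if len(recent) == window:
            means.append(sum(recent) / window)
    return means


# simulated noisy readings, standing in for per-sample Mhash/s values
rng = random.Random(1)
samples = [rng.uniform(170.0, 190.0) for _ in range(5000)]

spread_10 = statistics.pstdev(rolling_means(samples, 10))
spread_50 = statistics.pstdev(rolling_means(samples, 50))
```

With independent noise the spread of the displayed average shrinks roughly with the square root of the window size, and the price is that the first stable reading only appears after the window has filled.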


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: Diapolo on July 08, 2011, 04:57:06 AM
New version 2011-07-07 works on SDK 2.1 w/ VECTORS.

Thanks

So how does it work for you? Compared to other kernels? Which cards do you use?

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: burningrave101 on July 08, 2011, 05:28:39 AM
Tested the latest 2011-07-07 kernel on my 6990 @ 880Mhz core using the latest 7/1 version of GUIMiner without any additional kernel tweaks and saw roughly a 15 Mh/s increase. Thanks and hope to see further improvements in hash rate to come :).


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: Diapolo on July 08, 2011, 05:30:47 AM
Tested the latest 2011-07-07 kernel on my 6990 @ 880Mhz core using the latest 7/1 version of GUIMiner without any additional kernel tweaks and saw roughly a 15 Mh/s increase. Thanks and hope to see further improvements in hash rate to come :).

It gets harder with each new version, so I guess the next version could take some time :). Any ideas and hints are welcome.

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: kr105 on July 08, 2011, 06:44:36 AM
Asus EAH5850, core 840, mem 180, volt 1080:

version 2011-07-01: 338mh/s
version 2011-07-03: 336mh/s
version 2011-07-06: 301mh/s
version 2011-07-07: 301mh/s

I'll try to play with core/mem clocks again, because these values were the optimum for the old phatk. Thanks.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: Diapolo on July 08, 2011, 07:45:38 AM
Asus EAH5850, core 840, mem 180, volt 1080:

version 2011-07-01: 338mh/s
version 2011-07-03: 336mh/s
version 2011-07-06: 301mh/s
version 2011-07-07: 301mh/s

I'll try to play with core/mem clocks again, because these values were the optimum for the old phatk. Thanks.

I bet 0.1 BTC that you will reach higher values with raised mem clocks :D. Deal?

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: Bert on July 08, 2011, 08:18:32 AM
... snip ...
Any ideas and hints are welcome.

Dia

I've been toying with an idea, but I don't have the necessary programming skills (or knowledge of the SHA-256 algorithm) to implement anything.

http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_APP_SDK_FAQ.pdf
Quote
41. What is the difference between 24-bit and 32-bit integer operations?

    24-bit operations are faster because they use floating point hardware and can execute on all compute units. Many 32-bit integer operations also run on all stream processors, but if both a 24-bit and a 32-bit version exist for the same  instruction, the 32-bit instruction executes only one per cycle.

43. Do 24-bit integers exist in hardware?

    No, there are 24-bit instructions, such as MUL24/MAD24, but the smallest integer in hardware registers is 32-bits.

75. Is it possible to use all 256 register in a thread?

    No, the compiler limits a wavefront to half of the register pool, so there can always be at least two wavefronts executing in parallel.
http://developer.amd.com/sdks/amdappsdk/assets/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf
Page 4-62
Quote
24-bit integer MULs and MADs have five times the throughput of 32-bit integer multiplies. 24-bit unsigned integers are natively supported only on the Evergreen family of devices and later. Signed 24-bit integers are supported only on the Northern Island family of devices and later. The use of OpenCL built-in functions for mul24 and mad24 is encouraged. Note that mul24 can be useful for array indexing operations.

http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=144722
Quote
On the 5800 series, signed mul24(a,b) is turned into
Code:
(((a<<8)>>8)*((b<<8)>>8))
. This makes it noticeably SLOWER than simply using a*b. Unsigned mul24(a,b) uses a native function. mad24 is similar. I made some kernels which just looped the same operation over and over:
signed a * b: 0.9736s
unsigned mul24(a,b): 0.9734s
signed mul24(a,b): 2.2771s

So anyhow what I was thinking was the following

Current  kernel: 1 * 256 bit hash / 32int =  8 32bit operations (speed 100% )
Possible Kernel: 3 * 256 bit hash / 24int = 32 24bit operations (speed a maximum of  166% [5 times faster divided by 3 SHA-256 operations in parallel])*

* It may actually end up being slower than the current kernel.cl if 32bit and 24bit operations are sent as wavefronts at the same time.

There may be some merit in trying to write a new kernel.cl that uses 32 x 24bit integers to carry out 3 parallel SHA-256 operations at once faster than one SHA-256 operation using 8 32bit integers .

But not everything can be carried out as 24bit operations, only mul24(a,b) and mad24(a,b), so the 166% speed up would only be achieved if every SHA-256 operation was covered by these two operations. The new kernel.cl would be limited to modern ATI hardware (54xx-59xx,67xx-69xx), which is generally what miners are using.

But to be honest I haven't looked into the SHA-256 algorithm, so I'm not sure if parts of it could ever be rewritten to utilise mad24(a,b) or mul24(a,b). But I like thinking outside the box.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-06
Post by: zmcgrew on July 08, 2011, 09:36:38 AM
Could you raise your Mem clock to ~350 MHz and report back? What about a Worksize of 256? For 5830 cards this helps a lot.

Played with mem clock speeds. 350 saw no improvement, but 600 to 1050 saw a ~.5 Mh/s improvement, but still not enough to get me back the 2 Mh/s I lost.

Work size of 256 dropped off another few Mh/s, so that definitely didn't help. It seems like 07/03/2011 is the winner for me! =)
Thanks for your efforts though, I'll definitely keep testing and see if the newer kernels can return to the 07/03/2011 level.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: makiet on July 08, 2011, 10:13:27 AM
nice work, I'll try it  ;)


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: Diapolo on July 08, 2011, 11:54:04 AM
Guys, I introduced a small glitch, which produces an OpenCL compiler warning in version 07-07. For stability reasons please change line 77:

old:
u W[123];

new:
u W[124];

I missed sharound(123), which writes to W[123], which is undefined, because it's out of range. Sorry for that!
Will upload a fixed version shortly (only includes the change above and stays 07-07).
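
The off-by-one is easy to reproduce outside the kernel. In this small Python sketch (helper names are made up for illustration), an array of 123 elements has valid indices 0 to 122, so a write to index 123 needs a length of at least 124. Python raises an error for the bad write; in C or OpenCL the same write is undefined behaviour and may silently corrupt other data.

```python
def array_length_for(highest_index):
    """Smallest array length for which highest_index is a valid index."""
    return highest_index + 1


def write_ok(length, index):
    """Try to write to w[index] in a zero-filled list of the given length."""
    w = [0] * length
    try:
        w[index] = 1
        return True
    except IndexError:
        return False


needed = array_length_for(123)       # length required so that W[123] exists
old_size_ok = write_ok(123, 123)     # "u W[123];" is one element too short
new_size_ok = write_ok(124, 123)     # "u W[124];" has room for W[123]
```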

Edit:
Download 07-07 fixed: http://www.mediafire.com/?o7jfp60s7xefrg4

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: John (John K.) on July 08, 2011, 12:08:39 PM
Thanks for the updates. Hashing rate increased like 1-2MH/s ;D


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: mitchel on July 08, 2011, 02:38:37 PM
Thanks man this is awesome!  Hashing rate increased by 10 for each 5830 that i have. 

I'm at work right now, but I will definitely donate when I get home.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: Maxim Gladkov on July 08, 2011, 02:40:01 PM
Thank you for these improvements!


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: gominoa on July 09, 2011, 08:59:24 AM
Works great!


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: Diapolo on July 09, 2011, 02:19:50 PM
I've been toying with an idea, but I don't have the necessary programming skills (or knowledge of the SHA-256 algorithm) to implement anything.

http://developer.amd.com/sdks/AMDAPPSDK/assets/AMD_APP_SDK_FAQ.pdf
Quote
41. What is the difference between 24-bit and 32-bit integer operations?

    24-bit operations are faster because they use floating point hardware and can execute on all compute units. Many 32-bit integer operations also run on all stream processors, but if both a 24-bit and a 32-bit version exist for the same  instruction, the 32-bit instruction executes only one per cycle.

43. Do 24-bit integers exist in hardware?

    No, there are 24-bit instructions, such as MUL24/MAD24, but the smallest integer in hardware registers is 32-bits.

75. Is it possible to use all 256 register in a thread?

    No, the compiler limits a wavefront to half of the register pool, so there can always be at least two wavefronts executing in parallel.
http://developer.amd.com/sdks/amdappsdk/assets/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide.pdf
Page 4-62
Quote
24-bit integer MULs and MADs have five times the throughput of 32-bit integer multiplies. 24-bit unsigned integers are natively supported only on the Evergreen family of devices and later. Signed 24-bit integers are supported only on the Northern Island family of devices and later. The use of OpenCL built-in functions for mul24 and mad24 is encouraged. Note that mul24 can be useful for array indexing operations.

http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=144722
Quote
On the 5800 series, signed mul24(a,b) is turned into
Code:
(((a<<8)>>8)*((b<<8)>>8))
. This makes it noticeably SLOWER than simply using a*b. Unsigned mul24(a,b) uses a native function. mad24 is similar. I made some kernels which just looped the same operation over and over:
signed a * b: 0.9736s
unsigned mul24(a,b): 0.9734s
signed mul24(a,b): 2.2771s

So anyhow what I was thinking was the following

Current  kernel: 1 * 256 bit hash / 32int =  8 32bit operations (speed 100% )
Possible Kernel: 3 * 256 bit hash / 24int = 32 24bit operations (speed a maximum of  166% [5 times faster divided by 3 SHA-256 operations in parallel])*

* It may actually end up being slower than the current kernel.cl if 32bit and 24bit operations are sent as wavefronts at the same time.

There may be some merit in trying to write a new kernel.cl that uses 32 x 24bit integers to carry out 3 parallel SHA-256 operations at once, faster than one SHA-256 operation using 8 32bit integers.

But not everything can be carried out as 24bit operations, only mul24(a,b) and mad24(a,b), so the 166% speed up would only be achieved if every SHA-256 operation was covered by these two operations. The new kernel.cl would be limited to modern ATI hardware (54xx-59xx,67xx-69xx), which is generally what miners are using.

But to be honest I haven't looked into the SHA-256 algorithm, so I'm not sure if parts of it could ever be rewritten to utilise mad24(a,b) or mul24(a,b). But I like thinking outside the box.

Hi Bert, sorry for not answering you directly. I checked the OpenCL 1.1 specs and yes, faster 24-bit integer operations are mentioned there, too.

mul24    (Fast integer function.) Multiply 24-bit integer values a and b
mad24    (Fast integer function.) Multiply 24-bit integer then add the 32-bit result to 32-bit integer

But here lies the problem: AFAIK only additions, bit-shifting and other bit-wise operations are used in the current kernel (no multiplications). So at first sight there should be no use for it.

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: KKAtan on July 09, 2011, 05:26:21 PM
I've tested your patch, and there are some great improvements from the original phoenix 1.50 miner indeed.
My 6950 gets improvement.
My 5870 gets improvement.
My 5850 gets improvement.

But I have noticed a regression with my 6870 (1005mhz core / 200 mhz mem)

Configuration:
-Using Windows 7
-Using Catalyst 11.6
-Using the aoclbf 1.74 frontend for phoenix 1.50
phatk
Vector
BFI_INT
Aggression 13
Work size 128

2011-07-03 kernel: 317 mhash/s (all 3 numbers are peak values)
2011-07-06 kernel: 283
2011-07-07 kernel: 283

...Needless to say, something bad happened between 07-03 and 07-06. I hope we can get to the bottom of this. If you need me to test something, I will be happy to do what I can for you.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: Bert on July 09, 2011, 06:18:15 PM
... snip ...
But here lies the problem: AFAIK only additions, bit-shifting and other bit-wise operations are used in the current kernel (no multiplications). So at first sight there should be no use for it.
Yea, I was thinking the same, but then I thought that there may be some smart way to rearrange some of the SHA-256 algorithm to change simple bit shifts, exclusive-ors and additions into more complex multiplies, maybe by carrying out two or three operations at once. But it would require a deep knowledge of the SHA-256 algorithm and binary maths. Something along the lines of the way the Laplace transform can be used to solve differential equations: transfer everything into the s-domain, solve with addition and subtraction, and then transfer back for the answer.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: 1MLyg5WVFSMifFjkrZiyGW2nw on July 09, 2011, 10:44:39 PM
Yea, I was thinking the same, but then I thought that there may be some smart way to rearrange some of the SHA-256 algorithm to change simple bit shifts, exclusive-ors and additions into more complex multiplies, maybe by carrying out two or three operations at once. But it would require a deep knowledge of the SHA-256 algorithm and binary maths. Something along the lines of the way the Laplace transform can be used to solve differential equations: transfer everything into the s-domain, solve with addition and subtraction, and then transfer back for the answer.

While not a math guru, I am certain this can't be done. The algorithm uses these kinds of operations:

- rotate or shift bits right by three different numbers of places, then XOR together
- select bits from one of two values, depending on bits in a third
- majority of bits set/clear in three values
- addition of the result of these operations and constant values for each round

You can build multiplication out of these operations if they are combined in a certain way, but the SHA-256 algorithm does not use them like this. If (parts of) SHA were equivalent to something as simple as multiplication, I'd say it could be broken in no time.

Also, SHA256 uses 32 bit values for everything. You could of course implement it on an 8 bit machine, but this would make it much slower. And having 24 bit wide registers does not even mean you could run three 8 bit ops at the same time.
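
For reference, the four kinds of operation listed above can be written out in a few lines of Python. This is only an illustrative sketch: the function names are mine, while the rotation and shift amounts are the ones from the SHA-256 specification. Note that nothing in it is a multiplication, which is why mul24/mad24 have nothing to attach to.

```python
MASK32 = 0xFFFFFFFF

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32

# Rotate or shift by three different amounts, then XOR together.
def small_sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def small_sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def big_sigma0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)

def big_sigma1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)

def ch(x, y, z):
    """Select each bit from y or z, depending on the same bit of x."""
    return ((x & y) ^ (~x & z)) & MASK32

def maj(x, y, z):
    """Per-bit majority of the three inputs."""
    return (x & y) ^ (x & z) ^ (y & z)

def t1(e, f, g, h, k, w):
    """Add the results to the round constant k and schedule word w."""
    return (h + big_sigma1(e) + ch(e, f, g) + k + w) & MASK32
```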

Another thought I had:
Is aggressive loop unrolling really helping performance? At least for FPGAs, I guess that lots of very small units that maybe do one hash every 64 clock cycles could be better than a much bigger unrolled design, and the same could be true for GPUs. Was this already tested or did everyone start with the assumption that unrolling is the way to go?


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: SeriousWorm on July 10, 2011, 12:23:26 AM
I've tested your patch, and there are some great improvements from the original phoenix 1.50 miner indeed.
My 6950 gets improvement.
My 5870 gets improvement.
My 5850 gets improvement.

But I have noticed a regression with my 6870 (1005mhz core / 200 mhz mem)

Configuration:
-Using Windows 7
-Using Catalyst 11.6
-Using the aoclbf 1.74 frontend for phoenix 1.50
phatk
Vector
BFI_INT
Aggression 13
Work size 128

2011-07-03 kernel: 317 mhash/s (all 3 numbers are peak values)
2011-07-06 kernel: 283
2011-07-07 kernel: 283

...Needless to say, something bad happened between 07-03 and 07-06. I hope we can get to the bottom of this. If you need me to test something, I will be happy to do what I can for you.

Try setting worksize to 256 and upping your memory to 350mhz. I get the most mhash/sec using that.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: bmgjet on July 10, 2011, 12:29:20 AM
I've found 500mhz to be best for memory clock for gddr5 cards and 800mhz for gddr3 cards. Runs a bit hotter than 350 but is worth it imo.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: ssateneth on July 10, 2011, 04:19:33 AM
Came across this topic when browsing https://en.bitcoin.it/wiki/Mining_hardware_comparison and the comment on the last 5830 entry (not the crossfire one). Here are my results:

Baseline: 11.6 drivers. Not sure what SDK it is (I'm going to assume 2.4. I haven't done anything to my knowledge to change SDK and it seems 2.4 is what comes with 11.6) 5870 @ 1015 core, 300 memory with original phatk with phoenix 1.5 VECTORS BFI_INT WORKSIZE=128 FASTLOOP=false AGGRESSION=13
441Mhash
7-07 build, immediate gain to 450 MHash/sec
Increased memory to 350. 459 Mhash/sec
Increase WORKSIZE to 256. 463 MHash/sec

And I found that any increases to memory after 360 cause weird things to happen (almost certain crash, I panicked to get it back to 350 before it crashed), but I see people posting about 500MHz, so I'm going to try that out, hopefully not crash.

Edit: 500 memory speed causes a -decrease- to 452 MHash/sec. It leads me to think that there are certain dividers/timings being changed at certain thresholds. This would explain why 500MHz and 350 MHz appear to be stable, but 360+ and 600MHz are unstable (I was limited to 600MHz for a while because I didn't know how to push memory lower than 600. I used MSI Afterburner).

Now I use MSI Afterburner to alter voltage and AMD GPU Clock Tool to set frequencies. I don't suppose anyone knows an all-in-one solution? AMD GPU Clock Tool doesn't seem to want to set custom voltages. It just has 0.9500, 1.0630, and Max VDCC. It won't accept custom numbers.



Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: Diapolo on July 10, 2011, 11:39:06 AM
Next kernel version will, once more, be faster for 69XX and 58XX cards :). Stay tuned!

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: BitCoinJack.com on July 10, 2011, 12:00:13 PM
Next kernel version will, once more, be faster for 69XX and 58XX cards :). Stay tuned!

Dia

Great, looking forward to it! :D


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: Nialpo on July 11, 2011, 05:54:53 AM
Also, you can simplify the last sharound()'s - there is no need to calculate the second variables:

Change
   W(120);
   sharound(120);
   W(121);
   sharound(121);
   W(122);
   sharound(122);
   W(123);
   sharound(123);

To
   W(120);
   sharound(120);
   W(121);
   Vals[2] += t1(121);
   W(122);
   Vals[1] += t1(122);
   W(123);
   Vals[0] += t1(123);
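
The reason this is safe can be checked outside the kernel. The Python sketch below is an illustration only: it uses the standard 64-round SHA-256 compression with its own names (compress, e_only_rounds and so on are hypothetical), not the kernel's 120-123 numbering or its Vals[] layout. It skips the t2/"a" update in the last three rounds and keeps only the "d + t1" update, as in the change above. The words on the e side never read those skipped values, so the last five output words stay correct and only the first three are lost.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF

def iroot(n, k):
    """Largest integer r with r**k <= n, found by binary search."""
    lo, hi = 0, 1 << 40
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# SHA-256 constants: fractional bits of cube/square roots of the first primes.
PRIMES = [p for p in range(2, 312) if all(p % q for q in range(2, p))]
K = [iroot(p << 96, 3) & MASK32 for p in PRIMES]
H0 = [iroot(p << 64, 2) & MASK32 for p in PRIMES[:8]]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32

def schedule(block):
    """Expand a 64-byte block into the 64 message-schedule words."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK32)
    return w

def compress(block, e_only_rounds=0):
    """One compression from H0; the last e_only_rounds rounds skip t2."""
    w = schedule(block)
    a, b, c, d, e, f, g, h = H0
    for t in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK32
        if t >= 64 - e_only_rounds:
            new_a = 0  # placeholder: t2 and the a-update are not computed
        else:
            t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
                  + ((a & b) ^ (a & c) ^ (b & c)))
            new_a = (t1 + t2) & MASK32
        a, b, c, d, e, f, g, h = new_a, a, b, c, (d + t1) & MASK32, e, f, g
    return [(x + y) & MASK32 for x, y in zip(H0, (a, b, c, d, e, f, g, h))]

def single_block(msg):
    """Pad a short message (< 56 bytes) into one 64-byte block."""
    return (msg + b"\x80" + b"\x00" * (55 - len(msg))
            + struct.pack(">Q", 8 * len(msg)))

def reference_words(msg):
    """The eight 32-bit words of hashlib's SHA-256 digest."""
    return list(struct.unpack(">8I", hashlib.sha256(msg).digest()))
```

Comparing compress(single_block(m), 3) with reference_words(m) for a short message m shows words 3 to 7 agreeing while words 0 to 2 differ. Whether the saving shows up in the compiled kernel is a separate question.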


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: Diapolo on July 11, 2011, 06:38:50 AM
Also, you can simplify the last sharound()'s - there is no need to calculate the second variables:

Change
   W(120);
   sharound(120);
   W(121);
   sharound(121);
   W(122);
   sharound(122);
   W(123);
   sharound(123);

To
   W(120);
   sharound(120);
   W(121);
   Vals[2] += t1(121);
   W(122);
   Vals[1] += t1(122);
   W(123);
   Vals[0] += t1(123);


Seems like a good idea, but I checked via KernelAnalyzer and it doesn't lower the needed ALU operations ... perhaps with reordering of commands this will help. Looking into it and thanks for your posting!

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: Nialpo on July 11, 2011, 07:03:14 AM
Probably the compiler already removed this, since nothing depends on the result.

Also, the dimension of W[] can be lowered, down to 16. In each step no values before w[n-16] are used. I've tried this, but it's not faster. Maybe with W[32] or W[64] it will be better. At least the compiled .elf file is shorter :)
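
A small Python sketch of the 16-entry idea, for anyone who wants to see the indexing. It is an illustration with made-up function names, not code from the kernel: schedule_full keeps all 64 words, while schedule_rolling overwrites W[t-16] in place with W[t] and addresses the window modulo 16. As noted above, this saves storage but was not measured to be faster.

```python
MASK32 = 0xFFFFFFFF

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32

def sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def schedule_full(first16):
    """Reference version: keep all 64 schedule words in one list."""
    w = list(first16)
    for t in range(16, 64):
        w.append((w[t - 16] + sigma0(w[t - 15])
                  + w[t - 7] + sigma1(w[t - 2])) & MASK32)
    return w

def schedule_rolling(first16):
    """Yield W[0..63] while storing only the 16 most recent words."""
    w = list(first16)
    for t in range(64):
        if t >= 16:
            i = t & 15  # slot holding W[t-16], which W[t] overwrites
            w[i] = (w[i] + sigma0(w[(t + 1) & 15])      # W[t-15]
                    + w[(t + 9) & 15]                    # W[t-7]
                    + sigma1(w[(t + 14) & 15])) & MASK32  # W[t-2]
        yield w[t & 15]
```

Both versions produce the same 64 words for any 16 input words; only the amount of live storage differs.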



Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: zephyr4 on July 11, 2011, 07:08:14 AM
thanks ill give it a try


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: Diapolo on July 11, 2011, 02:09:28 PM
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems to be no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I do :D. This version is faster than all previous kernels (uses the fewest ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: Bert on July 11, 2011, 05:40:01 PM
Thanks for the hard work, sent you another donation. 456.75Mh/sec up to 458.88 with the last update on overclocked 5870's running at 1000/347.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: Diapolo on July 11, 2011, 07:34:21 PM
Thanks for the hard work, sent you another donation. 456.75Mh/sec up to 458.88 with the last update on overclocked 5870's running at 1000/347.

Sounds great, I hope jedi95 will include this kernel or parts of it in a new Phoenix version :). I guess quite a few people don't know about this mod. Thanks for your support Bert!

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: being on July 11, 2011, 08:58:34 PM
Thanks a lot for this. Really great work!
I've been using/testing it on and off with my poclbm (because I get high CPU usage with phoenix) on my windows 7 32bit, cat 11.6 (not b), SDK 2.1, HD5870 rig. This is the kernel I got 460Mhash/s with (https://i.imgur.com/4UzsU.jpg). :)
The only problem is, that at times it has been a bit unstable with my setup. Hoping this last version (07-11) will be stable enough, so I can use it 24/7. =)

Thank you for the hard work! =)


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-07
Post by: KKAtan on July 12, 2011, 05:35:17 AM
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6
Amazing. You managed to solve the previous (http://forum.bitcoin.org/index.php?topic=25135.msg344837#msg344837) regression of 6870 performance as well.

Everything is perfect now.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: Diapolo on July 12, 2011, 09:38:52 AM
With the ideas that user Vince gave to us in the other thread, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again :).

Edit: The drawback is that you will need to replace the Phoenix __init__.py file, so it won't be easily usable for non-Phoenix users, sorry for that (some init values changed)!

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: nebiki on July 12, 2011, 05:49:56 PM
i don't know why, but i lose 2Mhash/s going from 07-07 -> 07-11


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: Diapolo on July 12, 2011, 05:57:48 PM
i don't know why, but i lose 2Mhash/s going from 07-07 -> 07-11

You can always return to the previous version, what's your setup?

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: navigator on July 12, 2011, 09:42:05 PM
DELETED for privacy


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: actudoran on July 13, 2011, 08:07:23 AM
hi and thanks for the great work!

I got a real issue now ... I worked with your latest kernel and everything was peachy: 315 Mhash/s on the hot one, and I pushed the other to 330 Mhash/s on a Sapphire 5830 with: phoenix -u http://XXXXXXXXXXXXXXXX:8332 VECTORS FASTLOOP=false AGGRESSION=9 BFI_INT -k phatk DEVICE=1

However, now device 0 works fine, no problems, but device 1 won't work ... it says failed to load kernel BFI_INT something ... I remove BFI_INT and then it just says failed to load kernel ...

any advice ?

Thanks!
Alex


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: Diapolo on July 13, 2011, 08:29:30 AM
hi and thanks for the great work!

I got a real issue now ... I worked with your latest kernel and everything was peachy: 315 Mhash/s on the hot one, and I pushed the other to 330 Mhash/s on a Sapphire 5830 with: phoenix -u http://XXXXXXXXXXXXXXXX:8332 VECTORS FASTLOOP=false AGGRESSION=9 BFI_INT -k phatk DEVICE=1

However, now device 0 works fine, no problems, but device 1 won't work ... it says failed to load kernel BFI_INT something ... I remove BFI_INT and then it just says failed to load kernel ...

any advice ?

Thanks!
Alex


Try:
phoenix -u http://XXXXXXXXXXXXXXXX:8332 -k phatk AGGRESSION=9 BFI_INT DEVICE=1 FASTLOOP=false VECTORS WORKSIZE=256

Perhaps it was just the ordering of parameters?

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: actudoran on July 13, 2011, 08:49:40 AM
tried that ... didn't work

it's funny since just yesterday was ok ... then I had to reboot then didn't work  for GPU 1 ...


I'm running win 7 64 if that's any help ...

Thanks, Alex


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: Bert on July 13, 2011, 10:50:34 AM
Does the card still show up under Catalyst Control Center/Information/Hardware as a secondary card?


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: bcheck555 on July 13, 2011, 11:57:26 AM
Thanks for these patches, current numbers:

Catalyst v11.6; SDK v2.4
5830: 321Mhash/s
1000Mhz/350Mhz/1.15v; air cooled temps ~60c;  -k phatk AGGRESSION=13 VECTORS BFI_INT DEVICE=1 FASTLOOP=false WORKSIZE=256

6950: 368Mhash/s
900Mhz/775Mhz/1.15v; air cooled temps ~65c;  -k phatk AGGRESSION=11 VECTORS BFI_INT DEVICE=1 FASTLOOP=false WORKSIZE=128


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: Diapolo on July 13, 2011, 12:13:05 PM
Thanks for these patches, current numbers:

Catalyst v11.6; SDK v2.4
5830: 321Mhash/s
1000Mhz/350Mhz/1.15v; air cooled temps ~60c;  -k phatk AGGRESSION=13 VECTORS BFI_INT DEVICE=1 FASTLOOP=false WORKSIZE=256

6950: 368Mhash/s
900Mhz/775Mhz/1.15v; air cooled temps ~65c;  -k phatk AGGRESSION=11 VECTORS BFI_INT DEVICE=1 FASTLOOP=false WORKSIZE=128

Sounds great, your 5830 scores are just like mine :).

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: nebiki on July 13, 2011, 01:52:33 PM
i don't know why, but i lose 2Mhash/s going from 07-07 -> 07-11

You can always return to the previous version, what's your setup?

Dia

i've added vince's mod now and am running better than 07-07. 6950 with unlocked shaders, running at 425 Mhash/s at 1.1V now. if i think back to when i had 380 at higher power consumption.. all these tweaks are sure worth it. keep 'em coming


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: actudoran on July 13, 2011, 05:31:13 PM
Great job Dia ! btw ... reinstalled the lot ... even had a bug in recovery so was screwed ... whoooshhh ... press any key to boot from CD  ;D

Works every time!

Before acting up ... weird ... it dropped a lot in temperature under full load, from like 75 - 76 C to about 68 C (GPU 0, that is; GPU 1 was idle since the bug) ... funny thing is that GPU 1 is the cold one on mine ... GPU 0 is about 5 C hotter since it's closer to the CPU (big ol' 775 lump churning out heat).

Cheers guys !
Al


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: ditchmagnet on July 14, 2011, 06:49:10 AM
Really loving this kernel!  Looking forward to the next release!

So does the 11.7 preview give much of a performance increase with this?  Currently getting about 329 Mhash/s @ 1030/350 on 5830 Xtremes.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: ssateneth on July 14, 2011, 08:35:31 AM
Really loving this kernel!  Looking forward to the next release!

So does the 11.7 preview give much of a performance increase with this?  Currently getting about 329 Mhash/s @ 1030/350 on 5830 Xtremes.

I'm assuming you jumped on the newegg bandwagon too? I just started (wasn't aware of dummy plugs, so I only have 2 running tonight, will have 5 after dummy plugs), and I can only get 970/300 stable @ 100% fan and 1.195 core. Temperatures are 78c, 309 mhash.

How can you push the core so far? Did I just get a bad card? ty in advance


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: Diapolo on July 14, 2011, 08:54:54 AM
Really loving this kernel!  Looking forward to the next release!

So does the 11.7 preview give much of a performance increase with this?  Currently getting about 329 Mhash/s @ 1030/350 on 5830 Xtremes.

I'm assuming you jumped on the newegg bandwagon too? I just started (wasn't aware of dummy plugs, so I only have 2 running tonight, will have 5 after dummy plugs), and I can only get 970/300 stable @ 100% fan and 1.195 core. Temperatures are 78c, 309 mhash.

How can you push the core so far? Did I just get a bad card? ty in advance

Well, my 5830 runs @ 1000 / 350 and I never needed to use 100% fan speed.
What are your temps? Is the case well ventilated, with enough airflow?

Dia

Edit: WOHOOOO 100 postings :D!


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: ssateneth on July 14, 2011, 09:04:14 AM
if you tried to read my post, i said 78c.

also airflow should be fine, got a ton of fans sucking in air. maybe I should take off the plastic shell so the metal fins are exposed?
Noise also isn't an issue, they'll all be tucked away somewhere where I can't hear them. The difference in power between 70% fan and 100% fan is also negligible, seeing that they may only use 2 watts max at full power, if that.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: Diapolo on July 14, 2011, 09:09:48 AM
if you tried to read my post, i said 78c.

also airflow should be fine, got a ton of fans sucking in air. maybe I should take off the plastic shell so the metal fins are exposed?
Noise also isn't an issue, they'll all be tucked away somewhere where I can't hear them. The difference in power between 70% fan and 100% fan is also negligible, seeing that they may only use 2 watts max at full power, if that.

I tried, but not hard enough :-P ... sorry. 78c does not seem right, I have 77c with 1000 / 350 and Fan @ 72%.

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: actudoran on July 14, 2011, 03:53:53 PM
I got a pair of 5830s and they run differently, as one is closer to the CPU cooler that's pumping out about 125 W of residual heat from the CPU ...
the "hot one", 74 - 80 C (varying with day/night ambient temperature), runs at 990/300 getting about 308 - 314 Mhash/s at stock volts
the cool one, 70 - 77 C, runs flat out at 1040/360, 332 Mhash/s avg (318 - 338) @ 1.163 V

boy they do give out some heat ... at night the temp didn't drop under 30 C where I keep 'em ... about 18 sqm of a well ventilated room (big window)

Any of you keeping rigs in the basement / garage? What's that like temp-wise?


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: ditchmagnet on July 14, 2011, 05:14:26 PM
Really loving this kernel!  Looking forward to the next release!

So does the 11.7 preview give much of a performance increase with this?  Currently getting about 329 Mhash/s @ 1030/350 on 5830 Xtremes.

I'm assuming you jumped on the newegg bandwagon too? I just started (wasn't aware of dummy plugs, so I only have 2 running tonight, will have 5 after dummy plugs), and I can only get 970/300 stable @ 100% fan and 1.195 core. Temperatures are 78c, 309 mhash.

How can you push the core so far? Did I just get a bad card? ty in advance

I run my fans at 75%.  As for taking off the shroud, it will hurt cooling if the card has access to good air, but when they are sandwiched, I found this setup to work decent:

http://imageshack.us/photo/my-images/219/img20110628120156.jpg/

Also, try putting on some fresh TIM.

I wouldn't say you have a bad card since you are still OCed.  And ya, got most of mine from newegg when they had the $110 shipped cards - and 1 I have is a used one I traded for.


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: ssateneth on July 15, 2011, 01:18:30 AM
I finally got some dummy plugs and got 3 5830s working in 1 PC, though it's not "in the open" like yours is. Furthermore, they are sandwiched very tightly (only 1 "blank" slot between the 16x slots instead of 2), so much so that when initially testing it before dummy plugs, the fan from 1 video card ground loudly against the back of the card in front of it. It was very sudden (at 100% fan speed no less) and I pulled the power plug in a panic. I've since made all 3 "naked" with a drop of AS5 I've had lying around, but I'm not sure if it's good anymore (it's really hard to force out of the tube, but it's still very tacky, like a slow running glue). Regardless, I also tied the PCI-E power cables to parts of the case so it bends the top and bottom card away from the middle one a little bit to prevent the fans from grinding again, and to give access to better air. Finally, when I get a drill that actually works, I'm going to screw this directly on the case side so I can close it. http://www.youtube.com/watch?v=RyMUddH0qzY (not my video, but exact same fan)


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: simonk83 on July 15, 2011, 11:44:04 AM
Very nice, thanks for this :)

2 x 5870's, both running at 970/320
11.6 drivers, 2.4 SDK

Before:  429Mh each
After:  443Mh each

:)


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-11
Post by: Diapolo on July 15, 2011, 12:14:06 PM
Very nice, thanks for this :)

2 x 5870's, both running at 970/320
11.6 drivers, 2.4 SDK

Before:  429Mh each
After:  443Mh each

:)

Another great result :) ... I had other things to do, but during the next week I will release a new version.

Dia


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-11
Post by: P1n3apqlExpr3ss on July 16, 2011, 03:26:26 AM
Works nicely, went from 298~ to 301~ on a 6870 @ 975/440


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-11
Post by: ssateneth on July 17, 2011, 04:02:58 AM
What happened to the estimated huge gains with your unreleased patch? I could've sworn the first page had a difference of about 7 ALU less between current version and unreleased version, now it's only 2. O_o


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-11
Post by: 1984 on July 17, 2011, 04:09:44 AM
355 Mh/s to 370 Mh/s on a stock hd6950, I've seen no difference apart from the increased hash rate, keep up the good work!


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-11
Post by: checkmate on July 17, 2011, 10:37:55 AM
Nice... I went from 295 to 307! Thanks!


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-11
Post by: Diapolo on July 17, 2011, 11:58:43 AM
What happened to the estimated huge gains with your unreleased patch? I could've sworn the first page had a difference of about 7 ALU less between current version and unreleased version, now it's only 2. O_o

I can explain this, I made a change in the kernel, checked it via KernelAnalyzer and it looked good. Then it was time for some sleep, checked the next day with Phoenix and observed, that all shares were invalid, so I had to revert the changes and the numbers on page 1. Sorry, my mistake!

Dia


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Diapolo on July 17, 2011, 01:01:21 PM
Important: since version 2011-07-17 a modified version of __init__.py (for the Phoenix 1.5 miner) is included in this package and has to be used!
The kernel won't work for other Miners without modifications to them, see kernel.cl for further infos.



The new version 2011-07-17 is ready for download :). Should be faster on 58XX and 69XX cards again.
This version will only work, if you use it with Phoenix and the supplied __init__.py file because of modifications to kernel variables!

A very big thank you goes to user Vince for input and ideas!

Download here:
http://www.mediafire.com/?317u0y93u7mnbys

Have fun,
Dia


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: bcheck555 on July 17, 2011, 02:24:35 PM
roughly .5Mhash/s increase with latest. Thanks.
337.20>337.86
5830 1050/350 -k phatkmod DEVICE=1 VECTORS BFI_INT FASTLOOP=false WORKSIZE=256 AGGRESSION=13


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: urstroyer on July 17, 2011, 02:26:00 PM
very nice work, thanks a lot!


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: BeeTeaSea on July 17, 2011, 03:03:18 PM
Hi,

I am getting this error with a ATI 5870. The modified kernel from July 6th works fine.

./phoenix.py -u http://XXX DEVICE=1 -k phatk VECTORS BFI_INT WORKSIZE=256 AGGRESSION=14 FASTLOOP=true -q 8
[17/07/2011 16:58:36] Phoenix 1.50 starting...
[17/07/2011 16:58:36] Connected to server
[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 504, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/python/threadpool.py", line 207, in _worker
    result = context.call(ctx, function, *args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/twisted/python/context.py", line 59, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python2.6/dist-packages/twisted/python/context.py", line 37, in callWithContext
    return func(*args,**kw)
  File "kernels/phatk/__init__.py", line 391, in mineThread
    self.output_buf)
  File "/usr/lib/pymodules/python2.6/pyopencl/__init__.py", line 202, in kernel_call
    self.set_args(*args)
  File "/usr/lib/pymodules/python2.6/pyopencl/__init__.py", line 224, in kernel_set_args
    self.set_arg(i, arg)
pyopencl.LogicError: clSetKernelArg failed: invalid arg size


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Tartarus on July 17, 2011, 03:26:02 PM
Hi,

I am getting this error with a ATI 5870. The modified kernel from July 6th works fine.

./phoenix.py -u http://XXX DEVICE=1 -k phatk VECTORS BFI_INT WORKSIZE=256 AGGRESSION=14 FASTLOOP=true -q 8
[17/07/2011 16:58:36] Phoenix 1.50 starting...
[17/07/2011 16:58:36] Connected to server
[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 504, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/python/threadpool.py", line 207, in _worker
    result = context.call(ctx, function, *args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/twisted/python/context.py", line 59, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python2.6/dist-packages/twisted/python/context.py", line 37, in callWithContext
    return func(*args,**kw)
  File "kernels/phatk/__init__.py", line 391, in mineThread
    self.output_buf)
  File "/usr/lib/pymodules/python2.6/pyopencl/__init__.py", line 202, in kernel_call
    self.set_args(*args)
  File "/usr/lib/pymodules/python2.6/pyopencl/__init__.py", line 224, in kernel_set_args
    self.set_arg(i, arg)
pyopencl.LogicError: clSetKernelArg failed: invalid arg size


Both the new kernel.cl and __init__.py need to go into kernels/phatk/
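
If you are not sure whether both files actually got replaced, a quick way to check is to compare the installed copies with the ones from the downloaded package. The sketch below is just a helper I would use for that, not part of Phoenix; the function name stale_files and both directory arguments are assumptions, and it expects package_dir to contain both files.

```python
import filecmp
import os

def stale_files(package_dir, install_dir,
                names=("kernel.cl", "__init__.py")):
    """Names whose installed copy is missing or differs from the package."""
    stale = []
    for name in names:
        src = os.path.join(package_dir, name)
        dst = os.path.join(install_dir, name)
        # shallow=False forces a byte-by-byte comparison of the contents.
        if not os.path.isfile(dst) or not filecmp.cmp(src, dst, shallow=False):
            stale.append(name)
    return stale
```

For example, stale_files("path/to/unpacked/package", "kernels/phatk") returning ["__init__.py"] would mean the new kernel.cl is running with an old __init__.py, which is the mismatch described above. The first path is a placeholder for wherever you unpacked the download.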


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Diapolo on July 17, 2011, 05:21:17 PM
Both the new kernel.cl and __init__.py need to go into kernels/phatk/

That's it, I thought I made it clear enough :D.

Dia


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Bert on July 17, 2011, 05:43:27 PM
Thanks for the hard work, sent you another donation. 458.88Mh/sec up to 459.41 with the last update on overclocked 5870's running at 1000/347.


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Diapolo on July 17, 2011, 05:54:19 PM
Thanks for the hard work, sent you another donation. 458.88Mh/sec up to 459.41 with the last update on overclocked 5870's running at 1000/347.

Thanks Bert, you are one of the frequent donators and I really appreciate it :)! Have fun with this latest version!

Dia


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Soak on July 17, 2011, 05:55:18 PM
388 to 391 with last update (Radeon HD 6970). Now equal with original poclbm miner. Thank you! ;D


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: streetuff on July 17, 2011, 06:33:24 PM
just wanted to report....

HD5770 from 220 -> 222MH/s
HD5850 from 355 -> 359MH/s

with 2011-07-17


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: ZeroShift on July 17, 2011, 07:56:02 PM
Radeon 5770 @ 935/300: 200 MH/s to 213 MH/s!

Coins are on the way  ;D


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Reuef on July 17, 2011, 08:18:24 PM
5830 went from 313mh/s to 316 mh/s.  1000/400


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: streetuff on July 17, 2011, 08:29:39 PM
my 5770 is at 975/300


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: simonk83 on July 17, 2011, 08:40:44 PM
No change for me with the latest version.   Can't complain though :)


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: ssateneth on July 18, 2011, 03:00:12 AM
Got about 0.75 more on my 5830's. Not sure about my 5870, but it's likely on the order of 1-1.5. Keep up the hard work!

edit: 5870 had about 0.5-0.75 improvement


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: anatolikostis on July 18, 2011, 06:59:09 AM
5770 @ 900 core / 200 mem, 512 MB

Got a decrease from 200 MH/s to 198 MH/s (tried worksize 128/256/64: no improvement, only a further decrease), so I have rolled back to the 2011-07-11 version...

Maybe this is because of the 512 MB memory or the 200 MHz memory clock?


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Diapolo on July 18, 2011, 08:46:21 AM
5770 @ 900 core / 200 mem, 512 MB

Got a decrease from 200 MH/s to 198 MH/s (tried worksize 128/256/64: no improvement, only a further decrease), so I have rolled back to the 2011-07-11 version...

Maybe this is because of the 512 MB memory or the 200 MHz memory clock?


What happens if you raise your mem clocks a little? It could be that some optimisations are bad for older or weaker cards, because the newest version uses more GPU registers than the previous ones. If that's the case, it's trial and error to find which version works best ;).

Dia


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: cyrusfox on July 18, 2011, 09:31:02 AM
Not working for me. phoenix 1.48, ubuntu 11.04 64bits.

[01/07/2011 14:45:48] Finding inner ELF...
[01/07/2011 14:45:48] Patching inner ELF...
[01/07/2011 14:45:48] Patching instructions...
[01/07/2011 14:45:48] BFI-patched 472 instructions...
[01/07/2011 14:45:48] Patch complete, returning to kernel...
[01/07/2011 14:45:48] Applied BFI_INT patch
[01/07/2011 14:45:48] FATAL kernel error: Failed to load OpenCL kernel!


Not working for me. phoenix 1.5, Linuxcoin.

Yes I copied both files kernel.cl and __init__.py into kernels/phatk/

I get this when I try to start Phoenix (it works fine with the original kernel files: 6850@930/830 = 250 Mh/s, 6950@850/735 = 340 Mh/s)

Quote
user@linuxcoin:/opt/miners/phoenix$ ./phoenix.py -u http://user:pass@bitcoinpool.com:8334/ -k phatk DEVICE=0 VECTORS AGGRESSION=7 -v FASTLOOP BFI_INT
[18/07/2011 09:27:56] Finding inner ELF...
[18/07/2011 09:27:56] Patching inner ELF...
[18/07/2011 09:27:56] Patching instructions...
[18/07/2011 09:27:56] BFI-patched 472 instructions...
[18/07/2011 09:27:56] Patch complete, returning to kernel...
[18/07/2011 09:27:56] Applied BFI_INT patch
[18/07/2011 09:27:56] FATAL kernel error: Failed to load OpenCL kernel!

Can anyone tell me what I am doing wrong? Thanks


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: 1984 on July 18, 2011, 09:40:32 AM
cyrusfox have you tried using the PLATFORM= option?


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: damon1492 on July 18, 2011, 10:54:35 PM
Not faster for me, still the same hash rate,
maybe just 0.2% extra but nothing big.

Doing about 24.3 Ghash/s, so not a big difference.


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: navigator on July 19, 2011, 03:45:37 AM
DELETED for privacy


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Diapolo on July 19, 2011, 05:34:54 AM
I get the "Kernel error: Unusual behavior from OpenCL. Hardware problem?" with the 7-17 version

It's only giving the error on my Sapphire 100297L. I've seen it come up twice within a few minutes up to once per hour. I also have a 100297-2L in the same box and it doesn't produce the error. I reverted back to 7-11 version and so far the error hasn't come up. Both cards are currently running at 1000/350 stock voltage 62c max. Even if I overclock to something unstable I have never gotten this error. I am currently watching the 7-11 version. If the error pops up even once I will report back.


In the other thread it seems that the error occurred because of a faulty card or too aggressive overclocking. It seems the new kernel pushes the cards a bit harder. If the error only occurs once every hour, it should not be a big problem in terms of generated shares. What OS, SDK and driver are you on?

Dia


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: navigator on July 19, 2011, 07:20:25 AM
DELETED for privacy


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: ovidiusoft on July 20, 2011, 07:40:29 PM
I also rolled back to 07-11, 07-17 was giving me "Kernel error: Unusual behavior from OpenCL. Hardware problem?" just like @navigator had. Board is Sapphire 5830 Xtreme, GPU @1040, RAM @350, 334 MH/s with 07-11. 07-07 was also stable and getting 331 MH/s.


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Diapolo on July 23, 2011, 09:40:20 AM
Reposted 2011-07-17 version because of a small mistake in variable naming. T1substate0 was wrong, it has to be state0subT1.
There are no further changes, so nothing is different for those who grabbed the version before this posting!

Currently no news for you guys, perhaps I can do a special version for 69XX cards (which could be 1 - 2 ALU OPs faster, but slower for 58XX), when there is demand. But for 58XX cards I'm out of optimisation ideas ;).

Dia


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Diapolo on July 23, 2011, 10:09:21 PM
To all 69XX card owners, that want 1 ALU OP less, down to 1697 :). Just edit the kernel.cl file and replace Line 385 (DL the latest 2011-07-17 version):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works :). Remember, this WILL be slower for 58XX owners, so don't try this, if you are on 58XX cards or even (s)lower!
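
If you prefer not to edit by hand, the replacement can be sketched in Python. This is only an illustration: it matches the statement text instead of relying on line 385, so it assumes the statement appears exactly once and exactly as written above. The function name `patch_kernel_source` is hypothetical.

```python
# The two statements quoted above, copied verbatim.
OLD = "Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);"
NEW = "Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);"


def patch_kernel_source(source):
    """Return the kernel source with the '+=' statement expanded.

    Raises ValueError unless the old statement occurs exactly once, so an
    already patched or unexpected kernel.cl is left untouched.
    """
    count = source.count(OLD)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return source.replace(OLD, NEW)
```

Read kernel.cl into a string, pass it through the function, and write the result back only if no error was raised.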

Dia


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: bcheck555 on July 26, 2011, 03:13:49 AM
To all 69XX card owners, that want 1 ALU OP less, down to 1697 :). Just edit the kernel.cl file and replace Line 385 (DL the latest 2011-07-17 version):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works :). Remember, this WILL be slower for 58XX owners, so don't try this, if you are on 58XX cards or even (s)lower!

Dia

twin frozr III 6950: 950Mhz, 825Mhz, 1.15v

7-17 kernel
388.55Mhash/s VECTORS BFI_INT FASTLOOP=false WORKSIZE=128 AGGRESSION=13

7-17 kernel modded with above
390.04Mhash/s VECTORS BFI_INT FASTLOOP=false WORKSIZE=64 AGGRESSION=13


edit for clarification...


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: norulezapply on July 26, 2011, 11:56:49 AM
Excellent thanks! Extra 4%!


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: sealkid on July 27, 2011, 01:56:24 PM
super cool! on my 5850 - increased from 360Mh/s to 385Mh/s - thanks!!!


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Boatski on July 28, 2011, 12:53:16 AM
great stuff!


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: jh1523 on July 28, 2011, 01:16:19 AM
Are these changes in the phatk svn now? Compared to a svn checkout from 7/25 I see absolutely no change in hash rates on a 5850 or a 5870.


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: ptfff on July 28, 2011, 05:57:22 AM
thanks


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Diapolo on July 28, 2011, 06:39:54 AM
Are these changes in the phatk svn now? Compared to a svn checkout from 7/25 I see absolutely no change in hash rates on a 5850 or a 5870.

I don't know about a phatk svn, sorry. This version here will and should give you higher performance compared to the stock phatk. Just do a diff between my file here and the one on the svn, to check for differences.

Dia


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: jh1523 on July 28, 2011, 10:50:44 AM
Here's what I get when I do a diff between your kernel and the one from svn (diff kernel.diapolo kernel.svn):

Code:
79c79
< // state0subT1: state0subT1 = state0 - T1
---
> // T1substate0: T1substate0 = T1 - substate0
88c88
<                                               const uint PreVal4addT1, const uint state0subT1,
---
>                                               const uint PreVal4addT1, const uint T1substate0,
100c100
<       Vals[0] = W_3 + PreVal4addT1 + state0subT1;
---
>       Vals[0] = W_3 + PreVal4addT1 + T1substate0;
390c390
<               output[OUTPUT_SIZE] = output[OUTPUT_MASK & (W[3].x >> 2)] = W_3.x;
---
>               output[OUTPUT_SIZE] = output[(W[3].x >> 2) & OUTPUT_MASK] = W_3.x;
394c394
<               output[OUTPUT_SIZE] = output[OUTPUT_MASK & (W[3].y >> 2)] =  W_3.y;
---
>               output[OUTPUT_SIZE] = output[(W[3].y >> 2) & OUTPUT_MASK] =  W_3.y;
399c399
<               output[OUTPUT_SIZE] = output[OUTPUT_MASK & (W[3] >> 2)] = W_3;
---
>               output[OUTPUT_SIZE] = output[(W[3] >> 2) & OUTPUT_MASK] = W_3;

The repository is at http://svn3.xp-dev.com/svn/phoenix-miner/trunk/kernels/phatk




Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Diapolo on July 28, 2011, 12:27:50 PM
Here's what I get when I do a diff between your kernel and the one from svn (diff kernel.diapolo kernel.svn):

Code:
79c79
< // state0subT1: state0subT1 = state0 - T1
---
> // T1substate0: T1substate0 = T1 - substate0
88c88
<                                               const uint PreVal4addT1, const uint state0subT1,
---
>                                               const uint PreVal4addT1, const uint T1substate0,
100c100
<       Vals[0] = W_3 + PreVal4addT1 + state0subT1;
---
>       Vals[0] = W_3 + PreVal4addT1 + T1substate0;
390c390
<               output[OUTPUT_SIZE] = output[OUTPUT_MASK & (W[3].x >> 2)] = W_3.x;
---
>               output[OUTPUT_SIZE] = output[(W[3].x >> 2) & OUTPUT_MASK] = W_3.x;
394c394
<               output[OUTPUT_SIZE] = output[OUTPUT_MASK & (W[3].y >> 2)] =  W_3.y;
---
>               output[OUTPUT_SIZE] = output[(W[3].y >> 2) & OUTPUT_MASK] =  W_3.y;
399c399
<               output[OUTPUT_SIZE] = output[OUTPUT_MASK & (W[3] >> 2)] = W_3;
---
>               output[OUTPUT_SIZE] = output[(W[3] >> 2) & OUTPUT_MASK] = W_3;

The repository is at http://svn3.xp-dev.com/svn/phoenix-miner/trunk/kernels/phatk

Now it's clear: the one on svn is the same speed-wise. I only uploaded mine again to rename a variable and to reorder the operands of the output buffer index. Nice that Jedi took my kernel as the base for Phoenix, I was not sure about that :).
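
A small Python sketch shows why the two variants in the diff behave the same. Bitwise AND is commutative, so the operand order of the mask does not change the index, and `state0subT1` is simply an unsigned 32-bit subtraction. The value of `OUTPUT_MASK` below is a made-up placeholder; the real one is defined in kernel.cl.

```python
UINT32_MASK = 0xFFFFFFFF
OUTPUT_MASK = 0xFF  # hypothetical value, for illustration only


def index_diapolo(w3):
    """Index as written in the re-uploaded kernel: OUTPUT_MASK & (W[3] >> 2)."""
    return OUTPUT_MASK & (w3 >> 2)


def index_svn(w3):
    """Index as written in the svn kernel: (W[3] >> 2) & OUTPUT_MASK."""
    return (w3 >> 2) & OUTPUT_MASK


def state0_sub_t1(state0, t1):
    """state0 - T1 with 32-bit wrap-around, like a uint in OpenCL."""
    return (state0 - t1) & UINT32_MASK
```

Both index functions return the same value for every 32-bit input, which matches the observation that the svn kernel runs at the same speed.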

Dia


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Diapolo on July 30, 2011, 06:15:19 PM
I'm working on a new version! The inputs came from the original Author of phatk, who released a version 2.0 of phatk (THANKS Phateus).
Currently my version IS slower, but I see this as a fair and cool competition, from which all of us will benefit in the end.

Dia


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: jh1523 on July 30, 2011, 06:16:43 PM
*fingers crossed*


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: shotgun on July 30, 2011, 08:31:28 PM
I'm working on a new version! The inputs came from the original Author of phatk, who released a version 2.0 of phatk (THANKS Phateus).
Currently my version IS slower, but I see this as a fair and cool competition, from which all of us will benefit in the end.

Dia

where is the download link for version 2.0 of phatk?


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: joulesbeef on July 30, 2011, 08:35:59 PM
phatk 2.0 is in the phatk thread (http://forum.bitcoin.org/?topic=7964.0)

but it sounds like you might want to wait til monday

Quote
Update:  Second new version with a few optimizations from Diapolo's Miner It appears to be somewhat broken, I will post an update this weekend


Title: Re: further improved phatk OpenCL kernel (> 2% increase) for Phoenix - 2011-07-01
Post by: cyrusfox on July 31, 2011, 07:45:00 PM
Not working for me. phoenix 1.48, ubuntu 11.04 64bits.

[01/07/2011 14:45:48] Finding inner ELF...
[01/07/2011 14:45:48] Patching inner ELF...
[01/07/2011 14:45:48] Patching instructions...
[01/07/2011 14:45:48] BFI-patched 472 instructions...
[01/07/2011 14:45:48] Patch complete, returning to kernel...
[01/07/2011 14:45:48] Applied BFI_INT patch
[01/07/2011 14:45:48] FATAL kernel error: Failed to load OpenCL kernel!


Not working for me. phoenix 1.5, Linuxcoin.

Yes I copied both files kernel.cl and __init__.py into kernels/phatk/

I get this when I try to start Phoenix(it will work fine with the original kernel files, 6850@930/830=250Mh/s, 6950@850/735= 340 Mh/s)

Quote
user@linuxcoin:/opt/miners/phoenix$ ./phoenix.py -u http://user:pass@bitcoinpool.com:8334/ -k phatk DEVICE=0 VECTORS AGGRESSION=7 -v FASTLOOP BFI_INT
[18/07/2011 09:27:56] Finding inner ELF...
[18/07/2011 09:27:56] Patching inner ELF...
[18/07/2011 09:27:56] Patching instructions...
[18/07/2011 09:27:56] BFI-patched 472 instructions...
[18/07/2011 09:27:56] Patch complete, returning to kernel...
[18/07/2011 09:27:56] Applied BFI_INT patch
[18/07/2011 09:27:56] FATAL kernel error: Failed to load OpenCL kernel!

Can anyone tell me what I am doing wrong? Thanks

It took setting up a second mining machine to figure out what I did wrong here. For some reason I had to use sudo the first time I started the miner with the changed kernel file (really quite odd). Anyway, my results:
Card, prior hash rate, current hash rate
5770, 212 Mhash to 229 Mhash (8% improvement!)
6950, 341 Mhash to 344 Mhash

So if you are using Linux and it gives you any sort of failure, try it again with superuser privileges; that has worked every time so far. Thanks so much, I will make a humble donation.
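
The percentages quoted in posts like this one are plain relative gains. A minimal sketch of the arithmetic, using a hypothetical helper named `percent_gain`:

```python
def percent_gain(before, after):
    """Relative change from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("the prior hash rate must be positive")
    return (after - before) / before * 100.0
```

With the numbers above, 212 to 229 Mhash is a gain of roughly 8%, while 341 to 344 Mhash is a bit under 1%.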


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Sick_Boy on July 31, 2011, 08:08:37 PM
with this kernel on my hd 5850 I reach 415MH/s, I was at 400 before with core at 1000MHz and memory at 350MHz


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: Diapolo on August 02, 2011, 08:23:51 PM
Hey guys, I'm making good progress for 69XX users and need a few to test my current version as I want to mature it before release.

- only 69XX users / users on VLIW4 design cards
- I need Win and Linux users
- I need SDK 2.1 / SDK 2.4 and SDK 2.5 users

Drop me a PM, if you want to participate and meet the requirements, I will then send you a DL link! If you have received the files the test results may be posted here and may be freely discussed, so that's no shitty NDA or something :). I only want to make sure that the kernel works and can compete with phatk 2.1 :D.

I'm sorry to say, but 58XX users are better with phatk 2.1 for now :-/.

Regards,
Dia


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-07-17
Post by: UrbanAdventurer on August 03, 2011, 05:56:12 AM
Just updated. About +7Mh Thanks


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-08-04
Post by: Diapolo on August 04, 2011, 07:33:40 PM
New version was just released, it should be the fastest for 69XX cards:
Download version 2011-08-04 (pre-release): http://www.mediafire.com/?upwwud7kfyx7788

This is the preferred switch for Phoenix in order to achieve comparable performance:
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=256
or
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=128

Please test this version with SDK 2.4 / SDK 2.5! SDK 2.1 performance seems worse, but at least it should work. Report any errors and problems here and let me know what you think.
Have a look at your cards' temperatures; I got a report that they may be lower, which would be great :).

Regards,
Dia


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-08-04
Post by: Soak on August 04, 2011, 07:47:49 PM
Quote
-k phatk VECTORS VECTORS2 BFI_INT WORKSIZE=128 AGGRESSION=11 DEVICE=1

Radeon 6970, 391 to 393. Nice :)


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: st4rdust on August 04, 2011, 08:35:07 PM
I modified the shipping phatk Kernel from Phoenix 1.50. I now get round about 13 MHash/s more on my 5830 (up from 310 to 323)!
The modified version seems to work, Hashes are accepted and no errors are observed so far.

Brilliant. I also run 5830s and got the same result as you, from ~310 to 322. Keep up the good work, myself and every other miner who uses phatk cannot overstate how much we appreciate your efforts.


Title: Re: further improved phatk OpenCL kernel (> 3% increase) for Phoenix - 2011-08-04
Post by: Clipse on August 05, 2011, 12:23:09 AM
New version was just released, it should be the fastest for 69XX cards:
Download version 2011-08-04 (pre-release): http://www.mediafire.com/?upwwud7kfyx7788

This is the preferred switch for Phoenix in order to achieve comparable performance:
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=256
or
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=128

Please test this version with SDK 2.4 / SDK 2.5! SDK 2.1 performance seems worse, but at least it should work. Report any errors and problems here and let me know what you think.
Have a look at your cards temperatures, I got a report, that they may be lower, which would be great :).

Regards,
Dia

Well I'll be damned, you do mention it's for the 69xx range, but I'm on a 5850 and I got a boost of 9mh/s per card using the 08-04 pre-release version with VECTORS VECTORS4, hot damn :)

Just to give everyone an idea of the gains:

stock phatk: 400mh maxed out (1000/180)
08-04 phatk: 440mh maxed out (1000/180)
08-04 phatk: 475mh maxed out (1100/180)
08-04 phatk: 500mh+ is my guess (1200/180) <---- This is in the pipeline, mostly just for interest sake I will be LN2 cooling it and running voltage at 1.4v ;) card might die but 500mh will be sexy.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: littlekid on August 05, 2011, 01:46:11 AM
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=256
My 5770@970/300 improved from 220 -> 223 MH/s.
Cyclone 6850@1030/300: nothing changed, still 280 MH/s.
I have not tested the ''VECTORS4'' switch yet.
Does anyone have another switch configuration for further improvement?


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: Psychonautchn on August 05, 2011, 03:56:56 AM
gotta try on my 5870, thx!


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: Diapolo on August 07, 2011, 03:12:37 PM
Updated 1st post kernel performance data with SDK 2.5 and KernelAnalyzer 1.9 Cal 11.7 profile.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: ezdvd on August 07, 2011, 06:15:19 PM
gave me 25 more mhash on my 5870!!!


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: saykor on August 07, 2011, 07:11:36 PM
Strange results with a 5870:
with vectors - 454 mhash
with vectors2 - drops by 50 mhash
with vectors4 - drops by 200 mhash


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: Diapolo on August 07, 2011, 07:17:13 PM
strange with card 5870:
with vectors - 454mhash
with vectors2 - drop with 50mhash
with vectors4 - drop with 200mhash

What kernel version, what driver version, which SDK, please post command line for Phoenix ... seems very strange to me.

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: saykor on August 07, 2011, 07:23:57 PM
strange with card 5870:
with vectors - 454mhash
with vectors2 - drop with 50mhash
with vectors4 - drop with 200mhash

What kernel version, what driver version, which SDK, please post command line for Phoenix ... seems very strange to me.

Dia

I just installed the new drivers:
AMD-APP-SDK-v2.5-Windows-64
11-7_vista64_win7_64_dd_ccc_ocl
2011-08-04 kernel (pre-release)


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: Diapolo on August 07, 2011, 07:27:03 PM
To all happy new kernel users, there is one thing you should know ... there have been NO donations since 2011-07-31, which makes me a bit sad.

It's my free time that I put in here (many hours so far), and the motivation is not only to get a "Thank you!". Remember, you guys generate more BTC with the kernel mods. It doesn't matter if it's my mod, Phateus' mod or anyone else's mod ... just be a little thankful and you keep a free and fast kernel + a motivated kernel mixer Diapolo ;).

No offense to all the great people who already donated a few bitcents or even more, who helped me testing this, who helped me fix bugs or who added great ideas into this work!

Regards,
Diapolo


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: Diapolo on August 07, 2011, 07:28:16 PM
strange with card 5870:
with vectors - 454mhash
with vectors2 - drop with 50mhash
with vectors4 - drop with 200mhash

What kernel version, what driver version, which SDK, please post command line for Phoenix ... seems very strange to me.

Dia

i just install the new drivers:
AMD-APP-SDK-v2.5-Windows-64
11-7_vista64_win7_64_dd_ccc_ocl
2011-08-04 kernel (pre-release)


How do you start Phoenix? What are your clock speeds?

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: saykor on August 07, 2011, 07:36:21 PM
How do you start Phoenix? What are your clock speeds?

Dia

phoenix -u http://xxx:xxx@api.bitcoin.cz:8332 -k phatk DEVICE=0 VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256

clock 982
memory 352
voltage - default

thanks
saykor


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: Diapolo on August 07, 2011, 07:46:54 PM
How do you start Phoenix? What are your clock speeds?

Dia

phoenix -u http://xxx:xxx@api.bitcoin.cz:8332 -k phatk DEVICE=0 VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256

clock 982
memory 352
voltage - default

thanks
saykor

If you use VECTORS, you won't use vectors at all with the current pre-release version (the switch VECTORS is ignored). You HAVE to use the switch VECTORS2, take a look at the first posting.

Edit: Did you replace the Phoenix init file, too?
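
For anyone who keeps their Phoenix arguments in a script, here is a hedged Python sketch of the switch change. The helper `add_vectors2` is hypothetical and not part of Phoenix; it only assumes what is stated in this thread, namely that the old VECTORS switch alone no longer enables vectors and that VECTORS2 or VECTORS4 has to be given.

```python
def add_vectors2(args):
    """Return a copy of args with VECTORS2 added after a lone VECTORS switch.

    If VECTORS2 or VECTORS4 is already present, or VECTORS is not used
    at all, the arguments are returned unchanged.
    """
    if "VECTORS2" in args or "VECTORS4" in args:
        return list(args)
    out = []
    added = False
    for arg in args:
        out.append(arg)
        if arg == "VECTORS" and not added:
            # Keep the old switch and place the new one right behind it.
            out.append("VECTORS2")
            added = True
    return out
```

The old switch is kept on purpose so that the same argument list still enables vectors with an older kernel.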

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: saykor on August 07, 2011, 07:48:23 PM
How do you start Phoenix? What are your clock speeds?

Dia

phoenix -u http://xxx:xxx@api.bitcoin.cz:8332 -k phatk DEVICE=0 VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256

clock 982
memory 352
voltage - default

thanks
saykor

If you use VECTORS, you won't use vectors at all with the current pre-release version (the switch VECTORS is ignored). You HAVE to use the switch VECTORS2, take a look at the first posting.

Dia

When I use VECTORS2, the hash rate drops by 50 mhash, from 454 to 404.

Yes, I replaced all files in the package.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: cubemonkey on August 07, 2011, 10:01:40 PM
My miners went from 355 to 362 on 6950 based systems running ubuntu natty 64 bit.  I tried various flags, and this is what worked best for me:

-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=128

WORKSIZE=256 was slightly slower. I tried VECTORS4 just for fun, and it dropped down to 300 or so.

My 6950s are running at 880Mhz w/ 800 mhz memory clock.  (no shader unlock)

Thank you for all of your hard work.  Donation sent.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: M4rk5M4n on August 08, 2011, 01:26:08 PM
thx dude^^ gave me a lot of extra power on my 5850s (995/1250) (under 370 before -> 410 with phateus / 405 with yours)
[setup = 11.8, 2.5 SDK, 2 5850 @ 995/1250 (watercooled), phoenix 1.50 (-v -q 2 -a 50 -u http://y:x@mining.eligius.st:8337/ -k phatk VECTORS4 BFI_INT WORKSIZE=128 AGGRESSION=13 DEVICE=0/1)]

Some things that came to my mind:

I compared your 8-4 to phateus 2.1 today and let the whole setup run with the same settings for a few hours. Although the phoenix instance with your kernel showed fewer MH/s, I got more accepted shares (~10%) and fewer stales (less than half) on the same pool in the same running time.

Since 7-11 I get 2-3 rejects EVERY TIME longpoll pushes new work, no matter which pool, which clocks or which settings (yours produces fewer than phat's kernel, though).

Here's a screenshot of my test (phateus is left, dia is right -> had a little under 1700 shares and a little over 20 rejects). Watch the left one at 15.05.42 or 15.05.28, or the right one at 15.05.41 or 15.06.02, to see what I mean by the rejects that are caused by longpoll (that was mining at eligius, but it's the same with every pool).

The screen:
http://img69.imageshack.us/img69/8787/miningscreen.png


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: jh1523 on August 08, 2011, 04:45:24 PM
Can't get the new kernel to work at all in Linux (using BAMT) with the phoenix miner. The miner simply refuses to start with the new kernel, even with changed parameters (VECTORS2 instead of VECTORS). At first I thought it was a problem with line endings (DOS vs Unix-style), since the files from mediafire are in DOS format, but even after converting with dos2unix it still won't work. The miner works perfectly well with the previous version of the kernel.
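
For reference, the line-ending conversion mentioned here can also be sketched in Python when dos2unix is not at hand. This is a simplified stand-in that only turns CRLF into LF; the function names are made up for illustration.

```python
def dos_to_unix(data):
    """Convert DOS (CRLF) line endings in a bytes object to Unix (LF)."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite the file at `path` with LF endings; return True if it changed."""
    with open(path, "rb") as handle:
        original = handle.read()
    converted = dos_to_unix(original)
    if converted != original:
        with open(path, "wb") as handle:
            handle.write(converted)
    return converted != original
```

Files are opened in binary mode so that Python does not translate the line endings on its own.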


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: Diapolo on August 08, 2011, 04:56:05 PM
Can't get the new kernel to work at all in Linux (using BAMT) with the phoenix miner. The miner simply refuses to start with the new kernel, even with changed parameters (VECTORS2 instead of VECTORS). At first I thought it was a problem with line endings (DOS vs Unix-style), since the files from mediafire are in DOS format, but even after converting with dos2unix it still won't work. The miner works perfectly well with the previous version of the kernel.

Did you start Phoenix via sudo the first time because of the OpenCL kernel compilation?

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: jh1523 on August 08, 2011, 05:02:08 PM
Can't get the new kernel to work at all in Linux (using BAMT) with the phoenix miner. The miner simply refuses to start with the new kernel, even with changed parameters (VECTORS2 instead of VECTORS). At first I thought it was a problem with line endings (DOS vs Unix-style), since the files from mediafire are in DOS format, but even after converting with dos2unix it still won't work. The miner works perfectly well with the previous version of the kernel.

Did you start Phoenix via sudo the first time because of the OpenCL kernel compilation?

Dia

Yep, started as root.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: Diapolo on August 08, 2011, 05:07:03 PM
Can't get the new kernel to work at all in Linux (using BAMT) with the phoenix miner. The miner simply refuses to start with the new kernel, even with changed parameters (VECTORS2 instead of VECTORS). At first I thought it was a problem with line endings (DOS vs Unix-style), since the files from mediafire are in DOS format, but even after converting with dos2unix it still won't work. The miner works perfectly well with the previous version of the kernel.

Did you start Phoenix via sudo the first time because of the OpenCL kernel compilation?

Dia

Yep, started as root.

Did Phoenix create a new .elf file (if that's the case in Linux)? Any output or error message?

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: jh1523 on August 08, 2011, 05:11:34 PM
No, no new .elf file. Last .elf is from 7/28 (the last time I messed with the kernel). Haven't seen any error messages either.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: Diapolo on August 08, 2011, 07:40:27 PM
No, no new .elf file. Last .elf is from 7/28 (the last time I messed with the kernel). Haven't seen any error messages either.


Well it should have created a new .elf for the new kernel ... there seems to be something wrong.

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: codyrbrown on August 08, 2011, 07:42:44 PM
Thanks, this topic helped me in the past as well.  Keep up the good work.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: jh1523 on August 08, 2011, 08:17:46 PM
No, no new .elf file. Last .elf is from 7/28 (the last time I messed with the kernel). Haven't seen any error messages either.


Well it should have created a new .elf for the new kernel ... there seems to be something wrong.

Dia

When I try again after re-downloading the files and converting them to Unix format:
Code:
  python phoenix.py -u user:pass@pool:8332 -k phatk DEVICE=0 VECTORS2 BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=128

I get the following message:

Code:
Unknown protocol:

FWIW the pool is deepbit. Still no new .elf is created.
 


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: Diapolo on August 09, 2011, 04:28:52 AM
No, no new .elf file. Last .elf is from 7/28 (the last time I messed with the kernel). Haven't seen any error messages either.


Well it should have created a new .elf for the new kernel ... there seems to be something wrong.

Dia

When I try again after re-downloading the files and converting them to Unix format:
Code:
  python phoenix.py -u user:pass@pool:8332 -k phatk DEVICE=0 VECTORS2 BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=128

I get the following message:

Code:
Unknown protocol:

FWIW the pool is deepbit. Still no new .elf is created.
 

You could try to add http:// in front of user / pass:
Code:
python phoenix.py -u http://user:pass@pool:8332 -k phatk DEVICE=0 VECTORS2 BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=128

Dia
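
For anyone building the Phoenix command line from a wrapper script, the same fix can be applied before the URL is passed to -u. This is only a minimal sketch, assuming the missing scheme is the sole problem; ensure_scheme is a hypothetical helper and not part of Phoenix:

```python
def ensure_scheme(url, default_scheme="http"):
    """Prefix the URL with a scheme if it does not already carry one."""
    if "://" in url:
        return url
    return f"{default_scheme}://{url}"


# Placeholder credentials and host, as used in the posts above.
print(ensure_scheme("user:pass@pool:8332"))
print(ensure_scheme("http://user:pass@pool:8332"))
```

A URL that already has a scheme is returned unchanged, so the helper can be applied unconditionally.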


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: Enterpol on August 09, 2011, 08:57:52 AM
Thanks for this. Only 3 more hashes/s on my 5830, but it allows me to clock it down about 5-10 MHz without losing hashes. This leads to a more stable card! Awesome! ;D


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: jh1523 on August 09, 2011, 10:19:13 AM
You could try to add http:// in front of user / pass:
Code:
python phoenix.py -u http://user:pass@pool:8332 -k phatk DEVICE=0 VECTORS2 BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=128

Dia

Duh! That works, thanks! However it still doesn't work with BAMT's automatic mine scripts (but this isn't the kernel's problem).

Nonetheless I get 4Mhash/s less with the new kernel compared to the old one on a 5870 (466Mhash/s vs 470Mhash/s, all other things being equal except for using VECTORS2 with the new kernel).


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: typhoon on August 09, 2011, 12:57:14 PM
I can't wait to give this kernel a shot when I get home!


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-04
Post by: cletus815 on August 10, 2011, 01:35:48 AM
This is great work and I can't believe I'm just finding it. Thanks!


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Diapolo on August 11, 2011, 03:40:40 PM
Download version 2011-08-11: http://www.mediafire.com/?s5c7h4r91r4ad4j

New version for your testing pleasure ;). Remember to use VECTORS2 as switch!
This one should be a bit faster for 58XX and 69XX cards compared to earlier versions, PLUS it should not generate invalid shares if more than 1 positive nonce is found in a work-group!

If a few of you could make a comparison (with older or other kernel versions) of accepted shares over a certain period of time, this would be pretty cool!

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: jh1523 on August 11, 2011, 03:57:08 PM
So far my results are as follows (all other things being equal, except for using VECTORS2 instead of VECTORS on all kernels since 8/4/11):

kernel from SVN 7/25/11: 470MHash/s, approx 0.5% stale shares
Diapolo kernel 8/4/11: 466Mhash/s, didn't run long enough for % stales
Diapolo kernel 8/11/11: 467 Mhash/s, stale test in progress.

All of this is on a 5870.

On a 5850, same remarks as above:

SVN 7/25/11: 381Mhash/s, 0.5% avg stales
Diapolo 8/4/11: not tested
Diapolo 8/11/11: 381Mhash/s, stales in progress.

(edit) after 537 shares on the 5870 and 423 on the 5850, 0% stales on either.

(edit2) around share #580, the 5870 gave me a " Kernel error: Unusual behavior from OpenCL. Hardware problem?" but then continued mining.

(edit3) after 2 1/2 hours:
5870: 932 shares, 10 stales (1.06%)
5850: 758 shares, 14 stale (1.69%)

I am switching back to the kernel from 7/25; on my system I get better performance from it.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Una on August 12, 2011, 04:48:41 AM
Diapolo:
Have you tried using the APP Profiler for actual execution stats?
I've found it to be a much better way of comparing different kernel versions.
The method I've used is:
  • Run it for a fixed period of time. Just a couple minutes is fine.
  • Open up the csv output in a spreadsheet.
  • Average the ALUBusy column, then the ALUPacking column.
  • Convert the averages from percent to decimal (99.42 to .9942) and multiply them together.
  • Multiply that number by the device SP count. (5770=800, 5870=1600, 6950=1400, 6970=1536, etc..)
  • Divide by ALUInsts.

Now you have a number (for that specific device only) which paints a more complete picture of actual performance.
This also shows how (but not why) the last couple of versions have performed slower on 69xx.
While the number of executed instructions has gone down, the overall SP utilization has also gone down.

-Una
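
The steps above can be sketched in Python. This is a hedged illustration, not Una's actual spreadsheet: it assumes the profiler CSV has a plain header row with the column names ALUBusy, ALUPacking and ALUInsts, that "divide by ALUInsts" means the average of that column, and the sample rows are made-up numbers for demonstration only.

```python
import csv
import io


def utilization_score(csv_text, sp_count):
    """avg(ALUBusy) * avg(ALUPacking) as fractions, times SP count, divided by avg(ALUInsts)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    def mean(column):
        return sum(float(row[column]) for row in rows) / len(rows)

    busy = mean("ALUBusy") / 100.0        # percent -> fraction
    packing = mean("ALUPacking") / 100.0  # percent -> fraction
    return busy * packing * sp_count / mean("ALUInsts")


# Hypothetical sample data, NOT real profiler output.
SAMPLE = "\n".join([
    "Method,ALUInsts,ALUBusy,ALUPacking",
    "search,1400,70.0,98.0",
    "search,1400,68.0,99.0",
])

# 1600 is the SP count given above for a 5870.
print(utilization_score(SAMPLE, sp_count=1600))
```

A higher score means more useful ALU work per executed instruction on that device; as noted above, scores are only comparable between kernels run on the same card.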


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Diapolo on August 12, 2011, 05:07:56 AM
Diapolo:
Have you tried using the APP Profiler for actual execution stats?
I've found it to be a much better way of comparing different kernel versions.
The method I've used is:
  • Run it for a fixed period of time. Just a couple minutes is fine.
  • Open up the csv output in a spreadsheet.
  • Average the ALUBusy column, then the ALUPacking column.
  • Convert the averages from percent to decimal (99.42 to .9942) and multiply them together.
  • Multiply that number by the device SP count. (5770=800, 5870=1600, 6950=1400, 6970=1536, etc..)
  • Divide by ALUInsts.

Now you have a number (for that specific device only) which paints a more complete picture of actual performance.
This also shows how (but not why) the last couple of versions have performed slower on 69xx.
While the number of executed instructions has gone down, the overall SP utilization has also gone down.

-Una

Not yet. I had the impression that this only works for V++ non-Express Editions!? I will take a look now :).
Are you skilled at interpreting APP Profiler results?

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: mute20 on August 12, 2011, 05:18:59 AM
This worked very well. I got a bigger increase than from overclocking by 50 MHz, which would crash my computer.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Una on August 12, 2011, 05:39:02 AM

Not yet. I had the impression that this only works for V++ non-Express Editions!? I will take a look now :).
Are you skilled at interpreting APP Profiler results?

Dia


I have no idea about the Windows version, but the Linux version works completely standalone.
I wouldn't say I'm skilled at all. It just happened to be bundled with the SDK, and seemed like it would be helpful.
I more or less just figured out what was important in the output and figured out a way to directly compare results.
Having actual runtime data seems like it would be more helpful than the projections the kernel analyzer gives you.

-Una


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: JBDive on August 12, 2011, 05:49:32 AM
NICE!

Dropped the new kernel into GUIMiner's sub folders, created a new Phoenix Miner with the default switch listed on page 1 and BAM a solid 10% increase if not a little more and so far stales are zero in the 20 minutes I have been running it using Bitcoins.LC pool.

I can't send much but I will certainly send something, will see how the pool goes over the next day and how well the new kernel holds up.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Diapolo on August 12, 2011, 05:54:24 AM
I did a short test with the latest kernel and it seems the ALUBusy value of 68.9% is way too low: only about 69% of GPU time is used for ALU instruction processing. The ALUPacking of 98.5% is nearly ideal.

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: tiberiandusk on August 12, 2011, 06:16:27 AM
My 5870 went from 436 to 398.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Diapolo on August 12, 2011, 06:38:46 AM
My 5870 went from 436 to 398.

VECTORS2?


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: jh1523 on August 12, 2011, 11:51:35 AM
As an addition to my post above, I have tried to test the new kernel in Windows as well, using guiminer/phoenix. When replacing the default kernel.cl and __init__.py with the ones from kernel 8/11/11 followed by creating a new miner I get the following:

Code:
2011-08-12 07:44:40: Running command: C:\guiminer-20110701\guiminer\phoenix.exe -u http://user:pass@pool:8332 PLATFORM=0 DEVICE=0 -k phatk BFI_INT  VECTORS2  FASTLOOP=false WORKLOAD=128 AGGRESSION=11
2011-08-12 07:44:40: Listener for "testdiapolo" started
2011-08-12 07:44:41: Listener for "testdiapolo": [12/08/2011 07:44:41] FATAL kernel error: Failed to load OpenCL kernel!

As a further test I put back the original kernel.cl and __init__.py, and the test miner works fine with them.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: tiberiandusk on August 12, 2011, 11:35:48 PM
My 5870 went from 436 to 398.

VECTORS2?

Thanks, I knew I was missing something. I'm at 451 now.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: aeonf on August 13, 2011, 10:01:13 AM
I have the exact same problem as jh1523 .


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: lenguyenphat on August 13, 2011, 10:36:09 AM
Is there any download link different than mediafire? Unable to download!


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: rafuter on August 13, 2011, 01:19:55 PM
Is it also possible to update to the latest OpenCL driver, for example the ATI SDK, before the program starts?

Why don't you add such a feature before improving speeds?


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Diapolo on August 13, 2011, 02:32:26 PM
As an addition to my post above, I have tried to test the new kernel in Windows as well, using guiminer/phoenix. When replacing the default kernel.cl and __init__.py with the ones from kernel 8/11/11 followed by creating a new miner I get the following:

Code:
2011-08-12 07:44:40: Running command: C:\guiminer-20110701\guiminer\phoenix.exe -u http://user:pass@pool:8332 PLATFORM=0 DEVICE=0 -k phatk BFI_INT  VECTORS2  FASTLOOP=false WORKLOAD=128 AGGRESSION=11
2011-08-12 07:44:40: Listener for "testdiapolo" started
2011-08-12 07:44:41: Listener for "testdiapolo": [12/08/2011 07:44:41] FATAL kernel error: Failed to load OpenCL kernel!

As a further test I put back the original kernel.cl and __init__.py, and the test miner works fine with them.

What is WORKLOAD=128? This should be WORKSIZE=128!

Dia
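
A mistyped switch like this is easy to miss in a long command line. Here is a small hedged sketch that flags option names it does not recognise. KNOWN_OPTIONS is an assumption: it lists only the switches named in this thread, not everything Phoenix accepts, and treating names case-insensitively is also an assumption. unknown_options is a hypothetical helper, not part of Phoenix.

```python
# Only the kernel switches mentioned in this thread (assumed, not exhaustive).
KNOWN_OPTIONS = {"AGGRESSION", "BFI_INT", "DEVICE", "FASTLOOP",
                 "PLATFORM", "VECTORS2", "VECTORS4", "WORKSIZE"}


def unknown_options(kernel_args):
    """Return option names from a kernel argument string that are not in KNOWN_OPTIONS."""
    names = [token.split("=", 1)[0].upper() for token in kernel_args.split()]
    return [name for name in names if name not in KNOWN_OPTIONS]


# Kernel options taken from the log quoted above (everything after "-k phatk").
args = "PLATFORM=0 DEVICE=0 BFI_INT  VECTORS2  FASTLOOP=false WORKLOAD=128 AGGRESSION=11"
print(unknown_options(args))  # ['WORKLOAD']
```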


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Diapolo on August 13, 2011, 02:33:10 PM
Is it also possible to update to the latest OpenCL driver, for example the ATI SDK, before the program starts?

Why don't you add such a feature before improving speeds?

No that's not possible ... you have to make sure you use current drivers.

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: mute20 on August 13, 2011, 03:57:43 PM
So I currently have Phoenix 1.50, but I am curious whether the current 1.61 will work better with your improvements?


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Beta-coiner1 on August 13, 2011, 05:53:34 PM
Speed decreased by 2-3.5 Mh/s on my 6950 but increased by .5-.8 on my 5770. The stale/invalid rate seems to be higher at BTCGuild as well ... but not at Deepbit. Got mixed results on this one.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Diapolo on August 14, 2011, 07:53:15 PM
Speed decreased by 2-3.5 Mh/s on my 6950 but increased by .5-.8 on my 5770. The stale/invalid rate seems to be higher at BTCGuild as well ... but not at Deepbit. Got mixed results on this one.

That decrease for the 6950 is weird and it should NOT be like this ... what's your Phoenix command line, driver and SDK version?

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Beta-coiner1 on August 14, 2011, 08:32:44 PM
^^ Catalyst driver 11.6B/SDK 2.5  -V 4 -w64 -F3 using GUI miner


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: JBDive on August 15, 2011, 12:24:22 AM
^^ Catalyst driver 11.6B/SDK 2.5  -V 4 -w64 -F3 using GUI miner

Use the default posted on page one for that 5770 and you should see almost a 10% gain.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: DiamondPlus on August 15, 2011, 01:35:40 AM
I've tested your patch, and there are some great improvements from the original phoenix 1.50 miner indeed.
My 6950 gets improvement.
My 5870 gets improvement.
My 5850 gets improvement.

But I have noticed a regression with my 6870 (1005 MHz core / 200 MHz mem).

Configuration:
- Windows 7
- Catalyst 11.6
- aoclbf 1.74 frontend for Phoenix 1.50
- phatk
- Vector
- BFI_INT
- Aggression 13
- Work size 128

2011-07-03 kernel: 317 Mhash/s (all three numbers are peak values)
2011-07-06 kernel: 283
2011-07-07 kernel: 283

...Needless to say, something bad happened between 07-03 and 07-06. I hope we can get to the bottom of this. If you need me to test something, I will be happy to do what I can for you.

-DiamondPlus


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Diapolo on August 15, 2011, 05:29:49 AM
I've tested your patch, and there are some great improvements from the original phoenix 1.50 miner indeed.
My 6950 gets improvement.
My 5870 gets improvement.
My 5850 gets improvement.

But I have noticed a regression with my 6870 (1005 MHz core / 200 MHz mem).

Configuration:
- Windows 7
- Catalyst 11.6
- aoclbf 1.74 frontend for Phoenix 1.50
- phatk
- Vector
- BFI_INT
- Aggression 13
- Work size 128

2011-07-03 kernel: 317 Mhash/s (all three numbers are peak values)
2011-07-06 kernel: 283
2011-07-07 kernel: 283

...Needless to say, something bad happened between 07-03 and 07-06. I hope we can get to the bottom of this. If you need me to test something, I will be happy to do what I can for you.

The latest version needs the argument VECTORS2 in order to use 2-component vectors. Also try raising your memory clock: there was a version that needed a slightly higher memory clock to be efficient. Please report back!

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Diapolo on August 15, 2011, 05:31:44 AM
^^ Catalyst driver 11.6B/SDK 2.5  -V 4 -w64 -F3 using GUI miner

I'm pretty sure 4-component Vectors are not helpful for 69XX, please try -V 2 or VECTORS2.

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Diapolo on August 15, 2011, 08:25:01 AM
I leave! I will give the BTC :'(
increase of difficulty has made for me unprofitable to continue :-[
It is ready to give the last 21.02 BTC to the one who will send me 0.1 BTC (I will send on the same address)
1vsnfxtUY6cn8TU7LdHXi89r7pRnNDLsc

You got my application ;) ...

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Beta-coiner1 on August 15, 2011, 06:01:09 PM
I leave! I will give the BTC :'(
increase of difficulty has made for me unprofitable to continue :-[
It is ready to give the last 21.02 BTC to the one who will send me 0.1 BTC (I will send on the same address)
1vsnfxtUY6cn8TU7LdHXi89r7pRnNDLsc
Just send it to me or the hardware if you don't require it anymore.

You got my application ;) ...

Dia
Lol, that new member is a scammer: if you look at his 4 posts, they all say the same. Hopefully no one actually sent him the .1.

BTW, I did try V2, but my 6950 doesn't seem to like that, as the results are still a bit lower than with Phateus' variant. I get 352-356.6 with your latest variant, but with Phateus' I get 353.2-358.4 at the same 867/1250 clocks.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: norulezapply on August 15, 2011, 06:38:39 PM
hey whenever I try copying the latest version into my kernels/phatk folder with phoenix 1.50 (also tested with 1.6.1) I get the following:

H:\Apps\Bitcoin\Phoenix>phoenix.exe -u http://user:password@192.168.1.77:8337/ -k phatk VECTORS2 BFI_INT AGGRESSION=7 DEVICE=0

[15/08/2011 19:36:51] FATAL kernel error: Failed to load OpenCL kernel!

It works fine with the non-modified kernels...
Any ideas? =[


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: skadoosh00 on August 15, 2011, 07:20:24 PM
Hmm... seem to be getting 4-5mh/s less for the latest release. This is on my stock 6950 with nothing unlocked using

Code:
 -k phatk BFI_INT FASTLOOP DEVICE=0 VECTORS2 WORKSIZE=128 AGGRESSION=8

Also tried VECTORS VECTORS2 with no difference. I reverted back to the July kernel release for now.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Diapolo on August 15, 2011, 08:11:02 PM
Hmm... seem to be getting 4-5mh/s less for the latest release. This is on my stock 6950 with nothing unlocked using

Code:
 -k phatk BFI_INT FASTLOOP DEVICE=0 VECTORS2 WORKSIZE=128 AGGRESSION=8

Also tried VECTORS VECTORS2 with no difference. I reverted back to the July kernel release for now.

Try FASTLOOP=false and perhaps a higher aggression or WORKSIZE=256 ... SDK 2.5?

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: nerv on August 15, 2011, 11:06:08 PM
I tried it, and in my case an HD5870 at 990 MHz with 300 MHz memory, using the latest version of the kernel and Catalyst 11.5 + AMD Stream SDK 2.1, gives me 450 MH/s.
With the flags -v -w 256 -f0.
A good improvement.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: cubemonkey on August 17, 2011, 04:16:18 PM
6950's using the new phoenix + phatk2  (clocked 880 gpu / 300 mem)

old: 360Mh/s
now: 364Mh/s

(consistent across all GPUs)

I messed a lot with flags... this is what gave me the highest numbers:

Code:
-k phatk2 VECTORS2 AGGRESSION=13 WORKSIZE=64 FASTLOOP=false

Not sure why, but worksize=64 is what gave me the biggest gain.  Tried all aggression between 10 and 16 as well.  13 yielded the highest.



Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Diapolo on August 17, 2011, 05:08:02 PM
6950's using the new phoenix + phatk2  (clocked 880 gpu / 300 mem)

old: 360Mh/s
now: 364Mh/s

(consistent across all GPUs)

I messed a lot with flags... this is what gave me the highest numbers:

Code:
-k phatk2 VECTORS2 AGGRESSION=13 WORKSIZE=64 FASTLOOP=false

Not sure why, but worksize=64 is what gave me the biggest gain.  Tried all aggression between 10 and 16 as well.  13 yielded the highest.



phatk2 has no VECTORS2 in Phoenix ;) ... you were not using my kernel either. Wrong forum?

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: cubemonkey on August 17, 2011, 07:03:04 PM

Quote

phatk2 has no VECTORS2 in Phoenix ;) ... you were not using my kernel either. Wrong forum?

Dia

I'm so confused :P   For some reason I thought you = phatk (and by extension phatk2). So your kernel is a modified version of the original phatk?   Where do I find your kernel?   And any flags you would recommend for a 6950?   I'll report back with results :)


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: jh1523 on August 17, 2011, 07:43:22 PM
Where do I find your kernel?

First post of this thread.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: conspirosphere.tk on August 20, 2011, 01:54:19 PM
Strange: now I am getting the max speed with the original phatk2 included in Phoenix 1.6.2 with my 5750. Here are the results of some tests:

phoenix-1.6.2 original
phatk VECTORS FASTLOOP=false AGGRESSION=7 WORKSIZE=256 BFI_INT
=168 Mhs

phoenix-1.6.2 original
phatk2 VECTORS FASTLOOP=false AGGRESSION=7 WORKSIZE=256 BFI_INT
=170 Mhs

phoenix-1.6.2 original
-k phatk2 VECTORS2 FASTLOOP=false AGGRESSION=7 WORKSIZE=256 BFI_INT
=169 Mhs

2011-08-11 kernel 
phatk VECTORS2 FASTLOOP=false AGGRESSION=7 WORKSIZE=256 BFI_INT
= 165 Mhs

2011-08-11 kernel
phatk2 VECTORS2 FASTLOOP=false AGGRESSION=7 WORKSIZE=256 BFI_INT
= 153 Mhs

phatk 2011-07-17 kernel
phatk VECTORS FASTLOOP=false AGGRESSION=7 WORKSIZE=256 BFI_INT
=167 Mhs

poclbm VECTORS FASTLOOP=false AGGRESSION=7 WORKSIZE=256 BFI_INT
= 157 Mhs


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Diapolo on August 20, 2011, 02:40:33 PM
Strange: now I am getting the max speed with the original phatk2 included in Phoenix 1.6.2 with my 5750. Here the results of some tests:

phoenix-1.6.2 original
phatk VECTORS FASTLOOP=false AGGRESSION=7 WORKSIZE=256 BFI_INT
=168 Mhs

phoenix-1.6.2 original
phatk2 VECTORS FASTLOOP=false AGGRESSION=7 WORKSIZE=256 BFI_INT
=170 Mhs

phoenix-1.6.2 original
-k phatk2 VECTORS2 FASTLOOP=false AGGRESSION=7 WORKSIZE=256 BFI_INT
=169 Mhs

2011-08-11 kernel 
phatk VECTORS2 FASTLOOP=false AGGRESSION=7 WORKSIZE=256 BFI_INT
= 165 Mhs

2011-08-11 kernel
phatk2 VECTORS2 FASTLOOP=false AGGRESSION=7 WORKSIZE=256 BFI_INT
= 153 Mhs

phatk 2011-07-17 kernel
phatk VECTORS FASTLOOP=false AGGRESSION=7 WORKSIZE=256 BFI_INT
=167 Mhs

poclbm VECTORS FASTLOOP=false AGGRESSION=7 WORKSIZE=256 BFI_INT
= 157 Mhs

Seems mixed up a bit :).

phatk = VECTORS2 (with my version)
phatk2 = VECTORS

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: hash on August 22, 2011, 04:20:48 AM
This worked very good. I've updated my miners.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: mute20 on August 22, 2011, 09:53:17 PM
Any more updates planned for the future? ;D

Also I saw a 3% boost someone else was touting can I use that in conjunction with this?


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: Diapolo on August 23, 2011, 05:45:52 AM
Any more updates planned for the future? ;D

Also I saw a 3% boost someone else was touting can I use that in conjunction with this?

Yes, there are updates planned, but it's really hard to squeeze more performance out of this. I'm currently testing a few different paths to achieve better performance!

Where have you seen a 3% tweak? Can you post a link here, so that I can check if this can be included (if it's not in already).

Thanks,
Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: mute20 on August 25, 2011, 05:37:57 PM
Can't seem to find it, but I assume it had similar fixes to yours.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-11
Post by: supersquish on August 26, 2011, 11:31:59 AM
Thanks for this great kernel mod, it did result in a solid ~7 Mhash/s increase on my 5570, up from ~63. Will donate some if I manage to get amazingly lucky solo mining (I figure it's more fun to try for freak luck. It's a better habit than lotto tickets, right?)


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-27
Post by: Diapolo on August 27, 2011, 12:11:09 PM
Download version 2011-08-27: http://www.mediafire.com/?697r8t2pdk419ji

This version is a bit faster on 58XX cards, and reports indicate it can be faster on 69XX cards, too ... I guess this is because of the optimized writing to the output buffer.
You can leave out the BFI_INT switch, but remember to supply the VECTORS2 switch :)! This version also handles wrong WORKSIZE arguments: if you forget that switch, if its value is too big, or if it's not a power of 2, the maximum supported WORKSIZE of each device is used.

Thanks,
Dia
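
To make the WORKSIZE rule concrete, here is a minimal Python sketch of the fallback as described in this post. It is not the code from the modified __init__.py; effective_worksize is a hypothetical name, and the device limit of 256 in the example is an assumed value (with PyOpenCL it would typically come from the device's max_work_group_size).

```python
def effective_worksize(requested, device_max):
    """Return `requested` if it is a power of two within the device limit, else `device_max`."""
    is_power_of_two = (
        isinstance(requested, int)
        and requested > 0
        and (requested & (requested - 1)) == 0
    )
    if is_power_of_two and requested <= device_max:
        return requested
    return device_max


DEVICE_MAX = 256  # assumed device limit, for illustration only

for value in (128, None, 512, 96):
    print(value, "->", effective_worksize(value, DEVICE_MAX))
```

With this rule a forgotten switch (None), a too-big value (512) and a non-power-of-2 value (96) all fall back to the device maximum, while 128 is used as given.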


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-27
Post by: swapper on August 27, 2011, 12:58:57 PM
Thanks, we will try.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-27
Post by: pekv2 on August 27, 2011, 01:21:21 PM
What Mhash/s rates are 5830 users getting with version 2011-08-27?


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-27
Post by: Crs on August 28, 2011, 11:42:22 PM
here's my magic formula:

-k phatk2 AGGRESSION=13 FASTLOOP VECTORS2 WORKSIZE=128 PLATFORM=1
Ati 5850
before:348 Mh/s
after:359 Mh/s


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-08-27
Post by: pk on August 29, 2011, 12:44:27 AM
here's my magic formula:

-k phatk2 AGGRESSION=13 FASTLOOP VECTORS2 WORKSIZE=128 PLATFORM=1
Ati 5850
before:348 Mh/s
after:359 Mh/s

Saw an increase of 20MH/s, thanks.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-12-21
Post by: Diapolo on December 27, 2011, 11:40:41 AM
Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9

This should restore some of the speed lost by 58XX owners who switched to SDK / runtime 2.6, and it is the best version for 69XX owners, too.

Edit: Guys, try a setting of 64 for the WORKSIZE, it showed good results for me, but still depends on the card!

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-12-21
Post by: jhajduk on January 02, 2012, 05:26:49 PM
Hi, I noticed a potential improvement

you can replace
Code:
W(121);
sharoundW(121);
W(122);
sharoundW(122);
W(123);
sharoundW(123);

// Round 124
Vals[7] += Vals[3] + P4(124) + P3(124) + P1(124) + P2(124) + s1(124) + ch(124) + H[7];

with
Code:
W(121);
Vals[2] += t1W(121);
W(122);
Vals[1] += t1W(122);
W(123);
Vals[0] += t1W(123);

// Round 124
Vals[7] += Vals[3] + P4(124) + P3(124) + P1(124) + P2(124) + s1(124) + ch(124) + H[7];

Because you don't need Vals[4], Vals[5], and Vals[6] to compute the final Vals[7].
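
The dependency argument can be checked outside the kernel. The Python sketch below is only an illustration under stated assumptions: it uses a single SHA-256 of a short message instead of the double hash of a block header, and it assumes the kernel's rounds 121..124 correspond to rounds 57..60 of a 64-round pass, with only the last state word being tested. In the shortcut path the "a" update (t2) is skipped from round 57 on and the loop stops after round 60; last_state_word and the other helper names are hypothetical.

```python
import hashlib
import struct
from math import isqrt

MASK = 0xFFFFFFFF


def icbrt(n):
    """Floor of the cube root of a positive integer (Newton iteration)."""
    x = 1 << ((n.bit_length() + 2) // 3)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y


def first_primes(count):
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes


PRIMES = first_primes(64)
K = [icbrt(p << 96) & MASK for p in PRIMES]       # SHA-256 round constants
H0 = [isqrt(p << 64) & MASK for p in PRIMES[:8]]  # SHA-256 initial state


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def last_state_word(block, shortcut):
    """Compress one 64-byte block starting from H0 and return the last state word."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = H0
    for i in range(61 if shortcut else 64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        if shortcut and i >= 57:
            t2 = 0  # deliberately wrong: this 'a' never reaches the last word
        else:
            t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c))
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    # The 'e' produced in round 60 is the value that becomes 'h' after round 63.
    return (H0[7] + (e if shortcut else h)) & MASK


message = b"phatk_dia"
block = (message + b"\x80" + b"\x00" * (55 - len(message))
         + struct.pack(">Q", 8 * len(message)))
reference = int.from_bytes(hashlib.sha256(message).digest()[28:], "big")
print(last_state_word(block, False) == reference)
print(last_state_word(block, True) == reference)
```

Both comparisons should hold if the reasoning is right: the full 64-round pass and the shortened pass produce the same last word as hashlib, even though the shortened pass leaves the "a" side of the state wrong.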


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-12-21
Post by: Diapolo on January 02, 2012, 06:42:48 PM
Hi, I noticed a potential improvement

you can replace
Code:
W(121);
sharoundW(121);
W(122);
sharoundW(122);
W(123);
sharoundW(123);

// Round 124
Vals[7] += Vals[3] + P4(124) + P3(124) + P1(124) + P2(124) + s1(124) + ch(124) + H[7];

with
Code:
W(121);
Vals[2] += t1W(121);
W(122);
Vals[1] += t1W(122);
W(123);
Vals[0] += t1W(123);

// Round 124
Vals[7] += Vals[3] + P4(124) + P3(124) + P1(124) + P2(124) + s1(124) + ch(124) + H[7];

Because you don't need Vals[4],Vals[5], and Vals[6] to compute the final Vals[7]

I'll have to look into this, during the first test all performance relevant numbers were identical to my latest kernel. But perhaps reordering of the operations will help.
Thanks for your input :)!

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-12-21
Post by: jhajduk on January 05, 2012, 05:15:07 AM
I also noticed that if the pools used the 7th 32 bit component of the hash ("g") rather than the first 32 bit ("a") for computing shares you could stop after the 61st round of computation rather than the 63rd.  That would be maybe a 2% efficiency for pools that implemented it.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-12-21
Post by: Diapolo on January 08, 2012, 11:49:18 AM
Hi, I'm new to bitcoin mining and I'm not sure if I'm posting this in the correct forum (I can't post in the phoenix thread), I've used guiminer for a few months now and decided to give phoenix a try.
And it's great, phoenix is slightly faster and has a stable hashrate.
However I noticed that phatk2 is much slower (50-60 Mhash/s) than phatk, and I believe this optimized kernel is supposed to replace the original phatk2 kernel, because when I overwrite the original phatk kernel with this one my hash rate is almost the same as with phatk2.

Am I using incorrect settings? I have an unlocked 6950 card with 910 mhz core and 1440 mhz memory.

The settings I use are: -k phatk DEVICE=0 VECTORS BFI_INT AGGRESSION=11 worksize=128  FASTLOOP=false

or if I want to run on phatk2:  -k phatk2 DEVICE=0 VECTORS BFI_INT AGGRESSION=11 worksize=128  FASTLOOP=false

I will only comment on my kernel here, which has to be used with:
Code:
-k phatk DEVICE=0 VECTORS2 AGGRESSION=11 WORKSIZE=128 FASTLOOP=false

The normal Phoenix kernel, which is in the default Phoenix download package, doesn't have VECTORS2 and needs the BFI_INT switch supplied (my kernel enables BFI_INT by itself if the cl_amd_media_ops extension is available).

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-12-21
Post by: Diapolo on January 08, 2012, 01:12:23 PM
Hi, I'm new to bitcoin mining and I'm not sure if I'm posting this in the correct forum (I can't post in the phoenix thread), I've used guiminer for a few months now and decided to give phoenix a try.
And it's great, phoenix is slightly faster and has a stable hashrate.
However I noticed that phatk2 is much slower (50-60 Mhash/s) than phatk, and I believe this optimized kernel is supposed to replace the original phatk2 kernel, because when I overwrite the original phatk kernel with this one my hash rate is almost the same as with phatk2.

Am I using incorrect settings? I have an unlocked 6950 card with 910 mhz core and 1440 mhz memory.

The settings I use are: -k phatk DEVICE=0 VECTORS BFI_INT AGGRESSION=11 worksize=128  FASTLOOP=false

or if I want to run on phatk2:  -k phatk2 DEVICE=0 VECTORS BFI_INT AGGRESSION=11 worksize=128  FASTLOOP=false

I will only comment on my kernel here, which has to be used with:
Code:
-k phatk DEVICE=0 VECTORS2 AGGRESSION=11 WORKSIZE=128 FASTLOOP=false

The normal Phoenix kernel, which is in the default Phoenix download package, doesn't have VECTORS2 and needs the BFI_INT switch supplied (my kernel enables BFI_INT by itself if the cl_amd_media_ops extension is available).

Dia
Thanks, I replaced the normal phatk kernel with yours. The performance is now the same in Mh/s, although shares seem to come slightly quicker.
Hmm, with a worksize of 64 I get slightly better performance (0.5 Mh/s more).
I'm now at ~408 Mh/s (it drops to 407.91 sometimes), and with aggression on 12 I get 408+ Mh/s.
Fine-tuned aggression to 16 and I get 409+ Mh/s, nearly 410.

What's your setup? Driver, OS, card?


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-12-21
Post by: Diapolo on January 08, 2012, 01:37:13 PM
Thanks for your infos.

The higher the AGGRESSION, the more desktop lag you observe, that's normal.

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-12-21
Post by: Diapolo on January 08, 2012, 03:03:22 PM
Thanks for your infos.

The higher the AGGRESSION, the more desktop lag you observe, that's normal.

Dia
No problem, so aggression doesn't affect the speed of shares?

Btw, I'm having some connection issues every now and then due to my ISP, and sometimes I get a warning that my work queue is empty for a few seconds, although this only happens once an hour or so.
Could I prevent this from happening by using -q 2 or -q 5, or is this not a good idea?

A higher aggression can lead to higher MH/s, at the cost of more desktop lag.
For your ISP stuff: yes, you could try -q 2, or specify a backup pool to Phoenix via -b, which needs the same format as -u.

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-12-21
Post by: stevegee58 on January 09, 2012, 05:40:55 PM
Hmm I made some calculations and I saw that the Mhash/s is incorrect.
it indicates I get around 418Mh/s
my pool indicates I get 403 Mh/s

shares at 9-1-2012 17:18: 7610

shares at 9-1-2012 18:01: 7879

time = 2580 sec
shares = 269

269 x 2^32 = 1 155 346 202 624 hashes
1 155 346 202 624 / 2580 = 447 808 606 hashes/s
447 808 606 / 1 000 000 = 447.808606 Mh/s

Or am I doing something wrong here? Why are they all reporting different speeds?

I've noticed this myself.  Guiminer reports I'm running a pretty solid 185 Mhash/sec but my speed as reported by deepbit is all over the place.
I've seen it as high as 260 and as low as 50.
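
The share-based estimate quoted above can be reproduced with a short Python sketch. It assumes every share is a difficulty-1 share worth 2^32 hashes on average (as in the quoted calculation), and it adds one assumption that is not in the thread: shares are treated as a Poisson process, so the relative error of an estimate from N shares is roughly 1/sqrt(N). The function name estimate_hashrate is made up for this example.

```python
import math

def estimate_hashrate(shares, seconds, hashes_per_share=2**32):
    """Estimate hashes/s from accepted shares found in a time window.

    Also returns the relative standard error, assuming (hypothetically)
    that share discovery is a Poisson process: 1 / sqrt(shares).
    """
    rate = shares * hashes_per_share / seconds
    rel_err = 1.0 / math.sqrt(shares)
    return rate, rel_err

# Numbers from the quoted post: 7879 - 7610 = 269 shares in 43 minutes.
rate, rel_err = estimate_hashrate(shares=7879 - 7610, seconds=43 * 60)

# Rough two-standard-error band around the estimate.
low = rate * (1 - 2 * rel_err)
high = rate * (1 + 2 * rel_err)

print(f"estimate: {rate / 1e6:.1f} Mh/s")
print(f"rough range: {low / 1e6:.1f} to {high / 1e6:.1f} Mh/s")
```

Under these assumptions, 269 shares give an uncertainty of about 6% per standard error, so the miner's 418 Mh/s and the pool's 403 Mh/s both lie inside the rough range. Short windows with few shares will show much larger swings, which may be part of why pool-side speeds jump around.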


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-12-21
Post by: petala on January 09, 2012, 09:22:16 PM
Any news about 7970 support, as that card is quite good at 666 Mhash/s and $550 right now?

cheers !


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2011-12-21
Post by: Diapolo on January 13, 2012, 09:23:20 AM
Any news about 7970 support, as that card is quite good at 666 Mhash/s and $550 right now?

cheers !

Currently I have none, and the AMD KernelAnalyzer doesn't seem to support the GCN architecture yet, so it's hard to do any optimizations for it. But I would be interested in results with the 7970 and my kernel (a new kernel is on its way to release). AMD says that massive vectorization is not needed for optimal performance with GCN, so perhaps it would run well without a VECTORSn parameter.

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2012-01-13
Post by: Diapolo on January 13, 2012, 11:45:44 AM
A new version is ready for your testing pleasure:
Download version 2012-01-13: http://www.mediafire.com/download.php?2sqoj8obvp1q23p

highlights:
- the child has its name: I call it phatk_dia. It would be nice if you guys used this name in discussions to be clear which kernel you mean ;)
- faster on VLIW5 GPUs with VECTORS2 and VECTORS4
- more efficient on VLIW4 GPUs with VECTORS2 and a little faster with VECTORS4
- FASTLOOP defaults to false, so you don't need to supply FASTLOOP=false
- added an extended check for supplied WORKSIZE parameter
- removed a pyOpenCL finish() to reduce API overhead (could cause problems, but works here -> consider this beta till it proves stable)

Please report and give me all your coins :-D!

Edit: Please don't complain if this doesn't work well with non-2.6 SDK / Runtime versions, because this IS for 2.6 or later!

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2012-01-13
Post by: SocialBonobo on January 13, 2012, 04:10:52 PM
A new version is ready for your testing pleasure:
Download version 2012-01-13: http://www.mediafire.com/download.php?2sqoj8obvp1q23p

Do you provide a git repository of this fork?


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2012-01-13
Post by: Diapolo on January 13, 2012, 05:42:52 PM
A new version is ready for your testing pleasure:
Download version 2012-01-13: http://www.mediafire.com/download.php?2sqoj8obvp1q23p

Do you provide a git repository of this fork?

Sorry, but no ... I use Beyond Compare 3 for editing / comparing the files and manage the rest in my brain ;). Guess that's kind of old school, but it works.
To be honest, I don't even know how to use a git repository.

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2012-01-13
Post by: Diapolo on January 14, 2012, 12:35:29 PM
Uploaded a fixed version, which corrects an error with FASTLOOP=True:
Download version 2012-01-13: http://www.mediafire.com/?xzk6b1yvb24r4dg

There are no other changes in this version!

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2012-01-13
Post by: Diapolo on January 14, 2012, 01:06:55 PM
Uploaded a fixed version, which corrects an error with FASTLOOP=True:
Download version 2012-01-13: http://www.mediafire.com/?xzk6b1yvb24r4dg

There are no other changes in this version!

Dia
I no longer see the Mhash/s. Also, I don't see the shares and other statistics (basically the entire bottom line).

I do see the log, which says at which time a share was accepted, among other things.

That's not a helpful bug report ... sorry. What OS? Can you paste the output of the Phoenix window?
What's your command line? Was this introduced with the FASTLOOP fix or before?

Dia


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2012-01-13
Post by: Diapolo on January 14, 2012, 01:39:57 PM
The part with the stats you mention is blinking for me too; this seems to happen with high aggression, whenever it updates to new values.
You should try the latest version with the latest Phoenix 1.7.3, and it would be great to get some speed reports in here.

I have to say I'm a bit disappointed with the feedback to this release; simply nothing is coming in. Even if this version is NOT faster for some, it took many hours of work, and that's not satisfying. For me the current version IS faster than phatk2 with a 6550D and the difference is huge; I don't understand why this doesn't seem to be the case for any other user here. Guys, please use 12.1a with Phoenix 1.7.3 and the settings mentioned on page 1 of this thread. If you report that it's not faster, please include some system info such as SDK, OS, driver, card, Phoenix version and the command switches used, thank you!

Edit: You can also post the contents of the Phoenix window here.
Code:
[14/01/2012 14:43:18] using PyOpenCL version 0.92
[14/01/2012 14:43:18] checked nonces per kernel execution: 67108864
[14/01/2012 14:43:18] using VECTORS2, resulting global worksize is: 33554432
[14/01/2012 14:43:18] using local worksize of 128 (HW max. is 256)
[14/01/2012 14:43:18] cl_amd_media_ops ext. found - BFI_INT enabled

[14/01/2012 14:43:19] Finding inner ELF...
[14/01/2012 14:43:19] Patching inner ELF...
[14/01/2012 14:43:19] Patching instructions...
[14/01/2012 14:43:19] BFI-patched 472 instructions...
[14/01/2012 14:43:19] Patch complete, returning to kernel...
[14/01/2012 14:43:19] Applied BFI_INT patch
[14/01/2012 14:43:19] Phoenix v1.7.3 starting...
[14/01/2012 14:43:19] Connected to server
[14/01/2012 14:43:19] Server gave new work; passing to WorkQueue
[14/01/2012 14:43:19] New block (WorkQueue)
[14/01/2012 14:43:21] Server gave new work; passing to WorkQueue
[66.19 Mhash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]

Dia :-/
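
The numbers in the log above can be related to each other with a small Python sketch. It only uses values printed in the log; the interpretation that VECTORS2 lets each work-item check two nonces is an assumption inferred from the two logged sizes, not something stated by the kernel author, and the variable names are made up for this example.

```python
# Values copied from the Phoenix log above.
nonces_per_execution = 67108864  # "checked nonces per kernel execution"
vector_width = 2                 # VECTORS2 (assumed: 2 nonces per work-item)
reported_rate = 66.19e6          # hashes per second from the status line

# Assumed relation: global worksize = nonces / vector width.
global_worksize = nonces_per_execution // vector_width

# Approximate duration of a single kernel execution at the reported speed.
seconds_per_execution = nonces_per_execution / reported_rate

print(global_worksize)
print(round(seconds_per_execution, 2))
```

At the reported speed a single kernel execution keeps the GPU busy for roughly a second, which would be consistent with the status line blinking and the desktop lag discussed in this thread.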


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2012-01-13
Post by: deepceleron on January 14, 2012, 02:13:55 PM

It's because of the aggression at 16; when I put it on 15 the line shows again, but the miner is idle a lot according to the log and the khash/s are at 0, although it's still mining and getting hashes accepted.
With 14 it's working fine, but with slightly lower Mhash/s than with the older kernel at aggression 17 (417 compared to 418 Mhash/s).
This is because above 12 or 13 is insane aggression - the miner hogs the GPU so much that the Windows GUI can't even draw stuff like the status line on the screen.


Title: Re: further improved phatk OpenCL kernel (> 4% increase) for Phoenix - 2012-01-13
Post by: deepceleron on January 14, 2012, 06:56:56 PM

It's because of the aggression at 16; when I put it on 15 the line shows again, but the miner is idle a lot according to the log and the khash/s are at 0, although it's still mining and getting hashes accepted.
With 14 it's working fine, but with slightly lower Mhash/s than with the older kernel at aggression 17 (417 compared to 418 Mhash/s).
This is because above 12 or 13 is insane aggression - the miner hogs the GPU so much that the Windows GUI can't even draw stuff like the status line on the screen.
It was fine with the older kernel.
Does that mean anything?
I think this kernel is slightly faster than the older one.
It is possible that the previous kernel and miner parameters didn't work the GPU so hard; as kernels are optimized, they use more of the GPU resources available, approaching 100%, leaving less chance that OS draw instructions will make it through in a timely fashion.


Title: Re: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13
Post by: Diapolo on January 19, 2012, 08:25:24 AM
Just a hint, this kernel is for SDK 2.6+, it doesn't work well with earlier versions / runtimes!

Dia