Diapolo (OP)
|
|
July 04, 2011, 08:15:55 AM Last edit: February 25, 2012, 02:26:12 PM by Diapolo |
|
This is a repost from the Newbies forum, because I'm now allowed to post here :). The original thread is located here: http://forum.bitcoin.org/index.php?topic=25135.0

If it works, please post here and consider a small donation @ 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x :).

Important (2012-01-13): The FASTLOOP=False parameter is not needed anymore, because FASTLOOP defaults to false in this version. Update: FASTLOOP=True works now, I uploaded a fixed version!

Important: Since OpenCL SDK / Runtime version 2.6 AMD updated their OpenCL compiler, so some older kernels and the optimizations in them no longer work or are no longer needed. To reflect this change I had to edit the kernel performance section of this thread.

Important: Since version 2011-08-27 you don't need to supply the BFI_INT switch anymore. If your HW supports it, it is enabled automatically. To disable it, use BFI_INT=false.

Important: Since version 2011-08-04 (pre-release) you have to use the switch VECTORS2 instead of VECTORS. I made this change to make clear which vectors are used in the kernel (2- or 4-component). To use 4-component vectors, use the switch VECTORS4.

Important: Since version 2011-07-17 a modified version of __init__.py (for the Phoenix miner) is included in this package and has to be used! The kernel won't work with other miners without modifications to them, see kernel.cl for further info.
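For readers who wonder what the VECTORS2 / VECTORS4 switches mean at the kernel level, here is a minimal OpenCL sketch of the general idea only, not the actual phatk_dia code; the names (toy_search, base, output) are made up for illustration, and the real kernel derives its nonces from the work-item ID, group ID and WORKSIZE, as described in the 2011-08-04 changelog below. With a 2-component vector type, each work-item derives and tests two nonce candidates per kernel invocation instead of one:

/* Hypothetical sketch only -- NOT the phatk_dia kernel. */
#ifdef VECTORS2
typedef uint2 u;                      /* two 32-bit lanes per work-item */
#else
typedef uint u;                       /* scalar fallback */
#endif

__kernel void toy_search(const uint base, __global uint *output)
{
    const uint gid = (uint)get_global_id(0);

#ifdef VECTORS2
    /* lane .x tests nonce base + 2*gid, lane .y tests base + 2*gid + 1 */
    u nonce = (u)(base + gid * 2u, base + gid * 2u + 1u);
#else
    u nonce = base + gid;
#endif

    /* the real kernel would now run the SHA-256 rounds on "nonce" and
       write any winning nonce(s) to "output" */
}

VECTORS4 is the same idea with uint4 (four candidates per work-item); the global work size passed by the host has to be divided accordingly, which is one reason the matching __init__.py must be used.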
This is the preferred switch for Phoenix with phatk_dia in order to achieve comparable performance:
-k phatk AGGRESSION=12 VECTORS2 WORKSIZE=128

Downloads:
Version 2012-01-13: http://www.mediafire.com/?xzk6b1yvb24r4dg
Version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9
Version 2011-08-27: http://www.mediafire.com/?697r8t2pdk419ji
Version 2011-08-11: http://www.mediafire.com/?s5c7h4r91r4ad4j
Version 2011-08-04 (pre-release): http://www.mediafire.com/?upwwud7kfyx7788
Version 2011-07-17: http://www.mediafire.com/?4zxdd5557243has
Version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6
Version 2011-07-07: http://www.mediafire.com/?o7jfp60s7xefrg4
Version 2011-07-06: http://www.mediafire.com/?f8b8q3w5u5p0ln0
Version 2011-07-03: http://www.mediafire.com/?xlkcc08jvp5a43v
Version 2011-07-01: http://www.mediafire.com/?5jmt7t0e83k3eox

Kernel performance (BFI_INT / VECTORS2 / WORKSIZE=128 / SDK 2.6 / APP KernelAnalyzer 1.11 - Cal 11.12 profile):
HD5870
2011-08-20: 22 GPR / 1427 ALU OPs / 66 CF OPs
2011-08-27: 22 GPR / 1426 ALU OPs / 66 CF OPs
2011-12-21: 20 GPR / 1400 ALU OPs / 66 CF OPs
2012-01-13: 21 GPR / 1394 ALU OPs / 67 CF OPs
HD6970
2011-08-20: 21 GPR / 1687 ALU OPs / 66 CF OPs
2011-08-27: 23 GPR / 1688 ALU OPs / 68 CF OPs
2011-12-21: 21 GPR / 1687 ALU OPs / 66 CF OPs
2012-01-13: 20 GPR / 1687 ALU OPs / 66 CF OPs
Kernel performance (BFI_INT / VECTORS2 / SDK 2.5 / APP KernelAnalyzer 1.9 - Cal 11.7 profile):
HD5870
original phatk 1.X: 1393 ALU OPs
2011-07-01: 1389 ALU OPs
2011-07-03: 1385 ALU OPs
2011-07-06: 1380 ALU OPs
2011-07-07: 1380 ALU OPs
2011-07-11: 1378 ALU OPs
2011-07-17: 1376 ALU OPs
2011-08-04 (pre-release): 1368 ALU OPs
2011-08-11: 1364 ALU OPs
2011-08-27: 1363 ALU OPs (30 less compared to original phatk 1.X)
HD6970
original phatk 1.X: 1707 ALU OPs
2011-07-01: 1710 ALU OPs
2011-07-03: 1706 ALU OPs
2011-07-06: 1702 ALU OPs
2011-07-07: 1702 ALU OPs
2011-07-11: 1701 ALU OPs
2011-07-17: 1699 ALU OPs
2011-08-04 (pre-release): 1689 ALU OPs
2011-08-11: 1687 ALU OPs
2011-08-27: 1687 ALU OPs (20 less compared to original phatk 1.X)

Changelog:

2012-01-13
Kernel:
- modified: the disclaimer is now the same as in the original Phoenix package
- removed: all (u) typecasts in front of scalars where vectors and scalars were used together, because per the OpenCL definition this is not needed
- removed: all () brackets around n in the #define parts of the kernel
- removed: S0(), which is now again merged into s0()
- removed: brackets around the commands in t1W(), t1(), t2() and W(), to allow the compiler to reorder these
- added: the W() function was missing a ; at its end
- added: init variable B1addK6, used in round 6 to save an add -> THX to DiabloD3
- added: a (uint) typecast in front of get_local_id() and get_group_id() calls, because the return value could be 64 bits long, which is not wanted
- modified: replaced all ma() + s0() or s0() + ma() calls with t2()
- modified: round 6 now uses the new B1addK6 variable
- modified: reordered the W[] calculation for rounds 32, 91 and 92
- modified: rounds 121, 122 and 123 no longer compute Vals[4], Vals[5] and Vals[6], because they are not needed for the final computation of Vals[7] -> THX to jhajduk
- modified: removed + H[7] from round 124 and use -0xec9fcd13 to check for valid nonces
- added: result_r124 variable to take the result of the last round 124, this saves a few ALU OPs on VLIW5 GPUs
Python Init:
- modified: replaced spaces with tabs in the source code formatting (I really dislike this part in Python ^^)
- modified: a few comments and commands were reformatted for better readability or to be easier to understand
- modified: the FASTLOOP parameter now defaults to False, which means you don't need to supply FASTLOOP=False anymore
- removed: OUTPUT_SIZE is not used anymore, so all references to it were removed
- modified: changed REVISION to 122
- modified: moved the WORKSIZE checks below the part where the check, if and which vectors should be used, is performed; this takes into account that the global worksize passed to the kernel is influenced by vector usage and vector size (currently the use of FASTLOOP can break this, because of the "dynamic" number of iterations)
- added: some debug info about worksize and pyOpenCL is displayed at the start
- added: B1 + K[6] is passed as a new kernel parameter
- modified: made enqueue_read_buffer() / enqueue_write_buffer() blocking and removed finish() after the read, as per AMD's recommendations to minimize API overhead

2011-08-27
Kernel:
- added: code path for 3-component vectors, activated via VECTORS3 (currently not usable, because of a bug in the AMD drivers up to Cat 11.8)
- removed: BITALIGN option from the kernel, BFI_INT is now used automatically if the HW supports it (disabled via BFI_INT=false)
- modified: non BFI_INT Ch() function, which was broken in 2011-08-11 -> THX to Vince
- modified: kernel output buffer is now an ulong array and not an uint array
- removed: OUTPUT_SIZE argument is not passed to and used in the kernel anymore
- modified: WORKSIZEx4, WORKSIZEx3 and WORKSIZEx2 arguments were merged into WORKSIZExVECSIZE
- modified: removed, reordered and added some brackets and type-casting stuff in the kernel
- modified: restored the command order for rounds 108 - 123 to free a GPR
- modified: added H[7] into the round 124 calculation
- modified: changed the checking for positive nonces again to cover the H[7] change
- modified: writing of nonces to the output now uses 1 write for Vec2 and max. 2 writes for Vec4, because 2x uints are now encoded into 1x ulong
Python Init:
- added: code for 3-component vectors, activated via VECTORS3 (currently not usable, because of a bug in the AMD drivers up to Cat 11.8)
- removed: BITALIGN option from the Python init, BFI_INT is now used automatically if the HW supports it (disabled via BFI_INT=false)
- added: detection of the maximum supported WORKSIZE per device, which is used if no WORKSIZE is supplied, if the supplied WORKSIZE > max. supported WORKSIZE, or if WORKSIZE is not a power of 2
- added: code to decode the ulong from the output buffer into 2x uint and process the results
- modified: comments, code formatting and line breaks for better readability
- modified: output buffer size is now the WORKSIZE -> THX to Phaetus

2011-08-11
- modified: reverted a former change to the Ma() function to save an ALU OP for 69XX cards
- added: S0() and S1() function, which is a compiler help -> THX Phateus
- modified: a few brackets and the layout of all helper functions for better readability and compatibility
- added: t2() function, which is (s0(n) + ma(n)) and saves a few GPRs -> THX Phateus and myself (had this in earlier, but removed it sometime ^^)
- modified: changed the layout of the kernel definition for better readability
- modified: all values which for example had a 10u now have a 10U (uppercase) to be consistent in the whole kernel
- modified: round 94 W calculation for better performance
- modified: rounds 108 - 123 now consist of 2 W() blocks followed by 2 sharoundW() blocks to save a GPR
- modified: changed the checking for positive nonces again to never create an invalid share and lower ALU OP usage

2011-08-04 (pre-release)
- added: user Vince into the disclaimer -> THX Vince :)
- added: the kernel is now able to work with 4-component vectors (switch VECTORS4) -> THX to Phateus
- modified: to use 2-component vectors I renamed the switch VECTORS to VECTORS2
- added: __attribute__((reqd_work_group_size(WORKSIZE, 1, 1))) -> THX to Phateus
- added: constants PreW31 and PreW32, which store P2() + P4() for rounds 31 and 32 -> THX to Phateus
- renamed: W17_2 is now PreW19, W2 is now PreW18, PreVal4addT1 is now PreVal4 (= PreVal4 + T1), state0subT1 is now PreVal0 (= PreVal4 + state0)
- modified: base is now declared as u to save the addition of uint2(0, 1) or uint4(0, 1, 2, 3) for the W_3 init -> THX to Phateus
- modified: nonce calculation now uses the local work-item ID, the group ID and the WORKSIZE instead of only the global work-item ID -> THX to Phateus
- added: saved a multiplication by passing WORKSIZEx2 and WORKSIZEx4 constants to the kernel
- modified: calculation for W[18 - O] was optimized so that P2(18) is only calculated for the x component (if vectors are used), because x and y only differ in the LSB; afterwards bits 14 and 25 are rotated for W[18 - O].y -> THX to Phateus
- modified: saved an addition for the Vals[0] init, because of the change to PreVal0
- modified: reordered code for rounds 4 - 95 to optimize for fewer ALU OPs used -> THX Phateus and myself ^^
- modified: the ordering of variables in the additions for round 124 was changed to optimize for fewer ALU OPs used
- modified: rewrote the part where nonces are checked for being positive and where they are written into the output buffer (saves 2 global writes per work-item and saves additional ALU OPs)
- modified: changed variables W_3, P2_18_x, P2_18 and nonce into constants
- modified: changed the code formatting for rounds 4 - 124 for better readability
- removed: some comments, to clean up the code

2011-07-17
- added: offset for the W[] array to reduce its size -> THX to user Vince
- modified: function t1() renamed to t1W() / function sharound() renamed to sharoundW()
- added: functions t1() and sharound(), which are used where the W[] addition can be left out because W[] == 0 (I guess the compiler already does this optimization, but it doesn't hurt) -> THX to user Vince
- modified: P1() - P4() and W() to make use of the offset
- modified: quite a few kernel parameters have new values or were added (mixed ideas from user Vince with my own ones):
  C1addK5: C1addK5 = C1 + K[5]: C1addK5 = C1 + 0x59f111f1
  D1: D1 = D1 + K[4] + W[4]: D1 = D1 + 0xe9b5dba5 + 0x80000000U
  W2: W2 + W16 in P1(): W2 = P1(18) + P4(18)
  W17_2: 0x80000000U in P2() = 0x11002000 + W17 in P1(): W17_2 = P1(19) + P2(19)
  PreValaddT1: PreValaddT1 = PreVal4 + T1
  T1substate0: T1substate0 = T1 - substate0
- added: variable W_3, which stores the first value formerly held in W[3]
- added: Temp variable used to speed up the calculation for rounds 4 and 5
- modified: changed round 3 so that it's more efficient (uses: Vals[0] and Vals[4])
- modified: W[0] - W[14] are now kind of hard-coded or left out where they were 0
- modified: optimized P1(18) + P2(18) + P4(18)
- modified: optimized P1(19) + P2(19) + P4(19)
- modified: optimized rounds 4 + 5
- modified: rounds 6 - 14 and 73 - 78 now use the new sharound() without the W[] addition
- modified: offset added for all parts where W[] is used
- modified: W_3 is used as the result instead of W[3] (W[3] is still used to generate a random position in the output buffer) -> THX to user Vince

2011-07-11
- modified: constant H[7] has a new value (saves an addition in round 124)
- modified: non BFI_INT Ch() function now uses the OpenCL built-in bitselect (see the sketch after this changelog)
- modified: reordered W[] calculations for rounds 18 - 30, 87 and 94
- modified: reordered the calculation for round 5
- modified: the W[] calculation for rounds 80 - 86 is now a block before sharound() is called
- removed: K[60] from round 124 (because of the new H[7] value)

2011-07-07
- removed: some large comments in the source
- modified: the Ma() function is now unique in the kernel, no matter if BFI_INT is used or not -> THX to user 1MLyg5WVFSMifFjkrZiyGW2nw
- added: Ch() function which uses the OpenCL bitselect() command (but it's not active, so you are free to try it) -> THX to user 1MLyg5WVFSMifFjkrZiyGW2nw
- modified: u W[128] is replaced with u W[124] because no more than 124 values are used
- modified: initialisation for Vals[0], Vals[3], Vals[4] and Vals[7] is now processed in other places to save some unneeded writes to these variables
- fixed: some hex values which were used in vector additions are now properly type-casted, which hopefully restores AMD APP SDK 2.1 compatibility
- modified: rounds 3, 4 and 5 were modified for better performance (guess this can be tuned once I have a working KernelAnalyzer)

2011-07-06
- modified: H[] constants were reordered (2 were not used because of earlier mods)
- added: ulong L constant (its value doesn't fit into an uint)
- modified: new Ma() for non BFI_INT capable cards, should be faster -> THX to user 1MLyg5WVFSMifFjkrZiyGW2nw
- removed: t1W()
- modified: t1() reordered function calls for better performance
- modified: W() reordered function calls for better performance
- modified: sharound() removed the writing to t1, now t1() is called twice, which makes this function FASTER (OpenCL compiler optimization)
- removed: sharound2() (if needed, W() + sharound() is used instead)
- removed: partround(), not needed because of another solution for rounds 3 and 124
- removed: t1 and t1W variables
- modified: rounds 3, 19, 30, 81, 87, 94 and 124 were modified for better performance

2011-07-03
- removed: t2(), w(n), r0(x), r1(x), R0(n) and R1(n)
- renamed / modified: R(x) to W(x), which now uses P1, P2, P3 and P4 directly
- modified: P1(x) and P2(x) to not use R1(x - 2), R0(x - 15) but do that directly
- modified: SHA rounds 31, 32, 47 - 61, 86, 87, 114 - 119 now use sharound2() instead of W() + sharound()
- modified: reordered code for SHA rounds 66 - 94 -> saw no decrease in performance -> better readability
- modified: SHA rounds 18, 19, 20, 80, 93, 94 now use a simpler calculation because of removed zero additions --> 1x P1(x), 2x P2(x), 4x P3(x) and 2x P4(x) were removed, which should give a little MHash/sec boost
- modified: sharound() so that a double execution of t1() is avoided -> THX to user 1MLyg5WVFSMifFjkrZiyGW2nw
- added: "u t1W" variable, which is used in sharound2() to avoid a double execution of t1W()

2011-07-01
Vals[7] = 0xb0edbdd0 + K[0] + W[64] + 0x08909ae5U; -> Vals[7] = 0xfc08884d + W[64];
Vals[3] = 0xa54ff53a + 0xb0edbdd0 + K[0] + W[64]; -> Vals[3] = 0x198c7e2a2 + W[64];
- removed the Vals[7] += H[7] addition and replaced the final if-statements in the kernel
- reordered some W[n] = statements to remove some unneeded additions
- replaced all additions like 64 + 5 with the corresponding integer value (guess it was in there for readability reasons, so here it got worse :D)
- removed some unneeded brackets
- re-formatted for better readability

If it works, please post here and consider a small donation @ 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x :). Thanks, Dia
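Several changelog entries above (the bitselect-based Ch() in 2011-07-07 / 2011-07-11 and the automatic BFI_INT handling in 2011-08-27) revolve around how the SHA-256 choose and majority functions are expressed. As a rough illustration only -- this is not the phatk_dia source, and "u" is fixed to plain uint here for simplicity -- the portable bitselect() forms look roughly like this:

/* Hedged sketch, not the actual kernel code.  OpenCL's built-in
 * bitselect(a, b, c) returns, bit for bit, a where c is 0 and b where c
 * is 1, which matches the SHA-256 definitions:
 *   Ch(e, f, g)  = (e & f) ^ (~e & g)           = bitselect(g, f, e)
 *   Maj(a, b, c) = (a & b) ^ (a & c) ^ (b & c)  = bitselect(a, b, c ^ a)
 */
typedef uint u;   /* the real kernel uses uint, uint2 or uint4 here */

u Ch(const u e, const u f, const u g)
{
    return bitselect(g, f, e);      /* choose f where e is 1, else g */
}

u Ma(const u a, const u b, const u c)
{
    return bitselect(a, b, c ^ a);  /* majority of a, b, c */
}

On AMD VLIW hardware the BFI_INT instruction evaluates this select pattern in a single ALU op, which is why the 2011-08-27 release enables it automatically when the hardware supports it (and why earlier versions needed the BFI_INT switch).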
|
|
|
|
lebuen
|
|
July 04, 2011, 08:38:12 AM |
|
Works perfectly for me, although with only a slight increase in performance (371 -> 373 MH/s). But I had the previous patch already in place. Thanks!
|
|
|
|
Fletch
Full Member
Offline
Activity: 168
Merit: 100
I'll have a steak sandwich and a... steak sandwich
|
|
July 04, 2011, 08:40:32 AM |
|
Great, thanks. I went from 240 -> 242.5 on my 5850's. I was using only the "3% Ma-function patch" before.

- replaced all additions like 64 + 5 with the corresponding integer value (guess it was in there for readability reasons, so here it got worse)

I've never developed any code for GPUs or used OpenCL, but wouldn't the compiler take care of that for you? At least all C compilers would.
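For what it's worth, constant folding is something essentially every C and OpenCL compiler performs, so a rewrite like this only affects readability, not the generated code. A tiny hypothetical snippet (OpenCL C, not from any miner) to make the point:

/* Hypothetical snippet, not miner code: a constant expression such as
 * 64 + 5 is evaluated at compile time, so both loads below compile to
 * exactly the same instruction. */
uint folding_demo(const uint *W)
{
    const uint a = W[64 + 5];   /* folded to W[69] by the compiler */
    const uint b = W[69];
    return a ^ b;               /* always 0 */
}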
|
|
|
|
Diapolo (OP)
|
|
July 04, 2011, 09:38:52 AM |
|
Great, thanks. I went from 240 -> 242.5 on my 5850's. I was using only the "3% Ma-function patch" before. - replaced all additions like 64 + 5 with the corresponding integer value (guess it was in there for readability reasons, so here it got worse) I've never developed any code for GPUs or used OpenCL, but wouldn't the compiler take care of that for you? At least all C compilers would.

Great that you benefit from the modifications. Your comment about the compiler stuff IS right, but at least it doesn't make things worse. Dia
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 04, 2011, 10:37:43 AM |
|
No change for me at all unfortunately, not even 0.1 MH/s. I copied the file in place of phoenix-1.50/kernels/phatk/kernel.cl and restarted my instances of phoenix.
Settings: Linux Catalyst 11.6 SDK 2.1 phatk (bundled with phoenix 1.50 with MA tweak) VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256 Solo mining
5850 Xtreme @ 1.0875V - 970MHz (399.4 MH/s) (75*C) 5850 Xtreme @ 1.1625V - 1050MHz (432.8 MH/s) (60*C)
|
|
|
|
OCedHrt
Member
Offline
Activity: 111
Merit: 10
|
|
July 04, 2011, 10:41:57 AM |
|
No change for me at all unfortunately, not even 0.1 MH/s. I copied the file in place of phoenix-1.50/kernels/phatk/kernel.cl and restarted my instances of phoenix.
Settings: Linux Catalyst 11.6 SDK 2.1 phatk (bundled with phoenix 1.50 with MA tweak) VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256 Solo mining
5850 Xtreme @ 1.0875V - 970MHz (399.4 MH/s) (75*C) 5850 Xtreme @ 1.1625V - 1050MHz (432.8 MH/s) (60*C)
I went from 56->61Mhash/s on my 4850. Phatk is now faster than DiabloMiner. Getting 59Mhash/s with DiabloMiner.
|
|
|
|
Diapolo (OP)
|
|
July 04, 2011, 10:44:37 AM |
|
No change for me at all unfortunately, not even 0.1 MH/s. I copied the file in place of phoenix-1.50/kernels/phatk/kernel.cl and restarted my instances of phoenix.
Settings: Linux Catalyst 11.6 SDK 2.1 phatk (bundled with phoenix 1.50 with MA tweak) VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256 Solo mining
5850 Xtreme @ 1.0875V - 970MHz (399.4 MH/s) (75*C) 5850 Xtreme @ 1.1625V - 1050MHz (432.8 MH/s) (60*C)
Why are you using SDK 2.1? The phatk kernel likes 2.4 best. Your Phoenix settings look good though. Strange that your MH/s didn't change at all. Did Phoenix apply the BFI_INT patch the first time you started with the new kernel? Dia
|
|
|
|
Diapolo (OP)
|
|
July 04, 2011, 10:46:19 AM |
|
I went from 56->61Mhash/s on my 4850. Phatk is now faster than DiabloMiner. Getting 59Mhash/s with DiabloMiner.
Good to know that the 4XXX series gets a boost, too. What SDK are you on? Dia
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 04, 2011, 10:49:47 AM |
|
No change for me at all unfortunately, not even 0.1 MH/s. I copied the file in place of phoenix-1.50/kernels/phatk/kernel.cl and restarted my instances of phoenix.
Settings: Linux Catalyst 11.6 SDK 2.1 phatk (bundled with phoenix 1.50 with MA tweak) VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256 Solo mining
5850 Xtreme @ 1.0875V - 970MHz (399.4 MH/s) (75*C) 5850 Xtreme @ 1.1625V - 1050MHz (432.8 MH/s) (60*C)
Why are you using SDK 2.1? The phatk Kernel likes 2.4 best. Your Phoenix settings look good though. Strange that your MH/s didn't change at all. Did Phoenix apply the BFI_INT patch the first time you startet with the new Kernel? Dia I'm using SDK 2.1 because when I compared it with 2.4 I found a slight speed improvement with both phatk and poclbm (much larger with poclbm). I haven't tried SDK 2.4 since applying the MA patch so that is maybe worth a try. BFI_INT is definitely being used. If I restart the command without BFI_INT on my 399.4 MH/s gpu I get 354.5 MH/s instead.
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 04, 2011, 10:52:56 AM Last edit: July 04, 2011, 11:37:47 AM by teukon |
|
No change for me at all unfortunately, not even 0.1 MH/s. I copied the file in place of phoenix-1.50/kernels/phatk/kernel.cl and restarted my instances of phoenix.
Settings: Linux Catalyst 11.6 SDK 2.1 phatk (bundled with phoenix 1.50 with MA tweak) VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256 Solo mining
5850 Xtreme @ 1.0875V - 970MHz (399.4 MH/s) (75*C) 5850 Xtreme @ 1.1625V - 1050MHz (432.8 MH/s) (60*C)
Why are you using SDK 2.1? The phatk Kernel likes 2.4 best. Your Phoenix settings look good though. Strange that your MH/s didn't change at all. Did Phoenix apply the BFI_INT patch the first time you startet with the new Kernel? Dia I'm using SDK 2.1 because when I compared it with 2.4 I found a slight speed improvement with both phatk and poclbm (much larger with poclbm). I haven't tried SDK 2.4 since applying the MA patch so that is maybe worth a try. BFI_INT is definitely being used. If I restart the command without BFI_INT on my 399.4 MH/s gpu I get 354.5 MH/s instead. Just tried SDK 2.4 on my 399.4 MH/s gpu. It went down to 393.7 MH/s (actually 394.3 MH/s but occasionally dropping off to 390-391, I took an average). Edit: I compared the new kernel with the old one using SDK 2.4 and the improvement was 3.1 MH/s (+0.79%). This is a nice improvement but not enough to make me move away from SDK 2.1. Also, SDK 2.4 causes the MH/s to drop suddenly by 3 or 4 MH/s every so often (variance is within 0.5 MH/s with SDK 2.1 for me) and even the peak values I achieve with SDK 2.4, new kernel or not, are below my SDK 2.1 average. Ah well, good work though. I sent you some BTC anyway simply because you tried to help me fix my problem.
|
|
|
|
OCedHrt
Member
Offline
Activity: 111
Merit: 10
|
|
July 04, 2011, 11:48:23 AM |
|
I have a really high reject rate with this kernel on my 4850. I've tried alternatively running this kernel and DiabloMiner. I'm getting about 30%-50% reject on your kernel vs 10% on DiabloMiner. Could just be bad luck. Right now it's 4 accepted and 5 rejected after 10 minutes.
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 04, 2011, 12:00:17 PM |
|
I have a really high reject rate with this kernel on my 4850. I've tried alternatively running this kernel and DiabloMiner. I'm getting about 30%-50% reject on your kernel vs 10% on DiabloMiner. Could just be bad luck. Right now it's 4 accepted and 5 rejected after 10 minutes.
I found when pool mining that most of my rejects came shortly after new work was pushed. As a result I had to try much longer test runs (3 hours or so) before coming to a conclusion about the miner's efficiency.
|
|
|
|
OCedHrt
Member
Offline
Activity: 111
Merit: 10
|
|
July 04, 2011, 12:08:45 PM |
|
I have a really high reject rate with this kernel on my 4850. I've tried alternatively running this kernel and DiabloMiner. I'm getting about 30%-50% reject on your kernel vs 10% on DiabloMiner. Could just be bad luck. Right now it's 4 accepted and 5 rejected after 10 minutes.
I found when pool mining that most of my rejects came shortly after new work was pushed. As a result I had to try much longer test runs (3 hours or so) before coming to a conclusion about the miner's efficiency.

I'm aware of that, but I've had more rejects in the 10 minutes I'm running it than I've had all day. 411 shares, 22 stale. 13 of those are from testing with this kernel in the last 45 minutes or so. I'm trying another pool and it's a bit better at 6 Accepted 2 Rejected. On bitclockers I was running at 300+ shares and 11 rejects with DM but getting the numbers above with the modded phatk kernel using phoenix. Trying it again.

[04/07/2011 05:05:06] Phoenix 1.50 starting...
[04/07/2011 05:05:06] Connected to server
[04/07/2011 05:05:55] Result: 12d3028b accepted
[04/07/2011 05:06:46] Result: e2513abe accepted
[04/07/2011 05:09:05] LP: New work pushed
[04/07/2011 05:09:09] Result: 55153340 accepted
[04/07/2011 05:09:48] Result: c49813f2 accepted
[04/07/2011 05:11:17] Result: 3e257a0d rejected
[04/07/2011 05:11:25] Result: 28da50c1 rejected
[04/07/2011 05:11:26] Result: e062d59e rejected
[04/07/2011 05:11:48] LP: New work pushed
[04/07/2011 05:12:05] Result: 75d54b7b rejected
[04/07/2011 05:12:08] Result: c832f2b0 rejected
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 04, 2011, 12:15:40 PM |
|
I have a really high reject rate with this kernel on my 4850. I've tried alternatively running this kernel and DiabloMiner. I'm getting about 30%-50% reject on your kernel vs 10% on DiabloMiner. Could just be bad luck. Right now it's 4 accepted and 5 rejected after 10 minutes.
I found when pool mining that most of my rejects came shortly after new work was pushed. As a result I had to try much longer test runs (3 hours or so) before coming to a conclusion about the miner's efficiency. I'm aware of that, but I've had more rejects in the 10 minutes I'm running it than I've had all day. 411 shares, 22 stale. 13 of those are from testing with this kernel in the last 45 minutes or so. I'm trying another pool and it's a bit better at 6 Accepted 2 Rejected. On bitclockers I was running at 300+ shares and 11 rejects with DM but getting the numbers above with the modded phatk kernel using phoenix. Trying it again. Yes, after 45 minutes things look highly suspect to me. A good 3 hour test is useful for comparing different very-good setups but this tweak seems to have really hurt your accept/reject ratio. Let us know your best when you're done testing.
|
|
|
|
OCedHrt
Member
Offline
Activity: 111
Merit: 10
|
|
July 04, 2011, 12:23:02 PM Last edit: July 04, 2011, 12:58:11 PM by OCedHrt |
|
With the original phatk kernel (ma patched):

[04/07/2011 05:13:33] Phoenix 1.50 starting...
[04/07/2011 05:13:33] Connected to server
[04/07/2011 05:14:21] Result: 8c4fd15d accepted
[04/07/2011 05:14:45] Result: f15aefe5 accepted
[04/07/2011 05:15:35] Result: 9d5dfc38 rejected
[04/07/2011 05:19:21] LP: New work pushed
[04/07/2011 05:19:54] Result: 8adaadf6 accepted
[04/07/2011 05:19:57] Result: 5382cf90 accepted
[04/07/2011 05:22:54] Result: 2d0233f8 rejected
[04/07/2011 05:24:03] Result: 28c05c3a rejected
[04/07/2011 05:25:41] Result: 9dff1142 rejected
[04/07/2011 05:25:54] Result: 33095b05 accepted
[04/07/2011 05:26:05] Result: 3ec67e7e accepted
[04/07/2011 05:27:33] Result: 5307e072 accepted
[04/07/2011 05:27:37] Result: 20237b07 accepted
[04/07/2011 05:29:18] Result: c8abce0f rejected

It's actually 05:22 now, so the duration is the same, but the number of results is significantly lower, though the total accepted is the same. Hashrate is 57 vs 61 with your kernel.

Update: Some more results... seems like it may just be a Phoenix thing with bitclockers. I will get some data from DM for reference, but I may have to go ask in the Phoenix/GUIMiner thread. From DiabloMiner:

[7/4/11 5:31:58 AM] Started
[7/4/11 5:31:58 AM] Connecting to: http://pool.bitclockers.com:8332/
[7/4/11 5:31:58 AM] Using AMD Accelerated Parallel Processing OpenCL 1.1 AMD-APP-SDK-v2.5 (684.211)
[7/4/11 5:32:00 AM] Added ATI RV770 (#1) (10 CU, local work size of 128)
[7/4/11 5:33:20 AM] Accepted block 1 found on ATI RV770 (#1)
[7/4/11 5:35:41 AM] Accepted block 2 found on ATI RV770 (#1)
[7/4/11 5:36:49 AM] Accepted block 3 found on ATI RV770 (#1)
[7/4/11 5:36:49 AM] Accepted block 4 found on ATI RV770 (#1)
[7/4/11 5:37:54 AM] Rejected block 1 found on ATI RV770 (#1)
[7/4/11 5:39:40 AM] Accepted block 5 found on ATI RV770 (#1)
[7/4/11 5:40:23 AM] Accepted block 6 found on ATI RV770 (#1)
[7/4/11 5:40:34 AM] Accepted block 7 found on ATI RV770 (#1)
[7/4/11 5:40:56 AM] Accepted block 8 found on ATI RV770 (#1)
[7/4/11 5:41:53 AM] Accepted block 9 found on ATI RV770 (#1)
[7/4/11 5:42:15 AM] Accepted block 10 found on ATI RV770 (#1)
[7/4/11 5:42:38 AM] Accepted block 11 found on ATI RV770 (#1)
[7/4/11 5:42:59 AM] Accepted block 12 found on ATI RV770 (#1)
[7/4/11 5:43:57 AM] Accepted block 13 found on ATI RV770 (#1)
[7/4/11 5:44:48 AM] Accepted block 14 found on ATI RV770 (#1)

Trying your kernel with the poclbm miner now. Actually getting 62 on here vs 61 from Phoenix.

Update again: No issues on poclbm using your kernel, so I guess it's Phoenix.
|
|
|
|
Diapolo (OP)
|
|
July 04, 2011, 03:22:12 PM |
|
Trying your kernel with poclbm miner now. Actually getting 62 on here vs 61 from Phoenix.
Update again: No issues on poclbm using your kernel so I guess it's Phoenix.
Dunno what the difference is between the two OpenCL-wise: setup, command queues, perhaps kernel result download or processing. If you see a problem you should contact jedi95, perhaps he can clear this up? Dia
|
|
|
|
huayra.agera
|
|
July 04, 2011, 04:20:35 PM |
|
Hi! Just tested this, it did boost my 5850 @ 960/300/1.174v from 390 to 395. However, it seems to drop off to 380 all of a sudden for 3-4 seconds, then back up to 395. Still, an increase, thanks for the work!
|
BTC: 1JMPScxohom4MXy9X1Vgj8AGwcHjT8XTuy
|
|
|
OCedHrt
Member
Offline
Activity: 111
Merit: 10
|
|
July 04, 2011, 05:48:07 PM |
|
Trying your kernel with poclbm miner now. Actually getting 62 on here vs 61 from Phoenix.
Update again: No issues on poclbm using your kernel so I guess it's Phoenix.
Dunno what the difference is between the 2 OpenCL wise. Setup, command queues, perhaps kernel result download or processing. If you see a problem you should contact jedi95, perhaps he can clear this up? Dia I will look into it. GUIMiner has their own Phoenix and that may be related.
|
|
|
|
gfaust
Newbie
Offline
Activity: 24
Merit: 0
|
|
July 04, 2011, 11:50:16 PM |
|
I was already running the "#define Ma" optimized kernel, and this is good for another .5% on top of that.
|
|
|
|
shakaru
Sr. Member
Offline
Activity: 406
Merit: 250
QUIFAS EXCHANGE
|
|
July 04, 2011, 11:58:33 PM |
|
went from 306-314 on my 5830's Sent you a little thank you. People PAY THIS MAN!
|
|
|
|
Alan Lupton
Newbie
Offline
Activity: 42
Merit: 0
|
|
July 05, 2011, 01:46:19 AM Last edit: July 26, 2011, 03:46:48 AM by Alan Lupton |
|
Hi, I offer my rapidshare account for the links. I would mark the files as trafficshare, meaning no popups, no wait time, nothing. Just like a normal download from, let's say, any software website, click and go. You wouldn't even have to go to the rapidshare page. And since we're talking about 4KB, I couldn't care less. Try it:
Version 2011-07-17: https://www.rapidshare.com/files/4111719732/2011-07-17_kernel.7z
Version 2011-07-11: https://www.rapidshare.com/files/3730055236/2011-07-11_kernel.7z
Version 2011-07-07: https://www.rapidshare.com/files/1447400948/2011-07-07_kernel.7z
Version 2011-07-06: https://www.rapidshare.com/files/698776394/2011-07-06_kernel.7z
Version 2011-07-03: https://www.rapidshare.com/files/3813413034/2011-07-03_kernel.7z
Version 2011-07-01: https://www.rapidshare.com/files/946373551/kernel.7z
|
|
|
|
Diapolo (OP)
|
|
July 05, 2011, 05:10:27 AM |
|
went from 306-314 on my 5830's Sent you a little thank you. People PAY THIS MAN!
Sounds great and thank you ! Dia
|
|
|
|
Diapolo (OP)
|
|
July 05, 2011, 05:12:41 AM |
|
Can somebody upload a version of the kernel we can extract natively in windows without downloading yet another compression/decompression app?
TIA
edit: Online decompression: wobzip.org
The kernel is so small I guess it wouldn't even need to be zipped, the download volume should not be that big. But I really like 7-Zip, it's free and open source ... you should consider installing it as your default packer. Dia
|
|
|
|
Keninishna
|
|
July 05, 2011, 08:51:23 AM |
|
winrar can unzip like everything.
|
|
|
|
Diapolo (OP)
|
|
July 05, 2011, 11:32:17 AM |
|
winrar can unzip like everything.
And it is shareware ... but let's not start a packer discussion here and return to the topic! Dia
|
|
|
|
Turix
|
|
July 05, 2011, 12:15:45 PM |
|
Can somebody upload a version of the kernel we can extract natively in windows without downloading yet another compression/decompression app?
TIA
edit: Online decompression: wobzip.org
Seriously, just get 7-Zip and remove the rest; it's free, open source, and its default format .7z is probably the most effective lossless compression format in common use.
|
|
|
|
Wildvest
Newbie
Offline
Activity: 41
Merit: 0
|
|
July 05, 2011, 02:13:26 PM |
|
I tried it on one of my 3 x 6990 mining rigs - normally, using poclbm with phatk, I get 408 MH/s per GPU - with your kernel, 407 MH/s per GPU.
|
|
|
|
Diapolo (OP)
|
|
July 05, 2011, 02:41:02 PM |
|
i tried it on one of my 3 x 6990 mining rig - normally using poclbm with the phatk per GPU 408 MH/s - with your kernel per GPU 407 MH/s
Too bad, but thanks for trying. Perhaps a new version will be ready by the end of this week. But guys, don't expect a huge improvement: on my setup I get 0.5 - 1.5 MHash/s more than with the 2011-07-03 kernel version (guess the puzzle reaches its end ^^). I can't work the way I would like to because the AMD APP KernelAnalyzer doesn't work ... hoping for a new version! Dia
|
|
|
|
jedi95
|
|
July 05, 2011, 03:50:18 PM |
|
i tried it on one of my 3 x 6990 mining rig - normally using poclbm with the phatk per GPU 408 MH/s - with your kernel per GPU 407 MH/s
This isn't exactly a fair comparison since phatk was specifically targeted at VLIW5 GPUs on SDK 2.4. A better comparison would be against phatk without this modification.
|
Phoenix Miner developer Donations appreciated at: 1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
|
|
|
hugolp
Legendary
Offline
Activity: 1148
Merit: 1001
Radix-The Decentralized Finance Protocol
|
|
July 05, 2011, 03:58:21 PM |
|
The newest kernel gives an error on only 1 of my 4 cards. Phoenix says something about a kernel or OpenCL error and suggests that the card might be malfunctioning (and then keeps mining), and poclbm just crashes. The card is an ASUS 5870, but I have an identical model that works fine. The error might not happen for 20 minutes or so. And the card apparently works perfectly, since various pools accept the results from that card with no unusual number of stales.
Also, I find it weird that the newest kernel seems to add a few watts of power consumption (I need to measure this properly though). Since it's doing fewer operations and that way achieving a higher rate, shouldn't it consume the same?
|
|
|
|
Turix
|
|
July 05, 2011, 07:29:19 PM Last edit: July 05, 2011, 11:38:01 PM by Turix |
|
Bumped my hash rate from about 428 to 431 (3 MHash/s, or about 0.7%), although previously I was using a customized kernel that has some of the same changes you've made, so this increase is not representative.
Edit: XFX 5870 @ 950/315
|
|
|
|
CYPER
|
|
July 05, 2011, 08:25:23 PM |
|
4x 5870 @ 960Mhz Core = rock solid 1748Mhash/s After I put the modified kernel: Variable 1746-1751 with most of the time staying around 1748 So no use for me, but thanks for your effort anyway
|
|
|
|
techwtf
|
|
July 05, 2011, 11:50:56 PM |
|
5870, SDK 2.1, 11.6: no change, 419->419
|
|
|
|
joulesbeef
Sr. Member
Offline
Activity: 476
Merit: 250
moOo
|
|
July 06, 2011, 12:29:43 AM |
|
Sapphire 5830 Xtreme.. upped me from 329.2 to 332.6
overclocked 1040/355
aggression 12 worksize 256
66c
|
mooo for rent
|
|
|
c_k
Donator
Full Member
Offline
Activity: 242
Merit: 100
|
|
July 06, 2011, 02:41:40 AM Last edit: July 06, 2011, 03:04:27 AM by c_k |
|
I get 5MH/s increase on 5850 (372MH/s -> 377MH/s) and 2MH/s increase on 5770 (215MH/s -> 217MH/s)
hmm, it does seem to vary a hell of a lot more though
|
|
|
|
hugolp
Legendary
Offline
Activity: 1148
Merit: 1001
Radix-The Decentralized Finance Protocol
|
|
July 06, 2011, 05:13:15 AM |
|
5870, SDK 2.1, 11.6: no change, 419->419

Since most GPU miners are now using the phatk kernel, you should upgrade to 2.4. The phatk kernel is optimized for 2.4. You will get better results.
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 06, 2011, 10:12:04 AM |
|
5870, SDK 2.1, 11.6: no change, 419->419 Since most GPU miners are now using the phatk kernel, you should upgrade to 2.4. The phatk kernel is optimized for 2.4. You will get better results.

Well, you may get better results. For me:
SDK 2.4: 393.7 -> 396.8
SDK 2.1: 399.4 -> 399.4

That's for a Sapphire HD5850 Xtreme 970/300 @ 1.0875V using Catalyst 11.6 on Linux and phatk VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256. Definitely try out both SDK 2.1 and SDK 2.4 though. If you are using Windows there's also an early version of SDK 2.5 with Catalyst 11.7 which may be worth a look.
|
|
|
|
Diapolo (OP)
|
|
July 06, 2011, 01:09:39 PM |
|
New version is ready, DL here: http://www.mediafire.com/?f8b8q3w5u5p0ln0
Updated the first post with changelog and performance info. This one should be a bit faster on 69XX cards than the original phatk, faster than all other phatk versions I did on 58XX, and faster on non BFI_INT cards because of a change user 1MLyg5WVFSMifFjkrZiyGW2nw suggested! Try it, have fun, comment and donate. Thanks, Dia
|
|
|
|
Apopfis
Newbie
Offline
Activity: 12
Merit: 0
|
|
July 06, 2011, 01:39:30 PM |
|
From ~402 MH/s to ~405 MH/s, and I have 4x 5850 Sapphires, each clocked to 1000 MHz. It will take time to see what the stales rate will be. With the last kernel it was around 2-3%, usually closer to 2%. PLEASE KEEP THESE COMING
|
|
|
|
strictlyfocused
Newbie
Offline
Activity: 55
Merit: 0
|
|
July 06, 2011, 01:56:34 PM |
|
MSI Hawk 5770:
2011-07-03 kernel :: 233 MH/s
2011-07-06 kernel :: 236 MH/s
Thanks!!!
|
|
|
|
Keninishna
|
|
July 06, 2011, 02:53:22 PM |
|
No change for me on my 6950s, 11.6 drivers, SDK 2.4.
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 06, 2011, 04:43:35 PM |
|
Another improvement but still not enough to beat SDK 2.1 for me.
The two phatk kernels of interest to me are: - Kernel A = standard phatk kernel with the MA tweak applied. - Kernel B = the latest kernel from this thread.
I'm using: Linux, Catalyst 11.6, a Sapphire HD5850 Xtreme:
At 900 MHz things look promising...
[900 MHz - 360 MHz RAM]
SDK 2.1, kernel A: 364.4 MH/s
SDK 2.1, kernel B: Fatal error
SDK 2.4, kernel A: 360.8 MH/s
SDK 2.4, kernel B: 365.6 MH/s
...but at higher core clock rates SDK 2.1 takes the lead once more.
[980 MHz - 360 MHz RAM]
SDK 2.1, kernel A: 404.7 MH/s
SDK 2.4, kernel B: 404.3 MH/s

[1020 MHz - 360 MHz RAM]
SDK 2.1, kernel A: 421.5 MH/s
SDK 2.4, kernel B: 420.9 MH/s
I would give some higher clocks but I can't go much past 1020 MHz without overvolting my card and I don't want to do that.
I tried playing with the RAM frequency but everything dropped off slowly as I lowered it and quickly as I raised it no matter which kernel or version of SDK I choose.
It's a shame that SDK 2.1 cannot drive your latest kernel but then I guess you are specifically designing it for SDK 2.4 and are now using features which are not available in SDK 2.1. It would be great to finally put SDK 2.1 to bed but another MH/s sounds like a tall order at this point.
I have no data on accepts and rejects (I mine solo).
|
|
|
|
CYPER
|
|
July 06, 2011, 04:50:18 PM |
|
I'm using Autominer and the newest kernel does not work for me at all, but I don't have time to troubleshoot it, as I'd miss out on my mining and any expected rewards.
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 06, 2011, 05:13:20 PM |
|
I'm using Autominer and the newest kernel does not work for me at all, but I don't have time to troubleshoot it as I'll miss on my mining and any expected rewards.
Possibly you're using an old version of SDK. I get a fatal error when trying this with SDK 2.1 but am fine with SDK 2.4.
|
|
|
|
Diapolo (OP)
|
|
July 06, 2011, 07:07:04 PM |
|
I made an interesting discovery during my own tests with the new kernel version. I had to up the memory clock of my 5870 from 200 to 350 MHz in order to achieve the highest hashing values. Another thing to mention is, that I drive a Phenom II X6 1090T with only 800 MHz for every core, due to power saving, while mining. If I let the CPU use full speed, MHash/s goes even higher, let's say 3-4 MH/s.
Conclusion: Perhaps you guys should try to raise your mem speeds + experiment with CPU clocks, too. I know it has to be a good balance, so that higher MH/s values are not eaten by higher energy costs.
Dia
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 06, 2011, 07:15:24 PM |
|
I made an interesting discovery during my own tests with the new kernel version. I had to up the memory clock of my 5870 from 200 to 350 MHz in order to achieve the highest hashing values. Another thing to mention is, that I drive a Phenom II X6 1090T with only 800 MHz for every core, due to power saving, while mining. If I let the CPU use full speed, MHash/s goes even higher, let's say 3-4 MH/s.
Conclusion: Perhaps you guys should try to raise your mem speeds + experiment with CPU clocks, too. I know it has to be a good balance, so that higher MH/s values are not eaten by higher energy costs.
Dia
My card RAM is already at 360 MHz and I've tested but I can't find a better frequency for the RAM at my core speeds if I'm only interested in MH/s. As for CPU usage I've not touched my CPU settings at all and the miners only use about 0.4% each. I even removed the fan from the CPU and placed it to cool the back of my hot card (the heatsink on the CPU is not even warm). I'm assuming significant CPU loads is a Windows thing. What interests me is how SDK 2.1 seems to be better at higher clock speeds whereas SDK 2.4 with your kernel is better at moderate speeds (940 MHz or below). I admit I have little data on this but if anyone else gets the same results it would be interesting to know why.
|
|
|
|
CYPER
|
|
July 06, 2011, 07:18:31 PM |
|
Possibly you're using an old version of SDK. I get a fatal error when trying this with SDK 2.1 but am fine with SDK 2.4.
Well it worked with the previous version of the modified kernel.
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 06, 2011, 07:22:37 PM |
|
Possibly you're using an old version of SDK. I get a fatal error when trying this with SDK 2.1 but am fine with SDK 2.4.
Well it worked with the previous version of the modified kernel. Yes, I found the previous version worked with SDK 2.1. But the version released today doesn't. I had to change to SDK 2.4 for this most recent version and this change actually lost me 0.4-0.6 MH/s.
|
|
|
|
Diapolo (OP)
|
|
July 06, 2011, 07:34:13 PM |
|
Possibly you're using an old version of SDK. I get a fatal error when trying this with SDK 2.1 but am fine with SDK 2.4.
Well it worked with the previous version of the modified kernel. Yes, I found the previous version worked with SDK 2.1. But the version released today doesn't. I had to change to SDK 2.4 for this most recent version and this change actually lost me 0.4-0.6 MH/s.

I'm sorry to hear that ... what are the error messages you get with 2.1? I only tried with 2.4 and will test only on 2.4 and later SDKs myself. But if you give me a hint I can try to fix it. Dia
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 06, 2011, 07:49:09 PM |
|
I'm sorry to hear that ... what are the error messages you get with 2.1? I only tried with 2.4 and will test only on 2.4 and later SDKs by myself. Buf If you give me a hint I can try to fix it.
Dia
Don't worry about it. The improvements for the SDK 2.4 users are clear and I'm impressed that you've managed to close the gap between 2.4 and 2.1 as much as you have. I don't know how to get detailed error messages from phatk. When I use SDK 2.1 and your latest kernel I run the command
python phoenix.py -u http://<user>:<pass>@<host>:<port>/ -a 1 -q 1 -k phatk VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256 DEVICE=1
and get
[<date> <time>] FATAL kernel error: Failed to load OpenCL kernel!
If I try the same with the previous version of your kernel everything works happily. I wish I had more details for you but I just don't know how to get them.
|
|
|
|
Diapolo (OP)
|
|
July 06, 2011, 08:07:43 PM |
|
I'm sorry to hear that ... what are the error messages you get with 2.1? I only tried with 2.4 and will test only on 2.4 and later SDKs by myself. Buf If you give me a hint I can try to fix it.
Dia
Don't worry about it. The improvements for the SDK 2.4 users are clear and I'm impressed that you've managed to close the gap between 2.4 and 2.1 as much as you have. I don't know how to get detailed error messages from phatk. When I use SDK 2.1 and your latest kernel I run the command python phoenix.py -u http://<user>:<pass>@<host>:<port>/ -a 1 -q 1 -k phatk VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256 DEVICE=1and get [<date> <time>] FATAL kernel error: Failed to load OpenCL kernel!If I try the same with the previous version of your kernel everything works happily. I wish I had more details for you but I just don't know how to get them. If Phoenix would allow to output the OpenCL compiler build log we could get an idea what's wrong. Perhaps jedi95 reads here and takes this as a suggestion . Perhaps I can take the lead with 2.4 and newer versions of my kernel, but for now I have no huge optimization ideas ... (but I'm thinking about it right now ^^). Dia
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 06, 2011, 08:19:16 PM |
|
If Phoenix would allow to output the OpenCL compiler build log we could get an idea what's wrong. Perhaps jedi95 reads here and takes this as a suggestion . Perhaps I can take the lead with 2.4 and newer versions of my kernel, but for now I have no huge optimization ideas ... (but I'm thinking about it right now ^^). Dia Would I get more detailed feedback from another front-end to phatk? I haven't really 'shopped around' with the front ends.
|
|
|
|
jedi95
|
|
July 06, 2011, 08:53:03 PM |
|
I'm sorry to hear that ... what are the error messages you get with 2.1? I only tried with 2.4 and will test only on 2.4 and later SDKs by myself. Buf If you give me a hint I can try to fix it.
Dia
Don't worry about it. The improvements for the SDK 2.4 users are clear and I'm impressed that you've managed to close the gap between 2.4 and 2.1 as much as you have. I don't know how to get detailed error messages from phatk. When I use SDK 2.1 and your latest kernel I run the command
python phoenix.py -u http://<user>:<pass>@<host>:<port>/ -a 1 -q 1 -k phatk VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256 DEVICE=1
and get
[<date> <time>] FATAL kernel error: Failed to load OpenCL kernel!
If I try the same with the previous version of your kernel everything works happily. I wish I had more details for you but I just don't know how to get them. If Phoenix would allow to output the OpenCL compiler build log we could get an idea what's wrong. Perhaps jedi95 reads here and takes this as a suggestion. Perhaps I can take the lead with 2.4 and newer versions of my kernel, but for now I have no huge optimization ideas ... (but I'm thinking about it right now ^^). Dia

There is no point trying to run phatk on pre-2.4 SDK versions. It will just end up being slower than the poclbm kernel. For mining I see only 2 real options:
- SDK 2.1 with poclbm
- SDK 2.4 with phatk
2.2 is slower than 2.1 on poclbm and doesn't work well with phatk either. 2.3 is even slower than 2.2 on poclbm, but all I know with phatk is that it's slower than with 2.4.
Anyway, getting the output from the compiler is very simple. You just need to comment out the try/except block surrounding self.loadKernel().
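That try/except lives in Phoenix's Python code, so once the handler no longer swallows the exception, the compiler output should simply surface through the pyopencl error message. For readers more familiar with the OpenCL host API itself, the same build log is what clGetProgramBuildInfo() hands back after a failed clBuildProgram(); here is a generic, hypothetical C sketch of that call (not Phoenix code):

/* Generic OpenCL host-side sketch in C, not Phoenix code: fetch and print
 * the compiler's build log after clBuildProgram() has failed. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

void print_build_log(cl_program program, cl_device_id device)
{
    size_t log_size = 0;

    /* first ask for the size of the log, then for the log itself */
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                          0, NULL, &log_size);

    char *log = (char *)malloc(log_size + 1);
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                          log_size, log, NULL);
    log[log_size] = '\0';

    fprintf(stderr, "OpenCL build log:\n%s\n", log);
    free(log);
}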
|
Phoenix Miner developer Donations appreciated at: 1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
|
|
|
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
|
|
July 06, 2011, 09:02:09 PM |
|
I am sure it increases: 424 -> 447 MHash/s and 413 -> 430 MHash/s on a Sapphire 5870 and an MSI 5870.
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 06, 2011, 09:07:25 PM |
|
There is no point trying to run phatk on pre-2.4 SDK versions. It will just end up being slower than the poclbm kernel.
I read elsewhere that this is the theory but in practice phatk is faster than poclbm on SDK 2.1 for me. Maybe this has something to do with the fact that I've applied the MA tweak (one less operation) to both kernels. E.g. Sapphire HD5850 Xtreme 1000MHz core, 350MHz RAM, Catalyst 11.6 (Linux x86_64), VECTORS BFI_INT FASTLOOP=false AGGRESSION=13 WORKSIZE=256: phatk: 413.3 MH/s (+/- 0.2 MH/s) poclbm: 411.4 MH/s (+/- 0.2 MH/s) I've tried lower core speeds and higher RAM speeds but always phatk outperforms poclbm on SDK 2.1 for me. For mining I see only 2 real options: SDK 2.1 with poclbm SDK 2.4 with phatk
2.2 is slower than 2.1 on poclbm and doesn't work well with phatk either. 2.3 is even slower than 2.2 on poclbm, but all I know with phatk is that it's slower than with 2.4
Anyway, getting the output from the compiler is very simple. You just need to comment out the try/except block surrounding self.loadKernel().
I'll try that.
|
|
|
|
Wildvest
Newbie
Offline
Activity: 41
Merit: 0
|
|
July 06, 2011, 09:18:42 PM |
|
THANKS for your efforts ! just reporting back 6990 version 2011-07-06 with Catalyst 11.4, SDK 2.4 now equal with the latest poclbm (phatk) - maybe 0.5 MH/s slower
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 06, 2011, 09:20:17 PM |
|
Ok, here are the errors for the latest kernel on SDK 2.1. { Build on <pyopencl.Device 'Cypress' at 0x34a3680>:
/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[19] = P4(19) + 0x11002000 + P1(19); ^
/tmp/OCLthVTDN.cl(138): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[30] = P3(30) + 0xA00055 + P1(30); ^
/tmp/OCLthVTDN.cl(261): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) Vals[3] = L + W[64]; ^
/tmp/OCLthVTDN.cl(286): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[81] = P4(81) + P2(81) + 0xA00000; ^
/tmp/OCLthVTDN.cl(299): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[87] = P4(87) + P3(87) + 0x11002000 + P1(87); ^
/tmp/OCLthVTDN.cl(316): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[94] = P3(94) + 0x400022 + P1(94); ^
6 errors detected in the compilation of "/tmp/OCLthVTDN.cl". }
Hopefully this is just some implicit casting of a kind which SDK 2.1 wants to be babied through. If I were versed in OpenCL I'd have a go at fixing this myself but I'm sure you would be much more efficient.
|
|
|
|
Alan Lupton
Newbie
Offline
Activity: 42
Merit: 0
|
|
July 06, 2011, 10:51:07 PM |
|
2011-07-06: Wow, nice work! Now I'm not getting 5-15% rejections anymore and it's working like a charm. No speed increase from the last update though.
|
|
|
|
c_k
Donator
Full Member
Offline
Activity: 242
Merit: 100
|
|
July 07, 2011, 02:38:11 AM |
|
The new release gives me 2-3 MH/s more. I've given a small donation. Thanks for the hard work!
|
|
|
|
Diapolo (OP)
|
|
July 07, 2011, 06:02:05 AM |
|
Ok, here are the errors for the latest kernel on SDK 2.1. { Build on <pyopencl.Device 'Cypress' at 0x34a3680>:
/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[19] = P4(19) + 0x11002000 + P1(19); ^
/tmp/OCLthVTDN.cl(138): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[30] = P3(30) + 0xA00055 + P1(30); ^
/tmp/OCLthVTDN.cl(261): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) Vals[3] = L + W[64]; ^
/tmp/OCLthVTDN.cl(286): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[81] = P4(81) + P2(81) + 0xA00000; ^
/tmp/OCLthVTDN.cl(299): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[87] = P4(87) + P3(87) + 0x11002000 + P1(87); ^
/tmp/OCLthVTDN.cl(316): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[94] = P3(94) + 0x400022 + P1(94); ^
6 errors detected in the compilation of "/tmp/OCLthVTDN.cl". }
Hopefully this is just some implicit casting of a kind which SDK 2.1 wants to be babied through. If I were versed in OpenCL I'd have a go at fixing this myself but I'm sure you would be much more efficient.
You could try to add (u) in front of the raw hex values like (u)0x400022 and report back then. Dia
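To illustrate what the suggested cast does -- a hedged sketch, not the actual kernel, and assuming that, as in stock phatk, u is typedef'd to the kernel's working vector type when vectors are enabled: casting the hex literal to u turns the mixed vector + scalar addition the SDK 2.1 compiler complains about into a plain vector + vector addition, with the scalar replicated across all lanes.

/* Hypothetical illustration only, assuming "u" is the kernel's vector
 * typedef (uint2 here).  The explicit (u) cast replicates the scalar
 * constant into every vector component. */
typedef uint2 u;

u cast_demo(const u p3, const u p1)
{
    /* SDK 2.1 reportedly rejects the uncast form:  p3 + 0x400022 + p1  */
    return p3 + (u)0x400022 + p1;
}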
|
|
|
|
dsky
|
|
July 07, 2011, 07:22:21 AM |
|
All miners are Windows 7 x32 - SDK 2.4 - Catalyst 11.6
Latest changes:
HD5770 - from 219 up to 220
HD6950 (unlockable) - from 367 to 370
HD6970 (6950 with 6950 BIOS) - from 405 up to 408
Small speed increase on all three kinds of cards, and the reject rate seems better, too.
Well done again, Sir!
|
|
|
|
hugolp
Legendary
Offline
Activity: 1148
Merit: 1001
Radix-The Decentralized Finance Protocol
|
|
July 07, 2011, 07:40:36 AM Last edit: July 07, 2011, 09:07:07 AM by hugolp |
|
5870, Ubuntu 11.04, 11.6, 2.4, poclbm: went up 1 MH/s (with the latest modification, compared to the previous one).
The good news is the card that was randomly crashing the miner every 20 minutes with the previous patch has been running for more than an hour without problems, so it seems stable now. Just crashed. I don't know what happens with this card and the modified kernel. Also, consumption has gone down by about 5 W. I'm very puzzled by these changes in consumption with the different kernels.
Very good job. A small donation is going your way.
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 07, 2011, 07:48:45 AM |
|
Ok, here are the errors for the latest kernel on SDK 2.1. { Build on <pyopencl.Device 'Cypress' at 0x34a3680>:
/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[19] = P4(19) + 0x11002000 + P1(19); ^
/tmp/OCLthVTDN.cl(138): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[30] = P3(30) + 0xA00055 + P1(30); ^
/tmp/OCLthVTDN.cl(261): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) Vals[3] = L + W[64]; ^
/tmp/OCLthVTDN.cl(286): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[81] = P4(81) + P2(81) + 0xA00000; ^
/tmp/OCLthVTDN.cl(299): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[87] = P4(87) + P3(87) + 0x11002000 + P1(87); ^
/tmp/OCLthVTDN.cl(316): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[94] = P3(94) + 0x400022 + P1(94); ^
6 errors detected in the compilation of "/tmp/OCLthVTDN.cl". }
Hopefully this is just some implicit casting of a kind which SDK 2.1 wants to be babied through. If I were versed in OpenCL I'd have a go at fixing this myself but I'm sure you would be much more efficient.
You could try to add (u) in front of the raw hex values like (u)0x400022 and report back then. Dia

No good. Now I get a ton of "expression must have a constant value" errors. The end of the log looks like:
{
/tmp/OCLgV3our.cl(25): error: expression must have a constant value (u)0x6a09e667, (u)0xbb67ae85, (u)0x3c6ef372, (u)0x510e527f, (u)0x9b05688c, (u)0x1f83d9ab, (u)0xfc08884d, (u)0x5be0cd19 ^
/tmp/OCLgV3our.cl(29): error: expression must have a constant value __constant ulong L = (u)0x198c7e2a2; ^
/tmp/OCLgV3our.cl(261): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) Vals[3] = L + W[64]; ^
74 errors detected in the compilation of "/tmp/OCLgV3our.cl".
}
Only one of the "mixed vector-scalar operation" errors remains, but I'm guessing the others are still there, just buried by the even more urgent "constant value" errors.
|
|
|
|
OCedHrt
Member
Offline
Activity: 111
Merit: 10
|
|
July 07, 2011, 08:32:42 AM |
|
7/6 kernel seems to have the following effects for me:
4850 - the ELF kernel is much larger, but there's a slight speed increase of ~3-4%, from 56-57 @ 460 core to 58-59 @ 460 core. It also seems to run cooler; since my card is overheating I have to underclock, and a cooler kernel means a higher clock: at 480 I'm getting 60. Using -v -w 128. Worksize 128 is still optimal, 64 is slightly slower and 256 is much slower (< 50 MHash).
6450 - again the ELF kernel is much larger, but there's a slight speed decrease at worksize 128. The previous kernel was optimal at 128, but this one is optimal at 64; worksize 64 with the new kernel is equivalent to worksize 128 with the old kernel. It potentially runs cooler, but I did not check. This is a fanless card, though, and only gets 32 MHash.
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 07, 2011, 08:50:22 AM |
|
You could try to add (u) in front of the raw hex values like (u)0x400022 and report back then.
Dia
Sorry about that last post. I'm not usually that dumb, I assure you. I've modified your kernel code by adding (u) before each of the 5 raw hex values corresponding to the error messages. I also added (u) directly before L from the other error message. After this everything starts working in SDK 2.1. For my stock voltage 5850: 423.7 (+/- 0.1) MH/s -> 425.9 (+/- 0.05) MH/s. This does of course mean that SDK 2.1 has increased its lead against SDK 2.4 for me. So many people are convinced that SDK 2.4 is faster, so perhaps this is a Windows/Linux thing. If this runs for 24 hours without freezing then I have a new personal best! I will want to test what proportion of these hashes are inaccurate, but things are looking good. Another donation is coming your way.
|
|
|
|
Diapolo (OP)
|
|
July 07, 2011, 09:46:52 AM |
|
Ok, here are the errors for the latest kernel on SDK 2.1. { Build on <pyopencl.Device 'Cypress' at 0x34a3680>:
/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[19] = P4(19) + 0x11002000 + P1(19); ^
/tmp/OCLthVTDN.cl(138): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[30] = P3(30) + 0xA00055 + P1(30); ^
/tmp/OCLthVTDN.cl(261): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) Vals[3] = L + W[64]; ^
/tmp/OCLthVTDN.cl(286): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[81] = P4(81) + P2(81) + 0xA00000; ^
/tmp/OCLthVTDN.cl(299): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[87] = P4(87) + P3(87) + 0x11002000 + P1(87); ^
/tmp/OCLthVTDN.cl(316): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[94] = P3(94) + 0x400022 + P1(94); ^
6 errors detected in the compilation of "/tmp/OCLthVTDN.cl". }
Hopefully this is just some implicit casting of a kind which SDK 2.1 wants to be babied through. If I were versed in OpenCL I'd have a go at fixing this myself but I'm sure you would be much more efficient.
You could try to add (u) in front of the raw hex values like (u)0x400022 and report back then. Dia
No good. Now I get a ton of "expression must have a constant value" errors. The end of the log looks like: { /tmp/OCLgV3our.cl(25): error: expression must have a constant value (u)0x6a09e667, (u)0xbb67ae85, (u)0x3c6ef372, (u)0x510e527f, (u)0x9b05688c, (u)0x1f83d9ab, (u)0xfc08884d, (u)0x5be0cd19 ^ /tmp/OCLgV3our.cl(29): error: expression must have a constant value __constant ulong L = (u)0x198c7e2a2; ^ /tmp/OCLgV3our.cl(261): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) Vals[3] = L + W[64]; ^ 74 errors detected in the compilation of "/tmp/OCLgV3our.cl". } Only one of the "mixed vector-scalar operation" errors remains but I'm guessing the others are still there but just buried by the even more urgent "constant value" errors.
Ah sorry, I was not clear enough. You must not add (u) in front of every hex value in the kernel, but ONLY in front of the hex values that generated an error:
W[19] = P4(19) + (u)0x11002000 + P1(19);
W[30] = P3(30) + (u)0xA00055 + P1(30);
Vals[3] = (u)L + W[64];
W[81] = P4(81) + P2(81) + (u)0xA00000;
W[87] = P4(87) + P3(87) + (u)0x11002000 + P1(87);
W[94] = P3(94) + (u)0x400022 + P1(94);
If you could be so kind as to test this out and report back, that would be great. I would say restore the latest kernel and then modify the 6 places. Dia
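For reference, the whole SDK 2.1 issue boils down to mixed vector-scalar arithmetic. Here is a minimal sketch, assuming u is the kernel's typedef for its working type (uint2 when VECTORS is enabled); the kernel name and buffer are invented for illustration only:

    // Illustration only, not part of the real kernel.
    typedef uint2 u;

    __kernel void cast_demo(__global u *W)
    {
        // Newer SDKs up-convert the scalar literal to the vector type on
        // their own; the SDK 2.1 compiler rejects the mixed addition instead.
        // W[19] = W[18] + 0x11002000;     // "mixed vector-scalar operation" on 2.1
        W[19] = W[18] + (u)0x11002000;     // explicit cast builds on both
    }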
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 07, 2011, 09:58:45 AM |
|
Ah sorry, I was not clear enough. You must not add (u) in front of every hex value in the kernel, but ONLY in front of the hex values, that generated an error.
You were perfectly clear, I was just being dumb. Incase you missed my second post, this fix works for SDK 2.1. Thank you very much.
|
|
|
|
Diapolo (OP)
|
|
July 07, 2011, 10:06:00 AM |
|
Great, so we have a fix and a version that works with 2.1. Will release a fix later today!
Dia
|
|
|
|
n4l3hp
|
|
July 07, 2011, 10:44:05 AM |
|
7/6 kernel seems to have the following effects for me:
4850 - elf kernel is much larger, but slight speed increase of ~3-4% from 56-57 @ 460 core to 58-59 @ 460 core. also seems to run cooler. since my card is overheating I have to underclock. cooler kernel means higher clock at 480 - getting 60. Using -v -w 128. worksize 128 is still optimal, 64 is slightly slower and 256 is much slower < 50mhash.
6450 - again elf kernel is much larger, but slight speed decrease at worksize 128. previous kernel was optimal at 128 but this one is optimal at 64. worksize 64 with new kernel is equivalent to worksize 128 with old kernel. potentially runs cooler but did not check. this is a fanless card but only gets 32mhash.
My 4850 @ 675 core 250 mem gets 85MH/s. 0.32% stale rate at DeepBit. (bought from eBay, dont know what brand, came with zalman cooler. anything higher than 680 core will cause it to stop hashing even if overvolted). Temps at 71 degrees celsius, closed case. Been running Milkyway@Home for more than a year at the same settings before I switched it to bitcoin mining. For ATI 4000 series, use SDK 2.1 and poclbm (April 28 version). Using phatk and higher opencl sdk version on these cards will only lower the hash rate.
|
|
|
|
OCedHrt
Member
Offline
Activity: 111
Merit: 10
|
|
July 07, 2011, 12:11:48 PM |
|
7/6 kernel seems to have the following effects for me:
4850 - elf kernel is much larger, but slight speed increase of ~3-4% from 56-57 @ 460 core to 58-59 @ 460 core. also seems to run cooler. since my card is overheating I have to underclock. cooler kernel means higher clock at 480 - getting 60. Using -v -w 128. worksize 128 is still optimal, 64 is slightly slower and 256 is much slower < 50mhash.
6450 - again elf kernel is much larger, but slight speed decrease at worksize 128. previous kernel was optimal at 128 but this one is optimal at 64. worksize 64 with new kernel is equivalent to worksize 128 with old kernel. potentially runs cooler but did not check. this is a fanless card but only gets 32mhash.
My 4850 @ 675 core 250 mem gets 85MH/s. 0.32% stale rate at DeepBit. (bought from eBay, dont know what brand, came with zalman cooler. anything higher than 680 core will cause it to stop hashing even if overvolted). Temps at 71 degrees celsius, closed case. Been running Milkyway@Home for more than a year at the same settings before I switched it to bitcoin mining. For ATI 4000 series, use SDK 2.1 and poclbm (April 28 version). Using phatk and higher opencl sdk version on these cards will only lower the hash rate. You misread my post. I am running at 460 core because the card is 105C at that speed. I cannot run it any faster. I can run 480 core with this new kernel. Btw, the days of SDK 2.1 and poclbm are nearly over. I get 84MH/s at 675 core and 494 mem. I can't do 250 mem the card doesn't downclock that far with afterburner. At 250 mem it would be even higher. However I can actually clock 700+ even though only for a few seconds.
|
|
|
|
Diapolo (OP)
|
|
July 07, 2011, 03:20:41 PM |
|
New version 2011-07-07 is ready: http://www.mediafire.com/?7j70gnmllgi9b73
This is mainly a bugfix release for SDK 2.1, with some code restructuring to save a few writes and additions. I cannot guarantee that this really works for 2.1, because I didn't test it. If you are unsure, wait for users to test it for you and consider applying this patch later! By the way, I want to thank all of those who donated a few bitcents to me - feels great! Thanks, Dia
PS: If it works, please post here and consider a small donation @ 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM.
|
|
|
|
Saturn7
|
|
July 07, 2011, 03:29:29 PM |
|
Went from 433 MHash/s to 440 MHash/s on a 5870, overclocked to 970 MHz. Thanks! Donation sent.
|
First there was Fire, then Electricity, and now Bitcoins
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
July 07, 2011, 03:30:10 PM |
|
This kernel will cause poclbm to exit after running for a while. I have tested it several times; after a few hours poclbm is gone and my machine is left there doing nothing.
AMD5870 with SDK 2.5
|
|
|
|
Diapolo (OP)
|
|
July 07, 2011, 03:37:03 PM |
|
This kernel will cause poclbm to exit after running for a while. I have tested several times, after few hours poclbm will be gone and my machine left there doing nothing
AMD5870 with SDK 2.5
That's the first report I've received of that problem. Any other poclbm users with the same observation? Dia
|
|
|
|
hugolp
Legendary
Offline
Activity: 1148
Merit: 1001
Radix-The Decentralized Finance Protocol
|
|
July 07, 2011, 03:45:03 PM |
|
This kernel will cause poclbm to exit after running for a while. I have tested several times, after few hours poclbm will be gone and my machine left there doing nothing
AMD5870 with SDK 2.5
That's the first report I get with that problem. Any other poclbm users with that observation? Dia
I reported it twice in this same thread... It only happens on one of my cards (I have four 5870s, and one of them is an identical model that works fine). In Phoenix what happens is that it reports a kernel error and then continues mining (I suspect Phoenix reloads the kernel).
|
|
|
|
phorensic
|
|
July 07, 2011, 03:56:54 PM |
|
Excellent work on this kernel. It seems like the original author got to a point where he thought he had improved performance to almost the max, but you are progressing very nicely!
|
|
|
|
CYPER
|
|
July 07, 2011, 07:16:33 PM |
|
Well finally I can see some improvement.
With today's version I went from 1748 to 1758, so roughly a 10 MHash/s increase.
This is for 4x 5870 @ 960Mhz Core & 300Mhz Memory SDK 2.1
|
|
|
|
Diapolo (OP)
|
|
July 07, 2011, 08:01:50 PM |
|
Well finally I can see some improvement.
With todays version I got from 1748 to 1758 so ~10Mhash increase.
This is for 4x 5870 @ 960Mhz Core & 300Mhz Memory SDK 2.1
Seems like good news for SDK 2.1 users, right ?
|
|
|
|
CYPER
|
|
July 07, 2011, 08:11:12 PM |
|
Well, to be perfectly honest and objective, a 0.5% increase won't make any difference even with my setup, at least not in terms of financial benefit. But I don't mean to belittle your work - well done.
|
|
|
|
erek
Newbie
Offline
Activity: 36
Merit: 0
|
|
July 07, 2011, 09:40:20 PM |
|
808.6 MH/sec max now (up 1-2% at least) from 7-6-11 to 7-7-11
|
|
|
|
zimpixa
Member
Offline
Activity: 98
Merit: 10
|
|
July 07, 2011, 10:16:10 PM |
|
SDK 2.1 working for 5h now.
The newest version seems to behave more stably (smaller max-min delta). Speed is about 0.5 MHash/s higher, but that could just be my impression.
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 07, 2011, 10:36:13 PM |
|
Excellent.
It looks like this SDK 2.1 bugfix is popular! Perhaps now people will stop telling me to use SDK 2.4.
Thanks Diapolo for the fix and I'm glad I was able to help.
|
|
|
|
bmgjet
Member
Offline
Activity: 98
Merit: 10
|
|
July 07, 2011, 11:23:51 PM |
|
Version 2011-07-06 gives the best overall speed: 0.5 MH/s faster than my modded one, but opening Firefox drops the speed from 277 down to 233, whereas mine only drops 2 MH/s. Probably just the way phatk works, since I've never used the stock one.
Version 2011-07-07 is all over the place. It jumps between 255-280 without Firefox open, so I don't know if it's faster or slower. With Firefox it's more stable than the older version and drops to 252-258.
I'm using it with poclbm.exe -v -w 256 (128 gave the same result) on a 6850, overclocked/underclocked, and the 2.1 SDK.
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
July 08, 2011, 12:55:40 AM |
|
I have a feature request
Can you please make it work with intel openCL?
thank you
|
|
|
|
Diapolo (OP)
|
|
July 08, 2011, 05:24:10 AM |
|
Excellent.
It looks like this SDK 2.1 bugfix is popular! Perhaps now people will stop telling me to use SDK 2.4.
Thanks Diapolo for the fix and I'm glad I was able to help.
It was only possible to fix it that fast because you showed me the log files and error output! So we helped each other - thanks to you, too! Dia
|
|
|
|
Diapolo (OP)
|
|
July 08, 2011, 05:25:46 AM |
|
version 2011-07-06 gives best over all speed. 0.500 faster then my modded one but opening firefox drops speed from 277 down to 233 where my one only drops 2mh/s. Probably just the way phatk works since iv never used the stock one.
version 2011-07-07 is all over the place. jumps between 255-280 without firefox open so don't know if its faster or slower. With firefox its more stable then older version and drops to 252-258
Im using it with poclbm.exe -v -w 256 (128 gave same result) on 6850 overclocked/underclocked and 2.1 SDK.
I noticed an MH/s drop while using Firefox, too. It has to be related to the new GPU acceleration that FF implemented in 4.0 and up.
|
|
|
|
Diapolo (OP)
|
|
July 08, 2011, 05:27:00 AM |
|
I have a feature request
Can you please make it work with intel openCL?
thank you
I know that Intel recently released an OpenCL gold SDK ... the kernel uses standard OpenCL commands and an AMD extension only for BFI_INT / BITALIGN. I see no reason why it should not work. Have you got some error logs for me? You know that with the Intel SDK it will only use the CPU!? Dia
|
|
|
|
kwaaak
|
|
July 08, 2011, 09:55:22 AM |
|
Thanks a lot
|
|
|
|
r3v3rs3
Newbie
Offline
Activity: 22
Merit: 0
|
|
July 08, 2011, 10:12:33 AM |
|
2011-07-03 -> 2011-07-07
Wheezy x64, 11.4, SDK 2.4:
Box #1:
- HD5750, 875/300, AGGRESSION=11, 175 MH/s -> 176 MH/s
Box #2:
- HD5750, 900/300, AGGRESSION=11, 181 MH/s -> 183 MH/s
- HD5770, 950/300, AGGRESSION=11, 215 MH/s -> 217 MH/s
XP 32, 11.7 preview, SDK 2.5:
- HD5770, 1000/300, AGGRESSION=12, 222 MH/s -> 223 MH/s
- HD5830, 1050/300, AGGRESSION=9, 337 MH/s -> 337 MH/s
phatk w/ Ma patch -> 2011-07-07
Natty x32, 11.6, SDK 2.4:
- HD5750, 900/300, AGGRESSION=9, 176 MH/s -> 176 MH/s
- HD5770, 950/1200 (going to be RBE'ed to 300), AGGRESSION=9, 204 MH/s -> 204 MH/s
- HD5830, 1000/300, AGGRESSION=9, 312 MH/s -> 313 MH/s
phoenix 1.50 with common flags: VECTORS BFI_INT FASTLOOP=false WORKSIZE=256
Nice work, sent some bitcents to 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM.
|
|
|
|
Diapolo (OP)
|
|
July 08, 2011, 11:57:07 AM |
|
Guys, I introduced a small glitch which produces an OpenCL compiler warning in version 07-07. For stability reasons please change line 77: old: u W[123]; new: u W[124]; I missed that sharound(123) writes to W[123], which is undefined behaviour because the index is out of range. Sorry for that! Will upload a fixed version shortly (it only includes the change above and stays 07-07).
Edit: Download 07-07 fixed: http://www.mediafire.com/?o7jfp60s7xefrg4
Dia
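In other words, the declared size has to cover the highest index that gets written. A tiny stand-alone illustration (the typedef is just a stand-in for the kernel's u type):

    typedef unsigned int u;   /* stand-in for the kernel's u typedef */

    u W_old[123];   /* valid indices are 0 .. 122, so the write to W[123]
                       from sharound(123) lands one element past the end */
    u W_new[124];   /* valid indices are 0 .. 123, so that write now fits */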
|
|
|
|
Tx2000
|
|
July 08, 2011, 02:55:09 PM |
|
Tried it out for the first time and I do see some noticeable gains.
5850 @ 970/350, GUIMiner, 2011-7-1
poclbm opencl - 410-413 MH/s; phoenix phatk - 408-413 MH/s
with your modified kernel - 415-416.4 MH/s
Sent a little something your way
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
July 08, 2011, 06:28:38 PM |
|
I have a feature request
Can you please make it work with intel openCL?
thank you
I know that Intel recently released and OpenCL gold SDK ... the kernel uses standard OpenCL commands and an AMD extension only for BFI_INT / BITALIGN. I see no reason why it should not work. Have you got some error logs for me? You know that for Intel it will only use the CPUs!? Dia
I have tried. poclbm will not run at all; it crashed upon starting. I took a look at the code. I think the comments are messy and some are not really helpful. Do you think it's a good idea to 'fix' the comments? By the way, please add a comment explaining why you typecast the hex values, so other developers won't think it's unnecessary and remove it.
|
|
|
|
error
|
|
July 08, 2011, 09:58:24 PM |
|
These two changes have taken my 5850s from 345MHash/sec to 360MHash/sec. Very nice.
|
3KzNGwzRZ6SimWuFAgh4TnXzHpruHMZmV8
|
|
|
Diapolo (OP)
|
|
July 08, 2011, 11:57:40 PM |
|
I have a feature request
Can you please make it work with intel openCL?
thank you
I know that Intel recently released and OpenCL gold SDK ... the kernel uses standard OpenCL commands and an AMD extension only for BFI_INT / BITALIGN. I see no reason why it should not work. Have you got some error logs for me? You know that for Intel it will only use the CPUs!? Dia I have tried. Poclum will not run at all. It crashed upon starting. I took a look at the code. I think the comments are messy and some not really helpful. Do you think its an good idea to 'fix' the comment? btw comment why you type cast the hex value so other developer wont think its unnecessary and remove it. Hex-values are type-casted so that the kernel works with AMD 2.1 SDK, which throws an error, if NOT type-casted. I don't understand what you want to tell me with the "comments are messy" part. If you get an error log with Intel SDK please post it here, so I can have a look at it. Dia
|
|
|
|
Diapolo (OP)
|
|
July 08, 2011, 11:58:43 PM |
|
These two changes have taken my 5850s from 345MHash/sec to 360MHash/sec. Very nice.
5830s seem like THE card for my modded kernel. Great to hear . Dia
|
|
|
|
Vince
Newbie
Offline
Activity: 38
Merit: 0
|
|
July 09, 2011, 02:04:58 PM |
|
Hi Diapolo!
Great to see you're making progress! There's one thing that pops into my eye:
you already do: if(Vals[7].x == -H[7])
why not add the K[60] right into it and remove from upper instruction? Saves a whole instruction and will work 100% ;-)
if(Vals[7].x == -H[7]-K[60])
Lets see if I can find more ..
|
|
|
|
Diapolo (OP)
|
|
July 09, 2011, 02:45:37 PM |
|
Hi Diapolo!
Great to see you're making progress! There's one thing that pops into my eye:
you already do: if(Vals[7].x == -H[7])
why not add the K[60] right into it and remove from upper instruction? Saves a whole instruction and will work 100% ;-)
if(Vals[7].x == -H[7]-K[60])
Lets see if I can find more ..
Good idea and works, can't verify via KernelAnalyzer, but seems like a vector addition less. Will be included in the next version! Dia
|
|
|
|
Vince
Newbie
Offline
Activity: 38
Merit: 0
|
|
July 09, 2011, 02:59:02 PM |
|
Another addition waiting to be removed: Vals[7] = (Vals[3] = (u)0xb956c25b + D1 + s1(4) + ch(4)) + H1; -> D1 is only used here, so why not add (u)0xb956c25b during precalculation? Add self.state2[3] = np.uint32(self.state2[3] + 0xb956c25b); to __init__.py, line 77 for me, right behind: self.calculateF(data) And remove (u)0xb956c25b from kernel.cl This also works 100%, no logic change involved here.
|
|
|
|
bcforum
|
|
July 09, 2011, 05:35:47 PM |
|
What parameters are you using with phatk? On my normally aspirated R6970 Lightning I get 419.xMH/s (Ma fix in poclbm): -k poclbm DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=64 VECTORS FASTLOOP=false The fastest I've ever gotten with phatk is 403.xMH/s -k phatk DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=256 VECTORS FASTLOOP=false Any suggestions?
|
If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
|
|
|
huayra.agera
|
|
July 09, 2011, 06:08:03 PM |
|
I hope you can make an optimization for the next OCL version (v2.5 (684.212)) available in beta form already. =)
|
BTC: 1JMPScxohom4MXy9X1Vgj8AGwcHjT8XTuy
|
|
|
indio007
|
|
July 09, 2011, 06:09:16 PM |
|
Vince, I edited the code like you said and I got errors.
Diapolo, what you did works fine. I have two 5830s testing with it now. Expect some Bit.love if all goes well.
|
|
|
|
Vince
Newbie
Offline
Activity: 38
Merit: 0
|
|
July 09, 2011, 07:34:44 PM Last edit: July 09, 2011, 08:13:28 PM by Vince |
|
Vince i editted the code like you said and I got errors.
Which one of the changes did you try, both? Tell me about the error you got, just "does not work" helps nobody!
|
|
|
|
indio007
|
|
July 09, 2011, 09:07:32 PM |
|
Vince i editted the code like you said and I got errors.
Which one of the changes did you try, both? Tell me about the error you got, just "does not work" helps nobody! In Kernel.cl I changed this: if(Vals[7].x == -H[7]) to this if(Vals[7].x == -H[7]-K[60]) and changed this Vals[7] += K[60] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124); to this Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124); Then i changed this Vals[7] = (Vals[3] = (u)0xb956c25b + D1 + s1(4) + ch(4)) + H1; to this Vals[7] = (Vals[3] = D1 + s1(4) + ch(4)) + H1; as instructed here and yes i added the line to init.py Add
self.state2[3] = np.uint32(self.state2[3] + 0xb956c25b);
to __init__.py, line 77 for me, right behind:
self.calculateF(data)
And remove (u)0xb956c25b from kernel.cl
The error is that OpenCL is having unusual behavior or something. It shows MH/s etc... and just when it seems about to accept a share, it spits that out.
|
|
|
|
1bitc0inplz
Member
Offline
Activity: 112
Merit: 10
|
|
July 10, 2011, 03:20:39 AM |
|
This most recent update is everything you said it would be!
My 5830 went from 295 MH/s to 305 MH/s with just this update!
Thanks for the great work.
|
|
|
|
gominoa
Newbie
Offline
Activity: 17
Merit: 0
|
|
July 10, 2011, 07:19:09 AM |
|
In Kernel.cl I changed this: if(Vals[7].x == -H[7])
to this if(Vals[7].x == -H[7]-K[60])
Try also changing: if(Vals[7].y == -H[7]) ... to ... if(Vals[7].y == -H[7]-K[60]) notice Y instead of X. Will be just below the X line
|
|
|
|
indio007
|
|
July 10, 2011, 09:44:34 AM |
|
In Kernel.cl I changed this: if(Vals[7].x == -H[7])
to this if(Vals[7].x == -H[7]-K[60])
Try also changing: if(Vals[7].y == -H[7]) ... to ... if(Vals[7].y == -H[7]-K[60]) notice Y instead of X. Will be just below the X line Same error
|
|
|
|
Diapolo (OP)
|
|
July 10, 2011, 11:38:08 AM |
|
Next kernel version will, once more, be faster for 69XX and 58XX cards . Stay tuned! Dia
|
|
|
|
Vince
Newbie
Offline
Activity: 38
Merit: 0
|
|
July 10, 2011, 01:12:49 PM Last edit: July 10, 2011, 01:32:52 PM by Vince |
|
Vals[7] += K[60] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);
to this Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);
is this a typo or did you leave out the "="?
should be:
Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);
|
|
|
|
indio007
|
|
July 10, 2011, 04:55:33 PM |
|
That Worked , i managed to go from 306-308MH to 308-310MH on 5830. donations forth coming if it remains stable. Thx...
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
July 10, 2011, 06:07:41 PM |
|
Vals[7] += K[60] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);
to this Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);
is this a typo or did you leave out the "="?
should be:
Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);
do i need to change if(Vals[7].y == -H[7])? and if(Vals[7] == -H[7])?
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
July 10, 2011, 06:13:45 PM |
|
Hi Diapolo!
Great to see you're making progress! There's one thing that pops into my eye:
you already do: if(Vals[7].x == -H[7])
why not add the K[60] right into it and remove from upper instruction? Saves a whole instruction and will work 100% ;-)
if(Vals[7].x == -H[7]-K[60])
Lets see if I can find more ..
how would this save an instruction? did you just move -K[60]?
|
|
|
|
Vince
Newbie
Offline
Activity: 38
Merit: 0
|
|
July 10, 2011, 06:30:33 PM |
|
how would this save an instruction? did you just move -K[60]?
Yes, just moved it. Now the compiler optimizes it away, before it didn't.
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
July 10, 2011, 06:50:32 PM |
|
how would this save an instruction? did you just move -K[60]?
Yes, just moved it. Now the compiler optimizes it away, before it didn't. Mind explaining more? I don't get it
|
|
|
|
Vince
Newbie
Offline
Activity: 38
Merit: 0
|
|
July 10, 2011, 07:02:44 PM |
|
Mind explaining more? I don't get it
It's a constant. If I add it to Vals[7] together with the other stuff in round 124, it takes an addition to do so, because it's the only constant there. If it's moved into the comparison at the end, the two constants H[7] and K[60] are merged into a single constant by the compiler, so the comparison takes the same time as before.
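To make the folding concrete, here is a small stand-alone C sketch (host-side, not kernel code). H[7] and K[60] are the standard SHA-256 constants; the Vals[7] value is made up and chosen so the check fires:

    #include <stdio.h>

    #define H7  0x5be0cd19u   /* SHA-256 initial hash value H[7] */
    #define K60 0x90befffau   /* SHA-256 round constant K[60] */

    int main(void)
    {
        unsigned int vals7 = 0x136032edu;  /* made-up round-124 result */

        /* Before the change: K60 is the only constant in the big per-work-item
           sum, so it costs one real addition at run time. */
        int hit_before = (vals7 + K60) == (0u - H7);

        /* After the change: both constants sit inside the comparison, so the
           compiler folds -H7 - K60 into a single literal and the add is gone. */
        int hit_after = vals7 == (0u - H7 - K60);

        printf("%d %d\n", hit_before, hit_after);   /* prints "1 1" */
        return 0;
    }

Both checks are equivalent modulo 2^32, which is why the move is safe.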
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
July 10, 2011, 07:09:28 PM |
|
So i changed the whole thing to
Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);
#ifdef VECTORS
if(Vals[7].x == -H[7]-K[60]) output[OUTPUT_SIZE] = output[(W[3].x >> 2) & OUTPUT_MASK] = W[3].x;
if(Vals[7].y == -H[7]-K[60]) output[OUTPUT_SIZE] = output[(W[3].y >> 2) & OUTPUT_MASK] = W[3].y;
#else
if(Vals[7] == -H[7]-K[60]) output[OUTPUT_SIZE] = output[(W[3] >> 2) & OUTPUT_MASK] = W[3];
#endif
I don't notice any speed difference; hope that helps.
PS: Does the compiler really do the optimization? If not, you've introduced one more step, because K[60] now appears twice.
|
|
|
|
Vince
Newbie
Offline
Activity: 38
Merit: 0
|
|
July 10, 2011, 07:18:48 PM |
|
I'm pretty sure the compiler will catch this. Note that the speed increase is minimal, maybe ~0.1-0.2%.
|
|
|
|
Diapolo (OP)
|
|
July 11, 2011, 02:10:13 PM |
|
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6
This could be the last version, because there seems to be no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I do. This version is faster than all previous kernels (it uses the fewest ALU OPs for 69XX and 58XX). It should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message! Thanks to all donators and for your feedback! Dia
|
|
|
|
Turix
|
|
July 11, 2011, 02:31:01 PM |
|
Gained about 1 Mhash (431->432) from the 7th version to todays new version on my 5870 (950/315).
|
|
|
|
Bobnova
|
|
July 11, 2011, 03:08:38 PM |
|
I also gained about 1Mh/s from todays update compared to the previous update, this is on a 5830 at 875/900 in linux. The previous update made a big difference over what ships with phoenix 1.50. I sent a small donation, as you've helped me make more money
|
BTC: 1AURXf66t7pw65NwRiKukwPq1hLSiYLqbP
|
|
|
OCedHrt
Member
Offline
Activity: 111
Merit: 10
|
|
July 11, 2011, 04:29:14 PM |
|
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I . This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message! Thanks to all donators and your feedback! Dia Awesome work. Btw, could you explain why Vals[2] = C1; is needed in the beginning when it is reassigned in round 5? I know removing this line breaks the hashing but I'm not seeing why since it isn't used prior to round 5.
|
|
|
|
Diapolo (OP)
|
|
July 11, 2011, 04:51:53 PM |
|
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I . This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message! Thanks to all donators and your feedback! Dia Awesome work. Btw, could you explain why Vals[2] = C1; is needed in the beginning when it is reassigned in round 5? I know removing this line breaks the hashing but I'm not seeing why since it isn't used prior to round 5. Various functions before the reassignment use Vals[2] search the kernel for Vals[ and you will see them. I tried to remove that assignment, but as you discovered, it is needed there . Dia
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 11, 2011, 05:06:15 PM |
|
This does work with SDK 2.1 but it might be a tiny bit slower than your previous version.
HD5850, 1.0875V, 975MHz clock, 360 MHz RAM, aggression=14, worksize=256, Catalyst 11.6 (Linux)
SDK 2.1: 404.6 MH/s -> 404.5 MH/s SDK 2.4: 401.8 MH/s -> 402.2 MH/s
Note that at aggression=14 my rate can sometimes drop as much as 1 MH/s suddenly before recovering but usually varies by 0.2 MH/s so the apparent decrease with SDK 2.1 could well be statistical noise.
I might also have to play with the RAM frequency again.
|
|
|
|
OCedHrt
Member
Offline
Activity: 111
Merit: 10
|
|
July 11, 2011, 05:14:01 PM |
|
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I . This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message! Thanks to all donators and your feedback! Dia Awesome work. Btw, could you explain why Vals[2] = C1; is needed in the beginning when it is reassigned in round 5? I know removing this line breaks the hashing but I'm not seeing why since it isn't used prior to round 5. Various functions before the reassignment use Vals[2] search the kernel for Vals[ and you will see them. I tried to remove that assignment, but as you discovered, it is needed there . Dia I only searched for Vals[2] so did not see the % XD Very small increase for me on this one 278->278.5. Interestingly a lower memory clock on my 6870 actually has a detrimental effect. Downclocking from 1050->800 reduces hash rate b 0.5MH/s. I can't clock it any lower so don't know if 300 will be better or not.
|
|
|
|
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
|
|
July 11, 2011, 05:22:22 PM |
|
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I . This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message! Thanks to all donators and your feedback! Dia I wish & hope you will still find out some ways to get more hashes. Thanks.
|
|
|
|
dikidera
|
|
July 11, 2011, 05:23:18 PM |
|
At 2 megahashes, your device produces around 2 to 2.5 million hashes per second. If we halve that to 1 megahash, that's still 1.25 million hashes per second; if we halve that again, around 750 thousand per second. So an increase of 0.2% or so yields around 100 thousand more hashes per second.
|
|
|
|
Vince
Newbie
Offline
Activity: 38
Merit: 0
|
|
July 11, 2011, 05:39:13 PM |
|
Don't give up! There's still more to optimize; I'm at 1694 ALU OPs (HD6970) at the moment.
|
|
|
|
hugolp
Legendary
Offline
Activity: 1148
Merit: 1001
Radix-The Decentralized Finance Protocol
|
|
July 11, 2011, 06:19:54 PM |
|
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I . This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message! Thanks to all donators and your feedback! Dia Previous patches had solved one of my cards crashing the miner every 30 minutes or so, but this one has reintroduced the problem. One miner mining a 5870 crashed after 20 minutes.
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
July 11, 2011, 06:28:47 PM |
|
I wonder why we need const uint D1. It is only used once.
|
|
|
|
Vince
Newbie
Offline
Activity: 38
Merit: 0
|
|
July 11, 2011, 06:50:09 PM |
|
I wonder why do we need const uint D1. It is only use once.
It's part of the precalculation. It's needed.
|
|
|
|
Diapolo (OP)
|
|
July 11, 2011, 07:31:01 PM |
|
Dont give up! There's still more to optimize, I'm at 1694 ALU OPs (HD6970) at the moment.
I can't read or edit Python, so yes, there is room if someone could alter or add some more kernel arguments. The strange thing is that I saw some additions of known values which I tried to eliminate via constants, but this led to lower kernel performance. I played around with this today and saw no more improvement ... too bad, it was real fun these last days! If you would like to share your work, we will all be happy. What does your kernel do for 58XX cards? I thought it made no sense to optimize one over the other, so I tried to reduce the ALU OP count for both platforms. Dia
|
|
|
|
Diapolo (OP)
|
|
July 11, 2011, 07:31:55 PM |
|
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I . This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message! Thanks to all donators and your feedback! Dia Previous patches had solved one of my cards crashing the miner every 30 minutes or so, but this one has reintroduced the problem. One miner mining a 5870 crashed after 20 minutes. Sorry for that, but I have no idea, what would cause this. Perhaps the card is faulty? Will Furmark "crash" the card or show artifacts? Dia
|
|
|
|
error
|
|
July 11, 2011, 08:25:28 PM |
|
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I . This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message! Thanks to all donators and your feedback! Dia This one seems to act rather strangely. On 2011-07-06 I had a pretty consistent 360MHash/sec with little variation. With 2011-07-11 the card is varying dramatically between 355-363, and usually at the lower end of that. Overall it may be slightly slower. The cards are 5850s clocked at 900/300, using 11.4 and SDK 2.4 on Linux. I'm going to let it run a while longer.
|
3KzNGwzRZ6SimWuFAgh4TnXzHpruHMZmV8
|
|
|
pennytrader
|
|
July 11, 2011, 09:22:54 PM |
|
On 5830 + SDK 2.1, it's slightly slower than the previous version. Guess I'll revert it back.
|
please donate to 1P3m2resGCP2o2sFX324DP1mfqHgGPA8BL
|
|
|
CYPER
|
|
July 11, 2011, 09:53:22 PM |
|
On 5830 + SDK 2.1, it's slightly slower than the previous version. Guess I'll revert it back.
Same here. With the previous version my average was 1758 and now it is 1756. This is for 4x 5870 @ 960Mhz Core & 300Mhz Memory SDK 2.1 Ubuntu 32bit
|
|
|
|
wazoo42
Newbie
Offline
Activity: 42
Merit: 0
|
|
July 11, 2011, 10:18:46 PM |
|
7/4/11 = a 1-2 MH/s increase 7/6/11 = 0 increase (maybe slight decrease) 7/11/11 = 1-2 MH/s further increase over 7/4/11
These are on 2x 5830s, and 3x 5770s using ati-drivers-11.6, phoenix-1.50, pyopencl-0.92, and ati-stream-sdk-bin-2.4.
|
|
|
|
error
|
|
July 11, 2011, 10:21:38 PM |
|
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I . This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message! Thanks to all donators and your feedback! Dia This one seems to act rather strangely. On 2011-07-06 I had a pretty consistent 360MHash/sec with little variation. With 2011-07-11 the card is varying dramatically between 355-363, and usually at the lower end of that. Overall it may be slightly slower. The cards are 5850s clocked at 900/300, using 11.4 and SDK 2.4 on Linux. I'm going to let it run a while longer. Sorry, but this is still slower; my cards were running around 355MH/sec and even down to 350. Went back to 2011-07-06.
|
3KzNGwzRZ6SimWuFAgh4TnXzHpruHMZmV8
|
|
|
Vince
Newbie
Offline
Activity: 38
Merit: 0
|
|
July 11, 2011, 11:31:37 PM |
|
Decreasing hashrates ... that's really strange. These 58xx cards sometimes behave quite strangely.
I can't test it because all my rigs run on 6950s unlocked to 6970s.
|
|
|
|
erek
Newbie
Offline
Activity: 36
Merit: 0
|
|
July 11, 2011, 11:32:18 PM |
|
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I . This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message! Thanks to all donators and your feedback! Dia This one seems to act rather strangely. On 2011-07-06 I had a pretty consistent 360MHash/sec with little variation. With 2011-07-11 the card is varying dramatically between 355-363, and usually at the lower end of that. Overall it may be slightly slower. The cards are 5850s clocked at 900/300, using 11.4 and SDK 2.4 on Linux. I'm going to let it run a while longer. Sorry, but this is still slower; my cards were running around 355MH/sec and even down to 350. Went back to 2011-07-06. I totally disagree, each version for me has been getting faster and faster. 7-11-11 is the fastest, yet for me.
|
|
|
|
Wildvest
Newbie
Offline
Activity: 41
Merit: 0
|
|
July 12, 2011, 12:25:55 AM |
|
2011-11-11 i can now report a 0.5-1% increase over the improved phatk kernel
|
|
|
|
Vince
Newbie
Offline
Activity: 38
Merit: 0
|
|
July 12, 2011, 02:27:20 AM |
|
So here are some more changes:
I introduced const uint W17_2, containing P1(19) + 0x11002000 - that's 3 shifts, 2 XORs and 1 add traded against one extra parameter, well worth it. Extended self.f from:
self.f = np.zeros(5, np.uint32)
to:
self.f = np.zeros(6, np.uint32)
and just after the W17 calculation in calculateF:
#W17_2
self.f[5] = np.uint32(0x11002000+( rot(self.f[2], 32-13) ^ rot(self.f[2], 32-15) ^ (self.f[2] >> 10) ))
then added the parameter (right after W17) in the call and the function. => Effectively 3 OPs saved.
Next change: you can cut out all of W0 to W14! Most of them are zero anyway; you just need to hardcode the first ones. Also W[73] to W[78] are not used anymore with some small changes, so there is no need to initialize them. => Less memory use, but the same speed for me.
Next one: Round 3
#ifdef VECTORS
Vals[4] = (W_3 = ((base + get_global_id(0)) << 1) + (uint2)(0, 1)) + PreVal4;
#else
Vals[4] = (W_3 = base + get_global_id(0)) + PreVal4;
#endif
--
// Round 3
Vals[0] = state0 + Vals[4];
Vals[4] += T1;
--
W[64 - O] = state0 + Vals[0];
You can reorganize and shorten round 3 to:
Vals[0] = T1 + Vals[4];
Needed changes in the precalculation:
PreVal4 += T1
T1 = state0 - T1
=> Another addition saved, almost effortlessly.
Here are the files with these changes: http://www.filesonic.com/file/1423103594
Still some more to come!
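As a side note on the Round 3 line quoted above: that is where each work item picks its candidate nonces. A minimal sketch of just that mapping, with an invented kernel name and output buffer for illustration:

    // Illustration only: how one work item maps to two candidate nonces when
    // VECTORS is enabled, mirroring the Round 3 line above.
    __kernel void nonce_demo(const uint base, __global uint2 *out)
    {
        uint2 W_3 = ((base + (uint)get_global_id(0)) << 1) + (uint2)(0, 1);
        out[get_global_id(0)] = W_3;   // .x gets the even nonce, .y the odd one
    }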
|
|
|
|
TurdHurdur
|
|
July 12, 2011, 02:36:59 AM |
|
2011-11-11 i can now report a 0.5-1% increase over the improved phatk kernel
How'd you get this future kernel?
|
|
|
|
hugolp
Legendary
Offline
Activity: 1148
Merit: 1001
Radix-The Decentralized Finance Protocol
|
|
July 12, 2011, 05:57:08 AM Last edit: July 12, 2011, 06:44:28 AM by hugolp |
|
Previous patches had solved one of my cards crashing the miner every 30 minutes or so, but this one has reintroduced the problem. One miner mining a 5870 crashed after 20 minutes.
Sorry for that, but I have no idea, what would cause this. Perhaps the card is faulty? Will Furmark "crash" the card or show artifacts? Dia
I'm using the previous patch, which is almost as fast, on that card, so it's OK. I'm wondering as well if the card has some kind of problem, but with other kernels it has been running non-stop for days without an issue. I don't know why some kernels trigger the crash.
|
|
|
|
Dubs420
Newbie
Offline
Activity: 20
Merit: 0
|
|
July 12, 2011, 06:48:31 AM |
|
I just tried your latest kernel in GUIMiner with poclbm; the miner went from 417 to 419 on each GPU. Great work, thanks. I was able to tweak a little more, up to 420.6-421.0, using -f 1.
|
|
|
|
Diapolo (OP)
|
|
July 12, 2011, 09:38:26 AM |
|
With the ideas that Vince gave us, I was able to lower the ALU OP usage even further. This means the next version will speed things up for 69XX and 58XX again. Thank you Vince! I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but I merged the ones I liked and verified everything with KernelAnalyzer. Edit: The drawback is that you will need to replace the Phoenix __init__.py file, so it won't be easily usable for non-Phoenix users, sorry for that (some init values changed)! Dia
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 12, 2011, 11:26:22 AM |
|
With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again . Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer. Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)! Dia You say that some of Vince's changes seem to reduce kernel speed but it looks like actual speed gain/loss is very much card dependent. That being the case, which cards are you using for testing?
|
|
|
|
pandemic
|
|
July 12, 2011, 12:14:46 PM |
|
My 5830 went from 304mh/s to 307mh/s. Small increase, but why not?
|
|
|
|
Diapolo (OP)
|
|
July 12, 2011, 12:55:02 PM |
|
With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again . Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer. Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)! Dia You say that some of Vince's changes seem to reduce kernel speed but it looks like actual speed gain/loss is very much card dependent. That being the case, which cards are you using for testing? I own a 5870, a 5830 and use AMD KernelAnalyzer to get infos for 69XX cards. You see I focused on that cards during my own tests. I could receive infos for more cards via AMD KA, but it seems hard to optimize one kernel for all cards . Dia
|
|
|
|
kbsbtc
Newbie
Offline
Activity: 53
Merit: 0
|
|
July 12, 2011, 01:05:59 PM |
|
I was getting only 0.01% stales; with your patch I have an increase of 10 MHash/s on average with my 5830 (295 -> 305), but my stale count is now around 3-4%...
|
|
|
|
Vince
Newbie
Offline
Activity: 38
Merit: 0
|
|
July 12, 2011, 01:28:58 PM |
|
I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....
So .. what about some more information? Pool? Version used? Clock speeds? 5830 @ 305 seems to be somewhat overclocked ..
|
|
|
|
Diapolo (OP)
|
|
July 12, 2011, 02:12:18 PM |
|
I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....
Perhaps the kernel pushes your card harder and it generates errors. But could be the pool, driver and so on, like Vince said. Dia
|
|
|
|
jedi95
|
|
July 12, 2011, 03:18:25 PM |
|
I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....
Perhaps the kernel pushes your card harder and it generates errors. But could be the pool, driver and so on, like Vince said. Dia Stales have nothing to do with GPU errors in Phoenix. All results returned by the GPU are verified before being sent to the server, which means if the kernel finds invalid shares you will get "Unexpected behavior from OpenCL. Hardware problem?" instead of the share being submitted. If you are not getting any of these errors there is NO WAY a kernel change can affect the number of stale shares. It might affect the total number of shares submitted in a given time period, but every share sent to the server by Phoenix is verified to be H == 0 on the CPU beforehand.
|
Phoenix Miner developer Donations appreciated at: 1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
|
|
|
Diapolo (OP)
|
|
July 12, 2011, 03:24:45 PM |
|
I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....
Perhaps the kernel pushes your card harder and it generates errors. But could be the pool, driver and so on, like Vince said. Dia Stales have nothing to do with GPU errors in Phoenix. All results returned by the GPU are verified before being sent to the server, which means if the kernel finds invalid shares you will get "Unexpected behavior from OpenCL. Hardware problem?" instead of the share being submitted. If you are not getting any of these errors there is NO WAY a kernel change can affect the number of stale shares. It might affect the total number of shares submitted in a given time period, but every share sent to the server by Phoenix is verified to be H == 0 on the CPU beforehand. Thanks for clarification! Good to know, this was new for me. Dia
|
|
|
|
talldude
Member
Offline
Activity: 224
Merit: 10
|
|
July 12, 2011, 03:55:17 PM |
|
cool, went from 349 (original improved kernel) to 350.3 with latest. Keep 'em coming
|
|
|
|
kbsbtc
Newbie
Offline
Activity: 53
Merit: 0
|
|
July 12, 2011, 06:25:37 PM |
|
I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....
Perhaps the kernel pushes your card harder and it generates errors. But could be the pool, driver and so on, like Vince said. Dia Stales have nothing to do with GPU errors in Phoenix. All results returned by the GPU are verified before being sent to the server, which means if the kernel finds invalid shares you will get "Unexpected behavior from OpenCL. Hardware problem?" instead of the share being submitted. If you are not getting any of these errors there is NO WAY a kernel change can affect the number of stale shares. It might affect the total number of shares submitted in a given time period, but every share sent to the server by Phoenix is verified to be H == 0 on the CPU beforehand. Thanks for clarification! Good to know, this was new for me. Dia Thanks for the heads up. I am running 4 5830s on 1 box clocked at 950/300 pointed at bitcoins.lc. It seems the stale count went up for me, didn't mean to blame you or anything just saying that was what happened. I'lll look into on the pool end though.
|
|
|
|
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
|
|
July 12, 2011, 06:33:27 PM |
|
Thanks. It increased from 440 to 443 on a 5870 @ 975/325 (Windows), and 431-434 on a 6970 & 5870 @ 975/1375 & 984/300 (Ubuntu, Smartcoin). With the inclusion of __init__.py, I hope there will still be some room to tweak.
|
|
|
|
jedi95
|
|
July 12, 2011, 06:34:48 PM |
|
Stales have nothing to do with GPU errors in Phoenix. All results returned by the GPU are verified before being sent to the server, which means if the kernel finds invalid shares you will get "Unexpected behavior from OpenCL. Hardware problem?" instead of the share being submitted. If you are not getting any of these errors there is NO WAY a kernel change can affect the number of stale shares. It might affect the total number of shares submitted in a given time period, but every share sent to the server by Phoenix is verified to be H == 0 on the CPU beforehand.
Thanks for clarification! Good to know, this was new for me. Dia Thanks for the heads up. I am running 4 5830s on 1 box clocked at 950/300 pointed at bitcoins.lc. It seems the stale count went up for me, didn't mean to blame you or anything just saying that was what happened. I'lll look into on the pool end though. My post was mainly intended to clarify that stale shares are not a good measurement for kernel changes. A much more reliable test would be to count the total number of shares submitted over a long period (say 24 hours or so) This includes stales, since the goal is to test how many shares the kernel finds, not how many the server accepts. If this number is higher than without the kernel modifications, you know that it's helping.
|
Phoenix Miner developer Donations appreciated at: 1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
July 12, 2011, 11:55:24 PM |
|
With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again . Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer. Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)! Dia what about for poclbm? there is no __init__.py
|
|
|
|
Diapolo (OP)
|
|
July 13, 2011, 05:08:30 AM |
|
With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again . Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer. Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)! Dia what about for poclbm? there is no __init__.py That's what my edit was about . It's all a matter of how much time I and others have and current focus is on Phoenix, because that's my main miner software. Perhaps some mods can be done without new init values, so they will work without new __init__.py. But then I have to take care of 2 kernel versions. For now there is no need to worry, new version is not out . Dia
|
|
|
|
hchc
|
|
July 13, 2011, 05:34:33 AM |
|
>I modified the shipping phatk Kernel from Phoenix 1.50. I now get round about 9-10 MHash/s more on my 5830 (up >from 310 to 319/320)!
I would really like to replicate this. Currently getting 310 MH/s with 2.1 + 11.5 with the bitless Ma() changes. Using the 7-11 version I only get a small jump to 311. Can you share the config you use? SDK/driver version etc. Thanks.
|
|
|
|
|
|
|
|
Diapolo (OP)
|
|
July 13, 2011, 05:57:00 AM |
|
>I modified the shipping phatk Kernel from Phoenix 1.50. I now get round about 9-10 MHash/s more on my 5830 (up >from 310 to 319/320)!
I would really like to replicate this. Currently getting 310 Mh with 2.1 + 11.5 with bitless Ma() changes. Using the 7-11 I only get a small jump to 311. Can you share the config you use? sdk/driver version etc Thanks.
- Win7 x64 SP1
- Cat 11.7 with SDK 2.4 and Runtime 2.4 (in order to be able to use AMD APP KernelAnalyzer)
- Sapphire 5830 Xtreme @ 1000 MHz core / 350 MHz mem
- Phoenix 1.5: aggression 12, vectors, bfi_int
Dia
|
|
|
|
jedi95
|
|
July 13, 2011, 08:41:00 AM |
|
>I modified the shipping phatk Kernel from Phoenix 1.50. I now get round about 9-10 MHash/s more on my 5830 (up >from 310 to 319/320)!
I would really like to replicate this. Currently getting 310 Mh with 2.1 + 11.5 with bitless Ma() changes. Using the 7-11 I only get a small jump to 311. Can you share the config you use? sdk/driver version etc Thanks.
That might be tough since phatk doesn't work very well on older SDK versions. It was designed to be used with SDK 2.4, and on 2.1 I get better results with poclbm. The Ma() changes also apply to poclbm, so you won't see a gain there.
|
Phoenix Miner developer Donations appreciated at: 1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
|
|
|
Uzza
Newbie
Offline
Activity: 35
Merit: 0
|
|
July 13, 2011, 11:54:08 PM Last edit: July 14, 2011, 09:42:39 AM by Uzza |
|
I just did a quick comparison against poclbm for me.
On my dedicated 5870: poclbm SDK 2.1: ~424 phatk SDK 2.1: <420 phatk imp SDK 2.1: ~432
poclbm SDK 2.4: <424 phatk SDK 2.4: ~424 phatk imp SDK 2.4: ~437
So on SDK 2.1 your improvements made it so phatk was better than poclbm in 2.1, and way better in 2.4.
The init optimizations gave me a minor boost of ~0.5 MHs over 2011-07-11.
|
|
|
|
PcChip
|
|
July 14, 2011, 02:03:05 AM |
|
Uzza, two questions:
1.) what is "phatk imp"
2.) Surely you meant a 5870 instead of a 4870? Either that or you must have four 4870's to hit 430 MH/s!
|
Legacy signature from 2011: All rates with Phoenix 1.50 / PhatK 5850 - 400 MH/s | 5850 - 355 MH/s | 5830 - 310 MH/s | GTX570 - 115 MH/s | 5770 - 210 MH/s | 5770 - 200 MH/s
|
|
|
CYPER
|
|
July 14, 2011, 04:17:30 AM |
|
Uzza, two questions:
1.) what is "phatk imp"
2.) Surely you meant a 5870 instead of a 4870? Either that or you must have four 4870's to hit 430 MH/s!
1 - Phatk Improved - it's what this topic is all about. 2 - Most probably he meant 5870
|
|
|
|
Uzza
Newbie
Offline
Activity: 35
Merit: 0
|
|
July 14, 2011, 09:43:38 AM |
|
1 - Phatk Improved - it's what this topic is all about. 2 - Most probably he meant 5870 What this guy said.
|
|
|
|
nico_w
Newbie
Offline
Activity: 12
Merit: 0
|
|
July 14, 2011, 04:27:41 PM |
|
Sadly it only gives me 1Mhash/s increase from 344 to 345 on my server, but keep up the good work!
|
|
|
|
MiningBuddy
|
|
July 15, 2011, 11:33:29 AM Last edit: July 15, 2011, 12:02:21 PM by MiningBuddy |
|
Testing the 2011-07-11 kernel.7z
XFX 5870 @ 940/300 Ubuntu 11.04, ATI Drivers 11.5, SDK 2.1 Before: [428.56 Mhash/sec] -> After: [432.28 Mhash/sec] Stales before: 0.22% - > Stales after: 3.48% Over a 4 hour test period
XFX 5870 @ 940/300 Ubuntu 11.04, ATI Drivers 11.5, SDK 2.4 Before: [422.34 Mhash/sec] -> After: [189.58 Mhash/sec] Stales: Not tested due to adverse effects.
|
|
|
|
Diapolo (OP)
|
|
July 15, 2011, 12:15:11 PM |
|
Testing the 2011-07-11 kernel.7z
XFX 5870 @ 940/300 Ubuntu 11.04, ATI Drivers 11.5, SDK 2.1 Before: [428.56 Mhash/sec] -> After: [432.28 Mhash/sec] Stales before: 0.22% - > Stales after: 3.48% Over a 4 hour test period
XFX 5870 @ 940/300 Ubuntu 11.04, ATI Drivers 11.5, SDK 2.4 Before: [422.34 Mhash/sec] -> After: [189.58 Mhash/sec] Stales: Not tested due to adverse affects.
Well that is very strange, but at least you are able to mine faster with SDK2.1 and the current kernel version ^^. Btw. I had other things to do, but during the next week I will release a new version. Dia
|
|
|
|
MiningBuddy
|
|
July 15, 2011, 12:26:44 PM |
|
Testing the 2011-07-11 kernel.7z
XFX 5870 @ 940/300 Ubuntu 11.04, ATI Drivers 11.5, SDK 2.1 Before: [428.56 Mhash/sec] -> After: [432.28 Mhash/sec] Stales before: 0.22% -> Stales after: 3.48% Over a 4 hour test period
XFX 5870 @ 940/300 Ubuntu 11.04, ATI Drivers 11.5, SDK 2.4 Before: [422.34 Mhash/sec] -> After: [189.58 Mhash/sec] Stales: Not tested due to adverse effects.
Well, that is very strange, but at least you are able to mine faster with SDK 2.1 and the current kernel version ^^. Btw. I had other things to do, but I will release a new version during the next week. Dia Awesome, I look forward to it. I think the rejected shares were random variance on my side, as it seems to have settled down to a more realistic 0.88%.
|
|
|
|
jedi95
|
|
July 15, 2011, 07:47:25 PM |
|
Testing the 2011-07-11 kernel.7z
XFX 5870 @ 940/300 Ubuntu 11.04, ATI Drivers 11.5, SDK 2.1 Before: [428.56 Mhash/sec] -> After: [432.28 Mhash/sec] Stales before: 0.22% -> Stales after: 3.48% Over a 4 hour test period
XFX 5870 @ 940/300 Ubuntu 11.04, ATI Drivers 11.5, SDK 2.4 Before: [422.34 Mhash/sec] -> After: [189.58 Mhash/sec] Stales: Not tested due to adverse effects.
Well, that is very strange, but at least you are able to mine faster with SDK 2.1 and the current kernel version ^^. Btw. I had other things to do, but I will release a new version during the next week. Dia Awesome, I look forward to it. I think the rejected shares were random variance on my side, as it seems to have settled down to a more realistic 0.88%. Keep in mind that OpenCL kernel changes have NO effect on stale shares (aside from the VERY small difference in the time it takes to run one execution of some number of hashes). All nonces found by the kernel to satisfy H == 0 are verified on the CPU prior to sending. Shares are also checked against the current known block before sending, in case new work was received while the kernel was executing. Basically this means that every share sent to the server is valid as far as the miner is concerned. If the OpenCL kernel is returning bad work it will never be sent to the server, and instead you will get "Unusual behavior from OpenCL. Hardware problem?" That said, changes to the Python portion of a Phoenix kernel can increase stale shares if badly implemented (see: FASTLOOP excessive stales with high aggression in older versions of Phoenix).
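To make that concrete, here is a minimal sketch of that CPU-side re-check (illustrative only, not the actual Phoenix code), assuming header76 holds the first 76 bytes of the serialized block header and target is the share target as an integer:
Code:
import hashlib
import struct

def verify_nonce(header76, nonce, target):
    # The nonce sits little-endian at the end of the 80-byte header.
    header = header76 + struct.pack('<I', nonce)
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    # The kernel's H == 0 test: the last 32-bit word of the final hash must be zero.
    if digest[28:32] != b'\x00\x00\x00\x00':
        return 'bad'      # GPU returned garbage -> "Unusual behavior from OpenCL"
    # Full difficulty check: the hash, read as a little-endian integer, must meet the target.
    if int.from_bytes(digest, 'little') <= target:
        return 'share'    # still compared against the current work before sending
    return 'discard'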
|
Phoenix Miner developer Donations appreciated at: 1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
|
|
|
zimpixa
Member
Offline
Activity: 98
Merit: 10
|
|
July 16, 2011, 11:31:45 PM |
|
Every new version up to 2011-07-07 was faster than the previous one on both of my rigs. The 2011-07-11 version, however, is faster on my single 5850, but on the 5870+5850 rig I'm noticing a minor slowdown (due to bigger deltas: the best performance is the same, but it can go a little lower). In addition, the primary GPU set to '-f0' stopped bottlenecking the other GPU (but changing from -f35 to -f0 didn't add any speed). After all the tests I changed the version on the single-GPU rig and left the old one on the dual-GPU rig.
|
|
|
|
coblee
Donator
Legendary
Offline
Activity: 1654
Merit: 1351
Creator of Litecoin. Cryptocurrency enthusiast.
|
|
July 17, 2011, 07:53:58 AM |
|
Donation sent! When you release the next version, please explain what's changed in __init__.py. I'm using fpgaminer's poclbm w/ phatk: http://forum.bitcoin.org/index.php?topic=19169.0 I'd like to see if we can make your phatk kernel still work with that. Thanks!
|
|
|
|
Diapolo (OP)
|
|
July 17, 2011, 01:00:49 PM |
|
Important: since version 2011-07-17 a modified version of __init__.py (for the Phoenix 1.5 miner) is included in this package and has to be used! The kernel won't work for other miners without modifications to them, see kernel.cl for further info.
The new version 2011-07-17 is ready for download. It should be faster on 58XX and 69XX cards again. This version will only work if you use it with Phoenix and the supplied __init__.py file, because of modifications to kernel variables! A very big thank you goes to user Vince for input and ideas! Download here: http://www.mediafire.com/?317u0y93u7mnbys Have fun, Dia
|
|
|
|
dikidera
|
|
July 17, 2011, 02:20:07 PM |
|
With this version on my 5870: 410.58/411.6 -> 413.44/414 MHash/s
But on my HD5850, I see a decrease of at least 0.50 MHash/s.
|
|
|
|
CYPER
|
|
July 17, 2011, 02:24:58 PM Last edit: July 17, 2011, 02:36:39 PM by CYPER |
|
2011-07-07 was the best for me so far, and with 2011-07-17 I see no improvement; the speed is 1758-1760 as before, with the same fluctuations.
4x XFX 5870 @ 960Mhz Core & 300Mhz Memory Ubuntu 32bit SDK 2.1 11.5 Drivers
|
|
|
|
Peao
Legendary
Offline
Activity: 1320
Merit: 1001
|
|
July 17, 2011, 02:48:05 PM |
|
I noticed an improvement in performance.
Thank you, Dia!
|
|
|
|
erek
Newbie
Offline
Activity: 36
Merit: 0
|
|
July 17, 2011, 04:04:50 PM |
|
Solid increase, equivalent to nearly a 5 MHz overclock, with 2011-07-17
Catalyst 11.7 Early + SDK 2.5
905/340 GPU/VRAM clocks on 2x 6970s
hitting 812+ MH/sec
|
|
|
|
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
|
|
July 17, 2011, 04:22:47 PM |
|
Same hash rate as mod.zip. No "more" hashes.
|
|
|
|
bcforum
|
|
July 17, 2011, 05:43:55 PM |
|
I get the same speed with DiabloMiner
|
If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
|
|
|
Diapolo (OP)
|
|
July 17, 2011, 05:52:28 PM |
|
I get the same speed with DiabloMiner
No need to use Phoenix or this kernel then ... Dia
|
|
|
|
dadittox
Newbie
Offline
Activity: 23
Merit: 0
|
|
July 17, 2011, 06:17:58 PM |
|
Updated from the 2011-07-11 to the 2011-07-17 kernel. Hash rate increased from 325 to 327 on the 6950, and from 280 to 281 on the 6870, all at stock clocks. Not a big increase, but it's always nice to have a few MHashes for free. Keep up the good work!
|
|
|
|
MiningBuddy
|
|
July 17, 2011, 10:24:47 PM |
|
Got an extra 1 MH/s from my cards with the latest patch, cheers
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
July 17, 2011, 11:32:27 PM |
|
Can you make another version that includes all the recent changes and where __init__.py does not need to be changed? Thanks
|
|
|
|
deepceleron
Legendary
Offline
Activity: 1512
Merit: 1036
|
|
July 18, 2011, 01:16:17 AM |
|
I get lots of 'Unusual behavior from OpenCL. Hardware problem?' on the latest version, using a fresh Catalyst 11.6 driver (not the CPU-buggy hotfix) on WinXP on two different miners; going back to stock Phoenix and the 07-11 kernel is fine.
|
|
|
|
Tartarus
Newbie
Offline
Activity: 47
Merit: 0
|
|
July 18, 2011, 02:47:58 AM |
|
Can you make another version that includes all the recent changes and where __init__.py does not need to be changed? Thanks
Why? Just unzip in the kernels/phatk/ directory.
|
|
|
|
joulesbeef
Sr. Member
Offline
Activity: 476
Merit: 250
moOo
|
|
July 18, 2011, 03:09:03 AM |
|
Can you make another version that includes all the recent changes and where __init__.py does not need to be changed? Thanks Not sure of your reasoning, but if you want that, just grab the 7-11 version, the one before this latest release. You don't gain all that much from the changes in __init__.py; if you really want to do without them, you won't be shooting yourself in the foot. So grab the last version. Personally I appreciate even the smallest increase, so thanks for the work.
|
mooo for rent
|
|
|
coblee
Donator
Legendary
Offline
Activity: 1654
Merit: 1351
Creator of Litecoin. Cryptocurrency enthusiast.
|
|
July 18, 2011, 03:23:28 AM |
|
Can you make another version that includes all the recent changes and where __init__.py does not need to be changed? Thanks
Why? Just unzip in the kernels/phatk/ directory. The main reason is that some people like to use this kernel with poclbm. The 7-11 version works, but the newest doesn't, because it requires changes in __init__.py which are specific to Phoenix. BOARBEAR is asking for a version that includes all the additional improvements since 7-11 but doesn't require any changes to __init__.py.
|
|
|
|
Diapolo (OP)
|
|
July 18, 2011, 05:23:18 AM |
|
Can you make another version that includes all the recent changes and where __init__.py does not need to be changed? Thanks
Why? Just unzip in the kernels/phatk/ directory. The main reason is that some people like to use this kernel with poclbm. The 7-11 version works, but the newest doesn't, because it requires changes in __init__.py which are specific to Phoenix. BOARBEAR is asking for a version that includes all the additional improvements since 7-11 but doesn't require any changes to __init__.py. It is impossible to create a version of this kernel that works without modifications in the main miner software. As I wrote, there are values and variables that the kernel uses which are precalculated in the miner software and then passed as parameters to the kernel. A miner which doesn't pass the required parameters will not work without being modified, sorry. You guys are free to mod the kernel for yourselves to revert the changes that require a modded miner and only take the ones that work without it. Dia
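To give a rough idea of what that means in practice, here is a simplified sketch of this kind of host-side precalculation (illustrative only, not the shipped __init__.py; the real kernel expects its own specific set of parameters):
Code:
# For the second 512-bit SHA-256 block of a Bitcoin header only the nonce
# changes per hash, so expanded message words like W16 and W17 are constant
# for a whole work unit: compute them once on the CPU, pass them as kernel args.
import struct

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF

def s0(x):  # SHA-256 small sigma 0
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def s1(x):  # SHA-256 small sigma 1
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def precalc_w(header_tail):
    # header_tail: header bytes 64..75 as the three big-endian words SHA-256 sees.
    w0, w1, w2 = struct.unpack('>3I', header_tail)
    w15 = 0x280                              # padded message length: 640 bits
    w16 = (w0 + s0(w1)) & 0xFFFFFFFF         # independent of the nonce (W3)
    w17 = (w1 + s0(w2) + s1(w15)) & 0xFFFFFFFF
    return w16, w17                          # would be handed to the kernel as arguments
The shipped __init__.py passes more than just these two values, which is why the kernel and its init file have to stay in sync.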
|
|
|
|
phorensic
|
|
July 18, 2011, 08:28:53 AM |
|
New version 2011-07-17 getting a lot of: 2011-07-18 01:24:09: Listener for "bitcoinmonkey": [18/07/2011 01:24:09] Kernel error: Unusual behavior from OpenCL. Hardware problem?
|
|
|
|
Diapolo (OP)
|
|
July 18, 2011, 08:37:33 AM |
|
New version 2011-07-17 getting a lot of: 2011-07-18 01:24:09: Listener for "bitcoinmonkey": [18/07/2011 01:24:09] Kernel error: Unusual behavior from OpenCL. Hardware problem? I have never seen that message during my tests with the release version of the kernel. What OS, which SDK are you on?
|
|
|
|
dikidera
|
|
July 18, 2011, 08:48:53 AM |
|
New version 2011-07-17 getting a lot of: 2011-07-18 01:24:09: Listener for "bitcoinmonkey": [18/07/2011 01:24:09] Kernel error: Unusual behavior from OpenCL. Hardware problem? I have never seen that message during my tests with the release version of the kernel. What OS, which SDK are you on? I wonder if he copied over the init.py file.
|
|
|
|
phorensic
|
|
July 18, 2011, 09:03:54 AM |
|
Yes I copied over the init file, because without it, it won't even run. Windows 7 64-bit, Catalyst 11.5, Stream 2.4. I haven't had this message pop up with any other version of the phatk kernel, nor any other kernel for that matter.
|
|
|
|
bcforum
|
|
July 18, 2011, 02:34:20 PM |
|
I get lots of 'Unusual behavior from OpenCL. Hardware problem?' on the latest version, using a fresh Catalyst 11.6 driver (not the CPU-buggy hotfix) on WinXP on two different miners; going back to stock Phoenix and the 07-11 kernel is fine.
I had the same issue under Windows, which magically went away when I switched to Linux.
|
If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
|
|
|
ahitman
|
|
July 18, 2011, 03:15:25 PM |
|
New version 2011-07-17 getting a lot of: 2011-07-18 01:24:09: Listener for "bitcoinmonkey": [18/07/2011 01:24:09] Kernel error: Unusual behavior from OpenCL. Hardware problem? I had the same issue on my 5850, but when I took 5 MHz off my overclock the error went away; guessing it was picking up some errors from being pushed harder?
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1011
|
|
July 18, 2011, 03:17:51 PM |
|
I'm on Linux and I'm still able to get these error messages, even with earlier versions of the phatk patch from this thread. For me they start happening if my clocks are too high and become more frequent as I increase the clocks. For my 5850 at stock volts, 1015 MHz doesn't generate any errors, at 1020 MHz they are very occasional, and they become more frequent as I ramp up to 1035 MHz (my card always freezes at 1040 MHz). When I overvolted my 5850 to 1.25 V and took it to 1110 MHz for 3 hours I got a few, but noticed that I was generating good shares with the same OpenCL work request that was throwing errors, so it seems the error doesn't invalidate the whole second, likely just a very small portion of it. I have a screenshot here. At lower voltages I never see these errors: either the card runs error-free or it crashes, and the MHz line dividing these two states is pretty fine in my experience. It's interesting that the kernel version might affect the frequency of such errors, but they don't bother me. Thanks for this latest kernel patch Diapolo. My hash rates increased by 0.3 MH/s per card, 2x5850: 722.2 MH/s -> 722.8 MH/s. I'll send another small tip.
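If you want to quantify this instead of eyeballing the console, something as simple as the following could track the fraction of kernel results that fail the CPU re-check (a hypothetical helper, not part of Phoenix):
Code:
# Hypothetical error-rate tracker: record True for each accepted share and
# False whenever the miner logs "Unusual behavior from OpenCL"; a rising
# ratio as you raise the core clock is the effect described above.
from collections import deque

class ErrorRateTracker:
    def __init__(self, window=1000):
        self.results = deque(maxlen=window)   # True = good share, False = bad result

    def record(self, good):
        self.results.append(bool(good))

    def error_rate(self):
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)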
|
|
|
|
Diapolo (OP)
|
|
July 18, 2011, 05:42:53 PM |
|
I'm on Linux and I'm still able to get these error messages, even with earlier versions of the phatk patch from this thread. For me they start happening if my clocks are too high and become more frequent as I increase the clocks. For my 5850 at stock volts, 1015 MHz doesn't generate any errors, at 1020 MHz they are very occasional, and they become more frequent as I ramp up to 1035 MHz (my card always freezes at 1040 MHz). When I overvolted my 5850 to 1.25 V and took it to 1110 MHz for 3 hours I got a few, but noticed that I was generating good shares with the same OpenCL work request that was throwing errors, so it seems the error doesn't invalidate the whole second, likely just a very small portion of it. I have a screenshot here. At lower voltages I never see these errors: either the card runs error-free or it crashes, and the MHz line dividing these two states is pretty fine in my experience. It's interesting that the kernel version might affect the frequency of such errors, but they don't bother me. Thanks for this latest kernel patch Diapolo. My hash rates increased by 0.3 MH/s per card, 2x5850: 722.2 MH/s -> 722.8 MH/s. I'll send another small tip. Very nice post with relevant information for everyone who encounters this error. Thanks for sharing. I never had this error, because my 5830 clocks were (and are) never above 1000 MHz for the chip, so it seems logical to me. Dia
|
|
|
|
phorensic
|
|
July 18, 2011, 07:08:31 PM |
|
I think I jumped the gun. I believe I am having a real hardware problem. It's only on one card, the hotter one of the group, and it's overclocked and over-volted to all hell. I think what happened is that these errors were hidden from the console until this new kernel update! If that's the case, kudos for making the errors work! haha.
|
|
|
|
hugolp
Legendary
Offline
Activity: 1148
Merit: 1001
Radix-The Decentralized Finance Protocol
|
|
July 18, 2011, 09:23:29 PM |
|
I think I jumped the gun. I believe I am having a real hardware problem. It's only on one card, the hotter one of the group, and it's overclocked and over-volted to all hell. I think what happened is that these errors were hidden from the console until this new kernel update! If that's the case, kudos for making the errors work! haha.
Not necessarily. I had the same problem with a card on previous versions of the kernel. After 20 minutes it would produce that message in Phoenix or would crash poclbm. But with exactly the same configuration and later kernels it was solved, even though it was producing higher hash rates. I am not sure why it happens exactly.
|
|
|
|
shin234
Newbie
Offline
Activity: 39
Merit: 0
|
|
July 19, 2011, 01:25:44 AM |
|
My stales are at: shares 4234 / stale (7, 0.17%) after the latest version, using AOCLBF on two 5850s, one running at 840/300 and the other at 1000/300; the latter is because I got a card with a crappy part number from Sapphire.
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
July 19, 2011, 04:10:23 AM |
|
Can you make another version that includes all the recent changes and where __init__.py does not need to be changed? Thanks
Why? Just unzip in the kernels/phatk/ directory. The main reason is that some people like to use this kernel with poclbm. The 7-11 version works, but the newest doesn't, because it requires changes in __init__.py which are specific to Phoenix. BOARBEAR is asking for a version that includes all the additional improvements since 7-11 but doesn't require any changes to __init__.py. It is impossible to create a version of this kernel that works without modifications in the main miner software. As I wrote, there are values and variables that the kernel uses which are precalculated in the miner software and then passed as parameters to the kernel. A miner which doesn't pass the required parameters will not work without being modified, sorry. You guys are free to mod the kernel for yourselves to revert the changes that require a modded miner and only take the ones that work without it. Dia What I meant was: excluding the changes that need the modification of __init__.py, can you release a version that has all the other changes?
|
|
|
|
Diapolo (OP)
|
|
July 19, 2011, 05:37:38 AM |
|
Can you make another version that includes all the recent changes and where __init__.py does not need to be changed? Thanks
Why? Just unzip in the kernels/phatk/ directory. The main reason is that some people like to use this kernel with poclbm. The 7-11 version works, but the newest doesn't, because it requires changes in __init__.py which are specific to Phoenix. BOARBEAR is asking for a version that includes all the additional improvements since 7-11 but doesn't require any changes to __init__.py. It is impossible to create a version of this kernel that works without modifications in the main miner software. As I wrote, there are values and variables that the kernel uses which are precalculated in the miner software and then passed as parameters to the kernel. A miner which doesn't pass the required parameters will not work without being modified, sorry. You guys are free to mod the kernel for yourselves to revert the changes that require a modded miner and only take the ones that work without it. Dia What I meant was: excluding the changes that need the modification of __init__.py, can you release a version that has all the other changes? What I do here is just a hobby and I don't want it to take even more time; I hope you understand that. I can't maintain two different kernel versions, sorry. Dia
|
|
|
|
burningrave101
Newbie
Offline
Activity: 55
Merit: 0
|
|
July 19, 2011, 04:11:45 PM |
|
Can you make another version that includes all the recent changes and where __init__.py does not need to be changed? Thanks
Why? Just unzip in the kernels/phatk/ directory. The main reason is that some people like to use this kernel with poclbm. The 7-11 version works, but the newest doesn't, because it requires changes in __init__.py which are specific to Phoenix. BOARBEAR is asking for a version that includes all the additional improvements since 7-11 but doesn't require any changes to __init__.py. It is impossible to create a version of this kernel that works without modifications in the main miner software. As I wrote, there are values and variables that the kernel uses which are precalculated in the miner software and then passed as parameters to the kernel. A miner which doesn't pass the required parameters will not work without being modified, sorry. You guys are free to mod the kernel for yourselves to revert the changes that require a modded miner and only take the ones that work without it. Dia What I meant was: excluding the changes that need the modification of __init__.py, can you release a version that has all the other changes? Unless you have a good reason for needing to use Poclbm over Phoenix (and I can't currently think of one), there's no point in running Poclbm when Phoenix is faster.
|
|
|
|
hugolp
Legendary
Offline
Activity: 1148
Merit: 1001
Radix-The Decentralized Finance Protocol
|
|
July 19, 2011, 04:21:30 PM |
|
Unless you have a good reason for needing to use Poclbm over Phoenix (and I can't currently think of one), there's no point in running Poclbm when Phoenix is faster. Backup pools. It's a big plus (and peace of mind). Also, for me poclbm is slightly faster than Phoenix with the same kernel.
|
|
|
|
burningrave101
Newbie
Offline
Activity: 55
Merit: 0
|
|
July 19, 2011, 04:43:58 PM |
|
Unless you have a good reason for needing to use Poclbm over Phoenix (and I can't currently think of one), there's no point in running Poclbm when Phoenix is faster. Backup pools. It's a big plus (and peace of mind). Also, for me poclbm is slightly faster than Phoenix with the same kernel. Can't you just create another Phoenix miner on a different pool with a low aggression value, so it will take over if your main pool worker goes idle?
|
|
|
|
hugolp
Legendary
Offline
Activity: 1148
Merit: 1001
Radix-The Decentralized Finance Protocol
|
|
July 19, 2011, 06:08:00 PM |
|
Unless you have a good reason for needing to use Poclbm over Phoenix (and I can't currently think of one), there's no point in running Poclbm when Phoenix is faster. Backup pools. It's a big plus (and peace of mind). Also, for me poclbm is slightly faster than Phoenix with the same kernel. Can't you just create another Phoenix miner on a different pool with a low aggression value, so it will take over if your main pool worker goes idle? I tried this (I was a Phoenix user until poclbm added backup pools), but the second miner would take some hashing power from the main one (big deal), and if the main one went down it would not perform at full speed because the aggression was lower.
|
| | |