Bitcoin Forum
October 12, 2024, 12:50:57 PM *
News: Latest Bitcoin Core release: 28.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 [All]
  Print  
Author Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13  (Read 106854 times)
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 04, 2011, 08:15:55 AM
Last edit: February 25, 2012, 02:26:12 PM by Diapolo
 #1

This is a repost from the Newbies forum, because I'm now allowed to post here :).
original Thread is located here: http://forum.bitcoin.org/index.php?topic=25135.0

If it works, please post here and consider a small donation @ 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x :).



Important (2012-01-13): The FASTLOOP=False parameter is not needed anymore, because FASTLOOP defaults to false in this version. Update: FASTLOOP=True works now, I uploaded a fixed version!

Important: since OpenCL SDK / Runtime version 2.6 AMD updated their OpenCL compiler, so that some older kernels and optimizations in them seem to not work anymore or are not needed anymore. In order to reflect this change I had to edit the kernel performance section of this thread.

Important: since version 2011-08-27 you don't need to supply the BFI_INT switch anymore. If your HW supports it, it's enabled automatically. To disable it use BFI_INT=false.

Important: since version 2011-08-04 (pre-release) you have to use the switch VECTORS2 instead of VECTORS. I made this change to be clear what vectors are used in the kernel (2- or 4-component). To use 4-component vectors use switch VECTORS4.

Important: since version 2011-07-17 a modified version of __init__.py (for the Phoenix miner) is included in this package and has to be used! The kernel won't work for other Miners without modifications to them, see kernel.cl for further infos.



This is the preferred switch for Phoenix with phatk_dia in order to achieve comparable performance:
Code:
-k phatk AGGRESSION=12 VECTORS2 WORKSIZE=128


Download version 2012-01-13: http://www.mediafire.com/?xzk6b1yvb24r4dg
Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9
Download version 2011-08-27: http://www.mediafire.com/?697r8t2pdk419ji
Download version 2011-08-11: http://www.mediafire.com/?s5c7h4r91r4ad4j
Download version 2011-08-04 (pre-release): http://www.mediafire.com/?upwwud7kfyx7788
Download version 2011-07-17: http://www.mediafire.com/?4zxdd5557243has
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6
Download version 2011-07-07: http://www.mediafire.com/?o7jfp60s7xefrg4
Download version 2011-07-06: http://www.mediafire.com/?f8b8q3w5u5p0ln0
Download version 2011-07-03: http://www.mediafire.com/?xlkcc08jvp5a43v
Download version 2011-07-01: http://www.mediafire.com/?5jmt7t0e83k3eox

Kernel performance (BFI_INT / VECTORS2 / WORKSIZE=128 / SDK 2.6 / APP KernelAnalyzer 1.11 - Cal 11.12 profile):
HD5870
2011-08-20: 22 GPR / 1427 ALU OPs / 66 CF OPs
2011-08-27: 22 GPR / 1426 ALU OPs / 66 CF OPs
2011-12-21: 20 GPR / 1400 ALU OPs / 66 CF OPs
2012-01-13: 21 GPR / 1394 ALU OPs / 67 CF OPs

HD6970
2011-08-20: 21 GPR / 1687 ALU OPs / 66 CF OPs
2011-08-27:  23 GPR / 1688 ALU OPs / 68 CF OPs
2011-12-21: 21 GPR / 1687 ALU OPs / 66 CF OPs
2012-01-13: 20 GPR / 1687 ALU OPs / 66 CF OPs



Kernel performance (BFI_INT / VECTORS2 / SDK 2.5 / APP KernelAnalyzer 1.9 - Cal 11.7 profile):
HD5870
original phatk 1.X: 1393 ALU OPs
2011-07-01: 1389 ALU OPs
2011-07-03: 1385 ALU OPs
2011-07-06: 1380 ALU OPs
2011-07-07: 1380 ALU OPs
2011-07-11: 1378 ALU OPs
2011-07-17: 1376 ALU OPs
2011-08-04 (pre-release): 1368 ALU OPs
2011-08-11: 1364 ALU OPs
2011-08-27: 1363 ALU OPs (30 less compared to original phatk 1.X)
HD6970
original phatk 1.X: 1707 ALU OPs
2011-07-01: 1710 ALU OPs
2011-07-03: 1706 ALU OPs
2011-07-06: 1702 ALU OPs
2011-07-07: 1702 ALU OPs
2011-07-11: 1701 ALU OPs
2011-07-17: 1699 ALU OPs
2011-08-04 (pre-release): 1689 ALU OPs
2011-08-11: 1687 ALU OPs
2011-08-27: 1687 ALU OPs (20 less compared to original phatk 1.X)



changelog:

2012-01-13
Kernel:
- modified: Disclaimer is now the same as in original Phoenix package
- removed: all (u) typecasts in front of scalars, where vectors and scalars were used together because per OpenCL definition this is not needed
- removed: all () brackets around n in the #define parts of the kernel
- removed: S0(), which is now again merged into s0()
- removed: brackets around the commands in t1W(), t1(), t2() and W() were removed, to allow the compiler to reorder these
- added: W() function missed an ; at it's end
- added: init variable B1addK6 used in round 6 to save an add -> THX to DiabloD3
- added: a (uint) typecast in front of get_local_id() and get_group_id() calls, because return value could be 64 bits long, which is not wanted
- modified: replaced all ma() + s0() or s0() + ma() calls with t2()
- modified: round 6 now uses the new new B1addK6 variable
- modified: reordered W[] calculation for rounds 32, 91 and 92
- modified: rounds 121, 122 and 123 to not compute Vals[4], Vals[5] and Vals[6], because they are not needed for final computation of Vals[7] -> THX to jhajduk
- modified: removed + H[7] from round 124 and use -0xec9fcd13 to check for valid nonces
- added: result_r124 variable to take the result of the last round 124, this saves a few ALU OPs on VLIW5 GPUs
Python Init:
- modified: replaced spaces with tabs in the source code formatting (I really dislike this part in Python ^^)
- modified: a few comments and commands were reformatted for better readability or to be better understandable
- modified: FASTLOOP parameter now defaults to False, which means you don't need to supply FASTLOOP=False anymore
- removed: OUTPUT_SIZE is not used anymore so all references to it were removed
- modified: changed REVISION to 122
- modified: moved the WORKSIZE checks below the part where the check, if and which vectors should be used is performed
            this takes into account, that the global worksize passed to the kernel is influenced by vector usage and vectorsize
            (currently the use of FASTLOOP can break this, because of the "dynamic" number of iterations)
- added: some debug info about worksize and pyOpenCL is displayed at the start
- added: B1 + K[6] is passed as new kernel parameter
- modified: made enqueue_read_buffer() / enqueue_write_buffer() blocking and removed finish() after the read, as per AMDs recommendations
            to minimize API overhead

2011-08-27:
Kernel:
- added: code path for 3-component Vectors, activated via VECTORS3 (currently not usable, because of a bug in the AMD drivers up to Cat 11.8)
- removed: BITALIGN option from the kernel, BFI_INT is now used automatically, if the HW supports it (disabled via BFI_INT=false)
- modified: non BFI_INT Ch() function, which was broken in 2011-08-11 -> THX to Vince
- modified: kernel output buffer is now an ulong array and not an uint array
- removed: OUTPUT_SIZE argument is not passed and used in the kernel anymore
- modified: WORKSIZEx4, WORKSIZEx3 and WORKSIZEx2 arguments were merged into WORKSIZExVECSIZE
- modified: removed, reordered and added some brackets and type-casting stuff in the kernel
- modified: restored command order for round 108 - 123 to free a GPR
- modified: added H[7] into round 124 calculation
- modified: changed the checking for positive nonces again to cover the H[7] change
- modified: writing of nonces to output now uses 1 write for Vec2 and max. 2 writes for Vec4, because 2x uints are now encoded into 1x ulong
Python Init:
- added: code for 3-component Vectors, activated via VECTORS3 (currently not usable, because of a bug in the AMD drivers up to Cat 11.8)
- removed: BITALIGN option from the Python init, BFI_INT is now used automatically, if the HW supports it (disabled via BFI_INT=false)
- added: detection of maximum supported WORKSIZE per Device, which is used if no WORKSIZE is supplied, if supplied WORKSIZE > max. supported WORKSIZE
    or if WORKSIZE is not a power of 2
- added: code to decode the ulong from the output buffer into 2x uint and process the results
- modified: comments, code formating and line breaks for better readability
- modified: output buffer size is now the WORKSIZE -> THX to Phaetus

2011-08-11:
- modified: reverted a former change to the Ma() function to save an ALU OP for 69XX cards
- added: S0() and S1() function, which is a compiler help -> THX Phateus
- modified: a few brackets and layout of all helper functions for better readability and compatibility
- added: t2() function, which is (s0(n) + ma(n)) and saves a few GPRs -> THX Phateus and myself (had this in earlier, but removed it sometime ^^)
- modified: changed layout of kernel definition for better readability
- modified: all values which for example had a 10u now have a 10U (uppercase) to be consistent in the whole kernel
- modified: modified round 94 W calculation for better performance
- modified: round 108 - 123 now consists of 2 W() blocks followed by 2 sharoundW() blocks to save a GPR
- modified: changed the checking for positive nonces again to never create an invalid share and lower ALU OP usage

2011-08-04 (pre-release):
- added: user Vince into disclaimer -> THX Vince :)
- added: kernel is now able to work with 4-component vectors (switch VECTORS4) -> THX to Phateus
- modified: to use 2-component vectors I renamed the switch VECTORS to VECTORS2
- added: __attribute__((reqd_work_group_size(WORKSIZE, 1, 1))) -> THX to Phateus
- added: constants PreW31 and PreW32, which store P2() + P4() for round 31 and 32 -> THX to Phateus
- renamed - modified: W17_2 is now PreW19, W2 is now PreW18, PreVal4addT1 is now PreVal4 (= PreVal4 + T1), state0subT1 is now PreVal0 (= Preval4 + state0)
- modified: base is now declared as u to save the addidion of uint2(0, 1) or uint4(0, 1, 2, 3) for W_3 init -> THX to Phateus
- modified: nonce calculation now uses the local Work-Item ID, the group ID and the WORKSIZE instead of only the global Work-Item ID -> THX to Phateus
- added: saved a multiplication by passing WORKSIZEx2 and WORKSIZEx4 constants to the kernel
- modified: calculation for W[18 - O] was optimized so that P2(18) is only calculated for x component (if Vectors are used), because x and y only differ
       in the LSB and afterwards Bit 14 and 25 are rotated for W[18 - O].y -> THX to Phateus
- modified: saved an addition for Vals[0] init, because of the change to PreVal0
- modified: reordered code for round 4 - 95 to optimize for less ALU OPs used -> THX Phateus and myself ^^
- modified: ordering of variables in additions for Round 124 was changed to optimize for less ALU OPs used
- modified: rewrote the part where nonces are checked, if they are positive and where they are written into output buffer
       (saves 2 global writes per work-item and saves additional ALU OPs)
- modified: changed variables W_3, P2_18_x, P2_18 and nonce into a constant
- modified: changed code formating for rounds 4 - 124 better readability
- removed: some comments to cleanup the code

2011-07-17:
- added: offset for W[] array to reduce it's size -> THX to user Vince
- modified: function t1() renamed to t1W() / function sharound() renamed to sharoundW()
- added: function t1() and sharound() which are used where the W[] addition can be left out, because W[] == 0
    I guess the compiler already does this optimization, but doesn't hurt) -> THX to user Vince
- modified: P1() - P4() and W() to male use of the offset
- modified: quite a few kernel parameters have new values or were added (mixed ideas from User Vince with own ones)
    C1addK5: C1addK5 = C1 + K[5]: C1addK5 = C1 + 0x59f111f1
    D1: D1 = D1 + K[4] + W[4]: D1 = D1 + 0xe9b5dba5 + 0x80000000U
    W2: W2 + W16 in P1(): W2 = P1(18) + P4(18)
    W17_2: 0x80000000U in P2() = 0x11002000 + W17 in P1(): W17_2 = P1(19) + P2(19)
    PreValaddT1: PreValaddT1 = PreVal4 + T1
    T1substate0: T1substate0 = T1 - substate0
- added: variable W_3, which stores the first value formely held in W[3]
- added: Temp variable used to speed up calculation for rounds 4 and 5
- modified: changed round 3 so that it's more efficient (uses: Vals[0] and Vals[4])
- modified: W[0] - W[14] are now kind of hard-coded or left out, where they were 0
- modified: optimized P1(18) + P2(18) + P4(18)
- modified: optimized P1(19) + P2(19) + P4(19)
- modified: optimized round 4 + 5
- modified: rounds 6 - 14 and 73 - 78 now use new sharound() without W[] addition
- modified: offset added for all parts, where W[] is used
- modified: W_3 is used as result instead of W[3] (W[3] is still used to generate random possition in output buffer) -> THX to user Vince

2011-07-11:
- modified: constant H[7] has a new value (saves an addition in round 124)
- modified: non BFI_INT Ch() function now uses OpenCL built-in bitselect
- modified: reordered W[] calculations for round 18 - 30, 87 and 94
- modified: reordered calculation for round 5
- modified: W[] calculation for round 80 - 86 is now a block before sharound() is called
- removed: K[60] from round 124 (because of new H[7] value)

2011-07-07:
- removed: some large comments in the source were removed
- modified: Ma() function is now unique in the kernel, no matter if BFI_INT is used or not -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- added: Ch() function which uses OpenCL bitselect() command (but it's not active, so you are free to try it) -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- modified: u W[128] is replaced with u W[124] because no more than 124 values are used
- modified: initialisation for Vals[0], Vals[3], Vals[4] and Vals[7] is now processed in other places to save some unneeded writes to these variables
- fixed: some hex values, which were used in vector additions are now properly type-casted, which hopefully restores AMD APP SDK 2.1 compatibility
- modified: rounds 3, 4 and 5 were modified for better performance (guess this can be tuned, if I have a working KernelAnalyzer)

2011-07-06:
- modified: H[] constants were reordered (2 were not used because of earlier mods)
- added: ulong L constant added (it's value doesn't fit into an uint)
- modified: new Ma() for non BFI_INT capable cards, should be faster -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- removed: t1W()
- modified: t1() reordered function calls for better performance
- modified: W() reordered function calls for better performance
- modified: sharound() removed writing to t1, now t1() is called twice, which makes this function FASTER (OpenCL compiler optimization)
- removed: sharound2() (if needed W() + sharound() is used instead)
- removed: partround() not needed because of another solution for round 3 and 124
- removed: t1 and t1W variabled
- modified: rounds 3, 19, 30, 81, 87, 94 and 124 were modified for better performance

2011-07-03:
- removed: t2(), w(n), r0(x), r1(x), R0(n) and R1(n)
- renamed - modified: R(x) to W(x) plus now uses P1, P2, P3 and P4 directly
- modified: P1(x) and P2(x) to not use R1(x - 2), R0(x - 15) but do that directly
- modified: SHA rounds 31, 32, 47 - 61, 86, 87, 114 - 119 now use sharound2() instead of W() + sharound()
- modified: reordered code for SHA rounds 66 - 94 -> saw no decrease in performance -> better readability
- modified: SHA rounds 18, 19, 20, 80, 93, 94 now use a simpler calculation because of removed zero addions
--> 1x P1(x), 2x P2(x), 4x P3(x) and 2x P4(x) were removed which should give a little MHash/sec boost
- modified: sharound() so that a double execution of t1() is avoided -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- added: "u t1W" variable, which is used in sharound2() to avoid double execution of t1W()

2011-07-01:
Code:
Vals[7] = 0xb0edbdd0 + K[0] +  W[64] + 0x08909ae5U; -> Vals[7] = 0xfc08884d + W[64];
Vals[3] = 0xa54ff53a + 0xb0edbdd0 + K[0] +  W[64]; -> Vals[3] = 0x198c7e2a2 + W[64];
- removed the
Code:
Vals[7] += H[7]
addition and replaced the final if-statements in the Kernel
- reordered some W[n] = statements to remove some unneeded additions
- replaced all additions like 64 + 5 with the corresponding integer value (guess it was in there for readability reasons, so here it got worse :D)
- removed some unneeded brackets
- re-formatted for better readability

If it works, please post here and consider a small donation @ 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x :).

Thanks,
Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
lebuen
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
July 04, 2011, 08:38:12 AM
 #2

Works perfectly for me, although only slight increase in performance (371 -> 373 MH/s). But I had the previous patch already in place. Thanks!
Fletch
Full Member
***
Offline Offline

Activity: 168
Merit: 100


I'll have a steak sandwich and a... steak sandwich


View Profile
July 04, 2011, 08:40:32 AM
 #3

Great, thanks. I went from 240 -> 242.5 on my 5850's. I was using only the "3% Ma-function patch" before.

- replaced all additions like 64 + 5 with the corresponding integer value (guess it was in there for readability reasons, so here it got worse Cheesy)
I've never developed any code for GPUs or used OpenCL, but wouldn't the compiler take care of that for you? At least all C compilers would.

HashPeak - GPU mining hashrate peak detector
BTC: 1FLETCHvcUKosefrcZCLUQTtvx4WvgnYMC
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 04, 2011, 09:38:52 AM
 #4

Great, thanks. I went from 240 -> 242.5 on my 5850's. I was using only the "3% Ma-function patch" before.

- replaced all additions like 64 + 5 with the corresponding integer value (guess it was in there for readability reasons, so here it got worse Cheesy)
I've never developed any code for GPUs or used OpenCL, but wouldn't the compiler take care of that for you? At least all C compilers would.

Great that you benefit from the modifications Smiley. Your comment about the compiler stuff IS right, but at least it doesn't make things worse Cheesy.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 04, 2011, 10:37:43 AM
 #5

No change for me at all unfortunately, not even 0.1 MH/s.  I copied the file in place of phoenix-1.50/kernels/phatk/kernel.cl and restarted my instances of phoenix.

Settings:
Linux
Catalyst 11.6
SDK 2.1
phatk (bundled with phoenix 1.50 with MA tweak)
VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256
Solo mining

5850 Xtreme @ 1.0875V - 970MHz (399.4 MH/s) (75*C)
5850 Xtreme @ 1.1625V - 1050MHz (432.8 MH/s) (60*C)
OCedHrt
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
July 04, 2011, 10:41:57 AM
 #6

No change for me at all unfortunately, not even 0.1 MH/s.  I copied the file in place of phoenix-1.50/kernels/phatk/kernel.cl and restarted my instances of phoenix.

Settings:
Linux
Catalyst 11.6
SDK 2.1
phatk (bundled with phoenix 1.50 with MA tweak)
VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256
Solo mining

5850 Xtreme @ 1.0875V - 970MHz (399.4 MH/s) (75*C)
5850 Xtreme @ 1.1625V - 1050MHz (432.8 MH/s) (60*C)


I went from 56->61Mhash/s on my 4850. Phatk is now faster than DiabloMiner. Getting 59Mhash/s with DiabloMiner.

ALL.ME  ●●●  SOCIAL NETWORK OF THE BLOCKCHAIN TIME ●●●
▄▄▄▬▬▄▄▄  Bounty all.me ▶ Jan 29th - May 8th 2018  ▄▄▄▬▬▄▄▄
Facebook   ▲   Twitter   ▲   Telegram
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 04, 2011, 10:44:37 AM
 #7

No change for me at all unfortunately, not even 0.1 MH/s.  I copied the file in place of phoenix-1.50/kernels/phatk/kernel.cl and restarted my instances of phoenix.

Settings:
Linux
Catalyst 11.6
SDK 2.1
phatk (bundled with phoenix 1.50 with MA tweak)
VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256
Solo mining

5850 Xtreme @ 1.0875V - 970MHz (399.4 MH/s) (75*C)
5850 Xtreme @ 1.1625V - 1050MHz (432.8 MH/s) (60*C)


Why are you using SDK 2.1? The phatk Kernel likes 2.4 best. Your Phoenix settings look good though.
Strange that your MH/s didn't change at all. Did Phoenix apply the BFI_INT patch the first time you startet with the new Kernel?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 04, 2011, 10:46:19 AM
 #8

I went from 56->61Mhash/s on my 4850. Phatk is now faster than DiabloMiner. Getting 59Mhash/s with DiabloMiner.

Good to know that 4XXX series get a boost, too Smiley. What SDK are you on?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 04, 2011, 10:49:47 AM
 #9

No change for me at all unfortunately, not even 0.1 MH/s.  I copied the file in place of phoenix-1.50/kernels/phatk/kernel.cl and restarted my instances of phoenix.

Settings:
Linux
Catalyst 11.6
SDK 2.1
phatk (bundled with phoenix 1.50 with MA tweak)
VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256
Solo mining

5850 Xtreme @ 1.0875V - 970MHz (399.4 MH/s) (75*C)
5850 Xtreme @ 1.1625V - 1050MHz (432.8 MH/s) (60*C)


Why are you using SDK 2.1? The phatk Kernel likes 2.4 best. Your Phoenix settings look good though.
Strange that your MH/s didn't change at all. Did Phoenix apply the BFI_INT patch the first time you startet with the new Kernel?

Dia

I'm using SDK 2.1 because when I compared it with 2.4 I found a slight speed improvement with both phatk and poclbm (much larger with poclbm).

I haven't tried SDK 2.4 since applying the MA patch so that is maybe worth a try.

BFI_INT is definitely being used.  If I restart the command without BFI_INT on my 399.4 MH/s gpu I get 354.5 MH/s instead.
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 04, 2011, 10:52:56 AM
Last edit: July 04, 2011, 11:37:47 AM by teukon
 #10

No change for me at all unfortunately, not even 0.1 MH/s.  I copied the file in place of phoenix-1.50/kernels/phatk/kernel.cl and restarted my instances of phoenix.

Settings:
Linux
Catalyst 11.6
SDK 2.1
phatk (bundled with phoenix 1.50 with MA tweak)
VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256
Solo mining

5850 Xtreme @ 1.0875V - 970MHz (399.4 MH/s) (75*C)
5850 Xtreme @ 1.1625V - 1050MHz (432.8 MH/s) (60*C)


Why are you using SDK 2.1? The phatk Kernel likes 2.4 best. Your Phoenix settings look good though.
Strange that your MH/s didn't change at all. Did Phoenix apply the BFI_INT patch the first time you startet with the new Kernel?

Dia

I'm using SDK 2.1 because when I compared it with 2.4 I found a slight speed improvement with both phatk and poclbm (much larger with poclbm).

I haven't tried SDK 2.4 since applying the MA patch so that is maybe worth a try.

BFI_INT is definitely being used.  If I restart the command without BFI_INT on my 399.4 MH/s gpu I get 354.5 MH/s instead.


Just tried SDK 2.4 on my 399.4 MH/s gpu.  It went down to 393.7 MH/s (actually 394.3 MH/s but occasionally dropping off to 390-391, I took an average).

Edit: I compared the new kernel with the old one using SDK 2.4 and the improvement was 3.1 MH/s (+0.79%).  This is a nice improvement but not enough to make me move away from SDK 2.1.  Also, SDK 2.4 causes the MH/s to drop suddenly by 3 or 4 MH/s every so often (variance is within 0.5 MH/s with SDK 2.1 for me) and even the peak values I achieve with SDK 2.4, new kernel or not, are below my SDK 2.1 average.

Ah well, good work though.  I sent you some BTC anyway simply because you tried to help me fix my problem.
OCedHrt
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
July 04, 2011, 11:48:23 AM
 #11

I have a really high reject rate with this kernel on my 4850. I've tried alternatively running this kernel and DiabloMiner. I'm getting about 30%-50% reject on your kernel vs 10% on DiabloMiner. Could just be bad luck. Right now it's 4 accepted and 5 rejected after 10 minutes.

ALL.ME  ●●●  SOCIAL NETWORK OF THE BLOCKCHAIN TIME ●●●
▄▄▄▬▬▄▄▄  Bounty all.me ▶ Jan 29th - May 8th 2018  ▄▄▄▬▬▄▄▄
Facebook   ▲   Twitter   ▲   Telegram
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 04, 2011, 12:00:17 PM
 #12

I have a really high reject rate with this kernel on my 4850. I've tried alternatively running this kernel and DiabloMiner. I'm getting about 30%-50% reject on your kernel vs 10% on DiabloMiner. Could just be bad luck. Right now it's 4 accepted and 5 rejected after 10 minutes.

I found when pool mining that most of my rejects came shortly after new work was pushed.  As a result I had to try much longer test runs (3 hours or so) before coming to a conclusion about the miner's efficiency.
OCedHrt
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
July 04, 2011, 12:08:45 PM
 #13

I have a really high reject rate with this kernel on my 4850. I've tried alternatively running this kernel and DiabloMiner. I'm getting about 30%-50% reject on your kernel vs 10% on DiabloMiner. Could just be bad luck. Right now it's 4 accepted and 5 rejected after 10 minutes.

I found when pool mining that most of my rejects came shortly after new work was pushed.  As a result I had to try much longer test runs (3 hours or so) before coming to a conclusion about the miner's efficiency.


I'm aware of that, but I've had more rejects in the 10 minutes I'm running it than I've had all day. 411 shares, 22 stale. 13 of those are from testing with this kernel in the last 45 minutes or so.

I'm trying another pool and it's a bit better at 6 Accepted 2 Rejected. On bitclockers I was running at 300+ shares and 11 rejects with DM but getting the numbers above with the modded phatk kernel using phoenix. Trying it again.

Quote
[04/07/2011 05:05:06] Phoenix 1.50 starting...
[04/07/2011 05:05:06] Connected to server
[04/07/2011 05:05:55] Result: 12d3028b accepted
[04/07/2011 05:06:46] Result: e2513abe accepted
[04/07/2011 05:09:05] LP: New work pushed
[04/07/2011 05:09:09] Result: 55153340 accepted
[04/07/2011 05:09:48] Result: c49813f2 accepted
[04/07/2011 05:11:17] Result: 3e257a0d rejected
[04/07/2011 05:11:25] Result: 28da50c1 rejected
[04/07/2011 05:11:26] Result: e062d59e rejected
[04/07/2011 05:11:48] LP: New work pushed
[04/07/2011 05:12:05] Result: 75d54b7b rejected
[04/07/2011 05:12:08] Result: c832f2b0 rejected

ALL.ME  ●●●  SOCIAL NETWORK OF THE BLOCKCHAIN TIME ●●●
▄▄▄▬▬▄▄▄  Bounty all.me ▶ Jan 29th - May 8th 2018  ▄▄▄▬▬▄▄▄
Facebook   ▲   Twitter   ▲   Telegram
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 04, 2011, 12:15:40 PM
 #14

I have a really high reject rate with this kernel on my 4850. I've tried alternatively running this kernel and DiabloMiner. I'm getting about 30%-50% reject on your kernel vs 10% on DiabloMiner. Could just be bad luck. Right now it's 4 accepted and 5 rejected after 10 minutes.

I found when pool mining that most of my rejects came shortly after new work was pushed.  As a result I had to try much longer test runs (3 hours or so) before coming to a conclusion about the miner's efficiency.


I'm aware of that, but I've had more rejects in the 10 minutes I'm running it than I've had all day. 411 shares, 22 stale. 13 of those are from testing with this kernel in the last 45 minutes or so.

I'm trying another pool and it's a bit better at 6 Accepted 2 Rejected. On bitclockers I was running at 300+ shares and 11 rejects with DM but getting the numbers above with the modded phatk kernel using phoenix. Trying it again.

Yes, after 45 minutes things look highly suspect to me.  A good 3 hour test is useful for comparing different very-good setups but this tweak seems to have really hurt your accept/reject ratio.  Let us know your best when you're done testing.
OCedHrt
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
July 04, 2011, 12:23:02 PM
Last edit: July 04, 2011, 12:58:11 PM by OCedHrt
 #15

With the original phatk kernel (ma patched)

Quote
[04/07/2011 05:13:33] Phoenix 1.50 starting...
[04/07/2011 05:13:33] Connected to server
[04/07/2011 05:14:21] Result: 8c4fd15d accepted
[04/07/2011 05:14:45] Result: f15aefe5 accepted
[04/07/2011 05:15:35] Result: 9d5dfc38 rejected
[04/07/2011 05:19:21] LP: New work pushed
[04/07/2011 05:19:54] Result: 8adaadf6 accepted
[04/07/2011 05:19:57] Result: 5382cf90 accepted
[04/07/2011 05:22:54] Result: 2d0233f8 rejected
[04/07/2011 05:24:03] Result: 28c05c3a rejected
[04/07/2011 05:25:41] Result: 9dff1142 rejected
[04/07/2011 05:25:54] Result: 33095b05 accepted
[04/07/2011 05:26:05] Result: 3ec67e7e accepted
[04/07/2011 05:27:33] Result: 5307e072 accepted
[04/07/2011 05:27:37] Result: 20237b07 accepted
[04/07/2011 05:29:18] Result: c8abce0f rejected

It's actually 05:22 now so duration is same but number of results is significantly less though total accepted is same. Hashrate is 57 vs 61 with your kernel.

Update: Some more results...seems like it may just be a phoenix thing with bitclockers. I will get some data from DM for reference but I may have to go ask in Phoenix/GUIMiner thread.

From DiabloMiner
Quote
[7/4/11 5:31:58 AM] Started
[7/4/11 5:31:58 AM] Connecting to: http://pool.bitclockers.com:8332/
[7/4/11 5:31:58 AM] Using AMD Accelerated Parallel Processing OpenCL 1.1 AMD-APP
-SDK-v2.5 (684.211)
[7/4/11 5:32:00 AM] Added ATI RV770 (#1) (10 CU, local work size of 128)
[7/4/11 5:33:20 AM] Accepted block 1 found on ATI RV770 (#1)
[7/4/11 5:35:41 AM] Accepted block 2 found on ATI RV770 (#1)
[7/4/11 5:36:49 AM] Accepted block 3 found on ATI RV770 (#1)
[7/4/11 5:36:49 AM] Accepted block 4 found on ATI RV770 (#1)
[7/4/11 5:37:54 AM] Rejected block 1 found on ATI RV770 (#1)
[7/4/11 5:39:40 AM] Accepted block 5 found on ATI RV770 (#1)
[7/4/11 5:40:23 AM] Accepted block 6 found on ATI RV770 (#1)
[7/4/11 5:40:34 AM] Accepted block 7 found on ATI RV770 (#1)
[7/4/11 5:40:56 AM] Accepted block 8 found on ATI RV770 (#1)
[7/4/11 5:41:53 AM] Accepted block 9 found on ATI RV770 (#1)
[7/4/11 5:42:15 AM] Accepted block 10 found on ATI RV770 (#1)
[7/4/11 5:42:38 AM] Accepted block 11 found on ATI RV770 (#1)
[7/4/11 5:42:59 AM] Accepted block 12 found on ATI RV770 (#1)
[7/4/11 5:43:57 AM] Accepted block 13 found on ATI RV770 (#1)
[7/4/11 5:44:48 AM] Accepted block 14 found on ATI RV770 (#1)

Trying your kernel with poclbm miner now. Actually getting 62 on here vs 61 from Phoenix.

Update again: No issues on poclbm using your kernel so I guess it's Phoenix.

ALL.ME  ●●●  SOCIAL NETWORK OF THE BLOCKCHAIN TIME ●●●
▄▄▄▬▬▄▄▄  Bounty all.me ▶ Jan 29th - May 8th 2018  ▄▄▄▬▬▄▄▄
Facebook   ▲   Twitter   ▲   Telegram
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 04, 2011, 03:22:12 PM
 #16

Trying your kernel with poclbm miner now. Actually getting 62 on here vs 61 from Phoenix.

Update again: No issues on poclbm using your kernel so I guess it's Phoenix.

Dunno what the difference is between the 2 OpenCL wise. Setup, command queues, perhaps kernel result download or processing.
If you see a problem you should contact jedi95, perhaps he can clear this up?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
huayra.agera
Full Member
***
Offline Offline

Activity: 154
Merit: 100



View Profile
July 04, 2011, 04:20:35 PM
 #17

Hi! Just tested this, it did boost my 5850 @ 960/300/1.174v from 390 > 395. However, it seems to drop off to 380 all of a sudden like 3-4 seconds then back up to 395. Still, an increase, thanks for the work!

BTC: 1JMPScxohom4MXy9X1Vgj8AGwcHjT8XTuy
OCedHrt
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
July 04, 2011, 05:48:07 PM
 #18

Trying your kernel with poclbm miner now. Actually getting 62 on here vs 61 from Phoenix.

Update again: No issues on poclbm using your kernel so I guess it's Phoenix.

Dunno what the difference is between the 2 OpenCL wise. Setup, command queues, perhaps kernel result download or processing.
If you see a problem you should contact jedi95, perhaps he can clear this up?

Dia

I will look into it. GUIMiner has their own Phoenix and that may be related.

ALL.ME  ●●●  SOCIAL NETWORK OF THE BLOCKCHAIN TIME ●●●
▄▄▄▬▬▄▄▄  Bounty all.me ▶ Jan 29th - May 8th 2018  ▄▄▄▬▬▄▄▄
Facebook   ▲   Twitter   ▲   Telegram
gfaust
Newbie
*
Offline Offline

Activity: 24
Merit: 0


View Profile
July 04, 2011, 11:50:16 PM
 #19

I was already running the "#define Ma" optimized kernel, and this is good for another .5% on top of that.
shakaru
Sr. Member
****
Offline Offline

Activity: 406
Merit: 250


QUIFAS EXCHANGE


View Profile
July 04, 2011, 11:58:33 PM
 #20

went from 306-314 on my 5830's Sent you a little thank you. People PAY THIS MAN!

                             ▄▄▄████████▄▄▄
                         ▄▄██████████████████▄▄
                       ▄███████▄▄▄▄▄▄▄▄▄▄███████▄
                     ▄█████▄▄██████████████▄▄█████▄
        ██████  █████████▄████████████████████▄█████
        ██████  ███████▄████████▄▄▄▄▄▄▄▄████████▄████
                      ▄██████▀████████████▀██████▄████
███████   █████████████████████████████████████████████
███████   █████████████████████████████████████████████
                   ████████████████████████████████████
     ██████████████████████████████████████████████████
     ██████████████████████████████████████████████████
                     █████████████████████████████████
            ██████████▀██████▄████████████▄██████▀████
            ███████████▀████████▀▀▀▀▀▀▀▀▀▀███████▄███
                    █████▀████████████████▄▀██████▄
                     ▀█████▀▀██████████████▀██▀██████▄
                       ▀███████▀▀▀▀▀▀▀▀▀▀███████▀▀▀▀▀▀
                         ▀▀██████████████████▀▀
                             ▀▀▀████████▀▀▀
QUIFAS                    
                    ███
 █              ███ ███
 █              ███  █
███          █  ███
███         ███  █
███  █      ███  █
    ███  █  ███  █
    ███ ███  █   █
     █   █   █
     █      
Alan Lupton
Newbie
*
Offline Offline

Activity: 42
Merit: 0


View Profile
July 05, 2011, 01:46:19 AM
Last edit: July 26, 2011, 03:46:48 AM by Alan Lupton
 #21

Hi,

I offer my rapidshare account for the links. I would mark the files as trafficshare, meaning no popups, no wait time, nothing. Just like a normal download from, lets say any software website, click and go. You wouldn't even have to go to the rapidshare page.

And since we're talking about 4KB, I couldn't care less Wink

Try it:

Version 2011-07-17: https://www.rapidshare.com/files/4111719732/2011-07-17_kernel.7z
Version 2001-07-11: https://www.rapidshare.com/files/3730055236/2011-07-11_kernel.7z
Version 2011-07-07: https://www.rapidshare.com/files/1447400948/2011-07-07_kernel.7z
Version 2011-07-06: https://www.rapidshare.com/files/698776394/2011-07-06_kernel.7z
Version 2011-07-03: https://www.rapidshare.com/files/3813413034/2011-07-03_kernel.7z
Version 2011-07-01: https://www.rapidshare.com/files/946373551/kernel.7z
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 05, 2011, 05:10:27 AM
 #22

went from 306-314 on my 5830's Sent you a little thank you. People PAY THIS MAN!

Sounds great and thank you Smiley!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 05, 2011, 05:12:41 AM
 #23

Can somebody upload a version of the kernel we can extract natively in windows without downloading yet another compression/decompression app?

TIA

edit:  Online decompression:  wobzip.org

The kernel is that small I guess it wouldn't even need to be zipped. Download volume should not be that big.
But I really like 7-Zip, it's free and open source ... you should consider to install it as your default packer Smiley.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Keninishna
Hero Member
*****
Offline Offline

Activity: 556
Merit: 500



View Profile
July 05, 2011, 08:51:23 AM
 #24

winrar can unzip like everything.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 05, 2011, 11:32:17 AM
 #25

winrar can unzip like everything.

And is Shareware ... but let's not start a packer discussion here an return on topic!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Turix
Member
**
Offline Offline

Activity: 76
Merit: 10



View Profile WWW
July 05, 2011, 12:15:45 PM
 #26

Can somebody upload a version of the kernel we can extract natively in windows without downloading yet another compression/decompression app?

TIA

edit:  Online decompression:  wobzip.org

Seriously just get 7-zip and remove the rest, its free, open source and its default format .7z is probably the most effective lossless compression format commonly used.

YinCoin YangCoin ☯☯First Ever POS/POW Alternator! Multipool! ☯ ☯ http://yinyangpool.com/ 
Free Distribution! https://bitcointalk.org/index.php?topic=623937
Bwincoin - 100% Free POS. BSqnSwv7xdD6UEh8bJz8Xp6YcndPQ2JFyF
Wildvest
Newbie
*
Offline Offline

Activity: 41
Merit: 0


View Profile WWW
July 05, 2011, 02:13:26 PM
 #27

i tried it on one of my 3 x 6990 mining rig - normally using poclbm with the phatk per GPU 408 MH/s - with your kernel per GPU 407 MH/s
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 05, 2011, 02:41:02 PM
 #28

i tried it on one of my 3 x 6990 mining rig - normally using poclbm with the phatk per GPU 408 MH/s - with your kernel per GPU 407 MH/s

Too bad, but thanks for trying Smiley. Perhaps a new version will be ready by the end of this week. But guys don´t expect a huge improvement. On my setup I get 0,5 - 1,5 MHash/s more than with 2011-07-03 kernel version (guess the puzzle reaches it's end ^^). I can´t work like I would like to because the AMD APP KernelAnalyzer doesn't work ... hoping for a new version!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
July 05, 2011, 03:50:18 PM
 #29

i tried it on one of my 3 x 6990 mining rig - normally using poclbm with the phatk per GPU 408 MH/s - with your kernel per GPU 407 MH/s

This isn't exactly a fair comparison since phatk was specifically targeted at VLIW5 GPUs on SDK 2.4. A better comparison would be against phatk without this modification.

Phoenix Miner developer

Donations appreciated at:
1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 05, 2011, 03:58:21 PM
 #30

The newest kernel gives an error on only 1 of my 4 cards. Phoenix says something about a kernel or OpenCL error and suggest that the card might be malfunctioning (and then keeps mining), and poclbm just crashes. The card is a ASUS 5870, but I have an identical model that works fine. The error might not happen for like 20 minutes. And the card apparently works perfectly since various pools accept the results from that card with no special extra-stales.

Also, I find it weird that the newest kernel seems to add some Watts of power consumption (I need to measure this propperly though).. Since its doing less operations and that way achieving a higher rate, shouldnt it consume the same?


               ▄████████▄
               ██▀▀▀▀▀▀▀▀
              ██▀
             ███
▄▄▄▄▄       ███
██████     ███
    ▀██▄  ▄██
     ▀██▄▄██▀
       ████▀
        ▀█▀
The Radix DeFi Protocol is
R A D I X

███████████████████████████████████

The Decentralized

Finance Protocol
Scalable
▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀██
██                   ██
██                   ██
████████████████     ██
██            ██     ██
██            ██     ██
██▄▄▄▄▄▄      ██     ██
██▀▀▀▀██      ██     ██
██    ██      ██     
██    ██      ██
███████████████████████

███
Secure
      ▄▄▄▄▄
    █████████
   ██▀     ▀██
  ███       ███

▄▄███▄▄▄▄▄▄▄███▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀██
██             ██
██             ██
██             ██
██             ██
██             ██
██    ███████████

███
Community Driven
      ▄█   ▄▄
      ██ ██████▄▄
      ▀▀▄█▀   ▀▀██▄
     ▄▄ ██       ▀███▄▄██
    ██ ██▀          ▀▀██▀
    ██ ██▄            ██
   ██ ██████▄▄       ██▀
  ▄██       ▀██▄     ██
  ██▀         ▀███▄▄██▀
 ▄██             ▀▀▀▀
 ██▀
▄██
▄▄
██
███▄
▀███▄
 ▀███▄
  ▀████
    ████
     ████▄
      ▀███▄
       ▀███▄
        ▀████
          ███
           ██
           ▀▀

███
Radix is using our significant technology
innovations to be the first layer 1 protocol
specifically built to serve the rapidly growing DeFi.
Radix is the future of DeFi
█████████████████████████████████████

   ▄▄█████
  ▄████▀▀▀
  █████
█████████▀
▀▀█████▀▀
  ████
  ████
  ████

Facebook

███

             ▄▄
       ▄▄▄█████
  ▄▄▄███▀▀▄███
▀▀███▀ ▄██████
    █ ███████
     ██▀▀▀███
           ▀▀

Telegram

███

▄      ▄███▄▄
██▄▄▄ ██████▀
████████████
 ██████████▀
   ███████▀
 ▄█████▀▀

Twitter

██████

...Get Tokens...
Turix
Member
**
Offline Offline

Activity: 76
Merit: 10



View Profile WWW
July 05, 2011, 07:29:19 PM
Last edit: July 05, 2011, 11:38:01 PM by Turix
 #31

Bumped my hash rate from about 428 to 431 (3 MHash/s or 0.0696%), although previously I was using a customized kernel that has some of the same changes you've made so this increase is not representative.

Edit: XFX 5870 @ 950/315

YinCoin YangCoin ☯☯First Ever POS/POW Alternator! Multipool! ☯ ☯ http://yinyangpool.com/ 
Free Distribution! https://bitcointalk.org/index.php?topic=623937
Bwincoin - 100% Free POS. BSqnSwv7xdD6UEh8bJz8Xp6YcndPQ2JFyF
CYPER
Hero Member
*****
Offline Offline

Activity: 812
Merit: 502



View Profile
July 05, 2011, 08:25:23 PM
 #32

4x 5870 @ 960Mhz Core = rock solid 1748Mhash/s

After I put the modified kernel:

Variable 1746-1751 with most of the time staying around 1748

So no use for me, but thanks for your effort anyway Smiley
techwtf
Full Member
***
Offline Offline

Activity: 140
Merit: 100


View Profile
July 05, 2011, 11:50:56 PM
 #33

5870, SDK 2.1, 11.6: no change, 419->419 Sad
joulesbeef
Sr. Member
****
Offline Offline

Activity: 476
Merit: 250


moOo


View Profile
July 06, 2011, 12:29:43 AM
 #34

saapphire 5830 xtreme.. upped me from 329.2 to 332.6

overclocked 1040/355

aggression 12 worksize 256

66c

mooo for rent
c_k
Donator
Full Member
*
Offline Offline

Activity: 242
Merit: 100



View Profile
July 06, 2011, 02:41:40 AM
Last edit: July 06, 2011, 03:04:27 AM by c_k
 #35

I get 5MH/s increase on 5850 (372MH/s -> 377MH/s) and 2MH/s increase on 5770 (215MH/s -> 217MH/s)

hmm, it does seem to vary a hell of a lot more though

hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 06, 2011, 05:13:15 AM
 #36

5870, SDK 2.1, 11.6: no change, 419->419 Sad

Since most gpu miners are using now the phatk kernel you should upgrade to 2.4. Phatk kernel is optimized for 2.4. You will get better results.


               ▄████████▄
               ██▀▀▀▀▀▀▀▀
              ██▀
             ███
▄▄▄▄▄       ███
██████     ███
    ▀██▄  ▄██
     ▀██▄▄██▀
       ████▀
        ▀█▀
The Radix DeFi Protocol is
R A D I X

███████████████████████████████████

The Decentralized

Finance Protocol
Scalable
▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀██
██                   ██
██                   ██
████████████████     ██
██            ██     ██
██            ██     ██
██▄▄▄▄▄▄      ██     ██
██▀▀▀▀██      ██     ██
██    ██      ██     
██    ██      ██
███████████████████████

███
Secure
      ▄▄▄▄▄
    █████████
   ██▀     ▀██
  ███       ███

▄▄███▄▄▄▄▄▄▄███▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀██
██             ██
██             ██
██             ██
██             ██
██             ██
██    ███████████

███
Community Driven
      ▄█   ▄▄
      ██ ██████▄▄
      ▀▀▄█▀   ▀▀██▄
     ▄▄ ██       ▀███▄▄██
    ██ ██▀          ▀▀██▀
    ██ ██▄            ██
   ██ ██████▄▄       ██▀
  ▄██       ▀██▄     ██
  ██▀         ▀███▄▄██▀
 ▄██             ▀▀▀▀
 ██▀
▄██
▄▄
██
███▄
▀███▄
 ▀███▄
  ▀████
    ████
     ████▄
      ▀███▄
       ▀███▄
        ▀████
          ███
           ██
           ▀▀

███
Radix is using our significant technology
innovations to be the first layer 1 protocol
specifically built to serve the rapidly growing DeFi.
Radix is the future of DeFi
█████████████████████████████████████

   ▄▄█████
  ▄████▀▀▀
  █████
█████████▀
▀▀█████▀▀
  ████
  ████
  ████

Facebook

███

             ▄▄
       ▄▄▄█████
  ▄▄▄███▀▀▄███
▀▀███▀ ▄██████
    █ ███████
     ██▀▀▀███
           ▀▀

Telegram

███

▄      ▄███▄▄
██▄▄▄ ██████▀
████████████
 ██████████▀
   ███████▀
 ▄█████▀▀

Twitter

██████

...Get Tokens...
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 06, 2011, 10:12:04 AM
 #37

5870, SDK 2.1, 11.6: no change, 419->419 Sad

Since most gpu miners are using now the phatk kernel you should upgrade to 2.4. Phatk kernel is optimized for 2.4. You will get better results.

Well, you may get better results.  For me:

SDK 2.4:  393.7 -> 396.8
SDK 2.1:  399.4 -> 399.4

That's for a Sapphire HD5850 Xtreme 970/300@1.0875V using catalyst 11.6 on Linux and phatk
VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256

Definitely try out both SDK 2.1 and SDK 2.4 though.  If you are using Windows there's also an early version of SDK 2.5 with catalyst 11.7 which may be worth a look.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 06, 2011, 01:09:39 PM
 #38

New version is ready, DL here: http://www.mediafire.com/?f8b8q3w5u5p0ln0

Updated first post with changelog and performance info. This one should be a bit faster on 69XX cards than the original phatk, faster than all other phatk versions I did on 58XX and faster on non BFI_INT cards because of a change user 1MLyg5WVFSMifFjkrZiyGW2nw suggested!

Try, have fun, comment and donate Cheesy.

Thanks,
Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Apopfis
Newbie
*
Offline Offline

Activity: 12
Merit: 0


View Profile
July 06, 2011, 01:39:30 PM
 #39

from ~402Mh/s ---to--- ~405Mh/s   Shocked and I have 4x5850 sapphires each clocked to 1000Mhz.


Will take time to see what stales rate will be. With the last kernel it was around 2-3 %. Usually closer to 2%.


PLEASE KEEP THESE COMING  Grin
strictlyfocused
Newbie
*
Offline Offline

Activity: 55
Merit: 0


View Profile
July 06, 2011, 01:56:34 PM
 #40

MSI Hawk 5770
2011-07-03 kernel :: 233 MH/s
2011-07-03 kernel :: 236 MH/s

Thanks!!!

 Grin
Keninishna
Hero Member
*****
Offline Offline

Activity: 556
Merit: 500



View Profile
July 06, 2011, 02:53:22 PM
 #41

no change for me and my 6950s 11.6 drivers sdk 2.4
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 06, 2011, 04:43:35 PM
 #42

Another improvement but still not enough to beat SDK 2.1 for me.

The two phatk kernels of interest to me are:
- Kernel A = standard phatk kernel with the MA tweak applied.
- Kernel B = the latest kernel from this thread.

I'm using: Linux, Catalyst 11.6, a Sapphire HD5850 Xtreme:

At 900 MHz things look promising...

[900 MHz - 360 MHz RAM]
SDK 2.1, kernel A: 364.4 MH/s
SDK 2.1, kernel B: Fatal error
SDK 2.4, kernel A: 360.8 MH/s
SDK 2.4, kernel B: 365.6 MH/s

...but at higher core clock rates SDK 2.1 takes the lead once more.

[980 MHz - 360 MHz RAM]
SDK 2.1, kernel A: 404.7 MH/s
SDK 2.4, kernel B: 404.3 MH/s

[1020 MHz - 360 MHz RAM]
SDK 2.1, kernel A: 421.5 MH/s
SDK 2.4, kernel B: 420.9 MH/s

I would give some higher clocks but I can't go much past 1020 MHz without overvolting my card and I don't want to do that.

I tried playing with the RAM frequency but everything dropped off slowly as I lowered it and quickly as I raised it no matter which kernel or version of SDK I choose.

It's a shame that SDK 2.1 cannot drive your latest kernel but then I guess you are specifically designing it for SDK 2.4 and are now using features which are not available in SDK 2.1.  It would be great to finally put SDK 2.1 to bed but another MH/s sounds like a tall order at this point.

I have no data on accepts and rejects (I mine solo).
CYPER
Hero Member
*****
Offline Offline

Activity: 812
Merit: 502



View Profile
July 06, 2011, 04:50:18 PM
 #43

I'm using Autominer and the newest kernel does not work for me at all, but I don't have time to troubleshoot it as I'll miss on my mining and any expected rewards.
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 06, 2011, 05:13:20 PM
 #44

I'm using Autominer and the newest kernel does not work for me at all, but I don't have time to troubleshoot it as I'll miss on my mining and any expected rewards.

Possibly you're using an old version of SDK.  I get a fatal error when trying this with SDK 2.1 but am fine with SDK 2.4.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 06, 2011, 07:07:04 PM
 #45

I made an interesting discovery during my own tests with the new kernel version. I had to up the memory clock of my 5870 from 200 to 350 MHz in order to achieve the highest hashing values. Another thing to mention is, that I drive a Phenom II X6 1090T with only 800 MHz for every core, due to power saving, while mining. If I let the CPU use full speed, MHash/s goes even higher, let's say 3-4 MH/s.

Conclusion: Perhaps you guys should try to raise your mem speeds + experiment with CPU clocks, too. I know it has to be a good balance, so that higher MH/s values are not eaten by higher energy costs.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 06, 2011, 07:15:24 PM
 #46

I made an interesting discovery during my own tests with the new kernel version. I had to up the memory clock of my 5870 from 200 to 350 MHz in order to achieve the highest hashing values. Another thing to mention is, that I drive a Phenom II X6 1090T with only 800 MHz for every core, due to power saving, while mining. If I let the CPU use full speed, MHash/s goes even higher, let's say 3-4 MH/s.

Conclusion: Perhaps you guys should try to raise your mem speeds + experiment with CPU clocks, too. I know it has to be a good balance, so that higher MH/s values are not eaten by higher energy costs.

Dia

My card RAM is already at 360 MHz and I've tested but I can't find a better frequency for the RAM at my core speeds if I'm only interested in MH/s.

As for CPU usage  I've not touched my CPU settings at all and the miners only use about 0.4% each.  I even removed the fan from the CPU and placed it to cool the back of my hot card (the heatsink on the CPU is not even warm).  I'm assuming significant CPU loads is a Windows thing.

What interests me is how SDK 2.1 seems to be better at higher clock speeds whereas SDK 2.4 with your kernel is better at moderate speeds (940 MHz or below).  I admit I have little data on this but if anyone else gets the same results it would be interesting to know why.
CYPER
Hero Member
*****
Offline Offline

Activity: 812
Merit: 502



View Profile
July 06, 2011, 07:18:31 PM
 #47

Possibly you're using an old version of SDK.  I get a fatal error when trying this with SDK 2.1 but am fine with SDK 2.4.


Well it worked with the previous version of the modified kernel.
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 06, 2011, 07:22:37 PM
 #48

Possibly you're using an old version of SDK.  I get a fatal error when trying this with SDK 2.1 but am fine with SDK 2.4.


Well it worked with the previous version of the modified kernel.

Yes, I found the previous version worked with SDK 2.1.  But the version released today doesn't.  I had to change to SDK 2.4 for this most recent version and this change actually lost me 0.4-0.6 MH/s.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 06, 2011, 07:34:13 PM
 #49

Possibly you're using an old version of SDK.  I get a fatal error when trying this with SDK 2.1 but am fine with SDK 2.4.


Well it worked with the previous version of the modified kernel.

Yes, I found the previous version worked with SDK 2.1.  But the version released today doesn't.  I had to change to SDK 2.4 for this most recent version and this change actually lost me 0.4-0.6 MH/s.


I'm sorry to hear that ... what are the error messages you get with 2.1? I only tried with 2.4 and will test only on 2.4 and later SDKs by myself. Buf If you give me a hint I can try to fix it.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 06, 2011, 07:49:09 PM
 #50

I'm sorry to hear that ... what are the error messages you get with 2.1? I only tried with 2.4 and will test only on 2.4 and later SDKs by myself. Buf If you give me a hint I can try to fix it.

Dia

Don't worry about it.  The improvements for the SDK 2.4 users are clear and I'm impressed that you've managed to close the gap between 2.4 and 2.1 as much as you have.

I don't know how to get detailed error messages from phatk.  When I use SDK 2.1 and your latest kernel I run the command
python phoenix.py -u http://<user>:<pass>@<host>:<port>/ -a 1 -q 1 -k phatk VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256 DEVICE=1
and get
[<date> <time>] FATAL kernel error: Failed to load OpenCL kernel!

If I try the same with the previous version of your kernel everything works happily.  I wish I had more details for you but I just don't know how to get them.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 06, 2011, 08:07:43 PM
 #51

I'm sorry to hear that ... what are the error messages you get with 2.1? I only tried with 2.4 and will test only on 2.4 and later SDKs by myself. Buf If you give me a hint I can try to fix it.

Dia

Don't worry about it.  The improvements for the SDK 2.4 users are clear and I'm impressed that you've managed to close the gap between 2.4 and 2.1 as much as you have.

I don't know how to get detailed error messages from phatk.  When I use SDK 2.1 and your latest kernel I run the command
python phoenix.py -u http://<user>:<pass>@<host>:<port>/ -a 1 -q 1 -k phatk VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256 DEVICE=1
and get
[<date> <time>] FATAL kernel error: Failed to load OpenCL kernel!

If I try the same with the previous version of your kernel everything works happily.  I wish I had more details for you but I just don't know how to get them.


If Phoenix would allow to output the OpenCL compiler build log we could get an idea what's wrong. Perhaps jedi95 reads here and takes this as a suggestion Cheesy.
Perhaps I can take the lead with 2.4 and newer versions of my kernel, but for now I have no huge optimization ideas ... (but I'm thinking about it right now ^^).

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 06, 2011, 08:19:16 PM
 #52

If Phoenix would allow to output the OpenCL compiler build log we could get an idea what's wrong. Perhaps jedi95 reads here and takes this as a suggestion Cheesy.
Perhaps I can take the lead with 2.4 and newer versions of my kernel, but for now I have no huge optimization ideas ... (but I'm thinking about it right now ^^).

Dia

Would I get more detailed feedback from another front-end to phatk?  I haven't really 'shopped around' with the front ends.
jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
July 06, 2011, 08:53:03 PM
 #53

I'm sorry to hear that ... what are the error messages you get with 2.1? I only tried with 2.4 and will test only on 2.4 and later SDKs by myself. Buf If you give me a hint I can try to fix it.

Dia

Don't worry about it.  The improvements for the SDK 2.4 users are clear and I'm impressed that you've managed to close the gap between 2.4 and 2.1 as much as you have.

I don't know how to get detailed error messages from phatk.  When I use SDK 2.1 and your latest kernel I run the command
python phoenix.py -u http://<user>:<pass>@<host>:<port>/ -a 1 -q 1 -k phatk VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256 DEVICE=1
and get
[<date> <time>] FATAL kernel error: Failed to load OpenCL kernel!

If I try the same with the previous version of your kernel everything works happily.  I wish I had more details for you but I just don't know how to get them.


If Phoenix would allow to output the OpenCL compiler build log we could get an idea what's wrong. Perhaps jedi95 reads here and takes this as a suggestion Cheesy.
Perhaps I can take the lead with 2.4 and newer versions of my kernel, but for now I have no huge optimization ideas ... (but I'm thinking about it right now ^^).

Dia

There is no point trying to run phatk on pre-2.4 SDK versions. It will just end up being slower than the poclbm kernel.

For mining I see only 2 real options:
SDK 2.1 with poclbm
SDK 2.4 with phatk

2.2 is slower than 2.1 on poclbm and doesn't work well with phatk either.
2.3 is even slower than 2.2 on poclbm, but all I know with phatk is that it's slower than with 2.4

Anyway, getting the output from the compiler is very simple. You just need to comment out the try/except block surrounding self.loadKernel().

Phoenix Miner developer

Donations appreciated at:
1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
dishwara
Legendary
*
Offline Offline

Activity: 1855
Merit: 1016



View Profile
July 06, 2011, 09:02:09 PM
 #54

I am sure, it increases.
424-447 Mhash/s & 413 -430 Mhash/s
Sapphire 5870 & MSI 5870.
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 06, 2011, 09:07:25 PM
 #55

There is no point trying to run phatk on pre-2.4 SDK versions. It will just end up being slower than the poclbm kernel.

I read elsewhere that this is the theory but in practice phatk is faster than poclbm on SDK 2.1 for me.  Maybe this has something to do with the fact that I've applied the MA tweak (one less operation) to both kernels.

E.g. Sapphire HD5850 Xtreme 1000MHz core, 350MHz RAM, Catalyst 11.6 (Linux x86_64), VECTORS BFI_INT FASTLOOP=false AGGRESSION=13 WORKSIZE=256:
phatk: 413.3 MH/s (+/- 0.2 MH/s)
poclbm: 411.4 MH/s (+/- 0.2 MH/s)

I've tried lower core speeds and higher RAM speeds but always phatk outperforms poclbm on SDK 2.1 for me.

For mining I see only 2 real options:
SDK 2.1 with poclbm
SDK 2.4 with phatk

2.2 is slower than 2.1 on poclbm and doesn't work well with phatk either.
2.3 is even slower than 2.2 on poclbm, but all I know with phatk is that it's slower than with 2.4

Anyway, getting the output from the compiler is very simple. You just need to comment out the try/except block surrounding self.loadKernel().

I'll try that.
Wildvest
Newbie
*
Offline Offline

Activity: 41
Merit: 0


View Profile WWW
July 06, 2011, 09:18:42 PM
 #56

THANKS for your efforts  ! just reporting back  Cool

6990 version 2011-07-06 with Catalyst 11.4, SDK 2.4 now equal with the latest poclbm (phatk) - maybe 0.5 MH/s slower  Cry
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 06, 2011, 09:20:17 PM
 #57

Ok, here are the errors for the latest kernel on SDK 2.1.
{
Build on <pyopencl.Device 'Cypress' at 0x34a3680>:

/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[19] = P4(19) + 0x11002000 + P1(19);
                         ^

/tmp/OCLthVTDN.cl(138): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[30] = P3(30) + 0xA00055 + P1(30);
                         ^

/tmp/OCLthVTDN.cl(261): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        Vals[3] = L + W[64];
                      ^

/tmp/OCLthVTDN.cl(286): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[81] = P4(81) + P2(81) + 0xA00000;
                                  ^

/tmp/OCLthVTDN.cl(299): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[87] = P4(87) + P3(87) + 0x11002000 + P1(87);
                                  ^

/tmp/OCLthVTDN.cl(316): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[94] = P3(94) + 0x400022 + P1(94);
                         ^

6 errors detected in the compilation of "/tmp/OCLthVTDN.cl".
}

Hopefully this is just some implicit casting of a kind which SDK 2.1 wants to be babied through.  If I were versed in OpenCL I'd have a go at fixing this myself but I'm sure you would be much more efficient.
Alan Lupton
Newbie
*
Offline Offline

Activity: 42
Merit: 0


View Profile
July 06, 2011, 10:51:07 PM
 #58

2001-06-07: Wow, nice work! Now I'm getting not 5-15% rejections and working like a charm. No speed increase though from last update.
c_k
Donator
Full Member
*
Offline Offline

Activity: 242
Merit: 100



View Profile
July 07, 2011, 02:38:11 AM
 #59

New release gives me 2-3MH/s more

I've given a small donation Smiley

Thanks for the hard work!

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 07, 2011, 06:02:05 AM
 #60

Ok, here are the errors for the latest kernel on SDK 2.1.
{
Build on <pyopencl.Device 'Cypress' at 0x34a3680>:

/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[19] = P4(19) + 0x11002000 + P1(19);
                         ^

/tmp/OCLthVTDN.cl(138): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[30] = P3(30) + 0xA00055 + P1(30);
                         ^

/tmp/OCLthVTDN.cl(261): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        Vals[3] = L + W[64];
                      ^

/tmp/OCLthVTDN.cl(286): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[81] = P4(81) + P2(81) + 0xA00000;
                                  ^

/tmp/OCLthVTDN.cl(299): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[87] = P4(87) + P3(87) + 0x11002000 + P1(87);
                                  ^

/tmp/OCLthVTDN.cl(316): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[94] = P3(94) + 0x400022 + P1(94);
                         ^

6 errors detected in the compilation of "/tmp/OCLthVTDN.cl".
}

Hopefully this is just some implicit casting of a kind which SDK 2.1 wants to be babied through.  If I were versed in OpenCL I'd have a go at fixing this myself but I'm sure you would be much more efficient.


You could try to add (u) in front of the raw hex values like (u)0x400022 and report back then.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
dsky
Sr. Member
****
Offline Offline

Activity: 279
Merit: 250


View Profile
July 07, 2011, 07:22:21 AM
 #61

All miner are Windows 7 x32 - SDK 2.4 - Catalyst 11.6

Latest changes:
HD5770 - from 219 up to 220
HD6950 (unlockable) - from 367 to 370
HD6970 (6950 with 6950 BIOS) - from 405 up to 408

Small speed increase on all three kind of cards and the rejected rate seems better, too.

Well done again, Sir!

hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 07, 2011, 07:40:36 AM
Last edit: July 07, 2011, 09:07:07 AM by hugolp
 #62

5870, Ubuntu 11.04, 11.6, 2.4, poclbm, went up 1MH/s (with last modification from previous modification).

The good news is the card that was randomly crashing the miner every 20 minutes with previous patch has been running for more than an hour without problems, so it seems stable now. Just crashed. I dont know what happens with this card and the modified kernel. Also, consumption has gone down like 5W. Im very puzzled by this changes in consumption by the different kernels.

Very good job. A small donation is going your way.


               ▄████████▄
               ██▀▀▀▀▀▀▀▀
              ██▀
             ███
▄▄▄▄▄       ███
██████     ███
    ▀██▄  ▄██
     ▀██▄▄██▀
       ████▀
        ▀█▀
The Radix DeFi Protocol is
R A D I X

███████████████████████████████████

The Decentralized

Finance Protocol
Scalable
▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀██
██                   ██
██                   ██
████████████████     ██
██            ██     ██
██            ██     ██
██▄▄▄▄▄▄      ██     ██
██▀▀▀▀██      ██     ██
██    ██      ██     
██    ██      ██
███████████████████████

███
Secure
      ▄▄▄▄▄
    █████████
   ██▀     ▀██
  ███       ███

▄▄███▄▄▄▄▄▄▄███▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀██
██             ██
██             ██
██             ██
██             ██
██             ██
██    ███████████

███
Community Driven
      ▄█   ▄▄
      ██ ██████▄▄
      ▀▀▄█▀   ▀▀██▄
     ▄▄ ██       ▀███▄▄██
    ██ ██▀          ▀▀██▀
    ██ ██▄            ██
   ██ ██████▄▄       ██▀
  ▄██       ▀██▄     ██
  ██▀         ▀███▄▄██▀
 ▄██             ▀▀▀▀
 ██▀
▄██
▄▄
██
███▄
▀███▄
 ▀███▄
  ▀████
    ████
     ████▄
      ▀███▄
       ▀███▄
        ▀████
          ███
           ██
           ▀▀

███
Radix is using our significant technology
innovations to be the first layer 1 protocol
specifically built to serve the rapidly growing DeFi.
Radix is the future of DeFi
█████████████████████████████████████

   ▄▄█████
  ▄████▀▀▀
  █████
█████████▀
▀▀█████▀▀
  ████
  ████
  ████

Facebook

███

             ▄▄
       ▄▄▄█████
  ▄▄▄███▀▀▄███
▀▀███▀ ▄██████
    █ ███████
     ██▀▀▀███
           ▀▀

Telegram

███

▄      ▄███▄▄
██▄▄▄ ██████▀
████████████
 ██████████▀
   ███████▀
 ▄█████▀▀

Twitter

██████

...Get Tokens...
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 07, 2011, 07:48:45 AM
 #63

Ok, here are the errors for the latest kernel on SDK 2.1.
{
Build on <pyopencl.Device 'Cypress' at 0x34a3680>:

/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[19] = P4(19) + 0x11002000 + P1(19);
                         ^

/tmp/OCLthVTDN.cl(138): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[30] = P3(30) + 0xA00055 + P1(30);
                         ^

/tmp/OCLthVTDN.cl(261): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        Vals[3] = L + W[64];
                      ^

/tmp/OCLthVTDN.cl(286): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[81] = P4(81) + P2(81) + 0xA00000;
                                  ^

/tmp/OCLthVTDN.cl(299): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[87] = P4(87) + P3(87) + 0x11002000 + P1(87);
                                  ^

/tmp/OCLthVTDN.cl(316): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[94] = P3(94) + 0x400022 + P1(94);
                         ^

6 errors detected in the compilation of "/tmp/OCLthVTDN.cl".
}

Hopefully this is just some implicit casting of a kind which SDK 2.1 wants to be babied through.  If I were versed in OpenCL I'd have a go at fixing this myself but I'm sure you would be much more efficient.


You could try to add (u) in front of the raw hex values like (u)0x400022 and report back then.

Dia

No good.  Now I get a ton of "expression must have a constant value" errors.  The end of the log looks like:
{
/tmp/OCLgV3our.cl(25): error: expression must have a constant value
        (u)0x6a09e667, (u)0xbb67ae85, (u)0x3c6ef372, (u)0x510e527f, (u)0x9b05688c, (u)0x1f83d9ab, (u)0xfc08884d, (u)0x5be0cd19
                                                                                                                    ^

/tmp/OCLgV3our.cl(29): error: expression must have a constant value
  __constant ulong L = (u)0x198c7e2a2;
                          ^

/tmp/OCLgV3our.cl(261): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        Vals[3] = L + W[64];
                      ^

74 errors detected in the compilation of "/tmp/OCLgV3our.cl".
}

Only one of the "mixed vector-scalar operation" errors remains but I'm guessing the others are still there but just buried by the even more urgent "constant value" errors.
OCedHrt
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
July 07, 2011, 08:32:42 AM
 #64

7/6 kernel seems to have the following effects for me:

4850 - elf kernel is much larger, but slight speed increase of ~3-4% from 56-57 @ 460 core to 58-59 @ 460 core. also seems to run cooler. since my card is overheating I have to underclock. cooler kernel means higher clock at 480 - getting 60. Using -v -w 128. worksize 128 is still optimal, 64 is slightly slower and 256 is much slower < 50mhash.

6450 - again elf kernel is much larger, but slight speed decrease at worksize 128. previous kernel was optimal at 128 but this one is optimal at 64. worksize 64 with new kernel is equivalent to worksize 128 with old kernel. potentially runs cooler but did not check. this is a fanless card but only gets 32mhash.

ALL.ME  ●●●  SOCIAL NETWORK OF THE BLOCKCHAIN TIME ●●●
▄▄▄▬▬▄▄▄  Bounty all.me ▶ Jan 29th - May 8th 2018  ▄▄▄▬▬▄▄▄
Facebook   ▲   Twitter   ▲   Telegram
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 07, 2011, 08:50:22 AM
 #65

You could try to add (u) in front of the raw hex values like (u)0x400022 and report back then.

Dia

Sorry about that last post.  I'm not usually that dumb I assure you.

I've modified your kernel code by adding (u) before each of the 5 raw hex values corresponding to the error messages.  I also added (u) directly before L from the other error message.  After this everything starts working in SDK 2.1.

For my stock voltage 5850:
423.7 (+/- 0.1) MH/s -> 425.9 (+/- 0.05) MH/s

This does of course mean that SDK 2.1 has increased its lead against SDK 2.4 for me.  So many people are convinced that SDK 2.4 is faster so perhaps this is a Windows/Linux thing.

If this runs for 24 hours without freezing then I have a new personal best!  I will want to test what proportion of these hashes are inaccurate but things are looking good.  Another donation is coming your way.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 07, 2011, 09:46:52 AM
 #66

Ok, here are the errors for the latest kernel on SDK 2.1.
{
Build on <pyopencl.Device 'Cypress' at 0x34a3680>:

/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[19] = P4(19) + 0x11002000 + P1(19);
                         ^

/tmp/OCLthVTDN.cl(138): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[30] = P3(30) + 0xA00055 + P1(30);
                         ^

/tmp/OCLthVTDN.cl(261): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        Vals[3] = L + W[64];
                      ^

/tmp/OCLthVTDN.cl(286): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[81] = P4(81) + P2(81) + 0xA00000;
                                  ^

/tmp/OCLthVTDN.cl(299): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[87] = P4(87) + P3(87) + 0x11002000 + P1(87);
                                  ^

/tmp/OCLthVTDN.cl(316): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[94] = P3(94) + 0x400022 + P1(94);
                         ^

6 errors detected in the compilation of "/tmp/OCLthVTDN.cl".
}

Hopefully this is just some implicit casting of a kind which SDK 2.1 wants to be babied through.  If I were versed in OpenCL I'd have a go at fixing this myself but I'm sure you would be much more efficient.


You could try to add (u) in front of the raw hex values like (u)0x400022 and report back then.

Dia

No good.  Now I get a ton of "expression must have a constant value" errors.  The end of the log looks like:
{
/tmp/OCLgV3our.cl(25): error: expression must have a constant value
        (u)0x6a09e667, (u)0xbb67ae85, (u)0x3c6ef372, (u)0x510e527f, (u)0x9b05688c, (u)0x1f83d9ab, (u)0xfc08884d, (u)0x5be0cd19
                                                                                                                    ^

/tmp/OCLgV3our.cl(29): error: expression must have a constant value
  __constant ulong L = (u)0x198c7e2a2;
                          ^

/tmp/OCLgV3our.cl(261): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        Vals[3] = L + W[64];
                      ^

74 errors detected in the compilation of "/tmp/OCLgV3our.cl".
}

Only one of the "mixed vector-scalar operation" errors remains but I'm guessing the others are still there but just buried by the even more urgent "constant value" errors.


Ah sorry, I was not clear enough. You must not add (u) in front of every hex value in the kernel, but ONLY in front of the hex values, that generated an error.

Code:
W[19] = P4(19) + (u)0x11002000 + P1(19);

W[30] = P3(30) + (u)0xA00055 + P1(30);

Vals[3] = (u)L + W[64];

W[81] = P4(81) + P2(81) + (u)0xA00000;

W[87] = P4(87) + P3(87) + (u)0x11002000 + P1(87);

W[94] = P3(94) + (u)0x400022 + P1(94);

If you can be so kind and test this out and report back. I would say restore the latest kernel and then modifiy the 6 places.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 07, 2011, 09:58:45 AM
 #67

Ah sorry, I was not clear enough. You must not add (u) in front of every hex value in the kernel, but ONLY in front of the hex values, that generated an error.

You were perfectly clear, I was just being dumb.  Incase you missed my second post, this fix works for SDK 2.1.  Thank you very much.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 07, 2011, 10:06:00 AM
 #68

Great, so we have a fix and a version that works with 2.1. Will release a fix later today!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
n4l3hp
Full Member
***
Offline Offline

Activity: 173
Merit: 100


View Profile
July 07, 2011, 10:44:05 AM
 #69

7/6 kernel seems to have the following effects for me:

4850 - elf kernel is much larger, but slight speed increase of ~3-4% from 56-57 @ 460 core to 58-59 @ 460 core. also seems to run cooler. since my card is overheating I have to underclock. cooler kernel means higher clock at 480 - getting 60. Using -v -w 128. worksize 128 is still optimal, 64 is slightly slower and 256 is much slower < 50mhash.

6450 - again elf kernel is much larger, but slight speed decrease at worksize 128. previous kernel was optimal at 128 but this one is optimal at 64. worksize 64 with new kernel is equivalent to worksize 128 with old kernel. potentially runs cooler but did not check. this is a fanless card but only gets 32mhash.

My 4850 @ 675 core 250 mem gets 85MH/s. 0.32% stale rate at DeepBit. (bought from eBay, dont know what brand, came with zalman cooler. anything higher than 680 core will cause it to stop hashing even if overvolted). Temps at 71 degrees celsius, closed case. Been running Milkyway@Home for more than a year at the same settings before I switched it to bitcoin mining.

For ATI 4000 series, use SDK 2.1 and poclbm (April 28 version). Using phatk and higher opencl sdk version on these cards will only lower the hash rate.
OCedHrt
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
July 07, 2011, 12:11:48 PM
 #70

7/6 kernel seems to have the following effects for me:

4850 - elf kernel is much larger, but slight speed increase of ~3-4% from 56-57 @ 460 core to 58-59 @ 460 core. also seems to run cooler. since my card is overheating I have to underclock. cooler kernel means higher clock at 480 - getting 60. Using -v -w 128. worksize 128 is still optimal, 64 is slightly slower and 256 is much slower < 50mhash.

6450 - again elf kernel is much larger, but slight speed decrease at worksize 128. previous kernel was optimal at 128 but this one is optimal at 64. worksize 64 with new kernel is equivalent to worksize 128 with old kernel. potentially runs cooler but did not check. this is a fanless card but only gets 32mhash.

My 4850 @ 675 core 250 mem gets 85MH/s. 0.32% stale rate at DeepBit. (bought from eBay, dont know what brand, came with zalman cooler. anything higher than 680 core will cause it to stop hashing even if overvolted). Temps at 71 degrees celsius, closed case. Been running Milkyway@Home for more than a year at the same settings before I switched it to bitcoin mining.

For ATI 4000 series, use SDK 2.1 and poclbm (April 28 version). Using phatk and higher opencl sdk version on these cards will only lower the hash rate.

You misread my post. I am running at 460 core because the card is 105C at that speed. I cannot run it any faster. I can run 480 core with this new kernel. Btw, the days of SDK 2.1 and poclbm are nearly over. I get 84MH/s at 675 core and 494 mem. I can't do 250 mem the card doesn't downclock that far with afterburner. At 250 mem it would be even higher. However I can actually clock 700+ even though only for a few seconds.

ALL.ME  ●●●  SOCIAL NETWORK OF THE BLOCKCHAIN TIME ●●●
▄▄▄▬▬▄▄▄  Bounty all.me ▶ Jan 29th - May 8th 2018  ▄▄▄▬▬▄▄▄
Facebook   ▲   Twitter   ▲   Telegram
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 07, 2011, 03:20:41 PM
 #71

New version 2011-07-07 is ready: http://www.mediafire.com/?7j70gnmllgi9b73

This is mainly a bugfix release for SDK 2.1 with some code restructuring to save a few writes and additions. I can not guarantee, that this really works for 2.1, because I didn't test it. If you are unsure, wait for users to test it for you and consider applying this patch later!

By the way, I want to thank all of those who donated a few Bitcents to me, feels great!

Thanks,
Dia

PS.: If it works, please post here and consider a small donation @ 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM Smiley.

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Saturn7
Full Member
***
Offline Offline

Activity: 147
Merit: 100



View Profile
July 07, 2011, 03:29:29 PM
 #72

Went from 433/Mhash to 440/Mhash on 5870. Overclocked to 970Mhz.
Thanks  Smiley Donation sent.

First there was Fire, then Electricity, and now Bitcoins Wink
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 07, 2011, 03:30:10 PM
 #73

This kernel will cause poclbm to exit after running for a while.
I have tested several times, after few hours poclbm will be gone and my machine left there doing nothing

AMD5870 with SDK 2.5
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 07, 2011, 03:37:03 PM
 #74

This kernel will cause poclbm to exit after running for a while.
I have tested several times, after few hours poclbm will be gone and my machine left there doing nothing

AMD5870 with SDK 2.5

That's the first report I get with that problem. Any other poclbm users with that observation?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 07, 2011, 03:45:03 PM
 #75

This kernel will cause poclbm to exit after running for a while.
I have tested several times, after few hours poclbm will be gone and my machine left there doing nothing

AMD5870 with SDK 2.5

That's the first report I get with that problem. Any other poclbm users with that observation?

Dia

I reported it twice in this same thread...

It only happens in one of my cards (I have 4 5870's, and I have one that its an identical model that works fine). In phoenix what happens is that it will give a mistake about a kernel mistake and will continue mining (I suspect phoenix reloads the kernel).


               ▄████████▄
               ██▀▀▀▀▀▀▀▀
              ██▀
             ███
▄▄▄▄▄       ███
██████     ███
    ▀██▄  ▄██
     ▀██▄▄██▀
       ████▀
        ▀█▀
The Radix DeFi Protocol is
R A D I X

███████████████████████████████████

The Decentralized

Finance Protocol
Scalable
▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀██
██                   ██
██                   ██
████████████████     ██
██            ██     ██
██            ██     ██
██▄▄▄▄▄▄      ██     ██
██▀▀▀▀██      ██     ██
██    ██      ██     
██    ██      ██
███████████████████████

███
Secure
      ▄▄▄▄▄
    █████████
   ██▀     ▀██
  ███       ███

▄▄███▄▄▄▄▄▄▄███▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀██
██             ██
██             ██
██             ██
██             ██
██             ██
██    ███████████

███
Community Driven
      ▄█   ▄▄
      ██ ██████▄▄
      ▀▀▄█▀   ▀▀██▄
     ▄▄ ██       ▀███▄▄██
    ██ ██▀          ▀▀██▀
    ██ ██▄            ██
   ██ ██████▄▄       ██▀
  ▄██       ▀██▄     ██
  ██▀         ▀███▄▄██▀
 ▄██             ▀▀▀▀
 ██▀
▄██
▄▄
██
███▄
▀███▄
 ▀███▄
  ▀████
    ████
     ████▄
      ▀███▄
       ▀███▄
        ▀████
          ███
           ██
           ▀▀

███
Radix is using our significant technology
innovations to be the first layer 1 protocol
specifically built to serve the rapidly growing DeFi.
Radix is the future of DeFi
█████████████████████████████████████

   ▄▄█████
  ▄████▀▀▀
  █████
█████████▀
▀▀█████▀▀
  ████
  ████
  ████

Facebook

███

             ▄▄
       ▄▄▄█████
  ▄▄▄███▀▀▄███
▀▀███▀ ▄██████
    █ ███████
     ██▀▀▀███
           ▀▀

Telegram

███

▄      ▄███▄▄
██▄▄▄ ██████▀
████████████
 ██████████▀
   ███████▀
 ▄█████▀▀

Twitter

██████

...Get Tokens...
phorensic
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500



View Profile
July 07, 2011, 03:56:54 PM
 #76

Excellent work on this kernel.  It seems like the original author got to a point where he thought he had improved performance to almost the max, but you are progressing very nicely!
CYPER
Hero Member
*****
Offline Offline

Activity: 812
Merit: 502



View Profile
July 07, 2011, 07:16:33 PM
 #77

Well finally I can see some improvement.

With todays version I got from 1748 to 1758 so ~10Mhash increase.

This is for 4x 5870 @ 960Mhz Core & 300Mhz Memory
SDK 2.1
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 07, 2011, 08:01:50 PM
 #78

Well finally I can see some improvement.

With todays version I got from 1748 to 1758 so ~10Mhash increase.

This is for 4x 5870 @ 960Mhz Core & 300Mhz Memory
SDK 2.1


Seems like good news for SDK 2.1 users, right Wink?

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
CYPER
Hero Member
*****
Offline Offline

Activity: 812
Merit: 502



View Profile
July 07, 2011, 08:11:12 PM
 #79

Well to be perfectly honest and objective 0.5% increase won't make any difference even with my setup  Roll Eyes At least not in terms of financial benefits Smiley

But I don't mean to belittle your work - well done Smiley
erek
Newbie
*
Offline Offline

Activity: 36
Merit: 0


View Profile
July 07, 2011, 09:40:20 PM
 #80

808.6 MH/sec max now (up 1-2% at least) from 7-6-11 to 7-7-11
zimpixa
Member
**
Offline Offline

Activity: 98
Merit: 10


View Profile
July 07, 2011, 10:16:10 PM
 #81

SDK 2.1 working for 5h now.

Newest version seems to behave more stable (less delta max-min). Speed is about 0.5MHash higher, but it can be just impression.

YinCoin YangCoin ☯☯First Ever POS/POW Alternator! Multipool! ☯ ☯ http://yinyangpool.com/ 
Free Distribution! https://bitcointalk.org/index.php?topic=623937
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 07, 2011, 10:36:13 PM
 #82

Excellent.

It looks like this SDK 2.1 bugfix is popular!  Perhaps now people will stop telling me to use SDK 2.4.

Thanks Diapolo for the fix and I'm glad I was able to help.
bmgjet
Member
**
Offline Offline

Activity: 98
Merit: 10


View Profile
July 07, 2011, 11:23:51 PM
 #83

version 2011-07-06 gives best over all speed. 0.500 faster then my modded one but opening firefox drops speed from 277 down to 233 where my one only drops 2mh/s. Probably just the way phatk works since iv never used the stock one.

version 2011-07-07 is all over the place. jumps between 255-280 without firefox open so don't know if its faster or slower. With firefox its more stable then older version and drops to 252-258

Im using it with poclbm.exe -v -w 256 (128 gave same result) on 6850 overclocked/underclocked and 2.1 SDK.

Donations to: 1BMGjetfht9XLkGBYR4TSsuXjrYEKACcow
1stbits: 1bmgjet
300MHash/s 6850 http://www.techpowerup.com/gpuz/5u6wr/
Overclocked for 6 years and still strong http://valid.canardpc.com/show_oc.php?id=1931458 & http://valid.canardpc.com/show_oc.php?id=285337
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 08, 2011, 12:55:40 AM
 #84

I have a feature request

Can you please make it work with intel openCL?

thank you
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 08, 2011, 05:24:10 AM
 #85

Excellent.

It looks like this SDK 2.1 bugfix is popular!  Perhaps now people will stop telling me to use SDK 2.4.

Thanks Diapolo for the fix and I'm glad I was able to help.


It was only possible to fix it that fast, because you showed me the log files and error output! So we helped eachother, thanks too!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 08, 2011, 05:25:46 AM
 #86

version 2011-07-06 gives best over all speed. 0.500 faster then my modded one but opening firefox drops speed from 277 down to 233 where my one only drops 2mh/s. Probably just the way phatk works since iv never used the stock one.

version 2011-07-07 is all over the place. jumps between 255-280 without firefox open so don't know if its faster or slower. With firefox its more stable then older version and drops to 252-258

Im using it with poclbm.exe -v -w 256 (128 gave same result) on 6850 overclocked/underclocked and 2.1 SDK.

I discovered a MH/sec drop while, using Firefox, too. It has to be related with the new GPU acceleration that FF implemented in 4.0 and up.

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 08, 2011, 05:27:00 AM
 #87

I have a feature request

Can you please make it work with intel openCL?

thank you

I know that Intel recently released and OpenCL gold SDK ... the kernel uses standard OpenCL commands and an AMD extension only for BFI_INT / BITALIGN. I see no reason why it should not work. Have you got some error logs for me? You know that for Intel it will only use the CPUs!?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
kwaaak
Full Member
***
Offline Offline

Activity: 139
Merit: 100


View Profile
July 08, 2011, 09:55:22 AM
 #88

Thanks a lot
r3v3rs3
Newbie
*
Offline Offline

Activity: 22
Merit: 0


View Profile
July 08, 2011, 10:12:33 AM
 #89

2011-07-03 -> 2011-07-07

Wheezy x64, 11.4, SDK 2.4:

Box #1:

- HD5750, 875/300, AGGRESSION=11, 175 MH/s -> 176 MH/s

Box #2:

- HD5750, 900/300, AGGRESSION=11, 181 MH/s -> 183 MH/s
- HD5770, 950/300, AGGRESSION=11, 215 MH/s -> 217 MH/s

XP 32, 11.7 preview, SDK 2.5:

- HD5770, 1000/300, AGGRESSION=12, 222 MH/s -> 223 MH/s
- HD5830, 1050/300, AGGRESSION=9, 337 MH/s -> 337 MH/s

phatk w/ Ma patch -> 2011-07-07

Natty x32, 11.6, SDK 2.4:

- HD5750, 900/300, AGGRESSION=9, 176 MH/s -> 176 MH/s
- HD5770, 950/1200 (going to be RBE'ed to 300 Wink), AGGRESSION=9, 204 MH/s -> 204 MH/s
- HD5830, 1000/300, AGGRESSION=9, 312 MH/s -> 313 MH/s

phoenix 1.50 w/ common flags: VECTORS BFI_INT FASTLOOP=false WORKSIZE=256

Nice work, sent some bitcents to 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM. Smiley
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 08, 2011, 11:57:07 AM
 #90

Guys, I introduced a small glitch, which produces an OpenCL compiler warning in version 07-07. For stability reasons please change line 77:

old:
u W[123];

new:
u W[124];

I missed sharound(123), which writes to W[123], which is undefined, because it's out of range. Sorry for that!
Will upload a fixed version shortly (only includes the change above and stays 07-07).

Edit:
Download 07-07 fixed: http://www.mediafire.com/?o7jfp60s7xefrg4

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Tx2000
Full Member
***
Offline Offline

Activity: 182
Merit: 100



View Profile
July 08, 2011, 02:55:09 PM
 #91

Tried it out for the first time and I do see some noticable gains.



5850 @ 970/350
GUIMiner 2011-7-1

poclbm opencl - 410-413Mh
phoenix phatk - 408-413Mh

with your modified kern - 415-416.4Mh



Send a little something your way
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 08, 2011, 06:28:38 PM
 #92

I have a feature request

Can you please make it work with intel openCL?

thank you

I know that Intel recently released and OpenCL gold SDK ... the kernel uses standard OpenCL commands and an AMD extension only for BFI_INT / BITALIGN. I see no reason why it should not work. Have you got some error logs for me? You know that for Intel it will only use the CPUs!?

Dia
I have tried.  Poclum will not run at all.  It crashed upon starting.

I took a look at the code.  I think the comments are messy and some not really helpful.  Do you think its an good idea to 'fix' the comment?  btw comment why you type cast the hex value so other developer wont think its unnecessary and remove it.
error
Hero Member
*****
Offline Offline

Activity: 588
Merit: 500



View Profile
July 08, 2011, 09:58:24 PM
 #93

These two changes have taken my 5850s from 345MHash/sec to 360MHash/sec. Very nice.

3KzNGwzRZ6SimWuFAgh4TnXzHpruHMZmV8
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 08, 2011, 11:57:40 PM
 #94

I have a feature request

Can you please make it work with intel openCL?

thank you

I know that Intel recently released and OpenCL gold SDK ... the kernel uses standard OpenCL commands and an AMD extension only for BFI_INT / BITALIGN. I see no reason why it should not work. Have you got some error logs for me? You know that for Intel it will only use the CPUs!?

Dia
I have tried.  Poclum will not run at all.  It crashed upon starting.

I took a look at the code.  I think the comments are messy and some not really helpful.  Do you think its an good idea to 'fix' the comment?  btw comment why you type cast the hex value so other developer wont think its unnecessary and remove it.

Hex-values are type-casted so that the kernel works with AMD 2.1 SDK, which throws an error, if NOT type-casted.
I don't understand what you want to tell me with the "comments are messy" part.

If you get an error log with Intel SDK please post it here, so I can have a look at it.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 08, 2011, 11:58:43 PM
 #95

These two changes have taken my 5850s from 345MHash/sec to 360MHash/sec. Very nice.

5830s seem like THE card for my modded kernel. Great to hear Smiley.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 09, 2011, 02:04:58 PM
 #96

Hi Diapolo!

Great to see you're making progress!
There's one thing that pops into my eye:

you already do:
if(Vals[7].x == -H[7])

why not add the K[60] right into it and remove from upper instruction? Saves a whole instruction and will work 100% ;-)

if(Vals[7].x == -H[7]-K[60])


Lets see if I can find more ..
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 09, 2011, 02:45:37 PM
 #97

Hi Diapolo!

Great to see you're making progress!
There's one thing that pops into my eye:

you already do:
if(Vals[7].x == -H[7])

why not add the K[60] right into it and remove from upper instruction? Saves a whole instruction and will work 100% ;-)

if(Vals[7].x == -H[7]-K[60])


Lets see if I can find more ..


Good idea and works, can't verify via KernelAnalyzer, but seems like a vector addition less.
Will be included in the next version!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 09, 2011, 02:59:02 PM
 #98

Another addition waiting to be removed:

 Vals[7] = (Vals[3] = (u)0xb956c25b + D1 + s1(4) + ch(4)) + H1;

-> D1 is only used here, so why not add (u)0xb956c25b during precalculation?  Wink

Add

self.state2[3] = np.uint32(self.state2[3] + 0xb956c25b);

to __init__.py, line 77 for me, right behind:

self.calculateF(data)

And remove (u)0xb956c25b from kernel.cl

This also works 100%, no logic change involved here.

bcforum
Full Member
***
Offline Offline

Activity: 140
Merit: 100


View Profile
July 09, 2011, 05:35:47 PM
 #99


What parameters are you using with phatk?

On my normally aspirated R6970 Lightning I get 419.xMH/s (Ma fix in poclbm):

Code:
-k poclbm DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=64 VECTORS FASTLOOP=false

The fastest I've ever gotten with phatk is 403.xMH/s

Code:
-k phatk DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=256 VECTORS FASTLOOP=false

Any suggestions?

If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
huayra.agera
Full Member
***
Offline Offline

Activity: 154
Merit: 100



View Profile
July 09, 2011, 06:08:03 PM
 #100

I hope you can make an optimization for the next OCL version (v2.5 (684.212)) available in beta form already. =)

BTC: 1JMPScxohom4MXy9X1Vgj8AGwcHjT8XTuy
indio007
Full Member
***
Offline Offline

Activity: 224
Merit: 100


View Profile
July 09, 2011, 06:09:16 PM
 #101

Vince i editted the code like you said and I got errors.

Diapola what you did works fine. I have 2 5830's testing with now. Expect some Bit.love is all goes well.
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 09, 2011, 07:34:44 PM
Last edit: July 09, 2011, 08:13:28 PM by Vince
 #102

Vince i editted the code like you said and I got errors.

Which one of the changes did you try, both?

Tell me about the error you got, just "does not work" helps nobody!
indio007
Full Member
***
Offline Offline

Activity: 224
Merit: 100


View Profile
July 09, 2011, 09:07:32 PM
 #103

Vince i editted the code like you said and I got errors.

Which one of the changes did you try, both?

Tell me about the error you got, just "does not work" helps nobody!

In Kernel.cl
I changed this:
if(Vals[7].x == -H[7])

to this
if(Vals[7].x == -H[7]-K[60])

and changed this

Vals[7] += K[60] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

to this
Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);


Then i changed this
Vals[7] = (Vals[3] = (u)0xb956c25b + D1 + s1(4) + ch(4)) + H1;

to this

Vals[7] = (Vals[3] = D1 + s1(4) + ch(4)) + H1;


as instructed here and yes i added the line to init.py

Quote
Add

self.state2[3] = np.uint32(self.state2[3] + 0xb956c25b);

to __init__.py, line 77 for me, right behind:

self.calculateF(data)

And remove (u)0xb956c25b from kernel.cl


The error is opencl is having unusual behavior or something. it shows MH etc... just when it seems to want to accept a share it spits that out
1bitc0inplz
Member
**
Offline Offline

Activity: 112
Merit: 10


View Profile
July 10, 2011, 03:20:39 AM
 #104

This most recent update is everything you said it would be!

My 5830 went from 295 MH/s to 305 MH/s with just this update!

Thanks for the great work.

Mine @ http://pool.bitp.it - No fees, virtually 0 stales, what's not to love!
Chat with us @ #bitp.it on irc.freenode.net
Learn more about our pool @ http://forum.bitcoin.org/index.php?topic=12181.0
gominoa
Newbie
*
Offline Offline

Activity: 17
Merit: 0


View Profile
July 10, 2011, 07:19:09 AM
 #105

In Kernel.cl
I changed this:
if(Vals[7].x == -H[7])

to this
if(Vals[7].x == -H[7]-K[60])

Try also changing:
if(Vals[7].y == -H[7])
... to ...
if(Vals[7].y == -H[7]-K[60])

notice Y instead of X. Will be just below the X line
indio007
Full Member
***
Offline Offline

Activity: 224
Merit: 100


View Profile
July 10, 2011, 09:44:34 AM
 #106

In Kernel.cl
I changed this:
if(Vals[7].x == -H[7])

to this
if(Vals[7].x == -H[7]-K[60])

Try also changing:
if(Vals[7].y == -H[7])
... to ...
if(Vals[7].y == -H[7]-K[60])

notice Y instead of X. Will be just below the X line


Same error
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 10, 2011, 11:38:08 AM
 #107

Next kernel version will, once more, be faster for 69XX and 58XX cards Smiley. Stay tuned!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 10, 2011, 01:12:49 PM
Last edit: July 10, 2011, 01:32:52 PM by Vince
 #108

Vals[7] += K[60] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

to this
Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);


is this a typo or did you leave out the "="?

should be:

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);


indio007
Full Member
***
Offline Offline

Activity: 224
Merit: 100


View Profile
July 10, 2011, 04:55:33 PM
 #109

That Worked , i managed to go from 306-308MH to 308-310MH on 5830.
donations forth coming if it remains stable. Thx...
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 10, 2011, 06:07:41 PM
 #110

Vals[7] += K[60] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

to this
Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);


is this a typo or did you leave out the "="?

should be:

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);



do i need to change    if(Vals[7].y == -H[7])?
and    if(Vals[7] == -H[7])?
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 10, 2011, 06:13:45 PM
 #111

Hi Diapolo!

Great to see you're making progress!
There's one thing that pops into my eye:

you already do:
if(Vals[7].x == -H[7])

why not add the K[60] right into it and remove from upper instruction? Saves a whole instruction and will work 100% ;-)

if(Vals[7].x == -H[7]-K[60])


Lets see if I can find more ..

how would this save an instruction? did you just move -K[60]?
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 10, 2011, 06:30:33 PM
 #112

how would this save an instruction? did you just move -K[60]?

Yes, just moved it. Now the compiler optimizes it away, before it didn't.

BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 10, 2011, 06:50:32 PM
 #113

how would this save an instruction? did you just move -K[60]?

Yes, just moved it. Now the compiler optimizes it away, before it didn't.


Mind explaining more? I don't get it
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 10, 2011, 07:02:44 PM
 #114

Mind explaining more? I don't get it

Its a constant.
If I add it together with the other stuff in round 124 to Vals[7], it takes an addition to do so, cause its the only constant.

If moved to the comparison at the end, the two constants H[7] and K[60] are merged together into one by the compiler, same execution time here.
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 10, 2011, 07:09:28 PM
 #115

So i changed the whole thing to

   Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);
   
#ifdef VECTORS
   if(Vals[7].x == -H[7]-K[60])
      output[OUTPUT_SIZE] = output[(W[3].x >> 2) & OUTPUT_MASK] = W[3].x;
   if(Vals[7].y == -H[7]-K[60])
      output[OUTPUT_SIZE] = output[(W[3].y >> 2) & OUTPUT_MASK] =  W[3].y;
#else
   if(Vals[7] == -H[7]-K[60])
      output[OUTPUT_SIZE] = output[(W[3] >> 2) & OUTPUT_MASK] = W[3];
#endif

does not notice any speed difference, hope that helps

Ps: Does the compiler really do the optimization? If not you introduced one more step cause K[60] appear twice now
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 10, 2011, 07:18:48 PM
 #116

I'm pretty sure the compiler will catch this  Wink
Note that the speed increase is minimal, ~0.1-0.2% maybe.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 11, 2011, 02:10:13 PM
 #117

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Turix
Member
**
Offline Offline

Activity: 76
Merit: 10



View Profile WWW
July 11, 2011, 02:31:01 PM
 #118

Gained about 1 Mhash (431->432) from the 7th version to todays new version on my 5870 (950/315).

YinCoin YangCoin ☯☯First Ever POS/POW Alternator! Multipool! ☯ ☯ http://yinyangpool.com/ 
Free Distribution! https://bitcointalk.org/index.php?topic=623937
Bwincoin - 100% Free POS. BSqnSwv7xdD6UEh8bJz8Xp6YcndPQ2JFyF
Bobnova
Full Member
***
Offline Offline

Activity: 210
Merit: 100


View Profile
July 11, 2011, 03:08:38 PM
 #119

I also gained about 1Mh/s from todays update compared to the previous update, this is on a 5830 at 875/900 in linux.
The previous update made a big difference over what ships with phoenix 1.50.

I sent a small donation, as you've helped me make more money Cheesy

BTC:  1AURXf66t7pw65NwRiKukwPq1hLSiYLqbP
OCedHrt
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
July 11, 2011, 04:29:14 PM
 #120

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Awesome work. Btw, could you explain why Vals[2] = C1; is needed in the beginning when it is reassigned in round 5? I know removing this line breaks the hashing but I'm not seeing why since it isn't used prior to round 5.

ALL.ME  ●●●  SOCIAL NETWORK OF THE BLOCKCHAIN TIME ●●●
▄▄▄▬▬▄▄▄  Bounty all.me ▶ Jan 29th - May 8th 2018  ▄▄▄▬▬▄▄▄
Facebook   ▲   Twitter   ▲   Telegram
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 11, 2011, 04:51:53 PM
 #121

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Awesome work. Btw, could you explain why Vals[2] = C1; is needed in the beginning when it is reassigned in round 5? I know removing this line breaks the hashing but I'm not seeing why since it isn't used prior to round 5.

Various functions before the reassignment use Vals[2] search the kernel for Vals[ and you will see them. I tried to remove that assignment, but as you discovered, it is needed there Cheesy.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 11, 2011, 05:06:15 PM
 #122

This does work with SDK 2.1 but it might be a tiny bit slower than your previous version.

HD5850, 1.0875V, 975MHz clock, 360 MHz RAM, aggression=14, worksize=256, Catalyst 11.6 (Linux)

SDK 2.1:   404.6 MH/s  ->  404.5 MH/s
SDK 2.4:   401.8 MH/s  ->  402.2 MH/s

Note that at aggression=14 my rate can sometimes drop as much as 1 MH/s suddenly before recovering but usually varies by 0.2 MH/s so the apparent decrease with SDK 2.1 could well be statistical noise.

I might also have to play with the RAM frequency again.
OCedHrt
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
July 11, 2011, 05:14:01 PM
 #123

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Awesome work. Btw, could you explain why Vals[2] = C1; is needed in the beginning when it is reassigned in round 5? I know removing this line breaks the hashing but I'm not seeing why since it isn't used prior to round 5.

Various functions before the reassignment use Vals[2] search the kernel for Vals[ and you will see them. I tried to remove that assignment, but as you discovered, it is needed there Cheesy.

Dia

I only searched for Vals[2] so did not see the % XD

Very small increase for me on this one 278->278.5. Interestingly a lower memory clock on my 6870 actually has a detrimental effect. Downclocking from 1050->800 reduces hash rate b 0.5MH/s. I can't clock it any lower so don't know if 300 will be better or not.

ALL.ME  ●●●  SOCIAL NETWORK OF THE BLOCKCHAIN TIME ●●●
▄▄▄▬▬▄▄▄  Bounty all.me ▶ Jan 29th - May 8th 2018  ▄▄▄▬▬▄▄▄
Facebook   ▲   Twitter   ▲   Telegram
dishwara
Legendary
*
Offline Offline

Activity: 1855
Merit: 1016



View Profile
July 11, 2011, 05:22:22 PM
 #124

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

I wish & hope you will still find out some ways to get more hashes.
Thanks.
dikidera
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
July 11, 2011, 05:23:18 PM
 #125

At 2 megahashes, your device produces around 2 to 2,5 million hashes per second, If we halven that to 1 megahash, that's still 1,25 million hashes per second, if we halven that, around 750 thousand per second.
So a increase of 0.2% or so, yields around 100 thousand hashes more per second.
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 11, 2011, 05:39:13 PM
 #126

Dont give up! There's still more to optimize, I'm at 1694 ALU OPs (HD6970) at the moment.
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 11, 2011, 06:19:54 PM
 #127

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Previous patches had solved one of my cards crashing the miner every 30 minutes or so, but this one has reintroduced the problem. One miner mining a 5870 crashed after 20 minutes.


               ▄████████▄
               ██▀▀▀▀▀▀▀▀
              ██▀
             ███
▄▄▄▄▄       ███
██████     ███
    ▀██▄  ▄██
     ▀██▄▄██▀
       ████▀
        ▀█▀
The Radix DeFi Protocol is
R A D I X

███████████████████████████████████

The Decentralized

Finance Protocol
Scalable
▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀██
██                   ██
██                   ██
████████████████     ██
██            ██     ██
██            ██     ██
██▄▄▄▄▄▄      ██     ██
██▀▀▀▀██      ██     ██
██    ██      ██     
██    ██      ██
███████████████████████

███
Secure
      ▄▄▄▄▄
    █████████
   ██▀     ▀██
  ███       ███

▄▄███▄▄▄▄▄▄▄███▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀██
██             ██
██             ██
██             ██
██             ██
██             ██
██    ███████████

███
Community Driven
      ▄█   ▄▄
      ██ ██████▄▄
      ▀▀▄█▀   ▀▀██▄
     ▄▄ ██       ▀███▄▄██
    ██ ██▀          ▀▀██▀
    ██ ██▄            ██
   ██ ██████▄▄       ██▀
  ▄██       ▀██▄     ██
  ██▀         ▀███▄▄██▀
 ▄██             ▀▀▀▀
 ██▀
▄██
▄▄
██
███▄
▀███▄
 ▀███▄
  ▀████
    ████
     ████▄
      ▀███▄
       ▀███▄
        ▀████
          ███
           ██
           ▀▀

███
Radix is using our significant technology
innovations to be the first layer 1 protocol
specifically built to serve the rapidly growing DeFi.
Radix is the future of DeFi
█████████████████████████████████████

   ▄▄█████
  ▄████▀▀▀
  █████
█████████▀
▀▀█████▀▀
  ████
  ████
  ████

Facebook

███

             ▄▄
       ▄▄▄█████
  ▄▄▄███▀▀▄███
▀▀███▀ ▄██████
    █ ███████
     ██▀▀▀███
           ▀▀

Telegram

███

▄      ▄███▄▄
██▄▄▄ ██████▀
████████████
 ██████████▀
   ███████▀
 ▄█████▀▀

Twitter

██████

...Get Tokens...
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 11, 2011, 06:28:47 PM
 #128

I wonder why do we need const uint D1.
It is only use once.
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 11, 2011, 06:50:09 PM
 #129

I wonder why do we need const uint D1.
It is only use once.

Its part of the precalculation. Its needed.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 11, 2011, 07:31:01 PM
 #130

Dont give up! There's still more to optimize, I'm at 1694 ALU OPs (HD6970) at the moment.

I can't read or edit Phyton, so yes there is room if one could alter or add some more kernel arguments.
Strange thing is, that I saw some additions, of known values, which I tried to to eleminate via constants, but this led to lower kernel performance. I played around with this today and saw no more improvement ... too bad, was real fun the last days!
If you would like to share your work, we all will be happy Smiley. What is your kernel doing for 58XX cards? I thought it makes no sense, to optimize one over the other and tried to reduce ALU OP count for both platforms.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 11, 2011, 07:31:55 PM
 #131

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Previous patches had solved one of my cards crashing the miner every 30 minutes or so, but this one has reintroduced the problem. One miner mining a 5870 crashed after 20 minutes.

Sorry for that, but I have no idea, what would cause this. Perhaps the card is faulty? Will Furmark "crash" the card or show artifacts?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
error
Hero Member
*****
Offline Offline

Activity: 588
Merit: 500



View Profile
July 11, 2011, 08:25:28 PM
 #132

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

This one seems to act rather strangely.

On 2011-07-06 I had a pretty consistent 360MHash/sec with little variation. With 2011-07-11 the card is varying dramatically between 355-363, and usually at the lower end of that. Overall it may be slightly slower.

The cards are 5850s clocked at 900/300, using 11.4 and SDK 2.4 on Linux.

I'm going to let it run a while longer.

3KzNGwzRZ6SimWuFAgh4TnXzHpruHMZmV8
pennytrader
Sr. Member
****
Offline Offline

Activity: 254
Merit: 250


View Profile
July 11, 2011, 09:22:54 PM
 #133

On 5830 + SDK 2.1, it's slightly slower than the previous version. Guess I'll revert it back.

please donate to 1P3m2resGCP2o2sFX324DP1mfqHgGPA8BL
CYPER
Hero Member
*****
Offline Offline

Activity: 812
Merit: 502



View Profile
July 11, 2011, 09:53:22 PM
 #134

On 5830 + SDK 2.1, it's slightly slower than the previous version. Guess I'll revert it back.

Same here. With the previous version my average was 1758 and now it is 1756.

This is for 4x 5870 @ 960Mhz Core & 300Mhz Memory
SDK 2.1
Ubuntu 32bit
wazoo42
Newbie
*
Offline Offline

Activity: 42
Merit: 0


View Profile
July 11, 2011, 10:18:46 PM
 #135

7/4/11   = a 1-2 MH/s increase
7/6/11   = 0 increase (maybe slight decrease)
7/11/11  = 1-2 MH/s further increase over 7/4/11

These are on 2x 5830s, and 3x 5770s using ati-drivers-11.6, phoenix-1.50, pyopencl-0.92, and ati-stream-sdk-bin-2.4.
error
Hero Member
*****
Offline Offline

Activity: 588
Merit: 500



View Profile
July 11, 2011, 10:21:38 PM
 #136

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

This one seems to act rather strangely.

On 2011-07-06 I had a pretty consistent 360MHash/sec with little variation. With 2011-07-11 the card is varying dramatically between 355-363, and usually at the lower end of that. Overall it may be slightly slower.

The cards are 5850s clocked at 900/300, using 11.4 and SDK 2.4 on Linux.

I'm going to let it run a while longer.

Sorry, but this is still slower; my cards were running around 355MH/sec and even down to 350. Went back to 2011-07-06.

3KzNGwzRZ6SimWuFAgh4TnXzHpruHMZmV8
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 11, 2011, 11:31:37 PM
 #137

Decreasing hashrates .. thats really strange. These 58xx-cards sometimes behave quite strange.

I cant test it cause all my rigs run on 6950's unlocked to 6970's
erek
Newbie
*
Offline Offline

Activity: 36
Merit: 0


View Profile
July 11, 2011, 11:32:18 PM
 #138

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

This one seems to act rather strangely.

On 2011-07-06 I had a pretty consistent 360MHash/sec with little variation. With 2011-07-11 the card is varying dramatically between 355-363, and usually at the lower end of that. Overall it may be slightly slower.

The cards are 5850s clocked at 900/300, using 11.4 and SDK 2.4 on Linux.

I'm going to let it run a while longer.

Sorry, but this is still slower; my cards were running around 355MH/sec and even down to 350. Went back to 2011-07-06.

I totally disagree, each version for me has been getting faster and faster. 7-11-11 is the fastest, yet for me.
Wildvest
Newbie
*
Offline Offline

Activity: 41
Merit: 0


View Profile WWW
July 12, 2011, 12:25:55 AM
 #139

2011-11-11 i can now report a 0.5-1% increase over the improved phatk kernel
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 12, 2011, 02:27:20 AM
 #140

So here are some more changes:

I introduced const uint W17_2, containing P1(19) + 0x11002000, thats 3 shifts, 2 xor, 1 add traded against one extra parameter, well worth it,

extended self.f:
self.f = np.zeros(5, np.uint32)
to
self.f = np.zeros(6, np.uint32)

just after W17 calculation in calculateF:
        #W17_2
        self.f[5] = np.uint32(0x11002000+(
                rot(self.f[2], 32-13) ^
                rot(self.f[2], 32-15) ^
                (self.f[2] >> 10)
                ))

added the parameter (right after W17) in call and function

=> Effectively 3 Op's saved.

next change:
You can cut out all W0 to W14! Most of them are zero anyway, just needed to hardcode the first ones.
Also W[73] to W[78] are not used anymore with some small changes, so no need to initialize them.

=> less memory use, but has the same speed for me

Next one:
Round 3

#ifdef VECTORS
        Vals[4] = (W_3 = ((base + get_global_id(0)) << 1) + (uint2)(0, 1)) + PreVal4;
#else
        Vals[4] = (W_3 = base + get_global_id(0)) + PreVal4;
#endif

--
        // Round 3
        Vals[0] = state0 + Vals[4];
        Vals[4] += T1;

--

W[64 - O] = state0 + Vals[0];

you can reorganize and shorten round 3 to:
        Vals[0] = T1 + Vals[4];

needed changes in precalculation:
Preval4 += T1
T1 = state0 - T1

=> another addition almost effortless


here the files with these changes:
http://www.filesonic.com/file/1423103594

still some more to come!



TurdHurdur
Full Member
***
Offline Offline

Activity: 216
Merit: 100


View Profile
July 12, 2011, 02:36:59 AM
 #141

2011-11-11 i can now report a 0.5-1% increase over the improved phatk kernel
How'd you get this future kernel?
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 12, 2011, 05:57:08 AM
Last edit: July 12, 2011, 06:44:28 AM by hugolp
 #142

Quote from: hugolp
Previous patches had solved one of my cards crashing the miner every 30 minutes or so, but this one has reintroduced the problem. One miner mining a 5870 crashed after 20 minutes.

Sorry for that, but I have no idea, what would cause this. Perhaps the card is faulty? Will Furmark "crash" the card or show artifacts?

Dia

Im using the previous patch, that is almost as fast, in that card, so its ok.

Im wondering as well if the card has some kind of problem, but with other kernels it has been running non-stop for days without a problem. Dont know why some kernels trigger the crash.


               ▄████████▄
               ██▀▀▀▀▀▀▀▀
              ██▀
             ███
▄▄▄▄▄       ███
██████     ███
    ▀██▄  ▄██
     ▀██▄▄██▀
       ████▀
        ▀█▀
The Radix DeFi Protocol is
R A D I X

███████████████████████████████████

The Decentralized

Finance Protocol
Scalable
▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀██
██                   ██
██                   ██
████████████████     ██
██            ██     ██
██            ██     ██
██▄▄▄▄▄▄      ██     ██
██▀▀▀▀██      ██     ██
██    ██      ██     
██    ██      ██
███████████████████████

███
Secure
      ▄▄▄▄▄
    █████████
   ██▀     ▀██
  ███       ███

▄▄███▄▄▄▄▄▄▄███▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀██
██             ██
██             ██
██             ██
██             ██
██             ██
██    ███████████

███
Community Driven
      ▄█   ▄▄
      ██ ██████▄▄
      ▀▀▄█▀   ▀▀██▄
     ▄▄ ██       ▀███▄▄██
    ██ ██▀          ▀▀██▀
    ██ ██▄            ██
   ██ ██████▄▄       ██▀
  ▄██       ▀██▄     ██
  ██▀         ▀███▄▄██▀
 ▄██             ▀▀▀▀
 ██▀
▄██
▄▄
██
███▄
▀███▄
 ▀███▄
  ▀████
    ████
     ████▄
      ▀███▄
       ▀███▄
        ▀████
          ███
           ██
           ▀▀

███
Radix is using our significant technology
innovations to be the first layer 1 protocol
specifically built to serve the rapidly growing DeFi.
Radix is the future of DeFi
█████████████████████████████████████

   ▄▄█████
  ▄████▀▀▀
  █████
█████████▀
▀▀█████▀▀
  ████
  ████
  ████

Facebook

███

             ▄▄
       ▄▄▄█████
  ▄▄▄███▀▀▄███
▀▀███▀ ▄██████
    █ ███████
     ██▀▀▀███
           ▀▀

Telegram

███

▄      ▄███▄▄
██▄▄▄ ██████▀
████████████
 ██████████▀
   ███████▀
 ▄█████▀▀

Twitter

██████

...Get Tokens...
Dubs420
Newbie
*
Offline Offline

Activity: 20
Merit: 0


View Profile
July 12, 2011, 06:48:31 AM
 #143

I just tried your latest kernel in GUIminer with poclbm miner went from 417 to 419 each GPU great work thanks. was able to tweak a little more up to 420.6 to 421.0 using -f 1
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 12, 2011, 09:38:26 AM
 #144

With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again Smiley.
Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer.

Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 12, 2011, 11:26:22 AM
 #145

With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again Smiley.
Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer.

Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)!

Dia

You say that some of Vince's changes seem to reduce kernel speed but it looks like actual speed gain/loss is very much card dependent.  That being the case, which cards are you using for testing?
pandemic
Sr. Member
****
Offline Offline

Activity: 434
Merit: 250


View Profile
July 12, 2011, 12:14:46 PM
 #146

My 5830 went from 304mh/s to 307mh/s. Small increase, but why not? Smiley
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 12, 2011, 12:55:02 PM
 #147

With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again Smiley.
Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer.

Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)!

Dia

You say that some of Vince's changes seem to reduce kernel speed but it looks like actual speed gain/loss is very much card dependent.  That being the case, which cards are you using for testing?


I own a 5870, a 5830 and use AMD KernelAnalyzer to get infos for 69XX cards. You see I focused on that cards during my own tests. I could receive infos for more cards via AMD KA, but it seems hard to optimize one kernel for all cards Smiley.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
kbsbtc
Newbie
*
Offline Offline

Activity: 53
Merit: 0


View Profile
July 12, 2011, 01:05:59 PM
 #148

I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 12, 2011, 01:28:58 PM
 #149

I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....

So .. what about some more information?  Sad

Pool? Version used? Clock speeds? 5830 @ 305 seems to be somewhat overclocked ..
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 12, 2011, 02:12:18 PM
 #150

I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....

Perhaps the kernel pushes your card harder and it generates errors. But could be the pool, driver and so on, like Vince said.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
July 12, 2011, 03:18:25 PM
 #151

I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....

Perhaps the kernel pushes your card harder and it generates errors. But could be the pool, driver and so on, like Vince said.

Dia

Stales have nothing to do with GPU errors in Phoenix. All results returned by the GPU are verified before being sent to the server, which means if the kernel finds invalid shares you will get "Unexpected behavior from OpenCL. Hardware problem?" instead of the share being submitted. If you are not getting any of these errors there is NO WAY a kernel change can affect the number of stale shares. It might affect the total number of shares submitted in a given time period, but every share sent to the server by Phoenix is verified to be H == 0 on the CPU beforehand.

Phoenix Miner developer

Donations appreciated at:
1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 12, 2011, 03:24:45 PM
 #152

I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....

Perhaps the kernel pushes your card harder and it generates errors. But could be the pool, driver and so on, like Vince said.

Dia

Stales have nothing to do with GPU errors in Phoenix. All results returned by the GPU are verified before being sent to the server, which means if the kernel finds invalid shares you will get "Unexpected behavior from OpenCL. Hardware problem?" instead of the share being submitted. If you are not getting any of these errors there is NO WAY a kernel change can affect the number of stale shares. It might affect the total number of shares submitted in a given time period, but every share sent to the server by Phoenix is verified to be H == 0 on the CPU beforehand.

Thanks for clarification! Good to know, this was new for me.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
talldude
Member
**
Offline Offline

Activity: 224
Merit: 10


View Profile
July 12, 2011, 03:55:17 PM
 #153

cool, went from 349 (original improved kernel) to 350.3 with latest. Keep 'em coming Cheesy
kbsbtc
Newbie
*
Offline Offline

Activity: 53
Merit: 0


View Profile
July 12, 2011, 06:25:37 PM
 #154

I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....

Perhaps the kernel pushes your card harder and it generates errors. But could be the pool, driver and so on, like Vince said.

Dia

Stales have nothing to do with GPU errors in Phoenix. All results returned by the GPU are verified before being sent to the server, which means if the kernel finds invalid shares you will get "Unexpected behavior from OpenCL. Hardware problem?" instead of the share being submitted. If you are not getting any of these errors there is NO WAY a kernel change can affect the number of stale shares. It might affect the total number of shares submitted in a given time period, but every share sent to the server by Phoenix is verified to be H == 0 on the CPU beforehand.

Thanks for clarification! Good to know, this was new for me.

Dia

Thanks for the heads up. I am running 4 5830s on 1 box clocked at 950/300 pointed at bitcoins.lc. It seems the stale count went up for me, didn't mean to blame you or anything just saying that was what happened. I'lll look into on the pool end though.
dishwara
Legendary
*
Offline Offline

Activity: 1855
Merit: 1016



View Profile
July 12, 2011, 06:33:27 PM
 #155

=> another addition almost effortless

here the files with these changes:
http://www.filesonic.com/file/1423103594

still some more to come!
Thanks.
It increased 440 to 443 in 5870 @ 975/325 Windows.
431-434 in 6970 & 5870 @ 975/1375 & 984/300 Ubuntu - Smartcoin

With the inclusion of _init_.py, i hope there will be still some room to tweak.
jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
July 12, 2011, 06:34:48 PM
 #156


Stales have nothing to do with GPU errors in Phoenix. All results returned by the GPU are verified before being sent to the server, which means if the kernel finds invalid shares you will get "Unexpected behavior from OpenCL. Hardware problem?" instead of the share being submitted. If you are not getting any of these errors there is NO WAY a kernel change can affect the number of stale shares. It might affect the total number of shares submitted in a given time period, but every share sent to the server by Phoenix is verified to be H == 0 on the CPU beforehand.

Thanks for clarification! Good to know, this was new for me.

Dia

Thanks for the heads up. I am running 4 5830s on 1 box clocked at 950/300 pointed at bitcoins.lc. It seems the stale count went up for me, didn't mean to blame you or anything just saying that was what happened. I'lll look into on the pool end though.

My post was mainly intended to clarify that stale shares are not a good measurement for kernel changes.

A much more reliable test would be to count the total number of shares submitted over a long period (say 24 hours or so) This includes stales, since the goal is to test how many shares the kernel finds, not how many the server accepts. If this number is higher than without the kernel modifications, you know that it's helping.

Phoenix Miner developer

Donations appreciated at:
1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 12, 2011, 11:55:24 PM
 #157

With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again Smiley.
Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer.

Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)!

Dia
what about for poclbm? there is no __init__.py
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 13, 2011, 05:08:30 AM
 #158

With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again Smiley.
Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer.

Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)!

Dia
what about for poclbm? there is no __init__.py

That's what my edit was about Wink. It's all a matter of how much time I and others have and current focus is on Phoenix, because that's my main miner software.
Perhaps some mods can be done without new init values, so they will work without new __init__.py. But then I have to take care of 2 kernel versions. For now there is no need to worry, new version is not out Smiley.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
hchc
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500



View Profile
July 13, 2011, 05:34:33 AM
 #159

>I modified the shipping phatk Kernel from Phoenix 1.50. I now get round about 9-10 MHash/s more on my 5830 (up >from 310 to 319/320)!

I would really like to replicate this. Currently getting 310 Mh with 2.1 + 11.5 with bitless Ma() changes. Using the 7-11 I only get a small jump to 311. Can you share the config you use? sdk/driver version etc Thanks.

............
.           ▓▓▓▓▓▓▓▓▓▓▓▓▓▓
        ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
      ▓▓▓▀  ▀▓▓▓▀  ▀▓▓▀  ▀▓▓▓▓
    ▓▓▓▓▓▄  ▄▓▓▓▄  ▄▓▓▄  ▄▓▓▓▓▓▓
   ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
  ▓▓▓▓▓▓▓▀  ▀▓▓▓▓▓▓▓▓▓▓▓▓▀  ▀▓▓▓▓▓
▓▓▓▓▓▓▓▓▄  ▄▓▓▓▓▓▓▓▓▓▓▓▓▄  ▄▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▀  ▀▓▓▓▀  ▀▓▓▀  ▀▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▄  ▄▓▓▓▄  ▄▓▓▄  ▄▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▀  ▀▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
  ▓▓▓▓▓▓▓▄  ▄▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
   ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
    ▓▓▓▓▓▀  ▀▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
     ▓▓▓▓▄  ▄▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
       ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
          ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓

..           ▓▓▓▓▓▓▓▓▓▓▓▓▓▓
        ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
      ▓▓▓▀  ▀▓▓▓▀  ▀▓▓▀  ▀▓▓▓▓
    ▓▓▓▓▓▄  ▄▓▓▓▄  ▄▓▓▄  ▄▓▓▓▓▓▓
   ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
  ▓▓▓▓▓▓▓▀  ▀▓▓▓▓▓▓▓▓▓▓▓▓▀  ▀▓▓▓▓▓
▓▓▓▓▓▓▓▓▄  ▄▓▓▓▓▓▓▓▓▓▓▓▓▄  ▄▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▀  ▀▓▓▓▀  ▀▓▓▀  ▀▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▄  ▄▓▓▓▄  ▄▓▓▄  ▄▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
▓▓▓▓▓▓▓▓▀  ▀▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
  ▓▓▓▓▓▓▓▄  ▄▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
   ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
    ▓▓▓▓▓▀  ▀▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
     ▓▓▓▓▄  ▄▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
       ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
          ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓

.............
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 13, 2011, 05:57:00 AM
 #160

>I modified the shipping phatk Kernel from Phoenix 1.50. I now get round about 9-10 MHash/s more on my 5830 (up >from 310 to 319/320)!

I would really like to replicate this. Currently getting 310 Mh with 2.1 + 11.5 with bitless Ma() changes. Using the 7-11 I only get a small jump to 311. Can you share the config you use? sdk/driver version etc Thanks.


- Win7 X64 SP1
- Cat 11.7 with SDK 2.4 and Runtime 2.4 (in order to be able to use AMD APP KernelAnalyzer)
- Sapphire 5830 Xtreme @ 1000 MHz core / 350 MHz Mem
- Phoenix 1.5: agression 12, vectors, bfi_int

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
July 13, 2011, 08:41:00 AM
 #161

>I modified the shipping phatk Kernel from Phoenix 1.50. I now get round about 9-10 MHash/s more on my 5830 (up >from 310 to 319/320)!

I would really like to replicate this. Currently getting 310 Mh with 2.1 + 11.5 with bitless Ma() changes. Using the 7-11 I only get a small jump to 311. Can you share the config you use? sdk/driver version etc Thanks.


That might be tough since phatk doesn't work very well on older SDK versions. It was designed to be used with SDK 2.4, and on 2.1 I get better results with poclbm. The Ma() changes also apply to poclbm, so you won't see a gain there.

Phoenix Miner developer

Donations appreciated at:
1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
Uzza
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
July 13, 2011, 11:54:08 PM
Last edit: July 14, 2011, 09:42:39 AM by Uzza
 #162

I just did a quick comparison against poclbm for me.

On my dedicated 5870:
poclbm     SDK 2.1: ~424
phatk       SDK 2.1: <420
phatk imp SDK 2.1: ~432

poclbm     SDK 2.4: <424
phatk       SDK 2.4: ~424
phatk imp SDK 2.4: ~437

So on SDK 2.1 your improvements made it so phatk was better than poclbm in 2.1, and way better in 2.4.

The init optimizations gave me a minor boost of ~0.5 MHs over 2011-07-11.
PcChip
Sr. Member
****
Offline Offline

Activity: 418
Merit: 250


View Profile
July 14, 2011, 02:03:05 AM
 #163

Uzza, two questions:

1.) what is "phatk imp"

2.) Surely you meant a 5870 instead of a 4870?  Either that or you must have four 4870's to hit 430 MH/s!

Legacy signature from 2011: 
All rates with Phoenix 1.50 / PhatK
5850 - 400 MH/s  |  5850 - 355 MH/s | 5830 - 310 MH/s  |  GTX570 - 115 MH/s | 5770 - 210 MH/s | 5770 - 200 MH/s
CYPER
Hero Member
*****
Offline Offline

Activity: 812
Merit: 502



View Profile
July 14, 2011, 04:17:30 AM
 #164

Uzza, two questions:

1.) what is "phatk imp"

2.) Surely you meant a 5870 instead of a 4870?  Either that or you must have four 4870's to hit 430 MH/s!

1 - Phatk Improved - it's what this topic is all about.
2 - Most probably he meant 5870  Wink
Uzza
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
July 14, 2011, 09:43:38 AM
 #165

1 - Phatk Improved - it's what this topic is all about.
2 - Most probably he meant 5870  Wink
What this guy said.
nico_w
Newbie
*
Offline Offline

Activity: 12
Merit: 0


View Profile
July 14, 2011, 04:27:41 PM
 #166

Sadly it only gives me 1Mhash/s increase from 344 to 345 on my server, but keep up the good work!
MiningBuddy
Hero Member
*****
Offline Offline

Activity: 927
Merit: 1000


฿itcoin ฿itcoin ฿itcoin


View Profile
July 15, 2011, 11:33:29 AM
Last edit: July 15, 2011, 12:02:21 PM by MiningBuddy
 #167

Testing the 2011-07-11 kernel.7z

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.1
Before: [428.56 Mhash/sec] -> After: [432.28 Mhash/sec]
Stales before: 0.22% - > Stales after: 3.48%
Over a 4 hour test period

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.4
Before: [422.34 Mhash/sec] -> After: [189.58 Mhash/sec]
Stales: Not tested due to adverse affects.

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 15, 2011, 12:15:11 PM
 #168

Testing the 2011-07-11 kernel.7z

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.1
Before: [428.56 Mhash/sec] -> After: [432.28 Mhash/sec]
Stales before: 0.22% - > Stales after: 3.48%
Over a 4 hour test period

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.4
Before: [422.34 Mhash/sec] -> After: [189.58 Mhash/sec]
Stales: Not tested due to adverse affects.

Well that is very strange, but at least you are able to mine faster with SDK2.1 and the current kernel version ^^.
Btw. I had other things to do, but during the next week I will release a new version.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
MiningBuddy
Hero Member
*****
Offline Offline

Activity: 927
Merit: 1000


฿itcoin ฿itcoin ฿itcoin


View Profile
July 15, 2011, 12:26:44 PM
 #169

Testing the 2011-07-11 kernel.7z

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.1
Before: [428.56 Mhash/sec] -> After: [432.28 Mhash/sec]
Stales before: 0.22% - > Stales after: 3.48%
Over a 4 hour test period

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.4
Before: [422.34 Mhash/sec] -> After: [189.58 Mhash/sec]
Stales: Not tested due to adverse affects.

Well that is very strange, but at least you are able to mine faster with SDK2.1 and the current kernel version ^^.
Btw. I had other things to do, but during the next week I will release a new version.

Dia
Awesome, I look forward to it.
I think the rejected shares was random variance from my side as it seems to have settled down to a more realistic 0.88%.

jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
July 15, 2011, 07:47:25 PM
 #170

Testing the 2011-07-11 kernel.7z

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.1
Before: [428.56 Mhash/sec] -> After: [432.28 Mhash/sec]
Stales before: 0.22% - > Stales after: 3.48%
Over a 4 hour test period

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.4
Before: [422.34 Mhash/sec] -> After: [189.58 Mhash/sec]
Stales: Not tested due to adverse affects.

Well that is very strange, but at least you are able to mine faster with SDK2.1 and the current kernel version ^^.
Btw. I had other things to do, but during the next week I will release a new version.

Dia
Awesome, I look forward to it.
I think the rejected shares was random variance from my side as it seems to have settled down to a more realistic 0.88%.

Keep in mind that OpenCL kernel changes have NO effect on stale shares (aside from the VERY small difference in time it takes to run 1 execution of some number of hashes) All nonces found by the kernel to satisfy H == 0 are verified on the CPU prior to sending. Shares are also checked against the current known block before sending, in case new work was received while the kernel was executing. Basically this means that every share sent to the server is valid as far as the miner is concerned. If the OpenCL kernel is returning bad work it will never be sent to the server, and instead you will get "Unusual behavior from OpenCL. Hardware problem?"

That said, changes to the python portion of a Phoenix kernel can increase stale shares if badly implemented. (see: FASTLOOP excessive stales with high aggression in older versions of Phoenix)

Phoenix Miner developer

Donations appreciated at:
1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
zimpixa
Member
**
Offline Offline

Activity: 98
Merit: 10


View Profile
July 16, 2011, 11:31:45 PM
 #171

Every new version up to 07~ was faster than previous on both of my rigs. 11~ version however is faster for single 5850, but on 5870+5850 rig Im noticing minor slowdown (due to bigger deltas, means best performance is same, but it can go little lower). In adition primary GPU set '-f0' stopped to bottleneck other GPU (but changing from -f35 to -f0 didnt add any speed). After all tests I've changed version on single gpu rig and left old one on double gpu rig.

YinCoin YangCoin ☯☯First Ever POS/POW Alternator! Multipool! ☯ ☯ http://yinyangpool.com/ 
Free Distribution! https://bitcointalk.org/index.php?topic=623937
coblee
Donator
Legendary
*
Offline Offline

Activity: 1654
Merit: 1351


Creator of Litecoin. Cryptocurrency enthusiast.


View Profile
July 17, 2011, 07:53:58 AM
 #172

Donation sent!

When you release the next version, please explain whats changed in __init__.py.
I'm using fpgaminer's poclbm w/ phatk: http://forum.bitcoin.org/index.php?topic=19169.0
I'd like to see if we can make your phatk kernel still work with that.

Thanks!

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 17, 2011, 01:00:49 PM
 #173

Important: since version 2011-07-17 a modified version of __init__.py (for the Phoenix 1.5 miner) is included in this package and has to be used!
The kernel won't work for other Miners without modifications to them, see kernel.cl for further infos.



The new version 2011-07-17 is ready for download Smiley. Should be faster on 58XX and 69XX cards again.
This version will only work, if you use it with Phoenix and the supplied __init__.py file because of modifications to kernel variables!

A very big thank you goes to user Vince for input and ideas!

Download here:
http://www.mediafire.com/?317u0y93u7mnbys

Have fun,
Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
dikidera
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
July 17, 2011, 02:20:07 PM
 #174

With this version on my 5870: 410.58/411.6 -> 413.44/414Mhash/s

But on my HD5850, i see a decrease of a minimum of 0.50 mhash/s
CYPER
Hero Member
*****
Offline Offline

Activity: 812
Merit: 502



View Profile
July 17, 2011, 02:24:58 PM
Last edit: July 17, 2011, 02:36:39 PM by CYPER
 #175

2011-07-07 was the best for me so far and with 2011-07-17 I see no improvements - the speed is 1758-1760 as before with the same fluctuations.

4x XFX 5870 @ 960Mhz Core & 300Mhz Memory
Ubuntu 32bit
SDK 2.1
11.5 Drivers

Peao
Legendary
*
Offline Offline

Activity: 1320
Merit: 1001


View Profile
July 17, 2011, 02:48:05 PM
 #176

I noticed an improvement in performance.

Thank you, Dia!
erek
Newbie
*
Offline Offline

Activity: 36
Merit: 0


View Profile
July 17, 2011, 04:04:50 PM
 #177

Solid increase equivalent to nearly 5MHz overclock w/ 2011-07-17

Catalyst 11.7 Early + SDK 2.5


905/340 GPU/VRAM clocks on (2x 6970s)

hitting 812+ MH/sec
dishwara
Legendary
*
Offline Offline

Activity: 1855
Merit: 1016



View Profile
July 17, 2011, 04:22:47 PM
 #178

Same hash rate as mod.zip. No "more" hashes.
bcforum
Full Member
***
Offline Offline

Activity: 140
Merit: 100


View Profile
July 17, 2011, 05:43:55 PM
 #179


I get the same speed with DiabloMiner

If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 17, 2011, 05:52:28 PM
 #180


I get the same speed with DiabloMiner

No need to use Phoenix or this kernel then Wink...

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
dadittox
Newbie
*
Offline Offline

Activity: 23
Merit: 0


View Profile
July 17, 2011, 06:17:58 PM
 #181

Updated from 2011-07-11 to 2011-07-17 kernel. Hash rate increased from 325 to 327 on 6950, and from 280 to 281 on 6870 all at stock clocks. Not a big increase but it's always nice to have a few mhashes for free. Keep up good work!
MiningBuddy
Hero Member
*****
Offline Offline

Activity: 927
Merit: 1000


฿itcoin ฿itcoin ฿itcoin


View Profile
July 17, 2011, 10:24:47 PM
 #182

Got an extra 1mhs from my cards with the latest patch, cheers  Smiley

BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 17, 2011, 11:32:27 PM
 #183

Can you make another version that includes all the recent changes and __init__.py does not need to changed?
thanks
deepceleron
Legendary
*
Offline Offline

Activity: 1512
Merit: 1036



View Profile WWW
July 18, 2011, 01:16:17 AM
 #184

I get lots of 'unusual opencl behavior. hardware problem?' on the latest version, using a fresh Catalyst 11.6 driver (not CPU buggy hotfix) on WinXP on two different miners, going back to stock phoenix and 07-11 kernel is fine.
Tartarus
Newbie
*
Offline Offline

Activity: 47
Merit: 0


View Profile
July 18, 2011, 02:47:58 AM
 #185

Can you make another version that includes all the recent changes and __init__.py does not need to changed?
thanks

Why?  Just unzip in the kernels/phatk/ directory.
joulesbeef
Sr. Member
****
Offline Offline

Activity: 476
Merit: 250


moOo


View Profile
July 18, 2011, 03:09:03 AM
 #186

Quote
Can you make another version that includes all the recent changes and __init__.py does not need to changed?
thanks

no sure your reasoning, but if you want that, just grab the 7-11 version.. the version before this latest.

You dont gain all that much from the changes in __init__.py if you really want to do with out, you wont be shooting yourself in the foot. So grab the last version.


Personally I appreciate even the smallest increase.. so thanks for the work

mooo for rent
coblee
Donator
Legendary
*
Offline Offline

Activity: 1654
Merit: 1351


Creator of Litecoin. Cryptocurrency enthusiast.


View Profile
July 18, 2011, 03:23:28 AM
 #187

Can you make another version that includes all the recent changes and __init__.py does not need to changed?
thanks

Why?  Just unzip in the kernels/phatk/ directory.

Main reason is some people like to use this kernel with poclbm. The 7-11 works, but the newest doesn't because it requires changes in __init__.py which is specific to phoenix. BOARBEAR is asking for a version that includes all the additional improvements since 7-11 that doesn't require any changes to __init__.py.

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 18, 2011, 05:23:18 AM
 #188

Can you make another version that includes all the recent changes and __init__.py does not need to changed?
thanks

Why?  Just unzip in the kernels/phatk/ directory.

Main reason is some people like to use this kernel with poclbm. The 7-11 works, but the newest doesn't because it requires changes in __init__.py which is specific to phoenix. BOARBEAR is asking for a version that includes all the additional improvements since 7-11 that doesn't require any changes to __init__.py.

It is impossible to create a version of this kernel that works without modifications in the main miner software. As I wrote, there are values and variables, that the kernel uses and which are precalculated in the miner software and then passed as parameters to the kernel. A miner, which doesn't pass the required parameters will not work without beeing modified, sorry.

You guys are free to mod the kernel for yourself to revert the changes, which require a modded miner software and only take the ones, which can work without.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
phorensic
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500



View Profile
July 18, 2011, 08:28:53 AM
 #189

New version 2011-07-17 getting a lot of:

Code:
2011-07-18 01:24:09: Listener for "bitcoinmonkey": [18/07/2011 01:24:09] Kernel error: Unusual behavior from OpenCL. Hardware problem?
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 18, 2011, 08:37:33 AM
 #190

New version 2011-07-17 getting a lot of:

Code:
2011-07-18 01:24:09: Listener for "bitcoinmonkey": [18/07/2011 01:24:09] Kernel error: Unusual behavior from OpenCL. Hardware problem?

I have never seen that message during my tests with the release version of the kernel. What OS, which SDK are you on?

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
dikidera
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
July 18, 2011, 08:48:53 AM
 #191

New version 2011-07-17 getting a lot of:

Code:
2011-07-18 01:24:09: Listener for "bitcoinmonkey": [18/07/2011 01:24:09] Kernel error: Unusual behavior from OpenCL. Hardware problem?

I have never seen that message during my tests with the release version of the kernel. What OS, which SDK are you on?
I wonder if he copied over the init.py file.
phorensic
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500



View Profile
July 18, 2011, 09:03:54 AM
 #192

Yes I copied over the init file, because without it it won't even run.  Windows 7 64-bit, Catalyst 11.5, Stream 2.4.  Haven't had this message pop up with any other version of phatk kernel, nor any other kernel for that matter.
bcforum
Full Member
***
Offline Offline

Activity: 140
Merit: 100


View Profile
July 18, 2011, 02:34:20 PM
 #193

I get lots of 'unusual opencl behavior. hardware problem?' on the latest version, using a fresh Catalyst 11.6 driver (not CPU buggy hotfix) on WinXP on two different miners, going back to stock phoenix and 07-11 kernel is fine.

I had the same issue under windows which magically went away when I switched to Linux.

If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
ahitman
Sr. Member
****
Offline Offline

Activity: 302
Merit: 250


View Profile
July 18, 2011, 03:15:25 PM
 #194

New version 2011-07-17 getting a lot of:

Code:
2011-07-18 01:24:09: Listener for "bitcoinmonkey": [18/07/2011 01:24:09] Kernel error: Unusual behavior from OpenCL. Hardware problem?

I had the same issue on my 5850, but when I took 5Mhz off my overclock the error went away, guessing it was picking up some errors from being pushed harder?
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1011



View Profile
July 18, 2011, 03:17:51 PM
 #195

I'm on Linux and I'm still able to get these error messages, even with earlier versions of the phatk patch from this thread.  For me they start happening if my clocks are too high and become more frequenty as I increase the clocks.

For my 5850 at stock volts, 1015MHz doesn't generate any errors and with 1020 MHz they are very occasional but become more frequent as I ramp up to 1035MHz (my card always freezes at 1040MHz).

When I overvolted my 5850 to 1.25V and took it to 1110 MHz for 3 hours I got a few but noticed that I was generating good shares with the same OpenCL work request as was throwing errors so it seems the error doesn't invalidate the whole second and likely just a very small portion of it.  I have a screenshot here.

At lower voltages I never see these errors.  Either the card runs error free or crashes and the MHz line dividing these two states is pretty fine in my experience.

It's interesting that the kernel version might affect the frequency of such errors but they don't bother me.

Thanks for this latest kernel patch Diapolo.  My clock rates increased by 0.3 MH/s each.  2x5850: 722.2 MH/s -> 722.8 MH/s.  I'll send another small tip.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 18, 2011, 05:42:53 PM
 #196

I'm on Linux and I'm still able to get these error messages, even with earlier versions of the phatk patch from this thread.  For me they start happening if my clocks are too high and become more frequenty as I increase the clocks.

For my 5850 at stock volts, 1015MHz doesn't generate any errors and with 1020 MHz they are very occasional but become more frequent as I ramp up to 1035MHz (my card always freezes at 1040MHz).

When I overvolted my 5850 to 1.25V and took it to 1110 MHz for 3 hours I got a few but noticed that I was generating good shares with the same OpenCL work request as was throwing errors so it seems the error doesn't invalidate the whole second and likely just a very small portion of it.  I have a screenshot here.

At lower voltages I never see these errors.  Either the card runs error free or crashes and the MHz line dividing these two states is pretty fine in my experience.

It's interesting that the kernel version might affect the frequency of such errors but they don't bother me.

Thanks for this latest kernel patch Diapolo.  My clock rates increased by 0.3 MH/s each.  2x5850: 722.2 MH/s -> 722.8 MH/s.  I'll send another small tip.

Very nice posting with relevant information to all who encounter this error. Thanks for sharing Smiley. I never had this error, because my 5830 clocks were (and are) never above 1000 MHz for the chip, so it seems logical to me.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
phorensic
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500



View Profile
July 18, 2011, 07:08:31 PM
 #197

I think I jumped the gun.  I believe I am having a real hardware problem.  It's only on one card, the hotter one of the group, and it's overclocked and over-volted to all hell.  I think what happened is that these errors were hidden from the console until this new kernel update!  If that's the case, kudos for making the errors work! haha.
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 18, 2011, 09:23:29 PM
 #198

I think I jumped the gun.  I believe I am having a real hardware problem.  It's only on one card, the hotter one of the group, and it's overclocked and over-volted to all hell.  I think what happened is that these errors were hidden from the console until this new kernel update!  If that's the case, kudos for making the errors work! haha.

Not necessarely. I had the same problem with a card in previous versions of the kernel. After 20 minutes it would produce that message in phoenix or would crash poclbm. But with exactly the same configuration and later kernels it was solved, even when it was producing higher hashing rates. I am not sure why it happens exactly.


               ▄████████▄
               ██▀▀▀▀▀▀▀▀
              ██▀
             ███
▄▄▄▄▄       ███
██████     ███
    ▀██▄  ▄██
     ▀██▄▄██▀
       ████▀
        ▀█▀
The Radix DeFi Protocol is
R A D I X

███████████████████████████████████

The Decentralized

Finance Protocol
Scalable
▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀██
██                   ██
██                   ██
████████████████     ██
██            ██     ██
██            ██     ██
██▄▄▄▄▄▄      ██     ██
██▀▀▀▀██      ██     ██
██    ██      ██     
██    ██      ██
███████████████████████

███
Secure
      ▄▄▄▄▄
    █████████
   ██▀     ▀██
  ███       ███

▄▄███▄▄▄▄▄▄▄███▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀██
██             ██
██             ██
██             ██
██             ██
██             ██
██    ███████████

███
Community Driven
      ▄█   ▄▄
      ██ ██████▄▄
      ▀▀▄█▀   ▀▀██▄
     ▄▄ ██       ▀███▄▄██
    ██ ██▀          ▀▀██▀
    ██ ██▄            ██
   ██ ██████▄▄       ██▀
  ▄██       ▀██▄     ██
  ██▀         ▀███▄▄██▀
 ▄██             ▀▀▀▀
 ██▀
▄██
▄▄
██
███▄
▀███▄
 ▀███▄
  ▀████
    ████
     ████▄
      ▀███▄
       ▀███▄
        ▀████
          ███
           ██
           ▀▀

███
Radix is using our significant technology
innovations to be the first layer 1 protocol
specifically built to serve the rapidly growing DeFi.
Radix is the future of DeFi
█████████████████████████████████████

   ▄▄█████
  ▄████▀▀▀
  █████
█████████▀
▀▀█████▀▀
  ████
  ████
  ████

Facebook

███

             ▄▄
       ▄▄▄█████
  ▄▄▄███▀▀▄███
▀▀███▀ ▄██████
    █ ███████
     ██▀▀▀███
           ▀▀

Telegram

███

▄      ▄███▄▄
██▄▄▄ ██████▀
████████████
 ██████████▀
   ███████▀
 ▄█████▀▀

Twitter

██████

...Get Tokens...
shin234
Newbie
*
Offline Offline

Activity: 39
Merit: 0


View Profile
July 19, 2011, 01:25:44 AM
 #199

my stales are at / shares 4234 / stale (7, 0.17%) / after the latest version using AOCLBF  on 2 5850's one running at 840/300 and 1000/300 the latter is because of getting a card with a crappy part number from sapphire
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 19, 2011, 04:10:23 AM
 #200

Can you make another version that includes all the recent changes and __init__.py does not need to changed?
thanks

Why?  Just unzip in the kernels/phatk/ directory.

Main reason is some people like to use this kernel with poclbm. The 7-11 works, but the newest doesn't because it requires changes in __init__.py which is specific to phoenix. BOARBEAR is asking for a version that includes all the additional improvements since 7-11 that doesn't require any changes to __init__.py.

It is impossible to create a version of this kernel that works without modifications in the main miner software. As I wrote, there are values and variables, that the kernel uses and which are precalculated in the miner software and then passed as parameters to the kernel. A miner, which doesn't pass the required parameters will not work without beeing modified, sorry.

You guys are free to mod the kernel for yourself to revert the changes, which require a modded miner software and only take the ones, which can work without.

Dia
what I meant was, excluding the changes that needs the modification of __init__.py can you release a version that has all the other changes?
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
July 19, 2011, 05:37:38 AM
 #201

Can you make another version that includes all the recent changes and __init__.py does not need to changed?
thanks

Why?  Just unzip in the kernels/phatk/ directory.

Main reason is some people like to use this kernel with poclbm. The 7-11 works, but the newest doesn't because it requires changes in __init__.py which is specific to phoenix. BOARBEAR is asking for a version that includes all the additional improvements since 7-11 that doesn't require any changes to __init__.py.

It is impossible to create a version of this kernel that works without modifications in the main miner software. As I wrote, there are values and variables, that the kernel uses and which are precalculated in the miner software and then passed as parameters to the kernel. A miner, which doesn't pass the required parameters will not work without beeing modified, sorry.

You guys are free to mod the kernel for yourself to revert the changes, which require a modded miner software and only take the ones, which can work without.

Dia
what I meant was, excluding the changes that needs the modification of __init__.py can you release a version that has all the other changes?

What I do here is just hobby and I don't want it to take even more time, I hope you understand that. I can't maintain 2 different kernel versions, sorry.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
burningrave101
Newbie
*
Offline Offline

Activity: 55
Merit: 0


View Profile
July 19, 2011, 04:11:45 PM
 #202

Can you make another version that includes all the recent changes and __init__.py does not need to changed?
thanks

Why?  Just unzip in the kernels/phatk/ directory.

Main reason is some people like to use this kernel with poclbm. The 7-11 works, but the newest doesn't because it requires changes in __init__.py which is specific to phoenix. BOARBEAR is asking for a version that includes all the additional improvements since 7-11 that doesn't require any changes to __init__.py.

It is impossible to create a version of this kernel that works without modifications in the main miner software. As I wrote, there are values and variables, that the kernel uses and which are precalculated in the miner software and then passed as parameters to the kernel. A miner, which doesn't pass the required parameters will not work without beeing modified, sorry.

You guys are free to mod the kernel for yourself to revert the changes, which require a modded miner software and only take the ones, which can work without.

Dia
what I meant was, excluding the changes that needs the modification of __init__.py can you release a version that has all the other changes?

Unless you have a good reason for needing to use Poclbm over Phoenix, which I can't currently think of one, then there's no point in running Poclbm over Phoenix when Phoenix is faster.
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 19, 2011, 04:21:30 PM
 #203

Unless you have a good reason for needing to use Poclbm over Phoenix, which I can't currently think of one, then there's no point in running Poclbm over Phoenix when Phoenix is faster.

Backup pools. Its a big plus (and peace of mind).

Also, for me poclmb is slightly faster than phoenix with the same kernel.


               ▄████████▄
               ██▀▀▀▀▀▀▀▀
              ██▀
             ███
▄▄▄▄▄       ███
██████     ███
    ▀██▄  ▄██
     ▀██▄▄██▀
       ████▀
        ▀█▀
The Radix DeFi Protocol is
R A D I X

███████████████████████████████████

The Decentralized

Finance Protocol
Scalable
▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀██
██                   ██
██                   ██
████████████████     ██
██            ██     ██
██            ██     ██
██▄▄▄▄▄▄      ██     ██
██▀▀▀▀██      ██     ██
██    ██      ██     
██    ██      ██
███████████████████████

███
Secure
      ▄▄▄▄▄
    █████████
   ██▀     ▀██
  ███       ███

▄▄███▄▄▄▄▄▄▄███▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀██
██             ██
██             ██
██             ██
██             ██
██             ██
██    ███████████

███
Community Driven
      ▄█   ▄▄
      ██ ██████▄▄
      ▀▀▄█▀   ▀▀██▄
     ▄▄ ██       ▀███▄▄██
    ██ ██▀          ▀▀██▀
    ██ ██▄            ██
   ██ ██████▄▄       ██▀
  ▄██       ▀██▄     ██
  ██▀         ▀███▄▄██▀
 ▄██             ▀▀▀▀
 ██▀
▄██
▄▄
██
███▄
▀███▄
 ▀███▄
  ▀████
    ████
     ████▄
      ▀███▄
       ▀███▄
        ▀████
          ███
           ██
           ▀▀

███
Radix is using our significant technology
innovations to be the first layer 1 protocol
specifically built to serve the rapidly growing DeFi.
Radix is the future of DeFi
█████████████████████████████████████

   ▄▄█████
  ▄████▀▀▀
  █████
█████████▀
▀▀█████▀▀
  ████
  ████
  ████

Facebook

███

             ▄▄
       ▄▄▄█████
  ▄▄▄███▀▀▄███
▀▀███▀ ▄██████
    █ ███████
     ██▀▀▀███
           ▀▀

Telegram

███

▄      ▄███▄▄
██▄▄▄ ██████▀
████████████
 ██████████▀
   ███████▀
 ▄█████▀▀

Twitter

██████

...Get Tokens...
burningrave101
Newbie
*
Offline Offline

Activity: 55
Merit: 0


View Profile
July 19, 2011, 04:43:58 PM
 #204

Unless you have a good reason for needing to use Poclbm over Phoenix, which I can't currently think of one, then there's no point in running Poclbm over Phoenix when Phoenix is faster.

Backup pools. Its a big plus (and peace of mind).

Also, for me poclmb is slightly faster than phoenix with the same kernel.

Can't you just create another Phoenix miner on a different pool with a low aggression value and it will take over if your main pool worker goes idle?
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 19, 2011, 06:08:00 PM
 #205

Unless you have a good reason for needing to use Poclbm over Phoenix, which I can't currently think of one, then there's no point in running Poclbm over Phoenix when Phoenix is faster.

Backup pools. Its a big plus (and peace of mind).

Also, for me poclmb is slightly faster than phoenix with the same kernel.

Can't you just create another Phoenix miner on a different pool with a low aggression value and it will take over if your main pool worker goes idle?

I tried this (I was a phoenix user until poclbm added backup pools), but the second miner would take some hashing power from the main one (big deal), and then if the main one went down it would not perform at full speed because the aggression was lower.


               ▄████████▄
               ██▀▀▀▀▀▀▀▀
              ██▀
             ███
▄▄▄▄▄       ███
██████     ███
    ▀██▄  ▄██
     ▀██▄▄██▀
       ████▀
        ▀█▀
The Radix DeFi Protocol is
R A D I X

███████████████████████████████████

The Decentralized

Finance Protocol
Scalable
▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀██
██                   ██
██                   ██
████████████████     ██
██            ██     ██
██            ██     ██
██▄▄▄▄▄▄      ██     ██
██▀▀▀▀██      ██     ██
██    ██