Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13
Diapolo (OP)
July 04, 2011, 08:15:55 AM
Last edit: February 25, 2012, 02:26:12 PM by Diapolo
 #1

This is a repost from the Newbies forum, because I'm now allowed to post here :).
The original thread is located here: http://forum.bitcoin.org/index.php?topic=25135.0

If it works, please post here and consider a small donation @ 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x :).



Important (2012-01-13): The FASTLOOP=False parameter is not needed anymore, because FASTLOOP defaults to false in this version. Update: FASTLOOP=True works now, I uploaded a fixed version!

Important: with OpenCL SDK / Runtime version 2.6, AMD updated their OpenCL compiler, so some older kernels and the optimizations in them no longer seem to work or are no longer needed. To reflect this change I had to edit the kernel performance section of this thread.

Important: since version 2011-08-27 you don't need to supply the BFI_INT switch anymore. If your HW supports it, it's enabled automatically. To disable it use BFI_INT=false.

Important: since version 2011-08-04 (pre-release) you have to use the switch VECTORS2 instead of VECTORS. I made this change to be clear what vectors are used in the kernel (2- or 4-component). To use 4-component vectors use switch VECTORS4.

Important: since version 2011-07-17 a modified version of __init__.py (for the Phoenix miner) is included in this package and has to be used! The kernel won't work with other miners unless they are modified, see kernel.cl for further information.



These are the preferred switches for Phoenix with phatk_dia in order to achieve comparable performance:
Code:
-k phatk AGGRESSION=12 VECTORS2 WORKSIZE=128


Download version 2012-01-13: http://www.mediafire.com/?xzk6b1yvb24r4dg
Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9
Download version 2011-08-27: http://www.mediafire.com/?697r8t2pdk419ji
Download version 2011-08-11: http://www.mediafire.com/?s5c7h4r91r4ad4j
Download version 2011-08-04 (pre-release): http://www.mediafire.com/?upwwud7kfyx7788
Download version 2011-07-17: http://www.mediafire.com/?4zxdd5557243has
Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6
Download version 2011-07-07: http://www.mediafire.com/?o7jfp60s7xefrg4
Download version 2011-07-06: http://www.mediafire.com/?f8b8q3w5u5p0ln0
Download version 2011-07-03: http://www.mediafire.com/?xlkcc08jvp5a43v
Download version 2011-07-01: http://www.mediafire.com/?5jmt7t0e83k3eox

Kernel performance (BFI_INT / VECTORS2 / WORKSIZE=128 / SDK 2.6 / APP KernelAnalyzer 1.11 - Cal 11.12 profile):
HD5870
2011-08-20: 22 GPR / 1427 ALU OPs / 66 CF OPs
2011-08-27: 22 GPR / 1426 ALU OPs / 66 CF OPs
2011-12-21: 20 GPR / 1400 ALU OPs / 66 CF OPs
2012-01-13: 21 GPR / 1394 ALU OPs / 67 CF OPs

HD6970
2011-08-20: 21 GPR / 1687 ALU OPs / 66 CF OPs
2011-08-27: 23 GPR / 1688 ALU OPs / 68 CF OPs
2011-12-21: 21 GPR / 1687 ALU OPs / 66 CF OPs
2012-01-13: 20 GPR / 1687 ALU OPs / 66 CF OPs



Kernel performance (BFI_INT / VECTORS2 / SDK 2.5 / APP KernelAnalyzer 1.9 - Cal 11.7 profile):
HD5870
original phatk 1.X: 1393 ALU OPs
2011-07-01: 1389 ALU OPs
2011-07-03: 1385 ALU OPs
2011-07-06: 1380 ALU OPs
2011-07-07: 1380 ALU OPs
2011-07-11: 1378 ALU OPs
2011-07-17: 1376 ALU OPs
2011-08-04 (pre-release): 1368 ALU OPs
2011-08-11: 1364 ALU OPs
2011-08-27: 1363 ALU OPs (30 less compared to original phatk 1.X)
HD6970
original phatk 1.X: 1707 ALU OPs
2011-07-01: 1710 ALU OPs
2011-07-03: 1706 ALU OPs
2011-07-06: 1702 ALU OPs
2011-07-07: 1702 ALU OPs
2011-07-11: 1701 ALU OPs
2011-07-17: 1699 ALU OPs
2011-08-04 (pre-release): 1689 ALU OPs
2011-08-11: 1687 ALU OPs
2011-08-27: 1687 ALU OPs (20 less compared to original phatk 1.X)



changelog:

2012-01-13:
Kernel:
- modified: Disclaimer is now the same as in original Phoenix package
- removed: all (u) typecasts in front of scalars, where vectors and scalars were used together because per OpenCL definition this is not needed
- removed: all () brackets around n in the #define parts of the kernel
- removed: S0(), which is now again merged into s0()
- removed: brackets around the commands in t1W(), t1(), t2() and W(), to allow the compiler to reorder these
- added: missing ; at the end of the W() function
- added: init variable B1addK6 used in round 6 to save an add -> THX to DiabloD3
- added: a (uint) typecast in front of get_local_id() and get_group_id() calls, because return value could be 64 bits long, which is not wanted
- modified: replaced all ma() + s0() or s0() + ma() calls with t2()
- modified: round 6 now uses the new B1addK6 variable
- modified: reordered W[] calculation for rounds 32, 91 and 92
- modified: rounds 121, 122 and 123 to not compute Vals[4], Vals[5] and Vals[6], because they are not needed for final computation of Vals[7] -> THX to jhajduk
- modified: removed + H[7] from round 124 and use -0xec9fcd13 to check for valid nonces
- added: result_r124 variable to take the result of the last round 124, this saves a few ALU OPs on VLIW5 GPUs
Python Init:
- modified: replaced spaces with tabs in the source code formatting (I really dislike this part in Python ^^)
- modified: a few comments and commands were reformatted for better readability or to be better understandable
- modified: FASTLOOP parameter now defaults to False, which means you don't need to supply FASTLOOP=False anymore
- removed: OUTPUT_SIZE is not used anymore so all references to it were removed
- modified: changed REVISION to 122
- modified: moved the WORKSIZE checks below the part that determines whether vectors are used and which ones;
            this takes into account that the global worksize passed to the kernel is influenced by vector usage and vector size
            (currently the use of FASTLOOP can break this, because of the "dynamic" number of iterations)
- added: some debug info about worksize and pyOpenCL is displayed at the start
- added: B1 + K[6] is passed as new kernel parameter
- modified: made enqueue_read_buffer() / enqueue_write_buffer() blocking and removed finish() after the read, as per AMD's recommendations
            to minimize API overhead
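
One possible reading of the -0xec9fcd13 check mentioned in the kernel changes above: with the standard SHA-256 constants H[7] = 0x5be0cd19 and K[60] = 0x90befffa (taken from the SHA-256 specification, not from this post), their sum modulo 2^32 is 0xec9fcd13. If the final hash word must be zero, the value before those two additions must equal the negated sum, so a comparison against a constant can replace the additions. The sketch below only checks that arithmetic; the function names are hypothetical and the kernel's actual round 124 code is not reproduced here.

```python
MASK32 = 0xFFFFFFFF

H7 = 0x5BE0CD19   # standard SHA-256 initial hash value H[7]
K60 = 0x90BEFFFA  # standard SHA-256 round constant K[60]


def target_without_final_adds() -> int:
    """Value a partial sum must have so that partial + K60 + H7 == 0 (mod 2^32)."""
    return (-(H7 + K60)) & MASK32


def is_candidate(partial: int) -> bool:
    """Hypothetical check: compare against a constant instead of adding K60 and H7."""
    return (partial & MASK32) == target_without_final_adds()


print(hex((H7 + K60) & MASK32))          # sum of the two constants, reduced to 32 bits
print(hex(target_without_final_adds()))  # the same value negated modulo 2^32
```

The point of the design is that the additions are moved out of the per-nonce path and into a constant that is known ahead of time.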

2011-08-27:
Kernel:
- added: code path for 3-component Vectors, activated via VECTORS3 (currently not usable, because of a bug in the AMD drivers up to Cat 11.8)
- removed: BITALIGN option from the kernel, BFI_INT is now used automatically, if the HW supports it (disabled via BFI_INT=false)
- modified: non BFI_INT Ch() function, which was broken in 2011-08-11 -> THX to Vince
- modified: kernel output buffer is now an ulong array and not an uint array
- removed: OUTPUT_SIZE argument is not passed and used in the kernel anymore
- modified: WORKSIZEx4, WORKSIZEx3 and WORKSIZEx2 arguments were merged into WORKSIZExVECSIZE
- modified: removed, reordered and added some brackets and type-casting stuff in the kernel
- modified: restored command order for round 108 - 123 to free a GPR
- modified: added H[7] into round 124 calculation
- modified: changed the checking for positive nonces again to cover the H[7] change
- modified: writing of nonces to output now uses 1 write for Vec2 and max. 2 writes for Vec4, because 2x uints are now encoded into 1x ulong
Python Init:
- added: code for 3-component Vectors, activated via VECTORS3 (currently not usable, because of a bug in the AMD drivers up to Cat 11.8)
- removed: BITALIGN option from the Python init, BFI_INT is now used automatically, if the HW supports it (disabled via BFI_INT=false)
- added: detection of maximum supported WORKSIZE per Device, which is used if no WORKSIZE is supplied, if supplied WORKSIZE > max. supported WORKSIZE
    or if WORKSIZE is not a power of 2
- added: code to decode the ulong from the output buffer into 2x uint and process the results
- modified: comments, code formatting and line breaks for better readability
- modified: output buffer size is now the WORKSIZE -> THX to Phateus
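
A minimal sketch of two of the Python-side changes listed above, under stated assumptions: the WORKSIZE fallback rule (use the device maximum if no WORKSIZE is supplied, if it is larger than the maximum, or if it is not a power of 2) and the decoding of one ulong into 2x uint. All function names are made up for illustration, and the packing order (first nonce in the low 32 bits) is an assumption, not something stated in the changelog.

```python
from typing import Optional, Tuple

MASK32 = 0xFFFFFFFF


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... and False for everything else."""
    return n > 0 and (n & (n - 1)) == 0


def choose_worksize(requested: Optional[int], max_supported: int) -> int:
    """Fall back to the device maximum when the requested value is unusable."""
    if requested is None:
        return max_supported
    if requested > max_supported or not is_power_of_two(requested):
        return max_supported
    return requested


def pack_nonces(first: int, second: int) -> int:
    """Assumed layout: first nonce in the low half, second in the high half."""
    return ((second & MASK32) << 32) | (first & MASK32)


def unpack_nonces(value: int) -> Tuple[int, int]:
    """Split a 64-bit value back into two 32-bit values."""
    return value & MASK32, (value >> 32) & MASK32


print(choose_worksize(None, 256), choose_worksize(96, 256), choose_worksize(128, 256))
print([hex(n) for n in unpack_nonces(pack_nonces(0x12345678, 0x9ABCDEF0))])
```

Packing two results into one 64-bit value is what allows a single write for Vec2 and at most two writes for Vec4, as described in the kernel notes.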

2011-08-11:
- modified: reverted a former change to the Ma() function to save an ALU OP for 69XX cards
- added: S0() and S1() function, which is a compiler help -> THX Phateus
- modified: a few brackets and layout of all helper functions for better readability and compatibility
- added: t2() function, which is (s0(n) + ma(n)) and saves a few GPRs -> THX Phateus and myself (had this in earlier, but removed it sometime ^^)
- modified: changed layout of kernel definition for better readability
- modified: all values which for example had a 10u now have a 10U (uppercase) to be consistent in the whole kernel
- modified: modified round 94 W calculation for better performance
- modified: round 108 - 123 now consists of 2 W() blocks followed by 2 sharoundW() blocks to save a GPR
- modified: changed the checking for positive nonces again to never create an invalid share and lower ALU OP usage

2011-08-04 (pre-release):
- added: user Vince into disclaimer -> THX Vince :)
- added: kernel is now able to work with 4-component vectors (switch VECTORS4) -> THX to Phateus
- modified: to use 2-component vectors I renamed the switch VECTORS to VECTORS2
- added: __attribute__((reqd_work_group_size(WORKSIZE, 1, 1))) -> THX to Phateus
- added: constants PreW31 and PreW32, which store P2() + P4() for round 31 and 32 -> THX to Phateus
- renamed - modified: W17_2 is now PreW19, W2 is now PreW18, PreVal4addT1 is now PreVal4 (= PreVal4 + T1), state0subT1 is now PreVal0 (= Preval4 + state0)
- modified: base is now declared as u to save the addition of uint2(0, 1) or uint4(0, 1, 2, 3) for W_3 init -> THX to Phateus
- modified: nonce calculation now uses the local Work-Item ID, the group ID and the WORKSIZE instead of only the global Work-Item ID -> THX to Phateus
- added: saved a multiplication by passing WORKSIZEx2 and WORKSIZEx4 constants to the kernel
- modified: calculation for W[18 - O] was optimized so that P2(18) is only calculated for x component (if Vectors are used), because x and y only differ
       in the LSB and afterwards Bit 14 and 25 are rotated for W[18 - O].y -> THX to Phateus
- modified: saved an addition for Vals[0] init, because of the change to PreVal0
- modified: reordered code for round 4 - 95 to optimize for less ALU OPs used -> THX Phateus and myself ^^
- modified: ordering of variables in additions for Round 124 was changed to optimize for less ALU OPs used
- modified: rewrote the part where nonces are checked, if they are positive and where they are written into output buffer
       (saves 2 global writes per work-item and saves additional ALU OPs)
- modified: changed variables W_3, P2_18_x, P2_18 and nonce into a constant
- modified: changed code formatting for rounds 4 - 124 for better readability
- removed: some comments to cleanup the code
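
The W[18] note above (x and y differ only in the LSB, then bits 14 and 25 are adjusted for the y component) can be checked with a short sketch. It assumes that P2(18) corresponds to the standard SHA-256 small sigma0 function applied to W[3]; that mapping is inferred from the changelog, not quoted from the kernel. Because sigma0 consists only of rotations, a shift and XOR, flipping bit 0 of the input flips exactly bits 25 and 14 of the output.

```python
MASK32 = 0xFFFFFFFF


def rotr(x: int, n: int) -> int:
    """32-bit rotate right."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def sigma0(x: int) -> int:
    """Standard SHA-256 small sigma0: rotr 7, rotr 18, shift right 3."""
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


# Bit 0 lands on bit 25 (rotr 7) and bit 14 (rotr 18); the shift by 3 drops it.
LSB_FLIP = (1 << 25) | (1 << 14)


def sigma0_pair(x_even: int) -> tuple:
    """Compute sigma0 once for x and derive sigma0(x + 1) with a single XOR.

    Assumes x_even has a clear LSB, so x + 1 only differs in bit 0.
    """
    sx = sigma0(x_even)
    return sx, sx ^ LSB_FLIP


print(hex(LSB_FLIP))
print([hex(v) for v in sigma0_pair(0x12345678)])
```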

2011-07-17:
- added: offset for W[] array to reduce its size -> THX to user Vince
- modified: function t1() renamed to t1W() / function sharound() renamed to sharoundW()
- added: function t1() and sharound() which are used where the W[] addition can be left out, because W[] == 0
    (I guess the compiler already does this optimization, but it doesn't hurt) -> THX to user Vince
- modified: P1() - P4() and W() to make use of the offset
- modified: quite a few kernel parameters have new values or were added (mixed ideas from User Vince with own ones)
    C1addK5: C1addK5 = C1 + K[5]: C1addK5 = C1 + 0x59f111f1
    D1: D1 = D1 + K[4] + W[4]: D1 = D1 + 0xe9b5dba5 + 0x80000000U
    W2: W2 + W16 in P1(): W2 = P1(18) + P4(18)
    W17_2: 0x80000000U in P2() = 0x11002000 + W17 in P1(): W17_2 = P1(19) + P2(19)
    PreValaddT1: PreValaddT1 = PreVal4 + T1
    T1substate0: T1substate0 = T1 - substate0
- added: variable W_3, which stores the first value formerly held in W[3]
- added: Temp variable used to speed up calculation for rounds 4 and 5
- modified: changed round 3 so that it's more efficient (uses: Vals[0] and Vals[4])
- modified: W[0] - W[14] are now kind of hard-coded or left out, where they were 0
- modified: optimized P1(18) + P2(18) + P4(18)
- modified: optimized P1(19) + P2(19) + P4(19)
- modified: optimized round 4 + 5
- modified: rounds 6 - 14 and 73 - 78 now use new sharound() without W[] addition
- modified: offset added for all parts, where W[] is used
- modified: W_3 is used as result instead of W[3] (W[3] is still used to generate random position in output buffer) -> THX to user Vince
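
Two of the precomputed values in this entry can be reproduced on the host side. The sketch assumes that P2() is the standard SHA-256 small sigma0 function and uses the standard round constant K[5] = 0x59f111f1, which matches the value quoted for C1addK5. The helper names are hypothetical and are not taken from __init__.py.

```python
MASK32 = 0xFFFFFFFF

K5 = 0x59F111F1         # standard SHA-256 round constant K[5]
PADDING = 0x80000000    # the constant word the changelog feeds into P2()


def rotr(x: int, n: int) -> int:
    """32-bit rotate right."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def sigma0(x: int) -> int:
    """Standard SHA-256 small sigma0, assumed here to be what P2() computes."""
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def c1_add_k5(c1: int) -> int:
    """Host-side precomputation in the spirit of C1addK5 = C1 + K[5] (mod 2^32)."""
    return (c1 + K5) & MASK32


print(hex(sigma0(PADDING)))      # the constant that can be hard-coded in the kernel
print(hex(c1_add_k5(0x00000010)))
```

Values that do not depend on the nonce only need to be computed once per work unit, which is why they are passed in as kernel parameters instead of being recomputed by every work-item.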

2011-07-11:
- modified: constant H[7] has a new value (saves an addition in round 124)
- modified: non BFI_INT Ch() function now uses OpenCL built-in bitselect
- modified: reordered W[] calculations for round 18 - 30, 87 and 94
- modified: reordered calculation for round 5
- modified: W[] calculation for round 80 - 86 is now a block before sharound() is called
- removed: K[60] from round 124 (because of new H[7] value)
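
For readers unfamiliar with bitselect: OpenCL defines bitselect(a, b, c) as taking each result bit from a where the matching bit of c is 0 and from b where it is 1. The SHA-256 Ch function can be expressed with it as bitselect(z, y, x). The following is a Python model of that identity, not the kernel's code; the exact argument order used in the kernel is not shown in this post.

```python
MASK32 = 0xFFFFFFFF


def bitselect(a: int, b: int, c: int) -> int:
    """Model of OpenCL bitselect: bit of a where c is 0, bit of b where c is 1."""
    return ((a & ~c) | (b & c)) & MASK32


def ch_reference(x: int, y: int, z: int) -> int:
    """Textbook SHA-256 Ch: (x AND y) XOR (NOT x AND z)."""
    return ((x & y) ^ (~x & z)) & MASK32


def ch_bitselect(x: int, y: int, z: int) -> int:
    """Ch written as a single select: x picks between y and z bit by bit."""
    return bitselect(z, y, x)


# These three patterns cover all eight combinations of input bits.
x, y, z = 0xF0F0F0F0, 0xCCCCCCCC, 0xAAAAAAAA
print(hex(ch_reference(x, y, z)), hex(ch_bitselect(x, y, z)))
```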

2011-07-07:
- removed: some large comments in the source were removed
- modified: Ma() function is now unique in the kernel, no matter if BFI_INT is used or not -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- added: Ch() function which uses OpenCL bitselect() command (but it's not active, so you are free to try it) -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- modified: u W[128] is replaced with u W[124] because no more than 124 values are used
- modified: initialisation for Vals[0], Vals[3], Vals[4] and Vals[7] is now processed in other places to save some unneeded writes to these variables
- fixed: some hex values, which were used in vector additions are now properly type-casted, which hopefully restores AMD APP SDK 2.1 compatibility
- modified: rounds 3, 4 and 5 were modified for better performance (guess this can be tuned, if I have a working KernelAnalyzer)

2011-07-06:
- modified: H[] constants were reordered (2 were not used because of earlier mods)
- added: ulong L constant added (its value doesn't fit into an uint)
- modified: new Ma() for non BFI_INT capable cards, should be faster -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- removed: t1W()
- modified: t1() reordered function calls for better performance
- modified: W() reordered function calls for better performance
- modified: sharound() removed writing to t1, now t1() is called twice, which makes this function FASTER (OpenCL compiler optimization)
- removed: sharound2() (if needed W() + sharound() is used instead)
- removed: partround() not needed because of another solution for round 3 and 124
- removed: t1 and t1W variables
- modified: rounds 3, 19, 30, 81, 87, 94 and 124 were modified for better performance

2011-07-03:
- removed: t2(), w(n), r0(x), r1(x), R0(n) and R1(n)
- renamed - modified: R(x) to W(x) plus now uses P1, P2, P3 and P4 directly
- modified: P1(x) and P2(x) to not use R1(x - 2), R0(x - 15) but do that directly
- modified: SHA rounds 31, 32, 47 - 61, 86, 87, 114 - 119 now use sharound2() instead of W() + sharound()
- modified: reordered code for SHA rounds 66 - 94 -> saw no decrease in performance -> better readability
- modified: SHA rounds 18, 19, 20, 80, 93, 94 now use a simpler calculation because of removed zero additions
--> 1x P1(x), 2x P2(x), 4x P3(x) and 2x P4(x) were removed which should give a little MHash/sec boost
- modified: sharound() so that a double execution of t1() is avoided -> THX to User: 1MLyg5WVFSMifFjkrZiyGW2nw
- added: "u t1W" variable, which is used in sharound2() to avoid double execution of t1W()

2011-07-01:
Code:
Vals[7] = 0xb0edbdd0 + K[0] +  W[64] + 0x08909ae5U; -> Vals[7] = 0xfc08884d + W[64];
Vals[3] = 0xa54ff53a + 0xb0edbdd0 + K[0] +  W[64]; -> Vals[3] = 0x198c7e2a2 + W[64];
- removed the
Code:
Vals[7] += H[7]
addition and replaced the final if-statements in the Kernel
- reordered some W[n] = statements to remove some unneeded additions
- replaced all additions like 64 + 5 with the corresponding integer value (guess it was in there for readability reasons, so here it got worse :D)
- removed some unneeded brackets
- re-formatted for better readability
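
The two folded constants in the 2011-07-01 entry can be re-derived with a few lines of arithmetic, assuming K[0] is the standard SHA-256 round constant 0x428a2f98 (the post does not list the K[] table). Note that the second sum needs 33 bits when written out in full; reduced to 32 bits it is 0x98c7e2a2. How the OpenCL compiler treats the wider literal is not covered by this sketch.

```python
MASK32 = 0xFFFFFFFF

K0 = 0x428A2F98  # standard SHA-256 round constant K[0]

# Constant parts of the two original statements, summed without reduction.
vals7_const = 0xB0EDBDD0 + K0 + 0x08909AE5
vals3_const = 0xA54FF53A + 0xB0EDBDD0 + K0

print(hex(vals7_const))            # fits into 32 bits
print(hex(vals3_const))            # one bit wider than 32 bits
print(hex(vals3_const & MASK32))   # the same value reduced modulo 2^32
```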

If it works, please post here and consider a small donation @ 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x :).

Thanks,
Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
lebuen
July 04, 2011, 08:38:12 AM
 #2

Works perfectly for me, although only slight increase in performance (371 -> 373 MH/s). But I had the previous patch already in place. Thanks!
Fletch
July 04, 2011, 08:40:32 AM
 #3

Great, thanks. I went from 240 -> 242.5 on my 5850's. I was using only the "3% Ma-function patch" before.

- replaced all additions like 64 + 5 with the corresponding integer value (guess it was in there for readability reasons, so here it got worse :D)
I've never developed any code for GPUs or used OpenCL, but wouldn't the compiler take care of that for you? At least all C compilers would.

HashPeak - GPU mining hashrate peak detector
BTC: 1FLETCHvcUKosefrcZCLUQTtvx4WvgnYMC
Diapolo (OP)
July 04, 2011, 09:38:52 AM
 #4

Great, thanks. I went from 240 -> 242.5 on my 5850's. I was using only the "3% Ma-function patch" before.

- replaced all additions like 64 + 5 with the corresponding integer value (guess it was in there for readability reasons, so here it got worse :D)
I've never developed any code for GPUs or used OpenCL, but wouldn't the compiler take care of that for you? At least all C compilers would.

Great that you benefit from the modifications :). Your comment about the compiler stuff IS right, but at least it doesn't make things worse :D.

Dia

teukon
July 04, 2011, 10:37:43 AM
 #5

No change for me at all unfortunately, not even 0.1 MH/s.  I copied the file in place of phoenix-1.50/kernels/phatk/kernel.cl and restarted my instances of phoenix.

Settings:
Linux
Catalyst 11.6
SDK 2.1
phatk (bundled with phoenix 1.50 with MA tweak)
VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256
Solo mining

5850 Xtreme @ 1.0875V - 970MHz (399.4 MH/s) (75*C)
5850 Xtreme @ 1.1625V - 1050MHz (432.8 MH/s) (60*C)
OCedHrt
July 04, 2011, 10:41:57 AM
 #6

No change for me at all unfortunately, not even 0.1 MH/s.  I copied the file in place of phoenix-1.50/kernels/phatk/kernel.cl and restarted my instances of phoenix.

Settings:
Linux
Catalyst 11.6
SDK 2.1
phatk (bundled with phoenix 1.50 with MA tweak)
VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256
Solo mining

5850 Xtreme @ 1.0875V - 970MHz (399.4 MH/s) (75*C)
5850 Xtreme @ 1.1625V - 1050MHz (432.8 MH/s) (60*C)


I went from 56->61Mhash/s on my 4850. Phatk is now faster than DiabloMiner. Getting 59Mhash/s with DiabloMiner.

Diapolo (OP)
July 04, 2011, 10:44:37 AM
 #7

No change for me at all unfortunately, not even 0.1 MH/s.  I copied the file in place of phoenix-1.50/kernels/phatk/kernel.cl and restarted my instances of phoenix.

Settings:
Linux
Catalyst 11.6
SDK 2.1
phatk (bundled with phoenix 1.50 with MA tweak)
VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256
Solo mining

5850 Xtreme @ 1.0875V - 970MHz (399.4 MH/s) (75*C)
5850 Xtreme @ 1.1625V - 1050MHz (432.8 MH/s) (60*C)


Why are you using SDK 2.1? The phatk Kernel likes 2.4 best. Your Phoenix settings look good though.
Strange that your MH/s didn't change at all. Did Phoenix apply the BFI_INT patch the first time you started with the new Kernel?

Dia

Diapolo (OP)
July 04, 2011, 10:46:19 AM
 #8

I went from 56->61Mhash/s on my 4850. Phatk is now faster than DiabloMiner. Getting 59Mhash/s with DiabloMiner.

Good to know that 4XXX series get a boost, too :). What SDK are you on?

Dia

teukon
July 04, 2011, 10:49:47 AM
 #9

No change for me at all unfortunately, not even 0.1 MH/s.  I copied the file in place of phoenix-1.50/kernels/phatk/kernel.cl and restarted my instances of phoenix.

Settings:
Linux
Catalyst 11.6
SDK 2.1
phatk (bundled with phoenix 1.50 with MA tweak)
VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256
Solo mining

5850 Xtreme @ 1.0875V - 970MHz (399.4 MH/s) (75*C)
5850 Xtreme @ 1.1625V - 1050MHz (432.8 MH/s) (60*C)


Why are you using SDK 2.1? The phatk Kernel likes 2.4 best. Your Phoenix settings look good though.
Strange that your MH/s didn't change at all. Did Phoenix apply the BFI_INT patch the first time you started with the new Kernel?

Dia

I'm using SDK 2.1 because when I compared it with 2.4 I found a slight speed improvement with both phatk and poclbm (much larger with poclbm).

I haven't tried SDK 2.4 since applying the MA patch so that is maybe worth a try.

BFI_INT is definitely being used.  If I restart the command without BFI_INT on my 399.4 MH/s gpu I get 354.5 MH/s instead.
teukon
July 04, 2011, 10:52:56 AM
Last edit: July 04, 2011, 11:37:47 AM by teukon
 #10

No change for me at all unfortunately, not even 0.1 MH/s.  I copied the file in place of phoenix-1.50/kernels/phatk/kernel.cl and restarted my instances of phoenix.

Settings:
Linux
Catalyst 11.6
SDK 2.1
phatk (bundled with phoenix 1.50 with MA tweak)
VECTORS BFI_INT FASTLOOP=false AGGRESSION=12 WORKSIZE=256
Solo mining

5850 Xtreme @ 1.0875V - 970MHz (399.4 MH/s) (75*C)
5850 Xtreme @ 1.1625V - 1050MHz (432.8 MH/s) (60*C)


Why are you using SDK 2.1? The phatk Kernel likes 2.4 best. Your Phoenix settings look good though.
Strange that your MH/s didn't change at all. Did Phoenix apply the BFI_INT patch the first time you started with the new Kernel?

Dia

I'm using SDK 2.1 because when I compared it with 2.4 I found a slight speed improvement with both phatk and poclbm (much larger with poclbm).

I haven't tried SDK 2.4 since applying the MA patch so that is maybe worth a try.

BFI_INT is definitely being used.  If I restart the command without BFI_INT on my 399.4 MH/s gpu I get 354.5 MH/s instead.


Just tried SDK 2.4 on my 399.4 MH/s gpu.  It went down to 393.7 MH/s (actually 394.3 MH/s but occasionally dropping off to 390-391, I took an average).

Edit: I compared the new kernel with the old one using SDK 2.4 and the improvement was 3.1 MH/s (+0.79%).  This is a nice improvement but not enough to make me move away from SDK 2.1.  Also, SDK 2.4 causes the MH/s to drop suddenly by 3 or 4 MH/s every so often (variance is within 0.5 MH/s with SDK 2.1 for me) and even the peak values I achieve with SDK 2.4, new kernel or not, are below my SDK 2.1 average.

Ah well, good work though.  I sent you some BTC anyway simply because you tried to help me fix my problem.
OCedHrt
July 04, 2011, 11:48:23 AM
 #11

I have a really high reject rate with this kernel on my 4850. I've tried alternatively running this kernel and DiabloMiner. I'm getting about 30%-50% reject on your kernel vs 10% on DiabloMiner. Could just be bad luck. Right now it's 4 accepted and 5 rejected after 10 minutes.

teukon
July 04, 2011, 12:00:17 PM
 #12

I have a really high reject rate with this kernel on my 4850. I've tried alternatively running this kernel and DiabloMiner. I'm getting about 30%-50% reject on your kernel vs 10% on DiabloMiner. Could just be bad luck. Right now it's 4 accepted and 5 rejected after 10 minutes.

I found when pool mining that most of my rejects came shortly after new work was pushed.  As a result I had to try much longer test runs (3 hours or so) before coming to a conclusion about the miner's efficiency.
OCedHrt
July 04, 2011, 12:08:45 PM
 #13

I have a really high reject rate with this kernel on my 4850. I've tried alternatively running this kernel and DiabloMiner. I'm getting about 30%-50% reject on your kernel vs 10% on DiabloMiner. Could just be bad luck. Right now it's 4 accepted and 5 rejected after 10 minutes.

I found when pool mining that most of my rejects came shortly after new work was pushed.  As a result I had to try much longer test runs (3 hours or so) before coming to a conclusion about the miner's efficiency.


I'm aware of that, but I've had more rejects in the 10 minutes I'm running it than I've had all day. 411 shares, 22 stale. 13 of those are from testing with this kernel in the last 45 minutes or so.

I'm trying another pool and it's a bit better at 6 Accepted 2 Rejected. On bitclockers I was running at 300+ shares and 11 rejects with DM but getting the numbers above with the modded phatk kernel using phoenix. Trying it again.

Quote
[04/07/2011 05:05:06] Phoenix 1.50 starting...
[04/07/2011 05:05:06] Connected to server
[04/07/2011 05:05:55] Result: 12d3028b accepted
[04/07/2011 05:06:46] Result: e2513abe accepted
[04/07/2011 05:09:05] LP: New work pushed
[04/07/2011 05:09:09] Result: 55153340 accepted
[04/07/2011 05:09:48] Result: c49813f2 accepted
[04/07/2011 05:11:17] Result: 3e257a0d rejected
[04/07/2011 05:11:25] Result: 28da50c1 rejected
[04/07/2011 05:11:26] Result: e062d59e rejected
[04/07/2011 05:11:48] LP: New work pushed
[04/07/2011 05:12:05] Result: 75d54b7b rejected
[04/07/2011 05:12:08] Result: c832f2b0 rejected
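
For comparing runs like the one quoted above, a small tally script can help. This is only a sketch: it assumes the log lines look exactly like the Phoenix output shown here ("Result: <8 hex digits> accepted" or "rejected"), and the function names are made up for illustration. With so few shares the resulting rate is very noisy, so longer runs are needed before drawing conclusions.

```python
import re
from collections import Counter

# Matches lines such as "[04/07/2011 05:05:55] Result: 12d3028b accepted".
RESULT_RE = re.compile(r"Result: [0-9a-f]{8} (accepted|rejected)")

SAMPLE_LOG = """\
[04/07/2011 05:05:55] Result: 12d3028b accepted
[04/07/2011 05:06:46] Result: e2513abe accepted
[04/07/2011 05:09:05] LP: New work pushed
[04/07/2011 05:09:09] Result: 55153340 accepted
[04/07/2011 05:09:48] Result: c49813f2 accepted
[04/07/2011 05:11:17] Result: 3e257a0d rejected
[04/07/2011 05:11:25] Result: 28da50c1 rejected
[04/07/2011 05:11:26] Result: e062d59e rejected
[04/07/2011 05:11:48] LP: New work pushed
[04/07/2011 05:12:05] Result: 75d54b7b rejected
[04/07/2011 05:12:08] Result: c832f2b0 rejected
"""


def tally(lines):
    """Count accepted and rejected results; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def reject_rate(counts):
    """Fraction of rejected shares, or 0.0 if no results were seen."""
    total = counts["accepted"] + counts["rejected"]
    return counts["rejected"] / total if total else 0.0


counts = tally(SAMPLE_LOG.splitlines())
print(dict(counts), round(reject_rate(counts), 3))
```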

teukon
July 04, 2011, 12:15:40 PM
 #14

I have a really high reject rate with this kernel on my 4850. I've tried alternatively running this kernel and DiabloMiner. I'm getting about 30%-50% reject on your kernel vs 10% on DiabloMiner. Could just be bad luck. Right now it's 4 accepted and 5 rejected after 10 minutes.

I found when pool mining that most of my rejects came shortly after new work was pushed.  As a result I had to try much longer test runs (3 hours or so) before coming to a conclusion about the miner's efficiency.


I'm aware of that, but I've had more rejects in the 10 minutes I'm running it than I've had all day. 411 shares, 22 stale. 13 of those are from testing with this kernel in the last 45 minutes or so.

I'm trying another pool and it's a bit better at 6 Accepted 2 Rejected. On bitclockers I was running at 300+ shares and 11 rejects with DM but getting the numbers above with the modded phatk kernel using phoenix. Trying it again.

Yes, after 45 minutes things look highly suspect to me.  A good 3 hour test is useful for comparing different very-good setups but this tweak seems to have really hurt your accept/reject ratio.  Let us know your best when you're done testing.
OCedHrt
July 04, 2011, 12:23:02 PM
Last edit: July 04, 2011, 12:58:11 PM by OCedHrt
 #15

With the original phatk kernel (ma patched)

Quote
[04/07/2011 05:13:33] Phoenix 1.50 starting...
[04/07/2011 05:13:33] Connected to server
[04/07/2011 05:14:21] Result: 8c4fd15d accepted
[04/07/2011 05:14:45] Result: f15aefe5 accepted
[04/07/2011 05:15:35] Result: 9d5dfc38 rejected
[04/07/2011 05:19:21] LP: New work pushed
[04/07/2011 05:19:54] Result: 8adaadf6 accepted
[04/07/2011 05:19:57] Result: 5382cf90 accepted
[04/07/2011 05:22:54] Result: 2d0233f8 rejected
[04/07/2011 05:24:03] Result: 28c05c3a rejected
[04/07/2011 05:25:41] Result: 9dff1142 rejected
[04/07/2011 05:25:54] Result: 33095b05 accepted
[04/07/2011 05:26:05] Result: 3ec67e7e accepted
[04/07/2011 05:27:33] Result: 5307e072 accepted
[04/07/2011 05:27:37] Result: 20237b07 accepted
[04/07/2011 05:29:18] Result: c8abce0f rejected

It's actually 05:22 now so duration is same but number of results is significantly less though total accepted is same. Hashrate is 57 vs 61 with your kernel.

Update: Some more results...seems like it may just be a phoenix thing with bitclockers. I will get some data from DM for reference but I may have to go ask in Phoenix/GUIMiner thread.

From DiabloMiner
Quote
[7/4/11 5:31:58 AM] Started
[7/4/11 5:31:58 AM] Connecting to: http://pool.bitclockers.com:8332/
[7/4/11 5:31:58 AM] Using AMD Accelerated Parallel Processing OpenCL 1.1 AMD-APP-SDK-v2.5 (684.211)
[7/4/11 5:32:00 AM] Added ATI RV770 (#1) (10 CU, local work size of 128)
[7/4/11 5:33:20 AM] Accepted block 1 found on ATI RV770 (#1)
[7/4/11 5:35:41 AM] Accepted block 2 found on ATI RV770 (#1)
[7/4/11 5:36:49 AM] Accepted block 3 found on ATI RV770 (#1)
[7/4/11 5:36:49 AM] Accepted block 4 found on ATI RV770 (#1)
[7/4/11 5:37:54 AM] Rejected block 1 found on ATI RV770 (#1)
[7/4/11 5:39:40 AM] Accepted block 5 found on ATI RV770 (#1)
[7/4/11 5:40:23 AM] Accepted block 6 found on ATI RV770 (#1)
[7/4/11 5:40:34 AM] Accepted block 7 found on ATI RV770 (#1)
[7/4/11 5:40:56 AM] Accepted block 8 found on ATI RV770 (#1)
[7/4/11 5:41:53 AM] Accepted block 9 found on ATI RV770 (#1)
[7/4/11 5:42:15 AM] Accepted block 10 found on ATI RV770 (#1)
[7/4/11 5:42:38 AM] Accepted block 11 found on ATI RV770 (#1)
[7/4/11 5:42:59 AM] Accepted block 12 found on ATI RV770 (#1)
[7/4/11 5:43:57 AM] Accepted block 13 found on ATI RV770 (#1)
[7/4/11 5:44:48 AM] Accepted block 14 found on ATI RV770 (#1)

Trying your kernel with poclbm miner now. Actually getting 62 on here vs 61 from Phoenix.

Update again: No issues on poclbm using your kernel so I guess it's Phoenix.

Diapolo (OP)
July 04, 2011, 03:22:12 PM
 #16

Trying your kernel with poclbm miner now. Actually getting 62 on here vs 61 from Phoenix.

Update again: No issues on poclbm using your kernel so I guess it's Phoenix.

Dunno what the difference is between the 2 OpenCL wise. Setup, command queues, perhaps kernel result download or processing.
If you see a problem you should contact jedi95, perhaps he can clear this up?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
huayra.agera
Full Member
***
Offline Offline

Activity: 154
Merit: 100



View Profile
July 04, 2011, 04:20:35 PM
 #17

Hi! Just tested this; it boosted my 5850 @ 960/300/1.174v from 390 to 395. However, it seems to drop to 380 all of a sudden for about 3-4 seconds, then goes back up to 395. Still an increase, thanks for the work!

BTC: 1JMPScxohom4MXy9X1Vgj8AGwcHjT8XTuy
OCedHrt
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
July 04, 2011, 05:48:07 PM
 #18

Trying your kernel with poclbm miner now. Actually getting 62 on here vs 61 from Phoenix.

Update again: No issues on poclbm using your kernel so I guess it's Phoenix.

Dunno what the difference is between the 2 OpenCL wise. Setup, command queues, perhaps kernel result download or processing.
If you see a problem you should contact jedi95, perhaps he can clear this up?

Dia

I will look into it. GUIMiner has their own Phoenix and that may be related.

gfaust
Newbie
*
Offline Offline

Activity: 24
Merit: 0


View Profile
July 04, 2011, 11:50:16 PM
 #19

I was already running the "#define Ma" optimized kernel, and this is good for another .5% on top of that.
shakaru
Sr. Member
****
Offline Offline

Activity: 406
Merit: 250




View Profile
July 04, 2011, 11:58:33 PM
 #20

Went from 306 to 314 on my 5830s. Sent you a little thank you. People, PAY THIS MAN!

Alan Lupton
Newbie
*
Offline Offline

Activity: 42
Merit: 0


View Profile
July 05, 2011, 01:46:19 AM
Last edit: July 26, 2011, 03:46:48 AM by Alan Lupton
 #21

Hi,

I can offer my RapidShare account for the links. I would mark the files as TrafficShare, meaning no popups, no wait time, nothing. Just like a normal download from, let's say, any software website: click and go. You wouldn't even have to go to the RapidShare page.

And since we're talking about 4 KB, I couldn't care less ;)

Try it:

Version 2011-07-17: https://www.rapidshare.com/files/4111719732/2011-07-17_kernel.7z
Version 2011-07-11: https://www.rapidshare.com/files/3730055236/2011-07-11_kernel.7z
Version 2011-07-07: https://www.rapidshare.com/files/1447400948/2011-07-07_kernel.7z
Version 2011-07-06: https://www.rapidshare.com/files/698776394/2011-07-06_kernel.7z
Version 2011-07-03: https://www.rapidshare.com/files/3813413034/2011-07-03_kernel.7z
Version 2011-07-01: https://www.rapidshare.com/files/946373551/kernel.7z
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 05, 2011, 05:10:27 AM
 #22

Went from 306 to 314 on my 5830s. Sent you a little thank you. People, PAY THIS MAN!

Sounds great and thank you :)!

Dia

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 05, 2011, 05:12:41 AM
 #23

Can somebody upload a version of the kernel we can extract natively in windows without downloading yet another compression/decompression app?

TIA

edit:  Online decompression:  wobzip.org

The kernel is so small that I guess it wouldn't even need to be zipped. Download volume should not be that big.
But I really like 7-Zip, it's free and open source ... you should consider installing it as your default packer :).

Dia

Keninishna
Hero Member
*****
Offline Offline

Activity: 556
Merit: 500



View Profile
July 05, 2011, 08:51:23 AM
 #24

winrar can unzip like everything.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 05, 2011, 11:32:17 AM
 #25

winrar can unzip like everything.

And it is shareware ... but let's not start a packer discussion here and get back on topic!

Dia

Turix
Member
**
Offline Offline

Activity: 76
Merit: 10



View Profile WWW
July 05, 2011, 12:15:45 PM
 #26

Can somebody upload a version of the kernel we can extract natively in windows without downloading yet another compression/decompression app?

TIA

edit:  Online decompression:  wobzip.org

Seriously, just get 7-Zip and remove the rest. It's free and open source, and its default format, .7z, is probably the most effective lossless compression format in common use.

Wildvest
Newbie
*
Offline Offline

Activity: 41
Merit: 0


View Profile WWW
July 05, 2011, 02:13:26 PM
 #27

I tried it on one of my 3 x 6990 mining rigs. Normally, using poclbm with phatk, I get 408 MH/s per GPU; with your kernel, 407 MH/s per GPU.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 05, 2011, 02:41:02 PM
 #28

I tried it on one of my 3 x 6990 mining rigs. Normally, using poclbm with phatk, I get 408 MH/s per GPU; with your kernel, 407 MH/s per GPU.

Too bad, but thanks for trying :). Perhaps a new version will be ready by the end of this week. But guys, don't expect a huge improvement. On my setup I get 0.5 - 1.5 MHash/s more than with the 2011-07-03 kernel version (guess the puzzle is reaching its end ^^). I can't work the way I would like to because the AMD APP KernelAnalyzer doesn't work ... hoping for a new version!

Dia

jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
July 05, 2011, 03:50:18 PM
 #29

I tried it on one of my 3 x 6990 mining rigs. Normally, using poclbm with phatk, I get 408 MH/s per GPU; with your kernel, 407 MH/s per GPU.

This isn't exactly a fair comparison since phatk was specifically targeted at VLIW5 GPUs on SDK 2.4. A better comparison would be against phatk without this modification.

Phoenix Miner developer

Donations appreciated at:
1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001




View Profile
July 05, 2011, 03:58:21 PM
 #30

The newest kernel gives an error on only 1 of my 4 cards. Phoenix says something about a kernel or OpenCL error and suggests that the card might be malfunctioning (and then keeps mining), while poclbm just crashes. The card is an ASUS 5870, but I have an identical model that works fine. The error might not show up for about 20 minutes. And the card apparently works perfectly, since various pools accept the results from that card with no extra stales.

Also, I find it weird that the newest kernel seems to add a few watts of power consumption (I need to measure this properly, though). Since it's doing fewer operations and achieving a higher rate that way, shouldn't it consume the same?


Turix
Member
**
Offline Offline

Activity: 76
Merit: 10



View Profile WWW
July 05, 2011, 07:29:19 PM
Last edit: July 05, 2011, 11:38:01 PM by Turix
 #31

Bumped my hash rate from about 428 to 431 (3 MHash/s, or about 0.7%), although previously I was using a customized kernel that has some of the same changes you've made, so this increase is not representative.

Edit: XFX 5870 @ 950/315

CYPER
Hero Member
*****
Offline Offline

Activity: 798
Merit: 502



View Profile
July 05, 2011, 08:25:23 PM
 #32

4x 5870 @ 960 MHz core = rock solid 1748 MHash/s

After I put in the modified kernel:

Variable, 1746-1751, staying around 1748 most of the time

So no use for me, but thanks for your effort anyway :)
techwtf
Full Member
***
Offline Offline

Activity: 140
Merit: 100


View Profile
July 05, 2011, 11:50:56 PM
 #33

5870, SDK 2.1, 11.6: no change, 419->419 :(
joulesbeef
Sr. Member
****
Offline Offline

Activity: 476
Merit: 250


moOo


View Profile
July 06, 2011, 12:29:43 AM
 #34

Sapphire 5830 Xtreme: upped me from 329.2 to 332.6

overclocked 1040/355

aggression 12 worksize 256

66c

mooo for rent
c_k
Donator
Full Member
*
Offline Offline

Activity: 242
Merit: 100



View Profile
July 06, 2011, 02:41:40 AM
Last edit: July 06, 2011, 03:04:27 AM by c_k
 #35

I get 5MH/s increase on 5850 (372MH/s -> 377MH/s) and 2MH/s increase on 5770 (215MH/s -> 217MH/s)

hmm, it does seem to vary a hell of a lot more though

hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001




View Profile
July 06, 2011, 05:13:15 AM
 #36

5870, SDK 2.1, 11.6: no change, 419->419 :(

Since most GPU miners now use the phatk kernel, you should upgrade to SDK 2.4. The phatk kernel is optimized for 2.4, so you will get better results.


teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 06, 2011, 10:12:04 AM
 #37

5870, SDK 2.1, 11.6: no change, 419->419 :(

Since most GPU miners now use the phatk kernel, you should upgrade to SDK 2.4. The phatk kernel is optimized for 2.4, so you will get better results.

Well, you may get better results.  For me:

SDK 2.4:  393.7 -> 396.8
SDK 2.1:  399.4 -> 399.4

That's for a Sapphire HD5850 Xtreme 970/300@1.0875V using catalyst 11.6 on Linux and phatk
VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256

Definitely try out both SDK 2.1 and SDK 2.4 though.  If you are using Windows there's also an early version of SDK 2.5 with catalyst 11.7 which may be worth a look.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 06, 2011, 01:09:39 PM
 #38

New version is ready, DL here: http://www.mediafire.com/?f8b8q3w5u5p0ln0

Updated first post with changelog and performance info. This one should be a bit faster on 69XX cards than the original phatk, faster than all other phatk versions I did on 58XX and faster on non BFI_INT cards because of a change user 1MLyg5WVFSMifFjkrZiyGW2nw suggested!

Try, have fun, comment and donate :D.

Thanks,
Dia

Apopfis
Newbie
*
Offline Offline

Activity: 12
Merit: 0


View Profile
July 06, 2011, 01:39:30 PM
 #39

From ~402 MH/s to ~405 MH/s :o, and I have 4x 5850 Sapphires, each clocked to 1000 MHz.


It will take time to see what the stale rate will be. With the last kernel it was around 2-3%, usually closer to 2%.


PLEASE KEEP THESE COMING ;D
strictlyfocused
Newbie
*
Offline Offline

Activity: 55
Merit: 0


View Profile
July 06, 2011, 01:56:34 PM
 #40

MSI Hawk 5770
2011-07-03 kernel :: 233 MH/s
2011-07-06 kernel :: 236 MH/s

Thanks!!!

;D
Keninishna
Hero Member
*****
Offline Offline

Activity: 556
Merit: 500



View Profile
July 06, 2011, 02:53:22 PM
 #41

no change for me and my 6950s 11.6 drivers sdk 2.4
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 06, 2011, 04:43:35 PM
 #42

Another improvement but still not enough to beat SDK 2.1 for me.

The two phatk kernels of interest to me are:
- Kernel A = standard phatk kernel with the MA tweak applied.
- Kernel B = the latest kernel from this thread.

I'm using: Linux, Catalyst 11.6, a Sapphire HD5850 Xtreme:

At 900 MHz things look promising...

[900 MHz - 360 MHz RAM]
SDK 2.1, kernel A: 364.4 MH/s
SDK 2.1, kernel B: Fatal error
SDK 2.4, kernel A: 360.8 MH/s
SDK 2.4, kernel B: 365.6 MH/s

...but at higher core clock rates SDK 2.1 takes the lead once more.

[980 MHz - 360 MHz RAM]
SDK 2.1, kernel A: 404.7 MH/s
SDK 2.4, kernel B: 404.3 MH/s

[1020 MHz - 360 MHz RAM]
SDK 2.1, kernel A: 421.5 MH/s
SDK 2.4, kernel B: 420.9 MH/s

I would give some higher clocks but I can't go much past 1020 MHz without overvolting my card and I don't want to do that.

I tried playing with the RAM frequency but everything dropped off slowly as I lowered it and quickly as I raised it no matter which kernel or version of SDK I choose.

It's a shame that SDK 2.1 cannot drive your latest kernel but then I guess you are specifically designing it for SDK 2.4 and are now using features which are not available in SDK 2.1.  It would be great to finally put SDK 2.1 to bed but another MH/s sounds like a tall order at this point.

I have no data on accepts and rejects (I mine solo).
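
To make the comparison above easier to read, here is a minimal sketch that computes the gap between SDK 2.1 with kernel A and SDK 2.4 with kernel B at each core clock. The MH/s figures are copied from the measurements above; the names `RESULTS` and `gap_table` are only for illustration.

```python
# MH/s figures copied from the measurements above, keyed by core clock in MHz.
# Each value is (SDK 2.1 + kernel A, SDK 2.4 + kernel B).
RESULTS = {
    900: (364.4, 365.6),
    980: (404.7, 404.3),
    1020: (421.5, 420.9),
}


def gap_table(results):
    """Return {clock: (absolute gap in MH/s, relative gap in percent)}."""
    table = {}
    for clock, (sdk21_a, sdk24_b) in sorted(results.items()):
        diff = sdk24_b - sdk21_a
        table[clock] = (round(diff, 1), round(100.0 * diff / sdk21_a, 2))
    return table


for clock, (diff, pct) in gap_table(RESULTS).items():
    print(f"{clock} MHz: {diff:+.1f} MH/s ({pct:+.2f}%)")
```

With these numbers, SDK 2.4 with kernel B is ahead by 1.2 MH/s at 900 MHz and behind by 0.4 to 0.6 MH/s at 980 and 1020 MHz, so the gaps are all well under half a percent.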
CYPER
Hero Member
*****
Offline Offline

Activity: 798
Merit: 502



View Profile
July 06, 2011, 04:50:18 PM
 #43

I'm using Autominer and the newest kernel does not work for me at all, but I don't have time to troubleshoot it as I'll miss on my mining and any expected rewards.
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 06, 2011, 05:13:20 PM
 #44

I'm using Autominer and the newest kernel does not work for me at all, but I don't have time to troubleshoot it as I'll miss on my mining and any expected rewards.

Possibly you're using an old version of SDK.  I get a fatal error when trying this with SDK 2.1 but am fine with SDK 2.4.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 06, 2011, 07:07:04 PM
 #45

I made an interesting discovery during my own tests with the new kernel version. I had to up the memory clock of my 5870 from 200 to 350 MHz in order to achieve the highest hashing values. Another thing to mention is, that I drive a Phenom II X6 1090T with only 800 MHz for every core, due to power saving, while mining. If I let the CPU use full speed, MHash/s goes even higher, let's say 3-4 MH/s.

Conclusion: Perhaps you guys should try to raise your mem speeds + experiment with CPU clocks, too. I know it has to be a good balance, so that higher MH/s values are not eaten by higher energy costs.

Dia

teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 06, 2011, 07:15:24 PM
 #46

I made an interesting discovery during my own tests with the new kernel version. I had to up the memory clock of my 5870 from 200 to 350 MHz in order to achieve the highest hashing values. Another thing to mention is, that I drive a Phenom II X6 1090T with only 800 MHz for every core, due to power saving, while mining. If I let the CPU use full speed, MHash/s goes even higher, let's say 3-4 MH/s.

Conclusion: Perhaps you guys should try to raise your mem speeds + experiment with CPU clocks, too. I know it has to be a good balance, so that higher MH/s values are not eaten by higher energy costs.

Dia

My card RAM is already at 360 MHz and I've tested but I can't find a better frequency for the RAM at my core speeds if I'm only interested in MH/s.

As for CPU usage, I've not touched my CPU settings at all and the miners only use about 0.4% each.  I even removed the fan from the CPU and placed it to cool the back of my hot card (the heatsink on the CPU is not even warm).  I'm assuming significant CPU load is a Windows thing.

What interests me is how SDK 2.1 seems to be better at higher clock speeds whereas SDK 2.4 with your kernel is better at moderate speeds (940 MHz or below).  I admit I have little data on this but if anyone else gets the same results it would be interesting to know why.
CYPER
Hero Member
*****
Offline Offline

Activity: 798
Merit: 502



View Profile
July 06, 2011, 07:18:31 PM
 #47

Possibly you're using an old version of SDK.  I get a fatal error when trying this with SDK 2.1 but am fine with SDK 2.4.


Well it worked with the previous version of the modified kernel.
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 06, 2011, 07:22:37 PM
 #48

Possibly you're using an old version of SDK.  I get a fatal error when trying this with SDK 2.1 but am fine with SDK 2.4.


Well it worked with the previous version of the modified kernel.

Yes, I found the previous version worked with SDK 2.1.  But the version released today doesn't.  I had to change to SDK 2.4 for this most recent version and this change actually lost me 0.4-0.6 MH/s.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 06, 2011, 07:34:13 PM
 #49

Possibly you're using an old version of SDK.  I get a fatal error when trying this with SDK 2.1 but am fine with SDK 2.4.


Well it worked with the previous version of the modified kernel.

Yes, I found the previous version worked with SDK 2.1.  But the version released today doesn't.  I had to change to SDK 2.4 for this most recent version and this change actually lost me 0.4-0.6 MH/s.


I'm sorry to hear that ... what are the error messages you get with 2.1? I only tried with 2.4 and will only test on 2.4 and later SDKs myself. But if you give me a hint I can try to fix it.

Dia

teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 06, 2011, 07:49:09 PM
 #50

I'm sorry to hear that ... what are the error messages you get with 2.1? I only tried with 2.4 and will only test on 2.4 and later SDKs myself. But if you give me a hint I can try to fix it.

Dia

Don't worry about it.  The improvements for the SDK 2.4 users are clear and I'm impressed that you've managed to close the gap between 2.4 and 2.1 as much as you have.

I don't know how to get detailed error messages from phatk.  When I use SDK 2.1 and your latest kernel I run the command
python phoenix.py -u http://<user>:<pass>@<host>:<port>/ -a 1 -q 1 -k phatk VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256 DEVICE=1
and get
[<date> <time>] FATAL kernel error: Failed to load OpenCL kernel!

If I try the same with the previous version of your kernel everything works happily.  I wish I had more details for you but I just don't know how to get them.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 06, 2011, 08:07:43 PM
 #51

I'm sorry to hear that ... what are the error messages you get with 2.1? I only tried with 2.4 and will only test on 2.4 and later SDKs myself. But if you give me a hint I can try to fix it.

Dia

Don't worry about it.  The improvements for the SDK 2.4 users are clear and I'm impressed that you've managed to close the gap between 2.4 and 2.1 as much as you have.

I don't know how to get detailed error messages from phatk.  When I use SDK 2.1 and your latest kernel I run the command
python phoenix.py -u http://<user>:<pass>@<host>:<port>/ -a 1 -q 1 -k phatk VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256 DEVICE=1
and get
[<date> <time>] FATAL kernel error: Failed to load OpenCL kernel!

If I try the same with the previous version of your kernel everything works happily.  I wish I had more details for you but I just don't know how to get them.


If Phoenix allowed printing the OpenCL compiler build log, we could get an idea of what's wrong. Perhaps jedi95 reads this and takes it as a suggestion :D.
Perhaps I can take the lead with 2.4 and newer versions of my kernel, but for now I have no huge optimization ideas ... (but I'm thinking about it right now ^^).

Dia

teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 06, 2011, 08:19:16 PM
 #52

If Phoenix allowed printing the OpenCL compiler build log, we could get an idea of what's wrong. Perhaps jedi95 reads this and takes it as a suggestion :D.
Perhaps I can take the lead with 2.4 and newer versions of my kernel, but for now I have no huge optimization ideas ... (but I'm thinking about it right now ^^).

Dia

Would I get more detailed feedback from another front-end to phatk?  I haven't really 'shopped around' with the front ends.
jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
July 06, 2011, 08:53:03 PM
 #53

I'm sorry to hear that ... what are the error messages you get with 2.1? I only tried with 2.4 and will only test on 2.4 and later SDKs myself. But if you give me a hint I can try to fix it.

Dia

Don't worry about it.  The improvements for the SDK 2.4 users are clear and I'm impressed that you've managed to close the gap between 2.4 and 2.1 as much as you have.

I don't know how to get detailed error messages from phatk.  When I use SDK 2.1 and your latest kernel I run the command
python phoenix.py -u http://<user>:<pass>@<host>:<port>/ -a 1 -q 1 -k phatk VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256 DEVICE=1
and get
[<date> <time>] FATAL kernel error: Failed to load OpenCL kernel!

If I try the same with the previous version of your kernel everything works happily.  I wish I had more details for you but I just don't know how to get them.


If Phoenix allowed printing the OpenCL compiler build log, we could get an idea of what's wrong. Perhaps jedi95 reads this and takes it as a suggestion :D.
Perhaps I can take the lead with 2.4 and newer versions of my kernel, but for now I have no huge optimization ideas ... (but I'm thinking about it right now ^^).

Dia

There is no point trying to run phatk on pre-2.4 SDK versions. It will just end up being slower than the poclbm kernel.

For mining I see only 2 real options:
SDK 2.1 with poclbm
SDK 2.4 with phatk

2.2 is slower than 2.1 on poclbm and doesn't work well with phatk either.
2.3 is even slower than 2.2 on poclbm, but all I know with phatk is that it's slower than with 2.4

Anyway, getting the output from the compiler is very simple. You just need to comment out the try/except block surrounding self.loadKernel().
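
To illustrate why removing the try/except helps, here is a generic sketch. It is not Phoenix's actual code: `load_kernel`, `BuildError` and the sample compiler message are stand-ins invented for this example. The point is only that a handler which swallows the exception and prints a fixed message hides the compiler output, while keeping the exception text (or letting it propagate) keeps the build log visible.

```python
class BuildError(Exception):
    """Stand-in for the exception an OpenCL binding raises on a failed build."""


def load_kernel():
    # Hypothetical loader: pretend the compiler rejected the kernel source.
    raise BuildError("kernel.cl(126): error: mixed vector-scalar operation not allowed")


def start_hiding_details():
    """Catch everything and report a fixed message; the build log is lost."""
    try:
        load_kernel()
    except Exception:
        return "FATAL kernel error: Failed to load OpenCL kernel!"
    return "ok"


def start_showing_details():
    """Same call, but the exception text (the build log) is kept in the message."""
    try:
        load_kernel()
    except Exception as exc:
        return f"FATAL kernel error: Failed to load OpenCL kernel!\n{exc}"
    return "ok"
```

Commenting out the try/except entirely has the same effect as the second variant, except that the program stops with a traceback instead of returning a message.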

dishwara
Legendary
*
Offline Offline

Activity: 1855
Merit: 1016



View Profile
July 06, 2011, 09:02:09 PM
 #54

I am sure it increases:
424 to 447 MHash/s and 413 to 430 MHash/s
(Sapphire 5870 and MSI 5870, respectively).
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 06, 2011, 09:07:25 PM
 #55

There is no point trying to run phatk on pre-2.4 SDK versions. It will just end up being slower than the poclbm kernel.

I read elsewhere that this is the theory but in practice phatk is faster than poclbm on SDK 2.1 for me.  Maybe this has something to do with the fact that I've applied the MA tweak (one less operation) to both kernels.

E.g. Sapphire HD5850 Xtreme 1000MHz core, 350MHz RAM, Catalyst 11.6 (Linux x86_64), VECTORS BFI_INT FASTLOOP=false AGGRESSION=13 WORKSIZE=256:
phatk: 413.3 MH/s (+/- 0.2 MH/s)
poclbm: 411.4 MH/s (+/- 0.2 MH/s)

I've tried lower core speeds and higher RAM speeds but always phatk outperforms poclbm on SDK 2.1 for me.

For mining I see only 2 real options:
SDK 2.1 with poclbm
SDK 2.4 with phatk

2.2 is slower than 2.1 on poclbm and doesn't work well with phatk either.
2.3 is even slower than 2.2 on poclbm, but all I know with phatk is that it's slower than with 2.4

Anyway, getting the output from the compiler is very simple. You just need to comment out the try/except block surrounding self.loadKernel().

I'll try that.
Wildvest
Newbie
*
Offline Offline

Activity: 41
Merit: 0


View Profile WWW
July 06, 2011, 09:18:42 PM
 #56

Thanks for your efforts! Just reporting back 8)

6990, version 2011-07-06 with Catalyst 11.4 and SDK 2.4: now on par with the latest poclbm (phatk), maybe 0.5 MH/s slower :'(
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 06, 2011, 09:20:17 PM
 #57

Ok, here are the errors for the latest kernel on SDK 2.1.
{
Build on <pyopencl.Device 'Cypress' at 0x34a3680>:

/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[19] = P4(19) + 0x11002000 + P1(19);
                         ^

/tmp/OCLthVTDN.cl(138): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[30] = P3(30) + 0xA00055 + P1(30);
                         ^

/tmp/OCLthVTDN.cl(261): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        Vals[3] = L + W[64];
                      ^

/tmp/OCLthVTDN.cl(286): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[81] = P4(81) + P2(81) + 0xA00000;
                                  ^

/tmp/OCLthVTDN.cl(299): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[87] = P4(87) + P3(87) + 0x11002000 + P1(87);
                                  ^

/tmp/OCLthVTDN.cl(316): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[94] = P3(94) + 0x400022 + P1(94);
                         ^

6 errors detected in the compilation of "/tmp/OCLthVTDN.cl".
}

Hopefully this is just some implicit casting of a kind which SDK 2.1 wants to be babied through.  If I were versed in OpenCL I'd have a go at fixing this myself but I'm sure you would be much more efficient.
Alan Lupton
Newbie
*
Offline Offline

Activity: 42
Merit: 0


View Profile
July 06, 2011, 10:51:07 PM
 #58

2011-07-06: Wow, nice work! Now I'm no longer getting 5-15% rejections, and it's working like a charm. No speed increase over the last update, though.
c_k
Donator
Full Member
*
Offline Offline

Activity: 242
Merit: 100



View Profile
July 07, 2011, 02:38:11 AM
 #59

New release gives me 2-3MH/s more

I've given a small donation :)

Thanks for the hard work!

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 07, 2011, 06:02:05 AM
 #60

Ok, here are the errors for the latest kernel on SDK 2.1.
{
Build on <pyopencl.Device 'Cypress' at 0x34a3680>:

/tmp/OCLthVTDN.cl(126): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[19] = P4(19) + 0x11002000 + P1(19);
                         ^

/tmp/OCLthVTDN.cl(138): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[30] = P3(30) + 0xA00055 + P1(30);
                         ^

/tmp/OCLthVTDN.cl(261): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        Vals[3] = L + W[64];
                      ^

/tmp/OCLthVTDN.cl(286): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[81] = P4(81) + P2(81) + 0xA00000;
                                  ^

/tmp/OCLthVTDN.cl(299): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[87] = P4(87) + P3(87) + 0x11002000 + P1(87);
                                  ^

/tmp/OCLthVTDN.cl(316): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        W[94] = P3(94) + 0x400022 + P1(94);
                         ^

6 errors detected in the compilation of "/tmp/OCLthVTDN.cl".
}

Hopefully this is just some implicit casting of a kind which SDK 2.1 wants to be babied through.  If I were versed in OpenCL I'd have a go at fixing this myself but I'm sure you would be much more efficient.


You could try to add (u) in front of the raw hex values like (u)0x400022 and report back then.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
dsky
Sr. Member
****
Offline Offline

Activity: 279
Merit: 250


View Profile
July 07, 2011, 07:22:21 AM
 #61

All miners are Windows 7 x32 - SDK 2.4 - Catalyst 11.6

Latest changes:
HD5770 - from 219 up to 220 MH/s
HD6950 (unlockable) - from 367 up to 370 MH/s
HD6970 (6950 with 6950 BIOS) - from 405 up to 408 MH/s

Small speed increase on all three kinds of cards, and the rejected rate seems better, too.

Well done again, Sir!

hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 07, 2011, 07:40:36 AM
Last edit: July 07, 2011, 09:07:07 AM by hugolp
 #62

5870, Ubuntu 11.04, Catalyst 11.6, SDK 2.4, poclbm: went up 1MH/s (latest modification compared with the previous one).

The good news is that the card that was randomly crashing the miner every 20 minutes with the previous patch has been running for more than an hour without problems, so it seems stable now. Edit: it just crashed. I don't know what happens with this card and the modified kernel. Also, consumption has gone down by about 5W. I'm very puzzled by these changes in consumption between the different kernels.

Very good job. A small donation is going your way.


teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 07, 2011, 07:48:45 AM
 #63


You could try to add (u) in front of the raw hex values like (u)0x400022 and report back then.

Dia

No good.  Now I get a ton of "expression must have a constant value" errors.  The end of the log looks like:
{
/tmp/OCLgV3our.cl(25): error: expression must have a constant value
        (u)0x6a09e667, (u)0xbb67ae85, (u)0x3c6ef372, (u)0x510e527f, (u)0x9b05688c, (u)0x1f83d9ab, (u)0xfc08884d, (u)0x5be0cd19
                                                                                                                    ^

/tmp/OCLgV3our.cl(29): error: expression must have a constant value
  __constant ulong L = (u)0x198c7e2a2;
                          ^

/tmp/OCLgV3our.cl(261): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        Vals[3] = L + W[64];
                      ^

74 errors detected in the compilation of "/tmp/OCLgV3our.cl".
}

Only one of the "mixed vector-scalar operation" errors remains but I'm guessing the others are still there but just buried by the even more urgent "constant value" errors.
OCedHrt
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
July 07, 2011, 08:32:42 AM
 #64

The 7/6 kernel seems to have the following effects for me:

4850 - the ELF kernel is much larger, but there is a slight speed increase of ~3-4%, from 56-57 to 58-59 MH/s @ 460 core. It also seems to run cooler. Since my card is overheating I have to underclock, and a cooler kernel means a higher clock: at 480 core I'm getting 60. Using -v -w 128. Worksize 128 is still optimal; 64 is slightly slower and 256 is much slower (< 50 MH/s).

6450 - again the ELF kernel is much larger, but there is a slight speed decrease at worksize 128. The previous kernel was optimal at 128 but this one is optimal at 64; worksize 64 with the new kernel is equivalent to worksize 128 with the old kernel. It potentially runs cooler, but I did not check. This is a fanless card and only gets 32 MH/s.

teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 07, 2011, 08:50:22 AM
 #65

You could try to add (u) in front of the raw hex values like (u)0x400022 and report back then.

Dia

Sorry about that last post.  I'm not usually that dumb I assure you.

I've modified your kernel code by adding (u) before each of the 5 raw hex values corresponding to the error messages.  I also added (u) directly before L from the other error message.  After this everything starts working in SDK 2.1.

For my stock voltage 5850:
423.7 (+/- 0.1) MH/s -> 425.9 (+/- 0.05) MH/s

This does of course mean that SDK 2.1 has increased its lead against SDK 2.4 for me.  So many people are convinced that SDK 2.4 is faster so perhaps this is a Windows/Linux thing.

If this runs for 24 hours without freezing then I have a new personal best!  I will want to test what proportion of these hashes are inaccurate but things are looking good.  Another donation is coming your way.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 07, 2011, 09:46:52 AM
 #66


You could try to add (u) in front of the raw hex values like (u)0x400022 and report back then.

Dia

No good.  Now I get a ton of "expression must have a constant value" errors.  The end of the log looks like:
{
/tmp/OCLgV3our.cl(25): error: expression must have a constant value
        (u)0x6a09e667, (u)0xbb67ae85, (u)0x3c6ef372, (u)0x510e527f, (u)0x9b05688c, (u)0x1f83d9ab, (u)0xfc08884d, (u)0x5be0cd19
                                                                                                                    ^

/tmp/OCLgV3our.cl(29): error: expression must have a constant value
  __constant ulong L = (u)0x198c7e2a2;
                          ^

/tmp/OCLgV3our.cl(261): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
        Vals[3] = L + W[64];
                      ^

74 errors detected in the compilation of "/tmp/OCLgV3our.cl".
}

Only one of the "mixed vector-scalar operation" errors remains but I'm guessing the others are still there but just buried by the even more urgent "constant value" errors.


Ah sorry, I was not clear enough. You must not add (u) in front of every hex value in the kernel, but ONLY in front of the hex values that generated an error.

Code:
W[19] = P4(19) + (u)0x11002000 + P1(19);

W[30] = P3(30) + (u)0xA00055 + P1(30);

Vals[3] = (u)L + W[64];

W[81] = P4(81) + P2(81) + (u)0xA00000;

W[87] = P4(87) + P3(87) + (u)0x11002000 + P1(87);

W[94] = P3(94) + (u)0x400022 + P1(94);

If you could be so kind as to test this out and report back, that would be great. I would suggest restoring the latest kernel and then modifying the 6 places shown above.
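
One possible reading of why the cast on L matters, sketched in Python rather than OpenCL: the error log shows L declared as `__constant ulong L = 0x198c7e2a2`, which needs 33 bits, while the kernel's vector type `u` presumably has 32-bit elements (an assumption, since the typedef is not shown in this thread). A 64-bit scalar cannot be up-converted to a 32-bit vector element, which would explain the SDK 2.1 error. The cast keeps only the low 32 bits, and because SHA-256 additions wrap modulo 2^32 the resulting sum should be the same. The helper names below are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # SHA-256 words are 32 bits wide

L_64 = 0x198C7E2A2   # value of L from the error log (needs 33 bits)


def to_u32(x):
    """Keep only the low 32 bits, as a cast to a 32-bit unsigned type would."""
    return x & MASK32


def add32(a, b):
    """32-bit wrapping addition."""
    return (a + b) & MASK32


L_32 = to_u32(L_64)
w64 = 0xDEADBEEF  # arbitrary stand-in for one component of W[64]

# Truncating L first gives the same 32-bit sum as truncating afterwards.
assert L_32 == 0x98C7E2A2
assert add32(L_32, w64) == (L_64 + w64) & MASK32
```

If this reading is right, the cast changes nothing about the hash result; it only spells out a narrowing that newer SDK versions seem to accept implicitly.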

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 07, 2011, 09:58:45 AM
 #67

Ah sorry, I was not clear enough. You must not add (u) in front of every hex value in the kernel, but ONLY in front of the hex values, that generated an error.

You were perfectly clear, I was just being dumb.  In case you missed my second post, this fix works for SDK 2.1.  Thank you very much.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 07, 2011, 10:06:00 AM
 #68

Great, so we have a fix and a version that works with 2.1. Will release a fix later today!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
n4l3hp
Full Member
***
Offline Offline

Activity: 173
Merit: 100


View Profile
July 07, 2011, 10:44:05 AM
 #69

7/6 kernel seems to have the following effects for me:

4850 - elf kernel is much larger, but slight speed increase of ~3-4% from 56-57 @ 460 core to 58-59 @ 460 core. also seems to run cooler. since my card is overheating I have to underclock. cooler kernel means higher clock at 480 - getting 60. Using -v -w 128. worksize 128 is still optimal, 64 is slightly slower and 256 is much slower < 50mhash.

6450 - again elf kernel is much larger, but slight speed decrease at worksize 128. previous kernel was optimal at 128 but this one is optimal at 64. worksize 64 with new kernel is equivalent to worksize 128 with old kernel. potentially runs cooler but did not check. this is a fanless card but only gets 32mhash.

My 4850 @ 675 core / 250 mem gets 85MH/s, with a 0.32% stale rate at DeepBit. (Bought from eBay, don't know what brand; it came with a Zalman cooler. Anything higher than 680 core will cause it to stop hashing, even if overvolted.) Temps are at 71 degrees Celsius in a closed case. It had been running Milkyway@Home for more than a year at the same settings before I switched it to bitcoin mining.

For the ATI 4000 series, use SDK 2.1 and poclbm (April 28 version). Using phatk and a higher OpenCL SDK version on these cards will only lower the hash rate.
OCedHrt
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
July 07, 2011, 12:11:48 PM
 #70

7/6 kernel seems to have the following effects for me:

4850 - elf kernel is much larger, but slight speed increase of ~3-4% from 56-57 @ 460 core to 58-59 @ 460 core. also seems to run cooler. since my card is overheating I have to underclock. cooler kernel means higher clock at 480 - getting 60. Using -v -w 128. worksize 128 is still optimal, 64 is slightly slower and 256 is much slower < 50mhash.

6450 - again elf kernel is much larger, but slight speed decrease at worksize 128. previous kernel was optimal at 128 but this one is optimal at 64. worksize 64 with new kernel is equivalent to worksize 128 with old kernel. potentially runs cooler but did not check. this is a fanless card but only gets 32mhash.

My 4850 @ 675 core 250 mem gets 85MH/s. 0.32% stale rate at DeepBit. (bought from eBay, dont know what brand, came with zalman cooler. anything higher than 680 core will cause it to stop hashing even if overvolted). Temps at 71 degrees celsius, closed case. Been running Milkyway@Home for more than a year at the same settings before I switched it to bitcoin mining.

For ATI 4000 series, use SDK 2.1 and poclbm (April 28 version). Using phatk and higher opencl sdk version on these cards will only lower the hash rate.

You misread my post. I am running at 460 core because the card is at 105C at that speed; I cannot run it any faster. I can run 480 core with this new kernel. Btw, the days of SDK 2.1 and poclbm are nearly over. I get 84MH/s at 675 core and 494 mem. I can't do 250 mem because the card doesn't downclock that far with Afterburner; at 250 mem it would be even higher. However, I can actually clock 700+, though only for a few seconds.

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 07, 2011, 03:20:41 PM
 #71

New version 2011-07-07 is ready: http://www.mediafire.com/?7j70gnmllgi9b73

This is mainly a bugfix release for SDK 2.1 with some code restructuring to save a few writes and additions. I cannot guarantee that this really works for 2.1, because I didn't test it. If you are unsure, wait for users to test it for you and consider applying this patch later!

By the way, I want to thank all of those who donated a few Bitcents to me, feels great!

Thanks,
Dia

PS.: If it works, please post here and consider a small donation @ 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM :).

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Saturn7
Full Member
***
Offline Offline

Activity: 147
Merit: 100



View Profile
July 07, 2011, 03:29:29 PM
 #72

Went from 433 MH/s to 440 MH/s on a 5870 overclocked to 970MHz.
Thanks :) Donation sent.

First there was Fire, then Electricity, and now Bitcoins ;)
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 07, 2011, 03:30:10 PM
 #73

This kernel will cause poclbm to exit after running for a while.
I have tested several times; after a few hours poclbm will be gone and my machine is left there doing nothing.

AMD5870 with SDK 2.5
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 07, 2011, 03:37:03 PM
 #74

This kernel will cause poclbm to exit after running for a while.
I have tested several times, after few hours poclbm will be gone and my machine left there doing nothing

AMD5870 with SDK 2.5

That's the first report I get with that problem. Any other poclbm users with that observation?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 07, 2011, 03:45:03 PM
 #75

This kernel will cause poclbm to exit after running for a while.
I have tested several times, after few hours poclbm will be gone and my machine left there doing nothing

AMD5870 with SDK 2.5

That's the first report I get with that problem. Any other poclbm users with that observation?

Dia

I reported it twice in this same thread...

It only happens with one of my cards (I have 4 5870s, and another one of an identical model works fine). In phoenix it reports a kernel error and then continues mining (I suspect phoenix reloads the kernel).


phorensic
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500



View Profile
July 07, 2011, 03:56:54 PM
 #76

Excellent work on this kernel.  It seems like the original author got to a point where he thought he had improved performance to almost the max, but you are progressing very nicely!
CYPER
Hero Member
*****
Offline Offline

Activity: 798
Merit: 502



View Profile
July 07, 2011, 07:16:33 PM
 #77

Well finally I can see some improvement.

With today's version I went from 1748 to 1758, so a ~10 MH/s increase.

This is for 4x 5870 @ 960Mhz Core & 300Mhz Memory
SDK 2.1
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 07, 2011, 08:01:50 PM
 #78

Well finally I can see some improvement.

With todays version I got from 1748 to 1758 so ~10Mhash increase.

This is for 4x 5870 @ 960Mhz Core & 300Mhz Memory
SDK 2.1


Seems like good news for SDK 2.1 users, right ;)?

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
CYPER
Hero Member
*****
Offline Offline

Activity: 798
Merit: 502



View Profile
July 07, 2011, 08:11:12 PM
 #79

Well, to be perfectly honest and objective, a 0.5% increase won't make any difference, even with my setup ::) At least not in terms of financial benefits :)

But I don't mean to belittle your work - well done :)
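
For reference, the percentage behind that remark can be worked out from the figures posted in this thread. A small sketch (the function name is made up for illustration):

```python
def percent_gain(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# 4x 5870 on SDK 2.1: 1748 -> 1758 MH/s (figures from the post above)
gain_4x5870 = percent_gain(1748, 1758)
assert 0.57 < gain_4x5870 < 0.58

# Stock-voltage 5850 on SDK 2.1: 423.7 -> 425.9 MH/s (reported earlier)
gain_5850 = percent_gain(423.7, 425.9)
assert 0.51 < gain_5850 < 0.53
```

Both reports land at a little over half a percent, which matches the "0.5%" estimate.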
erek
Newbie
*
Offline Offline

Activity: 36
Merit: 0


View Profile
July 07, 2011, 09:40:20 PM
 #80

808.6 MH/sec max now (up 1-2% at least) from 7-6-11 to 7-7-11
zimpixa
Member
**
Offline Offline

Activity: 98
Merit: 10


View Profile
July 07, 2011, 10:16:10 PM
 #81

SDK 2.1 working for 5h now.

The newest version seems more stable (smaller max-min delta). Speed is about 0.5 MH/s higher, but that could just be my impression.

teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 07, 2011, 10:36:13 PM
 #82

Excellent.

It looks like this SDK 2.1 bugfix is popular!  Perhaps now people will stop telling me to use SDK 2.4.

Thanks Diapolo for the fix and I'm glad I was able to help.
bmgjet
Member
**
Offline Offline

Activity: 98
Merit: 10


View Profile
July 07, 2011, 11:23:51 PM
 #83

Version 2011-07-06 gives the best overall speed, 0.5 MH/s faster than my modded one, but opening Firefox drops the speed from 277 down to 233, whereas mine only drops 2 MH/s. That is probably just the way phatk works, since I've never used the stock one.

Version 2011-07-07 is all over the place. It jumps between 255 and 280 without Firefox open, so I don't know if it's faster or slower. With Firefox open it's more stable than the older version and drops to 252-258.

I'm using it with poclbm.exe -v -w 256 (128 gave the same result) on a 6850 overclocked/underclocked and SDK 2.1.

Donations to: 1BMGjetfht9XLkGBYR4TSsuXjrYEKACcow
1stbits: 1bmgjet
300MHash/s 6850 http://www.techpowerup.com/gpuz/5u6wr/
Overclocked for 6 years and still strong http://valid.canardpc.com/show_oc.php?id=1931458 & http://valid.canardpc.com/show_oc.php?id=285337
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 08, 2011, 12:55:40 AM
 #84

I have a feature request

Can you please make it work with intel openCL?

thank you
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 08, 2011, 05:24:10 AM
 #85

Excellent.

It looks like this SDK 2.1 bugfix is popular!  Perhaps now people will stop telling me to use SDK 2.4.

Thanks Diapolo for the fix and I'm glad I was able to help.


It was only possible to fix it that fast because you showed me the log files and error output! So we helped each other, thanks too!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 08, 2011, 05:25:46 AM
 #86

version 2011-07-06 gives best over all speed. 0.500 faster then my modded one but opening firefox drops speed from 277 down to 233 where my one only drops 2mh/s. Probably just the way phatk works since iv never used the stock one.

version 2011-07-07 is all over the place. jumps between 255-280 without firefox open so don't know if its faster or slower. With firefox its more stable then older version and drops to 252-258

Im using it with poclbm.exe -v -w 256 (128 gave same result) on 6850 overclocked/underclocked and 2.1 SDK.

I noticed a MH/sec drop while using Firefox, too. It has to be related to the new GPU acceleration that FF implemented in 4.0 and up.

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 08, 2011, 05:27:00 AM
 #87

I have a feature request

Can you please make it work with intel openCL?

thank you

I know that Intel recently released an OpenCL gold SDK ... the kernel uses standard OpenCL commands and an AMD extension only for BFI_INT / BITALIGN. I see no reason why it should not work. Have you got some error logs for me? You know that with Intel it will only use the CPU!?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
kwaaak
Full Member
***
Offline Offline

Activity: 139
Merit: 100


View Profile
July 08, 2011, 09:55:22 AM
 #88

Thanks a lot
r3v3rs3
Newbie
*
Offline Offline

Activity: 22
Merit: 0


View Profile
July 08, 2011, 10:12:33 AM
 #89

2011-07-03 -> 2011-07-07

Wheezy x64, 11.4, SDK 2.4:

Box #1:

- HD5750, 875/300, AGGRESSION=11, 175 MH/s -> 176 MH/s

Box #2:

- HD5750, 900/300, AGGRESSION=11, 181 MH/s -> 183 MH/s
- HD5770, 950/300, AGGRESSION=11, 215 MH/s -> 217 MH/s

XP 32, 11.7 preview, SDK 2.5:

- HD5770, 1000/300, AGGRESSION=12, 222 MH/s -> 223 MH/s
- HD5830, 1050/300, AGGRESSION=9, 337 MH/s -> 337 MH/s

phatk w/ Ma patch -> 2011-07-07

Natty x32, 11.6, SDK 2.4:

- HD5750, 900/300, AGGRESSION=9, 176 MH/s -> 176 MH/s
- HD5770, 950/1200 (going to be RBE'ed to 300 ;)), AGGRESSION=9, 204 MH/s -> 204 MH/s
- HD5830, 1000/300, AGGRESSION=9, 312 MH/s -> 313 MH/s

phoenix 1.50 w/ common flags: VECTORS BFI_INT FASTLOOP=false WORKSIZE=256

Nice work, sent some bitcents to 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM. :)
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 08, 2011, 11:57:07 AM
 #90

Guys, I introduced a small glitch, which produces an OpenCL compiler warning in version 07-07. For stability reasons please change line 77:

old:
u W[123];

new:
u W[124];

I missed sharound(123), which writes to W[123]; with the old declaration that index is out of range, so the write is undefined. Sorry for that!
Will upload a fixed version shortly (only includes the change above and stays 07-07).
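
The off-by-one behind this fix can be illustrated with a short Python sketch (not the kernel code itself; the helper names are made up): an array declared with 123 elements has valid indices 0 to 122, so a write to index 123 needs a declaration of at least 124 elements.

```python
def highest_valid_index(length):
    """Highest index that may be used with an array of `length` elements."""
    return length - 1


def required_length(highest_index_written):
    """Smallest array length that makes `highest_index_written` valid."""
    return highest_index_written + 1


assert highest_valid_index(123) == 122   # u W[123] ends at W[122]
assert required_length(123) == 124       # writing W[123] needs u W[124]

# Python checks bounds and raises; OpenCL C does not, so the same
# mistake there can silently touch memory outside the array.
w = [0] * 123
try:
    w[123] = 1
    out_of_range = False
except IndexError:
    out_of_range = True
assert out_of_range
```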

Edit:
Download 07-07 fixed: http://www.mediafire.com/?o7jfp60s7xefrg4

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Tx2000
Full Member
***
Offline Offline

Activity: 182
Merit: 100



View Profile
July 08, 2011, 02:55:09 PM
 #91

Tried it out for the first time and I do see some noticeable gains.



5850 @ 970/350
GUIMiner 2011-7-1

poclbm opencl - 410-413Mh
phoenix phatk - 408-413Mh

with your modified kern - 415-416.4Mh



Sent a little something your way.
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 08, 2011, 06:28:38 PM
 #92

I have a feature request

Can you please make it work with intel openCL?

thank you

I know that Intel recently released and OpenCL gold SDK ... the kernel uses standard OpenCL commands and an AMD extension only for BFI_INT / BITALIGN. I see no reason why it should not work. Have you got some error logs for me? You know that for Intel it will only use the CPUs!?

Dia
I have tried.  poclbm will not run at all.  It crashes on startup.

I took a look at the code.  I think the comments are messy and some are not really helpful.  Do you think it's a good idea to 'fix' the comments?  Btw, add a comment on why you type cast the hex values, so other developers won't think it's unnecessary and remove it.
error
Hero Member
*****
Offline Offline

Activity: 588
Merit: 500



View Profile
July 08, 2011, 09:58:24 PM
 #93

These two changes have taken my 5850s from 345MHash/sec to 360MHash/sec. Very nice.

3KzNGwzRZ6SimWuFAgh4TnXzHpruHMZmV8
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 08, 2011, 11:57:40 PM
 #94

I have a feature request

Can you please make it work with intel openCL?

thank you

I know that Intel recently released and OpenCL gold SDK ... the kernel uses standard OpenCL commands and an AMD extension only for BFI_INT / BITALIGN. I see no reason why it should not work. Have you got some error logs for me? You know that for Intel it will only use the CPUs!?

Dia
I have tried.  Poclum will not run at all.  It crashed upon starting.

I took a look at the code.  I think the comments are messy and some not really helpful.  Do you think its an good idea to 'fix' the comment?  btw comment why you type cast the hex value so other developer wont think its unnecessary and remove it.

Hex values are type-cast so that the kernel works with AMD SDK 2.1, which throws an error if they are NOT type-cast.
I don't understand what you want to tell me with the "comments are messy" part.

If you get an error log with Intel SDK please post it here, so I can have a look at it.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 08, 2011, 11:58:43 PM
 #95

These two changes have taken my 5850s from 345MHash/sec to 360MHash/sec. Very nice.

5830s seem like THE card for my modded kernel. Great to hear :).

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 09, 2011, 02:04:58 PM
 #96

Hi Diapolo!

Great to see you're making progress!
There's one thing that caught my eye:

you already do:
if(Vals[7].x == -H[7])

Why not fold K[60] right into it and remove it from the instruction above? That saves a whole instruction and will work 100% ;-)

if(Vals[7].x == -H[7]-K[60])


Lets see if I can find more ..
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 09, 2011, 02:45:37 PM
 #97

Hi Diapolo!

Great to see you're making progress!
There's one thing that pops into my eye:

you already do:
if(Vals[7].x == -H[7])

why not add the K[60] right into it and remove from upper instruction? Saves a whole instruction and will work 100% ;-)

if(Vals[7].x == -H[7]-K[60])


Lets see if I can find more ..


Good idea, and it works. I can't verify it via KernelAnalyzer, but it seems to save one vector addition.
It will be included in the next version!

Dia

Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 09, 2011, 02:59:02 PM
 #98

Another addition waiting to be removed:

 Vals[7] = (Vals[3] = (u)0xb956c25b + D1 + s1(4) + ch(4)) + H1;

-> D1 is only used here, so why not add (u)0xb956c25b during precalculation? ;)

Add

self.state2[3] = np.uint32(self.state2[3] + 0xb956c25b);

to __init__.py, line 77 for me, right behind:

self.calculateF(data)

And remove (u)0xb956c25b from kernel.cl

This also works 100%, no logic change involved here.
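
A minimal sketch of why this is safe, assuming the kernel's uint math behaves like addition modulo 2^32. The function names and the s1_ch argument (standing in for s1(4) + ch(4)) are made up for illustration; only the constant comes from the post above.

```python
MASK = 0xFFFFFFFF  # keep every result in the unsigned 32-bit range

CONST = 0xB956C25B  # the constant quoted in the post above


def add32(*terms):
    """Add terms modulo 2**32, like uint arithmetic in the kernel."""
    return sum(terms) & MASK


def kernel_before(d1, s1_ch):
    # Old layout: the constant is added on the GPU for every nonce.
    return add32(CONST, d1, s1_ch)


def host_precalc(d1):
    # New layout: the constant is folded into D1 once, on the host.
    return add32(d1, CONST)


def kernel_after(d1_pre, s1_ch):
    # The kernel now only adds the nonce-dependent part.
    return add32(d1_pre, s1_ch)


# Arbitrary example values, chosen so the sum wraps around.
d1, s1_ch = 0xFFFFFFF0, 0x12345678
assert kernel_before(d1, s1_ch) == kernel_after(host_precalc(d1), s1_ch)
```

Because modular addition is associative and commutative, it does not matter where the constant is added, only that it is added exactly once.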

bcforum
Full Member
***
Offline Offline

Activity: 140
Merit: 100


View Profile
July 09, 2011, 05:35:47 PM
 #99


What parameters are you using with phatk?

On my normally aspirated R6970 Lightning I get 419.xMH/s (Ma fix in poclbm):

Code:
-k poclbm DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=64 VECTORS FASTLOOP=false

The fastest I've ever gotten with phatk is 403.xMH/s

Code:
-k phatk DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=256 VECTORS FASTLOOP=false

Any suggestions?

If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
huayra.agera
Full Member
***
Offline Offline

Activity: 154
Merit: 100



View Profile
July 09, 2011, 06:08:03 PM
 #100

I hope you can make an optimization for the next OCL version (v2.5 (684.212)) available in beta form already. =)

BTC: 1JMPScxohom4MXy9X1Vgj8AGwcHjT8XTuy
indio007
Full Member
***
Offline Offline

Activity: 224
Merit: 100


View Profile
July 09, 2011, 06:09:16 PM
 #101

Vince, I edited the code like you said and I got errors.

Diapolo, what you did works fine. I have two 5830s that I'm testing with now. Expect some Bit.love if all goes well.
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 09, 2011, 07:34:44 PM
Last edit: July 09, 2011, 08:13:28 PM by Vince
 #102

Vince i editted the code like you said and I got errors.

Which one of the changes did you try, both?

Tell me about the error you got, just "does not work" helps nobody!
indio007
Full Member
***
Offline Offline

Activity: 224
Merit: 100


View Profile
July 09, 2011, 09:07:32 PM
 #103

Vince i editted the code like you said and I got errors.

Which one of the changes did you try, both?

Tell me about the error you got, just "does not work" helps nobody!

In Kernel.cl
I changed this:
if(Vals[7].x == -H[7])

to this
if(Vals[7].x == -H[7]-K[60])

and changed this

Vals[7] += K[60] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

to this
Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);


Then i changed this
Vals[7] = (Vals[3] = (u)0xb956c25b + D1 + s1(4) + ch(4)) + H1;

to this

Vals[7] = (Vals[3] = D1 + s1(4) + ch(4)) + H1;


as instructed here, and yes, I added the line to __init__.py

Quote
Add

self.state2[3] = np.uint32(self.state2[3] + 0xb956c25b);

to __init__.py, line 77 for me, right behind:

self.calculateF(data)

And remove (u)0xb956c25b from kernel.cl


The error says that OpenCL is showing unusual behavior, or something like that. The miner shows the MH/s rate etc., but just when it seems about to accept a share, it spits out that error.
1bitc0inplz
Member
**
Offline Offline

Activity: 112
Merit: 10


View Profile
July 10, 2011, 03:20:39 AM
 #104

This most recent update is everything you said it would be!

My 5830 went from 295 MH/s to 305 MH/s with just this update!

Thanks for the great work.

Mine @ http://pool.bitp.it - No fees, virtually 0 stales, what's not to love!
Chat with us @ #bitp.it on irc.freenode.net
Learn more about our pool @ http://forum.bitcoin.org/index.php?topic=12181.0
gominoa
Newbie
*
Offline Offline

Activity: 17
Merit: 0


View Profile
July 10, 2011, 07:19:09 AM
 #105

In Kernel.cl
I changed this:
if(Vals[7].x == -H[7])

to this
if(Vals[7].x == -H[7]-K[60])

Try also changing:
if(Vals[7].y == -H[7])
... to ...
if(Vals[7].y == -H[7]-K[60])

Notice .y instead of .x. It will be just below the .x line.
indio007
Full Member
***
Offline Offline

Activity: 224
Merit: 100


View Profile
July 10, 2011, 09:44:34 AM
 #106

In Kernel.cl
I changed this:
if(Vals[7].x == -H[7])

to this
if(Vals[7].x == -H[7]-K[60])

Try also changing:
if(Vals[7].y == -H[7])
... to ...
if(Vals[7].y == -H[7]-K[60])

notice Y instead of X. Will be just below the X line


Same error
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 10, 2011, 11:38:08 AM
 #107

The next kernel version will, once more, be faster for 69XX and 58XX cards :). Stay tuned!

Dia

Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 10, 2011, 01:12:49 PM
Last edit: July 10, 2011, 01:32:52 PM by Vince
 #108

Vals[7] += K[60] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

to this
Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);


is this a typo or did you leave out the "="?

should be:

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);


indio007
Full Member
***
Offline Offline

Activity: 224
Merit: 100


View Profile
July 10, 2011, 04:55:33 PM
 #109

That worked. I managed to go from 306-308 MH/s to 308-310 MH/s on a 5830.
Donations forthcoming if it remains stable. Thx...
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 10, 2011, 06:07:41 PM
 #110

Vals[7] += K[60] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

to this
Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);


is this a typo or did you leave out the "="?

should be:

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);



Do I need to change    if(Vals[7].y == -H[7])?
And    if(Vals[7] == -H[7])?
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 10, 2011, 06:13:45 PM
 #111

Hi Diapolo!

Great to see you're making progress!
There's one thing that pops into my eye:

you already do:
if(Vals[7].x == -H[7])

why not add the K[60] right into it and remove from upper instruction? Saves a whole instruction and will work 100% ;-)

if(Vals[7].x == -H[7]-K[60])


Lets see if I can find more ..

How would this save an instruction? Did you just move -K[60]?
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 10, 2011, 06:30:33 PM
 #112

how would this save an instruction? did you just move -K[60]?

Yes, I just moved it. Now the compiler optimizes it away; before, it didn't.

BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 10, 2011, 06:50:32 PM
 #113

how would this save an instruction? did you just move -K[60]?

Yes, just moved it. Now the compiler optimizes it away, before it didn't.


Mind explaining more? I don't get it
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 10, 2011, 07:02:44 PM
 #114

Mind explaining more? I don't get it

It's a constant.
If I add it to Vals[7] together with the other terms in round 124, it costs an extra addition, because it is the only constant in that sum.

If it is moved to the comparison at the end, the compiler merges the two constants H[7] and K[60] into one, so the comparison takes the same execution time as before.
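
A small sketch of the equivalence, assuming 32-bit wraparound arithmetic as in the kernel. H7 and K60 are the standard SHA-256 values for H[7] and K[60]; the function names and the "partial" argument (Vals[7] after round 124 without K[60]) are made up for illustration.

```python
MASK = 0xFFFFFFFF

H7 = 0x5BE0CD19   # SHA-256 initial hash value H[7]
K60 = 0x90BEFFFA  # SHA-256 round constant K[60]


def neg32(x):
    """Two's-complement negation in 32 bits, like -x on a uint."""
    return (-x) & MASK


def check_before(partial):
    # Old layout: K[60] is added inside the round (one runtime addition),
    # then the result is compared with -H[7].
    return ((partial + K60) & MASK) == neg32(H7)


# New layout: both constants are merged once, before the kernel runs.
TARGET = (neg32(H7) - K60) & MASK


def check_after(partial):
    # Only the comparison is left at runtime.
    return partial == TARGET


# Both checks agree for a hit and for a miss.
assert check_before(TARGET) and check_after(TARGET)
assert not check_before(0) and not check_after(0)
```

Whether the OpenCL compiler really folds -H[7]-K[60] into a single literal is the assumption discussed in this thread; the sketch only shows that the two comparisons accept exactly the same values.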
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 10, 2011, 07:09:28 PM
 #115

So i changed the whole thing to

   Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);
   
#ifdef VECTORS
   if(Vals[7].x == -H[7]-K[60])
      output[OUTPUT_SIZE] = output[(W[3].x >> 2) & OUTPUT_MASK] = W[3].x;
   if(Vals[7].y == -H[7]-K[60])
      output[OUTPUT_SIZE] = output[(W[3].y >> 2) & OUTPUT_MASK] =  W[3].y;
#else
   if(Vals[7] == -H[7]-K[60])
      output[OUTPUT_SIZE] = output[(W[3] >> 2) & OUTPUT_MASK] = W[3];
#endif

I did not notice any speed difference, hope that helps.

PS: Does the compiler really do the optimization? If not, you introduced one more step, because K[60] appears twice now.
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 10, 2011, 07:18:48 PM
 #116

I'm pretty sure the compiler will catch this ;)
Note that the speed increase is minimal, maybe ~0.1-0.2%.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 11, 2011, 02:10:13 PM
 #117

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems to be no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I do :D. This version is faster than all previous kernels (it uses the fewest ALU OPs for 69XX and 58XX). It should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donors, and thanks for your feedback!

Dia

Turix
Member
**
Offline Offline

Activity: 76
Merit: 10



View Profile WWW
July 11, 2011, 02:31:01 PM
 #118

Gained about 1 MH/s (431 -> 432) from the 7th version to today's new version on my 5870 (950/315).

Bobnova
Full Member
***
Offline Offline

Activity: 210
Merit: 100


View Profile
July 11, 2011, 03:08:38 PM
 #119

I also gained about 1 MH/s from today's update compared to the previous update. This is on a 5830 at 875/900 on Linux.
The previous update made a big difference over what ships with Phoenix 1.50.

I sent a small donation, as you've helped me make more money :D

BTC:  1AURXf66t7pw65NwRiKukwPq1hLSiYLqbP
OCedHrt
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
July 11, 2011, 04:29:14 PM
 #120

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Awesome work. Btw, could you explain why Vals[2] = C1; is needed in the beginning when it is reassigned in round 5? I know removing this line breaks the hashing but I'm not seeing why since it isn't used prior to round 5.

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 11, 2011, 04:51:53 PM
 #121

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Awesome work. Btw, could you explain why Vals[2] = C1; is needed in the beginning when it is reassigned in round 5? I know removing this line breaks the hashing but I'm not seeing why since it isn't used prior to round 5.

Various functions before the reassignment use Vals[2]. Search the kernel for Vals[ and you will see them. I tried to remove that assignment, but as you discovered, it is needed there :D.

Dia

teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 11, 2011, 05:06:15 PM
 #122

This does work with SDK 2.1 but it might be a tiny bit slower than your previous version.

HD5850, 1.0875V, 975MHz clock, 360 MHz RAM, aggression=14, worksize=256, Catalyst 11.6 (Linux)

SDK 2.1:   404.6 MH/s  ->  404.5 MH/s
SDK 2.4:   401.8 MH/s  ->  402.2 MH/s

Note that at aggression=14 my rate can sometimes drop as much as 1 MH/s suddenly before recovering but usually varies by 0.2 MH/s so the apparent decrease with SDK 2.1 could well be statistical noise.

I might also have to play with the RAM frequency again.
OCedHrt
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
July 11, 2011, 05:14:01 PM
 #123

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Awesome work. Btw, could you explain why Vals[2] = C1; is needed in the beginning when it is reassigned in round 5? I know removing this line breaks the hashing but I'm not seeing why since it isn't used prior to round 5.

Various functions before the reassignment use Vals[2] search the kernel for Vals[ and you will see them. I tried to remove that assignment, but as you discovered, it is needed there Cheesy.

Dia

I only searched for Vals[2] so did not see the % XD

Very small increase for me on this one: 278 -> 278.5. Interestingly, a lower memory clock on my 6870 actually has a detrimental effect. Downclocking from 1050 to 800 reduces the hash rate by 0.5 MH/s. I can't clock it any lower, so I don't know whether 300 would be better or not.

dishwara
Legendary
*
Offline Offline

Activity: 1855
Merit: 1016



View Profile
July 11, 2011, 05:22:22 PM
 #124

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

I wish & hope you will still find out some ways to get more hashes.
Thanks.
dikidera
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
July 11, 2011, 05:23:18 PM
 #125

At 2 megahashes, your device produces around 2 to 2.5 million hashes per second. If we halve that to 1 megahash, that's still 1.25 million hashes per second; if we halve that again, around 750 thousand per second.
So an increase of 0.2% or so yields around 100 thousand more hashes per second.
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 11, 2011, 05:39:13 PM
 #126

Don't give up! There's still more to optimize; I'm at 1694 ALU OPs (HD6970) at the moment.
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 11, 2011, 06:19:54 PM
 #127

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Previous patches had solved one of my cards crashing the miner every 30 minutes or so, but this one has reintroduced the problem. One miner mining a 5870 crashed after 20 minutes.


BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 11, 2011, 06:28:47 PM
 #128

I wonder why we need const uint D1.
It is only used once.
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 11, 2011, 06:50:09 PM
 #129

I wonder why do we need const uint D1.
It is only use once.

It's part of the precalculation. It's needed.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 11, 2011, 07:31:01 PM
 #130

Dont give up! There's still more to optimize, I'm at 1694 ALU OPs (HD6970) at the moment.

I can't read or edit Python, so yes, there is room if one could alter or add some more kernel arguments.
The strange thing is that I saw some additions of known values, which I tried to eliminate via constants, but this led to lower kernel performance. I played around with this today and saw no more improvement ... too bad, it was real fun the last few days!
If you would like to share your work, we will all be happy :). What is your kernel doing for 58XX cards? I thought it made no sense to optimize for one over the other, so I tried to reduce the ALU OP count for both platforms.

Dia

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 11, 2011, 07:31:55 PM
 #131

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

Previous patches had solved one of my cards crashing the miner every 30 minutes or so, but this one has reintroduced the problem. One miner mining a 5870 crashed after 20 minutes.

Sorry for that, but I have no idea what would cause this. Perhaps the card is faulty? Does Furmark "crash" the card or show artifacts?

Dia

error
Hero Member
*****
Offline Offline

Activity: 588
Merit: 500



View Profile
July 11, 2011, 08:25:28 PM
 #132

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

This one seems to act rather strangely.

On 2011-07-06 I had a pretty consistent 360MHash/sec with little variation. With 2011-07-11 the card is varying dramatically between 355-363, and usually at the lower end of that. Overall it may be slightly slower.

The cards are 5850s clocked at 900/300, using 11.4 and SDK 2.4 on Linux.

I'm going to let it run a while longer.

3KzNGwzRZ6SimWuFAgh4TnXzHpruHMZmV8
pennytrader
Sr. Member
****
Offline Offline

Activity: 254
Merit: 250


View Profile
July 11, 2011, 09:22:54 PM
 #133

On 5830 + SDK 2.1, it's slightly slower than the previous version. Guess I'll revert it back.

please donate to 1P3m2resGCP2o2sFX324DP1mfqHgGPA8BL
CYPER
Hero Member
*****
Offline Offline

Activity: 798
Merit: 502



View Profile
July 11, 2011, 09:53:22 PM
 #134

On 5830 + SDK 2.1, it's slightly slower than the previous version. Guess I'll revert it back.

Same here. With the previous version my average was 1758 and now it is 1756.

This is for 4x 5870 @ 960Mhz Core & 300Mhz Memory
SDK 2.1
Ubuntu 32bit
wazoo42
Newbie
*
Offline Offline

Activity: 42
Merit: 0


View Profile
July 11, 2011, 10:18:46 PM
 #135

7/4/11   = a 1-2 MH/s increase
7/6/11   = 0 increase (maybe slight decrease)
7/11/11  = 1-2 MH/s further increase over 7/4/11

These are on 2x 5830s, and 3x 5770s using ati-drivers-11.6, phoenix-1.50, pyopencl-0.92, and ati-stream-sdk-bin-2.4.
error
Hero Member
*****
Offline Offline

Activity: 588
Merit: 500



View Profile
July 11, 2011, 10:21:38 PM
 #136

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

This one seems to act rather strangely.

On 2011-07-06 I had a pretty consistent 360MHash/sec with little variation. With 2011-07-11 the card is varying dramatically between 355-363, and usually at the lower end of that. Overall it may be slightly slower.

The cards are 5850s clocked at 900/300, using 11.4 and SDK 2.4 on Linux.

I'm going to let it run a while longer.

Sorry, but this is still slower; my cards were running around 355MH/sec and even down to 350. Went back to 2011-07-06.

3KzNGwzRZ6SimWuFAgh4TnXzHpruHMZmV8
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 11, 2011, 11:31:37 PM
 #137

Decreasing hashrates ... that's really strange. These 58xx cards sometimes behave quite strangely.

I can't test it, because all my rigs run on 6950s unlocked to 6970s.
erek
Newbie
*
Offline Offline

Activity: 36
Merit: 0


View Profile
July 11, 2011, 11:32:18 PM
 #138

Download version 2011-07-11: http://www.mediafire.com/?k404b6lqn8vu6z6

This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I Cheesy. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!

Thanks to all donators and your feedback!

Dia

This one seems to act rather strangely.

On 2011-07-06 I had a pretty consistent 360MHash/sec with little variation. With 2011-07-11 the card is varying dramatically between 355-363, and usually at the lower end of that. Overall it may be slightly slower.

The cards are 5850s clocked at 900/300, using 11.4 and SDK 2.4 on Linux.

I'm going to let it run a while longer.

Sorry, but this is still slower; my cards were running around 355MH/sec and even down to 350. Went back to 2011-07-06.

I totally disagree, each version has been getting faster and faster for me. 7-11-11 is the fastest yet for me.
Wildvest
Newbie
*
Offline Offline

Activity: 41
Merit: 0


View Profile WWW
July 12, 2011, 12:25:55 AM
 #139

2011-11-11 i can now report a 0.5-1% increase over the improved phatk kernel
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 12, 2011, 02:27:20 AM
 #140

So here are some more changes:

I introduced const uint W17_2, containing P1(19) + 0x11002000. That trades 3 shifts, 2 XORs and 1 add against one extra parameter, which is well worth it.

extended self.f:
self.f = np.zeros(5, np.uint32)
to
self.f = np.zeros(6, np.uint32)

just after W17 calculation in calculateF:
        #W17_2
        self.f[5] = np.uint32(0x11002000+(
                rot(self.f[2], 32-13) ^
                rot(self.f[2], 32-15) ^
                (self.f[2] >> 10)
                ))

Then I added the parameter (right after W17) to both the kernel call and the kernel function.

=> Effectively 3 ops saved.
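
A standalone sketch of that host-side precalculation, written with plain Python integers instead of np.uint32. It assumes that rot() in __init__.py rotates right, so that rot(x, 32-13) and rot(x, 32-15) are the rotate-right-by-19 and rotate-right-by-17 of the SHA-256 small sigma1 function; the names rotr and w17_2 are made up for illustration.

```python
MASK = 0xFFFFFFFF


def rotr(x, n):
    """Rotate a 32-bit value right by n bits (assumed behavior of rot)."""
    return ((x >> n) | (x << (32 - n))) & MASK


def w17_2(w17):
    # Same expression as in the post, with w17 standing in for self.f[2]:
    # two rotations, one shift, two XORs, then the constant is added.
    p1 = rotr(w17, 32 - 13) ^ rotr(w17, 32 - 15) ^ (w17 >> 10)
    return (0x11002000 + p1) & MASK


# With an all-zero input only the constant remains.
assert w17_2(0) == 0x11002000
```

The value depends only on data that is fixed for a whole work unit, so computing it once per work unit on the host and passing it as one extra kernel argument removes those operations from every hash attempt.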

next change:
You can cut out all of W0 to W14! Most of them are zero anyway; I just needed to hardcode the first ones.
Also, W[73] to W[78] are not used anymore after some small changes, so there is no need to initialize them.

=> Less memory use, but the same speed for me.

Next one:
Round 3

#ifdef VECTORS
        Vals[4] = (W_3 = ((base + get_global_id(0)) << 1) + (uint2)(0, 1)) + PreVal4;
#else
        Vals[4] = (W_3 = base + get_global_id(0)) + PreVal4;
#endif

--
        // Round 3
        Vals[0] = state0 + Vals[4];
        Vals[4] += T1;

--

W[64 - O] = state0 + Vals[0];

you can reorganize and shorten round 3 to:
        Vals[0] = T1 + Vals[4];

needed changes in precalculation:
Preval4 += T1
T1 = state0 - T1

=> Another addition saved, almost effortlessly.
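
A quick check that the reorganized round 3 produces the same Vals[0] and Vals[4], assuming all values are 32-bit unsigned and wrap around like uint in the kernel. The function names are made up for illustration; the variable names follow the post.

```python
MASK = 0xFFFFFFFF


def round3_before(w3, preval4, state0, t1):
    # Original kernel sequence: three additions per nonce.
    vals4 = (w3 + preval4) & MASK
    vals0 = (state0 + vals4) & MASK
    vals4 = (vals4 + t1) & MASK
    return vals0, vals4


def precalc(preval4, state0, t1):
    # Host side, once per work unit: PreVal4 += T1, T1 = state0 - T1.
    # Both new values are derived from the OLD T1.
    return (preval4 + t1) & MASK, (state0 - t1) & MASK


def round3_after(w3, preval4_new, t1_new):
    # Reorganized kernel sequence: two additions per nonce.
    vals4 = (w3 + preval4_new) & MASK
    vals0 = (t1_new + vals4) & MASK
    return vals0, vals4


# Arbitrary example values; the results must match exactly.
w3, preval4, state0, t1 = 0xDEADBEEF, 0x0BADF00D, 0xCAFEBABE, 0xFEEDFACE
new_preval4, new_t1 = precalc(preval4, state0, t1)
assert round3_before(w3, preval4, state0, t1) == round3_after(w3, new_preval4, new_t1)
```

The T1 that was added to PreVal4 cancels against the -T1 hidden in the new T1, which is why Vals[0] still comes out as state0 + W_3 + PreVal4.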


here the files with these changes:
http://www.filesonic.com/file/1423103594

still some more to come!



TurdHurdur
Full Member
***
Offline Offline

Activity: 216
Merit: 100


View Profile
July 12, 2011, 02:36:59 AM
 #141

2011-11-11 i can now report a 0.5-1% increase over the improved phatk kernel
How'd you get this future kernel?
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 12, 2011, 05:57:08 AM
Last edit: July 12, 2011, 06:44:28 AM by hugolp
 #142

Quote from: hugolp
Previous patches had solved one of my cards crashing the miner every 30 minutes or so, but this one has reintroduced the problem. One miner mining a 5870 crashed after 20 minutes.

Sorry for that, but I have no idea, what would cause this. Perhaps the card is faulty? Will Furmark "crash" the card or show artifacts?

Dia

I'm using the previous patch, which is almost as fast, on that card, so it's OK.

I'm wondering as well whether the card has some kind of problem, but with other kernels it has been running non-stop for days without a problem. I don't know why some kernels trigger the crash.


Dubs420
Newbie
*
Offline Offline

Activity: 20
Merit: 0


View Profile
July 12, 2011, 06:48:31 AM
 #143

I just tried your latest kernel in GUIMiner with the poclbm miner: each GPU went from 417 to 419. Great work, thanks. I was able to tweak a little more, up to 420.6-421.0, using -f 1.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 12, 2011, 09:38:26 AM
 #144

With the ideas that Vince gave us, I was able to lower the ALU OP usage even further. This means the next version will speed things up for 69XX and 58XX again :).
Thank you Vince. I didn't use all of your changes (some seem to reduce kernel speed, even if they look good), but I merged the ones I like and verified everything with KernelAnalyzer.

Edit: The drawback is that you will need to replace the Phoenix __init__.py file, so it won't be easily usable for non-Phoenix users, sorry for that (some init values changed)!

Dia

teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 12, 2011, 11:26:22 AM
 #145

With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again Smiley.
Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer.

Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)!

Dia

You say that some of Vince's changes seem to reduce kernel speed but it looks like actual speed gain/loss is very much card dependent.  That being the case, which cards are you using for testing?
pandemic
Sr. Member
****
Offline Offline

Activity: 434
Merit: 250


View Profile
July 12, 2011, 12:14:46 PM
 #146

My 5830 went from 304mh/s to 307mh/s. Small increase, but why not? Smiley
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 12, 2011, 12:55:02 PM
 #147

With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again Smiley.
Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer.

Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)!

Dia

You say that some of Vince's changes seem to reduce kernel speed but it looks like actual speed gain/loss is very much card dependent.  That being the case, which cards are you using for testing?


I own a 5870 and a 5830, and use AMD KernelAnalyzer to get info for 69XX cards. As you can see, I focused on those cards during my own tests. I could get info for more cards via AMD KA, but it seems hard to optimize one kernel for all cards Smiley.

Dia

kbsbtc
Newbie
*
Offline Offline

Activity: 53
Merit: 0


View Profile
July 12, 2011, 01:05:59 PM
 #148

I was getting only 0.01% stales. With your patch I see an average increase of 10 MH/s on my 5830 (295 -> 305), but my stale count is now around 3-4%.
Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 12, 2011, 01:28:58 PM
 #149

I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....

So .. what about some more information?  Sad

Pool? Version used? Clock speeds? 5830 @ 305 seems to be somewhat overclocked ..
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 12, 2011, 02:12:18 PM
 #150

I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....

Perhaps the kernel pushes your card harder and it generates errors. But it could also be the pool, the driver and so on, like Vince said.

Dia

jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
July 12, 2011, 03:18:25 PM
 #151

I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....

Perhaps the kernel pushes your card harder and it generates errors. But could be the pool, driver and so on, like Vince said.

Dia

Stales have nothing to do with GPU errors in Phoenix. All results returned by the GPU are verified before being sent to the server, which means that if the kernel finds invalid shares you will get "Unusual behavior from OpenCL. Hardware problem?" instead of the share being submitted. If you are not getting any of these errors, there is NO WAY a kernel change can affect the number of stale shares. It might affect the total number of shares submitted in a given time period, but every share sent to the server by Phoenix is verified to be H == 0 on the CPU beforehand.
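
For readers who want to see what such a CPU-side check amounts to, here is a minimal Python sketch. It is not the Phoenix source: the function names are made up for illustration, and it assumes that "H == 0" refers to the last 32-bit word of the final SHA-256 state, which corresponds to the last four bytes of the double SHA-256 digest of the 80-byte block header. The genesis block header is used only as a publicly known test input.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """Bitcoin's block hash: SHA-256 applied twice, returned as a raw digest."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def passes_h_zero_check(header76: bytes, nonce: int) -> bool:
    """Re-hash the header with the candidate nonce and test H == 0.

    Assumption: H is the last 32-bit word of the final SHA-256 state,
    i.e. the last four bytes of the digest.
    """
    header = header76 + struct.pack("<I", nonce)
    return double_sha256(header)[-4:] == b"\x00" * 4


# First 76 bytes of the Bitcoin genesis block header (everything but the nonce).
GENESIS_HEADER76 = (
    struct.pack("<I", 1)                 # version
    + bytes(32)                          # previous block hash (all zero)
    + bytes.fromhex(
        "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b"
    )[::-1]                              # merkle root, stored byte-reversed
    + struct.pack("<I", 1231006505)      # timestamp
    + struct.pack("<I", 0x1D00FFFF)      # compact difficulty target
)
GENESIS_NONCE = 2083236893

if __name__ == "__main__":
    # A nonce reported by the GPU is only worth submitting if this is True.
    print(passes_h_zero_check(GENESIS_HEADER76, GENESIS_NONCE))
    print(passes_h_zero_check(GENESIS_HEADER76, GENESIS_NONCE + 1))
```

A nonce that fails this check would be the kind of result that triggers the hardware-problem message instead of a submission.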

Phoenix Miner developer

Donations appreciated at:
1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 12, 2011, 03:24:45 PM
 #152

I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....

Perhaps the kernel pushes your card harder and it generates errors. But could be the pool, driver and so on, like Vince said.

Dia

Stales have nothing to do with GPU errors in Phoenix. All results returned by the GPU are verified before being sent to the server, which means if the kernel finds invalid shares you will get "Unexpected behavior from OpenCL. Hardware problem?" instead of the share being submitted. If you are not getting any of these errors there is NO WAY a kernel change can affect the number of stale shares. It might affect the total number of shares submitted in a given time period, but every share sent to the server by Phoenix is verified to be H == 0 on the CPU beforehand.

Thanks for clarification! Good to know, this was new for me.

Dia

talldude
Member
**
Offline Offline

Activity: 224
Merit: 10


View Profile
July 12, 2011, 03:55:17 PM
 #153

cool, went from 349 (original improved kernel) to 350.3 with latest. Keep 'em coming Cheesy
kbsbtc
Newbie
*
Offline Offline

Activity: 53
Merit: 0


View Profile
July 12, 2011, 06:25:37 PM
 #154

I was getting only .01% stales, with your patch I have an increase of 10mhash on average with my 5830 ( 295->305) but my stale count is now around 3-4%.....

Perhaps the kernel pushes your card harder and it generates errors. But could be the pool, driver and so on, like Vince said.

Dia

Stales have nothing to do with GPU errors in Phoenix. All results returned by the GPU are verified before being sent to the server, which means if the kernel finds invalid shares you will get "Unexpected behavior from OpenCL. Hardware problem?" instead of the share being submitted. If you are not getting any of these errors there is NO WAY a kernel change can affect the number of stale shares. It might affect the total number of shares submitted in a given time period, but every share sent to the server by Phoenix is verified to be H == 0 on the CPU beforehand.

Thanks for clarification! Good to know, this was new for me.

Dia

Thanks for the heads up. I am running 4 5830s on 1 box clocked at 950/300 pointed at bitcoins.lc. It seems the stale count went up for me. I didn't mean to blame you or anything, just saying that was what happened. I'll look into it on the pool end though.
dishwara
Legendary
*
Offline Offline

Activity: 1855
Merit: 1016



View Profile
July 12, 2011, 06:33:27 PM
 #155

=> another addition almost effortless

here the files with these changes:
http://www.filesonic.com/file/1423103594

still some more to come!
Thanks.
It increased 440 to 443 in 5870 @ 975/325 Windows.
431-434 in 6970 & 5870 @ 975/1375 & 984/300 Ubuntu - Smartcoin

With the inclusion of __init__.py, I hope there will still be some room to tweak.
jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
July 12, 2011, 06:34:48 PM
 #156


Stales have nothing to do with GPU errors in Phoenix. All results returned by the GPU are verified before being sent to the server, which means if the kernel finds invalid shares you will get "Unexpected behavior from OpenCL. Hardware problem?" instead of the share being submitted. If you are not getting any of these errors there is NO WAY a kernel change can affect the number of stale shares. It might affect the total number of shares submitted in a given time period, but every share sent to the server by Phoenix is verified to be H == 0 on the CPU beforehand.

Thanks for clarification! Good to know, this was new for me.

Dia

Thanks for the heads up. I am running 4 5830s on 1 box clocked at 950/300 pointed at bitcoins.lc. It seems the stale count went up for me, didn't mean to blame you or anything just saying that was what happened. I'lll look into on the pool end though.

My post was mainly intended to clarify that stale shares are not a good measurement for kernel changes.

A much more reliable test would be to count the total number of shares submitted over a long period (say 24 hours or so). This includes stales, since the goal is to test how many shares the kernel finds, not how many the server accepts. If this number is higher than without the kernel modifications, you know that it's helping.
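
To get a feel for how long such a share-counting test has to run, here is a rough Python sketch. It assumes difficulty-1 shares (one share per 2^32 hashes on average) and treats share finding as a Poisson process; the function names are illustrative only and not part of any miner.

```python
import math

HASHES_PER_SHARE = 2 ** 32  # expected hashes per difficulty-1 share (assumption)


def expected_shares(hashrate_hs: float, seconds: float) -> float:
    """Average number of shares found at a given hash rate over a time span."""
    return hashrate_hs * seconds / HASHES_PER_SHARE


def relative_noise(shares: float) -> float:
    """One-sigma relative scatter of a Poisson count with this mean."""
    return 1.0 / math.sqrt(shares)


def shares_needed(gain: float, sigmas: float = 2.0) -> float:
    """Shares per run so that a fractional gain stands out from the noise.

    Two runs of about N shares differ by gain * N on average, while the
    difference of two Poisson counts scatters by about sqrt(2 * N).
    Solving gain * N >= sigmas * sqrt(2 * N) gives N >= 2 * (sigmas / gain)^2.
    """
    return 2.0 * (sigmas / gain) ** 2


if __name__ == "__main__":
    day = 24 * 3600
    n = expected_shares(305e6, day)  # 305 MH/s, the 5830 figure quoted earlier
    print(f"shares in 24 h: {n:.0f}, noise: {relative_noise(n):.2%}")
    print(f"shares per run to resolve a 1% gain: {shares_needed(0.01):.0f}")
```

Under these assumptions, the smaller the expected gain, the longer both runs need to be before the share counts can be told apart.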

BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 12, 2011, 11:55:24 PM
 #157

With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again Smiley.
Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer.

Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)!

Dia
what about for poclbm? there is no __init__.py
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 13, 2011, 05:08:30 AM
 #158

With the ideas that Vince gave to us, I was able to lower the ALU OP usage even further. This means next version will speed up things for 69XX and 58XX again Smiley.
Thank you Vince, I didn't need all your changes (some seem to reduce kernel speed, even if they look good), but merged the ones I like and verified everything with KernelAnalyzer.

Edit: Drawback is, that you will need to replace the Phoenix __init__.py file, so it won't be easy usable for non Phoenix users, sorry for that (some init values changed)!

Dia
what about for poclbm? there is no __init__.py

That's what my edit was about Wink. It's all a matter of how much time I and others have and current focus is on Phoenix, because that's my main miner software.
Perhaps some mods can be done without new init values, so they will work without new __init__.py. But then I have to take care of 2 kernel versions. For now there is no need to worry, new version is not out Smiley.

Dia

hchc
Hero Member
*****
Offline Offline

Activity: 504
Merit: 500



View Profile
July 13, 2011, 05:34:33 AM
 #159

>I modified the shipping phatk Kernel from Phoenix 1.50. I now get round about 9-10 MHash/s more on my 5830 (up >from 310 to 319/320)!

I would really like to replicate this. Currently getting 310 Mh with 2.1 + 11.5 with bitless Ma() changes. Using the 7-11 I only get a small jump to 311. Can you share the config you use? sdk/driver version etc Thanks.

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 13, 2011, 05:57:00 AM
 #160

>I modified the shipping phatk Kernel from Phoenix 1.50. I now get round about 9-10 MHash/s more on my 5830 (up >from 310 to 319/320)!

I would really like to replicate this. Currently getting 310 Mh with 2.1 + 11.5 with bitless Ma() changes. Using the 7-11 I only get a small jump to 311. Can you share the config you use? sdk/driver version etc Thanks.


- Win7 X64 SP1
- Cat 11.7 with SDK 2.4 and Runtime 2.4 (in order to be able to use AMD APP KernelAnalyzer)
- Sapphire 5830 Xtreme @ 1000 MHz core / 350 MHz Mem
- Phoenix 1.5: aggression 12, vectors, bfi_int

Dia

jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
July 13, 2011, 08:41:00 AM
 #161

>I modified the shipping phatk Kernel from Phoenix 1.50. I now get round about 9-10 MHash/s more on my 5830 (up >from 310 to 319/320)!

I would really like to replicate this. Currently getting 310 Mh with 2.1 + 11.5 with bitless Ma() changes. Using the 7-11 I only get a small jump to 311. Can you share the config you use? sdk/driver version etc Thanks.


That might be tough since phatk doesn't work very well on older SDK versions. It was designed to be used with SDK 2.4, and on 2.1 I get better results with poclbm. The Ma() changes also apply to poclbm, so you won't see a gain there.

Uzza
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
July 13, 2011, 11:54:08 PM
Last edit: July 14, 2011, 09:42:39 AM by Uzza
 #162

I just did a quick comparison against poclbm for me.

On my dedicated 5870:
poclbm     SDK 2.1: ~424
phatk       SDK 2.1: <420
phatk imp SDK 2.1: ~432

poclbm     SDK 2.4: <424
phatk       SDK 2.4: ~424
phatk imp SDK 2.4: ~437

So your improvements made phatk better than poclbm on SDK 2.1, and way better on SDK 2.4.

The init optimizations gave me a minor boost of ~0.5 MH/s over 2011-07-11.
PcChip
Sr. Member
****
Offline Offline

Activity: 418
Merit: 250


View Profile
July 14, 2011, 02:03:05 AM
 #163

Uzza, two questions:

1.) what is "phatk imp"

2.) Surely you meant a 5870 instead of a 4870?  Either that or you must have four 4870's to hit 430 MH/s!

Legacy signature from 2011: 
All rates with Phoenix 1.50 / PhatK
5850 - 400 MH/s  |  5850 - 355 MH/s | 5830 - 310 MH/s  |  GTX570 - 115 MH/s | 5770 - 210 MH/s | 5770 - 200 MH/s
CYPER
Hero Member
*****
Offline Offline

Activity: 798
Merit: 502



View Profile
July 14, 2011, 04:17:30 AM
 #164

Uzza, two questions:

1.) what is "phatk imp"

2.) Surely you meant a 5870 instead of a 4870?  Either that or you must have four 4870's to hit 430 MH/s!

1 - Phatk Improved - it's what this topic is all about.
2 - Most probably he meant 5870  Wink
Uzza
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
July 14, 2011, 09:43:38 AM
 #165

1 - Phatk Improved - it's what this topic is all about.
2 - Most probably he meant 5870  Wink
What this guy said.
nico_w
Newbie
*
Offline Offline

Activity: 12
Merit: 0


View Profile
July 14, 2011, 04:27:41 PM
 #166

Sadly it only gives me 1Mhash/s increase from 344 to 345 on my server, but keep up the good work!
MiningBuddy
Hero Member
*****
Offline Offline

Activity: 927
Merit: 1000


฿itcoin ฿itcoin ฿itcoin


View Profile
July 15, 2011, 11:33:29 AM
Last edit: July 15, 2011, 12:02:21 PM by MiningBuddy
 #167

Testing the 2011-07-11 kernel.7z

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.1
Before: [428.56 Mhash/sec] -> After: [432.28 Mhash/sec]
Stales before: 0.22% -> Stales after: 3.48%
Over a 4 hour test period

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.4
Before: [422.34 Mhash/sec] -> After: [189.58 Mhash/sec]
Stales: Not tested due to adverse effects.

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 15, 2011, 12:15:11 PM
 #168

Testing the 2011-07-11 kernel.7z

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.1
Before: [428.56 Mhash/sec] -> After: [432.28 Mhash/sec]
Stales before: 0.22% - > Stales after: 3.48%
Over a 4 hour test period

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.4
Before: [422.34 Mhash/sec] -> After: [189.58 Mhash/sec]
Stales: Not tested due to adverse affects.

Well that is very strange, but at least you are able to mine faster with SDK2.1 and the current kernel version ^^.
Btw. I had other things to do, but during the next week I will release a new version.

Dia

MiningBuddy
Hero Member
*****
Offline Offline

Activity: 927
Merit: 1000


฿itcoin ฿itcoin ฿itcoin


View Profile
July 15, 2011, 12:26:44 PM
 #169

Testing the 2011-07-11 kernel.7z

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.1
Before: [428.56 Mhash/sec] -> After: [432.28 Mhash/sec]
Stales before: 0.22% - > Stales after: 3.48%
Over a 4 hour test period

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.4
Before: [422.34 Mhash/sec] -> After: [189.58 Mhash/sec]
Stales: Not tested due to adverse affects.

Well that is very strange, but at least you are able to mine faster with SDK2.1 and the current kernel version ^^.
Btw. I had other things to do, but during the next week I will release a new version.

Dia
Awesome, I look forward to it.
I think the rejected shares were random variance on my side, as the rate seems to have settled down to a more realistic 0.88%.

jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
July 15, 2011, 07:47:25 PM
 #170

Testing the 2011-07-11 kernel.7z

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.1
Before: [428.56 Mhash/sec] -> After: [432.28 Mhash/sec]
Stales before: 0.22% - > Stales after: 3.48%
Over a 4 hour test period

XFX 5870 @ 940/300
Ubuntu 11.04, ATI Drivers 11.5, SDK 2.4
Before: [422.34 Mhash/sec] -> After: [189.58 Mhash/sec]
Stales: Not tested due to adverse affects.

Well that is very strange, but at least you are able to mine faster with SDK2.1 and the current kernel version ^^.
Btw. I had other things to do, but during the next week I will release a new version.

Dia
Awesome, I look forward to it.
I think the rejected shares was random variance from my side as it seems to have settled down to a more realistic 0.88%.

Keep in mind that OpenCL kernel changes have NO effect on stale shares (aside from the VERY small difference in the time it takes to run 1 execution of some number of hashes). All nonces found by the kernel to satisfy H == 0 are verified on the CPU prior to sending. Shares are also checked against the current known block before sending, in case new work was received while the kernel was executing. Basically this means that every share sent to the server is valid as far as the miner is concerned. If the OpenCL kernel is returning bad work, it will never be sent to the server, and instead you will get "Unusual behavior from OpenCL. Hardware problem?"

That said, changes to the python portion of a Phoenix kernel can increase stale shares if badly implemented. (see: FASTLOOP excessive stales with high aggression in older versions of Phoenix)

zimpixa
Member
**
Offline Offline

Activity: 98
Merit: 10


View Profile
July 16, 2011, 11:31:45 PM
 #171

Every new version up to 07~ was faster than the previous one on both of my rigs. The 11~ version, however, is faster on the single 5850, but on the 5870+5850 rig I'm noticing a minor slowdown (due to bigger deltas, meaning the best performance is the same, but it can drop a little lower). In addition, the primary GPU set to '-f0' stopped bottlenecking the other GPU (but changing from -f35 to -f0 didn't add any speed). After all tests I've changed the version on the single-GPU rig and left the old one on the dual-GPU rig.

coblee
Donator
Legendary
*
Offline Offline

Activity: 1653
Merit: 1286


Creator of Litecoin. Cryptocurrency enthusiast.


View Profile
July 17, 2011, 07:53:58 AM
 #172

Donation sent!

When you release the next version, please explain what's changed in __init__.py.
I'm using fpgaminer's poclbm w/ phatk: http://forum.bitcoin.org/index.php?topic=19169.0
I'd like to see if we can make your phatk kernel still work with that.

Thanks!

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 17, 2011, 01:00:49 PM
 #173

Important: since version 2011-07-17 a modified version of __init__.py (for the Phoenix 1.5 miner) is included in this package and has to be used!
The kernel won't work for other Miners without modifications to them, see kernel.cl for further infos.



The new version 2011-07-17 is ready for download Smiley. Should be faster on 58XX and 69XX cards again.
This version will only work, if you use it with Phoenix and the supplied __init__.py file because of modifications to kernel variables!

A very big thank you goes to user Vince for input and ideas!

Download here:
http://www.mediafire.com/?317u0y93u7mnbys

Have fun,
Dia

dikidera
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
July 17, 2011, 02:20:07 PM
 #174

With this version on my 5870: 410.58/411.6 -> 413.44/414Mhash/s

But on my HD5850, I see a decrease of at least 0.50 MH/s.
CYPER
Hero Member
*****
Offline Offline

Activity: 798
Merit: 502



View Profile
July 17, 2011, 02:24:58 PM
Last edit: July 17, 2011, 02:36:39 PM by CYPER
 #175

2011-07-07 was the best for me so far and with 2011-07-17 I see no improvements - the speed is 1758-1760 as before with the same fluctuations.

4x XFX 5870 @ 960Mhz Core & 300Mhz Memory
Ubuntu 32bit
SDK 2.1
11.5 Drivers

Peao
Legendary
*
Offline Offline

Activity: 1320
Merit: 1001


View Profile
July 17, 2011, 02:48:05 PM
 #176

I noticed an improvement in performance.

Thank you, Dia!
erek
Newbie
*
Offline Offline

Activity: 36
Merit: 0


View Profile
July 17, 2011, 04:04:50 PM
 #177

Solid increase equivalent to nearly 5MHz overclock w/ 2011-07-17

Catalyst 11.7 Early + SDK 2.5


905/340 GPU/VRAM clocks on (2x 6970s)

hitting 812+ MH/sec
dishwara
Legendary
*
Offline Offline

Activity: 1855
Merit: 1016



View Profile
July 17, 2011, 04:22:47 PM
 #178

Same hash rate as mod.zip. No "more" hashes.
bcforum
Full Member
***
Offline Offline

Activity: 140
Merit: 100


View Profile
July 17, 2011, 05:43:55 PM
 #179


I get the same speed with DiabloMiner

If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 17, 2011, 05:52:28 PM
 #180


I get the same speed with DiabloMiner

No need to use Phoenix or this kernel then Wink...

Dia

dadittox
Newbie
*
Offline Offline

Activity: 23
Merit: 0


View Profile
July 17, 2011, 06:17:58 PM
 #181

Updated from 2011-07-11 to 2011-07-17 kernel. Hash rate increased from 325 to 327 on 6950, and from 280 to 281 on 6870 all at stock clocks. Not a big increase but it's always nice to have a few mhashes for free. Keep up good work!
MiningBuddy
Hero Member
*****
Offline Offline

Activity: 927
Merit: 1000


฿itcoin ฿itcoin ฿itcoin


View Profile
July 17, 2011, 10:24:47 PM
 #182

Got an extra 1mhs from my cards with the latest patch, cheers  Smiley

BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 17, 2011, 11:32:27 PM
 #183

Can you make another version that includes all the recent changes and does not require __init__.py to be changed?
Thanks.
deepceleron
Legendary
*
Offline Offline

Activity: 1512
Merit: 1032



View Profile WWW
July 18, 2011, 01:16:17 AM
 #184

I get lots of 'unusual opencl behavior. hardware problem?' on the latest version, using a fresh Catalyst 11.6 driver (not the buggy CPU hotfix) on WinXP on two different miners. Going back to stock Phoenix and the 07-11 kernel is fine.
Tartarus
Newbie
*
Offline Offline

Activity: 47
Merit: 0


View Profile
July 18, 2011, 02:47:58 AM
 #185

Can you make another version that includes all the recent changes and __init__.py does not need to changed?
thanks

Why?  Just unzip in the kernels/phatk/ directory.
joulesbeef
Sr. Member
****
Offline Offline

Activity: 476
Merit: 250


moOo


View Profile
July 18, 2011, 03:09:03 AM
 #186

Quote
Can you make another version that includes all the recent changes and __init__.py does not need to changed?
thanks

Not sure of your reasoning, but if you want that, just grab the 7-11 version, the one before this latest.

You don't gain all that much from the changes in __init__.py. If you really want to do without, you won't be shooting yourself in the foot. So grab the last version.


Personally I appreciate even the smallest increase.. so thanks for the work

mooo for rent
coblee
Donator
Legendary
*
Offline Offline

Activity: 1653
Merit: 1286


Creator of Litecoin. Cryptocurrency enthusiast.


View Profile
July 18, 2011, 03:23:28 AM
 #187

Can you make another version that includes all the recent changes and __init__.py does not need to changed?
thanks

Why?  Just unzip in the kernels/phatk/ directory.

Main reason is some people like to use this kernel with poclbm. The 7-11 works, but the newest doesn't because it requires changes in __init__.py which is specific to phoenix. BOARBEAR is asking for a version that includes all the additional improvements since 7-11 that doesn't require any changes to __init__.py.

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 18, 2011, 05:23:18 AM
 #188

Can you make another version that includes all the recent changes and __init__.py does not need to changed?
thanks

Why?  Just unzip in the kernels/phatk/ directory.

Main reason is some people like to use this kernel with poclbm. The 7-11 works, but the newest doesn't because it requires changes in __init__.py which is specific to phoenix. BOARBEAR is asking for a version that includes all the additional improvements since 7-11 that doesn't require any changes to __init__.py.

It is impossible to create a version of this kernel that works without modifications to the main miner software. As I wrote, there are values and variables that the kernel uses, which are precalculated in the miner software and then passed as parameters to the kernel. A miner that doesn't pass the required parameters will not work without being modified, sorry.

You guys are free to mod the kernel yourselves: revert the changes that require modded miner software and keep only the ones that work without it.
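
As an illustration of what "precalculated in the miner software" can mean, here is a minimal Python sketch of the best-known case, the SHA-256 midstate. The first 64 bytes of a block header do not contain the nonce, so the host can hash them once and hand the resulting state to the kernel. Which additional values the 2011-07-17 __init__.py precomputes is not stated in this thread, so this is an example of the principle under that assumption, not a reconstruction of Dia's changes. All function names are hypothetical.

```python
import hashlib
import struct
from math import isqrt

MASK = 0xFFFFFFFF


def _primes(count):
    """First `count` primes by trial division (enough for 64 values)."""
    found, candidate = [], 2
    while len(found) < count:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def _icbrt(n):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# SHA-256 constants: fractional bits of cube/square roots of the first primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
H0 = [isqrt(p << 64) & MASK for p in _primes(8)]


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """One SHA-256 compression round over a 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def midstate(header80):
    """Host-side precalculation: state after the nonce-free first 64 bytes."""
    return compress(H0, header80[:64])


def finish_first_hash(mid, header80):
    """What remains per nonce: the padded last 16 bytes, starting from midstate."""
    tail = header80[64:80] + b"\x80" + bytes(39) + struct.pack(">Q", 80 * 8)
    return struct.pack(">8I", *compress(mid, tail))


if __name__ == "__main__":
    header = bytes(range(80))  # dummy 80-byte header, for demonstration only
    resumed = finish_first_hash(midstate(header), header)
    print(resumed == hashlib.sha256(header).digest())
```

A kernel that expects such precomputed inputs cannot run under a miner that does not supply them, which is the dependency described in the post above.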

Dia

phorensic
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500



View Profile
July 18, 2011, 08:28:53 AM
 #189

New version 2011-07-17 getting a lot of:

Code:
2011-07-18 01:24:09: Listener for "bitcoinmonkey": [18/07/2011 01:24:09] Kernel error: Unusual behavior from OpenCL. Hardware problem?
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 18, 2011, 08:37:33 AM
 #190

New version 2011-07-17 getting a lot of:

Code:
2011-07-18 01:24:09: Listener for "bitcoinmonkey": [18/07/2011 01:24:09] Kernel error: Unusual behavior from OpenCL. Hardware problem?

I have never seen that message during my tests with the release version of the kernel. What OS, which SDK are you on?

dikidera
Full Member
***
Offline Offline

Activity: 126
Merit: 100


View Profile
July 18, 2011, 08:48:53 AM
 #191

New version 2011-07-17 getting a lot of:

Code:
2011-07-18 01:24:09: Listener for "bitcoinmonkey": [18/07/2011 01:24:09] Kernel error: Unusual behavior from OpenCL. Hardware problem?

I have never seen that message during my tests with the release version of the kernel. What OS, which SDK are you on?
I wonder if he copied over the init.py file.
phorensic
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500



View Profile
July 18, 2011, 09:03:54 AM
 #192

Yes I copied over the init file, because without it it won't even run.  Windows 7 64-bit, Catalyst 11.5, Stream 2.4.  Haven't had this message pop up with any other version of phatk kernel, nor any other kernel for that matter.
bcforum
Full Member
***
Offline Offline

Activity: 140
Merit: 100


View Profile
July 18, 2011, 02:34:20 PM
 #193

I get lots of 'unusual opencl behavior. hardware problem?' on the latest version, using a fresh Catalyst 11.6 driver (not CPU buggy hotfix) on WinXP on two different miners, going back to stock phoenix and 07-11 kernel is fine.

I had the same issue under windows which magically went away when I switched to Linux.

ahitman
Sr. Member
****
Offline Offline

Activity: 302
Merit: 250


View Profile
July 18, 2011, 03:15:25 PM
 #194

New version 2011-07-17 getting a lot of:

Code:
2011-07-18 01:24:09: Listener for "bitcoinmonkey": [18/07/2011 01:24:09] Kernel error: Unusual behavior from OpenCL. Hardware problem?

I had the same issue on my 5850, but when I took 5Mhz off my overclock the error went away, guessing it was picking up some errors from being pushed harder?
teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
July 18, 2011, 03:17:51 PM
 #195

I'm on Linux and I'm still able to get these error messages, even with earlier versions of the phatk patch from this thread.  For me they start happening if my clocks are too high and become more frequent as I increase the clocks.

For my 5850 at stock volts, 1015MHz doesn't generate any errors and with 1020 MHz they are very occasional but become more frequent as I ramp up to 1035MHz (my card always freezes at 1040MHz).

When I overvolted my 5850 to 1.25V and took it to 1110 MHz for 3 hours I got a few but noticed that I was generating good shares with the same OpenCL work request as was throwing errors so it seems the error doesn't invalidate the whole second and likely just a very small portion of it.  I have a screenshot here.

At lower voltages I never see these errors.  Either the card runs error free or crashes and the MHz line dividing these two states is pretty fine in my experience.

It's interesting that the kernel version might affect the frequency of such errors but they don't bother me.

Thanks for this latest kernel patch Diapolo.  My hash rates increased by 0.3 MH/s each.  2x5850: 722.2 MH/s -> 722.8 MH/s.  I'll send another small tip.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 18, 2011, 05:42:53 PM
 #196

I'm on Linux and I'm still able to get these error messages, even with earlier versions of the phatk patch from this thread.  For me they start happening if my clocks are too high and become more frequent as I increase the clocks.

For my 5850 at stock volts, 1015 MHz doesn't generate any errors, and at 1020 MHz they are very occasional but become more frequent as I ramp up to 1035 MHz (my card always freezes at 1040 MHz).

When I overvolted my 5850 to 1.25 V and took it to 1110 MHz for 3 hours I got a few errors, but I noticed that I was generating good shares from the same OpenCL work request that was throwing errors. So it seems the error doesn't invalidate the whole second of work, and likely affects just a very small portion of it.  I have a screenshot here.

At lower voltages I never see these errors.  Either the card runs error-free or it crashes, and the MHz line dividing these two states is pretty fine in my experience.

It's interesting that the kernel version might affect the frequency of such errors, but they don't bother me.

Thanks for this latest kernel patch, Diapolo.  My hash rates increased by 0.3 MH/s each.  2x5850: 722.2 MH/s -> 722.8 MH/s.  I'll send another small tip.

Very nice post with relevant information for all who encounter this error. Thanks for sharing :). I never had this error, because my 5830 clocks were (and are) never above 1000 MHz for the chip, so it seems logical to me.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
phorensic
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500



View Profile
July 18, 2011, 07:08:31 PM
 #197

I think I jumped the gun.  I believe I am having a real hardware problem.  It's only on one card, the hotter one of the group, and it's overclocked and over-volted to all hell.  I think what happened is that these errors were hidden from the console until this new kernel update!  If that's the case, kudos for making the errors work! haha.
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 18, 2011, 09:23:29 PM
 #198

I think I jumped the gun.  I believe I am having a real hardware problem.  It's only on one card, the hotter one of the group, and it's overclocked and over-volted to all hell.  I think what happened is that these errors were hidden from the console until this new kernel update!  If that's the case, kudos for making the errors work! haha.

Not necessarily. I had the same problem with a card in previous versions of the kernel. After 20 minutes it would produce that message in Phoenix or would crash poclbm. But with exactly the same configuration and later kernels it was solved, even though it was producing higher hash rates. I am not sure exactly why it happens.


               ▄████████▄
               ██▀▀▀▀▀▀▀▀
              ██▀
             ███
▄▄▄▄▄       ███
██████     ███
    ▀██▄  ▄██
     ▀██▄▄██▀
       ████▀
        ▀█▀
The Radix DeFi Protocol is
R A D I X

███████████████████████████████████

The Decentralized

Finance Protocol
Scalable
▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀██
██                   ██
██                   ██
████████████████     ██
██            ██     ██
██            ██     ██
██▄▄▄▄▄▄      ██     ██
██▀▀▀▀██      ██     ██
██    ██      ██     
██    ██      ██
███████████████████████

███
Secure
      ▄▄▄▄▄
    █████████
   ██▀     ▀██
  ███       ███

▄▄███▄▄▄▄▄▄▄███▄▄
██▀▀▀▀▀▀▀▀▀▀▀▀▀██
██             ██
██             ██
██             ██
██             ██
██             ██
██    ███████████

███
Community Driven
      ▄█   ▄▄
      ██ ██████▄▄
      ▀▀▄█▀   ▀▀██▄
     ▄▄ ██       ▀███▄▄██
    ██ ██▀          ▀▀██▀
    ██ ██▄            ██
   ██ ██████▄▄       ██▀
  ▄██       ▀██▄     ██
  ██▀         ▀███▄▄██▀
 ▄██             ▀▀▀▀
 ██▀
▄██
▄▄
██
███▄
▀███▄
 ▀███▄
  ▀████
    ████
     ████▄
      ▀███▄
       ▀███▄
        ▀████
          ███
           ██
           ▀▀

███
Radix is using our significant technology
innovations to be the first layer 1 protocol
specifically built to serve the rapidly growing DeFi.
Radix is the future of DeFi
█████████████████████████████████████

   ▄▄█████
  ▄████▀▀▀
  █████
█████████▀
▀▀█████▀▀
  ████
  ████
  ████

Facebook

███

             ▄▄
       ▄▄▄█████
  ▄▄▄███▀▀▄███
▀▀███▀ ▄██████
    █ ███████
     ██▀▀▀███
           ▀▀

Telegram

███

▄      ▄███▄▄
██▄▄▄ ██████▀
████████████
 ██████████▀
   ███████▀
 ▄█████▀▀

Twitter

██████

...Get Tokens...
shin234
Newbie
*
Offline Offline

Activity: 39
Merit: 0


View Profile
July 19, 2011, 01:25:44 AM
 #199

My stales are at / shares 4234 / stale (7, 0.17%) / after the latest version, using AOCLBF on two 5850s, one running at 840/300 and the other at 1000/300. The latter is because I got a card with a crappy part number from Sapphire.
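
For anyone who wants to check the percentage: the 0.17% figure is consistent with the two counts above. This is only a minimal sketch, assuming the 4234 shares are the accepted ones and the 7 stales are counted separately (the miner's exact formula is not stated in this thread, and `stale_percent` is a made-up helper name).

```python
def stale_percent(accepted: int, stale: int) -> float:
    """Share of submitted work that came back stale, in percent.

    Assumption: `accepted` does not include the stale shares.
    """
    submitted = accepted + stale
    if submitted == 0:
        return 0.0
    return 100.0 * stale / submitted


if __name__ == "__main__":
    # Counts taken from the post above.
    print(f"{stale_percent(4234, 7):.2f}%")
```

Dividing by the accepted count alone (7 / 4234) gives the same value after rounding to two decimals, so the assumption does not change the result here.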
BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 19, 2011, 04:10:23 AM
 #200

Can you make another version that includes all the recent changes and where __init__.py does not need to be changed?
thanks

Why?  Just unzip in the kernels/phatk/ directory.

The main reason is that some people like to use this kernel with poclbm. The 7-11 version works, but the newest doesn't, because it requires changes in __init__.py, which is specific to Phoenix. BOARBEAR is asking for a version that includes all the additional improvements since 7-11 but doesn't require any changes to __init__.py.

It is impossible to create a version of this kernel that works without modifications in the main miner software. As I wrote, there are values and variables that the kernel uses, which are precalculated in the miner software and then passed as parameters to the kernel. A miner which doesn't pass the required parameters will not work without being modified, sorry.

You guys are free to mod the kernel yourselves: revert the changes that require modded miner software and keep only the ones that work without it.

Dia
What I meant was: excluding the changes that need the modification of __init__.py, can you release a version that has all the other changes?
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 19, 2011, 05:37:38 AM
 #201

Can you make another version that includes all the recent changes and where __init__.py does not need to be changed?
thanks

Why?  Just unzip in the kernels/phatk/ directory.

The main reason is that some people like to use this kernel with poclbm. The 7-11 version works, but the newest doesn't, because it requires changes in __init__.py, which is specific to Phoenix. BOARBEAR is asking for a version that includes all the additional improvements since 7-11 but doesn't require any changes to __init__.py.

It is impossible to create a version of this kernel that works without modifications in the main miner software. As I wrote, there are values and variables that the kernel uses, which are precalculated in the miner software and then passed as parameters to the kernel. A miner which doesn't pass the required parameters will not work without being modified, sorry.

You guys are free to mod the kernel yourselves: revert the changes that require modded miner software and keep only the ones that work without it.

Dia
What I meant was: excluding the changes that need the modification of __init__.py, can you release a version that has all the other changes?

What I do here is just a hobby and I don't want it to take even more time, I hope you understand that. I can't maintain 2 different kernel versions, sorry.

Dia
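
To illustrate why a kernel that expects precalculated parameters cannot run in an unmodified miner, here is a minimal Python sketch. It is not the real phatk code: `host_precalc`, `kernel_plain` and `kernel_precalc` are hypothetical stand-ins, and the sample values are arbitrary. It only shows the general idea that a value which is constant for every nonce of a work unit can be computed once on the host, after which the kernel needs that value as an extra argument.

```python
MASK32 = 0xFFFFFFFF  # the kernel works on 32-bit unsigned integers


def add32(*terms: int) -> int:
    """Add with 32-bit wrap-around, like uint arithmetic in OpenCL."""
    return sum(terms) & MASK32


def host_precalc(state_word: int, round_const: int) -> int:
    """Hypothetical host-side step: fold two values that stay the same for
    every nonce of a work unit into a single kernel argument."""
    return add32(state_word, round_const)


def kernel_plain(state_word: int, round_const: int, nonce: int) -> int:
    """Stand-in for a kernel that redoes the constant addition per nonce."""
    return add32(state_word, round_const, nonce)


def kernel_precalc(folded: int, nonce: int) -> int:
    """Stand-in for a kernel that expects the folded value as a parameter."""
    return add32(folded, nonce)


if __name__ == "__main__":
    state_word, round_const = 0x6A09E667, 0x428A2F98  # arbitrary sample values
    folded = host_precalc(state_word, round_const)
    for nonce in (0, 1, 0xFFFFFFFF):
        assert kernel_precalc(folded, nonce) == kernel_plain(state_word, round_const, nonce)
    print("same results, one addition less per nonce")
```

A miner that still calls the kernel with the old argument list has no `folded` value to hand over, which is the situation described above for poclbm and the modified __init__.py.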

burningrave101
Newbie
*
Offline Offline

Activity: 55
Merit: 0


View Profile
July 19, 2011, 04:11:45 PM
 #202

Can you make another version that includes all the recent changes and where __init__.py does not need to be changed?
thanks

Why?  Just unzip in the kernels/phatk/ directory.

The main reason is that some people like to use this kernel with poclbm. The 7-11 version works, but the newest doesn't, because it requires changes in __init__.py, which is specific to Phoenix. BOARBEAR is asking for a version that includes all the additional improvements since 7-11 but doesn't require any changes to __init__.py.

It is impossible to create a version of this kernel that works without modifications in the main miner software. As I wrote, there are values and variables that the kernel uses, which are precalculated in the miner software and then passed as parameters to the kernel. A miner which doesn't pass the required parameters will not work without being modified, sorry.

You guys are free to mod the kernel yourselves: revert the changes that require modded miner software and keep only the ones that work without it.

Dia
What I meant was: excluding the changes that need the modification of __init__.py, can you release a version that has all the other changes?

Unless you have a good reason for needing to use poclbm over Phoenix (I currently can't think of one), there's no point in running poclbm when Phoenix is faster.
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 19, 2011, 04:21:30 PM
 #203

Unless you have a good reason for needing to use poclbm over Phoenix (I currently can't think of one), there's no point in running poclbm when Phoenix is faster.

Backup pools. It's a big plus (and peace of mind).

Also, for me poclbm is slightly faster than Phoenix with the same kernel.


burningrave101
Newbie
*
Offline Offline

Activity: 55
Merit: 0


View Profile
July 19, 2011, 04:43:58 PM
 #204

Unless you have a good reason for needing to use poclbm over Phoenix (I currently can't think of one), there's no point in running poclbm when Phoenix is faster.

Backup pools. It's a big plus (and peace of mind).

Also, for me poclbm is slightly faster than Phoenix with the same kernel.

Can't you just create another Phoenix miner on a different pool with a low aggression value and it will take over if your main pool worker goes idle?
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 19, 2011, 06:08:00 PM
 #205

Unless you have a good reason for needing to use poclbm over Phoenix (I currently can't think of one), there's no point in running poclbm when Phoenix is faster.

Backup pools. It's a big plus (and peace of mind).

Also, for me poclbm is slightly faster than Phoenix with the same kernel.

Can't you just create another Phoenix miner on a different pool with a low aggression value and it will take over if your main pool worker goes idle?

I tried this (I was a phoenix user until poclbm added backup pools), but the second miner would take some hashing power from the main one (big deal), and then if the main one went down it would not perform at full speed because the aggression was lower.


coblee
Donator
Legendary
*
Offline Offline

Activity: 1653
Merit: 1286


Creator of Litecoin. Cryptocurrency enthusiast.


View Profile
July 19, 2011, 06:27:27 PM
 #206

Can't you just create another Phoenix miner on a different pool with a low aggression value and it will take over if your main pool worker goes idle?

That's really not the same. It's also more hassle. I use poclbm because it's just as fast as Phoenix, has a better display, and has backup pools.

cyberlync
Full Member
***
Offline Offline

Activity: 226
Merit: 100



View Profile
July 19, 2011, 07:11:49 PM
Last edit: July 20, 2011, 05:49:21 PM by cyberlync
 #207

To above posters, AOCLBF with Phoenix will solve your backup pool problems.

edit: Forgot about the linux peeps, just shows how much of a WinBlows nab I am. Pardon me good folks.

Giving away your BTC's? Send 'em here: 1F7XgercyaXeDHiuq31YzrVK5YAhbDkJhf
coblee
Donator
Legendary
*
Offline Offline

Activity: 1653
Merit: 1286


Creator of Litecoin. Cryptocurrency enthusiast.


View Profile
July 19, 2011, 07:23:18 PM
 #208

To above posters, AOCLBF with Phoenix will solve your backup pool problems.

But I'm using linux.

ed64
Newbie
*
Offline Offline

Activity: 42
Merit: 0


View Profile
July 19, 2011, 07:56:10 PM
 #209

Latest poclbm from github made some tasks asynchronous and should bring it up to par with phoenix now as well.

Can't you just create another Phoenix miner on a different pool with a low aggression value and it will take over if your main pool worker goes idle?

That's really not the same. It's also more hassle. I use poclbm because it's just as fast as Phoenix, has a better display, and has backup pools.
hugolp
Legendary
*
Offline Offline

Activity: 1148
Merit: 1001


Radix-The Decentralized Finance Protocol


View Profile
July 20, 2011, 05:13:15 AM
 #210

To above posters, AOCLBF with Phoenix will solve your backup pool problems.

But I'm using linux.

I'm using Linux too and don't want a GUI. I manage my mining rig through ssh.


KKAtan
Newbie
*
Offline Offline

Activity: 50
Merit: 0


View Profile
July 20, 2011, 05:38:54 AM
 #211

here the files with these changes:
http://www.filesonic.com/file/1423103594
Thank you so much Vince. Your patch gives very nice performance for the HD 6870 cards I have, I sent a little something your way too.

Keep up the good work guys, you are awesome.
MegaBux
Newbie
*
Offline Offline

Activity: 31
Merit: 0


View Profile
July 20, 2011, 04:38:14 PM
 #212

I am running a 4x6770 rig, with all cards manufactured by Sapphire.  I'm also using Phoenix 1.5 with SDK 2.4 and Catalyst 11.6 drivers.  Each GPU is clocked to 960/800 at the stock voltage.

Prior to the patch, each GPU capped at 217 MH/s.  This is with the 3% phatk mod.  After the patch, I saw no difference until I reduced the memory clock to 300 MHz.  The GPUs now cap at about 220 MH/s.

I am wondering, though, if this is an accurate throughput measurement.  My pool is reporting lower-than-expected 24h rewards, and GPU temperatures are also 2 degrees cooler in this configuration.  I am also under the impression that 800 MHz is the lowest supported memory clock for this card.

Anyone else experience this phenomenon?
MegaBux
Newbie
*
Offline Offline

Activity: 31
Merit: 0


View Profile
July 20, 2011, 11:41:28 PM
 #213

I am wondering, though, if this is an accurate throughput measurement.  My pool is reporting lower-than-expected 24h rewards, and GPU temperatures are also 2 degrees cooler in this configuration.  I am also under the impression that 800 MHz is the lowest supported memory clock for this card.

Anyone else experience this phenomenon?

I just realized that the difficulty went up during the time I was testing; this explains the lower 24h rewards.  I'm still curious as to how a lower memory clock frequency could improve the hash rate though.
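
To put numbers on the difficulty effect, here is a rough sketch of the usual long-run estimate: on average a block takes difficulty * 2^32 hashes, and the block reward was 50 BTC at the time. The two difficulty values below are placeholders chosen for illustration, not the real values from that week, and pool fees, luck and stales are ignored.

```python
BLOCK_REWARD_BTC = 50.0           # block subsidy in mid-2011
SECONDS_PER_DAY = 86_400
HASHES_PER_DIFF1_BLOCK = 2 ** 32  # average hashes per block at difficulty 1


def expected_btc_per_day(hashrate_hps: float, difficulty: float) -> float:
    """Long-run average payout, ignoring pool fees, luck and stales."""
    blocks_per_day = hashrate_hps * SECONDS_PER_DAY / (difficulty * HASHES_PER_DIFF1_BLOCK)
    return blocks_per_day * BLOCK_REWARD_BTC


if __name__ == "__main__":
    rig = 4 * 220e6  # four GPUs at about 220 MH/s each, as in the post
    old_diff, new_diff = 1_500_000, 1_700_000  # placeholder difficulties
    before = expected_btc_per_day(rig, old_diff)
    after = expected_btc_per_day(rig, new_diff)
    print(f"before: {before:.3f} BTC/day, after: {after:.3f} BTC/day")
    print(f"change from difficulty: {100 * (after / before - 1):.1f}%")
    print(f"gain from 217 -> 220 MH/s: {100 * (220 / 217 - 1):.1f}%")
```

Going from 217 to 220 MH/s is a gain of roughly 1.4%, so even a moderate difficulty step is enough to hide it in the 24h rewards.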
bmgjet
Member
**
Offline Offline

Activity: 98
Merit: 10


View Profile
July 21, 2011, 03:33:14 AM
 #214

Lower heat and less power, for sure.
It's also possible that it does something with the timings.

Donations to: 1BMGjetfht9XLkGBYR4TSsuXjrYEKACcow
1stbits: 1bmgjet
300MHash/s 6850 http://www.techpowerup.com/gpuz/5u6wr/
Overclocked for 6 years and still strong http://valid.canardpc.com/show_oc.php?id=1931458 & http://valid.canardpc.com/show_oc.php?id=285337
Dubs420
Newbie
*
Offline Offline

Activity: 20
Merit: 0


View Profile
July 23, 2011, 07:55:45 AM
 #215

Sent you a little donation as thanks for your work.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 23, 2011, 09:37:33 AM
 #216

Reposted the 2011-07-17 version because of a small mistake in variable naming: T1substate0 was wrong, it has to be state0subT1.
There are no further changes that would make a difference for those who grabbed the version before this post!

Currently no news for you guys. Perhaps I can do a special version for 69XX cards (which could be 1 - 2 ALU OPs faster, but slower for 58XX) when there is demand. But for 58XX cards I'm out of optimisation ideas ;).

Dia

MiningBuddy
Hero Member
*****
Offline Offline

Activity: 927
Merit: 1000


฿itcoin ฿itcoin ฿itcoin


View Profile
July 23, 2011, 09:47:17 AM
 #217

perhaps I can do a special version for 69XX cards (which could be 1 - 2 ALU OPs faster, but slower for 58XX)

Yes, please do!  :D

xcooling
Member
**
Offline Offline

Activity: 145
Merit: 10


View Profile
July 23, 2011, 12:03:01 PM
 #218

69xx version would be wonderful ;-P

deepceleron
Legendary
*
Offline Offline

Activity: 1512
Merit: 1032



View Profile WWW
July 23, 2011, 03:40:20 PM
Last edit: July 24, 2011, 01:43:41 AM by deepceleron
 #219

I think I jumped the gun.  I believe I am having a real hardware problem.  It's only on one card, the hotter one of the group, and it's overclocked and over-volted to all hell.  I think what happened is that these errors were hidden from the console until this new kernel update!  If that's the case, kudos for making the errors work! haha.

Not necessarily. I had the same problem with a card in previous versions of the kernel. After 20 minutes it would produce that message in Phoenix or would crash poclbm. But with exactly the same configuration and later kernels it was solved, even though it was producing higher hash rates. I am not sure exactly why it happens.

The code that creates this error is in the phatk init file:

                    if not hash.endswith('\x00\x00\x00\x00'):
                        self.interface.error('Unusual behavior from OpenCL. '
                            'Hardware problem?')


The error is reported if the hash returned by OpenCL does not end with four zero bytes in the raw byte order checked here, which is the same as saying that the hash, as conventionally displayed, does not begin with zeros. The error means that the hash checking done in OpenCL thought the hash was valid and returned it, but this simple sanity check showed it was invalid. Either the hash-checking math was done wrong in OpenCL (saying that a bad hash was good, and perhaps silently discarding good hashes), or the correct hash is being corrupted when it is returned to the phatk core. Something about the 07-17 kernel seems to cause more errors on highly overclocked cards (perhaps it runs a different shader instruction, in silicon that is less tolerant of overclocking?), errors that were not produced before at the same clock speed, which reduces overclockability.
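
For anyone who wants to see the check in isolation, here is a small standalone Python 3 sketch of the same test. It is not the Phoenix code: `block_hash` and `passes_sanity_check` are names made up for this example, and the genesis block header is used only because it is a well-known header with a valid hash. The flipped bit is a stand-in for a GPU calculation error.

```python
import hashlib
import struct


def block_hash(header: bytes) -> bytes:
    """Double SHA-256 of an 80-byte block header, in raw digest byte order."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()


def passes_sanity_check(digest: bytes) -> bool:
    """Same test as in the snippet above: the last four digest bytes are zero."""
    return digest.endswith(b"\x00\x00\x00\x00")


# Genesis block header, used only as a well-known valid example.
GENESIS_HEADER = (
    struct.pack("<I", 1)                                        # version
    + bytes(32)                                                 # previous block hash
    + bytes.fromhex("4a5e1e4baab89f3a32518a88c31bc87f"
                    "618f76673e2cc77ab2127b7afdeda33b")[::-1]   # merkle root
    + struct.pack("<III", 1231006505, 0x1D00FFFF, 2083236893)   # time, bits, nonce
)

if __name__ == "__main__":
    good = block_hash(GENESIS_HEADER)
    print(good[::-1].hex())            # conventional display: the zeros come first
    print(passes_sanity_check(good))

    # Flip one bit of the nonce, as a stand-in for a hardware error.
    broken = bytearray(GENESIS_HEADER)
    broken[-1] ^= 0x01
    print(passes_sanity_check(block_hash(bytes(broken))))
```

The check is deliberately cheap: it does not compare against the full target, it only rejects results that cannot possibly be valid shares.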
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 23, 2011, 10:11:17 PM
 #220

69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less, down to 1697 :). Just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works :). Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower ones!

Dia

BOARBEAR
Member
**
Offline Offline

Activity: 77
Merit: 10


View Profile
July 23, 2011, 10:43:17 PM
 #221

69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less, down to 1697 :). Just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works :). Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower ones!

Dia

What's the rationale behind this?  It seems very weird to me that the compiler interprets the two statements differently.
indio007
Full Member
***
Offline Offline

Activity: 224
Merit: 100


View Profile
July 23, 2011, 11:49:00 PM
 #222

69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less, down to 1697 :). Just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works :). Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower ones!

Dia

Will it be slower on a 6850?
MiningBuddy
Hero Member
*****
Offline Offline

Activity: 927
Merit: 1000


฿itcoin ฿itcoin ฿itcoin


View Profile
July 24, 2011, 02:32:56 AM
 #223

69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less, down to 1697 :). Just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works :). Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower ones!

Dia
Thanks, gave me a 0.31 MH/s increase per core on my 6990s  8)

Vince
Newbie
*
Offline Offline

Activity: 38
Merit: 0


View Profile
July 24, 2011, 03:29:33 AM
 #224

That's really weird. I can't see why the compiler generates better code with this statement, but it does. Verified it with SDK 2.4 myself.

Anyone noticed that during the last few rounds a lot of variables are calculated, but never used again?

I took the time to optimize all unused statements out, but got no speed improvements at all. Seems like the compiler optimized it anyway.

Here it is - from round 124 to the #ifdef VECTORS preprocessor command

Code:
        sharound(121);

        // Optimized out all unused calculations
        W[122 - O] = P4(122) + P3(122) + P2(122) + P1(122);
        Vals[1] += K[58] + Vals[5] + W[122 - O] + s1(122) + ch(122);
        Vals[0] += K[59] + Vals[4] + s1(123) + ch(123) + P4(123) + P3(123) + P2(123) + P1(123);
        Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

#ifdef VECTORS

Maybe someone can trick the compiler into making better code ;-)
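
One possible explanation for why removing the unused statements changes nothing (a reading of the SHA-256 structure, not something confirmed in this thread): if the kernel only tests the last 32-bit word of the final hash, that word depends only on the `e` register after round 61 of 64, because `e` is merely shifted through `f`, `g` and `h` in the remaining three rounds. Everything else computed in the last rounds is dead code that a compiler can drop by itself. The sketch below is a plain Python SHA-256 written for this illustration (`run_rounds`, `last_word_early` and the other names are made up here), checked against `hashlib`.

```python
import hashlib
import struct
from math import isqrt

MASK = 0xFFFFFFFF


def icbrt(n: int) -> int:
    """Floor of the cube root of n, by binary search on integers."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def first_primes(count: int) -> list:
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes


PRIMES = first_primes(64)
# SHA-256 constants: fractional bits of the square/cube roots of the first primes.
H0 = [isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]


def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK


def run_rounds(block: bytes, rounds: int) -> list:
    """Run the first `rounds` SHA-256 rounds on one 64-byte block, starting from H0."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = H0
    for i in range(rounds):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [a, b, c, d, e, f, g, h]


def last_word_full(block: bytes) -> int:
    """Last output word after all 64 rounds."""
    return (H0[7] + run_rounds(block, 64)[7]) & MASK


def last_word_early(block: bytes) -> int:
    """Same word after 61 rounds: e only moves to f, g and h in rounds 62 to 64."""
    return (H0[7] + run_rounds(block, 61)[4]) & MASK


def pad(msg: bytes) -> bytes:
    """SHA-256 padding for messages that fit into a single block."""
    assert len(msg) <= 55
    return msg + b"\x80" + bytes(55 - len(msg)) + struct.pack(">Q", 8 * len(msg))


if __name__ == "__main__":
    msg = hashlib.sha256(b"any 32-byte input").digest()
    reference = struct.unpack(">8I", hashlib.sha256(msg).digest())[7]
    block = pad(msg)
    assert last_word_full(block) == reference
    assert last_word_early(block) == reference
    print("61 rounds are enough for the last output word")
```

Round 124 in the kernel's numbering would be round 61 of the second hash (64 + 60, counting from zero), which matches the point where the quoted code stops.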
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 24, 2011, 08:41:18 AM
 #225

69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less, down to 1697 :). Just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works :). Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower ones!

Dia

What's the rationale behind this?  It seems very weird to me that the compiler interprets the two statements differently.

You are not the only one :D, but the compiler says it's one ALU OP less!

Dia

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 24, 2011, 08:46:06 AM
 #226

69xx version would be wonderful ;-P

To all 69XX card owners who want 1 ALU OP less, down to 1697 :). Just edit the kernel.cl file and replace line 385 (download the latest 2011-07-17 version):

Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

with

Vals[7] = Vals[7] + Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

Please report if it works :). Remember, this WILL be slower for 58XX owners, so don't try this if you are on 58XX cards or even (s)lower ones!

Dia

Will it be slower on a 6850?

6850 is a VLIW5 design and will be slower ... really only 69XX cards!

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 24, 2011, 08:48:26 AM
 #227

That's really weird. I can't see why the compiler generates better code with this statement, but it does. Verified it with SDK 2.4 myself.

Anyone noticed that during the last few rounds a lot of variables are calculated, but never used again?

I took the time to optimize all unused statements out, but got no speed improvements at all. Seems like the compiler optimized it anyway.

Here it is - from round 124 to the #ifdef VECTORS preprocessor command

Code:
        sharound(121);

        // Optimized out all unused calculations
        W[122 - O] = P4(122) + P3(122) + P2(122) + P1(122);
        Vals[1] += K[58] + Vals[5] + W[122 - O] + s1(122) + ch(122);
        Vals[0] += K[59] + Vals[4] + s1(123) + ch(123) + P4(123) + P3(123) + P2(123) + P1(123);
        Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

#ifdef VECTORS

Maybe someone can trick the compiler into making better code ;-)

Hey Vince, I tried this myself a few days ago and didn't get better efficiency either ... perhaps I will have to throw it into the mixer again :D.
And thanks again for your work!

Dia

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 25, 2011, 02:03:38 PM
 #228

That's really weird. I can't see why the compiler generates better code with this statement, but it does. Verified it with SDK 2.4 myself.

Anyone noticed that during the last few rounds a lot of variables are calculated, but never used again?

I took the time to optimize all unused statements out, but got no speed improvements at all. Seems like the compiler optimized it anyway.

Here it is - from round 124 to the #ifdef VECTORS preprocessor command

Code:
        sharound(121);

        // Optimized out all unused calculations
        W[122 - O] = P4(122) + P3(122) + P2(122) + P1(122);
        Vals[1] += K[58] + Vals[5] + W[122 - O] + s1(122) + ch(122);
        Vals[0] += K[59] + Vals[4] + s1(123) + ch(123) + P4(123) + P3(123) + P2(123) + P1(123);
        Vals[7] += Vals[3] + P4(124) + P3(124) + P2(124) + P1(124) + s1(124) + ch(124);

#ifdef VECTORS

Maybe someone can trick the compiler into making better code ;-)

Hey Vince, I tried this myself a few days ago and didn't get better efficiency either ... perhaps I will have to throw it into the mixer again :D.
And thanks again for your work!

Dia

No chance. I tried different things and combinations, but the OpenCL compiler does it better than I do (again ^^).

Dia

iopq
Hero Member
*****
Offline Offline

Activity: 658
Merit: 500


View Profile
July 29, 2011, 02:07:11 PM
 #229

Phateus posted some improvements in his own kernel, check it out:
http://forum.bitcoin.org/index.php?topic=7964.0

unfortunately it doesn't run for me, so i can't check whether it's faster on my card
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 29, 2011, 06:48:07 PM
 #230

Phateus posted some improvements in his own kernel, check it out:
http://forum.bitcoin.org/index.php?topic=7964.0

unfortunately it doesn't run for me, so i can't check whether it's faster on my card

Thanks for pointing me to that thread!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
July 30, 2011, 06:14:41 PM
 #231

I'm working on a new version! The inputs came from the original Author of phatk, who released a version 2.0 of phatk (THANKS Phateus).
Currently my version IS slower, but I see this as a fair and cool competition, from which all of us will benefit in the end.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Tx2000
Full Member
***
Offline Offline

Activity: 182
Merit: 100



View Profile
July 30, 2011, 06:27:01 PM
 #232

I'm working on a new version! The inputs came from the original Author of phatk, who released a version 2.0 of phatk (THANKS Phateus).
Currently my version IS slower, but I see this as a fair and cool competition, from which all of us will benefit in the end.

Dia

Hopefully you can rework his version, because at the moment it doesn't work for me on Win7 / SDK 2.4 / latest GUIMiner. I replaced your phatk kernel with his and it just stays at connecting, spamming "idle worker" in the console.
pennytrader
Sr. Member
****
Offline Offline

Activity: 254
Merit: 250


View Profile
July 31, 2011, 11:12:27 AM
 #233

I'm working on a new version! The inputs came from the original Author of phatk, who released a version 2.0 of phatk (THANKS Phateus).
Currently my version IS slower, but I see this as a fair and cool competition, from which all of us will benefit in the end.

Dia

Actually your kernel is fast for me.

Catalyst 11.6 + SDK 2.1, 975/300 gives me 313 mhs (your kernel)

Catalyst 11.6 + SDK 2.4, 975/300 gives me 311 mhs (latest from the original author. SDK 2.1 doesn't work)

please donate to 1P3m2resGCP2o2sFX324DP1mfqHgGPA8BL
deepceleron
Legendary
*
Offline Offline

Activity: 1512
Merit: 1032



View Profile WWW
July 31, 2011, 07:14:16 PM
 #234

I tried the new kernel with phoenix/11.6/2.4/Win7/5830, and gave it lots of different command line options. I could only get it to be 'miner idle' five times a second, or 5Ghash/s with no share solves.
ssateneth
Legendary
*
Offline Offline

Activity: 1344
Merit: 1004



View Profile
July 31, 2011, 11:26:38 PM
 #235

I tried the new kernel with phoenix/11.6/2.4/Win7/5830, and gave it lots of different command line options. I could only get it to be 'miner idle' five times a second, or 5Ghash/s with no share solves.
this is the exact same problem I have too, but i use guiminer

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 01, 2011, 01:57:33 PM
 #236

I tried the new kernel with phoenix/11.6/2.4/Win7/5830, and gave it lots of different command line options. I could only get it to be 'miner idle' five times a second, or 5Ghash/s with no share solves.

Are you talking about my kernel mod or phatk 2.0? What are your command line options. Try to delete the *.elf files in the phatk directory!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
deepceleron
Legendary
*
Offline Offline

Activity: 1512
Merit: 1032



View Profile WWW
August 01, 2011, 05:17:10 PM
 #237

I tried the new kernel with phoenix/11.6/2.4/Win7/5830, and gave it lots of different command line options. I could only get it to be 'miner idle' five times a second, or 5Ghash/s with no share solves.

Are you talking about my kernel mod or phatk 2.0? What are your command line options. Try to delete the *.elf files in the phatk directory!

Dia
Sorry for the confusion, that's regarding the 2.0 phatk that came this weekend.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 01, 2011, 05:59:37 PM
 #238

I tried the new kernel with phoenix/11.6/2.4/Win7/5830, and gave it lots of different command line options. I could only get it to be 'miner idle' five times a second, or 5Ghash/s with no share solves.

Are you talking about my kernel mod or phatk 2.0? What are your command line options. Try to delete the *.elf files in the phatk directory!

Dia
Sorry for the confusion, that's regarding the 2.0 phatk that came this weekend.

Okay, but then please don't use this thread for phatk 2.0 support :). I think you understand that.

Regards,
Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
ssateneth
Legendary
*
Offline Offline

Activity: 1344
Merit: 1004



View Profile
August 02, 2011, 09:10:31 AM
 #239

Alright Diapolo, you've got your work cut out for you. Phateus fixed the GUIMiner bug (it was in his kernel: one too many #'s in __init__). You do great work; you might want to join forces at this point though ;).

Tx2000
Full Member
***
Offline Offline

Activity: 182
Merit: 100



View Profile
August 02, 2011, 08:21:31 PM
 #240

Alright diapolo, you got your work cut out for you. Phateus fixed the guiminer bug (It was in his kernel. 1 too many #'s in __init__). You do great work; you might want to join forces at this point though Wink

No way! Competition breeds innovation.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 02, 2011, 08:23:16 PM
Last edit: August 02, 2011, 08:43:09 PM by Diapolo
 #241

Hey guys, I'm making good progress for 69XX users and need a few to test my current version as I want to mature it before release.

- only 69XX users / users on VLIW4 design cards
- I need Win and Linux users
- I need SDK 2.1 / SDK 2.4 and SDK 2.5 users

Drop me a PM if you want to participate and meet the requirements, and I will send you a DL link! Once you have received the files, test results may be posted and freely discussed here, so there's no shitty NDA or anything :). I only want to make sure that the kernel works and can compete with phatk 2.1 :D.

I'm sorry to say, but 58XX users are better with phatk 2.1 for now :-/.

Regards,
Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 02, 2011, 08:26:46 PM
 #242

Alright diapolo, you got your work cut out for you. Phateus fixed the guiminer bug (It was in his kernel. 1 too many #'s in __init__). You do great work; you might want to join forces at this point though Wink

No way! Competition breeds innovation.

I like to look at his code and even to share ideas. I mean, Phateus uses parts of my mod for his rework of phatk 2.1 and I WILL do the same :).
So I share Tx2000's opinion and would like to keep our work separate ... I think this leaves more room for improvements and for competition :D.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 04, 2011, 07:32:46 PM
 #243

New version was just released, it should be the fastest for 69XX cards:
Download version 2011-08-04 (pre-release): http://www.mediafire.com/?upwwud7kfyx7788

This is the preferred switch for Phoenix in order to achieve comparable performance:
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=256
or
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=128

Please test this version with SDK 2.4 / SDK 2.5! SDK 2.1 performance seems worse, but at least it should work. Report any errors and problems here and let me know what you think.
Have a look at your cards' temperatures; I got a report that they may be lower, which would be great :).

Regards,
Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Dubs420
Newbie
*
Offline Offline

Activity: 20
Merit: 0


View Profile
August 04, 2011, 08:04:19 PM
 #244

New version was just released, it should be the fastest for 69XX cards:
Download version 2011-08-04 (pre-release): http://www.mediafire.com/?upwwud7kfyx7788

This is the preferred switch for Phoenix in order to achieve comparable performance:
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=256
or
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=128

Please test this version with SDK 2.4 / SDK 2.5! SDK 2.1 performance seems worse, but at least it should work. Report any errors and problems here and let me know what you think.
Have a look at your cards temperatures, I got a report, that they may be lower, which would be great Smiley.

Regards,
Dia

I just tried this on my 2 6970's Cat 11.7 SDK 2.4 Win 7 x64
Using poclbm on GUIminer with your 7-11 Kernel = 425.4 -v -f0 -w 128
Using phoenix on GUIminer with your 8-4 Kernel = 425.5 -k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=128
Side note: the Phoenix miner now stays more fully utilized, meaning less fluctuation. It ranges from 425.4 to 425.5, whereas with poclbm I was getting ranges of 424.8 to 425.4.

Keep up the great work.
mike678
Full Member
***
Offline Offline

Activity: 182
Merit: 100


View Profile
August 04, 2011, 08:06:28 PM
 #245

How does this compare to the other one Phateus has?
joulesbeef
Sr. Member
****
Offline Offline

Activity: 476
Merit: 250


moOo


View Profile
August 04, 2011, 08:09:51 PM
 #246

Quote
2011-08-04 (pre-release): 1368 ALU OPs (23 less compared to original phatk 1.X / Cal 11.7 profile)
is it only for 69xx?

I get a strange new error: "import error no module named bfipatcher"

mooo for rent
Dubs420
Newbie
*
Offline Offline

Activity: 20
Merit: 0


View Profile
August 04, 2011, 08:10:43 PM
 #247

How does this compare to the other one Phateus has?

I can't give you exact numbers as I don't feel like reinstalling kernel 2.1 but I do know it was slower for me than this latest Kernel from Diapolo
joulesbeef
Sr. Member
****
Offline Offline

Activity: 476
Merit: 250


moOo


View Profile
August 04, 2011, 09:23:07 PM
Last edit: August 05, 2011, 03:30:40 AM by joulesbeef
 #248

Ok found and downloaded a copy of bfipatcher.py

did you mention this dependency? was I supposed to have it already?

First run.. way way slow... oh i need vectors vectors2.. oops.

ok seems to be working.. will keep an eye on the stale count.

faster than 7-17 for me.. slower than phatk 2.1 edit... no rejects on phatk that was my pool being bad

mooo for rent
Tx2000
Full Member
***
Offline Offline

Activity: 182
Merit: 100



View Profile
August 04, 2011, 10:07:38 PM
 #249

1.50 phoenix command line

5850 920c 315m 1.1v - 11.4 preview w/ 2.4 SDK

396-397Mh/s

with the recommended WORKSIZE=256 flags.


Get 399 with phatk 2.1. I will run for awhile to see if there is any changes with stales though I was not getting many to begin with on 2.1 anyway.
Coaster
Member
**
Offline Offline

Activity: 70
Merit: 10


View Profile
August 05, 2011, 12:42:04 AM
 #250

the 8-4 prerelease for some reason lowered the hashrate on only 1 of 2 of my cards.. (the second card went from an average 426 to 389) same OC settings while card #1 remained around 426 (6970's)

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 05, 2011, 04:29:15 AM
 #251

Ok found and downloaded a copy of bfipatcher.py

did you mention this dependency? was I supposed to have it already?

First run.. way way slow... oh i need vectors vectors2.. oops.

ok seems to be working.. will keep an eye on the stale count.

faster than 7-17 for me.. slower than phatk 2.1 edit... no rejects on phatk that was my pool being bad

Phoenix 1.5 has the bfipatcher.py included, so I never included it in any kernel package ;).
The phatk version will be faster for 58XX cards, but 69XX should be faster or even now.
Guys just figure out which version works best for your personal configuration!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 05, 2011, 04:30:18 AM
 #252

the 8-4 prerelease for some reason lowered the hashrate on only 1 of 2 of my cards.. (the second card went from an average 426 to 389) same OC settings while card #1 remained around 426 (6970's)

Are you sure the VECTORS2 switch IS set!?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
dishwara
Legendary
*
Offline Offline

Activity: 1855
Merit: 1016



View Profile
August 05, 2011, 05:36:35 AM
 #253

04-08-2011 version reduces hash speed on 5870.
I used vectors 4 & got only 444, while 2.1 phateus gives 447.
 
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 05, 2011, 06:52:43 AM
 #254

04-08-2011 version reduces hash speed on 5870.
I used vectors 4 & got only 444, while 2.1 phateus gives 447.
 

The current phatk should be faster for 58XX, sorry, but I said it _IS_ faster for 69XX ;).

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Okama
Newbie
*
Offline Offline

Activity: 23
Merit: 0


View Profile
August 05, 2011, 10:04:42 AM
 #255

Catalyst 11.6, SDK 2.4 here. Jumped from ~428 to ~435 with 5870, 950/300 clocks.

My friend got from ~416 to 425 with 920/300 clocks, also 5870 with Catalyst 11.5, SDK 2.1.

Keep up the good work, Dia!!
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 05, 2011, 10:16:04 AM
 #256

Catalyst 11.6, SDK 2.4 here. Jumped from ~428 to ~435 with 5870, 950/300 clocks.

My friend got from ~416 to 425 with 920/300 clocks, also 5870 with Catalyst 11.5, SDK 2.1.

Keep up the good work, Dia!!

No problem mate, that's 10 BTC each card :-D.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
ovidiusoft
Sr. Member
****
Offline Offline

Activity: 252
Merit: 250


View Profile
August 06, 2011, 07:06:37 AM
 #257

I am happy to report these results for my 5830 running on SDK 2.4, fglrx-driver 1:11-6-2 (Debian unstable ~3 weeks snapshot), phoenix 1.50 and options "VECTORS VECTORS2 BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256":

* 07-11 @ 1040 Mhz - 334 Mhash
* 08-04 @ 1040 Mhz - 335,7 Mhash

* 07-11 @ 1050 Mhz - 337,3 Mhash
* 08-04 @ 1050 Mhz - 338,9 Mhash

07-17 was untested because of lots of hardware errors. The even happier news for me is that 08-04 brought the number of hardware errors to about the same level as 07-11 (~0.2% of the accepted shares).

Thank you again for your work and I'm looking forward to having this kernel ported to cgminer!
ssateneth
Legendary
*
Offline Offline

Activity: 1344
Merit: 1004



View Profile
August 06, 2011, 10:49:54 AM
 #258

I am happy to report these results for my 5830 running on SDK 2.4, fglrx-driver 1:11-6-2 (Debian unstable ~3 weeks snapshot), phoenix 1.50 and options "VECTORS VECTORS2 BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256":

blah blah blah

You don't need VECTORS VECTORS2. Just VECTORS2

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 06, 2011, 10:52:38 AM
 #259

I am happy to report these results for my 5830 running on SDK 2.4, fglrx-driver 1:11-6-2 (Debian unstable ~3 weeks snapshot), phoenix 1.50 and options "VECTORS VECTORS2 BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256":

blah blah blah

You don't need VECTORS VECTORS2. Just VECTORS2

Correct :)

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
ovidiusoft
Sr. Member
****
Offline Offline

Activity: 252
Merit: 250


View Profile
August 06, 2011, 10:57:41 AM
 #260

It says otherwise in the first post :P :P
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 06, 2011, 11:02:36 AM
 #261

It says otherwise in the first post Tongue Tongue

You are right, sorry for that! Just VECTORS2 is the way to go. I edited the first post.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
joulesbeef
Sr. Member
****
Offline Offline

Activity: 476
Merit: 250


moOo


View Profile
August 06, 2011, 08:31:26 PM
 #262

Quote
Phoenix 1.5 has the bfipatcher.py included, so I never included it in any kernel package
strange must not have been in my guiminer's version of phoenix 1.5 which does seem different.


Quote
You are right, sorry for that! Just VECTORS2 is the way to go. I edited the first post


just to be clear though.. vectors vectors2 doesnt hurt anything.. it is just extraneous?


I preferred to leave it as it was: less typing and deleting when testing the versions.

mooo for rent
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 06, 2011, 09:31:21 PM
 #263

Quote
Phoenix 1.5 has the bfipatcher.py included, so I never included it in any kernel package
strange must not have been in my guiminer's version of phoenix 1.5 which does seem different.


Quote
You are right, sorry for that! Just VECTORS2 is the way to go. I edited the first post


just to be clear though.. vectors vectors2 doesnt hurt anything.. it is just extraneous?


I preferred to leave it as it was lest typing and deleting when testing the versions.


Yeah, it doesn't hurt, it's just ignored.

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
bcforum
Full Member
***
Offline Offline

Activity: 140
Merit: 100


View Profile
August 07, 2011, 01:39:19 AM
 #264

New version was just released, it should be the fastest for 69XX cards:
Download version 2011-08-04 (pre-release): http://www.mediafire.com/?upwwud7kfyx7788

This is the preferred switch for Phoenix in order to achieve comparable performance:
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=256
or
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=128

Please test this version with SDK 2.4 / SDK 2.5! SDK 2.1 performance seems worse, but at least it should work. Report any errors and problems here and let me know what you think.
Have a look at your cards temperatures, I got a report, that they may be lower, which would be great Smiley.

Regards,
Dia

I get  0.8MH/s faster with phoenix-r112, but temps do appear to be 3C-4C lower.

6970 Lightning (940,1375) x2
Ubuntu 10.10
SDK 2.4
Cat 11.3


If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
dishwara
Legendary
*
Offline Offline

Activity: 1855
Merit: 1016



View Profile
August 07, 2011, 03:52:36 AM
 #265

just to be clear though.. vectors vectors2 doesnt hurt anything.. it is just extraneous?
I preferred to leave it as it was lest typing and deleting when testing the versions.
But if there are two vector switches, as in "vectors vectors2", which one is taken into account: the first or the last one on the command line?
Because vectors2 and vectors give different performance.
jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
August 07, 2011, 06:37:46 AM
 #266

The latest version (2011-08-04) has a major problem that I can see.

The assumption that there won't be more than 1 valid nonce per kernel execution is very wrong. At aggression 14, for example, each kernel execution tests 2^30 nonces. The chance that there will be more than 1 valid nonce in any given kernel execution in this case is going to be about 2.5% (if I did the math right). This effectively causes a net loss in performance compared to the previous version at high aggression. At lower aggression values (10 and below) this is less of a problem, since the performance loss in these cases will be much less than 1%.
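
The 2.5% figure can be sanity-checked with a Poisson approximation. This is only a sketch under stated assumptions: a share is taken to be any hash with H == 0 (32 zero bits, so a chance of 2^-32 per nonce), nonces are treated as independent, and the 2^26 figure used for aggression 10 assumes the nonce count doubles with each aggression step, which is inferred from the 2^30 figure above and not checked against Phoenix. The function name is made up for illustration.

```python
import math

def p_more_than_one(nonces_per_run, p_valid=2.0 ** -32):
    """Poisson estimate of P(at least 2 valid nonces in one kernel run)."""
    lam = nonces_per_run * p_valid          # expected valid nonces per run
    return 1.0 - math.exp(-lam) * (1.0 + lam)

p14 = p_more_than_one(2 ** 30)  # aggression 14, figure taken from the post
p10 = p_more_than_one(2 ** 26)  # aggression 10, assumed (see lead-in)
print(f"aggression 14: {p14:.2%}")
print(f"aggression 10: {p10:.4%}")
```

Under these assumptions the estimate for aggression 14 lands a little above 2.6%, in line with the rough figure quoted above, and the aggression 10 value is far below 1%.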

Phoenix Miner developer

Donations appreciated at:
1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 07, 2011, 07:21:39 AM
 #267

The latest version (2011-08-04) has a major problem that I can see.

The assumption that there won't be more than 1 valid nonce per kernel execution is very wrong. At aggression 14 for example each kernel execution tests 2^30 nonces. The chance that there will be more than 1 valid nonce in any given kernel execution in this case is going to be about 2.5% (if I did the math right) This effectively causes a net loss in performance compared to the previous version at high aggression. At lower aggression values (10 and below) this is less of a problem since the performance loss in these cases will be much less than 1%.

You have to compare the loss of valid nonces to the higher efficiency because of the removed control flow in the kernel (all current GPUs dislike if/else and so on). I thought this tradeoff would be well worth it, but you could prove me wrong. I was thinking about a better way of writing the positive nonces into output, but that didn't work.

Any good ideas for that part of the kernel will be a big plus!

Dia


Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 07, 2011, 03:13:05 PM
 #268

Updated 1st post kernel performance data with SDK 2.5 and KernelAnalyzer 1.9 Cal 11.7 profile.

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Beta-coiner1
Hero Member
*****
Offline Offline

Activity: 532
Merit: 500


View Profile
August 07, 2011, 05:55:18 PM
 #269

New version was just released, it should be the fastest for 69XX cards:
Download version 2011-08-04 (pre-release): http://www.mediafire.com/?upwwud7kfyx7788

This is the preferred switch for Phoenix in order to achieve comparable performance:
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=256
or
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=128

Please test this version with SDK 2.4 / SDK 2.5! SDK 2.1 performance seems worse, but at least it should work. Report any errors and problems here and let me know what you think.
Have a look at your cards temperatures, I got a report, that they may be lower, which would be great Smiley.

Regards,
Dia

I get  0.8MH/s faster with phoenix-r112, but temps do appear to be 3C-4C lower.

6970 Lightning (940,1375) x2
Ubuntu 10.10
SDK 2.4
Cat 11.3


I can confirm the temperature difference, which I thought was strange. Using Catalyst 11.6B / SDK 2.5 on a 6950 @ 867/1250 with V 4 W64 F3, temps are 3 °C lower in GUIMiner. Hash rate has also increased by 3 MH/s with those settings, and invalids are definitely much lower vs. Phateus.

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 07, 2011, 07:15:22 PM
 #270

New version was just released, it should be the fastest for 69XX cards:
Download version 2011-08-04 (pre-release): http://www.mediafire.com/?upwwud7kfyx7788

This is the preferred switch for Phoenix in order to achieve comparable performance:
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=256
or
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=128

Please test this version with SDK 2.4 / SDK 2.5! SDK 2.1 performance seems worse, but at least it should work. Report any errors and problems here and let me know what you think.
Have a look at your cards temperatures, I got a report, that they may be lower, which would be great Smiley.

Regards,
Dia

I get  0.8MH/s faster with phoenix-r112, but temps do appear to be 3C-4C lower.

6970 Lightning (940,1375) x2
Ubuntu 10.10
SDK 2.4
Cat 11.3


I can confirm the temps difference,which I thought was strange.Using Catalyst 11.6B/SDK 2.5 on a 6950 @867/1250 using V 4 W64 F3 temps are 3 C lower using GUI miner.Hash rate has also increased 3 Mh's using those settings as well as invalids are definitely much lower vs. Phataeus.

I have to ask to understand you ... you say that my current pre-release version generates 3°C less heat for your card and invalid share rate is lower in comparison to the latest Phateus phatk?

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 07, 2011, 07:27:31 PM
 #271

To all happy new kernel users, there is one thing you should know ... there have been NO donations since 2011-07-31, which makes me a bit sad.

It's my free time that I put in here (many hours so far), and the motivation is not only to get a "Thank you!". Remember, you guys generate more BTC with the kernel mods. It doesn't matter if it's my mod, Phateus' mod or anyone else's mod ... just be a little thankful and you keep a free and fast kernel + a motivated kernel mixer Diapolo ;).

No offense to all the great people who already donated a few bitcents or even more, who helped me testing this, who helped me fix bugs or who added great ideas into this work!

Regards,
Diapolo

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Beta-coiner1
Hero Member
*****
Offline Offline

Activity: 532
Merit: 500


View Profile
August 07, 2011, 08:07:03 PM
 #272

I can confirm the temps difference,which I thought was strange.Using Catalyst 11.6B/SDK 2.5 on a 6950 @867/1250 using V 4 W64 F3 temps are 3 C lower using GUI miner.Hash rate has also increased 3 Mh's using those settings as well as invalids are definitely much lower vs. Phataeus.

I have to ask to understand you ... you say that my current pre-release version generates 3°C less heat for your card and invalid share rate is lower in comparison to the latest Phateus phatk?

Dia
Yes, that would be correct. I also sent a bitcent your way to help out, even though it might not be much. Here's hoping for more development for the 69xx architecture ;).

drlatino999
Sr. Member
****
Offline Offline

Activity: 335
Merit: 250



View Profile
August 07, 2011, 10:59:38 PM
Last edit: August 08, 2011, 01:00:16 AM by drlatino999
 #273

Using the recommended settings -
Code:
-k phatk AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS VECTORS2 WORKSIZE=128

My 6950 dropped 3C, 5830 stayed the same.

Sappers clear the way
joulesbeef
Sr. Member
****
Offline Offline

Activity: 476
Merit: 250


moOo


View Profile
August 07, 2011, 11:49:40 PM
 #274

Quote
WORKSIZE=128p
typo or something new I don't know about?

mooo for rent
drlatino999
Sr. Member
****
Offline Offline

Activity: 335
Merit: 250



View Profile
August 08, 2011, 01:00:03 AM
 #275

Typo, let me edit that to reflect.

Sappers clear the way
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 08, 2011, 04:31:25 AM
 #276

Quote
WORKSIZE=128p
typo or something knew I dont know about?


It's only a typo there ...

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
RedLine888
Full Member
***
Offline Offline

Activity: 236
Merit: 109


View Profile
August 08, 2011, 09:19:31 AM
 #277

Hi! Dunno whether the info I provide would be of any use but nevertheless...

Installed the 2011-08-04 kernel version and got + ~4 MHs on 6950 and - ~3 MHs on 5870 and my 5870 became unstable!!!

It works at 990 core and 360 mem with the previous version of your kernel and is perfectly stable but with this new version the driver crashes after a few seconds at even 980 core. The temps are perfect and stay at less than 78 C.

Thanx though for your work!
ssateneth
Legendary
*
Offline Offline

Activity: 1344
Merit: 1004



View Profile
August 08, 2011, 05:14:29 PM
 #278

I still don't know why people are doing "VECTORS VECTORS2". VECTORS is an invalid argument for diapolo phatk ever since 8-04. The only valid arguments are VECTORS2 and VECTORS4.
Quote
Important: since version 2011-08-04 (pre-release) you have to use the switch VECTORS2 instead of VECTORS. I made this change to be clear what vectors are used in the kernel (2- or 4-component). To use 4-component vectors use switch VECTORS4.

jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
August 08, 2011, 06:57:58 PM
Last edit: August 09, 2011, 02:55:18 AM by jedi95
 #279

You have to compare the loss of valid nonces to the higher efficiency because of the removed control flow in the kernel (all current GPUs dislike if/else and so on). I thought this tradeoff would be well worth it, but you could prove me wrong. I was thinking about a better way of writing the positive nonces into output, but that didn't work.

Any good ideas for that part of the kernel will be a big plus!

Dia

After looking at the code more carefully your method is only problematic if more than 1 vector component returns a valid nonce. The odds of this happening are EXTREMELY small, since you would have to find more than 1 valid hash in a range of only 2 or 4 hashes.
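
To put a rough number on "EXTREMELY small": the sketch below computes the exact chance that two or more components of one vector are valid at the same time. It assumes each hash independently has a 2^-32 chance of satisfying H == 0 (a difficulty-1 share); that assumption and the helper name are mine, not taken from the kernel.

```python
from fractions import Fraction

P_VALID = Fraction(1, 2 ** 32)  # assumed chance that one hash has H == 0

def p_two_or_more(components, p=P_VALID):
    """Exact P(at least 2 of `components` independent hashes are valid)."""
    none = (1 - p) ** components
    exactly_one = components * p * (1 - p) ** (components - 1)
    return 1 - none - exactly_one

for width in (2, 4):  # VECTORS2 and VECTORS4
    print(f"VECTORS{width}: {float(p_two_or_more(width)):.3e}")
```

Exact fractions are used because the result is far below double-precision rounding error when computed as 1 minus something close to 1.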

That said, I have devised a way to remove the if(nonce) control structure entirely. This makes a couple assumptions:

1. Control flow instructions have a large clock cycle penalty regardless of the branch taken (so you get 44 cycle penalty on Cypress and Cayman regardless of if H == 0)
2. Writing values to output[] for every nonce even if the nonce is invalid does not incur a significant clock cycle cost relative to the control flow instructions. (ideally <10 clocks, but if it's below ~30 the code below will still be faster than the current code)

The steps:

1. OR the low 16-bits of H against the high 16 bits
2. Take the resulting 16-bit number and OR the low 8 bits against the high 8-bits
3. Take the resulting 8-bit number and OR the low 4 bits against the high 4-bits
4. Take the resulting 4-bit number and OR the low 2 bits against the high 2-bits
5. Take the resulting 2-bit number and NOR the first bit against the second bit

6. do bitwise AND of the resulting 1-bit number against the nonce
7. take the result from #6 and XOR the low 16-bits against the high 16-bits
8. take the resulting 16-bit number from #7 and OR the low 8-bits against the high 8-bits
9. store the result by doing output[OUTPUT_SIZE] = output[result of #8] = nonce

Steps 1-5 create a single bit indicating if the nonce meets H == 0. When you bitwise AND this against the nonce in step 6 you will get 0 for any invalid nonces and for valid nonces you will just get the nonce again. (1 AND X = X)

Steps 7-8 are to produce an 8-bit index that is 0 for all invalid nonces and hopefully unique for each valid nonce, assuming there are a small number of valid nonces. In the worst case (more than 1 hash found in a single execution) at least 1 will still be returned, and if 3 or fewer nonces are found per execution all of them should be returned in most cases.

output[0] will be overwritten constantly by invalid nonces (since the 1-bit number from step 5 will be 0 unless the hash satisfies H == 0, the resulting 8-bit number will also be 0). output[>0] will contain valid nonces with a small chance of collisions.

Cypress and Cayman (58xx and 69xx respectively) have a 44 cycle latency for control flow instructions

Steps 1 - 8 should execute in 1 clock each (however they can't be vectorized, so this won't exploit any ILP)

Step 9 takes no longer than the current code for valid nonces, but this will now also apply to invalid nonces.

overall this should be fast, return only valid nonces, and retain the capability to return more than one nonce if the assumptions above are true.

An example of how even a single 1 in the input will cause the output of steps 1-5 to be 0:
--------------------------------------------------------------------------------------

H = 0000000000000001 0000000000000000

00000000 00000001
00000000 00000000
-------------------OR
00000000 00000001

0000 0000
0000 0001
----------OR
0000 0001

00 00
00 01
------OR
00 01

0 0
0 1
---OR
0 1

0
1
-NOR
0
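
For anyone who wants to try the nine steps on the CPU first, here is a minimal sketch in plain C (not OpenCL, and not the code of any released kernel). The function names valid_bit and output_index are made up for illustration. One assumption is labeled in the code: the 1-bit result of step 5 is widened to an all-ones or all-zeros mask before step 6, since ANDing a literal single bit against the nonce would only keep the nonce's lowest bit.

```c
#include <stdint.h>

/* Steps 1-5: fold H down to a single bit that is 1 only when H == 0. */
static uint32_t valid_bit(uint32_t h)
{
    h = (h & 0xFFFFu) | (h >> 16);  /* step 1: low 16 bits OR high 16 bits */
    h = (h & 0xFFu) | (h >> 8);     /* step 2: low 8 bits OR high 8 bits */
    h = (h & 0xFu) | (h >> 4);      /* step 3: low 4 bits OR high 4 bits */
    h = (h & 0x3u) | (h >> 2);      /* step 4: low 2 bits OR high 2 bits */
    return ~((h & 0x1u) | (h >> 1)) & 0x1u;  /* step 5: NOR, kept to 1 bit */
}

/* Steps 6-8: turn (H, nonce) into an 8-bit slot number. */
static uint32_t output_index(uint32_t h, uint32_t nonce)
{
    /* Assumption: widen the bit to a mask (1 -> 0xFFFFFFFF, 0 -> 0). */
    uint32_t mask = 0u - valid_bit(h);
    uint32_t x = nonce & mask;          /* step 6 */
    x = (x & 0xFFFFu) ^ (x >> 16);      /* step 7: low 16 XOR high 16 */
    x = (x & 0xFFu) | (x >> 8);         /* step 8: low 8 OR high 8 */
    return x;                           /* step 9 would do output[x] = nonce */
}
```

In this model every H != 0 ends up at index 0 and a valid nonce gets an index between 0 and 255, so output[] would need at least 256 entries. A valid nonce whose folded value happens to be 0 would still share slot 0 with the invalid ones.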

Phoenix Miner developer

Donations appreciated at:
1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 08, 2011, 07:38:59 PM
 #280

You have to compare the loss of valid nonces to the higher efficiency because of the removed control flow in the kernel (all current GPUs dislike if/else and so on). I thought this tradeoff would be well worth it, but you could prove me wrong. I was thinking about a better way of writing the positive nonces into output, but that didn't work.

Any good ideas for that part of the kernel will be a big plus!

Dia

After looking at the code more carefully your method is only problematic if more than 1 vector component returns a valid nonce. The odds of this happening are EXTREMELY small, since you would have to find more than 1 valid hash in a range of only 2 or 4 hashes.

That said, I have devised a way to remove the if(nonce) control structure entirely. This makes a couple assumptions:

1. Control flow instructions have a large clock cycle penalty regardless of the branch taken (so you get 44 cycle penalty on Cypress and Cayman regardless of if H == 0)
2. Writing values to output[] for every nonce even if the nonce is invalid does not incur a significant clock cycle cost relative to the control flow instructions. (ideally <10 clocks, but if it's below ~30 the code below will still be faster than the current code)

The steps:

1. AND the low 16-bits of H against the high 16 bits
2. Take the resulting 16-bit number and OR the low 8 bits against the high 8-bits
3. Take the resulting 8-bit number and OR the low 4 bits against the high 4-bits
4. Take the resulting 4-bit number and OR the low 2 bits against the high 2-bits
5. Take the resulting 2-bit number and NOR the first bit against the second bit

6. do bitwise AND of the resulting 1-bit number against the nonce
7. take the result from #6 and XOR the low 16-bits against the high 16-bits
8. take the resulting 16-bit number from #7 and OR the low 8-bits against the high 8-bits
9. store the result by doing output[OUTPUT_SIZE] = OUTPUT[result of #8] = nonce

Steps 1-5 create a single bit indicating if the nonce meets H == 0. When you bitwise AND this against the nonce in step 6 you will get 0 for any invalid nonces and for valid nonces you will just get the nonce again. (1 AND X = X)

Steps 7-8 are to produce an 8-bit index that is 0 for all invalid nonces and hopefully unique for each valid nonce, assuming there are a small number of valid nonces. In the worst case (more than 1 hash found in a single execution) at least 1 will still be returned, and if 3 or fewer nonces are found per execution all of them should be returned in most cases.

output[0] will be overwritten constantly by invalid nonces (since the 1-bit number from step 5 will be 0 unless the hash satisfies H == 0, the resulting 8-bit number will also be 0). output[>0] will contain valid nonces with a small chance of collisions.

Cypress and Cayman (58xx and 69xx respectively) have a 44 cycle latency for control flow instructions

Steps 1 - 8 should execute in 1 clock each (however they can't be vectorized, so this won't exploit any ILP)

Step 9 takes no longer than the current code for valid nonces, but this will now also apply to invalid nonces.

overall this should be fast, return only valid nonces, and retain the capability to return more than one nonce if the assumptions above are true.

An example of how even a single 1 in the input will cause the output of steps 1-5 to be 0:
--------------------------------------------------------------------------------------

H = 0000000000000001 0000000000000000

00000000 00000001
00000000 00000000
-------------------OR
00000000 00000001

0000 0000
0000 0001
----------OR
0000 0001

00 00
00 01
------OR
00 01

0 0
0 1
---OR
0 1

0
1
-NOR
0

Thanks Jedi, I will look into this tomorrow. The last thing I tried was this (with the host then looking into every slot of the output buffer):

Code:
const uint2 nonce = (uint2){((Vals[7].x == -H[7]) * W_3.x), ((Vals[7].y == -H[7]) * W_3.y)};

output[OUTPUT_MASK & (nonce.x >> 2)] = nonce.x;
output[OUTPUT_MASK & (nonce.y >> 2)] = nonce.y;
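
As far as it can be read from the snippet, the idea is that a scalar comparison gives 0 or 1, so multiplying it with the nonce zeroes out the misses without an if. A rough plain-C model of one vector component follows. The buffer size of 256 and the helper name store_candidate are assumptions for illustration, not values taken from the kernel.

```c
#include <stdint.h>

#define OUTPUT_SIZE 256u               /* assumed size, must be a power of two */
#define OUTPUT_MASK (OUTPUT_SIZE - 1u)

/* Branch-free store: a miss turns the nonce into 0 and lands in slot 0,
   a hit is stored in the slot selected by bits 2 and up of the nonce. */
static void store_candidate(uint32_t *output, uint32_t hash_word,
                            uint32_t target, uint32_t nonce)
{
    uint32_t n = (uint32_t)(hash_word == target) * nonce;
    output[OUTPUT_MASK & (n >> 2)] = n;
}
```

In this model a hit whose slot number works out to 0 is overwritten by the next miss, which is why the host would have to scan every slot and still might lose a nonce now and then.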

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Phateus
Newbie
*
Offline Offline

Activity: 52
Merit: 0


View Profile
August 08, 2011, 10:43:00 PM
 #281

The steps:

1. AND the low 16-bits of H against the high 16 bits
2. Take the resulting 16-bit number and OR the low 8 bits against the high 8-bits
3. Take the resulting 8-bit number and OR the low 4 bits against the high 4-bits
4. Take the resulting 4-bit number and OR the low 2 bits against the high 2-bits
5. Take the resulting 2-bit number and NOR the first bit against the second bit

6. do bitwise AND of the resulting 1-bit number against the nonce
7. take the result from #6 and XOR the low 16-bits against the high 16-bits
8. take the resulting 16-bit number from #7 and OR the low 8-bits against the high 8-bits
9. store the result by doing output[OUTPUT_SIZE] = OUTPUT[result of #8] = nonce

Steps 1-5 create a single bit indicating if the nonce meets H == 0. When you bitwise AND this against the nonce in step 6 you will get 0 for any invalid nonces and for valid nonces you will just get the nonce again. (1 AND X = X)

Steps 7-8 are to produce an 8-bit index that is 0 for all invalid nonces and hopefully unique for each valid nonce, assuming there are a small number of valid nonces. In the worst case (more than 1 hash found in a single execution) at least 1 will still be returned, and if 3 or fewer nonces are found per execution all of them should be returned in most cases.


Sorry to jump in in the middle of the conversation, but if I understand what you are trying to do...
Can't you just replace all of the steps  with:
Code:
Valid = 1 - min(H, 1u);
Nonce = W[3];
OUTPUT[((Nonce & OUTPUT_MASK) + 1) * Valid] = Nonce;
if you are trying to remove all control flow?  Any invalid nonce will be written into Output[0] and the valid nonces will be randomly distributed through the rest of the array.

I really don't know how the architecture handles having 4 billion threads writing to the same address, but... you may want to try it out...
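
To make the min() trick from the first snippet easy to check on the CPU, here is a small plain-C sketch. OUTPUT_MASK = 0xFF and the helper names are assumptions for illustration; with that mask the buffer would need OUTPUT_MASK + 2 entries, because slot 0 is reserved for invalid nonces.

```c
#include <stdint.h>

#define OUTPUT_MASK 0xFFu   /* assumed mask; buffer holds OUTPUT_MASK + 2 entries */

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* Slot 0 for any H != 0, otherwise a slot in 1..OUTPUT_MASK + 1. */
static uint32_t slot_for(uint32_t h, uint32_t nonce)
{
    uint32_t valid = 1u - min_u32(h, 1u);   /* 1 when H == 0, else 0 */
    return ((nonce & OUTPUT_MASK) + 1u) * valid;
}
```

The multiplication by valid replaces the branch: every invalid nonce collapses to slot 0, while valid ones are spread over the rest of the buffer by their low bits.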

Also, it is easy enough to make it work with VECTORS:

Code:
Valid = 1 - min(min(H.x, H.y), 1u);
//If .x is not valid (so .y must be), add 1 to the nonce.
Nonce = W[3].x + min(H.x, 1u);
OUTPUT[((Nonce & OUTPUT_MASK) + 1) * Valid] = Nonce;
(or you could just double the code for .x and .y)

OR
Code:
Valid = 1 - min(min(H.x, H.y), 1u);
//Only the .x nonce is stored; the host checks both Nonce and Nonce + 1.
Nonce = W[3].x;
OUTPUT[((Nonce & OUTPUT_MASK) + 1) * Valid] = Nonce;
and have the __init__ file check both Nonce and Nonce+1


another way of doing it would be (the compiler should replace the if statement with a set conditional):
Code:
Nonce = W[3];
Position = W[3] & OUTPUT_MASK;
if(H)
   Position = OUTPUT_MASK + 1;
//Invalid nonce are at the last position of the array, valid are distributed at the front
OUTPUT[Position] = Nonce;

Slightly faster would be to set Position to the local thread id (since you save an &) and make sure that the size of the output array is WORKSIZE + 1:
Code:
Nonce = W[3];
Position = get_local_id(0);
if(H)
   Position = WORKSIZE;
OUTPUT[Position] = Nonce;

EDIT:  Ooh, just thought of something else: 

If it doesn't like writing everything to the same address: Make the buffer size = 2*WORKSIZE...
Code:
Nonce = W[3];
Position = get_local_id(0);
if(H)
   Position += WORKSIZE;
OUTPUT[Position] = Nonce;
Then all of the threads in a workgroup will write to a different address.  The valid nonces will be in the first half, and the invalid will be in the second.
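
A rough CPU model of that last variant, in plain C. WORKSIZE is set to 4 only to keep the example small, and the host-side helper collect_valid is hypothetical: it assumes the buffer is zeroed before each launch and that 0 is never reported as a nonce, neither of which is stated in this thread.

```c
#include <stdint.h>

#define WORKSIZE 4u   /* tiny value for demonstration only */

/* What one work-item would do: valid results (H == 0) go to the first
   half of the buffer, invalid ones to the second half. */
static void write_result(uint32_t *output, uint32_t local_id,
                         uint32_t h, uint32_t nonce)
{
    uint32_t position = local_id + (uint32_t)(h != 0u) * WORKSIZE;
    output[position] = nonce;
}

/* Hypothetical host side: copy the non-zero entries of the first half. */
static uint32_t collect_valid(const uint32_t *output, uint32_t *found)
{
    uint32_t count = 0u;
    for (uint32_t i = 0u; i < WORKSIZE; ++i) {
        if (output[i] != 0u) {
            found[count++] = output[i];
        }
    }
    return count;
}
```

The host then only has to look at WORKSIZE entries, at the price of a buffer that is twice as large.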

Now I have no idea if any of these things would be faster, but I think all of them would work...

Sorry to put so much code down... but this kind of coding isn't really an exact science...
bcforum
Full Member
***
Offline Offline

Activity: 140
Merit: 100


View Profile
August 09, 2011, 02:50:54 AM
 #282



1. AND the low 16-bits of H against the high 16 bits
2. Take the resulting 16-bit number and OR the low 8 bits against the high 8-bits
3. Take the resulting 8-bit number and OR the low 4 bits against the high 4-bits
4. Take the resulting 4-bit number and OR the low 2 bits against the high 2-bits
5. Take the resulting 2-bit number and NOR the first bit against the second bit

6. do bitwise AND of the resulting 1-bit number against the nonce
7. take the result from #6 and XOR the low 16-bits against the high 16-bits
8. take the resulting 16-bit number from #7 and OR the low 8-bits against the high 8-bits
9. store the result by doing output[OUTPUT_SIZE] = OUTPUT[result of #8] = nonce

Steps 1-5 create a single bit indicating if the nonce meets H == 0. When you bitwise AND this against the nonce in step 6 you will get 0 for any invalid nonces and for valid nonces you will just get the nonce again. (1 AND X = X)


I don't claim to understand this, but step (1) should be an OR, not an AND.

If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
jedi95
Full Member
***
Offline Offline

Activity: 219
Merit: 120


View Profile
August 09, 2011, 02:56:28 AM
 #283



1. AND the low 16-bits of H against the high 16 bits
2. Take the resulting 16-bit number and OR the low 8 bits against the high 8-bits
3. Take the resulting 8-bit number and OR the low 4 bits against the high 4-bits
4. Take the resulting 4-bit number and OR the low 2 bits against the high 2-bits
5. Take the resulting 2-bit number and NOR the first bit against the second bit

6. do bitwise AND of the resulting 1-bit number against the nonce
7. take the result from #6 and XOR the low 16-bits against the high 16-bits
8. take the resulting 16-bit number from #7 and OR the low 8-bits against the high 8-bits
9. store the result by doing output[OUTPUT_SIZE] = OUTPUT[result of #8] = nonce

Steps 1-5 create a single bit indicating if the nonce meets H == 0. When you bitwise AND this against the nonce in step 6 you will get 0 for any invalid nonces and for valid nonces you will just get the nonce again. (1 AND X = X)


I don't claim to understand this, but step (1) should be an OR, not an AND.


Yeah that's right. Must have missed that when I went over the post. I had it correct in the example though.

indio007
Full Member
***
Offline Offline

Activity: 224
Merit: 100


View Profile
August 09, 2011, 04:14:13 PM
 #284

Sent you half a bit to keep you motivated. :D

Keep up the good work Diapolo
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 09, 2011, 04:26:31 PM
 #285

Sent you half a bit to keep you motivated. :D

Keep up the good work Diapolo

Woohoo I feel damn motivated ;) ... thanks mate!

Dia

indio007
Full Member
***
Offline Offline

Activity: 224
Merit: 100


View Profile
August 09, 2011, 04:35:18 PM
 #286

Just out of curiosity , how many unique downloads of your modification have there been? If you know of course.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 09, 2011, 05:33:03 PM
 #287

Just out of curiosity , how many unique downloads of your modification have there been? If you know of course.

The sum of all downloads is > 5500 (for all released versions).

Dia

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 09, 2011, 07:49:07 PM
 #288



1. AND the low 16-bits of H against the high 16 bits
2. Take the resulting 16-bit number and OR the low 8 bits against the high 8-bits
3. Take the resulting 8-bit number and OR the low 4 bits against the high 4-bits
4. Take the resulting 4-bit number and OR the low 2 bits against the high 2-bits
5. Take the resulting 2-bit number and NOR the first bit against the second bit

6. do bitwise AND of the resulting 1-bit number against the nonce
7. take the result from #6 and XOR the low 16-bits against the high 16-bits
8. take the resulting 16-bit number from #7 and OR the low 8-bits against the high 8-bits
9. store the result by doing output[OUTPUT_SIZE] = OUTPUT[result of #8] = nonce

Steps 1-5 create a single bit indicating if the nonce meets H == 0. When you bitwise AND this against the nonce in step 6 you will get 0 for any invalid nonces and for valid nonces you will just get the nonce again. (1 AND X = X)


I don't claim to understand this, but step (1) should be an OR, not an AND.


Yeah that's right. Must have missed that when I went over the post. I had it correct in the example though.

I tried to implement this, but the kernel crashes the display driver so hard that I get a bluescreen every time ... weird.

Code:
	// Round 124
Vals[7] += Vals[3] + P4(124) + P3(124) + P1(124) + P2(124) + ch(124) + s1(124) + H[7];

...

// lo 16 Bits OR hi 16 Bits
uint positive = (Vals[7].x & 0x0000FFFFU) | (Vals[7].x & 0xFFFF0000U);
// lo 8 Bits OR hi 8 Bits
positive = (positive & 0x00FFU) | (positive & 0xFF00U);
// lo 4 Bits OR hi 4 Bits
positive = (positive & 0x0FU) | (positive & 0xF0U);
// lo 2 Bits OR hi 2 Bits
positive = (positive & 0x3U) | (positive & 0xCU);
// lo 1 Bit NOR hi 1 Bit
positive = ~((positive & 0x1U) | (positive & 0x2U));

// nonce AND positive
uint position = W_3.x & positive;
// lo 16 Bits XOR hi 16 Bits
position = (position & 0x0000FFFFU) ^ (position & 0xFFFF0000U);
// lo 8 Bits OR hi 8 Bits
position = (position & 0x00FFU) | (position & 0xFF00U);

output[position] = W_3.x;

Dia

Phateus
Newbie
*
Offline Offline

Activity: 52
Merit: 0


View Profile
August 09, 2011, 08:20:46 PM
 #289



1. AND the low 16-bits of H against the high 16 bits
2. Take the resulting 16-bit number and OR the low 8 bits against the high 8-bits
3. Take the resulting 8-bit number and OR the low 4 bits against the high 4-bits
4. Take the resulting 4-bit number and OR the low 2 bits against the high 2-bits
5. Take the resulting 2-bit number and NOR the first bit against the second bit

6. do bitwise AND of the resulting 1-bit number against the nonce
7. take the result from #6 and XOR the low 16-bits against the high 16-bits
8. take the resulting 16-bit number from #7 and OR the low 8-bits against the high 8-bits
9. store the result by doing output[OUTPUT_SIZE] = OUTPUT[result of #8] = nonce

Steps 1-5 create a single bit indicating if the nonce meets H == 0. When you bitwise AND this against the nonce in step 6 you will get 0 for any invalid nonces and for valid nonces you will just get the nonce again. (1 AND X = X)


I don't claim to understand this, but step (1) should be an OR, not an AND.


Yeah that's right. Must have missed that when I went over the post. I had it correct in the example though.

I tried to implement this, but the kernel crashes the display driver so hard that I get a bluescreen every time ... weird.

Code:
	// Round 124
Vals[7] += Vals[3] + P4(124) + P3(124) + P1(124) + P2(124) + ch(124) + s1(124) + H[7];

...

// lo 16 Bits OR hi 16 Bits
uint positive = (Vals[7].x & 0x0000FFFFU) | (Vals[7].x & 0xFFFF0000U);
// lo 8 Bits OR hi 8 Bits
positive = (positive & 0x00FFU) | (positive & 0xFF00U);
// lo 4 Bits OR hi 4 Bits
positive = (positive & 0x0FU) | (positive & 0xF0U);
// lo 2 Bits OR hi 2 Bits
positive = (positive & 0x3U) | (positive & 0xCU);
// lo 1 Bit NOR hi 1 Bit
positive = ~((positive & 0x1U) | (positive & 0x2U));

// nonce AND positive
uint position = W_3.x & positive;
// lo 16 Bits XOR hi 16 Bits
position = (position & 0x0000FFFFU) ^ (position & 0xFFFF0000U);
// lo 8 Bits OR hi 8 Bits
position = (position & 0x00FFU) | (position & 0xFF00U);

output[position] = W_3.x;

Dia

You need to shift the bits for each stage.

For example, ORing the top bits into the bottom bits should be:

Code:
uint positive = (Vals[7].x & 0x0000FFFFU) | ((Vals[7].x & 0xFFFF0000U) >> 16);
or just:
Code:
uint positive = (Vals[7].x & 0x0000FFFFU) | (Vals[7].x >> 16);
because the upper 16 bits will already be 0 because of the shift;

Otherwise, you will just get the original Vals[7] value;
if you want to do it that way, the code would be:
Code:
// lo 16 Bits OR hi 16 Bits
uint positive = (Vals[7].x & 0x0000FFFFU) | (Vals[7].x >> 16);
// lo 8 Bits OR hi 8 Bits
positive = (positive & 0x00FFU) | (positive >> 8);
// lo 4 Bits OR hi 4 Bits
positive = (positive & 0x0FU) | (positive >> 4);
// lo 2 Bits OR hi 2 Bits
positive = (positive & 0x3U) | (positive >> 2);
// lo 1 Bit OR hi 1 Bit, minus 1: 0xFFFFFFFF if Vals[7].x == 0, otherwise 0
positive = ((positive & 0x1U) | (positive >> 1)) - 1U;

However, similar to what I said earlier, the following code does the same thing:
Code:
uint positive = 0xFFFFFFFFU + min(Vals[7].x, 1u);
If Vals[7].x == 0, then min(Vals[7].x, 1u) == 0; otherwise it equals 1:
0xFFFFFFFF + 0 = 0xFFFFFFFF
0xFFFFFFFF + 1 = 0 (the 32-bit addition wraps around)


Oh yeah... you are getting blue screens because your address would be a random 32-bit number, and the kernel was probably trying to access memory that your video card doesn't have.
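
Both points can be checked on the CPU with a few lines of plain C. This is only a sketch with made-up helper names: stage_no_shift is the masked OR without a shift, stage_shift is the same stage with the shift, and zero_mask is the min()-based shortcut.

```c
#include <stdint.h>

/* Masked OR without a shift: both halves stay where they are,
   so the input simply comes back unchanged. */
static uint32_t stage_no_shift(uint32_t v)
{
    return (v & 0x0000FFFFu) | (v & 0xFFFF0000u);
}

/* With the shift, the high half is folded onto the low half. */
static uint32_t stage_shift(uint32_t v)
{
    return (v & 0x0000FFFFu) | (v >> 16);
}

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* 0xFFFFFFFF when v == 0, otherwise 0 (unsigned addition wraps around). */
static uint32_t zero_mask(uint32_t v)
{
    return 0xFFFFFFFFu + min_u32(v, 1u);
}
```

For example, stage_no_shift(0xDEADBEEF) returns 0xDEADBEEF again, while stage_shift(0xDEADBEEF) returns 0xFEEF, a value that fits in 16 bits.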
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 11, 2011, 03:42:23 PM
 #290

Download version 2011-08-11: http://www.mediafire.com/?s5c7h4r91r4ad4j

New version for your testing pleasure ;). Remember to use VECTORS2 as switch!
This one should be a bit faster for 58XX and 69XX cards compared to earlier versions, PLUS it should not generate invalid shares if more than 1 positive nonce is found in a work-group!

If a few of you could make a comparison (with older or other kernel versions) of accepted shares over a certain period of time, this would be pretty cool!

Dia

miscreanity
Legendary
*
Offline Offline

Activity: 1316
Merit: 1005


View Profile
August 11, 2011, 04:20:20 PM
 #291

Download version 2011-08-11: http://www.mediafire.com/?s5c7h4r91r4ad4j

New version for your testing pleasure ;). Remember to use VECTORS2 as switch!
This one should be a bit faster for 58XX and 69XX cards compared to earlier versions, PLUS it should not generate invalid shares if more than 1 positive nonce is found in a work-group!

If a few of you could make a comparison (with older or other kernel versions) of accepted shares over a certain period of time, this would be pretty cool!

Dia


6950 @ 920/300; Linux 2.6.38, 11.6/2.4; 2x 5 min runs for each setting with Phoenix 1.50

AGGRESSION=12 BFI_INT FASTLOOP=false VECTORS2

WORKSIZE=128
[374.89 Mhash/sec] [28 Accepted] [0 Rejected] [RPC (+LP)]
- Negligible difference from 2011-08-02 kernel.

WORKSIZE=256
[344.50 Mhash/sec] [25 Accepted] [0 Rejected] [RPC (+LP)]
- Significant drop of ~25-30 Mh/s from 08-02 kernel.
Tx2000
Full Member
***
Offline Offline

Activity: 182
Merit: 100



View Profile
August 11, 2011, 04:53:46 PM
 #292

11.8 / SDK 2.4   920c/320m  5850 reference

-k phatk VECTORS VECTORS2 BFI_INT FASTLOOP=false AGGRESSION=10 WORKSIZE=256

393-394 Mh/s, compared to 398-399 with prior version.
dishwara
Legendary
*
Offline Offline

Activity: 1855
Merit: 1016



View Profile
August 11, 2011, 06:16:37 PM
 #293

436 & 426 using diapolo 2011-8-11.
While phatk 2.2 of Phateus gives 448 & 433.

Windows 7, 64 bit, AERO enabled, AOCLBF 1.75, for diapolo used vectors2 & removed check mark for vectors in AOCLBF.
Aggression=12, worksize=256
11.8 catalyst beta.
MSI R5870 Lightning & Sapphire HD 5870.
975/325 & 939/313.
talldude
Member
**
Offline Offline

Activity: 224
Merit: 10


View Profile
August 11, 2011, 07:10:33 PM
 #294

I'm giving this a go.

5850, all the usual flags, aggression 11. Dropped 1.5mhash compared to phatk 2.2 but we'll see if invalid shares also drop and/or valid shares go up. I'll edit this post in a day or so (if I remember).
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 11, 2011, 08:03:52 PM
 #295

Did anyone with SDK 2.5 check this out? I get better results on 5870 and 5830 than with former kernels and I had hoped 69XX would be really faster :-/.

Dia

moomoocow
Newbie
*
Offline Offline

Activity: 18
Merit: 0


View Profile
August 11, 2011, 10:54:39 PM
 #296

I was running the 8-4-2011 pre-release before this and the new 8-11-2011 release yields identical hash rates on my 6950.

Cat 11.8 preview.
PcChip
Sr. Member
****
Offline Offline

Activity: 418
Merit: 250


View Profile
August 18, 2011, 02:03:05 AM
 #297

On Cat 11.8 Preview:

Your latest: 307 MH/s
Phateus 2.2: 312 MH/s

(5830 @ 965/300, Worksize 256)

Legacy signature from 2011: 
All rates with Phoenix 1.50 / PhatK
5850 - 400 MH/s  |  5850 - 355 MH/s | 5830 - 310 MH/s  |  GTX570 - 115 MH/s | 5770 - 210 MH/s | 5770 - 200 MH/s
iopq
Hero Member
*****
Offline Offline

Activity: 658
Merit: 500


View Profile
August 18, 2011, 03:59:37 AM
 #298

I am getting about the same with diapolo's as I do with phatk2.2 on a 5750 with memory clock at 200, worksize 256, vectors2

I'm using fpgaminer's modified poclbm
Parja
Newbie
*
Offline Offline

Activity: 36
Merit: 0


View Profile
August 20, 2011, 04:36:36 PM
 #299

I made an interesting discovery during my own tests with the new kernel version. I had to up the memory clock of my 5870 from 200 to 350 MHz in order to achieve the highest hashing values. Another thing to mention is, that I drive a Phenom II X6 1090T with only 800 MHz for every core, due to power saving, while mining. If I let the CPU use full speed, MHash/s goes even higher, let's say 3-4 MH/s.

Conclusion: Perhaps you guys should try to raise your mem speeds + experiment with CPU clocks, too. I know it has to be a good balance, so that higher MH/s values are not eaten by higher energy costs.

Dia

I'm actually finding with the 8-11 kernel that memory speed can be dropped down very low and still maintain optimal performance.  I've got a total of 5 58X0 cards running, and they're all perfectly content to max out the MH/s at 150MHz memory speed.

So while I've found that phatk 2.2 can do about 1-1.5% higher than 8-11 at the same core speed, phatk likes a memory speed up around 430MHz for optimal performance.  So with that memory speed drop, I'm seeing about 2-3C lower core temps on my cards...or about 20MHz higher core speeds for the same temps, which more than makes up for the performance gap.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 23, 2011, 05:48:08 AM
 #300

I made an interesting discovery during my own tests with the new kernel version. I had to up the memory clock of my 5870 from 200 to 350 MHz in order to achieve the highest hashing values. Another thing to mention is, that I drive a Phenom II X6 1090T with only 800 MHz for every core, due to power saving, while mining. If I let the CPU use full speed, MHash/s goes even higher, let's say 3-4 MH/s.

Conclusion: Perhaps you guys should try to raise your mem speeds + experiment with CPU clocks, too. I know it has to be a good balance, so that higher MH/s values are not eaten by higher energy costs.

Dia

I'm actually finding with the 8-11 kernel that memory speed can be dropped down very low and still maintain optimal performance.  I've got a total of 5 58X0 cards running, and they're all perfectly content to max out the MH/s at 150MHz memory speed.

So while I've found that phatk 2.2 can do about 1-1.5% higher than 8-11 at the same core speed, phatk likes a memory speed up around 430MHz for optimal performance.  So with that memory speed drop, I'm seeing about 2-3C lower core temps on my cards...or about 20MHz higher core speeds for the same temps, which more than makes up for the performance gap.

Very interesting, but I guess the focus for most users is currently on phatk2, even if your observation could turn out to change some users' minds ;). I'm still working on the kernel, but the really big jumps are hard to do these days :D.

Dia

CanaryInTheMine
Donator
Legendary
*
Offline Offline

Activity: 2352
Merit: 1060


between a rock and a block!


View Profile
August 24, 2011, 06:26:49 PM
 #301

i've noticed a 2 degree temp drop using phatk2 with the 1.6 phoenix. Nice!

allowed me to increase the clock speed to a level that was not stable before. now it's stable at a slightly higher clock rate.

will try lowering mem some more based on other's feedback here.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 27, 2011, 12:12:02 PM
Last edit: August 27, 2011, 01:10:21 PM by Diapolo
 #302

Download version 2011-08-27: http://www.mediafire.com/?697r8t2pdk419ji

This version is a bit faster on 58XX cards, reports indicate it can be faster on 69XX cards, too ... I guess this is because of the optimized writing to the output buffer.
You can leave out the BFI_INT switch, but remember to supply the VECTORS2 switch :)! This version takes care of wrong WORKSIZE arguments, too ... if you forget that switch, if its value is too big or if it's not a power of 2, the maximum supported WORKSIZE for each device is used.
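
The WORKSIZE rule described above can be sketched in a few lines of plain C. This is not the code shipped with the kernel, only an illustration of the rule as stated in this post. The function names are hypothetical, and treating a requested value of 0 as "switch not supplied" is an assumption.

```c
#include <stdint.h>

/* A power of two has exactly one bit set. */
static int is_power_of_two(uint32_t n)
{
    return n != 0u && (n & (n - 1u)) == 0u;
}

/* Fall back to the device maximum when the requested WORKSIZE is
   missing (assumed to arrive as 0), too big, or not a power of two. */
static uint32_t effective_worksize(uint32_t requested, uint32_t device_max)
{
    if (!is_power_of_two(requested) || requested > device_max) {
        return device_max;
    }
    return requested;
}
```

With a device maximum of 256, a request of 128 is kept, while 0, 96 and 512 would all fall back to 256.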

Thanks,
Dia

sd
Hero Member
*****
Offline Offline

Activity: 730
Merit: 500



View Profile
August 30, 2011, 07:21:50 PM
 #303

Download version 2011-08-27: http://www.mediafire.com/?697r8t2pdk419ji

This version is a bit faster on 58XX cards, reports indicate it can be faster on 69XX cards, too ... I guess this is because of the optimized writing to the output buffer.
You can leave out the BFI_INT switch, but remember to supply the VECTORS2 switch :)! This version takes care of wrong WORKSIZE arguments, too ... if you forget that switch, if its value is too big or if it's not a power of 2, the maximum supported WORKSIZE for each device is used.

Thanks,
Dia

I get a lot of hardware errors with this kernel but none from the phatk-2.2 kernel:

...
[30/08/2011 21:03:37] Kernel error: Unusual behavior from OpenCL. Hardware problem?
[30/08/2011 21:03:42] Kernel error: Unusual behavior from OpenCL. Hardware problem?
[30/08/2011 21:03:53] Kernel error: Unusual behavior from OpenCL. Hardware problem?
...

That's with the core clocked to 940 and ram clocked down to 300. I'm using phoenix 1.5 and SDK 2.4 with both kernel versions. The GPU is a HD5870.

Is this a kernel fault, a hardware fault, or am I doing something wrong?

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 30, 2011, 07:51:34 PM
 #304

Download version 2011-08-27: http://www.mediafire.com/?697r8t2pdk419ji

This version is a bit faster on 58XX cards, reports indicate it can be faster on 69XX cards, too ... I guess this is because of the optimized writing to the output buffer.
You can leave out the BFI_INT switch, but remember to supply the VECTORS2 switch :)! This version takes care of wrong WORKSIZE arguments, too ... if you forget that switch, if its value is too big or if it's not a power of 2, the maximum supported WORKSIZE for each device is used.

Thanks,
Dia

I get a lot of hardware errors with this kernel but none from the phatk-2.2 kernel:

...
[30/08/2011 21:03:37] Kernel error: Unusual behavior from OpenCL. Hardware problem?
[30/08/2011 21:03:42] Kernel error: Unusual behavior from OpenCL. Hardware problem?
[30/08/2011 21:03:53] Kernel error: Unusual behavior from OpenCL. Hardware problem?
...

That's with the core clocked to 940 and ram clocked down to 300. I'm using phoenix 1.5 and SDK 2.4 with both kernel versions. The GPU is a HD5870.

Is this a kernel fault, a hardware fault, or am I doing something wrong?



I run a 5870 with 900 Core and 350 Mem, NO errors at all ... so you could try to lower your Core clock a little. Strange that phatk2 gives you no errors.
What hashrate are you achieving?

Dia

sd
Hero Member
*****
Offline Offline

Activity: 730
Merit: 500



View Profile
August 30, 2011, 08:13:30 PM
 #305

I run a 5870 with 900 Core and 350 Mem, NO errors at all ... so you could try to lower your Core clock a little. Strange that phatk2 gives you no errors.
What hashrate are you achieving?


434.39 Mhash/sec On Phatk2.2. I got no hardware errors for weeks on the phatk 2.2 kernel. I'll try adjusting the clockrate.

Nice work BTW. I've sent you ( small ) donations in the past.
sd
Hero Member
*****
Offline Offline

Activity: 730
Merit: 500



View Profile
August 30, 2011, 08:28:07 PM
 #306

I run a 5870 with 900 Core and 350 Mem, NO errors at all ... so you could try to lower your Core clock a little. Strange that phatk2 gives you no errors.
What hashrate are you achieving?


On phatk2.2 I'm getting 433.5 MHash/Sec.
On the new kernel I'm getting 345.35 MHash/Sec and hardware errors a few times a minute. 80MHash/Sec less!

That's with all other settings being equal. The only thing I changed was VECTORS to VECTORS2 and I removed BFI_INT from the kernel arguments.

Upgrading phoenix won't actually change anything relevant will it?
iopq
Hero Member
*****
Offline Offline

Activity: 658
Merit: 500


View Profile
August 30, 2011, 09:46:57 PM
 #307

Download version 2011-08-27: http://www.mediafire.com/?697r8t2pdk419ji

This version is a bit faster on 58XX cards, reports indicate it can be faster on 69XX cards, too ... I guess this is because of the optimized writing to the output buffer.
You can leave out the BFI_INT switch, but remember to supply the VECTORS2 switch :)! This version takes care of wrong WORKSIZE arguments, too ... if you forget that switch, if its value is too big or if it's not a power of 2, the maximum supported WORKSIZE for each device is used.

Thanks,
Dia
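The WORKSIZE fallback described in the release note can be sketched roughly as follows. This is only an illustration, not the code from the modified __init__.py: the helper names and the `max_worksize` parameter (standing in for the device's maximum work-group size) are assumptions.

```python
def is_power_of_two(value):
    """Return True for 1, 2, 4, 8, ... and False otherwise."""
    return value > 0 and (value & (value - 1)) == 0


def effective_worksize(requested, max_worksize):
    """Pick the WORKSIZE to use (hypothetical helper).

    Falls back to the device maximum when the argument is missing (None),
    larger than the device supports, or not a power of 2.
    """
    if requested is None:
        return max_worksize
    if requested > max_worksize or not is_power_of_two(requested):
        return max_worksize
    return requested
```

Under these assumptions, WORKSIZE=128 on a device with a maximum of 256 is kept, while a missing value, 512 or 96 would all end up as 256.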

I get a lot of hardware errors with this kernel but none from the phatk-2.2 kernel:
try rebooting your computer before running the kernel
any time I switch kernels I get hardware errors
sd
Hero Member
*****
Offline Offline

Activity: 730
Merit: 500



View Profile
August 30, 2011, 10:49:25 PM
Last edit: August 30, 2011, 11:03:26 PM by sd
 #308

try rebooting your computer before running the kernel
any time I switch kernels I get hardware errors

I tried it, but it didn't help. I get the exact same low hashrate and hardware errors after the reboot.

EDIT: After installing phoenix 1.6.2 the new kernel works perfectly. Either I screwed something up or the latest kernel just doesn't work with phoenix 1.5. It's working now but it's 2.5Mhash/Sec slower than the phatk2.2 one with the same settings.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
August 31, 2011, 06:18:33 AM
 #309

try rebooting your computer before running the kernel
any time I switch kernels I get hardware errors

I tried it, but it didn't help. I get the exact same low hashrate and hardware errors after the reboot.

EDIT: After installing phoenix 1.6.2 the new kernel works perfectly. Either I screwed something up or the latest kernel just doesn't work with phoenix 1.5. It's working now but it's 2.5Mhash/Sec slower than the phatk2.2 one with the same settings.

Those results seem accurate; phatk2.2 is a bit faster on VLIW5 hardware. Could you take a look at your temperatures or number of accepted shares in comparison to phatk2.2?

Thanks,
Dia

sd
Hero Member
*****
Offline Offline

Activity: 730
Merit: 500



View Profile
August 31, 2011, 07:40:58 AM
 #310

try rebooting your computer before running the kernel
any time I switch kernels I get hardware errors

I tried it, but it didn't help. I get the exact same low hashrate and hardware errors after the reboot.

EDIT: After installing phoenix 1.6.2 the new kernel works perfectly. Either I screwed something up or the latest kernel just doesn't work with phoenix 1.5. It's working now but it's 2.5Mhash/Sec slower than the phatk2.2 one with the same settings.

Those results seem accurate; phatk2.2 is a bit faster on VLIW5 hardware. Could you take a look at your temperatures or number of accepted shares in comparison to phatk2.2?

Thanks,
Dia

The temperature seems somewhere between 2 and 4 degrees cooler. I'm solo mining so the only shares I generate are real blocks. I can't tell if anything has improved in that regard.

mute20
Sr. Member
****
Offline Offline

Activity: 265
Merit: 250


21


View Profile
September 02, 2011, 04:00:21 PM
 #311

I have finally found the 3% boost that I was talking about. Would this work with your improvement?

http://bitcointalk.org/index.php?topic=23067.0
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
September 02, 2011, 05:21:23 PM
 #312

I have finally found the 3% boost that I was talking about. Would this work with your improvement?

http://bitcointalk.org/index.php?topic=23067.0

This has been in since I saw that thread, I'm sorry to say ;).

Dia

Remember remember the 5th of November
Legendary
*
Offline Offline

Activity: 1862
Merit: 1011

Reverse engineer from time to time


View Profile
September 06, 2011, 01:03:07 AM
 #313

Diapolo, I tried the latest kernel. You say that BFI_INT will be enabled if it's supported, but on my 5870 it wasn't...
So I went from 385 to 334 MH/s.

BTC:1AiCRMxgf1ptVQwx6hDuKMu4f7F27QmJC2
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
September 06, 2011, 05:03:07 AM
 #314

Diapolo, I tried the latest kernel. You say that BFI_INT will be enabled if it's supported, but on my 5870 it wasn't...
So I went from 385 to 334 MH/s.

That's strange. Could you check if your OpenCL driver reports cl_amd_media_ops as available (via GPU Caps Viewer)?

Thanks,
Dia
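Instead of GPU Caps Viewer, the same check can be scripted against the extension string the driver reports. The sketch below is pure Python and only assumes that the string is space-separated, as it is for the `extensions` attribute of a PyOpenCL device. Treating cl_amd_media_ops as the deciding extension follows the post above and is an assumption about the kernel's auto-detection; both function names are made up.

```python
def supports_bfi_int(extensions):
    """Guess BFI_INT availability from a device's OpenCL extension string.

    `extensions` is the space-separated list a driver reports, for example
    the value of `device.extensions` in PyOpenCL.
    """
    return "cl_amd_media_ops" in extensions.split()


def bfi_int_enabled(extensions, user_setting=None):
    """Honour an explicit BFI_INT=false, otherwise auto-detect."""
    if user_setting is False:
        return False
    return supports_bfi_int(extensions)
```

If `supports_bfi_int` returns False for the string your 5870 reports, the drop in hashrate would be explained by the driver rather than by the kernel.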

ssateneth
Legendary
*
Offline Offline

Activity: 1344
Merit: 1004



View Profile
September 14, 2011, 09:55:47 PM
 #315

Diapolo, I tried the latest kernel. You say that BFI_INT will be enabled if it's supported, but on my 5870 it wasn't...
So I went from 385 to 334 MH/s.

That's strange. Could you check if your OpenCL driver reports cl_amd_media_ops as available (via GPU Caps Viewer)?

Thanks,
Dia

Did he even bother to post what flags he was using? I know there was some confusion between VECTORS and VECTORS2 between your kernel and phateus's. It sounds like he's only doing 1 nonce per execution instead of 2.

ssateneth
Legendary
*
Offline Offline

Activity: 1344
Merit: 1004



View Profile
November 04, 2011, 03:27:43 AM
 #316

:( Dead thread is dead. No more kernel updates?

Dexter770221
Legendary
*
Offline Offline

Activity: 1029
Merit: 1000


View Profile
November 04, 2011, 08:30:54 AM
 #317

The Bitcoin price doesn't encourage work, and the kernels are almost perfectly efficient, so many threads are dead...

Under development Modular UPGRADEABLE Miner (MUM). Looking for investors.
Changing one PCB with screwdriver and you have brand new miner in hand... Plug&Play, scalable from one module to thousands.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
November 04, 2011, 12:00:28 PM
 #318

The Bitcoin price doesn't encourage work, and the kernels are almost perfectly efficient, so many threads are dead...

It's not only the BTC price. The thankfulness in terms of donations was not really motivating anymore, and most people compared my mod with Phat's 2.X kernel only in terms of raw performance, which made me think no one cares about a different approach or even about testing whether there are differences in power draw or other factors. Well, now I'm too far away anyway ^^.

Dia

teukon
Legendary
*
Offline Offline

Activity: 1246
Merit: 1004



View Profile
November 04, 2011, 12:25:35 PM
 #319

The Bitcoin price doesn't encourage work, and the kernels are almost perfectly efficient, so many threads are dead...

It's not only the BTC price. The thankfulness in terms of donations was not really motivating anymore, and most people compared my mod with Phat's 2.X kernel only in terms of raw performance, which made me think no one cares about a different approach or even about testing whether there are differences in power draw or other factors. Well, now I'm too far away anyway ^^.

Dia

The donation problems throughout Bitcoin are rather strange.  Despite having a very easy way to send wealth across the internet even the most well known and respected developers receive very little as thanks for their efforts.  Perhaps this is partly due to the fact that Bitcoin (particularly Bitcoin mining) attracts people who are, on average, not so inclined to donations, and that most of the miners that care about the extra 0.5% of income from mining are naturally very tight with their money.

Ah well.  Thanks once again for all of your work.  Following this thread and trying out the patches as they came out was fun.  I hope you had fun too!
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
November 04, 2011, 02:12:26 PM
 #320

The Bitcoin price doesn't encourage work, and the kernels are almost perfectly efficient, so many threads are dead...

It's not only the BTC price. The thankfulness in terms of donations was not really motivating anymore, and most people compared my mod with Phat's 2.X kernel only in terms of raw performance, which made me think no one cares about a different approach or even about testing whether there are differences in power draw or other factors. Well, now I'm too far away anyway ^^.

Dia

The donation problems throughout Bitcoin are rather strange.  Despite having a very easy way to send wealth across the internet even the most well known and respected developers receive very little as thanks for their efforts.  Perhaps this is partly due to the fact that Bitcoin (particularly Bitcoin mining) attracts people who are, on average, not so inclined to donations, and that most of the miners that care about the extra 0.5% of income from mining are naturally very tight with their money.

Ah well.  Thanks once again for all of your work.  Following this thread and trying out the patches as they came out was fun.  I hope you had fun too!


I did :). I liked getting (positive) feedback and being an interesting part of Bitcoin for a short period of time.

Dia

dishwara
Legendary
*
Offline Offline

Activity: 1855
Merit: 1016



View Profile
November 04, 2011, 03:11:47 PM
 #321

Sorry for not donating.
I tried & used your software when I was mining.
2 btc sent to the address in signature. 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM
Something is better than nothing.
Thank you for your valuable kernel.
gat3way
Sr. Member
****
Offline Offline

Activity: 256
Merit: 250


View Profile
November 05, 2011, 08:27:23 PM
 #322

I did :). I liked getting (positive) feedback and being an interesting part of Bitcoin for a short period of time.
Dia

Still coding OpenCL stuff, Diapolo?
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
November 06, 2011, 10:22:05 AM
 #323

I did :). I liked getting (positive) feedback and being an interesting part of Bitcoin for a short period of time.
Dia

Still coding OpenCL stuff, Diapolo?

I had no time to make any further progress. From time to time I visit AMD's OpenCL forum to stay a little up to date, but I'm currently not coding. The last thing I tried was to implement 3-component vectors in the kernel, but AMD's drivers still seem buggy there.

Dia

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
November 06, 2011, 10:22:30 AM
 #324

Sorry for not donating.
I tried & used your software when I was mining.
2 btc sent to the address in signature. 1B6LEGEUu1USreFNaUfvPWLu6JZb7TLivM
Something is better than nothing.
Thank you for your valuable kernel.

I received your donation, a warm thank you :).

Dia

d3m0n1q_733rz
Sr. Member
****
Offline Offline

Activity: 378
Merit: 250



View Profile WWW
December 08, 2011, 08:50:39 AM
Last edit: December 08, 2011, 09:01:41 AM by d3m0n1q_733rz
 #325

Small change I could suggest just looking at some of the code.  I notice that some variables use simple addition and subtraction a few times.  For example:  
Code:
// intermediate W calculations
#define P1(n) (rot(W[(n) - 2 - O], 15U) ^ rot(W[(n) - 2 - O], 13U) ^ (W[(n) - 2 - O] >> 10U))
#define P2(n) (rot(W[(n) - 15 - O], 25U) ^ rot(W[(n) - 15 - O], 14U) ^ (W[(n) - 15 - O] >> 3U))
#define P3(n) W[n - 7 - O]
#define P4(n) W[n - 16 - O]

// full W calculation
#define W(n) (W[n - O] = P4(n) + P3(n) + P2(n) + P1(n))
You'll notice that n - O comes up several times in a row.  Wouldn't it be better to combine n - O into its own variable and subtract from it, to reduce the number of variables that have to be read?  After all, n - 7 - O and n - 16 - O share the same n - O base.  If you combine n - O, it's just a simple "pull from buffer and subtract this number" problem.
I haven't tested it, but I imagine it could speed things along slightly.
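
To make the question concrete, here is a Python sketch of what the macros compute: the SHA-256 message-schedule expansion, once with every index written out as in the macros and once with n - O hoisted into a variable. It assumes rot is a 32-bit left rotation (as OpenCL's rotate is) and treats O as a fixed offset into the W array; the function names are made up for the illustration.

```python
MASK = 0xFFFFFFFF


def rot(x, bits):
    """32-bit left rotation, assumed to match the kernel's rot()."""
    return ((x << bits) | (x >> (32 - bits))) & MASK


def expand_as_written(W, n, O):
    """Mirror of the W(n) macro with every index spelled out."""
    p1 = rot(W[n - 2 - O], 15) ^ rot(W[n - 2 - O], 13) ^ (W[n - 2 - O] >> 10)
    p2 = rot(W[n - 15 - O], 25) ^ rot(W[n - 15 - O], 14) ^ (W[n - 15 - O] >> 3)
    W[n - O] = (W[n - 16 - O] + W[n - 7 - O] + p2 + p1) & MASK
    return W[n - O]


def expand_hoisted(W, n, O):
    """Same calculation with n - O computed once (the suggested change)."""
    base = n - O
    w2, w15 = W[base - 2], W[base - 15]
    p1 = rot(w2, 15) ^ rot(w2, 13) ^ (w2 >> 10)
    p2 = rot(w15, 25) ^ rot(w15, 14) ^ (w15 >> 3)
    W[base] = (W[base - 16] + W[base - 7] + p2 + p1) & MASK
    return W[base]


def schedule(first16, expand, O=0):
    """Expand 16 input words to the full 64-word schedule."""
    W = list(first16) + [0] * 48
    for n in range(16 + O, 64 + O):
        expand(W, n, O)
    return W
```

Both variants produce the same schedule. If n and O are compile-time constants in the kernel, as they appear to be, every index collapses to a literal in either form, so the only open question is whether the compiler does that folding for you.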

Funroll_Loops, the theoretically quicker breakfast cereal!
Check out http://www.facebook.com/JupiterICT for all of your computing needs.  If you need it, we can get it.  We have solutions for your computing conundrums.  BTC accepted!  12HWUSguWXRCQKfkPeJygVR1ex5wbg3hAq
kano
Legendary
*
Offline Offline

Activity: 4494
Merit: 1808


Linux since 1997 RedHat 4


View Profile
December 08, 2011, 11:27:41 AM
 #326

Small change I could suggest just looking at some of the code.  I notice that some variables use simple addition and subtraction a few times.  For example:  
Code:
// intermediate W calculations
#define P1(n) (rot(W[(n) - 2 - O], 15U) ^ rot(W[(n) - 2 - O], 13U) ^ (W[(n) - 2 - O] >> 10U))
#define P2(n) (rot(W[(n) - 15 - O], 25U) ^ rot(W[(n) - 15 - O], 14U) ^ (W[(n) - 15 - O] >> 3U))
#define P3(n) W[n - 7 - O]
#define P4(n) W[n - 16 - O]

// full W calculation
#define W(n) (W[n - O] = P4(n) + P3(n) + P2(n) + P1(n))
You'll notice that n - O comes up several times in a row.  Wouldn't it be better to combine n - O into its own variable and subtract from it, to reduce the number of variables that have to be read?  After all, n - 7 - O and n - 16 - O share the same n - O base.  If you combine n - O, it's just a simple "pull from buffer and subtract this number" problem.
I haven't tested it, but I imagine it could speed things along slightly.
Just looking at that from a standard compiler point of view
(I have no idea how good or bad or literal the OpenCL compiler is)
The compiler would probably notice that anyway if the OpenCL compiler was able to do basic optimisations.

Actually, Diapolo, do you know if it has an optimiser in it? (and how good it is?)
i.e. would that change suggested make a difference, or would the compiler work it out itself?
... just curious :)

Pool: https://kano.is - low 0.5% fee PPLNS 3 Days - Most reliable Solo with ONLY 0.5% fee   Bitcointalk thread: Forum
Discord support invite at https://kano.is/ Majority developer of the ckpool code - k for kano
The ONLY active original developer of cgminer. Original master git: https://github.com/kanoi/cgminer
deepceleron
Legendary
*
Offline Offline

Activity: 1512
Merit: 1032



View Profile WWW
December 08, 2011, 12:15:20 PM
 #327

Yes, in fact there are some tweaks done in the code now to make the OpenCL compiler produce more optimized code than it normally does.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
December 08, 2011, 06:25:19 PM
 #328

Small change I could suggest just looking at some of the code.  I notice that some variables use simple addition and subtraction a few times.  For example:  
Code:
// intermediate W calculations
#define P1(n) (rot(W[(n) - 2 - O], 15U) ^ rot(W[(n) - 2 - O], 13U) ^ (W[(n) - 2 - O] >> 10U))
#define P2(n) (rot(W[(n) - 15 - O], 25U) ^ rot(W[(n) - 15 - O], 14U) ^ (W[(n) - 15 - O] >> 3U))
#define P3(n) W[n - 7 - O]
#define P4(n) W[n - 16 - O]

// full W calculation
#define W(n) (W[n - O] = P4(n) + P3(n) + P2(n) + P1(n))
You'll notice that n - O comes up several times in a row.  Wouldn't it be better to combine n - O into its own variable and subtract from it, to reduce the number of variables that have to be read?  After all, n - 7 - O and n - 16 - O share the same n - O base.  If you combine n - O, it's just a simple "pull from buffer and subtract this number" problem.
I haven't tested it, but I imagine it could speed things along slightly.
Just looking at that from a standard compiler point of view
(I have no idea how good or bad or literal the OpenCL compiler is)
The compiler would probably notice that anyway if the OpenCL compiler was able to do basic optimisations.

Actually, Diapolo, do you know if it has an optimiser in it? (and how good it is?)
i.e. would that change suggested make a difference, or would the compiler work it out itself?
... just curious :)

It's not a beneficial change, because the compiler optimizes this out, and the current form keeps the code a bit more readable.
I'm pretty sure the easy optimizations are all done, but if you guys prove me wrong it would be nice ;).

Dia

d3m0n1q_733rz
Sr. Member
****
Offline Offline

Activity: 378
Merit: 250



View Profile WWW
December 09, 2011, 04:54:42 AM
 #329

Is there a way to disassemble the compiled version into a readable format so that I can search for things to optimize?  I've learned never to leave to a compiler what you can do yourself.  Sometimes compilers will take you at your word.

kano
Legendary
*
Offline Offline

Activity: 4494
Merit: 1808


Linux since 1997 RedHat 4


View Profile
December 09, 2011, 08:14:08 AM
 #330

True - however, consider this little comparison ...
A reasonably simple version of sha256 in C is almost twice as fast when compiled with -O2 as without.
(Yeah, I spent a couple of weeks recently playing with sha256 in C code and seeing what I could do with it ... and early on wondered why I was getting such bad results, until I noticed I had stupidly left out -O2 ... :P)
Their compiler may not be as good as gcc, but hopefully not much worse.

Of course, yes, do try; many will be interested in your results :)

gat3way
Sr. Member
****
Offline Offline

Activity: 256
Merit: 250


View Profile
December 09, 2011, 01:55:33 PM
 #331

The OpenCL compiler does include constant folding as an optimization pass. It is an obvious optimization, so there is no need to try this.
d3m0n1q_733rz
Sr. Member
****
Offline Offline

Activity: 378
Merit: 250



View Profile WWW
December 10, 2011, 11:50:49 PM
 #332

Anyone know of a decompiler I can use to look at the compiled source?  It'll help me remove unnecessary variables and the like.  Granted, I'm only decent with assembly at the moment, but I wouldn't mind seeing the finished product when the optimizer takes hold.

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
December 11, 2011, 09:05:42 AM
 #333

Anyone know of a decompiler I can use to look at the compiled source?  It'll help me remove unnecessary variables and the like.  Granted, I'm only decent with assembly at the moment, but I wouldn't mind seeing the finished product when the optimizer takes hold.

Take a look at AMD APP KernelAnalyzer 1.9. It creates assembly-like output for OpenCL kernels and gives register information and that kind of stuff ... it's in the AMD APP SDK.

Dia

gat3way
Sr. Member
****
Offline Offline

Activity: 256
Merit: 250


View Profile
December 19, 2011, 09:54:19 PM
 #334

Is anyone interested in keeping that kernel up to date with SDK 2.6? 3-component vectors are working now, and it would need to be reordered a bit again to get better ALU packing, as the compiler backend has apparently changed. I lost my interest in bitcoin, but it would be an interesting experiment. I believe pre-2.6 speeds can easily be regained.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
December 20, 2011, 04:07:11 PM
 #335

I made a few quick performance checks on a 6950 + a 6650D (APU) and it's weird. CGMINER is quite a bit slower with phatk2, compared to Phoenix 1.7 with my latest kernel on my rig.

For the 6950, CGMINER 2.0.8 is @ 330 MH/s with
Code:
-I 8 -d 0 -v 2 -w 128 --auto-gpu --gpu-fan 25-50 --gpu-engine 800 --gpu-memclock 1250 --temp-target 70

For the 6950, Phoenix 1.7 is @ 355 MH/s with
Code:
-a 50 -k phatk AGGRESSION=12 DEVICE=0 FASTLOOP=false VECTORS2 WORKSIZE=128

Am I missing something? Both run with 800 / 1250 and 2-component Vectors + Worksize of 128. I'm using SDK 2.6 installed with Cat 12.1 Preview!

Dia
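
To put the gap between the two miners into perspective, the relative difference can be worked out from the two hashrates quoted above. The helper below is just arithmetic on those numbers; its name is made up.

```python
def relative_change(baseline_mhs, new_mhs):
    """Return the change from baseline to new as a percentage."""
    return (new_mhs - baseline_mhs) / baseline_mhs * 100.0


cgminer_mhs = 330.0  # CGMINER 2.0.8 with phatk2, from the post
phoenix_mhs = 355.0  # Phoenix 1.7 with the latest phatk_dia, from the post

gain = relative_change(cgminer_mhs, phoenix_mhs)
print("Phoenix is %.1f%% faster than CGMINER" % gain)
```

That is 25 MH/s on top of 330 MH/s, or roughly 7.6%.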

Dexter770221
Legendary
*
Offline Offline

Activity: 1029
Merit: 1000


View Profile
December 20, 2011, 09:43:22 PM
 #336

For 6950 and cgminer I have identical hashrate. But memclock is set to 690. Catalyst 11.9

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
December 20, 2011, 10:23:48 PM
 #337

For 6950 and cgminer I have identical hashrate. But memclock is set to 690. Catalyst 11.9

Could you give Phoenix 1.7 with the latest version from post #1 a try and report back :)?

Thanks,
Dia

Btw.: Is anyone able to help me get 3-component vectors to work? The kernel should be valid, but in __init__.py line 50
Code:
self.size = (nonceRange.size / rateDivisor) / self.iterations
it seems that
Code:
nonceRange.size / rateDivisor
(rateDivisor == 3 if VECTORS3 is used as kernel argument instead of VECTORS2) causes a problem, because nonceRange.size is a multiple of 256, which is generally not divisible by 3.
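
The divisibility problem can be illustrated with a short Python sketch. It assumes that the line quoted above uses truncating integer division (as / does on integers in Python 2) and that each work item handles rateDivisor nonces per iteration. The function split_nonce_range and the example range sizes are made up for the illustration and are not part of Phoenix.

```python
def split_nonce_range(size, rate_divisor, iterations=1):
    """Split a nonce range into work items plus leftover nonces.

    Returns (global_size, covered, leftover): the number of work items per
    kernel call, how many nonces all calls together cover, and how many
    nonces at the end of the range would be skipped.
    """
    global_size = (size // rate_divisor) // iterations
    covered = global_size * rate_divisor * iterations
    return global_size, covered, size - covered


# A range of 256 * 1000 nonces splits cleanly with 2-component vectors ...
print(split_nonce_range(256 * 1000, 2))
# ... but leaves a remainder with 3-component vectors.
print(split_nonce_range(256 * 1000, 3))
```

With a divisor of 3 some nonces are left over unless the range size happens to be a multiple of 3, so the host code would presumably have to pad the range or handle the leftover nonces separately.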

ssateneth
Legendary
*
Offline Offline

Activity: 1344
Merit: 1004



View Profile
December 27, 2011, 03:21:39 AM
 #338

So what's the latest kernel? 8-27? Or is there a secret newer version that I'm not seeing? Because according to the main page, there is an unreleased kernel that's faster than 8-27, which is also called current. Where can I get the current kernel?

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
December 27, 2011, 09:37:44 AM
 #339

So what's the latest kernel? 8-27? Or is there a secret newer version that I'm not seeing? Because according to the main page, there is an unreleased kernel that's faster than 8-27, which is also called current. Where can I get the current kernel?

It's not released, because I had no time over Christmas ;) ... I guess I can put it online later today or tomorrow.

Dia

naz86
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
December 27, 2011, 10:52:27 AM
 #340

Hi Diapolo,

do you think we can still get improvements as big as in the past?
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
December 27, 2011, 11:15:55 AM
 #341

Hi Diapolo,

do you think we can still get improvements as big as in the past?

Perhaps with AMD's Graphics Core Next architecture and new kernels or the use of new OpenCL features, but I'm not able to write a new kernel from scratch ;).
My current work is only to reorder some instructions in the kernel for better performance ... no big deal :D.

Dia

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
December 27, 2011, 11:39:49 AM
 #342

Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9

Should restore some of the speed loss for 58XX owners, who switched to SDK / runtime 2.6 and is the best for 69XX owners, too.

Edit: Guys, try a setting of 64 for the WORKSIZE, it showed good results for me, but still depends on the card!

Dia

bulanula
Hero Member
*****
Offline Offline

Activity: 518
Merit: 500



View Profile
December 27, 2011, 11:41:23 AM
 #343

Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9

Should restore some of the speed loss for 58XX owners, who switched to SDK / runtime 2.6 and is the best for 69XX owners, too.

Dia

Great work mate !!!

So this should be pretty ideal for 5870 cards right ? What is the best setup for owners of 5870s right now in terms of SDK version, kernel, miner etc. ( software ) ? Thanks !
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
December 27, 2011, 11:44:22 AM
 #344

Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9

Should restore some of the speed loss for 58XX owners, who switched to SDK / runtime 2.6 and is the best for 69XX owners, too.

Dia

Great work mate !!!

So this should be pretty ideal for 5870 cards right ? What is the best setup for owners of 5870s right now in terms of SDK version, kernel, miner etc. ( software ) ? Thanks !

Well, I sold my 5830, so I could only test on a 6950, but what I saw there was a boost with Phoenix 1.7 in comparison to CGMINER, which uses Phatk 2.X (which seems somewhat broken with SDK 2.6). If this is not the case for your setup, don't blame me :D.

Dia

cyberlync
Full Member
***
Offline Offline

Activity: 226
Merit: 100



View Profile
December 27, 2011, 02:33:48 PM
 #345

Thanks for the work!

I can't get the file from mediafire, I keep getting an error that says something about no servers available with the file on them... Any possibility to upload to an alternative place?

Thanks in advance.

Giving away your BTC's? Send 'em here: 1F7XgercyaXeDHiuq31YzrVK5YAhbDkJhf
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
December 27, 2011, 03:56:30 PM
 #346

Thanks for the work!

I can't get the file from mediafire, I keep getting an error that says something about no servers available with the file on them... Any possibility to upload to an alternative place?

Thanks in advance.

Tell me which filehoster works for you and I can upload it there. But it should be possible to upload without registering!

Dia

conspirosphere.tk
Legendary
*
Offline Offline

Activity: 2352
Merit: 1064


Bitcoin is antisemitic


View Profile
December 27, 2011, 09:20:52 PM
 #347

Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9
Should restore some of the speed loss for 58XX owners, who switched to SDK / runtime 2.6 and is the best for 69XX owners, too.
Dia

Thanks, but no thanks. It makes about 405 Mhs on my 5870@965/300, against 456 Mhs with phatk2 on Phoenix 1.7, same string. Ati Driver 11.6.
cyberlync
Full Member
***
Offline Offline

Activity: 226
Merit: 100



View Profile
December 27, 2011, 10:35:24 PM
 #348

Thanks for the work!

I can't get the file from mediafire, I keep getting an error that says something about no servers available with the file on them... Any possibility to upload to an alternative place?

Thanks in advance.

Tell me which filehoster works for you and I can upload it there. But it should be possible to upload without registering!

Dia

Just checked again and it worked, they must have been updating some servers or whatever, sorry to bother you. Thanks again, will send a donation as soon as the client is done downloading the blocks.

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
December 27, 2011, 11:15:28 PM
 #349

Download version 2011-12-21: http://www.mediafire.com/?r3n2m5s2y2b32d9
Should restore some of the speed loss for 58XX owners, who switched to SDK / runtime 2.6 and is the best for 69XX owners, too.
Dia

Thanks, but no thanks. It makes about 405 Mhs on my 5870@965/300, against 456 Mhs with phatk2 on Phoenix 1.7, same string. Ati Driver 11.6.

Same string means you didn't supply VECTORS2 instead of VECTORS, right?

Dia

conspirosphere.tk
Legendary
*
Offline Offline

Activity: 2352
Merit: 1064


Bitcoin is antisemitic


View Profile
December 28, 2011, 08:18:27 AM
 #350

Same string means you didn't supply VECTORS2 instead of VECTORS, right?
Dia

This is the string I used with your kernel to get barely 405 Mhs with Phoenix 1.7:
phoenix.exe -u http://***:***@btcguild.com:8332/ -k phatk DEVICE=0 VECTORS2 FASTLOOP=false AGGRESSION=7 WORKSIZE=64 -a 1000

And this is the string which makes me 455 Mhs with Phoenix 1.7:
phoenix.exe -u http://***:***@btcguild.com:8332/ -k phatk2 DEVICE=0 VECTORS FASTLOOP=false AGGRESSION=7 WORKSIZE=256 -a 1000
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
December 28, 2011, 09:52:16 AM
 #351

Same string means you didn't supply VECTORS2 instead of VECTORS, right?
Dia

This is the string I used with your kernel to get barely 405 Mhs with Phoenix 1.7:
phoenix.exe -u http://***:***@btcguild.com:8332/ -k phatk DEVICE=0 VECTORS2 FASTLOOP=false AGGRESSION=7 WORKSIZE=64 -a 1000

And this is the string which makes me 455 Mhs with Phoenix 1.7:
phoenix.exe -u http://***:***@btcguild.com:8332/ -k phatk2 DEVICE=0 VECTORS FASTLOOP=false AGGRESSION=7 WORKSIZE=256 -a 1000

Thanks for sharing, on my machine phatk2 is slower with 6950 (VLIW4) and 6550D (VLIW5 - Fusion APU).
You have 11.6 as driver!? And I said this kernel is for SDK / Runtime 2.6, which means it's best for 11.12 / 12.1 preview and newer!

Dia

naz86
Member
**
Offline Offline

Activity: 111
Merit: 10


View Profile
December 28, 2011, 01:38:31 PM
 #352

NICE,

went from 243mhs to 245 on my 5830 on stock clock with parameters:

-k phatk DEVICE=0 VECTORS2 FASTLOOP=false AGGRESSION=6 WORKSIZE=128 -a 1000
disclaimer201
Legendary
*
Offline Offline

Activity: 1526
Merit: 1001


View Profile
January 08, 2012, 03:35:15 PM
 #353

Interesting. I have Sdk2.6 & 11.12 driver. With your kernel, I get back my better performance, but the cpu bug is back. Once I switch off phoenix and switch on poclbm again, it's gone.

I assume there is no way to get rid of the bug AND have the good performance back again?
deepceleron
Legendary
*
Offline Offline

Activity: 1512
Merit: 1032



View Profile WWW
January 08, 2012, 08:29:57 PM
 #354

If you didn't note it, I profiled the performance of Diapolo's kernel on my 5830 using both SDK 2.5 and the performance-robbing 2.6 here with lots of options and RAM clock speeds. It might give insights of where to go on the kernel for current driver OpenCL performance; the 58xx is a bit different than Diapolo's 5770 in how it responds to worksize.
Diapolo (OP)
January 08, 2012, 09:17:06 PM
 #355

For my setup (6950 + 6550D) the strange thing is that CGMINER (phatk2.x) is slower, no matter how it's configured. The 6950 is quite a bit faster; for the 6550D the difference is only a few MH/s.

Dia

ssateneth
January 11, 2012, 08:20:16 AM
 #356

So i've got cat12.1
Crossfire 6870's 1000core 600mem
Phoenix 1.7.3 And 1.7.2
Using Dia's most recent custom kernal
And it Pegs my cpu core at 100%
It also doesnt display hashrates correctly, Unless it actually is only getting 205mh/sec with
-v -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=256
-k phatk AGGRESSION=11 FASTLOOP=false VECTORS2 WORKSIZE=128
-v -k phatk AGGRESSION=12 FASTLOOP=false VECTORS2 WORKSIZE=64

It just seems to Shit all over my cards... and also Pegs my cpu core upto 100%

What the hell am i doing wrong?
Even the Default Phoenix kernal brings back the 100% cpu bug....

And yet, GUIminer Poclbm with -v -w128 -f1 gives me 298mh/sec and no cpu bug....
Really i WANT to use this kernal, But why the hell is it failing?

Poclbm is apparently Stupid Easy to use. And as such i would assume that it cannot do as much as other miners can.
So i came to Phoenix, It was either that or CG, And CG miner ALSO brings back 100%cpu usage, but atleast cg miner gives me 294mhash/sec

use aggression 10 or lower to avoid 100% cpu. you'll get about 40% cpu with 11, and 100 with 12+. 10 and lower will be maybe 4-5%

blandead
January 12, 2012, 03:57:25 AM
 #357

Hi Diapolo,

I just wanted to say thank you for your most recent kernel, the 2011-12-21 version is by far the best.

Im getting 455 Mhash/sec on my 6970 using Phoenix 1.7.1 (Aggression=12 Worksize=128 VECTORS2)

It easily averages 7 shares/minute which is fantastic. I had to lower my memory clock to 200 Mhz for optimal performance as it actually runs slower above this speed, which is fine with me as I can raise the core clock a bit higher with the lower temperature (When using VECTORS4 I had to set memory clock to 375 Mhz to get similar performance, but then I had to lower clock speed and it wasn't worth it. Worksize of 256 becomes really slow after a while regardless of settings)

So far this is the best kernel to use, much more efficient than any version of phatk2 if you find the right speed and setting. I wish VECTORS3 would work though, but the way you calculate Worksize it's impossible.

I'm running 12.1 preview drivers with the new 2.6 SDK. With a custom forked poclbm or even guiminer, I can set Worksize in increments of 32.

I've been trying to run VECTORS3 with Worksize=192 as this should be divisible, and I've used this worksize before with phatk2.2, which is slightly faster for VECTORS4.
But, when I try to use your kernel with a different Worksize (other than 64,128,256) it just gives me an error saying "this worksize is not valid" even though it should be possible, and with the Worksizes it does accept, I don't think the ratedivisor will accept them. I was hoping there was some way to allow worksize in 32 increments as I think Worksize=192 would be a good test for VECTORS3.

Anyways, currently I'm just using VECTORS2 and so far the results are outstanding on my AMD 6970!

Diapolo (OP)
January 12, 2012, 06:28:01 AM
Last edit: January 12, 2012, 01:02:05 PM by Diapolo
 #358

Hi blandead, thanks for your positive feedback. About the WORKSIZE: I read through AMD's OpenCL guide, and it says it's best to use a multiple of 64 as WORKSIZE, because a wavefront on modern AMD GPUs runs 64 work-items in parallel. But it's not that hard to change the code to accept 32 as a divider ... I'll consider this. For the vec3 thing, I'm trying hard to figure out what the problem is, but so far I have no clue :-/.

Edit: The WORKSIZE can be set to any value that is divisible by 2 and not above the HW max (256 for current AMD GPUs), so 32 works ... tried it, but it cripples wavefronts!

Edit 2: From the OpenCL manual: "If local_work_size is specified, the values specified in global_work_size[0],... global_work_size[work_dim - 1] must be evenly divisible by the corresponding values specified in local_work_size[0],... local_work_size[work_dim - 1]." So the global worksize needs to be evenly divisible by the supplied WORKSIZE, which seems not to be true for WORKSIZE=96 in Phoenix.

Edit 3: I edited the __init__.py to allow WORKSIZE to be any value that evenly divides the global worksize. This is in the next release.
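
To make the rule from Edit 2 concrete, here is a minimal Python sketch of such a check. It is not the code from __init__.py; the function name, the default limit of 256 and the example global worksize are assumptions for illustration only.

```python
def is_valid_worksize(worksize, global_size, device_max=256):
    """Return True if worksize could serve as the OpenCL local work size.

    Hypothetical helper: device_max=256 mirrors the HW max mentioned above,
    global_size stands for the global worksize the miner would enqueue.
    """
    if worksize <= 0 or worksize > device_max:
        return False
    # OpenCL requires the global size to be evenly divisible by the local size.
    return global_size % worksize == 0


global_size = 1 << 12  # example value only, not taken from Phoenix
for ws in (32, 64, 96, 128, 192, 256):
    print(ws, is_valid_worksize(ws, global_size))
```

With a power-of-two global worksize like the one above, 96 and 192 fail the check while 32, 64, 128 and 256 pass.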

Dia

PS.: Guys if you wish to support my work keep donating, there were nearly no donations in the last weeks!

Fiyasko
January 12, 2012, 09:41:24 PM
 #359

Hohohoo!, Thank you guy what told me to drop Aggro to 10, No more cpu pegging!
I just have one more problem, My 2nd gpu uses about 91-98% usage.. While 1st gpu is at a Lovely 99%... 'Sup? Both gpus are on different cpu cores..

http://bitcoin-otc.com/viewratingdetail.php?nick=DingoRabiit&sign=ANY&type=RECV <-My Ratings
https://bitcointalk.org/index.php?topic=857670.0 GAWminers and associated things are not to be trusted, Especially the "mineral" exchange
blandead
January 12, 2012, 11:48:11 PM
 #360

Sounds great, I'm looking forward to your next release! Even though wavefront may get crippled a little, with worksize=192 on vectors4 I didn't see much of a difference in the number of shares output, that's why I'm hoping to try it with vectors3. I'll definitely be sending a donation your way tomorrow!
Diapolo (OP)
January 13, 2012, 11:02:21 AM
 #361

I took a deep look into Phoenix: the initial number of nonces to run per execution is 1 << AGGRESSION, so it is always a power of two and currently seems to be evenly divisible by 64. A power of two is never evenly divisible by 192 (= 3 * 64), which makes 192 invalid as WORKSIZE. I'm not sure how to change this to allow 192 as a valid value without breaking other things in the code.
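
A quick way to see this, as a rough sketch that only assumes the nonce count is 1 << AGGRESSION as described above (the helper name and the candidate list are made up for illustration, they are not part of Phoenix):

```python
def valid_worksizes(aggression, candidates=(32, 64, 96, 128, 192, 256)):
    """List the candidate WORKSIZE values that evenly divide 1 << aggression.

    Hypothetical helper; assumes the per-execution nonce count is exactly
    1 << aggression and ignores any other limits the miner may apply.
    """
    nonces = 1 << aggression
    return [ws for ws in candidates if nonces % ws == 0]


for aggression in (7, 10, 12):
    print(aggression, valid_worksizes(aggression))
```

96 and 192 never show up in the result, whatever AGGRESSION is used, because both contain a factor of 3 and a power of two does not.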

Internal tests with my latest kernel show good results with "VECTORS2 WORKSIZE=128" and even with "VECTORS4 WORKSIZE=64" on VLIW5 GPUs, so perhaps 192 is not needed ... will see.

I'm currently working on release notes, stay tuned.

Dia

Diapolo (OP)
January 13, 2012, 11:44:43 AM
 #362

A new version is ready for your testing pleasure:
Download version 2012-01-13: http://www.mediafire.com/download.php?2sqoj8obvp1q23p

highlights:
- the child has its name, I call it phatk_dia - would be nice if you guys use this in discussions to be clear what your kernel is ;)
- faster on VLIW5 GPUs with VECTORS2 and VECTORS4
- more efficient on VLIW4 GPUs with VECTORS2 and a little faster with VECTORS4
- FASTLOOP defaults to false, so you don't need to supply FASTLOOP=false
- added an extended check for supplied WORKSIZE parameter
- removed a pyOpenCL finish() to reduce API overhead (could cause problems, but works here -> consider this beta till it proves stable)

Please report and give me all your coins :-D!

Edit: Please don't complain if this doesn't work well for non-2.6 SDK / Runtime versions, because this IS for 2.6 or later!

Dia

Fiyasko
January 13, 2012, 05:30:35 PM
Last edit: January 13, 2012, 05:43:03 PM by JackRabiit
 #363

Phatk_Dia! Whooo! I personally think you've done enough "tweaks" to this kernal for it to become embedded with your name, How'bout just making it PhatkD

Ughhh....
Memclock seems to be mandatory to have set near 1000 otherwise i lose speed,
-k phatk AGGRESSION=10 VECTORS4 WORKSIZE=64 Dual 6870's
1000core 600mem=285mh/s @81°c fans at 90%
1000core 1000mem=317mh/s @88°C fans at 100%

I dont suppose thiers a way to Not have my mem clock up? 88°C Is MURDER,

Also, Im still having issues with the 2gpu have Low usage, Gpu1=99% Wich is fucking beautiful but gpu#2 Never kisses 99% once, Always dancing in 91-98%, 285-315mh/s

Diapolo (OP)
January 13, 2012, 05:46:27 PM
 #364

Is this version any faster for you? What were your results with the last version, as a comparison?
Are your cards x-fired? What happens if only the 2nd GPU is active for Phoenix? I'm not sure what makes card 2 behave like that.
Do you have more rejects or more shares submitted with this version?

Dia

Fiyasko
January 13, 2012, 05:49:22 PM
Last edit: January 13, 2012, 06:03:30 PM by JackRabiit
 #365

316.8mh/sec with previous kernal, 317.6mh/sec with new kernal
No this version does not Appear to be Noteably faster for me, But, On the other hand, My comp feels like its "mining cleaner" i cant really describe it.. Samespeeds.. Less desktoplag...
Yes my cards are Crossfired. When i set just the 2nd gpu to PhatkD, It does what it should, It goes to 99% and gives out exactly the same as what gpu 1 does, But as soon as i enable gpu 1 to mine at the same time as gpu 2 (with gpu2 starting first and running at 99%) Then it Drops off, to 98-92% fluttering, I'll add a pic. Both cards are on differnt cpu cores, just in case

I occasionally get smacked with a stale share RIGHT AWAY, But after that everything is normal.. and it's only occasional, and it's like "Star---OMFG INVALI--Running"

http://imageshack.us/f/718/28674354.png/<--Only useful info is the MSI window

Diapolo (OP)
January 13, 2012, 05:52:05 PM
 #366

Testing, Expect an update in 6mins

6870s are VLIW5, so I'm hoping for good news.

Dia

Fiyasko
January 13, 2012, 05:58:27 PM
 #367

Well then sorry for the sad news :|

Diapolo (OP)
January 13, 2012, 06:00:51 PM
 #368

Boooo :-/. Please try VECTORS2 and WORKSIZE=128, too.
Phoenix should compile a new binary for this kernel version, but perhaps you could delete all the .elf files in the phatk directory to be sure.
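
If you prefer to script the cleanup, something like this Python sketch should do. The function name and the example directory are assumptions, adjust them to wherever your Phoenix kernel folder lives:

```python
from pathlib import Path


def clear_cached_binaries(kernel_dir):
    """Delete every *.elf file directly inside kernel_dir.

    Hypothetical helper: returns the sorted names of the removed files so
    you can see what was cleared. Other files (kernel.cl, __init__.py)
    are left untouched.
    """
    removed = []
    for path in Path(kernel_dir).glob("*.elf"):
        path.unlink()
        removed.append(path.name)
    return sorted(removed)


# Example call with a made-up path:
# clear_cached_binaries("kernels/phatk")
```

On the next start the miner should then rebuild the binary from kernel.cl.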

Dia

Fiyasko
January 13, 2012, 06:05:06 PM
 #369



Deleted all .Elf's
VECTORS2 and WORKSIZE=128, 305mh/s
VECTORS4 and WORKSIZE=64, 317mh/s

Please note, That i apparently MUST have my memclock at 1000 or i cannot reach these speeds, Problem, Is heat, If my mem is at 1000, Then i cant run my core at 1000, It gets too hot,
With my mem at 600

VECTORS2 and WORKSIZE=128, 283mh/s
VECTORS4 and WORKSIZE=64, 287mh/s

I currently have two different problems with running PhatkD
2nd gpu dances and messes around
and mem clock Must be at 1000<--Bullocks, That kills my cards

Fiyasko
January 13, 2012, 06:17:23 PM
 #370



Deleted all .Elf's
VECTORS2 and WORKSIZE=128, 307.8mh/s
VECTORS4 and WORKSIZE=64, 314.2mh/s

Please note, That i apparently MUST have my memclock at 1000 or i cannot reach these speeds, Problem, Is heat, If my mem is at 1000, Then i cant run my core at 1000, It gets too hot,
With my mem at 600

VECTORS2 and WORKSIZE=128, 283mh/s
VECTORS4 and WORKSIZE=64, 287mh/s

I currently have two different problems with running PhatkD
2nd gpu dances and messes around
and mem clock Must be at 1000<--Bullocks, That kills my cards
!!!!!!!!!!!
Just noticed... That after deleting all .elf's I've lost performance.... But it's like 2mhash/s and could simply just be that fact that im using my comp while doing these tests
Wtf.. Killing, those .elf's made 128 run better, aswell as made 64 run worse? Thats gotta be inaccurate on my part...

Diapolo (OP)
January 13, 2012, 06:20:07 PM
Last edit: January 13, 2012, 06:37:49 PM by Diapolo
 #371

Ok, I'll let you first play around a bit, before asking for a performance comparison :D.

I asked, what's happening, if only one card is mining in terms of GPU2 usage "bug", does it go up to 99% then?
Are the cards connected via Crossfirebridge? What OS and driver are you on?

Edit: By the way, did you try to lower mem clock even more via MSI Afterburner and unofficial overclocking mode?

Dia

Fiyasko
January 13, 2012, 07:14:50 PM
Last edit: January 13, 2012, 07:57:45 PM by JackRabiit
 #372

Ok, I'll let you first play around a bit, before asking for a performance comparison Cheesy.

I asked, what's happening, if only one card is mining in terms of GPU2 usage "bug", does it go up to 99% then?<---... I Said yeah, it works flawlessly when running alone
Are the cards connected via Crossfirebridge?<---I said yes, What OS and driver are you on?<-Win7x64 sdk2.6 cat 12.1

Edit: By the way, did you try to lower mem clock even more via MSI Afterburner and unofficial overclocking mode?

Dia
I never saw a good reason to drop my mem below 600, But i cant do it Easily... I'll go do 1000core 315mem and post results Aswell as 1000core 1000mem.
Using GUIminer+PhatkD, MSIa, sdk 2.6, cat 12.1, crossfired 6870's

1kcore 300mem=255mh/s 70°C Fans@70%
1kcore 1kmem=314.8mh/s 88°C Fans@100%
Using GUIminer+poclbm, MSIa, sdk 2.6, cat 12.1, crossfired 6870's
1kcore 500mem=307mh/s 77°C Fans@ 80%
1kcore 1kmem=307mh/s OverheatShutdown.


deepceleron
January 13, 2012, 08:10:41 PM
 #373

Benchmarks on a 5770 (VLIW5, 800 stream processors, 980MHz core [scales more like 5870 than 5830]), Catalyst 12.1a/SDK 2.6, Phoenix 1.7.3 exe, win7 x32:

Typical command line (single cpu affinity, realtime priority):
start /AFFINITY 08 /REALTIME phoenix.exe -v -u http://xx/ -k dia VECTORS4 AGGRESSION=12 FASTLOOP=False WORKSIZE=64


worksize:                      64      128     256
phatk2    VECTORS4  1000MHz  223.88  226.34  181.40
phatk2    VECTORS   1000MHz  197     205     195
dia_new   VECTORS4  1000MHz  223.28  225.48  195.75
dia_new   VECTORS2  1000MHz  215.71  220.37  212.23
dia_last  VECTORS4  1000MHz  207.27  200.41

less MH/s than phatk2, peak performance at 1000MHz RAM...
Fiyasko
January 13, 2012, 08:19:00 PM
 #374

Benchmarks on a 5770 (VLIW5, 800 stream processors, 980MHz core [scales more like 5870 than 5830]), Catalyst 12.1a/SDK 2.6, Phoenix 1.7.3 exe, win7 x32:

Typical command line (single cpu affinity, realtime priority):
start /AFFINITY 08 /REALTIME phoenix.exe -v -u http://xx/ -k dia VECTORS4 AGGRESSION=12 FASTLOOP=False WORKSIZE=64 PLATFORM=0 DEVICE=0


worksize:                      64      128     256
phatk2    VECTORS4  1000MHz  223.88  226.34  181.40
phatk2    VECTORS   1000MHz  197     205     195
dia_new   VECTORS4  1000MHz  223.28  225.48  195.75
dia_new   VECTORS2  1000MHz  215.71  220.37  212.23

less MH/s than phatk2, peak performance at 1000MHz RAM...
Why is it so neccesarry for phatk kernal variations to have the memclock at 1k.... Some people cant deal with that extra heat...

deepceleron
January 13, 2012, 08:26:06 PM
 #375

Why is it so neccesarry for phatk kernal variations to have the memclock at 1k.... Some people cant deal with that extra heat...
This is something that has changed in SDK 2.6; The best performance at the best settings after trying all options comes at a GPU RAM speed of 1000MHz (stock speed for most cards) instead of at an underclock of 300MHz-370MHz. Version 2.6, included with driver 11.12 and 12.1, is significantly different in how it responds to worksizes, vector settings, and OpenCL programming than the previous SDKs.

It is a benefit in that one doesn't need to oddly tweak memory speeds away from stock to get the best performance (annoying to tell noobs over and over to underclock RAM), but bad in that this old quirk was actually an electricity saver if you did it.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
January 13, 2012, 10:37:18 PM
 #376

Benchmarks on a 5770 (VLIW5, 800 stream processors, 980MHz core [scales more like 5870 than 5830]), Catalyst 12.1a/SDK 2.6, Phoenix 1.7.3 exe, win7 x32:

Typical command line (single cpu affinity, realtime priority):
start /AFFINITY 08 /REALTIME phoenix.exe -v -u http://xx/ -k dia VECTORS4 AGGRESSION=12 FASTLOOP=False WORKSIZE=64


kernel    vectors   RAM       worksize: 64   128      256
phatk2    VECTORS4  1000MHz   223.88         226.34   181.40
phatk2    VECTORS   1000MHz   197            205      195
dia_new   VECTORS4  1000MHz   223.28         225.48   195.75
dia_new   VECTORS2  1000MHz   215.71         220.37   212.23
dia_last  VECTORS4  1000MHz   207.27         200.41

less MH/s than phatk2, peak performance at 1000MHz RAM...

I really have a problem with these results; I simply don't understand why VLIW5 cards with different stream processor counts behave THAT differently.
Take a look at my results with the 6550D (VLIW5 / 400 shaders / 800 MHz Mem via DDR3-1600):

phatk2 VECTORS WORKSIZE=128: 61.54 MH/s
phatk_dia VECTORS2 WORKSIZE=128: 67.15 MH/s
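
To put the two 6550D numbers above in relation, here is a quick sketch of the relative gain. The function name relative_gain_percent is made up for illustration; the inputs are simply the two rates listed above.

```python
def relative_gain_percent(baseline_mhs, new_mhs):
    """Return how much faster new_mhs is than baseline_mhs, in percent."""
    return (new_mhs - baseline_mhs) / baseline_mhs * 100.0


# 6550D, WORKSIZE=128: phatk2 VECTORS vs. phatk_dia VECTORS2
gain = relative_gain_percent(61.54, 67.15)
print("phatk_dia is %.1f%% faster than phatk2" % gain)
```

That works out to roughly 9% on this particular APU, which says nothing about how other cards behave.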

Anyone with 69XX hardware willing to test? It seems a bit quiet in here ;).

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
TurdHurdur
Full Member
***
Offline Offline

Activity: 216
Merit: 100


View Profile
January 13, 2012, 11:26:28 PM
 #377

Is FASTLOOP broken? I get:

Code:
Unhandled error in Deferred:
Unhandled Error
Traceback (most recent call last):
  File "twisted\internet\defer.pyc", line 361, in callback

  File "twisted\internet\defer.pyc", line 455, in _startRunCallbacks

  File "twisted\internet\defer.pyc", line 542, in _runCallbacks

  File "QueueReader.pyc", line 136, in preprocess

--- <exception caught here> ---
  File "twisted\internet\defer.pyc", line 133, in maybeDeferred

  File "kernels\phatk_dia\__init__.py", line 167, in <lambda>

  File "kernels\phatk_dia\__init__.py", line 381, in preprocess

  File "kernels\phatk_dia\__init__.py", line 377, in updateIterations

exceptions.UnboundLocalError: local variable 'EXP' referenced before assignment

attempting to use it...
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
January 14, 2012, 12:00:09 PM
Last edit: January 14, 2012, 12:25:57 PM by Diapolo
 #378

Is FASTLOOP broken? I get:

Code:
Unhandled error in Deferred:
Unhandled Error
Traceback (most recent call last):
  File "twisted\internet\defer.pyc", line 361, in callback

  File "twisted\internet\defer.pyc", line 455, in _startRunCallbacks

  File "twisted\internet\defer.pyc", line 542, in _runCallbacks

  File "QueueReader.pyc", line 136, in preprocess

--- <exception caught here> ---
  File "twisted\internet\defer.pyc", line 133, in maybeDeferred

  File "kernels\phatk_dia\__init__.py", line 167, in <lambda>

  File "kernels\phatk_dia\__init__.py", line 381, in preprocess

  File "kernels\phatk_dia\__init__.py", line 377, in updateIterations

exceptions.UnboundLocalError: local variable 'EXP' referenced before assignment

attempting to use it...

I wrote this in the first posting, yes it is broken currently! I'm looking into it.
Are you sure it's needed for you?

Edit: self.loopExponent = int(max(0, EXP)) causes this error, but I'm not sure yet, why this happens with my init and not the default one ...

Edit 2: The fix is to place another tab stop at the beginning of line 377, in front of self.loopExponent = int(max(0, EXP)), so that the line sits inside the if block where EXP is assigned! Wow, that's a stupid one. Will upload a fixed version later today.

Edit 3: It has to look like this in an editor:
Code:
		if not (rate <= 0):
			# calculate the number of iterations to run
			EXP = max(0, (math.log(rate)/math.log(2)) - (self.AGGRESSION - 8))
			# prevent switching between loop exponent sizes constantly
			if EXP > self.loopExponent + 0.54:
				EXP = round(EXP)
			elif EXP < self.loopExponent - 0.65:
				EXP = round(EXP)
			else:
				EXP = self.loopExponent

			self.loopExponent = int(max(0, EXP))
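
For anyone wondering why one missing tab produces that traceback, here is a minimal standalone sketch. It is not the actual __init__.py: the function names and the plain-function form are assumptions for illustration. If the final assignment sits outside the if block, EXP is never assigned while the rate is still 0, and Python raises UnboundLocalError.

```python
import math


def buggy_loop_exponent(rate, aggression, loop_exponent):
    # Mirrors the broken indentation: the last line is outside the if block.
    if not (rate <= 0):
        EXP = max(0, (math.log(rate) / math.log(2)) - (aggression - 8))
    return int(max(0, EXP))  # EXP is unbound when rate <= 0


def fixed_loop_exponent(rate, aggression, loop_exponent):
    # Everything that uses EXP stays inside the if block.
    if not (rate <= 0):
        EXP = max(0, (math.log(rate) / math.log(2)) - (aggression - 8))
        # Only switch when EXP has clearly moved away from the current value.
        if EXP > loop_exponent + 0.54 or EXP < loop_exponent - 0.65:
            EXP = round(EXP)
        else:
            EXP = loop_exponent
        loop_exponent = int(max(0, EXP))
    return loop_exponent


try:
    buggy_loop_exponent(0, 12, 3)
except UnboundLocalError as err:
    print("buggy version failed:", err)

print(fixed_loop_exponent(0, 12, 3))     # no rate yet, exponent is kept
print(fixed_loop_exponent(4096, 12, 3))  # exponent follows the measured rate
```

The two thresholds (0.54 and 0.65) are taken from the snippet above; they keep the exponent from flipping back and forth when the rate hovers near a boundary.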

Dia

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
January 14, 2012, 12:34:55 PM
 #379

Uploaded a fixed version, which corrects an error with FASTLOOP=True:
Download version 2012-01-13: http://www.mediafire.com/?xzk6b1yvb24r4dg

There are no other changes in this version!

Dia

deepceleron
Legendary
*
Offline Offline

Activity: 1512
Merit: 1032



View Profile WWW
January 14, 2012, 01:45:39 PM
 #380

kernel    vectors   RAM       worksize: 64   128      256
phatk2    VECTORS   1000MHz   197            205      195
dia_new   VECTORS2  1000MHz   215.71         220.37   212.23

phatk2 VECTORS WORKSIZE=128: 61.54 MH/s
phatk_dia VECTORS2 WORKSIZE=128: 67.15 MH/s
That corresponds closely with the two-vector results I quoted; however, when looking for the highest possible output from a GPU with VECTORS4 (at worksize 64 or 128, depending on card), phatk2 still ekes out a win for me.
TurdHurdur
Full Member
***
Offline Offline

Activity: 216
Merit: 100


View Profile
January 14, 2012, 08:17:47 PM
 #381

FASTLOOP is great with AGGRESSION=6 for good desktop responsiveness, I did indeed need it. Mind you, this newest kernel doesn't seem to improve the performance of my 5970 with Catalyst 12.1.
blandead
Newbie
*
Offline Offline

Activity: 46
Merit: 0


View Profile
January 18, 2012, 03:33:57 AM
 #382

Hey Dia,

So I had to do a fresh install on my computer, but I sent you a small donation just now, lemme know if it went through : D

Anyways, while I was in the process of installing AMD drivers I saw an awesome article about OpenCL 1.2 Preview with SDK 2.6. I'm testing the preview drivers out now since they add a couple new extensions, though I have to figure out the best place to use them. I ran your kernel through the latest APP KernelAnalyzer. I think there are many places it can be optimized as I'm seeing BFI_INT directly from the GPU ISA for many of the rounds, and it looks like there are a lot of new patterns they added to do so.

I also found a really cool pdf on new optimizations that are recommended for OpenCL 1.2, and it is supposed to provide a pretty good performance increase for VLIW4 architecture, and there was one part that I think would solve your VECTORS3 issue or even a better way of achieving it. If you have time send me a PM, and I can send you the pdf.

Anyways, new kernel is a little faster with VECTORS4, but for some reason the temperature is higher. That could just be because of the fresh wipe I did, did anyone else notice their GPU running hotter?
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
January 18, 2012, 06:29:13 AM
 #383

Hey Dia,

So I had to do a fresh install on my computer, but I sent you a small donation just now, lemme know if it went through : D

Anyways, while I was in the process of installing AMD drivers I saw an awesome article about OpenCL 1.2 Preview with SDK 2.6. I'm testing the preview drivers out now since they add a couple new extensions, though I have to figure out the best place to use them. I ran your kernel through the latest APP KernelAnalyzer. I think there are many places it can be optimized as I'm seeing BFI_INT directly from the GPU ISA for many of the rounds, and it looks like there are a lot of new patterns they added to do so.

I also found a really cool pdf on new optimizations that are recommended for OpenCL 1.2, and it is supposed to provide a pretty good performance increase for VLIW4 architecture, and there was one part that I think would solve your VECTORS3 issue or even a better way of achieving it. If you have time send me a PM, and I can send you the pdf.

Anyways, new kernel is a little faster with VECTORS4, but for some reason the temperature is higher. That could just be because of the fresh wipe I did, did anyone else notice their GPU running hotter?

Your donation has just arrived, thank you :)!

Sounds pretty interesting and I would like to receive a copy of that PDF. Can you upload it somewhere or send me a link via PM? I saw that there is a new cl_amd_media_ops2 extension in the latest drivers, but I could not find any documentation for it (the first one is used for BFI_INT patching). It would be very nice if BFI_INT were directly accessible via OpenCL, so that we could kick the binary patching out. The vec3 bug is really strange; I guess it happens in the Python host code and not in the kernel, because KernelAnalyzer will run it just fine.

I'm looking forward to further discussions!

Dia

gat3way
Sr. Member
****
Offline Offline

Activity: 256
Merit: 250


View Profile
January 18, 2012, 10:48:13 AM
 #384

Hello,

Unfortunately the cl_amd_media_ops2 extension has nothing to do with BFI_INT. There are amd_bfe() and amd_bfm() functions defined, but nothing that maps to bfi_int.

Can I have that pdf too please?
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
January 18, 2012, 11:17:09 AM
 #385

Hello,

Unfortunately the cl_amd_media_ops2 extension has nothing to do with BFI_INT. There are amd_bfe() and amd_bfm() functions defined, but nothing that maps to bfi_int.

Can I have that pdf too please?

Have you got a link to the cl_amd_media_ops2 documentation?

Thanks,
Dia

gat3way
Sr. Member
****
Offline Offline

Activity: 256
Merit: 250


View Profile
January 18, 2012, 11:58:36 AM
 #386

There is no documentation yet. Those are the strings carved from libamdocl64.so. Additionally, I've tested most of them (excluding max3/min3 and the sad ones) and they work. For some reason, you need to compile with -Dcl_amd_media_ops2, because the pragma alone does not enable it.

For the full list see this thread:

http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=157516&messid=1274705&parentid=1274660&FTVAR_FORUMVIEWTMP=Branch
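
A rough sketch of what the -D workaround means on the host side, assuming a Python miner that collects compiler options as a list of strings. The helper name build_options and the dummy kernel source are made up for illustration; only the pragma and the -Dcl_amd_media_ops2 define come from the post above.

```python
# Kernel source that requests the extension via pragma. According to the post
# above, the pragma alone was not enough with the tested driver.
KERNEL_SOURCE = """
#pragma OPENCL EXTENSION cl_amd_media_ops2 : enable
__kernel void dummy(__global uint *out) { out[get_global_id(0)] = 0; }
"""


def build_options(defines):
    """Turn a list of macro names into a list of -D compiler options."""
    return ["-D" + name for name in defines]


options = build_options(["cl_amd_media_ops2"])
print(options)
# With pyopencl, this list would be handed to the compiler roughly like:
#   pyopencl.Program(context, KERNEL_SOURCE).build(options=options)
```

The sketch only prepares the options; actually compiling the kernel needs an OpenCL context on a driver that ships the extension.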
malevolent
can into space
Legendary
*
Offline Offline

Activity: 3472
Merit: 1721



View Profile
January 18, 2012, 11:40:25 PM
 #387

minus 12 Mhash/s for HD 6850
minus 25 Mhash/s for each of my HD 5850s

compared to guiminer from July 1st :/

PS. Yes, I did experiment with flags, etc.

drivers: 11.5 and 2.3 stream SDK
OS: win 7 64 pro

Signature space available for rent.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
January 19, 2012, 08:21:14 AM
 #388

minus 12 Mhash/s for HD 6850
minus 25 Mhash/s for each of my HD 5850s

compared to guiminer from July 1st :/

PS. Yes, I did experiment with flags, etc.

drivers: 11.5 and 2.3 stream SDK
OS: win 7 64 pro

I recall that I mentioned this kernel is for SDK 2.6+, sorry!
It's totally ok for this kernel to not work well for older SDK versions.

Dia

malevolent
can into space
Legendary
*
Offline Offline

Activity: 3472
Merit: 1721



View Profile
January 19, 2012, 06:27:59 PM
 #389

I recall that I mentioned this kernel is for SDK 2.6+, sorry!
It's totally ok for this kernel to not work well for older SDK versions.

Dia

My fault then :P

Wasn't SDK 2.6 the one that was significantly slower? Which driver version would you recommend to work along with SDK 2.6?

BCMan
Hero Member
*****
Offline Offline

Activity: 535
Merit: 500



View Profile
January 19, 2012, 06:55:34 PM
 #390

Why not improve the phatk2 kernel instead? It's faster than the first version and still faster than phatk_dia.
blandead
Newbie
*
Offline Offline

Activity: 46
Merit: 0


View Profile
January 20, 2012, 10:29:51 PM
 #391

Quote
Your donation has just arrived, thank you :)!

Sounds pretty interesting and I would like to receive a copy of that PDF. Can you upload it somewhere or send me a link via PM? I saw that there is a new cl_amd_media_ops2 extension in the latest drivers, but I could not find any documentation for it (the first one is used for BFI_INT patching). It would be very nice if BFI_INT were directly accessible via OpenCL, so that we could kick the binary patching out. The vec3 bug is really strange; I guess it happens in the Python host code and not in the kernel, because KernelAnalyzer will run it just fine.

I'm looking forward to further discussions!

Dia

I'm not sure where I downloaded it, but I can easily e-mail it to you. The cl_amd_media_ops2 command is for mapping 3D images, so that doesn't help us. But if you look at the AMD 11.12 driver, they tell you to set the environment variable "GPU_ASYNC_MEM_COPY=2" to make use of a new feature. There is a preview driver for OpenCL 1.2 that adds some functionality. They are lifting the rule of only 1 overloaded function, and will allow you to code directly in C++. Here is a reference card of commands: http://www.khronos.org/files/opencl-1-2-quick-reference-card.pdf

Here is the basis for one of the new commands (cl_khr_fp64): http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/cl_khr_fp64.html -- adds double-precision floating point.
Only works on AMD 69xx devices though, and probably the GCN cards

I'm trying to find a direct link to this nice pdf I found with excellent examples. I have the file on my computer though.

Ah! found it... http://www.bu.edu/pasi/files/2011/01/AndreasKloeckner3-07-1000.pdf Look at page 56-60

This code should look familiar to anybody who took a programming class.
zvs
Legendary
*
Offline Offline

Activity: 1680
Merit: 1000


https://web.archive.org/web/*/nogleg.com


View Profile WWW
January 21, 2012, 07:26:54 PM
 #392

Why is it so necessary for phatk kernel variations to have the memclock at 1k? Some people can't deal with that extra heat...
This is something that has changed in SDK 2.6; The best performance at the best settings after trying all options comes at a GPU RAM speed of 1000MHz (stock speed for most cards) instead of at an underclock of 300MHz-370MHz. Version 2.6, included with driver 11.12 and 12.1, is significantly different in how it responds to worksizes, vector settings, and OpenCL programming than the previous SDKs.

It is a benefit in that one doesn't need to oddly tweak memory speeds away from stock to get the best performance (annoying to tell noobs over and over to underclock RAM), but bad in that this old quirk was actually an electricity saver if you did it.
Underclock to 300-370mhz has never been best.  395 is faster.  Fastest?  Not sure. 
Bananington
Sr. Member
****
Offline Offline

Activity: 1414
Merit: 344



View Profile
January 23, 2012, 03:55:18 AM
 #393

I get about a 9-10 MH/s increase. Thank you Diapolo!

deepceleron
Legendary
*
Offline Offline

Activity: 1512
Merit: 1032



View Profile WWW
January 23, 2012, 05:44:45 PM
 #394

Underclock to 300-370mhz has never been best.  395 is faster.  Fastest?  Not sure.  
I'm glad you found the memory peak that worked for you. However your case is not the absolute correct answer (and is not common, most 5xxx/6xxx cards are at 300MHz), it is just your setup and what works for you; many things will affect performance and where the memory "sweet spot" will be:

GPU model/architecture,
GPU card memory bus/memory size,
GPU core overclock,
Operating System/32or64bit/video card driver,
OpenCL/APP SDK runtime installed on system,
Miner software,
Miner kernel (and its particular optimizations),
Miner kernel parameters (worksize, vector size),
Compiler/SDK used to create miner,
Libraries installed on system (if running interpreted source)...

So there is no one right answer.
malevolent
can into space
Legendary
*
Offline Offline

Activity: 3472
Merit: 1721



View Profile
January 23, 2012, 05:58:57 PM
 #395

I recall that I mentioned this kernel is for SDK 2.6+, sorry!
It's totally ok for this kernel to not work well for older SDK versions.
Dia

OK, managed to get it to work at full speed with SDK 2.3 :D

Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
January 23, 2012, 07:58:51 PM
 #396

I recall that I mentioned this kernel is for SDK 2.6+, sorry!
It's totally ok for this kernel to not work well for older SDK versions.
Dia

OK, managed to get it to work at full speed with SDK 2.3 :D

Great you got it working, I only wanted to mention it's intended for 2.6+ :D.

Dia

Btw.: The current kernel doesn't work with the 7970, and GCN seems to dislike vectors for mining.

DiabloD3
Legendary
*
Offline Offline

Activity: 1162
Merit: 1000


DiabloMiner author


View Profile WWW
January 24, 2012, 05:15:23 AM
 #397

(and is not common, most 5xxx/6xxx cards are at 300MHz)

ITYM 1/3rd core clock. 300mhz is only correct if your core is 900mhz.
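
As a quick worked example of that rule of thumb (a sketch only; the helper name is made up, and as the rest of this thread shows, the real sweet spot depends on card, driver and kernel):

```python
def memclock_rule_of_thumb(core_mhz):
    """Memory clock suggested by the '1/3rd of core clock' rule of thumb."""
    return core_mhz / 3.0


for core in (900, 980):
    print("core %d MHz -> memory about %.0f MHz"
          % (core, memclock_rule_of_thumb(core)))
```

With a 900 MHz core this gives the familiar 300 MHz; other core clocks shift the starting point accordingly, and it is still worth testing a few values around it.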

Fiyasko
Legendary
*
Offline Offline

Activity: 1428
Merit: 1001


Okey Dokey Lokey


View Profile
January 24, 2012, 05:16:56 PM
 #398

(and is not common, most 5xxx/6xxx cards are at 300MHz)

ITYM 1/3rd core clock. 300mhz is only correct if your core is 900mhz.
OH, THAT'S THE TRICK?!?! My 6870's "sweet spot" SEEMS to be 490 with the core at 990! That makes quite a lot of sense! I was planning to look for a sweetER spot, but I felt that 490 "was it" and that I wouldn't find anything better, so I didn't look.

zvs
Legendary
*
Offline Offline

Activity: 1680
Merit: 1000


https://web.archive.org/web/*/nogleg.com


View Profile WWW
January 25, 2012, 02:43:20 AM
 #399

Underclock to 300-370mhz has never been best.  395 is faster.  Fastest?  Not sure.  
I'm glad you found the memory peak that worked for you. However your case is not the absolute correct answer (and is not common, most 5xxx/6xxx cards are at 300MHz), it is just your setup and what works for you; many things will affect performance and where the memory "sweet spot" will be:

GPU model/architecture,
GPU card memory bus/memory size,
GPU core overclock,
Operating System/32or64bit/video card driver,
OpenCL/APP SDK runtime installed on system,
Miner software,
Miner kernel (and its particular optimizations),
Miner kernel parameters (worksize, vector size),
Compiler/SDK used to create miner,
Libraries installed on system (if running interpreted source)...

So there is no one right answer.
Hasn't been my experience, nor any of the other half a dozen people I know that run 5830 setups.  The decision is more along the lines of 'do I want to run the card cooler with a lower memory setting', vs 'do I want to run at 395mhz memory, but gain a few mhash?'.

I speak of 5830's exclusively.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
January 27, 2012, 03:53:48 PM
 #400

I'm currently working pretty hard on a kernel for 7970 cards and am looking for a few guys, who are willing to test / benchmark it.
Please apply in this thread or via PM, you need to have a 7970 card and be on a current Phoenix version with latest Catalyst.
For now I don't want to release the kernel into the wild, sorry ... it's not polished :D.

Thanks,
Dia

pandemic
Sr. Member
****
Offline Offline

Activity: 434
Merit: 250


View Profile
January 29, 2012, 03:31:10 PM
 #401

You should start rolling out pre-compiled compressed files or something. It's getting above my knowledge, lol!
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
January 29, 2012, 03:45:00 PM
 #402

The DiaKGCN kernel is ready. If you like, try it with VLIW5 and VLIW4 hardware. It should be interesting to see how well or badly a GCN-optimized kernel performs on older hardware:
https://bitcointalk.org/index.php?topic=61406.0

Dia

blissfulyoshi
Newbie
*
Offline Offline

Activity: 11
Merit: 0


View Profile
February 02, 2012, 04:46:16 AM
 #403

Tried out diakgcn on my 6870. All tests at aggression 12 and done over very short time periods, unless said otherwise.

Results:

2.5:
phatk_dia, WORKSIZE=128, VECTORS: 282 MHps (best one from previous test)
diakgcn, WORKSIZE=128, VECTORS: 278 MHps
diakgcn, WORKSIZE=128, VECTORS2: 279 MHps
diakgcn, WORKSIZE=64, VECTORS: 278 MHps
diakgcn, WORKSIZE=64, VECTORS2: 279 MHps
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=10: 278 MHps (spiked up to 282 at one point.....)
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=5: 275 MHps

2.6: (Sorry, poor documentation here. I only listed the best results. If asked, I will document this better later.)
phatk_dia: 272 MHps
diakgcn: 260MHps

tl;dr: diakgcn is currently slower for 6870
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
February 02, 2012, 05:58:45 AM
 #404

Tried out diakgcn on my 6870. All tests at aggression 12 and done over very short time periods, unless said otherwise.

Results:

2.5:
phatk_dia, WORKSIZE=128, VECTORS: 282 MHps (best one from previous test)
diakgcn, WORKSIZE=128, VECTORS: 278 MHps
diakgcn, WORKSIZE=128, VECTORS2: 279 MHps
diakgcn, WORKSIZE=64, VECTORS: 278 MHps
diakgcn, WORKSIZE=64, VECTORS2: 279 MHps
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=10: 278 MHps (spiked up to 282 at one point.....)
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=5: 275 MHps

2.6: (Sorry, poor documentation here. I only listed the best results. If asked, I will document this better later.)
phatk_dia: 272 MHps
diakgcn: 260MHps

tl;dr: diakgcn is currently slower for 6870

Thanks for your results, that behaviour was expected ... now it's confirmed. Well, DiaKGCN is not finished, so perhaps it will get better for older cards over time :).

Dia

blissfulyoshi
Newbie
*
Offline Offline

Activity: 11
Merit: 0


View Profile
February 02, 2012, 05:59:49 PM
 #405

Since you had an update today, I guess I'll retest. (diaggcn??? Spelling mistake?) Like before, all tests are short tests at aggression 12, unless stated otherwise.

2.5
phatk_dia, WORKSIZE=128, VECTORS: 282 MHps
diaggcn, WORKSIZE=64, VECTORS: 248 MHps
diaggcn, WORKSIZE=64, VECTORS2: 277 MHps
diaggcn, WORKSIZE=64, VECTORS4: 545 MHps  (Guess the vectors 4 bug has not been fixed? That probably means this is 272MHps)
diaggcn, WORKSIZE=128, VECTORS: 248 MHps
diaggcn, WORKSIZE=128, VECTORS2: 277 MHps
diaggcn, WORKSIZE=128, VECTORS4: 551 MHps (Probably 276MHps)
diaggcn, WORKSIZE=256, VECTORS: 248 MHps
diaggcn, WORKSIZE=256, VECTORS2: 271 MHps
diaggcn, WORKSIZE=256, VECTORS4: 540 MHps (Probably 270MHps)
diaggcn, WORKSIZE=128, VECTORS2, AGGRESSION=10: 276 MHps
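
The bracketed guesses above simply halve the displayed VECTORS4 rate. A small sketch of that correction, assuming the displayed rate really is doubled (that is only the guess made in this post, not something confirmed in the thread; the function name is made up):

```python
def corrected_rate(displayed_mhps, vectors4):
    """Halve the displayed rate for VECTORS4 runs, assuming it is double-counted."""
    return displayed_mhps / 2.0 if vectors4 else displayed_mhps


# Displayed VECTORS4 rates for WORKSIZE 64, 128 and 256 from the list above.
for worksize, displayed in ((64, 545), (128, 551), (256, 540)):
    print("WORKSIZE=%d: displayed %d, corrected %.1f MHps"
          % (worksize, displayed, corrected_rate(displayed, True)))
```

Even after halving, none of the VECTORS4 runs reaches the 282 MHps of phatk_dia on this card.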

might test my card on 2.6 later, but on 2.5, I am getting worse results than before, oh well.
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
February 02, 2012, 07:39:54 PM
 #406

DiaKGCN -> Diapolo Kernel Graphics Core Next

As I said, the new one is for the 79XX cards, but I would really be interested in how it performs on older cards with current drivers / OpenCL runtime.
The next time you should perhaps reply in the other thread, as I won't work on phatk_dia anymore.

Thanks for your tests,
Dia

malevolent
can into space
Legendary
*
Offline Offline

Activity: 3472
Merit: 1721



View Profile
February 03, 2012, 03:21:20 PM
 #407

Now tried on 12.1 and SDK 2.6.

HD 6850 - 35 Mhash/s slower
5850s - 100Mhash/s slower

win7 64 pro
-k phatk AGGRESSION=12 VECTORS2 WORKSIZE=128

5850s are 80 Mhash/s slower if I turn the 6850 off
a solo 5850 will be only 5-8 Mhash/s slower if ran on its own
6850 3 Mhash/s slower when ran on its own

The only solution was to run the miner on all 4 cores instead of 1, but then the CPU is 50-75% utilized (= more heat);
only then do I get just 5-8 Mh/s less.

any ideas?


mtminer
Member
**
Offline Offline

Activity: 86
Merit: 10


View Profile
February 03, 2012, 03:51:49 PM
 #408

Tried out diakgcn on my 6870. All tests at aggression 12 and done over very short time periods, unless said otherwise.

Results:

2.5:
phatk_dia, WORKSIZE=128, VECTORS: 282 MHps (best one from previous test)
diakgcn, WORKSIZE=128, VECTORS: 278 MHps
diakgcn, WORKSIZE=128, VECTORS2: 279 MHps
diakgcn, WORKSIZE=64, VECTORS: 278 MHps
diakgcn, WORKSIZE=64, VECTORS2: 279 MHps
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=10: 278 MHps (spiked up to 282 at one point.....)
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=5: 275 MHps

2.6: (Sorry, poor documentation here. I only listed the best results. If asked, I will document this better later.)
phatk_dia: 272 MHps
diakgcn: 260MHps

tl;dr: diakgcn is currently slower for 6870

Thanks for your results, that behaviour was expected ... now it's confirmed. Well, DiaKGCN is not finished, so perhaps it will get better for older cards over time :).

Dia

Wouldn't it be easier to separate the kernels out for each of the 5xxx, 6xxx, and 7xxx series cards instead of trying to make a one-size-fits-all kernel? Is it possible to test at startup and exclude cards that a certain kernel isn't designed to run on? I hate to see you wasting time supporting the older cards with new kernels.



Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
February 03, 2012, 06:16:54 PM
 #409

Tried out diakgcn on my 6870. All tests at aggression 12 and done over very short time periods, unless said otherwise.

Results:

2.5:
phatk_dia, WORKSIZE=128, VECTORS: 282 MHps (best one from previous test)
diakgcn, WORKSIZE=128, VECTORS: 278 MHps
diakgcn, WORKSIZE=128, VECTORS2: 279 MHps
diakgcn, WORKSIZE=64, VECTORS: 278 MHps
diakgcn, WORKSIZE=64, VECTORS2: 279 MHps
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=10: 278 MHps (spiked up to 282 at one point.....)
diakgcn, WORKSIZE=128, VECTORS2, AGGRESSION=5: 275 MHps

2.6: (Sorry, poor documentation here. I only listed the best results. If asked, I will document this better later.)
phatk_dia: 272 MHps
diakgcn: 260MHps

tl;dr: diakgcn is currently slower for 6870

Thanks for your results, that behaviour was expected ... now it's confirmed. Well, DiaKGCN is not finished, so perhaps it will get better for older cards over time :).

Dia

Wouldn't it be easier to separate the kernels out for each of the 5xxx, 6xxx, and 7xxx series cards instead of trying to make a one-size-fits-all kernel? Is it possible to test at startup and exclude cards that a certain kernel isn't designed to run on? I hate to see you wasting time supporting the older cards with new kernels.





Easy answer: I just focused on GCN performance with DiaKGCN; that it runs on VLIW4/5 is just nice to have. I won't spend any more time optimising the performance of phatk_dia or DiaKGCN for older cards. I don't even know what I do all this for, because this one (phatk_dia) doesn't really seem to be faster for anyone, and no one cares to support development via a small donation. People seem to donate only if they gain 10+ MH/s over another kernel ... it seems the hard work that was put into a specific version is not paid any attention :-/.

Dia

PS.: To discuss DiaKGCN further please use https://bitcointalk.org/index.php?topic=61406.0

film2240
Legendary
*
Offline Offline

Activity: 1022
Merit: 1000


Freelance videographer


View Profile WWW
February 07, 2012, 06:50:42 PM
 #410

Can someone tell me which aggression setting gives me the absolute best performance with my HD6950 (heavily OC'd with unlocked shaders), please? I can't seem to find it in this thread or anywhere else. I also want a list of flags for this miner (running in GUIMiner with Phoenix, as poclbm kept having issues lately).

thanks

[This signature is available for rent.BTC/ETH/LTC or £50 equivalent a month]
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 769
Merit: 500



View Profile WWW
February 07, 2012, 08:22:07 PM
 #411

Can someone tell me which aggression setting gives me the absolute best performance with my HD6950 (heavily OC'd with unlocked shaders), please? I can't seem to find it in this thread or anywhere else. I also want a list of flags for this miner (running in GUIMiner with Phoenix, as poclbm kept having issues lately).

thanks

AGGRESSION=12, higher levels will lead to an idle miner in Phoenix, because it can't get work fast enough. Perhaps 13 or 14 works for your setup!

Available switches can be found in the init file:
Code:
	PLATFORM = KernelOption(
		'PLATFORM', int, default=None,
		help='The ID of the OpenCL platform to use')
	DEVICE = KernelOption(
		'DEVICE', int, default=None,
		help='The ID of the OpenCL device to use')
	VECTORS2 = KernelOption(
		'VECTORS2', bool, default=False, advanced=True,
		help='Enable vector uint2 support in the kernel.')
	VECTORS4 = KernelOption(
		'VECTORS4', bool, default=False, advanced=True,
		help='Enable vector uint4 support in the kernel.')
	FASTLOOP = KernelOption(
		'FASTLOOP', bool, default=False, advanced=True,
		help='Run iterative mining thread.')
	AGGRESSION = KernelOption(
		'AGGRESSION', int, default=5, advanced=True,
		help='Exponential factor indicating how much work to run per OpenCL execution')
	WORKSIZE = KernelOption(
		'WORKSIZE', int, default=None, advanced=True,
		help='The local worksize to use when executing OpenCL kernels.')
	BFI_INT = KernelOption(
		'BFI_INT', bool, default=True, advanced=True,
		help='Use the BFI_INT instruction for AMD GPUs.')
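
To show how these switches end up on the command line, here is a small sketch that turns a dict of options into Phoenix-style kernel arguments. The helper kernel_args is hypothetical and not part of Phoenix; it only mimics the bare-flag and NAME=value forms seen in the command lines posted in this thread, and the kernel name phatk_dia is assumed from the kernel folder name.

```python
def kernel_args(kernel, options):
    """Build the '-k <kernel> ...' part of a Phoenix command line."""
    args = ["-k", kernel]
    for name, value in options.items():
        if value is True:
            args.append(name)                     # bare flag, e.g. VECTORS4
        else:
            args.append("%s=%s" % (name, value))  # e.g. AGGRESSION=12
    return args


opts = {"VECTORS4": True, "AGGRESSION": 12, "FASTLOOP": False, "WORKSIZE": 64}
print(" ".join(kernel_args("phatk_dia", opts)))
```

The rest of the command line (miner URL, -v and so on) stays as in the examples earlier in the thread.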

Remember, this one will get no further development time!

Dia
