Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
July 28, 2011, 03:53:33 AM |
Sorry I haven't really been on the forums much lately... wedding planning stuff. But...
Quote: Any chance of getting a kernel optimized for the 6xxx series?
The kernel is optimized for the 5xxx series, the 66xx series, the 67xx series and the 68xx series, since they all use the same architecture. Only the 69xx cards use a different architecture, which is less efficient for mining (VLIW4 instead of VLIW5, for those who are interested). I have debated whether to rewrite the kernel for the 69xx series, but it would only increase performance by ~1% at most.
Quote: Phateus, would you consider replacing the Ma() macro as suggested by bitless and re-running the ATI optimization to check if it can be further improved? Bitless saved 1 operation from each Ma() call. Maybe, with some re-ordering, this can be optimized further.
In the current version, in addition to numerous very tiny optimizations, I have reordered the Ma() operands, which reduces the number of instructions in operations with at least one non-vector operand:
#define Ma(z, x, y) amd_bytealign((y), (x | z), (z & x))
I think this is what you are talking about... Anywho... here is my new version, which is a very slight improvement over 1.0 (about 1% faster for me). One thing to note: you MUST pass a valid WORKSIZE value when running version 1.1, due to one of the optimizations.
https://sourceforge.net/projects/phatk/files/phatk-1.1.zip/download
Post any questions or bugs you have, thanks -Phateus
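Why the reordered Ma() macro computes the SHA-256 majority function is easier to see with a small model. The Python sketch below assumes, as Phoenix's BFI_INT option is usually described, that amd_bytealign(a, b, c) gets patched into the bit select (a & b) | (~a & c); it then checks the macro against the textbook majority on random 32-bit inputs. The helper names bfi_int, ma_phatk and maj are hypothetical, introduced only for this illustration.

```python
# Sketch: model of #define Ma(z, x, y) amd_bytealign((y), (x | z), (z & x))
# ASSUMPTION: with BFI_INT patching, amd_bytealign(a, b, c) behaves as the
# bit select (a & b) | (~a & c). Python ints stand in for 32-bit uints.
import random

MASK32 = 0xFFFFFFFF

def bfi_int(a, b, c):
    """Bit select: bits of b where a is 1, bits of c where a is 0."""
    return ((a & b) | (~a & c)) & MASK32

def ma_phatk(z, x, y):
    """Hypothetical Python model of the phatk Ma() macro (same argument order)."""
    return bfi_int(y, x | z, z & x)

def maj(x, y, z):
    """Textbook SHA-256 majority: (x & y) ^ (x & z) ^ (y & z)."""
    return (x & y) ^ (x & z) ^ (y & z)

rng = random.Random(0)
for _ in range(10000):
    x, y, z = (rng.getrandbits(32) for _ in range(3))
    assert ma_phatk(z, x, y) == maj(x, y, z)
print("Ma() matches majority on 10000 random inputs")
```

The idea: wherever a bit of y is 1, the majority of (x, y, z) is x | z, and wherever it is 0, the majority is x & z, so one bit-select instruction plus one OR and one AND covers the whole function.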
Phateus (OP)
July 28, 2011, 09:16:19 PM |
Ah.. there is a lot I've missed since I've been gone... I will combine my improvements and his to see if I can get it lower. Thanks for the info. -Phateus
pennytrader
July 29, 2011, 01:19:37 AM |
Great to see the continuous improvement
please donate to 1P3m2resGCP2o2sFX324DP1mfqHgGPA8BL
Phateus (OP)
July 29, 2011, 05:55:59 AM |
Alright, check the first post: I uploaded a second version today with a few tweaks (the Ma() tweak and a slight reordering of some operations). It should be faster than Diapolo's now. Also, anyone who wants to help with this or has any suggestions, PM me and I'll be more than happy to discuss when I get the chance. And... Diapolo (and anyone else who wants to help), if you read this... we should work together on trying to improve this. I think it is a good idea to keep separate code sources to increase the chances of finding optimizations, but if you have any questions about my code, let me know. -Phateus
pennytrader
July 29, 2011, 06:24:26 AM |
Kernel OpenCL error. Does this work with Phoenix 1.5?
krzynek1
Newbie
Offline
Activity: 41
Merit: 0
July 29, 2011, 07:00:44 AM |
not working with Phoenix r101
jedi95
July 29, 2011, 07:20:04 AM |
Quote: not working with Phoenix r101
Phoenix 1.5 includes the phatk kernel by default, unlike 1.4. Just use the included one. If you want more performance, phatk r112 from the Phoenix SVN is even faster.
Phoenix Miner developer Donations appreciated at: 1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
CanaryInTheMine
Donator
Legendary
Offline
Activity: 2352
Merit: 1060
between a rock and a block!
July 29, 2011, 07:22:21 AM |
Quote: not working with Phoenix r101
Quote: Phoenix 1.5 includes the phatk kernel by default, unlike 1.4. Just use the included one. If you want more performance, phatk r112 from the Phoenix SVN is even faster.
Where can I find r112?
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
July 29, 2011, 07:23:41 AM |
Quote: not working with Phoenix r101
Quote: Phoenix 1.5 includes the phatk kernel by default, unlike 1.4. Just use the included one. If you want more performance, phatk r112 from the Phoenix SVN is even faster.
Quote: where can I find r112?
+1. Also, how do I find out the revision number?
dishwara
July 29, 2011, 07:45:08 AM |
I am getting STRANGE results. Using Diapolo's kernel I got 434 & 427. Both are 5870 cards: the 1st is an MSI Lightning 5870 @ 957/319, 1175 mV; the 2nd is a Sapphire HD 5870 @ 939/313, 1163 mV.
With your phatk 2.0, I get 430 & 429, with no change in any flags... 434 dropped to 430, but 427 increased to 429.
lagmo
Member
Offline
Activity: 67
Merit: 10
July 29, 2011, 08:23:51 AM |
I'm getting this error when I try to use your 2.0 kernel on Phoenix 1.5/Linuxcoin 2.0 (Debian live). It works just fine on my Win7 x64 box though, so I'm guessing it's specific to Linuxcoin's default complement of packages.
Unhandled error in Deferred:
Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 361, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/opt/miners/phoenix/QueueReader.py", line 136, in preprocess
    d2 = defer.maybeDeferred(self.preprocessor, nr)
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 133, in maybeDeferred
    result = f(*args, **kw)
  File "kernels/phatk/__init__.py", line 167, in <lambda>
    self.qr = QueueReader(self.core, lambda nr: self.preprocess(nr),
  File "kernels/phatk/__init__.py", line 361, in preprocess
    kd = KernelData(nr, self.core, self.VECTORS, self.AGGRESSION)
  File "kernels/phatk/__init__.py", line 46, in __init__
    unpack('LLLL', nonceRange.unit.data[64:]), dtype=np.uint32)
struct.error: unpack requires a string argument of length 32
iopq
July 29, 2011, 01:54:34 PM |
Running Windows 7, 64-bit, I'm getting
[29/07/2011 06:50:32] FATAL kernel error: Failed to load OpenCL kernel!
when I try the newest one. I tried with Phoenix 1.5 and the latest revision 112 and get the same error. I'm running:
python phoenix.py -u http://iopq.me:***@mineco.in:3000/ -k phatk DEVICE=1 VECTORS BFI_INT AGGRESSION=7 WORKSIZE=128
Does it have something to do with WORKSIZE? When I supply an invalid WORKSIZE to phatk 1.0, it also gives the same error.
bcforum
July 29, 2011, 02:11:21 PM Last edit: July 29, 2011, 02:25:25 PM by bcforum |
Gives an error in Linux (Ubuntu 10.10 x64), Python 2.6.6, Twisted 10.1.0-2:
[29/07/2011 08:10:17] Phoenix 1.50 starting...
[29/07/2011 08:10:17] Connected to server
[29/07/2011 08:10:17] Server gave new work; passing to WorkQueue
[29/07/2011 08:10:17] New block (WorkQueue)
[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC] Unhandled error in Deferred:
Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 318, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 424, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 441, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/home/user/phoenix-1.50/QueueReader.py", line 136, in preprocess
    d2 = defer.maybeDeferred(self.preprocessor, nr)
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 125, in maybeDeferred
    result = f(*args, **kw)
  File "kernels/phatk/__init__.py", line 167, in <lambda>
    self.qr = QueueReader(self.core, lambda nr: self.preprocess(nr),
  File "kernels/phatk/__init__.py", line 361, in preprocess
    kd = KernelData(nr, self.core, self.VECTORS, self.AGGRESSION)
  File "kernels/phatk/__init__.py", line 46, in __init__
    unpack('LLLL', nonceRange.unit.data[64:]), dtype=np.uint32)
struct.error: unpack requires a string argument of length 32
[29/07/2011 08:10:17] Server gave new work; passing to WorkQueue
[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]^C
I tried changing the 'LLLL' and 'LLLLLLLL' to 'IIII' (like in the old __init__.py), but that caused a new error further along.
If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
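Both Linux tracebacks end in the same struct.error, while lagmo reports the kernel works on Win7 x64. A likely explanation, assuming nonceRange.unit.data[64:] holds 16 bytes (four 32-bit words, as the old 'IIII' format implies), is that native 'L' is the C unsigned long: 8 bytes on 64-bit Linux but 4 bytes on 64-bit Windows. The Python sketch below shows the size difference and the standard-size alternative '<4I'. As bcforum found, getting past this error only exposed another one further along, so this explains the first traceback, not the whole problem.

```python
# Sketch, not Phoenix's code: why unpack('LLLL', ...) fails on 64-bit Linux.
# ASSUMPTION: nonceRange.unit.data[64:] is 16 bytes (four 32-bit words).
import struct

# Stand-in for nonceRange.unit.data[64:]: four little-endian 32-bit words.
tail = struct.pack('<4I', 1, 2, 3, 4)

# Native 'L' is C unsigned long: 8 bytes on 64-bit Linux (LP64) but
# 4 bytes on 64-bit Windows (LLP64), so 'LLLL' needs 32 bytes on Linux.
print("native 'LLLL' needs", struct.calcsize('LLLL'), "bytes; data has", len(tail))

# '<4I' uses standard 4-byte sizes on every platform (explicitly
# little-endian, which matches native byte order on x86 hosts).
words = struct.unpack('<4I', tail)
print(words)  # (1, 2, 3, 4)
```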
dikidera
July 29, 2011, 02:12:52 PM |
Yup, the new kernel doesn't work.
iopq
July 29, 2011, 02:17:58 PM |
1.1 doesn't work either for me, same error
Diapolo
July 29, 2011, 03:31:10 PM |
Quote: Alright, check the first post: I uploaded a second version today with a few tweaks (the Ma() tweak and a slight reordering of some operations). It should be faster than Diapolo's now. Also, anyone who wants to help with this or has any suggestions, PM me and I'll be more than happy to discuss when I get the chance. And... Diapolo (and anyone else who wants to help), if you read this... we should work together on trying to improve this. I think it is a good idea to keep separate code sources to increase the chances of finding optimizations, but if you have any questions about my code, let me know. -Phateus
Currently looking at your code... Dia
Mr.Prayer
Newbie
Offline
Activity: 8
Merit: 0
July 29, 2011, 04:08:16 PM |
Win7 x64, 5870, Catalyst 11.7, latest GUIMiner (Phoenix 1.5). After copying the v2.0 files into "kernels\phatk", I get these messages in the console:
2011-07-29 11:02:49: Listener for "itzod2": [29/07/2011 11:02:49] [4.19 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-29 11:02:50: Listener for "itzod2": [29/07/2011 11:02:50] Warning: work queue empty, miner is idle
No work is being done. Here are the miner's starting parameters:
2011-07-29 18:08:49: Running command: .\phoenix.exe -u http://****:****@lp1.itzod.ru:8344 PLATFORM=0 DEVICE=0 AGGRESSION=12 -k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256
Your v1.0 and Diapolo's 2011-07-17 kernels work fine.
Diapolo
July 29, 2011, 05:51:47 PM |
1st question: how is 0x2004000U in line 170 computed? Currently I don't get it. Dia
Phateus (OP)
July 29, 2011, 06:12:38 PM |
Quote: Yup, the new kernel doesn't work.
BAH!... I'll look through it. Tonight I am going to a Sublime with Rome and 311 concert... so this weekend.
Quote: 1st question: how is 0x2004000U in line 170 computed? Currently I don't get it. Dia
Basically, only the last bit differs between the 2 nonces W[3].x and W[3].y, and the first calculation done on those values is P2:
P2(18) = rot(W[3],25)^rot(W[3],14)^((W[3])>>3U);
So, instead of flipping bit 0 of W[3] and calculating P2(18) separately for .x and .y, we can calculate P2(18).x once; P2(18).y is the same except that bits 25 and 14 are flipped:
P2(18).x = rot(W[3].x,25)^rot(W[3].x,14)^((W[3].x)>>3U);
W[3].y = W[3].x ^ 1, and rotates, shifts and XORs all distribute over XOR, therefore:
P2(18).y = P2(18).x ^ (rot(1,25)^rot(1,14)^((1)>>3U));
With rot(1,25) = 0x2000000U, rot(1,14) = 0x4000U and (1)>>3U = 0, that gives:
P2(18).y = P2(18).x ^ 0x2004000U;
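The 0x2004000U constant can be reproduced with a few lines of Python. This sketch models the kernel's rot(x, n) as a 32-bit rotate-left (like OpenCL's rotate()), which makes rot(w,25)^rot(w,14)^(w>>3) the SHA-256 sigma0 function (ROTR 7, ROTR 18, SHR 3); the helper names rot and p2 are illustrative stand-ins, not the kernel source. It then checks the XOR-linearity the trick relies on: P2(w ^ 1) == P2(w) ^ P2(1) for random w.

```python
# Sketch: reproduce P2(18).y = P2(18).x ^ 0x2004000U.
# ASSUMPTION: rot(x, n) is a 32-bit rotate-left, as with OpenCL rotate().
import random

MASK32 = 0xFFFFFFFF

def rot(x, n):
    """32-bit rotate-left (valid for 0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def p2(w):
    """P2 as written in the post: rot(w,25)^rot(w,14)^(w>>3)."""
    return rot(w, 25) ^ rot(w, 14) ^ (w >> 3)

delta = p2(1)
print(hex(delta))  # 0x2004000

# P2 is built only from rotates, a shift and XOR, so it is linear over XOR.
rng = random.Random(1)
for _ in range(10000):
    w3x = rng.getrandbits(32)
    w3y = w3x ^ 1  # the two nonces differ only in bit 0
    assert p2(w3y) == p2(w3x) ^ delta
```

Note that this shortcut applies to P2(18) only: W[18] itself is a sum of several terms, and addition does not distribute over XOR, so W[18].y is not simply W[18].x with two bits flipped.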
|