Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
May 11, 2011, 05:05:55 PM Last edit: August 08, 2011, 06:16:28 PM by Phateus |
|
phatk Kernel for Phoenix 1.5

I have started working on my phoenix kernel again, so I should be putting out regular updates. Anyone with bugs, questions or suggestions, post in the thread and I'll try to look into them. After an update, if you are still having issues, please feel free to post again, since it is hard to track which bugs I have fixed and which are still out there.

Version 2.2: https://sourceforge.net/projects/phatk/files/phatk-2.2.zip/download
Version 2.1: https://sourceforge.net/projects/phatk/files/phatk-2.1.zip/download
Version 2.0: https://sourceforge.net/projects/phatk/files/phatk-2.0.zip/download
Version 1.0: https://sourceforge.net/projects/phatk/files/phatk-1.0.zip/download

Make sure, if you are using version 2.0, that you supply a valid WORKSIZE option (such as "WORKSIZE=256").

Kernel performance (BFI_INT active / APP KernelAnalyzer CAL 11.7 profile):

HD5870 (also any other 5xxx or 68xx card)
Diapolo 2011-07-17: 1374 ALU OPs
Version 1.0: 1418 ALU OPs
Version 2.0 (7/29/11): 1363 ALU OPs
Version 2.1 (8/2/11): 1359 ALU OPs
Version 2.2 (8/8/11): 1354 ALU OPs

HD6970
Diapolo 2011-07-17: 1698 ALU OPs
Version 1.0: 1747 ALU OPs
Version 2.0: 1691 ALU OPs
Version 2.1: 1692 ALU OPs
Version 2.2: 1688 ALU OPs

As of version 2.1, phatk has a command line option "VECTORS4" which can be used instead of "VECTORS". This option works on 4 nonces per thread instead of 2 and may increase speed, mainly if you do not underclock your memory, but feel free to try it out. Note that if you use it, you will more than likely have to decrease your WORKSIZE to 128 or 64. Below is a graph I came up with for my 5870 with the core clocked at 950. V1 is the speed with no VECTORS option enabled, V2 is with the standard "VECTORS", and V4 is with the new "VECTORS4" command line option. The numbers with them show the WORKSIZE.
https://spreadsheets.google.com/spreadsheet/oimg?key=0Ar69rrd0ZESNdGU3NElvU3Q0eFYzYkhuUFJUbkVraUE&oid=1&zx=ks7ngj3nt03g

To install, unzip into phoenix's kernels folder (files should be in [phoenix root]/kernels/phatk/). I use the command:

phoenix.exe -u http://user:password@www.bitcoinpool.com:8334/ DEVICE=0 BFI_INT VECTORS AGGRESSION=12 WORKSIZE=256 -k phatk

Lastly, I am keeping track of new features I have thought of adding to my kernel (not sure what is feasible yet; these are just things I am looking into). If anyone has any suggestions, I will add them to the list. If any of these sound useful to you, let me know so I know where to put my efforts:
- Precompiled kernels for SDK 2.4, so any version of the SDK will get the full speed of the latest SDK
- Auto-optimize which will iterate through all of the combinations of command line options to give you the fastest hashrate
- Logging
- Web Interface for controlling miners and viewing hashrate graphs (this will probably have to be a separate project and would likely slow my progress on optimizing)
If it works out for you and you're feeling generous, any donations would be greatly appreciated so I can continue to put out bitcoin-related software: 124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv
-Phateus
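The auto-optimize feature on the wish list above is essentially a grid search over launch flags. Here is a minimal sketch of the idea; `benchmark_hashrate` is a hypothetical stand-in for actually launching phoenix with those flags and sampling the reported MH/s for a while, and the flag values and rates are made-up illustrations, not measured numbers:

```python
from itertools import product

def benchmark_hashrate(flags):
    """Stand-in for launching phoenix with `flags` and sampling MH/s.
    Faked with a fixed table here so the sketch is self-contained."""
    fake_results = {
        ("VECTORS", 256): 417.0,
        ("VECTORS", 128): 414.5,
        ("VECTORS4", 128): 419.2,
        ("VECTORS4", 64): 416.8,
    }
    return fake_results.get(flags, 0.0)  # 0.0 = combination failed/unsupported

def auto_optimize():
    """Try every combination of vector width and worksize, keep the fastest."""
    best_flags, best_rate = None, 0.0
    for vec, ws in product(("VECTORS", "VECTORS4"), (256, 128, 64)):
        rate = benchmark_hashrate((vec, ws))
        if rate > best_rate:
            best_flags, best_rate = (vec, ws), rate
    return best_flags, best_rate
```

A real version would need to run each benchmark long enough to average out variance, which is why each extra option multiplies the tuning time.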
|
|
|
|
|
|
|
|
|
|
|
|
mitak64
|
|
May 11, 2011, 05:21:45 PM |
|
Just tried it.
phoenix 1.4 aggression=11 on HD5850 @865/300 - 340 MH/s
phatk 1.4 aggression=11 on HD5850 @865/300 - 338 MH/s
|
|
|
|
Gnaffel
Newbie
Offline
Activity: 53
Merit: 0
|
|
May 11, 2011, 05:48:11 PM Last edit: May 12, 2011, 01:01:42 PM by 4z3rt |
|
on my test machine HD5570 OC@700Mhz
poclbm 72 MH/s - no desktop lag
phoenix 73 MH/s - no desktop lag
hashkill 75 MH/s - sometimes slow mouse
phoenix-phatk 76 MH/s - very slow desktop environment / must be good for headless
|
|
|
|
Convery
|
|
May 11, 2011, 07:10:00 PM Last edit: May 11, 2011, 07:34:10 PM by Convery |
|
5850 1050/300:
Phoenix - 402 Mhash
Phatk - 415-417 Mhash
Quite nice ;3
|
|
|
|
bolapara
Member
Offline
Activity: 78
Merit: 10
|
|
May 11, 2011, 07:19:04 PM |
|
phoenix 1.46, agg=13, bfi_int, vectors
card 1: 5870, 995 core, 300 mem: poclbm - 431, phatk - 438
card 2: 5870, 900 core, 300 mem: poclbm - 389, phatk - 397
Nice little bump.
|
|
|
|
EPiSKiNG
Legendary
Offline
Activity: 800
Merit: 1001
|
|
May 11, 2011, 07:19:58 PM |
|
5870 @ 970 core, 300 mem
Guiminer-2011.05.01: 431.5 MH/s (--platform=0 -v -w 256 -f 0)
PhatK: 426.62 MH/s (phoenix.exe -u http://XXX:XXX@deepbit.net:8332/;askrate=15 PLATFORM=0 DEVICE=1 BFI_INT VECTORS AGGRESSION=12 -k phatk)
-EP
|
|
|
|
Kick
|
|
May 11, 2011, 07:48:47 PM Last edit: May 11, 2011, 07:59:50 PM by Kick |
|
5870 @ 970core 300mem Guiminer-2011.05.01: 431.5MH/s (--platform=0 -v -w 256 -f 0) PhatK: 426.62MH/s (phoenix.exe -u http://XXX:XXX@deepbit.net:8332/;askrate=15 PLATFORM=0 DEVICE=1 BFI_INT VECTORS AGGRESSION=12 -k phatk) -EP
Not really a correct comparison; you're missing the -w 256 flag for phatk. Actually, I take that back: the default should be the max the device can support.
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
May 11, 2011, 07:57:30 PM |
|
I see most of you are running 300 MHz memory. One thing that I've noticed from messing around with everything is that 300 MHz memory can be too slow. I found that with a 1000 MHz core, 330 was optimal for the memory. At really low memory clocks (especially with my kernel), the speed is limited by the memory. A good estimate for memory speed (for both the 5850 and 5870) is 1/3 the core speed.
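That rule of thumb reduces to one line of arithmetic; a minimal sketch, keeping in mind the 1/3 ratio is the estimate from this post for 5850/5870 cards, not a guarantee for every card:

```python
def suggested_mem_clock(core_mhz):
    """Rough starting point for memory clock on a 5850/5870:
    about one third of the core clock, per Phateus's observation."""
    return round(core_mhz / 3)
```

For example, a 1000 MHz core suggests starting the memory near 333 MHz and tuning from there (Phateus found 330 optimal at that core clock).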
Happy mining
-Phateus
|
|
|
|
Convery
|
|
May 11, 2011, 08:25:53 PM |
|
I see most of you are running 300 MHz memory. One thing that I've noticed from messing around with everything is that 300 MHz memory can be too slow. I found that with a 1000 MHz core, 330 was optimal for the memory. At really low memory clocks (especially with my kernel), the speed is limited by the memory. A good estimate for memory speed (for both the 5850 and 5870) is 1/3 the core speed.
5850 peak Mhash:
1055/300 - 417 Mhash
1055/350 - 419 Mhash
1055/375 - 420 Mhash
1055/400 - 416 Mhash - unstable
1055/425 - 417 Mhash
|
|
|
|
Tyran
Newbie
Offline
Activity: 40
Merit: 0
|
|
May 11, 2011, 09:14:13 PM |
|
Unfortunately no improvement on a 5770:
@935/300 -k poclbm: 207.5
@935/300 -k phatk: 202.5
Higher memory clocks only decrease performance further. Might be because I'm running SDK 2.1; do you think upgrading to 2.4 would make up for the ~5% loss?
|
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
May 11, 2011, 10:04:52 PM |
|
Unfortunately no improvement on a 5770: @935/300 -k poclbm: 207.5 @935/300 -k phatk: 202.5 Higher memory clocks only decrease performance more. Might be because I'm running SDK 2.1, do you think it would make up for the ~5% loss going to 2.4?
Yeah, at least for my kernel, which was specifically written for 2.4. The optimizations mainly consist of tricking the compiler into doing what I want it to do, so using 2.4 should increase performance a fair amount. But since I don't actually have different SDKs on any of my machines, I cannot test it. Might be worth a shot. It's always a toss-up whether it's worth the hassle/downtime to tinker with your miner.
|
|
|
|
EPiSKiNG
Legendary
Offline
Activity: 800
Merit: 1001
|
|
May 11, 2011, 10:26:47 PM |
|
5870 @ 970core 300mem Guiminer-2011.05.01: 431.5MH/s (--platform=0 -v -w 256 -f 0) PhatK: 426.62MH/s (phoenix.exe -u http://XXX:XXX@deepbit.net:8332/;askrate=15 PLATFORM=0 DEVICE=1 BFI_INT VECTORS AGGRESSION=12 -k phatk) -EP
Also, I am using ATI Stream v2.1 (145) and Catalyst 11.3 (3-8-2011)... Haven't tried 2.4 yet, and I don't really feel like switching. Is 2.4 supposed to give better performance? -EP
|
|
|
|
jedi95
|
|
May 11, 2011, 10:38:49 PM Last edit: May 12, 2011, 07:19:16 AM by jedi95 |
|
Very nice! I am getting 408 Mhash/sec now vs 394 Mhash/sec using the poclbm kernel. There is also no difference in desktop responsiveness compared to the poclbm kernel. This is very close to what I get with the poclbm kernel on Linux with SDK 2.1 (410 Mhash/sec, but that's at AGGRESSION=11).

5870 @ 930/300 (Win7 x64, 11.5 + SDK 2.4)
Arguments: FASTLOOP VECTORS BFI_INT AGGRESSION=8

Also, it appears you used an older revision of the poclbm kernel as the base for phatk. It doesn't include the FASTLOOP changes in Phoenix 1.45 and newer. The hashrate comparison above is with the FASTLOOP updates added to phatk, although with these particular settings it should be nearly identical. Donation coming your way.
|
Phoenix Miner developer Donations appreciated at: 1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
|
|
|
Herodes
|
|
May 11, 2011, 11:02:00 PM |
|
Seeing a 3-4% increase in hashing speed. Donation coming your way. Thanks for sharing.
|
|
|
|
grndzero
|
|
May 11, 2011, 11:24:55 PM |
|
5850 900/300
Went from 361 down to 355 with VECTORS AGGRESSION=12 BFI_INT
|
Ubuntu Desktop x64 - HD5850 Reference - 400Mh/s w/ cgminer @ 975C/325M/1.175V - 11.6/2.1 SDK Donate if you find this helpful: 1NimouHg2acbXNfMt5waJ7ohKs2TtYHePy
|
|
|
jedi95
|
|
May 11, 2011, 11:30:43 PM |
|
5850 900/300
Went from 361 to 355 VECTORS AGGRESSION=12 BFI_INT
This is probably because it's optimized for SDK 2.4. If you are using the Linux + SDK 2.1 setup in your sig, then it's probably better to stick with the poclbm kernel. The advantage of phatk is that, on SDK 2.4, it produces speeds similar to poclbm on SDK 2.1.
|
|
|
|
grndzero
|
|
May 11, 2011, 11:32:40 PM |
|
5850 900/300
Went from 361 to 355 VECTORS AGGRESSION=12 BFI_INT
This is probably because it's optimized for SDK 2.4. If you are using the Linux + SDK 2.1 setup in your sig then it's probably better to stick with the poclbm kernel. The advantage of phatk is that it produces similar speed to poclbm + SDK 2.1 with SDK 2.4.
Ah, yeah, I did read that; it just failed to register. (I just woke up.)
|
|
|
|
Nicksasa
|
|
May 11, 2011, 11:33:29 PM |
|
Tried it again on my 6970 @ 925 MHz; dropped from 379 Mhash to 366 Mhash on 11.4 & SDK 2.4.
|
|
|
|
Herodes
|
|
May 12, 2011, 12:34:20 AM |
|
Anyone tested it on 5970 yet?
|
|
|
|
JWU42
Legendary
Offline
Activity: 1666
Merit: 1000
|
|
May 12, 2011, 02:47:58 AM |
|
Tried on 5970 (using 2.1 though so didn't expect much).
367 - poclbm
362 - phatk
|
|
|
|
gmaxwell
Moderator
Legendary
Offline
Activity: 4158
Merit: 8382
|
|
May 12, 2011, 02:52:38 AM |
|
On a stock 5870 at AGGRESSION=12, I get 371 (vs. 353 with the default kernel) and O/C at 1 GHz I get 438 (vs. 420 with the default kernel). With VECTORS and BFI_INT it compiles to 1418 ALU ops for 2 hashes. [snip] If you're feeling generous, any donations would be greatly appreciated so I can continue to put out bitcoin related software: 124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv
On the 5870 900/300: 383.17 -> 398.16
On the 5850s 852/284: 327.12 -> 340.90
CLI: DISPLAY=:0.0 python phoenix.py -q 2 -u http://15xWuDHSyKzpvp6FacGKXijBeaaaYhKWSi:x@pool.bitcoin.dashjr.org:8337/ -k phatk DEVICE=$1 AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT
SDK 2.4
While screwing around with memory settings previously, I found that having an integer ratio of core clock to memory clock made a fair improvement, around 3 MH/s with the old kernel, vs. being near but not quite an integer ratio. I wasn't sure if this was chance or something substantive, but considering that I'm seeing better improvements (and performance) than some others, I thought I'd mention it. Phateus, you have my thanks and a donation of a day's worth of the income improvement your code brought me.
|
|
|
|
OtaconEmmerich
|
|
May 12, 2011, 03:23:37 AM |
|
poclbm (GUI Miner): 200 MH/s (-v -w64 -f0)
phatk: 210 MH/s (BFI_INT VECTORS FASTLOOP=false AGGRESSION=12)
This is on a 5770 @ 955/300. I'd say that's worth a small donation from me. I should try out DiabloMiner next; maybe after his upcoming upgrade he may beat your kernel.
|
|
|
|
nster
|
|
May 12, 2011, 03:40:40 AM |
|
On a stock 5870 at AGGRESSION=12, I get 371 (vs. 353 with the default kernel) and O/C at 1 GHz I get 438 (vs. 420 with the default kernel). With VECTORS and BFI_INT it compiles to 1418 ALU ops for 2 hashes. [snip] If you're feeling generous, any donations would be greatly appreciated so I can continue to put out bitcoin related software: 124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv
On 5870 900/300 383.17 -> 398.16 On the 5850s 852/284 327.12 -> 340.90 CLI: DISPLAY=:0.0 python phoenix.py -q 2 -u http://15xWuDHSyKzpvp6FacGKXijBeaaaYhKWSi:x@pool.bitcoin.dashjr.org:8337/ -k phatk DEVICE=$1 AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT SDK 2.4 While screwing around with memory settings previously I found that having an integer ratio of clock to mem made a fair improvement, around 3MH/s with the old kernel vs being near but not quite. [snip]
Not for me: 1020/344 has about a 5 MH/s advantage over 1020/340, and another 5 MH/s over 1020/510.
|
167q1CHgVjzLCwQwQvJ3tRMUCrjfqvSznd Donations are welcome Please be kind if I helped
|
|
|
fpgaminer
|
|
May 12, 2011, 04:03:23 AM |
|
I ran this new kernel against stock poclbm using my 5970. Although the MHash/s was +10 for the modified kernel, it ended up getting fewer accepted shares in the long run (several hours). That may just be terrible luck, but I tried it twice: once under Windows, and then under Ubuntu. Both times for several hours. Both times with the same results (stock poclbm with more accepted shares). I have not tried swapping which core the respective kernels were running on, but it's been enough downtime for me today.
|
|
|
|
Melvin132
Newbie
Offline
Activity: 13
Merit: 0
|
|
May 12, 2011, 07:06:41 AM |
|
Seems to work great with the 5850 as well. Bumped my Mh/s up by an average of 10-15. Using 6 cards, that's really respectable.
|
|
|
|
Clavulanic
|
|
May 12, 2011, 07:36:35 AM |
|
2x 5870s, "-k poclbm device=1 WORKSIZE=128 VECTORS BFI_INT AGGRESSION=7 FASTLOOP"; core on both is at 935. I got 385 Mhash on both with poclbm and now I'm getting 398 on both with phatk.
Does FASTLOOP work with this or not? I'm still working on upping my aggression and overclocking.
|
|
|
|
|
|
|
ataranlen
|
|
May 12, 2011, 07:49:01 AM Last edit: May 12, 2011, 08:15:16 AM by ataranlen |
|
I've just switched to this kernel; seeing a 10% increase on all GPUs.
2x 5870x2's at 950 MHz core, getting 407-412 Mhash/s.
Now I want to see just how much I can pull from these with your kernel, so I can update my listings on the wiki! It's only 3am, and I work at 6am; I'm sure I have time to finish xD
|
|
|
|
Enky1974
|
|
May 12, 2011, 07:58:55 AM Last edit: May 12, 2011, 08:39:17 AM by Enky1974 |
|
Poclbm last version = 396 (-f 60 -v -w128), GPU load 98%
Phoenix tweaked kernel = 406 (aggression 7, fastloop), GPU load 97%
ATI 5870 Sapphire 1GB DDR3 @950/333, SDK 2.3, Catalyst 11.4
|
|
|
|
jedi95
|
|
May 12, 2011, 07:59:58 AM |
|
2x 5870's "-k poclbm device=1 WORKSIZE=128 VECTORS BFI_INT AGGRESSION=7 FASTLOOP " core on both is at 935. I got 385mhash on both with poclbm and now i'm getting 398 on both with phatk.
Does fastloop work with this or not? I'm working on upping my aggression and overclocking still.
It does, but it has the same behavior as the poclbm kernel included in Phoenix 1.4. This means it doesn't have as much of a speed benefit at low aggression and it causes stale shares if used with high aggression.
|
|
|
|
gmaxwell
Moderator
Legendary
Offline
Activity: 4158
Merit: 8382
|
|
May 12, 2011, 08:32:55 AM Last edit: May 12, 2011, 08:49:46 AM by gmaxwell |
|
I ran this new kernel against stock poclbm using my 5970. Although the MHash/s was +10 for the modified kernel, it ended up getting less accepted shares in the long run (several hours). That may just be terrible luck, but I tried it twice; once under Windows, and then under Ubuntu. Both times for several hours. Both times with the same results (stock poclbm with more accepted shares). I have not tried swapping which core the respective kernels were running on, but it's been enough downtime for me today
I'm in a position to speak objectively about this, as I log all my found shares. More data would be helpful, but it's only run for a few hours. Ideally I would have collected data from two cards in parallel over the same time to isolate network effects; instead I'll just exclude the extreme outliers (>90s). Using the 1814 shares before the change and 1814 since the change on a single node (the 5870), I found that the mean time between shares before was 11.127 seconds and the mean time after was 10.8. This difference is not large enough to separate the 95% confidence intervals assuming an exponential distribution, and a permutation test finds only p=0.369, so with this amount of data I can't say it made things better for _sure_, but it's certainly more likely than not, and it's very unlikely to have made them worse. 10.8 seconds at difficulty 1 implies 397,688,225 h/s and 11.127 implies 386,000,973 h/s, which is basically what the tool shows... well, a little less; it looks like the performance was overstated a bit before and is less so now? (The formula for hashrate from share gaps is 281474976710656/(65535*seconds) = h/s.)
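gmaxwell's share-gap formula is worth having as a one-liner: a difficulty-1 share takes 2^48/65535 hashes on average, so the mean seconds between shares implies a hashrate. A minimal sketch of that calculation:

```python
def hashrate_from_gap(mean_seconds):
    """Implied hashrate (H/s) from the mean time between difficulty-1 shares.

    A difficulty-1 share requires 2^48 / 65535 hashes on average, so
    H/s = 281474976710656 / (65535 * mean_seconds)."""
    return 2**48 / (65535 * mean_seconds)
```

For example, a 10.8 s mean gap implies roughly 397.7 MH/s and 11.127 s implies roughly 386.0 MH/s, matching the figures in the post above.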
|
|
|
|
jedi95
|
|
May 12, 2011, 08:48:18 AM |
|
I have uploaded a modified version of phatk to the Phoenix SVN. The main difference is that it now has the same FASTLOOP improvements as the poclbm kernel from 1.45. Performance should be around the same except at low aggression. Download
|
|
|
|
Clavulanic
|
|
May 12, 2011, 09:05:33 AM |
|
I have uploaded a modified version of phatk to the Phoenix SVN. The main difference is that it now has the same FASTLOOP improvements as the poclbm kernel from 1.45. Performance should be around the same except at low aggression. Download
I didn't realize something had changed with FASTLOOP. Neat. The same rule applies though, is what you're saying, right? FASTLOOP at aggression <= 7, no FASTLOOP above 7.
|
|
|
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
May 12, 2011, 09:25:07 AM |
|
I have uploaded a modified version of phatk to the Phoenix SVN. The main difference is that it now has the same FASTLOOP improvements as the poclbm kernel from 1.45. Performance should be around the same except at low aggression. Download
Awesome, thanks. I didn't even notice that you changed that code. And everyone, thanks for the informational and BTC support. It's really, really appreciated.
|
|
|
|
Herodes
|
|
May 12, 2011, 10:37:37 AM |
|
On the 5970 it seems to increase the hashing rate by 3-4%. More coins your way.
|
|
|
|
drcoin
Newbie
Offline
Activity: 12
Merit: 0
|
|
May 12, 2011, 10:43:30 AM |
|
Using latest catalyst drivers with 2.4 on 5830 @974/298 with AGGRESSION=12 BFI_INT VECTORS FASTLOOP=false:
poclbm: 290 Mhash/s
phatk: 301 Mhash/s
Nice work!
Edit: Tweaked memory clock - seems to peak around 335Mhz at 302.5 Mhash/s.
|
|
|
|
Tyran
Newbie
Offline
Activity: 40
Merit: 0
|
|
May 12, 2011, 02:23:44 PM |
|
Decided to give SDK 2.4 a try on my 5770, and I can confirm that this kernel is indeed ~11 mhash/sec faster than the default one, but it is not enough to make up for the 2.1 -> 2.4 loss
|
|
|
|
OtaconEmmerich
|
|
May 12, 2011, 04:24:25 PM |
|
Decided to give SDK 2.4 a try on my 5770, and I can confirm that this kernel is indeed ~11 mhash/sec faster than the default one, but it is not enough to make up for the 2.1 -> 2.4 loss
Is 2.1 that much better? I really don't want to use old drivers just to get SDK 2.1. Can you use SDK 2.1 on 11.5? Last time I tried that, it was exactly the same speed.
|
|
|
|
gmaxwell
Moderator
Legendary
Offline
Activity: 4158
Merit: 8382
|
|
May 12, 2011, 06:10:08 PM |
|
Decided to give SDK 2.4 a try on my 5770, and I can confirm that this kernel is indeed ~11 mhash/sec faster than the default one, but it is not enough to make up for the 2.1 -> 2.4 loss Is 2.1 that much better? I really don't want to use old drivers just to get SDK 2.1, Can you use SKD 2.1 on 11.5? Last time I tried that it was exactly the same speed. People posting numbers from 2.1 appear to be lower than mine on 2.4.
|
|
|
|
OtaconEmmerich
|
|
May 12, 2011, 06:36:32 PM |
|
Decided to give SDK 2.4 a try on my 5770, and I can confirm that this kernel is indeed ~11 mhash/sec faster than the default one, but it is not enough to make up for the 2.1 -> 2.4 loss Is 2.1 that much better? I really don't want to use old drivers just to get SDK 2.1, Can you use SKD 2.1 on 11.5? Last time I tried that it was exactly the same speed. People posting numbers from 2.1 appear to be lower than mine on 2.4. So many conflicting reports..Bah!
|
|
|
|
anisoptera
Member
Offline
Activity: 308
Merit: 10
|
|
May 12, 2011, 07:25:20 PM |
|
Went from ~402mhash at 950/300 on a 5870 to ~417 with this (SDK 2.4) Tweaked the memory clock up a bit to 350 and now it's more like 418-420. Definitely a nice improvement
|
|
|
|
Enky1974
|
|
May 13, 2011, 10:36:15 AM |
|
Went from ~402mhash at 950/300 on a 5870 to ~417 with this (SDK 2.4) Tweaked the memory clock up a bit to 350 and now it's more like 418-420. Definitely a nice improvement
With aggression 13 I get 412, same clock settings as you but SDK 2.3.
|
|
|
|
exahash
|
|
May 14, 2011, 03:27:21 AM |
|
Very nice! I'm getting almost 10 Mh/s more than with the poclbm kernel on my Sapphire Xtreme 5850. Thanks Phateus.
|
|
|
|
trumpetx
Member
Offline
Activity: 99
Merit: 10
|
|
May 14, 2011, 12:32:19 PM |
|
Unfortunately no improvement on a 5770: @935/300 -k poclbm: 207.5 @935/300 -k phatk: 202.5 Higher memory clocks only decrease performance more. Might be because I'm running SDK 2.1, do you think it would make up for the ~5% loss going to 2.4?
Same results here on my 5770; nothing really changed.
@960/300 -k poclbm: 213.4
@960/275 -k poclbm: 214.1
@960/300 -k phatk: 212.8
@960/275 -k phatk: 212.2
|
|
|
|
elrock
Newbie
Offline
Activity: 41
Merit: 0
|
|
May 14, 2011, 01:21:12 PM |
|
I get the following error message when I try to run phatk:

File "./phoenix.py", line 123, in <module>
    miner.start(options)
File "/home/elrock/phoenix-1.47/Miner.py", line 74, in start
    self.kernel = self.options.makeKernel(KernelInterface(self))
File "./phoenix.py", line 112, in makeKernel
    self.kernel = kernelModule.MiningKernel(requester)
File "kernels/phatk/__init__.py", line 126, in __init__
    platforms = cl.get_platforms()
pyopencl.LogicError: clGetPlatformIDs failed: invalid/unknown error code
I think this may have something to do with the fact that my GPU is DEVICE 1 and not 0. (For some reason OpenCL recognizes my CPU as DEVICE 0.)
|
|
|
|
Enky1974
|
|
May 14, 2011, 01:29:10 PM |
|
I get the following error message when I try to run phatk: File "./phoenix.py", line 123, in <module> miner.start(options) File "/home/elrock/phoenix-1.47/Miner.py", line 74, in start self.kernel = self.options.makeKernel(KernelInterface(self)) File "./phoenix.py", line 112, in makeKernel self.kernel = kernelModule.MiningKernel(requester) File "kernels/phatk/__init__.py", line 126, in __init__ platforms = cl.get_platforms() pyopencl.LogicError: clGetPlatformIDs failed: invalid/unknown error code
I think this may have something to do with the fact that my GPU is DEVICE 1 and not 0. (For some reason OpenCL recognizes my CPU as DEVICE 0.)
I've had the same problem when switching from Catalyst 11.1 to 11.4; before, the card was recognized as device 1, and now as device 0.
|
|
|
|
redicarus
Newbie
Offline
Activity: 26
Merit: 0
|
|
May 15, 2011, 01:54:08 AM |
|
910Mhz/300Mhz on a HD5850, jumped from 330~ to 345-350~Mhash/s. Nice job.
|
|
|
|
tiberiandusk
|
|
May 15, 2011, 07:50:44 AM |
|
On my OC'd 5870 I went from 410 to 430. awwww yeeeeeaaaaah!
|
|
|
|
allinvain
Legendary
Offline
Activity: 3080
Merit: 1080
|
|
May 15, 2011, 04:55:07 PM |
|
This modified kernel kicks ass. Went from 350 to 371 with stock 5970 speeds (850 MHz; it's the slightly overclocked 4 GB VRAM one) and aggression level 7. With aggression level 12, performance bumps up to 377. All the memory is at 300 MHz. Very nice! Thank you so much OP!!!!
|
|
|
|
Miner-TE
|
|
May 15, 2011, 07:09:08 PM Last edit: May 15, 2011, 09:41:56 PM by Miner-TE |
|
Nice little bump up from 405 MH/s to 420 MH/s on my 5870, but power usage went up 15W as well. ~15 MH/s gain for ~15W more power?
5870 @ 970 core, 300 mem, Phoenix 1.46: ~405 MH/s (-k poclbm DEVICE=0 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256), 72 degC, 213-215 Watts (measured by KillAWatt)
Same 5870 @ 970 core, 300 mem, Phoenix 1.46 with the new kernel: ~420 MH/s (-k phatk DEVICE=0 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256), 75 degC, 227-230 Watts (measured by KillAWatt)
Can anyone else verify?
|
BTC - 1PeMMYGn7xbZjUYeaWe9ct1VV6szLS1vkD - LTC - LbtcJRJJQQBjZuHr6Wm7vtB9RnnWtRNYpq
|
|
|
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
|
|
May 15, 2011, 07:17:00 PM |
|
Holy....... I am using this word for the second time. The first time was when my hash rate jumped from 275 to 300 Mhash/s, and now to 313 Mhash/s after using phatk.
phoenix.exe -u http://XXXXXXXXX@mining.bitcoin.cz:8332/ DEVICE=0 VECTORS BFI_INT AGGRESSION=10 -k phatk
HD 6870 with core clk 1038, mem clk 360, fan 100%, temp 75-77C, Windows 7 32-bit.
|
|
|
|
allinvain
Legendary
Offline
Activity: 3080
Merit: 1080
|
|
May 16, 2011, 12:26:06 AM |
|
Nice little bump up from 405 MH/s to 420 Mh/s on my 5870 but power usage went up 15W as well. ~15Mh/s gain with ~15W more power? 5870 @ 970 core 300 mem Phoenix 1.46 ~405 Hh/s (-k poclbm DEVICE=0 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256) 72 degC 213-215 Watts (measured by KillAWatt) Same 5870 @ 970 core 300 mem Phoenix 1.46 with new Kernel ~420 Mh/s (-k phatk DEVICE=0 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256) 75 degC 227-230 Watts (measured by KillAWatt) Can anyone else verify? I can verify this. My power usage went up too. From 412 to 421 on one rig to 440~ Now the question is whether the extra hash power justifies the extra power consumption..math anyone?
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
May 16, 2011, 12:46:07 AM |
|
Nice little bump up from 405 MH/s to 420 Mh/s on my 5870 but power usage went up 15W as well. ~15Mh/s gain with ~15W more power? [snip] Can anyone else verify? I can verify this. My power usage went up too. From 412 to 421 on one rig to 440~ Now the question is whether the extra hash power justifies the extra power consumption..math anyone?
OK. According to deepbit's reward calculator, 405 MH/s gives 0.097 and 420 MH/s gives 0.10 BTC per hour, so switching gains you 0.003 BTC per hour, or at the current exchange rate of $7/BTC: $0.021/h.
The difference in power is 15 W (0.015 kW). The price of electricity is about $0.10/kWh (http://www.eia.doe.gov/cneaf/electricity/epm/table5_6_b.html), so the added cost is 0.015 kW * $0.10/kWh = $0.0015/h.
So the increase in profit is 14 times the increase in cost. Unless the price drops to 0.5 or the difficulty goes up 14-fold, pretty much any overclocking/optimizing is worth it.
Edit: Also, the added air conditioning will likely double the electricity cost, but it is still negligible compared to the increase in profit.
Hope this helps
-Phateus
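Phateus's profitability arithmetic generalizes to any hashrate/power tradeoff. A small sketch of the same calculation, with his numbers as the example inputs (prices and rates are from the post, not current figures):

```python
def extra_profit_ratio(gain_btc_per_h, btc_price_usd, extra_watts, usd_per_kwh):
    """Ratio of extra hourly revenue to extra hourly electricity cost.

    A ratio > 1 means the tweak pays for its added power draw."""
    revenue = gain_btc_per_h * btc_price_usd        # $/h gained
    cost = (extra_watts / 1000.0) * usd_per_kwh     # $/h of extra electricity
    return revenue / cost

# Phateus's example: +0.003 BTC/h at $7/BTC, +15 W at $0.10/kWh
ratio = extra_profit_ratio(0.003, 7.0, 15, 0.10)
```

With those inputs the ratio comes out to 14, matching the "14 times higher" conclusion above.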
|
|
|
|
jondecker76
|
|
May 16, 2011, 12:56:01 AM |
|
Running a single Sapphire 5850 at 875/900 overclock with ATI SDK 2.4:
using the poclbm kernel - 328 MHash
using the phatk kernel - 340 MHash!
Very nice!!! I'll be sure to donate!
|
|
|
|
allinvain
Legendary
Offline
Activity: 3080
Merit: 1080
|
|
May 16, 2011, 07:31:31 AM |
|
Nice little bump up from 405 MH/s to 420 Mh/s on my 5870 but power usage went up 15W as well. ~15Mh/s gain with ~15W more power? 5870 @ 970 core 300 mem Phoenix 1.46 ~405 Hh/s (-k poclbm DEVICE=0 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256) 72 degC 213-215 Watts (measured by KillAWatt) Same 5870 @ 970 core 300 mem Phoenix 1.46 with new Kernel ~420 Mh/s (-k phatk DEVICE=0 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256) 75 degC 227-230 Watts (measured by KillAWatt) Can anyone else verify? I can verify this. My power usage went up too. From 412 to 421 on one rig to 440~ Now the question is whether the extra hash power justifies the extra power consumption..math anyone? Ok, According the deepbit's reward calculator, 405MH/s gives .097 and 420MH/s gives 0.10 BTC per hour Switching gains you .003BTC per hour or at the current exchange rate of $7 per: $0.021/hthe difference in power is 15 Watts ( .015 kW) The price of electricity is about $0.10/kWh ( http://www.eia.doe.gov/cneaf/electricity/epm/table5_6_b.html) The cost of electricity from the increase is .015 kW * $0.10/kWh = .0015$/hSo... the increase in profits is 14 times higher than the increase in cost. Unless the price drops to .5 or the dificulty goes up 14-fold, pretty much any overclocking / optimizing is worth it. Edit: Also, the increase in air conditioning will likely double the cost of electricity, but the cost is still is negligible compared to the increase in profit. Hope this helps -Phateus It helps a lot. Thanks for that analysis. I for one very much appreciate it.
|
|
|
|
mosimo
Newbie
Offline
Activity: 9
Merit: 0
|
|
May 16, 2011, 08:07:15 AM |
|
I'm running 2x 5870s but at 965 core, 300 mem. phoenix -u http://blah/ -k poclbm VECTORS AGGRESSION=11 BFI_INT PLATFORM=0 DEVICE=0 WORKSIZE=768 Gets me 404 MH/s phoenix -u http://blah/ VECTORS AGGRESSION=12 BFI_INT PLATFORM=0 DEVICE=0 WORKSIZE=768 -k phatk Gets me 420 MH/s Huge improvement. Thanks for this.
|
|
|
|
DiabloD3
Legendary
Offline
Activity: 1162
Merit: 1000
DiabloMiner author
|
|
May 16, 2011, 08:08:53 PM |
|
I'm running 2x 5870s but at 965 core, 300 mem. phoenix -u http://blah/ -k poclbm VECTORS AGGRESSION=11 BFI_INT PLATFORM=0 DEVICE=0 WORKSIZE=768 Gets me 404 MH/s phoenix -u http://blah/ VECTORS AGGRESSION=12 BFI_INT PLATFORM=0 DEVICE=0 WORKSIZE=768 -k phatk Gets me 420 MH/s Huge improvement. Thanks for this. 5xxx maxes out at a worksize of 256.
|
|
|
|
icaci
Newbie
Offline
Activity: 28
Merit: 0
|
|
May 16, 2011, 10:49:38 PM |
|
5xxx maxes out at a worksize of 256.
My dual 5870 (w/o CF bridges) maxes out at WORKSIZE=128.
|
|
|
|
DiabloD3
Legendary
Offline
Activity: 1162
Merit: 1000
DiabloMiner author
|
|
May 17, 2011, 12:15:28 AM |
|
5xxx maxes out at a worksize of 256.
My dual 5870 (w/o CF bridges) maxes out at WORKSIZE=128. Nope, that too maxes out at 256. What I said was 768 simply is not valid for 5xxx hardware. Phoenix should output the error OpenCL is returning instead of covering it up.
|
|
|
|
nster
|
|
May 17, 2011, 12:31:58 AM |
|
5xxx maxes out at a worksize of 256.
My dual 5870 (w/o CF bridges) maxes out at WORKSIZE=128. by mxes out he means maximum worksize, not maximum hashrate
|
167q1CHgVjzLCwQwQvJ3tRMUCrjfqvSznd Donations are welcome Please be kind if I helped
|
|
|
jedi95
|
|
May 17, 2011, 12:41:16 AM |
|
5xxx maxes out at a worksize of 256.
My dual 5870 (w/o CF bridges) maxes out at WORKSIZE=128. Nope, that too maxes out at 256. What I said was 768 simply is not valid for 5xxx hardware. Phoenix should output the error OpenCL is returning instead of covering it up. I'll probably add this in the next version, but for now it just uses the maximum supported if you enter a higher value.
|
Phoenix Miner developer Donations appreciated at: 1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
|
|
|
rowbot
Member
Offline
Activity: 96
Merit: 10
NOW
|
|
May 17, 2011, 12:40:07 PM |
|
Tried it on my 5830 and there was no difference.
|
|
|
|
Folax
|
|
May 21, 2011, 10:48:53 AM |
|
Works nicely on XP64. Anyone using it on Linux?
|
My GF thinks I'm useless, if you think otherwise and can proof it to her, please do so and donate: 14wG6u2bAD9q1nLmLL9MST1ZzbTE9Pt8nG
|
|
|
William Reed
Newbie
Offline
Activity: 15
Merit: 0
|
|
May 21, 2011, 03:05:57 PM |
|
Works very well. I am getting over 440 Mhash/s on HD 5870 (1000/375) with -k phatk AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT and about 416 Mhash/s on poclbm. However my other HD 5870 running at 950/375 with same switches only hashes about 410 MHash/s with phatk while poclbm gives about 400MHash/s.
|
|
|
|
JayC
Newbie
Offline
Activity: 34
Merit: 0
|
|
May 21, 2011, 03:23:52 PM |
|
Works very well. I am getting over 440 Mhash/s on HD 5870 (1000/375) with -k phatk AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT and about 416 Mhash/s on poclbm. However my other HD 5870 running at 950/375 with same switches only hashes about 410 MHash/s with phatk while poclbm gives about 400MHash/s.
Just out of curiosity, how do you tell what worksize you need for a specific card?
|
|
|
|
huayra.agera
|
|
May 21, 2011, 05:47:35 PM |
|
Works very well. I am getting over 440 Mhash/s on HD 5870 (1000/375) with -k phatk AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT and about 416 Mhash/s on poclbm. However my other HD 5870 running at 950/375 with same switches only hashes about 410 MHash/s with phatk while poclbm gives about 400MHash/s.
This worked well for me! Thanks for this tip man! +1: I have 3 5850s and these settings added like 20 Mhash/s while on my 6850 +10Mh/s! Cool!
|
BTC: 1JMPScxohom4MXy9X1Vgj8AGwcHjT8XTuy
|
|
|
lagmo
Member
Offline
Activity: 67
Merit: 10
|
|
May 21, 2011, 06:17:20 PM |
|
Very nice job! Finally got to break the 400Mhash/s barrier on my HD5850, an increase of about 8-10Mhash/s over POCLBM kernel.
|
|
|
|
William Reed
Newbie
Offline
Activity: 15
Merit: 0
|
|
May 21, 2011, 07:10:55 PM |
|
Works very well. I am getting over 440 Mhash/s on HD 5870 (1000/375) with -k phatk AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT and about 416 Mhash/s on poclbm. However my other HD 5870 running at 950/375 with same switches only hashes about 410 MHash/s with phatk while poclbm gives about 400MHash/s.
Just out of curiosity, how do you tell what worksize you need for a specific card? There is no general rule; it mostly depends on the architecture and memory technology used. In heavy scientific calculations the best worksize is usually the one the card can process natively, but in mining, where a single loop is very simple and fast, the optimal worksize can vary. In mining, lowering memory clocks saves power and therefore may allow extra OC on the core, thus speeding up computation. If you lower your memory clocks too much it can reduce your processing power, but that kind of loss can be compensated by lowering the worksize. So without a solid background in high-speed computing architectures, the fastest way to find out is to try all the possible combinations.
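In that spirit, here is a small hypothetical sweep helper (not part of Phoenix; the option set, pool URL, and `candidate_args` function are placeholders for illustration) that prints one phoenix command line per WORKSIZE/vector combination to benchmark:

```python
# Hypothetical tuning helper: enumerate WORKSIZE / vector-width combos
# so each one can be benchmarked for a fixed interval. The option names
# (VECTORS, VECTORS4, BFI_INT, WORKSIZE) are the ones from this thread.
from itertools import product

def candidate_args(worksizes=(64, 128, 256), vectors=("", "VECTORS", "VECTORS4")):
    """Yield phatk option strings, one per worksize/vector combination."""
    for ws, vec in product(worksizes, vectors):
        opts = ["-k", "phatk", "BFI_INT", "WORKSIZE=%d" % ws]
        if vec:
            opts.append(vec)
        yield " ".join(opts)

for args in candidate_args():
    # Placeholder URL/device; run each for a minute and note the MH/s.
    print("phoenix.exe -u http://user:pass@pool:8334/ DEVICE=0 " + args)
```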
|
|
|
|
Syke
Legendary
Offline
Activity: 3878
Merit: 1193
|
|
May 24, 2011, 05:23:40 PM |
|
Any chance of getting a kernel optimized for the 6xxx series?
|
Buy & Hold
|
|
|
EPiSKiNG
Legendary
Offline
Activity: 800
Merit: 1001
|
|
May 25, 2011, 09:19:28 PM |
|
Any chance of getting a kernel optimized for the 6xxx series?
+1 !!
|
|
|
|
tiberiandusk
|
|
May 26, 2011, 04:39:25 AM |
|
My experience with my 5870 shows that worksize=128 works the best. With worksize=256 I show a slightly higher hashrate but overall submitted shares goes down a bit.
|
|
|
|
AngelusWebDesign
|
|
June 03, 2011, 05:07:55 PM |
|
Hashkill is faster for me on Linux 64-bit.
|
|
|
|
allinvain
Legendary
Offline
Activity: 3080
Merit: 1080
|
|
June 04, 2011, 09:00:46 AM |
|
Hashkill is faster for me on Linux 64-bit.
Hmm, wish they'd release a windblowz binary soon
|
|
|
|
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
|
|
June 04, 2011, 05:28:05 PM |
|
Waiting for windows version, so i too can get more hashes.
|
|
|
|
redcodenl
Newbie
Offline
Activity: 12
Merit: 0
|
|
June 07, 2011, 07:10:01 PM |
|
Any chance of getting a kernel optimized for the 6xxx series?
+1 as well! I'm now using phatk (with Phoenix) for my double 6870's, and it is working like a charm. But the thought that it might do better with an optimized kernel is killing me ;-) Are there indications that a better/optimized kernel for the 6xxx series can be created?
|
|
|
|
mbraun
Newbie
Offline
Activity: 2
Merit: 0
|
|
June 11, 2011, 02:52:33 PM |
|
HD5830 (Sapphire, stock volts) with SDK 2.4 VECTORS BFI_INT AGGRESSION=12 DEVICE=0 FASTLOOP=false WORKSIZE=256
1000/300: 298MH/s, 66°C 1000/300: 310MH/s, 66°C (phatk)
Thanks a lot man!
|
|
|
|
hugolp
Legendary
Offline
Activity: 1148
Merit: 1001
Radix-The Decentralized Finance Protocol
|
|
June 11, 2011, 03:13:45 PM |
|
My experience with my 5870 shows that worksize=128 works the best. With worksize=256 I show a slightly higher hashrate but overall submitted shares goes down a bit.
How is this possible?
|
|
|
|
hchc
|
|
June 11, 2011, 04:11:07 PM |
|
Hashkill is faster for me on Linux 64-bit.
Can you post some numbers? I'm contemplating switching from Windows to Linux just because of this and not sure if it's worthwhile. Currently getting 300 MH/s with a 5830 at 970/300.
|
|
|
|
|
|
|
|
mbraun
Newbie
Offline
Activity: 2
Merit: 0
|
|
June 11, 2011, 06:41:34 PM |
|
can you post some number? I'm contemplating switching from windows to linux just because of this and not sure if its worth while. Currentlly getting 300mh/s with 5830 at 970/300..
Those are already great numbers; I don't think they'll change much between Linux and Windows. I also do not believe mining gets faster just because the CPU can work on 64 bits in a single cycle - that advantage is CPU-side, not GPU related.
|
|
|
|
Hawkix
|
|
June 27, 2011, 09:08:35 PM |
|
Phateus, would you consider to replace the Ma() macro as suggested by bitless and re-run the ATI optimization to check if it can be further improved? Bitless saved 1 operation from each Ma() call. Maybe, with some re-ordering, this can be optimized further.
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
July 28, 2011, 03:53:33 AM |
|
Sorry I haven't really been on the forums much lately... wedding planning stuff. But... Any chance of getting a kernel optimized for the 6xxx series?
The kernel is optimized for the 5xxx series, the 66xx series, the 67xx series and the 68xx series, since they all use the same architecture. Only the 69xx cards use a different architecture, which is less efficient for mining (VLIW4 instead of VLIW5, for those who are interested). I have debated whether to rewrite the kernel for the 69xx series, but it would increase performance by at most ~1%. Phateus, would you consider to replace the Ma() macro as suggested by bitless and re-run the ATI optimization to check if it can be further improved? Bitless saved 1 operation from each Ma() call. Maybe, with some re-ordering, this can be optimized further.
In the current version, in addition to numerous very tiny optimizations, I have reordered the Ma() operands, which reduces the number of instructions on operations with at least one non-vector operand. #define Ma(z, x, y) amd_bytealign((y), (x | z), (z & x)) I think this is what you are talking about... Anywho... here is my new version, which is a very slight improvement over 1.0 (about 1% faster for me). One thing to note is that you MUST put in a valid WORKSIZE value when running version 1.1 due to one of the optimizations. https://sourceforge.net/projects/phatk/files/phatk-1.1.zip/download Post any questions or bugs you have, thanks -Phateus
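As a sanity check on that operand order: assuming the usual description of the BFI_INT patch, where amd_bytealign(s0, s1, s2) is replaced by a bit-field insert computing (s1 & s0) | (s2 & ~s0), the reordered Ma() still computes the SHA-256 majority function:

```python
# Check that Ma(z, x, y) = bfi(y, x | z, z & x) equals the SHA-256
# majority Maj(x, y, z), assuming bfi(s0, s1, s2) = (s1 & s0) | (s2 & ~s0),
# which is what the BFI_INT-patched amd_bytealign is said to compute.
import random

MASK = 0xFFFFFFFF

def maj(x, y, z):
    return (x & y) ^ (x & z) ^ (y & z)

def bfi(s0, s1, s2):
    return ((s1 & s0) | (s2 & ~s0)) & MASK

random.seed(42)
for _ in range(10000):
    x, y, z = (random.getrandbits(32) for _ in range(3))
    assert bfi(y, x | z, z & x) == maj(x, y, z)
print("identity holds for 10000 random inputs")
```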
|
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
July 28, 2011, 09:16:19 PM |
|
Ah.. there is a lot I've missed since I've been gone... I will combine my improvements and his to see if I can get it lower. Thanks for the info. -Phateus
|
|
|
|
pennytrader
|
|
July 29, 2011, 01:19:37 AM |
|
Great to see the continuous improvement
|
please donate to 1P3m2resGCP2o2sFX324DP1mfqHgGPA8BL
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
July 29, 2011, 05:55:59 AM |
|
Alright, check the first post; I uploaded a second version today with a few tweaks (the Ma() tweak and slight reordering of some operations). It should be faster than Diapolo's now. Also, anyone who wants to help with this or has any suggestions, PM me and I'll be more than happy to discuss when I get the chance. And... Diapolo (and anyone else who wants to help), if you read this... we should work together on trying to improve this. I think it is a good idea to keep separate code sources to increase the chances of finding optimizations, but if you have any questions about my code, let me know. -Phateus
|
|
|
|
pennytrader
|
|
July 29, 2011, 06:24:26 AM |
|
Kernel OpenCL error. Does this work with Phoenix 1.5?
|
please donate to 1P3m2resGCP2o2sFX324DP1mfqHgGPA8BL
|
|
|
krzynek1
Newbie
Offline
Activity: 41
Merit: 0
|
|
July 29, 2011, 07:00:44 AM |
|
not working with Phoenix r101
|
|
|
|
jedi95
|
|
July 29, 2011, 07:20:04 AM |
|
not working with Phoenix r101
Phoenix 1.5 includes the phatk kernel by default, unlike 1.4. Just use the included one. If you want more performance, phatk r112 from the Phoenix SVN is even faster.
|
Phoenix Miner developer Donations appreciated at: 1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
|
|
|
CanaryInTheMine
Donator
Legendary
Offline
Activity: 2352
Merit: 1060
between a rock and a block!
|
|
July 29, 2011, 07:22:21 AM |
|
not working with Phoenix r101
Phoenix 1.5 includes the phatk kernel by default, unlike 1.4. Just use the included one. If you want more performance, phatk r112 from the Phoenix SVN is even faster. where can I find r112?
|
|
|
|
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
|
|
July 29, 2011, 07:23:41 AM |
|
not working with Phoenix r101
Phoenix 1.5 includes the phatk kernel by default, unlike 1.4. Just use the included one. If you want more performance, phatk r112 from the Phoenix SVN is even faster. where can I find r112? +1, also how to know the revision number?
|
|
|
|
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
|
|
July 29, 2011, 07:45:08 AM |
|
I am getting STRANGE results. Using Diapolo's I got 434 & 427. Both are 5870 cards. 1st is MSI Lightning 5870 @ 957/319, 1175mV. 2nd is Sapphire HD 5870 @ 939/313, 1163mV.
From your phatk 2.0, I get 430 & 429. No change in any flags... 434 reduced to 430, but 427 increased to 429.
|
|
|
|
lagmo
Member
Offline
Activity: 67
Merit: 10
|
|
July 29, 2011, 08:23:51 AM |
|
I'm getting this error when i try to use your 2.0 kernel on Phoenix 1.5/Linuxcoin 2.0(Debian live) Works just fine on my Win7 x64 box though, so guessing it's specific to linuxcoins default complement of packages. Unhandled error in Deferred: Unhandled Error Traceback (most recent call last): File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 361, in callback self._startRunCallbacks(result) File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks self._runCallbacks() File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 542, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/opt/miners/phoenix/QueueReader.py", line 136, in preprocess d2 = defer.maybeDeferred(self.preprocessor, nr) --- <exception caught here> --- File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 133, in maybeDeferred result = f(*args, **kw) File "kernels/phatk/__init__.py", line 167, in <lambda> self.qr = QueueReader(self.core, lambda nr: self.preprocess(nr), File "kernels/phatk/__init__.py", line 361, in preprocess kd = KernelData(nr, self.core, self.VECTORS, self.AGGRESSION) File "kernels/phatk/__init__.py", line 46, in __init__ unpack('LLLL', nonceRange.unit.data[64:]), dtype=np.uint32) struct.error: unpack requires a string argument of length 32
|
|
|
|
iopq
|
|
July 29, 2011, 01:54:34 PM |
|
Running Windows 7, 64-bit, I'm getting [29/07/2011 06:50:32] FATAL kernel error: Failed to load OpenCL kernel! when I try the newest one. I tried with Phoenix 1.5 and the latest 112 revision, get the same error. I'm doing python phoenix.py -u http://iopq.me:***@mineco.in:3000/ -k phatk DEVICE=1 VECTORS BFI_INT AGGRESSION=7 WORKSIZE=128 Does it have something to do with worksize? Because when I supply an invalid worksize to phatk 1.0 it also gives the same error.
|
|
|
|
bcforum
|
|
July 29, 2011, 02:11:21 PM Last edit: July 29, 2011, 02:25:25 PM by bcforum |
|
Gives an error in Linux (Ubuntu 10.10 x64), Python 2.6.6, Twisted 10.1.0-2: [29/07/2011 08:10:17] Phoenix 1.50 starting... [29/07/2011 08:10:17] Connected to server [29/07/2011 08:10:17] Server gave new work; passing to WorkQueue [29/07/2011 08:10:17] New block (WorkQueue) [0 Khash/sec] [0 Accepted] [0 Rejected] [RPC] Unhandled error in Deferred: Traceback (most recent call last): File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 318, in callback self._startRunCallbacks(result) File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 424, in _startRunCallbacks self._runCallbacks() File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 441, in _runCallbacks self.result = callback(self.result, *args, **kw) File "/home/user/phoenix-1.50/QueueReader.py", line 136, in preprocess d2 = defer.maybeDeferred(self.preprocessor, nr) --- <exception caught here> --- File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 125, in maybeDeferred result = f(*args, **kw) File "kernels/phatk/__init__.py", line 167, in <lambda> self.qr = QueueReader(self.core, lambda nr: self.preprocess(nr), File "kernels/phatk/__init__.py", line 361, in preprocess kd = KernelData(nr, self.core, self.VECTORS, self.AGGRESSION) File "kernels/phatk/__init__.py", line 46, in __init__ unpack('LLLL', nonceRange.unit.data[64:]), dtype=np.uint32) struct.error: unpack requires a string argument of length 32 [29/07/2011 08:10:17] Server gave new work; passing to WorkQueue [0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]^C
I tried changing the 'LLLL' and 'LLLLLLLL' to 'IIII' (like in the old __init__.py), but that caused a new error further along.
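For anyone digging into this: a plausible explanation is that struct's native-mode 'L' is the platform's C long (8 bytes on 64-bit Linux, 4 on Windows), which would match the symptom of the same kernel working on Win7 x64 but dying here. A sketch of the size mismatch; the '<' prefix shown is one candidate fix, not something from the phatk source:

```python
# In native mode, 'L' is sized like the platform's C long, so 'LLLL'
# asks for 32 bytes on LP64 Linux while the work-unit slice is only 16.
# 'I' (or any format with a '<' byte-order prefix) is always 4 bytes.
import struct

print(struct.calcsize("IIII"))   # 16 on common platforms
print(struct.calcsize("<LLLL"))  # 16: '<' forces standard sizes
# On LP64 systems struct.calcsize("LLLL") is 32, which is why
# unpack('LLLL', data[64:]) complains it needs a 32-byte string.

data16 = b"\x01\x00\x00\x00" * 4
print(struct.unpack("<IIII", data16))
```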
|
If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
|
|
|
dikidera
|
|
July 29, 2011, 02:12:52 PM |
|
Yup the new kernel doesnt work.
|
|
|
|
iopq
|
|
July 29, 2011, 02:17:58 PM |
|
1.1 doesn't work either for me, same error
|
|
|
|
Diapolo
|
|
July 29, 2011, 03:31:10 PM |
|
Alright, check the first post, I uploaded a second version today with a few tweaks (The Ma() tweak and slight reordering of some operations). It should be faster than diapolo's now. Also, anyone who wants to help with this or has any suggestions, PM me and I'll be more than happy to discuss when I get the chance. And... Diapolo (and anyone else who wants to help), if you read this... We should work together on trying to improve this I think it is a good idea to keep separate code sources to increase the chances of finding optimizations, but if you have any questions about my code, let me know. -Phateus Currently looking at your code ... Dia
|
|
|
|
Mr.Prayer
Newbie
Offline
Activity: 8
Merit: 0
|
|
July 29, 2011, 04:08:16 PM |
|
Win7 x64, 5870, Catalyst 11.7, latest GUIMiner (Phoenix 1.5). After copying v2.0 files into "kernels\phatk" I get these messages in the console: 2011-07-29 11:02:49: Listener for "itzod2": [29/07/2011 11:02:49] [4.19 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)] 2011-07-29 11:02:50: Listener for "itzod2": [29/07/2011 11:02:50] Warning: work queue empty, miner is idle
No work is being done. Here's miner starting parameters: 2011-07-29 18:08:49: Running command: .\phoenix.exe -u http://****:****@lp1.itzod.ru:8344 PLATFORM=0 DEVICE=0 AGGRESSION=12 -k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256
Your v1.0 and Diapolo's 2011-07-17 kernel works fine.
|
|
|
|
Diapolo
|
|
July 29, 2011, 05:51:47 PM |
|
1st question: how is 0x2004000U in line 170 computed? Currently I don't get it. Dia
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
July 29, 2011, 06:12:38 PM |
|
Yup the new kernel doesnt work.
BAH!.. I'll look through it, tonight I am going to a Sublime with Rome and 311 concert... so this weekend. 1st question, how is 0x2004000U in line 170 computed? Currently I don't get it . Dia Basically, since only the last bit is different between the 2 nonces W3.x and W3.y, the first calculation done on those values is P2: P2(18) = rot(W[3],25)^rot(W[3],14)^((W[3])>>3U); So, basically, instead of flipping Bit 0 on W[3] and calculating both W[18].x and W[18].y, we can calculate W[18].x and W[18].y will be the same besides bits 25 and 14 being flipped P2(18).x = rot(W[3].x,25)^rot(W[3].x,14)^((W[3].x)>>3U); W[3].y = W[3].x ^ 1, therefore:
P2(18).y = P2(18).x ^ (rot(1,25)^rot(1,14)^((1)>>3U)); so, P2(18).y = P2(18).x ^ 0x2004000U;
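The shortcut works because P2 is linear over XOR (rotates and shifts both are), so flipping bit 0 of the input flips exactly P2(1) in the output. A few lines of Python confirming the constant; here rotl is assumed to be a 32-bit rotate-left, matching the kernel's rot():

```python
# Confirm P2(1) == 0x2004000 and the XOR-linearity used to derive
# P2(18).y from P2(18).x with a single XOR.
def rotl(x, n):
    """32-bit rotate-left."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def p2(w):
    return rotl(w, 25) ^ rotl(w, 14) ^ (w >> 3)

delta = p2(1)
print(hex(delta))  # 0x2004000

w3x = 0x89ABCDE0          # any even W[3].x; W[3].y = W[3].x ^ 1
assert p2(w3x ^ 1) == p2(w3x) ^ delta
```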
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
July 30, 2011, 04:38:25 AM |
|
Win7 x64, 5870, Catalyst 11.7, latest GUIMiner (Phoenix 1.5). After copying v2.0 files into "kernels\phatk" i get this messages in console: 2011-07-29 11:02:49: Listener for "itzod2": [29/07/2011 11:02:49] [4.19 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)] 2011-07-29 11:02:50: Listener for "itzod2": [29/07/2011 11:02:50] Warning: work queue empty, miner is idle
No work is being done. Here's miner starting parameters: 2011-07-29 18:08:49: Running command: .\phoenix.exe -u http://****:****@lp1.itzod.ru:8344 PLATFORM=0 DEVICE=0 AGGRESSION=12 -k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256
Your v1.0 and Diapolo's 2011-07-17 kernel work fine. This is exactly what happens to mine too: GUIMiner 7-01-2011 with Phoenix 1.5, using the newest kernel in this thread, Catalyst 11.7, Win7 x64. It just spams "Warning: work queue empty, miner is idle" in the console. I'm going to assume this kernel is either meant for an older miner, or it's just plain broken. I'll be watching this thread and Diapolo's for an update. A 12-ALU improvement is huge; I might be able to break 470 MHash on my 5870. edit: I thought you had to declare kernel arguments -after- the -k switch and its argument.
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
July 30, 2011, 06:44:30 AM |
|
Win7 x64, 5870, Catalyst 11.7, latest GUIMiner (Phoenix 1.5). After copying v2.0 files into "kernels\phatk" i get this messages in console: 2011-07-29 11:02:49: Listener for "itzod2": [29/07/2011 11:02:49] [4.19 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)] 2011-07-29 11:02:50: Listener for "itzod2": [29/07/2011 11:02:50] Warning: work queue empty, miner is idle
No work is being done. Here's miner starting parameters: 2011-07-29 18:08:49: Running command: .\phoenix.exe -u http://****:****@lp1.itzod.ru:8344 PLATFORM=0 DEVICE=0 AGGRESSION=12 -k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256
Your v1.0 and Diapolo's 2011-07-17 kernel works fine. This is exactly what happens to mine too. guiminer 7-01-2011 with phoenix 1.5, using newest kernel in this thread, catalyst 11.7, win7 x64. It just spams "Warning: work queue empty, miner is idle" in console. I'm going to assume this kernel is either meant for an older miner, or its just plain broken. I'll be looking at this thread and diapolo's for an update. 12alu improvement is huge , might be able to break 470 mhash on my 5870. edit: I thought you had to declare kernel arguments -after- the -k switch and argument after to declare what kernel to use. this is an old bug of phoenix that the author was not able to fix. Try search for idle bug. It is not the problem of this kernel. see http://forum.bitcoin.org/index.php?topic=19169.0
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
July 30, 2011, 07:40:15 AM |
|
What makes you so sure it's Phoenix and not the kernel? The 7-11 kernel and 7-17 kernel work perfectly. Swap in the 2.0 kernel, and it spams idle.
Btw I searched that thread you linked and didn't see any mention of idle bug. :/
edit: did some other searching, and apparently someone mentioned the idle bug was fixed in 1.50, but GUIMiner v2011-07-01 uses Phoenix 1.50 according to my console, so I don't know where to go from here. If someone can figure out the problem and give steps to solve the idle bug with GUIMiner v2011-07-01, Catalyst 11.7, OpenCL driver 2.5, Win7 x64, and this kernel, I'll donate 0.25 BTC to you. I would prefer to keep using Phoenix in GUIMiner.
|
|
|
|
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
|
|
July 30, 2011, 12:41:19 PM |
|
Seems it only gives problems to those using GUIMiner. I am using AOCLBF 1.75 and so far haven't gotten any error, except the strange MHash/s variation which I already posted in this thread some posts back. GUIMiner users, try AOCLBF to check whether it also gives you problems.
OS-Windows 7, 64 bit with AERO enabled, catalyst 11.8 beta.
|
|
|
|
Diapolo
|
|
July 30, 2011, 01:15:45 PM |
|
Yup the new kernel doesnt work.
BAH!.. I'll look through it, tonight I am going to a Sublime with Rome and 311 concert... so this weekend. 1st question, how is 0x2004000U in line 170 computed? Currently I don't get it . Dia Basically, since only the last bit is different between the 2 nonces W3.x and W3.y, the first calculation done on those values is P2: P2(18) = rot(W[3],25)^rot(W[3],14)^((W[3])>>3U); So, basically, instead of flipping Bit 0 on W[3] and calculating both W[18].x and W[18].y, we can calculate W[18].x and W[18].y will be the same besides bits 25 and 14 being flipped P2(18).x = rot(W[3].x,25)^rot(W[3].x,14)^((W[3].x)>>3U); W[3].y = W[3].x ^ 1, therefore:
P2(18).y = P2(18).x ^ (rot(1,25)^rot(1,14)^((1)>>3U)); so, P2(18).y = P2(18).x ^ 0x2004000U; This is the first change that I implemented into my kernel, but it seems that only 69XX cards do benefit from that change. Will investigate further ... Dia
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
July 30, 2011, 01:38:17 PM |
|
Seems it only gives problem to those using GUIminer. I am using AOCLBF 1.75 & i so far didn't get any error, except strange Mhash/s variation which i already posted in this thread some posts back. GUIminer users , try with AOCLBF to check that also gives you problem.
OS-Windows 7, 64 bit with AERO enabled, catalyst 11.8 beta.
I tried aoclbf about a week or 2 back. I didn't like how the on-screen display was ugly, or didn't even show, so I didn't know the status of my miners. :/
|
|
|
|
fpgaminer
|
|
July 30, 2011, 02:36:57 PM |
|
Good stuff Phateus! I'm getting an extra 2-3 MH/s with your newest kernel compared to Diapolo's last kernel. I merged the code into my fork of poclbm and it seems to be working fine there (with command line option --phatk2): https://github.com/progranism/poclbm The only bug I found was that the kernel wouldn't compile without BITALIGN. Not really important, since all my mining cards support BITALIGN. It complained about rotate being ambiguous. Keep up the good work!
|
|
|
|
techwtf
|
|
July 30, 2011, 02:45:57 PM Last edit: July 30, 2011, 03:06:48 PM by techwtf |
|
I'm also trying to port the kernel to poclbm.
RuntimeError: clBuildProgram failed: build program failure
Build on <pyopencl.Device 'Cypress' at 0x2d6d530>:
/tmp/OCLZO4wZQ.cl(184): error: bad argument type to opencl builtin function: expected type "uint2", actual type "int" sharoundC(4); ^ ...
/tmp/OCLZO4wZQ.cl(185): error: bad argument type to opencl builtin function: expected type "uint2", actual type "int" W[20] = P4C(20) + P1(20);
/tmp/OCLZO4wZQ.cl(186): error: bad argument type to opencl builtin function: expected type "uint2", actual type "int" sharoundC(5); ^
/tmp/OCLZO4wZQ.cl(187): error: bad argument type to opencl builtin function: expected type "uint2", actual type "int" W[21] = P1(21); ^
/tmp/OCLZO4wZQ.cl(189): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type) W[22] = P3C(22) + P1(22); ^
Update: Is SDK 2.4 required? I'm using 2.1. Without VECTORS, it works...
|
|
|
|
techwtf
|
|
July 30, 2011, 03:25:32 PM |
|
Finally Working now. 435.5 -> 437.2. Some other small issue still exists, but not kernel related.
|
|
|
|
Vince
Newbie
Offline
Activity: 38
Merit: 0
|
|
July 30, 2011, 03:34:00 PM |
|
I had some trouble getting this to work with SDK 2.4, but finally its running.
On my HD6950 I get ~343Mhash/s, with diapolo's 11-07-17 its slightly better, ~345Mhash/s.
I tested some combinations - but both run best with BFI_INT WORKSIZE=128 VECTORS AGGRESSION=11
|
|
|
|
MiningBuddy
|
|
July 30, 2011, 05:45:21 PM |
|
V2 works fine on my windows box with 11.6 drivers and sdk 2.4 (gives me around 3mhs extra) But it does not work on my linux boxes with 11.5 drivers and sdk 2.1 or 2.4, giving errors such as FATAL kernel error: Failed to load OpenCL kernel!
Dropped back to diapolo's version on my linux box.
Thanks!
|
|
|
|
iopq
|
|
July 30, 2011, 06:10:13 PM |
|
Good stuff Phateus I'm getting an extra 2-3MH/s with your newest kernel compared to Diapolo's last kernel. I merged the code into my fork of poclbm and it seems to be working fine there (with command line option --phatk2): https://github.com/progranism/poclbm The only bug I found was that the kernel wouldn't compile without BITALIGN. Not really important, since all my mining cards support BITALIGN. It complained about rotate being ambiguous. Keep up the good work! Doesn't work when vectors are turned on. I am running SDK 2.1; it's either -v or --phatk2, it doesn't work with both.
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
July 30, 2011, 06:32:29 PM |
|
What makes you so sure its phoenix and not the kernel? 7-11 kernel and 7-17 kernel = work perfectly. Swap out to 2.0 kernel, spams idle.
Btw I searched that thread you linked and didn't see any mention of idle bug. :/
edit: did some other searching and apparently someone mentione didle bug was fixed in 1.50 but guiminer v2011-07-01 uses phoenix 1.50 according to my console, so I don't know where to go from here. If someone can figure out the problem and give steps to solve the idle bug with guiminer v2011-07-01, catalyst 11.7, opencl driver 2.5, win7 x64, and this kernel, i'll donate 0.25 btc to you. I would prefer to keep using phoenix in guiminer.
"[0 Khash/sec] blah blah" is referring to the idle problem. See http://forum.bitcoin.org/index.php?topic=6458.msg229912#msg229912That is with the original kernel that people are having the idling problem. And the author basically said he will not be able to fix that bug and he is leaving it to other developers. He claimed the idling bug might be fixed in 1.50, in fact it was not. The idling bug has something to do with the AGGRESSION option, try lowing it might fix the problem.
|
|
|
|
Clipse
|
|
July 30, 2011, 08:09:08 PM |
|
I'm having an issue with version 2.0 of phatk
On my Ubuntu machines running Phoenix 1.5 and the 3 updated files from phatk 2.0, I get some QueueReader error and it just hangs.
On my Windows machines I'm using phatk 2.0 with Phoenix 1.5 and it isn't giving me the same error; works great (an additional 9 MH/s gained over the previous Diapolo kernel changes).
|
...In the land of the stale, the man with one share is king... >> ClipseWe pay miners at 130% PPS | Signup here : Bonus PPS Pool (Please read OP to understand the current process)
|
|
|
Diapolo
|
|
July 31, 2011, 01:00:59 AM |
|
Phat, what is the effect of "LLLL" instead of "IIII" in the .py file? It seems to work even with IIII.
Thanks, Dia
|
|
|
|
1bitc0inplz
Member
Offline
Activity: 112
Merit: 10
|
|
July 31, 2011, 01:42:40 AM |
|
Hey, Thanks for the updated kernel. My 5830 went from 303 MH/s to 307 MH/s. However, this new kernel does not seem to work for my 5670, I went back to Diapolo's latest for that card. Using this kernel on my 5670 resulted in a strange "connecting" / "disconnected" loop where the card never actually did any hashing.... very odd. I don't know if this helps you any, but I'm on Windows 7 64-bit, Catalyst 11.7, Phoenix 1.5, and the configuration that I normally use for my 5670 is: phoenix.exe -k phatk VECTORS BFI_INT FASTLOOP=false AGGRESSION=15 WORKSIZE=128 -q 3 -u http://username:password@pool.bitp.it:8334 DEVICE=1
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
July 31, 2011, 02:56:00 AM Last edit: July 31, 2011, 03:24:13 AM by ssateneth |
|
What makes you so sure its phoenix and not the kernel? 7-11 kernel and 7-17 kernel = work perfectly. Swap out to 2.0 kernel, spams idle.
Btw I searched that thread you linked and didn't see any mention of idle bug. :/
edit: did some other searching and apparently someone mentione didle bug was fixed in 1.50 but guiminer v2011-07-01 uses phoenix 1.50 according to my console, so I don't know where to go from here. If someone can figure out the problem and give steps to solve the idle bug with guiminer v2011-07-01, catalyst 11.7, opencl driver 2.5, win7 x64, and this kernel, i'll donate 0.25 btc to you. I would prefer to keep using phoenix in guiminer.
"[0 Khash/sec] blah blah" is referring to the idle problem. See http://forum.bitcoin.org/index.php?topic=6458.msg229912#msg229912
That is with the original kernel that people are having the idling problem with. And the author basically said he will not be able to fix that bug and is leaving it to other developers. He claimed the idling bug might be fixed in 1.50; in fact it was not. The idling bug has something to do with the AGGRESSION option; try lowering it, that might fix the problem.
I tried all aggressions from 1 to 14 as well as changing FASTLOOP from false to true. Setting it to true would just spam idle. Setting it to false above AGGRESSION 6 would also spam idle. Setting it to 6 or lower would spam stuff like:
2011-07-30 21:50:53: Listener for "5830": [30/07/2011 21:50:53] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:50:54: Listener for "5830": [30/07/2011 21:50:54] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:50:55: Listener for "5830": [30/07/2011 21:50:55] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:50:56: Listener for "5830": [30/07/2011 21:50:56] [15.22 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:50:57: Listener for "5830": [30/07/2011 21:50:57] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:50:58: Listener for "5830": [30/07/2011 21:50:58] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:50:59: Listener for "5830": [30/07/2011 21:50:59] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:51:00: Listener for "5830": [30/07/2011 21:51:00] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:51:01: Listener for "5830": [30/07/2011 21:51:01] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
...which I know is absolutely wrong. Plus my GPU utilization is still 0% when it's spamming that.
The donation (or prize, if you want to look at it like that) of 0.25 BTC for a fix to this while using GUIMiner v2011-07-01 on Win7 x64 is still up for grabs.
edit: I also tried different WORKSIZE values, to no avail. Under regular Phoenix (no GUIMiner) the kernel seems to work, though the improvement isn't as big as I had hoped. Still looking for a fix for the GUIMiner-based Phoenix, unless the Phoenix bundled with GUIMiner is bad. I don't suppose anyone knows if subbing the original Phoenix 1.5 (the 6 MB one) in for the 22 KB one in GUIMiner will work?
edit2: subbing the 6800 KB phoenix.exe into the GUIMiner dir instead of the 22 KB one -seems- to work (my shares are going up), but GUIMiner doesn't know what Phoenix is doing; it only shows shares going up. It doesn't show the hash rate or any system messages in the console (LP new block, OpenCL errors, etc.).
|
|
|
|
fpgaminer
|
|
July 31, 2011, 05:10:05 AM |
|
Quote: "doesn't work when vectors are turned on. I am running SDK 2.1; it's either -v or --phatk2, doesn't work with both."
How odd. Do you recall what the error message was, if any?
|
|
|
|
Yannick
Member
Offline
Activity: 68
Merit: 10
|
|
July 31, 2011, 07:27:55 AM |
|
I have two 5870s in my machine with W7 Ultimate, Aero disabled. I'm using SDK 2.1 and ATI drivers 10.7. I tested all driver and SDK versions; this combination gave me the highest hash rate. But I'm not able to use the phatk kernel. I'm getting the following error:
FATAL kernel error: Failed to load OpenCL kernel!
How can I fix this? poclbm works fine; I'd like to try phatk too.
|
|
|
|
macboy80
Member
Offline
Activity: 102
Merit: 10
|
|
July 31, 2011, 09:23:08 AM |
|
Working well for me. Radeon 6950 @ 900/900:
7/11 = ~360 MH/s
2.0 = ~370 MH/s
Windows 7 x64, Aero, Phoenix 1.5, Catalyst 11.7, SDK 2.4
-k phatk DEVICE=0 VECTORS BFI_INT FASTLOOP WORKSIZE=128 AGGRESSION=8
Thanks for the hard work.
EDIT: Did not work in Phoenix 1.4; I had to upgrade to 1.5.
|
|
|
|
iopq
|
|
July 31, 2011, 10:20:05 AM |
|
Quote: "doesn't work when vectors are turned on. I am running SDK 2.1; it's either -v or --phatk2, doesn't work with both."
Quote: "How odd. Do you recall what the error message was, if any?"
C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(189): error: bad argument type to opencl builtin function: expected type "uint2", actual type "int"
    W[22] = P3C(22) + P1(22);
C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(189): error: mixed vector-scalar operation not allowed unless up-convertable(scalar-type=>vector-element-type)
    W[22] = P3C(22) + P1(22);
(the same "bad argument type" and "mixed vector-scalar operation" errors repeat for W[23] on line 190, sharoundC(7) on line 191, W[24] on line 192, sharoundC(8) on line 193, W[25] on line 194, sharoundC(9) on line 195, W[26] on line 196, and W[27] on line 197)
Error limit reached. 100 errors detected in the compilation of "C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl". Compilation terminated.
|
|
|
|
UniverseMan
Newbie
Offline
Activity: 26
Merit: 0
|
|
July 31, 2011, 11:18:42 PM |
|
I'm using Ubuntu 11.04, Catalyst 11.6, Phoenix 1.50. I unpacked the phatk version 2 files into my phoenix-1.50/kernels/phatk folder. When I ran Phoenix with the kernel options -k phatk DEVICE=0 BFI_INT VECTORS AGGRESSION=12 FASTLOOP=FALSE WORKSIZE=256 I got the following error:
user@computer:~$ sudo ./btcg0.sh
[31/07/2011 18:04:08] Phoenix 1.50 starting...
[31/07/2011 18:04:09] Connected to server
[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]
Unhandled error in Deferred:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 361, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/user/phoenix-1.50/QueueReader.py", line 136, in preprocess
    d2 = defer.maybeDeferred(self.preprocessor, nr)
  --- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 133, in maybeDeferred
    result = f(*args, **kw)
  File "kernels/phatk/__init__.py", line 167, in <lambda>
    self.qr = QueueReader(self.core, lambda nr: self.preprocess(nr),
  File "kernels/phatk/__init__.py", line 361, in preprocess
    kd = KernelData(nr, self.core, self.VECTORS, self.AGGRESSION)
  File "kernels/phatk/__init__.py", line 46, in __init__
    unpack('LLLL', nonceRange.unit.data[64:]), dtype=np.uint32)
struct.error: unpack requires a string argument of length 32
I then had to Ctrl+Z to kill the process. Not sure if this error is related to anything discussed before, but it's no big deal, as I've merely switched back to my previous kernel. Cheers!
|
|
|
|
bcforum
|
|
July 31, 2011, 11:23:32 PM |
|
Quote from UniverseMan: "I'm using Ubuntu 11.04, Catalyst 11.6, Phoenix 1.50. I unpacked the phatk version 2 files into my phoenix-1.50/kernels/phatk folder. [...] struct.error: unpack requires a string argument of length 32 [...] I've merely switched back to my previous kernel."
I get the same error with a similar setup.
|
If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
|
|
|
Tx2000
|
|
August 01, 2011, 01:34:16 AM |
|
"miner is idle" spam in the console multiple times a second, under Windows 7 x64, Catalyst 11.4, SDK 2.4, GUIMiner 2011-07-01. Essentially, it does not work.
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 01, 2011, 05:14:36 PM Last edit: August 01, 2011, 05:27:56 PM by Phateus |
|
Phat, what is the effect of "LLLL" instead of "IIII" in the .py file? It seems to work even with IIII.
Thanks, Dia
Nothing; I was trying to fix a bug with low WORKSIZE numbers which results in duplicate hashes (not sure if it is solved yet). The values are 32-bit either way, and Python seems to handle both format characters the same here.
As for all of the other issues, I think there is an incompatibility between SDK 2.1 and my kernel. I will try explicitly declaring the rotation constant as uint instead of int (that may fix the problem). If anyone with SDK 2.1 wants to help out, change

#define rot(x, y) amd_bitalign(x, x, (32-y))
#else
#define rot(x, y) rotate(x, y)
#endif

to

#define rot(x, y) amd_bitalign(x, x, (uint)(32-y))
#else
#define rot(x, y) rotate(x, (uint)(y))
#endif

and

#define rot2(x, y) rotate(x, y)

to

#define rot2(x, y) rotate(x, (uint)(y))

If anyone tries this out, let me know if it changes anything.
I've done a few things over the weekend (increased performance another ~0.5%) and cleaned up my code a lot, so I will release another version when I figure out what is causing some of the issues people are having...
Diapolo, I know you made some modifications to my kernel to make it compatible with 2.1; are they basically type-casting issues like the one above? If I can't figure it out, I may just make all of the constants uint. Also, one more thing: does rotate(x, y) compile to 1 instruction in SDK 2.1? Running 2.4, explicitly using amd_bitalign does not improve performance (it might be cleaner if I can just use rotate(x, y) regardless of whether BITALIGN is defined). I was also thinking of precompiling different versions of the kernel and shipping them; that way you'd be able to use the faster 2.4 kernel even if you use SDK 2.1. I'm not sure if this is possible, but I will look into it.
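A quick Python sketch of why "LLLL" and "IIII" can behave differently (this is generic struct-module behavior, not code from phatk itself): in native mode "I" and "L" are both 4 bytes on 32-bit systems and on Windows, but "L" grows to 8 bytes on 64-bit Linux, so unpack('LLLL', ...) there demands 32 bytes from a 16-byte buffer, which would match the "unpack requires a string argument of length 32" tracebacks reported in this thread. A "<" prefix pins both formats to 4 bytes per word:

```python
import struct

# Standard (platform-independent) sizes: both formats are four 4-byte words.
assert struct.calcsize('<LLLL') == 16
assert struct.calcsize('<IIII') == 16

# Native sizes: "L" is platform-dependent. 16 bytes total on Windows and
# 32-bit systems, but 32 bytes total on 64-bit Linux, where it would break
# an unpack('LLLL', ...) on a 16-byte work-unit slice.
assert struct.calcsize('LLLL') in (16, 32)
```

So on platforms where 'LLLL' misbehaves, '<LLLL' (or 'IIII') should unpack the same four 32-bit words.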
|
|
|
|
Diapolo
|
|
August 01, 2011, 07:28:40 PM |
|
I've done a few things over the weekend (increased performance another ~.5%) and cleaned up my code a lot, so I will release another version when I figure what is causing some of the issues that people are having...
Diapolo, I know you made some modifications to my kernel to make it compatible with 2.1, are they basically type casting issues like the one above? If I can't figure it out, I may just make all of the constants uint.
I really don't understand why the compiler needs so much help and why one has to use such ugly code to get the best performance... I hope AMD can optimize the compiler so that we can use clean and straightforward code. I tried to reorder the commands without changing the code itself and it saved 3 ALU OPs... for nothing. That sucks so bad! The SDK 2.1 compatibility was achieved via type-casts in front of hex values in the code; simply add (u) in front where you use such values. Dia
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 01, 2011, 08:16:15 PM |
|
I've done a few things over the weekend (increased performance another ~.5%) and cleaned up my code a lot, so I will release another version when I figure what is causing some of the issues that people are having...
Diapolo, I know you made some modifications to my kernel to make it compatible with 2.1, are they basically type casting issues like the one above? If I can't figure it out, I may just make all of the constants uint.
Quote: "I really don't understand why the compiler needs so much help and why one has to use such ugly code to get the best performance... Dia"
OMG yeah, I know... they really need to work on the compiler... I actually work at the US Patent Office, in instruction processing... VLIW is a fairly new area and there is a lot of new work coming out... so give it a couple years (sigh)... What you have to remember is that compiling VLIW code is extremely complicated (the kernel itself only uses 21 registers) and most of the instructions have to be based solely on the previous instruction. From Wikipedia [http://en.wikipedia.org/wiki/Very_long_instruction_word]:
"As a result, VLIW CPUs offer significant computational power with less hardware complexity (but greater compiler complexity) than is associated with most superscalar CPUs. As is the case with any novel architectural approach, the concept is only as useful as code generation makes it. That is, the fact that a number of special-purpose instructions are available to facilitate certain complicated operations... is useless if compilers are unable to spot relevant source code constructs and generate target code that duly utilizes the CPU's advanced offerings. Therefore, programmers must be able to express their algorithms in a manner that makes the compiler's task easier."
With all of that said, it would be amazing if you could just write:

Init1();
for (int n = 0; n != 64; n++) {
    SHARound();
}
Init2();
for (int n = 0; n != 64; n++) {
    SHARound();
}

and let the compiler sort it out...
|
|
|
|
iopq
|
|
August 01, 2011, 09:59:10 PM |
|
Quote: "change

#define rot(x, y) amd_bitalign(x, x, (32-y))
#else
#define rot(x, y) rotate(x, y)
#endif

to

#define rot(x, y) amd_bitalign(x, x, (uint)(32-y))
#else
#define rot(x, y) rotate(x, (uint)(y))
#endif

and

#define rot2(x, y) rotate(x, y)

to

#define rot2(x, y) rotate(x, (uint)(y))

If anyone tries this out, let me know if it changes anything."
This works on the 2.1 SDK.
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 01, 2011, 11:16:49 PM |
|
Quote: "change #define rot(x, y) [...] If anyone tries this out, let me know if it changes anything." / "This works on the 2.1 SDK."
Awesome, thanks. I'll implement the changes and release soon. On another note, I was just searching through AMD's downloads and KernelAnalyzer 1.9 came out today with "Support for AMD APP SDK 2.5"... I think someone said that SDK 2.5 is supposed to support BFI_INT natively, so maybe we can get some better performance with 2.5. *crosses fingers*
|
|
|
|
joulesbeef
Sr. Member
Offline
Activity: 476
Merit: 250
moOo
|
|
August 02, 2011, 01:21:35 AM |
|
Quote: "I think someone said that SDK 2.5 is supposed to support BFI_INT natively"
Sounds like it: "In SDK 2.5 we are expanding that, along with other optimizations, to generate BFI instructions."
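For anyone wondering what native BFI generation buys here: the SHA-256 "choose" function Ch(x, y, z) = (x & y) ^ (~x & z) is exactly a bitfield insert (take the bit from y where x is 1, from z where x is 0), so a compiler that emits BFI collapses the three-op expression into one instruction instead of needing the binary patch. A small Python illustration of the identity (standard SHA-256 math, not code from phatk):

```python
MASK32 = 0xFFFFFFFF

def ch(x, y, z):
    """SHA-256 choose function as usually written in mining kernels."""
    return ((x & y) ^ (~x & z)) & MASK32

def bfi(mask, a, b):
    """Bitfield insert: bits of a where mask is 1, bits of b elsewhere."""
    return ((mask & a) | (~mask & b)) & MASK32

# The two agree on every input, which is what the BFI_INT patch exploits:
# (x & y) and (~x & z) can never have the same bit set, so XOR equals OR.
assert all(ch(x, y, z) == bfi(x, y, z)
           for x in (0, MASK32, 0x0F0F0F0F, 0xDEADBEEF)
           for y in (MASK32, 0x12345678)
           for z in (0, 0x9ABCDEF0))
```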
|
mooo for rent
|
|
|
Diapolo
|
|
August 02, 2011, 05:10:57 AM |
|
Quote: "I think someone said that SDK 2.5 is supposed to support BFI_INT natively; sounds like it: 'In SDK 2.5 we are expanding that, along with other optimizations, to generate BFI instructions.'"
Seems you are wrong (at least for now): "The optimization has been disabled in the current SDK due to a bug in the implementation that didn't get fixed in time." By the way, is there any official download link for KernelAnalyzer 1.9? Dia
|
|
|
|
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
|
|
August 02, 2011, 05:13:43 AM |
|
|
|
|
|
Diapolo
|
|
August 02, 2011, 05:34:27 AM |
|
Thank you very much! But bad news: I checked phatk 2.0, my old kernel and my new kernel version, and all of them use fewer GPRs but 1-2 ALU OPs more... SDK 2.5 is a sucker until (again) some optimisations have been done. Phat, how do you order the commands to achieve the best performance? Are you using the ASM code from KernelAnalyzer, or is it trial and error? Dia
|
|
|
|
joulesbeef
Sr. Member
Offline
Activity: 476
Merit: 250
moOo
|
|
August 02, 2011, 05:36:16 AM |
|
Quote: "Seems you are wrong (at least for now)"
Read it again: he is asking if it is in 2.4. He says "I read here it will be in 2.5; isn't it already in the current one?" (meaning 2.4). They answer no, it was disabled in the current one (meaning 2.4), as it wasn't fixed in time. At least that is how I read it... note the dates of the posts; they have to be talking about 2.4.
|
mooo for rent
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 02, 2011, 05:41:36 AM |
|
Quote: "Thank you very much! But bad news: I checked phatk 2.0, my old and my new kernel version and all of them use fewer GPRs but 1-2 ALU OPs more... Phat, how do you order the commands to achieve best performance, are you using the ASM code from KernelAnalyzer or is it trial and error? Dia"
edit: BTW, I always thought your numbers were a couple lower than mine because you defined OUTPUT_MASK as something like "0x10"... doing that makes all my numbers match the ones on your thread.
lol... mostly trial and error. Initially, for version 1.1, I looked at filling the gaps in the VLIW assembly (seeing which VLIW5 bundles only had 4 instructions, using barrier(0) instructions to see where in the assembly the OpenCL code is), but that took a LONG time and I think I am done with that... (it turned out it only gave me like 3 ALU OPs anyway).
Quote: "read it again.. he is asking if it is in 2.4 [...] they have to be talking about 2.4"
Yeah, I said that KernelAnalyzer 1.9 was out today saying that it supports 2.5, but 2.5 isn't out yet... probably tomorrow. And I just posted another kernel... this one is much better to look at than 2.0... I got rid of all but 3 of the SHARound #defines... check the first page for the link.
|
|
|
|
Diapolo
|
|
August 02, 2011, 05:45:26 AM |
|
Quote from Phateus: "Yeah, I said that KernelAnalyzer 1.9 was out today saying that it supports 2.5, but 2.5 isn't out yet... probably tomorrow."
Cat 11.8 preview and Cat 11.7 have the SDK 2.5 runtime, so my tests are real :-/. Dia
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 02, 2011, 05:46:45 AM |
|
oooh, I will have to try that out... boo for AMD
|
|
|
|
joulesbeef
Sr. Member
Offline
Activity: 476
Merit: 250
moOo
|
|
August 02, 2011, 05:55:32 AM |
|
Quote: "And I just posted another kernel... Check the first page for the link"
I'm still getting "miner idle" errors in GUIMiner with: VECTORS BFI_INT -k phatk FASTLOOP=false WORKSIZE=256 AGGRESSION=12 -q2. Is it just GUIMiner?
edit: works fine with aoclbf 1.75... I wonder why GUIMiner has such trouble. Speed: 318 vs 315 with diablo 7-17.
|
mooo for rent
|
|
|
Diapolo
|
|
August 02, 2011, 05:56:07 AM |
|
You are using the OpenCL rotate() instead of amd_bitalign(), what's the benefit here (is it the same under the hood)?
Dia
|
|
|
|
pennytrader
|
|
August 02, 2011, 05:57:59 AM |
|
With Catalyst 11.6 + SDK 2.1 at a 975/300 setting, I'm only getting 176 MH/s with phatk 2.1.
With Diapolo's kernel, I was able to get 314 MH/s.
|
please donate to 1P3m2resGCP2o2sFX324DP1mfqHgGPA8BL
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 02, 2011, 06:08:02 AM |
|
With catalyst 11.6 + SDK 2.1, 975/300 setting, I'm only getting 176 mhs with phatk 2.1
With Diapolo's kernel, I was able to get 314 mhs
AAAH! I think OpenCL is going to make my head explode... lol
Quote: "You are using the OpenCL rotate() instead of amd_bitalign(), what's the benefit here (is it the same under the hood)? Dia"
No, just cleaner, since it is the same code (well, for SDK 2.4 at least)... it looks like 2.1 does not realize they are the same, and I will have to change it back...
Quote: "I'm still getting miner idle errors in guiminer [...] works fine with aoclbf 1.75.. i wonder why guiminer has such trouble"
No clue; I have not used or downloaded GUIMiner, I use aoclbf. I might be able to take a look at it after figuring out how to make it work for SDK 2.1.
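On the "is it the same under the hood" question: mathematically the two built-ins compute the same thing for 32-bit values, since amd_bitalign(x, x, 32 - y) is a left rotate by y; whether a given SDK emits the same machine instruction for both is a separate, SDK-dependent matter, as noted above. A small Python model of the two built-ins (my own sketch for checking the identity, not kernel code):

```python
MASK32 = 0xFFFFFFFF

def cl_rotate(x, y):
    """Model of OpenCL rotate(): left-rotate a 32-bit value by y bits."""
    return ((x << y) | (x >> (32 - y))) & MASK32

def amd_bitalign(hi, lo, shift):
    """Model of amd_bitalign(): low 32 bits of the 64-bit value hi:lo >> shift."""
    return (((hi << 32) | lo) >> shift) & MASK32

# bitalign of x with itself by (32 - y) is exactly a left rotate by y.
for x in (0xDEADBEEF, 0x01234567, 0x80000000):
    for y in range(1, 32):
        assert cl_rotate(x, y) == amd_bitalign(x, x, 32 - y)
```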
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 02, 2011, 06:15:52 AM |
|
With catalyst 11.6 + SDK 2.1, 975/300 setting, I'm only getting 176 mhs with phatk 2.1
With Diapolo's kernel, I was able to get 314 mhs
I changed it again, if it still doesn't work at all, can you give me some details on the settings you are using?
|
|
|
|
pennytrader
|
|
August 02, 2011, 06:21:31 AM |
|
With catalyst 11.6 + SDK 2.1, 975/300 setting, I'm only getting 176 mhs with phatk 2.1
With Diapolo's kernel, I was able to get 314 mhs
Quote: "I changed it again; if it still doesn't work at all, can you give me some details on the settings you are using?"
It works now! 316 MHash/sec with -k phatk DEVICE=1 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256, and it uses 0% CPU as usual. Excellent work!
|
please donate to 1P3m2resGCP2o2sFX324DP1mfqHgGPA8BL
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 02, 2011, 06:30:23 AM Last edit: August 02, 2011, 06:45:53 AM by Phateus |
|
And, I just posted another kernel... this one is much better to look at than 2.0... I got rid of all but 3 of the SHARound #defines... Check the first page for the link
Still does not work for me. Here's the error message:

/usr/local/lib/python2.6/dist-packages/pyopencl-2011.1-py2.6-linux-x86_64.egg/pyopencl/__init__.py:163: UserWarning: Build succeeded, but resulted in non-empty logs:
Build on <pyopencl.Device 'Cypress' at 0x2cd7590> succeeded, but said:
/tmp/OCLWaeOzJ.cl(152): warning: variable "t1" was set but never used
    u t1;
      ^
  warn("Build succeeded, but resulted in non-empty logs:\n"+message)
[02/08/2011 06:15:45] Finding inner ELF...
[02/08/2011 06:15:45] Patching inner ELF...
[02/08/2011 06:15:45] Patching instructions...
[02/08/2011 06:15:45] BFI-patched 472 instructions...
[02/08/2011 06:15:45] Patch complete, returning to kernel...
[02/08/2011 06:15:45] Applied BFI_INT patch
[02/08/2011 06:15:46] Phoenix r100 starting...
[02/08/2011 06:15:46] Connected to server
[02/08/2011 06:15:46] Server gave new work; passing to WorkQueue
[02/08/2011 06:15:46] New block (WorkQueue) [0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]
Unhandled error in Deferred: Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 361, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/media/persistent/phoenix/QueueReader.py", line 136, in preprocess
    d2 = defer.maybeDeferred(self.preprocessor, nr)
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 133, in maybeDeferred
    result = f(*args, **kw)
  File "kernels/phatk/__init__.py", line 179, in <lambda>
    self.qr = QueueReader(self.core, lambda nr: self.preprocess(nr),
  File "kernels/phatk/__init__.py", line 379, in preprocess
    kd = KernelData(nr, self.core, self.rateDivisor, self.AGGRESSION)
  File "kernels/phatk/__init__.py", line 46, in __init__
    unpack('LLLL', nonceRange.unit.data[64:]), dtype=np.uint32)
struct.error: unpack requires a str
[02/08/2011 06:15:46] Server gave new work; passing to WorkQueue [0 Khash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
Hmmm... can you try replacing the 'LLLL' with 'IIII' (line 46 of __init__.py), I think the windows version uses python 2.7 which may handle that differently. Edit: I've made the changes already and posted as 2.1 again (hopefully this fixes it)
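The size mismatch behind this error can be checked directly; a minimal sketch in plain Python (not phatk's code) showing why 'L' behaves differently across platforms:

```python
import struct

# Native ('LLLL') sizes follow the platform's C ABI: 'L' is 8 bytes on
# 64-bit Linux but 4 bytes on Windows, so the same format string reads a
# different number of bytes on different machines.  'I' (or an '='
# prefix, see later in the thread) forces the standard 4-byte size.
native = struct.calcsize("LLLL")     # platform-dependent: 16 or 32
standard = struct.calcsize("=LLLL")  # always 16
print(native, standard)
```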
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
August 02, 2011, 06:52:50 AM |
|
Thanks for the update. Apparently guiminer needs to be updated for this kernel to work though (outdated phoenix?..) It just spams idle on the console. I really need to use guiminer. This is so frustrating
|
|
|
|
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
|
|
August 02, 2011, 07:13:21 AM |
|
2.1 version: with VECTORS4 & worksize 128 or 64, I only get 365 instead of 441. I underclock memory. But with just VECTORS I get 448 & 432. Using the 2.0 version I got 441 & 427
cards MSI Lightning 5870 & Sapphire HD 5870 MSI 448 Mhash/s - 975/325, 1175mV - aggression 13 Sapphire 432 Mhash/s - 939/313, 1163mV - aggression 12
Windows 7, 64 bit, AERO enabled, AOCLBF 1.75
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
August 02, 2011, 07:31:26 AM |
|
2.1 version: with VECTORS4 & worksize 128 or 64, I only get 365 instead of 441. I underclock memory. But with just VECTORS I get 448 & 432. Using the 2.0 version I got 441 & 427
cards MSI Lightning 5870 & Sapphire HD 5870 MSI 448 Mhash/s - 975/325, 1175mV - aggression 13 Sapphire 432 Mhash/s - 939/313, 1163mV - aggression 12
Windows 7, 64 bit, AERO enabled, AOCLBF 1.75
VECTORS4 is only if you DON'T underclock memory i.e. stock memory clocks or the glitch where you can only underclock memory 100mhz lower than core speeds. Low memory speed (<400MHz) = VECTORS and WORKSIZE=256 High memory speed (>900MHz) = VECTORS4 and WORKSIZE=64 or WORKSIZE=128
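For reference, the two configurations described above would look roughly like this on the Phoenix command line (pool URL, credentials, and DEVICE number are placeholders; the flags are the ones used elsewhere in this thread):

```shell
# Low memory clock (<400 MHz): standard 2-wide vectors, large worksize
phoenix.exe -u http://user:pass@pool.example.com:8332/ -k phatk \
    DEVICE=0 BFI_INT VECTORS WORKSIZE=256 AGGRESSION=12

# Stock memory clock (>900 MHz): 4-wide vectors, smaller worksize
phoenix.exe -u http://user:pass@pool.example.com:8332/ -k phatk \
    DEVICE=0 BFI_INT VECTORS4 WORKSIZE=64 AGGRESSION=12
```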
|
|
|
|
joulesbeef
Sr. Member
Offline
Activity: 476
Merit: 250
moOo
|
|
August 02, 2011, 07:49:03 AM |
|
Thanks for the update. Apparently guiminer needs to be updated for this kernel to work though (outdated phoenix?..) It just spams idle on the console. I really need to use guiminer. This is so frustrating nah phoenix seems up to date I'm guessing it is due to it using python 2.6 instead of 2.7
|
mooo for rent
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 02, 2011, 08:19:39 AM |
|
Thanks for the update. Apparently guiminer needs to be updated for this kernel to work though (outdated phoenix?..) It just spams idle on the console. I really need to use guiminer. This is so frustrating

nah phoenix seems up to date, I'm guessing it is due to it using python 2.6 instead of 2.7

Woooo!, found the bug... it is in my kernel... replace #self.commandQueue.finish() with self.commandQueue.finish() (i.e. uncomment it) near the end of __init__.py *sigh*... Uploaded the file yet again...
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
August 02, 2011, 08:47:56 AM Last edit: August 02, 2011, 09:00:28 AM by ssateneth |
|
Thanks for the update. Apparently guiminer needs to be updated for this kernel to work though (outdated phoenix?..) It just spams idle on the console. I really need to use guiminer. This is so frustrating nah phoenix seems up to date I'm guessing it is due to it using python 2.6 instead of 2.7 Woooo!, found the bug... it is in my kernel... replace #self.commandQueue.finish() with self.commandQueue.finish() near the end of __init__.py *sigh*... Uploaded the file yet again... THIS FIXED IT! THANK YOU!!!!! Donation coming your way. EXCELLENT improvement. Gained 4 mhash on my 5830 and 5.4 on my 5870. Amazing! Also, my 3 x 5830 rig went from 966.1 to 977.3, an increase of 11.2 mhash or 1.159%
|
|
|
|
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
|
|
August 02, 2011, 09:11:32 AM |
|
With NEW 2.1, same results only. 448 & 432. I am using 11.8 beta. AMD APP 2.5.709.2 AMD Display Driver 8.880.3.0000
|
|
|
|
John (John K.)
Global Troll-buster and
Legendary
Offline
Activity: 1288
Merit: 1225
Away on an extended break
|
|
August 02, 2011, 09:36:04 AM |
|
With new 2.1, hashrates have improved by 4 MH/s, from 410 to 414, for each of my 5850's! Thank you!
|
|
|
|
Clipse
|
|
August 02, 2011, 10:51:29 AM |
|
Good stuff Phateus. I'm getting an extra 2-3MH/s with your newest kernel compared to Diapolo's last kernel. I merged the code into my fork of poclbm and it seems to be working fine there (with command line option --phatk2): https://github.com/progranism/poclbm
The only bug I found was that the kernel wouldn't compile without BITALIGN. Not really important, since all my mining cards support BITALIGN. It complained about rotate being ambiguous. Keep up the good work!

Hey fpgaminer, I really like this poclbm version of phatk2, but could you update the same version with a --phatk2_1 switch or something so we could test-drive both versions with ease?
|
...In the land of the stale, the man with one share is king... >> ClipseWe pay miners at 130% PPS | Signup here : Bonus PPS Pool (Please read OP to understand the current process)
|
|
|
iopq
|
|
August 02, 2011, 12:04:17 PM |
|
I'm getting a warning:
D:\sw\python27\lib\site-packages\pyopencl\__init__.py:173: UserWarning: Build su cceeded, but resulted in non-empty logs: Build on <pyopencl.Device 'Juniper ' at 0x414d7a0> succeeded, but said:
C:\Users\Igor\AppData\Local\Temp\OCL6496.tmp.cl(155): warning: variable "t1" was set but never used u t1; ^
NT -D warn("Build succeeded, but resulted in non-empty logs:\n"+message)
|
|
|
|
UniverseMan
Newbie
Offline
Activity: 26
Merit: 0
|
|
August 02, 2011, 01:43:38 PM Last edit: August 02, 2011, 02:02:37 PM by UniverseMan |
|
Using 2.1, I'm still getting the same error as before. I'm using Ubuntu 11.04, Catalyst 11.6, Phoenix 1.50. I unpacked the phatk version 2 files into my phoenix-1.50/kernels/phatk folder. When I ran my phoenix with kernel options -k phatk DEVICE=0 BFI_INT VECTORS AGGRESSION=12 FASTLOOP=FALSE WORKSIZE=256 I got the following error:

user@computer:~$ sudo ./btcg0.sh
[31/07/2011 18:04:08] Phoenix 1.50 starting...
[31/07/2011 18:04:09] Connected to server
[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]
Unhandled error in Deferred:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 361, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/user/phoenix-1.50/QueueReader.py", line 136, in preprocess
    d2 = defer.maybeDeferred(self.preprocessor, nr)
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 133, in maybeDeferred
    result = f(*args, **kw)
  File "kernels/phatk/__init__.py", line 167, in <lambda>
    self.qr = QueueReader(self.core, lambda nr: self.preprocess(nr),
  File "kernels/phatk/__init__.py", line 361, in preprocess
    kd = KernelData(nr, self.core, self.VECTORS, self.AGGRESSION)
  File "kernels/phatk/__init__.py", line 46, in __init__
    unpack('LLLL', nonceRange.unit.data[64:]), dtype=np.uint32)
struct.error: unpack requires a string argument of length 32
I then had to CTRL+Z to kill the process. I'm still getting the unpack 'LLLL' error. (Note: I tried to pipe the output of phoenix through tee to make a log, but tee gave a completely garbled log file and I didn't notice it until after I reverted my kernel files. This is obviously not your problem; I just want you to know why I don't have any new error messages to show.) This is the same error as znort (except I'm on python 2.7 and he's on 2.6). You suggested a fix... Hmmm... can you try replacing the 'LLLL' with 'IIII' (line 46 of __init__.py), I think the windows version uses python 2.7 which may handle that differently.
...which I tried (even though you say it's a windows problem and I'm not on windows). It gave another error at a later 'unpack' call being passed 'LLLLLLLL', and the error said 'unpack requires a string argument of length 64'. I tried changing that one to 'IIIIIIII', but it gave another error down the line that said something like 'incorrect arguments passed to kernel'. (Again, apologies for not having an error log.) EDIT: I checked something else, and now I'm more confused than ever. I loaded up the python interpreter on my machine so I can check how it sees the 'LLLL' and 'IIII' strings.

>>> import struct
>>> struct.calcsize('LLLL')
32
>>> struct.calcsize('IIII')
16
>>> struct.calcsize('LLLLLLLL')
64
>>> struct.calcsize('IIIIIIII')
32
Does this mean it's the nonceRange data and not the 'LLLL' that's the wrong size? How could that be? Does that mean there's some error wherever that nonceRange got packed in the first place? Like I said, I'm
|
|
|
|
UniverseMan
Newbie
Offline
Activity: 26
Merit: 0
|
|
August 02, 2011, 02:23:03 PM Last edit: August 02, 2011, 02:55:38 PM by UniverseMan |
|
All that stuff I just said.
HA! Got it fixed. Anyone who's having the error I just had, go through __init__.py, and every time there's an 'unpack' or a 'pack' statement that gets passed some number of 'L's (they will be 2, 4, or 8 'L's long), just add an '=' to the beginning. So 'LLLL' becomes '=LLLL'. If you look up the documentation on how struct (which is where pack and unpack come from) parses its arguments, found here, it's system dependent by default. But if you add the '=', it forces the size characters (like the 'L's and 'I's and such) to be standard size. (Also, I had to uncomment the self.commandQueue.finish() statement, as per this post. I thought that was fixed, but it was still broken when I downloaded this morning.)

Kernel - 6870 945/1050 - 5830 1030/330
Diapolo 7-17 - 293 MH/s - 325 MH/s
phatk 2.1 - 299 MH/s - 328 MH/s

Thanks for the work, phateus.
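The '=' fix can be sanity-checked in isolation; a small sketch with stand-in data (not the real work-unit bytes):

```python
import struct

# 32 bytes of midstate-style data, a stand-in for nonceRange.unit.data[64:].
data = bytes(range(32))

# Native 'LLLLLLLL' wants 64 bytes on 64-bit Linux but only 32 on Windows;
# '=LLLLLLLL' always wants 32, so the slice unpacks as eight 32-bit words
# on every platform, and round-trips losslessly.
words = struct.unpack("=LLLLLLLL", data)
repacked = struct.pack("=LLLLLLLL", *words)
print(len(words), repacked == data)
```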
|
|
|
|
iopq
|
|
August 02, 2011, 02:31:02 PM |
|
using poclbm fork with phatk2.1 and it's the fastest kernel so far I tried with 2.4 opencl and it was slower, so I went back to 2.1 which is the fastest on my card (hd 5750)
|
|
|
|
Clipse
|
|
August 02, 2011, 02:41:19 PM |
|
using poclbm fork with phatk2.1 and it's the fastest kernel so far I tried with 2.4 opencl and it was slower, so I went back to 2.1 which is the fastest on my card (hd 5750)
Yeh I must say, the poclbm fork with phatk2 outperformed phatk2 on phoenix 1.5 (and up till now all phatk mods performed better on phoenix 1.5 for me), quite interesting. ps. iopq, can you post the changes made to run phatk2.1 on the poclbm mod by fpgaminer? I assume you are using that? Also, what arg is added to use vectors4? I've replaced phatk2.cl with the phatk2.1 .cl but I get ~11mh less with phatk2.1, so I am wondering if there are other changes required. I am using sdk 2.4
|
...In the land of the stale, the man with one share is king... >> ClipseWe pay miners at 130% PPS | Signup here : Bonus PPS Pool (Please read OP to understand the current process)
|
|
|
UniverseMan
Newbie
Offline
Activity: 26
Merit: 0
|
|
August 02, 2011, 02:43:35 PM |
|
I tried VECTORS4 on my 6870, since I can't underclock the memory (it's at 1050).
Results: VECTORS WS 64: 295 MH/s WS 128: 299 MH/s WS 256: 292 MH/s
VECTORS4 WS 64: 278 MH/s WS 128: 258 MH/s WS 256: 230 MH/s
So VECTORS4 doesn't give me any boost. But thanks for putting it in. New functionality is always a plus.
|
|
|
|
Diapolo
|
|
August 02, 2011, 03:10:38 PM |
|
@Phat:
I don't understand how you make base always a uint as a kernel parameter, now that base has (uint2)(0, 1) or (uint4)(0, 1, 2, 3) added into it via the init file. If I try to do this with my mod it just crashes Phoenix; if I use const u base instead of const uint base, it seems to work (because u reflects the correct variable type uint, uint2 or uint4). Have you got an idea for this?
Thanks, Dia
|
|
|
|
iopq
|
|
August 02, 2011, 03:14:14 PM |
|
using poclbm fork with phatk2.1 and it's the fastest kernel so far I tried with 2.4 opencl and it was slower, so I went back to 2.1 which is the fastest on my card (hd 5750)
Yeh I must say, poclbm fork with phatk2 outperformed phatk2 on phoenix 1.5 (and up till now all phatk mods performed between on phoenix 1.5 for me) quite interesting. ps. iopq, can you post the changes made to run phatk2.1 on poclbm mod by fpgaminer, I assume you are using that? Also what arg is added to use vectors4. Ive replaced phatk2.cl with phatk2.1 cl but I get ~11mh less with phatk2.1 so I am wondering if there is other changes required. I am using sdk 2.4 I'm using that, just replaced the phatk2 kernel with phatk2.1 and that's it vectors4 should be slower, why would you want to use it? I use -v only
|
|
|
|
Clipse
|
|
August 02, 2011, 03:21:02 PM |
|
using poclbm fork with phatk2.1 and it's the fastest kernel so far I tried with 2.4 opencl and it was slower, so I went back to 2.1 which is the fastest on my card (hd 5750)
Yeh I must say, poclbm fork with phatk2 outperformed phatk2 on phoenix 1.5 (and up till now all phatk mods performed between on phoenix 1.5 for me) quite interesting. ps. iopq, can you post the changes made to run phatk2.1 on poclbm mod by fpgaminer, I assume you are using that? Also what arg is added to use vectors4. Ive replaced phatk2.cl with phatk2.1 cl but I get ~11mh less with phatk2.1 so I am wondering if there is other changes required. I am using sdk 2.4 I'm using that, just replaced the phatk2 kernel with phatk2.1 and that's it vectors4 should be slower, why would you want to use it? I use -v only Just wanted to test vectors4 with default memory, not high priority. Still phatk2.1 is much slower than phatk2 for me as I said ~11mh per card, ati hd5850 , I wonder why o_0
|
...In the land of the stale, the man with one share is king... >> ClipseWe pay miners at 130% PPS | Signup here : Bonus PPS Pool (Please read OP to understand the current process)
|
|
|
Tx2000
|
|
August 02, 2011, 03:52:39 PM |
|
I have a 3Mhash avg improvement over Diapolo last kernel update (393 -> 396)
Setup is as follows:
Reference 5850, 1.100v 920 core / 350 mem. 11.4 preview / SDK 2.4. Latest GUIMiner / phoenix 1.50
Going to run it for a day to see its stability and report back if anything arises.
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 02, 2011, 05:34:57 PM |
|
@Phat:
I don't understand how you achieve, that base is always an uint as kernel parameter now that base has (uint2)(0, 1) or (uint4)(0, 1, 2, 3) added into it via the init-file. If I try to do this with my mod it just crashes Phoenix, now if I use const u base, instead of const uint base, it seems to work (because u reflects the correct variable type uint, uint2 or uint4). Have you got an idea for this?
Thanks, Dia
I'm not sure I understand you... Depending on whether the number of nonces per thread (VECTORS) is 1, 2, or 4, the kernel compiles with base being either uint, uint2 or uint4. The init file packs either 1, 2 or 4 uints into each base entry and therefore the init file always produces the same size variable as the kernel needs. So, in short, both the base[i] variable being passed to the kernel and the "u base" value in the kernel can be either 1, 2 or 4 uints. Does that answer your question?
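As a rough illustration of the packing described here (the function name and layout are assumptions for illustration, not phatk's actual __init__.py code):

```python
import struct

def pack_base(nonce_base, vec_width):
    """Pack vec_width consecutive nonce offsets as 32-bit words, so the
    host-side buffer entry matches whatever the kernel's 'u' typedef is:
    4 bytes for uint, 8 for uint2 (VECTORS), 16 for uint4 (VECTORS4).
    Hypothetical helper, not taken from phatk."""
    offsets = [(nonce_base + i) & 0xFFFFFFFF for i in range(vec_width)]
    return struct.pack("=%dL" % vec_width, *offsets)

# One entry per work item; the entry's byte size tracks the vector width.
print(len(pack_base(0, 1)), len(pack_base(0, 2)), len(pack_base(0, 4)))
```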
|
|
|
|
joulesbeef
Sr. Member
Offline
Activity: 476
Merit: 250
moOo
|
|
August 02, 2011, 05:42:57 PM |
|
Woooo!, found the bug... it is in my kernel... you rock sir phateus... all is working here and nice speed up.. especially over the stock phatk 1.0 but yeha faster than diablo 7-17 for me.. on a 5830 sdk 2.4 11.6 cat guiminer..
|
mooo for rent
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
August 02, 2011, 07:15:09 PM |
|
in case some feedback was wanted for VECTORS4: I got about a 20 mhash improvement on my 5870 when I have it set to stock speeds (850/1200) when using the computer normally (360 -> 380 mhash). I will continue to use VECTORS when I am AFK (1015/355) for 470.1 mhash.
|
|
|
|
Diapolo
|
|
August 02, 2011, 07:20:40 PM |
|
@Phat:
I don't understand how you achieve, that base is always an uint as kernel parameter now that base has (uint2)(0, 1) or (uint4)(0, 1, 2, 3) added into it via the init-file. If I try to do this with my mod it just crashes Phoenix, now if I use const u base, instead of const uint base, it seems to work (because u reflects the correct variable type uint, uint2 or uint4). Have you got an idea for this?
Thanks, Dia
I'm not sure I understand you... Depending on whether the number of nonces per thread (VECTORS) is 1, 2, or 4, the kernel compiles with base being either uint, uint2 or uint4. The init file packs either 1, 2 or 4 uints into each base entry and therefore the init file always produces the same size variable as the kernel needs. So, in short, both the base[i] variable being passed to the kernel and the "u base" value in the kernel can be either 1, 2 or 4 uints. Does that answer your question?
I understand what you say and it makes sense, but not what I see now ... the variable base in your code _IS_ declared as u and not uint2. Did I look at the old 2.0 version!? Dia
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 02, 2011, 09:26:38 PM |
|
@Phat:
I don't understand how you achieve, that base is always an uint as kernel parameter now that base has (uint2)(0, 1) or (uint4)(0, 1, 2, 3) added into it via the init-file. If I try to do this with my mod it just crashes Phoenix, now if I use const u base, instead of const uint base, it seems to work (because u reflects the correct variable type uint, uint2 or uint4). Have you got an idea for this?
Thanks, Dia
I'm not sure I understand you... Depending on whether the number of nonces per thread (VECTORS) is 1, 2, or 4, the kernel compiles with base being either uint, uint2 or uint4. The init file packs either 1, 2 or 4 uints into each base entry and therefore the init file always produces the same size variable as the kernel needs. So, in short, both the base[i] variable being passed to the kernel and the "u base" value in the kernel can be either 1, 2 or 4 uints. Does that answer your question?
I understand what you say and it makes sense, but not what I see now ... the variable base in your code _IS_ declared as u and not uint2. Did I look at the old 2.0 version!? Dia

Yes, it is declared as u (it was uint2 in 2.0, but I have made it variable for efficiency):

#ifdef VECTORS4
typedef uint4 u;
#else
#ifdef VECTORS
typedef uint2 u;
#else
typedef uint u;
#endif
#endif

u is uint2 when VECTORS is declared. Bah, I know all of this scattered code is confusing
|
|
|
|
BTC_Junkie
Member
Offline
Activity: 97
Merit: 10
|
|
August 03, 2011, 01:49:08 AM |
|
Thanks, getting +1-3% on my cards... better improvement on 5800 series than 6900 series.
|
12jAZVfnCjKmPUXTszwmoji9S4NmY26Qvu
|
|
|
fpgaminer
|
|
August 03, 2011, 02:07:11 AM |
|
Hey fpgaminer, I really like this poclbm version of phatk2 but could you update the same version with --phatk2_1 switch or something so we could testdrive both versions with ease Sure thing. All updated. Added --phatk2_1 option, and --vectors4 (which can only be used in combo with phatk2_1). https://github.com/progranism/poclbmLet me know how it works. I tested it on my 5850s. I tested with no vectors, vectors, and vectors4 and they all seemed to work. For my own sake, I also added a special feature where you can use "-e -1" to force the hashing estimation algorithm to estimate hashing speed over the entire run-time of the miner, and include both accepted and rejected shares. I'm using it to check that the code is actually hashing at the reported rate; no duplicate nonces or other bugs.
|
|
|
|
what@3
Newbie
Offline
Activity: 45
Merit: 0
|
|
August 03, 2011, 04:44:24 AM |
|
my 6950 took a hit from 390 to 356 Mh/s
however all my 6870's all got a 7 mhs bump!
5830 up by 7 also to 327.9 mhs
Thanks!
|
|
|
|
lagmo
Member
Offline
Activity: 67
Merit: 10
|
|
August 03, 2011, 03:51:42 PM |
|
Awesome! V. 2.1 kernel works flawlessly on my Linuxcoin 2.0 rigs (SDK 2.4 + 11.5 catalyst, HD5850/5830) generally + 3-4MH/s across the board compared to Diapolo 17-07 Excellent job!
|
|
|
|
phelix
Legendary
Offline
Activity: 1708
Merit: 1019
|
|
August 03, 2011, 03:57:03 PM |
|
the graph is really cool! how did you create it?
it's very interesting to see that the hashrate really is increasing with lower mem clock.
Can you go below 300 to see if the hashrate starts declining again at some point? I am running my 5850s at a mem clock of 269 and wonder if that is the optimum.
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 03, 2011, 06:25:18 PM Last edit: August 03, 2011, 06:37:30 PM by Phateus |
|
the graph is really cool! how did you create it?
it's very interesting to see that the hashrate really is increasing with lower mem clock.
Can you go below 300 to see if the hashrate starts declining again at some point? I am running my 5850s at a mem clock of 269 and wonder if that is the optimum.
Manually testing and inputting data into a google docs spreadsheet :-p As for going under 300, it might decrease performance, but it would probably also start getting unstable around 250 (from what I've tried). Best to just try it on your own hardware.
|
|
|
|
iopq
|
|
August 04, 2011, 03:57:33 AM |
|
the graph is really cool! how did you create it?
it's very interesting to see that the hashrate really is increasing with lower mem clock.
Can you go below 300 to see if the hashrate starts declining again at some point? I am running my 5850s at a mem clock of 269 and wonder if that is the optimum.
Manually testing and inputting data into a google docs spreadsheet :-p As for going under 300, it might decrease performance, but it would probably also start getting unstable around 250 (from what I've tried). Best to just try it on your own hardware. 225 gets unstable, but 200 is fine, you just didn't go LOW enough (kind of how 400 hung my GPU) try 200, it's the best performance on my card with worksize 256, vectors 2
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 04, 2011, 04:24:32 AM |
|
Alright, new version 2.2 is coming out in the next couple days. As the front page says, 1354 ALU Ops for the 5xxx series vs. 1359 for 2.1 Changes I've made in 2.2 are: - added a rotC function for constant values since the compiler apparently does not know how to perform rotate() on constants
#define rotC(x,n) (x<<n | x >> (32-n))
- Small tweaking of the order of certain functions and other random things that shouldn't really have done anything
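For anyone curious what rotC computes, here is a quick Python equivalent of the #define above (masked to 32 bits, since Python integers are unbounded; this mirrors the macro's behavior but is not kernel code):

```python
def rot_c(x, n):
    """32-bit left rotate, mirroring the kernel's
    #define rotC(x,n) (x<<n | x >> (32-n)),
    with explicit masking because Python ints never overflow."""
    x &= 0xFFFFFFFF
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

# The top bit wraps around to the bottom, and byte-wise rotation
# behaves as expected:
print(hex(rot_c(0x80000000, 1)))   # 0x1
print(hex(rot_c(0x12345678, 8)))   # 0x34567812
```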
I will add anything else I think of the next couple days... Also, keep the bug reports coming, so I know if I need to fix anything. -Phateus
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 04, 2011, 04:25:03 AM |
|
the graph is really cool! how did you create it?
it's very interesting to see that the hashrate really is increasing with lower mem clock.
Can you go below 300 to see if the hashrate starts declining again at some point? I am running my 5850s at a mem clock of 269 and wonder if that is the optimum.
Manually testing and inputting data into a google docs spreadsheet :-p As for going under 300, it might decrease performance, but it would probably also start getting unstable around 250 (from what I've tried). Best to just try it on your own hardware. 225 gets unstable, but 200 is fine, you just didn't go LOW enough (kind of how 400 hung my GPU) try 200, it's the best performance on my card with worksize 256, vectors 2 Awesome, thanks for the info, I'll definitely try it out.
|
|
|
|
joulesbeef
Sr. Member
Offline
Activity: 476
Merit: 250
moOo
|
|
August 04, 2011, 04:34:42 AM |
|
so i copied #define rotC(x,n) (x<<n | x >> (32-n)) and pasted it in my kernel file and nothing blew up. didn't really get any speed increases, but of course I am flying blind here, so perhaps that wasn't the right thing to do, but hey, i did it anyways. My cat got sick on the carpet but I am willing to believe for now that it has nothing to do with your function
|
mooo for rent
|
|
|
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
|
|
August 04, 2011, 04:57:05 AM |
|
My cat got sick on the carpet but I am willing to believe for now that it has nothing to do with your function
LOL
|
|
|
|
deepceleron
Legendary
Offline
Activity: 1512
Merit: 1028
|
|
August 04, 2011, 06:23:13 AM Last edit: September 14, 2011, 06:57:14 PM by deepceleron |
|
I have bad news to report - phatk 2.1 sends bad shares. On pool mining hardware that consistently gets <2% rejects (and those only are stales within 5 seconds of a new block), I have only changed the phatk kernel: 2956/190 = 6.0% rejected 1944/290 = 13.0% rejected 2656/116 = 4.2% rejected 2615/184 = 6.6% rejected Here's a log from this new kernel showing the atypical random rejects: (old links) We can see on the result line that the hashes are bad, by not starting with 00000000: [03/08/2011 22:17:48] Result c877f46db0d6ab44... rejected These do not give an "OpenCL error, hardware problem?", or a "didn't meet minimum difficulty, not sending", they are sent and rejected. For an improvement in hashrate of 1% (333.58->336.53 typical) over Diapolo's 07-17 kernel, I get a 5% increase in rejects. I will have to revert. This is on WinXP/5830/11.6/SDK2.4 running phoenix.py 1.50 unmodified source on Python 2.6.6/numpy-1.6.0/... Two miner instances per GPU. Command line is: python phoenix.py -v -u http://xxx/ -k phatk VECTORS AGGRESSION=13 BFI_INT WORKSIZE=256 PLATFORM=0 DEVICE=0
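As a sanity check on what "bad share" means here: a share meeting even difficulty 1 must have a big-endian hash starting with eight hex zeros, which is the test deepceleron applies to the log line above. A sketch of that check (illustrative only, not Phoenix's actual validation code):

```python
import hashlib

def share_hash(header80):
    """Double-SHA256 of an 80-byte block header.  Bitcoin displays the
    digest byte-reversed, so a valid difficulty-1 share shows a hex
    hash beginning with eight zeros."""
    d = hashlib.sha256(hashlib.sha256(header80).digest()).digest()
    return d[::-1].hex()

def looks_valid(hash_hex):
    # Difficulty-1 target: the first four big-endian bytes are zero.
    return hash_hex.startswith("00000000")

# The rejected result from the log fails even this minimal test:
print(looks_valid("c877f46db0d6ab44"))  # False
```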
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
August 04, 2011, 09:08:03 AM Last edit: August 04, 2011, 12:50:20 PM by ssateneth |
|
Since I really liked the graph on the front page but thought it lacked granularity, I'm going to take a shot at making a graph too. I'll be doing tests on a 5830 instead of a 5870 though (my 5830 seems a LOT more stable when it comes to memory speeds compared to my 5870). They'll be based on GUIMiner v2011-07-01, the built-in Phoenix miner, 11.7 Catalyst, 2.5 SDK, and the phatk 2.1 kernel with BFI_INT FASTLOOP=false AGGRESSION=14 and varying worksizes, memory speeds, and VECTORS vs VECTORS4. Stay tuned. Edit: Here's a work-in-progress spreadsheet. It's updated as I test more combos (I need to test and update the spreadsheet manually). https://spreadsheets.google.com/spreadsheet/ccc?key=0AjXdY6gpvmJ4dEo4OXhwdTlyeS1Vc1hDWV94akJHZFE&hl=en_US
I was planning to put in worksizes of 192, 96, and 48 too, but phatk 2.1 doesn't seem to support it. Less work for me though
|
|
|
|
Diapolo
|
|
August 04, 2011, 12:32:11 PM |
|
@Phat:
What is your experience with SDK 2.5 so far? It seems to behave somewhat odd in terms of expected vs. real performance (KernelAnalyzer vs. Phoenix). Do you use the CAL 11.7 profile for ALU OP usage information or an earlier version?
Dia
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
August 04, 2011, 01:45:53 PM |
|
So far tests indicate the first "sweet spot" is ~220 MHz with VECTORS WORKSIZE=128. The next "sweet spot" (and fastest one yet) is ~370-380MHz with VECTORS WORKSIZE=256. Will keep you guys posted as I run through more combos.
|
|
|
|
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
|
|
August 04, 2011, 03:33:08 PM |
|
My always sweet spot for 5870 is memory clock is equal to core clock divided by three. mem= core/3 = 975/3=325
|
|
|
|
CanaryInTheMine
Donator
Legendary
Offline
Activity: 2352
Merit: 1060
between a rock and a block!
|
|
August 04, 2011, 04:16:57 PM |
|
My always sweet spot for 5870 is memory clock is equal to core clock divided by three. mem= core/3 = 975/3=325
Would this hold true for 5850 and/or 5830s?
|
|
|
|
mike678
|
|
August 04, 2011, 04:33:01 PM |
|
My always sweet spot for 5870 is memory clock is equal to core clock divided by three. mem= core/3 = 975/3=325
Would this hold true for 5850 and/or 5830s? I'd say it holds fairly true for my 5830's; I get my best megahash at 1030/350. I've only done minor tests with my 5850's, but I have a really hard time getting the mem clock down because it starts to crash my system when I do that.
|
|
|
|
CanaryInTheMine
Donator
Legendary
Offline
Activity: 2352
Merit: 1060
between a rock and a block!
|
|
August 04, 2011, 04:35:46 PM |
|
My always sweet spot for 5870 is memory clock is equal to core clock divided by three. mem= core/3 = 975/3=325
Would this hold true for 5850 and/or 5830s? I'd say it hold fairly true for my 5830's I get my best megahash as 1030/350. I've only done minor tests with my 5850's but I have a really hard time getting the mem clock down because it starts to crash my system when I do that. Mike, did you ever figure out that problem you had with MSI Afterburner? I think you posted on my thread as well about same issue I had...
|
|
|
|
mike678
|
|
August 04, 2011, 04:58:13 PM |
|
Mike, did you ever figure out that problem you had with MSI Afterburner? I think you posted on my thread as well about same issue I had...
Which thread are you talking about? I know I made a thread the other day in support about Afterburner freezing when I hit apply for my 5850's, but I can't remember what your thread was. If you're talking about the freezing, I haven't had a chance to test any further with the 5850's because I literally spent from the time I got out of work to like 1 am working on a skeleton case and trying to figure out why the PSU was making a clicking noise. Also, I know you got the ncixus 5850's as well; what's your top speed on those so far? I can get up to 395ish with stock voltage.
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 04, 2011, 05:18:10 PM |
|
@deepceleron That is completely baffling to me... I use the EXACT same code to determine which hashes to send as Diapolo (from the poclbm kernel). How are you getting the hash results? If they are from the server, then I'm pretty sure they will never be 00000..., because the rejection comes from a mismatch in state data, so the hash comes out different on the server than on your client, right? If they are from the client, are you running a modified version of Phoenix? I don't think stock Phoenix logs that information. If you are, could you post details so I can look into the bug? Two miner instances per GPU.
Why are you running 2 instances per GPU? That seems like it would just increase overhead and double the amount of stales. Try running only 1 instance per GPU, and perhaps decreasing the AGGRESSION from 13 to 12. If that doesn't fix it, I'm not sure what else I can do without further information. Anyone else getting this bug?
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 04, 2011, 05:33:39 PM |
|
@Phat:
What is your experience with SDK 2.5 so far? It seems to behave somewhat odd in terms of expected vs. real performance (KernelAnalyzer vs. Phoenix). Do you use the CAL 11.7 profile for ALU OP usage information or an earlier version?
Dia
Yeah... not sure... I wasn't paying that much attention to it in 2.4, but what I've noticed is that using fewer registers improves performance. Mainly, I've noticed that with heavy register usage (or high WORKSIZE numbers), the MH/s becomes more and more dependent on memory speed (probably the main reason VECTORS4 performs terribly at low memory speeds). But overall I didn't even know I had 2.5, so clearly I didn't really notice a difference, lol. As of the newest edit on the first page, I am using CAL 11.7. On a side note, my card exploded (well, the fan died) yesterday, so my work may be slowed a little. I'm not saying this has anything to do with joulesbeef's cat sicking up on the carpet, but cats are a crafty bunch...
|
|
|
|
dishwara
Legendary
Offline
Activity: 1855
Merit: 1016
|
|
August 04, 2011, 06:17:14 PM |
|
My sweet spot for the 5870 has always been a memory clock equal to the core clock divided by three: mem = core/3 = 975/3 = 325.
Would this hold true for 5850s and/or 5830s? Only trial & error will tell. My sweet spot for the 6870 is mem clk = (core clk / 3) + 14. I haven't tested the sweet spot for the 6970 yet, since my motherboard has been in for repair for the past 7 days, and when I was mining, Linux didn't allow me to underclock the memory more than 125 MHz below the core clock. I hope Windows with 11.8 will give the correct sweet spot for the 6970, which I'll know once I get my motherboard back.
|
|
|
|
Diapolo
|
|
August 04, 2011, 07:43:07 PM |
|
@Phat:
What is your experience with SDK 2.5 so far? It seems to behave somewhat odd in terms of expected vs. real performance (KernelAnalyzer vs. Phoenix). Do you use the CAL 11.7 profile for ALU OP usage information or an earlier version?
Dia
Yeah... not sure... I wasn't paying that much attention to it in 2.4, but what I've noticed is that using fewer registers improves performance. Mainly, I've noticed that with heavy register usage (or high WORKSIZE numbers), the MH/s becomes more and more dependent on memory speed (probably the main reason VECTORS4 performs terribly at low memory speeds). But overall I didn't even know I had 2.5, so clearly I didn't really notice a difference, lol. As of the newest edit on the first page, I am using CAL 11.7. On a side note, my card exploded (well, the fan died) yesterday, so my work may be slowed a little. I'm not saying this has anything to do with joulesbeef's cat sicking up on the carpet, but cats are a crafty bunch... In terms of efficiency, one has to consider whether a higher RAM frequency is worth it, because the card draws much more power with a higher mem clock :-/. The sweet spot for my 5870 and 5830 seems to be at 350 MHz mem. Hope you get a new card soon! Dia
|
|
|
|
joulesbeef
Sr. Member
Offline
Activity: 476
Merit: 250
moOo
|
|
August 04, 2011, 07:57:53 PM |
|
Yeah, Phateus, I hate to say it, but I am having similar issues to deepceleron.
I started to notice an uptick in stales; I thought it was due to our proxy, as we had problems before and we update it a lot. About 3-5% across the board.
I reverted back to Dia's 7-17 for the past 10 hours, and I have less than 1% stales, which is normal for me. Using a 5830, SDK 2.4, Catalyst 11.6, Win7 32-bit.
phoenix, guiminer: VECTORS BFI_INT -k phatk FASTLOOP=false WORKSIZE=256 AGGRESSION=12 -q2
|
mooo for rent
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 04, 2011, 08:30:41 PM |
|
@joulesbeef Hmm... this might be a really hard bug to find. If anyone has any ideas... At first I was thinking it was because I compare the nonce to 0, but that would only give false negatives (1 in every 4 billion nonces will not be found). The main difference between my init file and Diapolo's is that I pack 2 bases together and send them to the kernel. I may try to get rid of the Base variable altogether and just use the offset parameter of the EnqueueKernel() command (I think you can do that in pyopencl)... Basically just thinking out loud... If I didn't love low-level programming so much, I think I would shoot myself. @Diapolo Yeah, the unreleased version I am working on uses 20 registers (it performs about the same as a configuration which uses 19 but has 2 more ALU OPs). Also, are you getting an increased number of stales now that you have implemented some of the optimizations from phatk?
|
|
|
|
deepceleron
Legendary
Offline
Activity: 1512
Merit: 1028
|
|
August 04, 2011, 09:04:00 PM Last edit: August 04, 2011, 09:26:24 PM by deepceleron |
|
@deepceleron That is completely baffling to me... I use the EXACT same code to determine which hashes to send as Diapolo (from the poclbm kernel). How are you getting the hash results? If they are from the server, then I'm pretty sure they will never be 00000..., because the rejection comes from a mismatch in state data, so the hash comes out different on the server than on your client, right? If they are from the client, are you running a modified version of Phoenix? I don't think stock Phoenix logs that information. If you are, could you post details so I can look into the bug? Two miner instances per GPU.
Why are you running 2 instances per GPU? That seems like it would just increase overhead and double the amount of stales. Try running only 1 instance per GPU, and perhaps decreasing the AGGRESSION from 13 to 12. If that doesn't fix it, I'm not sure what else I can do without further information. Anyone else getting this bug? The output that I pastebinned is the standard console output of Phoenix in -v verbose mode; I just highlighted the screen output on my console (with a 3000-line buffer) and copy-pasted it. It includes the first eight bytes of the hash in the results, as you can see. Actually, when I said I was running unmodified Phoenix, I lied, because I forgot I had made this modification at line 236 in KernelInterface.py (because of a difficulty bug in a Namecoin pool I was previously using). Original:

    if self.checkTarget(hash, nr.unit.target):
        formattedResult = pack('<76sI', nr.unit.data[:76], nonce)
        d = self.miner.connection.sendResult(formattedResult)
        def callback(accepted):
            self.miner.logger.reportFound(hash, accepted)
        d.addCallback(callback)
        return True
    else:
        self.miner.logger.reportDebug("Result didn't meet full "
                                      "difficulty, not sending")
        return False

Mine:

    formattedResult = pack('<76sI', nr.unit.data[:76], nonce)
    d = self.miner.connection.sendResult(formattedResult)
    def callback(accepted):
        self.miner.logger.reportFound(hash, accepted)
    d.addCallback(callback)
    return True
All I've done is remove the second difficulty check in Phoenix and trust that the kernel is returning only valid difficulty-1 shares. Now, instead of spitting out the error "Result didn't meet full difficulty, not sending", Phoenix sends all results returned by the kernel on to the pool. Without this mod, logs of your kernel would just show a "didn't meet full difficulty" error message instead of rejects from the pool, which would still be a problem (but the helpful hash value wouldn't be printed for debugging). We can see from the hash value that the bad results are nowhere near a valid share. This code mod only exposes a problem in the kernel optimization code: sometimes wild hashes are being returned by the kernel from some bad math (or the kernel code is vulnerable to some overclocking glitch that no other kernel activates). Are these just "extra" hashes that are leaking through, or is the number of valid shares being returned by the kernel lower too? Hard to tell without a very long statistics run. I am running two miners per GPU not for some random reason, but because it works. With the right card/overclock/OS/etc., I seem to get a measurable improvement in total hashrate vs. one miner (documented by using the Phoenix -a average flag with a very long time period and letting the miners run for days). The only way my results could not be true would be if the time-slicing between two miners messes up the displayed hashrate calculation, but if that were true, such a bug would also show up on multi-GPU systems running one Phoenix per GPU. With only a 1% improvement from a kernel that works for me, reducing the aggression or running one miner would put phatk 2.1's performance below what I already had. Putting back the Diapolo kernel, I'm back to below 2500/100 on my miners. My Python lib versions are documented here.
Joulesbeef: I don't like the word 'stales' for rejected shares unless it specifically refers to shares rejected at a block change because they were obsolete when submitted to the pool, as logged by pushpool. The results I have above are not stale work; they are invalid hashes.
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
August 04, 2011, 09:14:48 PM |
|
Do you think VLIW4 is a step backward from VLIW5?
VLIW4 is slower than VLIW5 in many computational tasks
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 04, 2011, 10:15:06 PM |
|
The output that I pastebinned is the standard console output of Phoenix in -v verbose mode. Oh, thanks, I didn't even know you could do that... I'll do some testing with that. I think I'm going to have to download the Phoenix source code and see what is actually happening... I am running two miners per GPU not for some random reason, but because it works. With the right card/overclock/OS/etc., I seem to get a measurable improvement in total hashrate vs. one miner (documented by using the Phoenix -a average flag with a very long time period and letting the miners run for days). The only way my results could not be true would be if the time-slicing between two miners messes up the displayed hashrate calculation, but if that were true, such a bug would also show up on multi-GPU systems running one Phoenix per GPU.
With only a 1% improvement from a kernel that works for me, reducing the aggression or running one miner would put phatk 2.1's performance below what I already had. Putting back the Diapolo kernel, I'm back to below 2500/100 on my miners. I agree totally: go with what works. I am just trying to figure all this out. Thanks for all your help. -Phateus
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
August 05, 2011, 01:51:10 AM |
|
If it's relevant, I have not had any increase in stales. GUIMiner v2011-07-01, built-in Phoenix, phatk 2.1, Catalyst 11.7, SDK 2.5, 1 5870 + 5 5830s, using these extra flags in GUIMiner: -k phatk VECTORS BFI_INT WORKSIZE=256 FASTLOOP=false AGGRESSION=14
|
|
|
|
deepceleron
Legendary
Offline
Activity: 1512
Merit: 1028
|
|
August 05, 2011, 02:15:00 AM |
|
If it's relevant, I have not had any increase in stales. GUIMiner v2011-07-01, built-in Phoenix, phatk 2.1, Catalyst 11.7, SDK 2.5, 1 5870 + 5 5830s, using these extra flags in GUIMiner: -k phatk VECTORS BFI_INT WORKSIZE=256 FASTLOOP=false AGGRESSION=14
Unless you use the -v flag for verbose logging in Phoenix, set your console window so it has a log of thousands of lines you can scroll back through, and look for the "Result didn't meet full difficulty, not sending" error message, you wouldn't see any difference.
|
|
|
|
joulesbeef
Sr. Member
Offline
Activity: 476
Merit: 250
moOo
|
|
August 05, 2011, 02:39:40 AM |
|
I'll give it another try and use the verbose flag to see what is going on. Right now I have 2 rejects over 360 shares on Diablo's newest 8-4 version: 3 different pools, both rejects at the same pool, all 3 with over 100 shares. 30 shares with your 2.1 and no rejects, which looks good so far. I'll let you know when I get up over 300; maybe it was a fluke, as some of my pools had connection issues.
|
mooo for rent
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
August 05, 2011, 03:00:23 AM |
|
I like the VECTORS4 feature; it gives me an extra 5 MH/s using SDK 2.5.
|
|
|
|
joulesbeef
Sr. Member
Offline
Activity: 476
Merit: 250
moOo
|
|
August 05, 2011, 03:28:59 AM |
|
Well, crap, Phateus, I guess I owe ya an apology. 300 shares, no rejects. It must have been a bad day on the pools I was on.
|
mooo for rent
|
|
|
navigator
|
|
August 05, 2011, 06:50:17 PM Last edit: September 11, 2011, 09:38:55 PM by navigator |
|
DELETED for privacy
|
|
|
|
CanaryInTheMine
Donator
Legendary
Offline
Activity: 2352
Merit: 1060
between a rock and a block!
|
|
August 05, 2011, 06:53:01 PM |
|
I am getting rejects and "hardware problem" errors since the modification of __init__.py. Diapolo's 7-11 version is the last stable version for me on my two 5830s @ 1000/350, stock voltage. I switched back to the 7-11 version last night, and today on BTCGuild I am back to showing 4500 (31, 0.68%) on one card and 4324 (27, 0.62%) on the other. I am getting 320 MH/s with the 7-11 version and was getting 324 MH/s with your phatk 2.1. The number of stales/rejects showing on BTCGuild and in the Phoenix log after a short period of mining was up to and over 3% on both cards yesterday. If you would like, I can let one card run each version to compare the difference. I can also provide a Phoenix log if needed. I'm not certain about this, but I believe I was only getting the "hardware problem?" error with Diapolo's versions after 7-11 and your phatk 2.0. With phatk 2.1 I saw the introduction of the rejected shares. I can provide any info if needed. I assume the kernel is pushing the card too hard, as Diapolo mentioned earlier in his thread. I have no idea why it is affecting some and not others.
I had to reduce the OC by 10 MHz, keeping the same memory settings, with my 5830s using 2.1.
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 05, 2011, 09:30:05 PM |
|
Well, crap, Phateus, I guess I owe ya an apology. 300 shares, no rejects. It must have been a bad day on the pools I was on.
Not a problem, I'm glad it's working out for ya. I am getting rejects and "hardware problem" errors since the modification of __init__.py. Diapolo's 7-11 version is the last stable version for me on my two 5830s @ 1000/350, stock voltage. I switched back to the 7-11 version last night, and today on BTCGuild I am back to showing 4500 (31, 0.68%) on one card and 4324 (27, 0.62%) on the other. I am getting 320 MH/s with the 7-11 version and was getting 324 MH/s with your phatk 2.1. The number of stales/rejects showing on BTCGuild and in the Phoenix log after a short period of mining was up to and over 3% on both cards yesterday. If you would like, I can let one card run each version to compare the difference. I can also provide a Phoenix log if needed. I'm not certain about this, but I believe I was only getting the "hardware problem?" error with Diapolo's versions after 7-11 and your phatk 2.0. With phatk 2.1 I saw the introduction of the rejected shares. I can provide any info if needed. I assume the kernel is pushing the card too hard, as Diapolo mentioned earlier in his thread. I have no idea why it is affecting some and not others.
Any information you can post would be helpful in fixing this. I am going to try to rewrite some of the init file and see if I can make it a bit more stable...
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
August 06, 2011, 12:20:05 AM |
|
I am getting rejects and "hardware problem" errors since the modification of __init__.py. Diapolo's 7-11 version is the last stable version for me on my two 5830s @ 1000/350, stock voltage. I switched back to the 7-11 version last night, and today on BTCGuild I am back to showing 4500 (31, 0.68%) on one card and 4324 (27, 0.62%) on the other. I am getting 320 MH/s with the 7-11 version and was getting 324 MH/s with your phatk 2.1. The number of stales/rejects showing on BTCGuild and in the Phoenix log after a short period of mining was up to and over 3% on both cards yesterday. If you would like, I can let one card run each version to compare the difference. I can also provide a Phoenix log if needed. I'm not certain about this, but I believe I was only getting the "hardware problem?" error with Diapolo's versions after 7-11 and your phatk 2.0. With phatk 2.1 I saw the introduction of the rejected shares. I can provide any info if needed. I assume the kernel is pushing the card too hard, as Diapolo mentioned earlier in his thread. I have no idea why it is affecting some and not others.
Maybe try disabling the overclock. I had an experience where switching to a new miner caused problems; reverting back to stock clocks solved it.
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
August 06, 2011, 02:38:53 AM |
|
I've gotten "Hardware problem?" errors often since the __init__ patches of Diapolo's 7-17 and phatk 2.0+, but they don't seem to cause any decrease in MH/s, crashes, or increased stales, so I'm not too worried about them. I'm assuming it reports "Hardware problem" when one hash isn't quite right, and getting one "bad" hash error every 10 minutes while doing 330 million hashes every second... you can probably see where I'm going with this.
If it matters, reducing my memory clock from 370 to 350 drastically reduced my "Hardware problem?" errors: I've gone from one every ~2 minutes to one every ~30 minutes.
|
|
|
|
navigator
|
|
August 06, 2011, 03:00:37 AM Last edit: September 11, 2011, 09:39:17 PM by navigator |
|
DELETED for privacy
|
|
|
|
bcforum
|
|
August 06, 2011, 11:26:28 AM |
|
Now running 2.1 on one card and at ~500 shares without a single reject or hardware problem. I don't get it. Maybe a pool issue? I know for certain that I still haven't seen one "hardware problem?" error come up running Diapolo's 7-11 version. My temps are usually 62°C or lower, currently at 59°C.
Rejects are valid hashes that weren't accepted by the pool for some reason, generally because the pool has moved on (added transactions, bumped up the timestamp, etc.). The reject count shouldn't be used to validate GPU kernel performance, but you can argue that good mining software shouldn't have a high reject rate. Phoenix (and other mining frontends) check the results of the GPU miners before sending them on to the pool. If the nonce is invalid (doesn't generate a hash with zeros in the proper place), it is reported as a hardware problem. Different kernels may cause more hardware errors due to how they exercise the GPU. Unfortunately, this is a problem with your hardware, not the kernel. You will probably have to tweak your overclocking for each kernel to find a stable operating point.
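To make the validity check concrete, here is a hedged Python sketch of the difficulty-1 test a frontend applies before submitting a share. The function names are mine, not phoenix's; only the endswith test mirrors the actual phoenix code quoted elsewhere in this thread.

```python
import hashlib

def double_sha256(header: bytes) -> bytes:
    # Bitcoin hashes the 80-byte block header with SHA-256 twice
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()

def is_difficulty1_share(digest: bytes) -> bool:
    # A difficulty-1 share must have 32 zero bits where Bitcoin expects
    # them; on the raw digest that appears as four trailing zero bytes,
    # which is what phoenix's hash.endswith('\x00\x00\x00\x00') tests.
    return digest.endswith(b'\x00\x00\x00\x00')
```

A kernel that returns nonces whose digests fail this test is exactly what shows up as the "Hardware problem?" error.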
|
If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
|
|
|
navigator
|
|
August 06, 2011, 04:25:36 PM Last edit: September 11, 2011, 09:39:26 PM by navigator |
|
DELETED for privacy
|
|
|
|
deepceleron
Legendary
Offline
Activity: 1512
Merit: 1028
|
|
August 06, 2011, 07:19:29 PM |
|
I left 2.1 running on both cards overnight. Today, one card is showing only the occasional hardware problem error, no increase in rejects. Here is a ~5 hr log of it to get an idea of how often it pops up: http://pastebin.com/raw.php?i=qQedNWRG. I immediately restarted that miner using 2.1 and got the first hardware problem after 4 mins of mining. Next I switched back to Diapolo 7-11 and ran it for 15 mins without a single hardware error. Okay, once more back to 2.1; again another hardware problem, after only 2 mins this time. So I can turn the problem on and off by switching versions. This whole time my other card has been running 2.1 since last night without a single problem. I'm not too concerned about the hardware problem error, but the first night I tried out 2.1 I was also getting a large amount of rejects. EDIT: Backed the clock off 10 MHz, like CanaryInTheMine said in his earlier post, to 990 for 10 mins, and no problem showed up. Put it back to 1000 and one popped up after ~30 secs. Running the newer version at the slower speed doesn't net me any gain in MH/s. Again, the hardware problem doesn't concern me much; it's just the sudden amount of rejects I was getting after I first switched to 2.1. Well, the "hardware error" indicates not necessarily a hardware error, but that a bad hash was detected outside the running OpenCL kernel by the first validity check. In __init__.py:

    if not hash.endswith('\x00\x00\x00\x00'):
        self.interface.error('Unusual behavior from OpenCL. '
                             'Hardware problem?')

So to get this error, either the SHA hashing math or the hash checking in OpenCL was corrupted, and an invalid hash was returned as a valid share. If this happens, then it is also possible that a hash that would be a valid solution could be corrupted and not returned or sent. Diapolo's 7-17 kernel is also more sensitive to overclock than previous ones, and will start returning the 'hardware error' at overclocks where 7-11 doesn't.
Either a different stream-core instruction on the die is being used that doesn't overclock well, or the higher utilization causes some failure. My way of thinking is that you don't want to overclock to the point where any bad math is happening in any stream processor. If you have to overclock 5% less on a kernel that is 1% more efficient, you lose any gains.
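The trade-off in that last sentence is simple arithmetic; this sketch (with illustrative numbers of my own, not measurements from the thread) just multiplies the two factors out:

```python
def effective_hashrate(base_mhs: float, clock_scale: float,
                       kernel_efficiency: float) -> float:
    # Hashrate scales roughly linearly with core clock, so a kernel that
    # forces a lower stable overclock can erase its own efficiency gain.
    return base_mhs * clock_scale * kernel_efficiency

old = effective_hashrate(400.0, 1.00, 1.00)  # baseline kernel at full clock
new = effective_hashrate(400.0, 0.95, 1.01)  # 1% faster kernel, 5% less clock
# new (about 383.8 MH/s) comes out below old (400.0 MH/s): a net loss
```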
|
|
|
|
phorensic
|
|
August 08, 2011, 06:20:12 AM |
|
Ever since kernel development for phatk got fired up again, people have noticed more "kernel errors". After weeks of tweaking and testing, I've come to realize a few simple things. When overclocking to the edge, first the kernel will throw kernel errors a couple of times an hour; after just a *few* more MHz, it will actually crash the driver and reset the desktop, possibly killing the connection to the pool server. It's amazing how razor-thin those margins are. A change as small as 4 MHz on the core clock can go from rock stable all day, to kernel errors, to a crashed driver. I don't see it as a bug; I see it as the kernel reporting what is going on better than before (verbosity).
|
|
|
|
ovidiusoft
|
|
August 08, 2011, 06:26:15 AM |
|
@deepceleron
As long as the board doesn't crash or overheat, I think you just need to find out whether the increase in hashrate is significantly more than the increase in hardware errors. For my board, on Diapolo's 08-04 version I get 0.2% hardware errors (as a fraction of accepted shares; maybe time-referenced would be better) at 1040 MHz. At 1050, it's 0.21%, a variance that could just as well be network conditions or measurement error (if time was involved). But the hashrate increases by 0.95%. So my reasoning is that I gain about 1% performance while losing 0.01% to the additional hardware errors.
I only tested 2.1 at 1050 and for a shorter time, but hardware errors seem to be in the same range as with Diapolo's kernel, so I will most likely leave the frequency alone and do comparative testing on the kernels only.
|
|
|
|
bcforum
|
|
August 08, 2011, 04:09:11 PM |
|
@deepceleron
As long as the board doesn't crash or overheat, I think you just need to find out whether the increase in hashrate is significantly more than the increase in hardware errors. For my board, on Diapolo's 08-04 version I get 0.2% hardware errors (as a fraction of accepted shares; maybe time-referenced would be better) at 1040 MHz. At 1050, it's 0.21%, a variance that could just as well be network conditions or measurement error (if time was involved). But the hashrate increases by 0.95%. So my reasoning is that I gain about 1% performance while losing 0.01% to the additional hardware errors.
I only tested 2.1 at 1050 and for a shorter time, but hardware errors seem to be in the same range as with Diapolo's kernel, so I will most likely leave the frequency alone and do comparative testing on the kernels only.
I think you should at least double the number of hardware errors: remember that for every bad nonce that is reported, there is probably a good nonce that goes unreported.
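That doubling heuristic can be written down directly; this is a hedged sketch under the assumption stated above, that corruption is as likely to destroy a would-be valid share as it is to produce a reported wild hash:

```python
def estimated_loss_fraction(reported_bad: int, accepted_shares: int) -> float:
    # For every corrupted hash that *is* reported as a 'Hardware problem?',
    # assume one would-be valid share was silently lost as well (heuristic,
    # not something the miner can observe directly).
    return 2 * reported_bad / accepted_shares

# a 0.2% observed error rate would mean roughly 0.4% of shares actually lost
loss = estimated_loss_fraction(2, 1000)
```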
|
If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
August 08, 2011, 05:11:47 PM |
|
No update yet? It's Aug 8 now <.<
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 08, 2011, 06:17:18 PM |
|
No update yet? It's aug 8 now <.<
Just posted the new version; enjoy.
|
|
|
|
ovidiusoft
|
|
August 08, 2011, 07:34:27 PM |
|
First run report:
* Diapolo's 08-04: 338.9
* phatk-2.1: 340.9
* phatk-2.2: 341.3
Board is a 5830 Xtreme from Sapphire, GPU at 1050, RAM at 325. Phoenix options:
-k phatkmod-0804 VECTORS2 BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256
-k phatk-2.1 VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256
-k phatk-2.2 VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256
I'll leave it running overnight to see if any problems (hardware errors, etc.) show up, but it looks good! Thank you for your work! (And waiting for Diapolo's reply on your kernel.)
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1002
|
|
August 08, 2011, 09:01:35 PM |
|
Sapphire HD5850 Xtreme: 899MHz/327MHz@0.9875V Linux x86_64, Catalyst 11.6, SDK 2.1 VECTORS BFI_INT AGGRESSION=14 WORKSIZE=256
phatk 2.1: 376.9 Mh/s (+/- 0.1 Mh/s) phatk 2.2: 377.5 Mh/s (+/- 0.1 Mh/s)
|
|
|
|
Beta-coiner1
|
|
August 08, 2011, 10:09:50 PM |
|
Radeon 6950: +0.1-0.3 MH/s over Diapolo's latest (v4, w64, f3)
Radeon 5770: +0.2-2.0 MH/s over Diapolo's latest (v2, w128, f30)
Cat 11.6b / SDK 2.5
Not a bad improvement.
|
|
|
|
UniverseMan
Newbie
Offline
Activity: 26
Merit: 0
|
|
August 08, 2011, 11:25:33 PM |
|
Cat 11.6, SDK 2.4, Ubuntu 11.04, Phoenix 1.50
Cards: 5830 @ 1000/300, 6870 @ 945/1050
phatk 2.1: 324, 299
phatk 2.2: 323, 298
Slightly slower on 2.2. I've listed it as 1 MH/s slower on both cards, but in fact it's more like 0.5 or less.
|
|
|
|
Tx2000
|
|
August 09, 2011, 03:17:15 AM |
|
Catalyst 11.4 / SDK 2.4 Ref 5850 @ 920c/320m
-k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256 AGGRESSION=12
2.1: 399.27 to 399.63 Mh/s 2.2: 399.87 to 400.17 Mh/s
|
|
|
|
Clipse
|
|
August 09, 2011, 04:08:22 AM |
|
Catalyst 11.4 / SDK 2.4 Ref 5850 @ 920c/320m
-k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256 AGGRESSION=12
2.1: 399.27 to 399.63 Mh/s 2.2: 399.87 to 400.17 Mh/s
Damn, those are some good hashrates for the core. I think I will set up Cat 11.4 as well and test my cards' mem at 320; my cores run between 1050-1150 (for the Xtreme voltmodded version), all HD 5850s as well.
|
...In the land of the stale, the man with one share is king... >> ClipseWe pay miners at 130% PPS | Signup here : Bonus PPS Pool (Please read OP to understand the current process)
|
|
|
metacontent
Newbie
Offline
Activity: 16
Merit: 0
|
|
August 09, 2011, 05:50:56 AM |
|
Hey, I've been using this modified kernel for a couple weeks now, I quite like it, just wanted to say thanks.
|
|
|
|
teukon
Legendary
Offline
Activity: 1246
Merit: 1002
|
|
August 09, 2011, 07:00:39 AM |
|
Catalyst 11.4 / SDK 2.4 Ref 5850 @ 920c/320m
-k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256 AGGRESSION=12
2.1: 399.27 to 399.63 Mh/s 2.2: 399.87 to 400.17 Mh/s
Damn, those are some good hashrates for the core. I think I will set up Cat 11.4 as well and test my cards' mem at 320; my cores run between 1050-1150 (for the Xtreme voltmodded version), all HD 5850s as well. Yeah, seriously! I've come up against this before when trying to find the maximum hashrate for a 1 GHz 5850, and I ended up being well and truly trumped by a Windows user with Catalyst 11.4. I may have to try playing with that version of Catalyst again.
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
August 09, 2011, 07:06:47 AM |
|
Something is wrong with kernel 2.2:
I get 330 MH/s using 2.2, but 410 MH/s using kernel 2.1.
Card: AMD 5870 clocked at 900 MHz,
using the 11.8 beta driver with SDK 2.5.
|
|
|
|
bcforum
|
|
August 09, 2011, 10:30:53 AM |
|
Ubuntu 10.10, Catalyst 11.3, SDK 2.4, 6970 x2 OC'd to 940/1375
Phoenix r112 (Diapolo 7-17 w/ Vals[7] patch): 422.8 MH/s; phatk 2.2: 423.3 MH/s
So up 0.5 MH/s; sent you my profits for the week.
|
If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
August 09, 2011, 05:06:12 PM |
|
I found that the VECTORS4 option does not work in version 2.2.
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
August 09, 2011, 05:40:16 PM |
|
I found that the VECTORS4 option does not work in version 2.2.
Same. Using VECTORS4 drops my hashrate from 385 to 310 on my 5870. Using VECTORS WORKSIZE=128 brings it back up to about 380.
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 09, 2011, 08:33:43 PM |
|
I found that the VECTORS4 option does not work in version 2.2.
I optimized the code for VECTORS, so making it faster in 2.2 probably made VECTORS4 slower. I can't really optimize the kernel for both, so I would just stick with version 2.1 if that is faster for you. And everyone, thanks for your support; every little bit helps.
|
|
|
|
jedi95
|
|
August 09, 2011, 08:57:30 PM |
|
I found that the VECTORS4 option does not work in version 2.2.
Same. Using VECTORS4 drops my hashrate from 385 to 310 on my 5870. Using VECTORS WORKSIZE=128 brings it back up to about 380. This is probably because of the increased GPR usage of the VECTORS4 code. According to KernelAnalyzer, VECTORS4 uses 2707 ALU OPs and 33 GPRs, compared with VECTORS at 1355 ALU OPs and only 23 GPRs. Theoretically VECTORS4 would be faster, since it tests twice the number of nonces using 3 fewer ALU OPs than 2 executions of VECTORS. However, if the GPU runs out of GPRs, that limits the number of threads that can be running at once, which is what causes the performance drop. (The above ALU OP and GPR numbers are for Cypress, a.k.a. 58xx.) VECTORS4 might be faster for 69xx users, though, when combined with a smaller WORKSIZE. EDIT: I just looked at the 2.1 version, and it uses even more GPRs with VECTORS4 than 2.2 does (35 GPRs, 1358 ALU OPs). I'm not quite sure how it can be faster than 2.2.
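The GPR argument above is an occupancy calculation; the sketch below uses a simple model in which the register-file size and wavefront width are my own assumed constants for Cypress-class VLIW5 hardware (treat them as illustrative, not vendor-confirmed):

```python
def max_wavefronts_per_simd(gprs_per_thread: int,
                            register_file_gprs: int = 16384,
                            wavefront_size: int = 64) -> int:
    # Each in-flight wavefront reserves wavefront_size * gprs_per_thread
    # registers; once the register file is exhausted, no further wavefronts
    # can be scheduled to hide memory latency.
    return register_file_gprs // (gprs_per_thread * wavefront_size)

# KernelAnalyzer figures quoted above: VECTORS = 23 GPRs, VECTORS4 = 33 GPRs
waves_vectors  = max_wavefronts_per_simd(23)  # 11 wavefronts in flight
waves_vectors4 = max_wavefronts_per_simd(33)  # 7 wavefronts in flight
```

Fewer wavefronts in flight means less latency hiding, which fits the observation that VECTORS4 suffers most when memory is slow.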
|
Phoenix Miner developer Donations appreciated at: 1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
|
|
|
metacontent
Newbie
Offline
Activity: 16
Merit: 0
|
|
August 09, 2011, 09:16:34 PM |
|
Why not make two separate kernels, then?
VECTORS4 might one day be the better alternative; instead of doing all that work later, why not start now and keep pace?
|
|
|
|
bcforum
|
|
August 09, 2011, 10:40:58 PM |
|
VECTORS4 might be faster for 69xx users though, when combined with a smaller WORKSIZE.
Ubuntu 10.10, Catalyst 11.3, SDK 2.4, 6970 @ 940/1375, phatk 2.2
315.5 MH/s DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=64 VECTORS4 FASTLOOP=false
414.2 MH/s DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=128 VECTORS4 FASTLOOP=false
321.1 MH/s DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=256 VECTORS4 FASTLOOP=false
422.8 MH/s DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=64 VECTORS FASTLOOP=false
423.5 MH/s DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=128 VECTORS FASTLOOP=false
420.9 MH/s DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=256 VECTORS FASTLOOP=false
|
If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 09, 2011, 10:43:16 PM |
|
Why not make two separate kernels then?
VECTORS4 might one day be the better alternative, instead of doing all that work then why not start now and keep pace?
Because I have literally put in over 100 hours on the main kernel and have gotten almost nothing in donations. I just don't have the time to keep up with two kernels. If anyone feels like making a VECTORS4 branch, go for it... the source code is in the public domain and you can use it however you'd like. Also, from what I've gathered, there may be only 1 or 2 people interested in it... If you can lower your memory speed, I think VECTORS will always be faster than VECTORS4.

Now, I do like hearing feedback from everyone. I am just letting you know that it is not feasible to optimize the kernel for every possible configuration (SDK 2.1, 2.4, slow memory). Right now, the kernel is optimized for SDK 2.5 and the 68xx and 5xxx cards, assuming you pick the best memory clock speed for your card (somewhere around 1/3 of your core clock).

-Phateus
|
|
|
|
metacontent
Newbie
Offline
Activity: 16
Merit: 0
|
|
August 09, 2011, 11:00:04 PM |
|
Right now, the kernel is optimized for SDK 2.5 and the 68xx and 5xxx cards and assuming you pick the best memory clock speed for your card (somewhere around 1/3 of your core clock).
I think for the foreseeable future those cards will be doing the lion's share of the work, so I would say you are on the right track.
|
|
|
|
cyberlync
|
|
August 09, 2011, 11:44:27 PM |
|
Right now, the kernel is optimized for SDK 2.5 and the 68xx and 5xxx cards and assuming you pick the best memory clock speed for your card (somewhere around 1/3 of your core clock).
I think for the foreseeable future those cards will be doing the lion's share of the work, so I would say you are on the right track. +1
|
Giving away your BTC's? Send 'em here: 1F7XgercyaXeDHiuq31YzrVK5YAhbDkJhf
|
|
|
Tx2000
|
|
August 10, 2011, 12:32:41 AM |
|
Catalyst 11.4 / SDK 2.4 Ref 5850 @ 920c/320m
-k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256 AGGRESSION=12
2.1: 399.27 to 399.63 MH/s
2.2: 399.87 to 400.17 MH/s

Damn, those are some good hashrates for that core clock. I think I will set up Cat 11.4 as well and test my card's memory at 320; my cores run between 1050 and 1150 (for the extreme voltmodded version), all HD5850s as well. Yeah, beats me =/ I haven't been able to get my second 5850 (new 230SA Sapphire 5850 Xtreme) to achieve the same results. In fact, it seems to hate SDK 2.4.
|
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
August 10, 2011, 09:40:16 AM |
|
Why not make two separate kernels then?
VECTORS4 might one day be the better alternative, instead of doing all that work then why not start now and keep pace?
Because I have literally put in over 100 hours on the main kernel and have gotten almost nothing in donations. I just don't have the time to keep up with two kernels. If anyone feels like making a VECTORS4 branch, go for it... the source code is in the public domain and you can use how you'd like. Also, from what I've gathered, there may be only 1 or 2 people interested it... If you can lower your memory speed, I think VECTORS will always be faster than VECTORS4. Now, I do like hearing feedback from everyone. I am just letting you know that it is not feasible to optimize the kernel for every possible configuration (SDK 2.1, 2.4, slow memory). Right now, the kernel is optimized for SDK 2.5 and the 68xx and 5xxx cards and assuming you pick the best memory clock speed for your card (somewhere around 1/3 of your core clock). -Phateus

The thing is, VECTORS4 worked perfectly for me in version 2.1; in version 2.2 it's broken.
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 10, 2011, 04:21:30 PM |
|
Why not make two separate kernels then?
VECTORS4 might one day be the better alternative, instead of doing all that work then why not start now and keep pace?
Because I have literally put in over 100 hours on the main kernel and have gotten almost nothing in donations. I just don't have the time to keep up with two kernels. If anyone feels like making a VECTORS4 branch, go for it... the source code is in the public domain and you can use how you'd like. Also, from what I've gathered, there may be only 1 or 2 people interested it... If you can lower your memory speed, I think VECTORS will always be faster than VECTORS4. Now, I do like hearing feedback from everyone. I am just letting you know that it is not feasible to optimize the kernel for every possible configuration (SDK 2.1, 2.4, slow memory). Right now, the kernel is optimized for SDK 2.5 and the 68xx and 5xxx cards and assuming you pick the best memory clock speed for your card (somewhere around 1/3 of your core clock). -Phateus the thing is, VECTORS4 worked perfectly for me in version 2.1 in version 2.2 its broken As in it doesn't work at all, or that it is much slower?... Just use version 2.1 then
|
|
|
|
huayra.agera
|
|
August 10, 2011, 04:43:21 PM |
|
Hi! Just used v2.2 and it increased my hashrate by 3 MH/s compared to Diapolo's, from 402 to 405. VECTORS4 seemed to drop the hashrate significantly on my 5850, by 50 MH/s. Great work to you guys and we are very grateful =).
I think the mods should create a Child Board under Mining support and name it "Mods" or Tweaks I guess and put this thread there.
|
BTC: 1JMPScxohom4MXy9X1Vgj8AGwcHjT8XTuy
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
August 10, 2011, 07:52:14 PM Last edit: August 11, 2011, 04:32:23 AM by ssateneth |
|
Why not make two separate kernels then?
VECTORS4 might one day be the better alternative, instead of doing all that work then why not start now and keep pace?
Because I have literally put in over 100 hours on the main kernel and have gotten almost nothing in donations. I just don't have the time to keep up with two kernels. If anyone feels like making a VECTORS4 branch, go for it... the source code is in the public domain and you can use how you'd like. Also, from what I've gathered, there may be only 1 or 2 people interested it... If you can lower your memory speed, I think VECTORS will always be faster than VECTORS4. Now, I do like hearing feedback from everyone. I am just letting you know that it is not feasible to optimize the kernel for every possible configuration (SDK 2.1, 2.4, slow memory). Right now, the kernel is optimized for SDK 2.5 and the 68xx and 5xxx cards and assuming you pick the best memory clock speed for your card (somewhere around 1/3 of your core clock). -Phateus the thing is, VECTORS4 worked perfectly for me in version 2.1 in version 2.2 its broken As in it doesn't work at all, or that it is much slower?... Just use version 2.1 then The behavior is as if it's not doing 4 nonces, but only doing 1 (i.e. no VECTORS option specified). My compute speed remained the same regardless of memory speed, which is exactly like your V1 result on the graph on page 1.
|
|
|
|
critical
|
|
August 11, 2011, 09:49:54 AM |
|
In GUIMiner, I keep getting "invalid buffer, unable to write to file" errors; I wonder why.
|
|
|
|
Diapolo
|
|
August 11, 2011, 11:09:00 AM |
|
Just did a test:

Rig setup:
Linuxcoin v0.2b (Linux version 2.6.38-2-amd64)
Dual HD5970 (4 GPU cores in the rig)
Mem clock @ 300MHz, Core clock @ 800MHz, VCore @ 1.125V
AMD SDK 2.5, Phoenix r100, phatk v2.2
-v -k phatk BFI_INT VECTORS WORKSIZE=256 AGGRESSION=11 FASTLOOP=false

Result:
Overall rig rate: 1484 MH/s
Rate per core: 371 MH/s

This is ~4 MH/s faster than Diapolo's latest. On the 5970, phatk 2.2 is the current king of the hill. For the world to be perfect, this kernel needs to be integrated into cgminer.

The last kernel releases show that it is a bit of trial and error to find THE perfect kernel for a specific setup. Phateus and I try to use the KernelAnalyzer and our setups as a first measurement of whether a new kernel got "faster", but there are many other factors that come into play, like OS, driver, SDK, miner software and so on.

I would suggest that we should try to create a kernel which is based on the same kernel parameters for phatk and phatk-Diapolo, so that users are free to choose which kernel is used. One difference is that the cgminer kernel uses the switch VECTORS2, where Phoenix used only VECTORS (which I changed to VECTORS2 in my last kernel releases). It doesn't even matter if the variable names in the kernels are the same (in fact they differ sometimes), as long as the main miner software passes the expected values in a defined sequence to the kernel.

Dia
|
|
|
|
MegaBux
Newbie
Offline
Activity: 31
Merit: 0
|
|
August 11, 2011, 03:26:33 PM Last edit: August 11, 2011, 05:19:23 PM by MegaBux |
|
As of version 2.1, phatk now has command line option "VECTORS4" which can be used instead of "VECTORS". This option works on 4 nonces per thread instead of 2 and may increase speed mainly if you do not underclock your memory, but feel free to try it out. Note that if you use this, you will more than likely have to decrease your WORKSIZE to 128 or 64.
I'm using a 6770 @ 1.01GHz with phatk 2.2. When I run the memory clock at 300MHz with the VECTORS option, I get 234.5 MH/s. However, I can't seem to reap the benefits of VECTORS2 or VECTORS4 at a higher memory clock (i.e. 1.2GHz). I've reduced the WORKSIZE from 256 to 128 and 64; with these options I only achieve between 204 and 213 MH/s, peaking around 213.
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 11, 2011, 04:33:14 PM |
|
Just did a test: Rig setup: Linuxcoin v0.2b (Linux version 2.6.38-2-amd64) Dual HD5970 (4 GPU cores in the rig) Mem clock @ 300Mhz Core clock @ 800Mhz VCore @ 1.125v AMD SDK 2.5 Phoenix r100 Phatk v2.2 -v -k phatk BFI_INT VECTORS WORKSIZE=256 AGGRESSION=11 FASTLOOP=false Result: Overall Rig rate: 1484 MH/s Rate per core: 371 MH/s This is ~4MH/s faster than Diapolo's latest. On 5970, phatk 2.2 is current king of the hill. For the world to be perfect, this kernel needs to be integrated into cgminer The last kernel releases show, that it is a bit of trial and error to find THE perfect kernel for a specific setup. Phaetus and I try to use the KernelAnalyzer and our Setups as a first measurement, if a new Kernel got "faster". But there are many different factors that come into play like OS, driver, SDK, miner-software and so on. I would suggest that we should try to create a kernel which is based on the same kernel-parameters for phatk and phatk-Diapolo so that the users are free to chose which kernel is used. One thing is CGMINER kernel uses the switch VECTORS2, where Phoenix used only VECTORS (which I changed to VECTORS2 in my last kernel releases). It doesn't even matter to use the same variable names in the kernel (in fact they are different sometimes) as long as the main miner software passes the awaited values in a defined sequence to the kernel. Dia A good idea. A further improvement: I'd like to have an option in my miner that spends ~2mn benchmarking all the kernels available in the current directory (without talking to a pool, i.e. doing pure SHA256 on bogus nonces), and picking the fastest for the current rig. For people with lots of different rigs/setups, that would save them the headache of having to hand-tune each instance. What I am currently working on is a modified version of phoenix which runs multiple kernels with a single instance and a single work queue (to decrease excessive getwork). 
I am also working on plugin support for it, so you can use various added features (such as built-in gui, Web interface, logger, autotune, variable aggression for when computer is idle, overclocking support, etc...) This would make it tremendously easier for anyone to add features and you can still use whichever kernel works best for you. As for cgminer support, I haven't tried it, are there any benefits over phoenix? I may fork that instead of phoenix and make the plugin support via command-line, lua or javascript, although I find that python is much easier to code than c (especially for cross platform support).
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 11, 2011, 04:50:32 PM |
|
As of version 2.1, phatk now has command line option "VECTORS4" which can be used instead of "VECTORS". This option works on 4 nonces per thread instead of 2 and may increase speed mainly if you do not underclock your memory, but feel free to try it out. Note that if you use this, you will more than likely have to decrease your WORKSIZE to 128 or 64.
I'm using a 6770 @ 1.01Ghz with phatk 2.2. When I run the memory clock at 300Mhz with the VECTORS option, I get 234.5Mhps. However, I can't seem to reap the benefits of VECTORS2 or VECTORS4 at a higher memory clock (i.e. 1.2Ghz). I've reduced the WORKSIZE from 256 to 128 and 64 and can only seem to peek at 213Mhps. With these options, I can only achieve between 204 and 213 Mhps. I have found that VECTORS4 is extremely unreliable... even tiny changes in the kernel and other factors affect the hashrate tremendously... OpenCL gets really weird when you use a lot of registers. I added it in 2.1 because it was comparable to VECTORS in some situations, but changing the kernel slightly in 2.2 seems to have broken it (even though kernel analyer says it uses less registers and less ALU ops... *sigh*) Anyone wondering about any new kernel improvements, I seem to be at a standstill... I have tried the following: - Removing all control flow operations (about 1MH/s slower)
- Sending all kernel arguments in a buffer (about 1MH/s slower)
- Using an atomic counter for the output so that the output buffer is written sequentially (about the same speed and only works on ATI xxx cards and newer)
- Using an internal loop in the kernel to process multiple nonces (Either significantly slower or massive desktop lag)
- Calling set_arg only once per getwork instead of once per kernel call (only faster when using very low aggression and FASTLOOP, I will add this to my next kernel release)
-Phateus
|
|
|
|
jedi95
|
|
August 11, 2011, 08:44:55 PM |
|
What I am currently working on is a modified version of phoenix which runs multiple kernels with a single instance and a single work queue (to decrease excessive getwork). I am also working on plugin support for it, so you can use various added features (such as built-in gui, Web interface, logger, autotune, variable aggression for when computer is idle, overclocking support, etc...) This would make it tremendously easier for anyone to add features and you can still use whichever kernel works best for you.
As for cgminer support, I haven't tried it, are there any benefits over phoenix? I may fork that instead of phoenix and make the plugin support via command-line, lua or javascript, although I find that python is much easier to code than c (especially for cross platform support).
In most cases you won't see much if any decrease in the number of getwork requests by running multiple kernels behind the same work queue. The reason for having a work queue in the first place is so that the miner only needs to ask for more work when the queue falls below a certain size. During normal operation Phoenix won't request more work than absolutely necessary. There might be a small benefit to doing this when the block changes, but aside from that the getwork count for a single instance running 2 kernels compared to 2 instances will be very close. That said, I am interested to see the results of the other changes you mentioned. Feel free to PM me if you have any questions.
|
Phoenix Miner developer Donations appreciated at: 1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
|
|
|
deepceleron
Legendary
Offline
Activity: 1512
Merit: 1028
|
|
August 12, 2011, 03:12:32 AM Last edit: August 12, 2011, 04:30:40 AM by deepceleron |
|
Big Edit: I looked again at the AMD APP SDK v2.5, trying to get it to not suck. I did one more thing: not only did I install the 2.5 SDK (on Catalyst 11.6), but I also re-compiled pyopencl 0.92 against the newer SDK.

On phatk 2.2, changing just from the 2.4 SDK to the 2.5 SDK with a matching pyOpenCL gets a hair more mhash:
SDK 2.4: 309.97
SDK 2.5: 310.10

Just to let people know, regarding the APP SDK: the version installed as well as the version used to compile pyopencl both seem to matter (not that this helps you if you are using just the prepackaged Windows phoenix.exe).

Using a pyOpenCL newer than 0.92 gives a deprecation warning:

[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]kernels\phatk\__init__.py:414: DeprecationWarning: 'enqueue_read_buffer' has been deprecated in version 2011.1. Please use enqueue_copy() instead.
  self.commandQueue, self.output_buf, self.output)
[11/08/2011 21:10:22] Server gave new work; passing to WorkQueue
[291.32 Mhash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]kernels\phatk\__init__.py:427: DeprecationWarning: 'enqueue_write_buffer' has been deprecated in version 2011.1. Please use enqueue_copy() instead.
  self.commandQueue, self.output_buf, self.output)

Using pyOpenCL 2011.1.2 with the kernel in its current form gets me less mhash though:
SDK 2.4: 307.98
SDK 2.5: 307.84

(5830 @ 955/350; Catalyst 11.6; Win7; py 2.6.6)
|
|
|
|
CYPER
|
|
August 12, 2011, 03:24:26 AM |
|
Using the latest 2.2 version got quite a noticeable increase:
Before: 4x 440Mh/s = 1760Mh/s
After: 4x 446Mh/s = 1784Mh/s
My best settings are: WORKSIZE=256 AGGRESSION=12 VECTORS
|
|
|
|
Tx2000
|
|
August 12, 2011, 03:46:11 AM |
|
What I am currently working on is a modified version of phoenix which runs multiple kernels with a single instance and a single work queue (to decrease excessive getwork). I am also working on plugin support for it, so you can use various added features (such as built-in gui, Web interface, logger, autotune, variable aggression for when computer is idle, overclocking support, etc...) This would make it tremendously easier for anyone to add features and you can still use whichever kernel works best for you.
As for cgminer support, I haven't tried it, are there any benefits over phoenix? I may fork that instead of phoenix and make the plugin support via command-line, lua or javascript, although I find that python is much easier to code than c (especially for cross platform support).
Would definitely be interested in a cgminer fork. Don't get me wrong, phoenix is great and has always given me the best performance overall, but it does lack some of the more refined features the other poster listed above, like failover and a clean, continuously updated command-line UI. Seems like you and Diapolo are hitting the ceiling with phoenix anyway.
|
|
|
|
hugolp
Legendary
Offline
Activity: 1148
Merit: 1001
Radix-The Decentralized Finance Protocol
|
|
August 12, 2011, 06:54:36 AM |
|
There is a thing I don't understand about the results of these modifications. They increase the hash rate, but they also increase power consumption. I always thought that since they make the kernel more efficient (same task with fewer instructions, less work for the GPU per hash), they should increase the hash rate without changing consumption much. Does anyone know why the more efficient kernel is not also more energy efficient?
Also, if one of you guys is out of ideas to make the cards run faster, it could be interesting to target energy efficiency instead of speed. A lot of us are not interested in running our cards at the maximum MHash/s rate but are more interested in having a better MHash/J rate.
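Since MHash/J is just the hashrate divided by wall power draw (1 W = 1 J/s), comparing configurations is a one-line calculation. A quick sketch; both the rates and the wattages below are made-up numbers for illustration, not measurements:

```python
# MHash/J = (MH/s) / (J/s). Higher is better for power-limited mining.
def mhash_per_joule(mhash_per_s, watts):
    return mhash_per_s / watts

# Hypothetical example: a small hashrate loss can still win on efficiency
# if it comes with a larger drop in power draw (e.g. from undervolting).
stock = mhash_per_joule(400.0, 200.0)
tuned = mhash_per_joule(390.0, 165.0)
print(f"stock: {stock:.2f} MH/J, tuned: {tuned:.2f} MH/J")
```

With these invented numbers the "tuned" card hashes 2.5% slower but delivers about 18% more hashes per joule, which is the trade-off being asked about.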
|
|
|
|
talldude
Member
Offline
Activity: 224
Merit: 10
|
|
August 12, 2011, 01:23:02 PM |
|
It is more efficient - the more output per unit time you have, the more efficient it is since the card will be wasting less power sitting idle.
If you want to increase efficiency, that is a hardware thing - namely undervolt your card.
|
|
|
|
bcforum
|
|
August 12, 2011, 01:28:56 PM |
|
There is a thing I dont understand about the results of these modifications. They increase the hash rate but they also increase consumption, and I always though that since they are making the kernel more efficient (same task with less instructions, less work for the gpu per hash) they should increase the hash rate without chaning consumption too much. Does anyone know why the more efficient kernel is not also more energy efficient?
Also, if one of you guys is out of ideas to make the cards runs faster it could be interesting to target energy efficiency instead of speed. A lot of us are not interested in running our cards at the maximum MHash/s rate but are more interested on having a better MHash/J rate.
In theory, fewer ALU ops translate to less energy consumption. In practice, each ALU op uses a slightly different amount of power, and a kernel with 10 of instruction A may burn more power than one with 12 of instruction B. Unfortunately, per-instruction power numbers aren't documented anywhere, so it is almost impossible to optimize for this in a theoretical sense, and they could vary from GPU to GPU (due to minor manufacturing variation).

One of Diapolo's recent kernels lowered operating temperature by ~3C without changing hashrate significantly. Presumably that particular kernel is ~10% more power efficient than the others.
|
If you found this post useful, feel free to share the wealth: 1E35gTBmJzPNJ3v72DX4wu4YtvHTWqNRbM
|
|
|
hugolp
Legendary
Offline
Activity: 1148
Merit: 1001
Radix-The Decentralized Finance Protocol
|
|
August 12, 2011, 01:35:19 PM Last edit: August 12, 2011, 02:29:51 PM by hugolp |
|
In theory, fewer ALU ops translates to less energy consumption. In practice, each ALU op uses a slightly different amount of power and a kernel which 10x instruction A may burn more power than 12x instruction B. Unfortunately, instruction power numbers aren't documented anywhere so it is almost impossible to optimize in a theoretical sense, and could vary from GPU to GPU (due to minor manufacturing defects.)
One of Diapolo's recent kernels lowered operating temperature by ~3C without changing hashrate significantly. Presumably that particular kernel is ~10% more power efficient than others. Thanks for the answer. Can you indicate which version of Diapolo's kernel you are referring to?
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 12, 2011, 05:53:22 PM |
|
What I am currently working on is a modified version of phoenix which runs multiple kernels with a single instance and a single work queue (to decrease excessive getwork). I am also working on plugin support for it, so you can use various added features (such as built-in gui, Web interface, logger, autotune, variable aggression for when computer is idle, overclocking support, etc...) This would make it tremendously easier for anyone to add features and you can still use whichever kernel works best for you.
As for cgminer support, I haven't tried it, are there any benefits over phoenix? I may fork that instead of phoenix and make the plugin support via command-line, lua or javascript, although I find that python is much easier to code than c (especially for cross platform support).
Would definitely be interested in a cgminer fork. Don't get me wrong, phoenix is great and has always given me the best performance overall but it does lack some of the more refined features, which the other poster listed above. Failover and nice static but updated command line "UI". Seems like you and diapolo are hitting the ceiling with phoenix anyway.

I will release a version that will work with cgminer early next week (it looks like he has already implemented Diapolo's old version).

We are hitting a ceiling with OpenCL in general (and perhaps with the current hardware). In one of the mining threads, vector76 and I were discussing the theoretical limit on hashing speeds... and unless there is a way to make the Maj() operation take 1 instruction, we are within about a percent of the theoretical minimum number of instructions in the kernel, unless we are missing something. That doesn't mean there is NO room for improvement, just that any further improvement will probably have to come from faster hardware, a more efficient implementation of OpenCL by AMD, or figuring out a better way to finagle the current OpenCL implementation to reduce its overhead.

Unless there is a problem with pyopencl, C and Python should give equivalent speeds as long as they are just calling the OpenCL interface (the actual miner uses negligible resources). I suppose it could be possible to access the hardware drivers directly and run the kernel that way... but I don't see that as being feasible.

With all of that said, I have looked through some of his code, and it is some really clean code. Part of the reason I want to add these features is to learn more Python (this is the first thing I have programmed in Python), but it will probably just be easier to modify the cgminer code. Thanks for pointing out cgminer to me.
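For context on the "Maj() in 1 instruction" remark: BFI_INT computes a 3-input bit-select, which covers Ch directly, while Maj still needs one extra XOR to massage its inputs into bit-select form. Here is a quick pure-Python check of those identities (the function names are mine, for illustration; this is not the kernel's actual code):

```python
# Verify the bit-select identities that the BFI_INT patching exploits.
# bitselect(a, b, c) takes bits of b where c has a 1 and bits of a where
# c has a 0 -- this is what a single BFI_INT instruction computes.
import random

M = 0xFFFFFFFF

def bitselect(a, b, c):
    return (a & ~c & M) | (b & c)

def ch(e, f, g):   # SHA-256 Ch: exactly one bit-select, i.e. one BFI_INT
    return (e & f) ^ (~e & g & M)

def maj(a, b, c):  # SHA-256 Maj: one XOR plus one bit-select
    return (a & b) ^ (a & c) ^ (b & c)

for _ in range(1000):
    a, b, c = (random.getrandbits(32) for _ in range(3))
    assert ch(a, b, c) == bitselect(g := c, f := b, e := a) == bitselect(c, b, a)
    assert maj(a, b, c) == bitselect(a, b, a ^ c)
print("Ch and Maj bit-select identities hold")
```

So Ch collapses to a single instruction, but Maj costs two (XOR then select), which is why shaving Maj down to one instruction would move the theoretical floor.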
|
|
|
|
Tx2000
|
|
August 12, 2011, 06:04:56 PM |
|
Sent another donation your way. Look forward to your work on cgminer.
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 12, 2011, 06:30:17 PM |
|
Sent another donation your way. Look forward to your work on cgminer.
Thanks
|
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
August 12, 2011, 07:38:31 PM |
|
I took a look at the comparison between version 2.2 and version 2.1 could it because __constant uint ConstW[128] change that broke VECTORS4?
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 12, 2011, 08:01:13 PM |
|
I took a look at the comparison between version 2.2 and version 2.1 could it because __constant uint ConstW[128] change that broke VECTORS4?
That change is inconsequential (I was trying some things that required it but did not keep them)... the compiler doesn't use those values, so the code should be exactly the same either way (you can swap the old code back in if you want to check).

You keep saying that it is broken... if it does not run, post the errors. I have found that on my card, VECTORS4 is much slower in version 2.2 than 2.1, but this is not a bug... it seems to be because OpenCL does not like allocating that many registers... Version 2.1 uses around 99.7% of instruction slots with VECTORS4, and I have tried many, many ways to make it faster and more reliable (in 2.1), but I have given up on it. It is still in the release because I don't see any point in taking it out... but getting 2.2 to run as fast as 2.1 with VECTORS4 is not going to happen. Also, the differences between 2.1 and 2.2 with VECTORS are very tiny anyway (less than .5%).

Getting into more detail: if you look at the graph on the main page of the thread, you can see the VECTORS4 curve for version 2.1... in version 2.2, for some reason, the spike (and corresponding valley) is located higher (somewhere around 500). This could mean it would be just as fast if you had 1500 MHz memory, but I have no idea why OpenCL reacts this way to changing the memory speed. There are way too many GPU architecture / GPU BIOS / PCIe bus / CPU-GPU transfer / driver / OpenCL implementation unknowns to try to predict this behavior.

-Phateus
|
|
|
|
huayra.agera
|
|
August 12, 2011, 08:19:20 PM |
|
Hi, I used phatk 2.2 on my 5 rigs and I had restarting/BSOD errors occurring on all machines (5850 multi/single, 6850) on several occasions already.
Yes, there was an increase in hashrate; however, it seemed to have a memory leak or something. Just thought I'd inform you of this. Anyway, great work still. Looking forward to further improvements on the project, but for now I'll revert to my previous settings.
|
BTC: 1JMPScxohom4MXy9X1Vgj8AGwcHjT8XTuy
|
|
|
jedi95
|
|
August 13, 2011, 06:38:02 AM |
|
I will release a version that will work with cgminer early next week (looks like he has already implemented diapolo's old version).
Looking forward to this !! Just sent one coin your way, and there's another once the work is done. We are hitting a ceiling with opencl in general (and perhaps with the current hardware). In one of the mining threads, vector76 and I were discussing the theoretical limit on hashing speeds... and unless there is a way to make the Maj() operation take 1 instruction, we are within about a percent of the theoretical limit on minimum number of instructions in the kernel unless we are missing something.
Out of curiosity, have you looked into coding a version directly in AMD's assembly language and bypassing OpenCL entirely? (I'm thinking: since we're already patching the ELF output, this seems like the logical next step.) Also, have you looked at AMD CAL? I know this is what ufasoft's miner uses ( https://bitcointalk.org/index.php?topic=3486.500 ), and it is also what zorinaq considers the most efficient way to access AMD hardware (somewhere on http://blog.zorinaq.com ).

Replacing one instruction in the ELF with another that uses the exact same inputs/outputs is one thing, but manually editing the ASM code is another thing entirely. Besides, with the work that has been done, the GPU is already at >99% of the theoretical maximum throughput (ALU packing). And as said above, we are also close to the theoretical minimum number of instructions to correctly run SHA256.

Also, if you look near the end of the hdminer thread, you will notice that users are able to get the same hashrates from phatk on 69xx. For 58xx and other VLIW5 cards, phatk is significantly faster than hdminer. If that's the best he can do with CAL, then I don't see any reason to use it. hdminer had a substantial performance advantage back in March/April, but with basically every miner supporting BFI_INT this is no longer the case.
|
Phoenix Miner developer Donations appreciated at: 1PHoenix9j9J3M6v3VQYWeXrHPPjf7y3rU
|
|
|
kano
Legendary
Offline
Activity: 4480
Merit: 1800
Linux since 1997 RedHat 4
|
|
August 14, 2011, 03:03:42 AM Last edit: August 14, 2011, 06:02:15 AM by kano |
|
Well, I've been talking to a few people about this but got no real response from anyone that it was possible ... (I woke up with this idea back on the 4th of August.) So I guess I need to post in a thread where someone works on a CL kernel and just let them implement it, if they don't already do it.

I've written it in pseudo-code because I still don't follow how the CL file actually does 2^n checks and returns the full list of valid results. Yeah, I've programmed in almost every language known to man (except C#, and that's avoided by choice), but I still don't quite get the interface from C/C++ to the CL and how that matches what happens.

What I am discussing is the 2nd call to SHA256, with the output of the first call (not the first call).

Anyway, to explain, here's the end of the SHA256 pseudo code from Wikipedia:
==================
for i from 0 to 63
    s0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
    maj := (a and b) xor (a and c) xor (b and c)
    t2 := s0 + maj
    s1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
    ch := (e and f) xor ((not e) and g)
    t1 := h + s1 + ch + k[i] + w[i]
    h := g
    g := f
    f := e
    e := d + t1
    d := c
    c := b
    b := a
    a := t1 + t2

Add this chunk's hash to result:
h0 := h0 + a
h1 := h1 + b
h2 := h2 + c
h3 := h3 + d
h4 := h4 + e
h5 := h5 + f
h6 := h6 + g
h7 := h7 + h

Then test if h0..h7 is a share (CHECK0, CHECK1, ...)
==================
Firstly, I added that last line of course. I understand that with current difficulty, if h0 != 0 then we don't have a share (call this CHECK0). If h0 = 0, then check some leading part of h1 based on the current difficulty (call this CHECK1) ... feel free to correct this, anyone who knows better. If a difficulty actually gets to checking h2, then my optimisation can be made even better by going back one more step (adding an i := 61) in the pseudo code shown below.

A reasonably simple optimisation of the end code, for when we are about to check if h0..h7 is a share (i.e. only the 2nd hash):
==================
for i from 0 to 61
    s0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
    maj := (a and b) xor (a and c) xor (b and c)
    t2 := s0 + maj
    s1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
    ch := (e and f) xor ((not e) and g)
    t1 := h + s1 + ch + k[i] + w[i]
    h := g
    g := f
    f := e
    e := d + t1
    d := c
    c := b
    b := a
    a := t1 + t2

i := 62
    s0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
    maj := (a and b) xor (a and c) xor (b and c)
    t2 := s0 + maj
    s1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
    ch := (e and f) xor ((not e) and g)
    t1 := h + s1 + ch + k[i] + w[i]
    tmpa := t1 + t2
    tmpb := h1 + tmpa    (this is the actual value of h1 at the end)
    if CHECK1 on tmpb then abort - not a share (i.e. return false for a share)
    h := g
    g := f
    f := e
    e := d + t1
    d := c
    c := b
    b := a
    a := tmpa

i := 63
    s0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
    maj := (a and b) xor (a and c) xor (b and c)
    t2 := s0 + maj
    s1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
    ch := (e and f) xor ((not e) and g)
    t1 := h + s1 + ch + k[i] + w[i]
    tmpa := h0 + t1 + t2    (this is the actual value of h0 at the end)
    if CHECK0 on tmpa then abort - not a share (i.e. return false for a share)
    h := g
    g := f
    f := e
    e := d + t1
    d := c
    c := b

Add this chunk's hash to result:
h0 := tmpa
h1 := tmpb
h2 := h2 + c
h3 := h3 + d
h4 := h4 + e
h5 := h5 + f
h6 := h6 + g
h7 := h7 + h

It's a share - unless we need to test h2?
==================
Firstly, the obvious (as I've said twice above): this should only be done when calculating a hash that is to be tested as a share. Since the actual process is a double-hash, the first hash should not, of course, do this.

In i=62: if the tmpb test (CHECK1) says it isn't a share, it avoids an entire loop iteration (i=63), the 'e' calculation at i=62, and any unneeded assignments after that. We also don't care about the actual values of h0-h7, so there is no need to assign them anything (or do the additions) except whatever is needed to affirm the result is not a share (e.g. set h0 = -1 if h0..h7 must be examined later, or just return false if that is good enough; I don't know which the code actually needs). CHECK1's probability of failure is high, so it easily covers the cost of the one extra calculation (h1 + tmpa) needed to do it.

In i=63: if the tmpa test (CHECK0) says it isn't a share, it avoids the 'e' calculation at i=63 and any unneeded assignments after that; again we don't care about the actual values of h0-h7, so there is no need to assign them anything except whatever is needed to affirm the result is not a share.

P.S. Any and all mistakes I've made - oh well, but the concept is there anyway. Any mistakes? Comments?
|
|
|
|
fpgaminer
|
|
August 14, 2011, 10:11:42 AM |
|
I've compiled a Win32 EXE for my poclbm fork (which has phatk, phatk2, phatk2.1, and phatk2.2 support): http://www.bitcoin-mining.com/poclbm-progranism-win32-20110814a.zip
md5sum - df623a45f8cb0a50fcded92728f12c14
Let me know if it works; I was only able to test it on one machine so far.

Well I've been talking to a few people about this but got no real response from anyone, that it was possible ...

The optimization you've spelled out is more or less already implemented in most, if not all, GPU miners. The way GPU miners currently work is that they check in the GPU code whether H7 == 0. If it does, the result (a nonce) is returned; otherwise nothing is returned. It is the responsibility of the CPU software to do any further difficulty checks if needed. Since the only thing the GPU miners care about is H7, they completely skip the last 3 rounds (stopping after the 61st round). Also note that GPU miners don't calculate the first 3 rounds of the first pass. Those rounds are pre-computed, because the inputs to those rounds remain constant for a given unit of getwork. So a GPU miner really only computes a grand total of 122 rounds, minus various other small pre-calculations here and there.
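Not from the original post - a small runnable Python sketch of the round-skipping identity described above. The K and W values below are random rather than the real SHA-256 constants (the identity holds for any inputs), and the variable names are mine, not any miner's actual code:

```python
import random

MASK = 0xFFFFFFFF

def ror(x, n):
    # 32-bit right rotate
    return ((x >> n) | (x << (32 - n))) & MASK

def sha_round(state, k, w):
    # One SHA-256 compression round, straight from the pseudo-code above.
    a, b, c, d, e, f, g, h = state
    s1 = ror(e, 6) ^ ror(e, 11) ^ ror(e, 25)
    ch = (e & f) ^ (~e & g)
    t1 = (h + s1 + ch + k + w) & MASK
    s0 = ror(a, 2) ^ ror(a, 13) ^ ror(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)
    t2 = (s0 + maj) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)

random.seed(42)
init = [random.getrandbits(32) for _ in range(8)]
k = [random.getrandbits(32) for _ in range(64)]
w = [random.getrandbits(32) for _ in range(64)]

# Full 64 rounds, then the final addition that yields H7.
s = tuple(init)
for i in range(64):
    s = sha_round(s, k[i], w[i])
h7_full = (init[7] + s[7]) & MASK

# Early exit: the 'e' computed in round 60 only shifts through f, g, h
# untouched during rounds 61-63, so it already IS the final 'h'
# register. Rounds 61-63 never need to run just to know H7.
s = tuple(init)
for i in range(61):
    s = sha_round(s, k[i], w[i])
h7_early = (init[7] + s[4]) & MASK  # s[4] is 'e' after round 60

assert h7_full == h7_early
```

This is exactly why a miner that only tests H7 == 0 can stop after the 61st round.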
|
|
|
|
Clipse
|
|
August 14, 2011, 10:57:41 AM Last edit: August 14, 2011, 11:11:21 AM by Clipse |
|
You may be one, but you are the champion of many Its working great on my lazy spare windows machine, thanks
|
...In the land of the stale, the man with one share is king... >> ClipseWe pay miners at 130% PPS | Signup here : Bonus PPS Pool (Please read OP to understand the current process)
|
|
|
kano
Legendary
Offline
Activity: 4480
Merit: 1800
Linux since 1997 RedHat 4
|
|
August 14, 2011, 11:47:00 AM Last edit: August 14, 2011, 03:39:04 PM by kano |
|
... Well I've been talking to a few people about this but got no real response from anyone, that it was possible ... The optimization you've spelled out is more or less already implemented in most, if not all GPU miners. The way GPU miners currently work is that they check in the GPU code whether h7==0. If it does, the result (a nonce) is returned, otherwise nothing is returned. It is the responsibility of the CPU software to do any further difficulty checks if needed. Since the only thing the GPU miners care about is H7, they completely skip the last 3 rounds (stopping after the 61st round). Also note, that GPU miners don't calculate the first 3 rounds of the first pass. Those rounds are pre-computed, because the inputs to those rounds remains constant for a given unit of getwork. So a GPU miner really only computes a grand total of 122 rounds, minus various other small pre-calculations here and there. OK, so I've got the H's back-to-front (H7 is the first one, not H0) then yeah that makes sense of doing fewer steps yet again than what I said. Still, why not do the share/H6 test in GPU - it would certainly be faster - shares are also rare compared to a job (about 1 in 2 billion) Is that an issue with the CL not being able to be changed based on the difficulty? Yet it could be done as a simple pre-calculated number to AND against the H6 value (extra calculation) when H7 is zero. (I should work out what's the difficulty value high enough to need to test H5 ... though that may be so large it would never be reached) Edit: of course if a nonce (H7=0) is the requirement of a share - then there is no more testing (no testing of H6) required I need to read pushpool more closely to determine exactly what a share is ... unless someone feels like answering that ... 
Edit2: so skipping the first 3 rounds of the first pass is possible (128 - 3 = 125), but there are actually 3.5 rounds not needed at the end of the 2nd pass - though I guess you already do that. Round 60 (in the 2nd pass) becomes only the calculations necessary to get t1 (s1 & ch), since s0 and maj (and of course t2) are unneeded.
|
|
|
|
Diapolo
|
|
August 14, 2011, 04:05:07 PM Last edit: August 14, 2011, 04:42:12 PM by Diapolo |
|
It seems like your latest kernel and mine have problems if BFI_INT gets forced off via (BFI_INT=false) ... it seems the results are invalid every time. Any idea, Phateus? Perhaps #define Ch(x, y, z) bitselect(x, y, z) is not right? Edit: Could be my setup, if no one else has this error. Dia
|
|
|
|
techwtf
|
|
August 14, 2011, 04:46:33 PM |
|
One of my cards (a 5850 at 835 MHz; down-clocking to 810 MHz still failed) seems unable to work well with phatk 2.1/2.2. It dies after a while (< 1 h), and I have to restart the miner (Win32) / reset (Linux). Diapolo's 2011-07-17 kernel is OK at 835 MHz.
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
August 14, 2011, 05:58:50 PM |
|
I tried to figure out why version 2.2 does not work well with VECTORS4, but I could not find the reason, as I do not have enough knowledge. Here are some results I found:
replacing this block of code in version 2.1 with the corresponding block in version 2.2 will make VECTORS4 much slower
#define P1(n) ((rot(W[(n)-2],15u)^rot(W[(n)-2],13u)^((W[(n)-2])>>10U)))
#define P2(n) ((rot(W[(n)-15],25u)^rot(W[(n)-15],14u)^((W[(n)-15])>>3U)))
#define P3(x) W[x-7]
#define P4(x) W[x-16]

//Partial Calcs for constant W values
#define P1C(n) ((rotate(ConstW[(n)-2],15)^rotate(ConstW[(n)-2],13)^((ConstW[(n)-2])>>10U)))
#define P2C(n) ((rotate(ConstW[(n)-15],25)^rotate(ConstW[(n)-15],14)^((ConstW[(n)-15])>>3U)))
#define P3C(x) ConstW[x-7]
#define P4C(x) ConstW[x-16]

//SHA round with built in W calc
#define sharoundW(n) Vals[(3 + 128 - (n)) % 8] += t1W(n); Vals[(7 + 128 - (n)) % 8] = t1W(n) + t2(n);

//SHA round without W calc
#define sharound(n) Vals[(3 + 128 - (n)) % 8] += t1(n); Vals[(7 + 128 - (n)) % 8] = t1(n) + t2(n);

//SHA round for constant W values
#define sharoundC(n) Barrier(n); Vals[(3 + 128 - (n)) % 8] += t1C(n); Vals[(7 + 128 - (n)) % 8] = t1C(n) + t2(n);

//The compiler is stupid... I put this in there only to stop the compiler from (de)optimizing the order
#define Barrier(n) t1 = t1C((n) % 64)
And this block is not the only thing that causes the problem.
I am guessing it has something to do with the rotC function (it is a guess only).
|
|
|
|
fpgaminer
|
|
August 14, 2011, 06:30:05 PM |
|
Still, why not do the share/H6 test in GPU - it would certainly be faster - shares are also rare compared to a job (about 1 in 2 billion) Is that an issue with the CL not being able to be changed based on the difficulty?

There are several reasons:
- 99.99% of the time the mining software only needs to look for Difficulty 1 (a share, H7 == 0), so there is rarely a need to check for anything else.
- GPUs absolutely hate branching; a full difficulty check involves many branches. Smaller GPU programs are better GPU programs.
- The CPU runs in parallel to the GPU. Since the CPU is fully capable of checking for extra difficulty levels, why would you burden the GPU with such work?
- The CPU should double-check the GPU's results anyway, to detect errors. Since the CPU will thus be recomputing the full two SHA-256 passes for each result returned by the GPU, it again makes sense to only check for higher difficulties on the CPU.
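As a rough sketch of that CPU-side split (helper names here are hypothetical, not from any miner's source): the GPU returns a nonce whenever H7 == 0, and the CPU recomputes the double hash to verify it and apply any higher-difficulty target.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    # Bitcoin-style double SHA-256
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def h7_is_zero(digest: bytes) -> bool:
    # The state words h0..h7 are serialized big-endian, so H7 occupies
    # the final four bytes of the 32-byte digest.
    return digest[28:32] == b"\x00\x00\x00\x00"

def cpu_recheck(header76: bytes, nonce: int) -> bool:
    # Hypothetical CPU-side double check of a GPU-returned nonce:
    # recompute both SHA-256 passes, then re-test the difficulty-1 rule
    # (a real miner would also compare against the full target here).
    digest = double_sha256(header76 + struct.pack("<I", nonce))
    return h7_is_zero(digest)
```

The GPU kernel only ever performs the H7 == 0 test; everything else in this sketch lives on the CPU, which is exactly the division of labour described above.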
|
|
|
|
Diapolo
|
|
August 14, 2011, 07:47:31 PM |
|
It seems like your latest kernel and mine have problems if BFI_INT gets forced off via (BFI_INT=false) ... it seems the results are invalid every time. Any idea, Phateus? Perhaps #define Ch(x, y, z) bitselect(x, y, z) is not right? Edit and solved - the non-BFI_INT Ch has to be: #define Ch(x, y, z) bitselect(z, y, x) If you want to thank someone, you can donate to 1LY4hGSY6rRuL7BQ8cjUhP2JFHFrPp5JVe (Vince -> who did a GREAT job during my kernel development)! Dia
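The operand-order fix can be sanity-checked outside OpenCL. Below is a Python model of my own (the bitselect function is a stand-in written from its documented semantics - each result bit comes from the second argument where the third argument has a 1-bit - not the OpenCL builtin itself), showing that SHA-256's Ch(x, y, z) = (x & y) ^ (~x & z) equals bitselect(z, y, x), not bitselect(x, y, z):

```python
import random

MASK = 0xFFFFFFFF

def ch(x, y, z):
    # SHA-256 "choose" function
    return ((x & y) ^ (~x & z)) & MASK

def bitselect(a, b, c):
    # Model of OpenCL bitselect: result takes bits of b where c has
    # 1-bits, and bits of a where c has 0-bits.
    return ((b & c) | (a & ~c)) & MASK

random.seed(0)
for _ in range(1000):
    x, y, z = (random.getrandbits(32) for _ in range(3))
    assert ch(x, y, z) == bitselect(z, y, x)  # the corrected operand order
```

The equivalence holds because (x & y) and (~x & z) have no overlapping bits, so the XOR in Ch is the same as the OR that bitselect computes.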
|
|
|
|
RoadStress
Legendary
Offline
Activity: 1904
Merit: 1007
|
|
August 15, 2011, 01:24:41 PM |
|
Sent another donation your way. Look forward to your work on cgminer.
+1
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 15, 2011, 05:54:59 PM |
|
It seems like your latest kernel and mine have problems if BFI_INT gets forced off via (BFI_INT=false) ... it seems the results are invalid every time. Any idea Phateus? Perhaps #define Ch(x, y, z) bitselect(x, y, z) is not right? Edit and solved, non BFI_INT Ch has to be: #define Ch(x, y, z) bitselect(z, y, x) If you want to thank someone, you can donate to 1LY4hGSY6rRuL7BQ8cjUhP2JFHFrPp5JVe (Vince -> who did a GREAT job during my kernel development)! Dia

Awesome, thank you! I was under the assumption that BFI_INT and bitselect were the same operation; apparently the operand order is different. I will fix it in my next release. Thank you everyone for your support (both in BTC and discussion). I should have a drop-in version of the kernel available for cgminer soon, so anyone wanting to try out the pre-release, I'll be posting it tonight.

@BOARBEAR *sigh*.... come on man... do you even read my posts? There is no single cause of the bad performance. 2.2 executes fewer instructions and uses fewer registers than 2.1, but as I said, there is some weird issue which makes OpenCL slower behind the scenes. My best guess is that it has to do with register allocation. The GPU has a total of 256x32x4 registers (8192 UINT4). At most, there are 256 threads per workgroup (8192/256 = 32 registers per thread). Using VECTORS, the number of registers is far below this number, therefore the hardware can operate on the maximum allowable number of threads at a time. However, when you compile with VECTORS4, there are more than 32 registers per thread. OpenCL must determine how to allocate the threads, and the utilization of the video card is sub-optimal. Below is a diagram of what I think is going on.
4 thread groups running simultaneously.

VECTORS (2 running at a time):
[1111111122222222]
[3333333344444444]

Using an optimal version of VECTORS4, it would look much like this (double the work is done per thread):
[1111111111111111]
[2222222222222222]
[3333333333333333]
[4444444444444444]

Now, making it use slightly less resources will make it slower, because the threads are out of sync and there will be overhead in syncing and tracking data within threadgroups:
[1111111111111112]
[2222222222222233]
[3333333333333444]
[4444444444445555]

Now, I may be waaaaay off here, but something like this is what makes sense to me. Especially since this would explain why decreasing the memory actually improves performance in some cases (by forcing synchronization). Anyway, enough of my off-topic analysis... I will release a version that will work with cgminer early next week (looks like he has already implemented Diapolo's old version).
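A back-of-envelope model of the register argument above, using the post's own figures (256 x 32 x 4 scalar registers, i.e. 8192 uint4 registers) rather than vendor documentation; the 40-registers-per-thread value is a made-up example of a VECTORS4-like footprint:

```python
# Register budget per the post, not official AMD specs.
TOTAL_UINT4_REGS = (256 * 32 * 4) // 4   # 8192 uint4 registers

def max_concurrent_threads(uint4_regs_per_thread: int) -> int:
    # How many threads can be kept in flight at once given each
    # thread's register footprint.
    return TOTAL_UINT4_REGS // uint4_regs_per_thread

print(max_concurrent_threads(32))  # 256: VECTORS fits the full workgroup
print(max_concurrent_threads(40))  # 204: a >32-register kernel runs fewer threads
```

Once the per-thread footprint crosses 32 registers, the concurrent thread count drops below the 256-thread workgroup, which is the misalignment sketched in the third diagram.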
Looking forward to this !! Just sent one coin your way, and there's another once the work is done. We are hitting a ceiling with opencl in general (and perhaps with the current hardware). In one of the mining threads, vector76 and I were discussing the theoretical limit on hashing speeds... and unless there is a way to make the Maj() operation take 1 instruction, we are within about a percent of the theoretical limit on minimum number of instructions in the kernel unless we are missing something.
Out of curiosity, have you looked into trying to code a version directly in AMD's assembly language and bypassing OpenCL entirely ? (I'm thinking: since we're already patching the ELF output, this seems like the logical next step ) Also, have you looked at AMD CAL ? I know this is what ufasoft's miner uses ( https://bitcointalk.org/index.php?topic=3486.500), and also what zorinaq considers the most efficient way to access AMD hardware (somwhere on http://blog.zorinaq.com) Replacing one instruction in the ELF with another that uses the exact same inputs/outputs is one thing, but manually editing the ASM code is another thing entirely. Besides, with the work that has been done the GPU is already at >99% of the theoretical maximum throughput. (ALU packing) And as said above, we are also close to the theoretical minimum number of instructions to correctly run SHA256. Also, if you look near the end of the hdminer thread you will notice that users are able to get the same hashrates from phatk on 69xx. For 58xx and other VLIW5 cards phatk is significantly faster than hdminer. If that's the best he can do with CAL then I don't see any reason to use it. hdminer had a substantial performance advantage back in March/April, but with basically every miner supporting BFI_INT this is no longer the case. Agreed, the kernel itself is pretty optimal. I might look into calling lower level CAL functions to manage the (OpenCL compiled) GPU threads (instead of using openCL), but I doubt this will give any speedup (although, I might be able to reduce the CPU overhead).
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 16, 2011, 02:35:42 AM |
|
Alright... I'm getting a little delayed on the prerelease for cgminer... mingw is a pain in the ass.. trying a full cygwin install next...
Bear with me, hopefully I'll get it running tomorrow.
-Phateus
|
|
|
|
-ck
Legendary
Offline
Activity: 4088
Merit: 1631
Ruu \o/
|
|
August 16, 2011, 12:07:38 PM |
|
Alright... I'm getting a little delayed on the prerelease for cgminer... mingw is a pain in the ass.. trying a full cygwin install next...
Bear with me, hopefully I'll get it running tomorrow.
-Phateus
You could just tell me what to do to interface it with cgminer (i.e. what new variables you want) and I'd copy most of your kernel across. Only the return code and define macros are actually different in cgminer in the kernel itself.
|
Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel 2% Fee Solo mining at solo.ckpool.org -ck
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
August 16, 2011, 03:20:46 PM |
|
It seems like your latest kernel and mine have problems if BFI_INT gets forced of via (BFI_INT=false) ... it seems the results are invalid every time. Any idea Phateus? Perhaps #define Ch(x, y, z) bitselect(x, y, z) is not right? Edit and solved, non BFI_INT Ch has to be: #define Ch(x, y, z) bitselect(z, y, x) If you want to thank someone, you can donate to 1LY4hGSY6rRuL7BQ8cjUhP2JFHFrPp5JVe (Vince -> who did a GREAT job during my kernel development)! Dia Awesome, thank you! I was under the assumption that BFI_INT and bitselect were the same operation, apparently, the operand order is different. I will fix it in my next release. Thank you everyone for your support (both in BTC and discussion). I should have a drop-in version of the kernel available for cgminer soon, so anyone wanting to try out the pre-release, I'll be posting it tonight. @BOARBEAR *sigh*.... come on man... do you even read my posts? There is no single cause of the bad performance. 2.2 executes less instructions and uses less registers than 2.1, but as I said... there is some weird issue which makes openCL slower behind the scenes. My best guess is that it has to do with register allocation. The GPU has a total of 256x32x4 registers (8192 UINT4). At the most, there are 256 threads per workgroup (8192/256 = 32 registers per thread). Using VECTORS, the number of registers is far below this number, therefore the hardware can operate on the maximum allowable threads at a time. However, when you compile with VECTORS4, there is more than 32 registers per thread. OpenCL must determine how to allocate the threads, and the utilization of the video card is sub-optimal) Below is a diagram of what I think is going on. 
4 thread groups running simultaneously VECTORS (2 running at a time) [1111111122222222] [3333333344444444] using an optimal version of VECTORS4, it would look much like this (double the work is done per thread) [1111111111111111] [2222222222222222] [3333333333333333] [4444444444444444] now making it use slightly less resources will make it slower because the threads are out of sync and there will be overhead in syncing and tracking data within threadgroups: [1111111111111112] [2222222222222233] [3333333333333444] [4444444444445555] Now, I may be waaaaay off here, but something like this is what makes sense to me. Especially, since this would explain why decreasing the memory actually improves performance in some cases (by forcing synchronization). Anyway, enough of my off-topic analysis... I will release a version that will work with cgminer early next week (looks like he has already implemented diapolo's old version).
Looking forward to this !! Just sent one coin your way, and there's another once the work is done. We are hitting a ceiling with opencl in general (and perhaps with the current hardware). In one of the mining threads, vector76 and I were discussing the theoretical limit on hashing speeds... and unless there is a way to make the Maj() operation take 1 instruction, we are within about a percent of the theoretical limit on minimum number of instructions in the kernel unless we are missing something.
Out of curiosity, have you looked into trying to code a version directly in AMD's assembly language and bypassing OpenCL entirely ? (I'm thinking: since we're already patching the ELF output, this seems like the logical next step ) Also, have you looked at AMD CAL ? I know this is what ufasoft's miner uses ( https://bitcointalk.org/index.php?topic=3486.500), and also what zorinaq considers the most efficient way to access AMD hardware (somwhere on http://blog.zorinaq.com) Replacing one instruction in the ELF with another that uses the exact same inputs/outputs is one thing, but manually editing the ASM code is another thing entirely. Besides, with the work that has been done the GPU is already at >99% of the theoretical maximum throughput. (ALU packing) And as said above, we are also close to the theoretical minimum number of instructions to correctly run SHA256. Also, if you look near the end of the hdminer thread you will notice that users are able to get the same hashrates from phatk on 69xx. For 58xx and other VLIW5 cards phatk is significantly faster than hdminer. If that's the best he can do with CAL then I don't see any reason to use it. hdminer had a substantial performance advantage back in March/April, but with basically every miner supporting BFI_INT this is no longer the case. Agreed, the kernel itself is pretty optimal. I might look into calling lower level CAL functions to manage the (OpenCL compiled) GPU threads (instead of using openCL), but I doubt this will give any speedup (although, I might be able to reduce the CPU overhead). I understand what you are saying. Perhaps version2.1 will be the last version that works well with VECTORS4. You said the work that has been done on the GPU is already at >99% of the theoretical maximum throughput. But VECTORS4 alone gives me about 1.5% boost.(contraindication?) That is why I tried hard to find a way to make VECTORS4 work so that the future versions can use it.
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 16, 2011, 05:49:36 PM |
|
Alright... I'm getting a little delayed on the prerelease for cgminer... mingw is a pain in the ass..
Yeah, mingw is most certainly a giant PITA. To compile cgminer with mingw, the trick is to use msys and get pkg-config and libcurl installed properly. For pkg-config, the best is to install this: http://ftp.gnome.org/pub/gnome/binaries/win32/gtk+/2.22/gtk+-bundle_2.22.1-20101227_win32.zip
Once you have that, libcurl is rather easy.
trying a full cygwin install next...
Mmmh. Not sure this'll get you very far. If your main dev box is windows and your goal is to integrate phatk into cgminer, your best bet is probably to install a small virtual machine (qemu or vmplayer) running ubuntu inside your windows box and work on cgminer directly on Linux in there. That's exactly what I do (the other way round) when I have to try windows-specific things or a piece of code.

Yeah, I think I will stay away from using the mingw environment from now on... Cygwin was easy as pie. No issues; I think I can cross-compile from cygwin using mingw if I want native Win32 support. Apparently, getting pkg-config (I think) working without POSIX support is terrible. I got my kernel working around 5am last night, linking against the cygwin dlls, so tonight I will release the changes when I get home.

Alright... I'm getting a little delayed on the prerelease for cgminer... mingw is a pain in the ass.. trying a full cygwin install next...
Bear with me, hopefully I'll get it running tomorrow.
-Phateus
You could just tell me what to do to interface it with cgminer (i.e. what new variables you want) and I'd copy most of your kernel across. Only the return code and define macros are actually different in cgminer in the kernel itself.

Yeah, if you want, I can send you the changes tonight so you can put it in your release. The only modifications I had to make to the kernel are changing VECTORS to VECTORS2, hardcoding OUTPUT_SIZE = 4095, and hardcoding WORKSIZE = 256 (I really do need this passed to the kernel, though). Also, my kernel only uses WORKSIZE+1 entries in the buffer; it would be better if you made the buffer that size. As for the changes in the miner, I think I only had to change the precalc_hash() function, the kernel input and output file names, and the queue_phatk_kernel() function. What I will do tonight is add KL_PHATK_2_2 to the cl_kernel enum, copy the function code, add the corresponding command line argument (right now I have just replaced PHATK with mine), and add -DWORKSIZE= arguments for the kernel. Anyway, I will give you more details tonight when I am in front of my code. My fork is https://github.com/Phateus/cgminer - I will upload the changes tonight (as soon as I figure out git... never used that before).

-Phateus

P.S. thanks for the easy to read code
|
|
|
|
-ck
Legendary
Offline
Activity: 4088
Merit: 1631
Ruu \o/
|
|
August 16, 2011, 10:18:23 PM |
|
Seems to me like you've got it all under control, so I'll leave you to finish up. Thanks for your involvement. However I don't want multiple phatk kernels so just replace the current one in-situ and don't bother enumming a different kernel. As for the output code, I prefer to use 4k so feel free to do it your way, but be aware I plan to change it back.
|
Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel 2% Fee Solo mining at solo.ckpool.org -ck
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 17, 2011, 04:13:55 AM |
|
Seems to me like you've got it all under control, so I'll leave you to finish up. Thanks for your involvement. However I don't want multiple phatk kernels so just replace the current one in-situ and don't bother enumming a different kernel. As for the output code, I prefer to use 4k so feel free to do it your way, but be aware I plan to change it back.
Ok, the source is up... I am trying to figure out how to compile this for windows without the cygwin layer (I really haven't done any of this before... I am soooo lost)... https://github.com/Phateus/cgminer
ckolivas... if you want to merge this into your code at some point, let me know what I have to do... I literally installed git yesterday, and there is only so much you can learn on the internet in a day ;-) As for the buffer, my kernel only uses WORKSIZE+1 parts of your buffer, but I left the buffer size intact.
|
|
|
|
-ck
Legendary
Offline
Activity: 4088
Merit: 1631
Ruu \o/
|
|
August 17, 2011, 05:14:20 AM |
|
Seems to me like you've got it all under control, so I'll leave you to finish up. Thanks for your involvement. However I don't want multiple phatk kernels so just replace the current one in-situ and don't bother enumming a different kernel. As for the output code, I prefer to use 4k so feel free to do it your way, but be aware I plan to change it back.
Ok, the source is up... I am trying to figure out how to compile this for windows without the cygwin layer (I really haven't done any of this before... I am soooo lost)... https://github.com/Phateus/cgminerckolivas... if you want to merge this into your code at some point, let me know what I have to do... I literally installed git yesterday, and there is only so much you can learn on the internet in a day ;-) As for the buffer, my kernel only uses WORKSIZE+1 parts of your buffer, but I left the buffer size intact. Very good work. Nice of you to figure out how to do git and all as well. Don't worry about the merge, I've taken care of everything and cherry picked your changes as I needed to. I've modified a few things too to be consistent with cgminer's code and there is definitely a significant speed advantage thanks to your changes. Note that if you're ever working on git doing your own changes, do them to a branch that's not called master as you may end up making it impossible to pull back my changes since I won't necessarily take all your code. Thanks again, and I'm sure the cgminer users will be most grateful.
|
Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel 2% Fee Solo mining at solo.ckpool.org -ck
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 17, 2011, 05:51:08 AM |
|
Seems to me like you've got it all under control, so I'll leave you to finish up. Thanks for your involvement. However I don't want multiple phatk kernels so just replace the current one in-situ and don't bother enumming a different kernel. As for the output code, I prefer to use 4k so feel free to do it your way, but be aware I plan to change it back.
Ok, the source is up... I am trying to figure out how to compile this for windows without the cygwin layer (I really haven't done any of this before... I am soooo lost)... https://github.com/Phateus/cgminerckolivas... if you want to merge this into your code at some point, let me know what I have to do... I literally installed git yesterday, and there is only so much you can learn on the internet in a day ;-) As for the buffer, my kernel only uses WORKSIZE+1 parts of your buffer, but I left the buffer size intact. Very good work. Nice of you to figure out how to do git and all as well. Don't worry about the merge, I've taken care of everything and cherry picked your changes as I needed to. I've modified a few things too to be consistent with cgminer's code and there is definitely a significant speed advantage thanks to your changes. Note that if you're ever working on git doing your own changes, do them to a branch that's not called master as you may end up making it impossible to pull back my changes since I won't necessarily take all your code. Thanks again, and I'm sure the cgminer users will be most grateful. Ah, that's how that works... good to know. This whole git seems really useful for working together. Thanks -Phateus
|
|
|
|
-ck
Legendary
Offline
Activity: 4088
Merit: 1631
Ruu \o/
|
|
August 17, 2011, 06:44:39 AM |
|
If you want to restore your tree without losing your changes, create a new branch and reset the master to the last one before your commits.
git checkout master
git branch newphatk
git reset --hard 58eb4d58599521933a3fef599e1dcba4f996dadc
git pull
that will pull my changes into the master branch and your personal changes will be in newphatk. Unfortunately your github account has a messed up master now so
git push -f
will force the changes to propagate. Do not use this command normally as it makes it impossible for people pulling from your branch to keep in sync.
|
Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel 2% Fee Solo mining at solo.ckpool.org -ck
|
|
|
1984
Newbie
Offline
Activity: 51
Merit: 0
|
|
August 18, 2011, 06:28:30 AM |
|
excellent work on the cgminer, I'm seeing about the same performance as phoenix and am enjoying the fancy cg features. Donation on its way.
|
|
|
|
iopq
|
|
August 18, 2011, 08:23:43 AM |
|
I'm getting hardware errors on phatk 2.2, didn't get them on diapolo's or 2.1
the three are about indistinguishable in terms of speed for me
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
August 18, 2011, 05:07:53 PM |
|
I'm getting hardware errors on phatk 2.2, didn't get them on diapolo's or 2.1
the three are about undistinguishable in terms of speed for me
Are you using BFI_INT? If not, there is a bug in the 2.2 kernel - Vince found that in the kernel.cl file you have to replace #define Ch(x, y, z) bitselect(x,y,z) on line 78 with #define Ch(x, y, z) bitselect(z, y, x) I haven't gotten around to releasing a new version, but if you make the change yourself, it should fix it. -Phateus
|
|
|
|
iopq
|
|
August 19, 2011, 01:45:40 AM |
|
I am using BFI_INT; the hardware errors are kind of random. I should mention I'm using fpgaminer's poclbm fork for this, so maybe it has something to do with it.
|
|
|
|
-ck
Legendary
Offline
Activity: 4088
Merit: 1631
Ruu \o/
|
|
August 19, 2011, 01:48:38 AM |
|
Hey Phateus, just a heads-up: your cgminer code only worked for 2 vectors. I've updated it in my git tree to work with 1 and 4. Simple enough change.
|
Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel 2% Fee Solo mining at solo.ckpool.org -ck
|
|
|
iopq
|
|
September 04, 2011, 12:47:26 PM |
|
how can I generate this kind of a graph for my 5850 and 5750? I'm having an argument with Diablo about the best memory clocks vs. core clocks
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
September 05, 2011, 10:47:39 PM |
|
how can I generate this kind of a graph for my 5850 and 5750? I'm having an argument with Diablo about the best memory clocks vs. core clocks go to google docs, make a spreadsheet, test all the speeds and options on your end manually (this part will be extremely time consuming for a high resolution graph), and put the data in yourself, and generate graph. presto pronto.
|
|
|
|
iopq
|
|
September 06, 2011, 10:26:13 AM |
|
how can I generate this kind of a graph for my 5850 and 5750? I'm having an argument with Diablo about the best memory clocks vs. core clocks

go to google docs, make a spreadsheet, test all the speeds and options on your end manually (this part will be extremely time consuming for a high resolution graph), and put the data in yourself, and generate graph. presto pronto.

Surely this can be done programmatically - it's just changing clocks, measuring speeds for x seconds, and averaging. Although on my cards some values will make them unstable, lol
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
September 06, 2011, 04:36:44 PM |
|
how can I generate this kind of a graph for my 5850 and 5750? I'm having an argument with Diablo about the best memory clocks vs. core clocks go to google docs, make a spreadsheet, test all the speeds and options on your end manually (this part will be extremely time consuming for a high resolution graph), and put the data in yourself, and generate graph. presto pronto. surely, this can be done programmatically, it's just changing clocks and measuring speeds for x seconds and averaging although on my cards some values will make them unstable, lol well the OP already said he did it manually. you're free to write a program to do it automatically, or hire someone to write one for you.
|
|
|
|
Lord F(r)og
Donator
Sr. Member
Offline
Activity: 477
Merit: 250
|
|
September 25, 2011, 06:28:38 PM |
|
donated knickknack
|
|
|
|
phelix
Legendary
Offline
Activity: 1708
Merit: 1019
|
|
September 29, 2011, 07:31:57 AM |
|
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
September 29, 2011, 03:21:15 PM |
|
The 1354 OPs are for two double hashes: SHA256(SHA256(Block_Header1)) and SHA256(SHA256(Block_Header2)), so 677 per double hash. They aren't completely full hashes, though, since the first and last few rounds (a few percent) have been optimized out. Also, each ALU OP is a VLIW5 (very long instruction word) instruction containing up to 5 integer operations that run simultaneously, so depending on how you count, that's ~3375 integer operations or 677 VLIW5 instructions. Hope this helps; let me know if you need any more help with this. I am interested in how this turns out.
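As a quick sanity check, the arithmetic behind those figures works out like this (3385 is the fully packed upper bound; the actual count is closer to ~3375 because, as noted later in the thread, some VLIW5 instructions carry only 4 operations):

```python
alu_ops = 1354                       # ALU OPs reported for TWO double hashes (phatk 2.2, HD5870)
per_double_hash = alu_ops // 2       # 677 VLIW5 instructions per double hash
upper_bound = per_double_hash * 5    # 3385 if every VLIW5 slot were fully packed
print(per_double_hash, upper_bound)  # some slots hold only 4 ops, hence ~3375 in practice
```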
|
|
|
|
phelix
Legendary
Offline
Activity: 1708
Merit: 1019
|
|
September 30, 2011, 03:04:31 PM Last edit: September 30, 2011, 03:14:35 PM by phelix |
|
First you shocked me with TWO double hashes, but ~3375 integer operations per hash is just perfect.

edit: did you mean 3385??
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
October 01, 2011, 10:29:26 PM |
|
Quote:
first you shocked me with TWO double hashes but ~3375 integer operations per hash is just perfect edit: did you mean 3385??

It's actually closer to 3375, because some VLIW5 instructions only have 4 operations in them. I can get a more exact number if needed, but it's kind of a PITA cuz AMD's software won't actually tell you outright.
|
|
|
|
phelix
Legendary
Offline
Activity: 1708
Merit: 1019
|
|
October 02, 2011, 08:33:46 PM |
|
Quote:
It's actually closer to 3375 because some VLIW5 instructions only have 4 operations in them. I can get a more exact number if needed, but its kinda a PITA cuz AMD's software won't actually tell you outright.

Ok, thanks for elaborating. I used 3385 in the last calc but will just say it makes up for all the 6xxx cards.
|
|
|
|
Crypt_Current
|
|
October 06, 2011, 05:58:56 PM |
|
Is the latest version of phatk the one that's included in LinuxCoin final? I could probably check somehow, as I am a LinuxCoin user... I just don't know much about Linux and don't want to poke at my rig while it's on a roll...
|
|
|
|
Lord F(r)og
Donator
Sr. Member
Offline
Activity: 477
Merit: 250
|
|
October 09, 2011, 01:47:33 PM |
|
Quote from: Phateus
If it works out for you and you're feeling generous, any donations would be greatly appreciated so I can continue to put out bitcoin related software: 124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv
-Phateus

It worked out for me and I'm feeling generous, so I donated for further development.
|
|
|
|
Crypt_Current
|
|
October 09, 2011, 04:29:51 PM |
|
Quote from: Phateus
If it works out for you and you're feeling generous, any donations would be greatly appreciated so I can continue to put out bitcoin related software: 124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv
-Phateus

+1. I would if I could. I wonder how many BTC enthusiasts live below the USA's so-called "poverty line"?
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
January 19, 2012, 08:58:16 AM |
|
Bumping an ancient thread. Just noting that phatk 2.2 still holds the crown for fastest kernel on 2.1, 2.5, and 2.6 SDKs for VLIW5 tech (radeon 5xxx and 60xx-68xx). I hope phateus can come back and do even moar tweaks for moar speed!
|
|
|
|
Phateus (OP)
Newbie
Offline
Activity: 52
Merit: 0
|
|
January 21, 2012, 08:44:43 PM |
|
I'm checking back in after being gone for so long... I just downloaded the 2.6 SDK and it destroys my optimization... I will see if there is anything I can do without completely rewriting. Stay tuned and I should have more info later this week. P.S. Thanks to everyone who has donated to me in the past, I have been busy lately, but I have not forgotten. -Phateus
|
|
|
|
ssateneth
Legendary
Offline
Activity: 1344
Merit: 1004
|
|
January 24, 2012, 02:43:21 AM |
|
Quote from: Phateus
I'm checking back in after being gone for so long... I just downloaded the 2.6 SDK and it destroys my optimization... I will see if there is anything I can do without completely rewriting. Stay tuned and I should have more info later this week.

It should be noted that the current phatk 2.2 still runs amazingly fast on the 2.1 SDK; people suggested poclbm, but phatk2 still runs fastest on my system with 2.1 as well as 2.4/2.5. At least it does for me; I use VLIW5 hardware. Perhaps keep a 2.1-2.5 kernel around for those who want to keep using it, and a separate 2.6-optimized kernel for those who need it for GCN-architecture hardware, or for people who game with their GPUs (they'll probably be running 1 GHz+ on memory, which is better suited to VECTORS4 since it works well with high memory frequencies).
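For reference, the VECTORS4 setup mentioned here can be launched by adapting the OP's example command from the first post; per the OP's note, VECTORS4 usually needs WORKSIZE dropped to 128 or 64. The pool URL and credentials below are placeholders.

```shell
phoenix.exe -u http://user:password@pool.example.com:8334/ \
    DEVICE=0 BFI_INT VECTORS4 AGGRESSION=12 WORKSIZE=128 -k phatk
```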
|
|
|
|
BOARBEAR
Member
Offline
Activity: 77
Merit: 10
|
|
January 24, 2012, 04:38:31 AM |
|
Note: the new SDK version works best with WORKSIZE=64 on a 5870.
|
|
|
|
|