Bitcoin Forum

Bitcoin => Mining software (miners) => Topic started by: Phateus on May 11, 2011, 05:05:55 PM



Title: Modified Kernel for Phoenix 1.5
Post by: Phateus on May 11, 2011, 05:05:55 PM
phatk Kernel for Phoenix 1.5

I have started working on my Phoenix kernel again, so I should be putting out regular updates.  Anyone with bugs, questions or suggestions, post in the thread and I'll try to look into them.  After an update, if you are still having issues, please feel free to post again, since it is hard to track which bugs I have fixed and which are still out there.

Version 2.2: https://sourceforge.net/projects/phatk/files/phatk-2.2.zip/download (https://sourceforge.net/projects/phatk/files/phatk-2.2.zip/download)
Version 2.1: https://sourceforge.net/projects/phatk/files/phatk-2.1.zip/download (https://sourceforge.net/projects/phatk/files/phatk-2.1.zip/download)
Version 2.0: https://sourceforge.net/projects/phatk/files/phatk-2.0.zip/download (https://sourceforge.net/projects/phatk/files/phatk-2.0.zip/download)
Version 1.0: https://sourceforge.net/projects/phatk/files/phatk-1.0.zip/download (https://sourceforge.net/projects/phatk/files/phatk-1.0.zip/download)

If you are using version 2.0, make sure you supply a valid WORKSIZE option (such as "WORKSIZE=256")

Kernel performance (BFI_INT active / APP KernelAnalyzer CAL 11.7 profile):
HD5870 (Also any other 5xxx or 68xx card)
Diapolo 2011-07-17: 1374 ALU OPs
Version 1.0: 1418 ALU OPs
Version 2.0 (7/29/11): 1363 ALU OPs
Version 2.1 (8/2/11): 1359 ALU OPs
Version 2.2 (8/8/11): 1354 ALU OPs

HD6970
Diapolo 2011-07-17: 1698 ALU OPs
Version 1.0: 1747 ALU OPs
Version 2.0: 1691 ALU OPs
Version 2.1: 1692 ALU OPs
Version 2.2: 1688 ALU OPs

As of version 2.1, phatk now has command line option "VECTORS4" which can be used instead of "VECTORS".
This option works on 4 nonces per thread instead of 2 and may increase speed mainly if you do not underclock your memory, but feel free to try it out.  Note that if you use this, you will more than likely have to decrease your WORKSIZE to 128 or 64.

Below is a graph I came up with for my 5870 with the core clocked at 950.
V1 is the speed with no VECTORS option enabled, V2 is with using the standard "VECTORS" and V4 is using the new "VECTORS4" command line option.  The numbers with them show the WORKSIZE.
https://spreadsheets.google.com/spreadsheet/oimg?key=0Ar69rrd0ZESNdGU3NElvU3Q0eFYzYkhuUFJUbkVraUE&oid=1&zx=ks7ngj3nt03g

To install, unzip into Phoenix's kernels folder (files should be in [phoenix root]/kernels/phatk/)

I use the command:
phoenix.exe -u http://user:password@www.bitcoinpool.com:8334/ DEVICE=0 BFI_INT VECTORS AGGRESSION=12 WORKSIZE=256 -k phatk

Lastly, I am keeping track of new features that I have thought of adding to my kernel (not sure what is feasible yet, but these are just things I am looking into).  If anyone has any suggestions, I will add them to the list.  If any of these sound useful to you, let me know so I know where to put my efforts:
  • Precompiled kernels for SDK 2.4, so that any version of the SDK will get the full speed of the latest SDK
  • Auto-optimize which will iterate through all of the combinations of command line options to give you the fastest hashrate
  • Logging
  • Web Interface for controlling miners and viewing hashrate graphs (this will probably have to be a separate project and would likely slow my progress on optimizing)

If it works out for you and you're feeling generous, any donations would be greatly appreciated so I can continue to put out bitcoin related software:
124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv

-Phateus


Title: Re: Modified Kernel for Phoenix 1.4
Post by: mitak64 on May 11, 2011, 05:21:45 PM
Just tried it.

phoenix 1.4 aggression=11 on HD5850 @865/300 - 340mh/s
phatk 1.4   aggression=11 on HD5850 @865/300 - 338mh/s


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Gnaffel on May 11, 2011, 05:48:11 PM
on my test machine HD5570 OC@700Mhz

poclbm          72MH/s no desktop lag
phoenix          73MH/s no desktop lag
hashkill          75MH/s sometimes slow mouse
phoenix-phatk 76MH/s very slow desktop environment/must be good for headless


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Convery on May 11, 2011, 07:10:00 PM
5850 1050/300:
Phoenix - 402Mhash
Phatk - 415-417Mhash

Quite nice ;3


Title: Re: Modified Kernel for Phoenix 1.4
Post by: bolapara on May 11, 2011, 07:19:04 PM
phoenix 1.46 agg=13 bfi_int vectors

card 1:

5870, 995 core, 300 mem

poclbm - 431
phatk - 438

card 2:

5870, 900 core, 300 mem

poclbm - 389
phatk - 397

Nice little bump.  :)


Title: Re: Modified Kernel for Phoenix 1.4
Post by: EPiSKiNG on May 11, 2011, 07:19:58 PM
5870 @ 970core 300mem
Guiminer-2011.05.01: 431.5MH/s (--platform=0 -v -w 256 -f 0)
PhatK: 426.62MH/s (phoenix.exe -u http://XXX:XXX@deepbit.net:8332/;askrate=15 PLATFORM=0 DEVICE=1 BFI_INT VECTORS AGGRESSION=12 -k phatk)

-EP






Title: Re: Modified Kernel for Phoenix 1.4
Post by: Kick on May 11, 2011, 07:48:47 PM
5870 @ 970core 300mem
Guiminer-2011.05.01: 431.5MH/s (--platform=0 -v -w 256 -f 0)
PhatK: 426.62MH/s (phoenix.exe -u http://XXX:XXX@deepbit.net:8332/;askrate=15 PLATFORM=0 DEVICE=1 BFI_INT VECTORS AGGRESSION=12 -k phatk)

-EP






not really a correct comparison. you're missing the w256 flag for phatk

actually, i take that back. default should be max the device can support.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Phateus on May 11, 2011, 07:57:30 PM
I see most of you are running 300 MHz memory.  One thing that I've noticed from messing around with everything is that 300 MHz memory can be too slow.  I found that with a 1000 MHz core, 330 MHz was optimal for the memory.  At really low memory clocks (especially with my kernel), the speed is limited by the memory.  A good estimate for memory speed (for both the 5850 and 5870) was 1/3 the core speed.

Happy mining

-Phateus


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Convery on May 11, 2011, 08:25:53 PM
I see most of you are running 300 MHz memory.  One thing that I've noticed from messing around with everything is that 300 MHz memory can be too slow.  I found that with a 1000 MHz core, 330 MHz was optimal for the memory.  At really low memory clocks (especially with my kernel), the speed is limited by the memory.  A good estimate for memory speed (for both the 5850 and 5870) was 1/3 the core speed.
5850 peak Mhash:
1055/300 - 417Mhash
1055/350 - 419Mhash
1055/375 - 420Mhash
1055/400 - 416Mhash - Unstable.
1055/425 - 417Mhash


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Tyran on May 11, 2011, 09:14:13 PM
Unfortunately no improvement on a 5770:
@935/300 -k poclbm: 207.5
@935/300 -k phatk: 202.5
Higher memory clocks only decrease performance more.
Might be because I'm running SDK 2.1, do you think it would make up for the ~5% loss going to 2.4?


Title: Re: Modified Kernel for Phoenix 1.4
Post by: nster on May 11, 2011, 09:18:18 PM
http://bitcointalk.org/index.php?topic=4292.0

I run 1020/344

Went from 301 Mh/s to 310 Mh/s with phatk


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Phateus on May 11, 2011, 10:04:52 PM
Unfortunately no improvement on a 5770:
@935/300 -k poclbm: 207.5
@935/300 -k phatk: 202.5
Higher memory clocks only decrease performance more.
Might be because I'm running SDK 2.1, do you think it would make up for the ~5% loss going to 2.4?

Yeah, at least for my kernel, which was specifically written for 2.4.  The optimizations are mainly tricking the compiler into doing what I want it to do, so using 2.4 should increase performance a fair amount, but since I don't actually have different SDKs on any of my machines, I cannot test it.  Might be worth a shot.  It's always a toss-up whether it's worth the hassle/downtime to tinker with your miner.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: EPiSKiNG on May 11, 2011, 10:26:47 PM
5870 @ 970core 300mem
Guiminer-2011.05.01: 431.5MH/s (--platform=0 -v -w 256 -f 0)
PhatK: 426.62MH/s (phoenix.exe -u http://XXX:XXX@deepbit.net:8332/;askrate=15 PLATFORM=0 DEVICE=1 BFI_INT VECTORS AGGRESSION=12 -k phatk)

-EP






Also, I am using ATI-Stream-v2.1 (145) & Catalyst 11.3 (3-8-2011)... Haven't tried using 2.4 yet, and I don't really feel like switching it... Is 2.4 supposed to be better performance?

-EP


Title: Re: Modified Kernel for Phoenix 1.4
Post by: jedi95 on May 11, 2011, 10:38:49 PM
Very nice!
I am getting 408 Mhash/sec now vs 394 Mhash/sec using the poclbm kernel. There is also no difference in desktop responsiveness compared to the poclbm kernel.

This is very close to what I get with the poclbm kernel on Linux with SDK 2.1. (410 Mhash/sec, but that's at AGGRESSION=11)

5870 @ 930/300 (Win7 x64, 11.5 + SDK 2.4)
Arguments: FASTLOOP VECTORS BFI_INT AGGRESSION=8

Also, it appears you used an older revision of the poclbm kernel as the base for phatk. It doesn't include the FASTLOOP changes in Phoenix 1.45 and newer. The hashrate comparison above is with the FASTLOOP updates added to phatk, however with these particular settings it should be nearly identical.

Donation coming your way  8)


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Herodes on May 11, 2011, 11:02:00 PM
Seeing a 3-4% increase in hashing speed. Donation coming your way. Thanks for sharing.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: grndzero on May 11, 2011, 11:24:55 PM
5850 900/300

Went from 361 to 355 VECTORS AGGRESSION=12 BFI_INT


Title: Re: Modified Kernel for Phoenix 1.4
Post by: jedi95 on May 11, 2011, 11:30:43 PM
5850 900/300

Went from 361 to 355 VECTORS AGGRESSION=12 BFI_INT

This is probably because it's optimized for SDK 2.4. If you are using the Linux + SDK 2.1 setup in your sig then it's probably better to stick with the poclbm kernel. The advantage of phatk is that it produces similar speed to poclbm + SDK 2.1 with SDK 2.4.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: grndzero on May 11, 2011, 11:32:40 PM
5850 900/300

Went from 361 to 355 VECTORS AGGRESSION=12 BFI_INT

This is probably because it's optimized for SDK 2.4. If you are using the Linux + SDK 2.1 setup in your sig then it's probably better to stick with the poclbm kernel. The advantage of phatk is that it produces similar speed to poclbm + SDK 2.1 with SDK 2.4.

Ah, yeah, I did read that, it failed to register.  (I just woke up)


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Nicksasa on May 11, 2011, 11:33:29 PM
Tried it again on my 6970 @ 925Mhz, dropped from 379mhash to 366mhash on 11.4 & sdk 2.4


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Herodes on May 12, 2011, 12:34:20 AM
Anyone tested it on 5970 yet?


Title: Re: Modified Kernel for Phoenix 1.4
Post by: JWU42 on May 12, 2011, 02:47:58 AM
Tried on 5970 (using 2.1 though so didn't expect much).

367 - poclbm
362 - phatk



Title: Re: Modified Kernel for Phoenix 1.4
Post by: gmaxwell on May 12, 2011, 02:52:38 AM
On a stock 5870 at AGGRESSION=12, I get 371 (vs. 353 with the default kernel) and O/C at 1GHz I get 438 (vs. 420 with the default kernel)
With VECTORS and BFI_INT it compiles to 1418 ALU ops for 2 hashes.
[snip]
If you're feeling generous, any donations would be greatly appreciated so I can continue to put out bitcoin related software:
124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv

On 5870 900/300 383.17 -> 398.16
On the 5850s 852/284  327.12 -> 340.90

CLI:
DISPLAY=:0.0 python phoenix.py -q 2 -u http://15xWuDHSyKzpvp6FacGKXijBeaaaYhKWSi:x@pool.bitcoin.dashjr.org:8337/ -k phatk DEVICE=$1 AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT

SDK 2.4

While screwing around with memory settings previously I found that having an integer ratio of clock to mem made a fair improvement, around 3MH/s with the old kernel vs being near but not quite.  I wasn't sure if this was chance or something substantive, but considering that I'm seeing better improvements (and performance) than some others I thought I'd mention it.

Phateus, you have my thanks and a donation of a day worth of the income improvement your code brought me.




Title: Re: Modified Kernel for Phoenix 1.4
Post by: OtaconEmmerich on May 12, 2011, 03:23:37 AM
poclbm(GUI Miner) 200MHs (-v -w64 -f0)
phatk 210MHs (BFI_INT VECTORS FASTLOOP=false AGGRESSION=12)

This is on a 5770@955/300
I'd say that's worth a small donation from me.
I should try out Diablo miner next, maybe after his upcoming upgrade he may beat your kernel.



Title: Re: Modified Kernel for Phoenix 1.4
Post by: nster on May 12, 2011, 03:40:40 AM
On a stock 5870 at AGGRESSION=12, I get 371 (vs. 353 with the default kernel) and O/C at 1GHz I get 438 (vs. 420 with the default kernel)
With VECTORS and BFI_INT it compiles to 1418 ALU ops for 2 hashes.
[snip]
If you're feeling generous, any donations would be greatly appreciated so I can continue to put out bitcoin related software:
124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv

On 5870 900/300 383.17 -> 398.16
On the 5850s 852/284  327.12 -> 340.90

CLI:
DISPLAY=:0.0 python phoenix.py -q 2 -u http://15xWuDHSyKzpvp6FacGKXijBeaaaYhKWSi:x@pool.bitcoin.dashjr.org:8337/ -k phatk DEVICE=$1 AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT

SDK 2.4

While screwing around with memory settings previously I found that having an integer ratio of clock to mem made a fair improvement, around 3MH/s with the old kernel vs being near but not quite.  I wasn't sure if this was chance or something substantive, but considering that I'm seeing better improvements (and performance) than some others I thought I'd mention it.

Phateus, you have my thanks and a donation of a day worth of the income improvement your code brought me.




not for me, 1020/344 has about a 5 Mh/s advantage over 1020/340 and another 5 Mh/s over 1020/510


Title: Re: Modified Kernel for Phoenix 1.4
Post by: fpgaminer on May 12, 2011, 04:03:23 AM
I ran this new kernel against stock poclbm using my 5970. Although the MHash/s was +10 for the modified kernel, it ended up getting fewer accepted shares in the long run (several hours). That may just be terrible luck, but I tried it twice: once under Windows, and then under Ubuntu. Both times for several hours. Both times with the same results (stock poclbm with more accepted shares).

???


I have not tried swapping which core the respective kernels were running on, but it's been enough downtime for me today  :P


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Melvin132 on May 12, 2011, 07:06:41 AM
Seems to work great with the 5850 as well. Raised my Mh/s by an average of 10-15; with 6 cards that's really respectable.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Clavulanic on May 12, 2011, 07:36:35 AM
2x 5870's "-k poclbm device=1 WORKSIZE=128 VECTORS BFI_INT AGGRESSION=7 FASTLOOP  " core on both is at 935.
I got 385mhash on both with poclbm and now i'm getting 398 on both with phatk.

Does fastloop work with this or not? I'm working on upping my aggression and overclocking still.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: ataranlen on May 12, 2011, 07:49:01 AM
I've just switched to this kernel, seeing a 10% increase on all GPUs

2x 5870x2's at 950mhz core, getting 407-412mhash/s

Now I want to see just how much I can pull from these with your kernel, so I can update my listings on the wiki! It's only 3am, and I work at 6am, I'm sure I have time to finish xD


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Enky1974 on May 12, 2011, 07:58:55 AM
Poclbm last version = 396         -f 60 -v  -w128          gpu load 98%
phoenix tweaked kernel = 406   aggression 7 fastloop   gpu load 97%

ati 5870 sapphire 1gb ddr3
@950/@333
sdk 2.3
catalyst 11.4


Title: Re: Modified Kernel for Phoenix 1.4
Post by: jedi95 on May 12, 2011, 07:59:58 AM
2x 5870's "-k poclbm device=1 WORKSIZE=128 VECTORS BFI_INT AGGRESSION=7 FASTLOOP  " core on both is at 935.
I got 385mhash on both with poclbm and now i'm getting 398 on both with phatk.

Does fastloop work with this or not? I'm working on upping my aggression and overclocking still.

It does, but it has the same behavior as the poclbm kernel included in Phoenix 1.4. This means it doesn't have as much of a speed benefit at low aggression and it causes stale shares if used with high aggression.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: gmaxwell on May 12, 2011, 08:32:55 AM
I ran this new kernel against stock poclbm using my 5970. Although the MHash/s was +10 for the modified kernel, it ended up getting fewer accepted shares in the long run (several hours). That may just be terrible luck, but I tried it twice: once under Windows, and then under Ubuntu. Both times for several hours. Both times with the same results (stock poclbm with more accepted shares).

???
I have not tried swapping which core the respective kernels were running on, but it's been enough downtime for me today  :P

I'm in a position to speak objectively about this as I log all my found shares.

More data would be helpful, but it has only run for a few hours. Ideally I would have collected data from two cards in parallel over the same time to isolate network effects; instead I'll just exclude the extreme outliers (>90s).

Using the 1814 shares before the change and the 1814 since the change on a single node (the 5870), I found that the mean time between shares was 11.127 seconds before and 10.8 seconds after.  This difference is not large enough to fall outside the 95% confidence intervals (assuming an exponential distribution), and a permutation test finds only p=0.369, so with this amount of data I can't say _for sure_ that it made things better, but it's certainly more likely than not, and it's also very unlikely to have made things worse.
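
For anyone who wants to run the same kind of check on their own share log, here is a minimal sketch of a two-sided permutation test on the difference in mean share gap. The function name and parameters are made up for illustration, and the data below is synthetic (drawn from exponential distributions with the two means quoted above), not the logged shares, so the p-value it prints will not match the 0.369 reported here. It also does not model the >90s outlier exclusion.

```python
import random


def permutation_test_mean_gap(before, after, n_permutations=1000, seed=0):
    """Two-sided permutation test on the difference in mean share gap.

    Returns an estimated p-value: the fraction of random relabelings whose
    absolute mean difference is at least as large as the observed one
    (with the usual +1 correction so the estimate is never exactly 0).
    """
    rng = random.Random(seed)
    observed = abs(sum(before) / len(before) - sum(after) / len(after))
    pooled = list(before) + list(after)
    n_before = len(before)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling of "before" vs "after"
        a, b = pooled[:n_before], pooled[n_before:]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Synthetic stand-in data (an assumption, NOT the real share log):
# exponential gaps with the two mean values quoted in the post.
data_rng = random.Random(1)
gaps_before = [data_rng.expovariate(1 / 11.127) for _ in range(1814)]
gaps_after = [data_rng.expovariate(1 / 10.8) for _ in range(1814)]

print("estimated p-value:", permutation_test_mean_gap(gaps_before, gaps_after))
```

A large p-value means the observed difference is the sort of thing random relabeling produces often, which is the "can't say for sure" situation described above.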

10.8 seconds at difficulty 1 implies 397,688,225 h/s and 11.127 implies 386,000,973 h/s, which is basically what the tool shows... well, a little less; it looks like the performance was overstated a bit before and it's less so now?

(The formula for hashrate from share gaps is 281474976710656/(65535*seconds)=h/s)
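
The formula above can be checked with a few lines of Python. This is only a sketch: the function name `hashrate_from_share_gap` is made up for illustration, and the constants are taken directly from the formula as written (281474976710656 is 2**48).

```python
def hashrate_from_share_gap(mean_seconds):
    """Implied hashes per second for a given mean gap between difficulty-1 shares.

    Uses the formula from the post: 281474976710656 / (65535 * seconds).
    """
    if mean_seconds <= 0:
        raise ValueError("mean_seconds must be positive")
    return 2**48 / (65535 * mean_seconds)


for gap in (10.8, 11.127):
    mhs = hashrate_from_share_gap(gap) / 1e6
    print(f"{gap} s between shares -> {mhs:.1f} MH/s")
# prints:
# 10.8 s between shares -> 397.7 MH/s
# 11.127 s between shares -> 386.0 MH/s
```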




Title: Re: Modified Kernel for Phoenix 1.4
Post by: jedi95 on May 12, 2011, 08:48:18 AM
I have uploaded a modified version of phatk to the Phoenix SVN. The main difference is that it now has the same FASTLOOP improvements as the poclbm kernel from 1.45.

Performance should be around the same except at low aggression.
Download (http://svn3.xp-dev.com/svn/phoenix-miner/files/kernels/phatk.zip)


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Clavulanic on May 12, 2011, 09:05:33 AM
I have uploaded a modified version of phatk to the Phoenix SVN. The main difference is that it now has the same FASTLOOP improvements as the poclbm kernel from 1.45.

Performance should be around the same except at low aggression.
Download (http://svn3.xp-dev.com/svn/phoenix-miner/files/kernels/phatk.zip)

I didn't realize something had changed with fastloop. Neat.
The same rule still applies, though, is what you're saying, right? Fastloop at aggression 7 or below, no fastloop above 7.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Phateus on May 12, 2011, 09:25:07 AM
I have uploaded a modified version of phatk to the Phoenix SVN. The main difference is that it now has the same FASTLOOP improvements as the poclbm kernel from 1.45.

Performance should be around the same except at low aggression.
Download (http://svn3.xp-dev.com/svn/phoenix-miner/files/kernels/phatk.zip)

Awesome, thanks :).  I didn't even notice that you changed that code.

And everyone, thanks for the information and the BTC support.  It's really, really appreciated.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Herodes on May 12, 2011, 10:37:37 AM
On the 5970 it seems to increase the hashing rate by 3-4%. More coins your way.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: drcoin on May 12, 2011, 10:43:30 AM
Using latest catalyst drivers with 2.4 on 5830 @974/298 with AGGRESSION=12 BFI_INT VECTORS FASTLOOP=false:

poclbm: 290 Mhash/s
phatk: 301Mhash/s

Nice work!

Edit: Tweaked memory clock - seems to peak around 335Mhz at 302.5 Mhash/s.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Tyran on May 12, 2011, 02:23:44 PM
Decided to give SDK 2.4 a try on my 5770, and I can confirm that this kernel is indeed ~11 mhash/sec faster than the default one, but it is not enough to make up for the 2.1 -> 2.4 loss :(


Title: Re: Modified Kernel for Phoenix 1.4
Post by: OtaconEmmerich on May 12, 2011, 04:24:25 PM
Decided to give SDK 2.4 a try on my 5770, and I can confirm that this kernel is indeed ~11 mhash/sec faster than the default one, but it is not enough to make up for the 2.1 -> 2.4 loss :(
Is 2.1 that much better? I really don't want to use old drivers just to get SDK 2.1. Can you use SDK 2.1 on 11.5? Last time I tried that it was exactly the same speed.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: gmaxwell on May 12, 2011, 06:10:08 PM
Decided to give SDK 2.4 a try on my 5770, and I can confirm that this kernel is indeed ~11 mhash/sec faster than the default one, but it is not enough to make up for the 2.1 -> 2.4 loss :(
Is 2.1 that much better? I really don't want to use old drivers just to get SDK 2.1. Can you use SDK 2.1 on 11.5? Last time I tried that it was exactly the same speed.

People posting numbers from 2.1 appear to be lower than mine on 2.4.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: OtaconEmmerich on May 12, 2011, 06:36:32 PM
Decided to give SDK 2.4 a try on my 5770, and I can confirm that this kernel is indeed ~11 mhash/sec faster than the default one, but it is not enough to make up for the 2.1 -> 2.4 loss :(
Is 2.1 that much better? I really don't want to use old drivers just to get SDK 2.1. Can you use SDK 2.1 on 11.5? Last time I tried that it was exactly the same speed.

People posting numbers from 2.1 appear to be lower than mine on 2.4.

So many conflicting reports..Bah!


Title: Re: Modified Kernel for Phoenix 1.4
Post by: anisoptera on May 12, 2011, 07:25:20 PM
Went from ~402mhash at 950/300 on a 5870 to ~417 with this (SDK 2.4)

Tweaked the memory clock up a bit to 350 and now it's more like 418-420.

Definitely a nice improvement :)


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Enky1974 on May 13, 2011, 10:36:15 AM
Went from ~402mhash at 950/300 on a 5870 to ~417 with this (SDK 2.4)

Tweaked the memory clock up a bit to 350 and now it's more like 418-420.

Definitely a nice improvement :)
With aggression 13 I get 412, same clock settings as you but SDK 2.3


Title: Re: Modified Kernel for Phoenix 1.4
Post by: exahash on May 14, 2011, 03:27:21 AM
Very nice!  I'm getting almost 10 Mh/s more than with the poclbm kernel on my Sapphire Xtreme 5850.  Thanks Phateus.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: trumpetx on May 14, 2011, 12:32:19 PM
Unfortunately no improvement on a 5770:
@935/300 -k poclbm: 207.5
@935/300 -k phatk: 202.5
Higher memory clocks only decrease performance more.
Might be because I'm running SDK 2.1, do you think it would make up for the ~5% loss going to 2.4?

Same results here on my 5770 - nothing changed really.

@960/300 -k poclbm: 213.4
@960/275 -k poclbm: 214.1
@960/300 -k phatk: 212.8
@960/275 -k phatk: 212.2


Title: Re: Modified Kernel for Phoenix 1.4
Post by: elrock on May 14, 2011, 01:21:12 PM
I get the following error message when I try to run phatk:

Code:
  File "./phoenix.py", line 123, in <module>
    miner.start(options)
  File "/home/elrock/phoenix-1.47/Miner.py", line 74, in start
    self.kernel = self.options.makeKernel(KernelInterface(self))
  File "./phoenix.py", line 112, in makeKernel
    self.kernel = kernelModule.MiningKernel(requester)
  File "kernels/phatk/__init__.py", line 126, in __init__
    platforms = cl.get_platforms()
pyopencl.LogicError: clGetPlatformIDs failed: invalid/unknown error code

I think this may have something to do with the fact that my GPU is DEVICE 1 and not 0.  (For some reason OpenCL recognizes my CPU as DEVICE 0.)  ???


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Enky1974 on May 14, 2011, 01:29:10 PM
I get the following error message when I try to run phatk:

Code:
  File "./phoenix.py", line 123, in <module>
    miner.start(options)
  File "/home/elrock/phoenix-1.47/Miner.py", line 74, in start
    self.kernel = self.options.makeKernel(KernelInterface(self))
  File "./phoenix.py", line 112, in makeKernel
    self.kernel = kernelModule.MiningKernel(requester)
  File "kernels/phatk/__init__.py", line 126, in __init__
    platforms = cl.get_platforms()
pyopencl.LogicError: clGetPlatformIDs failed: invalid/unknown error code

I think this may have something to do with the fact that my GPU is DEVICE 1 and not 0.  (For some reason OpenCL recognizes my CPU as DEVICE 0.)  ???
i've had the same problem when switching from catalyst 11.1 to 11.4, before it was recognized as device 1 and now 0.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: redicarus on May 15, 2011, 01:54:08 AM
910Mhz/300Mhz on a HD5850, jumped from 330~ to 345-350~Mhash/s. Nice job.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: tiberiandusk on May 15, 2011, 07:50:44 AM
On my OC'd 5870 I went from 410 to 430. awwww yeeeeeaaaaah!


Title: Re: Modified Kernel for Phoenix 1.4
Post by: allinvain on May 15, 2011, 04:55:07 PM
This modified kernel kicks ass. Went from 350 to 371 with stock 5970 (850 Mhz - it's the slightly overclocked 4 gb vram one) speeds and aggression level 7. With aggression level 12 performance bumps up to 377. All the memory is at 300 Mhz.

Very nice! Thank you so much OP!!!! :D


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Miner-TE on May 15, 2011, 07:09:08 PM
Nice little bump up from 405 MH/s to 420 Mh/s on my 5870 but power usage went up 15W as well.

~15Mh/s gain with ~15W more power?  :-\

5870 @ 970 core 300 mem
Phoenix 1.46 ~405 Mh/s  (-k poclbm DEVICE=0 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256)
72 degC
213-215  Watts (measured by KillAWatt)

Same 5870 @ 970 core 300 mem
Phoenix 1.46 with new Kernel  ~420 Mh/s (-k phatk DEVICE=0 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256)
75 degC
227-230 Watts (measured by KillAWatt)

Can anyone else verify?


Title: Re: Modified Kernel for Phoenix 1.4
Post by: dishwara on May 15, 2011, 07:17:00 PM
Holy.......
I am using this word second time.
1st time when my hash jumped from 275 to 300 Mhash/s
& now to 313 Mhash/s after using phatk.

phoenix.exe -u http://XXXXXXXXX@mining.bitcoin.cz:8332/ DEVICE=0 VECTORS BFI_INT AGGRESSION=10 -k phatk
HD 6870 With core clk 1038, mem clk 360, fan 100%, temp 75-77C
windows 7 32 bit.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: allinvain on May 16, 2011, 12:26:06 AM
Nice little bump up from 405 MH/s to 420 Mh/s on my 5870 but power usage went up 15W as well.

~15Mh/s gain with ~15W more power?  :-\

5870 @ 970 core 300 mem
Phoenix 1.46 ~405 Mh/s  (-k poclbm DEVICE=0 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256)
72 degC
213-215  Watts (measured by KillAWatt)

Same 5870 @ 970 core 300 mem
Phoenix 1.46 with new Kernel  ~420 Mh/s (-k phatk DEVICE=0 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256)
75 degC
227-230 Watts (measured by KillAWatt)

Can anyone else verify?

I can verify this. My power usage went up too. From 412 to 421 on one rig to 440~

Now the question is whether the extra hash power justifies the extra power consumption..math anyone?


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Phateus on May 16, 2011, 12:46:07 AM
Nice little bump up from 405 MH/s to 420 Mh/s on my 5870 but power usage went up 15W as well.

~15Mh/s gain with ~15W more power?  :-\

5870 @ 970 core 300 mem
Phoenix 1.46 ~405 Mh/s  (-k poclbm DEVICE=0 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256)
72 degC
213-215  Watts (measured by KillAWatt)

Same 5870 @ 970 core 300 mem
Phoenix 1.46 with new Kernel  ~420 Mh/s (-k phatk DEVICE=0 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256)
75 degC
227-230 Watts (measured by KillAWatt)

Can anyone else verify?

I can verify this. My power usage went up too. From 412 to 421 on one rig to 440~

Now the question is whether the extra hash power justifies the extra power consumption..math anyone?

OK, according to deepbit's reward calculator, 405MH/s gives 0.097 and 420MH/s gives 0.10 BTC per hour
Switching gains you 0.003 BTC per hour, or at the current exchange rate of $7 per BTC: $0.021/h
The difference in power is 15 Watts (0.015 kW)
The price of electricity is about $0.10/kWh (http://www.eia.doe.gov/cneaf/electricity/epm/table5_6_b.html)
The cost of electricity from the increase is 0.015 kW * $0.10/kWh = $0.0015/h

So... the increase in profits is 14 times higher than the increase in cost.  Unless the price drops to $0.50 or the difficulty goes up 14-fold, pretty much any overclocking / optimizing is worth it.
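
The arithmetic above is easy to redo with your own numbers. Here is a minimal sketch; the function name and parameter names are made up for illustration, and the inputs are just the figures quoted in this post (reward rates from deepbit's calculator, $7 per BTC, 15 W extra, $0.10/kWh).

```python
def hourly_gain_vs_cost(btc_per_hour_before, btc_per_hour_after,
                        usd_per_btc, extra_watts, usd_per_kwh):
    """Return (extra revenue, extra electricity cost), both in USD per hour."""
    extra_revenue = (btc_per_hour_after - btc_per_hour_before) * usd_per_btc
    extra_cost = (extra_watts / 1000) * usd_per_kwh  # watts -> kW, then kW * $/kWh
    return extra_revenue, extra_cost


revenue, cost = hourly_gain_vs_cost(0.097, 0.10, 7.0, 15, 0.10)
print(f"extra revenue ${revenue:.4f}/h, extra cost ${cost:.4f}/h, "
      f"ratio {revenue / cost:.0f}x")
# prints: extra revenue $0.0210/h, extra cost $0.0015/h, ratio 14x
```

If you want to account for cooling, doubling `extra_watts` is one rough way to do it.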

Edit: Also, the increase in air conditioning will likely double the cost of electricity, but the cost is still negligible compared to the increase in profit.

Hope this helps :)

-Phateus


Title: Re: Modified Kernel for Phoenix 1.4
Post by: jondecker76 on May 16, 2011, 12:56:01 AM
Running a single Sapphire 5850 at 875,900 overclock with ATI SDK 2.4

using the poclbm kernel - 328 MHash
using the phatk kernel - 340 MHash!

Very nice!!!  I'll be sure to donate!


Title: Re: Modified Kernel for Phoenix 1.4
Post by: allinvain on May 16, 2011, 07:31:31 AM
Nice little bump up from 405 MH/s to 420 Mh/s on my 5870 but power usage went up 15W as well.

~15Mh/s gain with ~15W more power?  :-\

5870 @ 970 core 300 mem
Phoenix 1.46 ~405 Mh/s  (-k poclbm DEVICE=0 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256)
72 degC
213-215  Watts (measured by KillAWatt)

Same 5870 @ 970 core 300 mem
Phoenix 1.46 with new Kernel  ~420 Mh/s (-k phatk DEVICE=0 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256)
75 degC
227-230 Watts (measured by KillAWatt)

Can anyone else verify?

I can verify this. My power usage went up too. From 412 to 421 on one rig to 440~

Now the question is whether the extra hash power justifies the extra power consumption..math anyone?

OK, according to deepbit's reward calculator, 405MH/s gives 0.097 and 420MH/s gives 0.10 BTC per hour
Switching gains you 0.003 BTC per hour, or at the current exchange rate of $7 per BTC: $0.021/h
The difference in power is 15 Watts (0.015 kW)
The price of electricity is about $0.10/kWh (http://www.eia.doe.gov/cneaf/electricity/epm/table5_6_b.html)
The cost of electricity from the increase is 0.015 kW * $0.10/kWh = $0.0015/h

So... the increase in profits is 14 times higher than the increase in cost.  Unless the price drops to $0.50 or the difficulty goes up 14-fold, pretty much any overclocking / optimizing is worth it.

Edit: Also, the increase in air conditioning will likely double the cost of electricity, but the cost is still negligible compared to the increase in profit.

Hope this helps :)

-Phateus

It helps a lot. Thanks for that analysis. I for one very much appreciate it.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: mosimo on May 16, 2011, 08:07:15 AM
I'm running 2x 5870s but at 965 core, 300 mem.

phoenix -u http://blah/ -k poclbm VECTORS AGGRESSION=11 BFI_INT PLATFORM=0 DEVICE=0 WORKSIZE=768
Gets me 404 MH/s

phoenix -u http://blah/ VECTORS AGGRESSION=12 BFI_INT PLATFORM=0 DEVICE=0 WORKSIZE=768 -k phatk
Gets me 420 MH/s

Huge improvement. Thanks for this.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: DiabloD3 on May 16, 2011, 08:08:53 PM
I'm running 2x 5870s but at 965 core, 300 mem.

phoenix -u http://blah/ -k poclbm VECTORS AGGRESSION=11 BFI_INT PLATFORM=0 DEVICE=0 WORKSIZE=768
Gets me 404 MH/s

phoenix -u http://blah/ VECTORS AGGRESSION=12 BFI_INT PLATFORM=0 DEVICE=0 WORKSIZE=768 -k phatk
Gets me 420 MH/s

Huge improvement. Thanks for this.

5xxx maxes out at a worksize of 256.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: icaci on May 16, 2011, 10:49:38 PM
5xxx maxes out at a worksize of 256.
My dual 5870 (w/o CF bridges) maxes out at WORKSIZE=128.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: DiabloD3 on May 17, 2011, 12:15:28 AM
5xxx maxes out at a worksize of 256.
My dual 5870 (w/o CF bridges) maxes out at WORKSIZE=128.

Nope, that too maxes out at 256. What I said was 768 simply is not valid for 5xxx hardware.

Phoenix should output the error OpenCL is returning instead of covering it up.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: nster on May 17, 2011, 12:31:58 AM
5xxx maxes out at a worksize of 256.
My dual 5870 (w/o CF bridges) maxes out at WORKSIZE=128.

By "maxes out" he means maximum worksize, not maximum hashrate.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: jedi95 on May 17, 2011, 12:41:16 AM
5xxx maxes out at a worksize of 256.
My dual 5870 (w/o CF bridges) maxes out at WORKSIZE=128.

Nope, that too maxes out at 256. What I said was 768 simply is not valid for 5xxx hardware.

Phoenix should output the error OpenCL is returning instead of covering it up.

I'll probably add this in the next version, but for now it just uses the maximum supported if you enter a higher value.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: rowbot on May 17, 2011, 12:40:07 PM
Tried it on my 5830 and there was no difference.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Folax on May 21, 2011, 10:48:53 AM
Works nicely on XP64.
Anyone using it on Linux?


Title: Re: Modified Kernel for Phoenix 1.4
Post by: William Reed on May 21, 2011, 03:05:57 PM
Works very well. I am getting over 440 Mhash/s on HD 5870 (1000/375) with -k phatk AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT and about 416 Mhash/s on poclbm. However my other HD 5870 running at 950/375 with same switches only hashes about 410 MHash/s with phatk while poclbm gives about 400MHash/s.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: JayC on May 21, 2011, 03:23:52 PM
Works very well. I am getting over 440 Mhash/s on HD 5870 (1000/375) with -k phatk AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT and about 416 Mhash/s on poclbm. However my other HD 5870 running at 950/375 with same switches only hashes about 410 MHash/s with phatk while poclbm gives about 400MHash/s.


Just out of curiosity, how do you tell what worksize you need for a specific card?


Title: Re: Modified Kernel for Phoenix 1.4
Post by: huayra.agera on May 21, 2011, 05:47:35 PM
Works very well. I am getting over 440 Mhash/s on HD 5870 (1000/375) with -k phatk AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT and about 416 Mhash/s on poclbm. However my other HD 5870 running at 950/375 with same switches only hashes about 410 MHash/s with phatk while poclbm gives about 400MHash/s.


This worked well for me! Thanks for this tip man! +1: I have 3 5850s and these settings added like 20 Mhash/s while on my 6850 +10Mh/s! Cool!


Title: Re: Modified Kernel for Phoenix 1.4
Post by: lagmo on May 21, 2011, 06:17:20 PM
Very nice job!
Finally got to break the 400Mhash/s barrier on my HD5850, an increase of about 8-10Mhash/s over POCLBM kernel.  ;D


Title: Re: Modified Kernel for Phoenix 1.4
Post by: William Reed on May 21, 2011, 07:10:55 PM
Works very well. I am getting over 440 Mhash/s on HD 5870 (1000/375) with -k phatk AGGRESSION=13 WORKSIZE=256 VECTORS BFI_INT and about 416 Mhash/s on poclbm. However my other HD 5870 running at 950/375 with same switches only hashes about 410 MHash/s with phatk while poclbm gives about 400MHash/s.


Just out of curiosity, how do you tell what worksize you need for a specific card?

There is no general rule. It mostly depends on the architecture and memory technology used.  In heavy scientific calculations the best worksize is usually the one that the card can process natively, but in mining, where a single loop is very simple and fast, the optimal worksize can vary. In mining, lowering memory clocks saves power and therefore may allow for extra OC on the core, thus speeding up computation. If you lower your memory clocks too much it can lower your processing power, but this kind of loss can be compensated for by lowering the worksize.

So without a solid background in high speed computation architectures, the fastest way to know is to try out all possible combinations.
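
Trying every combination by hand gets tedious, so here is a minimal sketch that just prints one Phoenix command line per WORKSIZE/AGGRESSION pair. The option names are the ones used in this thread. The candidate values, the `build_commands` helper and the placeholder URL are assumptions for illustration. Run each line for a while and compare accepted shares, not only the displayed hashrate.

```python
import itertools

# Candidate values are assumptions; adjust them for your own card.
WORKSIZES = [64, 128, 256]
AGGRESSIONS = [11, 12, 13]

def build_commands(url, kernel="phatk"):
    """Return one Phoenix command line per WORKSIZE/AGGRESSION pair."""
    commands = []
    for worksize, aggression in itertools.product(WORKSIZES, AGGRESSIONS):
        commands.append(
            "phoenix -u %s -k %s VECTORS BFI_INT AGGRESSION=%d WORKSIZE=%d"
            % (url, kernel, aggression, worksize)
        )
    return commands

commands = build_commands("http://blah/")
for line in commands:
    print(line)
```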


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Syke on May 24, 2011, 05:23:40 PM
Any chance of getting a kernel optimized for the 6xxx series?


Title: Re: Modified Kernel for Phoenix 1.4
Post by: EPiSKiNG on May 25, 2011, 09:19:28 PM
Any chance of getting a kernel optimized for the 6xxx series?

+1 !!


Title: Re: Modified Kernel for Phoenix 1.4
Post by: tiberiandusk on May 26, 2011, 04:39:25 AM
My experience with my 5870 shows that worksize=128 works the best. With worksize=256 I show a slightly higher hashrate but overall submitted shares goes down a bit.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: AngelusWebDesign on June 03, 2011, 05:07:55 PM
Hashkill is faster for me on Linux 64-bit.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: allinvain on June 04, 2011, 09:00:46 AM
Hashkill is faster for me on Linux 64-bit.


Hmm, wish they'd release a windblowz binary soon :(


Title: Re: Modified Kernel for Phoenix 1.4
Post by: dishwara on June 04, 2011, 05:28:05 PM
Waiting for windows version, so i too can get more hashes.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: redcodenl on June 07, 2011, 07:10:01 PM
Any chance of getting a kernel optimized for the 6xxx series?

+1 as well!

I'm now using phatk (with Phoenix) for my double 6870's, it is working like a charm. But the thought that it might do better with an optimized kernel is killing me ;-)
Are there indications a better/optimized kernel for the 6xxx series can be created?


Title: Re: Modified Kernel for Phoenix 1.4
Post by: mbraun on June 11, 2011, 02:52:33 PM
HD5830 (Sapphire, stock volts) with SDK 2.4
VECTORS BFI_INT AGGRESSION=12 DEVICE=0 FASTLOOP=false WORKSIZE=256

1000/300: 298MH/s, 66°C
1000/300: 310MH/s, 66°C (phatk)

Thanks a lot man!


Title: Re: Modified Kernel for Phoenix 1.4
Post by: hugolp on June 11, 2011, 03:13:45 PM
My experience with my 5870 shows that worksize=128 works the best. With worksize=256 I show a slightly higher hashrate but overall submitted shares goes down a bit.

How is this possible?


Title: Re: Modified Kernel for Phoenix 1.4
Post by: hchc on June 11, 2011, 04:11:07 PM
Hashkill is faster for me on Linux 64-bit.


can you post some numbers? I'm contemplating switching from windows to linux just because of this and not sure if it's worthwhile. Currently getting 300mh/s with 5830 at 970/300..


Title: Re: Modified Kernel for Phoenix 1.4
Post by: mbraun on June 11, 2011, 06:41:34 PM
can you post some numbers? I'm contemplating switching from windows to linux just because of this and not sure if it's worthwhile. Currently getting 300mh/s with 5830 at 970/300..

These are already great numbers, don't think they'll change much on linux or windows. I also do not believe that mining gets faster because the CPU is able to work 64bits in a single cycle. It's not GPU related.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Hawkix on June 27, 2011, 09:08:35 PM
Phateus, would you consider to replace the Ma() macro as suggested by bitless and re-run the ATI optimization to check if it can be further improved? Bitless saved 1 operation from each Ma() call. Maybe, with some re-ordering, this can be optimized further.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Phateus on July 28, 2011, 03:53:33 AM
Sorry I haven't really been on the forums much lately... wedding planning stuff :D.

But...

Any chance of getting a kernel optimized for the 6xxx series?

The kernel is optimized for the 5xxx series, the 66xx series, the 67xx series and the 68xx series since they all use the same architecture.  Only the 69xx cards use a different architecture which is less efficient for mining (VLIW4 instead of VLIW5 for those who are interested).  I have debated whether to rewrite the kernel for the 69xx series, but it would only increase performance by ~1% at most.

Phateus, would you consider to replace the Ma() macro as suggested by bitless and re-run the ATI optimization to check if it can be further improved? Bitless saved 1 operation from each Ma() call. Maybe, with some re-ordering, this can be optimized further.

In the current version, in addition to numerous very tiny optimizations, I have reordered the Ma() operands, which reduces the number of instructions on operations with at least one non-vector operand.
Code:
#define Ma(z, x, y) amd_bytealign((y), (x | z), (z & x))
I think this is what you are talking about...
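
For anyone wondering why that macro is still the SHA-256 majority function, here is a small Python check. It assumes that with BFI_INT enabled the amd_bytealign(a, b, c) call ends up as a bitfield insert computing (a & b) | (~a & c). That is an assumption about how the BFI_INT patch works, not something stated in this post, and the function names below are made up for the sketch.

```python
import random

MASK = 0xFFFFFFFF  # work on 32-bit words

def bfi(a, b, c):
    """Bitfield insert: take bits of b where a is 1, bits of c where a is 0."""
    return ((a & b) | (~a & c)) & MASK

def ma_reordered(z, x, y):
    """Operand order from the macro: Ma(z, x, y) -> bfi(y, x | z, z & x)."""
    return bfi(y, x | z, z & x)

def ma_textbook(z, x, y):
    """Plain majority: a bit is set if at least two of the inputs have it set."""
    return ((x & y) | (x & z) | (y & z)) & MASK

random.seed(0)
samples = [tuple(random.getrandbits(32) for _ in range(3)) for _ in range(1000)]
mismatches = [s for s in samples if ma_reordered(*s) != ma_textbook(*s)]
print("mismatches:", len(mismatches))
```

The reasoning per bit: where y is 1, the majority is set if x or z is set; where y is 0, both x and z have to be set.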

Anywho... here is my new version which is a very slight improvement over 1.0 (about 1% faster for me).

One thing to note is that you MUST put in a valid WORKSIZE value when running version 1.1 due to one of the optimizations.

https://sourceforge.net/projects/phatk/files/phatk-1.1.zip/download (https://sourceforge.net/projects/phatk/files/phatk-1.1.zip/download)

 Post any questions or bugs you have, thanks

-Phateus


Title: Re: Modified Kernel for Phoenix 1.4
Post by: dishwara on July 28, 2011, 08:23:40 AM
Yours gives less hash than Diapolo's
http://forum.bitcoin.org/index.php?topic=25860.0

Using Diapolo's 2011-7-17 i gets 434 & 427 in 5870
While using yours gave 424 & 417, exactly a 10 Mhash/s less.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Phateus on July 28, 2011, 09:16:19 PM
Yours gives less hash than Diapolo's
http://forum.bitcoin.org/index.php?topic=25860.0

Using Diapolo's 2011-7-17 i gets 434 & 427 in 5870
While using yours gave 424 & 417, exactly a 10 Mhash/s less.

Ah.. there is a lot I've missed since I've been gone...

I will combine my improvements and his to see if I can get it lower.  Thanks for the info.

-Phateus


Title: Re: Modified Kernel for Phoenix 1.4
Post by: pennytrader on July 29, 2011, 01:19:37 AM
Great to see the continuous improvement


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Phateus on July 29, 2011, 05:55:59 AM
Yours gives less hash than Diapolo's
http://forum.bitcoin.org/index.php?topic=25860.0

Using Diapolo's 2011-7-17 i gets 434 & 427 in 5870
While using yours gave 424 & 417, exactly a 10 Mhash/s less.

Alright, check the first post, I uploaded a second version today with a few tweaks (The Ma() tweak and slight reordering of some operations).  It should be faster than diapolo's now.

Also, anyone who wants to help with this or has any suggestions, PM me and I'll be more than happy to discuss when I get the chance.

And... Diapolo (and anyone else who wants to help), if you read this... We should work together on trying to improve this :)  I think it is a good idea to keep separate code sources to increase the chances of finding optimizations, but if you have any questions about my code, let me know.

-Phateus


Title: Re: Modified Kernel for Phoenix 1.4
Post by: pennytrader on July 29, 2011, 06:24:26 AM
kernel opencl error. does this work with phoenix 1.5?


Title: Re: Modified Kernel for Phoenix 1.4
Post by: krzynek1 on July 29, 2011, 07:00:44 AM
not working with Phoenix r101


Title: Re: Modified Kernel for Phoenix 1.4
Post by: jedi95 on July 29, 2011, 07:20:04 AM
not working with Phoenix r101

Phoenix 1.5 includes the phatk kernel by default, unlike 1.4. Just use the included one. If you want more performance, phatk r112 from the Phoenix SVN is even faster.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: CanaryInTheMine on July 29, 2011, 07:22:21 AM
not working with Phoenix r101

Phoenix 1.5 includes the phatk kernel by default, unlike 1.4. Just use the included one. If you want more performance, phatk r112 from the Phoenix SVN is even faster.

where can I find r112?


Title: Re: Modified Kernel for Phoenix 1.4
Post by: dishwara on July 29, 2011, 07:23:41 AM
not working with Phoenix r101

Phoenix 1.5 includes the phatk kernel by default, unlike 1.4. Just use the included one. If you want more performance, phatk r112 from the Phoenix SVN is even faster.
where can I find r112?
+1, also how to know the revision number?


Title: Re: Modified Kernel for Phoenix 1.4
Post by: dishwara on July 29, 2011, 07:45:08 AM
I am getting STRANGE results.
Using Diapolo I got 434 & 427. Both are 5870 card.
1st is MSI Lightning 5870 @ 957/319, 1175mV.
2nd is Sapphire HD 5870 @ 939/313, 1163mV.

From your Phatk 2.0, i get 430 & 429. No change in any flags...
434 reduced to 430, But 427 increased to 429.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: lagmo on July 29, 2011, 08:23:51 AM
I'm getting this error when i try to use your 2.0 kernel on Phoenix 1.5/Linuxcoin 2.0(Debian live)
Works just fine on my Win7 x64 box though, so guessing it's specific to linuxcoins default complement of packages.
Code:
Unhandled error in Deferred:
Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 361, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/opt/miners/phoenix/QueueReader.py", line 136, in preprocess
    d2 = defer.maybeDeferred(self.preprocessor, nr)
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 133, in maybeDeferred
    result = f(*args, **kw)
  File "kernels/phatk/__init__.py", line 167, in <lambda>
    self.qr = QueueReader(self.core, lambda nr: self.preprocess(nr),
  File "kernels/phatk/__init__.py", line 361, in preprocess
    kd = KernelData(nr, self.core, self.VECTORS, self.AGGRESSION)
  File "kernels/phatk/__init__.py", line 46, in __init__
    unpack('LLLL', nonceRange.unit.data[64:]), dtype=np.uint32)
struct.error: unpack requires a string argument of length 32


Title: Re: Modified Kernel for Phoenix 1.4
Post by: iopq on July 29, 2011, 01:54:34 PM
Running windows 7, 64 bit

I'm getting [29/07/2011 06:50:32] FATAL kernel error: Failed to load OpenCL kernel! when I try the newest one
I tried with phoenix 1.5 and the latest 112 revision, get the same error

I'm doing python phoenix.py -u http://iopq.me:***@mineco.in:3000/ -k phatk DEVICE=1 VECTORS BFI_INT AGGRESSION=7 WORKSIZE=128

does it have something to do with worksize? because when i supply an invalid worksize to the phatk 1.0 it also gives the same error


Title: Re: Modified Kernel for Phoenix 1.4
Post by: bcforum on July 29, 2011, 02:11:21 PM
Gives an error in Linux (Ubuntu 10.10 x64), Python 2.6.6, Twisted 10.1.0-2:

Code:
[29/07/2011 08:10:17] Phoenix 1.50 starting...
[29/07/2011 08:10:17] Connected to server
[29/07/2011 08:10:17] Server gave new work; passing to WorkQueue
[29/07/2011 08:10:17] New block (WorkQueue)  
[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]
Unhandled error in Deferred:
Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 318, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 424, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 441, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/home/user/phoenix-1.50/QueueReader.py", line 136, in preprocess
    d2 = defer.maybeDeferred(self.preprocessor, nr)
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 125, in maybeDeferred
    result = f(*args, **kw)
  File "kernels/phatk/__init__.py", line 167, in <lambda>
    self.qr = QueueReader(self.core, lambda nr: self.preprocess(nr),
  File "kernels/phatk/__init__.py", line 361, in preprocess
    kd = KernelData(nr, self.core, self.VECTORS, self.AGGRESSION)
  File "kernels/phatk/__init__.py", line 46, in __init__
    unpack('LLLL', nonceRange.unit.data[64:]), dtype=np.uint32)
struct.error: unpack requires a string argument of length 32
[29/07/2011 08:10:17] Server gave new work; passing to WorkQueue
[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]^C

I tried changing the 'LLLL' and 'LLLLLLLL' to 'IIII' (like in the old __init__.py), but that caused a new error further along.
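
The "length 32" in the struct.error is consistent with how Python's struct module sizes 'L': without a prefix it uses the platform's C unsigned long, which is 8 bytes on typical 64-bit Linux and 4 bytes on Windows. With an explicit byte-order prefix such as '<', both 'L' and 'I' are always 4 bytes. The sketch below only illustrates that size difference. The 16-byte `tail` is a made-up stand-in for nonceRange.unit.data[64:], little-endian is assumed because these are x86 machines, and it does not address whatever fails further along.

```python
import struct

# Native mode (no prefix): 'L' is the platform's C unsigned long,
# so four of them can be 16 or 32 bytes depending on the OS.
native_size = struct.calcsize('LLLL')

# Standard mode ('<' = little-endian): 'L' and 'I' are always 4 bytes.
standard_size = struct.calcsize('<LLLL')

print("native 'LLLL':   %d bytes" % native_size)
print("standard '<LLLL': %d bytes" % standard_size)

# Hypothetical 16-byte tail standing in for nonceRange.unit.data[64:].
tail = bytes(bytearray(range(16)))
words = struct.unpack('<IIII', tail)
print(words)
```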


Title: Re: Modified Kernel for Phoenix 1.4
Post by: dikidera on July 29, 2011, 02:12:52 PM
Yup the new kernel doesnt work.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: iopq on July 29, 2011, 02:17:58 PM
1.1 doesn't work either for me, same error


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Diapolo on July 29, 2011, 03:31:10 PM
Yours gives less hash than Diapolo's
http://forum.bitcoin.org/index.php?topic=25860.0

Using Diapolo's 2011-7-17 i gets 434 & 427 in 5870
While using yours gave 424 & 417, exactly a 10 Mhash/s less.

Alright, check the first post, I uploaded a second version today with a few tweaks (The Ma() tweak and slight reordering of some operations).  It should be faster than diapolo's now.

Also, anyone who wants to help with this or has any suggestions, PM me and I'll be more than happy to discuss when I get the chance.

And... Diapolo (and anyone else who wants to help), if you read this... We should work together on trying to improve this :)  I think it is a good idea to keep separate code sources to increase the chances of finding optimizations, but if you have any questions about my code, let me know.

-Phateus

Currently looking at your code :) ...

Dia


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Mr.Prayer on July 29, 2011, 04:08:16 PM
Win7 x64, 5870, Catalyst 11.7, latest GUIMiner (Phoenix 1.5).
After copying v2.0 files into "kernels\phatk" i get this messages in console:
Code:
2011-07-29 11:02:49: Listener for "itzod2": [29/07/2011 11:02:49] [4.19 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-29 11:02:50: Listener for "itzod2": [29/07/2011 11:02:50] Warning: work queue empty, miner is idle
No work is being done.

Here's miner starting parameters:
Code:
2011-07-29 18:08:49: Running command: .\phoenix.exe -u http://****:****@lp1.itzod.ru:8344 PLATFORM=0 DEVICE=0 AGGRESSION=12 -k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256

Your v1.0 and Diapolo's 2011-07-17 kernel works fine.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Diapolo on July 29, 2011, 05:51:47 PM
1st question, how is 0x2004000U in line 170 computed? Currently I don't get it :D.

Dia


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Phateus on July 29, 2011, 06:12:38 PM
Yup the new kernel doesnt work.

BAH!.. I'll look through it, tonight I am going to a Sublime with Rome and 311 concert... so this weekend.

 
1st question, how is 0x2004000U in line 170 computed? Currently I don't get it :D.

Dia

Basically, since only the last bit is different between the 2 nonces W3.x and W3.y, the first calculation done on those values is P2:
Code:
P2(18) = rot(W[3],25)^rot(W[3],14)^((W[3])>>3U);

So, basically, instead of flipping bit 0 on W[3] and calculating both W[18].x and W[18].y, we can calculate only W[18].x; W[18].y will be the same apart from bits 25 and 14 being flipped.

Code:
P2(18).x = rot(W[3].x,25)^rot(W[3].x,14)^((W[3].x)>>3U);
W[3].y = W[3].x ^ 1, therefore:

P2(18).y = P2(18).x ^ (rot(1,25)^rot(1,14)^((1)>>3U));
so, P2(18).y = P2(18).x ^ 0x2004000U;
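
The constant can be reproduced outside the kernel with a short Python sketch. It assumes rot() is a 32-bit rotate left, as with OpenCL's rotate(); the helper names are made up for illustration. It prints the difference for an input of 1 and then checks on a few arbitrary words that flipping bit 0 of the input always flips exactly those output bits.

```python
MASK = 0xFFFFFFFF

def rot(x, n):
    """32-bit rotate left (assumed to match the kernel's rot macro)."""
    return ((x << n) | (x >> (32 - n))) & MASK

def p2_term(w):
    """The expression from the post: rot(w,25) ^ rot(w,14) ^ (w >> 3)."""
    return rot(w, 25) ^ rot(w, 14) ^ (w >> 3)

delta = p2_term(1)
print(hex(delta))

# Rotates, shifts and XOR are all XOR-linear, so flipping bit 0 of the
# input flips the same output bits whatever the rest of the word is.
for w in (0x00000000, 0x12345678, 0xFFFFFFFE, 0xDEADBEEF):
    assert p2_term(w ^ 1) == p2_term(w) ^ delta
```

The shifted term drops out because 1 >> 3 is 0, which is why only bits 25 and 14 are affected.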


Title: Re: Modified Kernel for Phoenix 1.4
Post by: ssateneth on July 30, 2011, 04:38:25 AM
Win7 x64, 5870, Catalyst 11.7, latest GUIMiner (Phoenix 1.5).
After copying v2.0 files into "kernels\phatk" i get this messages in console:
Code:
2011-07-29 11:02:49: Listener for "itzod2": [29/07/2011 11:02:49] [4.19 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-29 11:02:50: Listener for "itzod2": [29/07/2011 11:02:50] Warning: work queue empty, miner is idle
No work is being done.

Here's miner starting parameters:
Code:
2011-07-29 18:08:49: Running command: .\phoenix.exe -u http://****:****@lp1.itzod.ru:8344 PLATFORM=0 DEVICE=0 AGGRESSION=12 -k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256

Your v1.0 and Diapolo's 2011-07-17 kernel works fine.


This is exactly what happens to mine too. guiminer 7-01-2011 with phoenix 1.5, using newest kernel in this thread, catalyst 11.7, win7 x64.
It just spams "Warning: work queue empty, miner is idle" in console.
I'm going to assume this kernel is either meant for an older miner, or its just plain broken. I'll be looking at this thread and diapolo's for an update. 12alu improvement is huge :), might be able to break 470 mhash on my 5870.

edit: I thought you had to declare kernel arguments -after- the -k switch and argument after to declare what kernel to use.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: BOARBEAR on July 30, 2011, 06:44:30 AM
Win7 x64, 5870, Catalyst 11.7, latest GUIMiner (Phoenix 1.5).
After copying v2.0 files into "kernels\phatk" i get this messages in console:
Code:
2011-07-29 11:02:49: Listener for "itzod2": [29/07/2011 11:02:49] [4.19 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-29 11:02:50: Listener for "itzod2": [29/07/2011 11:02:50] Warning: work queue empty, miner is idle
No work is being done.

Here's miner starting parameters:
Code:
2011-07-29 18:08:49: Running command: .\phoenix.exe -u http://****:****@lp1.itzod.ru:8344 PLATFORM=0 DEVICE=0 AGGRESSION=12 -k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256

Your v1.0 and Diapolo's 2011-07-17 kernel works fine.


This is exactly what happens to mine too. guiminer 7-01-2011 with phoenix 1.5, using newest kernel in this thread, catalyst 11.7, win7 x64.
It just spams "Warning: work queue empty, miner is idle" in console.
I'm going to assume this kernel is either meant for an older miner, or its just plain broken. I'll be looking at this thread and diapolo's for an update. 12alu improvement is huge :), might be able to break 470 mhash on my 5870.

edit: I thought you had to declare kernel arguments -after- the -k switch and argument after to declare what kernel to use.
This is an old bug of phoenix that the author was not able to fix.  Try searching for the idle bug.  It is not a problem with this kernel.
see http://forum.bitcoin.org/index.php?topic=19169.0


Title: Re: Modified Kernel for Phoenix 1.4
Post by: ssateneth on July 30, 2011, 07:40:15 AM
What makes you so sure its phoenix and not the kernel? 7-11 kernel and 7-17 kernel = work perfectly. Swap out to 2.0 kernel, spams idle.

Btw I searched that thread you linked and didn't see any mention of idle bug. :/

edit: did some other searching and apparently someone mentioned the idle bug was fixed in 1.50, but guiminer v2011-07-01 uses phoenix 1.50 according to my console, so I don't know where to go from here.
If someone can figure out the problem and give steps to solve the idle bug with guiminer v2011-07-01, catalyst 11.7, opencl driver 2.5, win7 x64, and this kernel, i'll donate 0.25 btc to you. I would prefer to keep using phoenix in guiminer.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: dishwara on July 30, 2011, 12:41:19 PM
Seems it only gives problem to those using GUIminer. I am using AOCLBF 1.75 & i so far didn't get any error, except strange Mhash/s variation which i already posted in this thread some posts back.
GUIminer users , try with AOCLBF to check that also gives you problem.

OS-Windows 7, 64 bit with AERO enabled, catalyst 11.8 beta.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Diapolo on July 30, 2011, 01:15:45 PM
Yup the new kernel doesnt work.

BAH!.. I'll look through it, tonight I am going to a Sublime with Rome and 311 concert... so this weekend.

 
1st question, how is 0x2004000U in line 170 computed? Currently I don't get it :D.

Dia

Basically, since only the last bit is different between the 2 nonces W3.x and W3.y, the first calculation done on those values is P2:
Code:
P2(18) = rot(W[3],25)^rot(W[3],14)^((W[3])>>3U);

So, basically, instead of flipping bit 0 on W[3] and calculating both W[18].x and W[18].y, we can calculate only W[18].x; W[18].y will be the same apart from bits 25 and 14 being flipped.

Code:
P2(18).x = rot(W[3].x,25)^rot(W[3].x,14)^((W[3].x)>>3U);
W[3].y = W[3].x ^ 1, therefore:

P2(18).y = P2(18).x ^ (rot(1,25)^rot(1,14)^((1)>>3U));
so, P2(18).y = P2(18).x ^ 0x2004000U;

This is the first change that I implemented into my kernel, but it seems that only 69XX cards do benefit from that change. Will investigate further ...

Dia


Title: Re: Modified Kernel for Phoenix 1.4
Post by: ssateneth on July 30, 2011, 01:38:17 PM
Seems it only gives problem to those using GUIminer. I am using AOCLBF 1.75 & i so far didn't get any error, except strange Mhash/s variation which i already posted in this thread some posts back.
GUIminer users , try with AOCLBF to check that also gives you problem.

OS-Windows 7, 64 bit with AERO enabled, catalyst 11.8 beta.


I tried aoclbf about a week or 2 back. I didn't like how the on-screen display was ugly, or didn't even show, so I didn't know the status of my miners. :/


Title: Re: Modified Kernel for Phoenix 1.4
Post by: fpgaminer on July 30, 2011, 02:36:57 PM
Good stuff Phateus :) I'm getting an extra 2-3MH/s with your newest kernel compared to Diapolo's last kernel. I merged the code into my fork of poclbm and it seems to be working fine there (with command line option --phatk2):
https://github.com/progranism/poclbm (https://github.com/progranism/poclbm)

The only bug I found was that the kernel wouldn't compile without BITALIGN. Not really important, since all my mining cards support BITALIGN. It complained about rotate being ambiguous.

Keep up the good work!


Title: Re: Modified Kernel for Phoenix 1.4
Post by: techwtf on July 30, 2011, 02:45:57 PM
I'm also trying to port the kernel to poclbm.

RuntimeError: clBuildProgram failed: build program failure

Build on <pyopencl.Device 'Cypress' at 0x2d6d530>:

/tmp/OCLZO4wZQ.cl(184): error: bad argument type to opencl builtin function:
          expected type "uint2", actual type "int"
     sharoundC(4);
     ^
...

/tmp/OCLZO4wZQ.cl(185): error: bad argument type to opencl builtin function:
          expected type "uint2", actual type "int"
     W[20] = P4C(20) + P1(20);

/tmp/OCLZO4wZQ.cl(186): error: bad argument type to opencl builtin function:
          expected type "uint2", actual type "int"
     sharoundC(5);
     ^


/tmp/OCLZO4wZQ.cl(187): error: bad argument type to opencl builtin function:
          expected type "uint2", actual type "int"
     W[21] = P1(21);
             ^

/tmp/OCLZO4wZQ.cl(189): error: mixed vector-scalar operation not allowed
          unless up-convertable(scalar-type=>vector-element-type)
     W[22] = P3C(22) + P1(22);
                       ^


Update: Is SDK 2.4 Required? I'm using 2.1.
without VECTORS, it works...


Title: Re: Modified Kernel for Phoenix 1.4
Post by: techwtf on July 30, 2011, 03:25:32 PM
Finally Working now. 435.5 -> 437.2. Some other small issue still exists, but not kernel related.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Vince on July 30, 2011, 03:34:00 PM
I had some trouble getting this to work with SDK 2.4, but finally its running.

On my HD6950 I get ~343Mhash/s,
with diapolo's 11-07-17 its slightly better, ~345Mhash/s.

I tested some combinations - but both run best with BFI_INT WORKSIZE=128 VECTORS AGGRESSION=11


Title: Re: Modified Kernel for Phoenix 1.4
Post by: MiningBuddy on July 30, 2011, 05:45:21 PM
V2 works fine on my windows box with 11.6 drivers and sdk 2.4 (gives me around 3mhs extra)
But it does not work on my linux boxes with 11.5 drivers and sdk 2.1 or 2.4, giving errors such as  FATAL kernel error: Failed to load OpenCL kernel!

Dropped back to diapolo's version on my linux box.

Thanks!


Title: Re: Modified Kernel for Phoenix 1.4
Post by: iopq on July 30, 2011, 06:10:13 PM
Good stuff Phateus :) I'm getting an extra 2-3MH/s with your newest kernel compared to Diapolo's last kernel. I merged the code into my fork of poclbm and it seems to be working fine there (with command line option --phatk2):
https://github.com/progranism/poclbm (https://github.com/progranism/poclbm)

The only bug I found was that the kernel wouldn't compile without BITALIGN. Not really important, since all my mining cards support BITALIGN. It complained about rotate being ambiguous.

Keep up the good work!
doesn't work when vectors are turned on
I am running SDK 2.1, it's either -v or --phatk2, doesn't work with both


Title: Re: Modified Kernel for Phoenix 1.4
Post by: BOARBEAR on July 30, 2011, 06:32:29 PM
What makes you so sure its phoenix and not the kernel? 7-11 kernel and 7-17 kernel = work perfectly. Swap out to 2.0 kernel, spams idle.

Btw I searched that thread you linked and didn't see any mention of idle bug. :/

edit: did some other searching and apparently someone mentioned the idle bug was fixed in 1.50, but guiminer v2011-07-01 uses phoenix 1.50 according to my console, so I don't know where to go from here.
If someone can figure out the problem and give steps to solve the idle bug with guiminer v2011-07-01, catalyst 11.7, opencl driver 2.5, win7 x64, and this kernel, i'll donate 0.25 btc to you. I would prefer to keep using phoenix in guiminer.
"[0 Khash/sec] blah blah" is referring to the idle problem.

See
http://forum.bitcoin.org/index.php?topic=6458.msg229912#msg229912

That is with the original kernel that people are having the idling problem.  And the author basically said he will not be able to fix that bug and he is leaving it to other developers.
He claimed the idling bug might be fixed in 1.50, in fact it was not.

The idling bug has something to do with the AGGRESSION option; lowering it might fix the problem.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Clipse on July 30, 2011, 08:09:08 PM
Im having issue with this version 2.0 of phatk

On my ubuntu machines running phoenix 1.5, and the 3 updated files from phatk 2.0, I get some Queuereader error and it just hangs.

On my windows machines Im using phatk 2.0 with phoenix 1.5 and it isnt giving me the same error, works great(additional 9mh/s gained from previous diapolo kernel changes.)


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Diapolo on July 31, 2011, 01:00:59 AM
Phat, what is the effect of "LLLL" instead of "IIII" in the .py file? It seems to work even with IIII.

Thanks,
Dia


Title: Re: Modified Kernel for Phoenix 1.4
Post by: 1bitc0inplz on July 31, 2011, 01:42:40 AM
Hey,

Thanks for the updated kernel. My 5830 went from 303 MH/s to 307 MH/s.

However, this new kernel does not seem to work for my 5670, I went back to Diapolo's latest for that card. Using this kernel on my 5670 resulted in a strange "connecting" / "disconnected" loop where the card never actually did any hashing.... very odd.

I don't know if this helps you any, but I'm on Windows 7 64-bit, Catalyst 11.7, Phoenix 1.5, and the configuration that I normally use for my 5670 is:

Code:
phoenix.exe -k phatk VECTORS BFI_INT FASTLOOP=false AGGRESSION=15 WORKSIZE=128 -q 3 -u http://username:password@pool.bitp.it:8334 DEVICE=1


Title: Re: Modified Kernel for Phoenix 1.4
Post by: ssateneth on July 31, 2011, 02:56:00 AM
What makes you so sure its phoenix and not the kernel? 7-11 kernel and 7-17 kernel = work perfectly. Swap out to 2.0 kernel, spams idle.

Btw I searched that thread you linked and didn't see any mention of idle bug. :/

edit: did some other searching and apparently someone mentioned the idle bug was fixed in 1.50, but guiminer v2011-07-01 uses phoenix 1.50 according to my console, so I don't know where to go from here.
If someone can figure out the problem and give steps to solve the idle bug with guiminer v2011-07-01, catalyst 11.7, opencl driver 2.5, win7 x64, and this kernel, i'll donate 0.25 btc to you. I would prefer to keep using phoenix in guiminer.
"[0 Khash/sec] blah blah" is referring to the idle problem.

See
http://forum.bitcoin.org/index.php?topic=6458.msg229912#msg229912

That is with the original kernel that people are having the idling problem.  And the author basically said he will not be able to fix that bug and he is leaving it to other developers.
He claimed the idling bug might be fixed in 1.50, in fact it was not.

The idling bug has something to do with the AGGRESSION option; lowering it might fix the problem.

I tried all aggressions from 1 to 14 as well as changing FASTLOOP from false to true. Setting to true would just spam idle. Setting to false above aggression 6 would also spam idle. Setting to 6 or lower would spam stuff like..
Code:
2011-07-30 21:50:53: Listener for "5830": [30/07/2011 21:50:53] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:50:54: Listener for "5830": [30/07/2011 21:50:54] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:50:55: Listener for "5830": [30/07/2011 21:50:55] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:50:56: Listener for "5830": [30/07/2011 21:50:56] [15.22 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:50:57: Listener for "5830": [30/07/2011 21:50:57] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:50:58: Listener for "5830": [30/07/2011 21:50:58] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:50:59: Listener for "5830": [30/07/2011 21:50:59] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:51:00: Listener for "5830": [30/07/2011 21:51:00] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
2011-07-30 21:51:01: Listener for "5830": [30/07/2011 21:51:01] [16.77 Ghash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]
...which I know is absolutely wrong. Plus my GPU utilization is still 0% when it's spamming that.

The donation (or prize if you want to look at it like that) of 0.25 btc for a fix to this while using guiminer v2011-07-01 win7 x64 is still up for grabs.

edit: I also tried different WORKSIZE values, to no avail.

Under regular phoenix (no guiminer) the kernel seems to work, though the improvement isn't as big as I had hoped. Still looking for a fix for guiminer based phoenix, unless the phoenix for guiminer is bad. I don't suppose anyone knows if subbing the original phoenix 1.5 (the 6 meg one) in for the 22KB one in guiminer will work?

edit2: subbing the 6800KB phoenix.exe into the guiminer dir instead of the 22KB one -seems- to work (my shares are going up), but guiminer doesn't know what phoenix is doing. It only shows shares going up. It doesn't show the hash rate or any system messages in the console (LP new block, opencl errors, etc).


Title: Re: Modified Kernel for Phoenix 1.4
Post by: fpgaminer on July 31, 2011, 05:10:05 AM
Quote
doesn't work when vectors are turned on
I am running SDK 2.1, it's either -v or --phatk2, doesn't work with both
How odd. Do you recall what the error message was, if any?


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Yannick on July 31, 2011, 07:27:55 AM
I have two 5870 in my machine with W7 ultimate, AERO disabled.

I'm using:

SDK 2.1
ATI Drivers 10.7

I tested all drivers and SDK versions, this combination of driver and SDK version gave me the highest Hash/sec.

But I'm not able to use the phatk kernel. I'm getting the following error: FATAL kernel error: Failed to load OpenCL kernel! :(

How can I fix this? poclbm works fine. I'd like to try phatk too. :(


Title: Re: Modified Kernel for Phoenix 1.4
Post by: macboy80 on July 31, 2011, 09:23:08 AM
Working well for me.

Radeon 6950 @ 900/900
7/11 = ~360 Mh/s
2.0   = ~370 Mh/s

Windows 7 x64 AERO
Phoenix 1.5 -k phatk DEVICE=0 VECTORS BFI_INT FASTLOOP WORKSIZE=128 AGGRESSION=8
Catalyst 11.7
SDK 2.4

Thanks for the hard work.  :)

EDIT: Did not work in Phoenix 1.4. I had to upgrade to 1.5.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: iopq on July 31, 2011, 10:20:05 AM
Quote
doesn't work when vectors are turned on
I am running SDK 2.1, it's either -v or --phatk2, doesn't work with both
How odd. Do you recall what the error message was, if any?

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(189): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[22] = P3C(22) + P1(22);
                          ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(189): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[22] = P3C(22) + P1(22);
                          ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(189): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        W[22] = P3C(22) + P1(22);
                          ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(190): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[23] = W16 + P1(23);
                      ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(190): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[23] = W16 + P1(23);
                      ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(190): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        W[23] = W16 + P1(23);
                      ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(191): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(7);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(192): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[24] = W17 + P1(24);
                      ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(192): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[24] = W17 + P1(24);
                      ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(192): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        W[24] = W17 + P1(24);
                      ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(193): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(8);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(194): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[25] = P1(25) + P3(25);
                ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(194): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[25] = P1(25) + P3(25);
                ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(194): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        W[25] = P1(25) + P3(25);
                ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(195): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        sharoundC(9);
        ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(196): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[26] = P1(26) + P3(26);
                ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(196): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[26] = P1(26) + P3(26);
                ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(196): error: mixed
          vector-scalar operation not allowed unless
          up-convertable(scalar-type=>vector-element-type)
        W[26] = P1(26) + P3(26);
                ^

C:\Users\Igor\AppData\Local\Temp\OCL1F64.tmp.cl(197): error: bad argument type
          to opencl builtin function: expected type "uint2", actual type "int"
        W[27] = P1(27) + P3(27);
                ^

Error limit reached.
100 errors detected in the compilation of "C:\Users\Igor\AppData\Local\Temp\OCL1
F64.tmp.cl".
Compilation terminated.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: UniverseMan on July 31, 2011, 11:18:42 PM
I'm using Ubuntu 11.04, Catalyst 11.6, Phoenix 1.50. I unpacked the phatk version 2 files into my phoenix-1.50/kernels/phatk folder.
When I ran my phoenix with kernel options
Code:
-k phatk DEVICE=0 BFI_INT VECTORS AGGRESSION=12 FASTLOOP=FALSE WORKSIZE=256
I got the following error:
Code:
user@computer:~$ sudo ./btcg0.sh
[31/07/2011 18:04:08] Phoenix 1.50 starting...
[31/07/2011 18:04:09] Connected to server
[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]Unhandled error in Deferred:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 361, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/user/phoenix-1.50/QueueReader.py", line 136, in preprocess
    d2 = defer.maybeDeferred(self.preprocessor, nr)
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 133, in maybeDeferred
    result = f(*args, **kw)
  File "kernels/phatk/__init__.py", line 167, in <lambda>
    self.qr = QueueReader(self.core, lambda nr: self.preprocess(nr),
  File "kernels/phatk/__init__.py", line 361, in preprocess
    kd = KernelData(nr, self.core, self.VECTORS, self.AGGRESSION)
  File "kernels/phatk/__init__.py", line 46, in __init__
    unpack('LLLL', nonceRange.unit.data[64:]), dtype=np.uint32)
struct.error: unpack requires a string argument of length 32
I then had to CTRL+Z to kill the process.

Not sure if this error is related to anything discussed before. But it's no big deal, as I've merely switched back to my previous kernel. Cheers!


Title: Re: Modified Kernel for Phoenix 1.4
Post by: bcforum on July 31, 2011, 11:23:32 PM
I get the same error with a similar setup.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Tx2000 on August 01, 2011, 01:34:16 AM
"Miner is idle" spams the console multiple times a second under Windows 7 x64, Cat 11.4, SDK 2.4, GuiMiner 2011-07-01. Essentially, it does not work.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Phateus on August 01, 2011, 05:14:36 PM
Quote
Phat, what is the effect of "LLLL" instead of "IIII" in the .py file? It seems to work even with IIII.

Thanks,
Dia

Nothing, I was trying to fix a bug with low WORKSIZE numbers which results in duplicate hashes (not sure if it is solved yet).  Technically, the values are 32-bit which are "L" values instead of 16-bit "I" values, but python seems to handle both the same.
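
A side note on the format codes, offered as a minimal sketch rather than the actual phatk fix: in Python's struct module, "I" is unsigned int and "L" is unsigned long. Without a byte-order prefix both use the platform's native C sizes, so "L" can be 8 bytes on 64-bit Linux while it is 4 bytes on Windows. That would be consistent with the "unpack requires a string argument of length 32" error reported from Ubuntu earlier in the thread. A leading "=" keeps native byte order but forces standard sizes (4 bytes each). The function name and sample bytes below are made up for illustration, the 16-byte slice length is an assumption, and the snippet is written for Python 3.

```python
import struct

def unpack_four_words(tail):
    """Unpack 16 bytes as four unsigned 32-bit words in native byte order.

    The "=" prefix selects standard sizes, so "L" is always 4 bytes here,
    unlike plain "LLLL", which uses the platform's C unsigned long.
    """
    return struct.unpack('=LLLL', tail)

# Hypothetical 16-byte slice standing in for nonceRange.unit.data[64:].
sample = bytes(range(16))

# Native sizes depend on the platform; the standard size does not.
print("native 'LLLL' size:   ", struct.calcsize('LLLL'))
print("native 'IIII' size:   ", struct.calcsize('IIII'))
print("standard '=LLLL' size:", struct.calcsize('=LLLL'))
print(unpack_four_words(sample))
```

With the "=" prefix, "L" and "I" unpack the same 16 bytes identically on every platform, which may be why both appeared to behave the same on Windows.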

As for the other issues, I think my kernel has a problem with SDK 2.1.  I will try explicitly casting the rotation constant to uint instead of int (that may fix the problem).
If anyone with SDK 2.1 wants to help out:
change
Code:
#define rot(x, y) amd_bitalign(x, x, (32-y))
#else
#define rot(x, y) rotate(x, y)
#endif
to
Code:
#define rot(x, y) amd_bitalign(x, x, (uint)(32-y))
#else
#define rot(x, y) rotate(x, (uint)(y))
#endif
and
Code:
#define rot2(x, y) rotate(x, y)
to
Code:
#define rot2(x, y) rotate(x, (uint)(y))
If anyone tries this out, let me know if it changes anything.
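
For anyone wondering why the two branches of rot are interchangeable, here is a small Python model (a sketch, not kernel code). It assumes amd_bitalign(src0, src1, src2) behaves as the cl_amd_media_ops extension describes it: the 64-bit value src0:src1 is shifted right by (src2 & 31) and the low 32 bits are kept. The helper names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def bitalign(hi, lo, shift):
    """Model of amd_bitalign: low 32 bits of (hi:lo) >> (shift & 31)."""
    return (((hi << 32) | lo) >> (shift & 31)) & MASK32

def rotl(x, y):
    """Model of OpenCL rotate(x, y) on a 32-bit value: rotate left by y bits."""
    y &= 31
    return ((x << y) | (x >> (32 - y))) & MASK32

def rot(x, y):
    """The BITALIGN branch of the macro: amd_bitalign(x, x, 32 - y)."""
    return bitalign(x, x, 32 - y)

# Both forms agree for every rotation amount from 1 to 31.
x = 0x6A09E667
for y in range(1, 32):
    assert rot(x, y) == rotl(x, y)
print(hex(rot(x, 7)), hex(rotl(x, 7)))
```

The model says nothing about types; the SDK 2.1 errors in this thread are about the compiler wanting the rotation amount as an unsigned value, which is what the (uint) casts address.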


I've done a few things over the weekend (increased performance another ~.5%) and cleaned up my code a lot, so I will release another version when I figure what is causing some of the issues that people are having...

Diapolo, I know you made some modifications to my kernel to make it compatible with 2.1, are they basically type casting issues like the one above?  If I can't figure it out, I may just make all of the constants uint.

Also, one more thing, does "rotate(x, y)" compile to 1 instruction in SDK 2.1?  Running 2.4, explicitly using amd_bitalign does not improve performance (might be cleaner if I can just use rotate(x, y) regardless of whether BITALIGN is defined).

I was also thinking of possibly just precompiling different versions of the kernel and using them, therefore, you'd be able to use the faster 2.4 kernel even if you use SDK 2.1.  I'm not sure if this is possible, but I will look into it.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Diapolo on August 01, 2011, 07:28:40 PM
I've done a few things over the weekend (increased performance another ~.5%) and cleaned up my code a lot, so I will release another version when I figure what is causing some of the issues that people are having...

Diapolo, I know you made some modifications to my kernel to make it compatible with 2.1, are they basically type casting issues like the one above?  If I can't figure it out, I may just make all of the constants uint.

I really don't understand why the compiler needs so much help and why one has to use such ugly code to get the best performance ... I hope AMD can optimize the compiler, so that we can use clean and straightforward code. I tried to reorder the commands and did not change the code itself and it saved 3 ALU OPs ... for nothing. that sucks so bad!

The SDK 2.1 compatibility was achieved via type-casts in front of hex-values in the code. Simply add (u) in front, where you use such values.

Dia


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Phateus on August 01, 2011, 08:16:15 PM
OMG yeah, I know... They really need to work on the compiler...

I actually work at the US Patent Office and work in instruction processing... VLIW is a fairly new area and there is a lot of new work coming out.. so give it a couple years (sigh)... What you have to remember is that compiling VLIW code is extremely complicated (the kernel itself only uses 21 registers) and most of the instructions have to be based solely on the previous instruction.


From Wikipedia (http://en.wikipedia.org/wiki/Very_long_instruction_word):
Quote
As a result, VLIW CPUs offer significant computational power with less hardware complexity (but greater compiler complexity) than is associated with most superscalar CPUs.

As is the case with any novel architectural approach, the concept is only as useful as code generation makes it. That is, the fact that a number of special-purpose instructions are available to facilitate certain complicated operations... is useless if compilers are unable to spot relevant source code constructs and generate target code that duly utilizes the CPU's advanced offerings. Therefore, programmers must be able to express their algorithms in a manner that makes the compiler's task easier.

With all of that said, it would be amazing if you could just write:
Code:
Init1();
for (int n = 0; n != 64; n++)
{
    SHARound();
}
Init2();
for (int n = 0; n != 64; n++)
{
    SHARound();
}
and let the compiler sort it out...
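
On a CPU the straightforward form really is that short. The following is a minimal Python 3 sketch (not OpenCL, and not the phatk code) of SHA-256 written as plain loops with nothing unrolled. Init1() and Init2() are not defined in the post; the sketch assumes they correspond to setting up the two passes of Bitcoin's double SHA-256. All function names are made up, and the constants are derived from prime roots instead of being typed in, purely to keep the listing short.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def first_primes(n):
    """Return the first n primes by trial division."""
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def iroot(n, k):
    """Largest integer r with r**k <= n, found by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // k + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# SHA-256 constants: fractional bits of square/cube roots of the first primes.
H0 = [iroot(p << 64, 2) & MASK for p in first_primes(8)]
K = [iroot(p << 96, 3) & MASK for p in first_primes(64)]

def rotr(x, n):
    """Rotate a 32-bit value right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    """One SHA-256 compression: message schedule plus 64 plain rounds."""
    w = list(struct.unpack('>16I', block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):  # the "SHARound()" loop
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = ((t1 + t2) & MASK, a, b, c,
                                  (d + t1) & MASK, e, f, g)
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def sha256(message):
    """Pad the message and run compress() over each 64-byte block."""
    padded = (message + b'\x80' + b'\x00' * ((55 - len(message)) % 64)
              + struct.pack('>Q', len(message) * 8))
    state = H0
    for offset in range(0, len(padded), 64):
        state = compress(state, padded[offset:offset + 64])
    return struct.pack('>8I', *state)

def double_sha256(header):
    """Hash twice, as Bitcoin does for block headers."""
    return sha256(sha256(header))

header = bytes(80)  # dummy all-zero 80-byte header, for illustration only
reference = hashlib.sha256(hashlib.sha256(header).digest()).digest()
assert double_sha256(header) == reference
print(double_sha256(header).hex())
```

The gap between this and the hand-ordered kernel is the point being made above: the loop form is easy to read, but on VLIW hardware the compiler has to find the instruction packing on its own.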


Title: Re: Modified Kernel for Phoenix 1.4
Post by: iopq on August 01, 2011, 09:59:10 PM

change
Code:
#define rot(x, y) amd_bitalign(x, x, (32-y))
#else
#define rot(x, y) rotate(x, y)
#endif
to
Code:
#define rot(x, y) amd_bitalign(x, x, (uint)(32-y))
#else
#define rot(x, y) rotate(x, (uint)(y))
#endif
and
Code:
#define rot2(x, y) rotate(x, y)
to
Code:
#define rot2(x, y) rotate(x, (uint)(y))
If anyone tries this out, let me know if it changes anything.
this works on 2.1 SDK


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Phateus on August 01, 2011, 11:16:49 PM

Awesome, Thanks.  I'll implement the changes and release soon.

On another note, I just was searching through AMD's downloads and the KernelAnalyzer 1.9 just came out today with "Support for AMD APP SDK 2.5."... I think someone said that SDK 2.5 is supposed to support BFI_INT natively, so, maybe we can get some better performance with 2.5 *crosses fingers* :)


Title: Re: Modified Kernel for Phoenix 1.5
Post by: joulesbeef on August 02, 2011, 01:21:35 AM
Quote
I think someone said that SDK 2.5 is supposed to support BFI_INT natively,

sounds like it (http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=150003&enterthread=y)

Quote
"In SDK 2.5 we are expanding that, along with other optimizations, to generate BFI instructions."


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Diapolo on August 02, 2011, 05:10:57 AM
Seems you are wrong (at least for now):

Quote
The optimization has been disabled in the current SDK due to a bug in the implementation that didn't get fixed in time.

By the way is there any official Download link for the KernelAnalyzer 1.9?

Dia


Title: Re: Modified Kernel for Phoenix 1.5
Post by: dishwara on August 02, 2011, 05:13:43 AM
By the way is there any official Download link for the KernelAnalyzer 1.9?
Dia
http://developer.amd.com/TOOLS/AMDAPPKERNELANALYZER/Pages/default.aspx
http://developer.amd.com/Downloads/AMDAPPKernelAnalyzer-v1.9.1016.msi


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Diapolo on August 02, 2011, 05:34:27 AM
Thank you very much! But bad news, I checked phatk 2.0, my old and my new kernel version, and all of them use fewer GPRs but 1 - 2 more ALU OPs ... SDK 2.5 is a sucker until (again) some optimisations have been done. Phat, how do you order the commands to achieve best performance, are you using the ASM code from KernelAnalyzer or is it trial and error?

Dia


Title: Re: Modified Kernel for Phoenix 1.5
Post by: joulesbeef on August 02, 2011, 05:36:16 AM
Quote
Seems you are wrong (at least for now):

Read it again. He is asking whether it is in 2.4. He says, roughly, "I read here that it will be in 2.5; isn't it already in the current one?", meaning 2.4. They answer no, it was disabled in the current one (meaning 2.4) because it wasn't fixed in time.

At least that is how I read it. Note the dates of the posts; they have to be talking about 2.4.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 02, 2011, 05:41:36 AM
edit:  BTW, I always thought your numbers were a couple lower than mine because you defined OUTPUT_MASK as something like "0x10" or something... doing that makes all my numbers match the ones on your thread
lol.... mostly trial and error. Initially, for version 1.1, I looked at filling the gaps in the VLIW assembly (finding which VLIW5 bundles only had 4 instructions, and using barrier(0) instructions to see where in the assembly the OpenCL code is), but that took a LONG time, and I think I am done with that... (it turned out it only gave me like 3 ALU ops anyway).


Quote
Seems you are wrong (at least for now):

Read it again. He is asking whether it is in 2.4. He says, roughly, "I read here that it will be in 2.5; isn't it already in the current one?", meaning 2.4. They answer no, it was disabled in the current one (meaning 2.4) because it wasn't fixed in time.

At least that is how I read it. Note the dates of the posts; they have to be talking about 2.4.

Yeah, I said that KernelAnalyzer 1.9 was out today saying that it supports 2.5, but 2.5 isn't out yet... probably tomorrow.


And, I just posted another kernel... this one is much better to look at than 2.0... I got rid of all but 3 of the SHARound #defines... Check the first page for the link


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Diapolo on August 02, 2011, 05:45:26 AM
Cat 11.8 preview and Cat 11.7 have the SDK 2.5 runtime, so my tests are real :-/.

Dia


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 02, 2011, 05:46:45 AM
oooh, I will have to try that out... boo for AMD


Title: Re: Modified Kernel for Phoenix 1.5
Post by: joulesbeef on August 02, 2011, 05:55:32 AM
Quote
And, I just posted another kernel... this one is much better to look at than 2.0... I got rid of all but 3 of the SHARound #defines... Check the first page for the link

I'm still getting miner idle errors in guiminer  with  VECTORS BFI_INT -k phatk FASTLOOP=false WORKSIZE=256 AGGRESSION=12 -q2

is it just guiminer?

edit:works fine with aoclbf 1.75.. i wonder why guiminer has such trouble

speed 318 over 315 with diablo 7-17


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Diapolo on August 02, 2011, 05:56:07 AM
You are using the OpenCL rotate() instead of amd_bitalign(), what's the benefit here (is it the same under the hood)?

Dia


Title: Re: Modified Kernel for Phoenix 1.5
Post by: pennytrader on August 02, 2011, 05:57:59 AM
With catalyst 11.6 + SDK 2.1, 975/300 setting, I'm only getting 176 mhs with phatk 2.1

With Diapolo's kernel, I was able to get 314 mhs


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 02, 2011, 06:08:02 AM
With catalyst 11.6 + SDK 2.1, 975/300 setting, I'm only getting 176 mhs with phatk 2.1

With Diapolo's kernel, I was able to get 314 mhs

AAAH! I think OpenCL is going to make my head explode.. lol

You are using the OpenCL rotate() instead of amd_bitalign(), what's the benefit here (is it the same under the hood)?

Dia

No, just cleaner.. since it is the same code (well... for SDK 2.4 at least)... it looks like 2.1 does not realize that they are the same and I will have to change it back...

Quote
And, I just posted another kernel... this one is much better to look at than 2.0... I got rid of all but 3 of the SHARound #defines... Check the first page for the link

I'm still getting miner idle errors in guiminer  with  VECTORS BFI_INT -k phatk FASTLOOP=false WORKSIZE=256 AGGRESSION=12 -q2

is it just guiminer?

edit:works fine with aoclbf 1.75.. i wonder why guiminer has such trouble

speed 318 over 315 with diablo 7-17

No clue, I have not used or downloaded GUIMiner, I use aoclbf.  I might be able to take a look at it after figuring out how to make it work for SDK 2.1


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 02, 2011, 06:15:52 AM
With catalyst 11.6 + SDK 2.1, 975/300 setting, I'm only getting 176 mhs with phatk 2.1

With Diapolo's kernel, I was able to get 314 mhs

I changed it again, if it still doesn't work at all, can you give me some details on the settings you are using?


Title: Re: Modified Kernel for Phoenix 1.5
Post by: pennytrader on August 02, 2011, 06:21:31 AM
With catalyst 11.6 + SDK 2.1, 975/300 setting, I'm only getting 176 mhs with phatk 2.1

With Diapolo's kernel, I was able to get 314 mhs

I changed it again, if it still doesn't work at all, can you give me some details on the settings you are using?

Now worked! 316 Mhash/sec!

-k phatk DEVICE=1 VECTORS BFI_INT AGGRESSION=11 WORKSIZE=256

And it uses 0% CPU as usual.

Excellent work!


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 02, 2011, 06:30:23 AM
And, I just posted another kernel... this one is much better to look at than 2.0... I got rid of all but 3 of the SHARound #defines... Check the first page for the link

Still does not work for me.
Here's the error message:

Code:
/usr/local/lib/python2.6/dist-packages/pyopencl-2011.1-py2.6-linux-x86_64.egg/pyopencl/__init__.py:163: UserWarning: Build succeeded, but resulted in non-empty logs:
Build on <pyopencl.Device 'Cypress' at 0x2cd7590> succeeded, but said:

/tmp/OCLWaeOzJ.cl(152): warning: variable "t1" was set but never used
        u t1;
          ^


  warn("Build succeeded, but resulted in non-empty logs:\n"+message)
[02/08/2011 06:15:45] Finding inner ELF...
[02/08/2011 06:15:45] Patching inner ELF...
[02/08/2011 06:15:45] Patching instructions...
[02/08/2011 06:15:45] BFI-patched 472 instructions...
[02/08/2011 06:15:45] Patch complete, returning to kernel...
[02/08/2011 06:15:45] Applied BFI_INT patch
[02/08/2011 06:15:46] Phoenix r100 starting...
[02/08/2011 06:15:46] Connected to server
[02/08/2011 06:15:46] Server gave new work; passing to WorkQueue
[02/08/2011 06:15:46] New block (WorkQueue)
[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]Unhandled error in Deferred:
Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 361, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/media/persistent/phoenix/QueueReader.py", line 136, in preprocess
    d2 = defer.maybeDeferred(self.preprocessor, nr)
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 133, in maybeDeferred
    result = f(*args, **kw)
  File "kernels/phatk/__init__.py", line 179, in <lambda>
    self.qr = QueueReader(self.core, lambda nr: self.preprocess(nr),
  File "kernels/phatk/__init__.py", line 379, in preprocess
    kd = KernelData(nr, self.core, self.rateDivisor, self.AGGRESSION)
  File "kernels/phatk/__init__.py", line 46, in __init__
    unpack('LLLL', nonceRange.unit.data[64:]), dtype=np.uint32)
struct.error: unpack requires a str[02/08/2011 06:15:46] Server gave new work; passing to WorkQueue
[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]


Hmmm... can you try replacing the 'LLLL' with 'IIII' (line 46 of __init__.py), I think the windows version uses python 2.7 which may handle that differently.

Edit:  I've made the changes already and posted as 2.1 again (hopefully this fixes it)


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ssateneth on August 02, 2011, 06:52:50 AM
Thanks for the update. Apparently guiminer needs to be updated for this kernel to work though (outdated phoenix?..) It just spams idle on the console. I really need to use guiminer. This is so frustrating :(


Title: Re: Modified Kernel for Phoenix 1.5
Post by: dishwara on August 02, 2011, 07:13:21 AM
2.1version. With VECTORS4 & worksize 128 or 64, i only get 365 instead of 441. I under clock memory.
But with just vectors i get 448 & 432.
Using 2version i got 441 & 427

cards MSI Lightning 5870 & Sapphire HD 5870
MSI  448 Mhash/s - 975/325, 1175mV - aggression 13
Sapphire 432 Mhash/s - 939/313, 1163mV - aggression 12

Windows 7, 64 bit, AERO enabled, AOCLBF 1.75


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ssateneth on August 02, 2011, 07:31:26 AM
2.1version. With VECTORS4 & worksize 128 or 64, i only get 365 instead of 441. I under clock memory.
But with just vectors i get 448 & 432.
Using 2version i got 441 & 427

cards MSI Lightning 5870 & Sapphire HD 5870
MSI  448 Mhash/s - 975/325, 1175mV - aggression 13
Sapphire 432 Mhash/s - 939/313, 1163mV - aggression 12

Windows 7, 64 bit, AERO enabled, AOCLBF 1.75

VECTORS4 is only if you DON'T underclock memory i.e. stock memory clocks or the glitch where you can only underclock memory 100mhz lower than core speeds.

Low memory speed (<400MHz) = VECTORS and WORKSIZE=256
High memory speed (>900MHz) = VECTORS4 and WORKSIZE=64 or WORKSIZE=128


Title: Re: Modified Kernel for Phoenix 1.5
Post by: joulesbeef on August 02, 2011, 07:49:03 AM
Quote
Thanks for the update. Apparently guiminer needs to be updated for this kernel to work though (outdated phoenix?..) It just spams idle on the console. I really need to use guiminer. This is so frustrating

nah phoenix seems up to date I'm guessing it is due to it using python 2.6 instead of 2.7


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 02, 2011, 08:19:39 AM
Quote
Thanks for the update. Apparently guiminer needs to be updated for this kernel to work though (outdated phoenix?..) It just spams idle on the console. I really need to use guiminer. This is so frustrating

nah phoenix seems up to date I'm guessing it is due to it using python 2.6 instead of 2.7

Woooo!, found the bug... it is in my kernel...
replace
Code:
#self.commandQueue.finish()
with
Code:
self.commandQueue.finish()
near the end of __init__.py

*sigh*... Uploaded the file yet again...


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ssateneth on August 02, 2011, 08:47:56 AM
Quote
Thanks for the update. Apparently guiminer needs to be updated for this kernel to work though (outdated phoenix?..) It just spams idle on the console. I really need to use guiminer. This is so frustrating

nah phoenix seems up to date I'm guessing it is due to it using python 2.6 instead of 2.7

Woooo!, found the bug... it is in my kernel...
replace
Code:
#self.commandQueue.finish()
with
Code:
self.commandQueue.finish()
near the end of __init__.py

*sigh*... Uploaded the file yet again...

THIS FIXED IT! THANK YOU!!!!!

Donation coming your way. EXCELLENT improvement. Gained 4 mhash on my 5830 and 5.4 on my 5870. Amazing!
Also 3 x 5830 rig went from 966.1 to 977.3, an increase of 11.2 mhash or 1.159%


Title: Re: Modified Kernel for Phoenix 1.5
Post by: dishwara on August 02, 2011, 09:11:32 AM
With NEW 2.1, same results only. 448 & 432.
I am using 11.8 beta.
AMD APP 2.5.709.2
AMD Display Driver 8.880.3.0000


Title: Re: Modified Kernel for Phoenix 1.5
Post by: John (John K.) on August 02, 2011, 09:36:04 AM
With new 2.1, hashes have improved at 4 mhs from 410 to 414 for my 5850's each ! ;D ;D
Thank you!


Title: Re: Modified Kernel for Phoenix 1.4
Post by: Clipse on August 02, 2011, 10:51:29 AM
Good stuff Phateus :) I'm getting an extra 2-3MH/s with your newest kernel compared to Diapolo's last kernel. I merged the code into my fork of poclbm and it seems to be working fine there (with command line option --phatk2):
https://github.com/progranism/poclbm (https://github.com/progranism/poclbm)

The only bug I found was that the kernel wouldn't compile without BITALIGN. Not really important, since all my mining cards support BITALIGN. It complained about rotate being ambiguous.

Keep up the good work!

Hey fpgaminer, I really like this poclbm version of phatk2 but could you update the same version with --phatk2_1 switch or something so we could testdrive both versions with ease :)


Title: Re: Modified Kernel for Phoenix 1.5
Post by: iopq on August 02, 2011, 12:04:17 PM
I'm getting a warning:

D:\sw\python27\lib\site-packages\pyopencl\__init__.py:173: UserWarning: Build su
cceeded, but resulted in non-empty logs:
Build on <pyopencl.Device 'Juniper
          ' at 0x414d7a0> succeeded, but said:

C:\Users\Igor\AppData\Local\Temp\OCL6496.tmp.cl(155): warning: variable "t1"
          was set but never used
        u t1;
          ^

NT -D
  warn("Build succeeded, but resulted in non-empty logs:\n"+message)


Title: Re: Modified Kernel for Phoenix 1.4
Post by: UniverseMan on August 02, 2011, 01:43:38 PM
Using 2.1, I'm still getting the same error as before.
I'm using Ubuntu 11.04, Catalyst 11.6, Phoenix 1.50. I unpacked the phatk version 2 files into my phoenix-1.50/kernels/phatk folder.
When I ran my phoenix with kernel options
Code:
-k phatk DEVICE=0 BFI_INT VECTORS AGGRESSION=12 FASTLOOP=FALSE WORKSIZE=256
I got the following error:
Code:
user@computer:~$ sudo ./btcg0.sh
[31/07/2011 18:04:08] Phoenix 1.50 starting...
[31/07/2011 18:04:09] Connected to server
[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]Unhandled error in Deferred:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 361, in callback
    self._startRunCallbacks(result)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/user/phoenix-1.50/QueueReader.py", line 136, in preprocess
    d2 = defer.maybeDeferred(self.preprocessor, nr)
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 133, in maybeDeferred
    result = f(*args, **kw)
  File "kernels/phatk/__init__.py", line 167, in <lambda>
    self.qr = QueueReader(self.core, lambda nr: self.preprocess(nr),
  File "kernels/phatk/__init__.py", line 361, in preprocess
    kd = KernelData(nr, self.core, self.VECTORS, self.AGGRESSION)
  File "kernels/phatk/__init__.py", line 46, in __init__
    unpack('LLLL', nonceRange.unit.data[64:]), dtype=np.uint32)
struct.error: unpack requires a string argument of length 32
I then had to CTRL+Z to kill the process.

I'm still getting the unpack 'LLLL' error. (Note: I tried to pipe the output of phoenix through tee to make a log, but tee gave a completely garbled log file and I didn't notice it until after I reverted my kernel files. This is obviously not your problem; I just want you to know why I don't have any new error messages to show.)

This is the same error as znort (except I'm on python 2.7 and he's on 2.6).

You suggested a fix...
Hmmm... can you try replacing the 'LLLL' with 'IIII' (line 46 of __init__.py), I think the windows version uses python 2.7 which may handle that differently.
...which I tried (even though you say it's a windows problem and I'm not on windows). It gave another error at a later 'unpack' call being passed 'LLLLLLLL', and the error said 'unpack requires a string argument of length 64'. I tried changing that one to 'IIIIIIII', but it gave another error down the line that said something like 'incorrect arguments passed to kernel'. (Again, apologies for not having an error log.)

EDIT: I checked something else, and now I'm more ??? than ever. I loaded up the python interpreter on my machine so I can check how it sees the 'LLLL' and 'IIII' strings.
Code:
>>> import struct
>>> struct.calcsize('LLLL')
32
>>> struct.calcsize('IIII')
16
>>> struct.calcsize('LLLLLLLL')
64
>>> struct.calcsize('IIIIIIII')
32
Does this mean it's the nonceRange data and not the 'LLLL' that's the wrong size? How could that be? Does that mean there's some error wherever that nonceRange got packed in the first place?

Like I said, I'm  ???


Title: Re: Modified Kernel for Phoenix 1.4
Post by: UniverseMan on August 02, 2011, 02:23:03 PM
All that stuff I just said.

HA! Got it fixed. Anyone who's having the error I just had, go through __init__.py, and every time there's an 'unpack' or a 'pack' statement that gets passed some number of 'L's (they will be 2, 4, or 8 'L's long), just add an '=' to the beginning. So 'LLLL' becomes '=LLLL'.

If you look up the documentation on how struct (which is where pack and unpack come from) parses its arguments, found here (http://docs.python.org/library/struct.html?highlight=struct#byte-order-size-and-alignment), it's system dependent by default. But if you add the '=', it forces the size characters (Like the 'L's and 'I's and such) to be standard size.  ;D
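To make the size difference concrete, here is a minimal sketch using only the standard struct module. The 16-byte payload is a made-up stand-in for nonceRange.unit.data[64:], not real work data, and the native size printed will depend on your platform (typically 32 on 64-bit Linux, 16 on Windows).

```python
import struct

# Native mode (no prefix): 'L' is the platform's C unsigned long,
# so four of them may take 16 or 32 bytes depending on the system.
native_size = struct.calcsize('LLLL')

# Standard mode ('=' prefix): 'L' is always 4 bytes, with no padding.
standard_size = struct.calcsize('=LLLL')

print(native_size, standard_size)

# Hypothetical 16-byte payload standing in for nonceRange.unit.data[64:].
payload = bytes(bytearray(range(16)))

# With '=', this unpacks four 32-bit words regardless of platform.
words = struct.unpack('=LLLL', payload)
print(words)
```

With the '=' prefix the format always matches a 16-byte slice, which is presumably why the unpack error goes away.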

(Also, I had to uncomment the self.commandQueue.finish() statement, as per this post (https://bitcointalk.org/index.php?topic=7964.msg420457#msg420457). I thought that was fixed, but it was still broken when I dled this morning.)

Kernel - 6870 945/1050 - 5830 1030/330
Diapolo 7-17 - 293 MH/s - 325 MH/s
phatk2.1 - 299 MH/s - 328 MH/s

Thanks for the work, phateus.  ;D ;D


Title: Re: Modified Kernel for Phoenix 1.5
Post by: iopq on August 02, 2011, 02:31:02 PM
using poclbm fork with phatk2.1 and it's the fastest kernel so far
I tried with 2.4 opencl and it was slower, so I went back to 2.1 which is the fastest on my card (hd 5750)


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Clipse on August 02, 2011, 02:41:19 PM
using poclbm fork with phatk2.1 and it's the fastest kernel so far
I tried with 2.4 opencl and it was slower, so I went back to 2.1 which is the fastest on my card (hd 5750)

Yeh I must say, poclbm fork with phatk2 outperformed phatk2 on phoenix 1.5 (and up till now all phatk mods performed better on phoenix 1.5 for me)

quite interesting.

ps. iopq, can you post the changes made to run phatk2.1 on poclbm mod by fpgaminer, I assume you are using that? Also what arg is added to use vectors4. Ive replaced phatk2.cl with phatk2.1 cl but I get ~11mh less with phatk2.1 so I am wondering if there is other changes required. I am using sdk 2.4


Title: Re: Modified Kernel for Phoenix 1.5
Post by: UniverseMan on August 02, 2011, 02:43:35 PM
I tried VECTORS4 on my 6870, since I can't underclock the memory (it's at 1050).

Results:
VECTORS
WS 64: 295 MH/s
WS 128: 299 MH/s
WS 256: 292 MH/s

VECTORS4
WS 64: 278 MH/s
WS 128: 258 MH/s
WS 256: 230 MH/s

So VECTORS4 doesn't give me any boost. But thanks for putting it in. New functionality is always a plus.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Diapolo on August 02, 2011, 03:10:38 PM
@Phat:

I don't understand how you achieve, that base is always an uint as kernel parameter now that base has (uint2)(0, 1) or (uint4)(0, 1, 2, 3) added into it via the init-file. If I try to do this with my mod it just crashes Phoenix, now if I use const u base, instead of const uint base, it seems to work (because u reflects the correct variable type uint, uint2 or uint4). Have you got an idea for this?

Thanks,
Dia


Title: Re: Modified Kernel for Phoenix 1.5
Post by: iopq on August 02, 2011, 03:14:14 PM
using poclbm fork with phatk2.1 and it's the fastest kernel so far
I tried with 2.4 opencl and it was slower, so I went back to 2.1 which is the fastest on my card (hd 5750)

Yeh I must say, poclbm fork with phatk2 outperformed phatk2 on phoenix 1.5 (and up till now all phatk mods performed better on phoenix 1.5 for me)

quite interesting.

ps. iopq, can you post the changes made to run phatk2.1 on poclbm mod by fpgaminer, I assume you are using that? Also what arg is added to use vectors4. Ive replaced phatk2.cl with phatk2.1 cl but I get ~11mh less with phatk2.1 so I am wondering if there is other changes required. I am using sdk 2.4

I'm using that, just replaced the phatk2 kernel with phatk2.1 and that's it
vectors4 should be slower, why would you want to use it? I use -v only


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Clipse on August 02, 2011, 03:21:02 PM
using poclbm fork with phatk2.1 and it's the fastest kernel so far
I tried with 2.4 opencl and it was slower, so I went back to 2.1 which is the fastest on my card (hd 5750)

Yeh I must say, poclbm fork with phatk2 outperformed phatk2 on phoenix 1.5 (and up till now all phatk mods performed better on phoenix 1.5 for me)

quite interesting.

ps. iopq, can you post the changes made to run phatk2.1 on poclbm mod by fpgaminer, I assume you are using that? Also what arg is added to use vectors4. Ive replaced phatk2.cl with phatk2.1 cl but I get ~11mh less with phatk2.1 so I am wondering if there is other changes required. I am using sdk 2.4

I'm using that, just replaced the phatk2 kernel with phatk2.1 and that's it
vectors4 should be slower, why would you want to use it? I use -v only

Just wanted to test vectors4 with default memory, not high priority.

Still phatk2.1 is much slower than phatk2 for me as I said ~11mh per card, ati hd5850 , I wonder why o_0


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Tx2000 on August 02, 2011, 03:52:39 PM
I have a 3Mhash avg improvement over Diapolo last kernel update (393 -> 396)

Setup is as follows:

Reference 5850, 1.100v 920 core / 350 mem.  11.4 preview / SDK 2.4.   Latest GUIMiner / phoenix 1.50


Going to run it for a day to see its stability and report back if anything arises.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 02, 2011, 05:34:57 PM
@Phat:

I don't understand how you achieve, that base is always an uint as kernel parameter now that base has (uint2)(0, 1) or (uint4)(0, 1, 2, 3) added into it via the init-file. If I try to do this with my mod it just crashes Phoenix, now if I use const u base, instead of const uint base, it seems to work (because u reflects the correct variable type uint, uint2 or uint4). Have you got an idea for this?

Thanks,
Dia

I'm not sure I understand you... Depending on whether the number of nonces per thread (VECTORS) is 1, 2, or 4, the kernel compiles with base being either uint, uint2 or uint4.  The init file packs either 1, 2 or 4 uints into each base entry and therefore, the init file always produces the same size variable as the kernel needs.  So, in short, both the base[i] variable being passed to the kernel and the "u base" value in the kernel can be either 1, 2 or 4 uints.  Does that answer your question?
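As a rough illustration of that idea (not the actual phatk __init__.py code), the host side could build each base entry like this. The function name pack_base and the choice of consecutive nonces per lane are assumptions made for the sketch.

```python
import struct

def pack_base(start_nonce, width):
    """Pack `width` consecutive nonces as 32-bit unsigned ints.

    width is 1 (no vectors), 2 (VECTORS) or 4 (VECTORS4), so the result
    is 4, 8 or 16 bytes, matching uint, uint2 or uint4 in the kernel.
    Hypothetical helper, not taken from phatk.
    """
    if width not in (1, 2, 4):
        raise ValueError("width must be 1, 2 or 4")
    lanes = [(start_nonce + i) & 0xFFFFFFFF for i in range(width)]
    # '=' forces standard 4-byte sizes independent of the platform.
    return struct.pack('=%dI' % width, *lanes)

for width in (1, 2, 4):
    print(width, len(pack_base(0, width)))
```

The point is only that the byte size handed to the kernel changes with the vector width, so the kernel argument type has to change with it.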


Title: Re: Modified Kernel for Phoenix 1.5
Post by: joulesbeef on August 02, 2011, 05:42:57 PM
Quote
Woooo!, found the bug... it is in my kernel...


you rock sir phateus... all is working here and nice speed up.. especially over the stock phatk 1.0

but yeah faster than diablo 7-17 for me.. on a 5830 sdk 2.4 11.6 cat guiminer..


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ssateneth on August 02, 2011, 07:15:09 PM
in case some feedback was wanted for VECTORS4, I got about 20 mhash improvement on my 5870 when I have it set to stock speeds (850/1200) when using computer normally (360 -> 380 mhash)
I will continue to use VECTORS when I am AFK (1015/355) for 470.1 mhash.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Diapolo on August 02, 2011, 07:20:40 PM
@Phat:

I don't understand how you achieve, that base is always an uint as kernel parameter now that base has (uint2)(0, 1) or (uint4)(0, 1, 2, 3) added into it via the init-file. If I try to do this with my mod it just crashes Phoenix, now if I use const u base, instead of const uint base, it seems to work (because u reflects the correct variable type uint, uint2 or uint4). Have you got an idea for this?

Thanks,
Dia

I'm not sure I understand you... Depending on whether the number of nonces per thread (VECTORS) is 1, 2, or 4, the kernel compiles with base being either uint, uint2 or uint4.  The init file packs either 1, 2 or 4 uints into each base entry and therefore, the init file always produces the same size variable as the kernel needs.  So, in short, both the base[i] variable being passed to the kernel and the "u base" value in the kernel can be either 1, 2 or 4 uints.  Does that answer your question?

I understand what you say and it makes sense, but not what I see now ... the variable base in your code _IS_ declared as u and not uint2. Did I look at the old 2.0 version!?

Dia


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 02, 2011, 09:26:38 PM
@Phat:

I don't understand how you achieve, that base is always an uint as kernel parameter now that base has (uint2)(0, 1) or (uint4)(0, 1, 2, 3) added into it via the init-file. If I try to do this with my mod it just crashes Phoenix, now if I use const u base, instead of const uint base, it seems to work (because u reflects the correct variable type uint, uint2 or uint4). Have you got an idea for this?

Thanks,
Dia

I'm not sure I understand you... Depending on whether the number of nonces per thread (VECTORS) is 1, 2, or 4, the kernel compiles with base being either uint, uint2 or uint4.  The init file packs either 1, 2 or 4 uints into each base entry and therefore, the init file always produces the same size variable as the kernel needs.  So, in short, both the base[i] variable being passed to the kernel and the "u base" value in the kernel can be either 1, 2 or 4 uints.  Does that answer your question?

I understand what you say and it makes sense, but not what I see now ... the variable base in your code _IS_ declared as u and not uint2. Did I look at the old 2.0 version!?

Dia

yes, it is declared as u (it was uint2 in 2.0, but I have made it variable for efficiency)

Code:
#ifdef VECTORS4
typedef uint4 u;
#else
#ifdef VECTORS
typedef uint2 u;
#else
typedef uint u;
#endif
#endif

u is uint2 when VECTORS is declared

Bah, I know all of this scattered code is confusing


Title: Re: Modified Kernel for Phoenix 1.5
Post by: BTC_Junkie on August 03, 2011, 01:49:08 AM
Thanks, getting +1-3% on my cards... better improvement on 5800 series than 6900 series.


Title: Re: Modified Kernel for Phoenix 1.4
Post by: fpgaminer on August 03, 2011, 02:07:11 AM
Hey fpgaminer, I really like this poclbm version of phatk2 but could you update the same version with --phatk2_1 switch or something so we could testdrive both versions with ease :)
Sure thing. All updated. Added --phatk2_1 option, and --vectors4 (which can only be used in combo with phatk2_1).

https://github.com/progranism/poclbm (https://github.com/progranism/poclbm)

Let me know how it works. I tested it on my 5850s. I tested with no vectors, vectors, and vectors4 and they all seemed to work.

For my own sake, I also added a special feature where you can use "-e -1" to force the hashing estimation algorithm to estimate hashing speed over the entire run-time of the miner, and include both accepted and rejected shares. I'm using it to check that the code is actually hashing at the reported rate; no duplicate nonces or other bugs.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: what@3 on August 03, 2011, 04:44:24 AM
my 6950 took a hit from 390 to 356 Mh/s

however all my 6870's all got a 7 mhs bump!

5830 up by 7 also to 327.9 mhs

Thanks!


Title: Re: Modified Kernel for Phoenix 1.5
Post by: lagmo on August 03, 2011, 03:51:42 PM
Awesome!
V. 2.1 kernel works flawlessly on my Linuxcoin 2.0 rigs (SDK 2.4 + 11.5 catalyst, HD5850/5830) generally + 3-4MH/s across the board compared to Diapolo 17-07
Excellent job!  ;D


Title: Re: Modified Kernel for Phoenix 1.5
Post by: phelix on August 03, 2011, 03:57:03 PM
the graph is really cool! how did you create it?

it's very interesting to see that the hashrate really is increasing with lower mem clock.

Can you go below 300 to see if the hashrate starts declining again at some point? I am running my 5850s at a mem clock of 269 and wonder if that is the optimum.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 03, 2011, 06:25:18 PM
the graph is really cool! how did you create it?

it's very interesting to see that the hashrate really is increasing with lower mem clock.

Can you go below 300 to see if the hashrate starts declining again at some point? I am running my 5850s at a mem clock of 269 and wonder if that is the optimum.


Manually testing and inputting data into a google docs spreadsheet :-p
As for going under 300, it might decrease performance, but it would probably also start getting unstable around 250 (from what I've tried).  Best to just try it on your own hardware.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: iopq on August 04, 2011, 03:57:33 AM
the graph is really cool! how did you create it?

it's very interesting to see that the hashrate really is increasing with lower mem clock.

Can you go below 300 to see if the hashrate starts declining again at some point? I am running my 5850s at a mem clock of 269 and wonder if that is the optimum.


Manually testing and inputting data into a google docs spreadsheet :-p
As for going under 300, it might decrease performance, but it would probably also start getting unstable around 250 (from what I've tried).  Best to just try it on your own hardware.
225 gets unstable, but 200 is fine, you just didn't go LOW enough (kind of how 400 hung my GPU)
try 200, it's the best performance on my card with worksize 256, vectors 2


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 04, 2011, 04:24:32 AM
Alright, new version 2.2 is coming out in the next couple days.

As the front page says, 1354 ALU Ops for the 5xxx series vs. 1359 for 2.1

Changes I've made in 2.2 are:
  • added a rotC function for constant values since the compiler apparently does not know how to perform rotate() on constants
Code:
#define rotC(x,n) (x<<n | x >> (32-n))
  • Small tweaking of the order of certain functions and other random things that shouldn't really have done anything >:o
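For anyone who wants to check what rotC computes without touching the kernel, here is a small Python model of it. The name rot_c and the explicit 32-bit mask are additions for the sketch, since Python integers are not fixed-width like OpenCL's uint.

```python
MASK32 = 0xFFFFFFFF

def rot_c(x, n):
    """Rotate the 32-bit value x left by n bits (0 < n < 32).

    Mirrors (x << n | x >> (32 - n)) on a 32-bit unsigned int; the mask
    emulates the truncation the GPU does automatically.
    """
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

print(hex(rot_c(0x12345678, 8)))
```

One thing to keep in mind: the macro as posted does not parenthesize x and n, so it presumably relies on being called with simple operands.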

I will add anything else I think of the next couple days... Also, keep the bug reports coming, so I know if I need to fix anything.


-Phateus


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 04, 2011, 04:25:03 AM
the graph is really cool! how did you create it?

it's very interesting to see that the hashrate really is increasing with lower mem clock.

Can you go below 300 to see if the hashrate starts declining again at some point? I am running my 5850s at a mem clock of 269 and wonder if that is the optimum.


Manually testing and inputting data into a google docs spreadsheet :-p
As for going under 300, it might decrease performance, but it would probably also start getting unstable around 250 (from what I've tried).  Best to just try it on your own hardware.
225 gets unstable, but 200 is fine, you just didn't go LOW enough (kind of how 400 hung my GPU)
try 200, it's the best performance on my card with worksize 256, vectors 2

Awesome, thanks for the info, I'll definitely try it out.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: joulesbeef on August 04, 2011, 04:34:42 AM
so i copied
Code:
#define rotC(x,n) (x<<n | x >> (32-n))

and pasted it in my kernel file and nothing blew up

didnt really get any speed increases but of course I am flying blind here so perhaps that wasnt the right thing to do but hey i did it anyways.

My cat got sick on the carpet but I am willing to believe for now that it has nothing to do with your function


Title: Re: Modified Kernel for Phoenix 1.5
Post by: dishwara on August 04, 2011, 04:57:05 AM
My cat got sick on the carpet but I am willing to believe for now that it has nothing to do with your function
LOL


Title: Re: Modified Kernel for Phoenix 1.5
Post by: deepceleron on August 04, 2011, 06:23:13 AM
I have bad news to report - phatk 2.1 sends bad shares.

On pool mining hardware that consistently gets <2% rejects (and those only are stales within 5 seconds of a new block), I have only changed the phatk kernel:

2956/190 = 6.0% rejected
1944/290 = 13.0% rejected
2656/116 = 4.2% rejected
2615/184 = 6.6% rejected
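The percentages above look like rejected / (accepted + rejected). A quick sketch to reproduce them, assuming each pair is accepted/rejected (the function name is just for illustration):

```python
def reject_rate(accepted, rejected):
    """Percentage of submitted shares that were rejected."""
    return 100.0 * rejected / (accepted + rejected)

# (accepted, rejected) pairs from the four miner instances listed above.
runs = [(2956, 190), (1944, 290), (2656, 116), (2615, 184)]

for accepted, rejected in runs:
    rate = reject_rate(accepted, rejected)
    print("%d/%d = %.1f%% rejected" % (accepted, rejected, rate))
```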

Here's a log from this new kernel showing the atypical random rejects:
(old links)

We can see on the result line that the hashes are bad, by not starting with 00000000:
[03/08/2011 22:17:48] Result c877f46db0d6ab44... rejected

These do not give an "OpenCL error, hardware problem?", or a "didn't meet minimum difficulty, not sending", they are sent and rejected.
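A rough way to pick such lines out of a log automatically is sketched below. The regular expression is modelled on the single rejected line quoted above; that accepted results use the same layout is an assumption, and looks_like_bad_hash is a made-up helper name.

```python
import re

# Matches lines like: "[...] Result c877f46db0d6ab44... rejected"
RESULT_RE = re.compile(r"Result ([0-9a-f]+)\.\.\. (accepted|rejected)")

def looks_like_bad_hash(log_line):
    """True if the line reports a result whose hash does not start with 00000000."""
    match = RESULT_RE.search(log_line)
    if match is None:
        return False  # not a result line at all
    return not match.group(1).startswith("00000000")

line = "[03/08/2011 22:17:48] Result c877f46db0d6ab44... rejected"
print(looks_like_bad_hash(line))
```

This only separates genuinely bad hashes from stale rejects (which would still start with 00000000); it says nothing about why the kernel produced them.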

For an improvement in hashrate of 1% (333.58->336.53 typical) over Diapolo's 07-17 kernel, I get a 5% increase in rejects. I will have to revert. This is on WinXP/5830/11.6/SDK2.4 running phoenix.py 1.50 unmodified source on Python 2.6.6/numpy-1.6.0/... Two miner instances per GPU.

Command line is:
python phoenix.py -v -u http://xxx/ -k phatk VECTORS AGGRESSION=13 BFI_INT WORKSIZE=256 PLATFORM=0 DEVICE=0


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ssateneth on August 04, 2011, 09:08:03 AM
Since I really liked the graph on the front page but thought it lacked granularity, I'm going to take a shot at making a graph too. I'll be doing tests on a 5830 instead of a 5870 though (My 5830 seems a LOT more stable when it comes to memory speeds compared to my 5870).
They'll be based on...
GUIMiner v2011-07-01
Built-in Phoenix miner
11.7 Catalyst
2.5 SDK
phatk 2.1 kernel with..
BFI_INT FASTLOOP=false AGGRESSION=14 and varying worksizes, memory speeds, and VECTORS vs VECTORS4.

Stay tuned :)

Edit: Here's a work-in-progress spreadsheet. It's updated as I test more combos (I have to test each combo and update the spreadsheet manually).
https://spreadsheets.google.com/spreadsheet/ccc?key=0AjXdY6gpvmJ4dEo4OXhwdTlyeS1Vc1hDWV94akJHZFE&hl=en_US

I was planning to put in worksizes of 192, 96, and 48 too, but phatk 2.1 doesn't seem to support it. Less work for me though :P


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Diapolo on August 04, 2011, 12:32:11 PM
@Phat:

What is your experience with SDK 2.5 so far? It seems to behave somewhat odd in terms of expected vs. real performance (KernelAnalyzer vs. Phoenix).
Do you use the CAL 11.7 profile for ALU OP usage information or an earlier version?

Dia


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ssateneth on August 04, 2011, 01:45:53 PM
So far tests indicate the first "sweet spot" is ~220 MHz with VECTORS WORKSIZE=128. The next "sweet spot" (and fastest one yet) is ~370-380MHz with VECTORS WORKSIZE=256. Will keep you guys posted as I run through more combos.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: dishwara on August 04, 2011, 03:33:08 PM
My always sweet spot for 5870 is memory clock is equal to core clock divided by three.
mem= core/3 = 975/3=325


Title: Re: Modified Kernel for Phoenix 1.5
Post by: CanaryInTheMine on August 04, 2011, 04:16:57 PM
My always sweet spot for 5870 is memory clock is equal to core clock divided by three.
mem= core/3 = 975/3=325

Would this hold true for 5850 and/or 5830s?


Title: Re: Modified Kernel for Phoenix 1.5
Post by: mike678 on August 04, 2011, 04:33:01 PM
My always sweet spot for 5870 is memory clock is equal to core clock divided by three.
mem= core/3 = 975/3=325

Would this hold true for 5850 and/or 5830s?
I'd say it holds fairly true for my 5830s; I get my best megahash at 1030/350.

I've only done minor tests with my 5850's but I have a really hard time getting the mem clock down because it starts to crash my system when I do that.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: CanaryInTheMine on August 04, 2011, 04:35:46 PM
My always sweet spot for 5870 is memory clock is equal to core clock divided by three.
mem= core/3 = 975/3=325

Would this hold true for 5850 and/or 5830s?
I'd say it hold fairly true for my 5830's I get my best megahash as 1030/350.

I've only done minor tests with my 5850's but I have a really hard time getting the mem clock down because it starts to crash my system when I do that.

Mike, did you ever figure out that problem you had with MSI Afterburner?  I think you posted on my thread as well about same issue I had...


Title: Re: Modified Kernel for Phoenix 1.5
Post by: mike678 on August 04, 2011, 04:58:13 PM
Mike, did you ever figure out that problem you had with MSI Afterburner?  I think you posted on my thread as well about same issue I had...
Which thread are you talking about? I know I made a thread the other day in support about Afterburner freezing when I hit apply for my 5850s, but I can't remember what your thread was. If you're talking about the freezing, I haven't had a chance to test any further with the 5850s because I literally spent from the time I got out of work until about 1 am working on a skeleton case and trying to figure out why the PSU was making a clicking noise.

Also, I know you got the ncixus 5850s as well. What's your top speed on those so far? I can get up to 395ish with stock voltage.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 04, 2011, 05:18:10 PM
@deepceleron

That is completely baffling to me... I use the EXACT same code to determine which hashes to send as diapolo (from the poclbm kernel).

How are you getting the hash results?

If they are from the server, then i'm pretty sure they never will be 00000.... because the rejection comes because of a mismatch in state data, so the hash comes out different on the server than on your client, right?

If they are from the Client, are you running a modified version of phoenix?  I don't think the stock phoenix logs that information.  If you are, could you post details, so I can look into the bug.

Quote
Two miner instances per GPU.
Why are you running 2 instances per GPU?  That seems like it would just increase overhead and double the amount of stales.  Try only running 1 instance per GPU and perhaps decreasing the AGGRESSION from 13 to 12.  If that doesn't fix it, I'm not sure what else I can do without further information.

Anyone else getting this bug?


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 04, 2011, 05:33:39 PM
@Phat:

What is your experience with SDK 2.5 so far? It seems to behave somewhat odd in terms of expected vs. real performance (KernelAnalyzer vs. Phoenix).
Do you use the CAL 11.7 profile for ALU OP usage information or an earlier version?

Dia

Yeah... not sure... I wasn't paying that much attention to it in 2.4, but what I've noticed is that using fewer registers improves performance.  Mainly, I've noticed that with heavy register usage (or high WORKSIZE numbers), the MH/s became more and more dependent on memory speed (probably the main reason why VECTORS4 performs terribly at low memory speeds).

But, overall I didn't even know I had 2.5, so clearly I didn't really notice a difference, lol.

As of the newest edit on the first page, I am using CAL 11.7.


On a side note, my card exploded (well, the fan died) yesterday, so my work may be slowed a little.  I'm not saying this has anything to do with joulesbeef's cat sicking up on the carpet, but cats are a crafty bunch...


Title: Re: Modified Kernel for Phoenix 1.5
Post by: dishwara on August 04, 2011, 06:17:14 PM
My always sweet spot for 5870 is memory clock is equal to core clock divided by three.
mem= core/3 = 975/3=325

Would this hold true for 5850 and/or 5830s?
Only trial & error will tell.
My sweet spot for 6870 is mem clk = (core clk/3) + 14.
I haven't tested the sweet spot for the 6970 yet, since my motherboard has been in repair for the past 7 days, and when I was mining, Linux did not allow underclocking the memory below core clock minus 125 MHz.
I hope Windows 11.8 will give the correct sweet spot for the 6970, which I will know once I get my motherboard back.
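
The two rules of thumb in this post can be written as one small helper. This is only a sketch in Python 3: the function name is invented here, and the 900 MHz core clock used for the 6870 line is a hypothetical value, since the post gives no core clock for that card.

```python
def suggested_mem_clock(core_mhz, offset_mhz=0):
    """Rule of thumb from this thread: memory clock = core clock / 3 (+ offset)."""
    return core_mhz / 3 + offset_mhz


# 5870 example from the post: 975 / 3
print(suggested_mem_clock(975))

# 6870 rule (core / 3 + 14) with a hypothetical 900 MHz core clock
print(suggested_mem_clock(900, 14))
```

As the post says, only trial and error will tell whether the rule holds for a given card.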


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Diapolo on August 04, 2011, 07:43:07 PM
@Phat:

What is your experience with SDK 2.5 so far? It seems to behave somewhat odd in terms of expected vs. real performance (KernelAnalyzer vs. Phoenix).
Do you use the CAL 11.7 profile for ALU OP usage information or an earlier version?

Dia

Yeah... not sure... I wasn't paying that much attention to in 2.4, but what I've noticed is that using less registers improves performance.  Mainly, I've noticed that with heavy register usage (or high WORKSIZE numbers), the MH/s became more and more dependent on memory speed (probably the main reason why VECTORS4 performs terribly at low memory speeds).

But, overall I didn't even know I had 2.5, so clearly I didn't really notice a difference, lol.

As of the newest edit on the first page, I am using CAL 11.7.


On a side note, my card exploded (well, the fan died) yesterday, so my work may be slowed a little.  I'm not saying this has anything to do with joulesbeef's cat sicking up on the carpet, but cats are a crafty bunch...

In terms of efficiency one has to consider whether a higher RAM frequency is worth it, because the card draws much more power with a higher mem clock :-/. The sweet spot for my 5870 and 5830 seems to be @ 350 MHz Mem.

Hope you get a new card soon :)!

Dia


Title: Re: Modified Kernel for Phoenix 1.5
Post by: joulesbeef on August 04, 2011, 07:57:53 PM
Yeah Phatk i hate to say it but I am having similar issues as deepceleron.

I started to notice an uptick in stales, I thought it was due to our proxy as we had problems before and we update it a lot.
about 3-5% across the board

I reverted back to Dia 7-17 for the past 10 hours, and I have less than 1% stales, which is normal for me.
Using a 5830, 2.4 11.6 win7 32

phoenix,  guiminer  VECTORS BFI_INT -k phatk FASTLOOP=false WORKSIZE=256 AGGRESSION=12 -q2


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 04, 2011, 08:30:41 PM
@joulesbeef

Hmm... this might be a really hard bug to find.  If anyone has any ideas...
At first I was thinking it was because I compare the nonce to 0, but that would only give false negatives (1 in every 4 billion nonce will not be found)
The main difference between mine and diapolo's init file is that I pack 2 bases together and send them to the kernel.  I may try to get rid of the Base variable altogether and just use the offset parameter of the EnqueueKernel() command (I think you can do that in pyopencl)... 
Basically just thinking out loud... :-\
If i didn't love low level programming so much, I think I would shoot myself  :P
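
The "compare the nonce to 0" point can be illustrated on the host side. This is a minimal Python 3 simulation, not the phatk code: it assumes the output slot is initialised to 0 and that 0 doubles as the "nothing found" marker, and `read_result` is a hypothetical name.

```python
NONCE_SPACE = 2 ** 32   # a Bitcoin header nonce is a 32-bit value
NO_RESULT = 0           # assumed sentinel: the output slot starts out as 0


def read_result(output_slot):
    """Interpret the output slot; 0 is treated as 'nothing found'."""
    return None if output_slot == NO_RESULT else output_slot


print(read_result(12345))   # an arbitrary non-zero nonce is reported
print(read_result(0))       # a winning nonce of exactly 0 is silently dropped
print(1 / NONCE_SPACE)      # fraction of the nonce space that is affected
```

A sentinel of this kind can only hide a valid result (a false negative); it cannot make an invalid hash look valid, which is why it would not explain the rejects.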

@Diapolo

Yeah, the unreleased version I am working on uses 20 registers (It performs about the same as a configuration which uses 19 but has 2 more ALU OPs)
Also, are you getting increased number of stales now that you have implemented some of the optimizations from phatk?


Title: Re: Phatk 2.1 returns bad hashes
Post by: deepceleron on August 04, 2011, 09:04:00 PM
@deepceleron

That is completely baffling to me... I use the EXACT same code to determine which hashes to send as diapolo (from the poclbm kernel).

How are you getting the hash results?

If they are from the server, then i'm pretty sure they never will be 00000.... because the rejection comes because of a mismatch in state data, so the hash comes out different on the server than on your client, right?

If they are from the Client, are you running a modified version of phoenix?  I don't think the stock phoenix logs that information.  If you are, could you post details, so I can look into the bug.

Quote
Two miner instances per GPU.
Why are you running 2 instances per GPU?  That seems like it would just increase overhead and double the amount of stales.  Try only running 1 instance per GPU and perhaps decreasing the AGGRESSION from 13 to 12.  If that doesn't fix it, I'm not sure what else I can do without further information.

Anyone else getting this bug?


The output that I pastebinned is the standard console output of phoenix in -v verbose mode, I just highlighted the screen output on my console (with a 3000 line buffer) and copy-pasted it. It includes the first eight bytes of the hash in the results as you can see.

Actually when I said that it was unmodified phoenix that I was running, I lied, by forgetting I had done this modification at line 236 in KernelInterface.py (because of a difficulty bug in a namecoin pool I was previously using):

Original:
        if self.checkTarget(hash, nr.unit.target):
            formattedResult = pack('<76sI', nr.unit.data[:76], nonce)
            d = self.miner.connection.sendResult(formattedResult)
            def callback(accepted):
                self.miner.logger.reportFound(hash, accepted)
            d.addCallback(callback)
            return True
        else:
            self.miner.logger.reportDebug("Result didn't meet full "
                   "difficulty, not sending")
            return False

Mine:
        formattedResult = pack('<76sI', nr.unit.data[:76], nonce)
        d = self.miner.connection.sendResult(formattedResult)
        def callback(accepted):
            self.miner.logger.reportFound(hash, accepted)
        d.addCallback(callback)
        return True


All I've done is remove the second difficulty check in phoenix, and trust that the kernel is returning only valid difficulty 1 shares. Now, instead of spitting out an error "Result didn't meet full difficulty, not sending", phoenix sends on all results returned by the kernel to the pool. Without this mod, logs of your kernel would just show a "didn't meet full difficulty" error message instead of rejects from the pool, which would still be a problem (but the helpful hash value wouldn't be printed for debugging). We can see from the hash value that the bad results are nowhere near a valid share.
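
For readers who want to see what the two checks amount to, here is a self-contained Python 3 sketch. It is not Phoenix's code: the function names are invented, and the zero padding used to complete the truncated hash from the log is a placeholder, since the log only shows the leading bytes.

```python
import hashlib

# Difficulty-1 target: 0x00000000FFFF followed by 26 zero bytes.
DIFF1_TARGET = 0xFFFF << 208


def block_hash(header80):
    """Double SHA-256 of an 80-byte block header (raw digest bytes)."""
    return hashlib.sha256(hashlib.sha256(header80).digest()).digest()


def passes_quick_check(digest):
    """Cheap test: the four most significant bytes (stored last) are zero."""
    return digest.endswith(b"\x00\x00\x00\x00")


def meets_target(digest, target=DIFF1_TARGET):
    """Full test: the digest, read as a little-endian integer, is <= target."""
    return int.from_bytes(digest, "little") <= target


def display(digest):
    """Hex with the most significant byte first, as hashes are usually shown."""
    return digest[::-1].hex()


# Rejected result from the log, padded with zeros (placeholder) to 32 bytes.
bad = bytes.fromhex("c877f46db0d6ab44" + "00" * 24)[::-1]
# A synthetic digest whose top 28 bytes are zero.
good = bytes([0xFF] * 4) + bytes(28)

for digest in (bad, good):
    print(display(digest)[:16], passes_quick_check(digest), meets_target(digest))
```

A digest that starts with c877f46d when displayed fails both tests, which matches the observation that these results are nowhere near a valid share.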

This code mod only exposes a problem in the kernel optimization code: sometimes wild hashes are being returned by the kernel from some bad math (or by the kernel code being vulnerable to some overclocking glitch that no other kernel activates). Are these just "extra" hashes that are leaking through, or is the number of valid shares being returned by the kernel lower too? Hard to tell without a very long statistics run.

I am running two miners per GPU not for some random reason, but because it works. With the right card/overclock/OS/etc, I seem to get a measurable improvement in total hashrate vs one miner (documented by using the phoenix -a average flag with a very long time period and letting the miners run days). The only way my results could not be true would be if the time-slicing between two miners messes up the hashrate calculation displayed, but if this was true, such a bug would present with multiple-gpu systems running one phoenix per gpu too.

With only a 1% improvement from a kernel that works for me, reducing the aggression or running one miner would put the phatk 2.1 performance below what I already had. Putting back the diapolo kernel, I'm back to below 2500/100 on my miners.

My python lib versions are documented here (https://bitcointalk.org/index.php?topic=31235.msg427274#msg427274).

Joulesbeef:
I don't like the word 'stales' for rejected shares unless it specifically refers to shares rejected at a block change because they were obsolete when submitted to a pool, as logged by pushpool. The results I have above are not stale work, they are invalid hashes.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: BOARBEAR on August 04, 2011, 09:14:48 PM
Do you think VLIW4 is a step backward from VLIW5?

VLIW4 is slower than VLIW5 in many computational tasks


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 04, 2011, 10:15:06 PM
Quote
The output that I pastebinned is the standard console output of phoenix in -v verbose mode
Oh, thanks, I didn't even know you could do that... I'll do some testing with that

I think I'm going to have to download the source code for Phoenix and see what is actually happening...

Quote
I am running two miners per GPU not for some random reason, but because it works. With the right card/overclock/OS/etc, I seem to get a measurable improvement in total hashrate vs one miner (documented by using the phoenix -a average flag with a very long time period and letting the miners run days). The only way my results could not be true would be if the time-slicing between two miners messes up the hashrate calculation displayed, but if this was true, such a bug would present with multiple-gpu systems running one phoenix per gpu too.

With only a 1% improvement from a kernel that works for me, reducing the aggression or running one miner would put the phatk 2.1 performance below what I already had. Putting back the diapolo kernel, I'm back to below 2500/100 on my miners.

I agree totally, go with what works.  I am just trying to figure all this out.  Thanks for all your help.

-Phateus


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ssateneth on August 05, 2011, 01:51:10 AM
If it's relevant, I have not had any increase in stales. GUIMiner v2011-07-01, built-in phoenix, phatk 2.1, catalyst 11.7, 2.5 SDK, 1 5870 + 5 5830's using these extra flags in GUI miner: -k phatk VECTORS BFI_INT WORKSIZE=256 FASTLOOP=false AGGRESSION=14


Title: Re: Modified Kernel for Phoenix 1.5
Post by: deepceleron on August 05, 2011, 02:15:00 AM
If it's relavant, I have not had any increase in stales. GUIMiner v2011-07-01, built-in phoenix, phatk 2.1, catalyst 11.7, 2.5 SDK, 1 5870 + 5 5830's using these extra flags in GUI miner: -k phatk VECTORS BFI_INT WORKSIZE=256 FASTLOOP=false AGGRESSION=14

Unless you use the -v flag for verbose logging in phoenix, set your console window so it has a log of thousands of lines you can scroll back through, and look for the "Result didn't meet full difficulty, not sending" error message, you wouldn't see any difference.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: joulesbeef on August 05, 2011, 02:39:40 AM
I'll give it another try.. and use the verbose tag to see what is going on.
right now i have 2 rejects over 360 shares on diablos newest 8-4 version.
3 different pools, both rejects at the same pool, all 3 have over 100 shares.
30 shares with yours 2.1 and no rejects which looks good so far.. I'll let you know when i get up over 300, maybe it was a fluke as some of my pools had connection issues.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: BOARBEAR on August 05, 2011, 03:00:23 AM
I like the VECTORS4 feature, it gives me extra 5Mhash/s using SDK2.5


Title: Re: Modified Kernel for Phoenix 1.5
Post by: joulesbeef on August 05, 2011, 03:28:59 AM
well crap Phateus, I guess I owe ya an apology. 300 shares, no rejects. it must have been a bad day on the pools i was on.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: navigator on August 05, 2011, 06:50:17 PM
DELETED for privacy


Title: Re: Modified Kernel for Phoenix 1.5
Post by: CanaryInTheMine on August 05, 2011, 06:53:01 PM
I am getting rejects and hardware problem errors since the modification of __init__.py. Diapolo's 7-11 version is the last stable version for me on my 2 5830's @ 1000/350 stock voltage. I switched back to 7-11 version last night and today on BTCGuild I am back to showing 4500 (31, 0.68%) on one card and 4324 (27, 0.62%). I am getting 320mh/s with the 7-11 version and was getting 324mh/s with your phatk 2.1. The number of stales/rejects showing on BTCGuild and in phoenix log after a short period of mining was up to and over 3% on both cards yesterday. If you would like I can let one card run each version to compare the difference. I can also provide a log of phoenix if needed. Not for certain about this, but I believe I was only getting the "hardware problem?" error from diapolo's versions after 7-11 and  your phatk2.0. With phatk2.1 I saw the introduction of the rejected shares. I can provide any info if needed. I assume the kernel is pushing the card too hard as Diapolo mentioned earlier in his thread. I have no idea why it is affecting some and not others.

I had to reduce the OC by 10, kept same memory settings with my 5830s using 2.1


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 05, 2011, 09:30:05 PM
well crap Phateus, I guess I owe ya an apology. 300 shares, no rejects. it must have been a bad day on the pools i was on.


Not a problem, I'm glad its working out for ya

I am getting rejects and hardware problem errors since the modification of __init__.py. Diapolo's 7-11 version is the last stable version for me on my 2 5830's @ 1000/350 stock voltage. I switched back to 7-11 version last night and today on BTCGuild I am back to showing 4500 (31, 0.68%) on one card and 4324 (27, 0.62%). I am getting 320mh/s with the 7-11 version and was getting 324mh/s with your phatk 2.1. The number of stales/rejects showing on BTCGuild and in phoenix log after a short period of mining was up to and over 3% on both cards yesterday. If you would like I can let one card run each version to compare the difference. I can also provide a log of phoenix if needed. Not for certain about this, but I believe I was only getting the "hardware problem?" error from diapolo's versions after 7-11 and  your phatk2.0. With phatk2.1 I saw the introduction of the rejected shares. I can provide any info if needed. I assume the kernel is pushing the card too hard as Diapolo mentioned earlier in his thread. I have no idea why it is affecting some and not others.

Any information you can post would be helpful in fixing this.  I am going to try to rewrite some of the init file and see if I can make it a bit more stable...



Title: Re: Modified Kernel for Phoenix 1.5
Post by: BOARBEAR on August 06, 2011, 12:20:05 AM
I am getting rejects and hardware problem errors since the modification of __init__.py. Diapolo's 7-11 version is the last stable version for me on my 2 5830's @ 1000/350 stock voltage. I switched back to 7-11 version last night and today on BTCGuild I am back to showing 4500 (31, 0.68%) on one card and 4324 (27, 0.62%). I am getting 320mh/s with the 7-11 version and was getting 324mh/s with your phatk 2.1. The number of stales/rejects showing on BTCGuild and in phoenix log after a short period of mining was up to and over 3% on both cards yesterday. If you would like I can let one card run each version to compare the difference. I can also provide a log of phoenix if needed. Not for certain about this, but I believe I was only getting the "hardware problem?" error from diapolo's versions after 7-11 and  your phatk2.0. With phatk2.1 I saw the introduction of the rejected shares. I can provide any info if needed. I assume the kernel is pushing the card too hard as Diapolo mentioned earlier in his thread. I have no idea why it is affecting some and not others.
Maybe try disabling the overclock.  I had an experience where switching to a new miner caused problems.  Then I reverted to stock clocks and the problem was solved.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ssateneth on August 06, 2011, 02:38:53 AM
I've gotten "Hardware problem?" errors often since the __init__ patches of 7-17 diapolo and phatk 2.0+, but they don't seem to cause any decrease in mhash, or crashes, or increased stales, so I'm not too worried about them. I'm assuming it causes "Hardware problem" when 1 hash isn't quite right, and getting 1 "bad" hash error every 10 minutes while doing 330 million hashes every second... you can probably see where I'm going with this.

If it matters, reducing my memory clock from 370 to 350 drastically reduced my "Hardware problem?" errors. I've gone from 1 every ~2 minutes to 1 every ~30 minutes.
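
One way to put these error intervals in context is to compare them with how often a share is expected at the same hashrate. The sketch below (Python 3, hypothetical helper name) uses the 330 MH/s figure from this post and the fact that a difficulty-1 share takes about 2**32 hashes on average.

```python
HASHES_PER_DIFF1_SHARE = 2 ** 32  # approximate expected hashes per difficulty-1 share


def errors_in_context(hashrate_hps, seconds_between_errors):
    """Return (hashes computed, expected diff-1 shares) between two errors."""
    hashes = hashrate_hps * seconds_between_errors
    shares = hashes / HASHES_PER_DIFF1_SHARE
    return hashes, shares


for minutes in (2, 10, 30):
    hashes, shares = errors_in_context(330e6, minutes * 60)
    print(f"1 error per {minutes} min: {hashes:.3g} hashes, "
          f"~{shares:.1f} expected shares")
```

Measured per hash the error rate is tiny, but the kernel only returns a handful of results in that time, so comparing errors against shares is the more telling ratio.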


Title: Re: Modified Kernel for Phoenix 1.5
Post by: navigator on August 06, 2011, 03:00:37 AM
DELETED for privacy


Title: Re: Modified Kernel for Phoenix 1.5
Post by: bcforum on August 06, 2011, 11:26:28 AM
Now running 2.1 on one card and at ~500 shares without a single reject or hardware problem. I don't get it. Maybe a pool issue? I know for certain that I still haven't seen one hardware problem? error come up running diapolo's 7-11 version. My temps are usually 62c or lower, currently at 59c.

Rejects are valid hashes that weren't accepted by the pool for some reason. Generally it is because the pool has moved on (added transactions, bumped up the timestamp, etc) Reject count shouldn't be used to validate GPU kernel performance, but you can argue good mining software shouldn't have a high reject rate.

Phoenix (and other mining frontends) checks the results of the GPU miners before sending it on to the pool. If the nonce is invalid (doesn't generate a hash with zeros in the proper place) it is reported as a hardware problem. Different kernels may cause more hardware errors due to how they are exercising the GPU. Unfortunately, this is a problem with your hardware and not the kernel. You will probably have to tweak your overclocking for each kernel to find a stable operating point.



Title: Re: Modified Kernel for Phoenix 1.5
Post by: navigator on August 06, 2011, 04:25:36 PM
DELETED for privacy


Title: Re: Modified Kernel for Phoenix 1.5
Post by: deepceleron on August 06, 2011, 07:19:29 PM
I left 2.1 running on both cards overnight. Today, one card is showing only the hardware problem error occasionally, no increase in rejects. Here is ~5hr log of it to get an idea of how often it pops up, http://pastebin.com/raw.php?i=qQedNWRG (http://pastebin.com/raw.php?i=qQedNWRG). I immediately restarted that miner using 2.1 and got the first hardware problem after 4mins of mining. Next I switched back to diapolo 7-11. Ran it for 15 mins without a single hardware error. Okay once more back to 2.1, again another hardware problem after only 2 mins this time. So I can turn the problem on and off by switching versions. This whole time my other card has been running 2.1 since last night without a single problem. I'm not too concerned about the hardware problem error. But the first night I tried out 2.1 I was getting a large amount of rejects also.

EDIT: Backed off clock 10 MHz to 990, like CanaryInTheMine said in his earlier post, for 10 mins and no problem showed up. Put it back to 1000 and one popped up after ~30 secs. Running the newer version at the slower speed doesn't net me any gain in mhash.

Again, the hardware problem doesn't concern me much. It's just the sudden amounts of rejects I was getting after I first switched to 2.1

Well the hardware error indicates not necessarily a 'hardware error', but a bad hash was detected outside the running OpenCL kernel by the first validity check:

In __init__.py:
    if not hash.endswith('\x00\x00\x00\x00'):
        self.interface.error('Unusual behavior from OpenCL. '
            'Hardware problem?')


So to get this error, either the SHA hashing math or the hash checking in OpenCL was corrupted, and an invalid hash was returned as a valid share. If this happens, then it is also possible that a hash that would be a valid solution could be corrupted and not returned or sent.

Diapolo's 7-17 kernel is also more sensitive to overclock than previous ones, and will start returning the 'hardware error' at overclocks where 7-11 doesn't. Either a different stream core instruction on the die is being used that doesn't overclock well, or the higher utilization causes some failure. My way of thinking is you don't want to overclock to the point where any bad math is happening in any stream processors. If you have to overclock 5% less on a kernel that is 1% more efficient, then you lose any gains.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: phorensic on August 08, 2011, 06:20:12 AM
Ever since the kernel development for phatk got fired up again, people have noticed more "kernel errors".  After weeks of tweaking and testing I've come to realize a few simple things.  When overclocking to the edge, first the kernel will throw kernel errors a couple of times an hour; after just a *few* more MHz it will actually crash the driver and reset the desktop, possibly killing the connection to the pool server.  It's amazing how razor thin those margins are.  Even as little as 4 MHz on the core clock can take you from rock stable all day, to kernel errors, to a crashed driver.  I don't see it as a bug, I see it as the kernel reporting what is going on better than before (verbosity).


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ovidiusoft on August 08, 2011, 06:26:15 AM
@deepceleron

As long as the board doesn't crash or overheat, I think you just need to find out if the increase in hashrate is significantly more than the increase in hardware errors. For my board, on Diapolo's 08-04 version I get 0,2% hardware errors (out of the accepted shares, maybe time referenced would be better) at 1040 MHz. At 1050 it's 0,21%, a difference small enough that it could just as well be network conditions or measurement error (if time was involved). But the hashrate increases by 0,95%. So my reasoning is that I gain about 1% performance while losing 0,01% because of more hardware errors.

I only tested 2.1 on 1050 and for a shorter time, but hardware errors seem to be in the same range as in Diapolo's kernel, so I will most likely leave the frequency alone and do comparative testing on the kernels only.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: bcforum on August 08, 2011, 04:09:11 PM
@deepceleron

As long as the board doesn't crash or overheat, I think you just need to find out if the increase in hashrate is significantly more than the increase in hardware errors. For my board, on Diapolo's 08-04 version I get 0,2% hardware errors (out of the accepted shares, maybe time referenced would be better) at 1040 Mhz. At 1050, it's 0,21%, so a variance that can be as well network conditions or measurement errors (if time was involved). But, the hashrate increases by 0,95%. So my reasonment is that I gain about 1% performance while losing 0,01% because of more hardware errors.

I only tested 2.1 on 1050 and for a shorter time, but hardware errors seem to be in the same range as in Diapolo's kernel, so I will most likely leave the frequency alone and do comparative testing on the kernels only.

I think you should double (at least) the number of hardware errors. Remember that for every bad nonce that is reported, there is probably a good nonce that goes unreported.
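
The back-and-forth above can be checked with a few lines of arithmetic. This is a simplified Python 3 model, not a measurement: it treats the hardware-error percentage as a fraction of useful work lost and applies the "double it" suggestion as a multiplier (`loss_factor`, a name invented here).

```python
def net_gain(hashrate_gain, err_before, err_after, loss_factor=2.0):
    """Relative change in useful work after an overclock step.

    loss_factor scales the reported error rate to account for good
    nonces that may go unreported alongside each bad one.
    """
    before = 1.0 - loss_factor * err_before
    after = (1.0 + hashrate_gain) * (1.0 - loss_factor * err_after)
    return after / before - 1.0


# Figures quoted above: +0.95% hashrate, errors 0.20% -> 0.21%.
print(f"errors counted once:  {net_gain(0.0095, 0.0020, 0.0021, 1.0):.2%}")
print(f"errors counted twice: {net_gain(0.0095, 0.0020, 0.0021, 2.0):.2%}")
```

With these particular figures the overclock step still comes out ahead even when the errors are doubled, because the error rate barely moved between 1040 and 1050 MHz.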


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ssateneth on August 08, 2011, 05:11:47 PM
No update yet? It's aug 8 now <.<


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 08, 2011, 06:17:18 PM
No update yet? It's aug 8 now <.<

Just, posted the new version, enjoy.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ovidiusoft on August 08, 2011, 07:34:27 PM
First run report:

* Diapolo's 08-04: 338.9
* phatk-2.1: 340.9
* phatk-2.2: 341.3

Board is a 5830 Xtreme from Sapphire, GPU at 1050, RAM at 325, phoenix options:

-k phatkmod-0804 VECTORS2 BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256
-k phatk-2.1 VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256
-k phatk-2.2 VECTORS BFI_INT FASTLOOP=false AGGRESSION=14 WORKSIZE=256

I'll leave it overnight to see if any problems (hardware errors, etc), but it looks good! Thank you for your work! (and waiting for Diapolo's reply on your kernel :D ).


Title: Re: Modified Kernel for Phoenix 1.5
Post by: teukon on August 08, 2011, 09:01:35 PM
Sapphire HD5850 Xtreme: 899MHz/327MHz@0.9875V
Linux x86_64, Catalyst 11.6, SDK 2.1
VECTORS BFI_INT AGGRESSION=14 WORKSIZE=256

phatk 2.1: 376.9 Mh/s (+/- 0.1 Mh/s)
phatk 2.2: 377.5 Mh/s (+/- 0.1 Mh/s)


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Beta-coiner1 on August 08, 2011, 10:09:50 PM
Radeon 6950- .1-.3 Mh/s over Diapolo's latest.v 4 w64 f3
Radeon 5770- .2-2.0 Mh/s  "     "   ".v 2 w128 f30

Cat. 11.6B /SDK 2.5

Not bad of an improvement.



Title: Re: Modified Kernel for Phoenix 1.5
Post by: UniverseMan on August 08, 2011, 11:25:33 PM
Cat 11.6, SDK 2.4
Ubuntu 11.04, phoenix 1.50

cards - 5830 1000/300, 6870 945/1050
phatk 2.1 - 324, 299
phatk 2.2 - 323, 298

Slightly slower on 2.2. I've listed it as 1 MH/s slower on both cards, but in fact it's more like .5 or less slower.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Tx2000 on August 09, 2011, 03:17:15 AM
Catalyst 11.4 / SDK 2.4
Ref 5850 @ 920c/320m

-k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256 AGGRESSION=12

2.1: 399.27 to 399.63 Mh/s
2.2: 399.87 to 400.17 Mh/s



Title: Re: Modified Kernel for Phoenix 1.5
Post by: Clipse on August 09, 2011, 04:08:22 AM
Catalyst 11.4 / SDK 2.4
Ref 5850 @ 920c/320m

-k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256 AGGRESSION=12

2.1: 399.27 to 399.63 Mh/s
2.2: 399.87 to 400.17 Mh/s



Damn those are some good hashrates for the core.

I think I will set up Cat 11.4 as well and test my card mem at 320. My cores run between 1050-1150 (for the extreme voltmodded version), all HD5850s as well.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: metacontent on August 09, 2011, 05:50:56 AM
Hey, I've been using this modified kernel for a couple weeks now, I quite like it, just wanted to say thanks.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: teukon on August 09, 2011, 07:00:39 AM
Catalyst 11.4 / SDK 2.4
Ref 5850 @ 920c/320m

-k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256 AGGRESSION=12

2.1: 399.27 to 399.63 Mh/s
2.2: 399.87 to 400.17 Mh/s



Damn those are some good hashrates for the core.

I think i will setup cat 11.4 aswell and test my card mem out at 320, my cores running between 1050-1150(for the extreme voltmodded version) all hd5850's aswell.

Yeah - seriously!  I've come up against this before when trying to find the maximum hash-rate for a 1GHz 5850 and ended up being well and truly trumped by a Windows user with Catalyst 11.4.  I may have to try playing with this version of Catalyst again.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: BOARBEAR on August 09, 2011, 07:06:47 AM
Something is wrong with kernel 2.2

I get 330 MH/s using 2.2
410 MH/s using kernel 2.1

Card: AMD 5870 clocked at 900 MHz

using 11.8 beta driver with SDK2.5


Title: Re: Modified Kernel for Phoenix 1.5
Post by: bcforum on August 09, 2011, 10:30:53 AM
Ubuntu 10.10
Cata 11.3
SDK 2.4
6970x2 OC 940,1375

Phoenix-r112 (Diapolo 7-17 w/ Vals[7] patch) 422.8MH/s
Phatk-2.2 423.3MH/s

So up 0.5MH/s, sent you my profits for the week.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: BOARBEAR on August 09, 2011, 05:06:12 PM
I found that the VECTORS4 option does not work for version 2.2



Title: Re: Modified Kernel for Phoenix 1.5
Post by: ssateneth on August 09, 2011, 05:40:16 PM
I found that the VECTORS4 option does not work for version 2.2



Same. Using VECTORS4 drops my hash rate from 385 to 310 on my 5870. Using VECTORS WORKSIZE=128 brings it back up to about 380.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 09, 2011, 08:33:43 PM
I found that the VECTORS4 option does not work for version 2.2



I optimize the code for VECTORS, so probably making it faster in 2.2 made VECTORS4 slower.  I can't really optimize the kernel for both, so I would just stick with version 2.1 if that is faster for you.

And everyone, thanks for your support, every little bit helps :)


Title: Re: Modified Kernel for Phoenix 1.5
Post by: jedi95 on August 09, 2011, 08:57:30 PM
I found that the VECTORS4 option does not work for version 2.2



Same. Using VECTORS4 drops my hash rate from 385 to 310 on my 5870. Using VECTORS WORKSIZE=128 brings it back up to about 380.

This is probably because of the increased GPR usage of the VECTORS4 code. According to KernelAnalyzer VECTORS4 uses 2707 ALU OPS and 33 GPRs. This is compared with VECTORS which is 1355 ALU OPS and only 23 GPRs. Theoretically VECTORS4 would be faster, since it tests twice the number of nonces using 3 fewer ALU OPS than 2 executions of VECTORS. However, if the GPU runs out of GPRs then this limits the number of threads that can be running at once, which is what causes the performance drop.

(Above ALU OPS and GPR numbers are for Cypress, AKA 58xx)
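
To put rough numbers on the register argument, here is a small Python sketch. It assumes 256 GPRs are available per work-item lane on a Cypress SIMD (the figure I recall from AMD's OpenCL programming guide) and ignores every other occupancy limit, so treat the wavefront counts as illustrative only. The function names are made up for this example.

```python
# Assumption: 256 GPRs per work-item lane per SIMD on Cypress (58xx).
GPRS_PER_LANE = 256

def max_wavefronts(gprs_per_thread, gprs_per_lane=GPRS_PER_LANE):
    """Upper bound on resident wavefronts per SIMD from register use alone."""
    return gprs_per_lane // gprs_per_thread

def alu_ops_per_nonce(alu_ops, nonces_per_thread):
    """ALU ops spent per nonce tested."""
    return alu_ops / nonces_per_thread

# KernelAnalyzer figures quoted above for phatk 2.2 on Cypress.
kernels = {
    "VECTORS":  {"alu_ops": 1355, "gprs": 23, "nonces": 2},
    "VECTORS4": {"alu_ops": 2707, "gprs": 33, "nonces": 4},
}

for name, k in kernels.items():
    print(name,
          alu_ops_per_nonce(k["alu_ops"], k["nonces"]),
          max_wavefronts(k["gprs"]))
```

Under that assumption VECTORS4 saves a fraction of an ALU op per nonce but leaves room for noticeably fewer wavefronts in flight, which is consistent with the slowdown people are reporting.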

VECTORS4 might be faster for 69xx users though, when combined with a smaller WORKSIZE.

EDIT: Just looked at the 2.1 version and it uses even more GPRs with VECTORS4 than 2.2 does. (35 GPRs, 1358 ALU OPS) I'm not quite sure how it can be faster than 2.2.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: metacontent on August 09, 2011, 09:16:34 PM
Why not make two separate kernels then?

VECTORS4 might one day be the better alternative. Instead of doing all that work then, why not start now and keep pace?



Title: Re: Modified Kernel for Phoenix 1.5
Post by: bcforum on August 09, 2011, 10:40:58 PM
VECTORS4 might be faster for 69xx users though, when combined with a smaller WORKSIZE.

Ubuntu 10.10
Catalyst 11.3
SDK 2.4
6970 @ 940,1375
Phatk 2.2

Quote
315.5MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=64 VECTORS4 FASTLOOP=false
414.2MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=128 VECTORS4 FASTLOOP=false
321.1MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=256 VECTORS4 FASTLOOP=false

422.8MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=64 VECTORS FASTLOOP=false
423.5MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=128 VECTORS FASTLOOP=false
420.9MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=256 VECTORS FASTLOOP=false
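
For anyone collecting results like these, a minimal Python sketch that parses the lines above and reports the best WORKSIZE per vector mode. The data is copied from this post; the regular expression and function names are just illustrative.

```python
import re

RESULTS = """\
315.5MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=64 VECTORS4 FASTLOOP=false
414.2MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=128 VECTORS4 FASTLOOP=false
321.1MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=256 VECTORS4 FASTLOOP=false
422.8MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=64 VECTORS FASTLOOP=false
423.5MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=128 VECTORS FASTLOOP=false
420.9MH/s      DEVICE=0 AGGRESSION=13 BFI_INT WORKSIZE=256 VECTORS FASTLOOP=false
"""

LINE_RE = re.compile(r"^([\d.]+)MH/s\s+(.*)$")

def parse(text):
    """Return a list of (mode, worksize, rate) tuples from result lines."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        rate = float(m.group(1))
        opts = m.group(2).split()
        mode = "VECTORS4" if "VECTORS4" in opts else "VECTORS"
        worksize = next(int(o.split("=")[1]) for o in opts
                        if o.startswith("WORKSIZE="))
        rows.append((mode, worksize, rate))
    return rows

def best_per_mode(rows):
    """Map each vector mode to its fastest (worksize, rate) pair."""
    best = {}
    for mode, worksize, rate in rows:
        if mode not in best or rate > best[mode][1]:
            best[mode] = (worksize, rate)
    return best

print(best_per_mode(parse(RESULTS)))
```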


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 09, 2011, 10:43:16 PM
Why not make two separate kernels then?

VECTORS4 might one day be the better alternative. Instead of doing all that work then, why not start now and keep pace?



Because I have literally put in over 100 hours on the main kernel and have gotten almost nothing in donations.  I just don't have the time to keep up with two kernels.  If anyone feels like making a VECTORS4 branch, go for it... the source code is in the public domain and you can use it however you'd like.  ;)

Also, from what I've gathered, there may be only 1 or 2 people interested in it... If you can lower your memory speed, I think VECTORS will always be faster than VECTORS4.

Now, I do like hearing feedback from everyone. I am just letting you know that it is not feasible to optimize the kernel for every possible configuration (SDK 2.1, 2.4, slow memory).  Right now, the kernel is optimized for SDK 2.5 and the 68xx and 5xxx cards, and assumes you pick the best memory clock speed for your card (somewhere around 1/3 of your core clock).

-Phateus


Title: Re: Modified Kernel for Phoenix 1.5
Post by: metacontent on August 09, 2011, 11:00:04 PM
 Right now, the kernel is optimized for SDK 2.5 and the 68xx and 5xxx cards, and assumes you pick the best memory clock speed for your card (somewhere around 1/3 of your core clock).

I think for the foreseeable future those cards will be doing the lion's share of the work, so I would say you are on the right track.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: cyberlync on August 09, 2011, 11:44:27 PM
 Right now, the kernel is optimized for SDK 2.5 and the 68xx and 5xxx cards, and assumes you pick the best memory clock speed for your card (somewhere around 1/3 of your core clock).

I think for the foreseeable future those cards will be doing the lion's share of the work, so I would say you are on the right track.

+1


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Tx2000 on August 10, 2011, 12:32:41 AM
Catalyst 11.4 / SDK 2.4
Ref 5850 @ 920c/320m

-k phatk VECTORS BFI_INT FASTLOOP=false WORKSIZE=256 AGGRESSION=12

2.1: 399.27 to 399.63 Mh/s
2.2: 399.87 to 400.17 Mh/s



Damn those are some good hashrates for the core.

I think I will set up Cat 11.4 as well and test my card's memory at 320. My cores run between 1050 and 1150 (for the extreme voltmodded version), all HD5850s as well.

Yea beats me =/  I haven't been able to get my second 5850 (new 230SA Sapphire 5850 Xtreme) to achieve the same results.  In fact, it seems to hate SDK 2.4.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: fpgaminer on August 10, 2011, 04:35:50 AM
Updated my poclbm branch to support phatk2.2 through the --phatk2_2 command line option:

https://github.com/progranism/poclbm (https://github.com/progranism/poclbm)



Title: Re: Modified Kernel for Phoenix 1.5
Post by: BOARBEAR on August 10, 2011, 09:40:16 AM
The thing is, VECTORS4 worked perfectly for me in version 2.1.
In version 2.2 it's broken.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 10, 2011, 04:21:30 PM
The thing is, VECTORS4 worked perfectly for me in version 2.1.
In version 2.2 it's broken.

As in it doesn't work at all, or that it is much slower?... Just use version 2.1 then


Title: Re: Modified Kernel for Phoenix 1.5
Post by: huayra.agera on August 10, 2011, 04:43:21 PM
Hi! Just used v2.2 and it increased my hashrate by 3 Mhash compared to Diapolo's. From 402 > 405. Vectors4 seemed to drop the hashrate significantly on my 5850 by 50 Mhash. Great work to you guys and we are very grateful =).

I think the mods should create a Child Board under Mining support and name it "Mods" or Tweaks I guess and put this thread there.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ssateneth on August 10, 2011, 07:52:14 PM
The thing is, VECTORS4 worked perfectly for me in version 2.1.
In version 2.2 it's broken.

As in it doesn't work at all, or that it is much slower?... Just use version 2.1 then

The behavior is as if it's not doing 4 nonces, but only doing 1 (i.e. no VECTORS option specified). My compute speed remained the same regardless of memory speed, which is exactly like your V1 result on the graph on page 1.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: critical on August 11, 2011, 09:49:54 AM
In GUIMiner, I keep getting "invalid buffer, unable to write to file". I wonder why.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Diapolo on August 11, 2011, 11:09:00 AM
Just did a test:

Rig setup:
  Linuxcoin v0.2b (Linux version 2.6.38-2-amd64)
  Dual HD5970 (4 GPU cores in the rig)
  Mem clock @ 300Mhz
  Core clock @ 800Mhz
  VCore @ 1.125v
  AMD SDK 2.5
  Phoenix r100
  Phatk v2.2
  -v -k phatk BFI_INT VECTORS WORKSIZE=256 AGGRESSION=11 FASTLOOP=false

Result:
  Overall Rig rate: 1484 MH/s
  Rate per core: 371 MH/s

This is ~4MH/s faster than Diapolo's latest.

On 5970, phatk 2.2 is current king of the hill.

For the world to be perfect, this kernel needs to be integrated into cgminer :)



The last kernel releases show that it takes a bit of trial and error to find THE perfect kernel for a specific setup. Phateus and I use the KernelAnalyzer and our own setups as a first measurement of whether a new kernel got "faster". But there are many different factors that come into play, like OS, driver, SDK, miner software and so on.

I would suggest that we try to create kernels based on the same kernel parameters for phatk and phatk-Diapolo, so that users are free to choose which kernel is used. One difference is that the CGMINER kernel uses the switch VECTORS2, where Phoenix used only VECTORS (which I changed to VECTORS2 in my last kernel releases). The variable names inside the kernel don't even have to match (in fact they sometimes differ), as long as the miner software passes the expected values to the kernel in a defined sequence.

Dia


Title: Re: Modified Kernel for Phoenix 1.5
Post by: MegaBux on August 11, 2011, 03:26:33 PM
As of version 2.1, phatk now has command line option "VECTORS4" which can be used instead of "VECTORS".
This option works on 4 nonces per thread instead of 2 and may increase speed mainly if you do not underclock your memory, but feel free to try it out.  Note that if you use this, you will more than likely have to decrease your WORKSIZE to 128 or 64.

I'm using a 6770 @ 1.01 GHz with phatk 2.2.  When I run the memory clock at 300 MHz with the VECTORS option, I get 234.5 MH/s.  However, I can't seem to reap the benefits of VECTORS2 or VECTORS4 at a higher memory clock (e.g. 1.2 GHz).  I've reduced the WORKSIZE from 256 to 128 and 64, and with these options I can only achieve between 204 and 213 MH/s.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 11, 2011, 04:33:14 PM
Just did a test:

Rig setup:
  Linuxcoin v0.2b (Linux version 2.6.38-2-amd64)
  Dual HD5970 (4 GPU cores in the rig)
  Mem clock @ 300Mhz
  Core clock @ 800Mhz
  VCore @ 1.125v
  AMD SDK 2.5
  Phoenix r100
  Phatk v2.2
  -v -k phatk BFI_INT VECTORS WORKSIZE=256 AGGRESSION=11 FASTLOOP=false

Result:
  Overall Rig rate: 1484 MH/s
  Rate per core: 371 MH/s

This is ~4MH/s faster than Diapolo's latest.

On 5970, phatk 2.2 is current king of the hill.

For the world to be perfect, this kernel needs to be integrated into cgminer :)



The last kernel releases show that it takes a bit of trial and error to find THE perfect kernel for a specific setup. Phateus and I use the KernelAnalyzer and our own setups as a first measurement of whether a new kernel got "faster". But there are many different factors that come into play, like OS, driver, SDK, miner software and so on.

I would suggest that we try to create kernels based on the same kernel parameters for phatk and phatk-Diapolo, so that users are free to choose which kernel is used. One difference is that the CGMINER kernel uses the switch VECTORS2, where Phoenix used only VECTORS (which I changed to VECTORS2 in my last kernel releases). The variable names inside the kernel don't even have to match (in fact they sometimes differ), as long as the miner software passes the expected values to the kernel in a defined sequence.

Dia

A good idea.

A further improvement: I'd like to have an option in my miner that spends ~2mn
benchmarking all the kernels available in the current directory (without talking to
a pool, i.e. doing pure SHA256 on bogus nonces), and picking the fastest for the
current rig.

For people with lots of different rigs/setups, that would save them the headache
of having to hand-tune each instance.
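
A rough Python sketch of that benchmarking idea, with heavy assumptions: the "kernels" here are plain callables with a made-up signature (header, start nonce, count), and the only one provided is a CPU reference that does double SHA-256 with hashlib over bogus nonces. A real version would launch the OpenCL kernels instead, which this sketch does not attempt.

```python
import hashlib
import struct
import time

def double_sha256(header80):
    """SHA-256 applied twice, as used for Bitcoin block headers."""
    return hashlib.sha256(hashlib.sha256(header80).digest()).digest()

def cpu_reference(header76, start_nonce, count):
    """Stand-in 'kernel': hashes `count` bogus nonces on the CPU."""
    for nonce in range(start_nonce, start_nonce + count):
        double_sha256(header76 + struct.pack("<I", nonce))
    return count

def benchmark(kernels, header76, count=2000):
    """Time each candidate and return hashes per second keyed by name."""
    rates = {}
    for name, run in kernels.items():
        t0 = time.perf_counter()
        done = run(header76, 0, count)
        elapsed = time.perf_counter() - t0
        rates[name] = done / elapsed if elapsed > 0 else float("inf")
    return rates

def pick_fastest(rates):
    """Name of the candidate with the highest measured rate."""
    return max(rates, key=rates.get)

if __name__ == "__main__":
    rates = benchmark({"cpu_reference": cpu_reference}, bytes(76))
    print(pick_fastest(rates), rates)
```

No pool connection is needed since the header is all zeros; only the relative rates between candidates matter.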


What I am currently working on is a modified version of phoenix which runs multiple kernels with a single instance and a single work queue (to decrease excessive getwork).
I am also working on plugin support for it, so you can use various added features (such as built-in gui, Web interface, logger, autotune, variable aggression for when computer is idle, overclocking support, etc...)
This would make it tremendously easier for anyone to add features and you can still use whichever kernel works best for you.

As for cgminer support, I haven't tried it, are there any benefits over phoenix?  I may fork that instead of phoenix and make the plugin support via command-line, lua or javascript, although I find that python is much easier to code than c (especially for cross platform support).


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 11, 2011, 04:50:32 PM
As of version 2.1, phatk now has command line option "VECTORS4" which can be used instead of "VECTORS".
This option works on 4 nonces per thread instead of 2 and may increase speed mainly if you do not underclock your memory, but feel free to try it out.  Note that if you use this, you will more than likely have to decrease your WORKSIZE to 128 or 64.

I'm using a 6770 @ 1.01 GHz with phatk 2.2.  When I run the memory clock at 300 MHz with the VECTORS option, I get 234.5 MH/s.  However, I can't seem to reap the benefits of VECTORS2 or VECTORS4 at a higher memory clock (e.g. 1.2 GHz).  I've reduced the WORKSIZE from 256 to 128 and 64, and with these options I can only achieve between 204 and 213 MH/s.

I have found that VECTORS4 is extremely unreliable... even tiny changes in the kernel and other factors affect the hashrate tremendously...  OpenCL gets really weird when you use a lot of registers.  I added it in 2.1 because it was comparable to VECTORS in some situations, but changing the kernel slightly in 2.2 seems to have broken it (even though KernelAnalyzer says it uses fewer registers and fewer ALU ops... *sigh*)

Anyone wondering about any new kernel improvements, I seem to be at a standstill... I have tried the following:
  • Removing all control flow operations (about 1MH/s slower)
  • Sending all kernel arguments in a buffer (about 1MH/s slower)
  • Using an atomic counter for the output so that the output buffer is written sequentially (about the same speed and only works on ATI xxx cards and newer)
  • Using an internal loop in the kernel to process multiple nonces (Either significantly slower or massive desktop lag)
  • Calling set_arg only once per getwork instead of once per kernel call (only faster when using very low aggression and FASTLOOP, I will add this to my next kernel release)

-Phateus


Title: Re: Modified Kernel for Phoenix 1.5
Post by: jedi95 on August 11, 2011, 08:44:55 PM

What I am currently working on is a modified version of phoenix which runs multiple kernels with a single instance and a single work queue (to decrease excessive getwork).
I am also working on plugin support for it, so you can use various added features (such as built-in gui, Web interface, logger, autotune, variable aggression for when computer is idle, overclocking support, etc...)
This would make it tremendously easier for anyone to add features and you can still use whichever kernel works best for you.

As for cgminer support, I haven't tried it, are there any benefits over phoenix?  I may fork that instead of phoenix and make the plugin support via command-line, lua or javascript, although I find that python is much easier to code than c (especially for cross platform support).

In most cases you won't see much if any decrease in the number of getwork requests by running multiple kernels behind the same work queue. The reason for having a work queue in the first place is so that the miner only needs to ask for more work when the queue falls below a certain size. During normal operation Phoenix won't request more work than absolutely necessary. There might be a small benefit to doing this when the block changes, but aside from that the getwork count for a single instance running 2 kernels compared to 2 instances will be very close.
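
To illustrate the point, here is a toy Python model of a work queue with a low-water mark. This is not Phoenix's actual WorkQueue; the class, the `fetch_work` callable standing in for a getwork request, and the low-water value of 1 are all assumptions made for the sketch.

```python
from collections import deque
from itertools import count

class WorkQueue:
    """Toy work queue that only asks for work when it runs low."""

    def __init__(self, fetch_work, low_water=1):
        self.fetch_work = fetch_work  # hypothetical stand-in for a getwork RPC
        self.low_water = low_water
        self.queue = deque()
        self.requests = 0

    def get(self):
        # Refill only when the queue has dropped to the low-water mark.
        while len(self.queue) <= self.low_water:
            self.queue.append(self.fetch_work())
            self.requests += 1
        return self.queue.popleft()

def total_requests(num_queues, gets_per_queue):
    """Total getwork calls when each queue hands out `gets_per_queue` units."""
    source = count()
    queues = [WorkQueue(lambda: next(source)) for _ in range(num_queues)]
    for q in queues:
        for _ in range(gets_per_queue):
            q.get()
    return sum(q.requests for q in queues)

shared = total_requests(1, 10)    # one instance feeding two kernels
separate = total_requests(2, 5)   # two instances, same total work
print(shared, separate)
```

In this model the two setups differ only by the one extra unit that the second queue keeps buffered, so the getwork counts stay very close however much work is consumed.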

That said, I am interested to see the results of the other changes you mentioned. Feel free to PM me if you have any questions.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: deepceleron on August 12, 2011, 03:12:32 AM
Big Edit:

I looked again at the AMD APP SDK v2.5, trying to get it to not suck. I did one more thing, not only did I install the 2.5 SDK (on Catalyst 11.6), but I also re-compiled pyopencl 0.92 against the newer SDK. On phatk 2.2, changing just from 2.4 SDK to 2.5 SDK with a matching pyOpenCL gets a hair more mhash:
SDK 2.4: 309.97
SDK 2.5: 310.10

Just to let people know, regarding the APP SDK, the version installed as well as the version used to compile pyopencl both seem to matter (not that this helps you if you are using just the prepackaged Windows phoenix.exe.)

Using a pyOpenCL newer than 0.92 gives a deprecation warning:

[0 Khash/sec] [0 Accepted] [0 Rejected] [RPC]kernels\phatk\__init__.py:414: DeprecationWarning: 'enqueue_read_buffer' has been deprecated in version 2011.1. Please use enqueue_copy() instead.
  self.commandQueue, self.output_buf, self.output)
[11/08/2011 21:10:22] Server gave new work; passing to WorkQueue
[291.32 Mhash/sec] [0 Accepted] [0 Rejected] [RPC (+LP)]kernels\phatk\__init__.py:427: DeprecationWarning: 'enqueue_write_buffer' has been deprecated in version 2011.1. Please use enqueue_copy() instead.
  self.commandQueue, self.output_buf, self.output)


Using pyOpenCL 2011.1.2 with the kernel in its current form gets me less mhash though:
SDK 2.4: 307.98
SDK 2.5: 307.84

(5830@955/350; Catalyst 11.6; Win7; py 2.6.6)


Title: Re: Modified Kernel for Phoenix 1.5
Post by: CYPER on August 12, 2011, 03:24:26 AM
Using the latest 2.2 version got quite a noticeable increase:

Before:
4x 440Mh/s = 1760Mh/s

After:
4x 446Mh/s = 1784Mh/s

My best settings are:
Worksize = 256
Aggression = 12
VECTORS


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Tx2000 on August 12, 2011, 03:46:11 AM


What I am currently working on is a modified version of phoenix which runs multiple kernels with a single instance and a single work queue (to decrease excessive getwork).
I am also working on plugin support for it, so you can use various added features (such as built-in gui, Web interface, logger, autotune, variable aggression for when computer is idle, overclocking support, etc...)
This would make it tremendously easier for anyone to add features and you can still use whichever kernel works best for you.

As for cgminer support, I haven't tried it, are there any benefits over phoenix?  I may fork that instead of phoenix and make the plugin support via command-line, lua or javascript, although I find that python is much easier to code than c (especially for cross platform support).

Would definitely be interested in a cgminer fork.  Don't get me wrong, phoenix is great and has always given me the best performance overall but it does lack some of the more refined features, which the other poster listed above.  Failover and nice static but updated command line "UI".  Seems like you and diapolo are hitting the ceiling with phoenix anyway.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: hugolp on August 12, 2011, 06:54:36 AM
There is a thing I don't understand about the results of these modifications. They increase the hash rate but they also increase consumption, and I always thought that since they make the kernel more efficient (same task with fewer instructions, less work for the GPU per hash) they should increase the hash rate without changing consumption too much. Does anyone know why the more efficient kernel is not also more energy efficient?

Also, if one of you guys is out of ideas for making the cards run faster, it could be interesting to target energy efficiency instead of speed. A lot of us are not interested in running our cards at the maximum MHash/s rate but are more interested in having a better MHash/J rate.



Title: Re: Modified Kernel for Phoenix 1.5
Post by: talldude on August 12, 2011, 01:23:02 PM
It is more efficient - the more output per unit time you have, the more efficient it is since the card will be wasting less power sitting idle.

If you want to increase efficiency, that is a hardware thing - namely undervolt your card.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: bcforum on August 12, 2011, 01:28:56 PM
There is a thing I don't understand about the results of these modifications. They increase the hash rate but they also increase consumption, and I always thought that since they make the kernel more efficient (same task with fewer instructions, less work for the GPU per hash) they should increase the hash rate without changing consumption too much. Does anyone know why the more efficient kernel is not also more energy efficient?

Also, if one of you guys is out of ideas for making the cards run faster, it could be interesting to target energy efficiency instead of speed. A lot of us are not interested in running our cards at the maximum MHash/s rate but are more interested in having a better MHash/J rate.


In theory, fewer ALU ops translates to less energy consumption. In practice, each ALU op uses a slightly different amount of power, and a kernel which executes 10x instruction A may burn more power than one which executes 12x instruction B. Unfortunately, instruction power numbers aren't documented anywhere, so it is almost impossible to optimize in a theoretical sense, and they could vary from GPU to GPU (due to minor manufacturing defects).

One of Diapolo's recent kernels lowered operating temperature by ~3C without changing hashrate significantly. Presumably that particular kernel is ~10% more power efficient than others.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: hugolp on August 12, 2011, 01:35:19 PM
In theory, fewer ALU ops translates to less energy consumption. In practice, each ALU op uses a slightly different amount of power, and a kernel which executes 10x instruction A may burn more power than one which executes 12x instruction B. Unfortunately, instruction power numbers aren't documented anywhere, so it is almost impossible to optimize in a theoretical sense, and they could vary from GPU to GPU (due to minor manufacturing defects).

One of Diapolo's recent kernels lowered operating temperature by ~3C without changing hashrate significantly. Presumably that particular kernel is ~10% more power efficient than others.

Thanks for the answer. Can you indicate the version of Diapolo's kernel you are referring to?


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 12, 2011, 05:53:22 PM


What I am currently working on is a modified version of phoenix which runs multiple kernels with a single instance and a single work queue (to decrease excessive getwork).
I am also working on plugin support for it, so you can use various added features (such as built-in gui, Web interface, logger, autotune, variable aggression for when computer is idle, overclocking support, etc...)
This would make it tremendously easier for anyone to add features and you can still use whichever kernel works best for you.

As for cgminer support, I haven't tried it, are there any benefits over phoenix?  I may fork that instead of phoenix and make the plugin support via command-line, lua or javascript, although I find that python is much easier to code than c (especially for cross platform support).

Would definitely be interested in a cgminer fork.  Don't get me wrong, phoenix is great and has always given me the best performance overall but it does lack some of the more refined features, which the other poster listed above.  Failover and nice static but updated command line "UI".  Seems like you and diapolo are hitting the ceiling with phoenix anyway.

I will release a version that will work with cgminer early next week (looks like he has already implemented diapolo's old version).

We are hitting a ceiling with opencl in general (and perhaps with the current hardware).  In one of the mining threads, vector76 and I were discussing the theoretical limit on hashing speeds... and unless there is a way to make the Maj() operation take 1 instruction, we are within about a percent of the theoretical limit on minimum number of instructions in the kernel unless we are missing something.
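
For readers wondering why Maj() is the sticking point, a small Python check of the identities involved. It assumes BFI_INT behaves like OpenCL's bitselect, which is my understanding of the instruction; the helper names are made up for this example and it is not taken from the kernel source. Ch() maps onto a single select, while Maj() needs an extra XOR to build the selector, so it costs two operations this way.

```python
import random

MASK = 0xFFFFFFFF  # work on 32-bit words, as SHA-256 does

def bitselect(a, b, c):
    """OpenCL-style bitselect: take bits from b where c is 1, else from a."""
    return ((a & ~c) | (b & c)) & MASK

def ch(x, y, z):
    """SHA-256 Ch: choose y where x is 1, else z."""
    return ((x & y) | (~x & z)) & MASK

def maj(x, y, z):
    """SHA-256 Maj: bitwise majority of the three inputs."""
    return ((x & y) | (x & z) | (y & z)) & MASK

def ch_select(x, y, z):
    # One select: x is the selector.
    return bitselect(z, y, x)

def maj_select(x, y, z):
    # Where x and z agree they are the majority; otherwise y decides.
    # Costs one XOR plus one select.
    return bitselect(x, y, x ^ z)

rng = random.Random(0)
for _ in range(10000):
    x, y, z = (rng.getrandbits(32) for _ in range(3))
    assert ch_select(x, y, z) == ch(x, y, z)
    assert maj_select(x, y, z) == maj(x, y, z)
print("identities hold")
```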

Now that doesn't mean that there is NO room for improvement, just that any other improvement will probably have to be faster hardware, a more efficient implementation of openCL by AMD or figuring out a better way to finagle the current openCL implementation to reduce the implementation overhead.  But, unless there is a problem with pyopenCL, c and python should give equivalent speeds as long as they are just calling the openCL interface (the actual miner uses negligible resources).  I suppose it could be possible to access the hardware drivers directly and run the kernel that way... but I don't see that as being feasible.

But, with all of that said, I have looked through some of his code, and it is some really clean code.  Part of the reason I want to add these features is to learn more python (this is the first thing I have programmed in python), but it probably will just be easier modifying the cgminer code.  Thanks for pointing out cgminer to me :)


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Tx2000 on August 12, 2011, 06:04:56 PM
Sent another donation your way.  Look forward to your work on cgminer.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 12, 2011, 06:30:17 PM
Sent another donation your way.  Look forward to your work on cgminer.

Thanks :D


Title: Re: Modified Kernel for Phoenix 1.5
Post by: bcforum on August 12, 2011, 06:50:47 PM
Thanks for the answer. Can you indicate the version of Diapolo's kernel you are referring to?

https://bitcointalk.org/index.php?topic=25860.msg428882#msg428882


Title: Re: Modified Kernel for Phoenix 1.5
Post by: BOARBEAR on August 12, 2011, 07:38:31 PM
I took a look at the comparison between version 2.2 and version 2.1
Could it be the __constant uint ConstW[128] change that broke VECTORS4?


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 12, 2011, 08:01:13 PM
I took a look at the comparison between version 2.2 and version 2.1
Could it be the __constant uint ConstW[128] change that broke VECTORS4?

That change is inconsequential (I was trying some things that required the change but did not keep them).. the compiler doesn't use those values, so the code should be exactly the same doing it either way (you can try and replace the code with the old code if you want to check).

You keep saying that it is broken.. if it does not run, post the errors.

I have found that on my card, VECTORS4 is much slower in version 2.2 than 2.1, but this is not a bug... it seems to be because openCL does not like allocating that many registers... Version 2.1 uses around 99.7% of instruction slots with VECTORS4 and I have tried many many ways to make it faster and more reliable (in 2.1), but I have given up on it.  It is still in the release because I don't see any point in taking it out...  but getting 2.2 to run as fast as 2.1 with VECTORS4 is not going to happen.  Also, the differences between 2.1 and 2.2 with VECTORS are very tiny anyway (less than .5%)...

Getting into more detail about it: If you look at the graph on the main page of the thread, you can see the graph of VECTORS4 in version 2.1... in version 2.2 for some reason, the spike (and corresponding valley) is located higher (somewhere around 500). This could mean that it would be just as fast if you had 1500 MHz memory, but I have no idea why openCL reacts this way to changing the memory speed.  There are way too many GPU architecture/GPU bios/PCIe bus/CPU-GPU transfer/driver/openCL implementation unknowns to try to predict this behavior.


-Phateus


Title: Re: Modified Kernel for Phoenix 1.5
Post by: huayra.agera on August 12, 2011, 08:19:20 PM
Hi, I used phatk 2.2 on my 5 rigs and I have already had restarts/BSOD errors occurring on all machines (5850 multi/single, 6850) on several occasions.

Yes, there was an increase in hashrate however, it seemed to have a memory leak or something. Just thought I'd inform you on this. Anyways, great work still. Looking forward to further improvements on the project. But for now, I'll revert to my previous settings.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: jedi95 on August 13, 2011, 06:38:02 AM

I will release a version that will work with cgminer early next week (looks like he has already implemented diapolo's old version).


Looking forward to this !!

Just sent one coin your way, and there's another once the work is done.

Quote
We are hitting a ceiling with opencl in general (and perhaps with the current hardware).  In one of the mining threads, vector76 and I were discussing the theoretical limit on hashing speeds... and unless there is a way to make the Maj() operation take 1 instruction, we are within about a percent of the theoretical limit on minimum number of instructions in the kernel unless we are missing something.

Out of curiosity, have you looked into trying to code a version
directly in AMD's assembly language and bypassing OpenCL entirely ?
(I'm thinking: since we're already patching the ELF output, this seems
like the logical next step :))

Also, have you looked at AMD CAL ? I know this is what ufasoft's miner
uses (https://bitcointalk.org/index.php?topic=3486.500), and also what
zorinaq considers the most efficient way to access AMD hardware (somewhere
on http://blog.zorinaq.com)



Replacing one instruction in the ELF with another that uses the exact same inputs/outputs is one thing, but manually editing the ASM code is another thing entirely. Besides, with the work that has been done the GPU is already at >99% of the theoretical maximum throughput. (ALU packing) And as said above, we are also close to the theoretical minimum number of instructions to correctly run SHA256.

Also, if you look near the end of the hdminer thread you will notice that users are able to get the same hashrates from phatk on 69xx. For 58xx and other VLIW5 cards phatk is significantly faster than hdminer. If that's the best he can do with CAL then I don't see any reason to use it. hdminer had a substantial performance advantage back in March/April, but with basically every miner supporting BFI_INT this is no longer the case.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: kano on August 14, 2011, 03:03:42 AM
Well I've been talking to a few people about this but got no real response from anyone, that it was possible ...
(Woke up with this idea back on the 4th of August ...)

So I guess I need to post in a thread where someone works on a CL kernel and just let them implement it if they don't already do it :P

I've written it in pseudo-code coz I still don't follow how the CL file actually does 2^n checks and returns the full list of valid results.
Yeah I've programmed in almost every language known to man (except C# and that's avoided by choice) but I still don't quite get the interface from C/C++ to the CL and how that matches what happens

What I am discussing, is the 2nd call to SHA256 with the output of the first call (not the first call)

Anyway, to explain, here's the end of the SHA256 pseudo code from the wikipedia:
==================
  for i from 0 to 63
    s0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
    maj := (a and b) xor (a and c) xor (b and c)
    t2 := s0 + maj
    s1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
    ch := (e and f) xor ((not e) and g)
    t1 := h + s1 + ch + k[i] + w[i]

    h := g
    g := f
    f := e
    e := d + t1
    d := c
    c := b
    b := a
    a := t1 + t2

  Add this chunk's hash to result:
  h0 := h0 + a
  h1 := h1 + b
  h2 := h2 + c
  h3 := h3 + d
  h4 := h4 + e
  h5 := h5 + f
  h6 := h6 + g
  h7 := h7 + h

Then test if h0..h7 is a share (CHECK0, CHECK1, ?)
==================

Firstly, I added that last line of course.
I understand that with current difficulty, if h0 != 0 then we don't have a share (call this CHECK0)
If h0=0 then check some leading part of h1 based on the current difficulty (call this CHECK1)
... feel free to correct this anyone who knows better :)

If a difficulty actually gets to checking h2 then my optimisation can be made even better by going back one more step (adding an i := 61) in the pseudo code shown below

A reasonably simple optimisation of the end code for when we are about to check if h0..h7 is a share (i.e. only the 2nd hash)

==================
 for i from 0 to 61
    s0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
    maj := (a and b) xor (a and c) xor (b and c)
    t2 := s0 + maj
    s1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
    ch := (e and f) xor ((not e) and g)
    t1 := h + s1 + ch + k[i] + w[i]

    h := g
    g := f
    f := e
    e := d + t1
    d := c
    c := b
    b := a
    a := t1 + t2

 i := 62
    s0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
    maj := (a and b) xor (a and c) xor (b and c)
    t2 := s0 + maj
    s1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
    ch := (e and f) xor ((not e) and g)
    t1 := h + s1 + ch + k[i] + w[i]

 tmpa := t1 + t2
 tmpb := h1 + tmpa (this is the actual value of h1 at the end)
 if CHECK1 on tmpb then abort - not a share
  (i.e. return false for a share)

    h := g
    g := f
    f := e
    e := d + t1
    d := c
    c := b
    b := a
    a := tmpa

 i := 63
    s0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
    maj := (a and b) xor (a and c) xor (b and c)
    t2 := s0 + maj
    s1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
    ch := (e and f) xor ((not e) and g)
    t1 := h + s1 + ch + k[i] + w[i]

 tmpa := h0 + t1 + t2 (this is the actual value of h0 at the end)
 if CHECK0 on tmpa then abort - not a share
  (i.e. return false for a share)

    h := g
    g := f
    f := e
    e := d + t1
    d := c
    c := b

 Add this chunk's hash to result:
 h0 := tmpa
 h1 := tmpb
 h2 := h2 + c
 h3 := h3 + d
 h4 := h4 + e
 h5 := h5 + f
 h6 := h6 + g
 h7 := h7 + h

It's a share - unless we need to test h2?
==================

Firstly the obvious (as I've said twice above):
This should only be done when calculating a hash to be tested as a share.
Since the actual process is a double-hash, the first hash should not, of course, do this.

In i=62:
If the tmpb test (CHECK1) says it isn't a share it avoids an entire loop (i=63), the 'e' calculation at i=62 and any unneeded assignments after that
and also we don't care about the actual values of h0-h7 so there is no need to assign them anything (or do the additions) except whatever is needed to affirm the result is not a share (e.g. set h0=-1 if h0..h7 must be examined later - or just return false if that is good enough - I don't know which the code actually needs)

CHECK1's probability of failure is high, so it easily covers the cost of the extra calculation (h1 + tmpa) needed to do it.

In i=63:
If the tmpa test (CHECK0) says it isn't a share it avoids the 'e' calculation at i=63 and any unneeded assignments after that
and also we don't care about the actual values of h0-h7 so there is no need to assign them anything (or do the additions) except whatever is needed to affirm the result is not a share (e.g. set h0=-1 if h0..h7 must be examined later - or just return false if that is good enough - I don't know which the code actually needs)


P.S. any and all mistakes I've made - oh well but the concept is there anyway


Any mistakes? Comments?


Title: Re: Modified Kernel for Phoenix 1.5
Post by: fpgaminer on August 14, 2011, 10:11:42 AM
I've compiled a Win32 EXE for my poclbm fork (which has phatk, phatk2, phatk2.1, and phatk2.2 support):

http://www.bitcoin-mining.com/poclbm-progranism-win32-20110814a.zip (http://www.bitcoin-mining.com/poclbm-progranism-win32-20110814a.zip)
md5sum - df623a45f8cb0a50fcded92728f12c14

Let me know if it works, I was only able to test it on one machine so far.

Quote
Well I've been talking to a few people about this but got no real response from anyone, that it was possible ...
The optimization you've spelled out is more or less already implemented in most, if not all GPU miners.

The way GPU miners currently work is that they check in the GPU code whether h7==0. If it does, the result (a nonce) is returned, otherwise nothing is returned. It is the responsibility of the CPU software to do any further difficulty checks if needed.

Since the only thing the GPU miners care about is H7, they completely skip the last 3 rounds (stopping after the 61st round).

Also note that GPU miners don't calculate the first 3 rounds of the first pass. Those rounds are pre-computed, because the inputs to those rounds remain constant for a given unit of getwork. So a GPU miner really only computes a grand total of 122 rounds (128 minus the 3 pre-computed and the 3 skipped), minus various other small pre-calculations here and there.
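
A minimal Python sketch of the round-skipping point above, not code taken from any miner: H7 of the final digest depends only on the working variable e after 61 rounds, because the last three rounds merely shift e into f, g and h. The helper names (run_rounds, h7_after_61_rounds, pad_single_block) are made up for illustration, and the constants are derived from the first 64 primes as the SHA-256 specification defines them.

```python
import hashlib
import struct
from math import isqrt

MASK = 0xFFFFFFFF

def icbrt(n):
    """Floor of the integer cube root, by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def first_primes(count):
    primes, cand = [], 2
    while len(primes) < count:
        if all(cand % p for p in primes):
            primes.append(cand)
        cand += 1
    return primes

PRIMES = first_primes(64)
# Fractional bits of square roots (initial state) and cube roots (round constants).
IV = [isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def expand(block):
    """Message schedule W[0..63] for one 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    return w

def run_rounds(block, n_rounds):
    """Run the first n_rounds rounds from the standard IV; return a..h."""
    w = expand(block)
    a, b, c, d, e, f, g, h = IV
    for i in range(n_rounds):
        s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + s1 + ch + K[i] + w[i]) & MASK
        s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return a, b, c, d, e, f, g, h

def pad_single_block(msg):
    """SHA-256 padding for a message short enough to fit in one block."""
    assert len(msg) <= 55
    return msg + b"\x80" + b"\x00" * (55 - len(msg)) + struct.pack(">Q", 8 * len(msg))

def full_digest(block):
    words = run_rounds(block, 64)
    return b"".join(struct.pack(">I", (iv + v) & MASK) for iv, v in zip(IV, words))

def h7_after_61_rounds(block):
    """H7 of the final digest, computed without running rounds 61..63."""
    e = run_rounds(block, 61)[4]
    return (IV[7] + e) & MASK

# The second pass hashes the 32-byte output of the first pass.
first_pass = hashlib.sha256(b"stand-in for an 80-byte block header").digest()
block = pad_single_block(first_pass)
print(full_digest(block) == hashlib.sha256(first_pass).digest())
print(h7_after_61_rounds(block) == struct.unpack(">I", full_digest(block)[28:])[0])
```

A real kernel goes further than this sketch (it does not even finish round 60 completely, and it folds the IV addition into the comparison), but the dependency on e alone is the reason the last rounds can be dropped.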


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Clipse on August 14, 2011, 10:57:41 AM
You may be one, but you are the champion of many :P

It's working great on my lazy spare Windows machine, thanks :)


Title: Re: Modified Kernel for Phoenix 1.5
Post by: kano on August 14, 2011, 11:47:00 AM
...
Quote
Well I've been talking to a few people about this but got no real response from anyone, that it was possible ...
The optimization you've spelled out is more or less already implemented in most, if not all GPU miners.

The way GPU miners currently work is that they check in the GPU code whether h7==0. If it does, the result (a nonce) is returned, otherwise nothing is returned. It is the responsibility of the CPU software to do any further difficulty checks if needed.

Since the only thing the GPU miners care about is H7, they completely skip the last 3 rounds (stopping after the 61st round).

Also note, that GPU miners don't calculate the first 3 rounds of the first pass. Those rounds are pre-computed, because the inputs to those rounds remains constant for a given unit of getwork. So a GPU miner really only computes a grand total of 122 rounds, minus various other small pre-calculations here and there.
OK, so I've got the H's back-to-front (H7 is the first one, not H0). Then yeah, that makes sense: it means doing even fewer steps than what I said.
Still, why not do the share/H6 test in GPU - it would certainly be faster - shares are also rare compared to a job (about 1 in 2 billion)
Is that an issue with the CL not being able to be changed based on the difficulty?
Yet it could be done as a simple pre-calculated number to AND against the H6 value (extra calculation) when H7 is zero.
(I should work out what's the difficulty value high enough to need to test H5 ... though that may be so large it would never be reached)

Edit: of course if a nonce (H7=0) is the requirement of a share - then there is no more testing (no testing of H6) required
I need to read pushpool more closely to determine exactly what a share is ... unless someone feels like answering that ... :)

Edit2: so skipping the first 3 rounds of the first pass is possible (128 - 3 = 125)
but there are actually 3.5 rounds not needed at the end of the 2nd pass - though I guess you already do that
Round 60 (2nd round) becomes only the calculations necessary to get t1 (s1 & ch) since unneeded are s0 and maj (and of course t2)


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Diapolo on August 14, 2011, 04:05:07 PM
It seems like your latest kernel and mine have problems if BFI_INT gets forced off (via BFI_INT=false) ... it seems the results are invalid every time.
Any idea Phateus?

Perhaps #define Ch(x, y, z) bitselect(x, y, z) is not right?

Edit: Could be my setup if no one else has this error :D.

Dia


Title: Re: Modified Kernel for Phoenix 1.5
Post by: techwtf on August 14, 2011, 04:46:33 PM
One of my cards (5850 at 835 MHz; downclocking to 810 MHz still failed) does not seem to work well with phatk 2.1/2.2. It dies after a while (<1h), and I have to restart the miner (win32) or reset (linux).
Diapolo's 2011-07-17 is OK at 835 MHz.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: BOARBEAR on August 14, 2011, 05:58:50 PM
I tried to figure out the reason version 2.2 does not work well with VECTORS4
I could not find out why as I do not have enough knowledge.
Here are some results I found:

replacing this block of code in version 2.1 with the corresponding block in version 2.2 will make VECTORS4 much slower


#define P1(n) ((rot(W[(n)-2],15u)^rot(W[(n)-2],13u)^((W[(n)-2])>>10U)))
#define P2(n) ((rot(W[(n)-15],25u)^rot(W[(n)-15],14u)^((W[(n)-15])>>3U)))
#define P3(x)  W[x-7]
#define P4(x)  W[x-16]


//Partial Calcs for constant W values
#define P1C(n) ((rotate(ConstW[(n)-2],15)^rotate(ConstW[(n)-2],13)^((ConstW[(n)-2])>>10U)))
#define P2C(n) ((rotate(ConstW[(n)-15],25)^rotate(ConstW[(n)-15],14)^((ConstW[(n)-15])>>3U)))
#define P3C(x)  ConstW[x-7]
#define P4C(x)  ConstW[x-16]

//SHA round with built in W calc
#define sharoundW(n)  Vals[(3 + 128 - (n)) % 8] += t1W(n); Vals[(7 + 128 - (n)) % 8] = t1W(n) + t2(n);  

//SHA round without W calc
#define sharound(n) Vals[(3 + 128 - (n)) % 8] += t1(n); Vals[(7 + 128 - (n)) % 8] = t1(n) + t2(n);

//SHA round for constant W values
#define sharoundC(n) Barrier(n); Vals[(3 + 128 - (n)) % 8] += t1C(n); Vals[(7 + 128 - (n)) % 8] = t1C(n) + t2(n);

//The compiler is stupid... I put this in there only to stop the compiler from (de)optimizing the order
#define Barrier(n) t1 = t1C((n) % 64)

And this block is not the only thing that causes the problem.

I am guessing it has something to do with the rotC function (it is only a guess).
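
For readers puzzling over the Vals[(3 + 128 - (n)) % 8] indexing in the macros quoted above, here is a small Python sketch of what that indexing appears to do: instead of shuffling a..h every round, the eight values stay put and the index rotates. This is an illustration only, with made-up function names (rounds_shuffled, rounds_rotating) and arbitrary stand-in values for K and W; it is not the kernel code.

```python
MASK = 0xFFFFFFFF

def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK

def t1(e, f, g, h, k, w):
    s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    ch = (e & f) ^ (~e & g)
    return (h + s1 + ch + k + w) & MASK

def t2(a, b, c):
    s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)
    return (s0 + maj) & MASK

def rounds_shuffled(state, ks, ws):
    """Textbook form: every round moves all eight variables."""
    a, b, c, d, e, f, g, h = state
    for k, w in zip(ks, ws):
        x1 = t1(e, f, g, h, k, w)
        x2 = t2(a, b, c)
        a, b, c, d, e, f, g, h = (x1 + x2) & MASK, a, b, c, (d + x1) & MASK, e, f, g
    return [a, b, c, d, e, f, g, h]

def rounds_rotating(state, ks, ws):
    """Macro-style form: only two slots are written per round."""
    vals = list(state)
    for n, (k, w) in enumerate(zip(ks, ws)):
        def slot(letter):
            # letter 0..7 stands for a..h; in round n it lives at this index
            return (letter + 128 - n) % 8
        x1 = t1(vals[slot(4)], vals[slot(5)], vals[slot(6)], vals[slot(7)], k, w)
        x2 = t2(vals[slot(0)], vals[slot(1)], vals[slot(2)])
        vals[slot(3)] = (vals[slot(3)] + x1) & MASK  # old d becomes the new e
        vals[slot(7)] = (x1 + x2) & MASK             # old h slot becomes the new a
    done = len(ks)
    # Read the slots back in a..h order for comparison.
    return [vals[(letter + 128 - done) % 8] for letter in range(8)]
```

Both functions return the same eight values for the same inputs; the rotating form simply saves the six register moves per round.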


Title: Re: Modified Kernel for Phoenix 1.5
Post by: fpgaminer on August 14, 2011, 06:30:05 PM
Quote
Still, why not do the share/H6 test in GPU - it would certainly be faster - shares are also rare compared to a job (about 1 in 2 billion)
Is that an issue with the CL not being able to be changed based on the difficulty?
There are several reasons.

99.99% of the time the mining software only needs to look for Difficulty 1 (a share, H7==0), so there is rarely a need to check for anything else.
GPUs absolutely hate branching; a full Difficulty check involves many branches.
Smaller GPU programs are better GPU programs.
The CPU runs in parallel to the GPU. Since the CPU is fully capable of checking for extra Difficulty levels, why would you burden the GPU with such work?
The CPU should double-check the GPU's results anyway, to detect errors. Since the CPU will thus be recomputing the full two SHA-256 passes for each result returned by the GPU, it again makes sense to only check for higher difficulties on the CPU.
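
A rough Python sketch of the CPU-side half of this split, under stated assumptions: the GPU only reports nonces whose H7 is zero, and the host re-hashes the header and compares against the real target. The function names are invented for illustration, and the compact "bits" decoding below is the usual Bitcoin encoding as I understand it, not something taken from Phoenix or poclbm.

```python
import hashlib

def double_sha256(header: bytes) -> bytes:
    """The two SHA-256 passes applied to an 80-byte block header."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()

def h7_is_zero(digest: bytes) -> bool:
    """The cheap test done on the GPU: last state word H7 equals zero."""
    return digest[28:32] == b"\x00\x00\x00\x00"

def target_from_bits(bits: int) -> int:
    """Decode the compact target (e.g. 0x1d00ffff for difficulty 1)."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))

def meets_target(digest: bytes, target: int) -> bool:
    """Full check: the digest, read as a little-endian integer, must not exceed the target."""
    return int.from_bytes(digest, "little") <= target

def check_result(header: bytes, target: int) -> str:
    """Classify a nonce returned by the GPU (the nonce is already inside header)."""
    digest = double_sha256(header)
    if not h7_is_zero(digest):
        return "hardware error"   # the GPU claimed H7 == 0 but the CPU disagrees
    if meets_target(digest, target):
        return "meets target"
    return "H7 is zero, but above this target"
```

Since H7 holds the most significant 32 bits of that little-endian integer, H7==0 is a necessary condition for any target at difficulty 1 or higher, which is why it works as a cheap branch-free filter on the GPU.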


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Diapolo on August 14, 2011, 07:47:31 PM
It seems like your latest kernel and mine have problems if BFI_INT gets forced off (via BFI_INT=false) ... it seems the results are invalid every time.
Any idea Phateus?

Perhaps #define Ch(x, y, z) bitselect(x, y, z) is not right?

Edit and solved, non BFI_INT Ch has to be:
Code:
#define Ch(x, y, z) bitselect(z, y, x)

If you want to thank someone, you can donate to 1LY4hGSY6rRuL7BQ8cjUhP2JFHFrPp5JVe (Vince -> who did a GREAT job during my kernel development)!

Dia


Title: Re: Modified Kernel for Phoenix 1.5
Post by: RoadStress on August 15, 2011, 01:24:41 PM
Sent another donation your way.  Look forward to your work on cgminer.
+1


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 15, 2011, 05:54:59 PM
It seems like your latest kernel and mine have problems if BFI_INT gets forced of via (BFI_INT=false) ... it seems the results are invalid every time.
Any idea Phateus?

Perhaps #define Ch(x, y, z) bitselect(x, y, z) is not right?

Edit and solved, non BFI_INT Ch has to be:
Code:
#define Ch(x, y, z) bitselect(z, y, x)

If you want to thank someone, you can donate to 1LY4hGSY6rRuL7BQ8cjUhP2JFHFrPp5JVe (Vince -> who did a GREAT job during my kernel development)!

Dia

Awesome, thank you!  I was under the assumption that BFI_INT and bitselect were the same operation, but apparently the operand order is different.  I will fix it in my next release.
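
To make the operand-order point concrete, here is a small Python model of the two operations on 32-bit words. bitselect follows the OpenCL definition (take bits from b where c has a 1, otherwise from a); bfi_int is my understanding of the AMD instruction's semantics and should be treated as an assumption. The function names are just for this sketch.

```python
MASK = 0xFFFFFFFF

def bitselect(a, b, c):
    """OpenCL bitselect: each result bit comes from b where c is 1, else from a."""
    return ((a & ~c) | (b & c)) & MASK

def bfi_int(src0, src1, src2):
    """Assumed BFI_INT semantics: src0 is the mask choosing src1 over src2."""
    return ((src1 & src0) | (src2 & ~src0)) & MASK

def ch_reference(x, y, z):
    """SHA-256 Ch as written in the specification."""
    return ((x & y) ^ (~x & z)) & MASK

def ch_wrong(x, y, z):
    return bitselect(x, y, z)   # the order that produced invalid results

def ch_fixed(x, y, z):
    return bitselect(z, y, x)   # the order from Diapolo's fix
```

With x = 0xFFFFFFFF, y = 0x12345678 and z = 0, the reference Ch picks y, while the wrong ordering returns x, so every hash computed through it comes out invalid.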

Thank you everyone for your support (both in BTC and discussion).

I should have a drop-in version of the kernel available for cgminer soon, so anyone wanting to try out the pre-release, I'll be posting it tonight.

@BOARBEAR
*sigh*.... come on man... do you even read my posts? There is no single cause of the bad performance.  2.2 executes fewer instructions and uses fewer registers than 2.1, but as I said... there is some weird issue which makes OpenCL slower behind the scenes.  My best guess is that it has to do with register allocation.

The GPU has a total of 256x32x4 registers (8192 UINT4).  At the most, there are 256 threads per workgroup (8192/256 = 32 registers per thread).  Using VECTORS, the number of registers is far below this number, therefore the hardware can operate on the maximum allowable threads at a time.  However, when you compile with VECTORS4, there are more than 32 registers per thread, so OpenCL must determine how to allocate the threads, and the utilization of the video card is sub-optimal.  Below is a diagram of what I think is going on.


4 thread groups running simultaneously VECTORS (2 running at a time)
[1111111122222222]
[3333333344444444]

using an optimal version of VECTORS4, it would look much like this (double the work is done per thread)
[1111111111111111]
[2222222222222222]
[3333333333333333]
[4444444444444444]

now making it use slightly fewer resources will make it slower because the threads are out of sync and there will be overhead in syncing and tracking data within threadgroups:
[1111111111111112]
[2222222222222233]
[3333333333333444]
[4444444444445555]

Now, I may be waaaaay off here, but something like this is what makes sense to me, especially since it would explain why decreasing the memory speed actually improves performance in some cases (by forcing synchronization).
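
The register arithmetic above can be written out as a tiny Python model. It is deliberately simplistic and rests on assumptions: the 8192-register figure is the one quoted in this post, the 64-thread wavefront size is assumed for these AMD cards, and real schedulers have other limits this ignores. The function name is made up.

```python
TOTAL_REGS = 256 * 32    # 8192 uint4 registers, the figure quoted in the post
WAVEFRONT = 64           # assumed wavefront size
MAX_THREADS = 256        # maximum workgroup size mentioned in the post

def resident_threads(regs_per_thread):
    """Threads that fit in the register file, rounded down to whole wavefronts."""
    fit = min(MAX_THREADS, TOTAL_REGS // regs_per_thread)
    return (fit // WAVEFRONT) * WAVEFRONT

for regs in (20, 32, 33, 48):
    print(regs, "registers per thread:", resident_threads(regs), "threads")
```

In this model, going from 32 to 33 registers per thread already drops a whole wavefront, which fits the observation that a full 256-thread WORKSIZE stops being practical with VECTORS4.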

Anyway, enough of my off-topic analysis...


I will release a version that will work with cgminer early next week (looks like he has already implemented diapolo's old version).


Looking forward to this !!

Just sent one coin your way, and there's another once the work is done.

Quote
We are hitting a ceiling with opencl in general (and perhaps with the current hardware).  In one of the mining threads, vector76 and I were discussing the theoretical limit on hashing speeds... and unless there is a way to make the Maj() operation take 1 instruction, we are within about a percent of the theoretical limit on minimum number of instructions in the kernel unless we are missing something.

Out of curiosity, have you looked into trying to code a version
directly in AMD's assembly language and bypassing OpenCL entirely ?
(I'm thinking: since we're already patching the ELF output, this seems
like the logical next step :))

Also, have you looked at AMD CAL ? I know this is what ufasoft's miner
uses (https://bitcointalk.org/index.php?topic=3486.500), and also what
zorinaq considers the most efficient way to access AMD hardware (somewhere
on http://blog.zorinaq.com)



Replacing one instruction in the ELF with another that uses the exact same inputs/outputs is one thing, but manually editing the ASM code is another thing entirely. Besides, with the work that has been done the GPU is already at >99% of the theoretical maximum throughput. (ALU packing) And as said above, we are also close to the theoretical minimum number of instructions to correctly run SHA256.

Also, if you look near the end of the hdminer thread you will notice that users are able to get the same hashrates from phatk on 69xx. For 58xx and other VLIW5 cards phatk is significantly faster than hdminer. If that's the best he can do with CAL then I don't see any reason to use it. hdminer had a substantial performance advantage back in March/April, but with basically every miner supporting BFI_INT this is no longer the case.

Agreed, the kernel itself is pretty optimal.  I might look into calling lower level CAL functions to manage the (OpenCL compiled) GPU threads (instead of using openCL), but I doubt this will give any speedup (although, I might be able to reduce the CPU overhead).


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 16, 2011, 02:35:42 AM
Alright... I'm getting a little delayed on the prerelease for cgminer... mingw is a pain in the ass.. trying a full cygwin install next...

Bear with me, hopefully I'll get it running tomorrow.

-Phateus


Title: Re: Modified Kernel for Phoenix 1.5
Post by: -ck on August 16, 2011, 12:07:38 PM
Alright... I'm getting a little delayed on the prerelease for cgminer... mingw is a pain in the ass.. trying a full cygwin install next...

Bear with me, hopefully I'll get it running tomorrow.

-Phateus
You could just tell me what to do to interface it with cgminer (i.e. what new variables you want) and I'd copy most of your kernel across. Only the return code and define macros are actually different in cgminer in the kernel itself.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: BOARBEAR on August 16, 2011, 03:20:46 PM
I understand what you are saying.  Perhaps version 2.1 will be the last version that works well with VECTORS4.  You said the work that has been done on the GPU is already at >99% of the theoretical maximum throughput, but VECTORS4 alone gives me about a 1.5% boost (a contradiction?).  That is why I tried hard to find a way to make VECTORS4 work, so that future versions can use it.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 16, 2011, 05:49:36 PM
Alright... I'm getting a little delayed on the prerelease for cgminer... mingw is a pain in the ass..


Yeah, mingw is most certainly a giant PITA.

To compile cgminer with mingw, the trick is to use msys and get pkg-config and libcurl installed properly

For pkg-config, the best is to install this: http://ftp.gnome.org/pub/gnome/binaries/win32/gtk+/2.22/gtk+-bundle_2.22.1-20101227_win32.zip

Once you have that, libcurl is rather easy.

Quote
trying a full cygwin install next...

Mmmh. Not sure this'll get you very far.

If your main dev box is windows and your goal is to integrate
phatk into cgminer, your best bet is probably to install a small
virtual machine (qemu or vmplayer) running ubuntu inside your
windows box and work on cgminer directly on Linux in there.

That's exactly what I do (the other way round) when I have to
try windows-specific things or a piece of code.


Yeah, I think I will stay away from using the mingw environment from now on... Cygwin was easy as pie.  No issues, and I think I can cross-compile from cygwin using mingw if I want native Win32 support.  Apparently, getting pkg-config (I think) working without POSIX support is terrible.  I got my kernel working around 5am last night linking against the cygwin DLLs, so tonight I will release the changes when I get home.

Alright... I'm getting a little delayed on the prerelease for cgminer... mingw is a pain in the ass.. trying a full cygwin install next...

Bear with me, hopefully I'll get it running tomorrow.

-Phateus
You could just tell me what to do to interface it with cgminer (i.e. what new variables you want) and I'd copy most of your kernel across. Only the return code and define macros are actually different in cgminer in the kernel itself.

Yeah, if you want, I can send you the changes tonight so you can put it in your release.  The only modifications I had to make to the kernel are changing VECTORS to VECTORS2, hardcoding OUTPUT_SIZE = 4095 and hardcoding WORKSIZE=256 (I really do need this passed to the kernel though).  Also, my kernel only uses WORKSIZE+1 entries in the buffer, so it would be better if you made the buffer that size.

As for the changes in the miner, I think I only had to change the precalc_hash() function, the kernel input and output file names, and the queue_phatk_kernel() function.
What I will do tonight is add KL_PHATK_2_2 to the cl_kernel enum, copy the function code, add the corresponding command line argument (right now I have just replaced PHATK with mine) and add -DWORKSIZE= arguments for the kernel.

Anyway, I will give you more details tonight when I am in front of my code.
My fork is https://github.com/Phateus/cgminer (https://github.com/Phateus/cgminer), I will upload the changes tonight (as soon as I figure out git... never used that before)

-Phateus

P.S. thanks for the easy to read code :)


Title: Re: Modified Kernel for Phoenix 1.5
Post by: -ck on August 16, 2011, 10:18:23 PM
Seems to me like you've got it all under control, so I'll leave you to finish up. Thanks for your involvement. However I don't want multiple phatk kernels so just replace the current one in-situ and don't bother enumming a different kernel. As for the output code, I prefer to use 4k so feel free to do it your way, but be aware I plan to change it back.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 17, 2011, 04:13:55 AM
Seems to me like you've got it all under control, so I'll leave you to finish up. Thanks for your involvement. However I don't want multiple phatk kernels so just replace the current one in-situ and don't bother enumming a different kernel. As for the output code, I prefer to use 4k so feel free to do it your way, but be aware I plan to change it back.

Ok, the source is up... I am trying to figure out how to compile this for windows without the cygwin layer (I really haven't done any of this before... I am soooo lost)...

https://github.com/Phateus/cgminer (https://github.com/Phateus/cgminer)

ckolivas... if you want to merge this into your code at some point, let me know what I have to do... I literally installed git yesterday, and there is only so much you can learn on the internet in a day ;-)

As for the buffer, my kernel only uses WORKSIZE+1 parts of your buffer, but I left the buffer size intact.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: -ck on August 17, 2011, 05:14:20 AM
Very good work. Nice of you to figure out how to do git and all as well. Don't worry about the merge, I've taken care of everything and cherry picked your changes as I needed to. I've modified a few things too to be consistent with cgminer's code and there is definitely a significant speed advantage thanks to your changes. Note that if you're ever working on git doing your own changes, do them to a branch that's not called master as you may end up making it impossible to pull back my changes since I won't necessarily take all your code. Thanks again, and I'm sure the cgminer users will be most grateful. :)


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 17, 2011, 05:51:08 AM
Ah, that's how that works... good to know.  This whole git seems really useful for working together.  Thanks

-Phateus


Title: Re: Modified Kernel for Phoenix 1.5
Post by: -ck on August 17, 2011, 06:44:39 AM
If you want to restore your tree without losing your changes, create a new branch and reset master to the last commit before yours.

git checkout master
git branch newphatk
git reset --hard 58eb4d58599521933a3fef599e1dcba4f996dadc
git pull

that will pull my changes into the master branch and your personal changes will be in newphatk. Unfortunately your github account has a messed up master now so

git push -f

will force the changes to propagate. Do not use this command normally as it makes it impossible for people pulling from your branch to keep in sync.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: 1984 on August 18, 2011, 06:28:30 AM
excellent work on the cgminer, I'm seeing the ~same performance as phoenix and am enjoying the fancy cg features. Donation on its way.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: iopq on August 18, 2011, 08:23:43 AM
I'm getting hardware errors on phatk 2.2, didn't get them on diapolo's or 2.1

the three are about indistinguishable in terms of speed for me


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on August 18, 2011, 05:07:53 PM
I'm getting hardware errors on phatk 2.2, didn't get them on diapolo's or 2.1

the three are about indistinguishable in terms of speed for me

Are you using BFI_INT?  If not, there is a bug in the 2.2 kernel. Vince found that in the kernel.cl file, you have to replace

Code:
#define Ch(x, y, z) bitselect(x,y,z)

on line 78 with

Code:
#define Ch(x, y, z) bitselect(z, y, x)

I haven't gotten around to releasing a new version, but if you make the change yourself, it should fix it.
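
For anyone who wants to convince themselves of the argument order before editing kernel.cl, here is a minimal sketch in Python (not the kernel itself). It assumes the standard OpenCL meaning of bitselect(a, b, c), which takes each bit from b where c is 1 and from a otherwise, and the SHA-256 definition Ch(x, y, z) = (x AND y) XOR (NOT x AND z). The helper names are made up for illustration.

```python
import random

MASK = 0xFFFFFFFF  # SHA-256 operates on 32-bit words


def bitselect(a, b, c):
    """Emulate OpenCL bitselect: bit from b where c is 1, otherwise bit from a."""
    return ((a & ~c) | (b & c)) & MASK


def ch_reference(x, y, z):
    """SHA-256 Ch: choose y where x is 1, z where x is 0."""
    return ((x & y) ^ (~x & z)) & MASK


def ch_buggy(x, y, z):
    """Argument order from the 2.2 define quoted above."""
    return bitselect(x, y, z)


def ch_fixed(x, y, z):
    """Argument order from the corrected define."""
    return bitselect(z, y, x)


# Compare both variants against the reference on fixed-seed random words.
rng = random.Random(0)
samples = [tuple(rng.getrandbits(32) for _ in range(3)) for _ in range(1000)]
fixed_ok = all(ch_fixed(*s) == ch_reference(*s) for s in samples)
buggy_ok = all(ch_buggy(*s) == ch_reference(*s) for s in samples)
print("fixed matches reference:", fixed_ok)
print("buggy matches reference:", buggy_ok)
```

The corrected order agrees with the reference on every sample, while the original order does not, which is consistent with bad hashes showing up as hardware errors.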

-Phateus


Title: Re: Modified Kernel for Phoenix 1.5
Post by: iopq on August 19, 2011, 01:45:40 AM
I am using BFI_INT, the hardware errors are kind of random
I should mention I'm using fpgaminer's poclbm fork for this, so that might have something to do with it


Title: Re: Modified Kernel for Phoenix 1.5
Post by: -ck on August 19, 2011, 01:48:38 AM
Hey phateus, just a heads up. Your cgminer code only worked for 2 vectors. I've updated it in my git tree to work with 1 and 4. Simple enough change.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: iopq on September 04, 2011, 12:47:26 PM
how can I generate this kind of a graph for my 5850 and 5750? I'm having an argument with Diablo about the best memory clocks vs. core clocks


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ssateneth on September 05, 2011, 10:47:39 PM
how can I generate this kind of a graph for my 5850 and 5750? I'm having an argument with Diablo about the best memory clocks vs. core clocks

go to google docs, make a spreadsheet, test all the speeds and options on your end manually (this part will be extremely time consuming for a high resolution graph), and put the data in yourself, and generate graph. presto pronto.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: iopq on September 06, 2011, 10:26:13 AM
how can I generate this kind of a graph for my 5850 and 5750? I'm having an argument with Diablo about the best memory clocks vs. core clocks

go to google docs, make a spreadsheet, test all the speeds and options on your end manually (this part will be extremely time consuming for a high resolution graph), and put the data in yourself, and generate graph. presto pronto.
surely, this can be done programmatically, it's just changing clocks and measuring speeds for x seconds and averaging

although on my cards some values will make them unstable, lol
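
A rough sketch of what such a sweep could look like, in Python. Nothing here is a real overclocking or miner API: set_clocks and read_hashrate are hypothetical callables you would have to supply yourself (for example wrappers around whatever clock tool and miner output you use), and the fake versions below only exist so the sketch runs on its own. It does not handle cards becoming unstable at some clock values.

```python
import statistics
import time


def sweep_clocks(core_clocks, mem_clocks, set_clocks, read_hashrate,
                 samples=5, interval=1.0, sleep=time.sleep):
    """Return {(core, mem): mean hashrate} for every clock combination."""
    results = {}
    for core in core_clocks:
        for mem in mem_clocks:
            set_clocks(core, mem)  # hypothetical: apply the clocks to the card
            readings = []
            for _ in range(samples):
                sleep(interval)  # let the miner run between readings
                readings.append(read_hashrate())  # hypothetical: current MH/s
            results[(core, mem)] = statistics.mean(readings)
    return results


def best_setting(results):
    """Return the (core, mem) pair with the highest average hashrate."""
    return max(results, key=results.get)


# Stand-ins so the sketch is runnable; the formula is invented, not measured.
_state = {}


def fake_set_clocks(core, mem):
    _state["clocks"] = (core, mem)


def fake_read_hashrate():
    core, mem = _state["clocks"]
    return 0.45 * core - 0.01 * abs(mem - 300)


demo = sweep_clocks([800, 850, 900], [300, 600, 1000],
                    fake_set_clocks, fake_read_hashrate,
                    samples=3, sleep=lambda _seconds: None)
print(best_setting(demo))
```

The resulting dictionary can be dumped to CSV and graphed in a spreadsheet the same way as manually collected numbers.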


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ssateneth on September 06, 2011, 04:36:44 PM
how can I generate this kind of a graph for my 5850 and 5750? I'm having an argument with Diablo about the best memory clocks vs. core clocks

go to google docs, make a spreadsheet, test all the speeds and options on your end manually (this part will be extremely time consuming for a high resolution graph), and put the data in yourself, and generate graph. presto pronto.
surely, this can be done programmatically, it's just changing clocks and measuring speeds for x seconds and averaging

although on my cards some values will make them unstable, lol

well the OP already said he did it manually. you're free to write a program to do it automatically, or hire someone to write one for you.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Lord F(r)og on September 25, 2011, 06:28:38 PM
donated knickknack


Title: Re: Modified Kernel for Phoenix 1.5
Post by: phelix on September 29, 2011, 07:31:57 AM
are the 1354 ALU OPs for a single SHA256 or for double? as in SHA256(SHA256(Block_Header))

http://bitcoin.stackexchange.com/questions/1293/how-many-integer-operations-on-a-gpu-are-necessary-for-one-hash

network speed guinness world record
https://bitcointalk.org/index.php?topic=38064.0


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on September 29, 2011, 03:21:15 PM
are the 1354 ALU OPs for a single SHA256 or for double? as in SHA256(SHA256(Block_Header))

http://bitcoin.stackexchange.com/questions/1293/how-many-integer-operations-on-a-gpu-are-necessary-for-one-hash

network speed guinness world record
https://bitcointalk.org/index.php?topic=38064.0

1354 OPs are for two double hashes.

SHA256(SHA256(Block_Header1)), SHA256(SHA256(Block_Header2))

so, 677 per double hash.

Although, these aren't completely full hashes, since the first and last few rounds (a few %) have been optimized out.

Also, each ALU OP is a VLIW5 (very long instruction word) instruction which contains 5 integer operations that run simultaneously, so... depending on how you think about it,

could be ~3375 integer operations or 677 VLIW5 instructions

Hope this helps, let me know if you need any more help with this.  I am interested in how this turns out.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: phelix on September 30, 2011, 03:04:31 PM
first you shocked me with TWO double hashes  :o

but ~3375 integer operations per hash is just perfect  ;D

edit: did you mean 3385??


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on October 01, 2011, 10:29:26 PM
first you shocked me with TWO double hashes  :o

but ~3375 integer operations per hash is just perfect  ;D

edit: did you mean 3385??

It's actually closer to 3375 because some VLIW5 instructions only have 4 operations in them.  I can get a more exact number if needed, but it's kinda a PITA cuz AMD's software won't actually tell you outright.
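
The arithmetic behind those numbers, written out as a small Python sketch. The inputs are only the figures quoted in this thread (1354 ALU OPs for phatk 2.2 on the HD5870, two nonces per thread, 5 slots per VLIW5 instruction, and the ~3375 estimate); the variable names are made up, and the "unused slots" figure is just what the estimate implies, not a measured value.

```python
ALU_OPS_PER_THREAD = 1354  # phatk 2.2 on HD5870, as quoted in this thread
NONCES_PER_THREAD = 2      # each thread works on two double hashes
VLIW_WIDTH = 5             # integer operations per full VLIW5 instruction
QUOTED_INT_OPS = 3375      # the rough estimate given above

# VLIW5 instructions needed for one double hash
bundles_per_double_hash = ALU_OPS_PER_THREAD // NONCES_PER_THREAD

# Upper bound if every instruction had all 5 slots filled
upper_bound_int_ops = bundles_per_double_hash * VLIW_WIDTH

# What the ~3375 estimate implies about slot usage
implied_avg_slots = QUOTED_INT_OPS / bundles_per_double_hash
implied_unused_slots = upper_bound_int_ops - QUOTED_INT_OPS

print(bundles_per_double_hash)      # 677
print(upper_bound_int_ops)          # 3385
print(round(implied_avg_slots, 3))  # 4.985
print(implied_unused_slots)         # 10
```

So 3385 is the number you get by assuming every instruction is fully packed, and ~3375 corresponds to roughly 10 slots going unused per double hash.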


Title: Re: Modified Kernel for Phoenix 1.5
Post by: phelix on October 02, 2011, 08:33:46 PM
first you shocked me with TWO double hashes  :o

but ~3375 integer operations per hash is just perfect  ;D

edit: did you mean 3385??

It's actually closer to 3375 because some VLIW5 instructions only have 4 operations in them.  I can get a more exact number if needed, but it's kinda a PITA cuz AMD's software won't actually tell you outright.
ok thanks for elaborating. I used 3385 in the last calc but will just say it makes up for all the 6xxx cards ;)


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Crypt_Current on October 06, 2011, 05:58:56 PM
Is the latest version of phatk the one that's included in LinuxCoin final?

I could probably check somehow, as I am a LinuxCoin user... I just don't know much about Linux and don't want to poke at my rig while it's on a roll...  ;D


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Lord F(r)og on October 09, 2011, 01:47:33 PM
If it works out for you and you're feeling generous, any donations would be greatly appreciated so I can continue to put out bitcoin related software:
124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv

-Phateus

It worked out for me and I'm feeling generous, so I donated for further development.


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Crypt_Current on October 09, 2011, 04:29:51 PM
If it works out for you and you're feeling generous, any donations would be greatly appreciated so I can continue to put out bitcoin related software:
124RraPqYcEpX5qFcQ2ZBVD9MqUamfyQnv

-Phateus

It worked out for me and I'm feeling generous, so I donated for further development.

+1
I would if I could.  I wonder how many BTC enthusiasts live below the USA's so-called "poverty line"?


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ssateneth on January 19, 2012, 08:58:16 AM
Bumping an ancient thread. Just noting that phatk 2.2 still holds the crown for fastest kernel on 2.1, 2.5, and 2.6 SDKs for VLIW5 tech (radeon 5xxx and 60xx-68xx). I hope phateus can come back and do even moar tweaks for moar speed!


Title: Re: Modified Kernel for Phoenix 1.5
Post by: Phateus on January 21, 2012, 08:44:43 PM
I'm checking back in after being gone for so long...  I just downloaded the 2.6 SDK and it destroys my optimization...  :-\ I will see if there is anything I can do without completely rewriting.  Stay tuned and I should have more info later this week.

P.S. Thanks to everyone who has donated to me in the past, I have been busy lately, but I have not forgotten.

-Phateus


Title: Re: Modified Kernel for Phoenix 1.5
Post by: ssateneth on January 24, 2012, 02:43:21 AM
I'm checking back in after being gone for so long...  I just downloaded the 2.6 SDK and it destroys my optimization...  :-\ I will see if there is anything I can do without completely rewriting.  Stay tuned and I should have more info later this week.

P.S. Thanks to everyone who has donated to me in the past, I have been busy lately, but I have not forgotten.

-Phateus

Should be noted that current phatk 2.2 still runs amazingly fast on 2.1 SDK; people suggested poclbm but phatk2 still runs fastest for my system with 2.1 along with 2.4/2.5. At least it does for me. I use VLIW5 hardware. Perhaps keep a 2.1-2.5 kernel around for those that want to keep using it, and a separate 2.6 optimized kernel for those that need it for GCN architecture hardware or people that game with their gpus (they'll probably be running 1GHz+ on memory which is better suited for VECTORS4 which works well with high memory frequency)


Title: Re: Modified Kernel for Phoenix 1.5
Post by: BOARBEAR on January 24, 2012, 04:38:31 AM
Note: new sdk version works best with worksize 64 for 5870