TurdHurdur
January 14, 2012, 08:17:47 PM |
FASTLOOP is great with AGGRESSION=6 for good desktop responsiveness; I did indeed need it. Mind you, this newest kernel doesn't seem to improve the performance of my 5970 with Catalyst 12.1.
|
blandead
January 18, 2012, 03:33:57 AM |
Hey Dia,
So I had to do a fresh install on my computer, but I sent you a small donation just now; let me know if it went through :D
Anyway, while I was installing the AMD drivers I saw an awesome article about the OpenCL 1.2 preview with SDK 2.6. I'm testing the preview drivers now, since they add a couple of new extensions, though I still have to figure out the best place to use them. I ran your kernel through the latest APP KernelAnalyzer. I think there are many places where it can be optimized: I'm seeing BFI_INT directly in the GPU ISA for many of the rounds, and it looks like they added a lot of new patterns that get compiled to it.
I also found a really cool PDF on new optimizations recommended for OpenCL 1.2. It is supposed to give a pretty good performance increase on the VLIW4 architecture, and there was one part that I think would solve your VECTORS3 issue, or even offer a better way of achieving it. If you have time, send me a PM and I can send you the PDF.
Anyway, the new kernel is a little faster with VECTORS4, but for some reason the temperature is higher. That could just be because of the fresh wipe I did; did anyone else notice their GPU running hotter?
|
Diapolo (OP)
January 18, 2012, 06:29:13 AM |
Your donation has just arrived, thank you! Sounds pretty interesting, and I would like to receive a copy of that PDF. Can you upload it somewhere or send me a link via PM? I saw that there is a new cl_amd_media_ops2 extension in the latest drivers, but I could not find any documentation for it (the first one, cl_amd_media_ops, is used for BFI_INT patching). It would be very nice if BFI_INT were directly accessible via OpenCL, so that we could kick out the binary patching. The vec3 bug is really strange; I guess it happens in the Python host code and not in the kernel, because KernelAnalyzer runs it just fine. I'm looking forward to further discussions! Dia
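For readers wondering why BFI_INT matters so much here: it is a bit-select (bitfield insert) instruction, and SHA-256 miners commonly use it for the round functions Ch and Maj. The sketch below is a plain Python model, not AMD's ISA definition. `bfi(mask, a, b)` takes bits of `a` where `mask` is 1 and bits of `b` where it is 0 (the operand order of the real instruction is not asserted here), and all helper names are hypothetical. It checks that Ch and Maj, written as in the SHA-256 standard, each reduce to a single select. OpenCL C's built-in `bitselect()` expresses the same kind of select in portable code.

```python
import random

MASK32 = 0xFFFFFFFF  # SHA-256 works on 32-bit words

def bfi(mask: int, a: int, b: int) -> int:
    """Bit select: bits of `a` where `mask` is 1, bits of `b` where `mask` is 0."""
    return ((mask & a) | (~mask & b)) & MASK32

def ch(x: int, y: int, z: int) -> int:
    """SHA-256 Ch, written as in FIPS 180-4."""
    return ((x & y) ^ (~x & z)) & MASK32

def maj(x: int, y: int, z: int) -> int:
    """SHA-256 Maj, written as in FIPS 180-4."""
    return ((x & y) ^ (x & z) ^ (y & z)) & MASK32

def ch_as_select(x: int, y: int, z: int) -> int:
    # x chooses between y and z bit by bit: exactly one select.
    return bfi(x, y, z)

def maj_as_select(x: int, y: int, z: int) -> int:
    # Where x and y agree, the majority is y; where they differ, z breaks the tie.
    return bfi(x ^ y, z, y)

if __name__ == "__main__":
    rng = random.Random(2012)
    for _ in range(10_000):
        x, y, z = (rng.getrandbits(32) for _ in range(3))
        assert ch(x, y, z) == ch_as_select(x, y, z)
        assert maj(x, y, z) == maj_as_select(x, y, z)
    print("Ch and Maj both reduce to a single bit select")
```

Written naively, Ch costs several bitwise operations per round; as a select it is one, which is why getting the compiler to emit BFI_INT (or patching it in) pays off across all 64 rounds.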
|
gat3way
January 18, 2012, 10:48:13 AM |
Hello,
Unfortunately the cl_amd_media_ops2 extension has nothing to do with BFI_INT. It defines amd_bfe() and amd_bfm() functions, but nothing that maps to BFI_INT.
Can I have that PDF too, please?
|
Diapolo (OP)
January 18, 2012, 11:17:09 AM |
Have you got a link to the cl_amd_media_ops2 documentation? Thanks, Dia
|
malevolent
January 18, 2012, 11:40:25 PM |
Minus 12 Mhash/s on my HD 6850 and minus 25 Mhash/s on each of my HD 5850s, compared to GUIMiner from July 1st :/
PS: Yes, I did experiment with the flags, etc.
Drivers: Catalyst 11.5 with Stream SDK 2.3. OS: Windows 7 64-bit Pro.
|
Diapolo (OP)
January 19, 2012, 08:21:14 AM |
I recall that I mentioned this kernel is for SDK 2.6+, sorry! It's totally ok for this kernel to not work well for older SDK versions. Dia
|
malevolent
January 19, 2012, 06:27:59 PM |
My fault, then. Wasn't SDK 2.6 the one that was significantly slower? Which driver version would you recommend to use with SDK 2.6?
|
BCMan
January 19, 2012, 06:55:34 PM |
Why not improve the phatk2 kernel instead? It's faster than the first version and still faster than phatk_dia.
|
blandead
January 20, 2012, 10:29:51 PM |
I'm not sure where I downloaded it, but I can easily e-mail it to you. The cl_amd_media_ops2 extension is for mapping 3D images, so that doesn't help us. But if you look at the AMD 11.12 driver notes, they tell you to set an environment variable, "GPU_ASYNC_MEM_COPY=2", to make use of a new feature. There is a preview driver for OpenCL 1.2 that adds some functionality: they are lifting the rule of only one overloaded function, and will allow you to code directly in C++.
Here is a reference card of commands: http://www.khronos.org/files/opencl-1-2-quick-reference-card.pdf
Here is the documentation for one of the extensions (cl_khr_fp64): http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/cl_khr_fp64.html. It adds double-precision floating point. It only works on AMD 69xx devices though, and probably the GCN cards.
I'm trying to find a direct link to this nice PDF I found with excellent examples. I have the file on my computer though. Ah! Found it... http://www.bu.edu/pasi/files/2011/01/AndreasKloeckner3-07-1000.pdf Look at pages 56-60. This code should look familiar to anybody who took a programming class.
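The extensions discussed above (cl_khr_fp64, cl_amd_media_ops, cl_amd_media_ops2) are advertised per device in the space-separated string a host program gets from clGetDeviceInfo with CL_DEVICE_EXTENSIONS. Below is a minimal Python sketch, assuming you already have that string in hand; the sample string is made up for illustration and was not captured from any real card, and the helper names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Extensions discussed in this thread.
WANTED: Tuple[str, ...] = ("cl_khr_fp64", "cl_amd_media_ops", "cl_amd_media_ops2")

def parse_extensions(ext_string: str) -> Set[str]:
    """Split a CL_DEVICE_EXTENSIONS string into individual extension names."""
    return set(ext_string.split())

def check_extensions(ext_string: str, wanted: Tuple[str, ...] = WANTED) -> Dict[str, bool]:
    """Report, for each wanted extension, whether the device advertises it."""
    available = parse_extensions(ext_string)
    return {name: name in available for name in wanted}

# Hypothetical extension string, for illustration only.
SAMPLE = "cl_khr_global_int32_base_atomics cl_khr_fp64 cl_amd_media_ops cl_amd_printf"

if __name__ == "__main__":
    for name, present in check_extensions(SAMPLE).items():
        print(f"{name}: {'yes' if present else 'no'}")
```

Splitting on whitespace, rather than testing `"cl_amd_media_ops" in ext_string`, matters here: the substring test would also match inside "cl_amd_media_ops2". If a kernel actually uses doubles, OpenCL 1.0/1.1 kernel source also has to enable the extension with `#pragma OPENCL EXTENSION cl_khr_fp64 : enable`.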
|
zvs
January 21, 2012, 07:26:54 PM |
> > Why is it so necessary for phatk kernel variations to have the memclock at 1k... Some people can't deal with that extra heat...
> This is something that has changed in SDK 2.6: the best performance, at the best settings after trying all options, comes at a GPU RAM speed of 1000MHz (stock speed for most cards) instead of at an underclock of 300MHz-370MHz. Version 2.6, included with drivers 11.12 and 12.1, is significantly different from the previous SDKs in how it responds to worksizes, vector settings, and OpenCL programming. It is a benefit in that one doesn't need to oddly tweak memory speeds from stock to get the best performance (annoying to tell noobs over and over to underclock RAM), but bad in that this old quirk was actually an electricity saver if you did it.
Underclocking to 300-370MHz has never been best. 395 is faster. Fastest? Not sure.
|
Bananington
January 23, 2012, 03:55:18 AM |
I get about a 9-10 MH/s increase. Thank you, Diapolo!
|
deepceleron
January 23, 2012, 05:44:45 PM |
> Underclocking to 300-370MHz has never been best. 395 is faster. Fastest? Not sure.
I'm glad you found the memory peak that works for you. However, your case is not the absolute correct answer (and is not common, most 5xxx/6xxx cards are at 300MHz). It is just your setup and what works for you. Many things will affect performance and where the memory "sweet spot" will be: GPU model/architecture, GPU card memory bus/memory size, GPU core overclock, operating system/32- or 64-bit/video card driver, OpenCL/APP SDK runtime installed on the system, miner software, miner kernel (and its particular optimizations), miner kernel parameters (worksize, vector size), compiler/SDK used to create the miner, libraries installed on the system (if running interpreted source)... So there is no one right answer.
|
malevolent
January 23, 2012, 05:58:57 PM |
> I recall that I mentioned this kernel is for SDK 2.6+, sorry! It's totally ok for this kernel to not work well for older SDK versions. Dia
OK, I managed to get it working at full speed with SDK 2.3.
|
Diapolo (OP)
January 23, 2012, 07:58:51 PM |
> OK, I managed to get it working at full speed with SDK 2.3.
Great that you got it working; I only wanted to mention that it's intended for 2.6+. Dia
Btw.: The current kernel doesn't work with the 7970, and GCN seems to dislike vectors for mining.
|
DiabloD3
DiabloMiner author
January 24, 2012, 05:15:23 AM |
> (and is not common, most 5xxx/6xxx cards are at 300MHz)
ITYM (I think you mean) 1/3 of the core clock. 300MHz is only correct if your core is at 900MHz.
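DiabloD3's rule of thumb is easy to turn into arithmetic: start the memory clock at one third of the core clock. A minimal Python sketch, treating the 1/3 ratio purely as the starting point he describes (the function name is hypothetical); as deepceleron points out in this thread, the real sweet spot still has to be found by benchmarking your own setup.

```python
def starting_mem_clock(core_mhz: float, divisor: float = 3.0) -> float:
    """Memory clock (MHz) suggested by the 'one third of core clock' rule of thumb."""
    if core_mhz <= 0 or divisor <= 0:
        raise ValueError("core clock and divisor must be positive")
    return core_mhz / divisor

if __name__ == "__main__":
    for core in (900, 990):
        print(f"core {core} MHz -> start memory sweep at {starting_mem_clock(core):.0f} MHz")
    # core 900 MHz -> start memory sweep at 300 MHz
    # core 990 MHz -> start memory sweep at 330 MHz
```

The divisor is a parameter rather than a constant so the same helper works if a different card turns out to prefer another ratio.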
|
Fiyasko
January 24, 2012, 05:16:56 PM |
> > (and is not common, most 5xxx/6xxx cards are at 300MHz)
> ITYM 1/3 of the core clock. 300MHz is only correct if your core is at 900MHz.
OH, THAT'S THE TRICK?! My 6870's "sweet spot" SEEMS to be 490 with the core at 990! That makes quite a lot of sense! I was planning on looking for a sweeter spot, but I felt that 490 "was it" and that I wouldn't find anything better, so I didn't look.
|
zvs
January 25, 2012, 02:43:20 AM |
> I'm glad you found the memory peak that works for you. However, your case is not the absolute correct answer (and is not common, most 5xxx/6xxx cards are at 300MHz). It is just your setup and what works for you. Many things will affect performance and where the memory "sweet spot" will be: GPU model/architecture, GPU card memory bus/memory size, GPU core overclock, operating system/32- or 64-bit/video card driver, OpenCL/APP SDK runtime installed on the system, miner software, miner kernel (and its particular optimizations), miner kernel parameters (worksize, vector size), compiler/SDK used to create the miner, libraries installed on the system (if running interpreted source)... So there is no one right answer.
That hasn't been my experience, nor that of the half a dozen other people I know who run 5830 setups. The decision is more along the lines of 'do I want to run the card cooler with a lower memory setting?' vs. 'do I want to run at 395MHz memory and gain a few Mhash/s?' I speak of 5830s exclusively.
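zvs frames the choice as a trade-off: run the memory lower and cooler, or run at the peak for a few more Mhash/s. Given your own sweep results, both answers can be read off the same table. A minimal Python sketch; the function name and the 1% default tolerance are assumptions, and the sample numbers are invented placeholders, not measurements from any card in this thread.

```python
from typing import Dict, Tuple

def pick_mem_clock(results: Dict[int, float], tolerance: float = 0.01) -> Tuple[int, int]:
    """Return (fastest_clock, coolest_acceptable_clock).

    results maps memory clock (MHz) -> measured hashrate (Mhash/s).
    coolest_acceptable_clock is the lowest clock whose hashrate is within
    `tolerance` (a fraction, e.g. 0.01 = 1%) of the peak hashrate.
    """
    if not results:
        raise ValueError("no measurements")
    if not 0 <= tolerance < 1:
        raise ValueError("tolerance must be in [0, 1)")
    fastest = max(results, key=results.get)
    floor = results[fastest] * (1 - tolerance)
    coolest = min(clock for clock, rate in results.items() if rate >= floor)
    return fastest, coolest

# Invented placeholder sweep, for illustration only: replace with your own benchmarks.
SAMPLE_SWEEP = {300: 290.0, 350: 293.0, 395: 296.0, 450: 295.0}

if __name__ == "__main__":
    for tol in (0.01, 0.02):
        fastest, coolest = pick_mem_clock(SAMPLE_SWEEP, tol)
        print(f"tolerance {tol:.0%}: fastest at {fastest} MHz, coolest acceptable at {coolest} MHz")
```

With a tolerance of zero both answers are usually the same clock; raising the tolerance trades a little hashrate for a lower, cooler memory clock, which is exactly the decision zvs describes.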
|
Diapolo (OP)
January 27, 2012, 03:53:48 PM |
I'm currently working pretty hard on a kernel for 7970 cards and am looking for a few people who are willing to test/benchmark it. Please apply in this thread or via PM; you need to have a 7970 card and be on a current Phoenix version with the latest Catalyst. For now I don't want to release the kernel into the wild, sorry... it's not polished. Thanks, Dia