Bitcoin Forum
Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13
TurdHurdur
January 14, 2012, 08:17:47 PM
 #381

FASTLOOP is great with AGGRESSION=6 for good desktop responsiveness; I did indeed need it. Mind you, this newest kernel doesn't seem to improve the performance of my 5970 with Catalyst 12.1.
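
For readers unfamiliar with how such options are passed, here is a minimal sketch that assembles a Phoenix 1.x style command line. The option names (FASTLOOP, AGGRESSION, VECTORS4, BFI_INT) are the ones mentioned in this thread; the pool URL, the kernel directory name "phatk" and the helper name `phoenix_args` are placeholders chosen for illustration, not something the thread specifies.

```python
def phoenix_args(url, kernel, device=0, aggression=6, fastloop=True,
                 bfi_int=True, vectors=None, worksize=None):
    """Build an argv list for a Phoenix 1.x style invocation (illustrative only)."""
    args = ["python", "phoenix.py", "-u", url, "-k", kernel,
            "DEVICE=%d" % device]
    if vectors:                      # e.g. "VECTORS4", passed through unchanged
        args.append(vectors)
    if bfi_int:
        args.append("BFI_INT")
    if fastloop:
        args.append("FASTLOOP")
    args.append("AGGRESSION=%d" % aggression)
    if worksize is not None:
        args.append("WORKSIZE=%d" % worksize)
    return args


# Placeholder pool URL and kernel name; substitute your own.
example = phoenix_args("http://user:pass@pool.example:8332", "phatk",
                       vectors="VECTORS4")
print(" ".join(example))
```

Lower AGGRESSION values trade some hash rate for a more responsive desktop, which is the trade-off described above.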
blandead
January 18, 2012, 03:33:57 AM
 #382

Hey Dia,

So I had to do a fresh install on my computer, but I sent you a small donation just now, lemme know if it went through : D

Anyways, while I was in the process of installing AMD drivers I saw an awesome article about OpenCL 1.2 Preview with SDK 2.6. I'm testing the preview drivers out now since they add a couple new extensions, though I have to figure out the best place to use them. I ran your kernel through the latest APP KernelAnalyzer. I think there are many places it can be optimized as I'm seeing BFI_INT directly from the GPU ISA for many of the rounds, and it looks like there are a lot of new patterns they added to do so.

I also found a really cool pdf on new optimizations that are recommended for OpenCL 1.2, and it is supposed to provide a pretty good performance increase for VLIW4 architecture, and there was one part that I think would solve your VECTORS3 issue or even a better way of achieving it. If you have time send me a PM, and I can send you the pdf.

Anyway, the new kernel is a little faster with VECTORS4, but for some reason the temperature is higher. That could just be because of the fresh wipe I did. Did anyone else notice their GPU running hotter?
Diapolo (OP)
January 18, 2012, 06:29:13 AM
 #383

Your donation has just arrived, thank you :)!

Sounds pretty interesting, and I would like to receive a copy of that PDF. Can you upload it somewhere or send me a link via PM? I saw that there is a new cl_amd_media_ops2 extension in the latest drivers, but I could not find any documentation for it (the first one is used for BFI_INT patching). It would be very nice if BFI_INT were directly accessible via OpenCL, so that we could kick the binary patching out. The vec3 bug is really strange; I guess it happens in the Python host code and not in the kernel, because KernelAnalyzer will run it just fine.
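
To illustrate why BFI_INT matters for SHA-256 kernels, here is a small Python sketch. It assumes BFI_INT computes a bitwise select, (a & b) | (~a & c), which is how the instruction is usually described; the function names are made up for this example. Under that assumption, both the Ch and Maj round functions collapse to a single select each.

```python
import random

MASK = 0xFFFFFFFF  # SHA-256 works on 32-bit words


def bfi_int(a, b, c):
    """Assumed BFI_INT semantics: bits of b where a is 1, bits of c where a is 0."""
    return ((a & b) | (~a & c)) & MASK


def ch(x, y, z):
    """Reference SHA-256 Ch function."""
    return ((x & y) ^ (~x & z)) & MASK


def maj(x, y, z):
    """Reference SHA-256 Maj function."""
    return ((x & y) ^ (x & z) ^ (y & z)) & MASK


def ch_bfi(x, y, z):
    # Ch is a select by definition: x chooses between y and z.
    return bfi_int(x, y, z)


def maj_bfi(x, y, z):
    # Where x and z agree, the majority is x; where they differ, y decides.
    return bfi_int(x ^ z, y, x)


# Spot-check the identities on random 32-bit words.
rng = random.Random(0)
for _ in range(10000):
    x, y, z = (rng.getrandbits(32) for _ in range(3))
    assert ch(x, y, z) == ch_bfi(x, y, z)
    assert maj(x, y, z) == maj_bfi(x, y, z)
```

One instruction per round function instead of several AND/XOR operations is the kind of saving the binary patching is after.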

I'm looking forward to further discussions!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
gat3way
January 18, 2012, 10:48:13 AM
 #384

Hello,

Unfortunately, the cl_amd_media_ops2 extension has nothing to do with BFI_INT. It defines amd_bfe() and amd_bfm() functions, but nothing that maps to BFI_INT.

Can I have that pdf too please?
Diapolo (OP)
January 18, 2012, 11:17:09 AM
 #385

Have you got a link to the cl_amd_media_ops2 documentation?

Thanks,
Dia

gat3way
January 18, 2012, 11:58:36 AM
 #386

There is no documentation yet. Those are the strings carved from libamdocl64.so. Additionally, I've tested most of them (excluding max3/min3 and the sad ones) and they work. For some reason, you need to compile with -Dcl_amd_media_ops2, because the pragma alone does not enable it.

For the full list see this thread:

http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=157516&messid=1274705&parentid=1274660&FTVAR_FORUMVIEWTMP=Branch
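
As a rough sketch of the workaround described above, the host code can both prepend the pragma and pass the define as a build option. The helper name `prepare_build` and the dummy kernel are invented for this example; only the extension name and the -D flag come from the post.

```python
EXTENSION = "cl_amd_media_ops2"


def prepare_build(kernel_source, extension=EXTENSION):
    """Return (source, options): pragma prepended, plus the -D define."""
    pragma = "#pragma OPENCL EXTENSION %s : enable\n" % extension
    options = ["-D%s" % extension]
    return pragma + kernel_source, options


# Dummy kernel, only here so the example has something to wrap.
source, options = prepare_build(
    "__kernel void noop(__global uint *out) { out[0] = 0; }"
)
print(options)

# With PyOpenCL the result would be used roughly like this (not run here):
#   program = pyopencl.Program(context, source).build(options=options)
```

Keeping the pragma in the source costs nothing and means the kernel should still build unchanged if a later driver honours the pragma on its own.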
malevolent
January 18, 2012, 11:40:25 PM
 #387

minus 12 Mhash/s for HD 6850
minus 25 Mhash/s for each of my HD 5850s

compared to guiminer from July 1st :/

PS. Yes, I did experiment with flags, etc.

Drivers: Catalyst 11.5 and Stream SDK 2.3
OS: Windows 7 Pro 64-bit

Signature space available for rent.
Diapolo (OP)
January 19, 2012, 08:21:14 AM
 #388

I recall that I mentioned this kernel is for SDK 2.6+, sorry!
It's totally ok for this kernel to not work well for older SDK versions.

Dia

malevolent
January 19, 2012, 06:27:59 PM
 #389

My fault then :P

Wasn't SDK 2.6 the one that was significantly slower? Which driver version would you recommend to work along with SDK 2.6?

BCMan
January 19, 2012, 06:55:34 PM
 #390

Why not improve the phatk2 kernel instead? It's faster than the first version and still faster than phatk_dia.
blandead
January 20, 2012, 10:29:51 PM
 #391

Quote
Your donation has just arrived, thank you :)!

Sounds pretty interesting, and I would like to receive a copy of that PDF. Can you upload it somewhere or send me a link via PM? I saw that there is a new cl_amd_media_ops2 extension in the latest drivers, but I could not find any documentation for it (the first one is used for BFI_INT patching). It would be very nice if BFI_INT were directly accessible via OpenCL, so that we could kick the binary patching out. The vec3 bug is really strange; I guess it happens in the Python host code and not in the kernel, because KernelAnalyzer will run it just fine.

I'm looking forward to further discussions!

Dia

I'm not sure where I downloaded it, but I can easily e-mail it to you. The cl_amd_media_ops2 extension is for mapping 3D images, so that doesn't help us. But if you look at the AMD 11.12 driver, they tell you to set the environment variable "GPU_ASYNC_MEM_COPY=2" to make use of a new feature. There is also a preview driver for OpenCL 1.2 that adds some functionality. They are lifting the rule of only 1 overloaded function, and will allow you to code directly in C++. Here is a reference card of commands: http://www.khronos.org/files/opencl-1-2-quick-reference-card.pdf
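
A minimal sketch of setting that variable from Python before starting the miner, assuming the driver reads it when the OpenCL runtime is loaded (that timing is an assumption, as is the helper name `enable_async_mem_copy`):

```python
import os


def enable_async_mem_copy(env=None, value="2"):
    """Set GPU_ASYNC_MEM_COPY in the given mapping (os.environ by default)."""
    env = os.environ if env is None else env
    env["GPU_ASYNC_MEM_COPY"] = value
    return env


# Build an environment for a child process without touching our own.
child_env = enable_async_mem_copy(dict(os.environ))
print(child_env["GPU_ASYNC_MEM_COPY"])

# The miner could then be launched with this environment, for example:
#   subprocess.Popen(["python", "phoenix.py"], env=child_env)
```

Setting it in a copied environment keeps the change scoped to the miner process, which makes before/after comparisons easier.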

Here is the basis for one of the new extensions (cl_khr_fp64): http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/cl_khr_fp64.html -- it adds double-precision floating-point support.
It only works on AMD 69xx devices though, and probably the GCN cards.

I'm trying to find a direct link to this nice pdf I found with excellent examples. I have the file on my computer though.

Ah! found it... http://www.bu.edu/pasi/files/2011/01/AndreasKloeckner3-07-1000.pdf Look at page 56-60

This code should look familiar to anybody who took a programming class.
zvs
January 21, 2012, 07:26:54 PM
 #392

Quote
Why is it so necessary for phatk kernel variations to have the memclock at 1k... Some people can't deal with that extra heat...
This is something that has changed in SDK 2.6: the best performance, after trying all options, comes at a GPU RAM speed of 1000 MHz (stock speed for most cards) instead of at an underclock of 300-370 MHz. Version 2.6, included with drivers 11.12 and 12.1, is significantly different from the previous SDKs in how it responds to worksizes, vector settings, and OpenCL programming.

It is a benefit in that one doesn't need to oddly tweak memory speeds from stock to get the best performance (annoying to tell noobs over and over to underclock RAM), but bad in that this old quirk was actually an electricity saver if you did it.

Underclocking to 300-370 MHz has never been best. 395 is faster. Fastest? Not sure.
Bananington
January 23, 2012, 03:55:18 AM
 #393

I get about a 9-10 MH/s increase. Thank you, Diapolo!

deepceleron
January 23, 2012, 05:44:45 PM
 #394

Quote
Underclocking to 300-370 MHz has never been best. 395 is faster. Fastest? Not sure.

I'm glad you found the memory peak that worked for you. However, your case is not the absolute correct answer (and is not common; most 5xxx/6xxx cards are at 300 MHz). It is just your setup and what works for you; many things will affect performance and where the memory "sweet spot" will be:

GPU model/architecture,
GPU card memory bus/memory size,
GPU core overclock,
Operating system/32- or 64-bit/video card driver,
OpenCL/APP SDK runtime installed on system,
Miner software,
Miner kernel (and its particular optimizations),
Miner kernel parameters (worksize, vector size),
Compiler/SDK used to create miner,
Libraries installed on system (if running interpreted source)...

So there is no one right answer.
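
Since the sweet spot depends on the whole setup, one practical approach is to sweep the memory clock and measure. The sketch below shows the idea only: `find_sweet_spot` and `fake_measure` are hypothetical names, and the hash-rate curve is made up so the example can run without a GPU. A real `measure_mhash` would set the clock with your overclocking tool and read the miner's reported rate.

```python
def find_sweet_spot(measure_mhash, clocks_mhz):
    """Measure every candidate memory clock and return (best_clock, best_rate)."""
    results = {clock: measure_mhash(clock) for clock in clocks_mhz}
    best_clock = max(results, key=results.get)
    return best_clock, results[best_clock]


def fake_measure(clock_mhz):
    # Synthetic curve peaking at 350 MHz; NOT real benchmark data.
    return 300.0 - abs(clock_mhz - 350) * 0.05


best_clock, best_rate = find_sweet_spot(fake_measure, range(300, 1001, 25))
print(best_clock, best_rate)
```

Rerunning such a sweep after any change to the factors listed above (driver, SDK, kernel, worksize) is the only way to know whether the peak has moved.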
malevolent
January 23, 2012, 05:58:57 PM
 #395

Quote
I recall that I mentioned this kernel is for SDK 2.6+, sorry!
It's totally ok for this kernel to not work well for older SDK versions.
Dia

OK, I managed to get it working at full speed with SDK 2.3 :D

Diapolo (OP)
January 23, 2012, 07:58:51 PM
 #396

Quote
I recall that I mentioned this kernel is for SDK 2.6+, sorry!
It's totally ok for this kernel to not work well for older SDK versions.
Dia

OK, I managed to get it working at full speed with SDK 2.3 :D

Great that you got it working; I only wanted to mention that it's intended for 2.6+ :D

Dia

By the way: the current kernel doesn't work with the 7970, and GCN seems to dislike vectors for mining.

DiabloD3
DiabloMiner author


January 24, 2012, 05:15:23 AM
 #397

Quote
(and is not common, most 5xxx/6xxx cards are at 300MHz)

ITYM 1/3rd of the core clock. 300 MHz is only correct if your core is at 900 MHz.
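
The arithmetic behind that rule of thumb, as a tiny sketch. The helper name `memory_clock_guess` is made up, and the result is only a starting point for tuning, not a guaranteed optimum.

```python
def memory_clock_guess(core_mhz, divisor=3):
    """Starting guess for the memory clock: core clock divided by 3, rounded."""
    return int(round(core_mhz / divisor))


for core in (900, 990):
    print(core, "->", memory_clock_guess(core))
# 900 -> 300
# 990 -> 330
```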

Fiyasko
Legendary
*
Offline Offline

Activity: 1428
Merit: 1001


Okey Dokey Lokey


View Profile
January 24, 2012, 05:16:56 PM
 #398

OH, THAT'S THE TRICK?! My 6870's "sweet spot" seems to be 490 with the core at 990. That makes quite a lot of sense! I was planning to look for a sweeter spot, but I felt that 490 "was it" and that I wouldn't find anything better, so I didn't look.

http://bitcoin-otc.com/viewratingdetail.php?nick=DingoRabiit&sign=ANY&type=RECV <-My Ratings
https://bitcointalk.org/index.php?topic=857670.0 GAWminers and associated things are not to be trusted, Especially the "mineral" exchange
zvs
January 25, 2012, 02:43:20 AM
 #399

Quote
Underclocking to 300-370 MHz has never been best. 395 is faster. Fastest? Not sure.

I'm glad you found the memory peak that worked for you. However, your case is not the absolute correct answer (and is not common; most 5xxx/6xxx cards are at 300 MHz). It is just your setup and what works for you; many things will affect performance and where the memory "sweet spot" will be:

GPU model/architecture,
GPU card memory bus/memory size,
GPU core overclock,
Operating system/32- or 64-bit/video card driver,
OpenCL/APP SDK runtime installed on system,
Miner software,
Miner kernel (and its particular optimizations),
Miner kernel parameters (worksize, vector size),
Compiler/SDK used to create miner,
Libraries installed on system (if running interpreted source)...

So there is no one right answer.

That hasn't been my experience, nor that of any of the half dozen other people I know who run 5830 setups. The decision is more along the lines of "do I want to run the card cooler with a lower memory setting" vs. "do I want to run at 395 MHz memory and gain a few Mhash?"

I speak of 5830s exclusively.
Diapolo (OP)
January 27, 2012, 03:53:48 PM
 #400

I'm currently working pretty hard on a kernel for 7970 cards and am looking for a few people who are willing to test / benchmark it.
Please apply in this thread or via PM; you need to have a 7970 card and be on a current Phoenix version with the latest Catalyst.
For now I don't want to release the kernel into the wild, sorry ... it's not polished :D

Thanks,
Dia
