Bitcoin Forum
Author Topic: further improved phatk_dia kernel for Phoenix + SDK 2.6 - 2012-01-13  (Read 101463 times)
TurdHurdur
January 14, 2012, 08:17:47 PM
 #381

FASTLOOP is great with AGGRESSION=6 for good desktop responsiveness; I did indeed need it. Mind you, this newest kernel doesn't seem to improve the performance of my 5970 with Catalyst 12.1.
blandead
January 18, 2012, 03:33:57 AM
 #382

Hey Dia,

So I had to do a fresh install on my computer, but I sent you a small donation just now, lemme know if it went through : D

Anyways, while I was in the process of installing AMD drivers I saw an awesome article about OpenCL 1.2 Preview with SDK 2.6. I'm testing the preview drivers out now since they add a couple new extensions, though I have to figure out the best place to use them. I ran your kernel through the latest APP KernelAnalyzer. I think there are many places it can be optimized as I'm seeing BFI_INT directly from the GPU ISA for many of the rounds, and it looks like there are a lot of new patterns they added to do so.

I also found a really cool pdf on new optimizations that are recommended for OpenCL 1.2, and it is supposed to provide a pretty good performance increase for VLIW4 architecture, and there was one part that I think would solve your VECTORS3 issue or even a better way of achieving it. If you have time send me a PM, and I can send you the pdf.

Anyways, new kernel is a little faster with VECTORS4, but for some reason the temperature is higher. That could just be because of the fresh wipe I did, did anyone else notice their GPU running hotter?
Diapolo
January 18, 2012, 06:29:13 AM
 #383

Hey Dia,

So I had to do a fresh install on my computer, but I sent you a small donation just now, lemme know if it went through : D

Anyways, while I was in the process of installing AMD drivers I saw an awesome article about OpenCL 1.2 Preview with SDK 2.6. I'm testing the preview drivers out now since they add a couple new extensions, though I have to figure out the best place to use them. I ran your kernel through the latest APP KernelAnalyzer. I think there are many places it can be optimized as I'm seeing BFI_INT directly from the GPU ISA for many of the rounds, and it looks like there are a lot of new patterns they added to do so.

I also found a really cool pdf on new optimizations that are recommended for OpenCL 1.2, and it is supposed to provide a pretty good performance increase for VLIW4 architecture, and there was one part that I think would solve your VECTORS3 issue or even a better way of achieving it. If you have time send me a PM, and I can send you the pdf.

Anyways, new kernel is a little faster with VECTORS4, but for some reason the temperature is higher. That could just be because of the fresh wipe I did, did anyone else notice their GPU running hotter?

Your donation has just arrived, thank you :)!

Sounds pretty interesting, and I would like to receive a copy of that PDF. Can you upload it somewhere or send me a link via PM? I saw that there is a new cl_amd_media_ops2 extension in the latest drivers, but I could not find any documentation for it (the first one, cl_amd_media_ops, is used for BFI_INT patching). It would be very nice if BFI_INT were directly accessible via OpenCL, so that we could kick the binary patching out. The vec3 bug is really strange; I guess it happens in the Python host code and not in the kernel, because KernelAnalyzer will run it just fine.
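For readers unfamiliar with why BFI_INT matters here, the following is a minimal Python sketch of what the instruction computes and how the SHA-256 Ch and Maj functions can be expressed with it. The helper names (`bfi_int`, `ch_bfi`, `maj_bfi`) are made up for illustration and are not taken from the phatk_dia source; the sketch only models the bit logic on the host, it does not touch the GPU or the binary patching.

```python
import random

MASK32 = 0xFFFFFFFF  # keep every result inside an unsigned 32-bit word


def bfi_int(mask, a, b):
    """Bitfield insert: bits of `a` where `mask` is 1, bits of `b` where it is 0."""
    return ((mask & a) | (~mask & b)) & MASK32


def ch(x, y, z):
    """SHA-256 Ch as written in the standard: three logic operations."""
    return ((x & y) ^ (~x & z)) & MASK32


def maj(x, y, z):
    """SHA-256 Maj as written in the standard: five logic operations."""
    return ((x & y) ^ (x & z) ^ (y & z)) & MASK32


def ch_bfi(x, y, z):
    """Ch expressed as a single bitfield insert."""
    return bfi_int(x, y, z)


def maj_bfi(x, y, z):
    """Maj expressed as one XOR plus one bitfield insert."""
    return bfi_int(z ^ x, y, x)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the check is repeatable
    for _ in range(1000):
        x, y, z = (rng.getrandbits(32) for _ in range(3))
        assert ch(x, y, z) == ch_bfi(x, y, z)
        assert maj(x, y, z) == maj_bfi(x, y, z)
    print("Ch and Maj match their BFI_INT forms on 1000 random inputs")
```

Collapsing Ch and Maj to one or two instructions each is the reason miners go to the trouble of patching the compiled kernel when the instruction is not exposed through OpenCL directly.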

I'm looking forward to further discussions!

Dia

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
gat3way
January 18, 2012, 10:48:13 AM
 #384

Hello,

Unfortunately, the cl_amd_media_ops2 extension has nothing to do with BFI_INT. There are amd_bfe() and amd_bfm() functions defined, but nothing that maps to BFI_INT.

Can I have that pdf too please?
Diapolo
January 18, 2012, 11:17:09 AM
 #385

Hello,

Unfortunately the amd_cl_media_ops2 extension has nothing to do with BFI_INT. There are amd_bfe() and amd_bfm() functions defined, but nothing that maps to bfi_int.

Can I have that pdf too please?

Have you got a link to the cl_amd_media_ops2 documentation?

Thanks,
Dia

gat3way
January 18, 2012, 11:58:36 AM
 #386

There is no documentation yet. Those are the strings carved from libamdocl64.so. Additionally, I've tested most of them (excluding max3/min3 and the sad ones) and they work. For some reason, you need to compile with -Dcl_amd_media_ops2, because the pragma alone does not enable it.

For the full list see this thread:

http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=157516&messid=1274705&parentid=1274660&FTVAR_FORUMVIEWTMP=Branch
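As a rough illustration of the -Dcl_amd_media_ops2 workaround described in this post, here is a small Python sketch that assembles an OpenCL build-option string on the host side. The helper `build_options` and the extra `WORKSIZE` define are assumptions made for the example; only the `-Dcl_amd_media_ops2` flag comes from the post. The kernel source would still carry the usual `#pragma OPENCL EXTENSION cl_amd_media_ops2 : enable` line.

```python
def build_options(defines):
    """Turn {'NAME': value_or_None} into a string of OpenCL '-D' options.

    A value of None yields a bare define (-DNAME); anything else yields
    -DNAME=value. Insertion order of the dict is preserved.
    """
    parts = []
    for name, value in defines.items():
        if value is None:
            parts.append(f"-D{name}")
        else:
            parts.append(f"-D{name}={value}")
    return " ".join(parts)


# The bare define is the flag from the post; WORKSIZE is only an example.
options = build_options({"cl_amd_media_ops2": None, "WORKSIZE": 256})

# With PyOpenCL, a string like this would be handed to the compiler via
#   cl.Program(ctx, kernel_source).build(options=options)
print(options)
```

Keeping the defines in one dictionary makes it easy to toggle the extension flag per driver version while benchmarking.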
malevolent
January 18, 2012, 11:40:25 PM
 #387

minus 12 Mhash/s for HD 6850
minus 25 Mhash/s for each of my HD 5850s

compared to guiminer from July 1st :/

PS. Yes, I did experiment with flags, etc.

drivers: 11.5 and 2.3 stream SDK
OS: win 7 64 pro
Diapolo
January 19, 2012, 08:21:14 AM
 #388

minus 12 Mhash/s for HD 6850
minus 25 Mhash/s for each of my HD 5850s

compared to guiminer from July 1st :/

PS. Yes, I did experiment with flags, etc.

drivers: 11.5 and 2.3 stream SDK
OS: win 7 64 pro

I recall that I mentioned this kernel is for SDK 2.6+, sorry!
It's totally ok for this kernel to not work well for older SDK versions.

Dia

malevolent
January 19, 2012, 06:27:59 PM
 #389

I recall that I mentioned this kernel is for SDK 2.6+, sorry!
It's totally ok for this kernel to not work well for older SDK versions.

Dia

My fault then :P

Wasn't SDK 2.6 the one that was significantly slower? Which driver version would you recommend to work along with SDK 2.6?
BCMan
January 19, 2012, 06:55:34 PM
 #390

Why not improve the kernel for phatk2 instead? It's faster than the first version and still faster than phatk_dia.

blandead
January 20, 2012, 10:29:51 PM
 #391

Quote
Your donation has just arrived, thank you Smiley!

Sounds pretty interesting and I would like to receive a copy of that PDF. Can you upload it somewhere or send me a link via PM? I saw, that there is a new cl_amd_media_ops2 extension in the latest drivers, but I could not find and documentation for it (the first one is used for BFI_INT patching). Would be very nice, if BFI_INT would be directly accessible via OpenCL, so that we could kick the binary patching out. The vec3 bug is really strange, I guess it happens in the Python host code and not in the kernel, because KernelAnalyzer will run it just fine.

I'm looking forward to further discussions!

Dia

I'm not sure where I downloaded it, but I can easily e-mail it to you. The cl_amd_media_ops2 extension is for mapping 3D images, so that doesn't help us. But if you look at the AMD 11.12 driver, they tell you to set the environment variable "GPU_ASYNC_MEM_COPY=2" to make use of a new feature. There is a preview driver for OpenCL 1.2 that adds some functionality. They are lifting the rule of only 1 overloaded function, and will allow you to code directly in C++. Here is a reference card of commands: http://www.khronos.org/files/opencl-1-2-quick-reference-card.pdf

Here is the basis for one of the new extensions (cl_khr_fp64): http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/cl_khr_fp64.html -- adds double-precision floating point.
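A minimal sketch of how such an environment variable could be set from Python before starting a miner, assuming (as is typical for runtime tuning variables) that it has to be present in the process environment before the OpenCL runtime is loaded. The variable name and value are the ones quoted in this post; the helper names and the child command are placeholders.

```python
import os
import subprocess
import sys


def env_with_async_copy(base_env=None, value="2"):
    """Return a copy of the environment with GPU_ASYNC_MEM_COPY set."""
    env = dict(os.environ if base_env is None else base_env)
    env["GPU_ASYNC_MEM_COPY"] = value
    return env


def launch(command):
    """Start `command` (a list of arguments) with the variable applied."""
    return subprocess.Popen(command, env=env_with_async_copy())


if __name__ == "__main__":
    # Placeholder child process: it only echoes the variable back.
    # In practice the command would be the miner's own command line.
    child = launch([
        sys.executable,
        "-c",
        "import os; print(os.environ['GPU_ASYNC_MEM_COPY'])",
    ])
    child.wait()
```

Passing a modified copy to the child keeps the setting scoped to the miner instead of changing it system-wide.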
Only works on AMD 69xx devices though, and probably the GCN cards

I'm trying to find a direct link to this nice pdf I found with excellent examples. I have the file on my computer though.

Ah! found it... http://www.bu.edu/pasi/files/2011/01/AndreasKloeckner3-07-1000.pdf Look at page 56-60

This code should look familiar to anybody who took a programming class.
zvs
January 21, 2012, 07:26:54 PM
 #392

Why is it so necessary for phatk kernel variations to have the memclock at 1k? Some people can't deal with that extra heat...
This is something that has changed in SDK 2.6; The best performance at the best settings after trying all options comes at a GPU RAM speed of 1000MHz (stock speed for most cards) instead of at an underclock of 300MHz-370MHz. Version 2.6, included with driver 11.12 and 12.1, is significantly different in how it responds to worksizes, vector settings, and OpenCL programming than the previous SDKs.

It is a benefit in that one doesn't need to oddly tweak memory speeds from stock to get the best performance (annoying to tell noobs over and over to underclock RAM), but bad in that this old quirk was actually an electricity saver if you did it.
Underclock to 300-370mhz has never been best.  395 is faster.  Fastest?  Not sure. 

Dacentec, best deals for US dedicated servers. They regularly restock $20-$25 Opterons with 8-16GB RAM & 2x1-2TB HDD's (ofc, usually lots of other good stuff to choose from).  I did a Serverbear benchmark of one of my $20/mo Opteron (June last year), it's here.  Have had about a half dozen different servers with Dacentec, & none have failed to sustain at least 40MB/s (burst higher). My favorite is a 12-month rent-to-own ZT Systems 2XL5520 16GB 2x2TB SATA for $40/month (got lucky with the 'off-brand', haven't seen a RTO 2xL5520 for under $50/mo since -- at least for monthly contracts).  wholesaleinternet.com has some ancient 2-core intel CPUs @ $10/mo sometimes (I got an Intel Core 2 6300 @ 1.86GHz, with a 250GB HDD with 46000 hours on it, LOL. $20 @ Dacentec is much better, if you can grab one). joesdatacenter.com (same location as Wholesale Internet) also occasionally has specials (or if you don't want to wait, it has an AMD Opteron 170 @ $16/mo).
Bananington
January 23, 2012, 03:55:18 AM
 #393

I get about a 9-10 MH/s increase. Thank you, Diapolo!
deepceleron
January 23, 2012, 05:44:45 PM
 #394

Underclock to 300-370mhz has never been best.  395 is faster.  Fastest?  Not sure.  
I'm glad you found the memory peak that worked for you. However your case is not the absolute correct answer (and is not common, most 5xxx/6xxx cards are at 300MHz), it is just your setup and what works for you; many things will affect performance and where the memory "sweet spot" will be:

GPU model/architecture,
GPU card memory bus/memory size,
GPU core overclock,
Operating System/32or64bit/video card driver,
OpenCL/APP SDK runtime installed on system,
Miner software,
Miner kernel (and its particular optimizations),
Miner kernel parameters (worksize, vector size),
Compiler/SDK used to create miner,
Libraries installed on system (if running interpreted source)...

So there is no one right answer.

malevolent
January 23, 2012, 05:58:57 PM
 #395

I recall that I mentioned this kernel is for SDK 2.6+, sorry!
It's totally ok for this kernel to not work well for older SDK versions.
Dia

OK, managed to get it working at full speed with SDK 2.3 :D
Diapolo
January 23, 2012, 07:58:51 PM
 #396

I recall that I mentioned this kernel is for SDK 2.6+, sorry!
It's totally ok for this kernel to not work well for older SDK versions.
Dia

OK, managed to set it work at full speed with sdk 2.3  Cheesy

Great you got it working, I only wanted to mention it's intended for 2.6+ :D.

Dia

Btw.: The current kernel doesn't work with the 7970, and GCN seems to dislike vectors for mining.

DiabloD3
DiabloMiner author
January 24, 2012, 05:15:23 AM
 #397

(and is not common, most 5xxx/6xxx cards are at 300MHz)

ITYM 1/3rd core clock. 300mhz is only correct if your core is 900mhz.
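The rule of thumb in this post is simple arithmetic; a small Python sketch makes it concrete. The function name and the `divisor` parameter are invented for the example, and the result is only a starting point for a search, since the thread itself shows cards whose best setting differs.

```python
def memclock_for_core(core_mhz, divisor=3):
    """Suggested memory clock (MHz) as a fraction of the core clock.

    divisor=3 encodes the "1/3rd core clock" rule of thumb from the post.
    """
    return round(core_mhz / divisor)


if __name__ == "__main__":
    for core in (850, 900, 990):
        print(f"core {core} MHz -> try memory near {memclock_for_core(core)} MHz")
```

For a 900 MHz core this gives the 300 MHz figure mentioned earlier in the thread.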

Fiyasko
January 24, 2012, 05:16:56 PM
 #398

(and is not common, most 5xxx/6xxx cards are at 300MHz)

ITYM 1/3rd core clock. 300mhz is only correct if your core is 900mhz.
OH, THAT'S THE TRICK?! My 6870's "sweet spot" seems to be 490 with the core at 990! That makes quite a lot of sense! I was planning on looking for a sweeter spot, but I felt that 490 "was it" and that I wouldn't find anything better, so I didn't look.

http://bitcoin-otc.com/viewratingdetail.php?nick=DingoRabiit&sign=ANY&type=RECV <-My Ratings
https://bitcointalk.org/index.php?topic=857670.0 GAWminers and associated things are not to be trusted, Especially the "mineral" exchange
zvs
January 25, 2012, 02:43:20 AM
 #399

Underclock to 300-370mhz has never been best.  395 is faster.  Fastest?  Not sure.  
I'm glad you found the memory peak that worked for you. However your case is not the absolute correct answer (and is not common, most 5xxx/6xxx cards are at 300MHz), it is just your setup and what works for you; many things will affect performance and where the memory "sweet spot" will be:

GPU model/architecture,
GPU card memory bus/memory size,
GPU core overclock,
Operating System/32or64bit/video card driver,
OpenCL/APP SDK runtime installed on system,
Miner software,
Miner kernel (and its particular optimizations),
Miner kernel parameters (worksize, vector size),
Compiler/SDK used to create miner,
Libraries installed on system (if running interpreted source)...

So there is no one right answer.
Hasn't been my experience, nor any of the other half a dozen people I know that run 5830 setups.  The decision is more along the lines of 'do I want to run the card cooler with a lower memory setting', vs 'do I want to run at 395mhz memory, but gain a few mhash?'.

I speak of 5830's exclusively.

Diapolo
January 27, 2012, 03:53:48 PM
 #400

I'm currently working pretty hard on a kernel for 7970 cards and am looking for a few guys who are willing to test / benchmark it.
Please apply in this thread or via PM; you need to have a 7970 card and be on a current Phoenix version with the latest Catalyst.
For now I don't want to release the kernel into the wild, sorry ... it's not polished :D.

Thanks,
Dia
