Topic: DiaKGCN kernel for CGMINER + Phoenix 2 (79XX / 78XX / 77XX / GCN) - 2012-05-25
Diapolo (OP)
Hero Member
*****
Offline Offline

Activity: 772
Merit: 500



View Profile WWW
January 27, 2012, 06:56:42 PM
Last edit: September 20, 2012, 05:15:46 PM by Diapolo
 #1

DiaKGCN is a work-in-progress, GCN-optimised mining kernel for CGMINER and Phoenix 2. So far it has taken weeks of hard work and trial and error. It runs fine on VLIW4 and VLIW5 GPUs, but it is not optimised for them.

The kernel has been part of CGMINER since version 2.2.7, so there is no need to download additional files; you can use it out of the box. I will supply an updated kernel package for Phoenix 2 when the final version is available!

I'd like to get feedback, performance results and ideas to optimise it even further!
To support the further development of this kernel please donate to: 1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x (0.94 BTC donated so far, thanks!)

Diapolo



CGMINER thread with download links and documentation:
https://bitcointalk.org/index.php?topic=28402.0

DiaKGCN - Phoenix 2 download history:
https://anonfiles.com/file/a88219997407050d4b2ec153b35b2c0a
http://www.filedropper.com/diakgcnphoenix2
http://www.filedropper.com/diakgcnphoenix2preview_1

DiaKGCN - Phoenix 1 download history (just for reference):
http://www.filedropper.com/diakgcn04-02-2012
http://www.filedropper.com/diakgcn03-02-2012_1
http://www.filedropper.com/diakgcn02-02-2012
http://www.filedropper.com/diakgcn29-01-2012
http://www.filedropper.com/diakgcn28-01-2012



instructions for CGMINER

To use the current optimal settings on 79XX cards, add these parameters to your CGMINER command line:
Code:
-k diakgcn -v 2 -w 256

You need CGMINER >= 2.2.7 to be able to use diakgcn!



instructions for Phoenix 2

Place the folder diakgcn in phoenix2\plugins and use this for your config-file on 79XX cards (here it's for platform and device 0):
Code:
[cl:0:0]
kernel = diakgcn
aggression = 12
goffset = true
vectors2 = true
vectors4 = false
vectors8 = false
worksize = 256

For VLIW4 / VLIW5 you should use:
Code:
[cl:0:0]
kernel = diakgcn
aggression = 12
goffset = true
vectors2 = false
vectors4 = false
vectors8 = true
worksize = 128

With the current Phoenix 2 version, don't run a single instance with a mix of GCN and VLIW4 / VLIW5 GPUs, as this will lead to very poor performance!
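
For rigs with several cards, the two config blocks above can be generated instead of typed by hand. The following is only a sketch: it assumes the Phoenix 2 config is plain INI text exactly as shown above, and the function name build_config and its arguments are made up for illustration. It writes one file per architecture, so GCN and VLIW cards are never mixed in one instance.

```python
import configparser
import io

# Settings copied from the two config examples above.
GCN_SETTINGS = {
    "kernel": "diakgcn",
    "aggression": "12",
    "goffset": "true",
    "vectors2": "true",
    "vectors4": "false",
    "vectors8": "false",
    "worksize": "256",
}
# VLIW4 / VLIW5 differ only in the vector switch and the worksize.
VLIW_SETTINGS = dict(GCN_SETTINGS, vectors2="false", vectors8="true", worksize="128")


def build_config(device_ids, is_gcn):
    """Return INI text with one [cl:platform:device] section per device.

    device_ids is a list of (platform, device) tuples (hypothetical input
    format). All devices in one call must share the same architecture.
    """
    settings = GCN_SETTINGS if is_gcn else VLIW_SETTINGS
    config = configparser.ConfigParser()
    for platform, device in device_ids:
        config["cl:%d:%d" % (platform, device)] = settings
    out = io.StringIO()
    config.write(out)
    return out.getvalue()


# Example: two 79XX cards on platform 0.
print(build_config([(0, 0), (0, 1)], is_gcn=True))
```

Calling build_config once with is_gcn=True and once with is_gcn=False gives two separate files, one per Phoenix 2 instance.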



instructions for Phoenix 1

Place the folder diakgcn in phoenix\kernels and use this command line on 79XX cards:
Code:
-k diakgcn AGGRESSION=12 VECTORS2 WORKSIZE=256

For VLIW4 / VLIW5 you should use:
Code:
-k diakgcn AGGRESSION=12 VECTORS4 WORKSIZE=128
or
Code:
-k diakgcn AGGRESSION=12 VECTORS8 WORKSIZE=128

If you encounter high CPU usage and use multiple cards, try to give each Phoenix instance a single CPU core (set a CPU affinity)!
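
One way to set such an affinity from a script is sketched below. It is an illustration only and not part of Phoenix: it relies on os.sched_setaffinity from the Python standard library, which exists on Linux only, so the helper simply reports False elsewhere. The names pin_to_core and allowed_cores are hypothetical.

```python
import os


def allowed_cores():
    """Cores this process may run on; falls back to [0] where unsupported."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return [0]


def pin_to_core(pid, core):
    """Restrict process 'pid' (0 means the current process) to a single core.

    Returns True if the affinity was set, False if this platform does not
    offer os.sched_setaffinity (for example Windows or macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(pid, {core})
    return True


# Example: pin the current process to the first core it is allowed to use.
print(pin_to_core(0, allowed_cores()[0]))
```

With several miner instances, each one would get its own PID and a different core from allowed_cores().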



DiaKGCN parameter description for Phoenix

BFI_INT
Use BFI_INT instruction patching (default is true).

GOFFSET
Use OpenCL 1.1 global offset parameter (default is true).

VECTORS2
Enable uint2 vector support in the kernel (default is false).

VECTORS4
Enable uint4 vector support in the kernel (default is false).

VECTORS8
Enable uint8 vector support in the kernel (default is false).
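
The switches and their defaults can be summarised in a few lines of Python. This is a rough sketch, not the actual Phoenix option parser: it assumes options arrive as tokens like those in the Phoenix 1 command lines above (a bare name enables a switch, NAME=value sets it), and the rule that the widest enabled vector switch wins is an assumption made for the example.

```python
# Defaults as listed in the parameter description above.
DEFAULTS = {
    "BFI_INT": True,
    "GOFFSET": True,
    "VECTORS2": False,
    "VECTORS4": False,
    "VECTORS8": False,
}


def parse_kernel_options(tokens):
    """Parse tokens such as ['AGGRESSION=12', 'VECTORS2', 'GOFFSET=False']."""
    options = dict(DEFAULTS)
    for token in tokens:
        name, sep, value = token.partition("=")
        name = name.upper()
        if not sep:
            options[name] = True            # a bare switch enables the option
        elif value.lower() in ("true", "false"):
            options[name] = value.lower() == "true"
        else:
            options[name] = int(value)      # numeric options such as WORKSIZE
    return options


def vector_width(options):
    """Widest enabled vector switch, or 1 for the scalar path (assumption)."""
    for width in (8, 4, 2):
        if options.get("VECTORS%d" % width):
            return width
    return 1


opts = parse_kernel_options("AGGRESSION=12 VECTORS2 WORKSIZE=256".split())
print(opts["WORKSIZE"], vector_width(opts))
```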



BFI_INT patching whitelist (only VLIW4 / VLIW5 GPUs)

Barts
BeaverCreek
Caicos
Cayman
Cedar
Cypress
Devastator
Juniper
Loveland
Redwood
Scrapper
Turks
WinterPark
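
The patching decision can be pictured as follows. This is a simplified sketch and not the code from the init: it combines the whitelist above with the cl_amd_media_ops check mentioned in the 03-02-2012 changelog, and the function name and its arguments are hypothetical.

```python
# Device names from the whitelist above.
BFI_INT_WHITELIST = {
    "Barts", "BeaverCreek", "Caicos", "Cayman", "Cedar", "Cypress",
    "Devastator", "Juniper", "Loveland", "Redwood", "Scrapper", "Turks",
    "WinterPark",
}


def should_patch_bfi_int(device_name, extensions, bfi_int_enabled=True):
    """Decide whether BFI_INT patching should be applied to a device.

    device_name: device name string as reported by OpenCL.
    extensions:  space-separated extension string as reported by OpenCL.
    bfi_int_enabled: value of the BFI_INT switch (default is true).
    """
    return (
        bfi_int_enabled
        and device_name.strip() in BFI_INT_WHITELIST
        and "cl_amd_media_ops" in extensions.split()
    )


print(should_patch_bfi_int("Cayman", "cl_khr_fp64 cl_amd_media_ops"))
print(should_patch_bfi_int("Tahiti", "cl_khr_fp64 cl_amd_media_ops"))
```

Tahiti (the 79XX GPU) is not on the list, so the second call returns False, which matches the "only VLIW4 / VLIW5 GPUs" note in the heading.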



changelog 04-02-2012:
- added uint8 vectors support in the kernel and the init (use VECTORS8 switch to activate it)
- added GOFFSET switch to be able to disable global offset parameter (use GOFFSET=False to disable it)
  -> perhaps GOFFSET is slower for some, now you can try the alternative
- changed some kernel parameter descriptions
- removed unused VECTORS3 code, never got it working :-/
- renamed OpenCL11 flag to hasOpenCL11 in the init
- removed some unneeded references to phatk from the init
- added a few comments in the init
- upped init revision to 127

changelog 03-02-2012:
- fixed the VECTORS4 code-path, which is now usable again
  -> VECTORS4 should be beneficial for VLIW4 / VLIW5, but not for GCN
- removed the (u) typecasts in the non BFI_INT Ch() and Ma() versions
  -> the hex values that are used directly in Ch() or Ma() were changed to be unsigned
- added 2 different Ma() versions, one for VECTORS2 or VECTORS4 defined (was in before), the other for the scalar version of the kernel (new)
  -> new scalar version saves 4 Bytes in compiled GPU ISA code (but VECTORS2 is still fastest for GCN)
- hardened the BFI_INT auto patching code in the init
  -> a whitelisted OpenCL device is now checked for cl_amd_media_ops extension
- fixed a small bug where I tried to use the C operator "&" as a logical "and" in the init
  -> changed it into a Python "and" ^^
- removed a few lines of unused code from the init
- upped init revision to 126
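
For readers who are not familiar with Ch() and Ma(): they are the SHA-256 "choose" and "majority" functions, and both can be written as a single bit-select, which is what the BFI_INT path exploits. The Python below only checks those identities on 32-bit words; it is a sketch for illustration, and the exact macros in the kernel may be written differently.

```python
import random

MASK = 0xFFFFFFFF  # SHA-256 works on 32-bit words


def bfi_int(mask, a, b):
    """Bit-select: bits of a where mask is 1, bits of b where mask is 0."""
    return ((mask & a) | (~mask & b)) & MASK


def ch(x, y, z):
    """Textbook SHA-256 Ch()."""
    return ((x & y) ^ (~x & z)) & MASK


def ma(x, y, z):
    """Textbook SHA-256 Ma() (majority)."""
    return ((x & y) ^ (x & z) ^ (y & z)) & MASK


def ch_bfi(x, y, z):
    """Ch() as one bit-select."""
    return bfi_int(x, y, z)


def ma_bfi(x, y, z):
    """Ma() as one bit-select: where x and z differ take y, else take x."""
    return bfi_int(z ^ x, y, x)


def ma_short(x, y, z):
    """A shorter Ma() without bit-select (one possible non BFI_INT form)."""
    return ((x & z) | (y & (x | z))) & MASK


def check_identities(trials=1000, seed=0):
    """Compare all variants on random 32-bit inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = (rng.getrandbits(32) for _ in range(3))
        if ch(x, y, z) != ch_bfi(x, y, z):
            return False
        if not (ma(x, y, z) == ma_bfi(x, y, z) == ma_short(x, y, z)):
            return False
    return True


print(check_identities())
```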

changelog 02-02-2012:
- added an automatic usage of the OpenCL 1.1 global offset parameter, on OpenCL >= 1.1 platforms -> Thanks DiabloD3 for the idea
- removed both __constant arrays in the kernel, values are now used directly
- changed Ma() function from a general one into faster ones for the BFI_INT path and the non BFI_INT path
- added new kernel parameters (W16addK16, W17addK17, state0A and state0B)
- added 2 new local variables state0AaddV0 and state0BaddV0
- rewrote some rounds to use new kernel parameters and variables for faster execution
- fixed a write to output buffer bug for the non VECTORS path in the kernel
- changed the BFI_INT whitelisted flag code in the init
- added an OpenCL >= 1.1 flag in the init used for activating the global offset parameter
- reactivated PyOpenCL version output in the init
- upped init revision to 125
- removed unneeded code or comments from the kernel and the init
- added the DiabloMiner kernel as an additional reference for new ideas in the kernel header
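
What the global offset parameter changes can be shown with a small simulation. In OpenCL 1.0 the work-item IDs always start at 0, so the host has to pass a base nonce as a kernel argument; from OpenCL 1.1 on, the runtime can start the IDs at an offset. The sketch below is plain Python that mimics get_global_id(0) for the scalar case only; it is not the kernel or the init code, and the function names are made up.

```python
def nonces_with_base_argument(base, global_size):
    """OpenCL 1.0 style: IDs run from 0, the kernel adds a 'base' argument."""
    return [base + gid for gid in range(global_size)]


def nonces_with_global_offset(offset, global_size):
    """OpenCL 1.1 style: the runtime already starts the IDs at 'offset'."""
    return [gid for gid in range(offset, offset + global_size)]


# Both variants cover the same nonce range, but the second one needs no
# extra kernel argument and no addition per work-item.
base = 0x10000000
print(nonces_with_base_argument(base, 4) == nonces_with_global_offset(base, 4))
```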

changelog 29-01-2012:
- reordered kernel parameters in order of usage in the kernel
- removed unused kernel parameters (B1addF1addK6, C1addG1addK5, D1addH1)
- added new kernel parameter (PreVal0addK7)
- rewrote first 4 rounds to speed up the kernel
- VECTORS4 parameter is not finished, it currently uses VECTORS2 code-path

Liked my former work for Bitcoin Core? Drop me a donation via:
1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x
bitcoin:1PwnvixzVAKnAqp8LCV8iuv7ohzX2pbn5x?label=Diapolo
Roadhog2k5
Full Member
***
Offline Offline

Activity: 131
Merit: 100



View Profile
January 27, 2012, 07:55:09 PM
 #2

I have three 7970s I'd be willing to test on. Shoot me a PM.
Diapolo (OP)
January 27, 2012, 10:13:16 PM
 #3

Quote from: Roadhog2k5 on January 27, 2012, 07:55:09 PM
I have three 7970s I'd be willing to test on. Shoot me a PM.

Done, thanks for helping :)

Dia

sveetsnelda
Hero Member
*****
Offline Offline

Activity: 642
Merit: 500


View Profile
January 27, 2012, 11:05:39 PM
 #4

Same story.  Have a 4 card rig and would be glad to help.

14u2rp4AqFtN5jkwK944nn741FnfF714m7
Diapolo (OP)
January 27, 2012, 11:09:54 PM
 #5

Quote from: sveetsnelda on January 27, 2012, 11:05:39 PM
Same story. Have a 4 card rig and would be glad to help.

PM sent, thanks!

Dia

simplecoin
Sr. Member
****
Offline Offline

Activity: 406
Merit: 250



View Profile WWW
January 28, 2012, 12:20:38 AM
 #6

got a 1 card rig if you need it.

Donations: 1VjGJHPtLodwCFBDWsHJMdEhqRcRKdBQk
Diapolo (OP)
January 28, 2012, 12:28:57 AM
 #7

If everything keeps going this smoothly, a release is just around the corner ... stay tuned.

Dia

simplecoin
January 28, 2012, 01:02:02 AM
 #8

Nice work for sure! The more 7970 kernels the better :)

Donations: 1VjGJHPtLodwCFBDWsHJMdEhqRcRKdBQk
jjiimm_64
Legendary
*
Offline Offline

Activity: 1876
Merit: 1000


View Profile
January 28, 2012, 04:21:15 AM
 #9


I have a 4x7970 rig. Would love to test.

1jimbitm6hAKTjKX4qurCNQubbnk2YsFw
wndrbr3d
Hero Member
*****
Offline Offline

Activity: 914
Merit: 500


View Profile
January 28, 2012, 05:00:43 AM
 #10

Totes subbing to this thread. I have the money, just waiting for the results :)
Diapolo (OP)
January 28, 2012, 04:39:26 PM
 #11

A second version has been sent to the testers. If others are interested in trying it out, just give me a shout.

Dia

Diapolo (OP)
January 28, 2012, 10:29:41 PM
 #12

http://www.filedropper.com/diakgcn28-01-2012

I'll leave this without comments for now ...

Dia

simplecoin
January 28, 2012, 11:51:32 PM
Last edit: January 29, 2012, 05:27:05 AM by simplecoin
 #13

Is the hashrate display broken for VECTORS4? Running VEC2/AGG10/WS256 I get ~626 MH/s at 1080/366 (about 10 MH/s less than DiabloMiner, not bad!).

If I use VEC4 my hashrate display doubles to ~1.22 GH/s. I wish this wasn't a bug or something :o

Yes, I see this too at stock clocks (1.09 GH/s with VECTORS4 and AGGRESSION=12). Shares are accepted, though... I'm going to wait and see what my site reports as the actual shares.

UPDATE: The actual hashrate is about the same as with VECTORS2. Seems like a reporting issue.

Donations: 1VjGJHPtLodwCFBDWsHJMdEhqRcRKdBQk
Diapolo (OP)
January 29, 2012, 10:16:47 AM
 #14

Is the hashrate display broken for VECTORS4? Running VEC2/AGG10/WS256 I get ~626 MH/s at 1080/366 (about 10 MH/s less than DiabloMiner, not bad!).

If I use VEC4 my hashrate display doubles to ~1.22 GH/s. I wish this wasn't a bug or something :o

VEC4 is broken, sorry to say ;) ... it currently runs at VEC2 speed. VEC4 does not seem to be a good option for GCN.
I will polish the kernel further and supply a changelog in the future. I only wanted to get it released first.
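
The doubled display fits the 29-01-2012 changelog note that VECTORS4 currently uses the VECTORS2 code-path. A plausible explanation, stated here as an assumption and not as a description of the actual Phoenix accounting, is that the host counts 4 nonces per work-item while the kernel only tests 2. The arithmetic, with a made-up helper name:

```python
def displayed_rate(actual_rate, kernel_width, host_width):
    """Rate a miner would show if it counts host_width nonces per work-item
    while the kernel really tests only kernel_width of them."""
    return actual_rate * host_width / kernel_width


# ~626 MH/s measured with VECTORS2; host assumes 4 nonces, kernel does 2.
print(displayed_rate(626.0, kernel_width=2, host_width=4))  # 1252.0
```

The result of about 1.25 GH/s is close to the ~1.22 GH/s reported above, while the real rate stays at the VECTORS2 level.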

Dia

Diapolo (OP)
January 29, 2012, 04:18:09 PM
 #15

download current version:
http://www.filedropper.com/diakgcn29-01-2012

It should be faster than the previous one. A changelog is included, and I edited the first post to be more informative!

Dia

Dyaheon
Member
**
Offline Offline

Activity: 121
Merit: 10


View Profile
January 30, 2012, 02:22:55 PM
 #16

~695MH/s on a 7970 at 1175/1375 clocks, with the command line from the OP.

Diablominer gives ~700MH/s with less interface lag though.
Diapolo (OP)
January 30, 2012, 02:32:01 PM
 #17

Quote from: Dyaheon on January 30, 2012, 02:22:55 PM
~695MH/s on a 7970 at 1175/1375 clocks, with the command line from the OP.

Diablominer gives ~700MH/s with less interface lag though.

Some reports indicate that a lower AGGRESSION could lead to higher hash rates, but I can't confirm this on my machine.
I'm working hard on the next version; the optimisation is not finished...

Dia

wndrbr3d
February 01, 2012, 07:48:43 PM
 #18

@Diapolo:

So do you have any opinions on GCN vs. VLIW4/5 when it comes to optimizations for the mining cores that are out there? Do you expect GCN to be a nice step forward or, at best, should we be happy that GCN didn't nerf performance compared to the VLIW4/5 architecture?

I'm curious to get your feedback. :)

Thanks for all your work!
Diapolo (OP)
February 02, 2012, 12:21:38 PM
Last edit: February 03, 2012, 06:48:41 AM by Diapolo
 #19

New version 02-02-2012 is ready for download. Release highlights include support for the OpenCL 1.1 global offset parameter (THX DiabloD3 for the idea - damn, it sucked to do this in Python ^^), a fixed non-VECTORS code path and faster kernel execution on GCN cards (achieved by saving instructions in the GPU ISA code).

download current version:
http://www.filedropper.com/diakgcn02-02-2012

Dia

Diapolo (OP)
February 02, 2012, 12:27:43 PM
 #20

Quote from: wndrbr3d on February 01, 2012, 07:48:43 PM
@Diapolo:

So do you have any opinions on GCN vs. VLIW4/5 when it comes to optimizations for the mining cores that are out there? Do you expect GCN to be a nice step forward or, at best, should we be happy that GCN didn't nerf performance compared to the VLIW4/5 architecture?

I'm curious to get your feedback. :)

Thanks for all your work!

I think GCN is a great step in the right direction. It's far easier for me to write code AND for the compiler to generate it, which results in pretty good utilisation of the GPU's compute units. In contrast to VLIW4 / VLIW5 units, the CUs consist of independent vector units, which makes code or wavefronts on the GPU depend less on the results of other units. The OpenCL compiler for GCN feels far more mature than it did after the release of the 69XX series of cards. The drawback seems to be that the current kernels all have very similar performance levels :D.

Dia
