Bitcoin Forum
Pages: « 1 2 3 4 5 6 7 8 [9] 10 11 12 13 14 15 16 17 18 19 20 »  All
Author Topic: [ANN][GRS][DMD][DGB] Pallas optimized groestl opencl kernels  (Read 61214 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic.
realhet
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile WWW
January 11, 2015, 10:54:26 PM
 #161

Sorry for taking a bit long.

Here's all you need to know if you're willing to test: http://realhet.wordpress.com/gcn-asm-groestl-coin-kernel/

Please send me benchmarks and compiled kernels for various cards!

I'm running it for an hour now and I got a 'rejected'. I'm solo mining GRS. Do I need to worry? Or is it usual? Can it be caused by slow network?
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
January 11, 2015, 10:56:59 PM
 #162

Realhet, thanks for the capeverde bin, unfortunately I can't use it because it's 32 bit.
I created a bootable win7 stick in order to compile the kernel: it compiles fine but, when run, it says "no target Hawaii" and no bin is created.

pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
January 11, 2015, 10:58:04 PM
 #163

"I'm running it for an hour now and I got a 'rejected'. I'm solo mining GRS. Do I need to worry? Or is it usual? Can it be caused by slow network?"

Yes, it can be caused by the network: if the wallet is behind on sync, the block may be rejected (or orphaned).
Try with a pool...

utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
January 11, 2015, 11:11:06 PM
Last edit: January 12, 2015, 01:26:16 AM by utahjohn
 #164

Runtime error: No GCN device found

I have 2 AMD cards on gpu-platform 1
and 1 Intel GPU on gpu-platform 0

Edit: DOH 14.7RC3 not GCN ...
realhet
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile WWW
January 11, 2015, 11:47:28 PM
 #165

Thx for testing! So many errors :S But usually that's how it goes.

"No GCN device found" error.

That could be because I can't recognize new cards.
I know only these at the moment.
'TAHITI', 'PITCAIRN', 'CAPEVERDE', 'UNKNOWN5');
Importing new names right now.

Meanwhile you can select an OpenCL device by uncommenting this line in the code:
var dev:=cl.devices[0]; //access device by index (must be a GCN one)

The findDevices function can't recognize new cards. I'll repair it now.

@pallas: Thanks for fiddling with Win7! :D What do you mean by 32 bit code? That has no meaning regarding the GCN hardware o.O
But I'm 100% sure that you can't use my Capeverde binary unless you have that chip in the device you selected. ( var dev:=cl.devices[CLdeviceIndex]; )
   
realhet
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile WWW
January 12, 2015, 12:19:53 AM
 #166

I've updated HetPas and the groestl_isa.hpas too. Pls download HetPas150111_Groestl.zip.

From now on it will start with a list of the cards:
writeln("List of opencl devices:");
for var i:=0 to cl.devices.count-1 do begin
  writeln("Device #",i);
  writeln(cl.devices[i].dump);
end;

It should display something like this:
List of opencl devices:
Device #0
Target: Cayman  Series: 6  Core:880 MHz  CU:24  RAM:2048 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics ...
Device #1
Target: Capeverde  Series: 7  Core:880 MHz  CU:10  RAM:1024 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 ...

Using device:
Target: Capeverde  Series: 7  Core:880 MHz  CU:10  RAM:1024 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 ...
* core MHz value is not always accurate, use Catalyst Control Center (or ADL) instead!

For GCN cards, the 'Series' must be at least 7. If it fails and it is indeed a GCN card, then I detected it wrongly; please report it. My first card is series 6xxx Northern Islands hardware; it can't be used for this kernel.
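
The "Series must be at least 7" rule can be sketched roughly as follows. This is Python rather than HetPas script, and the `Device` record and `first_gcn_device` helper are hypothetical names used only for illustration. They are not part of HetPas, and the real findDevices function may work differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Device:
    """Hypothetical record holding two fields from the device dump."""
    target: str  # chip name, as printed after "Target:"
    series: int  # value printed after "Series:"


def first_gcn_device(devices: Sequence[Device], min_series: int = 7) -> Optional[int]:
    """Return the index of the first device with series >= min_series, or None."""
    for index, dev in enumerate(devices):
        if dev.series >= min_series:
            return index
    return None


# The two cards from the sample listing above.
devices = [Device("Cayman", 6), Device("Capeverde", 7)]
print(first_gcn_device(devices))  # prints 1 (the Capeverde card)
```

Returning None instead of raising leaves it to the caller to print something like the "No GCN device found" message.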

@utahjohn: Maybe it works on 14.7 too. I can't tell that, but I know that it will crash on 13.4 because the kernel parameters are handled differently in that driver.
utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
January 12, 2015, 12:22:34 AM
Last edit: January 12, 2015, 01:18:51 AM by utahjohn
 #167

Temporarily upgraded to 14.9 to run hetpas, built for 280x.
Had hell of a time reverting back to 14.7 ... several tries later 14.7 working again and I have a kernel.elf for 280x.

Testing now ...

Very early results ...
280x I=22 E=1180 M=150 WS=256 ... 26 MHs Solo . No blocks yet ... approx 1.4x normal diamond kernel (18.5MHs)

Intensity 22 is sweet spot for my 280x, now playing with mem clock ...

No significant effect on raising mem-clock other than higher temps ...

stick with low mem clock.
realhet
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile WWW
January 12, 2015, 12:56:32 AM
 #168

"Very early results ..."

Very good that it runs for you!

The speedup is not that impressive, but let me ask you to do a test:

Please stop sgminer, run groestl_isa.hpas, and copy/paste my program's output here, like this:

-----------------------------------------
Using new GCN ASM code
Kernel binary saved: C:\Work\Groestl\kernel_dump\kernel.elf

elapsed: 190.661 ms  13.749 MH/s   gain:   3.44x
elapsed: 188.444 ms  13.911 MH/s   gain:   3.48x
elapsed: 188.218 ms  13.928 MH/s   gain:   3.48x
elapsed: 188.225 ms  13.927 MH/s   gain:   3.48x

Functional test: RESULT IS OK
-----------------------------------------

And then go to around line 23 and comment out the "#define USE_NEW_ASM_KERNEL" and run it again! This will compile the original OpenCL kernel I've downloaded with sgminer5.1.

-----------------------------------------
Using original OpenCL code
Kernel binary saved: C:\Work\Groestl\kernel_dump\kernel.elf

elapsed: 657.623 ms  3.986 MH/s   gain:   1.00x
elapsed: 655.396 ms  4.000 MH/s   gain:   1.00x
elapsed: 654.897 ms  4.003 MH/s   gain:   1.00x
elapsed: 655.055 ms  4.002 MH/s   gain:   1.00x

Functional test: RESULT IS OK
-----------------------------------------

As you can see, on my small card the speedup is 3.5x. I'd like to check these results on your 280x as well.
I think the problem is only that your big card doesn't get enough threads, or something similar.

Just a silly test: what if you turn Memory clock up to normal speed? Maybe it will change the L1 cache's behaviour? My kernel uses 0 memory, but uses L1 cache extensively.

And finally I had an 'accepted', phew...

"Had hell of a time reverting back to 14.7" -> Is there a tool called "Catalyst Clean Uninstall Utility" nowadays? 2-3 years ago that was useful when downgrading the Catalyst version.
utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
January 12, 2015, 01:09:06 AM
 #169

No significant effect on raising mem-clock other than higher temps ...

Use "DDU" to clean Catalyst drivers, but it is not always 100% effective; sometimes a little manual cleaning is needed too ...

BTW I am using Pallas kernel as reference, not one supplied with stock sgminer ...

Any tweaks you can do with 2048 shaders (280x) and 1792 shaders (7950) ?
realhet
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile WWW
January 12, 2015, 01:24:21 AM
 #170

Yes, that must be the same kernel that I've copied into the groestl directory next to the groestl_isa.hpas file.

When you compile the original kernel within the groestl_isa.hpas program, it will use the groestl_original.cl kernel. It's Pallas's kernel, except that I hardcoded the workgroup size in it and made another very minor change.

Also I compared the kernel I downloaded from the very first post in this topic: It's the same.
utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
January 12, 2015, 01:29:45 AM
 #171

I did not try running kernel under catalyst 14.9, all I wanted was to generate the kernel.elf to run under 14.7 ... because I run multiple algos concurrently under 14.7 that suffer under 14.9 ...

Also note that I am running sgminer 4.1.0
realhet
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile WWW
January 12, 2015, 01:36:18 AM
 #172

I tested my kernel only in Cat 14.9
I have no info on how it works on 14.7

When you compile in HetPas it will generate a skeleton kernel binary with the help of the OpenCL compiler. And then the new assembly code will be PATCHED into that. So I don't make the binary from scratch, and maybe the 14.7 binary is a bit different from the 14.9 binary and I just don't know about that. (Although life would be so much easier if AMD would be so kind and give us an interface to upload binary program code... But that's not going to happen :D)


"Any tweaks you can do with..."

Please let's do the test inside the IDE first. Let's compare the original and the new kernel there, as it is perfect for timing. In sgminer we need to play with Intensity and other factors and wait for minutes to get a correct time anyways.

So please paste here what you see on HetPas on the right pane after you run the program:
I'm interested in this information, and also tell me what card and engine MHz you used:

Using new GCN ASM code
Kernel binary saved: C:\Work\Groestl\kernel_dump\kernel.elf

elapsed: 190.645 ms  13.750 MH/s   gain:   3.44x
elapsed: 188.281 ms  13.923 MH/s   gain:   3.48x
elapsed: 188.233 ms  13.927 MH/s   gain:   3.48x
elapsed: 188.316 ms  13.920 MH/s   gain:   3.48x

Functional test: RESULT IS OK

realhet
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile WWW
January 12, 2015, 02:04:01 AM
 #173

Thanks!

Well this is kinda bad for a Tahiti :/

Also the times of the 4 kernel launches are weird:
On my card it is 3.44x, 3.48x, 3.48x, 3.48x
But on your card this is 3.88x, 3.10x, 3.10x, 3.10x

On my card the first launch is a bit slow because the card was at low MHz when the test started, and after the warmup it became a steady 3.48x.

On your card the speeds are so random. Your card (at 1150) is 3.68x faster than mine, so if everything is OK, you should see 12.8x gains.

Maybe it is a 14.7 issue, I don't know. Everything can change from driver to driver...

What is on my mind is:

1. What if you change WorkCount from the original
    WorkCount := 256*10*512
to WorkCount := 256*10*512*10;  ?
Do the elapsed times become 10x longer?  (The functional test will fail; that's OK, just reset WorkCount to the default value after this test.)

2. Let's see how the original kernel works in HetPas:
  just comment out the  "#define USE_NEW_ASM_KERNEL" and let me see the times please. If the original kernel works well, then gain must be 3.68.
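
As a rough cross-check of the numbers in these tests, here is a small Python sketch (not HetPas script). It rests on two assumptions that are inferred from the figures posted in this thread and not confirmed from the program source: each work item counts as 2 Groestl hashes, and the "gain" column is relative to about 4.0 MH/s. All function names are made up for illustration.

```python
# Assumption: each work item counts as 2 Groestl hashes (inferred from the
# posted figures, not taken from the HetPas source).
HASHES_PER_WORK_ITEM = 2

# Assumption: "gain" is measured against roughly 4.0 MH/s, the speed of the
# original OpenCL kernel on the small card.
BASELINE_MHS = 4.0

WORK_COUNT = 256 * 10 * 512  # default WorkCount quoted above


def throughput_mhs(work_count: int, elapsed_ms: float) -> float:
    """Hash rate in MH/s for one kernel launch."""
    hashes = work_count * HASHES_PER_WORK_ITEM
    return hashes / (elapsed_ms / 1000.0) / 1e6


def gain(mhs: float, baseline_mhs: float = BASELINE_MHS) -> float:
    """Speedup relative to the baseline hash rate."""
    return mhs / baseline_mhs


def expected_elapsed_ms(work_count: int, mhs: float) -> float:
    """Launch time in ms for a given amount of work at a fixed hash rate."""
    hashes = work_count * HASHES_PER_WORK_ITEM
    return hashes / (mhs * 1e6) * 1000.0


for elapsed_ms in (190.661, 50.686):
    mhs = throughput_mhs(WORK_COUNT, elapsed_ms)
    print(f"elapsed: {elapsed_ms:.3f} ms  {mhs:.3f} MH/s  gain: {gain(mhs):.2f}x")

# A card that is 3.68x faster should turn a 3.48x gain into about 12.8x.
print(f"expected gain on the faster card: {3.48 * 3.68:.1f}x")
```

If the hash rate stays constant, `expected_elapsed_ms(WORK_COUNT * 10, mhs)` is ten times the default launch time, which is what test 1 above checks.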


(Thank you for testing so far)

--------------------------------------------------------------------
"elapsed: 50.686 ms  51.719 MH/s   gain:  12.93x"
WOW! THIS IS IT! :D :D :D
Exactly what I expected! Your card is 3.71x faster. What was the error? You accidentally mined while testing, right?
utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
January 12, 2015, 02:15:04 AM
 #174

The last test run I did grabbed 2 cards so divide in half for an average on Tahiti (280x+7950).

Not the gains I was expecting based on your blog ... 3.4x times 18.5 MHs should net me around 62 MHs vs the 26 MHs I'm getting now ... so Tahiti: not so great gains, but better :)

Short of pulling a card physically I don't know how to disable hetpas running all of them ...
Star65
Member
**
Offline Offline

Activity: 109
Merit: 13


View Profile
January 12, 2015, 02:34:42 AM
 #175

I would also test on 7970 & 280x.
realhet
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile WWW
January 12, 2015, 02:47:15 AM
 #176

There must be some misunderstandings about the MH/s values, so we have to be careful!

On this topic (first post) when Pallas says that R9 280x is 18MH/s he counts it in Groestl hashes.

When my program says "elapsed: 50.686 ms  51.719 MH/s" it counts it also in Groestl hashes. Just as Pallas.

But when you see MH/s inside sgminer then it must be multiplied by 2 because in SG 1 MH/s = 2 MGroestlH/s.

--------------------------
So when you see "51.719 MH/s" in my program
then you must see 26MH/s in SG.

And when you see 18MH/s on the first post on this topic
You must see 9MH/s in SG.

Also when I see 4MH/s in my program
Then I saw 2MH/s in SG.
---------------------------

So the equation is: 2*sgminer Mh/s = Pallas's Mh/s

This is because sgminer counts 2 Groestl hash calculations as 1. But Pallas counts them as 2 hashes, and I just copied Pallas, then later found out how sgminer calculates.

---------------------------
So the Tahiti 26MH/s in sgminer is correct. Please remove the kernel and let sgminer compile it from OpenCL! If I'm calculating correctly, then you must see 7-8MH/s with the original kernel. Can you check it please?
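
The conversion described in this post can be written as a short Python sketch. The factor of 2 is the working assumption stated above, and the function names are hypothetical. Note that figures reported later in the thread (18.5MH/s for the Pallas kernel inside sgminer) do not obviously fit this factor, so treat it as a hypothesis to check and not as established behaviour of sgminer.

```python
# Assumption from the post above: sgminer counts 2 Groestl hash
# calculations as 1 hash.
GROESTL_HASHES_PER_SGMINER_HASH = 2.0


def program_to_sgminer_mhs(program_mhs: float) -> float:
    """Convert MH/s counted in single Groestl hashes to sgminer's MH/s."""
    return program_mhs / GROESTL_HASHES_PER_SGMINER_HASH


def sgminer_to_program_mhs(sgminer_mhs: float) -> float:
    """Convert sgminer's MH/s to MH/s counted in single Groestl hashes."""
    return sgminer_mhs * GROESTL_HASHES_PER_SGMINER_HASH


# The three pairs quoted in the post.
for program_mhs in (51.719, 18.0, 4.0):
    sg = program_to_sgminer_mhs(program_mhs)
    print(f"{program_mhs:.3f} MH/s in the test program ~ {sg:.1f} MH/s in sgminer")
```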


utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
January 12, 2015, 02:52:13 AM
 #177

When I run Pallas OCL I see 18.5MHs in sgminer.
When I run Realhet asm I see 26.0MHs in sgminer.
realhet
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile WWW
January 12, 2015, 03:11:14 AM
 #178

"When I run Pallas OCL I see 18.5MHs in sgminer.
When I run Realhet asm I see 26.0MHs in sgminer."

Please send me that .cl file and the binary that is compiled by the sgminer, I gotta check it.

For today, Thank You for testing, I gotta sleep now, see you!
Star65
Member
**
Offline Offline

Activity: 109
Merit: 13


View Profile
January 12, 2015, 04:33:54 AM
 #179

TVM Pallas and realhet for nice work!

7970/280x 1130/300 W7

Pallas kernel in Cat 14.6  - 17.8MH/s
Pallas kernel in Cat 14.9  - 7.8MH/s   - so 14.9 very bad drivers?!
Realhet kernel in Cat 14.9 - 24.8MH/s - 24.8/7.8=3.18x !!!

We need realhet kernel (bin) with Cat 14.6 or 14.7 (best drivers perhaps). But I do not know how to do it.
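
How large the speedup looks depends on which baseline it is measured against. A minimal Python sketch using only the three hash rates reported above (the `speedup` helper is a made-up name for illustration):

```python
def speedup(new_mhs: float, baseline_mhs: float) -> float:
    """Ratio of the new hash rate to a baseline hash rate."""
    return new_mhs / baseline_mhs


# Hash rates reported above for the 7970/280x at 1130/300.
pallas_cat_14_6 = 17.8   # Pallas kernel, Catalyst 14.6
pallas_cat_14_9 = 7.8    # Pallas kernel, Catalyst 14.9
realhet_cat_14_9 = 24.8  # Realhet kernel, Catalyst 14.9

print(f"vs Pallas on 14.9: {speedup(realhet_cat_14_9, pallas_cat_14_9):.2f}x")
print(f"vs Pallas on 14.6: {speedup(realhet_cat_14_9, pallas_cat_14_6):.2f}x")
```

Against the 14.9 build the ratio is about 3.18x. Against the faster 14.6 build it drops to about 1.39x.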

utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
January 12, 2015, 05:03:14 AM
Last edit: January 12, 2015, 09:21:03 AM by utahjohn
 #180

14.9 has a piss poor OCL compiler, we've known this for a long time ... Stick with 14.7RC3 for best overall performance over many different algo's.

I guess we are stuck with compiling realhet asm on 14.9 but 14.7 does better compiles for OCL.

I am running realhet asm kernel generated with 14.9 on 14.7 catalyst, just a pain in the ass reverting to 14.7 after using 14.9.

My Pallas OCL compile was done with 14.7RC3 and works better than OCL compiled on 14.9.
Pallas ocl compiled with 14.7RC3 will run normal on 14.9, just don't re-compile it with 14.9 ...

Confused yet? hehe

@Realhet
So the gain of Realhet = 1.40x Pallas stands when comparing to properly working Pallas OCL kernel on 14.7
(Same clocks and Intensity running under 14.7 so a fair compare).
Your Pallas reference speed is incorrect in hetpas because 14.9 mangled the OCL badly performance wise.
Take a look at performance hit 14.7 vs 14.9 in Star65 post above.
Unfortunately some of the "gains" you made may have been just repairing 14.9 OCL bugs LOL but obviously improvement was made somewhere in asm kernel.
You need to establish a baseline for your GPU using 14.7 Pallas OCL and see what really made improvements ...
I suggest you start over and use this first round as a learning experience :)  You started with code broken by the 14.9 compiler as a base ...

Pallas 14.7 OCL Bin for 280x 18.5 MHs
https://mega.co.nz/#!kAEnDATC!HeelwXTHDsQNx8WJhTDcwqS-slOmikoBiMqTEK9-DV0
Realhet 14.9 ASM bin for 280x 26.0 MHs
https://mega.co.nz/#!1NlRhYLC!7oLFfr2umL7T2Lc0fX3HY1ddthbpNqt6I_tYdG9OI9g

Another random thought :) Can you set hetpas up to "cross-compile" for diff GCN architectures so all we have to do is DL bin files from u to test them?  I really dislike uninst-inst-uninst-inst to try a new asm version on 14.7 ... For example have it compile Tahiti.elf, hawaii.elf etc.  I understand u can only test for your card but with us out here to test other elf would speed process of testing new versions ...

DMD Donations : dJrhv4Pp1FXPrQiEp5njx42QrZiuZrbjQ1

Block found and accepted solo mining, so your asm kernel appears to be valid :)

I'd like you to have a look see what you can do to further improve wolf0's neoscrypt kernel with asm when you get time.
7950 currently doing 278KHs mining FTC.  PM me for OCL and BIN.