Author Topic: [ANN][GRS][DMD][DGB] Pallas optimized groestl opencl kernels  (Read 61242 times)
qaz6767
Full Member
***
Offline Offline

Activity: 151
Merit: 100


View Profile
October 05, 2014, 09:02:58 AM
 #101

I cannot replace the bin file. After restarting the miner, the old bin file is back again. How do I replace it? Thank you
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
October 05, 2014, 11:05:43 AM
 #102

I cannot replace the bin file. After restarting the miner, the old bin file is back again. How do I replace it? Thank you

if a bin file with the same name already exists, the miner shouldn't overwrite it.
so it's best to replace the bin file the miner creates with mine, using the same filename.

qaz6767
Full Member
***
Offline Offline

Activity: 151
Merit: 100


View Profile
October 05, 2014, 01:58:08 PM
 #103

I cannot replace the bin file. After restarting the miner, the old bin file is back again. How do I replace it? Thank you

if a bin file with the same name already exists, the miner shouldn't overwrite it.
so it's best to replace the bin file the miner creates with mine, using the same filename.
Thanks!!!
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
October 07, 2014, 08:02:36 AM
 #104

It looks like the compiler included in 14.9 drivers produces binaries which run considerably slower than older releases.
The same happens for other algorithms as well, for example on X11.
I've tweaked the code a bit but I still can't reach full speed, so I will keep on trying or, if need be, wait for a new driver release.
Meanwhile, if you are on 14.9, use the provided bin file instead of the kernel source in .cl format.

utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
October 09, 2014, 10:01:33 PM
 #105

Any chance of getting a worksize 128 super optimized kernel to try on HD5450? (256 too large)
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
October 10, 2014, 07:40:39 AM
 #106

Any chance of getting a worksize 128 super optimized kernel to try on HD5450? (256 too large)

The changes needed to make it work at 128 are easy, but it probably won't be tuned well for such a card: I've tested on r9 290 and 7950 while developing. It might not even work at all.
If you want to try I can send you a file or the changes and if it works well we can post it here.

utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
October 11, 2014, 01:24:53 AM
Last edit: October 11, 2014, 02:42:39 AM by utahjohn
 #107

been busy working with miningfield setting up my USA mirror of pools.
yah if you can post a modified .cl for ws 128 I'll test it on 5450

Made a lot of progress on USA mirror setup today :) might have it online by monday :)
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
October 11, 2014, 08:58:16 AM
 #108

been busy working with miningfield setting up my USA mirror of pools.
yah if you can post a modified .cl for ws 128 I'll test it on 5450

Made a lot of progress on USA mirror setup today :) might have it online by monday :)

just change the initial part of the main function to:

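/* each work-item fills entries u, u + worksize, u + 2*worksize, ... */
for (u = get_local_id(0); u < 256; u += get_local_size(0)) {
  T2[u] = ROTL64(T0[u], 16UL);
  T3[u] = ROTL64(T0[u], 24UL);
  T4[u] = ROTL64(T0[u], 32UL);
  T5[u] = ROTL64(T0[u], 40UL);
  T6[u] = ROTL64(T0[u], 48UL);
  T7[u] = ROTL64(T0[u], 56UL);
}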

this part was blocking worksize < 256.
as I said previously, it still might not work or be very slow for tuning reasons.
let me know if it works.
thanks!

utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
October 11, 2014, 09:05:31 AM
Last edit: October 11, 2014, 09:18:43 AM by utahjohn
 #109

thanks, will do as soon as I get time :) The VM server box has a 5450 in it, so I might as well let the host make use of it. I can get ~0.25 MH/s with the normal groestlcoin kernel at ws 128 on it ... it's running 24/7/365 anyway :)

Only has 80 shaders LOL it's a dwarf but is air cooled hehe
about on par with intel HD GPU (10 shaders) in G3220 CPU as far as hashrate

qaz6767
Full Member
***
Offline Offline

Activity: 151
Merit: 100


View Profile
October 20, 2014, 12:26:52 PM
 #110

Help! What bat file do I need to start Diamond? I cannot get it to run on a 280x card. Thanks
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
October 20, 2014, 12:29:45 PM
 #111

thanks, will do as soon as I get time :) The VM server box has a 5450 in it, so I might as well let the host make use of it. I can get ~0.25 MH/s with the normal groestlcoin kernel at ws 128 on it ... it's running 24/7/365 anyway :)

Only has 80 shaders LOL it's a dwarf but is air cooled hehe
about on par with intel HD GPU (10 shaders) in G3220 CPU as far as hashrate

just curious... did you manage to make it work? if yes, what hashrate?

EDIT: it has about half the shaders of a nexus 9 :-D

pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
November 04, 2014, 12:50:23 PM
 #112

A BIT OF HISTORY

The first gpu miner for groestlcoin and similar was sph-sgminer by phm. Optimizing the original implementation was trivial (almost 3x the speed could be achieved!), so there are probably tens of optimized versions around, many of which have been kept private: mining groestlcoin and similar coins was always unfair for most people, at least for non-devs.
Hopefully this kernel will end this and should also level the field between amd and nvidia.
I believe my version is faster than many of the other kernels because of the time I dedicated to it and the thousands of tests I did.

FINAL ADVICE

I suggest keeping "good binaries": make a backup of the fastest .bin files you have, so you can recover them in case of driver problems.
This will also let you get 1 or 2 percent more hashrate because of compiler variance (try removing the bin and running 3 or 4 times to see the variance in action).
Or use the provided bin file (see the OP) which should be a good one.

I've experienced lower power usage with catalyst 14.9 compared to 14.6 beta (a bit less compared to 13 but still better). Speaking of optimization, this should be kept in mind: buy a power meter for your miner(s)!
But it looks like the compiler included in 14.9 drivers produces binaries which run considerably slower than older releases. If you are on 14.9, use the provided bin file instead of the kernel source in .cl format.

If you want to fix it for 14.9, remove the naive implementation of the B64_# macros and use swizzle. Worked for me.

Thanks, but I've already tried any combination of bitwise operations and vectors (as_uchar...): I could make it work but hashrate is about 20 Mh/s vs 25 Mh/s of 14.6 beta.

pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
November 04, 2014, 01:27:57 PM
 #113

Thanks, but I've already tried any combination of bitwise operations and vectors (as_uchar...): I could make it work but hashrate is about 20 Mh/s vs 25 Mh/s of 14.6 beta.
Ah, I see - I just saw it go from 7MH/s to... I think 20, on 14.9, so I figured it worked; never mind, then.

It's funny how some little changes lead to huge hashrate drops (depending on compiler version); but it's true for memory intensive algos only, as far as I can see.
Maybe your own version doesn't have this problem, then ;-)

cryptonit
Legendary
*
Offline Offline

Activity: 3052
Merit: 1053


bit.diamonds | uNiq.diamonds


View Profile WWW
November 16, 2014, 07:31:32 AM
 #114


 
cryptonit
Legendary
*
Offline Offline

Activity: 3052
Merit: 1053


bit.diamonds | uNiq.diamonds


View Profile WWW
November 23, 2014, 12:23:11 PM
Last edit: November 23, 2014, 01:35:54 PM by cryptonit
 #115

@pallas could u find the actual state of the art mining software for DMD Groestl and post links in DMD ANN we then will update software on website

it would be great if it include ur performance boost tricks already.....

i think no one from our core team runs AMD cards any longer so ur help would be welcome

 
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
November 23, 2014, 02:39:19 PM
 #116

@pallas could u find the actual state of the art mining software for DMD Groestl and post links in DMD ANN we then will update software on website

it would be great if it include ur performance boost tricks already.....

i think no one from our core team runs AMD cards any longer so ur help would be welcome


the problem with my kernel is that, no matter how hard I try, I can't get the best hashrate on 14.9 drivers (only 20 Mh/s vs 25 with 14.6), so it's not enough to just replace diamond.cl on sgminer 4.1 or 5.
that's why I still prefer people visit this post, with all the info and troubleshooting, for best performance.
the only way to make it clean is creating a fork of sgminer, for tahiti and hawaii cards only, with the precompiled binary; some changes are needed in order for it to always use the binary and not compile the cl sources.
not sure I like it but it might work for many... what do you think?

pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
November 23, 2014, 04:15:49 PM
 #117

@pallas could u find the actual state of the art mining software for DMD Groestl and post links in DMD ANN we then will update software on website

it would be great if it include ur performance boost tricks already.....

i think no one from our core team runs AMD cards any longer so ur help would be welcome


the problem with my kernel is that, no matter how hard I try, I can't get the best hashrate on 14.9 drivers (only 20 Mh/s vs 25 with 14.6), so it's not enough to just replace diamond.cl on sgminer 4.1 or 5.
that's why I still prefer people visit this post, with all the info and troubleshooting, for best performance.
the only way to make it clean is creating a fork of sgminer, for tahiti and hawaii cards only, with the precompiled binary; some changes are needed in order for it to always use the binary and not compile the cl sources.
not sure I like it but it might work for many... what do you think?

Just an aside - I've gotten the same results - 21MH/s vs. 25MH/s. It's frustrating - but all I've tried is the lookup table implementation, so far.

Well, that means there is probably little room for improvements on that kind of implementation.
I'm curious to see if a bitslice version can be faster on AMD gpus, but I have no time (and no interest because of negative revenue) to try it myself.

pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
December 27, 2014, 10:18:22 PM
 #118

Could someone please share their hashrate with r9 285? I'm curious to see if it outperforms the 280 and how much power it uses.

lpedretti
Full Member
***
Offline Offline

Activity: 152
Merit: 100


View Profile
December 29, 2014, 04:19:19 PM
 #119

I was having issues using the optimized cl and precompiled binaries: no HW errors, but shares were only very occasional and pools reported a very low hashrate. The problem turned out to be the sgminer version I was using. I'm now using sgminer-develop, which has neoscrypt optimized kernels, and with that version it works like a charm!
Running Lubuntu 14.04 with 14.x (don't remember which one)
Clock at 930, 0.95v, 13.5 Mh/s each XFX-7970DD and Gigabyte 280x windforce

Great job!

Best regards!

AC: ANuRoFPkCjZSxsw2S41djrrA1D4xMMmwhs
realhet
Newbie
*
Offline Offline

Activity: 32
Merit: 0


View Profile WWW
December 31, 2014, 10:59:08 PM
 #120

Hi All,

I registered here because I need a little help from those of you who develop this OpenCL kernel.
A month ago I found the Groestl algo on the AMD dev forums, thanks to Wolf0, who mentioned it there. I thought it would be a good algo to test my GCN asm skills on, and I'd like to play with it; maybe I can optimize it better than the OCL compiler (or maybe not, but at least I can learn from it anyway).

So the help I'm seeking is this:
- Please send me the latest version of this kernel (I see everyone altering it a bit, just don't know which is which)
- And pls give me a test vector with these things:
  - global kernel dimensions, workgroup size(I guess it's 256)
  - kernel parameters: dump "char *block", and the "target" value
- And of course the above testcase must find a GroestlCoin hash.

Thank you in advance

(I already sent this to Wolf0 on the AMD dev forums, but moderation there can take a while, and I later found this more appropriate place for my question)

And have a Happy New Year, btw