Bitcoin Forum

Alternate cryptocurrencies => Mining (Altcoins) => Topic started by: pallas on September 12, 2014, 08:00:34 AM



Title: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 12, 2014, 08:00:34 AM
ABOUT

This is my optimized OpenCL kernel for Groestlcoin, Diamond and similar coins (groestl + groestl algorithm; not myriad-groestl, which is groestl + sha: see below for the latter).
It is based on the sph version originally available in sph-sgminer, but has since been completely rewritten.
It should be compatible with all sph-sgminer versions and derivatives.

PERFORMANCE

v1 - to be compiled with Catalyst 14.6 or 14.7:

R9 290x @1125 MHz: ~26.4 MH/s
R9 290 @1200 MHz: ~25 MH/s
R9 280x (stock): ~18 MH/s
7950 @1200 MHz: ~16 MH/s
R9 270X: ~9.7 MH/s

v2 - experimental Hawaii-only bin:

R9 290x @1125 MHz: ~34.4 MH/s
R9 290 @1100 MHz: ~30.6 MH/s

Wolf0's Tahiti binary:

R9 280x: ~25 MH/s

HOW TO USE

- Stop the miner
- Replace groestlcoin.cl, diamond.cl and/or the kernel you want to use with this one (it's inside the "kernel" folder)
- Remove all the .bin files (in the main folder)
- Set worksize to 256 only (-w 256)
- Run and enjoy!
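
The steps above can be sketched as a small Python helper. This is only an illustration under assumptions: the folder layout (a "kernel" subfolder for .cl files, cached .bin files in the main folder) follows the description above, while the function name install_kernel and its arguments are hypothetical and not part of sgminer. The miner must already be stopped, and it still has to be started with -w 256 afterwards.

```python
import shutil
from pathlib import Path


def install_kernel(miner_dir, new_kernel, target_name="groestlcoin.cl"):
    """Copy new_kernel over kernel/<target_name> and delete cached .bin files.

    Hypothetical helper: assumes the miner is stopped and that miner_dir
    contains a "kernel" subfolder. Returns the names of the removed .bin files.
    """
    miner_dir = Path(miner_dir)

    # Replace the stock kernel source with the optimized one.
    shutil.copyfile(new_kernel, miner_dir / "kernel" / target_name)

    # Remove cached binaries so the miner recompiles from the new source.
    removed = []
    for bin_file in sorted(miner_dir.glob("*.bin")):
        bin_file.unlink()
        removed.append(bin_file.name)
    return removed
```

For Diamond, the same call would be made with target_name="diamond.cl".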

TWEAKING

Set intensity between 20 and 22. Thread concurrency and all the other parameters have no effect.
This kernel doesn't make use of GPU RAM, so set the RAM clock to THE MINIMUM POSSIBLE VALUE; for example 150 MHz for the R9 290.
Now play with the core clock until you find the highest stable value (probably between 1100 and 1200 MHz for the R9 290).
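
As a rough sketch of these settings, the Python function below assembles a miner command line. The flags (-k, -o, -u, -p, -I, -w, --gpu-engine, --gpu-memclock) are the ones that appear in a command line posted further down this thread; the function name build_miner_args, the executable name "sgminer" and the example pool and credentials are placeholders, not something the kernel requires.

```python
def build_miner_args(pool_url, user, password, intensity=21,
                     core_clock=None, mem_clock=None):
    """Build an argument list following the tweaking notes above.

    Hypothetical helper: clocks are in MHz and are only added when given.
    """
    if not 20 <= intensity <= 22:
        raise ValueError("intensity should be between 20 and 22 for this kernel")
    args = [
        "sgminer",
        "-k", "groestlcoin",
        "-o", pool_url,
        "-u", user,
        "-p", password,
        "-I", str(intensity),
        "-w", "256",  # the kernel depends on a worksize of 256
    ]
    if core_clock is not None:
        args += ["--gpu-engine", str(core_clock)]
    if mem_clock is not None:
        args += ["--gpu-memclock", str(mem_clock)]
    return args


# Example: R9 290 with the memory clock at 150 MHz and a 1100 MHz core clock.
print(" ".join(build_miner_args("localhost:17772", "XXXX", "XXXXXXXX",
                                core_clock=1100, mem_clock=150)))
```

Thread concurrency is deliberately left out, since it has no effect here.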

COMPATIBILITY

Tested and stable on R9 290, 280x and 7950. It should work on any recent AMD GPU, but performance is not guaranteed to be optimal.
It doesn't work with the cryptohunger optimized pool: use the conventional port or another pool. Also, do not replace the optimized kernel of grs-sgminer, only the normal one.

TROUBLESHOOTING

Try the following:
- Are you sure you set worksize to 256?
- Replace the generated .bin file with this one (64 bit, R9 280(x) and 290(x) only): LINK EXPIRED (diamondHawaiiw256l8.bin), see below for a newer binary file
- Lower the intensity
- Lower the core speed (are you sure you set the RAM clock to the lowest possible value?)
- Since it uses more power, it could be a cooling issue too: check the GPU temperature

DONATIONS

This work took me months of coding and testing, and many sleepless nights; please show your appreciation (you are making more money by using it!) by donating to:
BTC: 1H7qC5uHuGX2d5s9Kuw3k7Wm7xMQzL16SN

DOWNLOAD

Opensource Kernel (v1):
https://app.box.com/s/9vikvemf7acio3uns7taorodqmod42ej

Experimental Hawaii bin (v2):
https://app.box.com/s/zsr29tfgv4tpxs1q7451dayzaw3wnoee

Wolf0's Tahiti bin (https://bitcointalk.org/index.php?topic=779598.msg11778971#msg11778971):
https://ottrbutt.com/miner/wolf-groestlcoinTahitigw256l4.bin

**** MYRIAD GROESTL ****

If you are looking for the closed source myriad groestl miner (for DGB, SFR, etc.) look here:

https://satoshibox.com/fttcfvpiyhbod7ueidmgdhym


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 12, 2014, 08:19:21 AM
A BIT OF HISTORY

The first GPU miner for groestlcoin and similar coins was sph-sgminer by phm. Optimizing the original implementation was trivial (almost 3x the speed could be achieved!), so there are probably tens of optimized versions around, many of which have been kept private: mining groestlcoin and similar coins has always been unfair for most people, at least for non-devs.
Hopefully this kernel will end that, and it should also level the field between AMD and NVIDIA.
I believe my version is faster than many of the other kernels because of the time I dedicated to it and the thousands of tests I ran.

FINAL ADVICE

I suggest keeping your "good binaries": make a backup of the fastest .bin files you have, so you can recover them in case of driver problems.
This can also get you 1 or 2 percent more hashrate because of compiler variance (try removing the bin and recompiling 3 or 4 times to see the variance in action).
Or use the provided bin file (see the OP), which should be a good one.
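
A minimal sketch of such a backup in Python, assuming the compiled .bin files sit in the miner's main folder. The function name backup_bins and the idea of labelling each backup (for example with the hashrate it reached) are illustrative choices, not part of any miner.

```python
import shutil
from pathlib import Path


def backup_bins(miner_dir, backup_dir, label):
    """Copy every .bin in miner_dir into backup_dir/<label>/.

    Hypothetical helper: returns the names of the copied files.
    """
    dest = Path(backup_dir) / label
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for bin_file in sorted(Path(miner_dir).glob("*.bin")):
        # copy2 keeps the file's timestamps along with its contents.
        shutil.copy2(bin_file, dest / bin_file.name)
        copied.append(bin_file.name)
    return copied
```

Calling it once per recompile with a different label keeps each variant, so the fastest one can be restored later.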

I've seen lower power usage with Catalyst 14.9 than with 14.6 beta (a bit less compared to 13, but still better). Speaking of optimization, keep this in mind: buy a power meter for your miner(s)!
However, the compiler included in the 14.9 drivers seems to produce binaries which run considerably slower than those from older releases. If you are on 14.9, use the provided bin file instead of the kernel source in .cl format.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: cryptonit on September 12, 2014, 08:20:05 AM
thx a lot for ur effort
to make best possible amd based mining open source avaiable for
DMD Diamond



Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: popshot on September 12, 2014, 08:40:05 AM
Pallas you are Prometheus, spending your time and skills in creating something useful to a lot of people and at the end opening it to all interested. Kudos  :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 12, 2014, 08:49:49 AM
That's what opensource is about ;-)
I've been a Linux guy for 20 years now and I remember public domain software from the Commodore age (around 1984).


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: srcxxx on September 12, 2014, 08:53:53 AM
Hi pallas!

Good stuff!

Have you tried other groestl algorithms? Like byte slicing or bit slicing?

Did you also have strange situations where better looking code actually ran slower?

PS: I wish you had opened the code earlier, Groestlcoin even had a bounty for an optimized miner.

Thanks
srcxxx


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 12, 2014, 09:00:40 AM
I tried a lot of optimizations which looked smart but led to slower code. You can't imagine how many. Some of them I thought of while in bed, and they kept me awake until I could try them. Then came the disappointment :-D
I believe it's because of the optimizations the compiler does, but most of all because of local memory and cache access.
If you optimize some instructions but end up with less parallelism in memory access, you'll get slower code.
That's typical for groestl because its speed is limited mostly by memory (otherwise it would be as fast as keccak, blake etc.).
I know there was a bounty: my code was unfinished and, at the time, keeping it private was the only way I could mine without losing money.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on September 12, 2014, 09:04:14 AM
Wow that's a nice improvement on hashrate :)  Now tuning for stability on my miners ...
Sending a donation your way next block find :)

Testing on HD7950 and R9280X and will report my hashrates when I get it stable :)
Both cards run considerably hotter and 100% fan ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 12, 2014, 09:07:18 AM
Thanks!
Let me know your figures.
I need 280x and 290x hashrates, to put in the op.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: srcxxx on September 12, 2014, 09:22:19 AM
I know. I actually think that the compiler is not that clever and that's why sometimes worse code runs faster.
Also, I looked at ASM and some stuff there is just plain not optimal. Perhaps it'll be improved in future versions of AMD drivers.

Also, most ASM code only uses .xy from a register. I tried making it work on ulong2 or ulong8 - only slower.

I wish it was possible to write GPU code in assembler...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 12, 2014, 09:33:44 AM
Again, it's mostly about memory for groestl: optimizing register operations might lead to an unnoticeable gain, but you may lose on memory access.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on September 12, 2014, 09:34:59 AM
@pallas
What's your DMD donation address :) Found 2 blocks in like 15 minutes (LUCK!)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 12, 2014, 09:36:25 AM
good!
my DMD address is dVrz69vZFrxJRH9AnKyHim7Hd3PhY3w9NQ


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on September 12, 2014, 09:39:37 AM
Sent ya 0.5 DMD for now, will send some more after it runs stable for a day :)
Transaction ID: 37bca0a9872845908b4fc4e223d920b3355b5bbbb54de97a583aee67c7b4605d


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on September 12, 2014, 10:08:46 AM
I suppose I'll have to upgrade the RAM on my miner box from 2G to 4G now ... occasional 14.7RC2 driver crashes at I=22


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 12, 2014, 10:10:00 AM
does it work at I=21? there should be little difference in hashrate.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Ivanech on September 12, 2014, 10:10:34 AM
Has anybody tried this with 270X cards? What hashrate should I expect?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on September 12, 2014, 10:18:09 AM
I suppose I'll have to upgrade the RAM on my miner box from 2G to 4G now ... occasional 14.7RC2 driver crashes at I=22

does it work at I=21? there should be little difference in hashrate.

Actually runs faster at I=21 :)
Have not messed with GPU or MEM clocks just defaults :)  
(Powercolor) 280X  18MHs 67C-68C
(Powercolor) 7950  16MHs 68C-69C
Both cards are volt-modded to lower than stock ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 12, 2014, 10:48:25 AM
good, but if you lower the mem clock you will save power, and get higher maximum core clock as well.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 12, 2014, 10:56:02 AM
added to the op:

IF IN TROUBLE, TRY REPLACING THE GENERATED .BIN FILE WITH THIS ONE: https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw256l8.bin


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: echo00114 on September 12, 2014, 12:42:42 PM
Has anybody tried this with 270X cards? What hashrate should I expect?

Hello,
I tried it on a Sapphire Toxic, but no luck: nothing but HW errors. I also tried lowering the GPU clock, but that did not work (1150 MHz, still 1150). I use grs-sgminer 1.4 with the AMD 14.6 RC driver.

Bye

Update 1:
I tried a different miner, sgminerGroestl4_1_0_1, and this .cl works great: 9.7 MH/s at I 22 and 9.5 MH/s at I 20.

Bye


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 12, 2014, 12:45:52 PM
since I've never tested it on 270x there could be issues, but please try:

- using the bin file from the OP
- lowering the intensity
- lowering core speed (are you sure you put the ram clock to the lowest possible value?)

since it uses more power, it could be a cooling issue too.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on September 12, 2014, 12:59:07 PM
If he is using grs-sgminer then he must be using cryptohunger pool which would be incompatible with
a real groestlcoin kernel ... have to use sph-sgminer, not proprietary grs-sgminer with this kernel :)
cryptohunger requires grs-sgminer, so move to a different pool if you want to try this kernel or just solo mine like I do :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Litejavichu on September 12, 2014, 01:21:15 PM
The version I proposed works perfectly with 13.2: go to Control Panel, uninstall 14 and then install driver 13.2.

7950 = 13 MH/s at I 21
7970 = 14-15 MH/s at I 21
But these optimized versions make the GPU run hotter and use more electricity...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 12, 2014, 01:27:48 PM
fine, thanks, but my kernel is faster...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: cryptonit on September 12, 2014, 01:40:08 PM
21 MH/s on an AMD 290 at 1020 MHz GPU clock.
Pretty power hungry (dual 290s draw around 150 W more on this than on x11),
but it's the best groestl AMD miner I have ever seen regarding hashrate per card.
Great job!

I still use http://multipool.bit.diamonds/ (able to mine DMD with nearly any existing algo),
with AMD cards running x11 most of the time and NVIDIA cards nist5,
because at Austrian power costs I need to be very sensitive to power usage.

no mining gear? still wana have a constant DMD income? check out DMD Cloudmining
no electricity no heat no maintainance
one time invest forever DMD payouts!


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: stillontop on September 12, 2014, 09:28:06 PM
Pallas, you are great!

Here are my numbers:
290x @ 1125 MHz ~ 26.4 MH/s
290x @ 1040 MHz ~ 24.4 MH/s

I am going to send you 20% of the next ten blocks I will find,
just because I want to donate to an honest and sharing person.


To use the bin file correctly: remove all .bin files in the main folder, download pallas's bin file and put it into the main folder,
start your miner, and copy the filename of the .bin file it creates automatically. Then stop the miner and rename pallas's bin file to the copied filename, replacing the generated file.
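
The renaming step can be sketched in Python as follows. This is only an illustration: the function name swap_in_provided_bin is hypothetical, both files are assumed to be in the miner's main folder, and the miner must be stopped before running it.

```python
from pathlib import Path


def swap_in_provided_bin(miner_dir, provided_name, generated_name):
    """Give the downloaded .bin the filename the miner generated for itself.

    Hypothetical helper: run it only while the miner is stopped.
    Returns the path the miner will load on its next start.
    """
    miner_dir = Path(miner_dir)
    provided = miner_dir / provided_name
    generated = miner_dir / generated_name
    if not provided.is_file():
        raise FileNotFoundError(provided)
    # Path.replace renames the file and overwrites the target if it exists.
    provided.replace(generated)
    return generated
```

Here generated_name is whatever filename the miner created on its first run; it depends on the card and settings, so it has to be copied from the folder rather than guessed.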


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: srcxxx on September 12, 2014, 09:56:16 PM
Hi guys!

I am not sure if you understand it yet, but your profits are not going to increase. Unfortunately. :-(

Perhaps in the upcoming 5-10 days, but then it'll all be as before. And you'll be paying more for electricity.

That is one of the reasons I did not open source my miner.

Because you are not the only one who will move to the faster kernel: everybody will.
So not just your hashrate will increase; everybody's hashrate will increase as well.

And so the distribution of coins between miners will not change, but you're going to pay more to the electricity companies.

The new kernel eats a lot more electricity and makes more heat.

Which is bad for the algo we're mining.
Also it makes CPU mining a lot less profitable compared with GPU mining. Also bad.

The root of all evil,
srcxxx


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 12, 2014, 10:20:01 PM
Hmm, not exactly.
The biggest miners are already using optimized miners they made themselves; this kernel lifts ordinary people to a higher level so they can compete.
Smaller miners will make more coins and the biggest ones fewer.
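
Both sides of this argument can be checked with a toy calculation. The Python sketch below uses made-up hashrates (three miners at 60, 20 and 20 MH/s) and assumes a miner's expected share of the coins equals its share of the total hashrate; none of these numbers come from the real network.

```python
def shares(hashrates):
    """Return each miner's fraction of the total hashrate."""
    total = sum(hashrates.values())
    return {name: rate / total for name, rate in hashrates.items()}


# Hypothetical hashrates in MH/s, for illustration only.
before = {"big": 60, "small_a": 20, "small_b": 20}

# Case 1: everybody switches to a kernel that is twice as fast.
everyone_faster = {name: rate * 2 for name, rate in before.items()}

# Case 2: the big miner already ran a private optimized kernel,
# so only the small miners speed up.
only_small_faster = {"big": 60, "small_a": 40, "small_b": 40}

print(shares(before))
print(shares(everyone_faster))
print(shares(only_small_faster))
```

In the first case the shares do not change at all, which is srcxxx's point. In the second case the big miner falls from 60% to about 43% (60 of 140 MH/s) and each small miner rises from 20% to about 29%. Which case applies depends on how much of the hashrate was already running private optimized kernels.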


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 12, 2014, 10:21:02 PM
Pallas, you are great!

Here are my numbers:
290x @ 1125 Mhz ~ 26,4 Mh/s
290x @ 1040 Mhz ~ 24,4 Mh/s

I am going to send you 20% of the next ten blocks I will find,
just because I want to donate to an honest and sharing person.


To use the bin file correctly, you have to remove all bin files in the main folder, download pallas bin file, put it into the main folder,
start your miner, copy the filename of your automatically created bin file after you started the miner. Then you stop the miner and rename pallas bin-file with the copied filename.

thanks very much! :-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: lecaillou on September 12, 2014, 11:09:25 PM
Hi, sorry but I don't see where I can download the latest sph-sgminer???
Thanks
Séb


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: stillontop on September 12, 2014, 11:20:57 PM
Maybe it would be wise to use a power-saving kernel in summer, to extend your graphics card's lifetime, and a more powerful version in winter to keep your feet warm :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 13, 2014, 08:01:14 AM
Or just lower the core clock ;-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 13, 2014, 08:02:29 AM
Hi, sorry but I don't see where I can download the latest sph-sgminer???
Thanks
Séb

You can use the "classic" version as well as the new version 5 (and probably other similar miners).


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Ivanech on September 13, 2014, 09:24:05 AM
Excellent work!

Got about 15 MH/s for my Gigabyte 270X compared to 13 MH/s previously.

Is it possible to release other optimized kernels? Qubitcoin for example?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 13, 2014, 10:30:26 AM
Yes some improvements can be made but I have no time to dedicate to it :-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 14, 2014, 12:25:09 PM
Can someone please post a 32 bit Hawaii or Tahiti binary?
The filename should end in "256l4.bin".


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on September 14, 2014, 10:53:02 PM
has anyone tested this with sgminer v5?  I am currently running sph-sgminer 4.1.0


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 16, 2014, 09:35:23 AM
If someone is still getting hardware errors, make sure you are using a worksize of 256 (-w 256 on the command line).
Some optimizations depend on it, so don't set it to a lower value.
Setting it higher is pointless too: 256 is the maximum, and that value will be used instead.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: HR on September 20, 2014, 08:53:10 AM

Congratulations Pallas.

Do you have it on Github already?



Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 20, 2014, 09:49:22 AM

Thanks.
No git yet. I was thinking it's not necessary for a single file, but now there are bins and maybe multiple versions so I may end up doing it.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: qwep on September 20, 2014, 06:58:13 PM

Congratulations, excellent optimization! But what about the other algorithms?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 20, 2014, 08:02:24 PM

I did work on other algorithms too (x11 components, m7 and others) but nothing ready for publication.
I'd need more time and lower kilowatt hour cost in order to go ahead :-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: qwep on September 20, 2014, 08:49:54 PM

I understand that all miners are written in C++; why not in C#? It is a bit faster than C++.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 20, 2014, 08:57:27 PM

Most miners are written in plain C but it doesn't matter that much unless you are mining with the CPU (still there is a good deal of assembly on some algorithms). GPU code is opencl or cuda instead.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: smolen on September 24, 2014, 12:09:06 AM
Take a look at my groestl implementation (https://bitcointalk.org/index.php?topic=475795.msg5993529#msg5993529) ;)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 24, 2014, 07:06:02 AM
Thanks. From a first look, I don't see anything I haven't tried yet :-)
Do you have some hashrate figures?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: smolen on September 24, 2014, 05:13:54 PM
Sorry, no testing results; I'm away from all crypto stuff and that's a rather abandoned project, collecting virtual dust on my HDD...
I didn't quite grok all your tricks :) I only use 3 arrays of 32 integers for intermediate results, so memory usage should be almost minimal, and such buffer reuse could be an independent optimization; I'm quite sure you have tried the rest :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 24, 2014, 05:58:15 PM
I did the best I could to reduce register usage, but in the end using more 64 bits turned out to be faster. Probably private memory is not fully used...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: smolen on September 24, 2014, 06:29:22 PM
in the end using more 64 bits turned out to be faster. Probably private memory is not fully used...
Maybe 64-bit math tricked the AMD OpenCL compiler away from useless 'optimizations' :) I once got a strange effect where inserting absolutely unrelated operations (well, it was copy protection) in the middle of big number crunching resulted in a ~5% speed increase.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on September 24, 2014, 06:58:55 PM
@pallas
Any chance of you integrating your groestl kernel into the optimised X11 and X13 kernels on my BCT thread (in my sig) :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 24, 2014, 08:18:21 PM
I'll have a look asap. Actually I don't have much free time now, so it may take a bit.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 24, 2014, 08:21:07 PM
in the end using more 64 bits turned out to be faster. Probably private memory is not fully used...
May be 64 bit math tricked AMD OpenCL compiler away from useless 'optimizations' :) I once get strange effect when inserting absolutely unrelated operations (well, it was copy protection) in the middle of big number crunching resulted in ~5% speed increase.

Sometimes it looks random indeed! :-D
And sometimes compiling the same .cl file leads to different hashrates O_o


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: istvandv on September 25, 2014, 04:58:45 AM
while you are at it, how about myr-groestl?  ;D


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 25, 2014, 07:28:17 AM
I had a quick look some time ago: some of the tricks that work on this kernel do not make myr-groestl any faster, thus I'd need to re-tune it from scratch... :-/


Title: 16 Mh/s with a Hd 7950 ? I have only 6 !
Post by: Spider07 on September 28, 2014, 07:08:43 AM
Hello
How do you get such a result? I only get 6.
Are my settings wrong?

sgminer.exe -k groestlcoin -o localhost:17772 -u XXXX -p XXXXXXXX -I 22 -w 256 -g 1 --thread-concurrency 24000 --gpu-engine 1100 --gpu-memclock 1250


Thanks


Title: Re: 16 Mh/s with a Hd 7950 ? I have only 6 !
Post by: pallas on September 28, 2014, 07:26:05 AM
See the troubleshooting on the op.
Lower your memory clock, try the compiled binary.
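
As an illustration, a command line like the one above can be checked against the advice in the OP with a few lines of Python. The function name check_args and the max_mem_clock threshold of 500 MHz are arbitrary assumptions made for this sketch; the lowest usable memory clock depends on the card.

```python
def check_args(args, max_mem_clock=500):
    """Return a list of warnings for a miner argument list.

    Hypothetical helper: max_mem_clock (MHz) is an arbitrary threshold.
    """
    def value_of(flag):
        # Return the argument that follows flag, or None if it is absent.
        if flag in args:
            i = args.index(flag)
            if i + 1 < len(args):
                return args[i + 1]
        return None

    warnings = []
    if value_of("-w") != "256":
        warnings.append("set worksize to 256 (-w 256)")
    intensity = value_of("-I")
    if intensity is None or not 20 <= int(intensity) <= 22:
        warnings.append("set intensity between 20 and 22")
    mem = value_of("--gpu-memclock")
    if mem is not None and int(mem) > max_mem_clock:
        warnings.append("lower the memory clock")
    if "--thread-concurrency" in args:
        warnings.append("--thread-concurrency has no effect with this kernel")
    return warnings


cmd = ("sgminer.exe -k groestlcoin -o localhost:17772 -u XXXX -p XXXXXXXX "
       "-I 22 -w 256 -g 1 --thread-concurrency 24000 "
       "--gpu-engine 1100 --gpu-memclock 1250").split()
for warning in check_args(cmd):
    print(warning)
```

For this command line the sketch flags the 1250 MHz memory clock and the unused thread concurrency setting; worksize and intensity already match the recommendations.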


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Spider07 on September 28, 2014, 07:33:57 AM
Oh,
I tried the compiled binary: no more success...
I pasted your binary into my folder, copied the name of my generated .bin, deleted my .bin, renamed your .bin to the name of my .bin, and ran my .bat again.
Is that the right way to proceed?

I also changed to --gpu-memclock 350, without any success.
Thanks


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: losk22 on September 28, 2014, 09:24:09 AM
 Replace groestlcoin.cl, diamond.cl and/or the kernel you want to use with this one (it's inside the "kernel" folder) You can read more?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Spider07 on September 28, 2014, 10:52:27 AM
Replace groestlcoin.cl, diamond.cl and/or the kernel you want to use with this one (it's inside the "kernel" folder) You can read more?

Thanks for your help.

Sorry forgot to mention that I have done that.
To be sure, I deleted all the files in the kernel folder.
I now have only 1 file (groestlcoin-v1.cl renamed to groestlcoin.cl)

I can't reach 16 Mh/s..... :'(   only 6..... :'(


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: losk22 on September 29, 2014, 07:39:45 AM
ABOUT

This is my optimized Groestlcoin / Diamond and similar opencl kernel (groestl + groestl algorithm, not myriad-groestl which is groestl + sha).
It is based on the sph version originally available on sph-sgminer but is now totally rewritten.
It should be compatible with all sph-sgminer versions and derivatives.

PERFORMANCE

R9 290x @1125 Mhz: ~26.4 Mh/s
R9 290 @1200: ~25 Mh/s
R9 280x (stock): ~18 Mh/s
7950 @1200: ~16 Mh/s
R9 270X: ~9.7 Mh/s

HOW TO USE

- Stop the miner
- Replace groestlcoin.cl, diamond.cl and/or the kernel you want to use with this one (it's inside the "kernel" folder)
- Remove all the .bin files (in the main folder)
- Set worksize to 256 only (-w 256)
- Run and enjoy!

TWEAKING

Set intensity from 20 to 22. Thread concurrency and all the other parameters are useless.
This kernel doesn't make use of gpu ram, so set the ram clock to THE MINIMUM POSSIBLE VALUE; for example 150 MHz for R9 290.
Now play with the core clock until you find the highest stable value (probably between 1100 and 1200 for the R9 290).

COMPATIBILITY

Tested working stable on R9 290, 280x and 7950. Should work on any recent amd gpu but performance is not guaranteed to be optimal.
It doesn't work with cryptohunger optimized pool: use the conventional port or another pool. Also do not replace the optimized kernel of grs-sgminer but the normal one.

TROUBLESHOOTING

Try the following:
- Sure you set worksize to 256?
- Replace the generated .bin file with this one (64 bit only): https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw256l8.bin
- Lower the intensity
- Lower the core speed (are you sure you put the ram clock to the lowest possible value?)
- Since it uses more power, it could be a cooling issue too: check the gpu temperature

DONATIONS

This work took me months of coding and testing and unslept nights; please show your appreciation (you are making more money by using it!) by donating to:
BTC: 1H7qC5uHuGX2d5s9Kuw3k7Wm7xMQzL16SN
DMD: dVrz69vZFrxJRH9AnKyHim7Hd3PhY3w9NQ

DOWNLOAD

https://dl.dropboxusercontent.com/u/40353042/Diamond/groestlcoin-v1.cl

What coins can be extracted with your algorithm?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 29, 2014, 08:14:34 AM
What coins can be extracted with your algorithm?

the ones using the groestlcoin algo (groestl + groestl): for example diamond, aidbit and atheistcoin.
not myriad groestl like saffroncoin and digibyte.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: St.Neman on September 30, 2014, 08:42:47 PM
4gpu 280x = 69Mhs = about 7000AID x 0.00000086 = 0.00602 BTC/daily

(power consumption 870 W)

0,00602BTC = 2,35$/daily
electricity = 1,12$/daily

2,35$ - 1,12$ = 1,23$/daily
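
A minimal sketch of the arithmetic above, in Python. The BTC/USD rate and the electricity tariff are not stated in the post; they are backed out of the quoted figures here and should be read as assumptions, as are all the variable names.

```python
# Figures quoted in the post above.
AID_PER_DAY = 7000            # coins mined per day at ~69 Mh/s
AID_PRICE_BTC = 0.00000086    # BTC per AID
REVENUE_USD = 2.35            # daily revenue as quoted
ELECTRICITY_USD = 1.12        # daily electricity cost as quoted
POWER_WATTS = 870             # power draw of the 4-card rig


def daily_revenue_btc(coins, price_btc):
    """Daily revenue in BTC for a given number of mined coins."""
    return coins * price_btc


def daily_energy_kwh(watts):
    """Energy used in 24 hours at a constant draw, in kWh."""
    return watts * 24 / 1000


revenue_btc = daily_revenue_btc(AID_PER_DAY, AID_PRICE_BTC)
energy_kwh = daily_energy_kwh(POWER_WATTS)

# Implied values: assumptions derived from the quoted numbers,
# not figures given in the post.
implied_btc_usd = REVENUE_USD / revenue_btc
implied_usd_per_kwh = ELECTRICITY_USD / energy_kwh

profit_usd = REVENUE_USD - ELECTRICITY_USD

print(f"revenue:       {revenue_btc:.5f} BTC/day")
print(f"energy:        {energy_kwh:.2f} kWh/day")
print(f"implied rate:  {implied_btc_usd:.0f} USD/BTC")
print(f"implied price: {implied_usd_per_kwh:.3f} USD/kWh")
print(f"profit:        {profit_usd:.2f} USD/day")
```

Swap in your own tariff and power draw to see where the break-even point sits for your rig.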


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 30, 2014, 09:20:58 PM
4gpu 280x = 69Mhs = about 7000AID x 0.00000086 = 0.00602 BTC/daily

(power consumption 870W/h)

0,00602BTC = 2,35$/daily
electricity = 1,12$/daily

2,35$ - 1,12$ = 1,23$/daily

That's good! You pay very little for your electricity...
I pay a lot more, so it hasn't been profitable for me for a long time now.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on September 30, 2014, 11:54:51 PM
Interesting observation 280x, I had to raise GPU vdc in bios to get stable above 18.4MHs (now 1.118v) ... (was stable at 1.087v using previous kernel).  Best for me 1160gpu/1500mem threads 1, TC 24576 (it does make a difference).  Temps rise dramatically with minor increase in voltage sadly and have capped my overclock attempts as over 75c is enough for me :(  I have been unable to lower mem clock it seems to be locked (powercolor R9 280x).  Cooler ambient temps coming but will not gain all that much as temps seem to rise exponentially with overclocking (nice heater LOL).
Congrats on profitability, St.Neman. I am barely profitable with increased power requirement but the 1 DMD reward window will end soon enough ... take advantage while you can.  Got new (used) Mobo on the way and will soon get main miner back online (only 1 280x atm in entertainment box right now, main miner mobo died in a puff of smoke LOL).
You should be able to push 73+ Mhs with 4 cards ... (volt mod and proper heatsink paste application).


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 01, 2014, 12:18:13 AM
@pallas
have you looked into incorporating super optimized groestl into the X11mod kernel on my thread (in sig)?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 01, 2014, 07:39:50 AM
Interesting observation 280x, I had to raise GPU vdc in bios to get stable above 18.4MHs (now 1.118v) ... (was stable at 1.087v using previous kernel).  Best for me 1160gpu/1500mem threads 1, TC 24576 (it does make a difference).  Temps rise dramatically with minor increase in voltage sadly and have capped my overclock attempts as over 75c is enough for me :(  I have been unable to lower mem clock it seems to be locked (powercolor R9 280x).  Cooler ambient temps coming but will not gain all that much as temps seem to rise exponentially with overclocking (nice heater LOL).
Congrats on profitibility St.Neman I am barely profitable with increased power requirement but the 1 DMD reward window will end soon enough ... take advantage while you can.  Got new (used) Mobo on the way and will soon get main miner back online (only 1 280x atm in entertainment box right now main miner mobo died in a puff of smoke LOL).
You should be able to push 73+ Mhs with 4 cards ... (volt mod and proper heatsink paste application).

It's a pity you can't reduce mem clock because it would save you watts and lower the card temperature (and also let you overclock the core more).
I think the 280x has a maximum core/ram clock range like the 7950: I used to set it at 1150 core and 1000 ram.
If the hashrate changes with thread concurrency, it's probably just compiler variance and not due to the TC buffer which is not used at all. Try removing the bins and restart a handful of times, you'll probably see different hashrates with the exact same settings.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 01, 2014, 04:21:10 PM
Researching inability to lower mem clock, I use VBE7.0.0.7b.exe and atiwinflash to do the volt mod on 280x and 7950.  
Going to try force mem clock in bios ... I get worried about flashing card so many times but it's worth a try ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 01, 2014, 04:33:01 PM
OK that worked I was able to lower mem clock to 500, trying further OC now ...

Temp dropped from 76-77C to 72C with current settings (1170 GPU 500 MEM) 18.5MHs after 10 min ... will probably lower memclk to 150 like on OP :)

Lowered memclk to 150 by bios mod and gpu to 1.112V

Temp 71C after 25 min at 1171/150 18.48Mhs :) I'm happy with that and will leave it alone for a week to see if stable :)

Display driver crash after 30 min, lowering clock by 1 ...

NOTE: VBios modding is not for the faint of heart, this would likely destroy gaming performance, use only on dedicated mining box :)

Question for more experienced miners ... what would I=22 be as xIntensity on r9 280x? (2048 shaders)
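
A rough way to work out the xIntensity question, in Python. It assumes the miner launches 2^I threads for intensity I and shaders * xIntensity threads for xIntensity. That mapping is an assumption here, not something confirmed in this thread, and some forks may apply a per-algorithm shift, so check your own miner's documentation. The function name is made up for illustration.

```python
def xintensity_for(intensity, shaders):
    """xIntensity that gives the same thread count as a given intensity.

    Assumes threads = 2**intensity for intensity and
    threads = shaders * xintensity for xIntensity (an assumption).
    """
    threads = 1 << intensity
    return threads / shaders


# R9 280x: 2048 shaders, intensity 22 as used in this thread.
print(xintensity_for(22, 2048))
```

If the assumption holds, any non-integer result would have to be rounded to the nearest whole xIntensity value.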


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 01, 2014, 06:01:30 PM
mmmm ... wonderful odor, I am steam curing a pound of virginia tobacco in oven for my own version of dark molasses cavendish ... nice rich smooth smoke and it's getting really dark after 8 hr sample.

I'm wondering what it has to do with this thread.
Still, thanks for the bump ;-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 01, 2014, 06:06:20 PM
I'm getting a query from another user on my thread about the status of x11mod ... any progress?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 02, 2014, 07:55:49 AM
I've found I need to keep GPU temp below 72C with the powercolor R9 280x, the GPU clock gets throttled down at >= 72C ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 02, 2014, 08:00:09 AM
I've found it is needed to keep GPU temp below 72C with the powercolor R9 280x, GPU clock gets throttled down >= 72C ...

72C looks pretty low to me... my 290s throttle at 95 and 85.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 02, 2014, 08:26:45 AM
I've found it is needed to keep GPU temp below 72C with the powercolor R9 280x, GPU clock gets throttled down >= 72C ...

72C looks pretty low to me... my 290s throttles at 95 and 85.
Not a real concern for me with groestl (running under 72C) but running X11 it is an issue when I OC to 1080/1650.

Might have something to do with fan profiles in vbios ... have not messed with that yet.

Just checked status on my ebay order of replacement motherboard for mining box (MOBO/RAM/CPU - Asrock H81 PRO BTC/G3220 3GHZ/4GB 1600 DDR3) and it has reached post office here in SLC :) should get delivered tomorrow or next day :) then I can bring more cards online again :)~

R9 280x stable at 18.47 Mhs temp 71C (1170/150) after 15+ hours :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: St.Neman on October 02, 2014, 06:52:23 PM
Interesting observation 280x, I had to raise GPU vdc in bios to get stable above 18.4MHs (now 1.118v) ... (was stable at 1.087v using previous kernel).  Best for me 1160gpu/1500mem threads 1, TC 24576 (it does make a difference).  Temps rise dramatically with minor increase in voltage sadly and have capped my overclock attempts as over 75c is enough for me :(  I have been unable to lower mem clock it seems to be locked (powercolor R9 280x).  Cooler ambient temps coming but will not gain all that much as temps seem to rise exponentially with overclocking (nice heater LOL).
Congrats on profitibility St.Neman I am barely profitable with increased power requirement but the 1 DMD reward window will end soon enough ... take advantage while you can.  Got new (used) Mobo on the way and will soon get main miner back online (only 1 280x atm in entertainment box right now main miner mobo died in a puff of smoke LOL).
You should be able to push 73+ Mhs with 4 cards ... (volt mod and proper heatsink paste application).

thanks utahjohn,
gpu1090, mem1000, i22, 1,169V (temp 70C fan 2000-2500)
can't get more, i try but can't :(
i have the worst 280x gpus http://www.sapphiretech.com/presentation/product/product_index.aspx?pid=2022&lid=1,
new gpus simply stop working after a few months,
then i must change every capacitor on the gpu
if i want it to work again


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 02, 2014, 07:43:17 PM
Interesting observation 280x, I had to raise GPU vdc in bios to get stable above 18.4MHs (now 1.118v) ... (was stable at 1.087v using previous kernel).  Best for me 1160gpu/1500mem threads 1, TC 24576 (it does make a difference).  Temps rise dramatically with minor increase in voltage sadly and have capped my overclock attempts as over 75c is enough for me :(  I have been unable to lower mem clock it seems to be locked (powercolor R9 280x).  Cooler ambient temps coming but will not gain all that much as temps seem to rise exponentially with overclocking (nice heater LOL).
Congrats on profitibility St.Neman I am barely profitable with increased power requirement but the 1 DMD reward window will end soon enough ... take advantage while you can.  Got new (used) Mobo on the way and will soon get main miner back online (only 1 280x atm in entertainment box right now main miner mobo died in a puff of smoke LOL).
You should be able to push 73+ Mhs with 4 cards ... (volt mod and proper heatsink paste application).

thanks utahjohn,
gpu1090, mem1000, i22, 1,169V (temp 70C fan 2000-2500)
can't get more, i try but can't :(
i have worst gpus 280x http://www.sapphiretech.com/presentation/product/product_index.aspx?pid=2022&lid=1,
new gpus after a few months simply stop working,
then i must to change every capacitor on gpu
if i want to work again
Are you tuning each card individually? There is great variance in GPU quality between cards, so try to tune them one by one in your config.
Smoked capacitors ??? Is your power unstable or your PS faulty? I have never had a blown capacitor.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: St.Neman on October 02, 2014, 10:06:00 PM
Interesting observation 280x, I had to raise GPU vdc in bios to get stable above 18.4MHs (now 1.118v) ... (was stable at 1.087v using previous kernel).  Best for me 1160gpu/1500mem threads 1, TC 24576 (it does make a difference).  Temps rise dramatically with minor increase in voltage sadly and have capped my overclock attempts as over 75c is enough for me :(  I have been unable to lower mem clock it seems to be locked (powercolor R9 280x).  Cooler ambient temps coming but will not gain all that much as temps seem to rise exponentially with overclocking (nice heater LOL).
Congrats on profitibility St.Neman I am barely profitable with increased power requirement but the 1 DMD reward window will end soon enough ... take advantage while you can.  Got new (used) Mobo on the way and will soon get main miner back online (only 1 280x atm in entertainment box right now main miner mobo died in a puff of smoke LOL).
You should be able to push 73+ Mhs with 4 cards ... (volt mod and proper heatsink paste application).

thanks utahjohn,
gpu1090, mem1000, i22, 1,169V (temp 70C fan 2000-2500)
can't get more, i try but can't :(
i have worst gpus 280x http://www.sapphiretech.com/presentation/product/product_index.aspx?pid=2022&lid=1,
new gpus after a few months simply stop working,
then i must to change every capacitor on gpu
if i want to work again
Are you tuning each card individually (there is great varience between GPU quality on each card), try to tune them one by one in your config. 
Smoked capacitors ??? is your power unstable or PS faulty, I have never had a blown capacitor.


my equipment and power are absolutely ok, my other gpus are working perfectly, bad chinese capacitors are my one big problem :(


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 03, 2014, 11:26:22 AM
@st.neman
Tried installing catalyst 14.9 and it totally borked hashrate using super kernel. 
Stick with catalyst 14.7 RC3 :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 03, 2014, 11:30:37 AM
@st.neman
Tried installing catalyst 14.9 and it totally borked hashrate using super kernel. 
Stick with catalyst 14.7 RC3 :)

I've experienced the hashrate drop on the new drivers too. I'll look into it: there might need to be two versions of the kernel.
You can still use the new drivers with the old bin files, it should work.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 03, 2014, 11:45:47 AM
@st.neman
Tried installing catalyst 14.9 and it totally borked hashrate using super kernel. 
Stick with catalyst 14.7 RC3 :)

I've experienced the hashrate drop on the new drivers too. I'll look into it: there might need to be two versions of the kernel.
You can still use the new drivers but the old bin files, it should work.
Didn't try old bin, but I reverted to 14.7RC3 (used DDU) and no problem :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 03, 2014, 11:49:50 AM
I suggest everybody keep "good binaries": make a backup of the fastest .bin files you have, so you can recover them in case of driver problems.
Also this will enable you to get 1 or 2 percent more hashrate because of compiler variance (try removing the bin and running 3 or 4 times to see the variance in action).
Or use the provided bin file (see the OP) which should be a good one.
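
A minimal sketch of the "good binaries" backup idea, in Python. The function names, the backup folder and the filename scheme (hashrate appended to the original name) are all hypothetical; the hashrate is whatever you read off the miner yourself.

```python
import shutil
from pathlib import Path


def backup_bin(bin_path, backup_dir, hashrate_mhs):
    """Copy a compiled .bin into backup_dir, tagging the name with its hashrate."""
    src = Path(bin_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{src.stem}_{hashrate_mhs:.2f}mhs{src.suffix}"
    shutil.copy2(src, dest)
    return dest


def fastest_backup(backup_dir):
    """Return the backed-up .bin with the highest hashrate tag, or None."""
    def tagged_rate(path):
        # Name looks like "<original>_<rate>mhs.bin"; pull the rate back out.
        return float(path.stem.rsplit("_", 1)[1][:-3])

    candidates = list(Path(backup_dir).glob("*_*mhs.bin"))
    return max(candidates, key=tagged_rate) if candidates else None
```

For example, after each delete-and-recompile run you could call `backup_bin("diamondHawaiiw256l8.bin", "good_bins", 18.46)` with the rate you observed, then restore whichever file `fastest_backup("good_bins")` returns under the name the miner expects.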


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 03, 2014, 11:54:53 AM
I suggest everybody keep "good binaries": make a backup of the fastest .bin files you have, so you can recover them in case of driver problems.
Also this will enable you to get 1 or 2 percent more hashrate because of compiler variance (try removing the bin and running 3 or 4 times to see the variance in action).
Or use the provided bin file (see the OP) which should be a good one.

I've added a "final advices" section to the second post with the quoted text and this one:
"I've experienced lower power usage with catalyst 14.9 compared to 14.6 beta (a bit less compared to 13 but still better). Speaking of optimization, this should be kept in mind: buy a power meter for your miner(s)!"


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: zgor on October 03, 2014, 12:18:12 PM
Hi,

4gpu 280x = 69Mhs

This is for DMD, right? Could you share your full set of parameters passed to the sgminer?

On my 280x I can't get past 7.45 Mhs with sph-sgminer version 4.1.0-105-gf02f. The hash rate increase due to this optimized kernel has been only about 7%.

Thanks.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 03, 2014, 12:20:08 PM
Hi,

4gpu 280x = 69Mhs

This is for DMD, right? Could you share your full set of parameters passed to the sgminer?

Me, on 280x I can't get past 7.45 Mhs, with the sph-sgminer version 4.1.0-105-gf02f. The hash rate increase due this optimized kernel has been about 7% only.

Thanks.
That is right in the MHs range I was getting with catalyst 14.9 driver ... use 14.7 RC3 or the bin provided by pallas and see what you get.

Here's my bin for 280x at 1170/150 getting 18.46MHs
https://mega.co.nz/#!pNcTnCLA!dszQHHMsK9RQngPQtQKsvKvvGz8KNqhPSO8HWwZNxD4


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: zgor on October 03, 2014, 12:36:11 PM
Hi,

4gpu 280x = 69Mhs

This is for DMD, right? Could you share your full set of parameters passed to the sgminer?

Me, on 280x I can't get past 7.45 Mhs, with the sph-sgminer version 4.1.0-105-gf02f. The hash rate increase due this optimized kernel has been about 7% only.

Thanks.
That is right in the MHs range I was getting with catalyst 14.9 driver ... use 14.7 RC3 or the bin provided by pallas and see what you get

Could not locate 14.7RC3 (I am on Linux), only 14.4, will try. The bin provided by pallas is for Hawaii (290), not for Tahiti (280), correct?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 03, 2014, 12:41:42 PM
Hi,

4gpu 280x = 69Mhs

This is for DMD, right? Could you share your full set of parameters passed to the sgminer?

Me, on 280x I can't get past 7.45 Mhs, with the sph-sgminer version 4.1.0-105-gf02f. The hash rate increase due this optimized kernel has been about 7% only.

Thanks.
That is right in the MHs range I was getting with catalyst 14.9 driver ... use 14.7 RC3 or the bin provided by pallas and see what you get

Could not locate 14.7RC3 (I am on Linux), only 14.4, will try. The bin provided by pallas is for Hawaii (290), not for Tahiti (280), correct?
Check my prev post, I added a link to a tahiti bin


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 03, 2014, 12:48:29 PM
Hi,

4gpu 280x = 69Mhs

This is for DMD, right? Could you share your full set of parameters passed to the sgminer?

Me, on 280x I can't get past 7.45 Mhs, with the sph-sgminer version 4.1.0-105-gf02f. The hash rate increase due this optimized kernel has been about 7% only.

Thanks.
That is right in the MHs range I was getting with catalyst 14.9 driver ... use 14.7 RC3 or the bin provided by pallas and see what you get

Could not locate 14.7RC3 (I am on Linux), only 14.4, will try. The bin provided by pallas is for Hawaii (290), not for Tahiti (280), correct?

this is 14.9 (non-beta) for linux:

http://support.amd.com/en-us/download/desktop?os=Linux+x86

the hawaii binary should work on tahiti as well.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Feneusens on October 03, 2014, 01:24:50 PM
Hi,

4gpu 280x = 69Mhs

This is for DMD, right? Could you share your full set of parameters passed to the sgminer?

Me, on 280x I can't get past 7.45 Mhs, with the sph-sgminer version 4.1.0-105-gf02f. The hash rate increase due this optimized kernel has been about 7% only.

Thanks.
That is right in the MHs range I was getting with catalyst 14.9 driver ... use 14.7 RC3 or the bin provided by pallas and see what you get

Could not locate 14.7RC3 (I am on Linux), only 14.4, will try. The bin provided by pallas is for Hawaii (290), not for Tahiti (280), correct?

this is 14.9 (non-beta) for linux:

http://support.amd.com/en-us/download/desktop?os=Linux+x86

the hawaii binary should work on tahiti as well.


Any benefit to upgrading to 14.9?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 03, 2014, 01:53:26 PM
Hi,

4gpu 280x = 69Mhs

This is for DMD, right? Could you share your full set of parameters passed to the sgminer?

Me, on 280x I can't get past 7.45 Mhs, with the sph-sgminer version 4.1.0-105-gf02f. The hash rate increase due this optimized kernel has been about 7% only.

Thanks.
That is right in the MHs range I was getting with catalyst 14.9 driver ... use 14.7 RC3 or the bin provided by pallas and see what you get

Could not locate 14.7RC3 (I am on Linux), only 14.4, will try. The bin provided by pallas is for Hawaii (290), not for Tahiti (280), correct?

this is 14.9 (non-beta) for linux:

http://support.amd.com/en-us/download/desktop?os=Linux+x86

the hawaii binary should work on tahiti as well.


Any benefit for upgrading to 14.9?

using the old binary, made with 14.6, I noticed less power usage with 14.9.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: DMDCreeper on October 03, 2014, 03:48:43 PM
I am getting better hash rates with this than with grs-sgminer (solo mining).  

R290  - 20.8 mh/s vs 18.3 mh/s with grs
R280x - 17.3 mh/s vs 14.6 mh/s with grs

I am not seeing much advantage to increase intensity above 20.

I am seeing that both cards are running hotter - using more power.  I have the 14.6 beta driver package installed.

However, I have two questions about what I am seeing.

1.  My R290s are set to 1100 for GPU speed, yet they are actually running under 1000.  I am seeing speeds of only 937.  auto gpu is disabled.  I increased powertune to 20 and the gpu speed increased slightly but not to the speed setting of 1100. Can anyone explain this?  

2.  I set the memory speeds on the R290s to 150.  With grs-sgminer I don't have to, they run at that speed simply because the memory is not used.  With this software, the memory on the R290s runs at 1250 and I cannot change that.  On the R280X, the memory is running at 1500 and I cannot lower it no matter what I set it to.  Can anyone explain this?

BTW, I was using the 13.12 driver package before and I could not get the hash rates that I am getting with the newer drivers.  The best I could get out of the R290s was about 13.8 mh/s.  With grs-sgminer and the 13.12 drivers I was getting 18.3 mh/s.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: zgor on October 03, 2014, 04:13:54 PM
Here's my bin for 280x at 1170/150 getting 18.46MHs
https://mega.co.nz/#!pNcTnCLA!dszQHHMsK9RQngPQtQKsvKvvGz8KNqhPSO8HWwZNxD4

Thanks! It's running stable, yielding 17.5 MHs for my 280x at 1129/1000 (could not lower the ram speed below that) and the 14.4 Catalyst drivers.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: zgor on October 03, 2014, 04:27:49 PM
this is 14.9 (non-beta) for linux:

Plan on trying it out.


the hawaii binary should work on tahiti as well.

Indeed it runs, but gives no MH increase :-(   Is it 64bits vs 32bits of the one provided by utahjohn?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 03, 2014, 04:39:17 PM
this is 14.9 (non-beta) for linux:

Plan on trying it out.


the hawaii binary should work on tahiti as well.

Indeed it runs, but gives no MH increase :-(   Is it 64bits vs 32bits of the one provided by utahjohn?
Mine is 64-bit Windows, 14.7 RC3, specifically compiled for the 280x

Let me know if 14.9 with compiled (14.7) bin helps power consumption if you test it.

Also see https://bitcointalk.org/index.php?topic=779598.msg9043545#msg9043545 about forcing lower mem clock by vbios mod :)
Always keep a backup copy of original vbios for each specific card if you need to reflash back to normal for resale ...
If you can't get linux tools to mod the vbios, PM me for options.
Also note that vbios is card mfr specific, so I can't just give a .ROM for Powercolor card and have it work on a different card ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 03, 2014, 05:11:52 PM
I am getting better hash rates with this than with grs-sgminer (solo mining).  

R290  - 20.8 mh/s vs 18.3 mh/s with grs
R280x - 17.3 mh/s vs 14.6 mh/s with grs

I am not seeing much advantage to increase intensity above 20.

I am seeing that both cards are running hotter - using more power.  I have the 14.6 beta driver package installed.

However, I have two questions about what I am seeing.

1.  My R290s are set to 1100 for GPU speed, yet they are actually running under 1000.  I am seeing speeds of only 937.  auto gpu is disabled.  I increased powertune to 20 and the gpu speed increased slightly but not to the speed setting of 1100. Can anyone explain this?  

2.  I set the memory speeds on the R290s to 150.  With grs-sgminer I don't have to, they run at that speed simply because the memory is not used.  With this software, the memory on the R290s runs are 1250 and I cannot change that.  On the R280X, the memory is running at 1500 and I cannot lower it no matter what I set it to.  Can anyone explain this?

BTW, I was using the 13.12 driver package before and I could not get the hash rates that I am getting with the newer drivers.  The best I could get out of the R290s was about 13.8 mh/s.  With grs-sgminer and the 13.12 drivers I was getting 18.3 mh/s.

1. High ram clock = lower Max core clock
2. If on Linux try:

http://epixoip.github.io/od6config/


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: zgor on October 03, 2014, 07:38:09 PM
Let me know if 14.9 with compiled (14.7) bin helps power consumption if you test it.

With your .bin and 14.9, the power consumption appears very slightly higher, but within 1%, and the hashrate is up about 0.6%.
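
A quick sketch, in Python, of what those two numbers mean for hashes per watt. The post only gives "within 1%" for power, so the range below (0% to 1% more power) is an assumption about how to read that, and the function name is made up.

```python
def efficiency_change(hashrate_gain, power_gain):
    """Relative change in hashes per watt, given fractional gains."""
    return (1 + hashrate_gain) / (1 + power_gain) - 1


# Hashrate up 0.6%; power somewhere between unchanged and 1% higher.
worst_case = efficiency_change(0.006, 0.01)
best_case = efficiency_change(0.006, 0.0)

print(f"hashes per watt: {worst_case:+.2%} to {best_case:+.2%}")
```

Either way the difference is small enough to sit inside normal measurement noise for a wall power meter.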


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 03, 2014, 07:47:52 PM
Let me know if 14.9 with compiled (14.7) bin helps power consumption if you test it.

With your .bin, and 14.9, the power consumption appear very slightly higher, but within 1%, and the hashrate is up about 0.6%.
I'll stick with 14.7 then (I run other algos also and need to compile bins).


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: DMDCreeper on October 03, 2014, 09:30:32 PM
I am getting better hash rates with this than with grs-sgminer (solo mining).  

R290  - 20.8 mh/s vs 18.3 mh/s with grs
R280x - 17.3 mh/s vs 14.6 mh/s with grs

I am not seeing much advantage to increase intensity above 20.

I am seeing that both cards are running hotter - using more power.  I have the 14.6 beta driver package installed.

However, I have two questions about what I am seeing.

1.  My R290s are set to 1100 for GPU speed, yet they are actually running under 1000.  I am seeing speeds of only 937.  auto gpu is disabled.  I increased powertune to 20 and the gpu speed increased slightly but not to the speed setting of 1100. Can anyone explain this?  

2.  I set the memory speeds on the R290s to 150.  With grs-sgminer I don't have to, they run at that speed simply because the memory is not used.  With this software, the memory on the R290s runs are 1250 and I cannot change that.  On the R280X, the memory is running at 1500 and I cannot lower it no matter what I set it to.  Can anyone explain this?

BTW, I was using the 13.12 driver package before and I could not get the hash rates that I am getting with the newer drivers.  The best I could get out of the R290s was about 13.8 mh/s.  With grs-sgminer and the 13.12 drivers I was getting 18.3 mh/s.

1. High ram clock = lower Max core clock
2. If on Linux try:

http://epixoip.github.io/od6config/

I should have noted I am running Windows 7 64 bit.  I am trying to lower the RAM speed but it is not budging no matter what settings I try.  Likewise the GPU core clock is staying under 1000.

I do not have this problem with grs-sgminer miner. 

Even so, this package is giving me a higher hash rate than grs-sgminer so I will stay with it.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: anatolikostis on October 03, 2014, 10:01:35 PM
I should have noted I am running Windows 7 64 bit.  I am trying to lower the RAM speed but it is not budging no matter what settings I try.  Likewise the GPU core clock is staying under 1000.

I do not have this problem with grs-sgminer miner. 

Even so, this package is giving me a higher hash rate than grs-sgminer so I will stay with it.
your gpu frequency fluctuation is caused by the driver's power limit function.
just try reducing the gpu voltage, maybe 2-3 steps down... ;)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: optiplex on October 05, 2014, 05:05:20 AM
I should have noted I am running Windows 7 64 bit.  I am trying to lower the RAM speed but it is not budging no matter what settings I try.  Likewise the GPU core clock is staying under 1000.

I do not have this problem with grs-sgminer miner. 

Even so, this package is giving me a higher hash rate than grs-sgminer so I will stay with it.
your gpu freq. fluct. is caused by driver power limit function.
just try to reduce gpu voltage - may be 2-3 steps backward... ;)
That sounds like a total line of crap to me.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: qaz6767 on October 05, 2014, 07:04:13 AM
Please help me set up an R9 290 Tri-X! Thank you!


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 05, 2014, 07:11:56 AM
Help set up R9 290 Tri-x! Thank you!
Read page 1, then ask a question if you need help :)  (Sorry for being ill-tempered, I just had to block a credit card due to fraudulent charges being made on it, cash only till the new card arrives).


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: qaz6767 on October 05, 2014, 09:02:58 AM
I cannot replace the bin file. After restarting the miner, the old bin file is back again. How do I replace it? Thank you


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 05, 2014, 11:05:43 AM
I can not replace the bin file. After restarting the miner, is presented again the old bin file. How to replace it? Thank you

If a bin file with the same name already exists, the miner shouldn't overwrite it.
So it's best to replace the bin file the miner creates with mine, using the same filename.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: qaz6767 on October 05, 2014, 01:58:08 PM
I can not replace the bin file. After restarting the miner, is presented again the old bin file. How to replace it? Thank you

if a bin file with the same name already exists, it shouldn't replace it.
so best replace the bin file the miner creates with mine, using the same filename.
Thanks!!!


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 07, 2014, 08:02:36 AM
It looks like the compiler included in 14.9 drivers produces binaries which run considerably slower than older releases.
The same happens for other algorithms as well, for example on X11.
I've tweaked the code a bit but I still can't reach full speed, so I will keep on trying or, if needed, wait for a new driver release.
Meanwhile, if you are on 14.9, use the provided bin file instead of the kernel source in .cl format.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 09, 2014, 10:01:33 PM
Any chance of getting a worksize 128 super optimized kernel to try on HD5450? (256 too large)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 10, 2014, 07:40:39 AM
Any chance of getting a worksize 128 super optimized kernel to try on HD5450? (256 too large)

The changes needed to make it work at 128 are easy, but it probably won't be tuned well for such a card: I've tested on r9 290 and 7950 while developing. It might not even work at all.
If you want to try I can send you a file or the changes and if it works well we can post it here.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 11, 2014, 01:24:53 AM
been busy working with miningfield setting up my USA mirror of pools.
yah if you can post a modified .cl for ws 128 I'll test it on 5450

Made a lot of progress on USA mirror setup today :) might have it online by monday :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 11, 2014, 08:58:16 AM
been busy working with miningfield setting up my USA mirror of pools.
yah if you can post a modifed .cl for ws 128 I'll test it on 5450

Made a lot of progress on USA mirror setup today :) might have it online by monday :)

just change the initial part of the main function to:

for (u = get_local_id(0); u < 256; u += get_local_size(0)) {
  T2[u] = ROTL64(T0[u], 16UL);
  T3[u] = ROTL64(T0[u], 24UL);
  T4[u] = ROTL64(T0[u], 32UL);
  T5[u] = ROTL64(T0[u], 40UL);
  T6[u] = ROTL64(T0[u], 48UL);
  T7[u] = ROTL64(T0[u], 56UL);
}

this part was blocking worksize < 256.
as I said previously, it still might not work or be very slow for tuning reasons.
let me know if it works.
thanks!
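
A minimal host-side sketch in C of why the strided loop above removes the worksize restriction. It is not the kernel itself: the work-items are simulated one after another on the CPU, `local_id` and `local_size` stand in for get_local_id(0) and get_local_size(0), only the T2 table is modelled, and the T0 contents are arbitrary test values (all of these names and values are assumptions for illustration).

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 256

/* 64-bit rotate left; n must be in the range 1..63. */
uint64_t rotl64(uint64_t x, unsigned n) {
    return (x << n) | (x >> (64 - n));
}

/* One simulated work-item: fills entries local_id, local_id + local_size, ...
   hits[] counts how often each entry is written (hypothetical helper). */
void fill_for_item(const uint64_t *t0, uint64_t *t2, unsigned *hits,
                   size_t local_id, size_t local_size) {
    for (size_t u = local_id; u < TABLE_SIZE; u += local_size) {
        t2[u] = rotl64(t0[u], 16);
        hits[u]++;
    }
}

/* Returns 1 if a work-group of local_size items writes every one of the
   256 entries exactly once with the expected value, 0 otherwise. */
int workgroup_covers_table(size_t local_size) {
    uint64_t t0[TABLE_SIZE], t2[TABLE_SIZE];
    unsigned hits[TABLE_SIZE] = {0};

    for (size_t u = 0; u < TABLE_SIZE; u++) {
        t0[u] = 0x0123456789ABCDEFULL * (u + 1); /* arbitrary test data */
        t2[u] = 0;
    }
    for (size_t id = 0; id < local_size; id++)
        fill_for_item(t0, t2, hits, id, local_size);

    for (size_t u = 0; u < TABLE_SIZE; u++)
        if (hits[u] != 1 || t2[u] != rotl64(t0[u], 16))
            return 0;
    return 1;
}
```

With a stride of get_local_size(0), a work-group of 128 items makes two passes and one of 64 makes four, so the whole table is filled either way. Whether that is fast on a given card is a separate question, as noted above.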


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on October 11, 2014, 09:05:31 AM
been busy working with miningfield setting up my USA mirror of pools.
yah if you can post a modifed .cl for ws 128 I'll test it on 5450

Made a lot of progress on USA mirror setup today :) might have it online by monday :)

just change the initial part of the main function to:

for (u = get_local_id(0); u < 256; u += get_local_size(0)) {
  T2[u] = ROTL64(T0[u], 16UL);
  T3[u] = ROTL64(T0[u], 24UL);
  T4[u] = ROTL64(T0[u], 32UL);
  T5[u] = ROTL64(T0[u], 40UL);
  T6[u] = ROTL64(T0[u], 48UL);
  T7[u] = ROTL64(T0[u], 56UL);
}

this part was blocking worksize < 256.
as I said previously, it still might not work or be very slow for tuning reasons.
let me know of it works.
thanks!
thanks will do as soon as I get time :)  VM server box has 5450 in it might as well let the host make use of it, I can get ~0.25MHs with normal groestlcoin kernel with ws 128 on it ... it's running 24/7/365 anyway :)

Only has 80 shaders LOL it's a dwarf but is air cooled hehe
about on par with intel HD GPU (10 shaders) in G3220 CPU as far as hashrate



Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: qaz6767 on October 20, 2014, 12:26:52 PM
Help! What should the bat file look like to start Diamond? I cannot get it running on a 280x card. Thanks


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 20, 2014, 12:29:45 PM
been busy working with miningfield setting up my USA mirror of pools.
yah if you can post a modifed .cl for ws 128 I'll test it on 5450

Made a lot of progress on USA mirror setup today :) might have it online by monday :)

just change the initial part of the main function to:

for (u = get_local_id(0); u < 256; u += get_local_size(0)) {
  T2[u] = ROTL64(T0[u], 16UL);
  T3[u] = ROTL64(T0[u], 24UL);
  T4[u] = ROTL64(T0[u], 32UL);
  T5[u] = ROTL64(T0[u], 40UL);
  T6[u] = ROTL64(T0[u], 48UL);
  T7[u] = ROTL64(T0[u], 56UL);
}

this part was blocking worksize < 256.
as I said previously, it still might not work or be very slow for tuning reasons.
let me know of it works.
thanks!
thanks will do as soon as I get time :)  VM server box has 5450 in it might as well let the host make use of it, I can get ~0.25MHs with normal gorestlcoin kernel with ws 128 on it ... it's running 24/7/365 anyway :)

Only has 80 shaders LOL it's a dwarf but is air cooled hehe
about on par with intel HD GPU (10 shaders) in G3220 CPU as far as hashrate

just curious... did you manage to make it work? if yes, what hashrate?

EDIT: it has about half the shaders of a nexus 9 :-D


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on November 04, 2014, 12:43:07 PM
A BIT OF HISTORY

The first gpu miner for groestlcoin and similar was sph-sgminer by phm. Optimizing the original implementation was trivial (almost 3x the speed could be achieved!), so probably there are tens of optimized versions around, many of which have been kept private: mining groestlcoin and similar was always unfair for most people, at least for non-devs.
Hopefully this kernel will end this and should also level the field between amd and nvidia.
I believe my version is faster than many of the other kernels because of the time I dedicated to it and the thousands of tests I did.

FINAL ADVICES

I suggest keeping "good binaries": make a backup of the fastest .bin files you have, so you can recover them in case of driver problems.
Also this will enable you to get 1 or 2 percent more hashrate because of compiler variance (try removing the bin and running 3 or 4 times to see the variance in action).
Or use the provided bin file (see the OP) which should be a good one.

I've experienced lower power usage with catalyst 14.9 compared to 14.6 beta (a bit less compared to 13 but still better). Speaking of optimization, this should be kept in mind: buy a power meter for your miner(s)!
But it looks like the compiler included in 14.9 drivers produces binaries which run considerably slower than older releases. If you are on 14.9, use the provided bin file instead of the kernel source in .cl format.

If you want to fix it for 14.9, remove the naive implementation of the B64_# macros and use swizzle. Worked for me.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on November 04, 2014, 12:50:23 PM
A BIT OF HISTORY

The first gpu miner for groestlcoin and similar was sph-sgminer by phm. Optimizing the original implementation was trivial (almost 3x the speed could be achived!), so probably there are tens of optimized versions around, many of which have been kept private: mining groestlcoin and similar was always unfair for most people, at least for non-devs.
Hopefully this kernel will end this and should also level the field between amd and nvidia.
I believe my version is faster than many of the other kernels because of the time I dedicated to it and the thousands of tests I did.

FINAL ADVICES

I suggest to keep "good binaries": make a backup of the fastest .bin files you have, so you can recover them in case of driver problems.
Also this will enable you to get 1 o 2 percent more hashrate because of compiler variance (try removing the bin and running 3/4 times to see the variance in action).
Or use the provided bin file (see the OP) which should be a good one.

I've experienced lower power usage with catalyst 14.9 compared to 14.6 beta (a bit less compared to 13 but still better). Speaking of optimization, this should be kept in mind: buy a power meter for your miner(s)!
But it looks like the compiler included in 14.9 drivers produces binaries which run considerably slower than older releases. If you are on 14.9, use the provided bin file instead of the kernel source in .cl format.

If you want to fix it for 14.9, remove the naive implementation of the B64_# macros and use swizzle. Worked for me.

Thanks, but I've already tried every combination of bitwise operations and vectors (as_uchar...): I could make it work but hashrate is about 20 Mh/s vs 25 Mh/s on 14.6 beta.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on November 04, 2014, 01:09:20 PM
A BIT OF HISTORY

The first gpu miner for groestlcoin and similar was sph-sgminer by phm. Optimizing the original implementation was trivial (almost 3x the speed could be achived!), so probably there are tens of optimized versions around, many of which have been kept private: mining groestlcoin and similar was always unfair for most people, at least for non-devs.
Hopefully this kernel will end this and should also level the field between amd and nvidia.
I believe my version is faster than many of the other kernels because of the time I dedicated to it and the thousands of tests I did.

FINAL ADVICES

I suggest to keep "good binaries": make a backup of the fastest .bin files you have, so you can recover them in case of driver problems.
Also this will enable you to get 1 o 2 percent more hashrate because of compiler variance (try removing the bin and running 3/4 times to see the variance in action).
Or use the provided bin file (see the OP) which should be a good one.

I've experienced lower power usage with catalyst 14.9 compared to 14.6 beta (a bit less compared to 13 but still better). Speaking of optimization, this should be kept in mind: buy a power meter for your miner(s)!
But it looks like the compiler included in 14.9 drivers produces binaries which run considerably slower than older releases. If you are on 14.9, use the provided bin file instead of the kernel source in .cl format.

If you want to fix it for 14.9, remove the naive implementation of the B64_# macros and use swizzle. Worked for me.

Thanks, but I've already tried any combination of bitwise operations and vectors (as_uchar...): I could make it work but hashrate is about 20 Mh/s vs 25 Mh/s of 14.6 beta.

Ah, I see - I just saw it go from 7MH/s to... I think 20, on 14.9, so I figured it worked; never mind, then.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on November 04, 2014, 01:27:57 PM
Thanks, but I've already tried any combination of bitwise operations and vectors (as_uchar...): I could make it work but hashrate is about 20 Mh/s vs 25 Mh/s of 14.6 beta.
Ah, I see - I just saw it go from 7MH/s to... I think 20, on 14.9, so I figured it worked; never mind, then.

It's funny how some little changes lead to huge hashrate drops (depending on compiler version); but it's true for memory intensive algos only, as far as I can see.
Maybe your own version doesn't have this problem, then ;-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on November 04, 2014, 01:43:39 PM
Thanks, but I've already tried any combination of bitwise operations and vectors (as_uchar...): I could make it work but hashrate is about 20 Mh/s vs 25 Mh/s of 14.6 beta.
Ah, I see - I just saw it go from 7MH/s to... I think 20, on 14.9, so I figured it worked; never mind, then.

It's funny how some little changes lead to huge hashrate drops (depending on compiler version); but it's true for memory intensive algos only, as far as I can see.
Maybe your own version doesn't have this problem, then ;-)

Not true for only memory intensive algos - one little screwup and the idiot compiler will double the size of your code, it won't fit in the code cache, and be slow lol


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: cryptonit on November 16, 2014, 07:31:32 AM
https://pbs.twimg.com/media/B2i70_JIMAA4DJX.png
http://multipool.bit.diamonds/
cloud@bit.diamonds


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: cryptonit on November 23, 2014, 12:23:11 PM
@pallas could u find the actual state of the art mining software for DMD Groestl and post links in DMD ANN we then will update software on website

it would be great if it include ur performance boost tricks already.....

i think no one from our core team runs AMD cards any longer so ur help would be welcome


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on November 23, 2014, 02:39:19 PM
@pallas could u find the actual state of the art mining software for DMD Groestl and post links in DMD ANN we then will update software on website

it would be great if it include ur performance boost tricks already.....

i think no one from our core team runs AMD cards any longer so ur help would be welcome


the problem with my kernel is that, no matter how hard I try, I can't get the best hashrate on 14.9 drivers (only 20 Mh/s vs 25 with 14.6), so it's not enough to just replace diamond.cl on sgminer 4.1 or 5.
that's why I still prefer people visit this post, with all the info and troubleshooting, for best performance.
the only way to make it clean is creating a fork of sgminer, for tahiti and hawaii cards only, with the precompiled binary; some changes are needed in order for it to always use the binary and not compile the cl sources.
not sure I like it but it might work for many... what do you think?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on November 23, 2014, 03:06:50 PM
@pallas could u find the actual state of the art mining software for DMD Groestl and post links in DMD ANN we then will update software on website

it would be great if it include ur performance boost tricks already.....

i think no one from our core team runs AMD cards any longer so ur help would be welcome


the problem with my kernel is that, no matter how hard I try, I can't get the best hashrate on 14.9 drivers (only 20 Mh/s vs 25 with 14.6), so it's not enough to just replace diamond.cl on sgminer 4.1 or 5.
that's why I still prefer people visit this post, with all the info and troubleshooting, for best performance.
the only way to make it clean is creating a fork of sgminer, for tahiti and hawaii cards only, with the precompiled binary; some changes are needed in order for it to always use the binary and not compile the cl sources.
not sure I like it but it might work for many... what do you think?

Just an aside - I've gotten the same results - 21MH/s vs. 25MH/s. It's frustrating - but all I've tried is the lookup table implementation, so far.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on November 23, 2014, 04:15:49 PM
@pallas could u find the actual state of the art mining software for DMD Groestl and post links in DMD ANN we then will update software on website

it would be great if it include ur performance boost tricks already.....

i think no one from our core team runs AMD cards any longer so ur help would be welcome


the problem with my kernel is that, no matter how hard I try, I can't get the best hashrate on 14.9 drivers (only 20 Mh/s vs 25 with 14.6), so it's not enough to just replace diamond.cl on sgminer 4.1 or 5.
that's why I still prefer people visit this post, with all the info and troubleshooting, for best performance.
the only way to make it clean is creating a fork of sgminer, for tahiti and hawaii cards only, with the precompiled binary; some changes are needed in order for it to always use the binary and not compile the cl sources.
not sure I like it but it might work for many... what do you think?

Just an aside - I've gotten the same results - 21MH/s vs. 25MH/s. It's frustrating - but all I've tried is the lookup table implementation, so far.

Well, that means there is probably little room for improvements on that kind of implementation.
I'm curious to see if a bitslice version can be faster on AMD gpus, but I have no time (and no interest because of negative revenue) to try it myself.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on November 23, 2014, 04:18:11 PM
@pallas could u find the actual state of the art mining software for DMD Groestl and post links in DMD ANN we then will update software on website

it would be great if it include ur performance boost tricks already.....

i think no one from our core team runs AMD cards any longer so ur help would be welcome


the problem with my kernel is that, no matter how hard I try, I can't get the best hashrate on 14.9 drivers (only 20 Mh/s vs 25 with 14.6), so it's not enough to just replace diamond.cl on sgminer 4.1 or 5.
that's why I still prefer people visit this post, with all the info and troubleshooting, for best performance.
the only way to make it clean is creating a fork of sgminer, for tahiti and hawaii cards only, with the precompiled binary; some changes are needed in order for it to always use the binary and not compile the cl sources.
not sure I like it but it might work for many... what do you think?

Just an aside - I've gotten the same results - 21MH/s vs. 25MH/s. It's frustrating - but all I've tried is the lookup table implementation, so far.

Well, that means there is probably little room for improvements on that kind of implementation.
I'm curious to see if a bitslice version can be faster on AMD gpus, but I have no time (and no interest because of negative revenue) to try it myself.

I think it might be - 14.9 killed my X11 hashrate at first, down from 10MH/s on 290X to 2 point something. After redesigning Groestl, still based on lookup tables, I got it back up to 6.5MH/s or so. Still dismal...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on December 27, 2014, 10:18:22 PM
Could someone please share their hashrate with r9 285? I'm curious to see if it outperforms the 280 and how much power it uses.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: lpedretti on December 29, 2014, 04:19:19 PM
I was having issues using the optimized cl and precompiled binaries: no HW errors, but only very occasional shares, and pools reported a very low hashrate. The problem turned out to be the sgminer version I was using. I'm now using sgminer-develop, which has neoscrypt optimized kernels, and with that version it works like a charm!
Running Lubuntu 14.04 with 14.x (don't remember which one)
Clock at 930, 0.95v, 13.5 Mh/s each XFX-7970DD and Gigabyte 280x windforce

Great job!

Best regards!


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on December 31, 2014, 10:59:08 PM
Hi All,

I registered here because I need a little help from those of you who develop this OpenCL kernel.
A month ago I found the Groestl algo on the amd dev forums, thanks to Wolf0 who mentioned it there. I thought it would be a good algo to test my skills in GCN asm, and I'd like to play with it, maybe I can optimize it better than the OCL compiler (or maybe not, but at least I can learn from it anyway).

So the help I'm seeking is this:
- Please send me the latest version of this kernel (I see everyone altering it a bit, just don't know which is which)
- And pls give me a test vector with these things:
  - global kernel dimensions, workgroup size(I guess it's 256)
  - kernel parameters: dump "char *block", and the "target" value
- And of course the above testcase must find a GroestlCoin hash.

Thank you in advance

(I already sent it to Wolf0 on the amd dev forums, but the moderation there can take more time, and later I found this more appropriate place for my question)

And have a Happy New Year, btw


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on January 01, 2015, 12:17:47 AM
Hi All,

I registered here because I need a little help from you, who develops this OpenCL kernel.
A month ago I've found the Groestl algo on the amd dev forums, thanks to Wolf0 who mentioned it on there. I thought it will be a good algo to test my skills in GCN asm, and I'd like to play with it, maybe I can optimize it better than the OCL compiler (or maybe not, but at least I can learn from it anyways).

So the help I'm seeking is this:
- Please send me the latest version of this kernel (I see everyone altering it a bit, just don't know which is which)
- And pls give me a test vector with these things:
  - global kernel dimensions, workgroup size(I guess it's 256)
  - kernel parameters: dump "char *block", and the "target" value
- And of course the above testcase must find a GroestlCoin hash.

Thank you in advance

(I already sent it to Wolf0 on the amd dev forums, but the moderation there can take more time there and later I found this more appropriate place for my question)

And have a Happy New Year, btw

I don't check there often - how exactly do you do GCN ASM? I'm interested.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 01, 2015, 09:22:12 AM
how exactly do you do GCN ASM? I'm interested.

I wrote an assembler for it. You can try it at realhet.wordpress.com. (Use Cat 13.4 or older, otherwise examples will crash.)

My first thoughts compiling the OCL kernel (on a 7770):
- It's 2.5 times bigger than the instruction cache. (And there are no loops in it, so I guess it often reads from ram.)
- T0 and T1 are located in the gpu ram.
- VReg count is above 128. -> That allows only the minimum number of 4 wavefronts/CU, so there is no latency hiding via parallel wavefronts.
- Too short a kernel with too much initialization: ideally I'd let every workgroup run for a minimum of 0.5 sec, so kernel launch and LDS table initialization would take no time compared to the actual work.
- Better instructions: BitFieldExtract for 64bit rotate, ds_read2_b64 for 128 bit LDS read.
- Balancing load between LDS and L1 cache.

I don't know which of the above is an actual bottleneck or will be useful, but I wanna find out.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 01, 2015, 11:04:15 PM
how exactly do you do GCN ASM? I'm interested.

I wrote an assembler for it. You can try it at realhet.wordpress.com. (Use Cat 13.4 or older, otherwise examples will crash.)

My first thoughts compiling the OCL kernel (on a 7770):
- Its 2.5 times bigger than the instruction cache. (and there are no loops in it, so I guess it often reads from ram.)
- T0 and T1 is located in the gpu ram.
- VReg count is above 128. -> that allows only the minimum no of 4 wavefronts/CU. So there are no
latency hiding via parallel wavefronts.
- too short kernel with too much initialization: Ideally I'd let every workgroup run for a minimum of 0.5 sec. So kernel launch and LDS table initialization would take no time compared to the actual work.
- better instructions: BitFieldExtract for 64bit rotate, ds_read2_b64 for 128 bit LDS read.
- balancing load between LDS and L1 cache

I don't know which of the above is an actual bottleneck or will be usefull, but I wanna find out.

I'm going to try your assembler, very interesting project!
About your observations, first of all keep in mind that the compiler is pretty unpredictable: many optimizations just do not make sense but they work. Also I only tested it with Tahiti and Hawaii cards.
Kernel size: it can easily be made smaller (for example by including a single table instead of 2), but in all my tests it doesn't bring any advantage.
T0 and T1 are not in gpu ram: it would be much slower if they were. They are in constant ram, I believe.
Short kernel: even though you might design it to process multiple hashes in a single run, I think it's not worth it. Simple proof: algos which are tens of times faster than groestl, like keccak, still do a single hash per kernel run. Another reason is that making the kernel last longer will result in more rejected shares.
Balancing load between local ram and cache (or any other balancing of memory reads): I believe that many optimizations that do not make sense work because they introduce little delays that permit better memory reads between the threads. They sort of fit together better. In fact, modifying other parts of the code may make the same optimization worthless. Interesting speed variations may be brought by switching instructions or grouping local ram reads differently, for example.

Hope that helps.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Atomicat on January 02, 2015, 04:10:02 AM
Learn something new every day.  My instinct is to push it till it moves, crank it to 11, but that doesn't work with the R9-290.  Doesn't work because it's throttling for power considerations long before you're hitting 1150.  Just dropped my voltages right down and finally got 23.5 at 1125, I-20.  New understanding of how to handle this card will make for better benchmarks, for true.

Oh, nice price jump today, from 60k to 70k.  Yeah, I'll take credit for that.  Put some orders up last night, woke up to find that I basically owned it on Cryptsy (http://i633.photobucket.com/albums/uu53/acatphoto/Temp/DMD1.gif)!  Drop a line with your DMD wallet address, I'll give you well earned reward from my ill gotten gains.



Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 02, 2015, 10:05:10 AM
"T0 and T1 are not in gpu ram: it would be much slower if they were."

Thanks for the ideas!

Actually I knew it from the disasm, that it uses ram instead of LDS for T0, T1. (Note that there is no such thing as constant memory in GCN. It can read a single value with the Scalar ALU and broadcast it across all the wavefront's workitems or it can read 64 values for a whole wavefront by the Vector ALU. Because T0 is addressed by data, it must be read by the VALU using L1 cache (there is a scalar cache too)).
And from there I had the idea of balancing the two sources (LDS and L1).

I did a simple test: renamed T0 and T1, and allocated a new T0 and a T1 from __local. And then initialized them properly. Result: all tbuffer memory read instructions disappeared from the disasm, and the hash rate dropped from 3.99 MH/s down to 3.841. I don't know how big the penalty of copying T0 and T1 into the LDS is, though.
By the 'textbook': L1 cache can read 4bytes/cycle, LDS: 8bytes/cycle

And yes, the OpenCL compiler is totally unpredictable.

Important question: In the MH/s calculation 1 kernel thread execution means 2 Hashes, right?

(I have a HD7770 @1000MHz, and it's at 4MH/s which looks similar to Wolf0's report on dev.amd.com: R9 290 @1200 20MH/s. Using 14.9 where the compiler generates slower code.)

Now I have to convert all the math into asm. That's painful :D


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 03, 2015, 03:22:57 AM
@realhet
please share your work with the rest of us if the assembly optimization works out.  Looked at r9-285 review today, looks promising as long as smaller memory bus (256 vs 384) doesn't bottleneck.  Should be faster than 280x and on par or maybe even better than 290 with lower power requirement ...
May need tweaks for each architecture ... can it be written to detect which card it's running on and auto select best?

A quote from AnandTech review :
Quote
A complete Tonga configuration will contain 2048 SPs, just like its Tahiti predecessor, with 1792 of those SPs active on R9 285. This is paired with the card’s 32 ROPs attached to a 256-bit memory bus, and a 4-wide (4 geometry processor) frontend. Compared to Tahiti the most visible change is the memory bus size, which has gone from 384-bit to 256-bit. In our look at GCN 1.2 we’ll see why AMD is able to get away with this – the short answer is compression – but it’s notable since at an architectural level Tahiti had to use a memory crossbar between the ROPs and memory bus due to their mismatched size (each block of 4 ROPs wants to be paired with a 32bit memory channel). The crossbar on Tahiti exposes the cards to more memory bandwidth, but it also introduces some inefficiencies of its own that make the subject a tradeoff.

Meanwhile Tonga’s geometry frontend has received an upgrade similar to Hawaii’s, expanding the number of geometry units (and number of polygons per clock) from 2 to 4. And there are actually some additional architectural efficiency improvements in here that should further push performance per clock beyond what Hawaii can do in the real world.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 03, 2015, 01:26:36 PM
Yes they are two chained iterations of groestl.
But they run a bit different code: the first is optimised because part of the input is known in advance and the second because the whole hash is not needed.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 03, 2015, 01:28:54 PM
Is anyone willing to donate or lend a 285 so I can optimise for Tonga?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 03, 2015, 07:34:53 PM
Hi again,

Finally I'm at the point where it has produced a correct result for the first time.
The speed test was surprisingly good: HD7770 1000MHz (640 streams, GCN1.0 chip), Cat:14.9(the 20% slower driver), total workitems: 256*10*512, elapsed: 558.613 ms,  4.693 MH/s,   gain:   1.17x where the baseline is the opencl implementation (found on amd.com at Wolf0's post) which is 4.00MH/s.
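
The MH/s figure above can be re-derived with a few lines of C. This is only a sketch of the arithmetic: the function names are made up for illustration, and it assumes each work-item counts as 2 hashes (the two chained groestl iterations confirmed earlier in the thread).

```c
#include <stdint.h>

/* Total work-items for a launch described as a * b * c,
   e.g. 256 * 10 * 512 in the test above (hypothetical helper). */
uint64_t total_work_items(uint64_t a, uint64_t b, uint64_t c) {
    return a * b * c;
}

/* Hash rate in MH/s from the number of work-items, the number of hashes
   each work-item computes, and the elapsed time in milliseconds. */
double megahashes_per_second(uint64_t work_items, unsigned hashes_per_item,
                             double elapsed_ms) {
    double hashes = (double)work_items * (double)hashes_per_item;
    double seconds = elapsed_ms / 1000.0;
    return hashes / seconds / 1e6;
}
```

For example, megahashes_per_second(total_work_items(256, 10, 512), 2, 558.613) comes out at roughly 4.69, in line with the number reported above. Counting only 1 hash per work-item would give half of that.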

And the first optimization was really a cheap shot ;D. Unlike ocl, I was able to make it fit under 128 VGPRs (I use 120 currently, it was kinda close). So as each Vector ALU can choose from 2 wavefronts at any time, latency hiding finally kicked in -> elapsed: 279.916 ms  9.365 MH/s   gain:   2.34x
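
A rough sketch in C of the occupancy arithmetic behind the 128 VGPR threshold. The limits used here are assumptions about GCN hardware rather than something stated in the thread: 256 VGPRs available per SIMD lane, at most 10 wavefronts per SIMD, 4 SIMDs per CU, and VGPRs allocated in blocks of 4. The function names are made up for illustration.

```c
/* Assumed GCN limits (not taken from the thread). */
#define VGPRS_PER_SIMD     256u
#define MAX_WAVES_PER_SIMD 10u
#define SIMDS_PER_CU       4u
#define VGPR_GRANULARITY   4u

/* Wavefronts that can be resident on one SIMD for a kernel using
   `vgprs` vector registers per work-item. */
unsigned waves_per_simd(unsigned vgprs) {
    if (vgprs == 0)
        return MAX_WAVES_PER_SIMD;
    /* Round the request up to the allocation granularity. */
    unsigned alloc = (vgprs + VGPR_GRANULARITY - 1) / VGPR_GRANULARITY
                     * VGPR_GRANULARITY;
    unsigned waves = VGPRS_PER_SIMD / alloc;
    return waves > MAX_WAVES_PER_SIMD ? MAX_WAVES_PER_SIMD : waves;
}

/* Wavefronts resident on a whole compute unit. */
unsigned waves_per_cu(unsigned vgprs) {
    return SIMDS_PER_CU * waves_per_simd(vgprs);
}
```

Under these assumptions a kernel with more than 128 VGPRs gets 1 wavefront per SIMD (4 per CU), while the 120 VGPR version gets 2 per SIMD (8 per CU), which is where the latency hiding comes from.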

And I'm full of ideas to try :D Next will be to shrink the code to fit into the 32KB instruction cache. Now it is 300KB, it's a massive macro unroll at the moment. The original pallas' ocl version is 110KB, I wonder why there's a 3x multiplier though. Anyways, on GCN we can have loops with only 1 cycle of overhead, or I can even write subroutines with call/ret instructions, so I have to try how fast it is when the instruction cache has no misses at all.

OpenCL thing: While I simplified the code (I chopped out the first/last round optimizations because they would be hard to implement in asm atm) I noticed something I already knew from the past: the OpenCL -> llvm -> amd_il -> gcn_asm toolchain will eliminate all the constant calculations and all the calculations whose results are not used at all. I watched the times while making these modifications and it stayed around 4MH/s. Sometimes it dropped below 3.7 when I put measurement code at various places to compare the original kernel with my kernel: if(gid==1234 && flag==1) for(int i=0; i<16; ++i) output[i] = g[i];


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 03, 2015, 10:21:32 PM
Great progress, very interesting!
The first improvement, 1.17x, is about the same as the 20% that is lost on 14.9 compared to 14.6 beta, so the two implementations are equivalent.
The second, 2.34x, is really impressive: I have tried multiple times to reduce the number of variables as much as possible (down to 3x16 ulong arrays, 2 ulong and 2 uint), but the results were always worse, so probably that improvement can't be implemented in opencl, or at least I don't know how to.
The same for code size and instruction cache: I was able to squeeze it to about 50K, but at a speed loss.
About the compiler that can eliminate the constant calculations: I noticed that, but doing it by hand works best both in terms of speed and kernel size.
Finally, a question about your work: do you plan to open-source it?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 04, 2015, 11:26:22 PM
Hi,

The Groestl asm code is open source (I just uploaded it). My compiler and IDE are closed source though, but once you have compiled the kernel with them into an .ELF binary, you can use it on Linux too, not just Windows.

The first asm version is documented on my blog. Check it out here -> http://realhet.wordpress.com/
It's only a development version, and the kernel parameters are incompatible with Pallas's OpenCL kernel. I have a hard time reverse engineering how params are passed through registers, not to mention that it can be different in every Catalyst version, so I keep the parameters simple. One buffer in pinned memory for all data I/O is the fastest anyway.
I'm planning to post about many optimizations. Let's see how far I can go. Using only 128 VGPRs it is already at a 2.3x speedup and I'm expecting more. ;D
I believe that OCL is so generalized and so far from the actual GCN hardware that it is worth going low level for some projects. (Not all projects: for example, I failed with Litecoin. It's better for it to stay in maintainable OCL code.)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 05, 2015, 05:12:13 PM
First 2 optimizations are done, I wrote a blog post about them. I'm at 2.65x now.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 05, 2015, 06:29:19 PM
Thanks very much!
Unfortunately it appears the two optimisations are hard to implement in opencl: minimum code size I was able to achieve was 50K, far from 32k, and reducing the number of variables as much as possible didn't provide any speed up. Maybe the number of vregs is still higher than 128...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 06, 2015, 04:39:59 AM
Hi, I think I'm done with the things I wanted to try. It's at 3.48x now ;D Check the second part of the optimizations: http://realhet.wordpress.com/
It's really cool how the ALU, the LDS and the L1 cache can cooperate on the same job.

Let's discuss how my kernel can be used in the miner program. I'm an absolute noob with mining so pls help me. Is it the popular sgminer? Can I compile it with Qt5.3 with MSVC? Or maybe under Visual Studio Express? Do you have actual test vectors to test it? I want to make sure that it calculates 100% correctly. And I can't wait to see if it really goes 70MH/s on a 290x beast.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 06, 2015, 04:49:51 AM
(Oops, an important part was missing in my blog post -> now it's corrected)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on January 06, 2015, 06:12:39 AM
You first need to have the target passed to the kernel.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 06, 2015, 11:35:27 AM
Thanks for the update.
I've been using Linux only for many years now, so I can't help you with compiling on Windows; just know it's trivial to compile the miner on Linux, and it runs in a terminal so it doesn't need Qt.
About the software version, I prefer the good old sph-sgminer, which is based on sgminer 4.1 (I modified it a bit), but you can use the latest sgminer 5.X as well.
To test the kernel you can simply point it to a pool, printf the hash or whatever.
Back to my opencl effort, I've reduced the number of vgprs to 147 but I'm struggling to get past that.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: sp_ on January 06, 2015, 01:06:50 PM
Does your assembler support self-modifying code? ;) Then you can use the instruction cache as a precalc buffer as well. The advantage is that most GPUs can read from the instruction cache in parallel with the level 1 cache.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: qwep1 on January 06, 2015, 03:38:19 PM
And will there be a version for Windows?  :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 07, 2015, 10:34:17 PM
On Hawaii only, I've managed to get to 123 VGPRs and 28K ISA size, so now I have all the optimizations of the asm code :-)
I believe the asm version is still faster on hawaii, and of course much faster on smaller cards.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 07, 2015, 11:37:09 PM
new optimized CL or a BIN? (I'll test on 280x and 7950).


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 08, 2015, 07:20:00 AM
"On hawaii only, I've managed to get to 123 VGPRs and 28K ISA size, so now I have all the optimizations of the asm code :-)"

Then it got all the goodies: vgprs, icache and 2ram+6lds reads. The speedup must be the same 3.5x! Is it that much?

It must be good on small cards too; the only important difference is the number of CUs anyway.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 08, 2015, 07:36:35 AM
And you have the first/last round optimizations so it must be faster!
If it's as fast as the asm version, then I don't have to deal with the kernel parameters, which is boring/painful. My asm was only needed to encourage you to shrink the code/regs. :D

Can you share the new source?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 08, 2015, 09:28:52 AM
Unfortunately it's only about 25% faster, but we should compare apples to apples: could you try your code on hawaii chipset so we have a constant testbed?
Now I'm working on further first round optimizations; they bring little improvement but it's still worth it imho.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 08, 2015, 09:59:22 AM
25% looks like just getting back the loss that comes with 14.9. I really thought you had it 3.5x faster.

Are you sure that it only uses 123 VGPRs AND the code size is only 28KB? Or has it started to use scratch regs (those are terribly slow)?

Unfortunately I can't try it on anything other than the HD7770. But I'd also like to see how it runs on faster systems. I uploaded it to the download area of my blog if someone wishes to try it. I'm not familiar with the latest GCN chips (I think AMD only improves their instruction set from time to time, and maybe cuts down double precision performance), but with this particular program, I'm pretty sure that it will bring the 3.48x speedup on the R9 290x too, because all the CUs can work alone, each using its own LDS, L1 cache and ICache. So if the current ocl code on the R9 290x runs at 20MH/s then the latest asm code should run at 70MH/s.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 08, 2015, 10:03:12 AM
25% compared to 14.6, it's 43% compared to 14.9.
No scratch reg use (when I triggered it a couple times, it slowed down to less than 1 Mh/s).
I'd like to try your asm code myself, but I'd need the linux version of the assembler.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 08, 2015, 12:55:13 PM
Now I managed to build sgminer5.1 on my sys. I still have to make my kernel work with it.

Does sgminer have an offline 'diagnostic' mode, just for testing whether the kernel runs and how fast it runs?

"I'd need the linux version of the assembler."
Sorry, it's impossible. It's not even written in C++, so it can't be compiled on any system other than Windows.

And to make things more complicated :D You have to compile with it for every type of gcn cards multiplied by every Catalyst driver that was altered by AMD developers. My compiler only patches the binary into the .elf, the actual elf file is generated by the current Catalyst Driver of the currently selected gfx card.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 08, 2015, 01:39:04 PM
"Does sgminer have an offline 'diagnostic' mode, just for testing whether the kernel runs and how fast it runs?"

There is a simple "benchmark" option:

--benchmark         Run sgminer in benchmark mode - produces no shares


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 08, 2015, 03:43:38 PM
Unfortunately there is no --benchmark parameter. I checked the source code too, but found nothing similar: https://github.com/sgminer-dev/sgminer/blob/master/sgminer.c
Is there a simple way to run it? Now I have a groestl wallet, but where can I get a username from? What parameters should I use other than -k groestl and -d 1?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 08, 2015, 03:57:32 PM
Probably they removed it, I'm using an older version.
I run it like this, for solo mining:

sgminer -k groestlcoin --difficulty-multiplier 0.0039062500 -o http://localhost:GROESTLCOIN_RPC_PORT -u YOURUSER -p YOURPASSWORD

Then you have to find and add your best intensity and worksize (my OS kernel works with 256 only).
username and password are set in groestlcoin.conf; the port you can easily find in their thread (or via netstat).
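
A quick check of the --difficulty-multiplier value used in the command above, done with Python's fractions module: the decimal is exactly 1/256. Why this value suits Groestlcoin is not stated in this thread, so treat this purely as arithmetic.

```python
from fractions import Fraction

# The literal passed to --difficulty-multiplier in the command above.
multiplier = Fraction("0.0039062500")

print(multiplier == Fraction(1, 256))   # exact comparison, no float rounding involved
print(multiplier.denominator)
```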


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 08, 2015, 08:08:22 PM
Thanks for the help. Now it runs, and I found that this command produces the best results:
sgminer -d 1 -k groestlcoin --difficulty-multiplier 0.0039062500 -o http://localhost:1441 -u u -p p --shaders 1280 --worksize 256 -g 1 --intensity 24

It produces (avg) 2MH/s, which is half of the 4MH/s I calculated earlier.
Does sgminer divide the Groestl-hash count by 2? Actually, that would be more reasonable.
Or is something really wrong, so that it runs at half speed (exactly half speed)?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 08, 2015, 08:30:59 PM
Are you sure it is running your kernel? Look in your sgminer dir for a .bin file generated by OCL; it may be running the default groestlcoin OCL. Delete the .bin and replace it with your own generated one of the same name; it will not be regenerated if it exists in the dir. You must delete the .bin whenever you change configs to force an OCL recompile ... but you don't want that, u want to run your asm kernel ... so you will have to figure out the parameter passing from sgminer ...
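
A small helper sketch for the "delete the .bin" step, assuming the cached binaries sit directly in the miner's main folder as described in this thread. The function name and the example path are hypothetical; nothing here is part of sgminer.

```python
from pathlib import Path
from typing import List

def remove_cached_bins(miner_dir: str) -> List[str]:
    """Delete every *.bin directly inside miner_dir and return the removed file names."""
    removed = []
    for path in sorted(Path(miner_dir).glob("*.bin")):
        path.unlink()                  # the miner rebuilds the kernel when no cached .bin exists
        removed.append(path.name)
    return removed

# Example (hypothetical path): remove_cached_bins("C:/miners/sgminer")
```

Stop the miner before running it, and copy a custom .bin into place only afterwards, otherwise it would be deleted along with the rest.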


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 08, 2015, 08:41:59 PM
Intensity 24 is too much, I'd stay between 20 and 22, otherwise you'll produce a lot of rejected shares (or orphans if solo mining).
The shaders option is ignored for groestl.
The hashrate should be calculated on the full computation, i.e. 2 chained hashes.
What kernel are you using?
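
The factor of two can be checked against the workbench figures posted earlier in the thread (256*10*512 work items in 558.613 ms, reported as 4.693 MH/s). The sketch below assumes one work item computes one full chained hash, i.e. two Groestl passes; hashrate_mhs is a hypothetical helper.

```python
def hashrate_mhs(work_items: int, elapsed_ms: float, hashes_per_item: int = 1) -> float:
    """Hash rate in MH/s for a batch of work items that took elapsed_ms milliseconds."""
    return work_items * hashes_per_item / (elapsed_ms / 1000.0) / 1e6

items = 256 * 10 * 512

# Counting each Groestl pass separately (appears to match the workbench figure).
print(round(hashrate_mhs(items, 558.613, hashes_per_item=2), 3))

# Counting one full chained hash per work item (the convention described above).
print(round(hashrate_mhs(items, 558.613, hashes_per_item=1), 3))
```

So a miner showing about half of the workbench number does not by itself mean anything is running slowly; the two figures just count different things.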


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 08, 2015, 09:48:35 PM
I'm using your kernel: groestlcoin.cl.

Now I disassembled a dummy kernel with the appropriate parameters and realized I had forgotten about the T buffers. OpenCL uploads them in an extra buffer automatically. I don't even wanna know how the driver sends that extra buffer, and most importantly I can't make an automatic skeleton kernel to get a binary with a placeholder for constant data that my program can patch with the output of the assembler.

So the easiest way would be to modify sgminer to handle my kernel. I have found the 'queue_sph_kernel()' function, where I can start from.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 08, 2015, 09:57:06 PM
I never tested my kernel with cards smaller than tahiti, I also have no reports of it running on <= pitcairn: other groestlcoin kernels might be faster in that case.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 09, 2015, 03:55:25 PM
I see there is very little interest in mining groestl coins with GPU: very few users joined the recent discussion (2/3).
Let alone contributing to the code (2) or donating (2), in the whole life of this thread.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 09, 2015, 04:08:09 PM
Well I still prefer GPU mining while block rwd 1.0 and will see what happens to diff when Rwd drops to 0.1 ... So count me in on new kernel, I donated a bit last time u did new kernel and will donate again for new super-super asm kernel :)
I expect diff will drop remarkably when Rwd drops and solo mining might still be attractive even after ...
I have 1 280x solo mining DMD (Pallas Diamond) approx 18.6 MHs (2-4 coins per day)
and 7950 solo mining FTC (neoscrypt) 278 KHs (would be sweet if these opt'z could be applied to Neoscrypt also ... wolf0 where are u?)

@realhet
Would be great if you could add a kernel setting parameter (perhaps realhet) that selects using your kernel and supply a windows x64 build of your sgminer ... I'd donate for that :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 10, 2015, 01:41:26 AM
Well, I found it better not to alter sgminer, since I'm totally unfamiliar with it, and instead started to make my kernel look exactly the same as groestlcoin.cl from the outside. It will take half a page of additional code that deals with the kernel parameters. With a small dummy kernel it is already working, but I'm just too tired to continue now. :D


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 10, 2015, 03:48:37 AM
Well, that's easier for people to use.
Looking forward to seeing your progress! :-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 11, 2015, 03:01:34 AM
Do I need a better proof than this? ;D
http://x.pgy.hu/~worm/het/my_first_grs.png
I'm the proud owner of my first 19 GRS coins, haha. I guess I was super lucky to get an 'accepted' right after 10 minutes of mining.

The speed increase in sgminer is the same that I measured in my 'workbench': from 2MH/s it rose to 7MH/s. (Or if we calculate in GroestlHash/s then it is 4MH/s -> 14MH/s.)

If anyone is willing to help me test this, please tell me! You'll need Windows with cat14.9 and you also have to be brave enough to run my IDE (HetPas.exe) on that system.

I can't wait to see your reports on how fast it is on the big cards. :D


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 11, 2015, 03:24:35 AM
The compiled bin file should work regardless of catalyst version or operating system, so could you please post a link to the bin file? Thanks.
There is something weird about the sgminer screenshot, are you sure it's working correctly? It shows a single, disabled GPU with id 0, and the share that got accepted was from GPU id 1. The diff numbers are also kinda weird.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on January 11, 2015, 05:23:58 AM
SG bug.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: qwep1 on January 11, 2015, 09:21:06 AM
I would also test it.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 11, 2015, 10:54:26 PM
Sorry for taking a bit long.

Here's all you have to know if you're willing to test: http://realhet.wordpress.com/gcn-asm-groestl-coin-kernel/

Please send me benchmarks and compiled kernels for various cards!

I've been running it for an hour now and I got a 'rejected'. I'm solo mining GRS. Do I need to worry? Or is it usual? Can it be caused by a slow network?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 11, 2015, 10:56:59 PM
Realhet, thanks for the capeverde bin, unfortunately I can't use it because it's 32 bit.
I created a bootable win7 stick in order to compile the kernel: it compiles fine but, when run, it says "no target Hawaii" and no bin is created.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 11, 2015, 10:58:04 PM
"I'm running it for an hour now and I got a 'rejected'. I'm solo mining GRS. Do I need to worry? Or is it usual? Can it be caused by slow network?"

Yes, it can be because of the network: if the wallet is behind on sync, the block may be rejected (or orphaned).
Try with a pool...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 11, 2015, 11:11:06 PM
Runtime error: No GCN device found

I have 2 AMD cards on gpu-platform 1
and 1 Intel GPU on gpu-platform 0

Edit: DOH 14.7RC3 not GCN ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 11, 2015, 11:47:28 PM
Thx for testing! So many errors :S But usually that's how it goes.

"No GCN device found" error.

That could be because I can't recognize new cards.
I know only these at the moment.
'TAHITI', 'PITCAIRN', 'CAPEVERDE', 'UNKNOWN5');
Importing new names right now.

Meanwhile you can select an OpenCL device by uncommenting this line in the code:
var dev:=cl.devices[0]; //access device by index (must be a GCN one)

The findDevices function can't recognize new cards. I'll repair it now.

@pallas: Thanks for fiddling with Win7! :D What do you mean by 32 bit code? That has no meaning regarding the GCN hardware o.O
But I'm 100% sure that you can't use my Capeverde binary unless you have that chip in the device you selected. ( var dev:=cl.devices[CLdeviceIndex]; )


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 12, 2015, 12:19:53 AM
I've updated HetPas and the groestl_isa.hpas too. Pls download HetPas150111_Groestl.zip.

From now it will start with a list of the cards:
writeln("List of opencl devices:");
for var i:=0 to cl.devices.count-1 do begin
  writeln("Device #",i);
  writeln(cl.devices[i].dump);
end;

It should display something like this:
List of opencl devices:
Device #0
Target: Cayman  Series: 6  Core:880 MHz  CU:24  RAM:2048 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics ...
Device #1
Target: Capeverde  Series: 7  Core:880 MHz  CU:10  RAM:1024 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 ...

Using device:
Target: Capeverde  Series: 7  Core:880 MHz  CU:10  RAM:1024 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 ...
* core MHz value is not always accurate, use Catalyst Control Center (or ADL) instead!

For the GCN cards, the 'Series' must be at least 7. If it fails and it is indeed a GCN card, then I detected it badly, so pls report it. My first card is a series 6xxx Northern Islands hardware, it can't be used for this kernel.
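
For anyone collecting these dumps from testers, here is a small sketch that applies the "Series must be at least 7" rule to a dump line in the format shown above. The 'Series' field is HetPas's own label, and is_gcn is a hypothetical helper, not part of HetPas.

```python
import re

def is_gcn(dump_line: str) -> bool:
    """Return True if a device dump line reports Series 7 or higher."""
    match = re.search(r"Series:\s*(\d+)", dump_line)
    return bool(match) and int(match.group(1)) >= 7

cayman = "Target: Cayman  Series: 6  Core:880 MHz  CU:24  RAM:2048 MB  UID:4098"
capeverde = "Target: Capeverde  Series: 7  Core:880 MHz  CU:10  RAM:1024 MB  UID:4098"

print(is_gcn(cayman), is_gcn(capeverde))
```

A line without a Series field is treated as non-GCN, which mirrors the report-it-if-misdetected advice above.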

@utahjohn: Maybe it works on 14.7 too. I can't tell that, but I know that it will crash on 13.4 because the kernel parameters are handled differently in that driver.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 12, 2015, 12:22:34 AM
Temporarily upgraded to 14.9 to run hetpas, built for 280x.
Had hell of a time reverting back to 14.7 ... several tries later 14.7 working again and I have a kernel.elf for 280x.

Testing now ...

Very early results ...
280x I=22 E=1180 M=150 WS=256 ... 26 MHs Solo . No blocks yet ... approx 1.4x normal diamond kernel (18.5MHs)

Intensity 22 is sweet spot for my 280x, now playing with mem clock ...

No significant effect on raising mem-clock other than higher temps ...

stick with low mem clock.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 12, 2015, 12:56:32 AM
"Very early results ..."

Very good that it runs for you!

The speedup is not that impressive, but let me ask you to do a test:

Please, when you have stopped sgminer, press Run on groestl_isa.hpas, and copy/paste my program's output here, like this:

-----------------------------------------
Using new GCN ASM code
Kernel binary saved: C:\Work\Groestl\kernel_dump\kernel.elf

elapsed: 190.661 ms  13.749 MH/s   gain:   3.44x
elapsed: 188.444 ms  13.911 MH/s   gain:   3.48x
elapsed: 188.218 ms  13.928 MH/s   gain:   3.48x
elapsed: 188.225 ms  13.927 MH/s   gain:   3.48x

Functional test: RESULT IS OK
-----------------------------------------

And then go to around line 23 and comment out the "#define USE_NEW_ASM_KERNEL" and run it again! This will compile the original OpenCL kernel I've downloaded with sgminer5.1.

-----------------------------------------
Using original OpenCL code
Kernel binary saved: C:\Work\Groestl\kernel_dump\kernel.elf

elapsed: 657.623 ms  3.986 MH/s   gain:   1.00x
elapsed: 655.396 ms  4.000 MH/s   gain:   1.00x
elapsed: 654.897 ms  4.003 MH/s   gain:   1.00x
elapsed: 655.055 ms  4.002 MH/s   gain:   1.00x

Functional test: RESULT IS OK
-----------------------------------------

As you can see, on my small card the speedup is 3.5x. I'd like to check these results on your 280x as well.
I'm thinking that the problem is only that your big card doesn't get enough threads or something similar.
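
If you post both outputs, the speedup can be recomputed from the elapsed times alone. A minimal sketch, assuming the log format shown above; the two strings are the sample lines from this post and mean_elapsed_ms is a hypothetical helper.

```python
import re

ELAPSED = re.compile(r"elapsed:\s*([\d.]+)\s*ms")

def mean_elapsed_ms(log: str) -> float:
    """Average of all 'elapsed: N ms' values found in a benchmark log."""
    times = [float(m.group(1)) for m in ELAPSED.finditer(log)]
    if not times:
        raise ValueError("no 'elapsed:' lines found")
    return sum(times) / len(times)

asm_log = """
elapsed: 190.661 ms  13.749 MH/s   gain:   3.44x
elapsed: 188.444 ms  13.911 MH/s   gain:   3.48x
elapsed: 188.218 ms  13.928 MH/s   gain:   3.48x
elapsed: 188.225 ms  13.927 MH/s   gain:   3.48x
"""
ocl_log = """
elapsed: 657.623 ms  3.986 MH/s   gain:   1.00x
elapsed: 655.396 ms  4.000 MH/s   gain:   1.00x
elapsed: 654.897 ms  4.003 MH/s   gain:   1.00x
elapsed: 655.055 ms  4.002 MH/s   gain:   1.00x
"""

speedup = mean_elapsed_ms(ocl_log) / mean_elapsed_ms(asm_log)
print(f"speedup: {speedup:.2f}x")
```

Comparing elapsed times directly avoids depending on the fixed baseline behind the printed "gain" column.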

Just a silly test: what if you turn Memory clock up to normal speed? Maybe it will change the L1 cache's behaviour? My kernel uses 0 memory, but uses L1 cache extensively.

And finally I had an 'accepted', phew...

"Had hell of a time reverting back to 14.7" -> Is there a tool called "Catalyst Clean Uninstall Utility" nowadays? 2-3 years ago that was useful when downgrading the Catalyst version.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 12, 2015, 01:09:06 AM
"Very early results ..."

Very good, that it runs at you!

The speedup is not that impressive but let me ask yo to do a test:

Please when you stop sgminer, press run the groestl_isa.hpas, and copy/paste here my programs output, like this:

-----------------------------------------
Using new GCN ASM code
Kernel binary saved: C:\Work\Groestl\kernel_dump\kernel.elf

elapsed: 190.661 ms  13.749 MH/s   gain:   3.44x
elapsed: 188.444 ms  13.911 MH/s   gain:   3.48x
elapsed: 188.218 ms  13.928 MH/s   gain:   3.48x
elapsed: 188.225 ms  13.927 MH/s   gain:   3.48x

Functional test: RESULT IS OK
-----------------------------------------

And then go to around line 23 and comment out the "#define USE_NEW_ASM_KERNEL" and run it again! This will compile the original OpenCL kernel I've downloaded with sgminer5.1.

-----------------------------------------
Using original OpenCL code
Kernel binary saved: C:\Work\Groestl\kernel_dump\kernel.elf

elapsed: 657.623 ms  3.986 MH/s   gain:   1.00x
elapsed: 655.396 ms  4.000 MH/s   gain:   1.00x
elapsed: 654.897 ms  4.003 MH/s   gain:   1.00x
elapsed: 655.055 ms  4.002 MH/s   gain:   1.00x

Functional test: RESULT IS OK
-----------------------------------------

As you can see, on my small card the speedup is 3.5x. I'd like to check these results on your 280x as well.
I think the problem is only that your big card doesn't get enough threads, or something similar.

Just a silly test: what if you turn Memory clock up to normal speed? Maybe it will change the L1 cache's behaviour? My kernel uses 0 memory, but uses L1 cache extensively.

And finally I had an 'accepted', phew...

"Had hell of a time reverting back to 14.7" -> Is there a tool called "Catalyst Clean Uninstall Utility" nowadays? 2-3 years ago that was useful when downgrading the Catalyst version.
No significant effect on raising mem-clock other than higher temps ...

Use "DDU" to clean out the Catalyst drivers, but it is not always 100% effective; sometimes a little manual cleaning is needed too ...

BTW I am using Pallas kernel as reference, not one supplied with stock sgminer ...

Any tweaks you can do with 2048 shaders (280x) and 1792 shaders (7950) ?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 12, 2015, 01:24:21 AM
Yes, that must be the same kernel that I've copied into the groestl directory next to the groestl_isa.hpas file.

When you compile the original kernel within the groestl_isa.hpas program, it will use the groestl_original.cl kernel. It's Pallas's kernel, except that I hardcoded the workgroup size in it, and made one other very minor change.

Also I compared the kernel I downloaded from the very first post in this topic: It's the same.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 12, 2015, 01:29:45 AM
I did not try running kernel under catalyst 14.9, all I wanted was to generate the kernel.elf to run under 14.7 ... because I run multiple algos concurrently under 14.7 that suffer under 14.9 ...

Also note that I am running sgminer 4.1.0


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 12, 2015, 01:36:18 AM
I tested my kernel only in Cat 14.9
I have no info on how it works on 14.7

When you compile in HetPas it will generate a skeleton kernel binary with the help of the OpenCL compiler. And then the new assembly code will be PATCHED into that. So I don't make the binary from scratch and maybe the 14.7 binary is a bit different than the 14.9 binary and I just don't know about that. (Although life would be so much easier if AMD would be so kind and give us an interface to upload binary program code... But that's not going to happen :D)


"Any tweaks you can do with..."

Please let's do the test inside the IDE first. Let's compare the original and the new kernel there, as it is perfect for timing. In sgminer we need to play with Intensity and other factors and wait for minutes to get a correct time anyways.

So please paste here what you see on HetPas on the right pane after you run the program:
I'm interested in this information, and also tell me what card and engine MHz you used:

Using new GCN ASM code
Kernel binary saved: C:\Work\Groestl\kernel_dump\kernel.elf

elapsed: 190.645 ms  13.750 MH/s   gain:   3.44x
elapsed: 188.281 ms  13.923 MH/s   gain:   3.48x
elapsed: 188.233 ms  13.927 MH/s   gain:   3.48x
elapsed: 188.316 ms  13.920 MH/s   gain:   3.48x

Functional test: RESULT IS OK



Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 12, 2015, 02:04:01 AM
Thanks!

Well this is kinda bad for a Tahiti :/

Also the times of the 4 kernel launches are weird:
On my card it is 3.44x, 3.48x, 3.48x, 3.48x
But on your card this is 3.88x, 3.10x, 3.10x, 3.10x

On my card the first launch is a bit slow because the card was at low MHz when the test started, and after the warmup it settled at a steady 3.48x.

On your card the speeds are so random. Your card (at 1150) is 3.68x faster than mine, so if everything is OK, you should have seen 12.8x gains.
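
The 3.68x figure is consistent with a simple shaders-times-clock estimate. A rough sketch, assuming the small card is an HD7770 with 640 shaders at 1000 MHz (those two numbers are an assumption, the thread only names the card) and the 280x has 2048 shaders at 1150 MHz:

```python
def relative_throughput(shaders_a, mhz_a, shaders_b, mhz_b):
    """Ratio of raw shader throughput of card A over card B."""
    return (shaders_a * mhz_a) / (shaders_b * mhz_b)

# 280x (2048 shaders @ 1150 MHz) vs. an assumed HD7770 (640 shaders @ 1000 MHz)
scale = relative_throughput(2048, 1150, 640, 1000)

# 3.48x is the steady gain measured on the small card
expected_gain = scale * 3.48

print(f"{scale:.2f}x card scaling, {expected_gain:.1f}x expected gain")
```

This only models raw compute; it ignores occupancy, cache behaviour and driver differences, which is exactly what the tests in this post are trying to pin down.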

Maybe it is a 14.7 issue, I don't know. Everything can change from driver to driver...

What is on my mind is:

1. What if you change WorkCount from the original
    WorkCount := 256*10*512
to WorkCount := 256*10*512*10;  ?
Do the elapsed times become 10x longer?  (The functional test will fail, that's OK, just reset WorkCount to the default value after this test.)

2. Let's see how the original kernel works in HetPas:
  just comment out the  "#define USE_NEW_ASM_KERNEL" and let me see the times please. If the original kernel works well, then gain must be 3.68.
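
For reference, the MH/s and gain columns in the output can be reproduced from the elapsed time and WorkCount. A rough sketch in Python; the factor of 2 hashes per work item and the fixed 4.0 MH/s baseline are assumptions inferred from the printed numbers, not taken from the HetPas source:

```python
WORK_COUNT = 256 * 10 * 512   # work items per kernel launch, as in the post
HASHES_PER_ITEM = 2           # assumption: both Groestl passes are counted
BASELINE_MHS = 4.0            # assumption: fixed reference speed behind "gain"

def mhs(elapsed_ms, work_count=WORK_COUNT):
    """Hash rate in MH/s for one kernel launch."""
    seconds = elapsed_ms / 1000.0
    return work_count * HASHES_PER_ITEM / seconds / 1e6

def gain(elapsed_ms, work_count=WORK_COUNT):
    """Speed relative to the assumed fixed baseline."""
    return mhs(elapsed_ms, work_count) / BASELINE_MHS

# First line of the ASM run quoted earlier: 190.661 ms
print(f"{mhs(190.661):.3f} MH/s, gain {gain(190.661):.2f}x")

# Test 1 above: 10x the work taking 10x the time means the same MH/s
print(f"{mhs(1906.61, WORK_COUNT * 10):.3f} MH/s")
```

If the elapsed time does not grow roughly 10x with 10x the work, the launch is dominated by something other than the kernel itself.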


(Thank you for testing so far)

--------------------------------------------------------------------
"elapsed: 50.686 ms  51.719 MH/s   gain:  12.93x"
WOW! THIS IS IT! :D:D:D
Exactly what I've expected! Your card is 3.71x faster. What was the error? You accidentally mined while testing, right?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 12, 2015, 02:15:04 AM
The last test run I did grabbed 2 cards so divide in half for an average on Tahiti (280x+7950).

Not the gains I was expecting based on your blog ... 3.4x times 18.5 MHs should net me around 62 MHs vs the 26MHs I'm getting now ... so Tahiti not so great gains but better  :)

Short of pulling a card physically I don't know how to disable hetpas running all of them ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Star65 on January 12, 2015, 02:34:42 AM
I would also test on 7970 & 280x.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 12, 2015, 02:47:15 AM
There must be some misunderstandings about the MH/s values. So we have to be careful!

On this topic (first post) when Pallas says that R9 280x is 18MH/s he counts it in Groestl hashes.

When my program says "elapsed: 50.686 ms  51.719 MH/s" it counts it also in Groestl hashes. Just as Pallas.

But when you see MH/s inside sgminer then it must be multiplied by 2 because in SG 1 MH/s = 2 MGroestlH/s.

--------------------------
So when you see "51.719 MH/s" in my program
then you must see 26MH/s in SG.

And when you see 18MH/s on the first post on this topic
You must see 9MH/s in SG.

Also when I see 4MH/s in my program
Then I saw 2MH/s in SG.
---------------------------

So the equation is: 2*sgminer Mh/s = Pallas's Mh/s

This is because sgminer counts 2 Groestl hash calculations as 1. But Pallas counts them as 2 hashes, and I just copied Pallas, then later found out how sgminer calculates.

---------------------------
So the Tahiti 26MH/s in sgminer is correct. Please remove the kernel and let sgminer compile it from OpenCL! If I'm calculating correctly, then you should see 7-8MH/s with the original kernel. Can you check it please?
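
The conversion described in this post, as a small sketch. It applies only to the HetPas build discussed here, which counted both Groestl passes; the function name is made up for illustration. (Later in the thread Pallas clarifies that the first-post figures are already sgminer figures, so they need no conversion.)

```python
def hetpas_to_sgminer(hetpas_mhs):
    """Convert the early HetPas MH/s (2 Groestl passes counted) to sgminer MH/s."""
    return hetpas_mhs / 2.0

for value in (51.719, 4.0):
    print(f"HetPas {value} MH/s -> sgminer {hetpas_to_sgminer(value):.1f} MH/s")
```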




Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 12, 2015, 02:52:13 AM
When I run Pallas OCL I see 18.5MHs in sgminer.
When I run Realhet asm I see 26.0MHs in sgminer.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 12, 2015, 03:11:14 AM
When I run Pallas OCL I see 18.5MHs in sgminer.
When I run Realhet asm I see 26.0MHs in sgminer.

Please send me that .cl file and the binary that is compiled by the sgminer, I gotta check it.

For today, Thank You for testing, I gotta sleep now, see you!


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Star65 on January 12, 2015, 04:33:54 AM
TVM Pallas and realhet for nice work!

7970/280x 1130/300 W7

Pallas kernel in Cat 14.6  - 17.8MH/s
Pallas kernel in Cat 14.9  - 7.8MH/s   - so 14.9 very bad drivers?!
Realhet kernel in Cat 14.9 - 24.8MH/s - 24.8/7.8=3.18x !!!

We need realhet kernel (bin) with Cat 14.6 or 14.7 (best drivers perhaps). But I do not know how to do it.



Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 12, 2015, 05:03:14 AM
14.9 has a piss poor OCL compiler, we've known this for a long time ... Stick with 14.7RC3 for best overall performance over many different algo's.

I guess we are stuck with compiling realhet asm on 14.9 but 14.7 does better compiles for OCL.

I am running realhet asm kernel generated with 14.9 on 14.7 catalyst, just a pain in the ass reverting to 14.7 after using 14.9.

My Pallas OCL compile was done with 14.7RC3 and works better than OCL compiled on 14.9.
Pallas ocl compiled with 14.7RC3 will run normal on 14.9, just don't re-compile it with 14.9 ...

Confused yet? hehe

@Realhet
So the gain of Realhet = 1.40x Pallas stands when comparing to properly working Pallas OCL kernel on 14.7
(Same clocks and Intensity running under 14.7 so a fair compare).
Your Pallas reference speed is incorrect in hetpas because 14.9 mangled the OCL badly performance wise.
Take a look at performance hit 14.7 vs 14.9 in Star65 post above.
Unfortunately some of the "gains" you made may have been just repairing 14.9 OCL bugs LOL but obviously improvement was made somewhere in asm kernel.
You need to establish a baseline for your GPU using 14.7 Pallas OCL and see what really made improvements ...
I suggest you start over and use this first round as a learning experience :)  You started with code broken by 14.9 compiler as a base ...

Pallas 14.7 OCL Bin for 280x 18.5 MHs
https://mega.co.nz/#!kAEnDATC!HeelwXTHDsQNx8WJhTDcwqS-slOmikoBiMqTEK9-DV0 (https://mega.co.nz/#!kAEnDATC!HeelwXTHDsQNx8WJhTDcwqS-slOmikoBiMqTEK9-DV0)
Realhet 14.9 ASM bin for 280x 26.0 MHs
https://mega.co.nz/#!1NlRhYLC!7oLFfr2umL7T2Lc0fX3HY1ddthbpNqt6I_tYdG9OI9g (https://mega.co.nz/#!1NlRhYLC!7oLFfr2umL7T2Lc0fX3HY1ddthbpNqt6I_tYdG9OI9g)

Another random thought :) Can you set hetpas up to "cross-compile" for diff GCN architectures so all we have to do is DL bin files from u to test them?  I really dislike uninst-inst-uninst-inst to try a new asm version on 14.7 ... For example have it compile Tahiti.elf, hawaii.elf etc.  I understand u can only test for your card but with us out here to test other elf would speed process of testing new versions ...

DMD Donations : dJrhv4Pp1FXPrQiEp5njx42QrZiuZrbjQ1

Block found and accepted  solo mining so your asm kernel appears to be valid :)

I'd like you to have a look see what you can do to further improve wolf0's neoscrypt kernel with asm when you get time.
7950 currently doing 278KHs mining FTC.  PM me for OCL and BIN.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 12, 2015, 09:18:29 AM
"when Pallas says that R9 280x is 18MH/s he counts it in Groestl hashes."

no my hashrates are taken from sgminer.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 12, 2015, 09:20:38 AM
@pallas: Thanks for fiddling with Win7! :D What do you mean by 32 bit code? That has no meaning regarding the GCN hardware o.O
But I'm 100% sure that you can't use my Capeverde binary unless you have that chip in the device you selected. ( var dev:=cl.devices[CLdeviceIndex]; )

Bins generated by sgminer on a 32 bit system will not work on a 64 bit one and vice versa, so I suppose the same is true for your kernels.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: sp_ on January 12, 2015, 09:25:04 AM
@pallas: Thanks for fiddling with Win7! :D What do you mean by 32 bit code? That has no meaning regarding the GCN hardware o.O
But I'm 100% sure that you can't use my Capeverde binary unless you have that chip in the device you selected. ( var dev:=cl.devices[CLdeviceIndex]; )
Bins generated by sgminer on a 32 bit system will not work on a 64 bit one and vice versa, so I suppose the same is true for your kernels.

On linux yes, but on windows they work. You need to run the x86 build of sgminer.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 12, 2015, 09:26:25 AM
@pallas: Thanks for fiddling with Win7! :D What do you mean by 32 bit code? That has no meaning regarding the GCN hardware o.O
But I'm 100% sure that you can't use my Capeverde binary unless you have that chip in the device you selected. ( var dev:=cl.devices[CLdeviceIndex]; )

Bins generated by sgminer on a 32 bit system will not work on a 64 bit one and vice versa, so I suppose the same is true for your kernels.

In fact:

[10:25:27] Internal error: Input OpenCL binary is not for the target!


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 12, 2015, 09:30:29 AM
@pallas: Thanks for fiddling with Win7! :D What do you mean by 32 bit code? That has no meaning regarding the GCN hardware o.O
But I'm 100% sure that you can't use my Capeverde binary unless you have that chip in the device you selected. ( var dev:=cl.devices[CLdeviceIndex]; )

Bins generated by sgminer on a 32 bit system will not work on a 64 bit one and vice versa, so I suppose the same is true for your kernels.
Mine end in l4.bin ... am I 32 or 64 ... (win 7 x64)

4 * 8 (bits) = 32

it's the size of a long integer.
probably the sgminer build you are using is 32 bit.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 12, 2015, 10:04:38 AM
@pallas: Thanks for fiddling with Win7! :D What do you mean by 32 bit code? That has no meaning regarding the GCN hardware o.O
But I'm 100% sure that you can't use my Capeverde binary unless you have that chip in the device you selected. ( var dev:=cl.devices[CLdeviceIndex]; )

Bins generated by sgminer on a 32 bit system will not work on a 64 bit one and vice versa, so I suppose the same is true for your kernels.
Mine end in l4.bin ... am I 32 or 64 ... (win 7 x64)

4 * 8 (bits) = 32

it's the size of a long integer.
probably the sgminer build you are using is 32 bit.
question is does hetpas use 32 or 64 bit ... I'd assume 32 bit since it runs ok on my sgminer ...
my sgminer is old 4.1.0 ...

so your main prob is needing hetpas src to run on linux ...

Probably realhet coded it for 32 bit; I don't know what changes, maybe the parameter passing part.
I hope realhet has time to look into this.
I also use version 4.1.
Hetpas can't run on linux: I'll try again with the new version when I can access my workstation and make it boot on windows.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: JuanHungLo on January 12, 2015, 12:30:42 PM
I built my bins with Wolf0's x64 miner.  Works perfectly.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 12, 2015, 12:37:57 PM
I built my bins with Wolf0's x64 miner.  Works perfectly.

could you share your bin files please?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: JuanHungLo on January 12, 2015, 01:37:52 PM
I built my bins with Wolf0's x64 miner.  Works perfectly.

could you share your bin files please?

Personally, I wouldn't download this.  I'd generate my own.  But here it is.  Use at your own risk!
http://ge.tt/2uga0R82/v/0?c (http://ge.tt/2uga0R82/v/0?c)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 12, 2015, 01:41:34 PM
I built my bins with Wolf0's x64 miner.  Works perfectly.

could you share your bin files please?

Personally, I wouldn't download this.  I'd generate my own.  But here it is.  Use at your own risk!
http://ge.tt/2uga0R82/v/0?c (http://ge.tt/2uga0R82/v/0?c)

Thanks, but it's 32 bit, I need 64 bit.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 12, 2015, 01:51:21 PM
HOW TO TELL IF AN SGMINER BIN FILE IS 32 OR 64 BIT

If the filename, generated by sgminer, ends in l4.bin it is 32 bit (8 x 4 = 32)
If the filename, generated by sgminer, ends in l8.bin it is 64 bit (8 x 8 = 64)

They are incompatible.
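
The same rule as a small sketch, in case you want to check a folder full of bins. The function name is made up, and it assumes the filename still ends in the `l<N>.bin` suffix that sgminer generated:

```python
import re

def sgminer_bin_bits(filename):
    """Return 32 or 64 based on the l4/l8 suffix, or None if there is no such suffix."""
    match = re.search(r"l(\d+)\.bin$", filename)
    if match is None:
        return None
    # The number after "l" is a size in bytes: 4 -> 32 bit, 8 -> 64 bit.
    return int(match.group(1)) * 8

print(sgminer_bin_bits("diamondHawaiiw256l8.bin"))  # name from the first post
print(sgminer_bin_bits("diamondTahitiw256l4.bin"))  # hypothetical 32 bit name
```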


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Star65 on January 12, 2015, 02:50:10 PM
Guys! We do not need more optimization! If we all get a faster kernel, then the difficulty will increase proportionally. Accordingly, we will not get more coins, but will pay more for electricity. Profits will only decrease.  :(
A faster kernel is good for the dev only (as a reward for their hard work), I think.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 12, 2015, 02:53:17 PM
Guys! We do not need more optimization! If we all get a faster kernel, then the difficulty will increase proportionally. Accordingly, we will not get more coins, but will pay more for electricity. Profits will only decrease.  :(
A faster kernel is good for the dev only (as a reward for their hard work), I think.

true....
until you have half the hashpower by a couple fpga miners (or so they say) ;-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 12, 2015, 02:55:41 PM
Guys! We do not need more optimization! If we all get a faster kernel, then the difficulty will increase proportionally. Accordingly, we will not get more coins, but will pay more for electricity. Profits will only decrease.  :(
A faster kernel is good for the dev only (as a reward for their hard work), I think.
Not everyone will use new kernel so there is an advantage.  Yes diff will go up some.  Also as diff goes up many miners will drop like dead flies, so It will even out ...
Tell all your friends to Cloudmine/Multipool mine  and stop direct mining, this will lower diff for diehard solo miners :)
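
Both sides of this argument can be put into a toy model. A minimal sketch, assuming difficulty simply tracks total network hashrate and that a miner's coins are proportional to their share of it; the adoption fractions are made up for illustration:

```python
def relative_reward(speedup, adoption, adopter=True):
    """Reward after the new kernel spreads, relative to before (1.0 = unchanged).

    speedup  -- kernel speedup factor, e.g. 1.4
    adoption -- fraction of network hashrate that switches to the new kernel
    adopter  -- whether this miner switched too
    """
    network_growth = (1.0 - adoption) + adoption * speedup
    own_growth = speedup if adopter else 1.0
    return own_growth / network_growth

print(f"everyone switches:      {relative_reward(1.4, 1.0):.2f}")
print(f"10% switch, I do:       {relative_reward(1.4, 0.1):.2f}")
print(f"10% switch, I do not:   {relative_reward(1.4, 0.1, adopter=False):.2f}")
```

With full adoption nobody gains, which is Star65's point; with partial adoption the early adopters gain at the expense of everyone else.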

3 blocks DMD since I started ASM kernel last night ... :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 12, 2015, 03:50:07 PM
@Pallas
It is extremely rare for me to see any orphan when solo mining so I would venture to guess your network is too slow.

probably too few nodes nearby: I have 20/30 msec round trip time to big internet nodes in my country.
having few fast nodes nearby means my blocks take a lot of time to spread thru the diamond network.
or a lot of bad luck :D


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: qwep1 on January 12, 2015, 04:46:17 PM
Target: Tahiti  Series: 7  Core:1100 MHz  CU:32  RAM:3072 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
* core MHz value is not always accurate, use Catalyst Control Center (ADL) instead!

elapsed: 69.778 ms  37.568 MH/s   gain:   9.39x
elapsed: 54.247 ms  48.324 MH/s   gain:  12.08x
elapsed: 54.269 ms  48.305 MH/s   gain:  12.08x
elapsed: 54.236 ms  48.334 MH/s   gain:  12.08x
############### RESULT IS WRONG ###################
   idx        hi       lo           hi           lo
     0: 00000000 00000000            0            0
     1: 00000000 00000000            0            0
     2: 00000000 00000000            0            0
     3: 00000000 00000000            0            0
     4: 00000000 00000000            0            0
     5: 00000000 00000000            0            0
     6: 00000000 00000000            0            0
     7: 00000000 00000000            0            0
     8: 00000000 00000000            0            0
     9: 00000000 00000000            0            0
     A: 00000000 00000000            0            0
     B: 00000000 00000000            0            0
     C: 00000000 00000000            0            0
     D: 00000000 00000000            0            0
     E: 00000000 00000000            0            0
     F: 00000000 00000000            0            0
    10: A9A41A9D 9337706F  -1448863075  -1825083281
    11: 370D1AF4 DD743586    923605748   -579586682
    12: CB7EB389 EADF9917   -880888951   -354445033
    13: 25FA6A42 76EDCD1E    637168194   1995296030
    14: 91783455 C7EE8F10  -1854393259   -940667120
    15: F60C362A FD9AFAB3   -166971862    -40174925
    16: 038C0C0F D2E4564F     59509775   -756787633
    17: EA28DD29 3A1B41CA   -366420695    974864842
    18: 708C1E9A DFCDC04F   1888231066   -540164017
    19: 00000000 A7B76679            0  -1481152903
    1A: 00000000 00000000            0            0
    1B: 00000000 00000000            0            0
    1C: 00000000 00000000            0            0
    1D: 00000000 00000000            0            0
    1E: 00000000 00000000            0            0
    1F: 00000000 00000000            0            0
Is this normal, or am I doing something wrong?

Quote
do not get me compile a file


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: physixz on January 12, 2015, 06:59:14 PM
Whats the best driver version to use as i can only get 11MH/s from my R9 290


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 12, 2015, 07:00:26 PM
Whats the best driver version to use as i can only get 11MH/s from my R9 290

14.6b or 14.7

Or use the precompiled binary.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: qwep1 on January 12, 2015, 07:30:14 PM
Where is the kernel_dump\ folder? I cannot find it.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 12, 2015, 08:12:03 PM
@realhet
OK a few things I have discovered:
1. Hetpas does compile and run ok on 14.7RC3.
    So no need to install 14.9 :)
2. Test Runs:
Target: Tahiti  core:1150 MHz  cu:32  ram:3072 MB  uid:4098
ext: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
* core MHz value is not always accurate, use Catalyst Control Center (or ADL) instead!

Using original OpenCL code
Kernel binary saved: C:\Miners\HetPas150111_Groestl\groestl\kernel_dump\kernel.elf

elapsed: 72.626 ms  36.095 MH/s   gain:   9.02x
elapsed: 70.712 ms  37.072 MH/s   gain:   9.27x
elapsed: 70.718 ms  37.069 MH/s   gain:   9.27x
elapsed: 70.741 ms  37.057 MH/s   gain:   9.26x

Functional test: RESULT IS OK

Target: Tahiti  core:1150 MHz  cu:32  ram:3072 MB  uid:4098
ext: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
* core MHz value is not always accurate, use Catalyst Control Center (or ADL) instead!

Using new GCN ASM code
Kernel binary saved: C:\Miners\HetPas150111_Groestl\groestl\kernel_dump\kernel.elf

elapsed: 53.629 ms  48.881 MH/s   gain:  12.22x
elapsed: 50.666 ms  51.740 MH/s   gain:  12.93x
elapsed: 50.677 ms  51.729 MH/s   gain:  12.93x
elapsed: 50.660 ms  51.746 MH/s   gain:  12.94x

Functional test: RESULT IS OK

3. Calculated speed gain is close to actual speed gain of 1.40x as shown running sgminer :)

4. First run of OCL should be reference value of 1.0x to do proper comparison, this needs to be reset in hetpas for each architecture.

5. Your timing calculations appear to be wrong.  Single 280x OCL is 18.5MHs, Single 280x ASM is 26.0MHs.
    Are you sure hetpas is not using BOTH of the cards in my test box when running tests?  I am mining in sgminer with SINGLE card, other is turned off and used in another instance of sgminer mining neoscrypt ...
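
Point 3 can be checked with plain arithmetic on the numbers above. A quick sketch, using the steady-state HetPas lines and the sgminer rates quoted in this post:

```python
def speedup(new_rate, old_rate):
    """How many times faster new_rate is than old_rate."""
    return new_rate / old_rate

in_sgminer = speedup(26.0, 18.5)      # ASM vs. OCL as reported by sgminer
in_hetpas = speedup(51.740, 37.072)   # ASM vs. OCL, second launch of each HetPas run
rebased = speedup(12.93, 9.27)        # same thing from the printed "gain" column

print(f"sgminer {in_sgminer:.2f}x, HetPas {in_hetpas:.2f}x, gain column {rebased:.2f}x")
```

The gain column only becomes a like-for-like comparison after dividing by the OCL run on the same card, which is what point 4 asks for.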


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 12, 2015, 09:35:34 PM
The new version compiles fine, but of the two GPUs only id 1 works, id 0 doesn't produce any valid work unit.
Speed: r9 290 30Mh/s, r9 290x 33Mh/s (1100 MHz)
My experimental opencl kernel is a couple percent faster.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 12, 2015, 09:50:34 PM
The new version compiles fine, but of the two GPUs only id 1 works, id 0 doesn't produce any valid work unit.
Speed: r9 290 30Mh/s, r9 290x 33Mh/s (1100 MHz)
My experimental opencl kernel is a couple percent faster.
care to share newest incarnation of OCL ? PM me a link for personal use only :) :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: physixz on January 12, 2015, 11:43:48 PM
when i run HetPas it doesn't detect the graphics cards even though im running 14.9 drivers. anybody know why?

i get either Runtime error: openCL error: CL_Device_not_found or no GCN device found when i re-enable the intel integrated graphics. i am running 3 R9 290's and ive tried 14.9 and 14.12 beta drivers and neither work


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 13, 2015, 12:46:01 AM
when i run HetPas it doesn't detect the graphics cards even though im running 14.9 drivers. anybody know why?

i get either Runtime error: openCL error: CL_Device_not_found or no GCN device found when i re-enable the intel integrated graphics. i am running 3 R9 290's and ive tried 14.9 and 14.12 beta drivers and neither work
I had to disable the Intel onboard graphics, uninstall all drivers and reinstall 14.7RC3.  What is happening is that your AMD cards end up on the wrong gpu-platform: in my case AMD was on gpu-platform 1 and Intel was gpu-platform 0.
Hetpas appears to look only at gpu-platform 0.
Completely uninstall all display drivers with DDU, then go to the BIOS and disable the onboard Intel.  When the AMD cards are redetected they will appear on gpu-platform 0.

AVOID 14.9 like the plague, its OCL compiler is broken.



Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: physixz on January 13, 2015, 01:27:09 AM
Thanks, I wasnt disabling the intel in UEFI which was my problem. its working now at 26.5MH/s per card which is amazing.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 13, 2015, 01:35:36 AM
Thanks, I wasnt disabling the intel in UEFI which was my problem. its working now at 26.5MH/s per card which is amazing.
Pallas is getting 30MHs on 290 with realhet ASM kernel ... so some further tuning now, play with intensity, gpu clock, drop mem clock to lowest possible (150 on my 280x).


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: JuanHungLo on January 13, 2015, 01:43:17 AM
Thanks, I wasnt disabling the intel in UEFI which was my problem. its working now at 26.5MH/s per card which is amazing.
I'm using 14.7r3, xI 2048, 1100/150, -w 256 undervolted to 1.00 and getting 23.38 MH/s.  What's your config?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 13, 2015, 01:43:20 AM
Note Intel GPU can be used for other algo such as X11, neoscrypt.  Now that AMD is on gpu-platform 0, you can try re-enable intel and see if it will pop up on gpu-platform 1.
check with sgminer -n in a command prompt window.
to specify which platform to use on sgminer command line --gpu-platform 0, or 1 ...
display # counts still start from 0 on each gpu-platform.

I have heard of ppl also running nvidia cards in same box with AMD, yet another gpu-platform selection ... :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 13, 2015, 02:28:13 AM
Hi All,

Important things to the top:
* I slightly updated the HetPas150111_Groestl.zip -> MH/s values are now the same as in SG.
* I've updated the main page with benchmark data I've collected: http://realhet.wordpress.com/gcn-asm-groestl-coin-kernel/
* I've uploaded the diamondTahiti binary, so now there are 2 precompiled bins, thank you utahjohn!


* My MH/s misunderstanding.
Thank you all for the investigations, now I see it clearly.
When I tried groestlcoin.cl on my card on 14.9, it ran at 2 MH/s. If I scale the 25MH/s of the R9 290 down to my HD7770, then I should have got 4 MH/s.
And here comes my bad decision:
I didn't believe that the 14.6->14.9 changes were so bad that they slowed the kernel by more than 2x. Actually it was 2.6x slower than my expectations.
And because the algorithm technically contains 2 hash calculations, I thought that multiplying by 2 gives me the correct MH/s.
But as it turned out, they indeed broke 14.9 that badly.
So if I ever thought about hating ocl, now I hate it more than twice as much. To be precise, I hate it 2.6x more. :D
But on the optimistic side, because 14.9 made such exceptional cr4p out of the ocl kernel, it gave me the false feeling of success that kept me optimizing, haha.
Anyways, I'm happy that it is solved now.


* HetPas and Catalyst version
When you compile an ASM kernel, my compiler generates a pure binary (and some parameters, e.g. LDS size).
In order to make it run, it has to produce a complicated ELF binary image, so it asks the OpenCL compiler for one.
This small skeleton kernel contains the kernel parameters that you request in the assembly source.
For this groestl kernel I supply a special skeleton.cl (see below in this post).
So once OpenCL has compiled this small skeleton kernel, my program patches the binary and the other parameters into it. It also cuts out every unwanted part, such as the ocl, llvmir and amd_il sections. There are even a few kilobytes of zeroes in the ELF just to be compatible with terribly old hardware; I cut those out too.
And because I use the current OpenCL system, the produced binary will only be compatible with that kind of hardware.


* Binary kernels and Catalyst versions
AFAIK when a kernel binary is loaded by clBuildProgram, it isn't checked for compatibility with the Catalyst version, or with any other version number.
So the binary is quite transferable between versions.
When an incompatibility occurs, it can be caused by these things:
- driver developers changed the ELF file structure (for example they removed some sections: in 13.4 they removed the amd_il section from the inner ELF image. Yes, it is an ELF inside an ELF. :D) This can cause an error or access violation when loading the kernel.
- driver developers changed the way/format kernel parameters are passed. This kind of incompatibility can cause a crash on the GPU.
So it doesn't matter that you compile with hetpas on 14.7; I only wrote 14.9 on my blog because I was 100% sure that my program works on 14.9.


* "cross compile" option
Yea, it would be a nice feature. To do it I need binaries from all hardware, so I can 'dissect' them and maybe find out how to produce them manually.
I'm not going to try to understand the complete binary structure, as amd can change it any time, and they must do so when they improve things anyway.
I only want to inject GCN binary into the hardware as simply as I can.
But by analyzing different binaries maybe I can find out how to change a binary to be compatible with a specific hardware.
For example, if there are too many hardware-dependent options that also depend on the kernel's parameters, then it's impossible to do without fully understanding how parameters are exchanged between the driver and the (specific) hardware.


* 32bit/64bit
Ok, now I understand. HetPas is all 32bit, so I haven't noticed there can be 64bit ELFs too.
I can guess that the Linux driver uses an API of the OS to access ELF contents and that's why 32/64bit is important...
Please compile this kernel to a 64 bit binary and send me:
__attribute__((reqd_work_group_size(256, 1, 1)))
__kernel void search(__global unsigned char* block, volatile __global uint* output, const ulong target)
{
    // Minimal body that uses all three parameters.
    if (target > 0) output[get_local_id(0)] = block[get_global_id(0)];
}
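
A quick way to tell whether a given kernel binary is a 32-bit or a 64-bit ELF is to look at the EI_CLASS byte in the ELF identification header. The sketch below is only an illustration: the function name `elf_class` and the synthetic headers are made up for this example, and it says nothing about what the AMD driver itself checks.

```python
ELF_MAGIC = b"\x7fELF"


def elf_class(image: bytes) -> str:
    """Return '32-bit' or '64-bit' based on the EI_CLASS byte of an ELF image."""
    if len(image) < 5 or image[:4] != ELF_MAGIC:
        raise ValueError("not an ELF image")
    # EI_CLASS is byte 4 of e_ident: 1 = ELFCLASS32, 2 = ELFCLASS64.
    return {1: "32-bit", 2: "64-bit"}.get(image[4], "unknown")


# Synthetic 16-byte e_ident headers stand in for real kernel binaries here.
fake_32 = ELF_MAGIC + bytes([1, 1, 1]) + bytes(9)
fake_64 = ELF_MAGIC + bytes([2, 1, 1]) + bytes(9)
print(elf_class(fake_32), elf_class(fake_64))
```

On a real file the same function can be called on the bytes read from disk, for example the kernel.elf that HetPas writes into kernel_dump\.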


* "neoscrypt kernel"
Is this similar to LiteCoin?
1 year ago I played with LiteCoin's salsa. It was fun, but I wasn't able to outperform opencl.
But in the future I plan to make a special salsa that will use LDS instead of the slow ram. This will be an interesting experiment, as I'm gonna have to try some assembly-exclusive things in order to outperform the original kernel:
- To be able to use 64KB of lds for one thread I'll have to connect wavefront pairs to share their 32KB allocs with each other. For this I have to know which compute unit the current wavefront is running on (s_get_hwreg).
- syncing the two kernels on each CU individually will require some research. (GCN has an awesome global wave sync feature in hardware, so maybe there is something for 'local' too. If not, maybe I can poll GDS)
- because only one 'thread' will work actively on a CU, there'll be no latency hiding, so I have to program the kernel in a parallel way (but, no probs, I'll have all the 256 regs...)
- By the textbook: LDS throughput is 64x better (IMO it's not) than MEM throughput on a HD7970. So this would be the benefit.
- workitems in a wavefront can copy register data from each other. So while I calculate only 1 salsa using the 2*32KB LDS for lookup (lookup_gap=2), I can spread data across more lanes of the wavefront and do the calculations in parallel.
I've just checked neoscrypt.cl, it's insane :D But if I see it right, half of it is SALSA.
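
The 2*32KB figure with lookup_gap=2 can be checked with a little arithmetic. The sketch below assumes Litecoin's scrypt parameters (N=1024, r=1), where the scratchpad holds N blocks of 128*r bytes, and treats lookup_gap as "store only every gap-th block"; the function name and the constant are made up for this example.

```python
def scratchpad_bytes(n: int = 1024, r: int = 1, lookup_gap: int = 1) -> int:
    """Bytes needed for the scrypt scratchpad when only every lookup_gap-th block is stored."""
    block_bytes = 128 * r                # one scrypt block
    stored_blocks = -(-n // lookup_gap)  # ceil(n / lookup_gap)
    return stored_blocks * block_bytes


# Two wavefronts sharing their 32KB LDS allocations, as described in the post.
LDS_PER_WAVE_PAIR = 2 * 32 * 1024

for gap in (1, 2, 4):
    size = scratchpad_bytes(lookup_gap=gap)
    print(f"lookup_gap={gap}: {size} bytes, fits in 64KB LDS: {size <= LDS_PER_WAVE_PAIR}")
```

Under these assumptions the full scratchpad does not fit in 64KB, while lookup_gap=2 fills it exactly, at the cost of recomputing every other block.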


* "Guys! We do not need more optimization!"
I've thought about this too. But I think if everyone uses better kernels, then everyone will use the same power to get the same profit, as difficulty will be harder but mining will require less power per hash.
But what if not everyone uses the faster kernel? I think my compiler/IDE is helping a lot in this, as it is kinda user unfriendly :D


* Just a question about LiteCoin
Do you know whether it is worth optimizing it on GPU? Or are there too many FPGAs/ASICs there too?
I'm just curious. I'd like to play with that salsa algo, but my free time is running out soon.


* @qwep1  ### RESULT IS WRONG ####
Something went totally wrong there.
Tahiti is tested already, and the 'elapsed' is ok too, but what's in the memory dump is garbage. What Catalyst are you using? Is the memory clock setting ok?
* kernel_dump\ folder is in the same folder as the groestl_isa.hpas program that you're running in HetPas.


* "AVOID 14.9 like the plague"
Haha, I'll try 14.12 now.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 13, 2015, 02:48:04 AM
forget litecoin and all scrypt coins, they are asic territory now and GPU mining pointless on them.  Wolf0 can explain what he did to optimize neoscrypt, It was some major improvements ... I might be able to dig up a link ...

Here is last neoscrypt OCL I got from wolf0
https://mega.co.nz/#!cFEGTBBY!snQhOeLs6E_giKx2rY_i7XNcv95dASkrrRzlDOq7fIE
and some update from the forum
https://forum.feathercoin.com/index.php?/topic/7780-dev-neoscrypt-gpu-miner-public-beta-test/page-41#entry71777


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 13, 2015, 03:35:12 AM
Look better :)  Dev 0 is 280x Dev 1 is 7950

List of opencl devices:
Device #0
Target: Tahiti  Series: 7  Core:1150 MHz  CU:32  RAM:3072 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Device #1
Target: Tahiti  Series: 7  Core:1150 MHz  CU:28  RAM:3072 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

Using device:
Target: Tahiti  Series: 7  Core:1150 MHz  CU:32  RAM:3072 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
* core MHz value is not always accurate, use Catalyst Control Center (or ADL) instead!

Using new GCN ASM code
Kernel binary saved: C:\Miners\HetPas150111_Groestl\groestl\kernel_dump\kernel.elf

elapsed: 53.609 ms  24.449 MH/s   gain:  12.22x
elapsed: 50.710 ms  25.847 MH/s   gain:  12.92x
elapsed: 50.670 ms  25.868 MH/s   gain:  12.93x
elapsed: 50.707 ms  25.849 MH/s   gain:  12.92x

Functional test: RESULT IS OK

   idx        hi       lo           hi           lo
     0: 16410000 D9080000    373358592   -653787136
     1: 4A820000 D0630000   1250033664   -798818304
     2: C3E00000 EDA60000  -1008730112   -307888128
     3: 1F020100 33FF0000    520225024    872349696
     4: 8A200100 F8F10000  -1977614080   -118423552
     5: 9F3A0100 C22D0100  -1623588608  -1037238016
     6: A6000200 86D40100  -1509948928  -2032926464
     7: C52A0200 A7190200   -987102720  -1491533312
     8: 36610200 F6380200    912327168   -164101632
     9: B8E80200 6BAB0200  -1192754688   1806369280
     A: B72B0300 E9280300  -1221917952   -383253760
     B: 684F0300 B04C0300   1750008576  -1337195776
     C: FA6F0300 F15D0300    -93388032   -245562624
     D: A9B80300 BE8D0300  -1447558400  -1098054912
     E: 06CF0300 5FCF0300    114230016   1607402240
     F: DCF90300 EDF00300   -587660544   -303037696
    10: FF300400 2F0A0400    -13630464    789185536
    11: D2DB0400 D5830400   -757398528   -712834048
    12: 97060500 53CD0400  -1761213184   1405944832
    13: E3100500 77160500   -485489408   1997931776
    14: 0E2E0500 3E1B0500    237896960   1041958144
    15: FA460500 2F490500    -96074496    793314560
    16: 2A860500 D0650500    713426176   -798685952
    17: 4BCC0500 8C950500   1271661824  -1936390912
    18: 860F0600 1DED0500  -2045835776    502072576
    19: 3B810600 3C710600    998311424   1014040064
    1A: E09B0600 E9840600   -526711296   -377223680
    1B: 58FE0600 56AF0600   1493042688   1454310912
    1C: 44160700 CBF90600   1142294272   -872872448
    1D: F9240700 DA1F0700   -115079424   -635500800
    1E: 79910700 64700700   2039547648   1685063424
    1F: 98FD0700 DFA10700  -1728248064   -543095040
    20: 44450800 8E0E0800   1145374720  -1911683072
    21: 1E4D0800 8B570800    508364800  -1957230592
    22: 317D0800 52670800    830277632   1382483968
    23: 20A30800 BE830800    547555328  -1098708992
    24: CFAE0800 FCAA0800   -810678272    -55965696
    25: AED30800 00B40800  -1361901568     11798528
    26: 37150900 1D070900    924125440    487000320
    27: 37570900 EE210900    928450816   -299824896
    28: 8F9C0900 21740900  -1885599488    561252608
    29: 729D0900 38960900   1922894080    949356800
    2A: 8C270A00 10D20900  -1943598592    282200320
    2B: A5460A00 163C0A00  -1522136576    373033472
    2C: 93540A00 2E470A00  -1823208960    776407552
    2D: FF7A0A00 19650A00     -8779264    426052096
    2E: BCEA0A00 A09A0A00  -1125512704  -1600517632
    2F: 94210B00 76F80A00  -1809773824   1995966976
    30: 2E5B0B00 38310B00    777718528    942738176
    31: 0BF70B00 27610B00    200739584    660671232
    32: CB8B0C00 EA5D0C00   -880079872   -363000832
    33: 2AA20C00 D59B0C00    715262976   -711259136
    34: 2AB00C00 38AB0C00    716180480    950733824
    35: 79DB0C00 DFC60C00   2044398592   -540668928
    36: A21B0D00 5D0E0D00  -1575285504   1561201920
    37: 05370D00 84190D00     87493888  -2078733056
    38: 58A90D00 4FAA0D00   1487473920   1336544512
    39: 26EF0D00 BAB10D00    653200640  -1162801920
    3A: EA030E00 E0F50D00   -368898560   -520811264
    3B: 960B0E00 A5090E00  -1777660416  -1526133248
    3C: 12410E00 2F140E00    306253312    789843456
    3D: 785B0E00 47490E00   2019233280   1195970048
    3E: 017C0E00 5D7B0E00     24907264   1568345600
    3F: 87B40E00 4D9A0E00  -2018243072   1301941760
    40: 83E20E00 7ACB0E00  -2082337280   2060127744
    41: 11110F00 85E70E00    286330624  -2048455168
    42: F3270F00 AE130F00   -215544064  -1374482688
    43: 19540F00 E8390F00    424939264   -398913792
    44: F4630F00 2C5D0F00   -194834688    744296192
    45: 66780F00 997A0F00   1719144192  -1720054016
    46: D1AE0F00 0AA50F00   -777122048    178589440
    47: 96C00F00 65AD0F00  -1765798144   1705840384
    48: 4ECB0F00 D8C40F00   1321930496   -658239744
    49: 4CF90F00 F1DF0F00   1291390720   -237039872
    4A: 6A1C1000 44171000   1780224000   1142362112
    4B: 62E21000 80841000   1658982400  -2138828800
    4C: 6F2B1100 26141100   1865093376    638849280
    4D: 3D481100 DB351100   1028133120   -617279232
    4E: A04F1100 F3521100  -1605431040   -212725504
    4F: 02BC1100 06881100     45879552    109580544
    50: 1DD71100 73D41100    500633856   1943277824
    51: 0CE51100 E7DF1100    216338688   -404811520
    52: BEFA1100 8DEA1100  -1090907904  -1914040064
    53: F6341200 DB2F1200   -164359680   -617672192
    54: 99541200 54331200  -1722543616   1412633088
    55: 10931200 01711200    278073856     24187392
    56: 1BAC1200 DBA21200    464261632   -610135552
    57: 7EF11200 37F01200   2129728000    938480128
    58: 7FFB1200 40F51200   2147160576   1089802752
    59: 84811300 38791300  -2071915776    947458816
    5A: 6DCA1300 98B11300   1841959680  -1733225728
    5B: 0A001400 64F01300    167777280   1693455104
    5C: 00000000 00000000            0            0
    5D: 00000000 00000000            0            0
    5E: 00000000 00000000            0            0
    5F: 00000000 00000000            0            0
    60: 00000000 00000000            0            0
    61: 00000000 00000000            0            0
    62: 00000000 00000000            0            0
    63: 00000000 00000000            0            0
    64: 00000000 00000000            0            0
    65: 00000000 00000000            0            0
    66: 00000000 00000000            0            0
    67: 00000000 00000000            0            0
    68: 00000000 00000000            0            0
    69: 00000000 00000000            0            0
    6A: 00000000 00000000            0            0
    6B: 00000000 00000000            0            0
    6C: 00000000 00000000            0            0
    6D: 00000000 00000000            0            0
    6E: 00000000 00000000            0            0
    6F: 00000000 00000000            0            0
    70: 00000000 00000000            0            0
    71: 00000000 00000000            0            0
    72: 00000000 00000000            0            0
    73: 00000000 00000000            0            0
    74: 00000000 00000000            0            0
    75: 00000000 00000000            0            0
    76: 00000000 00000000            0            0
    77: 00000000 00000000            0            0
    78: 00000000 00000000            0            0
    79: 00000000 00000000            0            0
    7A: 00000000 00000000            0            0
    7B: 00000000 00000000            0            0
    7C: 00000000 00000000            0            0
    7D: 00000000 00000000            0            0
    7E: 00000000 00000000            0            0
    7F: 000000B8 00000000          184            0
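
The elapsed / MH/s / gain columns in the log above are consistent with a fixed amount of work per run. The sketch below reproduces them under two assumptions that are inferred from the numbers, not stated by the tool: each run covers 5 * 2^18 = 1,310,720 hashes, and "gain" is measured against a 2 MH/s baseline.

```python
HASHES_PER_RUN = 5 * 2**18  # assumption: 1,310,720 hashes per run, inferred from the log
BASELINE_MHS = 2.0          # assumption: gain is relative to a 2 MH/s reference


def mhs(elapsed_ms: float, hashes: int = HASHES_PER_RUN) -> float:
    """Hash rate in MH/s for one run that took elapsed_ms milliseconds."""
    return hashes / (elapsed_ms / 1000.0) / 1e6


def gain(elapsed_ms: float, baseline_mhs: float = BASELINE_MHS) -> float:
    """Speed-up factor relative to the assumed baseline."""
    return mhs(elapsed_ms) / baseline_mhs


for elapsed in (53.609, 50.710, 50.670, 50.707):
    print(f"elapsed: {elapsed:.3f} ms  {mhs(elapsed):.3f} MH/s  gain: {gain(elapsed):.2f}x")
```

The results agree with the logged values to within rounding of the displayed elapsed time.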


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 13, 2015, 03:57:29 AM
Well, that neoscrypt is quite complicated. I couldn't even get it compiled, as I think it needs more defines than just WORKSIZE alone. Some day I'm gonna check it from a closer view, as it is interesting...

I have Cat 14.12 omega (whatever it is) now. Asm kernel is unchanged, original Ocl kernel is 15% faster than on cat 14.9 but it is still way too bad.

I've compared your diamondTahiti compilation with my Capeverde one. The differences are not that complicated:
- In the ELF's header the 'archtype' field is 3FF vs. 3FD
- In the small binary info section (outer elf) 2 bytes are different: 9F vs. 9C
- In the small binary info section (inner elf) one byte is different: 1C vs. 1A
- In the text ARG section the only difference is the strings: capeverde vs. tahiti

So if I collect all these constants/strings I can convert from one to another. But Capeverde and Tahiti are very similar chips (same GCN generation). It's possible that the binary of Hawaii is much more different.
And even though the two binaries (capeverde and tahiti) are almost the same, clBuildKernel() checks the hardware ids and refuses to load the wrong one.
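
Finding such differences comes down to a byte-by-byte comparison of matching regions. The sketch below is a minimal illustration with made-up 8-byte regions: only the differing values (9F vs. 9C, 1C vs. 1A) come from the post, while the offsets, the layout and the helper names are hypothetical. It only handles regions of equal length, so a string swap like capeverde vs. tahiti would need separate handling.

```python
def byte_diffs(a: bytes, b: bytes):
    """List (offset, byte_in_a, byte_in_b) for every position where two equal-length regions differ."""
    if len(a) != len(b):
        raise ValueError("regions must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def apply_diffs(image: bytes, diffs) -> bytes:
    """Return a copy of image with each listed offset overwritten by the other side's byte."""
    out = bytearray(image)
    for offset, _old, new in diffs:
        out[offset] = new
    return bytes(out)


# Made-up regions; the real sections are larger and laid out differently.
section_a = bytes([0x00, 0x9F, 0x00, 0x9F, 0x00, 0x00, 0x1C, 0x00])
section_b = bytes([0x00, 0x9C, 0x00, 0x9C, 0x00, 0x00, 0x1A, 0x00])

diffs = byte_diffs(section_a, section_b)
for offset, x, y in diffs:
    print(f"offset {offset:#04x}: {x:02X} vs {y:02X}")
print(apply_diffs(section_a, diffs) == section_b)
```

A real converter would also have to keep the ELF section sizes and offsets consistent, which this sketch does not attempt.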


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 13, 2015, 04:12:32 AM
ROFL u just had to try 14.12 hahahaha ... now back to 14.7RC3 LOL


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 13, 2015, 04:19:05 AM
* I've updated the main page with benchmark data I've collected: http://realhet.wordpress.com/gcn-asm-groestl-coin-kernel/

30 Mh/s is for the r9 290, 290x does 33.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: mitache365 on January 13, 2015, 08:05:07 AM
Thanks, I wasnt disabling the intel in UEFI which was my problem. its working now at 26.5MH/s per card which is amazing.
I'm using 14.7r3, xI 2048, 1100/150, -w 256 undervolted to 1.00 and getting 23.38 MH/s.  What's your config?

same here. 23.4
which is the right miner?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 13, 2015, 08:49:56 AM
Yer not gonna get 30-33MHs right out of the box on 290/290x, you will have to tune intensity, gpu clock, mem clock (lowest possible).  Pallas can help with these cards if he's in right mood.
On 280x (1180/150) I was able to use my tuning from previous kernel to get 26.0MHs only because it was already maxed out :) Volt modded, vbios modded etc.  Info about these techniques is in the thread if you look about ...
As far as miner ... sgminer 4.1.0 (sph) is what I use ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: mitache365 on January 13, 2015, 10:00:59 AM
I am at stock core 1070 and mem 1100. this is the difference maybe.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: physixz on January 13, 2015, 10:03:36 AM
Can I ask what software you are using to change the values as I'm using msi after burner 4.1 but it wont change the memory clock and the core clock is always lower than what I set?

These changes are really pushing my cards now as they normally sat at 50'C but are now over 60'C (They are watercooled). I will put up power usage when i next reboot and plug the power meter in.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 13, 2015, 10:26:04 AM
Not sure if you can do this to a 290/290x card because vbios likely to be quite different.  You will have to do some research before you attempt my method using VBE7.0.0.7b.exe, it is a video bios editor u can use to change voltages, clocks at board level.  If I remember correctly it was only for Tahiti cards ... do your research, then u flash vbios with atiwinflash.  There may be programs like msi afterburner but I did my card at low level :) Again, check out if it will work on your card before you do it or u can "brick" your card haha


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 13, 2015, 10:38:42 AM
Wow I got Best share: 702K
Was it a block???? :-D

Now let's get serious: I finally have a little time to write some considerations on the ocl and asm kernels.
I believe we should pursue the asm path for a number of reasons:

- currently the OCL kernel is a little faster on hawaii but not on all other cards and I don't think it can be improved in this respect
- the OCL kernel has been tweaked and optimized for months, while the asm one is new so there is probably much more room for improvement
- just by applying the first and last round optimization the asm kernel will probably be faster on hawaii as well; I'm sure that Realhet will find other asm tricks to apply
- with all these catalyst version problems, the best way to share kernels for the people to mine is by bin files, making the asm version and ocl equivalent (for distribution purposes); better yet would be a miner with all the bundled bin files (takes time)
- asm is cooler than ocl ;-)

what do you guys think?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 13, 2015, 10:41:51 AM
I am at stock core 1070 and mem 1100. this is the difference maybe.
Not to be condescending but have u tried on sgminer command line
--gpu-clock 1100 --mem-clock 150


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: physixz on January 13, 2015, 10:45:29 AM
Well the best i can run at without crashing is at 1040 core / 1250 memory as it wont go lower with a -0.055 core volt drop

1 card: the rig pulls 510W at 28.5MH/s, so 0.056MH/s per watt
2 cards: the rig pulls 740W at 57MH/s, so 0.077MH/s per watt
3 cards: the rig pulls 990W at 85.5MH/s, so 0.086MH/s per watt

which is about 230 - 250W per card, or roughly 0.114MH/s per watt excluding the system use

if anybody can get a higher hash per watt then let me know

EDIT

Even at that rate with my electricity costs i cant make a profit...
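
The per-watt figures in this post can be reproduced from the three wall-power readings. The sketch below uses only the numbers quoted above; the helper names are made up, and the per-card draw is simply the difference between consecutive readings, which assumes the rest of the system draws the same power in all three cases.

```python
# (cards, wall power in W, total hash rate in MH/s), as quoted in the post.
READINGS = [(1, 510.0, 28.5), (2, 740.0, 57.0), (3, 990.0, 85.5)]


def rig_efficiency(watts: float, mhs: float) -> float:
    """Whole-rig efficiency in MH/s per watt."""
    return mhs / watts


def marginal_watts(readings):
    """Extra wall power drawn by each added card."""
    return [b[1] - a[1] for a, b in zip(readings, readings[1:])]


for cards, watts, mhs in READINGS:
    print(f"{cards} card(s): {rig_efficiency(watts, mhs):.3f} MH/s per watt")

per_card_mhs = READINGS[0][2]
for extra in marginal_watts(READINGS):
    print(f"+{extra:.0f} W per added card: {per_card_mhs / extra:.3f} MH/s per watt")
```

The 0.114 figure in the post corresponds to the 250W case; with the 230W reading the card-only efficiency comes out somewhat higher.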


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 13, 2015, 10:45:34 AM
I'm all for sticking with asm route ... u need to feed your ocl tweaks to realhet and lets maximize asm kernel.
As I already suggested to realhet "cross-compile" to generate bins for all arch we support is possible, he needs our bins created on each arch to dig out minor diffs between bins.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on January 13, 2015, 10:48:19 AM
ASM route seems better.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: mitache365 on January 13, 2015, 02:59:31 PM
Can I ask what software you are using to change the values as I'm using msi after burner 4.1 but it wont change the memory clock and the core clock is always lower than what I set?

These changes are really pushing my cards now as they normally sat at 50'C but are now over 60'C (They are watercooled). I will put up power usage when i next reboot and plug the power meter in.

also afterburner but 14.4 driver for me


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: mitache365 on January 13, 2015, 03:00:28 PM
I am at stock core 1070 and mem 1100. this is the difference maybe.
Not to be condescending but have u tried on sgminer command line
--gpu-clock 1100 --mem-clock 150

I will stay at core 1070(dont like to overclock) but will set mem at 150 to see the result.

edit. hm again 23.4 but lower temps. thats fine enough I think.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 13, 2015, 03:04:19 PM
lower mem clock = less power usage and bigger core overclock potential.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: mitache365 on January 13, 2015, 03:33:17 PM
hmm, now I see the miner still showing the stock memclock 1550 for the 280x vapor. why is it not changed? driver 14.4
I changed it from the batch file and also from the miner later.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 13, 2015, 06:03:39 PM
Why can't ppl read the thread ... u will have best performance with driver 14.7RC3 ...

Many 280x are locked to a small range of adjustment on clocks (PowerColor 280x being one of them; that's why I had to low-level vbios mod mine).  Also, many 280x will throttle gpu-clock at temps above 72C.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: mitache365 on January 13, 2015, 06:19:24 PM
thread read. tried 14.4 14.6 14.7 14.9
can't put the cards at a lower memclock than the stock 1550 :( maybe really locked!


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 13, 2015, 06:22:54 PM
Trust me 14.7RC3 is best.
Then u are unlucky enough to have a "Locked" card.
Only recourse for u is vbios modding your card if u want to lower memclock to 150 and be able to do higher overclock on gpu.
Do the research on vbios modding ... there are pointers in this thread by myself, I hate repeating my self a hundred times that's why the info is in thread.
https://bitcointalk.org/index.php?topic=779598.msg9043545#msg9043545

BTW I can still clock mem at 1625 via sgminer setting when I mine X11 or Neoscrypt with it ... just have to set it manual for them ...

Welcome to Extreme Diamond Mining LOL


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: mitache365 on January 13, 2015, 07:37:37 PM
I have 3 different 280x cards (dual, vapor, toxic). Strange to see all are locked. The common thing is that all are Sapphire. Will try this vbios modding these days.
Thank you for the support. I will leave the rigs for now at 22.3mh(1020/1500) /23.4mh(1070/1550) /24mh(1100/1600). A little tired after the last 2 days trying to config everything :)) maybe I am wrong somewhere.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 13, 2015, 09:02:58 PM
For those of you with 290/290x cards that are locked:
http://www.overclock.net/t/1443242/the-r9-290-290x-unlock-thread

That thread is about transforming a 290 into 290x by means of firmware flashing.
But some are hardware locked, like mine... :-(


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 13, 2015, 09:10:54 PM
Still may be useful for someone ...7% gain from 290 to 290x ...  :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 13, 2015, 09:17:19 PM
I have 30 on 290 and 33 on 290x, same clock, so it's 10% :-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 14, 2015, 07:07:58 PM
Quote
* "Guys! We do not need more optimization!"
I've thought about this too. But I think if everyone uses better kernels, then everyone will use the same power to get the same profit, as difficulty will be harder but mining will require less power per hash.
But what if not everyone uses the faster kernel? I think my compiler/IDE is helping a lot in this, as it is kinda user unfriendly
Just as I thought also, there has not been a widespread migration to new kernel :)  Difficulty for newbs to set-up properly, combined with falling BTC value make direct mining less attractive to the "Dumpers" ... Diff is back to a reasonable range again as miners drop out of game ...
This is good for those of us who stick with it :) BTC will recover eventually, I am a long-term DMD holder anyway :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 14, 2015, 09:13:04 PM
Utahjohn: true!
Realhet: still there?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 14, 2015, 09:31:58 PM
Realhet has given us an impressive tool to work with ... time to learn ASM coding ...
I don't really have a grasp on parallel processing and all the nuances of register usage on GPU ... looks like the ball is in your court Pallas.  Hopefully Realhet returns to continue on this project.

Found this, looking thru it now ...
http://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf

Wish I had a printed book of this reference ... wonder if one could order it online, I have no printer ...

This is what I was looking for :) Wow a lot to grasp :)

Nice, it even gives opcodes, I bet this is the reference realhet used to build hetpas :)

In his dev thread he mentions u can inline an instruction opcode for any that are not supported by hetpas assembler.  The opcode tables for generating instructions manually are in the ref manual :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 14, 2015, 11:47:44 PM
@pallas
will you be able to do your best first/last pass implementation in ASM?
Looks like realhet has moved on to another project ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on January 15, 2015, 02:28:11 AM
Hi,

"- asm is cooler than ocl ;-)"  Haha, yes!
And ocl needs black magic to optimize, asm just does what you tell it to.

"Nice, even gives opcodes, I bet this is reference realhet used to build hetpas :)"
I remember: I had a job at the time when the 7970 came out. I got one in December 2011. There was no manual for more than half a year, but the disassembler worked well. So I decoded the instruction set using the disassembler. I even found some undocumented instructions that way. It was fun.
But for some unknown reason this approach has been broken for the last 1-2 years: the disassembler just does nothing when the .elf is a binary-only .elf (which is the case when you use my assembler).

Some tips:
- Use Ctrl+Space in the IDE! It's like Intellisense/codeInsight. (Just start typing v_something!)
- Press F1 on any instruction, it will show a mini help.
- You can DD anything that isn't implemented. (eg. "dd $12345678, 0x74732921, 1234" emits 3 uints into the code)
- Disassembling small opencl programs is a good source of knowledge. It is also the 'documentation' on how to pass a specific set of kernel parameters.
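
To make the DD tip concrete, here is a minimal Python sketch (not hetpas itself) of what emitting three uints into the code stream amounts to. It assumes each value is stored as one little-endian 32-bit word; the helper name emit_dwords is made up for this illustration. Note that $12345678 is Pascal-style hex, the same value as 0x12345678.

```python
import struct

def emit_dwords(*values):
    """Pack each value as one little-endian unsigned 32-bit word (assumed layout)."""
    return struct.pack("<%dI" % len(values), *values)

# The three values from the tip: $12345678 (Pascal hex), 0x74732921 and decimal 1234.
raw = emit_dwords(0x12345678, 0x74732921, 1234)
print(len(raw), raw.hex())
```

The same idea applies to a hand-encoded instruction: look up the opcode fields in the ISA manual, build the 32-bit word(s), and DD them in place.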

"Looks like realhet has moved on to another project ..."
Yea, I have to continue my job soon, as my free time runs out. I'm only planning to experiment with a bit of 2D rope physics. But whatever, now I'm in a Red Alert 2 'project', haha :D


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 15, 2015, 09:18:27 AM
Oh, that's a pity you moved away...
Not sure I like the idea of learning another asm, even if it's very cool!
I understand the first and last round optimizations are boring to do, but could you please, before leaving us, fix the problem with multiple cards? Where card 0 doesn't provide any work unit while card 1 works fine? Thanks!


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 15, 2015, 09:26:14 AM
This one is going to take a lot of learning for me (I'm 53yo LOL learning takes more time for me hehehe) ... I'm going to copy asm src to flash drive and print at apt complex office so it's a bit easier for me to follow through.  Can u send me your latest greatest fastest OCL and I'll get that printed too ...

Funny u can only get 1 card to run, I have 280x as card 0, 7950 as card 1 and they both run kernel fine ...
Just an afterthought: I have not tried with a single instance of sgminer controlling both cards; I run an instance of sgminer for each card individually, so I do not know if this problem affects me ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on January 15, 2015, 10:28:31 AM
@realhet
can u make the right hand pane detachable in hetpas?  I like to use multiple monitors and have a full screen for IDE ...

Also would really appreciate if you could do the finishing touches on first/last pass as neither of us are up to speed on asm yet and could be quite a while ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 15, 2015, 12:25:43 PM
Quote
I built my bin as posted using hetpas.
Two machines.
Two reference 7950s in one.
And a Dual-X 7950 in the other.
All Sapphires. I use sgminer 4.1, the original if you will.
Not sgminer 5.1. Too many bells and whistles.
Hetpas said these cards should do so many hashes, and it is correct.
I run two cards in one machine without any problems.
.......

On my machine, card 0 hashed fine but no work submitted (WU=0).
Card 1 had normal WU.
I'm using 4.1 as well.
Never had this problem with any kernel before.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Toninho on January 17, 2015, 06:32:28 AM
Hi, I need a miner for an Nvidia 560 Ti DS. It works at about 3600 kh/s, but on Groestl with --algo=dmd-gr it goes boom ... boom ... 100% rejects. I don't understand why?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: M1ST3R on January 17, 2015, 10:56:11 PM
Hello, anyone home??? This is an OpenCL kernel, which is for AMD GPUs, not Nvidia.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: qaz6767 on January 20, 2015, 10:27:50 AM
Good day! Could you tell me, is the .cl file in the first post the new one? The one that gives 30 Mh/s on a 290 card? Thanks


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on February 19, 2015, 03:01:59 PM
Is there still interest in this? Utahjohn and.... :-D
I could dedicate some time to finish the opensource kernel v2 if it's worth it.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: berbip on February 19, 2015, 04:58:51 PM
Oh, I'm totally interested :)
Thanks for your great work btw


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on February 25, 2015, 01:02:39 PM
experimental new bin for Hawaii (r9 290/290X) only:

https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw128l8.bin

use worksize 128.

this is my opencl kernel, tweaked for speed and compatibility.
please report hashrates and show your support!


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Star65 on February 25, 2015, 02:27:00 PM
Thx pallas! But im out of the game cos i have 280s only. Utahjohn too (i think so).


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on February 25, 2015, 02:43:16 PM
the new kernel should make no difference on tahiti cards, but we may run some tests later anyway: the reason is compatibility with newer drivers.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on February 25, 2015, 07:26:18 PM
Indeed 280x only here, I am in talks with Pallas to work on this further :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: M1ST3R on February 27, 2015, 03:44:29 AM
Hi Pallas,

The bin file is not working on either sgminer 4.1.0 from the Diamond website or sgminer 5 from Wolf0.
After the ...kernel is experimental... display, both sgminer versions either hung or displayed a black screen.
Maybe the sgminer needs the specific v2 diamond.cl file to function properly.

BTW, unlike v1, changing the name of your .bin file to match the one sgminer generated does not work either.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on February 27, 2015, 09:30:03 AM
the binary can work without the sources.
check that you are running a 64 bit miner (the official diamond miner is 32 bit), that you are using worksize 128 and that you are setting the correct bin file name.
and of course that you have a hawaii card! :-)
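
A small sketch of the file name check above. The pattern is only inferred from the two names posted in this thread (diamondHawaiiw256l8.bin and diamondHawaiiw128l8.bin). Reading the trailing l8 as the byte size of a C long in the miner build is an assumption, and expected_bin_name is a hypothetical helper, not an sgminer function.

```python
def expected_bin_name(kernel, device, worksize, long_bytes):
    # Assumed pattern: <kernel><device>w<worksize>l<long size in bytes>.bin
    return "%s%sw%dl%d.bin" % (kernel, device, worksize, long_bytes)

# Name of the posted Hawaii binary when run with worksize 128.
print(expected_bin_name("diamond", "Hawaii", 128, 8))
# What a build with a 4-byte long would be expected to look for, if the assumption holds.
print(expected_bin_name("diamond", "Hawaii", 128, 4))
```

If the assumption holds, a miner built with a 4-byte long would look for a different name than the posted binary, so it is worth comparing the name your miner generates with the file you downloaded.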


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on February 27, 2015, 09:33:34 AM
Good reason to have OCL source LOL, I am 64 bit OS but my miner is 32 bit ...
In my experience blackscreen or just hung miner indicates too much O/C or memclock not set as recommended (GPU crash before it can be reported) ...

Without the OCL source I can not test on Tahiti properly so I have no definite answer ... u did not specify OS, config etc so it's a bit hard to troubleshoot ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: sammy007 on February 27, 2015, 04:35:20 PM
Very nice results with 290(X), any chance for 280x gain?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on February 27, 2015, 04:41:36 PM
In order to do the same optimizations on 280(x), the code would need to be almost completely rewritten to work on 32 bit numbers instead of 64 (because of lds usage), hoping for the vgprs count (which is mostly in compiler control and very difficult to reduce by modifying the opencl code) to be low enough to permit 2 wavefronts.
Or, maybe, a better compiler in the future could do it by itself.
As of now, I think the best is using the asm version for 280(x) and my last binary for 290(x).
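
To put a number on the "2 wavefronts" remark, assuming it refers to wavefronts resident per SIMD: on GCN each SIMD has a budget of 256 VGPRs per lane and can hold at most 10 wavefronts, with registers allocated in blocks of 4. Below is a rough sketch of the resulting limit. It looks at VGPRs only (LDS and SGPR limits are ignored), and waves_per_simd is a name chosen for this example.

```python
VGPRS_PER_LANE = 256      # GCN vector register budget per SIMD lane
MAX_WAVES_PER_SIMD = 10   # GCN hardware cap on resident wavefronts per SIMD
VGPR_GRANULARITY = 4      # registers are allocated in blocks of 4

def waves_per_simd(vgprs_used):
    """Estimate resident wavefronts per SIMD from the kernel's VGPR count alone."""
    if not 1 <= vgprs_used <= VGPRS_PER_LANE:
        raise ValueError("VGPR count must be between 1 and 256")
    # Round up to the allocation granularity.
    allocated = -(-vgprs_used // VGPR_GRANULARITY) * VGPR_GRANULARITY
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_LANE // allocated)

for vgprs in (84, 128, 129, 200):
    print(vgprs, "VGPRs ->", waves_per_simd(vgprs), "wavefront(s) per SIMD")
```

Under these assumptions the compiler has to keep the kernel at 128 VGPRs or fewer for two wavefronts to fit, and a single extra register drops it back to one.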


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on February 27, 2015, 05:00:08 PM
Why do you think this? The ASM version is driver independent and relies on coding directly for the GPU. There is very little difference between the 280x and the 290; the 290 has more shaders, true. But basic code optimizations such as your first/last pass should work in ASM just as well or better, considering that AMD lobotomized the OCL compiler after 14.7


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on February 27, 2015, 09:57:54 PM
14.12 is the first version that generates Hawaii-specific code which, in some cases, may bring noticeable improvements.
On Tahiti, the compiler simply can't make code capable of running 2 wavefronts. Or maybe it can, but I'm not able to make it do so, whereas on Hawaii I can.
Hawaii is not just Tahiti with more shaders...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on February 27, 2015, 10:11:06 PM
What are the differences? The 280x (Tahiti) can do multiple gpu-threads on many other coins (up to 4 gpu-threads on x11) with great efficiency; I do not understand why groestl can not.
Forgive me for asking such questions, but it's like my question about neoscrypt (which performs best with only 1 gpu-thread), with WS being tuned entirely by the number of shaders ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on February 27, 2015, 10:34:32 PM
I believe multiple threads help with algos which use gpu ram: groestl does not. Only WS and intensity matter. TC only affects a buffer in ram, so it's not relevant either.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on February 27, 2015, 10:41:24 PM
OK that makes sense :)
So I can still play with WS in your new OCL? (I think WS may be card specific tuning).


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: HR on February 28, 2015, 09:35:46 AM

Pallas,

Are you planning on adding myriad-groestl support in the future? If not, could you explain why not? Is it because your groestl kernel is already faster than the myriad-groestl?

Also, are you planning on putting your work on github? Again, if not, could you explain why not?

It seems to me that both are important ways to further your efforts and establish your reputation.

Best regards as always.

HR


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on February 28, 2015, 09:51:37 AM
Myr-groestl: If there is interest and I can get some free time I'd love to do it :-)
Github: for a couple of files it's not worth it, IMHO


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on February 28, 2015, 10:41:20 AM
There is interest, I am leaving DMD groestl as I am sick of crap there.  HR has convinced me to move to Digibyte which is myriad-groestl?  IDK yet I have to d/l wallet and blockchain ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on February 28, 2015, 11:29:29 AM
I think it's multi-algo including myr-groestl, skein etc.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Heavyiron on March 01, 2015, 07:06:44 PM
Hello pallas and thank you for your kernel.
I have groestl code from smelter (the first GPU miner for quark). Maybe it has some tricks useful for your work. It was rather fast on the Radeon HD 5xxx series. But it is not adapted for sgminer and I don't have the skills to do that job.

Code:
/* Shorthand for OpenCL address-space qualifiers and work-item IDs. */
#define CONSTANT __constant
#define LOCAL __local
#define GLOBAL __global
#define RESTRICT restrict
#define GLOBALID (uint)(get_global_id(0))
#define LOCALID get_local_id(0)

/* Extract byte 0..3 of a 32-bit word; on little-endian devices .x is the least significant byte. */
#define EXT_BYTE32_0(n) ((uint)(as_uchar4((uint)(n)).x))
#define EXT_BYTE32_1(n) ((uint)(as_uchar4((uint)(n)).y))
#define EXT_BYTE32_2(n) ((uint)(as_uchar4((uint)(n)).z))
#define EXT_BYTE32_3(n) ((uint)(as_uchar4((uint)(n)).w))

/* Groestl-specific aliases for the byte extractors above. */
#define groestl_EXT_BYTE_0(n) EXT_BYTE32_0(n)
#define groestl_EXT_BYTE_1(n) EXT_BYTE32_1(n)
#define groestl_EXT_BYTE_2(n) EXT_BYTE32_2(n)
#define groestl_EXT_BYTE_3(n) EXT_BYTE32_3(n)


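/* P mix: XOR r and a per-column constant into the even-indexed words of src, then build each dst word from eight table lookups (T0..T7). */
#define groestl_PMIX(src, dst, r)\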
src[ 0] ^= (r);\
src[ 2] ^= 0x00000010u^(r);\
src[ 4] ^= 0x00000020u^(r);\
src[ 6] ^= 0x00000030u^(r);\
src[ 8] ^= 0x00000040u^(r);\
src[10] ^= 0x00000050u^(r);\
src[12] ^= 0x00000060u^(r);\
src[14] ^= 0x00000070u^(r);\
src[16] ^= 0x00000080u^(r);\
src[18] ^= 0x00000090u^(r);\
src[20] ^= 0x000000a0u^(r);\
src[22] ^= 0x000000b0u^(r);\
src[24] ^= 0x000000c0u^(r);\
src[26] ^= 0x000000d0u^(r);\
src[28] ^= 0x000000e0u^(r);\
src[30] ^= 0x000000f0u^(r);\
dst[ 0]  = groestl_T0[groestl_EXT_BYTE_0(src[ 0])];\
dst[ 1]  = groestl_T0[groestl_EXT_BYTE_0(src[ 9])];\
dst[ 2]  = groestl_T0[groestl_EXT_BYTE_0(src[ 2])];\
dst[ 3]  = groestl_T0[groestl_EXT_BYTE_0(src[11])];\
dst[ 4]  = groestl_T0[groestl_EXT_BYTE_0(src[ 4])];\
dst[ 5]  = groestl_T0[groestl_EXT_BYTE_0(src[13])];\
dst[ 6]  = groestl_T0[groestl_EXT_BYTE_0(src[ 6])];\
dst[ 7]  = groestl_T0[groestl_EXT_BYTE_0(src[15])];\
dst[ 8]  = groestl_T0[groestl_EXT_BYTE_0(src[ 8])];\
dst[ 9]  = groestl_T0[groestl_EXT_BYTE_0(src[17])];\
dst[10]  = groestl_T0[groestl_EXT_BYTE_0(src[10])];\
dst[11]  = groestl_T0[groestl_EXT_BYTE_0(src[19])];\
dst[12]  = groestl_T0[groestl_EXT_BYTE_0(src[12])];\
dst[13]  = groestl_T0[groestl_EXT_BYTE_0(src[21])];\
dst[14]  = groestl_T0[groestl_EXT_BYTE_0(src[14])];\
dst[15]  = groestl_T0[groestl_EXT_BYTE_0(src[23])];\
dst[16]  = groestl_T0[groestl_EXT_BYTE_0(src[16])];\
dst[17]  = groestl_T0[groestl_EXT_BYTE_0(src[25])];\
dst[18]  = groestl_T0[groestl_EXT_BYTE_0(src[18])];\
dst[19]  = groestl_T0[groestl_EXT_BYTE_0(src[27])];\
dst[20]  = groestl_T0[groestl_EXT_BYTE_0(src[20])];\
dst[21]  = groestl_T0[groestl_EXT_BYTE_0(src[29])];\
dst[22]  = groestl_T0[groestl_EXT_BYTE_0(src[22])];\
dst[23]  = groestl_T0[groestl_EXT_BYTE_0(src[31])];\
dst[24]  = groestl_T0[groestl_EXT_BYTE_0(src[24])];\
dst[25]  = groestl_T0[groestl_EXT_BYTE_0(src[ 1])];\
dst[26]  = groestl_T0[groestl_EXT_BYTE_0(src[26])];\
dst[27]  = groestl_T0[groestl_EXT_BYTE_0(src[ 3])];\
dst[28]  = groestl_T0[groestl_EXT_BYTE_0(src[28])];\
dst[29]  = groestl_T0[groestl_EXT_BYTE_0(src[ 5])];\
dst[30]  = groestl_T0[groestl_EXT_BYTE_0(src[30])];\
dst[31]  = groestl_T0[groestl_EXT_BYTE_0(src[ 7])];\
dst[ 0] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 2])];\
dst[ 1] ^= groestl_T1[groestl_EXT_BYTE_1(src[11])];\
dst[ 2] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 4])];\
dst[ 3] ^= groestl_T1[groestl_EXT_BYTE_1(src[13])];\
dst[ 4] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 6])];\
dst[ 5] ^= groestl_T1[groestl_EXT_BYTE_1(src[15])];\
dst[ 6] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 8])];\
dst[ 7] ^= groestl_T1[groestl_EXT_BYTE_1(src[17])];\
dst[ 8] ^= groestl_T1[groestl_EXT_BYTE_1(src[10])];\
dst[ 9] ^= groestl_T1[groestl_EXT_BYTE_1(src[19])];\
dst[10] ^= groestl_T1[groestl_EXT_BYTE_1(src[12])];\
dst[11] ^= groestl_T1[groestl_EXT_BYTE_1(src[21])];\
dst[12] ^= groestl_T1[groestl_EXT_BYTE_1(src[14])];\
dst[13] ^= groestl_T1[groestl_EXT_BYTE_1(src[23])];\
dst[14] ^= groestl_T1[groestl_EXT_BYTE_1(src[16])];\
dst[15] ^= groestl_T1[groestl_EXT_BYTE_1(src[25])];\
dst[16] ^= groestl_T1[groestl_EXT_BYTE_1(src[18])];\
dst[17] ^= groestl_T1[groestl_EXT_BYTE_1(src[27])];\
dst[18] ^= groestl_T1[groestl_EXT_BYTE_1(src[20])];\
dst[19] ^= groestl_T1[groestl_EXT_BYTE_1(src[29])];\
dst[20] ^= groestl_T1[groestl_EXT_BYTE_1(src[22])];\
dst[21] ^= groestl_T1[groestl_EXT_BYTE_1(src[31])];\
dst[22] ^= groestl_T1[groestl_EXT_BYTE_1(src[24])];\
dst[23] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 1])];\
dst[24] ^= groestl_T1[groestl_EXT_BYTE_1(src[26])];\
dst[25] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 3])];\
dst[26] ^= groestl_T1[groestl_EXT_BYTE_1(src[28])];\
dst[27] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 5])];\
dst[28] ^= groestl_T1[groestl_EXT_BYTE_1(src[30])];\
dst[29] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 7])];\
dst[30] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 0])];\
dst[31] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 9])];\
dst[ 0] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 4])];\
dst[ 1] ^= groestl_T2[groestl_EXT_BYTE_2(src[13])];\
dst[ 2] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 6])];\
dst[ 3] ^= groestl_T2[groestl_EXT_BYTE_2(src[15])];\
dst[ 4] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 8])];\
dst[ 5] ^= groestl_T2[groestl_EXT_BYTE_2(src[17])];\
dst[ 6] ^= groestl_T2[groestl_EXT_BYTE_2(src[10])];\
dst[ 7] ^= groestl_T2[groestl_EXT_BYTE_2(src[19])];\
dst[ 8] ^= groestl_T2[groestl_EXT_BYTE_2(src[12])];\
dst[ 9] ^= groestl_T2[groestl_EXT_BYTE_2(src[21])];\
dst[10] ^= groestl_T2[groestl_EXT_BYTE_2(src[14])];\
dst[11] ^= groestl_T2[groestl_EXT_BYTE_2(src[23])];\
dst[12] ^= groestl_T2[groestl_EXT_BYTE_2(src[16])];\
dst[13] ^= groestl_T2[groestl_EXT_BYTE_2(src[25])];\
dst[14] ^= groestl_T2[groestl_EXT_BYTE_2(src[18])];\
dst[15] ^= groestl_T2[groestl_EXT_BYTE_2(src[27])];\
dst[16] ^= groestl_T2[groestl_EXT_BYTE_2(src[20])];\
dst[17] ^= groestl_T2[groestl_EXT_BYTE_2(src[29])];\
dst[18] ^= groestl_T2[groestl_EXT_BYTE_2(src[22])];\
dst[19] ^= groestl_T2[groestl_EXT_BYTE_2(src[31])];\
dst[20] ^= groestl_T2[groestl_EXT_BYTE_2(src[24])];\
dst[21] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 1])];\
dst[22] ^= groestl_T2[groestl_EXT_BYTE_2(src[26])];\
dst[23] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 3])];\
dst[24] ^= groestl_T2[groestl_EXT_BYTE_2(src[28])];\
dst[25] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 5])];\
dst[26] ^= groestl_T2[groestl_EXT_BYTE_2(src[30])];\
dst[27] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 7])];\
dst[28] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 0])];\
dst[29] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 9])];\
dst[30] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 2])];\
dst[31] ^= groestl_T2[groestl_EXT_BYTE_2(src[11])];\
dst[ 0] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 6])];\
dst[ 1] ^= groestl_T3[groestl_EXT_BYTE_3(src[23])];\
dst[ 2] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 8])];\
dst[ 3] ^= groestl_T3[groestl_EXT_BYTE_3(src[25])];\
dst[ 4] ^= groestl_T3[groestl_EXT_BYTE_3(src[10])];\
dst[ 5] ^= groestl_T3[groestl_EXT_BYTE_3(src[27])];\
dst[ 6] ^= groestl_T3[groestl_EXT_BYTE_3(src[12])];\
dst[ 7] ^= groestl_T3[groestl_EXT_BYTE_3(src[29])];\
dst[ 8] ^= groestl_T3[groestl_EXT_BYTE_3(src[14])];\
dst[ 9] ^= groestl_T3[groestl_EXT_BYTE_3(src[31])];\
dst[10] ^= groestl_T3[groestl_EXT_BYTE_3(src[16])];\
dst[11] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 1])];\
dst[12] ^= groestl_T3[groestl_EXT_BYTE_3(src[18])];\
dst[13] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 3])];\
dst[14] ^= groestl_T3[groestl_EXT_BYTE_3(src[20])];\
dst[15] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 5])];\
dst[16] ^= groestl_T3[groestl_EXT_BYTE_3(src[22])];\
dst[17] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 7])];\
dst[18] ^= groestl_T3[groestl_EXT_BYTE_3(src[24])];\
dst[19] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 9])];\
dst[20] ^= groestl_T3[groestl_EXT_BYTE_3(src[26])];\
dst[21] ^= groestl_T3[groestl_EXT_BYTE_3(src[11])];\
dst[22] ^= groestl_T3[groestl_EXT_BYTE_3(src[28])];\
dst[23] ^= groestl_T3[groestl_EXT_BYTE_3(src[13])];\
dst[24] ^= groestl_T3[groestl_EXT_BYTE_3(src[30])];\
dst[25] ^= groestl_T3[groestl_EXT_BYTE_3(src[15])];\
dst[26] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 0])];\
dst[27] ^= groestl_T3[groestl_EXT_BYTE_3(src[17])];\
dst[28] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 2])];\
dst[29] ^= groestl_T3[groestl_EXT_BYTE_3(src[19])];\
dst[30] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 4])];\
dst[31] ^= groestl_T3[groestl_EXT_BYTE_3(src[21])];\
dst[ 0] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 9])];\
dst[ 1] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 0])];\
dst[ 2] ^= groestl_T4[groestl_EXT_BYTE_0(src[11])];\
dst[ 3] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 2])];\
dst[ 4] ^= groestl_T4[groestl_EXT_BYTE_0(src[13])];\
dst[ 5] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 4])];\
dst[ 6] ^= groestl_T4[groestl_EXT_BYTE_0(src[15])];\
dst[ 7] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 6])];\
dst[ 8] ^= groestl_T4[groestl_EXT_BYTE_0(src[17])];\
dst[ 9] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 8])];\
dst[10] ^= groestl_T4[groestl_EXT_BYTE_0(src[19])];\
dst[11] ^= groestl_T4[groestl_EXT_BYTE_0(src[10])];\
dst[12] ^= groestl_T4[groestl_EXT_BYTE_0(src[21])];\
dst[13] ^= groestl_T4[groestl_EXT_BYTE_0(src[12])];\
dst[14] ^= groestl_T4[groestl_EXT_BYTE_0(src[23])];\
dst[15] ^= groestl_T4[groestl_EXT_BYTE_0(src[14])];\
dst[16] ^= groestl_T4[groestl_EXT_BYTE_0(src[25])];\
dst[17] ^= groestl_T4[groestl_EXT_BYTE_0(src[16])];\
dst[18] ^= groestl_T4[groestl_EXT_BYTE_0(src[27])];\
dst[19] ^= groestl_T4[groestl_EXT_BYTE_0(src[18])];\
dst[20] ^= groestl_T4[groestl_EXT_BYTE_0(src[29])];\
dst[21] ^= groestl_T4[groestl_EXT_BYTE_0(src[20])];\
dst[22] ^= groestl_T4[groestl_EXT_BYTE_0(src[31])];\
dst[23] ^= groestl_T4[groestl_EXT_BYTE_0(src[22])];\
dst[24] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 1])];\
dst[25] ^= groestl_T4[groestl_EXT_BYTE_0(src[24])];\
dst[26] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 3])];\
dst[27] ^= groestl_T4[groestl_EXT_BYTE_0(src[26])];\
dst[28] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 5])];\
dst[29] ^= groestl_T4[groestl_EXT_BYTE_0(src[28])];\
dst[30] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 7])];\
dst[31] ^= groestl_T4[groestl_EXT_BYTE_0(src[30])];\
dst[ 0] ^= groestl_T5[groestl_EXT_BYTE_1(src[11])];\
dst[ 1] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 2])];\
dst[ 2] ^= groestl_T5[groestl_EXT_BYTE_1(src[13])];\
dst[ 3] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 4])];\
dst[ 4] ^= groestl_T5[groestl_EXT_BYTE_1(src[15])];\
dst[ 5] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 6])];\
dst[ 6] ^= groestl_T5[groestl_EXT_BYTE_1(src[17])];\
dst[ 7] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 8])];\
dst[ 8] ^= groestl_T5[groestl_EXT_BYTE_1(src[19])];\
dst[ 9] ^= groestl_T5[groestl_EXT_BYTE_1(src[10])];\
dst[10] ^= groestl_T5[groestl_EXT_BYTE_1(src[21])];\
dst[11] ^= groestl_T5[groestl_EXT_BYTE_1(src[12])];\
dst[12] ^= groestl_T5[groestl_EXT_BYTE_1(src[23])];\
dst[13] ^= groestl_T5[groestl_EXT_BYTE_1(src[14])];\
dst[14] ^= groestl_T5[groestl_EXT_BYTE_1(src[25])];\
dst[15] ^= groestl_T5[groestl_EXT_BYTE_1(src[16])];\
dst[16] ^= groestl_T5[groestl_EXT_BYTE_1(src[27])];\
dst[17] ^= groestl_T5[groestl_EXT_BYTE_1(src[18])];\
dst[18] ^= groestl_T5[groestl_EXT_BYTE_1(src[29])];\
dst[19] ^= groestl_T5[groestl_EXT_BYTE_1(src[20])];\
dst[20] ^= groestl_T5[groestl_EXT_BYTE_1(src[31])];\
dst[21] ^= groestl_T5[groestl_EXT_BYTE_1(src[22])];\
dst[22] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 1])];\
dst[23] ^= groestl_T5[groestl_EXT_BYTE_1(src[24])];\
dst[24] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 3])];\
dst[25] ^= groestl_T5[groestl_EXT_BYTE_1(src[26])];\
dst[26] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 5])];\
dst[27] ^= groestl_T5[groestl_EXT_BYTE_1(src[28])];\
dst[28] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 7])];\
dst[29] ^= groestl_T5[groestl_EXT_BYTE_1(src[30])];\
dst[30] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 9])];\
dst[31] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 0])];\
dst[ 0] ^= groestl_T6[groestl_EXT_BYTE_2(src[13])];\
dst[ 1] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 4])];\
dst[ 2] ^= groestl_T6[groestl_EXT_BYTE_2(src[15])];\
dst[ 3] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 6])];\
dst[ 4] ^= groestl_T6[groestl_EXT_BYTE_2(src[17])];\
dst[ 5] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 8])];\
dst[ 6] ^= groestl_T6[groestl_EXT_BYTE_2(src[19])];\
dst[ 7] ^= groestl_T6[groestl_EXT_BYTE_2(src[10])];\
dst[ 8] ^= groestl_T6[groestl_EXT_BYTE_2(src[21])];\
dst[ 9] ^= groestl_T6[groestl_EXT_BYTE_2(src[12])];\
dst[10] ^= groestl_T6[groestl_EXT_BYTE_2(src[23])];\
dst[11] ^= groestl_T6[groestl_EXT_BYTE_2(src[14])];\
dst[12] ^= groestl_T6[groestl_EXT_BYTE_2(src[25])];\
dst[13] ^= groestl_T6[groestl_EXT_BYTE_2(src[16])];\
dst[14] ^= groestl_T6[groestl_EXT_BYTE_2(src[27])];\
dst[15] ^= groestl_T6[groestl_EXT_BYTE_2(src[18])];\
dst[16] ^= groestl_T6[groestl_EXT_BYTE_2(src[29])];\
dst[17] ^= groestl_T6[groestl_EXT_BYTE_2(src[20])];\
dst[18] ^= groestl_T6[groestl_EXT_BYTE_2(src[31])];\
dst[19] ^= groestl_T6[groestl_EXT_BYTE_2(src[22])];\
dst[20] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 1])];\
dst[21] ^= groestl_T6[groestl_EXT_BYTE_2(src[24])];\
dst[22] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 3])];\
dst[23] ^= groestl_T6[groestl_EXT_BYTE_2(src[26])];\
dst[24] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 5])];\
dst[25] ^= groestl_T6[groestl_EXT_BYTE_2(src[28])];\
dst[26] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 7])];\
dst[27] ^= groestl_T6[groestl_EXT_BYTE_2(src[30])];\
dst[28] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 9])];\
dst[29] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 0])];\
dst[30] ^= groestl_T6[groestl_EXT_BYTE_2(src[11])];\
dst[31] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 2])];\
dst[ 0] ^= groestl_T7[groestl_EXT_BYTE_3(src[23])];\
dst[ 1] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 6])];\
dst[ 2] ^= groestl_T7[groestl_EXT_BYTE_3(src[25])];\
dst[ 3] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 8])];\
dst[ 4] ^= groestl_T7[groestl_EXT_BYTE_3(src[27])];\
dst[ 5] ^= groestl_T7[groestl_EXT_BYTE_3(src[10])];\
dst[ 6] ^= groestl_T7[groestl_EXT_BYTE_3(src[29])];\
dst[ 7] ^= groestl_T7[groestl_EXT_BYTE_3(src[12])];\
dst[ 8] ^= groestl_T7[groestl_EXT_BYTE_3(src[31])];\
dst[ 9] ^= groestl_T7[groestl_EXT_BYTE_3(src[14])];\
dst[10] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 1])];\
dst[11] ^= groestl_T7[groestl_EXT_BYTE_3(src[16])];\
dst[12] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 3])];\
dst[13] ^= groestl_T7[groestl_EXT_BYTE_3(src[18])];\
dst[14] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 5])];\
dst[15] ^= groestl_T7[groestl_EXT_BYTE_3(src[20])];\
dst[16] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 7])];\
dst[17] ^= groestl_T7[groestl_EXT_BYTE_3(src[22])];\
dst[18] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 9])];\
dst[19] ^= groestl_T7[groestl_EXT_BYTE_3(src[24])];\
dst[20] ^= groestl_T7[groestl_EXT_BYTE_3(src[11])];\
dst[21] ^= groestl_T7[groestl_EXT_BYTE_3(src[26])];\
dst[22] ^= groestl_T7[groestl_EXT_BYTE_3(src[13])];\
dst[23] ^= groestl_T7[groestl_EXT_BYTE_3(src[28])];\
dst[24] ^= groestl_T7[groestl_EXT_BYTE_3(src[15])];\
dst[25] ^= groestl_T7[groestl_EXT_BYTE_3(src[30])];\
dst[26] ^= groestl_T7[groestl_EXT_BYTE_3(src[17])];\
dst[27] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 0])];\
dst[28] ^= groestl_T7[groestl_EXT_BYTE_3(src[19])];\
dst[29] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 2])];\
dst[30] ^= groestl_T7[groestl_EXT_BYTE_3(src[21])];\
dst[31] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 4])];

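// Q permutation step: XOR the round constant r into src in place, then combine
// the T-table lookups (SubBytes + ShiftBytes + MixBytes) into dst.
#define groestl_QMIX(src, dst, r)\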
src[ 0] = ~src[ 0];\
src[ 1] ^= ~(r);\
src[ 2] = ~src[ 2];\
src[ 3] ^= 0xefffffffu^(r);\
src[ 4] = ~src[ 4];\
src[ 5] ^= 0xdfffffffu^(r);\
src[ 6] = ~src[ 6];\
src[ 7] ^= 0xcfffffffu^(r);\
src[ 8] = ~src[ 8];\
src[ 9] ^= 0xbfffffffu^(r);\
src[10] = ~src[10];\
src[11] ^= 0xafffffffu^(r);\
src[12] = ~src[12];\
src[13] ^= 0x9fffffffu^(r);\
src[14] = ~src[14];\
src[15] ^= 0x8fffffffu^(r);\
src[16] = ~src[16];\
src[17] ^= 0x7fffffffu^(r);\
src[18] = ~src[18];\
src[19] ^= 0x6fffffffu^(r);\
src[20] = ~src[20];\
src[21] ^= 0x5fffffffu^(r);\
src[22] = ~src[22];\
src[23] ^= 0x4fffffffu^(r);\
src[24] = ~src[24];\
src[25] ^= 0x3fffffffu^(r);\
src[26] = ~src[26];\
src[27] ^= 0x2fffffffu^(r);\
src[28] = ~src[28];\
src[29] ^= 0x1fffffffu^(r);\
src[30] = ~src[30];\
src[31] ^= 0x0fffffffu^(r);\
dst[ 0]  = groestl_T0[groestl_EXT_BYTE_0(src[ 2])];\
dst[ 1]  = groestl_T0[groestl_EXT_BYTE_0(src[ 1])];\
dst[ 2]  = groestl_T0[groestl_EXT_BYTE_0(src[ 4])];\
dst[ 3]  = groestl_T0[groestl_EXT_BYTE_0(src[ 3])];\
dst[ 4]  = groestl_T0[groestl_EXT_BYTE_0(src[ 6])];\
dst[ 5]  = groestl_T0[groestl_EXT_BYTE_0(src[ 5])];\
dst[ 6]  = groestl_T0[groestl_EXT_BYTE_0(src[ 8])];\
dst[ 7]  = groestl_T0[groestl_EXT_BYTE_0(src[ 7])];\
dst[ 8]  = groestl_T0[groestl_EXT_BYTE_0(src[10])];\
dst[ 9]  = groestl_T0[groestl_EXT_BYTE_0(src[ 9])];\
dst[10]  = groestl_T0[groestl_EXT_BYTE_0(src[12])];\
dst[11]  = groestl_T0[groestl_EXT_BYTE_0(src[11])];\
dst[12]  = groestl_T0[groestl_EXT_BYTE_0(src[14])];\
dst[13]  = groestl_T0[groestl_EXT_BYTE_0(src[13])];\
dst[14]  = groestl_T0[groestl_EXT_BYTE_0(src[16])];\
dst[15]  = groestl_T0[groestl_EXT_BYTE_0(src[15])];\
dst[16]  = groestl_T0[groestl_EXT_BYTE_0(src[18])];\
dst[17]  = groestl_T0[groestl_EXT_BYTE_0(src[17])];\
dst[18]  = groestl_T0[groestl_EXT_BYTE_0(src[20])];\
dst[19]  = groestl_T0[groestl_EXT_BYTE_0(src[19])];\
dst[20]  = groestl_T0[groestl_EXT_BYTE_0(src[22])];\
dst[21]  = groestl_T0[groestl_EXT_BYTE_0(src[21])];\
dst[22]  = groestl_T0[groestl_EXT_BYTE_0(src[24])];\
dst[23]  = groestl_T0[groestl_EXT_BYTE_0(src[23])];\
dst[24]  = groestl_T0[groestl_EXT_BYTE_0(src[26])];\
dst[25]  = groestl_T0[groestl_EXT_BYTE_0(src[25])];\
dst[26]  = groestl_T0[groestl_EXT_BYTE_0(src[28])];\
dst[27]  = groestl_T0[groestl_EXT_BYTE_0(src[27])];\
dst[28]  = groestl_T0[groestl_EXT_BYTE_0(src[30])];\
dst[29]  = groestl_T0[groestl_EXT_BYTE_0(src[29])];\
dst[30]  = groestl_T0[groestl_EXT_BYTE_0(src[ 0])];\
dst[31]  = groestl_T0[groestl_EXT_BYTE_0(src[31])];\
dst[ 0] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 6])];\
dst[ 1] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 5])];\
dst[ 2] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 8])];\
dst[ 3] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 7])];\
dst[ 4] ^= groestl_T1[groestl_EXT_BYTE_1(src[10])];\
dst[ 5] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 9])];\
dst[ 6] ^= groestl_T1[groestl_EXT_BYTE_1(src[12])];\
dst[ 7] ^= groestl_T1[groestl_EXT_BYTE_1(src[11])];\
dst[ 8] ^= groestl_T1[groestl_EXT_BYTE_1(src[14])];\
dst[ 9] ^= groestl_T1[groestl_EXT_BYTE_1(src[13])];\
dst[10] ^= groestl_T1[groestl_EXT_BYTE_1(src[16])];\
dst[11] ^= groestl_T1[groestl_EXT_BYTE_1(src[15])];\
dst[12] ^= groestl_T1[groestl_EXT_BYTE_1(src[18])];\
dst[13] ^= groestl_T1[groestl_EXT_BYTE_1(src[17])];\
dst[14] ^= groestl_T1[groestl_EXT_BYTE_1(src[20])];\
dst[15] ^= groestl_T1[groestl_EXT_BYTE_1(src[19])];\
dst[16] ^= groestl_T1[groestl_EXT_BYTE_1(src[22])];\
dst[17] ^= groestl_T1[groestl_EXT_BYTE_1(src[21])];\
dst[18] ^= groestl_T1[groestl_EXT_BYTE_1(src[24])];\
dst[19] ^= groestl_T1[groestl_EXT_BYTE_1(src[23])];\
dst[20] ^= groestl_T1[groestl_EXT_BYTE_1(src[26])];\
dst[21] ^= groestl_T1[groestl_EXT_BYTE_1(src[25])];\
dst[22] ^= groestl_T1[groestl_EXT_BYTE_1(src[28])];\
dst[23] ^= groestl_T1[groestl_EXT_BYTE_1(src[27])];\
dst[24] ^= groestl_T1[groestl_EXT_BYTE_1(src[30])];\
dst[25] ^= groestl_T1[groestl_EXT_BYTE_1(src[29])];\
dst[26] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 0])];\
dst[27] ^= groestl_T1[groestl_EXT_BYTE_1(src[31])];\
dst[28] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 2])];\
dst[29] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 1])];\
dst[30] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 4])];\
dst[31] ^= groestl_T1[groestl_EXT_BYTE_1(src[ 3])];\
dst[ 0] ^= groestl_T2[groestl_EXT_BYTE_2(src[10])];\
dst[ 1] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 9])];\
dst[ 2] ^= groestl_T2[groestl_EXT_BYTE_2(src[12])];\
dst[ 3] ^= groestl_T2[groestl_EXT_BYTE_2(src[11])];\
dst[ 4] ^= groestl_T2[groestl_EXT_BYTE_2(src[14])];\
dst[ 5] ^= groestl_T2[groestl_EXT_BYTE_2(src[13])];\
dst[ 6] ^= groestl_T2[groestl_EXT_BYTE_2(src[16])];\
dst[ 7] ^= groestl_T2[groestl_EXT_BYTE_2(src[15])];\
dst[ 8] ^= groestl_T2[groestl_EXT_BYTE_2(src[18])];\
dst[ 9] ^= groestl_T2[groestl_EXT_BYTE_2(src[17])];\
dst[10] ^= groestl_T2[groestl_EXT_BYTE_2(src[20])];\
dst[11] ^= groestl_T2[groestl_EXT_BYTE_2(src[19])];\
dst[12] ^= groestl_T2[groestl_EXT_BYTE_2(src[22])];\
dst[13] ^= groestl_T2[groestl_EXT_BYTE_2(src[21])];\
dst[14] ^= groestl_T2[groestl_EXT_BYTE_2(src[24])];\
dst[15] ^= groestl_T2[groestl_EXT_BYTE_2(src[23])];\
dst[16] ^= groestl_T2[groestl_EXT_BYTE_2(src[26])];\
dst[17] ^= groestl_T2[groestl_EXT_BYTE_2(src[25])];\
dst[18] ^= groestl_T2[groestl_EXT_BYTE_2(src[28])];\
dst[19] ^= groestl_T2[groestl_EXT_BYTE_2(src[27])];\
dst[20] ^= groestl_T2[groestl_EXT_BYTE_2(src[30])];\
dst[21] ^= groestl_T2[groestl_EXT_BYTE_2(src[29])];\
dst[22] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 0])];\
dst[23] ^= groestl_T2[groestl_EXT_BYTE_2(src[31])];\
dst[24] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 2])];\
dst[25] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 1])];\
dst[26] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 4])];\
dst[27] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 3])];\
dst[28] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 6])];\
dst[29] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 5])];\
dst[30] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 8])];\
dst[31] ^= groestl_T2[groestl_EXT_BYTE_2(src[ 7])];\
dst[ 0] ^= groestl_T3[groestl_EXT_BYTE_3(src[22])];\
dst[ 1] ^= groestl_T3[groestl_EXT_BYTE_3(src[13])];\
dst[ 2] ^= groestl_T3[groestl_EXT_BYTE_3(src[24])];\
dst[ 3] ^= groestl_T3[groestl_EXT_BYTE_3(src[15])];\
dst[ 4] ^= groestl_T3[groestl_EXT_BYTE_3(src[26])];\
dst[ 5] ^= groestl_T3[groestl_EXT_BYTE_3(src[17])];\
dst[ 6] ^= groestl_T3[groestl_EXT_BYTE_3(src[28])];\
dst[ 7] ^= groestl_T3[groestl_EXT_BYTE_3(src[19])];\
dst[ 8] ^= groestl_T3[groestl_EXT_BYTE_3(src[30])];\
dst[ 9] ^= groestl_T3[groestl_EXT_BYTE_3(src[21])];\
dst[10] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 0])];\
dst[11] ^= groestl_T3[groestl_EXT_BYTE_3(src[23])];\
dst[12] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 2])];\
dst[13] ^= groestl_T3[groestl_EXT_BYTE_3(src[25])];\
dst[14] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 4])];\
dst[15] ^= groestl_T3[groestl_EXT_BYTE_3(src[27])];\
dst[16] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 6])];\
dst[17] ^= groestl_T3[groestl_EXT_BYTE_3(src[29])];\
dst[18] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 8])];\
dst[19] ^= groestl_T3[groestl_EXT_BYTE_3(src[31])];\
dst[20] ^= groestl_T3[groestl_EXT_BYTE_3(src[10])];\
dst[21] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 1])];\
dst[22] ^= groestl_T3[groestl_EXT_BYTE_3(src[12])];\
dst[23] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 3])];\
dst[24] ^= groestl_T3[groestl_EXT_BYTE_3(src[14])];\
dst[25] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 5])];\
dst[26] ^= groestl_T3[groestl_EXT_BYTE_3(src[16])];\
dst[27] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 7])];\
dst[28] ^= groestl_T3[groestl_EXT_BYTE_3(src[18])];\
dst[29] ^= groestl_T3[groestl_EXT_BYTE_3(src[ 9])];\
dst[30] ^= groestl_T3[groestl_EXT_BYTE_3(src[20])];\
dst[31] ^= groestl_T3[groestl_EXT_BYTE_3(src[11])];\
dst[ 0] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 1])];\
dst[ 1] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 2])];\
dst[ 2] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 3])];\
dst[ 3] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 4])];\
dst[ 4] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 5])];\
dst[ 5] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 6])];\
dst[ 6] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 7])];\
dst[ 7] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 8])];\
dst[ 8] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 9])];\
dst[ 9] ^= groestl_T4[groestl_EXT_BYTE_0(src[10])];\
dst[10] ^= groestl_T4[groestl_EXT_BYTE_0(src[11])];\
dst[11] ^= groestl_T4[groestl_EXT_BYTE_0(src[12])];\
dst[12] ^= groestl_T4[groestl_EXT_BYTE_0(src[13])];\
dst[13] ^= groestl_T4[groestl_EXT_BYTE_0(src[14])];\
dst[14] ^= groestl_T4[groestl_EXT_BYTE_0(src[15])];\
dst[15] ^= groestl_T4[groestl_EXT_BYTE_0(src[16])];\
dst[16] ^= groestl_T4[groestl_EXT_BYTE_0(src[17])];\
dst[17] ^= groestl_T4[groestl_EXT_BYTE_0(src[18])];\
dst[18] ^= groestl_T4[groestl_EXT_BYTE_0(src[19])];\
dst[19] ^= groestl_T4[groestl_EXT_BYTE_0(src[20])];\
dst[20] ^= groestl_T4[groestl_EXT_BYTE_0(src[21])];\
dst[21] ^= groestl_T4[groestl_EXT_BYTE_0(src[22])];\
dst[22] ^= groestl_T4[groestl_EXT_BYTE_0(src[23])];\
dst[23] ^= groestl_T4[groestl_EXT_BYTE_0(src[24])];\
dst[24] ^= groestl_T4[groestl_EXT_BYTE_0(src[25])];\
dst[25] ^= groestl_T4[groestl_EXT_BYTE_0(src[26])];\
dst[26] ^= groestl_T4[groestl_EXT_BYTE_0(src[27])];\
dst[27] ^= groestl_T4[groestl_EXT_BYTE_0(src[28])];\
dst[28] ^= groestl_T4[groestl_EXT_BYTE_0(src[29])];\
dst[29] ^= groestl_T4[groestl_EXT_BYTE_0(src[30])];\
dst[30] ^= groestl_T4[groestl_EXT_BYTE_0(src[31])];\
dst[31] ^= groestl_T4[groestl_EXT_BYTE_0(src[ 0])];\
dst[ 0] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 5])];\
dst[ 1] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 6])];\
dst[ 2] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 7])];\
dst[ 3] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 8])];\
dst[ 4] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 9])];\
dst[ 5] ^= groestl_T5[groestl_EXT_BYTE_1(src[10])];\
dst[ 6] ^= groestl_T5[groestl_EXT_BYTE_1(src[11])];\
dst[ 7] ^= groestl_T5[groestl_EXT_BYTE_1(src[12])];\
dst[ 8] ^= groestl_T5[groestl_EXT_BYTE_1(src[13])];\
dst[ 9] ^= groestl_T5[groestl_EXT_BYTE_1(src[14])];\
dst[10] ^= groestl_T5[groestl_EXT_BYTE_1(src[15])];\
dst[11] ^= groestl_T5[groestl_EXT_BYTE_1(src[16])];\
dst[12] ^= groestl_T5[groestl_EXT_BYTE_1(src[17])];\
dst[13] ^= groestl_T5[groestl_EXT_BYTE_1(src[18])];\
dst[14] ^= groestl_T5[groestl_EXT_BYTE_1(src[19])];\
dst[15] ^= groestl_T5[groestl_EXT_BYTE_1(src[20])];\
dst[16] ^= groestl_T5[groestl_EXT_BYTE_1(src[21])];\
dst[17] ^= groestl_T5[groestl_EXT_BYTE_1(src[22])];\
dst[18] ^= groestl_T5[groestl_EXT_BYTE_1(src[23])];\
dst[19] ^= groestl_T5[groestl_EXT_BYTE_1(src[24])];\
dst[20] ^= groestl_T5[groestl_EXT_BYTE_1(src[25])];\
dst[21] ^= groestl_T5[groestl_EXT_BYTE_1(src[26])];\
dst[22] ^= groestl_T5[groestl_EXT_BYTE_1(src[27])];\
dst[23] ^= groestl_T5[groestl_EXT_BYTE_1(src[28])];\
dst[24] ^= groestl_T5[groestl_EXT_BYTE_1(src[29])];\
dst[25] ^= groestl_T5[groestl_EXT_BYTE_1(src[30])];\
dst[26] ^= groestl_T5[groestl_EXT_BYTE_1(src[31])];\
dst[27] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 0])];\
dst[28] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 1])];\
dst[29] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 2])];\
dst[30] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 3])];\
dst[31] ^= groestl_T5[groestl_EXT_BYTE_1(src[ 4])];\
dst[ 0] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 9])];\
dst[ 1] ^= groestl_T6[groestl_EXT_BYTE_2(src[10])];\
dst[ 2] ^= groestl_T6[groestl_EXT_BYTE_2(src[11])];\
dst[ 3] ^= groestl_T6[groestl_EXT_BYTE_2(src[12])];\
dst[ 4] ^= groestl_T6[groestl_EXT_BYTE_2(src[13])];\
dst[ 5] ^= groestl_T6[groestl_EXT_BYTE_2(src[14])];\
dst[ 6] ^= groestl_T6[groestl_EXT_BYTE_2(src[15])];\
dst[ 7] ^= groestl_T6[groestl_EXT_BYTE_2(src[16])];\
dst[ 8] ^= groestl_T6[groestl_EXT_BYTE_2(src[17])];\
dst[ 9] ^= groestl_T6[groestl_EXT_BYTE_2(src[18])];\
dst[10] ^= groestl_T6[groestl_EXT_BYTE_2(src[19])];\
dst[11] ^= groestl_T6[groestl_EXT_BYTE_2(src[20])];\
dst[12] ^= groestl_T6[groestl_EXT_BYTE_2(src[21])];\
dst[13] ^= groestl_T6[groestl_EXT_BYTE_2(src[22])];\
dst[14] ^= groestl_T6[groestl_EXT_BYTE_2(src[23])];\
dst[15] ^= groestl_T6[groestl_EXT_BYTE_2(src[24])];\
dst[16] ^= groestl_T6[groestl_EXT_BYTE_2(src[25])];\
dst[17] ^= groestl_T6[groestl_EXT_BYTE_2(src[26])];\
dst[18] ^= groestl_T6[groestl_EXT_BYTE_2(src[27])];\
dst[19] ^= groestl_T6[groestl_EXT_BYTE_2(src[28])];\
dst[20] ^= groestl_T6[groestl_EXT_BYTE_2(src[29])];\
dst[21] ^= groestl_T6[groestl_EXT_BYTE_2(src[30])];\
dst[22] ^= groestl_T6[groestl_EXT_BYTE_2(src[31])];\
dst[23] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 0])];\
dst[24] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 1])];\
dst[25] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 2])];\
dst[26] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 3])];\
dst[27] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 4])];\
dst[28] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 5])];\
dst[29] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 6])];\
dst[30] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 7])];\
dst[31] ^= groestl_T6[groestl_EXT_BYTE_2(src[ 8])];\
dst[ 0] ^= groestl_T7[groestl_EXT_BYTE_3(src[13])];\
dst[ 1] ^= groestl_T7[groestl_EXT_BYTE_3(src[22])];\
dst[ 2] ^= groestl_T7[groestl_EXT_BYTE_3(src[15])];\
dst[ 3] ^= groestl_T7[groestl_EXT_BYTE_3(src[24])];\
dst[ 4] ^= groestl_T7[groestl_EXT_BYTE_3(src[17])];\
dst[ 5] ^= groestl_T7[groestl_EXT_BYTE_3(src[26])];\
dst[ 6] ^= groestl_T7[groestl_EXT_BYTE_3(src[19])];\
dst[ 7] ^= groestl_T7[groestl_EXT_BYTE_3(src[28])];\
dst[ 8] ^= groestl_T7[groestl_EXT_BYTE_3(src[21])];\
dst[ 9] ^= groestl_T7[groestl_EXT_BYTE_3(src[30])];\
dst[10] ^= groestl_T7[groestl_EXT_BYTE_3(src[23])];\
dst[11] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 0])];\
dst[12] ^= groestl_T7[groestl_EXT_BYTE_3(src[25])];\
dst[13] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 2])];\
dst[14] ^= groestl_T7[groestl_EXT_BYTE_3(src[27])];\
dst[15] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 4])];\
dst[16] ^= groestl_T7[groestl_EXT_BYTE_3(src[29])];\
dst[17] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 6])];\
dst[18] ^= groestl_T7[groestl_EXT_BYTE_3(src[31])];\
dst[19] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 8])];\
dst[20] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 1])];\
dst[21] ^= groestl_T7[groestl_EXT_BYTE_3(src[10])];\
dst[22] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 3])];\
dst[23] ^= groestl_T7[groestl_EXT_BYTE_3(src[12])];\
dst[24] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 5])];\
dst[25] ^= groestl_T7[groestl_EXT_BYTE_3(src[14])];\
dst[26] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 7])];\
dst[27] ^= groestl_T7[groestl_EXT_BYTE_3(src[16])];\
dst[28] ^= groestl_T7[groestl_EXT_BYTE_3(src[ 9])];\
dst[29] ^= groestl_T7[groestl_EXT_BYTE_3(src[18])];\
dst[30] ^= groestl_T7[groestl_EXT_BYTE_3(src[11])];\
dst[31] ^= groestl_T7[groestl_EXT_BYTE_3(src[20])];
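
/*
 * Note (not part of the original kernel): a loop-form sketch of the index
 * pattern that groestl_QMIX above unrolls by hand. The helper names
 * groestl_q_src_index and groestl_q_round_const are hypothetical; the kernel
 * does not call them. They only document the pattern, assuming the unrolled
 * lines above are the reference.
 *
 * For table groestl_T<t> (t = 0..7), groestl_QMIX reads
 *     src[(i + offset) & 31]
 * when it updates dst[i], where the offset depends on t and on whether i is
 * even or odd.
 */
unsigned int groestl_q_src_index(unsigned int t, unsigned int i)
{
    /* Offsets read off the unrolled macro, one entry per table T0..T7. */
    const unsigned int even_off[8] = { 2, 6, 10, 22, 1, 5, 9, 13 };
    const unsigned int odd_off[8]  = { 0, 4,  8, 12, 1, 5, 9, 21 };
    unsigned int off = (i & 1u) ? odd_off[t & 7u] : even_off[t & 7u];
    return (i + off) & 31u;
}

/*
 * Value XORed into the odd word src[2*k + 1] (k = 0..15) at the start of
 * groestl_QMIX for round argument r. Even words are simply complemented.
 * Assumes a 32-bit unsigned int, as in OpenCL C.
 */
unsigned int groestl_q_round_const(unsigned int k, unsigned int r)
{
    return ((((15u - k) & 15u) << 28) | 0x0fffffffu) ^ r;
}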

// global: initial values for the eight 256-entry Groestl T-tables (256*8 words, stored back to back)
const CONSTANT UINT32 groestl_T_init[256*8] =
{
0xa5f432c6UL,0x84976ff8UL,0x99b05eeeUL,0x8d8c7af6UL,0x0d17e8ffUL,0xbddc0ad6UL,0xb1c816deUL,0x54fc6d91UL,0x50f09060UL,0x03050702UL,0xa9e02eceUL,0x7d87d156UL,0x192bcce7UL,0x62a613b5UL,0xe6317c4dUL,0x9ab559ecUL,0x45cf408fUL,0x9dbca31fUL,0x40c04989UL,0x879268faUL,0x153fd0efUL,0xeb2694b2UL,0xc940ce8eUL,0x0b1de6fbUL,0xec2f6e41UL,0x67a91ab3UL,0xfd1c435fUL,0xea256045UL,0xbfdaf923UL,0xf7025153UL,0x96a145e4UL,0x5bed769bUL,0xc25d2875UL,0x1c24c5e1UL,0xaee9d43dUL,0x6abef24cUL,0x5aee826cUL,0x41c3bd7eUL,0x0206f3f5UL,0x4fd15283UL,0x5ce48c68UL,0xf4075651UL,0x345c8dd1UL,0x0818e1f9UL,0x93ae4ce2UL,0x73953eabUL,0x53f59762UL,0x3f416b2aUL,0x0c141c08UL,0x52f66395UL,0x65afe946UL,0x5ee27f9dUL,0x28784830UL,0xa1f8cf37UL,0x0f111b0aUL,0xb5c4eb2fUL,0x091b150eUL,0x365a7e24UL,0x9bb6ad1bUL,0x3d4798dfUL,0x266aa7cdUL,0x69bbf54eUL,0xcd4c337fUL,0x9fba50eaUL,0x1b2d3f12UL,0x9eb9a41dUL,0x749cc458UL,0x2e724634UL,0x2d774136UL,0xb2cd11dcUL,0xee299db4UL,0xfb164d5bUL,0xf601a5a4UL,0x4dd7a176UL,0x61a314b7UL,0xce49347dUL,0x7b8ddf52UL,0x3e429fddUL,0x7193cd5eUL,0x97a2b113UL,0xf504a2a6UL,0x68b801b9UL,0x00000000UL,0x2c74b5c1UL,0x60a0e040UL,0x1f21c2e3UL,0xc8433a79UL,0xed2c9ab6UL,0xbed90dd4UL,0x46ca478dUL,0xd9701767UL,0x4bddaf72UL,0xde79ed94UL,0xd467ff98UL,0xe82393b0UL,0x4ade5b85UL,0x6bbd06bbUL,0x2a7ebbc5UL,0xe5347b4fUL,0x163ad7edUL,0xc554d286UL,0xd762f89aUL,0x55ff9966UL,0x94a7b611UL,0xcf4ac08aUL,0x1030d9e9UL,0x060a0e04UL,0x819866feUL,0xf00baba0UL,0x44ccb478UL,0xbad5f025UL,0xe33e754bUL,0xf30eaca2UL,0xfe19445dUL,0xc05bdb80UL,0x8a858005UL,0xadecd33fUL,0xbcdffe21UL,0x48d8a870UL,0x040cfdf1UL,0xdf7a1963UL,0xc1582f77UL,0x759f30afUL,0x63a5e742UL,0x30507020UL,0x1a2ecbe5UL,0x0e12effdUL,0x6db708bfUL,0x4cd45581UL,0x143c2418UL,0x355f7926UL,0x2f71b2c3UL,0xe13886beUL,0xa2fdc835UL,0xcc4fc788UL,0x394b652eUL,0x57f96a93UL,0xf20d5855UL,0x829d61fcUL,0x47c9b37aUL,0xacef27c8UL,0xe73288baUL,0x2b7d4f32UL,0x95a442e6UL,0xa0fb3bc0UL,0x98b3aa19UL,0xd168f69eUL,0x7f8122a3UL,0x66aaee44UL,0x7e82d654UL,0xabe6dd3bUL,0x839e950bUL,0xca45c98cUL,0x297bbcc7UL,
0xd36e056bUL,0x3c446c28UL,0x798b2ca7UL,0xe23d81bcUL,0x1d273116UL,0x769a37adUL,0x3b4d96dbUL,0x56fa9e64UL,0x4ed2a674UL,0x1e223614UL,0xdb76e492UL,0x0a1e120cUL,0x6cb4fc48UL,0xe4378fb8UL,0x5de7789fUL,0x6eb20fbdUL,0xef2a6943UL,0xa6f135c4UL,0xa8e3da39UL,0xa4f7c631UL,0x37598ad3UL,0x8b8674f2UL,0x325683d5UL,0x43c54e8bUL,0x59eb856eUL,0xb7c218daUL,0x8c8f8e01UL,0x64ac1db1UL,0xd26df19cUL,0xe03b7249UL,0xb4c71fd8UL,0xfa15b9acUL,0x0709faf3UL,0x256fa0cfUL,0xafea20caUL,0x8e897df4UL,0xe9206747UL,0x18283810UL,0xd5640b6fUL,0x888373f0UL,0x6fb1fb4aUL,0x7296ca5cUL,0x246c5438UL,0xf1085f57UL,0xc7522173UL,0x51f36497UL,0x2365aecbUL,0x7c8425a1UL,0x9cbf57e8UL,0x21635d3eUL,0xdd7cea96UL,0xdc7f1e61UL,0x86919c0dUL,0x85949b0fUL,0x90ab4be0UL,0x42c6ba7cUL,0xc4572671UL,0xaae529ccUL,0xd873e390UL,0x050f0906UL,0x0103f4f7UL,0x12362a1cUL,0xa3fe3cc2UL,0x5fe18b6aUL,0xf910beaeUL,0xd06b0269UL,0x91a8bf17UL,0x58e87199UL,0x2769533aUL,0xb9d0f727UL,0x384891d9UL,0x1335deebUL,0xb3cee52bUL,0x33557722UL,0xbbd604d2UL,0x709039a9UL,0x89808707UL,0xa7f2c133UL,0xb6c1ec2dUL,0x22665a3cUL,0x92adb815UL,0x2060a9c9UL,0x49db5c87UL,0xff1ab0aaUL,0x7888d850UL,0x7a8e2ba5UL,0x8f8a8903UL,0xf8134a59UL,0x809b9209UL,0x1739231aUL,0xda751065UL,0x315384d7UL,0xc651d584UL,0xb8d303d0UL,0xc35edc82UL,0xb0cbe229UL,0x7799c35aUL,0x11332d1eUL,0xcb463d7bUL,0xfc1fb7a8UL,0xd6610c6dUL,0x3a4e622cUL,
0xf432c6c6UL,0x976ff8f8UL,0xb05eeeeeUL,0x8c7af6f6UL,0x17e8ffffUL,0xdc0ad6d6UL,0xc816dedeUL,0xfc6d9191UL,0xf0906060UL,0x05070202UL,0xe02ececeUL,0x87d15656UL,0x2bcce7e7UL,0xa613b5b5UL,0x317c4d4dUL,0xb559ececUL,0xcf408f8fUL,0xbca31f1fUL,0xc0498989UL,0x9268fafaUL,0x3fd0efefUL,0x2694b2b2UL,0x40ce8e8eUL,0x1de6fbfbUL,0x2f6e4141UL,0xa91ab3b3UL,0x1c435f5fUL,0x25604545UL,0xdaf92323UL,0x02515353UL,0xa145e4e4UL,0xed769b9bUL,0x5d287575UL,0x24c5e1e1UL,0xe9d43d3dUL,0xbef24c4cUL,0xee826c6cUL,0xc3bd7e7eUL,0x06f3f5f5UL,0xd1528383UL,0xe48c6868UL,0x07565151UL,0x5c8dd1d1UL,0x18e1f9f9UL,0xae4ce2e2UL,0x953eababUL,0xf5976262UL,0x416b2a2aUL,0x141c0808UL,0xf6639595UL,0xafe94646UL,0xe27f9d9dUL,0x78483030UL,0xf8cf3737UL,0x111b0a0aUL,0xc4eb2f2fUL,0x1b150e0eUL,0x5a7e2424UL,0xb6ad1b1bUL,0x4798dfdfUL,0x6aa7cdcdUL,0xbbf54e4eUL,0x4c337f7fUL,0xba50eaeaUL,0x2d3f1212UL,0xb9a41d1dUL,0x9cc45858UL,0x72463434UL,0x77413636UL,0xcd11dcdcUL,0x299db4b4UL,0x164d5b5bUL,0x01a5a4a4UL,0xd7a17676UL,0xa314b7b7UL,0x49347d7dUL,0x8ddf5252UL,0x429fddddUL,0x93cd5e5eUL,0xa2b11313UL,0x04a2a6a6UL,0xb801b9b9UL,0x00000000UL,0x74b5c1c1UL,0xa0e04040UL,0x21c2e3e3UL,0x433a7979UL,0x2c9ab6b6UL,0xd90dd4d4UL,0xca478d8dUL,0x70176767UL,0xddaf7272UL,0x79ed9494UL,0x67ff9898UL,0x2393b0b0UL,0xde5b8585UL,0xbd06bbbbUL,0x7ebbc5c5UL,0x347b4f4fUL,0x3ad7ededUL,0x54d28686UL,0x62f89a9aUL,0xff996666UL,0xa7b61111UL,0x4ac08a8aUL,0x30d9e9e9UL,0x0a0e0404UL,0x9866fefeUL,0x0baba0a0UL,0xccb47878UL,0xd5f02525UL,0x3e754b4bUL,0x0eaca2a2UL,0x19445d5dUL,0x5bdb8080UL,0x85800505UL,0xecd33f3fUL,0xdffe2121UL,0xd8a87070UL,0x0cfdf1f1UL,0x7a196363UL,0x582f7777UL,0x9f30afafUL,0xa5e74242UL,0x50702020UL,0x2ecbe5e5UL,0x12effdfdUL,0xb708bfbfUL,0xd4558181UL,0x3c241818UL,0x5f792626UL,0x71b2c3c3UL,0x3886bebeUL,0xfdc83535UL,0x4fc78888UL,0x4b652e2eUL,0xf96a9393UL,0x0d585555UL,0x9d61fcfcUL,0xc9b37a7aUL,0xef27c8c8UL,0x3288babaUL,0x7d4f3232UL,0xa442e6e6UL,0xfb3bc0c0UL,0xb3aa1919UL,0x68f69e9eUL,0x8122a3a3UL,0xaaee4444UL,0x82d65454UL,0xe6dd3b3bUL,0x9e950b0bUL,0x45c98c8cUL,0x7bbcc7c7UL,
0x6e056b6bUL,0x446c2828UL,0x8b2ca7a7UL,0x3d81bcbcUL,0x27311616UL,0x9a37adadUL,0x4d96dbdbUL,0xfa9e6464UL,0xd2a67474UL,0x22361414UL,0x76e49292UL,0x1e120c0cUL,0xb4fc4848UL,0x378fb8b8UL,0xe7789f9fUL,0xb20fbdbdUL,0x2a694343UL,0xf135c4c4UL,0xe3da3939UL,0xf7c63131UL,0x598ad3d3UL,0x8674f2f2UL,0x5683d5d5UL,0xc54e8b8bUL,0xeb856e6eUL,0xc218dadaUL,0x8f8e0101UL,0xac1db1b1UL,0x6df19c9cUL,0x3b724949UL,0xc71fd8d8UL,0x15b9acacUL,0x09faf3f3UL,0x6fa0cfcfUL,0xea20cacaUL,0x897df4f4UL,0x20674747UL,0x28381010UL,0x640b6f6fUL,0x8373f0f0UL,0xb1fb4a4aUL,0x96ca5c5cUL,0x6c543838UL,0x085f5757UL,0x52217373UL,0xf3649797UL,0x65aecbcbUL,0x8425a1a1UL,0xbf57e8e8UL,0x635d3e3eUL,0x7cea9696UL,0x7f1e6161UL,0x919c0d0dUL,0x949b0f0fUL,0xab4be0e0UL,0xc6ba7c7cUL,0x57267171UL,0xe529ccccUL,0x73e39090UL,0x0f090606UL,0x03f4f7f7UL,0x362a1c1cUL,0xfe3cc2c2UL,0xe18b6a6aUL,0x10beaeaeUL,0x6b026969UL,0xa8bf1717UL,0xe8719999UL,0x69533a3aUL,0xd0f72727UL,0x4891d9d9UL,0x35deebebUL,0xcee52b2bUL,0x55772222UL,0xd604d2d2UL,0x9039a9a9UL,0x80870707UL,0xf2c13333UL,0xc1ec2d2dUL,0x665a3c3cUL,0xadb81515UL,0x60a9c9c9UL,0xdb5c8787UL,0x1ab0aaaaUL,0x88d85050UL,0x8e2ba5a5UL,0x8a890303UL,0x134a5959UL,0x9b920909UL,0x39231a1aUL,0x75106565UL,0x5384d7d7UL,0x51d58484UL,0xd303d0d0UL,0x5edc8282UL,0xcbe22929UL,0x99c35a5aUL,0x332d1e1eUL,0x463d7b7bUL,0x1fb7a8a8UL,0x610c6d6dUL,0x4e622c2cUL,
0x32c6c6a5UL,0x6ff8f884UL,0x5eeeee99UL,0x7af6f68dUL,0xe8ffff0dUL,0x0ad6d6bdUL,0x16dedeb1UL,0x6d919154UL,0x90606050UL,0x07020203UL,0x2ececea9UL,0xd156567dUL,0xcce7e719UL,0x13b5b562UL,0x7c4d4de6UL,0x59ecec9aUL,0x408f8f45UL,0xa31f1f9dUL,0x49898940UL,0x68fafa87UL,0xd0efef15UL,0x94b2b2ebUL,0xce8e8ec9UL,0xe6fbfb0bUL,0x6e4141ecUL,0x1ab3b367UL,0x435f5ffdUL,0x604545eaUL,0xf92323bfUL,0x515353f7UL,0x45e4e496UL,0x769b9b5bUL,0x287575c2UL,0xc5e1e11cUL,0xd43d3daeUL,0xf24c4c6aUL,0x826c6c5aUL,0xbd7e7e41UL,0xf3f5f502UL,0x5283834fUL,0x8c68685cUL,0x565151f4UL,0x8dd1d134UL,0xe1f9f908UL,0x4ce2e293UL,0x3eabab73UL,0x97626253UL,0x6b2a2a3fUL,0x1c08080cUL,0x63959552UL,0xe9464665UL,0x7f9d9d5eUL,0x48303028UL,0xcf3737a1UL,0x1b0a0a0fUL,0xeb2f2fb5UL,0x150e0e09UL,0x7e242436UL,0xad1b1b9bUL,0x98dfdf3dUL,0xa7cdcd26UL,0xf54e4e69UL,0x337f7fcdUL,0x50eaea9fUL,0x3f12121bUL,0xa41d1d9eUL,0xc4585874UL,0x4634342eUL,0x4136362dUL,0x11dcdcb2UL,0x9db4b4eeUL,0x4d5b5bfbUL,0xa5a4a4f6UL,0xa176764dUL,0x14b7b761UL,0x347d7dceUL,0xdf52527bUL,0x9fdddd3eUL,0xcd5e5e71UL,0xb1131397UL,0xa2a6a6f5UL,0x01b9b968UL,0x00000000UL,0xb5c1c12cUL,0xe0404060UL,0xc2e3e31fUL,0x3a7979c8UL,0x9ab6b6edUL,0x0dd4d4beUL,0x478d8d46UL,0x176767d9UL,0xaf72724bUL,0xed9494deUL,0xff9898d4UL,0x93b0b0e8UL,0x5b85854aUL,0x06bbbb6bUL,0xbbc5c52aUL,0x7b4f4fe5UL,0xd7eded16UL,0xd28686c5UL,0xf89a9ad7UL,0x99666655UL,0xb6111194UL,0xc08a8acfUL,0xd9e9e910UL,0x0e040406UL,0x66fefe81UL,0xaba0a0f0UL,0xb4787844UL,0xf02525baUL,0x754b4be3UL,0xaca2a2f3UL,0x445d5dfeUL,0xdb8080c0UL,0x8005058aUL,0xd33f3fadUL,0xfe2121bcUL,0xa8707048UL,0xfdf1f104UL,0x196363dfUL,0x2f7777c1UL,0x30afaf75UL,0xe7424263UL,0x70202030UL,0xcbe5e51aUL,0xeffdfd0eUL,0x08bfbf6dUL,0x5581814cUL,0x24181814UL,0x79262635UL,0xb2c3c32fUL,0x86bebee1UL,0xc83535a2UL,0xc78888ccUL,0x652e2e39UL,0x6a939357UL,0x585555f2UL,0x61fcfc82UL,0xb37a7a47UL,0x27c8c8acUL,0x88babae7UL,0x4f32322bUL,0x42e6e695UL,0x3bc0c0a0UL,0xaa191998UL,0xf69e9ed1UL,0x22a3a37fUL,0xee444466UL,0xd654547eUL,0xdd3b3babUL,0x950b0b83UL,0xc98c8ccaUL,0xbcc7c729UL,
0x056b6bd3UL,0x6c28283cUL,0x2ca7a779UL,0x81bcbce2UL,0x3116161dUL,0x37adad76UL,0x96dbdb3bUL,0x9e646456UL,0xa674744eUL,0x3614141eUL,0xe49292dbUL,0x120c0c0aUL,0xfc48486cUL,0x8fb8b8e4UL,0x789f9f5dUL,0x0fbdbd6eUL,0x694343efUL,0x35c4c4a6UL,0xda3939a8UL,0xc63131a4UL,0x8ad3d337UL,0x74f2f28bUL,0x83d5d532UL,0x4e8b8b43UL,0x856e6e59UL,0x18dadab7UL,0x8e01018cUL,0x1db1b164UL,0xf19c9cd2UL,0x724949e0UL,0x1fd8d8b4UL,0xb9acacfaUL,0xfaf3f307UL,0xa0cfcf25UL,0x20cacaafUL,0x7df4f48eUL,0x674747e9UL,0x38101018UL,0x0b6f6fd5UL,0x73f0f088UL,0xfb4a4a6fUL,0xca5c5c72UL,0x54383824UL,0x5f5757f1UL,0x217373c7UL,0x64979751UL,0xaecbcb23UL,0x25a1a17cUL,0x57e8e89cUL,0x5d3e3e21UL,0xea9696ddUL,0x1e6161dcUL,0x9c0d0d86UL,0x9b0f0f85UL,0x4be0e090UL,0xba7c7c42UL,0x267171c4UL,0x29ccccaaUL,0xe39090d8UL,0x09060605UL,0xf4f7f701UL,0x2a1c1c12UL,0x3cc2c2a3UL,0x8b6a6a5fUL,0xbeaeaef9UL,0x026969d0UL,0xbf171791UL,0x71999958UL,0x533a3a27UL,0xf72727b9UL,0x91d9d938UL,0xdeebeb13UL,0xe52b2bb3UL,0x77222233UL,0x04d2d2bbUL,0x39a9a970UL,0x87070789UL,0xc13333a7UL,0xec2d2db6UL,0x5a3c3c22UL,0xb8151592UL,0xa9c9c920UL,0x5c878749UL,0xb0aaaaffUL,0xd8505078UL,0x2ba5a57aUL,0x8903038fUL,0x4a5959f8UL,0x92090980UL,0x231a1a17UL,0x106565daUL,0x84d7d731UL,0xd58484c6UL,0x03d0d0b8UL,0xdc8282c3UL,0xe22929b0UL,0xc35a5a77UL,0x2d1e1e11UL,0x3d7b7bcbUL,0xb7a8a8fcUL,0x0c6d6dd6UL,0x622c2c3aUL,
0xc6c6a597UL,0xf8f884ebUL,0xeeee99c7UL,0xf6f68df7UL,0xffff0de5UL,0xd6d6bdb7UL,0xdedeb1a7UL,0x91915439UL,0x606050c0UL,0x02020304UL,0xcecea987UL,0x56567dacUL,0xe7e719d5UL,0xb5b56271UL,0x4d4de69aUL,0xecec9ac3UL,0x8f8f4505UL,0x1f1f9d3eUL,0x89894009UL,0xfafa87efUL,0xefef15c5UL,0xb2b2eb7fUL,0x8e8ec907UL,0xfbfb0bedUL,0x4141ec82UL,0xb3b3677dUL,0x5f5ffdbeUL,0x4545ea8aUL,0x2323bf46UL,0x5353f7a6UL,0xe4e496d3UL,0x9b9b5b2dUL,0x7575c2eaUL,0xe1e11cd9UL,0x3d3dae7aUL,0x4c4c6a98UL,0x6c6c5ad8UL,0x7e7e41fcUL,0xf5f502f1UL,0x83834f1dUL,0x68685cd0UL,0x5151f4a2UL,0xd1d134b9UL,0xf9f908e9UL,0xe2e293dfUL,0xabab734dUL,0x626253c4UL,0x2a2a3f54UL,0x08080c10UL,0x95955231UL,0x4646658cUL,0x9d9d5e21UL,0x30302860UL,0x3737a16eUL,0x0a0a0f14UL,0x2f2fb55eUL,0x0e0e091cUL,0x24243648UL,0x1b1b9b36UL,0xdfdf3da5UL,0xcdcd2681UL,0x4e4e699cUL,0x7f7fcdfeUL,0xeaea9fcfUL,0x12121b24UL,0x1d1d9e3aUL,0x585874b0UL,0x34342e68UL,0x36362d6cUL,0xdcdcb2a3UL,0xb4b4ee73UL,0x5b5bfbb6UL,0xa4a4f653UL,0x76764decUL,0xb7b76175UL,0x7d7dcefaUL,0x52527ba4UL,0xdddd3ea1UL,0x5e5e71bcUL,0x13139726UL,0xa6a6f557UL,0xb9b96869UL,0x00000000UL,0xc1c12c99UL,0x40406080UL,0xe3e31fddUL,0x7979c8f2UL,0xb6b6ed77UL,0xd4d4beb3UL,0x8d8d4601UL,0x6767d9ceUL,0x72724be4UL,0x9494de33UL,0x9898d42bUL,0xb0b0e87bUL,0x85854a11UL,0xbbbb6b6dUL,0xc5c52a91UL,0x4f4fe59eUL,0xeded16c1UL,0x8686c517UL,0x9a9ad72fUL,0x666655ccUL,0x11119422UL,0x8a8acf0fUL,0xe9e910c9UL,0x04040608UL,0xfefe81e7UL,0xa0a0f05bUL,0x787844f0UL,0x2525ba4aUL,0x4b4be396UL,0xa2a2f35fUL,0x5d5dfebaUL,0x8080c01bUL,0x05058a0aUL,0x3f3fad7eUL,0x2121bc42UL,0x707048e0UL,0xf1f104f9UL,0x6363dfc6UL,0x7777c1eeUL,0xafaf7545UL,0x42426384UL,0x20203040UL,0xe5e51ad1UL,0xfdfd0ee1UL,0xbfbf6d65UL,0x81814c19UL,0x18181430UL,0x2626354cUL,0xc3c32f9dUL,0xbebee167UL,0x3535a26aUL,0x8888cc0bUL,0x2e2e395cUL,0x9393573dUL,0x5555f2aaUL,0xfcfc82e3UL,0x7a7a47f4UL,0xc8c8ac8bUL,0xbabae76fUL,0x32322b64UL,0xe6e695d7UL,0xc0c0a09bUL,0x19199832UL,0x9e9ed127UL,0xa3a37f5dUL,0x44446688UL,0x54547ea8UL,0x3b3bab76UL,0x0b0b8316UL,0x8c8cca03UL,0xc7c72995UL,
0x6b6bd3d6UL,0x28283c50UL,0xa7a77955UL,0xbcbce263UL,0x16161d2cUL,0xadad7641UL,0xdbdb3badUL,0x646456c8UL,0x74744ee8UL,0x14141e28UL,0x9292db3fUL,0x0c0c0a18UL,0x48486c90UL,0xb8b8e46bUL,0x9f9f5d25UL,0xbdbd6e61UL,0x4343ef86UL,0xc4c4a693UL,0x3939a872UL,0x3131a462UL,0xd3d337bdUL,0xf2f28bffUL,0xd5d532b1UL,0x8b8b430dUL,0x6e6e59dcUL,0xdadab7afUL,0x01018c02UL,0xb1b16479UL,0x9c9cd223UL,0x4949e092UL,0xd8d8b4abUL,0xacacfa43UL,0xf3f307fdUL,0xcfcf2585UL,0xcacaaf8fUL,0xf4f48ef3UL,0x4747e98eUL,0x10101820UL,0x6f6fd5deUL,0xf0f088fbUL,0x4a4a6f94UL,0x5c5c72b8UL,0x38382470UL,0x5757f1aeUL,0x7373c7e6UL,0x97975135UL,0xcbcb238dUL,0xa1a17c59UL,0xe8e89ccbUL,0x3e3e217cUL,0x9696dd37UL,0x6161dcc2UL,0x0d0d861aUL,0x0f0f851eUL,0xe0e090dbUL,0x7c7c42f8UL,0x7171c4e2UL,0xccccaa83UL,0x9090d83bUL,0x0606050cUL,0xf7f701f5UL,0x1c1c1238UL,0xc2c2a39fUL,0x6a6a5fd4UL,0xaeaef947UL,0x6969d0d2UL,0x1717912eUL,0x99995829UL,0x3a3a2774UL,0x2727b94eUL,0xd9d938a9UL,0xebeb13cdUL,0x2b2bb356UL,0x22223344UL,0xd2d2bbbfUL,0xa9a97049UL,0x0707890eUL,0x3333a766UL,0x2d2db65aUL,0x3c3c2278UL,0x1515922aUL,0xc9c92089UL,0x87874915UL,0xaaaaff4fUL,0x505078a0UL,0xa5a57a51UL,0x03038f06UL,0x5959f8b2UL,0x09098012UL,0x1a1a1734UL,0x6565dacaUL,0xd7d731b5UL,0x8484c613UL,0xd0d0b8bbUL,0x8282c31fUL,0x2929b052UL,0x5a5a77b4UL,0x1e1e113cUL,0x7b7bcbf6UL,0xa8a8fc4bUL,0x6d6dd6daUL,0x2c2c3a58UL,
0xc6a597f4UL,0xf884eb97UL,0xee99c7b0UL,0xf68df78cUL,0xff0de517UL,0xd6bdb7dcUL,0xdeb1a7c8UL,0x915439fcUL,0x6050c0f0UL,0x02030405UL,0xcea987e0UL,0x567dac87UL,0xe719d52bUL,0xb56271a6UL,0x4de69a31UL,0xec9ac3b5UL,0x8f4505cfUL,0x1f9d3ebcUL,0x894009c0UL,0xfa87ef92UL,0xef15c53fUL,0xb2eb7f26UL,0x8ec90740UL,0xfb0bed1dUL,0x41ec822fUL,0xb3677da9UL,0x5ffdbe1cUL,0x45ea8a25UL,0x23bf46daUL,0x53f7a602UL,0xe496d3a1UL,0x9b5b2dedUL,0x75c2ea5dUL,0xe11cd924UL,0x3dae7ae9UL,0x4c6a98beUL,0x6c5ad8eeUL,0x7e41fcc3UL,0xf502f106UL,0x834f1dd1UL,0x685cd0e4UL,0x51f4a207UL,0xd134b95cUL,0xf908e918UL,0xe293dfaeUL,0xab734d95UL,0x6253c4f5UL,0x2a3f5441UL,0x080c1014UL,0x955231f6UL,0x46658cafUL,0x9d5e21e2UL,0x30286078UL,0x37a16ef8UL,0x0a0f1411UL,0x2fb55ec4UL,0x0e091c1bUL,0x2436485aUL,0x1b9b36b6UL,0xdf3da547UL,0xcd26816aUL,0x4e699cbbUL,0x7fcdfe4cUL,0xea9fcfbaUL,0x121b242dUL,0x1d9e3ab9UL,0x5874b09cUL,0x342e6872UL,0x362d6c77UL,0xdcb2a3cdUL,0xb4ee7329UL,0x5bfbb616UL,0xa4f65301UL,0x764decd7UL,0xb76175a3UL,0x7dcefa49UL,0x527ba48dUL,0xdd3ea142UL,0x5e71bc93UL,0x139726a2UL,0xa6f55704UL,0xb96869b8UL,0x00000000UL,0xc12c9974UL,0x406080a0UL,0xe31fdd21UL,0x79c8f243UL,0xb6ed772cUL,0xd4beb3d9UL,0x8d4601caUL,0x67d9ce70UL,0x724be4ddUL,0x94de3379UL,0x98d42b67UL,0xb0e87b23UL,0x854a11deUL,0xbb6b6dbdUL,0xc52a917eUL,0x4fe59e34UL,0xed16c13aUL,0x86c51754UL,0x9ad72f62UL,0x6655ccffUL,0x119422a7UL,0x8acf0f4aUL,0xe910c930UL,0x0406080aUL,0xfe81e798UL,0xa0f05b0bUL,0x7844f0ccUL,0x25ba4ad5UL,0x4be3963eUL,0xa2f35f0eUL,0x5dfeba19UL,0x80c01b5bUL,0x058a0a85UL,0x3fad7eecUL,0x21bc42dfUL,0x7048e0d8UL,0xf104f90cUL,0x63dfc67aUL,0x77c1ee58UL,0xaf75459fUL,0x426384a5UL,0x20304050UL,0xe51ad12eUL,0xfd0ee112UL,0xbf6d65b7UL,0x814c19d4UL,0x1814303cUL,0x26354c5fUL,0xc32f9d71UL,0xbee16738UL,0x35a26afdUL,0x88cc0b4fUL,0x2e395c4bUL,0x93573df9UL,0x55f2aa0dUL,0xfc82e39dUL,0x7a47f4c9UL,0xc8ac8befUL,0xbae76f32UL,0x322b647dUL,0xe695d7a4UL,0xc0a09bfbUL,0x199832b3UL,0x9ed12768UL,0xa37f5d81UL,0x446688aaUL,0x547ea882UL,0x3bab76e6UL,0x0b83169eUL,0x8cca0345UL,0xc729957bUL,
0x6bd3d66eUL,0x283c5044UL,0xa779558bUL,0xbce2633dUL,0x161d2c27UL,0xad76419aUL,0xdb3bad4dUL,0x6456c8faUL,0x744ee8d2UL,0x141e2822UL,0x92db3f76UL,0x0c0a181eUL,0x486c90b4UL,0xb8e46b37UL,0x9f5d25e7UL,0xbd6e61b2UL,0x43ef862aUL,0xc4a693f1UL,0x39a872e3UL,0x31a462f7UL,0xd337bd59UL,0xf28bff86UL,0xd532b156UL,0x8b430dc5UL,0x6e59dcebUL,0xdab7afc2UL,0x018c028fUL,0xb16479acUL,0x9cd2236dUL,0x49e0923bUL,0xd8b4abc7UL,0xacfa4315UL,0xf307fd09UL,0xcf25856fUL,0xcaaf8feaUL,0xf48ef389UL,0x47e98e20UL,0x10182028UL,0x6fd5de64UL,0xf088fb83UL,0x4a6f94b1UL,0x5c72b896UL,0x3824706cUL,0x57f1ae08UL,0x73c7e652UL,0x975135f3UL,0xcb238d65UL,0xa17c5984UL,0xe89ccbbfUL,0x3e217c63UL,0x96dd377cUL,0x61dcc27fUL,0x0d861a91UL,0x0f851e94UL,0xe090dbabUL,0x7c42f8c6UL,0x71c4e257UL,0xccaa83e5UL,0x90d83b73UL,0x06050c0fUL,0xf701f503UL,0x1c123836UL,0xc2a39ffeUL,0x6a5fd4e1UL,0xaef94710UL,0x69d0d26bUL,0x17912ea8UL,0x995829e8UL,0x3a277469UL,0x27b94ed0UL,0xd938a948UL,0xeb13cd35UL,0x2bb356ceUL,0x22334455UL,0xd2bbbfd6UL,0xa9704990UL,0x07890e80UL,0x33a766f2UL,0x2db65ac1UL,0x3c227866UL,0x15922aadUL,0xc9208960UL,0x874915dbUL,0xaaff4f1aUL,0x5078a088UL,0xa57a518eUL,0x038f068aUL,0x59f8b213UL,0x0980129bUL,0x1a173439UL,0x65daca75UL,0xd731b553UL,0x84c61351UL,0xd0b8bbd3UL,0x82c31f5eUL,0x29b052cbUL,0x5a77b499UL,0x1e113c33UL,0x7bcbf646UL,0xa8fc4b1fUL,0x6dd6da61UL,0x2c3a584eUL,
0xa597f4a5UL,0x84eb9784UL,0x99c7b099UL,0x8df78c8dUL,0x0de5170dUL,0xbdb7dcbdUL,0xb1a7c8b1UL,0x5439fc54UL,0x50c0f050UL,0x03040503UL,0xa987e0a9UL,0x7dac877dUL,0x19d52b19UL,0x6271a662UL,0xe69a31e6UL,0x9ac3b59aUL,0x4505cf45UL,0x9d3ebc9dUL,0x4009c040UL,0x87ef9287UL,0x15c53f15UL,0xeb7f26ebUL,0xc90740c9UL,0x0bed1d0bUL,0xec822fecUL,0x677da967UL,0xfdbe1cfdUL,0xea8a25eaUL,0xbf46dabfUL,0xf7a602f7UL,0x96d3a196UL,0x5b2ded5bUL,0xc2ea5dc2UL,0x1cd9241cUL,0xae7ae9aeUL,0x6a98be6aUL,0x5ad8ee5aUL,0x41fcc341UL,0x02f10602UL,0x4f1dd14fUL,0x5cd0e45cUL,0xf4a207f4UL,0x34b95c34UL,0x08e91808UL,0x93dfae93UL,0x734d9573UL,0x53c4f553UL,0x3f54413fUL,0x0c10140cUL,0x5231f652UL,0x658caf65UL,0x5e21e25eUL,0x28607828UL,0xa16ef8a1UL,0x0f14110fUL,0xb55ec4b5UL,0x091c1b09UL,0x36485a36UL,0x9b36b69bUL,0x3da5473dUL,0x26816a26UL,0x699cbb69UL,0xcdfe4ccdUL,0x9fcfba9fUL,0x1b242d1bUL,0x9e3ab99eUL,0x74b09c74UL,0x2e68722eUL,0x2d6c772dUL,0xb2a3cdb2UL,0xee7329eeUL,0xfbb616fbUL,0xf65301f6UL,0x4decd74dUL,0x6175a361UL,0xcefa49ceUL,0x7ba48d7bUL,0x3ea1423eUL,0x71bc9371UL,0x9726a297UL,0xf55704f5UL,0x6869b868UL,0x00000000UL,0x2c99742cUL,0x6080a060UL,0x1fdd211fUL,0xc8f243c8UL,0xed772cedUL,0xbeb3d9beUL,0x4601ca46UL,0xd9ce70d9UL,0x4be4dd4bUL,0xde3379deUL,0xd42b67d4UL,0xe87b23e8UL,0x4a11de4aUL,0x6b6dbd6bUL,0x2a917e2aUL,0xe59e34e5UL,0x16c13a16UL,0xc51754c5UL,0xd72f62d7UL,0x55ccff55UL,0x9422a794UL,0xcf0f4acfUL,0x10c93010UL,0x06080a06UL,0x81e79881UL,0xf05b0bf0UL,0x44f0cc44UL,0xba4ad5baUL,0xe3963ee3UL,0xf35f0ef3UL,0xfeba19feUL,0xc01b5bc0UL,0x8a0a858aUL,0xad7eecadUL,0xbc42dfbcUL,0x48e0d848UL,0x04f90c04UL,0xdfc67adfUL,0xc1ee58c1UL,0x75459f75UL,0x6384a563UL,0x30405030UL,0x1ad12e1aUL,0x0ee1120eUL,0x6d65b76dUL,0x4c19d44cUL,0x14303c14UL,0x354c5f35UL,0x2f9d712fUL,0xe16738e1UL,0xa26afda2UL,0xcc0b4fccUL,0x395c4b39UL,0x573df957UL,0xf2aa0df2UL,0x82e39d82UL,0x47f4c947UL,0xac8befacUL,0xe76f32e7UL,0x2b647d2bUL,0x95d7a495UL,0xa09bfba0UL,0x9832b398UL,0xd12768d1UL,0x7f5d817fUL,0x6688aa66UL,0x7ea8827eUL,0xab76e6abUL,0x83169e83UL,0xca0345caUL,0x29957b29U
L,0xd3d66ed3UL,0x3c50443cUL,0x79558b79UL,0xe2633de2UL,0x1d2c271dUL,0x76419a76UL,0x3bad4d3bUL,0x56c8fa56UL,0x4ee8d24eUL,0x1e28221eUL,0xdb3f76dbUL,0x0a181e0aUL,0x6c90b46cUL,0xe46b37e4UL,0x5d25e75dUL,0x6e61b26eUL,0xef862aefUL,0xa693f1a6UL,0xa872e3a8UL,0xa462f7a4UL,0x37bd5937UL,0x8bff868bUL,0x32b15632UL,0x430dc543UL,0x59dceb59UL,0xb7afc2b7UL,0x8c028f8cUL,0x6479ac64UL,0xd2236dd2UL,0xe0923be0UL,0xb4abc7b4UL,0xfa4315faUL,0x07fd0907UL,0x25856f25UL,0xaf8feaafUL,0x8ef3898eUL,0xe98e20e9UL,0x18202818UL,0xd5de64d5UL,0x88fb8388UL,0x6f94b16fUL,0x72b89672UL,0x24706c24UL,0xf1ae08f1UL,0xc7e652c7UL,0x5135f351UL,0x238d6523UL,0x7c59847cUL,0x9ccbbf9cUL,0x217c6321UL,0xdd377cddUL,0xdcc27fdcUL,0x861a9186UL,0x851e9485UL,0x90dbab90UL,0x42f8c642UL,0xc4e257c4UL,0xaa83e5aaUL,0xd83b73d8UL,0x050c0f05UL,0x01f50301UL,0x12383612UL,0xa39ffea3UL,0x5fd4e15fUL,0xf94710f9UL,0xd0d26bd0UL,0x912ea891UL,0x5829e858UL,0x27746927UL,0xb94ed0b9UL,0x38a94838UL,0x13cd3513UL,0xb356ceb3UL,0x33445533UL,0xbbbfd6bbUL,0x70499070UL,0x890e8089UL,0xa766f2a7UL,0xb65ac1b6UL,0x22786622UL,0x922aad92UL,0x20896020UL,0x4915db49UL,0xff4f1affUL,0x78a08878UL,0x7a518e7aUL,0x8f068a8fUL,0xf8b213f8UL,0x80129b80UL,0x17343917UL,0xdaca75daUL,0x31b55331UL,0xc61351c6UL,0xb8bbd3b8UL,0xc31f5ec3UL,0xb052cbb0UL,0x77b49977UL,0x113c3311UL,0xcbf646cbUL,0xfc4b1ffcUL,0xd6da61d6UL,0x3a584e3aUL,
0x97f4a5f4UL,0xeb978497UL,0xc7b099b0UL,0xf78c8d8cUL,0xe5170d17UL,0xb7dcbddcUL,0xa7c8b1c8UL,0x39fc54fcUL,0xc0f050f0UL,0x04050305UL,0x87e0a9e0UL,0xac877d87UL,0xd52b192bUL,0x71a662a6UL,0x9a31e631UL,0xc3b59ab5UL,0x05cf45cfUL,0x3ebc9dbcUL,0x09c040c0UL,0xef928792UL,0xc53f153fUL,0x7f26eb26UL,0x0740c940UL,0xed1d0b1dUL,0x822fec2fUL,0x7da967a9UL,0xbe1cfd1cUL,0x8a25ea25UL,0x46dabfdaUL,0xa602f702UL,0xd3a196a1UL,0x2ded5bedUL,0xea5dc25dUL,0xd9241c24UL,0x7ae9aee9UL,0x98be6abeUL,0xd8ee5aeeUL,0xfcc341c3UL,0xf1060206UL,0x1dd14fd1UL,0xd0e45ce4UL,0xa207f407UL,0xb95c345cUL,0xe9180818UL,0xdfae93aeUL,0x4d957395UL,0xc4f553f5UL,0x54413f41UL,0x10140c14UL,0x31f652f6UL,0x8caf65afUL,0x21e25ee2UL,0x60782878UL,0x6ef8a1f8UL,0x14110f11UL,0x5ec4b5c4UL,0x1c1b091bUL,0x485a365aUL,0x36b69bb6UL,0xa5473d47UL,0x816a266aUL,0x9cbb69bbUL,0xfe4ccd4cUL,0xcfba9fbaUL,0x242d1b2dUL,0x3ab99eb9UL,0xb09c749cUL,0x68722e72UL,0x6c772d77UL,0xa3cdb2cdUL,0x7329ee29UL,0xb616fb16UL,0x5301f601UL,0xecd74dd7UL,0x75a361a3UL,0xfa49ce49UL,0xa48d7b8dUL,0xa1423e42UL,0xbc937193UL,0x26a297a2UL,0x5704f504UL,0x69b868b8UL,0x00000000UL,0x99742c74UL,0x80a060a0UL,0xdd211f21UL,0xf243c843UL,0x772ced2cUL,0xb3d9bed9UL,0x01ca46caUL,0xce70d970UL,0xe4dd4bddUL,0x3379de79UL,0x2b67d467UL,0x7b23e823UL,0x11de4adeUL,0x6dbd6bbdUL,0x917e2a7eUL,0x9e34e534UL,0xc13a163aUL,0x1754c554UL,0x2f62d762UL,0xccff55ffUL,0x22a794a7UL,0x0f4acf4aUL,0xc9301030UL,0x080a060aUL,0xe7988198UL,0x5b0bf00bUL,0xf0cc44ccUL,0x4ad5bad5UL,0x963ee33eUL,0x5f0ef30eUL,0xba19fe19UL,0x1b5bc05bUL,0x0a858a85UL,0x7eecadecUL,0x42dfbcdfUL,0xe0d848d8UL,0xf90c040cUL,0xc67adf7aUL,0xee58c158UL,0x459f759fUL,0x84a563a5UL,0x40503050UL,0xd12e1a2eUL,0xe1120e12UL,0x65b76db7UL,0x19d44cd4UL,0x303c143cUL,0x4c5f355fUL,0x9d712f71UL,0x6738e138UL,0x6afda2fdUL,0x0b4fcc4fUL,0x5c4b394bUL,0x3df957f9UL,0xaa0df20dUL,0xe39d829dUL,0xf4c947c9UL,0x8befacefUL,0x6f32e732UL,0x647d2b7dUL,0xd7a495a4UL,0x9bfba0fbUL,0x32b398b3UL,0x2768d168UL,0x5d817f81UL,0x88aa66aaUL,0xa8827e82UL,0x76e6abe6UL,0x169e839eUL,0x0345ca45UL,0x957b297bU
L,0xd66ed36eUL,0x50443c44UL,0x558b798bUL,0x633de23dUL,0x2c271d27UL,0x419a769aUL,0xad4d3b4dUL,0xc8fa56faUL,0xe8d24ed2UL,0x28221e22UL,0x3f76db76UL,0x181e0a1eUL,0x90b46cb4UL,0x6b37e437UL,0x25e75de7UL,0x61b26eb2UL,0x862aef2aUL,0x93f1a6f1UL,0x72e3a8e3UL,0x62f7a4f7UL,0xbd593759UL,0xff868b86UL,0xb1563256UL,0x0dc543c5UL,0xdceb59ebUL,0xafc2b7c2UL,0x028f8c8fUL,0x79ac64acUL,0x236dd26dUL,0x923be03bUL,0xabc7b4c7UL,0x4315fa15UL,0xfd090709UL,0x856f256fUL,0x8feaafeaUL,0xf3898e89UL,0x8e20e920UL,0x20281828UL,0xde64d564UL,0xfb838883UL,0x94b16fb1UL,0xb8967296UL,0x706c246cUL,0xae08f108UL,0xe652c752UL,0x35f351f3UL,0x8d652365UL,0x59847c84UL,0xcbbf9cbfUL,0x7c632163UL,0x377cdd7cUL,0xc27fdc7fUL,0x1a918691UL,0x1e948594UL,0xdbab90abUL,0xf8c642c6UL,0xe257c457UL,0x83e5aae5UL,0x3b73d873UL,0x0c0f050fUL,0xf5030103UL,0x38361236UL,0x9ffea3feUL,0xd4e15fe1UL,0x4710f910UL,0xd26bd06bUL,0x2ea891a8UL,0x29e858e8UL,0x74692769UL,0x4ed0b9d0UL,0xa9483848UL,0xcd351335UL,0x56ceb3ceUL,0x44553355UL,0xbfd6bbd6UL,0x49907090UL,0x0e808980UL,0x66f2a7f2UL,0x5ac1b6c1UL,0x78662266UL,0x2aad92adUL,0x89602060UL,0x15db49dbUL,0x4f1aff1aUL,0xa0887888UL,0x518e7a8eUL,0x068a8f8aUL,0xb213f813UL,0x129b809bUL,0x34391739UL,0xca75da75UL,0xb5533153UL,0x1351c651UL,0xbbd3b8d3UL,0x1f5ec35eUL,0x52cbb0cbUL,0xb4997799UL,0x3c331133UL,0xf646cb46UL,0x4b1ffc1fUL,0xda61d661UL,0x584e3a4eUL,
0xf4a5f432UL,0x9784976fUL,0xb099b05eUL,0x8c8d8c7aUL,0x170d17e8UL,0xdcbddc0aUL,0xc8b1c816UL,0xfc54fc6dUL,0xf050f090UL,0x05030507UL,0xe0a9e02eUL,0x877d87d1UL,0x2b192bccUL,0xa662a613UL,0x31e6317cUL,0xb59ab559UL,0xcf45cf40UL,0xbc9dbca3UL,0xc040c049UL,0x92879268UL,0x3f153fd0UL,0x26eb2694UL,0x40c940ceUL,0x1d0b1de6UL,0x2fec2f6eUL,0xa967a91aUL,0x1cfd1c43UL,0x25ea2560UL,0xdabfdaf9UL,0x02f70251UL,0xa196a145UL,0xed5bed76UL,0x5dc25d28UL,0x241c24c5UL,0xe9aee9d4UL,0xbe6abef2UL,0xee5aee82UL,0xc341c3bdUL,0x060206f3UL,0xd14fd152UL,0xe45ce48cUL,0x07f40756UL,0x5c345c8dUL,0x180818e1UL,0xae93ae4cUL,0x9573953eUL,0xf553f597UL,0x413f416bUL,0x140c141cUL,0xf652f663UL,0xaf65afe9UL,0xe25ee27fUL,0x78287848UL,0xf8a1f8cfUL,0x110f111bUL,0xc4b5c4ebUL,0x1b091b15UL,0x5a365a7eUL,0xb69bb6adUL,0x473d4798UL,0x6a266aa7UL,0xbb69bbf5UL,0x4ccd4c33UL,0xba9fba50UL,0x2d1b2d3fUL,0xb99eb9a4UL,0x9c749cc4UL,0x722e7246UL,0x772d7741UL,0xcdb2cd11UL,0x29ee299dUL,0x16fb164dUL,0x01f601a5UL,0xd74dd7a1UL,0xa361a314UL,0x49ce4934UL,0x8d7b8ddfUL,0x423e429fUL,0x937193cdUL,0xa297a2b1UL,0x04f504a2UL,0xb868b801UL,0x00000000UL,0x742c74b5UL,0xa060a0e0UL,0x211f21c2UL,0x43c8433aUL,0x2ced2c9aUL,0xd9bed90dUL,0xca46ca47UL,0x70d97017UL,0xdd4bddafUL,0x79de79edUL,0x67d467ffUL,0x23e82393UL,0xde4ade5bUL,0xbd6bbd06UL,0x7e2a7ebbUL,0x34e5347bUL,0x3a163ad7UL,0x54c554d2UL,0x62d762f8UL,0xff55ff99UL,0xa794a7b6UL,0x4acf4ac0UL,0x301030d9UL,0x0a060a0eUL,0x98819866UL,0x0bf00babUL,0xcc44ccb4UL,0xd5bad5f0UL,0x3ee33e75UL,0x0ef30eacUL,0x19fe1944UL,0x5bc05bdbUL,0x858a8580UL,0xecadecd3UL,0xdfbcdffeUL,0xd848d8a8UL,0x0c040cfdUL,0x7adf7a19UL,0x58c1582fUL,0x9f759f30UL,0xa563a5e7UL,0x50305070UL,0x2e1a2ecbUL,0x120e12efUL,0xb76db708UL,0xd44cd455UL,0x3c143c24UL,0x5f355f79UL,0x712f71b2UL,0x38e13886UL,0xfda2fdc8UL,0x4fcc4fc7UL,0x4b394b65UL,0xf957f96aUL,0x0df20d58UL,0x9d829d61UL,0xc947c9b3UL,0xefacef27UL,0x32e73288UL,0x7d2b7d4fUL,0xa495a442UL,0xfba0fb3bUL,0xb398b3aaUL,0x68d168f6UL,0x817f8122UL,0xaa66aaeeUL,0x827e82d6UL,0xe6abe6ddUL,0x9e839e95UL,0x45ca45c9UL,0x7b297bbcU
L,0x6ed36e05UL,0x443c446cUL,0x8b798b2cUL,0x3de23d81UL,0x271d2731UL,0x9a769a37UL,0x4d3b4d96UL,0xfa56fa9eUL,0xd24ed2a6UL,0x221e2236UL,0x76db76e4UL,0x1e0a1e12UL,0xb46cb4fcUL,0x37e4378fUL,0xe75de778UL,0xb26eb20fUL,0x2aef2a69UL,0xf1a6f135UL,0xe3a8e3daUL,0xf7a4f7c6UL,0x5937598aUL,0x868b8674UL,0x56325683UL,0xc543c54eUL,0xeb59eb85UL,0xc2b7c218UL,0x8f8c8f8eUL,0xac64ac1dUL,0x6dd26df1UL,0x3be03b72UL,0xc7b4c71fUL,0x15fa15b9UL,0x090709faUL,0x6f256fa0UL,0xeaafea20UL,0x898e897dUL,0x20e92067UL,0x28182838UL,0x64d5640bUL,0x83888373UL,0xb16fb1fbUL,0x967296caUL,0x6c246c54UL,0x08f1085fUL,0x52c75221UL,0xf351f364UL,0x652365aeUL,0x847c8425UL,0xbf9cbf57UL,0x6321635dUL,0x7cdd7ceaUL,0x7fdc7f1eUL,0x9186919cUL,0x9485949bUL,0xab90ab4bUL,0xc642c6baUL,0x57c45726UL,0xe5aae529UL,0x73d873e3UL,0x0f050f09UL,0x030103f4UL,0x3612362aUL,0xfea3fe3cUL,0xe15fe18bUL,0x10f910beUL,0x6bd06b02UL,0xa891a8bfUL,0xe858e871UL,0x69276953UL,0xd0b9d0f7UL,0x48384891UL,0x351335deUL,0xceb3cee5UL,0x55335577UL,0xd6bbd604UL,0x90709039UL,0x80898087UL,0xf2a7f2c1UL,0xc1b6c1ecUL,0x6622665aUL,0xad92adb8UL,0x602060a9UL,0xdb49db5cUL,0x1aff1ab0UL,0x887888d8UL,0x8e7a8e2bUL,0x8a8f8a89UL,0x13f8134aUL,0x9b809b92UL,0x39173923UL,0x75da7510UL,0x53315384UL,0x51c651d5UL,0xd3b8d303UL,0x5ec35edcUL,0xcbb0cbe2UL,0x997799c3UL,0x3311332dUL,0x46cb463dUL,0x1ffc1fb7UL,0x61d6610cUL,0x4e3a4e62UL
};

// local table
LOCAL UINT32 groestl_T_local[256*8];
const UINT32 LOCAL *groestl_T0 = &groestl_T_local[0 * 256];
const UINT32 LOCAL *groestl_T1 = &groestl_T_local[1 * 256];
const UINT32 LOCAL *groestl_T2 = &groestl_T_local[2 * 256];
const UINT32 LOCAL *groestl_T3 = &groestl_T_local[3 * 256];
const UINT32 LOCAL *groestl_T4 = &groestl_T_local[4 * 256];
const UINT32 LOCAL *groestl_T5 = &groestl_T_local[5 * 256];
const UINT32 LOCAL *groestl_T6 = &groestl_T_local[6 * 256];
const UINT32 LOCAL *groestl_T7 = &groestl_T_local[7 * 256];

// init, once per kernel
UINT32 nLocalId = LOCALID;
{
    for(i = 0; i < 256 * 8; i += WORKSIZE)
        groestl_T_local[i + nLocalId] = groestl_T_init[i + nLocalId];
}

// declarations
UINT32 hash[32]; // hash[16..31] - scratch buffer

UINT32 groestl_BuffB[32];
UINT32 groestl_BuffC[32];
unsigned groestl_i;
unsigned index;

// inlined function body
groestl_BuffC[16] = hash[16] = 0x80;
groestl_BuffC[17] = hash[17] = 0;
groestl_BuffC[18] = hash[18] = 0;
groestl_BuffC[19] = hash[19] = 0;
groestl_BuffC[20] = hash[20] = 0;
groestl_BuffC[21] = hash[21] = 0;
groestl_BuffC[22] = hash[22] = 0;
groestl_BuffC[23] = hash[23] = 0;
groestl_BuffC[24] = hash[24] = 0;
groestl_BuffC[25] = hash[25] = 0;
groestl_BuffC[26] = hash[26] = 0;
groestl_BuffC[27] = hash[27] = 0;
groestl_BuffC[28] = hash[28] = 0;
groestl_BuffC[29] = hash[29] = 0;
groestl_BuffC[30] = hash[30] = 0;
hash[31] = 0x01000000;
groestl_BuffC[31] = 0x01020000L;

#pragma unroll 16
for (groestl_i = 0; groestl_i < 16; groestl_i++)
{
    groestl_BuffC[groestl_i] = hash[groestl_i];
}

for(groestl_i=0; groestl_i < 0x0d000000u; groestl_i+=0x01000000u)
{
    groestl_QMIX(hash, groestl_BuffB, groestl_i)
    groestl_i+=0x01000000u;
    groestl_QMIX(groestl_BuffB, hash, groestl_i)
}

for(groestl_i=0; groestl_i<13; ++groestl_i)
{
    groestl_PMIX(groestl_BuffC, groestl_BuffB, groestl_i)
    ++groestl_i;
    groestl_PMIX(groestl_BuffB, groestl_BuffC, groestl_i)
}

#pragma unroll 32
for(groestl_i = 0; groestl_i < 32-1; groestl_i++)
{
    hash[groestl_i] ^= groestl_BuffC[groestl_i];
    groestl_BuffB[groestl_i] = hash[groestl_i];
}
hash[31] ^= 0x00020000UL ^ groestl_BuffC[31];
groestl_BuffB[31] = hash[31];

for(groestl_i = 0; groestl_i < 14;)
{
    groestl_PMIX(groestl_BuffB, groestl_BuffC, groestl_i)
    ++groestl_i;
    groestl_PMIX(groestl_BuffC, groestl_BuffB, groestl_i)
    ++groestl_i;
}

#pragma unroll 16
for(groestl_i = 0; groestl_i < 16; ++groestl_i)
{
    hash[groestl_i] = groestl_BuffB[16+groestl_i] ^ hash[16+groestl_i];
}


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on March 05, 2015, 07:16:49 AM

Pallas,

Are you planning on adding myriad-groestl support in the future? If not, could you explain why not? Is it because your groestl kernel is already faster than the myriad-groestl?

Also, are you planning on putting your work on github? Again, if not, could you explain why not?

It seems to me that both are important ways to further your efforts and establish your reputation.

Best regards as always.

HR

Myr-Groestl must do SHA256 as well, IIRC - of course pure Groestl is faster.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on March 05, 2015, 08:45:32 AM

myr-groestl should be faster because it has a single round of groestl (14 iterations) + sha; groestlcoin is groestl + groestl again, so slower.
It's just that I do not have enough free time to work on all these algos...
Now wolf0 just did a fantastic job on whirlpoolx and I want to understand the magic ;-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on March 05, 2015, 08:48:41 AM

Haha, you ain't seen impressive yet! Check the thread, I'm about to post again!


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on March 05, 2015, 08:56:35 AM

OMG, this means a lot less reading and TV this week for me LoL!


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: smolen on March 06, 2015, 04:26:47 AM
I have groestl code from smelter (first GPU miner for quark). Maybe it has some tricks useful for your work. It was rather fast on radeon HD 5xxx series.
Today my code is obsolete, it has already been discussed in this thread. BTW, another source of tricks is cbuchner1 (https://bitcointalk.org/index.php?action=profile;u=33385)'s bitsliced and byteshuffled code (https://github.com/cbuchner1/ccminer/blob/master/groestl_functions_quad.cu)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on March 06, 2015, 09:04:19 AM
I'd really like to see that implemented on sgminer, but I'm not sure it'll be faster: nvidia is a very different architecture.
On a side note, interest in mining groestlcoin-based PoW coins is fading because the only coin with enough volume is switching to 1/10 reward soon; the others are dying, dead or have very little reward anyway.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on March 06, 2015, 07:25:15 PM
It'd be better to just bitslice the S-box, I think, since we don't have warp shuffle.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: smolen on March 07, 2015, 03:55:11 AM
Yes, transpose, do bitsliced calculation and transpose back, that will work. Does GCN have something like PMOVMSKB (https://mischasan.wordpress.com/2011/07/24/what-is-sse-good-for-transposing-a-bit-matrix/)?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on March 07, 2015, 11:30:45 PM
I don't think so.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on March 09, 2015, 08:45:42 AM
I've seen what you achieved on whirlpoolx: assuming a similar improvement can be made on groestl as well, that would mean more than 80 Mh/s.
Now, since the time I can dedicate to such a project is a few minutes a day, it would take months. Volunteers? :-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on March 16, 2015, 12:26:56 PM
Hey - a few hours ago, I remembered your OpenCL frustrations with 14.9 and above, and decided to take a look at your Groestlcoin code again. Fixed it up just a little bit, and while the resulting binaries aren't quite as good as the ones using GCN assembly, they outperform the original OpenCL on its intended driver.

Stock Pallas' OpenCL, available in the OP, used with 14.12 drivers (NSFW): https://ottrbutt.com/miner/groestlcoinpallas-03162015.png
Modified version of that OpenCL, used with the same 14.12 drivers (NSFW): https://ottrbutt.com/miner/groestlcoinwolf-03162015.png

I'm not running old drivers on any rig right now, and I don't intend to change that in the near future, so comparing my numbers to the numbers in the OP, 290X goes from 26.4MH/s to 29.11MH/s - substantial.
Other cards, as well as clocks and such are in the screenshot. Oh, and I know memclock doesn't matter here, but I set it to 1500 by force of habit.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on March 16, 2015, 12:46:38 PM
Thanks Wolf0, but I already got over 34 Mh/s, see the OP (experimental v2, bin posted some posts ago).
It's 2-3% faster than the asm version.
It's only for Hawaii and 14.12, though; 14.9 is damned!
Next step is bitslicing, but I do not have the time to work on it ;-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on March 16, 2015, 12:53:20 PM
As I said - I noticed. However, notice the 280X speeds? You haven't been able to create binaries that good for any chip but Hawaii, AFAIK.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on March 16, 2015, 01:12:34 PM
I do not have the card so I can't test it, but I know that on hawaii it can use two wavefronts, but only 1 on tahiti.
Does your kernel run 2 wavefronts on tahiti, as the asm version does?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on March 16, 2015, 01:43:54 PM
Mine's got 2 waves in flight on Hawaii - I believe editing to get another wave in flight on Tahiti and Pitcairn should be simple, stand by.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: iju76 on March 16, 2015, 01:44:41 PM
win7-64 -- sgminer-5-dev-neoscrypt-windows-new2 -- dr-14.7

http://s001.radikal.ru/i194/1503/f3/09a2627a6270.png


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on March 16, 2015, 03:30:25 PM
I have groestl code from smelter (first GPU miner for quark). May be it have some tricks for your work. It was rather fast on radeon HD 5xxx series.
Today my code is obsolete, it has already been discussed in this thread. BTW, another source of tricks is cbuchner1 (https://bitcointalk.org/index.php?action=profile;u=33385)'s bitsliced and byteshuffled code (https://github.com/cbuchner1/ccminer/blob/master/groestl_functions_quad.cu)

I'd really like to see that implemented on sgminer, but I'm not sure it'll be faster: nvidia is a much different architecture.
On a side note, interest in mining groestlcoin based PoW coins is fading because the only coin with enough volume is switching to 1/10 reward soon, the others are dying, dead or with very little reward anyway.

It'd be better to just bitslice the S-box, I think, since we don't have warp shuffle.

I've seen what you achieved on whirlpoolx: assuming a similar improvement can be made on groestl as well, that would mean more than 80 Mh/s.
Now, since the time I can dedicate to such a project is a few minutes a day, it would take months. Volunteers? :-)

Hey - a few hours ago, I remembered your OpenCL frustrations with 14.9 and above, and decided to take a look at your Groestlcoin code again. Fixed it up just a little bit, and while the resulting binaries aren't quite as good as the ones using GCN assembly, they outperform the original OpenCL on its intended driver.

Stock Pallas' OpenCL, available in the OP, used with 14.12 drivers (NSFW): https://ottrbutt.com/miner/groestlcoinpallas-03162015.png
Modified version of that OpenCL, used with the same 14.12 drivers (NSFW): https://ottrbutt.com/miner/groestlcoinwolf-03162015.png

I'm not running old drivers on any rig right now, and I don't intend to change that in the near future, so comparing my numbers to the numbers in the OP, 290X goes from 26.4MH/s to 29.11MH/s - substantial.
Other cards, as well as clocks and such are in the screenshot. Oh, and I know memclock doesn't matter here, but I set it to 1500 by force of habit.

thanks Wolf0, but I already got over 34, see op. (experimental v2, bin some posts ago)
it's 2-3% faster than asm version.
it's only for Hawaii and 14.12, though; 14.9 is damned!
next step is bitslicing, but I do not have the time to work on it ;-)

As I said - I noticed. However, notice the 280X speeds? You haven't been able to create binaries that good for any chip but Hawaii, AFAIK.

I do not have the card so I can't test it, but I know that on hawaii it can use two wavefronts, but only 1 on tahiti.
Does your kernel run 2 wavefronts on tahiti, as the asm version does?

Hmm... it's one hell of a lot harder than I anticipated to lose two goddamned VGPRs.
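Why two registers can matter so much: a rough sketch of how VGPR usage limits the number of waves in flight. The limits below (256 VGPRs per SIMD, at most 10 waves per SIMD, allocation in blocks of 4) are my assumption about GCN and are not stated in this thread, and waves_per_simd is a hypothetical helper, not part of any miner.

```c
#include <assert.h>

/* Assumed GCN limits (not from the thread): 256 VGPRs per lane per SIMD,
   at most 10 wavefronts per SIMD, VGPRs allocated in blocks of 4. */
enum { GCN_VGPRS_PER_SIMD = 256, GCN_MAX_WAVES = 10, GCN_VGPR_GRANULE = 4 };

/* Hypothetical helper: how many wavefronts fit on one SIMD when the kernel
   needs `vgprs` vector registers per work-item. */
static unsigned waves_per_simd(unsigned vgprs)
{
    assert(vgprs > 0);
    /* Round the request up to the allocation granule. */
    unsigned alloc = (vgprs + GCN_VGPR_GRANULE - 1) / GCN_VGPR_GRANULE * GCN_VGPR_GRANULE;
    if (alloc > GCN_VGPRS_PER_SIMD)
        return 0; /* does not fit at all */
    unsigned waves = GCN_VGPRS_PER_SIMD / alloc;
    return waves > GCN_MAX_WAVES ? GCN_MAX_WAVES : waves;
}
```

Under these assumptions a kernel at 130 VGPRs (an illustrative number, not a measurement) gets one wave per SIMD, while 128 gets two, so the step is all-or-nothing.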


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on March 16, 2015, 03:33:40 PM
I have groestl code from smelter (first GPU miner for quark). May be it have some tricks for your work. It was rather fast on radeon HD 5xxx series.
Today my code is obsolete, it has already been discussed in this thread. BTW, another source of tricks is cbuchner1 (https://bitcointalk.org/index.php?action=profile;u=33385)'s bitsliced and byteshuffled code (https://github.com/cbuchner1/ccminer/blob/master/groestl_functions_quad.cu)
Hi,

Out of curiosity I really had to check that bitsliced code :D and well... I must say that NV has better instructions for it:
__byte_perm(x, 0, 1010)>>s:  this could be emulated by an AND, a MAD24 and a SHR. 3 cycles instead of 2.
__byte_perm(x, 0, 3232)>>s:  SHR, MAD24, SHR   also 3 instead of 2.
__byte_perm(x, y, 5410)      :  SHL, BFE      2 instead of 1 instr.  (Even Intel SSE has had many instructions for these things for ages :S)
And there are lots of bitwise logical instructions where NV is 2x faster because NV has a 3 op logic instruction with all the possible 16*16 logic operator combinations.
There is also shuffling between 4 lanes: that is not a problem on GCN with ds_swizzle, otherwise it needs LDS on OpenCL.
I've just checked the GCN 1.3 ISA manual and (at least there) I haven't found byte_swizzle or any 3 operand logic instructions either.

Anyway, it would be interesting to see how this totally different approach performs compared to the table-based one.
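The three permutes above can be checked on the CPU. A minimal sketch in plain C, assuming the selectors are meant as hexadecimal (0x1010, 0x3232, 0x5410) and modelling only selector nibbles 0..7. The function names are made up for illustration, and an ordinary 32-bit multiply stands in for MAD24.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of a 4-byte permute: nibble i of `sel` picks which byte of
   the 8-byte pool {x = bytes 0..3, y = bytes 4..7} becomes result byte i.
   Only selector values 0..7 are modelled here. */
static uint32_t byte_perm_ref(uint32_t x, uint32_t y, uint32_t sel)
{
    uint64_t pool = ((uint64_t)y << 32) | x;
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t n = (sel >> (4 * i)) & 0xFu;
        assert(n < 8);
        r |= (uint32_t)((pool >> (8 * n)) & 0xFFu) << (8 * i);
    }
    return r;
}

/* Selector 0x1010: low 16 bits copied into both halves (AND, then multiply). */
static uint32_t dup_lo16(uint32_t x) { return (x & 0xFFFFu) * 0x10001u; }

/* Selector 0x3232: high 16 bits copied into both halves (SHR, then multiply). */
static uint32_t dup_hi16(uint32_t x) { return (x >> 16) * 0x10001u; }

/* Selector 0x5410: low halves of x and y packed into one word. */
static uint32_t pack_lo16(uint32_t x, uint32_t y) { return (x & 0xFFFFu) | (y << 16); }
```

The trailing >>s from the original expressions is left out; it is just one more shift on the result.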


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on March 16, 2015, 03:35:54 PM
I have groestl code from smelter (first GPU miner for quark). May be it have some tricks for your work. It was rather fast on radeon HD 5xxx series.
Today my code is obsolete, it has already been discussed in this thread. BTW, another source of tricks is cbuchner1 (https://bitcointalk.org/index.php?action=profile;u=33385)'s bitsliced and byteshuffled code (https://github.com/cbuchner1/ccminer/blob/master/groestl_functions_quad.cu)
Hi,

Because of my curiosity I really had to check that bitsliced code :D and well... I must say that NV has better instructions to do it:
__byte_perm(x, 0, 1010)>>s:  this could be emulated by an AND and a MAD24 and az SHR. 3 instead of 2 cycle.
__byte_perm(x, 0, 3232)>>s:  SHR, MAD24, SHR   also 3 instead of 2.
__byte_perm(x, y, 5410)      :  SHL, BFE      2 instead of 1 instr.  (Even the Intel SSE has many instructions for these things since ages :S)
And there are lots of bitwise logical instructions where NV is 2x faster because NV has a 3 op logic instruction with all the possible 16*16 logic operator combinations.
There are shuffling between 4 lanes: That is not a problem on GCN with ds_swizzle, otherwise it needs LDS on OpenCL.
I've just checked the GCN 1.3 ISA manual and (at least there) I haven't found byte_swizzle and no 3 operand logic instructions either.

Anyways, It would be interesting that how this totally different approach can perform compared to the table based one.

Well, I've done Whirlpool-512 with no lookups at all, and it kinda sucks on GPU. It'll probably be a beast on FPGA, though!


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: smolen on March 16, 2015, 07:47:11 PM
Yes, transpose, do bitsliced calculation and transpose back, that will work. Does GCN have something like PMOVMSKB (https://mischasan.wordpress.com/2011/07/24/what-is-sse-good-for-transposing-a-bit-matrix/)?
I don't think so.
Maybe VCC (vector condition code) will do the trick, so normal and bitsliced operations could be cheaply interleaved.

NV has a 3 op logic instruction with all the possible 16*16 logic operator combinations.
I've just checked the GCN 1.3 ISA manual and (at least there) I haven't found byte_swizzle and no 3 operand logic instructions either.
Yes, AMD's GCN is outplayed by VPTERNLOGD and VPTERNLOGQ from Intel AVX512 and LOP3.LUT from NVidia :(
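For anyone who hasn't met these instructions: a 3-operand LUT op evaluates any boolean function of three inputs, bit by bit, from an 8-bit truth table. A minimal C model follows. The name lop3 and the bit ordering (a is the most significant index bit, so the table is what you get by applying the function to 0xF0, 0xCC, 0xAA) are my assumptions about the usual convention, not something taken from this thread.

```c
#include <stdint.h>

/* Software model of a 3-input LUT logic op: for every bit position, the
   three input bits form an index (a<<2 | b<<1 | c) into the 8-bit table. */
static uint32_t lop3(uint32_t a, uint32_t b, uint32_t c, uint8_t lut)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < 8; i++) {
        if (!((lut >> i) & 1u))
            continue;                      /* this minterm is not in the table */
        uint32_t ta = (i & 4u) ? a : ~a;   /* a must be 1 or 0 for minterm i */
        uint32_t tb = (i & 2u) ? b : ~b;
        uint32_t tc = (i & 1u) ? c : ~c;
        r |= ta & tb & tc;
    }
    return r;
}
```

With this ordering, table 0x96 is a three-way XOR and 0xCA is a bit select (a ? b : c). Without such an instruction, a three-way XOR costs two separate two-input operations.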


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: smolen on March 16, 2015, 07:54:01 PM
Hmm... it's one hell of a lot harder than I anticipated to lose two goddamned VGPRs than I thought it'd be.
Have you rotated table values left by 3 bits? ;) Not sure it will help with register usage though...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on March 16, 2015, 07:59:02 PM
Hmm... it's one hell of a lot harder than I anticipated to lose two goddamned VGPRs than I thought it'd be.
Have you rotated table values left by 3 bits? ;) Not sure it will help with register usage through...

Rotations seem to hurt reg usage a bit. The source REALLY needs cleaning, but IMO, it's rather well done code by Pallas. I'm not really used to seeing anyone with a semblance of clue doing AMD miners.  :P


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on March 16, 2015, 09:02:43 PM
Hmm... it's one hell of a lot harder than I anticipated to lose two goddamned VGPRs than I thought it'd be.
Have you rotated table values left by 3 bits? ;) Not sure it will help with register usage through...

Rotations seem to hurt reg usage a bit. The source REALLY needs cleaning, but IMO, it's rather well done code by Pallas. I'm not really used to seeing anyone with a semblance of clue doing AMD miners.  :P

Now I've put some parts of the code (e.g. the list of rbtts) into for loops with pragma unroll, and it looks much better ;-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on March 16, 2015, 09:34:52 PM
Hmm... it's one hell of a lot harder than I anticipated to lose two goddamned VGPRs than I thought it'd be.
Have you rotated table values left by 3 bits? ;) Not sure it will help with register usage through...

Rotations seem to hurt reg usage a bit. The source REALLY needs cleaning, but IMO, it's rather well done code by Pallas. I'm not really used to seeing anyone with a semblance of clue doing AMD miners.  :P

Now I've put some parts of the code (ex. the list of rbtts) in pragma unrolled for loops and it looks much better ;-)

Nice - now, I haven't tried this, so the OpenCL compiler may mangle the shit out of it (unpredictable little fucker) - but you have vector types. They look a lot nicer than for loops. :P


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on April 03, 2015, 05:49:17 AM
Any chance of getting your latest OCL source to try on 280x (Hawaii) :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on April 03, 2015, 10:36:47 AM
Any chance of getting your latest OCL source to try on 280x (Hawaii) :)

I assume you meant Tahiti.
I've acquired a 280x myself: it's not worth using v2 on it, hashrate is lower than with v1.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on April 03, 2015, 11:20:07 AM
Any chance of getting your latest OCL source to try on 280x (Hawaii) :)

I assume you meant Tahiti.
I've acquired a 280x myself: it's not worth using v2 on it, hashrate is lower than with v1.

Doh, yeah, Tahiti. Are 2 wavefronts not possible?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on April 03, 2015, 11:26:39 AM
Any chance of getting your latest OCL source to try on 280x (Hawaii) :)

I assume you meant Tahiti.
I've acquired a 280x myself: it's not worth using v2 on it, hashrate is lower than with v1.

Doh, yeah Tahiti 2 wavefronts not possible?

Both Wolf0 and I tried that and (at least in my case) stopped trying after a while. It was no longer fun ;-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on April 08, 2015, 09:11:38 AM
Any chance of getting your latest OCL source to try on 280x (Hawaii) :)

I assume you meant Tahiti.
I've acquired a 280x myself: it's not worth using v2 on it, hashrate is lower than with v1.

Doh, yeah Tahiti 2 wavefronts not possible?

Both me and Wolf0 tried that and (at least for me) stopped trying after a while. Funny no longer ;-)

I think I've beaten your ASM with pure OpenCL on 290X.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on April 08, 2015, 09:35:38 AM
Any chance of getting your latest OCL source to try on 280x (Hawaii) :)

I assume you meant Tahiti.
I've acquired a 280x myself: it's not worth using v2 on it, hashrate is lower than with v1.

Doh, yeah Tahiti 2 wavefronts not possible?

Both me and Wolf0 tried that and (at least for me) stopped trying after a while. Funny no longer ;-)

I think I've beaten your ASM with pure OpenCL on 290X.

Some of your last tips (and smolen's) can be applied to this kernel as well, I think it can reach 38/40 Mh/s ;-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on April 08, 2015, 09:56:49 AM
Any chance of getting your latest OCL source to try on 280x (Hawaii) :)

I assume you meant Tahiti.
I've acquired a 280x myself: it's not worth using v2 on it, hashrate is lower than with v1.

Doh, yeah Tahiti 2 wavefronts not possible?

Both me and Wolf0 tried that and (at least for me) stopped trying after a while. Funny no longer ;-)

I think I've beaten your ASM with pure OpenCL on 290X.

Some of your last tips (and smolen's) can be applied to this kernel as well, I think it can reach 38/40 Mh/s ;-)

Possibly - but turns out, it didn't reach the speed I thought it did; it's still slightly under. Damn.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on April 08, 2015, 10:27:01 AM
Any chance of getting your latest OCL source to try on 280x (Hawaii) :)

I assume you meant Tahiti.
I've acquired a 280x myself: it's not worth using v2 on it, hashrate is lower than with v1.

Doh, yeah Tahiti 2 wavefronts not possible?

Both me and Wolf0 tried that and (at least for me) stopped trying after a while. Funny no longer ;-)

Just did it. :D

2 waves in flight on Tahiti, for 21MH/s on a 7950 @ 1125/1250. Screenshot (NSFW): https://ottrbutt.com/miner/groestlcoinwolf-04082015.png


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: sp_ on April 08, 2015, 10:55:23 AM
Good work on groestl. Smolen's quark miner does around 2 mhash on the 280x.
My gtx 980 does 20mhash. The competition is sleeping...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on April 08, 2015, 11:10:51 AM
Good work on the groest. Smolens quark miner does around 2 mhash on the 280x.
My gtx 980 does 20mhash. The competition is sleeping...

I think that just applying some well known tricks, already available on public kernels, will bring quark hashrate to around 10.
Thing is, it's not fun. Optimizing single kernel algos is much more interesting, imho.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on April 08, 2015, 11:29:10 AM
Good work on the groest. Smolens quark miner does around 2 mhash on the 280x.
My gtx 980 does 20mhash. The competition is sleeping...

I think that just applying some well known tricks, already available on public kernels, will bring quark hashrate to around 10.
Thing is, it's not funny. Optimizing single kernel algos is much more interesting, imho.

I agree about the easy Quark speedups. Doing this was fun, doing the foundation for Quark would be simply boring. Sure, it would get fun later, when working on one algo at a time, but to get there...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: sp_ on April 08, 2015, 02:17:44 PM
15 years ago I worked for a company in Silicon Valley. My colleagues earned xxx.xxx$ a year but I was a student at San Francisco State U.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on April 08, 2015, 02:23:06 PM
15 years ago I worked for a company in the silicon valley. My collegues earned xxx.xxx$ but I was a student at san francisco state u.

20 years ago I started programming professionally.
Still, a lot of my work is free or almost free :-)
I was wondering if we (miner developers) should unite to get the best out of it.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: sp_ on April 08, 2015, 02:24:08 PM
Today i earn $xxx.xxx  a year. Optimizing is just a hobby..


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on April 08, 2015, 02:27:46 PM
Today i earn $xxx.xxx  a year. Optimizing is just a hobby..

same for me.
still, if a fun job also remunerates, it's even better ;-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: sp_ on April 08, 2015, 02:34:17 PM
I've also lived and worked in St. Petersburg, Russia. My colleagues are some of the best programmers in the world.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on April 08, 2015, 02:52:42 PM
I've also lived and worked in st. Petersburg Russia. My collegues are some of the best programmers in the world.

Ok, but it looks like you've mistaken this thread for a job search one :-D


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: qwep1 on April 08, 2015, 03:00:23 PM
I've also lived and worked in st. Petersburg Russia. My collegues are some of the best programmers in the world.

Ok but it looks like you mistaken this thread for a job search one :-D
;D ;D ;D


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: sp_ on April 08, 2015, 03:14:05 PM
Nah.  I don't need to work.. My program is making money


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: smolen on April 08, 2015, 08:04:49 PM
Good work on the groest. Smolens quark miner does around 2 mhash on the 280x.
My gtx 980 does 20mhash. The competition is sleeping...
Some of the competitors are awake, doing exercises with pen and paper to get all the AES wannabes at once :D Doing it all by hand, algo by algo, would be just boring.
http://cdn.themis-media.com/media/global/images/library/deriv/75/75543.jpg

15 years ago I worked for a company in the silicon valley. My collegues earned xxx.xxx$ a year but I was a student at san francisco state u.
I've also lived and worked in st. Petersburg Russia. My collegues are some of the best programmers in the world.
Triangulated (https://en.wikipedia.org/wiki/Sun_Microsystems)

Some of your last tips (and smolen's) can be applied to this kernel as well, I think it can reach 38/40 Mh/s ;-)

This is the last-but-one trick in my WhirlpoolX kernel. Anyway, I'm going to abandon the table approach, so there's not much sense in keeping it secret.
Code:
static const CONSTANT UINT64 arrPrecalc_post_l27[256] = ...
#define baseL27 ((UINT32)&arrPrecalc_post_l27[0])
#define TC0off8_l27(off8) (*(const CONSTANT UINT64*)&(((const CONSTANT UINT8*)0)[off8]))
#define LUT3_r3(v) ASX64(TC0off8_l27(bitselect(baseL27, (UINT32)(as_ulong(v) >> 24), 0x7F8U)))


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: smolen on April 08, 2015, 08:19:05 PM
I was wondering if us (miner developers) should unite to take the best out of it.
A cartel would take all the fun out of the game and possibly destroy the PoW world. On the other hand, the PoS landscape could benefit from some polishing :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on April 08, 2015, 08:23:55 PM
I was wondering if us (miner developers) should unite to take the best out of it.
Cartel will take all the fun out of game and possibly destroy PoW world. On the other hand, PoS landscape could benefit from some polishing :)

Haha, too true. Also, just going through this for myself, here:
Code:
static const __constant ulong arrPrecalc_post_l27[256] = ...
#define baseL27 ((uint)&arrPrecalc_post_l27[0])
#define TC0off8_l27(off8) (*(const __constant ulong *)&(((const __constant uint8 *)0)[off8]))
#define LUT3_r3(v) as_ulong(TC0off8_l27(bitselect(baseL27, (uint)(as_ulong(v) >> 24), 0x7F8U)))

That's about as far as my parse got before I went, "Is that a fucking NULL pointer dereference?"


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: smolen on April 08, 2015, 08:27:08 PM
That's about as far as my parse got before I went, "Is that a fucking NULL pointer dereference?"
Yes :) The indexed address is calculated in bitselect. LUT0 and LUT4 indexing is just a single AND operation.
EDIT: Oh, wait, UINT8 is a byte, not an int vector. I probably went too far redefining every type :)
The X64/ASX64 macros keep the code debuggable on CPU - MSVC is too handy
Code:
#ifdef __OPENCL_VERSION__
#define X64 uint2
#define ASX64(v) (as_uint2(v))
#else
#define X64 UINT64
#define ASX64(v) (v)
#endif


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on April 08, 2015, 08:46:11 PM
That's about as far as my parse got before I went, "Is that a fucking NULL pointer dereference?"
Yes :) Indexed address is calculated in bitselect. LUT0 and LUT4 indexing is just single AND operation.
EDIT: Oh, wait UINT8 is byte, not int vector. I probably went too far redefining every type :)

Okay... I'm guessing that you've removed bits from the tables and are regenerating them on the fly, but I can't quite figure out how. Then again, bitwise ops aren't really my best subject...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: smolen on April 08, 2015, 09:21:19 PM
That's about as far as my parse got before I went, "Is that a fucking NULL pointer dereference?"
Yes :) Indexed address is calculated in bitselect. LUT0 and LUT4 indexing is just single AND operation.
EDIT: Oh, wait UINT8 is byte, not int vector. I probably went too far redefining every type :)

Okay... I'm guessing that you've removed bits from the tables and are regenerating them on the fly, but I can't quite figure out how. Then again, bitwise ops aren't really my best subject...
The tables are constant, just prerotated left by 3 bits (the shift that scales an index by the 8-byte size of one uint2). Well, this stuff needs comments if the kernel is to be published. The money is in X11 and Monero, there's not so much value in the Whirlpool code. I could just drop it somewhere, but it would give everyone a free boost in X11 :(
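To make the addressing part of the trick concrete, here is a CPU-side sketch in plain C. It is my reading of the macros, not the kernel itself: a flat byte array stands in for constant memory, the table base is assumed to be 2048-byte aligned, and the input is rotated explicitly here, whereas in the kernel the prerotated tables produce the state in that form. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same semantics as OpenCL bitselect(a, b, c): take the bit from b where c
   has a 1, otherwise from a. */
static uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Load one 8-byte table entry from a flat byte "address space". */
static uint64_t load64(const uint8_t *mem, uint32_t addr)
{
    uint64_t v;
    memcpy(&v, mem + addr, sizeof v);
    return v;
}

/* Conventional lookup: extract the index byte, scale it by 8, add the base. */
static uint64_t lookup_plain(const uint8_t *mem, uint32_t base, uint64_t v)
{
    uint32_t idx = (uint32_t)(v >> 24) & 0xFFu;
    return load64(mem, base + idx * 8u);
}

/* Prerotated lookup: v3 is the state rotated left by 3 bits, so bits 3..10
   of (v3 >> 24) already hold index*8. Because the low 11 bits of an aligned
   base are zero, bitselect merges base and offset without an add. */
static uint64_t lookup_prerot(const uint8_t *mem, uint32_t base, uint64_t v3)
{
    assert((base & 0x7FFu) == 0);
    return load64(mem, bitselect32(base, (uint32_t)(v3 >> 24), 0x7F8u));
}

static uint64_t rotl64(uint64_t x, unsigned n) { return (x << n) | (x >> (64 - n)); }
```

For any v, lookup_prerot(mem, base, rotl64(v, 3)) reads the same entry as lookup_plain(mem, base, v), but the shift, mask, scale and add collapse into one shift and one bitselect.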


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on April 08, 2015, 09:43:52 PM
That's about as far as my parse got before I went, "Is that a fucking NULL pointer dereference?"
Yes :) Indexed address is calculated in bitselect. LUT0 and LUT4 indexing is just single AND operation.
EDIT: Oh, wait UINT8 is byte, not int vector. I probably went too far redefining every type :)

Okay... I'm guessing that you've removed bits from the tables and are regenerating them on the fly, but I can't quite figure out how. Then again, bitwise ops aren't really my best subject...
Tables are constant, just prerotated left by 3 bit (size of one uint2 when used as index). Well, this stuff needs comments, if kernel will be published. Money are in X11 and Monero, not so much value in Whirlpool code, I could just drop it somewhere, but it will give everyone free boost in X11 :(

Maybe not: people are using wolf0's precompiled x11 binaries, just adding your trick to stock kernels will not come close to them speed-wise.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on April 08, 2015, 09:57:25 PM
That's about as far as my parse got before I went, "Is that a fucking NULL pointer dereference?"
Yes :) Indexed address is calculated in bitselect. LUT0 and LUT4 indexing is just single AND operation.
EDIT: Oh, wait UINT8 is byte, not int vector. I probably went too far redefining every type :)

Okay... I'm guessing that you've removed bits from the tables and are regenerating them on the fly, but I can't quite figure out how. Then again, bitwise ops aren't really my best subject...
Tables are constant, just prerotated left by 3 bit (size of one uint2 when used as index). Well, this stuff needs comments, if kernel will be published. Money are in X11 and Monero, not so much value in Whirlpool code, I could just drop it somewhere, but it will give everyone free boost in X11 :(

Whirlpool's not even in X11 - might help a bit with Groestl, though.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on April 13, 2015, 10:44:28 AM
The need for an improved groestl kernel is now immediate ... please do what u can ... I am just a C/C++ coder and am not fully into multi-threaded GPU coding ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: sp_ on April 13, 2015, 11:38:45 AM
Pallas, can you rewrite this groestl-256 implementation into a groestl-512 one and add it to sgminer (x11, x13, x15)?



Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: sp_ on April 13, 2015, 01:53:50 PM
Some of competitors are awake, taking exercises with pen and paper to get all AES-wannabees at once :D

Wolf0 claims to know AES from the inside, backwards and forwards. Me too.

The answer is SEA :-)



Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on April 13, 2015, 09:10:08 PM
Some of competitors are awake, taking exercises with pen and paper to get all AES-wannabees at once :D

Wolf0 claims to know aes from the inside backwords and forwards. Me too.

The answer is SEA :-)

Dunno what SEA means, but for AMD, you can do it classic, you can do it table-based, or you can do it classic with a twist: convert to bitslice form, do the S-box, and convert it right back, doing the rest of the ops classic-style, with some optimization tricks.

I would not advise doing it the way Christian did for Nvidia - lack of shfl().
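A toy illustration of the "convert, do one step bitsliced, convert back" shape, in plain C. Heavy caveats: this is not anyone's kernel, it handles only 8 bytes at a time, and multiplication by 2 in the AES field stands in for the S-box, because a real bitsliced S-box circuit is far too long to reproduce here. All names are made up.

```c
#include <stdint.h>

/* 8x8 bit-matrix transpose: out[b] bit i = bit b of in[i]. The transpose is
   its own inverse, so the same call converts to bitslice form and back.
   `in` and `out` must not alias. */
static void bitslice8(const uint8_t in[8], uint8_t out[8])
{
    for (int b = 0; b < 8; b++) {
        uint8_t p = 0;
        for (int i = 0; i < 8; i++)
            p |= (uint8_t)(((in[i] >> b) & 1u) << i);
        out[b] = p;
    }
}

/* Stand-in for the S-box: multiply by 2 in GF(2^8) modulo x^8+x^4+x^3+x+1,
   applied to all 8 bytes at once. In bitslice form it is a plane shift plus
   three XORs with the top plane. `p` and `q` must not alias. */
static void xtime_sliced(const uint8_t p[8], uint8_t q[8])
{
    q[0] = p[7];
    q[1] = p[0] ^ p[7];
    q[2] = p[1];
    q[3] = p[2] ^ p[7];
    q[4] = p[3] ^ p[7];
    q[5] = p[4];
    q[6] = p[5];
    q[7] = p[6];
}

/* Scalar reference for the same operation on one byte. */
static uint8_t xtime(uint8_t x)
{
    return (uint8_t)((x << 1) ^ ((x & 0x80u) ? 0x1Bu : 0u));
}
```

Calling bitslice8, then xtime_sliced, then bitslice8 again gives the same bytes as applying xtime to each input byte. Whether the two conversions pay for themselves depends on how much work the bitsliced step saves.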


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: smolen on April 14, 2015, 08:59:49 AM
Wolf0 claims to know aes from the inside backwords and forwards. Me too.

The answer is SEA :-)
Yes, that makes the game damn addictive.
Look, you told us about wide tables, great idea, but to skip the S-box step with it you need to go a couple more inches deeper inside AES :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on April 14, 2015, 02:03:28 PM
Pallas, can you  rewrite this groesl-256 implementation to a groestl-512 and add it to sgminer (x11,x13,x15).?

Sorry for the delay.
That would be nice, but everybody's using wolf0's binaries, so why? It would make sense if there is a plan to opensource optimized versions of most of the algos.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on April 14, 2015, 02:44:33 PM
Pallas, can you  rewrite this groesl-256 implementation to a groestl-512 and add it to sgminer (x11,x13,x15).?

Sorry for the delay.
That would be nice, but everybody's using wolf0's binaries, so why? It would make sense if there is a plan to opensource optimized versions of most of the algos.

I'm guessing sp_ wants you to do the work for him :P


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on April 16, 2015, 05:33:39 AM
@wolf0 do you have anything better than the neoscrypt kernel u leaked on the feathercoin thread?  I am getting 278 KH/s on a 7950 and 295 on a 280x


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on April 16, 2015, 05:49:22 AM
@wolf0 do you have anything better than the neocrypt kernel u leaked on feathercoin thread?  I am getting 278KHs on 7950 and 295 on 280x

I didn't leak that, I released it. Checking my records...

EDIT: Okay, most recent record of Neoscrypt I have is 12/23/2014 (NSFW): https://ottrbutt.com/miner/neoscryptwolf-12232014.png


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on April 16, 2015, 03:03:22 PM
@wolf0 do you have anything better than the neocrypt kernel u leaked on feathercoin thread?  I am getting 278KHs on 7950 and 295 on 280x

I didn't leak that, I released it. Checking my records...

EDIT: Okay, most recent record of Neoscrypt I have is 12/23/2014 (NSFW): https://ottrbutt.com/miner/neoscryptwolf-12232014.png
It goes without saying, but I'll say it anyway: I appreciate your work. I have no conception of wavefronts and such; I have tried, but I'm just too old to embrace new concepts.  If you have something better for me, please do put it on Mega :)  Same goes for groestl, Pallas :)  U are my heroes :)
And realhet, who understands AMD GPU coding better than all of us :)  realhet's hetpas assembly kernel is still the best for 280x and other Tahiti cards AFAIK :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on April 16, 2015, 03:27:14 PM
Just wanted to say I've tried applying some of the tricks I learnt working on whirlpoolx to the groestl kernel, but it's not so simple.
This kernel is much bigger in size so you can't just copy some good lines of code and it runs faster. Furthermore some of the optimizations I made in the past, make it more time consuming to apply some apparently simple hacks. Wolf0 I'm sure you know what I mean ;-)
Still, there is room for improvement and I have some ideas, but the question is: when the profit is gone, and the fun is gone, is it still worth it?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on April 16, 2015, 03:32:50 PM
Just wanted to say I've tried applying some of the tricks I learnt working on whirlpoolx to the groestl kernel, but it's not so simple.
This kernel is much bigger in size so you can't just copy some good lines of code and it runs faster. Furthermore some of the optimizations I made in the past, make it more time consuming to apply some apparently simple hacks. Wolf0 I'm sure you know what I mean ;-)
Still there is room for improvement, I have some ideas, but the question is: when the profit is gone, and the fun is gone, is it still worth?

I expect DMD to drop into low teens difficulty after a week or so :)  If it does not mining is dead LOL.  I have a direct interest in this as a partner on donkypool ... 12 miners up from 6 a few weeks ago ... I am currently mining neoscrypt for sale on westhash lol and p=4.8 selling :)  anything less goes to yaamp ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on April 16, 2015, 03:37:30 PM
Just wanted to say I've tried applying some of the tricks I learnt working on whirlpoolx to the groestl kernel, but it's not so simple.
This kernel is much bigger in size so you can't just copy some good lines of code and it runs faster. Furthermore some of the optimizations I made in the past, make it more time consuming to apply some apparently simple hacks. Wolf0 I'm sure you know what I mean ;-)
Still there is room for improvement, I have some ideas, but the question is: when the profit is gone, and the fun is gone, is it still worth?

It is indeed, as long as all mining profit isn't gone. It's a challenge - learn from it, and use that knowledge elsewhere.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on April 16, 2015, 06:33:24 PM
@wolf0 do you have anything better than the neocrypt kernel u leaked on feathercoin thread?  I am getting 278KHs on 7950 and 295 on 280x

I didn't leak that, I released it. Checking my records...

EDIT: Okay, most recent record of Neoscrypt I have is 12/23/2014 (NSFW): https://ottrbutt.com/miner/neoscryptwolf-12232014.png
Needless to say but I will, I appreciate your work, I have no conception of wavefronts and such, I have tried but I'm just too old to embrace new concepts.  If you have something better for me please do put on Mega :)  Same goes for groestl Pallas :)  U are my heroes :)
And realhet who understands AMD GPU coding better than all of us :)  realhet hetpas assembly kernel still best for 280x and other Tahiti cards AFAIK :)

Nope, I have 21MH/s out of a 7950 at 1125/1250, IIRC, using OpenCL.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on April 16, 2015, 11:31:53 PM
@wolf0 do you have anything better than the neocrypt kernel u leaked on feathercoin thread?  I am getting 278KHs on 7950 and 295 on 280x

I didn't leak that, I released it. Checking my records...

EDIT: Okay, most recent record of Neoscrypt I have is 12/23/2014 (NSFW): https://ottrbutt.com/miner/neoscryptwolf-12232014.png
Needless to say but I will, I appreciate your work, I have no conception of wavefronts and such, I have tried but I'm just too old to embrace new concepts.  If you have something better for me please do put on Mega :)  Same goes for groestl Pallas :)  U are my heroes :)
And realhet who understands AMD GPU coding better than all of us :)  realhet hetpas assembly kernel still best for 280x and other Tahiti cards AFAIK :)

Nope, I have 21MH/s out of a 7950 at 1125/1250, IIRC, using OpenCL.
Wow! may I have new Neoscrypt kernel, 7950 working hard just doing 278KHs with your older kernel!

Looking ... 1160/1500 :) I have modded card a bit for better cooling :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on April 16, 2015, 11:47:51 PM
@wolf0 do you have anything better than the neocrypt kernel u leaked on feathercoin thread?  I am getting 278KHs on 7950 and 295 on 280x

I didn't leak that, I released it. Checking my records...

EDIT: Okay, most recent record of Neoscrypt I have is 12/23/2014 (NSFW): https://ottrbutt.com/miner/neoscryptwolf-12232014.png
Needless to say but I will, I appreciate your work, I have no conception of wavefronts and such, I have tried but I'm just too old to embrace new concepts.  If you have something better for me please do put on Mega :)  Same goes for groestl Pallas :)  U are my heroes :)
And realhet who understands AMD GPU coding better than all of us :)  realhet hetpas assembly kernel still best for 280x and other Tahiti cards AFAIK :)

Nope, I have 21MH/s out of a 7950 at 1125/1250, IIRC, using OpenCL.
Wow! may I have new Neoscrypt kernel, 7950 working hard just doing 278KHs with your older kernel!

The 21MH/s is Groestl, I meant. And... idk, I suppose Neoscrypt isn't very useful at the moment...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: utahjohn on April 17, 2015, 12:08:03 AM
I get 26MHs on 280x mining groestl, however I have quit groestl mining of DMD for the moment till diff drops back into the teens.  For some reason the ASM kernel crashes the 7950 within a few minutes ...  I am mining neoscrypt on yaamp at present and also selling neo on westhash :)
Buying more DMD than I used to mine direct ???  Will see what happens in next week or so as miners drop like flies on DMD ...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: smolen on April 17, 2015, 04:02:13 AM
Just wanted to say I've tried applying some of the tricks I learnt working on whirlpoolx to the groestl kernel, but it's not so simple.
This kernel is much bigger in size so you can't just copy some good lines of code and it runs faster. Furthermore some of the optimizations I made in the past, make it more time consuming to apply some apparently simple hacks. Wolf0 I'm sure you know what I mean ;-)
Still there is room for improvement, I have some ideas, but the question is: when the profit is gone, and the fun is gone, is it still worth?
Another trick, not for speed, but for cleaning up the code: when you want to postpone the S-boxing of a byte, put the preimage of zero (0x81 in Whirlpool) there.
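
A toy illustration of the placeholder trick above, assuming the S-box outputs are combined by XOR (as in table-based round implementations). The 4-bit S-box here is made up for the example and is not Whirlpool's; `column` is a hypothetical stand-in for one output column of a round.

```python
# Toy 4-bit S-box (hypothetical, NOT Whirlpool's); any permutation works.
SBOX = [0x6, 0xB, 0x5, 0x4, 0x2, 0xE, 0x7, 0xA,
        0x9, 0xD, 0xF, 0xC, 0x3, 0x1, 0x0, 0x8]


def preimage_of_zero(sbox):
    """Return the input x for which sbox[x] == 0."""
    return sbox.index(0)


def column(inputs, sbox):
    """XOR together the S-boxed inputs (stand-in for one round column)."""
    acc = 0
    for b in inputs:
        acc ^= sbox[b]
    return acc


ZERO_PRE = preimage_of_zero(SBOX)

# Normal evaluation: all three bytes are S-boxed right away.
full = column([3, 7, 9], SBOX)

# Postponed evaluation: the slot for byte 9 holds the placeholder, which
# contributes 0 to the XOR, so the same code path runs unchanged. The real
# contribution is XORed in later.
partial = column([3, 7, ZERO_PRE], SBOX)
deferred = partial ^ SBOX[9]
```

The point is only that the placeholder keeps the lookup code uniform: no special case is needed for the byte whose substitution is delayed.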


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on May 17, 2015, 10:21:24 AM
Hi,

Have you checked the new GCN3 ISA manual? http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/07/AMD_GCN3_Instruction_Set_Architecture.pdf (http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/07/AMD_GCN3_Instruction_Set_Architecture.pdf)

It has some really useful things like:

- Bytepermute (no more shifts and masks)
- VOP_DPP: It actually does 2 ds_swizzle in the instruction in no time, so optimizing a single thread for 4 lanes costs no more cycles.
- VOP_SDWA: access a word or a byte in the 32bit inputs and in the output too. (again: no more shifts and masks)
- S alu can write memory
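
To make the "no more shifts and masks" remark concrete, here is a small software model of what a byte-permute style operation replaces. `byte_permute` is a hypothetical helper written for this illustration; it is not the actual GCN3 instruction encoding or its exact selector semantics, which are described in the manual linked above.

```python
def extract_bytes_shift_mask(word):
    """Classic approach: one shift and one mask per byte of a 32-bit word."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def byte_permute(src_hi, src_lo, selectors):
    """Model of a byte permute over the 64-bit value {src_hi, src_lo}.

    selectors[i] (0..7) names the source byte that lands in output byte i.
    On hardware this would be a single instruction; here it is emulated
    with the very shifts and masks it is meant to replace.
    """
    combined = ((src_hi & 0xFFFFFFFF) << 32) | (src_lo & 0xFFFFFFFF)
    out = 0
    for i, sel in enumerate(selectors):
        if not 0 <= sel <= 7:
            raise ValueError("selector must be in 0..7")
        out |= ((combined >> (8 * sel)) & 0xFF) << (8 * i)
    return out


# Byte swap of the low word, expressed as one permute.
swapped = byte_permute(0xAABBCCDD, 0x11223344, [3, 2, 1, 0])

# Interleave bytes from both words, the kind of gather hash kernels do a lot.
mixed = byte_permute(0xAABBCCDD, 0x11223344, [4, 0, 5, 1])
```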

No 3-operand add or 3-operand bitwise ops, though.

And they altered some instruction encodings, so I guess my asm will crash on GCN3 immediately. :D



Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: MaxDZ8 on May 18, 2015, 03:44:15 PM
That's some truly slick updates!

I was indeed planning to do a full AES round without T-tables, as the number of masks is nonsensical.
I had the impression the SALU was immensely updated for Tonga given it takes much more VGPRs on the analyzer.

I wonder how to trick the CL compiler into emitting this code.

But most importantly, what are they waiting for to just make an AMD_GCN_swizzle extension!?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: realhet on May 21, 2015, 12:31:48 PM
It doesn't seem like they are implementing GCN-specific goodies in the current compiler stack. It's kinda bloated, and AMD_IL has been awaiting its replacement since the 7970 came out. I'm sure in the upcoming HSA language there will be many more GCN things implemented (except the separate V and S programming).


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on July 01, 2015, 09:45:05 PM
I'm interested in knowing the hashrate of R9 285 and R9 Fury X cards, anybody?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on July 03, 2015, 08:31:30 AM
Wolf0 created a faster Tahiti binary and posted about it in the groestlcoin thread:

I have a faster Tahiti binary than Pallas' for Groestlcoin - works on DMD, too. The usage is the same as his binary; I should have more info later.

Get it here: https://ottrbutt.com/miner/wolf-groestlcoinTahitigw256l4.bin

it is indeed faster and works flawlessly.
usage: just rename it over the old one and make sure you set worksize 256 for that card; you can get a bit more hashrate by using 2 or 4 threads.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 13, 2015, 11:32:00 AM
Nothing new in the groestl+groestl area, but I've worked a bit on the groestl+sha variant (myr-groestl for myriad, digibyte, saffron, etc.).
Tahiti is a mess, but I could easily push hawaii over 60 Mh/s, keeping the kernel compatible with the old miners.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 16, 2015, 12:59:30 PM
Nothing new in the groestl+groestl area, but I've worked a bit on the groestl+sha variant (myr-groestl for myriad, digibyte, saffron, etc.).
Tahiti is a mess, but I could easily push hawaii over 60 Mh/s, keeping the kernel compatible with the old miners.

I could finally get rid of scratch registers on Tahiti: now the 280x is doing 35 Mh/s with moderate overclock :-)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: carlo_0000 on November 02, 2015, 01:30:27 AM
the diamond.cl  is missing on the download
i only see groestlcoin-v1.cl

or must we just rename to diamond ?

so i rename to diamond.cl
but no change in my speed i have 4.7 mh on r9 270  sgminer 4.1.0

i guess groestlcoin-v1.cl is not for diamond, i've got a lot of rejected shares


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on November 02, 2015, 09:01:07 AM
the diamond.cl  is missing on the download
i only see groestlcoin-v1.cl

or must we just rename to diamond ?

so i rename to diamond.cl
but no change in my speed i have 4.7 mh on r9 270  sgminer 4.1.0

i guest groestlcoin-v1.cl is not for diamond, i ve got a lot rejected shares

groestlcoin and diamond use the same block hashing algo so the same opencl kernel applies.
but you must configure the miner to mine for the specific coin because there are differences!
that's why there are two kernels even though the two kernel files are the same.

please post your conf file and command line so I can help you debug it.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: carlo_0000 on November 03, 2015, 12:20:37 AM
so i have r9 270  with driver 15.10

sgminer_diamond_v4.1.0

my batch

setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_USE_SYNC_OBJECTS 1
"E:\myriadcoin\cgminer skein\sgminer_diamond_v4.1.0\sgminer.exe" -k diamond -o stratum+tcp://eu.miningfield.com:3377 -u carlo0000.r9a -p 0 --difficulty-multiplier 0.0039062500 -w 256 -I 22 -T

so i can only use the kernel
the bin are for 290 and 280, i try to use and rename but sgminer crash

my bin file name is  diamondPitcairnglg2tc10688w256l4.bin


i try again it s working now with wolf-groestlcoinTahitigw256l4
the other one crash

i notice i had diamondPitcairnglg2tc10688w256l4.bin in my user folder C:\Users\carlo  , i delete

i have 8.7 mh  ;D @1025mhz

thanks for help

so i run it on my other computer, that one does 2x  9.3 mh @1040mhz  with driver 15.7 with display is at 800*600 with no screen,

so i put this computer at 1040mhz too but it does only 8.8, but i have a lot of stuff running on it and the display is at 1080p
so i put it back to 1025 and i'm gonna mine myr skein again on this one, it's less intensive, the screen is really slow when mining diamond @ I22
but on skein it's faster with I8 (max) 140mh

even with double hashrate now on diamond, i still have more incomes with myr with skein on my r9 270
but i don't know how much it s gona make with POS on diamond  so maybe not a big difference in long term

or not i just made new calculation MYR is still droping a lot last days
difficulty is higher and price get down a lot   , last week i was at 120000 satochi day,

today it s only 78000 sat  ???  it s less than mining diamond
so i m going on diamond for now


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on November 03, 2015, 09:12:13 AM
so i have r9 270  with driver 15.10

...snip...

Instead of using binaries made for other chips, why not simply compile your own for pitcairn? just overwrite diamond.cl, remove the bin files and run.
Let me know how it goes :-)
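
A minimal sketch of the "remove the bin files" step, assuming cached binaries can sit both in the miner folder and in the user folder (as reported earlier in the thread for C:\Users\carlo). The function names and the `<kernel>*.bin` pattern are assumptions made for this example, not part of sgminer.

```python
from pathlib import Path


def find_cached_bins(kernel_name, search_dirs):
    """List cached kernel binaries named '<kernel_name>*.bin' in the given folders."""
    found = []
    for folder in search_dirs:
        folder = Path(folder)
        if folder.is_dir():  # silently skip folders that do not exist
            found.extend(sorted(folder.glob(f"{kernel_name}*.bin")))
    return found


def remove_cached_bins(kernel_name, search_dirs, dry_run=True):
    """Delete the cached binaries; with dry_run=True only report them."""
    bins = find_cached_bins(kernel_name, search_dirs)
    if not dry_run:
        for path in bins:
            path.unlink()
    return bins


# Example (paths are placeholders, adjust to your setup):
# remove_cached_bins("diamond", [r"E:\sgminer", Path.home()], dry_run=False)
```

Run it with the miner stopped; on the next start the miner should rebuild the binary from the .cl file.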


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: carlo_0000 on November 04, 2015, 01:40:16 AM
so i have r9 270  with driver 15.10

...snip...

Instead of using binaries made for other chips, why not simply compiling your own for pitcairn? just overwrite diamond.cl, remove the bin files and run.
Let me know how it goes :-)

it s what i did first   not using the bin file  but speed was 4.6mh

but i think maybe because there was the old bin file in C:\Users\carlo   (sgminer use that one  sometimes) strange but not always

when i see it doesn't create a bin file in the sgminer directory then i know it's using the one in C:\Users\carlo

i'm gonna try again and remove them both

so i deleted them, started sgminer, it created a new one but now it's working at 4.6mh only

so the bin make the difference   not the kernel



Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Heavyiron on November 05, 2015, 09:53:08 PM
Hm, bin from Wolf0 works fine and faster a little bit on 7850 (8 MH/s) and even 5770 (3,6 MH/s). On kernel v1 compiled on 14.7 RC3 driver speeds were 7,2 and 3,2 at the same clocks. Nice work.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 18, 2016, 07:51:14 PM
Myriad-groestl: I tried splitting the kernel into two parts, groestl and sha. I was almost sure it would be an improvement but it is a little slower instead (and it requires a custom miner). It could fix Tahiti slowness, though. I don't know yet because I didn't have such a card ready on the rig to test.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: qwep1 on January 18, 2016, 09:19:34 PM
Myriad-groestl: I tried splitting the kernel into two parts, groestl and sha. I was almost sure it would be an improvement but it is a little slower instead (and it requires a custom miner). It could fix Tahiti slowness, though. I don't know yet because I didn't have such a card ready on the rig to test.
I can provide access to test


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: ajqjjj on January 20, 2016, 07:08:28 AM
Hi. Will there be something for Cayman (6970) or (5870), like Wolf0's binary?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: MaxDZ8 on January 20, 2016, 07:55:54 AM
Myriad-groestl: I tried splitting the kernel into two parts, groestl and sha. I was almost sure it would be an improvement but it is a little slower instead (and it requires a custom miner). It could fix Tahiti slowness, though. I don't know yet because I didn't have such a card ready on the rig to test.
How did you get this expectation? I also tried that but for very different reason: giving small devices chance at a more efficient task switching. It wasn't worth it even in that case (surprisingly).


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on January 20, 2016, 09:47:49 AM
Myriad-groestl: I tried splitting the kernel into two parts, groestl and sha. I was almost sure it would be an improvement but it is a little slower instead (and it requires a custom miner). It could fix Tahiti slowness, though. I don't know yet because I didn't have such a card ready on the rig to test.
How did you get this expectation? I also tried that but for very different reason: giving small devices chance at a more efficient task switching. It wasn't worth it even in that case (surprisingly).

The two algos are very different and I was hoping for lower register usage (especially for sha, which in addition doesn't use LDS) and, thus, more waves in flight.
It's not the case for hawaii but it might be for tahiti, I'll test it as soon as I have some free time and can access the rig.
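
A back-of-the-envelope sketch of why fewer registers can mean more waves in flight. The default numbers (256 VGPRs per SIMD lane, at most 10 waves per SIMD, allocation in blocks of 4) are assumed here for GCN-class hardware and are not figures from this thread; the function name is hypothetical, and LDS and SGPR limits are ignored.

```python
def waves_per_simd(vgprs_per_thread, vgpr_budget=256, max_waves=10, granularity=4):
    """Estimate how many wavefronts fit on one SIMD given per-thread VGPR use.

    All defaults are assumptions for illustration. A result of 0 means the
    kernel does not fit in registers at all and would have to spill.
    """
    if vgprs_per_thread <= 0:
        raise ValueError("vgprs_per_thread must be positive")
    # Round the request up to the allocation granularity.
    allocated = -(-vgprs_per_thread // granularity) * granularity
    return min(max_waves, vgpr_budget // allocated)


# A register-hungry fused kernel versus two hypothetical lighter halves.
for vgprs in (128, 84, 64, 24):
    print(vgprs, "VGPRs ->", waves_per_simd(vgprs), "waves per SIMD")
```

Under these assumptions, halving register use from 128 to 64 doubles the wave count, which is the effect a kernel split is hoping for; whether it pays off still depends on the extra launch and memory traffic.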


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Wolf0 on January 25, 2016, 07:15:55 PM
Hi There will be something for Cayman (6970) or (5870) Wolf0's binary ?

I don't have a 6970 to test.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: alevlaslo on April 16, 2016, 07:29:11 AM
Experimental Hawaii bin (v2):
https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw128l8.bin

https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw256l8.bin

links are not working


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on April 16, 2016, 08:13:40 AM
Experimental Hawaii bin (v2):
https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw128l8.bin

https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw256l8.bin

links do not working

Sorry, Dropbox has gone nuts.
Try this one:

https://app.box.com/s/zsr29tfgv4tpxs1q7451dayzaw3wnoee


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Matinator22 on April 20, 2016, 04:26:53 PM
Experimental Hawaii bin (v2):
https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw128l8.bin

https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw256l8.bin

links do not working

Sorry, Dropbox is gone nuts.
Try this one:

https://app.box.com/s/zsr29tfgv4tpxs1q7451dayzaw3wnoee

Could you post an updated kernel link as well? Seems broken. Fn Dropbox.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on April 21, 2016, 12:12:53 PM
Experimental Hawaii bin (v2):
https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw128l8.bin

https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw256l8.bin

links do not working

Sorry, Dropbox is gone nuts.
Try this one:

https://app.box.com/s/zsr29tfgv4tpxs1q7451dayzaw3wnoee

Could you post a updated kernel link as well? Seems broken. Fn Dropbox.

Here it is:

https://app.box.com/s/9vikvemf7acio3uns7taorodqmod42ej


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Matinator22 on April 22, 2016, 06:42:51 PM
Experimental Hawaii bin (v2):
https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw128l8.bin

https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw256l8.bin

links do not working

Sorry, Dropbox is gone nuts.
Try this one:

https://app.box.com/s/zsr29tfgv4tpxs1q7451dayzaw3wnoee

Could you post a updated kernel link as well? Seems broken. Fn Dropbox.

Here it is:

https://app.box.com/s/9vikvemf7acio3uns7taorodqmod42ej

Thanks a bunch :)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on April 26, 2016, 06:48:59 PM
Is anyone still mining this algo? What coins?
BTW, first post as legendary...


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: carlo_0000 on April 27, 2016, 05:33:41 PM
Could someone please share their hashrate with r9 285? I'm curious to see if it outperforms the 280 and how much power it uses.

on diamond mining ,my msi r9 270 oc 1060  does 9.4mh with pitcairn.bin

so i m now testing r9 380 tonga

i have sgminer dmd 4.1.0 , last drivers

i remember i replace the diamond.cl in the past
with default bin it s running at 7mh  stock gpu speed and oc to 1080 7.1mh

with wolf 280x bin (if i remember it s that one) it runs at
15.4.mh asus strix 380 2G stock @ 970   and 17.06mh oc @1080
15.8mh Sapphire Nitro R9 380 2G D5  stock @ 1000  and 16.8mh oc @1080
   
i don't know why the Sapphire is slower
most of the links here are dead so i can't check if i have the right bins & cl

so they're slower than the 280

on skein i m doing 208mh @1080, and reported speed of 280x are 185mh if i remember

here is the original bin what sgminer build

http://www.megafileupload.com/no4y/diamondTongaglg2tc10624w256l4.default.bin


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: restless on May 01, 2016, 06:52:38 PM
Are there binaries/kernel for Myr-Gr?
It seems only gr-gr are posted?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Koltan on May 04, 2016, 07:56:35 PM
Is anyone still mining this algo? What coins?
BTW, first post as legendary...

Hi!

I try to mine DMD. BTW, the links from your first post are not working (still can't try version2 bin).
I've found groestlcoin-v1.cl in the sgminer distribution.
On my HD7790 I get ~4.0 Mh on standard diamond.cl, and 4.7 Mh on your optimized kernel groestlcoin-v1.cl.
The fastest results I got with Wolf0's Tahiti bin. It is 8.5 Mh. Unfortunately, it's still too low to get the DMD mining profitable on Radeon HD7790.
So, I'll have to stay on MYR-Groestl, which gives 17.5 Mh on optimized kernel from ghostlander.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on May 04, 2016, 08:34:57 PM
Is anyone still mining this algo? What coins?
BTW, first post as legendary...

Hi!

I try to mine DMD. BTW, the links from your first post are not working (still can't try version2 bin).
I've found groestlcoin-v1.cl in the sgminer distribution.
On my HD7790 I get ~4.0 Mh on standart diamond.cl, and 4.7 Mh on your optimized kernel groestlcoin-v1.cl.
The fastest results I got with Wolf0's Tahiti bin. It is 8.5 Mh. Unfortunately, it's still too low to get the DMD mining profitable on Radeon HD7790.
So, I'll have to stay on MYR-Groestl, which gives 17.5 Mh on optimized kernel from ghostlander.

Which version of catalyst did you use? Usually 15.3 gives best results.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Koltan on May 04, 2016, 10:41:45 PM
Yes, I'm using 15.3.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Shivoreun on June 07, 2016, 10:53:00 AM
DOWNLOAD

Opensource Kernel (v1):
https://dl.dropboxusercontent.com/u/40353042/Diamond/groestlcoin-v1.cl

Can you, please, put the file into another location, and give us a new link? This is dead.
Thanks.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on June 07, 2016, 10:58:16 AM
DOWNLOAD

Opensource Kernel (v1):
https://dl.dropboxusercontent.com/u/40353042/Diamond/groestlcoin-v1.cl

Can you, please, put the file into another location, and give us a new link? This is dead.
Thanks.

Just look a few posts above yours.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Shivoreun on June 07, 2016, 12:23:59 PM
Just look some posts up yours.
Ooops.. Now the link is working (was err 404 before)
Sorry for disturbing.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: CrimsonGT on July 09, 2016, 06:03:50 PM
Spent a lot more time trying to get this working than I care to admit, however I finally got it running and would just like to say thank you!

R9-280x - 23Mh/s

1090Mhz Clock
Stock Memory Clock
0% Power Limit

My only question is that I was never able to find a working download link for your .cl file, I just replaced the .bin file with that from Wolf. Not sure if I would get better results having your .cl file as well but if anyone can provide a working link to that, it would be most appreciated. Given my confusion on how to get started, here is what I did in case anyone else runs into the same problem.

1) Download SPH SG-Miner
2) Run it and connect it to a pool, where it will generate a .bin file in the root folder of the Miner
3) Download Wolf's Tahiti Binary (https://ottrbutt.com/miner/wolf-groestlcoinTahitigw256l4.bin)
4) Place the Binary file into the root folder of the Miner, rename it to that of the one in Step 2 and delete or replace the original
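
The four steps above can be sketched as a small script. This is only an illustration of the rename-over procedure: `install_prebuilt_bin` is a hypothetical helper, the `<kernel>*.bin` naming pattern is an assumption based on the file names seen in this thread, and the miner must be stopped while it runs.

```python
import shutil
from pathlib import Path


def install_prebuilt_bin(miner_dir, kernel_name, prebuilt_path):
    """Copy a prebuilt kernel binary over the one the miner generated.

    Expects exactly one '<kernel_name>*.bin' in miner_dir (created by the
    miner on its first run). The generated file is kept as '<name>.orig'.
    """
    miner_dir = Path(miner_dir)
    generated = sorted(miner_dir.glob(f"{kernel_name}*.bin"))
    if len(generated) != 1:
        raise RuntimeError(
            f"expected exactly one generated binary, found {len(generated)}"
        )
    target = generated[0]
    backup = target.with_name(target.name + ".orig")
    shutil.copy2(target, backup)             # keep the original for rollback
    shutil.copyfile(prebuilt_path, target)   # prebuilt bin takes the generated name
    return target


# Example (paths are placeholders):
# install_prebuilt_bin(r"C:\sgminer", "groestlcoin",
#                      r"C:\Downloads\wolf-groestlcoinTahitigw256l4.bin")
```

Keeping the `.orig` copy makes it easy to go back if the prebuilt binary crashes or produces rejected shares on a given card.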

Another question I have is if I can solo mine using this?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on July 10, 2016, 08:17:51 AM
280x: best kernel is Wolf0's which is available as bin only.
Memory clock: set it as low as possible, for less power used and more room for core overclocking.
Link to the sources: original is currently not working but some posts ago I provided alternatives.
Solo mine: yes.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on August 19, 2016, 12:49:29 PM
Wanted to play with satoshibox, here is a link for the private myr-groestl binary:

https://satoshibox.com/fttcfvpiyhbod7ueidmgdhym


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: Nikolaj on August 19, 2016, 01:11:08 PM
some more details and hashrates :) ?

Will you port them to Pascal? I am more interested there; Hawaii is an old architecture, unfortunately, given its power consumption.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on August 19, 2016, 01:15:00 PM
some more details and hashrates :) ?

Will you port them on Pascal? I am more interested there, hawaii it's an old architecture unfortunately, due the power consumption values.

Speed: 63 Mh/s on r9 290x @1100/150. It is compatible with stock miner.
Hawaii is still very good on groestl, whirlpool and similar algos (with big tables).


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: qwep1 on August 19, 2016, 10:23:36 PM
some more details and hashrates :) ?

Will you port them on Pascal? I am more interested there, hawaii it's an old architecture unfortunately, due the power consumption values.

Speed: 63 Mh/s on r9 290x @1100/150. It is compatible with stock miner.
Hawaii is still very good on groestl, whirlpool and similar algos (with big tables).
but what about the 7970 and 280x


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: ylpkm on September 05, 2016, 10:09:47 PM
any updates or settings for the rx 480? for just groestl.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 06, 2016, 01:11:23 AM
any updates or settings for the rx 480? for just groestl.

Did you try running it? Does it work and what speed?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: ylpkm on September 07, 2016, 02:55:03 AM
any updates or settings for the rx 480? for just groestl.

Did you try running it? Does it work and what speed?
20mh/s with the groestl-v2.cl i found in one of the other miner downloads, i think from nicehash.
with original file about 19 mh/s


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 07, 2016, 08:42:18 AM
any updates or settings for the rx 480? for just groestl.

Did you try running it? Does it work and what speed?
20mh/s with the groestl-v2.cl i found in one of the other miner downloads, i think from nicehash.
with original file about 19 mh/s

could you try with another driver version, something from 2015? you don't need to replace the main driver of your system, just copy the opencl.dll file into the sgminer folder and remove the bin files.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: ylpkm on September 07, 2016, 12:09:48 PM
tried with 15.7,15.12 and 14.12  opencl.dll files. no real difference, tried it with the different groestlcoin.cl, groestlcoin-v1.cl, and groestlcoin-v2.cl types as well.   v2 still being the fastest. no real improvement using the different opencl.dll with the varying groestlcoin.cl (maybe <1% difference if anything).


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 07, 2016, 12:29:56 PM
tried with 15.7,15.12 and 14.12  opencl.dll files. no real difference, tried it with the different groestlcoin.cl, groestlcoin-v1.cl, and groestlcoin-v2.cl types as well.   v2 still being the fastest. no real improvement using the different opencl.dll with the varying groestlcoin.cl (maybe <1% difference if anything).

also try renaming the latest hawaii bin to use on your card.
it may work.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: ylpkm on September 08, 2016, 06:47:30 AM
tried with 15.7,15.12 and 14.12  opencl.dll files. no real difference, tried it with the different groestlcoin.cl, groestlcoin-v1.cl, and groestlcoin-v2.cl types as well.   v2 still being the fastest. no real improvement using the different opencl.dll with the varying groestlcoin.cl (maybe <1% difference if anything).

also try renaming the latest hawaii bin to use on your card.
it may work.
so rename, diamondHawaiiw128l8.bin to groestlcoinEllesmeregw256l4.bin? I tried that and no real change either.
(P.S. I appreciate your assistance with this pallas, thanks for your time)


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 08, 2016, 06:51:38 AM
tried with 15.7,15.12 and 14.12  opencl.dll files. no real difference, tried it with the different groestlcoin.cl, groestlcoin-v1.cl, and groestlcoin-v2.cl types as well.   v2 still being the fastest. no real improvement using the different opencl.dll with the varying groestlcoin.cl (maybe <1% difference if anything).

also try renaming the latest hawaii bin to use on your card.
it may work.
so rename, diamondHawaiiw128l8.bin to groestlcoinEllesmeregw256l4.bin? I tried that and no real change either.
(P.S. I appreciate your assistance with this pallas, thanks for your time)

You're welcome.
Then specific tuning for Polaris is needed; unfortunately I do not have such a card and don't plan to buy one in the near future.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on September 26, 2016, 02:37:38 PM
If you are looking for the closed source myriad groestl miner (for DGB, SFR, etc.) look here:

https://satoshibox.com/fttcfvpiyhbod7ueidmgdhym


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: FeranD on October 09, 2016, 10:18:07 PM
Hey, guys. Running 2x7950 GPU's (Asus and Gigabyte). After installing 14.6 instead of 15.12 catalyst my hashrate on mining Diamond Coin increased from 6.2(1st GPU) and 6.6(2nd GPU) to significant 10.2(1st) and 11.09(2nd)Mh/s !!!

Looking for more improvement. Links on first message are broken. Where to get v2 experimental cl kernel, pls ? Send to miner.hid@ya.ru someone kind please.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 09, 2016, 10:30:27 PM
Hey, guys. Running 2x7950 GPU's (Asus and Gigabyte). After installing 14.6 instead of 15.12 catalyst my hashrate on mining Diamond Coin increased from 6.2(1st GPU) and 6.6(2nd GPU) to significant 10.2(1st) and 11.09(2nd)Mh/s !!!

Looking for more improvement. Links on first message are broken. Where to get v2 experimental cl kernel, pls ? Send to miner.hid@ya.ru someone kind please.

For best result, you need to get Wolf0's Tahiti bin, see first post.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: FeranD on October 10, 2016, 12:52:16 PM
Hey, guys. Running 2x7950 GPU's (Asus and Gigabyte). After installing 14.6 instead of 15.12 catalyst my hashrate on mining Diamond Coin increased from 6.2(1st GPU) and 6.6(2nd GPU) to significant 10.2(1st) and 11.09(2nd)Mh/s !!!

Looking for more improvement. Links on first message are broken. Where to get v2 experimental cl kernel, pls ? Send to miner.hid@ya.ru someone kind please.

For best result, you need to get Wolf0's Tahiti bin, see first post.

I've downloaded Wolf's bin from the first page. I deleted my old .bin files after I copied their names, then pasted Wolf's bin and renamed it to the copied string. Correct ?
It gives me a significantly smaller hashrate, about 700 kilohashes per card  :o
What's wrong ? sgminer v.4.1.0(got it from diamond mining page).


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 10, 2016, 01:29:15 PM
Does it submit valid shares?
If you look one or two pages back, there are alternative links for the v1 sources.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: FeranD on October 10, 2016, 09:49:26 PM
One or two pages back there is the same link as on page 1. Do I need to use this bin file along with the v1 or v2 kernel files (.cl)? If so, where can I get them? The links are broken (((


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 11, 2016, 09:16:19 AM
You should use a bin file OR a cl file (from which the miner will generate the bins itself).
Here is an alternative link to the v1 sources:

https://app.box.com/s/9vikvemf7acio3uns7taorodqmod42ej
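
For the "cl file" route, the first post says to remove all the .bin files in the main miner folder so that they get regenerated from the new .cl source. A minimal sketch of that clean-up step, assuming the cached files sit directly in the miner folder; the function name `cached_bins` and any folder path you pass in are hypothetical, not part of sgminer:

```python
from pathlib import Path


def cached_bins(miner_dir, delete=False):
    """List the .bin files directly inside miner_dir.

    With delete=True the files are also removed, so that the miner
    has to rebuild them from the .cl kernel on its next start.
    The "kernel" subfolder holding the .cl sources is not touched.
    """
    bins = sorted(Path(miner_dir).glob("*.bin"))
    if delete:
        for path in bins:
            path.unlink()
    return bins
```

Calling it first with `delete=False` shows what would be removed; stop the miner before deleting anything.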


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: FeranD on October 11, 2016, 11:26:40 AM
Got it. Thanks for sharing the link. I downloaded the .cl file, deleted the existing .bin and ran sgminer with the following command line:
sgminer.exe -k diamond -o stratum+tcp://eu.miningfield.com:3376 -u x -p x --difficulty-multiplier 0.0039062500 --intensity 22
Same result: 10-11 Mh/s per 7950 card. The one overclocked to 1200 MHz core / 1450 MHz mem runs at 11 Mh/s. Not even near the 18 Mh/s that I've seen in this thread. I am using sgminer 4.1.1; do I need to use another version?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 11, 2016, 11:51:14 AM
Try another driver version:

https://bitcointalk.org/index.php?topic=779598.msg8787566#msg8787566

You don't need to install the whole driver; just put the opencl.dll file in the sgminer executable folder and run.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: FeranD on October 11, 2016, 09:49:53 PM
Are you using Linux? I found some info from 2014 saying that the video drivers under Linux (especially 14.6) run better for groestl.
I've tried 14.6_beta and 14.7_rc3: no difference at all. I also played with the .bin and .cl files.

By the way, do you have a hawaii.cl file? Using groestlcoin.cl gives 100% rejected shares on DiamondCoin.
sgminer-5.4.0-nicehash gives a slightly better hashrate, +1-1.5 Mh/s.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 12, 2016, 06:50:28 AM
Yes, I use Linux.
Diamond and groestlcoin use the same kernel.
DiamondCoin is another thing, unrelated to diamond.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: FeranD on October 12, 2016, 10:37:03 AM
Thank you! Finally I got the results I wanted. I just needed to use groestlcoin.cl by writing -k groestlcoin in the bat file. That gives 15.2 Mh/s at 1100 MHz core and 16.53 Mh/s at 1200 MHz core.  :) :) :)

Wait... all shares are rejected due to low difficulty.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 12, 2016, 11:18:45 AM
Did you set the memory to the lowest possible clock?
About the low difficulty, try adding this:

--difficulty-multiplier 0.0039062500

EDIT: you should use -k diamond and rename groestlcoin.cl to diamond.cl
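
A quick check of where that multiplier value comes from: 0.0039062500 is exactly 1/256, i.e. 2^-8. This only verifies the arithmetic; how sgminer applies the multiplier internally is not covered in this thread, so nothing about that is assumed here.

```python
from fractions import Fraction

# The value passed to --difficulty-multiplier in this thread, kept as text
# so that it is parsed exactly rather than through binary floating point.
MULTIPLIER_TEXT = "0.0039062500"

multiplier = Fraction(MULTIPLIER_TEXT)

# 1/256 is the same as 2 to the power of -8.
is_one_over_256 = multiplier == Fraction(1, 256)
is_power_of_two = multiplier == Fraction(1, 2 ** 8)

print(multiplier, is_one_over_256, is_power_of_two)
```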


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: FeranD on October 12, 2016, 12:32:25 PM
No, memory runs at 1450 (1st card) and 1250 (2nd card).

Adding the difficulty multiplier just makes the rejected shares appear much faster.
Renaming groestlcoin.cl to diamond.cl and running sgminer-5.4.0 with -k diamond gives the same low hashrate (11 and 12) unless I rebuild the bin file. That works!!! Thank you so much, Pallas. I'm going to leave positive feedback for your endless help.
What sgminer version do you use?


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: pallas on October 12, 2016, 12:38:27 PM
If you set the RAM to the lowest possible clock, which may be limited to core clock - 200 MHz, you will save power and be able to overclock more.
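
As a small worked example of that rule of thumb: the lowest usable memory clock is whichever is higher, the card's own minimum or the core clock minus 200. The 150 MHz default below is the R9 290 figure from the first post and is only an assumption for other cards; the function name is hypothetical.

```python
def lowest_memory_clock(core_mhz, card_min_mhz=150, max_gap_mhz=200):
    """Estimate the lowest memory clock (MHz) that can be set.

    card_min_mhz: lowest memory clock the card accepts at all
                  (150 is the R9 290 example; an assumption elsewhere).
    max_gap_mhz:  how far below the core clock the memory clock may go,
                  for cards where that limit applies.
    """
    return max(card_min_mhz, core_mhz - max_gap_mhz)


# With the limit in force, 1200 MHz core means 1000 MHz memory
# and 1100 MHz core means 900 MHz memory.
for core in (1100, 1200):
    print(core, lowest_memory_clock(core))
```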


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: marada on February 04, 2017, 11:36:12 AM
I am getting 19.3 Mh/s with a 7950 @1100, fglrx 15.201.11.51 (Catalyst 15.9 installer) and the Wolf0 binary on diamond.


Title: Re: [ANN][GRS][DMD] Pallas optimized groestlcoin / diamond etc. opencl kernel
Post by: r8st on April 22, 2017, 10:34:05 PM
I'm getting about 24 MH/s per 7950/7970/280x with the new AMD 17.x.x drivers on Win10 (I got the same on ethos/Ubuntu) and lots of time spent fine-tuning overclocks. To be specific, my two 7950s get about 22 each, and the three 280x get 25, 25, and 26 MH/s.

I would also like to add, since I am not sure it has been mentioned here: my 2nd rig has 3x RX470s, which get around 30 MH/s each with proper adjustments and Wolf's DMD Tahiti .bin. That's pretty good, at least compared to whattomine.com, which reports I should be getting around 20 MH/s for each 470. Do I tell them? :-)