Bitcoin Forum
Author Topic: [ANN][GRS][DMD][DGB] Pallas optimized groestl opencl kernels  (Read 61214 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic.
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
September 12, 2014, 08:00:34 AM
Last edit: October 26, 2018, 07:21:02 AM by pallas
 #1

**** MYRIAD GROESTL ****

If you are looking for the closed source myriad groestl miner (for DGB, SFR, etc.) look here instead:

https://satoshibox.com/fttcfvpiyhbod7ueidmgdhym

ABOUT

This is my optimized OpenCL kernel for Groestlcoin, Diamond and similar coins (groestl + groestl algorithm, not myriad-groestl, which is groestl + sha; see the top of this post for the latter).
It is based on the sph version originally available in sph-sgminer but is now totally rewritten.
It should be compatible with all sph-sgminer versions and derivatives.

PERFORMANCE

v1 - to be compiled with catalyst 14.6 or 14.7:

R9 290x @1125 MHz: ~26.4 Mh/s
R9 290 @1200: ~25 Mh/s
R9 280x (stock): ~18 Mh/s
7950 @1200: ~16 Mh/s
R9 270X: ~9.7 Mh/s

v2 - experimental hawaii only bin:

R9 290x @1125 MHz: ~34.4 Mh/s
R9 290 @1100: ~30.6 Mh/s

Wolf0's Tahiti binary:

R9 280x: ~25 Mh/s
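
For a rough sense of the gain, the figures above can be compared directly. This is a minimal Python sketch that uses only numbers listed in this post. The R9 290 pair is left out because the two runs use different core clocks (1200 vs 1100), and the 280x comparison assumes both figures were taken at comparable clocks, which the post does not state.

```python
# Hashrates in Mh/s, copied from the PERFORMANCE list above.
v1 = {"R9 290x @1125": 26.4, "R9 280x": 18.0}
# v2 Hawaii bin (290x) and Wolf0's Tahiti bin (280x).
faster = {"R9 290x @1125": 34.4, "R9 280x": 25.0}


def speedup_percent(old, new):
    """Relative gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


gains = {card: speedup_percent(v1[card], faster[card]) for card in v1}
for card, gain in gains.items():
    print(f"{card}: +{gain:.1f}%")
```

With the listed figures this works out to roughly +30% for the 290x and +39% for the 280x.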

HOW TO USE

- Stop the miner
- Replace groestlcoin.cl, diamond.cl and/or the kernel you want to use with this one (it's inside the "kernel" folder)
- Remove all the .bin files (in the main folder)
- Set worksize to 256 only (-w 256)
- Run and enjoy!
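
The file steps above (replace the kernel, remove the .bin files) can be sketched as a small Python helper. This is only an illustration under assumptions: `install_kernel` and the `.orig` backup suffix are made up for this example, and the folder layout (a "kernel" subfolder, .bin files in the main folder) is taken from the steps above. Stop the miner before running anything like it.

```python
import shutil
from pathlib import Path


def install_kernel(miner_dir, new_kernel, targets=("groestlcoin.cl", "diamond.cl")):
    """Swap in the optimized kernel and clear cached binaries.

    miner_dir  : miner main folder (assumed to contain a "kernel" subfolder)
    new_kernel : path to the downloaded .cl file
    targets    : kernel file names to overwrite; only existing ones are touched
    """
    miner_dir = Path(miner_dir)
    replaced = []
    for name in targets:
        dest = miner_dir / "kernel" / name
        if not dest.exists():
            continue
        backup = dest.with_name(dest.name + ".orig")
        if not backup.exists():
            # Keep the original kernel once, so the change can be undone.
            shutil.copy2(dest, backup)
        shutil.copy2(new_kernel, dest)
        replaced.append(name)

    # Cached binaries live in the main folder; removing them forces a rebuild.
    removed = []
    for bin_file in miner_dir.glob("*.bin"):
        bin_file.unlink()
        removed.append(bin_file.name)
    return replaced, sorted(removed)
```

After that, start the miner with worksize 256 (-w 256) as described above.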

TWEAKING

Set intensity from 20 to 22. Thread concurrency and all the other parameters are useless.
This kernel doesn't make use of gpu ram, so set the ram clock to THE MINIMUM POSSIBLE VALUE; for example 150 MHz for R9 290.
Now play with the core clock until you find the highest stable value (probably between 1100 and 1200 for the R9 290).
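
As an illustration, the settings above could be collected like this. It is a sketch, not a tested configuration: `tuning_settings` is a made-up helper, and the key names follow the cgminer-style options (`worksize`, `intensity`, `gpu-engine`, `gpu-memclock`) that sgminer forks inherited, so check them against your miner's own documentation. The default clocks are the R9 290 example values from this post.

```python
import json


def tuning_settings(intensity=20, core_clock=1100, mem_clock=150):
    """Build cgminer-style settings for the advice above.

    Defaults come from the post (R9 290 example); worksize is fixed at 256.
    """
    if not 20 <= intensity <= 22:
        raise ValueError("the post recommends intensity between 20 and 22")
    return {
        "worksize": "256",
        "intensity": str(intensity),
        "gpu-engine": str(core_clock),   # core clock in MHz
        "gpu-memclock": str(mem_clock),  # RAM clock in MHz, as low as the card allows
    }


print(json.dumps(tuning_settings(), indent=2))
```

Raise the core clock in small steps from there and back off as soon as the card becomes unstable.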

COMPATIBILITY

Tested and stable on R9 290, 280x and 7950. It should work on any recent AMD gpu, but performance is not guaranteed to be optimal.
It doesn't work with the cryptohunger optimized pool: use the conventional port or another pool. Also, do not replace the optimized kernel of grs-sgminer but the normal one.

TROUBLESHOOTING

Try the following:
- Are you sure you set worksize to 256?
- Replace the generated .bin file with this one (64 bit, r9 280(x) and 290(x) only): LINK EXPIRED (diamondHawaiiw256l8.bin), see below for a newer binary file
- Lower the intensity
- Lower the core speed (are you sure you put the ram clock to the lowest possible value?)
- Since it uses more power, it could be a cooling issue too: check the gpu temperature

DONATIONS

This work took me months of coding and testing and many sleepless nights; please show your appreciation (you are making more money by using it!) by donating to:
BTC: DISABLED

DOWNLOAD

Opensource Kernel (v1):
https://app.box.com/s/9vikvemf7acio3uns7taorodqmod42ej

Experimental Hawaii bin (v2):
https://app.box.com/s/zsr29tfgv4tpxs1q7451dayzaw3wnoee

Wolf0's Tahiti bin (https://bitcointalk.org/index.php?topic=779598.msg11778971#msg11778971):
https://ottrbutt.com/miner/wolf-groestlcoinTahitigw256l4.bin

pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
September 12, 2014, 08:19:21 AM
Last edit: October 07, 2014, 08:03:32 AM by pallas
 #2

A BIT OF HISTORY

The first gpu miner for groestlcoin and similar coins was sph-sgminer by phm. Optimizing the original implementation was trivial (almost 3x the speed could be achieved!), so there are probably tens of optimized versions around, many of which have been kept private: mining groestlcoin and similar coins was always unfair for most people, at least for non-devs.
Hopefully this kernel will end this and should also level the field between amd and nvidia.
I believe my version is faster than many of the other kernels because of the time I dedicated to it and the thousands of tests I did.

FINAL ADVICE

I suggest keeping "good binaries": make a backup of the fastest .bin files you have, so you can recover them in case of driver problems.
This can also get you 1 or 2 percent more hashrate because of compiler variance (try removing the bin and running 3 or 4 times to see the variance in action).
Or use the provided bin file (see the OP) which should be a good one.
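
Keeping good binaries can be automated with a few lines of Python. This is a minimal sketch: `backup_bins`, the backup folder layout and the label format are all assumptions made for the example, not part of the miner.

```python
import shutil
import time
from pathlib import Path


def backup_bins(miner_dir, backup_root, label):
    """Copy every .bin in the miner's main folder into a labelled backup folder.

    label is free text, e.g. the driver version and the hashrate you measured.
    """
    miner_dir, backup_root = Path(miner_dir), Path(backup_root)
    dest = backup_root / f"{time.strftime('%Y%m%d-%H%M%S')}_{label}"
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for bin_file in sorted(miner_dir.glob("*.bin")):
        shutil.copy2(bin_file, dest / bin_file.name)
        copied.append(bin_file.name)
    return dest, copied
```

To restore, stop the miner and copy the saved .bin back into the main folder.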

I've experienced lower power usage with catalyst 14.9 compared to 14.6 beta (a bit less compared to 13 but still better). Speaking of optimization, this should be kept in mind: buy a power meter for your miner(s)!
But it looks like the compiler included in 14.9 drivers produces binaries which run considerably slower than older releases. If you are on 14.9, use the provided bin file instead of the kernel source in .cl format.
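
With a power meter, setups can be compared by hashrate per watt rather than by hashrate alone. A small Python sketch follows; the readings in it are placeholders made up for illustration, NOT measurements from this thread, so substitute your own.

```python
def efficiency(hashrate_mhs, wall_watts):
    """Hashrate per watt (Mh/s per W), using power measured at the wall."""
    if wall_watts <= 0:
        raise ValueError("wall_watts must be positive")
    return hashrate_mhs / wall_watts


# Placeholder readings (hashrate in Mh/s, power in W); replace with your own.
readings = {
    "setup A": (25.0, 300.0),
    "setup B": (24.0, 270.0),
}

best = max(readings, key=lambda name: efficiency(*readings[name]))
for name, (mhs, watts) in readings.items():
    print(f"{name}: {efficiency(mhs, watts) * 1000:.1f} kh/s per W")
print("most efficient:", best)
```

In this made-up case the slightly slower setup comes out ahead, which is the kind of trade-off a driver or binary change can produce.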

cryptonit
Legendary
*
Offline Offline

Activity: 3038
Merit: 1053


bit.diamonds | uNiq.diamonds


View Profile WWW
September 12, 2014, 08:20:05 AM
 #3

Thanks a lot for your effort
to make the best possible AMD-based mining available as open source for
DMD Diamond


 
popshot
Hero Member
*****
Offline Offline

Activity: 774
Merit: 554


CEO Diamond Foundation


View Profile WWW
September 12, 2014, 08:40:05 AM
 #4

Pallas, you are Prometheus, spending your time and skills on creating something useful to a lot of people and, in the end, opening it to everyone interested. Kudos :)

pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
September 12, 2014, 08:49:49 AM
 #5

That's what opensource is about ;-)
I've been a Linux guy for 20 years now and I remember public domain software since the Commodore age (around 1984).

srcxxx
Sr. Member
****
Offline Offline

Activity: 266
Merit: 250


View Profile WWW
September 12, 2014, 08:53:53 AM
 #6

Hi pallas!

Good stuff!

Have you tried other groestl algorithms? Like byte slicing or bit slicing?

Did you also have strange situations where better looking code actually ran slower?

PS: I wish you had opened the code earlier, Groestlcoin even had a bounty for an optimized miner.

Thanks
srcxxx
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
September 12, 2014, 09:00:40 AM
 #7

Hi pallas!

Good stuff!

Have you tried other groestl algorithms? Like byte slicing or bit slicing?

Did you also have strange situations where better looking code actually ran slower?

PS: I wish you had opened the code earlier, Groestlcoin even had a bounty for an optimized miner.

Thanks
srcxxx

I did a lot of optimizations which looked smart but led to slower code. You can't imagine how many. Some of them I thought of while in bed, and they kept me awake until I could try them. Then the disappointment :-D
I believe it's because of the optimizations the compiler does, but most of all because of local memory and cache access.
If you optimize some instructions but end up with less parallelism in memory access, you'll get slower code.
That's typical for groestl because the speed is limited mostly by memory (otherwise it would be as fast as keccak, blake etc.).
I know there was a bounty: my code was unfinished and, at the time, it was the only way I could mine without losing money.

utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
September 12, 2014, 09:04:14 AM
Last edit: September 12, 2014, 09:15:06 AM by utahjohn
 #8

Wow, that's a nice improvement on hashrate :)  Now tuning for stability on my miners ...
Sending a donation your way next block find :)

Testing on HD 7950 and R9 280X and will report my hashrates when I get it stable :)
Both cards run considerably hotter and at 100% fan ...
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
September 12, 2014, 09:07:18 AM
 #9

Wow, that's a nice improvement on hashrate :)  Now tuning for stability on my miners ...
Sending a donation your way next block find :)

Thanks!
Let me know your figures.
I need 280x and 290x hashrates, to put in the op.

srcxxx
Sr. Member
****
Offline Offline

Activity: 266
Merit: 250


View Profile WWW
September 12, 2014, 09:22:19 AM
 #10

Hi pallas!

Good stuff!

Have you tried other groestl algorithms? Like byte slicing or bit slicing?

Did you also have strange situations where better looking code actually ran slower?

PS: I wish you had opened the code earlier, Groestlcoin even had a bounty for an optimized miner.

Thanks
srcxxx

I did a lot of optimizations which looked smart but led to slower code. You can't imagine how many. Some of them I thought of while in bed, and they kept me awake until I could try them. Then the disappointment :-D
I believe it's because of the optimizations the compiler does, but most of all because of local memory and cache access.
If you optimize some instructions but end up with less parallelism in memory access, you'll get slower code.
That's typical for groestl because the speed is limited mostly by memory (otherwise it would be as fast as keccak, blake etc.).
I know there was a bounty: my code was unfinished and, at the time, it was the only way I could mine without losing money.

I know. I actually think that the compiler is not that clever and that's why sometimes worse code runs faster.
Also, I looked at ASM and some stuff there is just plain not optimal. Perhaps it'll be improved in future versions of AMD drivers.

Also, most ASM code only uses .xy from a register. I tried making it work on ulong2 or ulong8 - only slower.

I wish it was possible to write GPU code in assembler...
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
September 12, 2014, 09:33:44 AM
 #11

Again, it's mostly about memory for groestl: optimizing register operations might give an unnoticeable gain, but you may lose on memory access.

utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
September 12, 2014, 09:34:59 AM
 #12

@pallas
What's your DMD donation address? :) Found 2 blocks in like 15 minutes (LUCK!)
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
September 12, 2014, 09:36:25 AM
 #13

@pallas
What's your DMD donation address? :) Found 2 blocks in like 15 minutes (LUCK!)

good!
my DMD address is dVrz69vZFrxJRH9AnKyHim7Hd3PhY3w9NQ

utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
September 12, 2014, 09:39:37 AM
 #14

Sent ya 0.5 DMD for now, will send some more after it runs stable for a day :)
Transaction ID: 37bca0a9872845908b4fc4e223d920b3355b5bbbb54de97a583aee67c7b4605d
utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
September 12, 2014, 10:08:46 AM
 #15

I suppose I'll have to upgrade the RAM on my miner box from 2G to 4G now ... occasional 14.7RC2 driver crashes at I=22
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
September 12, 2014, 10:10:00 AM
 #16

I suppose I'll have to upgrade the RAM on my miner box from 2G to 4G now ... occasional 14.7RC2 driver crashes at I=22

does it work at I=21? there should be little difference in hashrate.

Ivanech
Hero Member
*****
Offline Offline

Activity: 808
Merit: 1014


View Profile
September 12, 2014, 10:10:34 AM
 #17

Has anybody tried with 270X cards? What hashrate should I expect?
utahjohn
Hero Member
*****
Offline Offline

Activity: 630
Merit: 500


View Profile
September 12, 2014, 10:18:09 AM
Last edit: September 12, 2014, 10:32:09 AM by utahjohn
 #18

I suppose I'll have to upgrade the RAM on my miner box from 2G to 4G now ... occasional 14.7RC2 driver crashes at I=22

does it work at I=21? there should be little difference in hashrate.

Actually runs faster at I=21 :)
Have not messed with GPU or MEM clocks, just defaults :)
(Powercolor) 280X  18 Mh/s 67C-68C
(Powercolor) 7950  16 Mh/s 68C-69C
Both cards are volt-modded to lower than stock ...
pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
September 12, 2014, 10:48:25 AM
 #19

I suppose I'll have to upgrade the RAM on my miner box from 2G to 4G now ... occasional 14.7RC2 driver crashes at I=22

does it work at I=21? there should be little difference in hashrate.

Actually runs faster at I=21 :)
Have not messed with GPU or MEM clocks, just defaults :)
(Powercolor) 280X  18 Mh/s 67C-68C
(Powercolor) 7950  16 Mh/s 68C-69C
Both cards are volt-modded to lower than stock ...

good, but if you lower the mem clock you will save power, and get higher maximum core clock as well.

pallas (OP)
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
September 12, 2014, 10:56:02 AM
 #20

added to the op:

IF IN TROUBLE, TRY REPLACING THE GENERATED .BIN FILE WITH THIS ONE: https://dl.dropboxusercontent.com/u/40353042/Diamond/diamondHawaiiw256l8.bin
