Bitcoin Forum
Pages: « 1 [2] 3 »  All
Author Topic: New OpenCL Kernel for Myriad-Groestl (DGB, MYR, etc.)  (Read 37692 times)
MaxDZ8
Hero Member
*****
Offline Offline

Activity: 672
Merit: 500



View Profile
October 23, 2015, 09:27:28 AM
 #21

What is your experience with async block reads?
ghostlander (OP)
Legendary
*
Offline Offline

Activity: 1239
Merit: 1020


No surrender, no retreat, no regret.


View Profile WWW
October 23, 2015, 12:55:35 PM
 #22

Thanks to those who have donated.

instead of using "elif", just use simple independent "if" statements and remove the dup-triplicate instructions.
or, better, make an unrolled loop.
that way it's much more compact and easier to debug.

Umm, these are preprocessor directives. The decision is made at compile time. There is nothing left to unroll.
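To illustrate the compile-time point, here is a minimal plain-C sketch (not the kernel itself). It assumes WORKSIZE is handed to the compiler as a macro, as miners typically do with a -D build option; the default of 128 and the name entries_per_item are made up for the example. The preprocessor keeps or drops each block before compilation, so no branch and no loop survives into the binary.

```c
#include <assert.h>

/* Assumption: WORKSIZE normally arrives as a build option such as
   -D WORKSIZE=128; this default exists only so the sketch compiles alone. */
#ifndef WORKSIZE
#define WORKSIZE 128
#endif

/* Checked at compile time as well: only these sizes make sense below. */
static_assert(WORKSIZE == 64 || WORKSIZE == 128 || WORKSIZE == 256,
              "WORKSIZE must be 64, 128 or 256");

/* Hypothetical helper: how many entries of a 256-entry table each
   work-item copies. Which lines exist is decided by the preprocessor. */
static int entries_per_item(void)
{
    int n = 1;      /* entry lclid, always copied */
#if (WORKSIZE < 256)
    n += 1;         /* entry lclid + 128 */
#endif
#if (WORKSIZE < 128)
    n += 2;         /* entries lclid + 64 and lclid + 192 */
#endif
    return n;
}
```

With independent #if blocks instead of an #elif chain, each extra group of copies appears in the source only once, and the compiled result for a given WORKSIZE should be the same either way.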

What is your experience with async block reads?

Aren't they async by default in SGminer?

"If you've got a problem and have to spread some coins to make it go away, you've got no problem. You've got an expence." ~ Phoenixcoin (PXC) and Orbitcoin (ORB) and Halcyon (HAL)
pallas
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
October 23, 2015, 01:17:41 PM
 #23

Thanks to those who have donated.

instead of using "elif", just use simple independent "if" statements and remove the dup-triplicate instructions.
or, better, make an unrolled loop.
that way it's much more compact and easier to debug.

Umm, these are preprocessor directives. The decision is made at compile time. There is nothing left to unroll.

do you see the repeated instructions?
just change the "if" structure and you can remove them ;-)
i.e. if worksize <= 128 you need to do some additional rotates compared to default (256), some additional others if worksize == 64....
although the alternative for loop is a much more elegant solution and the difference in speed is negligible.

ghostlander (OP)
Legendary
*
Offline Offline

Activity: 1239
Merit: 1020


No surrender, no retreat, no regret.


View Profile WWW
October 23, 2015, 01:42:40 PM
 #24

Thanks to those who have donated.

instead of using "elif", just use simple independent "if" statements and remove the dup-triplicate instructions.
or, better, make an unrolled loop.
that way it's much more compact and easier to debug.

Umm, these are preprocessor directives. The decision is made at compile time. There is nothing left to unroll.

do you see the repeated instructions?
just change the "if" structure and you can remove them ;-)
i.e. if worksize <= 128 you need to do some additional rotates compared to default (256), some additional others if worksize == 64....
although the alternative for loop is a much more elegant solution and the difference in speed is negligible.

What you suggest results in fewer linear memory writes, which usually isn't good. I prefer to avoid loops if possible.

Code:
    T0_L[lclid] = T0[lclid];
    T1_L[lclid] = rotate(T0[lclid], 8UL);
    T2_L[lclid] = rotate(T0[lclid], 16UL);
    T3_L[lclid] = rotate(T0[lclid], 24UL);
    T4_L[lclid] = rotate(T0[lclid], 32UL);
    T5_L[lclid] = rotate(T0[lclid], 40UL);
    T6_L[lclid] = rotate(T0[lclid], 48UL);
    T7_L[lclid] = rotate(T0[lclid], 56UL);
#if (WORKSIZE < 256)
    T0_L[lclid + 128] = T0[lclid + 128];
    T1_L[lclid + 128] = rotate(T0[lclid + 128], 8UL);
    T2_L[lclid + 128] = rotate(T0[lclid + 128], 16UL);
    T3_L[lclid + 128] = rotate(T0[lclid + 128], 24UL);
    T4_L[lclid + 128] = rotate(T0[lclid + 128], 32UL);
    T5_L[lclid + 128] = rotate(T0[lclid + 128], 40UL);
    T6_L[lclid + 128] = rotate(T0[lclid + 128], 48UL);
    T7_L[lclid + 128] = rotate(T0[lclid + 128], 56UL);
#endif
#if (WORKSIZE < 128)
    T0_L[lclid + 64] = T0[lclid + 64];
    T0_L[lclid + 192] = T0[lclid + 192];
    T1_L[lclid + 64] = rotate(T0[lclid + 64], 8UL);
    T1_L[lclid + 192] = rotate(T0[lclid + 192], 8UL);
    T2_L[lclid + 64] = rotate(T0[lclid + 64], 16UL);
    T2_L[lclid + 192] = rotate(T0[lclid + 192], 16UL);
    T3_L[lclid + 64] = rotate(T0[lclid + 64], 24UL);
    T3_L[lclid + 192] = rotate(T0[lclid + 192], 24UL);
    T4_L[lclid + 64] = rotate(T0[lclid + 64], 32UL);
    T4_L[lclid + 192] = rotate(T0[lclid + 192], 32UL);
    T5_L[lclid + 64] = rotate(T0[lclid + 64], 40UL);
    T5_L[lclid + 192] = rotate(T0[lclid + 192], 40UL);
    T6_L[lclid + 64] = rotate(T0[lclid + 64], 48UL);
    T6_L[lclid + 192] = rotate(T0[lclid + 192], 48UL);
    T7_L[lclid + 64] = rotate(T0[lclid + 64], 56UL);
    T7_L[lclid + 192] = rotate(T0[lclid + 192], 56UL);
#endif
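A quick way to convince yourself that the conditional blocks above fill every table entry for each work-group size is to model them on the host. The sketch below is plain C, not OpenCL: rotl64 stands in for rotate(), the function names are hypothetical, and a runtime if replaces the #if only because the model tries all three sizes in one program.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate left; stands in for OpenCL's rotate() on 64-bit values. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Models one work-item (local id lclid) executing the copy code above. */
static void copy_item(const uint64_t *T0, uint64_t TL[8][256],
                      int filled[256], unsigned lclid, unsigned worksize)
{
    unsigned offs[4], n = 0, k, t;

    offs[n++] = 0;                                  /* unconditional block */
    if (worksize < 256) offs[n++] = 128;            /* #if (WORKSIZE < 256) */
    if (worksize < 128) { offs[n++] = 64; offs[n++] = 192; } /* < 128 */

    for (k = 0; k < n; k++) {
        unsigned idx = lclid + offs[k];
        for (t = 0; t < 8; t++)
            TL[t][idx] = rotl64(T0[idx], 8 * t);    /* T0_L .. T7_L */
        filled[idx]++;
    }
}

/* Returns 1 if each of the 256 entries is written exactly once. */
static int covers_table(unsigned worksize)
{
    static uint64_t T0[256], TL[8][256];
    int filled[256];
    unsigned i;

    assert(worksize == 64 || worksize == 128 || worksize == 256);
    memset(filled, 0, sizeof filled);
    for (i = 0; i < 256; i++)
        T0[i] = 0x0123456789ABCDEFULL * (i + 1);    /* arbitrary test data */
    for (i = 0; i < worksize; i++)
        copy_item(T0, TL, filled, i, worksize);
    for (i = 0; i < 256; i++)
        if (filled[i] != 1 || TL[0][i] != T0[i])
            return 0;
    return 1;
}
```

Calling covers_table(64), covers_table(128) and covers_table(256) should each return 1: the offsets 0, 128, 64 and 192 partition the 256 entries without overlap for the three sizes.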

"If you've got a problem and have to spread some coins to make it go away, you've got no problem. You've got an expence." ~ Phoenixcoin (PXC) and Orbitcoin (ORB) and Halcyon (HAL)
MaxDZ8
Hero Member
*****
Offline Offline

Activity: 672
Merit: 500



View Profile
October 23, 2015, 03:50:00 PM
 #25

Aren't they async by default in SGminer?
I mean async_work_group_copy & friends. I have very mixed feelings about them: on my hardware they just don't perform well, not even in terms of bandwidth usage. Maybe they work better on GCN 1.1, IDK.
BTW, I would suggest to put a barrier(CLK_LOCAL_MEM_FENCE) right after the T-tables load. Besides, just have it looped!
Code:
for(ulong i = 0; i < 8; ++i) {
    local ulong *tdst = T_local + 256 * i; // not even really required if you alloc your local T tables in block
    for(uint el = 0; el < 256; el += get_local_size(0)) {
        // the T tables hold 64-bit words, so source and destination are ulong
        tdst[get_local_id(0)] = rotate(T_global[el + get_local_id(0)], i * 8);
        tdst += get_local_size(0);
    }
}
That's more or less what pallas suggests. It is a fully coherent read. LDS has full scatter/gather capability, as long as you end in a different bank you're safe (here, it happens by construction).
Alternatively the first block can be loaded by async_work_group_copy, and you can derive T1_L from T0_L, LDS should provide you a massive latency reduction by not having to round-trip to the (hot) L1 cache.
Loops such as this are fully unrolled in most cases.
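For anyone who wants to check the two variants described above without a GPU, here is a rough host-side model in plain C. It is only a sketch: rotl64 stands in for rotate(), a plain array copy stands in for the block load of T0, and all function names are hypothetical. It compares the strided loop with deriving each table from the previous one by a further 8-bit rotate.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left; stands in for OpenCL's rotate() on 64-bit values. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Looped load as one work-item sees it: stride through each table. */
static void load_looped(const uint64_t *t_global, uint64_t *t_local,
                        unsigned lclid, unsigned worksize)
{
    unsigned i, el;

    assert(worksize > 0 && 256 % worksize == 0 && lclid < worksize);
    for (i = 0; i < 8; i++) {
        uint64_t *tdst = t_local + 256 * i;     /* start of table i */
        for (el = 0; el < 256; el += worksize)
            tdst[el + lclid] = rotl64(t_global[el + lclid], 8 * i);
    }
}

/* Alternative: table 0 is already in local memory; build table i from
   table i-1. Each work-item only reads entries it wrote itself. */
static void derive_from_t0(uint64_t *t_local, unsigned lclid,
                           unsigned worksize)
{
    unsigned i, el;

    for (i = 1; i < 8; i++)
        for (el = 0; el < 256; el += worksize)
            t_local[256 * i + el + lclid] =
                rotl64(t_local[256 * (i - 1) + el + lclid], 8);
}

/* Returns 1 if both variants produce identical tables. */
static int variants_agree(unsigned worksize)
{
    static uint64_t src[256], a[8 * 256], b[8 * 256];
    unsigned i, id;

    for (i = 0; i < 256; i++)
        src[i] = 0x9E3779B97F4A7C15ULL * (i + 1);   /* arbitrary test data */
    for (id = 0; id < worksize; id++)
        load_looped(src, a, id, worksize);
    for (i = 0; i < 256; i++)
        b[i] = src[i];          /* stands in for the block copy of T0 */
    for (id = 0; id < worksize; id++)
        derive_from_t0(b, id, worksize);
    for (i = 0; i < 8 * 256; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

variants_agree(64), variants_agree(128) and variants_agree(256) should all return 1. On a real device the derived variant would still have to wait for the T0 block copy to finish before reading it, and whether it is actually faster is something to measure, not assume.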
drr0ss
Member
**
Offline Offline

Activity: 98
Merit: 10


View Profile
October 23, 2015, 09:18:34 PM
 #26

instead of using "elif", just use simple independent "if" statements and remove the dup-triplicate instructions.
or, better, make an unrolled loop.
that way it's much more compact and easier to debug.

Hi Pallas, can you share your .cl with me? I will send you some feeds.....
drr0ss
Member
**
Offline Offline

Activity: 98
Merit: 10


View Profile
October 23, 2015, 09:31:51 PM
 #27

Aren't they async by default in SGminer?
I mean as of async_work_group_copy & friends. I have very mixed feelings on them, on my hardware, they just don't perform well not even in terms of bandwidth usage. Maybe on GCN1.1 they work better, IDK.
BTW, I would suggest to put a barrier(CLK_LOCAL_MEM_FENCE) right after the T-tables load. Besides, just have it looped!
Code:
for(ulong i = 0; i < 8; ++i) {
    local ulong *tdst = T_local + 256 * i; // not even really required if you alloc your local T tables in block
    for(uint el = 0; el < 256; el += get_local_size(0)) {
        // the T tables hold 64-bit words, so source and destination are ulong
        tdst[get_local_id(0)] = rotate(T_global[el + get_local_id(0)], i * 8);
        tdst += get_local_size(0);
    }
}
That's more or less what pallas suggests. It is a fully coherent read. LDS has full scatter/gather capability, as long as you end in a different bank you're safe (here, it happens by construction).
Alternatively the first block can be loaded by async_work_group_copy, and you can derive T1_L from T0_L, LDS should provide you a massive latency reduction by not having to round-trip to the (hot) L1 cache.
Loops such as this are fully unrolled in most cases.

Oh, how you force an old man like me to retype the code and think :)
Never mind, and cheers ;)
sp_
Legendary
*
Offline Offline

Activity: 2912
Merit: 1087

Team Black developer


View Profile
October 23, 2015, 10:30:40 PM
Last edit: November 17, 2015, 08:12:17 PM by sp_
 #28

Pallas is pretty good.

He bought an NVIDIA card and improved Neoscrypt by 10% in a couple of weeks.

CUDA, foreign language foreign technology...

respect

Team Black Miner (ETHB3 ETH ETC VTC KAWPOW FIROPOW MEOWPOW + dual mining + tripple mining.. https://github.com/sp-hash/TeamBlackMiner
proctologic
Sr. Member
****
Offline Offline

Activity: 241
Merit: 250


View Profile WWW
November 17, 2015, 08:10:57 PM
 #29

Which coin do you mine with Myriad-Groestl?
There are:
- Saffroncoin
- Digibyte
- Myriadcoin


Trinitycoin

proctologic
Sr. Member
****
Offline Offline

Activity: 241
Merit: 250


View Profile WWW
November 29, 2015, 01:31:11 AM
 #30



I don't understand why you'd be using a 290 to mine Myriad-Groestl on linux?  Wrong algo.
huh ?
I don't understand why you are posting that ? wrong or random answer...

AFAIK, Myriad-Groestl is only used by DGB and MYR.  And if you're gonna mine either of those coins with a 290 (I have half a dozen 290s), Skein is by far the better algo for that particular gpu.  See https://bitcointalk.org/index.php?topic=1186670 for more details.




Trinity to http://coinspool.cu.cc/workers_trinity

https://bitcointalk.org/index.php?topic=1186025.0

Koltan
Newbie
*
Offline Offline

Activity: 57
Merit: 0


View Profile
May 04, 2016, 08:14:02 PM
 #31

Radeon HD7790 at 1200/1600 mining MYR got 17.5 MH/s on this kernel. It's two times faster than the original :o
Tmdz
Hero Member
*****
Offline Offline

Activity: 1008
Merit: 1000


View Profile
May 04, 2016, 09:10:30 PM
Last edit: May 05, 2016, 12:01:29 AM by Tmdz
 #32

Nice work, the 7950 went from 7 MH/s to 27 MH/s, but I think mining Skein will still earn you more with DGB.

On the technical side, that kind of efficiency improvement is simply amazing. :o
navydude
Sr. Member
****
Offline Offline

Activity: 283
Merit: 250


View Profile
May 18, 2017, 08:30:52 PM
 #33

Wondering if someone would compile this for windows. Would be much appreciated!

prichina
Newbie
*
Offline Offline

Activity: 82
Merit: 0


View Profile
May 23, 2017, 07:56:44 PM
 #34

The 7970 is doing fine with sgminer 5.1.1 at 35 MH/s, but my R9 290X only gets 40-42 MH/s, which is very low :/ Can someone help? Please send me a kernel or bin so I can make it work. I've tried ghostlander's myriad-groestl.cl; it is still 35 MH/s on the 7970 and 40-42 MH/s on the R9 290X :/ If someone needs help with video editing, I can make it happen; private message me. Kind regards, Ivo Icevski
ghostlander (OP)
Legendary
*
Offline Offline

Activity: 1239
Merit: 1020


No surrender, no retreat, no regret.


View Profile WWW
May 24, 2017, 03:20:55 PM
 #35

I don't work on this kernel any more. 500k MYR in donations was all that I received, which is hardly worth the effort. If anyone wants to continue, feel free.

"If you've got a problem and have to spread some coins to make it go away, you've got no problem. You've got an expence." ~ Phoenixcoin (PXC) and Orbitcoin (ORB) and Halcyon (HAL)
JetstoBrazil
Newbie
*
Offline Offline

Activity: 31
Merit: 0


View Profile
June 04, 2017, 05:31:44 AM
 #36

Hi, I have a very stupid question and can't find the answer...

How do I install this kernel? Should I save the code in a text editor? And then how do I tell sgminer to use this kernel? In my sgminer folder there is a subfolder with a lot of kernels (kernel files *.cl).

I can't find any instructions for installing this kernel.

Thank you for your answer

Open the miner folder, open the kernel folder in there, and find the myriad-groestl kernel. Right-click it, select "Open with WordPad", delete everything in there, then copy and paste the code from the first post into it. Save and close.
coinmania
Member
**
Offline Offline

Activity: 97
Merit: 10


View Profile
June 09, 2017, 09:12:36 AM
 #37

Hey There,

Just a question: the DigiByte Core wallet has now been running for one day and is telling me that synchronizing with the network will take about 10 years?
This is not normal, right?

"Amateurs sit and wait for inspiration, the rest of us get up and go to work" Stephen King

Just sayin: 158xW3o63zdGe6wCQH5edyrjm5RFzTqghn
pallas
Legendary
*
Offline Offline

Activity: 2716
Merit: 1094


Black Belt Developer


View Profile
June 09, 2017, 09:27:40 AM
 #38

Hey There,

Just a question: the DigiByte Core wallet has now been running for one day and is telling me that synchronizing with the network will take about 10 years?
This is not normal, right?

And it is not on topic either, you should ask in the digibyte thread.

coinmania
Member
**
Offline Offline

Activity: 97
Merit: 10


View Profile
June 09, 2017, 11:17:14 AM
 #39

Hey There,

Just a question: the DigiByte Core wallet has now been running for one day and is telling me that synchronizing with the network will take about 10 years?
This is not normal, right?

And it is not on topic either, you should ask in the digibyte thread.

Yeah, you are right. Never mind, I found the solution.

"Amateurs sit and wait for inspiration, the rest of us get up and go to work" Stephen King

Just sayin: 158xW3o63zdGe6wCQH5edyrjm5RFzTqghn
Harry5555
Member
**
Offline Offline

Activity: 81
Merit: 10


View Profile
July 21, 2017, 02:22:44 AM
Last edit: July 21, 2017, 12:44:50 PM by Harry5555
 #40

Could anybody help me configure this miner? Each time I configure it, it prefers to mine on the Intel HD Graphics rather than my RX GPU...


Any suggestions?


Edit: I found a workaround by using --gpu-platform 1