Bitcoin Forum
October 24, 2017, 06:11:25 AM *
Pages: « 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 [17] 18 19 20 21 22 »  All
Author Topic: [ANN][GRS][DMD][DGB] Pallas optimized groestl opencl kernels  (Read 59284 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic.
smolen
Hero Member
*****
Activity: 525
April 08, 2015, 08:04:49 PM
 #321

Good work on the Groestl. Smolen's Quark miner does around 2 Mhash on the 280x.
My GTX 980 does 20 Mhash. The competition is sleeping...
Some competitors are awake, doing exercises with pen and paper to get all the AES wannabes at once Cheesy Doing it all by hand, algo by algo, would just be boring.


15 years ago I worked for a company in Silicon Valley. My colleagues earned xxx.xxx$ a year, but I was a student at San Francisco State U.
I've also lived and worked in St. Petersburg, Russia. My colleagues are some of the best programmers in the world.
Triangulated

Some of your last tips (and smolen's) can be applied to this kernel as well, I think it can reach 38/40 Mh/s ;-)

Here's the second-to-last trick in my WhirlpoolX kernel. I'm going to abandon the table approach anyway, so there's not much sense in keeping it secret.
Code:
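// 256-entry lookup table, entries pre-rotated left by 3 bits (contents elided in the post)
static const CONSTANT UINT64 arrPrecalc_post_l27[256] = ...
// Base address of the table as a 32-bit integer (assumes 32-bit constant-space pointers)
#define baseL27 ((UINT32)&arrPrecalc_post_l27[0])
// Load a UINT64 from an absolute byte address in constant space (UINT8 is a byte here)
#define TC0off8_l27(off8) (*(const CONSTANT UINT64*)&(((const CONSTANT UINT8*)0)[off8]))
// bitselect() splices bits 3..10 of (v >> 24) into the base address; only valid if those base bits are zero
#define LUT3_r3(v) ASX64(TC0off8_l27(bitselect(baseL27, (UINT32)(as_ulong(v) >> 24), 0x7F8U)))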

Of course I gave you bad advice. Good one is way out of your price range.
smolen
Hero Member
*****
Activity: 525
April 08, 2015, 08:19:05 PM
 #322

I was wondering if we (miner developers) should unite to get the most out of it.
A cartel would take all the fun out of the game and possibly destroy the PoW world. On the other hand, the PoS landscape could benefit from some polishing Smiley

Wolf0
Legendary
*
Activity: 1722
Miner Developer
April 08, 2015, 08:23:55 PM
 #323

I was wondering if we (miner developers) should unite to get the most out of it.
A cartel would take all the fun out of the game and possibly destroy the PoW world. On the other hand, the PoS landscape could benefit from some polishing Smiley

Haha, too true. Also, just going through this for myself, here:
Code:
static const __constant ulong arrPrecalc_post_l27[256] = ...
#define baseL27 ((uint)&arrPrecalc_post_l27[0])
#define TC0off8_l27(off8) (*(const __constant ulong *)&(((const __constant uint8 *)0)[off8]))
#define LUT3_r3(v) as_ulong(TC0off8_l27(bitselect(baseL27, (uint)(as_ulong(v) >> 24), 0x7F8U)))

That's about as far as my parse got before I went, "Is that a fucking NULL pointer dereference?"

Code:
Donations: BTC: 1WoLFdwcfNEg64fTYsX1P25KUzzSjtEZC -- XMR: 45SLUTzk7UXYHmzJ7bFN6FPfzTusdUVAZjPRgmEDw7G3SeimWM2kCdnDQXwDBYGUWaBtZNgjYtEYA22aMQT4t8KfU3vHLHG
smolen
Hero Member
*****
Activity: 525
April 08, 2015, 08:27:08 PM
 #324

That's about as far as my parse got before I went, "Is that a fucking NULL pointer dereference?"
Yes Smiley The indexed address is calculated in bitselect. LUT0 and LUT4 indexing is just a single AND operation.
EDIT: Oh, wait, UINT8 is a byte, not an int vector. I probably went too far redefining every type Smiley
The X64/ASX64 macros keep the code debuggable on the CPU; MSVC is just too handy.
Code:
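#ifdef __OPENCL_VERSION__
// GPU build: 64-bit values are handled as uint2 pairs
#define X64 uint2
#define ASX64(v) (as_uint2(v))
#else
// CPU build (e.g. MSVC): plain 64-bit integers, reinterpretation is a no-op
#define X64 UINT64
#define ASX64(v) (v)
#endif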

Wolf0
Legendary
*
Activity: 1722
Miner Developer
April 08, 2015, 08:46:11 PM
 #325

That's about as far as my parse got before I went, "Is that a fucking NULL pointer dereference?"
Yes Smiley The indexed address is calculated in bitselect. LUT0 and LUT4 indexing is just a single AND operation.
EDIT: Oh, wait, UINT8 is a byte, not an int vector. I probably went too far redefining every type Smiley

Okay... I'm guessing that you've removed bits from the tables and are regenerating them on the fly, but I can't quite figure out how. Then again, bitwise ops aren't really my best subject...

smolen
Hero Member
*****
Activity: 525
April 08, 2015, 09:21:19 PM
 #326

That's about as far as my parse got before I went, "Is that a fucking NULL pointer dereference?"
Yes Smiley The indexed address is calculated in bitselect. LUT0 and LUT4 indexing is just a single AND operation.
EDIT: Oh, wait, UINT8 is a byte, not an int vector. I probably went too far redefining every type Smiley

Okay... I'm guessing that you've removed bits from the tables and are regenerating them on the fly, but I can't quite figure out how. Then again, bitwise ops aren't really my best subject...
The tables are constant, just pre-rotated left by 3 bits (3 because one uint2 is 8 = 2^3 bytes, so a byte value shifted left by 3 is already a byte offset into the table). Well, this stuff needs comments if the kernel gets published. The money is in X11 and Monero, and there is not much value in the Whirlpool code; I could just drop it somewhere, but it would give everyone a free boost in X11 Sad

pallas
Legendary
*
Activity: 1442
Black Belt Developer
April 08, 2015, 09:43:52 PM
 #327

That's about as far as my parse got before I went, "Is that a fucking NULL pointer dereference?"
Yes Smiley The indexed address is calculated in bitselect. LUT0 and LUT4 indexing is just a single AND operation.
EDIT: Oh, wait, UINT8 is a byte, not an int vector. I probably went too far redefining every type Smiley

Okay... I'm guessing that you've removed bits from the tables and are regenerating them on the fly, but I can't quite figure out how. Then again, bitwise ops aren't really my best subject...
The tables are constant, just pre-rotated left by 3 bits (3 because one uint2 is 8 = 2^3 bytes, so a byte value shifted left by 3 is already a byte offset into the table). Well, this stuff needs comments if the kernel gets published. The money is in X11 and Monero, and there is not much value in the Whirlpool code; I could just drop it somewhere, but it would give everyone a free boost in X11 Sad

Maybe not: people are using wolf0's precompiled X11 binaries, and just adding your trick to the stock kernels won't come close to them speed-wise.

Wolf0
Legendary
*
Activity: 1722
Miner Developer
April 08, 2015, 09:57:25 PM
 #328

That's about as far as my parse got before I went, "Is that a fucking NULL pointer dereference?"
Yes Smiley The indexed address is calculated in bitselect. LUT0 and LUT4 indexing is just a single AND operation.
EDIT: Oh, wait, UINT8 is a byte, not an int vector. I probably went too far redefining every type Smiley

Okay... I'm guessing that you've removed bits from the tables and are regenerating them on the fly, but I can't quite figure out how. Then again, bitwise ops aren't really my best subject...
The tables are constant, just pre-rotated left by 3 bits (3 because one uint2 is 8 = 2^3 bytes, so a byte value shifted left by 3 is already a byte offset into the table). Well, this stuff needs comments if the kernel gets published. The money is in X11 and Monero, and there is not much value in the Whirlpool code; I could just drop it somewhere, but it would give everyone a free boost in X11 Sad

Whirlpool's not even in X11 - might help a bit with Groestl, though.

utahjohn
Hero Member
*****
Activity: 630
April 13, 2015, 10:44:28 AM
 #329

The need for an improved Groestl kernel is now immediate... please do what you can... I am just a C/C++ coder and am not really into multithreaded GPU coding...
sp_
Legendary
*
Activity: 1190
Ccminer developer
April 13, 2015, 11:38:45 AM
 #330

Pallas, can you rewrite this Groestl-256 implementation as Groestl-512 and add it to sgminer (X11, X13, X15)?

sp_
Legendary
*
Activity: 1190
Ccminer developer
April 13, 2015, 01:53:50 PM
 #331

Some competitors are awake, doing exercises with pen and paper to get all the AES wannabes at once Cheesy

Wolf0 claims to know AES inside out, backwards and forwards. Me too.

The answer is SEA :-)

Wolf0
Legendary
*
Activity: 1722
Miner Developer
April 13, 2015, 09:10:08 PM
 #332

Some competitors are awake, doing exercises with pen and paper to get all the AES wannabes at once Cheesy

Wolf0 claims to know AES inside out, backwards and forwards. Me too.

The answer is SEA :-)

Dunno what SEA means, but for AMD you can do it classic, you can do it table-based, or you can do it classic with a twist: convert to bitslice form, do the S-box, and convert right back, doing the rest of the ops classic-style with some optimization tricks.

I would not advise doing it the way Christian did for Nvidia, given the lack of shfl() on AMD.

smolen
Hero Member
*****
Activity: 525
April 14, 2015, 08:59:49 AM
 #333

Wolf0 claims to know AES inside out, backwards and forwards. Me too.

The answer is SEA :-)
Yes, that makes the game damn addictive.
Look, you told us about wide tables, a great idea, but skipping the S-boxing with them takes going a couple of inches deeper inside AES Smiley

pallas
Legendary
*
Activity: 1442
Black Belt Developer
April 14, 2015, 02:03:28 PM
 #334

Pallas, can you rewrite this Groestl-256 implementation as Groestl-512 and add it to sgminer (X11, X13, X15)?

Sorry for the delay.
That would be nice, but everybody's using wolf0's binaries, so why bother? It would make sense if there were a plan to open-source optimized versions of most of the algos.

Wolf0
Legendary
*
Activity: 1722
Miner Developer
April 14, 2015, 02:44:33 PM
 #335

Pallas, can you rewrite this Groestl-256 implementation as Groestl-512 and add it to sgminer (X11, X13, X15)?

Sorry for the delay.
That would be nice, but everybody's using wolf0's binaries, so why bother? It would make sense if there were a plan to open-source optimized versions of most of the algos.

I'm guessing sp_ wants you to do the work for him Tongue

utahjohn
Hero Member
*****
Activity: 630
April 16, 2015, 05:33:39 AM
 #336

@wolf0, do you have anything better than the NeoScrypt kernel you leaked on the Feathercoin thread? I am getting 278 kH/s on a 7950 and 295 on a 280x.
Wolf0
Legendary
*
Activity: 1722
Miner Developer
April 16, 2015, 05:49:22 AM
 #337

@wolf0, do you have anything better than the NeoScrypt kernel you leaked on the Feathercoin thread? I am getting 278 kH/s on a 7950 and 295 on a 280x.

I didn't leak that, I released it. Checking my records...

EDIT: Okay, the most recent record of NeoScrypt I have is from 12/23/2014 (NSFW): https://ottrbutt.com/miner/neoscryptwolf-12232014.png

utahjohn
Hero Member
*****
Activity: 630
April 16, 2015, 03:03:22 PM
 #338

@wolf0, do you have anything better than the NeoScrypt kernel you leaked on the Feathercoin thread? I am getting 278 kH/s on a 7950 and 295 on a 280x.

I didn't leak that, I released it. Checking my records...

EDIT: Okay, the most recent record of NeoScrypt I have is from 12/23/2014 (NSFW): https://ottrbutt.com/miner/neoscryptwolf-12232014.png
It goes without saying, but I'll say it anyway: I appreciate your work. I have no conception of wavefronts and such; I have tried, but I'm just too old to embrace new concepts. If you have something better for me, please do put it on Mega Smiley The same goes for Groestl, Pallas Smiley You are my heroes Smiley
And realhet, who understands AMD GPU coding better than all of us Smiley realhet's HetPas assembly kernel is still the best for the 280x and other Tahiti cards AFAIK Smiley
pallas
Legendary
*
Activity: 1442
Black Belt Developer
April 16, 2015, 03:27:14 PM
 #339

Just wanted to say I've tried applying some of the tricks I learnt working on WhirlpoolX to the Groestl kernel, but it's not so simple.
This kernel is much bigger, so you can't just copy a few good lines of code and expect it to run faster. Furthermore, some of the optimizations I made in the past make it more time-consuming to apply some apparently simple hacks. Wolf0, I'm sure you know what I mean ;-)
Still, there is room for improvement, and I have some ideas, but the question is: when the profit is gone and the fun is gone, is it still worth it?

utahjohn
Hero Member
*****
Activity: 630
April 16, 2015, 03:32:50 PM
 #340

Just wanted to say I've tried applying some of the tricks I learnt working on WhirlpoolX to the Groestl kernel, but it's not so simple.
This kernel is much bigger, so you can't just copy a few good lines of code and expect it to run faster. Furthermore, some of the optimizations I made in the past make it more time-consuming to apply some apparently simple hacks. Wolf0, I'm sure you know what I mean ;-)
Still, there is room for improvement, and I have some ideas, but the question is: when the profit is gone and the fun is gone, is it still worth it?

I expect DMD to drop to a low-teens difficulty after a week or so Smiley If it does not, mining is dead LOL. I have a direct interest in this as a partner in donkypool... 12 miners, up from 6 a few weeks ago... I am currently mining NeoScrypt for sale on westhash lol, selling at p=4.8 Smiley anything less goes to yaamp...