pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
|
January 15, 2016, 10:47:39 AM |
|
That shows the importance of SIMD. Now you know where to work ;-) Thanks for the information; please keep us updated on the progress, and tell me if I can be of any help.
I don't know - I know one thing for certain now, though: Kachur's Blake was about the same speed as mine; his BMW, however, needed a bit of work. Mine improved the overall X11 hash by 1.35% (remember that BMW itself is quite a small part of X11, so the raw improvement of my BMW over his is much larger.) EDIT: search2, originally Groestl-512, did not take to a simple kernel replacement and will have to be investigated further (manual study of the disassembly.) Skipping for now. EDIT2: search2 may not have broken because of a difference in output, but because of where the bloody constants sit in global. For JH, I'm going to make an all-new test kernel which takes a read-only buffer for JH's constants rather than trying to reference constant memory right now. Simpler. I should then be able to put that in place of the Kachur JH and modify SGMiner to pass a constant buffer to that kernel.
Is search2 faster than yours, or is it just SIMD? Maybe Kachur has found a way to make AES-like algos better... BTW, I wouldn't mind a frankenbin if it's faster and stable ;-)
I can't tell - without a straight-up replacement of a kernel, I dunno if he's done some kind of fuckery with part of a hash in one kernel and part in another, for example. What I suspect is that SIMD has been cut into two parts (at least.) Now, even if his Groestl is faster than mine, my current Groestl is outdated anyway. My R&D area has a bitsliced Groestl that I have not yet played with much - parallelization using 4 work-items, like it's done in CUDA, should be possible. I can drop to GCN ASM for ds_swizzle_b32 - it limits me to a 4-way, as it's not a 32-way shuffle like CUDA's, but that's enough for me. I've just got a lot to do atm.
Maybe there is something we could work on together... a Groestl, perhaps?
If you could look at the code and see if you could split it over multiple work-items and use LDS for the data sharing, I could probably remove said LDS usage by disassembling and modifying the kernel before reassembling it.
SIMD: tonight I was thinking about it, and slicing it into two parts is the natural way of doing it; I think I could try that. The only little annoyance is that the data to be passed between the (at least two) parts won't just be a hash but a bigger set of data, so the standard sgminer searchX() system wouldn't work.
GROESTL (and similar): I always had the idea that nvidia had to do the bitslice thing because its shared memory was slower than GCN's; in fact, nvidia bitsliced is only on par with GCN LDS. As a logical consequence, if bitslice on GCN is presumed to be slower than on nvidia, I wouldn't even try it.
|
|
|
|
24core
|
|
January 15, 2016, 11:00:48 AM |
|
But with Xintensity at 1024 you get invalid shares, so a lower hashrate on the pool. 256 or 512 is the highest I can use.
Default was 640. I used this w/o a problem )))
Can you please share your config for the 7950? Also slightly confused about the Kachur miner, as the NiceHash download is 5.2.1 - is there another link?
https://github.com/nicehash/NiceHashMiner/releases - use sgminer-5-1-0-optimized
setx GPU_MAX_ALLOC_PERCENT 100
sgminer.exe --algorithm darkcoin-mod -o stratum+tcp://x11.eu.nicehash.com:3336 -u 1M948TedPdVkbk59TMnYJhtW5BFTXKPfFL -p d=0.04 -d 0 --xintensity 640 -g 1 -w 64 --gpu-memclock 1500 --gpu-engine 1100 -s 0 --expiry 10 --queue 0 --gpu-powertune 10 --keccak-unroll 0 --hamsi-expand-big 4 --gpu-fan 30-95 --temp-cutoff 95 --temp-overheat 90 --temp-target 75 --auto-fan
I tried those settings and many others, but I get a very low hash rate and my CPU is going mental. I only have a dual-core Intel G3220 @ 3 GHz - does the optimised miner require extra CPU, and hence does my mining rig need a CPU upgrade?
|
|
|
|
z0n0
Legendary
Offline
Activity: 1274
Merit: 1006
|
|
January 15, 2016, 11:33:56 AM |
|
pallas & Wolf0: when I read your (coding 'n shit) dialog, here is what I read: kjndcknsdvnlsdnvlsvlsdlkvlksdmvlmskcnsjvnljsnvljsnvlsdclkslvjhsdljvnlxnvcdnvl
|
|
|
|
ldw-com
|
|
January 15, 2016, 11:37:48 AM |
|
pallas & Wolf0: when I read your (coding 'n shit) dialog, here is what I read: kjndcknsdvnlsdnvlsvlsdlkvlksdmvlmskcnsjvnljsnvljsnvlsdclkslvjhsdljvnlxnvcdnvl Oh, is that what you're reading? Well, over here it looks like this: "这是我读一堆奇怪的迹象" ("this is me reading a bunch of strange signs") :p
|
|
|
|
z0n0
Legendary
Offline
Activity: 1274
Merit: 1006
|
|
January 15, 2016, 11:38:57 AM |
|
hahaha, yeah, or that. Hey guys, what's the hash rate at X11 with an R9 380 (4GB Sapphire)?
|
|
|
|
ldw-com
|
|
January 15, 2016, 11:41:28 AM |
|
I'm getting 18-19 MH/s with a 390, so around 17 I guess?
|
|
|
|
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
|
January 15, 2016, 11:50:22 AM |
|
You might not be looking at the big picture with Groestl - look at that fucking shitty number of waves in flight you get due to LDS (ab)use.
That's an issue with <= Tahiti only, hence why I hate optimizing for those chips ;-)
|
|
|
|
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
|
January 15, 2016, 12:12:49 PM |
|
Not the case - two waves in flight, and your kernel is STILL not actually using the GPU's parallelism like it's supposed to. One Groestl-512 hash is a big job, and it's parallelizable. If you're doing a throughput of 64 hashes per local workgroup, then use 256 for Groestl and do 4 work-items per actual hash. Tune to taste.
I understand what you mean: it's like the good old cgminer "vector size". I will think about it. Besides, I haven't worked on Groestl for a long while, but on Whirlpool and variants I can easily get 3 waves on >= Hawaii. It's a lighter job, I know, but I haven't had any interest in developing Groestl recently.
|
|
|
|
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
|
January 15, 2016, 01:27:06 PM |
|
No, it is the OPPOSITE of vector size. You don't get how the GPU is ACTUALLY supposed to solve issues, I don't think - it really doesn't fucking like large code size, or very complex problems in one work-item - you know this.
Vectors were profitable before because of the old architectures - VLIW-based. GCN abolished hardware vectors and instead made VGPRs 4 bytes. Why, you may ask? Occupancy! This way, if you need to work on a problem that can't be efficiently vectorized like that, you don't waste most of your VGPRs.
But, but, but... mah parallelism! GCN has you covered - you just need to think of the shit differently. Instead of parallelizing in vectors, do it in work-items. To give you the cleanest example I've worked with demonstrating this (in X11), take Echo-512.
You have a 256-byte state which I'll now refer to as W. W can be represented as an array of 16 uint4s. If you're looking at the shitty darkcoin-mod.cl trying to visualize this, just look at the 64-bit W vars and imagine them as 32-bit, and an array. Now, if I were going to demonstrate this technique with Echo - I have an array of 4 uint4s per work-item. This is my W. To figure out which part of the hash you are, you can choose between two ways: launch the kernel with a (throughput * 4, 1, 1) local size, or do (throughput, 4, 1). Since the latter is cleaner, I'll assume that notation: lid = get_local_id(0), and hashid = get_local_id(1).
If hashid is < 2 (i.e. 0 or 1), we fill up W with (512, 0, 0, 0) (uint4, remember) over all four array indices. If hashid == 2, W becomes the input (the input being 16 uints, it may be represented as 4 uint4s as well), and if hashid == 3, we fill up W with the odds & ends - for X11, these are (0x80, 0, 0, 0) for W[0], (0, 0, 0, 0) for W[1], (0, 0, 0, 0x02000000) for W[2], and (512, 0, 0, 0) for W[3]. Now go pull up darkcoin-mod.cl and look at it until this and the previous paragraph make sense.
I'll continue with rounds and output calculation in another post in just a bit.
Thanks for the explanation; let me put it in simple words so you can easily tell whether I got it or not :-) Basically, you are dividing the state into 4 parts which will be computed by different work-items. Fewer regs per kernel leads to more waves and generally better parallelism. Looking at Echo, there is a good amount of work which can be done on a single slice of the input, but in this case:
#define BIG_SHIFT_ROWS do { \
    SHIFT_ROW1(1, 5, 9, D); \
    SHIFT_ROW2(2, 6, A, E); \
    SHIFT_ROW3(3, 7, B, F); \
} while (0)
there are overlaps. I assume you'll use LDS to communicate between work-items, or maybe shuffle, but that would need assembly.
|
|
|
|
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
|
January 16, 2016, 08:39:32 PM |
|
Wolf0, I'm curious to know if you tried that technique (splitting across multiple work-items) on a kernel, and what the outcome was.
|
|
|
|
go6ooo1212
Legendary
Offline
Activity: 1512
Merit: 1000
quarkchain.io
|
|
January 16, 2016, 10:51:28 PM |
|
Has anyone tested the hash power of the R9 Nano on Ethereum?
|
|
|
|
Eliovp
Legendary
Offline
Activity: 1050
Merit: 1293
Huh?
|
|
January 16, 2016, 11:06:08 PM |
|
Has someone been testing the hash-power of R9 Nano , on Ethereum ?
Sure, I get 26 MH/s stock.
|
|
|
|
go6ooo1212
Legendary
Offline
Activity: 1512
Merit: 1000
quarkchain.io
|
|
January 16, 2016, 11:24:31 PM |
|
Has someone been testing the hash-power of R9 Nano , on Ethereum ?
Pff, I expected a bit more improvement compared to the 7970/280X.
|
|
|
|
Eliovp
Legendary
Offline
Activity: 1050
Merit: 1293
Huh?
|
|
January 17, 2016, 01:26:41 AM |
|
You're not the only one...
|
|
|
|
chrysophylax
Legendary
Offline
Activity: 2828
Merit: 1091
--- ChainWorks Industries ---
|
|
January 17, 2016, 04:32:22 AM |
|
Just wait till Wolf decides it's worth his time to get on the code for the new cards ... if he does, of course ... ... #crysx
|
|
|
|
RyanX
Full Member
Offline
Activity: 140
Merit: 100
fastdice.com The Worlds Fastest Bitcoin Dice
|
|
January 17, 2016, 10:33:47 AM |
|
https://forum.ethereum.org/discussion/comment/16112/#Comment_16112
"I have a Nano. HBM does you no good for ethash. HBM memory cycle time (effective) is the same as on GDDR5 cards, although GDDR5 has higher latency. The advantage of HBM is that it has a 4096-bit-wide bus, so the *bandwidth* is much higher, not the access time, which is what's critical for ethash. The issue is that ethash does more or less random 128-byte reads of the DAG at the core of the inner loop, which can't take advantage of HBM's super-wide bus. That is compounded by the fact that the address of the next read (of the DAG) is computed from the 128-byte DAG entry currently being "mixed", so pre-fetch isn't happening to any significant degree. The algorithm's memory accesses also render memory caching of limited use. Starting to get the picture? The GPU does overlap global memory accesses with compute work, so some of the memory access time is hidden, but that has nothing to do with HBM. The fact is ethash is a bugger, period."
|
|
|
|
revelacaogr
Legendary
Offline
Activity: 1316
Merit: 1021
2009 Alea iacta est
|
|
January 20, 2016, 09:12:49 AM Last edit: January 20, 2016, 09:27:34 AM by revelacaogr |
|
i hope WOLF0 will make a good public algo for decred coin when the mining starts.....
ps: wolf0 u must know that u r my favorite...U prove yr work with facts & not with many sauces.....THANKS FOR ALL!
|
|
|
|
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
|
January 20, 2016, 09:54:26 AM |
|
i hope WOLF0 will make a good pablic algo for decred coin when the mining will start.....
ps: wolf0 u must know that u r my favorite...U prove yr work with facts & not with many sauces.....THANKS FOR ALL!
I hope Wolf0 will prove me wrong, but I fear there is not a lot to optimise in 14-round Blake.
|
|
|
|
ldw-com
|
|
January 20, 2016, 10:36:29 AM |
|
ps: wolf0 u must know that u r my favorite...U prove yr work with facts & not with many sauces.....THANKS FOR ALL!
That sounds so weird :p
|
|
|
|
revelacaogr
Legendary
Offline
Activity: 1316
Merit: 1021
2009 Alea iacta est
|
|
January 20, 2016, 10:00:44 PM |
|
U prove yr work with facts & not with many sauces: Not Just Words, But Deeds ...
what is weird about that?
|
|
|
|
|