[ANN] sgminer v5 - optimized X11/X13/NeoScrypt/Lyra2RE/etc. kernel-switch miner

damm315er

January 04, 2015, 01:02:18 AM

#2681

Quote from: jch9678 on January 03, 2015, 07:48:19 PM

Quote from: damm315er on January 03, 2015, 03:17:56 PM

Quote from: jch9678 on January 02, 2015, 07:49:30 PM

I don't think Elpida memory could account for such a wide discrepancy in hashrate or HW errors. At the very least, a 290X should equal a 290; at least for a reference card, the only difference is the number of shaders. If it were a Hynix 290X hitting 330+ and an Elpida 290X hitting 310, then maybe I could understand. I've got a Hynix 290X in my test rig; I forget what the other one is, but I can test. I realize you spent hours on the config for your 290, but if you feel like sharing it, I can start from there. I've got 14.6 RC2 installed on the test rig, and I'm going to install 14.9 tonight and drop the 14.6 OpenCL files in the mining directory (I also use Wolf0's builds). Do you use Stilt's BIOS? I couldn't get Stilt's BIOS stable for the X coins, but I wonder if it will work for neoscrypt. Maybe Stilt's BIOS on neoscrypt will let us find the right ratio of GPU to memory clock (if neoscrypt is anything like scrypt). The sad thing is that this doesn't really make much of a difference in profit.

Yeah, right from the get-go, tuning the GPUs for neoscrypt (and for scrypt as well) with the exact same settings, the 290s with Hynix memory would typically get more hash than the 290Xs with Elpida. Once that peaked, I split them off in different directions for tuning.

There was a single setting where the 290X got more hash than the 290, but that was back in the 30 to 60 kh/s range, and it was never repeatable with the newer kernel and drivers, even with the same settings.

It may be more than just the Hynix/Elpida thing; it could be the card hardware or the BIOS (I never tried Stilt's). I never dug deeper than the memory once I figured out why the 290s were outpacing the 290Xs mining scrypt. But when I was doing scrypt the hashrates were much closer, so it could also be that the bottleneck in the kernel hurts the 290X more.

And you have another good point: at this time, hashrate isn't good for much more than bragging rights unless you have a farm. I heard via the rumor mill that a massive GPU farm is being built. If that is true, then unless there are some attractive new coins to draw the hash, the GPU-mineable coins are all going to get diluted even further. It would help if BTC weren't tanking, but there was a big hype bubble to recover from.

My Elpida 290Xs always outperformed my Hynix 290s, especially with Stilt's BIOS. Stilt's BIOS was stable for the two Hynix 290Xs on my test rig mining neoscrypt, but it didn't seem to make a difference in the max hash I could get. With either BIOS I could squeeze out about 330 kh/s by overclocking to 1070/1500. There was no magic ratio that I could find, but that may be because I'm running the stock kernel. I have a feeling Stilt's BIOS may help with a better kernel, if not for performance then for energy savings. The core clock doesn't influence hash that much, which is something Wolf0 and others have said, i.e. crank up the memory speed and downclock the core for energy savings.

Testing the different drivers didn't make a difference for me; in fact, I saw a slight increase from just sticking with 14.6 RC2, as opposed to using 14.9 with the 14.6 OpenCL files. Maybe you play games and 14.9 is better for that, but I don't use these cards for gaming. Testing different settings didn't make much of a difference either: thread-concurrency (TC) values of 8192, 8448, 16384, 22500 (I used that for scrypt-N) and 22528, and different worksizes, produced no significant change. The 290 kernel bottleneck is a problem.

I wouldn't worry too much about a massive GPU farm. I don't think it will matter much; there will always be new farmers, and some will also leave. Now, if it's Wolf0's farm, then maybe that would be something to worry about ;)

But thanks to his Hawaii bin, X11 is much more profitable for me than neoscrypt.

Anyway

My 290Xs don't like anything over 1130 core and 1450 memory at all; they keep crashing the drivers or the rig. But the 290s will run 1150/1500. They might run more, but I haven't really pushed them because they had bad fans. Now that they're back with new fans, I will push further when I have the time to pay attention to it.

No gaming on this rig. It's pretty capable of it with an AMD 8350 and 16 GB of RAM, but 4 of the 5 GPUs only have a 1x extender, so it would be playing on a single 280X. I noticed about a 5 to 10 mhs difference between 14.6 and 14.9; odd that you don't.

Yes, there are a whole lotta settings in that bottlenecked area. I tried them all, many of them a few times, since I was running 4 GPUs and would set all 4 differently during testing to cover as much ground as possible.

Prelude

January 04, 2015, 01:39:19 AM

#2682

Quote from: Wolf0 on January 03, 2015, 11:23:50 PM

Quote from: kopam on January 03, 2015, 07:38:24 PM

Quote from: damm315er on January 03, 2015, 07:31:20 PM

Quote from: kopam on January 03, 2015, 06:59:30 PM

Hey, so I started testing neoscrypt configs, but I have no idea what I am aiming for.
Can anyone share the most you can get from a 7950? I am getting around 220 kh/s. Is that good? Can I get more than that?

Cheers

Looks like you can get a little more.

http://hw.neoscrypt.tk/index.php

I actually got them up to 260 kh/s, but I am still wondering if I can get more out of them. I am using sgminer 5.1-dev.

I would like to know the best someone has gotten out of these cards, or out of any cards actually.
I mean any optimized, hidden, super-secret kernel, etc.

Just wondering what the max is at this moment.

Around 600 kh/s out of a 290X.

That's insane! I want your kernel. ;)

Do you have any power figures vs. the publicly available kernel, which gets me ~315 kh/s on 290 and 290X at 975/1500?

Also, what clocks are you running? Would you mind sharing your TC and other relevant settings, if they can be applied to the public kernel?

go6ooo1212

January 04, 2015, 07:53:22 AM

#2683

Who doesn't want Wolf0's kernels?

damm315er

January 04, 2015, 12:44:58 PM

#2684

Quote from: go6ooo1212 on January 04, 2015, 07:53:22 AM

Who doesn't want Wolf0's kernels?

True.

bobben2

January 06, 2015, 05:55:29 PM

#2685

Here is a small neoscrypt kernel improvement for free, since I am mostly doing X11 anyway.
It gave me a 5.8% speedup on my reference R9 290 (with Stilt's BIOS),
from 290.2 to 307 kh/s at 800/1500 core/mem on Ubuntu 12.04 with stock drivers.
I didn't try it on my R9 280X cards, so please post your results if you try it.

You will have to modify the kernel as per the code below.
The bottleneck in this kernel is the way it stores the 128 intermediate results of ChaCha and Salsa in global memory.
With the change below, you reduce stalls/latency by not making reads/writes to the same or adjacent memory banks.

Change:
// Original layout: entry idx occupies adjacent slots 2*idx and 2*idx + 1.
void ScratchpadStore(__global void *V, void *X, uchar idx)
{
   ((__global ulong16 *)V)[idx << 1] = ((ulong16 *)X)[0];
   ((__global ulong16 *)V)[(idx << 1) + 1] = ((ulong16 *)X)[1];
}

void ScratchpadMix(void *X, const __global void *V, uchar idx)
{
   ((ulong16 *)X)[0] ^= ((__global ulong16 *)V)[idx << 1];
   ((ulong16 *)X)[1] ^= ((__global ulong16 *)V)[(idx << 1) + 1];
}

To:
// Patched layout: first halves of all 128 entries in slots 0..127,
// second halves in slots 128..255. Total scratchpad size is unchanged.
void ScratchpadStore(__global void *V, void *X, uchar idx)
{
   ((__global ulong16 *)V)[idx] = ((ulong16 *)X)[0];
   ((__global ulong16 *)V)[idx + 128] = ((ulong16 *)X)[1];
}

void ScratchpadMix(void *X, const __global void *V, uchar idx)
{
   ((ulong16 *)X)[0] ^= ((__global ulong16 *)V)[idx];
   ((ulong16 *)X)[1] ^= ((__global ulong16 *)V)[idx + 128];
}
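The patch does not change how much memory the scratchpad uses, only where each half of an entry lands. As a host-side illustration (plain C rather than OpenCL, with a hypothetical `ulong16_t` struct standing in for `ulong16` and made-up helper names), the sketch below applies both index mappings to a 256-slot buffer, checks that each one is one-to-one, and checks that store-then-mix reads back the same data either way. It assumes `idx` stays in 0..127, as the `idx + 128` offset implies. It says nothing about whether the new layout actually avoids bank conflicts on a given GPU; only benchmarks like the ones in this thread can show that.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_ENTRIES 128              /* scratchpad entries (idx in 0..127) */
#define N_SLOTS   (2 * N_ENTRIES)  /* ulong16-sized slots in the buffer */

/* Host-side stand-in for OpenCL's ulong16: 16 x 64-bit = 128 bytes. */
typedef struct { uint64_t s[16]; } ulong16_t;

/* Which slot holds half `half` (0 or 1) of entry `idx`. */
typedef int (*slot_map)(unsigned idx, int half);

/* Original kernel: both halves of an entry sit next to each other. */
static int slot_interleaved(unsigned idx, int half) { return (int)(idx << 1) + half; }

/* Patched kernel: all first halves in 0..127, all second halves in 128..255. */
static int slot_split(unsigned idx, int half) { return (int)idx + half * N_ENTRIES; }

/* 1 if the mapping hits every slot in [0, N_SLOTS) exactly once. */
static int is_bijection(slot_map map) {
    int hits[N_SLOTS] = {0};
    for (unsigned idx = 0; idx < N_ENTRIES; idx++)
        for (int half = 0; half < 2; half++) {
            int s = map(idx, half);
            if (s < 0 || s >= N_SLOTS || hits[s]++)
                return 0;
        }
    return 1;
}

/* Host equivalents of ScratchpadStore / ScratchpadMix for a given layout. */
static void store(ulong16_t *V, const ulong16_t X[2], unsigned idx, slot_map map) {
    V[map(idx, 0)] = X[0];
    V[map(idx, 1)] = X[1];
}

static void mix(ulong16_t X[2], const ulong16_t *V, unsigned idx, slot_map map) {
    for (int half = 0; half < 2; half++)
        for (int k = 0; k < 16; k++)
            X[half].s[k] ^= V[map(idx, half)].s[k];
}

/* Fill all 128 entries with distinct data, mix entry `probe` into a
   zeroed block, and fold the result into a single number. */
static uint64_t mix_checksum(slot_map map, unsigned probe) {
    static ulong16_t V[N_SLOTS];
    ulong16_t X[2];
    for (unsigned idx = 0; idx < N_ENTRIES; idx++) {
        for (int half = 0; half < 2; half++)
            for (int k = 0; k < 16; k++)
                X[half].s[k] = ((uint64_t)idx << 32) | (uint64_t)(half * 16 + k);
        store(V, X, idx, map);
    }
    memset(X, 0, sizeof X);
    mix(X, V, probe, map);
    uint64_t sum = 0;
    for (int half = 0; half < 2; half++)
        for (int k = 0; k < 16; k++)
            sum += X[half].s[k] * (uint64_t)(half * 16 + k + 1);
    return sum;
}
```

Calling `is_bijection(slot_split)` and comparing `mix_checksum(slot_interleaved, i)` with `mix_checksum(slot_split, i)` for any entry `i` is enough to confirm the two kernels store and mix the same data; only the physical placement differs.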

Fellow miners, get your thens and thans in order and help other forum readers understand what you are writing. Remember the grammar basics: B larger THAN A (comparator operator). If something THEN ....

JuanHungLo

January 06, 2015, 06:50:34 PM

#2686

@bobben2: 280X with Hynix, 302 kh/s at 1100/1600; 4 of them draw 990 watts at the wall.

Bull markets are born on pessimism, grow on skepticism, mature on optimism, and die on euphoria. - John Templeton

bobben2

January 06, 2015, 07:17:33 PM

#2687

Quote from: JuanHungLo on January 06, 2015, 06:50:34 PM

@ bobben2, 280x with hynix 302Kh/s 1100/1600 x 4 is 990 watts at the wall

Those cards must be screaming :D

How much of a percentage improvement did you get?

Zuikkis

January 06, 2015, 07:37:36 PM

#2688

Quote from: JuanHungLo on January 06, 2015, 06:50:34 PM

@ bobben2, 280x with hynix 302Kh/s 1100/1600 x 4 is 990 watts at the wall

That's not a very good hashrate for the public kernel. 1600 is not a very good memclock; try lowering it to 1500.

I had about 320 kh/s at 1100/1500.

JuanHungLo

January 06, 2015, 07:47:59 PM

#2689

Quote from: bobben2 on January 06, 2015, 07:17:33 PM

Quote from: JuanHungLo on January 06, 2015, 06:50:34 PM

@ bobben2, 280x with hynix 302Kh/s 1100/1600 x 4 is 990 watts at the wall

Those cards must be screaming :D

How much of a %age improvement did you get?

Sorry, 1600 was a typo; it is 1500.
About 8.6% with the config below. Note the undervolt to 1.00 V (gpu-vddc).

Quote

{
  "pools": [
    {
      "name": "FeatherCoin-neo Pool - WemineFTC",
      "nfactor": "10",
      "algorithm": "neoscrypt",
      "url": "stratum+tcp://stratum.wemineftc.com:4444",
      "user": "USER",
      "pass": "x"
    }
  ],
  "api-port": "4028",
  "gpu-engine": "1100",
  "gpu-memclock": "1500",
  "worksize": "256",
  "gpu-threads": "2",
  "api-listen": true,
  "api-allow": "W:127.0.0.1/32",
  "queue": "1",
  "algorithm": "neoscrypt",
  "device": "0,1,2,3",
  "xintensity": "3",
  "thread-concurrency": "8192",
  "gpu-vddc": "1.00",
  "scan-time": "1",
  "gpu-reorder": true,
  "temp-cutoff": "90",
  "temp-overheat": "82",
  "temp-target": "72",
  "gpu-platform": "0",
  "gpu-dyninterval": "7",
  "expiry": "1",
  "no-pool-disable": true,
  "no-client-reconnect": true,
  "log": "5",
  "no-submit-stale": true,
  "scrypt": true,
  "tcp-keepalive": "30",
  "temp-hysteresis": "3",
  "kernel-path": "/usr/local/bin",
  "powertune": "20"
}

Zuikkis

January 06, 2015, 07:52:12 PM

#2690

Try removing thread-concurrency from the config so sgminer can calculate it from xintensity.

Edit: Oh, and a worksize of 128 or even 64 is probably faster.
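Since thread-concurrency keeps coming up, here is a rough memory estimate. From the kernel code, each scratchpad holds 128 entries of two `ulong16` (128-byte) halves, i.e. 32 KiB per hash. The sketch below assumes one scratchpad per concurrent thread; that is an assumption about how sgminer sizes its buffer, not something checked against its source, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* From the kernel: 128 entries x (2 halves x 128-byte ulong16) = 32 KiB per hash. */
#define NEO_ENTRIES       128u
#define NEO_ENTRY_BYTES   (2u * 128u)
#define NEO_SCRATCH_BYTES (NEO_ENTRIES * NEO_ENTRY_BYTES)

/* Estimated scratchpad buffer in MiB for a thread-concurrency value,
   assuming (hypothetically) one 32 KiB scratchpad per concurrent thread. */
static double scratch_mib(uint32_t thread_concurrency) {
    return (double)thread_concurrency * NEO_SCRATCH_BYTES / (1024.0 * 1024.0);
}
```

By this estimate, the TC values tried earlier in the thread (8192 to 22528) correspond to roughly 256 to 704 MiB of scratchpad.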

bobben2

January 06, 2015, 08:08:48 PM

#2691

Hi again,
Now I've tried the "improved" kernel on my own 280X rig.
3 cards, all running 1000/1500 core/mem, 550 W at the wall.
Card: orig neoscrypt kernel -> my "improved" kernel (kh/s)
1: 301 -> 295
2: 296 -> 289
3: 287 -> 276
Yikes! I got worse performance on the 280X!
Sorry guys. This "improvement", as it stands, seems to apply to the 290 only.
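The before/after numbers in this thread are easier to compare as percentages, and the wall-power figures as hashrate per watt. A small C sketch of that arithmetic (helper names are made up for illustration):

```c
#include <assert.h>
#include <math.h>

/* Percentage change from a baseline hashrate to a new one (both in kh/s). */
static double pct_change(double before_khs, double after_khs) {
    return (after_khs - before_khs) / before_khs * 100.0;
}

/* Rig-level efficiency: summed card hashrates (kh/s) per watt at the wall. */
static double khs_per_watt(const double *khs, int n_cards, double wall_watts) {
    double total = 0.0;
    for (int i = 0; i < n_cards; i++)
        total += khs[i];
    return total / wall_watts;
}
```

For example, `pct_change(290.2, 307.0)` is about +5.8%, matching the 290 result above, while `pct_change(301, 295)` is about -2.0%. The three 280X cards on the original kernel work out to about 1.61 kh/s per watt (884 kh/s at 550 W), versus about 1.22 for 4 x 302 kh/s at 990 W. Wall power covers the whole rig and the clocks differ (1000/1500 vs. 1100/1500), so treat that as a rough comparison only.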

JuanHungLo

January 06, 2015, 08:31:11 PM

#2692

I concur. After further testing with the suggested settings, I was getting HW errors. Back to the drawing board...

damm315er

January 07, 2015, 02:21:03 AM

#2693

So Wolf0 isn't the only one with a kernel mod that works better on the 290s than on anything else.

Eastwind

January 07, 2015, 10:42:42 AM

#2694

Quote from: unklo on January 06, 2015, 09:14:48 PM

Quote from: bobben2 on January 06, 2015, 08:08:48 PM

Yiikes! I got worse performance on the 280X!
Sorry guys. This "improvement", as it stands, seems to come to the 290 only.

+1
On my 280X I get 308 with the "improved" kernel vs. 317 before.

The drop is about 5% compared to the old kernel on the 7970. Maybe this "improved" kernel only helps the 290, which has more memory and more cores.

KL0nLutiy

January 07, 2015, 11:41:18 AM

#2695

Quote from: bobben2 on January 06, 2015, 05:55:29 PM

Thanks, an increase from 317 to 324 kh/s on my 290X.

tccd

January 07, 2015, 11:56:49 AM

#2696

Quote from: bobben2 on January 06, 2015, 05:55:29 PM

It's not working well with Wolf0's Hawaii mod: my hashrate dropped from 339 kh/s to 320 kh/s.

damm315er

January 09, 2015, 11:01:11 PM
Last edit: January 10, 2015, 02:36:06 PM by damm315er

#2697

Quote from: bobben2 on January 06, 2015, 05:55:29 PM

CORRECTION:

That made an 8 kh/s increase on my 290s, from 341 to 349 kh/s. (Dumb azz me, I forgot to delete the cached .bin, so sgminer kept running the old compiled kernel.)

cat77

January 10, 2015, 04:16:31 AM
Last edit: January 10, 2015, 04:40:15 AM by cat77

#2698

This is worth 20 kh/s on my 280X: from 343 kh/s to 363 kh/s at a 1020 MHz core clock.
Now somebody needs to find another 20 kh/s for me...

Change the XORBytesInPlace call from:

Code:

XORBytesInPlace(B + bufidx, input, BLAKE2S_OUT_SIZE);

to:

Code:

XORBytesInPlace(B + bufidx, input, bufidx);

and change the function itself to perform some byte-alignment checking:

Code:

//
// a bit of byte alignment checking goes a long ways...
//
// XORs 32 bytes (BLAKE2S_OUT_SIZE) of src into dst.
// mod is the caller's bufidx and only selects the access width.
void XORBytesInPlace(void *restrict dst, const void *restrict src, uint mod)
{
  switch(mod % 4)
  {
  case 0:
    // 4 x uint2 (8 bytes each). Note: a uint2 access needs 8-byte alignment,
    // which mod % 4 == 0 alone does not guarantee.
    #pragma unroll 2
    for(int i = 0; i < 4; i+=2)
    {
      ((uint2 *)dst)[i]   ^= ((uint2 *)src)[i];
      ((uint2 *)dst)[i+1] ^= ((uint2 *)src)[i+1];
    }
    break;

  case 2:
    // 16 x uchar2 (2 bytes each)
    #pragma unroll 8
    for(int i = 0; i < 16; i+=2)
    {
      ((uchar2 *)dst)[i]   ^= ((uchar2 *)src)[i];
      ((uchar2 *)dst)[i+1] ^= ((uchar2 *)src)[i+1];
    }
    break;

  default:
    // odd offsets (mod % 4 == 1 or 3): 32 single bytes
    #pragma unroll 8
    for(int i = 0; i < 31; i+=4)
    {
      ((uchar *)dst)[i]   ^= ((uchar *)src)[i];
      ((uchar *)dst)[i+1] ^= ((uchar *)src)[i+1];
      ((uchar *)dst)[i+2] ^= ((uchar *)src)[i+2];
      ((uchar *)dst)[i+3] ^= ((uchar *)src)[i+3];
    }
  }
}
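The three branches only work if each one XORs exactly the 32 bytes (BLAKE2S_OUT_SIZE) that the original call did: 4 x uint2 (8 bytes), 16 x uchar2 (2 bytes) or 32 single bytes. The host-side C sketch below mirrors that chunking with hypothetical helper names and checks it against a plain byte-by-byte XOR. It uses `memcpy` for each chunk so the host code is valid at any alignment; it does not exercise the OpenCL pointer casts themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XOR_BYTES 32 /* BLAKE2S_OUT_SIZE in the original call */

/* XOR `width` bytes at offset `off`; memcpy keeps this valid at any alignment. */
static void xor_chunk(uint8_t *dst, const uint8_t *src, size_t off, size_t width) {
    uint8_t a[8], b[8];
    memcpy(a, dst + off, width);
    memcpy(b, src + off, width);
    for (size_t k = 0; k < width; k++)
        a[k] ^= b[k];
    memcpy(dst + off, a, width);
}

/* Same chunk widths as the OpenCL branches: mod % 4 == 0 -> 8 bytes (uint2),
   mod % 4 == 2 -> 2 bytes (uchar2), otherwise 1 byte (uchar). */
static void xor32_by_alignment(uint8_t *dst, const uint8_t *src, unsigned mod) {
    size_t width = (mod % 4 == 0) ? 8 : (mod % 4 == 2) ? 2 : 1;
    for (size_t off = 0; off < XOR_BYTES; off += width)
        xor_chunk(dst, src, off, width);
}

/* 1 if the chunked XOR matches a byte-by-byte XOR of the first 32 bytes
   and leaves the bytes after them untouched. */
static int matches_bytewise(unsigned mod) {
    uint8_t src[40], got[40], want[40];
    for (int i = 0; i < 40; i++) {
        src[i] = (uint8_t)(i * 37 + 11);
        got[i] = want[i] = (uint8_t)(i * 91 + 3);
    }
    for (int i = 0; i < XOR_BYTES; i++)
        want[i] ^= src[i];
    xor32_by_alignment(got, src, mod);
    return memcmp(got, want, sizeof got) == 0;
}
```

`matches_bytewise(m)` returning 1 for each residue of `m % 4` confirms the loop bounds; the speedup itself comes from the wider GPU accesses, which this host code cannot measure.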

Eastwind

January 10, 2015, 09:02:03 AM

#2699

Quote from: cat77 on January 10, 2015, 04:16:31 AM

This is worth 20 kh/s on my 280X: from 343 kh/s to 363 kh/s at a 1020 MHz core clock.
Now somebody needs to find another 20 kh/s for me...

Did you change the original kernel, or apply this after bobben2's change? Can you upload a revised kernel?

KL0nLutiy

January 10, 2015, 11:58:54 AM

#2700

Quote from: cat77 on January 10, 2015, 04:16:31 AM

This is worth 20 kh/s on my 280X: from 343 kh/s to 363 kh/s at a 1020 MHz core clock.
Now somebody needs to find another 20 kh/s for me...

What settings do you use?
