Bitcoin Forum
Author Topic: CCminer(SP-MOD) Modded GPU kernels.  (Read 2347851 times)
Slava_K
September 03, 2015, 08:47:27 PM
 #5581

More stable and higher hashrates. Enjoy.

Happy mining!
Lyra2REv2 on SM52 is 5% lower - but pool side is GOOD!
On SM50 2-3% higher.

                                 
t-nelson
September 03, 2015, 09:31:21 PM
 #5582

Pool is looking good here now too.  Good job.

Now we just have to be careful polishing the porcelain. :)

BTC:   1K4yxRwZB8DpFfCgeJnFinSqeU23dQFEMu
DASH: XcRSCstQpLn8rgEyS6yH4Kcma4PfcGSJxe
hashbrown9000
September 04, 2015, 01:01:58 AM
 #5583

@joblo, glad you got linux OC working. which release are you running?

it takes a little digging with google and man pages, and it's definitely not user friendly (i.e. work locally first to set up the initial OC of multiple cards, then deploy headless over SSH), but it does work.

joblo
September 04, 2015, 02:05:54 AM
 #5584

@joblo, glad you got linux OC working. which release are you running?

it takes a little digging with google and man pages, and it's definitely not user friendly (i.e. work locally first to set up the initial OC of multiple cards, then deploy headless over SSH), but it does work.

Currently using Fedora 20. Although it's EOL it's the last Fedora release supported by cuda 6.5.
I'm crossing my fingers that cuda 7.5 gets optimized soon so I can upgrade to Fedora 22 or Centos 7.

AKA JayDDee, cpuminer-opt developer. https://github.com/JayDDee/cpuminer-opt
https://bitcointalk.org/index.php?topic=5226770.msg53865575#msg53865575
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
pokeytex
September 04, 2015, 02:08:29 AM
 #5585

@sp - is there any upgrade to the Spreadcoin miner planned for the near future? ;D

tsiv
September 04, 2015, 06:28:10 AM
 #5586

Trying to make a Windows build for the modified lyra is leaving me with a seriously Huh face. Same exact code and nvcc generates something completely different that actually runs slower than the original, be it on a 970 or another 750 Ti. That in addition to VS insisting on rebuilding EVERYTHING after changing something for a single source file in the project file, gotta love it.
sp_ (OP)
September 04, 2015, 06:54:46 AM
 #5587

If you use cuda 7.5 you should build in 64 bit mode. The x86 compiler is broken.

Team Black Miner (ETHB3 ETH ETC VTC KAWPOW FIROPOW EVRPROGPOW MEOWPOW + dual mining + tripple mining.. https://github.com/sp-hash/TeamBlackMiner
tsiv
September 04, 2015, 07:09:45 AM
 #5588

Nah, 6.5 on both boxes. Slightly older 6.5.12 on Linux and 6.5.19 (the latest 6.5 + compute 5.2 support I think) on Windows. Tried x64 builds too, doesn't seem to make much of a difference either way. Weird shit. I did manage to make the win build a little better by manually unrolling stuff, just looks like the win version of nvcc isn't really trying to figure stuff out itself. Which brings me back to weird shit.
sp_ (OP)
September 04, 2015, 07:13:08 AM
 #5589

If you share your code I can take a look ;D

sp_ (OP)
September 04, 2015, 07:16:37 AM
 #5590

If you look in my bmw256 mod this code would not unroll:

//   #pragma unroll
//   for (i = 0; i<2; i++)
//      Q[i + 16] = expand32_1(i + 16, M32, H, Q);

So I had to unroll it manually. With the manual unroll I got fewer instructions and faster code.
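
For anyone following along, here is a minimal host-side C++ sketch of what manually unrolling that two-iteration loop looks like. `expand32_1_demo` is a hypothetical stand-in, not the real `expand32_1` from the bmw256 kernel, and the nvcc-specific `#pragma unroll` is left out so the snippet builds with any C++ compiler. Only the loop shape matches the quoted code.

```cpp
#include <cstdint>

// Rotate left; n must be in 1..31.
static inline uint32_t rotl32(uint32_t x, int n) {
    return (x << n) | (x >> (32 - n));
}

// Hypothetical stand-in for the kernel's expand32_1(): mixes the previous
// Q word with one message word and one chaining word.
static uint32_t expand32_1_demo(int i, const uint32_t *M32,
                                const uint32_t *H, const uint32_t *Q) {
    return rotl32(Q[i - 1], (i % 31) + 1) + M32[i % 16] + H[i % 16];
}

// Loop form, same shape as the commented-out code in the post.
void expand_loop(const uint32_t *M32, const uint32_t *H, uint32_t *Q) {
    for (int i = 0; i < 2; i++)
        Q[i + 16] = expand32_1_demo(i + 16, M32, H, Q);
}

// Manually unrolled form: every call carries a literal index, so the
// compiler can fold index-dependent arithmetic at compile time.
void expand_unrolled(const uint32_t *M32, const uint32_t *H, uint32_t *Q) {
    Q[16] = expand32_1_demo(16, M32, H, Q);
    Q[17] = expand32_1_demo(17, M32, H, Q);
}
```

Both functions write the same two words in the same order. Whether the unrolled form is actually faster depends on the compiler and target, as the posts above show.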

chrysophylax
September 04, 2015, 07:26:39 AM
 #5591

@joblo, glad you got linux OC working. which release are you running?

it takes a little digging with google and man pages, and it's definitely not user friendly (i.e. work locally first to set up the initial OC of multiple cards, then deploy headless over SSH), but it does work.

Currently using Fedora 20. Although it's EOL it's the last Fedora release supported by cuda 6.5.
I'm crossing my fingers that cuda 7.5 gets optimized soon so I can upgrade to Fedora 22 or Centos 7.

same here joblo ...

all nvidia based miners - running fedora 20 x64 with all the latest dnf updates ...

all amd based miners - running fedora 19 x64 with all the latest dnf updates also ...

f20x64 - cuda 6.5 ...

i have one f22x64 cuda 7.0.28 machine that IS running - ccminer-spmod 1.5.64 using x11 - and its fine ... though hashrate is about 200KH under the 6.5 compiles ... many of the other algos ( including quark and lyra2v2 ) are 'cpu validation error' persistent ...

when the donation links are up and running - and all my other jobs are done ( official rename of granitecoin and logo and website ) - then ill work on the recompile and adjustment of oc and OS test also - with centos 7 x64 vps ... i have about 7 of those at the moment - and soon to grow to a LOT more vps in centos 7 x64 for various applications ...

i would be VERY interested if there is a dedicated page / link / site specifically for linux / fedora / oc - so that we can reference it all to ... if not - ill make one ... i think we all need it when trying to setup ( and also help ) the systems for the linux savvy ... there is just too much to wade through to get the 'right' info ...

im back online for the next few days - so off to compile the 'new' ccminer-spmod 1.5.65 in both cuda 6.5 ( f20x64 ) and 7.0 ( f22x64 ) ...

wish me luck :) ...

btw - tsiv ... if you read this ... i have not heard back from the pm i sent you ... i would really like your details also - as i cant setup a donation server without them ... ill be publishing them in the next day or so when i iron out the little issues i have currently with them ...

tanx ...

#crysx

t-nelson
September 04, 2015, 07:45:57 AM
 #5592

That in addition to VS insisting on rebuilding EVERYTHING after changing something for a single source file in the project file, gotta love it.

Pretty sure one of my PRs from yesterday should've taken care of that.  Unless you're touching a header, in which case you're probably up a creek.  I think every header includes every other header.  :-[

pallas
September 04, 2015, 07:52:13 AM
 #5593

If you look in my bmw256 mod this code would not unroll:

//   #pragma unroll
//   for (i = 0; i<2; i++)
//      Q[i + 16] = expand32_1(i + 16, M32, H, Q);

So I had to manually unroll it. And with the manual unroll I got less instructions and faster code.

That's interesting. Any idea why it's not unrolling it automatically in this case?
Did you try this:

for (i = 16; i<18; i++)
   Q[i] = expand32_1(i, M32, H, Q);

sp_ (OP)
September 04, 2015, 08:12:13 AM
 #5594

Nah, 6.5 on both boxes. Slightly older 6.5.12 on Linux and 6.5.19 (the latest 6.5 + compute 5.2 support I think) on Windows. Tried x64 builds too, doesn't seem to make much of a difference either way. Weird shit. I did manage to make the win build a little better by manually unrolling stuff, just looks like the win version of nvcc isn't really trying to figure stuff out itself. Which brings me back to weird shit.

You should fork my branch and merge the lyra2 changes. My fork is already 500 kH/s faster than djm34's open-source version without any changes to lyra2 itself (only to the other algos). Big donations are waiting.

You made some improvements to x11 (simd) and x13, so your handle is still in the credits. But that was a year ago.


scryptr
September 04, 2015, 11:05:03 AM
Last edit: September 04, 2015, 05:21:40 PM by scryptr
 #5595

11 Mh/s QUARK, RELEASE 65---

With a little tuning, and using the "cpu-mining", "-C" switch, I was able to get these results with my Win 7 x64 work computer and an EVGA GTX 960 SSC graphics card:


[Screenshot: EVGA GTX 960 SSC mining Quark]

The card is mining with SP-mod release 65, and a +80 core/+240 mem overclock.  There may be room for better and faster tuning, but this appears to be a stable setting for my machine and card.  Higher overclocks bring as much as 11.2Mh/s results, but have been less stable.

Earlier this week, this card was mining Quark at 10.6Mh/s. With the cpu-mining switch, "-C", performance has improved.  My other rigs on Linux have shown similar gains.  My 6x EVGA 750ti FTW rig mines at 40Mh/s, up from 38.5Mh/s, and the cards run from 6.6Mh/s to 6.8Mh/s each.  My EVGA GTX 970 FTW+ cards now mine Quark at 16.5Mh/s each, up from 14-15Mh/s each.

I hope the other bugs are worked out, I'd like to try solo-mining VertCoin.  I also noticed the lower poolside VTC hash-rate reports within the last 2 releases, hope it is fixed.       --scryptr

EDIT:  Better results are obtained when using an intensity slightly less than the maximum acceptable/stable.  My launch string: ./ccminer -a quark -i 23.9 -C --cpu-priority 5 -o stratum+tcp://quark.pool.com:port -u a -p x


[Screenshot: EVGA GTX 960 SSC with OverClock, mining Quark]

My clocks are currently +90core/+270mem.  My results are +160kh/s from the first (top) posted pic.  Adjust per your hardware.       --scryptr

SCRYPTR'S NOTEBOOK: https://bitcointalk.org/index.php?topic=5035515.msg46035530#msg46035530
GITHUB: "github.com/scryptr"  MERIT is appreciated, also.  Thanks!
sp_ (OP)
September 04, 2015, 11:52:26 AM
 #5596

If you look in my bmw256 mod this code would not unroll:

//   #pragma unroll
//   for (i = 0; i<2; i++)
//      Q[i + 16] = expand32_1(i + 16, M32, H, Q);

So I had to manually unroll it. And with the manual unroll I got less instructions and faster code.
That's interesting. Any idea why it's not unrolling it automatically in this case?
Did you try this:
for (i = 16; i<18; i++)
   Q[i] = expand32_1(i, M32, H, Q);

I don't know. What I do know is that in my change the loop was working on constant data, and when I unrolled it manually the constant data no longer had to be calculated at run time, which resulted in fewer instructions.
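
One way to picture the "constant data" remark: when the index is known at compile time, anything derived only from the index can be computed by the compiler instead of at run time. The C++ sketch below is a hedged illustration. `round_const`, `step_runtime` and `step_fixed` are made-up names, and the multiply-by-index constant is only an example, not code taken from the miner.

```cpp
#include <cstdint>

// Illustrative index-dependent constant (an assumption, not the miner's code).
constexpr uint32_t round_const(int i) {
    return static_cast<uint32_t>(i) * 0x05555555u;
}

// Run-time index: the constant is computed on every call unless the
// compiler manages to unroll the caller's loop and see the literal index.
uint32_t step_runtime(int i, uint32_t m) {
    return m + round_const(i);
}

// Compile-time index: round_const(I) is forced to a literal, which is the
// effect a manual unroll has on index-only arithmetic.
template <int I>
uint32_t step_fixed(uint32_t m) {
    constexpr uint32_t k = round_const(I);
    return m + k;
}
```

`step_fixed<16>(m)` and `step_runtime(16, m)` return the same value; the difference is only in where the constant gets computed.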

djm34
September 04, 2015, 12:00:35 PM
Last edit: September 04, 2015, 12:13:47 PM by djm34
 #5597

Nah, 6.5 on both boxes. Slightly older 6.5.12 on Linux and 6.5.19 (the latest 6.5 + compute 5.2 support I think) on Windows. Tried x64 builds too, doesn't seem to make much of a difference either way. Weird shit. I did manage to make the win build a little better by manually unrolling stuff, just looks like the win version of nvcc isn't really trying to figure stuff out itself. Which brings me back to weird shit.

You should fork my branch and merge the lyra2 changes. My fork is already 500KHASH faster than the DJM34's opensource without modding the lyra2(only the other algos). Big donations are waiting.


hmmm. I doubt that...
I tried your modified kernels (cubehash, blakekeccak, bmw) and I mostly see no difference.
There is some variability in the results, but over a medium/long run they settle at the same values I get with the standard kernels...

edit: actually the main difference I saw from my original setting, was by raising the intensity (which is a parameter adjustable by the user even in my release)

djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
sp_ (OP)
September 04, 2015, 12:12:01 PM
 #5598

hmmm. I doubt that...
I tried to use your modified kernels (cubehash, blakekeccak,bmw) and I mostly see no difference.
there are some variability in the result but on a medium/long run it goes down to the same values I get with the standard kernels...

If the values go down over time it means that your cards are throttling, because of heat or too low a voltage. On my GTX 970 the miner is mining 500 kH/s faster than yours.

Release 62 standard clocks:

(the 980ti is clocked at 1260 on the core)


djm34
September 04, 2015, 12:24:43 PM
 #5599

hmmm. I doubt that...
I tried to use your modified kernels (cubehash, blakekeccak,bmw) and I mostly see no difference.
there are some variability in the result but on a medium/long run it goes down to the same values I get with the standard kernels...

If the values go down over time it means that your cards are throttling, because of heat or too low a voltage. On my GTX 970 the miner is mining 500 kH/s faster than yours.

Release 62 standard clocks:

(the 980ti is clocked at 1260 on the core)

well, that argument isn't really relevant: if throttling happens, it happens in the same way for every kernel (slow or fast), so a faster kernel will remain faster regardless of throttling, and here that isn't the case...

(test was done using default clock and tdp target of 100%)

pallas
September 04, 2015, 12:28:26 PM
 #5600

hmmm. I doubt that...
I tried to use your modified kernels (cubehash, blakekeccak,bmw) and I mostly see no difference.
there are some variability in the result but on a medium/long run it goes down to the same values I get with the standard kernels...

If the values go down over time it means that your cards are throttling, because of heat or too low a voltage. On my GTX 970 the miner is mining 500 kH/s faster than yours.

Release 62 standard clocks:

(the 980ti is clocked at 1260 on the core)

well, the argument isn't really relevant, if throttling happens it happens in the same way for every kernels (slow or fast), so if a kernel is faster it will remain faster no matter of any throttling and here it isn't the case...

(test was done using default clock and tdp target of 100%)

If a kernel is faster it probably also draws more power, which in turn means more heat and so a higher chance of throttling.
If an enhanced kernel has the same performance/watt ratio as the original, the card may throttle and deliver the same performance at the same power but a lower clock speed.
I'm speaking generally, as I don't know whether this applies to this specific case.
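
The argument can be put as a toy model. Assume, purely for illustration, that power draw is proportional to hashrate at a given efficiency and that the card clamps power at a fixed cap. Under those assumptions, two kernels with the same kH/s per watt end up at the same throttled hashrate even if one is faster when unthrottled. The function name and all numbers below are hypothetical.

```cpp
#include <algorithm>

// Toy model of the hashrate a card sustains under a power cap.
//   unthrottled_khs : hashrate at full clocks (kH/s)
//   khs_per_watt    : efficiency of the kernel (kH/s per watt)
//   power_cap_w     : board power limit (watts)
double sustained_khs(double unthrottled_khs, double khs_per_watt,
                     double power_cap_w) {
    // Hashrate the power budget can pay for at this efficiency.
    double budget_khs = khs_per_watt * power_cap_w;
    // The card delivers whichever is lower: what the kernel can do at
    // full clocks, or what the power cap allows.
    return std::min(unthrottled_khs, budget_khs);
}
```

With a 150 W cap and 100 kH/s per watt, kernels that reach 15500 and 16000 kH/s unthrottled both settle at 15000 kH/s in this model. Only a kernel with better kH/s per watt raises the ceiling.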
