Bitcoin Forum
Author Topic: CCminer(SP-MOD) Modded GPU kernels.  (Read 2347501 times)
chrysophylax
November 17, 2015, 06:38:29 AM
 #7581


-snip-

have you tried this with v74? ...

or with ftc? ...

#crysx

No not yet with either.

Edit: Will test with .74 now.

Sorry real life has had me working, not much time to keep up with the thread and versions.

i think a lot of us know how you feel mate ... im in exactly the same boat ... Wink ...

tanx ...

#crysx

chrysophylax
November 17, 2015, 07:31:07 AM
 #7582

IBM (Xilinx) and Intel (Altera) are both working with FPGA makers to produce CPUs with FPGAs built in.

Nvidia Pascal looks to be 10 times faster than Maxwell and is expected to be released in 2016

If the Pascal specs are true, it would breathe some new life into GPU mining



I already have an ARM dual-core with an FPGA on the same chip.

that can process x11? ... <looks up at the ceiling - all innocent> ...

Wink ...

#crysx

The Cyclone V itself isn't big enough. I do know how to get boards that are on the cheap.

would they be difficult to code to do x11 optimized efficiently? ...

#crysx

It's not really coding, it's chip design. And it'd be VERY tedious, but doable.

tedious and doable - but worth doing? ...

#crysx

DEFINITELY.

well - that says it all doesnt it Smiley ...

ill pm you for any details you wish to share - and whether you are interested in maybe doing it as a project ...

you know all the details - so its just a matter of when where and how much? ...

hang on a moment ... thats a proposition for a service - but not this one ... ok ...

Tongue ...

#crysx

zTheWolfz
November 17, 2015, 07:31:42 AM
Last edit: November 17, 2015, 08:13:22 AM by zTheWolfz
 #7583


-snip-

have you tried this with v74? ...

or with ftc? ...

#crysx

No not yet with either.

Edit: Will test with .74 now.

Sorry real life has had me working, not much time to keep up with the thread and versions.

i think a lot of us know how you feel mate ... im in exactly the same boat ... Wink ...

tanx ...

#crysx

v74 is working, with some loss of hash rate compared to v72:
v72 at -i 11: 320
v74 at -i 11: 266/316

[2015-11-17 01:37:28] accepted: 2/3 (66.67%), 275.05 kH/s yes!
Getting noooo's with both at times, didn't see these nooo's with sp v54

Edit: after double checking, it looks like v72 & v74 are doing about the same hash numbers on good found blocks.
Not doing too bad: 350 coins found since my first post today, but I do have 3 other cards besides the 960 working.
Three of the 50 coin blocks were found on a pair of R270 non X cards. Nothing found on the 760GTX yet.

pallas
November 17, 2015, 09:03:41 AM
 #7584

IBM (Xilinx) and Intel (Altera) are both working with FPGA makers to produce CPUs with FPGAs built in.

Nvidia Pascal looks to be 10 times faster than Maxwell and is expected to be released in 2016

If the Pascal specs are true, it would breathe some new life into GPU mining

10 times?
I don't think it's possible with the current technology, and even if it was, nvidia wouldn't break the upgrade path releasing such a product.
I guess max 1.5x maxwell with same power consumption.

chrysophylax
November 17, 2015, 10:23:02 AM
Last edit: November 17, 2015, 10:58:08 AM by chrysophylax
 #7585

IBM (Xilinx) and Intel (Altera) are both working with FPGA makers to produce CPUs with FPGAs built in.

Nvidia Pascal looks to be 10 times faster than Maxwell and is expected to be released in 2016

If the Pascal specs are true, it would breathe some new life into GPU mining

10 times?
I don't think it's possible with the current technology, and even if it was, nvidia wouldn't break the upgrade path releasing such a product.
I guess max 1.5x maxwell with same power consumption.

damn ...

well - when we see pascal out in the stores - we should see some degree of increase ... and if its anything larger than 25% then we are doing well ...

if it gets much larger than 25% - say 50-100% - then we will be in for a wild ride pallas ... and some massive hashrates ...

ill start a miner system and btc address just for them next year ...

#crysx

Genoil
November 17, 2015, 10:48:58 AM
 #7586

IBM (Xilinx) and Intel (Altera) are both working with FPGA makers to produce CPUs with FPGAs built in.

Nvidia Pascal looks to be 10 times faster than Maxwell and is expected to be released in 2016

If the Pascal specs are true, it would breathe some new life into GPU mining



The 10 times figure came from some marketing talk by NVidia CEO Jen-Hsun Huang, which had nothing to do with hard benchmark figures:



I think this had something to do with machine learning with a crapload of interconnected cards.

ETH: 0xeb9310b185455f863f526dab3d245809f6854b4d
BTC: 1Nu2fMCEBjmnLzqb8qUJpKgq5RoEWFhNcW
thefix
November 17, 2015, 11:23:23 AM
 #7587

IBM (Xilinx) and Intel (Altera) are both working with FPGA makers to produce CPUs with FPGAs built in.

Nvidia Pascal looks to be 10 times faster than Maxwell and is expected to be released in 2016

If the Pascal specs are true, it would breathe some new life into GPU mining

10 times?
I don't think it's possible with the current technology, and even if it was, nvidia wouldn't break the upgrade path releasing such a product.
I guess max 1.5x maxwell with same power consumption.

1.5x sounds more realistic considering it will be 16nm
sp_ (OP)
November 17, 2015, 11:34:10 AM
 #7588

It doesn't help to have 1.5x faster hardware when the software is 2x slower. So you will need someone to create a good compiler, and someone to mod the code.

The 980ti is around 3x faster than the 780ti mining quark.
The 980ti is around 2x faster than the 780ti mining x11.
The 980ti is around 1.5x faster than the 780ti mining lyra2v2.

Team Black Miner (ETHB3 ETH ETC VTC KAWPOW FIROPOW MEOWPOW + dual mining + triple mining): https://github.com/sp-hash/TeamBlackMiner
bathrobehero
November 17, 2015, 11:51:12 AM
 #7589

10x speed isn't going to happen, maybe 2x tops if I had to guess. The big advantage will come from better efficiency due to the switch from a 28nm to a 16nm process.

I actually hope Pascal won't be very good, because that would render our current cards pretty much useless for mining, with a huge drop in resale value.

Not your keys, not your coins!
thefix
November 17, 2015, 11:55:55 AM
 #7590

It doesn't help to have 1.5x faster hardware when the software is 2x slower. So you will need someone to create a good compiler, and someone to mod the code.

The 980ti is around 3x faster than the 780ti mining quark.
The 980ti is around 2x faster than the 780ti mining x11.
The 980ti is around 1.5x faster than the 780ti mining lyra2v2.


That is interesting, considering the 780ti and 980ti look very similar hardware-spec-wise, other than a bit more memory (3GB) and a tiny number of extra CUDA cores (16 more) on the 980ti.


Why the big difference?
sp_ (OP)
November 17, 2015, 12:07:48 PM
 #7591

It doesn't help to have 1.5x faster hardware when the software is 2x slower. So you will need someone to create a good compiler, and someone to mod the code.
The 980ti is around 3x faster than the 780ti mining quark.
The 980ti is around 2x faster than the 780ti mining x11.
The 980ti is around 1.5x faster than the 780ti mining lyra2v2.
That is interesting, considering the 780ti and 980ti look very similar hardware-spec-wise, other than a bit more memory (3GB) and a tiny number of extra CUDA cores (16 more) on the 980ti.
Why the big difference?

The 780ti only has 64 registers while the Maxwell has 256. With 64 registers, the 780ti spills to the stack. But in the memory algos like Lyra2v2 the performance is better, mostly because djm34 has made separate kernels for compute 3.5 and 5.0. In my Maxwell mod I removed the optimized compute 3.5 kernels, because they used 25% more memory. The Maxwell caches only 32 bits in a cacheline while the Kepler caches 128 bits.

Genoil
November 17, 2015, 12:25:16 PM
 #7592

It doesn't help to have 1.5x faster hardware when the software is 2x slower. So you will need someone to create a good compiler, and someone to mod the code.
The 980ti is around 3x faster than the 780ti mining quark.
The 980ti is around 2x faster than the 780ti mining x11.
The 980ti is around 1.5x faster than the 780ti mining lyra2v2.
That is interesting, considering the 780ti and 980ti look very similar hardware-spec-wise, other than a bit more memory (3GB) and a tiny number of extra CUDA cores (16 more) on the 980ti.
Why the big difference?

The 780ti only has 64 registers while the Maxwell has 256.

Nope. Compute 3.5 also has max 255 regs per thread. https://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capabilities Table 13.

sp_ (OP)
November 17, 2015, 12:41:24 PM
Last edit: November 17, 2015, 01:13:35 PM by sp_
 #7593

Nope. Compute 3.5 also has max 255 regs per thread. https://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capabilities Table 13.

I can see it in the link. I don't have a compute 3.5 card. Maybe there are some possible speedups to be made on the 780ti.


Anyone with a 780ti card who can compile the latest version (add compute 3.5 in the project file or makefile)?

What hashrates are you getting?

Genoil
November 17, 2015, 01:17:17 PM
 #7594

Nope. Compute 3.5 also has max 255 regs per thread. https://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capabilities Table 13.

I can see it in the link. I don't have a compute 3.5 card. Maybe there are some possible speedups to be made on the 780ti.

I did most of the work on ethminer on a 780. It's mostly the same as Maxwell, but on SASS level (or PTX if you use Cuda 7.5) the biggest difference is in the absence of LOP3.LUT. Another big difference is that for kernels with low reg counts, you can do double the amount of blocks per SM on Maxwell (32 vs 16). And more shared mem per SM.
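
A rough Python sketch of the occupancy arithmetic behind the blocks-per-SM point. The per-SM limits are the ones NVIDIA's compute capability table lists for compute 3.5 and 5.2 as far as can be told from the linked guide; the function names are made up for illustration, and the model is simplified (it ignores register allocation granularity and shared memory).

```python
# Per-SM limits for compute 3.5 (Kepler) and 5.2 (Maxwell).
# Simplified model: register allocation granularity and shared memory are ignored.
LIMITS = {
    "3.5": {"max_blocks": 16, "max_threads": 2048, "registers": 65536},
    "5.2": {"max_blocks": 32, "max_threads": 2048, "registers": 65536},
}


def resident_blocks(cc, threads_per_block, regs_per_thread):
    """Upper bound on the number of blocks resident on one SM."""
    lim = LIMITS[cc]
    by_threads = lim["max_threads"] // threads_per_block
    by_regs = lim["registers"] // (regs_per_thread * threads_per_block)
    return min(lim["max_blocks"], by_threads, by_regs)


def occupancy(cc, threads_per_block, regs_per_thread):
    """Resident threads as a fraction of the SM's thread limit."""
    blocks = resident_blocks(cc, threads_per_block, regs_per_thread)
    return blocks * threads_per_block / LIMITS[cc]["max_threads"]


for cc in LIMITS:
    print(cc, occupancy(cc, threads_per_block=64, regs_per_thread=32))
```

With 64-thread blocks at 32 registers per thread this gives 0.5 for compute 3.5 and 1.0 for 5.2. At 128 registers per thread both come out at 0.25, because registers become the limit rather than the block count.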

sp_ (OP)
November 17, 2015, 01:20:20 PM
 #7595

1.5.66(sp-MOD) - Lyra2v2 - GTX750Ti  - 1431/1440 - 5177kh/s
1.5.69(sp-MOD) - Lyra2v2 - GTX750Ti  - 1431/1440 - 5160kh/s
1.5.73(sp-MOD) - Lyra2v2 - GTX750Ti  - 1431/1440 - 5100kh/s
1.5.74(sp-MOD) - Lyra2v2 - GTX750Ti  - 1431/1440 - 5145kh/s
1.5.66(sp-MOD) - Quark - GTX750Ti  - 1431/1440 - 7195kh/s
1.5.69(sp-MOD) - Quark - GTX750Ti  - 1431/1440 - 7251kh/s
1.5.73(sp-MOD) - Quark - GTX750Ti  - 1431/1440 - 7238kh/s
1.5.74(sp-MOD) - Quark - GTX750Ti  - 1431/1440 - 7190kh/s
Thanks for testing. Something happened between release 66 and 69.
Looks like 66 is the fastest, but the clocks are higher.

lyra2v2:
release 66: 9805
GTX 970 (core 1354 MHz, mem 1502 MHz)

release 69: 9282
GTX 970 (core 1328.5 MHz, mem 1502 MHz)

release 73: 9204
GTX 970 (core 1328.5 MHz, mem 1502 MHz)

release 74: 9550
GTX 970 (core 1316.3 MHz, mem 1502 MHz)

I have tested some more. Release 74 uses more power and produces more heat, so the card throttles and performance is lost.
My test was conducted in a closed-case rig with an EVGA superclocked card (2x 6-pin power).

But my dev card, the Gigabyte 970 OC G1, comes with 1x 8-pin and 1x 6-pin, doesn't throttle, and gives better performance in release 74 than in 66.
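
Since the four GTX 970 runs above were at different core clocks, one rough way to compare releases is hashrate per MHz of core clock. A minimal Python sketch, assuming hashrate scales about linearly with core clock (which may not hold for a memory-heavy algo like Lyra2v2); the `per_mhz` helper is made up for illustration.

```python
# Lyra2v2 figures for the GTX 970 as quoted above:
# (release, hashrate, core clock in MHz).
# The hashrate unit is not stated in the post, so it is left unitless here.
runs = [
    (66, 9805, 1354.0),
    (69, 9282, 1328.5),
    (73, 9204, 1328.5),
    (74, 9550, 1316.3),
]


def per_mhz(hashrate, core_mhz):
    """Hashrate divided by core clock, a rough way to factor out throttling."""
    return hashrate / core_mhz


normalized = {release: per_mhz(rate, mhz) for release, rate, mhz in runs}
for release, value in normalized.items():
    print(f"release {release}: {value:.2f} per MHz")

# Release with the highest clock-normalized hashrate.
best = max(normalized, key=normalized.get)
print("best per MHz: release", best)
```

On these figures release 74 comes out marginally ahead of 66 per MHz (about 7.26 vs 7.24), with 69 and 73 around 6.9 to 7.0, which fits the throttling explanation.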


sp_ (OP)
November 17, 2015, 01:23:17 PM
 #7596

Nope. Compute 3.5 also has max 255 regs per thread. https://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capabilities Table 13.
I can see it in the link. I don't have a compute 3.5 card. Maybe there are some possible speedups to be made on the 780ti.
I did most of the work on ethminer on a 780. It's mostly the same as Maxwell, but on SASS level (or PTX if you use Cuda 7.5) the biggest difference is in the absence of LOP3.LUT. Another big difference is that for kernels with low reg counts, you can do double the amount of blocks per SM on Maxwell (32 vs 16). And more shared mem per SM.

Would you be so kind as to test the speed of the different algos in my fork?

joblo
November 17, 2015, 01:27:30 PM
 #7597

Nope. Compute 3.5 also has max 255 regs per thread. https://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capabilities Table 13.

I can see it in the link. I don't have a compute 3.5 card. Maybe there are some possible speedups to be made on the 780ti.


Anyone with a 780ti card who can compile the latest version (add compute 3.5 in the project file or makefile)?

What hashrates are you getting?

EVGA 780ti SC +100 GPU OC

Quark:   11.7 MH/s
X11:       6.35 MH/s
Lyra2v2: 7.7 MH/s
Neo:       330 KH/s   (375 with r58)



AKA JayDDee, cpuminer-opt developer. https://github.com/JayDDee/cpuminer-opt
https://bitcointalk.org/index.php?topic=5226770.msg53865575#msg53865575
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
sp_ (OP)
November 17, 2015, 01:35:50 PM
 #7598

Nope. Compute 3.5 also has max 255 regs per thread. https://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capabilities Table 13.
I can see it in the link. I don't have a compute 3.5 card. Maybe there are some possible speedups to be made on the 780ti.
Anyone with a 780ti card who can compile the latest version (add compute 3.5 in the project file or makefile)?
What hashrates are you getting?
EVGA 780ti SC +100 GPU OC
Quark:   11.7 MH/s
X11:       6.35 MH/s
Lyra2v2: 7.7 MH/s
Neo:       330 KH/s   (375 with r58)

I think djm34's original lyra2v2 does around 9 in lyra2v2?

Gigabyte 980ti OC G1 +100 MHz (release 74)

quark: 28.5 MH/s
x11: 13.5 MH/s
lyra2v2: 18 MH/s
Neo: 650 kH/s
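
Putting the 780ti figures quoted above next to these 980ti numbers gives measured ratios to compare against the earlier 3x / 2x / 1.5x estimates. A quick Python sketch; the two cards ran different overclocks and possibly different builds, so treat the ratios as rough.

```python
# Hashrates quoted above, converted to kH/s so all algos share one unit.
gtx_780ti = {"quark": 11_700, "x11": 6_350, "lyra2v2": 7_700, "neo": 330}
gtx_980ti = {"quark": 28_500, "x11": 13_500, "lyra2v2": 18_000, "neo": 650}

# Ratio of 980ti to 780ti hashrate for each algo.
speedup = {algo: gtx_980ti[algo] / gtx_780ti[algo] for algo in gtx_780ti}

for algo, ratio in speedup.items():
    print(f"{algo}: {ratio:.2f}x")
```

That works out to roughly 2.4x for quark, 2.1x for x11, 2.3x for lyra2v2 and 2.0x for neo.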


Genoil
November 17, 2015, 01:42:16 PM
 #7599

Nope. Compute 3.5 also has max 255 regs per thread. https://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capabilities Table 13.
I can see it in the link. I don't have a compute 3.5 card. Maybe there are some possible speedups to be made on the 780ti.
Anyone with a 780ti card who can compile the latest version (add compute 3.5 in the project file or makefile)?
What hashrates are you getting?
EVGA 780ti SC +100 GPU OC
Quark:   11.7 MH/s
X11:       6.35 MH/s
Lyra2v2: 7.7 MH/s
Neo:       330 KH/s   (375 with r58)

I think djm34's original lyra2v2 does around 9 in lyra2v2?

Gigabyte 980ti OC g1 +100mhz

quark: 28.5MHASH
x11: 13.5MHASH
lyra2v2: 18MHASH
Neo: 650KH/s

Must be LOP3.LUT, Nvidia's answer (and more) to AMD's native bitselect. A while ago I tried using inline PTX lop3.lut on some of your algos, only to find out that the PTX -> SASS compiler already took care of that  Undecided
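
For anyone wondering what LOP3.LUT does: as described in the PTX ISA docs for lop3, it computes any three-input bitwise function from an 8-bit lookup table, and the table is obtained by applying the function to the constants 0xF0, 0xCC and 0xAA. A small Python emulation (not GPU code; the helper names are made up for illustration) showing how a bitselect maps to a single table:

```python
def lop3(a, b, c, lut):
    """Emulate a 32-bit three-input logic op driven by an 8-bit lookup table.

    For each bit position, the bits of a, b and c form a 3-bit index
    (a is the most significant) that selects one bit of `lut`.
    """
    result = 0
    for bit in range(32):
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((lut >> index) & 1) << bit
    return result


def make_lut(func):
    """Build the table by applying `func` to the constants 0xF0, 0xCC, 0xAA."""
    return func(0xF0, 0xCC, 0xAA) & 0xFF


def bitselect(a, b, c):
    """OpenCL-style bitselect: take bits from b where c is 1, else from a."""
    return (a & ~c) | (b & c)


BITSELECT_LUT = make_lut(bitselect)
print(hex(BITSELECT_LUT))
```

Here `BITSELECT_LUT` works out to 0xD8, and a three-way XOR built the same way gives 0x96, so either operation fits in one instruction instead of two or three.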

theotherme
November 17, 2015, 03:19:16 PM
 #7600

Nope. Compute 3.5 also has max 255 regs per thread. https://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capabilities Table 13.
I can see it in the link. I don't have a compute 3.5 card. Maybe there are some possible speedups to be made on the 780ti.
Anyone with a 780ti card who can compile the latest version (add compute 3.5 in the project file or makefile)?
What hashrates are you getting?
EVGA 780ti SC +100 GPU OC
Quark:   11.7 MH/s
X11:       6.35 MH/s
Lyra2v2: 7.7 MH/s
Neo:       330 KH/s   (375 with r58)

I think djm34's original lyra2v2 does around 9 in lyra2v2?

Must be about that. However, it is (I think) related to the 64-bit instructions, which are a lot faster on the 780ti for some reason...