Bitcoin Forum
Topic: CCminer(SP-MOD) Modded GPU kernels.  (Read 2347585 times)
sp_ (OP)
Legendary
Activity: 2954
Merit: 1087
Team Black developer
February 24, 2015, 07:35:42 AM
 #1661

The CUDA and OpenCL code for Whirlpool consists of lookups into huge tables, which sucks for the GPU.

The lookup is done in shared memory and takes 1 cycle, but the GPU's internal RISC core needs 4 instructions per lookup (byteperm/add/shift/move).
With the BFINS instruction and aligned memory buffers this can be reduced to 2 instructions, although I failed to implement it in my first attempt (AES).
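As a rough host-side illustration of where those per-lookup instructions go, here is a C sketch that computes one 64-bit output word of a Whirlpool-style round. For each of the eight bytes it shifts and masks a state word to get the index, forms the table address, loads the entry, and XORs it into the result. The tables `T`, their filler contents, and the source-word index pattern are assumptions modeled loosely on common Whirlpool implementations; they are not the real Whirlpool constants and this is not the ccminer kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Eight 256-entry tables of 64-bit words (8 * 256 * 8 bytes = 16 KB).
   Filler values for illustration only; real Whirlpool tables come from its S-box. */
static uint64_t T[8][256];

static void fill_tables(void) {
    for (unsigned k = 0; k < 8; k++)
        for (unsigned x = 0; x < 256; x++)
            T[k][x] = ((uint64_t)(x + 1) * 0x9E3779B97F4A7C15ULL) ^ k;
}

/* One output word of a Whirlpool-style round. Per byte: shift + mask to
   extract the index, address computation, load, XOR.
   The (i - k) mod 8 source-word pattern is an assumption. */
static uint64_t round_word(const uint64_t in[8], unsigned i) {
    assert(i < 8);
    uint64_t acc = 0;
    for (unsigned k = 0; k < 8; k++) {
        uint64_t w = in[(i + 8 - k) & 7];                /* pick source word */
        unsigned byte = (unsigned)(w >> (8 * k)) & 0xFF; /* shift + mask */
        acc ^= T[k][byte];                               /* address + load + XOR */
    }
    return acc;
}
```

Each of the 64 lookups per round repeats the extract/address/load/XOR pattern, so shaving instructions off that sequence matters more than the 1-cycle shared-memory load itself.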

Team Black Miner (ETHB3 ETH ETC VTC KAWPOW FIROPOW EVRPROGPOW MEOWPOW + dual mining + triple mining) https://github.com/sp-hash/TeamBlackMiner
tbearhere
Legendary
Activity: 3220
Merit: 1003
February 24, 2015, 09:08:12 AM
 #1662

Need help with DGB on the Qubit algo. Theblocksfactory, I think, needs a setting that I don't understand, and neither does he; all other pools on this algorithm work fine.
On the Qubit algo, before release 33 we needed -f 256 and now we don't. On this pool I have never gotten ccminer to work properly in the last 6 months. With the older versions I needed to restart the program every 60 seconds to get the pool to show my true hashrate. With release 39 no -f 256 is needed and it works fine, except that the pool only accepts exactly 1/2 of my hashrate. I think it's a setting the pool owner needs to make. Again, I tried this on other pools and it works fine. Any thoughts on this, please? PS: The other pools have so little hashrate that they only hit a block once in a while.
Thx
bathrobehero
Legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
February 24, 2015, 12:06:41 PM
 #1663

If a pool is showing half the hashrate, chances are you're doing twice the expected work, so doubling your difficulty divide factor (--diff or -f) is what's probably missing. The default is 1, so you should try 2. Conversely, if it only accepts half the shares, then you're sending smaller chunks of work than the pool expects, in which case halving the diff helps (-f 0.5). If there are still rejected shares, try lowering the value to something like -f 0.0078125 (1/128) or -f 0.00390625 (1/256) to offset the default 128/256 multipliers, while checking the pool's reported hashrate.
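To make the advice above concrete, here is a toy model in C. It assumes the miner submits shares at the pool's difficulty times `m / f`, where `f` is the -f/--diff factor and `m` is a hypothetical, algorithm-dependent scale mismatch between how the miner and the pool interpret difficulty, and that the pool credits every accepted share at its own difficulty. Real pools and ccminer have more moving parts (vardiff, stale shares), so this is only a sketch.

```c
#include <assert.h>

/* Toy model (an assumption, not ccminer or pool code): the miner's share
   difficulty is pool_diff * m / f. */

/* Fraction of submitted shares that also meet the pool's target. */
static double accepted_fraction(double f, double m) {
    assert(f > 0.0 && m > 0.0);
    double ratio = f / m;   /* > 1 means shares are easier than the pool requires */
    return ratio <= 1.0 ? 1.0 : 1.0 / ratio;
}

/* Hashrate the pool would estimate from accepted shares. Submissions scale
   with f/m, each accepted share is credited at pool difficulty, and once
   f/m > 1 the extra easy shares are simply rejected. */
static double reported_hashrate(double true_rate, double f, double m) {
    return true_rate * (f / m) * accepted_fraction(f, m);
}
```

Under these assumptions, with m = 2 the pool shows half the true hashrate at -f 1, the full rate at -f 2, and a quarter at -f 0.5; pushing f above m only adds rejected shares without raising the credited hashrate.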

Not your keys, not your coins!
sp_ (OP)
February 24, 2015, 12:26:32 PM
 #1664

I haven't done CUDA in quite a while, but here's a tip about AMD: using fucktons of LDS is bad for you. It reduces the waves in flight, and more waves in flight usually means more performance, up to a point.

Maxwell can issue 2 instructions per clock cycle, but only one when an instruction uses shared/constant memory. Normal superscalar design. That's why I normally move constants into the instruction cache; I just need to make sure the code size fits in the cache.
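The LDS tip above can be put in numbers with a back-of-the-envelope occupancy estimate: the number of work-groups a compute unit can hold is bounded by how many copies of the per-group LDS allocation fit. The figures used here (64 KB of LDS per GCN compute unit, a cap of 40 wavefronts per CU, 256-thread work-groups of 4 waves) are assumptions to check against AMD's documentation, and the sketch ignores other limits such as registers. For scale, a full set of eight 256-entry, 64-bit Whirlpool tables is 16 KB; a single table is 2 KB.

```c
#include <assert.h>

/* Wavefronts per compute unit when LDS is the only limiting resource.
   Other limits (VGPRs, SGPRs, work-group caps) are ignored. */
static unsigned lds_limited_waves(unsigned lds_per_cu, unsigned lds_per_group,
                                  unsigned waves_per_group, unsigned max_waves) {
    assert(lds_per_group > 0);
    unsigned groups = lds_per_cu / lds_per_group;  /* work-groups that fit in LDS */
    unsigned waves = groups * waves_per_group;     /* waves those groups supply */
    return waves < max_waves ? waves : max_waves;  /* hardware cap */
}
```

With 16 KB of tables per work-group, lds_limited_waves(65536, 16384, 4, 40) gives 16 waves, while 2 KB per work-group reaches the cap of 40.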

tbearhere
February 24, 2015, 03:16:36 PM
Last edit: February 24, 2015, 04:48:18 PM by tbearhere
 #1665

-f 0.5 divides it in half, so the total is 1/4 of my hashrate. I did try another pool and it's fine, but this pool, theblocksfactory, is s***. I tried 2 but got shares above target. So I've come to the conclusion that it's the theblocksfactory pool.
I'm on http://digihash.co now: very good, no problems. :)
bathrobehero
February 24, 2015, 05:12:08 PM
 #1666

Theblocksfactory is weird. When their vardiff starts climbing it throws rejects, so it goes back and repeats the cycle. Anyway, you can use a fixed minimum vardiff, and it seems that for a 6-card 750 Ti rig a diff of 4 (.workername_diff4) works fine with -f 256 on release 39.

tbearhere
February 24, 2015, 10:41:54 PM
Last edit: February 25, 2015, 09:14:49 PM by tbearhere
 #1667

Thanks.
That's better, but I'm getting a lot of "booo"s with shares above target. Funny that we have to use -f 256 for that.
Edit: With diff4 and -f 256 the hashrate went from 50% to 75%. So I have to try changing diff4 to 2 or 8, etc., to see what happens when I get a chance.
sp_ (OP)
February 25, 2015, 10:41:29 PM
 #1668

I have rewritten the Whirlpool hash: 12% faster when mining Whirlcoin (750 Ti).
x15 is +20 kH/s (750 Ti).

Will clean up a bit and submit to GitHub.

sp_ (OP)
February 25, 2015, 10:52:30 PM
 #1669

Sounds like you're still using tables...

Yes, but the table is 1/8 the size.
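Here is a C sketch of how a table 1/8 the size can work: store only the first of the eight 256-entry tables and recover the other tables' entries with byte rotations at lookup time. It leans on the property, used by common Whirlpool implementations, that the eight tables are byte-rotated copies of one another; the rotation direction depends on byte-order conventions, so treat it as an assumption. Table contents and function names are hypothetical, and this is not sp_'s actual kernel.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit rotate left; n is reduced mod 64. */
static uint64_t rotl64(uint64_t v, unsigned n) {
    n &= 63;
    return n ? (v << n) | (v >> (64 - n)) : v;
}

static uint64_t T0[256];    /* the one stored table: 256 * 8 bytes = 2 KB */
static uint64_t T[8][256];  /* full 16 KB set, built only to check equivalence */

/* Filler contents (not real Whirlpool values). T[k] is T0 rotated by 8*k
   bits, mirroring the assumed relation between Whirlpool's tables. */
static void build_tables(void) {
    for (unsigned x = 0; x < 256; x++) {
        T0[x] = (uint64_t)(x + 1) * 0x9E3779B97F4A7C15ULL;
        for (unsigned k = 0; k < 8; k++)
            T[k][x] = rotl64(T0[x], 8 * k);
    }
}

/* Large-table lookup: a single load. */
static uint64_t lookup_full(unsigned k, uint8_t x) {
    assert(k < 8);
    return T[k][x];
}

/* Small-table lookup: load from T0, then rotate. */
static uint64_t lookup_small(unsigned k, uint8_t x) {
    assert(k < 8);
    return rotl64(T0[x], 8 * k);
}
```

The memory saving is paid for with a rotate per lookup, i.e. extra ALU instructions on the GPU.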

sp_ (OP)
February 25, 2015, 11:24:22 PM
 #1670

Yeah, a few rotations and you can get the size down; still ouch.

The hashing function can probably be improved more, but 12% is OK for now.
I have also submitted a speedup in fugue (x13): precalculated part of the hash and removed instructions.
Building release 40 now.

sp_ (OP)
February 25, 2015, 11:44:23 PM
 #1671

1.5.40 (sp-MOD) is available here (27-feb-2015):

https://github.com/sp-hash/ccminer/releases/tag/1.5.40

The source code is available here:

https://github.com/sp-hash/ccminer

Differences from release 39:

Whirlcoin +12%

Faster hash in:

Whirlpool (x15, x17)
fugue (x13, x14, x15, x17)
shavite (x11, x13, x14, x15, x17) (tiny speedup)
shabal (x11, x13, x14, x15, x17) (tiny speedup)

djm34
Legendary
Activity: 1400
Merit: 1050
February 26, 2015, 12:23:17 AM
 #1672

I tried the rotation some time ago but wasn't convinced; however, I don't think I've tried it with uint2 since then (I hate working on Whirlpool... it takes forever to compile).

I get +20 MH/s on whirlpoolx on a GTX 980,
        +10 MH/s on a 750 Ti,
but -30 MH/s on a 780 Ti.
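For readers unfamiliar with the uint2 approach mentioned above: CUDA's `uint2` holds two 32-bit words, so a 64-bit state word can be kept as a {low, high} pair and a 64-bit rotation becomes a few 32-bit operations. The C sketch below mimics this with a hypothetical `u32x2` struct (x = low half, y = high half, an assumed convention) and can be checked against a native 64-bit rotate; it illustrates the idea and is not djm34's or sp_'s code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for CUDA's uint2: x = low 32 bits, y = high 32 bits. */
typedef struct { uint32_t x, y; } u32x2;

static u32x2 split64(uint64_t v) {
    u32x2 r = { (uint32_t)v, (uint32_t)(v >> 32) };
    return r;
}

static uint64_t join64(u32x2 v) {
    return ((uint64_t)v.y << 32) | v.x;
}

/* Rotate left by n (0..63) using only 32-bit shifts on the two halves. */
static u32x2 rotl_u32x2(u32x2 v, unsigned n) {
    assert(n < 64);
    if (n >= 32) {                 /* rotating by 32 just swaps the halves */
        uint32_t t = v.x; v.x = v.y; v.y = t;
        n -= 32;
    }
    if (n == 0) return v;
    u32x2 r;
    r.x = (v.x << n) | (v.y >> (32 - n));  /* new low half */
    r.y = (v.y << n) | (v.x >> (32 - n));  /* new high half */
    return r;
}
```

Rotations by a multiple of 32 bits reduce to a swap of the halves, which costs no shift work at all.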

djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
sp_ (OP)
February 26, 2015, 07:18:54 AM
 #1673

With uint2 it uses more registers and spills to memory, so I raised the launch bounds to 128 registers.
The code size is also bigger.
On the 780 Ti you should probably not unroll all the loops.

There are more speedups to come. Still some easy pickings.

rednoW
Legendary
Activity: 1510
Merit: 1003
February 26, 2015, 08:50:22 AM
 #1674

Commit "Faster fugue" https://github.com/sp-hash/ccminer/commit/2ab3254cbddedc0a34020fb3b5d7917fca87dc01 was a little bit (5-7 kH/s) slower on my GTX 750, even with a manually fine-tuned -i.
But commit "faster whirlpool" https://github.com/sp-hash/ccminer/commit/9715bf7eea8f1c92034e1c67891fa242a8c63d26 is faster for x15.

So I just added x15/cuda_x15_whirlpool.cu from release 40 to my custom build and got optimal performance: +17 kH/s on x15 without a drop in x13 and x14.
rednoW
February 26, 2015, 09:11:12 AM
 #1675

Damn, I can't understand why your R40 binary is 20 kH/s faster in Qubit than my custom R39-based build.
Your R40 didn't work on a GTX 750 1GB with default settings (out of memory error), so I use it with the "-i 19.3" option.

And it is 20 kH/s faster than both your own R39 binary and my custom build.
As far as I can see on GitHub, there were no changes in Qubit between R39 and R40 except for a different launch config, which is overridden by the -i option.
sp_ (OP)
February 26, 2015, 09:21:29 AM
 #1676

There are 3 commits to the fugue hash between 39 and 40. Did you forget these?

https://github.com/sp-hash/ccminer/commit/cfb07f6488f436caae14fcd179933ae74efa6a65
https://github.com/sp-hash/ccminer/commit/c60096da2401173051c11fe3a864aee2b2f5d7ad

sp_ (OP)
February 26, 2015, 09:23:32 AM
 #1677

Shavite is used in Qubit. Did you include this change?

https://github.com/sp-hash/ccminer/commit/af409c1da57085ad942b3de05d1e33f730d6b910

Also make sure that you compile with the latest drivers.

rednoW
February 26, 2015, 09:45:25 AM
 #1678


No, I didn't. These 2 commits are in my build.
And the last one, https://github.com/sp-hash/ccminer/commit/2ab3254cbddedc0a34020fb3b5d7917fca87dc01 , is certainly not so good on my 750, so I've gotten rid of it ;)
rednoW
February 26, 2015, 09:51:42 AM
 #1679

You were right! I missed it! Now my own build is doing well: 4378 kH/s in Qubit.
sp_ (OP)
February 26, 2015, 10:08:20 AM
 #1680

Pretty good for 512 cores. (The GTX 750 Maxwell 1GB card retails for around $120.)

An AMD Radeon R9 280X does 5.5 MH/s with 2048 cores (optimized open-source miner, with 5 times the power usage).
Wolf, how fast can you make Qubit?
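The comparison above can be restated per core and per watt using only figures from this thread: 4378 kH/s on a 512-core GTX 750, and 5.5 MH/s on a 2048-core R9 280X at roughly 5 times the power (sp_'s estimate). The helper names are made up for this sketch, and the numbers mix different miners and builds, so the results are rough.

```c
#include <assert.h>

/* Hashrate per shader core, in the same unit as khs. */
static double khs_per_core(double khs, unsigned cores) {
    assert(cores > 0);
    return khs / cores;
}

/* How many times more hashrate per watt card A gets than card B,
   when B draws power_ratio times as much power as A. */
static double per_watt_advantage(double khs_a, double khs_b, double power_ratio) {
    assert(khs_b > 0.0 && power_ratio > 0.0);
    return (khs_a * power_ratio) / khs_b;
}
```

khs_per_core(4378, 512) is about 8.55 kH/s per core versus about 2.69 for the 280X (khs_per_core(5500, 2048)), and per_watt_advantage(4378, 5500, 5) is about 3.98, i.e. roughly 4x the hashrate per watt for the GTX 750 under sp_'s power estimate.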
