sp_ (OP)
Legendary
Offline
Activity: 2954
Merit: 1087
Team Black developer
|
|
February 24, 2015, 07:35:42 AM |
|
Quote: "The CUDA and OpenCL code for Whirlpool consists of lookups into huge tables, which sucks for the GPU."
The lookup is done in shared memory and takes 1 cycle, but the internal RISC CPU needs 4 instructions to do the lookup (byteperm/add/shift/move). With the BFINS instruction and aligned memory buffers this can be reduced to 2 instructions, although I failed to implement it in my first attempt (AES).
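A rough sketch of the four steps named above (byte extract, shift, add, load), written in Python only to make the address arithmetic visible. The table contents, `TABLE_BASE`, `ENTRY_SIZE` and the function name `lookup` are made up for illustration; this is not ccminer's code and says nothing about the real instruction scheduling.

```python
import struct

ENTRY_SIZE = 8  # one 64-bit table entry, in bytes (assumed entry width)

# Hypothetical 256-entry table packed into a flat buffer, standing in for shared memory.
TABLE = [(i * 0x0101010101010101) & 0xFFFFFFFFFFFFFFFF for i in range(256)]
MEMORY = struct.pack("<256Q", *TABLE)
TABLE_BASE = 0  # byte offset of the table inside MEMORY (assumed)


def lookup(word, byte_index):
    """Return the table entry selected by one byte of a 32-bit word."""
    b = (word >> (8 * byte_index)) & 0xFF                 # 1. pick the byte (byteperm)
    offset = b << 3                                       # 2. scale by ENTRY_SIZE (shift)
    address = TABLE_BASE + offset                         # 3. add the table base (add)
    return struct.unpack_from("<Q", MEMORY, address)[0]   # 4. load the entry (move)
```

If the table base is aligned so that the byte can be inserted directly into the address bits, steps 1 to 3 could in principle collapse into a single bit-field insert, which is presumably the 2-instruction variant mentioned in the post.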
|
|
|
|
tbearhere
Legendary
Offline
Activity: 3220
Merit: 1003
|
|
February 24, 2015, 09:08:12 AM |
|
Need help on DGB coin, Qubit algo. I think Theblocksfactory needs a setting that neither I nor the pool owner understands; all other pools on this algorithm work fine. On Qubit, before #33 we needed -f 236 and now we don't. On this pool I haven't gotten ccminer to work properly for 6 months. With the older versions I needed to restart the program every 60 seconds to get the pool to show my true hashrate. With #39, no -f 236 is needed and it works fine, except it only accepts exactly 1/2 my hashrate. I think it's a setting the pool owner needs to make. Again, I tried this on other pools and it works fine. Any thoughts on this, please? PS: The other pools have so little hashrate they only hit a block once in a while. Thx
|
|
|
|
bathrobehero
Legendary
Offline
Activity: 2002
Merit: 1051
ICO? Not even once.
|
|
February 24, 2015, 12:06:41 PM |
|
If a pool is showing half the hashrate, chances are you're doing twice the expected work, so doubling your difficulty divide factor (--diff or -f) is probably what's missing. The default is 1, so you should try 2. Conversely, if it only accepts half the shares, then you're sending smaller chunks of work than the pool expects, in which case halving the diff helps (-f 0.5). If there are still rejected shares, try lowering the value to something like -f 0.0078125 or -f 0.00390625 to offset the default 128/256 multipliers while checking the pool's reported hashrate.
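For reference, the two small -f values above are just the reciprocals of the 128 and 256 multipliers. A minimal sketch of that arithmetic (the helper name `offset_factor` is made up here, it is not a ccminer function):

```python
def offset_factor(multiplier):
    """Return the -f value that would cancel a given difficulty multiplier."""
    return 1.0 / multiplier


for m in (128, 256):
    # prints the multiplier next to the -f value that offsets it
    print(m, offset_factor(m))
```

So `-f 0.0078125` corresponds to dividing out a factor of 128, and `-f 0.00390625` to a factor of 256.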
|
Not your keys, not your coins!
|
|
|
sp_ (OP)
|
|
February 24, 2015, 12:26:32 PM |
|
Quote: "I haven't done CUDA in quite a while, but here's a tip about AMD - using fucktons of LDS is bad for you. It reduces the waves in flight - more waves in flight usually mean more performance, up to a point."
Maxwell can issue 2 instructions per clock cycle, but only one when the instruction uses shared/const memory. Normal superscalar design. That's why I normally move constants into the instruction cache. You just need to make sure that the code size fits the cache.
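To put the quoted LDS tip in numbers, here is a back-of-the-envelope sketch. It assumes 64 KB of LDS per compute unit purely for illustration and ignores every other limit (registers, maximum waves per unit); `groups_per_cu` is a hypothetical helper, not a vendor API.

```python
def groups_per_cu(lds_per_group, lds_per_cu=64 * 1024):
    """Upper bound on workgroups resident on one compute unit, from LDS use alone.

    lds_per_cu defaults to an assumed 64 KB; check the real figure for your GPU.
    """
    if lds_per_group <= 0:
        return None  # no LDS used, so LDS is not the limiting resource
    return lds_per_cu // lds_per_group


big_tables = groups_per_cu(16 * 1024)   # kernel holding 16 KB of lookup tables
small_tables = groups_per_cu(2 * 1024)  # same kernel with the tables cut to 1/8
```

With these assumed numbers the 16 KB kernel allows at most 4 resident groups per unit, the 2 KB one up to 32, which is the "fewer waves in flight" effect the quote describes.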
|
|
|
|
tbearhere
|
|
February 24, 2015, 03:16:36 PM Last edit: February 24, 2015, 04:48:18 PM by tbearhere |
|
-f 0.5 divides it in half, so the total is 1/4 of my hashrate. I did try another pool and it's fine, but this amd pool is s***. On theblocksfactory I tried 2 but got over shares. So I've come to the conclusion that it's the theblocksfactory pool. I'm on http://digihash.co now: very good, no problems.
|
|
|
|
bathrobehero
|
|
February 24, 2015, 05:12:08 PM |
|
Theblocksfactory is weird. When their vardiff starts climbing it throws rejects, so it drops back and repeats. Anyway, you can use a fixed minimum vardiff, and it seems that for a 6-card 750 Ti rig, 4 (.workername_diff4) works fine with -f 256 on release 39.
|
|
|
|
tbearhere
|
|
February 24, 2015, 10:41:54 PM Last edit: February 25, 2015, 09:14:49 PM by tbearhere |
|
Thanks, that's better, but I'm getting a lot of booo's with shares above target. Funny that we have to use -f 256 for that. Edit: With diff4 and -f 256 the hash went from 50% to 75%. So I have to adjust the diff4 to 2 or 8 etc. to see what happens when I get a chance.
|
|
|
|
sp_ (OP)
|
|
February 25, 2015, 10:41:29 PM |
|
I have rewritten the Whirlpool hash. 12% faster when mining Whirlcoin (750ti); x15 is +20khash (750ti).
Will clean up a bit and submit to github.
|
|
|
|
sp_ (OP)
|
|
February 25, 2015, 10:52:30 PM |
|
Quote: "Sounds like you're still using tables..."
Yes, but the table is 1/8 the size.
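A minimal sketch of how one table plus rotations can stand in for eight, assuming the eight lookup tables are byte rotations of a single base table. The table values below are made up (not the real Whirlpool constants), the rotation direction is chosen arbitrarily, and `lookup_full` / `lookup_small` are illustrative names, not ccminer functions.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def rotr64(x, n):
    """Rotate a 64-bit value right by n bits."""
    n %= 64
    return ((x >> n) | (x << (64 - n))) & MASK64


# Toy 256-entry base table with made-up values.
T0 = [(i * 0x9E3779B97F4A7C15) & MASK64 for i in range(256)]

# The "huge tables" variant: eight precomputed tables, each a byte rotation of T0.
FULL = [[rotr64(v, 8 * k) for v in T0] for k in range(8)]


def lookup_full(k, b):
    """Lookup in table k using the eight stored tables."""
    return FULL[k][b]


def lookup_small(k, b):
    """Same result from the base table only, paying one rotation per lookup."""
    return rotr64(T0[b], 8 * k)
```

The small variant stores 256 entries instead of 2048, trading memory for an extra rotate on each lookup.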
|
|
|
|
sp_ (OP)
|
|
February 25, 2015, 11:24:22 PM |
|
Quote: "Yeah, a few rotations and you can cut down the size, still ouch."
The hashing function can probably be improved more, but 12% is OK for now. I have also submitted a speedup in fugue (x13): precalculated some of the hash and removed instructions. Building release 40 now.
|
|
|
|
sp_ (OP)
|
|
February 25, 2015, 11:44:23 PM |
|
1.5.40(sp-MOD) is available here: (27-feb-2015)
https://github.com/sp-hash/ccminer/releases/tag/1.5.40
The source code is available here:
https://github.com/sp-hash/ccminer
Differences from release 39:
Whirlcoin +12%
Faster hash in:
Whirlpool (x15, x17)
fugue (x13, x14, x15, x17)
shavite (x11, x13, x14, x15, x17) (tiny speedup)
shabal (x11, x13, x14, x15, x17) (tiny speedup)
|
|
|
|
djm34
Legendary
Offline
Activity: 1400
Merit: 1050
|
|
February 26, 2015, 12:23:17 AM |
|
I tried the rotation some time ago but wasn't convinced; however, I don't think I've tried it with uint2 since then (I hate working on whirlpool... takes forever to compile). I get +20MH/s on whirlpoolx on gtx980 and +10MH/s on 750ti, but -30MH/s on 780ti.
|
djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
|
|
|
sp_ (OP)
|
|
February 26, 2015, 07:18:54 AM |
|
With uint2 it uses more registers and spills to memory, so I increased the launch bound to 128 regs. The code size is also bigger. On the 780ti you should probably not unroll all the loops.
There are more speedups to come. Still some easy pickings.
|
|
|
|
|
rednoW
Legendary
Offline
Activity: 1510
Merit: 1003
|
|
February 26, 2015, 09:11:12 AM |
|
damn, I can't understand why your R40 binary is 20khs faster in Qubit than my custom R39-based build. Your R40 didn't work on GTX750 1gb with the defaults (out of memory error), so I use it with the "-i 19.3" option.
And it is 20khs faster than both your own R39 binary and my custom build. As far as I can see on GitHub, there were no changes in Qubit between R39 and R40 except for a different launch config, which is overridden by the -i option.
|
|
|
|
sp_ (OP)
|
|
February 26, 2015, 09:21:29 AM |
|
|
|
|
|
sp_ (OP)
|
|
February 26, 2015, 09:23:32 AM |
|
shavite is used in qubit. Did you include this change?
https://github.com/sp-hash/ccminer/commit/af409c1da57085ad942b3de05d1e33f730d6b910
Also make sure that you compile with the latest drivers.
|
|
|
|
|
rednoW
|
|
February 26, 2015, 09:51:42 AM |
|
You were right! I missed it! Now my own build is doing well, 4378khs in Qubit
|
|
|
|
sp_ (OP)
|
|
February 26, 2015, 10:08:20 AM |
|
Pretty good for 512 cores. (The GTX 750 Maxwell 1gb card retails for around $120.) An AMD Radeon R9 280x does 5.5 MHash with 2048 cores (optimized open-source miner, with 5 times the power usage). Wolf, how fast can you make qubit?
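A quick per-core comparison using only the figures from these posts (4378 khs on 512 cores versus 5.5 MHash on 2048 cores, reading "5,5" as 5.5 and converting to khs). `khs_per_core` is just a helper for this note, and a CUDA core and an AMD stream processor are not directly equivalent, so take the ratio loosely.

```python
def khs_per_core(khs, cores):
    """Hashrate in kH/s divided by the number of shader cores."""
    return khs / cores


gtx750 = khs_per_core(4378, 512)     # rednoW's Qubit figure on the GTX 750
r9_280x = khs_per_core(5500, 2048)   # 5.5 MHash quoted for the R9 280x

print(round(gtx750, 2), round(r9_280x, 2), round(gtx750 / r9_280x, 2))
```

That works out to roughly 8.55 khs per core against 2.69 khs per core, a bit over 3 times more per core on the Maxwell card.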
|
|
|
|
|