sp_ (OP)
Legendary
Offline
Activity: 2954
Merit: 1087
Team Black developer
|
|
December 03, 2014, 12:40:15 PM |
|
Your makefile uses 128 registers and mine has 80 registers as default, and this will cause big differences.
Actually, this should be removed from any makefile or setup (even though I didn't do it myself...). This method of allocating registers is deprecated and should be replaced by __launch_bounds__ (I am paraphrasing the CUDA docs...).
Yes, most of my modded kernels use __launch_bounds__.
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2954
Merit: 1087
Team Black developer
|
|
December 04, 2014, 12:05:16 AM |
|
Today I improved the final hashing in x11 (echo). The 1GB 750 gains +50 KHASH, the Ti 50-100 KHASH. Reverted BMW to an earlier version. My Gainward 750 Ti is peaking at 2.750 MHASH, while it was at 2650-2700 KHASH earlier.
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2954
Merit: 1087
Team Black developer
|
|
December 04, 2014, 12:06:50 AM |
|
Will not make a new build before I have run it through the night.
|
|
|
|
grendel25
Legendary
Offline
Activity: 2296
Merit: 1031
|
|
December 04, 2014, 03:43:15 AM |
|
Could folks share their command line? This is mine:
ccminer.exe -q -r 3 -R 10 -a x13 --no-color -o stratum+tcp://yaamp.com:3633 -u xxx -p xxx
Thank you. Is there a 'read me' that explains the flags? I'm not familiar with -q, -r or -R. It seems like -q maybe 'quiets' some of the output?
|
|
|
|
Epsylon3
Legendary
Offline
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
|
|
December 04, 2014, 05:01:55 AM |
|
--help or README.txt at github (or in my releases)
|
|
|
|
grendel25
Legendary
Offline
Activity: 2296
Merit: 1031
|
|
December 04, 2014, 06:51:02 AM |
|
--help or README.txt at github (or in my releases)
thx!
|
|
|
|
Epsylon3
Legendary
Offline
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
|
|
December 04, 2014, 07:41:30 AM |
|
Your makefile uses 128 registers and mine has 80 registers as default, and this will cause big differences.
Actually, this should be removed from any makefile or setup (even though I didn't do it myself...). This method of allocating registers is deprecated and should be replaced by __launch_bounds__ (I am paraphrasing the CUDA docs...).
It depends on the case. Sometimes you can't change the TPB (threads per block) without rewriting the code (I'm thinking of shared-memory code), and you can't fine-tune the max registers with launch bounds. That is the right approach to support all past and possible future cards, but not for the 90% of users with a 750 Ti or 970/980.
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2954
Merit: 1087
Team Black developer
|
|
December 04, 2014, 08:25:39 AM |
|
Will not make a new build before I have run it through the night.
The version I checked in last night on github needs some more work...
|
|
|
|
Epsylon3
Legendary
Offline
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
|
|
December 04, 2014, 08:44:56 AM |
|
Otherwise, your echo improvement works on Linux too... 2870 yesterday, 2920 now. We are close to 3 MH on the 750 Ti ^^ I need to inspect the changes.
You also have leftovers of a missing simd free in the current commit (it does not build).
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2954
Merit: 1087
Team Black developer
|
|
December 04, 2014, 09:00:31 AM |
|
Not done, wait for a new check-in tonight. Multipools show high numbers, but solo mining is broken.
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2954
Merit: 1087
Team Black developer
|
|
December 04, 2014, 09:06:26 AM |
|
Otherwise, your echo improvement works on Linux too... 2870 yesterday, 2920 now. We are close to 3 MH on the 750 Ti ^^ I need to inspect the changes.
2.92 MHASH on the 750 Ti Windforce Black, am I right? I only get 2700-2750 on my Gainward Ti.
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2954
Merit: 1087
Team Black developer
|
|
December 04, 2014, 11:34:06 AM |
|
Otherwise, your echo improvement works on Linux too... 2870 yesterday, 2920 now. We are close to 3 MH on the 750 Ti ^^ I need to inspect the changes.
You also have leftovers of a missing simd free in the current commit (it does not build).
The new Echo is doing 8.xx rounds of echo instead of 10 rounds. The previous version did 9.25 rounds, and the original version does 10 rounds. On the 980 we will probably get 300 KHASH more. Less work, less power, more hash.
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2954
Merit: 1087
Team Black developer
|
|
December 04, 2014, 11:51:32 AM |
|
Your makefile uses 128 registers and mine has 80 registers as default, and this will cause big differences.
Actually, this should be removed from any makefile or setup (even though I didn't do it myself...). This method of allocating registers is deprecated and should be replaced by __launch_bounds__ (I am paraphrasing the CUDA docs...).
It depends on the case. Sometimes you can't change the TPB (threads per block) without rewriting the code (I'm thinking of shared-memory code), and you can't fine-tune the max registers with launch bounds. That is the right approach to support all past and possible future cards, but not for the 90% of users with a 750 Ti or 970/980.
The problem is that when you change the number of registers for one kernel, all the other kernels need to be recompiled, and sometimes the performance of the other kernels gets worse. Use launch bounds. It's faster and better.
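To illustrate the difference being discussed: a global register cap (presumably nvcc's -maxrregcount option in the makefile) applies to every kernel in the build, while __launch_bounds__ is attached to a single kernel. The following is only a minimal sketch under stated assumptions. The kernel demo_hash_kernel, the wrapper launch_demo_hash, the helper grid_size and the values TPB = 128 and MIN_BLOCKS = 4 are made up for illustration and are not taken from the ccminer source.

```cuda
#include <cassert>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical tuning values for this sketch (not from ccminer).
#define TPB 128        // maximum threads per block this kernel is launched with
#define MIN_BLOCKS 4   // desired minimum resident blocks per multiprocessor

// The bounds apply to this kernel only: the compiler derives a register
// budget for it from TPB and MIN_BLOCKS, leaving other kernels untouched.
__global__ void __launch_bounds__(TPB, MIN_BLOCKS)
demo_hash_kernel(uint32_t threads, uint32_t start_nonce, uint32_t *out)
{
    uint32_t thread = blockDim.x * blockIdx.x + threadIdx.x;
    if (thread < threads) {
        uint32_t nonce = start_nonce + thread;
        // Placeholder mixing step standing in for real hash rounds.
        out[thread] = (nonce * 2654435761u) ^ (nonce >> 16);
    }
}

// Number of blocks needed to cover `threads` work items (rounds up).
static inline uint32_t grid_size(uint32_t threads, uint32_t tpb)
{
    return (threads + tpb - 1) / tpb;
}

// Host wrapper: launches with exactly TPB threads per block, because a launch
// with more threads per block than the declared bound fails.
cudaError_t launch_demo_hash(uint32_t threads, uint32_t start_nonce, uint32_t *d_out)
{
    assert(d_out != nullptr);
    demo_hash_kernel<<<grid_size(threads, TPB), TPB>>>(threads, start_nonce, d_out);
    return cudaGetLastError();
}
```

With this form, retuning one kernel means editing only its own TPB and MIN_BLOCKS, so the register allocation of the other kernels stays as it was.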
|
|
|
|
coinut
|
|
December 04, 2014, 12:10:14 PM |
|
Otherwise, your echo improvement works on Linux too... 2870 yesterday, 2920 now. We are close to 3 MH on the 750 Ti ^^ I need to inspect the changes.
You also have leftovers of a missing simd free in the current commit (it does not build).
The new Echo is doing 8.xx rounds of echo instead of 10 rounds. The previous version did 9.25 rounds, and the original version does 10 rounds. On the 980 we will probably get 300 KHASH more. Less work, less power, more hash.
Great work sp! Looking forward to testing your next release. Keep up the good work, bro.
|
|
|
|
djm34
Legendary
Offline
Activity: 1400
Merit: 1050
|
|
December 04, 2014, 04:47:04 PM |
|
Your makefile uses 128 registers and mine has 80 registers as default, and this will cause big differences.
Actually, this should be removed from any makefile or setup (even though I didn't do it myself...). This method of allocating registers is deprecated and should be replaced by __launch_bounds__ (I am paraphrasing the CUDA docs...).
It depends on the case. Sometimes you can't change the TPB (threads per block) without rewriting the code (I'm thinking of shared-memory code), and you can't fine-tune the max registers with launch bounds. That is the right approach to support all past and possible future cards, but not for the 90% of users with a 750 Ti or 970/980.
The problem is that when you change the number of registers for one kernel, all the other kernels need to be recompiled, and sometimes the performance of the other kernels gets worse. Use launch bounds. It's faster and better.
+1. Also, for past-gen cards, you can just duplicate the kernel with different __launch_bounds__ parameters, then use the card's compute version (major/minor) at launch time to make sure the correct kernel is called.
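The duplicate-kernel idea could look roughly like the sketch below. Everything in it is an assumption made for illustration: the names (demo_hash_body, demo_hash_sm5, demo_hash_legacy, pick_variant, launch_demo_hash), the bounds (256, 4) and (128, 8), and the cut-off at compute capability 5.0 (the 750 Ti is 5.0, the 970/980 are 5.2). It is not how ccminer actually does it.

```cuda
#include <cassert>
#include <cstdint>
#include <cuda_runtime.h>

// Shared device body so the two kernel variants differ only in their bounds.
__device__ __forceinline__ void demo_hash_body(uint32_t threads, uint32_t start_nonce, uint32_t *out)
{
    uint32_t thread = blockDim.x * blockIdx.x + threadIdx.x;
    if (thread < threads) {
        uint32_t nonce = start_nonce + thread;
        // Placeholder mixing step standing in for real hash rounds.
        out[thread] = (nonce * 2654435761u) ^ (nonce >> 16);
    }
}

// Variant for compute 5.x cards (hypothetical bounds).
__global__ void __launch_bounds__(256, 4)
demo_hash_sm5(uint32_t threads, uint32_t start_nonce, uint32_t *out)
{
    demo_hash_body(threads, start_nonce, out);
}

// Variant for older cards (hypothetical bounds).
__global__ void __launch_bounds__(128, 8)
demo_hash_legacy(uint32_t threads, uint32_t start_nonce, uint32_t *out)
{
    demo_hash_body(threads, start_nonce, out);
}

enum KernelVariant { VARIANT_LEGACY, VARIANT_SM5 };

// Pure host-side decision based on the compute capability major/minor.
KernelVariant pick_variant(int major, int minor)
{
    return (major * 10 + minor >= 50) ? VARIANT_SM5 : VARIANT_LEGACY;
}

// Queries the device and launches the matching variant with the
// threads-per-block value that variant was declared with.
cudaError_t launch_demo_hash(int device, uint32_t threads, uint32_t start_nonce, uint32_t *d_out)
{
    assert(d_out != nullptr);
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;

    if (pick_variant(prop.major, prop.minor) == VARIANT_SM5) {
        const uint32_t tpb = 256;
        demo_hash_sm5<<<(threads + tpb - 1) / tpb, tpb>>>(threads, start_nonce, d_out);
    } else {
        const uint32_t tpb = 128;
        demo_hash_legacy<<<(threads + tpb - 1) / tpb, tpb>>>(threads, start_nonce, d_out);
    }
    return cudaGetLastError();
}
```

Keeping the decision in a small host function like pick_variant makes it easy to add another variant later without touching the kernels themselves.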
|
djm34 facebook page BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2954
Merit: 1087
Team Black developer
|
|
December 04, 2014, 06:31:03 PM Last edit: December 04, 2014, 07:06:35 PM by sp_ |
|
Fixed solomining/echo, and improved fugue (x13). Now building release 15.
|
|
|
|
sp_ (OP)
Legendary
Offline
Activity: 2954
Merit: 1087
Team Black developer
|
|
December 04, 2014, 07:45:44 PM |
|
|
|
|
|
italeffect
|
|
December 04, 2014, 11:12:15 PM |
|
Definitely faster on x11, but I'm seeing a lot of "result does not validate on CPU" messages.
|
Dash: Xdopotr3eAHpsSCMkUyU2YWP3WQWb5X3t8
|
|
|
jjjordan
|
|
December 05, 2014, 02:29:50 AM |
|
Yep, I'm not so confident about one pool graph... but I need to compare too. It's not related to the extranonce feature, though, which most pools don't even use; in this case it's only one request at start.
I have tested both 1.5.0 and my modded version, and they seem to report the correct hash on other pools like wafflepool, yaamp, and nicehash, so this seems to be a pool issue, but it might be a bug in ccminer as well.
I second that - all the multicoin/profit-switching pools I've tried work fine, but most of the single-coin ones report a much lower hash speed than what ccminer reports. At least that's the pattern I've noticed.
Version 8 is the last one that works fine, I think... All the newer ones report higher hashrates, but pools don't "like" them. Version 15 is no different for me - try it yourself at pool.profitcoin.org
|
|
|
|
Epsylon3
Legendary
Offline
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
|
|
December 05, 2014, 03:23:21 AM |
|
Fixed solomining/echo, and improved fugue (x13). Now building release 15.
What was the problem with echo? The last changes (in aes.cu) reduced x11 performance to less than 2900. And yes, I have 2 Gigabyte Black Editions, and also a normal Gigabyte with the Black Edition BIOS (only stable on Linux).
|
|
|
|
|