I've always gone with what was stated in the wiki. I'm pretty sure the same info is stated on the nVidia web site as well. I will gladly try compiling for 3.0 and see if it works. Probably not till tomorrow though. Will report back my results. Thanks all for chipping in to help.
Just reporting back that yes indeed, building for SM 3.0 did the trick for my card, even though both nVidia and the wiki state it is 3.5. Looking through the makefile of ccminer 1.2, I see that it was building for both 3.0 and 3.5, so that's why it worked. Some bitquark (quark algo) hashrate comparisons from that machine:
ccminer 1.2 = ~1200KH/s
ccminer 1.6.6 = ~950KH/s
A little disappointed with that. Hoping the share rates are higher to compensate but that's harder to gauge. I'll be trying 1.7 tonight.
Glad to see you got it sorted out and thanks for sharing your results, even though they weren't as you hoped.
One of the side effects of optimized code is that it's more specialized. Another example of an older miner being better
on older HW is neoscrypt. A new optimized neoscrypt kernel was added to 1.5.59-SP_MOD that significantly
improved performance on Maxwell but lowered Kepler (780ti) performance. I'm not sure which version is
included in the TPruvot fork.
I tried analyzing the code to see where the significant differences were and made a few changes where I thought
they would affect performance, but I couldn't find the critical code. I guess my C++ skills and CUDA knowledge aren't
up to it.
I have 5 ways to fix this, listed in increasing order of sophistication.
1. Simply use an older miner when mining neoscrypt on older HW.
2. Replace the neoscrypt source directory with an older version before compiling for 3.5.
3. I managed to put together a hack to select the appropriate neoscrypt kernel based on the architecture.
It's a run-time switch, meaning both kernels need to be compiled into every SM version binary and the
appropriate kernel is chosen at run time (there's a rough sketch of the idea after this list).
4. A compile-time solution would be preferable, where only the appropriate neoscrypt kernel is built into each SM binary.
It seems only device code can make use of __CUDA_ARCH__ at compile time, so the differences in host code need
to be handled differently.
5. A unified kernel where only the critical code is architecture dependent (the second sketch below hints at what this could look like).
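For anyone curious, this is roughly what I mean by the run-time switch in 3. The kernel names and arguments below are made-up placeholders, not the actual ccminer neoscrypt code; it just shows picking a kernel from the device's compute capability:

// Minimal sketch of approach 3 (run-time switch). Both kernel variants are
// compiled into every SM binary; the host queries the device and picks one.
// neoscrypt_kernel_kepler/maxwell are placeholders, not real ccminer kernels.
#include <cuda_runtime.h>
#include <cstdint>

__global__ void neoscrypt_kernel_kepler(uint32_t *out)  { out[threadIdx.x] = 1u; } // older kernel, better on SM 3.x
__global__ void neoscrypt_kernel_maxwell(uint32_t *out) { out[threadIdx.x] = 2u; } // newer kernel, better on SM 5.x

static void neoscrypt_launch(int dev_id, uint32_t *d_out, dim3 grid, dim3 block)
{
    cudaDeviceProp props;
    cudaGetDeviceProperties(&props, dev_id);

    // Maxwell is compute capability 5.x; anything older gets the old kernel.
    if (props.major >= 5)
        neoscrypt_kernel_maxwell<<<grid, block>>>(d_out);
    else
        neoscrypt_kernel_kepler<<<grid, block>>>(d_out);
}

int main()
{
    uint32_t *d_out = nullptr;
    cudaMalloc(&d_out, 32 * sizeof(uint32_t));
    neoscrypt_launch(0, d_out, dim3(1), dim3(32));
    cudaDeviceSynchronize();
    cudaFree(d_out);
    return 0;
}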
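And this is the sort of thing I have in mind for 5 (and why 4 is awkward): __CUDA_ARCH__ is only defined while nvcc compiles device code, so it can guard the critical code inside one unified kernel, but it can't drive the host-side differences. Again, the mixing function is just a made-up stand-in, not the real neoscrypt inner loop:

#include <cuda_runtime.h>
#include <cstdint>

// __CUDA_ARCH__ is only visible to the device-side compilation pass, so the
// per-architecture choice has to live in device code like this.
__device__ uint32_t neoscrypt_mix(uint32_t x)
{
#if __CUDA_ARCH__ >= 500
    // Maxwell path: placeholder for the optimized instruction sequence.
    return __funnelshift_l(x, x, 13) ^ x;
#else
    // Kepler path: placeholder for the older sequence.
    return ((x << 13) | (x >> 19)) ^ x;
#endif
}

__global__ void neoscrypt_unified(uint32_t *out)
{
    // One kernel source; nvcc emits the right branch for each SM target,
    // so only the critical code is architecture dependent.
    out[threadIdx.x] = neoscrypt_mix((uint32_t)threadIdx.x);
}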
I've done 1, 2 & 3 successfully and will take a look at 4 when I get motivated. I think 5 is beyond my skill level.
I'm currently using 1 because my 780ti is in a rig all by itself and I don't need to support multiple architectures.
I think this contributes to my lack of motivation, along with age and rust. However, if there is interest, it might
be enough to get me out of my rocking chair and put on my old coding hat.