Well, I did a little A/B testing with pretty much every known option.
Everything described here uses the same version (2013-04-17 of cudaMiner), same driver (314.22), same card (9600 GT), same clock options (stock - verified to be running the same clocks all the time), same launch options (16x4). Only the OS and the options listed below differ:
- XP 32-bit, cache (-C) set to 0, 1, 2 = between 20-24khps - cache off seems to have best performance
- XP 32-bit, interactive turned on (1) = 16-18khps
- XP 32-bit, single-allocation turned on (1) = no improvement, 20-22khps; if anything it seems to impair performance slightly
- XP 32-bit, autotune = 20-24khps depending on its mood (sometimes it picks 8x options, sometimes 16, but usually 16x4)
- XP 32-bit, dual-thread and half-workload (-t 2; -l 16x2,16x2; -i 0,0) = totals about 22khps (yep!), but crashes on Ctrl+C or if autotune is accidentally invoked by forgetting to give comma-separated values for both threads.
- Win7 x64, autotune = 10.8khps
- Win7 x64, interactive = 8-10khps
- Win7 x64, single-allocation = 10.8khps (initially hangs the screen but snaps back just before crashing - starts at "0.8khps")
- Win7 x64, cache 1 or 2 = no effect, still ~10khps
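To put the numbers above side by side, here is a rough Python sketch. It only uses the figures from the list; taking the midpoint of each range and treating "~10khps" as 10.0 are my own simplifications, and the function names are just for illustration.

```python
# Hash rates in khps, copied from the list above as (low, high) ranges.
# Single values are stored as (x, x); "~10khps" is assumed to mean 10.0.
results = {
    ("XP 32-bit", "cache 0/1/2"): (20.0, 24.0),
    ("XP 32-bit", "interactive"): (16.0, 18.0),
    ("XP 32-bit", "single-allocation"): (20.0, 22.0),
    ("XP 32-bit", "autotune"): (20.0, 24.0),
    ("XP 32-bit", "dual-thread"): (22.0, 22.0),
    ("Win7 x64", "autotune"): (10.8, 10.8),
    ("Win7 x64", "interactive"): (8.0, 10.0),
    ("Win7 x64", "single-allocation"): (10.8, 10.8),
    ("Win7 x64", "cache 1/2"): (10.0, 10.0),
}


def midpoint(rng):
    """Middle of a (low, high) range; a simplification, not a measured average."""
    low, high = rng
    return (low + high) / 2


def best_by_os(data):
    """Return {os: (option, midpoint)} for the highest midpoint per OS."""
    best = {}
    for (os_name, option), rng in data.items():
        mid = midpoint(rng)
        if os_name not in best or mid > best[os_name][1]:
            best[os_name] = (option, mid)
    return best


def speedup(data, option="autotune"):
    """Ratio of the XP midpoint to the Win7 midpoint for the same option."""
    xp = midpoint(data[("XP 32-bit", option)])
    win7 = midpoint(data[("Win7 x64", option)])
    return xp / win7


if __name__ == "__main__":
    for os_name, (option, mid) in best_by_os(results).items():
        print(f"{os_name}: best is {option} at about {mid:.1f} khps")
    for option in ("autotune", "interactive", "single-allocation"):
        print(f"{option}: XP is about {speedup(results, option):.1f}x Win7")
```

With these midpoints, XP comes out at roughly twice the Win7 rate for every option tested on both systems.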
I really think you should dig out an XP CD and give it a try yourself - you just have to install the nVidia drivers and the Visual C++ 2010 runtime, then launch it and see for yourself. You might just find a huge speed boost.
Also, this testing was really tiresome since the display only updates once a minute (or if a result is found). Since the initial display is usually bogus (claiming very low figures at 4096 hashes right after launching), I have to wait a whole minute each time I play with background options (clocks, background apps, etc). Isn't there a way to make the display output the status more often? Couldn't find any related options in the "--help" notes...