Thanks for putting it together! (Also thanks to C. Buchner for somehow integrating it into Cudaminer so fast.)
I compiled from source on Visual Studio 2010, and I do indeed get the speedup from ~160 khash/s to ~230 khash/s on my card:
[2013-12-18 xx:48:12] GPU #0: GeForce GTX 660 Ti, 3154816 hashes, 230.69 khash/s
[2013-12-18 xx:48:17] GPU #0: GeForce GTX 660 Ti, 1160320 hashes, 229.80 khash/s
[2013-12-18 xx:48:21] GPU #0: GeForce GTX 660 Ti, 940800 hashes, 229.74 khash/s
[2013-12-18 xx:48:38] GPU #0: GeForce GTX 660 Ti, 3926272 hashes, 230.79 khash/s
[2013-12-18 xx:48:54] GPU #0: GeForce GTX 660 Ti, 3600128 hashes, 230.78 khash/s
(BTW I also compiled it just fine with the VS 2012 toolchain - no change from the 2010 in terms of performance.)
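If anyone wants to check the numbers themselves, here is a rough Python sketch that pulls the khash/s figures out of log lines like the ones above and compares the average to the old rate. The regex and the `parse_rates` helper are my own and not part of cudaminer, and the 160 khash/s baseline is just the approximate figure I quoted, so treat the percentage as a ballpark.

```python
import re
from statistics import mean

# Log lines copied from the run above (the hour is masked as "xx").
LOG = """\
[2013-12-18 xx:48:12] GPU #0: GeForce GTX 660 Ti, 3154816 hashes, 230.69 khash/s
[2013-12-18 xx:48:17] GPU #0: GeForce GTX 660 Ti, 1160320 hashes, 229.80 khash/s
[2013-12-18 xx:48:21] GPU #0: GeForce GTX 660 Ti, 940800 hashes, 229.74 khash/s
[2013-12-18 xx:48:38] GPU #0: GeForce GTX 660 Ti, 3926272 hashes, 230.79 khash/s
[2013-12-18 xx:48:54] GPU #0: GeForce GTX 660 Ti, 3600128 hashes, 230.78 khash/s
"""

# Hypothetical pattern matching the "GPU #n: name, N hashes, R khash/s" part.
LINE_RE = re.compile(r"GPU #(\d+): (.+?), (\d+) hashes, ([\d.]+) khash/s")

def parse_rates(log_text):
    """Return the khash/s value from every line that matches LINE_RE."""
    rates = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            rates.append(float(match.group(4)))
    return rates

BASELINE_KHASH = 160.0  # approximate rate before the update, from the post

rates = parse_rates(LOG)
avg = mean(rates)
speedup = avg / BASELINE_KHASH - 1.0

print(f"mean rate: {avg:.2f} khash/s over {len(rates)} samples")
print(f"gain over ~{BASELINE_KHASH:.0f} khash/s: {100 * speedup:.0f}%")
```

Lines that don't match the pattern are simply skipped, so pasting in a longer log with other messages mixed in should still work.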
This is on a "stock" eVGA 660 Ti. (There was never a "reference" 660Ti, but my card is clocked the same as what is shown on nVidia's web site.) My settings are as follows:
Cudaminer settings: Running the 32-bit cudaminer, -d 0 -i 0 -C 2 -H 2 -l K14x14
CPU: Intel Core i5 2500k OC @ 4.3 GHz, 3 other threads doing CPU mining
Motherboard: ASUS Maximus V GENE
RAM: 8 GB DDR3-1866 @ 9-10-9-27
HDDs: (A low-performance mess - you don't want to know.)
OS: Windows 8.1 Pro 64 bit
My display runs on a GeForce 8500 GT while the 660 Ti does all the work. Yes - it surprises me that I'm not getting random crashes from running cards that are ~5 generations apart.
As I'm in cryptocoins for fun and not for profit, I'd like to take a crack at improving this over the Christmas holiday:
[2013-12-18 xx:55:34] GPU #0: GeForce GTX 550 Ti, 1584128 hashes, 79.13 khash/s
I know the keplerminer code wasn't designed for Compute Capability 2.1, but I'm sure I can think of something. I dabbled in FPGA accelerators a couple years ago, so this should be fun.
Hmm, I just looked at the code, and it looks like the newest revision also implemented the changes for the other generations of CUDA cards... I still want to get into GPU or other parallel stuff to keep my head sharp on the topic, so maybe I'll give it another look anyway. ...If I had access to my old hardware lab I'd try FPGAs, but oh well...