Someone informed me they have achieved a 6x speedup with new GPU code,
and are preparing to submit the code for verification. Stay tuned...
I verified the (GPU over CPU) speedup as 2.05x, so the $500 bounty is claimed.
At least the money stays in Dutch hands:-)
The improvement will be on github in a few days,
while the bounty winner tries to squeeze out some more performance.
The new and improved cuda_miner.cu is now up on
https://github.com/tromp/cuckoo
I also link to Genoil's
https://github.com/Genoil/cuckoo as the Cuckoo Cycle repo of choice for Windows developers.
I'd like to collect Cuckoo Cycle performance data on various graphics cards.
I propose to use the wall-clock running time of a single proof attempt:
$ make cuda30
nvcc -o cuda30 -DSIZESHIFT=30 -arch sm_35 cuda_miner.cu -lssl -lcrypto
$ time ./cuda30 -h 0 -n 7 -t 16384
Looking for 42-cycle on cuckoo30("0") with 50% edges, 7 trims, 16384 threads
Using 64MB edge and 128MB node memory.
final load 48%
58-cycle found at 0:98%
60-cycle found at 0:99%
426-cycle found at 0:99%
real 0m8.100s
user 0m7.515s
sys 0m0.576s
That 8.1 seconds was on a
$ ./cuda_query
Device Number: 0
Device name: GeForce GTX 980
Memory Clock Rate (KHz): 3505000
Memory Bus Width (bits): 256
Peak Memory Bandwidth (GB/s): 224.320000
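For anyone comparing cards by hand, the bandwidth figure can be reproduced from the two lines before it. This is a small sketch that assumes cuda_query uses the usual formula for double-data-rate memory (2 x memory clock x bus width in bytes); the function name peak_bandwidth_gbs is made up for illustration and is not part of the repo.

```shell
# Sketch: peak memory bandwidth in GB/s from the values cuda_query prints.
# Assumes double-data-rate memory, hence the factor 2.
# $1 = memory clock rate in kHz, $2 = memory bus width in bits
peak_bandwidth_gbs() {
    awk -v khz="$1" -v bits="$2" \
        'BEGIN { printf "%f\n", 2.0 * khz * (bits / 8) / 1.0e6 }'
}

# GTX 980 values from the output above
peak_bandwidth_gbs 3505000 256
```

With the GTX 980 numbers this gives 2 x 3505000 x 32 / 1e6 = 224.32, matching the reported value.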
Please post times for your favorite GPU,
and I can add a little table to the project page.
Information on power use and temperatures is also welcome.
For the latter, you may prefer a longer running version like
$ make cuda32
nvcc -o cuda32 -DSIZESHIFT=32 -arch sm_35 cuda_miner.cu -lssl -lcrypto
$ time ./cuda32 -h 0 -n 7 -t 16384
or a loop like
$ time for i in {0..99}; do ./cuda30 -h $i -n 7 -t 16384; done
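If you would rather report a single average per proof attempt than the total for the loop, something like the following might do. It is only a sketch: bench_attempts is a hypothetical helper (not in the repo), it appends "-h i" to whatever command you give it, and it times with whole seconds from date, which is coarse but should be adequate for runs of this length.

```shell
# Sketch: run a command once per header 0..N-1 and print the mean
# wall-clock seconds per run. Timing resolution is one second overall.
# usage: bench_attempts N command [args...]
bench_attempts() {
    n=$1
    shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" -h "$i" > /dev/null   # discard miner output, keep only timing
        i=$((i + 1))
    done
    end=$(date +%s)
    awk -v s="$start" -v e="$end" -v n="$n" \
        'BEGIN { printf "%.3f\n", (e - s) / n }'
}

# example (needs the cuda30 binary built as above):
# bench_attempts 100 ./cuda30 -n 7 -t 16384
```

Averaging over many headers smooths out the variation between individual graphs, since the number of cycles found differs from one header to the next.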