A brave tester with 8 Fermi Tesla M2090 cards (thanks Choseh) just tracked down the performance regression between the 2013-12-18 and 2014-02-02 releases. If you change the #if 0 in fermi_kernel.cu to #if 1 (thereby re-enabling the previous version of the Salsa20/8 round function), you should see the previous performance figures again. Those who can compile the code themselves and want to mine on Fermi are welcome to make this change. EDIT: False alarm, apparently. My tester cannot reproduce this now.
Also, there seems to be a bug in the autotuning code in salsa_kernel.cu:

hash_sec = (double)WU_PER_LAUNCH / tdelta;

should very likely be

hash_sec = (double)WU_PER_LAUNCH * repeat / tdelta;

to factor in the number of repetitions in the measurement (we want to measure for at least 50 ms for better timer accuracy). So autotune was drunk after all!
So it seems I should release fixes (a new binary release) for these problems tonight.
Christian
I've been experiencing problems with the 2-2 and 2-4 releases; both dropped my kH/s by about 20-30 on my GTX 560 Ti (Fermi). I've been using the 12-18 release to maximize my hashing power.
Here's my config:
cudaminer.exe --no-autotune -O user.worker:pass -o stratum+tcp://pool.com:3333 -C 1 -i 0 -H 1 -l F8x16
I have noticed that the newer releases report the maximum warps as 209, whereas 12-18 reports 211. I ran a benchmark on 2-4 with the -C 1, -H 1 and -i 0 flags included, which gave me a config of F32x4. According to all of the resources I've read, F8x16 is the maximum my card can handle before it starts producing CPU validation errors.