RX 480 with amdgpu-pro 16.30:
Total 55.8 sol/s [dev0 54.0] 18 shares
Total 55.3 sol/s [dev0 52.4] 18 shares
Total 55.6 sol/s [dev0 54.7] 18 shares
Total 55.9 sol/s [dev0 55.7] 18 shares
Total 55.0 sol/s [dev0 55.7] 18 shares
Total 55.5 sol/s [dev0 56.2] 18 shares
Total 55.2 sol/s [dev0 56.1] 19 shares
Total 54.6 sol/s [dev0 54.8] 19 shares
Total 54.9 sol/s [dev0 55.3] 19 shares
Total 55.1 sol/s [dev0 53.1] 19 shares
Total 54.4 sol/s [dev0 52.6] 19 shares
Kernel:
http://coinsforall.io/distr/input.cl.coll1
NVidia also sees a speedup.
I reduced the number of collisions to find from 5 to 1; 5 seems like too many. I'd like mrb's comments on this.
Yes, you can reduce collisions from 5 to 1. I meant to do this but forgot about it :-P
For the record, your input.cl.coll1 on my RX 480 with amdgpu-pro 16.40:
Total 40.7 sol/s [dev0 41.9] 3 shares
Total 40.7 sol/s [dev0 40.8] 3 shares
Total 40.7 sol/s [dev0 40.4] 3 shares
Total 41.4 sol/s [dev0 42.8] 3 shares
Total 41.6 sol/s [dev0 43.3] 3 shares
Total 41.8 sol/s [dev0 45.9] 3 shares
Total 42.1 sol/s [dev0 47.7] 3 shares
Total 41.8 sol/s [dev0 45.5] 3 shares
Total 41.1 sol/s [dev0 44.9] 3 shares
Total 41.2 sol/s [dev0 43.5] 4 shares
Total 41.1 sol/s [dev0 44.3] 4 shares
Total 41.4 sol/s [dev0 44.5] 4 shares
Total 41.3 sol/s [dev0 44.7] 4 shares
You must have o/c'd. I don't believe 16.30 is 37% faster than 16.40.
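
For anyone who wants to check a percentage like this against the pasted logs rather than a single line, here is a rough sketch that averages the "Total" column of each run and compares the two. The function names (mean_total, speedup_percent) are made up for illustration and are not part of the miner; the sample lines are the first three from each run above.

```python
import re

# Matches miner status lines such as "Total 55.8 sol/s [dev0 54.0] 18 shares".
# Only the "Total" figure is captured; the per-device figure is ignored.
LINE_RE = re.compile(r"Total\s+([\d.]+)\s+sol/s")


def mean_total(log: str) -> float:
    """Average the 'Total' sol/s values found in a block of miner output."""
    rates = [float(m.group(1)) for m in LINE_RE.finditer(log)]
    if not rates:
        raise ValueError("no 'Total ... sol/s' lines found")
    return sum(rates) / len(rates)


def speedup_percent(fast_log: str, slow_log: str) -> float:
    """Percent by which the mean rate of fast_log exceeds that of slow_log."""
    return (mean_total(fast_log) / mean_total(slow_log) - 1.0) * 100.0


if __name__ == "__main__":
    # First three lines of each run, copied from the posts above.
    log_16_30 = """\
Total 55.8 sol/s [dev0 54.0] 18 shares
Total 55.3 sol/s [dev0 52.4] 18 shares
Total 55.6 sol/s [dev0 54.7] 18 shares
"""
    log_16_40 = """\
Total 40.7 sol/s [dev0 41.9] 3 shares
Total 40.7 sol/s [dev0 40.8] 3 shares
Total 40.7 sol/s [dev0 40.4] 3 shares
"""
    print(f"16.30 mean: {mean_total(log_16_30):.1f} sol/s")
    print(f"16.40 mean: {mean_total(log_16_40):.1f} sol/s")
    print(f"difference: {speedup_percent(log_16_30, log_16_40):.1f}%")
```

Paste the full output of each run into the two strings to compare over the whole sample. This only compares averages; it says nothing about whether the gap comes from the driver, clocks, or anything else.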