You may start to understand why I try to keep the cpuminer-multi "sph lib" intact.
The difference is simple: some algos are much hungrier than others, and hashing 80 bytes often requires 2 "loops" because most algos use a 64-byte buffer internally.
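To put rough numbers on the "2 loops" point, here is a minimal sketch that counts how many internal blocks a message needs. The function name blocks_needed and the min_pad parameter are made up for illustration and are not part of sph or cpuminer-multi. Real padding rules differ per algorithm, so treat the result as an estimate.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of internal blocks (compression "loops")
 * needed to absorb msg_len bytes, assuming a fixed block_size and at
 * least min_pad bytes of padding appended to the message.
 * Pass min_pad = 0 to ignore padding entirely.
 */
size_t blocks_needed(size_t msg_len, size_t block_size, size_t min_pad)
{
    assert(block_size > 0);
    /* Ceiling division of (message + padding) by the block size. */
    return (msg_len + min_pad + block_size - 1) / block_size;
}
```

Ignoring padding, blocks_needed(80, 64, 0) is 2 while blocks_needed(64, 64, 0) is 1, which is the extra loop for the 80-byte block header. With a 128-byte block, an 80-byte input still fits in a single block.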
Yes, but groestl was failing with a 64-byte hash as input, the same as in x11 and the other algos that work. In the test I did, BMW was always first, with 80 bytes.
This is the chain I captured; ignore the time, it's wrong.
0 BMW 80, sph, time 388407
1 Groes 64, sph, time 388412
2 Kecca 64, sph, time 388418
3 JH 64, sph, time 388423
4 Luffa 64, opt, time 388430
5 Cube 64, opt, time 388436
6 Skein 64, sph, time 388444
7 Blake 64 sph, time 388449
It runs the optimized luffa (SSE2) and cube (AVX2) but produces the same hashrate as sph.
This is the first time I have not seen a significant increase with either of these functions.
The code is different, so there should be some observable difference. Like I said, performance is identical
with either, both, or neither optimized functions as long as the chain hasn't changed. That is too much
to be a coincidence. Something else is going on but I have no clue what.
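One way to narrow this down would be to time a single stage on its own, outside the chain. The sketch below is only an illustration. The hash_fn signature, dummy_hash and time_hash are hypothetical names, and dummy_hash is a stand-in, not a real hash. Swap in the sph and optimized luffa or cube wrappers to compare them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature: hash len bytes from in into a 64-byte out. */
typedef void (*hash_fn)(void *out, const void *in, size_t len);

/* Stand-in for a real stage such as luffa or cube; NOT a real hash. */
void dummy_hash(void *out, const void *in, size_t len)
{
    uint8_t *o = out;
    const uint8_t *p = in;
    memset(o, 0, 64);
    for (size_t i = 0; i < len; i++)
        o[i % 64] ^= (uint8_t)(p[i] + i);
}

/* Return the CPU seconds spent calling fn iters times on one input. */
double time_hash(hash_fn fn, const void *in, size_t len, unsigned iters)
{
    uint8_t out[64];
    volatile uint8_t sink = 0;   /* keeps the calls from being optimized away */

    assert(fn != NULL);
    clock_t start = clock();
    for (unsigned i = 0; i < iters; i++) {
        fn(out, in, len);
        sink ^= out[0];
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

If the isolated timings differ but the full chain hashrate does not, the cost is probably somewhere else in the chain rather than in these two functions.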
I will take a look at the AES groestl. As I recall, there is some commented-out code that mentions
it isn't needed for x11 and quark. I assumed that meant it didn't need to support 80-byte input, but maybe
there's something else. However, given that it's called with the same args as in x11, quark, etc., I would have
expected it to behave the same. The only possible difference is the input value itself; maybe there's something
different about the actual input data.
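To test the "it depends on the input data" idea, a differential check that feeds the same bytes to both implementations and reports the first differing output byte might help. This is a rough sketch with placeholder functions. ref_hash and opt_hash are hypothetical stand-ins (opt_hash has a deliberate value-dependent bug), not the real sph or AES groestl code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 64

/* Hypothetical signature: hash len bytes from in into a 64-byte out. */
typedef void (*hash_fn)(void *out, const void *in, size_t len);

/* Placeholder "reference" implementation; NOT a real hash. */
void ref_hash(void *out, const void *in, size_t len)
{
    uint8_t *o = out;
    const uint8_t *p = in;
    memset(o, 0, HASH_LEN);
    for (size_t i = 0; i < len; i++)
        o[i % HASH_LEN] ^= p[i];
}

/* Placeholder "optimized" implementation with a deliberate bug:
 * it drops the high bit, so it only misbehaves on bytes >= 0x80. */
void opt_hash(void *out, const void *in, size_t len)
{
    uint8_t *o = out;
    const uint8_t *p = in;
    memset(o, 0, HASH_LEN);
    for (size_t i = 0; i < len; i++)
        o[i % HASH_LEN] ^= (uint8_t)(p[i] & 0x7f);
}

/* Return the index of the first differing output byte, or -1 if equal. */
int first_mismatch(hash_fn a, hash_fn b, const void *in, size_t len)
{
    uint8_t ha[HASH_LEN], hb[HASH_LEN];

    assert(a != NULL && b != NULL);
    a(ha, in, len);
    b(hb, in, len);
    for (int i = 0; i < HASH_LEN; i++)
        if (ha[i] != hb[i])
            return i;
    return -1;
}
```

Running this over the 64-byte inputs captured from the failing chain and from a working x11 chain would show whether the two implementations disagree only on particular input values.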
This could be one of those puzzles that consumes me.