Thanks to Riecoin, my GMP is already custom compiled and tuned. I only mentioned testnet because the difficulty was high and was increasing at a steady rate.
Interesting that you get some performance out of SIMD with the code as is. One of the first things I did was to compile with "-ftree-vectorize -mavx2 -ftree-vectorizer-verbose=5" to see what auto-vectorising found. It didn't find much (anything?) to vectorise, because for most of the loops either the trip count isn't known at compile time (there are tricks you can use, however) or a non-uniform step is being used.
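To illustrate the kind of trick I mean, here is a minimal sketch in C. It is not taken from the miner source; the function names and the BLOCK width are made up for the example. The first function gives the inner loop a fixed, compile-time trip count and unit stride, which is the shape an auto-vectoriser generally handles best. The second uses a step only known at run time, which is the sort of loop that tends to be left scalar.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block width: 8 x 32-bit lanes fills one 256-bit AVX2 register. */
#define BLOCK 8

/* Unit-stride sum, restructured so the inner loop has a constant trip count.
 * The full blocks go through the fixed-size loop; the leftover elements
 * are handled by a scalar tail loop. */
uint32_t sum_blocked(const uint32_t *restrict a, size_t n)
{
    uint32_t lanes[BLOCK] = {0};
    size_t i = 0;

    for (; i + BLOCK <= n; i += BLOCK)
        for (size_t j = 0; j < BLOCK; ++j)   /* trip count known at compile time */
            lanes[j] += a[i + j];

    uint32_t total = 0;
    for (size_t j = 0; j < BLOCK; ++j)       /* combine the partial sums */
        total += lanes[j];
    for (; i < n; ++i)                       /* scalar tail */
        total += a[i];
    return total;
}

/* Same sum, but over every step-th element, with step only known at run time.
 * step must be non-zero. Loops of this shape are much harder to vectorise. */
uint32_t sum_strided(const uint32_t *a, size_t n, size_t step)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i += step)
        total += a[i];
    return total;
}
```

Whether the compiler actually vectorises either loop depends on the GCC version and flags, so check the report rather than assume. On newer GCC releases, -fopt-info-vec-optimized and -fopt-info-vec-missed replace the older -ftree-vectorizer-verbose output.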
Regards,
PS. The fact that GCC couldn't vectorise code that should be very conducive to vectorising makes this a good avenue to work on for speed-ups.
--
bsunau7
It's not only the vector extensions that are added; it's also switching from march=native to everything up to core-avx2, and from O2 to O3. I don't know what exactly did the trick, but it has to be something that's added here.