I got another 10%+ speed improvement on my Has(h)well i5 box by tweaking the already excellent mods done by Wolf/Joe.
From 72 KHash/s to 81 KHash/s (about 12%),
running 4 threads on my i5-4460, no overclock.
This tweak/hack uses a different implementation of the sha256 function and will only work on chips with AVX2 and BMI2 (the routine is built around the BMI2 rorx instruction; AES-NI is not actually needed).
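If you are not sure whether your chip qualifies, a quick check along these lines may help. This is only a minimal sketch, not part of the miner: the function name cpu_has_avx2_bmi2 is made up for illustration, it assumes GCC 7+ or a recent clang (for __get_cpuid_count in <cpuid.h>), and it only looks at the CPUID feature bits, not at whether the OS has enabled the AVX state.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/clang helper header for the CPUID instruction */
#endif

/* Hypothetical helper: returns 1 if CPUID reports both AVX2 and BMI2, else 0.
 * Simplification: does not verify OS support for AVX state (OSXSAVE/XGETBV). */
int cpu_has_avx2_bmi2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7, subleaf 0 holds the "structured extended feature flags". */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;                     /* leaf 7 not available on this CPU */

    int avx2 = (ebx >> 5) & 1;        /* EBX bit 5: AVX2 */
    int bmi2 = (ebx >> 8) & 1;        /* EBX bit 8: BMI2 (provides rorx)  */
    return avx2 && bmi2;
#else
    return 0;                         /* not an x86 build */
#endif
}
```

Call it from a small main() and print the result; on Linux, looking for avx2 and bmi2 in the flags line of /proc/cpuinfo tells you the same thing.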
Here is how I did it. Read on if you are interested (Warning: The below will be a bit technical and applies to Linux only):
I copied the source code from
http://lxr.free-electrons.com/source/arch/x86/crypto/sha256-avx2-asm.S
This code replaces the body of the sha2_round function found in the m7/sha2.c file.
I edited it carefully, removing the line numbers that the LXR page adds and commenting out or changing the following lines:
// #ifdef CONFIG_AS_AVX2
// #include <linux/linkage.h>
..
.global sha256_transform_rorx ## ENTRY(sha256_transform_rorx)
sha256_transform_rorx: ## the label that the ENTRY macro would otherwise have defined
..
// ENDPROC(sha256_transform_rorx)
and the bottom line:
// #endif
Then I saved the result as sha256_rorx.S in the m7 subfolder of the miner source.
Then I edited m7/sha2.c as follows:
// Added this line above the below C function:
extern void sha256_transform_rorx(const void *input_data, sph_u32 *digest, unsigned long num_blks);
// Then the following changes to this function.
static void
sha2_round(const unsigned char *data, sph_u32 r[8])
{
    // Comment out these 3 lines:
    //#define SHA2_IN(x) sph_dec32be_aligned(data + (4 * (x)))
    //	SHA2_ROUND_BODY(SHA2_IN, r);
    //#undef SHA2_IN
    // Add this line:
    sha256_transform_rorx(data, r, 1);
}
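Since this swaps out the core of the hash, it is worth checking that the new routine still produces correct results before mining with it. Below is a rough sketch of how that could be done: a plain C one-block SHA-256 transform with the same argument layout as the extern declaration above, plus a check against the standard "abc" test vector. The names sha256_transform_fn, sha256_transform_ref and sha256_abc_check are made up for this example, and it assumes sph_u32 is a 32-bit unsigned int on your platform.

```c
#include <stdint.h>
#include <string.h>

/* Assumed to match: void sha256_transform_rorx(const void *, sph_u32 *, unsigned long) */
typedef void (*sha256_transform_fn)(const void *input_data, uint32_t *digest,
                                    unsigned long num_blks);

static const uint32_t K[64] = {
    0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
    0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
    0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
    0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
    0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
    0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
    0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
    0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
};

static uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Portable reference: runs num_blks 64-byte blocks through the SHA-256
 * compression function, updating the 8-word digest state in place. */
void sha256_transform_ref(const void *input_data, uint32_t *digest,
                          unsigned long num_blks)
{
    const unsigned char *p = input_data;
    while (num_blks--) {
        uint32_t w[64], s[8];
        for (int i = 0; i < 16; i++, p += 4)   /* load big-endian words */
            w[i] = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                   ((uint32_t)p[2] << 8) | p[3];
        for (int i = 16; i < 64; i++) {        /* message schedule */
            uint32_t s0 = rotr32(w[i-15], 7) ^ rotr32(w[i-15], 18) ^ (w[i-15] >> 3);
            uint32_t s1 = rotr32(w[i-2], 17) ^ rotr32(w[i-2], 19) ^ (w[i-2] >> 10);
            w[i] = w[i-16] + s0 + w[i-7] + s1;
        }
        memcpy(s, digest, sizeof s);
        for (int i = 0; i < 64; i++) {         /* 64 rounds */
            uint32_t S1  = rotr32(s[4], 6) ^ rotr32(s[4], 11) ^ rotr32(s[4], 25);
            uint32_t ch  = (s[4] & s[5]) ^ (~s[4] & s[6]);
            uint32_t t1  = s[7] + S1 + ch + K[i] + w[i];
            uint32_t S0  = rotr32(s[0], 2) ^ rotr32(s[0], 13) ^ rotr32(s[0], 22);
            uint32_t maj = (s[0] & s[1]) ^ (s[0] & s[2]) ^ (s[1] & s[2]);
            uint32_t t2  = S0 + maj;
            s[7] = s[6]; s[6] = s[5]; s[5] = s[4]; s[4] = s[3] + t1;
            s[3] = s[2]; s[2] = s[1]; s[1] = s[0]; s[0] = t1 + t2;
        }
        for (int i = 0; i < 8; i++)
            digest[i] += s[i];
    }
}

/* Feeds the padded one-block message "abc" through fn and compares the
 * result with the published SHA-256("abc") digest. Returns 1 on a match. */
int sha256_abc_check(sha256_transform_fn fn)
{
    static const uint32_t expected[8] = {
        0xba7816bf, 0x8f01cfea, 0x414140de, 0x5dae2223,
        0xb00361a3, 0x96177a9c, 0xb410ff61, 0xf20015ad
    };
    uint32_t state[8] = {                      /* SHA-256 initial hash value */
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
        0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
    };
    unsigned char block[64] = {0};
    memcpy(block, "abc", 3);
    block[3]  = 0x80;                          /* padding marker */
    block[63] = 24;                            /* message length in bits */
    fn(block, state, 1);
    return memcmp(state, expected, sizeof expected) == 0;
}
```

sha256_abc_check(sha256_transform_ref) should return 1. Once sha256_rorx.o is linked in, passing sha256_transform_rorx instead (cast to the function pointer type if your compiler complains about sph_u32) should also return 1 if the assembly was edited correctly.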
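I saved the above file, then ran the following commands:
Run 'make' in the m7 folder.
Then run 'gcc -c sha256_rorx.S' (still in the m7 folder, so that m7/sha256_rorx.o exists for the link step).
Then 'cd ..'
Then relink the miner with the new object file added: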
gcc -std=gnu99 -O2 -pthread -fuse-linker-plugin -o minerd minerd-cpu-miner.o minerd-util.o minerd-sha2-old.o minerd-scrypt.o minerd-m7mhash.o minerd-haval.o minerd-keccak.o minerd-ripemd.o minerd-sha2.o minerd-sha2big.o minerd-tiger.o minerd-whirlpool.o m7/sha256_rorx.o -ljansson -lpthread -lgmp -lcurl -lm
The same kernel directory has other versions of this routine, so you might try the SSSE3 version (sha256-ssse3-asm.S) if you only have a Core 2 Duo, for example.