I optimized EndianSwap in scrypt130511.cl a little today using the v_bfi_b32 instruction.
It improved my hashrate from 408 kh/s to 415 kh/s on average.
Search for EndianSwap in scrypt130511.cl and replace
Code:
#define EndianSwap(n) (rotl(n & ES[0], 24U)|rotl(n & ES[1], 8U))
with
Code:
#define EndianSwap(n) (Ch(ES[0], rotl(n, 8U), rotl(n, 24U)))
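If you want to convince yourself that the two macros give the same result, you can check them on the host. This is only a rough sketch in plain C, not kernel code. rotl, Ch and the ES table are stand-ins written for this test: I am assuming Ch(x, y, z) picks bits from y where x is 1 and from z elsewhere, and that ES holds the two masks 0x00FF00FF and 0xFF00FF00. The names EndianSwapOld, EndianSwapNew, bswap_ref and swaps_agree are made up for the example.

```c
#include <stdint.h>

/* Stand-ins for the kernel's helpers (assumptions, not copied from the .cl file). */
static const uint32_t ES[2] = { 0x00FF00FFu, 0xFF00FF00u };

/* Rotate left by y bits; only used with y = 8 and y = 24 here. */
static uint32_t rotl(uint32_t x, uint32_t y) { return (x << y) | (x >> (32u - y)); }

/* Assumed meaning of Ch: bits of y where x is 1, bits of z where x is 0. */
static uint32_t Ch(uint32_t x, uint32_t y, uint32_t z) { return (x & y) | (~x & z); }

/* The original macro and the replacement, renamed so both can coexist. */
#define EndianSwapOld(n) (rotl((n) & ES[0], 24U) | rotl((n) & ES[1], 8U))
#define EndianSwapNew(n) (Ch(ES[0], rotl((n), 8U), rotl((n), 24U)))

/* Plain byte swap used as the reference result. */
static uint32_t bswap_ref(uint32_t n) {
    return (n >> 24) | ((n >> 8) & 0x0000FF00u) |
           ((n << 8) & 0x00FF0000u) | (n << 24);
}

/* Returns 1 if both macros match the reference on a stream of test values. */
static int swaps_agree(void) {
    uint32_t n = 0x01234567u;
    for (int i = 0; i < 100000; ++i) {
        if (EndianSwapOld(n) != bswap_ref(n)) return 0;
        if (EndianSwapNew(n) != bswap_ref(n)) return 0;
        n = n * 1664525u + 1013904223u; /* simple LCG to vary the input */
    }
    return 1;
}
```

Calling swaps_agree() from a small main() should return 1 if the stand-ins match what your kernel actually defines. If your copy of the kernel defines Ch or ES differently, adjust the stand-ins first.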
Before optimization, EndianSwap compiles to:
Code:
v_and_b32 v2, 0x00ff00ff, v1
v_and_b32 v1, 0xff00ff00, v1
v_alignbit_b32 v2, v2, v2, 8
v_alignbit_b32 v1, v1, v1, 24
v_or_b32 v1, v2, v1
After optimization, it compiles to:
Code:
v_alignbit_b32 v2, v1, v1, 24
v_alignbit_b32 v1, v1, v1, 8
s_mov_b32 s0, 0x00ff00ff
v_bfi_b32 v1, s0, v2, v1
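To see why the shorter sequence still produces a byte swap, the two listings above can be modelled in plain C. The helpers below emulate v_alignbit_b32 and v_bfi_b32 the way I understand them from AMD's GCN ISA documentation, so treat the semantics as an assumption: alignbit shifts the 64-bit pair {src0, src1} right and keeps the low 32 bits, and bfi takes bits from the second operand where the mask is 1 and from the third where it is 0. The function names endian_swap_and_or and endian_swap_bfi are made up for the example.

```c
#include <stdint.h>

/* Assumed v_alignbit_b32: ({hi, lo} >> (shift & 31)), low 32 bits kept.
   With hi == lo this is a rotate right by shift. */
static uint32_t v_alignbit_b32(uint32_t hi, uint32_t lo, uint32_t shift) {
    uint64_t pair = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(pair >> (shift & 31u));
}

/* Assumed v_bfi_b32: (mask & a) | (~mask & b). */
static uint32_t v_bfi_b32(uint32_t mask, uint32_t a, uint32_t b) {
    return (mask & a) | (~mask & b);
}

/* Model of the five-instruction listing (before optimization). */
static uint32_t endian_swap_and_or(uint32_t v1) {
    uint32_t v2 = 0x00ff00ffu & v1;      /* v_and_b32 v2, 0x00ff00ff, v1 */
    v1 = 0xff00ff00u & v1;               /* v_and_b32 v1, 0xff00ff00, v1 */
    v2 = v_alignbit_b32(v2, v2, 8);      /* rotate right by 8            */
    v1 = v_alignbit_b32(v1, v1, 24);     /* rotate right by 24           */
    return v2 | v1;                      /* v_or_b32                     */
}

/* Model of the four-instruction listing (after optimization). */
static uint32_t endian_swap_bfi(uint32_t v1) {
    uint32_t v2 = v_alignbit_b32(v1, v1, 24); /* rotate right by 24 */
    v1 = v_alignbit_b32(v1, v1, 8);           /* rotate right by 8  */
    uint32_t s0 = 0x00ff00ffu;                /* s_mov_b32 s0, ...  */
    return v_bfi_b32(s0, v2, v1);             /* merge both rotates */
}
```

In this model the two ANDs and the OR collapse into a single select on the mask. The mask is loaded with a scalar move, so the vector instruction count drops from five to three.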
BTW, you can try another optimization I found two weeks ago. It also speeds up my hashrate by about 5 kh/s.
Search for "i<LOOKUP_GAP" and add "#pragma unroll" on the line before it.
Code:
#pragma unroll
for(uint i=0; i<LOOKUP_GAP; ++i)
    salsa(X);
If you like my work, please donate.
BTC Donate: 1C1Dzhe9V8qwZvCMCTLSu5p4D582rgpBGR
LTC Donate: LVmUpF9eP3YwcZ5r8LLMiK74YfVk7oJpCB
DOGE Donate: D5AQ1y7ukUzo1R6iJxYwbCRD9tgvi3mgws