I'm too 'fresh' to post in the mining forums, so I hope some senior members are around and can help me out with a few hints.
I got interested in GPU programming after my first mining rig went up and started producing some bitcents. I lurked around the AMD and Khronos websites and downloaded the documentation I found at . There is lots of information, but the last details still seem to be missing.
What I haven't found (yet) is information at the lowest level, i.e. cycle counts for individual machine instructions, that would explain why, for example:
- amd_bytealign((z^x),(y),(x)) is performing better than amd_bytealign((y),(x|z),(z&x))
- or amd_bitalign(x,x,(u)(32-y)) is faster than rotate(x,(u)y)
For the first case it might be obvious: one logical operation takes less time than two. But for the second case you need the cycle counts of both instructions to know that one is faster than the other.
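To make the second case concrete, here is a minimal host-side sketch in plain C showing that the two expressions compute the same value. It assumes the semantics described for amd_bitalign in AMD's cl_amd_media_ops extension (concatenate the first two arguments into 64 bits, shift right by the low 5 bits of the third) and the OpenCL rotate built-in (rotate left). The names bitalign_model, rotate_model and check_equivalence are made up for illustration and are not part of any SDK. It only demonstrates functional equivalence; it says nothing about cycle counts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host model of amd_bitalign(hi, lo, shift):
 * treat hi:lo as one 64-bit value, shift right by (shift & 31),
 * and keep the low 32 bits. */
static uint32_t bitalign_model(uint32_t hi, uint32_t lo, uint32_t shift)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(wide >> (shift & 31u));
}

/* Hypothetical host model of OpenCL rotate(x, n) for 32-bit values:
 * rotate left by n modulo 32. The n == 0 case is handled separately
 * to avoid an undefined 32-bit shift by 32 in C. */
static uint32_t rotate_model(uint32_t x, uint32_t n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Compare both forms for a few sample values and every rotate amount. */
static void check_equivalence(void)
{
    const uint32_t samples[] = {
        0u, 1u, 0x80000000u, 0xdeadbeefu, 0xffffffffu
    };
    const unsigned count = sizeof samples / sizeof samples[0];

    for (unsigned i = 0; i < count; ++i) {
        for (uint32_t y = 0; y < 32u; ++y) {
            uint32_t x = samples[i];
            assert(bitalign_model(x, x, 32u - y) == rotate_model(x, y));
        }
    }
}
```

Calling check_equivalence() from a small main() should run without tripping an assertion. Since both forms return the same result, any speed difference would have to come from how the compiler maps each one to hardware instructions, which is exactly the information being asked for here.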
Is anyone aware of documentation along these lines that is freely available?