I'd like to learn how to mod ccminer to improve performance by moving the branching part to the CPU and running only the hashing algos on the GPU. Can someone guide me step by step, or at least tell me where I can find a guide for that? I understand it's difficult, but I'd appreciate any help.
What branching part?
As sp_ wrote, everything is already done on the GPU except the (unnecessary) CPU validation (where the result of the GPU algo is compared to the "original" CPU algo).
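To make the CPU validation step concrete, here is a minimal sketch of the idea, not ccminer's actual code. The names `reference_hash_cpu` and `validate_on_cpu` are hypothetical, the hash is a toy 64-bit mixer standing in for the real CPU reference algo, and the target is a single 64-bit value rather than a full-width one.

```cpp
#include <cstdint>

// Toy 64-bit mixer standing in for the "original" CPU reference algo.
// ASSUMPTION: this is NOT a real mining hash, it only makes the sketch runnable.
static uint64_t reference_hash_cpu(uint64_t header, uint32_t nonce) {
    uint64_t x = header ^ (static_cast<uint64_t>(nonce) * 0x9E3779B97F4A7C15ULL);
    x ^= x >> 33;
    x *= 0xFF51AFD7ED558CCDULL;
    x ^= x >> 33;
    return x;
}

// Hypothetical host-side check: the GPU reports a candidate nonce, the CPU
// recomputes the hash with the reference implementation and accepts the
// nonce only if that hash meets the target.
static bool validate_on_cpu(uint64_t header, uint32_t gpu_nonce, uint64_t target) {
    return reference_hash_cpu(header, gpu_nonce) <= target;
}
```

The point is that this check runs once per candidate the GPU reports, not once per hash, so it is a tiny fraction of the total work.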
If by branching you mean merging all the kernels into one big kernel, that will kill performance (not enough registers).
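To put rough numbers on the register argument, here is a back-of-the-envelope sketch. It assumes 65,536 32-bit registers and at most 2,048 resident threads per multiprocessor, figures that hold for many NVIDIA GPUs but should be checked for your card. The function name `max_resident_threads` is made up for illustration, and the estimate ignores register allocation granularity, shared memory, and block-count limits.

```cpp
#include <algorithm>
#include <cstdint>

// ASSUMED hardware limits; look up the real values for your device.
static const uint32_t kRegsPerSM       = 65536; // 32-bit registers per multiprocessor
static const uint32_t kMaxThreadsPerSM = 2048;  // resident-thread cap per multiprocessor
static const uint32_t kWarpSize        = 32;    // threads are scheduled in warps of 32

// Rough upper bound on resident threads per multiprocessor, considering
// register usage only. Precondition: regs_per_thread > 0.
static uint32_t max_resident_threads(uint32_t regs_per_thread) {
    uint32_t by_regs = kRegsPerSM / regs_per_thread; // threads that fit in the register file
    by_regs -= by_regs % kWarpSize;                  // round down to whole warps
    return std::min(by_regs, kMaxThreadsPerSM);
}
```

Under these assumptions, a kernel using 32 registers per thread can keep 2,048 threads resident, one using 64 registers drops to 1,024, and one using 128 registers drops to 512. A merged kernel has to reserve registers for its most demanding stage for the whole run, so fewer threads stay resident to hide latency, and anything that does not fit gets spilled to slower memory.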