Hi,
Sorry if this is the wrong forum.
I am currently creating a fairly simple bitcoin miner for CPU and GPU (purely for demonstration purposes, not to earn money).
However, I am having a little trouble understanding how GPU miners generally work. I do want a fairly simple version, but I hope to get one that performs decently (hopefully within 10-30% of normal miners, and definitely faster than the CPU version).
In general I would think the strategy you have for executing on the GPU is something like the below. I hope someone could help me out on whether I am doing something completely wrong and give me some pointers towards how you usually do it.
- Transfer the binary version of the data to hash to the kernel. (I noticed a lot of input arguments to the OpenCL kernels in some of the versions I have seen. I assume this is some sort of data transfer optimization; it looks like a midstate calculation that is passed to each kernel.)
- Calculate the double SHA-256 hash of the data. (Is it generally advisable to have a loop checking multiple nonces, or just one nonce per kernel?)
- Return a result. What is the best way of doing this? Do I check inside the kernel whether the hash is lower than the desired target, or do I return every hash and let the host check it for validity? If checking multiple nonces, I assume you should keep track of the best result found during the run.
I do have a general and very basic GPU implementation, but it is currently slower than my CPU implementation. I do more or less as above, where each kernel checks several nonces and returns the "best" one (i.e. the hash with the most trailing zeros, using the getwork protocol).
Usually in the nonce loop you want to return a result as soon as you find a diff 1 hash (4 bytes of zeroes).
If you don't have 4 bytes of zeroes, you can just continue with the next nonce.
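A rough CPU-side sketch of that loop in Python, just to show the logic (a real kernel would do this per work item in OpenCL). The names `double_sha256` and `scan_nonces` are made up for illustration, and I am assuming the 76 header bytes before the nonce are already in the byte order you hash. The check is on the last 4 bytes of the digest, which are the most significant ones when the hash is read as a little-endian number:

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as used for block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def scan_nonces(header76: bytes, start: int, count: int, zero_bytes: int = 4):
    """Return the first nonce in [start, start + count) whose digest ends in
    `zero_bytes` zero bytes, or None if there is none in that range.

    header76 is the 80-byte header without its final 4-byte nonce field.
    """
    for nonce in range(start, min(start + count, 2**32)):
        digest = double_sha256(header76 + struct.pack("<I", nonce))
        if digest[-zero_bytes:] == bytes(zero_bytes):
            return nonce  # report immediately, the host does the full target check
    return None
```

The kernel only needs this cheap test; the host can then compare any reported nonce against the real target, so nothing like a "best so far" value has to be tracked on the GPU.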
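Some things don't change when the nonce changes, so they can be precomputed outside of the nonce loop. For example, the nonce sits in the last 16 bytes of the 80-byte header, so the SHA-256 state after the first 64-byte block (the midstate) is the same for every nonce.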
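To illustrate the precomputation in Python: the first 64 bytes of the header never contain the nonce, so the hashing work on them can be done once and reused. This sketch leans on `hashlib`'s `copy()` to reuse the partially fed hash object instead of handling raw midstate words the way a kernel would; `make_midstate` and `hash_with_midstate` are hypothetical helper names:

```python
import hashlib
import struct


def make_midstate(header76: bytes):
    """Feed the first 64-byte block once; it does not depend on the nonce."""
    return hashlib.sha256(header76[:64])


def hash_with_midstate(midstate, tail12: bytes, nonce: int) -> bytes:
    """Finish the double SHA-256 for one nonce, starting from the shared state.

    tail12 holds header bytes 64..75 (the 12 bytes right before the nonce).
    """
    h = midstate.copy()  # the shared object is left untouched
    h.update(tail12 + struct.pack("<I", nonce))
    return hashlib.sha256(h.digest()).digest()


# Usage: build the shared state once, then reuse it for every nonce.
header76 = bytes(76)  # placeholder header data, not a real block
mid = make_midstate(header76)
digests = [hash_with_midstate(mid, header76[64:], n) for n in range(4)]
```

In a GPU miner the same idea usually shows up as the host computing the midstate once per work unit and passing it in as kernel arguments, which would explain the long argument lists in existing kernels.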
Did you use an unrolled version of SHA-256? Or do you have a for loop with 64 rounds?