Hi,
cudaMemset(outputPrefix, 0, 4); resets the prefix counter, which is a 32-bit variable; there's no need to reset anything else.
This variable is used in:
pos = atomicAdd(out, 1);
where out is a pointer (uint32_t *) to outputPrefix.
As for the functions themselves, everything is clear.
But logically, it does not add up:
this->outputSize = (maxFound * ITEM_SIZE + 4);
cudaMalloc((void **)&outputPrefix, outputSize);
cudaMemset(outputPrefix, 0, 4);
With maxFound = 65536 and ITEM_SIZE = 28, the current formula and the alternative one give, respectively:
outputSize = (65536 * 28 + 4) = 1835012 bytes
outputSize = 65536 * (28 + 4) = 2097152 = 2 ^ 21 bytes = 2048 KB = 2 MB
Conclusions:
Output memory allocated: 1835012 bytes
Output memory cleared: 4 bytes
This is probably a mistake?!
In the Launch function, the found values are appended to prefixFound, a vector of ITEM ("prefixFound.push_back(it);").
Next comes return callKernel();
This is where the data in the outputPrefix memory should be zeroed, but the program does not do that: only the lower 4 bytes are cleared,
and in the loop "while (ok && !flag_endOfSearch)" the buffer is copied to the host on every call to "ok = g.Launch(found);".
I verified this in practice in VanitySearch::FindKeyGPU with printf.
Shouldn't we replace
cudaMemset(outputPrefix, 0, 4);
with
cudaMemset(outputPrefix, 0, outputSize); // spinWait = true;
and
this->outputSize = (maxFound * ITEM_SIZE + 4);
with
this->outputSize = (maxFound * (ITEM_SIZE + 4));
since 1835012 is not a multiple of 8?