I committed a new Makefile with a debug option:
make clean
make gpu=1 debug=1 all
In debug mode no inlining is done, but it is obviously much slower. Then launch:
pons@linpons:~/VanitySearch$ ./VanitySearch -g 1 -check
|
|
|
Could you try this:
pons@linpons:~/VanitySearch$ /usr/local/cuda/bin/cuda-memcheck --tool memcheck VanitySearch -g 1 -check
On my Linux machine it does not work (the hardware is too old), but on Windows it ends like this:
C:\C++\VanitySearch\x64\ReleaseSM30>cuda-memcheck --tool memcheck VanitySearch.exe -g 1 -check
...
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 GeForce GTX 645 (3x192 cores) Grid(1x128)
Endianness: Little
Seed: 1006346800
401.220 KiloKey/sec
ComputeKeys() found 46 items , CPU check...
GPU/CPU check OK
========= ERROR SUMMARY: 0 errors
|
|
|
Just as an experiment, try reducing the number of threads per block from 128 to 64 and, if that works, doubling the number of blocks per grid using -g.
GPUEngine.h:28
#define NB_TRHEAD_PER_GROUP 64
There is a typo in the code (TRHEAD instead of THREAD).
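To make the suggestion concrete: halving the threads per block while doubling the -g value keeps the total number of GPU threads the same. Below is a minimal sketch of that arithmetic, assuming the total is simply blocks times threads per block. The helper name totalThreads is hypothetical and not part of VanitySearch.

```cpp
#include <cstdint>

// Hypothetical helper (not from VanitySearch): total GPU threads for a launch
// of `blocks` blocks with `threadsPerBlock` threads each.
constexpr std::uint64_t totalThreads(std::uint32_t blocks,
                                     std::uint32_t threadsPerBlock) {
    return static_cast<std::uint64_t>(blocks) * threadsPerBlock;
}

// Grid(1x128) and Grid(2x64) schedule the same number of threads,
// but the second uses smaller blocks.
static_assert(totalThreads(1, 128) == totalThreads(2, 64),
              "halving the block size and doubling the grid keeps the total");
```

Smaller blocks reduce the per-block resource demand, which is presumably why this is worth testing when a kernel is near the compiler or hardware limits.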
|
|
|
OK, that confirms what I was thinking. It seems that this code is now near the limit of what CUDA (or nvcc) can handle. Maybe CUDA SDK 10 can help. I'll try (for other users as well) to make things work with CUDA 10 under Linux. I'll also try to reduce the code size.
|
|
|
After the mark, 50% of the calculations are wrong. On my 2 configs, everything works fine. It really looks like the weird problem I had last time. _GetHash160Comp is OK; it is also tested alone by the check function. _ModMult is heavily used during the ECC calculation. CHECK_POINT() works 100% in the first case.
I would try:
__noinline__ on _ModMult and ModNeg256
Removing the whole lookup32 test in CheckPoint() (not used here)
I will add more info...
__device__ __noinline__ void CheckHashComp(prefix_t *prefix, uint64_t *px, uint64_t *py, int32_t incr, uint32_t tid, uint32_t *lookup32, uint32_t *out) {

  uint32_t h[20];
  uint64_t pe1x[4];
  uint64_t pe2x[4];

  _GetHash160Comp(px, py, (uint8_t *)h);
  CHECK_POINT(h, incr, 0);                  // <-- 100% OK up to here, means that (px,py) is good
  _ModMult(pe1x, px, _beta);
  _GetHash160Comp(pe1x, py, (uint8_t *)h);  // <-- 50% wrong from here
  CHECK_POINT(h, incr, 1);
  _ModMult(pe2x, px, _beta2);
  _GetHash160Comp(pe2x, py, (uint8_t *)h);
  CHECK_POINT(h, incr, 2);

  ModNeg256(py);

  _GetHash160Comp(px, py, (uint8_t *)h);
  CHECK_POINT(h, -incr, 0);
  _GetHash160Comp(pe1x, py, (uint8_t *)h);
  CHECK_POINT(h, -incr, 1);
  _GetHash160Comp(pe2x, py, (uint8_t *)h);
  CHECK_POINT(h, -incr, 2);

}
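For readers unfamiliar with why one point yields six checks: the function above derives two extra x-coordinates by multiplying px by a cube root of unity (the endomorphism), then negates py (the symmetry). Here is a toy sketch of that idea on the tiny field F_13, assuming the curve y^2 = x^3 + 7. The parameters P, BETA and BETA2 and the helper names are illustrative assumptions, not the real 256-bit secp256k1 constants or VanitySearch functions.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Toy parameters (assumption, for illustration only): y^2 = x^3 + 7 over F_13.
constexpr std::uint32_t P = 13;
constexpr std::uint32_t BETA = 3;   // 3^3 = 27 = 1 (mod 13): a cube root of unity
constexpr std::uint32_t BETA2 = 9;  // BETA^2 mod 13

using Point = std::pair<std::uint32_t, std::uint32_t>;

// True if the point satisfies y^2 = x^3 + 7 (mod P).
bool onCurve(const Point& pt) {
    std::uint32_t x = pt.first % P, y = pt.second % P;
    return (y * y) % P == (x * x * x + 7) % P;
}

// From one valid point, derive the six candidates: the point, its two
// endomorphism images (beta*x, y) and (beta^2*x, y), and the three
// symmetric points obtained by negating y.
std::array<Point, 6> candidates(const Point& pt) {
    std::uint32_t x = pt.first % P, y = pt.second % P;
    std::uint32_t x1 = (x * BETA) % P;
    std::uint32_t x2 = (x * BETA2) % P;
    std::uint32_t ny = (P - y) % P;
    std::array<Point, 6> out = {{Point{x, y}, Point{x1, y}, Point{x2, y},
                                 Point{x, ny}, Point{x1, ny}, Point{x2, ny}}};
    return out;
}
```

Since (beta*x)^3 = x^3 whenever beta^3 = 1, all six candidates lie on the curve as long as the modular multiplication is correct. That is why a correct (px,py) with wrong endomorphism results points towards the multiplication step rather than the point itself.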
|
|
|
OK, thanks, I will investigate. It is quite strange: the point seems OK, but all the symmetric and endomorphism results are sometimes right and sometimes wrong. Funny bug.
|
|
|
Also try the -g option to reduce the number of GPU threads. The -g option must be placed before the -check option on the command line.
|
|
|
Are the results constant? I mean, for instance, is "GPU: point correct [252/252]" always at 100%?
|
|
|
Hello,
I added more details to the check function. I also enabled #define FULLCHECK (GPUEngine.cu:1163), which performs individual tests on the hash function and ModularMult.
|
|
|
771 lines of "Expected item not found"
This time all wrong.
|
|
|
Tomorrow I will add more details to the check function to try to understand what is going wrong. Since everything works fine on my 2 configs, it is very difficult to debug.
|
|
|
OK. That means that 1545 - 549 are OK. Another idea would be to add __noinline__ to _ModInvGrouped (line 594 of GPUEngine.cu). I committed GPUEngine.cu (I just removed unused functions), so the line number may be off.
|
|
|
What you can try is to remove the endomorphism for checking. Comment out the following code in GPUEngine.cu around line 1843 (in the Check() function):
/*
secp.GetHash160(p1, searchComp, h);
pr = *(prefix_t *)h;
if (pr == 0xFEFE || pr == 0x1234) {
  nbFoundCPU++;
  ok &= CheckHash(h, found);
}
secp.GetHash160(p2, searchComp, h);
pr = *(prefix_t *)h;
if (pr == 0xFEFE || pr == 0x1234) {
  nbFoundCPU++;
  ok &= CheckHash(h, found);
}
*/

// Symetrics
pt.y.ModNeg();
p1.y.ModNeg();
p2.y.ModNeg();

secp.GetHash160(pt, searchComp, h);
pr = *(prefix_t *)h;
if (pr == 0xFEFE || pr == 0x1234) {
  nbFoundCPU++;
  ok &= CheckHash(h, found);
}
/*
secp.GetHash160(p1, searchComp, h);
pr = *(prefix_t *)h;
if (pr == 0xFEFE || pr == 0x1234) {
  nbFoundCPU++;
  ok &= CheckHash(h, found);
}
secp.GetHash160(p2, searchComp, h);
pr = *(prefix_t *)h;
if (pr == 0xFEFE || pr == 0x1234) {
  nbFoundCPU++;
  ok &= CheckHash(h, found);
}
*/
And the following in GPUCompute.h around line 78 (in the CheckHashComp() function):
_GetHash160Comp(px, py, (uint8_t *)h);
CHECK_POINT(h, incr, 0);
/*
_ModMult(pe1x, px, _beta);
_GetHash160Comp(pe1x, py, (uint8_t *)h);
CHECK_POINT(h, incr, 1);
_ModMult(pe2x, px, _beta2);
_GetHash160Comp(pe2x, py, (uint8_t *)h);
CHECK_POINT(h, incr, 2);
*/

ModNeg256(py);

_GetHash160Comp(px, py, (uint8_t *)h);
CHECK_POINT(h, -incr, 0);
/*
_GetHash160Comp(pe1x, py, (uint8_t *)h);
CHECK_POINT(h, -incr, 1);
_GetHash160Comp(pe2x, py, (uint8_t *)h);
CHECK_POINT(h, -incr, 2);
*/
And then rebuild:
make gpu=1 clean
make gpu=1 all
|
|
|
At the end of the list you have the number of items found by the CPU. If this number is equal to 549, that means that all are wrong.
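One way to read that rule: each item the CPU computes but the GPU did not report produces one "Expected item not found" line, so when the number of such lines equals the CPU total, nothing matched. A rough sketch of that bookkeeping follows. The CheckSummary type, the compare function and the use of 64-bit ids to stand for items are all hypothetical and do not reflect the actual Check() implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical summary of a GPU/CPU cross-check (not the VanitySearch types).
struct CheckSummary {
    std::size_t expected;  // items found by the CPU
    std::size_t missing;   // CPU items absent from the GPU output
    bool allWrong() const { return expected > 0 && missing == expected; }
    bool allOk() const { return missing == 0; }
};

// Count how many CPU-computed items the GPU failed to report.
// Items are modelled as opaque 64-bit ids (an assumption for illustration).
CheckSummary compare(const std::vector<std::uint64_t>& cpuItems,
                     const std::vector<std::uint64_t>& gpuItems) {
    std::unordered_set<std::uint64_t> gpu(gpuItems.begin(), gpuItems.end());
    std::size_t missing = 0;
    for (std::uint64_t id : cpuItems) {
        if (gpu.count(id) == 0) ++missing;  // would print one "not found" line
    }
    return {cpuItems.size(), missing};
}
```

With this model, a partial failure shows up as 0 < missing < expected, which matches the earlier "1545 - 549 are OK" reading.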
|
|
|
They are not compile errors; it is the verbose output of nvcc. _Z10CheckPointPjiiPtjS_S_ is the mangled name of the function CheckPoint in GPUCompute.h, which I added in the last release.
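If you want to verify a mangled name yourself, here is a small sketch using abi::__cxa_demangle from <cxxabi.h>. This assumes a GCC or Clang toolchain (the header is not available with MSVC), and the wrapper name demangle is just an illustrative helper.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle an Itanium-ABI symbol name (GCC/Clang only).
// Returns the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return mangled;
    std::string result(out);
    std::free(out);  // the buffer returned by __cxa_demangle is malloc'ed
    return result;
}
```

Calling demangle("_Z10CheckPointPjiiPtjS_S_") should give a signature starting with CheckPoint(. The c++filt command-line tool does the same job without writing any code.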
|
|
|
The check works flawlessly on my configs, on both Linux and Windows. It should work. There is something wrong somewhere, but where...
|
|
|
I found an illegal memory access and committed a fix to the git source. Please test whether it is better or not...
|
|
|
Good news. But when I restore the group size and go back to my Linux git repo, the problem appears. I will try to debug this.
|
|
|
The 1111 issue is not a difficult problem to solve. But the other one, concerning the GPU, f...ing hell!
And everything works fine on my 8-year-old Quadro 600!
|
|
|
I confirm that there is a problem with the difficulty calculation when searching for a prefix like 1111something...
|
|
|
|