Jean_Luc (OP)
|
|
March 17, 2019, 03:05:11 PM |
|
Could you try this:
pons@linpons:~/VanitySearch$ /usr/local/cuda/bin/cuda-memcheck --tool memcheck VanitySearch -g 1 -check
It does not work on my Linux machine (the hardware is too old), but on Windows it ends like this:
C:\C++\VanitySearch\x64\ReleaseSM30>cuda-memcheck --tool memcheck VanitySearch.exe -g 1 -check
...
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 GeForce GTX 645 (3x192 cores) Grid(1x128)
Endianness: Little
Seed: 1006346800
401.220 KiloKey/sec
ComputeKeys() found 46 items , CPU check...
GPU/CPU check OK
========= ERROR SUMMARY: 0 errors
|
|
|
|
Jean_Luc (OP)
|
|
March 17, 2019, 03:47:15 PM |
|
I committed a new Makefile with a debug option:
make clean
make gpu=1 debug=1 all
In debug mode no inlining is done, so it is obviously much slower. Then launch:
pons@linpons:~/VanitySearch$ ./VanitySearch -g 1 -check
|
|
|
|
arulbero
Legendary
Offline
Activity: 1915
Merit: 2074
|
|
March 17, 2019, 04:09:58 PM Last edit: March 17, 2019, 04:20:11 PM by arulbero |
|
Quote from: Jean_Luc on March 17, 2019, 03:47:15 PM
"I committed a new Makefile with debug option.
make clean
make gpu=1 debug=1 all
In debug mode no inlining is done. But, obviously it is much slower. So launch
pons@linpons:~/VanitySearch$ ./VanitySearch -g 1 -check"
./VanitySearch -g 1 -check
GetBase10() Results OK
Add() Results OK : 108.696 MegaAdd/sec
Mult() Results OK : 10.684 MegaMult/sec
Div() Results OK : 1.656 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() Results OK : 132.041 KiloInv/sec
IntGroup.ModInv() Results OK : 2.222 MegaInv/sec
ModMulK1() Results OK : 3.661 MegaMult/sec
ModMulK1order() Results OK : 1.700 MegaMult/sec
ModSqrt() Results OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 Quadro M2200 (8x128 cores) Grid(1x128)
Seed: 888394
193.110 KiloKey/sec
ComputeKeys() found 26 items , CPU check...
GPU/CPU check OK
~/VanitySearch$ /usr/local/cuda-8.0/bin/cuda-memcheck --tool memcheck VanitySearch -g 1 -check
========= CUDA-MEMCHECK
GetBase10() Results OK
Add() Results OK : 109.890 MegaAdd/sec
Mult() Results OK : 10.695 MegaMult/sec
Div() Results OK : 1.818 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() Results OK : 130.572 KiloInv/sec
IntGroup.ModInv() Results OK : 2.182 MegaInv/sec
ModMulK1() Results OK : 3.602 MegaMult/sec
ModMulK1order() Results OK : 1.684 MegaMult/sec
ModSqrt() Results OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 Quadro M2200 (8x128 cores) Grid(1x128)
Seed: 781110
15.061 KiloKey/sec
ComputeKeys() found 26 items , CPU check...
GPU/CPU check OK
========= ERROR SUMMARY: 0 errors
~/VanitySearch$ /usr/local/cuda-8.0/bin/cuda-memcheck --tool memcheck VanitySearch -g 32 -check
========= CUDA-MEMCHECK
GetBase10() Results OK
Add() Results OK : 80.000 MegaAdd/sec
Mult() Results OK : 10.030 MegaMult/sec
Div() Results OK : 1.883 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() Results OK : 130.924 KiloInv/sec
IntGroup.ModInv() Results OK : 2.221 MegaInv/sec
ModMulK1() Results OK : 3.659 MegaMult/sec
ModMulK1order() Results OK : 1.704 MegaMult/sec
ModSqrt() Results OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 Quadro M2200 (8x128 cores) Grid(32x128)
Seed: 639838
59.308 KiloKey/sec
ComputeKeys() found 721 items , CPU check...
GPU/CPU check OK
========= ERROR SUMMARY: 0 errors
|
|
|
|
Jean_Luc (OP)
|
|
March 17, 2019, 04:55:04 PM |
|
OK, thanks. Could you try to run cuda-memcheck on the release version?
|
|
|
|
arulbero
Legendary
Offline
Activity: 1915
Merit: 2074
|
|
March 17, 2019, 05:32:56 PM |
|
Quote from: Jean_Luc on March 17, 2019, 04:55:04 PM
"Ok Thanks, could you try to run cuda-memcheck on the release version."
~/VanitySearch-1.8$ /usr/local/cuda-8.0/bin/cuda-memcheck --tool memcheck VanitySearch -g 1 -check
========= CUDA-MEMCHECK
GetBase10() Results OK
Add() Results OK : 123.457 MegaAdd/sec
Mult() Results OK : 23.148 MegaMult/sec
Div() Results OK : 5.208 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() : 341.317 KiloInv/sec
IntGroup.ModInv() : 9.130 MegaInv/sec
ModMulK1() : 12.968 MegaMult/sec
ModSqrt() OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 Quadro M2200 (8x128 cores) Grid(1x128)
Seed: 223215
95.697 KiloKey/sec
ComputeKeys() found 26 items , CPU check...
Expected item not found 3412bb65 cb39a716 67dcd486 209b19df c65e364c
Expected item not found fefea644 d535267a 46308e46 c579e91b 0aad3ee2
Expected item not found 3412726b 9830f325 9c5f0d95 a99e2a9b 6c473922
Expected item not found 341292e1 b4a39d2c 59e34f3d 38725b42 dfc2e801
Expected item not found fefeba57 c1209e3d 1b79200c b9529018 de0e35e4
Expected item not found fefe4aaa 34f02402 4ed76c83 a1d60efc 8c79f7a6
Expected item not found fefe8742 63e9b7bc b13a08f1 28229fd8 30987ed3
CPU found 22 items
========= ERROR SUMMARY: 0 errors
|
|
|
|
stivensons
Jr. Member
Offline
Activity: 82
Merit: 1
|
|
March 17, 2019, 06:04:15 PM |
|
Quote from: Jean_Luc on March 17, 2019, 04:55:04 PM
"Ok Thanks, could you try to run cuda-memcheck on the release version."
If you post a Windows release, I can test it too.
|
|
|
|
Jean_Luc (OP)
|
|
March 17, 2019, 06:28:46 PM |
|
Quote from: stivensons on March 17, 2019, 06:04:15 PM
"if you post a release windows , I can test it too"
You can test with the release you have. You can try:
VanitySearch -gpuId 0 -check
VanitySearch -gpuId 6 -check (on the 3GB)
Thanks. Tomorrow I will try to set up CUDA SDK 10 on recent hardware (Linux) and see if I can reproduce the issue.
|
|
|
|
stortz
Jr. Member
Offline
Activity: 40
Merit: 15
|
|
March 17, 2019, 10:43:52 PM |
|
I tried your program with the parameters shown in the sample plus my username: -stop -gpu 1stortz
It ran, but just closed after finding it. Did it write the private keys to a file? I am confused.
|
|
|
|
stivensons
Jr. Member
Offline
Activity: 82
Merit: 1
|
|
March 18, 2019, 05:12:04 AM |
|
Quote from: Jean_Luc on March 17, 2019, 06:28:46 PM
"(Quoting stivensons: if you post a release windows , I can test it too)
You can test with the release you have. You can try:
VanitySearch -gpuId 0 -check
VanitySearch -gpuId 6 -check (On the 3GB)
Thanks. Tomorrow, I will try to set up cuda sdk 10 on a recent hardware (Linux) and see if I can reproduce the issue."
cuda 10
G:\vanitysearch>vanitysearch -gpuId 0 -check
GetBase10() Results OK
Add() Results OK : 567.189 MegaAdd/sec
Mult() Results OK : 38.169 MegaMult/sec
Div() Results OK : 4.410 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() : 281.352 KiloInv/sec
IntGroup.ModInv() : 8.365 MegaInv/sec
ModMulK1() : 10.770 MegaMult/sec
ModSqrt() OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #0 GeForce GTX 1060 6GB (10x128 cores) Grid(80x128)
Seed: 1853432973
296.742 MegaKey/sec
ComputeKeys() found 1947 items , CPU check...
GPU/CPU check OK
G:\vanitysearch>vanitysearch -gpuId 6 -check
GetBase10() Results OK
Add() Results OK : 556.067 MegaAdd/sec
Mult() Results OK : 35.273 MegaMult/sec
Div() Results OK : 4.104 MegaDiv/sec
ModInv()/ModExp() Results OK
ModInv() : 260.561 KiloInv/sec
IntGroup.ModInv() : 7.773 MegaInv/sec
ModMulK1() : 9.881 MegaMult/sec
ModSqrt() OK !
Check Generator :OK
Check Double :OK
Check Add :OK
Check GenKey :OK
Adress : 15t3Nt1zyMETkHbjJTTshxLnqPzQvAtdCe OK!
Adress : 1BoatSLRHtKNngkdXEeobR76b53LETtpyT OK!
Adress : 1JeanLucgidKHxfY5gkqGmoVjo1yaU4EDt OK(comp)!
Adress : 1Test6BNjSJC5qwYXsjwKVLvz7DpfLehy OK!
Adress : 1BitcoinP7vnLpsUHWbzDALyJKnNo16Qms OK(comp)!
Check Calc PubKey (full) 1ViViGLEawN27xRzGrEhhYPQrZiTKvKLo :OK
Check Calc PubKey (even) 1Gp7rQ4GdooysEAEJAS2o4Ktjvf1tZCihp:OK
Check Calc PubKey (odd) 18aPiLmTow7Xgu96msrDYvSSWweCvB9oBA:OK
GPU: GPU #6 GeForce GTX 1060 3GB (9x128 cores) Grid(72x128)
Seed: 2205931314
260.131 MegaKey/sec
ComputeKeys() found 1752 items , CPU check...
GPU/CPU check OK
|
|
|
|
Jean_Luc (OP)
|
|
March 18, 2019, 06:53:09 AM |
|
Hello,
Quote from: stortz on March 17, 2019, 10:43:52 PM
"it ran, but just closed after finding it did it generate the private keys into a file? I am confused"
To write the key to a file, use the -o option:
VanitySearch -stop -gpu -o key.txt 1stortz
Many thanks to stivensons for the report.
|
|
|
|
|
Lisa Finn
Newbie
Offline
Activity: 4
Merit: 0
|
|
March 19, 2019, 06:11:55 PM |
|
Quote from: Jean_Luc
"Hello,
I would like to present a new bitcoin prefix address finder called VanitySearch. It is very similar to Vanitygen. The main differences with Vanitygen are that VanitySearch is not using the heavy OpenSSL for CPU calculation and that the kernel is written in Cuda in order to take full advantage of inline PTX assembly.
On my Intel Core i7-4770, VanitySearch runs ~4 times faster than vanitygen64. (1.32 Mkey/s -> 5.27 MK/s)
On my GeForce GTX 645, VanitySearch runs ~1.5 times faster than oclvanitygen. (9.26 Mkey/s -> 14.548 MK/s)
If you want to compare VanitySearch and Vanitygen result, use the -u option for searching uncompressed address.
VanitySearch may not compute a good gridsize for your GPU, so make several tries using -g options in order to find best performances.
Using compressed addresses is roughly 20% faster.
VanitySearch is available from https://github.com/JeanLucPons/VanitySearch
There is still lots of improvement to do. Feel free to test it and to submit issue.
Thanks. Sorry for my bad English.
Jean-Luc"
Is this really legal in Asian countries like India?
|
|
|
|
TryNinja
Legendary
Offline
Activity: 2954
Merit: 7379
|
|
March 19, 2019, 06:52:37 PM |
|
Why would a Bitcoin address generator be illegal anywhere?
|
|
|
|
|
arulbero
Legendary
Offline
Activity: 1915
Merit: 2074
|
|
March 20, 2019, 09:40:53 AM |
|
Quote:
"A new release of VanitySearch (1.9) is out:
Added -b option (Search compressed or uncompressed addresses)
Improved performance for loading large prefix list
Fixed difficulty calculation bug for prefix containing only '1'"
The new version is slower on my PC (132 MKeys/s against 162 MKeys/s).
|
|
|
|
Jean_Luc (OP)
|
|
March 20, 2019, 10:07:38 AM |
|
Quote from: arulbero on March 20, 2019, 09:40:53 AM
"New version is slower on my pc (132 MKeys/s against 162 MKeys/s)."
On my Windows machine (CUDA 10), performance is the same as with the previous release. It is slightly slower on Linux (CUDA 8.0): from 39.5 MK/s to 37.9 MK/s.
Anyway, do you compile it yourself or do you use the Linux binaries? Did you solve your problem? I have not managed to reproduce the issue yet.
|
|
|
|
arulbero
Legendary
Offline
Activity: 1915
Merit: 2074
|
|
March 20, 2019, 11:36:29 AM |
|
Quote from: Jean_Luc on March 20, 2019, 10:07:38 AM
"(Quoting arulbero: New version is slower on my pc (132 MKeys/s against 162 MKeys/s).)
On my Windows, performance are the same than the previous release (Cuda 10). Slightly slower on Linux (Cuda 8.0), from 39.5MK/s to 37.9MK/s. Anyway, Do you compile or do you use Linux binaries ? Do you solved your problem ? I didn't manage to reproduce the issue yet."
I compile the source myself. No, my problem is not solved. I have only CUDA 8.0.
Some ideas for (maybe) a little speed improvement:
1) In __device__ void ComputeKeys (GPUCompute.h), instead of doing HSIZE times
ModNeg256(dy,Gy[i]); <--
ModSub256(dy, py);
you could do: ModSub256(dy, pyn, Gy[i]);
and you compute only once pyn: ModNeg256(pyn,py);
2) instead of ModAdd256(py, Gy[i]);
ModSub256(py, sy);
To sum up:
ModSub256(dy, pyn, Gy[i]);
_ModMult(_s, dy, dx[i]); // s = (p2.y-p1.y)*inverse(p2.x-p1.x)
_ModMult(_p2, _s, _s); // _p2 = pow2(s)
ModSub256(px, _p2, px);
ModSub256(px, Gx[i]); // px = pow2(s) - p1.x - p2.x;
ModSub256(py, sx, px);
_ModMult(py, _s); // py = - s*(ret.x-p2.x)
ModSub256(py, sy); // py = - p2.y - s*(ret.x-p2.x);
3) In __device__ void ModSub256, instead of
if ((int64_t)t < 0) {
  UADDO1(r[0], _P[0]);
  UADDC1(r[1], _P[1]);
  UADDC1(r[2], _P[2]);
  UADD1(r[3], _P[3]);
}
something like this would be better:
if ((int64_t)t < 0) {
  USUBO1(r[0], 0x01000003d1);
  USUBC1(r[1], 0ULL);
  USUBC1(r[2], 0ULL);
  USUBC1(r[3], 0ULL);
}
(I'm not sure what C means; I suppose it means "with carry".)
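Editorial sketch: the rearrangement in points 1) and 2) can be sanity-checked on a toy curve before touching the kernel. The host-side C++ below is not VanitySearch code. It assumes a tiny prime (17) with the secp256k1 equation y^2 = x^3 + 7, reads the ModNeg256/ModSub256 pair as computing start + (-G), and uses hypothetical names (mod, inv, addTextbook, addNegHoisted). It only shows that pre-negating the start point's y once gives the same result as the textbook formula.

```cpp
#include <cassert>
#include <cstdint>

// Toy field F_17 and curve y^2 = x^3 + 7 (same equation as secp256k1, tiny prime).
// Every name below is hypothetical and chosen for illustration only.
constexpr int64_t kP = 17;

int64_t mod(int64_t v) { return ((v % kP) + kP) % kP; }

// Modular inverse via Fermat's little theorem: a^(kP-2) mod kP.
int64_t inv(int64_t a) {
    int64_t base = mod(a), result = 1;
    assert(base != 0);  // zero has no inverse
    for (int64_t e = kP - 2; e > 0; e >>= 1) {
        if (e & 1) result = mod(result * base);
        base = mod(base * base);
    }
    return result;
}

struct Point { int64_t x, y; };

bool onCurve(Point p) { return mod(p.y * p.y) == mod(p.x * p.x * p.x + 7); }

// Textbook addition p1 + p2 for points with different x coordinates.
Point addTextbook(Point p1, Point p2) {
    int64_t s = mod((p2.y - p1.y) * inv(p2.x - p1.x));
    int64_t x = mod(s * s - p1.x - p2.x);
    int64_t y = mod(s * (p1.x - x) - p1.y);
    return {x, y};
}

// start + (-g), arranged like the listing: startYn = -start.y is computed once
// by the caller, so dy needs a single subtraction per table entry.
Point addNegHoisted(Point start, int64_t startYn, Point g) {
    int64_t dy = mod(startYn - g.y);             // (-g.y) - start.y
    int64_t s  = mod(dy * inv(g.x - start.x));   // slope
    int64_t x  = mod(s * s - start.x - g.x);
    int64_t y  = mod(mod((start.x - x) * s) - start.y);
    return {x, y};
}
```

With start = (1, 5) and g = (5, 9), both functions return (2, 7), which lies on the toy curve. The saving in the real kernel would come from computing the negated y once per thread instead of once per table entry.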
|
|
|
|
arulbero
Legendary
Offline
Activity: 1915
Merit: 2074
|
|
March 20, 2019, 11:47:42 AM |
|
Another sub function, if you want to test it:
__device__ void ModSub256(uint64_t *rp, uint64_t *ap, uint64_t *bp) {
  uint64_t a0, a1, a2, a3, b0, b1, b2, b3, r0, r1, r2, r3;
  int8_t c0, c1, c2, c3;
  a0 = ap[0]; a1 = ap[1]; a2 = ap[2]; a3 = ap[3];
  b0 = bp[0]; b1 = bp[1]; b2 = bp[2]; b3 = bp[3];
  /*
  r0 = a0 - b0;
  c0 = (a0 < b0) ? 1 : -1;
  c0 = (r0 == 0) ? 0 : c0;
  r1 = a1 - b1;
  c1 = (a1 < b1) ? 1 : -1;
  c1 = (r1 == 0) ? c0 : c1;
  r1 = r1 - (c0 == 1);
  r2 = a2 - b2;
  c2 = (a2 < b2) ? 1 : -1;
  c2 = (r2 == 0) ? c1 : c2;
  r2 = r2 - (c1 == 1);
  r3 = a3 - b3;
  c3 = (a3 < b3) ? 1 : -1;
  c3 = (r3 == 0) ? c2 : c3;
  r3 = r3 - (c2 == 1);
  */
  c0 = a0 < b0;
  r0 = a0 - b0;
  c1 = a1 < b1;
  r1 = a1 - b1;
  if(r1 == 0){ c1 = c0;}
  if(c0) {r1 = r1 - 1;}
  c2 = a2 < b2;
  r2 = a2 - b2;
  if(r2 == 0){ c2 = c1;}
  if(c1) {r2 = r2 - 1;}
  c3 = a3 < b3;
  r3 = a3 - b3;
  if(r3 == 0){ c3 = c2;}
  if(c2) {r3 = r3 - 1;}
  if(c3 == 1){
    if(r0 > 0x1000003d0){ //almost always --> no borrow
      r0 = r0 - 0x1000003d1;
    }
    else{
      //c[0] = (r0 < 0x1000003d1) ? 1 : -1;
      //c0 = (r0 == 0x1000003d1) ? 0 : 1;
      //c0 = 1; // for sure r0 < 0x1000003d1
      r0 = r0 - 0x1000003d1;
      r1 = r1 - 1; //c0 is 1
      c1 = (r1 == 0xffffffffffffffff) ? 1 : -1;
      c2 = (r2 == 0) ? c1 : -1;
      if(c1 == 1) r2 = r2 - 1;
      if(c2 == 1) r3 = r3 - 1;
    };
  };
  rp[0] = r0; rp[1] = r1; rp[2] = r2; rp[3] = r3;
  return;
}
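Editorial sketch: the final branch of this function relies on the secp256k1 field prime being P = 2^256 - 0x1000003d1, so adding P modulo 2^256 gives the same limbs as subtracting 0x1000003d1. The host-side C++ below is a minimal check of that identity, not the project's code. U256, addWrap, subWrap, modSubAddP and modSubSubC are hypothetical names, and inputs are assumed to lie in [0, P).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using U256 = std::array<uint64_t, 4>;  // little-endian 64-bit limbs

// secp256k1 field prime P = 2^256 - 0x1000003d1, and the constant 2^256 - P.
const U256 kP = {0xFFFFFFFEFFFFFC2FULL, 0xFFFFFFFFFFFFFFFFULL,
                 0xFFFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFFULL};
const U256 kC = {0x1000003d1ULL, 0, 0, 0};

// r = a + b mod 2^256 (the carry out of the top limb is dropped).
U256 addWrap(const U256& a, const U256& b) {
    U256 r{};
    uint64_t carry = 0;
    for (int i = 0; i < 4; ++i) {
        uint64_t t = a[i] + carry;
        uint64_t c1 = t < carry;        // overflow from adding the carry
        r[i] = t + b[i];
        carry = c1 | (r[i] < b[i]);     // overflow from adding b[i]
    }
    return r;
}

// r = a - b mod 2^256; *borrowOut (if given) is set to true when a < b.
U256 subWrap(const U256& a, const U256& b, bool* borrowOut) {
    U256 r{};
    uint64_t borrow = 0;
    for (int i = 0; i < 4; ++i) {
        uint64_t t = a[i] - borrow;
        uint64_t b1 = a[i] < borrow;    // underflow from taking the borrow
        r[i] = t - b[i];
        borrow = b1 | (t < b[i]);       // underflow from subtracting b[i]
    }
    if (borrowOut) *borrowOut = (borrow != 0);
    return r;
}

// (a - b) mod P, corrected by adding P when the raw difference is negative.
U256 modSubAddP(const U256& a, const U256& b) {
    bool neg = false;
    U256 r = subWrap(a, b, &neg);
    return neg ? addWrap(r, kP) : r;
}

// Same result, corrected by subtracting 0x1000003d1 instead.
U256 modSubSubC(const U256& a, const U256& b) {
    bool neg = false;
    U256 r = subWrap(a, b, &neg);
    return neg ? subWrap(r, kC, nullptr) : r;
}
```

Both variants return P - 1 for 0 - 1. The second form touches three zero limbs, which is why the function above can usually stop after the lowest limb.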
|
|
|
|
Jean_Luc (OP)
|
|
March 20, 2019, 11:50:28 AM |
|
Many thanks for the tips, I will try this.
Don't you want to try the binary? The libcudart.so.10.0 is also available from the given link. You do not need to set up CUDA SDK 10 (unless a driver problem appears, but this may work without installing anything). You can just copy VanitySearch50 and libcudart.so.10.0 into a directory and set LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=.
./VanitySearch50 ...
This is mainly to see whether the problem is solved with CUDA 10 or comes from elsewhere.
|
|
|
|
Jean_Luc (OP)
|
|
March 20, 2019, 11:56:10 AM |
|
Quote from: arulbero on March 20, 2019, 11:36:29 AM
"(I'm not sure what C means, I suppose means with carry)"
Yes:
ADDO is the initial add, without carry in, and sets the carry flag
ADDC is add with carry and sets the carry flag
ADD is add with carry and does not set the carry flag
Same for SUB.
A function may have a 1 suffix for its unary form.
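Editorial sketch: the three roles described above can be modelled in portable C++ with an explicit flag standing in for the hardware carry. Carry, addo, addc, addLast and add256 are hypothetical names, not the project's macros. In PTX these roles correspond to add.cc, addc.cc and addc; whether the macros expand to exactly those instructions is an assumption here.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the hardware carry flag (illustration only).
struct Carry { bool flag = false; };

// ADDO role: no carry in, writes the carry flag.
inline uint64_t addo(uint64_t a, uint64_t b, Carry& cf) {
    uint64_t r = a + b;
    cf.flag = r < a;
    return r;
}

// ADDC role: consumes the carry flag and writes it again.
inline uint64_t addc(uint64_t a, uint64_t b, Carry& cf) {
    uint64_t t = a + (cf.flag ? 1 : 0);
    bool c1 = t < a;                 // overflow from adding the carry
    uint64_t r = t + b;
    cf.flag = c1 || (r < b);         // overflow from adding b
    return r;
}

// ADD role: consumes the carry flag, leaves it untouched (end of the chain).
inline uint64_t addLast(uint64_t a, uint64_t b, const Carry& cf) {
    return a + b + (cf.flag ? 1 : 0);
}

// 256-bit addition over four little-endian limbs, modulo 2^256.
void add256(uint64_t r[4], const uint64_t a[4], const uint64_t b[4]) {
    Carry cf;
    r[0] = addo(a[0], b[0], cf);
    r[1] = addc(a[1], b[1], cf);
    r[2] = addc(a[2], b[2], cf);
    r[3] = addLast(a[3], b[3], cf);
}
```

Adding 1 to a value whose two low limbs are all ones carries through both limbs and yields 1 in the third limb, which exercises each of the three roles once.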
|
|
|
|
|