Hi,
Yes, there is a bug with suggestedDP when log2(nbKangaroo) > log2(sqrt(N)). In the case above the suggested DP should be 0.
As said before, to avoid a large DP overhead, it is important to choose DP such that nbKangaroo.2^dpbit << sqrt(N), and here you have nbKangaroo.2^dpbit >> sqrt(N)! So for a 40-bit search, 2^23.9 kangaroos is too much. As said in the readme, "Powerfull GPUs with large number of cores won't be very efficient on small range", so to solve a 40-bit search, use only the CPU, even with -t 1 (1 thread is enough): it will solve a 40-bit key in a few seconds.
|
|
|
Many thanks to all of you for these tests ![Wink](https://bitcointalk.org/Smileys/default/wink.gif) I will have a look at that tomorrow. Concerning the server, in the 2.0 the DP checking is enabled. It may slow down the server if there are too many DPs. If you compile it yourself, try to comment out line 542 in Network.cpp as below:

```cpp
//#define VALIDITY_POINT_CHECK
#ifdef VALIDITY_POINT_CHECK
```
|
|
|
Will you commit your source ?
|
|
|
@JeanLuc How much constant memory do you use for the multiplication and for the addition? 32 jumps are 16kB for x and y-coordinate + 8 kB for their private keys (32 * 256bit = 8kB) + what else?
I use the following setting to prefer L1 cache, as shared mem is not used:

```cpp
cudaDeviceSetCacheConfig(cudaFuncCachePreferL1);
```

In constant mem:

```cpp
__device__ __constant__ uint64_t _0[] = { 0ULL,0ULL,0ULL,0ULL,0ULL };
__device__ __constant__ uint64_t _1[] = { 1ULL,0ULL,0ULL,0ULL,0ULL };
__device__ __constant__ uint64_t _P[] = { 0xFFFFFFFEFFFFFC2F,0xFFFFFFFFFFFFFFFF,0xFFFFFFFFFFFFFFFF,0xFFFFFFFFFFFFFFFF,0ULL };
__device__ __constant__ uint64_t MM64 = 0xD838091DD2253531; // 64-bit lsb negative inverse of P (mod 2^64)
__device__ __constant__ uint64_t _O[] = { 0xBFD25E8CD0364141ULL,0xBAAEDCE6AF48A03BULL,0xFFFFFFFFFFFFFFFEULL,0xFFFFFFFFFFFFFFFFULL };
__device__ __constant__ uint64_t jD[NB_JUMP][4];
__device__ __constant__ uint64_t jPx[NB_JUMP][4];
__device__ __constant__ uint64_t jPy[NB_JUMP][4];
```
I will definitely reduce jD to 128 bits in the next release; the less constant mem used the better. There are 64KB available, but for the L1 cache the lower the usage the better.

In that case the best choice for solving keys is using the CPU ![Grin](https://bitcointalk.org/Smileys/default/grin.gif)

Yes, this is true for small ranges (as written in the README). Great work!
Thanks ![Wink](https://bitcointalk.org/Smileys/default/wink.gif) For #120, if you and Zielar use 2^30 kangaroos, you need to use a DP < 28.
Yes, we haven't launched the run yet; we will make our choice in the days to come. A small DP also increases the needed memory. If you reduce k, you reduce the speed, and then you have to reduce theta (DP).
Right. How many kangaroos run in parallel on a single V100? At which speed?
~2^20 for the last 2 runs, we will see for #120.
|
|
|
The best time complexity with the birthday paradox, with kangaroos having starting positions uniformly distributed in [0..N], is obtained when drawing alternately one tame and one wild. When using DP0, you get ~2.sqrt(N). When using DPx, it is like drawing alternately 2^x TAME and then 2^x WILD, so you have a DP overhead.

I managed to get the formula for the parallel version, which is:

~2.CubicRoot( N.(k.theta + sqrt(N)) )

where theta=2^dpbit and k is the number of kangaroo pairs running in parallel. For theta=1 (DP0) and k=0 (or when k.theta << sqrt(N)) we indeed get ~2.sqrt(N); when k.theta >> sqrt(N), the time complexity tends to ~2.CubicRoot(k.N.theta). So it is important to choose a DP such that k.theta << sqrt(N).
|
|
|
50 pages and 2.25 BTC unlocked in less than 2 months! Rather a good score, no?
|
|
|
It is necessary to use the same jumps, otherwise paths are incompatible. And the mean also has to be controlled. Implementing a change of G is a rather heavy task, so I would like to have at least an estimation of the loss due to the fact that all paths of the new DPs will have all points multiples of 32. We will probably also use DP28 or DP27 for #120.
|
|
|
OK I see, I will try to implement this. Thanks ![Wink](https://bitcointalk.org/Smileys/default/wink.gif)
|
|
|
Jumps (the points) are the same, only their private keys are different.
So in practice, I leave the old #115 DPs as they are, and for the new #120 search I use inv(32)*G instead of G?
|
|
|
Hi, I don't quite understand how this can work. If you change G, the paths will also differ, as jumps are a function of the X value.
|
|
|
If we attack #64, we will try all private keys one by one, computing the HASH160 until we get a match. With a few mods, the VanitySearch engine can do that. Kangaroo is not useful here.
|
|
|
Yes, thanks for this tip ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif) I'm building a map to compute the DP overhead accurately, as I haven't yet managed to get the correct analytical expression, only the 2 asymptotes (0 and infinity). Tomorrow I will also investigate how long it would take to solve #64 with the VanitySearch engine optimized for the V100. Then we will make our choice between attacking #64 or #120.
|
|
|
You misunderstood my question. Example: I have 2 pubkeys in the 115-bit range, and 1 key was found. For the 2nd key, how can I use that work file to find it in less time?
You can simply merge the files, but at the moment the "trapped wild" kangaroos are not yet handled, so you will only benefit from the tames. This will be done in a future release, as we will need it for solving #120.
|
|
|
You have computed about 2^58.36 steps. And you now know the private keys of 2^33.36 DPs in the [1, 2^114 - 1] interval. (Why did you say [2^114, 2^115 - 1]? You know only one public key P in the [2^114, 2^115 - 1] range; all the public keys/points you have computed lie in [1, 2^114 - 1].)
Yes, you're right. I've forgotten that the start range was shifted to zero. Shame on me ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif) It will help to solve #120. ![Wink](https://bitcointalk.org/Smileys/default/wink.gif)
|
|
|
Unfortunately, I don't see how to reuse the work file on a different range. This work file is for [2^114,2^115-1].
If you translate the kangaroos, the paths will differ, so you have to redo all the job.
There was an interesting question in a previous message of this topic: why are paths important, and why not add DPs without computing paths?
Using only DPs: it is like drawing a random number in a reduced space N/2^dpbit, so the time to solve will be O( sqrt(N/2^dpbit) ). However, it is true that you can reach a DP faster this way, by computing consecutive points.
Using paths: it is like drawing a bunch of 2^dpbit random points at each DP, so the time to solve is O( sqrt(N)/2^dpbit ). The "gain" using paths is 2^dpbit, while without paths the "gain" is only sqrt(2^dpbit).
So even if reaching a DP without a path is faster, the gain is not enough to beat paths.
|
|
|
Congratulations for breaking the World Record too. Good job ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif) The last world record was 112 bits, right? ![Huh](https://bitcointalk.org/Smileys/default/huh.gif)

Thanks ![Wink](https://bitcointalk.org/Smileys/default/wink.gif) The world record on standard architecture was also 114 bits, but on a 114-bit field. Here we have solved a 114-bit key on a 256-bit field. So yes, we broke the world record on classic architecture ![Cheesy](https://bitcointalk.org/Smileys/default/cheesy.gif) The world record on FPGA is 117.25 bits; it was done using up to 576 FPGAs over 6 months. With our architecture, we expect 2 months on 256 V100s to solve #120 (119-bit key); we will see whether we attack it.

Yes, I will send the satoshi to 1FoundByJLPKangaroo... but first I'm waiting for Zielar, who is offline at the moment. We will publish the private key in a while, when the altcoins have also been moved.
|
|
|
We (me and Zielar) solved #115 after ~2^33.36 DPs (DP25) (more than 300GB of DPs). I do not know exactly how long the run took due to an unwanted interruption. ![Cheesy](https://bitcointalk.org/Smileys/default/cheesy.gif)
|
|
|
|