MrFreeDragon
|
|
May 08, 2020, 06:27:48 PM |
|
@MrFreeDragon You have to take into consideration that in the first test it wrote a ~2GB file and a ~4GB one and created 2^24 kangaroos. In the second, with DP18 and 2^24 kangaroos, the overhead is ~2^42, so more than the average; however, it was solved in only 2^41, more than 2 times less than expected, so he was lucky. This is the problem with multiple powerful GPUs: with 2^24 kangaroos, an 80-bit range is too small. The client/server mode is a good idea as it will avoid memory consumption on the client side, but the server should be well tuned, and the overhead due to DP still applies. @HardwareCollector I agree
I compared the results for default settings (with the DP selected by the program). In HardwareCollector's 1st case DP was 15, as in my tests. He had 2^24 kangaroos, but I had 2^23 (because I had just 4x2080ti and he used 8x2080ti): 2 times more cards, and so 2 times more kangaroos. However, the number of kangaroos created per card is the same for 8x and for 4x: 2^21.09. So it should not be an issue...
|
|
|
|
MrFreeDragon
|
|
May 08, 2020, 07:21:33 PM |
|
-snip- I tried to continue the job 7 times. The first time, the 2080ti solved the key in 1 extra minute (with 2^41.9 total operations), but I stopped the other 6 attempts when they reached 2^42.3 group operations (2 times more than expected).
However I'm also a bit surprised here, i would need more info to try to understand, especially number of kangaroo of each configuration and evolution of the number of distinguished point bits in the work files...
I made the same test again with your recent release (for a 2^80 range key):
1) Start work on 2x2080ti (12min work)
2) Continue work on 1x2080ti (15min work)
3a) Continue work from (2) on 2x2080ti (launched 5 times)
3b) Continue work from (2) on Tesla T4 (launched one time)
At the start (1) the expected time to solve was 19-20min; when continuing on the less powerful machine the expected time changed to 35min. In fact, the continued job in (3a) was solved in about 40 min, and in (3b) it was solved in 1h 48min (expected 1h 11min only). It is also interesting that when continuing the job on the slower GPU (Tesla T4), the program expected fewer operations (2^41.16 compared to 2^41.38 at work start). Probably the expected number of operations is calculated from the expected (suggested) DP, while the work actually uses the DP from the started work, for which the expected number of operations was different. Here are the statistics from all the steps:
--------------------------------------------------------- (1) Start work on 2x2080ti
$ ./kangaroo -gpu -gpuId 0,1 -w work2080 -wi 150 -t 0 VC_CUDA8/in80.txt Kangaroo v1.4 Start:B60E83280258A40F9CDF1649744D730D6E939DE92A2B00000000000000000000 Stop :B60E83280258A40F9CDF1649744D730D6E939DE92A2BE19BFFFFFFFFFFFFFFFF Keys :1 Number of CPU thread: 0 Range width: 2^80 Jump Avg distance: 2^40.03 Number of kangaroos: 2^22.09 Suggested DP: 17 Expected operations: 2^41.38 Expected RAM: 846.5MB DP size: 17 [0xffff800000000000] GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used) SolveKeyGPU Thread GPU#0: creating kangaroos... GPU: GPU #1 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used) SolveKeyGPU Thread GPU#1: creating kangaroos... SolveKeyGPU Thread GPU#1: 2^21.09 kangaroos [12.9s] SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos [14.1s] [2454.82 MK/s][GPU 2454.82 MK/s][Count 2^38.08][Dead 0][02:14 (Avg 19:28)][69.5/103.6MB] SaveWork: work2080...............done [69.6 MB] [00s] Fri May 8 16:47:27 2020 [2453.25 MK/s][GPU 2453.25 MK/s][Count 2^39.16][Dead 0][04:45 (Avg 19:28)][144.8/188.0MB] SaveWork: work2080...............done [144.8 MB] [01s] Fri May 8 16:49:59 2020 [2448.46 MK/s][GPU 2448.46 MK/s][Count 2^39.77][Dead 0][07:16 (Avg 19:31)][219.8/281.2MB] SaveWork: work2080...............done [219.8 MB] [01s] Fri May 8 16:52:31 2020 [2450.64 MK/s][GPU 2450.64 MK/s][Count 2^40.19][Dead 0][09:46 (Avg 19:30)][293.8/373.8MB] SaveWork: work2080...............done [293.9 MB] [02s] Fri May 8 16:55:01 2020 [2445.49 MK/s][GPU 2445.49 MK/s][Count 2^40.51][Dead 0][12:17 (Avg 19:32)][367.9/466.4MB] SaveWork: work2080...............done [367.9 MB] [02s] Fri May 8 16:57:32 2020 [2292.20 MK/s][GPU 2292.20 MK/s][Count 2^40.52][Dead 0][12:21 (Avg 20:50)][369.0/467.7MB] ^C
--------------------------------------------------------- (2) Continue work on 1x2080ti
$ ./kangaroo -gpu -gpuId 0 -i work2080 -w work2080 -wi 150 -t 0 Kangaroo v1.4 Loading: work2080 Start:B60E83280258A40F9CDF1649744D730D6E939DE92A2B00000000000000000000 Stop :B60E83280258A40F9CDF1649744D730D6E939DE92A2BE19BFFFFFFFFFFFFFFFF Keys :1 LoadWork: [HashTalbe 367.9/466.4MB] [01s] Number of CPU thread: 0 Range width: 2^80 Jump Avg distance: 2^40.03 Number of kangaroos: 2^21.09 Suggested DP: 18 Expected operations: 2^41.23 Expected RAM: 761.5MB DP size: 17 [0xffff800000000000] GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used) SolveKeyGPU Thread GPU#0: creating kangaroos... SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos [11.8s] [1224.73 MK/s][GPU 1224.73 MK/s][Count 2^40.64][Dead 0][14:33 (Avg 35:02)][402.0/508.9MB] SaveWork: work2080...............done [402.0 MB] [01s] Fri May 8 17:00:46 2020 [1224.25 MK/s][GPU 1224.25 MK/s][Count 2^40.77][Dead 0][17:03 (Avg 35:03)][439.1/555.3MB] SaveWork: work2080...............done [439.1 MB] [02s] Fri May 8 17:03:16 2020 [1224.60 MK/s][GPU 1224.60 MK/s][Count 2^40.89][Dead 0][19:34 (Avg 35:02)][476.1/601.6MB] SaveWork: work2080...............done [476.1 MB] [02s] Fri May 8 17:05:47 2020 [1225.37 MK/s][GPU 1225.37 MK/s][Count 2^41.00][Dead 0][22:05 (Avg 35:01)][513.1/647.8MB] SaveWork: work2080...............done [513.1 MB] [02s] Fri May 8 17:08:18 2020 [1223.11 MK/s][GPU 1223.11 MK/s][Count 2^41.10][Dead 0][24:36 (Avg 35:05)][550.1/694.1MB] SaveWork: work2080...............done [550.1 MB] [03s] Fri May 8 17:10:50 2020 [1225.35 MK/s][GPU 1225.35 MK/s][Count 2^41.19][Dead 0][27:08 (Avg 35:01)][587.1/740.4MB] SaveWork: work2080...............done [587.1 MB] [03s] Fri May 8 17:13:23 2020 [1137.83 MK/s][GPU 1137.83 MK/s][Count 2^41.19][Dead 0][27:14 (Avg 37:43)][587.6/741.0MB] ^C
--------------------------------------------------------- (3a) Continue work on 2x2080ti (5 times):
$ ./kangaroo -gpu -gpuId 0,1 -i work2080 -t 0 Kangaroo v1.4 Loading: work2080 Start:B60E83280258A40F9CDF1649744D730D6E939DE92A2B00000000000000000000 Stop :B60E83280258A40F9CDF1649744D730D6E939DE92A2BE19BFFFFFFFFFFFFFFFF Keys :1 LoadWork: [HashTalbe 587.1/740.4MB] [01s] Number of CPU thread: 0 Range width: 2^80 Jump Avg distance: 2^40.03 Number of kangaroos: 2^22.09 Suggested DP: 17 Expected operations: 2^41.38 Expected RAM: 846.5MB DP size: 17 [0xffff800000000000] GPU: GPU #1 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used) SolveKeyGPU Thread GPU#1: creating kangaroos... GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used) SolveKeyGPU Thread GPU#0: creating kangaroos... SolveKeyGPU Thread GPU#1: 2^21.09 kangaroos [12.1s] SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos [14.4s] [2444.55 MK/s][GPU 2444.55 MK/s][Count 2^41.48][Dead 0][31:31 (Avg 19:32)][718.7/904.8MB] Key# 0 [1S]Pub: 0x0284A930C243C0C2F67FDE3E0A98CE6DB0DB9AB5570DAD9338CADE6D181A431246 Priv: 0xB60E83280258A40F9CDF1649744D730D6E939DE92A2BE19B0D19A3D64A1DE032
Done: Total time 31:49
$ ./kangaroo -gpu -gpuId 0,1 -i work2080 -t 0 Kangaroo v1.4 Loading: work2080 Start:B60E83280258A40F9CDF1649744D730D6E939DE92A2B00000000000000000000 Stop :B60E83280258A40F9CDF1649744D730D6E939DE92A2BE19BFFFFFFFFFFFFFFFF Keys :1 LoadWork: [HashTalbe 587.1/740.4MB] [01s] Number of CPU thread: 0 Range width: 2^80 Jump Avg distance: 2^40.03 Number of kangaroos: 2^22.09 Suggested DP: 17 Expected operations: 2^41.38 Expected RAM: 846.5MB DP size: 17 [0xffff800000000000] GPU: GPU #1 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used) SolveKeyGPU Thread GPU#1: creating kangaroos... GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used) SolveKeyGPU Thread GPU#0: creating kangaroos... SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos [12.9s] SolveKeyGPU Thread GPU#1: 2^21.09 kangaroos [13.1s] [2436.24 MK/s][GPU 2436.24 MK/s][Count 2^42.54][Dead 2][57:24 (Avg 19:36)][1491.2/1870.5MB] Key# 0 [1S]Pub: 0x0284A930C243C0C2F67FDE3E0A98CE6DB0DB9AB5570DAD9338CADE6D181A431246 Priv: 0xB60E83280258A40F9CDF1649744D730D6E939DE92A2BE19B0D19A3D64A1DE032
Done: Total time 57:54
$ ./kangaroo -gpu -gpuId 0,1 -i work2080 -t 0 Kangaroo v1.4 Loading: work2080 Start:B60E83280258A40F9CDF1649744D730D6E939DE92A2B00000000000000000000 Stop :B60E83280258A40F9CDF1649744D730D6E939DE92A2BE19BFFFFFFFFFFFFFFFF Keys :1 LoadWork: [HashTalbe 587.1/740.4MB] [01s] Number of CPU thread: 0 Range width: 2^80 Jump Avg distance: 2^40.03 Number of kangaroos: 2^22.09 Suggested DP: 17 Expected operations: 2^41.38 Expected RAM: 846.5MB DP size: 17 [0xffff800000000000] GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used) SolveKeyGPU Thread GPU#0: creating kangaroos... GPU: GPU #1 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used) SolveKeyGPU Thread GPU#1: creating kangaroos... SolveKeyGPU Thread GPU#1: 2^21.09 kangaroos [12.9s] SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos [13.5s] [2439.37 MK/s][GPU 2439.37 MK/s][Count 2^41.97][Dead 3][41:04 (Avg 19:35)][1003.9/1261.4MB] Key# 0 [1S]Pub: 0x0284A930C243C0C2F67FDE3E0A98CE6DB0DB9AB5570DAD9338CADE6D181A431246 Priv: 0xB60E83280258A40F9CDF1649744D730D6E939DE92A2BE19B0D19A3D64A1DE032
Done: Total time 41:26
$ ./kangaroo -gpu -gpuId 0,1 -i work2080 -t 0 Kangaroo v1.4 Loading: work2080 Start:B60E83280258A40F9CDF1649744D730D6E939DE92A2B00000000000000000000 Stop :B60E83280258A40F9CDF1649744D730D6E939DE92A2BE19BFFFFFFFFFFFFFFFF Keys :1 LoadWork: [HashTalbe 587.1/740.4MB] [01s] Number of CPU thread: 0 Range width: 2^80 Jump Avg distance: 2^40.03 Number of kangaroos: 2^22.09 Suggested DP: 17 Expected operations: 2^41.38 Expected RAM: 846.5MB DP size: 17 [0xffff800000000000] GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used) SolveKeyGPU Thread GPU#0: creating kangaroos... GPU: GPU #1 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used) SolveKeyGPU Thread GPU#1: creating kangaroos... SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos [12.5s] SolveKeyGPU Thread GPU#1: 2^21.09 kangaroos [13.9s] [2446.30 MK/s][GPU 2446.30 MK/s][Count 2^41.35][Dead 0][29:25 (Avg 19:32)][655.4/825.8MB] Key# 0 [1S]Pub: 0x0284A930C243C0C2F67FDE3E0A98CE6DB0DB9AB5570DAD9338CADE6D181A431246 Priv: 0xB60E83280258A40F9CDF1649744D730D6E939DE92A2BE19B0D19A3D64A1DE032
Done: Total time 29:42
$ ./kangaroo -gpu -gpuId 0,1 -i work2080 -t 0 Kangaroo v1.4 Loading: work2080 Start:B60E83280258A40F9CDF1649744D730D6E939DE92A2B00000000000000000000 Stop :B60E83280258A40F9CDF1649744D730D6E939DE92A2BE19BFFFFFFFFFFFFFFFF Keys :1 LoadWork: [HashTalbe 587.1/740.4MB] [01s] Number of CPU thread: 0 Range width: 2^80 Jump Avg distance: 2^40.03 Number of kangaroos: 2^22.09 Suggested DP: 17 Expected operations: 2^41.38 Expected RAM: 846.5MB DP size: 17 [0xffff800000000000] GPU: GPU #1 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used) SolveKeyGPU Thread GPU#1: creating kangaroos... GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used) SolveKeyGPU Thread GPU#0: creating kangaroos... SolveKeyGPU Thread GPU#1: 2^21.09 kangaroos [11.9s] SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos [15.2s] [2442.00 MK/s][GPU 2442.00 MK/s][Count 2^42.02][Dead 1][42:16 (Avg 19:34)][1040.2/1306.8MB] Key# 0 [1S]Pub: 0x0284A930C243C0C2F67FDE3E0A98CE6DB0DB9AB5570DAD9338CADE6D181A431246 Priv: 0xB60E83280258A40F9CDF1649744D730D6E939DE92A2BE19B0D19A3D64A1DE032
Done: Total time 42:42
--------------------------------------------------------- (3b) Continue work on Tesla T4
$ ./kangaroo -t 0 -gpu -i work_from2080 -w work_teslaT4 -wi 300 Kangaroo v1.4 Loading: work_from2080 Start:B60E83280258A40F9CDF1649744D730D6E939DE92A2B00000000000000000000 Stop :B60E83280258A40F9CDF1649744D730D6E939DE92A2BE19BFFFFFFFFFFFFFFFF Keys :1 LoadWork: [HashTalbe 587.1/740.4MB] [01s] Number of CPU thread: 0 Range width: 2^80 Jump Avg distance: 2^40.03 Number of kangaroos: 2^20.32 Suggested DP: 19 Expected operations: 2^41.16 Expected RAM: 726.5MB DP size: 17 [0xffff800000000000] GPU: GPU #0 Tesla T4 (40x64 cores) Grid(80x128) (129.0 MB used) SolveKeyGPU Thread GPU#0: creating kangaroos... SolveKeyGPU Thread GPU#0: 2^20.32 kangaroos [8.1s] [572.34 MK/s][GPU 572.34 MK/s][Count 2^41.27][Dead 1][31:59 (Avg 01:11:29)][621.1/782.9MB] SaveWork: work_teslaT4...............done [621.2 MB] [02s] Fri May 8 17:34:58 2020 [574.98 MK/s][GPU 574.98 MK/s][Count 2^41.35][Dead 1][37:00 (Avg 01:11:09)][656.1/826.7MB] SaveWork: work_teslaT4...............done [656.1 MB] [03s] Fri May 8 17:40:01 2020 [573.82 MK/s][GPU 573.82 MK/s][Count 2^41.43][Dead 1][42:01 (Avg 01:11:18)][690.8/870.1MB] SaveWork: work_teslaT4...............done [690.8 MB] [04s] Fri May 8 17:45:03 2020 [573.58 MK/s][GPU 573.58 MK/s][Count 2^41.50][Dead 1][47:02 (Avg 01:11:20)][725.6/913.5MB] SaveWork: work_teslaT4...............done [725.6 MB] [04s] Fri May 8 17:50:04 2020 [573.76 MK/s][GPU 573.76 MK/s][Count 2^41.57][Dead 1][52:03 (Avg 01:11:18)][760.3/956.9MB] SaveWork: work_teslaT4...............done [760.3 MB] [04s] Fri May 8 17:55:05 2020 [575.16 MK/s][GPU 575.16 MK/s][Count 2^41.63][Dead 1][57:05 (Avg 01:11:08)][795.1/1000.3MB] SaveWork: work_teslaT4...............done [795.1 MB] [05s] Fri May 8 18:00:08 2020 [573.54 MK/s][GPU 573.54 MK/s][Count 2^41.69][Dead 1][01:02:05 (Avg 01:11:20)][829.6/1043.5MB] SaveWork: work_teslaT4...............done [829.6 MB] [05s] Fri May 8 18:05:08 2020 [573.72 MK/s][GPU 573.72 MK/s][Count 2^41.75][Dead 1][01:07:05 (Avg 01:11:19)][864.0/1086.5MB] SaveWork: 
work_teslaT4...............done [864.0 MB] [06s] Fri May 8 18:10:09 2020 [575.11 MK/s][GPU 575.11 MK/s][Count 2^41.81][Dead 1][01:12:06 (Avg 01:11:08)][898.5/1129.7MB] SaveWork: work_teslaT4...............done [898.5 MB] [06s] Fri May 8 18:15:10 2020 [573.57 MK/s][GPU 573.57 MK/s][Count 2^41.86][Dead 1][01:17:08 (Avg 01:11:20)][932.9/1172.6MB] SaveWork: work_teslaT4...............done [932.9 MB] [06s] Fri May 8 18:20:11 2020 [575.05 MK/s][GPU 575.05 MK/s][Count 2^41.91][Dead 1][01:22:09 (Avg 01:11:09)][967.4/1215.8MB] SaveWork: work_teslaT4...............done [967.4 MB] [07s] Fri May 8 18:25:13 2020 [574.34 MK/s][GPU 574.34 MK/s][Count 2^41.97][Dead 2][01:27:11 (Avg 01:11:14)][1001.9/1258.8MB] SaveWork: work_teslaT4...............done [1001.9 MB] [07s] Fri May 8 18:30:15 2020 [574.75 MK/s][GPU 574.75 MK/s][Count 2^42.01][Dead 2][01:32:11 (Avg 01:11:11)][1036.1/1301.7MB] SaveWork: work_teslaT4...............done [1036.1 MB] [07s] Fri May 8 18:35:16 2020 [573.45 MK/s][GPU 573.45 MK/s][Count 2^42.06][Dead 2][01:37:11 (Avg 01:11:21)][1070.3/1344.4MB] SaveWork: work_teslaT4...............done [1070.4 MB] [08s] Fri May 8 18:40:17 2020 [575.05 MK/s][GPU 575.05 MK/s][Count 2^42.11][Dead 2][01:42:13 (Avg 01:11:09)][1104.6/1387.2MB] SaveWork: work_teslaT4...............done [1104.6 MB] [08s] Fri May 8 18:45:19 2020 [575.27 MK/s][GPU 575.27 MK/s][Count 2^42.15][Dead 2][01:47:14 (Avg 01:11:07)][1138.8/1430.1MB] SaveWork: work_teslaT4...............done [1138.8 MB] [08s] Fri May 8 18:50:20 2020 [572.15 MK/s][GPU 572.15 MK/s][Count 2^42.16][Dead 2][01:47:57 (Avg 01:11:30)][1142.8/1435.0MB] Key# 0 [1S]Pub: 0x0284A930C243C0C2F67FDE3E0A98CE6DB0DB9AB5570DAD9338CADE6D181A431246 Priv: 0xB60E83280258A40F9CDF1649744D730D6E939DE92A2BE19B0D19A3D64A1DE032
Done: Total time 01:48:18
---------------------------------------------------------
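A quick way to read the "Avg" figures in the logs above is to divide "Expected operations" by the reported rate. The sketch below does that for three of the runs; the helper names are hypothetical, and the idea that the program derives "Avg" this way is only an assumption that happens to match the logged values to within a few seconds. If it holds, the drop from 2^41.38 to 2^41.16 on the Tesla T4 would explain why its expected time looks optimistic.

```python
# Sketch: compare the logged "Avg" time with (expected operations / rate).
# Assumption (not confirmed by the thread): Avg = 2^expected_ops / rate.

def expected_seconds(log2_ops: float, rate_mkeys: float) -> float:
    """Seconds needed to perform 2^log2_ops operations at rate_mkeys MK/s."""
    return 2.0 ** log2_ops / (rate_mkeys * 1e6)

def hms(seconds: float) -> str:
    """Format a duration as HH:MM:SS."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

# (label, Expected operations exponent, MK/s, logged Avg in seconds), all copied from the logs
RUNS = [
    ("2x2080ti", 41.38, 2454.82, 19 * 60 + 28),
    ("1x2080ti", 41.23, 1224.73, 35 * 60 + 2),
    ("Tesla T4", 41.16, 572.34, 71 * 60 + 29),
]

for label, log2_ops, rate, logged in RUNS:
    estimate = expected_seconds(log2_ops, rate)
    print(f"{label}: estimated {hms(estimate)}, logged {hms(logged)}")
```

The small differences are within what the two-decimal rounding of the exponents allows.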
|
|
|
|
Jean_Luc (OP)
|
|
May 08, 2020, 07:29:10 PM |
|
2 times more kangaroos with the same dp, so the overhead is 2 times larger, and it was solved in 2^42 (no luck here), plus 6GB written to disk. During file saving the GPUs are waiting, because the table is locked. I'll have a look at your results in detail tomorrow. Thanks for the test...
|
|
|
|
HardwareCollector
Member
Offline
Activity: 144
Merit: 10
|
|
May 08, 2020, 10:48:54 PM |
|
I did run some more tests with 80-bit intervals and my luck seems to be very consistent with d=19-20.
1x RTX 2070, Total time 33:27
./Kangaroos -t 0 -d 20 -gpu -gpuId 0 input_80_bit_interval.txt
Kangaroo v1.4
Start:7FFFFFFFFFFFFFFFFFFF
Stop :FFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^80
Jump Avg distance: 2^40.03
Number of kangaroos: 2^20.17
Suggested DP: 19
Expected operations: 2^41.68
Expected RAM: 140.1MB
DP size: 20 [0xfffff00000000000]
GPU: GPU #0 GeForce RTX 2070 (36x64 cores) Grid(72x128) (117.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^20.17 kangaroos [9.9s]
[862.68 MK/s][GPU 862.68 MK/s][Count 2^40.45][Dead 0][33:16 (Avg 01:08:00)][45.9/79.7MB]
Key# 0 [1S]Pub: 0x037E1238F7B1CE757DF94FAA9A2EB261BF0AEB9F84DBF81212104E78931C2A19DC
Priv: 0xEA1A5C66DCC11B5AD180
Done: Total time 33:27
8x RTX 2080 Ti, Total time 02:28
./Kangaroos -t 0 -d 19 -gpu -gpuId 0,1,2,3,4,5,6,7 -w server_1 -wi 300 input_80_bit_interval.txt
Kangaroo v1.4
Start:7FFFFFFFFFFFFFFFFFFF
Stop :FFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^80
Jump Avg distance: 2^40.03
Number of kangaroos: 2^24.09
Suggested DP: 15
Expected operations: 2^43.40
Expected RAM: 858.1MB
DP size: 19 [0xffffe00000000000]
GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
GPU: GPU #7 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#7: creating kangaroos...
GPU: GPU #2 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#2: creating kangaroos...
GPU: GPU #3 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#3: creating kangaroos...
GPU: GPU #6 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#6: creating kangaroos...
GPU: GPU #5 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#5: creating kangaroos...
GPU: GPU #4 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#4: creating kangaroos...
GPU: GPU #1 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#1: creating kangaroos...
SolveKeyGPU Thread GPU#6: 2^21.09 kangaroos [21.1s]
SolveKeyGPU Thread GPU#2: 2^21.09 kangaroos [21.2s]
SolveKeyGPU Thread GPU#1: 2^21.09 kangaroos [21.3s]
SolveKeyGPU Thread GPU#7: 2^21.09 kangaroos [23.4s]
SolveKeyGPU Thread GPU#3: 2^21.09 kangaroos [23.7s]
SolveKeyGPU Thread GPU#5: 2^21.09 kangaroos [24.0s]
SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos [24.6s]
SolveKeyGPU Thread GPU#4: 2^21.09 kangaroos [24.8s]
[9827.80 MK/s][GPU 9827.80 MK/s][Count 2^39.93][Dead 0][02:00 (Avg 19:43)][63.1/97.2MB]
Key# 0 [1S]Pub: 0x037E1238F7B1CE757DF94FAA9A2EB261BF0AEB9F84DBF81212104E78931C2A19DC
Priv: 0xEA1A5C66DCC11B5AD180
Done: Total time 02:28
|
|
|
|
Jean_Luc (OP)
|
|
May 09, 2020, 04:15:39 AM Last edit: May 09, 2020, 12:34:09 PM by Jean_Luc |
|
@MrFreeDragon: I'm looking at your test.
@HardwareCollector: I did a test of 100 trials on a 40-bit range with 2^12 kangaroos and dp=10 (roughly the square root of your config; I cannot have dp=9.5). The 100 keys are uniformly distributed in the range.
[ 98] 2^21.962 Dead:0 Avg:2^22.016 DeadAvg:1.4 (2^22.603)
[ 99] 2^21.276 Dead:0 Avg:2^22.010 DeadAvg:1.4 (2^22.603)
[100] 2^20.427 Dead:0 Avg:2^22.001 DeadAvg:1.4 (2^22.603)
The calculation of the overhead gives 2^22.603 and the actual average is 2^22.001; the exact average for dp=0 is 2^21.056. In this case it overestimates, because I don't know the exact analytic expression for the time complexity of the DP method. I know that it converges to ~cubicroot( 16.numberOfKangaroo.N.2^dp ) when numberOfKangaroo >> sqrt(N)/2^dp, and that nbKangaroo.2^dp is an asymptote when numberOfKangaroo << sqrt(N)/2^dp, where N is the range size and 2^dp is lower than sqrt(N). Here we are somewhere between the 2 cases, where the approximation is not really good. Computing the exact expression is like the birthday paradox with 2 tables, but drawing bunches of 2^dp random numbers nbKangaroo times alternately in the 2 tables. Quite a nightmare, and never detailed in any of the papers I have read. You can also see that the last 3 trials were under the average. This is because the expected number of operations also depends on where the private key is in the range, and because the deviation is large. To test properly, you need a large number of tests with keys uniformly distributed in the range. If you always test with the same key, it is not representative.
Edit: correction, it was log2(numberOfKangaroo) >> log2(sqrt(N)) - log2(2^dp), so numberOfKangaroo >> sqrt(N)/2^dp
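The two estimates in this post can be checked numerically for the quoted configuration (40-bit range, 2^12 kangaroos, dp=10). The sketch below is only illustrative: the function names are hypothetical, and the "additive" form (dp=0 average of 2^21.056 plus nbKangaroo.2^dp) is an assumption that happens to reproduce the 2^22.603 printed in the trial output, not a formula confirmed by the author.

```python
from math import log2

def additive_estimate_log2(log2_dp0_avg: float, log2_kangaroos: float, dp: int) -> float:
    """log2 of (average for dp=0) + nbKangaroo * 2^dp. Assumed form of the overhead estimate."""
    return log2(2.0 ** log2_dp0_avg + 2.0 ** (log2_kangaroos + dp))

def cube_root_estimate_log2(range_bits: int, log2_kangaroos: float, dp: int) -> float:
    """log2 of cubicroot(16 * nbKangaroo * N * 2^dp), with N = 2^range_bits."""
    return (4 + log2_kangaroos + range_bits + dp) / 3.0

# Configuration from the post: N = 2^40, 2^12 kangaroos, dp = 10
additive = additive_estimate_log2(21.056, 12, 10)
cube_root = cube_root_estimate_log2(40, 12, 10)
print(f"additive estimate : 2^{additive:.3f}")   # close to the reported 2^22.603
print(f"cube-root estimate: 2^{cube_root:.3f}")  # close to the measured average 2^22.001
```

For this configuration sqrt(N)/2^dp = 2^10, so 2^12 kangaroos is only 4 times the threshold, which fits the remark that the test sits between the two regimes.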
|
|
|
|
HardwareCollector
Member
Offline
Activity: 144
Merit: 10
|
|
May 09, 2020, 12:00:08 PM |
|
To make test there is 2 things, you need a large number of test with key uniformly distributed in the range. If you make test with always the same key, it is not representative.
Thanks for pointing that out again. Somehow I missed it the first time, when you mentioned that the seed for creating the kangaroo jumps is now fixed.
// https://github.com/JeanLucPons/Kangaroo/blob/e7f481f6ad86338288e43cb8758700459ac4b800/Kangaroo.cpp#L638
// Kangaroo jumps
// Constant seed for compatibilty of workfiles
rseed(0x600DCAFE);
|
|
|
|
Jean_Luc (OP)
|
|
May 09, 2020, 12:35:58 PM |
|
Yes the jumps have to be fixed otherwise paths differ and work files become incompatible. Hope the cafe will be good
|
|
|
|
arulbero
Legendary
Offline
Activity: 1937
Merit: 2080
|
|
May 10, 2020, 09:16:46 AM Last edit: May 10, 2020, 10:01:46 AM by arulbero |
|
The current record for an ECDLP solved on a curve over a prime field is 114-bit: https://ellipticnews.wordpress.com/2018/04/22/114-bit-ecdlp-solved-on-a-curve-with-automorphisms-over-a-prime-field/
The curve has j-invariant 0, and so has an automorphism group of size 6. Hence, it is possible to perform the Pollard rho algorithm using equivalence classes of size 6.
They used n = 1024 partitions for the random walk, and the “hash function” was chosen to be the least significant log_2(n) bits of the x-coordinate of the current curve point.
The paper writes that “The parallel implementation of the rho method by adopting a client-server model, using 2000 CPU cores took about 6 months”. They seem to have been lucky to get a collision earlier than expected: “the result of the authors attack is little bit better than the average number of rational points where a simple collision attack stops.”
For secp256k1, the current record is an ECDLP solved in an interval of 104 bits (key #105 of the "puzzle transaction"): https://www.blockchain.com/btc/tx/08389f34c98c606322740c0be6a7125d9860bb8d5cb182c02f98461e5fa6cd15
That key was found on 2019-09-23. This is the next public key (#110, with a private key in range [ 2^109 , 2^110 - 1], 109 bit), which they have been looking for for over 7.5 months (about 225 days):
0309976ba5570966bf889196b7fdf5a0f9a1e9ab340556ec29f8bb60599616167d
(address: 12JzYkkN76xkwvcPT6AWKZtGX6w2LAgsJg)
The Pollard's kangaroo ECDLP solver needs 2*(2^(109/2)) = 2^55.5 steps to retrieve this private key; a GPU that computes 2^30 steps/sec would take 2^25.5 seconds, about 550 days.
A good article / recap about ECDLP: https://ellipticnews.wordpress.com/2016/04/07/ecdlp-in-less-than-square-root-time/
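The 550-day figure follows directly from the numbers in the post. A minimal sketch of the arithmetic, with hypothetical helper names and the post's own assumptions (2*sqrt(N) steps, a single GPU doing 2^30 steps per second):

```python
SECONDS_PER_DAY = 86400

def kangaroo_steps_log2(interval_bits: float) -> float:
    """log2 of the 2*sqrt(N) steps assumed in the post, for an interval of 2^interval_bits keys."""
    return 1 + interval_bits / 2

def days_needed(interval_bits: float, log2_steps_per_second: float) -> float:
    """Days to perform the expected steps at 2^log2_steps_per_second steps/sec."""
    log2_seconds = kangaroo_steps_log2(interval_bits) - log2_steps_per_second
    return 2.0 ** log2_seconds / SECONDS_PER_DAY

print(kangaroo_steps_log2(109))        # exponent of the step count for the #110 key
print(round(days_needed(109, 30), 1))  # days on one GPU at 2^30 steps/sec
```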
|
|
|
|
Jean_Luc (OP)
|
|
May 10, 2020, 11:50:21 AM |
|
Thanks for the info. I didn't look at all the addresses; is #110 the only one remaining with the pub key exposed?
|
|
|
|
|
|
arulbero
Legendary
Offline
Activity: 1937
Merit: 2080
|
|
May 10, 2020, 02:30:25 PM |
|
Thanks for the info I didn't look at all addresses, the #110 is the only one remaining with the pub key exposed ?
#115, #120, #125, #130 ...., #160 are left
|
|
|
|
HardwareCollector
Member
Offline
Activity: 144
Merit: 10
|
|
May 10, 2020, 06:10:31 PM |
|
Yes the jumps have to be fixed otherwise paths differ and work files become incompatible. Hope the cafe will be good
@Jean_Luc I have some good news: I performed over 250 tests with the client/server approach for [80-85]-bit intervals with randomly generated keys. When memory usage is not an issue for distinguished point storage, we seem to converge around ~2(SquareRoot(interval size)); the worst cases were around ~7(SquareRoot(interval size)). Of course, every now and then we get very lucky and solve it in under SquareRoot(interval size). I am now running some random 95-bit interval tests with 96x RTX 2080 Tis and expect the average solution time to be ~60 minutes. I will let it run for a couple of days and see.
Seems to be the worst case: ~7(SquareRoot(interval size))
./dlpserver
ECDLP Server Started and Listening on Port 8090...
ECDLP File Merger Process Started...
Loading: savefile_1589084641 MergeWork: [HashTalbe 2.0/4.0MB] [00s] Appending... Range width: 2^80 Dead kangaroo: 0 DP[20-bit]: count 2^15.13 [50s]
Loading: savefile_1589084702 MergeWork: [HashTalbe 3.1/9.2MB] [00s] Appending... Range width: 2^80 Dead kangaroo: 0 DP[20-bit]: count 2^16.27 [01:50]
Loading: savefile_1589084762 MergeWork: [HashTalbe 4.4/14.7MB] [00s] Appending... Range width: 2^80 Dead kangaroo: 0 DP[20-bit]: count 2^16.90 [02:50]
. . .
Loading: savefile_1589123004 MergeWork: [HashTalbe 112.3/149.4MB] [00s] Appending... Range width: 2^80 Dead kangaroo: 0 DP[20-bit]: count 2^21.80 [02:50]
Loading: savefile_1589123065 MergeWork: [HashTalbe 113.6/150.9MB] [00s] Appending... Range width: 2^80 Dead kangaroo: 0 DP[20-bit]: count 2^21.82 [03:51]
Loading: savefile_1589123125 MergeWork: [HashTalbe 115.0/152.5MB] [00s] Appending... Range width: 2^80 Dead kangaroo: 0 DP[20-bit]: count 2^21.84 [04:51]
Exiting File Merger Process...
. . .
Receiving savefile_1589124355 (8414564 bytes)
Loading: savefile_1589124355 MergeWork: [HashTalbe 186.5/239.7MB] [00s] Appending... Range width: 2^80 Dead kangaroo: 1 DP[20-bit]: count 2^22.57 [09:47]
Receiving savefile_1589124361 (3480068 bytes)
Loading: savefile_1589124361 MergeWork: [HashTalbe 192.5/247.2MB] [00s] Appending... Range width: 2^80 Dead kangaroo: 1 DP[20-bit]: count 2^22.58 [14:55]
Receiving savefile_1589124416 (8382084 bytes)
Receiving savefile_1589124421 (3489508 bytes)
Loading: savefile_1589124421 MergeWork: [HashTalbe 193.8/248.8MB] [00s] Appending... Range width: 2^80 Dead kangaroo: 1 DP[20-bit]: count 2^22.59 [15:55]
Loading: savefile_1589124416 MergeWork: [HashTalbe 195.2/250.5MB] [00s] Appending... Range width: 2^80 Dead kangaroo: 1 DP[20-bit]: count 2^22.64 [10:48]
Receiving savefile_1589124476 (8177604 bytes)
Receiving savefile_1589124482 (3483076 bytes)
Loading: savefile_1589124482 MergeWork: [HashTalbe 201.2/258.0MB] [00s] Appending... Range width: 2^80 Dead kangaroo: 1 DP[20-bit]: count 2^22.65 [16:56]
Loading: savefile_1589124476 MergeWork: [HashTalbe 202.5/259.6MB] [00s] Appending... Range width: 2^80 Dead kangaroo: 1 DP[20-bit]: count 2^22.69 [11:48]
Receiving savefile_1589124537 (7515044 bytes)
Receiving savefile_1589124542 (3492612 bytes)
Loading: savefile_1589124542 MergeWork: [HashTalbe 208.3/266.9MB] [00s] Appending... Range width: 2^80 Dead kangaroo: 1 DP[20-bit]: count 2^22.70 [17:56]
Loading: savefile_1589124537 MergeWork: [HashTalbe 209.6/268.5MB] [00s] Appending... Range width: 2^80
Key# 0 [1S]Pub: 0x037E1238F7B1CE757DF94FAA9A2EB261BF0AEB9F84DBF81212104E78931C2A19DC Priv: 0xEA1A5C66DCC11B5AD180
|
|
|
|
arulbero
Legendary
Offline
Activity: 1937
Merit: 2080
|
|
May 10, 2020, 07:17:15 PM Last edit: May 10, 2020, 09:08:15 PM by arulbero |
|
I've got another idea, but it is not (much) faster:
we need 2.sqrt(N) steps to get a collision in a N-space,
we need 2.sqrt(2N) = sqrt(2).2.sqrt(N) steps to get a collision in a 2N-space,
we need 2.sqrt(3N) = sqrt(3).2.sqrt(N) steps to get a collision in a 3N-space.
If we find a way to generate sqrt(3).2.sqrt(N) points faster than 2.sqrt(N), we get a collision in less time, even if we triple our searching space.
If we use the endomorphism, we can triple our interval, from
[1,2,3, ..., n-1,n]
to
[1,2,3,....., n-1,n] + [lambda, 2*lambda, 3*lambda, ....., n*lambda] + [lambda^2, 2*lambda^2, 3*lambda^2, ....., n*lambda^2]
We have 3 types of jumps (always positive): 10 small steps (in first interval) + 10 big steps (in second interval) + 10 bigbig steps (in third interval)
For example: (1G, 2G, 4G, 8G, 24G, 37G,....., 512G) + (lambda*18G, lambda*72G, ..., lambda*816G) + (lambda^2*41G, lambda^2*10G, ..., lambda^2*653G)
+ 1 jump from range1 to range2 or from range1 to range2: P -> lambda*P
We set the probability of performing a jump between 2 different ranges to 50%; the other 50% is for normal jumps.
For a single step we compute the normal jump +k*G
(3 multiplications for the inverse + 1M + 1S for the x-coordinate + 1M for the y-coordinate = about 6M + some additions)
or a single multiplication: beta*x
You cannot have loops; you only have to avoid 3 consecutive steps like: P -> lambda*P -> lambda*(lambda*P) -> lambda*(lambda^2*P) = P. In this case you do a normal step P+kG instead of lambda*P.
If we look at a triple of consecutive jumps, we will have:
probability
normal jump + normal jump + normal jump = (1/2)^3
normal jump + normal jump + lambda jump = (1/2)^3 * 3
normal jump + lambda jump + lambda jump = (1/2)^3 * 3
lambda jump + lambda jump + lambda jump = (1/2)^3 but -> loop, it is forbidden -> it becomes: lambda jump + lambda jump + lambda jump + normal jump
On average for each 3 points we need to perform (1/2)^3*18M + (1/2)^3*3*13M + (1/2)^3*3*8M + (1/2)^3*8M = 11.125M, 3.71M for each point.
In this way you perform sqrt(3).2.sqrt(N) steps in the same time you are performing now sqrt(3).2.(3,71/6).sqrt(N)= 2.14*sqrt(N)
If you double the interval:
[1,2,3,....., n-1,n] + [lambda, 2*lambda, 3*lambda, ....., n*lambda]
you perform sqrt(2).2.sqrt(N) steps in the same time you are performing now sqrt(2).2.(4,125/6).sqrt(N)= 1.945*sqrt(N)
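The averages in this post can be reproduced with a few lines of exact arithmetic. This is only a sketch with hypothetical names; it takes the post's cost model at face value (6 multiplications for a normal jump, 1 for a lambda jump, each step type chosen with probability 1/2) and reads the forbidden triple as two lambda jumps followed by a normal jump, which is what the 8M term implies. The 4.125M per-point figure for the doubled interval is taken from the post as given, not derived here.

```python
from fractions import Fraction
from math import sqrt

NORMAL_COST = 6  # multiplications for a normal jump P + k*G (figure from the post)
LAMBDA_COST = 1  # one multiplication beta*x for a lambda jump (figure from the post)

# (number of orderings, cost in multiplications) for each kind of 3-step sequence
TRIPLES = [
    (1, 3 * NORMAL_COST),                # normal, normal, normal
    (3, 2 * NORMAL_COST + LAMBDA_COST),  # two normal, one lambda
    (3, NORMAL_COST + 2 * LAMBDA_COST),  # one normal, two lambda
    (1, 2 * LAMBDA_COST + NORMAL_COST),  # three lambda is forbidden: last step becomes normal
]

def average_triple_cost() -> Fraction:
    """Expected cost of 3 consecutive steps, each ordering having probability (1/2)^3."""
    return sum(Fraction(count, 8) * cost for count, cost in TRIPLES)

def time_factor(space_multiplier: int, cost_per_point) -> float:
    """Coefficient c in c*sqrt(N): 2*sqrt(k*N) steps, scaled by per-point cost relative to a normal jump."""
    return sqrt(space_multiplier) * 2 * float(cost_per_point) / NORMAL_COST

triple = average_triple_cost()
per_point = triple / 3
print(float(triple), round(float(per_point), 2))  # cost per triple and per point
print(round(time_factor(3, per_point), 2))        # tripled interval
print(round(time_factor(2, 4.125), 3))            # doubled interval, per-point cost from the post
```

Both coefficients land close to the plain 2*sqrt(N), which matches the opening remark that the idea is not much faster.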
|
|
|
|
HardwareCollector
Member
Offline
Activity: 144
Merit: 10
|
|
May 10, 2020, 10:47:13 PM |
|
This is the next public key (#110, with a private key in range [ 2^109 , 2^110 - 1], 109 bit) they have been looking for over 7,5 months (about 225 days): 0309976ba5570966bf889196b7fdf5a0f9a1e9ab340556ec29f8bb60599616167d
(address: 12JzYkkN76xkwvcPT6AWKZtGX6w2LAgsJg)
The Pollard's kangaroo ECDLP solver needs 2*(2^(109/2)) = 2^55.5 steps to retrieve this private key, a GPU that computes 2^30 steps/sec would take 2^25.5 seconds, about 550 days.
Based solely on the tests I’ve done so far with RTX 2080 Tis, it’s reasonable to expect that a 110-bit interval can be solved in 5 days, 7 hours, and 39 minutes on average with 128 RTX 2080 Tis, while consuming ~640GB (~80 bytes/point in the hash table) of distinguished point storage with a 23-bit point mask. Also, one might get very lucky and solve it in less than the average time, say a couple of days, or take 18-19 days in the worst case. But we can definitely reduce the memory requirements by using only 4 bytes for the herd type, 8 bytes for the x-coordinate, and 16 bytes for the starting position of the kangaroo. When there’s a collision, we re-walk the kangaroo’s path up to the distinguished point and check for a solution.
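The time and storage figures above can be approximated from three inputs. The sketch below uses hypothetical names and assumptions that are not stated in the post: 2*sqrt(N) group operations on average, about 1225 MK/s per RTX 2080 Ti (the single-card rate seen in logs earlier in the thread), and one distinguished point stored per 2^23 operations. Under those assumptions it lands very close to the quoted 5 days 7 hours 39 minutes and ~640GB.

```python
def estimate(interval_bits: int, n_gpus: int, mkeys_per_gpu: float = 1225.0,
             dp_bits: int = 23, bytes_per_point: int = 80):
    """Return (seconds, GiB of DP storage) under the assumptions described above."""
    operations = 2.0 * 2.0 ** (interval_bits / 2)  # assumed 2*sqrt(N) average
    seconds = operations / (n_gpus * mkeys_per_gpu * 1e6)
    points = operations / 2 ** dp_bits             # one DP per 2^dp_bits operations
    gib = points * bytes_per_point / 2 ** 30
    return seconds, gib

seconds, gib = estimate(110, 128)
print(f"110-bit, 128 GPUs: {seconds / 86400:.2f} days, {gib:.0f} GiB at 80 bytes/point")

# Compact record proposed in the post: 4 (herd type) + 8 (x-coordinate) + 16 (start position) bytes
_, compact_gib = estimate(110, 128, bytes_per_point=4 + 8 + 16)
print(f"compact 28-byte records: {compact_gib:.0f} GiB")

# 95-bit interval on 96 GPUs, for comparison with a ~60 minute expectation
minutes_95 = estimate(95, 96)[0] / 60
print(f"95-bit, 96 GPUs: {minutes_95:.0f} minutes")
```

The compact record trades memory for time: the saving comes at the cost of re-walking a path whenever a collision is found.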
|
|
|
|
Elliptic23
Newbie
Offline
Activity: 22
Merit: 3
|
|
May 11, 2020, 12:11:42 AM |
|
This is the next public key (#110, with a private key in range [ 2^109 , 2^110 - 1], 109 bit) they have been looking for over 7,5 months (about 225 days): 0309976ba5570966bf889196b7fdf5a0f9a1e9ab340556ec29f8bb60599616167d
(address: 12JzYkkN76xkwvcPT6AWKZtGX6w2LAgsJg)
The Pollard's kangaroo ECDLP solver needs 2*(2^(109/2)) = 2^55.5 steps to retrieve this private key, a GPU that computes 2^30 steps/sec would take 2^25.5 seconds, about 550 days.
128 RTX 2080 Tis. Who else other than you has that kind of GPU power? Those of us lucky to have 4 2080 Ti's have no chance of solving anything in time.
|
|
|
|
HardwareCollector
Member
Offline
Activity: 144
Merit: 10
|
|
May 11, 2020, 02:36:50 AM |
|
Who else other than you has that kind of GPU power? Those of us lucky to have 4 2080 Ti's have no chance of solving anything in time.
A lot of miners do, but we do not have a server with that much RAM available to solve the 110-bit interval problem as fast as possible. So we need to come up with the best time-memory trade-off parameters. Also, I am by far more interested in the intellectual challenge, although the prize (in BTC) can be a very good motivator for some.
|
|
|
|
MrFreeDragon
|
|
May 11, 2020, 03:07:10 AM |
|
-snip- Based solely on the tests I’ve done so far with RTX 2080 Tis, it’s reasonable to expect that a 110-bit interval can be solved in 5 days, 7 hours, and 39 minutes on average with 128 RTX 2080 Tis, while consuming ~640 GB (~80 bytes/point in the hash table) of distinguished-point storage with a 23-bit distinguished-point mask. One might also get very lucky and solve it in less than the average time, say a couple of days, or take 18-19 days in the worst case. -snip-
Anyway, this GPU power is not free. Even if you own it, you have the alternative (opportunity) costs, such as using the cards for mining with a guaranteed profit. As for renting, I made a quick search and found that 8 x 2080 Ti with 384 GB RAM and a Xeon Gold can be rented for 572 EUR per week, i.e. 620 USD/week (source: https://www.leadergpu.com/#chose-best). So the cost for 128 devices would be 128/8 * 620 = 9.9 kUSD per week. The current prize value of #110 is 1.1 x 8.8 kUSD = 9.7 kUSD, so the prize is more or less the same as the investment needed to find it. But there is no 100% guarantee of solving the key; it is only 50%.

OK, that rent for 8 x 2080 Ti is expensive. I am sure the real cost is 1.5-2 times less (it is possible to find 8 x 2080 Ti for 300-400 USD/week). However, even if you invest 2 times less (5 kUSD), you still have only a 50% probability of solving the key and receiving 9.7k. It is like a casino roulette game: put 5k on "red" or "black" and receive 2 times more with 50% probability in just 1 spin (with a total duration of 1 minute, not 1 week).

Or someone not ready to invest 5k can rent just 4 x 2080 Ti for 200-300 USD/week, divide the total range by (128/4) = 32, select any one of the 32 parts, and search within it for 1 week with the rented 4 x 2080 Ti. If lucky, he can also win. The prize would be the same, so the pot odds are 9.7k / 0.3k = 32-33, which is more or less like a bet on a single number on the same roulette (with 35:1 pot odds and chances to win), however with 2 times less probability.

So I agree that the method is interesting for intellectual reasons. Money is just an advantage, not the main goal.
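The cost arithmetic above can be laid out as a short Python sketch. All prices, the prize value, and the 50% success probability are the figures from this post; the function names are made up for illustration, and nothing here models how the search time actually changes when the range is split.

```python
def rig_cost_usd(n_gpus, gpus_per_server=8, usd_per_server_week=620.0, weeks=1.0):
    """Rental cost for n_gpus, priced per 8-GPU server per week."""
    return n_gpus / gpus_per_server * usd_per_server_week * weeks


def payout_ratio(prize_usd, stake_usd):
    """Prize divided by the amount put at risk (the 'pot odds' in the post)."""
    return prize_usd / stake_usd


def expected_value(prize_usd, win_probability, stake_usd):
    """Average net result of one attempt."""
    return win_probability * prize_usd - stake_usd


if __name__ == "__main__":
    prize = 1.1 * 8800.0  # 1.1 BTC at 8.8 kUSD, as quoted in the post
    print("128 GPUs, one week:", rig_cost_usd(128), "USD")
    print("prize:", round(prize), "USD")
    print("EV at 5k stake, 50% chance:", round(expected_value(prize, 0.5, 5000.0)), "USD")
    print("payout ratio at 300 USD stake:", round(payout_ratio(prize, 300.0), 1))
```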
|
|
|
|
Jean_Luc (OP)
|
|
May 11, 2020, 03:54:28 AM |
|
Good luck solving puzzle #110. I have the feeling that it will be found very soon.
Take care to configure the 109-bit range correctly:
Puzzle #110: 109bit
12JzYkkN76xkwvcPT6AWKZtGX6w2LAgsJg
2000000000000000000000000000
3FFFFFFFFFFFFFFFFFFFFFFFFFFF
0309976ba5570966bf889196b7fdf5a0f9a1e9ab340556ec29f8bb60599616167d
Ex for puzzle #80 (79bit key)
1BCf6rHUW6m3iH2ptsvnjgLruAiPQQepLe
80000000000000000000
FFFFFFFFFFFFFFFFFFFF
037E1238F7B1CE757DF94FAA9A2EB261BF0AEB9F84DBF81212104E78931C2A19DC

:\C++\Kangaroo\VC_CUDA10>x64\Release\Kangaroo.exe -d 17 -t 0 -gpu in79.txt
Kangaroo v1.4
Start:80000000000000000000
Stop :FFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^79
Jump Avg distance: 2^38.96
Number of kangaroos: 2^18.58
Suggested DP: 20
Expected operations: 2^40.60
Expected RAM: 496.9MB
DP size: 17 [0xFFFF800000000000]
GPU: GPU #0 GeForce GTX 1050 Ti (6x128 cores) Grid(12x256) (45.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^18.58 kangaroos [2.0s]
[159.53 MK/s][GPU 159.53 MK/s][Count 2^39.82][Dead 0][01:55:44 (Avg 02:54:02)][228.4/292.0MB]
Key# 0 [1S]Pub: 0x037E1238F7B1CE757DF94FAA9A2EB261BF0AEB9F84DBF81212104E78931C2A19DC
Priv: 0xEA1A5C66DCC11B5AD180

Done: Total time 01:55:48
I will have a look to see if I can do something with endomorphism.
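Since a wrong range makes the whole run useless, the bounds used for puzzles #110 and #80 and the DP mask printed in the log can be cross-checked with a small Python sketch. The helper names are hypothetical. The mask formula only reproduces the value printed in the log (the top 17 bits of a 64-bit word set); how Kangaroo applies that mask internally is not shown in this thread.

```python
def puzzle_range(n):
    """Start and end of puzzle #n: private keys in [2^(n-1), 2^n - 1]."""
    return 1 << (n - 1), (1 << n) - 1


def dp_mask(dp_bits, width=64):
    """Mask with the top dp_bits of a width-bit word set."""
    return ((1 << dp_bits) - 1) << (width - dp_bits)


if __name__ == "__main__":
    for n in (80, 110):
        start, stop = puzzle_range(n)
        print(f"#{n}: {start:X} {stop:X}")
    print(f"DP 17 mask: 0x{dp_mask(17):016X}")

    # The key found in the #80 run must lie inside the configured range.
    start, stop = puzzle_range(80)
    print(start <= 0xEA1A5C66DCC11B5AD180 <= stop)
```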
|
|
|
|
stalker00075
|
|
May 11, 2020, 11:14:21 AM |
|
Please tell me whether Kangaroo could be changed to check not one public key but several at the same time. Is this possible (let's say 10,000 at once)?
You are the best! The program is good, and if I manage to find something I won’t forget you.
|
|
|
|
|