Telariust
Jr. Member
Offline
Activity: 38
Merit: 18
|
|
August 24, 2019, 06:30:14 PM Last edit: August 24, 2019, 06:56:55 PM by Telariust |
|
..OpenSSL library is pretty slow.
OpenSSL's advantage in versatility, but not in speed, no. I bet python+coincurve will be faster! ideally you need C bitcoin/secp256k1 I am porting your code to bitcoin/secp256k1 as soon as the source code is published. (there is the usual addition and multiplication of points, nothing complicated) (..but I feel you want to do it yourself, your own and not someone else's head?)
|
|
|
|
BurtW (OP)
Legendary
Offline
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
|
|
August 25, 2019, 06:10:18 AM |
|
One weird thing is that the runs on the Linux machine took longer than the ones on Windows (usually is the other way around). And the other weird thing is that the program is called 'roo' in Windows and 'ROO.exe' in Linux It is actually ROO.exe in Windows but Windows adds the .exe for you and is not case sensitive. ..OpenSSL library is pretty slow.
... ideally you need C bitcoin/secp256k1 ... Thanks for the suggestion! I should have thought of that first. It is silly to use OpenSSL for secp256k1 when there is a highly optimized specialized implementation available. Results available soon.
|
Our family was terrorized by Homeland Security. Read all about it here: http://www.jmwagner.com/ and http://www.burtw.com/ Any donations to help us recover from the $300,000 in legal fees and forced donations to the Federal Asset Forfeiture slush fund are greatly appreciated!
|
|
|
BurtW (OP)
Legendary
Offline
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
|
|
August 26, 2019, 10:23:54 PM Last edit: August 27, 2019, 01:54:34 AM by BurtW |
|
We now have a compile time switch between OpenSSL, secp256k1(SCALAR_8X32, FIELD_10X26), secp256k1(SCALAR_4X64, FIELD_5X52), and secp256k1(SCALAR_4X64, FIELD_5X52, ENDOMORMPHSIM). Here is the run time and hops per second comparison for the 32 unit tests: pkey Standard OpenSSL library secp256k1 (S 8X32, F 10X26) secp256k1(S 4X64, F 5X52) secp256k1(S4X64, F5X52, EM) bits hops/second test run time hops/second test run time hops/second test run time hops/second test run time ==== =========================== =========================== =========================== =========================== 1 2003.205128 6.0950 msecs 5114.435494 6.3592 msecs 4318.255425 6.4021 msecs 5269.397971 5.8835 msecs 2 1757.199025 8.3635 msecs 4655.764418 7.8586 msecs 5142.049107 7.0887 msecs 4802.497299 7.6262 msecs 3 1716.811880 5.9904 msecs 5003.752815 6.9754 msecs 5110.514884 6.0526 msecs 4268.487888 6.8087 msecs 4 1960.438354 26.3895 msecs 5092.427560 27.7752 msecs 5187.260089 7.4309 msecs 4996.252810 7.7770 msecs 5 1880.119970 30.0658 msecs 5013.009954 31.3478 msecs 5215.318134 9.4163 msecs 5236.254831 9.7029 msecs 6 1608.708475 30.6862 msecs 4410.467510 30.2971 msecs 4926.917392 22.3417 msecs 5644.933672 20.1065 msecs 7 1866.896315 34.8628 msecs 5558.843043 31.2577 msecs 5131.685676 21.5328 msecs 5181.347150 20.9325 msecs 8 1990.502460 31.4858 msecs 5039.596832 30.4067 msecs 4838.430966 20.6268 msecs 5072.279990 28.5510 msecs 9 1928.760658 40.3494 msecs 4950.639215 41.2204 msecs 5260.714838 22.1908 msecs 5241.170939 20.5226 msecs 10 1890.373754 81.8738 msecs 5256.467332 57.0551 msecs 5429.994625 32.7629 msecs 5453.290894 41.4165 msecs 11 1694.971872 101.2429 msecs 4763.924918 61.4267 msecs 5340.172912 41.3388 msecs 5439.249145 46.5306 msecs 12 1858.940556 1.0718 secs 5145.349959 527.2134 msecs 5326.737472 410.4776 msecs 5221.758623 434.4509 msecs 13 1895.071971 51.6923 msecs 5386.060874 38.1821 msecs 5233.349227 23.4991 msecs 5373.390968 31.1977 msecs 14 1908.767643 896.4858 msecs 5082.972044 385.9540 msecs 5430.446276 321.9922 msecs 5441.145301 317.7290 msecs 15 1939.352878 439.6496 msecs 5326.381969 195.0367 msecs 5488.742759 160.9818 msecs 5420.695712 179.5955 msecs 16 1918.732814 530.3667 msecs 5258.277326 219.2995 msecs 5510.868625 188.6750 msecs 5466.555751 201.7339 msecs 17 1895.048388 205.8168 msecs 5430.753319 95.6410 msecs 5453.448080 79.5526 msecs 5339.284536 85.5150 msecs 18 1930.620117 1.3492 secs 5136.760660 546.6825 msecs 5545.967087 474.6491 msecs 5433.455762 484.8219 msecs 19 1959.292325 594.2687 msecs 5364.976931 228.5351 msecs 5407.810389 219.3399 msecs 5489.027106 219.1785 msecs 20 1937.090596 1.2060 secs 5519.443944 438.4141 msecs 5567.924674 427.1013 msecs 5487.179193 432.8127 msecs 21 1934.545311 2.1349 secs 5493.508717 774.8660 msecs 5504.112724 758.1783 msecs 5469.417987 766.5974 msecs 22 1929.260673 3.4962 secs 5376.994864 1.2774 secs 5564.087823 1.2174 secs 5465.781801 1.2414 secs 23 1934.889761 7.0861 secs 5476.220007 2.5224 secs 5548.504187 2.4744 secs 5465.500565 2.5118 secs 24 1939.683627 5.8652 secs 5461.486912 2.1110 secs 5548.744349 2.0488 secs 5417.722111 2.1042 secs 25 1925.342112 47.4712 secs 5476.818503 16.7678 secs 5557.570589 16.4377 secs 5479.097929 16.6700 secs 26 1930.362383 16.1728 secs 5486.557150 5.7494 secs 5553.792419 5.6187 secs 5544.780029 5.6293 secs 27 1945.269663 7.2153 secs 5443.169561 2.6035 secs 5534.059747 2.5462 secs 5507.743611 2.5530 secs 28 1920.384111 37.4526 secs 5479.914967 13.1715 secs 5553.680361 12.9380 secs 5507.436467 13.0454 secs 29 1940.307975 7.8315 secs 5530.787084 2.7661 secs 5547.749218 2.7347 secs 5510.767015 2.7558 secs 30 1927.658333 1.1905 mins 5468.604811 25.2139 secs 5438.685448 25.3136 secs 5555.456216 24.7841 secs 31 1942.378857 28.4496 secs 5520.634781 10.0240 secs 5501.207943 10.0433 secs 5510.264398 10.0245 secs 32 1904.831829 59.7541 secs 5521.457239 20.6445 secs 5537.529539 20.5532 secs 5496.131886 20.7084 secs
Standard OpenSSL library averaged 1897 hops per second secp256k1 (S 8X32, F 10X26) averaged 5258 hops per second secp256k1(S 4X64, F 5X52) averaged 5352 hops per second secp256k1(S 4X64, F 5X52, , ENDOMORMPHSIM) averaged 5350 hops per second We currently have ECMULT_WINDOW_SIZE set to 15. Does anyone know how changing this setting will affect performance?
|
Our family was terrorized by Homeland Security. Read all about it here: http://www.jmwagner.com/ and http://www.burtw.com/ Any donations to help us recover from the $300,000 in legal fees and forced donations to the Federal Asset Forfeiture slush fund are greatly appreciated!
|
|
|
BurtW (OP)
Legendary
Offline
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
|
|
August 27, 2019, 10:38:46 AM Last edit: August 27, 2019, 10:55:34 AM by BurtW |
|
I draw your attention, secp256k1_scalar_set_int() has x32-x64 problem my fix void secp256k1_scalar_set_uint64(secp256k1_scalar *r, uint64_t v){ #if defined(USE_SCALAR_4X64) r->d[0] = v; r->d[1] = r->d[2] = r->d[3] = 0; #elif defined(USE_SCALAR_8X32) r->d[0] = (uint32_t)(v); r->d[1] = (uint32_t)(v>>32); r->d[2]=r->d[3]=r->d[4]=r->d[5]=r->d[6]=r->d[7]=0; #endif }
I must probably unsubscribe to the developers I completely forgot maybe your problems grow from here. We also found this issue and also quickly created our own functions to get it to work. We also called our new functions secp256k1_scalar_set/get_uint64 so I guess great minds think alike! We were just trying to quickly get everything to work so we have not optimized anything yet. Our function was: static uint64_t secp256k1_scalar_get_uint64(secp256k1_scalar * a) { uint8_t b32[32]; secp256k1_scalar_get_b32(b32, a);
const uint64_t ret = ((uint64_t)b32[24] << 56 ) | ((uint64_t)b32[25] << 48 ) | ((uint64_t)b32[26] << 40 ) | ((uint64_t)b32[27] << 32 ) | ((uint64_t)b32[28] << 24 ) | ((uint64_t)b32[29] << 16 ) | ((uint64_t)b32[30] << 8 ) | ((uint64_t)b32[31] << 0 ) ;
return ret; }
Obviously yours is much better/faster so we have replaced ours with yours. Thanks! ENDOMORMPHSIM
Endomorphism is implemented here only to accelerate the signing of the transaction, so do not expect this to accelerate the usual multiplication. When we ran our benchmark there was no difference. That explains it, so thanks. We have turned off this flag since it does not help us. We currently have ECMULT_WINDOW_SIZE set to 15. Does anyone know how changing this setting will affect performance?
table #1 eprint.iacr.org/2016/103.pdf The size of a pre-computed table in memory is created just for this. Speeds up multiplication in exchange for RAM. Ryan also used the increased size -w 18/20/22/24.. in the brainflayer. ECMULT_WINDOW_SIZE? maybe ECMULT_TABLE_SIZE with WINDOW_G? ecmult_impl.h #else /** One table for window size 16: 1.375 MiB. */ #define WINDOW_G 16 #endif We currently have ECMULT_WINDOW_SIZE set to 15. Does anyone know how changing this setting will affect performance?
questions like these can only be answered with a well designed benchmark. in this case you have to measure how long it takes to do the precomputation with different values ∈ [2, 24] and how that compares with the rest of computation. if the rest is significantly longer then it can justify that long initial precomputation and allocation of memory. Thanks for the very useful information. We will play with this setting in the future. We also found this in the code: /** Larger values for ECMULT_WINDOW_SIZE result in possibly better * performance at the cost of an exponentially larger precomputed * table. The exact table size is * (1 << (WINDOW_G - 2)) * sizeof(secp256k1_ge_storage) bytes, * where sizeof(secp256k1_ge_storage) is typically 64 bytes but can * be larger due to platform-specific padding and alignment. * If the endomorphism optimization is enabled (USE_ENDOMORMPHSIM) * two tables of this size are used instead of only one. */ # define WINDOW_G ECMULT_WINDOW_SIZE
/* Noone will ever need more than a window size of 24. The code might * be correct for larger values of ECMULT_WINDOW_SIZE but this is not * not tested. * * The following limitations are known, and there are probably more: * If WINDOW_G > 27 and size_t has 32 bits, then the code is incorrect * because the size of the memory object that we allocate (in bytes) * will not fit in a size_t. * If WINDOW_G > 31 and int has 32 bits, then the code is incorrect * because certain expressions will overflow. */ #if ECMULT_WINDOW_SIZE < 2 || ECMULT_WINDOW_SIZE > 24 # error Set ECMULT_WINDOW_SIZE to an integer in range [2..24]. #endif
|
Our family was terrorized by Homeland Security. Read all about it here: http://www.jmwagner.com/ and http://www.burtw.com/ Any donations to help us recover from the $300,000 in legal fees and forced donations to the Federal Asset Forfeiture slush fund are greatly appreciated!
|
|
|
Firebox
Jr. Member
Offline
Activity: 59
Merit: 3
|
|
August 29, 2019, 02:15:17 PM |
|
Who need some code, I have it for Python 2.7! This is my own code and I apologize for absolutely no-PEP. And for my English, I apologize too. The code is not so fast, but my estimation is more optimistic - only 10000 years on average for the problem 105. Tests with C++ (VS2013) leads to x2..x3 faster calculations only (single-thread), but much more complicated code. The code is on my WebSite: http://fe57.org/forum/thread.php?board=4&thema=1#1This is my main interest. You are now informed that my site exists in the Universe! To be more interesting: test problem 75 from pickachunakapika to drotika solution: 26970178626862821196031 (decimal) I guess that the solution can be posted now because all deadlines are finished. Or I miss something and drotika was already posted the answer. Very nice For those who use python3, you may have to change a few things for the code to work: 1) for all print statements you need (), for example line 60 change this > print 'prime must be 3 modulo 4' to this > print ('prime must be 3 modulo 4') 2) make sure you distinguish between integer division // and float division / for example, line 63 change this > pw = (p + 1) / 4 to this > pw = (p + 1) // 4 For case 32 I get P-table prepared tame and wild herds are prepared 6091.198 h/s 6119.991 h/s 6047.619 h/s 6028.418 h/s 6020.816 h/s 5995.830 h/s 6041.220 h/s 6026.008 h/s 6077.185 h/s total time: 45.38 sec SOLVED: 3093472814 Hops: 273742 tame and wild herds are prepared 5995.223 h/s 6044.420 h/s 5966.429 h/s 6050.819 h/s total time: 68.41 sec SOLVED: 3093472814 Hops: 137749 tame and wild herds are prepared 6038.017 h/s 5968.435 h/s total time: 82.83 sec SOLVED: 3093472814 Hops: 85546 165679.0 +/- 56093.66357263537 Average time to solve: 27.61 sec Hi, how did you change line 160 for python 3? if you mean this line: print '%.3f h/s'%((Hops-Hops_old)/(t1-t0)) just add parenthesis print ('%.3f h/s'%((Hops-Hops_old)/(t1-t0))) do the same for all print statements. Now everything works! I didn’t put it there () thank I have installed gmpy2 module v.2.0.8 and got a good results, speed increased from ~6k h/s till ~150k h/s and for e.g. #45 solved in ~100sec. But when I run the same script for e.g. #105 files wild.txt & tame.txt are empty ((. When you, guys, run this script for the #105 addr, do your files wild.txt & tame.txt being filled-up with any data? If yes share pls your version of script.
|
|
|
|
Mangal99
Newbie
Offline
Activity: 4
Merit: 0
|
|
August 29, 2019, 03:28:12 PM |
|
To Firefox: At startup files wild.txt & tame.txt were also empty, but after 2 days of work has appeared a few lines.
Does that mean that kargaroo jumps appears very late? I think yes. Puzzle # 105 has a lot of difficulty. Obviously, you need a version of the program on the GPU. There are already several programs for computing using the Pollard algorithm on the GPU, but they are not in the public domain. Also on freelance sites there are several orders for the development of such programs on CUDA and OpenCL. Budget from 300 to 1000 dollars. Maybe someone will share or sell a similar program.
|
|
|
|
Firebox
Jr. Member
Offline
Activity: 59
Merit: 3
|
|
August 29, 2019, 05:54:21 PM |
|
To Firefox: At startup files wild.txt & tame.txt were also empty, but after 2 days of work has appeared a few lines.
Does that mean that kargaroo jumps appears very late? I think yes. Puzzle # 105 has a lot of difficulty. Obviously, you need a version of the program on the GPU. There are already several programs for computing using the Pollard algorithm on the GPU, but they are not in the public domain. Also on freelance sites there are several orders for the development of such programs on CUDA and OpenCL. Budget from 300 to 1000 dollars. Maybe someone will share or sell a similar program. That is the problem, that versions of the script which are capable to use a GPU resources are available only for money. If any of it realy exist... I have a huge doubt! Otherwise why to sell it if you have it )). Take a fu****g loan in the bank, rent any huge cloud GPU VPS and open all addresses with available PUBKEY. That it! But as far as we can see, since this scripts were anounced none of the 100+ addresses was openned, so looks like or it doesn't work or it's fake.))) Anyhow, hopefully a good part of community soon will release an open source version on such script. Fingers crossed!
|
|
|
|
asher1101
Newbie
Offline
Activity: 17
Merit: 2
|
|
August 31, 2019, 07:03:43 PM |
|
Hello, I am glad to participate in this project. Regarding the Elliptic Curve, I think there is a weakness in the current Algorithm.
|
|
|
|
racminer
Member
Offline
Activity: 245
Merit: 17
|
|
August 31, 2019, 10:59:18 PM Last edit: August 31, 2019, 11:28:08 PM by racminer Merited by vapourminer (1) |
|
Finally I got gmpy2 installed on windows Thanks to this site : " https://www.lfd.uci.edu/~gohlke/pythonlibs/" which provides unofficial Windows binaries for Python Extension Packages. D:\newwork\python>python kang.py P-table prepared tame and wild herds are prepared 115378.321 h/s 115352.684 h/s 113976.331 h/s total time: 18.41 sec SOLVED: 1003651412950 Hops: 2101873 tame and wild herds are prepared 112978.446 h/s 112514.481 h/s 105994.030 h/s total time: 34.77 sec SOLVED: 1003651412950 Hops: 1799360 tame and wild herds are prepared 110405.305 h/s 110319.404 h/s 113115.999 h/s 114104.774 h/s 114309.523 h/s 112088.956 h/s 110844.133 h/s 113467.999 h/s 112101.691 h/s total time: 81.57 sec SOLVED: 1003651412950 Hops: 5249192 3050141.6666666665 +/- 1102987.655596129 Average time to solve: 27.19 sec
My modest config: Intel(R) Core(TM) i7-2670QM CPU @2.20GHz 16Gb (Dell Inspiron N5110) Without gmpy2, I was getting some 8000 h/s ... for example, if you are running windows 10 64 bit and you have python 3.7 installed, download this file : gmpy2-2.0.8-cp37-cp37m-win_amd64.whl from the site above and do: pip install gmpy2-2.0.8-cp37-cp37m-win_amd64.whl Next 0) uncomment ligne 3 (import gmpy2) 1) uncomment ligne 45 (c = dy * gmpy2.invert(dx, p) % p) 2) comment ligne 45 (#c = dy * rev(dx, p) % p ) good luck ! Ooops !!! , I saw this afteward ... https://bitcointalk.org/index.php?topic=5166284.260 from Telariust
|
|
|
|
racminer
Member
Offline
Activity: 245
Merit: 17
|
|
September 01, 2019, 12:04:13 PM |
|
Terminology for the science fair project
------------------------- -------------------------
0x034AF4B81F8C450C2C870CE1DF184AFF1297E5FCD54944D98D81E1A545FFB22596 54 bit 222872.369 h/s 223958.708 h/s --------------- 230794.045 h/s 230895.206 h/s 230857.369 h/s What is your PC configuration ? intel i3-6100, 8gb ddr4 ram, ubuntu 18.x It runs at 3.7GHz ... that explains your rate ...
|
|
|
|
brainless
Member
Offline
Activity: 330
Merit: 34
|
|
September 04, 2019, 07:29:45 PM |
|
I have tried the python code on linux and windows with or without gmpy2. I was able to retrieve all known private keys (up to case 100) with linux python. However when using windows things go wrong beyond case 59. For exemple case 60 ... the private key should be 0xFC07A1825367BDA but I get 0xFC07A1825367BXX with XX a random number case 61 ... the private key should be 0x13C96A3742F64906 but I get 0x13C96A3742F64XXX with XXX a random number case 63 ... the private key should be 0x7CCE5EFDACCF6808 but I get 0x7CCE5EFDACCF6XXX with XXX a random number etc ... This bug persists even without gmpy2 !!! I am working on it What hardware do you use and how long it took to crack 100? To check all known private keys, I changed the code to search within a given range [a, b], not necessarily the whole range. For instance, private key for case 100 is 0xaf55fc59c335c8ec67ed24826, so if I run my modified python script for range [ 0xaf55fc59c335c8e0000000000, 0xaf55fc59c335c8f0000000000 ], I get the answer in less than half a minute. like to share modifed script for search range ?
|
13sXkWqtivcMtNGQpskD78iqsgVy9hcHLF
|
|
|
BurtW (OP)
Legendary
Offline
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
|
|
September 05, 2019, 01:22:22 AM |
|
Does anyone know why this post has been erased? it is still here because "brainless" quoted it?
Yes, your entire conversation is still in the thread, again: Please note that since this thread will be used as reference material in a science fair project we delete most posts in this thread just to keep it clean. In other words, once a post is quoted then we delete the original post. Your posts will all still be in the thread.The thread stays shorter and cleaner this way. I hope this is not an issue for anyone. It is just to keep the thread as short and clean as possible. It is not because we do not appreciate your posts. We do!Thanks!
|
Our family was terrorized by Homeland Security. Read all about it here: http://www.jmwagner.com/ and http://www.burtw.com/ Any donations to help us recover from the $300,000 in legal fees and forced donations to the Federal Asset Forfeiture slush fund are greatly appreciated!
|
|
|
BurtW (OP)
Legendary
Offline
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
|
|
September 05, 2019, 01:26:07 AM Last edit: September 05, 2019, 01:38:01 AM by BurtW |
|
I am up for testing also? It is live on GitHub yet?
Hello, I am glad to participate in this project. Regarding the Elliptic Curve, I think there is a weakness in the current Algorithm.
Also, I'm interested in testing... the fpga port. I realise that is a far way off, and I'm willing to help with the development if you and your daughter are open to the idea.
You have been added to the list of possible testers. See the OP of this thread. Whatup whatup, all doing their own test, nice to have something to test on a pkey or address, how should i know how it works, i don't know, any news fpga or gpu or cpu dont matter, i'm in, systems ready, got two days off, so love peace out,
You are already on the list. We are still working on our first release of code.
|
Our family was terrorized by Homeland Security. Read all about it here: http://www.jmwagner.com/ and http://www.burtw.com/ Any donations to help us recover from the $300,000 in legal fees and forced donations to the Federal Asset Forfeiture slush fund are greatly appreciated!
|
|
|
BurtW (OP)
Legendary
Offline
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
|
|
September 05, 2019, 01:46:19 AM |
|
Terminology for the science fair project We have created the following terminology, inspired by the ITU radio bands, that we will be using to describe the relative entropy of the various private keys. For example private keys that have between 61 and 70 bits are called "low entropy" or LE private keys. Keys that use between 91 and 100 bits are called "high entropy" or HE private keys. The entire scale is as follows: max bits entropy range designation -------- ----------------------------- 10 RLE Ridiculously Low Entropy 20 TLE Tremendously Low Entropy 30 ELE Extremely Low Entropy 40 SLE Super Low Entropy 50 ULE Ultra Low Entropy 60 VLE Very Low Entropy 70 LE Low Entropy 80 MLE Medium Low Entropy 90 MHE Medium High Entropy 100 HE High Entropy 110 VHE Very High Entropy 120 UHE Ultra High Entropy 130 SHE Super High Entropy 140 EHE Extremely High Entropy 150 THE Tremendously High Entropy 160 RHE Ridiculously High Entropy
Using this terminology everyone is currently searching for a HE private key (105) and our program is currently only capable of finding ULE private keys in a 24 hour period: ver = 138 with 128 bit integers, sizeof(int) = 4 test = 53 seed = none (default value) bits = 54 a = 0x20000000000000 b = 0x3FFFFFFFFFFFFF P = 0x034AF4B81F8C450C2C870CE1DF184AFF1297E5FCD54944D98D81E1A545FFB22596 (33 bytes) ba = 0x1FFFFFFFFFFFFF k0000 = 0x2DCF462904B478D8 s = 0x236FB6D5AD1F43 s*G = 0x034AF4B81F8C450C2C870CE1DF184AFF1297E5FCD54944D98D81E1A545FFB22596 (33 bytes) time = 13.6929 hours(49294385241200 nsecs) hps = 5460.352734
We are making progress and we still have not converted our unique implementation of the algorithm to be run on a GPU or FPGA yet. Using this terminology we can say that our program can pretty easily find all RLE and TLE keys in simple brute force mode [set the PRF to always return 1]. Here are the execution times and number of hops in brute force mode from our program for the RLE, TLE and ELE private keys: bits brute force time hops (of 1*G) ---- ---------------- ------------- 1 0.8077 msecs 1 2 0.8464 msecs 1 3 9.8947 msecs 1 4 7.3781 msecs 8 5 12.5342 msecs 11 6 10.0498 msecs 15 7 20.7085 msecs 52 8 16.8994 msecs 32 9 15.0751 msecs 45 10 114.3055 msecs 510 11 207.3905 msecs 893 12 298.7654 msecs 1413 13 642.6892 msecs 2976 14 1.2877 secs 5840 15 1.2414 secs 5901 16 2.9654 secs 14026 17 7.3875 secs 35249 18 13.4464 secs 63475 19 34.9504 secs 166753 20 38.9897 secs 185259
0x034AF4B81F8C450C2C870CE1DF184AFF1297E5FCD54944D98D81E1A545FFB22596 54 bit 222872.369 h/s 223958.708 h/s 223408.635 h/s ... 221374.437 h/s 222090.945 h/s 220270.176 h/s total time: 1089.12 sec SOLVED: 9974455244496707 Hops: 246139766 Very impressive h/s. Thanks for the numbers. It gives us something to shoot for.
|
Our family was terrorized by Homeland Security. Read all about it here: http://www.jmwagner.com/ and http://www.burtw.com/ Any donations to help us recover from the $300,000 in legal fees and forced donations to the Federal Asset Forfeiture slush fund are greatly appreciated!
|
|
|
racminer
Member
Offline
Activity: 245
Merit: 17
|
|
September 06, 2019, 11:27:20 AM |
|
I have tried the python code on linux and windows with or without gmpy2. I was able to retrieve all known private keys (up to case 100) with linux python. However when using windows things go wrong beyond case 59. For exemple case 60 ... the private key should be 0xFC07A1825367BDA but I get 0xFC07A1825367BXX with XX a random number case 61 ... the private key should be 0x13C96A3742F64906 but I get 0x13C96A3742F64XXX with XXX a random number case 63 ... the private key should be 0x7CCE5EFDACCF6808 but I get 0x7CCE5EFDACCF6XXX with XXX a random number etc ... This bug persists even without gmpy2 !!! I am working on it What hardware do you use and how long it took to crack 100? To check all known private keys, I changed the code to search within a given range [a, b], not necessarily the whole range. For instance, private key for case 100 is 0xaf55fc59c335c8ec67ed24826, so if I run my modified python script for range [ 0xaf55fc59c335c8e0000000000, 0xaf55fc59c335c8f0000000000 ], I get the answer in less than half a minute. like to share modifed script for search range ? See here: https://bitcointalk.org/index.php?topic=5166284.msg52375040#msg52375040
|
|
|
|
BurtW (OP)
Legendary
Offline
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
|
|
September 09, 2019, 03:56:42 PM Last edit: September 09, 2019, 04:13:04 PM by BurtW |
|
We have updated the OP with our latest results. Here is a summary of the results of runs for tests 0 through 49: test time hps hops compares mem(bytes) total time time hopping not hopping ---- -------------- ------- ---------- ------------ ---------- -------------- -------------- ----------- 0 0.3237 msecs 10,504 1 3 1032 323 95 228 1 0.2853 msecs 11,062 1 3 1032 285 90 194 2 0.2785 msecs 11,111 1 3 1032 278 90 188 3 1.8730 msecs 11,271 8 10 1032 1,873 709 1,163 4 1.8351 msecs 11,147 11 13 1032 1,835 986 848 5 2.1243 msecs 11,123 15 17 1032 2,124 1,348 775 6 5.2260 msecs 11,069 52 54 1032 5,226 4,697 528 7 3.9406 msecs 11,065 32 34 1032 3,940 2,891 1,048 8 4.6425 msecs 10,174 45 47 1032 4,642 4,423 219 9 46.8674 msecs 11,052 510 512 1032 46,867 46,146 721
10 84.2980 msecs 10,674 893 895 1032 84,298 83,661 636 11 122.2013 msecs 11,662 1413 1415 1032 122,201 121,162 1,039 12 257.2438 msecs 11,579 2976 2978 1032 257,243 257,019 224 13 514.6517 msecs 11,402 5840 5842 1032 514,651 512,196 2,455 14 501.1961 msecs 11,791 5901 5903 1032 501,196 500,466 729 15 1.1839 secs 11,886 14026 14028 1032 1,183,908 1,180,079 3,829 16 3.0370 secs 11,631 35249 35251 1032 3,037,005 3,030,586 6,419 17 5.6526 secs 11,230 63475 63477 1032 5,652,638 5,652,025 613 18 14.0759 secs 11,847 166753 166755 1032 14,075,930 14,075,708 222 19 15.5404 secs 11,922 185259 185261 1032 15,540,417 15,539,472 945
20 742.1446 msecs 11,882 8804 5677 32088 742,144 740,944 1,200 21 475.1973 msecs 11,881 5627 1903 32696 475,197 473,623 1,574 22 561.2366 msecs 11,555 6359 3169 33912 561,236 550,304 10,932 23 428.6246 msecs 11,867 5044 1167 36344 428,624 425,047 3,577 24 2.2156 secs 11,842 26026 22985 41208 2,215,606 2,197,825 17,780 25 350.9793 msecs 11,864 4025 695 50936 350,979 339,275 11,704 26 650.9386 msecs 11,695 7326 4443 70392 650,938 626,434 24,504 27 11.2562 secs 11,793 132214 128675 109304 11,256,223 11,211,378 44,845 28 4.5728 secs 11,819 52883 49679 187128 4,572,770 4,474,294 98,476 29 4.2294 secs 11,907 48300 45216 342776 4,229,449 4,056,522 172,927
30 10.6122 secs 11,759 124766 56010 62808 10,612,244 10,610,509 1,735 31 9.2328 secs 11,884 109705 41191 63416 9,232,760 9,231,168 1,592 32 14.3117 secs 11,922 170592 103053 64632 14,311,717 14,309,552 2,165 33 55.8702 secs 11,902 664893 596368 67064 55,870,151 55,862,085 8,066 34 22.1413 secs 11,923 263922 195453 71928 22,141,326 22,135,223 6,103 35 1.0531 mins 11,927 753508 683623 81656 63,186,883 63,175,281 11,602 36 28.3459 secs 11,924 337734 269198 101112 28,345,873 28,323,662 22,210 37 9.2387 mins 11,860 6573832 6505951 140024 554,319,152 554,275,660 43,492 38 11.2025 secs 11,926 132534 64338 217848 11,202,535 11,113,380 89,154 39 4.5557 mins 11,925 3257543 3189102 373496 273,342,973 273,170,580 172,393
40 13.1216 mins 11,927 9389808 7285993 124248 787,294,602 787,289,468 5,133 41 10.3313 mins 11,929 7394352 5292081 124856 619,880,117 619,878,329 1,787 42 3.8015 mins 11,935 2722138 620876 126072 228,090,413 228,087,570 2,842 43 5.7648 mins 11,924 4124314 2019820 128504 345,888,060 345,884,231 3,828 44 3.2043 mins 11,930 2293481 193018 133368 192,258,803 192,252,406 6,397 45 33.9166 mins 11,919 24254078 22153676 143096 2,034,994,703 2,034,982,887 11,815 46 47.9810 mins 11,922 34321486 32221012 162552 2,878,859,972 2,878,832,117 27,854 47 2.4000 hours 11,918 102968649 100868429 201464 8,639,864,920 8,639,820,958 43,962 48 4.1320 hours 11,924 177371081 175268406 279288 14,875,131,910 14,875,045,149 86,760 49 23.9987 hours 11,914 1029325281 1027225432 434936 86,395,345,966 86,395,170,751 175,214 ============== ====== ============== ============== =========== 11,630 118,106,695,147 118,105,560,480 1,134,666
Where test test number time run time needed to find the private key hps hops per second (averaged 11,630 on my laptop with a highly modified secp256k1 library) hops total number of hops it took to find the private key cmps total number of point comparisons it took to find the private key mem amount of memory dynamically allocated during the run total time total run time in msec time hopping total time spent hopping in msec not hopping the overhead (time not hopping) in msec
|
Our family was terrorized by Homeland Security. Read all about it here: http://www.jmwagner.com/ and http://www.burtw.com/ Any donations to help us recover from the $300,000 in legal fees and forced donations to the Federal Asset Forfeiture slush fund are greatly appreciated!
|
|
|
Firebox
Jr. Member
Offline
Activity: 59
Merit: 3
|
|
September 10, 2019, 09:01:32 AM |
|
We have updated the OP with our latest results. Here is a summary of the results of runs for tests 0 through 49:...
That is already good result! When will be the first release?
|
|
|
|
Telariust
Jr. Member
Offline
Activity: 38
Merit: 18
|
|
September 10, 2019, 04:55:56 PM Last edit: October 09, 2019, 02:05:55 AM by Telariust |
|
The claimed avg "expected of 2w^(1/2) group operations" is not achievable with m = w^(1/2))/2 (mean jump size from acticles by Pollard)Empirical tests have shown that 2w1/2 is achievable at m * 8.I ask you to check on your implementation what speed will be if the MID/MAX step size is increased by 8 times, plz.thx, no need all norm after add # get JmaxofSp def getJmaxofSp(optimalmeanjumpsize, dS): if flag_verbose > 0: print('[optimal_mean_jumpsize] %s' % optimalmeanjumpsize)
sumjumpsize = 0
for i in range(1,len(dS)):
#sumjumpsize = (2**i)-1 #sumjumpsize += 2**(i-1) sumjumpsize += dS[i-1]
now_meanjumpsize = int(round(1.0*(sumjumpsize)/(i)))
#next_meanjumpsize = int(round(1.0*(sumjumpsize+2**i)/(i+1))) next_meanjumpsize = int(round(1.0*(sumjumpsize+dS[i])/(i+1)))
if flag_verbose > 1: print('[meanjumpsize#Sp[%d]] %s(now) <= %s(optimal) <= %s(next)' % (i, now_meanjumpsize, optimalmeanjumpsize, next_meanjumpsize ))
if optimalmeanjumpsize - now_meanjumpsize <= next_meanjumpsize - optimalmeanjumpsize : if flag_verbose > 0: print('[meanjumpsize#Sp[%d]] %s(now) <= %s(optimal) <= %s(next)' % (i, now_meanjumpsize, optimalmeanjumpsize, next_meanjumpsize ))
if flag_verbose > 0: print('[JmaxofSp] Sp[%s]=%s nearer to optimal mean jumpsize of Sp set' % (i, now_meanjumpsize))
return i
print("\n[FATAL_ERROR] JmaxofSp not defined!\n"); exit(-1)
################# about speed We have updated the OP with our latest results. Here is a summary of the results of runs for tests 0 through 49:...
That is already good result! no, absolutely not August 26, 2019 pkey Standard OpenSSL library secp256k1 (S 8X32, F 10X26) bits hops/second test run time hops/second test run time ==== =========================== =========================== 31 1942.378857 28.4496 secs 5520.634781 10.0240 secs
Standard OpenSSL library averaged 1897 hops per second secp256k1 (S 8X32, F 10X26) averaged 5258 hops per second September 09, 2019 (secp256k1) test time hps hops ---- -------------- ------- ---------- 31 9.2328 secs 11,884 109705
already better, but.. it very slow for 1core C/C++ openssl/secp256k1, it should be more! (even if u built early classic 1 Wild algo with 3.28w^(1/2), this does not explain x10 speed drop) We were just trying to quickly get everything to work so we have not optimized anything yet.
ok, we expect and hope see for compare Algo: 1 Tame + 1 Wild with distinguished points, expected of 2w^(1/2) group operations avg stat after 10tests 1core i5-2540, python37x64, win7x64 [i] 8.6K j/s; ... ... [################################################] [averages] expected of 2w^(1/2) group operations -------|--------/--------|---------------------------------/---------------------------------| W |jump avg/2w^(1/2)| time avg/2w^(1/2) | -------|--------/--------|---------------------------------/---------------------------------| >2^031 | 84.0K/ 92.7K | 0y 0m 0d 00:00:10s 002ms / 0y 0m 0d 00:00:11s 042ms | ----------------------------------------------------------------------------------------------
(it raw python!) 1core i5-2540, python37x64 + gmpy2, win7x64 [i] 73.6K j/s; ... ... [################################################] [averages] expected of 2w^(1/2) group operations -------|--------/--------|---------------------------------/---------------------------------| W |jump avg/2w^(1/2)| time avg/2w^(1/2) | -------|--------/--------|---------------------------------/---------------------------------| >2^031 | 90.0K/ 92.7K | 0y 0m 0d 00:00:01s 150ms / 0y 0m 0d 00:00:01s 185ms | ----------------------------------------------------------------------------------------------
################# about secp256k1 library Legendary main question from legendary master root of all ecc problems) Are you using affine or jacobian coordinates for the points?
42: Answer to the Ultimate Question of Life, the Universe, and Everything"secp256k1 have functions J+J->J , J+A->J , and no have functions A+A->A (Is this a precisely optimized library for bitcoin? ) (all simple, because in real often need multiply rand point instead adding, and jacobian is better for multiply) J+J->J (12M, 4S) - bad choice J+A->J (8M, 3S) - already better (look eprint.iacr.org/2016/103.pdf) you can take the code from break_short by Jochen Hoenicke, it fits perfectly. this code uses J+A->J and convert only X to Affine xJ ->xA (its max what we have in secp256k1 as it is) where A is S-table with pre-computed G,2G,4G,8G,16G.. ..u used pre-computed, y? not say me what u calc every Si permanently used multiplication, nonono,plz,no! X%mask instead pow2(X%k)? (..by the way, that would explain your terrible speed..) ..and use old alternatively functions without protection against side channel attacks, +10-20% speed secp256k1_gej_add_ge_var() instead secp256k1_gej_add_ge() (warn, _var() need normalize before use in some functions.. my early mistake, "leading zeros" github.com/bitcoin-core/secp256k1/issues/601) in ideal - need write new functions: A+A->A (1I, 2M, 1S) - best choice, because in the end we need X in affine for unique representation each point to adding in storage hashtable ..u used hashtable, y?) after which do not need to spend 8M,3S, only nearly 20M to each 1I, it +35% speed-up like that ######## I seem to know what the problem is. This is a very obvious fact for me, I should have started with it. (I should have guessed at a time when about ECMULT_WINDOW_SIZE) No multiplies the points inside the loop!I warn, multiplication points is very expensive, more expensive than the slowest addition points. Shouldn't use it at all in multi-million integration cycles. You must calculate all possible displacement points before the start of the cycle. Then just add them, being inside cycle. I think you are using a mask https://bitcointalk.org/index.php?topic=5166284.msg52068840#msg52068840..so you can’t store several million pre-calculated points in memory. it explains your choice. x = PRF(X) = 1 << (X % m) ... ec_point_mul_and_add(r,a,n)
not mask, i see,.. then simple, u tried built classic pattern of a*b+c ..rewritten the following functions..
I tell you - take loop from break_short as it is (with standart functions), rewrite to kangaroos, and benchmark it. if after you still want to rewrite what, then ok (but i think what not) I think the "secret sauce" will not create, it will turn out an exotic slow implementation through multiplication. I would like to be mistaken, but my experience says that you lose time. ..How do I know about multiplication?.. I checked it myself,.. and from the topic about why brainflayer is slow and why it cannot be done faster,.. and it was on one of the topics pages LBC. This is not the most famous fact, it is not surprising that you did not know about it, no fault. Its not random, not brainflayer, its pseudo random walk, so we can pre-compute our jumps size (offset points) for using addition points instead multiplication points.
|
|
|
|
BurtW (OP)
Legendary
Offline
Activity: 2646
Merit: 1137
All paid signature campaigns should be banned.
|
|
September 10, 2019, 09:20:51 PM Last edit: September 17, 2019, 12:41:29 PM by BurtW |
|
Thanks for all your input. We are reading it over carefully to see what we can use in our project and will get back to you. We have already rewritten the following functions to make them as fast as possible: static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context *ctx, secp256k1_gej *r, const secp256k1_scalar *gn) SECP256K1_INLINE static int secp256k1_scalar_reduce(secp256k1_scalar *r, unsigned int overflow) static int secp256k1_scalar_add(secp256k1_scalar *r, const secp256k1_scalar *a, const secp256k1_scalar *b) We are working on rewriting: static void secp256k1_gej_add_ge(secp256k1_gej *r, const secp256k1_gej *a, const secp256k1_ge *b) Our internal function #define ec_point_mul_and_add(r,a,n) { secp256k1_gej rj; secp256k1_ecmult_gen(&ecgrp->ecmult_gen_ctx, &rj, n); // highly optimized secp256k1_gej_add_ge_var(&rj, &rj, a, NULL); // optimization is work in progress secp256k1_ge_set_gej_var(r, &rj); }
We heavily rely on and use the following observation to inform our parameter choices: If the PRF is defined as follows:
x = 1 << (X % m)
The PRF will generate one of the numbers: 2**0, 2**1, .., 2**(m-1) with equal probability
So the average hop distance as a function of m can be calculated as follows:
ahd = (1/m)(2**0) + (1/m)(2**1) + .. + (1/m)(2**(m-1)) = (1/m)[(2**0) + (2**1) + .. + (2**(m-1))] = (1/m)[(2**m) - 1] = ((2**m) - 1)/m
Therefore the entire distance hopped by the kangaroo, given nn hops, can be estimated as:
c = nn(ahd) = (nn/m)((2**m) - 1)
|
Our family was terrorized by Homeland Security. Read all about it here: http://www.jmwagner.com/ and http://www.burtw.com/ Any donations to help us recover from the $300,000 in legal fees and forced donations to the Federal Asset Forfeiture slush fund are greatly appreciated!
|
|
|
Telariust
Jr. Member
Offline
Activity: 38
Merit: 18
|
|
September 14, 2019, 08:34:32 PM Last edit: October 01, 2019, 01:28:55 AM by Telariust |
|
static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context *ctx, secp256k1_gej *r, const secp256k1_scalar *gn) ... #define ec_point_mul_and_add(r,a,n) {
.. why the name of this function is not familiar to me? (although I used multiplication) .. and why I myself didn’t feel the need to write a*b+c ?.. i looked at my source code - always use // R = na*A + ng*G // secp256k1_ecmult(*ctx, gej *r, const gej *a, const scalar *na, const scalar *ng); 1899. Mr. Deull's most famous attributed utterance is that "everything that can be invented has been invented." I bet your experiments with ECMULT_WINDOW_SIZE didn’t give anything, because secp256k1_ecmult_gen() doesn’t use it I used exactly secp256k1_ecmult() because it is accelerated in ECMULT_WINDOW_SIZE. moreover, secp256k1_ecmult() uses endomorphism to accelerate multiplication (but only if you use both na*A+ng*G, not one of them) also why _ecmult() more faster than _gen() https://github.com/bitcoin-core/secp256k1/pull/41https://github.com/bitcoin-core/secp256k1/pull/210#issuecomment-96145070https://github.com/bitcoin-core/secp256k1/pull/210#issuecomment-96166989simple init for secp256k1_ecmult() //secp256k1_context *ctx = secp256k1_context_create(SECP256K1_CONTEXT_NONE); secp256k1_context *ctx = secp256k1_context_create(SECP256K1_CONTEXT_SIGN | SECP256K1_CONTEXT_VERIFY);
// ecmult vars secp256k1_scalar scalarK; secp256k1_scalar_set_int(&scalarK, 123456); secp256k1_scalar scalar0; secp256k1_scalar_set_int(&scalar0, 0); secp256k1_gej point_0gej; secp256k1_gej_set_infinity(&point_0gej);
// jacobian base point G secp256k1_gej secp256k1_gej_const_g; secp256k1_gej_set_ge(&secp256k1_gej_const_g, &secp256k1_ge_const_g);
################# I was interested to compare these functions. 1core i7-6820 secp256k1 (now, without endomorphism) make clean ./autogen.sh ./configure --enable-ecmult-static-precomputation --with-bignum=gmp --with-scalar=64bit --with-field=64bit --with-asm=x86_64 make
default ECMULT_WINDOW_SIZE = 15 ######## Test#1, multiplication, increment sequential points, calculation 1million pubkeys in loop: for(uint64_t i = 1; i<1000000 ; ++i){ secp256k1_scalar_set_uint64(&scalarK, i); // sequential points MULT_FUNCTION_HERE(&context, point_jacobian, &scalarK) }
results of benchmark //// Multiply: R = q*A (in constant-time) secp256k1_ecmult_const(&TestGej, &secp256k1_ge_const_g, &scalarK, 256);
1M at 58,0 sec; 0,017Mk/s constant-time? hoho) //// Multiply: R = a*G (with the generator) //// To harden against timing attacks .. Break up the multiplicand into groups of.. secp256k1_ecmult_gen(&ctx->ecmult_gen_ctx, &TestGej, &scalarK);
1M at 26,5 sec; 0,037Mk/s .. against timing attacks? y, its need rewrite, its right. //// R = na*A + ng*G //// secp256k1_ecmult(*ctx, gej *r, const gej *a, const scalar *na, const scalar *ng); //secp256k1_ecmult(&ctx->ecmult_ctx, &TestGej, &secp256k1_gej_const_g, &scalarK, NULL); // NULL not recommended! secp256k1_ecmult(&ctx->ecmult_ctx, &TestGej, &secp256k1_gej_const_g, &scalarK, &scalar0); // k*G + 0*G
1M at 8,2 sec; 0,121Mk/s same function, another //// R = na*A + ng*G //secp256k1_ecmult(&ctx->ecmult_ctx ,&TestGej, NULL, &scalar0, &scalarK); // NULL not recommended! secp256k1_ecmult(&ctx->ecmult_ctx, &TestGej, &point_0gej, &scalar0, &scalarK); // 0*P0 + k*G
1M at 3,5 sec; 0,285Mk/s now u can see, how efficiency working ECMULT_WINDOW_SIZE (because we calculation sequential points) secp256k1_ecmult() (option "0*P0 + K*G") - best choice (x7.5 faster than secp256k1_ecmult_gen() for multiply sequential points) ######## Test#2, multiplication, pseudo random points, calculation 1million pubkeys in loop: for(uint64_t i = 1; i<1000000 ; ++i){ secp256k1_scalar_set_uint64(&scalarK, TestGej.x.n[0]); // pseudo random points MULT_FUNCTION_HERE(&context, point_jacobian, &scalarK) }
results of benchmark secp256k1_ecmult_const(&TestGej, &secp256k1_ge_const_g, &scalarK, 256);
1M at 58,5 sec; 0,017Mk/s same secp256k1_ecmult_gen(&ctx->ecmult_gen_ctx, &TestGej, &scalarK);
1M at 26,5 sec; 0,037Mk/s same secp256k1_ecmult(&ctx->ecmult_ctx, &TestGej, &secp256k1_gej_const_g, &scalarK, &scalar0); // k*G + 0*G
1M at 16,4 sec; 0,061Mk/s diff! slower same function, another secp256k1_ecmult(&ctx->ecmult_ctx, &TestGej, &point_0gej, &scalar0, &scalarK); // 0*P0 + k*G
1M at 9,5 sec; 0,105Mk/s diff! slower efficiency lower, ECMULT_WINDOW_SIZE working is not so good (because we calculation pseudo random points) secp256k1_ecmult() with "0*P0 + K*G" - best choice (x2.8 faster than secp256k1_ecmult_gen() for multiply pseudo random points) ######## ..multiplication points is very expensive, more expensive than the slowest addition points. Test#3, addition points, calculation 1million pubkeys in loop: for(uint64_t i = 1; i<1000000 ; ++i){ ADD_FUNCTION_HERE(&result_point_jacobian, 1point_jacobian, 2point_affine) }
results of benchmark secp256k1_gej_add_ge(&TestGej, &TestGej, &secp256k1_ge_const_g);
1M at 0sec 337msec; 2,9Mk/s secp256k1_gej_add_ge_var(&TestGej, &TestGej, &secp256k1_ge_const_g, NULL);
1M at 0sec 285msec; 3,5Mk/s 3,5Mk/s, Karl! secp256k1_gej_double_var(&TestGej, &TestGej, NULL);
1M at 0sec 183msec; 5,4Mk/s (its double, rarely used) secp256k1_gej_add_ge_var() - best choice (x33.3 faster than secp256k1_ecmult(), x92.9 faster than secp256k1_ecmult_gen()) https://github.com/Telariust/pollard-kangaroo-c99/blob/master/compare-test.c
|
|
|
|
|