Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: Etar on October 09, 2021, 05:46:55 PM



Title: BSGS solver for cuda
Post by: Etar on October 09, 2021, 05:46:55 PM
This is my implementation of the Baby-Step Giant-Step (BSGS) algorithm for Nvidia cards (CUDA, Windows x64 only).
https://github.com/Etayson/BSGS-cuda
Let me know your speed results.
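
For anyone new to the algorithm, here is a minimal Python sketch of baby-step giant-step on a toy multiplicative group modulo a prime. It is not taken from BSGS-cuda, which works on secp256k1 points on the GPU; the function name `bsgs`, the modulus, the generator, and the parameters `start` and `width` are all assumptions chosen for illustration. It also shows why the search needs a start value: only exponents inside [start, start + width) can be found.

```python
from math import isqrt

def bsgs(g, h, p, start, width):
    """Return k in [start, start + width) with pow(g, k, p) == h, else None."""
    m = isqrt(width - 1) + 1              # baby-step count, so m * m >= width
    # Baby steps: remember g^j for every j in [0, m).
    baby = {}
    value = 1
    for j in range(m):
        baby.setdefault(value, j)
        value = value * g % p
    # Shift the target so that the search begins at `start`.
    gamma = h * pow(g, -start, p) % p     # modular inverse needs Python 3.8+
    giant = pow(g, -m, p)                 # one giant step removes m from the exponent
    for i in range(m):
        j = baby.get(gamma)
        if j is not None and i * m + j < width:
            return start + i * m + j
        gamma = gamma * giant % p
    return None

# Toy parameters (illustrative only): 7 generates the group modulo 2^31 - 1.
p, g = 2**31 - 1, 7
start, width = 1_000_000_000, 2**20
secret = start + 777_777
print(bsgs(g, pow(g, secret, p), p, start, width) == secret)
print(bsgs(g, pow(g, start - 5, p), p, start, width))
```

The table holds about sqrt(width) entries, and at most that many giant steps are taken, which is the memory-for-time trade that BSGS makes. The second call asks for an exponent just below `start`, so nothing inside the interval matches.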


Title: Re: BSGS solver for cuda
Post by: sky59sky59 on October 09, 2021, 07:15:17 PM
It seems that you know a bit about the BSGS algorithm and x86 assembler...

So I would like to ask you one question. I have already modified Jean's BSGS for the "r1" curve (BTC uses "k1").

What is the meaning of the start value? Jean's version even needs start and stop values for k1 and k2.

Must the searched k lie in this interval?


Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on October 09, 2021, 08:40:45 PM
awesome. will let you know speed on various GPUs once I run it


Title: Re: BSGS solver for cuda
Post by: COBRAS on October 10, 2021, 05:12:57 AM
It is my implementation of BigStepGiantStep algorithm for Nvidia card (Cuda and Windows x64 only)
https://github.com/Etayson/BSGS-cuda
Let me know of your speed results.


Great. I think your project will be more usable than JLP's Kangaroo.

Tuning JLP's Kangaroo is a real pain!!!!


Title: Re: BSGS solver for cuda
Post by: davidjjones on October 10, 2021, 10:35:50 AM
It is my implementation of BigStepGiantStep algorithm for Nvidia card (Cuda and Windows x64 only)
https://github.com/Etayson/BSGS-cuda
Let me know of your speed results.

I tested your BSGS on GTX 1660s, the speed was significantly slower than JeanLucPons Kangaroo:
BSGS-cuda => 330 Mkey/s
Kangaroo 2.2 => 450 Mkey/s


Title: Re: BSGS solver for cuda
Post by: COBRAS on October 10, 2021, 12:12:12 PM
It is my implementation of BigStepGiantStep algorithm for Nvidia card (Cuda and Windows x64 only)
https://github.com/Etayson/BSGS-cuda
Let me know of your speed results.

I tested your BSGS on GTX 1660s, the speed was significantly slower than JeanLucPons Kangaroo:
BSGS-cuda => 330 Mkey/s
Kangaroo 2.2 => 450 Mkey/s

We need real tests of how much time each program needs to find an example private key, to see which code finds it faster.


Title: Re: BSGS solver for cuda
Post by: a.a on October 10, 2021, 04:23:47 PM
COBRAS, then how about you start testing and benchmarking? Or should others do that for you too?


Title: Re: BSGS solver for cuda
Post by: Etar on October 10, 2021, 07:12:18 PM
With v1.2 and a single 2080 Ti, I solved the example pubkeys in the range
start: 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000
end: 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5effffffffffffffff
in 28 minutes with the parameter -w 26.
Here are the pubkeys that were searched:
Code:
0459A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC994327554CED887AAE5D211A2407CDD025CFC3779ECB9C9D7F2F1A1DDF3E9FF8
04A50FBBB20757CC0E9C41C49DD9DF261646EE7936272F3F68C740C9DA50D42BCD3E48440249D6BC78BC928AA52B1921E9690EBA823CBC7F3AF54B3707E6A73F34
0404A49211C0FE07C9F7C94695996F8826E09545375A3CF9677F2D780A3EB70DE3BD05357CAF8340CB041B1D46C5BB6B88CD9859A083B0804EF63D498B29D31DD1
040B39E3F26AF294502A5BE708BB87AEDD9F895868011E60C1D2ABFCA202CD7A4D1D18283AF49556CF33E1EA71A16B2D0E31EE7179D88BE7F6AA0A7C5498E5D97F
04837A31977A73A630C436E680915934A58B8C76EB9B57A42C3C717689BE8C0493E46726DE04352832790FD1C99D9DDC2EE8A96E50CAD4DCC3AF1BFB82D51F2494
040ECDB6359D41D2FD37628C718DDA9BE30E65801A88A00C3C5BDF36E7EE6ADBBAD71A2A535FCB54D56913E7F37D8103BA33ED6441D019D0922AC363FCC792C29A
0422DD52FCFA3A4384F0AFF199D019E481D335923D8C00BADAD42FFFC80AF8FCF038F139D652842243FC841E7C5B3E477D901F88C5AB0B88EE13D80080E413F2ED
04DB4F1B249406B8BD662F78CBA46F5E90E20FE27FC69D0FBAA2F06E6E50E536695DF83B68FD0F396BB9BFCF6D4FE312F32A43CF3FA1FE0F81DF70C877593B64E0
043BD0330D7381917F8860F1949ACBCCFDC7863422EEE2B6DB7EDD551850196687528B6D2BC0AA7A5855D168B26C6BAF9DDCD04B585D42C7B9913F60421716D37A
04332A02CA42C481EAADB7ADB97DF89033B23EA291FDA809BEA3CE5C3B73B20C49C410D1AD42A9247EB8FF217935C9E28411A08B325FBF28CC2AF8182CE2B5CE38
04513981849DE1A1327DEF34B51F5011C5070603CA22E6D868263CB7C908525F0C19EBA6BD2A8DCF651E4342512EDEACB6EA22DA323A194E25C6A1614ABD259BC0
04D4E6FA664BD75A508C0FF0ED6F2C52DA2ADD7C3F954D9C346D24318DBD2ECFC6805511F46262E10A25F252FD525AF1CBCC46016B6CD0A7705037364309198DA1
0456B468963752924DBF56112633DC57F07C512E3671A16CD7375C58469164599D1E04011D3E9004466C814B144A9BCB7E47D5BACA1B90DA0C4752603781BF5873
04D5BE7C653773CEE06A238020E953CFCD0F22BE2D045C6E5B4388A3F11B4586CBB4B177DFFD111F6A15A453009B568E95798B0227B60D8BEAC98AF671F31B0E2B
04B1985389D8AB680DEDD67BBA7CA781D1A9E6E5974AAD2E70518125BAD5783EB5355F46E927A030DB14CF8D3940C1BED7FB80624B32B349AB5A05226AF15A2228
0455B95BEF84A6045A505D015EF15E136E0A31CC2AA00FA4BCA62E5DF215EE981B3B4D6BCE33718DC6CF59F28B550648D7E8B2796AC36F25FF0C01F8BC42A16FD9
That is 6 times faster than the original CPU-based BSGS from JLP.


Title: Re: BSGS solver for cuda
Post by: sky59sky59 on October 10, 2021, 08:04:18 PM
OK, and how fast would it be with the interval
000000000....00000000
Ffffffffffff......fffffffffffff

?


Title: Re: BSGS solver for cuda
Post by: _Counselor on October 10, 2021, 09:11:25 PM
How many bytes of memory do you need to store one baby step? Does the hash table use GPU memory or system RAM?


Title: Re: BSGS solver for cuda
Post by: a.a on October 10, 2021, 09:54:00 PM
COBRAS, then how about you start testing and benchmarking? Or should others do that for you too?


I suppose that was sarcasm:D

Yeah something like sarcasm. COBRAS is a lazy lurker. And you can see, that his last post does not make any sense.


Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on October 10, 2021, 10:20:54 PM
RTX 3070 = 1,000 MKey/s

Code:
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5ebb3ef3883c1866d4
Pub: 59a3bfdad718c9d3fac7c187f1139f0815ac5d923910d516e186afda28b221dc994327554ced887aae5d211a2407cdd025cfc3779ecb9c9d7f2f1a1ddf3e9ff8
****************************
Found in 34 seconds
GPU #1 finished
GPU #3 finished
GPU #5 finished
GPU #2 finished
GPU #4 finished

Default settings. Have not tinkered with settings to see if GPUs can gain any speed.


Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on October 10, 2021, 11:00:11 PM
I ran the same test as Etar and JLP, with 16 pubkeys:

Code:
0459A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC994327554CED887AAE5D211A2407CDD025CFC3779ECB9C9D7F2F1A1DDF3E9FF8
04A50FBBB20757CC0E9C41C49DD9DF261646EE7936272F3F68C740C9DA50D42BCD3E48440249D6BC78BC928AA52B1921E9690EBA823CBC7F3AF54B3707E6A73F34
0404A49211C0FE07C9F7C94695996F8826E09545375A3CF9677F2D780A3EB70DE3BD05357CAF8340CB041B1D46C5BB6B88CD9859A083B0804EF63D498B29D31DD1
040B39E3F26AF294502A5BE708BB87AEDD9F895868011E60C1D2ABFCA202CD7A4D1D18283AF49556CF33E1EA71A16B2D0E31EE7179D88BE7F6AA0A7C5498E5D97F
04837A31977A73A630C436E680915934A58B8C76EB9B57A42C3C717689BE8C0493E46726DE04352832790FD1C99D9DDC2EE8A96E50CAD4DCC3AF1BFB82D51F2494
040ECDB6359D41D2FD37628C718DDA9BE30E65801A88A00C3C5BDF36E7EE6ADBBAD71A2A535FCB54D56913E7F37D8103BA33ED6441D019D0922AC363FCC792C29A
0422DD52FCFA3A4384F0AFF199D019E481D335923D8C00BADAD42FFFC80AF8FCF038F139D652842243FC841E7C5B3E477D901F88C5AB0B88EE13D80080E413F2ED
04DB4F1B249406B8BD662F78CBA46F5E90E20FE27FC69D0FBAA2F06E6E50E536695DF83B68FD0F396BB9BFCF6D4FE312F32A43CF3FA1FE0F81DF70C877593B64E0
043BD0330D7381917F8860F1949ACBCCFDC7863422EEE2B6DB7EDD551850196687528B6D2BC0AA7A5855D168B26C6BAF9DDCD04B585D42C7B9913F60421716D37A
04332A02CA42C481EAADB7ADB97DF89033B23EA291FDA809BEA3CE5C3B73B20C49C410D1AD42A9247EB8FF217935C9E28411A08B325FBF28CC2AF8182CE2B5CE38
04513981849DE1A1327DEF34B51F5011C5070603CA22E6D868263CB7C908525F0C19EBA6BD2A8DCF651E4342512EDEACB6EA22DA323A194E25C6A1614ABD259BC0
04D4E6FA664BD75A508C0FF0ED6F2C52DA2ADD7C3F954D9C346D24318DBD2ECFC6805511F46262E10A25F252FD525AF1CBCC46016B6CD0A7705037364309198DA1
0456B468963752924DBF56112633DC57F07C512E3671A16CD7375C58469164599D1E04011D3E9004466C814B144A9BCB7E47D5BACA1B90DA0C4752603781BF5873
04D5BE7C653773CEE06A238020E953CFCD0F22BE2D045C6E5B4388A3F11B4586CBB4B177DFFD111F6A15A453009B568E95798B0227B60D8BEAC98AF671F31B0E2B
04B1985389D8AB680DEDD67BBA7CA781D1A9E6E5974AAD2E70518125BAD5783EB5355F46E927A030DB14CF8D3940C1BED7FB80624B32B349AB5A05226AF15A2228
0455B95BEF84A6045A505D015EF15E136E0A31CC2AA00FA4BCA62E5DF215EE981B3B4D6BCE33718DC6CF59F28B550648D7E8B2796AC36F25FF0C01F8BC42A16FD9

Total time:
Code:
GPU #4 finished
GPU #1 finished
GPU #5 finished
GPU #2 finished
GPU #3 finished
Total time 00:15:11s
cuda finished ok

Press Enter to exit

For comparison, JLP with CPU only took 3 hours and 35 minutes.


Title: Re: BSGS solver for cuda
Post by: Etar on October 11, 2021, 05:25:25 AM
How many bytes of memory do you need to store one babystep? Hashtable uses GPU memory or global ram?
Each baby step uses 8 bytes of memory. The hash table (HT) is stored in GPU memory.
With -w 26 and -htsz 25 (the default), the app generates 2^26 baby steps, which are stored in an HT of size (2^25 + 2^26) * 8 bytes = 805306368 bytes (768 MiB).
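
The size formula above can be checked with a few lines of Python. This is only a sketch: the function name `ht_bytes` is made up for illustration, and it assumes that -htsz stays at its default of 25 when -w changes.

```python
def ht_bytes(w, htsz=25, item_bytes=8):
    """Hash-table size for 2^w baby steps, following (2^htsz + 2^w) * 8."""
    return (2**htsz + 2**w) * item_bytes

for w in (26, 27, 28):
    size = ht_bytes(w)
    print(f"-w {w}: {size} bytes = {size / 2**20:.0f} MiB")
```

Each extra unit of -w doubles the baby-step part of the table, so the GPU memory needed grows quickly.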


Title: Re: BSGS solver for cuda
Post by: fxsniper on October 11, 2021, 10:46:42 AM

Thanks, Etar.

I think BSGS-cuda works better than JLP's BSGS.
JLP's BSGS is good but takes a very long time (on my GPU).

I tested the first sample command from the GitHub page.

Speed result (GTX 1050 GPU in a laptop):
Code:
Found in 972 seconds
Total time 00:16:21s


Code:
bsgscudaHT2.exe -t 512 -b 68 -p 256 -pb 59A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC994327554CED887AAE5D211A2407CDD025CFC3779ECB9C9D7F2F1A1DDF3E9FF8 -pk 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000 -w 26
Number of GPU threads set to #512
Number of GPU blocks set to #68
Number of pparam set to #256
Pubkey set to 0x59a3bfdad718c9d3fac7c187f1139f0815ac5d923910d516e186afda28b221dc994327554ced887aae5d211a2407cdd025cfc3779ecb9c9d7f2f1a1ddf3e9ff8
Range begin: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000
Items number set to #67108864
APP VERSION: 1.2
Found 1 Cuda device.
Cuda device:NVIDIA GeForce GTX 1050(4095Mb)
Device have: MP:5 Cores+320
Shared memory total:49152
Constant memory total:65536
---------------
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000008000000
GiantSUBpubkey:(a94c6524bd40d2bbdac85c056236a79da78bc61fd5bdec9d2bf26bd84b2438e84adfe0266d069d7f0286de6afafe61c581a2c39f5f1c64d43d1d37230e799a3b)
*******************************
Total GPU Memory Need: 1584.000Mb
*******************************
Generate Giants Buffer: 8912896 items
Load BIN file:512_68_256_67108864_g2.BIN
[0] chunk:570425344b
Done in 00:00:00s
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_67108864_b.BIN
[0] chunk:536870912b
Done in 00:00:00s
Verify baby array...ok
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_67108864_s.BIN
[0] chunk:536870912b
Done in 00:00:00s
Verify sorted array...ok
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_67108864_ht.BIN
[0] chunk:805306368b
Verify packed HT items...ok
Verify packed HT items sorting...ok
Total removed items: 0, freed memory: 1312.000 MB
GPU count #1
START RANGE= 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000
SUBpoint= (3c52f78892c8c2f5c51d7249951bbb1c302a8ed4d37561724e68e8d22db14a69, e0ba5063f64117bccd7fc6c1d5b97df4f0bdc5a6ba481f21e69da330ed9750ae)
FINDpubkey= (59a3bfdad718c9d3fac7c187f1139f0815ac5d923910d516e186afda28b221dc, 994327554ced887aae5d211a2407cdd025cfc3779ecb9c9d7f2f1a1ddf3e9ff8)
NewFINDpubkey= (de84b4334e87f1d1466f8c382c279ab7ac0e20d3510cec74abfd4b6b94fc7833, 9d2e496386ca9fafd5e806ddeba50e875b3a56fd1bde9711581957f5229d0663)
***************************
GPU #0 launched
GPU #0 TotalBuff: 1584.000Mb
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:0000000000000000000000000000000000000000000000000066000000000001 99MKey/s x67108864 2^26.63 x2^27=2^53.63
GPU#0 Cnt:00000000000000000000000000000000000000000000000000cc000000000001 100MKey/s x67108864 2^26.65 x2^27=2^53.65


Result
Code:
GPU#0 Cnt:000000000000000000000000000000000000000000000000ba78000000000001 98MKey/s x67108864 2^26.62 x2^27=2^53.62
GPU#0 Cnt:000000000000000000000000000000000000000000000000bade000000000001 98MKey/s x67108864 2^26.62 x2^27=2^53.62
***********GPU#0************
Total solutions: 1
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5ebb3ef3883c1866d4
Pub: 59a3bfdad718c9d3fac7c187f1139f0815ac5d923910d516e186afda28b221dc994327554ced887aae5d211a2407cdd025cfc3779ecb9c9d7f2f1a1ddf3e9ff8
****************************
Found in 972 seconds
GPU #0 finished
Total time 00:16:21s
cuda finished ok

Press Enter to exit
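
The progress lines in the log above can be read with a little arithmetic. The sketch below is only my reading of the output, not documented behavior: it assumes that "MKey" means 2^20 giant steps and that the "x2^27" factor is the number of keys covered by one giant step when -w is 26. The function names are invented for illustration.

```python
from math import log2

def effective_log2_rate(mkeys_per_s, w):
    """log2 of keys covered per second, as the progress line appears to compute it."""
    giant_steps_per_s = mkeys_per_s * 2**20   # "MKey" read as 2^20 (assumption)
    keys_per_giant_step = 2 ** (w + 1)        # the "x2^27" factor when -w is 26
    return log2(giant_steps_per_s) + log2(keys_per_giant_step)

def seconds_to_reach(key_hex, range_start_hex, mkeys_per_s, w):
    """Time to sweep linearly from the range start up to the key."""
    offset = int(key_hex, 16) - int(range_start_hex, 16)
    return offset / 2 ** effective_log2_rate(mkeys_per_s, w)

rate = effective_log2_rate(99, 26)
eta = seconds_to_reach(
    "49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5ebb3ef3883c1866d4",
    "49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000",
    99, 26)
print(f"2^{rate:.2f} keys/s, about {eta:.0f} s to reach the key")
```

Under these assumptions the estimate lands close to the 972 seconds reported in the log, which suggests the range is swept linearly from the start value.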


Title: Re: BSGS solver for cuda
Post by: Etar on October 11, 2021, 07:24:13 PM
Mandatory update v1.2.1
* Fixed a bug in multi-GPU searching.


Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on October 11, 2021, 07:31:57 PM
Quote
I think BSGS-cuda is work better than JLP BSGS
JLP BSGS is good but using very long time (for my GPU)
JLP's BSGS does not support GPU; his is CPU only.

Side by side tests of BSGS Cuda and JLP's Kangaroo...

4 pubkeys all in 65 bit range:

Kangaroo total time = 2 mins 34 seconds:
Code:
[4921.81 MK/s][GPU 4517.36 MK/s][Count 2^33.89][Dead 0][04s (Avg 04s)][121.0/159.5MB]
Key# 0 [1S]Pub:  0x02400C76A4D227D7BCFE00DC5CE7C935DE02AD42749A712ED4D98D290313DC49D2
       Priv: 0x17838B13505B26867
[1135.79 MK/s][GPU 1135.79 MK/s][Count 2^34.27][Dead 0][34s (Avg 18s)][156.6/202.5MB]
Key# 1 [1S]Pub:  0x021D6440B8338632692397D3D98FB6B62055E267E4333EC2A9316E72845649109A
       Priv: 0x18838B13505B26867
[1485.74 MK/s][GPU 1485.74 MK/s][Count 2^34.64][Dead 0][36s (Avg 13s)][201.9/258.9MB]
Key# 2 [1S]Pub:  0x03047BA9686B470D7BCCFF8305D1C440389CE43A111CA79DFD25C9943B1949F729
       Priv: 0x1012A713505B26867
[1835.35 MK/s][GPU 1835.35 MK/s][Count 2^34.94][Dead 2][38s (Avg 11s)][246.9/315.1MB]
Key# 3 [1S]Pub:  0x02094C07F799C681B9A501A70618E260E47E777A141BF6A445523254DAF1085385
       Priv: 0x1F028A10C05B26867

Done: Total time 02:34

BSGS Cuda total time = 1 min 29 seconds:
Code:
GPU#2 Cnt:000000000000000000000000000000000000000000000000b850800000000001 859MKey/s x134217728 2^29.75 x2^28=2^57.75
KEY!!>000000000000000000000000000000000000000000000001f028a10c05b26867
Pub: 094c07f799c681b9a501a70618e260e47e777a141bf6a445523254daf1085385c22b8f7747f0b280dac05dc2f60085de07af8e080bf32a1d3befb1f83c1f5404
****************************
Found in 19 seconds
GPU #0 finished
GPU #2 finished
GPU #1 finished
GPU #3 finished
Total time 00:01:29s
cuda finished ok

Press Enter to exit


For at least this range (and probably for larger ones, up to a certain size), the BSGS Cuda program will be faster for checking multiple pubkeys, because its spin-up time between pubkeys (finding one pubkey and moving on to the next) is much shorter than the Kangaroo program's.


Title: Re: BSGS solver for cuda
Post by: fxsniper on October 12, 2021, 01:46:53 AM
Quote
I think BSGS-cuda is work better than JLP BSGS
JLP BSGS is good but using very long time (for my GPU)
JLP's BSGS does not support GPU; his is CPU only.

Correct, sorry, I forgot. I meant that it runs very slowly on my laptop; sometimes I give up and end the task after waiting overnight.


Title: Re: BSGS solver for cuda
Post by: fxsniper on October 12, 2021, 01:56:51 AM

4 pubkeys all in 65 bit range:


It works fast on a 65-bit range.

It is still not powerful enough to find a key in a 120-bit range, right?

And it still only works when the range is narrowed to about 65 bits around the key.


Title: Re: BSGS solver for cuda
Post by: hamnaz on October 13, 2021, 04:11:22 PM
I am searching for these 2 pubkeys in a 100-bit range:
034786ac12686480348261b5dce84efcffc27b56b512ca793a09229ed06d63058d
027ede4f01c7dd2690603cd0449fc4e4ac9ca2d11de2404ef2285ab897d2645391

Can someone help me understand which GPU models you used for the results above?

Is there an Ubuntu build or source code available for CUDA 8.0, ccap 2.0, and g++ 4.8?

I'd love to see your updates.



Title: Re: BSGS solver for cuda
Post by: NotATether on October 13, 2021, 04:18:33 PM
Current CUDA toolkits no longer support your CUDA version and ccap, so it is highly unlikely you will find any brute-forcing software that works with your GPU. You're better off using a newer GPU with ccap 6.0+ (even then, there is no Linux port of this code).


Title: Re: BSGS solver for cuda
Post by: hamnaz on October 13, 2021, 04:24:14 PM
I purchased a Tesla K80; it will arrive in approx. 7 days. Will that work?


Title: Re: BSGS solver for cuda
Post by: NotATether on October 13, 2021, 04:32:10 PM
Sorry I made a mistake, anything with ccap 3.5+ will work. Yours is a Kepler GK210 model with ccap 3.7, so it should work fine [despite the caveat on Wikipedia (https://en.wikipedia.org/wiki/CUDA) saying that CUDA Toolkit 11.x only partially supports Kepler].



Title: Re: BSGS solver for cuda
Post by: PrivatePerson on October 13, 2021, 05:50:50 PM
Is it really possible to find a 100-bit key on one video card? How long does it take for this?


Title: Re: BSGS solver for cuda
Post by: math09183 on October 13, 2021, 06:01:36 PM
searching these 2 pubkeys in 100 bit range
034786ac12686480348261b5dce84efcffc27b56b512ca793a09229ed06d63058d
027ede4f01c7dd2690603cd0449fc4e4ac9ca2d11de2404ef2285ab897d2645391

some one can help me to understand what hardware gpu's models you are using for above result data ?

is there any ubuntu compilation/sourcecode program available, for cuda 8.0 and ccap 20, g++ 4.8

love to see your updates


Why? There are no coins.


Title: Re: BSGS solver for cuda
Post by: hamnaz on October 13, 2021, 06:48:29 PM
Is it really possible to find a 100-bit key on one video card? How long does it take for this?
As far as I can see, the 100-bit puzzle was solved by telaurist, who wrote the first Kangaroo version for CPU, and he used 1 GPU to find it.
Maybe the latest cards do it fast.


Title: Re: BSGS solver for cuda
Post by: NotATether on October 14, 2021, 08:15:17 AM
Is it really possible to find a 100-bit key on one video card? How long does it take for this?
as i see 100bit puzzle was picked by telaurist who write first kangaroo ver in cpu, and he used 1 gpu to find it
maybe latest cards do it fast

They definitely do not do it fast because that's what would happen if the range was 50-60 bits... you sure his program wasn't published after he took #100 coins? Maybe his was the only Kangaroo program at the time and he kept it to himself until he found some private key.


Title: Re: BSGS solver for cuda
Post by: Minase on October 14, 2021, 09:25:35 AM
Is it really possible to find a 100-bit key on one video card? How long does it take for this?
as i see 100bit puzzle was picked by telaurist who write first kangaroo ver in cpu, and he used 1 gpu to find it
maybe latest cards do it fast

It's quite possible to solve the 100-bit puzzle with a single video card, and not even the most powerful one (using the kangaroo method).
On a single RTX 2060 you can find such a key in 34-35 days (2^51 operations). Sometimes you don't even need the full 2^51; you may find the key by the time you reach 2^50 (which means half the time, ~17 days).
An RTX 2080 is almost 50% faster than a 2060, which gives ~23 days for the full 2^51 operations.
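
These estimates are simple rate arithmetic, sketched below in Python. The key rate is not stated in the post; it is back-computed from the 34-35 day figure, so treat it as an assumption. The function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400

def implied_rate(log2_ops, days):
    """Keys per second needed to finish 2^log2_ops operations in `days`."""
    return 2**log2_ops / (days * SECONDS_PER_DAY)

def days_needed(log2_ops, keys_per_s):
    """Days required for 2^log2_ops operations at the given rate."""
    return 2**log2_ops / keys_per_s / SECONDS_PER_DAY

rtx2060 = implied_rate(51, 34.5)          # midpoint of the 34-35 day figure
print(f"RTX 2060: ~{rtx2060 / 1e6:.0f} Mkey/s implied")
print(f"RTX 2080 at 1.5x: {days_needed(51, 1.5 * rtx2060):.1f} days for 2^51")
print(f"RTX 2060, lucky 2^50 case: {days_needed(50, rtx2060):.2f} days")
```

Each extra bit of range size only multiplies the kangaroo workload by about the square root of 2, which is why a 100-bit range needs roughly 2^51 operations rather than 2^100.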


Title: Re: BSGS solver for cuda
Post by: hamnaz on October 14, 2021, 09:45:59 AM
With the RTX 3xxx series, could it maybe be done in hours?
The 2 keys above were randomly generated: one is from the first half and the other from the second half of the 100-bit range. I want to know how fast the RTX 3xxx series could find them, so that I can estimate times. If you have an RTX card and some time to search for the pubkeys above in the 100-bit range, it will help me gauge the 3xxx series' power.
Thanks.


Title: Re: BSGS solver for cuda
Post by: Minase on October 14, 2021, 10:08:14 AM
I don't have a 3xxx series card available, but based on the specs I can calculate the average speed.
With one 3090 or 3080 Ti, 2^51 operations should be done in 6-7 days.
//edit
Based on your previous post, your Tesla K80 will find (if lucky) a private key in the 100-bit range in ~25-26 days.


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 15, 2021, 07:37:15 AM
GPU #0 launched
GPU #0 TotalBuff: 8112.000Mb
error cuMemAlloc-2
Press Enter to exit

I guess you hard-coded 4096 MB of GPU memory; I tried everything, but I am unable to utilize the full GPU memory. My GPU is a 3080 with 10 GB.

This is the max I can use:

GPU #0 launched
GPU #0 TotalBuff: 3216.000Mb

The speed is also lower than Kangaroo's; I am getting around 1200M. I want to tweak it to use the maximum GPU memory and RAM at full power, but increasing the item size slows it down and makes it take longer to solve.

Any idea how to tweak it?


Title: Re: BSGS solver for cuda
Post by: NotATether on October 15, 2021, 07:52:45 AM
speed is also slower than Kangaroo around 1200M i am getting , but i want to tweak to utilize max gpu memory and max ram with max power , increase item size will slow down speed and take longer to solve .

any idea how to tweak 

Possibly it is due to "memory fragmentation": when the program allocates GPU memory for one struct, the block can land in the middle of GPU memory, and that limits the maximum contiguous allocation available on the GPU for other structs.

The fix is to allocate the largest structure first (in this case the TotalBuff) and the smaller ones afterwards. It requires a code modification though, which is impossible to do without the source code.


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 15, 2021, 08:24:12 AM
The source code is available here, I guess: https://github.com/Etayson/BSGS-cuda/blob/main/bsgscudaussualHTchangeble1_2.pb Can you check, please?


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 15, 2021, 08:29:17 AM
I think I found the problem: the information the program pulls from the device is wrong, or these are maximum values intentionally hard-coded in the program. Etar, can you please make everything dynamic? I mean the device should report all parameters.

Found 1 Cuda device.
Cuda device:GeForce RTX 3080(4095Mb)    wrong
Device have: MP:68 Cores+0              wrong
Shared memory total:49152               I guess this is system memory, but 128 GB is available
Constant memory total:65536             not sure how this one is calculated

I am not sure, but I think MP is a unit for AMD cards and CUDA is for Nvidia. The 3080 has 8k+ CUDA cores, so I am not sure what the 68 here means.
So much confusion.


Title: Re: BSGS solver for cuda
Post by: NotATether on October 15, 2021, 10:11:54 AM
There is no need to wait for a patch, you can independently get these stats on an NVIDIA card using their sample DeviceQuery program: https://github.com/NVIDIA/cuda-samples/blob/master/Samples/deviceQuery/deviceQuery.cpp - It needs to be compiled from source though but it's extremely easy to do since it's only a single file.


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 15, 2021, 10:21:44 AM
I agree with you, but the free PureBasic version can only compile small programs, so that's why I need help from @Etar.

Also, the program sets the memory automatically but calculates it wrong.


Title: Re: BSGS solver for cuda
Post by: Etar on October 15, 2021, 10:24:17 AM
i think i found the problem the information which program is pulling from device is wrong or these are max value which intentionally hardcoded in program , Ethar can you please set all dynamic , i mean device should report all parameters

Found 1 Cuda device.
Cuda device:GeForce RTX 3080(4095Mb)    wrong
Device have: MP:68 Cores+0                    wrong
Shared memory total:49152                      i guess this is system memory but avaiable is 128GB
Constant memory total:65536                    not sure how calculate this one

i am not sure but MP is unit of AMD cards and cuda for Nvidia , and cuda is 8k+ in 3080 but not sure what is 68 cores here
so many confusions
The program uses the CUDA driver API (not the runtime API that is usually used), and the GPU code is written in PTX.
The cuda.lib used to call the CUDA driver API, even its x64 version, always returns 32-bit values.
Because of that, you can't allocate more than 2**32 bytes of GPU memory.
cuDeviceTotalMem() also returns a 32-bit memory value, which is why you see 4095Mb.
I have written to Nvidia about this issue a few times, but according to them there is no problem :)
If you look into cuda.lib you will find unofficial commands like cuDeviceTotalMem_v2 and others.
All these commands have the suffix _v2, and they return correct 64-bit values.
But Nvidia says they have no commands with the suffix _v2 :))
That covers the limitation of 2**32 bytes of GPU memory.
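
The effect of a 32-bit size can be illustrated in plain Python. This is a sketch under assumptions, not the driver's actual code: the thread's logs show 4095Mb for both a 4 GB GTX 1050 and a 10 GB RTX 3080, which is what saturating at the 32-bit maximum would give, so the sketch compares that with a simple wrap-around. The helper names are invented. As far as I know, NVIDIA's cuda.h maps the unsuffixed function names to the _v2 entry points through macros, which may explain why the suffixed names are not documented separately.

```python
U32_MAX = 0xFFFFFFFF
MIB = 2**20

def reported_mb_if_clamped(total_bytes):
    """Size in MB if a 64-bit byte count saturates at the 32-bit maximum."""
    return min(total_bytes, U32_MAX) // MIB

def reported_mb_if_wrapped(total_bytes):
    """Size in MB if only the low 32 bits of the byte count survive."""
    return (total_bytes & U32_MAX) // MIB

def fits_32bit_alloc(size_mb):
    """True if a single allocation of size_mb stays below 2**32 bytes."""
    return size_mb * MIB <= U32_MAX

for name, gib in (("GTX 1050, 4 GB", 4), ("RTX 3080, 10 GB", 10)):
    total = gib * 2**30
    print(name, reported_mb_if_clamped(total), reported_mb_if_wrapped(total))
print(fits_32bit_alloc(3216), fits_32bit_alloc(8112))
```

The last line mirrors the two TotalBuff values reported earlier in the thread: the 3216 MB buffer fits under the limit, while the 8112 MB one does not, matching the cuMemAlloc error.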
As for "Device have: MP:68 Cores+0": it shows 0 because I didn't add Ampere to the program:
Code:
          Case 2 ; Fermi
            Debug "Fermi"
            If minor=1
              cores = mp * 48
            Else
              cores = mp * 32
            EndIf
          Case 3 ; Kepler
            Debug "Kepler"
            cores = mp * 192

          Case 5 ; Maxwell
            Debug "Maxwell"
            cores = mp * 128

          Case 6 ; Pascal (note: cc 6.1 cards such as the GTX 10xx have 128 cores per MP)
            Debug "Pascal"
            cores = mp * 64

          Case 7 ; Volta / Turing (not Pascal)
            Debug "Volta/Turing"
            cores = mp * 64
          Default
            Debug "Unknown device type"
        EndSelect
By the way, this is needed only for information and nothing more.
To get the correct number of cores, you only need to add this:
Code:
          Case 8; Ampere 
            Debug "Ampere RTX"
            cores = mp * 128
          Default
            Debug "Unknown device type"
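
The same lookup can be sketched in Python, keyed by the full compute capability instead of the major version only. This is an illustration rather than the program's code: the table name and function are made up, and the per-architecture core counts are NVIDIA's published figures as I understand them.

```python
# Cores per multiprocessor, keyed by (major, minor) compute capability.
CORES_PER_MP = {
    (2, 0): 32, (2, 1): 48,                 # Fermi
    (3, 0): 192, (3, 5): 192, (3, 7): 192,  # Kepler
    (5, 0): 128, (5, 2): 128,               # Maxwell
    (6, 0): 64, (6, 1): 128,                # Pascal
    (7, 0): 64, (7, 5): 64,                 # Volta, Turing
    (8, 0): 64, (8, 6): 128,                # Ampere
}

def cuda_cores(mp, major, minor):
    """Return mp * cores-per-MP, or 0 for an unknown compute capability."""
    return mp * CORES_PER_MP.get((major, minor), 0)

print(cuda_cores(68, 8, 6))   # RTX 3080: 68 multiprocessors
print(cuda_cores(5, 6, 1))    # GTX 1050: 5 multiprocessors
```

With 68 multiprocessors at 128 cores each, the RTX 3080 comes out at the "8k+" cores mentioned earlier. Returning 0 for an unknown entry mirrors the "Cores+0" line in the log.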


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 15, 2021, 10:27:15 AM
Thanks, man, for the information. Can you please fix the memory and Ampere issues? Is it possible? And can you recompile it? I am unable to compile it with PureBasic because the free version has limitations.


Title: Re: BSGS solver for cuda
Post by: Etar on October 15, 2021, 10:31:56 AM


Thanks man for the information , can you please fix memory & ampere issue? is it possible ? and recompile it as i am unable to compile it via pure basic , free version have limitation
I can only add code to get the correct number of Ampere cores.
I can't fix the memory issue (it is really about 32-bit values being returned instead of 64-bit ones) because I can't use the unofficial _v2 commands together with the official commands in the same app.


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 15, 2021, 10:36:10 AM


Thanks man for the information , can you please fix memory & ampere issue? is it possible ? and recompile it as i am unable to compile it via pure basic , free version have limitation
I can only add code to get correct number of ampere cores.
I can`t fix memory(it is more fix return 32bit values instead 64bit) because i can`t use unofficial _v2 comands with official commands in the same app.


I see :( I am not good at CUDA or programming, but if I use -i in Kangaroo, it returns the correct memory parameters.

Is it possible to reuse some code from the Kangaroo side? Or is there any way to hardcode the memory?


Title: Re: BSGS solver for cuda
Post by: Etar on October 15, 2021, 10:43:34 AM

ahan :(,   i am not good at cuda or in programing , but if i use -i in kangaroo , it is returning correct parameters of memory.

is it possible to mix some codes from kangaroo side ? or any way to hardcode memory ?
Usually people use the CUDA runtime API; it is a different library, incompatible with the CUDA driver API.
I tried to solve the 32-bit limitation a few years ago, as soon as the first cards with more than 4 GB of memory appeared.
But unfortunately this limit could not be overcome.
And do you need to utilize all the memory?
On my 2080 Ti the hashrate already drops from 570 MKeys/s to 81 at -w 27, while on a 3070 everything is fine.
So you first need to check how your hashrate decreases as the -w parameter increases.
Here is the output with cuDeviceTotalMem_v2:
Code:
APP VERSION: 1.2.1
Found 1 Cuda device.
Cuda device:GeForce RTX 2080 Ti(11264Mb)
Device have: MP:68 Cores+4352
Shared memory total:49152
Constant memory total:65536
It returns correct 64-bit values, but that is only information; it does not help with all the limitations in the CUDA commands.
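
As a rough illustration of why a 32-bit field reports 4095 MB: the sketch below assumes the 32-bit call saturates at the largest unsigned 32-bit value rather than wrapping around. That behaviour is an assumption; it is simply the one that matches the 4095 MB reading mentioned in the thread.

```python
MIB = 1 << 20
UINT32_MAX = (1 << 32) - 1


def as_uint32_saturated(n_bytes):
    """Largest value an unsigned 32-bit field can report for n_bytes (assumed saturation)."""
    return min(n_bytes, UINT32_MAX)


total = 11264 * MIB                         # 2080 Ti size as printed by the _v2 call
print(total.bit_length())                   # bits needed to hold the real size
print(as_uint32_saturated(total) // MIB)    # what a saturated 32-bit field shows, in MB
```

The real size needs 34 bits, so any 32-bit return value cannot hold it, and the saturated value divided by 2^20 is 4095.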


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 15, 2021, 10:54:05 AM

ahan :(,   i am not good at cuda or in programing , but if i use -i in kangaroo , it is returning correct parameters of memory.

is it possible to mix some codes from kangaroo side ? or any way to hardcode memory ?
Ussualy people used cuda runtime api it is different library incompatible with cuda driver api.
I was try to solve 32bit limitation few years ago as soon as the first cards with more than 4GB memory appeared.
But unfortunately this limit could not be overcome.
And do you need to utilize all the memory?
On my 2080ti already at -w 27 the hash rate drops from 570mkeys to 81. While at 3070 everything is fine.
So you need first to check how your hashrate will decrease with increasing parameter -w.
here is with cuDeviceTotalMem_v2
Code:
APP VERSION: 1.2.1
Found 1 Cuda device.
Cuda device:GeForce RTX 2080 Ti(11264Mb)
Device have: MP:68 Cores+4352
Shared memory total:49152
Constant memory total:65536


I have some questions for my understanding:

Does memory allocation on the GPU make a difference in speed?
How do I find the optimal -t, -p and -b values for my card (3080)?
What is the role of -w and -htsz?
And what is the item size?
Can I use more RAM in the computer to get a speed boost, since I have 128 GB of memory? If yes, how?


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 15, 2021, 11:06:52 AM
please take a look in these 2 URLs ~ they fixed this issue.

https://github.com/BOINC/boinc/issues/1773 (https://github.com/BOINC/boinc/issues/1773)
https://github.com/BOINC/boinc/pull/2707 (https://github.com/BOINC/boinc/pull/2707)
perhaps you will get some clue


Title: Re: BSGS solver for cuda
Post by: Etar on October 15, 2021, 11:06:56 AM

does memory allocation in gpu maks difference in speed?
how to know T, P and b optimal value for my card (3080)?
what is W and -htsz role?
and what is item size ?
can i occupy more ram in computer to give some speed boost as i have 128GB memory ? if yes how can ?

-t: use 512 for your 3080.
-b: use 68; it should be a multiple of the SM count of your card (the 3080 has 68 SMs).
-p: use 256; this value is how many x-points each thread computes in the kernel.
-w: the number of baby steps. -w 26 means an array of size 2^26 is created; the larger this array, the bigger the giant step. But you should check your hashrate when you increase -w: it shouldn't drop by more than 1.5 times. For example, if your hashrate with -w 26 is 1500 MKeys/s and with -w 27 it is more than 1000 MKeys/s, then it makes sense to increase -w.

-htsz: use the default 25; it is the size of the hash table. You can change -htsz only if you have a small baby array (-w) that is smaller than the hash table size.
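
The rule of thumb for -w can be written down as a small check. This is a sketch under one assumption: each extra bit of -w doubles the range covered per giant step, so the net gain is twice the new hashrate divided by the old one. The function names are made up for illustration.

```python
def worth_increasing_w(rate_now, rate_next, max_drop=1.5):
    """Rule of thumb from the post: accept -w+1 only if the hashrate drops by at most max_drop."""
    return rate_now / rate_next <= max_drop


def net_gain(rate_now, rate_next):
    """Assumed model: one more bit of -w doubles the keys covered per giant step."""
    return 2 * rate_next / rate_now


# Example from the post: 1500 MKeys/s at -w 26, a bit over 1000 MKeys/s at -w 27.
print(worth_increasing_w(1500, 1100), round(net_gain(1500, 1100), 2))
# 2080 Ti figures from earlier in the thread: 570 MKeys/s dropping to 81 at -w 27.
print(worth_increasing_w(570, 81), round(net_gain(570, 81), 2))
```

In the first case the larger table still pays off; in the second the hashrate collapses so far that the bigger giant step cannot make up for it.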


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 15, 2021, 11:09:36 AM

does memory allocation in gpu maks difference in speed?
how to know T, P and b optimal value for my card (3080)?
what is W and -htsz role?
and what is item size ?
can i occupy more ram in computer to give some speed boost as i have 128GB memory ? if yes how can ?

-t use 512 for your 3080
-b use 68, shoud be multiples of SM count your cars(3080 have 68 SM)
-p use 256, this value mean how many xpoints will compute each thread in kernel.
-w it is number of baby step, -w 26 mean create array with size 2^26 as large this array then more big giant step. But you should check you hashrate when increase -w it shodn`t drop more then 1.5 times. For ex, your hashrate with -w 26 is 1500 Mkeys and if with -w 27 your hashrate is more then 1000 mkeys then there will be sense to increase -w

-htsz use default 25, it is size of Hash Table. you can change -htsz only if you have small baby aray(-w) less then Hash Table size


awesome  , big thanks 


Title: Re: BSGS solver for cuda
Post by: Etar on October 15, 2021, 01:51:19 PM
Seems like I fixed the app..  ;D I replaced most commands with the unofficial _v2 ones.
Code:
GPU #0 launched
GPU #0 TotalBuff: 5168.000Mb
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:00000000000000000000000000000000000000000000000015ea000000000001 697MKey/s x536870912 2^29.45 x2^30=2^59.45
GPU#0 Cnt:0000000000000000000000000000000000000000000000002bd4000000000001 697MKey/s x536870912 2^29.45 x2^30=2^59.45
GPU#0 Cnt:00000000000000000000000000000000000000000000000041be000000000001 697MKey/s x536870912 2^29.45 x2^30=2^59.45
GPU#0 Cnt:00000000000000000000000000000000000000000000000057a8000000000001 697MKey/s x536870912 2^29.45 x2^30=2^59.45
GPU#0 Cnt:0000000000000000000000000000000000000000000000006d92000000000001 697MKey/s x536870912 2^29.45 x2^30=2^59.45
GPU#0 Cnt:000000000000000000000000000000000000000000000000835a000000000001 696MKey/s x536870912 2^29.44 x2^30=2^59.44
GPU#0 Cnt:0000000000000000000000000000000000000000000000009922000000000001 696MKey/s x536870912 2^29.44 x2^30=2^59.44
***********GPU#0************
Total solutions: 1
KEY!!>000000000000000000000000000000000000000000000001a838b13505b26867
Pub: 30210c23b1a047bc9bdbb13448e67deddc108946de6de639bcc75d47c0216b1be383c4a8ed4fac77c0d2ad737d8499a362f483f8fe39d1e86aaed578a9455dfc
****************************
Found in 17 seconds
The result above is with -w 29.
The speed also increased a little.
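
The status lines in the log can be decoded with a little arithmetic. The sketch below assumes that "MKey" in the log means 2^20 keys and that the "x2^30" factor is the range covered per giant step with -w 29; both are inferred from the printed numbers, not taken from the source code.

```python
import math


def log2_rate(mkeys_per_s):
    """log2 of the giant-step rate, assuming 1 MKey = 2**20 keys."""
    return math.log2(mkeys_per_s * 2**20)


for mkeys in (697, 696):
    giant = log2_rate(mkeys)
    total = giant + 30            # the "x2^30" factor printed with -w 29
    print(f"{mkeys}MKey/s 2^{giant:.2f} x2^30=2^{total:.2f}")
```

Under these assumptions the formatted exponents line up with the logged 2^29.45 / 2^59.45 and 2^29.44 / 2^59.44 values, and 536870912 is simply 2^29, the baby array size for -w 29.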


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 15, 2021, 06:33:39 PM
awesome bro , thanks for sharing all this ~~ will test it


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 15, 2021, 08:51:50 PM
I am sorry bro, it's still the same: it still shows 4095 MB of memory and cannot utilize more than about 3200.



Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 16, 2021, 04:35:50 AM
Do I have to remove the 04 from the beginning of an uncompressed key, or can the software recognize it with the 04 as well?

It seems there is one more issue.

If I use a range like this
49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5eba34000000000000
it runs fine within the range, but when I use a 120-bit range it calculates the range fine but runs far below it and shows false collisions.

It started like this:
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
.....
GPU#0 Cnt:0000000000000000000000000000000000000000000000160cc3800000000001

But I set the range 0x800000000000000000000000000000 to 0xffffffffffffffffffffffffffffff.

Is something wrong with the software?



Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on October 16, 2021, 04:47:08 AM
Do i have to remove 04 from bigging of uncompressed key or software can recognize with 04 also?

and seems like one more issue is there

if i am using range like this
49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5eba34000000000000
its running fine in range but when i use 120 range it is calculating range fine but running very below and showing false collision  

this started like this
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
.....
GPU#0 Cnt:0000000000000000000000000000000000000000000000160cc3800000000001

but i set range 0x800000000000000000000000000000 to 0xffffffffffffffffffffffffffffff

is something wrong with software ?


You can run with 04 in front of uncompressed key's x,y points; you just can not use a compressed key in any format.

The Cnt values are the giant steps. On startup the program offsets the pubkey (subtracts the start of the range); then, after all the baby steps and sorting, the GPU starts the giant steps. Nothing is wrong with the program. If you run a smaller range, you will see the same thing, and you will see that it solves for the key you entered. False collisions are normal because only 8 bytes are stored in the hash table.
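
The steps described above (offset by the range start, baby-step table, giant steps, false collisions from truncated entries) can be sketched with a toy example. Note the swap: this uses modular exponentiation in a small prime field instead of secp256k1 points, and a few-bit tag instead of 8 stored bytes, so it illustrates the technique only and is not the program's implementation.

```python
def bsgs(g, h, p, start, width, m, tag_bits=3):
    """Search k in [start, start+width) with g**k % p == h.

    Returns (k or None, number of false collisions). p must be prime.
    """
    mask = (1 << tag_bits) - 1

    def inv(x):
        return pow(x, p - 2, p)               # modular inverse via Fermat

    # Baby steps: index j by a truncated tag of g**j, like a compact hash table.
    table = {}
    e = 1
    for j in range(m):
        table.setdefault(e & mask, []).append(j)
        e = e * g % p

    y = h * inv(pow(g, start, p)) % p         # offset the target by the range start
    giant = inv(pow(g, m, p))                 # one giant step multiplies by g**(-m)
    false_hits = 0
    for i in range((width + m - 1) // m):     # the counter starts at 0, not at `start`
        for j in table.get(y & mask, ()):
            k = start + i * m + j
            if k < start + width and pow(g, k, p) == h:
                return k, false_hits
            false_hits += 1                   # tag matched, full check failed
        y = y * giant % p
    return None, false_hits


p, g = 101, 2                                 # 2 generates the full group mod 101
print(bsgs(g, pow(g, 37, p), p, start=10, width=40, m=8))   # key inside the range
print(bsgs(g, pow(g, 77, p), p, start=10, width=40, m=8))   # key outside the range
```

The giant-step counter runs from zero regardless of the range start because the target was shifted once at the beginning, which is why the Cnt values in the log look "below" the requested range. A key outside the range simply produces no verified hit.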


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 16, 2021, 05:37:26 AM
Do i have to remove 04 from bigging of uncompressed key or software can recognize with 04 also?

and seems like one more issue is there

if i am using range like this
49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5eba34000000000000
its running fine in range but when i use 120 range it is calculating range fine but running very below and showing false collision  

this started like this
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
.....
GPU#0 Cnt:0000000000000000000000000000000000000000000000160cc3800000000001

but i set range 0x800000000000000000000000000000 to 0xffffffffffffffffffffffffffffff

is something wrong with software ?


You can run with 04 in front of uncompressed key's x,y points; you just can not use a compressed key in any format.

The Cnt's are the giant steps. Program offsets (subtracts start range) pubkey on startup and then after all the baby steps and sorting, the GPU starts the giant steps. Nothing is wrong with the program. If you run a smaller range, you will see the same thing and you will see it will solve for the inputted key. False collisions are normal due to 8 bytes stored in hash table.

Awesome man, got the point. I was worried that something was wrong with my setup.


Title: Re: BSGS solver for cuda
Post by: Etar on October 16, 2021, 05:46:45 AM
i am sorry bro its still same and still showing 4095 memory and cannot utilize above 3200+
Make sure you run v1.3.1. You can check the version at the beginning of the output.


Title: Re: BSGS solver for cuda
Post by: sky59sky59 on October 16, 2021, 05:53:09 AM
i am sorry bro its still same and still showing 4095 memory and cannot utilize above 3200+
Make sure you run v1.3.1. you can check version in the begining.

Does it search the whole 256-bit space? Or is it limited to only a few LSB bits?


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 16, 2021, 06:10:46 AM
i am sorry bro its still same and still showing 4095 memory and cannot utilize above 3200+
Make sure you run v1.3.1. you can check version in the begining.

Yes bro, this is v1.3.1, but it is still the same incorrect detection :(


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 16, 2021, 06:18:49 AM
i am sorry bro its still same and still showing 4095 memory and cannot utilize above 3200+
Make sure you run v1.3.1. you can check version in the begining.

yes bro this is v1.3.1 nut still same incorrect detection :(

I fixed it. I had not deleted the old files that were computed earlier by the old program; once I deleted all the old .bin etc. files, everything was fine. Thanks man, I will start testing now.


Title: Re: BSGS solver for cuda
Post by: Etar on October 16, 2021, 06:19:59 AM
i am sorry bro its still same and still showing 4095 memory and cannot utilize above 3200+
Make sure you run v1.3.1. you can check version in the begining.

yes bro this is v1.3.1 nut still same incorrect detection :(
Very strange, because you talked above about false collisions, but I removed the printing of false collisions to cmd in version 1.3.1 (they certainly still happen, but they are no longer visible).
By the way, BSGS is fast only in small ranges like 2^64 and less. If you try to use BSGS for puzzle #80, for example, you will search for the pubkey much longer than with JLP Kangaroo.
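
A back-of-the-envelope comparison shows why. The sketch assumes BSGS sweeps the range linearly at the effective rate of about 2^59.45 keys/s logged earlier in the thread, and that Kangaroo needs roughly 2*sqrt(N) group operations on average; the Kangaroo rate of 1e9 jumps per second is a hypothetical figure chosen only for illustration.

```python
def bsgs_seconds(range_bits, log2_keys_per_s):
    """Worst-case time: BSGS sweeps the whole range at its effective rate."""
    return 2.0 ** (range_bits - log2_keys_per_s)


def kangaroo_seconds(range_bits, jumps_per_s):
    """Average time: roughly 2*sqrt(N) group operations for a range of N keys."""
    return 2.0 * 2.0 ** (range_bits / 2) / jumps_per_s


for bits in (64, 79):                      # 79 bits is the width of puzzle #80
    b = bsgs_seconds(bits, 59.45)
    k = kangaroo_seconds(bits, 1e9)        # hypothetical Kangaroo speed
    print(f"{bits}-bit range: BSGS ~{b:.0f} s, Kangaroo ~{k:.0f} s")
```

Under these assumptions the two are within the same order of magnitude at 64 bits, while at 79 bits BSGS needs days and Kangaroo needs well under an hour, because the BSGS time grows with N and the Kangaroo time only with sqrt(N).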


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 16, 2021, 06:34:44 AM
i am sorry bro its still same and still showing 4095 memory and cannot utilize above 3200+
Make sure you run v1.3.1. you can check version in the begining.

yes bro this is v1.3.1 nut still same incorrect detection :(
Very strange because you talk above about false collision but i remove printing false collision in cmd in version 1.3.1(they certainly happen, but they are no longer visible)
By the way bsgs fast only in small ranges like 2^64 and less. if you will try use bsgs for #80 puzzle for ex. then you will search pubkeys much longer then JLP kangaroo.

I am a beginner, so I must be doing something wrong. But which program would you recommend for ranges above 80 bits?


Title: Re: BSGS solver for cuda
Post by: davidjjones on October 16, 2021, 06:42:53 AM
Is there any script to uncompress multiple pubkeys in a file?
Something like this script: BTC Addresses > HASH160
https://github.com/sezginyildirim91/btc-address-to-hash160


Title: Re: BSGS solver for cuda
Post by: Etar on October 16, 2021, 06:43:34 AM
--snip--
i am beginner so must be doing some wrong but what program you will recommend for above 80-bit range
JLP Kangaroo. https://github.com/JeanLucPons/Kangaroo


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 16, 2021, 06:52:44 AM
Total: 4294967296 bytes
Save BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_536870912_b.BIN
  • chunk:2147483648b
Error when saving chunk: save:2147483648b, got:-2147483648b
Press Enter to exit


???  new error


Title: Re: BSGS solver for cuda
Post by: Etar on October 16, 2021, 07:20:28 AM
Total: 4294967296 bytes
Save BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_536870912_b.BIN
  • chunk:2147483648b
Error when saving chunk: save:2147483648b, got:-2147483648b
Press Enter to exit


???  new error
Download version 1.3.2. I decreased the chunk size to 1 GB.
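
The numbers in the error message are what a signed 32-bit overflow looks like. The sketch below assumes the write call returns its byte count as a signed 32-bit integer, which is an inference from the message and not confirmed by the source; the helper names are made up.

```python
def to_int32(n):
    """Reinterpret the low 32 bits of n as a signed 32-bit integer."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n


def chunks(total, chunk_size):
    """Yield (offset, length) pairs that cover `total` bytes."""
    for off in range(0, total, chunk_size):
        yield off, min(chunk_size, total - off)


print(to_int32(2147483648))                  # a 2 GiB chunk read back as signed 32-bit
plan = list(chunks(4294967296, 1 << 30))     # the 4 GiB file in 1 GiB chunks
print(len(plan), to_int32(plan[0][1]))
```

A 2 GiB chunk is exactly 2^31 bytes, one more than the largest positive signed 32-bit value, so it reads back as -2147483648; a 1 GiB chunk stays positive.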


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 16, 2021, 07:28:59 AM
Total: 4294967296 bytes
Save BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_536870912_b.BIN
  • chunk:2147483648b
Error when saving chunk: save:2147483648b, got:-2147483648b
Press Enter to exit


???  new error
Download 1.3.2 version. decreased chunk size to 1Gb

let me try


Title: Re: BSGS solver for cuda
Post by: hamnaz on October 16, 2021, 07:51:04 AM
Is there any script to uncompress multiple pubkeys in a file?
like this script: BTC Adresses > HASH160
https://github.com/sezginyildirim91/btc-address-to-hash160
Here is a post with code for converting compressed keys to uncompressed and uncompressed to compressed:
https://bitcointalk.org/index.php?topic=5244940.msg57700007#msg57700007
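
For reference, decompressing a secp256k1 public key needs only a modular square root. The following is a minimal sketch, not taken from the linked post; the function names and the one-key-per-line file format are assumptions.

```python
P = 2**256 - 2**32 - 977   # secp256k1 field prime


def decompress(pub_hex):
    """Convert a compressed key (02/03 + x) to uncompressed form (04 + x + y)."""
    pub_hex = pub_hex.strip().lower()
    if pub_hex.startswith("04") and len(pub_hex) == 130:
        return pub_hex                           # already uncompressed
    if len(pub_hex) != 66 or pub_hex[:2] not in ("02", "03"):
        raise ValueError("not a compressed public key")
    x = int(pub_hex[2:], 16)
    y_sq = (pow(x, 3, P) + 7) % P                # curve equation y^2 = x^3 + 7
    y = pow(y_sq, (P + 1) // 4, P)               # square root, valid since P % 4 == 3
    if y * y % P != y_sq:
        raise ValueError("x is not on the curve")
    if (y & 1) != (int(pub_hex[:2], 16) & 1):    # 02 means even y, 03 means odd y
        y = P - y
    return "04" + format(x, "064x") + format(y, "064x")


def decompress_file(src, dst):
    """Read one key per line from src and write the uncompressed keys to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(decompress(line) + "\n")


print(decompress("0279be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798"))
```

The example key is the compressed form of the curve's generator point, which makes the result easy to check against published values.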


Title: Re: BSGS solver for cuda
Post by: a.a on October 16, 2021, 08:57:50 AM
Use bitcoin-tool.
https://github.com/matja/bitcoin-tool


Title: Re: BSGS solver for cuda
Post by: davidjjones on October 16, 2021, 11:59:24 AM
Is there any script to uncompress multiple pubkeys in a file?
like this script: BTC Adresses > HASH160
https://github.com/sezginyildirim91/btc-address-to-hash160
here i see post related to compress to uncompress and uncompress to compress (codes)
https://bitcointalk.org/index.php?topic=5244940.msg57700007#msg57700007
Thank you


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 16, 2021, 02:25:16 PM
Help me understand which one is faster when you are searching for multiple keys from a file:

BSGS or Kangaroo?

I heard that if you load a list of public keys into Kangaroo it does not check them all simultaneously. Is that true?

If Kangaroo is more efficient and faster than BSGS above 80 bits, please explain why.


Title: Re: BSGS solver for cuda
Post by: a.a on October 16, 2021, 04:26:51 PM
Currently there is no known implementation of the kangaroo algorithm that handles multiple pubkeys. JLP's implementation processes the pubkeys consecutively.

With BSGS it is possible to handle all pubkeys simultaneously.

Which one is faster? Depends.


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 16, 2021, 04:38:19 PM
Currently there is no known implementation of kangaroo algorithm which handles multiple pubkeys. JLPs implementation processes each pubkey consecutively.

BSGS is possible to handle each pubkey simultaneously.

Which one is faster? Depends.

So maybe you mean that JLP first takes one key from the file and works only on that one until it finds it through a collision, and then processes the second one,

and BSGS, as far as I know, checks them all at the same time.


Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on October 16, 2021, 05:04:45 PM
Currently there is no known implementation of kangaroo algorithm which handles multiple pubkeys. JLPs implementation processes each pubkey consecutively.

BSGS is possible to handle each pubkey simultaneously.

Which one is faster? Depends.

so maybe your meaning is JLP first get one key from file and process only on that until he will find that in collision and than will process second one

and BSGS somewhat i know is checking all same time
Currently, JLP Kangaroo and BSGS Cuda both process the pubkeys one at a time. Neither is currently programmed to search for multiple pubkeys at once.

In my limited tests, I can tell you BSGS Cuda is faster when searching multiple pubkeys (one at a time), at least in the 72-bit range. BSGS gives you a 100% check of whether a key is or is not in a range. Kangaroo does not; you have to set the -m option to at least -m 6 to get a 99% rate that the key is or is not in a range.


Title: Re: BSGS solver for cuda
Post by: sky59sky59 on October 17, 2021, 06:56:36 AM
Do you mean to say that the BSGS algorithm gives you 100% information on whether the key is or is not in a given range k1-k2?

Or are you saying BSGS would find the private key even if the k1-k2 interval is incorrect?

I had a closer look at the JLP BSGS code (not the CUDA one you speak about) and it seems there is no limit of 125 or 128 bits for the search interval? Does BSGS by nature search the whole 256-bit range?


Title: Re: BSGS solver for cuda
Post by: a.a on October 17, 2021, 08:25:02 AM
You should read the articles about Pollards Kangaroo Algorithm and BSGS.
If you find the key with BSGS in the range, then you know 100% for sure that the key is in the range. And if no baby step is found in the defined range for the pubkey, then it will find no solution.



Title: Re: BSGS solver for cuda
Post by: sky59sky59 on October 17, 2021, 08:34:13 AM
You should read the articles about Pollards Kangaroo Algorithm and BSGS.
If you find the key with BSGS in range, than you know 100% for sure, that the key is in range. And if a baby step is not found in the defined range for the pubkey than it will find no solution.



I do not know if you are joking or serious.
Your answer is totally useless and does not even touch any of my questions.
Claiming that if you find the key in a range then you are 100% sure it is there is childish and raises my doubts about your mental health.


Title: Re: BSGS solver for cuda
Post by: a.a on October 17, 2021, 11:17:54 AM
Actually my original answer was kind of rough and was basically like this: your questions are stupid; first inform yourself on Wikipedia about what the different cracking methods do, and then ask again before wasting our time. Then I thought I should be friendly and removed the toxic part. Now reading your answer encourages me to give you a rough answer.

So yes, your question about BSGS is total bullshit. BSGS looks for the right baby steps. If it finds the right baby steps, then by doing the giant steps it can determine the actual value of the private key. This is clearly described on Wikipedia.

Do you mean to say that bsgs algo gives you 100% information if key is or is not in s given range k1-k2?

This question is stupid. Obviously if BSGS ran a range and did or did not find a solution it means that the key is or is not in range k1-k2.

Or you say bsgs would find private key even if k1 k2 interval is incorrect?
How is this even possible, when already babysteps don't find a value? This question is totally stupid.

Please first do some fundamental research. Maybe then you won't waste your and our time by asking totally stupid questions and blaming others for giving supposedly stupid answer despite the fact reading wikipedia and using your own brain would give you the logical answer.


Title: Re: BSGS solver for cuda
Post by: sky59sky59 on October 17, 2021, 12:51:51 PM
If you feel wasting your time you should not react at all

Anyway, your answer is like saying baby steps must find it (what) first. Total bullshit.


Title: Re: BSGS solver for cuda
Post by: a.a on October 17, 2021, 01:23:07 PM
If you feel wasting your time you should not react at all

I was answering nice to even reach out to someone who asks such uninformed questions.

Anyway, your answer is like saying baby steps must find it (what) first. Total bullshit.

To understand babystep giantstep read the wiki article https://en.wikipedia.org/wiki/Baby-step_giant-step

If you search in a range, you just need to generate a small baby-step lookup table with the potential steps in that range. But if you are looking for a value outside the provided range, the baby-step lookup table won't contain the necessary value! So you get no hit.

The English WP article contains no example, but the German one does:
https://de.wikipedia.org/wiki/Babystep-Giantstep-Algorithmus#Beispiel


Title: Re: BSGS solver for cuda
Post by: sky59sky59 on October 17, 2021, 01:40:07 PM
Thank you


Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on October 17, 2021, 06:00:56 PM
Do you mean to say that bsgs algo gives you 100% information if key is or is not in s given range k1-k2?

Or you say bsgs would find private key even if k1 k2 interval is incorrect?

I had a closer look at JLP bsgs code (not cuda you speak about) and it seems there is no limit to 125 or 128 bit for search interval? Is bsgs by nature searching whole 256 bit range?
Apart from testing a program, meaning we know a key lies in a range we are using to test a program, to see if program works and can find the key that we know is in the range we are searching....

So if you are searching a range and do not know if the key you are searching for lies in the range, if you run the range with BSGS program, if the key is not found, then you know for 100% sure, that the key is not in that range. If you use JLP Kangaroo, you have to at least set the -m option to -m 6, if the program does not find the key, then you know with 99% sure, the key is not in the range.

BSGS will 100% tell you if the key is or is not in the range you are searching.

JLP BSGS is not limited to any bit range; meaning you can search in a 10 bit range or a 256 bit range.

Both programs search in the range you specify. If you specify a 64 bit range, that is what the program will search; if you specify a 256 bit range, that is what the program will search.


Title: Re: BSGS solver for cuda
Post by: Etar on October 17, 2021, 06:27:47 PM
New release v1.4.0
Compressed and uncompressed public key formats are now supported.
Removed the binsort program (a sorted array is no longer needed).
The baby array is needed only the first time, to create the HT (or to rebuild the HT when -htsz changes).
Once the HT is created and saved, the next launch needs less RAM: only the HT and the giant array.
With a single 2080 Ti (parameters -w 29   -htsz 28) it finds the 16 public keys from the JLP BSGS example in 2m 30s.
Code:
GPU#0 Cnt:0000000000000000000000000000000000000000000000006ce8000000000001 869MKey/s x536870912 2^29.76 x2^30=2^59.76
***********GPU#0************
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
Pub: 55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9
****************************
Found in 9 seconds
GPU#0 job finished
Working time 00:02:30s
Total time 00:03:16s


Title: Re: BSGS solver for cuda
Post by: sky59sky59 on October 17, 2021, 07:30:27 PM

BSGS will 100% tell you if the key is or is not in the range you are searching.

JLP BSGS is not limited to any bit range; meaning you can search in a 10 bit range or a 256 bit range.


Thank you for the exhaustive explanation!

I hope I understand correctly that both JLP and "cuda" BSGS work for searches up to 256 bits in size.

Maybe you noticed I migrated JLP BSGS from curve k1 to r1. It was a success. I started it for an interval of almost 256 bits:
1000000000....0000000
EFFF.....................FFFFF

I know it will probably not succeed in this century, but it is not impossible :) I want to try to find the private key for gr**npass.

I am also continuing to work on migrating Kangaroo256 to the r1 curve, but I was told by the author to wait a bit until he updates the hashtables (?)
Anyway, I already migrated the CPU part just to see how the code behaves. I already reported my findings to him; it is here on the forum.


Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on October 17, 2021, 07:39:31 PM
Quote
I continue to work to migrate also  Kangaroo256 to r1 curve but I was told by the author to wait a bit until he updates hastables (?)

Not sure which author you are referring to, but if this is NotATether's version, I would not use it. It is very buggy, the last known speed is compromised, and the program does not find keys.
The only thing needed, IMO, to upgrade JLP's original Kangaroo to be able to search a 256-bit range was to change the limited 128-bit store function to a 256-bit store function (plus the + - and type bits) so the program could solve the key. I have not looked at the code of the new 256-bit version, but it seems it does not find keys.

Original JLP Kangaroo, you can search a 256 bit range for a key but since it only stores 128 bits for the distances, you will not solve key properly because it is missing 128 bits of the distance (private key), which is needed to reconstruct the private key of the pub key you are searching for.
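
The effect of a fixed-width distance field can be shown with a small packing sketch. The layout used here (bit 127 for the sign, bit 126 for the kangaroo type, the low 126 bits for the distance) is an assumed layout for illustration and is not copied from the Kangaroo source.

```python
DIST_BITS = 126                       # assumed: 128-bit slot minus sign and type bits
DIST_MASK = (1 << DIST_BITS) - 1


def pack(distance, negative, kangaroo_type):
    """Pack into a 128-bit slot: bit 127 = sign, bit 126 = type, low bits = distance."""
    return (negative << 127) | (kangaroo_type << 126) | (distance & DIST_MASK)


def unpack(slot):
    """Return (distance, negative, kangaroo_type) from a 128-bit slot."""
    return slot & DIST_MASK, (slot >> 127) & 1, (slot >> 126) & 1


small = 0x1a838b13505b26867           # a distance that fits in the field
big = (1 << 200) | small              # a distance from a much wider range

print(unpack(pack(small, 0, 1))[0] == small)   # round-trips
print(unpack(pack(big, 0, 1))[0] == big)       # high bits are lost
```

A distance wider than the field comes back with its high bits cut off, so the private key reconstructed from it would be wrong even though the collision itself was found.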


Title: Re: BSGS solver for cuda
Post by: sky59sky59 on October 18, 2021, 05:38:50 AM

Not sure which author you are referring to, but if this is NotATether's version

Yes, it is. He already informed me about some problems and told me to wait until he corrects them.

But as I already wrote, I could not resist and migrated it to the R1 curve just to try what it does...

Yes, you are right: as it is now, it behaves in a somewhat non-repeatable way. For me it found the keys, but when solving the same problem several times the time to solve ranged from zero (!) seconds to minutes. I only tried searches of fewer bits, though, as it was considerably slower than the JLP 125-bit version searching a range of more bits.

I also noticed he changed the meaning of the DP mask (not from left to right as in the original, but from right to left), and he "regrets" this as it confused me.

I also noticed there is a problem when the public key to solve is given in the configuration file in 03 mode (x coordinate only): something is wrong, as it falsely reports "point not lie on curve", while in 04 mode it seems to be OK.

But let us wait and see whether NotATether manages to bring it to life.

I was already wondering whether there is any way to test the program before starting it for months... Would it be an option to provide some precalculated values instead of the random numbers where the algorithm starts, so that the program finds a test private key in hours? Or is there already a way to test the code?

Sorry, I do not understand the abbreviation "IMO". What does it mean? :)


Title: Re: BSGS solver for cuda
Post by: davidjjones on October 18, 2021, 07:27:34 AM
New release v1.4.0
supported compressed/uncompressed format public keys
removed binsort program (don`t need sorted array any more)
baby array need only first time to create HT (or rebuild HT when -htsz changed)
After HT created and saved, next time you will need less ram and only HT, giant array to launch.
...

Thanks for your great code.
Is there a limit to the number of pubkeys in the input txt file?

Here is the list of 2800 top richest pubkeys:
https://pastebin.com/YMr3BaiU


Title: Re: BSGS solver for cuda
Post by: a.a on October 18, 2021, 12:02:00 PM
Interesting List you have. I would have given merits If I had some.

I don't think they can be searched effectively in parallel. You have to divide each pubkey and check it against the baby steps. So for every key you need to load it and make a very expensive global memory lookup (a GPU has slow global memory and super fast local memory).

So if you searched multiple keys you would effectively divide the performance by their number: 10 keys in parallel means 10 times slower, 2800 keys means 2800 times slower.


Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on October 18, 2021, 12:10:15 PM
Quote
Is there a limit to the number of pubkeys in the input txt file?

I am sure there will be a limit, but it is probably in the millions. For similar programs, it usually caps out at around 30 million addresses, pubkeys, xpoints...

a.a.
Have you run this program yet? It's just that some of your answers make it seem like you have not run it at all.

The program does not check keys in parallel, it runs range with one pubkey, once finished, it moves to the next, until the last pubkey has been checked for that specific range.


Title: Re: BSGS solver for cuda
Post by: a.a on October 18, 2021, 12:20:36 PM
I think I did not make myself clear. My native language is German. I apologize. I looked at it with my programmer's eyes.

As you already explained a few days ago, cuda BSGS searches the keys one after another, not in parallel. I know that, I read carefully ;). With my previous post I meant that even if it did run in parallel it would slow down as described. So read my post in the subjunctive: if someone modified it to process keys in parallel, I would expect that behaviour.


Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on October 18, 2021, 12:46:36 PM
I think I made myself not clear. My native language is German. I apologize. I Looked at it with my programmers eyes.

As you already explained few days ago, cuda BSGS is searching the keys one after another, not in parallel. I know that, I read carefully ;). With my previous post I meant that even if it would run in parallel it would slow down as described. So read my post in conjunctive and if someone would modify it to process in parallel I expect that behaviours.
Gotcha... no worries, I just did not want to mislead anyone, or have anyone misled, lol.

But you are right, if it did search in parallel, performance would drop, but I believe it would be due to the giant steps (the CPU performs the baby steps). If one had a higher end card and wanted to search 2 pubkeys, then I think it would be worth it to search 2 at the same time. I have been running the program on a slower card, my test card, for a few days now. The purpose is to see if there is a benefit or an angle to attack the 120, 125, 130, etc. keys where the public key is exposed. More to come with this...


Title: Re: BSGS solver for cuda
Post by: davidjjones on October 18, 2021, 01:59:43 PM
Interesting List you have. I would have given merits If I had some.

I don't think that they can be effectively searched in parallel. You have to divide each pubkey and check it with the babysteps. So not only do you need to make very expensive global memory lookup (GPU has slow global and super fast local memory) and load each key.

So if you would search multiple keys you would effectively reduce the performance by them. Like 10 keys in parallel means 10 times slower. 2800 keys means 2800 times slower.

Quote
Is there a limit to the number of pubkeys in the input txt file?

I am sure there will be a limit, but it is probably in the millions. For similar programs, it usually caps out at around 30 million addresses, pubkeys, xpoints...

a.a.
Have you ran this program yet? It's just that some of your answers make it seem like you have not ran it at all.

The program does not check keys in parallel, it runs range with one pubkey, once finished, it moves to the next, until the last pubkey has been checked for that specific range.
Thanks for your good explanation, I got how it works.
So multiple xpoints checking (in parallel) is only possible with KeyHunt-CUDA.


Title: Re: BSGS solver for cuda
Post by: NotATether on October 18, 2021, 01:59:51 PM
Quote
I continue to work to migrate also  Kangaroo256 to r1 curve but I was told by the author to wait a bit until he updates hastables (?)

Not sure which author you are referring to, but if this is NotATether's version, I would not use it. Very buggy and last known speed is compromised and the program does not find keys.
The only thing needed, IMO, to upgrade JLPs original Kangaroo to be able to search a 256 bit range, was to update the limited 128 bit store function to a 256 bit store function (plus the + - and type bits) so the program could solve key. I have not looked at the code of the new 256bit version, but it seems it does not find keys.

Original JLP Kangaroo, you can search a 256 bit range for a key but since it only stores 128 bits for the distances, you will not solve key properly because it is missing 128 bits of the distance (private key), which is needed to reconstruct the private key of the pub key you are searching for.

Yep he was referring to mine.

You should not try to update it to a 256-bit store - it's too complicated since you'll have to find a new home for the two flag bits at the end of each store (the ones which limit the actual search range to 126 bits). This is how the hashtable got screwed.

It is more logical to update it to a 254-bit store instead so you don't have to move the flag bits anywhere.
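
To illustrate the point about the flag bits, here is a minimal Python sketch. It is not the Kangaroo source: the field layout, the function names and the flag names are assumptions made only for illustration. It packs a non-negative distance and two 1-bit flags into a fixed-width word, so a 128-bit word leaves 126 bits for the distance and a 256-bit word leaves 254.

```python
def pack(distance: int, sign: int, kind: int, width: int) -> int:
    """Pack a distance plus two 1-bit flags into a `width`-bit word.

    Hypothetical layout: top bit = sign flag, next bit = kangaroo type flag,
    remaining (width - 2) bits = distance.
    """
    payload_bits = width - 2
    if distance < 0 or distance >> payload_bits:
        raise ValueError(f"distance does not fit in {payload_bits} bits")
    return (sign << (width - 1)) | (kind << payload_bits) | distance


def unpack(word: int, width: int):
    """Inverse of pack(): returns (distance, sign, kind)."""
    payload_bits = width - 2
    distance = word & ((1 << payload_bits) - 1)
    kind = (word >> payload_bits) & 1
    sign = (word >> (width - 1)) & 1
    return distance, sign, kind


# A 254-bit distance fits a 256-bit word with both flags set...
word = pack((1 << 254) - 1, 1, 1, 256)
print(unpack(word, 256) == ((1 << 254) - 1, 1, 1))

# ...but a 127-bit distance does not fit a 128-bit word.
try:
    pack(1 << 126, 0, 0, 128)
except ValueError as exc:
    print(exc)
```

With this kind of layout, widening the word moves the flags along with the top of the word, which is why only the integer type has to change.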


Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on October 18, 2021, 02:10:13 PM
Quote
I continue to work to migrate also  Kangaroo256 to r1 curve but I was told by the author to wait a bit until he updates hastables (?)

Not sure which author you are referring to, but if this is NotATether's version, I would not use it. Very buggy and last known speed is compromised and the program does not find keys.
The only thing needed, IMO, to upgrade JLPs original Kangaroo to be able to search a 256 bit range, was to update the limited 128 bit store function to a 256 bit store function (plus the + - and type bits) so the program could solve key. I have not looked at the code of the new 256bit version, but it seems it does not find keys.

Original JLP Kangaroo, you can search a 256 bit range for a key but since it only stores 128 bits for the distances, you will not solve key properly because it is missing 128 bits of the distance (private key), which is needed to reconstruct the private key of the pub key you are searching for.

Yep he was referring to mine.

You should not try to update it to 256-bit store - it's too complicated since you'll have to find a new home for the two flag bits at the end of each store (the ones which limit to the actual search range to 126 bits). This is how the hashtable got screwed.

It is more logical to update it to a 254-bit store instead so you don't have to move the flag bits anywhere.
Agreed... but really, can't it be bumped up to a 160-bit store plus the 2 flag bits more easily than to the 254-bit store? Then the program could at least cover up to the last 160 bit key for the puzzle/challenge transaction.


Title: Re: BSGS solver for cuda
Post by: sky59sky59 on October 18, 2021, 02:41:59 PM
Yep he was referring to mine.

You should not try to update it to 256-bit store - it's too complicated since you'll have to find a new home for the two flag bits at the end of each store (the ones which limit to the actual search range to 126 bits). This is how the hashtable got screwed.

It is more logical to update it to a 254-bit store instead so you don't have to move the flag bits anywhere.

In the meantime I found that your updated Div() is faulty. Why not keep the original functions? And it seems more of the functions that are tested in Check() are faulty.

You can try this (it gives wrong results):

I would probably be the happiest man in the universe if, with all your knowledge and experience, you just updated the 125-bit version to a 254-bit version, with only what is absolutely necessary and no other improvements :) Then I would see your great success!!


  // Div -------------------------------------------------------------------------------------------
  tTotal = 0.0;
  ok = true;
  for (int i = 0; i < 2 && ok; i++) {

    // Fixed operands instead of random ones, so the failure is reproducible
    a.SetBase16("D51263D15FC81DE32C5CB69070ABDF3D58A2028184E15F3A6C56EB8A787C81DB");
    b.SetBase16("2AED15B34BE1B98EE4246FB3F447059A");
    // a.Rand(BISIZE);
    // b.Rand(BISIZE/2);
    d.Set(&a);  // keep a copy of the dividend
    e.Set(&b);  // keep a copy of the divisor

    printf("a= %s\n", a.GetBase16().c_str());
    printf("b= %s\n", b.GetBase16().c_str());
    printf("d= %s\n", d.GetBase16().c_str());
    printf("e= %s\n", e.GetBase16().c_str());

    t0 = Timer::get_tick();
    a.Div(&b, &c);  // a becomes the quotient, c the remainder
    printf("a/b= %s\n", a.GetBase16().c_str());
    printf("rem= %s\n", c.GetBase16().c_str());

    t1 = Timer::get_tick();
    tTotal += (t1 - t0);

    // Check: quotient * divisor + remainder must equal the original dividend
    a.Mult(&e);
    a.Add(&c);
    if (!a.IsEqual(&d)) {
      ok = false;
      printf("Div() Results Wrong \nN: %s\nD: %s\nQ: %s\nR: %s\n",
        d.GetBase16().c_str(),
        b.GetBase16().c_str(),
        a.GetBase16().c_str(),
        c.GetBase16().c_str()
      );
      return;
    }
  }
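
For anyone who wants reference values to compare the printed "a/b=" and "rem=" lines against, the same two operands can be divided with Python's built-in arbitrary-precision integers. This is only an independent cross-check, not part of the Kangaroo code, and the hex formatting of the C++ GetBase16() output may differ (for example in case or leading zeros).

```python
# Same operands as in the Div() test above.
a = int("D51263D15FC81DE32C5CB69070ABDF3D58A2028184E15F3A6C56EB8A787C81DB", 16)
b = int("2AED15B34BE1B98EE4246FB3F447059A", 16)

# divmod returns (quotient, remainder) with 0 <= remainder < b for positive b.
q, r = divmod(a, b)

# The identity the C++ test checks: quotient * divisor + remainder == dividend.
assert q * b + r == a
assert 0 <= r < b

print(f"a/b= {q:X}")
print(f"rem= {r:X}")
```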


Title: Re: BSGS solver for cuda
Post by: NotATether on October 19, 2021, 05:34:38 AM
Agreed...really, can it be bumped up to a 160-bit store plus the 2 flag bits, easier than the 254-bit store? Then the program could at least cover up to the last 160 bit key for the puzzle/challenge transaction.

254 bits is easier to make than 160 bits because I can just update the structures from

int128_t

to int256_t

in all occurrences without having to make any other changes.


Title: Re: BSGS solver for cuda
Post by: Etar on October 19, 2021, 10:41:00 AM
This is the maximum that I can squeeze out of my 2080ti card:
16 pubkeys from JLP example solved in 1m 23s
Code:
NewFINDpubkey= (2375c86aa2a807fd50e4b1a2a65820244e704b8eabc8eb4dc0517393aff0c647, fad56264ae29d620205a68792091b64ae262bba359f8d013ce904d595e790ccf)
***************************
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:0000000000000000000000000000000000000000000000003388000000000001 823MKey/s x1073741824 2^29.69 x2^31=2^60.69
GPU#0 Cnt:0000000000000000000000000000000000000000000000006754000000000001 826MKey/s x1073741824 2^29.69 x2^31=2^60.69
***********GPU#0************
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
Pub: 55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9
****************************
Found in 5 seconds
GPU#0 job finished
Working time 00:01:23s
Total time 00:06:00s
Two hashtables are used:
the first is for the GPU, without xpoint position, only 32 bits of xpoint + the htsz size, 32+29 = 61 bits per xpoint in total
the second is for host usage, with xpoint position
Utilizes 9008 MB of GPU memory.
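
A note on reading the speed lines in the log: the exponents shown there are reproduced to within rounding if one assumes that 1 MKey means 2^20 keys and that each giant step is checked against 2^31 baby points (1073741824 = 2^30, doubled). Both are inferences from the numbers, not something stated in the post. A small Python sketch of that arithmetic:

```python
import math


def log2_rate(mkeys_per_s: float, unit: int = 2**20) -> float:
    """log2 of the giant-step rate, assuming 1 MKey = `unit` keys (an assumption)."""
    return math.log2(mkeys_per_s * unit)


BABY_BITS = 31  # 2^30 baby array entries, counted twice (taken from the log line)

for mkeys in (823, 826):
    giant = log2_rate(mkeys)
    print(f"{mkeys} MKey/s ~ 2^{giant:.2f} giant steps/s, "
          f"x 2^{BABY_BITS} = 2^{giant + BABY_BITS:.2f} keys/s")
```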


Title: Re: BSGS solver for cuda
Post by: ssxb on October 19, 2021, 12:02:39 PM
This is the maximum that I can squeeze out of my 2080ti card:
16 pubkeys from JLP example solved in 1m 23s
Code:
NewFINDpubkey= (2375c86aa2a807fd50e4b1a2a65820244e704b8eabc8eb4dc0517393aff0c647, fad56264ae29d620205a68792091b64ae262bba359f8d013ce904d595e790ccf)
***************************
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:0000000000000000000000000000000000000000000000003388000000000001 823MKey/s x1073741824 2^29.69 x2^31=2^60.69
GPU#0 Cnt:0000000000000000000000000000000000000000000000006754000000000001 826MKey/s x1073741824 2^29.69 x2^31=2^60.69
***********GPU#0************
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
Pub: 55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9
****************************
Found in 5 seconds
GPU#0 job finished
Working time 00:01:23s
Total time 00:06:00s
Used two hashtables:
first  for GPU without xpoint position, only xpoint 32bit + size htsz, totaly 32+29 = 61bit per xpoint
Second for host usage with xpoint position
Utilized 9008Mb of GPU memory.


I optimized the parameters, and the key

"59A3BFDAD718C9D3FAC7C187F1139F0815AC5D923910D516E186AFDA28B221DC994327554CED887AAE5D211A2407CDD025CFC3779ECB9C9D7F2F1A1DDF3E9FF8"

I solved in 17 seconds.

Well, can you please tune it for a parallel search for pubkeys? I understand the speed will drop but it is still worth a try... can you?


Title: Re: BSGS solver for cuda
Post by: Etar on October 19, 2021, 07:13:08 PM

well can you please tune it for parallel search for pubs , i undertand speed will drop but its still worth to try .. can you?

The BSGS algorithm is not intended to search for several public keys in parallel.
It is possible to make pseudo-parallelism (this means checking the keys sequentially at each giant step). But the speed will drop by a factor equal to the number of keys searched.
For example, if your speed with 1 public key is 1000 Mkeys/s, with 10 keys it will drop to 100 Mkeys/s, and with 1000 keys it will drop to 1 Mkeys/s :)
By the way, the search time for 16 keys will be exactly the same in either a sequential or a pseudo-parallel search.
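
The arithmetic behind that statement can be sketched in a few lines of Python. This is a simplified model with made-up function names, assuming the cost per giant step grows linearly with the number of keys and looking only at the time to sweep the whole range for every key:

```python
def per_key_speed(total_mkeys: float, n_keys: int) -> float:
    """Effective speed per key when every giant step is checked against n_keys keys."""
    return total_mkeys / n_keys


def full_sweep_seconds(range_size: float, keys_per_s: float, n_keys: int,
                       pseudo_parallel: bool) -> float:
    """Time to cover the whole range for all keys under the simplified model."""
    if pseudo_parallel:
        # One pass over the range, but n_keys times slower.
        return range_size / per_key_speed(keys_per_s, n_keys)
    # Sequential: n_keys passes over the range at full speed.
    return n_keys * (range_size / keys_per_s)


for n in (1, 10, 1000):
    print(f"{n:>4} keys -> {per_key_speed(1000, n):g} Mkeys/s per key")

seq = full_sweep_seconds(2.0**64, 1e18, 16, pseudo_parallel=False)
par = full_sweep_seconds(2.0**64, 1e18, 16, pseudo_parallel=True)
print(f"16 keys, sequential: {seq:.1f} s, pseudo-parallel: {par:.1f} s")
```

The difference between the two modes is only in which key gets found first, not in the total work.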


Title: Re: BSGS solver for cuda
Post by: lostrelic on October 19, 2021, 09:40:58 PM
This is the maximum that I can squeeze out of my 2080ti card:
16 pubkeys from JLP example solved in 1m 23s
Code:
NewFINDpubkey= (2375c86aa2a807fd50e4b1a2a65820244e704b8eabc8eb4dc0517393aff0c647, fad56264ae29d620205a68792091b64ae262bba359f8d013ce904d595e790ccf)
***************************
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:0000000000000000000000000000000000000000000000003388000000000001 823MKey/s x1073741824 2^29.69 x2^31=2^60.69
GPU#0 Cnt:0000000000000000000000000000000000000000000000006754000000000001 826MKey/s x1073741824 2^29.69 x2^31=2^60.69
***********GPU#0************
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
Pub: 55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9
****************************
Found in 5 seconds
GPU#0 job finished
Working time 00:01:23s
Total time 00:06:00s
Used two hashtables:
first  for GPU without xpoint position, only xpoint 32bit + size htsz, totaly 32+29 = 61bit per xpoint
Second for host usage with xpoint position
Utilized 9008Mb of GPU memory.

Hi Etar what settings have you used to get that, and could you recommend what to use for a 3080?
Thanks Relic


Title: Re: BSGS solver for cuda
Post by: ssxb on October 20, 2021, 03:57:51 AM

well can you please tune it for parallel search for pubs , i undertand speed will drop but its still worth to try .. can you?

BSGS algorithm  is not intended to search for public keys in parallel.
Possible to make pseudo-parallelism (this means finding the keys sequentially at each giant step). But the speed will drop in multiples of the number of search keys.
For ex. with search 1 public key your speed is 1000mkeys/s. if you setup 10 keys the speed will drop to 100mkey/s, with 1000keys speed drop to 1mkeys/s :)
By the way the search time for 16 keys will be exactly the same, either in a sequential search or in a pseudo-parallel.

Got it, but just as an example: if you use a divisor of 32 and load 32 keys, and the key is at position 1, lucky you. But if the key is at position 30, the program will be stuck on a full range scan for key 1 before it gets to the second one (my guess, I didn't test your program), perhaps after this century ;D

But one thing I noticed is that Alberto's keyhunt [updated recently] is way faster than BSGScuda [although they work in different ways]. I solved the 80 key in the blink of an eye, but it needs serious K and N optimization: with the wrong K and N you will never reach the goal. I am not sure whether you guys have had a chance to test Alberto's KEYHUNT (https://github.com/albertobsd/keyhunt); you will find it interesting.

But the dark fact is that keyhunt is a RAM-eating bug ;D so if you have less RAM (minimum 128 GB) there is no point comparing it with BSGScuda; in that case BSGScuda will perhaps do way better than keyhunt.


Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on October 20, 2021, 05:10:05 AM

well can you please tune it for parallel search for pubs , i undertand speed will drop but its still worth to try .. can you?

BSGS algorithm  is not intended to search for public keys in parallel.
Possible to make pseudo-parallelism (this means finding the keys sequentially at each giant step). But the speed will drop in multiples of the number of search keys.
For ex. with search 1 public key your speed is 1000mkeys/s. if you setup 10 keys the speed will drop to 100mkey/s, with 1000keys speed drop to 1mkeys/s :)
By the way the search time for 16 keys will be exactly the same, either in a sequential search or in a pseudo-parallel.

Got it but just as example if you do 32 divisor and load 32 keys , assume if key is on position 1~ lucky you. but if the key is on position 30 program will hang with full range scan for key 1 and than will be back to second (my guess ~ didn't test your program) one perhaps after this century ;D

but one thing that i noticed Alberto keyhunt [updated recently] is way too faster than BSGScuda [although both have different way]. i solved 80 key with blink of eye but that one need serious K and N optimization ~ do the wrong K and N you will never reach the goal. i am not sure if you guys have a chance to test that one Alberto KEYHUNT (https://github.com/albertobsd/keyhunt) you will find it interesting.

but dark fact is keyhunt is ram eating bug  ;D so if have less ram (minimum 128gb) no point to compare it with BSGScuda perhaps in that case BSGScuda will do way better than Keyhunt.
If you feel keyhunt is faster via your own tests, then divide that 120 pubkey up into 2^40 pubkeys and run keyhunt; maybe you find the key in 2 blinks of an eye...
and if you are going for this:
Quote
Got it but just as example if you do 32 divisor and load 32 keys , assume if key is on position 1~ lucky you
then just do your 32 divisor and let it search each pubkey for 1 minute; maybe lucky you.


Title: Re: BSGS solver for cuda
Post by: Etar on October 20, 2021, 05:30:12 AM
Hi Etar what settings have you used to get that, and could you recommend what to use for a 3080?
Thanks Relic
This version is not published yet (I will publish it today).
These are my settings for the 2080ti:
-t 512 -b 136 -p 480 -w 30 -htsz 28
It utilizes around 9200 MB of GPU memory (in total a 2080ti under Windows 10 has only 9240 MB of free memory).

P.S. v1.6.0 is already released.


Title: Re: BSGS solver for cuda
Post by: ssxb on October 20, 2021, 05:48:59 AM

well can you please tune it for parallel search for pubs , i undertand speed will drop but its still worth to try .. can you?

BSGS algorithm  is not intended to search for public keys in parallel.
Possible to make pseudo-parallelism (this means finding the keys sequentially at each giant step). But the speed will drop in multiples of the number of search keys.
For ex. with search 1 public key your speed is 1000mkeys/s. if you setup 10 keys the speed will drop to 100mkey/s, with 1000keys speed drop to 1mkeys/s :)
By the way the search time for 16 keys will be exactly the same, either in a sequential search or in a pseudo-parallel.

Got it but just as example if you do 32 divisor and load 32 keys , assume if key is on position 1~ lucky you. but if the key is on position 30 program will hang with full range scan for key 1 and than will be back to second (my guess ~ didn't test your program) one perhaps after this century ;D

but one thing that i noticed Alberto keyhunt [updated recently] is way too faster than BSGScuda [although both have different way]. i solved 80 key with blink of eye but that one need serious K and N optimization ~ do the wrong K and N you will never reach the goal. i am not sure if you guys have a chance to test that one Alberto KEYHUNT (https://github.com/albertobsd/keyhunt) you will find it interesting.

but dark fact is keyhunt is ram eating bug  ;D so if have less ram (minimum 128gb) no point to compare it with BSGScuda perhaps in that case BSGScuda will do way better than Keyhunt.
If you feel keyhunt is faster via your own tests, then divide that 120 pubkey up into 2^40 pubkeys and run keyhunt; maybe you find the key in 2 blinks of an eye...
and if you are going for this:
Quote
Got it but just as example if you do 32 divisor and load 32 keys , assume if key is on position 1~ lucky you
then just do your 32 divisor and let it search each pubkey for 1 minute; maybe lucky you.



you got big mouth but less sense and knowledge  ;D

i hate to tell you that grow up your knowledge & perhaps things will get more clear.

1 > 80 key not 80 keys [single key] [random mode with 4.7 Ekeys/sec] [4300000000000000000 keys/sec] [3BACAB37B62E0000 keys/sec][ whole 65 range in 1 sec]. now compare with bsgscuda  
        with reference key in range 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000:49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5effffffffffffffff.  let me guess 3080 took with full
        optimization around 17 second but keyhunt took just 1 second. even i have to reduce k and n value to reduce speed for this  ;D.
2 > do your research and than find how many keys you will get while doing 120 to 2^40 divisor [lol]. if you will load 2 keys, you will make keyhunt speed half and what about billion keys . speed will be just like your mind
       processing to understand my answer.
 


Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on October 20, 2021, 06:09:02 AM
Quote
you got big mouth but less sense and knowledge  Grin

i hate to tell you that grow up your knowledge & perhaps things will get more clear.

1 > 80 key not 80 keys [single key] [random mode with 4.7 Ekeys/sec] [4300000000000000000 keys/sec] [3BACAB37B62E0000 keys/sec][ whole 65 range in 1 sec]. now compare with bsgscuda 
        with reference key in range 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000:49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5effffffffffffffff.  let me guess 3080 took with full
        optimization around 17 second but keyhunt took just 1 second. even i have to reduce k and n value to reduce speed for this  Grin.
2 > do your research and than find how many keys you will get while doing 120 to 2^40 divisor [lol]. if you will load 2 keys, you will make keyhunt speed half and what about billion keys . speed will be just like your mind
       processing to understand my answer.
Your English reading or comprehension is less sense and knowledge.

I never said anything about 80 keys...you are saying you found 80 key, I took that as a single key in an 80 bit range, not 80 keys because you did not pluralize the word key. So with that, I merely said instead of trying to get someone to reprogram BSGS Cuda for multi key, run keyhunt, since it already supports multi key and if you think it is faster, then break up 120 key into however many keys you want to, 2^5, 2^20, 2^40, or however many you want to and let that program eat.  I said 2^40 specifically because you said an 80 key in a blink of an eye; so 2^120/2^40 = 2^80; if you found one 80 key in a blink of an eye, maybe you find the 120 key in 80 bit range in 2 blinks of an eye.

BSGS Cuda, can find 65 bit key in less than a second, it all depends on your hardware.

you say
Quote
if you will load 2 keys, you will make keyhunt speed half
the same will happen to BSGS Cuda; so I am not sure what your point is really.



Title: Re: BSGS solver for cuda
Post by: Etar on October 20, 2021, 06:19:11 AM

1 > 80 key not 80 keys [single key] [random mode with 4.7 Ekeys/sec] [4300000000000000000 keys/sec] [3BACAB37B62E0000 keys/sec][ whole 65 range in 1 sec]. now compare with bsgscuda  
        with reference key in range 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000:49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5effffffffffffffff.  let me guess 3080 took with full
        optimization around 17 second but keyhunt took just 1 second. even i have to reduce k and n value to reduce speed for this  ;D.
2 > do your research and than find how many keys you will get while doing 120 to 2^40 divisor [lol]. if you will load 2 keys, you will make keyhunt speed half and what about billion keys . speed will be just like your mind
       processing to understand my answer.
 
4300000000000000000 is 2^61.89, so the whole 65 range (I think you mean puzzle #65, with a range of 2^64) needs 4.28 seconds.
I don't have a 3080 card, but I think the speed will be around 1400 Mkeys x BabyArraySize.
Windows 10 eats 20% of GPU memory, so a 3080 should have 8192 MB free, so we can use -w 30.
In total: 1400 Mkeys = 2^30.38, baby array x2 = 2^31, so full performance = 2^61.38, and checking the full 2^64 needs 6.14 s.
Only Kangaroo can solve keys faster than BSGS or keyhunt or whatever.
BSGS cuda was created only because I didn't find a BSGS for GPU (maybe it is a useless app, I don't know).
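
The estimates above can be rechecked with a short Python sketch. The 1400 Mkeys/s figure for a 3080 is the guess from this post, not a measurement, and Mkeys is taken as 10^6 here because that is what reproduces 2^30.38.

```python
import math


def sweep_seconds(range_bits: int, keys_per_second: float) -> float:
    """Time to check every key in a 2^range_bits range at the given rate."""
    return 2.0**range_bits / keys_per_second


# Claimed keyhunt rate.
keyhunt_rate = 4_300_000_000_000_000_000
print(f"keyhunt: 2^{math.log2(keyhunt_rate):.2f} keys/s, "
      f"2^64 in {sweep_seconds(64, keyhunt_rate):.2f} s")

# Estimated BSGS rate: giant steps per second times 2^31 baby points.
giant_rate = 1400e6          # assumed 3080 speed (a guess, see above)
bsgs_rate = giant_rate * 2**31
print(f"bsgs: 2^{math.log2(giant_rate):.2f} x 2^31 = 2^{math.log2(bsgs_rate):.2f} keys/s, "
      f"2^64 in {sweep_seconds(64, bsgs_rate):.2f} s")
```

The printed values agree with the figures in the post to within rounding in the second decimal.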


Title: Re: BSGS solver for cuda
Post by: ssxb on October 20, 2021, 07:38:15 AM

1 > 80 key not 80 keys [single key] [random mode with 4.7 Ekeys/sec] [4300000000000000000 keys/sec] [3BACAB37B62E0000 keys/sec][ whole 65 range in 1 sec]. now compare with bsgscuda  
        with reference key in range 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000:49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5effffffffffffffff.  let me guess 3080 took with full
        optimization around 17 second but keyhunt took just 1 second. even i have to reduce k and n value to reduce speed for this  ;D.
2 > do your research and than find how many keys you will get while doing 120 to 2^40 divisor [lol]. if you will load 2 keys, you will make keyhunt speed half and what about billion keys . speed will be just like your mind
       processing to understand my answer.
 
4300000000000000000 it is 2^61.89. so whole 65range( i think you mean puzzle #65 with range 2^64bit) need 4.28 seconds
I don`t have 3080 card but i think speed will be around 1400Mkeys x BabyArraySize
windows10 eat 20% of GPU memory so 3080 should have 8192 free memory, so we can use -w 30
Totaly 1400mkeys = 2^30.38 and baby array x2 = 2^31 and full perfomance = 2^61.38 and to check full 2^64 need 6.14s
Only Kangaroo can solve keys faster then bsgs or keyhunt or whatever.
Bsgs cuda created only because i didn`t find bsgs for gpu (maybe it useless app i don`t know)

I am not arguing with your math, but if you have the time and hardware, please try to do some research on keyhunt [CPU + memory]. By the way, I appreciate your CUDA programming skills, it is really impressive, and I hope some day you will enhance it further to take on 120. I also know one guy who is running it at 9+ Ekeys/sec [yoyodapro].

But with a divisor only 1 key out of 1073741824 is the right one if you want to reach 90 bit.
I loaded all the keys into keyhunt and I am trying my luck, but on the other hand I was hoping we could figure out how to load multiple keys with cudabsgs, so I could keep my 3080 busy with that, as it is just sitting idle now.



Title: Re: BSGS solver for cuda
Post by: ssxb on October 20, 2021, 07:56:11 AM
Quote
you got big mouth but less sense and knowledge  Grin

i hate to tell you that grow up your knowledge & perhaps things will get more clear.

1 > 80 key not 80 keys [single key] [random mode with 4.7 Ekeys/sec] [4300000000000000000 keys/sec] [3BACAB37B62E0000 keys/sec][ whole 65 range in 1 sec]. now compare with bsgscuda 
        with reference key in range 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000:49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5effffffffffffffff.  let me guess 3080 took with full
        optimization around 17 second but keyhunt took just 1 second. even i have to reduce k and n value to reduce speed for this  Grin.
2 > do your research and than find how many keys you will get while doing 120 to 2^40 divisor [lol]. if you will load 2 keys, you will make keyhunt speed half and what about billion keys . speed will be just like your mind
       processing to understand my answer.
Your English reading or comprehension is less sense and knowledge.

I never said anything about 80 keys...you are saying you found 80 key, I took that as a single key in an 80 bit range, not 80 keys because you did not pluralize the word key. So with that, I merely said instead of trying to get someone to reprogram BSGS Cuda for multi key, run keyhunt, since it already supports multi key and if you think it is faster, then break up 120 key into however many keys you want to, 2^5, 2^20, 2^40, or however many you want to and let that program eat.  I said 2^40 specifically because you said an 80 key in a blink of an eye; so 2^120/2^40 = 2^80; if you found one 80 key in a blink of an eye, maybe you find the 120 key in 80 bit range in 2 blinks of an eye.

BSGS Cuda, can find 65 bit key in less than a second, it all depends on your hardware.

you say
Quote
if you will load 2 keys, you will make keyhunt speed half
the same will happen to BSGS Cuda; so I am not sure what your point is really.



maybe you find the 120 key in 80 bit range in 2 blinks of an eye.   


OK, learn the basics of the divisor, bro: if you divide by 32, only one key will be from the range 5 bits lower, at an unknown position; all the others will be from upper bit ranges, at exactly the same distance from their reference values.

Now if you do 2^40, you will have 1208925819614629174706176 reference values in the 256 bit range, and only one of the keys will be in the 40bit range; all the other keys will be from upper bits, at an exact distance from their respective reference values.
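
The divisor idea can be illustrated on the private-key side with plain modular arithmetic, without any curve code. In this Python sketch N is the secp256k1 group order; the example key k and the function name are made up for illustration. For a divisor d, the d candidate points (P - i*G)/d have scalars (k - i)/d mod N, and exactly one of them (i = k mod d) lands in the range that is log2(d) bits lower.

```python
# Order of the secp256k1 group.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def divided_scalars(k: int, d: int) -> list:
    """Private keys of the d points (P - i*G) / d for i = 0..d-1, where P = k*G."""
    inv_d = pow(d, -1, N)  # modular inverse of d (Python 3.8+)
    return [((k - i) * inv_d) % N for i in range(d)]


k = (1 << 64) + 123456789   # arbitrary 65-bit example key, not a real puzzle key
d = 32                      # divisor of 32 = 2^5

scalars = divided_scalars(k, d)
limit_bits = k.bit_length() - 5

# Keep only the candidates that fall into the range 5 bits lower.
small = [(i, s) for i, s in enumerate(scalars) if s.bit_length() <= limit_bits]
for i, s in small:
    print(f"offset {i}: {s:x} ({s.bit_length()} bits)")
print(f"{len(scalars) - len(small)} other candidates are larger than {limit_bits} bits")
```

Since the searcher does not know k mod d, all d candidate public keys have to be loaded, which is where the multi-key slowdown discussed in this thread comes in.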

Now how the hell can you work with such a large number of keys? And the line where you said to go for 2^40 was an aggressive comment, made without knowing my intention.

My intention is this: I already did a divisor of 32, loaded the keys into keyhunt and am running it right now, and I know how much speed and power I am getting from that. But I just don't know the power of BSGScuda if I load 32 keys in parallel into it. So I asked Etar whether he can make that possible; who knows, BSGS may outperform keyhunt.

Now to the point: in a post above Etar said that his program is good up to 80 bit and that above that one should use JLP's Kangaroo, so I was comparing it with Alberto's BSGS. I found that the CPU-based BSGS is more powerful than a 3080 if you have well-specified hardware, but at the same time BSGScuda is better than keyhunt [CPU] if you don't have enough CPU power and memory.


Title: Re: BSGS solver for cuda
Post by: ssxb on October 20, 2021, 08:05:42 AM
@Etar  ???

I seriously believe there must be some way to use the power of the GPU cores and process the whole BSGS inside computer memory. Perhaps this will give some crazy power which has never been discovered, or there will be a bottleneck, but you can only confirm that once you build such a program.

Assume you have the power of keyhunt and then you put the bloom on an SSD [7000+ read/write speed, gen4]:

RAM      bpfile elements   bpfile size   bloom size
8 GB     1000000000        32 GB         5.02 GB
32 GB    5000000000        160 GB        25.11 GB
128 GB   22000000000       704 GB        110.47 GB
500 GB   90000000000       2.9 TB        451.92 GB

Based on the table above you can increase speed if you utilize both bloom and bp: https://github.com/iceland2k14/bsgs (https://github.com/iceland2k14/bsgs)
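
The sizes in the table can be reproduced with a short Python sketch under two assumptions that are inferred from the numbers rather than stated anywhere in this thread: 32 bytes per bpfile element, and a bloom filter sized with the standard optimal formula for a false-positive rate of 1e-9, reported in GiB.

```python
import math

GIB = 2**30


def bp_file_bytes(elements: int, bytes_per_element: int = 32) -> int:
    """bpfile size, assuming 32 bytes per stored element (an assumption)."""
    return elements * bytes_per_element


def bloom_bytes(elements: int, false_positive_rate: float = 1e-9) -> float:
    """Optimal bloom filter size: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -elements * math.log(false_positive_rate) / (math.log(2) ** 2)
    return bits / 8


for n in (1_000_000_000, 5_000_000_000, 22_000_000_000, 90_000_000_000):
    print(f"{n:>12} elements: bpfile {bp_file_bytes(n) / 1e9:,.0f} GB, "
          f"bloom {bloom_bytes(n) / GIB:.2f} GiB")
```

Under these assumptions the bloom filter costs about 43 bits per element, so it is roughly six times smaller than the bpfile it guards.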

So CPU cores are less powerful than CUDA, and I was thinking [not sure whether it is possible or not] that if we load all of the bp into RAM and use some bloom in GPU memory, perhaps there will be some dramatic speed boost.



Title: Re: BSGS solver for cuda
Post by: bigvito19 on October 20, 2021, 08:48:29 AM
What's the link to the divisor script?

and how many keys can I generate with the divisor?


Title: Re: BSGS solver for cuda
Post by: NotATether on October 20, 2021, 09:55:44 AM
What's the link to the divisor script?

and how many keys can I generate with the divisor?

If you mean the one I made, it's in the Kangaroo thread, anywhere from pages 90 to 100 I think.



I think we can cut the number of baby steps made if we take into account that the correct baby step amount is going to be random-looking (in other words, no long 0 or 1 sequences).

Or at least make the baby steps take a higher bit count, decreasing the number of giant steps.

I'm thinking that we can find the numbers represented by these random bits and then calculate their multiples to use as an incrementor... not perfect but it does the trick I guess.

E.g.

5 is 101, 10 is 1010, 15 is 1111, 20 is 10100, 25 is 11001, 30 is 11110, ... etc.

Special care would need to be taken to choose a number whose multiples don't make long sequences of bits, like 15: 3*5

I don't think that this randomness has any correlation to primality of numbers (or inverse correlation to it).


Title: Re: BSGS solver for cuda
Post by: bigvito19 on October 20, 2021, 01:28:20 PM
I'm testing with the divisor keys on a smaller range, but it's not solving the key with keyhunt. Does it work the same with xpoint mode?


Title: Re: BSGS solver for cuda
Post by: ssxb on October 20, 2021, 01:46:08 PM
I'm testing with the divisor keys on a smaller range, but it's not solving the key with keyhunt. Does it work the same with xpoint mode?

You need to adjust K and N: a smaller range will not be solved if K and N are too large for the size of the range, or if the number of keys is more or less than your hardware can handle.

Remember that tweaking really is needed: keep K and N matched to the power of your hardware and to the number of keys you load into the software, and run the test again and again.


Title: Re: BSGS solver for cuda
Post by: Etar on October 20, 2021, 03:54:04 PM
With the latest PTX optimisation (I had forgotten about symmetry in batch point addition),
it solves the 16 pubkeys from JLP in 58s
Code:
...
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:0000000000000000000000000000000000000000000000004673f00000000001 1121MKey/s x1073741824 2^30.13 x2^31=2^61.13
***********GPU#0************
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
Pub: 55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:58s
Total time 00:06:33s
GPU#0 thread finished
cuda finished ok

Press Enter to exit
This seems to be the maximum that I can achieve on a single 2080 Ti.
Of course, JLP would probably have done it even faster :)
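
The exponents in the rate line of the log above can be checked by hand. A minimal sketch, assuming that "MKey" in the program's output means 2^20 giant steps (an inference from the printed exponents, not something stated in the thread) and that each giant step is credited with 2^31 keys; `log_rate` is a hypothetical helper name:

```python
import math

def log_rate(mkeys_per_s, width_exp):
    """Return (log2 of giant steps per second, log2 of keys covered per second).

    Assumption: one 'MKey' is 2**20 giant steps, which reproduces the
    exponents printed in the log.
    """
    steps = math.log2(mkeys_per_s * 2**20)
    return round(steps, 2), round(steps + width_exp, 2)

# Rate line from the log: 1121MKey/s ... 2^30.13 x2^31=2^61.13
print(log_rate(1121, 31))
```

With a decimal "M" (10^6) the first exponent would come out near 30.06 instead, which is why the 2^20 reading is assumed here.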


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 21, 2021, 08:58:37 AM
With the latest PTX optimisation (I had forgotten about symmetry in batch point addition),
it solves the 16 pubkeys from JLP in 58s
Code:
...
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:0000000000000000000000000000000000000000000000004673f00000000001 1121MKey/s x1073741824 2^30.13 x2^31=2^61.13
***********GPU#0************
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
Pub: 55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:58s
Total time 00:06:33s
GPU#0 thread finished
cuda finished ok

Press Enter to exit
This seems to be the maximum that I can achieve on a single 2080 Ti.
Of course, JLP would probably have done it even faster :)



Impressive, Etar. I have a question. Let's say you have 1M keys in a file, you load it into BSGScuda and set the scan range only to 64. Once the GPU has finished scanning the whole 64 range for key1, will it abandon the search for key1 and move on to key2?

Is your program already doing that, or will you implement it?


Title: Re: BSGS solver for cuda
Post by: Etar on October 21, 2021, 10:11:16 AM
Impressive, Etar. I have a question. Let's say you have 1M keys in a file, you load it into BSGScuda and set the scan range only to 64. Once the GPU has finished scanning the whole 64 range for key1, will it abandon the search for key1 and move on to key2?

Is your program already doing that, or will you implement it?
Use -pk to set the start of the range and -pke to set the end of the range. If the pubkey is not found in this range, the search switches to the next pubkey.


Title: Re: BSGS solver for cuda
Post by: lostrelic on October 21, 2021, 10:22:37 AM
Hi Etar thanks for your continuing support for this program.
Quick question: the fastest I get is 2^60; if I try to get 2^61 it sticks on "add baby points to hashtable". I’ve got a 3080, 16gb ram and a 500gb ssd. Any ideas on settings to try, or how long should I wait for it to load?
Thanks Relic


Title: Re: BSGS solver for cuda
Post by: Etar on October 21, 2021, 11:46:14 AM
Hi Etar thanks for your continuing support for this program.
Quick question: the fastest I get is 2^60; if I try to get 2^61 it sticks on "add baby points to hashtable". I’ve got a 3080, 16gb ram and a 500gb ssd. Any ideas on settings to try, or how long should I wait for it to load?
Thanks Relic
The screenshot I posted above is from the latest version, which is not yet published (still being tested).
By the way, v1.6.0 should work fine for you, with slightly lower performance; with v1.6.0 the 2080 Ti speed is 826MKey/s x1073741824 2^29.69 x2^31=2^60.69
If you have 16 GB of GPU RAM, then try -w 31 and -htsz 29
In any case, a 3080 should perform better than a 2080 Ti even with the same baby array size that I use; try -t 512 -b 136 -p 512 -w 30 -htsz 28

P.S. Maybe it sticks on "add baby points to hashtable" because your PC has too little memory to generate the HT in RAM. I generated the HT for -w 30 on a PC that has 32 GB of RAM.
For -w 31 you need 64 GB of RAM to create all arrays.
Launching the solver with already generated arrays needs less memory.


Title: Re: BSGS solver for cuda
Post by: Etar on October 21, 2021, 12:47:06 PM
STOP using BSGScuda: I found a bug that causes some public keys not to be found. I can't say yet in which version this bug appeared, so don't use the program until I have solved the issue.


Title: Re: BSGS solver for cuda
Post by: studyroom1 on October 21, 2021, 01:58:10 PM
STOP using BSGScuda: I found a bug that causes some public keys not to be found. I can't say yet in which version this bug appeared, so don't use the program until I have solved the issue.

Oh, when can we see the next update? :(


Title: Re: BSGS solver for cuda
Post by: Etar on October 22, 2021, 12:55:03 PM
The problem was a double giant step.
Now I have removed the double giant step and in my opinion everything works as it should.
I ran several tests with different small -w and -p options on a file of 1024 pubkeys, and all keys were found.
True, the total indicator is now 2 times lower, because the step is normal.
You can run all sorts of tests with keys and check. If there are any bugs, let me know.
Release 1.7.0 is available on GitHub.


Title: Re: BSGS solver for cuda
Post by: _Counselor on October 22, 2021, 02:09:15 PM
The problem was a double giant step.
Now I have removed the double giant step and in my opinion everything works as it should.
I ran several tests with different small -w and -p options on a file of 1024 pubkeys, and all keys were found.
True, the total indicator is now 2 times lower, because the step is normal.
You can run all sorts of tests with keys and check. If there are any bugs, let me know.
Release 1.7.0 is available on GitHub.
What kind of problem was it?
I think you exploited symmetry to double the size of the giant steps?
Why did it not find some keys?


Title: Re: BSGS solver for cuda
Post by: math09183 on October 23, 2021, 06:53:32 AM
STOP using BSGScuda: I found a bug that causes some public keys not to be found. I can't say yet in which version this bug appeared, so don't use the program until I have solved the issue.

LOL  :D

That's what happens when you use ad hoc written code, without proper testing. I guess you still did not prepare any set of unit tests to prove your code works?
Good luck for the future releases, maybe somewhere around version 20 it will be stable  ;D


Title: Re: BSGS solver for cuda
Post by: Etar on October 23, 2021, 07:01:46 AM

LOL  :D

That's what happens when you use ad hoc written code, without proper testing. I guess you still did not prepare any set of unit tests to prove your code works?
Good luck for the future releases, maybe somewhere around version 20 it will be stable  ;D

Most code has bugs. Are you a great programmer who does everything without mistakes?
I found this bug and solved it, what's your problem?


Title: Re: BSGS solver for cuda
Post by: Etar on October 23, 2021, 07:02:51 AM
What kind of problem was it?
I think you exploited symmetry to double the size of the giant steps?
Why did it not find some keys?

If we talk about the doubled GS (giant step):
For example, the options -p 8 -w 4 mean a baby array of 2^4 = 16
Each (doubled) giant step is 16*2 = 32
Let's say we should find the pubkey with privkey = 32
The program subtracts the GS from the public key and looks in the baby array to check for a match.
But if you subtract 32-32 then you get 64, and this value is not present in the baby array.
But if we use the usual GS,
32-16 = 16, and 16 is present in the baby array: pubkey solved.

So with the doubled GS, every (baby array size)*2-th key is not found.
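
The gap described above can be reproduced with a small integer model, where the integer d stands for the point d*G. This is only a sketch of the idea and not the program's code: `finds_key` is a hypothetical name, and the model assumes the doubled step is combined with a +/- (symmetry) lookup in the baby array. In this model the subtraction 32-32 leaves 0, which has no entry in the baby array either.

```python
def finds_key(k, m, giant, use_symmetry):
    """Integer model of the search: the integer d stands for the point d*G."""
    baby = set(range(1, m + 1))               # baby array 1*G .. m*G
    a = 0
    while a * giant <= k + m:                 # walk the giant steps past k
        d = k - a * giant                     # models pub - a*GS
        if d in baby or (use_symmetry and -d in baby):
            return True
        a += 1
    return False

m = 16                                        # -w 4 gives a baby array of 2^4
# Doubled giant step (2*m) with symmetric lookup: which keys are never hit?
missed = [k for k in range(1, 200) if not finds_key(k, m, 2 * m, True)]
print(missed)
# Usual giant step (m) without symmetry: every key in the range is hit.
print(all(finds_key(k, m, m, False) for k in range(1, 200)))
```

The keys reported as missed are exactly the multiples of (baby array size)*2, which matches the behaviour described in the post.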


Title: Re: BSGS solver for cuda
Post by: math09183 on October 23, 2021, 03:17:08 PM

LOL  :D

That's what happens when you use ad hoc written code, without proper testing. I guess you still did not prepare any set of unit tests to prove your code works?
Good luck for the future releases, maybe somewhere around version 20 it will be stable  ;D

Most code has bugs. Are you a great programmer who does everything without mistakes?
I found this bug and solved it, what's your problem?

Relax, haters gonna hate  ;D

Anyway, good job, I appreciate your work. Sh*t happens.


Title: Re: BSGS solver for cuda
Post by: Etar on October 23, 2021, 05:13:51 PM
Code:
KEY[15]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e2452dd26bc983cd5
    Pub: 02b1985389d8ab680dedd67bba7ca781d1a9e6e5974aad2e70518125bad5783eb5
****************************
Found in 1 seconds
GPU#0 job finished
Working time 00:00:55s
FINDpubkey= (55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b, 3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9)
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:00000000000000000000000000000000000000000000000045ba000000000001 1110MKey/s x1073741824 2^30.12 x2^31=2^61.12
***********GPU#0************
KEY[16]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
    Pub: 0355b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:59s
Double giant step defeated.
v1.7.1 released with maximum performance.


Title: Re: BSGS solver for cuda
Post by: mamuu on October 23, 2021, 07:13:47 PM
Code:
KEY[15]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e2452dd26bc983cd5
    Pub: 02b1985389d8ab680dedd67bba7ca781d1a9e6e5974aad2e70518125bad5783eb5
****************************
Found in 1 seconds
GPU#0 job finished
Working time 00:00:55s
FINDpubkey= (55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b, 3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9)
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:00000000000000000000000000000000000000000000000045ba000000000001 1110MKey/s x1073741824 2^30.12 x2^31=2^61.12
***********GPU#0************
KEY[16]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
    Pub: 0355b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:59s
Double giant step defeated.
v1.7.1 released with maximum performance.

Hi,
Please , can you write a little tutorial on usage?


Title: Re: BSGS solver for cuda
Post by: Etar on October 23, 2021, 07:16:04 PM
Hi,
Please , can you write a little tutorial on usage?
Did you look at readme.md on GitHub?
First of all, set the -t parameter to 256 or 512
next, set -b equal to the SM count of your card
next, set -p, starting from 128
then set -w as high as your GPU memory allows, and check the -htsz param
-w 31 -htsz 29 needs around 64GB of RAM to generate all arrays
-w 30 -htsz 28 needs around 32GB of RAM to generate all arrays
-w 29 -htsz 28
-w 28 -htsz 27
-w 27 -htsz 25
If you have free GPU memory left, you can increase -p, -b or -t (all params are multiples of 2)


Title: Re: BSGS solver for cuda
Post by: ssxb on October 24, 2021, 03:30:22 AM
Hi,
Please , can you write a little tutorial on usage?
Did you look at readme.md on GitHub?
First of all, set the -t parameter to 256 or 512
next, set -b equal to the SM count of your card
next, set -p, starting from 128
then set -w as high as your GPU memory allows, and check the -htsz param
-w 31 -htsz 29 needs around 64GB of RAM to generate all arrays
-w 30 -htsz 28 needs around 32GB of RAM to generate all arrays
-w 29 -htsz 28
-w 28 -htsz 27
-w 27 -htsz 25
If you have free GPU memory left, you can increase -p, -b or -t (all params are multiples of 2)


Great work on your side, much appreciated.

Just a quick question: do you have any plans to enhance it for 120 bits or more, so that it performs better than JLK?


Title: Re: BSGS solver for cuda
Post by: Etar on October 24, 2021, 07:55:06 AM

Just a quick question: do you have any plans to enhance it for 120 bits or more, so that it performs better than JLK?


BSGS will never be faster than Kangaroo on big ranges.
Example: puzzle #80, width 79 bits
Kangaroo needs on average 2^40.5 operations to solve the key
A 2080 Ti can search for DPs at 1500Mkey/s = 2^30.48
So you need on average 2^10 seconds to find the key.

BSGScuda can search 2^61/s, so 2^79 / 2^61 = 2^18 seconds to check the whole 79-bit range. Of course the key can be close to the beginning and be found very fast, but that is not guaranteed.
We can divide the pubkey into 64 pubkeys and try to find one of them in the range 1..3ffffffffffffffffff. Fortunately, the key we need is the first one on the list, but that was just luck.
By the way, the privkey will be 3a869719b73046d6b46 = 2^73.87
so we need 2^73.87 / 2^61/s = 2^12.87 seconds to find the key. That is about 7 times longer than Kangaroo needs.
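
The estimate above is plain arithmetic on base-2 logarithms, and a few lines of Python can reproduce it. This is only a sketch of the post's own numbers: the variable names are illustrative, and the rates (1500Mkey/s for Kangaroo, 2^61 keys/s for BSGScuda) are the ones quoted in the post, not new measurements.

```python
import math

kangaroo_ops = 40.5                      # log2 of average Kangaroo operations (from the post)
kangaroo_rate = math.log2(1500e6)        # 1500Mkey/s, about 2^30.48
kangaroo_secs = kangaroo_ops - kangaroo_rate

bsgs_rate = 61                           # log2 of keys covered per second by BSGScuda
full_range_secs = 79 - bsgs_rate         # log2 seconds for the whole 79-bit range

privkey = 0x3a869719b73046d6b46          # key of the divided pubkey, about 2^73.87
divided_secs = math.log2(privkey) - bsgs_rate

ratio = 2 ** (divided_secs - kangaroo_secs)
print(f"Kangaroo 2^{kangaroo_secs:.2f} s, BSGS full range 2^{full_range_secs} s, "
      f"BSGS divided key 2^{divided_secs:.2f} s, ratio {ratio:.1f}x")
```

The ratio comes out a little above 7, in line with the figure given in the post.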


Title: Re: BSGS solver for cuda
Post by: arulbero on October 24, 2021, 04:32:04 PM
Do you use the negation map to speed up the algorithm ?  --> pag. 8-9  https://eprint.iacr.org/2015/605.pdf (https://eprint.iacr.org/2015/605.pdf)

You need to compute only:

sqrt(n) / 2  baby steps : for example, n = 2^60  -> 2^29 baby steps

sqrt(n)  giant steps : for example, n = 2^60  -> 2^30 giant steps

It is like shifting the public key by 2^59 steps and searching in the interval [1,..., 2^59] instead of [1,...., 2^60], exploiting the symmetry.

Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p, at the cost of 1 inversion and 3*(k/2 - 1) multiplications mod p.


Title: Re: BSGS solver for cuda
Post by: Etar on October 24, 2021, 05:02:06 PM
Do you use the negation map to speed up the algorithm ?  --> pag. 8-9  https://eprint.iacr.org/2015/605.pdf (https://eprint.iacr.org/2015/605.pdf)

You need to compute only:

sqrt(n) / 2  baby steps : for example, n = 2^60  -> 2^29 baby steps

sqrt(n)  giant steps : for example, n = 2^60  -> 2^30 giant steps

It is like shifting the public key by 2^59 steps and searching in the interval [1,..., 2^59] instead of [1,...., 2^60], exploiting the symmetry.

Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p, at the cost of 1 inversion and 3*(k/2 - 1) multiplications mod p.

No, I don't use it; I didn't even know about this.
I will try to understand this tweak, thanks.


Title: Re: BSGS solver for cuda
Post by: arulbero on October 24, 2021, 05:16:14 PM
As you already explained a few days ago, cuda BSGS searches the keys one after another, not in parallel. I know that, I read carefully ;). With my previous post I meant that even if it ran in parallel it would slow down as described. So read my post as hypothetical: if someone modified it to process in parallel, I would expect that behaviour.

BSGS works in this way:

suppose we know that P = k*G, and the private key k is in [1,...., 2^60] range

1) precompute 2^29 baby steps (they are simple public keys): 1*G, 2*G, 3*G, ...., 2^29 * G

2) split the public key P into many other public keys (they are called giant steps, 2^30 public keys):
P, P - 1*(2^30*G), P - 2*(2^30*G), P - 3*(2^30*G), ..., P - (2^30 - 1)*(2^30*G)

3) for each giant step, you check whether it is equal to +-b*G for some b in [1, ..., 2^29] (that is, whether it matches a 'baby step' public key up to sign)

4) if    P - a*(2^30*G) = +-b*G   then  P = (a*2^30 +- b)*G   then the private key is k = a*2^30 +- b

If you want to search 2 public keys, P1 and P2, you can use the same baby steps, but you need to generate 2^31 giant steps instead of 2^30.
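
The four steps above can be sketched on the real curve with a deliberately tiny range. This is an illustration only, in pure Python (3.8+ for `pow(x, -1, p)`), and not how BSGS-cuda is implemented: the names `add`, `mul` and `bsgs` are hypothetical, the baby table is a plain dict keyed by x coordinate, and the extra check for the point at infinity (offset b = 0) is an addition of this sketch.

```python
# Toy BSGS on secp256k1 using the negation map (tiny range, pure Python).
FIELD_P = 2**256 - 2**32 - 977                # secp256k1 field prime
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(A, B):
    """Affine point addition; None is the point at infinity."""
    if A is None:
        return B
    if B is None:
        return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % FIELD_P == 0:
        return None                           # A + (-A)
    if A == B:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, FIELD_P) % FIELD_P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % FIELD_P, -1, FIELD_P) % FIELD_P
    x3 = (lam * lam - x1 - x2) % FIELD_P
    return x3, (lam * (x1 - x3) - y1) % FIELD_P

def mul(k, A):
    """Double-and-add scalar multiplication."""
    R = None
    while k:
        if k & 1:
            R = add(R, A)
        A = add(A, A)
        k >>= 1
    return R

def bsgs(pub, m, giants):
    """Find k in [1, 2*m*giants] with pub == k*G, storing only m baby steps."""
    baby, Q = {}, None
    for b in range(1, m + 1):                 # baby steps 1*G .. m*G, keyed by x
        Q = add(Q, G)
        baby[Q[0]] = (b, Q[1])
    sx, sy = mul(2 * m, G)
    step = (sx, -sy % FIELD_P)                # -(2m*G), the giant stride
    Q = pub
    for a in range(giants + 1):               # here Q == pub - a*(2m*G)
        if Q is None:                         # pub is an exact multiple of 2m*G
            return a * 2 * m
        hit = baby.get(Q[0])
        if hit:                               # same x: Q is +b*G or -b*G
            b, y = hit
            return a * 2 * m + (b if y == Q[1] else -b)
        Q = add(Q, step)
    return None

print(bsgs(mul(5100, G), 64, 128))            # 64 baby steps cover keys up to 16384
```

Because -b*G has the same x coordinate as b*G, one table of m baby steps answers for 2m offsets, which is the saving the negation map provides.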


Title: Re: BSGS solver for cuda
Post by: arulbero on October 24, 2021, 05:18:43 PM
Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p , at the cost of 1 inversions and 3*(k/2 - 1) multiplications mod p.

No, I don't use it; I didn't even know about this.
I will try to understand this tweak, thanks.

I mean: how do you compute a batch of 'consecutive' keys ?

Like P, P+G, P+2G, P+3G, P+4G, P+5G, ...  ?


Title: Re: BSGS solver for cuda
Post by: Etar on October 24, 2021, 05:31:37 PM
Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p , at the cost of 1 inversions and 3*(k/2 - 1) multiplications mod p.

No, I don't use it; I didn't even know about this.
I will try to understand this tweak, thanks.

I mean: how do you compute a batch of 'consecutive' keys ?

Like P, P+G, P+2G, P+3G, P+4G, P+5G, ...  ?
With group inversion: calculate the inverses for the whole batch and use them in the additions. Sorry if I misunderstood the question.
In the same way as in bitcrack https://github.com/brichard19/BitCrack/blob/6bf8059ef075eb1622298395866b0bd02375e1d9/cudaMath/secp256k1.cuh#L642
and then https://github.com/brichard19/BitCrack/blob/6bf8059ef075eb1622298395866b0bd02375e1d9/cudaMath/secp256k1.cuh#L656


Title: Re: BSGS solver for cuda
Post by: arulbero on October 24, 2021, 05:40:09 PM
Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p , at the cost of 1 inversions and 3*(k/2 - 1) multiplications mod p.

No, I don't use it; I didn't even know about this.
I will try to understand this tweak, thanks.

I mean: how do you compute a batch of 'consecutive' keys ?

Like P, P+G, P+2G, P+3G, P+4G, P+5G, ...  ?
With group inversion: calculate the inverses for the whole batch and use them in the additions. Sorry if I misunderstood the question.

Ok.  
If you have a batch of 100 points, you don't need to compute 100 inversions but only 50 inversions.

If you have to compute A + B:

https://i.imgur.com/jbMdLFE.jpg


If you have to compute A - B, since -B = (xB, p-yB)

1/(xB-xA) is the same as in A + B.
Because  for example P+2G and P-2G use the same inverse.
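
The shared inverse combines with Montgomery's batch inversion trick, where one modular inversion plus roughly three multiplications per element replaces one inversion per element. A minimal sketch in Python (3.8+ for `pow(x, -1, p)`); `batch_inverse` is an illustrative name, not a function taken from BSGS-cuda or BitCrack:

```python
def batch_inverse(xs, p):
    """Invert every element of xs modulo p using a single modular inversion."""
    prefix, acc = [], 1
    for x in xs:                      # prefix[i] = x0 * x1 * ... * x(i-1) mod p
        prefix.append(acc)
        acc = acc * x % p
    inv = pow(acc, -1, p)             # the only inversion: 1 / (x0 * ... * x(n-1))
    out = [0] * len(xs)
    for i in range(len(xs) - 1, -1, -1):
        out[i] = inv * prefix[i] % p  # 1 / xs[i]
        inv = inv * xs[i] % p         # drop xs[i] from the running inverse
    return out

p = 2**256 - 2**32 - 977              # secp256k1 field prime
denominators = [3, 5, 7, 11]          # stand-ins for the values xB - xA
print(all(x * y % p == 1 for x, y in zip(denominators, batch_inverse(denominators, p))))
```

For a batch of points P + i*G and P - i*G only one denominator per pair has to be inverted, which is why 100 points need 50 inverses.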

 


Title: Re: BSGS solver for cuda
Post by: Etar on October 24, 2021, 05:44:48 PM

Ok.  
If you have a batch of 100 points, you don't need to compute 100 inversions but only 50 inversions.
Because  for example P+2G and P-2G use the same inverse.

 
Exactly; all of this is used in BSGS cuda (batch addition and symmetry).
Before using symmetry in the addition the speed was 800Mkeys with -w 30; after using symmetry the speed grew to 1150Mkeys


Title: Re: BSGS solver for cuda
Post by: arulbero on October 24, 2021, 05:52:59 PM

Ok.  
If you have a batch of 100 points, you don't need to compute 100 inversions but only 50 inversions.
Because  for example P+2G and P-2G use the same inverse.

 
Exactly; all of this is used in BSGS cuda (batch addition and symmetry).
Before using symmetry in the addition the speed was 800Mkeys with -w 30; after using symmetry the speed grew to 1150Mkeys

Ok. The square "a**2 mod p" is optimized like here (https://github.com/JeanLucPons/Kangaroo/blob/master/GPU/GPUMath.h#L909) ?


Title: Re: BSGS solver for cuda
Post by: Etar on October 24, 2021, 05:57:01 PM

Ok.  
If you have a batch of 100 points, you don't need to compute 100 inversions but only 50 inversions.
Because  for example P+2G and P-2G use the same inverse.

 
Exactly; all of this is used in BSGS cuda (batch addition and symmetry).
Before using symmetry in the addition the speed was 800Mkeys with -w 30; after using symmetry the speed grew to 1150Mkeys

Ok. The square "a**2 mod p" is optimized like here (https://github.com/JeanLucPons/Kangaroo/blob/master/GPU/GPUMath.h#L909) ?
No, I used mult mod P.
I use an optimized square mod P in PB https://github.com/Etayson/BSGS-cuda/blob/e41fff517b8de153b6bf9846ee7abb47524fe43e/lib/Curve64.pb#L2161
but it needs a 512-byte buffer, so I did not port it to CUDA PTX.
A double giant step is also used, so we subtract the doubled giant value from the pub.


Title: Re: BSGS solver for cuda
Post by: arulbero on October 24, 2021, 06:33:05 PM
A double giant step is also used, so we subtract the doubled giant value from the pub.

So, instead of computing

sqrt(n) / 2  baby steps
and
sqrt(n) giant steps

you compute

sqrt(n) baby steps
and
sqrt(n) / 2 giant steps


then, for example, for n = 2^60:

P - a * (2^31*G) = b*G  where  'a' lies in [1, ..., 2^29] and 'b' lies in [1,...,2^30]

means:  

1) P - a*(2^31*G) = b*G  -->   P = [a*(2^31) + b] * G  --> priv key = a*(2^31) + b
or
2) P - a*(2^31*G) = -b*G  -->   P = [a*(2^31) - b] * G  --> priv key = a*(2^31) - b

to save 2^30 giant steps.

It seems to me that the program is already optimized.


Title: Re: BSGS solver for cuda
Post by: Etar on October 24, 2021, 06:46:12 PM
A double giant step is also used, so we subtract the doubled giant value from the pub.

So, instead of computing

sqrt(n) / 2  baby steps
and
sqrt(n) giant steps

you compute

sqrt(n) baby steps
and
sqrt(n) / 2 giant steps


then, for example, for n = 2^60:

P - a * (2^31*G) = b*G  where  'a' lies in [1, ..., 2^29] and 'b' lies in [1,...,2^30]

means:  

1) P - a*(2^31*G) = b*G  -->   P = [a*(2^31) + b] * G  --> priv key = a*(2^31) + b
or
2) P - a*(2^31*G) = -b*G  -->   P = [a*(2^31) - b] * G  --> priv key = a*(2^31) - b

to save 2^30 giant steps.

It seems to me that the program is already optimized.
I compute the largest baby array that fits in GPU memory.
The baby array is 1G, 2G, 3G...
This array is computed only the first time and then converted into a hashtable.
It is one HT for any range.
The giant array is computed with double the baby array size; for example, if the baby array has size 2^30 then the giant array holds the values G*(2^31), G*(2^32), G*(2^33)...
All arrays are computed only once, as long as the settings are not changed.
So you can easily use all the arrays for different ranges and pubkeys without recomputing.


Title: Re: BSGS solver for cuda
Post by: arulbero on October 24, 2021, 06:57:39 PM

I compute the largest baby array that fits in GPU memory.
The baby array is 1G, 2G, 3G...
This array is computed only the first time and then converted into a hashtable.
It is one HT for any range.
The giant array is computed with double the baby array size; for example, if the baby array has size 2^30 then the giant array holds the values G*(2^31), G*(2^32), G*(2^33)...
All arrays are computed only once, as long as the settings are not changed.
So you can easily use all the arrays for different ranges and pubkeys without recomputing.

You can choose:

1) 2^30 baby-steps: 1*G, 2*G, ..., 2^30*G
and
2^29 giant steps: P-1*2^31*G, P-2*2^31*G,..,P -a*2^31*G where a is in [1,2,.., 2^29 - 1]

or

2) 2^29 baby-steps: 1*G, 2*G, ..., 2^29*G 
and
2^30 giant steps: P-1*2^30*G, P-2*2^30*G,..., P -a*2^30*G  where a is in [1,2,..., 2^30 - 1]

I don't understand why:  "G*(2^31), G*(2^32), G*(2^33)": in this way you compute only 30 giant steps





Title: Re: BSGS solver for cuda
Post by: Etar on October 24, 2021, 07:05:38 PM

I compute the largest baby array that fits in GPU memory.
The baby array is 1G, 2G, 3G...
This array is computed only the first time and then converted into a hashtable.
It is one HT for any range.
The giant array is computed with double the baby array size; for example, if the baby array has size 2^30 then the giant array holds the values G*(2^31), G*(2^32), G*(2^33)...
All arrays are computed only once, as long as the settings are not changed.
So you can easily use all the arrays for different ranges and pubkeys without recomputing.

You can choose:

1) 2^30 baby-steps: 1*G, 2*G, ..., 2^30*G
and
2^29 giant steps: P-1*2^31*G, P-2*2^31*G,..,P -a*2^31*G where a is in [1,2,.., 2^29 - 1]

or

2) 2^29 baby-steps: 1*G, 2*G, ..., 2^29*G  
and
2^30 giant steps: P-1*2^30*G, P-2*2^30*G,..., P -a*2^30*G  where a is in [1,2,..., 2^30 - 1]

I don't understand why:  "G*(2^31), G*(2^32), G*(2^33)": in this way you compute only 30 giant steps




These are universal arrays for any range. You don't need to recompute the arrays if you change the range or the pubkeys.
The size of the giant array is equal to thread number * block number * pparam
For the 2080 Ti I use 512 threads, 138 blocks and pparam = 480
so in total I have 33914880 doubled giant values
So each CUDA kernel call calculates 33914880 * 2 (due to +y/-y in batch additions) giant steps
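
The sizes quoted above follow directly from the three launch parameters. A short check in Python, where `giants_buffer` is a hypothetical helper name and the factor 2 is the +y/-y symmetry mentioned in the post:

```python
def giants_buffer(threads, blocks, pparam):
    """Number of giant values held for one kernel call (-t * -b * -p)."""
    return threads * blocks * pparam

giants = giants_buffer(512, 138, 480)     # 2080 Ti settings from the post
per_call = giants * 2                     # +y / -y doubles the giant steps per call
print(giants, per_call)
```

The same product for -t 512 -b 68 -p 256 gives 8912896, the figure printed as "Generate Giants Buffer" in the RTX 3090 log later in this thread.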


Title: Re: BSGS solver for cuda
Post by: jacky19790729 on October 29, 2021, 03:42:12 AM
Based on the table above, you can increase speed by using both bloom and bp: https://github.com/iceland2k14/bsgs (https://github.com/iceland2k14/bsgs)
CPU cores are less powerful than CUDA, so I was thinking (not sure whether it is possible) that if we load all of bp into RAM and keep a bloom filter in GPU memory, there might be a dramatic speed boost.

I tried iceland2k14's BSGS
Intel i7-7800X  + 24 GB  DDR4-2400

Code:
D:\python\BSGS_ice>python bsgs_dll_secp256k1.py -p 0385a30d8413af4f8f9e6312400f2d194fe14f02e719b24c3f83bf1fd233a8f963 -b bPfile.bin -bl bloomfile.bin -n 1000000000000000 -keyspace 40000000000000:80000000000000

[+] Starting Program : BSGS mode     Version [ 13072021 ]
[+] Search Started for the Public key:  0485a30d8413af4f8f9e6312400f2d194fe14f02e719b24c3f83bf1fd233a8f9630eb400323654cec63999b56f4ba44e8b21ab92d9d697fabe4666df3678585669
[+] Search Mode: Sequential search in the given range
[+] Reading bloom filter from file complete in : 26.70087 sec
[+] Reading Baby table from file complete in : 0.03731 sec
[+] seq value: 1000000000000000    m value : 3999999999
[+] Search Range: 0x40000000000000  to  0x80000000000000
                                                                [+] k1: 0x40000000000000
PVK not found. 1000.00000 Trillion scanned in 1.14 sec. New range [+] k1: 0x438d7ea4c68001
PVK not found. 1000.00000 Trillion scanned in 1.18 sec. New range [+] k1: 0x471afd498d0002
PVK not found. 1000.00000 Trillion scanned in 1.03 sec. New range [+] k1: 0x4aa87bee538003
PVK not found. 1000.00000 Trillion scanned in 1.05 sec. New range [+] k1: 0x4e35fa931a0004
PVK not found. 1000.00000 Trillion scanned in 0.99 sec. New range [+] k1: 0x51c37937e08005
PVK not found. 1000.00000 Trillion scanned in 1.01 sec. New range [+] k1: 0x5550f7dca70006
PVK not found. 1000.00000 Trillion scanned in 1.00 sec. New range [+] k1: 0x58de76816d8007
PVK not found. 1000.00000 Trillion scanned in 1.01 sec. New range [+] k1: 0x5c6bf526340008
PVK not found. 1000.00000 Trillion scanned in 1.09 sec. New range [+] k1: 0x5ff973cafa8009
PVK not found. 1000.00000 Trillion scanned in 0.93 sec. New range [+] k1: 0x6386f26fc1000a
PVK not found. 1000.00000 Trillion scanned in 0.96 sec. New range [+] k1: 0x6714711487800b
PVK not found. 1000.00000 Trillion scanned in 1.00 sec. New range [+] k1: 0x6aa1efb94e000c
============== KEYFOUND ============== 1
BSGS FOUND PrivateKey  0x6abe1f9b67e114
======================================
Start 1  :  2021-10-29 11:34:19
Start 1_0:  2021-10-29 11:34:45
END   1  :  2021-10-29 11:34:59

1 CPU speed:
BSGS Check 0x38D7EA4C68000  key / second



Title: Re: BSGS solver for cuda
Post by: Etar on October 29, 2021, 05:20:18 AM
-snip-
Code:
============== KEYFOUND ============== 1
BSGS FOUND PrivateKey  0x6abe1f9b67e114
======================================
Start 1  :  2021-10-29 11:34:19
Start 1_0:  2021-10-29 11:34:45
END   1  :  2021-10-29 11:34:59

1 CPU speed:
BSGS Check 0x38D7EA4C68000  key / second


With BSGScuda and a single 2080 Ti it was found in 1s.
Code:
FINDpubkey: 0385a30d8413af4f8f9e6312400f2d194fe14f02e719b24c3f83bf1fd233a8f963
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
***********GPU#0************
KEY[1]: 0x000000000000000000000000000000000000000000000000006abe1f9b67e114
   Pub: 0385a30d8413af4f8f9e6312400f2d194fe14f02e719b24c3f83bf1fd233a8f963
****************************
Found in 1 seconds
GPU#0 job finished
Working time 00:00:01s
bsgs cuda can be used in a rig, for example with 6-8 cards.


Title: Re: BSGS solver for cuda
Post by: jacky19790729 on October 29, 2021, 10:51:42 AM
Quote
With BSGScuda and a single 2080 Ti it was found in 1s.

Good....
Using my NVIDIA GeForce RTX 3090 Founders Edition,
#65 took only 38 seconds to solve.
I have 3 RTX 3090 Founders Edition graphics cards.

Code:
D:\BTC\cuda_BSBG>SET pub=30210c23b1a047bc9bdbb13448e67deddc108946de6de639bcc75d47c0216b1be383c4a8ed4fac77c0d2ad737d8499a362f483f8fe39d1e86aaed578a9455dfc
D:\BTC\cuda_BSBG>SET rangestart=0x0000000000000000000000000000000000000000000000010000000000000000
D:\BTC\cuda_BSBG>SET rangeend=  0x0000000000000000000000000000000000000000000000020000000000000000
D:\BTC\cuda_BSBG>SET thread_size=512
D:\BTC\cuda_BSBG>SET block_size=68
D:\BTC\cuda_BSBG>SET pparam_size=256
D:\BTC\cuda_BSBG>SET items_size=26
D:\BTC\cuda_BSBG>bsgscudaHT_1_7_1.exe -d 0 -t 512 -b 68 -p 256 -pb 30210c23b1a047bc9bdbb13448e67deddc108946de6de639bcc75d47c0216b1be383c4a8ed4fac77c0d2ad737d8499a362f483f8fe39d1e86aaed578a9455dfc -pk 0x0000000000000000000000000000000000000000000000010000000000000000 -pke   0x0000000000000000000000000000000000000000000000020000000000000000 -w 26
Used GPU devices #0
Number of GPU threads set to #512
Number of GPU blocks set to #68
Number of pparam set to #256
Pubkey set to 30210c23b1a047bc9bdbb13448e67deddc108946de6de639bcc75d47c0216b1be383c4a8ed4fac77c0d2ad737d8499a362f483f8fe39d1e86aaed578a9455dfc
Range begin: 0x0000000000000000000000000000000000000000000000010000000000000000
Range end: 0x0000000000000000000000000000000000000000000000020000000000000000
Items number set to 2^26
APP VERSION: 1.7.1
Found 4 Cuda device.
Cuda device:NVIDIA GeForce RTX 3090(24575Mb)
Device have: MP:82 Cores+10496
Shared memory total:49152
Constant memory total:65536
---------------
Cuda device:NVIDIA GeForce RTX 3090(24575Mb)
Device have: MP:82 Cores+10496
Shared memory total:49152
Constant memory total:65536
---------------
Cuda device:NVIDIA GeForce RTX 3090(24575Mb)
Device have: MP:82 Cores+10496
Shared memory total:49152
Constant memory total:65536
---------------
Cuda device:NVIDIA GeForce GT 1030(2047Mb)
Device have: MP:3 Cores+192
Shared memory total:49152
Constant memory total:65536
---------------
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000008000000
GiantSUBpubkey: 03a94c6524bd40d2bbdac85c056236a79da78bc61fd5bdec9d2bf26bd84b2438e8
*******************************
Total GPU Memory Need: 1328.000Mb
*******************************
Both HT files exist
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_67108864_33554432_htGPU.BIN
[0] chunk:536870912b
Generate Giants Buffer: 8912896 items
Load BIN file:512_68_256_67108864_g2.BIN
[0] chunk:570425344b
Done in 00:00:00s
GPU count #1
GPU #0 launched
GPU #0 Free memory: 20450Mb
GPU #0 Total memory: 24575Mb
GPU #0 TotalBuff: 1328.000Mb
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_67108864_33554432_htCPU.BIN
[0] chunk:805306368b
START RANGE= 0000000000000000000000000000000000000000000000010000000000000000
  END RANGE= 0000000000000000000000000000000000000000000000020000000000000000
WIDTH RANGE= 0000000000000000000000000000000000000000000000010000000000000000
SUBpoint= (3322d401243c4e2582a2147c104d6ecbf774d163db0f5e5313b7e0e742d0e6bd, a918f8681699b10a404fe643b2250648d7fa09c15d78c509db0c5d1593d7498f)
FINDpubkey: 0230210c23b1a047bc9bdbb13448e67deddc108946de6de639bcc75d47c0216b1b
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:00000000000000000000000000000000000000000000000008b3000000000001 2225MKey/s x67108864 2^31.12 x2^27=2^58.12
GPU#0 Cnt:00000000000000000000000000000000000000000000000011b2800000000001 2302MKey/s x67108864 2^31.17 x2^27=2^58.17
GPU#0 Cnt:0000000000000000000000000000000000000000000000001a98800000000001 2271MKey/s x67108864 2^31.15 x2^27=2^58.15
GPU#0 Cnt:000000000000000000000000000000000000000000000000234b800000000001 2222MKey/s x67108864 2^31.12 x2^27=2^58.12
GPU#0 Cnt:0000000000000000000000000000000000000000000000002c07000000000001 2234MKey/s x67108864 2^31.13 x2^27=2^58.13
GPU#0 Cnt:00000000000000000000000000000000000000000000000034ba000000000001 2223MKey/s x67108864 2^31.12 x2^27=2^58.12
GPU#0 Cnt:0000000000000000000000000000000000000000000000003da8800000000001 2279MKey/s x67108864 2^31.15 x2^27=2^58.15
GPU#0 Cnt:0000000000000000000000000000000000000000000000004697000000000001 2279MKey/s x67108864 2^31.15 x2^27=2^58.15
GPU#0 Cnt:0000000000000000000000000000000000000000000000004f39000000000001 2206MKey/s x67108864 2^31.11 x2^27=2^58.11
GPU#0 Cnt:00000000000000000000000000000000000000000000000057fd000000000001 2238MKey/s x67108864 2^31.13 x2^27=2^58.13
GPU#0 Cnt:00000000000000000000000000000000000000000000000060c1000000000001 2237MKey/s x67108864 2^31.13 x2^27=2^58.13
GPU#0 Cnt:00000000000000000000000000000000000000000000000069c0800000000001 2297MKey/s x67108864 2^31.17 x2^27=2^58.17
GPU#0 Cnt:00000000000000000000000000000000000000000000000072c0000000000001 2302MKey/s x67108864 2^31.17 x2^27=2^58.17
GPU#0 Cnt:0000000000000000000000000000000000000000000000007b84000000000001 2238MKey/s x67108864 2^31.13 x2^27=2^58.13
GPU#0 Cnt:0000000000000000000000000000000000000000000000008404000000000001 2169MKey/s x67108864 2^31.08 x2^27=2^58.08
GPU#0 Cnt:0000000000000000000000000000000000000000000000008d0c000000000001 2305MKey/s x67108864 2^31.17 x2^27=2^58.17
GPU#0 Cnt:00000000000000000000000000000000000000000000000095bf000000000001 2219MKey/s x67108864 2^31.12 x2^27=2^58.12
GPU#0 Cnt:0000000000000000000000000000000000000000000000009e61000000000001 2205MKey/s x67108864 2^31.11 x2^27=2^58.11
GPU#0 Cnt:000000000000000000000000000000000000000000000000a714000000000001 2224MKey/s x67108864 2^31.12 x2^27=2^58.12
***********GPU#0************
KEY[1]: 0x000000000000000000000000000000000000000000000001a838b13505b26867
   Pub: 0230210c23b1a047bc9bdbb13448e67deddc108946de6de639bcc75d47c0216b1b
****************************
Found in 38 seconds
GPU#0 job finished
Working time 00:00:38s
Total time 00:00:43s
GPU#0 thread finished
cuda finished ok

Press Enter to exit


Title: Re: BSGS solver for cuda
Post by: Etar on October 29, 2021, 12:38:42 PM
Quote
with bsgscuda and single 2080ti found in 1s.

Good....
Use my NVIDIA GeForce RTX 3090 Founders Edition
#65  only spent 38 seconds to solved
I have 3  RTX 3090 Founders Edition graphics cards 
-snip-
You need to tune the -w parameter (your batch file has D:\BTC\cuda_BSBG>SET items_size=26): set it to at least 30, with -htsz 29,
and you will solve this puzzle #64 16 times faster.
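
A rough sketch of where the 16x comes from. It is inferred only from the log lines above (for example "2225MKey/s x67108864 2^31.12 x2^27=2^58.12"), not from the program source: it assumes 1 MKey = 2^20 keys and that each raw step covers 2^(w+1) keys. The function name is made up for illustration, and it also assumes the raw MKey/s stays the same when -w changes, which may not hold in practice.

```python
import math

def effective_log2_rate(raw_mkeys_per_s: float, w: int) -> float:
    """log2 of keys covered per second.

    Assumptions (inferred from the printed logs, not from the source):
    1 MKey = 2**20 keys, and every raw step covers 2**(w + 1) keys.
    """
    return math.log2(raw_mkeys_per_s) + 20 + (w + 1)

slow = effective_log2_rate(2225, 26)   # the -w 26 run quoted above
fast = effective_log2_rate(2225, 30)   # same raw speed, -w 30
print(f"-w 26: 2^{slow:.2f} keys/s")
print(f"-w 30: 2^{fast:.2f} keys/s")
print(f"speed-up: x{2 ** (fast - slow):.0f}")
```

Going from -w 26 to -w 30 adds 4 to the exponent, hence the factor 2^4 = 16.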


Title: Re: BSGS solver for cuda
Post by: jacky19790729 on November 01, 2021, 07:23:26 PM
3 * RTX 3090 (only 5 GB of GPU RAM is used, although each RTX 3090 has 24 GB of GDDR6X)
#70: my computer took 53 seconds to solve it.


Code:

D:\BTC\cuda_BSGS>bsgscudaHT_1_7_1.exe -d 0,1,2 -t 512 -b 68 -p 256 -pb 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0 -pk 0x0000000000000000000000000000000000000000000000200000000000000000 -pke   0x0000000000000000000000000000000000000000000000400000000000000000 -w 29 -htsz 28
Used GPU devices #0,1,2
Number of GPU threads set to #512
Number of GPU blocks set to #68
Number of pparam set to #256
Pubkey set to 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0
Range begin: 0x0000000000000000000000000000000000000000000000200000000000000000
Range end: 0x0000000000000000000000000000000000000000000000400000000000000000
Items number set to 2^29
HT size number set to 2^28
APP VERSION: 1.7.1
Found 4 Cuda device.
Cuda device:NVIDIA GeForce RTX 3090(24575Mb)
Device have: MP:82 Cores+10496
Shared memory total:49152
Constant memory total:65536
---------------
Cuda device:NVIDIA GeForce RTX 3090(24575Mb)
Device have: MP:82 Cores+10496
Shared memory total:49152
Constant memory total:65536
---------------
Cuda device:NVIDIA GeForce RTX 3090(24575Mb)
Device have: MP:82 Cores+10496
Shared memory total:49152
Constant memory total:65536
---------------
Cuda device:NVIDIA GeForce GT 1030(2047Mb)
Device have: MP:3 Cores+192
Shared memory total:49152
Constant memory total:65536
---------------
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000040000000
GiantSUBpubkey: 02e1efb9cd05adc63bcce10831d9538c479cf1d05fefdd08b2448d70422ede454c
*******************************
Total GPU Memory Need: 4912.000Mb
*******************************
Both HT files exist
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_536870912_268435456_htGPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
Generate Giants Buffer: 8912896 items
Load BIN file:512_68_256_536870912_g2.BIN
[0] chunk:570425344b
Done in 00:00:00s
GPU count #3
GPU #0 launched
GPU #1 launched
GPU #2 launched
GPU #0 Free memory: 20450Mb
GPU #0 Total memory: 24575Mb
GPU #0 TotalBuff: 4912.000Mb
GPU #1 Free memory: 20450Mb
GPU #1 Total memory: 24575Mb
GPU #1 TotalBuff: 4912.000Mb
GPU #2 Free memory: 20450Mb
GPU #2 Total memory: 24575Mb
GPU #2 TotalBuff: 4912.000Mb
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_536870912_268435456_htCPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
START RANGE= 0000000000000000000000000000000000000000000000200000000000000000
  END RANGE= 0000000000000000000000000000000000000000000000400000000000000000
WIDTH RANGE= 0000000000000000000000000000000000000000000000200000000000000000
SUBpoint= (534ccf6b740f9ec036c1861215c8a61f3b89ea46df2e6d96998b90bc1f17fc25, 2a8ea34f6374d224b9d51c22cd2abcaaf51c2d884022d72228e3809030178db9)
FINDpubkey: 0290e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483
GPU#2 Cnt:0000000000000000000000000000000000000000000000000088000000000001
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#1 Cnt:0000000000000000000000000000000000000000000000000044000000000001
GPU#1 Cnt:000000000000000000000000000000000000000000000000aacc000000000001 1818MKey/s x536870912 2^30.83 x2^30=2^60.83
GPU#0 Cnt:000000000000000000000000000000000000000000000000b5b0000000000001 1936MKey/s x536870912 2^30.92 x2^30=2^60.92
GPU#2 Cnt:000000000000000000000000000000000000000000000000bbcc000000000001 1990MKey/s x536870912 2^30.96 x2^30=2^60.96
GPU#1 Cnt:0000000000000000000000000000000000000000000000017864000000000001 2189MKey/s x536870912 2^31.10 x2^30=2^61.10
.....
.....
.....
[Delete]
.....
.....
.....
***********GPU#1************
KEY[1]: 0x0000000000000000000000000000000000000000000000349b84b6431a6c4ef1
   Pub: 0290e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483
****************************
Found in 52 seconds
GPU#1 job finished
GPU#2 job finished
GPU#0 job finished
Working time 00:00:53s
Total time 00:01:40s
GPU#1 thread finished
GPU#2 thread finished
GPU#0 thread finished
cuda finished ok

Press Enter to exit


Title: Re: BSGS solver for cuda
Post by: dlystyr on November 01, 2021, 10:09:28 PM
Hi,

Thanks for the great program Etar. I was wondering if there is a way to save work? or if BSGS does not work like that? Or with the hash table, does it continue from where it left off already?

Thanks


Title: Re: BSGS solver for cuda
Post by: Etar on November 05, 2021, 06:48:44 PM
Hi,

Thanks for the great program Etar. I was wondering if there is a way to save work? or if BSGS does not work like that? Or with the hash table, does it continue from where it left off already?

Thanks
Released v1.7.2.
The current state is saved to the file currentwork.txt (the file name can't be changed) every 180 s by default; you can change this interval with -wt.
If the app crashes or you stop it, you can resume from the last saved state, provided the launch configuration has not been changed.
Set the -wl parameter in your bat file to the name of the state file and the app will start from that state.
I also added presets for each card (display only), but you can try using these presets to fill your GPU memory completely.


Title: Re: BSGS solver for cuda
Post by: Etar on November 05, 2021, 07:10:45 PM
3 * RTX 3090 (GPU Ram only use 5 GB ) ...but RTX3090 have 24 GB  DDR6
-snip-
Read what I wrote above and the readme.md file on GitHub.
You need to increase the -w parameter and set -htsz according to -w.

The main task is to fill the GPU memory as much as possible using the -w parameter (and -htsz),
and then fill the remaining free GPU memory with the -p, -b (no more than x2 of SM) and -t (no more than 512) parameters.

The presets say that a good config for your RTX 3090 is:
-t 512 -b 328 -p 530 -w 31 -htsz 29
This fills 20436.750 MB of the 20450.000 MB free, but you need around 58 GB of host RAM to generate all the arrays.
With saved arrays you need much less memory to launch the app.
In this case your performance will be around 2^62 per card.
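
For trying other combinations by hand, here is a small estimator. It is reverse-engineered from the "Total GPU Memory Need" and "Gen RAM" figures printed in this thread, not taken from the program source, so the byte constants (4, 8, 96 and 28) and the function names are assumptions:

```python
MIB = 1 << 20

def gpu_mem_mib(t: int, b: int, p: int, w: int, htsz: int) -> float:
    """Estimated GPU memory in MB for a given launch configuration.

    Assumed layout (fitted to the printed totals, not to the source code):
    4 bytes per baby item (2**w items), 8 bytes per hash-table slot
    (2**htsz slots) and 96 bytes per giant (t * b * p giants).
    """
    giants = t * b * p
    return (4 * (1 << w) + 8 * (1 << htsz) + 96 * giants) / MIB

def gen_ram_mib(w: int) -> float:
    """Estimated host RAM in MB to generate the arrays (assumed 28 bytes/item)."""
    return 28 * (1 << w) / MIB

print(gpu_mem_mib(512, 328, 530, 31, 29))  # preset quoted in this post
print(gen_ram_mib(31))
```

The first call reproduces the 20436.750 MB of the preset above; the same formula also matches the 4912.000 MB printed for -t 512 -b 68 -p 256 -w 29 -htsz 28.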


Title: Re: BSGS solver for cuda
Post by: jacky19790729 on November 09, 2021, 12:51:50 AM
I upgraded the system RAM to 128 GB (DDR4, 4 x 32 GB DIMMs)
Error !!!

-t 512 -b 328 -p 530 -w 31 -htsz 29 

Code:
D:\BTC\cuda_BSGS>bsgscudaHT_1_7_2.exe -d 0,1,2 -pb 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0 -pk 0x0000000000000000000000000000000000000000000000200000000000000000 -t 512 -b 328 -p 530 -w 31 -htsz 29
Used GPU devices #0,1,2
Pubkey set to 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0
Range begin: 0x0000000000000000000000000000000000000000000000200000000000000000
Number of GPU threads set to #512
Number of GPU blocks set to #328
Number of pparam set to #530
Items number set to 2^31
HT size number set to 2^29
APP VERSION: 1.7.2
Found 4 Cuda device.
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 530 -w 31 -htsz 29 [20436.750 MB] Gen RAM[57344 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 530 -w 31 -htsz 29 [20436.750 MB] Gen RAM[57344 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 530 -w 31 -htsz 29 [20436.750 MB] Gen RAM[57344 MB]
---------------
Cuda device:NVIDIA GeForce GT 1030 (1660.593/2047MB)
Device have: MP:3 Cores+192
Try -t 512 -b 12 -p 1586 -w 27 -htsz 25 [1660.125 MB] Gen RAM[3584 MB]
---------------
WARNING! -htsz parametr is to low, should be 29
Current config hash[90cda0e4d9468c970904cb810d0c9ab13669eb78]
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000100000000
GiantSUBpubkey: 02100f44da696e71672791d0a09b7bde459f1215a29b3c03bfefd7835b39a48db0
*******************************
Total GPU Memory Need: 20436.750Mb
*******************************
Free RAM[121601 MB], need[57344 MB]
Allocated (4294967424) for HT
Generate Babys Buffer: 2147483648 items
jobperthread: 178956970 items
Rest points: 8
Baby #0  (79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8)
Baby #1  (f399e5115afafe321274caf2c0c453f72401c6d7e4a75fd8f6894c52a023a07c6916b13d8f03db488ef5bb5e25eb75e0532bd82f1ebe62c238d6a2f31ce7172a)
Baby #2  (6eebc7557afc9c1493fdd7b030603702a64fa331abcf64e2175cd4f201c72b4b42864cc4fec022227b7c46b9dcdc6c26658493d71264b2b5acb09795c51c80ba)
Baby #3  (b19d2673eca5e243f1c31911340594416bf091de07c7913318c2f3cf6617c23ea69ee2f47d2dd9b0d596408e964aed25929b222cda616c4b171a40d8821b990f)
Baby #4  (51d96545704c794396e6b528bb84e64a86117c48b5ab7914c9fcc1f9a0df6233e3ff9160e9e0a3c8dc7fda4b6a602d82973b31317198609f9bfd98bdecb46d52)
Baby #5  (95e22caa7a7680bc663f900b6580d44d859b41ecceeab1dc6a658a2aad94492b0270fb74e81550ec29e7ae96c3b3a5bf58631fea8f4d30d5f5c2f2ca3adaf59f)
Baby #6  (be0b233985b3971f22f74933606bf7c8377774eaf2f5262af08f2ea3e9fad6eb2b62ac7d866dbe3a264ebe5d720380e83e214671dfc45e4696772e300000ea79)
Baby #7  (0e67e9aa85efc1c3d52e0f747d85d4673c02f25fb5069bbcc8dfcb541f13fe225d41f44a46e5113fc9aa1d62abe7e835a40d2020d6f7a7a79f905638acf2c57a)
Baby #8  (3ce8429969fdf6f15caf74a5efbd272bd23925b1d122a18f776fb62c7f03f4d9957d92c68bff8591dcce1c30a99d944028082abdf7ec41b1020bbbc534d5138d)
Baby #9  (5792bf775ade74c78fd7bb013a8bdc1125f8005903c61a4192ab4efa2b6722b3bc91f0bd0f33b12fe3c7119985cf80ab94803ee0adf202c479bf02c97826ecef)
Baby #10  (82985dcf6155434cfe4550c5b7767e65f29c76ef431a6b724689996d7359257d57acb9c852a0b365f399b71e478097bc0ed00c771a9aeb7522dac12418470616)
Baby #11  (5f38268a5099977ea286bc1ae082867d96fcbb70060c0c1ff37063351324a883696c2b51721888dd3f702b2c30b68155d0cce4c84e5ff56d64f0a0b16498a43a)
Baby #0  (815eb241e04b5649154819a5464e176a5003b4aa748f7c2b7073f617eef0f2fc401d939a55d3c23c86edaf8ac8b20f84630bd672cc78f6e6ebfd902e6b339817)
Total: 17179869184 bytes
Save BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_b.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
[12] chunk:1073741824b
[13] chunk:1073741824b
[14] chunk:1073741824b
[15] chunk:1073741824b
Saved:17179869184 bytes
Done in 00:02:15s
Verify baby array...ok
Add baby points to HashTable..100%
----------HashTable Info----------
Table size: 2^29x8=4294967296 bytes
Table mask: 1FFFFFFF
Table used: 98.17%
Total unique hashes: 527035716 = 24.5%
Total hashes: 2147483648=2147483648x8=17179869184 bytes
Total 21474836480 bytes = 20480.0Mb
Total colisions:1620447932 = 75.5%
Max. colisions:21
----------------------------------
Sorting HT items...Value exist!!!>ECEAB5E0
[0] 44CE1E61 (33D0F5ED)
[1] AAAA6CA1 (3C15D41)
[2] ECEAB5E0 (23368524)
[3] ECEAB5E0 (41E88F24)
[4] ACBD7030 (7F072F4F)
Try increase -htsz
Press Enter to exit




then.......

-t 512 -b 328 -p 530 -w 31 -htsz 30

Code:
D:\BTC\cuda_BSGS>bsgscudaHT_1_7_2.exe -d 0,1,2 -pb 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0 -pk 0x0000000000000000000000000000000000000000000000200000000000000000 -t 512 -b 328 -p 530 -w 31 -htsz 30
Used GPU devices #0,1,2
Pubkey set to 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0
Range begin: 0x0000000000000000000000000000000000000000000000200000000000000000
Number of GPU threads set to #512
Number of GPU blocks set to #328
Number of pparam set to #530
Items number set to 2^31
HT size number set to 2^30
APP VERSION: 1.7.2
Found 4 Cuda device.
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 530 -w 31 -htsz 29 [20436.750 MB] Gen RAM[57344 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 530 -w 31 -htsz 29 [20436.750 MB] Gen RAM[57344 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 530 -w 31 -htsz 29 [20436.750 MB] Gen RAM[57344 MB]
---------------
Cuda device:NVIDIA GeForce GT 1030 (1660.593/2047MB)
Device have: MP:3 Cores+192
Try -t 512 -b 12 -p 1586 -w 27 -htsz 25 [1660.125 MB] Gen RAM[3584 MB]
---------------
-htsz should be less than 30
Press Enter to exit



[moderator's note: consecutive posts merged]


Title: Re: BSGS solver for cuda
Post by: Etar on November 09, 2021, 06:58:24 AM
I upgrade the memory DRAM to 128 GB  (  DDR4 - 32 GB * 4 DIMM )
Error !!!

-t 512 -b 328 -p 530 -w 31 -htsz 29  

Yeah, you are right. With -w 31 and -htsz 29 the HT can contain colliding entries with the same value. Even with -w 31 -htsz 30 the HT will have such collisions.
So for your configuration try
-t 512 -b 328 -p 796 -w 30 -htsz 29 
total MB: 20430.500

or

-t 512 -b 328 -p 930 -w 30 -htsz 28
total MB: 20442.750
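
As a side note on the "HashTable Info" block in the failed run: the "Table used: 98.17%" figure is what plain uniform hashing predicts for 2^31 items in 2^29 slots. A minimal sketch, assuming a uniform hash (the real hash function is not shown in this thread, and the function name is invented):

```python
import math

def expected_table_use(items_log2: int, slots_log2: int) -> float:
    """Expected fraction of non-empty slots when 2**items_log2 items are
    hashed uniformly into 2**slots_log2 slots (Poisson approximation)."""
    load = 2.0 ** (items_log2 - slots_log2)   # average items per slot
    return 1.0 - math.exp(-load)

print(f"-w 31 -htsz 29: {100 * expected_table_use(31, 29):.2f}% used")
print(f"-w 30 -htsz 29: {100 * expected_table_use(30, 29):.2f}% used")
print(f"-w 30 -htsz 28: {100 * expected_table_use(30, 28):.2f}% used")
```

Note that -w 30 -htsz 28 has the same average load (4 items per slot) as the failing -w 31 -htsz 29, so the average load alone does not explain the "Value exist!!!" error; that seems to depend on -w itself.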


Title: Re: BSGS solver for cuda
Post by: jacky19790729 on November 09, 2021, 06:49:30 PM
test #70
3 * RTX3090  use " -t 512 -b 328 -p 930 -w 30 -htsz 28 "
I get error message "error cuCtxSynchronize-700"
GPU Memory used 20919 MB~~~~~


Code:
D:\BTC\cuda_BSGS>bsgscudaHT_1_7_2.exe -d 0,1,2 -pb 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0 -pk 0x0000000000000000000000000000000000000000000000200000000000000000 -t 512 -b 328 -p 930 -w 30 -htsz 28
Used GPU devices #0,1,2
Pubkey set to 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0
Range begin: 0x0000000000000000000000000000000000000000000000200000000000000000
Number of GPU threads set to #512
Number of GPU blocks set to #328
Number of pparam set to #930
Items number set to 2^30
HT size number set to 2^28
APP VERSION: 1.7.2
Found 4 Cuda device.
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 530 -w 31 -htsz 29 [20436.750 MB] Gen RAM[57344 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 530 -w 31 -htsz 29 [20436.750 MB] Gen RAM[57344 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 530 -w 31 -htsz 29 [20436.750 MB] Gen RAM[57344 MB]
---------------
Cuda device:NVIDIA GeForce GT 1030 (1660.593/2047MB)
Device have: MP:3 Cores+192
Try -t 512 -b 12 -p 1586 -w 27 -htsz 25 [1660.125 MB] Gen RAM[3584 MB]
---------------
Current config hash[6845c9505fb9be49a93bec5ca685a8075e7a607b]
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000080000000
GiantSUBpubkey: 025318f9b1a2697010c5ac235e9af475a8c7e5419f33d47b18d33feeb329eb99a4
*******************************
Total GPU Memory Need: 20442.750Mb
*******************************
Both HT files exist
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_1073741824_268435456_htGPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
Generate Giants Buffer: 156180480 items
Load BIN file:512_328_930_1073741824_g2.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
Last chunk:331874304b
[9] chunk:331874304b
Done in 00:00:04s
GPU count #3
GPU #0 launched
GPU #1 launched
GPU #2 launched
GPU #0 Free memory: 20450Mb
GPU #0 Total memory: 24575Mb
GPU #0 TotalBuff: 20442.750Mb
GPU #1 Free memory: 20450Mb
GPU #1 Total memory: 24575Mb
GPU #2 Free memory: 20450Mb
GPU #1 TotalBuff: 20442.750Mb
GPU #2 Total memory: 24575Mb
GPU #2 TotalBuff: 20442.750Mb
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_1073741824_268435456_htCPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
START RANGE= 0000000000000000000000000000000000000000000000200000000000000000
SUBpoint= (534ccf6b740f9ec036c1861215c8a61f3b89ea46df2e6d96998b90bc1f17fc25, 2a8ea34f6374d224b9d51c22cd2abcaaf51c2d884022d72228e3809030178db9)
Save work every 180 seconds

FINDpubkey: 0290e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483
Cnt:1 [3][ 0 0 0 ] = 0 MKeys/s x2^31=2^31.00 t:00:00:00error cuCtxSynchronize-700
Press Enter to exit
error cuCtxSynchronize-700
Press Enter to exit
error cuCtxSynchronize-700
Press Enter to exit
Cnt:1bed600000000001 [3][ 0 0 0 ] = 0 MKeys/s x2^31=2^31.00 t:00:01:08


Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on November 09, 2021, 07:04:03 PM
test #70
3 * RTX3090  use " -t 512 -b 328 -p 930 -w 30 -htsz 28 "
I get error message "error cuCtxSynchronize-700"
GPU Memory used 20919 MB~~~~~


-snip-

Have you tried running with 1 GPU or at least all GPUs that can handle the same config? It's hard to tell if your 1030 is one of the GPUs selected.


Title: Re: BSGS solver for cuda
Post by: jacky19790729 on November 09, 2021, 08:01:26 PM
Have you tried running with 1 GPU or at least all GPUs that can handle the same config? It's hard to tell if your 1030 is one of the GPUs selected.

Yes, I used 1 GPU and it still shows this error.

#0   NVIDIA GeForce RTX 3090    ( PCI-E x1   -  connected via a PCI-E x1 to x16 USB3.0 adapter card )
#1   NVIDIA GeForce RTX 3090    ( PCI-E x1   -  connected via a PCI-E x1 to x16 USB3.0 adapter card )
#2   NVIDIA GeForce RTX 3090    ( PCI-E x1   -  connected via a PCI-E x1 to x16 USB3.0 adapter card )
#3   NVIDIA GeForce GT 1030      ( PCI-E x16 -  plugged into the ASUS TUF x299 motherboard )

With the parameters "-t 512 -b 328 -p 930 -w 30 -htsz 28" I get this error.
With "-t 512 -b 68 -p 256 -w 29 -htsz 28" I solve #70 in only 50~60 seconds.


Title: Re: BSGS solver for cuda
Post by: COBRAS on November 10, 2021, 12:34:46 AM
Have you tried running with 1 GPU or at least all GPUs that can handle the same config? It's hard to tell if your 1030 is one of the GPUs selected.

Yes, I used 1 GPU and it still shows this error.

#0   NVIDIA GeForce RTX 3090    ( PCI-E x1   -  connected via a PCI-E x1 to x16 USB3.0 adapter card )
#1   NVIDIA GeForce RTX 3090    ( PCI-E x1   -  connected via a PCI-E x1 to x16 USB3.0 adapter card )
#2   NVIDIA GeForce RTX 3090    ( PCI-E x1   -  connected via a PCI-E x1 to x16 USB3.0 adapter card )
#3   NVIDIA GeForce GT 1030      ( PCI-E x16 -  plugged into the ASUS TUF x299 motherboard )

With the parameters "-t 512 -b 328 -p 930 -w 30 -htsz 28" I get this error.
With "-t 512 -b 68 -p 256 -w 29 -htsz 28" I solve #70 in only 50~60 seconds.


How long would you need to solve #115 and #110 (one pubkey)?


Title: Re: BSGS solver for cuda
Post by: Etar on November 10, 2021, 07:51:04 AM
test #70
3 * RTX3090  use " -t 512 -b 328 -p 930 -w 30 -htsz 28 "
I get error message "error cuCtxSynchronize-700"
GPU Memory used 20919 MB~~~~~
-snip-

cuCtxSynchronize-700 happened when the giant array is larger than 4 GB (32-bit pointer overflow in 1.7.2).
I replaced the pointer with a 64-bit one and released v1.7.3, which should fix the cuCtxSynchronize-700 issue:
https://github.com/Etayson/BSGS-cuda/releases/tag/v1.7.3
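
To illustrate the overflow with the numbers from the failing log (a sketch only: the 64 bytes per giant is inferred from the g2.BIN file size divided by the "Generate Giants Buffer" item count, and the helper names are invented). Error 700 in the CUDA driver API is CUDA_ERROR_ILLEGAL_ADDRESS, which fits an offset that wrapped around:

```python
GIANT_ITEM_BYTES = 64  # assumption: g2.BIN size / giants item count in the logs

def giants_bytes(t: int, b: int, p: int) -> int:
    """Size in bytes of the giants array for t threads, b blocks, p pparam."""
    return t * b * p * GIANT_ITEM_BYTES

def wrap32(offset: int) -> int:
    """What a byte offset becomes when it is kept in a 32-bit variable."""
    return offset & 0xFFFFFFFF

for cfg in [(512, 68, 256), (512, 328, 930)]:
    size = giants_bytes(*cfg)
    status = "fits in 32 bits" if size < 2**32 else "overflows 32 bits"
    print(cfg, size, status, "-> wrapped:", wrap32(size))
```

The -t 512 -b 68 -p 256 config stays well below 4 GB, while -t 512 -b 328 -p 930 needs about 9.3 GB for the giants alone, so any 32-bit offset into that array wraps around.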

PS. I have an off-topic question:
The Ethereum blockchain has some interesting transactions signed with an unusual R value in the signature, like these:
000000000000000000000000000000000000000000000000000000000000002D
or 1820182018201820182018201820182018201820182018201820182018201820
or 8208208208208208208208208208208208208208208208208208208208208200
R=K*G.. how did he calculate k for such a beautiful R?


Title: Re: BSGS solver for cuda
Post by: jacky19790729 on November 10, 2021, 11:18:00 AM
bsgscudaHT_1_7_3.exe works well on 3 * RTX3090 (using 20 GB of GPU RAM)

Solved #70 in 29 seconds .......

Speed:  0xB5EA17F0A8A2A19E keys / second

 :D :D :D

Code:
D:\BTC\cuda_BSGS>bsgscudaHT_1_7_3.exe -d 0,1,2 -pb 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0 -pk 0x0000000000000000000000000000000000000000000000200000000000000000 -t 512 -b 328 -p 930 -w 30 -htsz 28
Used GPU devices #0,1,2
Pubkey set to 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0
Range begin: 0x0000000000000000000000000000000000000000000000200000000000000000
Number of GPU threads set to #512
Number of GPU blocks set to #328
Number of pparam set to #930
Items number set to 2^30
HT size number set to 2^28
APP VERSION: 1.7.3
Found 4 Cuda device.
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce GT 1030 (1660.593/2047MB)
Device have: MP:3 Cores+192
Try -t 512 -b 12 -p 1586 -w 27 -htsz 25 [1660.125 MB] Gen RAM[3584 MB]
---------------
Current config hash[6845c9505fb9be49a93bec5ca685a8075e7a607b]
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000080000000
GiantSUBpubkey: 025318f9b1a2697010c5ac235e9af475a8c7e5419f33d47b18d33feeb329eb99a4
*******************************
Total GPU Memory Need: 20442.750Mb
*******************************
Both HT files exist
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_1073741824_268435456_htGPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
Generate Giants Buffer: 156180480 items
Load BIN file:512_328_930_1073741824_g2.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
Last chunk:331874304b
[9] chunk:331874304b
Done in 00:00:04s
GPU count #3
GPU #0 launched
GPU #1 launched
GPU #2 launched
GPU #0 Free memory: 20450Mb
GPU #0 Total memory: 24575Mb
GPU #0 TotalBuff: 20442.750Mb
GPU #2 Free memory: 20450Mb
GPU #2 Total memory: 24575Mb
GPU #1 Free memory: 20450Mb
GPU #1 Total memory: 24575Mb
GPU #2 TotalBuff: 20442.750Mb
GPU #1 TotalBuff: 20442.750Mb
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_1073741824_268435456_htCPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
START RANGE= 0000000000000000000000000000000000000000000000200000000000000000
SUBpoint= (534ccf6b740f9ec036c1861215c8a61f3b89ea46df2e6d96998b90bc1f17fc25, 2a8ea34f6374d224b9d51c22cd2abcaaf51c2d884022d72228e3809030178db9)
Save work every 180 seconds

FINDpubkey: 0290e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483
Cnt:137dab000000000001 [3][ 1955 1970 2023 ] = 5949 MKeys/s x2^31=2^63.54 t:00:00:28
KEY[1]: 0x0000000000000000000000000000000000000000000000349b84b6431a6c4ef1
   Pub: 0290e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483
Working time 00:00:29s
Total time 00:02:57s
GPU#2 job finished
GPU#1 job finished
GPU#0 job finished
cuda finished ok

Press Enter to exit
GPU#2 thread finished
GPU#0 thread finished
GPU#1 thread finished


Title: Re: BSGS solver for cuda
Post by: Etar on November 10, 2021, 12:14:52 PM
bsgscudaHT_1_7_3.exe work well on  3 * RTX3090  (use 20GB  GPU RAM)

29 seconds solved #70 .......

Speed:  0xB5EA17F0A8A2A19E / second
-snip-
Good that it is working for you.
But you can try playing with -t -b -p, because I don't see a reason to use all 20 GB of GPU memory.
Maybe try -t 512 -b 164 -p 512 (total 12128 MB) and compare with your current result.


Title: Re: BSGS solver for cuda
Post by: jovica888 on November 10, 2021, 12:49:52 PM
So I need to know the public key for the address in order to search for the private key?
How can I find the public key from an address, for example 1E4oDjEoBPXLS8vSYZ5dgjQEf4PZ4FLRhY?


Title: Re: BSGS solver for cuda
Post by: bigvito19 on November 10, 2021, 12:57:58 PM
So I need to know Public Key from the address to search the private key???
How can I find a Public key from the address - for example 1E4oDjEoBPXLS8vSYZ5dgjQEf4PZ4FLRhY

That address has no outgoing transaction, so its public key has not been exposed.
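
To see why the address alone is not enough: a legacy address is just Base58Check over a version byte and a 20-byte hash of the public key, so decoding it gives back the hash, not the key. A minimal decoder sketch (standard library only; the function name is made up):

```python
import hashlib

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def decode_p2pkh(address: str):
    """Return (version byte, hash160 as hex, checksum_ok) for a Base58Check address."""
    n = 0
    for ch in address:
        n = n * 58 + B58.index(ch)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    # every leading '1' in Base58 stands for one leading zero byte
    pad = len(address) - len(address.lstrip("1"))
    raw = b"\x00" * pad + raw
    payload, checksum = raw[:-4], raw[-4:]
    ok = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4] == checksum
    return payload[0], payload[1:].hex(), ok

# All you get out is a 20-byte hash of the public key.
print(decode_p2pkh("1E4oDjEoBPXLS8vSYZ5dgjQEf4PZ4FLRhY"))
```

The hash cannot be reversed into the public key; the key only becomes visible on chain once the address spends, which is why BSGS needs an address with an outgoing transaction.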


Title: Re: BSGS solver for cuda
Post by: jovica888 on November 10, 2021, 01:53:20 PM
I did my research... This is a very good tool <3


Title: Re: BSGS solver for cuda
Post by: _Counselor on November 10, 2021, 04:20:47 PM
PS. have an off topic question:
Ethereum blockchain have some interesting transaction that signed with unusual R signature like this:
000000000000000000000000000000000000000000000000000000000000002D
or 1820182018201820182018201820182018201820182018201820182018201820
or 8208208208208208208208208208208208208208208208208208208208208200
R = k*G, so how did he calculate k for this beautiful R?
Those are smart contract transactions; read here: https://eips.ethereum.org/EIPS/eip-1820


Title: Re: BSGS solver for cuda
Post by: jovica888 on November 10, 2021, 05:12:23 PM
I did my research again. With cuBitCrack I search at around 200 Mkeys/s with 2x Nvidia 1060.
I have a text file with 23 million addresses.

The range of my search will be  0 to ffffffffffffffffffffffffffffffffffffffff - 2^160

With this software, I first manually found around 20 (just 20, not 20 million) public keys, then started scanning and got around 700 Mkeys/s, which is good. But I realized that my search range is now 0 to 2^256, and I am also searching for only 20 keys.

So why is this software better than cuBitCrack?


Title: Re: BSGS solver for cuda
Post by: Etar on November 10, 2021, 05:21:28 PM
-snip-
Those are smart contract transactions; read here: https://eips.ethereum.org/EIPS/eip-1820
Thanks for the link, I didn't know that the usual signing process could be bypassed.


Title: Re: BSGS solver for cuda
Post by: Etar on November 11, 2021, 12:21:09 PM
bsgscudaHT_1_7_3.exe works well on 3 * RTX 3090 (uses 20 GB of GPU RAM)

Solved #70 in 29 seconds.

Speed:  0xB5EA17F0A8A2A19E / second
-snip-

@jacky19790729 can you test the pre-release, just to test -w?
https://github.com/Etayson/BSGS-cuda/releases/tag/v.1.8.0-alpha
Use -t 512 -b 164 -p 512 -w 31 -htsz 29.
If this works for you, try puzzle #70, for example, and let me know the result.
If all is OK, try -t 256 -b 164 -p 512 -w 32 -htsz 28 with puzzle #70, for example, and let me know the result.
(If there are warning messages while the arrays are being generated, just ignore them.)
Thanks!


Title: Re: BSGS solver for cuda
Post by: jovica888 on November 11, 2021, 02:25:47 PM
Can you add an option to search for multiple public keys, for example 100 keys at once, not one by one?


Title: Re: BSGS solver for cuda
Post by: math09183 on November 11, 2021, 02:37:12 PM
Can you add an option to search for multiple public keys, for example 100 keys at once, not one by one?


You have no idea what you are talking about, what you are doing, or what the algorithm is.


Title: Re: BSGS solver for cuda
Post by: jacky19790729 on November 11, 2021, 04:07:05 PM
bsgscudaHT_1_8_0.exe   "-t 512 -b 164 -p 512 -w 31 -htsz 29"     #75     (  122  seconds  )
bsgscudaHT_1_8_0.exe   "-t 512 -b 164 -p 512 -w 31 -htsz 29"     #70     (   13   seconds  )
bsgscudaHT_1_8_0.exe   "-t 256 -b 164 -p 512 -w 31 -htsz 28"     #70     (   14   seconds  )

3 * RTX 3090   -    6700~6800  MKeys/s  

#75 result   (-t 512 -b 164 -p 512 -w 31 -htsz 29)

Code:
D:\BTC\cuda_BSGS>bsgscudaHT_1_8_0.exe -d 0,1,2 -pb 04726b574f193e374686d8e12bc6e4142adeb06770e0a2856f5e4ad89f660447559b15322e6707090a4db3f09c7e6632a26db57f03eb07b40979fc01c827e1b0a3 -pk 0x0000000000000000000000000000000000000000000004000000000000000000 -t 512 -b 164 -p 512 -w 31 -htsz 29
Used GPU devices #0,1,2
Pubkey set to 04726b574f193e374686d8e12bc6e4142adeb06770e0a2856f5e4ad89f660447559b15322e6707090a4db3f09c7e6632a26db57f03eb07b40979fc01c827e1b0a3
Range begin: 0x0000000000000000000000000000000000000000000004000000000000000000
Number of GPU threads set to #512
Number of GPU blocks set to #164
Number of pparam set to #512
Items number set to 2^31
HT size number set to 2^29
APP VERSION: 1.8.0-alpha
**********************************************************************************
* This version [1.8.0-alpha] may content various bugs,                                 *
* Don`t use this version for serious task.                                       *
* It is needed to test the possibility of using the -w parameter greater than 30 *
* if you accept this press ENTER to continue or close the program otherwise.     *
**********************************************************************************

Found 4 Cuda device.
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce GT 1030 (1660.593/2047MB)
Device have: MP:3 Cores+192
Try -t 512 -b 12 -p 1586 -w 27 -htsz 25 [1660.125 MB] Gen RAM[3584 MB]
---------------
Current config hash[c2d68f5f2b9e02ec5c25955b0c7a9f3984666199]
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000100000000
GiantSUBpubkey: 02100f44da696e71672791d0a09b7bde459f1215a29b3c03bfefd7835b39a48db0
*******************************
Total GPU Memory Need: 16224.000Mb
*******************************
Free RAM[120926 MB], need[57344 MB]
Allocated (4294967424) for HT
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_b.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
[12] chunk:1073741824b
[13] chunk:1073741824b
[14] chunk:1073741824b
[15] chunk:1073741824b
Done in 00:00:25s
Verify baby array...ok
Add baby points to HashTable..100%
----------HashTable Info----------
Table size: 2^29x8=4294967296 bytes
Table mask: 1FFFFFFF
Table used: 98.17%
Total unique hashes: 527035716 = 24.5%
Total hashes: 2147483648=2147483648x8=17179869184 bytes
Total 21474836480 bytes = 20480.0Mb
Total colisions:1620447932 = 75.5%
Max. colisions:21
----------------------------------
Sorting HT items...Value exist!!!>ECEAB5E0 (41E88F24)
Value exist!!!>ECEAB5E0 (23368524)
Value exist!!!>3C770B72 (492FEC6D)
Value exist!!!>3C770B72 (2C687C76)
Value exist!!!>812D6C3C (71D81206)
Value exist!!!>812D6C3C (275179B3)
ok
4
-1
8
3
Verify HT sorting...Warning !!! Same value founded:ECEAB5E0
Warning !!! Same value founded:3C770B72
Warning!!!
min set : 0
  value : 0
Warning !!! Same value founded:812D6C3C
ok
Verify HT items...ok
Pack HTCPU items...ok
Verify packed HTCPU items...ok
Verify packed HTCPU items sorting...Warning !!! Same value founded:ECEAB5E0
Warning !!! Same value founded:3C770B72
Warning!!!
min set : 0
  value : 0
Warning !!! Same value founded:812D6C3C
ok
Save BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_536870912_htCPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
[12] chunk:1073741824b
[13] chunk:1073741824b
[14] chunk:1073741824b
[15] chunk:1073741824b
[16] chunk:1073741824b
[17] chunk:1073741824b
[18] chunk:1073741824b
[19] chunk:1073741824b
Saved:21474836480 bytes
Pack HTGPU items...ok
Save BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_536870912_htGPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
Saved:12884901888 bytes
Removed Temp HashTable...wait
Total removed items: 2147483648, freed memory: 24876.033 MB
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_536870912_htGPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
Generate Giants Buffer: 42991616 items
Giant #0  (100f44da696e71672791d0a09b7bde459f1215a29b3c03bfefd7835b39a48db032261ece6d5ff488d1370ccff3f6f9994800b5e700ae6a53f042a328d439a226)
Verify giant array...ok
Prepear Giant Buffer for GPU using...ok
Convert BigIntegers to 32b...ok
Freed memory: 2624.000 MB
Save BIN file:512_164_512_2147483648_g2.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
Last chunk:603979776b
[2] chunk:603979776b
Saved:2751463424 bytes
Done in 00:01:53s
GPU count #3
GPU #0 launched
GPU #1 launched
GPU #2 launched
GPU #0 Free memory: 20450Mb
GPU #0 Total memory: 24575Mb
GPU #0 TotalBuff: 16224.002Mb
GPU #1 Free memory: 20450Mb
GPU #1 Total memory: 24575Mb
GPU #1 TotalBuff: 16224.002Mb
GPU #2 Free memory: 20450Mb
GPU #2 Total memory: 24575Mb
GPU #2 TotalBuff: 16224.002Mb
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_536870912_htCPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
[12] chunk:1073741824b
[13] chunk:1073741824b
[14] chunk:1073741824b
[15] chunk:1073741824b
[16] chunk:1073741824b
[17] chunk:1073741824b
[18] chunk:1073741824b
[19] chunk:1073741824b
Verify packed HTCPU items...ok
START RANGE= 0000000000000000000000000000000000000000000004000000000000000000
SUBpoint= (c62e58e6fc23c5bdbef2be8b131ff243f521196572d6b0e9f102588976134f96, bc687d82ba4e5e9873c29898acebe03a43047aca9c8ce3c17dd8812a2eb302b1)
Save work every 180 seconds

FINDpubkey: 03726b574f193e374686d8e12bc6e4142adeb06770e0a2856f5e4ad89f66044755
Cnt:c172e0000000000001 [3][ 2276 2357 2193 ] = 6827 MKeys/s x2^32=2^64.74 t:00:02:00
KEY[1]: 0x0000000000000000000000000000000000000000000004c5ce114686a1336e07
   Pub: 03726b574f193e374686d8e12bc6e4142adeb06770e0a2856f5e4ad89f66044755
Working time 00:02:02s
Total time 01:47:00s
GPU#2 job finished
GPU#0 job finished
GPU#1 job finished
GPU#2 thread finished
GPU#1 thread finished
GPU#0 thread finished
cuda finished ok

Press Enter to exit
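
For anyone wondering how the "MKeys/s x2^32=2^64.74" figure in the log comes about, here is a small Python sketch of the arithmetic as I read it. It assumes that 1 MKey means 2^20 keys and that the exponent after "x2^" is simply taken from the log line; both are my assumptions, not something stated by the program.

```python
import math

def effective_log2(mkeys_per_s: float, exponent: int) -> float:
    """log2 of the effective rate: raw MKeys/s (1 MKey = 2**20 keys) times 2**exponent."""
    return math.log2(mkeys_per_s) + 20 + exponent

# Values copied from the two status lines in the logs above.
print(f"6827 MKeys/s x2^32 = 2^{effective_log2(6827, 32):.2f}")
print(f"5949 MKeys/s x2^31 = 2^{effective_log2(5949, 31):.2f}")
```

With these assumptions the sketch gives the same 2^64.74 and 2^63.54 that the program prints.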



#70 result  (-t 512 -b 164 -p 512 -w 31 -htsz 29)

Code:
D:\BTC\cuda_BSGS>bsgscudaHT_1_8_0.exe -d 0,1,2 -pb 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0 -pk 0x0000000000000000000000000000000000000000000000200000000000000000 -t 512 -b 164 -p 512 -w 31 -htsz 29
Used GPU devices #0,1,2
Pubkey set to 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0
Range begin: 0x0000000000000000000000000000000000000000000000200000000000000000
Number of GPU threads set to #512
Number of GPU blocks set to #164
Number of pparam set to #512
Items number set to 2^31
HT size number set to 2^29
APP VERSION: 1.8.0-alpha
**********************************************************************************
* This version [1.8.0-alpha] may content various bugs,                                 *
* Don`t use this version for serious task.                                       *
* It is needed to test the possibility of using the -w parameter greater than 30 *
* if you accept this press ENTER to continue or close the program otherwise.     *
**********************************************************************************

Found 4 Cuda device.
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce GT 1030 (1660.593/2047MB)
Device have: MP:3 Cores+192
Try -t 512 -b 12 -p 1586 -w 27 -htsz 25 [1660.125 MB] Gen RAM[3584 MB]
---------------
Current config hash[62eb84d34d42f5ce8ec9fbd838cc1ae339b5585a]
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000100000000
GiantSUBpubkey: 02100f44da696e71672791d0a09b7bde459f1215a29b3c03bfefd7835b39a48db0
*******************************
Total GPU Memory Need: 16224.000Mb
*******************************
Both HT files exist
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_536870912_htGPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
Generate Giants Buffer: 42991616 items
Load BIN file:512_164_512_2147483648_g2.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
Last chunk:603979776b
[2] chunk:603979776b
Done in 00:00:01s
GPU count #3
GPU #0 launched
GPU #1 launched
GPU #2 launched
GPU #0 Free memory: 20450Mb
GPU #0 Total memory: 24575Mb
GPU #0 TotalBuff: 16224.002Mb
GPU #1 Free memory: 20450Mb
GPU #1 Total memory: 24575Mb
GPU #1 TotalBuff: 16224.002Mb
GPU #2 Free memory: 20450Mb
GPU #2 Total memory: 24575Mb
GPU #2 TotalBuff: 16224.002Mb
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_536870912_htCPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
[12] chunk:1073741824b
[13] chunk:1073741824b
[14] chunk:1073741824b
[15] chunk:1073741824b
[16] chunk:1073741824b
[17] chunk:1073741824b
[18] chunk:1073741824b
[19] chunk:1073741824b
Verify packed HTCPU items...ok
START RANGE= 0000000000000000000000000000000000000000000000200000000000000000
SUBpoint= (534ccf6b740f9ec036c1861215c8a61f3b89ea46df2e6d96998b90bc1f17fc25, 2a8ea34f6374d224b9d51c22cd2abcaaf51c2d884022d72228e3809030178db9)
Save work every 180 seconds

FINDpubkey: 0290e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483
Cnt:12b2c0000000000001 [3][ 2268 2262 2204 ] = 6734 MKeys/s x2^32=2^64.72 t:00:00:12
KEY[1]: 0x0000000000000000000000000000000000000000000000349b84b6431a6c4ef1
   Pub: 0290e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483
Working time 00:00:13s
Total time 00:02:41s
GPU#0 job finished
GPU#2 job finished
GPU#1 job finished
GPU#0 thread finished
GPU#2 thread finished
GPU#1 thread finished
cuda finished ok

Press Enter to exit
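
As a rough cross-check of the 12 to 13 seconds above, the distance from the range start to the key can be divided by the effective rate from the status line. This is only a sketch: it assumes the search sweeps upward from the start value at a constant rate, and it takes the range start for #70 to be 2^69.

```python
def seconds_to_reach(key: int, start: int, log2_rate: float) -> float:
    """Time to sweep from start up to key at an effective rate of 2**log2_rate keys/s."""
    return (key - start) / 2.0 ** log2_rate

start = 1 << 69                    # assumed range start for puzzle #70
key = 0x349b84b6431a6c4ef1         # KEY[1] from the log
t = seconds_to_reach(key, start, 64.72)
print(f"about {t:.1f} s")
```

The estimate lands between 12 and 13 seconds, in line with the logged working time.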

#70  result     ( -t 256 -b 164 -p 512 -w 31 -htsz 28  )   
Code:

D:\BTC\cuda_BSGS>bsgscudaHT_1_8_0.exe -d 0,1,2 -pb 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0 -pk 0x0000000000000000000000000000000000000000000000200000000000000000 -t 256 -b 164 -p 512 -w 31 -htsz 28
Used GPU devices #0,1,2
Pubkey set to 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0
Range begin: 0x0000000000000000000000000000000000000000000000200000000000000000
Number of GPU threads set to #256
Number of GPU blocks set to #164
Number of pparam set to #512
Items number set to 2^31
HT size number set to 2^28
APP VERSION: 1.8.0-alpha
**********************************************************************************
* This version [1.8.0-alpha] may content various bugs,                                 *
* Don`t use this version for serious task.                                       *
* It is needed to test the possibility of using the -w parameter greater than 30 *
* if you accept this press ENTER to continue or close the program otherwise.     *
**********************************************************************************

Found 4 Cuda device.
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce GT 1030 (1660.593/2047MB)
Device have: MP:3 Cores+192
Try -t 512 -b 12 -p 1586 -w 27 -htsz 25 [1660.125 MB] Gen RAM[3584 MB]
---------------
WARNING! -htsz parametr is to low, should be at least 29
Current config hash[7d8a97073d15bafa3a2281eb853bf3ecb16338b4]
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000100000000
GiantSUBpubkey: 02100f44da696e71672791d0a09b7bde459f1215a29b3c03bfefd7835b39a48db0
*******************************
Total GPU Memory Need: 12208.000Mb
*******************************
Free RAM[124693 MB], need[53248 MB]
Allocated (2147483776) for HT
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_b.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
[12] chunk:1073741824b
[13] chunk:1073741824b
[14] chunk:1073741824b
[15] chunk:1073741824b
Done in 00:00:14s
Verify baby array...ok
Add baby points to HashTable..100%
----------HashTable Info----------
Table size: 2^28x8=2147483648 bytes
Table mask: FFFFFFF
Table used: 99.97%
Total unique hashes: 268345621 = 12.5%
Total hashes: 2147483648=2147483648x8=17179869184 bytes
Total 19327352832 bytes = 18432.0Mb
Total colisions:1879138027 = 87.5%
Max. colisions:30
----------------------------------
Sorting HT items...Value exist!!!>ECEAB5E0 (41E88F24)
Value exist!!!>ECEAB5E0 (23368524)
Value exist!!!>2C6933 (7C5B963F)
Value exist!!!>2C6933 (20F8DF66)
Value exist!!!>3C770B72 (492FEC6D)
Value exist!!!>3C770B72 (2C687C76)
Value exist!!!>812D6C3C (71D81206)
Value exist!!!>812D6C3C (275179B3)
ok
11
-1
14
13
Verify HT sorting...Warning !!! Same value founded:ECEAB5E0
Warning !!! Same value founded:2C6933
Warning !!! Same value founded:3C770B72
Warning!!!
min set : 0
  value : 0
Warning !!! Same value founded:812D6C3C
ok
Verify HT items...ok
Pack HTCPU items...ok
Verify packed HTCPU items...ok
Verify packed HTCPU items sorting...Warning !!! Same value founded:ECEAB5E0
Warning !!! Same value founded:2C6933
Warning !!! Same value founded:3C770B72
Warning!!!
min set : 0
  value : 0
Warning !!! Same value founded:812D6C3C
ok
Save BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_268435456_htCPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
[12] chunk:1073741824b
[13] chunk:1073741824b
[14] chunk:1073741824b
[15] chunk:1073741824b
[16] chunk:1073741824b
[17] chunk:1073741824b
Saved:19327352832 bytes
Pack HTGPU items...ok
Save BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_268435456_htGPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
Saved:10737418240 bytes
Removed Temp HashTable...wait
Total removed items: 2147483648, freed memory: 20512.974 MB
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_268435456_htGPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
Generate Giants Buffer: 21495808 items
Giant #0  (100f44da696e71672791d0a09b7bde459f1215a29b3c03bfefd7835b39a48db032261ece6d5ff488d1370ccff3f6f9994800b5e700ae6a53f042a328d439a226)
Verify giant array...ok
Prepear Giant Buffer for GPU using...ok
Convert BigIntegers to 32b...ok
Freed memory: 1312.000 MB
Save BIN file:256_164_512_2147483648_g2.BIN
[0] chunk:1073741824b
Last chunk:301989888b
[1] chunk:301989888b
Saved:1375731712 bytes
Done in 00:00:57s
GPU count #3
GPU #0 launched
GPU #1 launched
GPU #2 launched
GPU #0 Free memory: 20450Mb
GPU #0 Total memory: 24575Mb
GPU #0 TotalBuff: 12208.002Mb
GPU #1 Free memory: 20450Mb
GPU #1 Total memory: 24575Mb
GPU #1 TotalBuff: 12208.002Mb
GPU #2 Free memory: 20450Mb
GPU #2 Total memory: 24575Mb
GPU #2 TotalBuff: 12208.002Mb
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_268435456_htCPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
[12] chunk:1073741824b
[13] chunk:1073741824b
[14] chunk:1073741824b
[15] chunk:1073741824b
[16] chunk:1073741824b
[17] chunk:1073741824b
Verify packed HTCPU items...ok
START RANGE= 0000000000000000000000000000000000000000000000200000000000000000
SUBpoint= (534ccf6b740f9ec036c1861215c8a61f3b89ea46df2e6d96998b90bc1f17fc25, 2a8ea34f6374d224b9d51c22cd2abcaaf51c2d884022d72228e3809030178db9)
Save work every 180 seconds

FINDpubkey: 0290e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483
Cnt:1067f0000000000001 [3][ 2011 1913 2010 ] = 5935 MKeys/s x2^32=2^64.54 t:00:00:12
KEY[1]: 0x0000000000000000000000000000000000000000000000349b84b6431a6c4ef1
   Pub: 0290e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483
Working time 00:00:14s
Total time 01:15:16s
GPU#0 job finished
GPU#2 job finished
GPU#1 job finished
GPU#1 thread finished
GPU#0 thread finished
GPU#2 thread finished
cuda finished ok

Press Enter to exit
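
A side note on the "HashTable Info" sections in these logs: the "Table used" and "Total unique hashes" percentages are what one would expect when 2^w items are hashed uniformly into 2^htsz slots. The Python sketch below uses the standard Poisson approximation; uniform hashing is my assumption, and the function name is made up for illustration.

```python
import math

def ht_stats(w: int, htsz: int):
    """Expected statistics for 2**w items hashed uniformly into 2**htsz slots."""
    items = 2 ** w
    slots = 2 ** htsz
    load = items / slots                   # average number of items per slot
    used = 1.0 - math.exp(-load)           # expected fraction of non-empty slots
    unique_share = used * slots / items    # occupied slots relative to item count
    return load, used, unique_share

for w, htsz in [(31, 29), (31, 28)]:
    load, used, unique_share = ht_stats(w, htsz)
    print(f"-w {w} -htsz {htsz}: load {load:.0f}, "
          f"table used {used:.2%}, unique hashes {unique_share:.1%}")
```

For -htsz 29 this gives about 98.17% used and 24.5% unique hashes, and for -htsz 28 about 99.97% and 12.5%, the same as in the logs. The smaller table is almost completely full, with longer collision chains on average.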


Title: Re: BSGS solver for cuda
Post by: Etar on November 11, 2021, 05:31:39 PM
bsgscudaHT_1_8_0.exe   "-t 512 -b 164 -p 512 -w 31 -htsz 29"     #75     (  122  seconds  )
bsgscudaHT_1_8_0.exe   "-t 512 -b 164 -p 512 -w 31 -htsz 29"     #70     (   13   seconds  )
bsgscudaHT_1_8_0.exe   "-t 256 -b 164 -p 512 -w 31 -htsz 28"     #70     (   14   seconds  )

3 * RTX 3090   -    6700~6800  MKeys/s  
-snip-
Good and thanks! If possible try the configuration -t 256 -b 164 -p 512 -w 32 -htsz 28


Title: Re: BSGS solver for cuda
Post by: jacky19790729 on November 11, 2021, 06:58:48 PM
Quote
Good and thanks! If possible try the configuration -t 256 -b 164 -p 512 -w 32 -htsz 28

"-t 256 -b 164 -p 512 -w 32 -htsz 28"

I tried it and got this error message: "-w should be less than 32"

Code:
D:\BTC\cuda_BSGS>bsgscudaHT_1_8_0.exe -d 0,1,2 -pb 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0 -pk 0x0000000000000000000000000000000000000000000000200000000000000000 -t 256 -b 164 -p 512 -w 32 -htsz 28
Used GPU devices #0,1,2
Pubkey set to 90e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483d7319f127105f492fd15e009b103b4a83295722f28f07c95f9a5443ef8e77ce0
Range begin: 0x0000000000000000000000000000000000000000000000200000000000000000
Number of GPU threads set to #256
Number of GPU blocks set to #164
Number of pparam set to #512
Items number set to 2^32
HT size number set to 2^28
APP VERSION: 1.8.0-alpha
**********************************************************************************
* This version [1.8.0-alpha] may content various bugs,                                 *
* Don`t use this version for serious task.                                       *
* It is needed to test the possibility of using the -w parameter greater than 30 *
* if you accept this press ENTER to continue or close the program otherwise.     *
**********************************************************************************

Found 4 Cuda device.
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce GT 1030 (1660.593/2047MB)
Device have: MP:3 Cores+192
Try -t 512 -b 12 -p 1586 -w 27 -htsz 25 [1660.125 MB] Gen RAM[3584 MB]
---------------
-w should be less than 32
Press Enter to exit



Title: Re: BSGS solver for cuda
Post by: Etar on November 11, 2021, 07:08:16 PM
-snip-

"-t 256 -b 164 -p 512 -w 32 -htsz 28"

I tried it and got this error message: "-w should be less than 32"
-snip-

Try re-downloading the release; I have updated the maximum -w parameter.


Title: Re: BSGS solver for cuda
Post by: jovica888 on November 11, 2021, 07:20:53 PM
Can you add an option to search for multiple public keys, for example 100 keys at once, not one by one?


You have no idea what you are talking about, what you are doing, or what the algorithm is.

I saw that the program searches for only 1 public key. What did I ask wrong?


Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on November 11, 2021, 07:41:54 PM
Can you add an option to search for multiple public keys, for example 100 keys at once, not one by one?


You have no idea what you are talking about, what you are doing, or what the algorithm is.

I saw that the program searches for only 1 public key. What did I ask wrong?
math09183 does not know...math09183 comes in, says some stuff, and hopes it sticks. Like trying to nail jello to a wall.
Someone has asked Etar already. Although it could be done, it would reduce the overall speed. So if your speed is 100 MKey/s searching for 1 pubkey and then you searched for 50 pubkeys at once, your speed would now be roughly 100/50 = 2 MKey/s. It's best to break total range into smaller subranges and search multiple pubkeys that way; or at least that way requires no additional tweaks to the main BSGS cuda code.
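
The subrange approach could be scripted roughly like this. It is only a sketch: the -pk and -pke flags are the ones seen in the command lines in this topic, but the range (2^69 to 2^70), the number of parts, and whether the end value is treated as inclusive are my own assumptions.

```python
def split_range(start: int, end: int, parts: int):
    """Split [start, end) into `parts` contiguous subranges of near-equal width."""
    width = end - start
    bounds = [start + width * i // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

# Print one -pk/-pke pair per subrange; each pair would go into its own run.
for lo, hi in split_range(1 << 69, 1 << 70, 4):
    print(f"-pk 0x{lo:064x} -pke 0x{hi:064x}")
```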


Title: Re: BSGS solver for cuda
Post by: Etar on November 11, 2021, 07:49:42 PM

I saw that the program searches for only 1 public key. What did I ask wrong?
Save the public keys to a file, for example mypubs.txt, and set the parameter -infile mypubs.txt


Title: Re: BSGS solver for cuda
Post by: jovica888 on November 11, 2021, 08:12:05 PM
In cmd it says

-infile  Set file with pubkey for searching in uncompressed/compressed  format (search sequential)

So it will take the 1st key and search until it finds it, and then it will search for the 2nd, 3rd, 4th... up to the end of the list.


Title: Re: BSGS solver for cuda
Post by: math09183 on November 12, 2021, 07:22:28 AM
Can you add an option to search for multiple public keys, for example 100 keys at once, not one by one?


You have no idea what you are talking about, what you are doing, or what the algorithm is.

I saw that the program searches for only 1 public key. What did I ask wrong?
math09183 does not know...math09183 comes in, says some stuff, and hopes it sticks.

@jovica888:
1) you did not read the topic, the question was already asked
2) it makes no sense in terms of performance. It is like watching Gordon Ramsay preparing lunch and asking "could you also do ironing and dancing at the same moment"?

It is important to understand the moment at which you switch from systematic work, which will (sooner or later) guarantee success, to playing the lottery and hoping for luck.



Title: Re: BSGS solver for cuda
Post by: demoinvest1 on November 12, 2021, 09:28:43 AM

I tried running bsgscudaHT_1_7_3.exe and bsgscudaHT_1_8_0.exe, and both work fine.

But for the code on GitHub, bsgscudaussualHTchangeble1_7_3.pb,
I tried to run it with PureBasic v5.70, but it does not work with the SHA1Fingerprint function.
How can I fix it?

I am just trying to understand how the BSGS method works.


Title: Re: BSGS solver for cuda
Post by: Etar on November 12, 2021, 09:45:45 AM

I tried running bsgscudaHT_1_7_3.exe and bsgscudaHT_1_8_0.exe, and both work fine.

But for the code on GitHub, bsgscudaussualHTchangeble1_7_3.pb,
I tried to run it with PureBasic v5.70, but it does not work with the SHA1Fingerprint function.
How can I fix it?

I am just trying to understand how the BSGS method works.

You need PureBasic v5.31, because ASCII mode must be enabled.
ASCII mode was removed in newer versions of PB.
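
On the question of how BSGS works in general, here is a toy Python sketch on secp256k1 (Python 3.8+ for the modular inverse via pow). It is not the program's GPU code: the function names, the small table size and the test range are made up for illustration. The idea is to store the baby steps i*G in a table, shift the target by the range start, and then subtract m*G repeatedly (the giant steps) until a table hit occurs.

```python
P = 2**256 - 2**32 - 977   # secp256k1 field prime
G = (0x79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798,
     0x483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8)

def add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def neg(a):
    return None if a is None else (a[0], (-a[1]) % P)

def mul(k, a):
    """Double-and-add scalar multiplication."""
    r = None
    while k:
        if k & 1:
            r = add(r, a)
        a = add(a, a)
        k >>= 1
    return r

def bsgs(Q, start, width, m):
    """Find k in [start, start + width) with k*G == Q, using m baby steps."""
    baby, pt = {}, None
    for i in range(m):                  # baby steps: table of i*G -> i
        baby[pt] = i
        pt = add(pt, G)
    giant = neg(pt)                     # pt is now m*G, so giant = -(m*G)
    cur = add(Q, neg(mul(start, G)))    # shift the target by the range start
    for j in range((width + m - 1) // m):
        if cur in baby:                 # Q - start*G - j*m*G == i*G
            return start + j * m + baby[cur]
        cur = add(cur, giant)
    return None

secret = (1 << 20) + 54321
print(hex(bsgs(mul(secret, G), 1 << 20, 1 << 16, 256)))
```

The shift by the range start is my guess at what the "SUBpoint" line in the logs refers to: for a start value of 1 the log shows the point -G, which is what the shift subtracts here. The real program works with far larger tables and many giant steps in parallel, so treat this only as an outline of the method.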


Title: Re: BSGS solver for cuda
Post by: jacky19790729 on November 12, 2021, 11:15:12 AM
Generating all BIN files for "-t 256 -b 164 -p 512 -w 32 -htsz 28" needs 90 GB of RAM and 4 hours of running time.

Part 1:  -t 256 -b 164 -p 512 -w 32 -htsz 28  (search for 6 keys from publickey.txt): 3 keys lost
Part 2:  -t 256 -b 164 -p 512 -w 31 -htsz 28  (search for 6 keys from publickey.txt): 0 keys lost  (Fix)

Part 1 log: 3 keys lost
Code:
D:\BTC\cuda_BSGS>bsgscudaHT_1_8_0.exe -d 0,1,2 -infile publickey.txt -pk 0x0000000000000000000000000000000000000000000000000000000000000001 -pke   0x0000000000000000000000000000000000000000000000800000000000000000 -t 256 -b 164 -p 512 -w 32 -htsz 28
Used GPU devices #0,1,2
Will be used file: publickey.txt
Range begin: 0x0000000000000000000000000000000000000000000000000000000000000001
Range end: 0x0000000000000000000000000000000000000000000000800000000000000000
Number of GPU threads set to #256
Number of GPU blocks set to #164
Number of pparam set to #512
Items number set to 2^32
HT size number set to 2^28
APP VERSION: 1.8.0-alpha
**********************************************************************************
* This version [1.8.0-alpha] may content various bugs,                           *
* Don`t use this version for serious task.                                       *
* It is needed to test the possibility of using the -w parameter greater than 30 *
* if you accept this press ENTER to continue or close the program otherwise.     *
**********************************************************************************

Found 4 Cuda device.
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce GT 1030 (1660.593/2047MB)
Device have: MP:3 Cores+192
Try -t 512 -b 12 -p 1586 -w 27 -htsz 25 [1660.125 MB] Gen RAM[3584 MB]
---------------
WARNING! -htsz parametr is to low, should be at least 30
Current config hash[2ce1c0810496648d20e741af064b822b32b66576]
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000200000000
GiantSUBpubkey: 038c0989f2ceb5c771a8415dff2b4c4199d8d9c8f9237d08084b05284f1e4df706
*******************************
Total GPU Memory Need: 20400.000Mb
*******************************
Both HT files exist
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_4294967296_268435456_htGPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
[12] chunk:1073741824b
[13] chunk:1073741824b
[14] chunk:1073741824b
[15] chunk:1073741824b
[16] chunk:1073741824b
[17] chunk:1073741824b
Generate Giants Buffer: 21495808 items
Load BIN file:256_164_512_4294967296_g2.BIN
[0] chunk:1073741824b
Last chunk:301989888b
[1] chunk:301989888b
Done in 00:00:01s
GPU count #3
GPU #0 launched
GPU #1 launched
GPU #2 launched
GPU #0 Free memory: 20450Mb
GPU #0 Total memory: 24575Mb
GPU #0 TotalBuff: 20400.002Mb
GPU #1 Free memory: 20450Mb
GPU #1 Total memory: 24575Mb
GPU #1 TotalBuff: 20400.002Mb
GPU #2 Free memory: 20450Mb
GPU #2 Total memory: 24575Mb
GPU #2 TotalBuff: 20400.002Mb
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_4294967296_268435456_htCPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
[12] chunk:1073741824b
[13] chunk:1073741824b
[14] chunk:1073741824b
[15] chunk:1073741824b
[16] chunk:1073741824b
[17] chunk:1073741824b
[18] chunk:1073741824b
[19] chunk:1073741824b
[20] chunk:1073741824b
[21] chunk:1073741824b
[22] chunk:1073741824b
[23] chunk:1073741824b
[24] chunk:1073741824b
[25] chunk:1073741824b
[26] chunk:1073741824b
[27] chunk:1073741824b
[28] chunk:1073741824b
[29] chunk:1073741824b
[30] chunk:1073741824b
[31] chunk:1073741824b
[32] chunk:1073741824b
[33] chunk:1073741824b
Verify packed HTCPU items...ok
START RANGE= 0000000000000000000000000000000000000000000000000000000000000001
  END RANGE= 0000000000000000000000000000000000000000000000800000000000000000
WIDTH RANGE= 00000000000000000000000000000000000000000000007fffffffffffffffff
SUBpoint= (79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798, b7c52588d95c3b9aa25b0403f1eef75702e84bb7597aabe663b82f6f04ef2777)
Save work every 180 seconds

FINDpubkey: 02a521a07e98f78b03fc1e039bc3a51408cd73119b5eb116e583fe57dc8db07aea
Cnt:7b0520000000000001 [3][ 1704 1658 1800 ] = 5163 MKeys/s x2^33=2^65.33 t:00:00:51
GPU#2 job finished
GPU#1 job finished
GPU#0 job finished

FINDpubkey: 0311569442e870326ceec0de24eb5478c19e146ecd9d15e4666440f2f638875f42
Cnt:1 [3][ 0 0 0 ] = 0 MKeys/s x2^33=2^33.00 t:00:00:53
KEY[2]: 0x00000000000000000000000000000000000000000000000002c675b852189a21
   Pub: 0311569442e870326ceec0de24eb5478c19e146ecd9d15e4666440f2f638875f42
Working time 00:00:53s
GPU#2 job finished
GPU#0 job finished
GPU#1 job finished

FINDpubkey: 0241267d2d7ee1a8e76f8d1546d0d30aefb2892d231cee0dde7776daf9f8021485
Cnt:1 [3][ 0 0 0 ] = 0 MKeys/s x2^33=2^33.00 t:00:00:53
KEY[3]: 0x00000000000000000000000000000000000000000000000007496cbb87cab44f
   Pub: 0241267d2d7ee1a8e76f8d1546d0d30aefb2892d231cee0dde7776daf9f8021485
Working time 00:00:53s
GPU#1 job finished
GPU#0 job finished
GPU#2 job finished

FINDpubkey: 026a12fe6199cd9ed5fdeaaa72432f329054fb594a50619cc26d0a5457f212fb26
Cnt:7bf600000000000001 [3][ 1683 1778 1706 ] = 5168 MKeys/s x2^33=2^65.34 t:00:01:44
GPU#2 job finished
GPU#1 job finished
GPU#0 job finished

FINDpubkey: 0290e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483
Cnt:7c3dc0000000000001 [3][ 1650 1605 1749 ] = 5005 MKeys/s x2^33=2^65.29 t:00:02:36
GPU#0 job finished
GPU#1 job finished
GPU#2 job finished

FINDpubkey: 03726b574f193e374686d8e12bc6e4142adeb06770e0a2856f5e4ad89f66044755
Cnt:7b0520000000000001 [3][ 1698 1699 1728 ] = 5126 MKeys/s x2^33=2^65.32 t:00:03:28
Total time 00:07:50s
GPU#0 job finished
GPU#2 job finished
GPU#1 job finished
cuda finished ok

Press Enter to exit
GPU#2 thread finished
GPU#1 thread finished
GPU#0 thread finished

==== win.txt =====
KEY[2]: 0x00000000000000000000000000000000000000000000000002c675b852189a21
   Pub: 0311569442e870326ceec0de24eb5478c19e146ecd9d15e4666440f2f638875f42
KEY[3]: 0x00000000000000000000000000000000000000000000000007496cbb87cab44f
   Pub: 0241267d2d7ee1a8e76f8d1546d0d30aefb2892d231cee0dde7776daf9f8021485
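
A KEY/Pub pair from win.txt can be cross-checked independently of the tool. The following is a minimal, unoptimized sketch of secp256k1 scalar multiplication in plain Python; the function names are illustrative and not taken from BSGS-cuda, and the code is not constant-time, so it is only suitable for checking test keys like these.

```python
# secp256k1 field prime and generator point G (standard curve parameters).
P = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Add two affine points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a + (-a)
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P   # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P  # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)

def scalar_mult(k, point=G):
    """Double-and-add: returns k * point."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result

def compressed_pubkey(k):
    """Compressed public key (02/03 prefix + X) for private key k, as hex."""
    x, y = scalar_mult(k)
    return ("03" if y & 1 else "02") + format(x, "064x")

# Compare against the Pub line printed next to KEY[2] above.
print(compressed_pubkey(0x2c675b852189a21))
```

If the pair is consistent, the printed string should equal the `Pub:` value logged for that key. Note that the SUBpoint in the log has the same X as G with the negated Y, which fits a start range of 1.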


Part2 log:  0 keys lost (Fix)
Code:

D:\BTC\cuda_BSGS>SET rangestart=0x0000000000000000000000000000000000000000000000000000000000000001
D:\BTC\cuda_BSGS>SET rangeend=  0x0000000000000000000000000000000000000000000000800000000000000000
D:\BTC\cuda_BSGS>bsgscudaHT_1_8_0.exe -d 0,1,2 -infile publickey.txt -pk 0x0000000000000000000000000000000000000000000000000000000000000001 -pke   0x0000000000000000000000000000000000000000000000800000000000000000 -t 256 -b 164 -p 512 -w 31 -htsz 28
Used GPU devices #0,1,2
Will be used file: publickey.txt
Range begin: 0x0000000000000000000000000000000000000000000000000000000000000001
Range end: 0x0000000000000000000000000000000000000000000000800000000000000000
Number of GPU threads set to #256
Number of GPU blocks set to #164
Number of pparam set to #512
Items number set to 2^31
HT size number set to 2^28
APP VERSION: 1.8.0-alpha
**********************************************************************************
* This version [1.8.0-alpha] may content various bugs,                           *
* Don`t use this version for serious task.                                       *
* It is needed to test the possibility of using the -w parameter greater than 30 *
* if you accept this press ENTER to continue or close the program otherwise.     *
**********************************************************************************

Found 4 Cuda device.
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce GT 1030 (1660.593/2047MB)
Device have: MP:3 Cores+192
Try -t 512 -b 12 -p 1586 -w 27 -htsz 25 [1660.125 MB] Gen RAM[3584 MB]
---------------
WARNING! -htsz parametr is to low, should be at least 29
Current config hash[dbc42a4d70dae8f383fff1b2e72a68cb25e83273]
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000100000000
GiantSUBpubkey: 02100f44da696e71672791d0a09b7bde459f1215a29b3c03bfefd7835b39a48db0
*******************************
Total GPU Memory Need: 12208.000Mb
*******************************
Both HT files exist
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_268435456_htGPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
Generate Giants Buffer: 21495808 items
Load BIN file:256_164_512_2147483648_g2.BIN
[0] chunk:1073741824b
Last chunk:301989888b
[1] chunk:301989888b
Done in 00:00:01s
GPU count #3
GPU #0 launched
GPU #1 launched
GPU #2 launched
GPU #1 Free memory: 20450Mb
GPU #1 Total memory: 24575Mb
GPU #0 Free memory: 20450Mb
GPU #1 TotalBuff: 12208.002Mb
GPU #0 Total memory: 24575Mb
GPU #0 TotalBuff: 12208.002Mb
GPU #2 Free memory: 20450Mb
GPU #2 Total memory: 24575Mb
GPU #2 TotalBuff: 12208.002Mb
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_268435456_htCPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
[12] chunk:1073741824b
[13] chunk:1073741824b
[14] chunk:1073741824b
[15] chunk:1073741824b
[16] chunk:1073741824b
[17] chunk:1073741824b
Verify packed HTCPU items...ok
START RANGE= 0000000000000000000000000000000000000000000000000000000000000001
  END RANGE= 0000000000000000000000000000000000000000000000800000000000000000
WIDTH RANGE= 00000000000000000000000000000000000000000000007fffffffffffffffff
SUBpoint= (79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798, b7c52588d95c3b9aa25b0403f1eef75702e84bb7597aabe663b82f6f04ef2777)
Save work every 180 seconds

FINDpubkey: 02a521a07e98f78b03fc1e039bc3a51408cd73119b5eb116e583fe57dc8db07aea
Cnt:1 [3][ 0 0 0 ] = 0 MKeys/s x2^32=2^32.00 t:00:00:00
KEY[1]: 0x00000000000000000000000000000000000000000000000001eb25c90795d61c
   Pub: 02a521a07e98f78b03fc1e039bc3a51408cd73119b5eb116e583fe57dc8db07aea
Working time 00:00:01s
GPU#0 job finished
GPU#2 job finished
GPU#1 job finished

FINDpubkey: 0311569442e870326ceec0de24eb5478c19e146ecd9d15e4666440f2f638875f42
Cnt:1 [3][ 0 0 0 ] = 0 MKeys/s x2^32=2^32.00 t:00:00:01
KEY[2]: 0x00000000000000000000000000000000000000000000000002c675b852189a21
   Pub: 0311569442e870326ceec0de24eb5478c19e146ecd9d15e4666440f2f638875f42
Working time 00:00:01s
GPU#1 job finished
GPU#2 job finished
GPU#0 job finished

FINDpubkey: 0241267d2d7ee1a8e76f8d1546d0d30aefb2892d231cee0dde7776daf9f8021485
Cnt:1 [3][ 0 0 0 ] = 0 MKeys/s x2^32=2^32.00 t:00:00:01
KEY[3]: 0x00000000000000000000000000000000000000000000000007496cbb87cab44f
   Pub: 0241267d2d7ee1a8e76f8d1546d0d30aefb2892d231cee0dde7776daf9f8021485
Working time 00:00:01s
GPU#2 job finished
GPU#1 job finished
GPU#0 job finished

FINDpubkey: 026a12fe6199cd9ed5fdeaaa72432f329054fb594a50619cc26d0a5457f212fb26
Cnt:c66f0000000000001 [3][ 2127 1839 2103 ] = 6070 MKeys/s x2^32=2^64.57 t:00:00:10
KEY[4]: 0x00000000000000000000000000000000000000000000000c6b015f4d1eb25c9a
   Pub: 026a12fe6199cd9ed5fdeaaa72432f329054fb594a50619cc26d0a5457f212fb26
Working time 00:00:10s
GPU#0 job finished
GPU#2 job finished
GPU#1 job finished

FINDpubkey: 0290e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483
Cnt:33c7d0000000000001 [3][ 2092 1889 1995 ] = 5977 MKeys/s x2^32=2^64.55 t:00:00:46
KEY[5]: 0x0000000000000000000000000000000000000000000000349b84b6431a6c4ef1
   Pub: 0290e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483
Working time 00:00:46s
GPU#1 job finished
GPU#0 job finished
GPU#2 job finished

FINDpubkey: 03726b574f193e374686d8e12bc6e4142adeb06770e0a2856f5e4ad89f66044755
Cnt:7fc3c0000000000001 [3][ 1944 1946 2060 ] = 5951 MKeys/s x2^32=2^64.54 t:00:02:16
Total time 00:04:37s
GPU#1 job finished
GPU#0 job finished
GPU#2 job finished
GPU#2 thread finished
GPU#1 thread finished
GPU#0 thread finished
cuda finished ok

Press Enter to exit

============ win.txt ===============
KEY[1]: 0x00000000000000000000000000000000000000000000000001eb25c90795d61c
   Pub: 02a521a07e98f78b03fc1e039bc3a51408cd73119b5eb116e583fe57dc8db07aea
KEY[2]: 0x00000000000000000000000000000000000000000000000002c675b852189a21
   Pub: 0311569442e870326ceec0de24eb5478c19e146ecd9d15e4666440f2f638875f42
KEY[3]: 0x00000000000000000000000000000000000000000000000007496cbb87cab44f
   Pub: 0241267d2d7ee1a8e76f8d1546d0d30aefb2892d231cee0dde7776daf9f8021485
KEY[4]: 0x00000000000000000000000000000000000000000000000c6b015f4d1eb25c9a
   Pub: 026a12fe6199cd9ed5fdeaaa72432f329054fb594a50619cc26d0a5457f212fb26
KEY[5]: 0x0000000000000000000000000000000000000000000000349b84b6431a6c4ef1
   Pub: 0290e6900a58d33393bc1097b5aed31f2e4e7cbd3e5466af958665bc0121248483
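
The rate lines in the two logs seem to combine the raw GPU rate with the table size: the figure after `x2^33=` or `x2^32=` matches log2 of the MKeys/s value (with 1 MKey read as 2^20 keys) plus the printed exponent. This reading is inferred from the numbers only and is not confirmed by the author. A small sketch under that assumption, which also estimates the time for one full pass over a range:

```python
import math

def effective_rate_log2(mkeys_per_s, table_exp):
    """log2 of the effective keys/s, ASSUMING 1 MKey = 2**20 keys.

    table_exp is the exponent printed after 'x2^' in the log line.
    """
    return math.log2(mkeys_per_s * 2**20) + table_exp

def scan_seconds(range_bits, rate_log2):
    """Seconds needed to cover a range of 2**range_bits keys at 2**rate_log2 keys/s."""
    return 2 ** (range_bits - rate_log2)

# Values copied from the logs above.
print(round(effective_rate_log2(5163, 33), 2))  # -w 32 run, log prints 2^65.33
print(round(effective_rate_log2(6070, 32), 2))  # -w 31 run, log prints 2^64.57
print(round(scan_seconds(71, 65.33)))           # full pass over a 2^71 range
```

The last value can be compared with the `t:00:00:51` timestamp of the first pubkey in the Part1 log, which ran through the whole range without a hit.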



Title: Re: BSGS solver for cuda
Post by: Etar on November 12, 2021, 12:18:02 PM
Generating all BIN files for "-t 256 -b 164 -p 512 -w 32 -htsz 28" needs 90 GB of RAM and 4 hours of running time.

Part1:  -t 256 -b 164 -p 512 -w 32 -htsz 28  (search 6 keys publickey.txt)  3 keys lost
Part2:  -t 256 -b 164 -p 512 -w 31 -htsz 28  (search 6 keys publickey.txt)  1 key lost
-snip-
Many thanks, I will investigate why the keys were lost.


Title: Re: BSGS solver for cuda
Post by: jacky19790729 on November 12, 2021, 12:24:22 PM
Generating all BIN files for "-t 256 -b 164 -p 512 -w 32 -htsz 28" needs 90 GB of RAM and 4 hours of running time.

Part1:  -t 256 -b 164 -p 512 -w 32 -htsz 28  (search 6 keys publickey.txt)  3 keys lost
Part2:  -t 256 -b 164 -p 512 -w 31 -htsz 28  (search 6 keys publickey.txt)  1 key lost
-snip-
Many thanks, I will investigate why the keys were lost.

"-t 256 -b 164 -p 512 -w 31 -htsz 28" should be 0 keys lost.
Sorry, I gave the wrong -pke end range for the #75 public key.
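
The end range used in both logged runs, 0x80...0 with 17 trailing hex zeros, equals 2^71. Assuming "#75" refers to a key from the 75-bit range, such a key cannot lie inside that interval, which would explain the one pubkey that was never found. A quick sanity check that can be run before launching a search (plain Python; `covers` is an illustrative helper, not part of the tool):

```python
def covers(start, end, bits):
    """True if [start, end] contains every key that has exactly `bits` bits."""
    return start <= 2 ** (bits - 1) and end >= 2 ** bits - 1

start = 0x1
end_used = 1 << 71            # the -pke value from the logs above (0x8 followed by 17 zeros)
end_fixed = (1 << 75) - 1     # 0x7ffffffffffffffffff, a full 75-bit end range

print(covers(start, end_used, 75))   # can every 75-bit key be reached?
print(covers(start, end_fixed, 75))
```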



Title: Re: BSGS solver for cuda
Post by: Etar on November 12, 2021, 12:29:13 PM
-snip-
"-t 256 -b 164 -p 512 -w 31 -htsz 28" should be 0 keys lost.
Sorry, I gave the wrong -pke end range for the #75 public key.
If I understand correctly, keys are lost only in the configuration with -w 32?
With -w 31 all keys are found, correct?


Title: Re: BSGS solver for cuda
Post by: jacky19790729 on November 12, 2021, 12:49:15 PM
-snip-
"-t 256 -b 164 -p 512 -w 31 -htsz 28" should be 0 keys lost.
Sorry, I gave the wrong -pke end range for the #75 public key.
If I understand correctly, keys are lost only in the configuration with -w 32?
With -w 31 all keys are found, correct?

Yes, -w 31 finds all keys and -w 32 loses some keys.
Can you give me a file for testing?
A public key list in the 2^1 ~ 2^75 range.
I will try to find all private keys from your file.


Title: Re: BSGS solver for cuda
Post by: Etar on November 12, 2021, 01:10:33 PM
-snip-
Can you give me a file for testing?
A public key list in the 2^1 ~ 2^75 range.
I will try to find all private keys from your file.


20 keys in the 2^75 range: -pk 0x1 -pke 0x7ffffffffffffffffff
Try with -w 31, because we already know that -w 32 has a problem.
Code:
0473f38e8621417cef51fe848ba26f00a5e78ccc00852ab1c2ca56505e80a9b37810ca73afaf222a1f072ec9a7f48929c029c762f5fca422ec3e6bf1cc4589b946
04fbba0ce7fd86fb853b30db836f607ca4e221f6c803a9eb838d5a9520b2210bfd0625b1eef25758ec0072a9aa68aa4035d8aa4f276075b270cf41ec968ed9f86e
04d973d466855479d4116059c642734de4d2346a77dc31927e2862c1f38664d263bcb2b40600d6799ed09a9813d733ecedb94fb0a2eae65ebbb02b8dec60ef77f6
049723a4af59b41e095ac09ae5a3cdaf2f9b1c54ae7696f55681fc467309d0917353ea97526c7a6e560acea6a0907849ef639309b578b792c3312ee9a77b6c1817
040b4e8df661659fca212b30a4691033f362b03df397d46bab628af968ca90e2eec0663762d6ff90edcb8b4d91856488c967430a1bd04e8ecd0ae1a95271e8cc67
041df3f4dcf02db59898682b479b93e796ac59419b3c65b96a9b7ee44b08ad50ebcbc0d42eba64674d6f4ff4bf1850e7491eb91d1fc3862a462d0c0f4c1389604c
043a86e936da211ed72ab668b5ef2756e5a7fab6b3d7e9b6ccd059badaf6eb2a541df8e7dfc0aed56244453b982b6f9df5b8e003d7005400e9f710d9474e46efcf
04a096fd3aa76f05019dcb5988a0d1cef995272208f4589a36a9cfc941f745cb8e43ad34d57479eb363285252bfafccdd8f4dd100a94d569091b4f1b44a4d31903
04e39efdb3be5b61af4705cbb4ec06d5d29e6f56fb96d5888fcfcc13a734041009c1642c28ffd9ce68a62cebef80bad28be58fd9c1ca740371cb4ab5cb6e554c80
04dbb80d5ab39f6d9b09194fe8005c211333a266d470ab41d7411c0c9efb7aaa8612aca6c73296f55823d789cf7e8e7729a14ccd81c89c0096a26b28c10053ad35
043f10cf4fa0f7e019ea38d50f74c627902c50880d00b67fb9511fc5f324c37d6efd655f99308913ac4e8d8d8f2ffb46c549738931abf22092245db92655bd8404
04e1ec202deea26e3816f158d86943398e0cc97e285dcaf625a2a6f51ec68b7294eed00abedd1d3093a4e03c631c6390d35bc5b1406b0c3ed29e8f429f66f0169f
04eafa8c22fb132b31daba9d709efefcd22c7226fae8bcca2bc52cbeaf0f807df548dee5795ca99db7c4dc57f2494a96626d19c10d4f51b483f52e550eb3bca3fe
04a92c4ab0b4fdda1bffd64caf982f0811554abb7a4a90725fcd5b53a76a8e20b3918f6e63933516ecbd57ee981b086b726fd65974995624bf7ba251a1f2eeb751
047d5fca8e520358bdbc407d4c12261bb4e891b916e0e0b325a2376aea7b79dcdaead7fdd3237e10b68bc10774b8a966bb5138b8cbf03fad05b1717db3ac44fa7f
045dccbcb13a944789afa59b7b2735939b01b3f13a7eab24a5480eda98be87684629511cd67e1e64a1082dbc7444e7645a81a0ac3c01472a3dfdba949a9da9bb93
04d760337b9b5c06d1b54d5cd1156bace85beb7cbb80af7f786372cb912171dc8aaf48afd32129a052d5e75ce372d40c38fbc4334a48ed137083dca9f56b98a338
040f5f18bebe20c1d37dcb1a2f47ad3595b7d5b784dcdf55ffc8354b64a3337ba8a6434c284cd2560431ad7fac0978c81ccce36014ef3ff3daf6fe22432d4e9b32
0446a7291cef2c69deb429045dbc6938a8d8d005c8edccfe09b130467ee391a43bf1377def7f2553ab9abc4377310dcbb411dfc7d810b7bf4aa011cab919b713b2
04eecf02b33d095951eb7af031c49bda6c7dd1aad98342153a6f855986a5233610c4aa0d3a8e7e57b459db7597c71f2a3524fb798149a24c4a573f94463a6def01


Title: Re: BSGS solver for cuda
Post by: jacky19790729 on November 12, 2021, 03:37:39 PM
20 keys in the 2^75 range: -pk 0x1 -pke 0x7ffffffffffffffffff
Try with -w 31, because we already know that -w 32 has a problem.

With "-t 512 -b 164 -p 512 -w 31 -htsz 29" I tried to find keys 1~10.
I think this is the fastest configuration, and no keys were lost.

Code:
KEY[1]: 0x00000000000000000000000000000000000000000000024b831f525b5544432d
   Pub: 0273f38e8621417cef51fe848ba26f00a5e78ccc00852ab1c2ca56505e80a9b378
KEY[2]: 0x0000000000000000000000000000000000000000000007c14ea1f850a49c8188
   Pub: 02fbba0ce7fd86fb853b30db836f607ca4e221f6c803a9eb838d5a9520b2210bfd
KEY[3]: 0x0000000000000000000000000000000000000000000004bbfa418622913363a8
   Pub: 02d973d466855479d4116059c642734de4d2346a77dc31927e2862c1f38664d263
KEY[4]: 0x000000000000000000000000000000000000000000000702ecd16f5d26cd4929
   Pub: 039723a4af59b41e095ac09ae5a3cdaf2f9b1c54ae7696f55681fc467309d09173
KEY[5]: 0x00000000000000000000000000000000000000000000064fb7c2bef26c5c1d5f
   Pub: 030b4e8df661659fca212b30a4691033f362b03df397d46bab628af968ca90e2ee
KEY[6]: 0x0000000000000000000000000000000000000000000004877a165a0b08d74a74
   Pub: 021df3f4dcf02db59898682b479b93e796ac59419b3c65b96a9b7ee44b08ad50eb
KEY[7]: 0x00000000000000000000000000000000000000000000015cda7804408c3e0b21
   Pub: 033a86e936da211ed72ab668b5ef2756e5a7fab6b3d7e9b6ccd059badaf6eb2a54
KEY[8]: 0x0000000000000000000000000000000000000000000006efdfb7277b3d4b1b59
   Pub: 03a096fd3aa76f05019dcb5988a0d1cef995272208f4589a36a9cfc941f745cb8e
KEY[9]: 0x00000000000000000000000000000000000000000000036c2aa2334aa651a223
   Pub: 02e39efdb3be5b61af4705cbb4ec06d5d29e6f56fb96d5888fcfcc13a734041009
KEY[10]: 0x0000000000000000000000000000000000000000000000c07102149fc97fada1
    Pub: 03dbb80d5ab39f6d9b09194fe8005c211333a266d470ab41d7411c0c9efb7aaa86
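
The test list above was posted in uncompressed form (04 followed by X and Y), while the tool prints compressed keys (02 or 03 followed by X). The mapping is the standard SEC1 point compression; a minimal sketch in plain Python, with `compress_pubkey` as an illustrative name:

```python
def compress_pubkey(uncompressed_hex):
    """Convert an uncompressed SEC1 key (04 || X || Y, 65 bytes) to compressed form."""
    raw = bytes.fromhex(uncompressed_hex)
    if len(raw) != 65 or raw[0] != 0x04:
        raise ValueError("expected 04 || X || Y (65 bytes)")
    x, y = raw[1:33], raw[33:]
    # The prefix encodes the parity of Y: 02 for even, 03 for odd.
    prefix = b"\x03" if y[-1] & 1 else b"\x02"
    return (prefix + x).hex()

# First key of the posted test list, split into its X and Y halves for readability.
first = ("0473f38e8621417cef51fe848ba26f00a5e78ccc00852ab1c2ca56505e80a9b378"
         "10ca73afaf222a1f072ec9a7f48929c029c762f5fca422ec3e6bf1cc4589b946")
print(compress_pubkey(first))
```

The result can be compared with the `Pub:` line of KEY[1] above.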



D:\BTC\cuda_BSGS>bsgscudaHT_1_8_0.exe -d 0,1,2 -infile publickey.txt -pk 0x0000000000000000000000000000000000000000000000000000000000000001 -pke   0x0000000000000000000000000000000000000000000007ffffffffffffffffff -t 512 -b 164 -p 512 -w 31 -htsz 29
Used GPU devices #0,1,2
Will be used file: publickey.txt
Range begin: 0x0000000000000000000000000000000000000000000000000000000000000001
Range end: 0x0000000000000000000000000000000000000000000007ffffffffffffffffff
Number of GPU threads set to #512
Number of GPU blocks set to #164
Number of pparam set to #512
Items number set to 2^31
HT size number set to 2^29
APP VERSION: 1.8.0-alpha
**********************************************************************************
* This version [1.8.0-alpha] may content various bugs,                           *
* Don`t use this version for serious task.                                       *
* It is needed to test the possibility of using the -w parameter greater than 30 *
* if you accept this press ENTER to continue or close the program otherwise.     *
**********************************************************************************

Found 4 Cuda device.
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce RTX 3090 (20452.612/24575MB)
Device have: MP:82 Cores+10496
Try -t 512 -b 328 -p 930 -w 30 -htsz 28 [20442.750 MB] Gen RAM[28672 MB]
---------------
Cuda device:NVIDIA GeForce GT 1030 (1660.593/2047MB)
Device have: MP:3 Cores+192
Try -t 512 -b 12 -p 1586 -w 27 -htsz 25 [1660.125 MB] Gen RAM[3584 MB]
---------------
Current config hash[e80231b2bba0f4f89c36a597efce2061775fc7d1]
GiantSUBvalue:0000000000000000000000000000000000000000000000000000000100000000
GiantSUBpubkey: 02100f44da696e71672791d0a09b7bde459f1215a29b3c03bfefd7835b39a48db0
*******************************
Total GPU Memory Need: 16224.000Mb
*******************************
Both HT files exist
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_536870912_htGPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
Generate Giants Buffer: 42991616 items
Load BIN file:512_164_512_2147483648_g2.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
Last chunk:603979776b
[2] chunk:603979776b
Done in 00:00:01s
GPU count #3
GPU #0 launched
GPU #1 launched
GPU #2 launched
GPU #0 Free memory: 20450Mb
GPU #0 Total memory: 24575Mb
GPU #0 TotalBuff: 16224.002Mb
GPU #1 Free memory: 20450Mb
GPU #1 Total memory: 24575Mb
GPU #2 Free memory: 20450Mb
GPU #1 TotalBuff: 16224.002Mb
GPU #2 Total memory: 24575Mb
GPU #2 TotalBuff: 16224.002Mb
Load BIN file:79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798_2147483648_536870912_htCPU.BIN
[0] chunk:1073741824b
[1] chunk:1073741824b
[2] chunk:1073741824b
[3] chunk:1073741824b
[4] chunk:1073741824b
[5] chunk:1073741824b
[6] chunk:1073741824b
[7] chunk:1073741824b
[8] chunk:1073741824b
[9] chunk:1073741824b
[10] chunk:1073741824b
[11] chunk:1073741824b
[12] chunk:1073741824b
[13] chunk:1073741824b
[14] chunk:1073741824b
[15] chunk:1073741824b
[16] chunk:1073741824b
[17] chunk:1073741824b
[18] chunk:1073741824b
[19] chunk:1073741824b
Verify packed HTCPU items...ok
START RANGE= 0000000000000000000000000000000000000000000000000000000000000001
  END RANGE= 0000000000000000000000000000000000000000000007ffffffffffffffffff
WIDTH RANGE= 0000000000000000000000000000000000000000000007fffffffffffffffffe
SUBpoint= (79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798, b7c52588d95c3b9aa25b0403f1eef75702e84bb7597aabe663b82f6f04ef2777)
Save work every 180 seconds

FINDpubkey: 0273f38e8621417cef51fe848ba26f00a5e78ccc00852ab1c2ca56505e80a9b378
Cnt:2472b40000000000001 [3][ 2301 2201 2250 ] = 6753 MKeys/s x2^32=2^64.72 t:00:06:00
KEY[1]: 0x00000000000000000000000000000000000000000000024b831f525b5544432d
   Pub: 0273f38e8621417cef51fe848ba26f00a5e78ccc00852ab1c2ca56505e80a9b378
Working time 00:06:02s
GPU#2 job finished
GPU#1 job finished
GPU#0 job finished

FINDpubkey: 02fbba0ce7fd86fb853b30db836f607ca4e221f6c803a9eb838d5a9520b2210bfd
Cnt:7bfe540000000000001 [3][ 2278 2208 2209 ] = 6696 MKeys/s x2^32=2^64.71 t:00:26:26
KEY[2]: 0x0000000000000000000000000000000000000000000007c14ea1f850a49c8188
   Pub: 02fbba0ce7fd86fb853b30db836f607ca4e221f6c803a9eb838d5a9520b2210bfd
Working time 00:26:26s
GPU#0 job finished
GPU#1 job finished
GPU#2 job finished

FINDpubkey: 02d973d466855479d4116059c642734de4d2346a77dc31927e2862c1f38664d263
Cnt:4b79920000000000001 [3][ 2181 2321 2206 ] = 6709 MKeys/s x2^32=2^64.71 t:00:38:52
KEY[3]: 0x0000000000000000000000000000000000000000000004bbfa418622913363a8
   Pub: 02d973d466855479d4116059c642734de4d2346a77dc31927e2862c1f38664d263
Working time 00:38:54s
GPU#2 job finished
GPU#0 job finished
GPU#1 job finished

FINDpubkey: 039723a4af59b41e095ac09ae5a3cdaf2f9b1c54ae7696f55681fc467309d09173
Cnt:70287e0000000000001 [3][ 2104 2242 2160 ] = 6507 MKeys/s x2^32=2^64.67 t:00:57:21
KEY[4]: 0x000000000000000000000000000000000000000000000702ecd16f5d26cd4929
   Pub: 039723a4af59b41e095ac09ae5a3cdaf2f9b1c54ae7696f55681fc467309d09173
Working time 00:57:21s
GPU#0 job finished
GPU#2 job finished
GPU#1 job finished

FINDpubkey: 030b4e8df661659fca212b30a4691033f362b03df397d46bab628af968ca90e2ee
Cnt:64bac20000000000001 [3][ 2244 2248 2275 ] = 6768 MKeys/s x2^32=2^64.72 t:01:13:57
KEY[5]: 0x00000000000000000000000000000000000000000000064fb7c2bef26c5c1d5f
   Pub: 030b4e8df661659fca212b30a4691033f362b03df397d46bab628af968ca90e2ee
Working time 01:13:59s
GPU#2 job finished
GPU#1 job finished
GPU#0 job finished

FINDpubkey: 021df3f4dcf02db59898682b479b93e796ac59419b3c65b96a9b7ee44b08ad50eb
Cnt:4873b20000000000001 [3][ 2179 2240 2219 ] = 6639 MKeys/s x2^32=2^64.70 t:01:25:56
KEY[6]: 0x0000000000000000000000000000000000000000000004877a165a0b08d74a74
   Pub: 021df3f4dcf02db59898682b479b93e796ac59419b3c65b96a9b7ee44b08ad50eb
Working time 01:25:56s
GPU#2 job finished
GPU#1 job finished
GPU#0 job finished

FINDpubkey: 033a86e936da211ed72ab668b5ef2756e5a7fab6b3d7e9b6ccd059badaf6eb2a54
Cnt:1588420000000000001 [3][ 2249 2147 2178 ] = 6574 MKeys/s x2^32=2^64.68 t:01:29:29
KEY[7]: 0x00000000000000000000000000000000000000000000015cda7804408c3e0b21
   Pub: 033a86e936da211ed72ab668b5ef2756e5a7fab6b3d7e9b6ccd059badaf6eb2a54
Working time 01:29:31s
GPU#2 job finished
GPU#0 job finished
GPU#1 job finished

FINDpubkey: 03a096fd3aa76f05019dcb5988a0d1cef995272208f4589a36a9cfc941f745cb8e
Cnt:6ec2b40000000000001 [3][ 2251 2264 2300 ] = 6816 MKeys/s x2^32=2^64.73 t:01:47:46
KEY[8]: 0x0000000000000000000000000000000000000000000006efdfb7277b3d4b1b59
   Pub: 03a096fd3aa76f05019dcb5988a0d1cef995272208f4589a36a9cfc941f745cb8e
Working time 01:47:48s
GPU#1 job finished
GPU#2 job finished
GPU#0 job finished

FINDpubkey: 02e39efdb3be5b61af4705cbb4ec06d5d29e6f56fb96d5888fcfcc13a734041009
Cnt:36a4b00000000000001 [3][ 2221 2185 2064 ] = 6471 MKeys/s x2^32=2^64.66 t:01:56:48
KEY[9]: 0x00000000000000000000000000000000000000000000036c2aa2334aa651a223
   Pub: 02e39efdb3be5b61af4705cbb4ec06d5d29e6f56fb96d5888fcfcc13a734041009
Working time 01:56:49s
GPU#1 job finished
GPU#2 job finished
GPU#0 job finished

FINDpubkey: 03dbb80d5ab39f6d9b09194fe8005c211333a266d470ab41d7411c0c9efb7aaa86
Cnt:bd5320000000000001 [3][ 2260 2256 2326 ] = 6843 MKeys/s x2^32=2^64.74 t:01:58:46
KEY[10]: 0x0000000000000000000000000000000000000000000000c07102149fc97fada1
    Pub: 03dbb80d5ab39f6d9b09194fe8005c211333a266d470ab41d7411c0c9efb7aaa86
Working time 01:58:47s
GPU#0 job finished
GPU#2 job finished
GPU#1 job finished

FINDpubkey: 023f10cf4fa0f7e019ea38d50f74c627902c50880d00b67fb9511fc5f324c37d6e
Cnt:162a560000000000001 [3][ 2247 2071 2252 ] = 6571 MKeys/s x2^32=2^64.68 t:02:02:27



Title: Re: BSGS solver for cuda
Post by: demoinvest1 on November 13, 2021, 01:27:35 AM

You need PureBasic v5.31 because ASCII mode must be enabled.
ASCII mode was removed in newer versions of PB.

Thank you.
I have now switched to PureBasic v5.30, but there is a problem: where can I find cuda.lib?

POLINK: fatal error: File not found lib\cuda.lib


Title: Re: BSGS solver for cuda
Post by: demoinvest1 on November 13, 2021, 02:00:49 AM

Thank you.
I have now switched to PureBasic v5.30, but there is a problem: where can I find cuda.lib?

POLINK: fatal error: File not found lib\cuda.lib

OK, I have already solved my problem.
I took cuda.lib from the NVIDIA CUDA 10 installation:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\lib\x64


Title: Re: BSGS solver for cuda
Post by: citb0in on November 15, 2022, 08:47:43 AM
Has any progress been made with the BSGS solver for CUDA in the meantime? I just stumbled over this old post and tried to use it, but without success. I downloaded PureBasic from the link at the bottom of Etayson's GitHub repository (https://github.com/Etayson/BSGS-cuda); however, the free version available for download on www.purebasic.com is a demo limited to a few thousand lines of code, so the loaded PureBasic file will not run. OP said that we need PureBasic v5.31, but I cannot find the full version 5.31 on the webpage. Can anyone point me to a working download link for 5.31 for Linux x64, please?

Is the BSGS solver obsolete by now, and are there better tools that you would suggest? I am only aware of Keyhunt's BSGS mode, which runs on CPU threads. A CUDA version would be nice to test and would hopefully achieve a higher rate.

@Etar, are you even reading this anymore? Maybe under a different username? If so, please reply.

I am trying to compile your program <bsgscudaussualHTchangeble1_7_3.pb> with PureBasic v5.31 under Linux. Unfortunately, I have not succeeded. On the first try I got this error message:

Code:
$ pbcompiler ./bsgscudaussualHTchangeble1_7_3.pb
Quote
******************************************
PureBasic 5.31 (Linux - x64)
******************************************

Loading external modules...
Starting compilation...
Starting compilation...
Error: Line 2 - File not found (~/BSGS-cuda/./Curve64.pb).

This one was easy to fix: I just had to replace the backslash with a forward slash in line 2 of your program. I guess you were using Windows, where folders are separated by '\' instead of the '/' used on Linux.

Quote
IncludeFile "lib/Curve64.pb"

But on the next try I got an error indicating that cuda.lib was not found. I searched for this file but was not able to find it, not even under my CUDA installation in /usr/local/cuda*; there is absolutely no such file on a Linux system. Where do we find this file? I was able to find a similar file and thought I would give it a try:

Code:
cp /usr/local/cuda-11.8/targets/x86_64-linux/lib/stubs/libcuda.so ~/BSGS-cuda/lib/

Then I replaced line 42 with:
Quote
Import "lib/libcuda.so"

but the compiler still fails, see here:

Code:
$ pbcompiler ./bsgscudaussualHTchangeble1_7_3.pb

Quote
******************************************
PureBasic 5.31 (Linux - x64)
******************************************

Loading external modules...
Starting compilation...
Starting compilation...
Including source: lib/Curve64.pb
10273 lines processed.
Creating the executable.
Error: Linker
/usr/bin/ld: purebasic.o: warning: relocation in read-only section `.text'
/usr/bin/ld: purebasic.o: relocation R_X86_64_PC32 against symbol `exit@@GLIBC_2.2.5' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status

If anyone else here is reading along and can help, I am of course also very grateful for helpful tips and recommendations. Any help appreciated. Thank you


Title: Re: BSGS solver for cuda
Post by: raschwarz on January 11, 2024, 03:36:43 PM
Has any progress been made with the BSGS solver for CUDA in the meantime? I just stumbled over this old post and tried to use it, but without success. I downloaded PureBasic from the link at the bottom of Etayson's GitHub repository (https://github.com/Etayson/BSGS-cuda); however, the free version available for download on www.purebasic.com is a demo limited to a few thousand lines of code, so the loaded PureBasic file will not run. OP said that we need PureBasic v5.31, but I cannot find the full version 5.31 on the webpage. Can anyone point me to a working download link for 5.31 for Linux x64, please?

Is the BSGS solver obsolete by now, and are there better tools that you would suggest? I am only aware of Keyhunt's BSGS mode, which runs on CPU threads. A CUDA version would be nice to test and would hopefully achieve a higher rate.

@Etar, are you even reading this anymore? Maybe under a different username? If so, please reply.

I am trying to compile your program <bsgscudaussualHTchangeble1_7_3.pb> with PureBasic v5.31 under Linux. Unfortunately, I have not succeeded. On the first try I got this error message:

Code:
$ pbcompiler ./bsgscudaussualHTchangeble1_7_3.pb
Quote
******************************************
PureBasic 5.31 (Linux - x64)
******************************************

Loading external modules...
Starting compilation...
Starting compilation...
Error: Line 2 - File not found (~/BSGS-cuda/./Curve64.pb).

This one was easy to fix: I just had to replace the backslash with a forward slash in line 2 of your program. I guess you were using Windows, where folders are separated by '\' instead of the '/' used on Linux.

Quote
IncludeFile "lib/Curve64.pb"

But on the next try I got an error indicating that cuda.lib was not found. I searched for this file but was not able to find it, not even under my CUDA installation in /usr/local/cuda*; there is absolutely no such file on a Linux system. Where do we find this file? I was able to find a similar file and thought I would give it a try:

Code:
cp /usr/local/cuda-11.8/targets/x86_64-linux/lib/stubs/libcuda.so ~/BSGS-cuda/lib/

then I replaced line 42 by:
Quote
Import "lib/libcuda.so"

but the compiler still fails, see here:

Code:
$ pbcompiler ./bsgscudaussualHTchangeble1_7_3.pb

Quote
******************************************
PureBasic 5.31 (Linux - x64)
******************************************

Loading external modules...
Starting compilation...
Starting compilation...
Including source: lib/Curve64.pb
10273 lines processed.
Creating the executable.
Error: Linker
/usr/bin/ld: purebasic.o: warning: relocation in read-only section `.text'
/usr/bin/ld: purebasic.o: relocation R_X86_64_PC32 against symbol `exit@@GLIBC_2.2.5' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status

If anyone else here is reading along and can help, I am of course also very grateful for helpful tips and recommendations. Any help appreciated. Thank you



I tried to switch off PIE by adding the code below at the start of the source. With that, the program compiles and an executable is built for Linux, but running it ends with "Illegal instruction (core dumped)".

Code:
Import "-no-pie"
endimport

I am currently trying to debug it, but I have not gotten very far. No success yet.


Title: Re: BSGS solver for cuda
Post by: citb0in on January 12, 2024, 07:47:04 AM
Thanks for your feedback. Please let us know your findings and hopefully you will have success. I tried a lot but finally gave up without a satisfying result. However I am still interested in testing this tool.

Wish you best of luck
citb0in


Title: Re: BSGS solver for cuda
Post by: greenAlien on March 12, 2024, 02:25:14 PM
Congrats on your software @etar

It would be nice to have this program for Linux, or at least a binary that creates the HT, so we could rent a Linux server with a large amount of RAM to build the HT and then download it to our own computers.


Title: Re: BSGS solver for cuda
Post by: GTX1060x2 on March 16, 2024, 02:54:27 PM
I successfully compiled it for Linux, but the program just closes without an error. Has anyone been able to solve this?
Code:
# ./onlygen1_9_6File -t 256 -b 96 -p 506 -w 30 -htsz 29
Number of GPU threads set to #256
Number of GPU blocks set to #96
Number of pparam set to #506
Items number set to 2^30=1073741824
HT size set to 2^29
initHTsize #1
APP VERSION: 1.9.6File-onlygen0
**********************************************************************************
 This version [1.9.6File-onlygen0] may content various bugs,
 Don`t use this version for serious task.
 It is needed to test the possibility of using the -w parameter greater than 30
 if you accept this press ENTER to continue or close the program otherwise.
**********************************************************************************


Title: Re: BSGS solver for cuda
Post by: citb0in on March 16, 2024, 03:45:22 PM
if you provide some instructions on how to compile on Linux I might look into it.


Title: Re: BSGS solver for cuda
Post by: GTX1060x2 on March 16, 2024, 05:43:28 PM
if you provide some instructions on how to compile on Linux I might look into it.
Code:
apt install build-essential gcc g++ libxxf86vm-dev libxine2-dev unixodbc-dev libsdl1.2-dev libsdl2-dev libssl-dev libgtk2.0-dev libgtk-3-dev libwebkit2gtk-4.0-dev libvlc-dev 
apt install libcudart9.1
export PUREBASIC_HOME=~/purebasic && export PATH=$PUREBASIC_HOME/compilers:$PATH

Replace lib\ with lib/ in the source,
and add this:
Code:
Import "-no-pie"
endimport

Code:
root@vm:~/purebasic/compilers# cat /etc/lsb-release | grep -i release
DISTRIB_RELEASE=18.04
root@vm:~/purebasic/compilers# ln -s /usr/lib/x86_64-linux-gnu/libcuda.so BSGS-cuda/lib/cuda.lib
root@vm:~/purebasic/compilers# ./pbcompiler BSGS-cuda/1_9_7File.pb -e 1_9_7File

******************************************
PureBasic 5.31 (Linux - x64)
******************************************

Loading external modules...
Starting compilation...
Starting compilation...
Including source: lib/Curve64.pb
27392 lines processed.
Creating the executable.

- Feel the ..PuRe.. Power -

root@vm:~/purebasic/compilers# ./1_9_7File -h
 -t      Number of GPU threads, default 256
 -b      Number of GPU blocks, default 132
 -p      Number of pparam, default 400
 -d      Select GPU IDs, default
-pb      Set single uncompressed/compressed pubkey for searching
-pk      Range start from , default 0x01
-pke     End range
-w       Set number of baby items 2^ or decimal representation
-htsz    Set number of HashTable 2^ , default 25
    Recommendation:
    with htsz 27 value -w should be less Or equil To 1331331443 Or 2^30.310222637591963
    with htsz 28 value -w should be less or equil to 1777178603 Or 2^30.726941530690112
    with htsz 29 value -w should be less or equil to 3069485950 Or 2^31.515349920643907
    with htsz 30 value -w should be less or equil to 3069485951 Or 2^31.515349920643907
    with htsz 31 value -w should be less or equil to 3069485951 Or 2^31.515349920643907
-infile  Set file with pubkey for searching in uncompressed/compressed  format (search sequential)
-wl      Set recovery file from which the state will be loaded
-wt      Set timer for autosaving current state, default every 180seconds

Do not use Ubuntu 22.


Title: Re: BSGS solver for cuda
Post by: citb0in on March 17, 2024, 08:18:55 AM
where to download pb ?


Title: Re: BSGS solver for cuda
Post by: greenAlien on March 17, 2024, 10:01:19 AM
Thanks GTX1060x2!

I will take a look too!

where to download pb ?

From here: https://github.com/Etayson/BSGS-cuda/blob/main/onlygen1_9_6File.pb


Title: Re: BSGS solver for cuda
Post by: greenAlien on March 23, 2024, 01:07:34 PM
I successfully compiled it for Linux, but the program just closes without an error. Has anyone been able to solve this?
Code:
# ./onlygen1_9_6File -t 256 -b 96 -p 506 -w 30 -htsz 29
Number of GPU threads set to #256
Number of GPU blocks set to #96
Number of pparam set to #506
Items number set to 2^30=1073741824
HT size set to 2^29
initHTsize #1
APP VERSION: 1.9.6File-onlygen0
**********************************************************************************
 This version [1.9.6File-onlygen0] may content various bugs,
 Don`t use this version for serious task.
 It is needed to test the possibility of using the -w parameter greater than 30
 if you accept this press ENTER to continue or close the program otherwise.
**********************************************************************************

I have spent some time compiling BSGS, the onlyGen file and BSGS-fractions on Linux with your instructions. The compilation was successful for all of them; however, when I run the binaries they display the initial text and then close without any further output.
What are we missing here?

This is an example of the execution. Don't take the arguments seriously, since it was just a test:

Code:
vboxuser@ubuntupurebasic:~/purebasic/compilers$ ./generateHT  -t 256 -b 96 -p 506 -w 30  -pk 8000000000000000 -pke ffffffffffffffff -pb 03100611c54dfef604163b8358f7b7fac13ce478e02cb224ae16d45526b25d9d4d -htsz 28
Number of GPU threads set to #256
Number of GPU blocks set to #96
Number of pparam set to #506
Items number set to 2^30=1073741824
Range begin: 0x8000000000000000
Range end: 0xffffffffffffffff
Pubkey set to 03100611c54dfef604163b8358f7b7fac13ce478e02cb224ae16d45526b25d9d4d
HT size set to 2^28
initHTsize #1
APP VERSION: 1.9.6File-onlygen0
**********************************************************************************
 This version [1.9.6File-onlygen0] may content various bugs,                          
 Don`t use this version for serious task.                                      
 It is needed to test the possibility of using the -w parameter greater than 30
 if you accept this press ENTER to continue or close the program otherwise.    
**********************************************************************************
Same behavior for the bsgs and the bsgs-fractions. Maybe @etar can help us here ?
I have also rented a Linux server and tried to run the binaries there, just in case they didn't work on my Linux machine because of GPU or RAM issues, but I got the same results... :(


Title: Re: BSGS solver for cuda
Post by: Cricktor on March 24, 2024, 01:47:23 PM
where to download pb ?

From here: https://github.com/Etayson/BSGS-cuda/blob/main/onlygen1_9_6File.pb

I believe citb0in is asking where to download the PureBasic compiler itself, not Etar's PureBasic source code file(s). I haven't searched myself, but it shouldn't be rocket science to find the needed version of the PureBasic compiler for Linux with the help of internet search engines and/or Linux package search sites.


Maybe @etar can help us here ?

You might have better luck if you addressed OP with the correct spelling of his username, @Etar, but you may still be out of luck because Etar was last active on this forum around July 23rd, 2023.


Title: Re: BSGS solver for cuda
Post by: citb0in on March 24, 2024, 02:27:26 PM
I believe citb0in is asking where to download the PureBasic compiler itself, not Etar's PureBasic source code file(s). I haven't searched myself, but it shouldn't be rocket science to find the needed version of the PureBasic compiler for Linux with the help of internet search engines and/or Linux package search sites.

That was my question, absolutely. The source code is available, but I wasn't able to find a working and usable PureBasic installation source.


Title: Re: BSGS solver for cuda
Post by: WanderingPhilospher on March 25, 2024, 02:07:34 AM
I believe citb0in is asking where to download the PureBasic compiler itself, not Etar's PureBasic source code file(s). I haven't searched myself, but it shouldn't be rocket science to find the needed version of the PureBasic compiler for Linux with the help of internet search engines and/or Linux package search sites.

That was my question, absolutely. The source code is available, but I wasn't able to find a working and usable PureBasic installation source.

Really, none of y’all could find it?

https://www.purebasic.com/pricing.php

Yes, you have to pay for it. Once you pay for it, you can download any new or legacy version, for Windows and/or Linux. You will need the one that Etar mentions in his PB code.


Title: Re: BSGS solver for cuda
Post by: greenAlien on March 25, 2024, 09:04:31 AM
I believe citb0in is asking where to download the PureBasic compiler itself, not Etar's PureBasic source code file(s). I haven't searched myself, but it shouldn't be rocket science to find the needed version of the PureBasic compiler for Linux with the help of internet search engines and/or Linux package search sites.

That was my question, absolutely. The source code is available, but I wasn't able to find a working and usable PureBasic installation source.

Really, none of y’all could find it?

https://www.purebasic.com/pricing.php

Yes, you have to pay for it. Once you pay for it, you can download any new or legacy version, for Windows and/or Linux. You will need the one that Etar mentions in his PB code.

Exactly, you can buy it or... you can just search the internet...

Does anyone have any clue why there is no output when running the binaries after compiling on Linux? The binaries just close by themselves ???


Title: Re: BSGS solver for cuda
Post by: anjilite7 on May 21, 2024, 09:59:43 AM
so the speed is around 300MKeys, wheres my exakeys? ;D

Cnt:b8c6800000000001 [1][ 316 ] = 316 MKeys/s x2^27.0=2^55.31 Jt:00:05:00 Tt:00:05:03


Title: Re: BSGS solver for cuda
Post by: CY4NiDE on May 21, 2024, 10:40:49 PM
so the speed is around 300MKeys, wheres my exakeys? ;D

Cnt:b8c6800000000001 [1][ 316 ] = 316 MKeys/s x2^27.0=2^55.31 Jt:00:05:00 Tt:00:05:03


It says  316Mk/s  x  2^27.0  =  2^55.31

2^55.31  =  44665177000000000

So this is your effective speed: about 44.7 Pk/s (roughly 0.045 Ek/s).
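
The conversion can be checked with a few lines of Python. This is only a sketch of the arithmetic on the numbers printed in the status line; the variable names are made up for illustration, and nothing here is taken from the program's source.

```python
import math

reported_mkeys = 316      # raw rate from the status line, in MKeys/s
multiplier_log2 = 27.0    # the "x2^27.0" factor from the status line
displayed_log2 = 55.31    # the effective rate exponent printed by the program

# Effective rate recomputed from the two rounded values on the status line.
recomputed = reported_mkeys * 1e6 * 2 ** multiplier_log2
recomputed_log2 = math.log2(recomputed)

# Effective rate implied by the exponent the program displays.
displayed = 2 ** displayed_log2

print(f"recomputed: 2^{recomputed_log2:.2f} = {recomputed / 1e15:.1f} Pkeys/s")
print(f"displayed:  2^{displayed_log2:.2f} = {displayed / 1e15:.1f} Pkeys/s")
```

Multiplying the rounded 316 MKeys/s by 2^27 gives a slightly lower exponent than the 2^55.31 shown in the status line. Presumably the program computes the exponent from an unrounded or differently averaged speed, but that is a guess. Either way the effective rate is in the tens of Pkeys/s, well short of an exakey per second.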


Title: Re: BSGS solver for cuda
Post by: mahurovihamilo on May 23, 2024, 05:30:09 PM
Hi there,

Is there a "version" of this for Ubuntu? Thanks.