Bitcoin Forum
November 10, 2024, 07:27:43 PM *
Author Topic: BSGS solver for cuda  (Read 3910 times)
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 635
Merit: 312


View Profile
October 20, 2021, 06:19:11 AM
 #101


1 > An 80 key, not 80 keys [a single key] [random mode at 4.7 Ekeys/sec] [4300000000000000000 keys/sec] [3BACAB37B62E0000 keys/sec] [whole 65 range in 1 sec]. Now compare with bsgscuda, with the reference key in the range 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000:49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5effffffffffffffff. Let me guess: a 3080 with full optimization took around 17 seconds, but keyhunt took just 1 second. I even had to reduce the k and n values to reduce the speed for this  Grin.
2 > Do your research and then find out how many keys you get when applying a 2^40 divisor to 120 [lol]. If you load 2 keys, you will halve keyhunt's speed, so what about a billion keys? The speed will be just like your mind processing to understand my answer.
 
4300000000000000000 is about 2^61.9, so the whole 65 range (I think you mean puzzle #65, with a range of 2^64) needs about 4.29 seconds.
I don't have a 3080 card, but I think the speed will be around 1400 Mkeys/s x BabyArraySize.
Windows 10 eats 20% of GPU memory, so a 3080 should have 8192 MB of free memory, so we can use -w 30.
In total, 1400 Mkeys/s = 2^30.38 and the baby array x2 = 2^31, so full performance = 2^61.38, and checking the full 2^64 needs 6.14 s.
Only Kangaroo can solve keys faster than bsgs or keyhunt or whatever.
BSGS cuda was created only because I didn't find a bsgs for GPU (maybe it is a useless app, I don't know).
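
The arithmetic above can be rechecked with a few lines of Python. This is only a sketch of the estimate, not a benchmark: the helper name `scan_seconds` is made up for illustration, and the 1400 Mkeys/s figure is the guess from this post, not a measured 3080 speed.

```python
import math

RANGE_BITS = 64  # puzzle #65 spans 2^64 keys


def scan_seconds(keys_per_second, range_bits=RANGE_BITS):
    """Seconds needed to sweep 2^range_bits keys at a constant rate."""
    return 2 ** range_bits / keys_per_second


keyhunt_rate = 4_300_000_000_000_000_000   # rate claimed for keyhunt
bsgs_rate = 1_400_000_000 * 2 ** 31        # guessed 1400 Mkeys/s x doubled 2^30 baby array

for name, rate in (("keyhunt", keyhunt_rate), ("bsgs cuda (estimate)", bsgs_rate)):
    print(f"{name}: 2^{math.log2(rate):.2f} keys/s, "
          f"{scan_seconds(rate):.2f} s for 2^{RANGE_BITS}")
```

Both rates end up within one power of two of each other, which is why the sweep times differ by seconds rather than by the factor of 17 guessed in the quote.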
ssxb
Jr. Member
*
Offline Offline

Activity: 81
Merit: 2


View Profile
October 20, 2021, 07:38:15 AM
Last edit: October 20, 2021, 01:42:40 PM by ssxb
 #102


1 > An 80 key, not 80 keys [a single key] [random mode at 4.7 Ekeys/sec] [4300000000000000000 keys/sec] [3BACAB37B62E0000 keys/sec] [whole 65 range in 1 sec]. Now compare with bsgscuda, with the reference key in the range 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000:49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5effffffffffffffff. Let me guess: a 3080 with full optimization took around 17 seconds, but keyhunt took just 1 second. I even had to reduce the k and n values to reduce the speed for this  Grin.
2 > Do your research and then find out how many keys you get when applying a 2^40 divisor to 120 [lol]. If you load 2 keys, you will halve keyhunt's speed, so what about a billion keys? The speed will be just like your mind processing to understand my answer.
 
4300000000000000000 is about 2^61.9, so the whole 65 range (I think you mean puzzle #65, with a range of 2^64) needs about 4.29 seconds.
I don't have a 3080 card, but I think the speed will be around 1400 Mkeys/s x BabyArraySize.
Windows 10 eats 20% of GPU memory, so a 3080 should have 8192 MB of free memory, so we can use -w 30.
In total, 1400 Mkeys/s = 2^30.38 and the baby array x2 = 2^31, so full performance = 2^61.38, and checking the full 2^64 needs 6.14 s.
Only Kangaroo can solve keys faster than bsgs or keyhunt or whatever.
BSGS cuda was created only because I didn't find a bsgs for GPU (maybe it is a useless app, I don't know).

I am not arguing with your math, but if you have the time and hardware, please try to do some research on keyhunt [CPU + memory]. By the way, I appreciate your CUDA programming skills; they are really impressive, and I hope some day you will enhance the program further to take on 120. I also know one guy who is running it at 9+ Ekeys/sec [yoyodapro].

But with a divisor you only get 1 key out of 1073741824 if you want to reach 90 bits.
I loaded all the keys in keyhunt and I am trying my luck, but on the other side I was hoping we could figure out how to load multiple keys with cudabsgs. I would keep my 3080 busy with that, as it is just sitting idle now.

ssxb
Jr. Member
*
Offline Offline

Activity: 81
Merit: 2


View Profile
October 20, 2021, 07:56:11 AM
 #103

Quote
you got big mouth but less sense and knowledge  Grin

i hate to tell you that grow up your knowledge & perhaps things will get more clear.

1 > An 80 key, not 80 keys [a single key] [random mode at 4.7 Ekeys/sec] [4300000000000000000 keys/sec] [3BACAB37B62E0000 keys/sec] [whole 65 range in 1 sec]. Now compare with bsgscuda, with the reference key in the range 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000:49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5effffffffffffffff. Let me guess: a 3080 with full optimization took around 17 seconds, but keyhunt took just 1 second. I even had to reduce the k and n values to reduce the speed for this  Grin.
2 > Do your research and then find out how many keys you get when applying a 2^40 divisor to 120 [lol]. If you load 2 keys, you will halve keyhunt's speed, so what about a billion keys? The speed will be just like your mind processing to understand my answer.
Your English reading or comprehension is less sense and knowledge.

I never said anything about 80 keys...you are saying you found 80 key, I took that as a single key in an 80 bit range, not 80 keys because you did not pluralize the word key. So with that, I merely said instead of trying to get someone to reprogram BSGS Cuda for multi key, run keyhunt, since it already supports multi key and if you think it is faster, then break up 120 key into however many keys you want to, 2^5, 2^20, 2^40, or however many you want to and let that program eat.  I said 2^40 specifically because you said an 80 key in a blink of an eye; so 2^120/2^40 = 2^80; if you found one 80 key in a blink of an eye, maybe you find the 120 key in 80 bit range in 2 blinks of an eye.

BSGS Cuda, can find 65 bit key in less than a second, it all depends on your hardware.

you say
Quote
if you will load 2 keys, you will make keyhunt speed half
the same will happen to BSGS Cuda; so I am not sure what your point is really.



maybe you find the 120 key in 80 bit range in 2 blinks of an eye.   


OK, learn the basics of the divisor, bro. If you divide by 32, only one key will be 5 bits lower in range, at an unknown position; all the others will be in the upper bit ranges, at exactly the same distance from their reference values.

Now if you divide by 2^40, you will have 2^40 = 1099511627776 reference values in the 256-bit range, and only one key will be 40 bits lower (in the 80-bit range); all the other keys will be in the upper bits, at an exact distance from their respective reference values.
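
The divisor idea described here can be illustrated on plain scalars. This is a hedged sketch, not any particular divisor script: `divisor_candidates` and the sample key are made up for illustration, and a real tool would apply the same arithmetic to public keys, computing (P - i*G) * d^-1 without knowing the private key. Only the group order `N` of secp256k1 is a fixed fact.

```python
# Group order of secp256k1.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def divisor_candidates(key, divisor):
    """All values (key - i) / divisor mod N for i = 0 .. divisor-1."""
    inv = pow(divisor, -1, N)  # modular inverse, Python 3.8+
    return [((key - i) * inv) % N for i in range(divisor)]


key = (1 << 119) + 123456789       # hypothetical 120-bit private key
divisor = 32                        # 2^5, so the target is 5 bits lower

cands = divisor_candidates(key, divisor)
small = [c for c in cands if c.bit_length() <= 120 - 5]

print(len(cands), "candidates,", len(small), "of them in the 115-bit range")
# The small candidate is the true quotient; the remainder is its index.
print(small[0] * divisor + key % divisor == key)
```

Only the candidate whose index equals `key % divisor` divides exactly. Every other candidate differs from it by a multiple of the inverse of the divisor, which is why those land high up in the 256-bit range.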

Now how the hell can you work with such a large number of keys? And your line about taking 2^40 was an aggressive comment made without knowing my intention.

My intention is this: I already applied a divisor of 32, loaded the keys in keyhunt, and it is running right now, so I know how much speed and power I am getting from that. I just don't know the power of BSGScuda if I load 32 keys in parallel. So I asked Etar whether he can make that possible; who knows, BSGS may outperform keyhunt.

Now to the point: in the post above, Etar said that his program is good up to 80 bits and that above that one should use JLP's Kangaroo. So I was comparing it with Alberto's BSGS, and I found that CPU-based BSGS is more powerful than a 3080 if you have well-specified hardware, while BSGScuda is better than keyhunt [CPU] if you don't have enough CPU power and memory.
ssxb
Jr. Member
*
Offline Offline

Activity: 81
Merit: 2


View Profile
October 20, 2021, 08:05:42 AM
 #104

@Etar  Huh

I seriously believe there should be some way to use the power of the GPU cores and process all of BSGS inside computer memory. Perhaps this will give some crazy power that has never been discovered, or there will be a bottleneck, but you can confirm that once you build such a program.

Assume you have the power of keyhunt and then you put the bloom filter on an SSD [7000+ MB/s read/write speed, Gen4].

RAM      bpfile elements   bpfile size   bloom size
8 GB     1000000000        32 GB         5.02 GB
32 GB    5000000000        160 GB        25.11 GB
128 GB   22000000000       704 GB        110.47 GB
500 GB   90000000000       2.9 TB        451.92 GB

Based on the table above, you can increase speed if you use both bloom + bp: https://github.com/iceland2k14/bsgs

CPU cores are less powerful than CUDA, so I was thinking [not sure whether it is possible] that if we load all of bp in RAM and use some bloom in GPU memory, perhaps there will be a dramatic speed boost.
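
As a rough consistency check of the table, the per-element cost of each row can be computed. This is a sketch under the assumption that GB and TB are decimal units (10^9 and 10^12 bytes); the table does not say which convention it uses.

```python
# (RAM in GB, bpfile elements, bpfile size in GB, bloom size in GB)
rows = [
    (8, 1_000_000_000, 32.0, 5.02),
    (32, 5_000_000_000, 160.0, 25.11),
    (128, 22_000_000_000, 704.0, 110.47),
    (500, 90_000_000_000, 2900.0, 451.92),
]

per_element = []
for ram_gb, elements, bp_gb, bloom_gb in rows:
    bp_bytes = bp_gb * 1e9 / elements            # bytes of bpfile per element
    bloom_bits = bloom_gb * 1e9 * 8 / elements   # bloom filter bits per element
    per_element.append((bp_bytes, bloom_bits))
    print(f"{ram_gb:>4} GB RAM: {bp_bytes:.1f} B/element in bpfile, "
          f"{bloom_bits:.1f} bloom bits/element")
```

Under that assumption every row comes out at roughly 32 bytes per bpfile element and about 40 bloom bits per element, so the table scales linearly with the element count.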

bigvito19
Full Member
***
Offline Offline

Activity: 711
Merit: 111


View Profile
October 20, 2021, 08:48:29 AM
 #105

What's the link to the divisor script?

and how many keys can I generate with the divisor?
NotATether
Legendary
*
Offline Offline

Activity: 1778
Merit: 7372



View Profile WWW
October 20, 2021, 09:55:44 AM
 #106

What's the link to the divisor script?

and how many keys can I generate with the divisor?

If you mean the one I made, it's in the Kangaroo thread, anywhere from pages 90 to 100 I think.



I think we can cut the number of baby steps made if we take into account that the correct baby step amount is going to be random-looking (in other words, no long 0 or 1 sequences).

Or at least make the baby steps take a higher bit count, decreasing the number of giant steps.

I'm thinking that we can find the numbers represented by these random bits and then calculate their multiples to use as an incrementor... not perfect but it does the trick I guess.

E.g.

5 is 101, 10 is 1010, 15 is 1111, 20 is 10100, 25 is 11001, 30 is 11110, ..... etc.

Special care would need to be taken to choose a number whose multiples don't make long sequences of bits, like 15: 3*5

I don't think that this randomness has any correlation to primality of numbers (or inverse correlation to it).

bigvito19
Full Member
***
Offline Offline

Activity: 711
Merit: 111


View Profile
October 20, 2021, 01:28:20 PM
 #107

I'm testing with the divisor keys on a smaller range, but it's not solving the key with keyhunt. Does it work the same with xpoint mode?
ssxb
Jr. Member
*
Offline Offline

Activity: 81
Merit: 2


View Profile
October 20, 2021, 01:46:08 PM
 #108

I'm testing with the divisor keys on a smaller range, but it's not solving the key with keyhunt. Does it work the same with xpoint mode?

You need to adjust K and N: a smaller range will not be solved if the power of K and N is more than the range count, or if the number of keys is more or less than your hardware can handle.

Remember that tweaking is seriously needed: keep K and N in line with your hardware power, and also adjust K and N to the number of keys you load in the software. Do the test again and again and again.
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 635
Merit: 312


View Profile
October 20, 2021, 03:54:04 PM
 #109

With the latest PTX optimisation (I had forgotten about symmetry in batch point addition),
it solves 16 pubkeys from JLP in 58s
Code:
...
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:0000000000000000000000000000000000000000000000004673f00000000001 1121MKey/s x1073741824 2^30.13 x2^31=2^61.13
***********GPU#0************
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
Pub: 55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:58s
Total time 00:06:33s
GPU#0 thread finished
cuda finished ok

Press Enter to exit
This seems to be the maximum that I can achieve on a single 2080 Ti.
Of course JLP would probably have done it even faster Smiley
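
The exponents in the log line above can be reproduced as follows. This is a sketch under one assumption: the printed 2^30.13 only matches 1121 MKey/s if "MKey" is counted in units of 2^20 keys, not 10^6. The helper name `effective_log2` is made up for illustration.

```python
import math

MKEY = 2 ** 20  # assumption: the log counts MKey in units of 2^20 keys


def effective_log2(mkeys_per_second, giant_step_bits=31):
    """Return (log2 of the raw rate, log2 of the rate times the 2^31 step)."""
    base = math.log2(mkeys_per_second * MKEY)
    return base, base + giant_step_bits


base, total = effective_log2(1121)
print(f"2^{base:.2f} x 2^31 = 2^{total:.2f}")

# With decimal mega (10^6) the same rate would print a lower exponent.
print(f"decimal reading: 2^{math.log2(1121 * 10 ** 6):.2f}")
```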
studyroom1
Jr. Member
*
Offline Offline

Activity: 40
Merit: 7


View Profile
October 21, 2021, 08:58:37 AM
 #110

With the latest PTX optimisation (I had forgotten about symmetry in batch point addition),
it solves 16 pubkeys from JLP in 58s
Code:
...
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:0000000000000000000000000000000000000000000000004673f00000000001 1121MKey/s x1073741824 2^30.13 x2^31=2^61.13
***********GPU#0************
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
Pub: 55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:58s
Total time 00:06:33s
GPU#0 thread finished
cuda finished ok

Press Enter to exit
This seems to be the maximum that I can achieve on a single 2080 Ti.
Of course JLP would probably have done it even faster Smiley



Impressive, Etar. I have a question. Let's say you have 1M keys in a file, you load it in bsgscuda and set the scan range to only 64 bits. If the GPU finishes scanning the whole 64-bit range for key1, will it abandon the search for key1 and move on to key2?

Is your program already doing that, or will you implement it?
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 635
Merit: 312


View Profile
October 21, 2021, 10:11:16 AM
 #111

Impressive, Etar. I have a question. Let's say you have 1M keys in a file, you load it in bsgscuda and set the scan range to only 64 bits. If the GPU finishes scanning the whole 64-bit range for key1, will it abandon the search for key1 and move on to key2?

Is your program already doing that, or will you implement it?
Use -pk to set the start of the range and -pke to set the end of the range. If the pubkey is not found in this range, searching switches to the next pubkey.
lostrelic
Jr. Member
*
Offline Offline

Activity: 32
Merit: 1


View Profile
October 21, 2021, 10:22:37 AM
 #112

Hi Etar, thanks for your continuing support for this program.
Quick question: the fastest I get is 2^60; if I try to get 2^61 it sticks on "add baby points to hashtable". I’ve got a 3080, 16 GB RAM and a 500 GB SSD. Any ideas on settings to try? Or how long should I wait for it to load?
Thanks, Relic
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 635
Merit: 312


View Profile
October 21, 2021, 11:46:14 AM
 #113

Hi Etar, thanks for your continuing support for this program.
Quick question: the fastest I get is 2^60; if I try to get 2^61 it sticks on "add baby points to hashtable". I’ve got a 3080, 16 GB RAM and a 500 GB SSD. Any ideas on settings to try? Or how long should I wait for it to load?
Thanks, Relic
The screen output I posted in the post above is from the latest version, which is not yet published (tested).
By the way, v1.6.0 should work fine for you, with slightly lower performance; with v1.6.0 the 2080 Ti speed is 826MKey/s x1073741824 2^29.69 x2^31=2^60.69
If you have 16 GB of GPU RAM, then try -w 31 and -htsz 29.
In any case, a 3080 should perform better than a 2080 Ti even with the same baby array size that I use; try setting -t 512 -b 136 -p 512 -w 30 -htsz 28

P.S. Maybe you get stuck on "add baby points to hashtable" because the PC has too little memory to generate the HT in RAM. I generate the HT for -w 30 on a PC that has 32 GB of RAM.
For -w 31 you need 64 GB of RAM to create all the arrays.
Launching the solver with already generated arrays needs less memory.
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 635
Merit: 312


View Profile
October 21, 2021, 12:47:06 PM
 #114

STOP using BSGScuda: I found a bug where not all public keys are found. I can't say right now in which version this bug appeared, so don't use the program until I solve the issue.
studyroom1
Jr. Member
*
Offline Offline

Activity: 40
Merit: 7


View Profile
October 21, 2021, 01:58:10 PM
 #115

STOP using BSGScuda: I found a bug where not all public keys are found. I can't say right now in which version this bug appeared, so don't use the program until I solve the issue.

Oh, when can we see the next update? Sad
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 635
Merit: 312


View Profile
October 22, 2021, 12:55:03 PM
 #116

The problem was the doubled giant step.
Now I have removed the doubled giant step, and in my opinion everything works as it should.
I ran several tests with different small -w and -p options on a file of 1024 pubkeys, and all keys were found.
True, the total indicator is now 2 times lower, because the step is normal.
You can run all sorts of tests with keys and check. If there are any bugs, let me know.
Release 1.7.0 is available on GitHub.
_Counselor
Member
**
Offline Offline

Activity: 110
Merit: 61


View Profile
October 22, 2021, 02:09:15 PM
 #117

The problem was the doubled giant step.
Now I have removed the doubled giant step, and in my opinion everything works as it should.
I ran several tests with different small -w and -p options on a file of 1024 pubkeys, and all keys were found.
True, the total indicator is now 2 times lower, because the step is normal.
You can run all sorts of tests with keys and check. If there are any bugs, let me know.
Release 1.7.0 is available on GitHub.
What kind of problem was it?
I think you exploited symmetry to double the size of the giant steps?
Why did it not find some keys?
math09183
Member
**
Offline Offline

Activity: 170
Merit: 58


View Profile
October 23, 2021, 06:53:32 AM
 #118

STOP using BSGScuda: I found a bug where not all public keys are found. I can't say right now in which version this bug appeared, so don't use the program until I solve the issue.

LOL  Cheesy

That's what happens when you use ad hoc written code without proper testing. I guess you still have not prepared any set of unit tests to prove your code works?
Good luck with the future releases; maybe somewhere around version 20 it will be stable  Grin
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 635
Merit: 312


View Profile
October 23, 2021, 07:01:46 AM
 #119


LOL  Cheesy

That's what happens when you use ad hoc written code without proper testing. I guess you still have not prepared any set of unit tests to prove your code works?
Good luck with the future releases; maybe somewhere around version 20 it will be stable  Grin

Most code has bugs. Are you a great programmer who does everything without mistakes?
I found this bug and solved it; what's your problem?
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 635
Merit: 312


View Profile
October 23, 2021, 07:02:51 AM
 #120

What kind of problem was it?
I think you exploited symmetry to double the size of the giant steps?
Why did it not find some keys?

If we talk about the doubled GS (giant step):
For example, the options -p 8 -w 4 mean a baby array of 2^4 = 16.
Each (doubled) giant step is 16*2 = 32.
Let's say we need to find a pubkey with privkey = 32.
The program subtracts the GS from the public key and looks in the baby array to check for an overlap.
But if you subtract 32-32 you get 0, and this value is not present in the baby array.
But if we use the usual GS,
32-16 = 16, and 16 is present in the baby array: pubkey solved.

So with the doubled GS, every (baby array size)*2-th key was not found.
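
The failure described above can be reproduced with a toy model on plain integers instead of curve points. This is only a sketch, not BSGScuda's code: the function name `toy_bsgs`, a baby table holding 1..16, and the rule that the symmetric variant also matches -v are all assumptions made for illustration.

```python
def toy_bsgs(key, baby_size, giant_step, symmetric):
    """Toy BSGS on integers. Returns (giant_steps, baby_value) or None.

    Assumption: the baby table holds 1..baby_size, and with `symmetric`
    a value v also matches when -v is in the table.
    """
    baby = set(range(1, baby_size + 1))
    j = 0
    while True:
        v = key - j * giant_step          # "subtract GS from the public key"
        if v in baby or (symmetric and -v in baby):
            return j, v
        if v < -baby_size:                # walked past the table, give up
            return None
        j += 1


# Doubled giant step (2 * 16 = 32) with symmetry: privkey 32 is never hit.
print(toy_bsgs(32, 16, 32, symmetric=True))
# Usual giant step (16): 32 - 16 = 16 is in the baby table.
print(toy_bsgs(32, 16, 16, symmetric=False))
# Every key in 1..100 that the doubled step misses.
missed = [k for k in range(1, 101) if toy_bsgs(k, 16, 32, True) is None]
print(missed)
```

In this model the doubled step skips exactly the keys that are multiples of 32, i.e. one key in every (baby array size)*2, which matches the description of the bug.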