Bitcoin Forum
July 06, 2024, 06:20:58 PM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: « 1 2 3 4 5 6 [7] 8 9 10 11 »  All
  Print  
Author Topic: BSGS solver for cuda  (Read 3712 times)
math09183
Member
**
Offline Offline

Activity: 170
Merit: 58


View Profile
October 23, 2021, 03:17:08 PM
 #121


LOL  Cheesy

That's what happens when you use  ad hoc written code, without proper testing. I guess you still did not prepare any set of unit tests to proof your code works?
Good luck for the future releases, maybe somewhere around version 20 it will be stable  Grin

Most of code have bugs. Are you a great programmer who does everything without mistakes?
I found this bug and solved it, what's your problem?

Relax, haters gonna hate  Grin

Anyway, good job, I appreciate your work. Sh*t happens.
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 617
Merit: 312


View Profile
October 23, 2021, 05:13:51 PM
Last edit: October 23, 2021, 07:01:36 PM by Etar
 #122

Code:
KEY[15]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e2452dd26bc983cd5
    Pub: 02b1985389d8ab680dedd67bba7ca781d1a9e6e5974aad2e70518125bad5783eb5
****************************
Found in 1 seconds
GPU#0 job finished
Working time 00:00:55s
FINDpubkey= (55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b, 3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9)
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:00000000000000000000000000000000000000000000000045ba000000000001 1110MKey/s x1073741824 2^30.12 x2^31=2^61.12
***********GPU#0************
KEY[16]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
    Pub: 0355b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:59s
double giant step defeated.
v1.7.1 released with maximum perfomance.
mamuu
Member
**
Offline Offline

Activity: 72
Merit: 19


View Profile
October 23, 2021, 07:13:47 PM
 #123

Code:
KEY[15]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e2452dd26bc983cd5
    Pub: 02b1985389d8ab680dedd67bba7ca781d1a9e6e5974aad2e70518125bad5783eb5
****************************
Found in 1 seconds
GPU#0 job finished
Working time 00:00:55s
FINDpubkey= (55b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b, 3b4d6bce33718dc6cf59f28b550648d7e8b2796ac36f25ff0c01f8bc42a16fd9)
GPU#0 Cnt:0000000000000000000000000000000000000000000000000000000000000001
GPU#0 Cnt:00000000000000000000000000000000000000000000000045ba000000000001 1110MKey/s x1073741824 2^30.12 x2^31=2^61.12
***********GPU#0************
KEY[16]: 0x49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e7ad38337c7f173c7
    Pub: 0355b95bef84a6045a505d015ef15e136e0a31cc2aa00fa4bca62e5df215ee981b
****************************
Found in 4 seconds
GPU#0 job finished
Working time 00:00:59s
double giant step defeated.
v1.7.1 released with maximum perfomance.

Hi,
Please , can you write a little tutorial on usage?

1DWA3Sa8i6eHVWV4AG4UP2SBhYB2XrfiHW
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 617
Merit: 312


View Profile
October 23, 2021, 07:16:04 PM
 #124

Hi,
Please , can you write a little tutorial on usage?
did you look at readme.md on github?
first of all set -t parametr as 256 or 512
next set -b equil to SM count of your card
next set -p start from 128
then set -w as max as possible to your gpu memory, check -htsz params
-w 31 -htsz 29 need around 64GB of RAM to generate all arrays
-w 30 -htsz 28 need around 32GB of RAM to generate all arrays
-w 29 -htsz 28
-w 28 -htsz 27
-w 27 -htsz 25
if you will have free gpu memory you can increase -p or -b or -t  (all params multiple of 2)
ssxb
Jr. Member
*
Offline Offline

Activity: 81
Merit: 2


View Profile
October 24, 2021, 03:30:22 AM
 #125

Hi,
Please , can you write a little tutorial on usage?
did you look at readme.md on github?
first of all set -t parametr as 256 or 512
next set -b equil to SM count of your card
next set -p start from 128
then set -w as max as possible to your gpu memory, check -htsz params
-w 31 -htsz 29 need around 64GB of RAM to generate all arrays
-w 30 -htsz 28 need around 32GB of RAM to generate all arrays
-w 29 -htsz 28
-w 28 -htsz 27
-w 27 -htsz 25
if you will have free gpu memory you can increase -p or -b or -t  (all params multiple of 2)


some great work from your side  , appreciate

just a  quick question, do you have any plan to enhanced it for 120bit or more to perform better than JLK?
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 617
Merit: 312


View Profile
October 24, 2021, 07:55:06 AM
 #126


just a  quick question, do you have any plan to enhanced it for 120bit or more to perform better than JLK?


Bsgs will never faster then kangaroo on big ranges.
Example puzzle #80, width 79bit
Kanagaroo need in average 2^40.5 op to solve key
2080ti can serach DPs with speed 1500Mkey/s = 2^30.48
So you need in average 2^10 second to find key.

Bsgscuda can search 2^61/s, so 2^79 / 2^61 = 2^18 second to check whole 79bit range. Ofcourse key can be close to the begining and you can found very fast but it is not 100%.
we can devide pub to 64 pubkey and try to find one of them in range 1..3ffffffffffffffffff fortunately, the key we need is the first one on the list but it was just lucky.
by the way privkey will be 3a869719b73046d6b46 = 2^73.87
so we need  2^73.87 / 2^61/s = 2^12.87 seconds to find key. It is 7 more timer then need for kangaroo
arulbero
Legendary
*
Offline Offline

Activity: 1915
Merit: 2074


View Profile
October 24, 2021, 04:32:04 PM
 #127

Do you use the negation map to speed up the algorithm ?  --> pag. 8-9  https://eprint.iacr.org/2015/605.pdf

You need to compute only:

sqrt(n) / 2  baby steps : for example, n = 2^60  -> 2^29 baby steps

sqrt(n)  giant steps : for example, n = 2^60  -> 2^30 giant steps

It is like to shift the public key of 2^59 steps, and search in the interval [1,..., 2^59] instead of [1,...., 2^60] exploiting the symmetrie.

Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p , at the cost of 1 inversions and 3*(k/2 - 1) multiplications mod p.
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 617
Merit: 312


View Profile
October 24, 2021, 05:02:06 PM
 #128

Do you use the negation map to speed up the algorithm ?  --> pag. 8-9  https://eprint.iacr.org/2015/605.pdf

You need to compute only:

sqrt(n) / 2  baby steps : for example, n = 2^60  -> 2^29 baby steps

sqrt(n)  giant steps : for example, n = 2^60  -> 2^30 giant steps

It is like to shift the public key of 2^59 steps, and search in the interval [1,..., 2^59] instead of [1,...., 2^60] exploiting the symmetrie.

Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p , at the cost of 1 inversions and 3*(k/2 - 1) multiplications mod p.

No, don`t use even don`t know about this.
Will try to understand this tweak, thanks.
arulbero
Legendary
*
Offline Offline

Activity: 1915
Merit: 2074


View Profile
October 24, 2021, 05:16:14 PM
 #129

As you already explained few days ago, cuda BSGS is searching the keys one after another, not in parallel. I know that, I read carefully Wink. With my previous post I meant that even if it would run in parallel it would slow down as described. So read my post in conjunctive and if someone would modify it to process in parallel I expect that behaviours.

BSGS works in this way:

suppose we know that P = k*G, and the private key k is in [1,...., 2^60] range

1) precompute 2^29 baby steps (they are simple public keys): 1*G, 2*G, 3*G, ...., 2^29 * G

2) split the public key P in many other public keys (they are called giant steps, 2^30 public keys):
P, P - 1*(2^30*G), P - 2*(2^30*G), P - 3*(2^30)G, ..., P - (2^30 - 1)*(2^30*G)

3) for each giant steps, you check if it lies in [1, ..., 2^30] range (if it is equal to a 'baby step' public key)

4) if    P - a*(2^30*G) = +-b*G   then  P = (a*2^30 +- b)*G   then the private key is k = a*2^30 +- b

If you want to search 2 public keys, P1 and P2, you can use the same baby steps, but you need to generate 2^31 giant steps instead of 2^30.
arulbero
Legendary
*
Offline Offline

Activity: 1915
Merit: 2074


View Profile
October 24, 2021, 05:18:43 PM
 #130

Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p , at the cost of 1 inversions and 3*(k/2 - 1) multiplications mod p.

No, don`t use even don`t know about this.
Will try to understand this tweak, thanks.

I mean: how do you compute a batch of 'consecutive' keys ?

Like P, P+G, P+2G, P+3G, P+4G, P+5G, ...  ?
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 617
Merit: 312


View Profile
October 24, 2021, 05:31:37 PM
 #131

Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p , at the cost of 1 inversions and 3*(k/2 - 1) multiplications mod p.

No, don`t use even don`t know about this.
Will try to understand this tweak, thanks.

I mean: how do you compute a batch of 'consecutive' keys ?

Like P, P+G, P+2G, P+3G, P+4G, P+5G, ...  ?
with group inverse, calculate inverse for batch and use it in addition. Sorry if i misunderstud question.
In the same way as in bitcrack https://github.com/brichard19/BitCrack/blob/6bf8059ef075eb1622298395866b0bd02375e1d9/cudaMath/secp256k1.cuh#L642
and then https://github.com/brichard19/BitCrack/blob/6bf8059ef075eb1622298395866b0bd02375e1d9/cudaMath/secp256k1.cuh#L656
arulbero
Legendary
*
Offline Offline

Activity: 1915
Merit: 2074


View Profile
October 24, 2021, 05:40:09 PM
 #132

Besides, in order to compute a batch of k steps, you need to calculate only k/2 (instead of k) elements x^-1 mod p , at the cost of 1 inversions and 3*(k/2 - 1) multiplications mod p.

No, don`t use even don`t know about this.
Will try to understand this tweak, thanks.

I mean: how do you compute a batch of 'consecutive' keys ?

Like P, P+G, P+2G, P+3G, P+4G, P+5G, ...  ?
with group inverse, calculate inverse for batch and use it in addition. Sorry if i misunderstud question.

Ok.  
If you have a batch of 100 points, you don't need to compute 100 inversions but only 50 inversions.

If you have to compute A + B:




If you have to compute A - B, since -B = (xB, n-yB)

1/(xb-xa) is the same as in A + B.
Because  for example P+2G and P-2G use the same inverse.

 
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 617
Merit: 312


View Profile
October 24, 2021, 05:44:48 PM
 #133


Ok.  
If you have a batch of 100 points, you don't need to compute 100 inversions but only 50 inversions.
Because  for example P+2G and P-2G use the same inverse.

 
Exacly all this used in bsgs cuda(batch addition and symmetry)
befor using symmetry in addition speed was 800Mkeys with -w 30, after using symmetry speed grow to 1150Mkeys
arulbero
Legendary
*
Offline Offline

Activity: 1915
Merit: 2074


View Profile
October 24, 2021, 05:52:59 PM
 #134


Ok.  
If you have a batch of 100 points, you don't need to compute 100 inversions but only 50 inversions.
Because  for example P+2G and P-2G use the same inverse.

 
Exacly all this used in bsgs cuda(batch addition and symmetry)
befor using symmetry in addition speed was 800Mkeys with -w 30, after using symmetry speed grow to 1150Mkeys

Ok. The square "a**2 mod p" is optimized like here ?
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 617
Merit: 312


View Profile
October 24, 2021, 05:57:01 PM
Last edit: October 24, 2021, 06:07:30 PM by Etar
 #135


Ok.  
If you have a batch of 100 points, you don't need to compute 100 inversions but only 50 inversions.
Because  for example P+2G and P-2G use the same inverse.

 
Exacly all this used in bsgs cuda(batch addition and symmetry)
befor using symmetry in addition speed was 800Mkeys with -w 30, after using symmetry speed grow to 1150Mkeys

Ok. The square "a**2 mod p" is optimized like here ?
No. used mult mod P
i use optimized square mod P in PB https://github.com/Etayson/BSGS-cuda/blob/e41fff517b8de153b6bf9846ee7abb47524fe43e/lib/Curve64.pb#L2161
but need buffer 512bytes for this, so i did not transfer it to cuda ptx
Also used double giant step, so we substuct double giant value from pub.
arulbero
Legendary
*
Offline Offline

Activity: 1915
Merit: 2074


View Profile
October 24, 2021, 06:33:05 PM
 #136

Also used double giant step, so we substuct double giant value from pub.

So, instead of computing

sqrt(n) / 2  baby steps
and
sqrt(n) giant steps

you compute

sqrt(n) baby steps
and
sqrt(n) / 2 giant steps


then, for example, for n = 2^60:

P - a * (2^31*G) = b*G  where  'a' lies in [1, ..., 2^29] and 'b' lies in [1,...,2^30]

means:  

1) P - a*(2^31*G) = b*G  -->   P = [a*(2^31) + b] * G  --> priv key = a*(2^31) + b
or
2) P - a*(2^31*G) = -b*G  -->   P = [a*(2^31) - b] * G  --> priv key = a*(2^31) - b

to save 2^30 giant steps.

It seems to me that the program is already optimized.
Etar (OP)
Sr. Member
****
Offline Offline

Activity: 617
Merit: 312


View Profile
October 24, 2021, 06:46:12 PM
 #137

Also used double giant step, so we substuct double giant value from pub.

So, instead of computing

sqrt(n) / 2  baby steps
and
sqrt(n) giant steps

you compute

sqrt(n) baby steps
and
sqrt(n) / 2 giant steps


then, for example, for n = 2^60:

P - a * (2^31*G) = b*G  where  'a' lies in [1, ..., 2^29] and 'b' lies in [1,...,2^30]

means:  

1) P - a*(2^31*G) = b*G  -->   P = [a*(2^31) + b] * G  --> priv key = a*(2^31) + b
or
2) P - a*(2^31*G) = -b*G  -->   P = [a*(2^31) - b] * G  --> priv key = a*(2^31) - b

to save 2^30 giant steps.

It seems to me that the program is already optimized.
I compute as max as possible baby array size dependency of GPU memory
Baby array is 1G,2G,3G...
So this array computed only first time and then redesigned to hashtable.
It is one HT for any ranges.
Giant array is computed with doubled value of Baby array size, for ex if Baby array have size 2^30 then Giant Array have value G*(2^31), G*(2^32), G*(2^33)...
All arrays computed only one time if not changed settings.
So you can easy used all arays for different ranges and pubkeys without recompute.
arulbero
Legendary
*
Offline Offline

Activity: 1915
Merit: 2074


View Profile
October 24, 2021, 06:57:39 PM
 #138


I compute as max as possible baby array size dependency of GPU memory
Baby array is 1G,2G,3G...
So this array computed only first time and then redesigned to hashtable.
It is one HT for any ranges.
Giant array is computed with doubled value of Baby array size, for ex if Baby array have size 2^30 then Giant Array have value G*(2^31), G*(2^32), G*(2^33)...
All arrays computed only one time if not changed settings.
So you can easy used all arays for different ranges and pubkeys without recompute.

You can choose:

1) 2^30 baby-steps: 1*G, 2*G, ..., 2^30*G
and
2^29 giant steps: P-1*2^31*G, P-2*2^31*G,..,P -a*2^31*G where a is in [1,2,.., 2^29 - 1]

or

2) 2^29 baby-steps: 1*G, 2*G, ..., 2^29*G 
and
2^30 giant steps: P-1*2^30*G, P-2*2^30*G,..., P -a*2^30*G  where a is in [1,2,..., 2^30 - 1]

I don't understand why:  "G*(2^31), G*(2^32), G*(2^33)": in this way you compute only 30 giant steps



Etar (OP)
Sr. Member
****
Offline Offline

Activity: 617
Merit: 312


View Profile
October 24, 2021, 07:05:38 PM
Last edit: October 24, 2021, 07:19:00 PM by Etar
 #139


I compute as max as possible baby array size dependency of GPU memory
Baby array is 1G,2G,3G...
So this array computed only first time and then redesigned to hashtable.
It is one HT for any ranges.
Giant array is computed with doubled value of Baby array size, for ex if Baby array have size 2^30 then Giant Array have value G*(2^31), G*(2^32), G*(2^33)...
All arrays computed only one time if not changed settings.
So you can easy used all arays for different ranges and pubkeys without recompute.

You can choose:

1) 2^30 baby-steps: 1*G, 2*G, ..., 2^30*G
and
2^29 giant steps: P-1*2^31*G, P-2*2^31*G,..,P -a*2^31*G where a is in [1,2,.., 2^29 - 1]

or

2) 2^29 baby-steps: 1*G, 2*G, ..., 2^29*G  
and
2^30 giant steps: P-1*2^30*G, P-2*2^30*G,..., P -a*2^30*G  where a is in [1,2,..., 2^30 - 1]

I don't understand why:  "G*(2^31), G*(2^32), G*(2^33)": in this way you compute only 30 giant steps




it is universal arrays for any ranges. You not need to recompute array if you change range or pubkeys.
size of giant array is equil to thread number * block number * pparam
for 2080ti i use 512 thread 138 blocks and pparam = 480
so totaly i have 33914880 doubled giant values
So each cuda kernel call calculate 33914880 * 2(due to +y/-y in batch additions) giat steps
jacky19790729
Jr. Member
*
Offline Offline

Activity: 71
Merit: 8


View Profile
October 29, 2021, 03:42:12 AM
 #140

based on above table you can increase speed if you will utilize both bloom+bp https://github.com/iceland2k14/bsgs
so CPU cores are less powerful than cuda and i was thinking [not sure possible or not] if we load all bp in RAM and use some bloom in GPU memory perhaps their will be some dramatic speed boost

I had try iceland2k14's BSGS
Intel i7-7800X  + 24 GB  DDR4-2400

Code:
D:\python\BSGS_ice>python bsgs_dll_secp256k1.py -p 0385a30d8413af4f8f9e6312400f2d194fe14f02e719b24c3f83bf1fd233a8f963 -b bPfile.bin -bl bloomfile.bin -n 1000000000000000 -keyspace 40000000000000:80000000000000

[+] Starting Program : BSGS mode     Version [ 13072021 ]
[+] Search Started for the Public key:  0485a30d8413af4f8f9e6312400f2d194fe14f02e719b24c3f83bf1fd233a8f9630eb400323654cec63999b56f4ba44e8b21ab92d9d697fabe4666df3678585669
[+] Search Mode: Sequential search in the given range
[+] Reading bloom filter from file complete in : 26.70087 sec
[+] Reading Baby table from file complete in : 0.03731 sec
[+] seq value: 1000000000000000    m value : 3999999999
[+] Search Range: 0x40000000000000  to  0x80000000000000
                                                                [+] k1: 0x40000000000000
PVK not found. 1000.00000 Trillion scanned in 1.14 sec. New range [+] k1: 0x438d7ea4c68001
PVK not found. 1000.00000 Trillion scanned in 1.18 sec. New range [+] k1: 0x471afd498d0002
PVK not found. 1000.00000 Trillion scanned in 1.03 sec. New range [+] k1: 0x4aa87bee538003
PVK not found. 1000.00000 Trillion scanned in 1.05 sec. New range [+] k1: 0x4e35fa931a0004
PVK not found. 1000.00000 Trillion scanned in 0.99 sec. New range [+] k1: 0x51c37937e08005
PVK not found. 1000.00000 Trillion scanned in 1.01 sec. New range [+] k1: 0x5550f7dca70006
PVK not found. 1000.00000 Trillion scanned in 1.00 sec. New range [+] k1: 0x58de76816d8007
PVK not found. 1000.00000 Trillion scanned in 1.01 sec. New range [+] k1: 0x5c6bf526340008
PVK not found. 1000.00000 Trillion scanned in 1.09 sec. New range [+] k1: 0x5ff973cafa8009
PVK not found. 1000.00000 Trillion scanned in 0.93 sec. New range [+] k1: 0x6386f26fc1000a
PVK not found. 1000.00000 Trillion scanned in 0.96 sec. New range [+] k1: 0x6714711487800b
PVK not found. 1000.00000 Trillion scanned in 1.00 sec. New range [+] k1: 0x6aa1efb94e000c
============== KEYFOUND ============== 1
BSGS FOUND PrivateKey  0x6abe1f9b67e114
======================================
Start 1  :  2021-10-29 11:34:19
Start 1_0:  2021-10-29 11:34:45
END   1  :  2021-10-29 11:34:59

1 CPU speed:
BSGS Check 0x38D7EA4C68000  key / second

Pages: « 1 2 3 4 5 6 [7] 8 9 10 11 »  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!