Bitcoin Forum
June 22, 2024, 06:45:50 PM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: « 1 2 3 [4] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 ... 142 »
  Print  
Author Topic: Pollard's kangaroo ECDLP solver  (Read 56632 times)
arulbero
Legendary
*
Offline Offline

Activity: 1915
Merit: 2074


View Profile
May 03, 2020, 03:24:11 PM
Last edit: May 03, 2020, 06:28:55 PM by arulbero
 #61

It works with dp=0 because random walk are not really important, they just generate random sequence, and you don't care about the path, so it correctly solve the key (the distance is also stored in the hastable) but with dp>0 paths are of course important.
You're right there is probably a bug with my storage of symmetric point.

I have explained the bug,
the bug is not in the storage, is in the way you generate the next step of each sequence:

only symmetric jumps (+-G, +-2G, +-4G, +- 17G) lead a tame and a wild from 2 symmetric points (first collision) to other 2 symmetric points until they meet the 'same' 2 DPs (other 2 symmetric points).

If we call 2 symmetric point (a collision) T and W=-T , and DP > 0, the next step produces 2 different points:

T,   W=-T                                                                ->        T1,  W1        ->   Tk = DP,    Wk != Tk   Wk is not a DP!!

(x,y),  (x,-y)                                                           -> (x1,y1),  (x1',y1') ->    (xk, yk),  (xk',yk')

same x -> same jump = +s*G -> different points                x1 != x1'                    xk !=xk'

if T and W are 2 symmetric points, T1=T+s*G and W1=-T+s*G are not!!!

because you use +s*G for T and W, even if they have 2 opposite y-coordinates!

You have to use +s*G for T and -s*G for W (or viceversa)!!

In this way:

T,    W=-T                                                                       ->     T1=T+s*G , W1=-T-s*G    and T1 = -W1 !
(x,y),  (x,-y)                                                                    ->          (x1,y1)  ,     (x1,-y1)
same x -> opposite jumps = +-s*G -> symmetric points                        x1 = x1  
arulbero
Legendary
*
Offline Offline

Activity: 1915
Merit: 2074


View Profile
May 03, 2020, 04:39:36 PM
Last edit: May 03, 2020, 04:53:39 PM by arulbero
 #62

The number field sieve is applicable to finite fields and to elliptic curves that admit embeddings into relatively small finite fields. Such elliptic curves are called pairing-friendly, and are not normally used for ordinary ECDH key agreement or ECDSA or EdDSA signatures like Bitcoin—secp256k1, for instance, ...

It is not allowed to copy something without quoting the source:

https://crypto.stackexchange.com/questions/52655/what-are-the-fastest-attacks-on-ecdlp


MrFreeDragon
Sr. Member
****
Offline Offline

Activity: 443
Merit: 350


View Profile
May 04, 2020, 03:11:21 AM
 #63

This is just a test of speed on Tesla V100 32Gb.

For speed tests were used 72bit keys (available together with 14beta release)
The GPU speed is 1415-1425 Mkey/sec with v1.4beta. The 72bit keys is found for 2-3min per one key.

Code:
$ ./kangaroo -gpu -t 0 in72.txt
Kangaroo v1.4beta
Start:59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1000000000000000000
Stop :59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1FFFFFFFFFFFFFFFFFF
Keys :20
Number of CPU thread: 0
Range width: 2^72
Jump Avg distance: 2^36.04
Number of kangaroos: 2^21.32
Suggested DP: 13
Expected operations: 2^37.15
Expected RAM: 1419.0MB
DP size: 13 [0xfff8000000000000]
GPU: GPU #0 Tesla V100-PCIE-32GB (80x64 cores) Grid(160x128) (249.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^21.32 kangaroos in 15264.7ms
[1424.23 MK/s][GPU 1424.23 MK/s][Count 2^37.12][Dead 0][02:00 (Avg 01:46)][1399.1MB]
Key# 0 Pub:  0x038F63B86D8EE91D4B78FF4680F927DCC7754CF734A386ED5FA45E71DE9328F433
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D102F962838A4EE7364E
[1421.86 MK/s][GPU 1421.86 MK/s][Count 2^37.40][Dead 3][02:40 (Avg 01:47)][1688.6MB]
Key# 1 Pub:  0x0389044AFFFD381B496D63F8C80CDAAB5E57E40DEE6A56C8AFA5194B1FFD83FEBB
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1F5821BA583622EEE49
[1424.46 MK/s][GPU 1424.46 MK/s][Count 2^37.33][Dead 1][02:34 (Avg 01:46)][1619.6MB]
Key# 2 Pub:  0x0338E88602F88C3268C68552C4C53987F41BB42335A8E36658A80D2F5BDF63615B
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D110DC466D7E3D293A10
[1415.73 MK/s][GPU 1415.73 MK/s][Count 2^36.93][Dead 1][02:00 (Avg 01:47)][1224.5MB]
Key# 3 Pub:  0x026776529C6C8932ABF9DCFDCB2DB2784DCE82164914D4C3294FECFE1B48F3BF27
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1DE55BDE3E1B2F36A9B
[1416.26 MK/s][GPU 1416.26 MK/s][Count 2^37.58][Dead 2][03:00 (Avg 01:47)][1916.6MB]
Key# 4 Pub:  0x03312940E0EE296C23B1E7888A4D23FED01358C1021FE2091D909B0A8D8AA80DE1
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D11B78CA8A9FBCDBDE3E
[1419.49 MK/s][GPU 1419.49 MK/s][Count 2^37.06][Dead 0][02:10 (Avg 01:47)][1340.6MB]
Key# 5 Pub:  0x020097A3826A7BC1EC5383AE390EEA8C436B2B6D2E3770493F7E993B164C9F233E
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D152477D7EC473A483D4
[1420.57 MK/s][GPU 1420.57 MK/s][Count 2^37.53][Dead 6][02:54 (Avg 01:47)][1848.9MB]
Key# 6 Pub:  0x03DAE902F3F4E0A62AA7DF422B2BF802CD9ADA747177F853390BEB7F203E4C3F84
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D10D2AB0CDE883E6B3BC
[1419.72 MK/s][GPU 1419.72 MK/s][Count 2^37.03][Dead 2][02:08 (Avg 01:47)][1316.6MB]
Key# 7 Pub:  0x023C4E51FD6EB029CFD4DDCBD93FAD060C7D026D252EC37923355D1B192C69B9E3
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1B69F2EC54CEEBA50C4
[1420.12 MK/s][GPU 1420.12 MK/s][Count 2^37.10][Dead 1][02:14 (Avg 01:47)][1381.6MB]
Key# 8 Pub:  0x03C3112770BC8455596AF38A2A2E3F556B1F02758945608364C3AFD22FAD806B8D
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D18D06BC6E25E5C7EAEF
[1422.92 MK/s][GPU 1422.92 MK/s][Count 2^38.40][Dead 2][05:06 (Avg 01:47)][3383.7MB]
Key# 9 Pub:  0x0264AE637A90AA93798E7BA6CCFD57CB077BCCAE49D3AB76F6857CA044A423E3AC
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1C8A4760400224E8C05

Jean_Luc (OP)
Sr. Member
****
Offline Offline

Activity: 462
Merit: 696


View Profile
May 04, 2020, 11:31:10 AM
 #64

Hi,

Many thanks MrFreeDragon the tests Wink
Many thanks to arulbero for helping me solving the issue Smiley

I found the bugs, these was 2 !
The jump table size, I already checked that but the combination of the 2 bugs make this hard to find.
The split of equivalence class in the random walk.

There are only positive jumps in this table.
Loop are still possible because of the equivalence class switch "(xP,min{yP,q−yP})".

The bad news is that I had to put a large number of random jump (65536) to avoid cycles and this will decrease GPU performance Sad
I will investigate on choosing these jumps in order to reduce probability of cycle and having the smallest possible jump table.

The gain I get by spreading to [-N/8,N/8] was due to the fact that it optimize the overlap between Tame and Wild.
If we can compress the Wild to a very small range, we can reach 1.25 sqrt(N) but this a limit and it has dramatic effects on the border and on wild collisions.
 
Using dp 5 (500 trials, Wild in [-N/4 N/4])
[500] 2^17.581 Dead:0 Avg:2^20.608 (2^20.556)  => 1.52sqrt(N)

Using dp 5 (500 trials, Wild in [-N/8 N/8])
[500] 2^21.693 Dead:103 Avg:2^20.480 (2^20.556) => 1.39sqrt(N)


I will now code the GPU and I publish it ASAP Wink
arulbero
Legendary
*
Offline Offline

Activity: 1915
Merit: 2074


View Profile
May 04, 2020, 11:59:38 AM
 #65

There are only positive jumps in this table.
Loop are still possible because of the equivalence class switch "(xP,min{yP,q−yP})".

Only positive? Why? You mean you store only +k*G but you do -k*G too?

The bad news is that I had to put a large number of random jump (65536) to avoid cycles and this will decrease GPU performance Sad
I will investigate on choosing these jumps in order to reduce probability of cycle and having the smallest possible jump table.

I will try with python to do some tests. Could you provide me only the array of jumps you used?

like: [+-1,+-2,+-4,+-8,+-16].  

Given m and DP I want to see if we can set up a suitable array of jumps such that we minimize the probability of cycle.

You have to change your jumps. But choose them in a "intelligent" way, I think it is not too difficult.
'Intelligent' means that a man can't do this, let the probability do it. I think that the main problem is that you want to choose the jumps by yourself. Why?

You should study separately a sequence of jumps such that the probability to come back before 2^DP steps is low. This study depends only from jumps, DP and m (the average jump), not from points/coordinates/N!

For example, with these jumps [+-1,+-2,+-4,+-8,+-16]  you try to generate many sequences (starting from 0). If the number of sequences with lenght < 2^DP (or < k*2^DP) is too high, change randomly the jumps and retry. Lenght of sequence: number of steps before to get back to 0. Deal simply with integer numbers.

This setting phase should be done automatically at the start of the program. In this way for each DP you will have configured the jumps in order to get a lower probability of generating loops.
Jean_Luc (OP)
Sr. Member
****
Offline Offline

Activity: 462
Merit: 696


View Profile
May 04, 2020, 12:28:52 PM
 #66

I use a random jump table with a controlled average.
I store as before, no inforamtion on symetry is needed, the key is solved later.


Code:
void Kangaroo::CreateJumpTable() {

  int jumpBit = rangePower / 2;
  if(jumpBit > 128) jumpBit = 128;
  int maxRetry = 100;
  bool ok = false;
  double maxAvg = pow(2.0,(double)jumpBit - 0.95);
  double minAvg = pow(2.0,(double)jumpBit - 1.05);
  double distAvg;
  //::printf("Jump Avg distance min: 2^%.2f\n",log2(minAvg));
  //::printf("Jump Avg distance max: 2^%.2f\n",log2(maxAvg));

  // Kangaroo jumps

  // Positive only
  while(!ok && maxRetry>0 ) {
    Int totalDist;
    totalDist.SetInt32(0);
    for(int i = 0; i < NB_JUMP; ++i) {
      jumpDistance[i].Rand(jumpBit);
      if(jumpDistance[i].IsZero())
        jumpDistance[i].SetInt32(1);
      totalDist.Add(&jumpDistance[i]);
    }
    distAvg = totalDist.ToDouble() / (double)(NB_JUMP);
    ok = distAvg>minAvg && distAvg<maxAvg;
    maxRetry--;
  }

  for(int i = 0; i < NB_JUMP; ++i) {
    Point J = secp->ComputePublicKey(&jumpDistance[i]);
    jumpPointx[i].Set(&J.x);
    jumpPointy[i].Set(&J.y);
  }

  if(!ok) {
    ::printf("Warning, jump Avg distance out of bounds: 2^%.2f (restart the program)\n",log2(distAvg));
  } else {
    ::printf("Jump Avg distance: 2^%.2f\n",log2(distAvg));
  }

}

Here is the code of the random walk:

Code:
void Kangaroo::SolveKeyCPU(TH_PARAM *ph) {

  // Global init
  int thId = ph->threadId;
  counters[thId] = 0;

  // Create Kangaroos
  KANGAROO *herd = new KANGAROO[CPU_GRP_SIZE];

  IntGroup *grp = new IntGroup(CPU_GRP_SIZE);
  Int *dx = new Int[CPU_GRP_SIZE];

  if(keyIdx==0) {
    ::printf("SolveKeyCPU Thread %d: %d kangaroos\n",ph->threadId,CPU_GRP_SIZE);
  }

  ph->hasStarted = true;

  // Using Affine coord
  Int dy;
  Int rx;
  Int ry;
  Int _s;
  Int _p;

  uint64_t minW = 0;
  uint64_t maxW = 0;
  uint64_t minT = 0;
  uint64_t maxT = 0;

  while(!endOfSearch) {
  for(int j = 0; j<CPU_GRP_SIZE; j++)
    Create(&herd[j],j%2);

  uint64_t cnt = 0;
  while(!endOfSearch && cnt<maxStep) {

    for(int g = 0; g < CPU_GRP_SIZE; g++) {

      uint64_t jmp = herd[g].pos.x.bits64[0] % NB_JUMP;
      Int *p1x = &jumpPointx[jmp];
      Int *p2x = &herd[g].pos.x;
      dx[g].ModSub(p2x,p1x);

    }
    grp->Set(dx);
    grp->ModInv();

    for(int g = 0; g < CPU_GRP_SIZE; g++) {

      uint64_t jmp = herd[g].pos.x.bits64[0] % NB_JUMP;
      Int *p1x = &jumpPointx[jmp];
      Int *p1y = &jumpPointy[jmp];
      Int *p2x = &herd[g].pos.x;
      Int *p2y = &herd[g].pos.y;

      dy.ModSub(p2y,p1y);
      _s.ModMulK1(&dy,&dx[g]);
      _p.ModSquareK1(&_s);

      rx.ModSub(&_p,p1x);
      rx.ModSub(p2x);

      ry.ModSub(p2x,&rx);
      ry.ModMulK1(&_s);
      ry.ModSub(p2y);

      herd[g].distance.ModAddK1order(&jumpDistance[jmp]);

      // Equivalence symmetry class switch
      if(ry.ModPositiveK1())
        herd[g].distance.ModNegK1order();

      herd[g].pos.x.Set(&rx);
      herd[g].pos.y.Set(&ry);

    }

    for(int g = 0; g < CPU_GRP_SIZE && !endOfSearch; g++) {

      if(IsDP(herd[g].pos.x.bits64[3])) {
        LOCK(ghMutex);
        if(!endOfSearch) {

          if( !AddToTable(&herd[g].pos.x,&herd[g].distance,herd[g].type) ) {
            // Collision inside the same herd
            // We need to reset the kangaroo
            Create(&herd[g],herd[g].type,false);
            collisionInSameHerd++;
          }

        }
        UNLOCK(ghMutex);
      }

      if(!endOfSearch) counters[thId] ++;
      cnt++;

    }

  }
  }

  // Free
  delete grp;
  delete dx;
  delete[] herd;

  ph->isRunning = false;

}


Edit: And kangaroo creation:

Code:
void Kangaroo::Create(KANGAROO *k,int type,bool lock) {

  k->type = type;

  if(lock) LOCK(ghMutex);
  k->distance.Rand(rangePower - 1);
  if(lock) UNLOCK(ghMutex);

  if( type == TAME ) {

    // Tame in [0..N/2]
    k->pos = secp->ComputePublicKey(&k->distance);

  } else {

    // Wild in [-N/4..N/4]
    k->distance.ModSubK1order(&rangeWidthDiv4);
    Point O = secp->ComputePublicKey(&k->distance);
    k->pos = secp->AddDirect(keyToSearch,O);

  }

  // Equivalence symmetry class switch
  if(k->pos.y.ModPositiveK1())
    k->distance.ModNegK1order();

}
MrFreeDragon
Sr. Member
****
Offline Offline

Activity: 443
Merit: 350


View Profile
May 04, 2020, 12:37:00 PM
 #67

While the code is being optimized with the random walks, I have some more tests with 72bit key in order to know the speed rate per second for different GPU devices.

Tesla T4 16Gb has 565MKey/sec
RTX 2080ti 11Gb has 1225MKey/sec

Tesla T4 seems to be close to 1080ti results, but 2080ti is close to Tesla V100.

Tesla T4
Code:
$ ./kangaroo -gpu -t 0 in72.txt
Kangaroo v1.4beta
Start:59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1000000000000000000
Stop :59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1FFFFFFFFFFFFFFFFFF
Keys :10
Number of CPU thread: 0
Range width: 2^72
Jump Avg distance: 2^36.05
Number of kangaroos: 2^20.32
Suggested DP: 14
Expected operations: 2^37.15
Expected RAM: 710.0MB
DP size: 14 [0xfffc000000000000]
GPU: GPU #0 Tesla T4 (40x64 cores) Grid(80x128) (129.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^20.32 kangaroos in 7606.8ms
[566.64 MK/s][GPU 566.64 MK/s][Count 2^37.26][Dead 1][05:32 (Avg 04:28)][772.3MB]
Key# 0 Pub:  0x038F63B86D8EE91D4B78FF4680F927DCC7754CF734A386ED5FA45E71DE9328F433
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D102F962838A4EE7364E
[566.40 MK/s][GPU 566.40 MK/s][Count 2^36.95][Dead 0][04:38 (Avg 04:28)][621.6MB]
Key# 1 Pub:  0x0389044AFFFD381B496D63F8C80CDAAB5E57E40DEE6A56C8AFA5194B1FFD83FEBB
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1F5821BA583622EEE49
[563.86 MK/s][GPU 563.86 MK/s][Count 2^36.44][Dead 0][03:18 (Avg 04:30)][439.2MB]
Key# 2 Pub:  0x0338E88602F88C3268C68552C4C53987F41BB42335A8E36658A80D2F5BDF63615B
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D110DC466D7E3D293A10
[566.37 MK/s][GPU 566.37 MK/s][Count 2^38.16][Dead 3][10:27 (Avg 04:28)][1431.0MB]
Key# 3 Pub:  0x026776529C6C8932ABF9DCFDCB2DB2784DCE82164914D4C3294FECFE1B48F3BF27
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1DE55BDE3E1B2F36A9B
[567.90 MK/s][GPU 567.90 MK/s][Count 2^37.10][Dead 2][05:06 (Avg 04:28)][693.4MB]
Key# 4 Pub:  0x03312940E0EE296C23B1E7888A4D23FED01358C1021FE2091D909B0A8D8AA80DE1
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D11B78CA8A9FBCDBDE3E
[567.67 MK/s][GPU 567.67 MK/s][Count 2^37.89][Dead 5][08:45 (Avg 04:28)][1194.9MB]
Key# 5 Pub:  0x020097A3826A7BC1EC5383AE390EEA8C436B2B6D2E3770493F7E993B164C9F233E
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D152477D7EC473A483D4
[568.12 MK/s][GPU 568.12 MK/s][Count 2^37.69][Dead 1][07:34 (Avg 04:27)][1036.6MB]
Key# 6 Pub:  0x03DAE902F3F4E0A62AA7DF422B2BF802CD9ADA747177F853390BEB7F203E4C3F84
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D10D2AB0CDE883E6B3BC
[568.25 MK/s][GPU 568.25 MK/s][Count 2^37.42][Dead 0][06:18 (Avg 04:27)][860.3MB]
Key# 7 Pub:  0x023C4E51FD6EB029CFD4DDCBD93FAD060C7D026D252EC37923355D1B192C69B9E3
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1B69F2EC54CEEBA50C4
[566.46 MK/s][GPU 566.46 MK/s][Count 2^37.77][Dead 1][08:01 (Avg 04:28)][1096.9MB]
Key# 8 Pub:  0x03C3112770BC8455596AF38A2A2E3F556B1F02758945608364C3AFD22FAD806B8D
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D18D06BC6E25E5C7EAEF
[565.15 MK/s][GPU 565.15 MK/s][Count 2^36.78][Dead 0][04:06 (Avg 04:29)][554.3MB]
Key# 9 Pub:  0x0264AE637A90AA93798E7BA6CCFD57CB077BCCAE49D3AB76F6857CA044A423E3AC
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1C8A4760400224E8C05

Done: Total time 01:04:59

RTX 2080ti
Code:
$ ./kangaroo -gpu -t 0 in72.txt
Kangaroo v1.4beta
Start:59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1000000000000000000
Stop :59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1FFFFFFFFFFFFFFFFFF
Keys :10
Number of CPU thread: 0
Range width: 2^72
Jump Avg distance: 2^36.00
Number of kangaroos: 2^21.09
Suggested DP: 13
Expected operations: 2^37.15
Expected RAM: 1419.0MB
DP size: 13 [0xfff8000000000000]
GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos in 15220.1ms
[1229.83 MK/s][GPU 1229.83 MK/s][Count 2^37.86][Dead 4][03:52 (Avg 02:03)][2330.7MB]
Key# 0 Pub:  0x038F63B86D8EE91D4B78FF4680F927DCC7754CF734A386ED5FA45E71DE9328F433
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D102F962838A4EE7364E
[1229.29 MK/s][GPU 1229.29 MK/s][Count 2^37.30][Dead 1][02:52 (Avg 02:03)][1578.0MB]
Key# 1 Pub:  0x0389044AFFFD381B496D63F8C80CDAAB5E57E40DEE6A56C8AFA5194B1FFD83FEBB
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1F5821BA583622EEE49
[1224.87 MK/s][GPU 1224.87 MK/s][Count 2^37.83][Dead 3][04:02 (Avg 02:04)][2274.8MB]
Key# 2 Pub:  0x0338E88602F88C3268C68552C4C53987F41BB42335A8E36658A80D2F5BDF63615B
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D110DC466D7E3D293A10
[1219.55 MK/s][GPU 1219.55 MK/s][Count 2^36.94][Dead 1][02:18 (Avg 02:04)][1236.3MB]
Key# 3 Pub:  0x026776529C6C8932ABF9DCFDCB2DB2784DCE82164914D4C3294FECFE1B48F3BF27
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1DE55BDE3E1B2F36A9B
[1222.71 MK/s][GPU 1222.71 MK/s][Count 2^36.18][Dead 0][01:28 (Avg 02:04)][732.0MB]
Key# 4 Pub:  0x03312940E0EE296C23B1E7888A4D23FED01358C1021FE2091D909B0A8D8AA80DE1
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D11B78CA8A9FBCDBDE3E
[1221.42 MK/s][GPU 1221.42 MK/s][Count 2^36.97][Dead 0][02:20 (Avg 02:04)][1255.1MB]
Key# 5 Pub:  0x020097A3826A7BC1EC5383AE390EEA8C436B2B6D2E3770493F7E993B164C9F233E
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D152477D7EC473A483D4
[1219.38 MK/s][GPU 1219.38 MK/s][Count 2^37.61][Dead 1][03:30 (Avg 02:04)][1953.7MB]
Key# 6 Pub:  0x03DAE902F3F4E0A62AA7DF422B2BF802CD9ADA747177F853390BEB7F203E4C3F84
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D10D2AB0CDE883E6B3BC
[1227.23 MK/s][GPU 1227.23 MK/s][Count 2^37.14][Dead 0][02:36 (Avg 02:04)][1414.3MB]
Key# 7 Pub:  0x023C4E51FD6EB029CFD4DDCBD93FAD060C7D026D252EC37923355D1B192C69B9E3
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1B69F2EC54CEEBA50C4
[1224.22 MK/s][GPU 1224.22 MK/s][Count 2^37.10][Dead 0][02:32 (Avg 02:04)][1373.2MB]
Key# 8 Pub:  0x03C3112770BC8455596AF38A2A2E3F556B1F02758945608364C3AFD22FAD806B8D
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D18D06BC6E25E5C7EAEF
[1229.32 MK/s][GPU 1229.32 MK/s][Count 2^35.49][Dead 0][01:00 (Avg 02:03)][453.5MB]
Key# 9 Pub:  0x0264AE637A90AA93798E7BA6CCFD57CB077BCCAE49D3AB76F6857CA044A423E3AC
       Priv: 0x59C0465E6C5C5502E17E79A74FE1FF5365707890EB68D1C8A4760400224E8C05

Done: Total time 28:49

MrFreeDragon
Sr. Member
****
Offline Offline

Activity: 443
Merit: 350


View Profile
May 04, 2020, 12:56:55 PM
Merited by ABCbits (3)
 #68

This is a benchmark post with speeds for Kangaroo GPU Solver. All the tests were made with default DP and default grid size (calculated by a program). I guess that some plays with DP and grid size could change (increase or decrease) the speed. If somebody knows the optimal values, please let us know.

Code:
  Card Model           Grid size      DP        Tested speed
---------------------------------------------------------------
GTX 1050 Ti          Grid(12x256)   DP 16     115 MKey/sec
GTX 1080 Ti          Grid(56x256)   DP 15     500 MKey/sec
Tesla T4 16Gb        Grid(80x128)   DP 14     565 MKey/sec
RTX 2080ti 11Gb      Grid(136x128)  DP 13     1225 MKey/sec
Tesla V100 32Gb      Grid(160x128)  DP 13     1420 MKey/sec
---------------------------------------------------------------

It looks like the Kangaroo solver works fine with GeForce family cards but not with Quadro/NVS family. For example Tesla V100 is 4-5 times more expensive than RTX 2080ti, but has the speed only 15% higher.

And we also can learn from this table that the most efficient card for kangaroo solver is RTX 2080ti 11Gb which has the best performance in terms of speed/cost.

PS. Tests were made for 10 keys with 2^72 range width

EDIT: some spelling mistakes were corrected

Jean_Luc (OP)
Sr. Member
****
Offline Offline

Activity: 462
Merit: 696


View Profile
May 04, 2020, 01:17:47 PM
 #69

Thanks for the benchmark Wink

I committed on github.
GPU is not yet ready but CPU works (at least for 40bits)
R

Code:
pons@linpons:~/Kangaroo$ ./kangaroo -d 5 VC_CUDA8/in40_1000.txt

It runs in stat mode.
If you want to make test.
HardwareCollector
Member
**
Offline Offline

Activity: 144
Merit: 10


View Profile
May 04, 2020, 01:37:46 PM
 #70

@Jean_Luc

On the GPU side of things, I get different estimations for "Suggested DP", "Expected operations", and "Expected RAM" for the same problem/interval size on machines with dissimilar hardware configurations, please see below. It may be also helpful to add standard estimations based on the interval size alone. For example, on one my of servers configured with 8x RTX 2080 Ti while using the program defaults, the performance in terms iterations per second quickly degrades from ~10G to less than ~1G.

Thanks

8x RTX 2080 Tis with PL=240W:
Code:
./Kangaroo14 -t 0 -d 20 -gpu -gpuId 0,1,2,3,4,5,6,7 in85.txt
Kangaroo v1.4beta
Start:FFFFFFFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^85
Jump Avg distance: 2^41.99
Number of kangaroos: 2^24.09
Suggested DP: 17
Expected operations: 2^44.36
Expected RAM: 1646.6MB
DP size: 20 [0xfffff00000000000]
GPU: GPU #1 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#1: creating kangaroos...
GPU: GPU #2 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#2: creating kangaroos...
GPU: GPU #5 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#5: creating kangaroos...
GPU: GPU #6 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#6: creating kangaroos...
GPU: GPU #3 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#3: creating kangaroos...
GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
GPU: GPU #7 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#7: creating kangaroos...
GPU: GPU #4 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#4: creating kangaroos...
SolveKeyGPU Thread GPU#7: 2^21.09 kangaroos in 19929.3ms
SolveKeyGPU Thread GPU#2: 2^21.09 kangaroos in 20113.4ms
SolveKeyGPU Thread GPU#5: 2^21.09 kangaroos in 20136.4ms
SolveKeyGPU Thread GPU#6: 2^21.09 kangaroos in 20160.7ms
SolveKeyGPU Thread GPU#4: 2^21.09 kangaroos in 20179.5ms
SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos in 20247.4ms
SolveKeyGPU Thread GPU#3: 2^21.09 kangaroos in 22114.7ms
SolveKeyGPU Thread GPU#1: 2^21.09 kangaroos in 22203.1ms
[9802.06 MK/s][GPU 9802.06 MK/s][Count 2^44.27][Dead 2][41:14 (Avg 38:27)][1550.0MB]
Key# 0 Pub:  0x0329C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
       Priv: 0x11720C4F018D51B8CEBBA8

Done: Total time 41:48

1x RTX 2070 with PL=120W:
Code:
./Kangaroo14 -t 0 -d 20 -gpu -gpuId 0 in85.txt
Kangaroo v1.4beta
Start:FFFFFFFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^85
Jump Avg distance: 2^42.03
Number of kangaroos: 2^20.17
Suggested DP: 21
Expected operations: 2^43.65
Expected RAM: 1003.7MB
DP size: 20 [0xfffff00000000000]
GPU: GPU #0 GeForce RTX 2070 (36x64 cores) Grid(72x128) (117.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^20.17 kangaroos in 9119.2ms
[768.80 MK/s][GPU 768.80 MK/s][Count 2^43.99][Dead 3][07:13:01 (Avg 04:58:44)][1278.3MB]
Key# 0 Pub:  0x0329C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
       Priv: 0x11720C4F018D51B8CEBBA8

Done: Total time 07:13:16


Jean_Luc (OP)
Sr. Member
****
Offline Offline

Activity: 462
Merit: 696


View Profile
May 04, 2020, 01:41:39 PM
 #71

Yes, these estimations are trickly to calculate, i will try improve things...

Impressive result with the 8 RTX 2080 Ti Cheesy
MrFreeDragon
Sr. Member
****
Offline Offline

Activity: 443
Merit: 350


View Profile
May 04, 2020, 02:40:02 PM
Last edit: May 04, 2020, 03:01:02 PM by MrFreeDragon
 #72

-snip-
I committed on github.
GPU is not yet ready but CPU works (at least for 40bits)
R
Code:
pons@linpons:~/Kangaroo$ ./kangaroo -d 5 VC_CUDA8/in40_1000.txt
It runs in stat mode.
If you want to make test.

Yes, I made a test with your last commit on CPU (16 threads) Intel Xeon 2Ghz: 1000 keys of 40bit range were found for 1:39 min Smiley So, this is 99 seconds for 1000 keys (10 keys for 1 second)

Code:
Kangaroo v1.4gamma
Start:62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A0000000000
Stop :62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004AFFFFFFFFFF
Keys :1000
Number of CPU thread: 16
Range width: 2^40
Jump Avg distance: 2^18.99
Number of kangaroos: 2^14.00
Suggested DP: 5
Expected operations: 2^21.00
Expected RAM: 6.0MB
DP size: 5 [0xf800000000000000]
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 5: 1024 kangaroos
SolveKeyCPU Thread 7: 1024 kangaroos
SolveKeyCPU Thread 8: 1024 kangaroos
SolveKeyCPU Thread 9: 1024 kangaroos
SolveKeyCPU Thread 11: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
SolveKeyCPU Thread 10: 1024 kangaroos
SolveKeyCPU Thread 12: 1024 kangaroos
SolveKeyCPU Thread 13: 1024 kangaroos
SolveKeyCPU Thread 14: 1024 kangaroos
SolveKeyCPU Thread 15: 1024 kangaroos

Key# 0 [1S]Pub:  0x022FF363EBC536B7CF20DECD41F0B86B90992937B4E38D9F616C7F980AACD9C922
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A31CED094F0
[  0] 2^20.958 Dead:8 Avg:2^20.958 (2^21.000)

Key# 1 [1S]Pub:  0x0290E532FA5DABF668B4EC1F7FC4727AFEB1A4AE3FB3009E74045797B7C55C12A7
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A31D2F397AF
[  1] 2^21.390 Dead:12 Avg:2^21.190 (2^21.000)

Key# 2 [1N]Pub:  0x02032B8F3648BBC1D1666CAC1D0100462913C72E8736DE615C807CC13B78A63FFA
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A00A71C8425
[  2] 2^20.854 Dead:6 Avg:2^21.087 (2^21.000)

Key# 3 [1N]Pub:  0x03FB6688436949D86D446A2CEF4A66958D99490064F0BF9D7F677F83EC867F9B46
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A73420E46DF
[  3] 2^21.579 Dead:29 Avg:2^21.226 (2^21.000)

Key# 4 [1N]Pub:  0x02CBCACB58AF8BFD517BDAE6363C3E050E988E46A222F79BAEE88A512D6B885F36
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004ACEF20EBCFD
[  4] 2^20.672 Dead:9 Avg:2^21.131 (2^21.000)

.............................................................
deleted due to message size restricrtions
.............................................................

Key#994 [1S]Pub:  0x038926680E04C0622E1DC996AEC5B7AF1CCFAB9186BC801D6563C67A9E6CD836B3
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004AF7B4AE2C1F
[994] 2^21.280 Dead:12 Avg:2^20.792 (2^21.000)

Key#995 [1N]Pub:  0x0260FA97C34F238CB14858BF439CBCE84C8BC0FAFA7B9F6B693A5ADF7E8ADB8648
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004AD8100D4FFE
[995] 2^20.110 Dead:2 Avg:2^20.792 (2^21.000)

Key#996 [1N]Pub:  0x03E7E47031870D9E1E028E7996B2FB14E494E659399F0C21698C9410451E85E97B
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004ADC798EE740
[996] 2^21.098 Dead:12 Avg:2^20.792 (2^21.000)

Key#997 [1N]Pub:  0x025F07F6D4F63D6FEB1857B3E82080C1EF18DDCB535C8698215B2B4CCA0A2E2302
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004AD23E45CBA9
[997] 2^20.731 Dead:11 Avg:2^20.792 (2^21.000)

Key#998 [1S]Pub:  0x039F0D9D4A0ECEB0EA57BB3183A4C691B2D1568C819386777F8DB1E91763607BD7
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004AB7B2FEFC39
[998] 2^20.949 Dead:7 Avg:2^20.792 (2^21.000)

Key#999 [1S]Pub:  0x034D3E8FB5B7D9FA27B207BB9A7EE0D43D30021682965BC3D495DFEC7C40D1ABB6
       Priv: 0x62CE27C8FED90758A834C2CB6E3F19BC8A0B5E7D92C0FC0F57004A0074C8377A
[999] 2^21.273 Dead:21 Avg:2^20.793 (2^21.000)

Done: Total time 01:39

Full test results are here (valid for 180 days only): https://pastebin.com/JFkpvhAA

Maybe this will be helpful for you, Jean_Luc

Jean_Luc (OP)
Sr. Member
****
Offline Offline

Activity: 462
Merit: 696


View Profile
May 04, 2020, 03:07:15 PM
 #73

Many thanks.
Note that here you have an expected of 2^21 because it take in consideration the overload due dp and the number of kangaroos.
I see that it over estimate a bit.
These results are very interesting to adapt the calculation of expected memory and operations.

To answer a bit in details to HardwareCollector, these calculations are dependent on the range size, the number of kangaroo (so the number of thread, grid size, number of GPU,... ) and dp (number of distinguished point bits) so it is normal that you get different results on different configurations.

The calculation of suggested DP is a poor approximation , it need some works to improve it.

Kpot87
Jr. Member
*
Offline Offline

Activity: 39
Merit: 1


View Profile
May 04, 2020, 03:50:18 PM
 #74

@Jean_Luc

On the GPU side of things, I get different estimations for "Suggested DP", "Expected operations", and "Expected RAM" for the same problem/interval size on machines with dissimilar hardware configurations, please see below. It may be also helpful to add standard estimations based on the interval size alone. For example, on one my of servers configured with 8x RTX 2080 Ti while using the program defaults, the performance in terms iterations per second quickly degrades from ~10G to less than ~1G.

Thanks

8x RTX 2080 Tis with PL=240W:
Code:
./Kangaroo14 -t 0 -d 20 -gpu -gpuId 0,1,2,3,4,5,6,7 in85.txt
Kangaroo v1.4beta
Start:FFFFFFFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^85
Jump Avg distance: 2^41.99
Number of kangaroos: 2^24.09
Suggested DP: 17
Expected operations: 2^44.36
Expected RAM: 1646.6MB
DP size: 20 [0xfffff00000000000]
GPU: GPU #1 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#1: creating kangaroos...
GPU: GPU #2 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#2: creating kangaroos...
GPU: GPU #5 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#5: creating kangaroos...
GPU: GPU #6 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#6: creating kangaroos...
GPU: GPU #3 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#3: creating kangaroos...
GPU: GPU #0 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
GPU: GPU #7 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#7: creating kangaroos...
GPU: GPU #4 GeForce RTX 2080 Ti (68x64 cores) Grid(136x128) (213.0 MB used)
SolveKeyGPU Thread GPU#4: creating kangaroos...
SolveKeyGPU Thread GPU#7: 2^21.09 kangaroos in 19929.3ms
SolveKeyGPU Thread GPU#2: 2^21.09 kangaroos in 20113.4ms
SolveKeyGPU Thread GPU#5: 2^21.09 kangaroos in 20136.4ms
SolveKeyGPU Thread GPU#6: 2^21.09 kangaroos in 20160.7ms
SolveKeyGPU Thread GPU#4: 2^21.09 kangaroos in 20179.5ms
SolveKeyGPU Thread GPU#0: 2^21.09 kangaroos in 20247.4ms
SolveKeyGPU Thread GPU#3: 2^21.09 kangaroos in 22114.7ms
SolveKeyGPU Thread GPU#1: 2^21.09 kangaroos in 22203.1ms
[9802.06 MK/s][GPU 9802.06 MK/s][Count 2^44.27][Dead 2][41:14 (Avg 38:27)][1550.0MB]
Key# 0 Pub:  0x0329C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
       Priv: 0x11720C4F018D51B8CEBBA8

Done: Total time 41:48

1x RTX 2070 with PL=120W:
Code:
./Kangaroo14 -t 0 -d 20 -gpu -gpuId 0 in85.txt
Kangaroo v1.4beta
Start:FFFFFFFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 0
Range width: 2^85
Jump Avg distance: 2^42.03
Number of kangaroos: 2^20.17
Suggested DP: 21
Expected operations: 2^43.65
Expected RAM: 1003.7MB
DP size: 20 [0xfffff00000000000]
GPU: GPU #0 GeForce RTX 2070 (36x64 cores) Grid(72x128) (117.0 MB used)
SolveKeyGPU Thread GPU#0: creating kangaroos...
SolveKeyGPU Thread GPU#0: 2^20.17 kangaroos in 9119.2ms
[768.80 MK/s][GPU 768.80 MK/s][Count 2^43.99][Dead 3][07:13:01 (Avg 04:58:44)][1278.3MB]
Key# 0 Pub:  0x0329C4574A4FD8C810B7E42A4B398882B381BCD85E40C6883712912D167C83E73A
       Priv: 0x11720C4F018D51B8CEBBA8

Done: Total time 07:13:16



Hi! How you connect 8RTX ? Rizer? Thanks
arulbero
Legendary
*
Offline Offline

Activity: 1915
Merit: 2074


View Profile
May 04, 2020, 04:20:46 PM
Last edit: May 04, 2020, 05:21:55 PM by arulbero
 #75

I use a random jump table with a controlled average.
I store as before, no inforamtion on symetry is needed, the key is solved later.


Code:
void Kangaroo::CreateJumpTable() {

  int jumpBit = rangePower / 2;
  if(jumpBit > 128) jumpBit = 128;
  int maxRetry = 100;
  bool ok = false;
  double maxAvg = pow(2.0,(double)jumpBit - 0.95);
  double minAvg = pow(2.0,(double)jumpBit - 1.05);
  double distAvg;
  //::printf("Jump Avg distance min: 2^%.2f\n",log2(minAvg));
  //::printf("Jump Avg distance max: 2^%.2f\n",log2(maxAvg));

  // Kangaroo jumps

  // Positive only
  while(!ok && maxRetry>0 ) {
    Int totalDist;
    totalDist.SetInt32(0);
    for(int i = 0; i < NB_JUMP; ++i) {
      jumpDistance[i].Rand(jumpBit);
      if(jumpDistance[i].IsZero())
        jumpDistance[i].SetInt32(1);
      totalDist.Add(&jumpDistance[i]);
    }
    distAvg = totalDist.ToDouble() / (double)(NB_JUMP);
    ok = distAvg>minAvg && distAvg<maxAvg;
    maxRetry--;
  }

  for(int i = 0; i < NB_JUMP; ++i) {
    Point J = secp->ComputePublicKey(&jumpDistance[i]);
    jumpPointx[i].Set(&J.x);
    jumpPointy[i].Set(&J.y);
  }

  if(!ok) {
    ::printf("Warning, jump Avg distance out of bounds: 2^%.2f (restart the program)\n",log2(distAvg));
  } else {
    ::printf("Jump Avg distance: 2^%.2f\n",log2(distAvg));
  }

}

I simulated many walks with your CreateJumpTable (I did a porting in python).

I found a very interesting thing: finally I know how are these loops!!

I started 2000 walks (not together, I was interested in 'selfcollisions' only) from a position '0', and using only integer numbers I found out that:

almost all cycles have length exactly = 2!

I generated 32 positive jumps (and the 32 negative, don't worry, it is like you did but I prefer to write them)

Code:
jumpBit = 20
[b]Jump Avg distance[/b]:
minAvg, [b]distAvg[/b], maxAvg
506428.82601934916 [b]530326.5625[/b] 542776.9763909484

[907868, -907868, 716251, -716251, 1007711, -1007711, 720977, -720977, 901512, -901512, 835718, -835718, 363381, -363381, 475363, -475363, 770732, -770732, 217555, -217555, 67168, -67168, 615504, -615504, 584908, -584908, 768134, -768134, 450427, -450427, 926634, -926634, 458021, -458021, 854265, -854265, 589522, -589522, 280841, -280841, 172112, -172112, 32886, -32886, 279644, -279644, 525179, -525179, 200017, -200017, 372430, -372430, 66448, -66448, 905754, -905754, 1000160, -1000160, 331424, -331424, 170629, -170629, 401275, -401275]

and these are the results:

I got 787 (39,35%) collisions (only cycles!!) over 2000 walks (I stopped each walk after 32 steps) but the most important thing is that I understood why.

783 over 787 has length 2!!  Only 2 loops have length 4 and other 2 loops have length 6.

This is logic: what is the chance that a point Q produces a loop of length 2?

Q -> R -> Q   The probability is 1/64 (1/(number of effectively different jumps), if the jumps from R and Q is opposite to the jumps from Q and R.

Then when you produce a sequence of points, each point you generate has the probability 1/(number of effectively different jumps)) to create a loop of length 2 with the next point!

Some basic computation:  

probability to meet a 1 point that create a loop of length 2 with the next point: 1/NUM_JUMPS
probability to generate 32 (or 2^DP) consecutive points that not produce a 2-loop : (1 - 1/NUM_JUMPS)^32
probability to generate 32 consecutive points with at least one that produces a 2-loop: 1 - (1-NUM_JUMPS)^32

in my case: 1 - (1 - 1/64)^32 = 0.39586 probability of collision in a walk

The bad news is that I had to put a large number of random jump (65536) to avoid cycles and this will decrease GPU performance Sad
I will investigate on choosing these jumps in order to reduce probability of cycle and having the smallest possible jump table.


This explain perfectly why you have to use 65536 jumps instead of 32, because in this way you got:

1 - (1 - 1/65536)^32 = 0.000488

The simplest way to avoid to use a large number of jumps and to avoid almost all loops is to modify the condition of the jump, you have to store the last movement (jump):

you have: P -> Q -> R

when you have to decide how to do the next jump (from Q to R), do it normally unless the random jump is the opposite of the previous, in that case instead of doing -k*G do +k*G (in this way you avoid the loop and mantain the symmetry).

EDIT: the probability to have a loop of length 3 is instead very low, because you should have 3 numbers (jump) a,b,c such that a + b + c = 0 and that is not frequent (a,b,c are 3 random numbers between 32(64) generated in a 2^20 space!)

Second test:

20000 walks:

7610 loops of length 2, 101 loops of length 4, 5 loops of length 6,  ....
this is the sequence of lenght of each loop:

-> 7610, 0, 101, 0, 5, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 3, 0, 9, 0

38,05 %  of walk fall in a 2-loop!
38,66 % of loops in walks limited to 32 steps ...

Jean_Luc (OP)
Sr. Member
****
Offline Offline

Activity: 462
Merit: 696


View Profile
May 04, 2020, 04:54:06 PM
 #76

Many thanks.
I will try this tomorrow. Hope the test of the last jump and the symmetry will no kill GPU performance.
We have to find a determinist logic to get an other jump if we get the opposite.
Thanks Wink
HardwareCollector
Member
**
Offline Offline

Activity: 144
Merit: 10


View Profile
May 04, 2020, 04:59:43 PM
 #77

@Kpot87
Please send a private message and I will respond when I have a minute or two.

@Jean_Luc
Please see below for 65-bit interval tests with your latest CPU (Kangaroo v1.4gamma) changes:

AMD Ryzen 7 1800X Eight-Core Processor with program defaults:
Code:
./kangaroo -t 16 in65.txt
Kangaroo v1.4gamma
Start:FFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 16
Range width: 2^65
Jump Avg distance: 2^31.00
Number of kangaroos: 2^14.00
Suggested DP: 17
Expected operations: 2^33.33
Expected RAM: 7.3MB
DP size: 17 [0xffff800000000000]
SolveKeyCPU Thread 5: 1024 kangaroos
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
SolveKeyCPU Thread 7: 1024 kangaroos
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
SolveKeyCPU Thread 9: 1024 kangaroos
SolveKeyCPU Thread 8: 1024 kangaroos
SolveKeyCPU Thread 10: 1024 kangaroos
SolveKeyCPU Thread 11: 1024 kangaroos
SolveKeyCPU Thread 12: 1024 kangaroos
SolveKeyCPU Thread 13: 1024 kangaroos
SolveKeyCPU Thread 14: 1024 kangaroos
SolveKeyCPU Thread 15: 1024 kangaroos
[33.27 MK/s][GPU 0.00 MK/s][Count 2^35.06][Dead 145][19:25 (Avg 05:25)][7.5MB]
Key# 0 [1S]Pub:  0x0230210C23B1A047BC9BDBB13448E67DEDDC108946DE6DE639BCC75D47C0216B1B
       Priv: 0x1A838B13505B26867
[  0] 2^35.061 Dead:145 Avg:2^35.061 (2^33.333)

Done: Total time 19:27

AMD Ryzen 7 1800X Eight-Core Processor with DP=8:
Code:
./kangaroo -t 16 -d 8 in65.txt
Kangaroo v1.4gamma
Start:FFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 16
Range width: 2^65
Jump Avg distance: 2^31.00
Number of kangaroos: 2^14.00
Suggested DP: 17
Expected operations: 2^33.05
Expected RAM: 2659.0MB
DP size: 8 [0xff00000000000000]
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 8: 1024 kangaroos
SolveKeyCPU Thread 10: 1024 kangaroos
SolveKeyCPU Thread 9: 1024 kangaroos
SolveKeyCPU Thread 12: 1024 kangaroos
SolveKeyCPU Thread 7: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 5: 1024 kangaroos
SolveKeyCPU Thread 13: 1024 kangaroos
SolveKeyCPU Thread 14: 1024 kangaroos
SolveKeyCPU Thread 15: 1024 kangaroos
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 11: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
[34.98 MK/s][GPU 0.00 MK/s][Count 2^34.42][Dead 638][12:46 (Avg 04:14)][1250.3MB]
Key# 0 [1S]Pub:  0x0230210C23B1A047BC9BDBB13448E67DEDDC108946DE6DE639BCC75D47C0216B1B
       Priv: 0x1A838B13505B26867
[  0] 2^34.425 Dead:639 Avg:2^34.425 (2^33.054)

Done: Total time 12:52
arulbero
Legendary
*
Offline Offline

Activity: 1915
Merit: 2074


View Profile
May 04, 2020, 05:11:53 PM
Last edit: May 04, 2020, 08:25:01 PM by arulbero
 #78

Many thanks.
I will try this tomorrow. Hope the test of the last jump and the symmetry will no kill GPU performance.
We have to find a determinist logic to get an other jump if we get the opposite.
Thanks Wink

It is only a test between 2 integers (32 bit <-> 32 bit, a previous index with a current index in an array of 32/64 elements,  I don't think it is too much )

you compute the next jump as usual, then

if(next_jump != -prev_jump): ok
else
 next_jump = prev_jump  (you double the previous jump)

P -> P+k*G -> P+k*G+k*G

Note: in this case the inverse of (x2-x1) is the same:

(P + k*G) + k*G has the same inverse of (P+k*G) - k*G,

then you can compute the inverse before to check the sign of the y-coordinate and the prev_jump
HardwareCollector
Member
**
Offline Offline

Activity: 144
Merit: 10


View Profile
May 04, 2020, 05:27:34 PM
 #79

@Jean_Luc

For larger search intervals, the previous version (v1.4beta) appears to be significantly faster in terms of wall clock time than the current version (v1.4gamma). I am ready to test with a 90-bit search intervals once the latest GPU code is ready.

AMD Ryzen 7 1800X Eight-Core Processor with program defaults (v1.4beta):
Code:
./kangaroo -t 16 in65.txt
Kangaroo v1.4beta
Start:FFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 16
Range width: 2^65
Jump Avg distance: 2^32.01
Number of kangaroos: 2^14.00
Suggested DP: 17
Expected operations: 2^33.65
Expected RAM: 8.8MB
DP size: 17 [0xffff800000000000]
SolveKeyCPU Thread 9: 1024 kangaroos
SolveKeyCPU Thread 7: 1024 kangaroos
SolveKeyCPU Thread 10: 1024 kangaroos
SolveKeyCPU Thread 14: 1024 kangaroos
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 11: 1024 kangaroos
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 12: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
SolveKeyCPU Thread 8: 1024 kangaroos
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 13: 1024 kangaroos
SolveKeyCPU Thread 15: 1024 kangaroos
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 5: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
[52.89 MK/s][GPU 0.00 MK/s][Count 2^33.00][Dead 1][03:06 (Avg 04:14)][10.5MB]
Key# 0 Pub:  0x0230210C23B1A047BC9BDBB13448E67DEDDC108946DE6DE639BCC75D47C0216B1B
       Priv: 0x1A838B13505B26867

Done: Total time 03:06

AMD Ryzen 7 1800X Eight-Core Processor with DP=8 (v1.4beta):

Code:
./kangaroo -t 16 -d 8 in65.txt
Kangaroo v1.4beta
Start:FFFFFFFFFFFFFFFF
Stop :1FFFFFFFFFFFFFFFF
Keys :1
Number of CPU thread: 16
Range width: 2^65
Jump Avg distance: 2^31.97
Number of kangaroos: 2^14.00
Suggested DP: 17
Expected operations: 2^33.65
Expected RAM: 4011.6MB
DP size: 8 [0xff00000000000000]
SolveKeyCPU Thread 7: 1024 kangaroos
SolveKeyCPU Thread 11: 1024 kangaroos
SolveKeyCPU Thread 5: 1024 kangaroos
SolveKeyCPU Thread 10: 1024 kangaroos
SolveKeyCPU Thread 1: 1024 kangaroos
SolveKeyCPU Thread 13: 1024 kangaroos
SolveKeyCPU Thread 9: 1024 kangaroos
SolveKeyCPU Thread 4: 1024 kangaroos
SolveKeyCPU Thread 0: 1024 kangaroos
SolveKeyCPU Thread 2: 1024 kangaroos
SolveKeyCPU Thread 12: 1024 kangaroos
SolveKeyCPU Thread 6: 1024 kangaroos
SolveKeyCPU Thread 15: 1024 kangaroos
SolveKeyCPU Thread 8: 1024 kangaroos
SolveKeyCPU Thread 14: 1024 kangaroos
SolveKeyCPU Thread 3: 1024 kangaroos
[49.60 MK/s][GPU 0.00 MK/s][Count 2^33.43][Dead 0][04:20 (Avg 04:31)][3443.0MB]
Key# 0 Pub:  0x0230210C23B1A047BC9BDBB13448E67DEDDC108946DE6DE639BCC75D47C0216B1B
       Priv: 0x1A838B13505B26867

Done: Total time 04:33
Jean_Luc (OP)
Sr. Member
****
Offline Offline

Activity: 462
Merit: 696


View Profile
May 04, 2020, 05:41:25 PM
 #80

OK thansk I'll try tomorow
The 1.4gamma is still under dev...
Pages: « 1 2 3 [4] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 ... 142 »
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!