It could be due to a wrong optimization concerning a carry somewhere, which would explain why it only works from time to time. I had a similar problem with the CPU release when I compiled with gcc 6; gcc 7 and Visual C++ worked flawlessly. The patch (a volatile as well) is at IntMod.cpp:859 and IntMod.cpp:915.
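For illustration only (this is not the actual IntMod.cpp patch): the kind of carry-propagation helper where marking the intermediate sum volatile stops an over-aggressive optimizer from reordering or folding away the carry computation.

#include <stdint.h>

// Hypothetical add-with-carry step, 64-bit limbs, carry is 0 or 1.
static uint64_t AddCarry(uint64_t a, uint64_t b, uint64_t *carry) {
  // volatile forces the intermediate sum to be materialized, so the
  // compiler cannot fold it into the carry test below.
  volatile uint64_t s = a + b + *carry;
  *carry = (s < a) || (s == a && *carry);   // carry-out of the 64-bit addition
  return s;
}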
|
|
|
You can also try VanitySearch -u -check. It will perform the check using uncompressed addresses and therefore use the CheckHashUncomp() function, which is similar except that it calls GetHash160() instead of GetHash160Comp().
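For readers who don't know the two paths, a minimal host-side sketch of the difference between the compressed and uncompressed public-key serializations that HASH160 (RIPEMD160 of SHA256) is applied to; the GPU kernels do the equivalent directly on 64-bit limbs in registers, so this is only an illustration.

#include <stdint.h>
#include <string.h>

// Serialize a public key (X,Y given as 32-byte big-endian coordinates)
// the way the two check paths expect it before hashing.
// Returns the serialized length: 33 bytes (compressed) or 65 (uncompressed).
static int SerializePubKey(const uint8_t x[32], const uint8_t y[32],
                           int compressed, uint8_t out[65]) {
  if (compressed) {
    out[0] = (y[31] & 1) ? 0x03 : 0x02;   // prefix encodes the parity of Y
    memcpy(out + 1, x, 32);
    return 33;                            // hashed by GetHash160Comp()
  }
  out[0] = 0x04;                          // uncompressed marker
  memcpy(out + 1, x, 32);
  memcpy(out + 33, y, 32);
  return 65;                              // hashed by GetHash160()
}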
|
|
|
endo and sym are computed in CheckHashComp() in GPUCompute.h. I quoted my last post and added a few comments. The point (px,py) is always OK, so no errors before CHECK_POINT(h, incr, 0); the errors randomly appear after this line. It seems that nvcc generates (in your case) wrong code.

__device__ __noinline__ void CheckHashComp(prefix_t *prefix, uint64_t *px, uint64_t *py,
                                           int32_t incr, uint32_t tid, uint32_t *lookup32, uint32_t *out) {

  uint32_t h[20];
  uint64_t pe1x[4];
  uint64_t pe2x[4];

  // Point
  _GetHash160Comp(px, py, (uint8_t *)h);
  CHECK_POINT(h, incr, 0);                    // <-- 100% OK up to here, means that (px,py) is good

  // Endo #1: if (x, y) = k*G, then (beta*x, y) = lambda*k*G
  _ModMult(pe1x, px, _beta);
  _GetHash160Comp(pe1x, py, (uint8_t *)h);    // <-- 50% wrong from here
  CHECK_POINT(h, incr, 1);

  // Endo #2: if (x, y) = k*G, then (beta2*x, y) = lambda2*k*G
  _ModMult(pe2x, px, _beta2);
  _GetHash160Comp(pe2x, py, (uint8_t *)h);
  CHECK_POINT(h, incr, 2);

  ModNeg256(py);

  // Symmetric points
  _GetHash160Comp(px, py, (uint8_t *)h);
  CHECK_POINT(h, -incr, 0);
  _GetHash160Comp(pe1x, py, (uint8_t *)h);
  CHECK_POINT(h, -incr, 1);
  _GetHash160Comp(pe2x, py, (uint8_t *)h);
  CHECK_POINT(h, -incr, 2);
}
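As background (my own summary, not part of the post): the reason this routine checks six candidates per point is the secp256k1 endomorphism. With beta a non-trivial cube root of 1 modulo the field prime p, and lambda the matching cube root of 1 modulo the curve order n:

  if (x, y) = k*G then (beta*x,  y) = lambda*k  * G
                       (beta2*x, y) = lambda2*k * G   where beta2 = beta^2, lambda2 = lambda^2
  and negating y gives (x, -y) = -k*G, and likewise for the two endomorphism points

so one computed point plus one ModNeg256 yields six HASH160 candidates, which is exactly what the six CHECK_POINT calls above cover.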
|
|
|
OK thanks, it works. On my GTX 645, same performance. Sqr brings a few more spill moves (there are more temporary variables than in ModMult). I didn't try yet on the old Quadro 600. I will see if I can win a few registers.

With Sqr:
1> 33280 bytes stack frame, 128 bytes spill stores, 436 bytes spill loads
Without Sqr:
1> 33280 bytes stack frame, 120 bytes spill stores, 424 bytes spill loads
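For anyone who wants to see these numbers on their own build: they come from ptxas, and nvcc prints them when you pass the verbose flag. The command line below is only an example; the real build adds arch and include options.

nvcc -c -Xptxas -v GPU/GPUEngine.cu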
|
|
|
From 153 MKeys/s to 160 MKeys/s
using a _ModSqr instead of _ModMult
Thanks, I tried but the -check failed. I will have a look at it. I committed the patch with a few of your mods; I also reviewed the main loop a bit.
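As background on why a dedicated squaring routine can beat a general modular multiply: when both operands are equal, every cross partial product a[i]*a[j] (i != j) occurs twice, so it can be computed once and doubled. A rough host-side sketch with 32-bit limbs follows; this is not the _ModSqr from the repository, which works on 64-bit limbs with PTX carry instructions and includes the modular reduction.

#include <stdint.h>

#define NLIMB 8   // 8 x 32-bit limbs = 256 bits

// Square a 256-bit integer into a 512-bit result (no reduction).
// An n-limb square needs ~n*(n+1)/2 limb multiplies instead of n*n.
static void Square256(const uint32_t a[NLIMB], uint32_t r[2 * NLIMB]) {
  uint64_t acc[2 * NLIMB] = {0};
  // Off-diagonal products a[i]*a[j], i<j: computed once, counted twice.
  for (int i = 0; i < NLIMB; i++) {
    for (int j = i + 1; j < NLIMB; j++) {
      uint64_t p = (uint64_t)a[i] * a[j];
      acc[i + j]     += (p & 0xFFFFFFFFULL) << 1;
      acc[i + j + 1] += (p >> 32) << 1;
    }
  }
  // Diagonal products a[i]^2: counted once.
  for (int i = 0; i < NLIMB; i++) {
    uint64_t p = (uint64_t)a[i] * a[i];
    acc[2 * i]     += p & 0xFFFFFFFFULL;
    acc[2 * i + 1] += p >> 32;
  }
  // Propagate the accumulated carries into 32-bit result limbs.
  uint64_t carry = 0;
  for (int i = 0; i < 2 * NLIMB; i++) {
    uint64_t v = acc[i] + carry;
    r[i] = (uint32_t)v;
    carry = v >> 32;
  }
}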
|
|
|
Still errors.
OK, thanks for testing. I give up for the moment, I've run out of ideas. I'll leave the volatile in. I hope I will manage to reproduce this.
|
|
|
Arg... Could you try this (for the 2 ModMult)? With these mods, all instructions of the ModMult will be volatile and, theoretically, cannot be moved or removed by the compiler.

#define SET0(a) asm volatile ("mov.u64 %0,0;" : "=l"(a))

// ---------------------------------------------------------------------------------------
// Compute a*b*(mod n)
// a and b must be lower than n
// ---------------------------------------------------------------------------------------

__device__ void _ModMult(uint64_t *r, uint64_t *a, uint64_t *b) {

  uint64_t r512[8];
  uint64_t t[NBBLOCK];
  uint64_t ah, al;

  SET0(r512[5]);
  SET0(r512[6]);
  SET0(r512[7]);

  // 256*256 multiplier
|
|
|
Hello @arulbero, could you try this file: http://zelda38.free.fr/VanitySearch/GPUEngine.cu
I unrolled the UMult macro, maybe nvcc performs a wrong optimization due to it. The volatile causes a 10% performance loss on my Windows, a bit less on my Linux.

  // Reduce from 512 to 320
- UMult(t,(r512 + 4), 0x1000003D1ULL);
+ UMULLO(t[0],r512[4],0x1000003D1ULL);
+ UMULLO(t[1],r512[5],0x1000003D1ULL);
+ MADDO(t[1],r512[4],0x1000003D1ULL,t[1]);
+ UMULLO(t[2],r512[6],0x1000003D1ULL);
+ MADDC(t[2],r512[5],0x1000003D1ULL,t[2]);
+ UMULLO(t[3],r512[7],0x1000003D1ULL);
+ MADDC(t[3],r512[6],0x1000003D1ULL,t[3]);
+ MADD(t[4],r512[7],0x1000003D1ULL,0ULL);
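For readers following along: these macros wrap single PTX instructions. The definitions below are a plausible reconstruction only and the real ones in the repository may differ in detail; UMULLO is a 64-bit low-word multiply, MADDO starts a multiply-add chain and sets the carry flag, MADDC continues it consuming and producing the carry, and MADD consumes the carry without producing one.

// Hypothetical definitions, for illustration only:
#define UMULLO(lo, a, b)   asm volatile ("mul.lo.u64 %0, %1, %2;"         : "=l"(lo) : "l"(a), "l"(b))
#define MADDO(r, a, b, c)  asm volatile ("mad.lo.cc.u64 %0, %1, %2, %3;"  : "=l"(r)  : "l"(a), "l"(b), "l"(c))
#define MADDC(r, a, b, c)  asm volatile ("madc.lo.cc.u64 %0, %1, %2, %3;" : "=l"(r)  : "l"(a), "l"(b), "l"(c))
#define MADD(r, a, b, c)   asm volatile ("madc.lo.u64 %0, %1, %2, %3;"    : "=l"(r)  : "l"(a), "l"(b), "l"(c))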
|
|
|
Hello, is it possible, Jean Luc, to compile it as an .exe for CUDA 8 under Windows, or does it only work on Linux with CUDA 8?
It is on my task list, but on Windows it is not easy to play with several releases of Visual C++. On Linux, it is clearer and simple enough. For Windows, I have to set up a full config with the right compiler for CUDA 8.

It works!!! A little slower, but it is correct now!
Good news, I'll add the patch in the next release.
|
|
|
Another try: GPU/GPUEngine.cu:465 and GPU/GPUEngine.cu:514

volatile uint64_t r512[8];

volatile prevents the compiler from optimizing the variable and from removing code that is actually used. I had a problem with gcc 6 concerning this on the CPU release.
|
|
|
Already tried with "LD_LIBRARY_PATH"; the problem is the driver. I have Ubuntu 17.04, I cannot install a new driver on it.
OK, that's too bad that the driver is not compatible. I tried your function on my Linux config but it does not bring a significant performance increase, mainly due to the fact that adding temporary variables adds more spill moves, which are slower; sometimes it is better to recompute. On your hardware you have many more available registers, so the performance increase should be more significant. A tip: maybe you can try to play with the max register count in the Makefile; for compute capability 5.0, nvcc CUDA 10, use 120 registers. The random problem you have may also be due to wrong register sharing between threads, which could explain the strange and random behavior. Reducing the number of registers used, by inlining, also reduces the probability that this happens. It might be an explanation...
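If useful, the register cap is passed to nvcc via the -maxrregcount option; for example (the command line is only illustrative, the real build goes through the Makefile):

nvcc -maxrregcount=120 -gencode arch=compute_50,code=sm_50 -c GPU/GPUEngine.cu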
|
|
|
(I'm not sure what C means, I suppose it means "with carry")
Yes:
ADD0 is the initial add, without carry-in, that sets the carry flag.
ADDC is an add with carry-in that sets the carry flag.
ADD is an add with carry-in that does not set the carry flag.
Same for SUB. Functions may have a 1 suffix for the unary version.
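A minimal sketch of what such macros typically look like as inline PTX; the names and semantics follow the description above, but the actual definitions in the repository may differ.

// add.cc sets the carry flag, addc consumes it, the trailing .cc keeps propagating it
#define ADD0(r, a, b)  asm volatile ("add.cc.u64 %0, %1, %2;"  : "=l"(r) : "l"(a), "l"(b))
#define ADDC(r, a, b)  asm volatile ("addc.cc.u64 %0, %1, %2;" : "=l"(r) : "l"(a), "l"(b))
#define ADD(r, a, b)   asm volatile ("addc.u64 %0, %1, %2;"    : "=l"(r) : "l"(a), "l"(b))
// and similarly SUB0 / SUBC / SUB with sub.cc.u64 / subc.cc.u64 / subc.u64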
|
|
|
Many thanks for the tips, I will try this.

You don't want to try the binary? The libcudart.so.10.0 is also available from the given link. You do not need to set up the CUDA SDK 10 (unless a driver problem appears, this may work without installing anything). You can just copy VanitySearch50 and libcudart.so.10.0 into a directory and set LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=.
./VanitySearch50 ...
This is mainly to see if the problem is solved with CUDA 10 or if it comes from elsewhere.
|
|
|
New version is slower on my pc (132 MKeys/s against 162 MKeys/s).
On my Windows, the performance is the same as the previous release (CUDA 10). Slightly slower on Linux (CUDA 8.0), from 39.5 MK/s to 37.9 MK/s. Anyway, do you compile yourself or do you use the Linux binaries? Did you solve your problem? I haven't managed to reproduce the issue yet.
|
|
|
Hello, it ran, but just closed after finding it. Did it generate the private keys into a file? I am confused.
To output the key to a file, use the -o option:
VanitySearch -stop -gpu -o key.txt 1stortz
Many thanks, stivensons, for the report.
|
|
|
If you post a Windows release, I can test it too.

You can test with the release you have. You can try:
VanitySearch -gpuId 0 -check
VanitySearch -gpuId 6 -check (on the 3GB)
Thanks. Tomorrow, I will try to set up the CUDA SDK 10 on recent hardware (Linux) and see if I can reproduce the issue.
|
|
|
OK thanks, could you try to run cuda-memcheck on the release version?
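For reference, cuda-memcheck just wraps the normal command line; a typical invocation against the check mode discussed earlier (binary name and flags taken from the previous posts) would be:

cuda-memcheck ./VanitySearch -gpuId 0 -check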
|
|
|