Bitcoin Forum
November 01, 2024, 03:02:53 AM *
441  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 21, 2019, 06:20:07 PM
It could be due to a wrong optimization of a carry somewhere, which would explain why it works from time to time.
I had a similar problem with the CPU release when I compiled with gcc 6; gcc 7 and Visual C++ work flawlessly.
The patch (also a volatile) is at IntMod.cpp:859 and IntMod.cpp:915.
442  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 21, 2019, 06:12:52 PM
You can also try VanitySearch -u -check
It will perform the check using uncompressed addresses, and so use the CheckHashUncomp() function, which is similar except that it calls GetHash160() instead of GetHash160Comp().

443  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 21, 2019, 06:06:22 PM
endo and sym are computed in CheckHashComp() in GPUCompute.h.
I quoted my last post and added a few comments.
The point (px,py) is always OK, so there are no errors before CHECK_POINT(h, incr, 0);
The errors appear randomly after this line.
It seems that nvcc generates (in your case) wrong code.


Code:
__device__ __noinline__ void CheckHashComp(prefix_t *prefix, uint64_t *px, uint64_t *py,
  int32_t incr, uint32_t tid, uint32_t *lookup32, uint32_t *out) {

  uint32_t   h[20];
  uint64_t   pe1x[4];
  uint64_t   pe2x[4];

  // Point
  _GetHash160Comp(px, py, (uint8_t *)h);
  CHECK_POINT(h, incr, 0);                         // <-- 100% OK up to here, meaning (px,py) is good

  // Endo #1  if (x, y) = k * G, then (beta*x, y) = lambda*k*G
  _ModMult(pe1x, px, _beta);
  _GetHash160Comp(pe1x, py, (uint8_t *)h);   // <-- 50% wrong from here
  CHECK_POINT(h, incr, 1);

  // Endo #2 if (x, y) = k * G, then (beta2*x, y) = lambda2*k*G
  _ModMult(pe2x, px, _beta2);
  _GetHash160Comp(pe2x, py, (uint8_t *)h);
  CHECK_POINT(h, incr, 2);

  ModNeg256(py);

  // Symmetric points

  _GetHash160Comp(px, py, (uint8_t *)h);
  CHECK_POINT(h, -incr, 0);
  _GetHash160Comp(pe1x, py, (uint8_t *)h);
  CHECK_POINT(h, -incr, 1);
  _GetHash160Comp(pe2x, py, (uint8_t *)h);
  CHECK_POINT(h, -incr, 2);

}
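
To make the "Endo" and "Symmetric" comments above concrete, here is a minimal host-side sketch on a toy curve. Everything in it is an assumption for illustration: the field F_13, beta = 3 (a cube root of 1 mod 13), and the names on_curve and six_points are not part of VanitySearch. It only shows why one computed point yields six candidates: (beta*x)^3 = x^3, so the same y stays valid, and negating y keeps y^2 unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Toy parameters (illustration only, NOT secp256k1):
// curve y^2 = x^3 + 7 over F_13, with beta = 3 since 3^3 = 27 = 1 (mod 13).
constexpr uint32_t P = 13;
constexpr uint32_t BETA = 3;
constexpr uint32_t BETA2 = (BETA * BETA) % P;  // 9, the other cube root of 1

struct Pt { uint32_t x, y; };

// True if (x, y) satisfies y^2 = x^3 + 7 (mod P).
bool on_curve(uint32_t x, uint32_t y) {
  return (y * y) % P == (x * x * x + 7) % P;
}

// From one point, derive six candidates in the same order as the kernel:
// the point, endo #1, endo #2 (same y, x scaled by beta and beta^2),
// then the negation of each (same x, y replaced by P - y).
void six_points(Pt p, Pt out[6]) {
  uint32_t xs[3] = { p.x, (BETA * p.x) % P, (BETA2 * p.x) % P };
  for (int i = 0; i < 3; i++) {
    out[i]     = { xs[i], p.y };
    out[i + 3] = { xs[i], (P - p.y) % P };
  }
}
```

On the real curve each of the two extra x values costs one modular multiplication, which is much cheaper than computing a new point.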

444  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 21, 2019, 05:17:11 PM
OK thanks, it works :)

On my GTX 645, performance is the same. Sqr brings a few more spill moves (there are more temp variables than in ModMult).
I haven't tried it yet on the old Quadro 600.
I will see if I can save a few registers.

With Sqr
1>    33280 bytes stack frame, 128 bytes spill stores, 436 bytes spill loads
Without Sqr
1>    33280 bytes stack frame, 120 bytes spill stores, 424 bytes spill loads
445  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 21, 2019, 04:02:33 PM
From 153 MKeys/s to 160 MKeys/s

using a _ModSqr instead of _ModMult

Thanks, I tried, but the -check failed.
I will have a look at it.
I committed the patch with a few of your mods; I also reviewed the main loop a bit.
446  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 21, 2019, 03:24:12 PM
Still errors.

OK, thanks for testing. I give up for the moment. I have run out of ideas.
I am leaving the volatile in.
I hope I will manage to reproduce this.
447  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 21, 2019, 01:48:12 PM
Arg...
Could you try this (for the 2 ModMult)?
With these mods, all instructions of the ModMult will be volatile and, theoretically, cannot be moved or removed by the compiler.

Code:
#define SET0(a) asm volatile ("mov.u64 %0,0;" : "=l"(a))

// ---------------------------------------------------------------------------------------
// Compute a*b (mod n)
// a and b must be lower than n
// ---------------------------------------------------------------------------------------

__device__ void _ModMult(uint64_t *r, uint64_t *a, uint64_t *b) {

  uint64_t r512[8];
  uint64_t t[NBBLOCK];
  uint64_t ah,al;

  SET0(r512[5]);
  SET0(r512[6]);
  SET0(r512[7]);

  // 256*256 multiplier
448  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 21, 2019, 08:06:21 AM
Hello,

@arulbero

Could you try this file:
http://zelda38.free.fr/VanitySearch/GPUEngine.cu

I unrolled the UMult macro; maybe nvcc performs a wrong optimization because of it.
The volatile causes a 10% performance loss on my Windows machine, a bit less on my Linux one.

Code:
// Reduce from 512 to 320 
-  UMult(t,(r512 + 4), 0x1000003D1ULL);
+  UMULLO(t[0],r512[4],0x1000003D1ULL);
+  UMULLO(t[1],r512[5],0x1000003D1ULL);
+  MADDO(t[1], r512[4],0x1000003D1ULL,t[1]);
+  UMULLO(t[2],r512[6],0x1000003D1ULL);
+  MADDC(t[2],r512[5],0x1000003D1ULL, t[2]);
+  UMULLO(t[3],r512[7],0x1000003D1ULL);
+  MADDC(t[3],r512[6],0x1000003D1ULL, t[3]);
+  MADD(t[4],r512[7],0x1000003D1ULL, 0ULL);
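
For readers following the diff above: the constant comes from the secp256k1 prime p = 2^256 - 2^32 - 977, so 2^256 is congruent to 0x1000003D1 mod p, and the high 256 bits of the 512-bit product can be folded back in by multiplying them by that constant. The result fits in 320 bits (five 64-bit limbs). Below is a minimal portable sketch of that 4-limb by 64-bit multiply in plain C++. The names mul64 and umult5 are hypothetical, and this is not the PTX carry chain used in GPUEngine.cu, only the same arithmetic written out.

```cpp
#include <cassert>
#include <cstdint>

// Full 64x64 -> 128-bit product, built from 32-bit halves (portable C++).
static void mul64(uint64_t a, uint64_t b, uint64_t &hi, uint64_t &lo) {
  uint64_t a0 = a & 0xFFFFFFFFULL, a1 = a >> 32;
  uint64_t b0 = b & 0xFFFFFFFFULL, b1 = b >> 32;
  uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
  uint64_t mid = (p00 >> 32) + (p01 & 0xFFFFFFFFULL) + (p10 & 0xFFFFFFFFULL);
  lo = (p00 & 0xFFFFFFFFULL) | (mid << 32);
  hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

// t (5 limbs, little-endian) = a (4 limbs, little-endian) * c.
// Each step adds the high word of the previous partial product,
// which is what the MADDO/MADDC/MADD chain does on the GPU.
void umult5(uint64_t t[5], const uint64_t a[4], uint64_t c) {
  uint64_t carry = 0;
  for (int i = 0; i < 4; i++) {
    uint64_t hi, lo;
    mul64(a[i], c, hi, lo);
    lo += carry;
    hi += (lo < carry);  // cannot overflow: hi is at most 2^64 - 2
    t[i] = lo;
    carry = hi;
  }
  t[4] = carry;
}
```

Calling umult5(t, r512 + 4, 0x1000003D1ULL) on the host would compute the same 320-bit value that the unrolled lines produce, which makes it usable as a reference when checking GPU output.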
449  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 20, 2019, 04:54:52 PM

Hello
Is it possible, Jean Luc, to compile it as an .exe for CUDA 8 under Windows, or does it only work on Linux with CUDA 8?

It is on my task list, but on Windows it is not easy to juggle several releases of Visual C++. On Linux it is clearer and simple enough. For Windows, I have to set up a full configuration with the right compiler for CUDA 8.

It works!!! A little slower, but it is correct now!

Good news ;)
I will add the patch in the next release.
450  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 20, 2019, 03:46:16 PM
:-[

Another try:

GPU/GPUEngine.cu: 465
and
GPU/GPUEngine.cu: 514

Code:
   volatile uint64_t r512[8];

volatile prevents the compiler from optimizing accesses to the variable and from removing code it considers unused.
I had a problem with gcc 6 concerning this on the CPU release.
451  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 20, 2019, 02:41:40 PM
I compiled a CUDA 8 binary, if you want to check whether you get the same behavior.
http://zelda38.free.fr/VanitySearch/1.9/VanitySearch50_cuda8

On my install with SDK 8, it uses 135 registers and 0 spill moves.
With SDK 10, only 120 registers and also 0 spill moves.
452  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 20, 2019, 12:59:39 PM
Already tried with "LD_LIBRARY_PATH", the problem is the driver. I have Ubuntu 17.04, I cannot install a new driver on it.

OK, it's too bad that the driver is not compatible.

I tried your function on my Linux config, but it does not bring a significant performance increase.
This is mainly because adding temporary variables adds more spill moves, which are slower; sometimes it is better to recompute.
On your hardware you have many more registers available, so the performance increase should be more significant.

A tip: maybe you can try playing with the max register setting (nvcc -maxrregcount) in the makefile. For compute capability 5.0, nvcc from CUDA 10 uses 120 registers.
The random problem you have may also be due to wrong register sharing between threads, which could explain the strange and random behavior. Reducing the number of registers used, by inlining, also reduces the probability that this happens.
It might be an explanation...

453  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 20, 2019, 11:56:10 AM
(I'm not sure what C means, I suppose it means with carry)

Yes,
ADD0 is the initial add without carry, and sets the carry flag
ADDC is add with carry, and sets the carry flag
ADD is add with carry, and does not set the carry flag
Same for SUB
Functions may have a 1 suffix for the unary variant.
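
As an illustration of the carry chain these macro names describe, here is a small host-side sketch in plain C++ (the name add256 is hypothetical, and this is not the PTX used in the GPU code). The first limb plays the role of the initial add with no carry in, the middle limbs add with carry and produce a new carry, and the last limb consumes the carry. Unlike the final ADD, this sketch returns the last carry instead of dropping it, so it can be checked.

```cpp
#include <cassert>
#include <cstdint>

// r = a + b on 256-bit numbers stored as four little-endian 64-bit limbs.
// Returns the carry out of the top limb (0 or 1).
uint64_t add256(uint64_t r[4], const uint64_t a[4], const uint64_t b[4]) {
  uint64_t carry = 0;  // no carry into the first limb
  for (int i = 0; i < 4; i++) {
    uint64_t s = a[i] + carry;
    uint64_t c1 = (s < carry);   // overflow from adding the incoming carry
    uint64_t sum = s + b[i];
    uint64_t c2 = (sum < s);     // overflow from adding b[i]
    r[i] = sum;
    carry = c1 | c2;             // at most one of the two can be set
  }
  return carry;
}
```

On the GPU the hardware carry flag does this bookkeeping, which is why the order of the instructions in the chain must not be changed by the compiler.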
454  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 20, 2019, 11:50:28 AM
Many thanks for the tips ;)
I will try this.

You don't want to try the binary? The libcudart.so.10.0 is also available from the given link. You do not need to set up the CUDA SDK 10 (unless a driver problem appears, but this may work without installing anything).
You can just copy VanitySearch50 and libcudart.so.10.0 into a directory and set LD_LIBRARY_PATH.
Code:
export LD_LIBRARY_PATH=.
./VanitySearch50 ...

This is mainly to see if the problem is solved with CUDA 10 or if it comes from elsewhere.
455  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 20, 2019, 10:07:38 AM
New version is slower on my pc (132 MKeys/s against 162 MKeys/s).

On my Windows machine, performance is the same as with the previous release (CUDA 10).
It is slightly slower on Linux (CUDA 8.0), from 39.5 MK/s to 37.9 MK/s.

Anyway,
Do you compile, or do you use the Linux binaries?
Did you solve your problem? I haven't managed to reproduce the issue yet.
456  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 20, 2019, 09:00:00 AM
Hello,

A new release of VanitySearch (1.9) is out:

Code:
Added -b option (Search compressed or uncompressed addresses)
Improved performance for loading large prefix list
Fixed difficulty calculation bug for prefix containing only '1'

Windows binaries: https://github.com/JeanLucPons/VanitySearch/releases/tag/1.9
Linux binaries: http://zelda38.free.fr/VanitySearch/ (Experimental)

Thanks for testing it! :)
457  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 18, 2019, 12:15:32 PM
Linux binaries are available for download here (experimental).
They are compiled with CUDA SDK 10.
Thanks for testing them ;)

http://zelda38.free.fr/VanitySearch/
458  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 18, 2019, 06:53:09 AM
Hello,

it ran, but just closed after finding it
did it generate the private keys into a file?
I am confused

To output the key to a file, use the -o option.
Code:
VanitySearch -stop -gpu -o key.txt 1stortz

Many thanks stivensons for the report :)
459  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 17, 2019, 06:28:46 PM
If you post a Windows release, I can test it too :)

You can test with the release you have.
You can try:
Code:
VanitySearch -gpuId 0 -check 
VanitySearch -gpuId 6 -check (On the 3GB)
Thanks ;)


Tomorrow, I will try to set up CUDA SDK 10 on recent hardware (Linux) and see if I can reproduce the issue.



460  Bitcoin / Development & Technical Discussion / Re: VanitySearch (Yet another address prefix finder) on: March 17, 2019, 04:55:04 PM
OK, thanks. Could you try running cuda-memcheck on the release version?