Bitcoin Forum
Author Topic: Large Bitcoin Collider (Collision Finders Pool)  (Read 193407 times)
TooDumbForBitcoin
Legendary
*
Offline Offline

Activity: 1638
Merit: 1001



View Profile
February 11, 2017, 10:36:23 AM
 #441

Good morning!

HeavenlyCreatures found #49


Code:
From	XXX
To bots@cryptoguru.org
Date Today 08:02
Hi,

I found #49

0d2f533966c6578e1111978ca698f8add7fffdf3:c:priv:000000000000000000000000000000000000000000000000000174176b015001
+ 0xf4c

Looking at the PK, the pool must have found it GMT: Sat, 11 Feb 2017 04:32:26 GMT
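
The find line above appears to report a base private key plus a hexadecimal offset, so the actual key would be their sum. A minimal Python sketch under that reading of the format (`parse_find` is a made-up helper name, not part of LBC):

```python
def parse_find(line: str, offset: str) -> dict:
    """Split 'hash160:type:priv:base' and add the '+ 0x...' offset to the base key."""
    hash160, key_type, tag, base_hex = line.strip().split(":")
    if tag != "priv":
        raise ValueError("unexpected field: " + tag)
    key = int(base_hex, 16) + int(offset.replace("+", "").strip(), 16)
    return {"hash160": hash160, "compressed": key_type == "c", "key": key}

find = parse_find(
    "0d2f533966c6578e1111978ca698f8add7fffdf3:c:priv:"
    "000000000000000000000000000000000000000000000000000174176b015001",
    "+ 0xf4c",
)
print(hex(find["key"]), find["key"].bit_length())
```

The resulting key is 49 bits long, which is consistent with it being the key for #49.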

edit: trophies update.

cheers!

Rico

16 days, 16 days, 33 days, 67 days, ......



becoin
Legendary
*
Offline Offline

Activity: 3431
Merit: 1233



View Profile
February 11, 2017, 01:44:37 PM
 #442

Quote from: TooDumbForBitcoin on February 11, 2017, 10:36:23 AM
HeavenlyCreatures found #49
[...]
16 days, 16 days, 33 days, 67 days, ......

And all of them hold a standard, negligible amount of bitcoins, right? :D
I bet they are all from early-days bitcoin conferences, when participants were given QR-code badges with naked privkeys. There are hundreds if not thousands of them. Don't waste your time with this 'project', just contact the conference organizers and ask for the list of privkeys!
arulbero
Legendary
*
Offline Offline

Activity: 1941
Merit: 2094


View Profile
February 11, 2017, 08:04:21 PM
Last edit: February 12, 2017, 11:28:06 AM by arulbero
 #443

Quote
I tried it. On my notebook it takes

Code:
real    0m26.493s
user    0m26.490s
sys     0m0.000s

for the ~4.1mio keys (1000 * 4096). And

Code:
real    1m47.661s
user    1m47.657s
sys     0m0.003s

for 16M keys. So around ~160 000 keys/s

OK, I don't know how to use lists in Python  ;D

This version is faster (by at least 50%) with only a few modifications:

https://www.dropbox.com/s/wrbolxzbiu3y9su/ecc_for_collider04.zip?dl=0


Another update of the library with endomorphism:

https://www.dropbox.com/s/7v5i36n4k6d849b/ecc_for_collider05.zip?dl=0
rico666 (OP)
Legendary
*
Offline Offline

Activity: 1120
Merit: 1037


฿ → ∞


View Profile WWW
February 11, 2017, 08:27:52 PM
 #444

Quote
Another update of the library with endomorphism:

https://www.dropbox.com/s/7v5i36n4k6d849b/ecc_for_collider05.zip?dl=0

05 takes nearly 22 seconds for 16M keys on my notebook. This is now only 3.5 times slower than what the LBC optimized C version needs for 16M keys. I don't dare to estimate what optimized C code can make of this.

 :o

Code:
real    0m21.790s
user    0m21.787s
sys     0m0.000s

Your code is a tremendous help, but it would speed up my understanding of the code and porting it to C (and possibly OpenCL) if there were more comments. e.g. you start with

Code:
start=2**55+789079076   #k is a random private key 

but start cannot be smaller than 2049 else

Code:
$ time python ./gen_batch_points05.py 
Traceback (most recent call last):
  File "./gen_batch_points05.py", line 50, in <module>
    kminverse = [invjkz] + inv_batch(kx,mGx,p)
  File "/data/soft/lin/LBC/generator/HRD/arulbero-ECC/5/ecc_for_collider05.py", line 56, in inv_batch
    inverse=inv(partial[2048],p) # 1I
  File "/data/soft/lin/LBC/generator/HRD/arulbero-ECC/5/ecc_for_collider05.py", line 32, in inv
    q, r = divmod(v,u)
ZeroDivisionError: integer division or modulo by zero

It'd also help if you could lay out the effective sequence of private keys as it is computed, because if LBC should adopt this, I have to merge this - somehow - with the LBC interval arithmetic to make sure work is still distributable / parallelizable and the bookkeeping stays sane.


Rico

all non self-referential signatures except mine are lame ... oh wait ...   ·  LBC Thread (News)  ·  Past BURST Activities
arulbero
Legendary
*
Offline Offline

Activity: 1941
Merit: 2094


View Profile
February 11, 2017, 09:09:40 PM
Last edit: February 12, 2017, 03:38:54 PM by arulbero
 #445

Imagine you want to generate a batch from 10000 to 14096 (the script actually generates batches of 4097 points)

First you generate the key k = 12048 (we always start with the middle point, to exploit the symmetry). This is the only point (the pivot point) of the batch that we compute with the slower function mul:

Code:
... k ...  <-- one batch, only one key k

jkx,jky,jkz = mul(k,Gx,Gy,1)
invjkz = inv(jkz,p)
(kx,ky) = jac_to_aff(jkx, jky, jkz, invjkz)


k can be any number greater than 2048 (otherwise, if k=3 for example, kG+3G gives an error because you are trying to use the addition formula instead of the doubling formula...). The first batch you can create with this script goes from 1 to 4097; the start key in that case would be k=2049.
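
The traceback in the previous post ends in `inv_batch`, which suggests that all the denominators of a batch are inverted with a single modular inversion. A minimal sketch of that idea (Montgomery's batch inversion), assuming this is roughly what the library does; `batch_inverse` is a hypothetical name and the code is not taken from ecc_for_collider:

```python
P = 2**256 - 2**32 - 977  # secp256k1 field prime

def batch_inverse(values, p=P):
    """Invert every element of `values` mod p using one modular inversion."""
    if not values:
        return []
    partial = []  # partial[i] = values[0] * ... * values[i] mod p
    acc = 1
    for v in values:
        acc = acc * v % p
        partial.append(acc)
    if acc == 0:
        # A single zero element (plausibly x(kG) - x(iG) when k == i <= 2048)
        # makes the whole running product zero, so nothing can be inverted.
        raise ZeroDivisionError("a batch element is 0 mod p")
    inv = pow(acc, p - 2, p)  # the one inversion, via Fermat's little theorem
    out = [0] * len(values)
    for i in range(len(values) - 1, 0, -1):
        out[i] = inv * partial[i - 1] % p  # peel off the inverse of values[i]
        inv = inv * values[i] % p          # now the inverse of partial[i - 1]
    out[0] = inv
    return out

inverses = batch_inverse([3, 5, 7, 123456789])
```

With n elements this costs one inversion plus about 3(n-1) multiplications, which is why a single zero in the list is enough to break the whole batch.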

Then the script generates three batches; each batch has 1 point + 2048 pairs of points:

first batch: this is the batch you are most interested in, because it has 4097 points in your range, including the point 12048G:

(12048),(12048+1,12048-1),(12048+2,12048-2),....,(12048+2048=14096,12048-2048=10000)

the script computes this batch with the function double_add_P_Q_inv

Element #0 of the list is always kG, element #1 is the pair kG+1G, kG-1G, #2 is the pair kG+2G, kG-2G, and so on ... --> #2048 is the pair kG+2048G, kG-2048G

Code:
batch = batch + list(map(double_add_P_Q_inv,kxl[1:],kyl[1:],mGx[1:],mGy[1:],kminverse[1:]))	

Batch 1 and 2: these keys are not in your range, here we use endomorphism:

batch1:
(12048*lambda), ((12048+1)*lambda,(12048-1)*lambda), ((12048+2)*lambda,(12048-2)*lambda),  ...., (14096*lambda,10000*lambda)

batch2:
(12048*lambda^2),   ((12048+1)*lambda^2, (12048-1)*lambda^2),   ((12048+2)*lambda^2, (12048-2)*lambda^2),  ....,  (14096*lambda^2, 10000*lambda^2)
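
The layout of the three batches can be sketched in terms of private keys only. This is an illustration, not the library's code: `batch_private_keys` is a made-up name, and `LAMBDA` is derived here as some non-trivial cube root of 1 mod n, so it may be either lambda or lambda^2 of the actual endomorphism.

```python
# secp256k1 group order n
N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141

# Assumption: any non-trivial cube root of 1 mod N stands in for lambda.
LAMBDA = next(r for r in (pow(g, (N - 1) // 3, N) for g in range(2, 50)) if r != 1)

def batch_private_keys(k, half=2048, factor=1):
    """Pivot key first, then the pairs (k+i, k-i), each multiplied by `factor` mod N."""
    keys = [k * factor % N]
    for i in range(1, half + 1):
        keys.append(((k + i) * factor % N, (k - i) * factor % N))
    return keys

main = batch_private_keys(12048)
batch1 = batch_private_keys(12048, factor=LAMBDA)
batch2 = batch_private_keys(12048, factor=LAMBDA * LAMBDA % N)
print(main[0], main[1], main[-1])  # 12048 (12049, 12047) (14096, 10000)
```

Only `main` needs point additions; under the description above, the other two batches follow from it at one field multiplication per point.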

EDIT:
Quote
to make sure work is still distributable / parallelizable and the bookkeeping stays sane.
You don't need to worry about each key. In my opinion you have to store only one private key for the 3 batches; you can think of the single key in the middle of the batch as a special seed. 99,9999% of the batches don't match any address with bitcoin, so only when a match occurs do you have to regenerate the entire 3 batches from this single seed to fetch the correct private key. Batches 1 and 2 are sequences of keys that all differ from each other, so you can be sure that you are not wasting your computational effort. I'm almost sure about the last sentence: there can't be more than three points with the same y, so it is not possible to check the same key twice. Note that the 3 batches are related; they must be computed together.

Imagine you know that the pool has searched so far from key 1 to 2^50, then you know that the pool has searched keys 1*lambda, 2*lambda, 3*lambda ...  to 2^50*lambda (mod n) too, and keys 1*lambda^2, 2*lambda^2, 3*lambda^2,... to 2^50*lambda^2 (mod n).
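
For the bookkeeping, this means a key counts as searched if it equals k, k*lambda or k*lambda^2 (mod n) for some k in the searched interval. A minimal sketch of such a check; `covered_by` is a hypothetical helper, and `LAMBDA` is again just some non-trivial cube root of 1 mod n (an assumption, not the constant from the library):

```python
# secp256k1 group order n
N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141

LAMBDA = next(r for r in (pow(g, (N - 1) // 3, N) for g in range(2, 50)) if r != 1)
LAMBDA_INV = LAMBDA * LAMBDA % N  # lambda^-1 == lambda^2, because lambda^3 == 1

def covered_by(key, searched_upto):
    """Return j in {0, 1, 2} if key == k * LAMBDA**j (mod N) with 1 <= k <= searched_upto,
    otherwise None."""
    k = key % N
    for j in range(3):
        if 1 <= k <= searched_upto:
            return j
        k = k * LAMBDA_INV % N  # strip one factor of lambda and try again
    return None

print(covered_by(12345 * LAMBDA % N, 2**50))  # 1
```

So the server only has to remember the plain interval; the two "shadow" intervals are implied by it.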


Quote
05 takes nearly 22 seconds for 16M keys on my notebook. This is now only 3.5 times slower than what the LBC optimized C version needs for 16M keys. I don't dare to estimate what optimized C code can make of this.

 :o

I dare: if you use the complement too, you can generate 16M keys in less than half a second (on CPU; I don't know about GPU).

Considering that your current code performs 6M + 1S (M = field multiplication, S = field squaring) just for the transition from Jacobian to affine coordinates for each point, and that you are using J+J --> J to perform each addition (12M + 4S), your current cost should be 18M + 5S per point.

Let's say 1S = 0,8M; then you have about 22M per point.

If you are instead now using J+A --> J to perform the addition (8M + 3S), then you have about 17,2M per point.

My code uses 3,5M + 1S for each point of the first batch, and only 1M for each point of the other 2 batches.
So the average is: 5,5/3 = 1,83M + 0,33S per point, let's say about 2,1M per point.
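
The arithmetic behind these estimates, as a small Python check. The operation counts and the 1S = 0,8M ratio are the ones stated in this post; the variable names are only for illustration:

```python
S_COST = 0.8  # one squaring counted as 0.8 multiplications (the post's assumption)

def in_mults(m, s):
    """Express a cost of m multiplications + s squarings in multiplications."""
    return m + s * S_COST

TO_AFFINE = (6, 1)  # Jacobian -> affine conversion, per point
ADD_JJ = (12, 4)    # J + J -> J addition
ADD_JA = (8, 3)     # J + A -> J addition

jj = in_mults(ADD_JJ[0] + TO_AFFINE[0], ADD_JJ[1] + TO_AFFINE[1])
ja = in_mults(ADD_JA[0] + TO_AFFINE[0], ADD_JA[1] + TO_AFFINE[1])
# One main-batch point (3.5M + 1S) plus two endomorphism points (1M each).
batched = (in_mults(3.5, 1) + 1 + 1) / 3

print(f"J+J: {jj:.1f}M  J+A: {ja:.1f}M  batched: {batched:.1f}M")
print(f"ratio J+J / batched: {jj / batched:.1f}x")
```

The ratio only counts field operations; memory traffic and hashing are not part of this estimate.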


Now your speed is 16M keys / 6 s = 2,7 Mkeys/s for each CPU core.

If you could achieve an 8x - 10x improvement, let's say 8x, you could perform at least 21 Mkeys/s. If you use (X,Y) --> (X,-Y) too, 42 Mkeys/s. Let's say at least 40 Mkeys/s for each core, 15x your current speed.
With an 8-core CPU, you could generate more keys than your entire pool can handle at this moment.


Maybe tomorrow I'll add more comments to the code. Anyway, read this post again; I edited it.

EDIT2:

this is a version with more comments:

https://www.dropbox.com/s/6o2az7n6x0luld4/ecc_for_collider06.zip?dl=0
rico666 (OP)
Legendary
*
Offline Offline

Activity: 1120
Merit: 1037


฿ → ∞


View Profile WWW
February 12, 2017, 12:03:09 PM
Last edit: February 12, 2017, 08:50:27 PM by rico666
 #446

  • New BLF file on FTP
  • New LBC client version (1.010) available

./LBC -u is your friend.

As mentioned in #433, you can now attach a BTC address with your id for rewards to your client.
As mentioned in #436, you can now call the LBC client with a --gpu parameter. The best case scenario you will see is currently this:

Code:
$ ./LBC --gpu
OpenCL diagnostics written.
GPU authorized: yes

If you see this, you're on the highway to a GPU-accelerated client. If you instead see this:


Code:
Perl module 'OpenCL' not found - please make sure:
 * OpenCL is installed correctly on your system
 * then install the Perl OpenCL module via CPAN
   (cpan install OpenCL)

you want to make sure OpenCL is installed correctly on your system. Some pointers to do so:
https://wiki.tiker.net/OpenCLHowTo
http://askubuntu.com/questions/796770/how-to-install-libopencl-so-on-ubuntu

Won't work in a VM. At least not without advanced magic. If oclvanitygen runs on your system, you're fine. The only thing left to do is to install the Perl bindings for OpenCL:

https://metacpan.org/pod/OpenCL

For this, it's the usual:

Code:
$ cpan
cpan> install OpenCL

or - in one batch:

$ cpan install OpenCL

The message "OpenCL diagnostics written" indicates you will see a file diagnostics-OpenCL.txt in your directory. Please do not post its output here as it is quite extensive. Either pastebin it and post the link here, or send its content to bots@cryptoguru.org. (If there are any problems, or if you want to make sure your config is supported).

Well, and if you see a

Code:
GPU authorized: no
instead and you want to change that, you need to be in the top 30 or will have to fork out 0.1 BTC :)

OpenCL generator ETA: "really soon now(tm)"


Rico


edit:

Short HowTo install LBC @ AWS Ubuntu instance including OpenCL


Code:
# $ is shell/bash
# cpan> is cpan shell


$ sudo apt-get update
$ sudo apt-get install gcc xdelta3 make
$ sudo apt-get install nvidia-opencl-dev nvidia-opencl-icd-367 nvidia-modprobe clinfo
$ clinfo
$ sudo cpan
cpan> install JSON OpenCL

$ mkdir collider; cd collider; tmux
$ wget ftp://ftp.cryptoguru.org/LBC/client/LBC
$ chmod a+x LBC
$ ./LBC -h

rico666 (OP)
Legendary
*
Offline Offline

Activity: 1120
Merit: 1037


฿ → ∞


View Profile WWW
February 13, 2017, 02:36:57 PM
 #447

Unoptimized CPU/GPU hybrid generator. 1st successful run on 1 CPU core with Nvidia GPU in tandem: 1811207 keys/s



Code:
$ time hrd-core -I 0000000000000000000000000000000000000000000000000000000000000001 -c 10000
Num platforms: 2
Platform - 0
  1.1 CL_PLATFORM_NAME: Intel(R) OpenCL
  1.2 CL_PLATFORM_VENDOR: Intel(R) Corporation
  1.3 CL_PLATFORM_VERSION: OpenCL 2.0
  1.4 CL_PLATFORM_PROFILE: FULL_PROFILE
  1.5 CL_PLATFORM_EXTENSIONS: cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
  Device - 0:
    CL_DEVICE_NAME: Intel(R) HD Graphics
    CL_DEVICE_VENDOR: Intel(R) Corporation
    CL_DRIVER_VERSION: r2.0.54425
    CL_DEVICE_VERSION: OpenCL 2.0
    CL_DEVICE_MAX_COMPUTE_UNITS: 24
Platform - 1
  2.1 CL_PLATFORM_NAME: NVIDIA CUDA
  2.2 CL_PLATFORM_VENDOR: NVIDIA Corporation
  2.3 CL_PLATFORM_VERSION: OpenCL 1.2 CUDA 8.0.0
  2.4 CL_PLATFORM_PROFILE: FULL_PROFILE
  2.5 CL_PLATFORM_EXTENSIONS: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_khr_gl_event
  Device - 0:
    CL_DEVICE_NAME: Quadro M2000M
    CL_DEVICE_VENDOR: NVIDIA Corporation
    CL_DRIVER_VERSION: 375.26
    CL_DEVICE_VERSION: OpenCL 1.2 CUDA
    CL_DEVICE_MAX_COMPUTE_UNITS: 5
2d17543d32448acc7a1c43c5f72cd5be459ab302:u:priv:0000000000000000000000000000000000000000000000000000000000000001 + 0x5e
02e62151191a931d51cdc513a86d4bf5694f4e51:c:priv:0000000000000000000000000000000000000000000000000000000000000001 + 0x65
9d74ffdb31068ca2a1feb8e34830635c0647d714:u:priv:00000000000000000000000000000000000000000000000000000000000f9001 + 0xf8c
3d6871076780446bd46fc564b0c443e1fd415beb:c:priv:00000000000000000000000000000000000000000000000000000000000f9001 + 0xf8c
response: 30-19-0

real    0m9.263s
user    0m8.117s
sys     0m1.097s


Rico

arulbero
Legendary
*
Offline Offline

Activity: 1941
Merit: 2094


View Profile
February 13, 2017, 05:11:28 PM
Last edit: February 13, 2017, 05:37:37 PM by arulbero
 #448

Hi,

Quote
Unoptimized CPU/GPU hybrid generator. 1st successful run on 1 CPU core with Nvidia GPU in tandem: 1811207 keys/s

CPU only for public keys generation + GPU for sha256/ripemd160? Why has the pool performance dropped in the meantime?


I have a new version of the ecc_for_collider:

1) + complement private keys

2) + comments

https://www.dropbox.com/s/3jsxjy7sntx3p4a/ecc_for_collider07.zip?dl=0

The file foo.py performs 16,4 M useless multiplications, just to let you appreciate the efficiency of the public key generation in the script gen_batches_points07.py:

main_batch --> (x,y)   3,5M + 1S for each point

batch2 --> (betax,y)     1M for each point

batch3 --> (beta^2*x,y) 1M for each point

batch_minus --> (x,-y)  (betax,-y) (beta^2*x,-y)  0M and 0S

Total:  about 1,1M for each point!
If you know the performance of the field multiplication in your C code, you can get an idea of the performance you could reach. How long does your C code take to perform 16,4 M multiplications (operands: big numbers, multiplication mod p)?
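
The maps (x,y) --> (beta*x, y) and (x,y) --> (x,-y) listed above can be checked with a few lines of Python. This is a sketch under assumptions: `six_points` is a made-up name, and `BETA` is derived as some non-trivial cube root of 1 mod p, so it is not necessarily the root that pairs with a particular lambda.

```python
P = 2**256 - 2**32 - 977  # secp256k1 field prime
# Generator point G of secp256k1
GX = 0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798
GY = 0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8

BETA = next(r for r in (pow(g, (P - 1) // 3, P) for g in range(2, 50)) if r != 1)

def on_curve(x, y):
    """Check y^2 == x^3 + 7 (mod P)."""
    return (y * y - x * x * x - 7) % P == 0

def six_points(x, y):
    """From one affine point, derive six points with two multiplications in total."""
    xs = [x, BETA * x % P, BETA * BETA * x % P]
    return [(xi, yi) for xi in xs for yi in (y, P - y)]  # negating y is free

points = six_points(GX, GY)
print(all(on_curve(x, y) for x, y in points), len(set(points)))  # True 6
```

All six points stay on the curve because (beta*x)^3 = x^3 and (-y)^2 = y^2; which private key belongs to which point is not determined by this sketch.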

In the next few days I want to perform some tests on the endomorphism, just to be sure that everything is OK (for example, we'd like to avoid computing the same key twice).
rico666 (OP)
Legendary
*
Offline Offline

Activity: 1120
Merit: 1037


฿ → ∞


View Profile WWW
February 13, 2017, 05:48:37 PM
 #449

Quote
CPU only for public keys generation + GPU for sha256/ripemd160?

Exactly. Meanwhile I am at

Code:
real    0m8.561s
user    0m8.093s
sys     0m0.413s

(= 1959955 keys/s per CPU core with GPU support) and memory requirement on GPU a mere 29MB (GPU is bored)
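
The rates quoted in this exchange follow from dividing the number of keys by the wall-clock time. A quick check in Python, assuming "16M keys" means 2^24 = 16777216 keys (an assumption, but one that reproduces the quoted figures):

```python
KEYS = 16 * 2**20  # "16M keys", assumed to be 2^24

def keys_per_second(seconds, keys=KEYS):
    """Throughput of one run, from its wall-clock ('real') time."""
    return keys / seconds

print(int(keys_per_second(9.263)))  # 1811207, the first hybrid run
print(int(keys_per_second(8.56)))   # 1959955, the later run with its time rounded to 8.56 s
```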

Quote
Why has the pool performance dropped in the meantime?

Because two (in words: two!) guys turned their machines off.  :D
I have a feeling this dip in performance is only temporary...

Of the aforementioned 8 seconds, around 6.2 are ECC public key generation (16M uncompressed keys, the compressed key is done @ GPU).
Every second less here counts, so naturally all you did towards ECC optimization will have maximum effect with the CPU/GPU hybrid.


Rico

arulbero
Legendary
*
Offline Offline

Activity: 1941
Merit: 2094


View Profile
February 13, 2017, 06:17:16 PM
 #450

Quote
CPU only for public keys generation + GPU for sha256/ripemd160?

Exactly. meanwhile I am at
Code:
real    0m8.561s
user    0m8.093s
sys     0m0.413s

(= 1959955 keys/s per CPU core with GPU support) and memory requirement on GPU a mere 29MB (GPU is bored)

Of the aforementioned 8 seconds, around 6.2 are ECC public key generation (16M uncompressed keys, the compressed key is done @ GPU).

6,2 s: CPU generates 16.7 M of public keys (x,y)
1,8 s: GPU performs SHA256 / ripemd160 of (x,y) and (x) <-compressed, what do you mean "compressed key is done with GPU"? Do you use 1 or 2 compressed keys? The x is always the same, you don't need to compute the y so you can generate 2 compressed keys for each uncompressed. Do you generate 33M of addresses each 8s or 50M of addresses?

Anyway, at the moment the CPU is the bottleneck; the GPU does its work at least 3x faster than the CPU...
rico666 (OP)
Legendary
*
Offline Offline

Activity: 1120
Merit: 1037


฿ → ∞


View Profile WWW
February 13, 2017, 06:34:52 PM
Last edit: February 13, 2017, 08:17:28 PM by rico666
 #451

Quote
6,2 s: CPU generates 16.7 M of public keys (x,y)
1,8 s: GPU performs SHA256 / ripemd160 of (x,y) and (x) <-compressed,

Yes.

Quote
what do you mean "compressed key is done with GPU"?

Code:
sha256_in[0] = 0x02 | (sha256_in[64] & 0x01);

 ;)
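
For readers less used to C: the line above derives the compressed-key prefix from the parity of the last byte of Y. The same step in Python, as a sketch; the function name is illustrative, and it assumes the buffer holds the standard 65-byte uncompressed encoding 0x04 || X || Y:

```python
def compress_pubkey(uncompressed: bytes) -> bytes:
    """Turn 65 bytes 0x04 || X || Y into 33 bytes (0x02 or 0x03) || X."""
    if len(uncompressed) != 65 or uncompressed[0] != 0x04:
        raise ValueError("expected a 65-byte uncompressed public key")
    prefix = 0x02 | (uncompressed[64] & 0x01)  # 0x02 if Y is even, 0x03 if Y is odd
    return bytes([prefix]) + uncompressed[1:33]

# Example with the secp256k1 generator point G.
GX = 0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798
GY = 0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8
g_uncompressed = b"\x04" + GX.to_bytes(32, "big") + GY.to_bytes(32, "big")
print(compress_pubkey(g_uncompressed).hex())
```

No extra ECC work is needed for the compressed key: X is reused, and only one bit of Y matters.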


Quote
Anyway, at the moment the CPU is the bottleneck; the GPU does its work at least 3x faster than the CPU...

Sure. It is a 1st step. The big advantage of this is that it works like a drop-in replacement.
I see lots of optimization potential. Originally, my notebook maxed out at ~2.8 Mkeys/s, and now

Code:
$ LBC -c 8
Ask for work... got blocks [383054009-383054392] (402 Mkeys)
oooooooooooooooooooooooo (7.30 Mkeys/s)


Rico






edit:


LOL...

Code:
$ LBC -t 1 -l 0
Ask for work... Server doesn't like us. Answer: toofast.

Jude Austin
Legendary
*
Offline Offline

Activity: 1140
Merit: 1000


The Real Jude Austin


View Profile WWW
February 14, 2017, 02:11:10 AM
Last edit: February 14, 2017, 02:37:06 AM by Jude Austin
 #452

Quote from: rico666 on February 13, 2017, 06:34:52 PM
[...]
I see lots of optimization potential, originally, my notebook maxed out at ~ 2.8 Mkeys/s and now

Code:
$ LBC -c 8
Ask for work... got blocks [383054009-383054392] (402 Mkeys)
oooooooooooooooooooooooo (7.30 Mkeys/s)

Can't wait to get home and try the GPU version.

If I have 4 cores and 4 GPUs will it use a GPU with each core or...

Also,  can you make it so I can run this on an Rpi?

Allow the client to run the old Go script; that should suffice.

rico666 (OP)
Legendary
*
Offline Offline

Activity: 1120
Merit: 1037


฿ → ∞


View Profile WWW
February 14, 2017, 07:16:45 AM
Last edit: February 14, 2017, 09:08:26 AM by rico666
 #453

Quote
Can't wait to get home and try the GPU version.

Think March. I have some basic quality assurance in this project.  ;)
The client basically works, but several things are still hard coded for my notebook (choice of OpenCL device).
I have no feedback (diagnostics-OpenCL.txt) from AMD GPUs yet.
Client is stable. Ran the whole night through on my notebook with 7.x Mkeys/s:



Of course I also checked the client with all blocks containing private keys the pool has found so far - it reliably finds all of them.

Quote
If I have 4 cores and 4 GPUs will it use a GPU with each core or...

Right now, one GPU would be taken as accelerator for all cores and still be bored.
Probably the best balancing one can get right now is 1 GPU and many cores

Amazon p2.xlarge or similar to my notebook. That's why I am asking for the OpenCL diagnostics files, to be able to cover a broader range of configurations.

My next step will be to incorporate arulbero's ECC magic to shift the balance by taking
more and more load from the CPU. Current status: https://twitter.com/LBC_collider

Quote
Also,  can you make it so I can run this on an Rpi?
Allow the client to run the old Go script; that should suffice.

It's unlikely I will go down that path for now.
The HRD-client was originally about 13 times faster than the Go client (meanwhile >15 times), and 32bit architectures are on average half the speed of 64bit.
I do have a 32bit notebook (Lenovo Z61p), two cores, that does about 200 Kkeys/s on both cores with HRD, this notebook does around 12 Kkeys/s with the Go client.
My new notebook was originally 14 times faster with CPU only and is meanwhile over 35 times faster than the HRD on the old one. It is about 616 times faster than the Go client on the old notebook.
Also, the Go client needed more memory (2GB).

So my goal is to make a GPU client so that my current notebook (and your computer) will be x-thousand times faster than Go on the old/small machines.


Rico


arulbero
Legendary
*
Offline Offline

Activity: 1941
Merit: 2094


View Profile
February 14, 2017, 07:44:49 PM
Last edit: February 15, 2017, 03:55:51 PM by arulbero
 #454

I am performing some tests about endomorphism.

To recap the idea, we would like to generate:

a) 1G,  2G,  3G,  ......., kG, .......... , 2^160G
b) 1G',  2G', 3G', ......., kG', .........., 2^160G'    where G'=lambdaG
c) 1G'', 2G'', 3G'', ......., kG'',.........., 2^160G''   where G''=lambda^2G

We are sure that each row has different elements, because G, G', G'' have period n. But of course we cannot be sure that each element of b) for example is not an element of a) too. If we generated n keys instead of just 2^160, we would get the entire group of all n points, and then all the 3 rows would have the same elements. Only the order would be different.

But we have to generate only "few" elements.
Let's look at the rows a) and b) and at the relation between 2 corresponding elements: kG' = k*(lambdaG) = lambda*(kG). Where are these elements of b)?

My guess is:
multiplication by lambda produces 2^160 elements of b) that are evenly distributed in the key space (keys with respect to the generator G).

If that were true, how often would we have a "collision" (double computation of the same key in 2 distinct rows) between the 2 rows?
If the keys of the b) row are actually evenly distributed, the probability for each new key of b) to fall in the range 1-2^160 should be 2^160/2^256, about 1/2^96. If we generated 2^160 elements, we'd have 2^64 collisions.

To test this hypothesis, I generated 2^30 keys of row b) (lambda*1, lambda*2, lambda*3, ..., lambda*2^30); none of these were in the range (1, 2^160), so I checked how many were in larger ranges, for example (1, 2^238), and in that case I got about 2^12 'collisions' (2^238/2^256 * 2^30 = 2^12). So my hypothesis seems to be confirmed by these results.

In summary, since we have to generate only 2^160 keys, we can accept (but obviously it's up to you) a double computation for one key in every 2^96: only 16 'collisions' in the first 2^100 keys.
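
The estimates in this post are all instances of one formula: expected overlaps = (keys generated) * (size of the range) / (size of the group), assuming the lambda-multiples are spread uniformly. A small check in Python, working with base-2 logarithms and approximating the group order n by 2^256 as the post does (the function name is only illustrative):

```python
def log2_expected_hits(log2_generated, log2_range, log2_group=256):
    """log2 of generated * range / group, under the uniform-distribution assumption."""
    return log2_generated + log2_range - log2_group

print(log2_expected_hits(160, 160))       # 64 -> about 2^64 overlaps for 2^160 keys
print(log2_expected_hits(30, 238))        # 12 -> about 2^12, the experiment above
print(2 ** log2_expected_hits(100, 160))  # 16 overlaps in the first 2^100 keys
```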

A question remains: do you want to generate random keys outside of your initial range? In case of a collision, how can somebody prove to you that it is his key, since that key is indistinguishable from the others?

If you instead want to let go of endomorphism, I remind you that your generation speed will be halved (the cost rises from 1,1 M to 2,1 M for each point).
Haze
Full Member
***
Offline Offline

Activity: 149
Merit: 100


View Profile
February 15, 2017, 08:46:48 PM
 #455

Quote
Of course I also checked the client with all blocks containing private keys the pool has found so far - it reliably finds all of them.

Can you explain this further please? How do you know which blocks contain keys the pool has found?

If someone found a valid key and just let it go and kept colliding, would you know about it?
rico666 (OP)
Legendary
*
Offline Offline

Activity: 1120
Merit: 1037


฿ → ∞


View Profile WWW
February 15, 2017, 10:09:17 PM
 #456

Quote
Of course I also checked the client with all blocks containing private keys the pool has found so far - it reliably finds all of them.

Can you explain this further please? How do you know which blocks contain keys the pool has found?

https://lbc.cryptoguru.org/trophies

What I meant was that, in addition to the usual ./LBC -x, I also searched manually in the spaces where the known private keys of the puzzle transaction are (all compressed) and also the two addresses we found with funds on them (which are uncompressed).

The new CPU/GPU hybrid found all of them, so I assume it is a working drop-in replacement.
Testing the LBC is crucial, because when you have rare events like we have, you cannot afford to have a generator that overlooks something.
If your computer works for a month without a find, you have to be pretty sure it is because there really was nothing, and not because some bug made your client overlook something. So that's basically what my test (and the statement) was about.


Rico

rico666 (OP)
Legendary
*
Offline Offline

Activity: 1120
Merit: 1037


฿ → ∞


View Profile WWW
February 16, 2017, 08:03:05 AM
 #457

Seems I caught some race condition after my async modifications. When I came to my notebook this morning, I saw

Code:
...lots of work done, but then ...
Ask for work... got blocks [403243609-403246040] (2550 Mkeys)
oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo0000000000000000000000000000000000000000:u:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0
0000000000000000000000000000000000000000:c:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0
0000000000000000000000000000000000000000:u:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x1
0000000000000000000000000000000000000000:c:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x1
0000000000000000000000000000000000000000:u:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x2
0000000000000000000000000000000000000000:c:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x2
0000000000000000000000000000000000000000:u:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x3
0000000000000000000000000000000000000000:c:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x3
0000000000000000000000000000000000000000:u:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x4
0000000000000000000000000000000000000000:c:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x4
0000000000000000000000000000000000000000:u:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x5
...and so on...

thousands of "finds" of a 000000 hash160. And then

Code:
197f1706f2aa45480c1debc40628c87823da08f6:c:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0xd2

Naturally, I looked up 197f1706f2aa45480c1debc40628c87823da08f6, which resolves to https://blockchain.info/address/13Kp9AJAxhEEjFo8N6YTP9DMW71YpK2fD9, but no funds there. OK, that can happen if the bloom filter sees a false positive (allegedly 10^-27 probability), but a re-run in the same search space went smoothly, with neither any fake zero-hash160 finds nor this false positive.

Investigating, but it seems like clEnqueueReadBuffer does not respect a blocking read after it has been called with non-blocking reads before.  :-\

I have done some more optimizations, but all I managed to do was bring the GPU load down from 43% to 34%  :P I need to take load off the CPU!  ::)


Rico

rico666 (OP)
Legendary
*
Offline Offline

Activity: 1120
Merit: 1037


฿ → ∞


View Profile WWW
February 16, 2017, 11:42:27 AM
Last edit: February 16, 2017, 12:09:56 PM by rico666
 #458

https://twitter.com/LBC_collider

GPU's arent ... you should try GPU ... I'm sure you can delivered great speed with GPU

even with 1 server I think I can triple the pool speed.

root@soft:~# lshw -C video | grep product:
       product: ASPEED Graphics Family
       product: GK210GL [Tesla K80]
       product: GK210GL [Tesla K80]
       product: GK210GL [Tesla K80]
       product: GK210GL [Tesla K80]

Code:
ubuntu@ip-172-31-34-146:~/collider$ ./LBC -c 4 -l 0 -t 1
Benchmark info not found - benchmarking... done.
Your maximum speed is 1576126 keys/s per CPU core.
Ask for work... got blocks [405066137-405066520] (402 Mkeys)
oooooooooooooooooooooooo (3.19 Mkeys/s)
ubuntu@ip-172-31-34-146:~/collider$ ./LBC -c 2 -l 0 -t 1
Ask for work... got blocks [405077529-405077720] (201 Mkeys)
oooooooooooo (2.78 Mkeys/s)

Clearly, Amazon puts way too few/too weak CPUs in their Instances - for our usecase.
What surprises me more, is that the K80 does not look so impressive compared with my tiny Notebook GPU:

Code:
ubuntu@ip-172-31-34-146:~$ nvidia-smi
Thu Feb 16 11:23:38 2017      
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:00:1E.0     Off |                    0 |
| N/A   55C    P0    76W / 149W |    256MiB / 11439MiB |     23%      Default |
+-------------------------------+----------------------+----------------------+
                                                                              
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1938    C   ./gen-hrdcore-avx2-linux64                      64MiB |
|    0      1939    C   ./gen-hrdcore-avx2-linux64                      64MiB |
|    0      1940    C   ./gen-hrdcore-avx2-linux64                      64MiB |
|    0      1941    C   ./gen-hrdcore-avx2-linux64                      64MiB |
+-----------------------------------------------------------------------------+


That is with the 4 vCPUs in use. Clearly, 4 vCPUs in Amazon-speak means 2 real cores + 2 HT threads,

versus my 4 real CPUs:

Code:
$ nvidia-smi
Thu Feb 16 12:36:36 2017      
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro M2000M       Off  | 0000:01:00.0     Off |                  N/A |
| N/A   51C    P0    N/A /  N/A |    115MiB /  4041MiB |     33%      Default |
+-------------------------------+----------------------+----------------------+
                                                                              
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     21809    C   ./gen-hrdcore-skylake-linux64                   28MiB |
|    0     21810    C   ./gen-hrdcore-skylake-linux64                   28MiB |
|    0     21811    C   ./gen-hrdcore-skylake-linux64                   28MiB |
|    0     21839    C   ./gen-hrdcore-skylake-linux64                   28MiB |
+-----------------------------------------------------------------------------+

I end up at almost 7 Mkeys/s with my 4 CPUs. Moreover, not only is the memory usage more efficient (OK, the K80 has 3 times the memory, but for reasons unknown to me it also slurps about 2.5 times as much per process), the relative utilization is also in favor of my notebook. If Amazon offered a P2 instance with 20 vCPUs and 1 K80, that would be balanced, and at least 30 Mkeys/s could be expected from it.
Also a good configuration (in terms of balance): 12 real Skylake cores and some reasonable Maxwell (GM107) GPU should give you 23+ Mkeys/s.
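
The balance argument can be sketched as a back-of-the-envelope calculation. The sketch assumes that GPU utilization grows linearly with the number of CPU cores feeding the GPU, and that the single nvidia-smi snapshots above are representative; both are simplifying assumptions, and the function names are hypothetical.

```python
def cores_to_saturate(cores_used, gpu_util):
    """Cores needed to drive the GPU to 100%, assuming linear scaling."""
    return cores_used / gpu_util


def projected_rate(rate_mkeys, gpu_util):
    """Throughput at 100% GPU utilization under the same linear assumption."""
    return rate_mkeys / gpu_util


# Figures from the post: 4 cores at 33% GPU-Util and "almost 7 Mkeys/s" on
# the notebook, 4 vCPUs at 23% GPU-Util on the AWS K80 instance.
print(f"notebook: {cores_to_saturate(4, 0.33):.1f} cores, "
      f"{projected_rate(7.0, 0.33):.1f} Mkeys/s")
print(f"AWS K80:  {cores_to_saturate(4, 0.23):.1f} vCPUs")
```

Under these assumptions the notebook GPU would be saturated by roughly 12 cores and the K80 by somewhat fewer than 20 vCPUs, which is in line with the configurations named above. The throughput is only projected for the notebook, because the AWS rate was measured on hyperthreaded vCPUs rather than real cores.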

On the more positive side, GPU detection and choice of OpenCL device ran flawlessly on the first try.


Rico

Edit: installation howto for OpenCL on Ubuntu 16.04 (as used on AWS):

Code:
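# OpenCL @ Amazon AWS Ubuntu ----------------------------------

# Build tools, NVIDIA driver and CUDA toolkit
sudo apt-get install gcc make tmux libssl-dev xdelta3 nvidia-367 nvidia-cuda-toolkit
mkdir collider; cd collider; tmux
wget ftp://ftp.cryptoguru.org/LBC/client/LBC
chmod a+x LBC
sudo ./LBC -h
# Install the Perl OpenCL module (same as "install OpenCL" inside the cpan shell)
sudo cpan OpenCL
sudo reboot
# After the reboot: persistence mode on, auto boost off,
# fixed application clocks (memory,graphics in MHz)
sudo nvidia-smi -pm 1
sudo nvidia-smi --auto-boost-default=0
sudo nvidia-smi -ac 2505,875
cd ~/collider
./LBC -x
./LBC --gpu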

economic considerations:

At the moment, AWS GPU instances are not economical. For 0.25/h you can get the p2.xlarge, and it will give you 3.2 Mkeys/s at most. OTOH, for 0.5/h you can get an m4.x16 compute instance with 64 vCPUs, and that will give you around 18 Mkeys/s. Yes, we need a better GPU client.
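
The comparison boils down to keys checked per unit of money. The sketch below uses only the prices and rates quoted in this post; the function name and the dictionary layout are illustrative.

```python
def gkeys_per_unit(rate_mkeys_per_s, price_per_hour):
    """Gkeys checked per currency unit spent."""
    keys_per_hour = rate_mkeys_per_s * 1e6 * 3600
    return keys_per_hour / price_per_hour / 1e9


# (Mkeys/s, price per hour) as quoted in the post.
instances = {"p2.xlarge": (3.2, 0.25), "m4.x16": (18.0, 0.50)}

for name, (rate, price) in instances.items():
    print(f"{name}: {gkeys_per_unit(rate, price):.1f} Gkeys per unit")

ratio = gkeys_per_unit(*instances["m4.x16"]) / gkeys_per_unit(*instances["p2.xlarge"])
print(f"m4.x16 advantage: {ratio:.2f}x")
```

With these figures the CPU instance checks nearly three times as many keys for the same money.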

becoin
February 16, 2017, 01:07:04 PM
 #459


economic considerations:


Really? You've finally decided this "project" needs some economic considerations after 23 pages of enthusiastic code churning?

rico666 (OP)
February 16, 2017, 01:19:02 PM
 #460


economic considerations:

Really? You've finally decided this "project" needs some economic considerations after 23 pages of enthusiastic code churning?

becoin, as always... It's not the "project" that needs economic considerations, but anyone who wants to get into the top30 to get a GPU client without forking out 0.1 BTC (or 0.5 BTC if he's becoin).

Right now, you can still get into the top30 for around $11 (~28 hours) with an m4.x16 AWS spot instance. To achieve the same with the p2.xlarge would cost you $33.
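
For a feeling of the run time involved, the 28 hours can be converted into the equivalent time on the GPU instance. This rough sketch assumes the rates from the earlier post (about 18 Mkeys/s for the m4.x16 and at most 3.2 Mkeys/s for the p2.xlarge) and a constant rate over the whole run; the function name is hypothetical.

```python
def hours_for_same_work(ref_hours, ref_rate, other_rate):
    """Hours a machine running at other_rate needs for the same number of keys."""
    return ref_hours * ref_rate / other_rate


keys_needed = 28 * 3600 * 18.0e6              # keys done in ~28 h at 18 Mkeys/s
p2_hours = hours_for_same_work(28, 18.0, 3.2)

print(f"work for top30: about {keys_needed / 1e12:.2f} Tkeys")
print(f"p2.xlarge: about {p2_hours:.1f} hours ({p2_hours / 24:.1f} days)")
```

Spot prices fluctuate, so the sketch stops at hours and does not try to reproduce the dollar amounts.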


Apropos churning:

I made a workaround in the LBC client to stop the generator when it is churning bad hashes:

Code:
Ask for work... got blocks [405316777-405317288] (536 Mkeys)
oooooooooooooooooooooooooooooooo (6.68 Mkeys/s)
Ask for work... got blocks [405317817-405318328] (536 Mkeys)
oooooooooooooooooooooooooooooooo (6.51 Mkeys/s)
Ask for work... got blocks [405318361-405318872] (536 Mkeys)
ooooooooooooooooooGenerator churning bad hits! Abort.
20 just got out of the pool with exit code: 255 and data:
ooooooooooooomalformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "HASH(0x3e5cca8)") at ./LBC line 1176.

It's not nice, but until I find a real fix, this at least prevents flawed PoW from proliferating into the done blocks.
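
The idea behind such a guard can be sketched as follows. This is not the LBC code (the client is written in Perl and its logic is not shown in this thread); the threshold, the class, and the function names are assumptions made up for illustration.

```python
class GeneratorChurning(Exception):
    """Raised when a work unit reports implausibly many hits."""


def count_clean_results(results, max_hits=3):
    """Consume generator results; abort once more than max_hits hits are seen.

    `results` is an iterable of booleans (True = the generator reported a hit).
    `max_hits` is an arbitrary illustrative threshold, not taken from LBC.
    """
    hits = 0
    processed = 0
    for is_hit in results:
        if is_hit:
            hits += 1
            if hits > max_hits:
                raise GeneratorChurning(f"{hits} hits after {processed + 1} results")
        processed += 1
    return processed


def submit_if_clean(results, submit, max_hits=3):
    """Call submit(n) only if the whole work unit ran without churning."""
    try:
        processed = count_clean_results(results, max_hits)
    except GeneratorChurning:
        return False  # work unit is dropped, nothing is reported as done
    submit(processed)
    return True


reported = []
print(submit_if_clean([False] * 10, reported.append))  # clean work unit
print(submit_if_clean([True] * 10, reported.append))   # churning generator
print(reported)
```

The point of the design is that aborting happens before anything is submitted, so a misbehaving generator costs some work but never marks a block as done.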


Rico
