TooDumbForBitcoin
Legendary
Offline
Activity: 1638
Merit: 1001
February 11, 2017, 10:36:23 AM
Good morning! HeavenlyCreatures found #49.

From: XXX
To: bots@cryptoguru.org
Date: Today 08:02

Hi,
I found #49
0d2f533966c6578e1111978ca698f8add7fffdf3:c:priv:000000000000000000000000000000000000000000000000000174176b015001 + 0xf4c

Looking at the PK, the pool must have found it GMT: Sat, 11 Feb 2017 04:32:26 GMT

edit: trophies update. 16 days, 16 days, 33 days, 67 days, ...

cheers!
Rico
becoin
Legendary
Offline
Activity: 3431
Merit: 1233
February 11, 2017, 01:44:37 PM
Good morning! HeavenlyCreatures found #49.
I found #49
0d2f533966c6578e1111978ca698f8add7fffdf3:c:priv:000000000000000000000000000000000000000000000000000174176b015001 + 0xf4c
Looking at the PK, the pool must have found it GMT: Sat, 11 Feb 2017 04:32:26 GMT
16 days, 16 days, 33 days, 67 days, ...

And all of them hold standard, negligible amounts of bitcoin, right? I bet they are all from early-day Bitcoin conferences where participants were given QR-code badges with naked private keys. There are hundreds, if not thousands, of them. Don't waste your time with this 'project'; just contact the conference organizers and ask for the list of private keys!
rico666 (OP)
Legendary
Offline
Activity: 1120
Merit: 1037
฿ → ∞
February 11, 2017, 08:27:52 PM
Version 05 runs in nearly 22 seconds for 16M keys on my notebook. That is now only 3.5 times slower than what the optimized LBC C version needs for 16M keys. I don't dare to estimate what optimized C code could make of this.

real 0m21.790s
user 0m21.787s
sys 0m0.000s

Your code is a tremendous help, but it would speed up my understanding of the code and porting it to C (and possibly OpenCL) if there were more comments. E.g. you start with

start=2**55+789079076 #k is a random private key

but start cannot be smaller than 2049, else:

$ time python ./gen_batch_points05.py
Traceback (most recent call last):
  File "./gen_batch_points05.py", line 50, in <module>
    kminverse = [invjkz] + inv_batch(kx,mGx,p)
  File "/data/soft/lin/LBC/generator/HRD/arulbero-ECC/5/ecc_for_collider05.py", line 56, in inv_batch
    inverse=inv(partial[2048],p) # 1I
  File "/data/soft/lin/LBC/generator/HRD/arulbero-ECC/5/ecc_for_collider05.py", line 32, in inv
    q, r = divmod(v,u)
ZeroDivisionError: integer division or modulo by zero

It'd also help if you could lay out the effective sequence of private keys as it is computed, because if LBC is to adopt this, I have to merge it - somehow - with the LBC interval arithmetic to make sure work is still distributable/parallelizable and the bookkeeping stays sane.

Rico
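For readers porting this: the ZeroDivisionError above comes out of inv_batch, which (as the "1I" comment suggests) inverts a whole batch of field elements at the cost of a single modular inversion. A minimal sketch of that trick (Montgomery's batch inversion); names and structure here are illustrative, not arulbero's actual code:

```python
# Montgomery's trick: invert many nonzero elements mod p with one inversion.
# Illustrative sketch of what a routine like inv_batch presumably does.
def batch_inverse(xs, p):
    n = len(xs)
    partial = [1] * (n + 1)
    for i, x in enumerate(xs):               # prefix products
        partial[i + 1] = (partial[i] * x) % p
    inv_total = pow(partial[n], p - 2, p)    # the single inversion (Fermat)
    out = [0] * n
    for i in range(n - 1, -1, -1):           # peel off one element at a time
        out[i] = (inv_total * partial[i]) % p
        inv_total = (inv_total * xs[i]) % p
    return out

p = 2**256 - 2**32 - 977                     # secp256k1 field prime
xs = [3, 7, 12345, 2**200 + 1]
invs = batch_inverse(xs, p)
assert all((x * ix) % p == 1 for x, ix in zip(xs, invs))
```

If any element of the batch is 0 mod p, the combined product is 0 and the single inversion fails; with an extended-Euclid inv like the one in ecc_for_collider05.py, that surfaces as exactly the divmod-by-zero error shown above (the Fermat-based inverse in this sketch would instead silently return 0).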
arulbero
Legendary
Offline
Activity: 1914
Merit: 2071
February 11, 2017, 09:09:40 PM Last edit: February 12, 2017, 03:38:54 PM by arulbero
Imagine you want to generate a batch from 10000 to 14096 (the script actually generates batches of 4097 points). First you generate the key k = 12048 (we always start with the middle point, to exploit the symmetry); this is the only point (a pivot point) of the batch that we get with the slower function mult:

... k ...   <-- one batch, only one key k

jkx,jky,jkz = mul(k,Gx,Gy,1)
invjkz = inv(jkz,p)
(kx,ky) = jac_to_aff(jkx, jky, jkz, invjkz)

k can be any number greater than 2048 (otherwise, if k=3 for example, kG+3G gives an error, because you would be trying to use the addition formula instead of the doubling). The first batch you can create with this script goes from 1 to 4097; the start key in that case would be k=2049.

Then the script generates three batches; each batch has 1 point + 2048 pairs of points.

First batch: this is the batch you are most interested in, because it has 4097 points in your range, including the point 12048G:

(12048), (12048+1, 12048-1), (12048+2, 12048-2), ...., (12048+2048=14096, 12048-2048=10000)

The script computes this batch with the function double_add_P_Q_inv. Element #0 of the list is always kG, element #1 is the pair kG+1G, kG-1G, #2 is the pair kG+2G, kG-2G, and so on --> #2048 is the pair kG+2048G, kG-2048G.

batch = batch + list(map(double_add_P_Q_inv,kxl[1:],kyl[1:],mGx[1:],mGy[1:],kminverse[1:]))
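The key layout just described (pivot first, then the pairs k+i, k-i) can be sketched as follows; this is an illustrative reconstruction of the ordering, not the actual script:

```python
# Key layout of one batch: the pivot k first, then pairs (k+i, k-i).
# M is the half-width; the real batches use 2048.
M = 2048

def batch_keys(k):
    keys = [k]
    for i in range(1, M + 1):
        keys += [k + i, k - i]
    return keys

b = batch_keys(12048)
assert len(b) == 4097                 # 1 pivot + 2048 pairs
assert b[0] == 12048                  # element #0 is the pivot kG
assert b[1:3] == [12049, 12047]       # element #1 is the pair k+1, k-1
assert min(b) == 10000 and max(b) == 14096
```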
Batches 1 and 2: these keys are not in your range; here we use the endomorphism:

batch1: (12048*lambda), ((12048+1)*lambda, (12048-1)*lambda), ((12048+2)*lambda, (12048-2)*lambda), ...., (10000*lambda, 14096*lambda)

batch2: (12048*lambda^2), ((12048+1)*lambda^2, (12048-1)*lambda^2), ((12048+2)*lambda^2, (12048-2)*lambda^2), ...., (14096*lambda^2, 10000*lambda^2)

EDIT: "to make sure work still is distributable / parallelizable and the bookkeeping still being sane."
You don't need to worry about each individual key; in my opinion you only have to store one private key per 3 batches. You can think of the single key in the middle of the batch as a special seed. 99.9999% of batches don't match any address with bitcoin, so only when a match occurs do you have to regenerate the entire 3 batches from this single seed to fetch the correct private key. Batches 1 and 2 are sequences of keys each different from the others, so you can be sure you are not wasting your computational effort. I'm almost sure about the last sentence: there can't be more than three points with the same y, so it is not possible to check the same key twice. Note that the 3 batches are related; they must be computed together. Imagine you know that the pool has searched so far from key 1 to 2^50; then you know that the pool has searched keys 1*lambda, 2*lambda, 3*lambda, ... to 2^50*lambda (mod n) too, and keys 1*lambda^2, 2*lambda^2, 3*lambda^2, ... to 2^50*lambda^2 (mod n).

"05 runs nearly 22 seconds for 16M keys on my notebook. ... I don't dare to estimate what optimized C code can make of this."

I dare: if you use the complement too, you can generate 16M keys in less than half a second (with CPU; I don't know for GPU). Considering that your current code performs 6M + 1S only for the transition from Jacobian to affine coordinates for each point, and that you are using J+J --> J to perform each addition (12M + 4S), your current cost should be 18M + 5S per point.
Let's say 1S = 0.8M; then you have about 22M per point.
If you are instead using J+A --> J to perform addition (8M + 3S), then you have about 17.2M per point.
My code uses 3.5M + 1S for each point of the first batch, and only 1M for each point of the other 2 batches. So the average is 5.5/3 = 1.83M + 0.33S per point, say about 2.1M per point. Your current speed is 16M/6s = 2.7 Mkeys/s per CPU core. If you could achieve an 8x-10x improvement, say 8x, you could perform at least 21 Mkeys/s. If you use (X,Y) --> (X,-Y) too, 42 Mkeys/s. Let's say at least 40 Mkeys/s per core, 15x your actual speed. With an 8-core CPU, you could generate more keys than your entire pool can handle at this moment. Maybe tomorrow I'll add more comments to the code. Anyway, read this post again; I have edited it.

EDIT2: this is a version with more comments: https://www.dropbox.com/s/6o2az7n6x0luld4/ecc_for_collider06.zip?dl=0
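The operation counts above can be reproduced with a quick back-of-envelope script (using the post's own 1S = 0.8M assumption):

```python
# Back-of-envelope check of the per-point field-operation counts in the post.
S = 0.8  # one squaring, measured in multiplications (the post's assumption)

# Current code: J+J -> J addition (12M+4S) + Jacobian-to-affine (6M+1S)
current = 18 + 5 * S
assert abs(current - 22.0) < 1e-9

# With mixed J+A -> J addition (8M+3S) instead, plus the same conversion:
mixed = (8 + 3 * S) + (6 + 1 * S)
assert abs(mixed - 17.2) < 1e-9

# arulbero's batches: 3.5M+1S per point in the main batch, 1M in each of
# the two endomorphism batches -> average over the three batches:
avg = ((3.5 + S) + 1 + 1) / 3
assert round(avg, 2) == 2.1
```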
rico666 (OP)
Legendary
Offline
Activity: 1120
Merit: 1037
฿ → ∞
February 12, 2017, 12:03:09 PM Last edit: February 12, 2017, 08:50:27 PM by rico666
- New BLF file on FTP
- New LBC client version (1.010) available
./LBC -u is your friend. As mentioned in #433, you can now attach a BTC address to your id for rewards to your client. As mentioned in #436, you can now call the LBC client with a --gpu parameter. The best-case scenario you will currently see is this:

$ ./LBC --gpu
OpenCL diagnostics written.
GPU authorized: yes

If you see this, you're on the highway to a GPU-accelerated client. If you see instead:

Perl module 'OpenCL' not found - please make sure:
* OpenCL is installed correctly on your system
* then install the Perl OpenCL module via CPAN (cpan install OpenCL)

you want to make sure OpenCL is installed correctly on your system. Some pointers to do so:

https://wiki.tiker.net/OpenCLHowTo
http://askubuntu.com/questions/796770/how-to-install-libopencl-so-on-ubuntu

Won't work in a VM. At least not without advanced magic. If oclvanitygen runs on your system, you're fine. The only thing left to do is to install the Perl bindings for OpenCL: https://metacpan.org/pod/OpenCL

For this, it's the usual:

$ cpan
cpan> install OpenCL

or, in one batch:

$ cpan install OpenCL

The message "OpenCL diagnostics written" indicates you will see a file diagnostics-OpenCL.txt in your directory. Please do not post its output here, as it is quite extensive. Either pastebin it and post the link here, or send its content to bots@cryptoguru.org (if there are any problems, or if you want to make sure your config is supported). Well, and if you see a 'no' instead and you want to change that: you'll want to be in the top30, or you'll have to fork out 0.1 BTC.

OpenCL generator ETA: "really soon now(tm)"

Rico

edit: Short HowTo to install LBC @ AWS Ubuntu instance including OpenCL

# $ is shell/bash
# cpan> is cpan shell
$ sudo apt-get update
$ sudo apt-get install gcc xdelta3 make
$ sudo apt-get install nvidia-opencl-dev nvidia-opencl-icd-367 nvidia-modprobe clinfo
$ clinfo
$ sudo cpan
cpan> install JSON OpenCL
$ mkdir collider; cd collider; tmux
$ wget ftp://ftp.cryptoguru.org/LBC/client/LBC
$ chmod a+x LBC
$ ./LBC -h
rico666 (OP)
Legendary
Offline
Activity: 1120
Merit: 1037
฿ → ∞
February 13, 2017, 02:36:57 PM
Unoptimized CPU/GPU hybrid generator. 1st successful run on 1 CPU core with an Nvidia GPU in tandem: 1811207 keys/s

$ time hrd-core -I 0000000000000000000000000000000000000000000000000000000000000001 -c 10000
Num platforms: 2
Platform - 0
1.1 CL_PLATFORM_NAME: Intel(R) OpenCL
1.2 CL_PLATFORM_VENDOR: Intel(R) Corporation
1.3 CL_PLATFORM_VERSION: OpenCL 2.0
1.4 CL_PLATFORM_PROFILE: FULL_PROFILE
1.5 CL_PLATFORM_EXTENSIONS: cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_spir
Device - 0:
CL_DEVICE_NAME: Intel(R) HD Graphics
CL_DEVICE_VENDOR: Intel(R) Corporation
CL_DRIVER_VERSION: r2.0.54425
CL_DEVICE_VERSION: OpenCL 2.0
CL_DEVICE_MAX_COMPUTE_UNITS: 24
Platform - 1
2.1 CL_PLATFORM_NAME: NVIDIA CUDA
2.2 CL_PLATFORM_VENDOR: NVIDIA Corporation
2.3 CL_PLATFORM_VERSION: OpenCL 1.2 CUDA 8.0.0
2.4 CL_PLATFORM_PROFILE: FULL_PROFILE
2.5 CL_PLATFORM_EXTENSIONS: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_khr_gl_event
Device - 0:
CL_DEVICE_NAME: Quadro M2000M
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 375.26
CL_DEVICE_VERSION: OpenCL 1.2 CUDA
CL_DEVICE_MAX_COMPUTE_UNITS: 5

2d17543d32448acc7a1c43c5f72cd5be459ab302:u:priv:0000000000000000000000000000000000000000000000000000000000000001 + 0x5e
02e62151191a931d51cdc513a86d4bf5694f4e51:c:priv:0000000000000000000000000000000000000000000000000000000000000001 + 0x65
9d74ffdb31068ca2a1feb8e34830635c0647d714:u:priv:00000000000000000000000000000000000000000000000000000000000f9001 + 0xf8c
3d6871076780446bd46fc564b0c443e1fd415beb:c:priv:00000000000000000000000000000000000000000000000000000000000f9001 + 0xf8c
response: 30-19-0

real 0m9.263s
user 0m8.117s
sys 0m1.097s

Rico
arulbero
Legendary
Offline
Activity: 1914
Merit: 2071
February 13, 2017, 05:11:28 PM Last edit: February 13, 2017, 05:37:37 PM by arulbero
Hi, Unoptimized CPU/GPU hybrid generator. 1st successful run on 1 CPU core with Nvidia GPU in tandem: 1811207 keys/s
CPU only for public key generation + GPU for sha256/ripemd160? Why has the pool performance dropped in the meantime?

I have a new version of ecc_for_collider:
1) + complement private keys
2) + comments
https://www.dropbox.com/s/3jsxjy7sntx3p4a/ecc_for_collider07.zip?dl=0

The file foo.py performs 16.4 M useless products, just to let you appreciate the efficiency of the public key generation in the script gen_batches_points07.py:

main_batch --> (x,y) 3.5M + 1S for each point
batch2 --> (beta*x,y) 1M for each point
batch3 --> (beta^2*x,y) 1M for each point
batch_minus --> (x,-y) (beta*x,-y) (beta^2*x,-y) 0M and 0S

Total: about 1.1M for each point! If you know the performance of the field multiplication in your C code, you can get an idea of the performance you could reach. How long does your C code take to perform 16.4 M multiplications (operands: big numbers, multiplication mod p)?

In the next days I want to perform some tests on the endomorphism, just to be sure that everything is ok (for example, we'd like to avoid computing the same key twice).
rico666 (OP)
Legendary
Offline
Activity: 1120
Merit: 1037
฿ → ∞
February 13, 2017, 05:48:37 PM
CPU only for public keys generation + GPU for sha256/ripemd160?
Exactly. Meanwhile I am at

real 0m8.561s
user 0m8.093s
sys 0m0.413s

(= 1959955 keys/s per CPU core with GPU support), and the memory requirement on the GPU is a mere 29MB (the GPU is bored).

"Why has the pool performance dropped in the meantime?"

Because two (in words: two!) guys turned their machines off. I have a feeling this dip in performance is only temporary...

Of the aforementioned 8 seconds, around 6.2 are ECC public key generation (16M uncompressed keys; the compressed key is done @ GPU). Every second saved here counts, so naturally everything you did towards ECC optimization will have maximum effect with the CPU/GPU hybrid.

Rico
arulbero
Legendary
Offline
Activity: 1914
Merit: 2071
February 13, 2017, 06:17:16 PM
CPU only for public keys generation + GPU for sha256/ripemd160?
Exactly. meanwhile I am at real 0m8.561s user 0m8.093s sys 0m0.413s
"(= 1959955 keys/s per CPU core with GPU support) ... Of the aforementioned 8 seconds, around 6.2 are ECC public key generation (16M uncompressed keys, the compressed key is done @ GPU)."

6.2 s: CPU generates 16.7 M public keys (x,y)
1.8 s: GPU performs SHA256/ripemd160 of (x,y) and (x) <- compressed

What do you mean by "compressed key is done with GPU"? Do you use 1 or 2 compressed keys? The x is always the same; you don't need to compute y, so you can generate 2 compressed keys for each uncompressed one. Do you generate 33M addresses every 8 s, or 50M? Anyway, at the moment the CPU is the bottleneck; the GPU does its work at least 3x faster than the CPU...
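For context: forming a compressed key from an uncompressed one is only a matter of a parity prefix on x, which is presumably why it is so cheap to do on the GPU. An illustrative sketch (the constants are the standard secp256k1 field prime and generator); it also shows why the complement point (x, p-y) yields a second compressed key for free:

```python
# Compressed public key serialization: prefix 0x02 if y is even, 0x03 if
# odd, followed by the 32-byte x coordinate. The negated point (x, p-y)
# shares x, so its compressed form differs only in the prefix byte.
p  = 2**256 - 2**32 - 977
Gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

def compress(x, y):
    return bytes([0x02 | (y & 1)]) + x.to_bytes(32, 'big')

a = compress(Gx, Gy)
b = compress(Gx, p - Gy)          # the complement key: same x, opposite parity
assert a[1:] == b[1:]             # identical x bytes
assert {a[0], b[0]} == {0x02, 0x03}
```

This is the same trick as rico's C one-liner `sha256_in[0] = 0x02 | (sha256_in[64] & 0x01);` quoted later in the thread.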
rico666 (OP)
Legendary
Offline
Activity: 1120
Merit: 1037
฿ → ∞
February 13, 2017, 06:34:52 PM Last edit: February 13, 2017, 08:17:28 PM by rico666
6,2 s: CPU generates 16.7 M of public keys (x,y) 1,8 s: GPU performs SHA256 / ripemd160 of (x,y) and (x) <-compressed,
Yes. what do you mean "compressed key is done with GPU"?
sha256_in[0] = 0x02 | (sha256_in[64] & 0x01);
Anyway at the moment the cpu is the bottleneck, gpu does his work at least x3 faster than cpu...
Sure. It is a first step. The big advantage is that it works like a drop-in replacement. I see lots of optimization potential. Originally, my notebook maxed out at ~2.8 Mkeys/s, and now:

$ LBC -c 8
Ask for work... got blocks [383054009-383054392] (402 Mkeys)
oooooooooooooooooooooooo (7.30 Mkeys/s)

Rico

edit: LOL...

$ LBC -t 1 -l 0
Ask for work... Server doesn't like us. Answer: toofast.
Jude Austin
Legendary
Offline
Activity: 1140
Merit: 1000
The Real Jude Austin
February 14, 2017, 02:11:10 AM Last edit: February 14, 2017, 02:37:06 AM by Jude Austin
6,2 s: CPU generates 16.7 M of public keys (x,y) 1,8 s: GPU performs SHA256 / ripemd160 of (x,y) and (x) <-compressed,
Yes. what do you mean "compressed key is done with GPU"?
sha256_in[0] = 0x02 | (sha256_in[64] & 0x01);
Anyway at the moment the cpu is the bottleneck, gpu does his work at least x3 faster than cpu...
Sure. It is a 1st step. The big advantage of this is, it works like a drop-in replacement. I see lots of optimization potential, originally, my notebook maxed out at ~ 2.8 Mkeys/s and now $ LBC -c 8 Ask for work... got blocks [383054009-383054392] (402 Mkeys) oooooooooooooooooooooooo (7.30 Mkeys/s)
Rico edit: LOL... $ LBC -t 1 -l 0 Ask for work... Server doesn't like us. Answer: toofast.
Can't wait to get home and try the GPU version. If I have 4 cores and 4 GPUs, will it use a GPU with each core, or... Also, can you make it so I can run this on an RPi? Allowing the client to run the old Go script should suffice.
rico666 (OP)
Legendary
Offline
Activity: 1120
Merit: 1037
฿ → ∞
February 14, 2017, 07:16:45 AM Last edit: February 14, 2017, 09:08:26 AM by rico666
Can't wait to get home and try the GPU version.
Think March. I have some basic quality assurance in this project. The client basically works, but several things are still hard-coded for my notebook (choice of OpenCL device). I have no feedback (diagnostics-OpenCL.txt) from AMD GPUs yet. The client is stable: it ran the whole night through on my notebook at 7.x Mkeys/s. Of course I also checked the client with all blocks containing private keys the pool has found so far - it reliably finds all of them.

"If I have 4 cores and 4 GPUs will it use a GPU with each core or..."

Right now, one GPU would be taken as accelerator for all cores and still be bored. Probably the best balancing one can get right now is 1 GPU and many cores - an Amazon p2.xlarge, or something similar to my notebook. That's why I am asking for the OpenCL diagnostics files: to be able to cover a broader range of configurations. My next step will be to incorporate arulbero's ECC magic to shift the balance by taking more and more load off the CPU. Current status: https://twitter.com/LBC_collider

"Also, can you make it so I can run this on an RPi? Allowing the client to run the old Go script should suffice."

It's unlikely I will go down that path for now. The HRD client was originally about 13 times faster than the Go client (meanwhile >15 times), and 32-bit architectures are on average half the speed of 64-bit. I do have a 32-bit notebook (Lenovo Z61p), two cores, that does about 200 Kkeys/s on both cores with HRD; the same notebook does around 12 Kkeys/s with the Go client. My new notebook was originally 14 times faster with CPU only and is meanwhile over 35 times faster than HRD on the old one. It is about 616 times faster than the Go client on the old notebook. Also, the Go client needed more memory (2GB). So my goal is to make a GPU client so that my current notebook (and your computer) will be x-thousand times faster than Go on the old/small machines.

Rico
arulbero
Legendary
Offline
Activity: 1914
Merit: 2071
February 14, 2017, 07:44:49 PM Last edit: February 15, 2017, 03:55:51 PM by arulbero
I am performing some tests on the endomorphism. Let me recall the idea: we would like to generate

a) 1G, 2G, 3G, ......., kG, .........., 2^160G
b) 1G', 2G', 3G', ......., kG', .........., 2^160G' where G' = lambda*G
c) 1G'', 2G'', 3G'', ......., kG'', .........., 2^160G'' where G'' = lambda^2*G

We are sure that each row has distinct elements, because G, G', G'' all have order n. But of course we cannot be sure that an element of b), for example, is not an element of a) too. If we generated n keys instead of just 2^160, we would get the entire group of all n points, and then all 3 rows would have the same elements; only the order would differ.
But we only have to generate "a few" elements. Let's look at rows a) and b) and at the relation between 2 corresponding elements: kG' = k*(lambda*G) = lambda*(kG). Where are these elements of b)?

My guess is: multiplication by lambda produces 2^160 elements of b) evenly distributed in the key space (keys with respect to the generator G).

If that were true, how often would we have a "collision" (double computation of the same key in 2 distinct rows)? If the keys of row b) are actually evenly distributed, the probability for each new key of b) to fall in the range 1-2^160 should be 2^160/2^256, about 1/2^96. If we generated 2^160 elements, we'd have 2^64 collisions.

To test this hypothesis, I generated 2^30 keys of row b) (lambda*1, lambda*2, lambda*3, ..., lambda*2^30); none of these were in the range (1, 2^160), so I checked how many were in larger ranges (for example (1, 2^238)), and in that case I got about 2^12 'collisions' (2^238/2^256 * 2^30 = 2^12). So my hypothesis seems to be confirmed by these results.

In summary, since we have to generate only 2^160 keys, we can accept (but obviously it's up to you) a double computation of one key every 2^96, i.e. only 16 'collisions' in the first 2^100 keys.
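The expectation arithmetic above is easy to reproduce (a sketch, assuming the uniform-distribution hypothesis holds):

```python
# Expected number of lambda-multiplied keys falling into a target range,
# assuming they are uniform in [1, 2^256):
def expected_hits(range_bits, keys_generated):
    return keys_generated * 2**range_bits / 2**256

assert expected_hits(238, 2**30) == 2**12    # the 2^238-range experiment
assert expected_hits(160, 2**160) == 2**64   # a full 2^160 search
assert expected_hits(160, 2**100) == 2**4    # 16 collisions in the first 2^100 keys
```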
A question remains: do you want to generate random keys outside your initial range? In case of a collision, how can somebody prove to you that it is his key, since that key is indistinguishable from the others?

If you instead want to let go of the endomorphism, I remind you that your generation speed will be halved (the cost rises from about 1.1M to 2.1M per point).
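The endomorphism facts this analysis relies on can be sanity-checked directly. The constants below are the standard published secp256k1 parameters (quoted here from memory of the libsecp256k1 sources; verify against an authoritative copy before relying on them), and the naive affine point arithmetic is only an illustrative sketch:

```python
# Sanity checks for the secp256k1 GLV endomorphism: lambda has order 3
# mod n, beta has order 3 mod p, and lambda*(x, y) = (beta*x, y).
p    = 2**256 - 2**32 - 977
n    = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
Gx   = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy   = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
lam  = 0x5363AD4CC05C30E0A5261C028812645A122E22EA20816678DF02967C1B23BD72
beta = 0x7AE96A2B657C07106E64479EAC3434E99CF0497512F58995C1396C28719501EE

assert pow(lam, 3, n) == 1 and lam != 1      # cube root of unity mod n
assert pow(beta, 3, p) == 1 and beta != 1    # cube root of unity mod p

def ec_add(P, Q):
    """Affine addition on y^2 = x^3 + 7 over F_p (None = point at infinity)."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        s = (3 * x1 * x1) * pow(2 * y1, p - 2, p) % p
    else:
        s = (y2 - y1) * pow(x2 - x1, p - 2, p) % p
    x3 = (s * s - x1 - x2) % p
    return (x3, (s * (x1 - x3) - y1) % p)

def ec_mul(k, P):
    R = None
    while k:                      # double-and-add
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

# The point multiplied by lambda equals (beta*x, y): one field mul per key.
lx, ly = ec_mul(lam, (Gx, Gy))
assert (lx, ly) == (beta * Gx % p, Gy)
```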
Haze
February 15, 2017, 08:46:48 PM
"Of course I also checked the client with all blocks containing private keys the pool has found so far - it reliably finds all of them."

Can you explain this further, please? How do you know which blocks contain keys the pool has found? If someone found a valid key and just let it go and kept colliding, would you know about it?
rico666 (OP)
Legendary
Offline
Activity: 1120
Merit: 1037
฿ → ∞
February 15, 2017, 10:09:17 PM
"Can you explain this further, please? How do you know which blocks contain keys the pool has found?"

https://lbc.cryptoguru.org/trophies

What I meant was that, in addition to the usual ./LBC -x, I also searched manually in spaces where the known private keys of the puzzle transaction are (all compressed), and also the two addresses we found with funds on them (which are uncompressed). The new CPU/GPU hybrid found all of them, so I assume it is a working drop-in replacement. Testing the LBC is crucial, because with rare events like ours, you cannot afford a generator that overlooks something. If your computer works for a month without a find, you have to be pretty sure it is because there really was nothing, and not because some bug made your client "overlook" something. So that's basically what my test (and the statement) was about.

Rico
rico666 (OP)
Legendary
Offline
Activity: 1120
Merit: 1037
฿ → ∞
February 16, 2017, 08:03:05 AM
Seems I caught a race condition after my async modifications. When I came to my notebook this morning, I saw lots of work done, but then...

Ask for work... got blocks [403243609-403246040] (2550 Mkeys)
oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
0000000000000000000000000000000000000000:u:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0
0000000000000000000000000000000000000000:c:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0
0000000000000000000000000000000000000000:u:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x1
0000000000000000000000000000000000000000:c:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x1
0000000000000000000000000000000000000000:u:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x2
0000000000000000000000000000000000000000:c:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x2
0000000000000000000000000000000000000000:u:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x3
0000000000000000000000000000000000000000:c:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x3
0000000000000000000000000000000000000000:u:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x4
0000000000000000000000000000000000000000:c:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x4
0000000000000000000000000000000000000000:u:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0x5
...and so on...

thousands of "finds" of an all-zero hash160. And then:

197f1706f2aa45480c1debc40628c87823da08f6:c:priv:000000000000000000000000000000000000000000000000000180909f801001 + 0xd2

Naturally, I looked up 197f1706f2aa45480c1debc40628c87823da08f6, which resolves to https://blockchain.info/address/13Kp9AJAxhEEjFo8N6YTP9DMW71YpK2fD9, but there are no funds there. Ok, that can happen if the bloom filter sees a false positive (allegedly 10^-27 probability), but a re-run over the same search space went smoothly, with neither any fake zero-hash160 finds nor this false positive. Investigating, but it seems clEnqueueReadBuffer does not respect a blocking read after it has been called with non-blocking reads before. I have done some more optimizations, but all I managed was to bring the GPU load down from 43% to 34%. I need to take load off the CPU!

Rico
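On the 10^-27 figure: the false-positive rate of a Bloom filter with m bits, k hash functions and n inserted items is approximately (1 - e^(-k*n/m))^k. The parameters below are purely illustrative - the actual BLF dimensions are not given in the thread:

```python
# Approximate Bloom filter false-positive rate; m/k/n values here are
# hypothetical, just to show how quickly the rate falls with filter size.
import math

def bloom_fp(m_bits, k_hashes, n_items):
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

# e.g. a 1 GiB filter, 20 hashes, 10M entries:
rate = bloom_fp(m_bits=2**33, k_hashes=20, n_items=10_000_000)
assert rate < 1e-20
```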
rico666 (OP)
Legendary
Offline
Activity: 1120
Merit: 1037
฿ → ∞
February 16, 2017, 11:42:27 AM Last edit: February 16, 2017, 12:09:56 PM by rico666
https://twitter.com/LBC_collider

"GPUs aren't ... you should try GPU ... I'm sure you can deliver great speed with GPU. Even with 1 server I think I can triple the pool speed."
root@soft:~# lshw -C video | grep product: product: ASPEED Graphics Family product: GK210GL [Tesla K80] product: GK210GL [Tesla K80] product: GK210GL [Tesla K80] product: GK210GL [Tesla K80]
ubuntu@ip-172-31-34-146:~/collider$ ./LBC -c 4 -l 0 -t 1 Benchmark info not found - benchmarking... done. Your maximum speed is 1576126 keys/s per CPU core. Ask for work... got blocks [405066137-405066520] (402 Mkeys) oooooooooooooooooooooooo (3.19 Mkeys/s) ubuntu@ip-172-31-34-146:~/collider$ ./LBC -c 2 -l 0 -t 1 Ask for work... got blocks [405077529-405077720] (201 Mkeys) oooooooooooo (2.78 Mkeys/s) Clearly, Amazon puts way too few/too weak CPUs in their Instances - for our usecase. What surprises me more, is that the K80 does not look so impressive compared with my tiny Notebook GPU: ubuntu@ip-172-31-34-146:~$ nvidia-smi Thu Feb 16 11:23:38 2017 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 367.57 Driver Version: 367.57 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 Tesla K80 On | 0000:00:1E.0 Off | 0 | | N/A 55C P0 76W / 149W | 256MiB / 11439MiB | 23% Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | 0 1938 C ./gen-hrdcore-avx2-linux64 64MiB | | 0 1939 C ./gen-hrdcore-avx2-linux64 64MiB | | 0 1940 C ./gen-hrdcore-avx2-linux64 64MiB | | 0 1941 C ./gen-hrdcore-avx2-linux64 64MiB | +-----------------------------------------------------------------------------+
With the 4 vCPUs in use. Clearly , 4 vCPUs in Amazon speak mean 2 real cores + 2HT versus my real 4 CPUs: $ nvidia-smi Thu Feb 16 12:36:36 2017 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 375.26 Driver Version: 375.26 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | |===============================+======================+======================| | 0 Quadro M2000M Off | 0000:01:00.0 Off | N/A | | N/A 51C P0 N/A / N/A | 115MiB / 4041MiB | 33% Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | 0 21809 C ./gen-hrdcore-skylake-linux64 28MiB | | 0 21810 C ./gen-hrdcore-skylake-linux64 28MiB | | 0 21811 C ./gen-hrdcore-skylake-linux64 28MiB | | 0 21839 C ./gen-hrdcore-skylake-linux64 28MiB | +-----------------------------------------------------------------------------+
I end up at almost 7 Mkeys/s with my 4 CPUs. Moreover, not only is the memory usage more efficient (ok - the K80 has 3 times the memory, but it also slurps - for reasons unknown to me - about 2.5 times as much per process), the relative utilization is also in favor of my notebook. If Amazon offered a P2 instance with 20 vCPUs and 1 K80, that would be balanced, and at least 30 Mkeys/s could be expected from it. Also a good (in terms of balance) configuration: 12 real Skylake cores and some reasonable Maxwell (GM107) GPU -> should give you 23+ Mkeys/s. On the more positive side, GPU detection and choice of OpenCL device ran flawlessly on the first try.

Rico

edit: installation howto for OpenCL on Ubuntu 16.04 (as used on AWS):

# OpenCL @ Amazon AWS Ubuntu ----------------------------------
sudo apt-get install gcc make tmux libssl-dev xdelta3 nvidia-367 nvidia-cuda-toolkit
mkdir collider; cd collider; tmux
wget ftp://ftp.cryptoguru.org/LBC/client/LBC
chmod a+x LBC
sudo ./LBC -h
sudo cpan
cpan> install OpenCL
sudo reboot
sudo nvidia-smi -pm 1
sudo nvidia-smi --auto-boost-default=0
sudo nvidia-smi -ac 2505,875
./LBC -x
./LBC --gpu
Economic considerations: at the moment, AWS GPU instances are not economical. For $0.25/h you can get the p2.xlarge, and it will give you at most 3.2 Mkeys/s. OTOH, for $0.50/h you can get an m4.x16 compute instance with 64 vCPUs, and that will give you around 18 Mkeys/s. Yes - we need a better GPU client.
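The cost comparison in one line (prices and speeds as quoted above):

```python
# Cost per Mkeys/s of the two AWS options mentioned in the post:
p2_cost_per_mkeys = 0.25 / 3.2    # p2.xlarge (GPU instance), $/h per Mkeys/s
m4_cost_per_mkeys = 0.50 / 18.0   # m4.x16 (64 vCPU compute instance)

# The CPU instance is roughly 2.8x cheaper per key at current client speeds.
assert m4_cost_per_mkeys < p2_cost_per_mkeys
```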
becoin
Legendary
Offline
Activity: 3431
Merit: 1233
February 16, 2017, 01:07:04 PM
economic considerations:
Really? You've finally decided this "project" needs some economic considerations after 23 pages of enthusiastic code churning?
rico666 (OP)
Legendary
Offline
Activity: 1120
Merit: 1037
฿ → ∞
February 16, 2017, 01:19:02 PM
economic considerations:
"Really? You've finally decided this 'project' needs some economic considerations after 23 pages of enthusiastic code churning?"

becoin - as always... It's not the "project" that needs economic considerations, but anyone who wants to get into the top30 to receive the GPU client without forking out 0.1 BTC (or 0.5 BTC if he's becoin). Right now, you can still get into the top30 for around $11 (~28 hours) with an m4.x16 AWS spot instance. To achieve the same with the p2.xlarge would cost you $33.

Apropos churning: I made a workaround in the LBC client to stop the generator when it is churning bad hashes:

Ask for work... got blocks [405316777-405317288] (536 Mkeys)
oooooooooooooooooooooooooooooooo (6.68 Mkeys/s)
Ask for work... got blocks [405317817-405318328] (536 Mkeys)
oooooooooooooooooooooooooooooooo (6.51 Mkeys/s)
Ask for work... got blocks [405318361-405318872] (536 Mkeys)
ooooooooooooooooooGenerator churning bad hits! Abort.
20 just got out of the pool with exit code: 255 and data: ooooooooooooomalformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "HASH(0x3e5cca8)") at ./LBC line 1176.

It's not nice, but until I find a real fix, this at least prevents flawed PoW from proliferating into the done blocks.

Rico