Bitcoin Forum
June 24, 2019, 08:19:30 PM *
Topic: VanitySearch (Yet another address prefix finder)
Jean_Luc
Full Member (Activity: 140, Merit: 193)
February 20, 2019, 02:31:36 PM
Last edit: May 23, 2019, 08:26:39 PM by Jean_Luc
Merited by DarkStar_ (25), malevolent (15), foroplus (15), Welsh (14), dbshck (10), Nestade (10), Avirunes (5), xandry (4), franckuestein (3), Halab (3), ETFbitcoin (2), jacktheking (2), nc50lc (2), asche (2), OgNasty (1), vapourminer (1), BitMaxz (1), arulbero (1), vv181 (1), hugeblack (1), Husna QA (1), 1miau (1), escope (1)
 #1

Hello,

I would like to present a new Bitcoin address prefix finder called VanitySearch. It is very similar to Vanitygen.
The main differences from Vanitygen are that VanitySearch does not use the heavy OpenSSL library for CPU calculation, and that the GPU kernel is written in CUDA in order to take full advantage of inline PTX assembly.
On my Intel Core i7-4770, VanitySearch runs ~4 times faster than vanitygen64 (1.32 Mkey/s -> 5.27 Mkey/s).
On my GeForce GTX 645, VanitySearch runs ~1.5 times faster than oclvanitygen (9.26 Mkey/s -> 14.548 Mkey/s).
To compare VanitySearch and Vanitygen results, use the -u option to search for uncompressed addresses.
VanitySearch may not compute a good grid size for your GPU, so try several values with the -g option to find the best performance.
Searching for compressed addresses is roughly 20% faster.

VanitySearch is available from https://github.com/JeanLucPons/VanitySearch

Benchmarks for various hardware (By DaveF):
https://bitcointalk.org/index.php?topic=5112311.msg50823897#msg50823897

There is still a lot of room for improvement.
Feel free to test it and to submit issues.

Thanks.
Sorry for my bad English.
Jean-Luc
joniboini
Hero Member (Activity: 602, Merit: 1069)
February 21, 2019, 02:56:03 AM
 #2

Do you have any plans to port this to Linux, or to provide a guide for compiling it on Linux?
I might try it on my Linux machine and compare it with Vanitygen (my laptop is old, so a direct comparison might be worthwhile).

Jean_Luc
February 21, 2019, 10:14:10 AM
 #3

Yes,
For the CPU code it should not be a problem, unless some intrinsics are missing in gcc. I have to check.
Concerning CUDA, I have to install the CUDA SDK on a Linux machine and try to compile; that should not be a big deal.
There are only a few Windows-specific functions (for the high-resolution timer), but I can easily write Linux code for them using gettimeofday().
I will keep you informed. ;)
Jean_Luc
February 22, 2019, 09:36:41 AM
 #4

Hello :)

I've just published a new release. There is also a makefile for Linux, but it supports only the CPU build. The CUDA build for Linux is coming. I'm very interested in knowing what performance you get on your hardware (Linux/Windows/CPU/GPU).

Thanks for testing and reporting issues.
ETFbitcoin
Legendary (Activity: 1680, Merit: 1811)
February 22, 2019, 07:11:20 PM
 #5

I've tested your software a bit and it's really faster than oclvanitygen (about 50% faster in a short test with a GTX 1060). The ability to use both CPU and GPU is great, but with the default configuration it loads the system to the point of freezing.

Also, I wonder why there are 5 fields in the output text file?

Code:
Pub Addr: 1testFRS9jRwDmAhcawMbQUP9eu5pJUAT
Prv Addr: 5KKQTwySshztQ9Z9wNuYXBScaRwnA4SXgXoKdrKqAipN6E8kPQM
Prv Key : 0xC5D04AAC3DD1FE9268722E6E24D3AA6D795664736BBDF854C56E7E38623B9FEF
Check   : 1testFRS9jRwDmAhcawMbQUP9eu5pJUAT
Check   : 1EfirXfmYi6uNfwT5sWnvCTBbrfEntGWLJ (comp)

Jean_Luc
February 22, 2019, 07:46:02 PM
 #6

Thanks for testing :)
The two "Check" fields are there for debugging/checking purposes. The two checked addresses are recomputed from the private key by a direct multiplication, whereas during the search the desired address is reached by adding generator points one by one.
You're right: by default, if you just add the -gpu option, all CPU cores are used, which slows down the system considerably, and even the GPU, because the CPU cannot handle GPU/CPU transfers efficiently. I wrote a few words about this in the README, but I will leave one CPU core free when the GPU is selected.
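
The idea behind the two "Check" lines can be sketched in a few lines of Python. This is only an illustration, not VanitySearch's code: the function names, the starting key and the step count are made up for the example, and only the secp256k1 constants are standard. It reaches a point by adding G step by step, as the search does, and then recomputes it by a direct multiplication of the final private key.

```python
# secp256k1 domain parameters (public constants from the SEC 2 standard).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Affine point addition; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a + (-a)
    if a == b:
        s = 3 * x1 * x1 * pow(2 * y1, -1, P) % P  # tangent slope
    else:
        s = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P  # chord slope
    x3 = (s * s - x1 - x2) % P
    return (x3, (s * (x1 - x3) - y1) % P)


def scalar_mul(k, pt=G):
    """Direct multiplication k*pt by double-and-add."""
    acc = None
    while k:
        if k & 1:
            acc = point_add(acc, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return acc


def walk(start_key, steps):
    """Reach (start_key + steps)*G by adding G one step at a time."""
    pt = scalar_mul(start_key)
    for _ in range(steps):
        pt = point_add(pt, G)
    return pt


start_key = 0xC0FFEE  # hypothetical starting private key
steps = 1000          # hypothetical number of search steps
found = walk(start_key, steps)          # point reached during the "search"
check = scalar_mul(start_key + steps)   # recomputed from the final key
assert found == check
```

If the incremental additions had a bug, the two points (and therefore the two addresses) would differ, which is what makes the recomputation useful as a check.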
OgNasty
Donator, Legendary (Activity: 2954, Merit: 1691)
February 22, 2019, 09:55:23 PM
 #7

Great to see someone continuing to develop an open source vanity application.  I'll have to check this out when I get an opportunity. 

Jean_Luc
February 23, 2019, 08:17:48 AM
 #8

Hello,

Thank you for your interest and for reporting issues :)

I just published a new release (v1.2):
-Updated probability calculation for very long prefixes
-Prevent the default configuration from hanging the system when the GPU is enabled
-Performance increase (~10%)
Jean_Luc
February 25, 2019, 07:53:43 AM
Merited by DarkStar_ (4), LoyceV (1)
 #9

Hello,

I published a new release (1.3) with a ~15% overall performance increase (~20% on GPU).
On my hardware, VanitySearch is now 2 times faster (GPU) than oclvanitygen.
My goal was to find an 8-character (case-sensitive) prefix in a reasonable time on my 6-year-old hardware; it still needs 2 weeks of computation for a 50% probability.
I'm not sure I will reach my goal of 2 or 3 days without changing my hardware :D
The next step will be to handle multiple GPUs and to support CUDA on Linux.
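
For readers who want to reproduce this kind of "time for a 50% probability" estimate, here is a rough sketch. It assumes each tested key matches independently with probability 1/difficulty, so the chance of at least one hit after n keys is 1 - (1 - 1/difficulty)^n. The function names, the difficulty value and the key rate below are hypothetical placeholders; the real difficulty depends on the exact prefix and is not simply a power of 58.

```python
import math


def keys_for_probability(difficulty, probability):
    """Number of keys to test so that P(at least one match) == probability.

    Solves 1 - (1 - 1/difficulty)**n = probability for n.
    log1p keeps precision when 1/difficulty is tiny.
    """
    return math.log1p(-probability) / math.log1p(-1.0 / difficulty)


def seconds_for_probability(difficulty, probability, keys_per_second):
    """Expected wall-clock time at a constant key rate."""
    return keys_for_probability(difficulty, probability) / keys_per_second


# Hypothetical inputs, for illustration only.
difficulty = 58 ** 8          # crude stand-in for a long prefix
rate = 30e6                   # assumed keys per second
days = seconds_for_probability(difficulty, 0.5, rate) / 86400
print(f"about {days:.1f} days for a 50% chance")
```

For a large difficulty the key count is very close to ln(2) * difficulty at 50%, so doubling the key rate halves the waiting time, while each extra prefix character multiplies it by roughly 58.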

Lolo54
Member (Activity: 70, Merit: 25)
February 27, 2019, 03:28:04 PM
 #10

Hello Jean_Luc, I would like to send you a PM, but you would need to enable that option in your profile, because otherwise new members cannot message you. :)
Jean_Luc
February 27, 2019, 03:40:38 PM
 #11

Hello,
No problem.
Done ;)
arulbero
Legendary (Activity: 1278, Merit: 1298)
February 27, 2019, 07:25:46 PM
 #12

Quote from: Jean_Luc on February 20, 2019, 02:31:36 PM
> Hello,
>
> I would like to present a new Bitcoin address prefix finder called VanitySearch. It is very similar to Vanitygen.
> The main differences from Vanitygen are that VanitySearch does not use the heavy OpenSSL library for CPU calculation, and that the GPU kernel is written in CUDA in order to take full advantage of inline PTX assembly.
> On my Intel Core i7-4770, VanitySearch runs ~4 times faster than vanitygen64 (1.32 Mkey/s -> 5.27 Mkey/s).
> On my GeForce GTX 645, VanitySearch runs ~1.5 times faster than oclvanitygen (9.26 Mkey/s -> 14.548 Mkey/s).
> To compare VanitySearch and Vanitygen results, use the -u option to search for uncompressed addresses.
>
> There is still a lot of room for improvement.
> Feel free to test it and to submit issues.

Are you using affine or Jacobian coordinates for the points?
Jean_Luc
February 28, 2019, 08:10:41 AM
Last edit: February 28, 2019, 08:52:26 AM by Jean_Luc
 #13

Hello,

Affine coordinates for the search (faster):
Each group computes p = startP + i*G, i in [1..group_size], where i*G comes from a pre-computed table containing G, 2G, 3G, ... in affine coordinates. The inversion of deltax (dx1-dx2) is done once per group (1 ModInv and 256*3 mult). group_size is 256 keys.

Projective coordinates for EC multiplication (computation of the starting keys). The starting key is normalized after the multiplication.

Edit:
You may also have noticed that I have an innovative implementation of modular inversion (DRS62) which is almost 2 times faster than the Montgomery one. Some benchmarks and comments are available in IntMod.cpp.
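
The "one inversion per group" idea described above can be sketched as follows. This is a minimal Python illustration, not the VanitySearch implementation: it uses the classic simultaneous-inversion trick (one modular inversion plus about three multiplications per element), Python's built-in pow() instead of DRS62, and function names chosen for the example. It assumes the starting point is not equal to plus or minus any table entry, so that no x difference is zero.

```python
# secp256k1 field prime and generator (public constants from SEC 2).
P = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Reference affine addition, one inversion per call (no infinity handling)."""
    (x1, y1), (x2, y2) = a, b
    if a == b:
        s = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        s = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (s * s - x1 - x2) % P
    return (x3, (s * (x1 - x3) - y1) % P)


def batch_inverse(values):
    """Invert every value mod P using a single modular inversion."""
    prefix = [1]                      # prefix[i] = product of values[:i]
    for v in values:
        prefix.append(prefix[-1] * v % P)
    inv = pow(prefix[-1], -1, P)      # the only inversion
    out = [0] * len(values)
    for i in range(len(values) - 1, -1, -1):
        out[i] = inv * prefix[i] % P  # 1 / values[i]
        inv = inv * values[i] % P     # now 1 / product of values[:i]
    return out


def make_table(size):
    """Pre-computed table: table[i - 1] == i*G in affine coordinates."""
    table = [G]
    for _ in range(size - 1):
        table.append(point_add(table[-1], G))
    return table


def group_add(start, table):
    """Compute start + i*G for every table entry with one shared inversion."""
    xs, ys = start
    inv_dx = batch_inverse([(x - xs) % P for x, _ in table])
    out = []
    for (x, y), inv in zip(table, inv_dx):
        s = (y - ys) * inv % P
        x3 = (s * s - xs - x) % P
        out.append((x3, (s * (xs - x3) - ys) % P))
    return out


table = make_table(256)                   # group_size of 256, as in the post
start = point_add(table[-1], table[43])   # hypothetical start point (300*G)
points = group_add(start, table)
assert points[0] == point_add(start, G)
```

Because the cost of the single inversion is shared by the whole group, making the group larger only helps until the inversion becomes a negligible part of the per-key work.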
Jean_Luc
February 28, 2019, 08:35:23 AM
 #14

Hello,

Some news:
I just published a new release (1.4) with a few fixes (especially for Linux), but the uninitialized memory bug may also affect Windows (I didn't manage to reproduce it on Windows, but it may occur randomly).

I managed to get back an old PC from my company (~8 years old) with 2 Quadro 600 cards inside :)
Unfortunately the Quadro 600 (Fermi) has only compute capability 2.1, so I will have to set up CUDA SDK 8.0 (the last one which supports Fermi). I set up Ubuntu on this PC and I will try to develop the multi-GPU release under Linux.
I hope I will manage to get good drivers for the Quadro 600 and make it work.
arulbero
February 28, 2019, 06:27:37 PM
 #15

Quote from: Jean_Luc on February 28, 2019, 08:10:41 AM
> Hello,
>
> Affine coordinates for the search (faster):
> Each group computes p = startP + i*G, i in [1..group_size], where i*G comes from a pre-computed table containing G, 2G, 3G, ... in affine coordinates. The inversion of deltax (dx1-dx2) is done once per group (1 ModInv and 256*3 mult). group_size is 256 keys.
>
> Projective coordinates for EC multiplication (computation of the starting keys). The starting key is normalized after the multiplication.
>
> Edit:
> You may also have noticed that I have an innovative implementation of modular inversion (DRS62) which is almost 2 times faster than the Montgomery one. Some benchmarks and comments are available in IntMod.cpp.

OK.
Two questions:

1) Why only 256 for the group size? Is there a memory problem? Fewer inversions are better.

2) For the field multiplication a*b = c mod p: why do you use Montgomery? Are you sure it is worth it?
Jean_Luc
February 28, 2019, 06:46:52 PM
 #16

A group size of 512 does not bring a significant improvement (less than 1%). The DRS62 ModInv is fast and almost negligible with a group size of 256.
If you have a modular mult faster than the digit-serial Montgomery mult on a 256-bit field, I'm obviously fully open to it. Folding does not improve things on 256 bits when working with 64-bit digits. I'm not sure whether Barrett could be faster; I must say I didn't try it, and for "medium-size fields" there can be traps.


arulbero
February 28, 2019, 07:40:49 PM
Merited by DarkStar_ (1)
 #17

Quote from: Jean_Luc on February 28, 2019, 06:46:52 PM
> A group size of 512 does not bring a significant improvement (less than 1%). The DRS62 ModInv is fast and almost negligible with a group size of 256.
> If you have a modular mult faster than the digit-serial Montgomery mult on a 256-bit field, I'm obviously fully open to it. Folding does not improve things on 256 bits when working with 64-bit digits. I'm not sure whether Barrett could be faster; I must say I didn't try it, and for "medium-size fields" there can be traps.

On my PC:

VanitySearch -stop -u -t 1 1tryme --> 1.2 MKeys/s

My ECC library --> 2.0 MKeys/s (17 M public keys/s)

EDIT:
I use:

a) groups of 4096 points
b) a * b = c mod p: a*b gives 8 x 64-bit limbs, then upper 4 limbs * (2**256 - p) + lower 4 limbs
c) some properties of the secp256k1 curve
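
Point b) can be illustrated with a short Python sketch. It is not taken from either program: the names are invented for the example, and Python's big integers stand in for the 64-bit limbs. The fold relies on 2**256 being congruent to C = 2**256 - p modulo p, where C is small for secp256k1 (it fits in 33 bits).

```python
P = 2**256 - 2**32 - 977   # secp256k1 field prime
C = 2**256 - P             # 0x1000003D1
MASK = 2**256 - 1


def mulmod_fold(a, b):
    """Return a*b mod P for 0 <= a, b < P using the special form of P."""
    t = a * b                              # up to 512 bits (8 limbs of 64 bits)
    while t >> 256:                        # while the upper 4 limbs are non-zero
        # hi * 2**256 + lo  is congruent to  hi * C + lo  (mod P)
        t = (t >> 256) * C + (t & MASK)
    return t - P if t >= P else t          # t < 2**256 < 2*P, one subtraction is enough


a = P - 1
b = P - 2
assert mulmod_fold(a, b) == (a * b) % P
```

Each fold strictly shrinks the value, so the loop ends after a couple of iterations for 256-bit inputs; in a limb-based implementation the multiplication by C is a multiplication by a single machine word.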



Jean_Luc
February 28, 2019, 07:47:00 PM
 #18

Linux or Windows?
Is it open source? Can I try it?
arulbero
February 28, 2019, 07:56:24 PM
 #19

Quote from: Jean_Luc on February 28, 2019, 07:47:00 PM
> Linux or Windows?
> Is it open source? Can I try it?

Linux. You have a PM.
Jean_Luc
February 28, 2019, 07:57:47 PM
 #20

Quote from: arulbero on February 28, 2019, 07:40:49 PM
> b) a * b = c mod p: a*b gives 8 x 64-bit limbs, then upper 4 limbs * (2**256 - p) + lower 4 limbs

I tried this and got about the same performance, because for secp256k1 the multiplication by P in the Montgomery mult can be reduced to a single 64-bit mult. So I'm interested in c).
OK, on Linux the performance is still bad, I'm sorry. There is some problem with intrinsics...