Bitcoin Forum
Author Topic: Vanitygen: Vanity bitcoin address generator/miner [v0.22]  (Read 1153529 times)
EricJ2190
Full Member
***
Offline Offline

Activity: 134
Merit: 102


View Profile
July 12, 2011, 04:35:21 AM
 #101

Quote
I have got this estimation for my pattern: 9.47e+33y
What exactly does this mean in decimal?
Maybe 9.47*2.7183^33 years?

It means 9.47*10^33. I assume years.

http://en.wikipedia.org/wiki/Scientific_notation#E_notation
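
To check the reading above by machine, here is a minimal Python sketch. It assumes the trailing "y" is a unit suffix (years, as guessed above); `parse_estimate` is a made-up helper name, not part of vanitygen.

```python
import math

def parse_estimate(text):
    """Split an estimate such as '9.47e+33y' into (value, unit suffix)."""
    unit = text[-1] if text[-1].isalpha() else ""
    number = text[:-1] if unit else text
    return float(number), unit

value, unit = parse_estimate("9.47e+33y")

# E notation is base 10: 9.47e+33 means 9.47 * 10**33, not 9.47 * e**33.
print(math.isclose(value, 9.47 * 10**33), unit)
print(f"base-10 reading: {value:.3g}")
print(f"base-e misreading: {9.47 * math.e**33:.3g}")
```

The two printed magnitudes differ by many orders of magnitude, which is why the distinction matters for an estimate like this.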
samr7 (OP)
Full Member
***
Offline Offline

Activity: 140
Merit: 430

Firstbits: 1samr7


View Profile
July 12, 2011, 09:52:30 AM
 #102

New version 0.10 is up.

This version is approx. 6X (!!) faster at prefix matching, thanks to an OpenSSL optimization for quickly computing batches of modular inverses.  This optimization also makes the cost of regular expressions much more acute.  The search rate for matching a single regular expression only improved by about 3X, and overall is approx. 1/3 the speed of a prefix match.
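
For readers wondering what "computing batches of modular inverses" buys: one common way to do it is Montgomery's trick, which replaces n modular inversions with a single inversion plus roughly 3n multiplications. The Python sketch below illustrates that general technique only. It is an assumption that this is the kind of batching meant here, it is not vanitygen's or OpenSSL's actual code, and `batch_inverse` is a hypothetical name. The modulus used is the secp256k1 field prime.

```python
def batch_inverse(values, p):
    """Invert every element of `values` modulo prime p with one inversion.

    Montgomery's trick: build running prefix products, invert the total
    once, then walk backwards peeling off one inverse per element.
    Every value must be nonzero modulo p.
    """
    prefix = []
    acc = 1
    for v in values:
        prefix.append(acc)            # product of all elements before v
        acc = (acc * v) % p
    inv_acc = pow(acc, -1, p)         # the single expensive inversion (Python 3.8+)
    result = [0] * len(values)
    for i in range(len(values) - 1, -1, -1):
        result[i] = (inv_acc * prefix[i]) % p   # inverse of values[i]
        inv_acc = (inv_acc * values[i]) % p     # remove values[i] from the product
    return result

# secp256k1 field prime (the field used by Bitcoin's curve)
P = 2**256 - 2**32 - 977
xs = [3, 7, 12345, 2**200 + 1]
invs = batch_inverse(xs, P)
print(all((x * ix) % P == 1 for x, ix in zip(xs, invs)))
```

Since an inversion costs far more than a multiplication, the saving grows with the batch size, which fits the large jump reported for prefix matching.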
Shevek
Sr. Member
****
Offline Offline

Activity: 252
Merit: 250



View Profile
July 12, 2011, 10:50:52 AM
 #103

Quote from: samr7
New version 0.10 is up.

This version is approx. 6X (!!) faster at prefix matching, thanks to an OpenSSL optimization for quickly computing batches of modular inverses.  This optimization also makes the cost of regular expressions much more acute.  The search rate for matching a single regular expression only improved by about 3X, and overall is approx. 1/3 the speed of a prefix match.

Congratz!

But... any news about entropy import?

Proposals for improving bitcoin are like asses: everybody has one
1SheveKuPHpzpLqSvPSavik9wnC51voBa
pc
Sr. Member
****
Offline Offline

Activity: 253
Merit: 250


View Profile
July 12, 2011, 11:38:23 AM
 #104

I have a dual quad-core Mac Pro with hyperthreading. On previous versions I got optimal performance at 8 threads, but with the new version I noticed that 8 threads still only used "400%" of a CPU. So I tried 4 threads instead and got up to 300000 K/s rather than around 200000 K/s. I don't know whether others have a similar configuration, but it may be worth experimenting with the number of threads to find the optimal rate for your platform.

Thank you very much for this.
dserrano5
Legendary
*
Offline Offline

Activity: 1974
Merit: 1029



View Profile
July 12, 2011, 01:33:22 PM
 #105

I'm seeing a 4x increase. I don't mind not getting 6x; 4x is an amazing improvement in any case :)
samr7 (OP)
July 12, 2011, 01:54:26 PM
 #106

Quote from: pc
I have a dual-quad-core Mac Pro with hyperthreading, and on previous versions if I ran at 8 threads I got optimal performance, but I noticed with the new version that at 8 threads I was still only using "400%" of a cpu, so I tried running at 4 threads instead and got up to 300000 K/s instead of around 200000 K/s. So, I don't know if others have a similar configuration, but it might be good to play around with the number of threads to try to hit the optimal rate for your platform.

Great, negative scalability.  Are you using regular expressions?  How fast does it run with just one thread?

Quote from: dserrano5
I'm seeing 4x increase. I don't care not getting 6x, 4x is an amazing improvement in any case :)

an0therlr3, you might be noticing some scalability issues as well.  If you have a sec, give some details.  Are you using prefixes?  How many cores, how fast, and how fast with a single thread?
dserrano5
July 12, 2011, 02:34:16 PM
 #107

Quote from: samr7
If you have a sec, give some details.  Are you using prefixes?  How many cores, how fast, and how fast with a single thread?

Intel(R) Xeon(R) CPU E5420  @ 2.50GHz (8 cores):

Code:
$ ./vg-0.6 -it1 1Loaners & sleep 10; kill $!
[1] 30177
Difficulty: 28173812690
[28363 K/s][total 280000][Prob 0.0%][50% in 8.0d]
$ ./vg-0.6 -i 1Loaners & sleep 10; kill $!
[1] 30179
Difficulty: 28173812690
[174878 K/s][total 1520000][Prob 0.0%][50% in 1.3d]
$ ./vg-0.10 -it1 1Loaners & sleep 10; kill $!
[1] 30188
Difficulty: 28173812690
[164485 K/s][total 1605696][Prob 0.0%][50% in 1.4d]
$ ./vg-0.10 -i 1Loaners & sleep 10; kill $!
[1] 30190
Difficulty: 28173812690
[884067 K/s][total 8430080][Prob 0.0%][50% in 6.1h]

v0.6 single thread to v0.6 8 threads: 174878/28363 = 6.1657x (expect 8x)
v0.6 single thread to v0.10 single thread: 164485/28363 = 5.7992x (expect 6x as announced)
v0.6 8 threads to v0.10 8 threads: 884067/174878 = 5.0553x (expect 6x as announced)
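
The ratios above can be recomputed with a few lines of Python. This is only a sketch over the four rates printed in the runs. It takes plain `-i` to mean 8 threads, as the post does, and it adds one ratio the post does not list: v0.10 going from 1 to 8 threads.

```python
# Key rates in K/s, copied from the four 10-second runs above.
# Keys are (version, thread count).
rates = {
    ("v0.6", 1): 28363,
    ("v0.6", 8): 174878,
    ("v0.10", 1): 164485,
    ("v0.10", 8): 884067,
}

def speedup(new, old):
    """Ratio of the key rate of run `new` to that of run `old`."""
    return rates[new] / rates[old]

comparisons = [
    ("v0.6, 1 -> 8 threads", ("v0.6", 8), ("v0.6", 1)),
    ("v0.10, 1 -> 8 threads", ("v0.10", 8), ("v0.10", 1)),
    ("v0.6 -> v0.10, 1 thread", ("v0.10", 1), ("v0.6", 1)),
    ("v0.6 -> v0.10, 8 threads", ("v0.10", 8), ("v0.6", 8)),
]
for label, new, old in comparisons:
    print(f"{label}: {speedup(new, old):.2f}x")
```

With these samples, v0.10 gains less from 1 to 8 threads than v0.6 does, which is in line with the scalability question raised earlier in the thread. Ten-second samples are noisy, so treat the figures as rough.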

Oops, my mistake: it's not 4x but 5x. I stopped vanitygen v0.6 8 hours ago and started v0.10 a few minutes ago. I judged the improvement by the time remaining rather than by the rate, and I suspect I didn't account for the fact that when I stopped v0.6 this morning it had already been running for some hours, so its time remaining was, of course, lower than at the start :)
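
The "50% in ..." figures in the runs above can be reproduced from the difficulty and the key rate. The sketch below is an assumption about how the estimate is derived, not vanitygen's code: it treats "Difficulty" as the expected number of keys per match, "K/s" as keys per second (consistent with the totals after 10 seconds), and each key as an independent trial. `seconds_to_probability` is a hypothetical helper name.

```python
import math

def seconds_to_probability(difficulty, keys_per_second, prob=0.5):
    """Time until a match has been found with probability `prob`.

    With success probability 1/difficulty per key,
    P(found within n keys) = 1 - (1 - 1/difficulty)**n ~= 1 - exp(-n/difficulty),
    so n = -ln(1 - prob) * difficulty.
    """
    keys_needed = -math.log(1.0 - prob) * difficulty
    return keys_needed / keys_per_second

difficulty = 28173812690
runs = [("v0.6, 1 thread", 28363), ("v0.6, 8 threads", 174878),
        ("v0.10, 1 thread", 164485), ("v0.10, 8 threads", 884067)]
for label, rate in runs:
    secs = seconds_to_probability(difficulty, rate)
    print(f"{label}: {secs / 86400:.1f} days ({secs / 3600:.1f} hours)")
```

Under these assumptions the estimate depends only on the current rate, so comparing rates is the more direct way to compare versions.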
cbuchner1
Hero Member
*****
Offline Offline

Activity: 756
Merit: 502


View Profile
July 12, 2011, 03:09:09 PM
 #108

Quote from: samr7
New version 0.10 is up. This version is approx. 6X (!!) faster at prefix matching

Congratulations on this optimization! I profiled vanitygen 0.9 earlier and also noticed that the inversion was taking so much time, but you have already found a solution. Are you very familiar with OpenSSL internals? It certainly seems so.

If someone can port two important functions to the GPU, namely EC_POINT_add() and EC_POINTs_make_affine(), this thing will fly, even more so once the SHA256 and RIPEMD160 hashes are also done on the GPU.  Here is the relevant profiler output. The number in the second column is the seconds spent inside the function and its children. The total execution time was about 25 seconds in this test run.

Code:
-----------------------------------------------
[3]     99.9    0.01   24.92       1         vg_thread_loop(_vg_context_s*) [3]
                0.00   12.55  249406/250471      EC_POINT_add [7]
                0.00    7.91     932/941         EC_POINTs_make_affine [9]
                0.00    1.82  219546/219548      EC_POINT_point2oct [16]
                0.00    1.53  272770/272775      SHA256 [19]
                0.01    0.71  226973/226974      RIPEMD160 [26]

Anything else is peanuts in comparison, including the prefix matching.
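
To put the profile in proportion, here is a small Python sketch that turns the listed times into shares of the loop's total. The numbers are copied from the output above. Adding the two time columns (self and children) for each callee is an assumption about how to read them.

```python
# (self, children) seconds for each callee of vg_thread_loop, from the profile above
callees = {
    "EC_POINT_add": (0.00, 12.55),
    "EC_POINTs_make_affine": (0.00, 7.91),
    "EC_POINT_point2oct": (0.00, 1.82),
    "SHA256": (0.00, 1.53),
    "RIPEMD160": (0.01, 0.71),
}
loop_total = 0.01 + 24.92  # self + children seconds of vg_thread_loop itself

shares = {name: (s + c) / loop_total for name, (s, c) in callees.items()}
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:24s} {share:6.1%}")
print(f"{'everything else':24s} {1 - sum(shares.values()):6.1%}")
```

On this reading the two elliptic-curve routines account for roughly four fifths of the loop, and the two hashes together for under a tenth.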

pc
July 12, 2011, 03:10:28 PM
 #109

Quote from: samr7
Great, negative scalability.  Are you using regular expressions?  How fast does it run with just one thread?

Well, I'm pretty sure it's still faster than it was on the old version, even with the old running at 8 threads, but I'd need to recompile the older version if I wanted to compare. Just running a case-insensitive prefix:

Code:
cebu:~% nice ./Applications/vanitygen -i -t 1 1abcdefg
Difficulty: 13628644118
[80020 K/s][total 501760][Prob 0.0%][50% in 1.4d]
                         
cebu:~% nice ./Applications/vanitygen -i -t 2 1abcdefg
Difficulty: 13628644118
[162979 K/s][total 1505280][Prob 0.0%][50% in 16.1h]

cebu:~% nice ./Applications/vanitygen -i -t 3 1abcdefg
Difficulty: 13628644118
[237562 K/s][total 903168][Prob 0.0%][50% in 11.0h]

cebu:~% nice ./Applications/vanitygen -i -t 4 1abcdefg
Difficulty: 13628644118
[299808 K/s][total 2408448][Prob 0.0%][50% in 8.8h]

Up to this point, CPU usage in Activity Monitor is about what one would expect: roughly 100% times the number of threads.

Code:
cebu:~% nice ./Applications/vanitygen -i -t 5 1abcdefg
Difficulty: 13628644118
[262992 K/s][total 4264960][Prob 0.0%][50% in 10.0h]

With 5 threads, CPU usage hovered between 420% and 440% and the keygen rate was lower, which makes me think there's contention for some resource other than CPU time.

Code:
cebu:~% nice ./Applications/vanitygen -i -t 6 1abcdefg
Difficulty: 13628644118
[261357 K/s][total 9182592][Prob 0.1%][50% in 10.0h]

cebu:~% nice ./Applications/vanitygen -i -t 7 1abcdefg
Difficulty: 13628644118
[245618 K/s][total 1705984][Prob 0.0%][50% in 10.7h]

Using 6 and 7 threads was roughly the same as 5, with CPU slightly higher, maybe between 425% and 445%.

Code:
cebu:~% nice ./Applications/vanitygen -i -t 8 1abcdefg
Difficulty: 13628644118
[200385 K/s][total 2358272][Prob 0.0%][50% in 13.1h]

Using 8 threads somehow brings even more contention, with CPU hovering just around 400%.

I don't remember exactly what speeds I was getting on v0.8, but with 8 threads it used about 800% CPU, and I'm pretty sure the rate was well south of 200000 K/s, probably more like 100000. I really don't remember, though, so I wouldn't rely on that number at all.
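
The scaling in the runs above can be summarized as speedup and parallel efficiency relative to the single-thread run. A minimal Python sketch using only the rates printed above (`efficiency` is a hypothetical helper, and these are short, noisy samples):

```python
# Key rate in K/s per thread count, copied from the runs above
rates = {1: 80020, 2: 162979, 3: 237562, 4: 299808,
         5: 262992, 6: 261357, 7: 245618, 8: 200385}

def efficiency(threads):
    """Fraction of ideal linear scaling achieved, relative to the 1-thread run."""
    return rates[threads] / (threads * rates[1])

for n in sorted(rates):
    print(f"{n} threads: {rates[n]:>7} K/s, "
          f"{rates[n] / rates[1]:.2f}x speedup, {efficiency(n):.0%} efficiency")

best = max(rates, key=rates.get)
print(f"best thread count in this sample: {best}")
```

Efficiency stays above 90% up to 4 threads and then falls sharply, which matches the CPU usage plateau seen in Activity Monitor.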

And just for completeness, here's my hardware configuration:
Code:
  Model Name:	Mac Pro
  Model Identifier: MacPro4,1
  Processor Name: Quad-Core Intel Xeon
  Processor Speed: 2.26 GHz
  Number Of Processors: 2
  Total Number Of Cores: 8
  L2 Cache (per core): 256 KB
  L3 Cache (per processor): 8 MB
  Memory: 32 GB
  Processor Interconnect Speed: 5.86 GT/s
  Boot ROM Version: MP41.0081.B07
  SMC Version (system): 1.39f5
  SMC Version (processor tray): 1.39f5

Thanks again!
cbuchner1
July 12, 2011, 04:13:59 PM
 #110

Quote from: pc
Using 8 threads somehow brings even more contention, with CPU hovering just around 400%.

I believe the contention might be caused by the pooling of EC_POINT objects before calling that make_affine function.
This might spill the contents of your L1/L2 caches now. So it may be more efficient to not run hyperthreaded in this version.

Intel has profiling tools that could help figure this out, but I haven't used any of them yet.

You could also play with that pool size.
pc
July 12, 2011, 04:29:40 PM
 #111

Quote from: cbuchner1
Quote from: pc
Using 8 threads somehow brings even more contention, with CPU hovering just around 400%.

I believe the contention might be caused by the pooling of EC_POINT objects before calling that make_affine function.
This might spill the contents of your L1/L2 caches now. So it may be more efficient to not run hyperthreaded in this version.

There are some profiling tools by Intel Corp that would permit to figure this out. Haven't used any of them yet.

You could also play with that pool size.

I don't think I was clear before: I have 8 physical cores, and hyperthreading is on, so I see 16 logical CPUs in Activity Monitor. I wasn't surprised with the older version when it maxed out performance at 8 as opposed to 16, but maxing out at 4 seems a little weird.

It's so awesome to be churning through billions of addresses. Amusing how this is even less useful than mining, and yet somehow more fun.
cbuchner1
July 12, 2011, 04:35:54 PM
 #112

Quote from: pc
I don't think I was clear before: I have 8 physical cores, and hyperthreading is on, so I see 16 logical CPUs in Activity Monitor. I wasn't surprised with the older version when it maxed out performance at 8 as opposed to 16, but maxing out at 4 seems a little weird.

Sorry, I tend to go into denial mode if someone has better hardware than I do.
samr7 (OP)
July 12, 2011, 08:29:01 PM
 #113

Quote from: pc
Well, I'm pretty sure it's still faster than it was on the old version, even with the old running at 8 threads, but I'd need to recompile the older version if I wanted to compare. Just running a case-insensitive prefix:

Code:
cebu:~% nice ./Applications/vanitygen -i -t 1 1abcdefg
Difficulty: 13628644118
[80020 K/s][total 501760][Prob 0.0%][50% in 1.4d]

That's oddly slow, you should be getting about twice that key rate on that CPU.

Quote
Code:
cebu:~% nice ./Applications/vanitygen -i -t 4 1abcdefg
Difficulty: 13628644118
[299808 K/s][total 2408448][Prob 0.0%][50% in 8.8h]

Up to this point, CPU usage in Activity Monitor is about what one expect, being roughly 100% times the number of threads.

Code:
cebu:~% nice ./Applications/vanitygen -i -t 5 1abcdefg
Difficulty: 13628644118
[262992 K/s][total 4264960][Prob 0.0%][50% in 10.0h]

5 threads was having CPU hovering between 420% and 440%, and a lower keygen rate, which makes me think that there's some kind of contention for something that's not CPU-bound.

Indeed!  Try running two instances at four threads each.  If the OS X scheduler is smart, it will isolate each to a processor package to minimize the cost of contention.
pc
July 12, 2011, 09:42:10 PM
 #114

Quote from: samr7
Indeed!  Try running two instances at four threads each.  If the OS X scheduler is smart, it will isolate each to a processor package to minimize the cost of contention.

Fascinating. Running two instances at four threads each gives me each instance running about 260000–275000 K/s or so, and each taking up a bit under 400% (probably about as much as they can with the other programs I have running here).
Joric
Member
**
Offline Offline

Activity: 67
Merit: 130


View Profile
July 12, 2011, 10:23:43 PM
 #115

I just put together a script (pywallet.py 1.0) that allows exporting/importing private keys in a shortened format (mostly as a lightweight alternative to showwallet for those who didn't manage to compile the branch). It requires only the OpenSSL libs (for elliptic curve cryptography). URL: https://github.com/joric/pywallet


1JoricCBkW8C5m7QUZMwoRz9rBCM6ZSy96
bitlotto
Hero Member
*****
Offline Offline

Activity: 672
Merit: 500


BitLotto - best odds + best payouts + cheat-proof


View Profile WWW
July 12, 2011, 11:09:16 PM
 #116

Quote from: Joric
Just built up a script (pywallet.py 1.0) allowing export/import private keys in shortened format (mostly as a lightweight alternative to showwallet for those who didn't manage to compile the branch). Requires only openssl libs (for elliptic curve cryptography). URL: https://github.com/joric/pywallet


COOL! So I'm assuming you run it with Bitcoin closed and force a rescan so it rescans the entire blockchain.

*Next Draw Feb 1*  BitLotto: monthly raffle (0.25 BTC per ticket) Completely transparent and impossible to manipulate who wins. TOR
TOR2WEB
Donations to: 1JQdiQsjhV2uJ4Y8HFtdqteJsZhv835a8J are appreciated.
samr7 (OP)
July 12, 2011, 11:39:50 PM
 #117

New version 0.11 posted.
  • Allow the RNG to be seeded from a file, suggested by Shevek
  • Tweak the synchronization on the pattern list

Quote from: pc
Fascinating. Running two instances at four threads each gives me each instance running about 260000–275000 K/s or so, and each taking up a bit under 400% (probably about as much as they can with the other programs I have running here).

Try a single instance of the new version.  It should make a lot fewer pthread synchronization calls, and hopefully scale better on your multi-processor machine.  However, I'm still stumped on why each thread is getting about 1/2 the expected key rate.  You should be able to do >1MK/s on that machine.
bmgjet
Member
**
Offline Offline

Activity: 98
Merit: 10


View Profile
July 13, 2011, 12:12:31 AM
 #118

v0.10 is a big improvement for me.
I went from finding 1 address per day to finding 4.
I'm just using an LE Sempron, since running vanitygen only takes that CPU from 18C idle to 29C at full load, and power usage goes up by just 10W.
My desktop is way quicker and finds an address every 2-3 hours, but I don't like running it at full speed since the temperature goes up to 58C and it uses 130W more, lol.

I'd still love to see what it's like on a GPU.

Donations to: 1BMGjetfht9XLkGBYR4TSsuXjrYEKACcow
1stbits: 1bmgjet
300MHash/s 6850 http://www.techpowerup.com/gpuz/5u6wr/
Overclocked for 6 years and still strong http://valid.canardpc.com/show_oc.php?id=1931458 & http://valid.canardpc.com/show_oc.php?id=285337
pc
July 13, 2011, 12:27:21 AM
 #119

Quote from: samr7
New version 0.11 posted.

Try a single instance of the new version.  It should make a lot fewer pthread synchronization calls, and hopefully scale better on your multi-processor machine.  However, I'm still stumped on why each thread is getting about 1/2 the expected key rate.  You should be able to do >1MK/s on that machine.

Yes, this seems to be scaling much better. 550000–575000 K/s on 8 threads, 320000 or so on 4 threads, 82000 on 1 thread.

Thank you very much.

And for anyone else compiling on a Mac, I have to add "-I/Developer/SDKs/MacOSX10.5.sdk/usr/include/php/ext/pcre/pcrelib/" to the makefile flags for it to find <pcre.h>. Perhaps there's a better way to get it into the build, but it seems to work for me.
Shevek
July 13, 2011, 10:02:41 AM
 #120

Quote from: samr7
New version 0.11 posted.
  • Allow the RNG to be seeded from a file, suggested by Shevek
  • Tweak the synchronization on the pattern list


Thanks for the seed option!

I've tested the code. A "break;" statement is missing after "seedfile = optarg;". With that added, the program works perfectly!


Proposals for improving bitcoin are like asses: everybody has one
1SheveKuPHpzpLqSvPSavik9wnC51voBa