I reached the performance of the good old oclvanitygen on my GeForce GTX 645 (3x192 cores) with my Core i7-4770 @ 3.4GHz alone:

C:\C++\VanitySearch\x64\Release>VanitySearch.exe -u 1Happy
Start Mon Mar 11 14:47:44 2019
Difficulty: 264104224
Search: 1Happy
Base Key:4F3C51AA76D9FFDD605B50174A0F2E54A79B58434DD5A0FBED72DCB9EBA69855
Number of CPU thread: 8
9.489 MK/s (GPU 0.000 MK/s) (2^25.77) [P 19.46%][50.00% in 00:00:13][0]

C:\C++\VanityGen>oclvanitygen.exe 1Happy
Difficulty: 259627881
[9.28 Mkey/s][total 56623104][Prob 19.6%][50% in 13.3s]

C:\C++\VanityGen>vanitygen64.exe 1Happy
Difficulty: 259627881
[1.31 Mkey/s][total 3227136][Prob 1.2%][50% in 2.3min]
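The figures in the VanitySearch status line above can be cross-checked with a little arithmetic. The sketch below assumes that every tested key is an independent trial with success probability 1/difficulty, so that P = 1 - exp(-n / difficulty) after n keys. This model and the function names are assumptions for illustration; they are not taken from the VanitySearch source.

```python
import math

def probability_found(keys_tested, difficulty):
    """Chance of at least one match after keys_tested independent trials."""
    return 1.0 - math.exp(-keys_tested / difficulty)

def seconds_until(target_p, keys_tested, difficulty, rate):
    """Remaining time (s) until the cumulative probability reaches target_p."""
    keys_needed = -math.log(1.0 - target_p) * difficulty
    return max(0.0, (keys_needed - keys_tested) / rate)

# Values read from the status line: difficulty, 2^25.77 keys tested, 9.489 MK/s.
difficulty = 264104224
keys_tested = 2 ** 25.77
rate = 9.489e6

p = probability_found(keys_tested, difficulty)
eta = seconds_until(0.5, keys_tested, difficulty, rate)
print(f"P = {p:.2%}, 50% in about {eta:.0f} s")
```

Under these assumptions the result lands close to the reported 19.46% and 13 seconds; the small difference comes from the rounded exponent 25.77.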
|
|
|
The -i option works very well at home ... I dream of compatibility with CUDA 8.0 so I can enjoy my old GT520M GPU.

I'll work on this ASAP.

On the other hand, I do not know what I'm doing wrong: it only records the results of the first pattern.

Maybe it is because you didn't use the -stop option. In that case all matching addresses are recorded, and if one prefix is much more probable than the others, you will get a lot of matches for it in the output. With the -stop option, only one record per prefix entry is saved and the program ends when all prefixes have been found.

Good job Jean_Luc. Thanks.
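The difference between running with and without -stop, as described above, can be modelled in a few lines. This is a simplified sketch of the described behaviour, not the program's actual code; `collect_matches` and the sample strings are made up for illustration (they are not valid addresses).

```python
def collect_matches(addresses, prefixes, stop=False):
    """Record addresses matching any prefix.

    stop=False: every match is recorded, so an easy prefix floods the output.
    stop=True : one record per prefix, and the scan ends once all are found.
    """
    found = {}  # prefix -> list of matching addresses
    for addr in addresses:
        for prefix in prefixes:
            if addr.startswith(prefix):
                if stop and prefix in found:
                    continue  # this prefix already has its single record
                found.setdefault(prefix, []).append(addr)
        if stop and len(found) == len(prefixes):
            break  # all prefixes found: stop searching
    return found

# Hypothetical stream in which "1A" is much more probable than "1Bb".
stream = ["1Aaaa1", "1Aaaa2", "1Bbbb1", "1Aaaa3"]
print(collect_matches(stream, ["1A", "1Bb"]))
print(collect_matches(stream, ["1A", "1Bb"], stop=True))
```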
|
|
|
The new release (1.7) is ready. I had to review the compiler's inlining decisions (Linux) to make the old hardware work, but the cause is still not very clear. I hope that it won't alter the performance on recent GPUs. On my hardware, reviewing the inlining decisions has improved performance a bit. Thanks for testing it.
|
|
|
Hello,

Yes, SLarkBoy's config is impressive. As for stivensons, I would suggest freeing some CPU cores with the -t option to see if the performance is better.

Some news: I will probably publish a new release today or tomorrow with a slight performance increase and the possibility to search several prefixes in one go. It was difficult to find a solution that does not hurt performance because of the overhead of lookup tables, but I managed to find good compromises.

However, I'm facing an issue with the Linux release and at the moment I don't really know where it comes from. It seems to be related to my old hardware or to the old SDK (I'm not sure). The same code works perfectly on Windows (CUDA SDK 10) but not on my Linux config (CUDA SDK 8.0). It compiles without errors or warnings in both cases, but the generated code seems wrong and returns wrong results on Linux. If I remove some parts of it, it works again, so it seems that I reached a limit somewhere. I have to figure out where...

@arulbero: I see that you ran a test with a Quadro M2200. Was it on Linux? If yes, would it be possible for you to compile from the latest source and run VanitySearch -check to see if it works on your config? Thanks.

EDIT: Concerning the issue, I'm speaking of the GPU code; the CPU code works great in both cases.
|
|
|
Hello,

I published a new release (1.6). No new features, just a performance increase (16% GPU, 50% CPU on my hardware). The performance increase is mainly due to better ECC calculations (many thanks to arulbero). It affects the GPU less because the GPU has no SIMD instructions to speed up SHA, so most of its resources go to SHA and much less to ECC calculations.

Next steps:
- Add support for multi-prefix search (-i input.txt)
- Optimize CPU/GPU exchange
- Add missing ECC optimizations (some symmetries and endomorphism)
- Add support for GPU funnel shift, which should speed up SHA (but I need to find a board with compute capability >3.5, mine is 3.0).

Thanks for testing it.

I almost reached the same performance with my CPU alone (Intel Core i7-4770 3.4GHz) as oclvanitygen with my GPU (GTX 645), but it is still 10 days of calculation to reach the prefix I want.
|
|
|
Can VanitySearch look for more than 1 vanity prefix at a time?
Not yet, I will add this in the next release. As I said in a previous post, this feature needs a refactoring of the code. This refactoring should also allow an optimization of data transfer between GPU and CPU. Some news: after very interesting exchanges with arulbero (by PM), we should see a significant performance increase in the next release.
|
|
|
I mean, for example, to set a range for creating an address in a specific range, I want to create an address in 2 ^ 135 - 2 ^ 136 with a specific mask
For VanitySearch, I don't really see the point of this option, and it is a good way to get your funds stolen. Maybe it can be useful for BitCrack, if you have detected a flaw in a wallet's pseudo-random generator and you know more or less which subspaces the generated keys fall in.
|
|
|
Quote from stivensons:
Will there be settings like Bitcrack in the future?
-i, --in FILE Read addresses from FILE, one address per line. If FILE is "-" then stdin is read

Yes, I'm thinking of adding this. It will need a significant refactoring of the code; however, it goes in the same direction as optimizing data transfer. So probably yes.

--keyspace KEYSPACE Specify the range of keys to search, where KEYSPACE is in the format,
START:END start at key START, end at key END
START:+COUNT start at key START and end at key START + COUNT
:END start at key 1 and end at key END
:+COUNT start at key 1 and end at key 1 + COUNT

No (if I understand the purpose of this option correctly). VanitySearch is a prefix finder meant to generate usable addresses. You can specify a seed to generate a base key; it is even recommended. That's all. The seed is passed into a pbkdf2_hmac_sha512 in order to protect against seed search attacks. If you don't specify the seed, the base key is generated from timestamps (in microseconds) plus the date, and is also passed into the pbkdf2_hmac_sha512. The result of the pbkdf2_hmac_sha512 is then passed into a SHA256, whose output is used as the base key.
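The base key derivation described above (seed, then pbkdf2_hmac_sha512, then SHA256) can be sketched with Python's standard library. The salt, the iteration count, the exact format of the fallback timestamp seed, and the function name `derive_base_key` are assumptions made for this illustration; the post does not state the values VanitySearch uses.

```python
import hashlib
import time

def derive_base_key(seed=None, salt=b"hypothetical-salt", iterations=2048):
    """Sketch: seed -> PBKDF2-HMAC-SHA512 -> SHA256 -> 32-byte base key.

    salt and iterations are placeholders, not VanitySearch's real parameters.
    """
    if seed is None:
        # Fallback described in the post: microsecond timestamp plus the date.
        # The exact string layout here is an assumption.
        seed = f"{time.time_ns() // 1000}{time.ctime()}"
    stretched = hashlib.pbkdf2_hmac("sha512", seed.encode(), salt, iterations)
    return hashlib.sha256(stretched).digest()

base_key = derive_base_key("my long random seed phrase")
print(base_key.hex().upper())
```

The key-stretching step makes each guess at the seed more expensive for an attacker, which is why supplying a strong seed is recommended over relying on the timestamp fallback.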
|
|
|
I get the same ~ 580mk\s
OK, thank you for the test. With the optimizations suggested by arulbero, a few memory transfer improvements, and specific GPU intrinsics (notably the funnel shift, which should improve SHA and RIPE performance), I hope to reach 1GK/s on your config.
|
|
|
On my hardware, BitCrack (CUDA version) is rather slow. With a single compressed target, I reach only 15 MK/s against 26 MK/s with VanitySearch.
|
|
|
OK, thanks. So it is similar. Could you now try:

vanitysearch -stop -t 0 -gpu -gpuId 0,1,2,3,4,5,6 -g 160,160,160,160,160,160,144 1Testtttt

If we still get the same performance, that means the default settings are rather good.
|
|
|
CPU 20-25%

OK, thanks. Could you try release 1.5.1 (available on GitHub)? I changed the number of threads per block to 128 and halved the default number of blocks per grid. I would like to know whether, on your config, it improves performance, stays the same, or gets worse. Thank you.

Edit: Changed the link
|
|
|
Hello, on my side, with a GeForce GT 520M GPU / CUDA 8 / driver 391.35 / Windows 10 64-bit / Direct3D API version 11.2 / CUDA 48 ... I get the following error message with the command -gpu 1TEST

Ouch! The problem is that the current CUDA SDK no longer supports old compute capabilities. The GeForce GT 520M has compute capability 2.1 (like my old Quadro 600). To make my old Quadro work I had to compile VanitySearch with CUDA SDK 8.0 (under Linux). The current release of VanitySearch (for Windows) is compiled with CUDA SDK 10.0. So you can try to compile VanitySearch yourself. Visual Studio Community Edition 2017 is free, and CUDA SDK 8.0 is still available from the nVidia site. Follow the instructions on the VanitySearch home page.
|
|
|
OK, thanks. So it seems that the CPU is not able to handle the key rate. Judging from the output you gave, your CPU seems to be a dual core? If you launch the task manager, is your CPU at 100%? I think I have to optimize the exchange between CPU and GPU. This is also on my task list. A good challenge would be to reach 1GK/s on your config.
|
|
|
Everything works perfectly on Windows 10 x64.

Thank you very much for testing. Amazing config. Just out of curiosity, try with the -t 0 option. It will free the CPU cores. With such a config, the CPU may be a bottleneck (GPU/CPU transfers).

Must I use nvidia GPUs with this?

Yes, I'll try to develop an OpenCL version.

Edit: The next step will be to increase performance following valuable advice from arulbero.
|
|
|
Multi-GPU support is ready (Release 1.5). I tested it on Linux only, so if a Windows user can test it, that would be great. Example of usage (on an old PC running Ubuntu 18.04, with 2 Quadro 600 inside):

$ ./VanitySearch -l
GPU #0 Quadro 600 (2x48 cores) (Cap 2.1) (963.3 MB) (Multiple host threads)
GPU #1 Quadro 600 (2x48 cores) (Cap 2.1) (964.5 MB) (Multiple host threads)

$ ./VanitySearch -stop -gpu -gpuId 0,1 1Test
Start Sun Mar 3 12:16:26 2019
Search: 1Test
Difficulty: 264104224
Base Key:593CB755EB63B403F247F9890BE2F0FEAB3E9023A779E18A6EA62FD6C3D1FDF5
Number of CPU thread: 1
GPU: GPU #1 Quadro 600 (2x48 cores) Grid(32x64)
GPU: GPU #0 Quadro 600 (2x48 cores) Grid(32x64)
11.009 MK/s (GPU 10.221 MK/s) (2^27.61) [P 53.96%][60.00% in 00:00:03]
Pub Addr: 1Test2JF73wznXjD3LYEfCw4kPqArkvAp
Prv Addr: 5JVb2RQC5APQXti4yaGyNwEyo4phmvm773YaxD6rG9jGyZZtP32
Prv Key : 0x593CB755EB63B403F247F9890BE2F0FEABBF9023A7FBE18A6EA62FD6C3D2BAEE
Check : 1LZeyhprPQq64ctexwc4Bgo5h15ZSGRWkE
Check : 1Test2JF73wznXjD3LYEfCw4kPqArkvAp (comp)

Thanks for testing.
|
|
|
Thanks for the link.

On the GPU, I must say I don't have a clear idea. Nsight is not obvious and it's difficult to interpret the results. It's good for determining whether the GPU is well used (grid size, stream processor occupancy, memory transfers, ...) but I didn't manage to get a clear function-by-function profile. The GPU does not do Base58: it computes up to the hash160 and sends the hashes back to the CPU, which checks the full Base58 addresses.

Concerning the OpenCL version, I will see; I'm not familiar with it.
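The CPU-side step mentioned above, turning a hash160 into a full Base58 address and checking its prefix, can be sketched as follows. This is the standard Base58Check encoding for legacy Bitcoin addresses (version byte 0x00), written in Python for illustration; the function names are made up and this is not VanitySearch's own code.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def hash160_to_address(hash160, version=b"\x00"):
    """Base58Check-encode a 20-byte hash160 into a P2PKH address."""
    data = version + hash160
    checksum = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    raw = data + checksum
    n = int.from_bytes(raw, "big")
    encoded = ""
    while n > 0:
        n, rem = divmod(n, 58)
        encoded = ALPHABET[rem] + encoded
    # Each leading zero byte is represented by a leading '1'.
    pad = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * pad + encoded

def matches_prefix(hash160, prefix):
    """Check performed on the CPU for each hash160 returned by the GPU."""
    return hash160_to_address(hash160).startswith(prefix)

# Well-known test value: the all-zero hash160.
print(hash160_to_address(bytes(20)))  # 1111111111111111111114oLvT2
```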
|
|
|
Hello,

I would like to thank arulbero, who gave me by PM a great tip to improve speed using some symmetries. I missed this, shame on me. It will save a few modular multiplications. However, only ~40% of the CPU time is used for modular multiplication; the other 60% mainly goes to SHA, RIPE, Base58, ModInv and byte swapping, so I don't know if I can reach 2.0 MKey/s (x 1.66).

For Linux (CPU side), I have to work on optimizing code generation, but assembly using AT&T syntax drives me crazy.

Anyway, I managed to set up CUDA SDK 8.0 on the old Ubuntu PC. I had to patch the nvidia driver, a nightmare. But now CUDA works: I managed to compile sample code and make it run, so I will be able to develop the multi-GPU release of VanitySearch.
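The post does not say which symmetries are meant. One well-known symmetry on secp256k1 is point negation: if k*G = (x, y), then (n - k)*G = (x, p - y), so a second candidate public key comes almost for free from each computed point. The sketch below only demonstrates that identity in plain Python (3.8+ for modular inverse via `pow`); treating it as the tip in question is an assumption, and the helper names are made up.

```python
# secp256k1 parameters: field prime, group order, generator.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Add two affine points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)

def scalar_mult(k, point):
    """Double-and-add scalar multiplication (slow, for illustration only)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

k = 123456789  # arbitrary example private key
x, y = scalar_mult(k, G)
negated = (x, P - y)  # costs one subtraction instead of a full multiplication
print(scalar_mult(N - k, G) == negated)
```

In a search loop this means each point computed by the expensive path yields two keys to hash, with private keys k and n - k.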
|
|
|
|