AlexGR
Legendary
Activity: 1708
Merit: 1049
April 24, 2016, 04:46:25 AM
Let me get this straight. You compiled with -march=native on a core2 that thinks it's an i5-670.
Yep... The compile succeeded and the miner ran ok. That's pretty special.
.16 was broken due to some errors (algogate? can't remember) which I removed manually from all the sources, but .18 runs ok.
The CPU model and AES support come directly from CPUID and have been reliable until now. Even the AMD guys haven't reported CPUID problems. Can you confirm CPUID is correct: cat /proc/cpuinfo |grep model

    cat /proc/cpuinfo |grep model
    model : 23
    model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz
    model : 23
    model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz
    model : 23
    model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz
    model : 23
    model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz

    cat /proc/cpuinfo
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 23
    model name : Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz
    stepping : 7
    microcode : 0x70a
    cpu MHz : 1754.042
    cache size : 2048 KB
    physical id : 0
    siblings : 4
    core id : 0
    cpu cores : 4
    apicid : 0
    initial apicid : 0
    fpu : yes
    fpu_exception : yes
    cpuid level : 10
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm dtherm
    bugs :
    bogomips : 3508.08
    clflush size : 64
    cache_alignment : 64
    address sizes : 36 bits physical, 48 bits virtual
    power management:

It would be interesting to see what the compiler thought: gcc -march=native -Q --help=target | fgrep march

    gcc -march=native -Q --help=target | fgrep march
    -march= core2

Regarding echo512, yes, you will get the slow version. Unfortunately I'm unaware of an SSE2-optimized version, and the AES version is already used by cpuminer-opt on capable CPUs.
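For anyone following the CPUID discussion, here is a minimal C sketch of how the family, model and stepping shown in /proc/cpuinfo are decoded from CPUID leaf 1, and how the AES-NI flag (ECX bit 25) is read. It assumes GCC or Clang on x86 for <cpuid.h>; the helper names are hypothetical and not taken from cpuminer-opt. With the values above (cpu family 6, model 23, stepping 7) the leaf-1 EAX signature works out to 0x10677: base model 7 plus the extended model 1 shifted left by four bits gives 23.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID leaf 1, ECX bit 25 reports AES-NI support. */
#define CPUID1_ECX_AES (1u << 25)

/* Display family: base family (EAX bits 8-11), plus the extended family
   (bits 20-27) only when the base family is 0xF. */
static unsigned cpu_family(uint32_t eax)
{
    unsigned family = (eax >> 8) & 0xFu;
    if (family == 0xFu)
        family += (eax >> 20) & 0xFFu;
    return family;
}

/* Display model: base model (bits 4-7); for base families 6 and 0xF the
   extended model (bits 16-19) supplies the high nibble. */
static unsigned cpu_model(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xFu;
    unsigned model = (eax >> 4) & 0xFu;
    if (base_family == 0x6u || base_family == 0xFu)
        model |= ((eax >> 16) & 0xFu) << 4;
    return model;
}

static unsigned cpu_stepping(uint32_t eax)
{
    return eax & 0xFu;
}

static bool ecx_has_aes_ni(uint32_t ecx)
{
    return (ecx & CPUID1_ECX_AES) != 0;
}

/* Reads leaf 1 on the running CPU; returns false where CPUID is unavailable. */
static bool read_cpuid_leaf1(uint32_t *eax_out, uint32_t *ecx_out)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    *eax_out = eax;
    *ecx_out = ecx;
    return true;
#else
    (void)eax_out;
    (void)ecx_out;
    return false;
#endif
}
```

Calling read_cpuid_leaf1() and passing the EAX value to cpu_model() on the Q8200 should give 23, and ecx_has_aes_ni() should return false there, consistent with the missing "aes" entry in the flags line above.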
In case you need to see performance figures or find sources: https://bench.cr.yp.to/primitives-sha3.html plus sources of every possible variant here: https://github.com/floodyberry/supercop/tree/master/crypto_hash

I have yet to study your data in any detail, but the following may put performance into perspective.
Echo512 and groestl have AES optimizations for most algos.
Cryptonight and hodl have their own unique AES optimizations.
The rest of the x11 chain, including groestl but excluding echo, has SSE2-optimized versions. The algos in the longer X chains, as well as non-AES echo, fall back to the slow SPH versions.
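One way to picture the split described above is compile-time selection on the macros GCC and Clang predefine for the target: __AES__ when -maes (or an -march that implies it) is in effect, and __SSE2__ when SSE2 is enabled. The function names below are hypothetical placeholders that only report which path would be built; this is a sketch, not cpuminer-opt's actual dispatch code.

```c
/* Hypothetical stand-ins: each returns a label for the implementation
   that would be compiled for this target. */
static const char *echo512_impl(void)
{
#if defined(__AES__)
    return "echo512: AES-NI";
#else
    /* No SSE2 echo variant is known, so non-AES builds get the SPH code. */
    return "echo512: generic SPH";
#endif
}

static const char *groestl512_impl(void)
{
#if defined(__AES__)
    return "groestl512: AES-NI";
#elif defined(__SSE2__)
    return "groestl512: SSE2";
#else
    return "groestl512: generic SPH";
#endif
}
```

Because the choice is made by the preprocessor, a binary built with -march=core2 never contains the AES branch, even if it later runs on an AES-capable CPU.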
Yeah, lacking AES (and AVX) hurts a lot.

AlexGR
April 24, 2016, 04:50:46 AM
OK, found the problem... the profiler did it. I ran the program through an indirect call:
    valgrind --tool=callgrind ./cpuminer -a x11 --benchmark
and then exported the profile data to KCachegrind to get the graph. I don't know how running it indirectly can do that, unless it emulates a different CPUID. A normal run is fine; it detects the Q8200.
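To compare what the process sees natively and under a profiler, a small sketch (assuming GCC or Clang on x86 with <cpuid.h>) can read the 48-byte brand string from CPUID leaves 0x80000002 to 0x80000004, which is typically the same text Linux shows as "model name" in /proc/cpuinfo. I'm assuming, not asserting, that valgrind's emulated CPU explains the i5-670 report; running this once directly and once under valgrind would show whether the strings differ. The function name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Copies the CPUID brand string into out (48 chars plus a terminating NUL).
   Returns false when the extended leaves are not available. */
static bool cpu_brand_string(char out[49])
{
    memset(out, 0, 49);
#if defined(__x86_64__) || defined(__i386__)
    unsigned regs[4];
    /* The highest supported extended leaf must reach 0x80000004. */
    if (__get_cpuid_max(0x80000000u, NULL) < 0x80000004u)
        return false;
    for (unsigned i = 0; i < 3; i++) {
        /* Each leaf returns 16 bytes of the string in EAX, EBX, ECX, EDX. */
        __get_cpuid(0x80000002u + i, &regs[0], &regs[1], &regs[2], &regs[3]);
        memcpy(out + 16 * i, regs, sizeof regs);
    }
    return true;
#else
    return false;
#endif
}
```

Usage: declare char brand[49], call cpu_brand_string(brand), and print it if the call returns true. Intel brand strings are often padded with leading spaces.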

Giulini
April 24, 2016, 07:20:04 AM
tried "bdver1", no luck
kim@spiel-2:~/cpuminer-opt-3.1.18$ gcc -march=native -Q --help=target
The following options are target specific:
  -m128bit-long-double [disabled]
  -m32 [disabled]
  -m3dnow [disabled]
  -m3dnowa [disabled]
  -m64 [enabled]
  -m80387 [enabled]
  -m8bit-idiv [disabled]
  -m96bit-long-double [enabled]
  -mabi= sysv
  -mabm [enabled]
  -maccumulate-outgoing-args [disabled]
  -maddress-mode= short
  -madx [disabled]
  -maes [disabled]
  -malign-double [disabled]
  -malign-functions= 0
  -malign-jumps= 0
  -malign-loops= 0
  -malign-stringops [enabled]
  -mandroid [disabled]
  -march= amdfam10
  -masm= att
  -mavx [disabled]
  -mavx2 [disabled]
  -mavx256-split-unaligned-load [disabled]
  -mavx256-split-unaligned-store [disabled]
  -mbionic [disabled]
  -mbmi [disabled]
  -mbmi2 [disabled]
  -mbranch-cost= 0
  -mcld [disabled]
  -mcmodel= 32
  -mcpu=
  -mcrc32 [disabled]
  -mcx16 [enabled]
  -mdispatch-scheduler [disabled]
  -mf16c [disabled]
  -mfancy-math-387 [enabled]
  -mfentry [enabled]
  -mfma [disabled]
  -mfma4 [disabled]
  -mforce-drap [disabled]
  -mfp-ret-in-387 [enabled]
  -mfpmath= 387
  -mfsgsbase [disabled]
  -mfused-madd
  -mfxsr [enabled]
  -mglibc [enabled]
  -mhard-float [enabled]
  -mhle [disabled]
  -mieee-fp [enabled]
  -mincoming-stack-boundary= 0
  -minline-all-stringops [disabled]
  -minline-stringops-dynamically [disabled]
  -mintel-syntax
  -mlarge-data-threshold= 0x10000
  -mlong-double-64 [disabled]
  -mlong-double-80 [enabled]
  -mlwp [disabled]
  -mlzcnt [enabled]
  -mmmx [disabled]
  -mmovbe [disabled]
  -mms-bitfields [disabled]
  -mno-align-stringops [disabled]
  -mno-fancy-math-387 [disabled]
  -mno-push-args [disabled]
  -mno-red-zone [disabled]
  -mno-sse4 [enabled]
  -momit-leaf-frame-pointer [disabled]
  -mpc32 [disabled]
  -mpc64 [disabled]
  -mpc80 [disabled]
  -mpclmul [disabled]
  -mpopcnt [enabled]
  -mprefer-avx128 [disabled]
  -mpreferred-stack-boundary= 0
  -mprfchw [enabled]
  -mpush-args [enabled]
  -mrdrnd [disabled]
  -mrdseed [disabled]
  -mrecip [disabled]
  -mrecip=
  -mred-zone [enabled]
  -mregparm= 0
  -mrtd [disabled]
  -mrtm [disabled]
  -msahf [enabled]
  -msoft-float [disabled]
  -msse [disabled]
  -msse2 [disabled]
  -msse2avx [disabled]
  -msse3 [disabled]
  -msse4 [disabled]
  -msse4.1 [disabled]
  -msse4.2 [disabled]
  -msse4a [disabled]
  -msse5
  -msseregparm [disabled]
  -mssse3 [disabled]
  -mstack-arg-probe [disabled]
  -mstackrealign [enabled]
  -mstringop-strategy= [default]
  -mtbm [disabled]
  -mtls-dialect= gnu
  -mtls-direct-seg-refs [enabled]
  -mtune= amdfam10
  -muclibc [disabled]
  -mveclibabi= [default]
  -mvect8-ret-in-mem [disabled]
  -mvzeroupper [disabled]
  -mx32 [disabled]
  -mxop [disabled]
  -mxsave [disabled]
  -mxsaveopt [disabled]
Known assembler dialects (for use with the -masm-dialect= option): att intel
Known ABIs (for use with the -mabi= option): ms sysv
Known code models (for use with the -mcmodel= option): 32 kernel large medium small
Valid arguments to -mfpmath=: 387 387+sse 387,sse both sse sse+387 sse,387
Known vectorization library ABIs (for use with the -mveclibabi= option): acml svml
Known address mode (for use with the -maddress-mode= option): long short
Valid arguments to -mstringop-strategy=: byte_loop libcall loop rep_4byte rep_8byte rep_byte unrolled_loop
Known TLS dialects (for use with the -mtls-dialect= option): gnu gnu2

th3.r00t
April 24, 2016, 08:50:03 AM
Edit2: I just realized there is a typo, it should be "-march=bdver1". Give it a try, it might be faster for some algos.
This might be the best I can come up with. Now that you both have it figured out for your own situations, are the tips in README.md clear enough for other users? I've added another phrase, in italics, to the existing text.
Some users with AMD CPUs without AES_NI have reported problems compiling with build.sh or "-march=native". Problems have included compile errors and poor performance. These users are recommended to compile manually, specifying "-march=bdver1" (not "btver1") on the configure command line. If all else fails, "-march=core2" will provide the best compatibility but the lowest performance.
As you can see, there are both btver1 and bdver1. They are NOT the same and they refer to different CPUs. bdver1 uses AES, AVX and other newer CPU instruction-set extensions.
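To check what a given -march actually turns on, without reading through the full -Q --help=target dump, a sketch like this reports the predefined feature macros for the file it is compiled in. Per the GCC documentation, -march=bdver1 implies AES, AVX and XOP, while core2 and amdfam10 do not include AES, so compiling the same file with each flag should show the difference. The names and output format are my own choices, not anything from cpuminer-opt.

```c
#include <stddef.h>
#include <stdio.h>

#ifdef __SSE2__
#define ISA_SSE2 1
#else
#define ISA_SSE2 0
#endif
#ifdef __SSE4_1__
#define ISA_SSE4_1 1
#else
#define ISA_SSE4_1 0
#endif
#ifdef __AES__
#define ISA_AES 1
#else
#define ISA_AES 0
#endif
#ifdef __AVX__
#define ISA_AVX 1
#else
#define ISA_AVX 0
#endif
#ifdef __XOP__
#define ISA_XOP 1
#else
#define ISA_XOP 0
#endif

struct isa_flag {
    const char *name;
    int enabled;
};

/* Fixed at compile time: reflects the -march/-m flags used for this file,
   not the CPU the program later runs on. */
static const struct isa_flag compiled_isa[] = {
    { "sse2", ISA_SSE2 },
    { "sse4.1", ISA_SSE4_1 },
    { "aes", ISA_AES },
    { "avx", ISA_AVX },
    { "xop", ISA_XOP },
};

static size_t compiled_isa_count(void)
{
    return sizeof compiled_isa / sizeof compiled_isa[0];
}

static void print_compiled_isa(FILE *out)
{
    for (size_t i = 0; i < compiled_isa_count(); i++)
        fprintf(out, "%-7s %s\n", compiled_isa[i].name,
                compiled_isa[i].enabled ? "enabled" : "disabled");
}
```

Add a main() that calls print_compiled_isa(stdout), then build it once with -march=bdver1 and once with -march=core2 to compare. A binary built with bdver1 may crash with an illegal-instruction error on a CPU that lacks those extensions.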

joblo (OP)
Legendary
Activity: 1470
Merit: 1114
April 24, 2016, 11:20:19 AM
Thanks, I also found this tidbit: AMD Opteron™ and AMD FX series processors with “Bulldozer” processor core (options: -march=bdver1 and -mtune=bdver1) and AMD processors with “Bobcat” core (options: -march=btver1 and -mtune=btver1). http://developer.amd.com/community/blog/2012/04/23/gcc-4-7-is-available-with-support-for-amd-opteron-6200-series-and-amd-fx-series-processors/

joblo (OP)
April 24, 2016, 11:21:34 AM
Ok found the problem... The profiler did it  I run the program through an indirect call: valgrind --tool=callgrind ./cpuminer -a x11 --benchmark and then exported the profile data to KCachegrind to get the graph. I don't know how running it indirectly can do that, except if it emulates another cpuid. Normal run is ok, it detects Q8200.
Thanks for the follow-up; it had me worried when your CPUID turned out to be correct.

joblo (OP)
April 24, 2016, 06:54:29 PM Last edit: April 25, 2016, 02:11:28 AM by joblo
Promising news for Windows users. I have successfully compiled and run cpuminer-opt using Cygwin. I won't go into a lot of detail, but: install the Cygwin base, the g++/autoconf toolchain and all the dependencies required by cpuminer-opt, then compile as on Linux from the Cygwin terminal.

Caveats:
- v3.1.9W is a repackage of v3.1.9 and is therefore missing some new algos, optimizations and other changes.
- It should be considered beta quality.
- Algos are hit and miss: x11 and quark work but cryptonight does not.
- Only AES_NI is supported in this release.
- The source code is modified from the original v3.1.9, so don't try to use it on Linux.

I expect some issues to arise, especially for those unfamiliar with Linux and Cygwin. Please be patient and try to work things out for yourselves. Do some research. When you ask for help, please provide complete information and what you have tried to solve the problem, so I don't have to retrace your steps. I also welcome other, more experienced users to help answer the newbie questions.

Here's the download link for v3.1.9W: https://drive.google.com/file/d/0B0lVSGQYLJIZLUU5Njd2bVRKMUE/view?usp=sharing

michelem
Legendary
Activity: 1015
Merit: 1000
April 27, 2016, 09:09:14 AM
Is there a Github repository of this?

GoldTiger69
April 27, 2016, 01:03:49 PM
Hey man, I have a suggestion for you. You don't need to find out up front whether the hardware has AES-NI. If you use EVP alone, it finds out by itself whether the hardware is AES-NI capable and, if so, uses it automatically. You don't have to do anything additional.
I already tried it and it works perfectly. Just my two cents.

joblo (OP)
April 27, 2016, 08:37:40 PM
Is there a Github repository of this?
Eventually but not yet.

joblo (OP)
April 27, 2016, 08:43:17 PM
Hey man, I have a suggestion for you. You don't need to find out if the hardware have AES-NI in the beginning. If you use EVP alone, it finds out by itself if the hardware is AES-NI capable, and if that's the case, it uses it automatically. You don't have to do anything additional.
I already tried it and it works perfectly. Just my two cents.
I don't know what EVP is. Where does it come into play: at compile time or at run time?

pallas
Legendary
Activity: 2716
Merit: 1094
Black Belt Developer
April 27, 2016, 09:05:55 PM
EVP, AFAIK, is an abstraction layer of OpenSSL that automatically selects the best instruction set for the current CPU. Not sure it applies to this project; maybe to hodlcoin or others with "standard" algos.
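The run-time selection pallas describes is a general pattern: probe the CPU once, then route every call through a function pointer. The sketch below shows that pattern in plain C, with hypothetical back ends that only tag their output so the caller can tell which one ran; it is not OpenSSL's internal code. It assumes GCC or Clang for <cpuid.h> on x86 and always picks the portable path elsewhere. On its own it does not settle the compile-time question: a real AES-NI back end still has to be compiled with AES-NI enabled somewhere.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Common signature shared by all (hypothetical) back ends. */
typedef void (*hash_fn)(uint8_t *out, const uint8_t *in, size_t len);

/* Placeholder back ends: they only mark which path was taken. */
static void hash_aesni(uint8_t *out, const uint8_t *in, size_t len)
{
    (void)in;
    (void)len;
    out[0] = 'A';
}

static void hash_portable(uint8_t *out, const uint8_t *in, size_t len)
{
    (void)in;
    (void)len;
    out[0] = 'P';
}

/* CPUID leaf 1, ECX bit 25 = AES-NI. */
static bool cpu_has_aes_ni(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return ((ecx >> 25) & 1u) != 0;
#endif
    return false;
}

/* Resolve once at startup; afterwards every call goes through the pointer. */
static hash_fn select_hash(void)
{
    return cpu_has_aes_ni() ? hash_aesni : hash_portable;
}
```

The cost of the indirect call is usually negligible next to a full hash, which is why libraries favour this design over shipping separate binaries.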

joblo (OP)
April 27, 2016, 10:55:22 PM
OK, it's associated with OpenSSL. The only thing close I found for the acronym was "enhanced virus protection". Few coins use OpenSSL for their hashing algos, so I don't see the point, and hodl already uses it for both the AES_NI and non-AES_NI implementations.

GoldTiger69
April 27, 2016, 11:09:00 PM
@joblo: You are already using it in hodl.cpp:

    EVP_EncryptInit(&ctx, EVP_aes_256_cbc(), key, iv);
    EVP_EncryptUpdate(&ctx, cacheMemoryOperatingData, &outlen1, cacheMemoryOperatingData2, cacheMemorySize);
    EVP_EncryptFinal(&ctx, cacheMemoryOperatingData + outlen1, &outlen2);
    EVP_CIPHER_CTX_cleanup(&ctx);

All I'm saying is that you shouldn't do anything else but let it run by itself. In other words, don't try to select the best arch; just use native and don't fork the source code with a 'special case' for AES-NI, since EVP will select it automatically. Meaning: don't use '#ifdef AES_NI', it's not necessary. Hope I'm clear this time.

joblo (OP)
April 27, 2016, 11:18:09 PM
I beg to differ. The NO_AES_NI check is needed to prevent the compiler from trying to compile AES_NI instructions on an incompatible CPU. Without it the compile fails.

GoldTiger69
April 27, 2016, 11:21:38 PM
Then just get rid of that code. You don't need that part because, like I already said, EVP will detect AES-NI capability and use it automatically. You don't need extra code for that, just get rid of it. I already did and it works perfectly.

joblo (OP)
April 28, 2016, 01:28:11 AM
If you've been digging into the code you'll realize I have implemented both existing hodl miners: the original hodlminer and the AES-optimized hodlminer-wolf. The original hodlminer uses EVP, so why is it slower than Wolf? OpenSSL/EVP is likely a general-purpose tool, while the Wolf miner is optimized for the specific task. If you've followed previous discussions about design changes you'll know I'm a tough sell. Without going into detail, there are only three things that would motivate me to make design changes to a stable product: support for more algos, higher hashrates, or Windows support.

GoldTiger69
April 28, 2016, 01:51:13 AM
OK, the optimized Wolf miner runs slower than EVP when the machine is AES-NI capable. Again, I'm just giving you a suggestion; you are free to do whatever you want, of course. Whenever you have the time, just give it a try on a machine with AES-NI running just EVP and you will see what I'm talking about. And I'm only talking about the hodlcoin part, which is the only part I've checked out.
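For a like-for-like comparison on one machine, a tiny timing harness helps. This is a sketch assuming a POSIX system with clock_gettime; xor_fold is only a placeholder workload, and you would swap in wrappers around the EVP-based and Wolf hodl code paths (hypothetical, not shown). The miner's own --benchmark mode, used earlier in this thread, remains the more realistic test.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical workload signature: process len bytes of buf once. */
typedef void (*workload_fn)(uint8_t *buf, size_t len);

/* Runs fn iters times on the same buffer and returns elapsed seconds,
   using the monotonic clock so wall-clock adjustments don't skew it. */
static double time_workload(workload_fn fn, uint8_t *buf, size_t len,
                            unsigned iters)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < iters; i++)
        fn(buf, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Placeholder workload so the harness compiles on its own:
   a running XOR across the buffer (buf[0] is left unchanged). */
static void xor_fold(uint8_t *buf, size_t len)
{
    for (size_t i = 1; i < len; i++)
        buf[i] ^= buf[i - 1];
}
```

Run each candidate several times on the same buffer size and compare the best result rather than a single run, since the first iterations also pay for cache warm-up.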

joblo (OP)
April 28, 2016, 02:30:08 AM
Did you mistype your first sentence? If EVP were faster than Wolf I would have expected a more emphatic response. If it is faster, how do I implement it?

GoldTiger69
April 28, 2016, 02:43:42 AM
It is faster, and you already did. EVP is already implemented in the file hodl.cpp. It will select AES-NI automatically if the machine has it. All you need to do is get rid of the Wolf code and let your own code run by itself; it will select AES-NI automatically, and you don't have to change anything. Just try it and you will see. Note: get rid of all the code you are using for the AES-NI section in the hodlcoin source; you don't need it.