Independently of this issue, I noticed a moderate decrease in lyra2re hashrate (not sure about other algos) on AMD CPUs, using Linux and build.sh to compile the miner natively (3.4.1 vs 3.4.3/3.4.4):
FX 8320E went from 617 kH/s to 550 kH/s
A10-6800K went from 380 kH/s to 359 kH/s
Current Intel CPUs, however, gained the noted slight lyra2re improvement of some 10-20 kH/s.
Any idea why that is?
I'm willing to test on my setups if needed, and can set up SSH access if that helps.
Cheers
Edit: this buffer overflow was the first of its kind. The software setup is identical on all my systems, so it could only be hardware (an old HDD). I will just wait and see if it happens again.
When I was doing the final tweaking of lyra I noticed that in some cases the AVX code required the same number of instructions as the scalar code, or that the AVX version appeared no faster than the scalar version. In fact there was one function I did not modify for AVX because it appeared to have no benefit. This is specific to AVX; AVX2 was always faster. If your CPUs have only AVX it is possible the AMD implementation of it is less efficient than Intel's. The reason for all this is the overhead in converting the data from scalar format to vector format and back again, as AVX has its own set of registers. With only a 2 to 1 gain with AVX instructions on lyra2, the AVX segment has to be big enough to overcome the overhead. Short functions don't benefit as much. If you want to see what I'm talking about, perform a diff on algo/lyra2/sponge.c. As you know the situation with AMD and AVX is confusing and I don't think I could make it work perfectly even if I fully understood it.
|
|
|
It's also the first time I have seen this. I'm using Ubuntu LXC containers on Debian (Proxmox) everywhere and they are rock solid, no clue what is responsible for this.
So I suppose the printed mem map and stuff did not explain what the issue is?
Cheers
It appears to have something to do with crypto but I have no idea what cpuminer code was running. I'm also unfamiliar with how buffer overflow detection works on Linux. I didn't even know it existed and suspect it involves special tools. Since you have, presumably similar, systems that do work, the key is to find out what is different between them. Anything from the host OS, the VM config, the guest OS, compile, miner version, algo, anything that is different. You could also try changing some variables, different algos, different cpuminer versions etc. to try to change the symptoms. Deciphering backtraces is difficult, but it should be fairly easy to identify whether they are all identical. If you can cause the symptoms to change it can lead you to what is causing it.
|
|
|
@joblo I got a strange buffer overflow, you might know if this is miner related. The system is an Ubuntu Server 16.04 LTS LXC container on Proxmox (kernel 4.4.13-1-pve), able to use 2 GB RAM. The miner got terminated and my log (stdout/stderr from cpuminer) displayed the following: https://paste.felixbrucker.com/paste/avy2w
I've never seen anything like this before. If it happens with all algos and only on proxmox I'd assume it's proxmox related.
|
|
|
Joblo --
Some further testing / updates for you: Looks like there is an issue with compiling for AMD non-AES_NI capable processors and some older Intel processors --- but it seems to exist in the pristine 3.4.3 chain as well under GCC 6.1.0 so it does not appear to be due to the diffs I've made.
I don't have all the platforms to test binaries, but I have at least been able to successfully compile for all Intel architectures back as far as core2 --- the compile errors pop back up when I try to build with -march=nocona or earlier. AMD builds work for anything newer than barcelona/amdfam10.
Thanks, that helps. I'm still a little concerned about being unable to both compile and test on the native HW. I'm pretty confident your changes will not negatively impact other Intel architectures while helping Westmere, but I'm not so sure about AMD. AMD and Intel diverged between SSE4 and AVX. AMD was developing their own SSE5, which was not fully compatible with Intel's AVX. They eventually converged, but there may have been a period where AMD support was not aligned with Intel. This could mean the AVX check does not work properly on some early AES AMD CPUs. This is somewhat speculative but plausible. What it comes down to is whether I play it safe at the expense of Westmere performance, or improve Westmere for a known and contributing user at the risk of breaking some unknown AMD users. I'm leaning toward the latter.
I have some systems lying around with AMD CPUs. I'll see what I've got that is running and run some tests if I can.
That would be nice. I'm a little confused about your compile problem related to AES256CBC. The min/max issue is resolved. In looking at the code more closely (it took a while to remember what I was thinking when I made those changes) I realized the AVX checks were intended to separate the original Wolf AES optimizations from the recent Optiminer AVX enhancements. I assumed all the Optiminer code required AVX, so if it was not available the compiler would revert to the original Wolf code, which was AES enhanced. The way it is coded only one instance of AES256CBC should be compiled, either the new Optiminer version or the Wolf version. I really would like to see your compile errors to understand this better.
I need to understand the compile error. The code from 3.4.3 should compile the Wolf code on your CPU. The AVX checks in hodl-wolf make the assumption that if AVX is present AES is also present. They are present to separate the original Wolf code from the Optiminer code. The AES checks are only to prevent compile errors on non-AES CPUs. None of the Wolf code is actually run on a non-AES CPU. Perhaps I should block it all out if AES isn't available. The intended result is:
AES+AVX: run Optiminer modded code in hodl-wolf.c and aes.c.
AES only: run all Wolf code in hodl-wolf.c and aes.c.
No AES: run the unoptimized C++ code.
That was based on assumptions. You now have some actual data from a CPU with AES but not AVX. Your data shows that only the Optiminer code in GenerateGarbageCore contains AVX code. The remainder of the Optiminer code will run on your AES-only Westmere. This raises another question: is the Optiminer AES code in aes.c and scanhash_hodl_wolf faster than the corresponding pure Wolf code? Since you weren't able to compile the code as released, it points back to understanding why it didn't compile. Once it does you can test both and I can implement whichever is faster. I know I'm pushy and I know it's a lot of work, but it's rare to find a Westmere owner willing and able to do some dirty work. I really appreciate your help.
|
|
|
Thx a lot, so I have to wait for the new release?
Yes. I thought that was clear from the recent discussions in this thread.
|
|
|
I have an update on supporting cryptonight at nicehash. I implemented the changes and they seem to work, and they don't break other pools, so there was no need to implement pool-specific code. My test results on Nicehash are erratic, possibly a pool issue. I was initially submitting 20-25% rejects but that seems to have stopped. The latest session is up to 36 accepts @ 100%, and counting. I also experienced periods of extremely frequent thread hashrate output from one or 2 threads, around 100 per second, showing a hash count of 1 with a normal hashrate. This occurred twice at startup and I killed it. It also happened mid session and cleared itself. This is not associated with the rejects. I still submit valid shares but they show a lower than normal hashrate. This is what it looks like:
[2016-08-25 12:23:28] CPU #0: 1 H, 72.57 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 56.63 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 55.92 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 64.27 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 67.63 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 54.73 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 55.19 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 71.66 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 69.21 H/s
More testing to do.
thanks for this!
I think I found the bug causing the messy output. The bug has existed for a long time but didn't seem to have an effect before. It also wasn't specific to cryptonight or the Nicehash mod. The fix requires a small design change affecting all algos, so extensive testing will be required. If it goes smoothly I should release it in a day or so.
Edit: The output flood is fixed but I'm still concerned about stale shares. These rejects are intermittent. Last night was not good, with reject rates over 20% at times. Today is better at less than 5%. Sometimes it changes from session to session. A session could be running clean, but if I stop and restart it I may start producing rejects. These rejects are only produced when mining cryptonight at Nicehash. Moneropool is always clean. I'll poke around some more, but if I don't find anything and the reject rate is manageable I'll release it as is.
Edit2: I noticed something interesting while testing. I was mining on three CPUs and had been running clean. Then they all reported a cluster of 3 or 4 rejects at the same time. This is too much of a coincidence, so the stale share rejects appear to be a pool issue at Nicehash. I consider the issue closed and cryptonight support for Nicehash is ready for release. There is one more pending issue involving Westmere CPUs. If it isn't resolved quickly I'll release cryptonight anyway.
|
|
|
joblo ....
Redid my set of changes on a clean copy of your 3.4.3 codebase. With these changes it compiles on my Westmere CPU with -march=westmere. Here are the diffs:
[snipped]
Thanks. I'm getting flashbacks to an AMD problem. It might be that some of that code won't compile on some AMD CPUs, which would explain the presence of the AVX hooks in aes.c. I recently read that AMD was working on SSE5 when Intel was developing AVX. This may have created a mess with different implementations. Eventually AMD's SSE5 and Intel's AVX were merged. This might also be related to the compile error I encountered trying to build for amdfam10; it was AVX related. I'm going to have to dig deeper to understand all the ramifications. It could take a while. You seem to have a workaround and I know of no other Westmere users (well, not any that complained), so I won't rush it. For the time being I'll tighten up the check so it compiles on Westmere out of the box, but without AES performance. The min/max issue will be fixed in the next release. I hope you'll be available to test my fixes. It must be tested on appropriate HW. AMD testers would also help.
|
|
|
Hi, could you tell me whether it can be built for Windows x86 32-bit? Sorry for my English....
32 bit is not supported.
|
|
|
Yes --- corei7 is definitely running without AES_NI.
For HODL, you are excluding a whole bunch of AES_NI code that doesn't require AVX to execute. The only part of Wolf's implementation that requires AVX is the SHA512 function in the initial scratchpad generation routine. If you take out the AVX checks and "non-AVX" code from the rest of the implementation in algo/hodl/aes.c and algo/hodl/hodl-wolf.c it compiles for westmere and runs just fine with AES-NI enabled. Running 24 threads with no affinity on my server I'm seeing about 215H/s without AES and close to 375H/s average with the modified version to allow the AES_NI code to run.
I would need to see your code changes and I also want to see the compile errors. I need to make sure your changes don't break other CPUs.
|
|
|
Joblo ---
I flattened the code in algo/hodl/aes.c and algo/hodl/hodl-wolf.c to remove the "non-AVX" code versions for everything but the SHA512 function at the top of hodl-wolf.c, and the code now compiles and runs for -march=westmere.
For cpuminer-corei7.exe from your download, mining HODL to nicehash with 12 threads isolated to the six cores on one CPU, I am getting performance in the 120-130 H/s range.
For cpuminer-westmere.exe, which I compiled with the above modifications and ran in the same configuration on the other CPU in my server, I am seeing 240-250 H/s and it indicates AES optimizations ARE enabled.
This confirms that the corei7 build has AES disabled.
|
|
|
Joblo ...
OK, had a chance to play around a bit with the GCC 6.1.0 compiling and I think I found a pretty simple fix to this problem at least. The min/max macros which are causing collisions in the HODL C++ code are only referenced locally in the decred.c file, but are defined manually in miner.h. So ....
1) Comment out or remove the macro definitions for min and max in miner.h
2) Add a local definition of the min macro to decred.c
After that I was able to get it to compile on one of my Haswell systems. Still having trouble compiling on an older westmere based system due to some AES256CBC complaints.
Bob
Good work. I'll make the change proactively. The AES256CBC problem may be related to AVX code in hodl-wolf. IIRC either Nehalem or Westmere have AES but not AVX. I may have to tighten up the checking to force it to use the unoptimized version. Did you do a native compile? Have you tried corei7? "gcc -Q -march=native --help=target" will tell you which arch is the default for native.
Westmere supports AES-NI but not AVX. Nehalem doesn't support either. I've successfully compiled for all the AVX platforms on my laptop (Haswell Core i5) but can't compile with march=westmere or with native on my dev virtual machine, which is running on some older servers (dual Westmere hex-core) that I wanted to test on. Setting march=haswell on the older VM works fine and compiles Haswell-optimized code (which obviously can't run locally). Appears to be some sort of a conflict in the capabilities check on the HODL AES code.
You're right, I only check for AES_NI, not AVX. This may affect some other algos that also have AVX code mixed in with AES. If I can identify which ones are pure AES I can make a distinction, otherwise I'll have to use non-AES code unless the CPU also supports AVX. I don't have the necessary HW to test, but if you don't mind doing a little more work it would help a lot. There are three groups of AES code: code used only by hodl, code used only by cryptonight, and code shared among many algos including x11. Those three should cover the entire spectrum of AES optimized code. Those that work on your Westmere can have AES enabled without AVX. Those like Hodl will require a CPU with AVX before AES can be enabled.
It just occurred to me that you probably did a native compile. Do you know what arch the compiler mapped that to? A Windows user reported success using the corei7 build on a Nehalem CPU. If yours is different you could try -march=corei7. This will raise my confidence in the fix since I can't test it on the right HW.
Might have gotten lost in the thread above, but I was able to compile with AVX level features on the Westmere based system; they obviously just don't get detected or work. The errors only seem to occur when I set the march to westmere or lower. I've been looking through the code for hodl to try to figure out what seems to be causing the problem. It primarily seems to be from different versions of the AES256CBC algorithm that it is attempting to compile in simultaneously. In going through, though, I've come across a question for you with regards to your capabilities tests: you seem to be excluding a lot of the AES_NI optimized code by wrapping it in the AVX segment even though there don't seem to be any AVX instructions in those code segments. I haven't had a chance to look through it thoroughly, but on a quick scan the only part of Wolf's code that utilizes AVX instructions is the SHA512 function used to generate the scratchpad. The rest of the code should be able to be under #ifndef NO_AES_NI.
Bob
Can you post the actual errors? I've been speculating it was an AVX issue. If you're seeing multiple definitions I may be going in the wrong direction.
|
|
|
I know what could be a great CPU coin: a PoW CPU coin built on the steemit concept, but pure PoW and fair. Miners must use a private key for mining so pools are not possible, making it minable solo only. Use a harder algo to kill any GPU mining attempt. In the near future I could see good prospects for that kind of CPU coin.
This depends on how important mining is to the success of the coin. If you view it as an undesirable necessity, it's fine to discourage miners by making them jump through hoops. But if you view mining as a way to market the coin, it would be counterproductive. From my own perspective, my initial interest in crypto was via mining. I didn't care then, or care now, about coins that can't be mined easily.
|
|
|
For comparison my i7-6700K @ 4 GHz does 290 h/s XMR.
The issue with "CPU algos" and GPUs is a combination of architectural limitations limiting the potential gain combined with increased development effort to try to overcome these limitations. Lots of work for little gain - low motivation.
Lyra2 is an interesting case. Lyra2RE performs much better on a CPU than Lyra2REv2 compared with nvidia GPUs. Using the latest optimisations for both CPU and GPU miners, and hash rates normalised, Lyra2RE mines at 4x Lyra2REv2 on a CPU compared with a GPU at 1x.
A 750ti mines cryptonight @ 300H/s @ around 50 watt with my private miner. What speed are you getting in lyra2v2 and lyra2RE with your cpu miner?
300 sounds good but private miners don't interest me. I get 290 but I haven't measured the power. It runs a lot cooler than most other algos so it's probably fairly light on power.
On my i7-6700K @ 4 GHz:
lyra2re: 1390 kH/s
lyra2rev2: 1350 kH/s
|
|
|
@joblo ---
Mining LYRA2RE to NiceHash:
CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
CPU features: SSE2 AES
SW built on Aug 24 2016 with GCC 4.8.3
SW features: SSE2
Algo features: SSE2 AES AVX AVX2
AES not available, starting mining with SSE2 optimizations...
Mining HODL to NiceHash:
CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
CPU features: SSE2 AES
SW built on Aug 24 2016 with GCC 4.8.3
SW features: SSE2
Algo features: SSE2 AES AVX AVX2
AES not available, starting mining with SSE2 optimizations...
Excellent, thanks.
|
|
|
corei7 nehalem, westmere
I did a test on Nehalem. cpuminer-corei7.exe works well for m7m and cryptonight. Joblo, if you need other tests on my CPU, just PM me if I don't see your request in the thread within 24h.
Thanks for testing. Could you post the capabilities check output when the miner starts? I'd like to confirm it is correct.
|
|
|
@joblo --- I'd be happy to test the AES w/o AVX configurations --- do you have a patch file or a link to the downloaded files with the patches?
Thanks for the offer. I assume you have a Westmere CPU, and you use Windows. You can test with the current cpuminer-corei7 binary. As another user discovered, the Hodl algo failed to compile on his Westmere, so I don't suggest you attempt hodl. I also have a fix coming shortly for that algo. Cryptonight also has AES code that may or may not work on a Westmere. There is another block of AES code shared among many algos including x11. Both of these would be good tests. Don't test cryptonight on Nicehash yet, I'm still working on that. If you do test either or both of them, please include in your report whether the miner chose to use the AES code:
Start mining with SSE2 AES AVX
It will display both AES and AVX or neither. At this time I'm treating them as a single architecture level. If your tests prove there is AES code that does not also contain AVX, I can split them up and improve performance on Westmere on some algos. Otherwise Westmere will have to dumb down and use the SSE2 code only, as had to be done with Hodl.
|
|
|
|