joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
August 25, 2016, 08:11:56 PM |
|
Hi, could you tell, whether assembled under windows x86 32-bit? Sorry for my English....
32 bit is not supported.
|
|
|
|
NDBob
Newbie
Offline
Activity: 14
Merit: 0
|
|
August 25, 2016, 09:02:15 PM |
|
joblo ....
Redid my set of changes on a clean copy of your 3.4.3 codebase. With these changes it compiles on my westmere CPU with -march=westmere. Here are the diffs:

$ diff miner.h miner.h.orig
49a50,56
> #ifndef min
> #define min(a,b) (a>b ? b : a)
> #endif
> #ifndef max
> #define max(a,b) (a<b ? b : a)
> #endif
>

$ diff algo/blake/decred.c algo/blake/decred.c.orig
9,10d8
< #define min(a,b) (a>b ? b : a)
<

$ diff algo/hodl/aes.c algo/hodl/aes.c.orig
85a86,87
> #ifdef __AVX__
>
149a152,178
>
> #else // NO AVX
>
> static inline __m128i AES256Core(__m128i State, const __m128i *ExpandedKey)
> {
>     State = _mm_xor_si128(State, ExpandedKey[0]);
>
>     for(int i = 1; i < 14; ++i) State = _mm_aesenc_si128(State, ExpandedKey[i]);
>
>     return(_mm_aesenclast_si128(State, ExpandedKey[14]));
> }
>
> void AES256CBC(__m128i *Ciphertext, const __m128i *Plaintext, const __m128i *ExpandedKey, __m128i IV, uint32_t BlockCount)
> {
>     __m128i State = _mm_xor_si128(Plaintext[0], IV);
>     State = AES256Core(State, ExpandedKey);
>     Ciphertext[0] = State;
>
>     for(int i = 1; i < BlockCount; ++i)
>     {
>         State = _mm_xor_si128(Plaintext[i], Ciphertext[i - 1]);
>         State = AES256Core(State, ExpandedKey);
>         Ciphertext[i] = State;
>     }
> }
>
> #endif

$ diff algo/hodl/hodl-wolf.c algo/hodl/hodl-wolf.c.orig
58a59
> #ifdef __AVX__
129a131,196
>
> #else // no AVX
>
>     uint32_t *pdata = work->data;
>     uint32_t *ptarget = work->target;
>     uint32_t BlockHdr[22], FinalPoW[8];
>     CacheEntry *Garbage = (CacheEntry*)hodl_scratchbuf;
>     CacheEntry Cache;
>     uint32_t CollisionCount = 0;
>
>     swab32_array( BlockHdr, pdata, 20 );
>     // Search for pattern in psuedorandom data
>     int searchNumber = COMPARE_SIZE / opt_n_threads;
>     int startLoc = threadNumber * searchNumber;
>
>     for(int32_t k = startLoc; k < startLoc + searchNumber && !work_restart[threadNumber].restart; k++)
>     {
>         // copy data to first l2 cache
>         memcpy(Cache.dwords, Garbage + k, GARBAGE_SLICE_SIZE);
> #ifndef NO_AES_NI
>         for(int j = 0; j < AES_ITERATIONS; j++)
>         {
>             CacheEntry TmpXOR;
>             __m128i ExpKey[16];
>
>             // use last 4 bytes of first cache as next location
>             uint32_t nextLocation = Cache.dwords[(GARBAGE_SLICE_SIZE >> 2)
>                 - 1] & (COMPARE_SIZE - 1); //% COMPARE_SIZE;
>
>             // Copy data from indicated location to second l2 cache -
>             memcpy(&TmpXOR, Garbage + nextLocation, GARBAGE_SLICE_SIZE);
>             //XOR location data into second cache
>             for( int i = 0; i < (GARBAGE_SLICE_SIZE >> 4); ++i )
>                 TmpXOR.dqwords[i] = _mm_xor_si128( Cache.dqwords[i],
>                     TmpXOR.dqwords[i] );
>             // Key is last 32b of TmpXOR
>             // IV is last 16b of TmpXOR
>
>             ExpandAESKey256( ExpKey, TmpXOR.dqwords +
>                 (GARBAGE_SLICE_SIZE / sizeof(__m128i)) - 2 );
>             AES256CBC( Cache.dqwords, TmpXOR.dqwords, ExpKey,
>                 TmpXOR.dqwords[ (GARBAGE_SLICE_SIZE / sizeof(__m128i))
>                 - 1 ], 256 ); }
> #endif
>         // use last X bits as solution
>         if( ( Cache.dwords[ (GARBAGE_SLICE_SIZE >> 2) - 1 ]
>             & (COMPARE_SIZE - 1) ) < 1000 )
>         {
>             BlockHdr[20] = k;
>             BlockHdr[21] = Cache.dwords[ (GARBAGE_SLICE_SIZE >> 2) - 2 ];
>             sha256d( (uint8_t *)FinalPoW, (uint8_t *)BlockHdr, 88 );
>             CollisionCount++;
>             if( FinalPoW[7] <= ptarget[7] )
>             {
>                 pdata[20] = swab32( BlockHdr[20] );
>                 pdata[21] = swab32( BlockHdr[21] );
>                 *hashes_done = CollisionCount;
>                 return(1);
>             }
>         }
>     }
>
>     *hashes_done = CollisionCount;
>     return(0);
>
> #endif
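A side note on the min/max macros in the miner.h diff: as written they do not parenthesize their arguments, so an argument containing a low-precedence operator such as & or | can be parsed differently than intended. The following is only a sketch of a more defensive variant, not what was released; clamp_u32 is a hypothetical helper added purely to exercise the macros.

```c
#include <stdint.h>

/* Guarded definitions, as in the miner.h diff, but with every argument
 * parenthesized so that expressions like min(x & 6, 3) expand safely. */
#ifndef min
#define min(a, b) (((a) > (b)) ? (b) : (a))
#endif
#ifndef max
#define max(a, b) (((a) < (b)) ? (b) : (a))
#endif

/* Hypothetical helper (not part of cpuminer-opt): clamp v into [lo, hi]. */
static uint32_t clamp_u32(uint32_t v, uint32_t lo, uint32_t hi)
{
    return min(max(v, lo), hi);
}
```

Like any function-like macro, these still evaluate an argument twice, so arguments with side effects (for example min(i++, n)) should be avoided.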
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
August 25, 2016, 09:52:38 PM |
|
joblo ....
Redid my set of changes on a clean copy of your 3.4.3 codebase. With these changes it compiles on my westmere CPU with -march=westmere Here are the diffs:
[snipped]
Thanks. I'm getting flashbacks to an AMD problem. It might be that some of that code won't compile on some AMD CPUs, which would explain the presence of the AVX hooks in aes.c. I recently read that AMD was working on SSE5 when Intel was developing AVX. This may have created a mess with different implementations. Eventually AMD's SSE5 and Intel's AVX were merged. This might also be related to the compile error I encountered trying to build for amdfam10; it was AVX related.
I'm going to have to dig deeper to understand all the ramifications. It could take a while. You seem to have a workaround and I know of no other Westmere users, well, not any that complained, so I won't rush it. For the time being I'll tighten up the check so it compiles on Westmere out of the box, but without AES performance. The min/max issue will be fixed in the next release.
I hope you'll be available to test my fixes. It must be tested on appropriate HW. AMD testers would also help.
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
August 26, 2016, 02:28:58 AM Last edit: August 26, 2016, 06:49:49 PM by joblo |
|
I have an update on supporting cryptonight at nicehash. I implemented the changes and they seem to work, and they don't break other pools, so there was no need to implement pool-specific code. My test results on Nicehash are erratic, possibly a pool issue. I was initially submitting 20-25% rejects but that seems to have stopped. The latest session is up to 36 accepts @ 100%, and counting.
I also experienced periods of extremely frequent thread hashrate output from one or 2 threads, around 100 per second, showing a hash count of 1 with a normal hashrate. This occurred twice at startup and I killed it. It also happened mid session and cleared itself. This is not associated with the rejects, I still submit valid shares but they show a lower than normal hashrate. This is what it looks like:
[2016-08-25 12:23:28] CPU #0: 1 H, 72.57 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 56.63 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 55.92 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 64.27 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 67.63 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 54.73 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 55.19 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 71.66 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 69.21 H/s
More testing to do.
thanks for this!
I think I found the bug causing the messy output. The bug has existed for a long time but didn't seem to have an effect before. It also wasn't specific to cryptonight or the Nicehash mod. The fix requires a small design change affecting all algos so extensive testing will be required. If it goes smoothly I should release it in a day or so.
Edit: The output flood is fixed but I'm still concerned about stale shares. These rejects are intermittent. Last night was not good, with reject rates over 20% at times. Today is better at less than 5%. Sometimes it changes from session to session. A session could be running clean, but if I stop and restart it I may start producing rejects. These rejects are only produced when mining cryptonight at Nicehash. Moneropool is always clean. I'll poke around some more, but if I don't find anything and the reject rate is manageable I'll release it as is.
Edit2: I noticed something interesting while testing. I was mining on three CPUs and had been running clean. Then they all reported a cluster of 3 or 4 rejects at the same time. This is too much of a coincidence, so the stale share rejects appear to be a pool issue at Nicehash. I consider the issue closed and cryptonight support for Nicehash is ready for release. There is one more pending issue involving Westmere CPUs. If it isn't resolved quickly I'll release cryptonight anyway.
|
|
|
|
hardkod
Newbie
Offline
Activity: 8
Merit: 0
|
|
August 26, 2016, 09:48:11 AM |
|
Hello Joblo. Sry for poor English.
1. Is current 3.4.3 version (1st post) for windows support nicehash CryptoNight? I have lot of "stratum_recv_line failed..."
2. Can you make small instruction howto compile\install\run your miner in Ubuntu please? or link to it.
Big thx for you work.
|
|
|
|
ryen123
|
|
August 26, 2016, 10:06:11 AM |
|
Hello Joblo. Sry for poor English.
1. Is current 3.4.3 version (1st post) for windows support nicehash CryptoNight? I have lot of "stratum_recv_line failed..."
2. Can you make small instruction howto compile\install\run your miner in Ubuntu please? or link to it.
Big thx for you work.
@hardkod v3.4.3 does not yet support cryptonight mining at nicehash. There are instructions inside README.md in the source code for building on linux.
From README.md:
Building on linux prerequisites:
It is assumed users know how to install packages on their system and be able to compile standard source packages. This is basic Linux and beyond the scope of cpuminer-opt.
Make sure you have the basic development packages installed. Here is a good start:
http://askubuntu.com/questions/457526/how-to-install-cpuminer-in-ubuntu
Install any additional dependencies needed by cpuminer-opt. The list below covers some of the ones that may not be in the default install and need to be installed manually. There may be others; read the error messages, they will give a clue as to the missing package.
The following command should install everything you need on Debian-based distributions:
sudo apt-get install build-essential libssl-dev libcurl4-openssl-dev libjansson-dev libgmp-dev automake
Building on Linux, see below for Windows.
Dependencies
build-essential (for Ubuntu, Development Tools package group on Fedora)
automake
libjansson-dev
libgmp-dev
libcurl4-openssl-dev
libssl-dev
pthreads
zlib
tar xvzf [file.tar.gz]
cd [file]
Run build.sh to build on Linux or execute the following commands.
./autogen.sh
CFLAGS="-O3 -march=native -Wall" CXXFLAGS="$CFLAGS -std=gnu++11" ./configure --with-curl
make
Start mining.
./cpuminer -a algo ...
|
|
|
|
hardkod
Newbie
Offline
Activity: 8
Merit: 0
|
|
August 26, 2016, 10:59:37 AM |
|
Thx a lot, so i have to wait new release?
|
|
|
|
NDBob
Newbie
Offline
Activity: 14
Merit: 0
|
|
August 26, 2016, 03:02:41 PM |
|
Joblo --
Some further testing / updates for you: Looks like there is an issue with compiling for AMD non-AES_NI capable processors and some older Intel processors --- but it seems to exist in the pristine 3.4.3 chain as well under GCC 6.1.0 so it does not appear to be due to the diffs I've made.
I don't have all the platforms to test binaries, but I have at least been able to successfully compile for all Intel architectures back as far as core2 --- the compile errors pop back up when I try to build with -march=nocona or earlier. AMD builds work for anything newer than barcelona/amdfam10.
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
August 26, 2016, 03:29:20 PM |
|
Joblo --
Some further testing / updates for you: Looks like there is an issue with compiling for AMD non-AES_NI capable processors and some older Intel processors --- but it seems to exist in the pristine 3.4.3 chain as well under GCC 6.1.0 so it does not appear to be due to the diffs I've made.
I don't have all the platforms to test binaries, but I have at least been able to successfully compile for all Intel architectures back as far as core2 --- the compile errors pop back up when I try to build with -march=nocona or earlier. AMD builds work for anything newer than barcelona/amdfam10.
Thanks, that helps. I'm still a little concerned about being unable to both compile and test on the native HW. I'm pretty confident your changes will not negatively impact other Intel architectures while helping Westmere, but I'm not so sure about AMD. AMD and Intel diverged between SSE4 and AVX. AMD was developing their own SSE5, which was not fully compatible with Intel's AVX. They eventually converged but there may have been a period where AMD support was not aligned with Intel. This could mean the AVX check does not work properly on some early AES AMD CPUs. This is somewhat speculative but plausible.
What it comes down to is whether I play it safe at the expense of Westmere performance or improve Westmere for a known and contributing user at the risk of breaking some unknown AMD users. I'm leaning toward the latter.
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
August 26, 2016, 03:30:52 PM |
|
Thx a lot, so i have to wait new release?
Yes. I thought that was clear from the recent discussions in this thread.
|
|
|
|
NDBob
Newbie
Offline
Activity: 14
Merit: 0
|
|
August 26, 2016, 03:52:29 PM |
|
Joblo --
Some further testing / updates for you: Looks like there is an issue with compiling for AMD non-AES_NI capable processors and some older Intel processors --- but it seems to exist in the pristine 3.4.3 chain as well under GCC 6.1.0 so it does not appear to be due to the diffs I've made.
I don't have all the platforms to test binaries, but I have at least been able to successfully compile for all Intel architectures back as far as core2 --- the compile errors pop back up when I try to build with -march=nocona or earlier. AMD builds work for anything newer than barcelona/amdfam10.
Thanks, that helps. I'm still a little concerned about being unable to both compile and test on the native HW. I'm pretty confident your changes will not negatively impact other Intel architectures while helping Westmere, but I'm not so sure about AMD. AMD and Intel diverged between SSE4 and AVX. AMD was developing their own SSE5, which was not fully compatible with Intel's AVX. They eventually converged but there may have been a period where AMD support was not aligned with Intel. This could mean the AVX check does not work properly on some early AES AMD CPUs. This is somewhat speculative but plausible. What it comes down to is whether I play it safe at the expense of Westmere performance or improve Westmere for a known and contributing user at the risk of breaking some unknown AMD users. I'm leaning toward the latter.
I have some systems lying around with AMD CPUs. I'll see what I've got that is running and run some tests if I can.
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
August 26, 2016, 06:43:15 PM |
|
Joblo --
Some further testing / updates for you: Looks like there is an issue with compiling for AMD non-AES_NI capable processors and some older Intel processors --- but it seems to exist in the pristine 3.4.3 chain as well under GCC 6.1.0 so it does not appear to be due to the diffs I've made.
I don't have all the platforms to test binaries, but I have at least been able to successfully compile for all Intel architectures back as far as core2 --- the compile errors pop back up when I try to build with -march=nocona or earlier. AMD builds work for anything newer than barcelona/amdfam10.
Thanks, that helps. I'm still a little concerned about being unable to both compile and test on the native HW. I'm pretty confident your changes will not negatively impact other Intel architectures while helping Westmere, but I'm not so sure about AMD. AMD and Intel diverged between SSE4 and AVX. AMD was developing their own SSE5, which was not fully compatible with Intel's AVX. They eventually converged but there may have been a period where AMD support was not aligned with Intel. This could mean the AVX check does not work properly on some early AES AMD CPUs. This is somewhat speculative but plausible. What it comes down to is whether I play it safe at the expense of Westmere performance or improve Westmere for a known and contributing user at the risk of breaking some unknown AMD users. I'm leaning toward the latter.
I have some systems lying around with AMD CPUs. I'll see what I've got that is running and run some tests if I can.
That would be nice. I'm a little confused about your compile problem related to AES256CBC. The min/max issue is resolved.
Looking at the code more closely (it took a while to remember what I was thinking when I made those changes), I realized the AVX checks were intended to separate the original Wolf AES optimizations from the recent Optiminer AVX enhancements. I assumed all the Optiminer code required AVX, so if it was not available the compiler would revert to the original Wolf code, which was AES enhanced. The way it is coded only one instance of AES256CBC should be compiled, either the new Optiminer version or the Wolf version. I really would like to see your compile errors to understand this better. I need to understand the compile error. The code from 3.4.3 should compile the Wolf code on your CPU.
The AVX checks in hodl-wolf make the assumption that if AVX is present AES is also present. They are present to separate the original Wolf code from the Optiminer code. The AES checks are only to prevent compile errors on non-AES CPUs. None of the Wolf code is actually run on a non-AES CPU. Perhaps I should block it all out if AES isn't available.
The intended result is:
AES+AVX: run Optiminer modded code in hodl-wolf.c and aes.c.
AES only: run all Wolf code in hodl-wolf.c and aes.c.
no AES: run the unoptimized c++ code.
That was based on assumptions. You now have some actual data from a CPU with AES but not AVX. Your data shows that only the Optiminer code in GenerateGarbageCore contains AVX code. The remainder of the Optiminer code will run on your AES-only Westmere.
This raises another question. Is the Optiminer AES code in aes.c and scanhash_hodl_wolf faster than the corresponding pure Wolf code? Since you weren't able to compile the code as released, it points back to understanding why it didn't compile. Once it does you can test both and I can implement whichever is faster.
I know I'm pushy and I know it's a lot of work, but it's rare to find a Westmere owner willing and able to do some dirty work. I really appreciate your help.
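The three-way split described above (AES+AVX, AES only, no AES) can be sketched with the compiler's predefined feature macros. This is only an illustration under assumptions: it tests __AES__ and __AVX__, which GCC and Clang define when the corresponding instruction sets are enabled (for example via -march=westmere or -march=native), whereas the posted diff uses __AVX__ together with the project's own NO_AES_NI macro. The function name hodl_code_path and the returned labels are hypothetical.

```c
/* Hypothetical helper: report which hodl code path the preprocessor
 * would select for the current compiler flags. */
static const char *hodl_code_path(void)
{
#if defined(__AES__) && defined(__AVX__)
    return "optiminer";   /* AES + AVX: Optiminer modded code */
#elif defined(__AES__)
    return "wolf";        /* AES only (e.g. Westmere): original Wolf code */
#else
    return "generic";     /* no AES: unoptimized fallback */
#endif
}
```

Testing AES and AVX independently, rather than inferring one from the other, is what lets an AES-only CPU such as Westmere land on an AES-enabled path.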
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
August 27, 2016, 12:37:23 AM |
|
cpuminer 3.4.4 is released.
Source: https://drive.google.com/file/d/0B0lVSGQYLJIZcWN3ZE5ma0FWRnM/view?usp=sharing
Windows: https://drive.google.com/file/d/0B0lVSGQYLJIZdG50THdjZEo5c1U/view?usp=sharing
v3.4.4 adds support for mining cryptonight algo at nicehash with AES optimizations. Some stale share rejects have been observed when mining cryptonight at Nicehash that don't occur at other pools. These rejects are believed to be a pool issue.
Also fixed is a compile error when using gcc 6.1.
An interim fix for a compile error in Hodl code on Westmere CPUs was submitted. This interim fix should allow hodl to compile; however, it will not be an optimum build. Further investigation into this issue is underway with a goal of enabling AES on Westmere CPUs.
|
|
|
|
felixbrucker
|
|
August 27, 2016, 06:30:41 AM |
|
@joblo i got a strange buffer overflow, you might know if this is miner related: system is a Ubuntu server 16.04 LTS LXC container on proxmox (kernel 4.4.13-1-pve) able to use 2GB ram miner got terminated, my log (stdout/err from cpuminer) displayed the following: https://paste.felixbrucker.com/paste/avy2w
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
August 27, 2016, 11:07:57 AM |
|
@joblo i got a strange buffer overflow, you might know if this is miner related: system is a Ubuntu server 16.04 LTS LXC container on proxmox (kernel 4.4.13-1-pve) able to use 2GB ram miner got terminated, my log (stdout/err from cpuminer) displayed the following: https://paste.felixbrucker.com/paste/avy2w
I've never seen anything like this before. If it happens with all algos and only on proxmox I'd assume it's proxmox related.
|
|
|
|
felixbrucker
|
|
August 27, 2016, 12:23:47 PM |
|
its also the first time i have seen this, im using ubuntu lxc container on debian (proxmox) everywhere and they are rock solid, no clue what is responsible for this.
so i suppose the printed mem map and stuff did not explain whats the issue?
cheers
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
August 27, 2016, 01:59:51 PM |
|
its also the first time i have seen this, im using ubuntu lxc container on debian (proxmox) everywhere and they are rock solid, no clue what is responsible for this.
so i suppose the printed mem map and stuff did not explain whats the issue?
cheers
It appears to have something to do with crypto but I have no idea what cpuminer code was running. I'm also unfamiliar with how buffer overflow detection works on Linux. I didn't even know it existed and suspect it involves special tools.
Since you have, presumably similar, systems that do work, the key is to find out what is different between them. Anything from the host OS, the VM config, the guest OS, compile, miner version, algo, anything that is different. You could also try changing some variables, different algos, different cpuminer versions etc. to try to change the symptoms. Deciphering backtraces is difficult, but it should be fairly easy to tell whether they are all identical. If you can cause the symptoms to change it can lead you to what is causing it.
|
|
|
|
felixbrucker
|
|
August 27, 2016, 02:08:32 PM |
|
independently from this issue i noticed a medium decrease of lyra2re hashrate (not sure if other algos too) on amd cpu's using linux and the build.sh to compile the miner natively (3.4.1 vs 3.4.3/3.4.4)
fx 8320e went from 617kh/s to 550kh/s
a10-6800k went from 380kh/s to 359kh/s
current intel cpus however gained the noted slight lyra2re improvement of some 10-20kh/s
any idea why that is?
willing to test around with my setups if needed, can setup some ssh if needed
cheers
edit: this buffer overflow was the first of its kind, system setup software wise is identical on my systems, could only be hardware (old hdd) i will just wait and see if it happens again
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
August 27, 2016, 02:35:05 PM |
|
independently from this issue i noticed a medium decrease of lyra2re hashrate (not sure if other algos too) on amd cpu's using linux and the build.sh to compile the miner natively (3.4.1 vs 3.4.3/3.4.4)
fx 8320e went from 617kh/s to 550kh/s
a10-6800k went from 380kh/s to 359kh/s
current intel cpus however gained the noted slight lyra2re improvement of some 10-20kh/s
any idea why that is?
willing to test around with my setups if needed, can setup some ssh if needed
cheers
edit: this buffer overflow was the first of its kind, system setup software wise is identical on my systems, could only be hardware (old hdd) i will just wait and see if it happens again
When I was doing the final tweaking of lyra I noticed that in some cases the AVX code required the same number of instructions as the scalar code or that the AVX version appeared no faster than the scalar version. In fact there was one function I did not modify for AVX because it appeared to have no benefit. This is specific to AVX, AVX2 was always faster. If your CPUs have only AVX it is possible the AMD implementation of it is less efficient than Intel's.
The reason for all this is the overhead in converting the data from scalar format to vector format and back again, as AVX has its own set of registers. With only a 2 to 1 gain with AVX instructions on lyra2, the AVX segment has to be big enough to overcome the overhead. Short functions don't benefit as much. If you want to see what I'm talking about, perform a diff on algo/lyra2/sponge.c.
As you know the situation with AMD and AVX is confusing and I don't think I could make it work perfectly even if I fully understood it.
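To make the overhead argument concrete, here is a minimal sketch (it is not the code in algo/lyra2/sponge.c) comparing a scalar XOR of four 64-bit words with a vector version. The function names xor4_scalar and xor4_vector are hypothetical. The vector path needs two loads and one store around a single XOR instruction, which is the scalar-to-vector-and-back cost described above. Because AVX without AVX2 has no 256-bit integer operations, the sketch uses the floating-point-domain _mm256_xor_pd, which is still a plain bitwise XOR. It falls back to the scalar loop when the compiler is not targeting AVX.

```c
#include <stdint.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Scalar version: four independent 64-bit XORs. */
static void xor4_scalar(uint64_t *dst, const uint64_t *a, const uint64_t *b)
{
    for (int i = 0; i < 4; ++i)
        dst[i] = a[i] ^ b[i];
}

/* Vector version: one 256-bit XOR, wrapped in loads and a store. */
static void xor4_vector(uint64_t *dst, const uint64_t *a, const uint64_t *b)
{
#ifdef __AVX__
    __m256d va = _mm256_loadu_pd((const double *)a);   /* memory -> vector register */
    __m256d vb = _mm256_loadu_pd((const double *)b);
    _mm256_storeu_pd((double *)dst, _mm256_xor_pd(va, vb)); /* and back to memory */
#else
    xor4_scalar(dst, a, b);   /* no AVX at compile time: plain scalar fallback */
#endif
}
```

With so little work between the loads and the store, the vector version has little room to win; the benefit grows only when many vector operations are chained while the data stays in registers.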
|
|
|
|
felixbrucker
|
|
August 27, 2016, 03:07:24 PM |
|
im sorry if i did not fully understand everything, im not familiar with such low level code avx is as fast as the scalar code and sometimes also requires the same amount of instructions, avx2 however is always faster the cpus only have avx afaik i took a look at the diff but im a bit lost there whats the best thing to do in my case? if the "old" code that was faster on amd but slower on intel can not be integrated into the miner i will have to identify the slower algos and use the old version for these and newer versions for the other algos i suppose? cheers
|
|
|
|
|