4ward
Member
Offline
Activity: 473
Merit: 18
|
|
May 22, 2019, 05:36:27 PM |
|
Can you add Rainforest2? https://github.com/MicroBitcoinOrg/Cpuminer
From my experience, the reference miner reports a significantly higher speed than what shows up on the pool side (the difference seems to be about x256).
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
May 22, 2019, 06:55:30 PM |
|
TPruvot has it, does it work better? I've already looked at the code. At first glance it's a completely new algo and can't benefit from any of the canned optimizations. Optimizing it requires a detailed analysis of the code to look for opportunities to vectorize, either serially, in parallel, or not at all. I expect the scalar code to be near optimum already. It's a huge task to do the whole algo at once. Not really interested at this time.
Hashrate displayed by the miner, both per thread and per share, is artificially calculated based on the number of iterations over time. The pool calculates it based on the number and difficulty of submitted valid shares. Perhaps there's a math error in the miner's calculations.
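The two ways of measuring hashrate described above can be sketched roughly as follows. This is only an illustration, not code from any miner or pool: it assumes Bitcoin-style difficulty, where a difficulty-1 share takes about 2^32 hashes on average, and the function names are hypothetical. Many algos use a different scaling factor, so the constant is an assumption to adjust.

```python
from typing import Iterable

# Assumption: Bitcoin-style scaling, ~2**32 hashes per difficulty-1 share.
HASHES_PER_DIFF1 = 2 ** 32


def miner_hashrate(hashes_done: int, elapsed_seconds: float) -> float:
    """Miner-side view: iterations counted locally, divided by time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return hashes_done / elapsed_seconds


def pool_hashrate(share_diffs: Iterable[float], elapsed_seconds: float,
                  hashes_per_diff1: int = HASHES_PER_DIFF1) -> float:
    """Pool-side view: estimated from the difficulty of accepted shares."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return sum(share_diffs) * hashes_per_diff1 / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(miner_hashrate(6000, 3.0))
    print(pool_hashrate([1.0, 1.0, 1.0, 1.0], 60.0))
```

If the miner and the pool disagree on the scaling factor, the two estimates differ by a constant ratio, which is one thing worth checking when the numbers are far apart.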
|
|
|
|
4ward
|
|
May 22, 2019, 07:02:40 PM |
|
TPruvot has it, does it work better? I've already looked at the code. At first glance it's a completely new algo and can't benefit from any of the canned optimizations. Optimizing it requires a detailed analysis of the code to look for opportunities to vectorize, either serially, in parallel, or not at all. I expect the scalar code to be near optimum already. It's a huge task to do the whole algo at once. Not really interested at this time. Hashrate displayed by the miner, both per thread and per share, is artificially calculated based on the number of iterations over time. The pool calculates it based on the number and difficulty of submitted valid shares. Perhaps there's a math error in the miner's calculations.
Tpruvot has the first version of the algo, but they released a tweaked one (RFv2). There is also a pull request with RFv2, but it has the same issue.
Anyway, I get your point about not being interested ))
|
|
|
|
joblo (OP)
|
|
May 23, 2019, 12:11:06 AM |
|
Tpruvot has the first version of the algo, but they released a tweaked one (RFv2). There is also a pull request with RFv2, but it has the same issue.
Anyway, I get your point about not being interested ))
It was a new algo and it's changed already, yet another reason why I don't like it. This seems to be a trend: vertcoin, zcoin, cryptonight, ... It appears to be an anti-ASIC strategy, with SW miners able to adapt quickly without requiring new HW. It's not that big of a deal for a single coin but daunting for a multialgo miner to keep up. That's the race I've withdrawn from.
|
|
|
|
|
joblo (OP)
|
|
May 28, 2019, 05:05:00 PM |
|
It looks like lyra2Z. Have you tried it?
|
|
|
|
joblo (OP)
|
|
May 29, 2019, 04:32:40 AM |
|
Here's a tease. It's the only visible part; much more is going on behind the scenes. I'm trying to streamline the process, reduce overhead (especially interleaving for 4-way) and try new, innovative (imo) ideas for increasing performance. For now it's still in the napkin stage but it's starting to take shape. It means increasing the parallelization beyond the size of the largest vector. I have no idea if it will increase performance or by how much. It may actually be a flop but I think the idea has merit. It's a bit of a twist on another idea pioneered by a long-time miner developer with an explosive name.
That's all for now. I have a bug fix someone is waiting for, so I'm almost ready to think about a new release, still a few days away.
[2019-05-29 00:17:17] Share 8 submitted by thread 12, lane 1.
[2019-05-29 00:17:17] Accepted 8/8 (100%), diff 0.0113, 2659.60 kH/s, 70C
[2019-05-29 00:17:29] Share 9 submitted by thread 2, lane 1.
[2019-05-29 00:17:29] Accepted 9/9 (100%), diff 0.0187, 2659.60 kH/s, 70C
[2019-05-29 00:17:35] Share 10 submitted by thread 11, lane 2.
[2019-05-29 00:17:35] Accepted 10/10 (100%), diff 0.00811, 2659.60 kH/s, 70C
[2019-05-29 00:17:52] Share 11 submitted by thread 8, lane 1.
[2019-05-29 00:17:52] Accepted 11/11 (100%), diff 0.0127, 2659.02 kH/s, 71C
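For anyone who wants to pair each submission with its response, here is a minimal sketch that parses the message format shown in the log above. It is not part of the miner. It assumes the total in "Accepted a/b" equals the share number of the matching submission, which may not hold when shares are submitted very rapidly. The timestamps only have one-second resolution, so the delay is coarse.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

SUBMIT_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] Share (?P<num>\d+) submitted by thread "
    r"(?P<thread>\d+), lane (?P<lane>\d+)\.")
ACCEPT_RE = re.compile(r"\[(?P<ts>[^\]]+)\] Accepted (?P<ok>\d+)/(?P<total>\d+) ")


def parse_ts(text: str) -> datetime:
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def match_shares(lines: Iterable[str]) -> List[dict]:
    """Pair each 'Share N submitted' line with the 'Accepted .../N' line."""
    pending: Dict[int, Tuple[datetime, int, int]] = {}
    results: List[dict] = []
    for line in lines:
        m = SUBMIT_RE.search(line)
        if m:
            pending[int(m["num"])] = (
                parse_ts(m["ts"]), int(m["thread"]), int(m["lane"]))
            continue
        m = ACCEPT_RE.search(line)
        if m and int(m["total"]) in pending:
            num = int(m["total"])  # assumption: total == share number
            sent, thread, lane = pending.pop(num)
            delay = (parse_ts(m["ts"]) - sent).total_seconds()
            results.append({"share": num, "thread": thread,
                            "lane": lane, "delay_s": delay})
    return results


SAMPLE = [
    "[2019-05-29 00:17:17] Share 8 submitted by thread 12, lane 1.",
    "[2019-05-29 00:17:17] Accepted 8/8 (100%), diff 0.0113, 2659.60 kH/s, 70C",
    "[2019-05-29 00:17:29] Share 9 submitted by thread 2, lane 1.",
    "[2019-05-29 00:17:29] Accepted 9/9 (100%), diff 0.0187, 2659.60 kH/s, 70C",
]

if __name__ == "__main__":
    for row in match_shares(SAMPLE):
        print(row)
```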
|
|
|
|
joblo (OP)
|
|
May 30, 2019, 04:27:52 PM Last edit: May 30, 2019, 06:16:42 PM by joblo |
|
Attention Ryzen users.
It is well known that Ryzen has a HW implementation of SHA and also well known that Ryzen also added AVX2 capabilities. Unfortunately Ryzen's AVX2 performance is poor.
The combination of these 2 points makes for some unusual effects on some algorithms depending on how much sha256 they use and how much AVX2 they use.
An extreme example is the sha256t algo, which is pure sha256 and also supports 8-way AVX2 and 4-way SSE.
The hw SHA implementation can't do parallel so the 8-way and 4-way code uses sw sha.
On Intel CPUs the performance is very predictable, 8-way AVX2 is fastest, 4-way SSE2 is next and 1 way is slowest.
On Ryzen it's the reverse: the single stream using HW SHA is fastest. A 16-thread Ryzen 1700 using HW SHA outperforms an 8-thread i7-6700K using 8-way AVX2 by 50%. The 4-way SSE2 code is just as fast as, and maybe a little faster than, 8-way AVX2 on Ryzen. And the AVX2 performance is downright pitiful in most cases. The only case where AVX2 may perform better is in 4-way AVX2 where there is no SSE2 equivalent.
As previously mentioned the impact depends on the mix of SHA and AVX2 in the algo as well as whether SSE2 parallel hashing is available.
I will investigate further and provide recommendations for Ryzen users.
The solution may extend beyond compiling and may require some code changes to ensure Ryzen prefers SHA over n-way when the algo contains a significant amount of sha256.
It likely won't be in the upcoming release.
Edit: Here's a list of algos that use sha256
sha256t: as described above.
lbry: significantly affected but less than sha256t.
skein: similar to lbry.
m7m: no 4-way, not a problem.
yescrypt and yespower: no 4-way, not a problem.
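The preference order described in this post could be expressed as a small decision function. This is a hypothetical sketch of the idea only, not how cpuminer-opt selects code paths: all class, field, and function names are made up for illustration, and the `slow_avx2` flag stands in for "a CPU like first-generation Ryzen".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuFeatures:
    sse2: bool
    avx2: bool
    sha: bool        # hardware SHA extensions present
    slow_avx2: bool  # hypothetical flag: AVX2 exists but performs poorly


@dataclass(frozen=True)
class AlgoInfo:
    name: str
    sha256_heavy: bool   # algo spends a significant share of time in sha256
    has_8way_avx2: bool
    has_4way_sse2: bool


def choose_path(cpu: CpuFeatures, algo: AlgoInfo) -> str:
    """Pick a hashing path following the ordering described in the post."""
    # HW SHA is single-stream only, so it competes with the n-way SW paths.
    if cpu.sha and cpu.slow_avx2 and algo.sha256_heavy:
        return "1-way HW SHA"
    if cpu.avx2 and algo.has_8way_avx2 and not cpu.slow_avx2:
        return "8-way AVX2"
    if cpu.sse2 and algo.has_4way_sse2:
        return "4-way SSE2"
    if cpu.avx2 and algo.has_8way_avx2:
        return "8-way AVX2"
    return "1-way scalar"


if __name__ == "__main__":
    ryzen = CpuFeatures(sse2=True, avx2=True, sha=True, slow_avx2=True)
    intel = CpuFeatures(sse2=True, avx2=True, sha=False, slow_avx2=False)
    sha256t = AlgoInfo("sha256t", True, True, True)
    print(choose_path(ryzen, sha256t), "|", choose_path(intel, sha256t))
```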
|
|
|
|
joblo (OP)
|
|
May 30, 2019, 09:12:43 PM |
|
cpuminer-opt-3.9.1 is released
https://github.com/JayDDee/cpuminer-opt/releases
Fixed AVX2 version of anime algo.
Added sonoa algo.
Added "-DRYZEN_" compile option for Ryzen to override 4-way hashing when the algo contains sha256 and use SHA instead. This is due to the introduction of HW SHA support combined with the poor performance of AVX2 on Ryzen. The Windows binaries package replaces cpuminer-avx2-sha with cpuminer-zen compiled with the override. Refer to the build instructions for more information.
Ongoing restructuring to streamline the process, reduce latency, reduce memory usage and unnecessary copying of data. Most of these will not result in a noticeably higher reported hashrate as the change simply reduces the time wasted that wasn't factored into the hash rate reported by the miner. In short, less dead time resulting in a higher net hashrate.
One of these measures to reduce latency also results in an enhanced share submission message including the share number*, the CPU thread, and the vector lane that found the solution. The time difference between the share submission and acceptance (or rejection) response indicates network latency.
One other effect of this change is a reduction in hash meter messages because the scan function no longer exits when a share is found. Scan cycles will go longer and submit multiple shares per cycle.
*The share number is anticipated and includes both accepted and rejected shares. Because the share number is anticipated and not synchronized it may be incorrect in times of very rapid share submission. Under most conditions it should be easy to match the submission with the corresponding response.
Removed "-DUSE_SPH_SHA" option, all users should have a recent version of openssl installed: v1.0.2 (Ubuntu 16.04) or better. Ryzen SHA requires v1.1.0 or better. Ryzen SHA is not used when hashing multi-way parallel. Ryzen SHA is available in the Windows binaries release package.
Improved compile instructions, now in separate files: INSTALL_LINUX and INSTALL_WINDOWS. The Windows instructions are used to build the binaries release package. It's built on a Linux system either running as a virtual machine or a separate computer. At this time there is no known way to build natively on a Windows system.
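The change to the scan function described in these release notes can be illustrated with a toy example. This is a sketch under stated assumptions, not the miner's code: plain SHA-256 stands in for a real algo, the target is made artificially easy, and the function names are hypothetical.

```python
import hashlib
from typing import Callable, Optional, Tuple


def toy_hash(header: bytes, nonce: int) -> int:
    """Stand-in hash (plain SHA-256); a real algo would go here."""
    digest = hashlib.sha256(header + nonce.to_bytes(4, "little")).digest()
    return int.from_bytes(digest, "big")


def scan_exit_on_share(header: bytes, first_nonce: int, last_nonce: int,
                       target: int) -> Tuple[Optional[int], int]:
    """Older style: stop at the first solution so the caller can submit it
    and then start a new scan. Returns (nonce or None, hashes done)."""
    for nonce in range(first_nonce, last_nonce):
        if toy_hash(header, nonce) <= target:
            return nonce, nonce - first_nonce + 1
    return None, last_nonce - first_nonce


def scan_and_submit(header: bytes, first_nonce: int, last_nonce: int,
                    target: int, submit: Callable[[int], None]) -> int:
    """Newer style: submit each solution as soon as it is found and keep
    scanning, so one cycle can yield several shares. Returns hashes done."""
    for nonce in range(first_nonce, last_nonce):
        if toy_hash(header, nonce) <= target:
            submit(nonce)
    return last_nonce - first_nonce


if __name__ == "__main__":
    found = []
    easy_target = (1 << 256) // 16  # toy value: about 1 hash in 16 qualifies
    done = scan_and_submit(b"toy header", 0, 1000, easy_target, found.append)
    print(done, "hashes,", len(found), "shares in one scan cycle")
```

With the second form the scan cycle always runs to the end of its range, which fits the note above that hash meter messages become less frequent.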
|
|
|
|
joblo (OP)
|
|
May 31, 2019, 05:29:12 PM |
|
cpuminer-opt-3.9.1.1 is released
Fixed lyra2 regression affecting non-AVX2.
Compiling on Windows using Cygwin now works.
Simply use "./build.sh" from a cygwin shell.
I have no list of likely packages that need installing on top of the base Cygwin installation. You'll have to wing it for now.
It isn't portable therefore the Windows binaries package continues to use the existing procedure.
As always please report any problems.
|
|
|
|
malafaya
|
|
June 03, 2019, 12:52:31 PM Last edit: June 03, 2019, 03:53:39 PM by malafaya |
|
Hi, joblo! Nice to see you back!
I noticed a few things in this v3.9.1.1 release:
* --cpu-affinity truncates to a 32-bit value, which means one can't use CPUs at or above 32 unless I don't specify affinity at all (which for most algos is worse). I think this has been addressed in the past (regression?)
* Miner now reports failure to affine to CPU x, y, z on startup if the startup process is not affined to them: I usually affine the command prompt process before launching the miner (used to be a fix to the previous problem and it's also more flexible to me). I hope this does not affect anything performance related.
* I'm consistently getting about 5% less hashrate for yescrypt than with older v3.8.8 for the exact same configuration. I didn't take more metrics but I think some other algos have slightly less performance as well.
* For yespowerR16 (originally written as yescryptR16), I get 2200H/s, quite a bit below the 3000H/s I get with bellflower2015's variant. I suppose this is because you just introduced this algo and still didn't have the chance to tweak it.
Cheers!
|
|
|
|
4ward
|
|
June 03, 2019, 01:02:46 PM |
|
* I'm consistently getting about 5% less hashrate for yescrypt than with older v3.8.8 for the exact same configuration. I didn't take more metrics but I think some other algos have slightly less performance as well.
If they are compiled with MinGW, the performance will be lower. Cross-compile with GCC does a better job optimizing. If it's not the case, it might be something in the recent changes.
|
|
|
|
malafaya
|
|
June 03, 2019, 01:04:40 PM |
|
* I'm consistently getting about 5% less hashrate for yescrypt than with older v3.8.8 for the exact same configuration. I didn't take more metrics but I think some other algos have slightly less performance as well.
If they are compiled with MinGW, the performance will be lower. Cross-compile with GCC does a better job optimizing. If it's not the case, it might be something in the recent changes.
In both cases, I'm using the official Windows binaries.
|
|
|
|
joblo (OP)
|
|
June 03, 2019, 03:28:03 PM Last edit: June 03, 2019, 04:18:40 PM by joblo |
|
* I'm consistently getting about 5% less hashrate for yescrypt than with older v3.8.8 for the exact same configuration. I didn't take more metrics but I think some other algos have slightly less performance as well.
If they are compiled with MinGW, the performance will be lower. Cross-compile with GCC does a better job optimizing. If it's not the case, it might be something in the recent changes.
In both cases, I'm using the official Windows binaries.
A few points and questions:
What's your CPU and OS?
Summary:
Changes were made to support Windows CPU groups.
There have been no changes to yescrypt code.
There have been no recent changes to the Windows build process.
Edit: please use -D to display affinity debug info.
Long version:
CPU limit and affinity:
A change was made initially in 3.9.0, later tweaked, to add CPU groups support to Windows. This may be responsible for that issue. I can have a look at the code in light of the specific symptoms you saw. Do you use CPU groups? Which version of Windows? You said you affine the process separately and that causes problems. That could be related to CPU groups if the process is in a different group from the miner threads.
General performance degradation:
The binaries are still made the same way using mingw, specifically using the winbuild-cross.sh script. The compiler was upgraded (evident in the startup messages showing the compiler version) prior to 3.9. However, I have been making some architectural changes that may have a small impact on performance, though 5% seems a bit much. I'm making them due to issues in preparation for AVX512, where up to 16 lanes can run in parallel in a single CPU thread. The overhead for interleaving and deinterleaving the data, the increase in memory usage, etc., don't scale well.
Some of those changes affect the locally displayed hashrate, both in volume of thread hash reports and their values. I have reduced the latency between detecting a solution and submitting it to the pool. As a side effect there are fewer hash meter reports and the reported hashrate is actually from the previous block. Another side effect is that the reduced latency is not reflected in the hash rate reported by the miner.
I considered it an acceptable compromise as it's just optics. The accumulated share difficulty over time is what the pool uses. In both the miner and the pool the hashrate is an artificial metric.
The changes result in less deinterleaving of the final hash (check for solution before interleaving instead of after), and submitting a share immediately when found while continuing the scan instead of aborting the scan to submit the share and start a new scan. On their face they are obviously more efficient but I measured no discernible difference in reported hash rate. These changes are being migrated slowly and can be confirmed by a more detailed share submitted message indicating which thread and lane found the solution.
Sorry for the ramble but there's a lot going on at the same time. I appreciate the testing and reports of any deviations from previous versions, especially the unintended ones.
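To make the interleaving overhead concrete, here is a minimal sketch of what interleaving and deinterleaving mean for n-way hashing: word i of every lane is stored side by side so that one vector instruction can process the same word of all lanes at once. This is plain Python for illustration, not the miner's SIMD code, and the function names are hypothetical.

```python
from typing import List, Sequence


def interleave(lanes: Sequence[Sequence[int]]) -> List[int]:
    """Turn N equal-length lanes into one buffer ordered
    [lane0 word0, lane1 word0, ..., lane0 word1, lane1 word1, ...]."""
    if not lanes:
        return []
    length = len(lanes[0])
    if any(len(lane) != length for lane in lanes):
        raise ValueError("all lanes must have the same length")
    return [lane[i] for i in range(length) for lane in lanes]


def deinterleave(data: Sequence[int], n_lanes: int) -> List[List[int]]:
    """Inverse of interleave(): recover each lane's words."""
    if n_lanes <= 0 or len(data) % n_lanes:
        raise ValueError("data length must be a multiple of n_lanes")
    return [list(data[lane::n_lanes]) for lane in range(n_lanes)]


if __name__ == "__main__":
    four_way = [[0, 1], [10, 11], [20, 21], [30, 31]]
    packed = interleave(four_way)
    print(packed)
    print(deinterleave(packed, 4))
```

Every word is copied once on the way in and once on the way out, so the amount of copying grows with the number of lanes. Checking for a solution while the data is still in the vector layout avoids part of that work.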
|
|
|
|
malafaya
|
|
June 03, 2019, 04:04:56 PM Last edit: June 03, 2019, 04:44:07 PM by malafaya |
|
A few points and questions:
What's your CPU and OS?
[...]
Do you use CPU groups? Which version of Windows? You said you affine the process separately and that causes problems. That could be related to CPU groups if the process is in a different group from the miner threads.
Tested on Intel(R) Xeon CPU E5-2660 v4 @ 2.00GHz CPUs, on Windows Server 2016. I do not use CPU groups (not sure what those are: will look into that) and I tested with 20 CPUs. [EDIT: I checked and CPU groups are applicable to machines with more than 64 CPUs so I only have one group here]
I open a command prompt and, prior to launching cpuminer-opt, I set the command prompt's affinity to the one desired (set to 20 CPUs). The miner then runs with the desired processors already affined. v3.9.1.1 now complains that it can't affine to all CPUs (obviously, because I removed some from the parent process). I'm supposing that's just a benign warning and nothing will really change. You probably made the miner explicitly affine to all CPUs by default on startup, hence the warnings.
Yescrypt performance: There were no changes to the yescrypt code. I added yespower and tinkered with using that code for yescrypt without success so I left yescrypt as is. If you are aware of a better performing miner please point me to it and I'll have a look.
Argh, I'm sorry. I meant yespowerR16 instead of yescryptR16 in my last item! And I was referring to bellflower2015's fork of your miner (you can easily find it on github). Thanks!
|
|
|
|
joblo (OP)
|
|
June 03, 2019, 04:40:57 PM |
|
Argh, I'm sorry. I meant yespowerR16 instead of yescryptR16 in my last item! And I was referring to bellflower2015's fork of your miner (you can easily find it on github).
Thanks!
I suggest you not set the process affinity explicitly, it confuses cpuminer. Also please add -D option and post output. About yespower, I see the correction you made to your initial post. If I understand, you observed a drop in yescrypt performance over v3.8.8, but the big difference was with yespower vs bellflower fork. I can't explain the yescrypt difference, as I said I made no changes but I'll take another look. I'll also check out Bellflower.
|
|
|
|
malafaya
|
|
June 03, 2019, 04:55:39 PM |
|
I suggest you not set the process affinity explicitly, it confuses cpuminer. Also please add -D option and post output.
I'll have to set it explicitly for now because --cpu-affinity truncates to 32 bits, thus not allowing the use of CPUs above 31. Sorry, I'm not sure in what situation you want me to post the debug. Is this enough?
********** cpuminer-opt 3.9.1.1 ***********
A CPU miner with multi algo support and optimized for CPUs with AES_NI and AVX2 and SHA extensions.
BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
CPU: Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz.
SW built on May 31 2019 with GCC 7.3.0.
CPU features: SSE2 AES SSE4.2 AVX AVX2.
SW features: SSE2 AES SSE4.2 AVX AVX2.
Algo features: SSE2.
Start mining with SSE2.
[2019-06-03 17:52:38] 56 CPU cores available, 30 miner threads selected.
[2019-06-03 17:52:38] Starting Stratum on stratum+tcp://*****
[2019-06-03 17:52:38] Binding thread 0 to cpu 0 (mask 1)
[2019-06-03 17:52:38] Binding thread 1 to cpu 1 (mask 2)
[2019-06-03 17:52:38] Binding thread 4 to cpu 4 (mask 10)
[2019-06-03 17:52:38] Binding thread 26 to cpu 26 (mask 4000000)
[2019-06-03 17:52:38] affine_to_cpu_mask for 1 returned 57
[2019-06-03 17:52:38] Binding thread 2 to cpu 2 (mask 4)
[2019-06-03 17:52:38] Binding thread 5 to cpu 5 (mask 20)
[2019-06-03 17:52:38] affine_to_cpu_mask for 5 returned 57
[2019-06-03 17:52:38] Binding thread 7 to cpu 7 (mask 80)
[2019-06-03 17:52:38] affine_to_cpu_mask for 7 returned 57
[2019-06-03 17:52:38] Binding thread 3 to cpu 3 (mask 8)
[2019-06-03 17:52:38] affine_to_cpu_mask for 3 returned 57
[2019-06-03 17:52:38] Binding thread 10 to cpu 10 (mask 400)
[2019-06-03 17:52:38] Binding thread 11 to cpu 11 (mask 800)
[2019-06-03 17:52:38] affine_to_cpu_mask for 11 returned 57
[2019-06-03 17:52:38] Binding thread 13 to cpu 13 (mask 2000)
[2019-06-03 17:52:38] affine_to_cpu_mask for 13 returned 57
[2019-06-03 17:52:38] Binding thread 15 to cpu 15 (mask 8000)
[2019-06-03 17:52:38] affine_to_cpu_mask for 15 returned 57
[2019-06-03 17:52:38] Binding thread 17 to cpu 17 (mask 20000)
[2019-06-03 17:52:38] affine_to_cpu_mask for 17 returned 57
[2019-06-03 17:52:38] Binding thread 6 to cpu 6 (mask 40)
[2019-06-03 17:52:38] Binding thread 19 to cpu 19 (mask 80000)
[2019-06-03 17:52:38] affine_to_cpu_mask for 19 returned 57
[2019-06-03 17:52:38] Binding thread 21 to cpu 21 (mask 200000)
[2019-06-03 17:52:38] affine_to_cpu_mask for 21 returned 57
[2019-06-03 17:52:38] Binding thread 23 to cpu 23 (mask 800000)
[2019-06-03 17:52:38] affine_to_cpu_mask for 23 returned 57
[2019-06-03 17:52:38] 30 miner threads started, using 'yespowerr16' algorithm.
[2019-06-03 17:52:38] Binding thread 25 to cpu 25 (mask 2000000)
[2019-06-03 17:52:38] affine_to_cpu_mask for 25 returned 57
[2019-06-03 17:52:38] Binding thread 27 to cpu 27 (mask 8000000)
[2019-06-03 17:52:38] Binding thread 28 to cpu 28 (mask 10000000)
[2019-06-03 17:52:38] Binding thread 29 to cpu 29 (mask 20000000)
[2019-06-03 17:52:38] affine_to_cpu_mask for 29 returned 57
[2019-06-03 17:52:38] Binding thread 12 to cpu 12 (mask 1000)
[2019-06-03 17:52:38] Binding thread 14 to cpu 14 (mask 4000)
[2019-06-03 17:52:38] Binding thread 16 to cpu 16 (mask 10000)
[2019-06-03 17:52:38] Binding thread 18 to cpu 18 (mask 40000)
[2019-06-03 17:52:38] Binding thread 20 to cpu 20 (mask 100000)
[2019-06-03 17:52:38] Binding thread 22 to cpu 22 (mask 400000)
[2019-06-03 17:52:38] Binding thread 24 to cpu 24 (mask 1000000)
[2019-06-03 17:52:38] Binding thread 8 to cpu 8 (mask 100)
[2019-06-03 17:52:38] Binding thread 9 to cpu 9 (mask 200)
[2019-06-03 17:52:38] affine_to_cpu_mask for 9 returned 57
[2019-06-03 17:52:38] Stratum session id: 2ee7a1bb44758f49710830c357335031
[2019-06-03 17:52:39] Stratum difficulty set to 1
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=00000000 ntime=5cf55048
[2019-06-03 17:52:39] yespowerr16 block 403605, network diff 0.013
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=01000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=02000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=03000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=04000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=05000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=06000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=07000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=08000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=09000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=0a000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=0b000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=0c000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=0d000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=0e000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=0f000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=10000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=11000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=12000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=13000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=14000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=15000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=16000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=17000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=18000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=19000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=1a000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=1b000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=1c000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=1d000000 ntime=5cf55048
[2019-06-03 17:52:39] DEBUG: job_id='690a' extranonce2=1e000000 ntime=5cf55048
[2019-06-03 17:52:46] DEBUG: job_id='690c' extranonce2=00000000 ntime=5cf5505e
[2019-06-03 17:52:54] DEBUG: hash <= target
Hash: 0000758736c194d8a115384dffac8218e7af7a552d235f254016c1c164269128
Target: 0000ffff00000000000000000000000000000000000000000000000000000000
[2019-06-03 17:52:54] Share submitted.
[2019-06-03 17:52:55] Accepted 1/1 (100%), diff 3.32e-005, 2179.22 H/s
[2019-06-03 17:53:07] DEBUG: job_id='690e' extranonce2=00000000 ntime=5cf55073
[2019-06-03 17:53:28] DEBUG: job_id='690f' extranonce2=00000000 ntime=5cf55088
[2019-06-03 17:53:41] DEBUG: hash <= target
Hash: 00000f473fd9fdd57695a4730ccb13bbb31645b08ec9a5fdd6cc0f7c870dd7ef
Target: 0000ffff00000000000000000000000000000000000000000000000000000000
[2019-06-03 17:53:41] Share submitted.
[2019-06-03 17:53:41] Accepted 2/2 (100%), diff 0.000256, 2179.37 H/s
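The "hash <= target" check and the diff values in the log above can be reproduced with a few lines of integer arithmetic. This sketch is not taken from the miner. It assumes the printed share difficulty is measured against the Bitcoin-style difficulty-1 target (0x00000000ffff followed by zeros). Under that assumption both hashes in the log work out to roughly the diff printed on the corresponding "Accepted" line.

```python
# Assumption: share diff is relative to the Bitcoin-style difficulty-1 target.
DIFF1_TARGET = 0xFFFF << 208  # 0x00000000ffff0000...0000 as a 256-bit value


def meets_target(hash_hex: str, target_hex: str) -> bool:
    """A share is valid when the hash, read as a big integer, is <= target."""
    return int(hash_hex, 16) <= int(target_hex, 16)


def share_difficulty(hash_hex: str) -> float:
    """Difficulty of a hash: difficulty-1 target divided by the hash value."""
    value = int(hash_hex, 16)
    if value == 0:
        raise ValueError("hash value is zero")
    return DIFF1_TARGET / value


# Values copied from the debug log above.
TARGET = "0000ffff00000000000000000000000000000000000000000000000000000000"
HASH_1 = "0000758736c194d8a115384dffac8218e7af7a552d235f254016c1c164269128"
HASH_2 = "00000f473fd9fdd57695a4730ccb13bbb31645b08ec9a5fdd6cc0f7c870dd7ef"

if __name__ == "__main__":
    for h in (HASH_1, HASH_2):
        print(meets_target(h, TARGET), f"{share_difficulty(h):.3g}")
```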
|
|
|
|
joblo (OP)
|
|
June 03, 2019, 05:37:29 PM Last edit: June 03, 2019, 06:27:27 PM by joblo |
|
I suggest you not set the process affinity explicitly, it confuses cpuminer. Also please add -D option and post output.
I'll have to set it explicitly for now because --cpu-affinity truncates to 32 bits, thus not allowing the use of CPUs above 31. Sorry, I'm not sure in what situation you want me to post the debug. Is this enough?
What is confusing me is all your changes from the norm. I would like to see how it works with defaults to get a reference. I also don't know what you mean by truncating to 32, affinity is 64 bits, and you don't have more than 32 CPUs anyway.
I don't know the case you posted but there were errors:
affine_to_cpu_mask for 1 returned 57
repeated for many CPUs, seems to be all the odd-numbered ones.
EDIT: I can't find what error 57 means.
Some useful tests, you don't have to post the session, just whether it worked as expected. Running less than N threads should be by factors of 2. Anything else is YMMV. And forcing the process affinity disqualifies everything.
1. All defaults
2. 14 threads default affinity, note whether CPU loads are balanced, i.e. affinity was properly distributed.
3. If unbalanced try setting affinity 0x5555555 or 0xaaaaaaa
If everything works as expected I don't see a problem. Windows issues like CPU groups and NUMA shouldn't be an issue until you get over 64 CPUs.
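To see which CPUs the two suggested masks select, an affinity mask can be decoded bit by bit: bit n set means CPU n is allowed. The helper below is a small illustration with a hypothetical name, not code from the miner.

```python
from typing import List


def cpus_in_mask(mask: int) -> List[int]:
    """Return the CPU numbers whose bits are set in an affinity mask."""
    if mask < 0:
        raise ValueError("mask must be non-negative")
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    print(cpus_in_mask(0x5555555))  # even-numbered CPUs
    print(cpus_in_mask(0xAAAAAAA))  # odd-numbered CPUs
    # The per-thread masks in the debug log follow the same scheme, e.g.
    # "Binding thread 4 to cpu 4 (mask 10)" is 1 << 4 printed in hex.
    print(format(1 << 4, "x"))
```

Each of the two masks covers 14 CPUs out of the first 28, which fits the 14-thread test suggested above.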
|
|
|
|
joblo (OP)
|
|
June 03, 2019, 07:12:40 PM |
|
If they are compiled with MinGW, the performance will be lower. Cross-compile with GCC does a better job optimizing
This statement caught my attention. The binaries are cross-compiled with GCC in a mingw environment running on Linux. What are you referring to that is better?
|
|
|
|
malafaya
|
|
June 03, 2019, 07:40:37 PM Last edit: June 03, 2019, 08:09:39 PM by malafaya |
|
What is confusing me is all your changes from the norm. I would like to see how it works with defaults to get a reference. I also don't know what you mean by truncating to 32, affinity is 64 bits, and you don't have more than 32 CPUs anyway.
I don't know the case you posted but there were errors:
affine_to_cpu_mask for 1 returned 57
repeated for many CPUs, seems to be all the odd-numbered ones.
EDIT: I can't find what error 57 means.
Some useful tests, you don't have to post the session, just whether it worked as expected. Running less than N threads should be by factors of 2. Anything else is YMMV. And forcing the process affinity disqualifies everything.
1. All defaults
2. 14 threads default affinity, note whether CPU loads are balanced, i.e. affinity was properly distributed.
3. If unbalanced try setting affinity 0x5555555 or 0xaaaaaaa
If everything works as expected I don't see a problem. Windows issues like CPU groups and NUMA shouldn't be an issue until you get over 64 CPUs.
There are more than 32 CPUs (check the debug above: 56 CPU cores available, 30 miner threads selected.). If I select all 28 even CPUs for instance, that means an affinity of 0x55555555555555 which is over 32 bits. If I do set --cpu-affinity=0x55555555555555, I can check in Task Manager that affinities for CPUs above CPU 31 are not set, which led me to think that affinity is truncating to the lower 32 bits.
I do think the warning is just that: a warning, but I wanted to be sure as v3.8.8 did not issue such a warning before. And yes, there is no warning issued if I use --cpu-affinity (or don't use it at all) as long as I don't set affinity externally beforehand, so that is not an issue.
EDIT: I made a few tests and verified the following:
* With v3.8.8, a cpu affinity works correctly up to 0xffffff; above that, it triggers that exactly the first 32 CPUs are used, no matter the value.
* With v3.9.1.1, a cpu affinity works correctly up to 0xf; above that, it triggers that exactly the first 32 CPUs are used, no matter the value.
I verified this by using the CPU affinity option in Task Manager for the miner process. So the reason I set affinity externally is because I could never rely on the built-in cpu affinity for most miners. It seems to break in some configurations.
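The arithmetic behind the 32-bit concern can be checked directly. This sketch builds the mask for the 28 even-numbered CPUs of a 56-CPU machine and shows what would be left if only the lower 32 bits were kept. It only illustrates the truncation hypothesis. The tests reported above show the miner falling back to the first 32 CPUs instead, so the actual cause may be something else. Function names are hypothetical.

```python
from typing import Iterable


def mask_for_cpus(cpus: Iterable[int]) -> int:
    """Build an affinity mask with one bit set per CPU number."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    return mask


def truncate_to_32_bits(mask: int) -> int:
    """What a 32-bit variable would keep of a wider mask."""
    return mask & 0xFFFFFFFF


if __name__ == "__main__":
    even_cpus = range(0, 56, 2)        # 28 even CPUs: 0, 2, ..., 54
    full = mask_for_cpus(even_cpus)
    kept = truncate_to_32_bits(full)
    print(hex(full), full.bit_length(), "bits")
    print(hex(kept), bin(kept).count("1"), "CPUs remain")
```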
|
|
|
|
|