I have an i7-7800X CPU @ 3.50GHz.
I'm trying to see which CPU miner build is best for lyra2z. Does anyone know? I tried 4way and aes-avx2, and they seem to get the same kH and kH/s. With aes-sse42 I seem to get less.
From my own testing, aes-avx2 seems like the way to go. Can anyone confirm? Also, what would be a safe CPU temperature for CPU mining?
AVX2 is the way to go, 4 way does nothing for lyra2. You can also try reducing the number of threads, it might improve cache performance. The cooler the CPU the better. Make sure the fan is at 100%, it's cheaper to replace a worn out fan than a burned out CPU.
|
|
|
The results for nist5 on Ryzen are baffling. I get much better performance with i7-6700K using 4way and all threads. It's even more baffling that the tribus results on Ryzen are consistent with Intel. They share a lot of code.
4way 8 threads: 2720
4way 4 threads: 2205
1way 8 threads: 2055
1way 4 threads: 1850
|
|
|
Ya, the default affinity was choosing virtual threads instead physical ones. Damn! 6MH/s!
All Ryzen users should take note. Intel chooses one thread per core before using HT.
Joblo, is there an updated list of algos that receive a boost from SHA hardware acceleration? I found a short list some pages back: sha256t, lbry, skein, myr-groestl, m7m. Are there more algos?
I converted all of them at the time and I don't recall any new algos that can use it. What about nist5? Can you try that again? I'd like to understand what's going on there. I get good performance on my Intel.
|
|
|
Tribus 4way 8 threads:
[2017-12-17 15:45:48][2017-12-17 15:49:10]
[2017-12-17 17:05:32] tribus block 449483, diff 735.578
[2017-12-17 17:05:32] CPU #7: 461.65 kH, 398.07 kH/s
[2017-12-17 17:05:32] CPU #6: 460.63 kH, 398.21 kH/s
[2017-12-17 17:05:32] CPU #5: 460.43 kH, 397.70 kH/s
[2017-12-17 17:05:32] CPU #2: 460.88 kH, 397.74 kH/s
[2017-12-17 17:05:32] CPU #4: 460.51 kH, 397.76 kH/s
[2017-12-17 17:05:32] CPU #3: 460.82 kH, 398.03 kH/s
[2017-12-17 17:05:32] CPU #0: 454.80 kH, 393.86 kH/s
[2017-12-17 17:05:32] CPU #1: 463.35 kH, 399.53 kH/s
Apparently Tribus 4way likes SMT/HT here.
It's interesting that the thread rate didn't increase with fewer threads. Were the threads spread over all 8 cores? You can try "-t 8 --cpu-affinity 0x5555" to select alternate vcores.
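For anyone wondering where a mask like 0x5555 comes from, here is a minimal sketch in C. It assumes the two SMT siblings of each core are numbered adjacently (0/1, 2/3, ...), which is not guaranteed on every OS or board, so check your own topology first. The helper name alternate_vcore_mask is made up for illustration and is not part of cpuminer-opt.

```c
#include <stdint.h>

/* Hypothetical helper (not from cpuminer-opt): build an affinity mask that
 * selects every other logical CPU among the first `nlogical` ones.
 * Assumption: SMT siblings are numbered adjacently (0/1, 2/3, ...). */
static uint64_t alternate_vcore_mask(unsigned nlogical)
{
    uint64_t mask = 0;

    if (nlogical > 64)          /* a 64-bit mask cannot describe more */
        nlogical = 64;

    for (unsigned i = 0; i < nlogical; i += 2)
        mask |= (uint64_t)1 << i;   /* set bits 0, 2, 4, ... */

    return mask;
}
```

With 16 logical CPUs this gives 0x5555, the value suggested above. If your system numbers siblings differently (for example 0-7 first, then 8-15), a mask of the low half would be the equivalent choice.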
|
|
|
Thanks for that. Do you have a howto guide? I need to file it for when I finally upgrade my build environment. With your permission I will add your link to the OP.
|
|
|
I have a Ryzen 7 1700 at 3.7GHz. Mining nist5, 4way is around 15% slower than AES-AVX/AVX2: around 240 kH/s per core (8 threads) with 4way versus 270 kH/s per core with AES-AVX2. It's running stable, but with lower performance. I can get 2.1~2.2 MH/s on NIST5.
This is very interesting feedback. I get 340 kH/s per thread 4way vs 255 kH/s AVX2 1way on my i7-6700K @4GHz. Something isn't right; I need lots of details to eliminate the simple stuff. Can you post the startup output for both? None of the following should cause that much of a difference, but it helps to quantify.
AMD AVX2 performance is known to be slower than AVX. Try running a test with just AVX2 and again with AVX to compare. Another, better way to compare AVX2 vs AVX performance is lyra2rev2. It has the most AVX2 code.
4way uses 4 times the memory of plain AVX2. This will expose any cache performance issues. Try running fewer threads to see if performance (total, not just per thread) improves.
Try the tribus algo. It's pure 4way parallel, while nist5 has a serial component which reduces the gain and adds some overhead.
|
|
|
Sorry to annoy you with so many questions.
You ask snap questions without thinking then you challenge my answers based on your misconceptions. Running out of memory is a simple problem that you should be able to solve yourself. You don't need to apologize, just try harder before asking questions. And if you do need to ask a question about a problem you should show how you tried to solve it. You learn more that way.
|
|
|
Yes it's normal and dependent on the algo. It means cpuminer-opt has no optimizations for scrypt algo.
Oh, OK, it's just it previously stated SSE2. On another subject, I tried the 3.7.5 windows binary on my desktop (Ryzen 1700) and all executables fail to start - it states: "thread xx (random): Scrypt buffer allocation failed Fail: thread xx failed to initiate."
I noted the change in feature reporting in the release announcement. You're out of memory. You only have enough memory for xx -1 threads.
Thanks, fiddling around with virtual memory settings allowed it to run. Performance is still very bad with Ryzen CPU using Scrypt. At the same level as a Xeon Westmere-EP 6 cores @ 2.4 GHz. Is this really the CPU's fault, or could cpuminer-opt be more optimized for the Zen architecture? Thanks and keep up the good work!
Virtual memory is slow, you need the real thing.
I have 16 GB of RAM, it shouldn't be a problem. I had a fixed page file size, I set it to auto, and it worked. Maybe a bug?
You don't have enough RAM to run that many threads without using VM. Using VM is slow. Stop arguing and do the math: N*threads.
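To make the "do the math" step concrete, here is a rough sketch in C. It relies on the standard scrypt scratchpad size of 128 * r * N bytes per hash instance. The `ways` parameter is my own assumption to cover a miner that hashes several nonces in parallel per thread; the function names are made up and none of this is taken from cpuminer-opt's source, so treat the result as an estimate.

```c
#include <stdint.h>

/* Scratchpad bytes for one scrypt instance: 128 * r * N (standard scrypt). */
static uint64_t scrypt_scratch_bytes(uint64_t n, uint64_t r)
{
    return 128ULL * r * n;
}

/* Hypothetical estimate of the total scratchpad memory for a miner.
 * `ways` is an assumed number of parallel hash lanes per thread. */
static uint64_t scrypt_total_bytes(uint64_t n, uint64_t r,
                                   unsigned threads, unsigned ways)
{
    return scrypt_scratch_bytes(n, r) * threads * ways;
}

/* How many threads fit into `avail_bytes` of real RAM under the same
 * assumptions. Returns 0 if the inputs make no sense. */
static unsigned scrypt_max_threads(uint64_t avail_bytes, uint64_t n,
                                   uint64_t r, unsigned ways)
{
    uint64_t per_thread = scrypt_scratch_bytes(n, r) * ways;

    if (per_thread == 0)
        return 0;
    return (unsigned)(avail_bytes / per_thread);
}
```

For example, with N = 1048576, r = 1 and one lane per thread, each thread needs 128 MiB, so 16 threads need 2 GiB of free physical memory before the OS starts paging.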
|
|
|
This is the proposed fix for the 32 cpu limit:
@@ -204,7 +204,7 @@
 for ( uint8_t i = 0; i < ncpus; i++ )
 {
 // cpu mask
- if( (ncpus > 64) || ( mask & (1UL << i) ) ) CPU_SET( i, &set );
+ if( (ncpus > 64) || ( mask & (1ULL << i) ) ) CPU_SET( i, &set );
 }
 if ( id == -1 )
 {
@@ -1690,9 +1690,9 @@
 {
 if (opt_debug)
 applog( LOG_DEBUG, "Binding thread %d to cpu %d (mask %x)",
- thr_id, thr_id % num_cpus, ( 1 << (thr_id % num_cpus) ) );
+ thr_id, thr_id % num_cpus, ( 1ULL << (thr_id % num_cpus) ) );
- affine_to_cpu_mask( thr_id, 1 << (thr_id % num_cpus) );
+ affine_to_cpu_mask( thr_id, 1ULL << (thr_id % num_cpus) );
 }
 else if (opt_affinity != -1)
 {
All good now - all CPUs running with this patch, thanks!
Thanks for testing. I still don't understand why it worked before with -1UL (32 bit) but it's moot now. If I get a response (or after a suitable timeout with no response) for the lyra2h fix I will release both.
|
|
|
Stupid mistake, try this change in algo/lyra2/lyra2h.c line 34:
34c34
< LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 16, 16, 16 );
---
> LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 8, 8, 8);
I presume no news means it now works? I'd like confirmation.
With the following change it still only uses 32 CPUs:
--- algo/lyra2/lyra2h.c.orig 2017-12-14 23:28:51.000000000 +0000
+++ algo/lyra2/lyra2h.c 2017-12-16 05:29:48.295167452 +0000
@@ -31,7 +31,7 @@
 sph_blake256( &ctx_blake, input + 64, 16 );
 sph_blake256_close( &ctx_blake, hash );
- LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 8, 8, 8);
+ LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 16, 16, 16);
 memcpy(state, hash, 32);
 }
Not sure if I should try your earlier changes as well? If so - could you send a patch in diff -u format?
I'm a bit confused by this post. Your comment about still using 32 CPUs is for my previous post about using 1ULL to force it to 64 bits. You're saying that didn't work? The quote above is for a different problem with rejects mining the new lyra2h algo. Is that what you are now offering to test?
Edit: I re-read your post a few more times and it appears you're saying that the Lyra2 change didn't fix the 32 cpu limit problem you initially reported. It only (hopefully) fixes the rejects from lyra2h reported by someone else. This is the proposed fix for the 32 cpu limit:
@@ -204,7 +204,7 @@
 for ( uint8_t i = 0; i < ncpus; i++ )
 {
 // cpu mask
- if( (ncpus > 64) || ( mask & (1UL << i) ) ) CPU_SET( i, &set );
+ if( (ncpus > 64) || ( mask & (1ULL << i) ) ) CPU_SET( i, &set );
 }
 if ( id == -1 )
 {
@@ -1690,9 +1690,9 @@
 {
 if (opt_debug)
 applog( LOG_DEBUG, "Binding thread %d to cpu %d (mask %x)",
- thr_id, thr_id % num_cpus, ( 1 << (thr_id % num_cpus) ) );
+ thr_id, thr_id % num_cpus, ( 1ULL << (thr_id % num_cpus) ) );
- affine_to_cpu_mask( thr_id, 1 << (thr_id % num_cpus) );
+ affine_to_cpu_mask( thr_id, 1ULL << (thr_id % num_cpus) );
 }
 else if (opt_affinity != -1)
 {
|
|
|
********** cpuminer-opt 3.7.6 ***********
A CPU miner with multi algo support and optimized for CPUs with AES_NI and AVX2 and SHA extensions.
BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
CPU: Intel(R) Core(TM) i3-5010U CPU @ 2.10GHz.
SW built on Dec 15 2017 with GCC 6.3.0.
CPU features: SSE2 AES AVX AVX2.
SW features: SSE2 AES AVX AVX2 4WAY.
Algo features: AVX AVX2.
Start mining with AVX2.
[2017-12-15 05:13:05] Starting Stratum on stratum+tcp://pool.hppcoin.com:3888
[2017-12-15 05:13:05] 4 miner threads started, using 'lyra2h' algorithm.
[2017-12-15 05:13:06] Stratum difficulty set to 0.03
[2017-12-15 05:14:57] lyra2h block 3745, diff 1.086
[2017-12-15 05:15:00] CPU #2: 33.13 kH, 304.87 H/s
[2017-12-15 05:15:01] CPU #1: 31.07 kH, 284.39 H/s
[2017-12-15 05:15:01] CPU #3: 33.19 kH, 303.82 H/s
[2017-12-15 05:15:01] CPU #0: 29.41 kH, 269.21 H/s
[2017-12-15 05:15:57] CPU #2: 18.29 kH, 320.96 H/s
[2017-12-15 05:15:58] CPU #1: 17.06 kH, 299.32 H/s
[2017-12-15 05:15:59] CPU #0: 16.15 kH, 277.56 H/s
[2017-12-15 05:15:59] CPU #3: 18.23 kH, 313.18 H/s
[2017-12-15 05:16:59] CPU #2: 19.26 kH, 311.88 H/s
[2017-12-15 05:17:00] CPU #1: 17.96 kH, 285.40 H/s
[2017-12-15 05:17:00] CPU #0: 16.65 kH, 269.79 H/s
[2017-12-15 05:17:00] CPU #3: 18.79 kH, 304.49 H/s
[2017-12-15 05:17:29] 4 WAY hash nonces submitted: 0
[2017-12-15 05:17:29] 1 WAY hash nonce submitted
[2017-12-15 05:17:29] CPU #1: 8271 H, 290.19 H/s
[2017-12-15 05:17:29] Rejected 1/1 (100.0%), 62.97 kH, 1176.35 H/s
[2017-12-15 05:17:29] reject reason: low difficulty share of 1.1898710488783548e-7
[2017-12-15 05:17:29] factor reduced to : 0.67
[2017-12-15 05:17:46] CPU #3: 18.27 kH, 400.63 H/s
[2017-12-15 05:17:46] CPU #0: 16.19 kH, 354.79 H/s
[2017-12-15 05:17:46] CPU #2: 18.71 kH, 393.31 H/s
[2017-12-15 05:17:47] CPU #1: 17.41 kH, 944.91 H/s
[2017-12-15 05:17:48] 4 WAY hash nonces submitted: 0
[2017-12-15 05:17:48] 1 WAY hash nonce submitted
[2017-12-15 05:17:48] CPU #2: 9619 H, 7518.31 H/s
[2017-12-15 05:17:48] Rejected 2/2 (100.0%), 61.49 kH, 9218.64 H/s
[2017-12-15 05:17:48] reject reason: low difficulty share of 1.0869699425015301e-7
[2017-12-15 05:17:48] factor reduced to : 0.44
[2017-12-15 05:17:48] CPU #0: 21.29 kH, 9075.46 H/s
[2017-12-15 05:17:49] CPU #3: 24.04 kH, 9621.84 H/s
[2017-12-15 05:17:54] CPU #1: 56.70 kH, 8476.80 H/s
[2017-12-15 05:17:55] 4 WAY hash nonces submitted: 0
[2017-12-15 05:17:55] 1 WAY hash nonce submitted
[2017-12-15 05:17:55] CPU #2: 62.99 kH, 8803.87 H/s
[2017-12-15 05:17:55] Rejected 3/3 (100.0%), 165.01 kH, 35.98 kH/s
[2017-12-15 05:17:55] reject reason: low difficulty share of 3.9760953496518064e-8
[2017-12-15 05:17:55] factor reduced to : 0.30
[2017-12-15 05:17:55] 4 WAY hash nonces submitted: 0
[2017-12-15 05:17:55] 1 WAY hash nonce submitted
[2017-12-15 05:17:55] CPU #1: 10.24 kH, 14.97 kH/s
[2017-12-15 05:17:55] Rejected 4/4 (100.0%), 118.56 kH, 42.48 kH/s
[2017-12-15 05:17:55] reject reason: low difficulty share of 7.469801103230193e-7
[2017-12-15 05:17:55] factor reduced to : 0.20
[2017-12-15 05:18:03] 4 WAY hash nonces submitted: 0
[2017-12-15 05:18:03] 1 WAY hash nonce submitted
[2017-12-15 05:18:03] CPU #3: 167.92 kH, 11.32 kH/s
[2017-12-15 05:18:03] Rejected 5/5 (100.0%), 262.44 kH, 44.17 kH/s
[2017-12-15 05:18:03] reject reason: low difficulty share of 3.4403911449252514e-8
[2017-12-15 05:18:03] factor reduced to : 0.13
[2017-12-15 05:18:14] 4 WAY hash nonces submitted: 0
[2017-12-15 05:18:14] 1 WAY hash nonce submitted
[2017-12-15 05:18:14] CPU #0: 222.08 kH, 8624.82 H/s
[2017-12-15 05:18:14] Rejected 6/6 (100.0%), 463.24 kH, 43.72 kH/s
[2017-12-15 05:18:14] reject reason: low difficulty share of 8.317881954690417e-8
[2017-12-15 05:18:14] factor reduced to : 0.09
[2017-12-15 05:18:14] 4 WAY hash nonces submitted: 0
[2017-12-15 05:18:15] 1 WAY hash nonce submitted
[2017-12-15 05:18:15] CPU #0: 3528 H, 13.74 kH/s
[2017-12-15 05:18:15] Rejected 7/7 (100.0%), 244.68 kH, 48.84 kH/s
[2017-12-15 05:18:15] reject reason: low difficulty share of 8.689102139919432e-8
[2017-12-15 05:18:15] factor reduced to : 0.06
[2017-12-15 05:18:15] 4 WAY hash nonces submitted: 0
[2017-12-15 05:18:15] 1 WAY hash nonce submitted
[2017-12-15 05:18:15] CPU #0: 431 H, 7178.07 H/s
[2017-12-15 05:18:15] Rejected 8/8 (100.0%), 241.58 kH, 42.27 kH/s
[2017-12-15 05:18:15] reject reason: low difficulty share of 1.0635928005429685e-7
[2017-12-15 05:18:15] factor reduced to : 0.04
[2017-12-15 05:18:18] 4 WAY hash nonces submitted: 0
[2017-12-15 05:18:18] 1 WAY hash nonce submitted
[2017-12-15 05:18:18] CPU #2: 222.20 kH, 9596.46 H/s
[2017-12-15 05:18:18] Rejected 9/9 (100.0%), 400.79 kH, 43.06 kH/s
[2017-12-15 05:18:18] reject reason: low difficulty share of 7.515368302351459e-8
[2017-12-15 05:18:18] factor reduced to : 0.03
[2017-12-15 05:18:40] 4 WAY hash nonces submitted: 0
[2017-12-15 05:18:40] 1 WAY hash nonce submitted
[2017-12-15 05:18:40] CPU #0: 279.30 kH, 10.96 kH/s
[2017-12-15 05:18:40] Rejected 10/10 (100.0%), 679.66 kH, 46.84 kH/s
[2017-12-15 05:18:40] reject reason: low difficulty share of 5.0051796485619505e-8
[2017-12-15 05:18:40] factor reduced to : 0.02
[2017-12-15 05:18:46] CPU #3: 678.93 kH, 16.04 kH/s
[2017-12-15 05:18:54] 4 WAY hash nonces submitted: 0
[2017-12-15 05:18:54] 1 WAY hash nonce submitted
[2017-12-15 05:18:54] CPU #2: 536.56 kH, 15.03 kH/s
[2017-12-15 05:18:54] Rejected 11/11 (100.0%), 1505.03 kH, 57.00 kH/s
[2017-12-15 05:18:54] reject reason: low difficulty share of 9.10089396836757e-8
[2017-12-15 05:18:54] factor reduced to : 0.01
[2017-12-15 05:18:57] Stratum difficulty set to 0.0171429
[2017-12-15 05:21:04] CPU #1: 898.44 kH, 4757.21 H/s
|
|
|
I've reviewed the changes I made for 64 CPU support and they should only have an effect when there are more than 64 vcores. It's as simple as if num_cpus > 64 do something different else do as usual. The issue must be somewhere in cpu-miner.c.
If we copy cpu-miner.c from v3.7.5.tar.gz to 3.7.6 and compile it, all 48 CPUs are used.
If we copy cpu-miner.c from v3.7.6.tar.gz to 3.7.5 and compile it, only 32 CPUs are used.
applog( LOG_DEBUG, "Binding thread %d to cpu %d (mask %x)", thr_id, thr_id % num_cpus, ( 1 << (thr_id % num_cpus) ) );
affine_to_cpu_mask( thr_id, 1 << (thr_id % num_cpus) );
Something isn't making sense. The mask is rolling over at 32. The only way that happens is if num_cpus == 32. This is really bugging me. The only idea I have is the literal constant 1 is not being extended to 64 bits and the result of ( 1 << (thr_id % num_cpus) ) is only a 32 bit value even though num_cpus is 48. If this is the case I don't know why it worked before. But it's worth a try forcing it to 64 bits:
207c207
< if( (ncpus > 64) || ( mask & (1ULL << i) ) ) CPU_SET( i, &set );
---
> if( (ncpus > 64) || ( mask & (1UL << i) ) ) CPU_SET( i, &set );
1693c1693
< thr_id, thr_id % num_cpus, ( 1ULL << (thr_id % num_cpus) ) );
---
> thr_id, thr_id % num_cpus, ( 1 << (thr_id % num_cpus) ) );
1695c1695
< affine_to_cpu_mask( thr_id, 1ULL << (thr_id % num_cpus) );
---
> affine_to_cpu_mask( thr_id, 1 << (thr_id % num_cpus) );
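The suspicion about the literal 1 can be illustrated with a small sketch in C. A plain 1 has type int, so the shift is done in 32 bits, and shifting a 32-bit int by 32 or more is undefined behaviour in C. On x86 the hardware typically uses only the low 5 bits of the shift count, which would explain the debug log showing "mask 1" again at thread 32. The first function below only models that wrap explicitly (it does not execute the undefined shift); both function names are made up for illustration and are not cpuminer-opt code.

```c
#include <stdint.h>

/* Model of what a 32-bit shift typically does on x86, where only the low
 * 5 bits of the count are used. Written out explicitly because the real
 * expression `1 << 32` is undefined behaviour in C. */
static uint32_t mask32_x86_like(unsigned cpu)
{
    return (uint32_t)1 << (cpu & 31);
}

/* The same mask computed in 64 bits, which is what a 1ULL literal gives.
 * Valid for cpu numbers 0..63. */
static uint64_t mask64(unsigned cpu)
{
    return (uint64_t)1 << (cpu & 63);
}
```

Under this model CPU 32 maps to 0x1 and CPU 47 to 0x8000 in 32 bits, matching the debug output, while the 64-bit version gives 0x100000000 and 0x800000000000. Note that printing the 64-bit value would also need a 64-bit conversion such as PRIx64 from inttypes.h rather than %x.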
|
|
|
Using --cpu-affinity 0xffffffffffff makes it correctly run on 48 CPUs:
Well, that's something. I'll take a closer look and may provide some test code to gather more info.
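The 48-bit value 0xffffffffffff used in that test is simply "one bit per CPU". A minimal sketch in C of how to compute such a mask for any CPU count up to 64 (the helper name all_cpus_mask is hypothetical, not part of cpuminer-opt):

```c
#include <stdint.h>

/* Hypothetical helper: affinity mask with the low `ncpus` bits set.
 * The 64-CPU case is handled separately because shifting a 64-bit
 * value by 64 is undefined behaviour in C. */
static uint64_t all_cpus_mask(unsigned ncpus)
{
    if (ncpus == 0)
        return 0;
    if (ncpus >= 64)
        return UINT64_MAX;
    return ((uint64_t)1 << ncpus) - 1;
}
```

For 48 CPUs this returns 0xffffffffffff, and for 32 CPUs 0xffffffff.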
|
|
|
Tried 3.7.5 vs 3.7.6 - 3.7.5 runs on all CPUs, 3.7.6 only works on 32 CPUs and gives lower hash rates. Both versions start in the same way with -D:
[2017-12-16 01:03:56] Starting Stratum on stratum+tcp://ip:port
[2017-12-16 01:03:56] Binding thread 0 to cpu 0 (mask 1)
[2017-12-16 01:03:56] Binding thread 1 to cpu 1 (mask 2)
[2017-12-16 01:03:56] Binding thread 2 to cpu 2 (mask 4)
[2017-12-16 01:03:56] Binding thread 3 to cpu 3 (mask 8)
[2017-12-16 01:03:56] Binding thread 4 to cpu 4 (mask 10)
[2017-12-16 01:03:56] Binding thread 5 to cpu 5 (mask 20)
[2017-12-16 01:03:56] Binding thread 6 to cpu 6 (mask 40)
[2017-12-16 01:03:56] Binding thread 7 to cpu 7 (mask 80)
[2017-12-16 01:03:56] Binding thread 8 to cpu 8 (mask 100)
[2017-12-16 01:03:56] Binding thread 9 to cpu 9 (mask 200)
[2017-12-16 01:03:56] Binding thread 10 to cpu 10 (mask 400)
[2017-12-16 01:03:56] Binding thread 11 to cpu 11 (mask 800)
[2017-12-16 01:03:56] Binding thread 12 to cpu 12 (mask 1000)
[2017-12-16 01:03:56] Binding thread 13 to cpu 13 (mask 2000)
[2017-12-16 01:03:56] Binding thread 14 to cpu 14 (mask 4000)
[2017-12-16 01:03:56] Binding thread 15 to cpu 15 (mask 8000)
[2017-12-16 01:03:56] Binding thread 16 to cpu 16 (mask 10000)
[2017-12-16 01:03:56] Binding thread 17 to cpu 17 (mask 20000)
[2017-12-16 01:03:56] Binding thread 18 to cpu 18 (mask 40000)
[2017-12-16 01:03:56] Binding thread 19 to cpu 19 (mask 80000)
[2017-12-16 01:03:56] Binding thread 20 to cpu 20 (mask 100000)
[2017-12-16 01:03:56] Binding thread 21 to cpu 21 (mask 200000)
[2017-12-16 01:03:56] Binding thread 22 to cpu 22 (mask 400000)
[2017-12-16 01:03:56] Binding thread 23 to cpu 23 (mask 800000)
[2017-12-16 01:03:56] Binding thread 24 to cpu 24 (mask 1000000)
[2017-12-16 01:03:56] Binding thread 25 to cpu 25 (mask 2000000)
[2017-12-16 01:03:56] Binding thread 26 to cpu 26 (mask 4000000)
[2017-12-16 01:03:56] Binding thread 27 to cpu 27 (mask 8000000)
[2017-12-16 01:03:56] Binding thread 28 to cpu 28 (mask 10000000)
[2017-12-16 01:03:56] Binding thread 29 to cpu 29 (mask 20000000)
[2017-12-16 01:03:56] Binding thread 30 to cpu 30 (mask 40000000)
[2017-12-16 01:03:56] Binding thread 31 to cpu 31 (mask 80000000)
[2017-12-16 01:03:56] Binding thread 32 to cpu 32 (mask 1)
[2017-12-16 01:03:56] Binding thread 33 to cpu 33 (mask 2)
[2017-12-16 01:03:56] Binding thread 34 to cpu 34 (mask 4)
[2017-12-16 01:03:56] Binding thread 35 to cpu 35 (mask 8)
[2017-12-16 01:03:56] Binding thread 36 to cpu 36 (mask 10)
[2017-12-16 01:03:56] Binding thread 37 to cpu 37 (mask 20)
[2017-12-16 01:03:56] Binding thread 38 to cpu 38 (mask 40)
[2017-12-16 01:03:56] Binding thread 39 to cpu 39 (mask 80)
[2017-12-16 01:03:56] Binding thread 40 to cpu 40 (mask 100)
[2017-12-16 01:03:56] Binding thread 41 to cpu 41 (mask 200)
[2017-12-16 01:03:56] Binding thread 42 to cpu 42 (mask 400)
[2017-12-16 01:03:56] Binding thread 43 to cpu 43 (mask 800)
[2017-12-16 01:03:56] Binding thread 44 to cpu 44 (mask 1000)
[2017-12-16 01:03:56] Binding thread 45 to cpu 45 (mask 2000)
[2017-12-16 01:03:56] Binding thread 46 to cpu 46 (mask 4000)
[2017-12-16 01:03:56] 48 miner threads started, using 'lyra2z' algorithm.
[2017-12-16 01:03:56] Binding thread 47 to cpu 47 (mask 8000)
[2017-12-16 01:03:57] Stratum session id: deadbeefcafebabef76c160000000000
[2017-12-16 01:03:57] Stratum difficulty set to 10
[2017-12-16 01:03:58] Stratum difficulty set to 5
[2017-12-16 01:03:58] DEBUG: job_id='1e40' extranonce2=00000000 ntime=5a3470e6
[2017-12-16 01:03:58] Stratum difficulty set to 10 (0.03906)
(...)
Are you sure? Which version is this? The mask changed from thread 32 up, yet the debug output says it's binding to the correct CPU for threads 32 and up. I've reviewed the changes I made for 64 CPU support and they should only have an effect when there are more than 64 vcores. It's as simple as if num_cpus > 64 do something different else do as usual.
The issue must be somewhere in cpu-miner.c. If we copy cpu-miner.c from v3.7.5.tar.gz to 3.7.6 and compile it, all 48 CPUs are used. If we copy cpu-miner.c from v3.7.6.tar.gz to 3.7.5 and compile it, only 32 CPUs are used.
Are you sure the output is exactly the same? Which version is this? The mask changed from thread 32 up, yet the debug output says it's binding to the correct CPU. Since I have no way to test this I'm relying on you. Without some kind of clue my only choice is to back out the change that added support for more than 64 CPUs.
applog( LOG_DEBUG, "Binding thread %d to cpu %d (mask %x)", thr_id, thr_id % num_cpus, ( 1 << (thr_id % num_cpus) ) );
affine_to_cpu_mask( thr_id, 1 << (thr_id % num_cpus) );
Something isn't making sense. The mask is rolling over at 32. The only way that happens is if num_cpus == 32. Can you try with a 48 bit affinity mask to see what that does? --cpu-affinity 0xffffffffffff
|
|
|
Ubuntu 16.04. When I start cpuminer-opt 3.7.6 on a 48 CPU system, in top/htop I can see only 32 CPUs are used (unexpected). When I start cpuminer-opt 3.7.3 on a 48 CPU system, in top/htop I can see 48 CPUs are used (expected). Didn't try other versions. Command line:
# ./cpuminer -c config.conf -q
********** cpuminer-opt 3.7.6 ***********
A CPU miner with multi algo support and optimized for CPUs with AES_NI and AVX2 and SHA extensions.
BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
CPU: AMD EPYC 7401P 24-Core Processor .
SW built on Dec 15 2017 with GCC 5.4.0.
CPU features: SSE2 AES AVX AVX2 SHA.
SW features: SSE2 AES AVX AVX2 SHA.
Algo features: AVX AVX2.
Start mining with AVX2.
[2017-12-15 22:06:13] Starting Stratum on stratum+tcp://address:port
[2017-12-15 22:06:13] 48 miner threads started, using 'lyra2z' algorithm.
[2017-12-15 22:06:13] Stratum difficulty set to 10
config.conf:
{
"url" : "some-url:port",
"user" : "some.01234.worker",
"pass" : "pass",
"algo" : "lyra2z",
"api-bind" : 0
}
I presume this is the same system using both versions of cpuminer-opt. I've reviewed the changes I made for 64 CPU support and they should only have an effect when there are more than 64 vcores. It's as simple as if num_cpus > 64 do something different else do as usual. I can find no reason why it behaves differently or why it maxes at 32. Test both versions with -D, the debug output may provide a clue.
Edit: Also is there anything unusual about that system? Any virtualization or NUMA?
|
|
|
Does this also apply to the forgotten NVIDIA Tesla architecture? I would like to at least make the attempt, but I'm a newbie when it comes to programming languages, and there are no guides on how to start from zero. My question is a bit vague: is it possible to get some algorithms, old and new, such as "cryptonight" working on it, or is there some impediment or obstacle at the instruction level? (I know this is not the right place; I'm only asking for opinions on this.) Any opinion is appreciated.
You're asking in the wrong place. This is for CPU mining, don't expect any GPU advice here.
|
|
|
|