whotheff
|
|
July 09, 2020, 09:25:56 AM |
|
I don't know if it is the Algo or the miner, but with Yescryptr32 AMD Ryzen CPUs can barely produce anything.
|
|
|
|
webhead
Newbie
Offline
Activity: 123
Merit: 0
|
|
August 10, 2020, 05:34:14 PM |
|
hello can you add xla panthera algo many thanks.
|
|
|
|
JayDDee (OP)
|
|
August 11, 2020, 02:52:41 AM |
|
hello can you add xla panthera algo many thanks.
The algo is a modified randomx and the existing miner is a fork of xmrig. I can't do better.
|
|
|
|
BoozyTalking
Newbie
Offline
Activity: 315
Merit: 0
|
|
August 13, 2020, 12:11:31 PM |
|
Hi, JayDDee. What does this error mean:
[2020-08-13 15:08:46] JSON-RPC call failed: Block does not start with a coinbase
[2020-08-13 15:08:46] submit_upstream_work json_rpc_call failed
[2020-08-13 15:08:46] ...retry after 10 seconds
|
|
|
|
JayDDee (OP)
|
|
August 13, 2020, 02:31:01 PM |
|
Hi, JayDDee. What does this error mean:
[2020-08-13 15:08:46] JSON-RPC call failed: Block does not start with a coinbase
[2020-08-13 15:08:46] submit_upstream_work json_rpc_call failed
[2020-08-13 15:08:46] ...retry after 10 seconds
That is an error reported by the stratum server. I have never seen it before and need more context.
|
|
|
|
BoozyTalking
Newbie
Offline
Activity: 315
Merit: 0
|
|
August 17, 2020, 09:15:40 AM |
|
Solo mining QAC coin with a local wallet. It happens after a share is found.
|
|
|
|
|
JayDDee (OP)
|
|
November 06, 2020, 02:18:28 AM |
|
Comment regarding AMD Zen3 (Ryzen 5000)
Zen 3 includes VAES for 256 bit (AVX2) vectors. This is a 2 way parallel operation backported from Intel AVX512VL. All Intel CPUs require AVX512VL and VAES to implement 256 bit parallel AES.
A few bullet points:
AMD will require a different CPUID configuration to distinguish Intel VAES from AMD VAES-256.
A new compile architecture is required to generate the new 256 bit AES instructions.
There is no performance gain, with Zen AVX2 256 bit instructions are implemented internally as 2 128 bit instructions.
In cpuminer-opt all VAES is 4-way 512 bits and requires proper AVX512 support (AVX512F for 512 bit) with expected performance gains.
There is nothing to be gained in cpuminer-opt by coding for 256 bit AES or recompiling the existing code for Zen3.
|
|
|
|
sech1
|
|
November 06, 2020, 08:42:38 AM |
|
AMD will require a different CPUID configuration to distinguish Intel VAES from AMD VAES-256.
A new compile architecture is required to generate the new 256 bit AES instructions.
There is no performance gain, with Zen AVX2 256 bit instructions are implemented internally as 2 128 bit instructions.
In cpuminer-opt all VAES is 4-way 512 bits and requires proper AVX512 support (AVX512F for 512 bit) with expected performance gains.
There is nothing to be gained in cpuminer-opt by coding for 256 bit AES or recompiling the existing code for Zen3.
Same CPUID bit as far as I can see: "VAES CPUID Fn0000_0007_ECX[VAES]_x0 (bit 9)" from https://www.amd.com/system/files/TechDocs/26568.pdf
"Zen AVX2 256 bit instructions are implemented internally as 2 128 bit instructions" what? Zen2/Zen3 has full 256-bit FP unit and AES instructions run there. VAES also runs there at full 256 bit throughput.
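As a rough illustration of the CPUID point, the sketch below reads leaf 7, subleaf 0 and separates "VAES together with AVX512F" from "VAES with AVX2 only" (the Zen 3 case discussed here). The names `vaes_level`, `classify_vaes` and `detect_vaes` are hypothetical and not taken from cpuminer-opt. The bit positions assumed are VAES = ECX bit 9 (as quoted above), AVX2 = EBX bit 5 and AVX512F = EBX bit 16. It does not check that the OS saves the wider vector state, which a real detector would also need.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Hypothetical classification, not cpuminer-opt's own naming. */
typedef enum { VAES_NONE, VAES_256_ONLY, VAES_512 } vaes_level;

/* Pure decision logic on the raw leaf 7 / subleaf 0 register values. */
vaes_level classify_vaes(uint32_t ebx, uint32_t ecx)
{
    const int has_avx2    = (ebx >> 5)  & 1;
    const int has_avx512f = (ebx >> 16) & 1;
    const int has_vaes    = (ecx >> 9)  & 1;

    if (!has_vaes)   return VAES_NONE;
    if (has_avx512f) return VAES_512;       /* 512 bit vectors available */
    if (has_avx2)    return VAES_256_ONLY;  /* 256 bit vectors only      */
    return VAES_NONE;
}

/* Query the running CPU; reports VAES_NONE off x86 or when leaf 7 is missing. */
vaes_level detect_vaes(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return classify_vaes(ebx, ecx);
#endif
    return VAES_NONE;
}
```

Under these assumptions the VAES bit is the same on both vendors, and it is the accompanying AVX512F bit that tells the two cases apart.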
|
|
|
|
JayDDee (OP)
|
|
November 06, 2020, 02:12:44 PM Last edit: November 06, 2020, 07:18:15 PM by JayDDee |
|
AMD will require a different CPUID configuration to distinguish Intel VAES from AMD VAES-256.
A new compile architecture is required to generate the new 256 bit AES instructions.
There is no performance gain, with Zen AVX2 256 bit instructions are implemented internally as 2 128 bit instructions.
In cpuminer-opt all VAES is 4-way 512 bits and requires proper AVX512 support (AVX512F for 512 bit) with expected performance gains.
There is nothing to be gained in cpuminer-opt by coding for 256 bit AES or recompiling the existing code for Zen3.
Same CPUID bit as far as I can see: "VAES CPUID Fn0000_0007_ECX[VAES]_x0 (bit 9)" from https://www.amd.com/system/files/TechDocs/26568.pdf
"Zen AVX2 256 bit instructions are implemented internally as 2 128 bit instructions" what? Zen2/Zen3 has full 256-bit FP unit and AES instructions run there. VAES also runs there at full 256 bit throughput.
Throughput of AVX2 is easy to test. Just run a cpuminer-opt benchmark of the sha256t algo using cpuminer-avx vs cpuminer-avx2. The initial Zen1 AVX2 is known to be a hack that uses 128 bits internally and there's no indication in any reviews that Zen3 is any better. But a test will confirm it. I'd appreciate it if you could provide the exact feature list and do some performance testing of AVX vs AVX2 when you get yours.
Regardless, there is currently no code in cpuminer-opt that does 2 way parallel hashing, with or without AES. 2 way hashing, even on Intel, doesn't provide enough performance gain to overcome the extra overhead.
Edit: there may be some issues compiling for Zen3 with VAES. There is some code that assumes AVX512 is included if VAES is present. This will require a new release of cpuminer-opt. In addition, there is no compiler support for znver3 yet. The workaround is to compile with -march=znver2 until both are fixed. There are no performance implications because cpuminer-opt has no code that can take advantage of 256 bit VAES. Prebuilt Windows binaries are not affected, znver1 is used for the zen build.
|
|
|
|
JayDDee (OP)
|
|
November 09, 2020, 06:45:04 PM |
|
cpuminer-opt-3.15.1
Fix compile on AMD Zen3 CPUs with VAES.
Force new work immediately after solving a block solo.
https://github.com/JayDDee/cpuminer-opt
Notes for Zen3:
Zen 3 adds VAES for 256 bit vectors. Although cpuminer-opt supports VAES with 512 bit vectors it does not support 256 bit VAES. This will result in an algo's VAES optimizations not being used on Zen3.
Compilers don't yet support the new znver3 architecture flag. Using znver2 is recommended when compiling from source code. It is possible to compile for Zen3 with VAES by using "-march=znver2 -mvaes", but it makes no difference to cpuminer-opt.
Windows users should use the cpuminer-zen build on Zen3 CPUs.
Once again the question of AVX2 performance on Ryzen has surfaced, this time with a new twist: VAES. Although Zen 3 includes some significant architectural changes none of them seem to apply to the execution engine. The ease with which 256 bit VAES was added without any additional instructions suggests VAES was implemented using the existing 128 bit AES hardware. This would be consistent with the previous AVX2 implementation in Ryzen which performs 128 bit instructions internally and results in no performance increase.
No actual testing was done on a Zen3 CPU, I don't have one.
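The compile problem mentioned for Zen3, code that assumes AVX512 whenever VAES is present, can be pictured with compiler feature macros. The sketch below is not cpuminer-opt's actual source, and `vaes_vector_bits` is a hypothetical helper. It relies only on the predefined macros `__VAES__`, `__AVX512F__`, `__AVX2__` and `__AES__` that GCC and Clang set from the `-march` and `-m` flags, and it tests both features before choosing the 512 bit path.

```c
/* Hypothetical helper: width in bits of the widest AES code path that the
 * current compiler flags allow, or 0 when no AES instructions are enabled. */
int vaes_vector_bits(void)
{
#if defined(__VAES__) && defined(__AVX512F__)
    return 512;   /* 4-way parallel AES on 512 bit vectors */
#elif defined(__VAES__) && defined(__AVX2__)
    return 256;   /* 2-way parallel AES on 256 bit vectors */
#elif defined(__AES__)
    return 128;   /* classic AES-NI, one block per instruction */
#else
    return 0;     /* AES instructions not enabled for this build */
#endif
}
```

Built with "-march=znver2 -mvaes" this should report 256, and with "-march=znver2" alone 128. A check on `__VAES__` alone would wrongly take the 512 bit branch in the first case.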
|
|
|
|
sech1
|
|
November 09, 2020, 09:43:27 PM |
|
Once again the question of AVX2 performance on Ryzen has surfaced, this time with a new twist: VAES. Although Zen 3 includes some significant architectural changes none of them seem to apply to the execution engine. The ease with which 256 bit VAES was added without any additional instructions suggests VAES was implemented using the existing 128 bit AES hardware. This would be consistent with the previous AVX2 implementation in Ryzen which performs 128 bit instructions internally and results in no performance increase.
No actual testing was done on a Zen3 CPU, I don't have one.
VAES is full 256 bit in Zen3, I've tested it. VAES 256-bit instructions have the same latency/twice the throughput compared to 128-bit AES instructions. I've also tested two different VAES implementations for RandomX and they both didn't give any speedup. The AES part of RandomX is limited by AES instruction latency, not bandwidth. Edit: the measured latency was 4 cycles for both AESENC and VAESENC. Throughput was 2 instructions/cycle for both.
|
|
|
|
JayDDee (OP)
|
|
November 09, 2020, 11:07:48 PM Last edit: November 10, 2020, 05:43:56 AM by JayDDee |
|
Once again the question of AVX2 performance on Ryzen has surfaced, this time with a new twist: VAES. Although Zen 3 includes some significant architectural changes none of them seem to apply to the execution engine. The ease with which 256 bit VAES was added without any additional instructions suggests VAES was implemented using the existing 128 bit AES hardware. This would be consistent with the previous AVX2 implementation in Ryzen which performs 128 bit instructions internally and results in no performance increase.
No actual testing was done on a Zen3 CPU, I don't have one.
VAES is full 256 bit in Zen3, I've tested it. VAES 256-bit instructions have the same latency/twice the throughput compared to 128-bit AES instructions. I've also tested two different VAES implementations for RandomX and they both didn't give any speedup. The AES part of RandomX is limited by AES instruction latency, not bandwidth. Edit: the measured latency was 4 cycles for both AESENC and VAESENC. Throughput was 2 instructions/cycle for both.
Can you test AVX2 performance vs AVX? If VAES gives full 256 bit throughput I would expect AVX2 to do so as well. In previous generations of Ryzen it did not. I have to admit I'm skeptical because 256 bit VAES would be very simple to add and would only require a change to the instruction decoder to support the new opcode. Supporting 256 bit throughput is a more radical change and would require a redesigned vector execution unit that would also benefit AVX2. If AVX2 performance improved, it will convince me that Zen3 has an improved vector unit.
Edit: I ran a test to demonstrate the poor AVX2 performance:
CPU: R7-1700
OS: Ubuntu-20.04, cpuminer compiled using build-allarch.sh. Windows binaries can also be used.
Benchmark test of sha256t and blake2s using cpuminer-avx2 & cpuminer-avx. These algos are 100% SIMD and the AVX2 code is identical to AVX except for the vector size.
        sha256t      blake2s
AVX2    26.5 Mh/s    141.5 Mh/s
AVX     25.8 Mh/s    129.7 Mh/s
AVX2 should be close to double AVX. It would be nice if AMD fixed that for Zen3.
|
|
|
|
sech1
|
|
November 10, 2020, 07:00:54 AM |
|
Can you test AVX2 performance vs AVX? If VAES gives full 256 bit throughput I would expect AVX2 to do so as well. In previous generations of Ryzen it did not. Should be close to double AVX. It would be nice if AMD fixed that for Zen3.
It is full 256 bit since Zen2, no need to test it. Only Zen1/Zen+ had a 128-bit FP ALU and split AVX instructions in 2. Read: https://en.wikichip.org/wiki/amd/microarchitectures/zen_2#Key_changes_from_Zen.2B "2x wider datapath (256-bit, up from 128-bit)"
|
|
|
|
JayDDee (OP)
|
|
November 10, 2020, 01:22:49 PM |
|
Can you test AVX2 performance vs AVX? If VAES gives full 256 bit throughput I would expect AVX2 to do so as well. In previous generations of Ryzen it did not. Should be close to double AVX. It would be nice if AMD fixed that for Zen3.
It is full 256 bit since Zen2, no need to test it. Only Zen1/Zen+ had a 128-bit FP ALU and split AVX instructions in 2. Read: https://en.wikichip.org/wiki/amd/microarchitectures/zen_2#Key_changes_from_Zen.2B "2x wider datapath (256-bit, up from 128-bit)"
If I've been wrong it's because I believed what someone told me. To quote Roger Daltrey, "I won't get fooled again". Please test.
|
|
|
|
sech1
|
|
November 10, 2020, 03:15:27 PM |
|
Can you test AVX2 performance vs AVX? If VAES gives full 256 bit throughput I would expect AVX2 to do so as well. In previous generations of Ryzen it did not. Should be close to double AVX. It would be nice if AMD fixed that for Zen3.
It is full 256 bit since Zen2, no need to test it. Only Zen1/Zen+ had a 128-bit FP ALU and split AVX instructions in 2. Read: https://en.wikichip.org/wiki/amd/microarchitectures/zen_2#Key_changes_from_Zen.2B "2x wider datapath (256-bit, up from 128-bit)"
If I've been wrong it's because I believed what someone told me. To quote Roger Daltrey, "I won't get fooled again". Please test.
Official slide from AMD: https://www.techpowerup.com/review/amd-ryzen-5-3600/images/arch1.jpg
"Now supports single-op AVX256"
Ryzen 7 4700U (Zen2) laptop (lots of stuff running in there, but the difference is obvious):
cpuminer-aes-sse42 --benchmark --algo=blake2s: 106 MH/s
cpuminer-avx.exe --benchmark --algo=blake2s: 113 MH/s
cpuminer-avx2.exe --benchmark --algo=blake2s: 195 MH/s
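To put the two sets of benchmark figures in this thread side by side, a small sketch can express each as an AVX2-to-AVX ratio, where 2.0 would be ideal scaling from 128 to 256 bit vectors. The function names are made up for illustration, and the hashrates are the ones posted above for the R7-1700 (Zen1) and the Ryzen 7 4700U (Zen2).

```c
#include <stdio.h>

/* Ratio of AVX2 hashrate to AVX hashrate; 2.0 would be perfect scaling. */
double avx2_speedup(double avx_rate, double avx2_rate)
{
    return avx2_rate / avx_rate;
}

/* Print the ratios for the figures quoted in the thread (all in Mh/s). */
void report_thread_numbers(void)
{
    printf("R7-1700 sha256t : %.2fx\n", avx2_speedup(25.8, 26.5));
    printf("R7-1700 blake2s : %.2fx\n", avx2_speedup(129.7, 141.5));
    printf("4700U   blake2s : %.2fx\n", avx2_speedup(113.0, 195.0));
}
```

The Zen1 ratios come out at roughly 1.03 and 1.09, while the Zen2 ratio is roughly 1.73, which is the difference the benchmark above is pointing at.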
|
|
|
|
JayDDee (OP)
|
|
November 10, 2020, 05:10:47 PM |
|
Official slide from AMD: https://www.techpowerup.com/review/amd-ryzen-5-3600/images/arch1.jpg
"Now supports single-op AVX256"
Ryzen 7 4700U (Zen2) laptop (lots of stuff running in there, but the difference is obvious):
cpuminer-aes-sse42 --benchmark --algo=blake2s: 106 MH/s
cpuminer-avx.exe --benchmark --algo=blake2s: 113 MH/s
cpuminer-avx2.exe --benchmark --algo=blake2s: 195 MH/s
Thanks very much, the numbers are convincing. That means there would be some benefit from 256 bit VAES on Zen3. I'll look into it further. It could improve the old X algos with faster groestl, shavite and echo.
RandomX is a little different. It can benefit proportionally more with VAES512. Some of the AES sequences alternate AESENC with AESDEC so they can't be paired. VAES512 can still provide a near 2x improvement in the AES performance, while the pure AESENC or AESDEC sequences get near 4x. I don't know how much AES factors in the performance of RandomX as a whole.
|
|
|
|
sech1
|
|
November 10, 2020, 07:16:01 PM |
|
RandomX is a little different. It can benefit proportionally more with VAES512. Some of the AES sequences alternate AESENC with AESDEC so they can't be paired. VAES512 can still provide a near 2x improvement in the AES performance, while the pure AESENC or AESDEC sequences get near 4x. I don't know how much AES factors in the performance of RandomX as a whole.
RandomX is limited by AES instruction latency. Main AES loop has 8 128-bit AES instructions and runs in 4 clock cycles per iteration on Ryzen. With VAES it's 4 256-bit AES instructions but still 4 clock cycles per iteration. It can't be parallelized because each iteration depends on the previous one. AESENC/AESDEC interleaving can be worked around with some clever use of _mm256_permute2x128_si256().
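The latency argument can be checked with a very simplified steady-state model. It assumes that each AES instruction in an iteration continues its own dependency chain from the previous iteration, that the chains are independent of each other, and that the core starts a fixed number of AES instructions per cycle. `cycles_per_iteration` is a hypothetical helper, and the model ignores everything else in the loop.

```c
#include <assert.h>

/* Steady-state cycles per loop iteration for n_instr independent dependency
 * chains, each extended by one AES instruction per iteration.
 * latency   : cycles before an instruction's result can be used
 * per_cycle : AES instructions the core can start each cycle
 * The loop is bound by whichever is larger: the time to issue all the
 * instructions, or the latency each chain must wait before its next step. */
unsigned cycles_per_iteration(unsigned n_instr, unsigned latency, unsigned per_cycle)
{
    assert(per_cycle > 0);
    unsigned issue_cycles = (n_instr + per_cycle - 1) / per_cycle;  /* ceiling */
    return issue_cycles > latency ? issue_cycles : latency;
}
```

With the measured figures from this thread (latency 4, throughput 2 per cycle), `cycles_per_iteration(8, 4, 2)` and `cycles_per_iteration(4, 4, 2)` both give 4. Halving the instruction count with VAES only shortens the issue time, and the 4 cycle latency remains the limit.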
|
|
|
|
nsummy
|
|
November 10, 2020, 08:10:28 PM |
|
I have a feature/documentation request. I think it would be good to document which algos can take advantage of some of these newer CPU instruction sets. I've been mostly a GPU miner but CPU mining really intrigues me and I would like to do it as a side project. I also think documenting which algos are no longer "supported" would be beneficial, or somehow segregating them from the rest. Just looking at the algo list it's apparent that most of the included algos will never be seriously mined by a CPU. I definitely appreciate the work though, I have been using cpuminer-opt off and on for years now.
|
|
|
|
JayDDee (OP)
|
|
November 10, 2020, 11:04:09 PM |
|
RandomX is a little different. It can benefit proportionally more with VAES512. Some of the AES sequences alternate AESENC with AESDEC so they can't be paired. VAES512 can still provide a near 2x improvement in the AES performance, while the pure AESENC or AESDEC sequences get near 4x. I don't know how much AES factors in the performance of RandomX as a whole.
RandomX is limited by AES instruction latency. Main AES loop has 8 128-bit AES instructions and runs in 4 clock cycles per iteration on Ryzen. With VAES it's 4 256-bit AES instructions but still 4 clock cycles per iteration. It can't be parallelized because each iteration depends on the previous one. AESENC/AESDEC interleaving can be worked around with some clever use of _mm256_permute2x128_si256().
The extra permutes would kill the advantage of 2 way parallel AES, but yes it can be done. 4 way parallel (AVX512) might overcome the penalty. I looked at RandomX VAES a while ago but couldn't figure out how to enable AVX512 to compile with cmake. I'm not good with C++ either.
I'm playing with AVX2+VAES on my Icelake laptop. It looks like x17 will get a 7% boost by using VAES for groestl, shavite & echo. I assume similar for Zen3. If things work out the next release may include a zen3 build with VAES in addition to AVX2 & SHA.
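The thread does not spell out how the permute workaround would look, so the following is only one possible reading. If two 256 bit registers each hold an AESENC state in the low lane and an AESDEC state in the high lane, two lane permutes can regroup them into one all-encrypt register and one all-decrypt register. The sketch models the lane selection of `_mm256_permute2x128_si256` in scalar C so it runs without AVX2. `vec256`, `permute2x128_model` and `group_by_operation` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* One 256 bit register modelled as two 128 bit lanes of 16 bytes each. */
typedef struct { uint8_t lane[2][16]; } vec256;

/* Scalar model of _mm256_permute2x128_si256(a, b, imm):
 * imm bits 1:0 pick the source of the low lane, bits 5:4 the high lane
 * (0 = a.low, 1 = a.high, 2 = b.low, 3 = b.high); bit 3 or 7 zeroes the lane. */
vec256 permute2x128_model(vec256 a, vec256 b, int imm)
{
    const uint8_t *src[4] = { a.lane[0], a.lane[1], b.lane[0], b.lane[1] };
    vec256 r;
    for (int half = 0; half < 2; half++) {
        int ctl = (imm >> (4 * half)) & 0xF;
        if (ctl & 0x8)
            memset(r.lane[half], 0, 16);
        else
            memcpy(r.lane[half], src[ctl & 0x3], 16);
    }
    return r;
}

/* Regroup x = {enc0, dec0} and y = {enc1, dec1}
 * into enc = {enc0, enc1} and dec = {dec0, dec1}. */
void group_by_operation(vec256 x, vec256 y, vec256 *enc, vec256 *dec)
{
    *enc = permute2x128_model(x, y, 0x20);  /* low lanes of x and y  */
    *dec = permute2x128_model(x, y, 0x31);  /* high lanes of x and y */
}
```

Each regrouping costs two permutes, and two more would be needed to restore the original layout. That overhead is the penalty weighed above against 2 way parallel AES.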
|
|
|
|
|