Bitcoin Forum
Author Topic: [ANN] cpuminer-opt v24.5, Optimized multi-algo CPU miner for x86_64 and AArch64  (Read 10265 times)
whotheff
July 09, 2020, 09:25:56 AM
 #141

I don't know if it is the Algo or the miner, but with Yescryptr32 AMD Ryzen CPUs can barely produce anything.

webhead
August 10, 2020, 05:34:14 PM
 #142

hello can you add xla panthera algo many thanks.
JayDDee (OP)
August 11, 2020, 02:52:41 AM
 #143

Quote from: webhead on August 10, 2020, 05:34:14 PM
hello can you add xla panthera algo many thanks.

The algo is a modified randomx and the existing miner is a fork of xmrig. I can't do better.

BoozyTalking
August 13, 2020, 12:11:31 PM
 #144

Hi, JayDDee. What does this error mean:
Code:
[2020-08-13 15:08:46] JSON-RPC call failed: Block does not start with a coinbase
[2020-08-13 15:08:46] submit_upstream_work json_rpc_call failed
[2020-08-13 15:08:46] ...retry after 10 seconds
JayDDee (OP)
August 13, 2020, 02:31:01 PM
 #145

Quote from: BoozyTalking on August 13, 2020, 12:11:31 PM
Hi, JayDDee. What does this error mean:
Code:
[2020-08-13 15:08:46] JSON-RPC call failed: Block does not start with a coinbase
[2020-08-13 15:08:46] submit_upstream_work json_rpc_call failed
[2020-08-13 15:08:46] ...retry after 10 seconds

That is an error reported by the stratum server, never seen it before. Need more context.

BoozyTalking
August 17, 2020, 09:15:40 AM
 #146

QAC coin solo mining with a local wallet.
It happens after a share is found.
JayDDee (OP)
October 02, 2020, 02:55:59 PM
 #147

cpuminer-opt-3.15.0

Fugue optimized with AES, improves many sha3 algos.
Minotaur algo optimized for all architectures.
Fixed neoscrypt BUG log.

https://github.com/JayDDee/cpuminer-opt/releases/tag/v3.15.0

JayDDee (OP)
November 06, 2020, 02:18:28 AM
 #148

Comment regarding AMD Zen3 (Ryzen 5000)

Zen 3 includes VAES for 256 bit (AVX2) vectors. This is a 2 way parallel operation backported
from Intel AVX512VL. All Intel CPUs require AVX512VL and VAES to implement 256 bit parallel AES.

A few bullet points:

AMD will require a different CPUID configuration to distinguish Intel VAES from AMD VAES-256.

A new compile architecture is required to generate the new 256 bit AES instructions.

There is no performance gain, with Zen AVX2 256 bit instructions are implemented internally as
2 128 bit instructions.

In cpuminer-opt all VAES is 4-way 512 bits and requires proper AVX512 support (AVX512F for 512 bit)
with expected performance gains.

There is nothing to be gained in cpuminer-opt by coding for 256 bit AES or recompiling the existing code for Zen3.

sech1
November 06, 2020, 08:42:38 AM
 #149

Quote from: JayDDee on November 06, 2020, 02:18:28 AM
AMD will require a different CPUID configuration to distinguish Intel VAES from AMD VAES-256.

A new compile architecture is required to generate the new 256 bit AES instructions.

There is no performance gain, with Zen AVX2 256 bit instructions are implemented internally as
2 128 bit instructions.

In cpuminer-opt all VAES is 4-way 512 bits and requires proper AVX512 support (AVX512F for 512 bit)
with expected performance gains.

There is nothing to be gained in cpuminer-opt by coding for 256 bit AES or recompiling the existing code for Zen3.

Same CPUID bit as far as I can see: "VAES CPUID Fn0000_0007_ECX[VAES]_x0 (bit 9)" from https://www.amd.com/system/files/TechDocs/26568.pdf
"Zen AVX2 256 bit instructions are implemented internally as 2 128 bit instructions" what? Zen2/Zen3 has full 256-bit FP unit and AES instructions run there. VAES also runs there at full 256 bit throughput.
JayDDee (OP)
November 06, 2020, 02:12:44 PM
Last edit: November 06, 2020, 07:18:15 PM by JayDDee
 #150

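Quote from: sech1 on November 06, 2020, 08:42:38 AM
Quote from: JayDDee on November 06, 2020, 02:18:28 AM
AMD will require a different CPUID configuration to distinguish Intel VAES from AMD VAES-256.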

A new compile architecture is required to generate the new 256 bit AES instructions.

There is no performance gain, with Zen AVX2 256 bit instructions are implemented internally as
2 128 bit instructions.

In cpuminer-opt all VAES is 4-way 512 bits and requires proper AVX512 support (AVX512F for 512 bit)
with expected performance gains.

There is nothing to be gained in cpuminer-opt by coding for 256 bit AES or recompiling the existing code for Zen3.

Same CPUID bit as far as I can see: "VAES CPUID Fn0000_0007_ECX[VAES]_x0 (bit 9)" from https://www.amd.com/system/files/TechDocs/26568.pdf
"Zen AVX2 256 bit instructions are implemented internally as 2 128 bit instructions" what? Zen2/Zen3 has full 256-bit FP unit and AES instructions run there. VAES also runs there at full 256 bit throughput.

Throughput of AVX2 is easy to test. Just run a cpuminer-opt benchmark of sha256t algo using cpuminer-avx vs cpuminer-avx2.
The initial Zen1 AVX2 is known to be a hack that uses 128 bits internally and there's no indication in any reviews that Zen3
is any better. But a test will confirm it.

I'd appreciate it if you could provide the exact feature list and do some performance testing of AVX vs AVX2 when you get yours.
Regardless, there is currently no code in cpuminer-opt that does 2 way parallel hashing, with or without AES. 2 way hashing, even
on Intel, doesn't provide enough performance gain to overcome the extra overhead.


Edit: there may be some issues compiling for Zen3 with vaes.
There is some code that assumes AVX512 is included if VAES is present. This will require a new release of cpuminer-opt.
In addition, there is no compiler support for znver3 yet.
The workaround is to compile with -march=znver2 until both are fixed.
There are no performance implications because cpuminer-opt has no code that can take advantage of 256 bit VAES.
Prebuilt Windows binaries are not affected, znver1 is used for the zen build.
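
For illustration, here is a minimal Python sketch of checking VAES and AVX512F as separate CPUID bits instead of inferring one from the other. It is not the cpuminer-opt code. The bit positions are the standard CPUID leaf 7 (sub-leaf 0) ones; the helper names and the register values are made up for the example, and requiring AVX2 for the 256 bit case is a simplifying assumption.

```python
# CPUID leaf 7, sub-leaf 0 feature bit positions.
AVX2_BIT = 5        # EBX bit 5
AVX512F_BIT = 16    # EBX bit 16
VAES_BIT = 9        # ECX bit 9


def decode_leaf7(ebx: int, ecx: int) -> dict:
    """Return the three feature flags as independent booleans."""
    return {
        "avx2": bool((ebx >> AVX2_BIT) & 1),
        "avx512f": bool((ebx >> AVX512F_BIT) & 1),
        "vaes": bool((ecx >> VAES_BIT) & 1),
    }


def widest_vaes(flags: dict) -> int:
    """Widest usable VAES vector in bits, or 0 if VAES is absent.

    Hypothetical helper: AVX512F is tested explicitly rather than
    assumed from the VAES bit. Treating AVX2 as the requirement for
    256 bit VAES is a simplification made for this sketch.
    """
    if not flags["vaes"]:
        return 0
    if flags["avx512f"]:
        return 512
    if flags["avx2"]:
        return 256
    return 0


# Made-up register values for illustration, not dumps from real CPUs.
zen3_like = decode_leaf7(ebx=1 << AVX2_BIT, ecx=1 << VAES_BIT)
icelake_like = decode_leaf7(
    ebx=(1 << AVX2_BIT) | (1 << AVX512F_BIT), ecx=1 << VAES_BIT
)
print(widest_vaes(zen3_like))     # 256
print(widest_vaes(icelake_like))  # 512
```

With the flags decoded separately, a CPU that reports VAES without AVX512F falls into the 256 bit case instead of being treated as AVX512 capable.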


JayDDee (OP)
November 09, 2020, 06:45:04 PM
 #151

cpuminer-opt-3.15.1

Fix compile on AMD Zen3 CPUs with VAES.
Force new work immediately after solving a block solo.

https://github.com/JayDDee/cpuminer-opt


Notes for Zen3:

Zen 3 adds VAES for 256 bit vectors. Although cpuminer-opt supports VAES with 512 bit vectors it does
not support 256 bit VAES. This will result in an algo's VAES optimizations not being used on Zen3.

Compilers don't yet support the new znver3 architecture flag. Using znver2 is recommended when compiling
from source code.

It is possible to compile Zen3 with VAES by using "-march=znver2 -mvaes", but it makes no difference to cpuminer-opt.

Windows users should use the cpuminer-zen build on Zen3 CPUs.

Once again the question of AVX2 performance on Ryzen has surfaced, this time with a new twist: VAES.
Although Zen 3 includes some significant architectural changes none of them seem to apply to the execution engine.
The ease with which 256 bit VAES was added without any additional instructions suggests VAES was implemented
using the existing 128 bit AES hardware. This would be consistent with the previous AVX2 implementation in Ryzen
which performs 128 bit instructions internally and results in no performance increase.

No actual testing was done on a Zen3 CPU, I don't have one.

 

sech1
November 09, 2020, 09:43:27 PM
 #152

Quote from: JayDDee on November 09, 2020, 06:45:04 PM
Once again the question of AVX2 performance on Ryzen has surfaced, this time with a new twist: VAES.
Although Zen 3 includes some significant architectural changes none of them seem to apply to the execution engine.
The ease with which 256 bit VAES was added without any additional instructions suggests VAES was implemented
using the existing 128 bit AES hardware. This would be consistent with the previous AVX2 implementation in Ryzen
which performs 128 bit instructions internally and results in no performance increase.

No actual testing was done on a Zen3 CPU, I don't have one.

VAES is full 256 bit in Zen3, I've tested it. VAES 256-bit instructions have the same latency and twice the throughput compared to 128-bit AES instructions. I've also tested two different VAES implementations for RandomX and neither gave any speedup. The AES part of RandomX is limited by AES instruction latency, not bandwidth.

Edit: the measured latency was 4 cycles for both AESENC and VAESENC. Throughput was 2 instructions/cycle for both.
JayDDee (OP)
November 09, 2020, 11:07:48 PM
Last edit: November 10, 2020, 05:43:56 AM by JayDDee
 #153

Quote from: sech1 on November 09, 2020, 09:43:27 PM
Quote from: JayDDee on November 09, 2020, 06:45:04 PM
Once again the question of AVX2 performance on Ryzen has surfaced, this time with a new twist: VAES.
Although Zen 3 includes some significant architectural changes none of them seem to apply to the execution engine.
The ease with which 256 bit VAES was added without any additional instructions suggests VAES was implemented
using the existing 128 bit AES hardware. This would be consistent with the previous AVX2 implementation in Ryzen
which performs 128 bit instructions internally and results in no performance increase.

No actual testing was done on a Zen3 CPU, I don't have one.

VAES is full 256 bit in Zen3, I've tested it. VAES 256-bit instructions have the same latency and twice the throughput compared to 128-bit AES instructions. I've also tested two different VAES implementations for RandomX and neither gave any speedup. The AES part of RandomX is limited by AES instruction latency, not bandwidth.

Edit: the measured latency was 4 cycles for both AESENC and VAESENC. Throughput was 2 instructions/cycle for both.

Can you test AVX2 performance vs AVX? If VAES gives full 256 bit throughput I would expect AVX2 to do so as well.
In previous generations of Ryzen it did not.

I have to admit I'm skeptical because 256 bit VAES would be very simple to add and would only require a change
to the instruction decoder to support the new opcode. Supporting 256 bit throughput is a more radical change and would
require a redesigned vector execution unit that would also benefit AVX2.

If AVX2 performance improved, it will convince me that Zen3 has an improved vector unit.

Edit: I ran a test to demonstrate the poor AVX2 performance:

CPU: R7-1700
OS: Ubuntu-20.04, cpuminer compiled using build-allarch.sh. Windows binaries can also be used.

Benchmark test sha256t and blake2s using cpuminer-avx2 & cpuminer-avx. These algos are 100% SIMD
and the AVX2 code is identical to AVX except for the vector size.

Code:
          sha256t        blake2s
AVX2      26.5 Mh/s    141.5 Mh/s
AVX       25.8 Mh/s    129.7 Mh/s

AVX2 should be close to double AVX. It would be nice if AMD fixed that for Zen3.
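
To put a number on "close to double", the ratios can be worked out from the table above. A small Python sketch, using only the R7-1700 figures listed (the dictionary layout and function name are just for this example):

```python
# Hash rates in Mh/s copied from the R7-1700 table above.
results = {
    "sha256t": {"avx2": 26.5, "avx": 25.8},
    "blake2s": {"avx2": 141.5, "avx": 129.7},
}


def speedup(avx2: float, avx: float) -> float:
    """AVX2 hash rate relative to AVX (2.0 would be a full doubling)."""
    return avx2 / avx


for algo, r in results.items():
    print(f"{algo}: AVX2/AVX = {speedup(r['avx2'], r['avx']):.2f}x")
# sha256t: AVX2/AVX = 1.03x
# blake2s: AVX2/AVX = 1.09x
```

So on this CPU the wider vectors give roughly 3% and 9%, nowhere near 2x.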

sech1
November 10, 2020, 07:00:54 AM
 #154

Quote from: JayDDee on November 09, 2020, 11:07:48 PM
Can you test AVX2 performance vs AVX? If VAES gives full 256 bit throughput I would expect AVX2 to do so as well.
In previous generations of Ryzen it did not.

Quote from: JayDDee on November 09, 2020, 11:07:48 PM
Should be close to double AVX. It would be nice if AMD fixed that for Zen3.

It has been full 256 bit since Zen2, no need to test it. Only Zen1/Zen+ had a 128-bit FP ALU and split AVX instructions in 2.
Read: https://en.wikichip.org/wiki/amd/microarchitectures/zen_2#Key_changes_from_Zen.2B "2x wider datapath (256-bit, up from 128-bit)"
JayDDee (OP)
November 10, 2020, 01:22:49 PM
 #155

Quote from: sech1 on November 10, 2020, 07:00:54 AM
Quote from: JayDDee on November 09, 2020, 11:07:48 PM
Can you test AVX2 performance vs AVX? If VAES gives full 256 bit throughput I would expect AVX2 to do so as well.
In previous generations of Ryzen it did not.

Quote from: JayDDee on November 09, 2020, 11:07:48 PM
Should be close to double AVX. It would be nice if AMD fixed that for Zen3.

It has been full 256 bit since Zen2, no need to test it. Only Zen1/Zen+ had a 128-bit FP ALU and split AVX instructions in 2.
Read: https://en.wikichip.org/wiki/amd/microarchitectures/zen_2#Key_changes_from_Zen.2B "2x wider datapath (256-bit, up from 128-bit)"

If I've been wrong it's because I believed what someone told me. To quote Roger Daltrey, "I won't get fooled again". Please test.

sech1
November 10, 2020, 03:15:27 PM
 #156

Quote from: JayDDee on November 10, 2020, 01:22:49 PM
Quote from: sech1 on November 10, 2020, 07:00:54 AM
Quote from: JayDDee on November 09, 2020, 11:07:48 PM
Can you test AVX2 performance vs AVX? If VAES gives full 256 bit throughput I would expect AVX2 to do so as well.
In previous generations of Ryzen it did not.

Quote from: JayDDee on November 09, 2020, 11:07:48 PM
Should be close to double AVX. It would be nice if AMD fixed that for Zen3.

It has been full 256 bit since Zen2, no need to test it. Only Zen1/Zen+ had a 128-bit FP ALU and split AVX instructions in 2.
Read: https://en.wikichip.org/wiki/amd/microarchitectures/zen_2#Key_changes_from_Zen.2B "2x wider datapath (256-bit, up from 128-bit)"

If I've been wrong it's because I believed what someone told me. To quote Roger Daltrey, "I won't get fooled again". Please test.

Official slide from AMD: https://www.techpowerup.com/review/amd-ryzen-5-3600/images/arch1.jpg
"Now supports single-op AVX256"

Ryzen 7 4700U (Zen2) laptop (lots of stuff running in there, but the difference is obvious):
cpuminer-aes-sse42 --benchmark --algo=blake2s: 106 MH/s
cpuminer-avx.exe --benchmark --algo=blake2s: 113 MH/s
cpuminer-avx2.exe --benchmark --algo=blake2s: 195 MH/s
JayDDee (OP)
November 10, 2020, 05:10:47 PM
 #157

Quote from: sech1 on November 10, 2020, 03:15:27 PM
Official slide from AMD: https://www.techpowerup.com/review/amd-ryzen-5-3600/images/arch1.jpg
"Now supports single-op AVX256"

Ryzen 7 4700U (Zen2) laptop (lots of stuff running in there, but the difference is obvious):
cpuminer-aes-sse42 --benchmark --algo=blake2s: 106 MH/s
cpuminer-avx.exe --benchmark --algo=blake2s: 113 MH/s
cpuminer-avx2.exe --benchmark --algo=blake2s: 195 MH/s

Thanks very much, the numbers are convincing.

That means there would be some benefit from 256 bit VAES on Zen3. I'll look into it further.
It could improve the old X algos with faster groestl, shavite and echo.

RandomX is a little different. It can benefit proportionally more with VAES512. Some of the AES sequences
alternate AESENC with AESDEC so they can't be paired. VAES512 can still provide a near 2x improvement
in the AES performance, while the pure AESENC or AESDEC sequences get near 4x. I don't know how much
AES factors in the performance of RandomX as a whole.

sech1
November 10, 2020, 07:16:01 PM
 #158

Quote from: JayDDee on November 10, 2020, 05:10:47 PM
RandomX is a little different. It can benefit proportionally more with VAES512. Some of the AES sequences
alternate AESENC with AESDEC so they can't be paired. VAES512 can still provide a near 2x improvement
in the AES performance, while the pure AESENC or AESDEC sequences get near 4x. I don't know how much
AES factors in the performance of RandomX as a whole.

RandomX is limited by AES instruction latency. The main AES loop has 8 128-bit AES instructions and runs in 4 clock cycles per iteration on Ryzen. With VAES it's 4 256-bit AES instructions but still 4 clock cycles per iteration. It can't be parallelized because each iteration depends on the previous one. AESENC/AESDEC interleaving can be worked around with some clever use of _mm256_permute2x128_si256().
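
The latency argument can be sketched with a simple steady-state bound. This is a rough Python model, not a measurement: it assumes each AES instruction in the loop needs only its own result from the previous iteration, and it plugs in the figures quoted earlier in the thread (4 cycle latency, 2 instructions/cycle for both AESENC and VAESENC). The function name is made up for the example.

```python
def cycles_per_iteration(n_instr: int, latency: int, per_cycle: int) -> float:
    """Steady-state lower bound on cycles per loop iteration.

    Assumes every instruction is its own dependency chain across
    iterations, so an iteration can take no less than one instruction
    latency and no less than the time needed to issue all n_instr.
    """
    issue_bound = n_instr / per_cycle
    return float(max(latency, issue_bound))


# 8 x 128-bit AES instructions vs 4 x 256-bit VAES instructions.
aes128 = cycles_per_iteration(n_instr=8, latency=4, per_cycle=2)
vaes256 = cycles_per_iteration(n_instr=4, latency=4, per_cycle=2)
print(aes128, vaes256)  # 4.0 4.0
```

Halving the instruction count lowers the issue bound from 4 to 2 cycles, but the 4 cycle latency bound stays, so the loop time does not change in this model.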
nsummy
November 10, 2020, 08:10:28 PM
 #159

I have a feature/documentation request. I think it would be good to document which algos can take advantage of some of these newer CPU instruction sets. I've been mostly a GPU miner but CPU mining really intrigues me and I would like to do it as a side project. I also think documenting which algos are no longer "supported" would be beneficial, or somehow segregating them from the rest. Just looking at the algo list it's apparent that most of the included algos will never be seriously mined by a CPU. I definitely appreciate the work though, I have been using cpuminer-opt off and on for years now.
JayDDee (OP)
November 10, 2020, 11:04:09 PM
 #160

Quote from: sech1 on November 10, 2020, 07:16:01 PM
Quote from: JayDDee on November 10, 2020, 05:10:47 PM
RandomX is a little different. It can benefit proportionally more with VAES512. Some of the AES sequences
alternate AESENC with AESDEC so they can't be paired. VAES512 can still provide a near 2x improvement
in the AES performance, while the pure AESENC or AESDEC sequences get near 4x. I don't know how much
AES factors in the performance of RandomX as a whole.

RandomX is limited by AES instruction latency. The main AES loop has 8 128-bit AES instructions and runs in 4 clock cycles per iteration on Ryzen. With VAES it's 4 256-bit AES instructions but still 4 clock cycles per iteration. It can't be parallelized because each iteration depends on the previous one. AESENC/AESDEC interleaving can be worked around with some clever use of _mm256_permute2x128_si256().

The extra permutes would kill the advantage of 2 way parallel AES, but yes, it can be done.
4 way parallel (AVX512) might overcome the penalty.

I looked at RandomX VAES a while ago but couldn't figure out how to enable AVX512 when compiling with cmake. I'm not good with C++ either.

I'm playing with AVX2+VAES on my Icelake laptop; it looks like x17 will get a 7% boost by using VAES for groestl, shavite & echo.
I assume similar for Zen3.

If things work out the next release may include a zen3 build with VAES in addition to AVX2 & SHA.
