Bitcoin Forum
March 18, 2019, 02:34:51 PM *
Author Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner  (Read 419059 times)
mangoo
December 16, 2017, 05:31:52 AM
 #3021

Stupid mistake, try this change in algo/lyra2/lyra2h.c line 34:

Code:
34c34
<         LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 16, 16, 16 );
---
>         LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 8, 8, 8);


I presume no news means it now works? I'd like confirmation.

With the following change it still only uses 32 CPUs:

Code:
--- algo/lyra2/lyra2h.c.orig    2017-12-14 23:28:51.000000000 +0000
+++ algo/lyra2/lyra2h.c 2017-12-16 05:29:48.295167452 +0000
@@ -31,7 +31,7 @@
         sph_blake256( &ctx_blake, input + 64, 16 );
         sph_blake256_close( &ctx_blake, hash );
 
-        LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 8, 8, 8);
+        LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 16, 16, 16);
 
     memcpy(state, hash, 32);
 }

Not sure if I should try your earlier changes as well? If so - could you send a patch in diff -u format?
starcrys
December 16, 2017, 07:25:27 AM
Last edit: December 16, 2017, 08:49:56 AM by starcrys
 #3022

Hi, I just want to check whether anyone is mining m7m on zpool. I'm having trouble connecting to the server; it keeps saying "stratum_subscribe send failed". Is it just me, or is the pool having issues? I've tried connecting to xevan on zpool and there weren't any connection problems for xevan...
Drag0g0
December 16, 2017, 08:00:28 AM
 #3023

I'm getting "stratum_recv_line failed" even with a stable connection. I tried different pools and it didn't help.

It happens every ~15 min.

I'm trying to mine Yenten.
nizzuu
December 16, 2017, 11:20:58 AM
 #3024

Pool issues, or your hardware is too slow to submit at least one share within the pool's timeout (e.g. 15 minutes for your pool), so the pool thinks you're gone. Try lowering the difficulty: use a fixed diff if the pool supports it, or a port with a lower diff.
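To check whether a port's difficulty is too high for your hardware, you can estimate the average time between shares. This is a minimal sketch, assuming the share target works out to difficulty × a fixed multiplier in hashes: 2^32 for Bitcoin-style difficulty, while many scrypt-family pools use 2^16. The multiplier varies by algo and pool, so treat it as an assumption to verify; the function names are hypothetical.

```c
/* Hypothetical helpers for estimating share frequency.
 * diff_multiplier is an assumption that depends on the algo and pool:
 * 4294967296.0 (2^32) for Bitcoin-style difficulty, 65536.0 (2^16) on
 * many scrypt-family pools. */

/* Average seconds between shares at a given share difficulty. */
double expected_share_seconds(double share_diff, double diff_multiplier,
                              double hashrate_hs)
{
    return share_diff * diff_multiplier / hashrate_hs;
}

/* Share difficulty at which the AVERAGE time between shares equals
 * timeout_s. Pick a diff well below this value. */
double max_diff_for_timeout(double timeout_s, double diff_multiplier,
                            double hashrate_hs)
{
    return timeout_s * hashrate_hs / diff_multiplier;
}
```

For example, at 1 kH/s with the 2^16 multiplier, a share difficulty of 16 averages 16 × 65536 / 1000 ≈ 1049 s (about 17.5 minutes) per share, already longer than a 15-minute window. Shares arrive randomly, so gaps much longer than the average are common; leave a wide margin.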
joblo
December 16, 2017, 02:26:20 PM
Last edit: December 16, 2017, 02:53:30 PM by joblo
 #3025

Stupid mistake, try this change in algo/lyra2/lyra2h.c line 34:

Code:
34c34
<         LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 16, 16, 16 );
---
>         LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 8, 8, 8);


I presume no news means it now works? I'd like confirmation.

With the following change it still only uses 32 CPUs:

Code:
--- algo/lyra2/lyra2h.c.orig    2017-12-14 23:28:51.000000000 +0000
+++ algo/lyra2/lyra2h.c 2017-12-16 05:29:48.295167452 +0000
@@ -31,7 +31,7 @@
         sph_blake256( &ctx_blake, input + 64, 16 );
         sph_blake256_close( &ctx_blake, hash );
 
-        LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 8, 8, 8);
+        LYRA2Z( lyra2h_matrix, hash, 32, hash, 32, hash, 32, 16, 16, 16);
 
     memcpy(state, hash, 32);
 }

Not sure if I should try your earlier changes as well? If so - could you send a patch in diff -u format?

I'm a bit confused by this post.

Your comment about still using 32 CPUs is for my previous post about using 1ULL to force it to 64 bits.
You're saying that didn't work?

The quote above is for a different problem with rejects mining the new lyra2h algo. Is that what you
are now offering to test?

Edit: I re-read your post a few more times and it appears you're saying that the Lyra2 change didn't fix
the 32 CPU limit problem you initially reported. It only (hopefully) fixes the rejects from lyra2h reported
by someone else.

This is the proposed fix for the 32 cpu limit:

Code:
@@ -204,7 +204,7 @@
    for ( uint8_t i = 0; i < ncpus; i++ )
    {
       // cpu mask
-      if( (ncpus > 64) || ( mask & (1UL << i) ) )  CPU_SET( i, &set );
+      if( (ncpus > 64) || ( mask & (1ULL << i) ) )  CPU_SET( i, &set );
    }
    if ( id == -1 )
    {
@@ -1690,9 +1690,9 @@
       {
          if (opt_debug)
             applog( LOG_DEBUG, "Binding thread %d to cpu %d (mask %x)",
-                   thr_id, thr_id % num_cpus, ( 1 << (thr_id % num_cpus) ) );
+                   thr_id, thr_id % num_cpus, ( 1ULL << (thr_id % num_cpus) ) );
 
-         affine_to_cpu_mask( thr_id, 1 << (thr_id % num_cpus) );
+         affine_to_cpu_mask( thr_id, 1ULL << (thr_id % num_cpus) );
       }
       else if (opt_affinity != -1)
       {

cpuminer-opt developer, https://bitcointalk.org/index.php?topic=1326803.0
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
ETH: 0x72122edabcae9d3f57eab0729305a425f6fef6d0
mangoo
December 16, 2017, 04:32:47 PM
 #3026

All good now - all CPUs running with this patch, thanks!
joblo
December 16, 2017, 05:12:40 PM
 #3027

Thanks for testing. I still don't understand why it worked before with -1UL (32 bit) but it's moot now.

If I get a response (or after a suitable timeout with no response) for the lyra2h fix, I will release both.
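For readers wondering why the literal's type matters: `1` is an `int` (32 bits on mainstream platforms), and `unsigned long` is 64 bits on 64-bit Linux (LP64) but only 32 bits on 64-bit Windows (LLP64). Shifting a 32-bit value by 32 or more is undefined behaviour in C, so masks covering CPUs 32 and up need a 64-bit type such as `1ULL`. The sketch below mirrors the patched test `(ncpus > 64) || ( mask & (1ULL << i) )`, but only counts and records the selected CPUs instead of calling `CPU_SET`; the function names are hypothetical, not cpuminer-opt's.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Width of unsigned long in bits: 64 on LP64 (e.g. Linux x86-64),
 * 32 on LLP64 (64-bit Windows). */
int unsigned_long_bits(void)
{
    return (int)(sizeof(unsigned long) * CHAR_BIT);
}

/* Hypothetical helper mirroring the patched affinity test.
 * Returns how many of the ncpus CPUs are selected. If selected is not
 * NULL it must hold ncpus entries and receives one flag per CPU. */
int select_cpus(uint64_t mask, int ncpus, bool *selected)
{
    int count = 0;
    for (int i = 0; i < ncpus; i++)
    {
        /* A 64-bit mask cannot describe more than 64 CPUs, so in that
         * case every CPU is selected. The || short-circuits, so
         * 1ULL << i is only evaluated when i < ncpus <= 64. */
        bool on = (ncpus > 64) || ((mask & (1ULL << i)) != 0);
        if (selected)
            selected[i] = on;
        count += on;
    }
    return count;
}
```

With `mask = 1ULL << 40` only CPU 40 is selected; writing the same shift as `1 << 40`, or as `1UL << 40` on Windows, would be undefined.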

lncm
December 16, 2017, 10:33:05 PM
 #3028

Yes it's normal and dependent on the algo. It means cpuminer-opt has no optimizations for scrypt algo.

Oh, OK, it's just it previously stated SSE2.

On another subject, I tried the 3.7.5 Windows binary on my desktop (Ryzen 1700) and all executables fail to start with:
"thread xx (random): Scrypt buffer allocation failed Fail: thread xx failed to initiate."

I noted the change in feature reporting in the release announcement.

You're out of memory.  You only have enough memory for xx -1 threads.

Thanks, fiddling around with virtual memory settings allowed it to run.

Performance is still very bad with the Ryzen CPU on scrypt: about the same as a 6-core Xeon Westmere-EP @ 2.4 GHz. Is this really the CPU's fault, or could cpuminer-opt be better optimized for the Zen architecture?

Thanks and keep up the good work!
joblo
December 16, 2017, 10:49:02 PM
 #3029

Virtual memory is slow, you need the real thing.

lncm
December 16, 2017, 11:14:54 PM
 #3030

I have 16 GB of RAM, so it shouldn't be a problem.
I had a fixed page file size; I set it to auto and it worked. Maybe a bug?
joblo
December 17, 2017, 12:53:36 AM
 #3031

You don't have enough RAM to run that many threads without using VM. Using VM is slow.
Stop arguing and do the math: N*threads.
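The math referred to above can be sketched as follows. By the scrypt definition, each hash in progress needs a scratchpad of 128 × r × N bytes; assuming the Litecoin-style parameters r = 1 and p = 1, that is 128 × N bytes. How many hashes each thread keeps in flight at once depends on the build, so `lanes_per_thread` below is a hypothetical parameter, and the function names are illustrative only.

```c
#include <stdint.h>

/* Scratchpad bytes for one thread: 128 * r * N per hash in flight.
 * lanes_per_thread is hypothetical; the real value depends on how the
 * miner interleaves hashes within a thread. */
uint64_t scrypt_thread_bytes(uint64_t n, uint64_t r, uint64_t lanes_per_thread)
{
    return 128ULL * r * n * lanes_per_thread;
}

/* Total scratchpad for all mining threads. */
uint64_t scrypt_total_bytes(uint64_t n, uint64_t r, uint64_t lanes_per_thread,
                            uint64_t threads)
{
    return scrypt_thread_bytes(n, r, lanes_per_thread) * threads;
}

/* Most threads whose scratchpads fit in budget_bytes of physical RAM. */
uint64_t scrypt_max_threads(uint64_t budget_bytes, uint64_t n, uint64_t r,
                            uint64_t lanes_per_thread)
{
    uint64_t per = scrypt_thread_bytes(n, r, lanes_per_thread);
    return per ? budget_bytes / per : 0;
}
```

With N = 1024 (the classic Litecoin value) one lane needs 128 KiB; with N = 1048576 it needs 128 MiB, so 16 such threads need 2 GiB before counting any extra lanes. Budget against free physical RAM rather than RAM plus page file, since paging is what makes it slow.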

lncm
December 17, 2017, 11:28:35 AM
 #3032

How much RAM per thread? So if I run fewer threads, could it actually be faster?

Sorry to annoy you with so many questions.

PS: Task Manager shows cpuminer with 11.5 GB of RAM allocated.
joblo
December 17, 2017, 03:45:41 PM
 #3033

Sorry to annoy you with so many questions.

You ask snap questions without thinking then you challenge my answers based on your misconceptions.

Running out of memory is a simple problem that you should be able to solve yourself.

You don't need to apologize, just try harder before asking questions. And if you do need to ask a
question about a problem you should show how you tried to solve it. You learn more that way.

joblo
December 17, 2017, 05:12:42 PM
 #3034

New release cpuminer-opt-3.7.7

Fixed regression caused by 64 CPU support.
Fixed lyra2h.

https://github.com/JayDDee/cpuminer-opt/releases/tag/v3.7.7

Larvitar
December 17, 2017, 05:16:09 PM
 #3035

cpuminer-opt-3.7.6 is released.

Added lyra2h algo for Hppcoin.
Added support for more than 64 CPUs.
Optimized shavite with AES, improves x11 etc.

Get it on git:  https://github.com/JayDDee/cpuminer-opt/releases

More detailed release notes:

Lyra2h has not been tested. It is virtually a clone of lyra2z so it should work.
Please report any problems.

Support for more than 64 CPUs is limited in that specifying --cpu-affinity has no effect.
The arg will be ignored and the default affinity will be used. This has not been
tested either, so if anyone has the ability to test it please do so and report.

There are no new 4way algos this release, but optimizing shavite came as a surprise
and helps all CPUs with AES.

The past two releases have also seen some reworking of some existing SIMD code as
I learn new techniques. It should be more efficient but not likely to produce a significant
speed up.

There are currently 2 4way blockers. BMW is blocking full optimization of x11 and blake256
is blocking m7m. I'd like to get those resolved but I'm stuck at the moment. Since m7m is
CPU only I'd like to prioritize that algo.

A few algos have 4way enabled but are either untested or have known problems that affect
performance.

Tested working: skein, keccak, keccakc, nist5, tribus.

Enabled untested: skein2, jha, whirlpool, pentablake.

Enabled with known problems: blake256 lane corruption: lyra2z, decred, blake.
These algos operate in 2way mode due to invalid hash in 2 lanes.

Kudos for you! Awesome miner Smiley
Let's get to the feedback:
I have a Ryzen 7 1700 at 3.7 GHz. 4way is around 15% slower than AES-AVX/AVX2 mining nist5: around 240 kH/s per core (8 threads) with 4way vs 270 kH/s per core with AES-AVX2. It's stable, but slower. I can get 2.1~2.2 MH/s on nist5.

I would like to see SHA enabled and working in Windows, but I saw how difficult it is. If I can help, I can let you connect to my machine to try something. I don't know anything about coding, but I want to help compile a SHA miner.
My9bot
December 17, 2017, 05:20:53 PM
 #3036

cpuminer-opt-3.7.7-sha win

https://ufile.io/mkuq4

I'm better with code than with words-SatoshiNakamoto
Espers [ESP]SiteOnBlockchain
Larvitar
December 17, 2017, 05:34:51 PM
Last edit: December 17, 2017, 05:52:25 PM by Larvitar
 #3037


Thank you!  Cheesy

EDIT:
When starting, the miner asks for libcrypto-1_1-x64.dll. Do I need it, or do I just have to rename libcrypto1.0.0.dll?

EDIT2:
Solved by installing OpenSSL 1.1 x64.
joblo
December 17, 2017, 06:09:52 PM
Last edit: December 17, 2017, 06:47:32 PM by joblo
 #3038


I have a Ryzen 7 1700 at 3.7GHz. The 4way is around 15% slower than AES-AVX/AVX2 mining nist5. Around 240KH/s per core (8 threads) to 4way and 270KH/s per core to AES-AVX2. Its working stable, but with less performance. I can get 2.1~2.2MH/s NIST5.

This is very interesting feedback.  I get 340 kH/s per thread 4way vs 255 kH/s AVX2 1way on my i7-6700K @4GHz.

Something isn't right, need lots of details to eliminate simple stuff. Can you post the startup for both?
None of the following should cause that much of a difference, but it helps to quantify.

AMD AVX2 performance is known to be slower than AVX. Try running a test with just AVX2 and again
with AVX to compare. Another, better, way to compare AVX2 vs AVX performance is lyra2rev2. It has the most
AVX2 code.

4way uses 4 times the memory of plain AVX2. This will expose any cache performance issues. Try running fewer
threads to see if performance (total, not just per thread) improves.

Try tribus algo, it's pure 4way parallel while nist5 has a serial component which reduces gain and adds some overhead.
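Comparing runs by total hashrate (per-thread rate × threads) keeps tests with different thread counts comparable. A small sketch of that arithmetic; the helper names are hypothetical:

```c
/* Total hashrate of a run from its average per-thread rate. */
double total_khs(double per_thread_khs, int threads)
{
    return per_thread_khs * threads;
}

/* Percent change of run a against baseline b; negative means a is slower. */
double percent_change(double a_khs, double b_khs)
{
    return (a_khs - b_khs) / b_khs * 100.0;
}
```

Using the figures quoted in this thread: `percent_change(340, 255)` is about +33% for 4way on the i7-6700K, while `percent_change(240, 270)` is about -11% for the Ryzen 7 1700. Assuming both Ryzen runs used 8 threads, the totals are 1920 and 2160 kH/s.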

joblo
December 17, 2017, 06:40:55 PM
 #3039


cpuminer-opt-3.7.7-sha win

https://ufile.io/mkuq4


Thanks for that. Do you have a how-to guide? I need to file it for when I finally upgrade my build environment.

With your permission I will add your link to the OP.

Larvitar
December 17, 2017, 06:53:04 PM
Last edit: December 17, 2017, 08:06:59 PM by Larvitar
 #3040


I have a Ryzen 7 1700 at 3.7GHz. The 4way is around 15% slower than AES-AVX/AVX2 mining nist5. Around 240KH/s per core (8 threads) to 4way and 270KH/s per core to AES-AVX2. Its working stable, but with less performance. I can get 2.1~2.2MH/s NIST5.

This is very interesting feedback.  I get 340 kH/s per thread 4way vs 255 kH/s AVX2 1way on my i7-6700K @4GHz.

Something isn't right, need lots of details to eliminate simple stuff. Can you post the startup for both?
None of the following should cause that much of a difference, but it helps to quantify.

AMD AVX2 performance is known to be slower than AVX. Try running a test with just AVX2 and again
with AVX to compare.

4way uses 4 times the memory of plain AVX2. This will expose any cache performance issues. Try running fewer
threads to see if performance (total, not just per thread) improves.

Try tribus algo, it's pure 4way parallel while nist5 has a serial component which reduces gain and adds some overhead.

Thanks for the reply.

About Tribus (3.7.7 version):

Tribus AVX 16 threads:
Code:
[2017-12-17 15:45:48] tribus block 449382, diff 297.717
[2017-12-17 15:45:48] CPU #3: 73.32 kH, 226.66 kH/s
[2017-12-17 15:45:48] CPU #2: 60.95 kH, 225.42 kH/s
[2017-12-17 15:45:48] CPU #1: 68.89 kH, 228.54 kH/s
[2017-12-17 15:45:48] CPU #0: 59.57 kH, 220.31 kH/s
[2017-12-17 15:45:48] CPU #7: 71.66 kH, 226.42 kH/s
[2017-12-17 15:45:48] CPU #4: 47.67 kH, 206.94 kH/s
[2017-12-17 15:45:48] CPU #14: 69.70 kH, 228.19 kH/s
[2017-12-17 15:45:48] CPU #6: 66.07 kH, 226.71 kH/s
[2017-12-17 15:45:48] CPU #12: 36.67 kH, 223.24 kH/s
[2017-12-17 15:45:48] CPU #15: 69.95 kH, 228.24 kH/s
[2017-12-17 15:45:48] CPU #11: 66.53 kH, 225.95 kH/s
[2017-12-17 15:45:48] CPU #5: 70.96 kH, 227.81 kH/s
[2017-12-17 15:45:48] CPU #10: 312.06 kH, 275.75 kH/s
[2017-12-17 15:45:48] CPU #8: 43.73 kH, 172.57 kH/s
[2017-12-17 15:45:48] CPU #9: 68.83 kH, 238.64 kH/s
[2017-12-17 15:45:48] CPU #13: 72.51 kH, 228.39 kH/s

Tribus AVX2 16 threads:
Code:
[2017-12-17 15:49:10] tribus block 449390, diff 254.451
[2017-12-17 15:49:10] CPU #4: 97.38 kH, 211.38 kH/s
[2017-12-17 15:49:10] CPU #6: 110.08 kH, 237.92 kH/s
[2017-12-17 15:49:10] CPU #7: 110.38 kH, 238.04 kH/s
[2017-12-17 15:49:10] CPU #0: 103.07 kH, 221.32 kH/s
[2017-12-17 15:49:10] CPU #1: 109.05 kH, 234.17 kH/s
[2017-12-17 15:49:10] CPU #9: 109.41 kH, 238.00 kH/s
[2017-12-17 15:49:10] CPU #8: 108.26 kH, 234.98 kH/s
[2017-12-17 15:49:10] CPU #13: 109.99 kH, 238.22 kH/s
[2017-12-17 15:49:10] CPU #5: 112.40 kH, 241.36 kH/s
[2017-12-17 15:49:10] CPU #11: 111.49 kH, 239.40 kH/s
[2017-12-17 15:49:10] CPU #3: 111.29 kH, 238.97 kH/s
[2017-12-17 15:49:10] CPU #15: 110.46 kH, 238.21 kH/s
[2017-12-17 15:49:10] CPU #2: 110.69 kH, 237.67 kH/s
[2017-12-17 15:49:10] CPU #10: 111.39 kH, 239.19 kH/s
[2017-12-17 15:49:10] CPU #14: 110.70 kH, 237.20 kH/s
[2017-12-17 15:49:10] CPU #12: 94.46 kH, 199.39 kH/s
[2017-12-17 15:49:15] CPU #12: 836.08 kH, 196.43 kH/s
[2017-12-17 15:49:15] Accepted 1/1 (100%), 2472.11 kH, 3722.47 kH/s


Tribus 4way 16 threads:
Code:
[2017-12-17 15:50:38] tribus block 449392, diff 221.049
[2017-12-17 15:50:38] CPU #0: 2552.29 kH, 340.11 kH/s
[2017-12-17 15:50:38] CPU #1: 3076.95 kH, 410.02 kH/s
[2017-12-17 15:50:38] CPU #12: 2199.45 kH, 293.25 kH/s
[2017-12-17 15:50:38] CPU #8: 2508.86 kH, 334.41 kH/s
[2017-12-17 15:50:38] CPU #14: 2807.39 kH, 374.11 kH/s
[2017-12-17 15:50:38] CPU #9: 3002.02 kH, 400.25 kH/s
[2017-12-17 15:50:38] CPU #2: 2978.50 kH, 396.85 kH/s
[2017-12-17 15:50:38] CPU #3: 2993.07 kH, 398.79 kH/s
[2017-12-17 15:50:38] CPU #5: 2997.27 kH, 399.67 kH/s
[2017-12-17 15:50:38] CPU #4: 2927.24 kH, 390.44 kH/s
[2017-12-17 15:50:38] CPU #6: 2954.16 kH, 393.72 kH/s
[2017-12-17 15:50:38] CPU #7: 2983.57 kH, 397.69 kH/s
[2017-12-17 15:50:38] CPU #11: 3005.27 kH, 400.79 kH/s
[2017-12-17 15:50:38] CPU #15: 2946.88 kH, 393.06 kH/s
[2017-12-17 15:50:38] CPU #10: 2947.45 kH, 392.77 kH/s
[2017-12-17 15:50:38] CPU #13: 2742.90 kH, 365.66 kH/s


Tribus 4way 8 threads:
Code:
[2017-12-17 17:05:32] tribus block 449483, diff 735.578
[2017-12-17 17:05:32] CPU #7: 461.65 kH, 398.07 kH/s
[2017-12-17 17:05:32] CPU #6: 460.63 kH, 398.21 kH/s
[2017-12-17 17:05:32] CPU #5: 460.43 kH, 397.70 kH/s
[2017-12-17 17:05:32] CPU #2: 460.88 kH, 397.74 kH/s
[2017-12-17 17:05:32] CPU #4: 460.51 kH, 397.76 kH/s
[2017-12-17 17:05:32] CPU #3: 460.82 kH, 398.03 kH/s
[2017-12-17 17:05:32] CPU #0: 454.80 kH, 393.86 kH/s
[2017-12-17 17:05:32] CPU #1: 463.35 kH, 399.53 kH/s

Apparently Tribus 4way likes SMT/HT here.
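Summing the per-thread lines gives the total rate that matters when comparing thread counts. This is a minimal sketch, assuming the log line format shown above (`[timestamp] CPU #n: X kH, Y kH/s`). It keeps only the latest rate per CPU, since a thread can report twice in one snapshot (CPU #12 in the AVX2 log). `MAX_LOG_CPUS` and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_LOG_CPUS 256 /* hypothetical upper bound for this sketch */

/* Sums the latest kH/s reported per CPU in lines such as
 *   [2017-12-17 17:05:32] CPU #7: 461.65 kH, 398.07 kH/s
 * Lines that don't match (block notices, share results) are skipped.
 * If nthreads is not NULL it receives the number of distinct CPUs seen. */
double sum_thread_rates(const char *const *lines, int nlines, int *nthreads)
{
    double last_rate[MAX_LOG_CPUS] = {0};
    bool seen[MAX_LOG_CPUS] = {false};
    int count = 0;
    double total = 0.0;

    for (int i = 0; i < nlines; i++)
    {
        int cpu;
        double done_kh, rate_khs;
        /* "[%*[^]]]" skips the bracketed timestamp without storing it. */
        if (sscanf(lines[i], "[%*[^]]] CPU #%d: %lf kH, %lf kH/s",
                   &cpu, &done_kh, &rate_khs) == 3
            && cpu >= 0 && cpu < MAX_LOG_CPUS)
        {
            if (!seen[cpu])
            {
                seen[cpu] = true;
                count++;
            }
            last_rate[cpu] = rate_khs; /* later lines overwrite earlier */
        }
    }
    for (int c = 0; c < MAX_LOG_CPUS; c++)
        if (seen[c])
            total += last_rate[c];
    if (nthreads)
        *nthreads = count;
    return total;
}
```

Summing the 4way snapshots posted above this way gives roughly 3181 kH/s with 8 threads and 6082 kH/s with 16 threads, nearly double, which is consistent with the observation that 4way benefits from SMT on this CPU.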