Bitcoin Forum
Author Topic: [LOCKED] cpuminer-opt v3.12.3, open source optimized multi-algo CPU miner  (Read 444041 times)
joblo (OP)
August 25, 2016, 08:11:56 PM
 #1061

Hi, could you tell me whether it can be built under Windows x86 32-bit? Sorry for my English.

32 bit is not supported.

AKA JayDDee, cpuminer-opt developer. https://github.com/JayDDee/cpuminer-opt
https://bitcointalk.org/index.php?topic=5226770.msg53865575#msg53865575
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
NDBob
August 25, 2016, 09:02:15 PM
 #1062

joblo ....

Redid my set of changes on a clean copy of your 3.4.3 codebase. With these changes it compiles on my Westmere CPU with -march=westmere. Here are the diffs:

$ diff miner.h miner.h.orig
49a50,56
> #ifndef min
> #define min(a,b) (a>b ? b : a)
> #endif
> #ifndef max
> #define max(a,b) (a<b ? b : a)
> #endif
>

$ diff algo/blake/decred.c algo/blake/decred.c.orig
9,10d8
< #define min(a,b) (a>b ? b : a)
<
$ diff algo/hodl/aes.c algo/hodl/aes.c.orig
85a86,87
> #ifdef __AVX__
>
149a152,178
>
> #else    // NO AVX
>
> static inline __m128i AES256Core(__m128i State, const __m128i *ExpandedKey)
> {
>         State = _mm_xor_si128(State, ExpandedKey[0]);
>
>         for(int i = 1; i < 14; ++i) State = _mm_aesenc_si128(State, ExpandedKey[i]);
>
>         return(_mm_aesenclast_si128(State, ExpandedKey[14]));
> }
>
> void AES256CBC(__m128i *Ciphertext, const __m128i *Plaintext, const __m128i *ExpandedKey, __m128i IV, uint32_t BlockCount)
> {
>         __m128i State = _mm_xor_si128(Plaintext[0], IV);
>         State = AES256Core(State, ExpandedKey);
>         Ciphertext[0] = State;
>
>         for(int i = 1; i < BlockCount; ++i)
>         {
>                 State = _mm_xor_si128(Plaintext[i], Ciphertext[i - 1]);
>                 State = AES256Core(State, ExpandedKey);
>                 Ciphertext[i] = State;
>         }
> }
>
> #endif
$ diff algo/hodl/hodl-wolf.c algo/hodl/hodl-wolf.c.orig
58a59
> #ifdef __AVX__
129a131,196
>
> #else  // no AVX
>
>     uint32_t *pdata = work->data;
>     uint32_t *ptarget = work->target;
>     uint32_t BlockHdr[22], FinalPoW[8];
>     CacheEntry *Garbage = (CacheEntry*)hodl_scratchbuf;
>     CacheEntry Cache;
>     uint32_t CollisionCount = 0;
>
>     swab32_array( BlockHdr, pdata, 20 );
>         // Search for pattern in pseudorandom data
>         int searchNumber = COMPARE_SIZE / opt_n_threads;
>         int startLoc = threadNumber * searchNumber;
>
>         for(int32_t k = startLoc; k < startLoc + searchNumber && !work_restart[threadNumber].restart; k++)
>         {
>            // copy data to first l2 cache
>            memcpy(Cache.dwords, Garbage + k, GARBAGE_SLICE_SIZE);
> #ifndef NO_AES_NI
>            for(int j = 0; j < AES_ITERATIONS; j++)
>            {
>                 CacheEntry TmpXOR;
>                 __m128i ExpKey[16];
>
>                 // use last 4 bytes of first cache as next location
>                 uint32_t nextLocation = Cache.dwords[(GARBAGE_SLICE_SIZE >> 2)
>                                    - 1] & (COMPARE_SIZE - 1); //% COMPARE_SIZE;
>
>                 // Copy data from indicated location to second l2 cache -
>                 memcpy(&TmpXOR, Garbage + nextLocation, GARBAGE_SLICE_SIZE);
>                 //XOR location data into second cache
>                 for( int i = 0; i < (GARBAGE_SLICE_SIZE >> 4); ++i )
>                    TmpXOR.dqwords[i] = _mm_xor_si128( Cache.dqwords[i],
>                                                       TmpXOR.dqwords[i] );
>                 // Key is last 32b of TmpXOR
>                 // IV is last 16b of TmpXOR
>
>                 ExpandAESKey256( ExpKey, TmpXOR.dqwords +
>                                  (GARBAGE_SLICE_SIZE / sizeof(__m128i)) - 2 );
>                 AES256CBC( Cache.dqwords, TmpXOR.dqwords, ExpKey,
>                         TmpXOR.dqwords[ (GARBAGE_SLICE_SIZE / sizeof(__m128i))
>                                                              - 1 ], 256 );
>            }
> #endif
>            // use last X bits as solution
>            if( ( Cache.dwords[ (GARBAGE_SLICE_SIZE >> 2) - 1 ]
>                                          & (COMPARE_SIZE - 1) ) < 1000 )
>            {
>               BlockHdr[20] = k;
>               BlockHdr[21] = Cache.dwords[ (GARBAGE_SLICE_SIZE >> 2) - 2 ];
>               sha256d( (uint8_t *)FinalPoW, (uint8_t *)BlockHdr, 88 );
>               CollisionCount++;
>               if( FinalPoW[7] <= ptarget[7] )
>               {
>                   pdata[20] = swab32( BlockHdr[20] );
>                   pdata[21] = swab32( BlockHdr[21] );
>                   *hashes_done = CollisionCount;
>                   return(1);
>               }
>            }
>         }
>
>     *hashes_done = CollisionCount;
>     return(0);
>
> #endif
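
The chaining in the non-AVX AES256CBC above can be illustrated without AES-NI. This is a minimal sketch only: `toy_encrypt_block` is a hypothetical stand-in (a plain XOR with the key, not AES and not secure), and `block_t`, `xor_block` and `cbc_encrypt` are names invented here. The point is the per-block indexing, where each plaintext block is XORed with the previous ciphertext block before it is encrypted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit block type, standing in for __m128i. */
typedef struct { uint32_t w[4]; } block_t;

static block_t xor_block(block_t a, block_t b)
{
    block_t r;
    for (int j = 0; j < 4; ++j)
        r.w[j] = a.w[j] ^ b.w[j];
    return r;
}

/* Toy stand-in for AES256Core: XOR with the key. NOT a real cipher. */
static block_t toy_encrypt_block(block_t state, block_t key)
{
    return xor_block(state, key);
}

/* CBC chaining: C[0] = E(P[0] ^ IV), C[i] = E(P[i] ^ C[i-1]). */
static void cbc_encrypt(block_t *ciphertext, const block_t *plaintext,
                        block_t key, block_t iv, size_t block_count)
{
    assert(ciphertext != NULL && plaintext != NULL);
    if (block_count == 0)
        return;
    ciphertext[0] = toy_encrypt_block(xor_block(plaintext[0], iv), key);
    for (size_t i = 1; i < block_count; ++i)
        ciphertext[i] = toy_encrypt_block(
            xor_block(plaintext[i], ciphertext[i - 1]), key);
}
```

Without the `[i]` index on the plaintext and ciphertext arrays the loop would not compile, since a pointer cannot be passed where a block value is expected.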
joblo (OP)
August 25, 2016, 09:52:38 PM
 #1063

joblo ....

Redid my set of changes on a clean copy of your 3.4.3 codebase. With these changes it compiles on my Westmere CPU with -march=westmere. Here are the diffs:

[snipped]


Thanks.

I'm getting flashbacks to an AMD problem. It might be that some of that code won't compile on some AMD CPUs,
which would explain the presence of the AVX hooks in aes.c. I recently read that AMD was working on SSE5 when Intel was
developing AVX. This may have created a mess with different implementations. Eventually AMD's SSE5 and Intel's AVX were merged.
This might also be related to the compile error I encountered trying to build for amdfam10; it was AVX related.

I'm going to have to dig deeper to understand all the ramifications. It could take a while. You seem to have a workaround and I know
of no other Westmere users, well, not any that complained, so I won't rush it.

For the time being I'll tighten up the check so it compiles on Westmere out of the box, but without AES performance.
The min/max issue will be fixed in the next release.
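
One possible shape for the min/max fix is sketched below. It is an assumption, not the code that shipped: it keeps the `#ifndef` guards from the submitted diff but parenthesizes every argument, because the unparenthesized form regroups an expression such as `max(1 | 4, 3)` into `1 | (4 < 3)` and yields 3 instead of 5. The helper `clamp_int` is hypothetical and only shows the macros in use.

```c
#include <assert.h>

/* Guarded so that a platform header which already defines them wins.
 * Arguments may be evaluated twice, so avoid side effects such as
 * min(i++, n). */
#ifndef min
#define min(a, b) ((a) > (b) ? (b) : (a))
#endif
#ifndef max
#define max(a, b) ((a) < (b) ? (b) : (a))
#endif

/* Hypothetical helper: clamp v into the range [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    assert(lo <= hi);
    return min(max(v, lo), hi);
}
```

Defining the macros once in a shared header, rather than per source file, avoids the duplicate definition that the decred.c diff removes.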

I hope you'll be available to test my fixes. It must be tested on appropriate HW. AMD testers would also help.

joblo (OP)
August 26, 2016, 02:28:58 AM
Last edit: August 26, 2016, 06:49:49 PM by joblo
 #1064

I have an update on supporting cryptonight at nicehash.

I implemented the changes and they seem to work and they don't break other pools so there was no need to
implement pool-specific code.

My test results on Nicehash are erratic, possibly a pool issue. I was initially submitting 20-25% rejects but that seems
to have stopped. The latest session is up to 36 accepts @ 100%, and counting.

I also experienced periods of extremely frequent thread hashrate output from one or two threads, around 100 per second, showing a hash count
of 1 with a normal hashrate. This occurred twice at startup and I killed it. It also happened mid-session and cleared itself.
This is not associated with the rejects; I still submit valid shares but they show a lower than normal hashrate.

This is what it looks like:

Code:
[2016-08-25 12:23:28] CPU #0: 1 H, 72.57 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 56.63 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 55.92 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 64.27 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 67.63 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 54.73 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 55.19 H/s
[2016-08-25 12:23:28] CPU #1: 1 H, 71.66 H/s
[2016-08-25 12:23:28] CPU #0: 1 H, 69.21 H/s

More testing to do.  

thanks for this!

I think I found the bug causing the messy output. The bug has existed for a long time but didn't seem to have an effect before.
It also wasn't specific to cryptonight or the Nicehash mod. The fix requires a small design change affecting all algos so extensive
testing will be required. If it goes smoothly I should release it in a day or so.

Edit:

The output flood is fixed but I'm still concerned about stale shares. These rejects are intermittent. Last night was not good,
with reject rates over 20% at times. Today is better at less than 5%. Sometimes it changes from session to session. A session
could be running clean but if I stop and restart it I may start producing rejects. These rejects are only produced when mining cryptonight
at Nicehash. Moneropool is always clean.

I'll poke around some more but if I don't find anything and the reject rate is manageable I'll release it as is.

Edit2:

I noticed something interesting while testing. I was mining on three CPUs and had been running clean. Then they all reported
a cluster of 3 or 4 rejects at the same time. This is too much of a coincidence, so the stale share rejects appear to be
a pool issue at Nicehash. I consider the issue closed and cryptonight support for Nicehash is ready for release.

There is one more pending issue involving Westmere CPUs. If it isn't resolved quickly I'll release cryptonight anyway.

hardkod
August 26, 2016, 09:48:11 AM
 #1065

Hello Joblo. Sorry for my poor English.

1. Does the current 3.4.3 version for Windows (1st post) support Nicehash CryptoNight? I get a lot of "stratum_recv_line failed..." errors.
2. Could you write short instructions on how to compile, install and run your miner on Ubuntu, or link to some?

Big thanks for your work.
ryen123
August 26, 2016, 10:06:11 AM
 #1066

Hello Joblo. Sorry for my poor English.

1. Does the current 3.4.3 version for Windows (1st post) support Nicehash CryptoNight? I get a lot of "stratum_recv_line failed..." errors.
2. Could you write short instructions on how to compile, install and run your miner on Ubuntu, or link to some?

Big thanks for your work.

@hardkod

v3.4.3 does not yet support cryptonight mining at nicehash.

There are instructions inside README.md in the source code for building on Linux.

From README.md
Code:
Building on linux prerequisites:

It is assumed users know how to install packages on their system and
be able to compile standard source packages. This is basic Linux and
beyond the scope of cpuminer-opt.

Make sure you have the basic development packages installed.
Here is a good start:

http://askubuntu.com/questions/457526/how-to-install-cpuminer-in-ubuntu

Install any additional dependencies needed by cpuminer-opt. The list below
are some of the ones that may not be in the default install and need to
be installed manually. There may be others, read the error messages they
will give a clue as to the missing package.

The following command should install everything you need on Debian based
packages:

sudo apt-get install build-essential libssl-dev libcurl4-openssl-dev libjansson-dev libgmp-dev automake

Building on Linux, see below for Windows.

Dependencies

build-essential  (for Ubuntu, Development Tools package group on Fedora)
automake
libjansson-dev
libgmp-dev
libcurl4-openssl-dev
libssl-dev
pthreads
zlib

tar xvzf [file.tar.gz]
cd [file]

Run build.sh to build on Linux or execute the following commands.

./autogen.sh
CFLAGS="-O3 -march=native -Wall" CXXFLAGS="$CFLAGS -std=gnu++11" ./configure --with-curl
make

Start mining.

./cpuminer -a algo ...

hardkod
August 26, 2016, 10:59:37 AM
 #1067

Thanks a lot. So I have to wait for the new release?
NDBob
August 26, 2016, 03:02:41 PM
 #1068

Joblo --

Some further testing / updates for you:  Looks like there is an issue with compiling for AMD non-AES_NI capable processors and some older Intel processors --- but it seems to exist in the pristine 3.4.3 chain as well under GCC 6.1.0 so it does not appear to be due to the diffs I've made.

I don't have all the platforms to test binaries, but I have at least been able to successfully compile for all Intel architectures back as far as core2 --- the compile errors pop back up when I try to build with -march=nocona or earlier.  AMD builds work for anything newer than barcelona/amdfam10.
joblo (OP)
August 26, 2016, 03:29:20 PM
 #1069

Joblo --

Some further testing / updates for you:  Looks like there is an issue with compiling for AMD non-AES_NI capable processors and some older Intel processors --- but it seems to exist in the pristine 3.4.3 chain as well under GCC 6.1.0 so it does not appear to be due to the diffs I've made.

I don't have all the platforms to test binaries, but I have at least been able to successfully compile for all Intel architectures back as far as core2 --- the compile errors pop back up when I try to build with -march=nocona or earlier.  AMD builds work for anything newer than barcelona/amdfam10.


Thanks, that helps. I'm still a little concerned about being unable to both compile and test on the native HW. I'm pretty confident
your changes will not negatively impact other Intel architectures while helping Westmere but I'm not so sure about AMD.

AMD and Intel diverged between SSE4 and AVX. AMD was developing their own SSE5 which was not fully compatible with Intel's AVX.
They eventually converged but there may have been a period where AMD support was not aligned with Intel. This could mean the AVX
check does not work properly on some early AES AMD CPUs. This is somewhat speculative but plausible.

What it comes down to is whether I play it safe at the expense of Westmere performance or improve Westmere for a known and contributing
user at the risk of breaking some unknown AMD users. I'm leaning toward the latter.

joblo (OP)
August 26, 2016, 03:30:52 PM
 #1070

Thanks a lot. So I have to wait for the new release?

Yes. I thought that was clear from the recent discussions in this thread.

NDBob
August 26, 2016, 03:52:29 PM
 #1071

Joblo --

Some further testing / updates for you:  Looks like there is an issue with compiling for AMD non-AES_NI capable processors and some older Intel processors --- but it seems to exist in the pristine 3.4.3 chain as well under GCC 6.1.0 so it does not appear to be due to the diffs I've made.

I don't have all the platforms to test binaries, but I have at least been able to successfully compile for all Intel architectures back as far as core2 --- the compile errors pop back up when I try to build with -march=nocona or earlier.  AMD builds work for anything newer than barcelona/amdfam10.


Thanks, that helps. I'm still a little concerned about being unable to both compile and test on the native HW. I'm pretty confident
your changes will not negatively impact other Intel architectures while helping Westmere but I'm not so sure about AMD.

AMD and Intel diverged between SSE4 and AVX. AMD was developing their own SSE5 which was not fully compatible with Intel's AVX.
They eventually converged but there may have been a period where AMD support was not aligned with Intel. This could mean the AVX
check does not work properly on some early AES AMD CPUs. This is somewhat speculative but plausible.

What it comes down to is whether I play it safe at the expense of Westmere performance or improve Westmere for a known and contributing
user at the risk of breaking some unknown AMD users. I'm leaning toward the latter.

I have some systems lying around with AMD CPUs.  I'll see what I've got that is running and run some tests if I can.
joblo (OP)
August 26, 2016, 06:43:15 PM
 #1072

Joblo --

Some further testing / updates for you:  Looks like there is an issue with compiling for AMD non-AES_NI capable processors and some older Intel processors --- but it seems to exist in the pristine 3.4.3 chain as well under GCC 6.1.0 so it does not appear to be due to the diffs I've made.

I don't have all the platforms to test binaries, but I have at least been able to successfully compile for all Intel architectures back as far as core2 --- the compile errors pop back up when I try to build with -march=nocona or earlier.  AMD builds work for anything newer than barcelona/amdfam10.


Thanks, that helps. I'm still a little concerned about being unable to both compile and test on the native HW. I'm pretty confident
your changes will not negatively impact other Intel architectures while helping Westmere but I'm not so sure about AMD.

AMD and Intel diverged between SSE4 and AVX. AMD was developing their own SSE5 which was not fully compatible with Intel's AVX.
They eventually converged but there may have been a period where AMD support was not aligned with Intel. This could mean the AVX
check does not work properly on some early AES AMD CPUs. This is somewhat speculative but plausible.

What it comes down to is whether I play it safe at the expense of Westmere performance or improve Westmere for a known and contributing
user at the risk of breaking some unknown AMD users. I'm leaning toward the latter.

I have some systems lying around with AMD CPUs.  I'll see what I've got that is running and run some tests if I can.

That would be nice.

I'm a little confused about your compile problem related to AES256CBC. The min/max issue is resolved.

Looking at the code more closely (it took a while to remember what I was thinking when I made those changes),
I realized the AVX checks were intended to separate the original Wolf AES optimizations from the recent Optiminer
AVX enhancements. I assumed all the Optiminer code required AVX so if it was not available the compiler would revert to
the original Wolf code, which was AES enhanced.

The way it is coded only one instance of AES256CBC should be compiled, either the new Optiminer version or the Wolf version.
I really would like to see your compile errors to understand this better. The code from
3.4.3 should compile the Wolf code on your CPU.

The AVX checks in hodl-wolf make the assumption that if AVX is present AES is also present. They are there to separate the
original Wolf code from the Optiminer code. The AES checks are only to prevent compile errors on non-AES CPUs. None of the
Wolf code is actually run on a non-AES CPU. Perhaps I should block it all out if AES isn't available.

The intended result is:

AES+AVX: run Optiminer modded code in hodl-wolf.c and aes.c.

AES only: run all Wolf code in hodl-wolf.c and aes.c.

no AES: run the unoptimized C++ code.
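
The three cases above could be selected at compile time roughly as follows. This is a sketch under assumptions: it relies on the `__AES__` and `__AVX__` macros that GCC and Clang predefine when the target supports those instruction sets (for example with `-march=native` or `-march=westmere`), and `hodl_code_path` and `hodl_path_uses_aes`, along with the returned strings, are hypothetical names used for illustration, not functions in cpuminer-opt.

```c
#include <assert.h>
#include <string.h>

/* Report which code path the current compiler flags would select. */
static const char *hodl_code_path(void)
{
#if defined(__AES__) && defined(__AVX__)
    return "optiminer";   /* AES + AVX: Optiminer-modified code */
#elif defined(__AES__)
    return "wolf";        /* AES only (e.g. Westmere): original Wolf code */
#else
    return "generic";     /* no AES: unoptimized fallback */
#endif
}

/* Nonzero when the selected path would use AES-NI intrinsics. */
static int hodl_path_uses_aes(void)
{
    const char *path = hodl_code_path();
    assert(path != NULL);
    return strcmp(path, "generic") != 0;
}
```

Testing AES first and AVX second means an AES-only CPU never falls through to the no-AES path just because AVX is missing.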

That was based on assumptions. You now have some actual data from a CPU with AES but not AVX.
Your data shows that only the Optiminer code in GenerateGarbageCore contains AVX code. The remainder
of the Optiminer code will run on your AES-only Westmere.

This raises another question. Is the Optiminer AES code in aes.c and scanhash_hodl_wolf faster than the corresponding
pure Wolf code? Since you weren't able to compile the code as released, it points back to understanding why it didn't
compile. Once it does you can test both and I can implement whichever is faster.

I know I'm pushy and I know it's a lot of work but it's rare to find a Westmere owner willing and able to do some dirty work.
I really appreciate your help.

joblo (OP)
August 27, 2016, 12:37:23 AM
 #1073

cpuminer 3.4.4 is released.

Source: https://drive.google.com/file/d/0B0lVSGQYLJIZcWN3ZE5ma0FWRnM/view?usp=sharing

Windows: https://drive.google.com/file/d/0B0lVSGQYLJIZdG50THdjZEo5c1U/view?usp=sharing

v3.4.4 adds support for mining cryptonight algo at nicehash with AES optimizations. Some stale share rejects have
been observed when mining cryptonight at Nicehash that don't occur at other pools. These rejects are believed to
be a pool issue.

Also fixed is a compile error when using gcc 6.1.

An interim fix for a compile error in Hodl code on Westmere CPUs was submitted. This interim fix should allow hodl
to compile, however, it will not be an optimum build. Further investigation into this issue is underway with a goal
of enabling AES on Westmere CPUs.

felixbrucker
August 27, 2016, 06:30:41 AM
 #1074

@joblo

i got a strange buffer overflow, you might know if this is miner related:

system is a Ubuntu server 16.04 LTS LXC container on proxmox (kernel 4.4.13-1-pve) able to use 2GB ram

miner got terminated, my log (stdout/err from cpuminer) displayed the following:

https://paste.felixbrucker.com/paste/avy2w
joblo (OP)
August 27, 2016, 11:07:57 AM
 #1075

@joblo

i got a strange buffer overflow, you might know if this is miner related:

system is a Ubuntu server 16.04 LTS LXC container on proxmox (kernel 4.4.13-1-pve) able to use 2GB ram

miner got terminated, my log (stdout/err from cpuminer) displayed the following:

https://paste.felixbrucker.com/paste/avy2w


I've never seen anything like this before. If it happens with all algos and only on proxmox I'd assume
it's proxmox related.

felixbrucker
August 27, 2016, 12:23:47 PM
 #1076

its also the first time i have seen this, im using ubuntu lxc container on debian (proxmox) everywhere and they are rock solid, no clue what is responsible for this.

so i suppose the printed mem map and stuff did not explain whats the issue?

cheers
joblo (OP)
August 27, 2016, 01:59:51 PM
 #1077

its also the first time i have seen this, im using ubuntu lxc container on debian (proxmox) everywhere and they are rock solid, no clue what is responsible for this.

so i suppose the printed mem map and stuff did not explain whats the issue?

cheers

It appears to have something to do with crypto but I have no idea what cpuminer code was running.

I'm also unfamiliar with how buffer overflow detection works on Linux. I didn't even know it existed and suspect
it involves special tools.

Since you have presumably similar systems that do work, the key is to find out what is different between them:
anything from the host OS, the VM config, the guest OS, compile, miner version, algo, anything that is different.
You could also try changing some variables (different algos, different cpuminer versions, etc.) to try to change the
symptoms. Deciphering backtraces is difficult, but it should be fairly easy to tell whether they are all identical. If you can
cause the symptoms to change it can lead you to what is causing it.


felixbrucker
August 27, 2016, 02:08:32 PM
 #1078

independently from this issue i noticed a medium decrease of lyra2re hashrate (not sure if other algos too) on amd cpu's using linux and the build.sh to compile the miner natively (3.4.1 vs 3.4.3/3.4.4)

fx 8320e went from 617kh/s to 550kh/s
a10-6800k went from 380kh/s to 359kh/s

current intel cpus however gained the noted slight lyra2re improvement of some 10-20kh/s

any idea why that is?

willing to test around with my setups if needed, can setup some ssh if needed

cheers

edit: this buffer overflow was the first of its kind, system setup software wise is identical on my systems, could only be hardware (old hdd)
i will just wait and see if it happens again
joblo (OP)
August 27, 2016, 02:35:05 PM
 #1079

independently from this issue i noticed a medium decrease of lyra2re hashrate (not sure if other algos too) on amd cpu's using linux and the build.sh to compile the miner natively (3.4.1 vs 3.4.3/3.4.4)

fx 8320e went from 617kh/s to 550kh/s
a10-6800k went from 380kh/s to 359kh/s

current intel cpus however gained the noted slight lyra2re improvement of some 10-20kh/s

any idea why that is?

willing to test around with my setups if needed, can setup some ssh if needed

cheers

edit: this buffer overflow was the first of its kind, system setup software wise is identical on my systems, could only be hardware (old hdd)
i will just wait and see if it happens again

When I was doing the final tweaking of lyra I noticed that in some cases the AVX code required the same number of instructions
as the scalar code or that the AVX version appeared no faster than the scalar version. In fact there was one function I did
not modify for AVX because it appeared to have no benefit. This is specific to AVX; AVX2 was always faster.

If your CPUs have only AVX it is possible the AMD implementation of it is less efficient than Intel's.

The reason for all this is the overhead in converting the data from scalar format to vector format and back again as AVX has
its own set of registers. With only a 2 to 1 gain with AVX instructions on lyra2 the AVX segment has to be big enough to overcome the
overhead. Short functions don't benefit as much.
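
The trade-off can be put into a rough cost model. This is only an illustrative sketch with assumed names and arbitrary units: `scalar_cost` is the time for the scalar version of a segment, `speedup` is the vector gain (about 2 for AVX on lyra2, per the above), and `overhead` is the fixed cost of moving data into and out of vector form. Under this model vectorizing pays off only when `scalar_cost / speedup + overhead < scalar_cost`.

```c
#include <assert.h>

/* Estimated cost of the vectorized segment: the work shrinks by
 * `speedup`, but a fixed conversion overhead is added. */
static double vector_cost(double scalar_cost, double speedup, double overhead)
{
    assert(speedup > 0.0);
    return scalar_cost / speedup + overhead;
}

/* Nonzero when vectorizing the segment is expected to be a net win. */
static int vectorizing_pays_off(double scalar_cost, double speedup,
                                double overhead)
{
    return vector_cost(scalar_cost, speedup, overhead) < scalar_cost;
}

/* Scalar cost at which the vector version breaks even:
 * s / k + o = s  =>  s = o * k / (k - 1), valid for k > 1. */
static double break_even_cost(double speedup, double overhead)
{
    assert(speedup > 1.0);
    return overhead * speedup / (speedup - 1.0);
}
```

With a 2:1 gain and an overhead of 10 units, a segment has to cost more than 20 units in scalar form before AVX helps, which matches the observation that short functions benefit least.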

If you want to see what I'm talking about perform a diff on algo/lyra2/sponge.c.

As you know the situation with AMD and AVX is confusing and I don't think I could make it work perfectly even if I fully
understood it.

felixbrucker
August 27, 2016, 03:07:24 PM
 #1080

im sorry if i did not fully understand everything, im not familiar with such low level code

avx is as fast as the scalar code and sometimes also requires the same amount of instructions, avx2 however is always faster
the cpus only have avx afaik

i took a look at the diff but im a bit lost there

whats the best thing to do in my case? if the "old" code that was faster on amd but slower on intel can not be integrated into the miner i will have to identify the slower algos and use the old version for these and newer versions for the other algos i suppose?

cheers