Bitcoin Forum
Author Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner  (Read 416696 times)
joblo
Legendary
*
Offline Offline

Activity: 1148
Merit: 1050


View Profile
August 03, 2016, 05:03:00 AM
 #941

Things are looking up. I solved the alignment problem and made progress with performance. I've added 3%
to Lyra2 so far and I have a few functions left to convert. So far I've only implemented AVX2; I still have to do
AVX implementations of all functions. The improvements are to the lyra2 core so they should also help lyra2v2.

You are talking about Lyra2RE, right? So it will get sped up? Because so far tpruvot-cpuminer-multi's Lyra2RE is faster than yours by some 3.8% on my servers. I guess it might be because of -flto that I use with his but can't use with yours (doesn't compile) :(

I think you are doing something wrong; lyra2RE in cpuminer-opt v3.3.7 was 7% faster than cpuminer-multi.
In the next release it will be another 3% faster. If you have very old CPUs (i.e. Core 2) you won't get the benefits of my optimisations.

cpuminer-opt developer, https://bitcointalk.org/index.php?topic=1326803.0
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
ETH: 0x72122edabcae9d3f57eab0729305a425f6fef6d0
joblo
August 04, 2016, 02:40:46 AM
 #942

I've found more AVX2 optimizations, converted cubehash SSE2 to AVX2, improved lyra2v2 19% and X algos 3-5%.
Cubehash will probably be the biggest single AVX2 optimization next to Hodl. I don't know how much more I can find.
Optiminer's code was an inspiration from which I learned a lot.

A lot more work to do before release.

johnsmithx
Hero Member
*****
Offline Offline

Activity: 587
Merit: 507

I don't buy nor sell anything here and never will.


View Profile
August 04, 2016, 03:50:28 AM
 #943

Things are looking up. I solved the alignment problem and made progress with performance. I've added 3%
to Lyra2 so far and I have a few functions left to convert. So far I've only implemented AVX2; I still have to do
AVX implementations of all functions. The improvements are to the lyra2 core so they should also help lyra2v2.

You are talking about Lyra2RE, right? So it will get sped up? Because so far tpruvot-cpuminer-multi's Lyra2RE is faster than yours by some 3.8% on my servers. I guess it might be because of -flto that I use with his but can't use with yours (doesn't compile) :(

I think you are doing something wrong; lyra2RE in cpuminer-opt v3.3.7 was 7% faster than cpuminer-multi.
In the next release it will be another 3% faster. If you have very old CPUs (i.e. Core 2) you won't get the benefits of my optimisations.

I know you didn't mean to, but what you said was very amusing. The whole time I am talking about servers, the real servers in data centers, not some desktop PC in your home that you call "a server", and you come back at me with "core2", a 10-year-old, super-obsolete desktop CPU. Hilarious!

But maybe you are right, maybe I am doing something wrong, but it's not the hardware. I am using the up-to-date Ubuntu 16.04; if I messed something up then it must be the flags in the build.sh. Or maybe you actually did something wrong. Maybe you didn't compile tpruvot's cpuminer with -flto, so now you are competing with crippled software, because this flag does make a difference and it's not on by default: it's commented out in the build.sh, so you have to enable it.

Either way, instead of accusing each other let's try to make things better. In this spirit I made a little test for you. I picked the most powerful server AWS has to offer, the x1.32xlarge (https://aws.amazon.com/ec2/instance-types/x1/) with 128 cores and 1952 GB memory. Here are the specs:
Code:
root@xxx:~/# grep -e name -e flags /proc/cpuinfo | head -n2
model name      : Intel(R) Xeon(R) CPU E7-8880 v3 @ 2.30GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq monitor est ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm xsaveopt ida

root@xxx:~/# grep processor /proc/cpuinfo | wc -l
128

root@xxx:~/# free -h
              total        used        free      shared  buff/cache   available
Mem:           1.9T        3.3G        1.9T        9.1M        624M        1.9T
Swap:            0B          0B          0B

Now I fresh recompiled tpruvot and here is the result:

(I noticed that on many-core machines it is sometimes actually faster to use just half the threads; here on x1.32xlarge the difference is marginal, on g2.8xlarge it's quite significant. The spike at the beginning of the all-cores run I attribute to some throttling done by Amazon; it's just a VPS after all, not a dedicated server.)
Code:
root@xxx:~/tpruvot-cpuminer-multi# ./cpuminer -a lyra2re --benchmark --threads=64 | grep Total
[2016-08-04 02:44:55] Total: 8903 kH/s
[2016-08-04 02:44:59] Total: 8833 kH/s
[2016-08-04 02:45:04] Total: 8735 kH/s
[2016-08-04 02:45:09] Total: 8728 kH/s
[2016-08-04 02:45:14] Total: 8727 kH/s

root@xxx:~/tpruvot-cpuminer-multi# ./cpuminer -a lyra2re --benchmark --threads=128 | grep Total
[2016-08-04 02:45:21] Total: 11661 kH/s
[2016-08-04 02:45:25] Total: 11568 kH/s
[2016-08-04 02:45:30] Total: 8731 kH/s
[2016-08-04 02:45:35] Total: 8720 kH/s
[2016-08-04 02:45:40] Total: 8722 kH/s
[2016-08-04 02:45:45] Total: 8703 kH/s

Now let's look at yours:
Code:
CPU: Intel(R) Xeon(R) CPU E7-8880 v3 @ 2.30GHz
CPU features: SSE2 AES AVX AVX2
SW built on Aug  4 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES
Start mining with AES-AVX optimizations...

root@xxx:~/cpuminer-opt-3.3.9# ./cpuminer -a lyra2re --benchmark --threads=64 | grep Total
[2016-08-04 02:47:57] Total: 4128.77 kH, 8660.05 kH/s
[2016-08-04 02:48:01] Total: 35.19 MH, 8668.56 kH/s
[2016-08-04 02:48:06] Total: 43.34 MH, 8590.78 kH/s
[2016-08-04 02:48:11] Total: 42.95 MH, 8600.93 kH/s
[2016-08-04 02:48:16] Total: 43.00 MH, 8621.90 kH/s

root@xxx:~/cpuminer-opt-3.3.9# ./cpuminer -a lyra2re --benchmark --threads=128 | grep Total
[2016-08-04 02:48:23] Total: 8323.07 kH, 11.96 MH/s
[2016-08-04 02:48:26] Total: 11.68 MH, 12.03 MH/s
[2016-08-04 02:48:31] Total: 32.93 MH, 8544.43 kH/s
[2016-08-04 02:48:36] Total: 39.36 MH, 8531.49 kH/s
[2016-08-04 02:48:41] Total: 42.52 MH, 8531.90 kH/s
[2016-08-04 02:48:46] Total: 42.33 MH, 8536.75 kH/s
[2016-08-04 02:48:51] Total: 41.88 MH, 8536.12 kH/s
[2016-08-04 02:48:56] Total: 42.41 MH, 8526.90 kH/s

Here tpruvot is faster by over 2%.
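
The comparison above rests on averaging the steady-state `Total:` lines and taking a relative difference. A minimal sketch of that arithmetic, assuming the benchmark output was saved to a log file; `mean_khs` and `pct_faster` are hypothetical helper names, not part of either miner:

```shell
#!/bin/bash
# mean_khs SKIP: read miner "Total:" lines on stdin, drop the first SKIP
# samples as warm-up, and print the mean hashrate in kH/s.
# Handles both log styles shown above ("8903 kH/s" and "11.96 MH/s").
mean_khs() {
    awk -v skip="${1:-0}" '
        /Total:/ {
            rate = $(NF-1); unit = $NF
            if (unit == "MH/s") rate *= 1000   # normalise to kH/s
            if (++seen > skip) { sum += rate; n++ }
        }
        END { if (n) printf "%.2f\n", sum / n }'
}

# pct_faster A B: print by how many percent rate A exceeds rate B.
pct_faster() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (a - b) / b * 100 }'
}
```

For example, `mean_khs 2 < opt-128.log` would skip the two start-up spike samples of the 128-thread run (the log file name is made up here), and `pct_faster` can then be fed the two means.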

So let's look at the build.sh. Tpruvot's:
Code:
root@xxx:~/tpruvot-cpuminer-multi# cat build.sh
#!/bin/bash

if [ "$OS" = "Windows_NT" ]; then
    ./mingw64.sh
    exit 0
fi

# Linux build

make clean || echo clean

rm -f config.status
./autogen.sh || echo done

# Ubuntu 10.04 (gcc 4.4)
extracflags="-O3 -march=native -D_REENTRANT -funroll-loops -fvariable-expansion-in-unroller -fmerge-all-constants -fbranch-target-load-optimize2 -fsched2-use-superblocks -falign-loops=16 -falign-functions=16 -falign-jumps=16 -falign-labels=16"

# Debian 7.7 / Ubuntu 14.04 (gcc 4.7+)
extracflags="$extracflags -Ofast -flto -fuse-linker-plugin -ftree-loop-if-convert-stores"

if [ ! "0" = `cat /proc/cpuinfo | grep -c avx` ]; then
    # march native doesn't always works, ex. some Pentium Gxxx (no avx)
    extracflags="$extracflags -march=native"
fi

./configure --with-crypto --with-curl CFLAGS="-O3 $extracflags -march=native -DUSE_ASM -pg"

make -j $(grep processor /proc/cpuinfo | wc -l)

strip -s cpuminer

Yours ("customized" by me, but all I actually did was taking most of the flags from tpruvot's as long as it was compilable; maybe I messed it up?):
Code:
root@xxx:~/cpuminer-opt-3.3.9# cat build.sh
#!/bin/bash

#if [ "$OS" = "Windows_NT" ]; then
#    ./mingw64.sh
#    exit 0
#fi

# Linux build

make clean || echo clean

rm -f config.status
./autogen.sh || echo done

# Ubuntu 10.04 (gcc 4.4)
extracflags="-O3 -march=native -D_REENTRANT -funroll-loops -fvariable-expansion-in-unroller -fmerge-all-constants -fbranch-target-load-optimize2 -fsched2-use-superblocks -falign-loops=16 -falign-functions=16 -falign-jumps=16 -falign-labels=16"

# Debian 7.7 / Ubuntu 14.04 (gcc 4.7+)
extracflags="$extracflags -Ofast -fuse-linker-plugin -ftree-loop-if-convert-stores"

CFLAGS="-O3 $extracflags -march=native -DUSE_ASM" CXXFLAGS="$CFLAGS -std=gnu++11" ./configure --with-crypto --with-curl

make -j $(grep processor /proc/cpuinfo | wc -l)

strip -s cpuminer

Yours doesn't have -flto and -pg (it doesn't compile with either), and his doesn't have -std=gnu++11 (it doesn't compile with that either).
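
Comparing two long flag strings by eye is error-prone. A small sketch that lists the flags present in one string but missing from the other; `flags_only_in` is a hypothetical helper and the example flag strings are shortened for illustration, not the full build.sh values:

```shell
#!/bin/bash
# flags_only_in A B: print every whitespace-separated flag that appears in
# string A but not in string B, one per line, sorted.
flags_only_in() {
    comm -23 \
        <(tr -s ' ' '\n' <<<"$1" | grep . | sort -u) \
        <(tr -s ' ' '\n' <<<"$2" | grep . | sort -u)
}

# Shortened example flag sets (assumptions, for illustration only).
multi="-O3 -march=native -Ofast -flto -fuse-linker-plugin -pg"
opt="-O3 -march=native -Ofast -fuse-linker-plugin -std=gnu++11"

echo "only in multi:"; flags_only_in "$multi" "$opt"
echo "only in opt:";   flags_only_in "$opt" "$multi"
```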

Now let's see what all the -flto fuss is about. What happens if I compile tpruvot without it:
Code:
root@xxx:~/tpruvot-cpuminer-multi# ./cpuminer -a lyra2re --benchmark --threads=64 | grep Total
[2016-08-04 02:56:24] Total: 356.42 kH/s
[2016-08-04 02:56:29] Total: 352.18 kH/s
[2016-08-04 02:56:34] Total: 352.00 kH/s
[2016-08-04 02:56:39] Total: 352.03 kH/s
[2016-08-04 02:56:44] Total: 352.09 kH/s

root@xxx:~/tpruvot-cpuminer-multi# ./cpuminer -a lyra2re --benchmark --threads=128 | grep Total
[2016-08-04 02:57:29] Total: 358.48 kH/s
[2016-08-04 02:57:34] Total: 357.76 kH/s
[2016-08-04 02:57:39] Total: 357.96 kH/s

It turned into a snail. As if it couldn't manage the multiplexing or something. So let's try just 1 thread:
Code:
root@xxx:~/tpruvot-cpuminer-multi# ./cpuminer -a lyra2re --benchmark --threads=1 | grep Total
[2016-08-04 02:57:45] Total: 124.22 kH/s
[2016-08-04 02:57:50] Total: 124.66 kH/s
[2016-08-04 02:57:55] Total: 125.81 kH/s
[2016-08-04 02:58:00] Total: 126.19 kH/s
[2016-08-04 02:58:05] Total: 126.84 kH/s

And yours:
Code:
root@xxx:~/cpuminer-opt-3.3.9# ./cpuminer -a lyra2re --benchmark --threads=1 | grep Total
[2016-08-04 02:58:25] Total: 65.54 kH, 136.75 kH/s
[2016-08-04 02:58:30] Total: 683.77 kH, 138.37 kH/s
[2016-08-04 02:58:35] Total: 691.85 kH, 140.37 kH/s
[2016-08-04 02:58:40] Total: 701.86 kH, 140.28 kH/s
[2016-08-04 02:58:45] Total: 701.42 kH, 140.33 kH/s
[2016-08-04 02:58:50] Total: 701.68 kH, 140.74 kH/s

You are clearly faster if he doesn't use -flto. But if I again turn -flto back on and recompile:
Code:
root@xxx:~/tpruvot-cpuminer-multi# ./cpuminer -a lyra2re --benchmark --threads=1 | grep Total
[2016-08-04 02:59:54] Total: 138.24 kH/s
[2016-08-04 02:59:58] Total: 139.78 kH/s
[2016-08-04 03:00:03] Total: 140.41 kH/s
[2016-08-04 03:00:08] Total: 140.39 kH/s
[2016-08-04 03:00:13] Total: 141.38 kH/s
[2016-08-04 03:00:18] Total: 141.65 kH/s
[2016-08-04 03:00:23] Total: 141.64 kH/s
[2016-08-04 03:00:28] Total: 142.04 kH/s
[2016-08-04 03:00:33] Total: 141.98 kH/s

He is clearly faster after all.

Now how do you do with 8 threads?
Code:
root@xxx:~/cpuminer-opt-3.3.9# ./cpuminer -a lyra2re --benchmark --threads=8 | grep Total
[2016-08-04 03:00:53] Total: 524.29 kH, 1128.89 kH/s
[2016-08-04 03:00:58] Total: 5644.46 kH, 1130.25 kH/s
[2016-08-04 03:01:03] Total: 5651.25 kH, 1130.72 kH/s
[2016-08-04 03:01:08] Total: 5653.59 kH, 1130.44 kH/s
[2016-08-04 03:01:13] Total: 5652.21 kH, 1130.53 kH/s
[2016-08-04 03:01:18] Total: 5652.64 kH, 1130.45 kH/s

And him with -flto?
Code:
root@xxx:~/tpruvot-cpuminer-multi# ./cpuminer -a lyra2re --benchmark --threads=8 | grep Total
[2016-08-04 03:01:29] Total: 1143 kH/s
[2016-08-04 03:01:34] Total: 1144 kH/s
[2016-08-04 03:01:39] Total: 1144 kH/s
[2016-08-04 03:01:44] Total: 1144 kH/s
[2016-08-04 03:01:49] Total: 1144 kH/s
[2016-08-04 03:01:54] Total: 1145 kH/s

And him without -flto:
Code:
root@xxx:~/tpruvot-cpuminer-multi# ./cpuminer -a lyra2re --benchmark --threads=8 | grep Total
[2016-08-04 03:03:13] Total: 637.08 kH/s
[2016-08-04 03:03:17] Total: 611.28 kH/s
[2016-08-04 03:03:22] Total: 605.97 kH/s
[2016-08-04 03:03:27] Total: 605.81 kH/s
[2016-08-04 03:03:32] Total: 605.90 kH/s


I very much support your optimization effort, and whenever you need I will gladly run tests for you on various machines.

My list of 44(+1) reviewed Bitcoin forks | You don't have to download the pre-fork blockchain again for each fork! | Beware of fraudulent AWS accounts sellers and dangerous edu AWS codes! + My personal list of legit sellers and scammers | Never publicly reveal your btc addresses, ownership or any other details and stay very far away from anybody who asks you to! | The general rule of safe buying is: if the seller is a newbie, with no reputation, with no topic nor trust feedback, offering no vouches and/or selling from a locked or self-moderated topic and unwilling to go first or use escrow => AVOID. Always check the trust feedback first and make sure that you have enabled "Show untrusted feedback by default" in "Profile / Forum Profile Information".
joblo
August 04, 2016, 04:54:55 AM
 #944


I very much support your optimization effort, and whenever you need I will gladly run tests for you on various machines.

I appreciate the detailed report and the humour. It will take some time to digest it all but I have a couple of comments.
You clearly have all the optimization features in your CPU and the miner was running the optimized code, so that is eliminated.

I tested on an i7-6700K with 8 threads and saw no difference between multi with or without -flto.
Specifically I get:

multi: 920-922 kH/s
opt v3.3.9: 995 kH/s
opt v3.3.8: 930 kH/s
opt dev: 1025 kH/s

Furthermore I can compile opt with both -flto and -pg, again with no performance difference.

I took the Lyra2RE code, and a lot more, directly from multi and I don't think he has made any changes to it. I've been tinkering with
Lyra2RE for a while and only recently made any significant progress.

C++11 is required to support an algo not included in multi.

I can only speculate that your compiler version may be the issue. I am using gcc 4.8.4 and you are using 5.4.0. I seem to recall someone
else, Wolf maybe, having compile problems using a more recent version of gcc. I'm too lazy to look back in the thread to find it.
If you can find it you could compare notes.

I'll read through your report in more detail. If any ideas come to mind I'll post an update.

gimomars
Newbie
*
Offline Offline

Activity: 31
Merit: 0


View Profile
August 04, 2016, 05:55:27 AM
 #945

Hi experts, I'm trying to build in my openSUSE linux. But I've got an error:

./build.sh
make: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by make)
make: /lib64/libc.so.6: version `GLIBC_2.17' not found (required by make)
strip: 'cpuminer': No such file

Regards
johnsmithx
August 04, 2016, 08:45:32 AM
 #946

I tested on an i7-6700K with 8 threads and saw no difference between multi with or without -flto.

One thing to keep in mind is that you are using a desktop CPU and I am using a server CPU. Those Xeons are not multiple times more expensive for no reason (sure, partly it's just branding, more reliability etc., but there are also some functional differences).

Or it (the fact that -flto doesn't do anything for you) could simply be the compiler. Yours is 2 years old, mine 2 months. But the crucial information is that you can actually compile with -flto. Could you please give me the exact flags that work for you? I will make the effort to find out what the problem is on my end; I just need a starting (compilable) point.

I have no idea whether he improved the performance after you took the code from him. The very last commit is 3 days old, but which one is the last that could have had any real impact on Lyra2RE speed, or the overall speed, I am not going to investigate. But if you can remember, at least roughly, when you took his code, I can revert his tree to that date and try that version.


Hi experts, I'm trying to build in my openSUSE linux. But I've got an error:

./build.sh
make: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by make)
make: /lib64/libc.so.6: version `GLIBC_2.17' not found (required by make)
strip: 'cpuminer': No such file

Regards


This problem has nothing to do with cpuminer; your build environment is messed up. Glibc is the core Linux library; if make can't find the version it needs then there is something wrong with one of them. But if glibc were messed up the system would hardly even boot properly. Maybe try to update?

gimomars
August 04, 2016, 09:39:20 AM
 #947

 
Hi experts, I'm trying to build in my openSUSE linux. But I've got an error:

./build.sh
make: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by make)
make: /lib64/libc.so.6: version `GLIBC_2.17' not found (required by make)
strip: 'cpuminer': No such file

Regards


This problem has nothing to do with cpuminer; your build environment is messed up. Glibc is the core Linux library; if make can't find the version it needs then there is something wrong with one of them. But if glibc were messed up the system would hardly even boot properly. Maybe try to update?

Thanks for the reply. I think my Linux has an old version of glibc.

        Version information:
        /bin/sh:
                libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
                libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.8) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.11) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.3.4) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
        /lib64/libreadline.so.5:
                libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.3.4) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.11) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
johnsmithx
August 04, 2016, 12:09:33 PM
 #948

 
Hi experts, I'm trying to build in my openSUSE linux. But I've got an error:

./build.sh
make: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by make)
make: /lib64/libc.so.6: version `GLIBC_2.17' not found (required by make)
strip: 'cpuminer': No such file

Regards


This problem has nothing to do with cpuminer; your build environment is messed up. Glibc is the core Linux library; if make can't find the version it needs then there is something wrong with one of them. But if glibc were messed up the system would hardly even boot properly. Maybe try to update?

Thanks for the reply. I think my Linux has an old version of glibc.

        Version information:
        /bin/sh:
                libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
                libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.8) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.11) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.3.4) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
        /lib64/libreadline.so.5:
                libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.3.4) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.11) => /lib64/libc.so.6
                libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6


Now you are looking at which versions of glibc these two binaries (/bin/sh and /lib64/libreadline.so.5) require. To find out which version of glibc you actually have, just run the library: type /lib64/libc.so.6 and press enter. Presumably it will be equal to or higher than what /bin/sh requires and lower than what 'make' requires. But that would mean that you didn't install 'make' the standard way via the package system but somehow sideways. What did you do to your SUSE?!?
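
The check described above (installed glibc version versus the version a binary requires) can be sketched as follows. This assumes a glibc-based system with GNU coreutils and binutils; `highest_glibc_req` is a hypothetical helper name:

```shell
#!/bin/bash
# highest_glibc_req: read text containing GLIBC_x.y version tags on stdin
# (for example the output of `ldd -v` or `objdump -T`) and print the
# highest one, using version-aware sorting so that 2.11 ranks above 2.4.
highest_glibc_req() {
    grep -oE 'GLIBC_[0-9]+(\.[0-9]+)*' | sort -u -V | tail -n 1
}

# Typical use:
#   getconf GNU_LIBC_VERSION                               # version installed
#   objdump -T "$(command -v make)" | highest_glibc_req    # version make needs
```

If the second value is higher than the first, the binary was built against a newer glibc than the one on the system, which matches the `GLIBC_2.17' not found` message.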

joblo
August 04, 2016, 01:24:07 PM
 #949

I tested on an i7-6700K with 8 threads and saw no difference between multi with or without -flto.

One thing to keep in mind is that you are using a desktop CPU and I am using a server CPU. Those Xeons are not multiple times more expensive for no reason (sure, partly it's just branding, more reliability etc., but there are also some functional differences).

Or it (the fact that -flto doesn't do anything for you) could simply be the compiler. Yours is 2 years old, mine 2 months. But the crucial information is that you can actually compile with -flto. Could you please give me the exact flags that work for you? I will make the effort to find out what the problem is on my end; I just need a starting (compilable) point.

I have no idea whether he improved the performance after you took the code from him. The very last commit is 3 days old, but which one is the last that could have had any real impact on Lyra2RE speed, or the overall speed, I am not going to investigate. But if you can remember, at least roughly, when you took his code, I can revert his tree to that date and try that version.


I considered things like larger cache and a faster memory interface as likely advantages of a server-grade CPU but I can't
figure out how that would produce inconsistent results. I'm still leaning toward the compiler. My dusty memories seem to also
have LTO in them; I may have to dig back in the thread to refresh. Your compile errors might also jog some memories. I don't
think the problem was in code imported from multi, more likely one of the ugly SSE2 optimized macros. When I compiled I
used the flags from build.sh + -flto -pg.

It is an issue worth pursuing, I can't stay on the old compiler forever.

One way to determine if it's a compiler issue or CPU issue is with a VM. I don't know if you can start up a VM or boot an older
version of Ubuntu, but a direct comparison of different compilers on the same HW would help in understanding what's going on.
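
Where several gcc versions can be installed side by side, the same comparison might be done without a VM. A rough sketch, assuming it is run from the miner's source tree; the compiler names (`gcc-4.8`, `gcc-5`), the reduced CFLAGS, the 60-second run time and both function names are assumptions for illustration:

```shell
#!/bin/bash
# installed_compilers NAME...: print the compilers from the argument list
# that are actually on PATH, one per line.
installed_compilers() {
    local cc
    for cc in "$@"; do
        command -v "$cc" >/dev/null 2>&1 && echo "$cc"
    done
    return 0
}

# bench_with CC: rebuild the current source tree with one compiler and
# save 60 seconds of benchmark output to bench-CC.log.
bench_with() {
    local cc=$1 cxx=${1/gcc/g++}      # gcc-4.8 -> g++-4.8
    make clean >/dev/null 2>&1
    ./autogen.sh &&
    CC=$cc CXX=$cxx CFLAGS="-O3 -march=native" \
        ./configure --with-crypto --with-curl &&
    make -j "$(grep -c processor /proc/cpuinfo)" &&
    timeout 60 ./cpuminer -a lyra2re --benchmark > "bench-$cc.log" 2>&1
}

# Example:
#   for cc in $(installed_compilers gcc-4.8 gcc-5); do bench_with "$cc"; done
```

Keeping the flags identical across compilers is the point of the exercise; only the compiler changes between runs.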

I'm a little busy right now testing the latest AVX2 optimisations to get them released, but after that I'll look into it more.

Edit: I found the post from Wolf0 about his experience compiling cpuminer-opt with gcc 6.1.1. Does it look similar to yours?

https://bitcointalk.org/index.php?topic=1326803.msg15140799#msg15140799

joblo
August 04, 2016, 06:14:54 PM
 #950

cpuminer-opt v3.4.0 is released.

A Windows compile error introduced in v3.3.9 has been fixed. X11gost, broken since
v3.3.7, has also been fixed.

The big news is more AVX2 optimizations inspired by Optiminer's work on the Hodl algo. See OP for details.
The entire Cubehash function was converted from SSE2 to AVX2 and improved all algos that use it. Some
AVX2 optimizations were also done to the Lyra2 core, improving both Lyra2RE and Lyra2REv2. Those were
the easy ones, I don't know how much more I can find. See OP for list of improved algos.

Source:
https://drive.google.com/file/d/0B0lVSGQYLJIZbFB1WThUZ09JbVk/view?usp=sharing

clipto
Member
**
Offline Offline

Activity: 181
Merit: 10

Neironix


View Profile
August 04, 2016, 08:03:58 PM
 #951

Great stuff, will it be released for Windows too?

joblo
August 04, 2016, 08:06:41 PM
 #952

Great stuff, will it be released for Windows too?

I'm hoping so. CMB have been good; it seems they skipped v3.3.9 because it didn't compile. I hope they pick up v3.4.0.

clipto
August 04, 2016, 08:08:54 PM
 #953

3.3.7 gave me a better hashrate than 3.3.8 on Lyra2RE, so I'm still running that.
I'm looking forward to the increased performance, but I'm bound to Windows.

joblo
August 04, 2016, 08:36:49 PM
 #954

3.3.7 gave me a better hashrate than 3.3.8 on Lyra2RE, so I'm still running that.
I'm looking forward to the increased performance, but I'm bound to Windows.

There should be a slight increase between 3.3.7 and 3.3.8.

I'm suspecting data alignment issues. I've noticed different hashrates on different runs of the same version.
It doesn't seem to be related to other CPU activity.
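
Run-to-run variation like this is easier to judge with numbers. A minimal sketch that summarises a list of per-run hashrates; `rate_spread` is a hypothetical helper, and it expects one number per line in the same unit:

```shell
#!/bin/bash
# rate_spread: read one hashrate per line on stdin and print the minimum,
# maximum, mean, and the spread (max - min) as a percentage of the mean.
rate_spread() {
    awk '
        NF {
            v = $1 + 0
            if (n == 0 || v < min) min = v
            if (n == 0 || v > max) max = v
            sum += v; n++
        }
        END {
            if (n == 0) exit 1
            mean = sum / n
            printf "min=%.2f max=%.2f mean=%.2f spread=%.2f%%\n",
                   min, max, mean, (max - min) / mean * 100
        }'
}
```

A spread that stays large across repeated runs of the same binary on an otherwise idle machine would be consistent with the alignment suspicion, though it does not prove it.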

johnsmithx
August 04, 2016, 11:49:08 PM
 #955

joblo, that's an excellent improvement! Now you are definitely faster than tpruvot, at least by 4.8%.

I made a better, repeatable benchmark of tpruvot's miner; the resulting numbers are recorded as comments directly in the build.sh, above each configure line:

Code:
#!/bin/bash

if [ "$OS" = "Windows_NT" ]; then
    ./mingw64.sh
    exit 0
fi

# Linux build

make clean || echo clean

rm -f config.status
./autogen.sh || echo done

# Ubuntu 10.04 (gcc 4.4)
extracflags="-O3 -march=native -w -D_REENTRANT -funroll-loops -fvariable-expansion-in-unroller -fmerge-all-constants -fbranch-target-load-optimize2 -fsched2-use-superblocks -falign-loops=16 -falign-functions=16 -falign-jumps=16 -falign-labels=16"

# Debian 7.7 / Ubuntu 14.04 (gcc 4.7+)
extracflags="$extracflags -Ofast -fuse-linker-plugin -ftree-loop-if-convert-stores"

if [ ! "0" = `cat /proc/cpuinfo | grep -c avx` ]; then
    # march native doesn't always works, ex. some Pentium Gxxx (no avx)
    extracflags="$extracflags -march=native"
fi


# Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz 4 threads (d2.xlarge)


#309-310
#CFLAGS="-O3 $extracflags -flto -march=native -DUSE_ASM -pg" ./configure --with-crypto --with-curl

#311-312
CFLAGS="-O3 $extracflags -flto -march=native -DUSE_ASM -pg" CXXFLAGS="-std=gnu++11" ./configure --with-crypto --with-curl

#281
#CFLAGS="-O3 $extracflags -flto -march=native -DUSE_ASM -pg" CXXFLAGS="$CFLAGS" ./configure --with-crypto --with-curl

#280
#CFLAGS="-O3 $extracflags -flto -march=native -DUSE_ASM -pg" CXXFLAGS="$CFLAGS -std=gnu++11" ./configure --with-crypto --with-curl


#269
#CFLAGS="-O3 $extracflags -march=native -DUSE_ASM -pg" ./configure --with-crypto --with-curl

#264
#CFLAGS="-O3 $extracflags -march=native -DUSE_ASM -pg" CXXFLAGS="-std=gnu++11" ./configure --with-crypto --with-curl

#242
#CFLAGS="-O3 $extracflags -march=native -DUSE_ASM -pg" CXXFLAGS="$CFLAGS" ./configure --with-crypto --with-curl

#245
#CFLAGS="-O3 $extracflags -march=native -DUSE_ASM -pg" CXXFLAGS="$CFLAGS -std=gnu++11" ./configure --with-crypto --with-curl


make -j $(grep processor /proc/cpuinfo | wc -l)

strip -s cpuminer


So with his I get 312 at best on this particular machine, and that flag configuration is basically the default if you uncomment everything. So I didn't make his any faster; I just proved it can be much slower if the wrong flags are used.

Now with yours, without any change, untouched cpuminer-opt-3.4.0.tar.gz, I get this:

Code:
root@xxx:~/cpuminer-opt# ./cpuminer -a lyra2re --benchmark

CPU: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
CPU features: SSE2 AES AVX AVX2
SW built on Aug  4 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Start mining with SSE2 AES AVX AVX2

[2016-08-04 23:24:25] 4 miner threads started, using 'lyra2re' algorithm.
[2016-08-04 23:24:26] CPU #1: 65.54 kH, 82.16 kH/s
[2016-08-04 23:24:26] CPU #0: 65.54 kH, 81.67 kH/s
[2016-08-04 23:24:26] CPU #3: 65.54 kH, 81.84 kH/s
[2016-08-04 23:24:26] Total: 196.61 kH, 245.67 kH/s
[2016-08-04 23:24:26] CPU #2: 65.54 kH, 81.68 kH/s
[2016-08-04 23:24:30] CPU #0: 326.68 kH, 81.80 kH/s
[2016-08-04 23:24:30] CPU #3: 327.37 kH, 82.00 kH/s
[2016-08-04 23:24:30] Total: 785.13 kH, 327.64 kH/s
[2016-08-04 23:24:30] CPU #2: 326.73 kH, 81.75 kH/s
[2016-08-04 23:24:30] CPU #1: 328.64 kH, 82.01 kH/s
[2016-08-04 23:24:35] CPU #0: 409.02 kH, 81.78 kH/s
[2016-08-04 23:24:35] CPU #3: 409.99 kH, 81.93 kH/s
[2016-08-04 23:24:35] Total: 1474.38 kH, 327.46 kH/s
[2016-08-04 23:24:35] CPU #2: 408.76 kH, 81.73 kH/s
[2016-08-04 23:24:35] CPU #1: 410.04 kH, 81.94 kH/s
[2016-08-04 23:24:40] CPU #0: 408.89 kH, 81.78 kH/s
[2016-08-04 23:24:40] CPU #3: 409.63 kH, 81.94 kH/s
[2016-08-04 23:24:40] Total: 1637.32 kH, 327.39 kH/s

But when I add -flto I get the following error at the final link:

Code:
g++  -O3 -march=native -w -flto -std=gnu++11 -Lyes/lib  -Lyes/lib  -o cpuminer cpuminer-cpu-miner.o cpuminer-util.o cpuminer-uint256.o cpuminer-api.o cpuminer-sysinfos.o cpuminer-algo-gate-api.o algo/groestl/cpuminer-sph_groestl.o algo/skein/cpuminer-sph_skein.o algo/bmw/cpuminer-sph_bmw.o algo/shavite/cpuminer-sph_shavite.o algo/shavite/cpuminer-shavite.o algo/echo/cpuminer-sph_echo.o algo/blake/cpuminer-sph_blake.o algo/heavy/cpuminer-sph_hefty1.o algo/blake/cpuminer-mod_blakecoin.o algo/luffa/cpuminer-sph_luffa.o algo/cubehash/cpuminer-sph_cubehash.o algo/simd/cpuminer-sph_simd.o algo/hamsi/cpuminer-sph_hamsi.o algo/fugue/cpuminer-sph_fugue.o algo/gost/cpuminer-sph_gost.o algo/jh/cpuminer-sph_jh.o algo/keccak/cpuminer-sph_keccak.o algo/keccak/cpuminer-keccak.o algo/sha3/cpuminer-sph_sha2.o algo/sha3/cpuminer-sph_sha2big.o algo/shabal/cpuminer-sph_shabal.o algo/whirlpool/cpuminer-sph_whirlpool.o crypto/cpuminer-blake2s.o crypto/cpuminer-oaes_lib.o crypto/cpuminer-c_keccak.o crypto/cpuminer-c_groestl.o crypto/cpuminer-c_blake256.o crypto/cpuminer-c_jh.o crypto/cpuminer-c_skein.o crypto/cpuminer-hash.o crypto/cpuminer-aesb.o crypto/cpuminer-magimath.o algo/argon2/cpuminer-argon2a.o algo/argon2/ar2/cpuminer-argon2.o algo/argon2/ar2/cpuminer-opt.o algo/argon2/ar2/cpuminer-cores.o algo/argon2/ar2/cpuminer-ar2-scrypt-jane.o algo/argon2/ar2/cpuminer-blake2b.o algo/cpuminer-axiom.o algo/blake/cpuminer-blake.o algo/blake/cpuminer-blake2.o algo/blake/cpuminer-blakecoin.o algo/blake/cpuminer-decred.o algo/blake/cpuminer-pentablake.o algo/bmw/cpuminer-bmw256.o algo/cubehash/sse2/cpuminer-cubehash_sse2.o algo/cryptonight/cpuminer-cryptolight.o algo/cryptonight/cpuminer-cryptonight-common.o algo/cryptonight/cpuminer-cryptonight-aesni.o algo/cryptonight/cpuminer-cryptonight.o algo/cpuminer-drop.o algo/echo/aes_ni/cpuminer-hash.o algo/cpuminer-fresh.o algo/groestl/cpuminer-groestl.o algo/groestl/cpuminer-myr-groestl.o algo/groestl/sse2/cpuminer-grso.o 
algo/groestl/sse2/cpuminer-grso-asm.o algo/groestl/aes_ni/cpuminer-hash-groestl.o algo/groestl/aes_ni/cpuminer-hash-groestl256.o algo/haval/cpuminer-haval.o algo/heavy/cpuminer-heavy.o algo/heavy/cpuminer-bastion.o algo/cpuminer-hmq1725.o algo/hodl/cpuminer-hodl.o algo/hodl/cpuminer-hodl-gate.o algo/hodl/cpuminer-hodl_arith_uint256.o algo/hodl/cpuminer-hodl_uint256.o algo/hodl/cpuminer-hash.o algo/hodl/cpuminer-hmac_sha512.o algo/hodl/cpuminer-sha256.o algo/hodl/cpuminer-sha512.o algo/hodl/cpuminer-utilstrencodings.o algo/hodl/cpuminer-hodl-wolf.o algo/hodl/cpuminer-aes.o algo/hodl/cpuminer-sha512_avx.o algo/hodl/cpuminer-sha512_avx2.o algo/cpuminer-lbry.o algo/luffa/cpuminer-luffa.o algo/luffa/sse2/cpuminer-luffa_for_sse2.o algo/lyra2/cpuminer-lyra2.o algo/lyra2/cpuminer-sponge.o algo/lyra2/cpuminer-lyra2rev2.o algo/lyra2/cpuminer-lyra2re.o algo/keccak/sse2/cpuminer-keccak.o algo/cpuminer-m7m.o algo/cpuminer-neoscrypt.o algo/cpuminer-nist5.o algo/cpuminer-pluck.o algo/quark/cpuminer-quark.o algo/qubit/cpuminer-qubit.o algo/ripemd/cpuminer-sph_ripemd.o algo/cpuminer-scrypt.o algo/scryptjane/cpuminer-scrypt-jane.o algo/sha2/cpuminer-sha2.o algo/simd/sse2/cpuminer-nist.o algo/simd/sse2/cpuminer-vector.o algo/skein/cpuminer-skein.o algo/skein/cpuminer-skein2.o algo/cpuminer-s3.o algo/tiger/cpuminer-sph_tiger.o algo/whirlpool/cpuminer-whirlpool.o algo/whirlpool/cpuminer-whirlpoolx.o algo/x11/cpuminer-x11.o algo/x11/cpuminer-x11evo.o algo/x11/cpuminer-x11gost.o algo/x11/cpuminer-c11.o algo/x13/cpuminer-x13.o algo/x14/cpuminer-x14.o algo/x15/cpuminer-x15.o algo/x17/cpuminer-x17.o algo/yescrypt/cpuminer-yescrypt.o algo/yescrypt/cpuminer-yescrypt-common.o algo/yescrypt/cpuminer-sha256_Y.o algo/yescrypt/cpuminer-yescrypt-simd.o algo/cpuminer-zr5.o asm/cpuminer-neoscrypt_asm.o  asm/cpuminer-sha2-x64.o asm/cpuminer-scrypt-x64.o asm/cpuminer-aesb-x64.o   -lcurl -lz -ljansson -lpthread  -lssl -lcrypto -lgmp
/tmp/ccVXbbn8.ltrans6.ltrans.o: In function `scrypt_ROMix_avx2':
<artificial>:(.text+0x9712): undefined reference to `scrypt_ChunkMix_avx2'
<artificial>:(.text+0x9729): undefined reference to `scrypt_ChunkMix_avx2'
<artificial>:(.text+0x9760): undefined reference to `scrypt_ChunkMix_avx2'
<artificial>:(.text+0x9785): undefined reference to `scrypt_ChunkMix_avx2'
/tmp/ccVXbbn8.ltrans6.ltrans.o: In function `scrypt_ROMix_xop':
<artificial>:(.text+0x99f2): undefined reference to `scrypt_ChunkMix_xop'
<artificial>:(.text+0x9a09): undefined reference to `scrypt_ChunkMix_xop'
<artificial>:(.text+0x9a40): undefined reference to `scrypt_ChunkMix_xop'
<artificial>:(.text+0x9a65): undefined reference to `scrypt_ChunkMix_xop'
/tmp/ccVXbbn8.ltrans6.ltrans.o: In function `scrypt_ROMix_avx':
<artificial>:(.text+0x9cd2): undefined reference to `scrypt_ChunkMix_avx'
<artificial>:(.text+0x9ce9): undefined reference to `scrypt_ChunkMix_avx'
<artificial>:(.text+0x9d20): undefined reference to `scrypt_ChunkMix_avx'
<artificial>:(.text+0x9d45): undefined reference to `scrypt_ChunkMix_avx'
/tmp/ccVXbbn8.ltrans6.ltrans.o: In function `scrypt_ROMix_ssse3':
<artificial>:(.text+0x9fb2): undefined reference to `scrypt_ChunkMix_ssse3'
<artificial>:(.text+0x9fc9): undefined reference to `scrypt_ChunkMix_ssse3'
<artificial>:(.text+0xa000): undefined reference to `scrypt_ChunkMix_ssse3'
<artificial>:(.text+0xa025): undefined reference to `scrypt_ChunkMix_ssse3'
/tmp/ccVXbbn8.ltrans6.ltrans.o: In function `scrypt_ROMix_sse2':
<artificial>:(.text+0xa292): undefined reference to `scrypt_ChunkMix_sse2'
<artificial>:(.text+0xa2a9): undefined reference to `scrypt_ChunkMix_sse2'
<artificial>:(.text+0xa2e0): undefined reference to `scrypt_ChunkMix_sse2'
<artificial>:(.text+0xa305): undefined reference to `scrypt_ChunkMix_sse2'
collect2: error: ld returned 1 exit status
Makefile:1333: recipe for target 'cpuminer' failed
make[2]: *** [cpuminer] Error 1
make[2]: Leaving directory '/root/cpuminer-opt'
Makefile:3453: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/root/cpuminer-opt'
Makefile:670: recipe for target 'all' failed
make: *** [all] Error 2

If you are unsure how to fix it, could you at least tell me how to disable the whole scrypt (optimization)? I am really anxious to see what -flto will do.

ReiMomo
August 05, 2016, 12:08:11 AM
 #956

So when is the Windows bin out?

johnsmithx
August 05, 2016, 01:30:49 AM
 #957

Success!

I did this very ugly hack, joblo please don't get a heart attack:
Code:
--- scrypt-jane-romix-template.h.orig   2016-02-05 22:05:38.000000000 +0000
+++ scrypt-jane-romix-template.h 2016-08-05 00:37:48.949684265 +0000
@@ -86,9 +86,9 @@
  for (i = 0; i < /*N - 1*/511; i++, block += chunkWords) {
        /* 3: V_i = X */
        /* 4: X = H(X) */
-       SCRYPT_CHUNKMIX_FN(block + chunkWords, block, NULL, /*r*/1);
+//         SCRYPT_CHUNKMIX_FN(block + chunkWords, block, NULL, /*r*/1);
  }
- SCRYPT_CHUNKMIX_FN(X, block, NULL, 1);
+//     SCRYPT_CHUNKMIX_FN(X, block, NULL, 1);

  /* 6: for i = 0 to N - 1 do */
  for (i = 0; i < /*N*/512; i += 2) {
@@ -96,13 +96,13 @@
        j = X[chunkWords - SCRYPT_BLOCK_WORDS] & /*(N - 1)*/511;

        /* 8: X = H(Y ^ V_j) */
-       SCRYPT_CHUNKMIX_FN(Y, X, scrypt_item(V, j, chunkWords), 1);
+//         SCRYPT_CHUNKMIX_FN(Y, X, scrypt_item(V, j, chunkWords), 1);

        /* 7: j = Integerify(Y) % N */
        j = Y[chunkWords - SCRYPT_BLOCK_WORDS] & /*(N - 1)*/511;

        /* 8: X = H(Y ^ V_j) */
-       SCRYPT_CHUNKMIX_FN(X, Y, scrypt_item(V, j, chunkWords), 1);
+//         SCRYPT_CHUNKMIX_FN(X, Y, scrypt_item(V, j, chunkWords), 1);
  }

  /* 10: B' = X */

And now it does compile with -flto and here is the result:
Code:
CPU: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
CPU features: SSE2 AES AVX AVX2
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Start mining with SSE2 AES AVX AVX2

[2016-08-05 00:58:03] 4 miner threads started, using 'lyra2re' algorithm.
[2016-08-05 00:58:04] CPU #0: 65.54 kH, 84.17 kH/s
[2016-08-05 00:58:04] CPU #1: 65.54 kH, 84.25 kH/s
[2016-08-05 00:58:04] CPU #3: 65.54 kH, 84.23 kH/s
[2016-08-05 00:58:04] Total: 196.61 kH, 252.64 kH/s
[2016-08-05 00:58:04] CPU #2: 65.54 kH, 83.86 kH/s
[2016-08-05 00:58:08] CPU #2: 335.45 kH, 84.02 kH/s
[2016-08-05 00:58:08] CPU #1: 336.99 kH, 84.25 kH/s
[2016-08-05 00:58:08] CPU #3: 336.92 kH, 84.24 kH/s
[2016-08-05 00:58:08] Total: 1074.89 kH, 336.68 kH/s
[2016-08-05 00:58:08] CPU #0: 336.67 kH, 84.04 kH/s
[2016-08-05 00:58:13] CPU #2: 420.12 kH, 84.16 kH/s
[2016-08-05 00:58:13] CPU #1: 421.26 kH, 84.35 kH/s
[2016-08-05 00:58:13] CPU #0: 420.18 kH, 84.19 kH/s
[2016-08-05 00:58:13] CPU #3: 421.18 kH, 84.34 kH/s
[2016-08-05 00:58:13] Total: 1682.74 kH, 337.04 kH/s
[2016-08-05 00:58:18] CPU #2: 420.78 kH, 84.16 kH/s
[2016-08-05 00:58:18] CPU #1: 421.77 kH, 84.31 kH/s
[2016-08-05 00:58:18] CPU #0: 420.97 kH, 84.19 kH/s
[2016-08-05 00:58:18] CPU #3: 421.69 kH, 84.26 kH/s
[2016-08-05 00:58:18] Total: 1685.21 kH, 336.92 kH/s
[2016-08-05 00:58:23] CPU #1: 421.54 kH, 84.37 kH/s
[2016-08-05 00:58:23] CPU #3: 421.31 kH, 84.32 kH/s
[2016-08-05 00:58:23] CPU #2: 420.81 kH, 83.99 kH/s
[2016-08-05 00:58:23] Total: 1684.63 kH, 336.87 kH/s
[2016-08-05 00:58:23] CPU #0: 420.93 kH, 84.01 kH/s
[2016-08-05 00:58:28] CPU #2: 419.96 kH, 84.10 kH/s
[2016-08-05 00:58:28] CPU #0: 420.07 kH, 84.10 kH/s
[2016-08-05 00:58:28] CPU #1: 421.87 kH, 84.17 kH/s
[2016-08-05 00:58:28] CPU #3: 421.58 kH, 84.09 kH/s
[2016-08-05 00:58:28] Total: 1683.49 kH, 336.46 kH/s

So using -flto gives another 2.75% speed increase. That's a 7.7% speed increase in total over tpruvot's miner.
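For anyone who wants to redo the arithmetic, here is a minimal sketch. It assumes the last "Total" lines of the two runs above (327.39 and 336.46 kH/s) and the 312 kH/s best case for tpruvot's build as inputs. The helper name pct_gain is made up for illustration, and since the post does not say which samples were used, the results only roughly match the figures quoted.

```shell
#!/bin/sh
# pct_gain OLD NEW: print the percentage increase from OLD to NEW.
# (Hypothetical helper, not part of either miner.)
pct_gain() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.2f\n", (new - old) / old * 100 }'
}

pct_gain 327.39 336.46   # cpuminer-opt without vs. with -flto, prints 2.77
pct_gain 312 336.46      # tpruvot's best build vs. cpuminer-opt with -flto, prints 7.84
```

Picking other "Total" lines from the logs shifts the result by a few hundredths of a percent, which is within the run-to-run noise visible above.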

Now this is with -flto and -fuse-linker-plugin:
Code:
CPU: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
CPU features: SSE2 AES AVX AVX2
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Start mining with SSE2 AES AVX AVX2

[2016-08-05 00:55:15] 4 miner threads started, using 'lyra2re' algorithm.
[2016-08-05 00:55:16] CPU #0: 65.54 kH, 84.75 kH/s
[2016-08-05 00:55:16] CPU #1: 65.54 kH, 84.78 kH/s
[2016-08-05 00:55:16] CPU #2: 65.54 kH, 84.56 kH/s
[2016-08-05 00:55:16] CPU #3: 65.54 kH, 84.44 kH/s
[2016-08-05 00:55:16] Total: 262.14 kH, 338.53 kH/s
[2016-08-05 00:55:20] CPU #3: 337.77 kH, 84.06 kH/s
[2016-08-05 00:55:20] Total: 534.38 kH, 338.15 kH/s
[2016-08-05 00:55:20] CPU #2: 338.22 kH, 84.01 kH/s
[2016-08-05 00:55:20] CPU #1: 339.13 kH, 84.09 kH/s
[2016-08-05 00:55:20] CPU #0: 338.98 kH, 84.02 kH/s
[2016-08-05 00:55:25] CPU #0: 420.11 kH, 84.71 kH/s
[2016-08-05 00:55:25] CPU #2: 420.03 kH, 84.49 kH/s
[2016-08-05 00:55:25] CPU #3: 420.31 kH, 84.05 kH/s
[2016-08-05 00:55:25] Total: 1599.59 kH, 337.33 kH/s
[2016-08-05 00:55:25] CPU #1: 420.43 kH, 84.07 kH/s
[2016-08-05 00:55:30] CPU #3: 420.25 kH, 83.97 kH/s
[2016-08-05 00:55:30] Total: 1680.82 kH, 337.24 kH/s
[2016-08-05 00:55:30] CPU #2: 422.44 kH, 83.97 kH/s
[2016-08-05 00:55:30] CPU #0: 423.54 kH, 83.98 kH/s
[2016-08-05 00:55:30] CPU #1: 420.36 kH, 83.97 kH/s
[2016-08-05 00:55:35] CPU #0: 419.88 kH, 84.64 kH/s
[2016-08-05 00:55:35] CPU #2: 419.84 kH, 84.39 kH/s
[2016-08-05 00:55:35] CPU #3: 419.85 kH, 84.00 kH/s
[2016-08-05 00:55:35] Total: 1679.93 kH, 337.00 kH/s
[2016-08-05 00:55:35] CPU #1: 419.85 kH, 84.02 kH/s
[2016-08-05 00:55:40] CPU #0: 423.20 kH, 84.42 kH/s
[2016-08-05 00:55:40] CPU #3: 420.02 kH, 84.32 kH/s
[2016-08-05 00:55:40] Total: 1682.91 kH, 337.15 kH/s

Basically the same speed. Now what if I actually call tpruvot's build.sh, exactly the one I showed in my previous post:
Code:
CPU: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
CPU features: SSE2 AES AVX AVX2
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Start mining with SSE2 AES AVX AVX2

[2016-08-05 01:10:02] 4 miner threads started, using 'lyra2re' algorithm.
[2016-08-05 01:10:03] CPU #0: 65.54 kH, 84.11 kH/s
[2016-08-05 01:10:03] CPU #1: 65.54 kH, 83.93 kH/s
[2016-08-05 01:10:03] CPU #2: 65.54 kH, 83.86 kH/s
[2016-08-05 01:10:03] CPU #3: 65.54 kH, 83.96 kH/s
[2016-08-05 01:10:03] Total: 262.14 kH, 335.86 kH/s
[2016-08-05 01:10:07] CPU #1: 335.71 kH, 84.00 kH/s
[2016-08-05 01:10:07] CPU #2: 335.44 kH, 83.92 kH/s
[2016-08-05 01:10:07] CPU #3: 335.85 kH, 83.99 kH/s
[2016-08-05 01:10:07] Total: 1072.54 kH, 336.02 kH/s
[2016-08-05 01:10:07] CPU #0: 336.45 kH, 83.93 kH/s
[2016-08-05 01:10:12] CPU #1: 420.00 kH, 84.00 kH/s
[2016-08-05 01:10:12] CPU #2: 419.62 kH, 83.92 kH/s
[2016-08-05 01:10:12] CPU #3: 419.93 kH, 83.99 kH/s
[2016-08-05 01:10:12] Total: 1596.00 kH, 335.82 kH/s
[2016-08-05 01:10:12] CPU #0: 419.64 kH, 83.91 kH/s
[2016-08-05 01:10:17] CPU #1: 419.98 kH, 84.05 kH/s
[2016-08-05 01:10:17] CPU #2: 419.58 kH, 83.98 kH/s
[2016-08-05 01:10:17] CPU #3: 419.93 kH, 84.03 kH/s
[2016-08-05 01:10:17] Total: 1679.12 kH, 335.98 kH/s
[2016-08-05 01:10:17] CPU #0: 419.53 kH, 83.99 kH/s
[2016-08-05 01:10:22] CPU #2: 419.92 kH, 84.04 kH/s
[2016-08-05 01:10:22] CPU #1: 420.25 kH, 84.04 kH/s
[2016-08-05 01:10:22] CPU #0: 419.93 kH, 84.04 kH/s
[2016-08-05 01:10:22] CPU #3: 420.18 kH, 84.02 kH/s
[2016-08-05 01:10:22] Total: 1680.28 kH, 336.14 kH/s

Still the same (maximum) speed.

So I will be using joblo's cpuminer with tpruvot's (uncommented) build.sh, because that build.sh has all those other flags (including -falign-*) which may or may not matter, so just to be safe.


EDIT: when I took the AVX2 binary and tried to run it on an AVX CPU I got this:
Code:
CPU:       Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
CPU features: SSE2 AES AVX
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Start mining with SSE2 AES AVX

Illegal instruction (core dumped)

But wasn't the whole idea that all the CPU features would be compiled in, and which particular feature to use would be determined at runtime? It's not a big deal; I just recompiled it, and I will have two versions (AVX and AVX2) and run the one that's appropriate to the CPU. I just thought I would report this.

joblo
August 05, 2016, 02:25:33 PM
 #958

Quote from: ReiMomo on August 05, 2016, 12:08:11 AM
So when is the Windows bin out?

Cryptomining Blog has usually been good at producing binaries within a few hours of release.
I'm not sure why they haven't this time. You could ask.

I can't build distributable Windows binaries, but MinGW works for compiling your own; instructions are in README.md.

joblo
August 05, 2016, 02:51:44 PM
 #959

Quote from: johnsmithx on August 05, 2016, 01:30:49 AM
Success!

[snip]

So I will be using joblo's cpuminer with tpruvot's (uncommented) build.sh, because that build.sh has all those other flags (including -falign-*) which may or may not matter, so just to be safe.


EDIT: when I took the AVX2 binary and tried to run it on an AVX CPU I got this:
Code:
CPU:       Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
CPU features: SSE2 AES AVX
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Start mining with SSE2 AES AVX

Illegal instruction (core dumped)

But wasn't the whole idea that all the CPU features would be compiled in, and which particular feature to use would be determined at runtime? It's not a big deal; I just recompiled it, and I will have two versions (AVX and AVX2) and run the one that's appropriate to the CPU. I just thought I would report this.

Excellent work. The easiest way to avoid the build error is to comment out the source dir for argon2 and remove the registration
call for argon2 in algo-gate-api.c:register_algo_gate. You can easily remove any algo this way.

You have demonstrated that LTO improves performance with the new compiler but has some incompatibilities with the existing
argon2 code. I will investigate argon2 to try to solve it.

CPU architecture selection is made at compile time. If you do a native compile on a CPU that supports AVX2, you cannot run the result
on a CPU with only AVX. If you want to cross compile, you must specify the arch of the target CPU and produce separate executables
for each desired architecture.
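With one executable per architecture, a small launcher can pick the right one at start-up. The following is only a sketch under assumptions: the binary names (cpuminer-avx2, cpuminer-avx, cpuminer-sse2) are hypothetical, and it reads the Linux-specific "flags" line of /proc/cpuinfo.

```shell
#!/bin/sh
# pick_binary FLAGS: choose a per-architecture build from a space-separated
# CPU flags string. The binary names below are hypothetical examples.
pick_binary() {
    case " $1 " in
        *" avx2 "*) echo "cpuminer-avx2" ;;
        *" avx "*)  echo "cpuminer-avx" ;;
        *)          echo "cpuminer-sse2" ;;
    esac
}

# On Linux the flags come from the first "flags" line of /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    echo "would run: ./$(pick_binary "$flags")"
fi
```

The padding spaces in the case patterns keep "avx" from matching inside "avx2", so an AVX-only CPU gets the AVX build.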

My logic for AVX2 isn't fully implemented yet in the capabilities checks. Had it been, it would have
displayed a message warning of the impending crash, then crashed. This is what you should see once it is implemented:

Code:
CPU features: SSE2 AES AVX
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Unsupported CPU or SW configuration, miner will likely crash!
Illegal instruction (core dumped)
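
The check behind that warning amounts to a set difference between the "SW features" and "CPU features" lines. A minimal shell sketch of the idea follows; it is not the miner's actual code (which is C), and the function name missing_features is made up for illustration.

```shell
#!/bin/sh
# missing_features CPU_FEATURES SW_FEATURES
# Print each feature the software was built with that the CPU does not report.
# (Illustrative helper only; feature lists are space-separated strings.)
missing_features() {
    for feat in $2; do
        case " $1 " in
            *" $feat "*) ;;                 # CPU has it, nothing to report
            *) printf '%s\n' "$feat" ;;     # built in, but not supported by the CPU
        esac
    done
}

cpu="SSE2 AES AVX"
sw="SSE2 AES AVX AVX2"
missing=$(missing_features "$cpu" "$sw")
if [ -n "$missing" ]; then
    echo "Unsupported CPU or SW configuration, miner will likely crash! (missing: $missing)"
fi
```

With the feature lists from the E5-2670 v2 run above, the only missing feature is AVX2, which is what triggers the illegal instruction.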


joblo
August 05, 2016, 04:13:36 PM
 #960


Quote from: johnsmithx
But when I add -flto I get the following error at the final link:

Code:
[snip: same g++ link command and `undefined reference to scrypt_ChunkMix_*' errors as in the original post]


I just want to make sure I understand the problem definition:

- multi is faster with -flto
- multi without -flto is slower than identically compiled opt
- multi with -flto is faster than pre-AVX2 opt compiled without -flto
- opt fails to compile with gcc 5.4.0 with -flto
- -flto compiles with gcc 4.8.4, with no effect on performance.

The significant points are:

- -flto is faster with gcc 5.4.0
- code that compiles with -flto using gcc 4.8.4 fails to compile using gcc 5.4.0.

The code that fails to compile is pretty ugly. It uses asm function pointers to select targets at compile time.
I've never seen anything like this so it will take a while to understand what is going on. It looks like the code is
self contained and the error doesn't seem to be related to missing libraries.

As a workaround, if you disable argon2 you can get the best of my optimizations as well as LTO, unless some of my opts
conflict with LTO. It wouldn't be the first time I've stepped on the compiler when trying to optimize.
