Bitcoin Forum
Author Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner  (Read 411541 times)
johnsmithx
August 05, 2016, 01:30:49 AM
 #981

Success!

I did this very ugly hack, joblo please don't get a heart attack:
Code:
--- scrypt-jane-romix-template.h.orig   2016-02-05 22:05:38.000000000 +0000
+++ scrypt-jane-romix-template.h 2016-08-05 00:37:48.949684265 +0000
@@ -86,9 +86,9 @@
  for (i = 0; i < /*N - 1*/511; i++, block += chunkWords) {
        /* 3: V_i = X */
        /* 4: X = H(X) */
-       SCRYPT_CHUNKMIX_FN(block + chunkWords, block, NULL, /*r*/1);
+//         SCRYPT_CHUNKMIX_FN(block + chunkWords, block, NULL, /*r*/1);
  }
- SCRYPT_CHUNKMIX_FN(X, block, NULL, 1);
+//     SCRYPT_CHUNKMIX_FN(X, block, NULL, 1);

  /* 6: for i = 0 to N - 1 do */
  for (i = 0; i < /*N*/512; i += 2) {
@@ -96,13 +96,13 @@
        j = X[chunkWords - SCRYPT_BLOCK_WORDS] & /*(N - 1)*/511;

        /* 8: X = H(Y ^ V_j) */
-       SCRYPT_CHUNKMIX_FN(Y, X, scrypt_item(V, j, chunkWords), 1);
+//         SCRYPT_CHUNKMIX_FN(Y, X, scrypt_item(V, j, chunkWords), 1);

        /* 7: j = Integerify(Y) % N */
        j = Y[chunkWords - SCRYPT_BLOCK_WORDS] & /*(N - 1)*/511;

        /* 8: X = H(Y ^ V_j) */
-       SCRYPT_CHUNKMIX_FN(X, Y, scrypt_item(V, j, chunkWords), 1);
+//         SCRYPT_CHUNKMIX_FN(X, Y, scrypt_item(V, j, chunkWords), 1);
  }

  /* 10: B' = X */

And now it does compile with -flto and here is the result:
Code:
CPU: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
CPU features: SSE2 AES AVX AVX2
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Start mining with SSE2 AES AVX AVX2

[2016-08-05 00:58:03] 4 miner threads started, using 'lyra2re' algorithm.
[2016-08-05 00:58:04] CPU #0: 65.54 kH, 84.17 kH/s
[2016-08-05 00:58:04] CPU #1: 65.54 kH, 84.25 kH/s
[2016-08-05 00:58:04] CPU #3: 65.54 kH, 84.23 kH/s
[2016-08-05 00:58:04] Total: 196.61 kH, 252.64 kH/s
[2016-08-05 00:58:04] CPU #2: 65.54 kH, 83.86 kH/s
[2016-08-05 00:58:08] CPU #2: 335.45 kH, 84.02 kH/s
[2016-08-05 00:58:08] CPU #1: 336.99 kH, 84.25 kH/s
[2016-08-05 00:58:08] CPU #3: 336.92 kH, 84.24 kH/s
[2016-08-05 00:58:08] Total: 1074.89 kH, 336.68 kH/s
[2016-08-05 00:58:08] CPU #0: 336.67 kH, 84.04 kH/s
[2016-08-05 00:58:13] CPU #2: 420.12 kH, 84.16 kH/s
[2016-08-05 00:58:13] CPU #1: 421.26 kH, 84.35 kH/s
[2016-08-05 00:58:13] CPU #0: 420.18 kH, 84.19 kH/s
[2016-08-05 00:58:13] CPU #3: 421.18 kH, 84.34 kH/s
[2016-08-05 00:58:13] Total: 1682.74 kH, 337.04 kH/s
[2016-08-05 00:58:18] CPU #2: 420.78 kH, 84.16 kH/s
[2016-08-05 00:58:18] CPU #1: 421.77 kH, 84.31 kH/s
[2016-08-05 00:58:18] CPU #0: 420.97 kH, 84.19 kH/s
[2016-08-05 00:58:18] CPU #3: 421.69 kH, 84.26 kH/s
[2016-08-05 00:58:18] Total: 1685.21 kH, 336.92 kH/s
[2016-08-05 00:58:23] CPU #1: 421.54 kH, 84.37 kH/s
[2016-08-05 00:58:23] CPU #3: 421.31 kH, 84.32 kH/s
[2016-08-05 00:58:23] CPU #2: 420.81 kH, 83.99 kH/s
[2016-08-05 00:58:23] Total: 1684.63 kH, 336.87 kH/s
[2016-08-05 00:58:23] CPU #0: 420.93 kH, 84.01 kH/s
[2016-08-05 00:58:28] CPU #2: 419.96 kH, 84.10 kH/s
[2016-08-05 00:58:28] CPU #0: 420.07 kH, 84.10 kH/s
[2016-08-05 00:58:28] CPU #1: 421.87 kH, 84.17 kH/s
[2016-08-05 00:58:28] CPU #3: 421.58 kH, 84.09 kH/s
[2016-08-05 00:58:28] Total: 1683.49 kH, 336.46 kH/s

So using -flto gives another 2.75% speed increase. That's a 7.7% speed increase in total over tpruvot.

Now this is with -flto and -fuse-linker-plugin:
Code:
CPU: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
CPU features: SSE2 AES AVX AVX2
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Start mining with SSE2 AES AVX AVX2

[2016-08-05 00:55:15] 4 miner threads started, using 'lyra2re' algorithm.
[2016-08-05 00:55:16] CPU #0: 65.54 kH, 84.75 kH/s
[2016-08-05 00:55:16] CPU #1: 65.54 kH, 84.78 kH/s
[2016-08-05 00:55:16] CPU #2: 65.54 kH, 84.56 kH/s
[2016-08-05 00:55:16] CPU #3: 65.54 kH, 84.44 kH/s
[2016-08-05 00:55:16] Total: 262.14 kH, 338.53 kH/s
[2016-08-05 00:55:20] CPU #3: 337.77 kH, 84.06 kH/s
[2016-08-05 00:55:20] Total: 534.38 kH, 338.15 kH/s
[2016-08-05 00:55:20] CPU #2: 338.22 kH, 84.01 kH/s
[2016-08-05 00:55:20] CPU #1: 339.13 kH, 84.09 kH/s
[2016-08-05 00:55:20] CPU #0: 338.98 kH, 84.02 kH/s
[2016-08-05 00:55:25] CPU #0: 420.11 kH, 84.71 kH/s
[2016-08-05 00:55:25] CPU #2: 420.03 kH, 84.49 kH/s
[2016-08-05 00:55:25] CPU #3: 420.31 kH, 84.05 kH/s
[2016-08-05 00:55:25] Total: 1599.59 kH, 337.33 kH/s
[2016-08-05 00:55:25] CPU #1: 420.43 kH, 84.07 kH/s
[2016-08-05 00:55:30] CPU #3: 420.25 kH, 83.97 kH/s
[2016-08-05 00:55:30] Total: 1680.82 kH, 337.24 kH/s
[2016-08-05 00:55:30] CPU #2: 422.44 kH, 83.97 kH/s
[2016-08-05 00:55:30] CPU #0: 423.54 kH, 83.98 kH/s
[2016-08-05 00:55:30] CPU #1: 420.36 kH, 83.97 kH/s
[2016-08-05 00:55:35] CPU #0: 419.88 kH, 84.64 kH/s
[2016-08-05 00:55:35] CPU #2: 419.84 kH, 84.39 kH/s
[2016-08-05 00:55:35] CPU #3: 419.85 kH, 84.00 kH/s
[2016-08-05 00:55:35] Total: 1679.93 kH, 337.00 kH/s
[2016-08-05 00:55:35] CPU #1: 419.85 kH, 84.02 kH/s
[2016-08-05 00:55:40] CPU #0: 423.20 kH, 84.42 kH/s
[2016-08-05 00:55:40] CPU #3: 420.02 kH, 84.32 kH/s
[2016-08-05 00:55:40] Total: 1682.91 kH, 337.15 kH/s

Basically the same speed. Now what if I actually call tpruvot's build.sh, exactly the one I showed in my previous post:
Code:
CPU: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
CPU features: SSE2 AES AVX AVX2
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Start mining with SSE2 AES AVX AVX2

[2016-08-05 01:10:02] 4 miner threads started, using 'lyra2re' algorithm.
[2016-08-05 01:10:03] CPU #0: 65.54 kH, 84.11 kH/s
[2016-08-05 01:10:03] CPU #1: 65.54 kH, 83.93 kH/s
[2016-08-05 01:10:03] CPU #2: 65.54 kH, 83.86 kH/s
[2016-08-05 01:10:03] CPU #3: 65.54 kH, 83.96 kH/s
[2016-08-05 01:10:03] Total: 262.14 kH, 335.86 kH/s
[2016-08-05 01:10:07] CPU #1: 335.71 kH, 84.00 kH/s
[2016-08-05 01:10:07] CPU #2: 335.44 kH, 83.92 kH/s
[2016-08-05 01:10:07] CPU #3: 335.85 kH, 83.99 kH/s
[2016-08-05 01:10:07] Total: 1072.54 kH, 336.02 kH/s
[2016-08-05 01:10:07] CPU #0: 336.45 kH, 83.93 kH/s
[2016-08-05 01:10:12] CPU #1: 420.00 kH, 84.00 kH/s
[2016-08-05 01:10:12] CPU #2: 419.62 kH, 83.92 kH/s
[2016-08-05 01:10:12] CPU #3: 419.93 kH, 83.99 kH/s
[2016-08-05 01:10:12] Total: 1596.00 kH, 335.82 kH/s
[2016-08-05 01:10:12] CPU #0: 419.64 kH, 83.91 kH/s
[2016-08-05 01:10:17] CPU #1: 419.98 kH, 84.05 kH/s
[2016-08-05 01:10:17] CPU #2: 419.58 kH, 83.98 kH/s
[2016-08-05 01:10:17] CPU #3: 419.93 kH, 84.03 kH/s
[2016-08-05 01:10:17] Total: 1679.12 kH, 335.98 kH/s
[2016-08-05 01:10:17] CPU #0: 419.53 kH, 83.99 kH/s
[2016-08-05 01:10:22] CPU #2: 419.92 kH, 84.04 kH/s
[2016-08-05 01:10:22] CPU #1: 420.25 kH, 84.04 kH/s
[2016-08-05 01:10:22] CPU #0: 419.93 kH, 84.04 kH/s
[2016-08-05 01:10:22] CPU #3: 420.18 kH, 84.02 kH/s
[2016-08-05 01:10:22] Total: 1680.28 kH, 336.14 kH/s

Still the same (maximum) speed.
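
To put a rough number on "the same speed", here is a minimal sketch in C that averages the "Total" readings from the -flto run and the build.sh run above and reports the relative difference. The function and array names are made up for illustration. The first -flto total (252.64 kH/s) is left out because only three threads had reported at that point. A handful of five-second samples like these is noisy, so treat the result as an indication only.

```c
#include <stddef.h>

/* "Total" kH/s readings copied from the logs in this post. */
static const double lto_totals[]      = { 336.68, 337.04, 336.92, 336.87, 336.46 };
static const double build_sh_totals[] = { 335.86, 336.02, 335.82, 335.98, 336.14 };

/* Arithmetic mean of n samples; returns 0.0 for an empty array. */
double mean_khs(const double *samples, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return n ? sum / (double)n : 0.0;
}

/* Relative gain of `candidate` over `baseline`, in percent. */
double gain_percent(double baseline, double candidate)
{
    return (candidate - baseline) / baseline * 100.0;
}

/* Gain of the plain -flto build over the build.sh build, in percent. */
double lto_vs_build_sh_percent(void)
{
    double lto = mean_khs(lto_totals, sizeof lto_totals / sizeof lto_totals[0]);
    double bsh = mean_khs(build_sh_totals,
                          sizeof build_sh_totals / sizeof build_sh_totals[0]);
    return gain_percent(bsh, lto);
}
```

With these samples the difference comes out at roughly a quarter of a percent, which is small enough to be run-to-run noise.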

So I will be using joblo's cpuminer with tpruvot's (uncommented) build.sh, because that build.sh has all those other flags (including -falign-*), which may or may not matter, so just to be safe.


EDIT: When I took the AVX2 binary and tried to run it on an AVX-only CPU, I got this:
Code:
CPU:       Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
CPU features: SSE2 AES AVX
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Start mining with SSE2 AES AVX

Illegal instruction (core dumped)

But wasn't the whole idea that all the CPU features would be compiled in and the particular feature to use would be determined at runtime? It's not a big deal: I just recompiled it, and I will keep two versions (AVX and AVX2) and run the one that's appropriate to the CPU. I just thought I would report this.
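
Keeping two builds and running the appropriate one can be automated with a tiny launcher. The following is only a sketch in C: the binary names `./cpuminer-avx2` and `./cpuminer-avx` are hypothetical (use whatever you named your builds), and the detection relies on the GCC/Clang builtin `__builtin_cpu_supports`, which is available on x86 targets. It is not part of cpuminer-opt.

```c
#include <stddef.h>

/* Hypothetical file names for the two builds. */
#define MINER_AVX2 "./cpuminer-avx2"
#define MINER_AVX  "./cpuminer-avx"

/* Pure selection logic: prefer the AVX2 build, fall back to the AVX
 * build, and return NULL if the CPU supports neither. */
const char *pick_miner(int has_avx2, int has_avx)
{
    if (has_avx2)
        return MINER_AVX2;
    if (has_avx)
        return MINER_AVX;
    return NULL;
}

/* Ask the running CPU what it supports. On non-x86 targets, or with a
 * compiler that lacks the builtin, this sketch reports "no suitable
 * build" by returning NULL. */
const char *pick_miner_for_this_cpu(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return pick_miner(__builtin_cpu_supports("avx2") != 0,
                      __builtin_cpu_supports("avx") != 0);
#else
    return NULL;
#endif
}
```

A launcher's main() could pass the returned path to execv() together with the usual miner arguments, and print an error if the result is NULL.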

joblo
August 05, 2016, 02:25:33 PM
 #982

So when is the Windows bin out?

Cryptomining Blog has usually been good at producing binaries within a few hours of a release.
I'm not sure why not this time. You could ask.

I can't build distributable Windows binaries, but mingw works to compile your own; instructions are in README.md.

cpuminer-opt developer, https://bitcointalk.org/index.php?topic=1326803.0
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
ETH: 0x72122edabcae9d3f57eab0729305a425f6fef6d0
joblo
August 05, 2016, 02:51:44 PM
 #983

Success!

[snip]

So I will be using joblo's cpuminer with tpruvot's (uncommented) build.sh, because that build.sh has all those other flags (including -falign-*), which may or may not matter, so just to be safe.


EDIT: When I took the AVX2 binary and tried to run it on an AVX-only CPU, I got this:
Code:
CPU:       Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
CPU features: SSE2 AES AVX
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Start mining with SSE2 AES AVX

Illegal instruction (core dumped)

But wasn't the whole idea that all the CPU features would be compiled in and the particular feature to use would be determined at runtime? It's not a big deal: I just recompiled it, and I will keep two versions (AVX and AVX2) and run the one that's appropriate to the CPU. I just thought I would report this.

Excellent work. The easiest way to work around the compile error is to comment out the source dir for argon2 and remove the registration
call for argon2 in algo-gate-api.c:register_algo_gate. You can easily remove any algo this way.

You have demonstrated that LTO improves performance with the new compiler but has some incompatibilities with the existing
argon2 code. I will investigate argon2 to try to solve it.

CPU architecture selection is made at compile time. If you do a native compile on a CPU that supports AVX2, you cannot run the result
on a CPU with only AVX. If you want to cross-compile, you must specify the arch of the target CPU and produce separate executables
for each desired architecture.

My logic for AVX2 isn't fully implemented yet in the capabilities checks. Had it been, it would have
displayed a message warning of the impending crash, and then crashed. This is what you should see once it is implemented:

Code:
CPU features: SSE2 AES AVX
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Unsupported CPU or SW configuration, miner will likely crash!
Illegal instruction (core dumped)
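
The kind of check described above can be sketched in C as a comparison of two feature masks: what the binary was compiled to use against what the CPU reports. This is an illustration only, not the actual cpuminer-opt code; the `FEAT_*` flags and function names are invented here. The predefined macros `__SSE2__`, `__AES__`, `__AVX__` and `__AVX2__` are the ones GCC sets according to `-march` and related options.

```c
#include <stdio.h>

/* Illustrative feature bits, one per line item in the startup banner. */
enum { FEAT_SSE2 = 1u, FEAT_AES = 2u, FEAT_AVX = 4u, FEAT_AVX2 = 8u };

/* "SW features": what the compiler was allowed to emit for this build,
 * taken from GCC's predefined target macros. */
unsigned sw_features(void)
{
    unsigned f = 0;
#ifdef __SSE2__
    f |= FEAT_SSE2;
#endif
#ifdef __AES__
    f |= FEAT_AES;
#endif
#ifdef __AVX__
    f |= FEAT_AVX;
#endif
#ifdef __AVX2__
    f |= FEAT_AVX2;
#endif
    return f;
}

/* Features the build needs that the CPU does not have. */
unsigned missing_features(unsigned sw, unsigned cpu)
{
    return sw & ~cpu;
}

/* Print the warning and return 1 on a mismatch, otherwise return 0. */
int warn_if_unsupported(unsigned sw, unsigned cpu)
{
    if (missing_features(sw, cpu) != 0) {
        fprintf(stderr,
                "Unsupported CPU or SW configuration, miner will likely crash!\n");
        return 1;
    }
    return 0;
}
```

The `cpu` mask is left as an input here; on x86 it could be filled in at startup from CPUID, for example through the GCC/Clang builtin `__builtin_cpu_supports`. In the case reported above, the build has AVX2 set and the E5-2670 v2 does not, so the check would fire.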


joblo
August 05, 2016, 04:13:36 PM
 #984


But when I add -flto I get the following error at the final link:

Code:
g++  -O3 -march=native -w -flto -std=gnu++11 -Lyes/lib  -Lyes/lib  -o cpuminer cpuminer-cpu-miner.o cpuminer-util.o cpuminer-uint256.o cpuminer-api.o cpuminer-sysinfos.o cpuminer-algo-gate-api.o algo/groestl/cpuminer-sph_groestl.o algo/skein/cpuminer-sph_skein.o algo/bmw/cpuminer-sph_bmw.o algo/shavite/cpuminer-sph_shavite.o algo/shavite/cpuminer-shavite.o algo/echo/cpuminer-sph_echo.o algo/blake/cpuminer-sph_blake.o algo/heavy/cpuminer-sph_hefty1.o algo/blake/cpuminer-mod_blakecoin.o algo/luffa/cpuminer-sph_luffa.o algo/cubehash/cpuminer-sph_cubehash.o algo/simd/cpuminer-sph_simd.o algo/hamsi/cpuminer-sph_hamsi.o algo/fugue/cpuminer-sph_fugue.o algo/gost/cpuminer-sph_gost.o algo/jh/cpuminer-sph_jh.o algo/keccak/cpuminer-sph_keccak.o algo/keccak/cpuminer-keccak.o algo/sha3/cpuminer-sph_sha2.o algo/sha3/cpuminer-sph_sha2big.o algo/shabal/cpuminer-sph_shabal.o algo/whirlpool/cpuminer-sph_whirlpool.o crypto/cpuminer-blake2s.o crypto/cpuminer-oaes_lib.o crypto/cpuminer-c_keccak.o crypto/cpuminer-c_groestl.o crypto/cpuminer-c_blake256.o crypto/cpuminer-c_jh.o crypto/cpuminer-c_skein.o crypto/cpuminer-hash.o crypto/cpuminer-aesb.o crypto/cpuminer-magimath.o algo/argon2/cpuminer-argon2a.o algo/argon2/ar2/cpuminer-argon2.o algo/argon2/ar2/cpuminer-opt.o algo/argon2/ar2/cpuminer-cores.o algo/argon2/ar2/cpuminer-ar2-scrypt-jane.o algo/argon2/ar2/cpuminer-blake2b.o algo/cpuminer-axiom.o algo/blake/cpuminer-blake.o algo/blake/cpuminer-blake2.o algo/blake/cpuminer-blakecoin.o algo/blake/cpuminer-decred.o algo/blake/cpuminer-pentablake.o algo/bmw/cpuminer-bmw256.o algo/cubehash/sse2/cpuminer-cubehash_sse2.o algo/cryptonight/cpuminer-cryptolight.o algo/cryptonight/cpuminer-cryptonight-common.o algo/cryptonight/cpuminer-cryptonight-aesni.o algo/cryptonight/cpuminer-cryptonight.o algo/cpuminer-drop.o algo/echo/aes_ni/cpuminer-hash.o algo/cpuminer-fresh.o algo/groestl/cpuminer-groestl.o algo/groestl/cpuminer-myr-groestl.o algo/groestl/sse2/cpuminer-grso.o 
algo/groestl/sse2/cpuminer-grso-asm.o algo/groestl/aes_ni/cpuminer-hash-groestl.o algo/groestl/aes_ni/cpuminer-hash-groestl256.o algo/haval/cpuminer-haval.o algo/heavy/cpuminer-heavy.o algo/heavy/cpuminer-bastion.o algo/cpuminer-hmq1725.o algo/hodl/cpuminer-hodl.o algo/hodl/cpuminer-hodl-gate.o algo/hodl/cpuminer-hodl_arith_uint256.o algo/hodl/cpuminer-hodl_uint256.o algo/hodl/cpuminer-hash.o algo/hodl/cpuminer-hmac_sha512.o algo/hodl/cpuminer-sha256.o algo/hodl/cpuminer-sha512.o algo/hodl/cpuminer-utilstrencodings.o algo/hodl/cpuminer-hodl-wolf.o algo/hodl/cpuminer-aes.o algo/hodl/cpuminer-sha512_avx.o algo/hodl/cpuminer-sha512_avx2.o algo/cpuminer-lbry.o algo/luffa/cpuminer-luffa.o algo/luffa/sse2/cpuminer-luffa_for_sse2.o algo/lyra2/cpuminer-lyra2.o algo/lyra2/cpuminer-sponge.o algo/lyra2/cpuminer-lyra2rev2.o algo/lyra2/cpuminer-lyra2re.o algo/keccak/sse2/cpuminer-keccak.o algo/cpuminer-m7m.o algo/cpuminer-neoscrypt.o algo/cpuminer-nist5.o algo/cpuminer-pluck.o algo/quark/cpuminer-quark.o algo/qubit/cpuminer-qubit.o algo/ripemd/cpuminer-sph_ripemd.o algo/cpuminer-scrypt.o algo/scryptjane/cpuminer-scrypt-jane.o algo/sha2/cpuminer-sha2.o algo/simd/sse2/cpuminer-nist.o algo/simd/sse2/cpuminer-vector.o algo/skein/cpuminer-skein.o algo/skein/cpuminer-skein2.o algo/cpuminer-s3.o algo/tiger/cpuminer-sph_tiger.o algo/whirlpool/cpuminer-whirlpool.o algo/whirlpool/cpuminer-whirlpoolx.o algo/x11/cpuminer-x11.o algo/x11/cpuminer-x11evo.o algo/x11/cpuminer-x11gost.o algo/x11/cpuminer-c11.o algo/x13/cpuminer-x13.o algo/x14/cpuminer-x14.o algo/x15/cpuminer-x15.o algo/x17/cpuminer-x17.o algo/yescrypt/cpuminer-yescrypt.o algo/yescrypt/cpuminer-yescrypt-common.o algo/yescrypt/cpuminer-sha256_Y.o algo/yescrypt/cpuminer-yescrypt-simd.o algo/cpuminer-zr5.o asm/cpuminer-neoscrypt_asm.o  asm/cpuminer-sha2-x64.o asm/cpuminer-scrypt-x64.o asm/cpuminer-aesb-x64.o   -lcurl -lz -ljansson -lpthread  -lssl -lcrypto -lgmp
/tmp/ccVXbbn8.ltrans6.ltrans.o: In function `scrypt_ROMix_avx2':
<artificial>:(.text+0x9712): undefined reference to `scrypt_ChunkMix_avx2'
<artificial>:(.text+0x9729): undefined reference to `scrypt_ChunkMix_avx2'
<artificial>:(.text+0x9760): undefined reference to `scrypt_ChunkMix_avx2'
<artificial>:(.text+0x9785): undefined reference to `scrypt_ChunkMix_avx2'
/tmp/ccVXbbn8.ltrans6.ltrans.o: In function `scrypt_ROMix_xop':
<artificial>:(.text+0x99f2): undefined reference to `scrypt_ChunkMix_xop'
<artificial>:(.text+0x9a09): undefined reference to `scrypt_ChunkMix_xop'
<artificial>:(.text+0x9a40): undefined reference to `scrypt_ChunkMix_xop'
<artificial>:(.text+0x9a65): undefined reference to `scrypt_ChunkMix_xop'
/tmp/ccVXbbn8.ltrans6.ltrans.o: In function `scrypt_ROMix_avx':
<artificial>:(.text+0x9cd2): undefined reference to `scrypt_ChunkMix_avx'
<artificial>:(.text+0x9ce9): undefined reference to `scrypt_ChunkMix_avx'
<artificial>:(.text+0x9d20): undefined reference to `scrypt_ChunkMix_avx'
<artificial>:(.text+0x9d45): undefined reference to `scrypt_ChunkMix_avx'
/tmp/ccVXbbn8.ltrans6.ltrans.o: In function `scrypt_ROMix_ssse3':
<artificial>:(.text+0x9fb2): undefined reference to `scrypt_ChunkMix_ssse3'
<artificial>:(.text+0x9fc9): undefined reference to `scrypt_ChunkMix_ssse3'
<artificial>:(.text+0xa000): undefined reference to `scrypt_ChunkMix_ssse3'
<artificial>:(.text+0xa025): undefined reference to `scrypt_ChunkMix_ssse3'
/tmp/ccVXbbn8.ltrans6.ltrans.o: In function `scrypt_ROMix_sse2':
<artificial>:(.text+0xa292): undefined reference to `scrypt_ChunkMix_sse2'
<artificial>:(.text+0xa2a9): undefined reference to `scrypt_ChunkMix_sse2'
<artificial>:(.text+0xa2e0): undefined reference to `scrypt_ChunkMix_sse2'
<artificial>:(.text+0xa305): undefined reference to `scrypt_ChunkMix_sse2'
collect2: error: ld returned 1 exit status
Makefile:1333: recipe for target 'cpuminer' failed
make[2]: *** [cpuminer] Error 1
make[2]: Leaving directory '/root/cpuminer-opt'
Makefile:3453: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/root/cpuminer-opt'
Makefile:670: recipe for target 'all' failed
make: *** [all] Error 2


I just want to make sure I understand the problem definition:

- multi is faster with -flto
- multi without -flto is slower than identically compiled opt
- multi with -flto is faster than pre-avx2 opt compiled without -flto
- opt fails to compile with gcc 5.4.0 with -flto
- -flto compiles with gcc 4.8.4 with no effect on performance.

The significant points are:

- -flto is faster with gcc 5.4.0
- code that compiles with -flto using gcc 4.8.4 fails to compile using gcc 5.4.0.

The code that fails to compile is pretty ugly. It uses asm function pointers to select targets at compile time.
I've never seen anything like this, so it will take a while to understand what is going on. It looks like the code is
self-contained, and the error doesn't seem to be related to missing libraries.

As a workaround, if you disable argon2 you can get the best of my optimizations as well as LTO, unless some of my opts
conflict with LTO. It wouldn't be the first time I've stepped on the compiler when trying to optimize.
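
One plausible mechanism, offered as an assumption rather than a confirmed diagnosis of the scrypt-jane code: when a function body is defined inside an asm string, the C compiler only ever sees the prototype. Under LTO the asm text and its C callers can end up in different link-time partitions, and a symbol that exists only inside a string is something the optimizer cannot track. The sketch below shows the general pattern with a made-up function `demo_add_one`; the asm path is guarded for x86-64 Linux, with a plain C fallback elsewhere.

```c
/* Prototype visible to the C compiler; on x86-64 Linux the body lives
 * in the asm string below. */
int demo_add_one(int x);

#if defined(__GNUC__) && defined(__x86_64__) && defined(__linux__)
/* Top-level asm: the label demo_add_one is defined only inside this
 * string, so the compiler's own symbol table has no definition for it. */
__asm__(
    ".pushsection .text\n"
    ".globl demo_add_one\n"
    ".type demo_add_one, @function\n"
    "demo_add_one:\n"
    "    leal 1(%rdi), %eax\n"   /* return x + 1 (SysV x86-64 ABI) */
    "    ret\n"
    ".size demo_add_one, .-demo_add_one\n"
    ".popsection\n"
);
#else
/* Portable fallback so the sketch still builds on other targets. */
int demo_add_one(int x) { return x + 1; }
#endif

/* Ordinary C caller, analogous to a scrypt_ROMix_* function calling
 * its scrypt_ChunkMix_* helper. */
int demo_call(int x)
{
    return demo_add_one(x);
}
```

If this is indeed what is happening, two untested things to try are compiling just the affected source file with -fno-lto, or linking with -flto-partition=none so that everything stays in a single partition. Both are standard GCC options, but whether either one fixes this particular build is a guess.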
johnsmithx
August 05, 2016, 04:30:35 PM
 #985

Excellent work. The easiest way to work around the compile error is to comment out the source dir for argon2 and remove the registration
call for argon2 in algo-gate-api.c:register_algo_gate. You can easily remove any algo this way.

You have demonstrated that LTO improves performance with the new compiler but has some incompatibilities with the existing
argon2 code. I will investigate argon2 to try to solve it.

I am glad I could help. When I compiled on a Xeon X5570, which supports neither AVX nor AVX2, I also got this error at the final link (the compiler and the source code were the same, just a different CPU):
Code:
/tmp/ccmS1O9H.ltrans19.ltrans.o: In function `grsoQ1024ASM':
<artificial>:(.text+0xa530): undefined reference to `grsoT0'
<artificial>:(.text+0xa538): undefined reference to `grsoT1'
<artificial>:(.text+0xa54a): undefined reference to `grsoT0'
<artificial>:(.text+0xa552): undefined reference to `grsoT1'
<artificial>:(.text+0xa564): undefined reference to `grsoT2'
<artificial>:(.text+0xa56c): undefined reference to `grsoT3'
<artificial>:(.text+0xa57e): undefined reference to `grsoT2'
<artificial>:(.text+0xa586): undefined reference to `grsoT3'
<artificial>:(.text+0xa598): undefined reference to `grsoT4'
<artificial>:(.text+0xa5a0): undefined reference to `grsoT5'
<artificial>:(.text+0xa5b2): undefined reference to `grsoT4'
<artificial>:(.text+0xa5ba): undefined reference to `grsoT5'
<artificial>:(.text+0xa5d0): undefined reference to `grsoT6'
<artificial>:(.text+0xa5d8): undefined reference to `grsoT7'
<artificial>:(.text+0xa5ea): undefined reference to `grsoT6'
<artificial>:(.text+0xa5f2): undefined reference to `grsoT7'
<artificial>:(.text+0xa600): undefined reference to `grsoT0'
<artificial>:(.text+0xa608): undefined reference to `grsoT1'
<artificial>:(.text+0xa61a): undefined reference to `grsoT0'
<artificial>:(.text+0xa622): undefined reference to `grsoT1'
<artificial>:(.text+0xa634): undefined reference to `grsoT2'
<artificial>:(.text+0xa63c): undefined reference to `grsoT3'
<artificial>:(.text+0xa64e): undefined reference to `grsoT2'
<artificial>:(.text+0xa656): undefined reference to `grsoT3'
<artificial>:(.text+0xa668): undefined reference to `grsoT4'
<artificial>:(.text+0xa670): undefined reference to `grsoT5'
<artificial>:(.text+0xa682): undefined reference to `grsoT4'
<artificial>:(.text+0xa68a): undefined reference to `grsoT5'
<artificial>:(.text+0xa6a0): undefined reference to `grsoT6'
<artificial>:(.text+0xa6a8): undefined reference to `grsoT7'
<artificial>:(.text+0xa6ba): undefined reference to `grsoT6'
<artificial>:(.text+0xa6c2): undefined reference to `grsoT7'
<artificial>:(.text+0xa730): undefined reference to `grsoT0'
<artificial>:(.text+0xa738): undefined reference to `grsoT1'
<artificial>:(.text+0xa74a): undefined reference to `grsoT0'
<artificial>:(.text+0xa752): undefined reference to `grsoT1'
<artificial>:(.text+0xa764): undefined reference to `grsoT2'
<artificial>:(.text+0xa76c): undefined reference to `grsoT3'
<artificial>:(.text+0xa77e): undefined reference to `grsoT2'
<artificial>:(.text+0xa786): undefined reference to `grsoT3'
<artificial>:(.text+0xa798): undefined reference to `grsoT4'
<artificial>:(.text+0xa7a0): undefined reference to `grsoT5'
<artificial>:(.text+0xa7b2): undefined reference to `grsoT4'
<artificial>:(.text+0xa7ba): undefined reference to `grsoT5'
<artificial>:(.text+0xa7d0): undefined reference to `grsoT6'
<artificial>:(.text+0xa7d8): undefined reference to `grsoT7'
<artificial>:(.text+0xa7ea): undefined reference to `grsoT6'
<artificial>:(.text+0xa7f2): undefined reference to `grsoT7'
<artificial>:(.text+0xa800): undefined reference to `grsoT0'
<artificial>:(.text+0xa808): undefined reference to `grsoT1'
<artificial>:(.text+0xa81a): undefined reference to `grsoT0'
<artificial>:(.text+0xa822): undefined reference to `grsoT1'
<artificial>:(.text+0xa834): undefined reference to `grsoT2'
<artificial>:(.text+0xa83c): undefined reference to `grsoT3'
<artificial>:(.text+0xa84e): undefined reference to `grsoT2'
<artificial>:(.text+0xa856): undefined reference to `grsoT3'
<artificial>:(.text+0xa868): undefined reference to `grsoT4'
<artificial>:(.text+0xa870): undefined reference to `grsoT5'
<artificial>:(.text+0xa882): undefined reference to `grsoT4'
<artificial>:(.text+0xa88a): undefined reference to `grsoT5'
<artificial>:(.text+0xa8a0): undefined reference to `grsoT6'
<artificial>:(.text+0xa8a8): undefined reference to `grsoT7'
<artificial>:(.text+0xa8ba): undefined reference to `grsoT6'
<artificial>:(.text+0xa8c2): undefined reference to `grsoT7'
<artificial>:(.text+0xa930): undefined reference to `grsoT0'
<artificial>:(.text+0xa938): undefined reference to `grsoT1'
<artificial>:(.text+0xa94a): undefined reference to `grsoT0'
<artificial>:(.text+0xa952): undefined reference to `grsoT1'
<artificial>:(.text+0xa964): undefined reference to `grsoT2'
<artificial>:(.text+0xa96c): undefined reference to `grsoT3'
<artificial>:(.text+0xa97e): undefined reference to `grsoT2'
<artificial>:(.text+0xa986): undefined reference to `grsoT3'
<artificial>:(.text+0xa998): undefined reference to `grsoT4'
<artificial>:(.text+0xa9a0): undefined reference to `grsoT5'
<artificial>:(.text+0xa9b2): undefined reference to `grsoT4'
<artificial>:(.text+0xa9ba): undefined reference to `grsoT5'
<artificial>:(.text+0xa9d0): undefined reference to `grsoT6'
<artificial>:(.text+0xa9d8): undefined reference to `grsoT7'
<artificial>:(.text+0xa9ea): undefined reference to `grsoT6'
<artificial>:(.text+0xa9f2): undefined reference to `grsoT7'
<artificial>:(.text+0xaa00): undefined reference to `grsoT0'
<artificial>:(.text+0xaa08): undefined reference to `grsoT1'
<artificial>:(.text+0xaa1a): undefined reference to `grsoT0'
<artificial>:(.text+0xaa22): undefined reference to `grsoT1'
<artificial>:(.text+0xaa34): undefined reference to `grsoT2'
<artificial>:(.text+0xaa3c): undefined reference to `grsoT3'
<artificial>:(.text+0xaa4e): undefined reference to `grsoT2'
<artificial>:(.text+0xaa56): undefined reference to `grsoT3'
<artificial>:(.text+0xaa68): undefined reference to `grsoT4'
<artificial>:(.text+0xaa70): undefined reference to `grsoT5'
<artificial>:(.text+0xaa82): undefined reference to `grsoT4'
<artificial>:(.text+0xaa8a): undefined reference to `grsoT5'
<artificial>:(.text+0xaaa0): undefined reference to `grsoT6'
<artificial>:(.text+0xaaa8): undefined reference to `grsoT7'
<artificial>:(.text+0xaaba): undefined reference to `grsoT6'
<artificial>:(.text+0xaac2): undefined reference to `grsoT7'
<artificial>:(.text+0xab30): undefined reference to `grsoT0'
<artificial>:(.text+0xab38): undefined reference to `grsoT1'
<artificial>:(.text+0xab4a): undefined reference to `grsoT0'
<artificial>:(.text+0xab52): undefined reference to `grsoT1'
<artificial>:(.text+0xab64): undefined reference to `grsoT2'
<artificial>:(.text+0xab6c): undefined reference to `grsoT3'
<artificial>:(.text+0xab7e): undefined reference to `grsoT2'
<artificial>:(.text+0xab86): undefined reference to `grsoT3'
<artificial>:(.text+0xab98): undefined reference to `grsoT4'
<artificial>:(.text+0xaba0): undefined reference to `grsoT5'
<artificial>:(.text+0xabb2): undefined reference to `grsoT4'
<artificial>:(.text+0xabba): undefined reference to `grsoT5'
<artificial>:(.text+0xabd0): undefined reference to `grsoT6'
<artificial>:(.text+0xabd8): undefined reference to `grsoT7'
<artificial>:(.text+0xabea): undefined reference to `grsoT6'
<artificial>:(.text+0xabf2): undefined reference to `grsoT7'
<artificial>:(.text+0xac00): undefined reference to `grsoT0'
<artificial>:(.text+0xac08): undefined reference to `grsoT1'
<artificial>:(.text+0xac1a): undefined reference to `grsoT0'
<artificial>:(.text+0xac22): undefined reference to `grsoT1'
<artificial>:(.text+0xac34): undefined reference to `grsoT2'
<artificial>:(.text+0xac3c): undefined reference to `grsoT3'
<artificial>:(.text+0xac4e): undefined reference to `grsoT2'
<artificial>:(.text+0xac56): undefined reference to `grsoT3'
<artificial>:(.text+0xac68): undefined reference to `grsoT4'
<artificial>:(.text+0xac70): undefined reference to `grsoT5'
<artificial>:(.text+0xac82): undefined reference to `grsoT4'
<artificial>:(.text+0xac8a): undefined reference to `grsoT5'
<artificial>:(.text+0xaca0): undefined reference to `grsoT6'
<artificial>:(.text+0xaca8): undefined reference to `grsoT7'
<artificial>:(.text+0xacba): undefined reference to `grsoT6'
<artificial>:(.text+0xacc2): undefined reference to `grsoT7'
/tmp/ccmS1O9H.ltrans19.ltrans.o: In function `grsoP1024ASM':
<artificial>:(.text+0xadbb): undefined reference to `grsoT0'
<artificial>:(.text+0xadc3): undefined reference to `grsoT1'
<artificial>:(.text+0xadd5): undefined reference to `grsoT0'
<artificial>:(.text+0xaddd): undefined reference to `grsoT1'
<artificial>:(.text+0xadef): undefined reference to `grsoT2'
<artificial>:(.text+0xadf7): undefined reference to `grsoT3'
<artificial>:(.text+0xae09): undefined reference to `grsoT2'
<artificial>:(.text+0xae11): undefined reference to `grsoT3'
<artificial>:(.text+0xae23): undefined reference to `grsoT4'
<artificial>:(.text+0xae2b): undefined reference to `grsoT5'
<artificial>:(.text+0xae3d): undefined reference to `grsoT4'
<artificial>:(.text+0xae45): undefined reference to `grsoT5'
<artificial>:(.text+0xae57): undefined reference to `grsoT6'
<artificial>:(.text+0xae5f): undefined reference to `grsoT7'
<artificial>:(.text+0xae6d): undefined reference to `grsoT6'
<artificial>:(.text+0xae75): undefined reference to `grsoT7'
<artificial>:(.text+0xae9b): undefined reference to `grsoT0'
<artificial>:(.text+0xaea3): undefined reference to `grsoT1'
<artificial>:(.text+0xaeb5): undefined reference to `grsoT0'
<artificial>:(.text+0xaebd): undefined reference to `grsoT1'
<artificial>:(.text+0xaecf): undefined reference to `grsoT2'
<artificial>:(.text+0xaed7): undefined reference to `grsoT3'
<artificial>:(.text+0xaee9): undefined reference to `grsoT2'
<artificial>:(.text+0xaef1): undefined reference to `grsoT3'
<artificial>:(.text+0xaf03): undefined reference to `grsoT4'
<artificial>:(.text+0xaf0b): undefined reference to `grsoT5'
<artificial>:(.text+0xaf1d): undefined reference to `grsoT4'
<artificial>:(.text+0xaf25): undefined reference to `grsoT5'
<artificial>:(.text+0xaf37): undefined reference to `grsoT6'
<artificial>:(.text+0xaf3f): undefined reference to `grsoT7'
<artificial>:(.text+0xaf4d): undefined reference to `grsoT6'
<artificial>:(.text+0xaf55): undefined reference to `grsoT7'
<artificial>:(.text+0xaf7b): undefined reference to `grsoT0'
<artificial>:(.text+0xaf83): undefined reference to `grsoT1'
<artificial>:(.text+0xaf95): undefined reference to `grsoT0'
<artificial>:(.text+0xaf9d): undefined reference to `grsoT1'
<artificial>:(.text+0xafaf): undefined reference to `grsoT2'
<artificial>:(.text+0xafb7): undefined reference to `grsoT3'
<artificial>:(.text+0xafc9): undefined reference to `grsoT2'
<artificial>:(.text+0xafd1): undefined reference to `grsoT3'
<artificial>:(.text+0xafe3): undefined reference to `grsoT4'
<artificial>:(.text+0xafeb): undefined reference to `grsoT5'
<artificial>:(.text+0xaffd): undefined reference to `grsoT4'
<artificial>:(.text+0xb005): undefined reference to `grsoT5'
<artificial>:(.text+0xb017): undefined reference to `grsoT6'
<artificial>:(.text+0xb01f): undefined reference to `grsoT7'
<artificial>:(.text+0xb02d): undefined reference to `grsoT6'
<artificial>:(.text+0xb035): undefined reference to `grsoT7'
<artificial>:(.text+0xb061): undefined reference to `grsoT0'
<artificial>:(.text+0xb069): undefined reference to `grsoT1'
<artificial>:(.text+0xb07b): undefined reference to `grsoT0'
<artificial>:(.text+0xb083): undefined reference to `grsoT1'
<artificial>:(.text+0xb095): undefined reference to `grsoT2'
<artificial>:(.text+0xb09d): undefined reference to `grsoT3'
<artificial>:(.text+0xb0af): undefined reference to `grsoT2'
<artificial>:(.text+0xb0b7): undefined reference to `grsoT3'
<artificial>:(.text+0xb0c9): undefined reference to `grsoT4'
<artificial>:(.text+0xb0d1): undefined reference to `grsoT5'
<artificial>:(.text+0xb0e3): undefined reference to `grsoT4'
<artificial>:(.text+0xb0eb): undefined reference to `grsoT5'
<artificial>:(.text+0xb0fd): undefined reference to `grsoT6'
<artificial>:(.text+0xb105): undefined reference to `grsoT7'
<artificial>:(.text+0xb113): undefined reference to `grsoT6'
<artificial>:(.text+0xb11b): undefined reference to `grsoT7'
<artificial>:(.text+0xb146): undefined reference to `grsoT0'
<artificial>:(.text+0xb14e): undefined reference to `grsoT1'
<artificial>:(.text+0xb160): undefined reference to `grsoT0'
<artificial>:(.text+0xb168): undefined reference to `grsoT1'
<artificial>:(.text+0xb17a): undefined reference to `grsoT2'
<artificial>:(.text+0xb182): undefined reference to `grsoT3'
<artificial>:(.text+0xb194): undefined reference to `grsoT2'
<artificial>:(.text+0xb19c): undefined reference to `grsoT3'
<artificial>:(.text+0xb1ae): undefined reference to `grsoT4'
<artificial>:(.text+0xb1b6): undefined reference to `grsoT5'
<artificial>:(.text+0xb1c8): undefined reference to `grsoT4'
<artificial>:(.text+0xb1d0): undefined reference to `grsoT5'
<artificial>:(.text+0xb1e2): undefined reference to `grsoT6'
<artificial>:(.text+0xb1ea): undefined reference to `grsoT7'
<artificial>:(.text+0xb1f8): undefined reference to `grsoT6'
<artificial>:(.text+0xb200): undefined reference to `grsoT7'
<artificial>:(.text+0xb22c): undefined reference to `grsoT0'
<artificial>:(.text+0xb234): undefined reference to `grsoT1'
<artificial>:(.text+0xb246): undefined reference to `grsoT0'
<artificial>:(.text+0xb24e): undefined reference to `grsoT1'
<artificial>:(.text+0xb260): undefined reference to `grsoT2'
<artificial>:(.text+0xb268): undefined reference to `grsoT3'
<artificial>:(.text+0xb27a): undefined reference to `grsoT2'
<artificial>:(.text+0xb282): undefined reference to `grsoT3'
<artificial>:(.text+0xb294): undefined reference to `grsoT4'
<artificial>:(.text+0xb29c): undefined reference to `grsoT5'
<artificial>:(.text+0xb2ae): undefined reference to `grsoT4'
<artificial>:(.text+0xb2b6): undefined reference to `grsoT5'
<artificial>:(.text+0xb2c8): undefined reference to `grsoT6'
<artificial>:(.text+0xb2d0): undefined reference to `grsoT7'
<artificial>:(.text+0xb2de): undefined reference to `grsoT6'
<artificial>:(.text+0xb2e6): undefined reference to `grsoT7'
<artificial>:(.text+0xb311): undefined reference to `grsoT0'
<artificial>:(.text+0xb319): undefined reference to `grsoT1'
<artificial>:(.text+0xb32b): undefined reference to `grsoT0'
<artificial>:(.text+0xb333): undefined reference to `grsoT1'
<artificial>:(.text+0xb345): undefined reference to `grsoT2'
<artificial>:(.text+0xb34d): undefined reference to `grsoT3'
<artificial>:(.text+0xb35f): undefined reference to `grsoT2'
<artificial>:(.text+0xb367): undefined reference to `grsoT3'
<artificial>:(.text+0xb379): undefined reference to `grsoT4'
<artificial>:(.text+0xb381): undefined reference to `grsoT5'
<artificial>:(.text+0xb393): undefined reference to `grsoT4'
<artificial>:(.text+0xb39b): undefined reference to `grsoT5'
<artificial>:(.text+0xb3ad): undefined reference to `grsoT6'
<artificial>:(.text+0xb3b5): undefined reference to `grsoT7'
<artificial>:(.text+0xb3c3): undefined reference to `grsoT6'
<artificial>:(.text+0xb3cb): undefined reference to `grsoT7'
<artificial>:(.text+0xb3d9): undefined reference to `grsoT0'
<artificial>:(.text+0xb3e1): undefined reference to `grsoT1'
<artificial>:(.text+0xb3f3): undefined reference to `grsoT0'
<artificial>:(.text+0xb3fb): undefined reference to `grsoT1'
<artificial>:(.text+0xb40d): undefined reference to `grsoT2'
<artificial>:(.text+0xb415): undefined reference to `grsoT3'
<artificial>:(.text+0xb427): undefined reference to `grsoT2'
<artificial>:(.text+0xb42f): undefined reference to `grsoT3'
<artificial>:(.text+0xb441): undefined reference to `grsoT4'
<artificial>:(.text+0xb449): undefined reference to `grsoT5'
<artificial>:(.text+0xb45b): undefined reference to `grsoT4'
<artificial>:(.text+0xb463): undefined reference to `grsoT5'
<artificial>:(.text+0xb475): undefined reference to `grsoT6'
<artificial>:(.text+0xb47d): undefined reference to `grsoT7'
<artificial>:(.text+0xb48b): undefined reference to `grsoT6'
<artificial>:(.text+0xb493): undefined reference to `grsoT7'
collect2: error: ld returned 1 exit status
Makefile:1292: recipe for target 'cpuminer' failed
make[2]: *** [cpuminer] Error 1
make[2]: Leaving directory '/root/cpuminer-opt-sse'
Makefile:3320: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/root/cpuminer-opt-sse'
Makefile:658: recipe for target 'all' failed
make: *** [all] Error 2

I solved it by deleting almost all the assembler from algo/groestl/sse2/grso-asm.c.

CPU architecture selection is made at compile time. If you do a native compile on a CPU that supports AVX2 you cannot run it
on a CPU with only AVX. If you want to cross compile you must specify the arch of the target CPU, and produce separate executables
for each desired architecture.

Of course you are perfectly right (although cross-compiling means building for a different host/target, not just a different CPU type). I just got a bit confused because I saw some "choose runtime" remarks in the source code, so I thought there might be some decision made at runtime.

So I have 3 binaries now for sse/avx/avx2. I made three directories and launch the corresponding binary like this:
Code:
grep -q avx /proc/cpuinfo && feat="avx" || feat="sse" && grep -q avx2 /proc/cpuinfo && feat="avx2"
/somewhere/cpuminer-opt-$feat/cpuminer ...
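
The one-liner above works because `&&` and `||` have equal precedence in the shell and group left to right, and because `grep -q avx` also matches `avx2`. A more explicit variant, only as a sketch: the function name `pick_feat` and its optional file argument are made up for illustration, and the directory layout is the one from this post.

```shell
#!/bin/sh
# pick_feat [FILE]: print "avx2", "avx" or "sse" depending on the CPU flags
# listed in FILE (defaults to /proc/cpuinfo). Hypothetical helper name.
pick_feat() {
    cpuinfo=${1:-/proc/cpuinfo}
    # -w matches whole words only, so "avx" does not match "avx2"
    if grep -qw avx2 "$cpuinfo"; then
        echo avx2
    elif grep -qw avx "$cpuinfo"; then
        echo avx
    else
        echo sse
    fi
}

# Usage (commented out so that sourcing this file has no side effects):
# exec "/somewhere/cpuminer-opt-$(pick_feat)/cpuminer" "$@"
```

Testing the most capable instruction set first keeps the order of the checks obvious and avoids relying on operator grouping.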

The SSE and AVX versions are on par with tpruvot's speed-wise. When I compile them with tpruvot's uncommented build.sh they seem a tiny bit faster; anyway, they have those align flags, so I'm staying with it.

My logic for AVX2 isn't fully implemented yet in the capabilities checks. Had it been, it would have
displayed a message warning of the impending crash, then crashed. This is what you should see when implemented:

Code:
CPU features: SSE2 AES AVX
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Unsupported CPU or SW configuration, miner will likely crash!
Illegal instruction (core dumped)

It's theoretically possible that the binary would crash even before this print if gcc decided to use some of those "illegal" instructions in the code preceding the print.

My list of 44(+1) reviewed Bitcoin forks | You don't have to download the pre-fork blockchain again for each fork! | Beware of fraudulent AWS accounts sellers and dangerous edu AWS codes! + My personal list of legit sellers and scammers | Never publicly reveal your btc addresses, ownership or any other details and stay very far away from anybody who asks you to! | The general rule of safe buying is: if the seller is a newbie, with no reputation, with no topic nor trust feedback, offering no vouches and/or selling from a locked or self-moderated topic and unwilling to go first or use escrow => AVOID. Always check the trust feedback first and make sure that you have enabled "Show untrusted feedback by default" in "Profile / Forum Profile Information".
johnsmithx
August 05, 2016, 04:51:11 PM
 #986

I just want to make sure I understand the problem definition:

- multi is faster with -flto
- multi without -flto is slower than identically compiled opt
- multi with -flto is faster than pre-avx2 compiled without -flto
- opt fails to compile with gcc 5.4.0 with -flto
- -flto compiles with gcc 4.8.4 with no effect on performance.

The significant points are:

- -flto is faster with gcc 5.4.0
- code that compiles with -flto using gcc 4.8.4 fails to compile using gcc 5.4.0.

Yes, I think you got all those points right. Well, we would have to deal with 5.4.0 sooner or later anyway; it's not going anywhere.

The code that fails to compile is pretty ugly. It uses asm function pointers to select targets at compile time.
I've never seen anything like this so it will take a while to understand what is going on. It looks like the code is
self-contained and the error doesn't seem to be related to missing libraries.

Ugly or ingenious, crazy anyway. It took me a while to figure out where exactly the error was coming from :)

I just looked at tpruvot's current code and this culprit file, scrypt-jane-romix-template.h, is a bit different now. You know, it's a pity you are not using a versioning system where you would have your improvements on top of tpruvot's. If you used github you could have automatic linux/windows builds done on travis-ci after every commit, including automatic publishing of the binaries on github.

joblo
Legendary
*
Offline Offline

Activity: 1148
Merit: 1016


View Profile
August 05, 2016, 05:37:32 PM
 #987

Maybe too ingenious for the compiler.

I can't find any normal definition of scrypt_ChunkMix_avx2 but there is some code in an unfamiliar syntax in
scrypt-jane-mix_salsa64-avx2.h that may be it. I'm guessing the technique used to abstract asm functions by
building custom stack linkage as well as using asm function pointers is too much for LTO to handle.

I'll look into the TPruvot delta to see if it addresses this issue. Another alternative is to rewrite them in C but
I'm not motivated to do that at this time, given there are alternatives for optimized mining of argon2.

The GRS macros have been a pain for me since day 1. I had to make changes so they could be included in
multiple algos. It looks like LTO likes them as much as I do. The problem is they're faster than the SPH versions.
Do the errors occur only with -flto or do they also occur with gcc 5.4.0 with my build.sh options? This code is only
compiled for SSE2 builds; I may have to drop support for it or degrade performance by using SPH functions in the future.

cpuminer-opt developer, https://bitcointalk.org/index.php?topic=1326803.0
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
ETH: 0x72122edabcae9d3f57eab0729305a425f6fef6d0
joblo
August 05, 2016, 05:52:10 PM
 #988

it's a pity you are not using a versioning system where you would have your improvements on top of tpruvot's. If you used github you could have automatic linux/windows builds done on travis-ci after every commit, including automatic publishing of the binaries on github.


I can't figure out how to get my code into github. It must be simple but I haven't figured it out. I have an account and played with
it using the tutorial but I'm stuck.
I also want to make a gradual transition so I don't get bogged down trying to figure out github when I'm trying to focus
on the miner. The first phase is to continue development offline and upload releases.

Fuzzbawls
Hero Member
*****
Offline Offline

Activity: 745
Merit: 500

★YoBit.Net★ 200+ Coins Exchange & Dice


View Profile
August 05, 2016, 06:05:03 PM
 #989

I know this may sound counter-productive for focusing on the actual coding of the miner, but the sooner you learn git + github the easier your life becomes in regards to making ANY changes to code. Well worth the invested time!

It's important to remember, however, that github is a GUI (mostly) that is built on top of git. One can manage and get by "ok" with strictly using the github website for their code management, but knowing how to use the git CLI toolset is second to none.

Travis-CI, as mentioned, can also greatly improve the efficiency of any build testing you do. It is another system to learn, has its own learning curve and, of course, changes over time... but it is also a VERY powerful and customizable tool.

joblo
August 05, 2016, 06:08:14 PM
 #990

My logic for AVX2 isn't fully implemented yet in the capabilities checks. Had it been, it would have
displayed a message warning of the impending crash, then crashed. This is what you should see when implemented:

Code:
CPU features: SSE2 AES AVX
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Unsupported CPU or SW configuration, miner will likely crash!
Illegal instruction (core dumped)

It's theoretically possible that the binary would crash even before this print if gcc decided to use some of those "illegal" instructions in the code preceding the print.

Theoretically yes, if any earlier-executed code contains compiler-produced AVX2 instructions from regular source.
That isn't likely since the capabilities check is done early in main.
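
The comparison behind that warning can be sketched in shell. This only illustrates the logic: the real check lives in the miner's C code, and the function name `missing_features` is invented here.

```shell
#!/bin/sh
# missing_features "CPU_LIST" "SW_LIST": print every feature the software was
# built with that the CPU does not report, space-separated on one line.
missing_features() {
    cpu=" $1 "
    missing=""
    for f in $2; do
        case "$cpu" in
            *" $f "*) ;;                               # the CPU reports it
            *) missing="$missing${missing:+ }$f" ;;    # built in, but not on this CPU
        esac
    done
    echo "$missing"
}

# Feature lists taken from the example output quoted in this post.
m=$(missing_features "SSE2 AES AVX" "SSE2 AES AVX AVX2")
[ -n "$m" ] && echo "Unsupported CPU or SW configuration (missing: $m), miner will likely crash!"
```

Such a check can only warn if it runs before the first instruction the CPU lacks, which is why it has to sit as early as possible in main.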

joblo
August 05, 2016, 06:29:11 PM
 #991

Hi Fuzzbawls, you are correct but I need to get to a certain level of proficiency with github (I mean both git cli and github gui) before
I can be as productive. I don't want to get bogged down because I'm lost in the dev environment.

If someone knows the command to log in and upload a source tree it would get me unstuck and I could move forward.

Fuzzbawls
August 05, 2016, 06:55:46 PM
 #992

If someone knows the command to log in and upload a source tree it would get me unstuck and I could move forward.

quick n' dirty: https://help.github.com/articles/adding-an-existing-project-to-github-using-the-command-line/

This will take your existing source tree and upload it to whatever new repository you create on github, with the "First Commit" being a complete replica of your local source tree's current state (at the moment you make the commit). You will be prompted for your github username/password after step 9.

A quick read of https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup is also worthwhile. Specifically, the two CLI commands pertaining to "Your Identity" should be run between steps 4 and 5 (user.email should match the email associated with your github account, but can also be username@users.noreply.github.com where username is your github username).
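
Those steps boil down to a handful of commands. As a rough sketch: `upload_tree` is a made-up helper name, the identity values are placeholders, and the remote URL must point at an empty repository you have already created (on github it would look like https://github.com/USERNAME/REPO.git).

```shell
#!/bin/sh
# upload_tree SRC_DIR REMOTE_URL
# Put an existing source tree under git and push it to an empty remote.
# The body runs in a subshell so the caller's working directory is left alone.
upload_tree() (
    cd "$1" &&
    git init -q &&
    # Identity for this repository only; replace the placeholder values.
    git config user.name "Your Name" &&
    git config user.email "username@users.noreply.github.com" &&
    git add . &&
    git commit -q -m "First commit" &&
    git remote add origin "$2" &&
    # Push the current branch, whatever its name, and make it track the remote.
    git push -q -u origin HEAD
)

# Usage (commented out so that sourcing this file has no side effects):
# upload_tree ~/cpuminer-opt https://github.com/USERNAME/REPO.git
```

Pushing `HEAD` instead of a named branch sidesteps the question of what the default branch is called on a given git version.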

johnsmithx
August 05, 2016, 08:15:08 PM
 #993

The GRS macros have been a pain for me since day 1. I had to make changes so they could be included in
multiple algos. It looks like LTO likes them as much as I do. The problem is they're faster than the SPH versions.
Do the errors occur only with -flto or do they also occur with gcc 5.4.0 with my build.sh options? This code is only
compiled for SSE2 builds; I may have to drop support for it or degrade performance by using SPH functions in the future.

With your build.sh, without messing with LTO, I haven't seen any problem, but that was on AVX/AVX2 CPUs. On a non-AVX CPU I compiled for the first time today, and that was straight with LTO. So now I did just that: downloaded 3.4.0 and ran the untouched build.sh. No error. But look at the remarkable speed difference when I compare it with 3.4.0 built with tpruvot's uncommented (LTO-enabled) build.sh:
Code:
root@xxx:~/z/cpuminer-opt-3.4.0# ./cpuminer -a lyra2re --benchmark

         **********  cpuminer-multi 1.2-dev  ***********
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI and AVX extensions.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
     Forked from TPruvot's cpuminer-multi with credits
     to Lucas Jones, elmad, palmd, djm34, pooler, ig0tik3d,
     Wolf0, Jeff Garzik and Optiminer.

CPU: Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
CPU features: SSE2
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2
Algo features: SSE2 AES AVX AVX2
AES not available, starting mining with SSE2 optimizations...

[2016-08-05 19:03:55] 16 miner threads started, using 'lyra2re' algorithm.
[2016-08-05 19:03:56] Total: 983.04 kH, 872.93 kH/s
[2016-08-05 19:04:00] Total: 3218.22 kH, 927.88 kH/s
[2016-08-05 19:04:06] Total: 4515.37 kH, 896.32 kH/s
[2016-08-05 19:04:10] Total: 4143.24 kH, 907.80 kH/s

root@xxx:~/cpuminer-opt-sse# ./cpuminer -a lyra2re --benchmark

         **********  cpuminer-multi 1.2-dev  ***********
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI and AVX extensions.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
     Forked from TPruvot's cpuminer-multi with credits
     to Lucas Jones, elmad, palmd, djm34, pooler, ig0tik3d,
     Wolf0, Jeff Garzik and Optiminer.

CPU: Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
CPU features: SSE2
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2
Algo features: SSE2 AES AVX AVX2
AES not available, starting mining with SSE2 optimizations...

[2016-08-05 19:04:33] 16 miner threads started, using 'lyra2re' algorithm.
[2016-08-05 19:04:34] Total: 786.43 kH, 753.31 kH/s
[2016-08-05 19:04:38] Total: 3006.12 kH, 993.53 kH/s
[2016-08-05 19:04:43] Total: 4691.35 kH, 988.85 kH/s
[2016-08-05 19:04:48] Total: 4694.50 kH, 994.89 kH/s
[2016-08-05 19:04:53] Total: 4881.65 kH, 998.29 kH/s

it's a pity you are not using a versioning system where you would have your improvements on top of tpruvot's. If you used github you could have automatic linux/windows builds done on travis-ci after every commit, including automatic publishing of the binaries on github.

I can't figure out how to get my code into github. It must be simple but I haven't figured it out. I have an account and played with
it using the tutorial but I'm stuck.
I also want to make a gradual transition so I don't get bogged down trying to figure out github when I'm trying to focus
on the miner. The first phase is to continue development offline and upload releases.

I think this should be of help: https://help.github.com/articles/adding-an-existing-project-to-github-using-the-command-line/. Or, to make you feel better, you can always do what I did when I didn't know how to do this same thing: I created an empty repo on github, git cloned it locally, copied all the files into the directory, git added them and then just git pushed it back to github :)

If I remember correctly, when you try to push for the first time it will ask for your github credentials. If you don't want it to ask for your password every single time, you can use git config credential.helper store and it will save and remember it.

As for travis, it requires a .travis.yml file to be present. This file already exists and it's quite self-explanatory if you look into it. Travis is free for public open-source projects, and if you hook it up it will automatically build after every commit you make. Those builds, for linux, are excellent in that they independently confirm your code is compilable (which you already knew anyway, but still...), but I wouldn't try to gather and publish the binaries. I wouldn't publish binaries for linux at all; you would just get into trouble, and people are already used to compiling it themselves. But you can also set up building with a win64 target using mingw, and those binaries I would publish so that you (the people) don't rely on some unknown third party. I don't want to spread paranoia or anything, but you never know what they did to the code, or whether they added something.

You can also get an .edu email for $2 from here: https://bitcointalk.org/index.php?topic=1321638.0 which will (among other things) allow you to get a github student pack for free, which in turn will (among other things) give you access to travis-ci.com (as opposed to travis-ci.org), which is for private building (normally it would cost money, this way it would be free) and which you could maybe use as a sandbox.

But if you are serious about github there is one crucial decision you will have to make, not necessarily now: whether you want to keep working on top of tpruvot's years-old code and keep ignoring all the changes and possible improvements he has made since you took his code and will keep making in the future, or whether you extract all your changes from the old code you took, fork his repo and put them on top of it. That way you will stay in sync with him and your changes will be reapplied after each of his commits. This would require quite a lot of work, mainly the part where you extract your changes; in principle it's just a matter of making a diff, fast-forwarding his code to the most recent version and reapplying the diff. Maybe someone young and enthusiastic could do that? Unfortunately I am old and tired so I am out of the game.
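
The diff-and-reapply idea can be sketched with plain diff and patch. Everything named here is an assumption made for illustration: a work directory holding `old-base` (the upstream snapshot the fork started from), `mine` (the fork) and `new-base` (current upstream), plus the helper name `port_changes`.

```shell
#!/bin/sh
# port_changes WORKDIR
# Extract the changes between WORKDIR/old-base and WORKDIR/mine as a patch
# and reapply them on top of WORKDIR/new-base.
port_changes() (
    cd "$1" || exit 1
    diff -ruN old-base mine > my-changes.patch
    # diff exits with 1 when the trees differ (expected); 2 means real trouble
    [ $? -le 1 ] || exit 1
    # -p1 strips the leading "old-base/" or "mine/" from the paths in the patch
    cd new-base && patch -p1 -s < ../my-changes.patch
)

# Usage (commented out so that sourcing this file has no side effects):
# port_changes ~/work
```

Hunks that touch code upstream has also changed will be rejected by patch and have to be merged by hand, which is where most of the work would go.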

Theoretically yes, if any earlier-executed code contains compiler-produced AVX2 instructions from regular source.
That isn't likely since the capabilities check is done early in main.

Let's not forget that we ask gcc to compile and optimize (all those -O2 -O3 -Ofast) for the CPU it's being run on. So regardless of whether you actually include any explicit AVX/AVX2 assembler in the code, even a simple printf("hi"); may produce AVX2 instruction(s) if the compiler feels like it. That's the whole point of compiling for the given CPU (-march=native): the compiler is allowed to use all the capabilities (and thus instruction sets) of that CPU.
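
One way to see what the compiler is allowed to use is to dump its predefined macros for a given -march value. A small sketch, assuming gcc is installed on an x86 machine; the helper name `target_has_avx2` is made up for illustration.

```shell
#!/bin/sh
# target_has_avx2 MARCH: succeed if gcc predefines __AVX2__ for -march=MARCH,
# i.e. if the compiler may emit AVX2 instructions anywhere in the generated code.
target_has_avx2() {
    gcc -march="$1" -dM -E - </dev/null 2>/dev/null | grep -q '__AVX2__'
}

if target_has_avx2 native; then
    echo "-march=native builds on this machine may contain AVX2 instructions"
else
    echo "-march=native builds on this machine get no compiler-generated AVX2"
fi
```

Running it on the build machine shows whether a native build is safe to copy to an AVX-only box.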

EDIT: oh, Fuzzbawls was faster. Btw, Fuzzbawls, you wouldn't get by "ok" with strictly using the github website, you would commit suicide. Editing a file directly via the github website is possible but it's so painful, don't even try it. On top of that, every time you click "save", for every file you changed, it creates a new commit, so you can't even review your changes easily. Horrible, terrible. For non-command-line people, many IDEs have git integrated so you don't have to use the git CLI, but you would hardly survive using just the github website.

joblo
August 05, 2016, 08:19:36 PM
 #994

There are two copies of scrypt-jane, one copy used by scrypt-jane itself and the other used by
argon2. Only the argon2 version is optimized and was taken from the argon2 branch of multi. The other copy was taken
from the windows branch and used by scrypt-jane. At some point I intend to integrate the optimized version for use
by scrypt-jane algo, but it's not a very popular algo. Maybe you could try compiling the argon2 branch of multi to see
if it also has the same error.

johnsmithx
August 05, 2016, 08:48:21 PM
 #995

You are absolutely right. If I build the argon2 branch with the default build.sh (LTO and the other flags disabled) it compiles, but if I do my uncommenting I get the same error:
Code:
/tmp/ccAcmNi5.ltrans14.ltrans.o: In function `scrypt_ROMix_sse2.lto_priv.317':
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_sse2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_sse2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_sse2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_sse2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_sse2'
/tmp/ccAcmNi5.ltrans14.ltrans.o:/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: more undefined references to `scrypt_ChunkMix_sse2' follow
/tmp/ccAcmNi5.ltrans14.ltrans.o: In function `scrypt_ROMix_ssse3.lto_priv.316':
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_ssse3'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_ssse3'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_ssse3'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_ssse3'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_ssse3'
/tmp/ccAcmNi5.ltrans14.ltrans.o:/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: more undefined references to `scrypt_ChunkMix_ssse3' follow
/tmp/ccAcmNi5.ltrans14.ltrans.o: In function `scrypt_ROMix_avx.lto_priv.315':
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx'
/tmp/ccAcmNi5.ltrans14.ltrans.o:/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: more undefined references to `scrypt_ChunkMix_avx' follow
/tmp/ccAcmNi5.ltrans14.ltrans.o: In function `scrypt_ROMix_xop.lto_priv.314':
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_xop'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_xop'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_xop'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_xop'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_xop'
/tmp/ccAcmNi5.ltrans14.ltrans.o:/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: more undefined references to `scrypt_ChunkMix_xop' follow
/tmp/ccAcmNi5.ltrans14.ltrans.o: In function `scrypt_ROMix_avx2.lto_priv.313':
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx2'
/tmp/ccAcmNi5.ltrans14.ltrans.o:/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: more undefined references to `scrypt_ChunkMix_avx2' follow
collect2: error: ld returned 1 exit status
Makefile:881: recipe for target 'cpuminer' failed

My list of 44(+1) reviewed Bitcoin forks | You don't have to download the pre-fork blockchain again for each fork! | Beware of fraudulent AWS accounts sellers and dangerous edu AWS codes! + My personal list of legit sellers and scammers | Never publicly reveal your btc addresses, ownership or any other details and stay very far away from anybody who asks you to! | The general rule of safe buying is: if the seller is a newbie, with no reputation, with no topic nor trust feedback, offering no vouches and/or selling from a locked or self-moderated topic and unwilling to go first or use escrow => AVOID. Always check the trust feedback first and make sure that you have enabled "Show untrusted feedback by default" in "Profile / Forum Profile Information".
joblo
August 05, 2016, 09:30:25 PM
 #996

I think the situation is pretty well understood now.

gcc 5.4.0 has enhancements to LTO that are incompatible with the existing optimized scrypt-jane code used by argon2.
Those same enhancements improve performance when compiling with gcc 5.4.0 and -flto.
The short-term workaround for users with gcc 5.4.0 is to disable argon2 by hiding the source directory from the compiler
and removing the registration of argon2 in algo-gate-api.c:register_algo_gate.
The long-term solution is to find the cause of the compile error and plan a way to fix it, or to rewrite the functions in C
in a way that doesn't try to outsmart the compiler.

In addition it is necessary to work around a compile error in grs. This is accomplished by hiding the algo/groestl/sse2/ directory from
the compiler and removing algo/groestl/sse2/grso-asm.c from Makefile.am. This workaround will break the SSE2 compile.
A long-term fix for this issue is unlikely, and one would probably result in reduced performance for some algos on SSE2-limited
CPUs in order to have more performance on newer ones.

joblo
August 05, 2016, 11:16:12 PM
 #997


Theoretically yes, if there exists any earlier executed code that contained compiler-produced AVX2 instructions from regular source.
That isn't likely since the capabilities check is done early in main.

Let's not forget that we ask gcc to compile and optimize (all those -O2 -O3 -Ofast) for the cpu it's being run on. So regardless whether you actually include any explicit AVX/AVX2 assembler in the code, even a simple printf("hi"); may produce AVX2 instruction(s) if the compiler feels like it. That's the whole point of the compiler compiling for the given cpu (-march=native) - it's allowed to use all the capabilities (and thus instruction sets) of the cpu.


That's what I was referring to when I wrote "compiler produced AVX2". AVX(2) provides SIMD instructions and it's unlikely something
like a printf would use it. A memcpy wouldn't use it because of the overhead of loading/storing the data to/from the ymm regs.
It's only useful for vector arithmetic, and apparently the compiler isn't smart enough to convert conventionally coded array processing
loops to AVX2. I'm not even sure *if* the compiler can optimize in this fashion; the existence of so much hand-coded AVX2 suggests
otherwise.

Since we're playing semantic games would you care to explain your concerns with my use of the term cross compiling?
IMO cross compiling can mean any compilation not done on the target machine and not executable on the build machine.

And my comment about you maybe using a core2 was based on the symptoms you described and that some server CPUs can
be optimized for efficiency by removing/disabling unneeded features like floating point, AES or AVX.
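
For reference, a minimal sketch of the kind of early capabilities check being discussed, assuming GCC on x86. `__builtin_cpu_init` and `__builtin_cpu_supports` are GCC builtins; the helper name `cpu_has_avx2` is made up for illustration and this is not necessarily how cpuminer-opt does its own check.

```c
/* Hypothetical helper (name is an assumption, not from cpuminer-opt):
 * returns 1 if the running CPU reports AVX2 support, 0 otherwise. */
static int cpu_has_avx2(void)
{
    __builtin_cpu_init();   /* GCC builtin: initialise the CPU feature data */
    return __builtin_cpu_supports("avx2") ? 1 : 0;
}
```

The idea would be to call it first thing in main, before any hand-coded AVX2 path is reached. If the whole binary is built with -march=native, the compiler is allowed to emit AVX2 anywhere, so a check like this only protects code that runs after it.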

Wolf0
August 05, 2016, 11:51:17 PM
 #998


Not true. memcpy() and friends CAN and do use SSE/AVX - if the source/dests are aligned properly.

Code:
Donations: BTC: 1WoLFdwcfNEg64fTYsX1P25KUzzSjtEZC -- XMR: 45SLUTzk7UXYHmzJ7bFN6FPfzTusdUVAZjPRgmEDw7G3SeimWM2kCdnDQXwDBYGUWaBtZNgjYtEYA22aMQT4t8KfU3vHLHG
joblo
August 06, 2016, 12:16:45 AM
 #999


It doesn't really matter in this context whether it crashes before or after the warning message.

I'll take your word for it, but it doesn't seem to make much sense. It is essentially load, move, store, 256 bits wide. Where
are the savings? I presume it takes longer to load data into the ymm regs than general purpose ones. The same amount of data
has to be moved around in memory. Using AVX seems to make sense if you're going to do a lot of processing of the data while
in vector format.

There's my strawman, rip it apart.

Speaking of alignment I need to fix that up in my avx code. I used all loadu/storeu for convenience.

Wolf0
August 06, 2016, 12:48:52 AM
 #1000


If it's aligned, then the load/stores don't take nearly as long. Also keep in mind that there's no such thing as a mov memaddr, memaddr opcode in x86 that I know of. Therefore, it's gotta go in a register (this is simplified, I know about things like DMA, but they don't come into play for the purposes of this discussion) and if it's aligned, it makes one hell of a lot more sense to stuff it in an AVX register, because it's a lot wider than a GPR. Even better if you're doing some kind of gather-scatter shit, possibly.
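
A rough sketch of that point, assuming GCC on x86: the copy goes memory -> ymm register -> memory, 32 bytes per iteration, using the aligned load/store intrinsics. The function names `copy_avx_aligned` and `copy_bytes` are made up for illustration, and this is not how glibc's memcpy is actually written; it only shows the aligned case being discussed.

```c
#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, 32 at a time, through a ymm register.
 * Assumes dst and src are 32-byte aligned and n is a multiple of 32. */
__attribute__((target("avx")))
static void copy_avx_aligned(void *dst, const void *src, size_t n)
{
    const __m256i *s = (const __m256i *)src;
    __m256i *d = (__m256i *)dst;
    for (size_t i = 0; i < n / 32; i++) {
        __m256i v = _mm256_load_si256(s + i);   /* aligned 256-bit load  */
        _mm256_store_si256(d + i, v);           /* aligned 256-bit store */
    }
}

/* Hypothetical dispatcher: take the AVX path only when the CPU reports AVX
 * and the alignment/size assumptions hold; otherwise fall back to memcpy. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")
        && ((uintptr_t)dst % 32 == 0) && ((uintptr_t)src % 32 == 0)
        && (n % 32 == 0))
        copy_avx_aligned(dst, src, n);
    else
        memcpy(dst, src, n);
}
```

The `target("avx")` attribute lets the one function use AVX intrinsics without building the whole file with -mavx. For unaligned buffers the loadu/storeu variants (`_mm256_loadu_si256`, `_mm256_storeu_si256`) would be needed instead of the aligned ones.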
