Bitcoin Forum
October 21, 2018, 11:30:40 AM *
Author Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner  (Read 416066 times)
Giulini
April 21, 2016, 07:48:43 PM
 #501

Same here with a Sempron 145 CPU; configure and make ran without errors.

Code:
        **********  cpuminer-opt 3.1.16  ***********
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI extension.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
     Forked from TPruvot's cpuminer-multi with credits
     to Lucas Jones, elmad, palmd, djm34, pooler, ig0tik3d,
     Wolf0 and Jeff Garzik.

Checking CPU capatibility...
        AMD Phenom(tm) II X4 940 Processor
   CPU arch supports AES_NI...NO.
   CPU arch supports SSE2.....YES.
   SW built for SSE2..........NO.
Incompatible SW build, rebuild with "-march=native"

Why?
AlexGR
April 21, 2016, 08:06:33 PM
 #502

May I propose a different approach for much faster mining?

Currently most, if not all, CPU-mineable coins are cripple-mined.

The reason is simple: the SIMD nature of the SSE and AVX instruction sets is underutilized.

SSE and AVX instructions are used in SISD fashion (single instruction, single data, rather than single instruction, multiple data / SIMD), meaning each instruction processes one piece of data instead of two or more.

Right now hashing works like this:

The main mining routine sends one input to each hash function, where it goes through a series of SERIAL transformations / permutations, and in the end the hash function returns its output to the miner (sometimes to be passed on to the next hash).

This serial process doesn't leave much room for Single Instruction Multiple Data utilization.

What should be done instead is that the miner program should issue 2-4 hash candidates to the hashing routines. The hashing routines should accept 2-4 inputs (instead of 1) and return 2-4 outputs. The process would then be parallelized, and SIMD utilization (packed processing of identical instructions) would result in much faster processing.

Now this might require a lot of recoding. Alternatively, one could adapt the C code for a special compiler that runs multiple instances of the serial code and processes them in "packs" with SIMD ("packed") instructions, letting the compiler do all the packing. Performance benefits of such an approach are shown here: http://ispc.github.io/perf.html

That's a fascinating idea but I don't think it will get the visibility here that it deserves. Pooler and TPruvot are the two main guys for
CPU mining although TPruvot is focussed more on other projects at the moment. Both have active threads in this forum. I suggest you
present your idea to them in case they, or their followers, may want to take on the challenge. It's beyond my skill level.

It's ok, don't worry. Some people reading this thread will know what to do with it.

I'm not really into altcoin mining, as I don't have the hardware and I'm not in the mood for renting. Obviously there's a lot of money here for optimized miners that achieve several times the hashrate of ordinary ones. But this idea also extends to scaling of bitcoin and altcoins for things like cryptographic verification etc. They are using serial functionality when it could be done in packs of 2 or 4 (or 8 in something like ...AVX3-4-5 - or AVX512 which already exists).
AlexGR
April 21, 2016, 08:22:18 PM
 #503

It's not a new idea. It was used back in the GPU bitcoin mining days to get better speed on amd VLIW cards.
It's easy to adapt the miner itself to process multiple nonces per thread, not sure about how much work is needed to work on the algos themselves. Maybe we could make a test with a simple algo like blake. But I'm not the man because I'm not proficient in those cpu instruction extensions.

Neither am I, but it's not that difficult.

Say for example you have a loop like:


for (i = 0; i < 100000000; i++) {
   b = sqrt(b);
   bb = sqrt(bb);
   bbb = sqrt(bbb);
   bbbb = sqrt(bbbb);
}


...gcc will make it something like:

  40072e:   0f 84 9b 00 00 00       je     4007cf <main+0x12f>
  400734:   f2 0f 51 d6             sqrtsd %xmm6,%xmm2
  400738:   66 0f 2e d2             ucomisd %xmm2,%xmm2
  40073c:   0f 8a 63 02 00 00       jp     4009a5 <main+0x305>
  400742:   66 0f 28 f2             movapd %xmm2,%xmm6
  400746:   f2 0f 51 cd             sqrtsd %xmm5,%xmm1
  40074a:   66 0f 2e c9             ucomisd %xmm1,%xmm1
  40074e:   0f 8a d9 01 00 00       jp     40092d <main+0x28d>
  400754:   66 0f 28 e9             movapd %xmm1,%xmm5
  400758:   f2 0f 51 c7             sqrtsd %xmm7,%xmm0
  40075c:   66 0f 2e c0             ucomisd %xmm0,%xmm0
  400760:   0f 8a 47 01 00 00       jp     4008ad <main+0x20d>
  400766:   66 0f 28 f8             movapd %xmm0,%xmm7
  40076a:   f2 0f 51 c3             sqrtsd %xmm3,%xmm0
  40076e:   66 0f 2e c0             ucomisd %xmm0,%xmm0
  400772:   0f 8a b5 00 00 00       jp     40082d <main+0x18d>

...which is sqrt-scalar-double.

4 instructions / 4 math operations.

What could be done differently (intel syntax follows):

     movlpd xmm1, b      //loading the first variable "b" to the lower part of xmm1
     movhpd xmm1, bb     //loading the second variable "bb" to the higher part of xmm1
     SQRTPD xmm1, xmm1   //batch processing both variables for their square root, with one SIMD command
     movlpd xmm2, bbb    //loading the third variable "bbb" to the lower part of xmm2
     movhpd xmm2, bbbb   //loading the fourth variable "bbbb" to the higher part of xmm2
     SQRTPD xmm2, xmm2   //batch processing their square roots
     movlpd b, xmm1      //
     movhpd bb, xmm1     // Returning all results from the registers back to memory
     movlpd bbb, xmm2    //
     movhpd bbbb, xmm2   //

SQRTPD - Square root - P(acked)-Double.
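
The same packing can be tried from C without hand-written assembly. The sketch below is only an illustration, assuming an x86 CPU with SSE2 and a compiler that ships `<emmintrin.h>`. The function names `sqrt_scalar` and `sqrt_packed` are made up for this example and are not part of cpuminer-opt. `_mm_sqrt_sd` corresponds to sqrtsd and `_mm_sqrt_pd` to sqrtpd.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics (x86 / x86-64 only) */
#include <stddef.h>

/* Scalar version: one square root per instruction (sqrtsd). */
static void sqrt_scalar(double *v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        __m128d x = _mm_load_sd(&v[i]);   /* load one double into the low lane */
        x = _mm_sqrt_sd(x, x);            /* low lane = sqrt(low lane) */
        _mm_store_sd(&v[i], x);           /* write the low lane back */
    }
}

/* Packed version: two square roots per instruction (sqrtpd).
   n must be even because each step consumes a pair of doubles. */
static void sqrt_packed(double *v, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        __m128d x = _mm_loadu_pd(&v[i]);  /* load v[i] and v[i+1] together */
        x = _mm_sqrt_pd(x);               /* both lanes in one instruction */
        _mm_storeu_pd(&v[i], x);          /* write both results back */
    }
}
```

Both functions should leave the same values in the array; the packed one issues half as many square-root instructions to get there.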

So now 4 math instructions became 2 and the run time dropped by about half (I've actually benchmarked the above and it comes out near half). But in order to pack instructions (math or logical) you need a similar processing load and similar operations. You can't have that in a scenario where it goes like

sqrt
add
shift
xor

and the function is changing...

But if you loaded 4x hashes together, you'd be looking at

sqrt(of the first) sqrt (of the second) sqrt (third) sqrt (fourth) (<=pack them)
add add add add (<=pack them)
shift shift shift shift (<=pack them)
xor xor xor xor (<=pack them)

...etc
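
To make the "pack them" idea concrete, here is a toy sketch in C. It assumes GCC or Clang, since it relies on their `vector_size` extension, and the mixing function is invented purely for illustration: it is not a real hash and not code from any miner. `toy_mix1` runs the add / shift / xor sequence on one value; `toy_mix4` runs the identical sequence on four independent values, so every step can become one packed instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Four 32-bit lanes in one 128-bit vector (GCC/Clang vector extension). */
typedef uint32_t u32x4 __attribute__((vector_size(16)));

/* Toy mixing rounds on ONE value: add, shift, xor. Not a real hash. */
static uint32_t toy_mix1(uint32_t x)
{
    for (int r = 0; r < 8; r++) {
        x += 0x9e3779b9u;
        x ^= x << 13;
        x ^= x >> 17;
    }
    return x;
}

/* The same rounds on FOUR independent values, one per lane. */
static u32x4 toy_mix4(u32x4 x)
{
    const u32x4 k   = {0x9e3779b9u, 0x9e3779b9u, 0x9e3779b9u, 0x9e3779b9u};
    const u32x4 s13 = {13, 13, 13, 13};
    const u32x4 s17 = {17, 17, 17, 17};
    for (int r = 0; r < 8; r++) {
        x += k;           /* add add add add       */
        x ^= x << s13;    /* shift + xor, all lanes */
        x ^= x >> s17;
    }
    return x;
}

/* Usage: each lane of the packed result must match the scalar result. */
static void toy_mix_selfcheck(void)
{
    u32x4 in = {1u, 2u, 3u, 4u};
    u32x4 out = toy_mix4(in);
    for (int i = 0; i < 4; i++)
        assert(out[i] == toy_mix1(in[i]));
}
```

The point of the design is that the four inputs never interact, so the serial dependency chain inside one hash no longer blocks packing. Whether the compiler actually emits packed instructions depends on the target flags.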

I wasn't even aware of the above until a couple of weeks ago, when I got down to asm level to see why some Pascal output was slower than C output. I ran into http://x86.renejeschke.de as a reference while trying to understand what the instructions do, and then rewrote some instructions myself, like the packed version above (I thought it was pretty easy really). More recently I went over the asm hash functions of altcoins and bitcoin, and they were full of serial operations, despite "SSE/AVX use" / "SSE/AVX enhanced". And I'm like WHAT THE F***? This is all crippled.
joblo
April 21, 2016, 08:43:23 PM
 #504

Code:
        **********  cpuminer-opt 3.1.16  ***********
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI extension.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
     Forked from TPruvot's cpuminer-multi with credits
     to Lucas Jones, elmad, palmd, djm34, pooler, ig0tik3d,
     Wolf0 and Jeff Garzik.

Checking CPU capatibility...
        AMD Phenom(tm) II X4 940 Processor
   CPU arch supports AES_NI...NO.
   CPU arch supports SSE2.....YES.
   SW built for SSE2..........NO.
Incompatible SW build, rebuild with "-march=native"

Why?

What this message is supposed to mean is that although the CPU supports SSE2, SSE2 wasn't compiled in.
This should only occur if you specify an arch that doesn't support SSE2. Which arch did you use?

You can override the error by commenting out the exit statement below and recompiling. Note that
cpuminer may crash with the override if the message was correct.

cpu-miner.c, function check_cpu_capability, line #2700
Code:
        // make sure CPU has at least SSE2
         printf("   CPU arch supports SSE2.....");
         if ( cpu_has_sse2 )
         {
            printf("%s\n", grn_yes );
            printf("   SW built for SSE2..........");
            if ( sw_has_sse2 && !sw_has_aes )
            {
                printf("%s\n", grn_yes );
                printf_mine_without_aes();
            }
            else
            {
                printf("%s\n", ylw_no );
                printf_bad_build();
                exit(1);                            // <-------- delete or comment out this line

It looks like AMD is going to be a challenge. As AMD users I'll leave it up to you guys to figure out the workarounds
for -march for various CPUs. I can then add notes to the README

cpuminer-opt developer, https://bitcointalk.org/index.php?topic=1326803.0
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
ETH: 0x72122edabcae9d3f57eab0729305a425f6fef6d0
joblo
April 21, 2016, 08:52:49 PM
 #505

Going after the algos would be daunting as each code segment would have to be analyzed individually. Modifying the scanning
engine to process two, or more, nonces in parallel might give bigger gains at lower effort.

How does ccminer do it in cuda?

AlexGR
April 21, 2016, 09:26:20 PM
 #506

No idea, haven't looked into CUDA mining.
th3.r00t
April 21, 2016, 10:27:59 PM
 #507

Quote from: joblo
What this message is supposed to mean is that although the CPU supports SSE2, SSE2 wasn't compiled in.
This should only occur if you specify an arch that doesn't support SSE2. Which arch did you use?

Same as always:
Code:
./autogen.sh && ./configure CFLAGS="-O3 -march=btver1" --with-curl --with-crypto && make
I am 100% sure that btver1 includes SSE2

Quote from: joblo
It looks like AMD is going to be a challenge. As AMD users I'll leave it up to you guys to figure out the workarounds
for -march for various CPUs. I can then add notes to the README

The same command line has worked on AMD since cpuminer-opt-3.1.9 and now doesn't on cpuminer-opt-3.1.16.
The last version it worked on was cpuminer-opt-3.1.15, so something changed between them.

Also, the compile output is really short in cpuminer-opt-3.1.16, even on an Intel CPU.

joblo
April 21, 2016, 10:44:03 PM
 #508

I suspect the SSE2 SW check isn't working. Did you try to override it? If it works with the override I'll remove the check
permanently.

Edit: The override should work because the compile succeeded. If the compiler was truly compiling a non-SSE2 arch
it would have failed on the SSE2 instructions. It would seem the __SSE2__ compiler macro is unreliable. I may remove
the check completely or make it non-fatal.
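
For readers unfamiliar with the two kinds of check being discussed, here is a minimal sketch of how a compile-time test (the `__SSE2__` macro) and a run-time test can be kept separate and made non-fatal. It is not the code in cpu-miner.c: the function names are hypothetical, and it assumes GCC or Clang, whose `__builtin_cpu_supports` is only available on x86 targets.

```c
#include <assert.h>
#include <stdio.h>

/* Compile time: GCC and Clang define __SSE2__ when the selected target
   (-march / -msse2) allows them to emit SSE2 instructions. */
static int built_with_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* Run time: ask the CPU that is actually executing the binary. */
static int cpu_reports_sse2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0;   /* not an x86 build: SSE2 does not apply */
#endif
}

/* Returns 1 when the CPU has SSE2 but the build does not use it. */
static int sse2_mismatch(int cpu_has, int sw_has)
{
    assert((cpu_has == 0 || cpu_has == 1) && (sw_has == 0 || sw_has == 1));
    return cpu_has && !sw_has;
}

/* Non-fatal variant: print a warning and carry on instead of exit(1). */
static int warn_on_sse2_mismatch(void)
{
    int bad = sse2_mismatch(cpu_reports_sse2(), built_with_sse2());
    if (bad)
        printf("Warning: CPU supports SSE2 but the build was not compiled for it\n");
    return bad;
}
```

Keeping the decision in a small pure function such as `sse2_mismatch` makes it easy to test the logic separately from whatever the compiler or CPU reports on a given machine.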

AlexGR
April 22, 2016, 01:34:05 AM
 #509

Useful link for replacing slow & obsolete implementations: http://bench.cr.yp.to/primitives-hash.html

Perhaps if one googles algo by algo, one can find even better ones (?).
pallas
April 22, 2016, 07:55:31 AM
 #510

Quote from: joblo
Going after the algos would be daunting as each code segment would have to be analyzed individually. Modifying the scanning
engine to process two, or more, nonces in parallel might give bigger gains at lower effort.

How does ccminer do it in cuda?

It's pretty basic.
There is a for loop with step = number of threads.
Just divide the number of threads by the nonces per thread when launching the kernel, and make each thread process more nonces.
You can even do it all in the algo-specific file (I did it for decred), without touching the main code.
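
One possible reading of that description, sketched in plain C rather than CUDA: ordinary loops stand in for GPU threads, and `toy_hash`, `scan`, `throughput` and `npt` (nonces per thread) are names invented for this illustration. It is not ccminer's code. The outer loop steps by the number of nonces covered per pass, and the number of "threads" is that figure divided by the nonces per thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a real hash function. */
static uint32_t toy_hash(uint32_t nonce)
{
    nonce ^= nonce >> 16;
    nonce *= 0x45d9f3bu;
    nonce ^= nonce >> 16;
    return nonce;
}

/* Scan [first, last) and return the lowest nonce whose toy hash is
   <= target, or UINT32_MAX if there is none. Each pass covers
   `throughput` nonces using throughput / npt simulated threads.
   Assumes last + throughput does not overflow 32 bits. */
static uint32_t scan(uint32_t first, uint32_t last, uint32_t throughput,
                     uint32_t npt, uint32_t target, uint32_t *hashes_done)
{
    assert(npt > 0 && throughput > 0 && throughput % npt == 0);
    uint32_t threads = throughput / npt;   /* fewer threads, more work each */
    uint32_t best = UINT32_MAX;
    *hashes_done = 0;

    for (uint32_t base = first; base < last; base += throughput) {
        for (uint32_t tid = 0; tid < threads; tid++) {      /* one "thread" */
            for (uint32_t j = 0; j < npt; j++) {            /* its nonces   */
                uint32_t nonce = base + tid * npt + j;
                if (nonce >= last)
                    break;                 /* do not run past the range */
                (*hashes_done)++;
                if (toy_hash(nonce) <= target && nonce < best)
                    best = nonce;
            }
        }
    }
    return best;
}
```

With the same range and throughput, `npt = 1` and `npt = 4` visit exactly the same nonces; only the split between threads and per-thread work changes. On a CPU, the inner `npt` loop is where several nonces could be fed to a packed hash routine.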

Giulini
April 22, 2016, 12:43:33 PM
 #511

I tried deleting the source code line, then "SW built for SSE2..........NO." changed to "YES", but the miner stops working immediately; I also tried several -march values.

joblo
April 22, 2016, 02:39:01 PM
 #512

Quote from: Giulini
I tried deleting the source code line, then "SW built for SSE2..........NO." changed to "YES", but the miner stops working immediately; I also tried several -march values.

More info please. What exactly did you change? I didn't suggest changing no to yes. What does "stops working" mean?
Did it compile? Did it crash? Did it exit cleanly?

joblo
April 22, 2016, 03:15:19 PM
 #513

A number of users have reported problems with AMD CPUs that don't have AES_NI but do
have SSE2.

Problems include compile failures and error exits on startup due to a perceived lack of support
for SSE2.

Solving this problem would go quicker with better info from the users when reporting problems.
This applies to any problem, not just the current AMD issues. As a result here are some tips for
problem reporting.

Give some info about your environment, CPU, OS, etc.

In addition to a description of the problem show the problem. Post the console session where the problem occurred
showing the command entered and the output produced.

Is it a new problem, i.e. did it work before?

Have you deviated from the recommended or previously used procedure, or is there any change in the environment?

Have you tried to solve or work around the problem yourself? How?

Provide info specific to the problem. In this case run the following command and post the output:

gcc -march=native -Q --help=target | fgrep march


joblo
April 22, 2016, 03:38:38 PM
 #514

I've released cpuminer-opt v3.1.17 due to problems with AMD CPUs. The 3.1.16 package was also
not clean; I don't know if the garbage I left lying around contributed to the problems seen by users.

Here is a clean package with the SSE2 build check disabled.

https://drive.google.com/file/d/0B0lVSGQYLJIZdE0wZ1d6VzJIeG8/view?usp=sharing

Giulini
April 22, 2016, 03:42:30 PM
 #515

Here is my output (Ubuntu 14.04); I will try your next version.

.... :~$ gcc -march=native -Q --help=target | fgrep march
  -march=                           amdfam10
joblo
April 22, 2016, 03:58:29 PM
 #516

Quote from: Giulini
Here is my output (Ubuntu 14.04); I will try your next version.

.... :~$ gcc -march=native -Q --help=target | fgrep march
  -march=                           amdfam10


Thanks. Did you use build.sh or enter the commands individually?

Could you also try "-march=btver1", it worked for another user with, IIRC, a similar CPU.

I think maybe amdfam10 isn't being seen as SSE2 capable. If that is the case it is outside
the scope of the application and appears to be a compiler issue. The change to disable
the SSE2 build check in v3.1.17 will become permanent and a workaround will be documented
for AMD CPUs recommending "-march=btver1".

koin_miner
April 22, 2016, 04:05:16 PM
 #517

Is there a Mac OS version?

Can I compile this on Mac OS? If yes, how?
joblo
April 22, 2016, 04:08:04 PM
 #518

Quote from: koin_miner
Is there a Mac OS version?

Can I compile this on Mac OS? If yes, how?

Sorry, Linux only.

Giulini
April 22, 2016, 04:15:38 PM
 #519

I tried:

... :~/cpuminer-opt-3.1.17$ ./configure CFLAGS="-O3 -march=btver1" --with-curl --with-crypto && make

"configure" was OK, "make" failed completely
joblo
April 22, 2016, 04:21:49 PM
 #520

Quote from: Giulini
I tried:

... :~/cpuminer-opt-3.1.17$ ./configure CFLAGS="-O3 -march=btver1" --with-curl --with-crypto && make

"configure" was OK, "make" failed completely

Console output?
