Bitcoin Forum
January 18, 2018, 08:54:20 AM *
Author Topic: [ANN]: cpuminer-opt v3.7.10, open source optimized multi-algo CPU miner  (Read 390416 times)
th3.r00t (Sr. Member, Activity: 311)
April 20, 2016, 07:38:19 PM  #501

Can you run this command on your AMD processors and show me the output?
Code:
gcc -march=native -Q --help=target | fgrep march

Here you are:
Code:
root@beast:~$ gcc -march=native -Q --help=target | fgrep march
  -march=                               amdfam10

This is curious. I presume that shows which arch is used by native.

On my skylake I get core2-avx and on my haswell I get corei7-avx.
configure fails with -march=skylake on my skylake.

Yeah! This got me curious, so I ran some tests too.

Intel Core i7-4790K CPU @ 4.40GHz
Code:
root@storm:~$ gcc -march=native -Q --help=target | fgrep march
  -march=                               core-avx2

AMD Sempron 145
Code:
root@wolverine:~$ gcc -march=native -Q --help=target | fgrep march
  -march=                               amdfam10

joblo (Legendary, Activity: 1008)
April 20, 2016, 07:55:38 PM  #502

Can you run this command on your AMD processors and show me the output?
Code:
gcc -march=native -Q --help=target | fgrep march

Here you are:
Code:
root@beast:~$ gcc -march=native -Q --help=target | fgrep march
  -march=                               amdfam10

This is curious. I presume that shows which arch is used by native.

On my skylake I get core2-avx and on my sandy bridge (not haswell) I get corei7-avx.
configure fails with -march=skylake on my skylake.

Yeah! This got me curious, so I ran some tests too.

Intel Core i7-4790K CPU @ 4.40GHz
Code:
root@storm:~$ gcc -march=native -Q --help=target | fgrep march
  -march=                               core-avx2

AMD Sempron 145
Code:
root@wolverine:~$ gcc -march=native -Q --help=target | fgrep march
  -march=                               amdfam10

Correction, my sandy bridge shows corei7-avx.

Principal developer of cpuminer-opt, the optimized multi-algo CPU miner.
BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
https://bitcointalk.org/index.php?topic=1326803.0
joblo (Legendary, Activity: 1008)
April 20, 2016, 08:27:53 PM  #503

cpuminer-opt v3.1.16 adds m7m algo.

https://drive.google.com/file/d/0B0lVSGQYLJIZSHJ3dGMxZUJfeTQ/view?usp=sharing

hmage (Member, Activity: 83)
April 21, 2016, 01:50:44 PM  #504

Can you run this command on your AMD processors and show me the output?
Code:
gcc -march=native -Q --help=target | fgrep march

Here you are:
Code:
root@beast:~$ gcc -march=native -Q --help=target | fgrep march
  -march=                               amdfam10

Then your build should have worked out of the box with -march=native. amdfam10 doesn't have AES support, so gcc won't define the __AES__ macro. Can you try building it with -march=native again?

This is curious. I presume that shows which arch is used by native.

On my skylake I get core2-avx and on my haswell I get corei7-avx.
configure fails with -march=skylake on my skylake.

Yes, this shows which arch gcc uses for -march=native.

In your case Skylake is too new for your gcc 4.8.4, which doesn't know about it, so it should choose the closest match with the most features enabled. There's no 'core2-avx' in the GCC 4.8.4 manual; maybe you meant 'core-avx2'? core-avx2 defines __AES__ automatically.

https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gcc/i386-and-x86-64-Options.html
joblo (Legendary, Activity: 1008)
April 21, 2016, 03:24:24 PM  #505

Can you run this command on your AMD processors and show me the output?
Code:
gcc -march=native -Q --help=target | fgrep march

Here you are:
Code:
root@beast:~$ gcc -march=native -Q --help=target | fgrep march
  -march=                               amdfam10

Then your build should have worked out of the box with -march=native. amdfam10 doesn't have AES support, so gcc won't define the __AES__ macro. Can you try building it with -march=native again?

This is curious. I presume that shows which arch is used by native.

On my skylake I get core-avx2 (not core2-avx) and on my sandy bridge (not haswell) I get corei7-avx.
configure fails with -march=skylake on my skylake.

Yes, this shows which arch gcc uses for -march=native.

In your case Skylake is too new for your gcc 4.8.4, which doesn't know about it, so it should choose the closest match with the most features enabled. There's no 'core2-avx' in the GCC 4.8.4 manual; maybe you meant 'core-avx2'? core-avx2 defines __AES__ automatically.

https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gcc/i386-and-x86-64-Options.html

Hmage, you're a good teacher and you know your stuff, I'm learning a lot. Core-avx2 correction noted.

AlexGR (Legendary, Activity: 1484)
April 21, 2016, 06:48:56 PM  #506

May I propose a different approach for much faster mining?

Currently most, if not all, CPU-mineable coins are cripple-mined.

The reason is simple: under-utilization of the SIMD nature of the SSE and AVX instruction sets.

SSE and AVX instructions are used in SISD fashion (single instruction, single data, instead of single instruction, multiple data / SIMD), meaning each instruction processes one batch of information instead of two or more.

Right now hashing goes like this:

The main mining routine sends one input to each hash, where it goes through a process of SERIAL transmutations / permutations, and in the end the hash returns that data to the miner (sometimes to send it on to the next hash).

This serial process doesn't allow for much Single Instruction Multiple Data utilization.

What should be done instead is that the miner program should issue 2-4 hash candidates to the hashing routines. The hashing routines should be able to take 2-4 inputs (instead of 1) and return 2-4 outputs. In this way the process would be parallelized, and SIMD utilization (packed processing of similar instructions) would result in much faster processing.

Now this might require a lot of recoding, or, one could adjust the code in C for use with a special compiler which runs multiple instances of serial data crunching in order to process them in "packs" with SIMD or "packed" instructions - and then let the compiler do all the packing. Performance benefits of such an approach here: http://ispc.github.io/perf.html
joblo (Legendary, Activity: 1008)
April 21, 2016, 07:23:33 PM  #507

May I propose a different approach for much faster mining?

Currently most, if not all, CPU-mineable coins are cripple-mined.

The reason is simple: under-utilization of the SIMD nature of the SSE and AVX instruction sets.

SSE and AVX instructions are used in SISD fashion (single instruction, single data, instead of single instruction, multiple data / SIMD), meaning each instruction processes one batch of information instead of two or more.

Right now hashing goes like this:

The main mining routine sends one input to each hash, where it goes through a process of SERIAL transmutations / permutations, and in the end the hash returns that data to the miner (sometimes to send it on to the next hash).

This serial process doesn't allow for much Single Instruction Multiple Data utilization.

What should be done instead is that the miner program should issue 2-4 hash candidates to the hashing routines. The hashing routines should be able to take 2-4 inputs (instead of 1) and return 2-4 outputs. In this way the process would be parallelized, and SIMD utilization (packed processing of similar instructions) would result in much faster processing.

Now this might require a lot of recoding, or, one could adjust the code in C for use with a special compiler which runs multiple instances of serial data crunching in order to process them in "packs" with SIMD or "packed" instructions - and then let the compiler do all the packing. Performance benefits of such an approach here: http://ispc.github.io/perf.html

That's a fascinating idea but I don't think it will get the visibility here that it deserves. Pooler and TPruvot are the two main guys for
CPU mining, although TPruvot is focussed more on other projects at the moment. Both have active threads in this forum. I suggest you
present your idea to them in case they, or their followers, may want to take on the challenge. It's beyond my skill level.


th3.r00t (Sr. Member, Activity: 311)
April 21, 2016, 07:32:34 PM  #508

Can you run this command on your AMD processors and show me the output?
Code:
gcc -march=native -Q --help=target | fgrep march

Here you are:
Code:
root@beast:~$ gcc -march=native -Q --help=target | fgrep march
  -march=                               amdfam10

Then your build should have worked out of the box with -march=native. amdfam10 doesn't have AES support, so gcc won't define the __AES__ macro. Can you try building it with -march=native again?

This is curious. I presume that shows which arch is used by native.

On my skylake I get core2-avx and on my haswell I get corei7-avx.
configure fails with -march=skylake on my skylake.

Yes, this shows which arch gcc uses for -march=native.

In your case Skylake is too new for your gcc 4.8.4, which doesn't know about it, so it should choose the closest match with the most features enabled. There's no 'core2-avx' in the GCC 4.8.4 manual; maybe you meant 'core-avx2'? core-avx2 defines __AES__ automatically.

https://gcc.gnu.org/onlinedocs/gcc-4.8.4/gcc/i386-and-x86-64-Options.html

Nope...
./build.sh still fails on AMD.
Should I try something else?

pallas (Legendary, Activity: 1526, Black Belt Developer)
April 21, 2016, 07:37:59 PM  #509

It's not a new idea. It was used back in the GPU bitcoin mining days to get better speed on AMD VLIW cards.
It's easy to adapt the miner itself to process multiple nonces per thread; I'm not sure how much work is needed on the algos themselves. Maybe we could run a test with a simple algo like blake. But I'm not the right person for it, because I'm not proficient in those CPU instruction extensions.

th3.r00t (Sr. Member, Activity: 311)
April 21, 2016, 07:40:08 PM  #510

Code:
        **********  cpuminer-opt 3.1.16  ***********
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI extension.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
     Forked from TPruvot's cpuminer-multi with credits
     to Lucas Jones, elmad, palmd, djm34, pooler, ig0tik3d,
     Wolf0 and Jeff Garzik.

Checking CPU capatibility...
        AMD Phenom(tm) II X4 940 Processor
   CPU arch supports AES_NI...NO.
   CPU arch supports SSE2.....YES.
   SW built for SSE2..........NO.
Incompatible SW build, rebuild with "-march=native"

Why?

Giulini (Full Member, Activity: 192)
April 21, 2016, 07:48:43 PM  #511

Same here with a Sempron 145 CPU; configure and make ran with no errors.

Code:
        **********  cpuminer-opt 3.1.16  ***********
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI extension.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
     Forked from TPruvot's cpuminer-multi with credits
     to Lucas Jones, elmad, palmd, djm34, pooler, ig0tik3d,
     Wolf0 and Jeff Garzik.

Checking CPU capatibility...
        AMD Phenom(tm) II X4 940 Processor
   CPU arch supports AES_NI...NO.
   CPU arch supports SSE2.....YES.
   SW built for SSE2..........NO.
Incompatible SW build, rebuild with "-march=native"

Why?
AlexGR (Legendary, Activity: 1484)
April 21, 2016, 08:06:33 PM  #512

May I propose a different approach for much faster mining?

Currently most, if not all, CPU-mineable coins are cripple-mined.

The reason is simple: under-utilization of the SIMD nature of the SSE and AVX instruction sets.

SSE and AVX instructions are used in SISD fashion (single instruction, single data, instead of single instruction, multiple data / SIMD), meaning each instruction processes one batch of information instead of two or more.

Right now hashing goes like this:

The main mining routine sends one input to each hash, where it goes through a process of SERIAL transmutations / permutations, and in the end the hash returns that data to the miner (sometimes to send it on to the next hash).

This serial process doesn't allow for much Single Instruction Multiple Data utilization.

What should be done instead is that the miner program should issue 2-4 hash candidates to the hashing routines. The hashing routines should be able to take 2-4 inputs (instead of 1) and return 2-4 outputs. In this way the process would be parallelized, and SIMD utilization (packed processing of similar instructions) would result in much faster processing.

Now this might require a lot of recoding, or, one could adjust the code in C for use with a special compiler which runs multiple instances of serial data crunching in order to process them in "packs" with SIMD or "packed" instructions - and then let the compiler do all the packing. Performance benefits of such an approach here: http://ispc.github.io/perf.html

That's a fascinating idea but I don't think it will get the visibility here that it deserves. Pooler and TPruvot are the two main guys for
CPU mining, although TPruvot is focussed more on other projects at the moment. Both have active threads in this forum. I suggest you
present your idea to them in case they, or their followers, may want to take on the challenge. It's beyond my skill level.

It's ok, don't worry. Some people reading this thread will know what to do with it.

I'm not really into altcoin mining, as I don't have the hardware and I'm not in the mood for renting. Obviously there's a lot of money here for optimized miners that achieve multiples of the hashrate of ordinary ones. But this idea also extends to scaling of bitcoin and altcoins, for things like cryptographic verification etc. They are using serial functionality when it could be done in packs of 2 or 4 (or 8 in something like ...AVX3-4-5 - or AVX512, which already exists).
AlexGR (Legendary, Activity: 1484)
April 21, 2016, 08:22:18 PM  #513

It's not a new idea. It was used back in the GPU bitcoin mining days to get better speed on AMD VLIW cards.
It's easy to adapt the miner itself to process multiple nonces per thread; I'm not sure how much work is needed on the algos themselves. Maybe we could run a test with a simple algo like blake. But I'm not the right person for it, because I'm not proficient in those CPU instruction extensions.

Neither am I, but it's not that difficult.

Say for example you have a loop like:


for (i = 0; i < 100000000; i++) {
   b = sqrt(b);
   bb = sqrt(bb);
   bbb = sqrt(bbb);
   bbbb = sqrt(bbbb);
}


...gcc will make it something like:

40072e:   0f 84 9b 00 00 00       je     4007cf <main+0x12f>
  400734:   f2 0f 51 d6             sqrtsd %xmm6,%xmm2
  400738:   66 0f 2e d2             ucomisd %xmm2,%xmm2
  40073c:   0f 8a 63 02 00 00       jp     4009a5 <main+0x305>
  400742:   66 0f 28 f2             movapd %xmm2,%xmm6
  400746:   f2 0f 51 cd             sqrtsd %xmm5,%xmm1
  40074a:   66 0f 2e c9             ucomisd %xmm1,%xmm1
  40074e:   0f 8a d9 01 00 00       jp     40092d <main+0x28d>
  400754:   66 0f 28 e9             movapd %xmm1,%xmm5
  400758:   f2 0f 51 c7             sqrtsd %xmm7,%xmm0
  40075c:   66 0f 2e c0             ucomisd %xmm0,%xmm0
  400760:   0f 8a 47 01 00 00       jp     4008ad <main+0x20d>
  400766:   66 0f 28 f8             movapd %xmm0,%xmm7
  40076a:   f2 0f 51 c3             sqrtsd %xmm3,%xmm0
  40076e:   66 0f 2e c0             ucomisd %xmm0,%xmm0
  400772:   0f 8a b5 00 00 00       jp     40082d <main+0x18d>

...which is sqrt-scalar-double.

4 instructions / 4 math operations.

What could be done differently (intel syntax follows):

     movlpd xmm1, b      //loading the first variable "b" to the lower part of xmm1
     movhpd xmm1, bb     //loading the second variable "bb" to the higher part of xmm1
     SQRTPD xmm1, xmm1   //batch processing both variables for their square root, with one SIMD command
     movlpd xmm2, bbb    //loading the third variable "bbb" to the lower part of xmm2
     movhpd xmm2, bbbb   //loading the fourth variable "bbbb" to the higher part of xmm2
     SQRTPD xmm2, xmm2   //batch processing their square roots
     movlpd b, xmm1      //
     movhpd bb, xmm1     // Returning all results from the registers back to memory
     movlpd bbb, xmm2    //
     movhpd bbbb, xmm2   //

SQRTPD - Square root - P(acked)-Double.

So now 4 math instructions became 2 and the time was cut roughly in half (I actually benchmarked the above and it comes close to half). But in order to pack instructions (math or logical) you need similar processing loads and similar operations. You can't have that in a scenario where it goes like

sqrt
add
shift
xor

and the function is changing...

But if you loaded 4x hashes together, you'd be looking at

sqrt(of the first) sqrt (of the second) sqrt (third) sqrt (fourth) (<=pack them)
add add add add (<=pack them)
shift shift shift shift (<=pack them)
xor xor xor xor (<=pack them)

...etc

I wasn't even aware of the above until a couple of weeks ago, when I got down to asm level to see why some Pascal output was slower than C output. I then ran into http://x86.renejeschke.de, which I used as a reference while trying to understand what the instructions do, and rewrote some instructions myself, like the packed version above (I thought it was pretty easy really). More recently I went over the code of the asm hash functions of altcoins and bitcoin, and it was full of serial operations, despite "SSE/AVX use" / "SSE/AVX enhanced". And I'm like WHAT THE F***? This is all crippled.
joblo (Legendary, Activity: 1008)
April 21, 2016, 08:43:23 PM  #514

Code:
        **********  cpuminer-opt 3.1.16  ***********
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI extension.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
     Forked from TPruvot's cpuminer-multi with credits
     to Lucas Jones, elmad, palmd, djm34, pooler, ig0tik3d,
     Wolf0 and Jeff Garzik.

Checking CPU capatibility...
        AMD Phenom(tm) II X4 940 Processor
   CPU arch supports AES_NI...NO.
   CPU arch supports SSE2.....YES.
   SW built for SSE2..........NO.
Incompatible SW build, rebuild with "-march=native"

Why?

What this message is supposed to mean is that although the CPU supports SSE2, it wasn't compiled in.
This should only occur if you specify an arch that doesn't support SSE2. Which arch did you use?

You can override the error by commenting out the exit statement below and recompiling. Note that
cpuminer may crash with the override if the message was correct.

cpu-miner.c function check_cpu_capability line#2700
Code:
        // make sure CPU has at least SSE2
         printf("   CPU arch supports SSE2.....");
         if ( cpu_has_sse2 )
         {
            printf("%s\n", grn_yes );
            printf("   SW built for SSE2..........");
            if ( sw_has_sse2 && !sw_has_aes )
            {
                printf("%s\n", grn_yes );
                printf_mine_without_aes();
            }
            else
            {
                printf("%s\n", ylw_no );
                printf_bad_build();
                exit(1);                            // <-------- delete or comment out this line

It looks like AMD is going to be a challenge. Since you are the AMD users, I'll leave it up to you guys to figure out the workarounds
for -march for various CPUs. I can then add notes to the README.

joblo (Legendary, Activity: 1008)
April 21, 2016, 08:52:49 PM  #515

May I propose a different approach for much faster mining?

Currently most, if not all, CPU-mineable coins are cripple-mined.

The reason is simple: under-utilization of the SIMD nature of the SSE and AVX instruction sets.

SSE and AVX instructions are used in SISD fashion (single instruction, single data, instead of single instruction, multiple data / SIMD), meaning each instruction processes one batch of information instead of two or more.

Right now hashing goes like this:

The main mining routine sends one input to each hash, where it goes through a process of SERIAL transmutations / permutations, and in the end the hash returns that data to the miner (sometimes to send it on to the next hash).

This serial process doesn't allow for much Single Instruction Multiple Data utilization.

What should be done instead is that the miner program should issue 2-4 hash candidates to the hashing routines. The hashing routines should be able to take 2-4 inputs (instead of 1) and return 2-4 outputs. In this way the process would be parallelized, and SIMD utilization (packed processing of similar instructions) would result in much faster processing.

Now this might require a lot of recoding, or, one could adjust the code in C for use with a special compiler which runs multiple instances of serial data crunching in order to process them in "packs" with SIMD or "packed" instructions - and then let the compiler do all the packing. Performance benefits of such an approach here: http://ispc.github.io/perf.html

That's a fascinating idea but I don't think it will get the visibility here that it deserves. Pooler and TPruvot are the two main guys for
CPU mining, although TPruvot is focussed more on other projects at the moment. Both have active threads in this forum. I suggest you
present your idea to them in case they, or their followers, may want to take on the challenge. It's beyond my skill level.

It's ok, don't worry. Some people reading this thread will know what to do with it.

I'm not really into altcoin mining, as I don't have the hardware and I'm not in the mood for renting. Obviously there's a lot of money here for optimized miners that achieve multiples of the hashrate of ordinary ones. But this idea also extends to scaling of bitcoin and altcoins, for things like cryptographic verification etc. They are using serial functionality when it could be done in packs of 2 or 4 (or 8 in something like ...AVX3-4-5 - or AVX512, which already exists).

Going after the algos would be daunting as each code segment would have to be analyzed individually. Modifying the scanning
engine to process two, or more, nonces in parallel might give bigger gains at lower effort.

How does ccminer do it in cuda?

AlexGR (Legendary, Activity: 1484)
April 21, 2016, 09:26:20 PM  #516

No idea, haven't looked into CUDA mining.
th3.r00t (Sr. Member, Activity: 311)
April 21, 2016, 10:27:59 PM  #517

Quote from: joblo
What this message is supposed to mean is that although the CPU supports SSE2, it wasn't compiled in.
This should only occur if you specify an arch that doesn't support SSE2. Which arch did you use?

Same as always:
Code:
./autogen.sh && ./configure CFLAGS="-O3 -march=btver1" --with-curl --with-crypto && make
I am 100% sure that btver1 includes SSE2.

Quote from: joblo
It looks like AMD is going to be a challenge. Since you are the AMD users, I'll leave it up to you guys to figure out the workarounds
for -march for various CPUs. I can then add notes to the README.

The same command line has worked on AMD since cpuminer-opt-3.1.9 and no longer does with cpuminer-opt-3.1.16.
The last version that worked was cpuminer-opt-3.1.15, so something changed between them.

Also, the compile output is really short, even on an Intel CPU, with cpuminer-opt-3.1.16.

joblo (Legendary, Activity: 1008)
April 21, 2016, 10:44:03 PM  #518

Quote from: joblo
What this message is supposed to mean is that although the CPU supports SSE2, it wasn't compiled in.
This should only occur if you specify an arch that doesn't support SSE2. Which arch did you use?

Same as always:
Code:
./autogen.sh && ./configure CFLAGS="-O3 -march=btver1" --with-curl --with-crypto && make
I am 100% sure that btver1 includes SSE2.

Quote from: joblo
It looks like AMD is going to be a challenge. Since you are the AMD users, I'll leave it up to you guys to figure out the workarounds
for -march for various CPUs. I can then add notes to the README.

The same command line has worked on AMD since cpuminer-opt-3.1.9 and no longer does with cpuminer-opt-3.1.16.
The last version that worked was cpuminer-opt-3.1.15, so something changed between them.

Also, the compile output is really short, even on an Intel CPU, with cpuminer-opt-3.1.16.

I suspect the SSE2 SW check isn't working. Did you try to override it? If it works with the override I'll remove the check
permanently.

Edit: The override should work because the compile succeeded. If the compiler was truly compiling a non-SSE2 arch
it would have failed on the SSE2 instructions. It would seem the __SSE2__ compiler macro is unreliable. I may remove
the check completely or make it non-fatal.

AlexGR (Legendary, Activity: 1484)
April 22, 2016, 01:34:05 AM  #519

Useful link for replacing slow & obsolete implementations: http://bench.cr.yp.to/primitives-hash.html

Perhaps if one googles algo by algo, they can find even better (?).
pallas (Legendary, Activity: 1526, Black Belt Developer)
April 22, 2016, 07:55:31 AM  #520

Going after the algos would be daunting as each code segment would have to be analyzed individually. Modifying the scanning
engine to process two, or more, nonces in parallel might give bigger gains at lower effort.

How does ccminer do it in cuda?

It's pretty basic.
There is a for loop with step = number of threads.
Just divide the number of threads by the nonces per thread when launching the kernel, and make each thread process more nonces.
You can even do it all in the algo-specific file (I did it for decred), without touching the main code.
