Bitcoin Forum
Author Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner  (Read 419060 times)
felixbrucker
November 25, 2017, 09:40:08 PM
 #2901

Large pages has already been done for cryptonight. I'm doing something that hasn't been done yet.
Large pages for cpuminer-opt will have to wait, though it could benefit a couple of memory hard algos.

Looking forward to that as well :D
nizzuu
November 26, 2017, 07:47:14 PM
 #2902

Large pages has already been done for cryptonight. I'm doing something that hasn't been done yet.
Large pages for cpuminer-opt will have to wait, though it could benefit a couple of memory hard algos.

Hi, this may be useful as well: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#othertechs=BMI2&expand=3773

There is an AVX-512F section as well, but I have no CPU that supports it :-( So I can't test any benefits compared to AVX2. There should be some, but...
joblo
November 26, 2017, 10:12:27 PM
 #2903


Hi, this may be useful as well: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#othertechs=BMI2&expand=3773

There is an AVX-512F section as well, but I have no CPU that supports it :-( So I can't test any benefits compared to AVX2. There should be some, but...

LOL. That page is permanently open in my browser.

I was wondering when someone would mention AVX-512.
I'm already dreaming about 8-way. It should be easier than going from 1-way to 4-way. That makes my next
CPU a difficult choice. Do I go with Ryzen and just SHA or wait for Cannonlake with SHA and AVX-512?


cpuminer-opt developer, https://bitcointalk.org/index.php?topic=1326803.0
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
ETH: 0x72122edabcae9d3f57eab0729305a425f6fef6d0
felixbrucker
November 26, 2017, 11:28:28 PM
 #2904

Why not both? :P
joblo
November 27, 2017, 12:30:00 AM
 #2905

Why not both? :P

Timing. Cannonlake is now delayed until the end of 2018, and more delays are still possible. A Ryzen purchase
could be made in spring when the next Ubuntu LTS is released.

Buying both does have some advantages. I could get the Ryzen earlier, before the next LTS, and then Cannonlake
delays wouldn't matter.

nizzuu
November 27, 2017, 08:08:24 AM
 #2906

Do I go with Ryzen and just SHA or wait for Cannonlake with SHA and AVX-512?

Well, Ryzens have 2x128-bit wide AVX units instead of 256-bit ones, don't forget that ;) I don't think this is a good implementation to target.

As for AVX-512, the only adequate choice for now is the i7-7800X. It's not so expensive but has a 140W TDP :P (and thermal paste inside instead of solder).
joblo
November 27, 2017, 03:54:47 PM
 #2907

Do I go with Ryzen and just SHA or wait for Cannonlake with SHA and AVX-512?

Well, Ryzens have 2x128-bit wide AVX units instead of 256-bit ones, don't forget that ;) I don't think this is a good implementation to target.

As for AVX-512, the only adequate choice for now is the i7-7800X. It's not so expensive but has a 140W TDP :P (and thermal paste inside instead of solder).

Yes, Ryzen's implementation of AVX2 is inferior. But AVX2 and AVX512 don't reduce a CPU's competitive disadvantage
against GPUs. SHA does, and it is available now with Ryzen. If Cannonlake came out in summer I could wait for it, but the
more its release is delayed, the more likely a Ryzen purchase becomes.


joblo
November 28, 2017, 05:41:45 PM
 #2908

I've encountered my first major roadblock with 4-way, in whirlpool.

The core of whirlpool is a table lookup, but the table index is a variable, meaning each lane in the vector
uses a different index, i.e. each lane reads a different address. This operation is not efficient with SIMD as
it needs to load one 64-bit element from each of 4 different addresses. Although there is a SIMD instruction to do this,
it is very expensive, with an optimum throughput of 4 to read 4 items. That's no faster than performing
the operation with scalar instructions. When the 4-way overhead is added it hashes significantly slower
than the old way.

I suspect GPUs don't have this problem because each lane has its own dedicated core with its own local memory.
All memory accesses can run in parallel with different addresses. On a CPU, 4 lanes run on the same core,
accessing data at 4 addresses from the same memory system, serially.

This looks like an architectural issue that can't be overcome.

This will affect algos like x15, xevan & m7m which will gain less than previously anticipated.
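
To make the per-lane lookup concrete, here is a minimal sketch in plain C of what a 4-lane gather has to do. The name gather4 and the table passed to it are made up for illustration and are not taken from cpuminer-opt. AVX2 exposes this operation as the _mm256_i64gather_epi64 intrinsic, but the work is still four separate loads from four unrelated addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4

/* Scalar emulation of a 4-lane gather (hypothetical helper).
   Each lane carries its own index, so each lane needs its own load
   from a different address in the same table. */
static void gather4(uint64_t out[LANES], const uint64_t *tbl,
                    const uint8_t idx[LANES])
{
    assert(out != NULL && tbl != NULL && idx != NULL);
    for (int lane = 0; lane < LANES; lane++)
        out[lane] = tbl[idx[lane]];   /* one independent load per lane */
}
```

With a 256-entry table of 64-bit words, a call such as gather4(out, table, idx) fills out[lane] with table[idx[lane]] for each of the four lanes. A vertical 4-way hash would need one such gather for every table lookup in the scalar code.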

joblo
November 28, 2017, 10:01:30 PM
 #2909

cpuminer-opt-3.7.4 is released.

Added 4 way support for tribus and nist5.

Removed some unnecessary compile options.

A 4-way Windows binary is now available.

I'm waiting for someone to get the bonus. The bonus is when one thread finds more than one nonce in parallel.
It's very rare and I haven't seen it yet, but the code checks for it to make sure second, third or even fourth
nonces are submitted. It's almost like a lotto but you don't win anything. The multiple nonces are all part
of the odds.
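
A rough sketch of that check, assuming lane i of a 4-way pass hashes nonce first_nonce + i and using a simplified target test on the top 32-bit word only. The names meets_target and collect_nonces are hypothetical and are not the actual cpuminer-opt functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4

/* Simplified stand-in for the real target test (assumption):
   a hash wins if its most significant 32-bit word is <= target_hi. */
static int meets_target(const uint32_t hash[8], uint32_t target_hi)
{
    return hash[7] <= target_hi;
}

/* Check every lane of one 4-way pass instead of stopping at the first hit.
   Winning nonces are written to found[]; the return value is their count,
   so a second, third or fourth winner in the same pass is not dropped. */
static size_t collect_nonces(uint32_t hashes[LANES][8], uint32_t first_nonce,
                             uint32_t target_hi, uint32_t found[LANES])
{
    assert(hashes != NULL && found != NULL);
    size_t count = 0;
    for (uint32_t lane = 0; lane < LANES; lane++)
        if (meets_target(hashes[lane], target_hi))
            found[count++] = first_nonce + lane;
    return count;
}
```

The caller would then submit found[0] through found[count - 1], one share per winning nonce.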

git: https://github.com/JayDDee/cpuminer-opt

tarball: https://drive.google.com/file/d/1AwdqMWFufxZmuKWKHkWCjlfRm0SPqID8/view?usp=sharing

Windows binaries: https://drive.google.com/file/d/1opN5Wb5tL9_wes8RsZ6QSftOOo2Uhb6p/view?usp=sharing

spider703
November 28, 2017, 10:58:30 PM
 #2910

cpuminer-4way is not working on my i7-3770.

BTC - 1ETPsixbwuDNJH5XvDb3kMrCr69ZFePSdK ETH - 0xDd48FE784Ac7d4e39C8cEE96BF0dB5269753b22E
LTC - LL5d4Nk5CuLy8vRRH2iD4sYq8eRdiamG7H  paypal.me/spider703 write to me https://t.me/spider703
joblo
November 29, 2017, 02:22:05 AM
 #2911

cpuminer-4way is not working on my i7-3770.

From README.txt:

Quote
4way requires a CPU with AES and AVX2.

Your CPU is Ivy Bridge, which has no AVX2.
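
For anyone unsure whether their CPU qualifies, a small check can be sketched with the GCC/Clang builtin __builtin_cpu_supports. This is only an illustration, not how cpuminer-opt itself detects features, and cpu_has_aes_avx2 is a made-up name.

```c
/* Returns 1 if the running CPU reports both AES-NI and AVX2, else 0.
   Relies on the GCC/Clang x86 builtins; on other compilers or
   architectures this sketch conservatively reports 0. */
static int cpu_has_aes_avx2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();   /* harmless if the feature data is already set up */
    return __builtin_cpu_supports("aes") && __builtin_cpu_supports("avx2");
#else
    return 0;
#endif
}
```

On Linux the same information is visible in the flags line of /proc/cpuinfo (look for aes and avx2).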

nizzuu
November 29, 2017, 06:54:39 AM
 #2912

It seems one of the questions got lost in the thread :(

Sample usage:

cpuminer-aes-avx2 -a lyra2z330 -t 2 --benchmark

The first hashrate output is shown after ~7-8 minutes on an i5-7600 (860+ h/s), and ~15 minutes on a slower (450+ h/s) Pentium, but the expected CPU utilization starts immediately.

I tried the new 4way nist5 and tribus: speed is shown immediately, as it is on lyra2z. Why is the first output so slow? It's a real pain to benchmark...
warcries
November 29, 2017, 07:03:41 AM
 #2913

@joblo

The program works fine on my Windows 10 machine, but when I run it on Windows Server 2012 the error is "program not responding".
joblo
November 29, 2017, 01:31:27 PM
 #2914

It seems one of the questions got lost in the thread :(

Sample usage:

cpuminer-aes-avx2 -a lyra2z330 -t 2 --benchmark

The first hashrate output is shown after ~7-8 minutes on an i5-7600 (860+ h/s), and ~15 minutes on a slower (450+ h/s) Pentium, but the expected CPU utilization starts immediately.

I tried the new 4way nist5 and tribus: speed is shown immediately, as it is on lyra2z. Why is the first output so slow? It's a real pain to benchmark...

Interesting observation. There is nothing unique about how cpuminer handles lyra2z330 vs other algos. Lyra2z330 is, however,
unique as the slowest hashing algo. It also has to do a little more work on startup (malloc) that other algos don't do. But that
doesn't take minutes.

It might be worthwhile to pay attention to the hash count. Is it proportional to the time? I'm not sure what else to suggest.

BTW lyra2z330 will not benefit from 4way. It is pure lyra2, which already uses AVX2 horizontally. Vertical (4way) AVX2 would
not likely improve compute performance. Furthermore lyra2z is I/O bound (memory hard), so improving compute performance just
means the CPU would spend more time stalled waiting on data from memory.

joblo
November 29, 2017, 01:32:25 PM
 #2915

@joblo

The program works fine on my Windows 10 machine, but when I run it on Windows Server 2012 the error is "program not responding".

More info please. It's probably your CPU.

fynxgloire
November 29, 2017, 03:52:02 PM
Last edit: November 29, 2017, 06:45:43 PM by fynxgloire
 #2916

Hi,
What is the best bang for the buck Xeon processor to go with the H110 Pro BTC+ motherboard?
or
Can an Intel Core i7-8700K CPU work with this motherboard?

regards
joblo
November 29, 2017, 08:09:15 PM
 #2917

Hi,
What is the best bang for the buck Xeon processor to go with the H110 Pro BTC+ motherboard?
or
Can an Intel Core i7-8700K CPU work with this motherboard?

regards

System building recommendations deserve their own thread. I'd rather keep this one about
cpuminer software.

That being said, if you're trying to build a combo GPU/CPU rig it's entirely feasible. CPU choice
depends on the features of the various CPU architectures. There are several threads already discussing
the benefits of different architectures and features for CPU mining.

joblo
November 29, 2017, 11:42:35 PM
Last edit: November 30, 2017, 01:31:59 AM by joblo
 #2918

Here's a puzzle for coding experts.

I was testing with both sph and 4way running side by side and comparing the hashes.
Everything was fine. Then I started cleaning up the code and the hash broke. What remains
is the last bit of code I can't remove without breaking the hash. I left a couple of commented-out
lines for context (no pun intended). The code as presented works. If I remove the line indicated,
the hash breaks and it only submits invalid shares that are rejected. It should be noted that
ctx_blake was never initialized, nor was sph_blake256 run before sph_blake256_close, so close is
running with random data. Both variables are local and are not referenced anywhere else.

I would suspect local stack corruption but in reverse. Instead of code corrupting the stack,
removing code does.

The input data is four 80-byte streams interleaved for blake_4way.
vhash is four 32-byte hash streams returned from blake_4way, interleaved.
hash0..3 is vhash deinterleaved so lyra2 can be run serially.
hash and ctx_blake are not in any way involved in the proper functioning of the code.
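
The interleaved layout can be sketched like this, assuming word-granular interleaving (word i of stream s stored at index i*4 + s), which is what the 4x32 naming suggests. These scalar helpers are hypothetical stand-ins, not the m128_deinterleave_4x32 used in the code further down, which appears to take a bit length rather than a word count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4

/* Interleave 4 streams of 32-bit words (hypothetical helper).
   Word i of stream s lands at dst[i * LANES + s], so each group of
   four consecutive words holds word i of all four streams. */
static void interleave_4x32(uint32_t *dst,
                            const uint32_t *s0, const uint32_t *s1,
                            const uint32_t *s2, const uint32_t *s3,
                            size_t words)
{
    assert(dst != NULL);
    for (size_t i = 0; i < words; i++) {
        dst[i * LANES + 0] = s0[i];
        dst[i * LANES + 1] = s1[i];
        dst[i * LANES + 2] = s2[i];
        dst[i * LANES + 3] = s3[i];
    }
}

/* Inverse operation: split an interleaved buffer back into 4 streams. */
static void deinterleave_4x32(uint32_t *d0, uint32_t *d1,
                              uint32_t *d2, uint32_t *d3,
                              const uint32_t *src, size_t words)
{
    assert(src != NULL);
    for (size_t i = 0; i < words; i++) {
        d0[i] = src[i * LANES + 0];
        d1[i] = src[i * LANES + 1];
        d2[i] = src[i * LANES + 2];
        d3[i] = src[i * LANES + 3];
    }
}
```

Under this layout an 80-byte input is 20 words per stream and 80 words interleaved, and a 32-byte hash is 8 words per stream and 32 words interleaved, which matches the size of vhash[8*4].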

I'm stumped. Anyone have any insight?

Edit:
I tried nulling sph_blake256_close but it failed. It seems to be dependent on actually running the code
in the function.

I moved the funky code to the end of the function and everything still works. But still, if I remove it
the returned hash is invalid. SPH is stable code and not likely to be accessing data it shouldn't.
Even if it did, it would break something, not fix it. It's not even being used properly. There should be
no interactions between the sph code and the 4way code. They have their own data structures and supporting
functions and don't share anything.

I'm even more stumped.

Code:
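void lyra2z_hash_4way( void *state, const void *input )
{
     // One 32-byte hash per lane, filled in after deinterleaving.
     uint32_t hash0[8] __attribute__ ((aligned (32)));
     uint32_t hash1[8] __attribute__ ((aligned (32)));
     uint32_t hash2[8] __attribute__ ((aligned (32)));
     uint32_t hash3[8] __attribute__ ((aligned (32)));
     // Interleaved output of blake256_4way: 4 lanes x 8 words.
     uint32_t vhash[8*4] __attribute__ ((aligned (64)));
     blake256_4way_context ctx __attribute__ ((aligned (64)));

// Leftover sph code from the side-by-side test. ctx_blake is never
// initialized, so sph_blake256_close runs on whatever is on the stack.
uint32_t _ALIGN(64) hash[8];
sph_blake256_context ctx_blake __attribute__ ((aligned (64)));
//memcpy( &ctx_blake, &lyra2z_blake_mid, sizeof lyra2z_blake_mid );
//sph_blake256( &ctx_blake, input + 64, 16 );
// removing the following line breaks the hash
sph_blake256_close( &ctx_blake, hash );

     // Resume from the saved context ctx_mid (presumably the state after the
     // first 64 bytes) and hash the last 16 bytes of each 80-byte input.
     // With 4 interleaved streams, byte 64 of each stream is at byte 64<<2.
     memcpy( &ctx, &ctx_mid, sizeof ctx_mid );
     blake256_4way( &ctx, input + (64<<2), 16 );
     blake256_4way_close( &ctx, vhash );

     // Split the interleaved 256-bit hashes into one buffer per lane.
     m128_deinterleave_4x32( hash0, hash1, hash2, hash3, vhash, 256 );

     // Lyra2 is not vectorized across lanes here; run it once per lane.
     LYRA2Z( lyra2z_wholeMatrix, hash0, 32, hash0, 32, hash0, 32, 8, 8, 8);
     LYRA2Z( lyra2z_wholeMatrix, hash1, 32, hash1, 32, hash1, 32, 8, 8, 8);
     LYRA2Z( lyra2z_wholeMatrix, hash2, 32, hash2, 32, hash2, 32, 8, 8, 8);
     LYRA2Z( lyra2z_wholeMatrix, hash3, 32, hash3, 32, hash3, 32, 8, 8, 8);

     // Return the four 32-byte hashes back to back (not interleaved).
     memcpy( state   , hash0, 32 );
     memcpy( state+32, hash1, 32 );
     memcpy( state+64, hash2, 32 );
     memcpy( state+96, hash3, 32 );
}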

joblo
November 30, 2017, 04:50:28 AM
 #2919

I'm giving up on lyra2z 4way. It wasn't really about lyra2z; the gain turned out to be only 2%, with
rejects.

The real point was to test blake256 as a step toward other algos. It's also used by cryptonight
and lyra2rev2.

With the whirlpool problem that's 2 failures in 2 days. It's a good thing I don't have a boss or
customers to answer to.

The problem with lyra2z is one of the weirdest I've ever encountered. I will probably revisit this
in the future when I am able to test the other algos that use blake256. For now I'll move forward
with other algos that build on the work done for tribus and nist5.

warcries
November 30, 2017, 06:14:38 AM
 #2920

@joblo

The program works fine on my Windows 10 machine, but when I run it on Windows Server 2012 the error is "program not responding".

More info please. It's probably your CPU.

I'm using an Intel Xeon X3430 (Lynnfield) on Windows Server 2012. The error is "program not responding".

thank you.