joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
November 27, 2017, 03:54:47 PM |
|
Do I go with Ryzen and just SHA or wait for Cannonlake with SHA and AVX-512?
Well, Ryzens have 2x128-bit wide AVX units instead of 256-bit, don't forget that. I don't think it's a good implementation to target. As for AVX-512, the only adequate choice for now is the i7-7800X. It's not so expensive, but it has a 140W TDP (and thermal paste inside instead of solder).
Yes, Ryzen's implementation of AVX2 is inferior. But AVX2 and AVX512 don't reduce a CPU's competitive disadvantage against GPUs. SHA does, and it is available now with Ryzen. If Cannonlake came out in summer I could wait for it, but as the release keeps getting delayed a Ryzen purchase becomes more likely.
|
|
|
|
|
|
|
|
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
November 28, 2017, 05:41:45 PM |
|
I've encountered my first major roadblock with 4-way: whirlpool.
The core of whirlpool is a table lookup, but the table index is a variable, meaning each lane in the vector uses a different index, i.e. each lane reads a different address. This operation is not efficient with SIMD as it needs to load one 64-bit element from each of 4 different addresses. Although there is a SIMD instruction to do this, it is very expensive, with an optimum throughput of 4 to read 4 items. That's no faster than performing the operation with scalar instructions. When the 4-way overhead is added it hashes significantly slower than the old way.
I suspect GPUs don't have this problem because each lane has its own dedicated core with its own local memory. All memory accesses can run in parallel with different addresses. On a CPU, 4 lanes run on the same core, accessing data at 4 addresses from the same memory system, serially.
This looks like an architectural issue that can't be overcome.
This will affect algos like x15, xevan & m7m which will gain less than previously anticipated.
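To make the lookup problem described above concrete, here is a minimal sketch in C. It is not taken from cpuminer-opt: demo_table, lookup4_scalar and lookup4_gather are hypothetical names, and the table contents are made up. It assumes the SIMD instruction in question is the AVX2 gather, used only when the compiler is building with AVX2 enabled; otherwise the "vector" path falls back to the scalar loop.

```c
#include <stdint.h>
#include <stddef.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Hypothetical 256-entry table of 64-bit words, standing in for a
   whirlpool lookup table. The contents are made up for the demo. */
static uint64_t demo_table[256];

static void demo_table_init(void)
{
    for (size_t i = 0; i < 256; i++)
        demo_table[i] = (uint64_t)i * 0x0101010101010101ULL;
}

/* Scalar form: four independent loads, one per lane, each from a
   different address chosen by that lane's index. */
static void lookup4_scalar(uint64_t out[4], const uint8_t idx[4])
{
    for (int lane = 0; lane < 4; lane++)
        out[lane] = demo_table[idx[lane]];
}

/* Vector form: one gather instruction fetches all four lanes, but the
   hardware still has to visit four separate addresses. */
static void lookup4_gather(uint64_t out[4], const uint8_t idx[4])
{
#ifdef __AVX2__
    /* Lane 0 is the last argument of _mm256_set_epi64x. */
    __m256i vidx = _mm256_set_epi64x(idx[3], idx[2], idx[1], idx[0]);
    /* Scale 8: each index is multiplied by sizeof(uint64_t). */
    __m256i v = _mm256_i64gather_epi64((const long long *)demo_table, vidx, 8);
    _mm256_storeu_si256((__m256i *)out, v);
#else
    /* No AVX2 at compile time: fall back to the scalar loop. */
    lookup4_scalar(out, idx);
#endif
}
```

Both functions return the same four words; the gather only changes how the loads are issued, not how many distinct addresses have to be read.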
|
|
|
|
|
spider703
|
|
November 28, 2017, 10:58:30 PM |
|
cpuminer-4way not working on my i7-3770
|
BTC 1Hof999zuqUKpifmzrSABv7tNr4nRaoJKM LTC Lf2L6DTBr2gXT38d7cVRqDQiHMndtXQyNW or write me in https://t.me/spider703
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
November 29, 2017, 02:22:05 AM |
|
cpuminer-4way not working on my i7-3770
From README.txt: 4way requires a CPU with AES and AVX2.
Your CPU is Ivy Bridge, no AVX2.
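For anyone unsure whether their CPU meets that requirement, a small check can be sketched in C. This is not part of cpuminer; demo_can_run_4way and demo_this_cpu_can_run_4way are hypothetical helpers, and the detection relies on the GCC/Clang x86 builtin __builtin_cpu_supports. On other compilers or architectures the sketch simply answers "no".

```c
/* Hypothetical helper: per README.txt the 4way build needs both AES
   and AVX2, so both flags must be set. */
static int demo_can_run_4way(int has_aes, int has_avx2)
{
    return has_aes && has_avx2;
}

/* Ask the running CPU. Uses GCC/Clang x86 builtins; anywhere else this
   sketch conservatively reports 0. */
static int demo_this_cpu_can_run_4way(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return demo_can_run_4way(__builtin_cpu_supports("aes") != 0,
                             __builtin_cpu_supports("avx2") != 0);
#else
    return 0;
#endif
}
```

An Ivy Bridge part such as the i7-3770 has AES but not AVX2, which corresponds to demo_can_run_4way(1, 0) and gives 0.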
|
|
|
|
nizzuu
Full Member
Offline
Activity: 187
Merit: 100
Cryptocurrency enthusiast
|
|
November 29, 2017, 06:54:39 AM |
|
Seems one of the questions was lost in the thread(
Sample usage:
cpuminer-aes-avx2 -a lyra2z330 -t 2 --benchmark
The first hashrate output is shown after ~7-8 minutes on an i5-7600 (860+ h/s), and after ~15 minutes on a slower (450+ h/s) Pentium, but the expected CPU utilization starts immediately.
I tried the new 4way nist5 and tribus: the speed is shown immediately, as it is on lyra2z. Why is the first output so slow? It's a real pain to benchmark...
|
|
|
|
warcries
Newbie
Offline
Activity: 4
Merit: 0
|
|
November 29, 2017, 07:03:41 AM |
|
@joblo
The program works fine on my Windows 10 machine, but when I ran it on Windows Server 2012 the error is "program not responding".
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
November 29, 2017, 01:31:27 PM |
|
Seems one of the questions was lost in the thread(
Sample usage:
cpuminer-aes-avx2 -a lyra2z330 -t 2 --benchmark
The first hashrate output is shown after ~7-8 minutes on an i5-7600 (860+ h/s), and after ~15 minutes on a slower (450+ h/s) Pentium, but the expected CPU utilization starts immediately.
I tried the new 4way nist5 and tribus: the speed is shown immediately, as it is on lyra2z. Why is the first output so slow? It's a real pain to benchmark...
Interesting observation. There is nothing unique about how cpuminer handles lyra2z330 vs other algos. Lyra2z330 is, however, unique as the slowest hashing algo. It also has to do a little more work on startup (malloc) that other algos don't do, but that doesn't take minutes. It might be worthwhile to pay attention to the hash count. Is it proportional to the time? I'm not sure what else to suggest.
BTW lyra2z330 will not benefit from 4way. It is pure lyra2, which is already using AVX2 horizontally. Vertical (4way) AVX2 would not likely affect compute performance. Furthermore, lyra2z is I/O bound (memory hard), so improving compute performance just means the CPU would spend more time stalled waiting on data from memory.
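The hash count check suggested above is simple arithmetic, sketched here in C. The helper names are hypothetical and the numbers come from the report: roughly 860 h/s and a first output after 7-8 minutes, taken here as 450 seconds.

```c
#include <stdint.h>

/* Hashes expected if the miner ran at rate_hps for the whole interval,
   rounded to the nearest whole hash. */
static uint64_t demo_expected_hashes(double rate_hps, double seconds)
{
    return (uint64_t)(rate_hps * seconds + 0.5);
}

/* Returns 1 if the reported count is within tol (a fraction, e.g. 0.05
   for 5%) of what the rate and elapsed time predict, else 0. */
static int demo_count_is_proportional(uint64_t reported, double rate_hps,
                                      double seconds, double tol)
{
    double expected = rate_hps * seconds;
    double diff = (double)reported - expected;
    if (diff < 0)
        diff = -diff;
    return diff <= tol * expected;
}
```

At 860 h/s for 450 seconds the first report should cover about 387,000 hashes. A count close to that means the miner was hashing the whole time and only the report was late; a much smaller count would point to time lost somewhere else.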
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
November 29, 2017, 01:32:25 PM |
|
@joblo
The program works fine on my Windows 10 machine, but when I ran it on Windows Server 2012 the error is "program not responding".
More info please. It's probably your CPU.
|
|
|
|
fynxgloire
|
|
November 29, 2017, 03:52:02 PM Last edit: November 29, 2017, 06:45:43 PM by fynxgloire |
|
Hi, What is the best bang for the buck Xeon processor to go with the H110 Pro BTC+ motherboard? or Can an Intel Core i7-8700K CPU work with this motherboard?
regards
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
November 29, 2017, 08:09:15 PM |
|
Hi, What is the best bang for the buck Xeon processor to go with the H110 Pro BTC+ motherboard? or Can an Intel Core i7-8700K CPU work with this motherboard?
regards
System building recommendations deserve their own thread; I'd rather keep this one about cpuminer software. That being said, if you're trying to build a combo GPU/CPU rig it's entirely feasible. CPU choice depends on the features of the various CPU architectures. There are several threads already discussing the benefits of different architectures and features for CPU mining.
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
November 29, 2017, 11:42:35 PM Last edit: November 30, 2017, 01:31:59 AM by joblo |
|
Here's a puzzle for coding experts. I was testing with both sph and 4way running side by side and comparing the hash. Everything was fine. Then I started cleaning up the code and the hash broke. What remains is the last bit of code I can't remove without breaking the hash. I left a couple of commented out lines for context (no pun intended).
The code as presented works. If I remove the line indicated, the hash breaks and it only submits invalid shares that are rejected. It should be noted that ctx_blake was never initialized, nor was sph_blake256 run before sph_blake256_close, so close is running with random data. Both variables are local and are not referenced anywhere else. I would suspect local stack corruption, but in reverse: instead of code corrupting the stack, removing code does.
The input data is 4 80-byte streams interleaved for blake_4way. vhash is 4 32-byte hash streams returned from blake_4way, interleaved. hash0..3 is vhash deinterleaved so lyra2 can be run serially. hash and ctx_blake are not in any way involved in the proper functioning of the code. I'm stumped. Anyone have any insight?
Edit: I tried nulling sph_blake256_close but it failed. It seems to be dependent on actually running the code in the function. I moved the funky code to the end of the function and everything still works. But still, if I remove it the returned hash is invalid. SPH is stable code and not likely to be accessing data it shouldn't. Even if it did, it would break something, not fix it. It's not even being used properly. There should be no interactions between the sph code and the 4way code; they have their own data structures and supporting functions and don't share anything. I'm even more stumped.
void lyra2z_hash_4way( void *state, const void *input )
{
     uint32_t hash0[8] __attribute__ ((aligned (32)));
     uint32_t hash1[8] __attribute__ ((aligned (32)));
     uint32_t hash2[8] __attribute__ ((aligned (32)));
     uint32_t hash3[8] __attribute__ ((aligned (32)));
     uint32_t vhash[8*4] __attribute__ ((aligned (64)));
     blake256_4way_context ctx __attribute__ ((aligned (64)));

     uint32_t _ALIGN(64) hash[8];
     sph_blake256_context ctx_blake __attribute__ ((aligned (64)));
     //memcpy( &ctx_blake, &lyra2z_blake_mid, sizeof lyra2z_blake_mid );
     //sph_blake256( &ctx_blake, input + 64, 16 );
     // removing the following line breaks the hash
     sph_blake256_close( &ctx_blake, hash );

     memcpy( &ctx, &ctx_mid, sizeof ctx_mid );
     blake256_4way( &ctx, input + (64<<2), 16 );
     blake256_4way_close( &ctx, vhash );

     m128_deinterleave_4x32( hash0, hash1, hash2, hash3, vhash, 256 );

     LYRA2Z( lyra2z_wholeMatrix, hash0, 32, hash0, 32, hash0, 32, 8, 8, 8);
     LYRA2Z( lyra2z_wholeMatrix, hash1, 32, hash1, 32, hash1, 32, 8, 8, 8);
     LYRA2Z( lyra2z_wholeMatrix, hash2, 32, hash2, 32, hash2, 32, 8, 8, 8);
     LYRA2Z( lyra2z_wholeMatrix, hash3, 32, hash3, 32, hash3, 32, 8, 8, 8);

     memcpy( state   , hash0, 32 );
     memcpy( state+32, hash1, 32 );
     memcpy( state+64, hash2, 32 );
     memcpy( state+96, hash3, 32 );
}
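For readers unfamiliar with the interleaved layout mentioned above, here is a portable sketch in C. It is not the cpuminer-opt implementation of m128_deinterleave_4x32: the demo_ functions are hypothetical, they take a word count rather than a bit length, and they assume the usual 4x32 layout in which word i of lane l is stored at index 4*i + l.

```c
#include <stdint.h>
#include <stddef.h>

/* Interleave four streams of n_words 32-bit words each.
   Assumed layout: word i of lane l lands at dst[4*i + l]. */
static void demo_interleave_4x32(uint32_t *dst,
                                 const uint32_t *s0, const uint32_t *s1,
                                 const uint32_t *s2, const uint32_t *s3,
                                 size_t n_words)
{
    for (size_t i = 0; i < n_words; i++) {
        dst[4 * i + 0] = s0[i];
        dst[4 * i + 1] = s1[i];
        dst[4 * i + 2] = s2[i];
        dst[4 * i + 3] = s3[i];
    }
}

/* Inverse operation: split an interleaved buffer back into four
   separate streams, as is done with vhash before the serial lyra2. */
static void demo_deinterleave_4x32(uint32_t *d0, uint32_t *d1,
                                   uint32_t *d2, uint32_t *d3,
                                   const uint32_t *src, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++) {
        d0[i] = src[4 * i + 0];
        d1[i] = src[4 * i + 1];
        d2[i] = src[4 * i + 2];
        d3[i] = src[4 * i + 3];
    }
}
```

With this layout a 256-bit hash per lane is 8 words, so the interleaved buffer holds 32 words, which matches the sizes of hash0..3 and vhash in the function above.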
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
November 30, 2017, 04:50:28 AM |
|
I'm giving up on lyra2z 4way. It wasn't really about lyra2z anyway; the gain turned out to be only 2%, with rejects.
The real point was to test blake256 as a step toward other algos. It's also used by cryptonight and lyra2rev2.
With the whirlpool problem, that's 2 failures in 2 days. It's a good thing I don't have a boss or customers to answer to.
The problem with lyra2z is one of the weirdest I've ever encountered. I will probably revisit this in the future when I am able to test the other algos that use blake256. For now I'll move forward with other algos that build on the work done for tribus and nist5.
|
|
|
|
warcries
Newbie
Offline
Activity: 4
Merit: 0
|
|
November 30, 2017, 06:14:38 AM |
|
@joblo
The program works fine on my Windows 10 machine, but when I ran it on Windows Server 2012 the error is "program not responding".
More info please. It's probably your CPU.
I'm using an Intel Xeon X3430 (Lynnfield) on Windows Server 2012. The error is "program not responding". Thank you.
|
|
|
|
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
|
|
November 30, 2017, 06:48:37 AM |
|
@joblo
The program works fine on my Windows 10 machine, but when I ran it on Windows Server 2012 the error is "program not responding".
More info please. It's probably your CPU.
I'm using an Intel Xeon X3430 (Lynnfield) on Windows Server 2012. The error is "program not responding". Thank you.
Did you read README.txt? There is a very interesting line:
Choose the exe that best matches your CPU's features or use trial and error to find the fastest one that doesn't crash.
|
|
|
|
fynxgloire
|
|
November 30, 2017, 08:14:26 AM |
|
I am thinking of upgrading my Celeron to an 8700k or 7700k on my mining rig. Currently on my gaming rig I am getting 2000kH/s on the Intel Core i7-6700 CPU ( using the 4 way executable ) Does anyone have any benchmarks for the 7700 or 8700 chips so I can compare hash rates and decide what to buy?
|
|
|
|
nizzuu
Full Member
Offline
Activity: 187
Merit: 100
Cryptocurrency enthusiast
|
|
November 30, 2017, 10:35:18 AM |
|
to an 8700k or 7700k
This is off topic, but... I've just tested the 4-way binary (the one provided by joblo) on an i5-7600 (non-K) under a Deepcool Redhat (it handles up to 250W TDP, haha) with tribus and nist5.
Settings: cpuminer-4way -t 4 -a ... --benchmark
The CPU heats up to 95°C on all cores (it's 19°C in my server room) and starts throttling, with a hashrate drop of about 30-40%. (With the same settings, cryptonight without AVX/AVX2 maxes out at 61°C on 3 of 4 cores with -t 3.)
This means that a high-frequency CPU needs a Z-series chipset so that an AVX downclock offset can be set in the BIOS. The other solution is to go for a lower-frequency desktop CPU (e.g. buy an i5-7400 instead of an i5-7600) or a CPU with solder under its cap.
I'm happy that this CPU maxes out with -t 2 on lyra2z330 at 72°C, not -t 4, so I'm not that angry. I would never ever buy this crap for a rig again (yep, I know this crap is good for gaming, compiling, and so on), and the same goes for the 7700/7700k/8700/8700k. It seems that 3.7...3.8GHz is the maximum that Skylake/Kaby Lake/Coffee Lake can handle under AVX load on all cores without overheating.
If I were you, I'd go for an i7-6800k/6850k/6900k or old Xeon E5 v3s (they have AVX2 and are rather cheap for now).
|
|
|
|
ol92
|
|
November 30, 2017, 10:58:23 AM |
|
I am thinking of upgrading my Celeron to an 8700k or 7700k on my mining rig. Currently on my gaming rig I am getting 2000kH/s on the Intel Core i7-6700 CPU ( using the 4 way executable ) Does anyone have any benchmarks for the 7700 or 8700 chips so I can compare hash rates and decide what to buy?
which algo?
|
|
|
|
fynxgloire
|
|
November 30, 2017, 11:36:39 AM |
|
I am thinking of upgrading my Celeron to an 8700k or 7700k on my mining rig. Currently on my gaming rig I am getting 2000kH/s on the Intel Core i7-6700 CPU ( using the 4 way executable ) Does anyone have any benchmarks for the 7700 or 8700 chips so I can compare hash rates and decide what to buy?
which algo?
I am using this script to get 2000 kH/s on my current Intel i7-6700:
Bin\cpuminer-opt\cpuminer-4way.exe -a lyra2z -o stratum+tcp://xzc.suprnova.cc:1596 -u fynxgloire.Desktop1080CPU -p Intel
|
|
|
|
4ward
Member
Offline
Activity: 473
Merit: 18
|
|
November 30, 2017, 01:00:29 PM |
|
to an 8700k or 7700k
This is off topic, but... I've just tested the 4-way binary (the one provided by joblo) on an i5-7600 (non-K) under a Deepcool Redhat (it handles up to 250W TDP, haha) with tribus and nist5.
Settings: cpuminer-4way -t 4 -a ... --benchmark
The CPU heats up to 95°C on all cores (it's 19°C in my server room) and starts throttling, with a hashrate drop of about 30-40%. (With the same settings, cryptonight without AVX/AVX2 maxes out at 61°C on 3 of 4 cores with -t 3.)
This means that a high-frequency CPU needs a Z-series chipset so that an AVX downclock offset can be set in the BIOS. The other solution is to go for a lower-frequency desktop CPU (e.g. buy an i5-7400 instead of an i5-7600) or a CPU with solder under its cap.
I'm happy that this CPU maxes out with -t 2 on lyra2z330 at 72°C, not -t 4, so I'm not that angry. I would never ever buy this crap for a rig again (yep, I know this crap is good for gaming, compiling, and so on), and the same goes for the 7700/7700k/8700/8700k. It seems that 3.7...3.8GHz is the maximum that Skylake/Kaby Lake/Coffee Lake can handle under AVX load on all cores without overheating.
If I were you, I'd go for an i7-6800k/6850k/6900k or old Xeon E5 v3s (they have AVX2 and are rather cheap for now).
An i5-7600K overclocked to 4.5GHz with watercooling reaches 75°C on tribus (70°C with nist5). If you get 95°C at stock speed, you need a better cooler. Intel's stock cooler is not enough for high loads over a long time.
|
|
|
|
|