Bitcoin Forum
April 24, 2019, 07:06:47 AM *
Author Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner  (Read 419408 times)
Larvitar
Jr. Member
Activity: 198
Merit: 1
December 19, 2017, 09:45:25 PM
 #3081

Making 2 packages is more work for me, I'm trying to reduce the work.

I'm leaning toward replacing avx2-sha with avx-sha unless someone shows a good
reason for avx2-sha.

Edit: a couple more points

AVX2 and SHA improve different algos and different parts of the same algos. AVX won't have
any effect on SHA code on Ryzen CPUs. There are no technical concerns with AVX-SHA.

The only question is performance on algos that use AVX2. The best algo to test this is
lyra2v2, it is almost 100% AVX2 and not too hard on memory so it will expose any weaknesses
in AVX2 on Ryzen.
I'm getting around 1.6~2 MH/s in Lyra2z, but I'm getting low difficulty share errors on every share:
Code:
[2017-12-19 17:45:42] Starting Stratum on stratum+tcp://us-east.lyra2z-hub.miningpoolhub.com:20581
[2017-12-19 17:45:42] 16 miner threads started, using 'lyra2rev2' algorithm.
[2017-12-19 17:45:43] Stratum difficulty set to 10
[2017-12-19 17:46:05] CPU #9: 2097.15 kH, 116.83 kH/s
[2017-12-19 17:46:05] CPU #11: 2097.15 kH, 116.65 kH/s
[2017-12-19 17:46:05] CPU #7: 2097.15 kH, 116.33 kH/s
[2017-12-19 17:46:06] CPU #15: 2097.15 kH, 114.68 kH/s
[2017-12-19 17:46:06] CPU #6: 2097.15 kH, 111.85 kH/s
[2017-12-19 17:46:06] CPU #3: 2097.15 kH, 111.58 kH/s
[2017-12-19 17:46:06] CPU #14: 2097.15 kH, 110.05 kH/s
[2017-12-19 17:46:07] CPU #10: 2097.15 kH, 108.93 kH/s
[2017-12-19 17:46:07] CPU #5: 2097.15 kH, 107.02 kH/s
[2017-12-19 17:46:07] CPU #8: 2097.15 kH, 106.52 kH/s
[2017-12-19 17:46:07] CPU #2: 2097.15 kH, 106.20 kH/s
[2017-12-19 17:46:09] CPU #1: 2097.15 kH, 98.64 kH/s
[2017-12-19 17:46:10] CPU #13: 2097.15 kH, 93.44 kH/s
[2017-12-19 17:46:16] CPU #4: 2097.15 kH, 73.31 kH/s
[2017-12-19 17:46:16] CPU #0: 2097.15 kH, 73.11 kH/s
[2017-12-19 17:46:20] CPU #12: 2097.15 kH, 64.31 kH/s
[2017-12-19 17:46:39] Stratum difficulty set to 7
[2017-12-19 17:46:42] CPU #13: 3174.84 kH, 98.57 kH/s
[2017-12-19 17:46:42] Rejected 1/1 (100.0%), 34.63 MH, 1634.59 kH/s
[2017-12-19 17:46:42] reject reason: low difficulty share of 8.935987400308036e-8
[2017-12-19 17:46:42] factor reduced to : 0.67

Is it miner-related or pool-related?

User error, look carefully at the algo.

Another note about lyra2z, 4way is likely slower due to previously mentioned issues with it.
Damn! You are right

Too many "Lyras" :D
joblo
Legendary
Activity: 1148
Merit: 1050
December 19, 2017, 11:39:38 PM
 #3082

If you're comparing AVX2 vs AVX performance lyra2v2 is better, less memory hard.

cpuminer-opt developer, https://bitcointalk.org/index.php?topic=1326803.0
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
ETH: 0x72122edabcae9d3f57eab0729305a425f6fef6d0
thin
Full Member
Activity: 322
Merit: 107
December 20, 2017, 04:33:50 AM
 #3083

I have a problem running cpuminer-opt v3.7.x on a couple of machines with Win 10 x64. When started, it silently waits several seconds, then exits without writing a single symbol. Any advice? Am I missing some runtime libs?

joblo
Legendary
Activity: 1148
Merit: 1050
December 20, 2017, 04:43:12 AM
 #3084

Advice? Yes, provide proper information. What CPU, algo, command line options?
It's all displayed when the program starts, always provide that when reporting a problem.

thin
Full Member
Activity: 322
Merit: 107
December 20, 2017, 04:55:06 AM
Last edit: December 20, 2017, 05:06:47 AM by thin
 #3085

It does not depend on the algo; even when I try to get help, nothing is displayed. CPU: Celeron G3930

C:\App\cpuminer-opt-3.7.7-windows-v2>cpuminer-avx2.exe --help

C:\App\cpuminer-opt-3.7.7-windows-v2>cpuminer-avx.exe --help

C:\App\cpuminer-opt-3.7.7-windows-v2>cpuminer-4way.exe --help

C:\App\cpuminer-opt-3.7.7-windows-v2>cpuminer-4way.exe

C:\App\cpuminer-opt-3.7.7-windows-v2>

joblo
Legendary
Activity: 1148
Merit: 1050
December 20, 2017, 05:42:25 AM
 #3086

First of all your CPU doesn't have AVX or AVX2 so stay away from those and 4way.
You should be using aes-sse42. But there's a bigger problem if it won't display help.
I've never seen that kind of a problem, it looks like it's your system. No one else is
complaining so the problem is at your end.
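
The advice above amounts to picking the build that matches what the CPU supports. A minimal sketch of that selection, assuming the feature flags are given with the names Linux uses in /proc/cpuinfo (avx2, avx, aes, sse4_2). The function name, the priority order and the "sse2" fallback are illustrative assumptions, not taken from cpuminer-opt itself:

```python
def suggest_build(flags):
    """Pick a build suffix from a collection of CPU feature flags.

    Hypothetical helper: the priority order below is an assumption.
    """
    flags = set(flags)
    if "avx2" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    if {"aes", "sse4_2"} <= flags:
        return "aes-sse42"
    return "sse2"


# Illustrative flag set for a CPU with AES and SSE4.2 but no AVX/AVX2.
no_avx_cpu = {"sse2", "sse4_2", "aes"}
print(suggest_build(no_avx_cpu))
```

None of this explains a binary that prints nothing at all; it only covers which build to pick once the program runs.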

thin
Full Member
Activity: 322
Merit: 107
December 20, 2017, 05:55:34 AM
 #3087

That's why I asked if I'm missing some runtime libs ). I've never seen such behavior before - it starts, waits silently for several seconds, then exits. Quite unusual.
Thanks anyway.

Painlor
Newbie
Activity: 7
Merit: 0
December 20, 2017, 08:08:53 AM
 #3088

Since today I get an error when running:

./build.sh


compilation terminated.
Makefile:3144: recipe for target 'algo/cpuminer-m7m.o' failed
make[2]: *** [algo/cpuminer-m7m.o] Error 1
make[2]: *** Waiting for unfinished jobs....
mv -f .deps/cpuminer-util.Tpo .deps/cpuminer-util.Po
mv -f .deps/cpuminer-cpu-miner.Tpo .deps/cpuminer-cpu-miner.Po
mv -f algo/.deps/cpuminer-neoscrypt.Tpo algo/.deps/cpuminer-neoscrypt.Po
make[2]: Leaving directory '/downloads/7/77/cpuminer-opt'
Makefile:4147: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/downloads/7/77/cpuminer-opt'
Makefile:725: recipe for target 'all' failed
make: *** [all] Error 2
strip: 'cpuminer': No such file

I have changed nothing. Thanks for the help.

solved!
 
apt-get install libgmp-dev
rukez
Newbie
Activity: 7
Merit: 0
December 20, 2017, 03:29:57 PM
 #3089

Looks like there is some performance problem with the yescryptr16 implementation on Coffee Lake CPUs:
Intel i5 4440 (stock 3.3GHz) @ 4 threads generates ~600 H/s
Intel i7 5820k (no overclock, 3.6GHz) @ 12 threads generates ~1200 H/s in pool and up to 1400 in solo mining (6 threads generate a little less) with both cpuminer-opt versions (3.7.6 and 3.7.7v2) under windows64 and under ubuntu64
Intel i7 8700k (stock 4.3GHz) @ 12 threads generates only 950 H/s (both pool and solo). Overclocking to 5GHz (50x100) with the cache overclocked to 4.6GHz (stock is 4.2) gives no gain; if anything, performance usually degrades (power limit disabled, core temperatures are ~75C so no throttling involved). 1-2-3-4-5-6 threads give lower results, and overclocking the bus to 130MHz (also with RAM) gives no result - the maximum is about 950 H/s
Even funnier - at stock frequency, switching from AVX to SSE2 gives a performance boost from 950 to 1000-1050 H/s

I understand that the 8700k lacks quad-channel RAM and has a little less L3 cache (12 vs 15MB) compared to the 5820k, but the bottleneck is obviously something different.
RAM overclock gives no result, so dual channel is not the problem (otherwise we should see a performance boost when overclocking the bus and RAM).
Cache is also not the problem: 20% of the cache is gone, but we gain >30% in frequency (when overclocked), so the smaller cache runs at a higher speed together with the CPU cores - we can fit less, but process it more often and in less time.
Also, compared to the 4440, if cache were the bottleneck, we have twice as much (12 vs 6MB); taking into account the much higher speed and optimized pipeline, if only cache mattered we should see a 2x gain over the 4440.

I hope for a fix :)
ypsi
Full Member
Activity: 336
Merit: 158
The Wheel weaves as the Wheel wills
December 20, 2017, 03:41:03 PM
 #3090

It seems something is indeed wrong with that setup using the Coffee Lake chip; it should hit at least ~1500 at stock speed. But didn't Intel change how the L2/L3 cache works on Coffee Lake? That might be throwing cpuminer-opt off.

--ypsi
4ward
Member
Activity: 456
Merit: 17
December 20, 2017, 03:55:12 PM
 #3091

I think it's definitely not normal, you should be getting more. And SSE2 really is faster than AVX.
On an i5 7600k @ 4.5GHz I get ~920 H/s on AVX and ~950 on SSE2 (!)

Drag0g0
Newbie
Activity: 62
Merit: 0
December 20, 2017, 04:03:12 PM
 #3092

Lol, I have an i7-4790k and thought I was better off using AVX; now I tried SSE and it's faster... would have loved to know that before.
rukez
Newbie
Activity: 7
Merit: 0
December 20, 2017, 04:03:47 PM
 #3093

The basic setup is ok, synthetic benchmarks show nice performance.
In Coffee Lake, as far as I can see from the specs, the main difference is the added eDRAM L4 cache, which is used as GPU VRAM (I currently use the embedded GPU, but it looks like there will be no benefit from adding an external video board because the CPU seems unable to use this L4 cache for anything other than the GPU, or am I wrong?). Other changes (compared to the 5820k):
L1: both have 32KB per core, both are 4-way accessed
L2: both have 256KB per core, but the 8700k has only 4-way access while the 5820k can use 8-way - not sure if it is important for scrypt
L3: less cache (12 vs 15MB) and fewer access ways (16 vs 20), but the cache frequency is higher (at least 4.2GHz without overclock, while the 5820k operates at about 3GHz)
joblo
Legendary
Activity: 1148
Merit: 1050
December 20, 2017, 04:12:47 PM
 #3094

That's quite a first post, you did your homework.

Your results are concerning but not a software issue. If Coffeelake has a design quirk that can be worked around in software
such a workaround would probably have a negative effect on other models. If it's a coffee lake issue it needs a Coffeelake fix.

On the technical side it's difficult to compare the 8700K with either of the two other CPUs you tested. I have a 6700K @ 4 GHz
and it gets 780 H/s. With a projected linear increase the 8700K should get around 1170. Clearly the 8700k has a problem.
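
The projection above is a plain scaling by core count. A minimal sketch of that arithmetic, assuming hash rate grows linearly with physical cores at a similar clock (4 cores for the 6700K, 6 for the 8700K); the function name is only for illustration:

```python
def project_hashrate(measured_hs, measured_cores, target_cores):
    """Scale a measured hash rate linearly by physical core count."""
    return measured_hs * target_cores / measured_cores


projected = project_hashrate(780.0, measured_cores=4, target_cores=6)
reported = 950.0  # 8700K figure reported earlier in the thread

print(f"projected: {projected:.0f} H/s")                      # projected: 1170 H/s
print(f"shortfall: {100 * (1 - reported / projected):.0f}%")  # shortfall: 19%
```

Linear scaling is optimistic for a memory-bound algo, so the 1170 H/s figure is better read as an upper estimate than as a target.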

I haven't done a deep dive into the architecture to see if there is a design change that could have an effect. As a new CPU
it could still have a few issues that need to be ironed out.

I noticed in my brief test that reducing the thread count by half on my 6700K had no effect on total hash rate. This tells me the bottleneck
is memory access (cache or main). You stated lower performance with fewer threads. That may be a clue. If your CPU is not
I/O bound when mining an I/O bound algo there may be a problem on the compute side.

Your observation that SSE2 build is faster than AVX is very interesting and deserves more testing. There is no AVX specific code in
yescrypt so there should be no difference in hash speed. Yescrypt is also very self contained, ie it doesn't use any libraries. The only effect
the AVX flag would have is on the compiler. It may compile code differently but there are no big gains between SSE2 and AVX, they are
both mostly limited to 128 bit vectors. It's only with AVX2 that there is a quantum leap to 256 bit vectors. Again I speculate but maybe the
compiler isn't yet tweaked for Coffeelake. What version did you compile with?

I suggest you try other algos with 6 and 12 threads to get a more complete profile. If some algos are affected more than others
it may reveal a pattern.

It would also be interesting to see if other Coffeelake owners see the same issues.

nizzuu
Full Member
Activity: 193
Merit: 100
Cryptocurrency enthusiast
December 20, 2017, 06:34:31 PM
 #3095

Coffee Lake is not the only one with bottlenecks...

lyra2z330, Core i5-7600 (non-K) locked at 3.9GHz on all cores, 16GB DDR4-2400 dual channel

2 threads w/o affinity @ AVX2 build => ~830 h/s, shows as ~50% load on each of the 4 cores
2 threads --cpu-affinity 3 @ AVX2 build => ~865 h/s (this is interesting), shows as ~100% load on cores 0 and 1
4 threads w/o affinity @ AVX2 build => ~790 h/s (that's crap)
4 threads --cpu-affinity 15 @ AVX2 build => ~792 h/s (that's crap, too)

I guess these CPUs need 4-channel RAM to perform at full speed :(

There is a tool which can show L3 cache usage, https://www.cpuid.com/softwares/perfmonitor-2.html; it is old as heck, but still works on some (!) configurations. It could work on the i7-6700 and help with optimizations. I will also try https://github.com/opcm/pcm, which supports Intel's cache monitoring technology.
rukez
Newbie
Activity: 7
Merit: 0
December 20, 2017, 06:52:32 PM
 #3096

Quote
If Coffeelake has a design quirk that can be worked around in software
such a workaround would probably have a negative effect on other models. If it's a coffee lake issue it needs a Coffeelake fix.
I suppose that exposing the scrypt algo parameters on the command line could help a lot. If yescrypt, like regular scrypt, can be calculated with different algo presets (precache amount, link split size and so on), then Coffee Lake may benefit from splitting that fits the cache better.
Anyway, tomorrow I'll try to recompile the miner with different presets in scrypt.c
Quote
Again I speculate but maybe the
compiler isn't yet tweaked for Coffeelake. What version did you compile with?
I currently use the precompiled Windows versions on both the 5820k and the 8700k. Tomorrow I'll try the latest gcc with the skylake opt flag, but I suppose the compiler won't change anything inside the asm instructions, so the opt flag won't help, at least not much.
Quote
You stated lower performance with fewer threads.
That is mostly a Windows problem - with 6 threads it uses only 3 physical cores and 3 HT cores. It is clearly seen with cputemp: after starting 6 threads, 3 cores start to generate heat (70-75C on the busy ones, 45C on the spare ones) and show 100% load; with 12 threads all cores are hot and busy.
Under Ubuntu the difference is within the margin of error.
Quote
I suggest you try other algos with 6 and 12 threads to get a more complete profile. If some algos are affected more than others
it may reveal a pattern.
Yep, I'll try.
4ward
Member
Activity: 456
Merit: 17
December 20, 2017, 06:57:44 PM
 #3097

A few benchmarks on yescryptr16

3.7.7 "v1" (gcc 4.8.3)
avx - ~930 h/s
sse2 - ~ 870 h/s

3.7.7 v2 (gcc 5.3.1)
avx - ~950 h/s
sse2 - ~970 h/s


3.7.7 4ward (gcc 6.2.1)
avx - ~970 h/s
sse2 - ~960 h/s

Additional algos that show better performance with SSE2 than AVX (although the difference is very small):
yescrypt, polytimos and lbry
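
To make the gaps easier to compare, here is a minimal sketch that turns the yescryptr16 figures above into a relative SSE2-vs-AVX difference per build. The dictionary layout and function name are just for illustration, and the inputs are the approximate (~) values from the post, so small differences may well be noise:

```python
# (avx, sse2) hash rates in H/s, copied from the benchmark list above.
results = {
    '3.7.7 "v1" (gcc 4.8.3)': (930, 870),
    "3.7.7 v2 (gcc 5.3.1)": (950, 970),
    "3.7.7 4ward (gcc 6.2.1)": (970, 960),
}


def sse2_vs_avx(avx, sse2):
    """Relative difference of the SSE2 result against AVX, in percent."""
    return 100.0 * (sse2 - avx) / avx


for build, (avx, sse2) in results.items():
    print(f"{build}: SSE2 is {sse2_vs_avx(avx, sse2):+.1f}% vs AVX")
```

With these inputs only the gcc 5.3.1 build comes out with SSE2 ahead, by roughly 2%.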

joblo
Legendary
Activity: 1148
Merit: 1050
December 20, 2017, 07:07:23 PM
 #3098

If 6 threads aren't balanced, use custom CPU affinity (--cpu-affinity 0x555). It hasn't been an issue on Intel before, but if you
say only 3 cores are heating up then maybe the mapping has changed.
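
For anyone unsure what a mask like 0x555 selects, a minimal sketch that decodes an affinity mask into logical CPU numbers. It assumes bit N of the mask stands for logical CPU N; whether adjacent logical CPUs (0/1, 2/3, ...) are hyperthread siblings of the same physical core depends on the OS and is an assumption here. The function name is only for illustration:

```python
def cpus_in_mask(mask):
    """Return the logical CPU indices whose bits are set in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# 0x555 is the mask suggested above; 3 and 15 are the masks from the
# lyra2z330 test earlier in the thread.
for mask in (0x555, 3, 15):
    print(f"{mask:#x} -> {cpus_in_mask(mask)}")
# 0x555 -> [0, 2, 4, 6, 8, 10]
# 0x3 -> [0, 1]
# 0xf -> [0, 1, 2, 3]
```

Under that numbering assumption, 0x555 puts one thread on every second logical CPU, which would be one thread per physical core on a 6-core/12-thread part.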

There is no ASM. There is hardcoded SSE2 code, but nothing AVX specific. The only differences between the SSE2 and
AVX builds are generated by the compiler.

rukez
Newbie
Activity: 7
Merit: 0
December 20, 2017, 07:07:38 PM
 #3099


5820k:
with 12 threads L2 hit is 49%, L3 hit is 6%
with 6 threads L2 hit is 54%, L3 hit is 11%, but only 3 cores are under load

12 threads: stalled cycles 57% (wtf???), branch hit 99% (don't know what that is), 1.2-1.3 instructions per cycle (don't think it is important as long as it is a "medium" value)

Tomorrow I'll compare with the 8700k.
joblo
Legendary
Activity: 1148
Merit: 1050
December 20, 2017, 07:11:57 PM
 #3100

This shows gcc-5.3.1 might be the issue. Is that on a Coffeelake? If not it eliminates that as a Coffeelake
issue and looks purely like a compiler version issue.
