joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
January 26, 2017, 07:47:31 PM
Quote: see my git, same permut (starting with bmw 80) before:
[2017-01-26 19:16:55] CPU #1: 1.26 kH/s
[2017-01-26 19:16:55] CPU #2: 1.33 kH/s
[2017-01-26 19:16:55] CPU #0: 1.27 kH/s
[2017-01-26 19:17:28] timetravel block 387151, diff 0.419
now:
[2017-01-26 19:20:33] CPU #2: 78.91 kH/s
[2017-01-26 19:20:33] CPU #3: 76.86 kH/s
[2017-01-26 19:20:33] CPU #1: 74.50 kH/s
[2017-01-26 19:20:36] accepted: 2/2 (diff 0.007), 308.84 kH/s yes!
With a few line changes, on Linux with an i5-4440 (with bmw512 as the first algo): 2x the bitbandi miner speed.
Nice find. Calculating the next permutation on every hash is redundant; it only needs to be done when new work is received. It could probably be moved up another level or two. Does it have to be thread-specific? It seems each thread will calculate the same chain. Maybe the stratum thread can do it when new work is received. I will follow up.
The hashrate is now about what I expect from an unoptimized 8-function chain, i.e. faster than x11. And it's easy to see how all that fixed overhead would overwhelm the rest of the algo. It's starting to make sense. I'll now try the drop-in opts to see if they now behave as expected.
On a related note, this is my favorite swap routine. It doesn't need a temp, and being a macro it works with almost any type and can modify both args without pointers.
#define swap_vars(a,b) a^=b; b^=a; a^= b;
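A minimal sketch of the XOR swap above made safe to use as a single statement: wrapping the three assignments in do { } while (0) keeps an unbraced if/else intact, which the bare "a^=b; b^=a; a^= b;" form does not. Two caveats worth keeping in mind: in C the ^ operator is only defined for integer types (not floats, pointers or structs), and XOR-swapping an object with itself zeroes it. swap_tmp is an illustrative temp-variable alternative for other types; it relies on the GCC/Clang __typeof__ extension. swap_words is a hypothetical helper, not code from any miner.

```c
#include <assert.h>
#include <stdint.h>

/* XOR swap as one statement. Integer types only; a and b must be distinct
   objects, because swap_vars(x, x) leaves x == 0. */
#define swap_vars(a, b) do { (a) ^= (b); (b) ^= (a); (a) ^= (b); } while (0)

/* Illustrative temp-variable swap for any assignable type (double, pointers,
   structs). __typeof__ is a GCC/Clang extension, not standard C. */
#define swap_tmp(a, b) \
    do { __typeof__(a) swap_tmp_t_ = (a); (a) = (b); (b) = swap_tmp_t_; } while (0)

/* Hypothetical helper: swap two 32-bit words of a buffer in place. */
static void swap_words(uint32_t *w, int i, int j)
{
    assert(i != j);  /* guard the aliasing case that would zero w[i] */
    swap_vars(w[i], w[j]);
}
```

With this form, "if (c) swap_vars(x, y); else swap_vars(y, x);" compiles as intended; with the original macro only the first assignment is conditional and the else becomes a syntax error.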
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
January 26, 2017, 07:49:14 PM
Quote: I think you have some kind of bug: I start mining with 4 threads, but only 3 ever work. Yeah, it's always threads-1. If I put 2, then it uses 1. If I put 3, then it uses 2.
Thread count is correct for me whether default or specified.
doktor83
January 26, 2017, 08:00:05 PM
Threads 4, no CPU 0 thread. This is on Windows; you are probably testing it on *nix.
Epsylon3
Legendary
Offline
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
January 26, 2017, 08:02:38 PM Last edit: January 26, 2017, 08:18:25 PM by Epsylon3
Yes, I only made one "compatible" binary for now... my Windows build env is a bit messed up for the other ones; I tested different compilers recently. And CPU #0 didn't put results in your log; 4 x 66 should be 264.
Epsylon3
Legendary
Offline
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
January 26, 2017, 08:05:23 PM
Quote: On a related note, this is my favorite swap routine. It doesn't need a temp, and being a macro it works with almost any type and can modify both args without pointers. #define swap_vars(a,b) a^=b; b^=a; a^= b;
There is also the xchg asm instruction, but we don't care about that; it's not a big issue anymore: "xchg ax, bx" (put AX in BX and BX in AX). A __xchg() func should exist.
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
January 26, 2017, 08:14:58 PM
644 kH/s on i7-6700K without AES Groestl.
Edit: from 445
Edit: AES Groestl now works, 810 kH/s!
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
January 26, 2017, 08:16:05 PM
Quote: On a related note, this is my favorite swap routine. It doesn't need a temp, and being a macro it works with almost any type and can modify both args without pointers. #define swap_vars(a,b) a^=b; b^=a; a^= b;
Quote: There is also the xchg asm instruction, but we don't care about that; it's not a big issue anymore: "xchg ax, bx" (put AX in BX and BX in AX). A __xchg() func should exist.
I'm still lost in x86 assembly.
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
January 26, 2017, 08:37:49 PM Last edit: January 26, 2017, 10:29:06 PM by joblo
Options for optimizing the permutation calculation further:
1. Move it to scanhash, outside the hash loop. Trivial to implement; eliminates calling it on every iteration of the hash loop.
2. Move it to miner_thread, when new work is detected. Also trivial; eliminates calling it when there is no new work.
3. Move it to the stratum thread. Slightly more complex to implement; eliminates the calculation by every miner thread.
Option 3 should work; if not, I'll fall back a level until it does.
Edit: They all work as long as the endianness of ntime is correct; scanhash flips to BE before calling hash. Not a big improvement.
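All three options rest on the same observation: the chain order depends only on ntime, so it can be computed once per unit of work rather than once per nonce. A minimal C sketch under stated assumptions: it models the order as the k-th lexicographic permutation of the 8 function indices, with k = (ntime - base) mod 8!, reached by stepping a next-permutation routine from 0..7. HYPOTHETICAL_BASE_NTIME, perm_cache, chain_order, next_perm and bswap32_u are illustrative names, not cpuminer-opt identifiers, and the real coin defines its own base constant.

```c
#include <assert.h>
#include <stdint.h>

#define FUNC_COUNT 8        /* length of the hash chain */
#define PERM_COUNT 40320u   /* 8! distinct orderings of the chain */

/* Hypothetical epoch; the coin's own code defines the real value. */
#define HYPOTHETICAL_BASE_NTIME 1389040865u

/* Lexicographic next permutation (same semantics as C++ std::next_permutation).
   Returns 0 and wraps a[] back to ascending order after the last permutation. */
static int next_perm(int *a, int n)
{
    assert(n >= 2);
    int i = n - 2;
    while (i >= 0 && a[i] >= a[i + 1])
        i--;                                   /* rightmost ascent */
    if (i >= 0) {
        int j = n - 1;
        while (a[j] <= a[i])
            j--;                               /* rightmost element larger than a[i] */
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
    for (int l = i + 1, r = n - 1; l < r; l++, r--) {   /* reverse the suffix */
        int t = a[l]; a[l] = a[r]; a[r] = t;
    }
    return i >= 0;
}

/* Per-thread cache: rebuilt only when ntime changes (new work),
   not on every nonce of the hash loop. */
typedef struct {
    int valid;
    uint32_t ntime;            /* host byte order */
    int order[FUNC_COUNT];
    unsigned recomputes;       /* how many times the order was actually rebuilt */
} perm_cache;

static const int *chain_order(perm_cache *c, uint32_t ntime)
{
    if (c->valid && c->ntime == ntime)
        return c->order;                       /* same work: reuse cached order */
    for (int i = 0; i < FUNC_COUNT; i++)
        c->order[i] = i;                       /* start from 0,1,...,7 */
    uint32_t steps = (ntime - HYPOTHETICAL_BASE_NTIME) % PERM_COUNT;
    for (uint32_t s = 0; s < steps; s++)
        next_perm(c->order, FUNC_COUNT);
    c->ntime = ntime;
    c->valid = 1;
    c->recomputes++;
    return c->order;
}

/* Byte-swap a 32-bit word. Since scanhash flips the header to big-endian
   before hashing, an ntime taken from the unflipped data may need this swap
   before it is passed to chain_order. */
static uint32_t bswap32_u(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xff00u) | ((x << 8) & 0xff0000u) | (x << 24);
}
```

A scanhash-style loop would call chain_order once before iterating nonces (option 1), or only when new work arrives (option 2); the nonce loop itself then just walks order[]. Keeping the cache per thread also avoids sharing a mutable array with the stratum thread.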
doktor83
January 26, 2017, 08:55:47 PM Last edit: January 26, 2017, 09:10:03 PM by doktor83
Quote: Yes, I only made one "compatible" binary for now... my Windows build env is a bit messed up for the other ones; I tested different compilers recently. And CPU #0 didn't put results in your log; 4 x 66 should be 264.
I compiled it myself on MinGW. Yes, there should be 4 threads working, but as you can see in the pic only 3 are working. When I put 5 threads, there are really 4 working threads, etc. It's always t-1. Edit: I edited joblo's cpuminer-opt with your optimizations, and here everything works OK:
Epsylon3
Legendary
Offline
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
January 26, 2017, 09:14:29 PM
I don't have this problem; it looks like you have a process using CPU 0... maybe a GPU miner? See the capture on my thread: https://bitcointalk.org/?topic=841401
doktor83
January 26, 2017, 09:19:47 PM
damn gremlins 
doktor83
January 26, 2017, 09:29:06 PM
Something's still not good: as can be seen in your capture, the hashrate keeps falling and falling... I managed to squeeze out a little more speed by using the aes_ni version of groestl, setting its data length to 512 instead of 1024. But the hashrate still keeps falling...
Epsylon3
Legendary
Offline
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
January 26, 2017, 09:52:00 PM
Quote: Something's still not good: as can be seen in your capture, the hashrate keeps falling and falling... I managed to squeeze out a little more speed by using the aes_ni version of groestl, setting its data length to 512 instead of 1024. But the hashrate still keeps falling...
Should be the thermal throttling.
doktor83
January 26, 2017, 09:55:25 PM
Quote: Something's still not good: as can be seen in your capture, the hashrate keeps falling and falling... I managed to squeeze out a little more speed by using the aes_ni version of groestl, setting its data length to 512 instead of 1024. But the hashrate still keeps falling...
Quote: Should be the thermal throttling.
Maybe, but this never happens on the Limx version.
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
January 26, 2017, 11:39:46 PM
Quote: Something's still not good: as can be seen in your capture, the hashrate keeps falling and falling... I managed to squeeze out a little more speed by using the aes_ni version of groestl, setting its data length to 512 instead of 1024. But the hashrate still keeps falling...
Quote: Should be the thermal throttling.
Quote: Maybe, but this never happens on the Limx version.
Does the pool confirm the hashrate? BTW, what's the speed of limx vs multi?
Epsylon3
Legendary
Offline
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
January 26, 2017, 11:52:27 PM
What is limx?
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
January 27, 2017, 12:06:20 AM
Epsylon3
Legendary
Offline
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
January 27, 2017, 01:23:47 AM
Ah OK, well, forum binaries without sources are not in my time log :p
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
January 27, 2017, 01:34:04 AM
Luffa strikes again. The SSE2-optimized Luffa function stopped working on timetravel. Everything was hashing fine, then the hashrate jumped and it stopped submitting shares. There was no new block or any other apparent trigger. Restarting the miner doesn't help; only restoring the sph version of Luffa fixes it. A similar problem occurs with SSE2 Luffa on Qubit and Xevan, with no shares ever submitted. New symptom this time: it started fine and then broke, which could be a clue.
[2017-01-26 19:16:44] Accepted 164/168 (97.6%), 11.22 MH, 735.62 kH/s, 64C
[2017-01-26 19:16:58] CPU #0: 2706.77 kH, 93.75 kH/s
[2017-01-26 19:16:58] Accepted 165/169 (97.6%), 9757.02 kH, 737.19 kH/s, 62C
[2017-01-26 19:17:05] CPU #0: 739.42 kH, 93.66 kH/s
[2017-01-26 19:17:06] Accepted 166/170 (97.6%), 7789.67 kH, 737.10 kH/s, 64C
[2017-01-26 19:17:15] CPU #3: 4284.87 kH, 93.26 kH/s
[2017-01-26 19:17:15] Accepted 167/171 (97.7%), 11.77 MH, 737.15 kH/s, 63C
[2017-01-26 19:17:24] CPU #6: 5160.18 kH, 93.90 kH/s
[2017-01-26 19:17:28] CPU #1: 5549.19 kH, 93.46 kH/s
[2017-01-26 19:17:29] CPU #7: 5593.32 kH, 93.31 kH/s
[2017-01-26 19:17:29] CPU #5: 5627.40 kH, 93.63 kH/s
[2017-01-26 19:17:30] CPU #2: 5459.40 kH, 88.97 kH/s
[2017-01-26 19:17:32] CPU #0: 2472.36 kH, 93.77 kH/s
[2017-01-26 19:17:32] CPU #4: 4450.00 kH, 93.57 kH/s
[2017-01-26 19:17:32] CPU #7: 297.38 kH, 93.34 kH/s
[2017-01-26 19:17:32] CPU #2: 160.54 kH, 90.58 kH/s
[2017-01-26 19:17:32] CPU #6: 767.45 kH, 93.86 kH/s
[2017-01-26 19:17:32] CPU #3: 1603.72 kH, 93.30 kH/s
[2017-01-26 19:17:32] CPU #1: 351.17 kH, 93.50 kH/s
[2017-01-26 19:17:32] CPU #5: 283.92 kH, 93.69 kH/s
[2017-01-26 19:18:27] CPU #4: 5614.12 kH, 101.83 kH/s <---- hashrate increased, no shares submitted after this even after new block
[2017-01-26 19:18:27] CPU #7: 5600.27 kH, 101.06 kH/s
[2017-01-26 19:18:27] CPU #1: 5610.26 kH, 101.19 kH/s
[2017-01-26 19:18:27] CPU #0: 5626.28 kH, 101.41 kH/s
[2017-01-26 19:18:27] CPU #6: 5631.70 kH, 101.38 kH/s
[2017-01-26 19:18:27] CPU #3: 5598.09 kH, 100.66 kH/s
[2017-01-26 19:18:28] CPU #5: 5621.38 kH, 100.83 kH/s
[2017-01-26 19:18:28] CPU #2: 5434.56 kH, 97.30 kH/s
[2017-01-26 19:18:35] CPU #4: 814.54 kH, 101.83 kH/s
[2017-01-26 19:18:35] CPU #6: 765.25 kH, 100.96 kH/s
[2017-01-26 19:18:35] CPU #5: 743.28 kH, 100.71 kH/s
[2017-01-26 19:18:35] CPU #1: 778.84 kH, 101.30 kH/s
[2017-01-26 19:18:35] CPU #2: 732.46 kH, 100.64 kH/s
[2017-01-26 19:18:35] CPU #7: 783.21 kH, 101.50 kH/s
[2017-01-26 19:18:35] CPU #0: 775.59 kH, 101.41 kH/s
joblo (OP)
Legendary
Offline
Activity: 1470
Merit: 1114
January 27, 2017, 04:08:29 AM
I think I'm pretty much settled on the final implementation of Timetravel.
Final hashrate on an i7-4790K at 4 GHz is right around 800 kH/s.
In addition to backing out SSE2 Luffa, I had to back out the optimized ctx init for Groestl. Something weird there: it works fine for a while, then segfaults.
I also moved the permutation calculation to scanhash and back to per-thread. I didn't see any performance difference, and I think the global array updated by the stratum thread introduced a race condition with the miner threads restarting, resulting in occasional invalid job id rejects.
It's getting late; I'll build and release it tomorrow.