It was my assumption that you would have already done that. We both made assumptions, not a good idea.
Yep. I have only glanced briefly at the source code. Anyway, I should apologise for my behaviour; it was unprofessional and led to less productive results. You weren't perfect either, but everyone has faults since everyone is human, and every suggestion or problem report felt like a court trial given how much work needed to be done on my end compared to what I saw being done on your end regarding the issue or suggestion (you always asked me to do research or provide more data without seemingly doing any research of your own before passing judgement). I really like your work so far and appreciate it very much, though, and don't want to distract you from it more than I already did.

Your perception of a court trial is pretty accurate. I was thinking something similar: a lawyer gets one crack at presenting a case. If the lawyer comes to court unprepared the case gets tossed and he doesn't get another chance. Although I'm an atheist, a Bible passage comes to mind: let he who is without sin throw the first stone. The implication being that no one is without sin; I simply picked up the stones and threw them back. An apology is not required; coming to an understanding and learning from it is more important, and that applies to both of us. Nevertheless you offered one and I accept. For my part I'm not one to apologize for my actions, too stubborn, I guess. But in hindsight I think the timing was bad. I had just released v3.2 and had broken zr5, which was embarrassing, and I was trying to focus on that issue. In fact I am not pleased with the overall quality of my releases, too many bad ones. I expect better of myself. Am I losing my edge, or is it because I forgot what it was like to be on a steep learning curve after so long being a subject matter expert? Yeah, I'm arrogant too. No hard feelings. Cheers.
|
|
|
I did acknowledge the overhead of the deref but was at a loss to explain why I observed a performance gain.
You didn't provide numbers, unfortunately, and you didn't provide a way to recreate the benchmarks to verify your claims either, since there's no archive of older versions of cpuminer-opt to build against. If it were on github, for example, that would have been easier to test. Each scan takes seconds to run so the overhead of one extra pointer deref every few seconds is immeasurable. Even if you go up a level to the miner_thread loop, there are maybe 20 gated function calls every loop. 20 extra derefs every few seconds is still immeasurable.
That was the info I was looking for, thank you. This whole debate went on too long either because I didn't communicate clearly enough that I assumed the deref happened on every hash call, or because you didn't recognize that when reading. The pseudocode should have been a big hint. Either way, the debate is pointless: 20 calls a second isn't something to worry about. The observed slowdown must be caused by other factors.

I think you hit the nail on the head when you said you made an assumption. That was, IMO, your biggest mistake and why I kept repeating that you need to do your homework before bringing it to my attention. Had you done that, you would have realized yourself that the deref overhead was trivial and any observed performance diff was due to something else. It was my assumption that you would have already done that. We both made assumptions, not a good idea. I didn't have numbers because there was no way to run a controlled test with the necessary level of precision and accuracy. And it's also why I suggested it wasn't worth your effort to go back and retest previous releases.
|
|
|
Yes, the pool has loads of shares to calculate because the DDoS bogged down the calculations. It's currently catching up but will probably need a few hours until it has fully caught up.
Awesome, thanks - definitely not complaining because, as always, the pool's been super stable, and I for one liked the new interface haha - the only thing I found myself missing was the worker stats box on the dashboard.

All should be good by now. I'd recommend using the other pool(s) as well, at least as a backup, as I don't know if the attacker will come back.

There still appear to be some issues. I'm getting some workers dropping off the list occasionally due to infrequent share submissions, but I'll ride it out. Cheap shot: I wouldn't have been able to tell with the new interface. ![Wink](https://bitcointalk.org/Smileys/default/wink.gif) It's curious that the DDoS coincided with the price rise.
|
|
|
Okay then, explain this: https://gist.github.com/hmage/2a1fdbd7bdad252cd08c9b4166c5727a

On Core i5-4570S:
hmage@dhmd:~/test$ cat /proc/cpuinfo |fgrep name|head -1
model name : Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz
hmage@dhmd:~/test$ gcc dereference_bench.c -O2 -o dereference_bench && ./dereference_bench
workfunc(): 0.002082 microseconds per call, 480308.777k per second
workloopfunc(): 0.001774 microseconds per call, 563746.643k per second
On Core i7-4770:
hmage@vhmd:~$ cat /proc/cpuinfo |fgrep name|head -1
model name : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
hmage@vhmd:~$ gcc dereference_bench.c -O2 -o dereference_bench && ./dereference_bench
workfunc(): 0.001776 microseconds per call, 562932.922k per second
workloopfunc(): 0.001506 microseconds per call, 664150.879k per second
Dereferencing on every call _is_ a big performance hit, unless you have another explanation. Oh, I already know, you get angry.
It looks to me that it was you who got angry. I apologise for my blunt approach. A little impatient maybe, but not really angry. I try to stick to the issues.

Yes, dereferencing a pointer to call a function adds overhead, but it has to be taken in context. How often does that occur in the big picture? Take scanhash, for example, the lowest level function that is gated. Each scan takes seconds to run, so the overhead of one extra pointer deref every few seconds is immeasurable. Even if you go up a level to the miner_thread loop, there are maybe 20 gated function calls every loop. 20 extra derefs every few seconds is still immeasurable. Any change of program flow has overhead; that's why function inlining and loop unrolling exist. But if the code size of an unrolled loop overflows the cache, you may end up losing more performance from cache misses than you gained from inlining.

This might answer your question: https://bitcointalk.org/index.php?topic=1326803.msg13770966#msg13770966

I clearly stated I did not predict a performance gain from algo-gate, and if you dig deeper you may find where I acknowledged the overhead of the deref but was at a loss to explain why I observed a performance gain. Maybe my observations were just noise; maybe some other change is responsible for the increase in performance in spite of the gate. I just don't know. There are too many variables that can't be controlled, so I dismiss such observations without a solid case to back them up.

Finally, what it comes down to, like any decision, is a balance. Algo-gate was never about performance; it was about a better architecture that made it easier for developers to add new algos to the miner with minimal disruption to the existing code. I judged the performance cost to be negligible.
|
|
|
I have given you the benefit of the doubt and tried to probe you for more info in areas where I didn't have the confidence to call you out. But so far it's come up empty. When you challenge me on one of my strengths you'd better be well prepared.
I don't care if I challenge you or not, I'm not here for your entertainment. 10 runs of cpuminer-opt are giving results that are consistently less than 10 runs of cpuminer-multi on the algos listed above. Simple as that. You're free to ignore this fact, of course, but I thought it'd be nice if you knew it. When I give you constructive feedback you seem to get angry, which is counterproductive.

I thank you for your work but it was not enough to draw any conclusions. A 2% difference is statistically insignificant. But let's assume it's real. You suggested it was caused by the use of function pointers by algo-gate. I countered that my measurements when algo-gate was implemented showed an improvement. That disproves your theory, one that was not supported by any evidence, BTW. So if the difference is real, it must be caused by something else. There are a lot of possibilities. Differences in CPU architecture (I don't mean capabilities) can cause measurable differences between algos. Cache size and organization, execution environment, memory interface, etc. can all cause different algos to perform differently on different CPUs. If you look at HOdl, it performs well on an i7 but poorly on an i5 due to the smaller cache. As it turns out it was specifically optimized for the size of the i7 cache.

You need to do your research, get your facts straight and present a coherent case if you want to get any attention, especially when you are criticizing someone's work. I have a thick skin, thicker than yours apparently, so I can take it and give it back. Put yourself in my position: how would you react to someone taking pot shots about what you're doing wrong and how you should do things? Oh, I already know, you get angry.
|
|
|
Your suggestion would add the overhead of a function call and return on every iteration to save a pointer deref. Looks like a bad trade to me.
I meant moving the dereference outside the iteration completely: have the iteration loop coded per algo so it doesn't go through the dereference at all. Note: all of this is speculation, I still haven't measured exactly where the slowdown is or why it's slower. I'm just reporting that, for some reason, non-AES versions of algos are slower in cpuminer-opt compared to cpuminer-multi. This needs further investigation.

On the same CPU, these algos are slower in cpuminer-opt compared to cpuminer-multi:

"groestl"    => 1109819 / 1000,  // cpuminer-opt
"groestl"    => 1125917 / 1000,  // cpuminer-nicehash
"keccak"     => 6964234 / 1000,  // cpuminer-opt
"keccak"     => 8332952 / 1000,  // cpuminer-nicehash
"luffa"      => 2728931 / 1000,  // cpuminer-opt
"luffa"      => 3177996 / 1000,  // cpuminer-nicehash
"lyra2"      => 716945 / 1000,   // cpuminer-opt
"lyra2"      => 921109 / 1000,   // cpuminer-nicehash
"neoscrypt"  => 27583 / 1000,    // cpuminer-opt
"neoscrypt"  => 28891 / 1000,    // cpuminer-nicehash
"pentablake" => 3479320 / 1000,  // cpuminer-opt
"pentablake" => 3609862 / 1000,  // cpuminer-nicehash
"pluck"      => 1722 / 1000,     // cpuminer-opt
"pluck"      => 1818 / 1000,     // cpuminer-nicehash
"s3"         => 1086149 / 1000,  // cpuminer-opt
"s3"         => 1201897 / 1000,  // cpuminer-nicehash
"scrypt"     => 91557 / 1000,    // cpuminer-opt
"scrypt"     => 99702 / 1000,    // cpuminer-nicehash
"sha256d"    => 53122339 / 1000, // cpuminer-opt
"sha256d"    => 54669375 / 1000, // cpuminer-nicehash
"shavite3"   => 2232258 / 1000,  // cpuminer-opt
"shavite3"   => 2343704 / 1000,  // cpuminer-nicehash
"skein"      => 6405675 / 1000,  // cpuminer-opt
"skein"      => 6586806 / 1000,  // cpuminer-nicehash
"skein2"     => 7985012 / 1000,  // cpuminer-opt
"skein2"     => 8167405 / 1000,  // cpuminer-nicehash
I'm using this version of cpuminer-multi: https://github.com/nicehash/cpuminer-multi

Well, your pseudocode had the call/ret inside the loop. Most of the algos in your list are of little interest, except neoscrypt. That is one algo I'd like to improve; in relative terms it underperforms the GPU version by a lot. Another thing to consider is that local hashrate reporting by the miner isn't very reliable and your data is well within a 2% margin of error. I was seeing greater variation just from different sessions of the same code. I thought I was making incremental improvements with some changes and regressions with others, when all along it was just noise.

I like intellectual challenges but you need to do a better job. You don't provide the full picture initially and only give more info after I poke holes in your initial presentation. This seems to be a pattern with your "suggestions". You obviously have some knowledge, maybe not as much as me, but knowledge in areas where I am weak: C++, for example. I'm also weak in GUI apps and web programming, but I'm strong in OS fundamentals and CPU architecture, though not specifically Linux and x86. One of my biggest challenges has been applying my knowledge and experience to an unfamiliar environment. I tend to make a lot of mistakes as a result. I have given you the benefit of the doubt and tried to probe you for more info in areas where I didn't have the confidence to call you out. But so far it's come up empty. When you challenge me on one of my strengths you'd better be well prepared.
|
|
|
I've noticed another performance regression compared to cpuminer-multi. Algos with a very high number of calls per second tend to be slower in cpuminer-opt because of the algo-gate callback functions. When calling through a function pointer, the pointer has to be dereferenced before jumping; when the function in question is fast enough, that dereference can reduce performance versus a direct call. One way to fix that is to move the dereference outside the loop.

Pseudocode before:

func = &hash_sha256;
while(true) { func(); }
Pseudocode after:

funcloop = &hashloop_sha256;
funcloop();

hashloop_sha256() { while(true) { hash_sha256(); } }
This moves the dereferencing so it's done only once, at the start of the loop.

Have you measured a regression? My measurements between 3.0.7 (pre algo-gate) and 3.1 showed a modest improvement in performance across the board. Your suggestion would add the overhead of a function call and return on every iteration to save a pointer deref. Looks like a bad trade to me.
|
|
|
@ ocminer
the old theme will do for now i guess until u finish the new one....!!!
I wasn't aware people disliked the new one that much - so I switched back for now ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif)

Thank you, it's much appreciated. I looked for more data with the new interface; I believe I tried every link available but couldn't find it. To be blunt, I don't see anything wrong with the old interface. The only problem I've observed is that the global hashrate seems incorrect, but that's a data issue, not an interface one. There is some info that could be removed, such as the exchange rate: since you don't do exchange I don't see a use for it. There is also some room for improvement in how the data is presented, but overall I'm quite satisfied with the status quo.
|
|
|
Regarding Blockquarry: I'm also getting stratum errors several times a day. Stratum connection failed....
For the moment I switched to maxminers.net and I can see that most miners also switched. If there are more pools, please post them here. Regards

I've been using suprnova, but they recently changed their web interface and removed a lot of useful information such as unconfirmed balance and worker hashrates. Other than that brain fart I haven't had any problems with them. OCMiner, are you reading? You didn't respond to my post in the suprnova pool thread.
|
|
|
The wolfminer was developed by user wolf0
Thanks but Wolf0 didn't do the caching part, I'm looking for the original.
|
|
|
I also use hyperthreading. I haven't intentionally touched either algo in several releases, and 3.2 was a restructuring release with no intended change in functionality. If you see it again, let me know and I'll take another look. But it's pretty clear from the code that the last TOTAL rate displays the same data as the last line.
Yes, the output is the same as the total, just in a format that's easier to parse; no problem with that fact. The problem was that right before the end, it would spit out thousands of lines in a second with an ever increasing hashrate that would inflate the total result. Do you keep an archive of older versions of cpuminer-opt? I'd like to check an older version and I foolishly deleted my local copy.

A few of the DL links are still active. If you want a specific release, let me know and I'll reactivate it. Keep in mind there have been some problem releases along the way which you probably want to avoid. The post for each release is still in the thread and should help you find the most stable ones. Personally I don't think it's worth the effort to go back. If the problem reoccurs with the current release, you can collect more data and we can pursue it from there.
|
|
|
Edit3: I ran the test several times with both algos and it always produced the correct result. Could your script be misinterpreting? Without further information on how to reproduce, I consider this issue closed.
I was running it without the script and was getting the wrong output. I'm talking about the last line: 463916. This is the line I'm parsing in the script. Maybe it's because I have hyperthreading enabled on my CPU?

Could be a thread synchronization issue. It doesn't always happen for me either. I've been trying to reproduce the problem for you with v3.2.1 and asciinema and so far no luck.

I was experiencing the problem on 3.1.17. Maybe you fixed it in v3.2.1? I see you've changed up stuff that could be relevant since then.

I also use hyperthreading. I haven't intentionally touched either algo in several releases, and 3.2 was a restructuring release with no intended change in functionality. If you see it again, let me know and I'll take another look. But it's pretty clear from the code that the last TOTAL rate displays the same data as the last line. From looking at the code, the time_limit stuff seems to be in an odd place; the end of the loop would seem more appropriate, with the rest of the display code. I may consider moving it on speculation if the problem returns.
|
|
|
I'm looking for the developer of the original hodlminer who designed the caching optimization.
hodlminer performance on an i5 is not very good due to the smaller cache size. hodlminer, both the original and the Wolf0 version, is optimized for the i7 cache. I've tried changing the slice size but only get rejects. I would like to communicate with the original author to try to tweak the slice size for better performance with the i5 cache, but I don't know who that is. Anyone?
|
|
|
People need to stop mining DASH until the pool has mined enough to pay miners.
Easier said than done. We can't choose the coin we mine, only the algo.
|
|
|
My wife just downloaded Minergate, and at first I thought it was another ponzi cloud mining program. But after reading this I'm thinking it might be real mining, or am I wrong about that? Is it legit, and is it smart to invest time and maybe money in it right now?
I never got a good feeling about Minergate. The description of their service is vague and their responses to my specific questions were evasive. I'm not going to sign up to any service that isn't up front about what it provides, and I still don't know what services they provide. Strike 1. They also promote their own mining software without any info about it and no source code available. Strike 2. They control both the miner and the pool, so there is no way to independently verify the accuracy of the stats. Strike 3, you're out!
|
|
|
X17 algo is now supported by cpuminer-opt. It has been tested at zpool with v3.1.18.
The message warning that the algo has not been tested can be ignored. It will be removed from the next release.
|
|
|
The 1080 will cost $700, have 2500 shaders and be clocked @ 1600MHz. A used 980ti can be picked up for $400, has 2760 shaders and can be overclocked to 1500MHz stable (Gigabyte G1 Windforce). Quark will draw around 240W. The shader count of the GTX 1070 is unknown. Benchmarking Pascal is easy: compile this source code https://github.com/tpruvot/ccminer and run it with ccminer -a quark --benchmark, then compare it to my modded Maxwell kernel. Here are the results on the 980ti: (31.8 MHASH) (132.5 KHASH/Watt) ![](https://ip.bitcointalk.org/?u=http%3A%2F%2Fi68.tinypic.com%2Ff3gako.png%26t%3D564%26c%3DJUwuNpEJU6oQwA&t=663&c=QnnvbbKSgRJv6g)

As I mentioned in the other topic, the 1080 does 32MH/s at Quark at around ~170W. Ethereum crashes on Genoil 1.0.7 with a "device bit not recognized" message (something like that). With the OpenCL ethereum miner: 12.5MH/s at... 30W. Neoscrypt is not optimized: 0.450. What do you suggest to test next? The 1070 at Quark was 24MH/s at 110W, and the same for ethereum: 12.5MH/s at 30W.

Must be nice to be an insider, loaner or keeper? The 1070 results don't seem to scale the same as the 1080. Based on the 1080 rate you posted, I was estimating 26 MH/s on the 1070. But the power usage on the 1070 is unexpectedly low compared with the 970/980 ratio. Neoscrypt is a curious beast. The original neoscrypt kernel (DJM34) performs better on Kepler (780ti specifically) than the improved Pallas neoscrypt kernel, although Pallas' works better on Maxwell, both compiled with CUDA 6.5. Then the Pallas neoscrypt took a big hit when compiled with CUDA 7.5. DJM34 took a crack at it and restored much of the lost hash. Now it appears it's taking another hit on Pascal. I would suggest trying the original DJM34 neoscrypt (SP_MOD 58) compiled with CUDA 6.5, the Pallas kernel compiled with 6.5 & 7.5, and the improved DJM34 (I think that is what you already tested).
|
|
|
Took the 970 out...I'm done with it. A book marker for now.
I have a couple of cards that don't like to switch algos much, but they will run for days on one algo. One example is a 970 that will mine quark at default intensity fresh from a reboot, but over time while profit switching it will fail to start. Sometimes it will start at lower intensity, but it continues to degrade and eventually won't start at all until a system reboot. Then it will run at default intensity again. The symptoms smell like a memory leak, but if that were the case it should affect all cards eventually, and it doesn't. I'm stumped.

As soon as I put that card into the rig, its core clock reads 1519, the others 1492, and it will mine for a few seconds, then the drivers crash, not ccminer. I tried every combo I could think of. That's on any algo. I'm thinking of putting it in this computer and taking two 750ti out, moving them to the 970 rig, as soon as I can. See if that works.

That's what I get with too much OC: runs for a while, then the driver crashes.
|
|
|
I've noticed that some algos misreport their benchmark results if you run them under --benchmark with --time-limit. A good example is c11:

cpuminer-opt --benchmark --time-limit 120 -a c11
This will run and show a steady 700kH/s on my i7 4770, but at the end it will spike and report 1791kH/s. I am automating the benchmarks and caught this too late -- cpuminers show a single-line result as the last line, in hashes per second. I extract that and put it into the mining profitability calculation seen here: https://hmage.net/minerstats.php#cpu_i7_4770_oc
I then graph that to see the trends here: http://grafana.hmage.net/dashboard/db/miner-stats
As a result, I will need to manually rerun all benchmarks that showed cpuminer-opt as favourable. [EDIT]: Same with sib -- the last line reports 1333333, but the real speed is 486350.

I haven't paid any attention to benchmark other than as a sanity test before connecting to a pool. Didn't realize anyone was using it for anything useful. I know of one change I made that can affect the first and last result: I moved the share submission up before the hash display. The change never produced a significant improvement, but I never backed it out. I'll look into it further when I solve the ZR5 problem.

Edit: I presume the error you are seeing is in the "Benchmark:" hashrate display. I don't see how the change I mentioned above could have caused this. Furthermore, this display uses a unique formatting function, so I don't think it's my bug. I had made another change which would show a lower benchmark rate because I was counting the entire miner thread loop time instead of just the scan time but, again, I don't see how that change could have produced the error you are seeing. The "Benchmark" rate should be the same number as the previous "Total" rate because it uses the same variable. This suggests a formatting error. Anyway, I'll have to investigate further tomorrow.

Edit2: This is what I get, I don't see the problem.
[2016-05-11 09:39:03] CPU #5: 246.93 kH, 47.87 kH/s
[2016-05-11 09:39:03] CPU #2: 122.98 kH, 50.54 kH/s
[2016-05-11 09:39:03] CPU #0: 55.23 kH, 48.31 kH/s
[2016-05-11 09:39:04] CPU #6: 52.29 kH, 48.96 kH/s
[2016-05-11 09:39:04] CPU #1: 167.29 kH, 55.89 kH/s
[2016-05-11 09:39:04] CPU #3: 260.19 kH, 46.91 kH/s
[2016-05-11 09:39:04] CPU #7: 49.19 kH, 57.17 kH/s
[2016-05-11 09:39:04] Total: 1206.99 kH, 406.06 kH/s
[2016-05-11 09:39:06] CPU #7: 262.14 kH, 115.02 kH/s
[2016-05-11 09:39:06] Total: 1419.94 kH, 463.92 kH/s
[2016-05-11 09:39:07] CPU #5: 262.14 kH, 67.54 kH/s
[2016-05-11 09:39:08] CPU #4: 262.14 kH, 52.88 kH/s
[2016-05-11 09:39:08] CPU #0: 262.14 kH, 55.62 kH/s
[2016-05-11 09:39:08] Benchmark: 463.92 kH/s
463916
Edit3: I ran the test several times with both algos and it always produced the correct result. Could your script be misinterpreting? Without further information on how to reproduce, I consider this issue closed.
|
|
|
|