Can you guys try starting it without longpoll? Maybe the longpoll switch code is responsible which is new since 2.1.0
Seems implausible, as I had the problem with 2.1.0. English grammar can be confusing at times, I admit. It's been there since 2.1.0.... it's not in 2.0.8 but is in 2.1.0
|
|
|
... Anyway, let's see if we can find anything else in common, besides just cgminer.
- I'm using Google DNS
- I have donations enabled (thinking of disabling it, just for testing)
- I have several pools configured in fail-over (but I've since encountered the problem on several different pools)
- I'm behind a NAT router
- hmm? ...
Donation and NAT, but not Google DNS; failover-only pools in this order: Eclipse port 9007; Bitminter; Eclipse port 8337; local bitcoind Hmmm... Can you guys try starting it without longpoll? Maybe the longpoll switch code is responsible which is new since 2.1.0 cgminer's stale rate should be exceptionally low even without longpoll, but it will be slightly higher.
|
|
|
Thanks very much!
Those who gave me alternative payments in case chefnet didn't pay out, please PM me your address and I'll send it back if you like.
|
|
|
Yay thank goodness.
By the way, error 503 is a "server not responsive, too busy, etc." error... though it is possible to generate this artificially from the miner's end by having DNS issues, router problems and so on. Disabling cached connections in 2.1.1 after failure seemed to achieve sweet FA, unfortunately. So I'm now officially in the NFI position.
|
|
|
I cannot code for a 5970 or 6990 without poking and prodding them with code, and since I don't own one, it's unlikely to happen in a safe manner. If I just guess, I'll likely do something which could be bad...
Would it be beneficial if we get you a 6990? That would most definitely come under the definition of rhetorical questions. Given 6990s cost more than any other card on the market, I think I know what the likelihood of that happening is, though. But just to be clear since I haven't answered: of course it would...
|
|
|
Whatever, a GPU pushed too hard will produce errors, not rejects.
|
|
|
Now IF you have rejects not related to pool latency then your cards are trying to tell you something. It is possible to drive the card to the point of failure such that despite the higher hashrate you have so many rejects that shares/min is lower. Of course this should be immediately obvious with the giant ugly number next to R. ![Smiley](https://bitcointalk.org/Smileys/default/smiley.gif) Just to be clear to others, the ugly number next to R is Hardware Errors. That's the only time overclocking too much or unlocking broken shaders and whatnot will actually decrease your effective useful hashrate even though apparently stable: when your card starts making shit up.
|
|
|
Over in mining hardware I just whined that I had an instance of a "SICK" GPU even after falling back to pretty vanilla settings of gpu-engine 725 (stock) and gpu-memclock 300 for my 5970s. Is there any chance that SICK like the following is not a GPU hardware issue?
[2012-01-02 17:56:39] Thread 2 idle for more than 60 seconds, GPU 2 declared SICK!
[2012-01-02 17:56:39] Attempting to restart GPU
[2012-01-02 17:56:39] Thread 2 still exists, killing it off
[2012-01-02 17:56:39] Thread 8 still exists, killing it off
[2012-01-02 17:56:39] Thread 2 restarted
[2012-01-02 17:56:40] Thread 8 restarted
[2012-01-02 17:56:40] Accepted 00000000.30702585.cb8fdf73 GPU 5 thread 11 pool 0
[2012-01-02 17:56:41] Accepted 00000000.676a69c6.4b59b7db GPU 5 thread 5 pool 0
[2012-01-02 17:56:43] Accepted 00000000.1e5767ae.f669070b GPU 2 thread 2 pool 0 # note how healthy it is now!
Anything's possible, but note that the restart code was tested extensively on literally dozens of GPUs to get this sick restart code working -when possible- and the person who helped me test it had 72 GPUs that would often have boxes going down with any other miner. The idea was to make it recover to a fine state after enough rest if possible. So yes it's possible. Maybe even likely, who knows, but this particular scenario was not unusual even at normal clocks when some GPUs were run flat out, regardless of which miner it was. Interestingly it became FAR more common with the phatk2 kernel (which is what is used in cgminer) since that seemed to run GPUs that little bit more than anything else.
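For anyone wondering what the "idle for more than 60 seconds" check in the log amounts to, here is a minimal sketch. It is not cgminer's actual watchdog code. The function name and arguments are made up for illustration, and the thread-to-GPU mapping (thread number modulo the number of GPUs) is only inferred from the log above, where threads 2 and 8 both belong to GPU 2 on a six-GPU box.

```python
from typing import Dict, List


def sick_gpus(last_activity_by_thread: Dict[int, float],
              now: float,
              n_gpus: int,
              idle_limit_s: float = 60.0) -> List[int]:
    """Return the GPUs that have at least one thread idle for too long.

    last_activity_by_thread maps a thread id to the time (in seconds) it
    last reported work. The 60 second default comes from the log message;
    everything else here is an assumption for illustration.
    """
    sick = set()
    for thread_id, last_seen in last_activity_by_thread.items():
        if now - last_seen > idle_limit_s:
            # Assumed mapping: threads n and n + n_gpus share GPU n.
            sick.add(thread_id % n_gpus)
    return sorted(sick)


if __name__ == "__main__":
    # Thread 2 has been silent for 100 s, the others reported recently.
    activity = {2: 0.0, 8: 50.0, 5: 95.0, 11: 99.0}
    print(sick_gpus(activity, now=100.0, n_gpus=6))
```

A GPU flagged this way would then have its threads killed and restarted, as the log shows; that part is not sketched here.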
|
|
|
I'm sick of adding special case command line parameters...
|
|
|
Feature request... ... I believe that because GPUs 1, 3, and 5 don't return fan values that cgminer is ignoring their temps w/r auto-fan. Assuming that cgminer can't tell via ADL or otherwise that two GPUs share a fan, I would like to be able to tell that to cgminer and thus have my temp targets applied to (in my case) odd-numbered GPUs as well as to even-numbered ones.
I cannot code for a 5970 or 6990 without poking and prodding them with code, and since I don't own one, it's unlikely to happen in a safe manner. If I just guess, I'll likely do something which could be bad... I might've been unclear. I was suggesting that the user have the option to specify to the software, presumably via .conf or command line, that certain GPUs comprise a "fan group," i.e., share a fan, and also which of the group has the fan output and control. I don't know, something like, in my case, "fan-group" : "0,1/0, 2,3/2, 4,5/5" ...meaning GPUs 0 and 1 share a fan, the speed of which is readable and controllable via GPU 0; etc. What I'm thinking of would not require any additional hardware coding, but it would require additional fan-control logic within cgminer. No, that's actually unnecessary because the ADL does have information about shared thermal devices... interpreting the results would need prodding though.
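To make the proposed syntax concrete, here is a minimal sketch of how a string like "0,1/0, 2,3/2, 4,5/5" could be parsed. Note that "fan-group" is only the option suggested in the post above, not an existing cgminer setting, and the function name `parse_fan_groups` is hypothetical.

```python
import re
from typing import Dict, List

# One group looks like "0,1/0": comma-separated member GPUs, a slash,
# then the GPU through which the shared fan is read and controlled.
_GROUP = re.compile(r"((?:\d+\s*,\s*)*\d+)\s*/\s*(\d+)")


def parse_fan_groups(spec: str) -> Dict[int, List[int]]:
    """Map each controlling GPU to the list of GPUs sharing its fan."""
    groups: Dict[int, List[int]] = {}
    for match in _GROUP.finditer(spec):
        members = [int(part) for part in match.group(1).split(",")]
        controller = int(match.group(2))
        if controller not in members:
            raise ValueError(
                f"controller GPU {controller} is not in group {members}")
        groups[controller] = members
    return groups


if __name__ == "__main__":
    print(parse_fan_groups("0,1/0, 2,3/2, 4,5/5"))
```

With the example string from the post, GPUs 0 and 1 end up under controller 0, GPUs 2 and 3 under controller 2, and GPUs 4 and 5 under controller 5.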
|
|
|
Feature request...
 [P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
 GPU 0: 69.5C 4535RPM | 357.0/363.8Mh/s | A:285 R:0 HW:0 U:5.02/m I: 9
 GPU 1: 74.0C         | 366.4/363.9Mh/s | A:299 R:0 HW:0 U:5.26/m I: 9
 GPU 2: 67.5C 4108RPM | 372.9/363.8Mh/s | A:289 R:0 HW:0 U:5.09/m I: 9
 GPU 3: 62.5C         | 366.4/363.7Mh/s | A:262 R:0 HW:0 U:4.61/m I: 9
 GPU 4: 68.0C 3564RPM | 370.8/363.6Mh/s | A:294 R:0 HW:0 U:5.18/m I: 9
 GPU 5: 71.0C         | 340.5/363.6Mh/s | A:318 R:1 HW:0 U:5.60/m I: 9
These are three 5970s. auto-fan is on with a target of 70C for all, 3C hysteresis. At this snapshot GPUs 1 and 5 ran 3C-4.5C hotter than their card-mates, and GPU 3 ran 5C cooler than its mate. I believe that because GPUs 1, 3, and 5 don't return fan values that cgminer is ignoring their temps w/r auto-fan. Assuming that cgminer can't tell via ADL or otherwise that two GPUs share a fan, I would like to be able to tell that to cgminer and thus have my temp targets applied to (in my case) odd-numbered GPUs as well as to even-numbered ones.
I cannot code for a 5970 or 6990 without poking and prodding them with code, and since I don't own one, it's unlikely to happen in a safe manner. If I just guess, I'll likely do something which could be bad...
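The request boils down to driving a shared fan from the hottest GPU on the card rather than only from the GPU that reports the fan. A minimal sketch of that idea follows. It is not cgminer's actual auto-fan logic; the step size, the 0-100 percent range and the function name are assumptions, and only the 70C target and 3C hysteresis come from the post.

```python
from typing import Sequence


def next_fan_percent(temps_c: Sequence[float],
                     fan_percent: int,
                     target_c: float = 70.0,
                     hysteresis_c: float = 3.0,
                     step: int = 5) -> int:
    """Pick the next fan speed for one fan shared by several GPUs.

    Speeds up when the hottest GPU is above target, slows down when it is
    below (target - hysteresis), and otherwise leaves the fan alone.
    """
    hottest = max(temps_c)
    if hottest > target_c:
        fan_percent += step
    elif hottest < target_c - hysteresis_c:
        fan_percent -= step
    # Clamp to an assumed valid range of 0..100 percent.
    return max(0, min(100, fan_percent))


if __name__ == "__main__":
    # GPU 0 alone (69.5C) sits inside the 67-70C band: no change.
    print(next_fan_percent([69.5], 50))
    # GPUs 0 and 1 together (69.5C, 74.0C): the hotter one forces a speed-up.
    print(next_fan_percent([69.5, 74.0], 50))
```

With the snapshot above, looking at GPU 0 alone leaves the fan untouched, while including GPU 1 at 74.0C raises it, which is the behaviour the feature request is after.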
|
|
|
By the way, if you're using the donation option, and you've been having lots of communication issues lately, it's probably that ![Embarrassed](https://bitcointalk.org/Smileys/default/embarrassed.gif) The pool your donations are going to is still having lots of teething problems with its migration.
|
|
|
Those with "comm errors" that lead to failures, are you using the ubuntu 11.11 binary on an older ubuntu?
|
|
|
Yes, people have tried to insist all sorts of things to me ![Roll Eyes](https://bitcointalk.org/Smileys/default/rolleyes.gif)
|
|
|
I just want to submit a possible bug with 2.1.1
I am a HUGE fan of cgminer and the dev who is responsible for it.
I have been using cgminer exclusively on my 8 ghash rig since version 2.0.0.
When I upgraded from 2.1.0 to 2.1.1 I started noticing my 5870s dying for some reason. I just watched one 5870's fan stop reporting RPM, and then the card went to 127 degrees and Windows crashed. This happened three times in a row.
I downgraded back to 2.1.0 and the problem is not happening any more; that very same card is chillin at 57 degrees delivering a nice 432 mhash.
I have also seen at least one 5870 die on all of my rigs since upgrading to 2.1.1.
I downgraded back to 2.1.0 and have not seen any cards die as of yet. I will keep the thread updated on whether they die or not, to see if my hunch is correct that this is a problem with 2.1.1.
thanks
Most unusual. There were no changes to the GPU speed or fan management code between 2.1.0 and 2.1.1... On the other hand, there was one change between 2.0.8 and 2.1.0. So not sure what you're seeing there at all.
|
|
|
So I dropped my memclocks and bumped my intensity. Got a minor increase in shares per minute (which matters more than hashrate). ...
Is there a theory about the relation between hash rate and shares/minute? Overall they must correlate, but I've observed that the correlation is not perfect. They correlate, but proportional to luck. So if you're "tuning" based on the value returned for shares/minute, then you're changing settings based on the luck of your most recent mining session, and nothing to do with hash performance...
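To put rough numbers on the "luck" point: a difficulty-1 share takes about 2^32 hashes on average, so the expected shares/minute follows directly from the hashrate, and the count of shares actually found scatters around that like a Poisson process. The sketch below assumes difficulty-1 shares and ignores stales and rejects; the function names are made up for illustration.

```python
import math

# Average number of hashes needed to find one difficulty-1 share (approx.).
HASHES_PER_DIFF1_SHARE = 2 ** 32


def expected_shares_per_minute(hashrate_hps: float) -> float:
    """Long-run average shares per minute for a hashrate in hashes/second."""
    return hashrate_hps * 60.0 / HASHES_PER_DIFF1_SHARE


def relative_noise(accepted_shares: int) -> float:
    """One-sigma relative scatter of a rate estimated from this many shares.

    For a Poisson count N the standard deviation is sqrt(N), so the
    relative error of the measured rate is 1 / sqrt(N).
    """
    return 1.0 / math.sqrt(accepted_shares)


if __name__ == "__main__":
    rate = expected_shares_per_minute(363.8e6)  # 363.8 Mh/s
    print(f"expected: {rate:.2f} shares/min")
    print(f"scatter after 285 shares: +/-{relative_noise(285):.1%}")
```

At 363.8 Mh/s this gives about 5.08 shares per minute, and after only a few hundred shares the measured figure can easily be off by several percent, which is larger than the gain from most small tuning changes. The averaged hashrate is the steadier number to tune against.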
|
|
|
I for one welcome our new DGM overlords.
|
|
|
It's only needed if you grab a binary for 11.11 and use it on an earlier release. If you build it yourself it won't be needed...
|
|
|
 [P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
 GPU 0: 69.5C 4535RPM | 357.0/363.8Mh/s | A:285 R:0 HW:0 U:5.02/m I: 9
 GPU 1: 74.0C         | 366.4/363.9Mh/s | A:299 R:0 HW:0 U:5.26/m I: 9
 GPU 2: 67.5C 4108RPM | 372.9/363.8Mh/s | A:289 R:0 HW:0 U:5.09/m I: 9
 GPU 3: 62.5C         | 366.4/363.7Mh/s | A:262 R:0 HW:0 U:4.61/m I: 9
 GPU 4: 68.0C 3564RPM | 370.8/363.6Mh/s | A:294 R:0 HW:0 U:5.18/m I: 9
 GPU 5: 71.0C         | 340.5/363.6Mh/s | A:318 R:1 HW:0 U:5.60/m I: 9
I see U columns that are aligning!
|
|
|
Happy New Year, new version 2.1.1, purely bugfixes and cosmetic changes. Links in top post.
- Include API examples in distribution tarball.
- Don't attempt to pthread_join when cancelling threads as they're already detached and doing so can lead to a segfault.
- Give more generic message if slow pool at startup is the donation pool.
- Continue to attempt restarting GPU threads if they're flagged dead at 1 min. intervals.
- Don't attempt to restart sick flagged GPUs while they're still registering activity.
- Make curl use fresh connections whenever there is any communication issue in case there are dead persistent connections preventing further comms from working.
- Display pool in summary if only 1 pool.
- Adjust column width of A/R/HW to be the maximum of any device and align them.
|
|
|
|