ocminer
Legendary
Offline
Activity: 2688
Merit: 1240
|
|
January 04, 2012, 09:44:24 AM |
|
Hardware Errors=0 even though one card is "DEAD"?
I have:
[P]ool management [G]PU management [S]ettings [D]isplay options [Q]uit
GPU 0: 41.0C 1041RPM | DEAD /262.5Mh/s | A:2294 R:10 HW:0 U:3.59/m I: 8
GPU 1: 75.0C 1067RPM | 386.4/386.3Mh/s | A:3427 R:21 HW:0 U:5.36/m I: 8
GPU 2: 58.0C 0% | 215.4/215.9Mh/s | A:1873 R: 9 HW:0 U:2.93/m I: 8
But a "summary" gives me:
Array
(
    [STATUS] => Array
        (
            [STATUS] => S
            [Code] => 11
            [Msg] => Summary
            [Description] => cgminer 2.1.1
        )
    [SUMMARY] => Array
        (
            [0] => SUMMARY
            [Elapsed] => 38299
            [Algorithm] => c
            [MHS av] => 865.15
            [Found Blocks] => 0
            [Getworks] => 5520
            [Accepted] => 7590
            [Rejected] => 40
            [Hardware Errors] => 0
            [Utility] => 11.89
            [Discarded] => 433
            [Stale] => 0
            [Get Failures] => 3
            [Local Work] => 15829
            [Remote Failures] => 0
            [Network Blocks] => 76
        )
)
I was trying to check whether a card is dead/sick/whatever via "Hardware Errors", as this seemed the most logical parameter to me.
Is that not possible?
|
suprnova pools - reliable mining pools - #suprnova on freenet https://www.suprnova.cc - FOLLOW us @ Twitter ! twitter.com/SuprnovaPools
|
|
|
kano
Legendary
Offline
Activity: 4620
Merit: 1851
Linux since 1997 RedHat 4
|
|
January 04, 2012, 10:22:51 AM |
|
The GPU replies with its status. If you request "devs" you will get all GPUs and CPUs ( or {"command":"devs"} )
Each GPU will have a field called 'Status' that says one of: "Alive", "Dead", "Sick" or "NoStart"
Or you can request each GPU individually e.g. "gpu|0" etc ( or {"command":"gpu","parameter":"0"} ) and again it will be the same as above but with just the single GPU info.
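For anyone scripting this check, here is a minimal Python sketch of the "devs" request described above. It assumes the API is enabled and reachable on 127.0.0.1 port 4028 (adjust to your setup), and that each GPU entry in the "DEVS" list of the JSON reply carries "GPU" and "Status" fields. The helper names (query_cgminer, decode_reply, unhealthy_gpus) are made up for illustration and are not part of cgminer.

```python
import json
import socket


def decode_reply(raw):
    """Decode a raw JSON API reply; a trailing NUL byte is stripped if present."""
    return json.loads(raw.rstrip(b"\x00").decode("ascii"))


def query_cgminer(command, parameter=None, host="127.0.0.1", port=4028, timeout=5.0):
    """Send one JSON request such as {"command":"devs"} and return the decoded reply.

    host and port are assumptions; use whatever your miner's API listens on.
    """
    request = {"command": command}
    if parameter is not None:
        request["parameter"] = parameter
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps(request).encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the miner closed the connection: reply is complete
                break
            chunks.append(data)
    return decode_reply(b"".join(chunks))


def unhealthy_gpus(reply):
    """Return {gpu_index: status} for every GPU whose Status is not "Alive"."""
    return {
        dev["GPU"]: dev["Status"]
        for dev in reply.get("DEVS", [])
        if "GPU" in dev and dev.get("Status") != "Alive"
    }
```

Usage would be something like unhealthy_gpus(query_cgminer("devs")), which returns an empty dict while every GPU reports "Alive". Checking the Status field rather than the HW counter is the point: a dead card can still show HW:0.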
Edit: as you can see in your screen dump, the total HW value is indeed zero.
|
|
|
|
-ck (OP)
Legendary
Offline
Activity: 4284
Merit: 1645
Ruu \o/
|
|
January 04, 2012, 12:11:49 PM |
|
Hardware Errors=0 even though one card is "DEAD"?
Hardware errors are very different to a card becoming unresponsive under load. Usually hardware errors occur if someone has unlocked the shaders in a card that has faulty shaders, or they are overclocking beyond reliable levels but below crash levels. Hitting hardware errors without a hardware hang/dead card is actually quite rare.
|
Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel 2% Fee Solo mining at solo.ckpool.org -ck
|
|
|
DBordello
|
|
January 04, 2012, 02:40:55 PM |
|
Hardware Errors=0 even though one card is "DEAD"?
Hardware errors are very different to a card becoming unresponsive under load. Usually hardware errors occur if someone has unlocked the shaders in a card that has faulty shaders, or they are overclocking beyond reliable levels but below crash levels. Hitting hardware errors without a hardware hang/dead card is actually quite rare.
I actually had a 5830 @ 1030MHz that would spew about 7 HW errors a day, but was otherwise stable. I figured I had it OC'd right to the edge.
|
www.BTCPak.com - Exchange your bitcoins for MP: Secure, Anonymous and Easy!
|
|
|
ovidiusoft
|
|
January 04, 2012, 03:06:45 PM |
|
Hardware Errors=0 even though one card is "DEAD"?
Hardware errors are very different to a card becoming unresponsive under load. Usually hardware errors occur if someone has unlocked the shaders in a card that has faulty shaders, or they are overclocking beyond reliable levels but below crash levels. Hitting hardware errors without a hardware hang/dead card is actually quite rare.
Mmmm... no, not really:
RAM: 325
CPU: 1040
Mhash/s: 337,2
Accepted: 289690
Accept/min: 4,51
Hardware: 523
Hardware %: 0,18
As you can see, that is over an uptime of 1.5 months (since I restarted cgminer). The card never hung or went dead.
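The "Hardware %" figure can be reproduced from the other two counters. A small Python sketch, assuming the percentage is hardware errors relative to accepted shares (the post does not say how it was computed, and dividing by accepted plus hardware errors gives the same value to two decimals); the function name percent is just for illustration:

```python
def percent(part, whole):
    """Return part as a percentage of whole (0.0 when whole is zero)."""
    return 100.0 * part / whole if whole else 0.0


accepted = 289690       # "Accepted" from the stats above
hardware_errors = 523   # "Hardware" from the stats above

hw_percent = percent(hardware_errors, accepted)
print(f"Hardware %: {hw_percent:.2f}")  # prints "Hardware %: 0.18"
```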
|
|
|
|
ancow
|
|
January 04, 2012, 03:24:32 PM |
|
I'm thinking this may all be related to my donation pool being flaky lately with the move and nothing to do with the new version.
In support of that theory: I only had one instance of the connection bug so far, and that was when I turned on donations and happened more or less simultaneously for two instances of cgminer. I turned donations off again and have been sailing smoothly since New Year's Eve (which is when the bug occurred for me). Anyway, I'm back to donating manually.
|
BTC: 1GAHTMdBN4Yw3PU66sAmUBKSXy2qaq2SF4
|
|
|
DeathAndTaxes
Donator
Legendary
Offline
Activity: 1218
Merit: 1079
Gerald Davis
|
|
January 04, 2012, 03:28:32 PM |
|
Hardware Errors=0 even though one card is "DEAD"?
Hardware errors are very different to a card becoming unresponsive under load. Usually hardware errors occur if someone has unlocked the shaders in a card that has faulty shaders, or they are overclocking beyond reliable levels but below crash levels. Hitting hardware errors without a hardware hang/dead card is actually quite rare.
Mmmm... no, not really: RAM: 325 CPU: 1040 Mhash/s: 337,2 Accepted: 289690 Accept/min: 4,51 Hardware: 523 Hardware %: 0,18 As you can see, that is over an uptime of 1.5 months (since I restarted cgminer). The card never hung or went dead.
Which doesn't change the fact that hardware errors are quite rare. On 16 GPUs I have never had a single HW error logged in over 9 months. Your high overclock likely has something to do with it.
|
|
|
|
P4man
|
|
January 04, 2012, 03:29:58 PM |
|
I'm thinking this may all be related to my donation pool being flaky lately with the move and nothing to do with the new version.
In support of that theory: I only had one instance of the connection bug so far, and that was when I turned on donations and happened more or less simultaneously for two instances of cgminer. I turned donations off again and have been sailing smoothly since New Year's Eve (which is when the bug occurred for me). Anyway, I'm back to donating manually.
In hindsight, I think the problems began when I enabled donations as well. How ironic, donators being 'punished'. I'll turn it off as well and see what happens.
|
|
|
|
BkkCoins
|
|
January 04, 2012, 03:31:37 PM Last edit: January 04, 2012, 04:09:36 PM by BkkCoins |
|
When I switched to cgminer a couple months back I noticed that I always had a much higher reject rate. I get typically 3-7% rejects regardless of pool, Ars, Eligius, BTCGuild and others. Not sure what causes this and haven't tried to debug yet. I just let it be because I like the interface and monitoring in cgminer but it would be nice to track this down and see why the rejects are high. Before, same HW/OS setup, I used to get more like 0.7%.
Should I turn LP off? I thought that was to help reduce rejects.
There was a bug that would cause higher rejects with multipool setups that was fixed in 2.1.0. We're only turning LP off at the moment to debug a network connectivity issue.
Could it be related to high latency? I have about a 350 ms round trip to me here in Thailand. But that was the same on the previous miner (phoenix 1.6.2). I basically just stopped phoenix and ran cgminer instead, with a bit of fiddling with the config.
Edit: Trying 2.1.0 now. Will see how it goes.
|
|
|
|
DBordello
|
|
January 04, 2012, 03:33:36 PM |
|
It appears that you can easily change the donation pool (it is pulling the information from a website). Before we all jump on the turning off donation bandwagon, maybe it would be easier to change the donation pool to something more stable for the time being.
I bet donations have a tendency to stay off.
|
|
|
|
ovidiusoft
|
|
January 04, 2012, 03:39:06 PM |
|
Which doesn't change the fact that hardware errors are quite rare. On 16 GPU I have never had a single HW error logged in over time 9 months. Your high overclock likely has something to do with it.
Absolutely, I was just replying to ckolivas's assumption that hardware errors will lock the card or make it 'dead'. It's possible to overclock "just right" so the gain in mhashes is worth the few hardware errors, while keeping the uptime at 100%.
|
|
|
|
ocminer
|
|
January 04, 2012, 04:01:44 PM |
|
In my experience, how far you can push a card and have it stay stable really depends on the PSU you use. I had some rigs with an el-cheapo 800W PSU where I could barely overclock a 5870 to 900 MHz, and it would get SICK or DEAD every second day. After one of the PSUs died I replaced it with a Corsair TX850, and since then I could overclock the cards even to 950 MHz without any problems; they have been running stable for months now. I swapped the second PSU as well and had the same effect. Don't save at the wrong end.
|
|
|
|
-ck (OP)
|
|
January 04, 2012, 07:46:45 PM |
|
Thanks guys. I now have a postulated mechanism for the failure with donations on, which is why I came up with the idea. I've since moved again to a different pool for donations, but I realise it also provides a bug mechanism (though harder to hit) without donations, so I'll work on a fix. Nothing like discovering a hard-to-find bug.
|
|
|
|
Remember remember the 5th of November
Legendary
Offline
Activity: 1862
Merit: 1011
Reverse engineer from time to time
|
|
January 04, 2012, 08:01:26 PM |
|
Would a 64-bit cgminer benefit in any way? I thought if I had some free time, I could try to compile in MinGW-W64
|
BTC:1AiCRMxgf1ptVQwx6hDuKMu4f7F27QmJC2
|
|
|
-ck (OP)
|
|
January 04, 2012, 08:23:42 PM |
|
Would a 64-bit cgminer benefit in any way? I thought if I had some free time, I could try to compile in MinGW-W64
None whatsoever unless you're CPU mining, and even then you'd need to port the assembly code to make it work properly. Also mingw 64 is still much buggier than 32 bit and people have lots of problems with it.
|
|
|
|
Remember remember the 5th of November
|
|
January 04, 2012, 08:26:59 PM |
|
Would a 64-bit cgminer benefit in any way? I thought if I had some free time, I could try to compile in MinGW-W64
None whatsoever unless you're CPU mining, and even then you'd need to port the assembly code to make it work properly. Also mingw 64 is still much buggier than 32 bit and people have lots of problems with it.
Never had any problems with it. I compiled SDL (Simple DirectMedia Layer) under it, as well as the current litecoin miners (the improved ones).
|
|
|
|
-ck (OP)
|
|
January 04, 2012, 10:40:52 PM |
|
The git tree should now have a fix for this issue in it.
|
|
|
|
-ck (OP)
|
|
January 04, 2012, 11:11:14 PM |
|
The git tree should now have a fix for this issue in it.
Got it running now on the machine I compile on and re-enabled donations on it, will let you know if I see any problems.
Thanks a lot. This all assumes the donor pool actually has trouble, so I've put it back to the one that has been unstable lately just now (which means you'd have to restart the miner, sorry). Funny how for a change I'm wishing for pool instability (shh! no one tell the admin)
|
|
|
|
-ck (OP)
|
|
January 05, 2012, 03:48:54 AM |
|
The donor pool had two short outages today so you would have noticed a difference with the new code by now.
|
|
|
|
Proofer
Member
Offline
Activity: 266
Merit: 36
|
|
January 05, 2012, 03:58:13 AM |
|
The donor pool had two short outages today so you would have noticed a difference with the new code by now.
"Today"? I've been running the new code since Jan 4, 23:17 GMT without problems.
|
|
|
|
|