Bitcoin Forum
Topic: Diagnose this
dragonmike (OP)
October 17, 2017, 06:11:09 PM
 #1

Two identical rigs of 7x RX570 on MSI Z170A Pro Carbon.

Both mining XMR using Claymore v9.6 (iirc). 1220/1925@975mV, custom timings.




One rig mining for 60 days with only 1 interruption (on Windows I find that quite remarkable).

The other used to chug along pretty well but recently is just not having it anymore. It works for about 30 min, then reboots and starts mining again, with Wattman stating that default clocks were restored after some sort of error (so my application-specific clocks and voltages are not reapplied).

Claymore's logs, with the watchdog enabled, say absolutely nothing.

HWiNFO64 shows only a handful of memory errors (maybe 10 in the space of 5 minutes).
I bumped core voltage by 20 mV. No change.

I'm now lowering the target temp from 74C to 69C. Not expecting miracles.

Any suggestions?
belaweb2
October 17, 2017, 07:14:35 PM
 #2

What does the Windows event log say about the issue?

xxcsu
October 17, 2017, 07:25:10 PM
 #3

Do the cards have a modded BIOS?
Are the cards overclocked?
If yes, was each card overclocked individually, or does one setting apply to all cards?
Do the cards have pretty much the same ASIC quality?
Did you try mining with default settings, with all overclocking / undervolting reset?

generalt
October 17, 2017, 07:29:20 PM
 #4

I'm going to say risers.  Those things always cause problems and are the weakest link.  Since it reboots in 30 minutes, it shouldn't be too hard to track down which one.  Just use the -di option to disable, say, 3 cards.  If it crashes, then enable the 3 you disabled and disable a different set.  Keep going until you narrow it down to just one card that causes the system to crash.  It's either that card or that riser.
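
The narrowing-down described above can be sketched as a simple halving search. This is only a rough illustration, assuming exactly one card (or its riser) is responsible and that -di takes the indexes of the GPUs to mine on, one digit per card, so "disabling" a card means leaving its index out. The names `find_bad_card`, `di_argument` and `simulated_run` are made up for the example; in practice the "run" is you starting the miner with the printed -di value and waiting to see whether the rig reboots.

```python
from typing import Callable, List, Sequence


def di_argument(enabled: Sequence[int]) -> str:
    """Build the index string for -di (assumed: one digit per enabled GPU)."""
    return "".join(str(i) for i in sorted(enabled))


def find_bad_card(cards: Sequence[int],
                  crashes: Callable[[List[int]], bool]) -> int:
    """Narrow a single crash-causing card down by halving the suspects.

    `crashes(enabled)` must report whether the rig crashed while mining
    on exactly the cards in `enabled`. Assumes `cards` is not empty.
    """
    all_cards = sorted(cards)
    suspects = list(all_cards)
    while len(suspects) > 1:
        mid = len(suspects) // 2
        disabled, kept = suspects[:mid], suspects[mid:]
        # Keep every card running except the suspects disabled this round.
        enabled = [c for c in all_cards if c not in disabled]
        if crashes(enabled):
            suspects = kept       # culprit was still running
        else:
            suspects = disabled   # culprit was among the disabled cards
    return suspects[0]


if __name__ == "__main__":
    # Simulated rig: pretend card 5 (or its riser) is the culprit.
    culprit = 5

    def simulated_run(enabled: List[int]) -> bool:
        print("run miner with -di", di_argument(enabled))
        return culprit in enabled

    print("suspect card:", find_bad_card(range(7), simulated_run))
```

With 7 cards this takes at most 3 runs, and the first round disables 3 cards as suggested. A "no crash" verdict is only trustworthy if the rig has run well past the usual 30 minutes, and the result still points at either the card or its riser.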

dragonmike (OP)
October 17, 2017, 07:48:33 PM
 #5

Event log says system rebooted without cleanly shutting down first... so essentially some sort of crash/failure happens.

Using -di seems like a good idea, albeit potentially long and annoying/costly to diagnose. But that actually gave me another idea. Since that watchdog version of Claymore is pretty crap, I might just switch to mining ETH and use that Claymore monitoring instead. It's usually a lot better at reporting GPU failures.

Means I'm gonna have to update drivers and everything. Oh well. It was overdue anyway...
JuanHungLo
October 17, 2017, 10:33:05 PM
 #6

Quote from: generalt
I'm going to say risers.  Those things always cause problems and are the weakest link.  Since it reboots in 30 minutes, it shouldn't be too hard to track down which one.  Just use the -di option to disable, say, 3 cards.  If it crashes, then enable the 3 you disabled and disable a different set.  Keep going until you narrow it down to just one card that causes the system to crash.  It's either that card or that riser.

^THIS!
Been there, done that.

dragonmike (OP)
October 17, 2017, 10:39:18 PM
 #7

So I have managed to clean up drivers, install blockchain drivers, install newest Claymore and run it.

One card is definitely lagging in terms of core speed it can sustain compared to the others (talking 25-30 MHz lower while dual mining). At least I've identified it now... will have to let it run a while at these lower clocks and see if that solves it. If not, I'll switch riser. If that doesn't do it, I'll know the GPU is probably on its way out.

Hopefully it's just a little degradation.