Bitcoin Forum
April 27, 2024, 12:03:52 AM *
Author Topic: mining rig keeps dying. Too hot?  (Read 1140 times)
krazitrain (OP)
Newbie
*
Offline Offline

Activity: 15
Merit: 0


View Profile
August 02, 2017, 08:53:27 PM
 #21

So it ran for 10 hours last night and then the devices fell offline.

I am mining Zcash. When it stops working correctly, the miner reports that 0 sols are being produced. The miner window is still up and all of the GPUs keep trying to restart but can't.

The last two times, the first device listed was device 3, with the following log reading:

Device 3 thread exited with code:  77

Then all of the other GPUs exit with the same code 77.
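
For anyone who would rather not scan the log by eye, here is a minimal sketch that picks out the first device to exit. It assumes the log lines look like the one quoted above; the second and third sample lines, and the function name, are made up for illustration, and a real log may carry timestamps or other prefixes (which is why the pattern searches anywhere in the line).

```python
import re

# Matches lines like "Device 3 thread exited with code:  77"
# (format taken from the log line quoted in the post).
EXIT_LINE = re.compile(r"Device\s+(\d+)\s+thread exited with code:\s*(\d+)")

def first_failed_device(log_lines):
    """Return (device, exit_code) for the first exit line found, or None."""
    for line in log_lines:
        match = EXIT_LINE.search(line)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None

# Hypothetical sample: only the first line comes from the post.
sample_log = [
    "Device 3 thread exited with code:  77",
    "Device 0 thread exited with code:  77",
    "Device 1 thread exited with code:  77",
]
print(first_failed_device(sample_log))
```

If the same device number comes back across several crashes, that card is the first one to look at.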
krazitrain (OP)
Newbie
*
Offline Offline

Activity: 15
Merit: 0


View Profile
August 02, 2017, 09:00:16 PM
 #22

It may well be a sick card. Well, it is not really sick, it just cannot handle your overclock. Even if they are all the same model, one of them may not be able to survive the memory overclock. This is actually what happened to my rig.

1. Read the log and locate which GPU hung or errored first, say GPU 0.
2. Disable that GPU in the rig. What I did was press 0 to disable GPU 0.
3. Watch whether your rig is stable. If it is, congratulations, you found the sick card.
4. If not, try the cards one by one until you find it.

Or you can just disable the watchdog in your bat file by setting -wd 0. In this case, the sick card will stop but your other cards keep working. Then touch the cards and feel the temperature, and you will know which one is bad.

If you cannot find a card to blame, then you may need to reinstall Windows and update to the latest version.

Hope this helps.

Okay. I would like to do this but I am not sure where you disable just one of the GPUs.

Since device 3 has been the first one to stop the last two times, according to my log, I am guessing that it is the problem. I'm running all cards again now to see if I get that same device to fail, and trying to figure out how to disable the device and determine which card it actually is.
JaredKaragen
Legendary
*
Offline Offline

Activity: 1848
Merit: 1165


My AR-15 ID's itself as a toaster. Want breakfast?


View Profile WWW
August 02, 2017, 10:40:44 PM
 #23

Use NVIDIA Inspector to find the GPU number in question.

What miner app are you using?

Assuming ccminer, use the -d flag to choose which devices to use:
-d 0,1,3,4,5
With ccminer you can also trim the intensity per card:
-i 17,17,13,17,17
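
If you end up testing the cards one at a time, typing those lists by hand gets tedious. Below is a rough sketch of a helper that builds the -d and -i arguments shown above. The function name and its parameters are hypothetical, and it assumes the -i values line up one-to-one with the devices listed in -d, as the example above suggests. It also assumes device numbers start at 0; check that the numbering matches what your miner and NVIDIA Inspector report.

```python
def ccminer_args(total_devices, skip=(), intensity=17, overrides=None):
    """Build -d and -i argument strings for the devices that stay enabled.

    total_devices: number of GPUs in the rig (numbered from 0, an assumption)
    skip:          device numbers to leave out of the -d list
    intensity:     default intensity for every enabled device
    overrides:     optional {device: intensity} for cards that need less
    """
    overrides = overrides or {}
    enabled = [d for d in range(total_devices) if d not in skip]
    intensities = [overrides.get(d, intensity) for d in enabled]
    return [
        "-d", ",".join(str(d) for d in enabled),
        "-i", ",".join(str(i) for i in intensities),
    ]

# Six cards, device 2 disabled, device 3 running at a lower intensity.
print(" ".join(ccminer_args(6, skip={2}, overrides={3: 13})))
```

Running it once per suspect card (skip={0}, skip={1}, and so on) gives you the argument strings to paste into the batch file for each test run.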
