Bitcoin Forum
Topic: Rig Crashing and Won't Restart
Sant001
Full Member
June 17, 2012, 09:42:14 AM
 #1

I've assembled my first rig: a no-brand motherboard with 4 video cards (3x 5970 + 1x 5870).

Running BAMT, it's yielding about 2.2 GH/s as of now, still pending some overclock tuning.

My problem is that every couple of hours the rig stops working and won't restart. From my limited knowledge, it seems that some of the cards are locking up and keeping the whole machine from booting.

So usually I remove one card at a time until the machine boots successfully.

How can I "unlock" a video card? And, better yet, how can I prevent these locks?
zvs
Legendary
June 17, 2012, 10:17:13 AM
 #2

are you sure it's not the power supply?  that's what it sounds like to me

Sant001
Full Member
June 17, 2012, 10:22:02 AM
 #3

I'm using 2x BeQuiet Pro-8 1200W PSUs, so I have a total of 2400W worth of juice.

When it locks, the system still won't start even if I disconnect all the other cards and leave only the locked one connected.

I believe it hangs while starting X: I get a blank screen and nothing happens beyond that point. The system is also not accessible via SSH; it pretty much freezes.
dellech
Jr. Member
June 17, 2012, 11:11:06 PM
 #4

I can't tell you why it crashes ... but I experienced the same problem at booting machines with 2 or more ATI cards.

Linux 64bit, any version of fglrx ... sometimes the machine locks up hard when it tries to start X ... next try may succeed or not ... whenever I get it to start up I am sure to avoid shutting down as long as possible ...
Sant001
Full Member
June 17, 2012, 11:42:54 PM
 #5

I removed the 1st video card and it booted up normally.

I also noticed that the 1st and 2nd video cards lock up more often. They're also drawing their air from the adjacent cards, so they have the highest temperatures of all the cards in the rig.

So the lock is probably happening because of the high temperatures, right? I think I will be able to avoid it by adding extra fans.

The other question is how I can quickly recover from a locked card. How do I unlock it quickly, and could this be automated instead of manually testing which card is locked and removing it from the rig?

And another important question: why is BAMT restarting the system? Restarting doesn't seem like a good idea to me, since the system won't come back up at all and sits there for hours until I notice the problem and fix it manually.
dellech
Jr. Member
June 18, 2012, 01:52:49 AM
 #6

it might be better to really power the system off instead of rebooting - I noticed that when I "killed" a card by overclocking (seeking the highest stable speed) that a reboot did not suffice for the card to recover ...
Sant001
Full Member
June 18, 2012, 07:25:01 AM
 #7

That's a good idea. I noticed something similar: if a card is not detected for whatever reason, a reboot won't solve the issue, but shutting down and then turning it on again can get the card detected.

I still haven't pinned down the root cause of the issues I'm having. I'm still not sure whether it's the card that's getting locked or the motherboard that somehow remembers the card/slot that failed and won't let me boot until I remove or replace that card.
miaviator
Donator
Hero Member
June 20, 2012, 08:14:08 PM
 #8

You said you have dual PSUs? If so, try running on one PSU. Dual PSUs can cause lockups in Linux if you're not syncing/protecting the board from spikes. Cablesaurus and others have adapters to protect rigs from this.

You can prevent temperature-related lockups by setting a cutoff threshold in BAMT.
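
To illustrate what a cutoff threshold does, here is a minimal Python sketch. It is not BAMT's code or its configuration syntax (the thread does not give BAMT's option names); `CUTOFF_C`, `RESUME_C`, `next_state` and `simulate` are hypothetical names, and the temperature values are assumptions chosen only for the example. The idea is to stop mining once a card reaches the cutoff and to resume only after it has cooled well below it, so the card does not flap on and off around a single value.

```python
# Hypothetical thresholds, in degrees Celsius (assumed values, not BAMT defaults).
CUTOFF_C = 90.0   # stop mining at or above this temperature
RESUME_C = 75.0   # allow mining again once the card has cooled to this


def next_state(mining: bool, temp_c: float) -> bool:
    """Return whether the GPU should be mining after this temperature reading."""
    if mining and temp_c >= CUTOFF_C:
        return False   # too hot: stop before the card locks up
    if not mining and temp_c <= RESUME_C:
        return True    # cooled down enough: resume
    return mining      # otherwise keep the current state


def simulate(readings, mining=True):
    """Apply next_state to a sequence of readings and collect the states."""
    states = []
    for temp_c in readings:
        mining = next_state(mining, temp_c)
        states.append(mining)
    return states
```

For example, `simulate([80, 91, 85, 74, 80])` stops mining at the 91 reading, stays stopped at 85, and resumes at 74.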

You should monitor all of your miners (even just one) using SNMP (included with BAMT) or Cacti. I graph all of mine and send an SMS to myself and my tech whenever production drops by more than 10%.
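
The 10% rule can be sketched in a few lines of Python. This is only an illustration of the check itself, not of SNMP, Cacti, or any SMS gateway; `production_dropped` and its parameters are hypothetical names, and how the baseline and current hashrates are collected is left as an assumption.

```python
def production_dropped(baseline_mhs: float, current_mhs: float,
                       threshold: float = 0.10) -> bool:
    """Return True when the hashrate has fallen by more than `threshold`.

    baseline_mhs: expected hashrate (e.g. a recent average), in MH/s.
    current_mhs:  latest measured hashrate, in MH/s.
    threshold:    fractional drop that triggers an alert (0.10 means 10%).
    """
    if baseline_mhs <= 0:
        raise ValueError("baseline hashrate must be positive")
    drop = (baseline_mhs - current_mhs) / baseline_mhs
    return drop > threshold
```

With a 2200 MH/s baseline, a reading of 1900 MH/s is a drop of about 13.6% and would trigger an alert, while 2000 MH/s (about 9.1%) would not.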

Are you using extenders or are the cards directly on the board?  Which version of BAMT?

Sant001
Full Member
June 21, 2012, 04:02:43 AM
 #9

I have switched to 1 PSU, with only 2 cards on each motherboard. Now it's a little more stable, but still rebooting every now and then.

I once saw a reboot while I had a monitor plugged in (most of the time it's running headless), and I saw a kernel panic error on the screen. How can I troubleshoot this to find the source of the instability?
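
On a headless rig the panic text is usually lost unless it is captured somewhere, such as over a serial console or with the kernel's netconsole module. Assuming such a capture exists as plain text, a small Python sketch like the one below could pull out the panic lines with a little preceding context. `find_panic_lines` and the pattern list are hypothetical and not exhaustive; nothing here is specific to BAMT.

```python
import re

# Assumed markers of a kernel crash; real logs may use other wording.
PANIC_PATTERN = re.compile(
    r"kernel panic|Oops|BUG:|general protection fault", re.IGNORECASE
)


def find_panic_lines(log_text: str, context: int = 2):
    """Return a list of line groups, each ending at a line that looks like a crash.

    Every group holds up to `context` preceding lines plus the matching line,
    which often shows what the kernel was doing just before the failure.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if PANIC_PATTERN.search(line):
            start = max(0, i - context)
            hits.append(lines[start:i + 1])
    return hits
```

The lines just before the match are often the most useful part, since they can name the driver or device that was active when the kernel gave up.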
miaviator
Donator
Hero Member
June 21, 2012, 01:05:20 PM
 #10

You said you were using BAMT? On a USB drive or a hard drive? I would suggest first trying a fresh copy of BAMT, as USB or HDD images sometimes get corrupted.

Also ensure you have temperature cutoffs defined; your rig will crash (and burn) if the cards overheat.

There are a few million reasons for a kernel panic, but on a mining rig the options are significantly reduced. Try a new BAMT image and then list your hardware:

Motherboard:
CPU:
Memory:
Cards:
Extenders or on-board:
BAMT version:

01BTC10
VIP
Hero Member
June 21, 2012, 01:14:02 PM
 #11

Had the same problem and lowered my OC settings.