Bitcoin Forum
May 10, 2024, 04:11:35 PM
Topic: Rig Crashing and Won't Restart (Read 1129 times)
Sant001 (OP)
Full Member
Activity: 182
Merit: 100
June 17, 2012, 09:42:14 AM
#1

I've assembled my first rig: a no-brand motherboard with 4 video cards (3x 5970 + 1x 5870).

Running BAMT, it currently yields about 2.2 GH/s, pending some overclock tuning.

My problem is that every couple of hours the rig stops working and won't restart. From my limited knowledge, it seems that some of the cards are locking up and keeping the whole machine from booting.

So I usually remove one card at a time until the machine boots successfully.

How can I "unlock" a video card? Better yet, how can I prevent these lockups?
zvs
Legendary
Activity: 1680
Merit: 1000
June 17, 2012, 10:17:13 AM
#2

Are you sure it's not the power supply? That's what it sounds like to me.
Sant001 (OP)
Full Member
Activity: 182
Merit: 100
June 17, 2012, 10:22:02 AM
#3

I'm using 2x BeQuiet Pro-8 1200W PSUs, so I have a total of 2400W of power available.

When it locks up, the system still won't start, even if I disconnect every card except the locked one.

I believe it hangs while starting X: I get a blank screen and nothing happens beyond that point. The system is also not reachable via SSH; it pretty much freezes.
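As a rough sanity check on the power question, the load can be estimated from card board-power figures. This is a minimal sketch, assuming approximate reference values of about 294 W per HD 5970 and 188 W per HD 5870, plus a hypothetical 100 W for the rest of the system; overclocking pushes real draw higher. It only checks total capacity, not how the load is split between two PSUs.

```python
# Rough power-budget estimate for the rig described in this thread.
# ASSUMPTION: approximate reference board-power figures; overclocked cards draw more.
CARD_WATTS = {"5970": 294, "5870": 188}
OTHER_WATTS = 100  # hypothetical allowance for board, CPU, RAM, fans, drive


def rig_load(cards, other_watts=OTHER_WATTS):
    """Estimated total load in watts for a list of card models."""
    return sum(CARD_WATTS[card] for card in cards) + other_watts


def headroom(load_watts, psu_watts):
    """Fraction of PSU capacity left over (negative means overloaded)."""
    return 1 - load_watts / psu_watts


if __name__ == "__main__":
    cards = ["5970", "5970", "5970", "5870"]
    load = rig_load(cards)
    print(load)  # -> 1170
    print(f"one 1200 W PSU: {headroom(load, 1200):.1%} headroom")
    print(f"two 1200 W PSUs: {headroom(load, 2400):.1%} headroom")
```

Under these assumptions 2400 W is plenty on paper, while a single 1200 W unit would be near its limit, so how the cards are distributed across the two supplies matters as much as the total.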
dellech
Newbie
Activity: 40
Merit: 0
June 17, 2012, 11:11:06 PM
#4

I can't tell you why it crashes, but I've experienced the same problem when booting machines with 2 or more ATI cards.

Linux 64-bit, any version of fglrx: sometimes the machine locks up hard when it tries to start X, and the next try may or may not succeed. Whenever I get it to start up, I avoid shutting down for as long as possible.
Sant001 (OP)
Full Member
Activity: 182
Merit: 100
June 17, 2012, 11:42:54 PM
#5

I removed the 1st video card and it booted up normally.

I also noticed that the 1st and 2nd video cards lock up more often. They also draw their intake air from the neighboring cards, so they run at the highest temperatures in the rig.

So the lock is probably happening because of the high temperatures, right? I think I will be able to avoid it by adding extra fans.

My other question is how to quickly recover from a locked card. How can I unlock it quickly, and could this be automated instead of requiring manual work such as testing which card is locked and removing it from the rig?

Another important question: why is BAMT restarting the system? Restarting doesn't seem like a good idea to me, since the system won't come back up at all and just sits there for hours until I notice the problem and fix it manually.
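On automating the search for a locked card: one option is to flag any GPU whose recent hash-rate readings are all near zero, instead of pulling cards one by one. This is a minimal sketch, assuming you can already obtain per-GPU hash rates from the miner (collecting them is not shown); the expected rates in `EXPECTED_MHS` and the 10% floor are hypothetical.

```python
from typing import Dict, List

# Hypothetical expected per-GPU hash rates in MH/s; use what your cards actually do.
EXPECTED_MHS = {0: 350.0, 1: 350.0, 2: 400.0}


def stalled_gpus(
    samples: Dict[int, List[float]],
    expected: Dict[int, float],
    min_fraction: float = 0.1,
) -> List[int]:
    """Return GPU indices that look locked.

    A GPU counts as locked if it reported no samples at all, or if every
    recent sample is below min_fraction of its expected hash rate.
    """
    stalled = []
    for gpu in sorted(expected):
        rates = samples.get(gpu, [])
        floor = expected[gpu] * min_fraction
        if not rates or all(r < floor for r in rates):
            stalled.append(gpu)
    return stalled


if __name__ == "__main__":
    recent = {0: [351.2, 349.8], 1: [0.0, 0.0, 0.0], 2: [398.5, 120.0]}
    print(stalled_gpus(recent, EXPECTED_MHS))  # -> [1]
```

Iterating over the expected GPUs, rather than the reported ones, means a card that stops reporting entirely is also flagged.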
dellech
Newbie
Activity: 40
Merit: 0
June 18, 2012, 01:52:49 AM
#6

It might be better to actually power the system off instead of rebooting. I noticed that when I "killed" a card by overclocking (seeking the highest stable speed), a reboot was not enough for the card to recover.
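Seeking the highest stable speed can be done as a binary search over clock settings rather than stepping up one notch at a time. This is a minimal sketch, assuming a hypothetical `is_stable(clock)` test that you supply (for example, mining for a while at that clock and checking for errors or hangs), and assuming that any clock below a stable one is also stable. Per the post above, a failed test may need a full power-off before the next attempt.

```python
from typing import Callable


def highest_stable_clock(
    lo: int, hi: int, is_stable: Callable[[int], bool], step: int = 5
) -> int:
    """Binary-search the highest clock (MHz) on a `step` grid in [lo, hi] that passes is_stable.

    ASSUMPTIONS: lo is already known to be stable, and stability is monotonic
    (if a clock is stable, every lower clock is too). is_stable is a
    hypothetical test you provide; after a failure, power-cycle the rig.
    """
    # Work in units of `step` above lo so the answer always lands on the grid.
    left, right = 0, (hi - lo) // step
    while left < right:
        mid = (left + right + 1) // 2  # round up so the loop always makes progress
        if is_stable(lo + mid * step):
            left = mid  # mid passed: the answer is at or above it
        else:
            right = mid - 1  # mid failed: the answer is below it
    return lo + left * step


if __name__ == "__main__":
    # Simulated card that is stable up to 875 MHz.
    print(highest_stable_clock(700, 1000, lambda clock: clock <= 875))  # -> 875
```

Each probe halves the remaining range, so a 300 MHz window at 5 MHz steps needs only about six stability tests.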
Sant001 (OP)
Full Member
Activity: 182
Merit: 100
June 18, 2012, 07:25:01 AM
#7

That's a good idea. I've noticed something similar: if a card is not detected for whatever reason, a reboot won't solve the issue, but shutting down and then powering on again can get the card detected.

I still haven't pinned down the root cause of my issues. I'm not sure whether the card itself is getting locked, or whether the motherboard somehow remembers the card/slot that failed and won't let me boot until I remove or replace that card.
miaviator
Donator
Hero Member
Activity: 686
Merit: 519
June 20, 2012, 08:14:08 PM
#8

You said you have dual PSUs? If so, try running on one PSU. Dual PSUs can cause lockups in Linux if you're not syncing them and protecting the board from spikes. Cablesauras and others have adapters to protect rigs from this.

You can prevent temp lockups by setting a cutoff threshold in BAMT.
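A cutoff threshold works best with a little hysteresis, so a card that hits the limit stays idle until it has cooled noticeably instead of flapping on and off. This is a generic sketch, not BAMT's own implementation; the 85 °C cutoff and 10 °C resume margin are hypothetical values.

```python
from typing import Dict, Set

CUTOFF_C = 85.0  # hypothetical cutoff temperature in deg C
RESUME_MARGIN_C = 10.0  # hypothetical: resume only once this far below the cutoff


def update_idle(
    temps_c: Dict[int, float],
    idle: Set[int],
    cutoff_c: float = CUTOFF_C,
    margin_c: float = RESUME_MARGIN_C,
) -> Set[int]:
    """Return the set of GPUs that should be idle after this temperature reading.

    A GPU goes idle at or above cutoff_c and stays idle until it has cooled
    to cutoff_c - margin_c or below (hysteresis), so it does not flap.
    GPUs missing from temps_c are not included in the result.
    """
    new_idle = set()
    for gpu, temp in temps_c.items():
        if temp >= cutoff_c:
            new_idle.add(gpu)
        elif gpu in idle and temp > cutoff_c - margin_c:
            new_idle.add(gpu)  # was idle and is still cooling down
    return new_idle


if __name__ == "__main__":
    idle: Set[int] = set()
    for reading in ({0: 86.0, 1: 80.0}, {0: 80.0, 1: 82.0}, {0: 74.0, 1: 83.0}):
        idle = update_idle(reading, idle)
        print(sorted(idle))  # prints [0], then [0], then []
```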

You should monitor all of your miners (even if you have just one) using SNMP (included with BAMT) or Cacti. I graph all of mine and send an SMS to myself and my tech whenever production drops by more than 10%.
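The "alert when production drops by more than 10%" rule amounts to comparing a recent average hash rate against a baseline. This is a minimal sketch; collecting the readings (via SNMP, Cacti, or otherwise) and sending the SMS are left out, and the baseline is whatever rate you consider normal for the rig.

```python
from statistics import mean
from typing import List


def production_dropped(
    baseline_mhs: float, recent_mhs: List[float], max_drop: float = 0.10
) -> bool:
    """Return True if the average of recent readings is more than max_drop below baseline."""
    if baseline_mhs <= 0 or not recent_mhs:
        raise ValueError("need a positive baseline and at least one reading")
    drop = (baseline_mhs - mean(recent_mhs)) / baseline_mhs
    return drop > max_drop


if __name__ == "__main__":
    baseline = 2200.0  # about 2.2 GH/s, expressed in MH/s
    print(production_dropped(baseline, [2190.0, 2185.0]))  # -> False
    print(production_dropped(baseline, [1900.0, 1950.0]))  # -> True (12.5% drop)
```

Averaging several readings keeps a single noisy sample from triggering an alert.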

Are you using extenders, or are the cards directly on the board? Which version of BAMT?

Sant001 (OP)
Full Member
Activity: 182
Merit: 100
June 21, 2012, 04:02:43 AM
#9

I have switched to 1 PSU, with only 2 cards on each motherboard. Now it's a little more stable, but still rebooting every now and then.

I once caught a reboot because I had a monitor plugged in at the time (the rig usually runs headless), and I saw a kernel panic error on the screen. How can I troubleshoot this to find the source of the instability?
miaviator
Donator
Hero Member
Activity: 686
Merit: 519
June 21, 2012, 01:05:20 PM
#10

You said you were using BAMT? On a USB drive or a hard drive? I would first suggest using a fresh copy of BAMT, as the USB or HDD images sometimes get corrupted.

Also make sure you have temperature cutoffs defined; your rig will crash (and burn) if the cards overheat.

There are a few million reasons for a kernel panic, but on a mining rig the options are significantly reduced. Try a new BAMT image and then list your hardware:

Motherboard:
CPU:
Memory:
Cards:
Extenders or on-board:
BAMT Version:

01BTC10
VIP
Hero Member
Activity: 756
Merit: 503
June 21, 2012, 01:14:02 PM
#11

Had the same problem and lowered my OC settings.