Bitcoin Forum
December 05, 2016, 04:53:15 PM
Topic: Rig Crashing and Won't Restart  (Read 719 times)
Sant001
Full Member
Activity: 182
June 17, 2012, 09:42:14 AM
 #1

I've assembled my first rig: a no-brand motherboard with 4 video cards (3x 5970 + 1x 5870).

Running BAMT, it's yielding about 2.2 GH/s so far, before any overclock tuning.

My problem is that every couple of hours the rig stops working and won't restart. From my limited knowledge, it seems that some of the cards are locking up and keeping the whole machine from booting.

Usually I remove the cards one at a time until the machine boots successfully.

How can I "unlock" a video card? Better yet, how can I prevent these lockups?
zvs
Legendary
Activity: 1386
June 17, 2012, 10:17:13 AM
 #2


Are you sure it's not the power supply? That's what it sounds like to me.

Sant001
Full Member
Activity: 182
June 17, 2012, 10:22:02 AM
 #3

I'm using 2x BeQuiet Pro-8 1200W PSUs, so I have a total of 2400W of power available.
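As a rough sanity check on whether total capacity is the problem, here is a minimal power-budget sketch. The per-card wattages are approximate reference board-power figures (HD 5970 about 294 W, HD 5870 about 188 W) and are an assumption here; overclocked cards draw more, and the rest-of-system figure is a hypothetical placeholder. On paper 2400W leaves plenty of headroom, which points more toward how the two PSUs share the load than toward raw capacity.

```python
# Rough power-budget estimate for a multi-GPU rig.
# ASSUMPTION: approximate reference board power per card; overclocking raises it.
CARD_WATTS = {"5970": 294, "5870": 188}
# ASSUMPTION: hypothetical placeholder for motherboard, CPU, drives and fans.
REST_OF_SYSTEM_WATTS = 150


def estimated_load(cards):
    """Sum the estimated draw of the listed cards plus the rest of the system."""
    return sum(CARD_WATTS[c] for c in cards) + REST_OF_SYSTEM_WATTS


def capacity_used(cards, psu_watts):
    """Return (estimated load in watts, fraction of PSU capacity used)."""
    load = estimated_load(cards)
    return load, load / psu_watts


rig = ["5970", "5970", "5970", "5870"]
load, used = capacity_used(rig, psu_watts=2 * 1200)
print(f"estimated load {load} W, {used:.0%} of 2400 W")
```

With these assumed figures the estimate is 1220 W, or about 51% of the combined capacity.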

When it locks up, even if I disconnect all the cards except the locked one, the system still won't start.

I believe it hangs while starting X: I get a blank screen and nothing happens beyond that point. The system is also not reachable via SSH; it pretty much freezes.
dellech
Jr. Member
Activity: 36
June 17, 2012, 11:11:06 PM
 #4

I can't tell you why it crashes, but I have experienced the same problem when booting machines with 2 or more ATI cards.

Linux 64-bit, any version of fglrx: sometimes the machine locks up hard when it tries to start X. The next try may or may not succeed. Whenever I get it to start up, I make sure to avoid shutting it down for as long as possible.
Sant001
Full Member
Activity: 182
June 17, 2012, 11:42:54 PM
 #5

I removed the 1st video card and it booted up normally.

I also noticed that the 1st and 2nd video cards lock up more often. They also draw in air that has been heated by the neighboring cards, so they run at the highest temperatures of all the cards in the rig.

So the lockups are probably caused by the high temperatures, right? I think I can avoid them by adding extra fans.

The other question is how I can quickly recover from a locked card. Is there a quick way to unlock it, and could this be automated instead of requiring manual work such as testing which card is locked and removing it from the rig?
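Finding which card is stuck can at least be automated from per-card hashrate readings. This is a minimal sketch, assuming you already collect recent MH/s samples per GPU from your miner's statistics; the `readings` structure is hypothetical and nothing here calls a BAMT or miner API. Note that each HD 5970 is a dual-GPU card, so this rig shows up as seven GPUs, not four.

```python
# Flag GPUs that look hung, based on recent per-GPU hashrate samples.
# ASSUMPTION: `readings` is a hypothetical dict of GPU index -> list of MH/s
# samples, oldest first, gathered however your miner exposes per-GPU stats.
def find_stalled_gpus(readings, min_mhs=1.0, samples=3):
    """Return sorted indices of GPUs whose last `samples` readings are all below min_mhs."""
    stalled = []
    for gpu, history in readings.items():
        recent = history[-samples:]
        # Only judge a GPU once we have enough samples for it.
        if len(recent) == samples and all(r < min_mhs for r in recent):
            stalled.append(gpu)
    return sorted(stalled)


readings = {
    0: [320.0, 0.0, 0.0, 0.0],        # stopped hashing
    1: [318.0, 321.0, 319.0, 320.0],  # healthy
    2: [0.0, 0.0],                    # too few samples to judge yet
}
print(find_stalled_gpus(readings))  # [0]
```

The sample-count requirement is there so a single bad reading does not condemn a card.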

Another important question: why is BAMT restarting the system? Restarting doesn't seem like a good idea to me, since the system won't come back up at all and just sits there for hours until I notice the problem and fix it manually.
dellech
Jr. Member
Activity: 36
June 18, 2012, 01:52:49 AM
 #6

It might be better to actually power the system off instead of rebooting. I noticed that when I "killed" a card by overclocking (seeking the highest stable speed), a reboot was not enough for the card to recover.
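Following that observation, an automated recovery could skip the warm reboot and escalate straight to a full power cycle, then stop and ask for a human instead of looping forever. This is a minimal sketch of that escalation policy only; the action names are hypothetical labels, and actually cutting power needs external hardware (for example a switchable PDU or a relay), since a hung rig cannot power-cycle itself from software.

```python
# Recovery escalation policy: prefer a full power cycle over a warm reboot.
# ASSUMPTION: action names are hypothetical labels for whatever your setup can do.
def next_action(failed_attempts):
    """Pick the next recovery step given how many recoveries have already failed."""
    if failed_attempts <= 0:
        return "restart_miner"  # cheapest step: restart only the mining process
    if failed_attempts == 1:
        return "power_cycle"    # cut power so the cards fully reset
    return "alert_human"        # stop retrying; a card may need to be pulled


for attempt in range(3):
    print(attempt, next_action(attempt))
```

Ending in an alert rather than another reboot avoids the rig sitting dead for hours unnoticed.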
Sant001
Full Member
Activity: 182
June 18, 2012, 07:25:01 AM
 #7


That's a good idea. I noticed something similar: if a card is not detected for whatever reason, a reboot won't solve the issue, but shutting down and then turning it on again can get the card detected.

I still haven't pinned down the root cause of my issues. I'm still not sure whether it's the card that locks up or the motherboard that somehow remembers the card/slot that failed and won't let me boot until I remove or replace that card.
miaviator
Donator
Hero Member
Activity: 672
It's for the children!
June 20, 2012, 08:14:08 PM
 #8

You said you have dual PSUs? If so, try running on one PSU. Dual PSUs can cause lockups in Linux if you're not syncing them or protecting the board from spikes. Cablesauras and others sell adapters to protect rigs from this.

You can prevent temp lockups by setting a cutoff threshold in BAMT.

You should monitor all of your miners (even if you have just one) using SNMP (included with BAMT) or cacti. I graph all of mine and send an SMS to myself and my tech whenever production drops by more than 10%.
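The "production dropped by more than 10%" rule is easy to express as a check you can hook into whatever monitoring you use. A minimal sketch, assuming the baseline and current hashrates come from your own graphs or miner stats; how the SMS actually gets sent is left out.

```python
# Alert rule: fire when the hashrate falls more than `max_drop` below a baseline.
# ASSUMPTION: both values come from your own monitoring (SNMP, cacti, miner stats).
def should_alert(baseline_mhs, current_mhs, max_drop=0.10):
    """True if current hashrate is more than `max_drop` (a fraction) below baseline."""
    if baseline_mhs <= 0:
        raise ValueError("baseline must be positive")
    drop = (baseline_mhs - current_mhs) / baseline_mhs
    return drop > max_drop


print(should_alert(2200, 2000))  # drop of about 9.1%  -> False
print(should_alert(2200, 1900))  # drop of about 13.6% -> True
```

Using a fraction of the baseline rather than a fixed MH/s figure keeps the same rule usable across rigs of different sizes.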

Are you using extenders or are the cards directly on the board?  Which version of BAMT?

Sant001
Full Member
Activity: 182
June 21, 2012, 04:02:43 AM
 #9


I have switched to 1 PSU, with only 2 cards on each motherboard. It's a little more stable now, but it still reboots every now and then.

I once caught a reboot because I happened to have a monitor plugged in at the time (most of the time it runs headless), and I saw a kernel panic error on the screen. How can I troubleshoot this to find the source of the instability?
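One starting point is to search whatever kernel logs you can capture for panic and oops lines. This is a minimal sketch; it assumes you have a log file worth scanning (for example a copy of the kernel log, or output captured over a serial console or netconsole), because a hard panic often never reaches the local disk. The patterns are illustrative, not exhaustive, and the sample lines are invented.

```python
# Scan captured kernel log text for lines that hint at the cause of a crash.
# ASSUMPTION: the patterns are illustrative examples, not a complete list.
import re

PATTERNS = re.compile(r"Kernel panic|Oops|BUG:|Machine check|fglrx")


def suspicious_lines(log_text):
    """Return (line_number, line) pairs for lines matching any pattern."""
    return [
        (n, line)
        for n, line in enumerate(log_text.splitlines(), start=1)
        if PATTERNS.search(line)
    ]


sample = "boot ok\nKernel panic - not syncing: example\nnetwork up\n"  # invented text
for n, line in suspicious_lines(sample):
    print(n, line)
```

Matches that mention the GPU driver point toward the cards or drivers, while machine check messages point more toward the CPU, memory or board.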
miaviator
Donator
Hero Member
Activity: 672
It's for the children!
June 21, 2012, 01:05:20 PM
 #10

You said you're using BAMT. Is it on a USB drive or a hard drive? I would suggest first trying a fresh copy of BAMT, as USB or HDD images sometimes get corrupted.
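If you write a fresh copy, it helps to confirm the download (and the freshly written drive, before its first boot) is intact. A minimal sketch, assuming a known-good SHA-256 is available for your image; the thread doesn't say one is published, and paths are up to you. The optional `limit` hashes only the first N bytes, since a raw read of a USB stick is longer than the image written to it. Once the rig has booted, BAMT writes to the drive, so a later mismatch alone doesn't prove corruption.

```python
# Hash a file in chunks, optionally only its first `limit` bytes.
# ASSUMPTION: you compare the result against a known-good checksum you trust.
import hashlib


def sha256_of(path, limit=None, chunk_size=1 << 20):
    """Return the hex SHA-256 of the file at `path` (first `limit` bytes if given)."""
    h = hashlib.sha256()
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            size = chunk_size if remaining is None else min(chunk_size, remaining)
            chunk = f.read(size)
            if not chunk:
                break  # end of file
            h.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return h.hexdigest()
```

To check a written stick, pass the image's size in bytes as `limit` and compare with the image's own hash.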

Also ensure you have temperature cutoffs defined, your rig will crash (and burn) if the cards get overheated.
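The idea behind a temperature cutoff is a simple per-GPU rule. This minimal sketch only classifies readings; the thresholds are hypothetical examples, not BAMT defaults, and the temperatures would come from whatever your tools report per GPU.

```python
# Classify per-GPU temperatures against warning and cutoff thresholds.
# ASSUMPTION: hypothetical example thresholds in degrees Celsius, not BAMT defaults.
WARN_C = 85
CUTOFF_C = 95


def classify_temps(temps):
    """Map GPU index -> 'ok', 'warn' or 'cutoff' from a dict of temperatures."""
    status = {}
    for gpu, t in temps.items():
        if t >= CUTOFF_C:
            status[gpu] = "cutoff"  # stop mining on this GPU
        elif t >= WARN_C:
            status[gpu] = "warn"    # e.g. lower clocks or raise fan speed
        else:
            status[gpu] = "ok"
    return status


print(classify_temps({0: 97, 1: 88, 2: 70}))
# {0: 'cutoff', 1: 'warn', 2: 'ok'}
```

Cards that keep landing in "warn", like the ones breathing their neighbors' exhaust, are the ones to give extra airflow.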

There are a few million reasons for a kernel panic, but on a mining rig the options are significantly reduced. Try a new BAMT image and then list your hardware:

Motherboard:
CPU:
Memory:
Cards:
Extenders or on-board:
BAMT version:

01BTC10
VIP
Hero Member
Activity: 742
June 21, 2012, 01:14:02 PM
 #11

I had the same problem and lowered my OC settings.