Bitcoin Forum
Author Topic: Why my rig keeps hanging [BAMT with 4 280x GPU's]?  (Read 2494 times)
PeerMedia (OP)
Newbie
*
Offline Offline

Activity: 44
Merit: 0


View Profile
March 20, 2014, 02:33:45 PM
Last edit: March 20, 2014, 03:10:39 PM by PeerMedia
 #1

Hello, I've been going nuts trying to figure out the source of this issue. After 6-8 hours my rig seems to hang. I don't believe it has crashed, as my USB stick has an LED that keeps flashing, but I can no longer ping the rig and it is no longer connected to my router when it hangs (verified through the list of clients on my router). I have to reboot manually every time to get it going again, and it hangs again after another 6-8 hours.

My rig:
GPUs: 4x Sapphire 280X OC edition
Mobo: ASRock 970 Extreme4
CPU: AMD Sempron
RAM: 4 GB
PSU: 1300W EVGA Supernova Gold
OS: BAMT 1.2, booted from a USB stick

I've tried the following:
Tried inserting the USB stick in 3 different ports (no luck)
Tried formatting the USB stick and reinstalling the OS (no luck)
Tried a new USB stick (no luck)
Tried installing a new NIC for internet connections instead of the onboard NIC (no luck)

I'm officially out of ideas as to what the issue might be. Any help is really appreciated.
Thank you!
Wipeout2097
Sr. Member
****
Offline Offline

Activity: 840
Merit: 255


SportsIcon - Connect With Your Sports Heroes


View Profile
March 20, 2014, 02:50:42 PM
 #2

I have a 7870 card that locks up whatever rig it's in. I also had to RMA an ASRock 970 Extreme4.
My thread is somewhere on the main Bitcoin mining forum.

Try running your rig underclocked, e.g. 500 MHz core and 1000 MHz memory, with everything not needed for mining disabled in the BIOS, power saving settings turned off, etc. Then mine with only one card at a time.

grosminer
Hero Member
*****
Offline Offline

Activity: 718
Merit: 500



View Profile
March 20, 2014, 03:03:27 PM
 #3

What's your PSU?
PeerMedia (OP)
March 20, 2014, 03:11:20 PM
 #4

Quote from: grosminer on March 20, 2014, 03:03:27 PM
What's your PSU?

PSU: 1300W EVGA Supernova Gold

The rig draws 1120W, as measured with a wattage meter.
PeerMedia (OP)
March 20, 2014, 07:18:08 PM
 #5

Quote from: Wipeout2097 on March 20, 2014, 02:50:42 PM
I have a 7870 card that locks up whatever rig it's in. I also had to RMA an ASRock 970 Extreme4.
My thread is somewhere on the main Bitcoin mining forum.

Try running your rig underclocked, e.g. 500 MHz core and 1000 MHz memory, with everything not needed for mining disabled in the BIOS, power saving settings turned off, etc. Then mine with only one card at a time.

This is a new build, so I haven't done any optimizations yet; it's running stock settings for the GPUs and the BIOS. It takes about 8 hours for the rig to hang, so it's not easy to run just one GPU at a time to isolate the problem. But it's clearly not the USB drive and not the NIC. Since it runs fine for 8 hours at a time, I can't see this being a CPU/RAM issue. I'm really baffled as to what else it could be. The only workaround I can come up with is a script that reboots the rig every 6 hours or so, in the hope that this heads off the hang, but I'd rather fix the issue than reboot 4 times a day.
PeerMedia (OP)
March 21, 2014, 05:46:52 PM
 #6

Does anyone know if it's possible that the mobo is putting something to sleep? This is the only ASRock 970 I've bought; all the other boards I've gotten were Intel-based, and I can't for the life of me figure out what's going on! Help!
Wipeout2097
March 21, 2014, 05:58:52 PM
 #7

As I said before, I had to RMA my ASRock 970 Extreme4. I've been there: ASRock (970) is crap, or at least there are many defective boards on the market. I wasted time, money and peace of mind. See if this looks familiar: https://bitcointalk.org/index.php?topic=381626.0

Now, of course it could be something else, like, yes, something that goes to sleep. But be prepared for the worst.

For a few days, I worked around those random lockups with a timer between the wall socket and the PSU plug that cuts the power every x hours for a minute. You set "Power On" after power failure in the BIOS, and set the machine to start mining on boot. This is half-assed, and the power cuts and restarts are not healthy at all for the rig's components.

Did you try Windows? I use Windows 7 64-bit, where I can disable all power saving/sleep options. Also, in the BIOS, disable everything not needed for mining, like audio, FireWire, and SATA when booting from a USB pen, etc.

trc
Full Member
***
Offline Offline

Activity: 164
Merit: 100


View Profile WWW
March 21, 2014, 06:04:08 PM
 #8

Check the switch. Most low-quality ones need a power reset at least once a week, and some worse ones once a day.

>> nope
PeerMedia (OP)
March 22, 2014, 01:21:35 AM
Last edit: March 22, 2014, 01:44:37 AM by PeerMedia
 #9

Quote from: Wipeout2097 on March 21, 2014, 05:58:52 PM
As I said before, I had to RMA my ASRock 970 Extreme4. I've been there: ASRock (970) is crap, or at least there are many defective boards on the market. I wasted time, money and peace of mind. See if this looks familiar: https://bitcointalk.org/index.php?topic=381626.0

Now, of course it could be something else, like, yes, something that goes to sleep. But be prepared for the worst.

For a few days, I worked around those random lockups with a timer between the wall socket and the PSU plug that cuts the power every x hours for a minute. You set "Power On" after power failure in the BIOS, and set the machine to start mining on boot. This is half-assed, and the power cuts and restarts are not healthy at all for the rig's components.

Did you try Windows? I use Windows 7 64-bit, where I can disable all power saving/sleep options. Also, in the BIOS, disable everything not needed for mining, like audio, FireWire, and SATA when booting from a USB pen, etc.

I've read the thread you linked and it sounds like we have the same problem. RMA'ing this mobo would be a real pain, though, as it seems to work for 8 hours at a time. It doesn't make sense to me that it would be the mobo when it works solidly for hours. I'm currently trying to set up a soft reboot via crontab so the system reboots before it hangs, hoping that will work around it. At least I see that as a better solution than a hard power cut, but I'm having problems getting the reboot command to execute right now.

After you RMA'd the mobo, did it work for you? Other than swapping the mobo, did anything else change (any other hardware)?
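
For reference, a scheduled soft reboot of the kind described here could be sketched roughly as follows. This is an illustration under assumptions, not the crontab actually used in this thread: the helper name `reboot_cron_line`, the 6-hour interval, and the choice of `/sbin/shutdown -r now` are all hypothetical. One common reason a reboot command fails to run from cron is that cron jobs get a minimal PATH, so the sketch uses an absolute path.

```shell
#!/bin/sh
# Hypothetical helper: print a root crontab line that reboots every N hours.
# The interval and the shutdown command are illustrative assumptions.
reboot_cron_line() {
    hours="$1"
    # Absolute path, because cron jobs usually run with a minimal PATH
    # that may not include /sbin.
    printf '0 */%s * * * /sbin/shutdown -r now\n' "$hours"
}

# Example (run as root), appending the line to root's existing crontab:
#   (crontab -l 2>/dev/null; reboot_cron_line 6) | crontab -
```

A scheduled reboot like this only masks the hang; it does not explain it.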
uncle_muddy
Newbie
*
Offline Offline

Activity: 33
Merit: 0


View Profile
March 22, 2014, 10:14:59 AM
 #10

I had exactly the same problem with my 4 x 280X rig: it would just hang after a few hours for no reason whatsoever. It showed the same symptoms you're getting.

In the end, after a lot of looking around, the problem turned out to be the IP stack falling over; no idea why or what caused it. One day it was working fine, then all of a sudden it started to play up.

I carried out all the same steps you have (reformatting the USB key, a new USB key, etc.) and still nothing, to the point where I set my rig to reboot every 4 hours to keep mining while I investigated the problem.

The resolution was to hardcode the IP into the rig and remove the network managers. I'm hunting for the link I found with all the detail on it but can't find it right now; the command was something along the lines of

Code:
# Run as root. The pattern is quoted so the shell does not expand it against local files.
apt-get remove 'network-manager*'

Hope this helps, and if I can find the link I will post it for you.
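
The "hardcode the IP" half of this fix is not spelled out in the post. A minimal sketch of what it might look like, assuming BAMT's Debian base reads `/etc/network/interfaces`: the helper name `static_iface_stanza`, the interface name, and every address below are placeholders, not values from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print a static-IP stanza in Debian's
# /etc/network/interfaces format. All values passed in are placeholders.
static_iface_stanza() {
    iface="$1"; address="$2"; netmask="$3"; gateway="$4"
    cat <<EOF
auto $iface
iface $iface inet static
    address $address
    netmask $netmask
    gateway $gateway
EOF
}

# Example with made-up values; review the output before writing it anywhere:
#   static_iface_stanza eth0 192.168.1.50 255.255.255.0 192.168.1.1
```

With a static address, DNS servers usually have to be configured separately as well (for example in `/etc/resolv.conf`), and the address should sit outside the router's DHCP range.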
Wipeout2097
March 22, 2014, 10:40:09 AM
 #11

Quote from: PeerMedia on March 22, 2014, 01:21:35 AM
After you RMA'd the mobo, did it work for you? Other than swapping the mobo, did anything else change (any other hardware)?

Yes. The store tested it with Prime95 and it failed with both my FX-6100 and a Phenom T1100. Then I told them I wanted a new one, unfortunately the same ASRock, because a better board (some Asus Sabertooth) cost a lot more. However, this new one has worked flawlessly for 3 months now. Nothing else changed. I can throw at it RAM of different speeds and brands, single-channel 6GB RAM, some overclocking, power saving options, lots of weird crap enabled in the BIOS, an FX-6100 with a basic heatsink, Windows 7 booting from a USB pen, a fake Alfa USB wifi dongle. All this shit works!

But be sure it is the mobo, because I have a 7870 card that does precisely the same thing. Or even that it's a hardware issue at all.

PeerMedia (OP)
March 22, 2014, 03:56:09 PM
 #12

I finally got crontab to run coldreboot after 12 hours; the server hung at around 8 hours. The server was no longer connected to my router, although the USB stick was still flashing (showing the OS was running). After 12 hours, the server successfully rebooted and restarted on its own. Clearly the OS is still functioning, but something is up with its network connectivity: it seems to be dropping the connection. I'm not using any wireless, just a wired cable between the onboard NIC and the router, and I also tried a 1 Gb NIC instead of the mobo NIC.

I can't tell if the NIC is going to sleep because of the mobo (no idea why that would happen) or if the OS is causing it. I've set up this type of rig a few times with other ASRock mobos and never seen this issue before, so I'm a little baffled.

Uncle_muddy, if you can find that link I'd really appreciate it, as what you had seems to be exactly the issue I'm having. Once you removed the network manager, did you replace it with something else?

Thank you!
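
One way to confirm that only the network is dropping while the OS keeps running is to log link state to the USB stick at regular intervals and read the log after the next hang. A minimal sketch, assuming a Linux `ping` that accepts `-c` and `-W`; the function name `log_link_state`, the log path, and the gateway address in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped UP/DOWN line for TARGET to LOGFILE.
log_link_state() {
    target="$1"
    logfile="$2"
    # One probe with a 2-second timeout; any failure is recorded as DOWN.
    if ping -c 1 -W 2 "$target" >/dev/null 2>&1; then
        state="UP"
    else
        state="DOWN"
    fi
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$state" "$target" >> "$logfile"
}

# Example: call this once a minute (e.g. from cron) against the router:
#   log_link_state 192.168.1.1 /var/log/link_state.log
```

If the log keeps gaining DOWN lines after the rig vanishes from the router, the machine is still alive and the fault is on the network side.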
uncle_muddy
March 22, 2014, 06:27:50 PM
 #13

After some hunting, I have found it:

http://www.bitcointrading.com/forum/linux-distros/bamt-version-0-5-easy-usb-based-mining-linux-with-farm-wide-management-tools/

Look at the 2nd reply in that thread; these are the instructions I used, and the problem went away.

I didn't replace the network manager with anything, as there was no instruction to do so. I haven't had the problem since doing this, though, so I'm guessing that this was the cause and that whatever was removed isn't required.

Hope this helps you out.

UM
PeerMedia (OP)
March 22, 2014, 06:53:54 PM
 #14

I made the changes suggested in the thread, rebooted, and disabled my cron job. I'll know in 12 hours if it worked, but it looks very promising, as I'm convinced it's a network/NIC issue and not the mobo. Thanks for your help, uncle_muddy.
uncle_muddy
March 22, 2014, 07:50:30 PM
 #15

Fingers are crossed for you. Let me know how you get on.

UM
PeerMedia (OP)
March 24, 2014, 06:14:31 PM
 #16

Quote from: uncle_muddy on March 22, 2014, 07:50:30 PM
Fingers are crossed for you. Let me know how you get on.

UM

Your suggestion worked perfectly: 48+ hours straight without a glitch. If you have a Dogecoin address, PM it or post it here, as I'd like to make a donation to thank you for your help. Thanks again!
uncle_muddy
March 25, 2014, 11:58:01 AM
 #17

Glad to hear that your problem is solved. I don't understand what happened: my rig was running fine, then all of a sudden it started playing this silly game and wouldn't stop until I had removed the network managers. Another rig running from the same .img of BAMT has no problems at all.

Well, at least it's fixed, which is the main thing.

That's very kind of you, sir. The address is DRfF5scEP7CU6c6rdk1hkRfGzkYWmSfXhc

UM
PeerMedia (OP)
March 25, 2014, 04:44:19 PM
 #18

Quote from: uncle_muddy on March 25, 2014, 11:58:01 AM
Glad to hear that your problem is solved. I don't understand what happened: my rig was running fine, then all of a sudden it started playing this silly game and wouldn't stop until I had removed the network managers. Another rig running from the same .img of BAMT has no problems at all.

Well, at least it's fixed, which is the main thing.

That's very kind of you, sir. The address is DRfF5scEP7CU6c6rdk1hkRfGzkYWmSfXhc

UM

Yeah, I had set up 4 other rigs with identical setups and the same version of BAMT, but this specific motherboard had this issue and it was brutal to diagnose. I was going to RMA the mobo until this fix worked. Thanks again, and the doge has been sent. Thanks for your help.
uncle_muddy
March 26, 2014, 10:44:46 AM
 #19

I've only just noticed that your mobo is an ASRock 970 Extreme4; the one giving me grief was an ASRock 970 Extreme3. I wonder if there is a connection there somehow?

DOGE received with thanks. Will spend it on a beer at the weekend.

UM
PeerMedia (OP)
March 26, 2014, 06:08:06 PM
 #20

The other 4 rigs I set up were all Z77 Extreme4s, never an issue. I only bought this one because it was $50 cheaper with a rebate from Newegg. I figured, what harm could going AMD instead of Intel do? Great thought, lol.