Bitcoin Forum
June 23, 2024, 10:14:24 AM *
Author Topic: Rebooting 2x Per Day  (Read 672 times)
rwtrader (OP)
Newbie
Activity: 32
Merit: 0
November 20, 2017, 06:53:05 PM
 #1

Can't seem to diagnose why my rig is rebooting itself twice a day. Running a mix of 6 GPUs, all Nvidia, on a Biostar TB250-BTC with GPU temps running well under 70C. Has anyone else had an issue like this, and what was the culprit? Thanks!
Deathwing
Legendary
Activity: 1638
Merit: 1328
Stultorum infinitus est numerus
November 20, 2017, 06:53:34 PM
 #2

What are the specs? Most importantly, the operating system?
Miderian
Member
Activity: 72
Merit: 10
November 20, 2017, 07:12:27 PM
 #3

Hm, interesting. Try speeding up your fans, not just by a percent or two but to above 80%, and leave them like that for a few hours. Something is not right, though.

percy_tc
Full Member
Activity: 583
Merit: 106
November 20, 2017, 07:22:55 PM
 #4

Quote
Can't seem to diagnose why my rig is rebooting itself twice a day. Running a mix of 6 GPUs, all Nvidia, on a Biostar TB250-BTC with GPU temps running well under 70C. Has anyone else had an issue like this, and what was the culprit? Thanks!

1. Check the latest miner log.
2. Check that all of your extra power plugs are connected to the motherboard.
3. I guess you use 2 separate PSUs. Make sure that one PSU powers all of the motherboard power inputs (ATX, CPU, 2x 4-pin on the mobo) and the second PSU powers only the VGA cards and risers.



rwtrader (OP)
Newbie
Activity: 32
Merit: 0
November 20, 2017, 07:38:26 PM
 #5

Thanks all! Here are some answers.
OS: Win 10 Pro
Fans are all at 100%
Will look at logs
All power connected to mobo, including PCIe supplemental power
Using a 1000W PSU for the mobo and 3 demanding GPUs, and a 500W PSU for the less demanding GPUs. They draw about 500W and 350W respectively.
Does this help for diagnosis? Thanks
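
For reference, the figures above can be turned into a load percentage per PSU. This is only a minimal Python sketch of the arithmetic; the function name `psu_load_percent` is made up for illustration, and the wattages are the approximate ones quoted in the post.

```python
def psu_load_percent(draw_watts: float, rated_watts: float) -> float:
    """Return the draw as a percentage of the PSU's rated output."""
    if rated_watts <= 0:
        raise ValueError("rated_watts must be positive")
    return 100.0 * draw_watts / rated_watts


# Figures quoted in the post: (approximate draw, rated output) in watts.
psus = {"1000W unit": (500, 1000), "500W unit": (350, 500)}

for name, (draw, rated) in psus.items():
    print(f"{name}: {psu_load_percent(draw, rated):.0f}% of rated output")
```

By this arithmetic the 1000W unit sits at about 50% and the 500W unit at about 70% of its rating. Whether that is comfortable depends on the quality and age of the PSU, which the numbers alone don't show.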
fapar
Sr. Member
Activity: 1414
Merit: 267
November 20, 2017, 08:41:39 PM
 #6

You wrote that you will look at the miner log, but you also need to check the system log: compmgmt.msc -> Event Viewer -> Windows Logs -> System
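
The same System log can also be read from a command line with Windows' `wevtutil` tool. Below is a rough Python sketch that builds such a query; the choice of event IDs (41 Kernel-Power, 1001 BugCheck, 6008 unexpected shutdown) is an assumption about what is relevant to random reboots, and the helper name `build_wevtutil_command` is made up for illustration.

```python
import subprocess
import sys

# Event IDs often associated with unexpected restarts (assumed relevant here):
# 41 = Kernel-Power, 1001 = BugCheck, 6008 = unexpected shutdown.
REBOOT_EVENT_IDS = (41, 1001, 6008)


def build_wevtutil_command(event_ids=REBOOT_EVENT_IDS, count=20):
    """Build a wevtutil command listing the newest matching System events."""
    id_filter = " or ".join(f"EventID={eid}" for eid in event_ids)
    query = f"*[System[({id_filter})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{query}",   # XPath filter on event IDs
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # newest events first
        "/f:text",       # plain-text output
    ]


if __name__ == "__main__":
    cmd = build_wevtutil_command()
    if sys.platform == "win32":
        # On Windows, run the query and print whatever the System log returns.
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)
    else:
        # Elsewhere, just show the command that would be run.
        print(" ".join(cmd))
```

Hardware error entries like PCIe errors are logged under a different source, so browsing the System log in Event Viewer as described above is still worth doing.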

wacko
Legendary
Activity: 1106
Merit: 1014
November 20, 2017, 08:54:35 PM
 #7

Quote
Can't seem to diagnose why my rig is rebooting itself twice a day. Running a mix of 6 GPUs, all Nvidia, on a Biostar TB250-BTC with GPU temps running well under 70C. Has anyone else had an issue like this, and what was the culprit? Thanks!

Is this a new rig that you just built? Or was it working fine for some time and started rebooting only recently?
There's not that many good 500W PSUs out there, most of them are crap, so if you're powering 3 cards with one of those, that might be your problem. Even though the cards only take 350W, that could still be too much for a cheap PSU. Then again, you didn't mention the exact specs so we're all guessing here.
Lampaster
Sr. Member
Activity: 406
Merit: 255
November 20, 2017, 08:58:03 PM
 #8

There can be many reasons for this. Maybe you need to call a specialist. Windows keeps a log of faults, and if there was a restart, information about it should appear in that log. I also want to point out that 70 degrees is a high temperature. My GPU never got above 58 degrees. Perhaps the driver is controlling the temperature and protecting your GPU from overheating.
rwtrader (OP)
Newbie
Activity: 32
Merit: 0
November 20, 2017, 09:59:13 PM
 #9

I don't see anything out of the ordinary in the Claymore logs.
Win says
A corrected hardware error has occurred
Component: PCIE Root Port
Error Source: Advanced Error Reporting (PCIE)
Bus Device Function  0x0:0x1C:0x6
VendorID:DeviceID: 0x8086:0xA296
Class Code: 0x30400

New rig working fine for a couple of days
500w is an Antec but it is a couple of years old

After I learn and exhaust all my resources, I might have to call someone
All psus are at 100, with temps running from 49-66C. Adding another fan today.

Thanks for the feedback.  I hope the Win log points us to something.
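
The fields in that Windows entry can be pulled apart mechanically. A minimal Python sketch, assuming the text is pasted exactly as above; the function name `parse_whea_fields` is made up for illustration, and the only vendor ID it knows is 0x8086 (Intel).

```python
import re

# Text as pasted in the post above.
WHEA_TEXT = """\
Component: PCIE Root Port
Error Source: Advanced Error Reporting (PCIE)
Bus Device Function  0x0:0x1C:0x6
VendorID:DeviceID: 0x8086:0xA296
Class Code: 0x30400
"""

# Deliberately tiny lookup: anything not listed is reported as unknown.
KNOWN_VENDORS = {0x8086: "Intel"}

HEX = r"(0x[0-9A-Fa-f]+)"


def parse_whea_fields(text: str) -> dict:
    """Extract bus/device/function and vendor/device IDs from the pasted text."""
    bdf = re.search(rf"Bus\W*Device\W*Function\W+{HEX}:{HEX}:{HEX}", text)
    ids = re.search(rf"VendorID:DeviceID:\s*{HEX}:{HEX}", text)
    if bdf is None or ids is None:
        raise ValueError("expected fields not found")
    bus, device, function = (int(g, 16) for g in bdf.groups())
    vendor_id, device_id = (int(g, 16) for g in ids.groups())
    return {
        "bdf": f"{bus:02x}:{device:02x}.{function}",
        "vendor_id": vendor_id,
        "device_id": device_id,
        "vendor": KNOWN_VENDORS.get(vendor_id, "unknown"),
    }


fields = parse_whea_fields(WHEA_TEXT)
print(fields["bdf"], fields["vendor"], hex(fields["device_id"]))
```

The vendor ID 0x8086 is Intel, so the device reporting the error is a root port on the board side of a PCIe link rather than one of the Nvidia cards. That narrows it to one slot's link (board, riser, cable or the card behind it) but doesn't say which part of that link is at fault.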
wacko
Legendary
Activity: 1106
Merit: 1014
November 20, 2017, 10:07:02 PM
 #10

Quote
Win says
A corrected hardware error has occurred
Component: PCIE Root Port
Error Source: Advanced Error Reporting (PCIE)
Bus Device Function  0x0:0x1C:0x6
VendorID:DeviceID: 0x8086:0xA296
Class Code: 0x30400

It's hard to decode these, might be the motherboard, but more likely problems with the risers (one or more might be faulty). Try changing them if you have spares.
dawidt
Jr. Member
Activity: 54
Merit: 10
November 20, 2017, 10:56:31 PM
 #11

check risers
cpmcgrat
Member
Activity: 223
Merit: 21
DCAB
November 21, 2017, 12:37:16 AM
 #12

For me, my rig was rebooting a couple of times per day due to issues with memory sharding/overflow (my RAM was a barebones 4GB w/ 16GB swap). The error code reported to the error monitor was 0x116. I was able to solve this and increase stability by upgrading the rig to 16GB of RAM (enough to hold any DAG files or buffer up any I/O from the GPUs without having to use the SSD as swap). After doing this, my machine went from rebooting itself 1-2 times a day to being alive and well for the past 2 weeks straight.

If you're running Windows, you can find the event logs at Event Viewer > Windows Logs > System. Below is the error I was seeing that tipped me off.

Quote
The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000116 (0xffffe1842ec0b250, 0xfffff802ff76f7d8, 0xffffffffc000009a, 0x0000000000000004). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: b139f2b1-3d17-48a3-a61f-493013377152.
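
To pull the code and parameters out of a message like the one quoted, something along these lines works. It is a minimal Python sketch: `parse_bugcheck` is a made-up helper name, and the lookup table holds only the one code discussed here (0x116 is documented by Microsoft as VIDEO_TDR_FAILURE).

```python
import re

# Message text as quoted above (Report Id omitted).
MESSAGE = (
    "The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000116 "
    "(0xffffe1842ec0b250, 0xfffff802ff76f7d8, 0xffffffffc000009a, 0x0000000000000004). "
    "A dump was saved in: C:\\Windows\\MEMORY.DMP."
)

# Not a complete table, just the code from this thread.
BUGCHECK_NAMES = {0x116: "VIDEO_TDR_FAILURE"}


def parse_bugcheck(message: str):
    """Return (bugcheck code, list of parameters) parsed from the event text."""
    match = re.search(r"bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)", message)
    if match is None:
        raise ValueError("no bugcheck code found")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


code, params = parse_bugcheck(MESSAGE)
print(hex(code), BUGCHECK_NAMES.get(code, "unknown"))
print([hex(p) for p in params])
```

A 0x116 means the video driver stopped responding and could not be recovered in time, which is a different kind of entry from a PCIe hardware error, so it's worth checking which of the two a given rig is actually logging.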

Member, Dero Community Advisory Board (DCAB)
fanatic26
Hero Member
Activity: 756
Merit: 560
November 21, 2017, 12:42:19 AM
 #13

dump Windows and mine on linux if you want reliability and stability

Stop buying industrial miners, running them at home, and then complaining about the noise.
dagarair
Sr. Member
Activity: 847
Merit: 383
November 21, 2017, 12:47:14 AM
 #14

Run 1 card for 24 hours.
If there's no reboot, run 2 cards for 24 hours.
Etc., etc.
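
The card-by-card approach above can be written down as a simple loop. This is only a Python sketch of the procedure: `ran_without_reboot` is a hypothetical callback standing in for "run the rig with exactly these cards for 24 hours and see whether it stayed up", and the card names are invented.

```python
from typing import Callable, List, Optional, Sequence


def find_first_unstable(
    cards: Sequence[str],
    ran_without_reboot: Callable[[List[str]], bool],
) -> Optional[str]:
    """Add cards one at a time; return the card whose addition first causes a reboot."""
    active: List[str] = []
    for card in cards:
        active.append(card)
        if not ran_without_reboot(active):
            # The newest card (or its riser, cable or slot) is the suspect.
            return card
    return None  # every configuration stayed up


# Simulated example: pretend the rig reboots whenever "gpu4" is connected.
cards = [f"gpu{i}" for i in range(1, 7)]
suspect = find_first_unstable(cards, lambda active: "gpu4" not in active)
print(suspect)
```

One caveat: if the cause is total power draw rather than one bad card or riser, the card this returns is just the point where the total crossed the limit, so it's worth re-testing with the cards in a different order.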

cpmcgrat
Member
Activity: 223
Merit: 21
DCAB
November 21, 2017, 12:55:18 AM
 #15

Quote
dump Windows and mine on linux if you want reliability and stability

I mine on both. For my Nvidia cards I prefer Windows, since it is incredibly difficult to overclock/undervolt them on Linux systems.

rwtrader (OP)
Newbie
Activity: 32
Merit: 0
November 21, 2017, 01:41:44 AM
 #16

I will try swapping risers as I bring on card by card.
If I were more comfortable with Linux I would give it a try.
I thought about the ram so I brought it up to 8.  No effect...
wacko
Legendary
Activity: 1106
Merit: 1014
November 21, 2017, 01:58:25 AM
 #17

Quote
I will try swapping risers as I bring on card by card.
If I were more comfortable with Linux I would give it a try.
I thought about the ram so I brought it up to 8.  No effect...

Looking at your logs, I would say it's more likely to be a hardware problem, so adding more RAM or switching to Linux is not going to help. For now the main suspects are the board (less likely) and the risers (more likely).
rwtrader (OP)
Newbie
Activity: 32
Merit: 0
November 26, 2017, 06:48:03 PM
 #18

Ok. Seemed to be working fine, no reboots for 3 days. I noticed the extension was getting warm, so I shut it down to use a larger gauge extension and now it's worse! Rebooting every few hours. Any clue why this would be? Thanks again.
wacko
Legendary
Activity: 1106
Merit: 1014
November 26, 2017, 06:52:21 PM
 #19

Quote
I will try swapping risers as I bring on card by card.

...

Ok. Seemed to be working fine, no reboots for 3 days. I noticed the extension was getting warm, so I shut it down to use a larger gauge extension and now it's worse! Rebooting every few hours. Any clue why this would be? Thanks again.

Did you try to do what was suggested to you? Changing risers at least?
rwtrader (OP)
Newbie
Activity: 32
Merit: 0
November 27, 2017, 02:28:30 PM
 #20

Absolutely! That's how I got a good 3 days before I manually shut it down.