rwtrader (OP)
Newbie
Offline
Activity: 32
Merit: 0
|
|
November 20, 2017, 06:53:05 PM |
|
Can't seem to diagnose why my rig is rebooting itself twice a day. Running a mix of 6 GPUs, all Nvidia, on a Biostar TB250-BTC, with GPU temps running well under 70C. Anyone else have an issue like this, and what was the culprit? Thanks!
|
|
|
|
Deathwing
Legendary
Offline
Activity: 1638
Merit: 1329
Stultorum infinitus est numerus
|
|
November 20, 2017, 06:53:34 PM |
|
What are the specs? Most importantly, the operating system?
|
|
|
|
Miderian
Member
Offline
Activity: 72
Merit: 10
|
|
November 20, 2017, 07:12:27 PM |
|
Hm, interesting. Try running your fans faster, not just by a percent or two but above 80%, and leave them like that for a few hours. Something is not right, though.
|
|
|
|
percy_tc
|
|
November 20, 2017, 07:22:55 PM |
|
Can't seem to diagnose why my rig is rebooting itself twice a day. Running a mix of 6 GPUs, all Nvidia, on a Biostar TB250-BTC, with GPU temps running well under 70C. Anyone else have an issue like this, and what was the culprit? Thanks!
1. Check the latest miner log.
2. Check that all of the extra power plugs are connected to the motherboard.
3. I guess you use 2 separate PSUs. Make sure that one PSU powers all of the motherboard power inputs (ATX, CPU, 2x 4-pin on the mobo) and the second PSU powers only the video cards and risers.
|
|
|
|
rwtrader (OP)
Newbie
Offline
Activity: 32
Merit: 0
|
|
November 20, 2017, 07:38:26 PM |
|
Thanks all! Here are some answers.
OS: Win 10 Pro
Fans are all at 100
Will look at logs
All power is connected to the mobo, including the PCIe supplemental power
Using a 1000W PSU for the mobo and the 3 demanding GPUs, and a 500W PSU for the less demanding GPUs. They draw about 500W and 350W respectively.
Does this help for diagnosis? Thanks
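For reference, the draw figures above can be expressed as a fraction of each PSU's rated wattage. This is only a rough sketch: the 80% ceiling is an assumed rule of thumb that is not from this thread, and the rated wattage says nothing about the quality or age of a unit.

```python
# Assumed rule of thumb (not from the thread): keep continuous load
# at or below 80% of the PSU's rated wattage.
ASSUMED_MAX_CONTINUOUS_LOAD = 0.80


def load_fraction(draw_watts, rated_watts):
    """Return measured draw as a fraction of the PSU's rated wattage."""
    if rated_watts <= 0:
        raise ValueError("rated_watts must be positive")
    return draw_watts / rated_watts


def check_psus(psus, limit=ASSUMED_MAX_CONTINUOUS_LOAD):
    """Map each PSU label to (load fraction, True if within the limit)."""
    report = {}
    for label, (draw, rated) in psus.items():
        frac = load_fraction(draw, rated)
        report[label] = (frac, frac <= limit)
    return report


# Figures quoted in the post: about 500W on the 1000W unit, 350W on the 500W unit.
RIG = {"1000W PSU": (500, 1000), "500W PSU": (350, 500)}

if __name__ == "__main__":
    for label, (frac, ok) in check_psus(RIG).items():
        status = "within" if ok else "over"
        print(f"{label}: {frac:.0%} load, {status} the assumed limit")
```

By these numbers both units sit under the assumed ceiling, so if a PSU is at fault it would more likely be down to its condition than to its nameplate capacity.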
|
|
|
|
fapar
Sr. Member
Offline
Activity: 1414
Merit: 270
|
|
November 20, 2017, 08:41:39 PM |
|
You wrote that you will look at the miner log. You also need to check the system log: compmgmt.msc -> Event Viewer -> Windows Logs -> System
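The same System log can also be read from a script through the built-in `wevtutil` tool. The sketch below is hedged: the choice of event IDs is an assumption about which entries are worth checking first (41 for a Kernel-Power unexpected reboot, 1001 for a BugCheck report, 6008 for an unexpected previous shutdown), and the function names are made up for illustration.

```python
import subprocess
import sys

# Event IDs picked for illustration (an assumption, not from the thread).
REBOOT_EVENT_IDS = (41, 1001, 6008)


def build_query_command(count=20, event_ids=REBOOT_EVENT_IDS):
    """Build a wevtutil command listing the newest matching System events."""
    id_filter = " or ".join(f"EventID={i}" for i in event_ids)
    xpath = f"*[System[({id_filter})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",   # XPath filter on event ID
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # newest events first
        "/f:text",       # human-readable text output
    ]


def query_system_log(count=20):
    """Run the query and return its text output (Windows only)."""
    result = subprocess.run(
        build_query_command(count), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    if sys.platform == "win32":
        print(query_system_log())
    else:
        # Off Windows, just show the command that would be run.
        print(" ".join(build_query_command()))
```

Comparing the timestamps of these events with the times the miner stopped shows whether a reboot was a bugcheck, a power loss, or something else.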
|
|
|
|
wacko
Legendary
Offline
Activity: 1106
Merit: 1014
|
|
November 20, 2017, 08:54:35 PM |
|
Can't seem to diagnose why my rig is rebooting itself twice a day. Running a mix of 6 GPUs, all Nvidia, on a Biostar TB250-BTC, with GPU temps running well under 70C. Anyone else have an issue like this, and what was the culprit? Thanks!
Is this a new rig that you just built? Or was it working fine for some time and started rebooting only recently? There's not that many good 500W PSUs out there, most of them are crap, so if you're powering 3 cards with one of those, that might be your problem. Even though the cards only take 350W, that could still be too much for a cheap PSU. Then again, you didn't mention the exact specs so we're all guessing here.
|
|
|
|
Lampaster
|
|
November 20, 2017, 08:58:03 PM |
|
There can be many reasons for this. Maybe you need to call a specialist. Windows keeps a log of faults, and if there was a restart, information about it should appear in that log. I also want to draw your attention to the fact that 70 degrees is a high temperature. My GPU has never heated above 58 degrees. Perhaps the driver is monitoring the temperature and protecting your GPU from overheating.
|
|
|
|
rwtrader (OP)
Newbie
Offline
Activity: 32
Merit: 0
|
|
November 20, 2017, 09:59:13 PM |
|
I don't see anything out of the ordinary in the Claymore logs. Win says:
A corrected hardware error has occurred
Component: PCIE Root Port
Error Source: Advanced Error Reporting (PCIE)
Bus Device Function 0x0:0x1C:0x6
VendorID:DeviceID: 0x8086:0xA296
Class Code: 0x30400
New rig, working fine for a couple of days. The 500W is an Antec, but it is a couple of years old.
After I learn and exhaust all my resources, I might have to call someone. All psus are at 100 with temps running from 49-66C. Adding another fan today.
Thanks for the feedback. I hope the Win log points us to something.
|
|
|
|
wacko
Legendary
Offline
Activity: 1106
Merit: 1014
|
|
November 20, 2017, 10:07:02 PM |
|
Win says A corrected hardware error has occurred Component: PCIE Root Port Error Source: Advanced Error Reporting (PCIE) Bus Device Function 0x0:0x1C:0x6 VendorID:DeviceID: 0x8086:0xA296 Class Code: 0x30400
It's hard to decode these, might be the motherboard, but more likely problems with the risers (one or more might be faulty). Try changing them if you have spares.
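The numeric fields in that event can at least be pulled apart mechanically. The sketch below parses the text as posted and looks the vendor ID up in a tiny table. The parser and its names are illustrative assumptions; the only facts relied on are the well-known PCI vendor IDs (0x8086 is Intel, 0x10DE is NVIDIA, 0x1002 is AMD/ATI).

```python
import re

# Fields as posted in the thread.
WHEA_TEXT = (
    "Component: PCIE Root Port Error Source: Advanced Error Reporting (PCIE) "
    "Bus Device Function 0x0:0x1C:0x6 VendorID:DeviceID: 0x8086:0xA296 "
    "Class Code: 0x30400"
)

# A few well-known PCI vendor IDs.
KNOWN_VENDORS = {0x8086: "Intel", 0x10DE: "NVIDIA", 0x1002: "AMD/ATI"}

HEX = r"(0x[0-9A-Fa-f]+)"


def parse_whea(text):
    """Extract bus/device/function and vendor/device IDs from the event text."""
    bdf = re.search(rf"Bus\W*Device\W*Function:?\s*{HEX}:{HEX}:{HEX}", text)
    ids = re.search(rf"Vendor\s*ID:Device\s*ID:?\s*{HEX}:{HEX}", text)
    if not bdf or not ids:
        raise ValueError("expected fields not found in event text")
    bus, device, function = (int(x, 16) for x in bdf.groups())
    vendor_id, device_id = (int(x, 16) for x in ids.groups())
    return {
        "bus": bus,
        "device": device,
        "function": function,
        "vendor_id": vendor_id,
        "device_id": device_id,
        "vendor_name": KNOWN_VENDORS.get(vendor_id, "unknown"),
    }


if __name__ == "__main__":
    print(parse_whea(WHEA_TEXT))
```

The vendor comes out as Intel rather than NVIDIA, which fits the "Root Port" component: the device reporting the error sits on the motherboard side of the link, and whatever is attached below it (riser, cable, card) could still be the source.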
|
|
|
|
dawidt
Jr. Member
Offline
Activity: 54
Merit: 10
|
|
November 20, 2017, 10:56:31 PM |
|
check risers
|
|
|
|
cpmcgrat
Member
Offline
Activity: 223
Merit: 21
DCAB
|
|
November 21, 2017, 12:37:16 AM |
|
My rig was rebooting a couple of times per day due to issues with memory sharding/overflow (my RAM was a barebones 4GB with 16GB of swap). The error code reported to the event log was 0x116. I was able to solve this and increase stability by upgrading the rig to 16GB of RAM (enough to hold any DAG files or buffer any I/O from the GPUs without having to use the SSD as swap). After doing this, my machine went from rebooting itself 1-2 times a day to being alive and well for the past 2 weeks straight.
If you're running Windows, you can find the event logs at Event Viewer > Windows Logs > System. Below is the error I was seeing that tipped me off.
The computer has rebooted from a bugcheck. The bugcheck was: 0x00000116 (0xffffe1842ec0b250, 0xfffff802ff76f7d8, 0xffffffffc000009a, 0x0000000000000004). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: b139f2b1-3d17-48a3-a61f-493013377152.
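The bugcheck line can be decoded a little further. Bugcheck 0x116 is VIDEO_TDR_FAILURE, and as far as I understand Microsoft's documentation, its third parameter carries the NTSTATUS of the last failed operation. The sketch below is a minimal decoder with deliberately tiny lookup tables. The function name is made up for illustration.

```python
import re

# Only the codes needed for this example; not a complete table.
BUGCHECK_NAMES = {0x116: "VIDEO_TDR_FAILURE"}
NTSTATUS_NAMES = {0xC000009A: "STATUS_INSUFFICIENT_RESOURCES"}

LINE = (
    "The bugcheck was: 0x00000116 (0xffffe1842ec0b250, 0xfffff802ff76f7d8, "
    "0xffffffffc000009a, 0x0000000000000004)."
)


def decode_bugcheck(line):
    """Parse the bugcheck code and parameters from an event log line."""
    m = re.search(r"bugcheck was:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)", line)
    if not m:
        raise ValueError("no bugcheck found in line")
    code = int(m.group(1), 16)
    params = [int(p.strip(), 16) for p in m.group(2).split(",")]
    result = {
        "code": code,
        "name": BUGCHECK_NAMES.get(code, "unknown"),
        "params": params,
    }
    # For 0x116 the third parameter is read as an NTSTATUS; the value is
    # sign-extended to 64 bits, so keep only the low 32 bits.
    if code == 0x116 and len(params) >= 3:
        status = params[2] & 0xFFFFFFFF
        result["status"] = status
        result["status_name"] = NTSTATUS_NAMES.get(status, "unknown")
    return result


if __name__ == "__main__":
    print(decode_bugcheck(LINE))
```

Here the status works out to 0xC000009A, STATUS_INSUFFICIENT_RESOURCES, which is at least consistent with a machine that was short on memory.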
|
Member, Dero Community Advisory Board (DCAB)
|
|
|
fanatic26
|
|
November 21, 2017, 12:42:19 AM |
|
dump Windows and mine on linux if you want reliability and stability
|
Stop buying industrial miners, running them at home, and then complaining about the noise.
|
|
|
dagarair
|
|
November 21, 2017, 12:47:14 AM |
|
Test incrementally: 1 card for 24 hours with no reboot, then 2 cards for 24 hours, and so on.
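Written out as a schedule, that looks roughly like the sketch below. The 6-card total comes from the first post; the function names and the pass/fail bookkeeping are assumptions made for illustration.

```python
def incremental_plan(total_cards=6, soak_hours=24):
    """Return (cards installed, hours to run) for each step."""
    return [(n, soak_hours) for n in range(1, total_cards + 1)]


def first_failing_step(results):
    """Given per-step results (True = no reboot), return the card count
    at which the rig first rebooted, or None if every step passed."""
    for cards, survived in enumerate(results, start=1):
        if not survived:
            return cards
    return None


if __name__ == "__main__":
    for cards, hours in incremental_plan():
        print(f"Run {cards} card(s) for {hours} hours and note any reboot")
    # Hypothetical outcome: steps 1 and 2 pass, step 3 reboots.
    print("First failure at card count:", first_failing_step([True, True, False]))
```

Whatever was added at the first failing step (the card, its riser, its power cable, or the slot) becomes the prime suspect.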
|
|
|
|
cpmcgrat
Member
Offline
Activity: 223
Merit: 21
DCAB
|
|
November 21, 2017, 12:55:18 AM |
|
dump Windows and mine on linux if you want reliability and stability
I mine on both. For my Nvidia cards I prefer Windows, since it is incredibly difficult to overclock/undervolt them on Linux systems.
|
|
|
|
rwtrader (OP)
Newbie
Offline
Activity: 32
Merit: 0
|
|
November 21, 2017, 01:41:44 AM |
|
I will try swapping risers as I bring on card by card. If I were more comfortable with Linux I would give it a try. I thought about the ram so I brought it up to 8. No effect...
|
|
|
|
wacko
Legendary
Offline
Activity: 1106
Merit: 1014
|
|
November 21, 2017, 01:58:25 AM |
|
I will try swapping risers as I bring on card by card. If I were more comfortable with Linux I would give it a try. I thought about the ram so I brought it up to 8. No effect...
Looking at your logs, I would say it's more likely to be a hardware problem, so adding more RAM or switching to Linux is not going to help. For now the main suspects are the board (less likely) and the risers (more likely).
|
|
|
|
rwtrader (OP)
Newbie
Offline
Activity: 32
Merit: 0
|
|
November 26, 2017, 06:48:03 PM |
|
Ok. Seemed to be working fine, no reboots for 3 days. I noticed the extension was getting warm, so I shut it down to use a larger gauge extension and now it's worse! Rebooting every few hours. Any clue why this would be? Thanks again.
|
|
|
|
wacko
Legendary
Offline
Activity: 1106
Merit: 1014
|
|
November 26, 2017, 06:52:21 PM |
|
I will try swapping risers as I bring on card by card.
... Ok. Seemed to be working fine, no reboots for 3 days. I noticed the extension was getting warm, so I shut it down to use a larger gauge extension and now it's worse! Rebooting every few hours. Any clue why this would be? Thanks again.
Did you try to do what was suggested to you? Changing risers at least?
|
|
|
|
rwtrader (OP)
Newbie
Offline
Activity: 32
Merit: 0
|
|
November 27, 2017, 02:28:30 PM |
|
Absolutely! That's how I got a good 3 days before I manually shut it down.
|
|
|
|
|