Bitcoin Forum
Author Topic: mining rig keeps dying. Too hot?  (Read 1155 times)
krazitrain (OP)
Newbie
*
Offline Offline

Activity: 15
Merit: 0


View Profile
July 28, 2017, 10:33:21 PM
Last edit: July 28, 2017, 10:56:21 PM by krazitrain
 #1

I just got into this last month. Hope this is the right place to post this.

here's my setup:

Custom-built case 30" x 18" x 12"

rig 1
8 PNY GTX 1060 3 GB    MPN:   VCGGTX10603PB
ASUS PRIME Z270-A LGA 1151
Windows 10
EVGA SuperNOVA 1000 G3 powering five cards and the risers, motherboard and SSD
EVGA 850 BQ just powering three of the cards

rig 2
8 PNY GTX 1060 3 GB    MPN:   VCGGTX10603XGPB-OC-BB  XLR8 Edition
ASUS PRIME Z270-A LGA 1151
Windows 10
EVGA SuperNOVA 1000 G3 powering five cards and the risers, motherboard and SSD
EVGA 850 BQ just powering three of the cards

I have the same setup on both rigs, except for different versions of the card. On rig 1, I was able to take the power limit down to 68, with core clock +46, memory clock +456, and fan speed 45. It runs in the mid-60s Celsius with no problems. I do have it in a larger room.

Rig 2 is in a smaller room. I have all eight cards working, but it keeps shutting off. I am unable to take the power limit down. These are fancier cards with a backplate. I have fans running on them. I also have the window open and am trying to blow the air out the window. I've tried everything and I can't get this rig to stay stable.

It's 80°F in the small room. My air conditioner is in the main room where the other rig is. I was hoping to put both rigs in the smaller room. The cards are getting plenty of airflow.

Would Linux be better?
Should I start pulling cards off one at a time to see if it is stable then?

I ordered an air conditioner for small rooms. Will that help?

It is supposed to be in the mid-90s here all week, and who knows what August or beyond will bring. Thanks for any help. I used to watch Gold Rush on Discovery, and now I know why those guys get so antsy when their mining rigs are not working.



igotfits
Full Member
***
Offline Offline

Activity: 298
Merit: 100


View Profile
July 28, 2017, 11:08:14 PM
Last edit: July 28, 2017, 11:37:16 PM by igotfits
 #2

What are the temps on the 2nd rig? I only see temps for the 1st rig.

When you say the system dies, what happens? Does it stutter? Freeze? Can't use it at all and need a forced restart?

I have a buddy running the same cards. I suggest Core 0, Mem 400 (Samsung). Sorry to say he isn't able to push his cards over that; the system just crashes. Run these settings to get stability first, then experiment.
You can also start removing cards one by one and pushing each card individually.
Za1n
Legendary
*
Offline Offline

Activity: 1078
Merit: 1011



View Profile
July 28, 2017, 11:22:11 PM
 #3

On your problem rig, yes go ahead and start removing the cards as it is either heat as you suspect, or possibly one of the cards is acting up.

You could start by removing two cards and, if you can, adjusting the spacing between the remaining ones and let it run for 24 hours or so and see if that was the issue. If that doesn't solve it, then continue to swap out the cards until you can narrow it down to either a card, or if they all have been swapped and the trouble remains, the system.

Another common fix is to remove and reinstall the drivers. I could not tell from your post whether you have tried this already. I assume you did since you suspect the problem is heat related, but if not, it could be an easy thing to try just to rule it out.
krazitrain (OP)
Newbie
*
Offline Offline

Activity: 15
Merit: 0


View Profile
July 28, 2017, 11:28:02 PM
 #4

Quote from: igotfits
I really don't believe it's a temp thing. Mid-60s C is perfectly fine; I'm running 60-65 C across all my cards.

When you say the system dies, what happens? Does it stutter? Freeze? Can't use it at all and need a forced restart?


The rig in the small room freezes. Sometimes I can't move the mouse. The screen goes black and I just have to power it down and force a restart.

A couple of cards run at 78 degrees, a couple at 75 degrees, and some near 72 degrees, depending on where I put the big fan.

I've reinstalled the drivers a few times.
igotfits
Full Member
***
Offline Offline

Activity: 298
Merit: 100


View Profile
July 28, 2017, 11:33:44 PM
 #5

I have the same board, fantastic board BTW, with the latest drivers of course. I believe those cards are running hot. I don't like to see anything over 75 C, though damage doesn't start to kick in until you're nearing 90 C.
The slider alone is weird as heck! You should always have control of that thing no matter what. I kind of want to say software issue, but it can also be hardware that isn't working properly.
You can start by lowering to the settings I sent you. If that doesn't fix it, start removing cards one by one, then swap risers. Process of elimination.
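
For anyone who wants to sort readings against that rule of thumb, here is a minimal sketch. The 75 C and 90 C thresholds are this poster's personal limits as stated above, not a manufacturer specification, and the function and constant names are made up for illustration. The three sample values are the temperatures reported for the unstable rig.

```python
# Thresholds from the post above: one poster's rule of thumb,
# not an official specification for the GTX 1060.
WARN_C = 75
DANGER_C = 90


def classify(temp_c):
    """Label a GPU temperature as 'ok', 'warm', or 'danger'."""
    if temp_c >= DANGER_C:
        return "danger"
    if temp_c > WARN_C:
        return "warm"
    return "ok"


# Representative temperatures reported for the rig in the small room.
for temp in (78, 75, 72):
    print(temp, classify(temp))
```

By this rule only the 78 C cards land in the "warm" bucket, which fits the view that heat alone may not explain the freezes.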
philipma1957
Legendary
*
Online Online

Activity: 4298
Merit: 8804


'The right to privacy matters'


View Profile WWW
July 28, 2017, 11:34:17 PM
 #6

If you cannot lower TDP in MSI Afterburner, something is wrong.

Go down to 1 card and see if the TDP slider works.

If it is stuck:

Uninstall MSI Afterburner and uninstall the Nvidia drivers.

Try to install Nvidia 382.33 with 1 card. Try to install MSI Afterburner 4.3 with 1 card.

See if the slider works.

Some advice: never, never, never run 100% TDP. Many will say I am wrong; I simply don't care.

If the slider does not work with the one card:

Try it in every slot or riser, just the one card.

If that card never lets the TDP slider work, walk it out of the room and try the next card. I have found that 1 card in a rig of 5 or 6, or even 4, can be the entire issue, and not using the one shit head card solves the issue most of the time.

krazitrain (OP)
Newbie
*
Offline Offline

Activity: 15
Merit: 0


View Profile
July 28, 2017, 11:56:25 PM
 #7

I will try the suggestions.

I am also going to bring the rig out to the bigger room where it is a little cooler and see if that changes things..


I currently have it set to 80 power with 400+ memory, and I have two cards running hotter at 74 and 75. The lowest are at 64, where the fans are probably hitting them. I'm sure it will freeze pretty soon.
JaredKaragen
Legendary
*
Offline Offline

Activity: 1848
Merit: 1166


My AR-15 ID's itself as a toaster. Want breakfast?


View Profile WWW
July 29, 2017, 12:29:39 AM
 #8

Be sure it's not the power supply overheating as well. Use a laser thermometer to check.

I have a 1000 W power supply that didn't like having my GTX 980 right against it in the Fractal Design Define R5 case. If it got pretty hot inside, it would randomly freeze or reboot.

Screwing a fan to the exhaust port in the back to pull more air through it helped immensely, but ultimately I removed the card that was butted right up against it, and I haven't had an issue since.

Link to my batch and script resources here.  

DO NOT TRUST YOBIT  -JK

Donations: 1Q8HjG8wMa3hgmDFbFHC9cADPLpm1xKHQM
joblo
Legendary
*
Offline Offline

Activity: 1470
Merit: 1114


View Profile
July 29, 2017, 12:51:13 AM
 #9

It's power, the temps are fine. The 1060 is rated at 120W and you have 8 of them and the rest of the system powered by a 1000 W PSU.
Pull 2 cards from each rig and build a new rig with them or replace the PSUs with a pair of 1600 W beasts.

AKA JayDDee, cpuminer-opt developer. https://github.com/JayDDee/cpuminer-opt
https://bitcointalk.org/index.php?topic=5226770.msg53865575#msg53865575
BTC: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT,
philipma1957
Legendary
*
Online Online

Activity: 4298
Merit: 8804


'The right to privacy matters'


View Profile WWW
July 29, 2017, 01:03:47 AM
 #10

He has 1000 W for 5 cards and 850 W for the other three.

He said he could not turn TDP down lower than 80%, so 8 cards x 120 W x 0.80 = 768 watts, with 1850 watts available.

He needs to see why the cards only drop to 80% TDP.

If he dropped them to 70%, he would be at 8 x 120 W x 0.70 = 672 watts and he would not overheat. That extra 96 watts of power is too hard to cool off.
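
To make the power-limit arithmetic easy to re-run, here is a rough sketch that multiplies the per-card rating by the power limit and by the number of cards. It assumes the 120 W rating quoted in this thread and all eight cards at the same limit, and it ignores the motherboard, CPU, risers, and PSU efficiency, so treat the numbers as a GPU-only estimate. The function name is made up for illustration.

```python
RATED_WATTS = 120        # per-card rating quoted in the thread
CARDS = 8                # cards in the rig
PSU_WATTS = 1000 + 850   # the two supplies listed in the first post


def gpu_draw(power_limit_pct, cards=CARDS, rated=RATED_WATTS):
    """Estimated combined GPU draw in watts at a given power limit (percent)."""
    return cards * rated * power_limit_pct / 100


for pct in (100, 80, 70):
    print(f"{pct}% limit: {gpu_draw(pct):.0f} W of {PSU_WATTS} W available")
```

With these assumptions the cards come to 960 W at 100%, 768 W at 80%, and 672 W at 70%, so going from 80% to 70% removes about 96 W of heat from the room.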

krazitrain (OP)
Newbie
*
Offline Offline

Activity: 15
Merit: 0


View Profile
July 29, 2017, 02:17:54 AM
 #11

I used a Kill A Watt to get a wattage reading from each power supply while it was mining. The 1000 W PSU is at 700 W and the 850 W PSU is at around 300 W, so 1000 W total. I took those readings when I started, without changing settings.

So now it's been running steady for two hours at power 80%, memory 400+. The two hottest cards are now at about 73 and 74°C. I'm holding my breath.

It's still 80° Fahrenheit in that room with the window open, and 89° Fahrenheit outside.
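
A quick way to read those wall measurements against each supply's rating is sketched below. A wall meter measures AC input, while PSU ratings describe DC output, so the sketch scales the reading by an efficiency figure. The 0.90 used here is an assumed round number, not a measured value for either supply, and the function name is hypothetical.

```python
def load_fraction(wall_watts, rated_watts, efficiency=0.90):
    """Estimate DC load as a fraction of the PSU's rated output.

    efficiency is an assumed round number, not a measurement.
    """
    return wall_watts * efficiency / rated_watts


# (wall reading in W, rated output in W) from the post above.
readings = {"1000 W PSU": (700, 1000), "850 W PSU": (300, 850)}
for name, (wall, rated) in readings.items():
    print(f"{name}: about {load_fraction(wall, rated):.0%} of rating")
```

Under that assumption the supplies sit at roughly 63% and 32% of their ratings. That suggests neither is overloaded in total, though it cannot rule out a defective unit or a single overloaded cable or connector.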
krazitrain (OP)
Newbie
*
Offline Offline

Activity: 15
Merit: 0


View Profile
July 29, 2017, 02:35:04 AM
 #12

2.5 hours. Must have been an error. Windows restarted.

So that's where I've been with this rig. It may run for two hours. It may run for four hours. It may run for one hour. But it won't run steady.

Thankfully the other rig is running like a champ.
joblo
Legendary
*
Offline Offline

Activity: 1470
Merit: 1114


View Profile
July 29, 2017, 02:45:30 AM
 #13

Quote from: philipma1957
he has 1000 for 5 cards and 850 for the other three.

Oops, that's what I get for skim reading. Still 75C is not that hot, and an overheated GPU won't cause a system shutdown.
Either the power draw between the PSUs is unbalanced and overloading one of them or one is defective and can't handle
a normal load.

There are a few things that can be done to troubleshoot and isolate the suspect component. Split the rig and try running it
with 5 cards and the big PSU, then try running it with the other 3 cards only using the second PSU. Swap cards with the other
rig. Swap PSUs. Just move components around to see if the problem follows.

Elder III
Sr. Member
****
Offline Offline

Activity: 1246
Merit: 274


View Profile
July 29, 2017, 03:42:04 AM
 #14

I would suspect that either the PSU cannot handle the load and eventually craps out, or you have a bad riser (or possibly a bad GPU). We have some PNY GPUs that will not go lower than an 80% power limit in Afterburner; I believe they are the same ones you have. I can drag the slider below 80%, but monitoring the power draw, both via software and via a Kill A Watt meter at the wall, shows that the GPUs still consume ~80% power even if the slider is at 60 or 70%. I chalked it up to a limit hardcoded into the BIOS and didn't investigate any further. The temps and profits have both justified running them at 80%, so I didn't worry about it (I had actually forgotten until I read this thread).
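
One possible way to do the software side of that check is to compare what each card draws with the limit it is set to, using nvidia-smi, which ships with the Nvidia driver. The sketch below parses its CSV output. The query fields are standard nvidia-smi fields, but whether a given GTX 1060 reports power.draw is not guaranteed, so unreadable values are skipped. The helper names and the 5 W tolerance are assumptions, and the sample text is made up for illustration, not taken from this thread.

```python
import subprocess

# Query fields documented by `nvidia-smi --help-query-gpu`.
QUERY = "index,name,power.draw,power.limit,power.min_limit,temperature.gpu"


def to_float(text):
    """Return a float, or None for values like '[Not Supported]'."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_rows(csv_text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 6:
            continue  # skip lines that do not match the query above
        idx, name, draw, limit, min_limit, temp = fields
        rows.append({
            "index": int(idx),
            "name": name,
            "draw_w": to_float(draw),
            "limit_w": to_float(limit),
            "min_limit_w": to_float(min_limit),
            "temp_c": to_float(temp),
        })
    return rows


def over_limit(rows, tolerance_w=5.0):
    """Indexes of cards drawing more than their set limit plus a tolerance."""
    return [r["index"] for r in rows
            if r["draw_w"] is not None and r["limit_w"] is not None
            and r["draw_w"] > r["limit_w"] + tolerance_w]


def read_gpus():
    """Run nvidia-smi on the rig itself and return its raw CSV output."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True)
    return result.stdout


# Made-up sample output for illustration only, not readings from this thread.
SAMPLE = """\
0, GeForce GTX 1060 3GB, 95.80, 84.00, 60.00, 74
1, GeForce GTX 1060 3GB, 82.10, 84.00, 60.00, 66
2, GeForce GTX 1060 3GB, [Not Supported], 84.00, 60.00, 70
"""
print(over_limit(parse_rows(SAMPLE)))
```

On the rig itself, over_limit(parse_rows(read_gpus())) would list the cards that ignore the slider. The min_limit_w column shows the lowest limit the card's BIOS will accept, which is one way to check the hardcoded-limit theory.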
igotfits
Full Member
***
Offline Offline

Activity: 298
Merit: 100


View Profile
July 29, 2017, 03:45:31 AM
 #15

I agree. I'm running 8 1060s on a 1000 W PSU pulling 910 watts and it runs fine. I suspect a riser or GPU.
JaredKaragen
Legendary
*
Offline Offline

Activity: 1848
Merit: 1166


My AR-15 ID's itself as a toaster. Want breakfast?


View Profile WWW
July 29, 2017, 11:30:40 PM
 #16

I really have my $$$ on a faulty PSU.

I am not experiencing my random freezes and restarts now that my PSU temp is back down to a reasonable level. Those types of freezes and reboots really point towards the PSU.


Secondarily, test the RAM with testing software. But honestly, nowadays power supplies cause all of the problems that memory used to back in the original SDRAM/1st-gen DDR days.

krazitrain (OP)
Newbie
*
Offline Offline

Activity: 15
Merit: 0


View Profile
August 02, 2017, 03:58:37 AM
 #17

Model Number: VCGGTX10603XGPB-OC-BB (PNY GTX 1060 3GB XLR8 version)

Yesterday I tore it down so that there were just two cards running on the 1000 W PSU. I ran it all day with a power limit of 70 and memory clock +476 set individually on both cards.
I added the third card when I went to bed and woke up to a frozen screen again. So I replaced the riser on the third card.

Six hours ago, I attached two more cards to the 1000 W PSU for 5 total. This time I set all five to power limit 70, memory clock +476, core +46.

I also reinstalled the Nvidia drivers, using the latest version, 384.94.

It's been running fine for the last six hours with card temperatures between 66 and 69°C. I don't have any fans on them and the window is slightly open. It's been 96° Fahrenheit out today. The air conditioner is in the other room.

471 W total.

I guess I'll let it run through the night and see what happens. Tomorrow I'll add the sixth card off the 850 W PSU.

So I'm not sure if it was a riser or settings. I just hope it's working well now.
philipma1957
Legendary
*
Online Online

Activity: 4298
Merit: 8804


'The right to privacy matters'


View Profile WWW
August 02, 2017, 04:08:12 AM
 #18

So these five let you do 70 percent tdp and ran overnight.

That sixth card could be the issue.
I have had issues like this.

I ran the oddball card in a one- or two-card rig and it worked well.


MingMining
Member
**
Offline Offline

Activity: 202
Merit: 10

Eloncoin.org - Mars, here we come!


View Profile
August 02, 2017, 04:36:41 AM
 #19

It may well be a sick card. Well, it is not really sick; it just cannot handle your overclock. Even if they are all the same model, one of them may not be able to survive the memory overclock. Actually, this is what happened to my rig.

1. Read the log and locate which GPU hung or errored first, say GPU 0.
2. Press 0 to disable GPU 0 in the rig (this is what I did).
3. Watch whether your rig is stable. If it is, congratulations, you found the sick card.
4. Otherwise, try the cards one by one until you find it.

Or you can just disable the watchdog in your bat file by setting -wd 0. In this case, the sick card will stop but your other cards keep working. Now touch the cards and feel the temperature, and you will know which one is bad.

If you cannot find a card to blame, then you may need to reinstall Windows and update to the latest version.

Hope this helps.
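
The one-by-one elimination in the steps above can be written down as a small sketch. Here is_stable is a hypothetical stand-in for "run the rig for several hours with only these cards enabled and see whether it freezes"; in practice that is a manual test, so the sketch simulates it with a pretend bad card. The card numbers and function names are assumptions for illustration.

```python
def find_bad_cards(cards, is_stable):
    """Disable one card at a time and collect the cards whose removal
    makes the rest of the rig stable."""
    suspects = []
    for card in cards:
        remaining = [c for c in cards if c != card]
        if is_stable(remaining):
            suspects.append(card)
    return suspects


# Simulation only: pretend card 5 cannot handle the overclock.
SIMULATED_BAD = {5}


def simulated_is_stable(active_cards):
    """Stand-in for a multi-hour soak test on the real rig."""
    return not (set(active_cards) & SIMULATED_BAD)


print(find_bad_cards(list(range(8)), simulated_is_stable))
```

This only pinpoints the problem when a single card is at fault. With two bad cards, removing either one alone still leaves an unstable rig, so the list comes back empty and cards would have to be pulled in pairs or tested alone.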

jenia1
Sr. Member
****
Offline Offline

Activity: 504
Merit: 267


HashWare - Mining solutions for everyone!


View Profile
August 02, 2017, 06:53:52 AM
 #20

Sorry, I didn't read all the replies, but I have experienced the same symptoms. For me it was crashing because of too much OC and high temps. I then had to manually underclock every card that caused the rig to crash. This is tedious work, I know, but it's the only way I managed to make it stable.
