Bitcoin Forum
December 06, 2016, 10:12:00 PM *
Author Topic: 5850 sudden runaway temperature  (Read 3784 times)
P4man
Hero Member
*****
Offline Offline

Activity: 504



View Profile
September 17, 2011, 10:26:31 PM
 #1

Did I fry my card?

Update: I did end up frying the card. The reason: monitoring voltage with two apps at once (GPU-Z, Everest and/or Afterburner) can trigger a bug in the 5xx0 cards that spikes the voltage to 1.65 V! Read below for details.


I have a 5850 with a huge Accelero Twin Turbo cooler running at 100%, using the stock cooler plate for the RAM and VRMs.
It's been mining under Ubuntu for 2 weeks with no problems: GPU below 50C, VRMs around 80C.

I booted into Windows for some stuff and kept mining. When I checked my temps, I got pretty much the same temps as in Ubuntu. A while later I checked again, and suddenly I saw the left side of the chart:



OUCH
VRMs above 120C, GPU throttling at 100C!
I immediately stopped the miner; that's the first decline you see.

First I thought the fans must have died or something, but strangely, the temps dropped back to normal idle temps. So call me crazy but some time later I launched the miner again and kept watching. For 20 minutes or so, I got perfectly normal loaded temps again. GPU below 50C, VRMS around 80C. Then BANG, VRM and GPU temps shoot up again almost instantly.

WTF?

Card still seems to work fine. I don't get it.

ssateneth
Legendary
*
Offline Offline

Activity: 1288



View Profile
September 18, 2011, 01:04:01 AM
 #2

Are the fans still spinning? Going to guess yes, because I see the motherboard temperature went up too. Monitor the voltage of your GPU. My guess is that something is causing it to shoot up to max voltage.

P4man
Hero Member
*****
Offline Offline

Activity: 504



View Profile
September 18, 2011, 11:41:25 AM
 #3

Get a utility which allows you to chart core/memory frequency and voltage.  I wonder if the card is spontaneously changing a setting leading to a massive increase in current = heat.

Seems extremely unlikely an increase in clockspeed would cause this. Notice how slowly the temperatures ramp up going from idle (which is like 300 MHz and doing nothing) to full load when I start mining, which is 725 MHz and FULL load. It still takes 10+ minutes for temps to go up.

Also, I only checked for like 1 second before shutting down the miner, but it was still working, though much slower than usual (IIRC around 200 MH/s instead of 300). Guess that's the throttling kicking in.

I suspect the voltage regulation is the problem: at some point the VRM gives out and causes a voltage spike, which in turn leads to a temperature spike that is instantaneous for the VRM and very fast for the rest of the GPU.

edit: doh, you mentioned current. I agree. Probably a VRM crapping out.

The card still seems to work for gaming (CPU temps under 35-40C and VRMs under 50C). I think I'll just stop mining and prepare for the card to die entirely.

deslok
Sr. Member
****
Offline Offline

Activity: 448


It's all about the game, and how you play it


View Profile
September 18, 2011, 03:04:47 PM
 #4

Come to think of it, this sounds a lot like when I had a 5770 burn out (the board turned a very pretty color too) after we had a power outage.

"If we don't hang together, by Heavens we shall hang separately." - Benjamin Franklin

If you found that funny or something i said useful i always appreciate spare change
1PczDQHfEj3dJgp6wN3CXPft1bGB23TzTM
P4man
Hero Member
*****
Offline Offline

Activity: 504



View Profile
September 18, 2011, 06:18:52 PM
 #5

Happened to my Sapphire 5850s all the time back when I mined.
The reason was the fan speed resetting to 50% and the memory clock resetting to 1000 MHz.
I found no fix for it, it just kept happening.

That can't be the same thing; for one, even at 50% fan and full clock, I (used to) not get anywhere near those temps with my Accelero.
Secondly, it would still not cause a temperature spike like that in like 1 second. I don't know how fast yours went, but I've got a HUGE cooler on there; even with the fans off, temperature climbs slowly, particularly the GPU (VRMs do go pretty fast). Not +40C in like 2 seconds.

P4man
Hero Member
*****
Offline Offline

Activity: 504



View Profile
September 18, 2011, 06:37:16 PM
 #6

Can't find any other reason for the VRM temp skyrocketing than temporary airflow/fan failure

Simple, VRM failure (causing voltage spike)
Fans are working fine, and even without any GPU fan my temps otherwise remained in spec almost forever in my rig (lots of case flow and a big GPU cooler). It's not a fan problem. If your VRM spike was as bad and as sudden as mine, consider the card dead or dying :(

P4man
Hero Member
*****
Offline Offline

Activity: 504



View Profile
September 18, 2011, 08:52:33 PM
 #7

I tested again, this time with GPU-Z open. Indeed, after a few minutes, the voltage suddenly spikes from 1.085 to 1.65v!



No surprise, VRM and core temperature skyrocket when that happens. Seems like a dead VRM to me :(
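
For a rough feel of why the temps jump so fast, here is a back-of-the-envelope sketch. It assumes the textbook approximation that dynamic power scales with V^2 * f and ignores leakage (which also rises with voltage), so treat the result as a ballpark only. The function name is just for illustration.

```python
def dynamic_power_ratio(v_old, v_new, f_old=1.0, f_new=1.0):
    """Ratio of dynamic power between two operating points, using P ~ C * V^2 * f.

    Assumption: switched capacitance C is unchanged and leakage is ignored.
    """
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Voltages from the GPU-Z observation above; clock assumed unchanged.
ratio = dynamic_power_ratio(1.085, 1.65)
print(f"{ratio:.2f}x")
```

So under that assumption the core would be asked to dissipate more than twice its normal dynamic power the moment the voltage jumps, before throttling cuts in.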

ssateneth
Legendary
*
Offline Offline

Activity: 1288



View Profile
September 18, 2011, 09:10:48 PM
 #8

WOOOOOOOSH VOLTAGE! LOL

P4man
Hero Member
*****
Offline Offline

Activity: 504



View Profile
September 18, 2011, 09:11:32 PM
 #9

More googling. Doh! It seems measuring the voltage is what causes it to spike randomly, according to this thread:

http://www.overclock.net/amd-ati/648462-hd-5870-random-voltage-bump-1-a.html

I've seen this before - it has happened on my 5870.

It's apparently an issue that happens when (a) you have a 5xxx series GPU under stress [it happened to me when running FurMark as well], AND (b) you're running more than one (or in some cases ONLY one) GPU voltage monitoring tool at the same time [i.e., GPU-Z + Afterburner, or Everest]. The voltage spikes to the max unlocked core - mine was 1.5v.

There was a thread on this over at the Everest forums. Check it out.

I only run Afterburner now, and if I need to open GPU-Z, I close afterburner first if the GPU will be under stress. Bizarre? Yes. Verifiable? Also yes.

PS: You'll notice that in Afterburner voltage MONITORING is off by default. It's because voltage monitoring is what is causing the issue. Run just afterburner with voltage monitoring off, and you won't have the problem.


I think I was running Afterburner when it happened the first time, and I did run Everest for sure. I guess that also explains why it never gave any trouble in Ubuntu, where it ran stable for weeks at 100% load and at higher clocks to boot. Maybe the card isn't dead yet lol

and three cheers for my accelero twin turbo that keeps the card from catching fire even at 1.65v!
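
A minimal sketch of a sanity check along those lines: given a list of running process names, flag when more than one known voltage-monitoring tool is up. The executable names below are assumptions for illustration, not a verified list, so check what your own system shows.

```python
# Hypothetical executable names for the tools mentioned in this thread.
VOLTAGE_MONITORS = {"gpu-z.exe", "msiafterburner.exe", "everest.exe"}


def running_monitors(process_names):
    """Return the known monitoring tools found among the given process names."""
    return sorted({name.lower() for name in process_names} & VOLTAGE_MONITORS)


def is_risky(process_names):
    """True when two or more monitoring tools are running at the same time."""
    return len(running_monitors(process_names)) > 1


# Made-up sample list; on Windows the real names could come from `tasklist`.
sample = ["explorer.exe", "GPU-Z.exe", "everest.exe", "miner.exe"]
print(running_monitors(sample))
print(is_risky(sample))
```

Note the quoted post says a single tool can trigger it in some cases, so a clean result from a check like this is no guarantee.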

Jack of Diamonds
Sr. Member
****
Offline Offline

Activity: 252



View Profile
September 19, 2011, 09:16:05 AM
 #10

Holy, 1.65v is beyond insane. Even 1.4v on liquid helium is pushing the extreme limits of any highest-end GPU.
It's a small miracle the card didn't start smoking or the VRM melting at that point.

Fortunately you had a custom air cooler

1f3gHNoBodYw1LLs3ndY0UanYB1tC0lnsBec4USeYoU9AREaCH34PBeGgAR67fx
P4man
Hero Member
*****
Offline Offline

Activity: 504



View Profile
September 19, 2011, 09:24:33 AM
 #11

Holy, 1.65v is beyond insane. Even 1.4v on liquid helium is pushing the extreme limits of any highest-end GPU.
It's a small miracle the card didn't start smoking or the VRM melting at that point.

Fortunately you had a custom air cooler

Not sure if the cooler actually had any effect; it looks like the card throttled to save its life. The GPU was at a very constant 100C, which seems like a hardware throttle point. The throttling of the GPU probably also prevented the VRMs from overheating even more, unless they are also protected somehow. I think they were at 125 or 130C but also completely constant (see graph). Not sure if that's some VRM overheat protection or simply the upper end of the temperature sensor's scale, but I also believe it's the maximum temperature they are designed for, so I don't think it's a coincidence.

Still, even with throttling keeping the GPU from frying, sending that kind of voltage through the chip would cause electromigration and kill it pretty fast,  no matter if you could keep it "cool".

allinvain
Legendary
*
Offline Offline

Activity: 1988



View Profile
January 11, 2012, 12:37:08 AM
 #12

What version of MSI Afterburner were you using? Is this issue something that can be fixed with software (ie MSI is aware of it)?

P4man
Hero Member
*****
Offline Offline

Activity: 504



View Profile
January 11, 2012, 07:31:28 AM
 #13

What version of MSI Afterburner were you using? Is this issue something that can be fixed with software (ie MSI is aware of it)?

As I understand it, the problem is not really with the software, but with the video card. The problem occurs if two simultaneous attempts are made to read voltage or temps, so it probably doesn't really matter what apps or version you use. But I suppose MSI are aware of it, as they disabled VRM and voltage monitoring by default. Which doesn't help when running GPU-Z and Everest like I was :)
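
To illustrate the general idea of two overlapping reads stepping on each other, here is a toy sketch. Everything in it is hypothetical (the class, the register numbers, the values); it models a controller with a shared register pointer, not the actual hardware, where the corruption happens at a lower level.

```python
import threading


class FakeVrmController:
    """Toy stand-in for a controller with one shared register pointer."""

    def __init__(self):
        # Hypothetical register map: 0x01 = temperature (C), 0x02 = voltage (mV).
        self.registers = {0x01: 80, 0x02: 1085}
        self.pointer = 0x00

    def select(self, register):
        self.pointer = register

    def read_selected(self):
        return self.registers[self.pointer]


def interleaved_reads(bus):
    """Two apps whose two-step transactions overlap."""
    bus.select(0x01)                 # app A wants the temperature
    bus.select(0x02)                 # app B cuts in and wants the voltage
    reading_a = bus.read_selected()  # app A now gets the voltage register instead
    reading_b = bus.read_selected()
    return reading_a, reading_b


bus_lock = threading.Lock()


def locked_read(bus, register):
    """One complete transaction; the lock keeps select and read together."""
    with bus_lock:
        bus.select(register)
        return bus.read_selected()


bus = FakeVrmController()
print(interleaved_reads(bus))
print(locked_read(bus, 0x01), locked_read(bus, 0x02))
```

A lock like this only helps inside one program. Separate apps don't share it, which is why the practical advice is to run a single monitoring tool.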

allinvain
Legendary
*
Offline Offline

Activity: 1988



View Profile
January 11, 2012, 01:55:18 PM
 #14

What version of MSI Afterburner were you using? Is this issue something that can be fixed with software (ie MSI is aware of it)?

As I understand it, the problem is not really with the software, but with the video card. The problem occurs if two simultaneous attempts are made to read voltage or temps, so it probably doesn't really matter what apps or version you use. But I suppose MSI are aware of it, as they disabled VRM and voltage monitoring by default. Which doesn't help when running GPU-Z and Everest like I was :)

Damn, I will have to make sure I never run more than one monitoring app. At the moment only 1 of my miners is windows based, and on it I use GPU-Z and only one instance. But man what a horrible bug to run into! I wonder how far back this bug was discovered.

Remember remember the 5th of November
Legendary
*
Offline Offline

Activity: 1526

Reverse engineer from time to time


View Profile
January 11, 2012, 01:58:13 PM
 #15

Sorry pal, but your VRMs don't even come close to my 123C VRM temps

BTC:1AiCRMxgf1ptVQwx6hDuKMu4f7F27QmJC2
P4man
Hero Member
*****
Offline Offline

Activity: 504



View Profile
January 11, 2012, 02:31:04 PM
 #16

Lol? Look at the chart again, not the GPU-Z shot; that's only for max voltage.

P4man
Hero Member
*****
Offline Offline

Activity: 504



View Profile
January 11, 2012, 04:25:19 PM
 #17

Sorry pal, but your VRMs don't even come close to my 123C VRM temps

123C VRM?  That card will be dead within a year.  Hell thermal throttling should be kicking in to interrupt the card before allowing sustained >120C temps.

He's had the same symptom I had: temps suddenly, for no apparent reason, shooting up to the throttle point. I'm pretty certain it has the same cause, and then temps are the least of his problems. 1.65v likely kills the GPU a lot faster than 120C on the VRMs. My 5850 only suffered an hour or so of that, and it was enough to kill it. A week after I started this thread it was dead.

SlaveInDebt
Hero Member
*****
Offline Offline

Activity: 702


Your Minion


View Profile
January 11, 2012, 06:14:00 PM
 #18

When having voltage control unlocked and two or more monitoring programs running the registry glitches and you get that and yes that is indeed your actual voltage. This has been known for well over a year, be wary and stick to one app.

EDIT: figures allinvain necro's this thread. Hows that 500K doing you?

"A banker is a fellow who lends you his umbrella when the sun is shining, but wants it back the minute it begins to rain." - Mark Twain
ArtForz
Sr. Member
****
Offline Offline

Activity: 406


View Profile
January 11, 2012, 06:37:28 PM
 #19

When having voltage control unlocked and two or more monitoring programs running the registry glitches and you get that and yes that is indeed your actual voltage. This has been known for well over a year, be wary and stick to one app.

EDIT: figures allinvain necro's this thread. Hows that 500K doing you?

Actually the problem comes from 2+ programs trying to bit-bang the internal power management I2C bus simultaneously via direct GPU register access.
The resulting corrupted I2C transactions have a good chance of writing 0xFFs to random registers; on a VT1165 that results in setting ~2.03V core.
A hardwired limit in the VT1165 caps that to 1.65V.
Fix is simple: don't run multiple programs that try to directly talk to the VRM controller at the same time.
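
The numbers above can be sketched as follows. The decoding formula is an assumption, not taken from a datasheet: a 7-bit VID field with 12.5 mV steps starting at 0.45 V, picked because it reproduces the ~2.03V figure. Only the 1.65V cap comes from the post itself.

```python
VID_BASE_VOLTS = 0.45    # assumed offset of the VID scale
VID_STEP_VOLTS = 0.0125  # assumed step per VID count
VID_MASK = 0x7F          # assumed: only the low 7 bits select the voltage
HARD_CAP_VOLTS = 1.65    # hardwired limit quoted in the post above


def requested_voltage(register_value):
    """Voltage a raw register value would ask for under the assumed scale."""
    vid = register_value & VID_MASK
    return VID_BASE_VOLTS + vid * VID_STEP_VOLTS


def effective_voltage(register_value):
    """Requested voltage after applying the hardwired cap."""
    return min(requested_voltage(register_value), HARD_CAP_VOLTS)


print(round(requested_voltage(0xFF), 4))  # a stray 0xFF write
print(effective_voltage(0xFF))
```

Under these assumptions a stray 0xFF asks for a bit over 2 V, and the cap is the only thing between the core and that value.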

bitcoin: 1Fb77Xq5ePFER8GtKRn2KDbDTVpJKfKmpz
i0coin: jNdvyvd6v6gV3kVJLD7HsB5ZwHyHwAkfdw
SlaveInDebt
Hero Member
*****
Offline Offline

Activity: 702


Your Minion


View Profile
January 11, 2012, 07:31:11 PM
 #20

Yep registry glitch  Roll Eyes

"A banker is a fellow who lends you his umbrella when the sun is shining, but wants it back the minute it begins to rain." - Mark Twain