Bitcoin Forum
Topic: Sapphire 6870 dead or headed that way?
DrG (OP)
July 09, 2012, 10:56:51 AM
 #1

Background: I bought a whole bunch of 6870s for my mining rigs (mostly Sapphire ones) and for the most part they have been problem-free running 965/175 at stock volts. The ones I got from Amazon can all run at that speed. Two that I got from Newegg seemed to have issues, so I throttled them back to 950/175 and ran them in a 2-card rig. The card with the issue had hashed for about 6 months without major problems until now.

Issue: Yesterday CGMiner reported that the card that usually runs cooler (the one farther from the CPU) was "SICK". So I killed CGMiner 2.4.4 and restarted it, and the card seemed to be working fine. 30 minutes later I checked on it and the card was SICK again and not mining. This time I watched it on a restart and noted that it seemed to be hashing OK, but the temps ramped up past the 72C it previously maxed out at. MSI Afterburner showed it had hit 92C at some point (I'm assuming when it went SICK previously). On this attempt it got up to 88C, the machine hung for 10 seconds, and the driver crashed; Win 7 x64 kept running fine, but CGMiner was no longer hashing. I took the card out and brought it home to test, and it does basically the same thing when mining on my home rig (single-card setup). It ran 3DMark 2011 Basic OK with a 4220 score (average) and played Civ 5 without problems for 2 hours.

Any idea what's wrong?  The fan seems to be spinning fine and I had it set at 50%.

I did a lot of messing around with the rig recently that may be relevant. Last week I swapped out the previous Antec Neo Eco 620C for an Antec HCG-400 (400W, 30A on a single 12V rail) since I needed the 620 for a 7970. The rig should have had enough power, as it read 340W on the Kill A Watt with both cards mining. Two nights before the "SICKness" I "upgraded" the drivers from 11.7 to 12.3 to see if the hashrate would really take a hit (it did, dropping from 305 MH/s to 275). Yeah, it was a bonehead move, but oddly my 6950s suffered almost no penalty going from 11.7 to 12.3, and that cured the CPU bug on those rigs. So I'm not sure if swapping the PSU or changing drivers could have resulted in the hardware "failure". I keep hearing about VRMs overheating on 5970s - are the 6870s susceptible to such damage?
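
For a rough sanity check on the power budget, here is a minimal Python sketch. The 85% PSU efficiency is an assumption (it was not measured), the function names are made up for illustration, and it pessimistically treats the whole load as if it sat on the 12V rail.

```python
def rail_capacity_watts(volts: float, amps: float) -> float:
    """Maximum power a single rail can deliver (P = V * I)."""
    return volts * amps


def estimated_dc_load_watts(wall_watts: float, efficiency: float) -> float:
    """Estimate the DC load from a wall-meter reading.

    `efficiency` is an ASSUMED PSU efficiency in the range (0, 1].
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return wall_watts * efficiency


if __name__ == "__main__":
    capacity = rail_capacity_watts(12.0, 30.0)    # HCG-400: 30A on the 12V rail
    load = estimated_dc_load_watts(340.0, 0.85)   # 340W at the wall, assumed 85% efficient
    print(f"12V rail capacity: {capacity:.0f} W")
    print(f"Estimated DC load: {load:.0f} W")
    print(f"Headroom if all load were on 12V: {capacity - load:.0f} W")
```

Under that assumption the load works out to roughly 289W against 360W available on the 12V rail, so the PSU would be working hard but within its rating.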

I could always RMA the card to Sapphire but I worry they'll find nothing wrong with it and I'll be out a mining card for 2-3 weeks.
crazyates
July 09, 2012, 01:54:26 PM
 #2

Just a few suggestions. If the card is overheating, take it apart and re-paste it. Check the mem/VDDC pads as well. Then, run CGMiner with --auto-fan and --temp-target 70 and see what the fan settles down at after 10-15 minutes. If you're concerned about VRM temps, open GPU-Z and check the VDDC temps after the same 10-15 minutes.
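
As a sketch of what that launch could look like, the Python below just assembles the command line. The pool URL, worker name and password are placeholders, and the helper name is made up; only --auto-fan and --temp-target come from the suggestion above.

```python
import shlex


def build_cgminer_args(pool_url, user, password, temp_target=70):
    """Build a cgminer command line that enables automatic fan control."""
    return [
        "cgminer",
        "-o", pool_url,      # pool URL (placeholder value in the demo below)
        "-u", user,          # worker name (placeholder)
        "-p", password,      # worker password (placeholder)
        "--auto-fan",        # let cgminer manage the fan speed
        "--temp-target", str(temp_target),  # temperature the fan control aims for
    ]


if __name__ == "__main__":
    args = build_cgminer_args("http://pool.example.com:8332", "worker", "x")
    print(shlex.join(args))
```

Printing the command instead of running it makes it easy to check the flags before pointing it at a real rig.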

Try going back to the 620W PSU and see if that makes a difference. Your 400W PSU might be bad?

As far as your new driver dropping your hashrates by ~10%, you probably upgraded your SDK without realizing it. I'd recommend downgrading the SDK to 2.5, deleting all your .bin files, and seeing if that makes a difference.
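
If you'd rather script the .bin cleanup, here is a minimal Python sketch. It assumes the cached kernel .bin files sit directly in the miner's folder; the function name and the example path are hypothetical. It defaults to a dry run so nothing is deleted until you ask for it.

```python
from pathlib import Path


def remove_kernel_binaries(miner_dir, dry_run=True):
    """Find *.bin files directly inside miner_dir and optionally delete them.

    Returns the list of matching paths. Nothing is deleted when dry_run is True.
    """
    matches = sorted(Path(miner_dir).glob("*.bin"))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Hypothetical install location; adjust to wherever the miner lives.
    for path in remove_kernel_binaries(r"C:\cgminer", dry_run=True):
        print("would delete:", path)
```

The miner should rebuild the kernels on the next start, so expect the first launch afterwards to take a little longer.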

Tips? 1crazy8pMqgwJ7tX7ZPZmyPwFbc6xZKM9
Previous Trade History - Sale Thread
ssateneth
July 10, 2012, 04:23:52 AM
 #3

Sounds to me like your fan is most likely dead. Check the actual RPMs to see if they are low, or put your ear to it or a finger lightly on the face of the fan. Does it sound or feel like it's vibrating? When powered off, the fan should spin almost freely with light resistance every quarter turn; you should be able to spin it many turns easily with a flick of your fingers. If it feels stiff throughout or it stops after a turn or two, you have a dud fan.

BUT DON'T FRET!

You can actually fix this yourself WITHOUT getting a new fan or taking the card apart. You will need a drill and a (preferably) very small drill bit, maybe 1mm or less in diameter. What you'll want to do is VERY CAREFULLY drill a hole about 2-4 mm off center on the face of the fan. You only want to go deep enough to make a hole in the fan face and not penetrate any components inside it. Then you'll want to get your handy dandy 3-in-1 multi-purpose oil (don't use WD40; it will evaporate and your fan will slow down again within a month) and put about 3-5 drops in the hole. Some will probably ooze around the hole, and that's fine. Just wipe it up after you're done and use an alcohol wipe (if it's saturated, wring some of the alcohol out; you don't want any going inside the fan) to clean up any remaining oil on the surface of the fan. Put the sticker back on the face of the fan (or some thin, light tape if there was no sticker or you drilled through the sticker) to seal it, and spin the fan a couple dozen times to work the oil throughout the fan axle. It should quickly become very free-flowing and work as if it were brand new, and your temperatures should go back to normal.

Another, far less likely scenario is that your VGA heatsink's heatpipes have ruptured and can no longer carry heat away effectively, but I highly doubt this since it would take some sort of physical damage to cause that. In both cases though, I believe the 6870 card itself is fine. Your cooling device just has problems.

DrG (OP)
July 10, 2012, 10:09:50 AM
 #4

I doubt it's the PSU now that I think about it, since I had a 6950 mining in my home machine without issue and the 6870 in question does the same thing at home.

It could be the fan, but it still seems to run smoothly. CGMiner was reporting an RPM of 2216, which is just a little faster than most of the other 6870s I'm running (they're 2100-2200 at 50%).

I'll launch GPU-Z and see what the VDDC temps are.

Honestly I'd rather do surgery on a human than open the card up. Not because I don't trust myself, but I've got 2 little monkeys at home that want to destroy everything. One already pried a capacitor off a 3-card mobo, rendering it dead, so doing any "surgery" at home isn't really feasible :(
DrG (OP)
July 10, 2012, 01:55:35 PM
 #5

As suggested I took a look at the temps with GPU-Z. I ran the card at stock 900 core with mem at 550 (didn't bother to lower it to 175 like I normally do with Trixx). I bumped the fan to 60% and heard the audible increase in fan speed. I launched CGMiner and it was able to mine at about 280MH/s for a good 4 minutes, with temps getting to 82C at an ambient of about 75F. I launched GPU-Z and saw:

GPU Temp #1: 82C (same temp reported in CGMiner - I guess the GPU core)
GPU Temp #2: 145C
GPU Temp #3: 93C

Um, yeah - I'm no EE but 145C doesn't sound good haha. I immediately shut down the miner. Is that GPU Temp #2 at 145C the VRMs?

At idle and clocked at 900/150 the 3 temps are 39C/40C/37C

EDIT:
Retested with fan at 60% by dropping memclock down to 150 and temps are better: 68.5C/107C/76.5C

Looks like it is functional again but still running pretty hot for a 6870 being the only card in the rig.
crazyates
July 10, 2012, 02:23:05 PM
 #6

145C?! Damn! VRM temps upwards of 100C are not uncommon, but also not recommended for 24/7 use. On my 5xxx cards, GPU-Z labels the VDDC temps pretty clearly, but idk about those 6xxx cards. If you're not comfortable opening it up and replacing the thermal paste / memory pads, I'd suggest finding a computer shop that will.

ssateneth
July 10, 2012, 06:07:32 PM
 #7

Not sure which sensors those GPU temps are; I don't use GPU-Z since the labels are kind of vague. I prefer hwinfo; better labeling. If it is indeed the VRMs though, double check to make sure that the little minisink on them is still securely in place. It's possible the thermal pad is damaged somehow and not transferring heat. In order to check (or replace) this though, you need to take off the heatsink. Might be worth doing a little TLC on it anyway with a soft bristle toothbrush or mascara brush to get all the dust out of the heatsink(s).

DrG (OP)
July 11, 2012, 06:31:29 AM
 #8

It was never really an issue of me being uncomfortable opening it (my dad is an EE, so I've been playing with hardware since the days of the Hayes 300 baud modem). I just didn't want to void the warranty or have my kids interrupt me and eat a thermal pad or retention spring (if you have kids you know what I mean).

I'm usually gentle with my hardware, but I found my cordless 18V leafblower did a much better job of blowing dust out of my 6950s than compressed air did. I just put them all in the driveway and watched the dust blow out. They're all mining fine 1 year later haha.

I decided to just try to RMA it but Sapphire RMA leaves something to be desired.  From posts on HardForums I think Sapphire wanted me to RMA to the original vendor.   The Amazon rep had a hard time finding contact info for Sapphire so she agreed to accept the card for a partial refund which I was OK with.

It was my first failure out of 40 or so cards - guess that's not too bad.  Guess something was bound to happen when the ambient got up to 100F because the AC doesn't run on Sunday in the office where my main mining setup is.
QiVX
July 11, 2012, 09:51:17 AM
 #9

A general rule I keep for my GPUs is for no temperatures at all to be over 100C.
I know that on the 5000 series GPUs the max rated VRM temp is 125C, and that doesn't mean 125C 24/7, it probably means 125C for a few minutes.

Then I see you have 145C on the VRM.
I'm sorry, but I think that VRM is damaged.

You can try re-pasting etc, but I wouldn't be surprised if the card doesn't work.

Good luck though.
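
To make that rule of thumb concrete, here is a small Python sketch that checks the GPU-Z readings posted earlier in the thread against the two limits. Applying the 5000-series 125C figure to a 6870 is an assumption, and the function and label names are made up for illustration.

```python
def classify_temps(readings_c, caution_c=100.0, rated_max_c=125.0):
    """Label each sensor reading (in Celsius) against the two thresholds."""
    labels = {}
    for name, temp in readings_c.items():
        if temp > rated_max_c:
            labels[name] = "over rated max"
        elif temp > caution_c:
            labels[name] = "over caution limit"
        else:
            labels[name] = "ok"
    return labels


if __name__ == "__main__":
    # Readings reported in the thread at 900 core, fan at 60%.
    mem_550 = {"GPU Temp #1": 82.0, "GPU Temp #2": 145.0, "GPU Temp #3": 93.0}
    mem_150 = {"GPU Temp #1": 68.5, "GPU Temp #2": 107.0, "GPU Temp #3": 76.5}
    for label, readings in (("mem 550", mem_550), ("mem 150", mem_150)):
        print(label, classify_temps(readings))
```

With these thresholds, GPU Temp #2 is past the rated max at mem 550 and still above the 100C rule at mem 150, while the other two sensors stay under 100C in both runs.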
crazyates
July 11, 2012, 04:18:02 PM
 #10

Did you read the post above you? DrG just said they're returning the card for a partial refund through Amazon.
