Bitcoin Forum

Alternate cryptocurrencies => Mining (Altcoins) => Topic started by: lightfoot on January 11, 2017, 02:38:36 AM



Title: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on January 11, 2017, 02:38:36 AM
So I've been fixing Titans, Neptunes, Monarchs, Singles, Avalons, and a whole bunch of mining technologies over the years, but for some reason never really fiddled around much with GPU cards. They blow up too, and I see them on sale at Ebay all the time. I need a challenge, so I thought I would start a thread on my observations in fixing them if possible, developing techniques that can work, and figuring out how to tell one that can be fixed from a brick.

As normal, I will post my thoughts below and see what I can come up with. First up I need to find some dead cards to practice on....

Background: Years of doing SMD repair on electric car power controllers (400v/500a) as well as miners (.6 volts, 1000 amps) and other small things. I prefer to use hot air rework tools, and I like to use pre-heat to keep from roasting components. I don't use the toaster to repair boards. :-)

Let's see where this goes.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on January 11, 2017, 02:39:05 AM
Reserved for tips and tricks


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on January 11, 2017, 02:39:19 AM
Reserved for status. Let's roll....


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: Emoclaw on January 11, 2017, 02:53:47 AM
Nice. I'd be interested to see how many of the cards you attempt to repair can actually be repaired. 
I have a friend in the component-level repair industry and he says that most GPUs die because their VRMs are either of terrible design or the cooling is bad. The graphics chip itself rarely dies. Though he doesn't actually repair graphics cards due to lack of schematics, which he says makes the process more time consuming.
Good luck, I'll be following this thread.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: reb0rn21 on January 11, 2017, 02:57:59 AM
I presume if a VRM gets shorted, the PCB will be damaged, at least on mid/high end cards.

In the past, like 6+ years ago, most problems were due to bad solder joints between the GPU and the PCB, so a reflow helped. Now I think it's mostly the VRM or the GPU memory going bad.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: bathrobehero on January 11, 2017, 03:04:13 AM
Cool.

Out of dozens of GPUs over the years I only ever had one particular model (GV-N75TOC-2GI) dying because it had weak VRMs. I think 5 out of six died within months.
After the RMA repair process the same cards still work flawlessly.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: SweaterJacket on January 11, 2017, 03:17:15 AM
Reserved


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on January 11, 2017, 03:25:25 AM
Interesting. Power subsystems are one of my specialties, it's surprisingly hard to build a good one and easy to screw it up.

My first thought was that overheating the GPU chip could cause the solder balls to go high resistance, thus causing it to fail. However, most GPUs are a very high density BGA die mounted on a carrier board that brings it out to a pitch that will mate to a rational PCB. The high density BGA isn't the issue; it's that they glue the die to the carrier, and if you overheat the chip too much the solder balls "blow out" and short under the die. Then you're sunk.

I'll take a look into the VRMs.

C


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on January 11, 2017, 03:29:10 AM
I presume if a VRM gets shorted, the PCB will be damaged, at least on mid/high end cards.

In the past, like 6+ years ago, most problems were due to bad solder joints between the GPU and the PCB, so a reflow helped. Now I think it's mostly the VRM or the GPU memory going bad.
Typically the high side FETs on reasonable VRMs will have an RC circuit or an op-amp comparator across them to measure current flow and shut down the VRM if the current goes too high (i.e. a burned FET) before there is a cut-through short to ground. Low side FETs rarely fail because their on time is much higher than the high side, so they don't have as much switching loss.

If the GPU shorts internally you're sunk of course but that can be tested by pulling the high side FETs and looking for shorts. Hm.
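
A rough sketch of the on-time point above, assuming an ideal (lossless) buck converter fed from 12v and putting out about 1v. The 1v output and the function name are assumptions for illustration only; real core voltages and duty cycles will vary.

```python
def buck_duty_cycle(v_in: float, v_out: float) -> float:
    """Ideal buck converter: fraction of each cycle the high side FET is on (D = Vout / Vin)."""
    if v_in <= 0 or not 0 <= v_out <= v_in:
        raise ValueError("need v_in > 0 and 0 <= v_out <= v_in")
    return v_out / v_in


# Assumed figures: 12 V input, roughly 1 V core output.
d_high = buck_duty_cycle(12.0, 1.0)
d_low = 1.0 - d_high  # the low side FET conducts for the rest of the cycle

print(f"high side on {d_high:.1%} of the cycle, low side on {d_low:.1%}")
```

So in this idealized case the low side FET is conducting for over 90% of each cycle, while the high side does short, hard switching from the full 12v.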


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: l8nit3 on January 11, 2017, 03:29:23 AM
Im highly intrigued by this idea and have thought of the same myself, however dont have the low-level hardware background to make it a possibility. Personally i have a 280x thats driving me nuts. Hopefully you end up working with a card with a similar issue.

Just to put it out there, the card mines just fine, but no matter what drivers or gpu-reading software I use (gpu-z, AB, trixx) I cannot ever get this thing to show a temperature! In fact I've paid the shipping and had it sent back to Gigabyte under warranty, and after they claimed to fix it, it still shows no temp!

All that said, I love the idea of this thread and will be following very closely, Good luck, and thank you in advance for any tips and tricks you find. :)


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on January 11, 2017, 03:32:27 AM
Nice. I'd be interested to see how many of the cards you attempt to repair can actually be repaired. 
I have a friend in the component-level repair industry and he says that most GPUs die because their VRMs are either of terrible design or the cooling is bad. The graphics chip itself rarely dies. Though he doesn't actually repair graphics cards due to lack of schematics, which he says makes the process more time consuming.
Good luck, I'll be following this thread.
Indeed. Lack of cooling on VRMs will cause the FETs to go; my guess is that overclocking can do it (current will avalanche as temps go up). As for schematics, there never seem to be any anymore, especially for Bitcoin miners; no one wants to take the liability I suppose. However these things are pretty simple at their heart: Get power into them, get work into the chip and out, and put the heat somewhere.

Now I need some dead boards to start working on. Anyone got a box of old dead boards?


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: mirny on January 11, 2017, 03:47:57 AM
I have 5, or 6 dead boards, 7950,7970,280x,6990s


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: bathrobehero on January 11, 2017, 04:19:46 AM
Getting back to my previous comment about certain models having the same issue, I also used to have a bunch of Asus GTX 780 Ti cards that were designed in a way that their VRMs would go well above 100°C as they had absolutely no dissipation (just hid under the heatsink with no contact). I bought a few thermal pads and put it on them so that the pads connected them to the heatsink and the temps were decreased drastically.

Also, when I used to mine Ethereum I noticed the memory modules would go slightly above 100°C (GTX 970) even without overclocking while the GPU itself was about 60°C and I expect a lot of those cards will end up dying coming from miners who mined Eth for a long time or might even still mine it.

So my point is that probably each exact model of card has an expected way of dying. And ebay is probably full of faulty cards that were already checked by someone experienced like OP and deemed FUBAR.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: adaseb on January 11, 2017, 11:18:17 AM
The most common failure with GPUs is the fans. Depending on which type of fan the card uses, it can be repaired in different ways.

The Sapphire Dual-X R9 280X and Gigabyte Windforce fans all have fan blades that you can easily pop off using some string, and then you can relube the bearing with grease. Works every time, pretty much.

With the more durable fans, like on the ASUS 7970 / ASUS 280x / MSI 280x, you need to drill a hole in the back slightly off-centre and pour in the thinnest oil that can fit inside. This sometimes works great, and sometimes works but rattles, the reason being that proper lube would be best, however it's impossible to get at the bearing to lubricate it.

For the newer RX 470 / 480 the fans will probably start failing sooner or later however for those most have 2-3 year warranty and you can just RMA them.

 


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: FFI2013 on January 11, 2017, 08:08:31 PM
I have a Gigabyte R9/270 you can check out. I lost the receipt so I can't RMA it, but if you're in the US I can ship it to you. I also have a Gridseed blade that needs to be looked at.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on January 11, 2017, 10:12:32 PM
Yes, I am in the US, feel free to PM me as needed.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on January 11, 2017, 10:14:20 PM
I had a GPU die in quite a silly way, the PCI extender I was using (16x to 1x) I had the 1x plugged in the wrong way, and apparently this killed the card through an extender. If this is something you think you can fix, I will gladly send it to you for shipping cost.
Sure. I'll PM you my address.

C


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: hhdllhflower on January 11, 2017, 10:21:52 PM
Reserved
nice job 8)


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: helipotte on January 12, 2017, 01:49:18 AM
Nice to see you working on GPUs.  I have a few stencils coming for tahiti/pitcairn/hawaii and I will take a crack at re-balling some of the units I have.  They look like they use 0.5mm balls.  Can send you some of my "trouble" units if you want to try to fix them.  I have a few units that keep popping mosfets.  Have been trying to find a way to narrow down bad memory chips on cards.  Don't even know if this is possible without changing them one at a time.

Cheers!


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on January 12, 2017, 02:13:20 AM
You would think one could run diagnostics on the things to find the bad memory chips; those are easy to swap out but yes, a pain.

Have you tried checking the resistance with the inductors off? Back in the BFL days that was the #1 best way to identify which FET was shorted and also a way to identify a shorted die (0 ohms means infinite current no matter how you slice it).

C


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on January 12, 2017, 02:19:23 AM
So in terms of GPUs and such I've noticed that there are not a whole lot of chips on a GPU aside from the main chip. So I am guessing the GPU chip pretty much has it all.

If I think about this like an SGI Indigo Elan or Reality Engine, we have these parts in the video system:

Sequence command decoder
Raster memory and Display Generator
Z buffer memory and Texture memory
GE7 Geometry engine GPUs.

They require different power supplies: The decoders and memory are normal 1.8/3.3 volt systems that pull a small amount of current to serve as the hotel load. The real power is needed for the Geometry Engines, and even they had a normal supply for the sequencers and buffers with the big power reserved for the transformation engines, lighting, shading, and polygon calculations.

So if we have a board that appears as a device in Windows it's probable that the hotel circuits are running, but the vector processors are out which might be those power supplies. I'll start sorting boards based on if they come up at all, come up with bad screens (probably Z buffer or raster memory errors) or something else.
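
The sorting idea above could be sketched roughly like this. The category names and the `first_suspect` function are made up for illustration, and the "not detected" branch is an assumption that goes beyond what the post says.

```python
from enum import Enum


class Symptom(Enum):
    NOT_DETECTED = "board does not show up as a device at all"
    DETECTED_NO_DISPLAY = "shows up as a device but gives no display"
    BAD_SCREEN = "comes up but with a bad screen"


def first_suspect(symptom: Symptom) -> str:
    """Map an observed symptom to the first subsystem worth checking."""
    if symptom is Symptom.DETECTED_NO_DISPLAY:
        # Hotel circuits are probably running; the high power supplies may be out.
        return "high power supplies for the vector processors"
    if symptom is Symptom.BAD_SCREEN:
        return "memory (probably Z buffer or raster memory errors)"
    # Assumption, not from the post: nothing enumerates, so start with the hotel supplies.
    return "hotel supplies, then something else"


for s in Symptom:
    print(f"{s.value}: check {first_suspect(s)}")
```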

Hm.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: m0niker on January 12, 2017, 04:39:39 AM
Will you be able to post pictures along with what you find when fixing the GPUs? It would be awesome to learn, if all goes well I'll try to find some dead cards for you to fix  ;)


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on January 12, 2017, 04:47:21 AM
Will you be able to post pictures along with what you find when fixing the GPUs? It would be awesome to learn, if all goes well I'll try to find some dead cards for you to fix  ;)
Absolutely. I did that in my Titan and Neptune threads, it's fun to do.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: helipotte on January 12, 2017, 05:54:05 AM
All of the AMD cards I have worked on have three dc/dc supplies on them.

1) 1.5V for the GDDR5. This one is usually fixed and consists of two phases.
2) 0.9V - 1.0V for the GPU memory controller I/O.  This is controllable via firmware and is often one or two phases.
3) 0.8V - 1.2V  for GPU core.  This is always firmware controlled and is usually at least 4 phases but some cards can have 10 or more. :o

I have seen shorts on all of these.  I have an Asus 280x that has all THREE shorted.  I have checked the gate of each mosfet and they are all good.
Thinking about pulling the chokes but when I apply current limited power to it and watch with a thermal camera the GPU die heats up.  Strange.

One of the cards that keeps popping mosfets is an older Nvidia 760ti.  This card always blows just ONE high side fet.  It will turn on, post, then a variable amount of time later (5-10 minutes) the PSU shuts down (shorted fet).  I feel the gate controller is to blame, could be wrong.   This card has 4 core phases and it is not always the same one that pops its high side fet.  This card also has the same core resistance as a working card. ???
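
As a rough aid for probing the three supplies listed above, here is a minimal sketch that checks a measured voltage against the ranges given in this post. The 0.05 V tolerance and the names `EXPECTED_RAILS` and `rail_ok` are assumptions for illustration, not anything taken from a datasheet.

```python
# Voltage ranges (volts) as listed in the post above for AMD cards.
EXPECTED_RAILS = {
    "GDDR5": (1.5, 1.5),
    "memory controller I/O": (0.9, 1.0),
    "GPU core": (0.8, 1.2),
}


def rail_ok(name: str, measured_v: float, tolerance_v: float = 0.05) -> bool:
    """Return True if the measured voltage sits inside the expected range.

    tolerance_v is an assumed measurement margin, not a spec value.
    """
    low, high = EXPECTED_RAILS[name]
    return low - tolerance_v <= measured_v <= high + tolerance_v


# Hypothetical readings from a card on the bench.
for rail, volts in [("GDDR5", 1.52), ("memory controller I/O", 0.0), ("GPU core", 1.1)]:
    print(f"{rail}: {volts:.2f} V -> {'ok' if rail_ok(rail, volts) else 'check this supply'}")
```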


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on January 12, 2017, 11:27:44 PM
Yup, BFL's had 6 phase power supplies which really reduced ripple but went batshit if the FET drivers (2708's in the old days) would short. Likewise you had an RC circuit across each inductor, this would tell you the current across the inductor and was compared with the phase position from the LM driver to adjust for FETs running hot or cold. Problem is if that RC circuit goes out of whack then FETs start exploding.

Oddly enough Titan/Neptunes did it right: They bought off the shelf supplies, synced them in pairs, then placed them on the board around the chip in an implosion design so that no die in the chip was ever further away from one supply than the other. This matters on a 6-12 phase system since the distance from inductor to die can vary by an inch, and while an inch seems like a small distance, when you're pulling 400 watts per die at .5v that's 800a of current, and P=I^2*R, so the loss gets very big even if R is very small.
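
To put rough numbers on that, here is a small sketch using the standard resistive loss relation P = I^2 * R. The 400 watt and .5v figures are from the paragraph above; the path resistances are purely hypothetical values picked to show the scale.

```python
def rail_current(power_w: float, voltage_v: float) -> float:
    """Current drawn by a load: I = P / V."""
    return power_w / voltage_v


def conduction_loss(current_a: float, resistance_ohm: float) -> float:
    """Power burned in a resistive path: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm


amps = rail_current(400.0, 0.5)  # 400 W per die at 0.5 V
print(f"die current: {amps:.0f} A")

# Hypothetical copper path resistances in milliohms, for illustration only.
for r_milliohm in (0.1, 0.5, 1.0):
    loss_w = conduction_loss(amps, r_milliohm / 1000.0)
    print(f"{r_milliohm:.1f} milliohm of path resistance burns about {loss_w:.0f} W")
```

Even a fraction of a milliohm of extra copper between inductor and die costs tens of watts at these currents, which is why the supply placement matters.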

I'm guessing a similar situation exists in the GPUs. Gotta run to Boston this weekend, if anyone's up there and wants to grab lunch let me know. Next week I'll start posting some pics of a blown 270 board I have here.

C


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: m1n1ngP4d4w4n on January 13, 2017, 08:52:59 AM
Woah, true electronic repair guys are so rare nowadays  :o, keep up the good work man  8)


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on January 22, 2017, 03:10:01 AM
So back from my trip, finished the backlog of work that came in, and can fiddle with these boards a bit more.

First up is an AMD R9/270 that I picked up on Ebay broken. Sure enough it didn't come up as a display, but was registered by the computer. So something was working, just not the whole thing.

The board
https://i.imgur.com/vG63Rov.jpg

It's really pretty simple: On the left is the high power circuits for the GPU, on the right is a lower power circuit pair for the memory, and hotel circuits.

If you look more closely at the left side you can see how this is powered: Five separate chokes indicate 5 power supplies. The little FQDN chips next to the chokes are the FETs plus drivers plus dead-time logic. The lettering on them is hard to read but I can see they are Fairchild 6705B half bridge buck drivers. They contain the switching logic, a high and low side FET, and appropriate logic to determine cut-through and current sensing (via the little RC circuits at the bottom of the board). Pretty simple actually, according to the docs each one can handle 40a of current, so we're looking at a max of 200a into the GPU. About right.
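
The current budget above works out as follows. This is only a back-of-envelope sketch: the 40a per driver figure is from the docs mentioned above, and the core voltages are assumed (taken from the 0.8V - 1.2V range quoted earlier in the thread for AMD cards).

```python
def max_rail_current(phases: int, amps_per_phase: float) -> float:
    """Upper bound on rail current if every phase delivers its full rated current."""
    return phases * amps_per_phase


core_amps = max_rail_current(5, 40.0)  # five chokes, 40 A per driver
print(f"max core current: {core_amps:.0f} A")

# Assumed core voltages, for scale only.
for v_core in (0.8, 1.0, 1.2):
    print(f"at {v_core:.1f} V that is up to about {core_amps * v_core:.0f} W into the GPU")
```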

The question is: What is happening? Normally the high side FET shorts, in which case the cut-through circuitry crowbars the low side and shuts down the controller. The problem with that is if +12 was connected to the GPU, the low resistance of the GPU would essentially short out your power supply or blow the GPU sky high. Given that neither is happening I'm not sure if the failure is in the FETs. It's possible the low side FET blew, but since the low side is switched on for most of the cycle in a buck converter, it sees little switching loss and doesn't usually short out. Plus the voltage drop on the high side is much higher (going from 12v to 1v instead of 1v to 0), so the high side FET is normally the one that blows.

Hm.... One way to find a shorted supply is to pull the chokes and check the resistance of the circuit at the output/FET side of the choke. The bad supply will read high (or low if the low side FET is shorted) and you're in business. Or if the FETs are exposed you can look for a short between gate and source or drain. Normally when a FET blows the gate is shorted as well.


Need to think more about this.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: helipotte on January 23, 2017, 03:49:32 AM
I have two XFX cards.  A 280 and 280x that I got with dead shorts on the VDDC (gpu core) supply.  Turned out to be a smd ceramic cap on the back!
Found it by putting a D cell battery across the VDDC and looking at the card with a thermal camera.  The battery had just enough current to make the
bad cap glow, but not enough to damage anything.

Currently have a Devil 13 dual 290x that has some strange measurements.  One of the gpus shows much lower resistances than the other.  Looking like it might
have a fried GPU.  Do wish I could find pinouts or datasheets on these things.  I am going to try lifting the choke(s) to confirm where the shorts are.  This card is absurd as far as the power goes.  It has 15 power phases at 40A each.  5 per gpu core, 1 for each gpu memory controller and three for the GDDR5.  That's 120A just for the memory!


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on January 23, 2017, 03:53:49 AM
I have two XFX cards.  A 280 and 280x that I got with dead shorts on the VDDC (gpu core) supply.  Turned out to be a smd ceramic cap on the back!
Found it by putting a D cell battery across the VDDC and looking at the card with a thermal camera.  The battery had just enough current to make the
bad cap glow, but not enough to damage anything.

Currently have a Devil 13 dual 290x that has some strange measurements.  One of the gpus shows much lower resistances than the other.  Looking like it might
have a fried GPU.  Do wish I could find pinouts or datasheets on these things.  I am going to try lifting the choke(s) to confirm where the shorts are.  This card is absurd as far as the power goes.  It has 15 power phases at 40A each.  5 per gpu core, 1 for each gpu memory controller and three for the GDDR5.  That's 120A just for the memory!
Nice job! I've used a 30ah SAFT NiCD cell as a tester like that, enough current to heat up the short while keeping the voltage below what will blow up a hash engine or GPU. That second one could point to a bad high side FET, does it crowbar the power supply when plugged in by chance?


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: helipotte on January 23, 2017, 04:18:18 AM
It does not. This card has voltage test points on the back edge.  While powered up, When I check the voltages this is what I get:

12V            good  This is only the power from the pci-e slot
3.3V           good  This is only the power from the pci-e slot
1.8V           good  Don't know what this is, suspect power for the PLX pci-e bridge chip
0.9V           good  Pci-e I/O
GDDR5        dead  supply for the 32 GDDR5 modules
gpu1/core   dead  This is the low resistance gpu  (1.0 ohm)
gpu2/core   good  This is the normal resistance gpu (2.5 ohm)
gpu1/mc     dead  This is the memory controller for the "bad" gpu (3.5 ohm)
gpu2/mc     good  This is the memory controller for the "good" gpu (35 ohm)
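
Since this card has two GPUs, the known good one can serve as a reference for the other. A minimal sketch of that comparison, using the resistance readings above; the 0.5 ratio threshold and the function name are assumptions, not a rule from any datasheet.

```python
# Resistance per rail in ohms, as posted above.
READINGS = {
    "core": {"gpu1": 1.0, "gpu2": 2.5},
    "mc": {"gpu1": 3.5, "gpu2": 35.0},
}

SUSPECT_RATIO = 0.5  # assumed: flag a rail reading under half of the reference


def suspect_rails(readings, suspect="gpu1", reference="gpu2", ratio=SUSPECT_RATIO):
    """Return the rails where the suspect GPU reads much lower than the reference GPU."""
    flagged = []
    for rail, ohms in readings.items():
        if ohms[suspect] < ratio * ohms[reference]:
            flagged.append(rail)
    return flagged


print(suspect_rails(READINGS))
```

With these readings both gpu1 rails get flagged, the memory controller most strongly (3.5 ohm against 35 ohm).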

I suspect the start-up sequence this card uses is to power up the core and memory controller for each gpu first and then the memory due to it being shared.
Likely its firmware goes:

gpu1-->good?-->no-->crowbar.
gpu2-->good?-->yes-->turn on power.
memory-->do not turn on due to gpu 1 crowbar.

Sound plausible?
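
The guessed sequence can be written out as a small sketch to check it against the measurements. This only models the hypothesis in this post; the function name and state labels are made up, and nothing here is known about the real firmware.

```python
from typing import Dict


def power_up(gpu_ok: Dict[str, bool]) -> Dict[str, str]:
    """Hypothesized sequencing: bring up each GPU first, then the shared memory."""
    states = {}
    for gpu, ok in gpu_ok.items():
        # A GPU that fails its check gets crowbarred, a good one gets power.
        states[gpu] = "on" if ok else "crowbar"
    # The shared GDDR5 supply only comes up if every GPU passed.
    states["memory"] = "on" if all(gpu_ok.values()) else "off"
    return states


print(power_up({"gpu1": False, "gpu2": True}))
```

With gpu1 failing, this gives gpu1 crowbarred, gpu2 on and memory off, which lines up with the dead GDDR5 and gpu1 test points and the good gpu2 ones.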


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: 64dimensions on January 23, 2017, 02:26:06 PM
Either of you fix PC power supplies?

I have an Antec 1300W that is bad.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: AmDD on January 23, 2017, 03:33:51 PM
Cool, I'll be watching this. I also can add my name to the list of people willing to send a few broken cards to you. I should have some 7950's around somewhere.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: alucard20724 on February 05, 2017, 01:46:41 AM
i just pulled out my OOC box of gpus.

There's:
one R9 290,
three R9 280X
six 7970 with EK waterblocks, two cards of which have burnt gnd/pwr traces (water leak i think)
one R290X with ek waterblock... still in system.. too much of a pain to remove it.

you say you drill holes in teeth.. have you ever done a pcb repair?... that's the one thing i've never done.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: alucard20724 on February 05, 2017, 01:58:29 AM
and i'm getting sucked in... just started looking at boom microscopes...  :D


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on February 05, 2017, 03:11:45 AM
you say you drill holes in teeth.. have you ever done a pcb repair?... that's the one thing i've never done.
Sure. I have had to totally rebuild alternate power planes on blown Titans. What a mess that was.

And speaking of mess, pictures of my latest $30 Ebay special coming up.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on February 05, 2017, 03:15:29 AM
So back to the GPUs. Got an R9 sapphire that was dead.

https://i.imgur.com/ekHcsyz.jpg

Didn't take long to figure out why it was shorting the power supply:

https://i.imgur.com/gY2nCq0.jpg

This board also has FETs that include the high and low sides together on one chip die. There are seven phases, with seven FET chips and seven chokes. However it looks like FET #7 shorted and literally *exploded*. Probably had a very large power supply. Not sure if this one can be fixed, but at least we know what is wrong with it....

Got some more Titans in so working on those this weekend, then an R4 that seems to be out. However this would shut the system down pretty hard.

C


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on February 05, 2017, 04:36:33 AM
And I just pulled a few parts to clear around the destruction. You can see the FET I pulled upside down on one of the inductors; even though they sanded off the part numbers we can see it's basically the same FET design and concept. Still doesn't change the fact that the top one vaporized though, my guess is the FET shorted and the power supply graciously destroyed it rather than trip out.

https://i.imgur.com/yTCyyHQ.jpg



Never dull, but getting annoying spending money for very blown up boards.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: alucard20724 on February 05, 2017, 05:20:26 AM
And I just pulled a few parts to clear around the destruction. You can see the FET I pulled upside down on one of the inductors; even though they sanded off the part numbers we can see it's basically the same FET design and concept. Still doesn't change the fact that the top one vaporized though, my guess is the FET shorted and the power supply graciously destroyed it rather than trip out.

https://i.imgur.com/yTCyyHQ.jpg



Never dull, but getting annoying spending money for very blown up boards.

what desolder tool are you using to remove parts?


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on February 05, 2017, 05:38:45 AM
what desolder tool are you using to remove parts?
Hot air tools and preheat. Specifically an Aoyue 951 and an 853 pre-heater. You really always should pre-heat the board, otherwise you risk lifting pins and overheating components.



Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: smaxz on February 05, 2017, 09:44:58 AM
what desolder tool are you using to remove parts?
Hot air tools and preheat. Specifically an Aoyue 951 and an 853 pre-heater. You really always should pre-heat the board, otherwise you risk lifting pins and overheating components.



this is a good tip, i just started work on that gigabyte we've been pm'ing back and forth about.

preparing to get out the good ole solder sucker tho ;p coupled with my hakko of course.

https://s30.postimg.org/ecrxkj8kh/image.jpg

any idea why all sapphire gpu's have this thermal pad stuck over various smd components? also leaving that single ram chip completely uncooled.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: deadsix on February 05, 2017, 11:32:22 AM
As soon as I turned on my rig, I saw sparks fly off the top edge of the backplate and I immediately yanked the power cord.
On removing the backplate this is what I saw :

https://i.imgur.com/ME3a4D7.jpg

Surprisingly though, i plugged it back in after a while to flash stock bios, and saw that the card still works flawlessly. Sent it in for RMA anyways.
Thought this might interest people.

Also, in the VRM area just above the highlighted circle, the backplate has a kinda cushiony thermal pad that makes contact between the backplate and the pcb, but it starts oozing some kinda sticky liquid after 4-5 months of operation (at least for a few of my cards).
I believe one other card died earlier due to that liquid, unsure about that though, had gotten it RMA'ed.

Anyone else having similar experiences on the RX 470's? Investigative diagnosis from the masters? Suggestions and Warnings, pitfalls to avoid in the future? For the record, my cards are all at or below stock speeds and heavily undervolted.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: adaseb on February 05, 2017, 11:38:41 AM


Anyone else having similar experiences on the RX 470's? Investigative diagnosis from the masters? Suggestions and Warnings, pitfalls to avoid in the future? For the record, my cards are all at or below stock speeds and heavily undervolted.

That's what usually happens: something shorts out and your PSU cuts power. Then when you turn it back on, that's when it usually starts sparking/smoking.




this is a good tip, i just started work on that gigabyte we've been pm'ing back and forth about.

preparing to get out the good ole solder sucker tho ;p coupled with my hakko of course.

https://s30.postimg.org/ecrxkj8kh/image.jpg

any idea why all sapphire gpu's have this thermal pad stuck over various smd components? also leaving that single ram chip completely uncooled.


It's not only Sapphire, ASUS does that also. No idea why. It wasn't an issue until the >2GB games came out and people started getting artifacts due to certain ram chips overheating.


By the way, I had the exact same Sapphire Dual-X 280X catch fire also. It wasn't in that spot, it was one of the VRMs. It pretty much messed up the PCB completely. I desoldered that burnt VRM, checked for shorts and ran undervolted for a while, and then another VRM ended up blowing.

Apparently these VRMs had issues where they easily ran at 120C and higher and didn't throttle properly and ended up frying themselves. I think the issue was solved with the later Vapor-X 280X and everybody complained about throttling issues but those at least didn't blow up.

If you search the old Litecoin forum, that GPU and the Gigabyte Windforce Tahitis were among the worst for mining because their VRMs and capacitors blew up.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: joerazor on February 05, 2017, 12:40:45 PM
Very nice seeing this thread :)

I have 2 dead GPUs (R9 390x) (Sitting on the shelf for the past 5 months and wish I could put them to use)

I wonder if there is any easy fix to get them working. I will explain what caused the problem for each:

GPU 1 (R9 390x Sapphire): I mistakenly inserted the powered PCI Riser in the wrong direction. Immediately after powering my PC on, I saw a spark and some smoke (Not sure if it came from the riser or GPU).... Since then, my GPU would never get detected in "device manager" and the GPU would never output anything on any connected monitor... Tried the GPU on 5 different computers!


GPU 2 (R9 390x Asus Strix): Fans on this GPU failed, so I purchased an Arctic Fan Cooler from Amazon and managed to replace the stock Asus fans with the Arctic ones. Everything worked fine for a few days, then suddenly my PSU would switch off within a second of switching the computer on.

After fiddling around, I discovered that the problem was with this 390x Asus Strix card... I guess it is causing a short circuit or something causing the PSU to switch off immediately.


Is there any way that I can resurrect these cards from the dead? Please let me know your thoughts! If any of these cards work, I will gladly donate 10ETH as a thank you.

PS: One of my friends is an advanced Motherboard technician, so I could pass the info you provide to him.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: smaxz on February 06, 2017, 09:50:32 PM
Alright so part 1 of my 7950 Gigabyte Windforce 2 tear down and overhaul.

The windforce cards are well documented as not the greatest for mining. Particularly early revisions where the VRM had no heat sinks or cooling provided.

I answered an ad and picked up a card that was "on its last legs, not overclocking and displaying severe artifacts". The seller admitted to overvolting and mining early scrypt coins with it, but fortunately the price was right for a chance to mess around with and learn more about these cards.

Plugging it in at home it mined surprisingly well off the bat, but any overclock put it unstable and crashed the OS shortly after. Flashing the ROM to something more tame helped but not by much and even at factory clocks there was artifacting under load and crashing after sustained mining.

I did notice the VRM area was equipped with a heatsink but still got incredibly warm, too hot to touch in fact, and the back of the VRM just had a flat bar across it as a heatsink, which was also uncomfortably hot.

Time for disassembly to apply some VRM cooling mods..

https://s28.postimg.org/9eoax3np9/pcard.jpg

I found some Enzotech low profile copper heatsinks for GPU ram that looked really promising and clear the height allotted by the factory windforce heatsink.

With a steady hand and a thin Dremel cutting disk I prepared some heatsinks, intending to fix them to the tops of the VRM MOSFETs with some Fujipoly thermal pads.

Then the factory aluminum heatsink for the vrm mosfets could be repurposed for the underside instead of that flat bar to keep things chill.

https://s27.postimg.org/v627d31ar/psinks.jpg

Unfortunately while cleaning the thick caked on putty/paste used as a thermal interface medium (TIM) on the underside of the VRM components, a problem became apparent.

https://s27.postimg.org/er8a0ijbn/pcardback.jpg


can you guys spot it?


https://s27.postimg.org/9xt49ec7n/zvrmback.jpg

There was a little SMD/SMT (Surface Mount Device/Technology) capacitor that either disintegrated under extreme heat, had a crack in its solder and came off while I was cleaning the board, or was damaged when the previous owner attempted to reapply thermal grease here without realizing it.

So with an idea what the issue was, time to figure out how to remedy it.

https://s27.postimg.org/55epk7s5v/pcap.jpg

SMD capacitors come in various sizes, and these in particular are extremely small. Getting out the digital caliper and accounting for the extra solder on the ends of the cap, the dimensions are consistent with 0402 size SMD capacitors.
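
As a rough sketch of that caliper check, the snippet below matches a measured length and width (in millimetres) against the nominal body sizes of common imperial chip packages. The table holds standard nominal dimensions; the function name, the tolerance and the example reading are illustrative assumptions only, not measurements taken from this card.

```python
# Nominal body sizes (length, width) in mm for common imperial SMD chip codes.
NOMINAL_MM = {
    "0201": (0.6, 0.3),
    "0402": (1.0, 0.5),
    "0603": (1.6, 0.8),
    "0805": (2.0, 1.25),
    "1206": (3.2, 1.6),
}


def guess_package(length_mm, width_mm, tolerance_mm=0.15):
    """Hypothetical helper: return the closest package code, or None if no
    nominal size is within tolerance_mm on both length and width."""
    best_code, best_error = None, None
    for code, (nom_l, nom_w) in NOMINAL_MM.items():
        err_l = abs(length_mm - nom_l)
        err_w = abs(width_mm - nom_w)
        if err_l > tolerance_mm or err_w > tolerance_mm:
            continue  # too far from this package on at least one axis
        error = err_l + err_w
        if best_error is None or error < best_error:
            best_code, best_error = code, error
    return best_code


# Made-up reading: leftover solder makes the part measure slightly long.
print(guess_package(1.1, 0.5))
```

The tolerance is deliberately loose because solder on the end caps inflates the length reading; tighten it if you can measure a clean part.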

Knowing the size, the value of the cap is not as easy to determine. Checking with a multimeter is really not so accurate in this instance. Ideally the manufacturer will have the specs on hand and can identify the part from the board-assigned "Cxxx" designation printed beside the capacitor.

I emailed Gigabyte support asking for a schematic of the board or the value of the capacitor in question and received a not so prompt reply stating this is proprietary information. Further prodding revealed that I would not get far with Gigabyte/AMD on this issue, so I decided to dig further on my own.

Although the schematics are kept quite secretive, the VRM portion of these cards is usually sourced from a third party. To find which VRM system is used on your GPU you must locate the PWM (Pulse Width Modulation) controller, which is the VRM master control chip of sorts, and read the printing on the chip surface.

https://s27.postimg.org/dq81be2c3/pcardfront.jpg

It will be a larger chip, off near where the VRM mosfets are all lined up. A magnifying glass may be helpful at this point. If you're lucky the text will still be legible.

Mine read:

ADP 4100
JCPZ
TAB49003
#1249
KOREA

Which led me to the online datasheet for the ADP4100 at http://www.onsemi.com/pub_link/Collateral/ADP4100-D.PDF

Where the helpful data was identified below:

https://s27.postimg.org/sngicefkj/ppdf.png

Which puts me to where I'm at now.

Stay tuned for how to resolder/reflow smd board repairs, application of upgraded VRM cooling on Gigabyte Windforce 7950 GPU and the performance gains afterwards.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on February 06, 2017, 11:14:56 PM
Quote
preparing to get out the good ole solder sucker tho ;p coupled with my hakko of course.


Another solution to removing SMD chips with an iron is to try chipquik. It lowers the melting point of the solder to where you can remove most components (not QFN IIRC) with a simple iron, then clean off the stuff, tin with solder, and put the new component on.

Quote
any idea why all sapphire gpu's have this thermal pad stuck over various smd components? also leaving that single ram chip completely uncooled.


Trying to move heat. Thermal pads are kind of useless, but if you have no airflow over the chips I guess it's better than nothing.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: smaxz on March 07, 2017, 03:27:40 AM
alright so after a long wait for parts to arrive and research/prep on how exactly to work with SMD, i spent the end of the weekend getting my fix on.

https://i.supload.com/S12sq5j5e.png

check out how the 0402 caps are shipped, encapsulated in that little strip.. grabbed some solder paste and since i have a surplus of these components now, i removed the other of these same variety caps in the mosfet step circuitry as they probably got subjected to excessive heat over their life span which is why the card was in its sorry state to begin with and it was parted with so cheaply.

https://i.supload.com/S1gy5oscg.jpg

you really have to be on your game with these tiny components.. i should have bought those visor goggles with magnification in them, but think i fared alright with the naked eye.. close up for scrutiny provided (: started with a toothpick but ended up using the head of a pin to apply paste, and worked great at picking up the capacitors also.

https://i.supload.com/HJb2jqci9x.png

the paste is sticky enough to hold the parts in place but i really wanted to get some smush on these components to ensure a solid connection.. i grabbed some foil tape, 3M foil tape.. it has an acrylic adhesive that holds up at least until the paste starts to work its magic. polyimide tape is the way to go here if you got it.. or simply omit the step, the residue left over made me think twice about going this route in the future.

https://i.supload.com/SkMni5ciqg.png

next it's off to my designated reflow toaster oven. this paste melts at around 420F so i went 425 for 4 minutes and spiked it to 450 the last 2 minutes.
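
since solder paste datasheets usually quote temperatures in Celsius, here is a small sketch that converts the oven settings above. the stage labels are just names picked for illustration, and this only restates the numbers from this post; it is not a recommended reflow profile.

```python
def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


# (label, oven setting in F, minutes) as described in the post.
# Labels are illustrative; None means no duration applies.
STAGES = [
    ("paste melting point", 420, None),
    ("main bake", 425, 4),
    ("final spike", 450, 2),
]

for label, temp_f, minutes in STAGES:
    duration = f"{minutes} min" if minutes is not None else "n/a"
    print(f"{label:20s} {temp_f} F = {f_to_c(temp_f):.0f} C  ({duration})")
```

that works out to roughly 216 C for the melting point, 218 C for the main bake and 232 C for the spike.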

Now with the repair work out of the way and the GPU settling down after its roast in the toaster.. we can focus back on the mosfet heatsink mod i had originally intended.

https://i.supload.com/ry7hi55sqg.png

because i'll be using Arctic Alumina which is an epoxy adhesive with quick cure time, it's vital to have your shit together while executing this mod.

above is a dry run planning the layout of my sinks and which gets designated to which chip.

https://i.supload.com/ByUniq5jce.png

since it would be a waste to permanently fix these nice little copper sinks to this gpu, i'm only going to apply the epoxy to the corners of the mosfet and a conventional tim in the center.. not only does it provide a means to repurpose these heatsinks in the future by making them removable, but the conventional paste actually dissipates heat better than the epoxy so the benefit is twofold.

https://i.supload.com/B1Hhoqqoce.png

we're going to use a clamp once the sinks are all in place, one could have opted for cardboard or another buffer for the underside, it's much more practical to use the repurposed heatsink that was originally on the top side though.

https://i.supload.com/HJP3oc9s5e.png

and all clamped up, try to use a soft wood for this if you can. and leave it a minimum of 2-3 hours. overnight is best.

https://i.supload.com/BJdhsc5sqg.png

while the clamp is working for us, we can focus on the fans.. they were working great beforehand but maintenance can never hurt.

https://i.supload.com/B1Fnjq9i5g.png

many complain their fans will not come off.. sometimes you have to be forceful. i have never broken a blade performing this maintenance.. using a flat blunt edge helps with prying action if it's super stubborn. don't waste your time with the dental floss around the fan blades trick. (:

after cleaning the blade cavities and the motors, put a small drop of oil in the middle of each fan motor.

https://i.supload.com/SJ9hjq5sce.png

heatsinks come from factory in non optimal form as it is, but seriously.. who is the nutter that takes a razor blade to scrape old TIM off their heatsinks? i run in to this ALL the time.

https://i.supload.com/HJsnocqjqg.png

properly lapping the heatsink surface does wonders. get your thermal tape ready too if that's what you plan to use on the GPU RAM.

https://i.supload.com/SyTo95s5g.png

properly applying TIM to the compute unit of the gpu is vital, depending on product you use, excess can be anywhere from catastrophic to just plain insulating heat.

https://i.supload.com/r1n3i59jqx.png

better than new.

https://i.supload.com/S1eTi9qo9e.png

better than new.

https://i.supload.com/ByW6i59icl.png

better than new.

https://i.supload.com/SkG6j99s9l.png

one thing of note, if you intend to keep the metal bracket on the side of the card (i suggest you do) and repurpose the mosfet heatsink for the underside, you will need to clearance that bracket a bit.. i just used an old file i had laying around but a dremel could do the trick just as easily. alternatively you can clearance the heatsink, as it's aluminum and softer.. this was an afterthought for me ;/

https://i.supload.com/ryQ6j5qoce.png

here's the GPU back in action.. many solved blocks left in her, fingers crossed.

eyes peeled for next project!


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: bitkilla on May 02, 2017, 04:49:30 PM
Lightfoot how are you doing?

I am not real active on this forum, but I have a couple of 2 th
dragon miners with coincraft chips that conked out on me.

I Started having psu issues and even when I replaced the psu's
I have problems getting them hashing.

Can you send me an email so we can discuss possibly sending the miners to you
for repair?  Thanks for your help please check your pm box for my email or pm my

Thank you


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: 666mrga999 on May 02, 2017, 06:41:20 PM
My card is totally dead after a PSU failure. The system will not boot with a new motherboard and this GPU installed, not even if the internal GPU is selected as primary... Any ideas?


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: DevonMiner on May 02, 2017, 06:46:36 PM

Interesting thread, only just noticed it.

Over the years I've seen loads of good mining GPUs on eBay (UK), and quite often the description says 'no video display, for spares and parts'. Always wondered if they would be OK for mining as we don't need it to display video, just hash away.

Maybe I should bite the bullet and buy one, just for fun  ;)



Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: JaredKaragen on June 03, 2017, 03:34:18 AM
So;  I have begun to dive into a GTX780 that my friend gave me....

He was playing a game, in his words "trying to abuse SLI" and "run them hard"... when it suddenly just bluescreened, and rebooted, showing no display on the primary video output, but the secondary card showed up.   Only one GPU in the device manager.


The back of the board looks perfect.... no signs of overheating components or anything.

The component/die side... looked fine as well.....

This leads me to want to power it on the bench, and take some voltage measurements at places.... find why it doesn't power up in the slightest (no fan, no post, nada)

So how do you power these in order to test in such a way?  do you clamp on a rigged cooler to the GPU die for this and just let the rest run open?

Any insight into this sort of issue before?  My gut tells me to follow the power from the PCIE first....  Just by the fact the card displays the signs of no power reaching it.  Also, he said he was messing with the SLI bridge, and I believe the power piggybacks through that connector as well...

I do have a few oscilloscopes (one analog, one digital), a handful of test equipment as well as basic soldering/desoldering equipment.... if it needs heat-work done;  I'll finally have a reason to invest in a bed and temp controlled system.   I've done enough research and learning about the processes under different situations/conditions that I feel confident I can attempt an advanced repair on something.
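
For the "follow the power" approach, one way to keep bench readings organised is to compare each measured rail against its nominal value. The sketch below assumes the PCIe slot supplies 12 V and 3.3 V and uses tolerances of ±8% and ±9%, which are assumed figures that should be checked against the PCIe card electromechanical spec. The rail names, the 0.5 V "missing" threshold and the example readings are all hypothetical and are not measurements from this card.

```python
# Nominal voltage and allowed deviation (as a fraction) per rail.
# ASSUMED values; verify against the PCIe CEM spec before relying on them.
RAILS = {
    "slot_12v": (12.0, 0.08),
    "slot_3v3": (3.3, 0.09),
    "aux_12v": (12.0, 0.08),  # 6/8-pin connector, tolerance assumed same as slot
}


def check_rail(name, measured_v):
    """Classify one bench reading as 'missing', 'ok' or 'out of tolerance'."""
    nominal, tol = RAILS[name]
    low, high = nominal * (1 - tol), nominal * (1 + tol)
    if measured_v < 0.5:  # arbitrary threshold for "no voltage present"
        return "missing"
    if low <= measured_v <= high:
        return "ok"
    return "out of tolerance"


# Hypothetical multimeter readings in volts.
readings = {"slot_12v": 12.1, "slot_3v3": 0.0, "aux_12v": 11.2}
for rail, volts in readings.items():
    print(rail, volts, check_rail(rail, volts))
```

A rail flagged as missing at the connector points upstream (PSU, riser, slot), while a good input rail with dead downstream voltages points at the card's own regulators.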


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: Nalut on June 04, 2017, 03:56:09 PM
I had an RX 570 card which I could use for only 3 minutes to test the rig. After reinstalling my rig, the HDMI output shows nothing, even though the card's light (the Sapphire logo light) is on.

Is it possible that it can be fixed? I might disassemble the card later to check whether the board is damaged.


Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: smaxz on June 04, 2017, 09:33:19 PM
570 would likely still be under warranty.. and if you can boot it, ensure it's got oem bios and send it for exchange is what I would do.



Title: Re: Hacking GPU cards back into operation because I need something to do....
Post by: lightfoot on June 08, 2017, 12:34:02 AM
Quote
Lightfoot how are you doing?

I am not real active on this forum, but I have a couple of 2 th
dragon miners with coincraft chips that conked out on me.

I Started having psu issues and even when I replaced the psu's
I have problems getting them hashing.

Can you send me an email so we can discuss possibly sending the miners to you
for repair?  Thanks for your help please check your pm box for my email or pm my

Thank you


Sure, will do. Been kind of busy with litecoin miner repairs and now am getting bitcoin gear again. Feel free to PM me again and we'll discuss taking a look at it.
C