lightfoot (OP)
Legendary
Offline
Activity: 3178
Merit: 2260
I fix broken miners. And make holes in teeth :-)
|
|
January 11, 2017, 02:38:36 AM |
|
So I've been fixing Titans, Neptunes, Monarchs, Singles, Avalons, and a whole bunch of mining technologies over the years, but for some reason never really fiddled around much with GPU cards. They blow up too, and I see them for sale on eBay all the time. I need a challenge, so I thought I would start a thread on my observations in fixing them if possible, developing techniques that can work, and figuring out how to tell one that can be fixed from a brick.
As normal, I will post my thoughts below and see what I can come up with. First up I need to find some dead cards to practice on....
Background: Years of doing SMD repair on electric car power controllers (400 V/500 A) as well as miners (0.6 volts, 1000 amps) and other small things. I prefer to use hot air rework tools, and I like to use pre-heat to keep from roasting components. I don't use the toaster to repair boards. :-)
Let's see where this goes.
|
|
|
|
lightfoot (OP)
|
|
January 11, 2017, 02:39:05 AM |
|
Reserved for tips and tricks
|
|
|
|
lightfoot (OP)
|
|
January 11, 2017, 02:39:19 AM |
|
Reserved for status. Let's roll....
|
|
|
|
Emoclaw
|
|
January 11, 2017, 02:53:47 AM |
|
Nice. I'd be interested to see how many of the cards you attempt to repair can actually be repaired. I have a friend in the component-level repair industry and he says that most GPUs die because their VRMs are either of terrible design or the cooling is bad. The graphics chip itself rarely dies. He doesn't actually repair graphics cards, though, due to the lack of schematics, which he says makes the process more time consuming. Good luck, I'll be following this thread.
|
|
|
|
reb0rn21
Legendary
Offline
Activity: 1901
Merit: 1024
|
|
January 11, 2017, 02:57:59 AM |
|
I presume if the VRM gets shorted, the PCB will be damaged, at least on mid/high end cards.
In the past, like 6+ years ago, most problems were due to bad solder between the GPU and the PCB, so a reflow helped. Now I think it's mostly the VRM or the GPU memory going bad.
|
|
|
|
bathrobehero
Legendary
Offline
Activity: 2002
Merit: 1051
ICO? Not even once.
|
|
January 11, 2017, 03:04:13 AM |
|
Cool.
Out of dozens of GPUs over the years I only ever had one particular model (GV-N75TOC-2GI) dying because it had weak VRMs. I think 5 out of six died within months. After the RMA repair process the same cards still work flawlessly.
|
Not your keys, not your coins!
|
|
|
SweaterJacket
Newbie
Offline
Activity: 27
Merit: 0
|
|
January 11, 2017, 03:17:15 AM |
|
Reserved
|
|
|
|
lightfoot (OP)
|
|
January 11, 2017, 03:25:25 AM |
|
Interesting. Power subsystems are one of my specialties; it's surprisingly hard to build a good one and easy to screw it up.
My first thought was that overheating the GPU chip could cause the solder balls to go high resistance and make it fail. The problem is that most GPUs are a very high density BGA die mounted on a carrier board that brings it out to a pitch that will mate to a rational PCB. The high density BGA itself isn't the issue; it's that they glue the die to the carrier, and if you overheat the chip too much the solder balls "blow out" and short under the die. At that point you're sunk.
I'll take a look into the VRMs.
C
|
|
|
|
lightfoot (OP)
|
|
January 11, 2017, 03:29:10 AM |
|
I presume if the VRM gets shorted, the PCB will be damaged, at least on mid/high end cards.
In the past, like 6+ years ago, most problems were due to bad solder between the GPU and the PCB, so a reflow helped. Now I think it's mostly the VRM or the GPU memory going bad.
Typically the high side FETs on reasonable VRMs will have an RC circuit or an op amp comparator across them to measure current flow and shut down the VRM if the current goes too high (i.e. a burned FET) before there is a cut-through short to ground. Low side FETs rarely fail because their on time is much higher than the high side, so they don't have as much switching loss. If the GPU shorts internally you're sunk of course, but that can be tested by pulling the high side FETs and looking for shorts. Hm.
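A rough way to see the on-time point above: in an ideal buck converter the high side FET conducts for a fraction D = Vout/Vin of each switching cycle and the low side FET for the remaining 1 - D. The sketch below assumes a lossless buck stage in continuous conduction, with a 12 V input and a 1.2 V core rail. Those voltages and the function name are illustrative assumptions, not figures from any particular card.

```python
from typing import Tuple


def buck_duty_cycles(v_in: float, v_out: float) -> Tuple[float, float]:
    """Return (high_side_on_fraction, low_side_on_fraction) for an ideal buck stage.

    Assumes a lossless converter in continuous conduction, so D = Vout / Vin.
    """
    if v_in <= 0 or not (0 < v_out < v_in):
        raise ValueError("expected 0 < v_out < v_in")
    high_side = v_out / v_in
    return high_side, 1.0 - high_side


# Assumed example values: 12 V supply stepped down to a 1.2 V GPU core rail.
high, low = buck_duty_cycles(12.0, 1.2)
print(f"high side on {high:.0%} of the cycle, low side on {low:.0%}")
```

With a step-down ratio that large, the low side FET is conducting for most of each cycle while the high side FET spends a larger share of its short on time in the switching transitions.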
|
|
|
|
l8nit3
Legendary
Offline
Activity: 1007
Merit: 1000
|
|
January 11, 2017, 03:29:23 AM |
|
I'm highly intrigued by this idea and have thought of the same myself, however I don't have the low-level hardware background to make it a possibility. Personally I have a 280x that's driving me nuts. Hopefully you end up working with a card with a similar issue. Just to put it out there, the card mines just fine, but no matter what drivers or GPU-reading software I use (gpu-z, AB, trixx) I cannot ever get this thing to show a temperature! In fact I've paid the shipping and had it sent back to Gigabyte under warranty, and after they claimed to fix it, it still shows no temp! All that said, I love the idea of this thread and will be following very closely. Good luck, and thank you in advance for any tips and tricks you find.
|
|
|
|
lightfoot (OP)
|
|
January 11, 2017, 03:32:27 AM |
|
Nice. I'd be interested to see how many of the cards you attempt to repair can actually be repaired. I have a friend in the component-level repair industry and he says that most GPUs die because their VRMs are either of terrible design or the cooling is bad. The graphics chip itself rarely dies. He doesn't actually repair graphics cards, though, due to the lack of schematics, which he says makes the process more time consuming. Good luck, I'll be following this thread.
Indeed. Lack of cooling on VRMs will cause the FETs to go; my guess is that overclocking can do it (current will avalanche as temps go up). As for schematics, there never seem to be any anymore, especially for Bitcoin miners; no one wants to take the liability I suppose. However these things are pretty simple at their heart: Get power into them, get work into the chip and out, and put the heat somewhere. Now I need some dead boards to start working on. Anyone got a box of old dead boards?
|
|
|
|
mirny
Legendary
Offline
Activity: 1108
Merit: 1005
|
|
January 11, 2017, 03:47:57 AM |
|
I have 5 or 6 dead boards: 7950, 7970, 280x, 6990s
|
This is my signature...
|
|
|
bathrobehero
|
|
January 11, 2017, 04:19:46 AM |
|
Getting back to my previous comment about certain models having the same issue, I also used to have a bunch of Asus GTX 780 Ti cards that were designed in a way that their VRMs would go well above 100°C as they had absolutely no dissipation (just hid under the heatsink with no contact). I bought a few thermal pads and put them on the VRMs so that the pads connected them to the heatsink, and the temps decreased drastically.
Also, when I used to mine Ethereum I noticed the memory modules would go slightly above 100°C (GTX 970) even without overclocking, while the GPU itself was about 60°C. I expect a lot of those cards will end up dying, coming from miners who mined Eth for a long time or might even still mine it.
So my point is that each exact model of card probably has an expected way of dying. And eBay is probably full of faulty cards that were already checked by someone experienced like OP and deemed FUBAR.
|
|
|
|
adaseb
Legendary
Offline
Activity: 3878
Merit: 1733
|
|
January 11, 2017, 11:18:17 AM |
|
The most common failure with GPUs is the fans. Depending on which type of fan the card uses, it can be repaired in different ways.
The Sapphire Dual-X R9 280X and Gigabyte Windforce fans all have fan blades that you can easily pop off using some string, and then you can relube the bearing with grease. Works every time, pretty much.
On the more durable fans, like on the ASUS 7970 / ASUS 280x / MSI 280x, you need to drill a hole in the back slightly off-centre and pour in the thinnest oil that can fit inside. This sometimes works great, and sometimes works but rattles. The reason is that grease would be best, but it's impossible to get it onto the bearing.
For the newer RX 470 / 480 the fans will probably start failing sooner or later, however most of those have a 2-3 year warranty and you can just RMA them.
|
|
|
|
FFI2013
|
|
January 11, 2017, 08:08:31 PM |
|
I have a Gigabyte R9 270 you can check out. I lost the receipt to RMA it, but if you're in the US I can ship it to you. I also have a Gridseed Blade that needs to be looked at.
|
|
|
|
lightfoot (OP)
|
|
January 11, 2017, 10:12:32 PM |
|
Yes, I am in the US, feel free to PM me as needed.
|
|
|
|
lightfoot (OP)
|
|
January 11, 2017, 10:14:20 PM |
|
I had a GPU die in quite a silly way, the PCI extender I was using (16x to 1x) I had the 1x plugged in the wrong way, and apparently this killed the card through an extender. If this is something you think you can fix, I will gladly send it to you for shipping cost.
Sure. I'll PM you my address. C
|
|
|
|
hhdllhflower
Newbie
Offline
Activity: 18
Merit: 0
|
|
January 11, 2017, 10:21:52 PM |
|
Reserved nice job
|
|
|
|
helipotte
|
|
January 12, 2017, 01:49:18 AM |
|
Nice to see you working on GPUs. I have a few stencils coming for Tahiti/Pitcairn/Hawaii and I will take a crack at re-balling some of the units I have. They look like they use 0.5mm balls. Can send you some of my "trouble" units if you want to try to fix them. I have a few units that keep popping MOSFETs. I have been trying to find a way to narrow down bad memory chips on cards. Don't even know if this is possible without changing them one at a time.
Cheers!
|
|
|
|
lightfoot (OP)
|
|
January 12, 2017, 02:13:20 AM |
|
You would think one could run diagnostics on the things to find the bad memory chips; those are easy to swap out but yes, a pain.
Have you tried checking the resistance with the inductors off? Back in the BFL days that was the #1 best way to identify which FET was shorted, and also a way to identify a shorted die (0 ohms means infinite current no matter how you slice it).
C
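The inductors-off check can be written down as a simple bookkeeping step. With the inductors lifted, each phase's switch node is isolated from the output rail, so a near-zero reading on one phase points at that phase's FETs, while a near-zero reading on the rail itself points at the die side. The sketch below is only an illustration: the node names, the readings, and the 0.5 ohm cutoff are all assumed values, and a healthy core rail can legitimately read quite low, so the threshold would need tuning per board.

```python
from typing import Dict, List

# Assumed cutoff for "looks like a dead short"; not a value from the thread.
SHORT_THRESHOLD_OHMS = 0.5


def find_suspect_nodes(readings: Dict[str, float],
                       threshold: float = SHORT_THRESHOLD_OHMS) -> List[str]:
    """Return node names whose resistance to ground is at or below the threshold,
    ordered from lowest reading to highest."""
    suspects = [(ohms, name) for name, ohms in readings.items() if ohms <= threshold]
    return [name for _, name in sorted(suspects)]


# Hypothetical meter readings (ohms to ground) taken with the inductors removed.
readings = {
    "phase1_switch_node": 850.0,
    "phase2_switch_node": 0.2,
    "phase3_switch_node": 910.0,
    "core_rail": 3.5,
}
print(find_suspect_nodes(readings))
```

In this made-up example only the second phase is flagged, which would suggest pulling that phase's FETs first before suspecting the GPU itself.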
|
|
|
|
|