lightfoot (OP)
Legendary
Offline
Activity: 3150
Merit: 2257
I fix broken miners. And make holes in teeth :-)
|
|
January 12, 2017, 02:19:23 AM |
|
So in terms of GPUs and such I've noticed that there are not a whole lot of chips on a GPU aside from the main chip. So I am guessing the GPU chip pretty much has it all.
If I think about this like an SGI Indigo Elan or Reality Engine, we have these parts in the video system:
Sequence command decoder
Raster memory and Display Generator
Z buffer memory and Texture memory
GE7 Geometry engine GPUs
They require different power supplies: The decoders and memory are normal 1.8/3.3 volt systems that pull a small amount of current to serve as the hotel load. The real power is needed for the Geometry Engines, and even they had a normal supply for the sequencers and buffers with the big power reserved for the transformation engines, lighting, shading, and polygon calculations.
So if we have a board that appears as a device in Windows, it's probable that the hotel circuits are running but the vector processors are out, which might point to those power supplies. I'll start sorting boards based on whether they come up at all, come up with bad screens (probably Z buffer or raster memory errors), or do something else.
Hm.
|
|
|
|
m0niker
Newbie
Offline
Activity: 39
Merit: 0
|
|
January 12, 2017, 04:39:39 AM |
|
Will you be able to post pictures along with what you find when fixing the GPUs? It would be awesome to learn. If all goes well I'll try to find some dead cards for you to fix.
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3150
Merit: 2257
I fix broken miners. And make holes in teeth :-)
|
|
January 12, 2017, 04:47:21 AM |
|
Will you be able to post pictures along with what you find when fixing the GPUs? It would be awesome to learn. If all goes well I'll try to find some dead cards for you to fix.
Absolutely. I did that in my Titan and Neptune threads, it's fun to do.
|
|
|
|
helipotte
|
|
January 12, 2017, 05:54:05 AM |
|
All of the AMD cards I have worked on have three DC/DC supplies on them:
1) 1.5V for the GDDR5. This one is usually fixed and consists of two phases.
2) 0.9V - 1.0V for the GPU memory controller I/O. This is controllable via firmware and is often one or two phases.
3) 0.8V - 1.2V for GPU core. This is always firmware controlled and is usually at least 4 phases, but some cards can have 10 or more.
I have seen shorts on all of these. I have an Asus 280x that has all THREE shorted. I have checked the gate of each mosfet and they are all good. Thinking about pulling the chokes, but when I apply current limited power to it and watch with a thermal camera the GPU die heats up. Strange.
One of the cards that keeps popping mosfets is an older Nvidia 760ti. This card always blows just ONE high side fet. It will turn on, post, then a variable amount of time later (5-10 minutes) the PSU shuts down (shorted fet). I feel the gate controller is to blame, could be wrong. This card has 4 core phases and it is not always the same one that pops its high side fet. This card also has the same core resistance as a working card.
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3150
Merit: 2257
I fix broken miners. And make holes in teeth :-)
|
|
January 12, 2017, 11:27:44 PM |
|
Yup, BFL's had 6 phase power supplies which really reduced ripple but went batshit if the FET drivers (2708's in the old days) would short. Likewise you had an RC circuit across each inductor; this would tell you the current through the inductor and was compared with the phase position from the LM driver to adjust for FETs running hot or cold. Problem is if that RC circuit goes out of whack then FETs start exploding.
Oddly enough Titan/Neptunes did it right: They bought off-the-shelf supplies, synced them in pairs, then placed them on the board around the chip in an implosion design so that no die in the chip was ever further away from one supply than the other. This matters on a 6-12 phase system, since the distance from inductor to die can vary by an inch. An inch seems like a small distance, but when you're pulling 400 watts per die at 0.5V that's 800A of current, and P=I^2*R, so P gets very big even if R is very small.
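To put rough numbers on that: the current follows from I = P/V, and the heat lost in the delivery path is P = I^2 * R. A minimal sketch, where the 0.1 milliohm figure for one extra inch of copper is purely an assumed value for illustration, not a measurement from any board:

```python
def die_current(power_w: float, volts: float) -> float:
    """Current drawn by a die: I = P / V."""
    return power_w / volts


def path_loss(current_a: float, resistance_ohm: float) -> float:
    """Power dissipated in the delivery path: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm


# 400 W per die at 0.5 V, as in the post.
current = die_current(400.0, 0.5)

# ASSUMPTION: one extra inch of power path adds 0.1 milliohm.
extra_r = 0.0001

loss = path_loss(current, extra_r)  # heat wasted in that extra inch
drop = current * extra_r            # V = I * R lost before reaching the die

print(f"current: {current:.0f} A")
print(f"extra loss: {loss:.0f} W, voltage drop: {drop:.2f} V")
```

With those assumed numbers the extra inch costs about 64 W and 0.08 V, which is a big slice of a 0.5V rail. That is why keeping every die the same distance from its supply matters.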
I'm guessing a similar situation exists in the GPUs. Gotta run to Boston this weekend, if anyone's up there and wants to grab lunch let me know. Next week I'll start posting some pics of a blown 270 board I have here.
C
|
|
|
|
m1n1ngP4d4w4n
Full Member
Offline
Activity: 224
Merit: 100
CryptoLearner
|
|
January 13, 2017, 08:52:59 AM |
|
Woah, true electronic repair guys are so rare nowadays, keep up the good work man
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3150
Merit: 2257
I fix broken miners. And make holes in teeth :-)
|
|
January 22, 2017, 03:10:01 AM |
|
So back from my trip, finished the backlog of work that came in, and can fiddle with these boards a bit more. First up is an AMD R9/270 that I picked up on Ebay broken. Sure enough it didn't come up as a display, but was registered by the computer. So something was working, just not the whole thing.
The board is really pretty simple: On the left are the high power circuits for the GPU, on the right is a lower power circuit pair for the memory, and hotel circuits.
If you look more closely at the left side you can see how this is powered: Five separate chokes indicate 5 power supplies. The little FQDN chips next to the chokes are the FETs plus drivers plus dead-time logic. The lettering on them is hard to read but I can see they are Fairchild 6705B half bridge buck drivers. They contain the switching logic, a high and low side FET, and appropriate logic to determine cut-through and current sensing (via the little RC circuits at the bottom of the board). Pretty simple actually. According to the docs each one can handle 40A of current, so we're looking at a max of 200A into the GPU. About right.
The question is: What is happening? Normally the high side FET shorts, in which case the cut-through circuitry crowbars the low side and shuts down the controller. The problem with that is if +12 was connected to the GPU, the low resistance of the GPU would essentially short out your power supply or blow the GPU sky high. Given that neither is happening, I'm not sure if the failure is in the FETs. It's possible the low side FET blew, but since the low side is switched on for most of each cycle in a buck converter, they don't usually short out. Plus the voltage drop on the high side is much higher (going from 12V to 1V instead of 1V to 0), so the high side FET normally blows. Hm....
One way to find a shorted supply is to pull the chokes and check the resistance of the circuit at the output/FET side of the choke. The bad supply will read high (or low if the low side FET is shorted) and you're in business. Or if the FETs are exposed you can look for a short between gate and source or drain. Normally when a FET blows the gate is shorted as well. Need to think more about this.
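The point about the low side FET being on most of the time follows from the ideal buck relation D = Vout/Vin: the high side conducts for a fraction D of each cycle and the low side for the remaining 1 - D. A rough sketch, assuming an ideal converter and a 1.0V core (the 12V input, 5 phases, and 40A per phase come from the post; the exact core voltage is an assumption):

```python
def buck_duty(v_in: float, v_out: float) -> float:
    """Ideal buck converter duty cycle of the high side FET: D = Vout / Vin."""
    return v_out / v_in


V_IN = 12.0            # supply feeding the phases
V_CORE = 1.0           # ASSUMPTION: nominal GPU core voltage
PHASES = 5             # five chokes on the board
AMPS_PER_PHASE = 40.0  # rating quoted from the driver docs

high_on = buck_duty(V_IN, V_CORE)  # fraction of the cycle the high side is on
low_on = 1.0 - high_on             # the low side carries the rest

max_current = PHASES * AMPS_PER_PHASE
max_power = max_current * V_CORE

print(f"high side on: {high_on:.1%}, low side on: {low_on:.1%}")
print(f"max core current: {max_current:.0f} A, about {max_power:.0f} W at {V_CORE} V")
```

So in the ideal case the high side is only on for roughly a twelfth of each cycle, but it is the one switching the full 12V, which fits with it being the FET that normally fails.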
|
|
|
|
helipotte
|
|
January 23, 2017, 03:49:32 AM |
|
I have two XFX cards. A 280 and 280x that I got with dead shorts on the VDDC (gpu core) supply. Turned out to be a smd ceramic cap on the back! Found it by putting a D cell battery across the VDDC and looking at the card with a thermal camera. The battery had just enough current to make the bad cap glow, but not enough to damage anything.
Currently have a Devil 13 dual 290x that has some strange measurements. One of the gpus shows much lower resistances than the other. Looking like it might have a fried GPU. Do wish I could find pinouts or datasheets on these things. I am going to try lifting the choke(s) to confirm where the shorts are. This card is absurd as far as the power goes. It has 15 power phases at 40A each. 5 per gpu core, 1 for each gpu memory controller and three for the GDDR5. That's 120A just for the memory!
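A quick tally of that phase count, as a sanity check on the numbers. The per-rail phase counts and the 40A rating come from the post; the dictionary layout is just for illustration:

```python
AMPS_PER_PHASE = 40  # rating per phase, from the post

# Phase counts on the dual-GPU card: two cores, two memory controllers,
# and one shared GDDR5 supply.
phases = {
    "gpu core": 5 * 2,
    "gpu memory controller": 1 * 2,
    "GDDR5": 3,
}

total_phases = sum(phases.values())
rated_amps = {rail: count * AMPS_PER_PHASE for rail, count in phases.items()}

print(f"total phases: {total_phases}")
for rail, amps in rated_amps.items():
    print(f"{rail}: {amps} A rated")
```

The counts add up to the 15 phases mentioned, with 120A rated on the GDDR5 rail and 200A per GPU core.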
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3150
Merit: 2257
I fix broken miners. And make holes in teeth :-)
|
|
January 23, 2017, 03:53:49 AM |
|
I have two XFX cards. A 280 and 280x that I got with dead shorts on the VDDC (gpu core) supply. Turned out to be a smd ceramic cap on the back! Found it by putting a D cell battery across the VDDC and looking at the card with a thermal camera. The battery had just enough current to make the bad cap glow, but not enough to damage anything.
Currently have a Devil 13 dual 290x that has some strange measurements. One of the gpus shows much lower resistances than the other. Looking like it might have a fried GPU. Do wish I could find pinouts or datasheets on these things. I am going to try lifting the choke(s) to confirm where the shorts are. This card is absurd as far as the power goes. It has 15 power phases at 40A each. 5 per gpu core, 1 for each gpu memory controller and three for the GDDR5. That's 120A just for the memory!
Nice job! I've used a 30Ah SAFT NiCd cell as a tester like that, enough current to heat up the short while keeping the voltage below what will blow up a hash engine or GPU. That second one could point to a bad high side FET. Does it crowbar the power supply when plugged in, by chance?
|
|
|
|
helipotte
|
|
January 23, 2017, 04:18:18 AM |
|
It does not. This card has voltage test points on the back edge. While powered up, this is what I get when I check the voltages:
Rail        Status  Notes
12V         good    This is only the power from the pci-e slot
3.3V        good    This is only the power from the pci-e slot
1.8V        good    Don't know what this is, suspect power for the PLX pci-e bridge chip
0.9V        good    Pci-e I/O
GDDR5       dead    supply for the 32 GDDR5 modules
gpu1/core   dead    This is the low resistance gpu (1.0 ohm)
gpu2/core   good    This is the normal resistance gpu (2.5 ohm)
gpu1/mc     dead    This is the memory controller for the "bad" gpu (3.5 ohm)
gpu2/mc     good    This is the memory controller for the "good" gpu (35 ohm)
I suspect the start-up sequence this card uses is to power up the core and memory controller for each gpu first and then the memory, due to it being shared. Likely its firmware goes:
gpu1-->good?-->no-->crowbar.
gpu2-->good?-->yes-->turn on power.
memory-->do not turn on due to gpu 1 crowbar.
Sound plausible?
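That guessed sequence can be written out as a small simulation. This is only a sketch of the hypothesis in the post, not known firmware behavior; the function name `power_up` and the rule that any faulted gpu blocks the shared memory rail are assumptions made for illustration:

```python
def power_up(gpu_ok: dict) -> dict:
    """Simulate the guessed sequence: check each gpu, then the shared memory."""
    rails = {}
    for gpu, ok in gpu_ok.items():
        # A gpu that fails its check gets crowbarred, so both its core
        # and memory controller supplies stay down.
        state = "on" if ok else "crowbar"
        rails[f"{gpu}/core"] = state
        rails[f"{gpu}/mc"] = state
    # ASSUMPTION: the shared GDDR5 supply only starts if every gpu passed.
    rails["GDDR5"] = "on" if all(gpu_ok.values()) else "off"
    return rails


# gpu1 reads low resistance (suspected bad), gpu2 reads normal.
result = power_up({"gpu1": False, "gpu2": True})
for rail, state in result.items():
    print(f"{rail:10s} {state}")
```

With gpu1 failing and gpu2 passing, the sketch leaves gpu1/core, gpu1/mc, and GDDR5 down while gpu2 comes up, which matches the measured pattern of dead and good rails.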
|
|
|
|
64dimensions
|
|
January 23, 2017, 02:26:06 PM |
|
Either of you fix PC power supplies?
I have an Antec 1300W that is bad.
|
|
|
|
AmDD
Legendary
Offline
Activity: 1027
Merit: 1005
|
|
January 23, 2017, 03:33:51 PM |
|
Cool, I'll be watching this. I also can add my name to the list of people willing to send a few broken cards to you. I should have some 7950's around somewhere.
|
|
|
|
alucard20724
|
|
February 05, 2017, 01:46:41 AM |
|
i just pulled out my OOC box of gpus.
There's:
one R9 290
three R9 280X
six 7970 with EK waterblocks, two cards of which have burnt gnd/pwr traces (water leak i think)
one R290X with ek waterblock... still in system.. too much of a pain to remove it.
you say you drill holes in teeth.. have you ever done a pcb repair?... that's the one thing i've never done.
|
|
|
|
alucard20724
|
|
February 05, 2017, 01:58:29 AM |
|
and i'm getting sucked in... just started looking at boom microscopes...
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3150
Merit: 2257
I fix broken miners. And make holes in teeth :-)
|
|
February 05, 2017, 03:11:45 AM |
|
you say you drill holes in teeth.. have you ever done a pcb repair?... that's the one thing i've never done.
Sure. I have had to totally rebuild alternate power planes on blown Titans. What a mess that was. And speaking of mess, pictures of my latest $30 Ebay special coming up.
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3150
Merit: 2257
I fix broken miners. And make holes in teeth :-)
|
|
February 05, 2017, 03:15:29 AM |
|
So back to the GPUs. Got an R9 sapphire that was dead. Didn't take long to figure out why it was shorting the power supply: This board also has FETs that include the high and low sides together on one chip die. There are seven phases, with seven FET chips and seven chokes. However it looks like FET #7 shorted and literally *exploded*. Probably had a very large power supply. Not sure if this one can be fixed, but at least we know what is wrong with it....
Got some more Titans in so working on those this weekend, then an R4 that seems to be out. However this would shut the system down pretty hard.
C
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3150
Merit: 2257
I fix broken miners. And make holes in teeth :-)
|
|
February 05, 2017, 04:36:33 AM |
|
And I just pulled a few parts to clear around the destruction. You can see the FET I pulled upside down on one of the inductors. Even though they sanded off the part numbers, we can see it's basically the same FET design and concept. Still doesn't change the fact that the top one vaporized though. My guess is the FET shorted and the power supply graciously destroyed it rather than trip out.
https://i.imgur.com/yTCyyHQ.jpg
Never dull, but getting annoying spending money for very blown up boards.
|
|
|
|
alucard20724
|
|
February 05, 2017, 05:20:26 AM |
|
And I just pulled a few parts to clear around the destruction. You can see the FET I pulled upside down on one of the inductors. Even though they sanded off the part numbers, we can see it's basically the same FET design and concept. Still doesn't change the fact that the top one vaporized though. My guess is the FET shorted and the power supply graciously destroyed it rather than trip out. https://i.imgur.com/yTCyyHQ.jpg Never dull, but getting annoying spending money for very blown up boards.
what desolder tool are you using to remove parts?
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3150
Merit: 2257
I fix broken miners. And make holes in teeth :-)
|
|
February 05, 2017, 05:38:45 AM |
|
what desolder tool are you using to remove parts?
Hot air tools and preheat. Specifically an Aoyue 951 and an 853 pre-heater. You really always should pre-heat the board, otherwise you risk lifting pins and overheating components.
|
|
|
|
smaxz
Sr. Member
Offline
Activity: 430
Merit: 253
VeganAcademy
|
|
February 05, 2017, 09:44:58 AM |
|
what desolder tool are you using to remove parts?
Hot air tools and preheat. Specifically an Aoyue 951 and an 853 pre-heater. You really always should pre-heat the board, otherwise you risk lifting pins and overheating components.
this is a good tip, i just started work on that gigabyte we've been pm'ing back and forth about. preparing to get out the good ole solder sucker tho ;p coupled with my hakko of course.
any idea why all sapphire gpu's have this thermal pad stuck over various smd components? also leaving that single ram chip completely uncooled.
|
|
|
|
|