Bitcoin Forum
Author Topic: Hacking GPU cards back into operation because I need something to do....  (Read 3879 times)
This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic.
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
January 12, 2017, 02:19:23 AM
 #21

So in terms of GPUs and such I've noticed that there are not a whole lot of chips on a GPU aside from the main chip. So I am guessing the GPU chip pretty much has it all.

If I think about this like an SGI Indigo Elan or RealityEngine, we have these parts in the video system:

Sequence command decoder
Raster memory and Display Generator
Z buffer memory and Texture memory
GE7 Geometry engine GPUs.

They require different power supplies: The decoders and memory are normal 1.8/3.3 volt systems that pull a small amount of current to serve as the hotel load. The real power is needed for the Geometry Engines, and even they had a normal supply for the sequencers and buffers with the big power reserved for the transformation engines, lighting, shading, and polygon calculations.

So if a board appears as a device in Windows, the hotel circuits are probably running but the vector processors are out, which could point at those power supplies. I'll start sorting boards by whether they come up at all, come up with bad screens (probably Z buffer or raster memory errors), or do something else.

Hm.
m0niker
Newbie
*
Offline Offline

Activity: 39
Merit: 0


View Profile
January 12, 2017, 04:39:39 AM
 #22

Will you be able to post pictures along with what you find when fixing the GPUs? It would be awesome to learn, if all goes well I'll try to find some dead cards for you to fix  Wink
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
January 12, 2017, 04:47:21 AM
 #23

Will you be able to post pictures along with what you find when fixing the GPUs? It would be awesome to learn, if all goes well I'll try to find some dead cards for you to fix  Wink
Absolutely. I did that in my Titan and Neptune threads, it's fun to do.
helipotte
Hero Member
*****
Offline Offline

Activity: 650
Merit: 500


Pick and place? I need more coffee.


View Profile
January 12, 2017, 05:54:05 AM
 #24

All of the AMD cards I have worked on have three dc/dc supplies on them.

1) 1.5V for the GDDR5. This one is usually fixed and consists of two phases.
2) 0.9V - 1.0V for the GPU memory controller I/O.  This is controllable via firmware and is often one or two phases.
3) 0.8V - 1.2V  for GPU core.  This is always firmware controlled and is usually at least 4 phases but some cards can have 10 or more. Shocked

I have seen shorts on all of these.  I have an Asus 280x that has all THREE shorted.  I have checked the gate of each mosfet and they are all good.
Thinking about pulling the chokes but when I apply current limited power to it and watch with a thermal camera the GPU die heats up.  Strange.

One of the cards that keeps popping mosfets is an older Nvidia 760ti.  This card always blows just ONE high side fet.  It will turn on, post, then a variable
amount of time later (5-10 minutes) the PSU shuts down (shorted fet).  I feel the gate controller is to blame, could be wrong.   This card has 4 core phases and it is
not always the same one that pops its high side fet.  This card also has the same core resistance as a working card. Huh
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
January 12, 2017, 11:27:44 PM
 #25

Yup, BFLs had 6-phase power supplies, which really reduced ripple but went batshit if the FET drivers (2708's in the old days) shorted. Likewise you had an RC circuit across each inductor; this sensed the current through the inductor, and that was compared with the phase position from the LM driver to adjust for FETs running hot or cold. Problem is, if that RC circuit goes out of whack then FETs start exploding.

Oddly enough Titan/Neptunes did it right: they bought off-the-shelf supplies, synced them in pairs, then placed them on the board around the chip in an implosion design so that no die in the chip was ever farther from one supply than the other. This matters on a 6-12 phase system, since the distance from inductor to die can vary by an inch. An inch seems like a small distance, but when you're pulling 400 watts per die at 0.5 V that's 800 A of current, and P = I^2 * R, so the loss P gets very big even if R is very small.
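
To put rough numbers on that, here is a quick sketch. The 400 W and 0.5 V figures come from the post; the 0.1 milliohm path resistance is purely an assumed illustration value, not a measurement from any board.

```python
def die_current(power_w, core_v):
    """Current drawn by a die: I = P / V."""
    return power_w / core_v

def copper_loss(current_a, resistance_ohm):
    """Resistive loss in the delivery path: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm

ASSUMED_PATH_OHMS = 0.0001  # hypothetical 0.1 milliohm of copper between choke and die

current = die_current(400.0, 0.5)
loss = copper_loss(current, ASSUMED_PATH_OHMS)
drop = current * ASSUMED_PATH_OHMS  # voltage lost along the path: V = I * R

print(f"current: {current:.0f} A")
print(f"loss in path: {loss:.1f} W")
print(f"voltage drop: {drop * 1000:.0f} mV out of 500 mV")
```

Even with that tiny assumed resistance, a noticeable slice of the 0.5 V rail is lost before it reaches the die, which is why keeping every die the same distance from its supplies matters.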

I'm guessing a similar situation exists in the GPUs. Gotta run to Boston this weekend, if anyone's up there and wants to grab lunch let me know. Next week I'll start posting some pics of a blown 270 board I have here.

C
m1n1ngP4d4w4n
Full Member
***
Offline Offline

Activity: 224
Merit: 100

CryptoLearner


View Profile
January 13, 2017, 08:52:59 AM
 #26

Woah, true electronic repair guys are so rare nowadays  Shocked, keep up the good work man  Cool
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
January 22, 2017, 03:10:01 AM
 #27

So back from my trip, finished the backlog of work that came in, and can fiddle with these boards a bit more.

First up is an AMD R9/270 that I picked up on Ebay broken. Sure enough it didn't come up as a display, but was registered by the computer. So something was working, just not the whole thing.

The board


It's really pretty simple: On the left is the high power circuits for the GPU, on the right is a lower power circuit pair for the memory, and hotel circuits.

If you look more closely at the left side you can see how this is powered: Five separate chokes indicate 5 power supplies. The little FQDN chips next to the chokes are the FETs plus drivers plus dead-time logic. The lettering on them is hard to read but I can see they are Fairchild 6705B half bridge buck drivers. They contain the switching logic, a high and low side FET, and appropriate logic to determine cut-through and current sensing (via the little RC circuits at the bottom of the board). Pretty simple actually, according to the docs each one can handle 40a of current, so we're looking at a max of 200a into the GPU. About right.
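
As a sanity check on that "about right", here is a minimal sketch of the budget. The 5 phases and 40 A per phase come from the post; the 0.8 V to 1.2 V core range is the one quoted earlier in the thread for AMD cards and is only assumed to apply to this board.

```python
def max_rail_current(phases, amps_per_phase):
    """Upper bound on rail current if every phase runs at its rated limit."""
    return phases * amps_per_phase

def rail_power(current_a, volts):
    """Power delivered at a given rail voltage: P = V * I."""
    return current_a * volts

core_max_a = max_rail_current(5, 40)

# Assumed core voltage range, taken from the earlier post about AMD cards.
low_w = rail_power(core_max_a, 0.8)
high_w = rail_power(core_max_a, 1.2)

print(f"core rail limit: {core_max_a} A")
print(f"power ceiling: {low_w:.0f} W to {high_w:.0f} W")
```

That is a ceiling set by the driver ratings, not what the card actually draws.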

The question is: What is happening? Normally the high side FET shorts, in which case the cut-through circuitry crowbars the low side and shuts down the controller. The problem with that is if +12 was connected to the GPU, the low resistance of the GPU would essentially short out your power supply or blow the GPU sky high. Given that neither is happening I'm not sure the failure is in the FETs. It's possible the low side FET blew, but since the low side FET is on for most of the switching cycle in a buck converter, it doesn't usually short out. Plus the voltage drop on the high side is much higher (going from 12v to 1v instead of 1v to 0), so the high side FET normally blows.

Hm.... One way to find a shorted supply is to pull the chokes and check the resistance of the circuit at the output/FET side of the choke. The bad supply will read high (or low if the low side FET is shorted) and you're in business. Or if the FETs are exposed you can look for a short between gate and source or drain. Normally when a FET blows the gate is shorted as well.
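
One way to keep track of those measurements once the chokes are lifted is to compare each phase against the rest and flag the odd one out. This is only a sketch: the readings below are made-up placeholder values, and the 50% tolerance is an arbitrary assumption.

```python
from statistics import median

def flag_outlier_phases(readings_ohm, tolerance=0.5):
    """Return phases whose resistance differs from the median by more than
    tolerance (as a fraction of the median)."""
    mid = median(readings_ohm.values())
    return [name for name, r in readings_ohm.items()
            if abs(r - mid) > tolerance * mid]

# Hypothetical meter readings at the FET side of each lifted choke, in ohms.
readings = {
    "phase1": 1200.0,
    "phase2": 1180.0,
    "phase3": 2.0,
    "phase4": 1210.0,
    "phase5": 1195.0,
}

suspects = flag_outlier_phases(readings)
print("suspect phases:", suspects)
```

Comparing against the median means it doesn't matter whether the bad phase reads high or low, only that it disagrees with its neighbors.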


Need to think more about this.
helipotte
Hero Member
*****
Offline Offline

Activity: 650
Merit: 500


Pick and place? I need more coffee.


View Profile
January 23, 2017, 03:49:32 AM
 #28

I have two XFX cards.  A 280 and 280x that I got with dead shorts on the VDDC (gpu core) supply.  Turned out to be a smd ceramic cap on the back!
Found it by putting a D cell battery across the VDDC and looking at the card with a thermal camera.  The battery had just enough current to make the
bad cap glow, but not enough to damage anything.

Currently have a Devil 13 dual 290x that has some strange measurements.  One of the gpus shows much lower resistances than the other.  Looking like it might
have a fried GPU.  Do wish I could find pinouts or datasheets on these things.  I am going to try lifting the choke(s) to confirm where the shorts are.

This card is absurd as far as the power goes.  It has 15 power phases at 40A each.  5 per gpu core, 1 for each gpu memory controller and three for the GDDR5.  That's 120A just for the memory!
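
The phase arithmetic for this card can be tallied like so. Phase counts and the 40 A rating are as described in the post; the rail names are just labels chosen for the sketch.

```python
AMPS_PER_PHASE = 40  # per-phase rating quoted in the post

# Phase counts as described for the Devil 13 dual 290X.
phases = {
    "gpu1 core": 5,
    "gpu2 core": 5,
    "gpu1 memory controller": 1,
    "gpu2 memory controller": 1,
    "GDDR5": 3,
}

def rail_limits(phase_counts, amps_per_phase):
    """Rated current ceiling per rail: phases * amps per phase."""
    return {rail: n * amps_per_phase for rail, n in phase_counts.items()}

limits = rail_limits(phases, AMPS_PER_PHASE)
total_phases = sum(phases.values())

for rail, amps in limits.items():
    print(f"{rail}: {amps} A max")
print(f"total phases: {total_phases}")
```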
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
January 23, 2017, 03:53:49 AM
 #29

I have two XFX cards.  A 280 and 280x that I got with dead shorts on the VDDC (gpu core) supply.  Turned out to be a smd ceramic cap on the back!
Found it by putting a D cell battery across the VDDC and looking at the card with a thermal camera.  The battery had just enough current to make the
bad cap glow, but not enough to damage anything.

Currently have a Devil 13 dual 290x that has some strange measurements.  One of the gpus shows much lower resistances than the other.  Looking like it might
have a fried GPU.  Do wish I could find pinouts or datasheets on these things.  I am going to try lifting the choke(s) to confirm where the shorts are.

This card is absurd as far as the power goes.  It has 15 power phases at 40A each.  5 per gpu core, 1 for each gpu memory controller and three for the GDDR5.  That's 120A just for the memory!
Nice job! I've used a 30ah SAFT NiCD cell as a tester like that, enough current to heat up the short while keeping the voltage below what will blow up a hash engine or GPU. That second one could point to a bad high side FET, does it crowbar the power supply when plugged in by chance?
helipotte
Hero Member
*****
Offline Offline

Activity: 650
Merit: 500


Pick and place? I need more coffee.


View Profile
January 23, 2017, 04:18:18 AM
 #30

It does not. This card has voltage test points on the back edge.  While powered up, when I check the voltages this is what I get:

12V            good  This is only the power from the pci-e slot
3.3V           good  This is only the power from the pci-e slot
1.8V           good  Don't know what this is, suspect power for the PLX pci-e bridge chip
0.9V           good  Pci-e I/O
GDDR5        dead  supply for the 32 GDDR5 modules
gpu1/core   dead  This is the low resistance gpu  (1.0 ohm)
gpu2/core   good  This is the normal resistance gpu (2.5 ohm)
gpu1/mc     dead  This is the memory controller for the "bad" gpu (3.5 ohm)
gpu2/mc     good  This is the memory controller for the "good" gpu (35 ohm)

I suspect the start-up sequence this card uses is to power up the core and memory controller for each gpu first and then the memory due to it being shared.
Likely its firmware goes:

gpu1-->good?-->no-->crowbar.
gpu2-->good?-->yes-->turn on power.
memory-->do not turn on due to gpu 1 crowbar.

Sound plausible?
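
Written out as code, the guessed sequence looks something like this. It is only a model of the hypothesis above, not the card's real firmware; the function name and state labels are invented for the sketch.

```python
def power_up(rail_ok):
    """Model the suspected sequence: check each gpu, then gate the shared
    memory supply on both gpus having come up.

    rail_ok maps "gpu1"/"gpu2" to True if that gpu's rails check out.
    """
    state = {}
    for gpu in ("gpu1", "gpu2"):
        # A gpu that fails its check gets crowbarred instead of powered.
        state[gpu] = "on" if rail_ok[gpu] else "crowbar"
    # The GDDR5 supply is shared, so it only turns on if nothing was crowbarred.
    all_up = all(state[gpu] == "on" for gpu in ("gpu1", "gpu2"))
    state["memory"] = "on" if all_up else "off"
    return state

# The situation measured on this card: gpu1 looks shorted, gpu2 is fine.
result = power_up({"gpu1": False, "gpu2": True})
print(result)
```

That model reproduces the test point readings: gpu1 dead, gpu2 good, GDDR5 dead.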
64dimensions
Hero Member
*****
Offline Offline

Activity: 578
Merit: 508


View Profile
January 23, 2017, 02:26:06 PM
 #31

Either of you fix PC power supplies?

I have an Antec 1300W that is bad.
AmDD
Legendary
*
Offline Offline

Activity: 1027
Merit: 1005



View Profile
January 23, 2017, 03:33:51 PM
 #32

Cool, I'll be watching this. I also can add my name to the list of people willing to send a few broken cards to you. I should have some 7950's around somewhere.

BTC tip jar: 18EKpbrcXxbpzAZv3T58ccGcVis7W7JR9w
LTC tip jar: Lgp8ERykAgx6Q8NdMqpi5vnVoUMD2hYn2a
alucard20724
Sr. Member
****
Offline Offline

Activity: 703
Merit: 272


View Profile
February 05, 2017, 01:46:41 AM
 #33

i just pulled out my OOC box of gpus.

There's:
one R9 290,
three R9 280X
six 7970 with EK waterblocks, two cards of which have burnt gnd/pwr traces (water leak i think)
one R290X with ek waterblock... still in system.. too much of a pain to remove it.

you say you drill holes in teeth.. have you ever done a pcb repair?... that's the one thing i've never done.
alucard20724
Sr. Member
****
Offline Offline

Activity: 703
Merit: 272


View Profile
February 05, 2017, 01:58:29 AM
 #34

and i'm getting sucked in... just started looking at boom microscopes...  Cheesy
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
February 05, 2017, 03:11:45 AM
 #35

you say you drill holes in teeth.. have you ever done a pcb repair?... that's the one thing i've never done.
Sure. I have had to totally rebuild alternate power planes on blown Titans. What a mess that was.

And speaking of mess, pictures of my latest $30 Ebay special coming up.
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
February 05, 2017, 03:15:29 AM
 #36

So back to the GPUs. Got an R9 sapphire that was dead.



Didn't take long to figure out why it was shorting the power supply:



This board also has FETs that include the high and low sides together on one chip die. There are seven phases, with seven FET chips and seven chokes. However it looks like FET #7 shorted and literally *exploded*. Probably had a very large power supply. Not sure if this one can be fixed, but at least we know what is wrong with it....

Got some more Titans in so working on those this weekend, then an R4 that seems to be out. However this would shut the system down pretty hard.

C
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
February 05, 2017, 04:36:33 AM
 #37

And I just pulled a few parts to clear around the destruction. You can see the FET I pulled upside down on one of the inductors; even though they sanded off the part numbers we can see it's basically the same FET design and concept. Still doesn't change the fact that the top one vaporized though. My guess is the FET shorted and the power supply graciously destroyed it rather than trip out.



https://i.imgur.com/yTCyyHQ.jpg



Never dull, but getting annoying spending money for very blown up boards.
alucard20724
Sr. Member
****
Offline Offline

Activity: 703
Merit: 272


View Profile
February 05, 2017, 05:20:26 AM
 #38

And I just pulled a few parts to clear around the destruction. You can see the FET I pulled upside down on one of the inductors; even though they sanded off the part numbers we can see it's basically the same FET design and concept. Still doesn't change the fact that the top one vaporized though. My guess is the FET shorted and the power supply graciously destroyed it rather than trip out.



https://i.imgur.com/yTCyyHQ.jpg



Never dull, but getting annoying spending money for very blown up boards.

what desolder tool are you using to remove parts?
lightfoot (OP)
Legendary
*
Offline Offline

Activity: 3094
Merit: 2239


I fix broken miners. And make holes in teeth :-)


View Profile
February 05, 2017, 05:38:45 AM
 #39

what desolder tool are you using to remove parts?
Hot air tools and preheat. Specifically an Aoyue 951 and an 853 pre-heater. You really always should pre-heat the board, otherwise you risk lifting pins and overheating components.

smaxz
Sr. Member
****
Offline Offline

Activity: 430
Merit: 253


VeganAcademy


View Profile
February 05, 2017, 09:44:58 AM
 #40

what desolder tool are you using to remove parts?
Hot air tools and preheat. Specifically an Aoyue 951 and an 853 pre-heater. You really always should pre-heat the board, otherwise you risk lifting pins and overheating components.



this is a good tip, i just started work on that gigabyte we've been pm'ing back and forth about.

preparing to get out the good ole solder sucker tho ;p coupled with my hakko of course.



any idea why all sapphire gpu's have this thermal pad stuck over various smd components? also leaving that single ram chip completely uncooled.

- NGdTwHRSdnThdi1drQuHGT3khAHRtZ1HMq -