lightfoot (OP)
Legendary
Offline
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
|
|
December 19, 2015, 06:10:44 AM Last edit: December 19, 2015, 04:29:35 PM by lightfoot |
|
Given that KNC seems to be a bit... repetitive... in how they build things (not a bad thing actually), it might be possible to repair a Titan with downed engines. I'm watching this thing and I can see that some power supplies may be in the penumbra of the airflow stream; couple that with people's desire to run these things like hell-beasts and you could blow out some of the FETs.
Maybe I should re-title this KNC Neptune and Titan and Jupiter miners....
|
|
|
|
Prelude
Legendary
Offline
Activity: 1596
Merit: 1000
|
|
December 19, 2015, 03:30:17 PM |
|
Fantastic progress, lightfoot. I just bought a cheap SMD rework station partly because of this thread. Hoping to fix a Titan cube and a few Neptune cubes with this info. Now for the n00b questions: When you say you pulled the caps, I assume it goes without saying that you replaced them with new ones. Did you use caps with the same values as stock? I'm wondering what I need to order before attempting any of this. Sorry if this is a dumb question, I've never done this type of work before. If you ever feel the urge to post pictures showing what you replaced, don't hold yourself back.
|
|
|
|
lightfoot (OP)
|
|
December 19, 2015, 03:54:03 PM |
|
Fantastic progress, lightfoot. I just bought a cheap SMD rework station partly because of this thread. Hoping to fix a Titan cube and a few Neptune cubes with this info.
Thanks! Go for it, this is how we all learn and get better and stuff. I should do another talk at Defcon or something about this, need a good talk title (how to figure shit out when it's on fire? Hm...)
When you say you pulled the caps, I assume it goes without saying that you replaced them with new ones. Did you use caps with the same values as stock? I'm wondering what I need to order before attempting any of this. Sorry if this is a dumb question, I've never done this type of work before. If you ever feel the urge to post pictures showing what you replaced, don't hold yourself back.
Well, sort of. On the hashing boards I will replace the filtering caps because they stabilize the power input, which is being whacked around by the DC-DCs, and because they can help make the supply more efficient (power factor stuff, really interesting reads out there on that).
For the controller board, the caps are important, but a bit less so. You put them on the inputs for a similar reason, but the FPGA is only pulling .5a at 3.2 volts on the input (that's only about 1.6 watts) and only .001a on the 1.2 volt lines. So the exact values are a bit less critical, and if you leave them off for testing purposes the world will not come to an end. You can play fast and loose on these caps in the short term without destroying too much stuff. I left a few on (the most important one is the one next to the TPS65217 chip because that's where the DC-DC conversion and the chokes are) and for the rest I'll put them back on "later".
Finding out the values when the manufacturer doesn't give out schematics (boo!) is a bit complicated, but a $49 or so good Radio Schlock meter with the capacitance testing function is pretty good for getting close.
Now, if we were talking about caps in an RC circuit (for timing, checking waveform ripple across an inductor, or as part of a current sensing detector for an overloaded FET) then the values are more critical. But on the low power stuff on a Neptune board (aside from the fact that there is probably something like that around the TPS chip's regulator points) this is once again not too much of a problem.
Sure, I'll post pics of this reflow repair, and the times it took with heat to flow the components. Will grab the cam....
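For anyone who wants to check the controller-board numbers, here is a minimal Python sketch of the P = V * I arithmetic. It is not from the original post; the function and variable names are made up for illustration, and the voltages and currents are just the figures quoted above.

```python
def rail_power_watts(volts, amps):
    """DC power drawn on one rail: P = V * I."""
    return volts * amps


# Figures quoted in the post for the controller board's FPGA.
input_rail_w = rail_power_watts(3.2, 0.5)    # .5a at 3.2 volts on the input
core_rail_w = rail_power_watts(1.2, 0.001)   # .001a on the 1.2 volt lines

print(f"input rail: {input_rail_w:.2f} W")
print(f"1.2 V rail: {core_rail_w * 1000:.1f} mW")
```

At a watt or two on the input and about a milliwatt on the 1.2 volt lines, there is very little energy for the input caps to smooth, which fits the point that their exact values are not critical for short-term testing.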
|
|
|
|
lightfoot (OP)
|
|
December 19, 2015, 04:20:59 PM |
|
Ok, so let's see. First, here's a map of the KNC board with the locations of the main caps. Next a picture of the running board with a reflowed TPS chip (check it out) and some of the caps removed. Next we have what powers most of my work around here. And finally the equal to the above pic in the re-work world. Seriously, a good preheater is the difference between using a blowtorch or a crème brûlée torch to warm your coffee. By bringing the board up to 200c or so you can quickly remove and reflow components with just a touch of hot air heat.
|
|
|
|
lightfoot (OP)
|
|
December 20, 2015, 03:03:02 AM |
|
On to the next problem, dead boards.
Taking a look at a Neptune hashing board I can see they have a big chip in the center which appears to be 4 separate dies in one package. Makes sense, as the design of this board is an implosion type, with 8 DC-DC power supplies around the chip and every two supplies power one die/side of the board. That way you don't have to schlep all the power from one point across the board. Wish everyone did it this way, oh well.
Anyway, board #1 has a nice brown discoloration under 1/4 of the die and sure enough that's where the problem is: The 1 volt line to the chips is shorted there, 20 or so ohms on the other three. My guess is a blown something, now to find what....
|
|
|
|
lightfoot (OP)
|
|
December 20, 2015, 04:52:38 AM Last edit: December 20, 2015, 05:23:47 PM by lightfoot |
|
Ok, boards. They need more heat, need to be careful I don't accidentally re-flow the hashing chips, so I will put tinfoil over the underside of the chip. Kind of like how you keep your turkey from melting.
So anyway, board #1 in the screwed up world. Plugging it in with a Corsair heat sink on top (water cooling is so cool!) gave me a unit that came up but would barely hash. Checking the voltages showed a few things:
1) The layout of the power supplies is not easily apparent. It's actually:
2 0 3 1 4 5 6 7 (Maybe, first shot at mapping them)
According to the code, power supply 3 was reading no current, no temp and power supply 2 was reading almost no voltage, 1 amp, and high temps (70c). Unusual. Sure enough, the 1v rail normally will have a 30 ohm resistance cold; these two had a zero. Since one was trying to come up, I think the failure is in the other one shorting to ground and locking out the first one.
Great.
Now to figure out where it's shorting. These little power supplies are kind of cute: They are self-contained, can do all sorts of cool stuff, and have a pair of high side FETs on top and three low side FETs on the bottom. Low side carries a lot of current but has very low R(ds). High side is the opposite and they usually get hot as hell. So design makes sense.
On the low side rail short there are two places where it can happen: 1) Caps short 2) FETs on bottom side short.
Now to do some melting and testing...
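The triage described above (one supply silent, its partner trying to start into a dead rail) can be sketched roughly in Python. This is an illustration only, not KNC's monitoring code: the `Reading` fields, the threshold values, the 0.05 V stand-in for "almost no voltage", and the healthy example row are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    volts: float
    amps: float
    temp_c: float


def classify(r, min_volts=0.5, min_amps=0.1):
    """Rough triage of one DC/DC reading; thresholds are illustrative guesses."""
    if r.amps < min_amps and r.temp_c <= 0.0:
        return "silent"      # no current, no temperature: not reporting at all
    if r.volts < min_volts:
        return "collapsed"   # drawing current and heating, but the rail is held down
    return "ok"


readings = {
    2: Reading(volts=0.05, amps=1.0, temp_c=70.0),   # "almost no voltage, 1 amp, 70c" (0.05 is a placeholder)
    3: Reading(volts=0.0, amps=0.0, temp_c=0.0),     # "no current, no temp"
    4: Reading(volts=0.78, amps=45.0, temp_c=80.0),  # made-up healthy neighbour
}

status = {n: classify(r) for n, r in readings.items()}
for n, s in status.items():
    print(f"DC/DC {n}: {s}")
```

A "collapsed" supply next to a "silent" one is the pattern that points at a shared short on the 1v rail rather than two independent failures; the cold resistance check is what confirms it.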
|
|
|
|
lightfoot (OP)
|
|
December 20, 2015, 08:30:28 PM |
|
Note: Board 1 fully operational. Power supply #2 runs a bit hotter than the others, that's probably why it blew the caps apart. New caps on, speed is 620gh at 270 watts with three dies running at full power (450) and the slower die at 350. Peak dc temp 78c, chip temp 45c.
So on to the supplies. Pictures in a little bit.
Ok, pulling the supplies sucks. I might get some cheat sauce if I do more of them. That said:
It's best to heat the board at full temps. Need to get the board over 150c to have a shot.
Flux the tops of the power supply pillars
Apply heat from the *top* of the power board.
Go 400c with the heat, moderate flow
Lift the supply lines first, then the back
Be prepared to lift off straight. Otherwise things on the back of the board will go flying.
That said, with both supplies off I still see a short. Which means either it's a trimmer cap on the board or the CPU is shorted. But there are 30 caps, you can't check them all...
However you *can* use an old trick of powering the short and looking for things that are warm. Warm things are probably the source of the short. It's usually only a few degrees C, but you can pick it up with an IR temp tool. Which I have.
However you can't apply a voltage significantly above the max voltage the component can take, otherwise if the part blows open everything else will fail big time. So I need a 1 volt supply at a thousand or so amps peak.
Fortunately I know exactly where to find such a voltage. Next week will be using explosive power to troubleshoot technology.
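The "power the short and look for warm things" trick boils down to comparing each part's IR reading against what the rest of the board reads. A minimal Python sketch, assuming hypothetical component labels and readings (none of these numbers are measurements from this board) and a 2 degree threshold picked to match "only a few degrees C":

```python
from statistics import median


def warm_suspects(temps_c, min_rise_c=2.0):
    """Return (name, rise) for parts reading at least min_rise_c above the board median."""
    baseline = median(temps_c.values())
    suspects = [
        (name, temp - baseline)
        for name, temp in temps_c.items()
        if temp - baseline >= min_rise_c
    ]
    # Warmest first: the biggest rise is the best candidate for the short.
    return sorted(suspects, key=lambda s: s[1], reverse=True)


# Hypothetical IR readings taken while feeding the shorted rail from a low-voltage supply.
ir_readings_c = {
    "C101": 24.1,
    "C102": 24.3,
    "C103": 27.9,
    "C104": 24.0,
    "C105": 24.2,
}

for name, rise in warm_suspects(ir_readings_c):
    print(f"{name}: {rise:.1f} C above the rest")
```

Using the median as the baseline keeps one hot part from dragging the reference up, which matters when only one or two of 30 caps are expected to be bad.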
|
|
|
|
lightfoot (OP)
|
|
December 21, 2015, 02:38:13 AM |
|
And a few pictures. First, one with the Neptune power supplies removed. And the water cooled Neptune, actually works great and allows the power supplies to breathe better... Working on some other things for a bit, but still a lot of fun to be poking around in all of this.
|
|
|
|
lightfoot (OP)
|
|
December 23, 2015, 03:15:49 AM |
|
Fast update: Might have a Titan controller and dead units coming in, we'll see. If so I'll post pictures and updates on those, will be interesting to see what they look like.
Also put on the FPGA on the first board and now I have two running Neptune controllers. The secret to reflowing 100% pin BGAs is heat under the board, zephlux, and a really good watch loupe to verify the balls are on the pads.
Next up in the meantime is a weird Neptune board: This one has a short on one of the low voltage sides, but does not respond at all to a controller. It looks like it's hanging up the FPGA side of the SPI interface, might be shorted. So I'm going to pull the two bad power supplies and see if that clears it.
|
|
|
|
qberty
|
|
December 23, 2015, 06:36:20 AM |
|
Fast update: Might have a Titan controller and dead units coming in, we'll see. If so I'll post pictures and updates on those, will be interesting to see what they look like. ...
Judging by the pictures you've taken of Neptunes, you'd be appalled at how similar everything is, with essentially the same exact placements and design (except for the low-binned chip).
|
|
|
|
Prelude
|
|
December 23, 2015, 07:02:06 AM |
|
Fast update: Might have a Titan controller and dead units coming in, we'll see. If so I'll post pictures and updates on those, will be interesting to see what they look like. ...
Judging by the pictures you've taken of Neptunes, you'd be appalled at how similar everything is, with essentially the same exact placements and design (except for the low-binned chip).
Yep, pretty much the same. Power supplies are 40A on Titans instead of 50A like Neptunes. Stupid KFC... Loving the progress, lightfoot. Keep it up.
|
|
|
|
lightfoot (OP)
|
|
December 24, 2015, 12:39:02 AM |
|
Well it may be harder to overvolt Titans. Scrypt has all sorts of memory crap as well as CPUs, so just piling on more power might not buy the extra performance the way it does on a simple SHA die.
In other news I am noticing that the FPGAs do get kind of warm when the board is sitting on a carpet. I could see those caps failing if they were not vented properly.
|
|
|
|
lightfoot (OP)
|
|
December 26, 2015, 08:03:13 PM |
|
Happy MacArthur Day! Had some free time so I spent it melting off one of the chips on a board that was a complete failure.
By "complete failure" I mean the board failed every attempt to talk to a controller. Shorts out the controller board basically, different values on pin 1 of the 10 pin adapter. Dead short in one of the DC-DC converters, black burn spot on the back.
Pulling the power supplies and caps did not clear the short so I pulled the chip. Note that these chips are *stupid big*. As in 380 degree bottom heat for 30 minutes. As in 470c top heat all over the chip before it finally came loose. As in lift off chip and have it fall back on the board because my picker doesn't have enough suction.
Oh well.
However I lifted the chip to preserve the top right side (the shorted one) and I can see the problem: The chip underside got so hot at some point it reflowed the solder. Resulting in the chip literally shorting itself. In this case I'm guessing one of those shorts was to the SPI control line which would render the board fucked.
Interesting. So much heat it literally melted the solder under it. No wonder why the board was charred...
Fixing this is going to be a serious bitch on wheels. Hm.
|
|
|
|
lightfoot (OP)
|
|
January 01, 2016, 08:28:48 PM |
|
Been a productive week: turns out the 600gh Neptune I fixed continues to hash at 600gh, the controller boards continue to work fine, and all the solder is off the wrecked Neptune board.
I did promise a picture of the board showing the shorts. In this picture look for the little bridges in the middle of the board, typically 3 balls wide, right where the burned part is. That is the proof that the boards are overheating under the chips, melting the solder, and shorting the power lines to the SPI bus, which is why the board goes into the drink.
Now to find a reballing stencil that will fit this thing. It's too big for the normal 90*90mm reflow table, so I will have to either cobble together a custom stencil or do it in quarters. That will be a *lot* of fun... Fortunately there is a lot of redundancy in the balls on this chip, so missing one or two won't sink the whole project. For the record it's .6mm balls, 1.0mm spacing.
Next up, Titan work!
|
|
|
|
hawkfish007
|
|
January 01, 2016, 08:51:27 PM |
|
lightfoot, any idea why one of my cubes is drawing more power and its DC-DC module amperage is higher than the others even at lower voltage? Is it on its way out?
|
|
|
|
lightfoot (OP)
|
|
January 01, 2016, 10:32:59 PM |
|
That chip might have a weird die. Clock it down a bit, clock the others up a bit. Also note that the position of the power supply on the board with respect to the fan can make a difference. Or the dc-dc is covered in dust, dirt, animal droppings (it's happened...) and is not able to dissipate its heat as well as the others.
Edit: Oh you're worried about the CHIP temp. Didn't see that. Hm. Maybe the heat sink has come a bit loose? Fan obstructed by dust, dirt, animal stuff? Thermal compound gone bad? If it's hotter, don't move the cube while it is running, let it cool or you could smear the solder.
But my guess would be some heat dissipation issues in the chip top to heatsink world.
|
|
|
|
hawkfish007
|
|
January 02, 2016, 12:30:05 AM |
|
I am also worried about Current (A) (3rd column), I believe those Ericsson DC-DC modules are rated for 40A, but they are at 45. Rest of cubes stay between 39-41 except for this one. Is it going to be detrimental in the long run?
|
|
|
|
lightfoot (OP)
|
|
January 02, 2016, 01:14:46 AM |
|
Hm. Well, I'm running these test Neptunes at 600gh and here are the numbers:
Temp : 52.0 °C
Power : 263.651 W
DC/DC  Voltage (V)  Current (A)  Power (W)  Temperature (°C)
0      0.7782       34.8125      27.091     75.625
1      0.7831       34.6250      27.115     82.625
2      0.7760       44.2500      34.338     85.000
3      0.7777       44.5000      34.608     86.625
4      0.7821       45.6875      35.732     81.750
5      0.7809       45.0000      35.141     76.000
6      0.7765       44.6875      34.700     80.625
7      0.7794       44.8125      34.927     85.000
That's with 350mhz on the first die, 450mhz on the other three. A tad bit hot.
On the water cooled 3/4 unit, 400gh
Temp : 43.5 °C
Power : 170.360 W
DC/DC  Voltage (V)  Current (A)  Power (W)  Temperature (°C)
0      0.7825       33.8125      26.458     85.125
1      0.7834       33.5625      26.293     83.625
4      0.7783       38.7500      30.159     75.250
5      0.7786       38.8125      30.219     80.875
6      0.7804       36.6250      28.582     83.000
7      0.7822       36.6250      28.648     86.250
Running 350,400,375mhz on the three remaining dies.
Now checking the actual temps with the laser IR tool I see the temps on the top of the high side FETs are 77c. To me that's kinda hot, and about as high as I would want to run them. The 12v filter caps are right there, and the question then becomes "what is the max operating temp of the caps?"
Cheap crap caps are 85c max. Good caps are 105c, and the best ones will take 125c. What do you think KNC went with?
Back to the 40a thing: my guess is if you trim them down to 40a draws you will probably find the temps drop a lot. Once you start going above 40a, that's when internal temps start to climb. I would say keep the temps below 85c and you will be ok. Less if you're a Titan since the extra hashing speed you gain is probably not worth the risk of fireballing your unit.
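If you want to run the same sanity checks on your own readouts, here is a rough Python sketch that parses the two DC/DC tables above, sums the per-module power, and flags anything over the 40a and 85c marks discussed in this thread. The limits are the thread's rules of thumb, not datasheet values, and the function and variable names are made up for illustration.

```python
AIR_COOLED = """\
0 0.7782 34.8125 27.091 75.625
1 0.7831 34.6250 27.115 82.625
2 0.7760 44.2500 34.338 85.000
3 0.7777 44.5000 34.608 86.625
4 0.7821 45.6875 35.732 81.750
5 0.7809 45.0000 35.141 76.000
6 0.7765 44.6875 34.700 80.625
7 0.7794 44.8125 34.927 85.000
"""

WATER_COOLED = """\
0 0.7825 33.8125 26.458 85.125
1 0.7834 33.5625 26.293 83.625
4 0.7783 38.7500 30.159 75.250
5 0.7786 38.8125 30.219 80.875
6 0.7804 36.6250 28.582 83.000
7 0.7822 36.6250 28.648 86.250
"""

AMP_LIMIT = 40.0      # rule of thumb from the thread, not a datasheet figure
TEMP_LIMIT_C = 85.0   # "keep the temps below 85c"


def parse(raw):
    """Turn 'dcdc volts amps watts temp' lines into tuples of numbers."""
    rows = []
    for line in raw.strip().splitlines():
        dcdc, volts, amps, watts, temp_c = line.split()
        rows.append((int(dcdc), float(volts), float(amps), float(watts), float(temp_c)))
    return rows


def summarize(raw, gh_per_s):
    rows = parse(raw)
    total_w = sum(w for _, _, _, w, _ in rows)
    return {
        "total_w": round(total_w, 3),
        "w_per_gh": round(total_w / gh_per_s, 3),
        "over_amps": [n for n, _, a, _, _ in rows if a > AMP_LIMIT],
        "at_or_over_temp": [n for n, _, _, _, t in rows if t >= TEMP_LIMIT_C],
        # Largest gap between the reported power and volts * amps for any module.
        "max_vi_mismatch_w": max(abs(v * a - w) for _, v, a, w, _ in rows),
    }


air = summarize(AIR_COOLED, 600)     # 600gh board
water = summarize(WATER_COOLED, 400)  # 400gh water cooled 3/4 board

for label, s in (("air", air), ("water", water)):
    print(label, s)
```

The summed module power lands within a couple of milliwatts of the Power line reported for each board, so the per-module columns are self-consistent. The air cooled board has six of eight modules over the 40a mark, while the water cooled one has none.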
|
|
|
|
hawkfish007
|
|
January 02, 2016, 01:39:22 AM |
|
Roger that, 10-15 MH/s gain isn't worth risking the cube. I will set it to 300.
|
|
|
|
coinut
|
|
January 02, 2016, 12:54:47 PM |
|
Loving what you are doing here, lightfoot. Can't wait for the Titan hacking, keep it up mate.
|
|
|
|
|