lightfoot (OP)
Legendary
Offline
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
|
|
January 29, 2016, 04:42:00 AM |
|
Back to working on boards: Got three controllers in today, all dead. Bit odd in that they don't flash the green light which is what normally happens when the TMS chip or FPGA fails.
First board had literally blown two of the pads where the TMS chip is. Great. Put that aside.
Second board fired up and lit the bright light, followed by green with no miners. Nice. Figured it might be ok, then it shut down the power supply. Not nice. Power cycled and felt temps. When the bright light lit, the FPGA started getting *exceptionally* hot. Hot enough to burn my thumb. Ow. Powered down, went to pull the FPGA chip (bad).
FPGA came off, but some of the traces underneath it literally had delaminated. Wow. Now it's true that the controller board is not designed for heat, but this is a first. So I cleaned it up, took off the solder, and straightened the 6 pads that had come loose. Tomorrow I'll sit down and see if I can put a new chip on it; the balls should pick up the pads and provide stability.
This however points to a failure mode: FPGA degrades and pulls more current. Gets hotter, then either the FPGA shorts or it blows up the TPS chip. Or in this case screws everything up.
Never dull.
|
|
|
|
GenTarkin
Legendary
Offline
Activity: 2450
Merit: 1002
|
|
January 29, 2016, 05:32:39 PM |
|
Man.....why did KNC build such utter SHIT?!?!?
|
|
|
|
Titan-Allen
Newbie
Offline
Activity: 11
Merit: 0
|
|
January 29, 2016, 06:45:23 PM |
|
Quote from: GenTarkin
Man.....why did KNC build such utter SHIT?!?!?
They only wanted to raise money, not actually build something worth the money. >_< Ironically, many claimed that they mined with these machines prior to sending them to the purchasers, but I have to wonder what kind of cluster-F that turned into when the machines were so attention-demanding. Think of all the times they would have had to reset cubes, just to keep them mining. I suspect they realized the production quality was rather poor, once or if they had mined with them. This is what happens when you outsource the entire design and production of a product, with no or little quality assurance, or oversight, being incorporated in the process.
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
|
|
January 29, 2016, 11:07:41 PM |
|
I don't think it's utter shit, technically there is always worse. Using off the shelf power supplies was actually a masterful idea since it took the risk of building a mongo power supply out of the equation. And I have no clue how they managed to get a high power scrypt chip together....
Then again the cases are hand cutting.... ok they suck.
(there's also the fact that building miners is a queen's race. I wouldn't do it)
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
|
|
January 29, 2016, 11:13:45 PM |
|
In the meantime I had a couple of interesting breakthroughs:
I know what the pins for those titans are now: They are 3.3 volt and 1.8 volt sources from the TMS power supply. They're not for the main power supplies (those have their own power sources); they might be for the EEPROM. I'm going to modify (cut) a 10 pin cable to bypass them and see what happens when I plug it into a controller:
If the LM75's come up with no power and it recognizes Titan that's not good. If they don't I will haul both and see.
I'll then try powering up the 12 volt line.
I think also I can see where an exploding power supply could take out cubes on a controller (and the controller). Basically that is a shared line to all cubes, if 12v got on those lines it would short anything there.
I just hope it ain't the dies. If so we're fucked.
Also I took apart the roaster board and sure enough both the FPGA and the TMS chip are shot. Replaced TMS, voltages are up, will think about putting my last FPGA chip on there. I have a several hundred dollar order on with Digi-Key, that will come in next week.
Next up: Titan work.
|
|
|
|
helipotte
|
|
January 30, 2016, 01:23:19 AM |
|
Awesome work! Always intriguing to watch your "stream of thought" in these threads.
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
|
|
January 30, 2016, 02:04:18 AM |
|
I swear to God, I'm beginning to think I can fix a bad day. 6 BGA pads lifted, placed back by hand, new FPGA placed, aligned, flowed and the damn thing works. Yep, I definitely worked for this one.
I'll deal with the other three when the parts come in next week. But yep, I'm fairly good....
|
|
|
|
boomin
|
|
January 30, 2016, 06:35:40 AM |
|
I'm pretty sure that that one is mine Thanks! (I know I have 3 there somewhere!) LOL
|
|
|
|
Tigggger
Legendary
Offline
Activity: 1098
Merit: 1000
|
|
January 30, 2016, 10:16:40 AM |
|
Further to my post a few pages ago, https://bitcointalk.org/index.php?topic=1283859.msg13519222#msg13519222
I turned my machine off shortly after as it wasn't really profitable to run anymore in the UK, but the recent spike in rates on nicehash brought it back to life. After talking with GenTarkin I bought his custom firmware and it did cure the problem.
Here are a couple of other problems that my machine has. I'm sure you've come across them already, but just for info in case you haven't:
Asic 4, Die 1 (The Half Running Die): I have quite a few of these across my 6 cubes. I presume this is just a faulty asic chip and nothing can be done.
Asic 5, Die 2 (The Half Speed Die): Have a couple of these as well. They will only run at much reduced speeds as the voltages are very high. These are a mystery and I am curious about the problem.
|
|
|
|
Searing
Copper Member
Legendary
Offline
Activity: 2898
Merit: 1465
Clueless!
|
|
January 30, 2016, 11:33:37 AM Last edit: January 30, 2016, 11:52:10 AM by Searing |
|
Quote from: GenTarkin
Man.....why did KNC build such utter SHIT?!?!?
they were designed for 3 month warranty....that was it ..the only reason the orig titan owners got 1 year is a bunch of us ordered such before they changed the web pages...they then tried 'dual warranty': those before that date got 1 year, those after got 90 days..
part of the reason I'd like to figure out how to take all the wayback machine archives of knc from 2013-2015 and 'merge' them for a kinda sorta mirror (anyone know how this is done?)...just so I could read all the crap they pulled again and flat out 'lies' they said along the way ...(why they nuked it imho ....because of the class action) ..
then of course the large amount of RMA's came in on the first batch of dies not being too good...so they then changed the max from 300 on the adv page to 325....so they have overclocked this unit 2x above ..what their orig web pages said of 250mh.....thus the need for the y 2 cable and 2 psu's etc etc.....after that point all titans have been run full out
it is a frigging asic 'miracle' my raspberry pi 8th grade science project controller board, just sitting around exposed to the air/cats/etc (I mean could not even afford a plastic box for it ..how cheap is that), has lasted 14 months.... I attribute that to just the plain 'evilness' of KNC machines........like 'asic' zombies ...unless monitored constantly...they 'kinda sorta' work but 'refuse to die', keeping 'minions' like us in the constant battle to stop them from bricking themselves to a door stop at the first opportunity..evil I tell ya
and of course what the market would bear when they put the price tag on these. by the by ...what is an estimate from someone in say the mnfg field of the reality of what these cost with parts/labor and the chips....1/5th what they charged is my guess at the MOST..
anyway....like all 'space heaters' they will eventually 'fry out' imho ..they were 'rode hard and put away wet' out the factory gate
last point...did anything ever come of that 'swedish class action' where you put 1.5k down per Titan and/or maybe more if you lost the case?
|
|
|
|
mmfiore
|
|
January 30, 2016, 02:18:42 PM |
|
@ Tigggger
I have a couple of cubes that do what your cubes are doing: they run for a little bit and then shut off. I believe it is caused in many cases by insufficient cooling and the heat sink not being in good contact with the surface of the chip. I went out to examine the cubes and found that the two that were malfunctioning I had laying on their sides. This I think had the effect of pulling the heat sink away from the surface of the dies. I placed them right side up and the cubes started to perform better. They still need improvement because they still seem to be running hot. I will recondition them to improve cooling. It's good that people are sharing info on this blog; this info is helpful.
Question for lightfoot. Do you fix titan cubes for other people?
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
|
|
January 30, 2016, 03:22:22 PM |
|
Sure, I can fix Titans, they're still worth fixing based on the prices on Ebay. Feel free to PM me about what you have.
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
|
|
January 30, 2016, 03:26:57 PM |
|
Quote from: Tigggger
Asic 4, Die 1 (The Half Running Die) I have quite a few of these across my 6 cubes, I presume this is just a faulty asic chip and nothing can be done Asic 5, Die 2 (The Half Speed Die) Have a couple of these as well, they will only run at much reduced speeds as the voltages are very high, these are a mystery and am curious on the problem.
ASIC4 is most likely a blown power supply. However fixing it would only give you a few more MH, so it might not be worth fixing.
The second one is a bit more interesting. Try going to -.0366 or so and 50mh on the chip. Watch to see if the supplies turn on for awhile then turn off or start at 0 volts. Also check the heat sink connection, and if there is junk/crap all over the power supplies especially the ones on the sides.
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
|
|
January 30, 2016, 05:25:24 PM |
|
Quote from: boomin
I'm pretty sure that that one is mine Thanks! (I know I have 3 there somewhere!) LOL
Yep, and as requested I just sent the one fixed back. I tested it overnight and did a quick test on all six ports this morning. And got out of bed, sucks to get out of bed on Saturday :-)
Can't do anything on the others (or the one that just arrived this morning) till Tuesday when the next batch o' parts come in. So I'll focus on these Titan boards and see if I can get one of them to respond to basic commands.
|
|
|
|
GenTarkin
Legendary
Offline
Activity: 2450
Merit: 1002
|
|
January 30, 2016, 06:35:20 PM |
|
Quote from: Tigggger
Further to my post a few pages ago, https://bitcointalk.org/index.php?topic=1283859.msg13519222#msg13519222 I turned my machine off shortly after as wasn't really profitable to run anymore in the UK, but the recent spike in rates on nicehash brought it back to life, after talking with GenTarkin I bought his custom firmware and it did cure the problem. Here are a couple of other problems that my machine has, I'm sure you've come across them already but just for info in case you haven't Asic 4, Die 1 (The Half Running Die) I have quite a few of these across my 6 cubes, I presume this is just a faulty asic chip and nothing can be done Asic 5, Die 2 (The Half Speed Die) Have a couple of these as well, they will only run at much reduced speeds as the voltages are very high, these are a mystery and am curious on the problem.
Question, on ASIC 4 DIE 1 ... does that DCDC work fine after a power cycle? or is it permanently like that? Also, due to only having one DCDC powering the ASIC I would suggest lowering voltage a bit more or lowering ur clock, that remaining DCDC is pumping over 46A which is way beyond spec for these DCDC's ... 43A is bout the max I would recommend for longevity. My upcoming firmware will attempt to power cycle the DCDC's to bring back the DCDC 1 thats messed up ... so in ur case if its permanently damaged u may have to turn off that ASIC entirely w/ my upcoming firmware release to prevent constant bfgminer restarts.
ASIC 5 DIE 2, I Theorize the voltage is higher because there is barely any load on the DCDC's at that clock setting...compared to the load of a normal ASIC clock setting.
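To make the 46A versus 43A point concrete, here is a minimal sketch of flagging DCDCs that run above a chosen current limit. It is only an illustration: the function name, the reading names and the way the readings are collected are all assumptions, and it is not taken from the firmware discussed in this thread. Only the 43A and 46A figures come from the post above.

```python
# Hypothetical sketch, not the actual firmware logic.
# 43 A is the longevity figure suggested in the post above.
RECOMMENDED_MAX_AMPS = 43.0


def overloaded_dcdcs(readings, limit=RECOMMENDED_MAX_AMPS):
    """Return the names of DCDCs whose output current exceeds the limit.

    readings: dict mapping a DCDC name (made-up labels here) to amps.
    """
    return sorted(name for name, amps in readings.items() if amps > limit)


if __name__ == "__main__":
    # Illustrative values: one dead DCDC, the other carrying the whole load.
    example = {"asic4_dead_dcdc": 0.0, "asic4_remaining_dcdc": 46.0}
    for name in overloaded_dcdcs(example):
        print(name, "is above", RECOMMENDED_MAX_AMPS, "A: lower voltage or clock")
```

A check like this would sit next to whatever reads the DCDC telemetry; how that telemetry is read on a Titan is not covered here.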
|
|
|
|
boomin
|
|
January 30, 2016, 06:36:38 PM |
|
I must say - Lightfoot is an amazing contributor and an incredible person. I sent him 3 broken KNC controllers (1 FUBAR) he got them fixed in a VERY timely manner, Always professional in conversation. A TRUE asset to the BTC world. I have been mining for about a year and I have to admit it was a little weird sending someone I have never met some equipment to evaluate and repair. But I will vouch for this dude forever! I wish all of my "chances" that I have taken in the BTC world would have gone this smooth. Keep doing what you are doing Lightfoot. All others could learn from your example! Sincerely, Boomin
|
|
|
|
GenTarkin
Legendary
Offline
Activity: 2450
Merit: 1002
|
|
January 30, 2016, 06:39:56 PM |
|
@ Lightfoot ...
hey dude, give the tech docs a read for the DCDC's ... they are highly configurable, switching frequency and a whole host of settings like 90+ settings can be configured on these DCDC's ... u know more bout the electrical side of things than myself. Maybe you can figure out some more optimal settings to run these DCDC's at, I believe KNC largely just runs them stock cuz I dont see much in the code that really configures the DCDC's at all.
But all the safety precautions Ive coded into my firmware are based off the tech docs, such as overtemp & overcurrent situations etc...
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
|
|
January 30, 2016, 07:33:39 PM Last edit: January 30, 2016, 07:56:16 PM by lightfoot |
|
My wife wonders why I spend the day slaving over a hot iron (soldering iron).....
Ok, more research on the shorting Titans: Pin 4 and 6 are very interesting. They go to not only the LM75 and the EEPROM, but also to the four itty bitty chips U19,18,17,9 and the big cap C150 and I would bet that weird cap C47 as well.
Each of these chips has a small capacitor between that line and frame ground. What I'm beginning to think here is that the power supply failure blew apart the (non) isolation between the 12 volt rail and this little 3.3 volt somewhat isolated rail which went between boards and blew up a lot of stuff.
Unfortunately removing those caps and chips did not clear the short. More knowledge but still stuck. I was hoping it was a shorted cap, that could have been enough to sink everything. Drat.
And for those following along, here is a map of the 10 pin connector along with what you should see to ground and what I *do* see to ground on bad boards that blow up fpgas....
Pin      Good                 Bad        Notes
pin 1    open                 open
pin 2    6.6k                 5.6k       pin 8 on rom, pin 1 on lm75
pin 3    open                 open
pin 4    .9k                  short      pin 2 on lm75, pin 2 on rom
pin 5    open                 open
pin 6    .9k                  short      same as pin 4
pin 7    6.4k                 5.59k
pin 8    hopped to megohms    39 ohms
pin 9    open                 open
pin 10   short                short
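For anyone checking their own board against that map, here is a rough sketch of the comparison in code. The good and bad values are the ones from the table; everything else is an assumption made for illustration: the 25% tolerance, treating "short" as under 1 ohm, treating "open" as an infinite reading, and treating "hopped to megohms" as anything at or above 1 megohm.

```python
OPEN = float("inf")  # meter shows open circuit / over-limit


def around(nominal_ohms, tolerance=0.25):
    """Acceptable range around a nominal reading (tolerance is an assumption)."""
    return (nominal_ohms * (1 - tolerance), nominal_ohms * (1 + tolerance))


# Expected resistance to ground per pin on a good board, as (low, high) ohms.
GOOD_RANGES = {
    1: (OPEN, OPEN),
    2: around(6600),
    3: (OPEN, OPEN),
    4: around(900),
    5: (OPEN, OPEN),
    6: around(900),
    7: around(6400),
    8: (1e6, OPEN),   # "hopped to megohms": assumed to mean >= 1 megohm
    9: (OPEN, OPEN),
    10: (0.0, 1.0),   # short to ground: assumed to mean under 1 ohm
}

# Readings from the bad boards in the table (short entered as 0 ohms).
BAD_BOARD = {1: OPEN, 2: 5600, 3: OPEN, 4: 0.0, 5: OPEN,
             6: 0.0, 7: 5590, 8: 39, 9: OPEN, 10: 0.0}


def suspect_pins(measured, good_ranges=GOOD_RANGES):
    """Return the pins whose reading falls outside the good range."""
    suspects = []
    for pin, (low, high) in good_ranges.items():
        if not (low <= measured[pin] <= high):
            suspects.append(pin)
    return sorted(suspects)


if __name__ == "__main__":
    print(suspect_pins(BAD_BOARD))
```

With the loose tolerance chosen here, the slightly low readings on pins 2 and 7 pass and only the shorted or near-shorted pins are flagged; tightening the tolerance would flag those as well.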
Edit: Cutting the lines to pins 2,4,8 results in a controller that doesn't shut down but still doesn't work even with 12 volts off. Which means those power the pins to the LM75 and EEPROM to wake those up. Damn!
|
|
|
|
Tigggger
Legendary
Offline
Activity: 1098
Merit: 1000
|
|
January 30, 2016, 08:36:11 PM |
|
Quote from: lightfoot
The second one is a bit more interesting. Try going to -.0366 or so and 50mh on the chip. Watch to see if the supplies turn on for awhile then turn off or start at 0 volts. Also check the heat sink connection, and if there is junk/crap all over the power supplies especially the ones on the sides.
Will have a play on Monday when I'm at the location my machine is at and let you know.
Quote from: GenTarkin
Question, on ASIC 4 DIE 1 ... does that DCDC work fine after a power cycle? or is it permanently like that? Also, due to only having one DCDC powering the ASIC I would suggest lowering voltage a bit more or lowering ur clock, that remaining DCDC is pumping over 46A which is way beyond spec for these DCDC's ... 43A is bout the max I would recommend for longevity. My upcoming firmware will attempt to power cycle the DCDC's to bring back the DCDC 1 thats messed up ... so in ur case if its permanently damaged u may have to turn off that ASIC entirely w/ my upcoming firmware release to prevent constant bfgminer restarts. ASIC 5 DIE 2, I Theorize the voltage is a higher because there is barely any load on the DCDC's at that clock setting...compared to the load of a normal ASIC clock setting.
Re Asic 4: All the dies that do that have been like that since I got the machine. I have tried lowering voltages/speeds but it doesn't seem to make any difference to the Amps so just settled on the compromise of slightly over. Server room is air conditioned so not really worried about temps and has probably helped to keep them going.
I intend to do as above and mess around with them all on Monday to see if any of the ones I currently have off can be brought back to life through your firmware.
Thanks to you both
|
|
|
|
lightfoot (OP)
Legendary
Offline
Activity: 3220
Merit: 2334
I fix broken miners. And make holes in teeth :-)
|
|
January 30, 2016, 09:37:51 PM |
|
Ok, tighten up this is where it gets *really* interesting.
So I'm fuzzing both this board and a dead Neptune. The question is where the fuck is everything and what the fuck is going on?
Let's start with the basics: SCL requires four things:
An SCL clock
An SCL signal line
A vcc (typically 3.3v)
A ground
We're going to assume here for laughs that SCL is what these clowns are using. Ok. So what does the 10 pin connector do?
Well, we can reverse engineer things. We know that on the titans pin 4,6 are shorted. Great. We also know that pin 8 is broken and pin 2 "works". And we know what a LM75 and a EEPROM looks like.
Here is where things go on a Neptune:
SCL clk: pin 2 on 10 pin. Ok that makes sense as a clock.
SCL signal: pin 5 on 10 pin. Once again, ok.
Ground: Frame ground and pin 10 on 10 pin. Fair.
vcc: Here is the weirdness. On a Neptune it goes NOWHERE! None of the 10 pins register.
So how the fuck does a Neptune generate the power? It doesn't come from the ribbon and there is no other power supply.... Aw fuck-tarts.... Checking the 14 pin connectors on the back, I see pin "2" on the Ericksons, which is flagged as "remote" or something. My guess is KNC pulled the 3.3v for the hotel stuff from there. Explains why some Neptunes don't appear till powered up, unlike Titans.
Well, if that is bussed over, how about clk and signal? Checking shows yup, they show up on the Ericksons as well which means that everything is on one SCL bus. Great.
So what is with pins 4,6,8 on a Titan? My guess is they separated things a bit. Maybe. However if they are connected then a blown supply could cause a ground loop and bye bye circuits.
More importantly since the power is generated by the 3.3v supplies on the Titans, the failure could be anywhere along that supply chain. Time to remove power supplies! AAARRRGGGHHH!
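A side note for anyone following the bus tracing: the LM75 is an I2C temperature sensor, so the "clock" and "signal" lines above would be what I2C calls SCL and SDA. Below is a small sketch that records the Neptune mapping as traced in this post and decodes the two bytes an LM75 returns from its temperature register. The decoding follows the standard LM75 format (9 bit two's complement in the top bits, 0.5 C per step); the dictionary layout and function name are made up for illustration, and nothing here is confirmed against KNC's own code.

```python
# Neptune 10 pin connector signals as traced in the post above.
# vcc is deliberately absent: none of the 10 pins registered for it.
NEPTUNE_10PIN = {
    "clock": 2,
    "signal": 5,
    "ground": 10,
}


def lm75_celsius(msb, lsb):
    """Decode the two temperature-register bytes of a standard LM75.

    The reading is a 9 bit two's complement value held in the top
    9 bits of the 16 bit register, with 0.5 degrees C per step.
    """
    raw = ((msb << 8) | lsb) >> 7   # keep the top 9 bits
    if raw & 0x100:                 # sign bit of the 9 bit value
        raw -= 0x200
    return raw * 0.5


if __name__ == "__main__":
    print(lm75_celsius(0x19, 0x80))  # two example bytes
```

Actually reading those bytes needs a working bus and a known device address, which is exactly what the dead boards here are not providing yet.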
|
|
|
|
|