Title: Hacking miners, general info on troubleshooting.
Post by: lightfoot on November 07, 2017, 03:26:09 AM

Ok, after answering the same question a bunch of times about Avalon things, NotFuzzyWarm pointed out something:
There really isn't a good thread for general information about miners. Common information like why do they run, why do they shoot flames out the front from time to time, and of course why did the power plugs melt on them. Or how about tips on making them run well, what to do with heat (send it out a window....), and common simple troubleshooting techniques that can get people mining again?

Since I fix these things and figure stuff out the old-fashioned way (probes and a general understanding) I thought I would use this thread to post some of my findings and let others contribute to general knowledge. It will take me a few days to get stuff written down, so be patient and check back from time to time. If it gets good enough maybe it can be locked to the top page.

Anyway, on with the show. I'll update this thread regularly with new information as I learn it; hopefully this will help people.

Title: Re: Hacking miners, general info on troubleshooting.
Post by: lightfoot on November 07, 2017, 03:26:21 AM

So far I have information completed on:
General problems and things I can fix.
Avalon series miners
Antminer R4 systems
KNC Neptunes and Titans
BW series miners.

If you want something else, let me know.

Title: Re: Hacking miners, general info on troubleshooting.
Post by: lightfoot on November 07, 2017, 03:26:36 AM

General information:
This is information that applies to pretty much any type of miner.
That's it for now, will add more later.

Title: Re: Hacking miners, general info on troubleshooting.
Post by: lightfoot on November 07, 2017, 03:26:53 AM

Avalon has two types of miners still in use these days, the A6 series and the 721/741 series. Here are some thoughts:
Another item to check is fan direction: The fan should always blow air *out* of the miner (the air exits at the fan). This is because pulling a fluid with a pump is always easier than trying to *push* the fluid. In this case the fluid is air, and trying to push it just creates pockets of turbulence that reduce cooling ability. So always make sure the fan is set to pull air through the miner and out the fan front.

Title: Re: Hacking miners, general info on troubleshooting.
Post by: lightfoot on November 07, 2017, 03:27:12 AM

KNC Titan and Neptune miners can be pretty reliable units, but as shipped from the factory they have a few flaws you need to be aware of:
First: Watch the power draw: Running all four dies at 300 MHz for Titans or 600 MHz for Neptunes pushes the Molex PCIe connectors to their limits. If a connector gets overloaded it will warm up, then heat up, then pins will start to delaminate from the board or go high-resistance from the heat. When this happens the remaining pins take more current and more heat, until either the cube shuts down or the grounds lift, in which case all hell breaks loose: the ribbon cable becomes the ground, and the cube, controller, and several other cubes are destroyed. Burned plugs are fixable (use lots of pre-heat), but dead-shorted cubes are not.

A second problem is a cube that shuts down the power supply. This is normally caused by blown power FETs on an internal DC-DC power supply. Trying to force more current into the cube can cause components to burn, which, if the cube is full of dust bunnies, will cause a nice fire that gets blown out the front by the very handy fan. This can also be fixed, but cleaning a burned cube is a mess.

Another issue is a dead controller, where the green light on the side of the controller won't come up. Sometimes this is due to a cube shorting. Try powering everything off, disconnecting all cubes, and bringing up just the controller. If the light comes on, then the controller is good and one cube is bad. Turn everything off, plug in one cube, power back up, and keep repeating (powering off before each plug-in) until the bad cube is found. If it's the controller, they can be fixed.

Another issue is a dead controller where the lights on the Pi don't come up. This is caused by a Raspberry Pi shorting. You have to either replace it with another older 1.2B Rpi (newer 1.2 B Pi's don't work with the Titan code) and fix the bridgeboard, or use a Neptune BeagleBone Black with the Lightfoot code (for a Titan) or 1.06 code (for a Neptune).

Finally follow general best practices: Keep the die temperatures below 45 °C for best results.
Never plug or unplug any connector with power on (this can damage the drivers, fixable but a pain). Keep DC-DC temperatures below 80 °C or so; going much above 90 °C invites a FET cut-through and short. And every once in a while put a finger on the PCIe plug; if it's warm then something is wrong. Warm plugs over time become hot plugs which melt and make a mess....

Title: Re: Hacking miners, general info on troubleshooting.
Post by: lightfoot on November 07, 2017, 03:27:27 AM

Antminers are interesting little devices, especially the R4's. I know people have had issues with boards not starting; here is one fix that works. Heat the incoming air to the miner with a hair dryer to warm the boards up, then apply power. Do this a few times in a row. The problem is that the chips develop micro-fractures in the solder that close when warmed up. Once running, never shut down, of course.
A permanent fix could be done by reflowing the boards. I'd be willing to give it a try, but don't do it in your toaster....

C

Title: Re: Hacking miners, general info on troubleshooting.
Post by: lightfoot on November 07, 2017, 07:12:08 PM

Reserved for other miners (Hashfast, BW, BFL, etc.) as needed.
I do know that the BW series of miners can burn all of the PCIe plugs if the power supply goes bad. Replacing them is something I will be doing next weekend. If this happens to you let me know, but always use a good power supply.

Good example of a BW miner with six burned plugs that came in for service:
https://i.imgur.com/tfGGT9W.jpg

All fixed and on the way out!
https://i.imgur.com/2tG0Hsy.jpg

Title: Re: Hacking miners, general info on troubleshooting.
Post by: lightfoot on November 18, 2017, 06:43:52 PM

Reserved for some more before and after pictures.
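Looking back at the KNC post above: the temperature limits quoted there (dies under 45 °C, DC-DC under 80 °C or so, trouble much above 90 °C) are easy to turn into a quick sanity check. This is only a rough sketch. The function name and the idea of passing in plain lists of readings are my own assumptions, and how you actually get the readings off your controller is up to you.

```python
from typing import List, Sequence

# Limits taken from the KNC post; treat them as rules of thumb, not specs.
DIE_TARGET_C = 45.0    # "keep the die temperatures below 45"
DCDC_WARN_C = 80.0     # "keep DC-DC temperatures below 80 or so"
DCDC_DANGER_C = 90.0   # "going much above 90 invites a FET cut-through"


def check_cube(die_temps_c: Sequence[float],
               dcdc_temps_c: Sequence[float]) -> List[str]:
    """Return a list of human-readable warnings for one cube.

    Hypothetical helper: the readings are plain numbers in degrees C
    that you have already collected by whatever means you have.
    """
    warnings = []
    for i, temp in enumerate(die_temps_c):
        if temp >= DIE_TARGET_C:
            warnings.append(f"die {i}: {temp:.0f} C, at or over the 45 C target")
    for i, temp in enumerate(dcdc_temps_c):
        if temp > DCDC_DANGER_C:
            warnings.append(f"DC-DC {i}: {temp:.0f} C, FET failure territory")
        elif temp >= DCDC_WARN_C:
            warnings.append(f"DC-DC {i}: {temp:.0f} C, running hot")
    return warnings


# Example with made-up readings: one warm die, one hot DC-DC.
for line in check_cube([41.0, 47.0, 40.0, 39.0], [72.0, 86.0]):
    print(line)
```

It won't replace putting a finger on the PCIe plug, but it does catch the slow creep before it turns into a melted connector.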
Title: Re: Hacking miners, general info on troubleshooting.
Post by: lightfoot on December 14, 2017, 01:49:35 PM

So.... Antminers...
As the warranties expire on these things and they still break, I figured it was time to do some analysis. To that end I'll open up a thread this weekend on hacking them, along the lines of my other threads (information is free, skills are paid for). But some basic thoughts:

1) There is a reason the S9's fail.
2) S7's are still banging along and worthy of repair.

The basic design of an Antminer is surprisingly simple: Instead of powering each chip or die individually with a DC-DC, they put the dies in series strings and run them right off the 12 V supply with a single boost-buck to control voltage. This is clever because the buck converter doesn't have to step the voltage down from 12 V to 0.8 V (requiring a high frequency shift on the +12 rail with a very sharp cut-off); instead it's more like 12 V to 11 V or so (which, being only a 1 V drop, is far more efficient than going all the way down to 0.8 V or so). It does however mean that if a single die goes bad, the string dies, or more likely a zipper effect happens that takes out all 15 or so chips in the string, but that's another issue.

Trimming the power voltage can be done either by a resistive stepper (S7) or via a PIC changing frequencies (S9). Neither is that complicated. Clocking is provided by the usual 25 MHz clock crystal.

So when an S7 or S9 fails the key places to look are either the power system or the clocking system. Now to find out which one of these fails in the cold, time to find the fridge!

Title: Re: Hacking miners, general info on troubleshooting.
Post by: lightfoot on December 10, 2019, 12:46:52 AM

My, it has been a while.....
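A quick aside on the series-string numbers from the December 2017 post, since the arithmetic is easy to check. The sketch below assumes an ideal (lossless) buck converter, where the duty cycle is just Vout/Vin, and uses the round figures from that post: a 12 V rail, about 11 V across the string, about 15 dies in series, and 0.8 V if each die were fed directly. The function names are mine, and a real board will not match these numbers exactly.

```python
def buck_duty_cycle(v_in: float, v_out: float) -> float:
    """Duty cycle of an ideal buck converter in continuous conduction."""
    if not 0 < v_out <= v_in:
        raise ValueError("a buck converter needs 0 < v_out <= v_in")
    return v_out / v_in


def per_die_voltage(v_string: float, dies_in_series: int) -> float:
    """Voltage each die sees if the string voltage divides evenly."""
    if dies_in_series < 1:
        raise ValueError("need at least one die in the string")
    return v_string / dies_in_series


# Round numbers from the post; all of these are approximations.
V_RAIL = 12.0        # supply rail
V_STRING = 11.0      # "more like 12v to 11v or so"
V_DIE_DIRECT = 0.8   # per-die supply if each die had its own DC-DC
DIES = 15            # "all 15 or so chips in the string"

direct = buck_duty_cycle(V_RAIL, V_DIE_DIRECT)
string = buck_duty_cycle(V_RAIL, V_STRING)
print(f"duty cycle, 12 V -> 0.8 V: {direct:.1%}")
print(f"duty cycle, 12 V -> 11 V:  {string:.1%}")
print(f"per-die voltage in string: {per_die_voltage(V_STRING, DIES):.2f} V")
```

Stepping straight down to 0.8 V means the switch is on for well under a tenth of each cycle, while the string design keeps it on for roughly nine tenths, and each die still ends up in the same ballpark as 0.8 V. The flip side is the one already mentioned: every die in the string carries the same current, so one bad die affects the whole string.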
Things here have been busy; I've been fixing some miners from time to time, mostly Antminers. They are a pain to work on sometimes. Most issues are either:
For a while I was using an S9 board and logging into it to check status, but I finally broke down and bought one of these:

https://i.imgur.com/6b1r46v.jpg
An Antminer tester. It has a USB cable that can hook up to a PC using PuTTY serial (which is nice), but to be honest 99% of the time the display will tell you what's up.

https://i.imgur.com/aaAjyni.jpg
An Antminer S9 board under test; the light means the SPI bus is communicating. Also nice: you can do a quick test outside of the assembly and not have to wait forever for the board/system to initialize.

https://i.imgur.com/nAYgWxd.jpg
Board connected and ready to test.

https://i.imgur.com/MYpeOMd.jpg
Board under test.

And of course one can cross-check it with the Power Meter to make sure it's pulling current.
https://i.imgur.com/kcs3aAR.jpg

Overall a pretty handy tool to have. I've got an S17 coming in for repair later this week. I think I'll start a separate thread with some pictures and documentation, as there doesn't seem to be a lot of data out there on them....