I thought the boards would be numbered 1, 2, 3, either left to right or right to left.
That is not the case; it seems they can be in any order.
For me, finding sick or dead boards is trial and error.
Dead boards are easier: the little red light is out.
When a board is sick, the red light is still on, so I pull the hash cables one at a time and restart until the sick board goes away.
I have spent quite some time troubleshooting just a few dead boards myself. While only a few of my boards have gone dead, I will say that the fixes for the few I tried to repair had no rhyme or reason. The mapping of boards to I/O connectors is weird.
I took pictures of the one I troubleshot yesterday. I will try to post them with a step-by-step for others. I know you guys know how to narrow a fault down to a hash board using the LEDs, etc., but it might make life easier for others just getting into this situation to have a 1, 2, 3 approach.
What do you guys think? Is it worth the time? Please give me your thoughts before I go to all the trouble of cleaning up all the pictures I took.
Something along the lines of:
A. When viewing the miner from the back (meaning facing the exhaust fan), we will call the hash boards, from right to left, board 1, 2, and 3.
B. When viewing the miner from the back (still facing the exhaust fan), we will call the connectors on the Bitmain I/O board, from right to left, J1, J2, J3, and so on.
C. In my case, channels 2 and 4 were functioning in the miner user interface, and hash board 1 was the one at fault. Hash board 1 was connected to J1 on the I/O board.
I can state these things now because I took the time to disconnect power from each hash board in turn, repower the unit, and note which channels showed activity. To cover all possible scenarios, I also connected the suspect board using the power and the cable to the I/O board from a hash board that was functioning correctly.
So even with a different cable, a different connector on the I/O board where a known good hash board ran, and the PCIe power connections from a known functioning hash board, I have a board that still never shows up in the GUI.
Once I get this far, I remove the board from the unit and take it to the bench for further testing. Some of that testing must be done under power, so be prepared for that before moving on. I use many of the steps from the posts I quoted in my previous post in this thread. I am not going to put those into a hash board step-by-step at this point, as there are too many variables, and I feel that if someone has the ability to troubleshoot at that level, we are having a different discussion.
What concerns me is that, as you guys have mentioned, a failing hash board does not consistently show up (or go missing) as the same chain in the GUI. This may be why I could not simply be sure that a hash board connected to J1 is always the board on the right (when facing the exhaust fan). It sounds like you guys are saying it is indeed not always the same channel that appears, and therefore a board connected to J1 may not be the board on the right.
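For anyone who wants to keep track of the disconnect test, here is a minimal Python sketch of the bookkeeping, assuming each unpowered board makes exactly one chain disappear from the GUI. The function names and chain numbers are hypothetical (nothing from Bitmain's firmware), and since the board-to-chain order apparently varies, the map would need to be rebuilt for each unit rather than reused.

```python
# Hypothetical bookkeeping for the "unpower one board, restart, note the chains" test.
# Assumes each unpowered board makes exactly one chain disappear from the GUI.

def build_board_to_chain(observations, all_chains):
    """Map each board to the chain that vanished while that board was unpowered.

    observations: dict of board name -> set of chains still active in the GUI
    all_chains:   set of chains seen when every board is powered
    """
    mapping = {}
    for board, active in observations.items():
        missing = set(all_chains) - set(active)
        if len(missing) != 1:
            raise ValueError(f"{board}: expected one missing chain, got {sorted(missing)}")
        mapping[board] = missing.pop()
    return mapping


def suspect_boards(mapping, active_now):
    """Return the boards whose chain is absent from the chains active right now."""
    return sorted(board for board, chain in mapping.items() if chain not in active_now)


if __name__ == "__main__":
    # Hypothetical chain numbers; substitute what your own GUI shows.
    all_chains = {2, 3, 4}
    observations = {
        "board 1": {2, 4},  # board 1 unpowered -> chain 3 gone
        "board 2": {3, 4},  # board 2 unpowered -> chain 2 gone
        "board 3": {2, 3},  # board 3 unpowered -> chain 4 gone
    }
    mapping = build_board_to_chain(observations, all_chains)
    print(mapping)
    print(suspect_boards(mapping, active_now={2, 4}))
```

With these made-up observations and only channels 2 and 4 active, the sketch names board 1 as the suspect. The check in `build_board_to_chain` refuses to guess when more or fewer than one chain disappears, which is the kind of inconsistency described above.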
Please confirm that I understand correctly. I would also appreciate your and others' input on whether you think this is useful, along with your experience of which hash board failed versus which channel was "missing" in the GUI.
I know we do not hear from people when everything is running great, but that is another point I think many people would enjoy hearing about. Regarding the last three S9 batches, or let's say any purchases within the last 3 months, has anyone had any failures?
If people could share their personal experience, it would be a great help. For example: "I purchased 4 miners and had one board failure," or "I purchased 14 miners and had zero failures."
For people with several failures or none: are you performing any regular maintenance? How are your miners set up? What are your ambient and operating temperatures? What type of power supply are you using? Please add any other details you would be willing to share.
I look forward to your replies.
Thanks
Edit 1:
PS
I deal with ESD scenarios quite a bit in my day job. The equipment I work with creates a great deal of high-frequency noise in normal use, so we must use components that are isolated against it and find ways to further isolate the equipment, particularly PC components. For example, we use optically isolated serial ports for RS422 communications, fiber for our network connection, etc.
I understand the debate with people who say, "I have been working on electronics for 20 years and have never seen or damaged a component due to noise or a static discharge." I am not interested in that debate. What I am interested in is whether people take any precautions while troubleshooting to reduce the risk of damage from a static discharge. If so, what are those precautions?
Thanks Again
Edit 2:
I have had many hash boards that required repair, which has obviously cost me a great deal. Two of the units I purchased were used, and 4 of those hash boards came from those two units: one a 550 and the other a 600, but both from the same time frame. I am hoping to hear positive words from people who have made more recent purchases.