Back in the day we used to pull whole units and send them to BMW for repair, but now when we have a bad board we do a calculation based on a 4-week turnaround time. Most of the time it's better to let a partially hashing board (which might be down a few ASICs) run until it's dead instead of pulling it for maintenance while it is still partially functional. When we do pull dead boards, we only remove the ones that need to be fixed, leave the unit on the rack, and let the good boards continue to hash. So yes, it's much better to let a 1-board S9 run than not.
Try replacing the primary controller board first, and if that does not bring the dead boards back up, pull them and send only those boards out to BMW for maintenance. You can replace the controller on an older BeagleBone S9 (those have the black board as part of the controller unit) with a new autotune controller; you just have to remove the front plate because the Ethernet port will not line up. You can get spare controllers and hash board cables right from BitMain.
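A rough sketch of that kind of pull-or-run calculation, in Python. This is only an illustration, not the exact math described above: the function names, the planning horizon, and the guess at how long a degraded board survives are all assumptions. Only the 4-week (28-day) turnaround comes from the post. It counts "board-days" of hashing, where 1.0 is one day of a fully working board, and it assumes a repaired board comes back at full rate.

```python
REPAIR_DAYS = 28  # the 4-week turnaround mentioned in the post


def board_days_if_left_running(partial_fraction, days_until_dead,
                               horizon_days, repair_days=REPAIR_DAYS):
    """Hashing earned if the degraded board runs until it dies, is then
    sent out for repair, and comes back at full rate (assumed)."""
    # Days spent hashing at the reduced rate before the board dies
    run_days = min(days_until_dead, horizon_days)
    # Days left in the horizon after the board dies and the repair returns
    days_after_repair = max(0, horizon_days - days_until_dead - repair_days)
    return partial_fraction * run_days + days_after_repair


def board_days_if_pulled_now(horizon_days, repair_days=REPAIR_DAYS):
    """Hashing earned if the board is pulled today and returns at full rate."""
    return max(0, horizon_days - repair_days)


def better_to_leave_running(partial_fraction, days_until_dead,
                            horizon_days, repair_days=REPAIR_DAYS):
    """True when leaving the board on the rack earns at least as much."""
    left = board_days_if_left_running(partial_fraction, days_until_dead,
                                      horizon_days, repair_days)
    pulled = board_days_if_pulled_now(horizon_days, repair_days)
    return left >= pulled
```

For example, a board hashing at 90% that you guess will outlast a 180-day horizon comes to about 162 board-days left running versus 152 if pulled now, so it stays on the rack. A board at 50% that you expect to die in 30 days, with a year of planned runtime, comes out the other way (322 versus 337). The sketch ignores repair and shipping costs and any change in mining revenue over time, all of which would need to be added for a real decision.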
If you're losing S9 boards on the reg, and it's not a temp issue, check the quality of your main input power.
Thanks for your detailed reply. I will answer back tomorrow afternoon. I'm getting ready for the dreaded 10 hour night shift at a server farm. I get off at 5:00 AM.
Best wishes!