Around 9am CST6CDT the vast majority of my Antminer farm went offline. At the moment I have about 150 S9s and a few L3s offline.
The miners physically look fine, with flashing green lights, no red lights (mostly), network activity, etc. However, the majority cannot be pinged, and a few that can be pinged are not starting up.
Current network topology: PFSense firewall -> right side Netgear GS316 -> <fans out to 7> Netgear GS316 -> Antminer S9s
-> left side Netgear 10-100 switch on right side -> Netgear 7 port gigabit switch on left side -> <fans out to 3> Netgear GS316 -> Antminer S9s & L3s
-> Avalon 821s & 841s
My normal topology is to have the right side Netgear GS316 also support the left side, but this morning I split the network, using a redundant feed I had back to the PFSense firewall, in an attempt to isolate the problem.
Of my (76) 14TH S9s, (24) are active; the rest cannot be pinged. Likewise, (9) of my (25) 13.5TH S9s are reachable, mostly on the right side.
The left side has (11) of (13) 13TH S9s working, (2) T9s, and (2) of (
"problem children" S9s. All of the Avalons are fine, but of course, they are clustered behind a few PIs, so have a lower network port count. (22) of my (26) L3s cannot be pinged.
So both the left side and the right side are having problems, and they are independent of each other network-wise back to the PFSense firewall box. Occasionally I'm seeing Antminers go blinking red, but a quick power cycle clears that.
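For anyone who wants to reproduce counts like the ones above per switch, here is a rough sketch of how I could tally them. It assumes a Linux box with the iputils `ping` (the `-c` and `-W` flags behave differently on other systems), and the inventory of (IP, switch label) pairs is something you would have to supply yourself; none of the names below come from my actual setup.

```python
import subprocess
from collections import defaultdict


def ping(host, timeout_s=1):
    """Return True if a single ICMP echo gets a reply.

    Assumes Linux iputils ping: -c is the packet count, -W the reply
    timeout in seconds. Other platforms use different flags.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def tally(inventory, probe=ping):
    """Count reachable miners per group.

    inventory: iterable of (ip, group) pairs, where group is any label
    you like (switch name, miner model, left/right side).
    probe: callable taking an IP and returning True/False; swappable so
    the tally can be checked without touching the network.
    Returns {group: (reachable, total)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for ip, group in inventory:
        counts[group][1] += 1
        if probe(ip):
            counts[group][0] += 1
    return {group: tuple(c) for group, c in counts.items()}
```

Calling `tally([("192.168.1.101", "right/switch-1"), ("192.168.1.102", "left/switch-1")])` with real addresses would give one (reachable, total) pair per label, which makes it easier to see whether the failures follow a switch, a side, or a miner model.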
I'm at my wits' end without a clue. I'd be fine if I had lost a switch. But my problem children appear to be spread across several switches, and in fact, several physical networks. The LAN side IP addressing is shared at the firewall, but I can't see how that would be a problem.
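To make the "what do the failing switches have in common" reasoning concrete, here is a small sketch that walks each failing switch upstream and intersects the paths. The device labels are illustrative stand-ins for the topology described earlier, not real hostnames, and the parent map is my own simplification of it.

```python
def path_to_root(node, parent):
    """Return the devices from `node` up to the top of the tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def shared_upstream(failing, parent):
    """Return the devices that sit upstream of every failing switch."""
    paths = [set(path_to_root(n, parent)) for n in failing]
    return set.intersection(*paths) if paths else set()


# Hypothetical labels: child device -> the device it uplinks to.
parent = {
    "right-core-GS316": "pfsense",
    "left-10/100": "pfsense",
    "left-gigabit": "left-10/100",
    "right-GS316-1": "right-core-GS316",
    "right-GS316-2": "right-core-GS316",
    "left-GS316-1": "left-gigabit",
}

# One failing switch on each side: only the firewall is common to both.
print(shared_upstream(["right-GS316-1", "left-GS316-1"], parent))
# -> {'pfsense'}
```

With failures on both sides of the split, the only shared device in this model is the firewall itself, which is why the shared LAN addressing is the one common factor I keep coming back to.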
Although growing (with the latest batch of 34 mixed Antminers being added early this week), the network has been otherwise stable until this morning.
Somebody please! Give me some ideas of what I am overlooking...