While it's running, pull the board furthest from the PSU (number 1).
It will continue to hash, and you can work on that one blade and check the thermal paste. I can almost guarantee it's been applied poorly.
If so, reapply it and shove the board back in (no need to power down, though the inexperienced will disagree).
If it's still freaking out after you pamper it, get on to Bitmain and send it back for a better blade (no need to send the heatsink).
Check the S2 SOLD OUT thread for images of the thermal paste as applied in house.
Best of luck, stay in touch.
After monitoring for a few days, I can safely report it was a thermal paste issue.
When I removed the board's heatsink, almost 1/4 of the chips were not covered by paste at all.
I reapplied it nicely and have had no downtime since. Thanks, edgar.