To me there is nothing obviously wrong in your kernel log. I've checked it against one of my miners with what I believe is the same hw/fw config as yours, and the log is identical "line for line", including "bmminer not found, restart bmminer" and the chain temps etc - so I don't think either of those previous suggestions is your problem. You've obviously done the logical troubleshooting - i.e. Miner-Bad runs fine and doesn't reset with the original fans attached, but still resets even with a known working fan delete mod swapped from Miner-Good. Frustrating. It's possibly crashing when it gets to the "Checking fans..." stage. For example, the following is from one of my auto-tune S9i's and appears at around the same point in the kernel log as where yours crashes:
Miner compile time: Wed Nov 7 11:19:02 CST 2018 type: Antminer S9i
miner ID : 803244067910485c
Checking fans...
get fan[4] speed=5880
get fan[4] speed=5880
get fan[5] speed=6000
get fan[4] speed=5880
get fan[5] speed=6000
get fan[4] speed=5880
get fan[5] speed=6000
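If you want to compare logs from several miners quickly, something like the following could pull out the fan check lines. It is only a rough sketch: it assumes you have already saved each kernel log as plain text, and the function names are my own, not anything from Bitmain.

```python
import re

# Matches lines such as "get fan[4] speed=5880" from the kernel log.
FAN_LINE = re.compile(r"get fan\[(\d+)\] speed=(\d+)")


def reached_fan_check(log_text):
    """True if the log got as far as the 'Checking fans' stage."""
    return "Checking fans" in log_text


def fan_readings(log_text):
    """Return {fan_index: [speeds...]} for every fan speed line found."""
    readings = {}
    for match in FAN_LINE.finditer(log_text):
        index, speed = int(match.group(1)), int(match.group(2))
        readings.setdefault(index, []).append(speed)
    return readings


sample = """Checking fans...
get fan[4] speed=5880
get fan[4] speed=5880
get fan[5] speed=6000
"""

print(reached_fan_check(sample), fan_readings(sample))
```

A Miner-Bad log that reaches "Checking fans" but never prints any speed lines before the reset would at least support the idea that the crash happens at this stage.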
As you have already troubleshot your hardware mod, it's possible that there is some subtle difference between your Miner-Bads and your Miner-Goods - but the issue only shows up when you attach the fan delete mod. You have probably done this already - but I'd start looking for what is different with these crashing miners - if anything. E.g. you mention that you have S9's & S9i's in 13, 13.5 and 14 variants - so is there any obvious pattern with the 7 in your tank that are not working? It would give us more clues if, for example, they were all S9i's. Also, as per the quote below, you use an example of 2 x 13.5 S9i Miners, but your kernel log appears to be from a plain S9 - which is fine obviously as you only mentioned S9i's "as an example". So as the "devil is in the detail", can you just confirm that the kernel log was from an S9, i.e. not from an S9i with the "wrong" S9 asic-boost firmware applied?
So let's say I have 2 13.5 TH S9i miners (I'll call them Miner-Good and Miner-Bad).
Until recently, there was only the fixed-freq and the auto-freq firmware available for the S9. But since Bitmain launched the S9i and S9j and then also released the asic-boost firmwares (very confusingly in different ways for the 3 S9 models - i.e. loaded "on-top" of the previous firmware for the S9s, but a total new replacement firmware for the S9i and S9j) - there are now 11 current "latest" firmwares for the S9 models available on the Bitmain website. Combined with a lack of Bitmain documentation, this has led to people flashing the wrong firmware to their miners - e.g. S9 FW on an S9i - and to lots of confusion about the asic-boost update process. The miners may appear to work with the wrong firmware in some cases, but it could cause "strange" issues or sub-optimal performance.
So assuming that you are using the standard auto-tune / auto-freq, to completely rule out any FW incompatibility issues, your miners should be as follows if they are up to date:
S9 - the asic-boost patch Antminer-S9-LPM-20181102.tar.gz flashed on top of the latest auto-tune FW - Antminer-S9-all-201711171757-autofreq-user-Update2UBI-NF.tar.gz - which was probably what your miners already had if they were previously up-to-date. Unfortunately, due to the inconsistent way Bitmain released the asic-boost for the S9s as a small patch file, once it is applied "on-top" there is no way of knowing what the "main" firmware version is - the miner's overview page just displays the asic-boost version FW details, i.e. "File System Version Sun Nov 2 11:55:42 UTC 2018".
S9i - just the new asic-boost FW - Antminer-S9i-all-201811071119-autofreq-user-Update2UBI-NF.tar.gz (the very latest S9i version released a few days ago drops both the power and hashrate by around 40% - which Bitmain either didn't intend or haven't documented - so I guess you don't want to use that).
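As a quick sanity check before flashing, the model can be read straight out of the firmware file name and compared with the "type:" shown in the kernel log. This is just a sketch based on the naming pattern of the files listed above. The helper names are made up, and it only knows about the S9, S9i and S9j.

```python
import re

# File names above follow "Antminer-<model>-<rest>.tar.gz".
FW_NAME = re.compile(r"^Antminer-(S9[ij]?)-(.+)\.tar\.gz$")


def firmware_model(filename):
    """Return 'S9', 'S9i' or 'S9j' from a firmware file name, else None."""
    match = FW_NAME.match(filename)
    return match.group(1) if match else None


def matches_miner(filename, miner_type):
    """Compare the file's model with the kernel log type, e.g. 'Antminer S9i'."""
    model = firmware_model(filename)
    return model is not None and miner_type.strip() == f"Antminer {model}"


print(matches_miner(
    "Antminer-S9i-all-201811071119-autofreq-user-Update2UBI-NF.tar.gz",
    "Antminer S9i",
))
print(matches_miner("Antminer-S9-LPM-20181102.tar.gz", "Antminer S9i"))
```

The second call is the "S9 FW on an S9i" case and comes back as a mismatch.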
Apologies if the above is obvious to you and you are certain that you have the correct up-to-date FW on all your different S9 models. I only mention it as the Bitmain documentation is not clear and some experienced miners have therefore flashed the wrong firmware as can be seen in other sections of this forum.
Therefore if you are 100% certain of your fan delete mods, all the FW is up-to-date and the correct version for the S9 variant and there are no obvious patterns with the "bad miners", then I guess it is probably going to have to be trial and error to get them working without the fans. A couple of suggestions which are quick to try are:
a. Try flashing a "bad" S9 with the appropriate fixed-freq firmware, e.g. Antminer-S9-all-201705031838-650M-user-Update2UBI-NF.tar.gz, which is for a 14 Th/s S9 but should be fine for all of them. The fixed-freq FW does a much quicker "boot" process and skips all of the auto-tune and auto-freq processes - so that may help to get round the "fan speed checking" issue. Some of my miners are using fixed-freq FW and, unlike my normal auto-tune miners, there are no "fan checking" lines in the log - so it might work for your set-up - but no guarantee. If it works then you can apply the LPM asic-boost FW on-top, which will leave you with a boosted fixed-freq miner still running with less power, but now also faster than before. Your 13 and 13.5 Th/s S9's may be fine running with the 14 Th/s fixed-freq FW - but if not (i.e. unstable or too many HW errors) just use http://192.168.1.xx/cgi-bin/minerAdvanced.cgi to get to the hidden config page and set the freq manually to whatever is stable (or back to what the miner originally was, e.g. freq=631.25 for 13.5 Th/s etc). Obviously you can't do this for your S9i's as Bitmain doesn't offer any fixed-freq firmware for this model.
b. Another suggestion - it may not work, but it only takes a few seconds to try - is to manually fix the fan speed (or notional fan speed in your case). I don't know if a fixed "dummy" fan speed is compatible with your "fan delete hw mod". If the setting takes, the very last lines of the kernel log should look like the following, which hopefully means that the fan speed does not get checked during restart - and hence hopefully stops your miners from crashing.
Set fixed fan speed=75
FAN PWM: 75
read_temp_func Done!
CRC error counter=0
As you are doing immersion cooling, you are probably far more expert than me with tweaking fans etc, however below is the method I use to manually over-ride the auto-tune fan speed - only takes a few seconds + a restart (it's only temporarily revealing the hidden options in the standard Bitmain miner web-server - so it's not changing any code etc):
The very simplest method - it takes at most 30 seconds per machine and doesn't involve customizing Bitmain's code at all - is as follows:
Note: I've used Chrome - but it is similar for other browsers.
1. Open your miner in your browser and go to the "Miner Configuration" tab.
2. Right-click the area around the new "Low Power Mode" and click "Inspect" - the Chrome DevTools Elements panel will appear on the right.
3. A few lines down from the Low Power Mode, you'll notice a line with "fan-ctrl" in it.
4. On the fan-ctrl line, highlight the text :none from the style="display :none", press delete and then press enter.
5. You will now temporarily have a "Customize the fan speed percentage" check-box and input box showing in the miner GUI.
6. Enter your new fixed fan speeds (85 to 90% should be good for most overclocking) and do the normal "Save & Apply".
7. Once the miner restarts - check the kernel log - at the very bottom, it will now show the following instead of the usual auto-tune lines:
Set fixed fan speed=89
FAN PWM: 89
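With a tank full of miners it may be easier to check for those two lines with a small script than by eye. The following is a rough sketch that assumes the kernel log has been saved as text and that the two lines look exactly as shown above. The function name is my own.

```python
import re


def fixed_fan_speed(log_text):
    """Return the fixed fan percentage if the log confirms it, else None.

    Uses the last "Set fixed fan speed=" and "FAN PWM:" lines in the log
    and requires them to agree.
    """
    fixed = re.findall(r"Set fixed fan speed=(\d+)", log_text)
    pwm = re.findall(r"FAN PWM: (\d+)", log_text)
    if fixed and pwm and fixed[-1] == pwm[-1]:
        return int(fixed[-1])
    return None


fixed_log = "Set fixed fan speed=89\nFAN PWM: 89\n"
auto_log = "Checking fans...\nget fan[4] speed=5880\n"

print(fixed_fan_speed(fixed_log))  # 89
print(fixed_fan_speed(auto_log))   # None
```

A result of None after "Save & Apply" and a restart would suggest the setting did not take and the miner is still on the usual auto-tune fan handling.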
In summary - it's just right-click to inspect the web-page, delete the :none text on the fan control line and you are done.
Before edit screenshot:
https://imgur.com/cDHmAlF
After edit screenshot:
https://imgur.com/cIvktyx