Bitcoin Forum
Author Topic: [Antminer S9] Miner stops hashing - all stats freeze except LSTime  (Read 423 times)
tim-bc (OP)
Full Member
***
Offline Offline

Activity: 538
Merit: 175


View Profile
January 07, 2019, 08:32:07 PM
Merited by nc50lc (1)
 #1

Hello all,

I've come across a widespread issue in a farm today that I could use your help with.

It started when I received notice that a large number of miners (all of them base-model Antminer S9s, mostly 13.5 TH/s) appeared to be hashing fine according to the miner's web status page and the miner API, while the pool API showed no hashrate for them (no shares being submitted). When I look at these miners through either the status page or the API, they appear to be hashing, but all of the stats are frozen. For example, the "elapsed" time does not increase and the realtime hashrate does not change. The kernel log does not change either, and no error messages are present.

The eerie thing is that LSTime (last share time) still increases as normal. And it also looks like fan_num is always 0, and all fan speeds are zero.

When I reboot the miners, everything comes back up normally. Stats stay updated, fan speeds show up, and shares get submitted (confirmed by the pool). So what I am planning to do is write a script that checks all of the miner APIs and reboots a miner if, for example, its reported hashrate is above 1 TH/s but more than 10 minutes have passed since its last share.
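The reboot rule described above could be sketched roughly like this in Python. It is only a minimal illustration of the decision itself: the function name, its parameters, and the idea that the caller has already converted the miner's stats into TH/s and seconds are all assumptions, not anything taken from the miner firmware.

```python
from typing import Optional


def should_reboot(
    hashrate_ths: float,
    secs_since_last_share: Optional[float],
    min_hashrate_ths: float = 1.0,    # "hashrate > 1 TH/s" from the post
    max_share_age_s: float = 600.0,   # "last share > 10 min" from the post
) -> bool:
    """Return True when a miner claims to be hashing but has gone quiet.

    Hypothetical helper: the caller is assumed to supply the reported
    hashrate in TH/s and the time since the last share in seconds.
    """
    if secs_since_last_share is None:
        # No share information available, so do not reboot blindly.
        return False
    return (
        hashrate_ths > min_hashrate_ths
        and secs_since_last_share > max_share_age_s
    )
```

A miner that is simply off or still starting up (hashrate near zero) is deliberately not flagged, since rebooting it would not help with this particular symptom.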

The thing is that I have no clue why this is happening, and what might cause it to reoccur. Any thoughts?

Ignore scammers on Skype, Telegram, etc. I will only ever contact you via forum PMs. See profile for fingerprint.
fanatic26_
Full Member
***
Offline Offline

Activity: 294
Merit: 129


View Profile
January 07, 2019, 09:08:14 PM
Merited by mikeywith (1), tim-bc (1), frodocooper (1)
 #2

For me this issue is 100% related to the ASICboost firmware. It was never an issue before the patches. As a matter of fact, the second ASICboost patch was a fix for the stalling found in the original release.

We have yet to figure out a way to keep them running, so we are developing something that can read the LST and reboot as needed. All of the API info freezes at the point the miner stops mining, and it keeps reporting those final functional values as current even though they are not.
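For reading the stats programmatically, something along these lines might be a starting point. It is a sketch that assumes the miner exposes a cgminer-style JSON API over TCP (port 4028 is the cgminer default, and bmminer is derived from cgminer), that the API is enabled, and that the calling host is allowed to connect. The helper names are made up for illustration, and the fields in the reply differ between firmware versions, so check a real reply before relying on any of them.

```python
import json
import socket


def parse_api_reply(raw: bytes) -> dict:
    """Decode a cgminer-style JSON reply, dropping the trailing NUL byte."""
    text = raw.rstrip(b"\x00").decode("utf-8", errors="replace")
    return json.loads(text)


def query_miner(host: str, command: str, port: int = 4028,
                timeout: float = 5.0) -> dict:
    """Send one JSON command (e.g. "summary" or "pools") and return the reply.

    Assumption: the miner closes the connection after answering, so we
    read until recv() returns an empty chunk.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"command": command}).encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_api_reply(b"".join(chunks))
```

Usage would be something like `query_miner("192.0.2.10", "summary")`, where the address is a placeholder. Some firmware builds are reported to return slightly malformed JSON for certain commands, in which case `parse_api_reply` would need extra cleanup.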
tim-bc (OP)
January 08, 2019, 04:48:52 PM
Last edit: January 09, 2019, 12:15:00 AM by frodocooper
Merited by frodocooper (2)
 #3

Wow, it's good to know that I'm not the only one with this issue. I do remember that the first S9 asicboost firmware from October was definitely bugged. I had to upgrade those again with the newer one. I too have a script in the works to read LST and reboot automatically.

The only thing is, I found some miners that hadn't been asicboosted yet... they had a firmware from Sept 2017 and had the same issue.

Before deploying any auto-reboot scripts I'm going to generate a report on all of the miners to try to get some stats on asicboost vs non-asicboost miners etc. It should be easy since I already have firmware and asicboost stats in the database.
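A report like this could be sketched roughly as follows. This is only an illustration: the record fields `asicboost` and `secs_since_last_share` are hypothetical stand-ins for whatever the database actually stores, and the one-day threshold is an assumption.

```python
from collections import Counter


def stalled_share_by_group(miners, max_share_age_s=86400):
    """Return the fraction of stalled miners per firmware group.

    `miners` is an iterable of dicts with hypothetical keys:
      - "asicboost": bool, whether the ASICBoost firmware is installed
      - "secs_since_last_share": seconds since the last accepted share
    """
    totals, stalled = Counter(), Counter()
    for m in miners:
        group = "asicboost" if m["asicboost"] else "stock"
        totals[group] += 1
        if m["secs_since_last_share"] > max_share_age_s:
            stalled[group] += 1
    # Every group in `totals` has at least one miner, so no division by zero.
    return {group: stalled[group] / totals[group] for group in totals}
```

Comparing the two fractions side by side is what would show whether the stall is concentrated in one firmware group.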



Looks like I was mistaken again. So I got some high-level stats and here is what I found:

Farm Summary

6.1% of miners had a Last Share time of greater than 1 day, evenly distributed from 1-32 days

99.4% of affected miners were ASICBoosted S9, other 0.6% were ASICBoosted S9i. So the issue is definitely related to ASICBoost firmware.

Weird thing

42.3% of affected miners had LST greater than 7 days. Out of these miners, 99.7% had a (frozen stat) hashrate greater than 11 TH/s.

For the affected miners with LST less than 7 days, only 11.1% of these miners had a (frozen stat) hashrate greater than 11 TH/s. 88.5% of them had one bad hashboard and thus reported between 8-11 TH/s.

fanatic26_
January 09, 2019, 06:35:18 PM
Last edit: January 10, 2019, 09:14:30 AM by frodocooper
 #4

Quote from: tim-bc
Looks like I was mistaken again. So I got some high-level stats and here is what I found...

I have the same issue trying to figure out which are asicboost and which aren't. Sometimes the firmware date changes to the Nov 2 date of the ASICboost patch, and sometimes it retains the underlying firmware's date, so you actually have to log in and look for the LPM checkbox.

The real pain with this issue is that the API and everything else still report the machine as hashing, so if you don't happen to notice in your dashboard that the hashrate is always identical, it can take a while to realize what's happening with the last submission issue.
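One way to catch the "always identical" symptom automatically is to compare two polls of the same miner taken some time apart. The sketch below assumes each poll has already been reduced to a dict containing an uptime counter under the key "Elapsed". That key name is an assumption about the API reply, and the function itself is hypothetical.

```python
def stats_frozen(earlier: dict, later: dict, key: str = "Elapsed") -> bool:
    """Return True if the uptime counter did not move between two polls.

    A healthy miner's counter advances between polls, and a rebooted
    miner's counter drops back to a small value. Only an exactly
    identical value is treated as frozen.
    """
    if key not in earlier or key not in later:
        raise KeyError(f"both polls must contain {key!r}")
    return earlier[key] == later[key]
```

The polls need to be far enough apart (say a minute or more) that an unchanged counter cannot be explained by timing alone.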

From my data I have not had any correlation between machine states (aka bad board, etc) and the fact that it stops submitting data. The VAST majority of my machines are in perfect working order and all exhibit the same symptoms.
kano
Legendary
*
Offline Offline

Activity: 4620
Merit: 1851


Linux since 1997 RedHat 4


View Profile
January 09, 2019, 11:50:17 PM
 #5

... use original oem firmware ...

Pool: https://kano.is - low 0.5% fee PPLNS 3 Days - Most reliable Solo with ONLY 0.5% fee   Bitcointalk thread: Forum
Discord support invite at https://kano.is/ Majority developer of the ckpool code - k for kano
The ONLY active original developer of cgminer. Original master git: https://github.com/kanoi/cgminer
tim-bc (OP)
January 10, 2019, 12:20:10 AM
Last edit: January 10, 2019, 09:15:05 AM by frodocooper
 #6

Quote from: fanatic26_
I have the same issue trying to figure out which are asicboost and which aren't. Sometimes the firmware date changes to the Nov 2 date of the ASICboost patch, and sometimes it retains the underlying firmware's date, so you actually have to log in and look for the LPM checkbox.

Are you sure about the date thing? Sounds like the firmware might have been flashed incompletely or something. All of my asicboosted miners have Nov 2018 file dates. None of my non-asicboost miners have the LPM checkbox either.

Quote from: fanatic26_
From my data I have not had any correlation between machine states (aka bad board, etc) and the fact that it stops submitting data. The VAST majority of my machines are in perfect working order and all exhibit the same symptoms.

I wish I could say the same about these machines here. It turns out that was a red herring anyway, as these machines were apparently flashed with asicboost in order of descending hashrate (roughly).

Quote from: kano
... use original oem firmware ...

The asicboost "firmware" is straight from Bitmain. Besides, nowadays using non-asicboost firmware with the S9 is pointless and often unprofitable.

kano
January 10, 2019, 12:44:08 AM
 #7

...
Quote from: kano
... use original oem firmware ...

Quote from: tim-bc
The asicboost "firmware" is straight from Bitmain. Besides, nowadays using non-asicboost firmware with the S9 is pointless and often unprofitable.

No - wipe them, put back the original firmware, update them to asicboost and then they should be OK.
Clearly there was something wrong with the firmware on them when you got them ...

tim-bc (OP)
January 10, 2019, 06:31:55 PM
Last edit: January 10, 2019, 11:23:51 PM by frodocooper
 #8

Quote from: kano
No - wipe them, put back the original firmware, update them to asicboost and then they should be OK.
Clearly there was something wrong with the firmware on them when you got them ...

I tried that; some of the miners were used and had an older Sept '17 firmware before putting the asicboost files on. I reflashed the whole filesystem to Nov 2017 autofreq (which fixed the issue) and then flashed the asicboost again (which caused the issue to reoccur).

It seems like it will freeze up at a consistent time on each individual miner, but the time that it freezes up varies between miners.

Also there are a lot of miners that were new and have only ever had the Nov 2017 autofreq firmware and they also present the same symptoms after being boosted.

Biffa
Legendary
*
Offline Offline

Activity: 3234
Merit: 1220



View Profile
January 28, 2019, 06:27:54 PM
Last edit: January 29, 2019, 12:45:45 AM by frodocooper
 #9

How cold is it where the miners are? I'm seeing this sporadically on miners in very cold conditions, and it seems it may be related more to the Bitmain PSUs than to the miners themselves.

Mine @ pools that pay Tx fees & don't mine empty blocks :: kanopool :: ckpool ::
Should bitmain create LPM for all models?
:: Dalcore's Crypto Mining H/W Hosting Directory & Reputation ::
fanatic26_
January 28, 2019, 08:30:45 PM
 #10

Quote from: Biffa
How cold is it where the miners are? I'm seeing this sporadically on miners in very cold conditions, and it seems it may be related more to the Bitmain PSUs than to the miners themselves.

I can say with certainty that this issue is power supply agnostic. I have 6000+ non-Bitmain PSUs powering S9s and T9s and the issue happened with them as well.
tim-bc (OP)
January 29, 2019, 12:00:04 AM
 #11

So I see that this bug is exclusively with bitmain's LPM and enhanced LPM firmwares. I wonder why this happens and if they are aware / plan to fix it.

Some miners always seem to freeze at the same time, sometimes as low as 10-15 minutes after booting. The only solution for these seems to be to downgrade to non-LPM firmware (or an alternative firmware)...

Artemis3
Legendary
*
Offline Offline

Activity: 2030
Merit: 1573


CLEAN non GPL infringing code made in Rust lang


View Profile WWW
January 29, 2019, 02:15:44 AM
 #12

Quote from: tim-bc
So I see that this bug is exclusively with bitmain's LPM and enhanced LPM firmwares. I wonder why this happens and if they are aware / plan to fix it.

Some miners always seem to freeze at the same time, sometimes as low as 10-15 minutes after booting. The only solution for these seems to be to downgrade to non-LPM firmware (or an alternative firmware)...

And turning off LPM and enhanced LPM modes doesn't work?

philipma1957
Legendary
*
Offline Offline

Activity: 4312
Merit: 8869


'The right to privacy matters'


View Profile WWW
January 29, 2019, 04:19:07 AM
Last edit: January 30, 2019, 10:22:35 AM by frodocooper
 #13

Quote from: Artemis3
And turning off LPM and enhanced LPM modes doesn't work?

This would be my question.

I have finally begun expanding my mining.

I have 19 S9s.
I have 53 different units all told and will be going to 100 units.
Then 150 units.

I have the same freeze symptom. Has anyone tried leaving the boxes unchecked and allowing full speed?

Biffa
January 29, 2019, 09:57:54 AM
Last edit: January 30, 2019, 10:23:34 AM by frodocooper
 #14

Quote from: fanatic26_
I can say with certainty that this issue is power supply agnostic. I have 6000+ non-Bitmain PSUs powering S9s and T9s and the issue happened with them as well.

Well, that scuppers that theory :)

All my S9s are AB enabled, and I'm seeing it sporadically in 3 of the machines. Although I have to say, even before AB I had to reboot S9s sporadically for one reason or another as well, obviously for different reasons.

Sometimes it happens 5 minutes after rebooting, sometimes it lasts for days.

Let's face it, these machines are all individual in the way they behave anyway. Maybe some machines are more sensitive to the changes that AB makes to how it runs, perhaps more sensitive to changes in voltage or to how the AB firmware tries to maintain hashrate at lower power levels.

Just had to reboot one 5 mins ago. It was up, the web interface was accessible, and miner status showed it online, but it was not updating the Elapsed, RT or AVG numbers. Interestingly, one board had stopped working, and of course there was no hashrate on the pool and nothing in the log. I rebooted it and it's hashing away again.

Also, as AB is something that needs to be coded at both the miner and the pool side, could it be pool related? I moved one to a different pool after it kept happening and it's been stable for days now.

Artemis3
January 29, 2019, 04:22:31 PM
 #15

Quote from: Biffa
Also, as AB is something that needs to be coded at both the miner and the pool side, could it be pool related? I moved one to a different pool after it kept happening and it's been stable for days now.

And perhaps the pool's AB implementation varies as well? Maybe one likes BraiinsOS's (cgminer's?) more than Bitmain's (bmminer's), and with another pool it is just the opposite?

philipma1957
January 30, 2019, 01:31:54 AM
Last edit: January 30, 2019, 10:24:12 AM by frodocooper
 #16

Quote from: Artemis3
And perhaps the pool's AB implementation varies as well? Maybe one likes BraiinsOS's (cgminer's?) more than Bitmain's (bmminer's), and with another pool it is just the opposite?

Could be.

I will need to spring for Awesome Miner as I will be growing units.

I must say my M10s = Godlike. They just run and run and run and run and run.
I am up to 6 of them.
I am up to 6 of them
