I'm on Ubuntu Linux 16.04 with 6x GTX 1070s, and it doesn't matter which miner I'm using. It's probably related to overclocking. When the miner freezes:
- can't Ctrl-C to quit process
- can't kill -9 the process
- can't successfully reboot the server (ssh disconnects, but I can't reconnect afterwards)
The only thing that will let me back in is physically powering down the system by holding the power button down for a few seconds (I have no reset button) and starting the system back up. Then I can ssh back into it.
Can anyone help me with what's going on here? How to address it? It happens randomly. Is there anything I can do to prevent this? I'm sure lowering my overclock setting would help, but normally I'd just restart the miner to continue mining.
If Ctrl-C, kill -9 and a server reboot all fail, it almost certainly means there is an internal error in the NVIDIA kernel driver. The miner process is stuck inside the driver, so it never gets to act on the signal, and the system can't properly unload the NVIDIA driver during shutdown. The whole system locks up and never completes the reboot, which is why ssh never comes back.
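A process that ignores kill -9 is usually sitting in uninterruptible sleep (state D) inside a kernel call. Here is a minimal Python sketch for checking that, assuming the usual Linux /proc/<pid>/status layout. The function names are made up for illustration.

```python
def parse_state(status_text):
    """Return the one-letter state code from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:<tab>D (disk sleep)"; the code is the second field.
            return line.split()[1]
    return None


def is_uninterruptible(pid):
    """True if the process is in state D (uninterruptible sleep). Linux only."""
    with open(f"/proc/{pid}/status") as f:
        return parse_state(f.read()) == "D"
```

Call is_uninterruptible() with the miner's PID. Running "ps -o stat= -p <pid>" should show the same state letter.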
So... for the miner freeze issue, I suggest:
First of all, run “dmesg” to get more info about your error (is it always the same GPU? The same PCI port? The same kind of error? etc.), then try to:
- change your PCIe risers (bad-quality ones can really become a pain in the ass)
- update your OS kernel
- update your NVIDIA / CUDA drivers
- disable your overclocking settings and/or underclock your GPUs
In my experience, 99% of rig problems are caused by bad PCIe risers, so always check that hypothesis first.
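To answer the "is it always the same GPU?" question from the dmesg output, a small script can help. This is only a sketch: it assumes the NVIDIA driver reports errors as "NVRM: Xid (PCI:<address>): <code>, ..." lines in the kernel log, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed message shape: "NVRM: Xid (PCI:<address>): <code>, ..."
# Some driver versions may omit the "PCI:" prefix, so it is optional here.
XID_RE = re.compile(r"NVRM: Xid \((?:PCI:)?([0-9A-Fa-f:.]+)\): (\d+)")


def find_xid_events(log_text):
    """Return a list of (pci_address, xid_code) tuples found in kernel log text."""
    return [(m.group(1), int(m.group(2))) for m in XID_RE.finditer(log_text)]


def count_by_gpu(events):
    """Count Xid events per PCI address, to spot a GPU or riser that keeps failing."""
    return Counter(addr for addr, _ in events)
```

Feed it the text of dmesg, for example via subprocess.run(["dmesg"], capture_output=True, text=True).stdout. If one PCI address dominates the counts, start with that card and its riser.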
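Another tip: when your system is stuck like this, you can try to reboot without using the physical power button with “reboot -f”, which forces an immediate reboot instead of going through the normal shutdown sequence (the part that hangs on the driver).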
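If even a forced reboot hangs, the kernel's magic SysRq interface is another thing to try over ssh. Below is a rough Python sketch, assuming a kernel with SysRq support and root privileges. The function name and its dry_run default are my own, and it may not help if the kernel itself is completely wedged.

```python
import time

SYSRQ_TRIGGER = "/proc/sysrq-trigger"

# s = emergency sync, u = remount filesystems read-only, b = reboot immediately.
REBOOT_SEQUENCE = ["s", "u", "b"]


def sysrq_reboot(dry_run=True, trigger=SYSRQ_TRIGGER, delay=2.0):
    """Write the SysRq reboot sequence to the trigger file (requires root).

    With dry_run=True (the default) nothing is written; the planned
    writes are only returned so they can be inspected first.
    """
    planned = [(trigger, key) for key in REBOOT_SEQUENCE]
    if not dry_run:
        for i, (path, key) in enumerate(planned):
            if i > 0:
                # Give the previous request (sync / remount) a moment to finish.
                time.sleep(delay)
            with open(path, "w") as f:
                f.write(key)
    return planned
```

Calling sysrq_reboot() only returns the planned writes. sysrq_reboot(dry_run=False) as root actually reboots the machine, so anything not yet synced to disk is lost.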
Hope this will help you