Bitcoin Forum
1  Alternate cryptocurrencies / Mining (Altcoins) / Linux Nvidia Monitoring Script for checkmk nagios on: August 08, 2019, 09:10:13 AM
I have been searching for a while for a simple way to monitor Nvidia GPU temps (and possibly more) with Checkmk/Nagios.
I found lots of different methods, but they were either way too complicated, not working at all, or too hard for me to adapt (I am primarily a Windows sysadmin with medium Linux skills: good at PowerShell, medium or less at everything else).
I am sure there are better, simpler and more elegant ways to achieve what I wanted, but this is what I came up with.
The agent actually does detect the core temps by itself, but for some reason they do not show up on a host rescan.

Checkmk is a free and complete monitoring solution that I use both professionally and privately.
https://checkmk.com/cms_install_packages.html

This script can be put in the Checkmk agent's local check folder (/usr/lib/check_mk_agent/local by default) and will then be executed by the agent.
The only thing that needs to be adapted is your preferred temp range.
Each output line starts with the status code Checkmk expects:
0 normal
1 warning
2 crit
3 unknown
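
To make the line format concrete, here is a minimal sketch of a helper that prints one local-check line: status code, service name, a "-" where metrics would go, then free text. The function name emit_local_check and the sample values are made up for illustration and are not part of the original script.

```shell
#!/bin/bash
# Sketch: print one line in the Checkmk local-check format.
# emit_local_check is an illustrative helper name, not a Checkmk command.
emit_local_check() {
    local status=$1 service=$2 text=$3
    # Fields: status code, service name (no spaces), metrics ("-" = none), free text.
    echo "$status $service - $text"
}

# Made-up sample values, only to show what a line looks like.
emit_local_check 0 GPU0 "GeForce RTX 2080 TEMP 65C - FAN 45 % - 120.50 W"
```

Checkmk turns each such line into its own service, named after the second field.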

The result looks like this (everything besides the GPU services is built in, of course; the Miner process comes from a custom discovery rule):
https://i.imgur.com/noqwFz9.png

Maybe someone finds this useful.
Please take, copy, improve, whatever...

# Code
#!/bin/bash
# Checkmk local check: reports one service per Nvidia GPU.
# Adapt these limits (degrees C) to your preferred temp range.
min=10    # readings below this are treated as implausible
warn=71
crit=73

for index in $(nvidia-smi --query-gpu=index --format=csv,noheader)
do
    gpu_temp=$(nvidia-smi -i "$index" --query-gpu=temperature.gpu --format=csv,noheader)
    gpu_fan=$(nvidia-smi -i "$index" --query-gpu=fan.speed --format=csv,noheader)
    gpu_name=$(nvidia-smi -i "$index" --query-gpu=gpu_name --format=csv,noheader)
    gpu_power=$(nvidia-smi -i "$index" --query-gpu=power.draw --format=csv,noheader)

    text="$gpu_name TEMP ${gpu_temp}C - FAN $gpu_fan - $gpu_power"

    # Output format: <status> <service name> <metrics, "-" for none> <text>
    # A non-numeric or implausibly low reading is reported as 3 (unknown).
    if ! [[ $gpu_temp =~ ^[0-9]+$ ]] || ((gpu_temp < min))
    then echo "3 GPU$index - UNKNOWN (temperature reading: $gpu_temp)"
    elif ((gpu_temp >= crit))
    then echo "2 GPU$index - $text"
    elif ((gpu_temp >= warn))
    then echo "1 GPU$index - $text"
    else echo "0 GPU$index - $text"
    fi
done
#Code
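
For the "possibly more" part, one possible extension (my assumption, not something the script above does) is to hand Checkmk the values as metrics so it can graph them and work out the state itself. Local checks accept P as the status, in which case the state is derived from the warn/crit levels attached to a metric as name=value;warn;crit. The function name gpu_check_line and the metric names temp, fan and power are made up for this sketch. It also assumes every queried field is supported by the card, since a non-numeric value would not be a valid metric.

```shell
#!/bin/bash
# Sketch: build one Checkmk local-check line with metrics for a GPU.
# gpu_check_line and the metric names (temp, fan, power) are illustrative.
warn=71
crit=73

gpu_check_line() {
    local index=$1 name=$2 temp=$3 fan=$4 power=$5
    # Status "P" asks Checkmk to derive OK/WARN/CRIT from the levels
    # attached to the metric (value;warn;crit). Metrics are joined with "|".
    echo "P GPU$index temp=$temp;$warn;$crit|fan=$fan|power=$power $name"
}

# Only query the driver if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1
then
    nvidia-smi --query-gpu=index,name,temperature.gpu,fan.speed,power.draw \
               --format=csv,noheader,nounits |
    while IFS=',' read -r index name temp fan power
    do
        # CSV fields are separated by ", ", so strip the leading space.
        gpu_check_line "$index" "${name# }" "${temp# }" "${fan# }" "${power# }"
    done
fi
```

This uses a single nvidia-smi call for all GPUs instead of four calls per card, which keeps the agent run short on rigs with many GPUs.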

2  Alternate cryptocurrencies / Mining (Altcoins) / H110Pro cant get win to boot with more than 6x 2080 on: January 21, 2019, 09:29:25 AM
Trying to set up an ASRock H110 with 8x 2080.
With the 7th GPU installed, Windows boots but freezes on login.
I had 7x 1080 running on the same board before.

Win10 1809
i5
16GB RAM
256GB Disk
417.71 drivers
USB risers
BIOS P1.10
Above 4G Decoding enabled in BIOS
PCIe Gen1 link speed set in BIOS

The risers are plugged into the PCIe slots with one slot left free between them (too narrow otherwise), except for the first 2 slots.
Does that make any difference? I haven't tried other slots yet.

Any hints are very welcome.

Thanks