Bitcoin Forum
Topic: Nagios plugin for monitoring GPU temperature and fan speed?
Mahkul
March 28, 2011, 12:35:06 PM
 #1

I may throw some BTC at anyone who writes a Nagios plugin that monitors the temperature using the aticonfig utility. Is anyone else interested in getting such a thing?

Cdecker
March 28, 2011, 01:47:46 PM
 #2

I prefer Munin as it works more out of the box, but I'll give it a try :D

comboy
March 28, 2011, 02:03:49 PM
 #3

Quote from: Cdecker
I prefer Munin as it works more out of the box, but I'll give it a try :D

You would need to tune it for your own setup, but here's what I have for Munin:


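Code: (temperature)
#!/usr/bin/ruby
# Munin plugin: graphs the temperature of each ATI adapter reported by aticonfig.

if ARGV.first == "config"
  # Munin runs the plugin with "config" to learn the graph layout.
  puts "graph_title GPU temperatures"
  #puts "graph_args --base 1000 -r --lower-limit 0"
  puts "graph_category gpu"
  puts "graph_period second"
  puts "graph_vlabel temperature"
  puts "card0.label card0"
  puts "card1.label card1"
  puts "card2.label card2"
else
  # aticonfig needs a running X display; adjust DISPLAY for your machine.
  out = `DISPLAY=":3" aticonfig --odgt --adapter=all`
  #puts out

  # Each adapter's block is separated from the next by a blank line.
  out.split("\n\n").each do |card|
    adapter = card[/Adapter ([0-9.]+)/i, 1]
    temp = card[/Temperature - ([0-9.]+) C/i, 1]
    next unless adapter && temp # skip blocks that did not parse
    puts "card#{adapter}.value #{temp}"
  end
end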


Code: (fan speeds)
#!/usr/bin/ruby
# Munin plugin: graphs GPU fan speed (in percent) as reported by aticonfig.

if ARGV.first == "config"
  puts "graph_title GPU fan speeds"
  #puts "graph_args --base 1000 -r --lower-limit 0"
  puts "graph_category gpu"
  puts "graph_period second"
  puts "graph_vlabel speed"
  puts "fan0.label card0"
  puts "fan2.label card2"
else
  # One fan per physical card; the id selects the X screen to query.
  %w{0 2}.each do |id|
    out = `DISPLAY=":3.#{id}" aticonfig --pplib-cmd 'get fanspeed 0'`
    speed = out[/Speed: (\d+)%/, 1]
    next unless speed # skip the card if the output did not parse
    puts "fan#{id}.value #{speed}"
  end
end

As you may have guessed, this one is from a machine that has 1x 5970 and 1x 5870. I was too lazy to do it using plugin configuration.
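
For anyone who does want to go the plugin-configuration route, here is a rough sketch. Munin hands `env.*` settings from a plugin-conf.d section to the plugin as environment variables. The variable names `display` and `fan_ids`, the `[gpu_fans]` section name, and the helper functions are all made up for this example; they are not part of comboy's script or of Munin itself.

```ruby
# Sketch only: read the X display and the fan ids from the environment
# instead of hard-coding them. "display" and "fan_ids" are hypothetical
# names that would be set in a plugin-conf.d section such as:
#   [gpu_fans]
#   env.display :3
#   env.fan_ids 0 2

# Returns [display, ids], falling back to the values hard-coded in the
# fan speed script above.
def fan_settings(env)
  display = env.fetch("display", ":3")
  ids = env.fetch("fan_ids", "0 2").split
  [display, ids]
end

# Lines to print when Munin runs the plugin with the "config" argument.
def config_lines(ids)
  ["graph_title GPU fan speeds",
   "graph_category gpu",
   "graph_vlabel speed"] + ids.map { |id| "fan#{id}.label card#{id}" }
end

# Shell command that queries one fan (same aticonfig call as above).
def fan_command(display, id)
  %(DISPLAY="#{display}.#{id}" aticonfig --pplib-cmd 'get fanspeed 0')
end

display, ids = fan_settings(ENV)
puts config_lines(ids)
ids.each { |id| puts fan_command(display, id) }
```

As written, this only prints the config lines and the commands it would run, so it can be tried on a machine without aticonfig. A real plugin would run each command with backticks and parse the "Speed:" line the same way the script above does.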

Mahkul
March 28, 2011, 03:07:30 PM
 #4

I will give Munin a try; however, I would prefer Nagios, since I use it to monitor my whole network (which is pretty big).

electrotime
March 28, 2011, 04:37:46 PM
 #5

What's your average temperature while mining (GPU and CPU)?

LMGTFY
March 28, 2011, 05:13:39 PM
 #6

Quote from: Mahkul
I will give Munin a try; however, I would prefer Nagios, since I use it to monitor my whole network (which is pretty big).

I've only just this minute looked at Nagios (I have used Munin before, however), but it looks to me as if you could fairly easily hack comboy's script. If I read the documentation correctly, Nagios wants a plugin that simply spits out something like "GPU Temperature: 75.4". You could do that with comboy's script by removing everything except the else-block. (And maybe add a bit of Ruby hackery to pretty-print the output for Nagios/human consumption.)

...but I know nothing about Nagios, YMMV, IANAL, consult a qualified physician before commencing exercise, do not taunt Happy Fun Ball, etc.
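
A rough sketch of that idea, for what it's worth. Besides the line of text, Nagios also reads the plugin's exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN. The thresholds, the function names, and the sample aticonfig output below are assumptions made for illustration; only the two regexes come from comboy's script.

```ruby
# Sketch of a Nagios check built from the else-block of comboy's script.
# The thresholds are arbitrary example values, not recommendations.
WARN_C = 85.0
CRIT_C = 95.0

# Extracts { adapter => temperature } from `aticonfig --odgt --adapter=all`
# output, using the same regexes as the Munin plugin.
def parse_temps(text)
  temps = {}
  text.split("\n\n").each do |card|
    adapter = card[/Adapter ([0-9]+)/i, 1]
    temp = card[/Temperature - ([0-9.]+) C/i, 1]
    temps[adapter] = temp.to_f if adapter && temp
  end
  temps
end

# Returns [exit_code, status_line] for the parsed temperatures.
def check(temps)
  return [3, "GPU TEMP UNKNOWN - no temperatures found"] if temps.empty?
  hottest = temps.values.max
  code, word = if hottest >= CRIT_C then [2, "CRITICAL"]
               elsif hottest >= WARN_C then [1, "WARNING"]
               else [0, "OK"]
               end
  detail = temps.map { |id, t| "card#{id}: #{t} C" }.join(", ")
  [code, "GPU TEMP #{word} - #{detail}"]
end

# Made-up sample in the layout the regexes expect. A real plugin would use
#   text = `DISPLAY=":3" aticonfig --odgt --adapter=all`
sample = <<~OUT
  Adapter 0 - ATI Radeon HD 5900 Series
              Sensor 0: Temperature - 75.50 C

  Adapter 1 - ATI Radeon HD 5900 Series
              Sensor 0: Temperature - 96.00 C
OUT

code, line = check(parse_temps(sample))
puts line
# A real plugin would finish with: exit code
```

Returning UNKNOWN when nothing parses keeps a broken X display or a missing aticonfig from showing up as a healthy GPU.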

Mahkul
March 28, 2011, 05:16:23 PM
 #7

Thanks for that, LMGTFY. I've never looked into writing Nagios plugins; I may give it a shot!

mikegogulski
May 29, 2011, 10:01:00 PM
 #8

This is working for me for the GPU temps. You'll need to set up sudoers correctly and change instances of "syadasti" to whatever user ID generally runs your miners.

Haven't yet seen if I can graph off the perfdata, but that's next.

Code:
#!/bin/bash
# Nagios plugin: reports the temperature of every ATI adapter via aticonfig.
# Exits 1 (WARNING) if any adapter is hotter than 94 C, otherwise 0 (OK).

export DISPLAY=:0
export LD_LIBRARY_PATH=/opt/ati-stream-sdk-v2.3-lnx64/lib/x86_64/
export ATISTREAMSDKROOT=/opt/ati-stream-sdk-v2.3-lnx64
export GPU_USE_SYNC_OBJECTS=1

exit_status=0
serviceoutput=
serviceperfdata=
longserviceoutput=
templist=

# Loop over the adapter indices listed by aticonfig ("*" marks the default one).
for f in $(sudo -u syadasti aticonfig --list-adapters | grep : | tr \* ' ' | sed 's/\..*//')
do
  out=$(sudo -u syadasti aticonfig --adapter=$f --odgt)
  # $out is left unquoted on purpose: echo joins it into one line for sed.
  temp=$(echo $out | grep Temp | sed -e 's/^.*- .* - //' | sed -e 's/ C.*//')
  templist="$templist $temp"
  # "#" is a placeholder that is turned into a newline when printing.
  longserviceoutput="${longserviceoutput}#${f} $temp C;"
  serviceperfdata="${serviceperfdata}#${f}=${temp};"
  # Integer part of the temperature, for the numeric comparison.
  tempint=$(echo $temp | sed 's/\..*//')
  if [ "$tempint" -gt 94 ]
  then
    exit_status=1
  fi
done

if [ $exit_status -ne 0 ]
then
  serviceoutput="GPU TEMP WARNING"
else
  serviceoutput="GPU TEMP OK"
fi
serviceoutput="$serviceoutput - Temps: / $templist;"

# First line: status text. Then the per-adapter lines, a "|", and the perfdata.
echo $serviceoutput \|
echo -n $longserviceoutput \| | sed 's/^#//' | sed 's/#/\n/g'
echo $serviceperfdata | sed 's/;$//' | sed 's/^#/ /'| sed 's/#/\n/g'

exit $exit_status
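
On graphing off the perfdata: the format Nagios expects after the "|" is a space-separated list of label=value;warn;crit entries, so a single-line variant of the output could be sketched as below. This is in Ruby, to match the Munin scripts earlier in the thread. The "gpuN" labels, the critical threshold, and the function names are example choices; 94 is the warning limit from the script above.

```ruby
# Sketch: build a one-line Nagios result with performance data.
# Temperatures carry no unit suffix here, since the Nagios plugin
# guidelines do not define a unit for degrees.
WARN_TEMP = 94   # same limit as the bash script above
CRIT_TEMP = 100  # example value only

# temps is a hash of adapter index => temperature.
def perfdata(temps)
  temps.map { |id, t| "gpu#{id}=#{t};#{WARN_TEMP};#{CRIT_TEMP}" }.join(" ")
end

def status_line(temps)
  state = temps.values.any? { |t| t > WARN_TEMP } ? "WARNING" : "OK"
  summary = temps.map { |id, t| "card#{id} #{t} C" }.join(", ")
  "GPU TEMP #{state} - #{summary} | #{perfdata(temps)}"
end

puts status_line({ 0 => 71.5, 1 => 95.0 })
# prints: GPU TEMP WARNING - card0 71.5 C, card1 95.0 C | gpu0=71.5;94;100 gpu1=95.0;94;100
```

Graphing add-ons for Nagios generally key their graphs on the perfdata labels, so keeping the labels stable between runs matters more than what they are called.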

JayC
May 29, 2011, 10:09:29 PM
 #9

Quote from: Mahkul
I may throw some BTC at anyone who writes a Nagios plugin that monitors the temperature using the aticonfig utility. Is anyone else interested in getting such a thing?

Send me a PM. I've already got the GPU temps into snmpd on Linux, and getting that into OpenNMS or Nagios would be pretty straightforward.