Author Topic: Nagios plugin for monitoring GPU temperature and fan speed?  (Read 7660 times)
Mahkul (OP)
March 28, 2011, 12:35:06 PM
 #1

I may throw some BTC at a person who could write a Nagios plugin that would monitor the temperature using the aticonfig utility. Anyone else interested in getting such a thing?
Cdecker
March 28, 2011, 01:47:46 PM
 #2

I prefer Munin as it works more out of the box, but I'll give it a try :D

comboy
March 28, 2011, 02:03:49 PM
 #3

Quote from: Cdecker on March 28, 2011, 01:47:46 PM
I prefer Munin as it works more out of the box, but I'll give it a try :D

You would need to tune it for yourself, but here's what I have for Munin:


Code: (temperature)
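#!/usr/bin/ruby
# Munin plugin: GPU temperatures as reported by aticonfig.

if ARGV.first == "config"
  # Graph metadata plus one field per adapter (three on this machine).
  puts "graph_title GPU temperatures"
  #puts "graph_args --base 1000 -r --lower-limit 0"
  puts "graph_category gpu"
  puts "graph_period second"
  puts "graph_vlabel temperature"
  puts "card0.label card0"
  puts "card1.label card1"
  puts "card2.label card2"
else
  # aticonfig talks to a running X server; adjust DISPLAY for your setup.
  out = `DISPLAY=":3" aticonfig --odgt --adapter=all`
  #puts out

  # The output holds one blank-line-separated block per adapter.
  out.split("\n\n").each do |card|
    adapter = card[/Adapter ([0-9.]+)/i, 1]
    temp = card[/Temperature - ([0-9.]+) C/i, 1]
    # Skip blocks that do not parse instead of crashing on nil.
    next unless adapter && temp
    puts "card#{adapter}.value #{temp}"
  end
end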


Code: (fan speeds)
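#!/usr/bin/ruby
# Munin plugin: GPU fan speeds read through aticonfig's pplib command.

if ARGV.first == "config"
  puts "graph_title GPU fan speeds"
  #puts "graph_args --base 1000 -r --lower-limit 0"
  puts "graph_category gpu"
  puts "graph_period second"
  puts "graph_vlabel speed"
  puts "fan0.label card0"
  puts "fan2.label card2"
else
  # Query X screens :3.0 and :3.2 (these IDs are specific to this machine).
  %w{0 2}.each do |id|
    out = `DISPLAY=":3.#{id}" aticonfig --pplib-cmd 'get fanspeed 0'`
    speed = out[/Speed: (\d+)%/, 1]
    # Skip a card whose output does not parse instead of crashing on nil.
    next unless speed
    puts "fan#{id}.value #{speed}"
  end
end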

As you may have guessed, this one is from a machine that has 1x5970 and 1x5870. I was too lazy to do it using plugin configuration.
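
For anyone who would rather not hard-code the card list: Munin hands `env.*` settings from its plugin configuration to the plugin as environment variables. Below is a minimal sketch of that idea, not comboy's code. The setting name `cards` and the helpers `card_ids` and `config_lines` are made up for illustration.

```ruby
# Sketch only. `cards` is a hypothetical setting name; it would be set in the
# Munin plugin configuration with a line such as `env.cards 0 1 2`.
def card_ids(env = ENV)
  # Fall back to a single card when the setting is absent.
  env.fetch("cards", "0").split
end

# Build the lines printed when Munin calls the plugin with "config".
def config_lines(ids)
  lines = [
    "graph_title GPU temperatures",
    "graph_category gpu",
    "graph_vlabel temperature"
  ]
  lines + ids.map { |id| "card#{id}.label card#{id}" }
end

puts config_lines(card_ids) if ARGV.first == "config"
```

The same list could then drive the value loop, so a machine with a different mix of cards only needs a different `env.cards` line.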

Mahkul (OP)
March 28, 2011, 03:07:30 PM
 #4

I will give Munin a try; however, I would prefer Nagios, since I use it to monitor my whole network (which is pretty big).
electrotime
March 28, 2011, 04:37:46 PM
 #5

What's your average temperature while mining? (GPU and CPU)
LMGTFY
March 28, 2011, 05:13:39 PM
 #6

Quote from: Mahkul on March 28, 2011, 03:07:30 PM
I will give munin a try, however I would prefer nagios since I use it to monitor my whole network (which is pretty big).

I've only just this minute looked at Nagios (I've used Munin before, however), but it looks to me as if you could fairly easily hack comboy's script. If I read the documentation correctly, Nagios wants a plugin that simply spits out something like "GPU Temperature: 75.4". You could do that with comboy's script by removing everything except the else-block (and maybe adding a bit of ruby-hackery to pretty-print the output for Nagios/human consumption).

...but I know nothing about Nagios, YMMV, IANAL, consult a qualified physician before commencing exercise, do not taunt Happy Fun Ball, etc.
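
To make that a bit more concrete: besides the line of text, a Nagios plugin reports its state through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Here is a rough sketch of comboy's else-block reworked along those lines. The regexes follow comboy's script. The thresholds, the helper names `parse_temps` and `check`, and the sample text are assumptions for illustration only.

```ruby
# Sketch of a Nagios-style check built on comboy's parsing.
# WARN_AT / CRIT_AT are made-up thresholds; pick values that suit your cards.
WARN_AT = 85.0
CRIT_AT = 95.0

# Turn `aticonfig --odgt --adapter=all` output into {adapter => temperature}.
# Blocks that do not match are skipped instead of raising.
def parse_temps(out)
  temps = {}
  out.split("\n\n").each do |card|
    adapter = card[/Adapter ([0-9]+)/i, 1]
    temp = card[/Temperature - ([0-9.]+) C/i, 1]
    temps[adapter.to_i] = temp.to_f if adapter && temp
  end
  temps
end

# Return [exit_code, status_line] following the Nagios exit-code convention.
def check(temps)
  return [3, "GPU TEMP UNKNOWN - no readings"] if temps.empty?
  hottest = temps.values.max
  code, word =
    if hottest >= CRIT_AT
      [2, "CRITICAL"]
    elsif hottest >= WARN_AT
      [1, "WARNING"]
    else
      [0, "OK"]
    end
  detail = temps.sort.map { |id, t| "card#{id} #{t} C" }.join(", ")
  [code, "GPU TEMP #{word} - #{detail}"]
end

# Sample text shaped like aticonfig output (assumed; adapter names are placeholders).
sample = "Adapter 0 - ATI Radeon HD 5800 Series\n  Sensor 0: Temperature - 71.50 C\n\n" \
         "Adapter 1 - ATI Radeon HD 5900 Series\n  Sensor 0: Temperature - 88.00 C\n"
code, line = check(parse_temps(sample))
puts line
# In a real plugin, feed in the aticonfig output and finish with `exit code`.
```

Keeping the parsing and the threshold logic in separate functions means the check can be tried against pasted output on a machine without any GPUs.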

Mahkul (OP)
March 28, 2011, 05:16:23 PM
 #7

Thanks for that, LMGTFY. I've never looked at writing Nagios plugins; I may give it a shot!
mikegogulski
May 29, 2011, 10:01:00 PM
 #8

This is working for me for the GPU temps. You'll need to set up sudoers correctly and change instances of "syadasti" to whatever user ID generally runs your miners.

Haven't yet seen if I can graph off the perfdata, but that's next.

Code:
#!/bin/bash
# Nagios check: report GPU temperatures and warn when any GPU is hotter than 94 C.

export DISPLAY=:0
export LD_LIBRARY_PATH=/opt/ati-stream-sdk-v2.3-lnx64/lib/x86_64/
export ATISTREAMSDKROOT=/opt/ati-stream-sdk-v2.3-lnx64
export GPU_USE_SYNC_OBJECTS=1

exit_status=0
serviceoutput=
serviceperfdata=
longserviceoutput=
templist=

# Loop over the adapter indexes listed by aticonfig.
for f in `sudo -u syadasti aticonfig --list-adapters | grep : | tr \* ' ' | sed 's/\..*//'`
do
    out=`sudo -u syadasti aticonfig --adapter=$f --odgt`
    # $out is left unquoted on purpose: the sed pattern relies on the
    # adapter and temperature lines being joined into one line.
    temp=`echo $out | grep Temp | sed -e 's/^.*- .* - //' | sed -e 's/ C.*//'`
    templist="$templist $temp"
    longserviceoutput="${longserviceoutput}#${f} $temp C;"
    #echo $longserviceoutput
    serviceperfdata="${serviceperfdata}#${f}=${temp};"
    #echo $serviceperfdata
    # Drop the decimals so the shell can do an integer comparison.
    tempint=`echo $temp | sed 's/\..*//'`
    if [ "$tempint" -gt 94 ]
    then
        exit_status=1    # exit code 1 means WARNING to Nagios
    fi
done

if [ $exit_status -ne 0 ]
then
    serviceoutput="GPU TEMP WARNING"
else
    serviceoutput="GPU TEMP OK"
fi
serviceoutput="$serviceoutput - Temps: / $templist;"

#echo long: $longserviceoutput
#echo serviceperf: $serviceperfdata

# First line: summary. Then the per-adapter long output, then the perfdata.
echo $serviceoutput \|
echo -n $longserviceoutput \| | sed 's/^#//' | sed 's/#/\n/g'
echo $serviceperfdata | sed 's/;$//' | sed 's/^#/ /'| sed 's/#/\n/g'

exit $exit_status
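
On graphing from the perfdata: the Nagios plugin guidelines describe performance data as space-separated `label=value;warn;crit` entries after the `|`, and graphing add-ons tend to be picky about that shape. Here is a small sketch, in Ruby to match the Munin scripts earlier in the thread, of building such a line. The `gpuN` labels, the helper name `perfdata`, the critical threshold of 100 and the sample readings are assumptions. The warning value of 94 is borrowed from the script above.

```ruby
# Sketch: format temperature readings as Nagios performance data.
# Each metric has the form label=value;warn;crit and metrics are joined by spaces.
def perfdata(temps, warn: 94, crit: 100)
  temps.sort.map { |id, t| format("gpu%d=%.1f;%d;%d", id, t, warn, crit) }.join(" ")
end

temps = { 0 => 71.5, 1 => 69.0 }  # example readings, not real measurements
puts "GPU TEMP OK - 2 cards | #{perfdata(temps)}"
```

Carrying the thresholds inside each metric lets a grapher draw the warning and critical lines without extra configuration.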

JayC
May 29, 2011, 10:09:29 PM
 #9

Quote from: Mahkul on March 28, 2011, 12:35:06 PM
I may throw some BTC at a person who could write a nagios plugin that would monitor the temperature using aticonfig utility. Anyone else interested in getting such a thing?

Send me a PM. I've already got the GPU temps into snmpd on Linux, and getting that into OpenNMS or Nagios would be pretty straightforward.