Bitcoin Forum
Topic: Monitoring AMD GPUs with SNMP in Linux
JinTu (OP)
October 18, 2011, 12:22:51 AM
Last edit: October 18, 2011, 06:15:47 AM by JinTu
 #1

Hi folks,

I'm a big fan of monitoring system performance with SNMP. When it came to mining, I didn't find much out there that satisfied my requirements, so I decided to put something together and share it with you all. The following describes how to monitor a Linux-based host with any number of ATI GPUs:

Prerequisites
  • Linux host (makes use of aticonfig and POSIX signalling)
  • Perl
  • AMD drivers
  • All installed GPUs detected and operational
  • All GPUs must have support for PPLIB.
  • Net-snmp installed and operational


Installation instructions
  • Grab the latest script from here.
  • Install this script somewhere net-snmp can execute it (e.g. /usr/local/bin).
  • Edit the following lines in the script as appropriate for your environment:
Code:
########################################################
## Tweak the following variables for your environment
#
my $sudo = "sudo -u jintu";           # The sudo command to execute
my $aticonfig = "/usr/bin/aticonfig"; # The full path to aticonfig
my $display = "DISPLAY=:0";           # The display your X session and ATI GPUs are running under
#
  • Edit snmpd.conf (mine is in /etc/snmp) to include the following lines (edit the path as appropriate):
Code:
# These are multi-line output visible from NET-SNMP-EXTEND-MIB::nsExtendOutLine
extend  gputemp         /usr/local/bin/gpu_snmp.pl temp
extend  gpuload         /usr/local/bin/gpu_snmp.pl load
extend  gpuclock        /usr/local/bin/gpu_snmp.pl clock
extend  gpumemory       /usr/local/bin/gpu_snmp.pl memory
extend  gpuvcore        /usr/local/bin/gpu_snmp.pl vcore
extend  gpufan          /usr/local/bin/gpu_snmp.pl fan
extend  gpuid           /usr/local/bin/gpu_snmp.pl id
extend  gpuaddress      /usr/local/bin/gpu_snmp.pl address
extend  gpudescription  /usr/local/bin/gpu_snmp.pl description
#
  • Test with snmpwalk, e.g. 'snmpwalk -v2c -cpublic localhost NET-SNMP-EXTEND-MIB::nsExtendOutLine', which should output something like the following (from my dual 6990 rig):
Code:
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuid".1 = STRING: 0
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuid".2 = STRING: 1
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuid".3 = STRING: 2
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuid".4 = STRING: 3
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpufan".1 = STRING: 85
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpufan".2 = STRING: 85
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpufan".3 = STRING: 85
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpufan".4 = STRING: 85
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuload".1 = STRING: 78
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuload".2 = STRING: 91
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuload".3 = STRING: 94
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuload".4 = STRING: 98
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gputemp".1 = STRING: 7350
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gputemp".2 = STRING: 7400
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gputemp".3 = STRING: 7450
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gputemp".4 = STRING: 7550
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuclock".1 = STRING: 500
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuclock".2 = STRING: 500
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuclock".3 = STRING: 515
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuclock".4 = STRING: 505
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuvcore".1 = STRING: 1000
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuvcore".2 = STRING: 1000
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuvcore".3 = STRING: 1050
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuvcore".4 = STRING: 1050
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpumemory".1 = STRING: 150
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpumemory".2 = STRING: 150
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpumemory".3 = STRING: 150
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpumemory".4 = STRING: 150
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuaddress".1 = STRING: 03:00.0
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuaddress".2 = STRING: 04:00.0
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuaddress".3 = STRING: 07:00.0
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuaddress".4 = STRING: 08:00.0
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpudescription".1 = STRING: AMD Radeon HD 6900 Series
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpudescription".2 = STRING: AMD Radeon HD 6900 Series
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpudescription".3 = STRING: AMD Radeon HD 6900 Series
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpudescription".4 = STRING: AMD Radeon HD 6900 Series
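If you want to post-process this output before feeding it to an NMS, the lines can be regrouped per GPU. The following is only a minimal sketch, assuming the snmpwalk output format shown above; the sub name parse_extend_output and the sample text are illustrative and not part of gpu_snmp.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse lines such as
#   NET-SNMP-EXTEND-MIB::nsExtendOutLine."gputemp".1 = STRING: 7350
# into a hash keyed by output line index, then by extend name:
#   $stats->{1}{gputemp} = '7350'
# (parse_extend_output is a hypothetical helper name.)
sub parse_extend_output {
    my ($text) = @_;
    my %stats;
    foreach my $line (split /\n/, $text) {
        if ($line =~ /nsExtendOutLine\."([^"]+)"\.(\d+)\s+=\s+STRING:\s*(.*?)\s*$/) {
            my ($name, $index, $value) = ($1, $2, $3);
            $stats{$index}{$name} = $value;
        }
    }
    return \%stats;
}

# Sample input copied from the snmpwalk output above (first GPU only).
my $sample = <<'END';
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuid".1 = STRING: 0
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gputemp".1 = STRING: 7350
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuaddress".1 = STRING: 03:00.0
NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpudescription".1 = STRING: AMD Radeon HD 6900 Series
END

my $stats = parse_extend_output($sample);
foreach my $index (sort { $a <=> $b } keys %$stats) {
    my $gpu = $stats->{$index};
    # Values are printed raw; no unit conversion is attempted here.
    printf "GPU %s at %s: temp=%s (%s)\n",
        $gpu->{gpuid}, $gpu->{gpuaddress}, $gpu->{gputemp}, $gpu->{gpudescription};
}
```

Grouping by the nsExtendOutLine index works because every extend entry emits one line per adapter in the same order, so index 1 refers to the same GPU in each of them.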



Troubleshooting
  • If you see "sudo: sorry, you must have a tty to run sudo" when executing the script via snmpd, comment out the following line in your sudoers file:
Code:
Defaults    requiretty


    Wire up to your NMS as you see fit. For reference, I have posted my Cacti template utilizing these stats here.

    Have fun!
    P4man
    October 18, 2011, 06:28:02 AM
     #2

    Looks awesome! I'll have to try this later. Can you somehow include hashrate from some of the popular miners too?

    JinTu (OP)
    October 18, 2011, 01:27:51 PM
     #3

    Looks awesome! I'll have to try this later. Can you somehow include hashrate from some of the popular miners too?

    This version of the script (and the stats it reports) is totally independent of any mining software you might be using, so it works with any application that uses your GPUs (mining or otherwise).

    I plan to do another script to scrape the stats cgminer provides for hashrate but am not sure when I will get around to it.
    dlasher
    February 17, 2012, 08:34:05 PM
     #4

    Found a problem with the script when the PCI slot ID is "0a:00.0". It breaks on two of my miners, miner2 and miner5:


    Quote
    [root@miner2 scripts]# ./gpu_snmp.pl  description
    Execution failed with:* 0. 0a:00.0 ATI Radeon HD 5800 Series
      1. 09:00.0 ATI Radeon HD 5800 Series
      2. 04:00.0 AMD Radeon HD 6900 Series

    * - Default adapter

    Check that sudo and aticonfig are configured correctly.

    [root@miner5 scripts]# ./gpu_snmp.pl  description
    Execution failed with:* 0. 0a:00.0 ATI Radeon HD 5800 Series  
      1. 09:00.0 ATI Radeon HD 5800 Series  
      2. 05:00.0 ATI Radeon HD 5800 Series  
      3. 04:00.0 ATI Radeon HD 5800 Series  

    * - Default adapter

    Check that sudo and aticonfig are configured correctly.


    Quote
    [root@miner2 ~]# aticonfig --list-adapters
    * 0. 0a:00.0 ATI Radeon HD 5800 Series
      1. 09:00.0 ATI Radeon HD 5800 Series
      2. 04:00.0 AMD Radeon HD 6900 Series

    * - Default adapter
    [root@miner2 ~]#

    [root@miner5 scripts]# aticonfig --list-adapters
    * 0. 0a:00.0 ATI Radeon HD 5800 Series  
      1. 09:00.0 ATI Radeon HD 5800 Series  
      2. 05:00.0 ATI Radeon HD 5800 Series  
      3. 04:00.0 ATI Radeon HD 5800 Series

    Guessing it's this line in the script:

    Quote
    unless ($adapter_list =~ m/^.*\d+\.\s+\d{2}:\d{2}\.\d\s+.*/) {
      # Bail out, we are getting an error
      die "Execution failed with:" . $adapter_list . "\nCheck that sudo and aticonfig are configured correctly.\n";
    }

    Sadly, I'm not enough of a Perl guru to figure out what to change.


    JinTu (OP)
    February 20, 2012, 08:01:47 AM
     #5

    Guessing it's this line in the script:

    Quote
    unless ($adapter_list =~ m/^.*\d+\.\s+\d{2}:\d{2}\.\d\s+.*/) {


    Yep, you nailed it. The regex doesn't work with anything other than 0-9 at the moment. I'll post an updated version that should work with your setup as soon as I get a couple minutes free.
    dlasher
    February 20, 2012, 04:55:21 PM
    Last edit: February 20, 2012, 05:24:15 PM by dlasher
     #6

    Guessing it's this line in the script:

    Quote
    unless ($adapter_list =~ m/^.*\d+\.\s+\d{2}:\d{2}\.\d\s+.*/) {


    Yep, you nailed it. The regex doesn't work with anything other than 0-9 at the moment. I'll post an updated version that should work with your setup as soon as I get a couple minutes free.

    Thank you! You've created one of what I considered the missing pieces for miners with any decent GHash rate. I've got a couple of miners actually working in Cacti, and once this is fixed, I can find the remaining issues and get the others going. Looking forward to the patch.

    I played with it a little. The right fix would be to match something like [0-9a-fA-F][0-9a-fA-F], but for now I was able to change \d{2} to \w{2} in both places it matches and get by.


    Quote
    --- gpu_snmp.pl.old     2012-02-20 09:22:31.000000000 -0800
    +++ gpu_snmp.pl 2012-02-20 09:20:21.000000000 -0800
    @@ -64,7 +64,7 @@
     #
     # * - Default adapter
     #
    -unless ($adapter_list =~ m/^.*\d+\.\s+\d{2}:\d{2}\.\d\s+.*/) {
    +unless ($adapter_list =~ m/^.*\d+\.\s+\w{2}:\d{2}\.\d\s+.*/) {
       # Bail out, we are getting an error
       die "Execution failed with:" . $adapter_list . "\nCheck that sudo and aticonfig are configured correctly.\n";
     }
    @@ -73,7 +73,7 @@
     my $num_adapters = 0;
     my @adapter;
     foreach my $line (split (/\n/,$adapter_list)) {
    -  if (($id,$address,$description) = $line =~ m/^.*(\d+)\.\s+(\d{2}:\d{2}\.\d)\s+(.*)/) {
    +  if (($id,$address,$description) = $line =~ m/^.*(\d+)\.\s+(\w{2}:\d{2}\.\d)\s+(.*)/) {
         #print "Got \$id:$id,\$address:$address,\$description:$description\n";
         $adapter[$num_adapters]{'id'} = $id;
         $adapter[$num_adapters]{'address'} = $address;
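For reference, the stricter hex match mentioned above could look like the sketch below. It is not the code from gpu_snmp.pl; the sub name parse_adapter_list and the sample text are illustrative. Two details are worth noting. The original check is anchored with ^ and has no /m modifier, so it only ever tests the first line of the aticonfig output, which is why a single "0a" on the default adapter is enough to trip it. And \w{2} also accepts non-hex strings such as "zz" or "__", which a hex character class avoids. Anchoring the id directly after the optional "*" marker also keeps multi-digit ids intact, whereas a greedy ^.*(\d+) captures only the last digit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two hex digits, as used for the bus and device parts of a PCI address.
my $HEX2 = qr/[0-9a-fA-F]{2}/;

# Parse `aticonfig --list-adapters` style output into a list of hashes.
# (parse_adapter_list is a hypothetical helper name.)
sub parse_adapter_list {
    my ($adapter_list) = @_;
    my @adapters;
    foreach my $line (split /\n/, $adapter_list) {
        # Optional "*" default marker, adapter id, PCI address, description.
        if ($line =~ /^\s*\*?\s*(\d+)\.\s+(${HEX2}:${HEX2}\.[0-7])\s+(.*?)\s*$/) {
            push @adapters, { id => $1, address => $2, description => $3 };
        }
    }
    return @adapters;
}

# Sample input copied from miner2's output earlier in the thread.
my $sample = <<'END';
* 0. 0a:00.0 ATI Radeon HD 5800 Series
  1. 09:00.0 ATI Radeon HD 5800 Series
  2. 04:00.0 AMD Radeon HD 6900 Series

* - Default adapter
END

# The original sanity check rejects this output because of the "0a" bus id.
my $old_check = qr/^.*\d+\.\s+\d{2}:\d{2}\.\d\s+.*/;
print "original check matches: ", ($sample =~ $old_check ? "yes" : "no"), "\n";

my @adapters = parse_adapter_list($sample);
die "No adapters found; check that sudo and aticonfig are configured correctly.\n"
    unless @adapters;

foreach my $adapter (@adapters) {
    print "$adapter->{id} $adapter->{address} $adapter->{description}\n";
}
```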
    gfaust
    February 20, 2012, 06:04:38 PM
     #7

    Changing the pattern to match non-whitespace instead of digits works for me:

    # Normal result looks like
    # * 0. 0a:00.0 ATI Radeon HD 5800 Series 
    #   1. 09:00.0 ATI Radeon HD 5800 Series 
    #   2. 04:00.0 ATI Radeon HD 5800 Series 

    #  * - Default adapter

    ...
    unless ($adapter_list =~ m/^.*\d+\.\s+\S{2}:\d{2}\.\d\s+.*/) {
    ...
    if (($id,$address,$description) = $line =~ m/^.*(\d+)\.\s+(\S{2}:\d{2}\.\d)\s$
    ...
    bogesman
    February 24, 2012, 04:54:06 PM
     #8

    dlasher, I fixed mine the same way you did, but there is a nastier bug with dual-GPU adapters: the fan reading there is -1.
    Say you have 6 adapters with IDs 0 1 2 3 4 5, and one of them is a dual-GPU card. Let's say it is the last one.
    IDs 0 1 2 3 4 are physical cards; ID 5 is the second GPU of the dual card. ID 5 must be ignored for fan polling. For load, temp, etc. it is fine.
    JinTu (OP)
    February 27, 2012, 10:36:59 PM
     #9

    Guessing it's this line in the script:

    Quote
    unless ($adapter_list =~ m/^.*\d+\.\s+\d{2}:\d{2}\.\d\s+.*/) {


    Yep, you nailed it. The regex doesn't work with anything other than 0-9 at the moment. I'll post an updated version that should work with your setup as soon as I get a couple minutes free.

    Thank you! You've created one of what I considered the missing pieces for miners with any decent GHash rate. I've got a couple of miners actually working in Cacti, and once this is fixed, I can find the remaining issues and get the others going. Looking forward to the patch.

    I played with it a little. The right fix would be to match something like [0-9a-fA-F][0-9a-fA-F], but for now I was able to change \d{2} to \w{2} in both places it matches and get by.


    Quote
    --- gpu_snmp.pl.old     2012-02-20 09:22:31.000000000 -0800
    +++ gpu_snmp.pl 2012-02-20 09:20:21.000000000 -0800
    @@ -64,7 +64,7 @@
     #
     # * - Default adapter
     #
    -unless ($adapter_list =~ m/^.*\d+\.\s+\d{2}:\d{2}\.\d\s+.*/) {
    +unless ($adapter_list =~ m/^.*\d+\.\s+\w{2}:\d{2}\.\d\s+.*/) {
       # Bail out, we are getting an error
       die "Execution failed with:" . $adapter_list . "\nCheck that sudo and aticonfig are configured correctly.\n";
     }
    @@ -73,7 +73,7 @@
     my $num_adapters = 0;
     my @adapter;
     foreach my $line (split (/\n/,$adapter_list)) {
    -  if (($id,$address,$description) = $line =~ m/^.*(\d+)\.\s+(\d{2}:\d{2}\.\d)\s+(.*)/) {
    +  if (($id,$address,$description) = $line =~ m/^.*(\d+)\.\s+(\w{2}:\d{2}\.\d)\s+(.*)/) {
         #print "Got \$id:$id,\$address:$address,\$description:$description\n";
         $adapter[$num_adapters]{'id'} = $id;
         $adapter[$num_adapters]{'address'} = $address;



    Updated version with dlasher's regex fixes posted to the link in the first post.
    JinTu (OP)
    February 27, 2012, 10:43:39 PM
     #10

    dlasher, I fixed mine the same way you did, but there is a nastier bug with dual-GPU adapters: the fan reading there is -1.
    Say you have 6 adapters with IDs 0 1 2 3 4 5, and one of them is a dual-GPU card. Let's say it is the last one.
    IDs 0 1 2 3 4 are physical cards; ID 5 is the second GPU of the dual card. ID 5 must be ignored for fan polling. For load, temp, etc. it is fine.


    I would love to see a dump of this (with gpu_snmp.pl fan) if you can provide it. Since my rig only has two dual GPUs, I don't really know what this would look like in a mixed system. My dual 6990's report back two fan set points per card even though there is really only one attached fan.

    The LT
    March 07, 2012, 02:17:34 PM
    Last edit: March 07, 2012, 03:00:12 PM by The LT
     #11

    Hey JinTu, mind telling me your net-snmp version and posting your snmpd.conf? I can't seem to get the extend MIB going!

    EDIT: Nvm, got it figured out, turned out to be a permissions issue...

    Here's my snmpd.conf, I'm running net-snmp-5.4.3

    Code:
    agentAddress udp:161
    rocommunity public  localhost
    sysLocation    Garage
    sysContact     admin <admin@email.com>

    extend  gputemp         /usr/local/bin/gpu_snmp.pl temp
    extend  gpuload         /usr/local/bin/gpu_snmp.pl load
    extend  gpuclock        /usr/local/bin/gpu_snmp.pl clock
    extend  gpumemory       /usr/local/bin/gpu_snmp.pl memory
    extend  gpuvcore        /usr/local/bin/gpu_snmp.pl vcore
    extend  gpufan          /usr/local/bin/gpu_snmp.pl fan
    extend  gpuid           /usr/local/bin/gpu_snmp.pl id
    extend  gpuaddress      /usr/local/bin/gpu_snmp.pl address
    extend  gpudescription  /usr/local/bin/gpu_snmp.pl description
    bogesman
    April 29, 2012, 01:49:42 PM
     #12

    dlasher, I fixed mine the same way you did, but there is a nastier bug with dual-GPU adapters: the fan reading there is -1.
    Say you have 6 adapters with IDs 0 1 2 3 4 5, and one of them is a dual-GPU card. Let's say it is the last one.
    IDs 0 1 2 3 4 are physical cards; ID 5 is the second GPU of the dual card. ID 5 must be ignored for fan polling. For load, temp, etc. it is fine.


    I would love to see a dump of this (with gpu_snmp.pl fan) if you can provide it. Since my rig only has two dual GPUs, I don't really know what this would look like in a mixed system. My dual 6990's report back two fan set points per card even though there is really only one attached fan.



    I can give you a few :)

    First setup: 4 physical cards, all 4 dual-GPU

    Code:
    /usr/bin/aticonfig --lsa
    * 0. 10:00.0 ATI Radeon HD 5900 Series
      1. 04:00.0 ATI Radeon HD 5900 Series
      2. 08:00.0 ATI Radeon HD 5900 Series
      3. 09:00.0 ATI Radeon HD 5900 Series
      4. 0c:00.0 ATI Radeon HD 5900 Series
      5. 0d:00.0 ATI Radeon HD 5900 Series
      6. 03:00.0 ATI Radeon HD 5900 Series
      7. 11:00.0 ATI Radeon HD 5900 Series

    * - Default adapter

    Code:
    gpu_snmp.pl fan
    100
    ati_pplib_cmd: execute "get" failed!
    100
    ati_pplib_cmd: execute "get" failed!
    100
    ati_pplib_cmd: execute "get" failed!
    100
    ati_pplib_cmd: execute "get" failed!

    Code:
    gpu_snmp.pl load
    99
    99
    99
    99
    99
    99
    99
    98

    Second setup: 6 physical cards, 2 of them dual-GPU
    Code:
    * 0. 01:00.0 ATI Radeon HD 5800 Series
      1. 02:00.0 ATI Radeon HD 5800 Series
      2. 03:00.0 ATI Radeon HD 5800 Series
      3. 05:00.0 ATI Radeon HD 5800 Series
      4. 08:00.0 ATI Radeon HD 5900 Series
      5. 09:00.0 ATI Radeon HD 5900 Series
      6. 0c:00.0 ATI Radeon HD 5900 Series
      7. 0d:00.0 ATI Radeon HD 5900 Series

    * - Default adapter

    Code:
    gpu_snmp.pl fan
    100
    100
    100
    100
    100
    ati_pplib_cmd: execute "get" failed!
    100
    ati_pplib_cmd: execute "get" failed!

    Code:
    gpu_snmp.pl load
    99
    99
    98
    99
    98
    99
    99
    99
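One way to keep the fan output aligned with the adapter IDs is to emit exactly one numeric line per adapter and substitute a placeholder when the query fails. The sketch below is only an illustration under that assumption: it treats any line that is not a plain number (such as the ati_pplib_cmd error above) as "no reading" and replaces it with -1. The sub name normalize_fan_readings and the choice of -1 as the marker are hypothetical, not taken from gpu_snmp.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn raw per-adapter fan output into one numeric value per adapter.
# Lines that are not a plain non-negative integer become -1, used here
# as a hypothetical "no reading" marker.
sub normalize_fan_readings {
    my (@lines) = @_;
    my @readings;
    foreach my $line (@lines) {
        if ($line =~ /^\s*(\d+)\s*$/) {
            push @readings, $1;
        } else {
            push @readings, -1;
        }
    }
    return @readings;
}

# Raw lines copied from the second setup's `gpu_snmp.pl fan` output above.
my @raw = (
    '100', '100', '100', '100', '100',
    'ati_pplib_cmd: execute "get" failed!',
    '100',
    'ati_pplib_cmd: execute "get" failed!',
);

my @fans = normalize_fan_readings(@raw);
for my $id (0 .. $#fans) {
    print "adapter $id: $fans[$id]\n";
}
```

With one value per adapter, the nsExtendOutLine index for gpufan lines up with the index for gpuid, and the NMS can ignore or flag the -1 entries for the second GPU of a dual card.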
    JinTu (OP)
    May 01, 2012, 01:46:43 AM
     #13

    dlasher, I fixed mine the same way you did, but there is a nastier bug with dual-GPU adapters: the fan reading there is -1.
    Say you have 6 adapters with IDs 0 1 2 3 4 5, and one of them is a dual-GPU card. Let's say it is the last one.
    IDs 0 1 2 3 4 are physical cards; ID 5 is the second GPU of the dual card. ID 5 must be ignored for fan polling. For load, temp, etc. it is fine.


    I would love to see a dump of this (with gpu_snmp.pl fan) if you can provide it. Since my rig only has two dual GPUs, I don't really know what this would look like in a mixed system. My dual 6990's report back two fan set points per card even though there is really only one attached fan.



    I can give you a few :)
    <snip>
    Second setup: 6 physical cards, 2 of them dual-GPU
    Code:
    * 0. 01:00.0 ATI Radeon HD 5800 Series
      1. 02:00.0 ATI Radeon HD 5800 Series
      2. 03:00.0 ATI Radeon HD 5800 Series
      3. 05:00.0 ATI Radeon HD 5800 Series
      4. 08:00.0 ATI Radeon HD 5900 Series
      5. 09:00.0 ATI Radeon HD 5900 Series
      6. 0c:00.0 ATI Radeon HD 5900 Series
      7. 0d:00.0 ATI Radeon HD 5900 Series

    * - Default adapter

    Code:
    gpu_snmp.pl fan
    100
    100
    100
    100
    100
    ati_pplib_cmd: execute "get" failed!
    100
    ati_pplib_cmd: execute "get" failed!

    Thanks for the info. I wonder if the issue with your duals has anything to do with the APP/ADL version. With APP 2.5 and fglrx 11.11 on my 6990's the info for the second (non-existent) fan speed setting is reported back and doesn't generate the error yours does.

    bogesman
    May 02, 2012, 08:55:24 PM
     #14

    Could be. I have fglrx 8.85.6 [Apr 19 2011] and SDK 2.4. I can't update right now, but I will be doing some tests with new drivers/SDK soon, so if I get it working I will report back.