Title: Monitoring AMD GPUs with SNMP in Linux
Post by: JinTu on October 18, 2011, 12:22:51 AM

Hi folks,

I'm a big fan of monitoring system performance with SNMP, and when it came to mining I didn't find much out there that satisfied my requirements, so I decided to put something together and share it with you all. The following describes how to monitor a Linux-based host with any number of ATI GPUs:

Prerequisites
Installation instructions
Code: ########################################################
Code: # These are multi-line output visible from NET-SNMP-EXTEND-MIB::nsExtendOutLine
Code: NET-SNMP-EXTEND-MIB::nsExtendOutLine."gpuid".1 = STRING: 0

Troubleshooting
Code: "Default requiretty"

Wire up to your NMS as you see fit. For reference, I have posted my Cacti template utilizing these stats here (https://bitcointalk.org/index.php?topic=48828.0).

Have fun!

Title: Re: Monitoring AMD GPUs with SNMP in Linux
Post by: P4man on October 18, 2011, 06:28:02 AM

Looks awesome! I'll have to try this later. Can you somehow include hashrate from some of the popular miners too?

Title: Re: Monitoring AMD GPUs with SNMP in Linux
Post by: JinTu on October 18, 2011, 01:27:51 PM

Quote from: P4man
    Looks awesome! I'll have to try this later. Can you somehow include hashrate from some of the popular miners too?

This version of the script (and the stats it reports) is totally independent of any mining software you might be using, so you can use any application that uses your GPUs (mining or otherwise).

I plan to do another script to scrape the hashrate stats cgminer provides, but I am not sure when I will get around to it.

Title: Re: Monitoring AMD GPUs with SNMP in Linux
Post by: dlasher on February 17, 2012, 08:34:05 PM

Found a problem with the script with a PCI slot ID of "0a:00.0". It breaks on two of my miners, miner2 and miner5:

Quote
    [root@miner2 scripts]# ./gpu_snmp.pl description
    Execution failed with:* 0. 0a:00.0 ATI Radeon HD 5800 Series
      1. 09:00.0 ATI Radeon HD 5800 Series
      2. 04:00.0 AMD Radeon HD 6900 Series

    * - Default adapter

    Check that sudo and aticonfig are configured correctly.

    [root@miner5 scripts]# ./gpu_snmp.pl description
    Execution failed with:* 0. 0a:00.0 ATI Radeon HD 5800 Series
      1. 09:00.0 ATI Radeon HD 5800 Series
      2. 05:00.0 ATI Radeon HD 5800 Series
      3. 04:00.0 ATI Radeon HD 5800 Series

    * - Default adapter

    Check that sudo and aticonfig are configured correctly.

Quote
    [root@miner2 ~]# aticonfig --list-adapters
    * 0. 0a:00.0 ATI Radeon HD 5800 Series
      1. 09:00.0 ATI Radeon HD 5800 Series
      2. 04:00.0 AMD Radeon HD 6900 Series

    * - Default adapter
    [root@miner2 ~]#

    [root@miner5 scripts]# aticonfig --list-adapters
    * 0. 0a:00.0 ATI Radeon HD 5800 Series
      1. 09:00.0 ATI Radeon HD 5800 Series
      2. 05:00.0 ATI Radeon HD 5800 Series
      3. 04:00.0 ATI Radeon HD 5800 Series

Guessing it's this line in the script:

Quote
    unless ($adapter_list =~ m/^.*\d+\.\s+\d{2}:\d{2}\.\d\s+.*/) {
        # Bail out, we are getting an error
        die "Execution failed with:" . $adapter_list . "\nCheck that sudo and aticonfig are configured correctly.\n";
    }

Sadly, I'm not enough of a Perl guru to figure out what to change.

Title: Re: Monitoring AMD GPUs with SNMP in Linux
Post by: JinTu on February 20, 2012, 08:01:47 AM

Quote from: dlasher
    Guessing it's this line in the script:
    Quote
        unless ($adapter_list =~ m/^.*\d+\.\s+\d{2}:\d{2}\.\d\s+.*/) {

Yep, you nailed it. The regex doesn't work with anything other than 0-9 at the moment. I'll post an updated version that should work with your setup as soon as I get a couple minutes free.

Title: Re: Monitoring AMD GPUs with SNMP in Linux
Post by: dlasher on February 20, 2012, 04:55:21 PM

Quote from: JinTu
    Quote from: dlasher
        Guessing it's this line in the script:
        Quote
            unless ($adapter_list =~ m/^.*\d+\.\s+\d{2}:\d{2}\.\d\s+.*/) {

    Yep, you nailed it. The regex doesn't work with anything other than 0-9 at the moment.
    I'll post an updated version that should work with your setup as soon as I get a couple minutes free.

Thank you! You've created one of what I considered the missing pieces for miners with any decent GHash rate. I've got a couple of miners actually working in Cacti, and once this is fixed, I can find the remaining issues and get the others going. Looking forward to the patch.

I played with it a little. The right fix would be matching to something like [0-9a-fA-F][0-9a-fA-F], but for now I was able to change \d{2} to \w{2} in both places it matches, and get by.

Quote
    --- gpu_snmp.pl.old 2012-02-20 09:22:31.000000000 -0800
    +++ gpu_snmp.pl 2012-02-20 09:20:21.000000000 -0800
    @@ -64,7 +64,7 @@
     #
     # * - Default adapter
     #
    -unless ($adapter_list =~ m/^.*\d+\.\s+\d{2}:\d{2}\.\d\s+.*/) {
    +unless ($adapter_list =~ m/^.*\d+\.\s+\w{2}:\d{2}\.\d\s+.*/) {
         # Bail out, we are getting an error
         die "Execution failed with:" . $adapter_list . "\nCheck that sudo and aticonfig are configured correctly.\n";
     }
    @@ -73,7 +73,7 @@
     my $num_adapters = 0;
     my @adapter;
     foreach my $line (split (/\n/,$adapter_list)) {
    -    if (($id,$address,$description) = $line =~ m/^.*(\d+)\.\s+(\d{2}:\d{2}\.\d)\s+(.*)/) {
    +    if (($id,$address,$description) = $line =~ m/^.*(\d+)\.\s+(\w{2}:\d{2}\.\d)\s+(.*)/) {
             #print "Got \$id:$id,\$address:$address,\$description:$description\n";
             $adapter[$num_adapters]{'id'} = $id;
             $adapter[$num_adapters]{'address'} = $address;

Title: Re: Monitoring AMD GPUs with SNMP in Linux
Post by: gfaust on February 20, 2012, 06:04:38 PM

Changing to match non-whitespace instead of digits works for me:

# Normal result looks like
# * 0. 0a:00.0 ATI Radeon HD 5800 Series
#   1. 09:00.0 ATI Radeon HD 5800 Series
#   2. 04:00.0 ATI Radeon HD 5800 Series
#
# * - Default adapter
...
unless ($adapter_list =~ m/^.*\d+\.\s+\S{2}:\d{2}\.\d\s+.*/) {
...
if (($id,$address,$description) = $line =~ m/^.*(\d+)\.\s+(\S{2}:\d{2}\.\d)\s$
...

Title: Re: Monitoring AMD GPUs with SNMP in Linux
Post by: bogesman on February 24, 2012, 04:54:06 PM

dlasher, I fixed mine the same way as you, but there is a nastier bug with DUAL GPU adapters. Fans there are -1.

So if you have 6 adapters with IDs 0 1 2 3 4 5, and 1 of them is DUAL GPU, let's say the last one: 0 1 2 3 4 are physical, and ID 5 is the second GPU. ID 5 must be ignored for fan polling. For load, temp etc. it is fine.

Title: Re: Monitoring AMD GPUs with SNMP in Linux
Post by: JinTu on February 27, 2012, 10:36:59 PM

Quote from: dlasher
    Quote from: JinTu
        Quote from: dlasher
            Guessing it's this line in the script:
            Quote
                unless ($adapter_list =~ m/^.*\d+\.\s+\d{2}:\d{2}\.\d\s+.*/) {

        Yep, you nailed it. The regex doesn't work with anything other than 0-9 at the moment. I'll post an updated version that should work with your setup as soon as I get a couple minutes free.

    Thank you! You've created one of what I considered the missing pieces for miners with any decent GHash rate. I've got a couple of miners actually working in Cacti, and once this is fixed, I can find the remaining issues and get the others going. Looking forward to the patch.

    I played with it a little. The right fix would be matching to something like [0-9a-fA-F][0-9a-fA-F], but for now I was able to change \d{2} to \w{2} in both places it matches, and get by.

    Quote
        --- gpu_snmp.pl.old 2012-02-20 09:22:31.000000000 -0800
        +++ gpu_snmp.pl 2012-02-20 09:20:21.000000000 -0800
        @@ -64,7 +64,7 @@
         #
         # * - Default adapter
         #
        -unless ($adapter_list =~ m/^.*\d+\.\s+\d{2}:\d{2}\.\d\s+.*/) {
        +unless ($adapter_list =~ m/^.*\d+\.\s+\w{2}:\d{2}\.\d\s+.*/) {
             # Bail out, we are getting an error
             die "Execution failed with:" . $adapter_list . "\nCheck that sudo and aticonfig are configured correctly.\n";
         }
        @@ -73,7 +73,7 @@
         my $num_adapters = 0;
         my @adapter;
         foreach my $line (split (/\n/,$adapter_list)) {
        -    if (($id,$address,$description) = $line =~ m/^.*(\d+)\.\s+(\d{2}:\d{2}\.\d)\s+(.*)/) {
        +    if (($id,$address,$description) = $line =~ m/^.*(\d+)\.\s+(\w{2}:\d{2}\.\d)\s+(.*)/) {
                 #print "Got \$id:$id,\$address:$address,\$description:$description\n";
                 $adapter[$num_adapters]{'id'} = $id;
                 $adapter[$num_adapters]{'address'} = $address;

Updated version with dlasher's regex fixes posted to the link in the first post.

Title: Re: Monitoring AMD GPUs with SNMP in Linux
Post by: JinTu on February 27, 2012, 10:43:39 PM

Quote from: bogesman
    dlasher, I fixed mine the same way as you, but there is a nastier bug with DUAL GPU adapters. Fans there are -1.

    So if you have 6 adapters with IDs 0 1 2 3 4 5, and 1 of them is DUAL GPU, let's say the last one: 0 1 2 3 4 are physical, and ID 5 is the second GPU. ID 5 must be ignored for fan polling. For load, temp etc. it is fine.

I would love to see a dump of this (with gpu_snmp.pl fan) if you can provide it. Since my rig only has two dual GPUs, I don't really know what this would look like in a mixed system. My dual 6990s report back two fan set points per card even though there is really only one attached fan.

Title: Re: Monitoring AMD GPUs with SNMP in Linux
Post by: The LT on March 07, 2012, 02:17:34 PM

Hey JinTu, mind telling your net-snmp version and posting your snmpd.conf? I can't seem to get the extend MIB going!

EDIT: Nvm, got it figured out, turned out to be a permissions issue... Here's my snmpd.conf, I'm running net-snmp-5.4.3
Code: agentAddress udp:161

Title: Re: Monitoring AMD GPUs with SNMP in Linux
Post by: bogesman on April 29, 2012, 01:49:42 PM

Quote from: JinTu
    Quote from: bogesman
        dlasher, I fixed mine the same way as you, but there is a nastier bug with DUAL GPU adapters. Fans there are -1.

        So if you have 6 adapters with IDs 0 1 2 3 4 5, and 1 of them is DUAL GPU, let's say the last one: 0 1 2 3 4 are physical, and ID 5 is the second GPU. ID 5 must be ignored for fan polling. For load, temp etc. it is fine.

    I would love to see a dump of this (with gpu_snmp.pl fan) if you can provide it. Since my rig only has two dual GPUs, I don't really know what this would look like in a mixed system. My dual 6990s report back two fan set points per card even though there is really only one attached fan.

I can give you a few :)

First one: 4 physical, 4 dual
Code: /usr/bin/aticonfig --lsa
Code: gpu_snmp.pl fan
Code: gpu_snmp.pl load

Second setup: 6 physical, 2 dual
Code: * 0. 01:00.0 ATI Radeon HD 5800 Series
Code: gpu_snmp.pl fan
Code: gpu_snmp.pl load

Title: Re: Monitoring AMD GPUs with SNMP in Linux
Post by: JinTu on May 01, 2012, 01:46:43 AM

Quote from: bogesman
    Quote from: JinTu
        Quote from: bogesman
            dlasher, I fixed mine the same way as you, but there is a nastier bug with DUAL GPU adapters. Fans there are -1.

            So if you have 6 adapters with IDs 0 1 2 3 4 5, and 1 of them is DUAL GPU, let's say the last one: 0 1 2 3 4 are physical, and ID 5 is the second GPU. ID 5 must be ignored for fan polling. For load, temp etc. it is fine.

        I would love to see a dump of this (with gpu_snmp.pl fan) if you can provide it. Since my rig only has two dual GPUs, I don't really know what this would look like in a mixed system. My dual 6990s report back two fan set points per card even though there is really only one attached fan.

    I can give you a few :)
    <snip>
    Second setup: 6 physical, 2 dual
    Code: * 0. 01:00.0 ATI Radeon HD 5800 Series
    Code: gpu_snmp.pl fan

Thanks for the info. I wonder if the issue with your duals has anything to do with the APP/ADL version. With APP 2.5 and fglrx 11.11 on my 6990s, the info for the second (non-existent) fan speed setting is reported back and doesn't generate the error yours does.

Title: Re: Monitoring AMD GPUs with SNMP in Linux
Post by: bogesman on May 02, 2012, 08:55:24 PM

Could be. I have fglrx 8.85.6 [Apr 19 2011] and SDK 2.4. I can't update right now, but I will be doing some tests with new drivers/SDK soon, so if I get it working I will write.
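
For anyone landing on this thread later: the "right fix" dlasher mentions (matching [0-9a-fA-F] rather than \d, \w or \S for the PCI bus ID) can be sketched roughly as below. This is a minimal Python illustration of the matching logic only, not the code from gpu_snmp.pl (which is Perl). The names ADAPTER_RE, parse_adapters and SAMPLE are made up for the example, the sample text is copied from the miner2 listing earlier in the thread, and accepting hex in the device field as well as the bus field is an assumption that goes beyond what was discussed here.

```python
import re

# Hypothetical pattern (not from gpu_snmp.pl): "<id>. <bus>:<device>.<function> <description>"
# Bus and device are treated as two hex digits each (assumption); function is one digit.
ADAPTER_RE = re.compile(
    r"^\W*(\d+)\.\s+([0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.\d)\s+(.*\S)\s*$"
)


def parse_adapters(listing):
    """Return one dict per adapter line found in `aticonfig --list-adapters` style text."""
    adapters = []
    for line in listing.splitlines():
        match = ADAPTER_RE.match(line)
        if match:
            adapters.append(
                {
                    "id": int(match.group(1)),
                    "address": match.group(2),
                    "description": match.group(3),
                }
            )
    return adapters


# Sample listing taken from the miner2 output quoted earlier in the thread.
SAMPLE = """\
* 0. 0a:00.0 ATI Radeon HD 5800 Series
  1. 09:00.0 ATI Radeon HD 5800 Series
  2. 04:00.0 AMD Radeon HD 6900 Series

* - Default adapter
"""

if __name__ == "__main__":
    for adapter in parse_adapters(SAMPLE):
        print(adapter["id"], adapter["address"], adapter["description"])
```

Two design notes, both hedged: the explicit hex class rejects stray non-hex characters that \w or \S would let through, and the leading `^\W*` (instead of the script's greedy `^.*`) should keep a multi-digit adapter ID such as 10 from being captured as just its last digit. The blank line and the "* - Default adapter" footer do not match, so they are skipped.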