Bitcoin Forum

Bitcoin => Mining => Topic started by: xf2_org on May 26, 2011, 08:33:47 PM



Title: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: xf2_org on May 26, 2011, 08:33:47 PM
URL: http://yyz.us/bitcoin/gpu-watch.py

This script will monitor all GPUs in a machine (by default, two), and control fan speed accordingly.

gpu-watch sleeps for N seconds, then samples the temperatures of all GPUs.  If the temperature is too low, fanspeed decreases by 5%.  If the temp is too high, fanspeed increases by 10%.

Settings:
poll_time: number of seconds to sleep between runs
card_first: id of first GPU
card_count: number of GPUs
temp_low: low temperature threshold, at which fan speed decreases
temp_high: high temperature threshold, at which fan speed increases

I run with temp_low==70 and temp_high==76.  If the range is too narrow, the fan speed will change constantly.
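
The rule described above can be sketched as a small function. This is only an illustration of the behaviour, not the script's actual code: the name next_fanspeed and the clamping to 0-100 are assumptions made for the example.

```python
def next_fanspeed(temp, fanspeed, temp_low=70.0, temp_high=76.0):
    """Return the fan speed (percent) to use for the next polling interval."""
    if temp > temp_high:
        fanspeed += 10      # too hot: speed up quickly
    elif temp < temp_low:
        fanspeed -= 5       # cool enough: slow down gently
    # Between temp_low and temp_high the speed is left alone, which is
    # why a wider band means fewer fan changes.
    return max(0, min(100, fanspeed))
```

Stepping up faster than stepping down lets the fan react quickly to a hot card while backing off slowly.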



Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: w128 on May 26, 2011, 08:40:07 PM

How hard would it be to make this plot a curve based on a handful of user-provided points or a table? The configurable curve in MSI Afterburner is very nice for this purpose.

I'd certainly pay some kind of bounty to see that implemented.
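
A curve built from a handful of user-provided points could look roughly like the sketch below, which interpolates linearly between (temperature, fan speed) pairs. The function name fan_curve and the sample points are made up for illustration; nothing like this exists in gpu-watch as posted.

```python
def fan_curve(temp, points):
    """Interpolate a fan speed (percent) from (temperature, speed) points."""
    pts = sorted(points)
    if temp <= pts[0][0]:
        return pts[0][1]            # below the first point: hold its speed
    if temp >= pts[-1][0]:
        return pts[-1][1]           # above the last point: hold its speed
    for (t0, s0), (t1, s1) in zip(pts, pts[1:]):
        if t0 <= temp <= t1:
            # Straight line between the two neighbouring points.
            return s0 + (s1 - s0) * (temp - t0) / float(t1 - t0)

# Example table; the values are arbitrary.
POINTS = [(40, 20), (60, 40), (75, 70), (85, 100)]
```

With this table, fan_curve(67.5, POINTS) lands halfway between the 60C and 75C points.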


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: ryepdx on May 26, 2011, 08:48:08 PM
Can't you do that with Overdrive in Windows and AmdOverdriveCtrl in Linux?


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: w128 on May 26, 2011, 08:54:43 PM
I'm not sure, but I'm a fan of community-built tools, and given a choice I'd rather have it in something like this, which I could modify and extend to my heart's content, than in a vendor's potentially limited tools.


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: Mononofu on May 26, 2011, 09:46:59 PM
With AmdOverdriveCtrl you can do that. You can choose between steps, linear changes or bezier-curves. Works like a charm for me :)


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: elmom on May 27, 2011, 08:29:40 AM
The fan control worked great, but the over/underclocking part messes up my GPU. This script seems like a nice and lean tool. Now I just need a simple tool that can set the clocks beyond what aticonfig allows.


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: barbagianni on May 27, 2011, 05:45:53 PM
I run with temp_low==70 and temp_high==76.  If the range is too low, fan speed will constantly change.

Hi xf2_org,
I tried your script, but as I'm not confident with the Python language, could you please write me an example of how to write the settings?




Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: disq on May 27, 2011, 08:22:58 PM
Code:
Traceback (most recent call last):
  File "./gpu-watch.py", line 127, in <module>
    w.watch()
  File "./gpu-watch.py", line 75, in watch
    self.watch_card(card)
  File "./gpu-watch.py", line 79, in watch_card
    temp = card.temp()
  File "./gpu-watch.py", line 21, in temp
    line = subprocess.check_output(
AttributeError: 'module' object has no attribute 'check_output'

Code:
$ python -V
Python 2.6.6

What do I need? :) Is there any way to make it compatible with older versions of Python, like 2.6.x?
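
subprocess.check_output was added in Python 2.7, which is why 2.6 raises the AttributeError above. One possible workaround, sketched here and not tested against gpu-watch itself, is a small fallback built on subprocess.Popen. The name check_output_compat is hypothetical.

```python
import subprocess

def check_output_compat(cmd):
    """Run cmd and return its standard output, like subprocess.check_output."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    if proc.returncode != 0:
        # Mirror check_output: a non-zero exit status is an error.
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return out

# Use the real function where it exists, the fallback otherwise.
check_output = getattr(subprocess, "check_output", check_output_compat)
```

The script's calls to subprocess.check_output(...) would then need to be changed to check_output(...).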


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: xf2_org on May 28, 2011, 04:36:00 PM

Script updated with two fixes:
- make configuration file work (./gpu-watch.py CONFIG-FILE)
- do not proceed past low end (zero percent), as temps fall



Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: xf2_org on May 28, 2011, 04:38:11 PM
I tried your script but as I'm not confident with python language, can you pls write me an example on how to write the settings?

You create a configuration file, a text file with one setting per line in a SETTING=VALUE format, such as

poll_time=30
card_first=0
card_count=1
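
For anyone curious how a file like that can be read, here is a rough sketch of a SETTING=VALUE parser. It uses the same regular expression as the variants of the script posted in this thread, but the DEFAULTS values and the type conversion are assumptions for the example, not necessarily what gpu-watch does.

```python
import re

# Assumed defaults, for illustration only.
DEFAULTS = {'poll_time': 60, 'card_first': 0, 'card_count': 2,
            'temp_low': 70.0, 'temp_high': 76.0}

def parse_settings(lines, defaults=DEFAULTS):
    """Return a settings dict built from SETTING=VALUE lines."""
    settings = dict(defaults)
    for line in lines:
        m = re.search(r'^(\w+)\s*=\s*(\S.*)$', line)
        if m is None:
            continue                    # skip blank or malformed lines
        key, value = m.group(1), m.group(2).strip()
        # Convert to the type of the default so comparisons stay numeric;
        # unknown keys are kept as strings.
        kind = type(defaults.get(key, ''))
        settings[key] = kind(value)
    return settings
```

Calling parse_settings(open('CONFIG-FILE')) would read the three lines above and leave the other settings at their defaults.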

Code:
$ python -V
Python 2.6.6

what do I need? :) any way to make it compatible with lower versions of python like 2.6.x?

Tested on python 2.7.  Patches welcome for older versions...



Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: barbagianni on May 28, 2011, 05:30:41 PM
poll_time=30
card_first=0
card_count=1


I still have this error with python 2.7:

registering card 0 (id 0)
registering card 1 (id 1)
Sat May 28 19:26:39 2011 Polling GPU data...
Traceback (most recent call last):
  File "gpu-watch.py", line 126, in <module>
    w.watch()
  File "gpu-watch.py", line 74, in watch
    self.watch_card(card)
  File "gpu-watch.py", line 80, in watch_card
    print "Temperature %.2fC, fan speed %d%%" % (temp, fanspeed)
TypeError: float argument required, not NoneType



Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: xf2_org on May 28, 2011, 05:46:50 PM
registering card 0 (id 0)
registering card 1 (id 1)
Sat May 28 19:26:39 2011 Polling GPU data...
Traceback (most recent call last):
  File "gpu-watch.py", line 126, in <module>
    w.watch()
  File "gpu-watch.py", line 74, in watch
    self.watch_card(card)
  File "gpu-watch.py", line 80, in watch_card
    print "Temperature %.2fC, fan speed %d%%" % (temp, fanspeed)
TypeError: float argument required, not NoneType

You passed an incorrect card setup to gpu-watch.



Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: barbagianni on May 28, 2011, 05:51:25 PM
You passed an incorrect card setup to gpu-watch.

it works with these parameters:

poll_time=30
card_first=0
card_count=0

thanks


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: xf2_org on May 28, 2011, 06:18:34 PM
You passed an incorrect card setup to gpu-watch.

it works with this parameters:

poll_time=30
card_first=0
card_count=0

card_count=0 means zero cards...



Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: barbagianni on May 28, 2011, 07:13:41 PM
card_count=0 means zero cards...

If I write card_count=1, it gives this error:

C:\Python27>gpu-watch.py CONFIG-FILE
Sat May 28 21:11:22 2011 Server Starts
registering card 0 (id 0)
Sat May 28 21:11:22 2011 Polling GPU data...
Traceback (most recent call last):
  File "C:\Python27\gpu-watch.py", line 130, in <module>
    w.watch()
  File "C:\Python27\gpu-watch.py", line 75, in watch
    self.watch_card(card)
  File "C:\Python27\gpu-watch.py", line 81, in watch_card
    print "Temperature %.2fC, fan speed %d%%" % (temp, fanspeed)
TypeError: float argument required, not NoneType
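
The traceback means that temp (or fanspeed) was None when the print ran. In the variants posted later in this thread, temp() returns None when aticonfig cannot be run or its output does not match the expected pattern; assuming the original behaves the same way, a guard like the sketch below would report the problem instead of crashing. The helper describe_reading is hypothetical.

```python
def describe_reading(temp, fanspeed):
    """Format one reading, tolerating values that could not be read."""
    if temp is None or fanspeed is None:
        return "No reading (is aticonfig available, and is the card id valid?)"
    return "Temperature %.2fC, fan speed %d%%" % (temp, fanspeed)
```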


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: inh on May 28, 2011, 08:15:12 PM
works great for me, thanks!


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: marcus_of_augustus on May 28, 2011, 11:34:47 PM

Nice idea.

Won't work on multi-GPU cards like the 5970 or 6990, since they have two adapters per fan. Unless it was modified to address the fans at 0.0, 0.2 and 0.4, etc. (for a bank of three 5970s) and somehow average or take the highest temp. from the adapter pairs, [0.0 0.1] [0.2 0.3] [0.4 0.5] etc., as inputs ....
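
The "take highest temp. from adapter pairs" idea might be sketched as follows. It assumes adapters are numbered consecutively with two per card, and that the fan is addressed through the first adapter of each pair; hottest_per_card is a made-up helper name.

```python
def hottest_per_card(temps, gpus_per_card=2):
    """Group per-adapter temperatures into cards and keep the hottest of each.

    Returns a list of (first_adapter_index, hottest_temp) tuples.
    """
    cards = []
    for first in range(0, len(temps), gpus_per_card):
        # Ignore adapters whose temperature could not be read (None).
        group = [t for t in temps[first:first + gpus_per_card] if t is not None]
        cards.append((first, max(group) if group else None))
    return cards
```

For three dual-GPU cards this yields entries for adapters 0, 2 and 4, matching the fan addresses mentioned above.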


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: mikegogulski on May 31, 2011, 12:58:40 PM
Modified version below. REQUIRES dual-GPU cards. Autodetects the cards, and manages all of them. Panic switch at 90C immediately sets that card's fan to 100%.

Code:
#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
import time
import re
import os
import subprocess
from string import split

settings = {
    'poll_time': 60,
    'temp_low': 70.0,
    'temp_high': 76.0,
    'aticonfig_path': '/usr/bin/aticonfig'
}

class GPU():
    def __init__(self, id, cardid, description):
        self.id = id
        self.cardid = cardid
        self.description = description

    def temp(self):
        os.environ['DISPLAY'] = ":0"
        try:
            line = subprocess.check_output(
                [settings['aticonfig_path'],
                 "--adapter=" + str(self.id),
                 "--odgt"])
            m = re.search('Temperature - (\d+\.\d+)\s*C', line)
            if m is None:
                return None

            return float(m.group(1))
        except OSError:
            return None

class Card():
    def __init__(self, id, gpus):
        self.id = id
        self.x11_id = ':0.' + str(id)
        self.gpus = gpus

    def fanspeed(self, val=None):
        os.environ['DISPLAY'] = self.x11_id
        if val is not None:
            subprocess.call( [settings['aticonfig_path'],
                "--pplib-cmd", "set fanspeed 0 " + str(val)])
        else:
            line = subprocess.check_output(
                [settings['aticonfig_path'],
                 "--pplib-cmd", "get fanspeed 0"])
            m = re.search('Fan Speed: (\d+)%', line)
            if m is None:
                return False
            return int(m.group(1))

class Watcher():
    def __init__(self):
        self.cards = []
        os.environ['DISPLAY'] = ":0"
        out = subprocess.check_output([settings['aticonfig_path'],
            "--list-adapters"])
        card = 0
        gpu = 0
        lines = split(out, "\n")
        for line in lines:
            r = re.search('^[ \*]+(\d+)\. [\d:\.]+ (.+)$', line)
            if r is None:
                continue
            gpuid = int(r.group(1))
            desc = r.group(2)
            if gpu % 2 == 0:
                self.cards.append(Card(card, []))
            self.cards[-1].gpus.append(GPU(gpuid, card, desc))
            print "gpu %d card %d desc %s" % (gpuid, card, desc)
            if gpu % 2 == 1:
                card = card + 1
            gpu = gpu + 1

    def watch(self):
        while True:
            for card in self.cards:
                if card is None:
                    continue
                self.watch_card(card)
            time.sleep(settings['poll_time'])
            print "----------------------------------"

    def watch_card(self, card):
        fanspeed = card.fanspeed()
        fandelta = 0
        print "Card %d: fan speed %d%%" % (card.id, fanspeed)
        for gpu in card.gpus:
            temp = gpu.temp()
            print "Card %d GPU %d: Temperature %.2f °C" % (
                card.id, gpu.id, temp)
            if temp >= 90.0:
                fandelta = 100 - fanspeed
                break
            if temp > settings['temp_high']:
                if fanspeed <= 90:
                    fandelta = +10
                else:
                    fandelta = 100 - fanspeed
                break
            elif temp < settings['temp_low']:
                if fanspeed <= 5:
                    continue
                fandelta = -5
        if fandelta != 0 and fanspeed + fandelta >= 10:
            print "Card %d: Adjusting fan %d%% to %d%%" % (
                card.id, fandelta, fanspeed + fandelta)
            card.fanspeed(fanspeed + fandelta)

if __name__ == '__main__':
    if len(sys.argv) > 1:
        try:
            f = open(sys.argv[1])
            for line in f:
                m = re.search('^(\w+)\s*=\s*(\S.*)$', line)
                if m is None:
                    continue
                settings[m.group(1)] = m.group(2)
            f.close()
        except (OSError, IOError), e:
            pass

    settings['poll_time'] = int(settings['poll_time'])
    settings['temp_low'] = float(settings['temp_low'])
    settings['temp_high'] = float(settings['temp_high'])
    print repr(settings)
    w = Watcher()
    w.watch()


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: ryepdx on May 31, 2011, 04:14:32 PM
With AmdOverdriveCtrl you can do that. ...

The fan control worked great, but the over/underclocking part messes my GPU. ...

Wait, what has it been doing with your GPU? I've been using it for over/underclocking my GPUs and I can't seem to get anything near what I should be getting.


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: MiningBuddy on June 02, 2011, 03:14:36 AM
This is an excellent script and I have been testing it, but I have found a problem.
If I stop the load on my GPU, the script keeps lowering the fan speed until it reaches 0%, and then keeps trying to lower it further.
When the load is applied to my cards again, the temperature rises very fast, from about 30C straight up to 80C, before the script has raised the fan speed above 20%.

Is there any way to code this to use a fan profile system similar to other popular fan control software?
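
A fan profile of the kind asked about could be as simple as a lookup table with a minimum speed, sketched below. The PROFILE values and the name profile_fanspeed are invented for the example; the table must be sorted by temperature.

```python
import bisect

# (temperature threshold in C, fan speed in percent); example values only.
PROFILE = [(0, 30), (50, 40), (65, 60), (75, 80), (85, 100)]

def profile_fanspeed(temp, profile=PROFILE):
    """Return the speed of the highest threshold that temp has reached."""
    thresholds = [t for t, _ in profile]
    index = bisect.bisect_right(thresholds, temp) - 1
    # Below the first threshold, fall back to the first (minimum) speed.
    return profile[max(index, 0)][1]
```

Because the speed depends only on the current temperature, an idle card never drops below the first entry (30% here), and a card that jumps from 30C to 80C gets 80% on the very next poll instead of climbing 10% at a time.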


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: eyeoh on June 03, 2011, 11:44:02 PM
I would really like to see a modified version of this. Instead of changing the fan speed relative to the temp I would like to see it...

  • Max fans to 100% on startup
  • slowly increase the core MHz till the temp gets to a certain threshold
  • if the temp can't get cooler than XX even after down clocking for X rounds email home and stop the GPU miners
  • It should also auto detect the # of GPU's in a server

I have a shell script that runs via cron that does most of this but I'd like to see a python version that runs as a daemon. Think of it as a GPU auto OC'er and temp watch dog all rolled into 1 with email warnings.


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: marcus_of_augustus on June 03, 2011, 11:48:22 PM

I've found changing clock speeds while hashing can cause lock-ups, i.e. even near clock speeds that can be stable if set before launching.

Scripts changing clock speeds is not such a good idea imo.


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: marcus_of_augustus on June 04, 2011, 12:52:17 PM
I've found a page that has a simple Python PID controller.

http://code.activestate.com/recipes/577231-discrete-pid-controller/

Code:
#The recipe gives simple implementation of a Discrete Proportional-Integral-Derivative (PID) controller. PID controller gives output value for error between desired reference input and measurement feedback to minimize error value.
#More information: http://en.wikipedia.org/wiki/PID_controller
#
#cnr437@gmail.com
#
####### Example #########
#
#p=PID(3.0,0.4,1.2)
#p.setPoint(5.0)
#while True:
#     pid = p.update(measurement_value)
#
#


class PID:
    """
    Discrete PID control
    """

    def __init__(self, P=2.0, I=0.0, D=1.0, Derivator=0, Integrator=0, Integrator_max=500, Integrator_min=-500):

        self.Kp=P
        self.Ki=I
        self.Kd=D
        self.Derivator=Derivator
        self.Integrator=Integrator
        self.Integrator_max=Integrator_max
        self.Integrator_min=Integrator_min

        self.set_point=0.0
        self.error=0.0

    def update(self,current_value):
        """
        Calculate PID output value for given reference input and feedback
        """

        self.error = self.set_point - current_value

        self.P_value = self.Kp * self.error
        self.D_value = self.Kd * ( self.error - self.Derivator)
        self.Derivator = self.error

        self.Integrator = self.Integrator + self.error

        if self.Integrator > self.Integrator_max:
            self.Integrator = self.Integrator_max
        elif self.Integrator < self.Integrator_min:
            self.Integrator = self.Integrator_min

        self.I_value = self.Integrator * self.Ki

        PID = self.P_value + self.I_value + self.D_value

        return PID

    def setPoint(self,set_point):
        """
        Initilize the setpoint of PID
        """
        self.set_point = set_point
        self.Integrator=0
        self.Derivator=0

    def setIntegrator(self, Integrator):
        self.Integrator = Integrator

    def setDerivator(self, Derivator):
        self.Derivator = Derivator

    def setKp(self,P):
        self.Kp=P

    def setKi(self,I):
        self.Ki=I

    def setKd(self,D):
        self.Kd=D

    def getPoint(self):
        return self.set_point

    def getError(self):
        return self.error

    def getIntegrator(self):
        return self.Integrator

    def getDerivator(self):
        return self.Derivator

If any of you Python wizards feels like patching this into your temp. fan control algo, it would probably be adequate to eliminate the 'hunting' phenomenon, as the fan chases the temperature around.

The current temperature is "measurement_value" I think. P,I, and D are the tuning parameters for the controller. Set-point is what temp. you want the card to be. The output will be some multiplier for the fan speed (normalised would be 0-1, i.e. multiplied by 100 to get fan speed in percentages).

Anyone who feels like twiddling can tune the PID parameters; they'll probably have long time constants because of the thermal inertia of the heat sinks, etc.

It needs to perform this loop at least 10 times faster than the shortest time constant, which is probably pretty long given the thermal nature, so keeping the 30 or 60 second loop is probably adequate.
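
As a rough idea of how such a controller might drive a fan, here is a stripped-down proportional-integral sketch (no derivative term). It is not the recipe above: the class name FanPI, the gains, the base speed and the integral limits are untuned placeholders. Note that the error sign is flipped relative to the recipe, because a fan has to speed up when the measurement rises above the set-point.

```python
class FanPI:
    """Proportional-integral controller mapping temperature to fan percent."""

    def __init__(self, set_point, kp=4.0, ki=0.5, base=50.0):
        self.set_point = set_point
        self.kp = kp
        self.ki = ki
        self.base = base            # fan speed when exactly on target
        self.integral = 0.0

    def update(self, temp):
        # Positive error means the card is hotter than the set-point,
        # so a positive error must raise the fan speed.
        error = temp - self.set_point
        self.integral += error
        # Clamp the integral so long excursions cannot wind it up.
        self.integral = max(-100.0, min(100.0, self.integral))
        output = self.base + self.kp * error + self.ki * self.integral
        return int(max(0.0, min(100.0, output)))
```

One update() call per polling interval would replace the fixed +10/-5 steps; the gains would have to be tuned on real hardware.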


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: mikegogulski on June 04, 2011, 04:26:37 PM
I would really like to see a modified version of this. Instead of changing the fan speed relative to the temp I would like to see it...

  • Max fans to 100% on startup
  • slowly increase the core MHz till the temp gets to a certain threshold
  • if the temp can't get cooler than XX even after down clocking for X rounds email home and stop the GPU miners
  • It should also auto detect the # of GPU's in a server

I have a shell script that runs via cron that does most of this but I'd like to see a python version that runs as a daemon. Think of it as a GPU auto OC'er and temp watch dog all rolled into 1 with email warnings.

This is a neat idea but not something I'd implement myself without some very hard limits. The penalty for clocking too high (at least on my 6990s) isn't overtemperature, it's a complete lockup.


Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: mikegogulski on June 04, 2011, 04:30:06 PM
Newer version based on the above. Still requires dual-GPU cards. Now supports configurable limit for immediate 100% fan speed, plus a threshold and interval for simply shutting a GPU down temporarily if the threshold temp is exceeded. Assumes you're using poclbm, and python-psutil is a new requirement.

Code:
#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
import time
import re
import os
import subprocess
import psutil
import signal
from string import split

settings = {
    'poll_time': 60,
    'temp_low': 70.0,
    'temp_high': 76.0,
    'temp_fan_100': 90.0,
    'temp_cooldown': 93.0,
    'cooldown_period': 10,
    'aticonfig_path': '/usr/bin/aticonfig'
}

class GPU():
    def __init__(self, id, cardid, description):
        self.id = id
        self.cardid = cardid
        self.description = description

    def temp(self):
        os.environ['DISPLAY'] = ":0"
        try:
            line = subprocess.check_output(
                [settings['aticonfig_path'],
                 "--adapter=" + str(self.id),
                 "--odgt"])
            m = re.search('Temperature - (\d+\.\d+)\s*C', line)
            if m is None:
                return None

            return float(m.group(1))
        except OSError:
            return None

class Card():
    def __init__(self, id, gpus):
        self.id = id
        self.x11_id = ':0.' + str(id)
        self.gpus = gpus

    def fanspeed(self, val=None):
        os.environ['DISPLAY'] = self.x11_id
        if val is not None:
            subprocess.call( [settings['aticonfig_path'],
                "--pplib-cmd", "set fanspeed 0 " + str(val)])
        else:
            line = subprocess.check_output(
                [settings['aticonfig_path'],
                 "--pplib-cmd", "get fanspeed 0"])
            m = re.search('Fan Speed: (\d+)%', line)
            if m is None:
                return False
            return int(m.group(1))

class Watcher():
    def __init__(self):
        self.cards = []
        os.environ['DISPLAY'] = ":0"
        out = subprocess.check_output([settings['aticonfig_path'],
            "--list-adapters"])
        card = 0
        gpu = 0
        lines = split(out, "\n")
        for line in lines:
            r = re.search('^[ \*]+(\d+)\. [\d:\.]+ (.+)$', line)
            if r is None:
                continue
            gpuid = int(r.group(1))
            desc = r.group(2)
            if gpu % 2 == 0:
                self.cards.append(Card(card, []))
            self.cards[-1].gpus.append(GPU(gpuid, card, desc))
            print "gpu %d card %d desc %s" % (gpuid, card, desc)
            if gpu % 2 == 1:
                card = card + 1
            gpu = gpu + 1

    def watch(self):
        while True:
            for card in self.cards:
                if card is None:
                    continue
                self.watch_card(card)
            time.sleep(settings['poll_time'])
            print "----------------------------------"

    def watch_card(self, card):
        fanspeed = card.fanspeed()
        fandelta = 0
        print "Card %d: fan speed %d%%" % (card.id, fanspeed)
        for gpu in card.gpus:
            temp = gpu.temp()
            print "Card %d GPU %d: Temperature %.2f °C" % (
                card.id, gpu.id, temp)
            if temp >= settings['temp_cooldown']:
                for pid in psutil.get_pid_list():
                    try:
                        p = psutil.Process(pid)
                        if len(p.cmdline) > 2 and 'poclbm' in p.cmdline[1]:
                            n = int(p.cmdline[2][2])
                            if n == gpu.id:
                                print "Suspending GPU %d for 10 seconds" % (n)
                                os.kill(p.pid, signal.SIGSTOP)
                                time.sleep(settings['cooldown_period'])
                                os.kill(p.pid, signal.SIGCONT)
                    except:
                        pass
            if temp >= settings['temp_fan_100']:
                fandelta = 100 - fanspeed
                continue
            if temp > settings['temp_high']:
                if fanspeed <= 90:
                    fandelta = +10
                else:
                    fandelta = 100 - fanspeed
                continue
            elif temp < settings['temp_low']:
                if fanspeed <= 5:
                    continue
                fandelta = -5
        if fandelta != 0 and fanspeed + fandelta >= 10:
            print "Card %d: Adjusting fan %d%% to %d%%" % (
                card.id, fandelta, fanspeed + fandelta)
            card.fanspeed(fanspeed + fandelta)

if __name__ == '__main__':
    if len(sys.argv) > 1:
        try:
            f = open(sys.argv[1])
            for line in f:
                m = re.search('^(\w+)\s*=\s*(\S.*)$', line)
                if m is None:
                    continue
                settings[m.group(1)] = m.group(2)
            f.close()
        except (OSError, IOError), e:
            pass

    settings['poll_time'] = int(settings['poll_time'])
    settings['temp_low'] = float(settings['temp_low'])
    settings['temp_high'] = float(settings['temp_high'])
    print repr(settings)
    w = Watcher()
    w.watch()




Title: Re: gpu-watch: dynamic GPU temperature monitoring and fan control
Post by: TylerJordan on June 23, 2011, 05:42:51 AM
Love this script, but it wasn't working right for me, so I've made some mods.  Firstly, I have an onboard AMD GPU that was causing the script to fail, so I added a bit to the code to skip it. If you don't want that bit (which you probably don't), then set ati_card_exclude = ''

Secondly, the polling time was far too slow for my taste, so I changed it to 4 seconds.

Thirdly, the temperature adjustment was not to my liking, so I've tied it to the actual temperature of the hottest GPU on a card.  To do this I've removed the low temp and high temp variables and added a new one called fan_overTemp.  My default setting is '10', which sets the fan speed (in percent) to the temperature in degrees C plus ten: 70C gives 80% fan speed; 80C gives 90% fan speed.  This is a simple mechanism but quite effective for me. It of course maxes out at 100%.
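
Reduced to a single function, the rule described above looks roughly like this. It is a simplified restatement for illustration, not the code posted below; fanspeed_from_temps is a made-up name.

```python
def fanspeed_from_temps(temps, fan_over_temp=10):
    """Fan percent = hottest GPU temperature (C) plus an offset, capped at 100."""
    return min(100, int(max(temps)) + fan_over_temp)
```

For a card whose two GPUs read 70.0 and 65.5, this gives 80.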

Happy mining

Code:
#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
import time
import re
import os
import subprocess
import psutil
import signal
from string import split

settings = {
    'poll_time': 4,
    'fan_overTemp': 10,
    'temp_fan_100': 90.0,
    'temp_cooldown': 93.0,
    'cooldown_period': 10,
    'ati_card_exclude': '4250',
    'aticonfig_path': '/usr/bin/aticonfig'
}

class GPU():
    def __init__(self, id, cardid, description):
        self.id = id
        self.cardid = cardid
        self.description = description

    def temp(self):
        os.environ['DISPLAY'] = ":0"
        try:
            line = subprocess.check_output(
                [settings['aticonfig_path'],
                 "--adapter=" + str(self.id),
                 "--odgt"])
            m = re.search('Temperature - (\d+\.\d+)\s*C', line)
            if m is None:
                return None

            return float(m.group(1))
        except OSError:
            return None

class Card():
    def __init__(self, id, gpus):
        self.id = id
        self.x11_id = ':0.' + str(id)
        self.gpus = gpus

    def fanspeed(self, val=None):
        os.environ['DISPLAY'] = self.x11_id
        if val is not None:
            subprocess.call( [settings['aticonfig_path'],
                "--pplib-cmd", "set fanspeed 0 " + str(val)])
        else:
            line = subprocess.check_output(
                [settings['aticonfig_path'],
                 "--pplib-cmd", "get fanspeed 0"])
            m = re.search('Fan Speed: (\d+)%', line)
            if m is None:
                return False
            return int(m.group(1))

class Watcher():
    def __init__(self):
        self.cards = []
        os.environ['DISPLAY'] = ":0"
        out = subprocess.check_output([settings['aticonfig_path'],
            "--list-adapters"])
        if settings['ati_card_exclude'] == '':
            card = 0
        else: card = 1
        gpu = 0
        lines = split(out, "\n")
        for line in lines:
            r = re.search('^[ \*]+(\d+)\. [\d:\.]+ (.+)$', line)
            if r is None:
                continue
            gpuid = int(r.group(1))
            desc = r.group(2)
            # An empty string is "in" every description, so only apply
            # the exclusion when a pattern is actually set.
            if settings['ati_card_exclude'] and settings['ati_card_exclude'] in desc:
                continue
            if gpu % 2 == 0:
                self.cards.append(Card(card, []))
            self.cards[-1].gpus.append(GPU(gpuid, card, desc))
            print "gpu %d card %d desc %s" % (gpuid, card, desc)
            if gpu % 2 == 1:
                card = card + 1
            gpu = gpu + 1

    def watch(self):
        while True:
            for card in self.cards:
                if card is None:
                    continue
                self.watch_card(card)
            time.sleep(settings['poll_time'])
#           print "----------------------------------"

    def watch_card(self, card):
        fanspeed = card.fanspeed()
        fandelta = 0
        print "Card %d: fan speed %d%%" % (card.id, fanspeed)
        for gpu in card.gpus:
            temp = gpu.temp()
#           print "Card %d GPU %d: Temperature %.2f °C" % (
#               card.id, gpu.id, temp)
            if temp >= settings['temp_cooldown']:
                for pid in psutil.get_pid_list():
                    try:
                        p = psutil.Process(pid)
                        if len(p.cmdline) > 2 and 'poclbm' in p.cmdline[1]:
                            n = int(p.cmdline[2][2])
                            if n == gpu.id:
                                print "Suspending GPU %d for 10 seconds" % (n)
                                os.kill(p.pid, signal.SIGSTOP)
                                time.sleep(settings['cooldown_period'])
                                os.kill(p.pid, signal.SIGCONT)
                    except:
                        pass
            if gpu.id % 2 != 0 :  # first gpu on card - usually the cooler one
                fanspeed2 = int(temp) + settings['fan_overTemp']
                continue #check the temp on the second gpu of this card
            else:  # second gpu on card
                fanspeed = int(temp) + settings['fan_overTemp']
                if fanspeed2 > fanspeed:   # use the highest speed
                    fanspeed = fanspeed2
                if fanspeed > 100:
                    fanspeed = 100
                card.fanspeed(fanspeed)

if __name__ == '__main__':
    if len(sys.argv) > 1:
        try:
            f = open(sys.argv[1])
            for line in f:
                m = re.search('^(\w+)\s*=\s*(\S.*)$', line)
                if m is None:
                    continue
                settings[m.group(1)] = m.group(2)
            f.close()
        except (OSError, IOError), e:
            pass

    settings['poll_time'] = int(settings['poll_time'])
    print repr(settings)
    w = Watcher()
    w.watch()