eyeoh
June 03, 2011, 11:44:02 PM
I would really like to see a modified version of this. Instead of changing the fan speed relative to the temperature, I would like to see it:
- Max the fans to 100% on startup
- Slowly increase the core MHz until the temperature reaches a certain threshold
- If the temperature can't get cooler than XX even after downclocking for X rounds, email home and stop the GPU miners
- Auto-detect the number of GPUs in a server
I have a shell script that runs via cron and does most of this, but I'd like to see a Python version that runs as a daemon. Think of it as a GPU auto-overclocker and temperature watchdog rolled into one, with email warnings.
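The wish list above could be sketched as a small state machine that is called once per polling round. This is only an illustration of the decision logic, written for Python 3: TunerState, tune_step and every number in it (target_c, limit_c, step_mhz, the clock bounds, max_hot_rounds) are made-up names and values, and nothing here touches the hardware. The caller would be responsible for actually applying the clock, stopping the miner and sending the email.

```python
from dataclasses import dataclass


@dataclass
class TunerState:
    clock_mhz: int          # current core clock (hypothetical starting value)
    hot_rounds: int = 0     # consecutive rounds spent at or above the limit
    stopped: bool = False   # True once the miner should be shut down


def tune_step(state, temp_c, *, target_c=80.0, limit_c=90.0,
              step_mhz=5, min_mhz=700, max_mhz=900, max_hot_rounds=3):
    """Advance the tuner by one polling round and return the action taken.

    All thresholds and clock bounds are illustrative assumptions.
    """
    if state.stopped:
        return "stopped"
    if temp_c >= limit_c:
        # Too hot: back the clock off and count the round.
        state.hot_rounds += 1
        state.clock_mhz = max(min_mhz, state.clock_mhz - step_mhz)
        if state.hot_rounds >= max_hot_rounds:
            # Downclocking has not helped for several rounds: give up.
            state.stopped = True
            return "alert_and_stop"
        return "downclock"
    state.hot_rounds = 0
    if temp_c < target_c and state.clock_mhz < max_mhz:
        # Still cool: creep upwards, but never past the hard ceiling.
        state.clock_mhz = min(max_mhz, state.clock_mhz + step_mhz)
        return "upclock"
    return "hold"
```

The hard max_mhz ceiling matters because temperature is not the only failure mode of an overclock, so the loop should not rely on heat alone to tell it when to stop climbing.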
marcus_of_augustus
June 03, 2011, 11:48:22 PM
I've found that changing clock speeds while hashing can cause lock-ups, even at or near clock speeds that are stable when set before launching.
Having scripts change clock speeds is not such a good idea, IMO.
marcus_of_augustus
June 04, 2011, 12:52:17 PM
I've found a page that has a simple Python PID controller: http://code.activestate.com/recipes/577231-discrete-pid-controller/

#The recipe gives simple implementation of a Discrete Proportional-Integral-Derivative (PID) controller. PID controller gives output value for error between desired reference input and measurement feedback to minimize error value.
#More information: http://en.wikipedia.org/wiki/PID_controller
#
#cnr437@gmail.com
#
####### Example #########
#
#p=PID(3.0,0.4,1.2)
#p.setPoint(5.0)
#while True:
#     pid = p.update(measurement_value)
#
#

class PID:
    """
    Discrete PID control
    """

    def __init__(self, P=2.0, I=0.0, D=1.0, Derivator=0, Integrator=0, Integrator_max=500, Integrator_min=-500):

        self.Kp=P
        self.Ki=I
        self.Kd=D
        self.Derivator=Derivator
        self.Integrator=Integrator
        self.Integrator_max=Integrator_max
        self.Integrator_min=Integrator_min

        self.set_point=0.0
        self.error=0.0

    def update(self,current_value):
        """
        Calculate PID output value for given reference input and feedback
        """

        self.error = self.set_point - current_value

        self.P_value = self.Kp * self.error
        self.D_value = self.Kd * ( self.error - self.Derivator)
        self.Derivator = self.error

        self.Integrator = self.Integrator + self.error

        if self.Integrator > self.Integrator_max:
            self.Integrator = self.Integrator_max
        elif self.Integrator < self.Integrator_min:
            self.Integrator = self.Integrator_min

        self.I_value = self.Integrator * self.Ki

        PID = self.P_value + self.I_value + self.D_value

        return PID

    def setPoint(self,set_point):
        """
        Initialize the setpoint of PID
        """
        self.set_point = set_point
        self.Integrator=0
        self.Derivator=0

    def setIntegrator(self, Integrator):
        self.Integrator = Integrator

    def setDerivator(self, Derivator):
        self.Derivator = Derivator

    def setKp(self,P):
        self.Kp=P

    def setKi(self,I):
        self.Ki=I

    def setKd(self,D):
        self.Kd=D

    def getPoint(self):
        return self.set_point

    def getError(self):
        return self.error

    def getIntegrator(self):
        return self.Integrator

    def getDerivator(self):
        return self.Derivator

If any of you Python wizards feels like patching this into your temp/fan control algo, it would probably be adequate to eliminate the 'hunting' phenomenon, where the fan chases the temperature around.

The current temperature is "measurement_value", I think. P, I and D are the tuning parameters for the controller. The set-point is the temperature you want the card to be at. The output will be some multiplier for the fan speed (normalised it would be 0-1, i.e. multiplied by 100 to get the fan speed as a percentage). Note that the recipe computes the error as set-point minus measurement, so a card running hotter than the set-point gives a negative output; for fan control the sign has to be flipped and the result clamped to the fan's range.

Anyone who feels like twiddling can tune the PID parameters. They'll probably have long time constants because of the thermal inertia of the heat sinks, etc. The loop needs to run at least 10 times faster than the shortest time constant, which is probably pretty long given the thermal nature of the system, so keeping the 30 or 60 second loop is probably adequate.
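As a rough idea of how a PID loop could drive a fan, here is a minimal, self-contained sketch in Python 3. It does not reuse the recipe's class: FanPID, its gains (kp, ki, kd), the base fan speed and the output limits are all assumptions picked for illustration and are untuned. The error is taken as measured temperature minus set-point so that a hotter card produces a higher fan percentage, and the integral is frozen while the output is saturated so it cannot wind up.

```python
class FanPID:
    """Minimal discrete PID mapping GPU temperature to a fan percentage.

    Gains, base speed and limits are illustrative guesses, not tuned values.
    """

    def __init__(self, set_point_c, kp=4.0, ki=0.2, kd=0.0,
                 out_min=20.0, out_max=100.0, base=50.0):
        self.set_point_c = set_point_c
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.base = base          # fan % when the error and integral are zero
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, temp_c):
        """Return the fan percentage for the latest temperature reading."""
        # Measured minus set-point: a hot card gives a positive error.
        error = temp_c - self.set_point_c
        derivative = error - self.prev_error
        self.prev_error = error
        candidate = self.integral + error
        raw = (self.base + self.kp * error
               + self.ki * candidate + self.kd * derivative)
        # Anti-windup: only keep the new integral while the output is unsaturated.
        if self.out_min < raw < self.out_max:
            self.integral = candidate
        return max(self.out_min, min(self.out_max, raw))
```

In a polling loop this would be called once per interval with the latest temperature reading, and the returned percentage passed to whatever sets the fan speed.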
mikegogulski
June 04, 2011, 04:26:37 PM
Quote from eyeoh:
> I would really like to see a modified version of this. Instead of changing the fan speed relative to the temperature, I would like to see it:
> - Max the fans to 100% on startup
> - Slowly increase the core MHz until the temperature reaches a certain threshold
> - If the temperature can't get cooler than XX even after downclocking for X rounds, email home and stop the GPU miners
> - Auto-detect the number of GPUs in a server
> I have a shell script that runs via cron and does most of this, but I'd like to see a Python version that runs as a daemon. Think of it as a GPU auto-overclocker and temperature watchdog rolled into one, with email warnings.

This is a neat idea, but not something I'd implement myself without some very hard limits. The penalty for clocking too high (at least on my 6990s) isn't overtemperature, it's a complete lockup.
mikegogulski
June 04, 2011, 04:30:06 PM
Newer version based on the above. Still requires dual-GPU cards. Now supports a configurable limit for immediate 100% fan speed, plus a threshold and interval for simply shutting a GPU down temporarily if the threshold temperature is exceeded. Assumes you're using poclbm; python-psutil is a new requirement.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
import time
import re
import os
import subprocess
import psutil
import signal
from string import split

settings = {
    'poll_time': 60,
    'temp_low': 70.0,
    'temp_high': 76.0,
    'temp_fan_100': 90.0,
    'temp_cooldown': 93.0,
    'cooldown_period': 10,
    'aticonfig_path': '/usr/bin/aticonfig'
}

class GPU():
    def __init__(self, id, cardid, description):
        self.id = id
        self.cardid = cardid
        self.description = description

    def temp(self):
        # Returns the GPU temperature in degrees C, or None if it can't be read.
        os.environ['DISPLAY'] = ":0"
        try:
            line = subprocess.check_output(
                [settings['aticonfig_path'], "--adapter=" + str(self.id), "--odgt"])
            m = re.search('Temperature - (\d+\.\d+)\s*C', line)
            if m is None:
                return None

            return float(m.group(1))
        except OSError:
            return None

class Card():
    def __init__(self, id, gpus):
        self.id = id
        self.x11_id = ':0.' + str(id)
        self.gpus = gpus

    def fanspeed(self, val=None):
        # With a value: set the fan speed (percent). Without: read it back.
        os.environ['DISPLAY'] = self.x11_id
        if val is not None:
            subprocess.call(
                [settings['aticonfig_path'], "--pplib-cmd", "set fanspeed 0 " + str(val)])
        else:
            line = subprocess.check_output(
                [settings['aticonfig_path'], "--pplib-cmd", "get fanspeed 0"])
            m = re.search('Fan Speed: (\d+)%', line)
            if m is None:
                return False
            return int(m.group(1))

class Watcher():
    def __init__(self):
        self.cards = []
        os.environ['DISPLAY'] = ":0"
        out = subprocess.check_output([settings['aticonfig_path'], "--list-adapters"])
        card = 0
        gpu = 0
        lines = split(out, "\n")
        for line in lines:
            r = re.search('^[ \*]+(\d+)\. [\d:\.]+ (.+)$', line)
            if r is None:
                continue
            gpuid = int(r.group(1))
            desc = r.group(2)
            # Adapters are paired up: every two GPUs form one dual-GPU card.
            if gpu % 2 == 0:
                self.cards.append(Card(card, []))
            self.cards[-1].gpus.append(GPU(gpuid, card, desc))
            print "gpu %d card %d desc %s" % (gpuid, card, desc)
            if gpu % 2 == 1:
                card = card + 1
            gpu = gpu + 1

    def watch(self):
        while True:
            for card in self.cards:
                if card is None:
                    continue
                self.watch_card(card)
            time.sleep(settings['poll_time'])
            print "----------------------------------"

    def watch_card(self, card):
        fanspeed = card.fanspeed()
        fandelta = 0
        print "Card %d: fan speed %d%%" % (card.id, fanspeed)
        for gpu in card.gpus:
            temp = gpu.temp()
            if temp is None:
                # No reading for this GPU; skip it rather than crash on the print.
                continue
            print "Card %d GPU %d: Temperature %.2f °C" % (
                card.id, gpu.id, temp)
            if temp >= settings['temp_cooldown']:
                # Pause the poclbm process mining on this GPU for a while.
                for pid in psutil.get_pid_list():
                    try:
                        p = psutil.Process(pid)
                        if len(p.cmdline) > 2 and 'poclbm' in p.cmdline[1]:
                            n = int(p.cmdline[2][2])
                            if n == gpu.id:
                                print "Suspending GPU %d for %d seconds" % (
                                    n, settings['cooldown_period'])
                                os.kill(p.pid, signal.SIGSTOP)
                                time.sleep(settings['cooldown_period'])
                                os.kill(p.pid, signal.SIGCONT)
                    except:
                        pass
            if temp >= settings['temp_fan_100']:
                fandelta = 100 - fanspeed
                continue
            if temp > settings['temp_high']:
                if fanspeed <= 90:
                    fandelta = +10
                else:
                    fandelta = 100 - fanspeed
                continue
            elif temp < settings['temp_low']:
                if fanspeed <= 5:
                    continue
                fandelta = -5
        if fandelta != 0 and fanspeed + fandelta >= 10:
            print "Card %d: Adjusting fan %d%% to %d%%" % (
                card.id, fandelta, fanspeed + fandelta)
            card.fanspeed(fanspeed + fandelta)

if __name__ == '__main__':
    if len(sys.argv) > 1:
        # Optional config file with "key = value" lines overriding the defaults.
        try:
            f = open(sys.argv[1])
            for line in f:
                m = re.search('^(\w+)\s*=\s*(\S.*)$', line)
                if m is None:
                    continue
                settings[m.group(1)] = m.group(2)
            f.close()
        except (OSError, IOError), e:
            pass

    # Values read from the config file arrive as strings, so convert every
    # numeric setting (not just the first three) before comparing against it.
    settings['poll_time'] = int(settings['poll_time'])
    settings['cooldown_period'] = int(settings['cooldown_period'])
    for key in ('temp_low', 'temp_high', 'temp_fan_100', 'temp_cooldown'):
        settings[key] = float(settings[key])
    print repr(settings)
    w = Watcher()
    w.watch()

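The threshold-and-step rule in the script above can also be written as a small pure function, which makes it easy to try out without aticonfig or a GPU. This is a sketch in Python 3 under a few assumptions: next_fan_speed and its step_up, step_down and fan_min parameters are names invented here, the default thresholds are simply copied from the settings dictionary, and the decision is driven by the hottest GPU on the card so that a cool second GPU cannot cancel a speed-up requested by a hot first one. That last point is a design choice of this sketch.

```python
def next_fan_speed(fan_pct, gpu_temps_c, *, temp_low=70.0, temp_high=76.0,
                   temp_fan_100=90.0, step_up=10, step_down=5, fan_min=10):
    """Return the new fan percentage for one card.

    fan_pct is the current fan speed; gpu_temps_c is a non-empty list of the
    temperatures of the GPUs sharing that fan.
    """
    hottest = max(gpu_temps_c)
    if hottest >= temp_fan_100:
        return 100                              # emergency: full speed now
    if hottest > temp_high:
        return min(100, fan_pct + step_up)      # warm: step up
    if hottest < temp_low:
        return max(fan_min, fan_pct - step_down)  # cool: step down gently
    return fan_pct                              # dead band: leave the fan alone
```

The gap between temp_low and temp_high acts as a dead band: while the temperature sits inside it the fan is left untouched, which is what keeps a simple step controller from oscillating on every poll.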
TylerJordan
June 23, 2011, 05:42:51 AM (last edit: June 23, 2011, 06:02:33 AM by TylerJordan)
Love this script, but it wasn't working right for me, so I've made some mods.

Firstly, I have an onboard AMD GPU that was causing the script to fail, so I added a bit of code to skip it. If you don't want that bit (which you probably don't), set ati_card_exclude = ''.

Secondly, the polling time was far too slow for my taste, so I changed it to 4 seconds.

Thirdly, the temperature adjustment was not to my liking, so I've tied it to the actual temperature of the hottest GPU on a card. To do this I've removed the low temp and high temp variables and added a new one called fan_overTemp. My default setting is 10, which sets the fan speed to the temperature in degrees C plus ten percentage points: 70C = 80% fan speed; 80C = 90% fan speed. This is a simple mechanism but quite effective for me. It of course maxes out at 100%.

Happy mining

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
import time
import re
import os
import subprocess
import psutil
import signal
from string import split

settings = {
    'poll_time': 4,
    'fan_overTemp': 10,
    'temp_fan_100': 90.0,
    'temp_cooldown': 93.0,
    'cooldown_period': 10,
    'ati_card_exclude': '4250',
    'aticonfig_path': '/usr/bin/aticonfig'
}

class GPU():
    def __init__(self, id, cardid, description):
        self.id = id
        self.cardid = cardid
        self.description = description

    def temp(self):
        # Returns the GPU temperature in degrees C, or None if it can't be read.
        os.environ['DISPLAY'] = ":0"
        try:
            line = subprocess.check_output(
                [settings['aticonfig_path'], "--adapter=" + str(self.id), "--odgt"])
            m = re.search('Temperature - (\d+\.\d+)\s*C', line)
            if m is None:
                return None

            return float(m.group(1))
        except OSError:
            return None

class Card():
    def __init__(self, id, gpus):
        self.id = id
        self.x11_id = ':0.' + str(id)
        self.gpus = gpus

    def fanspeed(self, val=None):
        # With a value: set the fan speed (percent). Without: read it back.
        os.environ['DISPLAY'] = self.x11_id
        if val is not None:
            subprocess.call(
                [settings['aticonfig_path'], "--pplib-cmd", "set fanspeed 0 " + str(val)])
        else:
            line = subprocess.check_output(
                [settings['aticonfig_path'], "--pplib-cmd", "get fanspeed 0"])
            m = re.search('Fan Speed: (\d+)%', line)
            if m is None:
                return False
            return int(m.group(1))

class Watcher():
    def __init__(self):
        self.cards = []
        os.environ['DISPLAY'] = ":0"
        out = subprocess.check_output([settings['aticonfig_path'], "--list-adapters"])
        if settings['ati_card_exclude'] == '':
            card = 0
        else:
            card = 1
        gpu = 0
        lines = split(out, "\n")
        for line in lines:
            r = re.search('^[ \*]+(\d+)\. [\d:\.]+ (.+)$', line)
            if r is None:
                continue
            gpuid = int(r.group(1))
            desc = r.group(2)
            # Skip the excluded adapter. The empty-string check is needed
            # because '' in desc is always True and would skip every adapter.
            if settings['ati_card_exclude'] and settings['ati_card_exclude'] in desc:
                continue
            if gpu % 2 == 0:
                self.cards.append(Card(card, []))
            self.cards[-1].gpus.append(GPU(gpuid, card, desc))
            print "gpu %d card %d desc %s" % (gpuid, card, desc)
            if gpu % 2 == 1:
                card = card + 1
            gpu = gpu + 1

    def watch(self):
        while True:
            for card in self.cards:
                if card is None:
                    continue
                self.watch_card(card)
            time.sleep(settings['poll_time'])
            # print "----------------------------------"

    def watch_card(self, card):
        fanspeed = card.fanspeed()
        print "Card %d: fan speed %d%%" % (card.id, fanspeed)
        hottest = None
        for gpu in card.gpus:
            temp = gpu.temp()
            if temp is None:
                continue
            # print "Card %d GPU %d: Temperature %.2f °C" % (
            #     card.id, gpu.id, temp)
            if temp >= settings['temp_cooldown']:
                # Pause the poclbm process mining on this GPU for a while.
                for pid in psutil.get_pid_list():
                    try:
                        p = psutil.Process(pid)
                        if len(p.cmdline) > 2 and 'poclbm' in p.cmdline[1]:
                            n = int(p.cmdline[2][2])
                            if n == gpu.id:
                                print "Suspending GPU %d for %d seconds" % (
                                    n, settings['cooldown_period'])
                                os.kill(p.pid, signal.SIGSTOP)
                                time.sleep(settings['cooldown_period'])
                                os.kill(p.pid, signal.SIGCONT)
                    except:
                        pass
            # Track the hottest GPU on the card. This replaces the odd/even
            # gpu.id test, which left fanspeed2 undefined whenever the first
            # GPU on a card had an even id (e.g. with no adapter excluded).
            if hottest is None or temp > hottest:
                hottest = temp
        if hottest is None:
            return
        fanspeed = int(hottest) + settings['fan_overTemp']
        if fanspeed > 100:
            fanspeed = 100
        card.fanspeed(fanspeed)

if __name__ == '__main__':
    if len(sys.argv) > 1:
        # Optional config file with "key = value" lines overriding the defaults.
        try:
            f = open(sys.argv[1])
            for line in f:
                m = re.search('^(\w+)\s*=\s*(\S.*)$', line)
                if m is None:
                    continue
                settings[m.group(1)] = m.group(2)
            f.close()
        except (OSError, IOError), e:
            pass

    # Values read from the config file arrive as strings, so convert every
    # numeric setting that the script actually uses.
    settings['poll_time'] = int(settings['poll_time'])
    settings['fan_overTemp'] = int(settings['fan_overTemp'])
    settings['cooldown_period'] = int(settings['cooldown_period'])
    settings['temp_cooldown'] = float(settings['temp_cooldown'])
    print repr(settings)
    w = Watcher()
    w.watch()

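The "temperature plus an offset" rule described in this post boils down to one line of arithmetic. A tiny Python 3 sketch of it, where fan_for_temp and fan_max are names made up for illustration and fan_over_temp stands in for the fan_overTemp setting:

```python
def fan_for_temp(gpu_temps_c, fan_over_temp=10, fan_max=100):
    """Fan percentage = hottest GPU temperature (whole degrees C) + offset, capped."""
    hottest = max(gpu_temps_c)
    return min(fan_max, int(hottest) + fan_over_temp)


# The two examples from the post: 70C gives 80%, 80C gives 90%.
print(fan_for_temp([70.0, 65.0]), fan_for_temp([80.0, 78.5]))
```

Because the fan speed follows the temperature directly rather than stepping up and down, there is no accumulated state to tune, at the cost of the fan changing speed on every poll as the temperature moves.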