Title: Ubuntu Nvidia Overclocking, Heat and Power Management Post by: cpmcgrat on January 30, 2018, 09:18:19 AM
Let me preface this by saying I've had a massive number of headaches scrounging the internet trying to get my Nvidia overclocking/power management working on Ubuntu. I'm sure this guide is going to miss some edge cases I didn't hit myself. If you spot one, please let me know and I'll be sure to update this post.
Before you begin, please note that one of the most frustrating pitfalls you can hit when enabling manual control on your GPUs is trying to use integrated graphics. This will cause gpu-manager to treat your system as a "hybrid system", which will nullify your generated configuration. Unless you really know what you're doing and know how to write an xorg.conf file by hand, DO NOT USE INTEGRATED GRAPHICS. In fact, it is generally safer to disable the integrated graphics altogether in the BIOS.

1) Driver
For starters, you'll need a driver. Currently, I rely on the Ubuntu-optimized apt package 'nvidia-384', which can be installed by running:
Code:
$ sudo apt-get install nvidia-384
The exact driver doesn't really matter. You just need one that is compatible with Ubuntu. I would recommend installing via either apt or a deb package, since the runfile can cause headaches down the line when you need to upgrade the drivers. For now, this guide is meant for people relatively new to Ubuntu, and I will not be covering voltage control. If someone wants to help with that, please DM me or post here. Once the driver is installed, reboot.

2) Enabling Manual Control
On Ubuntu and other Debian-based systems, you'll need to set the Coolbits option in your xorg.conf file. This is a bit field in which each bit enables control of a different aspect of the graphics card. You can find the breakdown online, but I will provide it here as well, alongside an explanation. If you are unfamiliar with binary, or only want to run the recommended settings, skip section 2.1 and go straight to 2.2.

2.1) Coolbits Bit Sequence
[0/1][0/1][0/1][0/1][0/1]
  4    3    2    1    0
Bit 0: Enables overclocking of pre-Fermi cards
Bit 1: Attempt to initialize SLI for cards with different memory amounts (this one can and should be ignored; set it at your own peril)
Bit 2: Enable manual configuration of fan speed
Bit 3: Enable mem/clock overclocking
Bit 4: Enable voltage control

2.2) Recommended Settings
For most cards, you're going to want a Coolbits value of "28", which is bits 2, 3 and 4 set (11100 = 4 + 8 + 16). If you do have pre-Fermi cards, your Coolbits value would be "21", which is bits 0, 2 and 4 set (10101 = 1 + 4 + 16).

2.3) Setting the xorg.conf file
To set the Coolbits option, run the following command with your recommended or calculated value from sections 2.1 and 2.2. I'm using "28" as it is the most common.
Code:
$ sudo nvidia-xconfig -a --cool-bits=28
This command uses both the '-a' flag (enables all GPUs in the xorg.conf file) and the '--cool-bits' flag to set the Coolbits value. If you need to do any manual editing or validation, the generated file will be located at '/etc/X11/xorg.conf'. Now either log out and back in, or reboot the machine (either one will reread the xorg.conf file). When you get back to your desktop, validate that your xorg.conf file has not changed. If it has changed, please refer to section 4 for troubleshooting.

3) Managing your cards
You will now need to go through the manual configuration of your cards. This is done through two different commands: 'nvidia-settings' for overclocking and fan control, and 'nvidia-smi' for power management. Please note that 'nvidia-smi' needs to be run as root, while 'nvidia-settings' cannot be run as root.
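If you'd rather not do the Coolbits arithmetic from sections 2.1 and 2.2 by hand, here is a minimal sketch that ORs the chosen bits together and checks whether a config file still carries the option after you log back in. The helper names 'coolbits_value' and 'has_coolbits' are made up for illustration, and the grep pattern assumes the option is written in the form Option "Coolbits" "<value>", so adjust it if your file looks different.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers,
# not part of nvidia-xconfig or any other NVIDIA tool.

# Compute a Coolbits value from a list of bit positions (0-4).
coolbits_value() {
    value=0
    for bit in "$@"; do
        value=$(( value | (1 << bit) ))
    done
    echo "$value"
}

# Succeed (exit status 0) if the given xorg.conf-style file
# contains a Coolbits option line.
has_coolbits() {
    grep -qi 'Option[[:space:]]*"Coolbits"' "$1"
}

coolbits_value 2 3 4   # fan control + overclocking + voltage control
coolbits_value 0 2 4   # pre-Fermi variant from section 2.2
```

The two calls at the bottom print 28 and 21, matching the recommended values. After logging back in, something like 'has_coolbits /etc/X11/xorg.conf && echo "still set"' gives a quick check that the file was not overwritten.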

3.1) Core/Mem Overclocking and Fan Control
In order to overclock the core and memory, use the following commands:
Code:
nvidia-settings -c :0 -a "[gpu:0]/GPUMemoryTransferRateOffset[3]=800"
Code:
nvidia-settings -c :0 -a "[gpu:0]/GPUGraphicsClockOffset[3]=100"
In order to control the fan speed, use the following commands:
Code:
nvidia-settings -c :0 -a "[gpu:0]/GPUFanControlState=1"
Code:
nvidia-settings -c :0 -a "[fan:0]/GPUTargetFanSpeed=85"
To break these commands down: the GPU ID follows the same order listed when you run 'nvidia-smi' in your terminal. The ID is specified by the '[gpu:<id>]' prefix on each setting string. Note that the target fan speed is set on a fan target ('[fan:<id>]'), not on a GPU target. The next index is the '[3]' before the equals sign. This denotes the performance level that receives the overclock, here level 3. GPUs have a scale of performance levels from 0 to 3, with 0 being the slowest and 3 the fastest. If your cards get too hot, they'll drop from level 3 to 2, from 2 to 1, and so on down to 0. I would not recommend setting overclocks at any level other than 3; work on heat management instead. The last part, after the equals sign, is the value to set.

3.2) Maximum Power Draw Setting
In order to set the maximum power draw for each card, use the following commands:
Code:
sudo nvidia-smi -i 0 -pm 1
Code:
sudo nvidia-smi -i 0 -pl 150
To break these down: the first command turns on persistence mode for the card with ID 0 ('-pm' is persistence mode, which keeps the driver loaded even when no program is using the card, so the power limit is not reset by a driver unload). The second command sets the power limit to 150 watts for the card with ID 0. If no '-i' flag is specified, the setting applies to all cards.

4) Troubleshooting xorg.conf being overwritten
If your xorg.conf is being overwritten when you log back in, the most likely culprit is gpu-manager. There is a swath of reasons why this could be happening. The most common reason I have seen is trying to use integrated graphics instead of hooking the display up directly to the primary GPU. If that is not the case, you can look in '/var/log/gpu-manager.log' to see whether it is overwriting the xorg.conf.

If you believe this guide is incomplete or if you have any questions, please feel free to follow up here or over DM and I'd be happy to help.
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management: Definitive Guide Post by: cpmcgrat on January 30, 2018, 05:19:10 PM
It is also worth noting that sometimes the Nvidia driver install will fail to blacklist nouveau, which will then need to be done manually.
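For anyone who needs to blacklist nouveau by hand as mentioned above, here is a minimal sketch, assuming the usual modprobe.d approach on Ubuntu. The file name 'blacklist-nouveau.conf' and the three function names are my own choices for illustration and do not come from the guide. Nothing is written to disk unless you call 'install_nouveau_blacklist' yourself.

```shell
#!/bin/sh
# Sketch only: the file name and function names are illustrative assumptions.
BLACKLIST_FILE="/etc/modprobe.d/blacklist-nouveau.conf"

# Print the modprobe configuration that stops nouveau from loading.
nouveau_blacklist_conf() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Write the file and rebuild the initramfs so the change applies at boot.
# Not called automatically; run it yourself and then reboot.
install_nouveau_blacklist() {
    nouveau_blacklist_conf | sudo tee "$BLACKLIST_FILE" > /dev/null
    sudo update-initramfs -u
}

# Exit status 0 if the nouveau kernel module is currently loaded.
nouveau_loaded() {
    lsmod | grep -q '^nouveau'
}

# Show what would be written, without touching the system.
nouveau_blacklist_conf
```

After running 'install_nouveau_blacklist' and rebooting, 'nouveau_loaded' should return a non-zero status if the blacklist took effect.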
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management: Definitive Guide Post by: MarkAz on January 30, 2018, 06:19:38 PM
Wow, great writeup - you get merit for that! :) I actually haven't seen a good writeup on the coolbits part before and always wondered about the significance of the values (never a huge fan of magic numbers)
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management: Definitive Guide Post by: cpmcgrat on January 30, 2018, 06:28:12 PM
Thank you for the kind words! I was hoping I could help save some people some time and clarify some of the more esoteric concepts in Ubuntu overclocking. Glad it helped you, and thank you for the merit :)
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management: Definitive Guide Post by: cpmcgrat on January 30, 2018, 10:12:34 PM
If anyone can help flesh out the knowledge base here on undervolting cards on Ubuntu, I’m willing to offer a bounty of 24 hours of mining time on any equihash, ethash or cryptonight based coin to the wallet/pool of your choice with the following rig:
1x GTX 1080ti
3x GTX 1070
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management: Definitive Guide Post by: cpmcgrat on January 31, 2018, 05:15:16 AM
Any takers?
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management: Definitive Guide Post by: NameTaken on January 31, 2018, 05:27:12 AM
Doesn't
Code:
sudo nvidia-smi -i 0 -pl 150
already increase or reduce the power limit for individual cards? What do you think needs elaborating?
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management: Definitive Guide Post by: cpmcgrat on January 31, 2018, 05:38:23 AM
Undervolting the cards. I haven’t gone down that path yet, but I’ve heard you can get better results that way.
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management: Definitive Guide Post by: NameTaken on January 31, 2018, 05:42:01 AM
Isn't reducing the power limit already undervolting the GPU?
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management: Definitive Guide Post by: cpmcgrat on January 31, 2018, 06:24:39 AM
I guess technically it could be, since Watts = Volts * Amps (it depends on how it’s managed under the hood, which I’m not familiar with), but that knowledge alone is worth adding to the guide for someone new. I personally haven’t gone down that road, so I don’t feel qualified to provide information on the subject. If you’re willing to help add what you know, I’d be happy to contribute the mining power to you.
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management: Definitive Guide Post by: MarkAz on January 31, 2018, 06:43:04 AM
Technically I don't think the power limiting in the NVidia driver is the same as undervolting, which you can do on ATI cards pretty easily, or on some ASICs. On NVidia, you can limit the maximum number of watts the card will use, but I believe it's still working at the same core voltage... With things like the ATI, you can detune off of 12v, and that basically allows the same performance but with less power consumption.
So, short answer is everyone on the NVidia side seems to get the efficiency gains by using the power limiting, which you already documented - and if you take something like WhatToMine, their calculations are clocking up core and memory, and reducing the power limit.
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management: Definitive Guide Post by: NameTaken on January 31, 2018, 06:59:04 AM
Reducing the power limit does affect the core voltage. I don't think you can see the voltage using nvidia-smi in Linux, but with Windows software you can see how many mV the GPU is using before and after changing the power limit.
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management: Definitive Guide Post by: ruphus on February 20, 2018, 09:26:33 PM
Hey guys,
can anyone tell me why I am not able to set individual fan speeds on my Ubuntu 7x 1060 rig?
Not working (any #):
Code:
rig1@rig1:~$ nvidia-settings -a [gpu:#]/GPUTargetFanSpeed=70
rig1@rig1:~$ nvidia-settings -a '[gpu:#]/GPUTargetFanSpeed=70'
rig1@rig1:~$ nvidia-settings -a "[gpu:#]/GPUTargetFanSpeed=70"
rig1@rig1:~$
Working:
Code:
rig1@rig1:~$ nvidia-settings -a GPUTargetFanSpeed=70
Attribute 'GPUTargetFanSpeed' (rig1:0[fan:0]) assigned value 70.
Attribute 'GPUTargetFanSpeed' (rig1:0[fan:1]) assigned value 70.
Attribute 'GPUTargetFanSpeed' (rig1:0[fan:2]) assigned value 70.
Attribute 'GPUTargetFanSpeed' (rig1:0[fan:3]) assigned value 70.
Attribute 'GPUTargetFanSpeed' (rig1:0[fan:4]) assigned value 70.
Attribute 'GPUTargetFanSpeed' (rig1:0[fan:5]) assigned value 70.
Attribute 'GPUTargetFanSpeed' (rig1:0[fan:6]) assigned value 70.
rig1@rig1:~$
BTW: setting
Code:
nvidia-settings -a [gpu:1]/GPUFanControlState=1
works (for any #).....
best regards
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management: Definitive Guide Post by: cpmcgrat on March 07, 2018, 10:55:40 PM
Not sure, could you go through the steps you tried and then run "$ history" so I can validate your command sequence?
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management: Definitive Guide Post by: mdpiot on March 08, 2018, 12:44:49 AM
What driver version are you using? I remember once having a similar issue when I was building my first rig.
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management: Definitive Guide Post by: cpmcgrat on March 09, 2018, 01:43:26 AM
What issues did you see with the driver? I'd like to update the primary post with that as a common pitfall.
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management Post by: NameTaken on March 09, 2018, 02:03:22 AM
Code:
$ nvidia-settings -a [fan:1]/GPUTargetFanSpeed=70
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management Post by: cpmcgrat on March 10, 2018, 07:12:27 AM
I'm pretty sure they've already tried that, just were expressing the number as #.
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management Post by: NameTaken on March 29, 2018, 08:06:26 PM
Not working (any #):
Code:
rig1@rig1:~$ nvidia-settings -a [gpu:#]/GPUTargetFanSpeed=70
Code:
$ nvidia-settings -a [fan:1]/GPUTargetFanSpeed=70
I'm pretty sure they've already tried that, just were expressing the number as #.
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management Post by: cpmcgrat on April 02, 2018, 10:48:23 PM
Ah yes, you are correct.
Thank you
Title: Re: Ubuntu Nvidia Overclocking, Heat and Power Management Post by: agismaniax on December 27, 2018, 02:29:46 PM
Can I use nvidia-smi to overclock the GPU clock and mem clock and to set the fan speed? I ask because nvidia-settings needs xorg to be running first, and that consumes a small amount of VRAM.