Let me preface this by saying I've had plenty of headaches scrounging the internet trying to get NVIDIA overclocking and power management working on Ubuntu. I'm sure this guide is going to miss some edge cases I didn't hit myself. If you spot one, please let me know and I'll be sure to update this post.
Before you begin, note that one of the most frustrating pitfalls you can hit when enabling manual control of your GPUs is trying to use integrated graphics. This causes the GPU manager to recognize your system as a "hybrid system", which nullifies your generated configuration. Unless you really know what you're doing and know how to write an xorg.conf file by hand, DO NOT USE INTEGRATED GRAPHICS. In fact, it is generally safer to disable the integrated graphics altogether in the BIOS.
1) Driver
For starters, you'll need a driver. Currently, I rely on the Ubuntu-optimized apt package 'nvidia-384', which can be pulled by running:
$ sudo apt-get install nvidia-384
The exact driver version doesn't really matter; you just need one that is compatible with your Ubuntu release. I would recommend installing via either apt or a deb package, since the runfile can cause headaches down the line when you need to upgrade the drivers. For now, this guide is aimed at people relatively new to Ubuntu, and I will not be covering voltage control. If someone wants to help with that, please DM me or post here.
Once the driver is installed, reboot.
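After the reboot, it can be worth confirming that the kernel module actually loaded before moving on. Here is a minimal sketch, assuming the module is listed under the name 'nvidia' in /proc/modules. The function name and its optional file argument are my own, for illustration only:

```shell
#!/bin/sh
# Succeeds if a module named "nvidia" is listed in the modules file.
# Defaults to /proc/modules; the optional argument only exists so the
# check can be tried against a saved copy of that file.
nvidia_module_loaded() {
    grep -q '^nvidia ' "${1:-/proc/modules}"
}
```

You could call it as 'nvidia_module_loaded && echo loaded || echo "not loaded"'. Running plain 'nvidia-smi' should also list your cards if the driver is working.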
2) Enabling Manual Control
On Ubuntu and other Debian-based systems, you'll need to set the Coolbits option in your xorg.conf file. This is a bitmask in which each bit enables control of a different aspect of the graphics card. You can find the breakdown online, but I will provide it here as well, alongside an explanation.
For those unfamiliar with binary, or those who only want to run the recommended settings, skip section 2.1 and go straight to section 2.2 (Recommended Settings).
2.1) Coolbits Bit Sequence
[0/1][0/1][0/1][0/1][0/1]
  4    3    2    1    0
Bit 0 (value 1): Enable overclocking of pre-Fermi cards
Bit 1 (value 2): Attempt to initialize SLI for cards with different memory amounts (this one can and should be ignored; enable it at your own peril)
Bit 2 (value 4): Enable manual configuration of fan speed
Bit 3 (value 8): Enable memory/core clock overclocking
Bit 4 (value 16): Enable voltage control
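If you want to calculate your own value, the Coolbits number is just the sum of 2 raised to each bit position you want set. A small sketch of that arithmetic (the 'coolbits' helper is hypothetical, not part of any NVIDIA tool):

```shell
#!/bin/sh
# Combine bit positions (0-4) into the decimal Coolbits value
# by OR-ing together 1 shifted left by each position.
coolbits() {
    value=0
    for bit in "$@"; do
        value=$(( value | (1 << bit) ))
    done
    echo "$value"
}
```

For example, 'coolbits 2 3 4' gives the value for fan control, clock overclocking and voltage control together.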
2.2) Recommended Settings
For most cards, you're going to want a Coolbits value of "28", which is bits 2, 3, and 4 set (binary 11100, i.e. 4 + 8 + 16). If you do have pre-Fermi cards, your Coolbits value would be "21", which is bits 0, 2, and 4 set (binary 10101, i.e. 1 + 4 + 16).
2.3) Setting the xorg.conf file
To set the Coolbits option, run the following command with your recommended or calculated value from sections 2.1 and 2.2. I'm using "28" as it is the most common.
$ sudo nvidia-xconfig -a --cool-bits 28
This command is using both the '-a' flag (enables all GPUs in the xorg.conf file) and the '--cool-bits' flag to set the coolbits. If you need to do any manual editing or validation, the generated file will be located at '/etc/X11/xorg.conf'.
Now either log out and back in, or reboot the machine (either one will reread the xorg.conf file). Validate that your xorg.conf file has not changed when you get back to your desktop. If it has changed, please refer to section 4 for troubleshooting.
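One quick way to do that validation is to check that the Coolbits option is still present after you log back in. This is a rough sketch, assuming the option is written as a line of the form 'Option "Coolbits" "28"'. The helper name is mine, and the file argument is optional:

```shell
#!/bin/sh
# Print the value of every Coolbits option found in an xorg.conf file.
# Prints nothing if the option is missing (for example after the file
# has been overwritten).
coolbits_in_xorg() {
    grep -i 'Option[[:space:]]*"Coolbits"' "${1:-/etc/X11/xorg.conf}" \
        | sed 's/.*"\([0-9][0-9]*\)".*/\1/'
}
```

If 'coolbits_in_xorg' prints nothing, or a different number than the one you set, the file was changed behind your back.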
3) Managing your cards
You will now need to go through and manually configure your cards. This is done through two different commands:
nvidia-settings for overclocking and fan control
nvidia-smi for power management
Please note that 'nvidia-smi' needs to be run as root while 'nvidia-settings' cannot be run as root.
3.1) Core/Mem Overclocking and Fan Control
In order to overclock the core and memory, use the following commands:
nvidia-settings -c :0 -a "[gpu:0]/GPUMemoryTransferRateOffset[3]=800"
nvidia-settings -c :0 -a "[gpu:0]/GPUGraphicsClockOffset[3]=100"
In order to control the fan speed use the following commands:
nvidia-settings -c :0 -a "[gpu:0]/GPUFanControlState=1"
nvidia-settings -c :0 -a "[fan:0]/GPUTargetFanSpeed=85"
To break these commands down: GPU IDs follow the same order as the listing you get by running 'nvidia-smi' in your terminal. The ID is given by the '[gpu:<id>]' (or '[fan:<id>]') prefix of each setting string. The next index is the '[3]' just before the equals sign, which selects the performance level that receives the overclock. GPUs have a hierarchy of performance levels, here 0-3, with 0 being the slowest and 3 the fastest. If your cards get too hot they'll drop from level 3 to 2, from 2 to 1, and so on down to 0. I would not recommend setting an overclock on any level other than 3; work on heat management instead. The part after the equals sign is the value to set: an offset in MHz for the clock settings and a percentage for the fan speed.
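If you have several cards, the commands above can be wrapped in a loop. The following is only a sketch under a few assumptions: that every card should get the same values, that fan index i belongs to GPU i (which may not hold on cards with more than one fan), and that performance level 3 is the top level on your cards. The 'run' and 'apply_gpu_settings' helpers and the DRY_RUN variable are made up for this example. By default it only prints the commands.

```shell
#!/bin/sh
# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: apply_gpu_settings <gpu_count> <mem_offset> <core_offset> <fan_percent>
# Assumes fan index i belongs to GPU i.
apply_gpu_settings() {
    gpu_count=$1
    mem_offset=$2
    core_offset=$3
    fan_speed=$4
    i=0
    while [ "$i" -lt "$gpu_count" ]; do
        run nvidia-settings -c :0 -a "[gpu:$i]/GPUMemoryTransferRateOffset[3]=$mem_offset"
        run nvidia-settings -c :0 -a "[gpu:$i]/GPUGraphicsClockOffset[3]=$core_offset"
        run nvidia-settings -c :0 -a "[gpu:$i]/GPUFanControlState=1"
        run nvidia-settings -c :0 -a "[fan:$i]/GPUTargetFanSpeed=$fan_speed"
        i=$((i + 1))
    done
}
```

Calling 'apply_gpu_settings 2 800 100 85' prints the eight commands for two cards so you can review them. Once they look right, run it again with DRY_RUN=0 as your regular desktop user.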
3.2) Maximum Power Draw Setting
In order to set the maximum power draw for each card, use the following commands:
sudo nvidia-smi -i 0 -pm 1
sudo nvidia-smi -i 0 -pl 150
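To break these down: the first command ('-pm 1') turns on persistence mode for the card with ID 0, which keeps the driver loaded so that settings are not lost while no application is using the GPU. The second command ('-pl 150') sets the power limit to 150 watts for the card with ID 0. If no -i flag is specified, the setting applies to all cards.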
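nvidia-smi will reject a power limit outside the range the card supports, so it helps to look that range up before picking a number. The sketch below uses nvidia-smi's '--query-gpu' option to show the current, minimum and maximum limits, plus a small hypothetical helper ('clamp_watts', my own name) that keeps a requested value inside a given range. It assumes whole-watt integers, so round the reported values yourself.

```shell
#!/bin/sh
# Show the current, minimum and maximum power limit of every card.
show_power_limits() {
    nvidia-smi --query-gpu=index,name,power.limit,power.min_limit,power.max_limit --format=csv
}

# Usage: clamp_watts <wanted> <min> <max>   (integers, in watts)
# Prints <wanted> if it lies within [min, max], else the nearest bound.
clamp_watts() {
    if [ "$1" -lt "$2" ]; then
        echo "$2"
    elif [ "$1" -gt "$3" ]; then
        echo "$3"
    else
        echo "$1"
    fi
}
```

For instance, on a card that reports a 100 W minimum and a 250 W maximum, 'sudo nvidia-smi -i 0 -pl "$(clamp_watts 150 100 250)"' would apply 150 W.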
4) Troubleshooting
xorg.conf being overwritten
If your xorg.conf is being overwritten when you log back in, the most likely culprit is the GPU manager. There are a number of reasons why this could be happening. The most common one I have seen is trying to use integrated graphics instead of connecting the display directly to the primary GPU. If that is not the case, you can look in '/var/log/gpu-manager.log' to see whether it is overwriting the xorg.conf.
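Two quick checks can narrow this down: whether the system sees more than one display adapter, and whether the GPU manager log mentions xorg.conf at all. This is a sketch only. Both helper names are mine, the log path default is an assumption (pass your own path if the file is named differently on your release), and I am not promising any particular wording inside the log.

```shell
#!/bin/sh
# Filter `lspci` output (read from stdin) down to display adapters.
list_display_adapters() {
    grep -Ei 'VGA compatible controller|3D controller|Display controller'
}

# Show numbered lines of the GPU manager log that mention xorg.conf.
# The default path is an assumption; pass a different one as $1 if needed.
xorg_mentions_in_log() {
    grep -F -n 'xorg.conf' "${1:-/var/log/gpu-manager.log}"
}
```

Run 'lspci | list_display_adapters' first. If an integrated adapter shows up next to your NVIDIA cards, the hybrid detection described above is a likely suspect, and disabling the integrated graphics in the BIOS is the first thing I would try.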
If you believe this guide is incomplete or if you have any questions, please feel free to follow up here or over DM and I'd be happy to help.