Chapter 22. PCI-Express Runtime D3 (RTD3) Power Management

Table of Contents

Introduction
Supported Configurations
System Settings
Driver Settings
Video Memory Utilization Threshold
Procfs Interface For Runtime D3
Known Issues And Workarounds
Automated Setup
Reporting Issues

Introduction

NVIDIA GPUs have many power-saving mechanisms. Some of them will reduce clocks and voltages to different parts of the chip, and in some cases turn off clocks or power to parts of the chip entirely, without affecting functionality or while continuing to function, just at a slower speed.

However, the lowest power states for NVIDIA GPUs require turning power off to the entire chip, often through ACPI calls. This impacts functionality: nothing can run on the GPU while it is powered off. Care has to be taken to enter this state only when there are no workloads running on the GPU, and any attempt to start work or perform memory-mapped I/O (MMIO) access must be preceded by a sequence that first turns the GPU back on and restores any necessary state.

The NVIDIA GPU may have one, two or four PCI functions:

  • Function 0: VGA controller / 3D controller

  • Function 1: Audio device

  • Function 2: USB xHCI Host controller

  • Function 3: USB Type-C UCSI controller

Of these PCI functions, the NVIDIA driver directly manages only function 0, the VGA controller / 3D controller. The other PCI functions are managed by device drivers provided with the Linux kernel. The NVIDIA driver handles entry into and exit from these low power states for PCI function 0. The remaining PCI functions are powered down along with function 0 when entering these low power states. As a result, the device drivers for the other three functions must also cooperate to:

  • prevent entering the lowest-power state when the device is in use, 
  • trigger exiting the lowest-power state when the device is needed, 
  • save and restore any hardware state around power-off events.

The NVIDIA Linux driver includes initial experimental support for dynamically managing power to the NVIDIA GPU. It depends on the runtime power management framework within the Linux kernel to arbitrate the power needs of the various PCI functions. To obtain maximum power savings from this feature, two conditions must be met:

1. Runtime power management for all the PCI functions of the GPU should be enabled.

2. The device drivers for all the PCI functions should support runtime power management.

If these conditions are satisfied and all the PCI functions are idle, then the NVIDIA GPU will go to the lowest power state, resulting in maximum power savings.
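The two conditions above can be checked per PCI function by reading the kernel's generic sysfs attributes. The following is a minimal sketch, not part of the NVIDIA driver: the function name gpu_pm_report and its sysfs-root argument are hypothetical (the argument exists only so the sketch can be pointed at a copy of the tree), while power/control, power/runtime_status and the driver link are standard Linux runtime power management attributes.

```shell
# Print one status line per PCI function of a device.
#   $1 = sysfs root (assumption: /sys on a real system)
#   $2 = domain:bus:device without the function, e.g. 0000:01:00
gpu_pm_report() {
    sysfs_root=$1
    device=$2
    for dir in "$sysfs_root"/bus/pci/devices/"$device".*; do
        [ -d "$dir" ] || continue
        # "auto" means runtime power management is enabled for this function.
        control=$(cat "$dir/power/control" 2>/dev/null || echo unknown)
        # "active" or "suspended" once runtime power management is in use.
        status=$(cat "$dir/power/runtime_status" 2>/dev/null || echo unknown)
        if [ -e "$dir/driver" ]; then driver=bound; else driver=unbound; fi
        echo "${dir##*/} driver=$driver control=$control runtime_status=$status"
    done
}
```

On a real system this could be invoked as gpu_pm_report /sys 0000:01:00. A function that reports control=on does not satisfy the first condition.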

Supported Configurations

This feature is available only when the following conditions are satisfied:

  • This feature is supported only on notebooks.

  • This feature requires system hardware as well as ACPI support (the ACPI "_PR0" and "_PR3" methods are needed to control PCIe power). The necessary hardware and ACPI support was first added in the Intel Coffee Lake chipset series, so this feature is supported on Intel Coffee Lake and newer chipsets.

  • This feature requires a Turing or newer GPU.

  • This feature is supported with Linux kernel versions 4.18 and newer. With older kernel versions, it may not work as intended.

  • This feature is supported when the Linux kernel is built with CONFIG_PM (CONFIG_PM=y). Typically, if the system supports S3 (suspend-to-RAM), then CONFIG_PM is defined.
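Two of the conditions above, the kernel version and CONFIG_PM, can be checked from a shell. This is a rough sketch under stated assumptions: the helper names are hypothetical, and the location of the kernel configuration file (often /boot/config-$(uname -r)) varies between distributions and is not specified by this chapter.

```shell
# Succeed if a kernel release string is at least MAJOR.MINOR.
#   $1 = required version, e.g. 4.18
#   $2 = release string, e.g. the output of `uname -r`
kernel_at_least() {
    req_major=${1%%.*}
    req_minor=${1#*.}
    major=${2%%.*}
    rest=${2#*.}
    minor=${rest%%[!0-9]*}   # keep only the leading digits of the minor part
    [ "$major" -gt "$req_major" ] ||
        { [ "$major" -eq "$req_major" ] && [ "$minor" -ge "$req_minor" ]; }
}

# Succeed if a kernel configuration file contains CONFIG_PM=y.
#   $1 = path to the config file (assumption: /boot/config-<release>)
config_pm_enabled() {
    grep -q '^CONFIG_PM=y$' "$1"
}

# Example (not run here):
#   kernel_at_least 4.18 "$(uname -r)" && config_pm_enabled "/boot/config-$(uname -r)"
```

The sketch does not cover the notebook, chipset, ACPI, or GPU generation requirements, which cannot be derived from these two sources.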

System Settings

  1. Enable runtime power management for each PCI function.

    Runtime power management can be enabled for each PCI function using the following command.

    echo auto > /sys/bus/pci/devices/<Domain>:<Bus>:<Device>.<Function>/power/control

    For example:

    echo auto > /sys/bus/pci/devices/0000\:01\:00.0/power/control
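Because a GPU may expose several PCI functions, the command above usually has to be repeated for each of them. A minimal sketch of that loop follows; enable_runtime_pm and its sysfs-root argument are hypothetical names used for illustration, and writing to the real /sys tree requires root privileges.

```shell
# Write "auto" to power/control for every PCI function of one device.
#   $1 = sysfs root (assumption: /sys on a real system)
#   $2 = domain:bus:device without the function, e.g. 0000:01:00
enable_runtime_pm() {
    sysfs_root=$1
    device=$2
    for control in "$sysfs_root"/bus/pci/devices/"$device".*/power/control; do
        # Skip functions that are absent or not writable by this user.
        [ -w "$control" ] || continue
        echo auto > "$control"
    done
}

# Example (not run here, needs root):
#   enable_runtime_pm /sys 0000:01:00
```

The setting does not persist across reboots, which is why the Automated Setup section uses a udev rule instead.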

Driver Settings

This feature is disabled by default by the NVIDIA driver. It can be enabled via the NVreg_DynamicPowerManagement kernel module option.

Option "NVreg_DynamicPowerManagement=0x00"

With this setting, the NVIDIA driver will use only the GPU's built-in power management, so the GPU is always powered on and functional. This is the default, since this feature is new and highly experimental. Actual power usage will vary with the GPU's workload.

Option "NVreg_DynamicPowerManagement=0x01"

This setting is called coarse-grained power control. With this setting, the NVIDIA GPU driver will allow the GPU to go into its lowest power state when no applications are running that use the nvidia driver stack. Whenever an application requiring NVIDIA GPU access is started, the GPU is put into an active state. When the application exits, the GPU is put into a low power state.

Option "NVreg_DynamicPowerManagement=0x02"

This setting is called fine-grained power control. With this setting, the NVIDIA GPU driver will allow the GPU to go into its lowest power state when no applications are running that use the nvidia driver stack. Whenever an application requiring NVIDIA GPU access is started, the GPU is put into an active state. When the application exits, the GPU is put into a low power state.

Additionally, the NVIDIA driver will actively monitor GPU usage while applications using the GPU are running. When the applications have not used the GPU for a short period, the driver will allow the GPU to be powered down. As soon as the application starts using the GPU, the GPU is reactivated.

Furthermore, the NVIDIA GPU driver controls power to the NVIDIA GPU and its video memory separately. While turning off the NVIDIA GPU, the driver keeps the video memory in a low power self-refresh mode unless the conditions for turning it off are met. One input to that decision is the video memory utilization threshold described in the Video Memory Utilization Threshold section.

When those conditions are met, the NVIDIA GPU driver will completely turn off the video memory, in addition to the rest of the GPU.

Keeping video memory in a self-refresh mode uses more power than turning off video memory, but allows the GPU to be powered off and reactivated more quickly.

It is important to note that the NVIDIA GPU will remain in an active state if it is driving a display. In this case, the NVIDIA GPU will go to a low power state only when the X configuration option HardDPMS is enabled and the display is turned off by some means - either automatically due to an OS setting or manually using commands like xset.

Similarly, the NVIDIA GPU will remain in an active state if a CUDA application is running.

The NVreg_DynamicPowerManagement option can be set on the command line while loading the NVIDIA Linux kernel module. For example:

modprobe nvidia "NVreg_DynamicPowerManagement=0x02"
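After the module is loaded, it can be useful to confirm which mode is in effect. The sketch below rests on an assumption that this chapter does not state: that the loaded driver lists its parameters as "Name: value" lines, with decimal values, in /proc/driver/nvidia/params. The function name dpm_mode_from_params is hypothetical, and the mapping covers only the three values described above.

```shell
# Print the power control mode named by a driver parameter listing.
#   $1 = path to the listing (assumption: /proc/driver/nvidia/params)
dpm_mode_from_params() {
    # Match the exact name so that the video memory threshold parameter,
    # which shares the same prefix, is not picked up by mistake.
    value=$(awk -F': *' '$1 == "DynamicPowerManagement" { print $2; exit }' "$1")
    case "$value" in
        0) echo "disabled" ;;
        1) echo "coarse-grained" ;;
        2) echo "fine-grained" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Example (not run here):
#   dpm_mode_from_params /proc/driver/nvidia/params
```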

Video Memory Utilization Threshold

The NVIDIA GPU driver uses 200 MB as the default video memory utilization threshold to decide whether the video memory can be turned off or must be kept in a self-refresh mode. This threshold value can be decreased using the NVreg_DynamicPowerManagementVideoMemoryThreshold option. This option can be set on the command line while loading the NVIDIA Linux kernel module. For example:

modprobe nvidia "NVreg_DynamicPowerManagementVideoMemoryThreshold=100"

The video memory utilization threshold value should be a positive integer, expressed in megabytes (1 MB = 1048576 bytes). In the example above, the threshold is set to 100 MB. The maximum threshold value is 200 MB. Any value greater than 200 MB is ignored by the NVIDIA GPU driver, which then uses the default threshold of 200 MB.

This threshold can be set to 0 in order to prevent the video memory from being turned off.
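The rules above can be summarized as a small function that maps a requested value to the threshold the driver would use. This is only an illustration of the documented behavior, not driver code; the name effective_vidmem_threshold is hypothetical, and the handling of non-numeric input is an assumption because this chapter does not describe it.

```shell
# Print the effective threshold in MB for a requested value.
#   $1 = requested NVreg_DynamicPowerManagementVideoMemoryThreshold value
effective_vidmem_threshold() {
    requested=$1
    case "$requested" in
        # Assumption: reject anything that is not a plain decimal integer.
        ''|*[!0-9]*) return 2 ;;
    esac
    if [ "$requested" -gt 200 ]; then
        # Values above the maximum are ignored; the default of 200 MB applies.
        echo 200
    else
        # Includes 0, which prevents video memory from being turned off.
        echo "$requested"
    fi
}
```

For example, a requested value of 100 yields 100, while 300 falls back to 200.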

Procfs Interface For Runtime D3

The following entries in the file /proc/driver/nvidia/gpus/domain:bus:device.function/power provide more details regarding the runtime D3 feature.

  • "Runtime D3 status" entry gives the current status of this feature.

  • "Video Memory" entry gives the power status of the video memory.

  • "Video Memory Self Refresh" entry reports whether the NVIDIA GPU hardware supports video memory self refresh mode.

  • "Video Memory Off" entry reports whether the NVIDIA GPU hardware supports video memory off mode.
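A single entry can be extracted from this file with a short helper. The sketch assumes that each entry appears on its own line as "Name: value", possibly with surrounding whitespace; that layout is an assumption and is not specified by this chapter. The function name rtd3_entry is hypothetical.

```shell
# Print the value of one named entry from the power file.
#   $1 = path, e.g. /proc/driver/nvidia/gpus/0000:01:00.0/power
#   $2 = exact entry name, e.g. "Video Memory"
rtd3_entry() {
    awk -F':' -v key="$2" '
        {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            # Compare the whole name so that "Video Memory" does not also
            # match "Video Memory Self Refresh" or "Video Memory Off".
            if (name == key) {
                value = substr($0, index($0, ":") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                print value
                exit
            }
        }' "$1"
}

# Example (not run here):
#   rtd3_entry /proc/driver/nvidia/gpus/0000:01:00.0/power "Runtime D3 status"
```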

Known Issues And Workarounds

  1. As of this writing, the USB xHCI Host controller and USB Type-C UCSI controller drivers present in most Linux distributions do not fully support runtime power management. Several upstream kernel changes are being made to fix the issues. In the interim, these two PCI functions have to be disabled for this feature to work properly. This can be done using the following commands.

    echo 1 > /sys/bus/pci/devices/<Domain>:<Bus>:<Device>.2/remove

    echo 1 > /sys/bus/pci/devices/<Domain>:<Bus>:<Device>.3/remove

    For example:

    echo 1 > /sys/bus/pci/devices/0000\:01\:00.2/remove

    echo 1 > /sys/bus/pci/devices/0000\:01\:00.3/remove

  2. Due to a known issue in the audio driver, the audio PCI function remains in an active state with kernel versions 4.19 and newer (starting with commit 37a3a98ef601f89100e3bb657fb0e190b857028c). Upstream kernel changes are being made to fix the issue. In the interim, the audio PCI function needs to be disabled using the following command.

    echo 1 > /sys/bus/pci/devices/<Domain>:<Bus>:<Device>.1/remove

    For example:

    echo 1 > /sys/bus/pci/devices/0000\:01\:00.1/remove

    This workaround will result in audio loss when the audio function is being used to play audio over a DP/HDMI connection. Rescanning the PCI tree brings back the audio PCI function and restores audio. However, after rescanning the PCI tree, all the disabled PCI functions become active again, so the workarounds in this section have to be reapplied for this feature to work.

  3. If the NVIDIA GPU is driving a console and runtime power management is enabled for the VGA Controller PCI function, the console will go blank when the NVIDIA driver is uninstalled. The workaround is to disable runtime power management for PCI function 0 before uninstalling the NVIDIA driver, using the following command:

    echo on > /sys/bus/pci/devices/<Domain>:<Bus>:<Device>.<Function>/power/control

    For example:

    echo on > /sys/bus/pci/devices/0000\:01\:00.0/power/control
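The first two workarounds, together with the rescan used to recover audio, can be wrapped in two small helpers. This is a sketch only: the function names and the sysfs-root argument are hypothetical, and it assumes the standard Linux PCI rescan node at /sys/bus/pci/rescan, which this chapter mentions only indirectly. Both operations require root on a real system.

```shell
# Remove PCI functions 1 (audio), 2 (USB xHCI) and 3 (USB Type-C UCSI)
# of one device, skipping any that are not present.
#   $1 = sysfs root (assumption: /sys on a real system)
#   $2 = domain:bus:device without the function, e.g. 0000:01:00
remove_aux_functions() {
    sysfs_root=$1
    device=$2
    for fn in 1 2 3; do
        node="$sysfs_root/bus/pci/devices/$device.$fn/remove"
        [ -e "$node" ] || continue
        echo 1 > "$node"
    done
}

# Rescan the PCI tree, which brings removed functions back.
#   $1 = sysfs root (assumption: /sys on a real system)
rescan_pci() {
    echo 1 > "$1/bus/pci/rescan"
}

# Example (not run here, needs root):
#   remove_aux_functions /sys 0000:01:00    # apply the workarounds
#   rescan_pci /sys                         # recover audio over DP/HDMI
#   remove_aux_functions /sys 0000:01:00    # reapply once audio is no longer needed
```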

Automated Setup

This section describes automated ways to perform the manual steps mentioned above so that this feature works seamlessly after boot.

  1. Create a file named 80-nvidia-pm.rules in the /lib/udev/rules.d/ directory.

    Add the content given below to the 80-nvidia-pm.rules file. This enables runtime power management for the VGA Controller / 3D Controller PCI function. It also removes the Audio device, USB xHCI Host Controller, and USB Type-C UCSI Controller PCI functions.

    # Remove NVIDIA USB xHCI Host Controller devices, if present
    ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c0330", ATTR{remove}="1"
    
    # Remove NVIDIA USB Type-C UCSI devices, if present
    ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c8000", ATTR{remove}="1"
    
    # Remove NVIDIA Audio devices, if present
    ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x040300", ATTR{remove}="1"
    
    # Enable runtime PM for NVIDIA VGA/3D controller devices on driver bind
    ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="auto"
    ACTION=="bind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="auto"
    
    # Disable runtime PM for NVIDIA VGA/3D controller devices on driver unbind
    ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030000", TEST=="power/control", ATTR{power/control}="on"
    ACTION=="unbind", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x030200", TEST=="power/control", ATTR{power/control}="on"
    
  2. The driver option NVreg_DynamicPowerManagement can be set via the distribution's kernel module configuration files (such as those under /etc/modprobe.d). For example, the following line can be added to the /etc/modprobe.d/nvidia.conf file to enable this feature at every boot.

    options nvidia "NVreg_DynamicPowerManagement=0x02"

  3. Reboot the system.
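Before rebooting, it may help to confirm that the option from step 2 is actually present in the module configuration. The following sketch searches a directory of configuration files for it; the function name configured_dpm is hypothetical, and the sketch only reports what the files contain, not what the loaded driver is using.

```shell
# Print every NVreg_DynamicPowerManagement value set on an
# "options nvidia" line in the *.conf files of a directory.
#   $1 = configuration directory, e.g. /etc/modprobe.d
configured_dpm() {
    conf_dir=$1
    # The "=" directly after the option name keeps the similarly named
    # video memory threshold option from matching.
    sed -n 's/^[[:space:]]*options[[:space:]][[:space:]]*nvidia[[:space:]].*NVreg_DynamicPowerManagement=\([0-9A-Za-z]*\).*/\1/p' \
        "$conf_dir"/*.conf 2>/dev/null
}

# Example (not run here):
#   configured_dpm /etc/modprobe.d
```

With the line from step 2 in place, the function prints 0x02. Empty output means no matching option line was found in that directory.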

Reporting Issues

For better error reporting, nvidia-bug-report.sh collects a dump of the ACPI tables using the acpidump utility. Depending on your Linux distribution, this utility may be found in a package called acpica-tools, acpica, or similar.