Appendix L. Known Issues

The following problems still exist in this release and are in the process of being resolved.

Known Issues

OpenGL and dlopen()

There are some issues with older versions of the glibc dynamic loader (e.g., the version that shipped with Red Hat Linux 7.2) and applications such as Quake3 and Radiant, that use dlopen(). Please see Chapter 4, Frequently Asked Questions for more details.

Multicard, Multimonitor

In some cases, the secondary card is not initialized correctly by the NVIDIA kernel module. You can work around this by enabling the XFree86 Int10 module to soft-boot all secondary cards. See Appendix D, X Config Options for details.

Interaction with pthreads

Single-threaded applications that use dlopen() to load NVIDIA's libGL library, and then use dlopen() to load any other library that is linked against libpthread will crash in libGL. This does not happen in NVIDIA's new ELF TLS OpenGL libraries (please see Appendix C, Installed Components for a description of the ELF TLS OpenGL libraries). Possible workarounds for this problem are:

  1. Load the library that is linked with libpthread before loading libGL.

  2. Link the application with libpthread.

The X86-64 platform (AMD64/EM64T) and 2.6 kernels

Many 2.4 and 2.6 x86_64 kernels have an accounting problem in their implementation of the change_page_attr kernel interface. Early 2.6 kernels include a check that triggers a BUG() when this situation is encountered (triggering a BUG() results in the current application being killed by the kernel; this application would be your OpenGL application or potentially the X server). The accounting issue has been resolved in the 2.6.11 kernel.

We have added checks to recognize that the NVIDIA kernel module is being compiled for the x86-64 platform on a kernel between 2.6.0 and 2.6.11. In this case, we will disable usage of the change_page_attr kernel interface. This will avoid the accounting issue but leaves the system in danger of cache aliasing (see entry below on Cache Aliasing for more information about cache aliasing). Note that this change_page_attr accounting issue and BUG() can be triggered by other kernel subsystems that rely on this interface.

If you are using a 2.6 x86_64 kernel, it is recommended that you upgrade to a 2.6.11 or later kernel.

Also take note of common dma issues on 64-bit platforms in Appendix AA, Allocating DMA Buffers on 64-bit Platforms.

Cache Aliasing

Cache aliasing occurs when multiple mappings to a physical page of memory have conflicting caching states, such as cached and uncached. Due to these conflicting states, data in that physical page may become corrupted when the processor's cache is flushed. If that page is being used for DMA by a driver such as NVIDIA's graphics driver, this can lead to hardware stability problems and system lockups.

NVIDIA has encountered bugs with some Linux kernel versions that lead to cache aliasing. Although some systems will run perfectly fine when cache aliasing occurs, other systems will experience severe stability problems, including random lockups. Users experiencing stability problems due to cache aliasing will benefit from updating to a kernel that does not cause cache aliasing to occur.

NVIDIA has added driver logic to detect cache aliasing and to print a warning with a message similar to the following:

NVRM: bad caching on address 0x1cdf000: actual 0x46 != expected 0x73

If you see this message in your log files and are experiencing stability problems, you should update your kernel to the latest version.

If the message persists after updating your kernel, please send a bug report to NVIDIA.

64-Bit BARs (Base Address Registers)

Starting with native PCI Express GPUs, NVIDIA's GPUs will advertise a 64-bit BAR capability (a Base Address Register stores the location of a PCI I/O region, such as registers or a frame buffer). This means that the GPU's PCI I/O regions (registers and frame buffer) can be placed above the 32-bit address space (the first 4 gigabytes of memory).

The decision of where the BAR is placed is made by the system BIOS at boot time. If the BIOS supports 64-bit BARs, then the NVIDIA PCI I/O regions may be placed above the 32-bit address space. If the BIOS does not support this feature, then our PCI I/O regions will be placed within the 32-bit address space as they have always been.

Unfortunately, current Linux kernels (as of 2.6.11.x) do not understand or support 64-bit BARs. If the BIOS does place any NVIDIA PCI I/O regions above the 32-bit address space, the kernel will reject the BAR and the NVIDIA driver will not work.

There is no known workaround at this point.

Valgrind

The NVIDIA OpenGL implementation makes use of self modifying code. To force Valgrind to retranslate this code after a modification you must run using the Valgrind command line option:

--smc-check=all

Without this option Valgrind may execute incorrect code causing incorrect behavior and reports of the form:

==30313== Invalid write of size 4

Laptops

If you are using a laptop please see the "Known Laptop Issues" in Appendix I, Configuring a Laptop.

FSAA

When FSAA is enabled (the __GL_FSAA_MODE environment variable is set to a value that enables FSAA and a multisample visual is chosen), the rendering may be corrupted when resizing the window.

libGL DSO finalizer and pthreads

When a multithreaded OpenGL application exits, it is possible for libGL's DSO finalizer (also known as the destructor, or "_fini") to be called while other threads are executing OpenGL code. The finalizer needs to free resources allocated by libGL. This can cause problems for threads that are still using these resources. Setting the environment variable "__GL_NO_DSO_FINALIZER" to "1" will work around this problem by forcing libGL's finalizer to leave its resources in place. These resources will still be reclaimed by the operating system when the process exits. Note that the finalizer is also executed as part of dlclose(3), so if you have an application that dlopens(3) and dlcloses(3) libGL repeatedly, "__GL_NO_DSO_FINALIZER" will cause libGL to leak resources until the process exits. Using this option can improve stability in some multithreaded applications, including Java3D applications.

XVideo and the Composite X extension

XVideo will not work correctly when Composite is enabled. See Appendix S, The X Composite Extension.

This section describes problems that will not be fixed. Usually, the source of the problem is beyond the control of NVIDIA. Following is the list of problems:

Problems that Will Not Be Fixed

Gigabyte GA-6BX Motherboard

This motherboard uses a LinFinity regulator on the 3.3 V rail that is only rated to 5 A -- less than the AGP specification, which requires 6 A. When diagnostics or applications are running, the temperature of the regulator rises, causing the voltage to the NVIDIA chip to drop as low as 2.2 V. Under these circumstances, the regulator cannot supply the current on the 3.3 V rail that the NVIDIA chip requires.

This problem does not occur when the graphics board has a switching regulator or when an external power supply is connected to the 3.3 V rail.

VIA KX133 and 694X Chip sets with AGP 2x

On Athlon motherboards with the VIA KX133 or 694X chip set, such as the ASUS K7V motherboard, NVIDIA drivers default to AGP 2x mode to work around insufficient drive strength on one of the signals.

Irongate Chip sets with AGP 1x

AGP 1x transfers are used on Athlon motherboards with the Irongate chipset to work around a problem with signal integrity.

ALi chipsets, ALi1541 and ALi1647

On ALi1541 and ALi1647 chipsets, NVIDIA drivers disable AGP to work around timing issues and signal integrity issues. See Chapter 5, Common Problems for more information on ALi chipsets.

I/O APIC (SMP)

If you are experiencing stability problems with a Linux SMP machine and seeing I/O APIC warning messages from the Linux kernel, system reliability may be greatly improved by setting the "noapic" kernel parameter.

Local APIC (UP)

On some systems, setting the "Local APIC Support on Uniprocessors" kernel configuration option can have adverse effects on system stability and performance. If you are experiencing lockups with a Linux UP machine and have this option set, try disabling local APIC support.