Monday, December 12, 2016

Kernel oops on Nvidia 367 with a dual-monitor setup

My home Linux display setup is complicated.  The Linux desktop has two monitors, and one of them is connected through a KVM switch that is also connected to a non-Linux desktop.  The Linux machine has an Nvidia GTX 950 card, which needs the proprietary Nvidia drivers.  In the last few weeks, I started getting kernel oopses when suspending the Linux machine, and the nvidia kernel modules appeared to be the culprit.  I hadn't seen these oopses before, so I set out to track down the cause.

I eventually found that the issue was the nvidia driver failing to modeset correctly.  My previous Xorg config forced a specific screen layout.  The new Nvidia binary driver undid this, and that is roughly when the suspend problems started.

I have two screens: DFP-0 and DFP-4.  You can see your own list of screens in /var/log/Xorg.0.log.  The relevant lines showing which screens are connected look like this:

[  6615.937] (--) NVIDIA(0): Ancor Communications Inc ASUS VS247 (DFP-0): connected
[  6615.937] (--) NVIDIA(0): Ancor Communications Inc ASUS VS247 (DFP-0): Internal TMDS
[  6615.937] (--) NVIDIA(0): Ancor Communications Inc ASUS VS247 (DFP-0): 330.0 MHz maximum pixel clock
[  6615.937] (--) NVIDIA(0): 
[  6615.937] (--) NVIDIA(0): DFP-1: disconnected
[  6615.937] (--) NVIDIA(0): DFP-1: Internal TMDS
[  6615.937] (--) NVIDIA(0): DFP-1: 330.0 MHz maximum pixel clock
[  6615.937] (--) NVIDIA(0): 
[  6615.937] (--) NVIDIA(0): DFP-2: disconnected
[  6615.937] (--) NVIDIA(0): DFP-2: Internal DisplayPort
[  6615.937] (--) NVIDIA(0): DFP-2: 960.0 MHz maximum pixel clock
[  6615.937] (--) NVIDIA(0): 
[  6615.937] (--) NVIDIA(0): DFP-3: disconnected
[  6615.937] (--) NVIDIA(0): DFP-3: Internal TMDS
[  6615.937] (--) NVIDIA(0): DFP-3: 330.0 MHz maximum pixel clock
[  6615.937] (--) NVIDIA(0): 
[  6615.937] (--) NVIDIA(0): Acer K272HUL (DFP-4): connected
[  6615.937] (--) NVIDIA(0): Acer K272HUL (DFP-4): Internal DisplayPort
[  6615.937] (--) NVIDIA(0): Acer K272HUL (DFP-4): 960.0 MHz maximum pixel clock
[  6615.937] (--) NVIDIA(0): 
[  6615.937] (--) NVIDIA(0): DFP-5: disconnected
[  6615.937] (--) NVIDIA(0): DFP-5: Internal TMDS
[  6615.937] (--) NVIDIA(0): DFP-5: 330.0 MHz maximum pixel clock
[  6615.937] (--) NVIDIA(0): 
[  6615.937] (--) NVIDIA(0): DFP-6: disconnected
[  6615.937] (--) NVIDIA(0): DFP-6: Internal DisplayPort
[  6615.937] (--) NVIDIA(0): DFP-6: 960.0 MHz maximum pixel clock
[  6615.937] (--) NVIDIA(0): 
[  6615.938] (--) NVIDIA(0): DFP-7: disconnected
[  6615.938] (--) NVIDIA(0): DFP-7: Internal TMDS
[  6615.938] (--) NVIDIA(0): DFP-7: 330.0 MHz maximum pixel clock
[  6615.938] (--) NVIDIA(0): 


In the above example, you can see that DFP-0 and DFP-4 are connected to the graphics device while the others are disconnected.
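Reading through the log by hand works, but a short script can pull out the connected DFP names and format them for the ConnectedMonitor option discussed further down.  This is a minimal sketch, assuming the log lines look exactly like the excerpt above (other driver versions may word them differently).  The function names connected_outputs and connected_monitor_line are hypothetical helpers, not part of any Nvidia tool.

```python
from __future__ import annotations

import re

# Assumption: lines follow the format shown in the excerpt above, e.g.
#   [  6615.937] (--) NVIDIA(0): Acer K272HUL (DFP-4): connected
#   [  6615.937] (--) NVIDIA(0): DFP-1: disconnected
# The monitor name in front of "(DFP-n)" is optional.
LINE_RE = re.compile(
    r"NVIDIA\(\d+\):\s+(?:(?P<monitor>.+?)\s+\()?(?P<dfp>DFP-\d+)\)?:\s+"
    r"(?P<state>connected|disconnected)\s*$"
)


def connected_outputs(log_text: str) -> list[str]:
    """Return DFP names whose most recent report in the log is 'connected'."""
    states: dict[str, str] = {}
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            # Later reports for the same DFP overwrite earlier ones.
            states[m.group("dfp")] = m.group("state")
    return [name for name, state in states.items() if state == "connected"]


def connected_monitor_line(names: list[str]) -> str:
    """Format DFP names as an xorg.conf ConnectedMonitor option line."""
    return f'Option "ConnectedMonitor" "{",".join(names)}"'


# A few lines copied from the log excerpt above.
SAMPLE_LOG = """\
[  6615.937] (--) NVIDIA(0): Ancor Communications Inc ASUS VS247 (DFP-0): connected
[  6615.937] (--) NVIDIA(0): Ancor Communications Inc ASUS VS247 (DFP-0): Internal TMDS
[  6615.937] (--) NVIDIA(0): DFP-1: disconnected
[  6615.937] (--) NVIDIA(0): DFP-1: Internal TMDS
[  6615.937] (--) NVIDIA(0): Acer K272HUL (DFP-4): connected
[  6615.937] (--) NVIDIA(0): DFP-5: disconnected
"""

print(connected_monitor_line(connected_outputs(SAMPLE_LOG)))
# prints: Option "ConnectedMonitor" "DFP-0,DFP-4"
```

To run it against your own machine, pass the contents of /var/log/Xorg.0.log to connected_outputs instead of SAMPLE_LOG.  Run it while every monitor is actually switched to the Linux machine, otherwise the KVM-connected screen will show up as disconnected.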

In my case, the driver kept re-detecting which screens were still connected.  That is the wrong behavior when I switch the KVM to the non-Linux computer: my Linux desktop resizes, and all windows move to the screen that is still connected.

To make X treat both screens as always connected, use the 'ConnectedMonitor' option in the Screen section.  This is confusing, because the ConnectedMonitor line refers to screens by the DFP names from the Xorg log, while the metamodes line uses the output names reported by xrandr: DVI-I-1 is a DVI output and DP-2 is a DisplayPort output.  DFP stands for Digital Flat Panel; the Nvidia driver uses it to distinguish digital panels from CRT (cathode ray tube) and TV outputs.

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "Stereo" "0"
    Option         "metamodes" "DVI-I-1: nvidia-auto-select +2560+0 {rotation=left}, DP-2: nvidia-auto-select +0+0"
    Option         "SLI" "Off"
    Option         "MultiGPU" "Off"
    Option         "BaseMosaic" "off"
    Option         "ConnectedMonitor"   "DFP-0,DFP-4"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection
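Before restarting X, it can help to confirm that the ConnectedMonitor option actually landed inside the Screen section.  The sketch below walks an xorg.conf and reports, for each Screen section, the DFP names listed in ConnectedMonitor.  It assumes a simple config (one directive per line, balanced quotes) and relies on the xorg.conf rule that option names are case-insensitive and ignore underscores and spaces; connected_monitor_options is a hypothetical helper name.

```python
from __future__ import annotations

import shlex


def _norm(option_name: str) -> str:
    # xorg.conf option names are case-insensitive; '_' and spaces are ignored.
    return option_name.replace("_", "").replace(" ", "").lower()


def connected_monitor_options(xorg_conf: str) -> dict[str, list[str]]:
    """Map each Screen section's Identifier to its ConnectedMonitor names."""
    result: dict[str, list[str]] = {}
    in_screen = False
    depth = 0  # SubSection nesting inside the current Section
    ident = None
    values: list[str] = []
    for raw in xorg_conf.splitlines():
        # comments=True drops '#' comments that are outside quotes.
        tokens = shlex.split(raw, comments=True)
        if not tokens:
            continue
        key = tokens[0].lower()
        if key == "section":
            in_screen = len(tokens) > 1 and tokens[1].lower() == "screen"
            depth, ident, values = 0, None, []
        elif key == "subsection":
            depth += 1
        elif key == "endsubsection":
            depth -= 1
        elif key == "endsection":
            if in_screen and ident is not None:
                result[ident] = values
            in_screen = False
        elif in_screen and depth == 0:
            if key == "identifier" and len(tokens) > 1:
                ident = tokens[1]
            elif key == "option" and len(tokens) > 2 and _norm(tokens[1]) == "connectedmonitor":
                values = [v.strip() for v in tokens[2].split(",") if v.strip()]
    return result


# Trimmed copy of the Screen section above.
SAMPLE_CONF = '''
Section "Screen"
    Identifier     "Screen0"
    Option         "metamodes" "DVI-I-1: nvidia-auto-select +2560+0 {rotation=left}, DP-2: nvidia-auto-select +0+0"
    Option         "ConnectedMonitor"   "DFP-0,DFP-4"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection
'''

print(connected_monitor_options(SAMPLE_CONF))
# prints: {'Screen0': ['DFP-0', 'DFP-4']}
```

Comparing this result with the connected DFP names from the Xorg log is a quick way to catch a typo in the DFP list before restarting X.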

With the example above, DFP-0 and DFP-4 are always treated as connected.  After adding the ConnectedMonitor line, restart X; the Nvidia driver will then no longer rearrange the desktop when a monitor is switched away by the KVM.  As a happy side effect, the kernel oopses on suspend are gone as well.