Bug 1084411 - monitor remains disconnected and displays nothing in 2 nouveau card setup
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: X.Org
Version: Current
Hardware: Other Other
Priority: P5 - None, Severity: Normal
Target Milestone: ---
Assigned To: Michal Srb
Reported: 2018-03-08 00:32 UTC by Vasilis Liaskovitis
Modified: 2018-06-21 21:46 UTC


Attachments
kernel log with drm.debug=0x1e (1.43 MB, text/x-log)
2018-03-08 00:32 UTC, Vasilis Liaskovitis
Xorg log (95.60 KB, text/plain)
2018-03-08 00:33 UTC, Vasilis Liaskovitis
xrandr output (2.10 KB, text/plain)
2018-03-08 00:34 UTC, Vasilis Liaskovitis
TW Xorg log with xf86-video-nouveau installed (85.67 KB, text/plain)
2018-03-15 12:01 UTC, Vasilis Liaskovitis
additional kernel output after "xrandr --setprovideroutputsource 1 0" (68.74 KB, text/x-log)
2018-03-23 14:21 UTC, Vasilis Liaskovitis
Updated provider autoconfiguration patch (3.76 KB, patch)
2018-03-27 11:01 UTC, Michal Srb

Description Vasilis Liaskovitis 2018-03-08 00:32:29 UTC
Created attachment 763038 [details]
kernel log with drm.debug=0x1e

Kernel 4.15.7-1, using nouveau cards:

03:00.0 VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K620] (rev a2)
0b:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 710] (rev a1)

/dev/dri/card0 -> DP-1 -> Monitor 1
/dev/dri/card1 -> DVI-D-1 -> Monitor 2
 
Monitor 2 on DVI-D-1 (connected to card1, 0b:00) always appears disconnected and does not display anything (using GNOME). xrandr only sees monitor 1, connected to card0.

Same setup works on 42.3.

I will attach relevant logs. I am not entirely sure it is a kernel drm problem yet or something related to X.
Comment 1 Vasilis Liaskovitis 2018-03-08 00:33:07 UTC
Created attachment 763039 [details]
Xorg log
Comment 2 Vasilis Liaskovitis 2018-03-08 00:34:11 UTC
Created attachment 763040 [details]
xrandr output

The second monitor does not appear at all in the xrandr output.
Comment 3 Vasilis Liaskovitis 2018-03-08 00:41:06 UTC
xrandr --listproviders output is weird for the "missing" 2nd monitor:

Providers: number : 2
Provider 0: id: 0xf5; cap: 0xf (Source Output, Sink Output, Source Offload, Sink Offload); crtcs: 4; outputs: 2; associated providers: 0; name: modesetting
    output DVI-I-1
    output DP-1
Provider 1: id: 0x46; cap: 0xf (Source Output, Sink Output, Source Offload, Sink Offload); crtcs: 4; outputs: 3; associated providers: 0; name: modesetting
    output 0x43
    output 0x44
    output 0x45

On 42.3, it looks normal:

Providers: number : 2
Provider 0: id: 0xc5; cap: 0x7 (Source Output, Sink Output, Source Offload); crtcs: 4; outputs: 2; associated providers: 1; name: nouveau
    output DVI-I-1
    output DP-1
Provider 1: id: 0x66; cap: 0x7 (Source Output, Sink Output, Source Offload); crtcs: 4; outputs: 3; associated providers: 1; name: nouveau
    output DVI-D-1-1
    output HDMI-1-1
    output VGA-1-1
Comment 4 Vasilis Liaskovitis 2018-03-08 00:42:55 UTC
On some boots, but not all (e.g. not in the kernel log attached), I have seen invalid EDID for the "missing" port/monitor:

kernel: nouveau 0000:0b:00.0: DVI-D-1: EDID is invalid:
[...]
kernel: nouveau 0000:0b:00.0: DRM: DDC responded, but no EDID for DVI-D-1

The monitor is disconnected / displays nothing, regardless of whether the invalid EDID message appears in the logs.
Comment 5 Takashi Iwai 2018-03-09 08:07:26 UTC
Could you check 4.14.x kernel in OBS home:tiwai:kernel:4.14 repo and see whether the problem persists?  You can try other older kernels, too.
We'd like to know whether it's a kernel regression, and if so, at which kernel it was introduced.
Comment 6 Vasilis Liaskovitis 2018-03-09 10:39:38 UTC
(In reply to Takashi Iwai from comment #5)
> Could you check 4.14.x kernel in OBS home:tiwai:kernel:4.14 repo and see
> whether the problem persists?  You can try other older kernels, too.
> We'd like to know whether it's a kernel regression, and if so, at which
> kernel it was introduced.

On TW, I tried 4.14, 4.10, 4.4 (and kotd 4.16rc4). The monitor never works.

On 42.3 I tried 4.15 (and default 4.4), the 2nd monitor works.

Weird. Is there some nouveau or  kernel-firmware package needed?

I wonder if it could be an Xorg or GNOME issue instead. But even when booting with multi-user (not graphical) target, there is never a vt / framebuffer on the second monitor. So I assume the issue is still at a lower level (drm / modesetting). Not sure.

The 2 monitors also work fine (in TW) if connected to a single dri card.
Comment 7 Takashi Iwai 2018-03-12 09:33:16 UTC
(In reply to Vasilis Liaskovitis from comment #6)
> (In reply to Takashi Iwai from comment #5)
> > Could you check 4.14.x kernel in OBS home:tiwai:kernel:4.14 repo and see
> > whether the problem persists?  You can try other older kernels, too.
> > We'd like to know whether it's a kernel regression, and if so, at which
> > kernel it was introduced.
> 
> On TW, I tried 4.14, 4.10, 4.4 (and kotd 4.16rc4). The monitor never works.
> 
> On 42.3 I tried 4.15 (and default 4.4), the 2nd monitor works.
> 
> Weird. Is there some nouveau or  kernel-firmware package needed?
> 
> I wonder if it could be an Xorg or GNOME issue instead. But even when
> booting with multi-user (not graphical) target, there is never a vt /
> framebuffer on the second monitor. So I assume the issue is still at a lower
> level (drm / modesetting). Not sure.
> 
> The 2 monitors also work fine (in TW) if connected to a single dri card.

Very weird...  Do you pass different boot options, e.g. video=xxx?

By default, Leap 42.3 uses drm-kmp, which is based on the 4.9.x kernel DRM code.  But I don't think this is the difference, as the many other kernel versions you tested didn't work either.

The kernel-firmware package is likely mandatory, but it is usually already installed, and there are not many nouveau-related updates to it.

You can also try to *downgrade* the kernel-firmware package to the version in your working setup (Leap 42.3).

Other than that, I have no idea either.  Better to report it upstream, I suppose.
Comment 8 Michal Srb 2018-03-12 09:55:03 UTC
It seems that on Leap 42.3 you were using the nouveau (xf86-video-nouveau) X driver, but on Tumbleweed you are using the modesetting driver (part of xorg-x11-server).

Please check whether installing xf86-video-nouveau on Tumbleweed helps.
Comment 9 Vasilis Liaskovitis 2018-03-15 12:01:21 UTC
Created attachment 763786 [details]
TW Xorg log with xf86-video-nouveau installed

(In reply to Michal Srb from comment #8)
> It seems that on Leap 42.3 you were using nouveau (xf86-video-nouveau) X
> driver, but on Tumbleweed you are using modesetting (part of
> xorg-x11-server) driver.
> 
> Try if installing xf86-video-nouveau on Tumbleweed makes it better.

Yes, I forgot to mention that I have already tried installing xf86-video-nouveau on TW as well. The monitor still doesn't work. I am attaching the new Xorg log in case it helps.
Comment 10 Michal Srb 2018-03-19 13:30:43 UTC
I don't have a setup with two nvidia GPUs, but I have nvidia + a USB displaypod (udl driver), which used to work with DRM PRIME in Leap 42.3 but no longer does in Tumbleweed. I see similar strange outputs listed by `xrandr --listproviders` as in comment 3. I can't tell yet whether it is the same problem, but it may be. Debugging...
Comment 11 Michal Srb 2018-03-22 10:13:58 UTC
Here are my observations so far:

The "output" lines under each provider in xrandr's output are a SUSE-specific extension. The X server sends a list of output IDs for every provider, and xrandr tries to match them to the output details (including names). However, the list of output details only contains outputs from the main GPU and from slave GPUs that are associated as output sources. If a slave GPU is not associated, its outputs are not found and xrandr prints the raw ID instead.

So the difference between Leap 42.3 and Tumbleweed is whether the secondary GPU is associated automatically. In Leap 42.3 this happened thanks to "n_xserver-optimus-autoconfig-hack.patch". In Tumbleweed this patch was disabled by tobias.johannes.klausmann@mni.thm.de, reportedly because it caused problems.

So in Tumbleweed you have to associate the GPUs manually, using a command like: xrandr --setprovideroutputsource 1 0

After that you should be able to configure the additional monitors using xrandr or any graphical randr tool. Please check if everything works fine after that.

The hope is that one day the graphical randr tools will be able to configure providers, but until then it has to be done using the xrandr command. Or maybe we reintroduce the patch to X server...
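
As a side note, here is a minimal sketch of how the unassociated state could be spotted from a script. The helper name `unassociated_providers` is hypothetical and not part of xrandr. It reads `xrandr --listproviders` output on stdin and prints the index of every provider that reports "associated providers: 0". The parsing assumes the provider line layout pasted in comment 3; other xrandr builds may print the line differently.

```shell
# Hypothetical helper (not part of xrandr): print the index of each provider
# whose line reports "associated providers: 0".
# Input: the text of `xrandr --listproviders` on stdin.
unassociated_providers() {
    # Capture the provider index; require a non-digit after the 0 so that
    # counts such as 10 are not mistaken for 0.
    sed -n 's/^Provider \([0-9][0-9]*\):.*associated providers: 0[^0-9].*/\1/p'
}
```

Typical use would be `xrandr --listproviders | unassociated_providers`. Any index printed is a candidate for `xrandr --setprovideroutputsource`.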
Comment 13 Vasilis Liaskovitis 2018-03-23 14:21:54 UTC
Created attachment 764770 [details]
additional kernel output after "xrandr --setprovideroutputsource 1 0"

(In reply to Michal Srb from comment #11)
> Here are my observations so far:
> 
> The "output" lines under provider in xrandr's output are SUSE specific
> extension. X server sends list of output IDs for every provider and xrandr
> is trying to match them to the output details (including names). However,
> the list of output details only contains outputs from the main GPU and all
> slave GPUs associated as output sources. If the slave GPU is not associated,
> its outputs are not found and xrandr prints the ID.
> 
> 
> So in Tumbleweed you have to associate the GPUs manually yourself using
> command like: xrandr --setprovideroutputsource 1 0
> 
> After that you should be able to configure the additional monitors using
> xrandr or any graphical randr tool. Please check if everything works fine
> after that.

thanks for the explanation. I tried the above command (also tried 1 1), but the monitor stays blank. Is this something that should be applied during xorg startup? I just tried it after normal (gdm) login.

I am attaching the additional kernel output (drm.debug=0x1e) before and after issuing "xrandr --setprovideroutputsource 1 0" in case it helps.

> So the difference between Leap 42.3 and Tumbleweed is in whether the
> secondary GPU is automatically associated. In Leap 42.3 it happened thanks
> to "n_xserver-optimus-autoconfig-hack.patch". In Tumbleweed this patch was
> disabled by tobias.johannes.klausmann@mni.thm.de, reportedly it caused
> problems.

Is there a link to the problems observed with the patch?

Are additional fixes needed to re-enable "n_xserver-optimus-autoconfig-hack.patch"? I tried to enable it in a local xorg-x11-server build, and after a restart the system keeps switching between vt outputs, even with only 1 card and 1 output connected ... it's unusable.

> 
> The hope is that one day the graphical randr tools will be able to configure
> providers, but until then it has to be done using the xrandr command. Or
> maybe we reintroduce the patch to X server...
Comment 14 Vasilis Liaskovitis 2018-03-23 14:25:02 UTC
I have tried the above with and without xf86-video-nouveau, same output.

I also noticed mesa-dri-nouveau is installed. I can try without this.
Comment 15 Michal Srb 2018-03-23 14:40:35 UTC
(In reply to Vasilis Liaskovitis from comment #13)
> I tried the above command (also tried 1 1), but
> the monitor stays blank. Is this something that should be applied during
> xorg startup? I just tried it after normal (gdm) login.

To be clear, the parameters after --setprovideroutputsource are the IDs or indexes of the providers. In your case I think it should be "1 0", but if they were in opposite order, it could be "0 1" as well.

This command alone would not bring the monitor up, but after issuing it, you should be able to set up the monitors of the secondary GPU using xrandr or any graphical randr tool.

> I am attaching the additional kernel output (drm.debug=0x1e) before and
> after issuing "xrandr --setprovideroutputsource 1 0" in case it helps.

I don't see anything obviously wrong in the log. Could you please attach /var/log/Xorg.0.log and the output of `xrandr` and `xrandr --listproviders` after calling the "xrandr --setprovideroutputsource ..." command?

> Is there a link to the problems observed with the patch?

I am not aware of any. As far as I know the problems were not described.

> Are additional fixes needed to re-enable
> "n_xserver-optimus-autoconfig-hack.patch"? I tried to enable it in a local
> xorg-x11-server build, and after restart, the system keeps switching between
> vt outputs, even with 1card- 1output only connected ... it's unusable.

Interesting, I will try to reproduce that and modify the patch to get it working again.
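
Since the providers can be given by ID as well as by index, here is a minimal sketch for looking up the ID that belongs to an index. The helper name `provider_id` is hypothetical. It parses `xrandr --listproviders` text from stdin and assumes each provider line starts with "Provider N: id: 0x...", as in the output pasted in this bug.

```shell
# Hypothetical helper: print the provider ID (e.g. 0x46) for a provider index.
# $1    = provider index as shown by `xrandr --listproviders`
# stdin = the text of `xrandr --listproviders`
provider_id() {
    # Reject an empty or non-numeric index before building the sed pattern.
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    sed -n "s/^Provider $1: id: \(0x[0-9a-fA-F]*\).*/\1/p"
}
```

For example, `xrandr --listproviders | provider_id 1` would print the ID of the provider listed as "Provider 1".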
Comment 16 Vasilis Liaskovitis 2018-03-23 21:17:49 UTC
(In reply to Michal Srb from comment #15)
> This command alone would not bring the monitor up, but after issuing it, you
> should be able to set up the monitors of the secondary GPU using xrandr or
> any graphical randr tool.

Oh right. Ok it looks good. After "xrandr --setprovideroutputsource 1 0":

xrandr --listproviders:

Providers: number : 2
Provider 0: id: 0xf5; cap: 0xf (Source Output, Sink Output, Source Offload, Sink Offload); crtcs: 4; outputs: 2; associated providers: 1; name: modesetting
    output DVI-I-1
    output DP-1
Provider 1: id: 0x46; cap: 0xf (Source Output, Sink Output, Source Offload, Sink Offload); crtcs: 4; outputs: 3; associated providers: 1; name: modesetting
    output DVI-D-1-1
    output HDMI-1-1
    output VGA-1-1

and the monitor works after e.g.:

xrandr --output DVI-D-1-1 --mode 1920x1080
xrandr --output DP-1 --left-of DVI-D-1-1 

Xorg.log does not contain additional output from the above commands.

> > Are additional fixes needed to re-enable
> > "n_xserver-optimus-autoconfig-hack.patch"? I tried to enable it in a local
> > xorg-x11-server build, and after restart, the system keeps switching between
> > vt outputs, even with 1card- 1output only connected ... it's unusable.
> 
> Interesting, I will try to reproduce that and modify the patch to get it
> working again.

thanks. I will try to test it again as well.
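
For reference, the sequence that worked above can be wrapped in a small function, e.g. to run once after login until a fixed X server is available. This is only a sketch: the function name `enable_second_gpu_output` and the `XRANDR` override variable are assumptions made for illustration. The provider indexes, output names and mode are the ones from this particular setup and will differ on other machines.

```shell
# Hypothetical wrapper around the commands from this comment.
# XRANDR can be overridden (e.g. for a dry run); it defaults to the real xrandr.
enable_second_gpu_output() {
    xr=${XRANDR:-xrandr}
    # 1. Associate provider 1 (secondary GPU) with provider 0 as output source.
    "$xr" --setprovideroutputsource 1 0 &&
    # 2. Enable the monitor on the secondary GPU (name as shown after step 1).
    "$xr" --output DVI-D-1-1 --mode 1920x1080 &&
    # 3. Place the primary GPU's monitor to the left of it.
    "$xr" --output DP-1 --left-of DVI-D-1-1
}
```

The `&&` chaining stops at the first failing step, so the outputs are not reconfigured if the provider association itself fails.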
Comment 17 Michal Srb 2018-03-27 11:01:26 UTC
Created attachment 765075 [details]
Updated provider autoconfiguration patch

I have tested the original patch, and it was crashing the X server with an assertion failure. This is because the internals of how screens and GPU screens are connected changed in the X server:
https://cgit.freedesktop.org/xorg/xserver/commit/?id=5c7af02b103790ac1fb6a71822788892c70290b6

I have updated the patch and it now works fine for me.
Updated xorg-x11-server package is built in:
https://build.opensuse.org/project/show/home:michalsrb:branches:bnc1084411:X11:XOrg

Vasilis, can you please test it?
Comment 18 Vasilis Liaskovitis 2018-03-27 14:34:44 UTC
(In reply to Michal Srb from comment #17)
> I have updated the patch and it now works fine for me.
> Updated xorg-x11-server package is built in:
> https://build.opensuse.org/project/show/home:michalsrb:branches:bnc1084411:
> X11:XOrg

thanks Michal, yes I confirm that the above xorg-x11-server package with the updated patch works fine for me as well. Both monitors, one on each GPU, work out of the box now.
Comment 19 Stefan Dirsch 2018-03-27 14:50:06 UTC
Just accepted Michal's SR. Will be in Factory/TW soon. Michal, could you submit this for sle15/Leap15 as well?
Comment 20 Michal Srb 2018-03-27 14:59:56 UTC
Just submitted.
Leap 15: https://build.opensuse.org/request/show/591655
SLE 15: https://build.suse.de/request/show/160327

Closing the bug.
Comment 21 Andreas Stieger 2018-03-27 15:05:36 UTC
(In reply to Michal Srb from comment #20)
> Leap 15: https://build.opensuse.org/request/show/591655

Leap 15 will get it from SLE 15 via leaper