Bug 1091245 - [20180425] Black screen in VNC because GTK selects Wayland backend in X11 session.
Status: IN_PROGRESS
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: GNOME
Version: Current
Hardware: Other
OS: Other
Priority: P2 - High
Severity: Normal
Target Milestone: ---
Assigned To: E-mail List
Reported: 2018-04-27 15:37 UTC by Sergio Lindo Mansilla
Modified: 2020-06-04 13:04 UTC

Attachments
- black_screen.jpg (209.80 KB, image/jpeg), 2018-04-27 15:37 UTC, Sergio Lindo Mansilla
- firewall_disabled-listening_on_port.txt (3.60 KB, text/plain), 2018-04-27 15:50 UTC, Sergio Lindo Mansilla
- tcpdumpt.out (222.93 KB, application/octet-stream), 2018-05-15 11:46 UTC, Sergio Lindo Mansilla
- tcpdump_sddm.out (37.23 KB, application/octet-stream), 2018-05-22 15:17 UTC, Sergio Lindo Mansilla
- tcpdump_xdm.out (959.77 KB, application/x-xz), 2018-05-29 13:27 UTC, Sergio Lindo Mansilla
- journal_log-vnc_tw.txt (21.04 KB, text/plain), 2018-07-05 11:59 UTC, Sergio Lindo Mansilla
- journal_log_from_boot-vnc_tw.txt (618.87 KB, text/plain), 2018-07-05 12:03 UTC, Sergio Lindo Mansilla
- gnome-session-check-accelerated-1.coredump (750.35 KB, application/x-xz), 2018-07-17 15:23 UTC, Sergio Lindo Mansilla
- Choose backend based on current XDG_SESSION_TYPE. (1.64 KB, patch), 2018-07-30 08:57 UTC, Michal Srb

Description Sergio Lindo Mansilla 2018-04-27 15:37:22 UTC
Created attachment 768547 [details]
black_screen.jpg

When connecting via VNC I only get a black screen (see black_screen.jpg). It happened to me in three different scenarios:

- After an upgrade from Leap 42.3 to Tumbleweed.
- After a new installation of Tumbleweed.
- After a new installation of Tumbleweed done remotely via VNC.
Comment 1 Sergio Lindo Mansilla 2018-04-27 15:50:49 UTC
Created attachment 768556 [details]
firewall_disabled-listening_on_port.txt
Comment 2 Michal Srb 2018-05-11 07:28:04 UTC
It looks like Xvnc did not get any response from the display manager.

Which display manager are you using? (gdm, sddm, xdm, ...)

Do you have XDMCP enabled? (File /etc/sysconfig/displaymanager key DISPLAYMANAGER_REMOTE_ACCESS.)
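The second question can be answered by reading that key from the sysconfig file. The following is a minimal sketch, assuming the usual sysconfig layout of `KEY="value"` lines with `#` comments; `read_sysconfig_value` is a hypothetical helper written for illustration, not part of YaST or any openSUSE tool.

```python
import shlex
from typing import Optional


def read_sysconfig_value(text: str, key: str) -> Optional[str]:
    """Return the value of `key` from sysconfig-style text, or None if absent.

    Assumes simple KEY="value" assignments and '#' comment lines. A later
    assignment wins, as it would if the file were sourced by a shell.
    """
    value = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, raw = line.partition("=")
        if sep and name.strip() == key:
            # shlex removes shell quoting, so "yes" and 'yes' both become yes.
            parts = shlex.split(raw, comments=True)
            value = parts[0] if parts else ""
    return value


if __name__ == "__main__":
    path = "/etc/sysconfig/displaymanager"
    try:
        with open(path, encoding="utf-8") as f:
            remote = read_sysconfig_value(f.read(), "DISPLAYMANAGER_REMOTE_ACCESS")
    except OSError:
        remote = None  # file missing or unreadable on this machine
    print("XDMCP enabled" if remote == "yes" else f"XDMCP not enabled ({remote!r})")
```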
Comment 3 Sergio Lindo Mansilla 2018-05-14 16:22:42 UTC
- After an upgrade from Leap 42.3 to Tumbleweed. (lightdm, DISPLAYMANAGER_REMOTE_ACCESS="yes")
- After a new installation of Tumbleweed. (gdm)
- After a new installation of Tumbleweed done remotely via VNC. (gdm)
Comment 4 Sergio Lindo Mansilla 2018-05-14 16:32:15 UTC
DISPLAYMANAGER_REMOTE_ACCESS="yes" in all three scenarios.
Comment 5 Michal Srb 2018-05-15 09:12:26 UTC
I tried to reproduce it with a new installation of Tumbleweed, but it works for me (after enabling VNC in YaST, disabling the firewall and restarting display-manager).

It would be helpful to know whether Xvnc managed to contact the display manager. Can you record a tcpdump of the communication on localhost when you connect to the VNC? For example with:

    tcpdump -i lo -w /tmp/tcpdump.out
Comment 6 Sergio Lindo Mansilla 2018-05-15 11:46:25 UTC
Created attachment 770269 [details]
tcpdumpt.out

Hello Michal,

Something that might be related, since you didn't mention it: did you have to install the package "xorg-x11-Xvnc" manually?
There is still a bug, fixed in upstream YaST but not yet in Tumbleweed 20180511, where the required packages are not installed, and then YaST's "Remote Administration (VNC)" complains that no services were found for tigervnc and tigervnc-http to configure for firewalld.

I also get a black screen with the firewall disabled.

Please find the output file attached.
Comment 7 Michal Srb 2018-05-15 12:03:06 UTC
(In reply to Sergio Lindo Mansilla from comment #6)
> Created attachment 770269 [details]
> tcpdumpt.out

Thank you. The tcpdump shows that XDMCP negotiation went fine and then some X clients (presumably the GDM greeter and other helper processes) connected to X display number :1 (that is Xvnc). However, none of the X clients did any rendering; they just queried a few extensions and atoms, which looks like GTK initialization.

I would suspect that something is stuck waiting in GDM, but you said that you observed the same issue also with lightdm. However, that was after upgrade, so maybe something else was broken in that situation and we are seeing two different problems.

Can you please try with another display manager, perhaps XDM or KDM?

> Because there is still a bug, fixed on upstream Yast and not yet in
> Tumbleweed 20180511, that those packages are not installed and then yast's
> "Remote Administration (VNC)" complaints that no services were found for
> tigervnc and tigervnc-http to configure for firewalld.

I saw that error too, but it should not matter after disabling the firewall.
Comment 8 Sergio Lindo Mansilla 2018-05-16 14:36:14 UTC
Waiting for https://bugzilla.opensuse.org/show_bug.cgi?id=1093369 to be resolved.
Comment 9 Sergio Lindo Mansilla 2018-05-22 15:17:04 UTC
Created attachment 771005 [details]
tcpdump_sddm.out

The same happens with sddm.
Comment 10 Michal Srb 2018-05-22 15:21:52 UTC
(In reply to Sergio Lindo Mansilla from comment #9)
> The same happens with sddm.

Sorry, I should have mentioned this explicitly: SDDM does not support XDMCP, so it cannot display the login screen in VNC. (It never did.)

XDM or KDM would be good candidates to try.
Comment 11 Sergio Lindo Mansilla 2018-05-29 13:27:05 UTC
Created attachment 771684 [details]
tcpdump_xdm.out

Hello Michal,

I have tried with xdm/IceWM and the VNC connection works properly.
I have also attached the tcpdump in case you find it useful.

What's next?
Comment 12 Michal Srb 2018-05-30 14:21:12 UTC
Great. So that means the problem is somewhere in the GDM-Xvnc interaction.

Please switch back to GDM and enable debugging in /etc/gdm/custom.conf:

    [debug]
    # Uncomment the line below to turn on debugging
    Enable=true

Restart the display-manager service or reboot, then try to connect to the machine using VNC. GDM should log debug information into the journal. You can then save it to a file, for example using `journalctl -b 0 > journal.txt`. Please attach the file here.
Comment 13 Sergio Lindo Mansilla 2018-07-05 11:59:14 UTC
Created attachment 776200 [details]
journal_log-vnc_tw.txt

Here is the journal from just before connecting until the viewer appears with a black screen.
Comment 14 Sergio Lindo Mansilla 2018-07-05 12:03:20 UTC
Created attachment 776202 [details]
journal_log_from_boot-vnc_tw.txt

And here is the journal log since boot.
Comment 15 Michal Srb 2018-07-16 08:37:20 UTC
Thank you for the logs, they show that gnome-session-check-accelerated crashed:

> Jul 05 11:40:19 linux-aj2v kernel: gnome-session-c[3452]: segfault at 0 ip
> 0000000000000000 sp 00007ffd688c8098 error 14 in
> gnome-session-check-accelerated[562538e8c000+2000]

Gnome checks if rendering acceleration is available before starting the session or GDM. When running under Xvnc it should detect that software acceleration from Mesa is available and return success. But for some reason it crashed instead.
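The kernel line quoted above can be decoded mechanically. Below is a minimal sketch, assuming the x86-64 Linux format in which `error` is printed in hexadecimal and its bits follow the page-fault error code (bit 0: protection violation rather than non-present page, bit 1: write, bit 2: user mode, bit 3: reserved bit, bit 4: instruction fetch). Under that reading, `error 14` is a user-mode instruction fetch from a non-present page, and together with `ip 0` this is consistent with a jump through a NULL function pointer. `decode_segfault` is a hypothetical helper, not an existing tool.

```python
import re
from typing import Dict, List, Optional

# Matches the kernel message, e.g.
#   gnome-session-c[3452]: segfault at 0 ip 0000000000000000 sp 00007ffd688c8098 error 14 in ...
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
)

# Assumed x86 page-fault error-code bits (see the paragraph above).
PF_BITS = {
    0x1: "protection violation (page present)",
    0x2: "write access",
    0x4: "user mode",
    0x8: "reserved bit set",
    0x10: "instruction fetch",
}


def decode_segfault(line: str) -> Optional[Dict[str, object]]:
    """Parse one kernel segfault line; return None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    error = int(m["error"], 16)  # the kernel prints the error code in hex
    flags: List[str] = [name for bit, name in PF_BITS.items() if error & bit]
    if not error & 0x1:
        flags.insert(0, "page not present")
    return {
        "comm": m["comm"],  # truncated to 15 characters by the kernel
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": int(m["ip"], 16),
        "error": error,
        "flags": flags,
    }


if __name__ == "__main__":
    sample = ("Jul 05 11:40:19 linux-aj2v kernel: gnome-session-c[3452]: segfault at 0 "
              "ip 0000000000000000 sp 00007ffd688c8098 error 14 in "
              "gnome-session-check-accelerated[562538e8c000+2000]")
    print(decode_segfault(sample))
```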

Please check if you have a coredump recorded with the `coredumpctl` command. If there is, please export the coredump file and attach it to the bug. For example using `coredumpctl dump <PID> -o gnome-session-check-accelerated.coredump`.
Comment 16 Sergio Lindo Mansilla 2018-07-17 15:23:15 UTC
Created attachment 777186 [details]
gnome-session-check-accelerated-1.coredump

It seems I get two coredumps each time I execute vncviewer.
Comment 17 Michal Srb 2018-07-18 13:09:11 UTC
Thank you for the coredump. This is what is happening:

* gnome-session-check-accelerated checks whether XDG_SESSION_TYPE is "wayland". If it were, it would assume acceleration is working and exit early. But we are running in Xvnc, not Wayland, so it continues.

* It opens the display using `gdk_display_get_default ()`. This returns a generic `GdkDisplay`, which can be backed by either X11 or Wayland.

* It calls `gdk_x11_get_xatom_by_name_for_display (display, ...)` on it. This function assumes that the display is GdkX11Display and casts it to it. But later it crashes because the data do not match GdkX11Display at all. If I cast it to GdkWaylandDisplay, the data start to make sense.

So the problem is that `gdk_display_get_default` opens some (?!) wayland display instead of the Xvnc's X display. Question remains why...
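The last step above is an unchecked downcast. The sketch below models the three steps with hypothetical Python classes standing in for `GdkDisplay`, `GdkX11Display` and `GdkWaylandDisplay`; in C, the equivalent guard would be a GObject type check such as `GDK_IS_X11_DISPLAY (display)` before calling X11-only functions. It illustrates the idea only and is not the actual gnome-session code.

```python
from typing import Optional


class Display:
    """Hypothetical stand-in for GdkDisplay (backend-agnostic base)."""


class X11Display(Display):
    """Hypothetical stand-in for GdkX11Display."""

    def __init__(self, name: str) -> None:
        self.xdisplay_name = name  # X11-only data


class WaylandDisplay(Display):
    """Hypothetical stand-in for GdkWaylandDisplay."""

    def __init__(self, socket: str) -> None:
        self.socket_name = socket  # Wayland-only data


def check_accelerated(session_type: Optional[str], display: Display) -> str:
    """Model of the three steps described above; returns a short verdict."""
    # Step 1: a Wayland session is assumed to be accelerated.
    if session_type == "wayland":
        return "wayland session: assume accelerated"
    # Step 2 already happened: `display` is whatever GDK opened.
    # Step 3, guarded: only treat the display as X11 if it really is one.
    if not isinstance(display, X11Display):
        return f"refusing X11 call on {type(display).__name__}"
    return f"query X atoms on {display.xdisplay_name}"


if __name__ == "__main__":
    # XDG_SESSION_TYPE=x11, but GDK ended up with a Wayland display (this bug).
    print(check_accelerated("x11", WaylandDisplay("wayland-0")))
    # XDG_SESSION_TYPE=x11 and GDK opened the Xvnc display (expected case).
    print(check_accelerated("x11", X11Display("::1:0")))
```

Without the type check, the C code reinterprets Wayland display data as X11 display data, which matches what the coredump shows.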
Comment 18 Michal Srb 2018-07-18 13:32:40 UTC
I just noticed that I had this in my /etc/gdm/custom.conf on my Tumbleweed testing machine:
  [daemon]
  WaylandEnable=false

If I remove that, GDM runs in Wayland mode and I can reproduce the bug as well. In theory GDM should be able to show X11 greeters to XDMCP clients (such as Xvnc) even if it is running in Wayland mode on seat0, so I will keep investigating.

You can use this setting as a workaround in case you are not using Wayland sessions.
Comment 19 Michal Srb 2018-07-18 15:13:58 UTC
Small correction: `gdk_display_get_default` just gives the already opened display. It was opened by `gdk_display_manager_open_display`.

The environment of the session contains these relevant variables:
  DISPLAY=::1:0
  XDG_SESSION_TYPE=x11

GDK is not using them when deciding which backend to use. It just iterates over all its backends (wayland, x11, broadway, in this order) and tries to open a display with each of them. The wayland one calls `wl_display_connect`. The WAYLAND_DISPLAY variable is not set, so it tries the default "wayland-0". That socket exists because of GDM on seat0, so it is opened. The crash of gnome-session-check-accelerated follows. Even if it did not crash, all GTK applications would be displayed on seat0 instead of in VNC.
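Why "wayland-0" is found even though WAYLAND_DISPLAY is unset can be sketched from libwayland's documented lookup for `wl_display_connect(NULL)`: use WAYLAND_DISPLAY if set, otherwise "wayland-0", with relative names resolved under XDG_RUNTIME_DIR. This is an approximation; the WAYLAND_SOCKET file-descriptor hand-off is not modelled, and the `/run/user/1000` runtime directory in the example is an assumed value.

```python
import os
from typing import Mapping, Optional


def default_wayland_socket(env: Mapping[str, str]) -> Optional[str]:
    """Approximate the socket path wl_display_connect(NULL) would try.

    Simplification: an empty WAYLAND_DISPLAY is treated as unset.
    """
    name = env.get("WAYLAND_DISPLAY") or "wayland-0"
    if os.path.isabs(name):
        return name  # absolute names are used as-is
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    if not runtime_dir:
        return None  # libwayland cannot resolve a relative name without it
    return os.path.join(runtime_dir, name)


if __name__ == "__main__":
    # Environment from this comment plus an assumed XDG_RUNTIME_DIR.
    env = {"DISPLAY": "::1:0", "XDG_SESSION_TYPE": "x11",
           "XDG_RUNTIME_DIR": "/run/user/1000"}
    path = default_wayland_socket(env)
    print(path, "exists" if path and os.path.exists(path) else "missing")
```

Note that neither DISPLAY nor XDG_SESSION_TYPE takes part in this lookup, which is the core of the problem.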

This is not only a VNC problem. There is an easier way to reproduce the issue:
1) Start any X11-based session. For example Gnome on Xorg.
2) As the same user, start any wayland compositor. For example a windowed
   weston.
3) In the X11 session attempt to start any GDK application.
4) Surprise: the application shows its window inside the Wayland session, not in
   the X11 session where it was started.
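The outcome of these steps follows from the first-success loop over backends described in this comment. A minimal model, with hypothetical probe functions standing in for the real backends: each probe reports whether it could connect, and the first backend in the fixed order wayland, x11, broadway that succeeds wins, regardless of XDG_SESSION_TYPE.

```python
from typing import Callable, Dict, Optional, Sequence

# Fixed backend order described in this comment.
DEFAULT_ORDER = ("wayland", "x11", "broadway")


def open_first_backend(
    order: Sequence[str], probes: Dict[str, Callable[[], bool]]
) -> Optional[str]:
    """Return the first backend whose probe succeeds, mimicking GDK's loop."""
    for backend in order:
        probe = probes.get(backend)
        if probe is not None and probe():
            return backend
    return None


if __name__ == "__main__":
    # Steps 1-2: an X11 session is running and a windowed Wayland compositor
    # (e.g. weston) has created wayland-0 for the same user.
    probes = {
        "wayland": lambda: True,    # wayland-0 socket exists and accepts
        "x11": lambda: True,        # $DISPLAY points at the X11 session
        "broadway": lambda: False,  # no broadway server running
    }
    # Step 3: a GDK application started from the X11 session.
    print(open_first_backend(DEFAULT_ORDER, probes))  # prints "wayland"
```

Only when the Wayland probe fails does the X11 session get used, which is why the workaround of disabling Wayland in GDM helps.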

This has to be fixed in GTK. I opened an upstream bug:
https://gitlab.gnome.org/GNOME/gtk/issues/1224

Qt, for example, respects the XDG_SESSION_TYPE variable.
Comment 20 Sergio Lindo Mansilla 2018-07-18 15:27:11 UTC
ok, thanks for the workaround :)
Comment 21 Michal Srb 2018-07-30 08:57:55 UTC
Created attachment 778330 [details]
Choose backend based on current XDG_SESSION_TYPE.

This patch implements the fix in GTK the same way Qt does.
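The approach can be sketched as a reordering of the candidate list: move the backend named by XDG_SESSION_TYPE to the front and keep the remaining ones as fallbacks. This is a hypothetical illustration of the idea, not the contents of the attached patch; the assumption is that the session type values "x11" and "wayland" map directly onto GDK backend names.

```python
from typing import Mapping, Sequence, Tuple

DEFAULT_ORDER = ("wayland", "x11", "broadway")


def preferred_backend_order(
    env: Mapping[str, str], order: Sequence[str] = DEFAULT_ORDER
) -> Tuple[str, ...]:
    """Put the backend matching XDG_SESSION_TYPE first, keep others as fallbacks.

    Unknown or missing session types (e.g. "tty") leave the order unchanged.
    """
    session = env.get("XDG_SESSION_TYPE", "")
    if session not in order:
        return tuple(order)
    return (session,) + tuple(b for b in order if b != session)


if __name__ == "__main__":
    # Environment of the Xvnc greeter session from comment 19.
    print(preferred_backend_order({"DISPLAY": "::1:0", "XDG_SESSION_TYPE": "x11"}))
```

With `XDG_SESSION_TYPE=x11`, the order becomes x11, wayland, broadway, so the Xvnc display is tried before the seat0 wayland-0 socket, while Wayland sessions keep the existing behaviour.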
Comment 22 Michal Srb 2018-07-30 09:07:07 UTC
The solution of reading the XDG_SESSION_TYPE variable to determine the preferred backend has met a lot of resistance in the Gnome bug. I honestly do not understand why. No other solutions were proposed.

This is a real problem and we need to have it fixed. It is reproducible in Leap 15 and SLE15 as well. VNC and X are working as expected; the problem is in GTK, so I am reassigning this to the Gnome team. Sorry for throwing it on you, but I don't think there is more I can do. I think everything is described here and in the Gnome bug, but if you have any questions, feel free to ask.

The patch attached above fixes the issue. I would be in favor of putting it in our gtk3 package, but be aware that upstream does not appear to like it.
Comment 23 Michal Srb 2019-03-12 14:52:54 UTC
Good news, there is progress in the upstream bug. A second bug was opened in order to have just the problem description alone without my proposed solution:
https://gitlab.gnome.org/GNOME/gtk/issues/1741

And it seems that the same solution was chosen in the end... So once there is a final patch, you can replace my temporary one with it.