Bugzilla – Bug 1022727
yast2 problem: Window does not open if started remotely via ssh when DRI3 is enabled
Last modified: 2020-07-31 17:15:27 UTC
From a Leap 42.2 system in combination with a SLES12SP2 server, yast2 does not open its (X) window if started remotely on the SLES12SP2 server via ssh -X -Y. I log in via ssh -X -Y from my Leap 42.2 desktop running KDE to the SLES12SP2 server and then start yast2 in that console. The result: nothing happens. The yast2 window does not appear, and there is no error message, warning or any other output. This only happens when the server is a SLES12SP2. It works as expected if the remote server is, e.g., a SLES12SP1 or SLES11SP4. By chance I found out that if I *locally* start a gnome-terminal, wireshark or probably any other GNOME app on the Leap 42.2 system, close it again, and then log in remotely via ssh -X -Y to the SLES12SP2 server, everything *works* fine. The remote yast2 window appears immediately on my Leap 42.2 desktop. By comparing the processes running on my Leap 42.2 desktop before and after I started gnome-terminal for the very first time, I found that the following D-Bus processes are new after starting gnome-terminal:
user 4782 0.0 0.0 335468 7292 ? Sl 08:21 0:00 /usr/lib/at-spi2/at-spi-bus-launcher
user 4787 0.0 0.0 39616 3824 ? S 08:21 0:00 /bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2/accessibility.conf --nofork --print-address 3
user 4789 0.0 0.0 194712 4876 ? Sl 08:21 0:00 /usr/lib/at-spi2/at-spi2-registryd --use-gnome-session
I guess that these D-Bus processes make the difference. Since yast2 on SLES12SP2 also uses GNOME by default and I run a KDE desktop locally, the problem might be one of two things:
1. The D-Bus processes above should have been started automatically after my KDE login or when doing the ssh to the server, which does not happen.
2. These processes should be started when yast2 is started remotely, but are not.
I was also in contact with a SLES support engineer (SuSE Service Request # 101047389271), but it remains unclear whether the new yast2 on SLES12SP2 causes this problem or whether Leap 42.2 does something wrong. Rainer
Unfortunately it is not obvious what the exact problem is. I added a card to the YaST trello task board so that the issue is prioritised with the other tasks.
This sounds really weird; I have never heard of anything like this before. Can you try to start another plain Qt 5 (non-KDE) program that way? qt5ct (available as a separate package) comes to mind, or any GUI program from /usr/lib64/qt5/bin. I don't know of anything in YaST that would use D-Bus in any special way (any way that other plain Qt 5 programs don't). I suspect that the problem is somewhere deep down in Qt itself.
I just tried:
- opened a remote login via ssh -X -Y on the SLES12SP2 host
- started qtconfig on the remote side. Works.
- started xterm on the remote side. Works.
- started yast2. Nothing happens, no window..... Ctrl-C.
- started gnome-terminal on my local system. OK. Terminated gnome-terminal.
- started yast2 on the remote side. Works.
Yes, this is weird, but it's true. Rainer
Hm - qtconfig belongs to Qt 4. That might behave differently. Can you try with a Qt 5 program?
If you don't find anything different, try this: https://github.com/shundhammer/qdirstat RPMs (SLE-12, Leap, TW): https://software.opensuse.org/download/package?project=home:shundhammer:qdirstat-stable&package=qdirstat I happen to know the author. And it might even be useful for you. ;-)
Thanks for the qdirstat link; SLES12SP2 does not really contain many Qt programs... But first, one more thing: are you sure Qt is to blame? I traced yast2 with bash -x:
rzinstal1:~ # bash -x yast2
+ '[' -f /.instsys.config ']'
...
+ DESKTOP_GUI=gtk
+ '[' gtk = qt -a '!' -x /usr/lib/YaST2/bin/y2controlcenter ']'
+ '[' gtk = gtk -a '!' -x /usr/lib/YaST2/bin/y2controlcenter-gnome ']'
+ WANTED_SHELL=gtk
+ y2ccbin=
+ case "$WANTED_SHELL" in
+ y2ccbin=/usr/lib/YaST2/bin/y2controlcenter-gnome
+ '[' menu == menu -a -x /usr/lib/YaST2/bin/y2controlcenter-gnome ']'
+ /usr/lib/YaST2/bin/y2controlcenter-gnome
So it's using the GNOME version by default. yast2 --qt leads to a call of /usr/lib/YaST2/bin/y2base menu qt. yast2 (GNOME) is the one that works after a gnome-terminal call. yast2 --qt still does not work then; it hangs when started, with no message and no window. qdirstat prints a one-line message "Logging to /tmp/qdirstat-0.log" and then hangs just like yast2 --qt. The SLES12SP2 servers I manage are all VMs running on Citrix XenServer 6.2. It just occurred to me that they are not fresh installs: they were upgraded from SLES11SP4 to SLES12, SLES12SP1 and then to SLES12SP2. I am installing a fresh physical SLES12SP2 right now to see if this makes any difference.
The freshly installed SLES12SP2 host shows the same behavior. By the way, my Leap 42.2 desktop is also freshly installed. Perhaps the only thing that is a little special compared to most other 42.2 installs is that it is an NFS4 client, so my home directory is on an NFS4 share. But I cannot see how this could be a problem. Rainer
Duh - my fault, I had assumed you were running a YaST module directly from that remote server on your machine. So we are actually talking about the control center(s), of which we have two: one Qt-based, which uses some KDE classes, and one Gtk-based. I am trying to find out how much (if any) KDE infrastructure the Qt/KDE one needs to get going. As for the NFS share, it might possibly make a difference for some sockets created in the home directory, or maybe even for file locking. Could you create a local user with a local (non-NFS) home directory on that machine for a short while to exclude that possibility?
GitHub repos for the control centers: https://github.com/yast/yast-control-center (Qt 5) https://github.com/yast/yast-control-center-gnome
Thanks for your answer. Neither NFS nor the NFS home directory was the problem. I myself was the problem, so first of all, sorry for wasting your time... Since yesterday I tried a lot, e.g. installed a new SLES12SP2 and a new 42.2 system with NFS and root filesystem encryption, like the desktop where I have this problem. Doing so, I suddenly remembered that when I first used 42.2 there was an annoying keyring dialog upon KDE login asking me to unlock it. It grabs the keyboard, and since I use KDE Wallet I now had to unlock two keyrings, GNOME and KDE, which is of course nonsense. So I wanted to get rid of gnome-keyring. I tried several hints I could quickly find via Google, but none of them helped, and finally I decided to commit "murder" on gnome-keyring: I uninstalled the gnome-keyring rpm using --nodeps to ignore the listed dependencies. The annoying keyring dialog upon login was gone from then on, but its revenge has haunted me ever since, because the removal caused the ssh problem with yast2. I verified this in the meantime by reinstalling gnome-keyring (remote yast2 works after a KDE re-login) and removing gnome-keyring again (remote yast2 no longer works on SLES12SP2). Strangely enough, I can log in remotely anywhere and everything works, except on SLES12SP2. So the cause is known, and it is not a real 42.2 problem, since ignoring dependencies is dangerous, I know... Does anyone by chance know how to avoid the modal gnome-keyring unlock window upon KDE login...? So you can close this bug. Thanks a lot again.
OK then. Nevermind. Shit happens. ;-)
I guess I was too quick in saying that it works :-( Yesterday I only tested whether the yast2 window opens, which originally was not the case but worked again after reinstalling gnome-keyring. However, I did not test opening any yast2 module, which still does not work from my desktop. I actually reinstalled my desktop today (reconfiguring it as an NFS client again) to avoid trouble from possibly missing packages... Using NFS and centrally managed users also requires configuration for LDAP, krb5, sssd, autofs and NIS. When running yast2 remotely on a SLES12SP2 host, the control center appears. When I select any module to start, nothing happens again. If I try to run e.g. yast sw_single, nothing happens either; no window opens. qdirstat does not open either. On the other hand, I have a notebook with 42.2 where yast2 disk started remotely on the same SLES12SP2 host works without problems. So my guess is that some part of the configuration above might cause the trouble, but at the moment I have no idea how to find out more. If anyone has an idea, please let me know... Thanks Rainer
After searching a while for this symptom I came across this bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1174257#c6 The problem seems to be DRI3 forwarding over ssh. That bug report contains a workaround. If I start yast2 on the remote side like this, it works:
$ LIBGL_DRI3_DISABLE=1 /sbin/yast2 disk
Just running
$ /sbin/yast2 disk
does not work. Why this only happens when I connect to a SLES12SP2 system is something I cannot explain. It happens with SLES12SP2 VMs as well as physical SLES12SP2 systems. Running the following on my local 42.2 system shows that my system is not using DRI3:
$ grep DRI /var/log/Xorg.0.log
[ 2306.669] (II) glamor: EGL version 1.4 (DRI2):
[ 2306.785] (II) modeset(0): [DRI2] Setup complete
[ 2306.785] (II) modeset(0): [DRI2] DRI driver: i965
[ 2306.785] (II) modeset(0): [DRI2] VDPAU driver: i965
[ 2306.808] (II) GLX: Initialized DRI2 GL provider for screen
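A minimal Python sketch of applying that workaround automatically, assuming the remote session can be recognized by the SSH_CONNECTION variable that OpenSSH's sshd sets. The helper name dri3_safe_env and this detection heuristic are hypothetical illustrations, not part of YaST or Mesa.

```python
import os
from typing import Mapping, Optional


def dri3_safe_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with DRI3 disabled for SSH sessions (hypothetical helper)."""
    # Copy so the caller's mapping (or os.environ) is never modified.
    env = dict(os.environ if base is None else base)
    # Assumption: SSH_CONNECTION (set by sshd) indicates a remote login,
    # where X11 traffic is likely forwarded over the SSH channel.
    if "SSH_CONNECTION" in env:
        env["LIBGL_DRI3_DISABLE"] = "1"
    return env
```

Usage: subprocess.run(["/sbin/yast2", "disk"], env=dri3_safe_env()) behaves like the LIBGL_DRI3_DISABLE=1 /sbin/yast2 disk command line when logged in over ssh, and leaves local sessions untouched.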
Well, but then this is a completely separate problem. Please file a separate bug report for this, otherwise everything gets confused. Closing this one again so we can keep track of things.
OK, so we did things in parallel. Sorry for that. Reopening to minimize the confusion. BNC screening team: Please note that this is an X11 or Qt problem, not a YaST problem.
IIRC DRI3 is not mentioned in the X server log when it is enabled. I believe you can see it in the output of glxinfo.
Max, I remember we've seen this before. If time allows, can you look into this again?
If I remember correctly, this is something up with Leap 42.2, not with SLE 12 SP2 or any other SUSE system.
Could well be. I believe we've updated Mesa for Leap 42.2
Hmm, but I think it only happens when we connect *from* Leap 42.2 *to* a DRI 3 user... so maybe the X server? For an easy reproducer, start glxgears instead of a Qt 5 application. It hangs just as well :)
It is not only YaST2: Konqueror etc. have the same problem. There is no problem with 11.4 and 42.1, but there is with 42.2 and 42.3. The workaround LIBGL_DRI3_DISABLE=1 konqueror (or yast) works.
Hmm, I think back in 42.1 we didn't have DRI 3 support in the X server, and now that the machine you're connecting from is advertising it, the X client (i.e. the program you're running on the machine you connected to) locks up. I'm not sure whether this can be patched at all right now.
Note to self: In Mesa, dri3_create_display() sets
pdp->base.createScreen = dri3_create_screen;
Later:
dri3_create_screen()
  calls loader_dri3_open()
  calls xcb_dri3_open_reply()
  calls xcb_wait_for_reply()
  calls wait_for_reply()
  calls poll(), which locks up.
So it seems like we don't get the necessary or full response from the server? I won't get to dig into this right now, I'm afraid, but I'll keep it in mind. It's pretty puzzling for the unsuspecting user indeed. Basically, anything using 3D in any shape or form should be affected.
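To illustrate why the client appears to hang rather than fail, here is a simplified Python sketch of a poll()-based reply wait (Unix only, since it uses select.poll). It is not xcb's code: wait_for_reply_with_timeout and the dummy payload are hypothetical, and unlike the wait in the call chain it takes a timeout. Without one, poll() never returns if the reply bytes never arrive.

```python
import select
import socket
from typing import Optional, Tuple


def wait_for_reply_with_timeout(sock: socket.socket, timeout_ms: int) -> Optional[bytes]:
    """Block until sock is readable or timeout_ms elapses (hypothetical helper)."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    # poll() returns an empty list on timeout; with no timeout it would wait forever.
    if not poller.poll(timeout_ms):
        return None
    return sock.recv(4096)


def demo() -> Tuple[Optional[bytes], Optional[bytes]]:
    client, server = socket.socketpair()
    try:
        # The "server" has not answered yet: this wait times out instead of hanging.
        first = wait_for_reply_with_timeout(client, 100)
        # Once reply bytes are actually sent, the same wait returns them.
        server.sendall(b"dummy reply")
        second = wait_for_reply_with_timeout(client, 100)
        return first, second
    finally:
        client.close()
        server.close()
```

Calling demo() yields (None, b"dummy reply"): the first wait gives up after 100 ms because nothing arrived, which is exactly the situation in which an untimed poll() locks up.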
When the X server answers the DRI3Open request, it first sends a file descriptor over the socket, followed by the reply. This of course cannot work with remote clients, and it would fail if they were connected over a TCP connection. But when forwarding over SSH, the SSH client is connected to the X server locally over a Unix socket, so the file descriptor is sent successfully. I have no idea what the SSH client does with it, but it seems that it gets confused and doesn't forward the regular reply either. Ideally the SSH client would somehow forbid fd passing on the Unix socket, so that the attempt would fail. I did not find any way to do this.
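The file-descriptor passing described here is the SCM_RIGHTS mechanism of Unix domain sockets. A minimal Python sketch (Python 3.9+ for socket.send_fds/socket.recv_fds) that attaches a descriptor to a reply, roughly the way a local X server can; the payload and the temporary file standing in for a device handle are made up for illustration. The receiver only gets the descriptor if it reads with recvmsg-style calls, and a TCP connection has no equivalent mechanism at all.

```python
import os
import socket
import tempfile
from typing import Tuple


def demo_fd_passing() -> Tuple[bytes, bytes]:
    """Send a reply with an attached fd over an AF_UNIX socket pair and read it back."""
    server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        with tempfile.TemporaryFile() as f:
            f.write(b"device handle")  # stand-in for a real device fd
            f.flush()
            # "Server": the fd travels as SCM_RIGHTS ancillary data with the reply bytes.
            socket.send_fds(server_end, [b"REPLY"], [f.fileno()])
            # "Client": recv_fds (recvmsg underneath) picks up both data and descriptor.
            data, fds, _flags, _addr = socket.recv_fds(client_end, 1024, 1)
            received = fds[0]
            try:
                # The received fd refers to the same open file as f.
                os.lseek(received, 0, os.SEEK_SET)
                content = os.read(received, 100)
            finally:
                os.close(received)
        return data, content
    finally:
        server_end.close()
        client_end.close()
```

demo_fd_passing() returns the reply bytes together with the content read through the passed descriptor, showing that the fd arrives as usable kernel state, something an SSH channel cannot carry to another machine.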
The correct solution for this and all similar problems would be for ssh to actively announce to the X server that the connection is remote. The X server has supported this since 2011: https://cgit.freedesktop.org/xorg/xserver/commit/?id=e2c7d70e5ddb8b17676a13ceebfbb87d14d63243 The initial handshake starts with an 'l' or 'B' character defining the byte order. That commit added the 'r' and 'R' characters, which mean little/big endian plus an announcement that the connection is remote. The X server then knows that it should exclude all local-only functionality, including DRI3. All ssh needs to do is change the first byte in the X11 channel. Unfortunately, it looks like this functionality was never implemented in ssh. Ssh already modifies the handshake (it completely replaces the authentication data), so it would be really easy to also change the byte-order character. We should push this feature to ssh upstream. Adding the openssh maintainer to CC. Meanwhile, we can work around it by always setting LIBGL_DRI3_DISABLE=1 in the environment when connecting with ssh.
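A Python sketch of the one-byte change described here, assuming the forwarding proxy sees the client's connection setup as the first bytes of the X11 channel. mark_setup_as_remote is a hypothetical name and is not taken from the openssh patch; it maps 'l' to 'r' and 'B' to 'R', leaves already-marked data alone, and rejects anything else.

```python
# Byte-order characters of the X11 connection setup and their "remote" variants.
_REMOTE_VARIANT = {ord("l"): ord("r"), ord("B"): ord("R")}
_ALREADY_REMOTE = {ord("r"), ord("R")}


def mark_setup_as_remote(setup: bytes) -> bytes:
    """Rewrite the first byte of a client's X11 connection setup (hypothetical helper)."""
    if not setup:
        return setup  # nothing received yet
    first = setup[0]
    if first in _ALREADY_REMOTE:
        return setup  # client already announced itself as remote
    if first not in _REMOTE_VARIANT:
        raise ValueError(f"unexpected byte-order character {chr(first)!r}")
    # Only the byte-order character changes; this function passes the rest through as-is.
    return bytes([_REMOTE_VARIANT[first]]) + setup[1:]
```

A proxy would apply this once, to the first chunk of the channel only, since the byte-order character appears only at the very start of the connection.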
Here is an openssh branch with an experimental patch that marks the forwarded connection as remote: https://build.opensuse.org/package/show/home:michalsrb:branches:bnc1022727:network/openssh It fixes the bug for me.
Created attachment 738857 Mark x11 connection as remote
So as far as I can tell, this can only be fixed on the ssh side: either by using the attached patch (sent to openssh-unix-dev, so far without response, https://lists.mindrot.org/pipermail/openssh-unix-dev/2017-August/036180.html), or by setting the LIBGL_DRI3_DISABLE=1 environment variable when using ssh forwarding. Reassigning to the openssh maintainer.
So the discussion on how to fix this systematically on the SSH side is still ongoing upstream. Meanwhile, there is a workaround we can apply to the X server: https://cgit.freedesktop.org/xorg/xserver/commit/?id=adefbaee499b9679c6cac21f52ec6545af2b36b5 I am going to backport it to solve this bug for now.
Submitted to openSUSE: https://build.opensuse.org/request/show/532204
And to SLE-12: https://build.suse.de/request/show/143135 https://build.suse.de/request/show/143136 https://build.suse.de/request/show/143137
It is necessary only for X servers that have DRI3 (>=1.15.0) and predate the upstream fix (<1.19.4).
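The version range above expressed as a small Python check. The helper names are hypothetical, and server_version_from_package assumes the openSUSE package version format used in the xorg-x11-server update notices in this report (e.g. 7.6_1.18.3-25.1, taken here to mean X server 1.18.3).

```python
from typing import Tuple

DRI3_INTRODUCED = (1, 15, 0)  # first X server version with DRI3, per this report
UPSTREAM_FIX = (1, 19, 4)     # first upstream version containing the fix, per this report


def parse_version(version: str) -> Tuple[int, ...]:
    """'1.18.3' -> (1, 18, 3)"""
    return tuple(int(part) for part in version.split("."))


def server_version_from_package(pkg_version: str) -> str:
    # Assumed format: <X11 release>_<server version>-<package release>
    return pkg_version.split("_", 1)[1].split("-", 1)[0]


def needs_backport(server_version: str) -> bool:
    """True if the server has DRI3 but predates the upstream fix."""
    return DRI3_INTRODUCED <= parse_version(server_version) < UPSTREAM_FIX
```

Tuple comparison makes the range check read like the sentence above; for example, the 1.18.3 server shipped in Leap 42.2/42.3 and SLE 12 falls inside the range, which is consistent with backporting the fix there.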
openSUSE-RU-2017:2732-1: An update that has one recommended fix can now be installed.
Category: recommended (moderate)
Bug References: 1022727
CVE References:
Sources used:
openSUSE Leap 42.3 (src): xorg-x11-server-7.6_1.18.3-25.1
openSUSE Leap 42.2 (src): xorg-x11-server-7.6_1.18.3-12.23.1
SUSE-SU-2017:3047-1: An update that fixes 14 vulnerabilities is now available.
Category: security (moderate)
Bug References: 1022727,1051150,1052984,1061107,1063034,1063035,1063037,1063038,1063039,1063040,1063041
CVE References: CVE-2017-12176,CVE-2017-12177,CVE-2017-12178,CVE-2017-12179,CVE-2017-12180,CVE-2017-12181,CVE-2017-12182,CVE-2017-12183,CVE-2017-12184,CVE-2017-12185,CVE-2017-12186,CVE-2017-12187,CVE-2017-13721,CVE-2017-13723
Sources used:
SUSE Linux Enterprise Software Development Kit 12-SP3 (src): xorg-x11-server-7.6_1.18.3-76.15.2
SUSE Linux Enterprise Software Development Kit 12-SP2 (src): xorg-x11-server-7.6_1.18.3-76.15.2
SUSE Linux Enterprise Server for Raspberry Pi 12-SP2 (src): xorg-x11-server-7.6_1.18.3-76.15.2
SUSE Linux Enterprise Server 12-SP3 (src): xorg-x11-server-7.6_1.18.3-76.15.2
SUSE Linux Enterprise Server 12-SP2 (src): xorg-x11-server-7.6_1.18.3-76.15.2
SUSE Linux Enterprise Desktop 12-SP3 (src): xorg-x11-server-7.6_1.18.3-76.15.2
SUSE Linux Enterprise Desktop 12-SP2 (src): xorg-x11-server-7.6_1.18.3-76.15.2