Bugzilla – Bug 1101560
kwin frequent crashes with nvidia
Last modified: 2023-04-10 15:42:57 UTC
kwin frequently crashes during normal operation since the last kernel update. I tried both the standard Nvidia driver from the repos and the latest one (390.77) recommended by Nvidia (installed the "hard way"), but there is no difference. Unfortunately it is very hard to debug kwin: there is no xsession-errors file visible ... Any help is appreciated ...

system data:

locutus:/home/thommie # uname -r
4.12.14-lp150.12.4-default
locutus:/home/thommie # hwinfo --gfxcard
33: PCI 100.0: 0300 VGA compatible controller (VGA)
  [Created at pci.378]
  Unique ID: VCu0.WokWmMyK+cF
  Parent ID: vSkL.xltNQFyR_DB
  SysFS ID: /devices/pci0000:00/0000:00:01.0/0000:01:00.0
  SysFS BusID: 0000:01:00.0
  Hardware Class: graphics card
  Model: "nVidia GM206 [GeForce GTX 960]"
  Vendor: pci 0x10de "nVidia Corporation"
  Device: pci 0x1401 "GM206 [GeForce GTX 960]"
  SubVendor: pci 0x10b0 "CardExpert Technology"
  SubDevice: pci 0x1401
  Revision: 0xa1
  Driver: "nvidia"
  Driver Modules: "nvidia"
  Memory Range: 0xde000000-0xdeffffff (rw,non-prefetchable)
  Memory Range: 0xc0000000-0xcfffffff (ro,non-prefetchable)
  Memory Range: 0xd0000000-0xd1ffffff (ro,non-prefetchable)
  I/O Ports: 0xe000-0xefff (rw)
  Memory Range: 0x000c0000-0x000dffff (rw,non-prefetchable,disabled)
  IRQ: 57 (126226 events)
  Module Alias: "pci:v000010DEd00001401sv000010B0sd00001401bc03sc00i00"
  Driver Info #0:
    Driver Status: nouveau is not active
    Driver Activation Cmd: "modprobe nouveau"
  Driver Info #1:
    Driver Status: nvidia_drm is active
    Driver Activation Cmd: "modprobe nvidia_drm"
  Driver Info #2:
    Driver Status: nvidia is active
    Driver Activation Cmd: "modprobe nvidia"
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #29 (PCI bridge)
Primary display adapter: #33
Please check whether choosing a different rendering backend makes any difference (systemsettings5 -> Display and Monitor -> Compositor -> Rendering backend). Is there any kwin coredump in the output of the `coredumpctl` command? If yes, please dump it into a file and attach it to the bug.
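A minimal sketch of how such a dump could be written to a file, assuming systemd's coredumpctl is installed and that the crashing process is named kwin_x11 (the name used later in this thread). The function name dump_kwin_core and the default output file name are hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch: write the most recent kwin_x11 core dump to a file for attaching.
# dump_kwin_core and the default file name are hypothetical, not from the bug.
dump_kwin_core() {
    out="${1:-kwin_x11.core}"
    if ! command -v coredumpctl >/dev/null 2>&1; then
        echo "coredumpctl not found" >&2
        return 1
    fi
    # Show the matching dumps first; stop here if listing fails
    # (for example because no kwin_x11 dump has been recorded).
    coredumpctl list kwin_x11 || return 1
    # "dump" extracts the last core dump matching the given name.
    coredumpctl --output="$out" dump kwin_x11
}
```

Calling `dump_kwin_core kwin.core` right after a crash would leave a file that can be attached to the bug, provided a dump was actually recorded.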
Currently I am using the default values (OpenGL 2.0 and scaling "smooth"); I will play with OpenGL 3.1 or XRender for testing. Animation speed is medium (unchanged). The crashes appear especially when moving from one workspace to the next (no matter which effect is used) and when using Windows VMs through VMware Workstation 14 (which is a GTK-based application). Other GTK-based applications (Firefox, Thunderbird) cause no problems. The window decorations disappear and the windows are frozen and cannot be moved. A "kwin_x11 --replace &" restores the previous state. No kwin coredump in coredumpctl yet.
Created attachment 777288 [details] coredumpctl -r Created directly after crashed kwin and before "kwin_x11 --replace"
Changing the compositor to OpenGL 3.1 has no effect. Now trying with XRender, but that reduces the available effects ...
The change to XRender has no effect either. The kwin crash comes a bit less frequently, but it still comes ...
Thank you for trying the rendering backends!

(In reply to Thomas Rother from comment #2)
> Window decoration disappears and windows are freezed/not moveable. A
> "kwin_x11 --replace &" re-creates the previous state.
>
> no kwin coredump in coredumpctl yet

This makes me think that maybe kwin does not fully crash, but is stuck for some reason. Could you please try to get a backtrace and coredump this way:

1) In the graphical session, find the values of the DISPLAY and XAUTHORITY variables:
   > echo $DISPLAY
   :0
   > echo $XAUTHORITY
   /run/user/...
2) SSH into the computer from another machine using the same user as in the graphical session.
3) Set the DISPLAY and XAUTHORITY variables to the same values as above:
   export DISPLAY=:0
   export XAUTHORITY=/run/user/...
4) Start kwin in gdb in the ssh session:
   gdb --args kwin_x11 --replace
5) Reproduce the issue.
6) gdb is now hopefully showing a segfault, abort or something similar. If not, hit CTRL+C to get the gdb prompt. Then run "generate-core-file":
   (gdb) generate-core-file
   Saved corefile core.25218

And attach it to the bug.
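The steps above can be sketched as a small helper, under a few assumptions. Note that `gdb --args` on its own only loads the program and waits at the gdb prompt; the process starts once `run` is issued, which `-ex run` does here automatically. The function name debug_kwin and the DRY_RUN switch are hypothetical and exist only for this illustration; the DISPLAY and XAUTHORITY values must be taken from the graphical session as in step 1.

```shell
#!/bin/sh
# Sketch: run kwin_x11 under gdb from an SSH session.
# debug_kwin and DRY_RUN are hypothetical names used only in this example.
debug_kwin() {
    display="$1"      # value of $DISPLAY in the graphical session, e.g. :0
    xauthority="$2"   # value of $XAUTHORITY in the graphical session
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        echo "DISPLAY=$display XAUTHORITY=$xauthority gdb -ex run --args kwin_x11 --replace"
        return 0
    fi
    # "-ex run" starts the program immediately; without it gdb only loads
    # kwin_x11 and waits at its prompt until "run" is typed.
    DISPLAY="$display" XAUTHORITY="$xauthority" gdb -ex run --args kwin_x11 --replace
}
```

Once gdb stops (on a signal or after CTRL+C), `generate-core-file` can be entered at the (gdb) prompt as described in step 6.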
Created attachment 777296 [details] gtk3 config for VMware Workstation 14 (unchanged values after installation)
In the ssh session, I can start kwin directly (kwin_x11 --replace) but not inside gdb (gdb --args kwin_x11 --replace). The gdb command has no effect on the graphic session and - in case of the crash - the following generate-core-file says: You can't do that without a process to debug. If I start/replace kwin_x11 directly from ssh shell I get the following: thommie@locutus:~> kwin_x11 --replace QXcbConnection: XCB error: 3 (BadWindow), sequence: 179, resource id: 6291462, major code: 20 (GetProperty), minor code: 0 QXcbConnection: XCB error: 3 (BadWindow), sequence: 192, resource id: 6291462, major code: 20 (GetProperty), minor code: 0 org.kde.kwindowsystem.keyserver.x11: Your keyboard setup doesn't provide a key to use for meta. See 'xmodmap -pm' or 'xkbcomp $DISPLAY' OpenGL vendor string: NVIDIA Corporation OpenGL renderer string: GeForce GTX 960/PCIe/SSE2 OpenGL version string: 4.6.0 NVIDIA 390.77 OpenGL shading language version string: 4.60 NVIDIA Driver: NVIDIA Driver version: 390.77 GPU class: Unknown OpenGL version: 4.6 GLSL version: 4.60 X server version: 1.19.6 Linux kernel version: 4.12.14 Requires strict binding: no GLSL shaders: yes Texture NPOT support: yes Virtual Machine: no kf5.kcoreaddons.desktopparser: Property type "Url" is not a known QVariant type. 
Found while parsing property definition for "X-KWin-Video-Url" in "/usr/share/kservicetypes5/kwineffect.desktop"
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 2497, resource id: 0, major code: 14 (GetGeometry), minor code: 0
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 2498, resource id: 0, major code: 14 (GetGeometry), minor code: 0
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 2507, resource id: 0, major code: 14 (GetGeometry), minor code: 0
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 2508, resource id: 0, major code: 14 (GetGeometry), minor code: 0
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 2509, resource id: 0, major code: 14 (GetGeometry), minor code: 0
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 2510, resource id: 0, major code: 14 (GetGeometry), minor code: 0
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 2511, resource id: 0, major code: 14 (GetGeometry), minor code: 0
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 2513, resource id: 0, major code: 14 (GetGeometry), minor code: 0
libkwinglutils: Failed to read shader "blinking-startup-fragment.glsl"

And at the moment of the kwin crash:

trying to show an empty dialog
QOpenGLContext::swapBuffers() called with non-exposed window, behavior is undefined
The X11 connection broke: No error (code 0)
XIO: fatal IO error 17 (File exists) on X server ":0" after 7326 requests (7326 known processed) with 0 events remaining.
QObject::~QObject: Timers cannot be stopped from another thread
QtDBus: cannot relay signals from parent QObject(0x56064198fea0 "") unless they are emitted in the object's thread QThread(0x560641913570 ""). Current thread is QSGRenderThread(0x560641ff8a40 "").

Any ideas?
The next crash showed:

....
trying to show an empty dialog
QOpenGLContext::swapBuffers() called with non-exposed window, behavior is undefined
The X11 connection broke: No error (code 0)
XIO: fatal IO error 17 (File exists) on X server ":0" after 103449 requests (103449 known processed) with 0 events remaining.
QObject::~QObject: Timers cannot be stopped from another thread
QThread::wait: Thread tried to wait on itself
QtDBus: cannot relay signals from parent QObject(0x563d7c7b0680 "") unless they are emitted in the object's thread QThread(0x563d7c735570 ""). Current thread is QSGRenderThread(0x563d7ce9a040 "").
....
The event

QOpenGLContext::swapBuffers() called with non-exposed window, behavior is undefined

is triggered every time I switch from one virtual desktop to another. In most situations nothing happens, but after some tries I get a crash starting with

...
The X11 connection broke: No error (code 0)
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
...
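To see how often the warning occurs compared with the actual connection break, the kwin output could be saved to a file and counted. This is only a sketch: kwin_log_summary and the log file name are hypothetical, and the two search strings are copied from the messages quoted above.

```shell
#!/bin/sh
# Sketch: summarize a saved kwin_x11 log.
# kwin_log_summary is a hypothetical helper name.
kwin_log_summary() {
    log="$1"
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so "|| true" keeps the function usable under set -e.
    warnings=$(grep -cF 'QOpenGLContext::swapBuffers() called with non-exposed window' "$log" || true)
    breaks=$(grep -cF 'The X11 connection broke' "$log" || true)
    echo "swapBuffers warnings: $warnings, X11 connection breaks: $breaks"
}
```

A log can be captured with `kwin_x11 --replace 2>&1 | tee kwin.log` and then passed to `kwin_log_summary kwin.log`.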
Interesting, it looks like this upstream bug: https://bugs.kde.org/show_bug.cgi?id=388127 So it seems kwin terminates because it loses its connection to X. And X probably disconnected it because of the swapBuffers issue. In the bug they claim that it is caused by the Desktop Switch On-Screen Display. (systemsettings5 -> Desktop Behavior -> Virtual Desktops -> Switching) And indeed, if I enable it I can see the same warnings about "QOpenGLContext::swapBuffers". I have not yet been able to reproduce a crash.
Any idea for a workaround so far? I can enter kwin_x11 --replace when switching desktops and after a crash of the UI, but that's a bit annoying ...
You can disable the "Desktop Switch On-Screen Display" (see comment 11) as a workaround. It would actually be good to get confirmation that disabling it prevents the crash.
After disabling the on-screen display for desktop switching in the settings, the crash seems to have disappeared too. I no longer get the events

trying to show an empty dialog
QOpenGLContext::swapBuffers() called with non-exposed window, behavior is undefined

So the workaround is: disable the on-screen display for desktop switching.
this bug also affects me. i also have nvidia gfx card. i now try the workaround with disabling the OSD for desktop switching.
Hmm. Does this only happen with NVIDIA drivers or are other drivers also affected by that issue? Not sure which driver Rainer is using here ...
(In reply to Stefan Dirsch from comment #16)
> Hmm. Does this only happen with NVIDIA drivers or are other drivers also
> affected by that issue? Not sure which driver Rainer is using here ...

as far as i have read about this issue, ALL people who reported it are using nvidia gfx cards with the binary nvidia driver. i am currently using the latest nvidia driver version, 396.45, from today. i have been experiencing this issue for a long time now (also with older drivers from nvidia), but it took me some time to figure out what is crashing here.

i am using:
openSUSE Tumbleweed 20180714
plasmashell 5.13.3
Qt: 5.11.1
KDE Frameworks: 5.48.0
kf5-config: 1.0
kernel: 4.17.8
xorg server: 1.20.0
> The X11 connection broke: No error (code 0)

This happens when a reply was expected but neither a reply nor an error was received. QGLXContext::swapBuffer calls glXSwapBuffers, which eventually calls __glXGetDrawableScreen (because of glvnd), which sends a request to the X server and expects an answer that never arrives.

My guess is that with any regular driver GLX sends back a BadWindow error and kwin ignores it. (At least it usually ignores BadWindow errors, because it works with windows of other applications and those may get destroyed at any time.) But when the nvidia driver is in use, the X server uses nvidia-libglx.so instead. It looks like their GLX implementation does not respond with proper X11 errors in some error paths. If that is true, it is a bug in the nvidia driver.

In addition, kwin should not trigger that error in the first place. So a fix in kwin, or maybe Qt, is also needed.
Ok. Adding our contact at @NVIDIA.
and it really seems to only happen, when switching desktops, and having enabled "Desktop Switch On-Screen Display".
(In reply to Rainer Klier from comment #20) > and it really seems to only happen, when switching desktops, and having > enabled "Desktop Switch On-Screen Display". confirmed. since I disabled On Screen Display to appear during/after the switch of a display/screen, I have NO crashes at all.
have you tried adding this to the xorg conf file:

Option "UseNvKmsCompositionPipeline" "false"

this does fix the problem. source: https://devtalk.nvidia.com/default/topic/1029484/linux/-various-all-distros-numerous-performance-amp-rendering-issues-on-390-25/post/5255627/#5255627

apparently the whole 390.xx series had caused many performance and rendering issues ... but this xorg conf option has fixed the kwin crash issue for me at least.

but now the latest 390.77 (& 396.45) driver release has fixed the kwin crash issue; it would be nice if the driver were updated ASAP. https://devtalk.nvidia.com/default/topic/1037553/unix-graphics-announcements-and-news/linux-solaris-and-freebsd-driver-390-77-long-lived-branch-release-/

(assuming we are talking about the same issue, that is)
The initial reporter stated that 390.77 would *not* fix the issue for him. Hmm ...
(In reply to Alexander Ahjolinna from comment #22) > but now the latest 390.77 (& 396.45) driver release has just fixed kwin > crash issue, it would be nice if the driver would be updated ASP. > https://devtalk.nvidia.com/default/topic/1037553/unix-graphics-announcements- > and-news/linux-solaris-and-freebsd-driver-390-77-long-lived-branch-release-/ > > (assuming we are talking about the same issue that is) i think this is something different, because, as i already stated in #comment17 https://bugzilla.opensuse.org/show_bug.cgi?id=1101560#c17 i already use 396.45 and i had the issue, until i disabled the On Screen Display to appear during/after the switch of a display/screen.
I just noticed a patch for libxcb on xorg-devel mailing list that claims to fix this issue: https://lists.x.org/archives/xorg-devel/2018-August/057419.html A Leap 15 libxcb package with that patch is built here: https://build.opensuse.org/package/show/home:michalsrb:bnc1101560:openSUSE:Leap:15.0:Update/libxcb Repository: https://download.opensuse.org/repositories/home:/michalsrb:/bnc1101560:/openSUSE:/Leap:/15.0:/Update/standard Anyone who is able to reproduce it, can you please test if it really fixes it? Remember to re-enable the "Desktop Switch On-Screen Display" if you disabled it as workaround.
(In reply to Michal Srb from comment #25) > A Leap 15 libxcb package with that patch is built here: > https://build.opensuse.org/package/show/home:michalsrb:bnc1101560:openSUSE: > Leap:15.0:Update/libxcb > > Repository: > https://download.opensuse.org/repositories/home:/michalsrb:/bnc1101560:/ > openSUSE:/Leap:/15.0:/Update/standard > > Anyone who is able to reproduce it, can you please test if it really fixes > it? Remember to re-enable the "Desktop Switch On-Screen Display" if you > disabled it as workaround. i am able to reproduce it, but since i am using tumbleweed i need tumbleweed packages instead of leap packages to test....
(In reply to Rainer Klier from comment #26) > i am able to reproduce it, but since i am using tumbleweed i need tumbleweed > packages instead of leap packages to test.... Sure! Tumbleweed project: https://build.opensuse.org/package/show/home:michalsrb:bnc1101560:openSUSE:Tumbleweed/libxcb Repository: https://download.opensuse.org/repositories/home:/michalsrb:/bnc1101560:/openSUSE:/Tumbleweed/openSUSE_Tumbleweed/
(In reply to Michal Srb from comment #27) > > Sure! Tumbleweed project: > https://build.opensuse.org/package/show/home:michalsrb:bnc1101560:openSUSE: > Tumbleweed/libxcb > > Repository: > https://download.opensuse.org/repositories/home:/michalsrb:/bnc1101560:/ > openSUSE:/Tumbleweed/openSUSE_Tumbleweed/ ok, thanks, already installed the new packages. so far, i didn't face the crash again. but it is only some hours now...... i will report back, when i tried longer.
ok, now a full work day and a half, and no kwin crash so far. looks good! i hope it stays that way. if not, i will update this issue. thanks Michal for the packages!
Thank you for the testing. I have now submitted the patch to devel project and SLE15. It will eventually appear as update in Tumbleweed and Leap 15. Until then you can use the package from my home project. If the bug happens again, please reopen.
(In reply to Michal Srb from comment #30) > Thank you for the testing. > > I have now submitted the patch to devel project and SLE15. It will > eventually appear as update in Tumbleweed and Leap 15. Until then you can > use the package from my home project. > > If the bug happens again, please reopen. Concerning Leap 15, the latest regular update for Nvidia came yesterday, /var/log/zypp/history: 2018-09-01 08:02:01|install|nvidia-gfxG04-kmp-default|390.87_k4.12.14_lp150.11-lp150.10.1|x86_64|root@locutus.netzwissen.loc|nvidia2|0dbf65deeaaaa747d66b3e2475285a129bc8ccb685e5912544abc2510cb2d8ac| 2018-09-01 08:02:05|install|nvidia-glG04|390.87-lp150.10.1|x86_64|root@locutus.netzwissen.loc|nvidia2|5db3e8040de9da04b963864e6379dac6c2d103bab2f158de83f09762d4f09163| 2018-09-01 08:02:06|install|nvidia-computeG04|390.87-lp150.10.1|x86_64|root@locutus.netzwissen.loc|nvidia2|85a91e0e9a2b0c67b207c22688f683b92f041e14437623fda716537a784ce7f0| 2018-09-01 08:02:09|install|x11-video-nvidiaG04|390.87-lp150.10.1|x86_64|root@locutus.netzwissen.loc|nvidia2|1e582cc97be0e965e3ae214488e567c81879da84469e5af2725d0e6dcc2da127| 2018-09-01 08:02:10|install|obs-studio|22.0.2-lp150.1.1|x86_64|root@locutus.netzwissen.loc|packman|33bb95685a5fa4a246b63fe2b6bc6ed25c7d5cf6c349f21555b7840d4820ef82| locutus:/home/thommie # But apparently without the patch, I still see the kwin crash soon after activating the on screen display. Maybe the deployment for Leap 15 failed?
The fix is in libxcb, not in the nvidia driver packages. See comment#25 for a fixed package.
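Since the zypp history shown above only lists nvidia packages, one way to check whether a libxcb update was installed at all is to filter the same history file for libxcb entries. This is a sketch, assuming the "|"-separated layout visible in the quoted log (date, action, package, version, ...); libxcb_installs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list libxcb install entries from a zypp history file.
# libxcb_installs is a hypothetical helper name.
libxcb_installs() {
    file="${1:-/var/log/zypp/history}"
    # Fields are separated by "|": date|action|package|version|arch|...
    awk -F'|' '$2 == "install" && $3 ~ /^libxcb/ { print $1, $3, $4 }' "$file"
}
```

Empty output means no libxcb package appears in the install history, so the driver update alone cannot have brought in the fix.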
(In reply to Stefan Dirsch from comment #34) > The fix is in libxcb, not in the nvidia driver packages. See comment#25 for > a fixed package. Ah, sorry, my fault. I will test the patched libxcb immediately. Thx, Thommie
With Michal's libxcb patch the crash with activated on-screen display seems to appear a bit less often, but it is not gone. If I start kwin from the command line again, I see this after 2-4 switches from one virtual workspace to the other:

thommie@locutus:~> kwin_x11 --replace
QXcbConnection: XCB error: 3 (BadWindow), sequence: 179, resource id: 104857606, major code: 20 (GetProperty), minor code: 0
QXcbConnection: XCB error: 3 (BadWindow), sequence: 192, resource id: 104857606, major code: 20 (GetProperty), minor code: 0
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce GTX 960/PCIe/SSE2
OpenGL version string: 4.6.0 NVIDIA 390.87
OpenGL shading language version string: 4.60 NVIDIA
Driver: NVIDIA
Driver version: 390.87
GPU class: Unknown
OpenGL version: 4.6
GLSL version: 4.60
X server version: 1.19.6
Linux kernel version: 4.12.14
Requires strict binding: no
GLSL shaders: yes
Texture NPOT support: yes
Virtual Machine: no
kf5.kcoreaddons.desktopparser: Property type "Url" is not a known QVariant type.
Found while parsing property definition for "X-KWin-Video-Url" in "/usr/share/kservicetypes5/kwineffect.desktop"
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 2079, resource id: 0, major code: 14 (GetGeometry), minor code: 0
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 2080, resource id: 0, major code: 14 (GetGeometry), minor code: 0
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 2081, resource id: 0, major code: 14 (GetGeometry), minor code: 0
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 2082, resource id: 0, major code: 14 (GetGeometry), minor code: 0
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 2083, resource id: 0, major code: 14 (GetGeometry), minor code: 0
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 2085, resource id: 0, major code: 14 (GetGeometry), minor code: 0
QXcbConnection: XCB error: 9 (BadDrawable), sequence: 2089, resource id: 0, major code: 14 (GetGeometry), minor code: 0
libkwinglutils: Failed to read shader "blinking-startup-fragment.glsl"
trying to show an empty dialog
QOpenGLContext::swapBuffers() called with non-exposed window, behavior is undefined
The X11 connection broke: No error (code 0)
XIO: fatal IO error 17 (File exists) on X server ":0" after 4041 requests (4041 known processed) with 0 events remaining.
QObject::~QObject: Timers cannot be stopped from another thread
QtDBus: cannot relay signals from parent QObject(0x55bd1b45cbd0 "") unless they are emitted in the object's thread QThread(0x55bd1b3be570 ""). Current thread is QSGRenderThread(0x55bd1b9d8c20 "").
Note that the patch did not get to the Leap 15 update yet, so you would have to install it from the repository in comment 25. Can you verify that you have the right libxcb package installed? If you run a command like this:

rpm -q --changelog libxcb1 | head

You should see this on top:

- U_don-t-flag-extra-reply-in-xcb_take_socket.patch
  * Fix IO errors with KWin in combination with NVIDIA driver. (bnc#1101560)
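The check can also be automated by searching the changelog for the patch name. A minimal sketch: has_libxcb_fix is a hypothetical helper that reads changelog text on standard input, and the patch file name is the one quoted above.

```shell
#!/bin/sh
# Sketch: succeed if the changelog text on stdin mentions the libxcb fix.
# has_libxcb_fix is a hypothetical helper name.
has_libxcb_fix() {
    # -q: no output, exit status only; -F: treat the pattern as a fixed string.
    grep -qF 'U_don-t-flag-extra-reply-in-xcb_take_socket.patch'
}
```

For example: `rpm -q --changelog libxcb1 | has_libxcb_fix && echo patched || echo "not patched"`.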
OK, the problem was that I am not familiar enough with the details of OBS. I first tried with the src.rpm, then with the instructions on https://software.opensuse.org//download.html?project=home%3Amichalsrb%3Abnc1101560%3AopenSUSE%3ALeap%3A15.0%3AUpdate&package=libxcb

zypper addrepo https://download.opensuse.org/repositories/home:michalsrb:bnc1101560:openSUSE:Leap:15.0:Update/standard/home:michalsrb:bnc1101560:openSUSE:Leap:15.0:Update.repo
zypper refresh
zypper install libxcb

But the package is "libxcb1", not "libxcb". Now I have the patch:

locutus:/home/thommie # rpm -q --changelog libxcb1 | head
* Mon Aug 13 2018 msrb@suse.com
- U_don-t-flag-extra-reply-in-xcb_take_socket.patch
  * Fix IO errors with KWin in combination with NVIDIA driver. (bnc#1101560)

Will give further feedback during the day ...
The libxcb patch definitely solves the problem with the on-screen display on Leap 15. It should be deployed with the next libxcb upgrade.
SUSE-RU-2018:3228-1: An update that has one recommended fix can now be installed. Category: recommended (moderate) Bug References: 1101560 CVE References: Sources used: SUSE Linux Enterprise Module for Desktop Applications 15 (src): libxcb-1.13-3.3.1 SUSE Linux Enterprise Module for Basesystem 15 (src): libxcb-1.13-3.3.1
openSUSE-RU-2018:3294-1: An update that has one recommended fix can now be installed. Category: recommended (moderate) Bug References: 1101560 CVE References: Sources used: openSUSE Leap 15.0 (src): libxcb-1.13-lp150.2.3.1