Bugzilla – Bug 957061
[nvidia binary] segfault when using TSX (__lll_unlock_elision) (affects plasma5 screen unlocking)
Last modified: 2016-11-28 16:21:16 UTC
After installing the NVidia binary driver (version 358.16 "the hard way" or 352.55 from the repository), I am unable to unlock my plasma5 session once the screen locker steps in. This is not due to #931296, since kcheckpass works for me. Also, after I submit my password, a black screen is shown for about a second and then I'm back at the screen locker.

I dug up several bug reports, but most of them are related to the kcheckpass and PAM problem. This one is not, as the backtrace below shows.

It seems that the Arch guys have been affected and solved it with new Intel microcode (https://bbs.archlinux.org/viewtopic.php?id=196536); however, that was still a Haswell system while mine is Skylake. I tried with the supplied 4.1.12 kernel as well as 4.3.0 from Kernel:Stable. I'll post an update with the new microcode from the Base_System repo (if I get it to be used).

Related:
https://bugs.kde.org/show_bug.cgi?id=346938
https://bugs.kde.org/show_bug.cgi?id=346525

FYI (although I don't think this is related): I have an Asus Strix GTX 970.

=====================================
> gdb /usr/lib64/libexec/kscreenlocker_greet
GNU gdb (GDB; %maintenance_distribution) 7.9.1
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://bugs.opensuse.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib64/libexec/kscreenlocker_greet...(no debugging symbols found)...done.
Missing separate debuginfos, use: zypper install plasma5-workspace-debuginfo-5.4.3-122.1.x86_64
(gdb) run
Starting program: /usr/lib64/libexec/kscreenlocker_greet
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffdb6a9700 (LWP 3260)]
qml: No Fill10 element found in your theme's battery.svg - Using legacy 20% steps for battery icon
file:///usr/share/plasma/look-and-feel/org.openSUSE.desktop/contents/components/InfoPane.qml:52:22: Unable to assign [undefined] to int
file:///usr/share/plasma/look-and-feel/org.openSUSE.desktop/contents/lockscreen/LockScreen.qml:165: TypeError: Cannot read property 'showPassword' of undefined
file:///usr/share/plasma/look-and-feel/org.openSUSE.desktop/contents/lockscreen/LockScreen.qml:207: TypeError: Cannot read property 'ButtonLabel' of undefined
Locked at 1448836583
org.kde.keyboardLayout: Layouts list changed: ("de")
file:///usr/share/plasma/look-and-feel/org.openSUSE.desktop/contents/components/UserDelegate.qml:82:9: QML Image: Cannot open: file:///usr/share/plasma/look-and-feel/org.openSUSE.desktop/contents/components/user-identity
file:///usr/share/plasma/look-and-feel/org.openSUSE.desktop/contents/components/UserDelegate.qml:82:9: QML Image: Cannot open: file:///usr/share/plasma/look-and-feel/org.openSUSE.desktop/contents/components/system-log-out
file:///usr/share/plasma/look-and-feel/org.openSUSE.desktop/contents/components/UserDelegate.qml:82:9: QML Image: Cannot open: file:///usr/share/plasma/look-and-feel/org.openSUSE.desktop/contents/components/system-switch-user
Detaching after fork from child process 3263.
[Thread 0x7fffdb6a9700 (LWP 3260) exited]

Program received signal SIGSEGV, Segmentation fault.
0x00007fffee6d41b8 in __lll_unlock_elision () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007fffee6d41b8 in __lll_unlock_elision () at /lib64/libpthread.so.0
#1 0x00007fffd80ffccc in () at /usr/lib64/libEGL_nvidia.so.0
#2 0x00007fffd808d252 in () at /usr/lib64/libEGL_nvidia.so.0
#3 0x00007fffffffdc70 in ()
#4 0x00007fffd81150b1 in () at /usr/lib64/libEGL_nvidia.so.0
#5 0x0000000000000000 in ()
(gdb)
=====================================
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
stepping : 3
microcode : 0x33
cpu MHz : 800.000
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm hwp hwp_notify hwp_act_window hwp_epp intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1
bugs :
bogomips : 6383.86
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
stepping : 3
microcode : 0x33
cpu MHz : 800.000
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm hwp hwp_notify hwp_act_window hwp_epp intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1
bugs :
bogomips : 6383.86
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
stepping : 3
microcode : 0x33
cpu MHz : 800.000
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm hwp hwp_notify hwp_act_window hwp_epp intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1
bugs :
bogomips : 6383.86
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
stepping : 3
microcode : 0x33
cpu MHz : 800.000
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm hwp hwp_notify hwp_act_window hwp_epp intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1
bugs :
bogomips : 6383.86
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
OK, even after installing the new ucode-intel (and ucode-intel-blob) package from Base:System (which re-creates the initrd), either there are no updates for this CPU or the file is not being used (although mkinitrd reported the use of GenuineIntel.bin):

# dmesg | grep microcode
[ 1.479081] microcode: CPU0 sig=0x506e3, pf=0x2, revision=0x33
[ 1.479090] microcode: CPU1 sig=0x506e3, pf=0x2, revision=0x33
[ 1.479094] microcode: CPU2 sig=0x506e3, pf=0x2, revision=0x33
[ 1.479101] microcode: CPU3 sig=0x506e3, pf=0x2, revision=0x33
[ 1.479136] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba

Anyway, I currently don't see any workaround except installing the nouveau driver, which has worse game performance and ~12W more idle power :(
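
To compare the revision before and after a microcode package update without eyeballing dmesg, something like the following might help. It is only a sketch: the regular expression assumes the exact line format shown in the dmesg output above (other kernel versions may print it differently), and the names `microcode_revisions` and `revision_changed` are made up for this example.

```python
import re

# Assumed line format, taken from the dmesg output in this report:
#   [ 1.479081] microcode: CPU0 sig=0x506e3, pf=0x2, revision=0x33
MICROCODE_LINE = re.compile(
    r"microcode: CPU(?P<cpu>\d+) sig=(?P<sig>0x[0-9a-fA-F]+), "
    r"pf=(?P<pf>0x[0-9a-fA-F]+), revision=(?P<rev>0x[0-9a-fA-F]+)"
)


def microcode_revisions(dmesg_text):
    """Return {cpu_index: revision} for every per-CPU microcode line."""
    revisions = {}
    for line in dmesg_text.splitlines():
        match = MICROCODE_LINE.search(line)
        if match:
            revisions[int(match.group("cpu"))] = int(match.group("rev"), 16)
    return revisions


def revision_changed(before, after):
    """True if any CPU reports a different revision after the update."""
    return any(after.get(cpu) != rev for cpu, rev in before.items())


# Two of the per-CPU lines from above plus the driver banner, which is ignored.
SAMPLE = """\
[ 1.479081] microcode: CPU0 sig=0x506e3, pf=0x2, revision=0x33
[ 1.479090] microcode: CPU1 sig=0x506e3, pf=0x2, revision=0x33
[ 1.479136] microcode: Microcode Update Driver: v2.00
"""

if __name__ == "__main__":
    for cpu, rev in sorted(microcode_revisions(SAMPLE).items()):
        print("CPU%d: revision 0x%x" % (cpu, rev))
```

Saving the output of `dmesg | grep microcode` before and after the package update and feeding both to `microcode_revisions` would show whether the update was actually applied.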
FYI: according to iucode_tool, the 20151106 microcode data file from https://downloadcenter.intel.com/search?keyword=Linux*+Processor+Microcode+Data+File contains no microcode updates for this CPU:

> /usr/sbin/iucode_tool -tb -lS <path_to_extracted_microcodes>
/usr/sbin/iucode_tool: system has processor(s) with signature 0x000506e3
selected microcodes:
>
Actually, the problem does not only affect plasma5: today I encountered it with gnuplot, simply by starting the CLI and exiting right afterwards (without any input).

I did (finally) find a bug report in the NVidia forums, though:
https://devtalk.nvidia.com/default/topic/893325/linux/newest-and-beta-linux-driver-causing-segmentation-fault-core-dumped-on-all-skylake-platforms/

I'm aware that openSUSE probably cannot change this and that we need to wait for NVidia to fix the original issue. In the meantime, maybe this should be mentioned in https://en.opensuse.org/openSUSE:Most_annoying_bugs_42.1 or the release notes so people do not blame the distribution.

Nico
FYI: a temporary workaround is to re-install the following packages _AFTER_ the nvidia binary installer has run:

Mesa-libEGL1 Mesa-libEGL1-32bit

-> this will set the Mesa EGL libs to be used which do not have this bug
This looks like some packaging problem in the nvidia drivers... Stefan, any idea?
(In reply to Nico Kruber from comment #4)
> FYI: a temporary workaround is to re-install the following packages _AFTER_
> the nvidia binary installer has run:
>
> Mesa-libEGL1 Mesa-libEGL1-32bit
>
> -> this will set the Mesa EGL libs to be used which do not have this bug

I seriously doubt this. When the nvidia-glG0X package is installed, the libEGL libs by NVIDIA are preferred (via the /usr/X11R6/lib{,64} entries in /etc/ld.so.conf.d/nvidia-gfxG0X). Reinstalling the Mesa EGL lib packages doesn't change this.
Ok. Not sure how I can help here. If the CPU claims to have the TSX flag, the software tries to use it. Who's wrong here? Firmware maybe still? Compiler? glibc? Or nvidia libs themselves? I don't know. I'm adding my contact at NVIDIA.
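
For reference, the flags line in the /proc/cpuinfo dump from the original report lists both hle and rtm, which are the names Linux uses for the two TSX feature bits. A quick way to check another machine could look like the sketch below; the function name `tsx_flags` is invented for this example, and it only looks at the first flags line it finds, so it assumes all cores advertise the same features.

```python
def tsx_flags(cpuinfo_text):
    """Report whether the first 'flags' line advertises hle and rtm."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return {name: name in present for name in ("hle", "rtm")}
    # No flags line at all: treat both features as absent.
    return {"hle": False, "rtm": False}


if __name__ == "__main__":
    # /proc/cpuinfo exists on Linux only.
    with open("/proc/cpuinfo") as handle:
        print(tsx_flags(handle.read()))
```

If both come back true, the CPU advertises TSX, which matches the statement above that software (here glibc's lock elision) will then try to use it.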
Daniel, are you aware of this issue?
FYI: the people over at Arch Linux [1] claim that the next NVidia driver will have this bug fixed. Unfortunately, the driver is not out yet.

[1] https://bugs.archlinux.org/task/46064?project=1
Possible temporary workaround for now:

Add

/lib64/noelision

to /etc/ld.so.conf on an affected system. Remove this entry again once NVIDIA has fixed the issue in their drivers.

Does this help?
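
Since the entry is meant to be removed again later, a small helper for checking and cleaning the file may be handy. This is a sketch under assumptions: the helper names are invented here, it treats `#` as starting a comment, and it does not follow `include` lines, so it only sees entries written directly in the text it is given. ldconfig normally has to be re-run after /etc/ld.so.conf changes for the change to take effect.

```python
NOELISION_DIR = "/lib64/noelision"  # directory named in the workaround above


def _directive(raw_line):
    """Strip a trailing '#' comment and surrounding whitespace."""
    return raw_line.split("#", 1)[0].strip()


def has_entry(conf_text, entry=NOELISION_DIR):
    """True if the file text contains the entry as an active line."""
    return any(_directive(raw) == entry for raw in conf_text.splitlines())


def without_entry(conf_text, entry=NOELISION_DIR):
    """Return the file text with every active occurrence of entry removed."""
    kept = [raw for raw in conf_text.splitlines() if _directive(raw) != entry]
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    with open("/etc/ld.so.conf") as handle:
        text = handle.read()
    state = "present" if has_entry(text) else "absent"
    print("noelision workaround is " + state)
```

The script only reads the file; writing the result of `without_entry` back (as root) and re-running ldconfig is left as a manual step.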
(In reply to Stefan Dirsch from comment #10)
> Add
>
> /lib64/noelision
>
> to /etc/ld.so.conf

Thank you for this (better) workaround - it seems to work.
In response to Stefan's question in comment #8: yes, NVIDIA is aware of the issue, and a fix has been identified and will hopefully be released soon.
Nico, thanks for the quick response. Daniel, good to hear this. :-)
(In reply to Daniel Dadap from comment #12)
> In response to Stefan's question in comment #8: yes, NVIDIA is aware of the
> issue, and a fix has been identified and will hopefully be released soon.

Hi, what is the fix? It seems to me that the root cause is in libpthread.
(In reply to Oliver Neukum from comment #15)
> Hi, what is the fix? It seems to me that the root cause is in libpthread.

A lock was being destroyed twice, which is undefined behavior. This was a bug in the NVIDIA EGL driver, which so far seems to result in adverse effects only when lock elision is enabled in glibc.
*** Bug 960574 has been marked as a duplicate of this bug. ***
Finally, version 361.18 (beta) [1] solves this issue and is usable (as opposed to 361.16 beta).

However, due to the integration of the OpenGL Vendor-Neutral Driver (GLVND) infrastructure, I had to re-install (via force update) all xorg- and mesa-related packages, since the old nvidia driver installer had removed some libraries. This may be different if the driver is installed via packages, and only a few package re-installs may actually be required (e.g. libgl-related stuff), but I just wanted to be on the safe side.

Should we close this bug, or should we wait for new rpm packages to be available in the repo?

[1] http://www.geforce.com/drivers/beta-legacy, http://www.geforce.com/drivers/results/97474
(In reply to Nico Kruber from comment #18)
> Should we close this bug, or should we wait for new rpm packages to be
> available in the repo?

I guess the rpm way is required for most users, because the fine art of deep debugging is a special ability.
Packages with the fixed driver (352.79) have been prepared and are expected to be released in the repo by the end of this week.
Closing as fixed.