Bug 1075878 - Apocalyptic loss of proc/tmpfs/sysfs mounts after WIFI disconnect through NetworkManager
Apocalyptic loss of proc/tmpfs/sysfs mounts after WIFI disconnect through Net...
Status: NEW
: 1119189 (view as bug list)
Classification: openSUSE
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Network
Current
x86-64 openSUSE Factory
: P5 - None : Critical (vote)
: ---
Assigned To: openSUSE GNOME
E-mail List
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2018-01-13 13:45 UTC by Niels van Mourik
Modified: 2020-08-03 07:22 UTC (History)
3 users (show)

See Also:
Found By: Customer
Services Priority:
Business Priority:
Blocker: Yes
Marketing QA Status: ---
IT Deployment: ---
fvogt: needinfo? (johannes)


Attachments
Terminal screenshot with mount output before (right pane;), and after WIFI disconnect (left). (338.99 KB, image/jpeg)
2018-01-13 13:48 UTC, Niels van Mourik
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Niels van Mourik 2018-01-13 13:45:33 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36
Build Identifier: 

I reinstalled Tumbleweed last week and since then, a major bug grinds my entire system to a halt. So far I figured out that disconnecting from a WIFI network or connecting to a new network, triggers this rather apocalyptic symptom.

As soon as the call to NM happens, procfs disappears rendering the entire system unusable. Commands like sudo/mount/reboot/halt etc no more work, sockets disappear and the only way to reboot is through Alt+SysRq. Kernel and processes keep running, but spawning processes and any other primitive operations fail.

Reproducible: Always

Steps to Reproduce:
All these things trigger the behavior:
- Disconnecting the auto-connected WIFI network through Plasma's network applet.
- Connect to different WIFI network through Plasma's network applet.
- nmcli connection down wlp3s0 <ESSID>
- Logged in as a different user with less privilege groups, same thing happens.
- Removed connections from /etc/NetworkManager/system-connections/ and readded, still happens.
Actual Results:  
Mounts immediately disappear, logging no longer works, system basically melts down. Mouse cursor keeps moving and windows are still drag-gable, tabs in Konsole can still be closed/moved, but no new ones added for instance. Higher level services like D-BUS still seem active, anything low-level is gone.

Expected Results:  
Basic WIFI operation.

$ nmcli
wlp3s0: connected to RNC
        "Intel Wireless 7260 (Dual Band Wireless-AC 7260)"
        wifi (iwlwifi), E8:B1:FC:F7:6A:A2, hw, mtu 1500
        ip4 default
        inet4 10.0.1.22/24
        inet6 fe80::6dd8:d933:d607:d8e1/64

$ dmesg
00:00.0 Host bridge: Intel Corporation Haswell-ULT DRAM Controller (rev 0b)
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)
00:03.0 Audio device: Intel Corporation Haswell-ULT HD Audio Controller (rev 0b)
00:14.0 USB controller: Intel Corporation 8 Series USB xHCI HC (rev 04)
00:16.0 Communication controller: Intel Corporation 8 Series HECI #0 (rev 04)
00:16.3 Serial controller: Intel Corporation 8 Series HECI KT (rev 04)
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I218-LM (rev 04)
00:1b.0 Audio device: Intel Corporation 8 Series HD Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 6 (rev e4)
00:1c.1 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 3 (rev e4)
00:1d.0 USB controller: Intel Corporation 8 Series USB EHCI #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation 8 Series LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 8 Series SATA Controller 1 [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 8 Series SMBus Controller (rev 04)
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5227 PCI Express Card Reader (rev 01)
03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)

$ dmesg|grep "wlp3s0\|iwlwifi\|error\|firmware"
[    3.162392] psmouse serio2: trackpoint: IBM TrackPoint firmware: 0x0e, buttons: 3/3
[    5.372976] tpm tpm0: A TPM error (6) occurred attempting to read a pcr value
[    5.530898] iwlwifi 0000:03:00.0: loaded firmware version 17.608620.0 op_mode iwlmvm
[    5.550854] iwlwifi 0000:03:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144
[    5.570969] iwlwifi 0000:03:00.0: base HW address: e8:b1:fc:f7:6a:a2
[    5.813741] iwlwifi 0000:03:00.0 wlp3s0: renamed from wlan0
[    6.815055] psmouse serio3: trackpoint: IBM TrackPoint firmware: 0x0e, buttons: 3/3
[    7.723688] IPv6: ADDRCONF(NETDEV_UP): wlp3s0: link is not ready
[    7.974563] IPv6: ADDRCONF(NETDEV_UP): wlp3s0: link is not ready
[    8.259831] IPv6: ADDRCONF(NETDEV_UP): wlp3s0: link is not ready
[    8.350551] IPv6: ADDRCONF(NETDEV_UP): wlp3s0: link is not ready
[   11.603406] IPv6: ADDRCONF(NETDEV_UP): wlp3s0: link is not ready
[   14.554637] wlp3s0: authenticate with c0:a0:bb:ee:42:b4
[   14.559795] wlp3s0: send auth to c0:a0:bb:ee:42:b4 (try 1/3)
[   14.563528] wlp3s0: authenticated
[   14.563979] wlp3s0: associate with c0:a0:bb:ee:42:b4 (try 1/3)
[   14.569912] wlp3s0: RX AssocResp from c0:a0:bb:ee:42:b4 (capab=0x431 status=0 aid=2)
[   14.571276] wlp3s0: associated
[   14.836520] IPv6: ADDRCONF(NETDEV_CHANGE): wlp3s0: link becomes ready

$  find /etc/NetworkManager/
/etc/NetworkManager/
/etc/NetworkManager/dispatcher.d
/etc/NetworkManager/dispatcher.d/nfs
/etc/NetworkManager/dispatcher.d/99tlp-rdw-nm
/etc/NetworkManager/dispatcher.d/ntp
/etc/NetworkManager/dispatcher.d/autofs
/etc/NetworkManager/VPN
/etc/NetworkManager/VPN/nm-vpnc-service.name
/etc/NetworkManager/VPN/nm-openvpn-service.name
/etc/NetworkManager/NetworkManager.conf
/etc/NetworkManager/system-connections
/etc/NetworkManager/system-connections/... (stripped for privacy reasons)

$ cat /etc/NetworkManager/NetworkManager.conf
[main]
plugins=ifcfg-suse,keyfile

[connectivity]
uri=http://conncheck.opensuse.org
Comment 1 Niels van Mourik 2018-01-13 13:48:40 UTC
Created attachment 755965 [details]
Terminal screenshot with mount output before (right pane;), and after WIFI disconnect (left).
Comment 2 Niels van Mourik 2018-01-13 14:19:00 UTC
$ uname -a
Linux laptop 4.14.12-1-default #1 SMP PREEMPT Fri Jan 5 18:15:55 UTC 2018 (3cf399e) x86_64 x86_64 x86_64 GNU/Linux

$ ls -l /var/crash
total 0
Comment 3 Niels van Mourik 2018-01-13 14:20:47 UTC
$ tail -n 30 /var/log/wpa_supplicant.log
rfkill: WLAN soft blocked
nl80211: deinit ifname=p2p-dev-wlp3s0 disabled_11b_rates=0
nl80211: deinit ifname=wlp3s0 disabled_11b_rates=0
wlp3s0: SME: Trying to authenticate with <REDACTED MAC> (SSID='<REDACTED ESSID>' freq=5220 MHz)
wlp3s0: Trying to associate with <REDACTED MAC> (SSID='<REDACTED ESSID>' freq=5220 MHz)
wlp3s0: Associated with <REDACTED MAC>
wlp3s0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0
wlp3s0: CTRL-EVENT-REGDOM-CHANGE init=COUNTRY_IE type=COUNTRY alpha2=DE
wlp3s0: WPA: Key negotiation completed with <REDACTED MAC> [PTK=CCMP GTK=TKIP]
wlp3s0: CTRL-EVENT-CONNECTED - Connection to <REDACTED MAC> completed [id=0 id_str=]
wlp3s0: CTRL-EVENT-DISCONNECTED bssid=<REDACTED MAC> reason=3 locally_generated=1
wlp3s0: CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
Successfully initialized wpa_supplicant
wlp3s0: SME: Trying to authenticate with <REDACTED MAC> (SSID='<REDACTED ESSID>' freq=2437 MHz)
wlp3s0: Trying to associate with <REDACTED MAC> (SSID='<REDACTED ESSID>' freq=2437 MHz)
wlp3s0: Associated with <REDACTED MAC>
wlp3s0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0
wlp3s0: CTRL-EVENT-REGDOM-CHANGE init=COUNTRY_IE type=COUNTRY alpha2=DE
wlp3s0: WPA: Key negotiation completed with <REDACTED MAC> [PTK=CCMP GTK=TKIP]
wlp3s0: CTRL-EVENT-CONNECTED - Connection to <REDACTED MAC> completed [id=0 id_str=]
wlp3s0: CTRL-EVENT-DISCONNECTED bssid=<REDACTED MAC> reason=3 locally_generated=1
wlp3s0: CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
Successfully initialized wpa_supplicant
wlp3s0: SME: Trying to authenticate with <REDACTED MAC> (SSID='<REDACTED ESSID>' freq=2437 MHz)
wlp3s0: Trying to associate with <REDACTED MAC> (SSID='<REDACTED ESSID>' freq=2437 MHz)
wlp3s0: Associated with <REDACTED MAC>
wlp3s0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0
wlp3s0: CTRL-EVENT-REGDOM-CHANGE init=COUNTRY_IE type=COUNTRY alpha2=DE
wlp3s0: WPA: Key negotiation completed with <REDACTED MAC> [PTK=CCMP GTK=TKIP]
wlp3s0: CTRL-EVENT-CONNECTED - Connection to <REDACTED MAC> completed [id=0 id_str=]

PLEASE NOTE:
Regulatory domain COUNTRY_IE is incorrect as this device is located in NL, however, this is what my locale has been deliberately set to.
Comment 4 Niels van Mourik 2018-01-13 14:23:47 UTC
JOURNALCTL entries around the incident time:

Jan 13 13:52:06 laptop NetworkManager[1162]: <info>  [1515847926.3671] device (wlp3s0): state change: activated -> deactivating (reason 'user-requested', internal state 'managed')
Jan 13 13:52:06 laptop NetworkManager[1162]: <info>  [1515847926.3672] manager: NetworkManager state is now DISCONNECTING
Jan 13 13:52:06 laptop NetworkManager[1162]: <info>  [1515847926.3700] audit: op="connection-deactivate" uuid="bbaf9de8-dfaa-4637-86ff-d555a0d9b82d" name="<REDACTED ESSID>" pid=6133 uid=1000 result="success"
Jan 13 13:52:06 laptop dbus-daemon[989]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.6' (uid=0 pid=1162 comm="/usr/sbin/NetworkManager --no-daemon ")
Jan 13 13:52:06 laptop systemd[1]: Starting Network Manager Script Dispatcher Service...
Jan 13 13:52:06 laptop NetworkManager[1162]: <info>  [1515847926.3736] device (wlp3s0): state change: deactivating -> disconnected (reason 'user-requested', internal state 'managed')
Jan 13 13:52:06 laptop avahi-daemon[974]: Withdrawing address record for <REDACTED IP-ADDRESS> on wlp3s0.
Jan 13 13:52:06 laptop avahi-daemon[974]: Leaving mDNS multicast group on interface wlp3s0.IPv6 with address <REDACTED IP-ADDRESS>.
Jan 13 13:52:06 laptop avahi-daemon[974]: Interface wlp3s0.IPv6 no longer relevant for mDNS.
Jan 13 13:52:06 laptop dbus-daemon[989]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Jan 13 13:52:06 laptop systemd[1]: Started Network Manager Script Dispatcher Service.
Jan 13 13:52:06 laptop nm-dispatcher[6137]: req:1 'connectivity-change': new request (4 scripts)
Jan 13 13:52:06 laptop nm-dispatcher[6137]: req:1 'connectivity-change': start running ordered scripts...
Jan 13 13:52:06 laptop NetworkManager[1162]: <info>  [1515847926.4117] dhcp4 (wlp3s0): canceled DHCP transaction, DHCP client pid 1842
Jan 13 13:52:06 laptop NetworkManager[1162]: <info>  [1515847926.4117] dhcp4 (wlp3s0): state changed bound -> done
Jan 13 13:52:06 laptop kernel: wlp3s0: deauthenticating from <REDACTED MAC> by local choice (Reason: 3=DEAUTH_LEAVING)
Jan 13 13:52:06 laptop avahi-daemon[974]: Withdrawing address record for <REDACTED IP-ADDRESS> on wlp3s0.
Jan 13 13:52:06 laptop kernel: wlp3s0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-22)
Jan 13 13:52:06 laptop avahi-daemon[974]: Leaving mDNS multicast group on interface wlp3s0.IPv4 with address <REDACTED IP-ADDRESS>.
Jan 13 13:52:06 laptop avahi-daemon[974]: Interface wlp3s0.IPv4 no longer relevant for mDNS.
Jan 13 13:52:06 laptop NetworkManager[1162]: <info>  [1515847926.4382] device (wlp3s0): set-hw-addr: set MAC address to <REDACTED MAC> (scanning)
Jan 13 13:52:06 laptop kernel: IPv6: ADDRCONF(NETDEV_UP): wlp3s0: link is not ready
Jan 13 13:52:06 laptop nscd[967]: 967 monitored file `/etc/resolv.conf` was written to
Jan 13 13:52:06 laptop NetworkManager[1162]: nisdomainname: you must be root to change the domain name
Jan 13 13:52:06 laptop NetworkManager[1162]: <info>  [1515847926.5938] manager: NetworkManager state is now DISCONNECTED
Jan 13 13:52:06 laptop NetworkManager[1162]: <info>  [1515847926.5941] policy: set-hostname: set hostname to 'localhost.localdomain' (no default device)
Jan 13 13:52:06 laptop dbus-daemon[989]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service' requested by ':1.6' (uid=0 pid=1162 comm="/usr/sbin/NetworkManager --no-daemon ")
Jan 13 13:52:06 laptop nm-dispatcher[6137]: req:2 'down' [wlp3s0]: new request (4 scripts)
Jan 13 13:52:06 laptop nm-dispatcher[6137]: req:2 'down' [wlp3s0]: start running ordered scripts...
Jan 13 13:52:06 laptop systemd[1]: Starting Hostname Service...
Jan 13 13:52:06 laptop NetworkManager[1162]: <warn>  [1515847926.6091] sup-iface[0x<REDACTED>,wlp3s0]: connection disconnected (reason -3)
Jan 13 13:52:06 laptop NetworkManager[1162]: <info>  [1515847926.6092] device (wlp3s0): supplicant interface state: completed -> disconnected
Jan 13 13:52:06 laptop dbus-daemon[989]: [system] Successfully activated service 'org.freedesktop.hostname1'
Jan 13 13:52:06 laptop systemd[1]: Started Hostname Service.
Jan 13 13:52:06 localhost.localdomain systemd-hostnamed[6277]: Changed host name to 'localhost.localdomain'
Jan 13 13:52:06 localhost.localdomain nm-dispatcher[6137]: req:3 'hostname': new request (4 scripts)
Jan 13 13:52:06 localhost.localdomain nm-dispatcher[6137]: ping: /dev/disk/by-path/pci-0000: Name or service not known
Jan 13 13:52:06 localhost.localdomain nm-dispatcher[6137]: ping: /dev/disk/by-path/pci-0000: Name or service not known
Jan 13 13:52:06 localhost.localdomain systemd[2044]: Failed to parse /proc/self/mountinfo: No such file or directory
-- Reboot --
Jan 13 13:52:43 localhost kernel: microcode: microcode updated early to revision 0x20, date = 2017-01-27
Jan 13 13:52:43 localhost kernel: Linux version 4.14.12-1-default (geeko@buildhost) (gcc version 7.2.1 20171020 [gcc-7-branch revision 253932] (SUSE Linux)) #1 SMP PREEMPT Fri Jan 5 18:15:55 UTC 2018 (3cf399e)
Comment 5 Niels van Mourik 2018-01-16 11:50:30 UTC
I managed to isolate the problem down to NetworkManager. When I switch over to Wickd in Yast, none of the triggers still cause the system to lose its proc mount.
Comment 6 Johannes Pöhlmann 2018-11-26 12:02:18 UTC
same Problem with openSUSE 15.0

when terminating a ovpn (tun) session or pulling the USB cable form a Jolla smartphone offering network over USB /proc/, /sysfs/, /tmpfs vanish. 

Last rescue then are the magic SysRQ keys S U B.

Problem also seen with wicked instead of Network manager. 

Seems not to be kernel related, as it can be triggered by wicked AND NM, and because of the deep impact ( mounts, /proc/, /sysfs/ go away). 

Seem not to be network card driver related as it can be triggered with several network adapters including "tun".

I am happy to supply more information or make tests on request
Comment 7 Reinhard Max 2018-12-18 10:56:06 UTC
*** Bug 1119189 has been marked as a duplicate of this bug. ***
Comment 8 Johannes Pöhlmann 2020-07-31 13:45:33 UTC
I had "Apocalyptic loss of proc/tmpfs/sysfs mounts" also on openSUSE 15.1. By stopping open VPN or by pulling a usb network adapter. in fact every taking down of a network adapter.

I found a possible trigger: I had managed somehow to have network 
manager AND wickedd active. After disabling the network manager packages and 
switching to wicked in Yast, the Problem has vanished. 

I am fine now. The remaining root questions  are: 

* Why was i able to activate network manager AND wicked (Reasons on Level 1-7 ;-)
* Why can the combination of network manager and wickedd do such damage ? 
* Should the kernel not defend itself better against such misconfiguration ?

uname -a:
Linux xxx 4.12.14-lp151.28.48-default #1 SMP Fri Apr 17 05:38:36 UTC 2020 (18849d1) x86_64 x86_64 x86_64 GNU/Linux
Comment 9 Fabian Vogt 2020-08-03 07:22:39 UTC
(In reply to Johannes Pöhlmann from comment #8)
> I had "Apocalyptic loss of proc/tmpfs/sysfs mounts" also on openSUSE 15.1.
> By stopping open VPN or by pulling a usb network adapter. in fact every
> taking down of a network adapter.
> 
> I found a possible trigger: I had managed somehow to have network 
> manager AND wickedd active. After disabling the network manager packages and 
> switching to wicked in Yast, the Problem has vanished. 
> 
> I am fine now. The remaining root questions  are: 
> 
> * Why was i able to activate network manager AND wicked (Reasons on Level
> 1-7 ;-)

Probably by not using network.service, but also wicked.service or NetworkManager.service. Or maybe dbus activation.

> * Why can the combination of network manager and wickedd do such damage ? 

Does /etc/NetworkManager/dispatcher.d/nfs tun0 down trigger the issue?
You might be able to run it as non-root, in which case the errors should indicate whether it would've broken the mounts.

If so, please provide the output of:
bash -x /etc/NetworkManager/dispatcher.d/nfs tun0 down

> * Should the kernel not defend itself better against such misconfiguration ?

The kernel is most likely innocent here and doesn't care about /proc and such at all. If userspace decides to break itself, it won't mind.
 
> uname -a:
> Linux xxx 4.12.14-lp151.28.48-default #1 SMP Fri Apr 17 05:38:36 UTC 2020
> (18849d1) x86_64 x86_64 x86_64 GNU/Linux