Bugzilla – Bug 1149980
cleanup target system unmounting in umount_finish
Last modified: 2022-05-24 11:05:07 UTC
Created attachment 817376 [details] yast log

umount_finish (https://github.com/yast/yast-installation/blob/master/src/lib/installation/clients/umount_finish.rb) unmounts the filesystems that have been mounted to the target system during the installation (i.e. everything below /mnt). It does so *twice*. Since it usually succeeds in the first iteration, the second iteration mostly fails. And since there is quite some fallback behavior built in (like lazy unmounting and read-only remounting), this leads to quite a mess in the logs and to some unneeded activity. See the attached log fragment for an example.
Tracking in YaST Scrum board.
From that y2log it appears that it's the Btrfs subvolumes that cause all those confusing errors in the log.

All entries in /proc/mounts (reformatted and minus the mount options to avoid Bugzilla mangling them completely):

tmpfs / tmpfs
tmpfs / tmpfs
proc /proc proc
sysfs /sys sysfs
/dev/loop0 /parts/mp_0000 squashfs
/dev/loop1 /parts/mp_0001 squashfs
devtmpfs /dev devtmpfs
devpts /dev/pts devpts
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs
/dev/loop2 /mounts/mp_0000 squashfs
/dev/loop3 /mounts/mp_0001 squashfs
/dev/loop5 /mounts/mp_0003 squashfs
/dev/loop6 /mounts/mp_0004 squashfs
/dev/sda2 /mnt btrfs
/dev/sda2 /mnt/.snapshots btrfs
/dev/sda2 /mnt/boot/grub2/i386-pc btrfs
/dev/sda2 /mnt/boot/grub2/x86_64-efi btrfs
/dev/sda2 /mnt/home btrfs
/dev/sda2 /mnt/opt btrfs
/dev/sda2 /mnt/root btrfs
/dev/sda2 /mnt/srv btrfs
/dev/sda2 /mnt/tmp btrfs
/dev/sda2 /mnt/usr/local btrfs
/dev/sda2 /mnt/var btrfs
devtmpfs /mnt/dev devtmpfs
proc /mnt/proc proc
sysfs /mnt/sys sysfs
tmpfs /mnt/run tmpfs

But we only need mounts to /mnt, so let's grep for that:

/dev/sda2 /mnt btrfs
/dev/sda2 /mnt/.snapshots btrfs
/dev/sda2 /mnt/boot/grub2/i386-pc btrfs
/dev/sda2 /mnt/boot/grub2/x86_64-efi btrfs
/dev/sda2 /mnt/home btrfs
/dev/sda2 /mnt/opt btrfs
/dev/sda2 /mnt/root btrfs
/dev/sda2 /mnt/srv btrfs
/dev/sda2 /mnt/tmp btrfs
/dev/sda2 /mnt/usr/local btrfs
/dev/sda2 /mnt/var btrfs
devtmpfs /mnt/dev devtmpfs
proc /mnt/proc proc
sysfs /mnt/sys sysfs
tmpfs /mnt/run tmpfs

Notice that all the subvolumes of the root filesystem are also there. But unmounting them separately is what fails with that "filesystem busy" error, so that's what we need to avoid.
The trouble is that since we are also installing into a snapshot (i.e. a subvolume), it's very hard to tell them apart:

/dev/sda2 /mnt btrfs rw,relatime,space_cache,subvolid=268,subvol=/@/.snapshots/1/snapshot 0 0
/dev/sda2 /mnt/.snapshots btrfs rw,relatime,space_cache,subvolid=267,subvol=/@/.snapshots 0 0
/dev/sda2 /mnt/boot/grub2/i386-pc btrfs rw,relatime,space_cache,subvolid=266,subvol=/@/boot/grub2/i386-pc 0 0
...
...
/dev/sda2 /mnt/var btrfs rw,relatime,space_cache,subvolid=258,subvol=/@/var 0 0

Filtering out all entries with "subvolid" or "subvol=" leaves nothing. We also can't simply take only the first entry: If the user decides to create a separate /home and format that with Btrfs as well, that would leave /mnt/home mounted. Worse, the user might decide to also create subvolumes and/or prepare for snapshots on that separate /home (unlikely, but possible).
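Just to illustrate why that naive filtering does not work, a hypothetical Ruby sketch (mine, not code from umount_finish.rb):

    # Hypothetical sketch: take all /mnt entries from /proc/mounts and drop
    # every entry whose mount options mention a subvolume. Since the target
    # root itself is mounted with subvol=/@/.snapshots/1/snapshot, this also
    # drops /mnt, so no Btrfs entry is left to unmount.
    mnt_entries = File.readlines("/proc/mounts")
                      .map(&:split)
                      .select { |_dev, path, _type, _opts| path == "/mnt" || path.start_with?("/mnt/") }
    non_subvol = mnt_entries.reject { |_dev, _path, _type, opts| opts =~ /subvol(id)?=/ }
    # non_subvol no longer contains any of the Btrfs mounts, including /mnt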
Deciding which of the entries in /proc/mounts is a true toplevel Btrfs and which is only a subvolume is very hard; even more so if it is possible that there are SEVERAL toplevel Btrfs mounts.

One approach might be to check each entry again during unmounting if it is even still mounted: A previous unmount might have taken care of it as well. Of course that works only if we strictly follow the order of /proc/mounts. But that contradicts the other requirement: Unmount from the deepest nesting upwards to the target's root, because otherwise the root will still be busy. The same applies to mounts like /boot that might have more mounts inside them:

/
├─/boot
│ ├─/boot/efi
│ └─/boot/morestuff

So we NEED to start unmounting from the deepest mounts upwards.

For Btrfs (i.e. all mounts containing the mount options "subvolid" or "subvol=") it is tempting to make an exception and start from the top. That works well as long as only subvolumes of that same Btrfs are mounted inside it. But it is entirely possible that another Btrfs is mounted into that Btrfs, and that one might also have those mount options "subvolid" or "subvol=". In that case, it would not be unmounted first, and unmounting the toplevel Btrfs would fail with "filesystem busy".
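To make that ordering requirement concrete, a minimal Ruby sketch (again mine, not the shipped code) that unmounts the deepest mount points first:

    # Hypothetical sketch: sort the target mount points by depth (number of
    # path components), deepest first, so that /mnt itself comes last.
    paths = File.readlines("/proc/mounts")
                .map { |line| line.split[1] }
                .select { |path| path == "/mnt" || path.start_with?("/mnt/") }
    paths.sort_by { |path| -path.split("/").size }.each do |path|
      system("umount", path) # in this sketch, a failure is simply ignored
    end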
So the only realistic option I see is to resort to yast-storage's device graph and its knowledge about subvolumes: Filter those paths out when unmounting.
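For completeness, a rough sketch of how that could look; the Y2Storage calls (StorageManager.instance.staging, #btrfs_subvolumes, MountPoint#path) are from memory and may not match the current yast-storage-ng API exactly:

    require "y2storage"

    # Rough sketch (not the actual fix): collect the mount paths of all Btrfs
    # subvolumes known to the storage device graph so that those paths can be
    # skipped when unmounting the target system.
    def subvolume_mount_paths
      staging = Y2Storage::StorageManager.instance.staging
      staging.filesystems
             .select { |fs| fs.is?(:btrfs) }
             .flat_map(&:btrfs_subvolumes)
             .map(&:mount_point)
             .compact
             .map(&:path)
    end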
It turns out that you can unmount btrfs subvolumes just fine; but at least one of them, /mnt/var, tends to be busy because libzypp uses /mnt/var/cache/zypp/packages/... for its package cache. If unmounting that one fails with a "busy" error, unmounting its parent Btrfs main volume at /mnt also fails with the same error.
Pull request: https://github.com/yast/yast-installation/pull/975
Related: New bug #1189793 "libzypp cache keeps /mnt/var mount busy" that I wrote when I tested my changes.
New y2log in the error case if something could not be unmounted (/mnt/var in this case):

2021-08-26 12:35:30 <1> install(4976) [Ruby] clients/umount_finish.rb(umount_target_mounts):79 Paths to unmount: ["/mnt/run", "/mnt/sys", "/mnt/proc", "/mnt/dev", "/mnt/home", "/mnt/boot/grub2/i386-pc", "/mnt/boot/grub2/x86_64-efi", "/mnt/opt", "/mnt/root", "/mnt/srv", "/mnt/usr/local", "/mnt/var", "/mnt"]

...unmounting each one in turn...

2021-08-26 12:35:30 <2> install(4976) [Ruby] clients/umount_finish.rb(unmount_summary):97 Leftover paths that could not be unmounted: ["/mnt/var", "/mnt"]
2021-08-26 12:35:30 <2> install(4976) [Ruby] clients/umount_finish.rb(log_running_processes):185 Running processes using ["/mnt/var", "/mnt"]:
                USER   PID    ACCESS  COMMAND
/mnt/var:       root   kernel mount   /mnt/var
                root   4976   f....   Zypp-main
/mnt:           root   kernel mount   /mnt
2021-08-26 12:35:30 <1> install(4976) [Ruby] clients/umount_finish.rb(dump_file):107 /proc/mounts:
...
...
/dev/sda2 /mnt btrfs rw,relatime,space_cache,subvolid=256,subvol=/@ 0 0
/dev/sda2 /mnt/var btrfs rw,relatime,space_cache,subvolid=258,subvol=/@/var 0 0
2021-08-26 12:35:30 <1> install(4976) [Ruby] clients/umount_finish.rb(write):64 umount_finish.rb done
Created attachment 852070 [details] New y2log snippet while umount_finish is running (2)
It appears that we have another fallback during installation that tries to umount leftover /mnt mounts after YaST has finished. I'll check the y2start scripts.
Thanks; the log looks much better now and it even gives useful hints.
I would have expected zypp to be gone after these steps in pre_umount_finish:
https://github.com/yast/yast-installation/blob/master/src/lib/installation/clients/pre_umount_finish.rb#L56-L62

Otherwise I don't see where else it could come from, considering our workflow:
https://github.com/yast/yast-installation/blob/master/src/lib/installation/clients/inst_finish.rb#L494-L497
The emergency umount code:
https://github.com/yast/yast-installation/blob/master/startup/First-Stage/F10-cleanup#L9-L10

See also bug #1189793 comment #7 for why we shouldn't rely on that and should get this fixed before it escalates to that last-ditch defense script.
Closing this refactoring bug after the PR is merged. For the remaining open mount with the libzypp cache, see the separate bug #1189793.
SR to OBS: https://build.opensuse.org/request/show/914406
SR to IBS: https://build.suse.de/request/show/248874