Bug 1149980 - cleanup target system unmounting in umount_finish
cleanup target system unmounting in umount_finish
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Installation
Current
Other Other
: P3 - Medium : Normal (vote)
: ---
Assigned To: YaST Team
Jiri Srain
https://trello.com/c/XaqRXabj
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2019-09-09 09:57 UTC by Steffen Winterfeldt
Modified: 2022-05-24 11:05 UTC (History)
0 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
yast log (17.58 KB, text/plain)
2019-09-09 09:57 UTC, Steffen Winterfeldt
Details
New y2log snippet while umount_finish is running (13.80 KB, text/plain)
2021-08-26 11:01 UTC, Stefan Hundhammer
Details
New y2log snippet while umount_finish is running (2) (11.24 KB, text/plain)
2021-08-26 11:02 UTC, Stefan Hundhammer
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Steffen Winterfeldt 2019-09-09 09:57:14 UTC
Created attachment 817376 [details]
yast log

umount_finish (https://github.com/yast/yast-installation/blob/master/src/lib/installation/clients/umount_finish.rb)

unmounts the filesystems that have been mounted to the target system during
the installation (i.e. everything below /mnt).

It does so *twice*.

As it usually succeeds in the first iteration, the second iteration mostly fails.

And as there's quite some fallback stuff programmed in (like lazy unmounting and
ro-remounting) this leads to quite a mess in the logs and some unneeded activities.

See attached log fragment for an example.
Comment 1 Steffen Winterfeldt 2019-09-09 10:04:45 UTC
Tracking in YaST Scrum board.
Comment 2 Stefan Hundhammer 2021-08-10 08:23:25 UTC
From that y2log it appears that it's the Btrfs subvolumes that cause all those confusing errors in the log.

All entries in /proc/mounts (reformatted and minus the mount option to avoid Bugzilla mangling it completely):

tmpfs       /                           tmpfs
tmpfs       /                           tmpfs
proc        /proc                       proc
sysfs       /sys                        sysfs
/dev/loop0  /parts/mp_0000              squashfs
/dev/loop1  /parts/mp_0001              squashfs
devtmpfs    /dev                        devtmpfs
devpts      /dev/pts                    devpts
rpc_pipefs  /var/lib/nfs/rpc_pipefs     rpc_pipefs
/dev/loop2  /mounts/mp_0000             squashfs
/dev/loop3  /mounts/mp_0001             squashfs
/dev/loop5  /mounts/mp_0003             squashfs
/dev/loop6  /mounts/mp_0004             squashfs
/dev/sda2   /mnt                        btrfs
/dev/sda2   /mnt/.snapshots             btrfs
/dev/sda2   /mnt/boot/grub2/i386-pc     btrfs
/dev/sda2   /mnt/boot/grub2/x86_64-efi  btrfs
/dev/sda2   /mnt/home                   btrfs
/dev/sda2   /mnt/opt                    btrfs
/dev/sda2   /mnt/root                   btrfs
/dev/sda2   /mnt/srv                    btrfs
/dev/sda2   /mnt/tmp                    btrfs
/dev/sda2   /mnt/usr/local              btrfs
/dev/sda2   /mnt/var                    btrfs
devtmpfs    /mnt/dev                    devtmpfs
proc        /mnt/proc                   proc
sysfs       /mnt/sys                    sysfs
tmpfs       /mnt/run                    tmpfs


But we only need mounts to /mnt, so let's grep for that:

/dev/sda2   /mnt                        btrfs
/dev/sda2   /mnt/.snapshots             btrfs
/dev/sda2   /mnt/boot/grub2/i386-pc     btrfs
/dev/sda2   /mnt/boot/grub2/x86_64-efi  btrfs
/dev/sda2   /mnt/home                   btrfs
/dev/sda2   /mnt/opt                    btrfs
/dev/sda2   /mnt/root                   btrfs
/dev/sda2   /mnt/srv                    btrfs
/dev/sda2   /mnt/tmp                    btrfs
/dev/sda2   /mnt/usr/local              btrfs
/dev/sda2   /mnt/var                    btrfs
devtmpfs    /mnt/dev                    devtmpfs
proc        /mnt/proc                   proc
sysfs       /mnt/sys                    sysfs
tmpfs       /mnt/run                    tmpfs


Notice that all the snapshots on the root filesystem are also there. But unmounting them separately is what fails with that "filesystem busy" error, so that's what we need to avoid.
Comment 3 Stefan Hundhammer 2021-08-10 08:29:48 UTC
The trouble is that since we are also installing into a snapshot (i.e. subvolume), it's very hard to tell them apart:


/dev/sda2   /mnt  btrfs
  rw,relatime,space_cache,subvolid=268,subvol=/@/.snapshots/1/snapshot
  0  0


/dev/sda2   /mnt/.snapshots  btrfs
  rw,relatime,space_cache,subvolid=267,subvol=/@/.snapshots
  0  0


/dev/sda2   /mnt/boot/grub2/i386-pc  btrfs
  rw,relatime,space_cache,subvolid=266,subvol=/@/boot/grub2/i386-pc
  0  0

...
...

/dev/sda2   /mnt/var  btrfs
  rw,relatime,space_cache,subvolid=258,subvol=/@/var
  0  0


Filtering out all entries with "subvolid" or "subvol=" leaves nothing.

We also can't only take the first entry: If the user decides to create a separate /home and format that with Btrfs as well, that would leave /mnt/home mounted. Worse, the user might decide to also create subvolumes and/or prepare for snapshots on that separate /home (unlikely, but possible).
Comment 4 Stefan Hundhammer 2021-08-10 08:47:32 UTC
Deciding which of the entries in /proc/mounts is a true toplevel Btrfs and which is only a subvolume is very hard; even more so if it is possible that there are SEVERAL toplevel Btrfs mounts.

One approach might be to check each entry during unmounting avain if it's even mounted: A previous unmount might have taken care of it as well. Of course that works only if we strictly follow the order of /proc/mounts.

But that is in contradiction of the other requirement: Unmount from deeper nesting upwards to the target's root because otherwise the root will still be busy. The same applies to mounts like /boot that might have more mounts inside them.

  /
  ├─/boot
  │ ├─/boot/efi
  │ └─/boot/morestuff

So we NEED to start unmounting from the deepest mounts upwards.

For Btrfs (i.e. all mounts containing mount options "subvolid" or "subvol=") it is tempting to make an exception and start from the top. That works well as long as it's only subvolumes of that same Btrfs mounted inside it.

But it is entirely possible that another Btrfs is mounted to that Btrfs, and that might also have those mount options "subvolid" or "subvol=". In that case, it would not be unmounted first, and unmounting the toplevel Btrfs would fail with "filesystem busy".
Comment 5 Stefan Hundhammer 2021-08-10 08:48:40 UTC
So the only realistic option I see is to resort to yast-storage's device graph and its knowledge about subvolumes: Filter those paths out when unmounting.
Comment 6 Stefan Hundhammer 2021-08-23 10:56:53 UTC
It turns out that you can unmount btrfs subvolumes just fine; but at least one of them, /mnt/var, tends to be busy because libzypp uses /mnt/var/cache/zypp/packages/... for its package cache. If unmounting that one fails with a "busy" error, unmounting its parent Btrfs main volume at /mnt also fails with the same error.
Comment 7 Stefan Hundhammer 2021-08-25 13:03:40 UTC
Pull request: https://github.com/yast/yast-installation/pull/975
Comment 8 Stefan Hundhammer 2021-08-25 13:03:58 UTC
Related: New bug #1189793 "libzypp cache keeps /mnt/var mount busy" that I wrote when I tested my when I tested my changes.
Comment 9 Stefan Hundhammer 2021-08-26 11:00:31 UTC
New y2log in the error case if something could not be unmounted (/mnt/var in this case):



2021-08-26 12:35:30 <1> install(4976) [Ruby]
  clients/umount_finish.rb(umount_target_mounts):79
  Paths to unmount:
  ["/mnt/run", "/mnt/sys", "/mnt/proc", "/mnt/dev", "/mnt/home",
  "/mnt/boot/grub2/i386-pc", "/mnt/boot/grub2/x86_64-efi",
  "/mnt/opt", "/mnt/root", "/mnt/srv", "/mnt/usr/local",
  "/mnt/var", "/mnt"]

...unmounting each one in turn...



2021-08-26 12:35:30 <2> install(4976) [Ruby]
  clients/umount_finish.rb(unmount_summary):97
  Leftover paths that could not be unmounted:
  ["/mnt/var", "/mnt"]

2021-08-26 12:35:30 <2> install(4976) [Ruby]
  clients/umount_finish.rb(log_running_processes):185

Running processes using ["/mnt/var", "/mnt"]:
                     USER        PID ACCESS COMMAND
/mnt/var:            root     kernel mount /mnt/var
                     root       4976 f.... Zypp-main
/mnt:                root     kernel mount /mnt

2021-08-26 12:35:30 <1> install(4976) [Ruby]
  clients/umount_finish.rb(dump_file):107


/proc/mounts:

...
...
/dev/sda2 /mnt btrfs rw,relatime,space_cache,subvolid=256,subvol=/@ 0 0
/dev/sda2 /mnt/var btrfs rw,relatime,space_cache,subvolid=258,subvol=/@/var 0 0


2021-08-26 12:35:30 <1> install(4976) [Ruby]
  clients/umount_finish.rb(write):64 umount_finish.rb done
Comment 11 Stefan Hundhammer 2021-08-26 11:02:37 UTC
Created attachment 852070 [details]
New y2log snippet while umount_finish is running (2)
Comment 12 Stefan Hundhammer 2021-08-26 11:09:47 UTC
It appears to me that we seem to have another fallback during installation that tries to umount leftover /mnt mounts after YaST has finished. I'll check the y2start scripts.
Comment 14 Steffen Winterfeldt 2021-08-26 11:49:45 UTC
Thanks; the log looks much better now and it even gives useful hints.
Comment 15 Steffen Winterfeldt 2021-08-26 11:58:43 UTC
I would have expected that after these steps in pre_umount_finish

https://github.com/yast/yast-installation/blob/master/src/lib/installation/clients/pre_umount_finish.rb#L56-L62

zypp should be gone.

Otherwise I don't see where else it should come from considering our workflow:

https://github.com/yast/yast-installation/blob/master/src/lib/installation/clients/inst_finish.rb#L494-L497
Comment 16 Stefan Hundhammer 2021-08-26 12:07:18 UTC
The emergency umount code:

https://github.com/yast/yast-installation/blob/master/startup/First-Stage/F10-cleanup#L9-L10

see also bug #1189793 comment #7 for why we shouldn't rely on that and get this fixed before it escalates to that last ditch defense script.
Comment 17 Stefan Hundhammer 2021-08-26 13:21:03 UTC
Closing this refactoring bug after the PR is merged.

For the remaining open mount with the libzypp cache, see the separate bug #1189793.
Comment 18 Stefan Hundhammer 2021-08-26 13:51:20 UTC
SR to OBS:

  https://build.opensuse.org/request/show/914406

SR to IBS:

  https://build.suse.de/request/show/248874