Bugzilla – Bug 1148500
lvm2 2.02.180-327: vgs hangs with "device XXX not initialized in udev database"
Last modified: 2021-12-30 08:19:34 UTC
Created attachment 815982 [details]
log of "vgs -vvvv"
OpenQA problem https://openqa.opensuse.org/tests/1018187, installation with openSUSE-Staging:G-Tumbleweed-DVD-x86_64-Build277.1-Media.iso, installing on encrypted LVM.
Blocking LVM2 release (https://build.opensuse.org/request/show/725506)
ps aux shows system hanging in "vgs", called by dracut:
> root 18034 0.0 0.3 4100 3228 tty1 S+ 03:06 0:00 \_ /bin/bash --norc /sbin/mkinitrd
> root 18047 0.0 0.4 5520 4644 tty1 S+ 03:06 0:00 \_ /bin/bash -p /usr/bin/dracut --logfile /var/log/YaST2/mkinitrd.log --force /boot/initrd-5.2.9-5-default 5.2.9-5-default
> root 19895 0.0 0.3 5520 3036 tty1 S+ 03:23 0:00 \_ /bin/bash -p /usr/bin/dracut --logfile /var/log/YaST2/mkinitrd.log --force /boot/initrd-5.2.9-5-default 5.2.9-5-default
> root 19896 0.1 0.7 10732 7580 tty1 S+ 03:23 0:00 \_ lvm vgs --noheadings -o pv_name system
I reproduced this locally. vgs emits hundreds of messages like this:
> #device/dev-type.c:1042 Device /dev/loop0 not initialized in udev database (89/100, 8800000 microseconds).
strace shows that vgs is trying to access the udev db under /run/udev/data directly by opening files under that directory. This fails because /run is not mounted in the chroot (system being installed). As soon as a bind-mount is created for /run into the chroot, vgs and then the installation proceeds and completes:
mount --bind /run /mnt/run
I've assigned this to the "installer" component because it can be fixed in a simple way by bind-mounting /run. I don't know how this problem is related to the lvm2 update from sr#725506.
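The condition described in this comment (no udev database visible inside the chroot) can be checked before running vgs. The following is only a minimal sketch: the function name `udev_db_visible` and the default target root `/mnt` are illustrative assumptions, not part of lvm2, dracut, or YaST.

```shell
#!/bin/bash
# Sketch: report whether the udev database is visible below a given root.
# ASSUMPTION: the function name and the default root /mnt are hypothetical.
udev_db_visible() {
    local root="${1:-/mnt}"
    local db="${root%/}/run/udev/data"
    # Treat the database as visible only if the directory exists
    # and contains at least one entry.
    [ -d "$db" ] && [ -n "$(ls -A "$db" 2>/dev/null)" ]
}
```

If `udev_db_visible /mnt` returns non-zero, libudev lookups from inside the chroot can be expected to fail in the way shown above.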
Created attachment 815984 [details]
strace of dracut
This is strace -f -p output of dracut trying to build the initrd. You can see vgs trying to access files under /run/udev/data and getting -ENOENT.
> 12453 openat(AT_FDCWD, "/run/udev/data/b7:0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> 12453 openat(AT_FDCWD, "/run/udev/data/b7:0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
dracut seems to re-run vgs in this situation, so that the problem occurs repeatedly:
> 12453 execve("/sbin/lvm", ["lvm", "vgs", "--noheadings", "-o", "pv_name", "system"], 0x555dec58c490 /* 101 vars */) = 0
> 12575 execve("/sbin/lvm", ["lvm", "vgs", "--noheadings", "-o", "pv_name", "system"], 0x555dec58c490 /* 101 vars */) = 0
> 12671 execve("/sbin/lvm", ["lvm", "vgs", "--noheadings", "-o", "pv_name", "system"], 0x555dec58c490 /* 101 vars */) = 0
> ...
At a certain point, I created the bind mount as mentioned in comment 0. From that point on, dracut made progress.
> 12817 execve("/sbin/lvm", ["lvm", "vgs", "--noheadings", "-o", "pv_name", "system"], 0x555dec58c490 /* 101 vars */) = 0
> 12817 openat(AT_FDCWD, "/run/udev/data/b7:1", O_RDONLY|O_CLOEXEC) = 5
> 12817 openat(AT_FDCWD, "/run/udev/data/b7:29", O_RDONLY|O_CLOEXEC) = 5
> ...
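To get a quick measure of how often the lookup fails in a saved strace log like the one attached here, the failing openat() calls can be counted with grep. This is a rough sketch; the function name `count_udev_enoent` is a hypothetical helper, and the pattern assumes the strace line format shown in this comment.

```shell
#!/bin/bash
# Sketch: count openat() calls on /run/udev/data that failed with ENOENT.
# ASSUMPTION: $1 is a text file with strace output in the format quoted above.
count_udev_enoent() {
    # grep -c prints the number of matching lines (0 if none match).
    grep -c 'openat(.*"/run/udev/data/[^"]*".*= -1 ENOENT' "$1"
}
```

A count that stops growing after the bind mount is created would match the behaviour described above.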
Hi Martin, There are many pictures (snapshots) at https://openqa.opensuse.org/tests/1018187, and it is not easy to tell what the test case is trying to do. Do you know how to reproduce this problem step by step, without the OpenQA environment? Thanks, Gang
You can ignore all "grey" pictures. Basically you have to install using the media linked on the OpenQA page, and choose "LVM based proposal" and "Disk encryption" in the "guided setup". I don't think the rest matters much. I used "minimal X" installation type, as the OpenQA test did. Anyway, perhaps you want to focus on the results I already obtained. Are these checks for /run/udev/data files ("openat(AT_FDCWD, "/run/udev/data/b7:1", O_RDONLY|O_CLOEXEC)") introduced by the new code base and/or the added patches? If not, I'd focus on the question why /run is not bind-mounted into the chroot. I believe it should be, whether or not encryption and/or LVM is being used.
Hello Martin and All, How do we proceed with this problem? These patches (fixing bsc#1145231) are included in lvm2 v2.02.183. I don't think the patch is wrong, but I do not know why it blocks our staging tests. Thanks, Gang
The obvious solution is to make /run available in the chroot. Honestly, I don't know how installation ever worked without this; in particular, initrd creation and device detection are bound to fail without /run. Applications rightfully assume that they can make libudev calls. And libudev doesn't work if it can't find the udev db under /run/udev.
(Correction wrt comment 0: lvm2 does *not* open files under /run/udev directly. Rather, we are looking at valid libudev calls.)
(One might argue that it's wrong to repeat the search for the udev data over and over again. lvm iterates 100 times and waits 0.1s each time, for each device in the system, which makes for a long wait, and then dracut seems to repeat the "lvm vgs" call, either indefinitely or, at least, very often. But even if we cut down on these wait times, on systems with many devices, the OpenQA timeout would occur.)
So, @yast team, please make sure that /run is bind-mounted into the /mnt chroot during installation, in particular while the initrd is rebuilt.
The open question for me is: why did we see this only with encrypted volumes? I would expect this problem to occur whenever lvm2 "scan" commands are run in the chroot environment and /run isn't available there, which has nothing to do with encryption. Is it maybe an OpenQA test artifact? Perhaps, without encryption, there are fewer block devices overall, so that the OpenQA timeout is avoided? Or maybe bind-mounting /run into the chroot simply fails when encryption is on?
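The retry figures in this comment (100 iterations of 0.1s per device) give a rough upper bound for a single "lvm vgs" run. The sketch below only does that arithmetic; the function name is hypothetical, and it assumes every device exhausts all retries, which is the worst case rather than a measured value.

```shell
#!/bin/bash
# Sketch: worst-case wait of one vgs run, in seconds.
# ASSUMPTION: each device uses all 100 retries of 100 ms, as described above.
worst_case_wait_seconds() {
    local devices="$1"
    local retries=100
    local interval_ms=100
    # devices * retries * interval, converted from milliseconds to seconds
    echo $(( devices * retries * interval_ms / 1000 ))
}
```

For example, `worst_case_wait_seconds 30` prints 300, i.e. five minutes for one vgs invocation, before dracut repeats the call.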
> https://openqa.opensuse.org/tests/overview?distri=microos&distri=opensuse&version=Staging%3AG&build=277.1&groupid=2
In this staging project (or perhaps generally, in staging projects), "cryptlvm" was the only test involving LVM. That explains why this test failed while others passed.
@Jiri, can someone from your team please look at this? We need /run mounted in the /mnt chroot during installation.
Well, I guess that /run was created as a regular directory - it is part of the filesystem package. YaST does the following mounts after partitioning the disk:
mount_in_target("/dev", "devtmpfs", "-t devtmpfs")
mount_in_target("/proc", "proc", "-t proc")
mount_in_target("/sys", "sysfs", "-t sysfs")
mount_in_target(EFIVARS_PATH, "efivarfs", "-t efivarfs") if File.exist?(EFIVARS_PATH)
Can you see anything else missing?
(In reply to Martin Wilck from comment #10)
> > https://openqa.opensuse.org/tests/overview?distri=microos&distri=opensuse&version=Staging%3AG&build=277.1&groupid=2
>
> In this staging project (or perhaps generally, in staging projects),
> "cryptlvm" was the only test involving LVM. That explains why this test
> failed while others passed.
Hello Martin and All, I just tried LVM as the system disk without encryption, and the bug still happened. I can confirm the bug is not related to LVM encryption; it is related to access to udev-related files. Thanks, Gang
(In reply to Jiri Srain from comment #12)
> Can you see anything else missing?
No. I concur that for at least a decade, when I had to repair boot failures manually, I did something like
for x in sys dev proc; do mount --bind /$x /mnt/$x; done
chroot /mnt
... exactly what YaST is doing. So now, /run must be added to the list as well. The code in comment 12 suggests that it can be done quite easily in YaST. OTOH I'm still surprised that this invocation of "lvm vgs" with LVM2 2.02.183 is the first time that the lack of /run is causing trouble. It's quite possible that this has caused hidden problems in the past already without us realizing.
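The manual repair loop from this comment, with /run added to the list, could look roughly like the sketch below. It only prints the mount commands instead of running them, so it can be inspected without root; the function name `print_chroot_mounts` and the default target `/mnt` are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: print (do not execute) the bind mounts for a manual chroot repair,
# with /run added to the usual sys/dev/proc list.
# ASSUMPTION: the target root defaults to /mnt, as in the comment above.
print_chroot_mounts() {
    local target="${1:-/mnt}"
    local x
    for x in sys dev proc run; do
        echo "mount --bind /$x ${target%/}/$x"
    done
}
```

Running `print_chroot_mounts /mnt | sh` as root would then perform the mounts before `chroot /mnt`.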
OK, just found that the fix (coming from a different bug report) went into the code last week: https://github.com/yast/yast-storage-ng/commit/a77cfd84772e466732aff5f20bd088db8c1b5057 Therefore you can expect YaST to mount /run in the chroot during installation/upgrade as soon as everything is rebuilt.
Jiri, can you tell us a YaST version to look out for?
(In reply to Martin Wilck from comment #18)
> Jiri, can you tell us a YaST version to look out for?
You need yast2-storage-ng 4.2.37
-------------------------------------------------------------------
Fri Aug 30 09:25:52 UTC 2019 - Steffen Winterfeldt <snwint@suse.com>
- bind-mount /run from inst-sys to target system during install (bsc#1136463)
- 4.2.37
or newer. Keep in mind that you need the package in the inst-sys, not in the repository (in case that makes a difference because of an only partial rebuild).
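Whether a given yast2-storage-ng version meets the 4.2.37 minimum named here can be checked with a small comparison helper. This is a sketch only: `version_at_least` is a hypothetical function, and it relies on GNU sort's `-V` ordering, which approximates but is not identical to RPM's version comparison.

```shell
#!/bin/bash
# Sketch: succeed if version $1 is at least version $2.
# ASSUMPTION: GNU sort with -V (version sort) is available.
version_at_least() {
    # The smaller of the two versions sorts first; if that is $2,
    # then $1 is equal to or newer than $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For instance, `version_at_least "$(rpm -q --qf '%{VERSION}' yast2-storage-ng)" 4.2.37` could be run inside the inst-sys, which is where the package matters according to the comment above.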
Added dependency on bug 1136463. We need to wait for OBS sr#728551 to include yast2-storage-ng 4.2.38 in Factory. Not sure if acceptance of that request would mean that this yast2-storage-ng version will also be in the inst-sys immediately, or if we have to wait some more.
Hello Guys, Any update for this bug? Since I also want to submit the patches to SLE12SP5, I want to know whether SLE12SP5 has this bug during installation too. Thanks, Gang
(In reply to Gang He from comment #22)
> Hello Guys,
>
> Any update for this bug?
If you go one step further and look at build.opensuse.org, you will find that SR#728551 is on its way to Factory.
> Since I also want to submit the patches to SLE12SP5, I want to know whether SLE12SP5
> has this bug during installation too.
>
Without clear facts, or even a clue, pointing to the same problem on SLE12SP5, I'm inclined to assume SLE12SP5 is sound. That said, I don't see any clear reason why your potential SLE12SP5 code change needs to depend on this one.
(In reply to Roger Zhou from comment #25)
> Without clear facts, or even a clue, pointing to the same problem on SLE12SP5,
> I'm inclined to assume SLE12SP5 is sound.
Here's some evidence to back up this assumption: This is a yast2-storage-ng bug, which had been fixed in the predecessor y2storage already (bug 717321, comment 31). So it appears we're indeed fine in SLE12.
Forwarding Michal's request to Gang.
(In reply to Gang He from comment #31)
> I did an installation test with
> openSUSE-Tumbleweed-DVD-x86_64-Snapshot20191007-Media.iso, the result is OK,
> no hang at all.
OK, thank you for the feedback. So I assume we can close the bug as fixed.
SUSE-RU-2019:3204-1: An update that has one recommended fix can now be installed.
Category: recommended (moderate)
Bug References: 1148500
CVE References:
Sources used:
SUSE Linux Enterprise Module for Open Buildservice Development Tools 15-SP1 (src): yast2-update-4.1.11-3.6.1
SUSE Linux Enterprise Module for Basesystem 15-SP1 (src): yast2-update-4.1.11-3.6.1
NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
openSUSE-RU-2019:2682-1: An update that has one recommended fix can now be installed.
Category: recommended (moderate)
Bug References: 1148500
CVE References:
Sources used:
openSUSE Leap 15.1 (src): yast2-update-4.1.11-lp151.2.6.1
backporting to SLE12 SP4 at https://github.com/yast/yast-update/pull/146
Maintenance team: for this fix, an installer self-update is needed in SLE15-SP1 and SLE12 SP4. Can you mark the updates accordingly?
(In reply to Josef Reidinger from comment #39)
> Maintenance team: for this fix, an installer self-update is needed in
> SLE15-SP1 and SLE12 SP4. Can you mark the updates accordingly?
Thanks, the installer self-update is being organized.
SUSE-RU-2021:4197-1: An update that has 6 recommended fixes can now be installed.
Category: recommended (important)
Bug References: 1085212,1089643,1089647,1136463,1148500,1180142
CVE References:
JIRA References:
Sources used:
SUSE OpenStack Cloud Crowbar 9 (src): yast2-3.2.51-4.10.2, yast2-installation-3.3.2-4.9.2, yast2-update-3.2.4-5.3.3
SUSE OpenStack Cloud 9 (src): yast2-3.2.51-4.10.2, yast2-installation-3.3.2-4.9.2, yast2-update-3.2.4-5.3.3
SUSE Linux Enterprise Server for SAP Installer 12-SP4 (src): yast2-3.2.51-4.10.2, yast2-installation-3.3.2-4.9.2, yast2-update-3.2.4-5.3.3
SUSE Linux Enterprise Server for SAP 12-SP4 (src): yast2-3.2.51-4.10.2, yast2-installation-3.3.2-4.9.2, yast2-update-3.2.4-5.3.3
SUSE Linux Enterprise Server Installer 12-SP4 (src): yast2-3.2.51-4.10.2, yast2-installation-3.3.2-4.9.2, yast2-update-3.2.4-5.3.3
SUSE Linux Enterprise Server 12-SP4-LTSS (src): yast2-3.2.51-4.10.2, yast2-installation-3.3.2-4.9.2, yast2-update-3.2.4-5.3.3
NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.