Bug 1148500 - lvm2 2.02.180-327: vgs hangs with "device XXX not initialized in udev database"
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Installation
Version: Current
Hardware: Other
OS: Other
Priority: P2 - High
Severity: Normal
Assigned To: E-mail List
QA Contact: Jiri Srain
Depends on: 1136463
Reported: 2019-08-28 09:12 UTC by Martin Wilck
Modified: 2021-12-30 08:19 UTC


Attachments:
  log of "vgs -vvvv" (271.89 KB, text/x-log), 2019-08-28 09:12 UTC, Martin Wilck
  strace of dracut (5.13 MB, application/x-xz), 2019-08-28 09:19 UTC, Martin Wilck

Description Martin Wilck 2019-08-28 09:12:28 UTC
Created attachment 815982 [details]
log of "vgs -vvvv"

OpenQA problem https://openqa.opensuse.org/tests/1018187,
installation with openSUSE-Staging:G-Tumbleweed-DVD-x86_64-Build277.1-Media.iso,
installing on encrypted LVM.

Blocking LVM2 release (https://build.opensuse.org/request/show/725506)

ps aux shows the system hanging in "vgs", called by dracut:

> root     18034  0.0  0.3   4100  3228 tty1     S+   03:06   0:00                                 \_ /bin/bash --norc /sbin/mkinitrd
> root     18047  0.0  0.4   5520  4644 tty1     S+   03:06   0:00                                      \_ /bin/bash -p /usr/bin/dracut --logfile /var/log/YaST2/mkinitrd.log --force /boot/initrd-5.2.9-5-default 5.2.9-5-default
> root     19895  0.0  0.3   5520  3036 tty1     S+   03:23   0:00                                          \_ /bin/bash -p /usr/bin/dracut --logfile /var/log/YaST2/mkinitrd.log --force /boot/initrd-5.2.9-5-default 5.2.9-5-default
> root     19896  0.1  0.7  10732  7580 tty1     S+   03:23   0:00                                              \_ lvm vgs --noheadings -o pv_name system

I reproduced this locally. vgs emits hundreds of messages like this:

> #device/dev-type.c:1042          Device /dev/loop0 not initialized in udev database (89/100, 8800000 microseconds).

strace shows that vgs is trying to access the udev db under /run/udev/data directly, by opening files under that directory. This fails because /run is not mounted in the chroot (the system being installed). As soon as /run is bind-mounted into the chroot, vgs proceeds and the installation completes:

   mount --bind /run /mnt/run
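
A quick way to verify this from the running installer, before and after the bind mount (a sketch; /dev/loop0 is just one of the devices from the log above, and udevadm has to be present in the target system):

   ls /mnt/run/udev/data 2>/dev/null | head     # empty or missing before the bind mount
   mount --bind /run /mnt/run
   chroot /mnt udevadm info --name=/dev/loop0   # now succeeds, reading /run/udev/data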

I've assigned this to the "installer" component because it can be fixed in a simple way by bind-mounting /run. I don't know how this problem is related to the lvm2 update from sr#725506.
Comment 2 Martin Wilck 2019-08-28 09:19:10 UTC
Created attachment 815984 [details]
strace of dracut

This is strace -f -p output of dracut trying to build the initrd. You can see vgs trying to access files under /run/udev/data and getting -ENOENT.

> 12453 openat(AT_FDCWD, "/run/udev/data/b7:0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> 12453 openat(AT_FDCWD, "/run/udev/data/b7:0", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

dracut seems to re-run vgs in this situation, so that the problem occurs repeatedly:

> 12453 execve("/sbin/lvm", ["lvm", "vgs", "--noheadings", "-o", "pv_name", "system"], 0x555dec58c490 /* 101 vars */) = 0
> 12575 execve("/sbin/lvm", ["lvm", "vgs", "--noheadings", "-o", "pv_name", "system"], 0x555dec58c490 /* 101 vars */) = 0
> 12671 execve("/sbin/lvm", ["lvm", "vgs", "--noheadings", "-o", "pv_name", "system"], 0x555dec58c490 /* 101 vars */) = 0
> ...

At a certain point, I created the bind mount as mentioned in comment 0. From that point on, dracut made progress.

> 12817 execve("/sbin/lvm", ["lvm", "vgs", "--noheadings", "-o", "pv_name", "system"], 0x555dec58c490 /* 101 vars */) = 0
> 12817 openat(AT_FDCWD, "/run/udev/data/b7:1", O_RDONLY|O_CLOEXEC) = 5
> 12817 openat(AT_FDCWD, "/run/udev/data/b7:29", O_RDONLY|O_CLOEXEC) = 5
> ...
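
For reference, the file names under /run/udev/data encode the device type and the major:minor number ("b" for block devices), so b7:0 above is the udev database record for /dev/loop0. A minimal sketch of how to map a device node to its record (the stat format strings print major and minor in hex):

   stat -c 'b%t:%T' /dev/loop0        # prints "b7:0" for major 7, minor 0
   cat /run/udev/data/b7:0            # the record libudev reads for /dev/loop0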
Comment 4 Gang He 2019-08-28 10:12:13 UTC
Hi Martin,

There are many pictures (screenshots) at https://openqa.opensuse.org/tests/1018187;
it is not easy to see what the test case is trying to do.

Do you know how to reproduce this problem with a few steps (without the openQA environment)?


Thanks
Gang
Comment 5 Martin Wilck 2019-08-28 13:48:30 UTC
You can ignore all "grey" pictures.

Basically you have to install using the media linked on the OpenQA page, and choose "LVM based proposal" and "Disk encryption" in the "guided setup".

I don't think the rest matters much. I used the "minimal X" installation type, as the OpenQA test did.

Anyway, perhaps you want to focus on the results I already obtained. Are these checks for /run/udev/data files ("openat(AT_FDCWD, "/run/udev/data/b7:1", O_RDONLY|O_CLOEXEC)") introduced by the new code base and/or the added patches?

If not, I'd focus on the question why /run is not bind-mounted into the chroot. 
I believe it should be, whether or not encryption and/or LVM is being used.
Comment 7 Gang He 2019-09-02 07:52:29 UTC
Hello Martin and All,

How should we handle this problem further?
These patches (fixing bsc#1145231) have already been included in lvm2 v2.02.183.
I believe the patch itself is not wrong, but I do not know why it blocks our staging testing.


Thanks
Gang
Comment 9 Martin Wilck 2019-09-03 08:37:29 UTC
The obvious solution is to make /run available in the chroot.

Honestly, I don't know how installation ever worked without this; in particular, initrd creation and device detection are bound to fail without /run. Applications rightfully assume that they can make libudev calls, and libudev doesn't work if it can't find the udev db under /run/udev.

(Correction wrt comment 0: lvm2 does *not* open files under /run/udev directly. Rather, we are looking at valid libudev calls.)

(One might argue that it's wrong to repeat the search for the udev data over and over again. lvm iterates 100 times and waits 0.1 s each time, for each device in the system, which makes for a long wait; dracut then seems to repeat the "lvm vgs" call, either indefinitely or at least very often. But even if we cut down on these wait times, the OpenQA timeout would still occur on systems with many devices.)
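
For illustration, a rough sketch of how this adds up (the retry count and interval are taken from the log messages above; the device count is just an example):

   # Per device: 100 retries x 0.1 s = 10 s before lvm gives up on the udev db.
   # The log line "(89/100, 8800000 microseconds)" is retry 89 at ~8.8 s,
   # consistent with 0.1 s per retry.
   echo "100 * 0.1" | bc          # 10 s per device
   echo "30 * 100 * 0.1" | bc     # e.g. 30 block devices: ~300 s per "lvm vgs" run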

So, @yast team, please make sure that /run is bind-mounted into the /mnt chroot during installation, in particular while the initrd is rebuilt.

The open question for me is: why did we see this only with encrypted volumes? 
I would expect this problem to occur whenever lvm2 "scan" commands are run in the chroot environment and /run isn't available there, which has nothing to do with encryption.

Is it maybe an OpenQA test artifact? Perhaps, without encryption, there are fewer block devices overall, so that the OpenQA timeout is avoided? Or maybe bind-mounting /run into the chroot simply fails when encryption is on?
Comment 10 Martin Wilck 2019-09-03 08:49:14 UTC
> https://openqa.opensuse.org/tests/overview?distri=microos&distri=opensuse&version=Staging%3AG&build=277.1&groupid=2

In this staging project (or perhaps generally, in staging projects), "cryptlvm" was the only test involving LVM. That explains why this test failed while the others passed.
Comment 11 Martin Wilck 2019-09-03 08:51:37 UTC
@Jiri, can someone from your team please look at this?

We need /run mounted in the /mnt chroot during installation.
Comment 12 Jiri Srain 2019-09-03 09:01:16 UTC
Well, I guess that /run was created as a regular directory - it is part of the filesystem package.

YaST does the following mounts after partitioning the disk:

        mount_in_target("/dev", "devtmpfs", "-t devtmpfs")
        mount_in_target("/proc", "proc", "-t proc")
        mount_in_target("/sys", "sysfs", "-t sysfs")
        mount_in_target(EFIVARS_PATH, "efivarfs", "-t efivarfs") if File.exist?(EFIVARS_PATH)
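
Roughly the shell-level equivalent of these calls, as a sketch only (mount_in_target is YaST's own helper, and EFIVARS_PATH presumably points at /sys/firmware/efi/efivars):

        mount -t devtmpfs devtmpfs /mnt/dev
        mount -t proc     proc     /mnt/proc
        mount -t sysfs    sysfs    /mnt/sys
        [ -e /sys/firmware/efi/efivars ] && \
            mount -t efivarfs efivarfs /mnt/sys/firmware/efi/efivars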


Can you see anything else missing?
Comment 13 Gang He 2019-09-03 09:23:36 UTC
(In reply to Martin Wilck from comment #10)
> > https://openqa.opensuse.org/tests/overview?distri=microos&distri=opensuse&version=Staging%3AG&build=277.1&groupid=2
> 
> In this staging project (or perhaps generally, in staging projects),
> "cryptlvm" was the only test involving LVM. That explains why this test
> failed why others passed.

Hello Martin and All,

I just tried LVM as the system disk, but without encryption, and the bug still happened.
I can confirm that the bug is not related to LVM encryption; it is related to access to the udev files.

Thanks
Gang
Comment 14 Martin Wilck 2019-09-04 07:37:12 UTC
(In reply to Jiri Srain from comment #12)

> Can you see anything else missing?

No. I concur; for at least a decade, whenever I had to repair boot failures manually, I did something like

for x in sys dev proc; do mount --bind /$x /mnt/$x; done
chroot /mnt

... exactly what YaST is doing.

So now, /run must be added to the list as well. The code in comment 12 suggests that it can be done quite easily in YaST.
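
Presumably the manual repair recipe then grows by one entry as well; a sketch only (the dracut call is just an example of what typically runs in such a chroot):

   for x in sys dev proc run; do mount --bind /$x /mnt/$x; done
   chroot /mnt
   # inside the chroot, libudev consumers (lvm, dracut) can now read /run/udev/data
   dracut --force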

OTOH I'm still wondering why this invocation of "lvm vgs" with lvm2 2.02.183 is the first time that the lack of /run has caused trouble. It's quite possible that this has already caused hidden problems in the past without us realizing it.
Comment 15 Jiri Srain 2019-09-04 08:41:11 UTC
OK, I just found that the fix (coming from a different bug report) went into the code last week:

https://github.com/yast/yast-storage-ng/commit/a77cfd84772e466732aff5f20bd088db8c1b5057

Therefore you can expect YaST to mount /run in the chroot during installation/upgrade as soon as everything is rebuilt.
Comment 18 Martin Wilck 2019-09-10 07:03:02 UTC
Jiri, can you tell us a YaST version to look out for?
Comment 19 Jiri Srain 2019-09-10 07:07:57 UTC
(In reply to Martin Wilck from comment #18)
> Jiri, can you tell us a YaST version to look out for?

You need yast2-storage-ng 4.2.37

-------------------------------------------------------------------
Fri Aug 30 09:25:52 UTC 2019 - Steffen Winterfeldt <snwint@suse.com>

- bind-mount /run from inst-sys to target system during install (bsc#1136463)
- 4.2.37

or newer. Keep in mind that you need the package in the inst-sys, not in the repository (if that makes a difference because of the only partial rebuild).
Comment 20 Martin Wilck 2019-09-10 07:41:19 UTC
Added dependency on bug 1136463. 

We need to wait for OBS sr#728551 to bring yast2-storage-ng 4.2.38 into Factory.
I'm not sure whether acceptance of that request means that this yast2-storage-ng version will also be in the inst-sys immediately, or whether we have to wait some more.
Comment 21 Gang He 2019-09-17 08:25:40 UTC
Hello guys,

Any update on this bug?
Since I also want to submit the patches to SLE12 SP5, I would like to know whether SLE12 SP5 has this bug during installation too.


Thanks
Gang
Comment 22 Gang He 2019-09-18 02:48:27 UTC
Hello guys,

Any update on this bug?
Since I also want to submit the patches to SLE12 SP5, I would like to know whether SLE12 SP5 has this bug during installation too.


Thanks
Gang
Comment 25 Roger Zhou 2019-09-18 06:54:01 UTC
(In reply to Gang He from comment #22)
> Hello guys,
> 
> Any update on this bug?

If you dig a bit further on build.opensuse.org, you will find that SR#728551 is on its way to Factory.

> Since I also want to submit the patches to SLE12 SP5, I would like to know
> whether SLE12 SP5 has this bug during installation too.
> 

If there are no clear facts, or even a clue, pointing to the same problem on SLE12 SP5, I'm inclined to assume SLE12 SP5 is sound.

That said, I don't see any clear reason why your potential SLE12 SP5 code change needs to depend on this one.
Comment 26 Martin Wilck 2019-09-24 15:45:23 UTC
(In reply to Roger Zhou from comment #25)

> If there are no clear facts, or even a clue, pointing to the same problem on
> SLE12 SP5, I'm inclined to assume SLE12 SP5 is sound.

Here's some evidence to back up this assumption: This is a yast2-storage-ng bug, which had been fixed in the predecessor y2storage already (bug 717321, comment 31). So it appears we're indeed fine in SLE12.
Comment 28 Martin Wilck 2019-10-01 10:28:59 UTC
Forwarding Michal's request to Gang.
Comment 32 Ladislav Slezák 2019-10-10 14:06:01 UTC
(In reply to Gang He from comment #31)
> I did an installation test with
> openSUSE-Tumbleweed-DVD-x86_64-Snapshot20191007-Media.iso; the result is OK,
> no hang problem at all.

OK, thank you for the feedback.

So I assume we can close the bug as fixed.
Comment 36 Swamp Workflow Management 2019-12-09 17:12:02 UTC
SUSE-RU-2019:3204-1: An update that has one recommended fix can now be installed.

Category: recommended (moderate)
Bug References: 1148500
CVE References: 
Sources used:
SUSE Linux Enterprise Module for Open Buildservice Development Tools 15-SP1 (src):    yast2-update-4.1.11-3.6.1
SUSE Linux Enterprise Module for Basesystem 15-SP1 (src):    yast2-update-4.1.11-3.6.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 37 Swamp Workflow Management 2019-12-14 23:11:25 UTC
openSUSE-RU-2019:2682-1: An update that has one recommended fix can now be installed.

Category: recommended (moderate)
Bug References: 1148500
CVE References: 
Sources used:
openSUSE Leap 15.1 (src):    yast2-update-4.1.11-lp151.2.6.1
Comment 38 Josef Reidinger 2020-01-16 09:52:48 UTC
backporting to SLE12 SP4 at https://github.com/yast/yast-update/pull/146
Comment 39 Josef Reidinger 2020-01-16 11:45:47 UTC
Maintenance team: for this fix in SLE15 SP1 and SLE12 SP4, an installer self-update is needed. Can you mark them accordingly?
Comment 41 Zsolt KALMAR 2020-01-21 12:36:38 UTC
(In reply to Josef Reidinger from comment #39)
> Maintenance team: for this fix in SLE15 SP1 and SLE12 SP4, an installer
> self-update is needed. Can you mark them accordingly?

Thanks, the installer self-update is being organized.
Comment 45 Swamp Workflow Management 2021-12-30 08:19:34 UTC
SUSE-RU-2021:4197-1: An update that has 6 recommended fixes can now be installed.

Category: recommended (important)
Bug References: 1085212,1089643,1089647,1136463,1148500,1180142
CVE References: 
JIRA References: 
Sources used:
SUSE OpenStack Cloud Crowbar 9 (src):    yast2-3.2.51-4.10.2, yast2-installation-3.3.2-4.9.2, yast2-update-3.2.4-5.3.3
SUSE OpenStack Cloud 9 (src):    yast2-3.2.51-4.10.2, yast2-installation-3.3.2-4.9.2, yast2-update-3.2.4-5.3.3
SUSE Linux Enterprise Server for SAP Installer 12-SP4 (src):    yast2-3.2.51-4.10.2, yast2-installation-3.3.2-4.9.2, yast2-update-3.2.4-5.3.3
SUSE Linux Enterprise Server for SAP 12-SP4 (src):    yast2-3.2.51-4.10.2, yast2-installation-3.3.2-4.9.2, yast2-update-3.2.4-5.3.3
SUSE Linux Enterprise Server Installer 12-SP4 (src):    yast2-3.2.51-4.10.2, yast2-installation-3.3.2-4.9.2, yast2-update-3.2.4-5.3.3
SUSE Linux Enterprise Server 12-SP4-LTSS (src):    yast2-3.2.51-4.10.2, yast2-installation-3.3.2-4.9.2, yast2-update-3.2.4-5.3.3

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.