Bug 1190326 - Update to kernel 5.14 makes disks disappear/system unbootable
Summary: Update to kernel 5.14 makes disks disappear/system unbootable
Status: RESOLVED FIXED
Duplicates: 1190814 1190937
Alias: None
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Kernel
Version: Current
Hardware: Other Other
Priority: P5 - None, Severity: Major
Target Milestone: ---
Assignee: dracut maintainers
QA Contact: E-mail List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 1190930
Reported: 2021-09-09 06:28 UTC by Dennis Glindhart
Modified: 2021-11-27 14:19 UTC (History)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
rdsosreport.txt (60.95 KB, text/plain)
2021-09-16 07:53 UTC, Dennis Glindhart
journalctl (55.36 KB, text/plain)
2021-09-16 07:54 UTC, Dennis Glindhart

Description Dennis Glindhart 2021-09-09 06:28:42 UTC
Updating to kernel 5.14 (Tumbleweed Sep 5 2021 snapshot) makes system unbootable when running on top of Xen-server.

This seems to be a general kernel 5.14 bug, not specific to openSUSE/Tumbleweed, as I can reproduce it on Fedora 34 (by updating to the 5.14 testing kernel from koji). 5.13 works perfectly.

On Tumbleweed, the boot hangs in "starting dracut initqueue hook" and times out waiting for /dev/disk/by-uuid/xxxx

It seems the kernel cannot identify the disks at all (this works fine on 5.13).

After entering recovery mode, the following is observed:

- lsblk shows only the CD-ROM drive
- ls /dev/disk/ shows only by-path and by-id; the by-uuid directory is gone.

I've reproduced the problem on a cleanly installed Tumbleweed (both MicroOS/Ignition and install from ISO) and on a cleanly installed Fedora 34.

The following kernel versions were tested and are affected: 5.14.0 and 5.14.1 (5.14.1 tested on Fedora).

Can be reproduced on Fedora with both BTRFS and XFS/LVM install, so not specific to BTRFS.

Reproduced on two separate XCP-ng 8.2 (Xen-based) hypervisors.

From hypervisors:
# yum info qemu
Installed Packages
Name        : qemu
Arch        : x86_64
Epoch       : 2
Version     : 2.10.2
Release     : 4.5.3.xcpng8.2
Size        : 19 M
Repo        : installed
From repo   : install
Summary     : qemu-dm device model
License     : GPL
Description : This package contains Qemu.

# uname -a
Linux hostname 4.19.0+1 #1 SMP Wed Dec 16 12:16:11 CET 2020 x86_64 x86_64 x86_64 GNU/Linux
Comment 1 Dennis Glindhart 2021-09-09 06:36:27 UTC
Reproduced on 5.14.2 too
Comment 2 Takashi Iwai 2021-09-09 06:53:09 UTC
Likely something about storage: Hannes, please distribute in the team.
Comment 3 Hannes Reinecke 2021-09-16 07:11:43 UTC
Sure, as soon as I get log files.

Can you please attach the kernel message log here?
Comment 4 Dennis Glindhart 2021-09-16 07:53:56 UTC
Created attachment 852555 [details]
rdsosreport.txt
Comment 5 Dennis Glindhart 2021-09-16 07:54:21 UTC
Created attachment 852556 [details]
journalctl
Comment 6 Dennis Glindhart 2021-09-16 07:54:58 UTC
rdsosreport and journalctl output attached
Comment 7 Dennis Glindhart 2021-09-17 19:26:19 UTC
Seems the xen-blkfront module is somehow not available/loaded for the 5.14 kernel.

Kernel 5.13:

# lsmod | grep xen_
xen_netfront           45056  1
xen_blkfront           49152  6

# ls /lib/modules/5.13.13-1-default/kernel/drivers/block | grep xen
xen-blkback
xen-blkfront.ko.xz

Kernel 5.14 (In emergency mode)

# lsmod | grep xen_
xen_netfront           45056  1

# ls /lib/modules/5.14.0-1-default/kernel/drivers/block | grep xen
(empty)
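
The lsmod comparison above can be repeated from a script. This is only a minimal sketch: the helper name `module_loaded` is made up for illustration, and it does nothing more than scan lsmod-style text on stdin for a module name in the first column.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the module named in $1 appears in
# lsmod-style output read from stdin (the first column is the module name).
module_loaded() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }'
}

# Sample input copied from the 5.14 emergency shell output above.
sample='xen_netfront           45056  1'

if printf '%s\n' "$sample" | module_loaded xen_blkfront; then
    echo "xen_blkfront loaded"
else
    echo "xen_blkfront missing"
fi
```

On a running system the same helper could be fed directly: `lsmod | module_loaded xen_blkfront`.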
Comment 8 Dennis Glindhart 2021-09-17 21:17:56 UTC
It seems the filesystem (/lib/modules/5.xxx-default) seen in emergency mode differs from the actual filesystem on disk, so the comment above might not mean anything.

On the actual snapshot the module is present (checked by booting into the working snapshot and looking at the new, non-working one):

# ls /.snapshots/102/snapshot/lib/modules/5.14.2-1-default/kernel/drivers/block/ | grep xen
xen-blkback
xen-blkfront.ko.xz
Comment 9 Dennis Glindhart 2021-09-17 21:44:35 UTC
Making sure the xen-blkfront module is being added to the initrd seems to solve the problem:

# echo 'add_drivers+="xen-blkfront"' > /etc/dracut.conf.d/1-xen.conf
# transactional-update -c <NEWSNAPSHOTWITHKERNEL5.14> initrd

Then system can reboot into the new kernel/snapshot.

Any idea what change makes this necessary? (xen)-modules not being loaded automatically anymore? Is there a better/more sensible way of doing this?

And is it possible to somehow make the upgrade path smoother, so that updating the kernel does not break people's existing systems?

(I will try to read/understand/experiment with how to combine this with Ignition/Bootstrapping on MicroOS)
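
Before rebooting, it may be worth confirming that the regenerated initrd really contains the driver. The following is a rough sketch, assuming the image contents are listed with `lsinitrd` (shipped with dracut) and that each listing line ends with the file path. The helper name `listing_has_module` is hypothetical, and the sample listing is illustrative rather than captured from a real system.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a file listing on stdin contains a
# kernel module file named $1 (optionally compressed, e.g. .ko.xz).
listing_has_module() {
    grep -Eq "/$1\.ko(\.[a-z0-9]+)?\$"
}

# Illustrative listing lines (not captured from a real system).
listing='usr/lib/modules/5.14.2-1-default/kernel/drivers/net/xen-netfront.ko.xz
usr/lib/modules/5.14.2-1-default/kernel/drivers/block/xen-blkfront.ko.xz'

if printf '%s\n' "$listing" | listing_has_module xen-blkfront; then
    echo "xen-blkfront is in the image"
else
    echo "xen-blkfront is NOT in the image"
fi
```

With a real image this would be used as `lsinitrd <image> | listing_has_module xen-blkfront`, where the image path depends on the system and snapshot.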
Comment 10 Dennis Glindhart 2021-09-24 21:22:35 UTC
*** Bug 1190814 has been marked as a duplicate of this bug. ***
Comment 11 Dennis Glindhart 2021-09-24 21:31:29 UTC
Note that this, besides happening on "ordinary" Tumbleweed installations, also happens on the MicroOS image for Qemu KVM & XEN. Xen drivers not being included in that image is definitely not expected.

Existing systems can be fixed with the workaround above, but I haven't managed to bootstrap a new system from scratch (unless using old images from before the Sep 5 snapshot).
Comment 12 Olaf Hering 2021-09-27 08:21:47 UTC
How does dracut detect the requirement for xen-blkfront in 5.13?
How would it need to detect the requirement for xen-blkfront in 5.14?

Did a kernel ABI change, or does dracut just make incorrect assumptions?
Comment 13 Olaf Hering 2021-09-27 09:37:52 UTC
There are two VMs, bug1190814-513.devlab.prv.suse.com and bug1190814-514.devlab.prv.suse.com (root:suse), which run inst-sys from snapshot 20210902/20210904
Hopefully this will reveal the ABI change.
Comment 14 Olaf Hering 2021-09-27 10:33:40 UTC
The modinfo output does not differ, the aliases are the same.
Comment 15 Olaf Hering 2021-09-27 10:40:50 UTC
There is now a VM, bug1190814.devlab.prv.suse.com, which runs sle15sp4 with kernel 5.3.

mkinitrd with the to-be-installed kernel fails.
Comment 16 Olaf Hering 2021-09-29 11:31:59 UTC
*** Bug 1190937 has been marked as a duplicate of this bug. ***
Comment 17 Thomas Blume 2021-09-30 10:42:15 UTC
/usr/lib/dracut/modules.d/90kernel-modules/module-setup.sh

contains:

-->
    install_block_modules() {
        instmods \
            scsi_dh_rdac scsi_dh_emc scsi_dh_alua \
            =drivers/usb/storage \
            =ide nvme vmd \
            virtio_blk virtio_scsi
--<

So it installs the modules for kvm but not for xen.
But that hasn't changed, there were never xen modules explicitly specified. 
The code also parses modules.dep.
Does that contain the right dependency for xen_netfront/blkfront?
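
One way to answer the modules.dep question is to look up the entry for the module directly. This is a minimal sketch: the helper name `dep_entry` is hypothetical, and the sample file content is illustrative and not taken from a real kernel package. It relies only on the modules.dep line format `<module path>: <dependency paths>`.

```shell
#!/bin/sh
# Hypothetical helper: print the modules.dep entry for module file name $1
# from the modules.dep-format file $2. Each entry has the form
#   <path to module>: <paths of modules it depends on>
dep_entry() {
    grep -E "(^|/)$1\.ko(\.[a-z0-9]+)?:" "$2"
}

# Illustrative modules.dep content (not taken from a real kernel package).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
kernel/drivers/net/xen-netfront.ko.xz:
kernel/drivers/block/xen-blkfront.ko.xz:
EOF

dep_entry xen-blkfront "$tmp"
rm -f "$tmp"
```

Against an installed kernel, the second argument would be the real file, for example `dep_entry xen-blkfront "/lib/modules/$(uname -r)/modules.dep"`.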

Are the xen modules out of tree?
In that case see the description in /usr/lib/dracut/modules.d/90kernel-modules-extra/module-setup.sh:

-->
# Parses depmod configuration and calls instmods for out-of-tree kernel
# modules found.  Specifically, kernel modules inside directories that
# come from the following places are included (if these kernel modules
# are present in modules.dep):
#   - "search" configuration option;
#   - "override" configuration option (matching an exact file name constructed
#      by concatenating the provided directory and the kernel module name);
#   - "external" configuration option (if "external" is a part of "search"
#     configuration).
# (See depmod.d(5) for details.)
#
# This module has the following variables available for configuration:
#   - "depmod_modules_dep" - Path to the modules.dep file
#                            ("$srcmods/modules.dep" by default);
#   - "depmod_module_dir" - Directory containing kernel modules ("$srcmods"
#                           by default);
#   - "depmod_configs" - array of depmod configuration paths to parse
#                        (as supplied to depmod -C, ("/run/depmod.d/"
#                        "/etc/depmod.d/" "/lib/depmod.d/") by default).
--<
Comment 18 Olaf Hering 2021-09-30 15:19:24 UTC
(In reply to Thomas Blume from comment #17)
> Does that contain the right dependency for xen_netfront/blkfront?

See comment#15 (and comment#13) for three live systems that show the issue.

> Are the xen modules out of tree?

No, they are in kernel-default since sle12sp2.
Comment 19 Thomas Blume 2021-09-30 16:09:41 UTC
(In reply to Olaf Hering from comment #18)
> (In reply to Thomas Blume from comment #17)
> > Does that contain the right dependency for xen_netfront/blkfront?
> 
> See comment#15 (and comment#13) for three live systems that show the issue.

Ok thanks, I can see this:

-->
dracut-install: Handling /lib/modules/5.14.5-7.g24dfde2-default//kernel/drivers/block/xen-blkfront.ko.xz
dracut-install: No symbol or path match for '/lib/modules/5.14.5-7.g24dfde2-default//kernel/drivers/block/xen-blkfront.ko.xz'
--<

which comes from dracut-install.c

-->
        if (!check_module_path(path) || !check_module_symbols(mod)) {
                log_debug("No symbol or path match for '%s'", path);
                return 1;
        }
--<

For kernel 5.3.18-59.19-default this is different:

-->
dracut-install: Handling /lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz
dracut-install: Module xen_blkfront: symbol blk_cleanup_queue matched inclusion filter
dracut-install: dracut_install '/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz' '/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz'
dracut-install: dracut_install('/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz', '/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz')
dracut-install: dracut_install ret = 0
dracut-install: cp '/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz' '/var/tmp/dracut.dTMGuu/initramfs/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz'
dracut-install: cp ret = 0
dracut-install: dracut_install ret = 0
dracut-install: dracut_install 'xen_blkfront' OK
--<

Not really sure whether dracut is to blame therefore.
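
To see at a glance which modules were rejected this way, the debug output can be filtered for the message shown above. This is a sketch under the assumption that the debug log has already been captured as text. The helper name `rejected_modules` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the paths of modules that dracut-install
# rejected, based on the "No symbol or path match" debug message.
rejected_modules() {
    sed -n "s/^dracut-install: No symbol or path match for '\(.*\)'\$/\1/p"
}

# Sample lines copied from the 5.14 debug output quoted above.
log="dracut-install: Handling /lib/modules/5.14.5-7.g24dfde2-default//kernel/drivers/block/xen-blkfront.ko.xz
dracut-install: No symbol or path match for '/lib/modules/5.14.5-7.g24dfde2-default//kernel/drivers/block/xen-blkfront.ko.xz'"

printf '%s\n' "$log" | rejected_modules
```

Comparing this list between a working and a non-working kernel should show which drivers dropped out of the initrd.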
Comment 20 Fabian Vogt 2021-10-01 12:05:55 UTC
(In reply to Thomas Blume from comment #19)
> (In reply to Olaf Hering from comment #18)
> > (In reply to Thomas Blume from comment #17)
> > > Does that contain the right dependency for xen_netfront/blkfront?
> > 
> > See comment#15 (and comment#13) for three live systems that show the issue.
> 
> Ok thanks, I can see this:
> 
> -->
> dracut-install: Handling /lib/modules/5.14.5-7.g24dfde2-default//kernel/drivers/block/xen-blkfront.ko.xz
> dracut-install: No symbol or path match for '/lib/modules/5.14.5-7.g24dfde2-default//kernel/drivers/block/xen-blkfront.ko.xz'
> --<
> 
> which comes from dracut-install.c
> 
> -->
>         if (!check_module_path(path) || !check_module_symbols(mod)) {
>                 log_debug("No symbol or path match for '%s'", path);
>                 return 1;
>         }
> --<
> 
> For kernel 5.3.18-59.19-default this is different:
> 
> -->
> dracut-install: Handling /lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz
> dracut-install: Module xen_blkfront: symbol blk_cleanup_queue matched inclusion filter
> dracut-install: dracut_install '/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz' '/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz'
> dracut-install: dracut_install('/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz', '/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz')
> dracut-install: dracut_install ret = 0
> dracut-install: cp '/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz' '/var/tmp/dracut.dTMGuu/initramfs/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz'
> dracut-install: cp ret = 0
> dracut-install: dracut_install ret = 0
> dracut-install: dracut_install 'xen_blkfront' OK
> --<
> 
> Not really sure whether dracut is to blame therefore.

FWICT, the relevant lines are:

local _blockfuncs='ahci_platform_get_resources|ata_scsi_ioctl|scsi_add_host|blk_cleanup_queue|register_mtd_blktrans|...
...
dracut_instmods -o -s "${_blockfuncs}" "=drivers"

It goes through all driver modules and installs those which have symbols in the given list.

xen-blkfront was ported away from blk_cleanup_queue to blk_cleanup_disk in 3b62c140e93d32, so it no longer matches.

Adding blk_cleanup_disk to the list might be the right fix.
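
The effect of the symbol filter can be illustrated with a small standalone sketch. It is not dracut's actual code: the helper names `uses_block_symbol` and `check` are hypothetical, only the symbol names visible in the excerpt above are listed (the real list is longer), and the matching inside dracut-install may differ in detail. Each module is represented here by a single symbol it references.

```shell
#!/bin/sh
# Only the symbols visible in the excerpt above; the real list is longer.
old_filter='ahci_platform_get_resources|ata_scsi_ioctl|scsi_add_host|blk_cleanup_queue|register_mtd_blktrans'
# Proposed fix: add blk_cleanup_disk to the alternation.
new_filter="$old_filter|blk_cleanup_disk"

# Hypothetical helper: succeeds if any symbol on stdin (one per line)
# is exactly one of the names in the alternation $1.
uses_block_symbol() {
    grep -Eqx "$1"
}

# $1 = label, $2 = filter, $3 = symbol referenced by the module
check() {
    if printf '%s\n' "$3" | uses_block_symbol "$2"; then
        echo "$1: included"
    else
        echo "$1: skipped"
    fi
}

check "before the port, old filter" "$old_filter" blk_cleanup_queue
check "after the port, old filter" "$old_filter" blk_cleanup_disk
check "after the port, new filter" "$new_filter" blk_cleanup_disk
```

The middle case corresponds to the failure in this bug: the module still exists, but none of its symbols match the list, so it is silently left out of the initrd.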
Comment 21 Antonio Feijoo 2021-10-01 13:29:25 UTC
> Adding blk_cleanup_disk to the list might be the right fix.

Tested with success. I can confirm that this fix solves the issue.
Comment 22 Antonio Feijoo 2021-10-01 13:39:05 UTC
Actually, an upstream fix already exists:

https://github.com/dracutdevs/dracut/commit/b292ce72

Backporting it to our 055 branch.
Comment 23 Olaf Hering 2021-10-01 14:38:21 UTC
I think this has to be backported wherever it is broken...
Comment 24 Antonio Feijoo 2021-10-01 15:05:00 UTC
AFAIK it would only affect Tumbleweed and SLE-15 SP4, both with kernel 5.14, so it should be enough to fix the dracut 055 branch.
Anyway, if I'm wrong, Thomas will tell us which other versions are affected.
Comment 25 Fabian Vogt 2021-10-04 12:09:55 UTC
(In reply to Antonio Feijoo from comment #24)
> AFAIK it would only affect Tumbleweed and SLE-15 SP4, both with kernel 5.14,
> so it should be enough to fix the dracut 055 branch.
> Anyway, if I'm wrong, Thomas will tell us which other versions are affected.

It's not that uncommon to test newer kernels also on systems with older dracut. So ideally this gets backported to all affected dracut versions.
Comment 26 Thomas Blume 2021-10-04 12:23:17 UTC
(In reply to Fabian Vogt from comment #25)
> (In reply to Antonio Feijoo from comment #24)
> > AFAIK it would only affect Tumbleweed and SLE-15 SP4, both with kernel 5.14,
> > so it should be enough to fix the dracut 055 branch.
> > Anyway, if I'm wrong, Thomas will tell us which other versions are affected.
> 
> It's not that uncommon to test newer kernels also on systems with older
> dracut. So ideally this gets backported to all affected dracut versions.

Let's settle on SLE15. 
That would be then dracut-055 and dracut-049.
I wouldn't really like to see a kernel version 5 on a SLE12.

Fabian, ok for you?
Comment 27 Olaf Hering 2021-10-05 06:25:45 UTC
I accidentally rebooted the host. Let me know if the VMs are still needed.
Comment 28 Fabian Vogt 2021-10-05 07:43:55 UTC
(In reply to Thomas Blume from comment #26)
> (In reply to Fabian Vogt from comment #25)
> > (In reply to Antonio Feijoo from comment #24)
> > > AFAIK it would only affect Tumbleweed and SLE-15 SP4, both with kernel 5.14,
> > > so it should be enough to fix the dracut 055 branch.
> > > Anyway, if I'm wrong, Thomas will tell us which other versions are affected.
> > 
> > It's not that uncommon to test newer kernels also on systems with older
> > dracut. So ideally this gets backported to all affected dracut versions.
> 
> Let's settle on SLE15. 
> That would be then dracut-055 and dracut-049.
> I wouldn't really like to see a kernel version 5 on a SLE12.
> 
> Fabian, ok for you?

Sounds good!
Comment 29 OBSbugzilla Bot 2021-10-12 16:42:21 UTC
This is an autogenerated message for OBS integration:
This bug (1190326) was mentioned in
https://build.opensuse.org/request/show/924907 Factory / dracut
Comment 32 Dennis Glindhart 2021-10-26 09:15:39 UTC
Fix for dracut is included in snapshot (20211021) and resolved the issue.
Comment 35 Swamp Workflow Management 2021-11-24 02:26:07 UTC
SUSE-RU-2021:3782-1: An update that has three recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1187190,1188713,1190326
CVE References: 
JIRA References: 
Sources used:
SUSE MicroOS 5.1 (src):    dracut-049.1+suse.216.gf705637b-3.45.1
SUSE Linux Enterprise Module for Basesystem 15-SP3 (src):    dracut-049.1+suse.216.gf705637b-3.45.1
SUSE Linux Enterprise Module for Basesystem 15-SP2 (src):    dracut-049.1+suse.216.gf705637b-3.45.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 36 Swamp Workflow Management 2021-11-24 02:38:06 UTC
openSUSE-RU-2021:3782-1: An update that has three recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1187190,1188713,1190326
CVE References: 
JIRA References: 
Sources used:
openSUSE Leap 15.3 (src):    dracut-049.1+suse.216.gf705637b-3.45.1
Comment 37 Swamp Workflow Management 2021-11-27 14:19:37 UTC
openSUSE-RU-2021:1506-1: An update that has three recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1187190,1188713,1190326
CVE References: 
JIRA References: 
Sources used:
openSUSE Leap 15.2 (src):    dracut-049.1+suse.216.gf705637b-lp152.2.36.1