Bugzilla – Bug 1190326
Update to kernel 5.14 makes disks disappear/system unbootable
Last modified: 2021-11-27 14:19:37 UTC
Updating to kernel 5.14 (Tumbleweed snapshot of Sep 5 2021) makes the system unbootable when running on top of a Xen server. This seems to be a general kernel 5.14 bug, not specific to openSUSE/Tumbleweed, as I can reproduce it on Fedora 34 (by updating to the 5.14 testing kernel from koji). 5.13 works perfectly.

On Tumbleweed, boot hangs in "starting dracut initqueue hook" and times out waiting for /dev/disk/by-uuid/xxxx. It seems the kernel cannot identify the disks at all (this works fine on 5.13).

After entering recovery mode, the following is observed:
- lsblk shows only the CD-ROM drive
- ls /dev/disk/ only has by-path and by-id; the by-uuid directory is gone.

I've reproduced the problem on a cleanly installed Tumbleweed (both MicroOS/Ignition and install from ISO) and on a cleanly installed Fedora 34.

Kernel versions tested and affected: 5.14.0 and 5.14.1 (5.14.1 tested on Fedora). It can be reproduced on Fedora with both BTRFS and XFS/LVM installs, so it is not specific to BTRFS.

Reproduced on two separate XCP-ng 8.2 (Xen-based) hypervisors. From the hypervisors:

# yum info qemu
Installed Packages
Name        : qemu
Arch        : x86_64
Epoch       : 2
Version     : 2.10.2
Release     : 4.5.3.xcpng8.2
Size        : 19 M
Repo        : installed
From repo   : install
Summary     : qemu-dm device model
License     : GPL
Description : This package contains Qemu.

# uname -a
Linux hostname 4.19.0+1 #1 SMP Wed Dec 16 12:16:11 CET 2020 x86_64 x86_64 x86_64 GNU/Linux
Reproduced on 5.14.2 too
Likely something about storage: Hannes, please distribute in the team.
Sure, as soon as I have the logfiles. Can you please attach the kernel message log here?
Created attachment 852555 [details] rdsosreport.txt
Created attachment 852556 [details] journalctl
rdsosreport and journalctl output attached
Seems the xen-blkfront module is somehow not available/loaded for the 5.14 kernel.

Kernel 5.13:
# lsmod | grep xen_
xen_netfront 45056 1
xen_blkfront 49152 6
# ls /lib/modules/5.13.13-1-default/kernel/drivers/block | grep xen
xen-blkback
xen-blkfront.ko.xz

Kernel 5.14 (in emergency mode):
# lsmod | grep xen_
xen_netfront 45056 1
# ls /lib/modules/5.14.0-1-default/kernel/drivers/block | grep xen
(empty)
It seems the filesystem (/lib/modules/5.xxx-default) in emergency mode differs from the actual filesystem on disk, so the comment above might not mean anything. In the actual snapshot the module is present on the filesystem (checked by booting into a working snapshot and looking at the new, non-working snapshot):

# ls /.snapshots/102/snapshot/lib/modules/5.14.2-1-default/kernel/drivers/block/ | grep xen
xen-blkback
xen-blkfront.ko.xz
Making sure the xen-blkfront module is added to the initrd seems to solve the problem:

# echo 'add_drivers+="xen-blkfront"' > /etc/dracut.conf.d/1-xen.conf
# transactional-update -c <NEWSNAPSHOTWITHKERNEL5.14> initrd

After that the system can reboot into the new kernel/snapshot.

Any idea what change makes this necessary? Are the (xen) modules no longer loaded automatically? Is there a better/more sensible way of doing this? And is it possible to make the upgrade path smoother, so that updating the kernel does not break people's existing systems?

(I will try to read/understand/experiment with how to combine this with Ignition/bootstrapping on MicroOS.)
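To confirm that a rebuilt initrd really contains the driver before rebooting, the file listing printed by lsinitrd can be searched for the module. The following is only a sketch: listing_has_module is a hypothetical helper, the sample listing is illustrative, and the initrd path in the comment is an assumption about the local layout.

```shell
#!/bin/bash
# Sketch only: listing_has_module is a hypothetical helper, not part of dracut.
# It reads an initrd file listing on stdin and succeeds if the named module
# file (optionally compressed, e.g. .ko.xz) appears at the end of a line.
listing_has_module() {
    grep -Eq -- "/$1\.ko(\.[a-z0-9]+)?\$"
}

# On a real system the listing would come from lsinitrd, for example
# (the initrd path below is an assumption, adjust it to the local system):
#   lsinitrd /boot/initrd-5.14.2-1-default | listing_has_module xen-blkfront

# Illustrative listing, not real lsinitrd output.
sample_listing='usr/lib/modules/5.14.2-1-default/kernel/drivers/block/xen-blkfront.ko.xz
usr/lib/modules/5.14.2-1-default/kernel/drivers/net/xen-netfront.ko.xz'

if printf '%s\n' "$sample_listing" | listing_has_module xen-blkfront; then
    echo "xen-blkfront: present in listing"
else
    echo "xen-blkfront: missing from listing"
fi
```

The check only looks at file names, so it says nothing about whether the module will actually load at boot.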
*** Bug 1190814 has been marked as a duplicate of this bug. ***
Note that, besides happening on "ordinary" Tumbleweed installations, this also happens on the MicroOS image for QEMU KVM & Xen. Xen drivers missing from the initrd of that image is definitely not expected. Existing systems can be fixed with the workaround above, but I haven't managed to bootstrap a new system from scratch (unless using old images from before the Sep 5 snapshot).
How does dracut detect the requirement for xen-blkfront in 5.13? How would it need to detect the requirement for xen-blkfront in 5.14? Did a kernel ABI change, or does dracut just make incorrect assumptions?
There are two VMs, bug1190814-513.devlab.prv.suse.com and bug1190814-514.devlab.prv.suse.com (root:suse), which run inst-sys from snapshots 20210902 and 20210904. Hopefully this will reveal the ABI change.
The modinfo output does not differ, the aliases are the same.
There is now a VM bug1190814.devlab.prv.suse.com, runs sle15sp4 with kernel 5.3. mkinitrd with the to-be-installed kernel fails.
*** Bug 1190937 has been marked as a duplicate of this bug. ***
/usr/lib/dracut/modules.d/90kernel-modules/module-setup.sh contains:

-->
install_block_modules() {
    instmods \
        scsi_dh_rdac scsi_dh_emc scsi_dh_alua \
        =drivers/usb/storage \
        =ide nvme vmd \
        virtio_blk virtio_scsi
--<

So it installs the modules for kvm but not for xen. But that hasn't changed; there were never xen modules explicitly specified.

The code also parses modules.dep. Does that contain the right dependency for xen_netfront/blkfront?

Are the xen modules out of tree? In that case see the description in /usr/lib/dracut/modules.d/90kernel-modules-extra/module-setup.sh:

-->
# Parses depmod configuration and calls instmods for out-of-tree kernel
# modules found. Specifically, kernel modules inside directories that
# come from the following places are included (if these kernel modules
# are present in modules.dep):
# - "search" configuration option;
# - "override" configuration option (matching an exact file name constructed
#   by concatenating the provided directory and the kernel module name);
# - "external" configuration option (if "external" is a part of "search"
#   configuration).
# (See depmod.d(5) for details.)
#
# This module has the following variables available for configuration:
# - "depmod_modules_dep" - Path to the modules.dep file
#                          ("$srcmods/modules.dep" by default);
# - "depmod_module_dir" - Directory containing kernel modules ("$srcmods"
#                         by default);
# - "depmod_configs" - array of depmod configuration paths to parse
#                      (as supplied to depmod -C, ("/run/depmod.d/"
#                      "/etc/depmod.d/" "/lib/depmod.d/") by default).
--<
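Whether modules.dep lists a given module at all can be checked with a plain grep. This is a minimal sketch: dep_lists_module is a hypothetical helper, and the sample file below is made up for illustration (real entries may list dependencies after the colon).

```shell
#!/bin/bash
# Sketch only: dep_lists_module is a hypothetical helper, not part of dracut or kmod.
# $1 = path to a modules.dep file, $2 = module file name without the .ko suffix.
# Succeeds if modules.dep has an entry (left of the colon) for that module.
dep_lists_module() {
    grep -Eq -- "(^|/)$2\.ko[^:/]*:" "$1"
}

# On a real system the file would be something like:
#   /lib/modules/$(uname -r)/modules.dep
# Here an illustrative file is used instead of real depmod output.
sample_dep=$(mktemp)
printf '%s\n' \
    'kernel/drivers/block/xen-blkfront.ko.xz:' \
    'kernel/drivers/net/xen-netfront.ko.xz:' > "$sample_dep"

for mod in xen-blkfront xen-netfront; do
    if dep_lists_module "$sample_dep" "$mod"; then
        echo "$mod: listed in modules.dep"
    else
        echo "$mod: not listed in modules.dep"
    fi
done

rm -f "$sample_dep"
```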
(In reply to Thomas Blume from comment #17)
> Does that contain the right dependency for xen_netfront/blkfront?

See comment#15 (and comment#13) for three live systems that show the issue.

> Are the xen modules out of tree?

No, they are in kernel-default since sle12sp2.
(In reply to Olaf Hering from comment #18)
> (In reply to Thomas Blume from comment #17)
> > Does that contain the right dependency for xen_netfront/blkfront?
>
> See comment#15 (and comment#13) for three live systems that show the issue.

Ok thanks, I can see this:

-->
dracut-install: Handling /lib/modules/5.14.5-7.g24dfde2-default//kernel/drivers/block/xen-blkfront.ko.xz
dracut-install: No symbol or path match for '/lib/modules/5.14.5-7.g24dfde2-default//kernel/drivers/block/xen-blkfront.ko.xz'
--<

which comes from dracut-install.c:

-->
if (!check_module_path(path) || !check_module_symbols(mod)) {
        log_debug("No symbol or path match for '%s'", path);
        return 1;
}
--<

For kernel 5.3.18-59.19-default this is different:

-->
dracut-install: Handling /lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz
dracut-install: Module xen_blkfront: symbol blk_cleanup_queue matched inclusion filter
dracut-install: dracut_install '/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz' '/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz'
dracut-install: dracut_install('/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz', '/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz')
dracut-install: dracut_install ret = 0
dracut-install: cp '/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz' '/var/tmp/dracut.dTMGuu/initramfs/lib/modules/5.3.18-59.19-default//kernel/drivers/block/xen-blkfront.ko.xz'
dracut-install: cp ret = 0
dracut-install: dracut_install ret = 0
dracut-install: dracut_install 'xen_blkfront' OK
--<

Therefore I'm not really sure whether dracut is to blame.
(In reply to Thomas Blume from comment #19)
> Not really sure whether dracut is to blame therefore.
FWICT, the relevant lines are:

local _blockfuncs='ahci_platform_get_resources|ata_scsi_ioctl|scsi_add_host|blk_cleanup_queue|register_mtd_blktrans|...
...
dracut_instmods -o -s "${_blockfuncs}" "=drivers"

It goes through all driver modules and installs those which have symbols in the given list. xen-blkfront was ported away from blk_cleanup_queue to blk_cleanup_disk in 3b62c140e93d32, so it no longer matches. Adding blk_cleanup_disk to the list might be the right fix.
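The effect of the symbol filter can be illustrated with a small sketch. It is not dracut's code: module_matches_filter and check are hypothetical helpers, the filters are shortened to the two symbols named in this report, and the symbol lists are illustrative rather than taken from the real modules.

```shell
#!/bin/bash
# Sketch only: emulates a symbol-based inclusion filter as described above.
# module_matches_filter is a hypothetical helper, not part of dracut.
# It reads symbol names (one per line) on stdin and succeeds if any of them
# matches the extended regular expression given as $1.
module_matches_filter() {
    grep -Eq -- "$1"
}

# Shortened filters: only the symbols named in this report.
old_filter='blk_cleanup_queue'
new_filter='blk_cleanup_queue|blk_cleanup_disk'

# Illustrative symbol lists, not real output from the modules.
symbols_5_3='blk_cleanup_queue'
symbols_5_14='blk_cleanup_disk'

check() {
    # $1 = label, $2 = filter, $3 = symbol list
    if printf '%s\n' "$3" | module_matches_filter "$2"; then
        echo "$1: included"
    else
        echo "$1: skipped"
    fi
}

check "5.3 module, old filter" "$old_filter" "$symbols_5_3"
check "5.14 module, old filter" "$old_filter" "$symbols_5_14"
check "5.14 module, extended filter" "$new_filter" "$symbols_5_14"
```

With the old filter the 5.14 symbol list is skipped, which mirrors the "No symbol or path match" message in the debug log; extending the filter makes it match again.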
> Adding blk_cleanup_disk to the list might be the right fix.

Tested with success. I can confirm that this fix solves the issue.
Actually, an upstream fix already exists: https://github.com/dracutdevs/dracut/commit/b292ce72

Backporting it to our 055 branch.
I think this has to be backported wherever it is broken...
AFAIK it would only affect Tumbleweed and SLE-15 SP4, both with kernel 5.14, so it should be enough to fix the dracut 055 branch. Anyway, if I'm wrong, Thomas will tell us which other versions are affected.
(In reply to Antonio Feijoo from comment #24)
> AFAIK it would only affect Tumbleweed and SLE-15 SP4, both with kernel 5.14,
> so it should be enough to fix the dracut 055 branch.
> Anyway, if I'm wrong, Thomas will tell us which other versions are affected.

It's not that uncommon to test newer kernels also on systems with older dracut. So ideally this gets backported to all affected dracut versions.
(In reply to Fabian Vogt from comment #25)
> (In reply to Antonio Feijoo from comment #24)
> > AFAIK it would only affect Tumbleweed and SLE-15 SP4, both with kernel 5.14,
> > so it should be enough to fix the dracut 055 branch.
> > Anyway, if I'm wrong, Thomas will tell us which other versions are affected.
>
> It's not that uncommon to test newer kernels also on systems with older
> dracut. So ideally this gets backported to all affected dracut versions.

Let's settle on SLE15. That would be then dracut-055 and dracut-049. I wouldn't really like to see a kernel version 5 on a SLE12.

Fabian, ok for you?
I accidentally rebooted the host. Let me know if the VMs are still needed.
(In reply to Thomas Blume from comment #26)
> Let's settle on SLE15.
> That would be then dracut-055 and dracut-049.
> I wouldn't really like to see a kernel version 5 on a SLE12.
>
> Fabian, ok for you?

Sounds good!
This is an autogenerated message for OBS integration:
This bug (1190326) was mentioned in
https://build.opensuse.org/request/show/924907 Factory / dracut
Fix for dracut is included in snapshot (20211021) and resolved the issue.
SUSE-RU-2021:3782-1: An update that has three recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1187190,1188713,1190326
CVE References:
JIRA References:
Sources used:
SUSE MicroOS 5.1 (src): dracut-049.1+suse.216.gf705637b-3.45.1
SUSE Linux Enterprise Module for Basesystem 15-SP3 (src): dracut-049.1+suse.216.gf705637b-3.45.1
SUSE Linux Enterprise Module for Basesystem 15-SP2 (src): dracut-049.1+suse.216.gf705637b-3.45.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
openSUSE-RU-2021:3782-1: An update that has three recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1187190,1188713,1190326
CVE References:
JIRA References:
Sources used:
openSUSE Leap 15.3 (src): dracut-049.1+suse.216.gf705637b-3.45.1
openSUSE-RU-2021:1506-1: An update that has three recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1187190,1188713,1190326
CVE References:
JIRA References:
Sources used:
openSUSE Leap 15.2 (src): dracut-049.1+suse.216.gf705637b-lp152.2.36.1