Bugzilla – Bug 1136641
Failure to install leap15.1 on system with existing md-RAID and LVM
Last modified: 2020-02-26 12:00:40 UTC
Back in January 2017 I installed Leap 42.2 (IIRC) on a system using two SSDs and two HDDs, creating primary partitions and a RAID1 set on each of the pairs, and set up LVM PVs and VGs on top of those RAID devices; everything was fine so far as I could tell. That system has served me well to this date and received online in-place upgrades to 42.3 and 15.0 just fine.

After recently upgrading mainboard, CPU and RAM I encountered a few minor problems with the 15.0 installation (the Intel GPU started rendering sloppily, forgetting to update window contents, but this is off topic...). Therefore I decided to start with a fresh install onto a free LV on my existing setup. But during installation I get an error dialog complaining that both volume groups are missing physical devices, rendering them unusable and destined for deletion.

Switching to the shell on virtual console 2, however, I could verify that everything still seemed in perfect order: /proc/mdstat showed all four RAID members up, LVM showed all PVs, VGs and LVs, and the lsblk output was also perfectly fine. I believe my setup is valid and sound and therefore consider this failure a bug and a regression compared to previous behaviour.
Created attachment 806255 [details] yast log from installation
Created attachment 806256 [details] various command output describing the disk layout. The output of /proc/mdstat, pvs, vgs, lvs and lsblk, to give a better picture.
Thanks for the report including logs. I am able to reproduce the problem with Leap 15.1. With Leap 15.0 the problem does not appear.

One problem is that the output of pvs has changed. In Leap 15.0 pvs only reports the physical volumes on the RAID. In Leap 15.1 it also reports the physical volumes on the partitions used for the RAID. For this YaST could simply be updated. But when activating LVM in the installed system I get errors from 'vgchange -a y' that some physical volumes have duplicates and thus activation fails (on a named RAID; on an unnamed RAID it works):

# vgchange -ay
  WARNING: found device with duplicate /dev/sdc2
  WARNING: found device with duplicate /dev/md127
  WARNING: Disabling lvmetad cache which does not support duplicate PVs.
  WARNING: Scan found duplicate PVs.
  WARNING: Not using lvmetad because cache update failed.
  WARNING: Not using device /dev/sdc2 for PV ECQlxl-NZZr-fLCA-Bc2P-L1yQ-szWg-cXKEQC.
  WARNING: Not using device /dev/md127 for PV ECQlxl-NZZr-fLCA-Bc2P-L1yQ-szWg-cXKEQC.
  WARNING: PV ECQlxl-NZZr-fLCA-Bc2P-L1yQ-szWg-cXKEQC prefers device /dev/sdb2 because of previous preference.
  WARNING: PV ECQlxl-NZZr-fLCA-Bc2P-L1yQ-szWg-cXKEQC prefers device /dev/sdb2 because of previous preference.
  Cannot activate LVs in VG vg-b while PVs appear on duplicate devices.
  0 logical volume(s) in volume group "vg-b" now active
  1 logical volume(s) in volume group "vg-a" now active

This looks like a problem of the openSUSE base system.
Created attachment 806655 [details] output of pvs in Leap 15.0 inst-sys
Created attachment 806656 [details] output of pvs in Leap 15.1 inst-sys
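The duplicate condition described above can be illustrated offline. Given a dump of PV UUIDs and device names (as produced by `pvs -o pv_uuid,pv_name --noheadings`), any UUID that appears on more than one device is exactly what LVM flags as a duplicate PV. A minimal sketch; the sample data below is fabricated for illustration, reusing the UUIDs from the logs in this bug:

```shell
# Sample data mimicking `pvs -o pv_uuid,pv_name --noheadings` output on the
# affected Leap 15.1 system: the same PV UUID is reported on both RAID1
# member partitions and on the assembled md device (devices are hypothetical).
cat > /tmp/pvs.txt <<'EOF'
ECQlxl-NZZr-fLCA-Bc2P-L1yQ-szWg-cXKEQC /dev/sdb2
ECQlxl-NZZr-fLCA-Bc2P-L1yQ-szWg-cXKEQC /dev/sdc2
ECQlxl-NZZr-fLCA-Bc2P-L1yQ-szWg-cXKEQC /dev/md127
ytDoZ3-EJu7-mMKH-hDW1-w1uJ-P3Hg-sC0wLG /dev/md126
EOF

# Print every PV UUID that appears on more than one device -- the
# "Scan found duplicate PVs" condition that blocks vgchange -ay.
awk '{count[$1]++} END {for (u in count) if (count[u] > 1) print u}' /tmp/pvs.txt
```

On Leap 15.0 the component partitions would be filtered out by md component detection, so each UUID would appear only once and the check would print nothing.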
Trying to narrow down the problem, I decided to try an in-place upgrade of my current system running 15.0. After reboot LVM complained (see below), rendering the system unbootable. Thanks to btrfs and snapper I could roll back...

Jun 01 23:46:00 dragon lvm[644]: WARNING: found device with duplicate /dev/sdb1
Jun 01 23:46:00 dragon lvm[644]: WARNING: found device with duplicate /dev/sdd1
Jun 01 23:46:00 dragon lvm[644]: WARNING: found device with duplicate /dev/md127
Jun 01 23:46:00 dragon lvm[644]: WARNING: Disabling lvmetad cache which does not support duplicate PVs.
Jun 01 23:46:00 dragon lvm[644]: WARNING: Scan found duplicate PVs.
Jun 01 23:46:00 dragon lvm[644]: WARNING: Not using lvmetad because cache update failed.
...
Jun 01 23:46:00 dragon lvm[644]: WARNING: Not using device /dev/sdb1 for PV ytDoZ3-EJu7-mMKH-hDW1-w1uJ-P3Hg-sC0wLG.
Jun 01 23:46:00 dragon lvm[644]: WARNING: Not using device /dev/md127 for PV ytDoZ3-EJu7-mMKH-hDW1-w1uJ-P3Hg-sC0wLG.
Jun 01 23:46:00 dragon lvm[644]: WARNING: Not using device /dev/sdd1 for PV dd586g-9Nr7-5dJ7-cAyY-xIkS-yPOB-tjC97b.
Jun 01 23:46:00 dragon lvm[644]: WARNING: PV ytDoZ3-EJu7-mMKH-hDW1-w1uJ-P3Hg-sC0wLG prefers device /dev/sda1 because of previous preference.
Jun 01 23:46:00 dragon lvm[644]: WARNING: PV dd586g-9Nr7-5dJ7-cAyY-xIkS-yPOB-tjC97b prefers device /dev/sdc1 because of previous preference.
Jun 01 23:46:00 dragon lvm[644]: WARNING: Device mismatch detected for vgssd/leap422 which is accessing /dev/md127 instead of /dev/sda1.
Jun 01 23:46:00 dragon lvm[644]: 1 logical volume(s) in volume group "vgssd" monitored

I would guess that there have been severe changes within LVM causing the problem, right?
I had a similar problem with one of my machines. In /etc/lvm/lvm.conf I had to change the filter entry to

  filter = [ "a|/dev/md.*|", "r|/.*/|" ]

This assumes that ALL LVM physical volumes are on md RAID. In other cases the filter entry will look different and has to accept other partitions, too.
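To illustrate the "other cases" mentioned above: a machine that also keeps a plain, non-RAID PV would need an additional accept pattern before the final reject. A hypothetical example (the /dev/sda3 path is made up; adapt it to the actual non-RAID PV):

```
# /etc/lvm/lvm.conf, devices section -- hypothetical layout with one
# non-RAID PV on /dev/sda3 in addition to the md arrays.
# Patterns are evaluated in order: accept md devices, accept the plain
# PV, reject everything else (including the md member partitions).
filter = [ "a|/dev/md.*|", "a|/dev/sda3|", "r|/.*/|" ]
```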
Just tested: trying to install current Tumbleweed from a USB stick gives the same problems as Leap 15.1. Just FYI.
Last night I did my first 15.0 to 15.1 online/zypper dup upgrade involving software RAID1 (but no LVM), with the kernel locked. The existing /etc/mdadm.conf, with 16 partitions on two 320 GB Seagates comprising 8 md devices, did not get changed. To perform the initial upgrade and kernel installation I used "systemd.log_level=debug printk.devkmsg=on" on the cmdline, and saved the journal of each. First boot with distribution kernel lp151.27.3 worked as expected, as do subsequent boots with 15.1 update kernel lp151.28.4.1.
Note that LVM2 has a feature called "md_component_detection", which should prevent this from happening. The findings in this bug suggest that this detection doesn't work as it should, unless the reporter switched md component detection off (which I doubt). Tobias, what happens if you run with no "filter" expression at all in lvm.conf?
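For reference, this setting lives in the devices section of lvm.conf and defaults to enabled; a sketch of what forcing it on explicitly would look like (see the lvm.conf man page for the authoritative description):

```
# /etc/lvm/lvm.conf
devices {
    # When set to 1 (the default), LVM scans for md superblocks and
    # skips block devices that are components of an md array, so only
    # the assembled /dev/mdX device is treated as a PV.
    md_component_detection = 1
}
```

The effective value on a running system can be queried with `lvmconfig devices/md_component_detection`.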
Just a note: this bug may be related to bug 1137296.
...and to bug 1099391: https://bugzilla.opensuse.org/show_bug.cgi?id=1099391
1099391, 1099329 & 1136641 are the same bug. They need the patches below (on the lvm2 stable-2.02 branch); these patches have already been merged in Tumbleweed (lvm2-2.03.05):

```
commit a188b1e513ed5ca0f5f3702c823490f5610d4495
Author: David Teigland <teigland@redhat.com>
Date:   Fri Nov 30 16:32:32 2018 -0600

    pvscan lvmetad: use udev info to improve md component detection

commit a01e1fec0fe7c2fa61577c0e636e907cde7279ea
Author: David Teigland <teigland@redhat.com>
Date:   Thu Nov 29 14:06:20 2018 -0600

    pvscan lvmetad: use full md filter when md 1.0 devices are present

commit 0e42ebd6d4012d210084a9ccf8d76f853726de3c
Author: Peter Rajnoha <prajnoha@redhat.com>
Date:   Thu Nov 29 11:51:05 2018 -0600

    scan: md metadata version 0.90 is at the end of disk

commit e7bb50880901a4462e350ce0d272a63aa8440781
Author: David Teigland <teigland@redhat.com>
Date:   Thu Oct 18 11:32:32 2018 -0500

    scan: enable full md filter when md 1.0 devices are present

commit de2863739f2ea17d89d0e442379109f967b5919d
Author: David Teigland <teigland@redhat.com>
Date:   Fri Jun 15 11:42:10 2018 -0500

    scan: use full md filter when md 1.0 devices are present

commit c527a0cbfc391645d30407d2dc4a30275c6472f1
Author: David Teigland <teigland@redhat.com>
Date:   Mon Aug 27 11:15:35 2018 -0500

    lvmetad: improve scan for pvscan all
```

*** This bug has been marked as a duplicate of bug 1099329 ***
Hi, sorry, but fixing Tumbleweed only does not help me (and others) at all, as I currently have no upgrade path to Leap 15.1 if the current lvm2 package in 15.1 is not fixed as well. And 15.0 goes out of support in about a month, and 15.2, which will hopefully incorporate the fix, will not be available for another six months. So, could you please fix this issue for 15.1, too?! Thank you!
sles-15sp1 patches were merged by request: https://build.suse.de/request/show/203593 sles-15sp2 is lvm2-2.03.05+, which already contains these patches.
for leap 15.1: https://build.opensuse.org/request/show/743388
15.1 will be imported from SLE15 SP1 once it is released.
Just wondering, as it has already been more than three weeks since the supposed commit: what is the ETA of the fixed package for SLES 15.1, and of it trickling down to Leap 15.1? I can see neither (SLES at work, Leap at home)... Support for Leap 15.0 is ending and I really would like to upgrade my system soon without having to lock the old lvm2 packages from 15.0 in order to succeed. Thank you!
I sent my fixed code to sles-15sp1 20 days ago. It is currently in testing status. I can't control the speed of the release; please be patient. If you want, you can download my fix from my private project page: https://build.opensuse.org/package/show/home:hmzhao:branches:openSUSE:Leap:15.1:Update/lvm2 The patch files "bug-1145231_xxx" are the fixes for this bug.
For the SLES product, you can raise a ticket with customer support to request a PTF package for this bug.
*** Bug 1156086 has been marked as a duplicate of this bug. ***
*** Bug 1160661 has been marked as a duplicate of this bug. ***
*** Bug 1161197 has been marked as a duplicate of this bug. ***
*** Bug 1164926 has been marked as a duplicate of this bug. ***