Bug 1136641 - Failure to install leap15.1 on system with existing md-RAID and LVM
Status: RESOLVED DUPLICATE of bug 1099329
Duplicates: 1156086 1160661 1161197 1164926
Classification: openSUSE
Product: openSUSE Distribution
Component: Basesystem
Version: Leap 15.1
Hardware: x86-64 Other
Importance: P2 - High, Major (6 votes)
Target Milestone: ---
Assigned To: heming zhao
QA Contact: E-mail List
URL: https://trello.com/c/waeprl1L
Depends on:
Blocks:
Reported: 2019-05-28 18:30 UTC by Tobias Abt
Modified: 2020-02-26 12:00 UTC
CC List: 13 users

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
yast log from installation (127.55 KB, application/x-xz)
2019-05-28 18:32 UTC, Tobias Abt
various command output describing the disk layout (5.34 KB, text/plain)
2019-05-28 18:35 UTC, Tobias Abt
output of pvs in Leap 15.0 inst-sys (1.46 KB, text/x-log)
2019-06-03 12:53 UTC, Arvin Schnell
output of pvs in Leap 15.1 inst-sys (2.27 KB, text/x-log)
2019-06-03 12:53 UTC, Arvin Schnell

Description Tobias Abt 2019-05-28 18:30:28 UTC
Back in January 2017 I installed Leap 42.2 (IIRC) on a system with two SSDs and two HDDs: I created primary partitions and a RAID1 set on each of the pairs, then set up LVM PVs and VGs on top of those RAID devices. Everything was fine, as far as I could tell.

That system has served me well to this date and received online in-place upgrades to 42.3 and 15.0 just fine.

After recently upgrading the mainboard, CPU and RAM, I encountered a few minor problems with the 15.0 installation (the Intel GPU started rendering sloppily, forgetting to update window contents, but this is off topic...). Therefore I decided to start with a fresh install onto a free LV in my existing setup.

But during installation I get an error dialog complaining that both volume groups are missing physical devices, rendering them unusable and destined for deletion.

Switching to the shell on virtual console 2 however I could verify that everything still seemed in perfect order. /proc/mdstat showed all four RAID members up, LVM showed all PVs, VGs and LVs. lsblk output was also perfectly fine.

I believe my setup is valid and sound, and therefore I consider this failure a bug and a regression compared to previous behaviour.
Comment 1 Tobias Abt 2019-05-28 18:32:26 UTC
Created attachment 806255
yast log from installation
Comment 2 Tobias Abt 2019-05-28 18:35:05 UTC
Created attachment 806256
various command output describing the disk layout

The output from
- /proc/mdstat
- pvs
- vgs
- lvs
- lsblk
to give a better picture.
Comment 3 Arvin Schnell 2019-06-03 12:45:42 UTC
Thanks for the report including logs.

I am able to reproduce the problem with Leap 15.1. With Leap 15.0
the problem does not appear. One problem is that the output of pvs
has changed. In Leap 15.0 pvs only reports the physical volumes in
the RAID. In Leap 15.1 it also reports the physical volumes on the
partitions used for the RAID. For this, YaST could simply be updated.
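The YaST-side update suggested here amounts to ignoring PVs that pvs reports on partitions which are themselves md members. Below is a minimal Python sketch of that idea. It assumes the usual /proc/mdstat layout. The function names and the sample mdstat text are hypothetical, and this is not YaST's actual code.

```python
import re

# A member entry in /proc/mdstat looks like "sdb2[0]", optionally followed by
# state flags such as "(F)" (faulty) or "(S)" (spare).
_MEMBER = re.compile(r"^([\w.-]+)\[\d+\](?:\([A-Z]\))*$")

# Hypothetical sample; a real system's /proc/mdstat will differ.
EXAMPLE_MDSTAT = """\
Personalities : [raid1]
md127 : active raid1 sdc2[1] sdb2[0]
      976630336 blocks [2/2] [UU]

unused devices: <none>
"""


def md_members(mdstat_text: str) -> set[str]:
    """Return the kernel names of all devices listed as md RAID members."""
    members = set()
    for line in mdstat_text.splitlines():
        # Array lines look like "md127 : active raid1 ..."; skip everything else.
        if not line.startswith("md") or " : " not in line:
            continue
        _, rest = line.split(" : ", 1)
        for token in rest.split():
            match = _MEMBER.match(token)
            if match:
                members.add(match.group(1))
    return members


def drop_md_components(pv_devices: list[str], mdstat_text: str) -> list[str]:
    """Keep only PV devices that are not members of an md array."""
    members = md_members(mdstat_text)
    return [dev for dev in pv_devices
            if dev.removeprefix("/dev/") not in members]


print(drop_md_components(["/dev/md127", "/dev/sdb2", "/dev/sdc2"], EXAMPLE_MDSTAT))
# → ['/dev/md127']
```

Matching on kernel device names keeps the sketch simple. It needs Python 3.9+ for `str.removeprefix`.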

But when activating LVM in the installed system I get errors from
'vgchange -a y' that some physical volumes have duplicates, and thus
activation fails (this happens on a named RAID; on an unnamed RAID it works):

# vgchange -ay
  WARNING: found device with duplicate /dev/sdc2
  WARNING: found device with duplicate /dev/md127
  WARNING: Disabling lvmetad cache which does not support duplicate PVs.
  WARNING: Scan found duplicate PVs.
  WARNING: Not using lvmetad because cache update failed.
  WARNING: Not using device /dev/sdc2 for PV ECQlxl-NZZr-fLCA-Bc2P-L1yQ-szWg-cXKEQC.
  WARNING: Not using device /dev/md127 for PV ECQlxl-NZZr-fLCA-Bc2P-L1yQ-szWg-cXKEQC.
  WARNING: PV ECQlxl-NZZr-fLCA-Bc2P-L1yQ-szWg-cXKEQC prefers device /dev/sdb2 because of previous preference.
  WARNING: PV ECQlxl-NZZr-fLCA-Bc2P-L1yQ-szWg-cXKEQC prefers device /dev/sdb2 because of previous preference.
  Cannot activate LVs in VG vg-b while PVs appear on duplicate devices.
  0 logical volume(s) in volume group "vg-b" now active
  1 logical volume(s) in volume group "vg-a" now active

This looks like a problem of the openSUSE base system.
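The warnings above can be grouped per PV UUID to show which device LVM settled on. Below is a small Python sketch that does this. It assumes the exact warning wording quoted in this bug; other lvm2 versions may phrase it differently. The helper names are hypothetical. For the PV above, the sketch shows that LVM preferred the raw member /dev/sdb2 over the array /dev/md127.

```python
import re
from collections import defaultdict

# Patterns for the two kinds of duplicate-PV warnings quoted above.
_NOT_USING = re.compile(r"Not using device (\S+) for PV (\S+?)\.?$")
_PREFERS = re.compile(r"PV (\S+) prefers device (\S+) because")

LOG = """\
  WARNING: Not using device /dev/sdc2 for PV ECQlxl-NZZr-fLCA-Bc2P-L1yQ-szWg-cXKEQC.
  WARNING: Not using device /dev/md127 for PV ECQlxl-NZZr-fLCA-Bc2P-L1yQ-szWg-cXKEQC.
  WARNING: PV ECQlxl-NZZr-fLCA-Bc2P-L1yQ-szWg-cXKEQC prefers device /dev/sdb2 because of previous preference.
"""


def summarize_duplicates(log_text: str) -> dict[str, dict]:
    """Map each PV UUID to the device LVM preferred and the devices it ignored."""
    pvs = defaultdict(lambda: {"preferred": None, "ignored": []})
    for line in log_text.splitlines():
        line = line.strip()
        if m := _NOT_USING.search(line):
            dev, uuid = m.groups()
            if dev not in pvs[uuid]["ignored"]:
                pvs[uuid]["ignored"].append(dev)
        elif m := _PREFERS.search(line):
            uuid, dev = m.groups()
            pvs[uuid]["preferred"] = dev
    return dict(pvs)


def prefers_raw_member(summary: dict[str, dict]) -> list[str]:
    """UUIDs for which LVM chose a non-md device although an md device was seen."""
    return [uuid for uuid, info in summary.items()
            if info["preferred"] is not None
            and not info["preferred"].startswith("/dev/md")
            and any(dev.startswith("/dev/md") for dev in info["ignored"])]


print(prefers_raw_member(summarize_duplicates(LOG)))
# → ['ECQlxl-NZZr-fLCA-Bc2P-L1yQ-szWg-cXKEQC']
```

Treating any path under /dev/md as "the array" is a simplifying assumption.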
Comment 5 Arvin Schnell 2019-06-03 12:53:01 UTC
Created attachment 806655
output of pvs in Leap 15.0 inst-sys
Comment 6 Arvin Schnell 2019-06-03 12:53:19 UTC
Created attachment 806656
output of pvs in Leap 15.1 inst-sys
Comment 7 Tobias Abt 2019-06-04 20:55:36 UTC
Trying to narrow down the problem, I decided to try an in-place upgrade of my current system running 15.0. After the reboot LVM complained (see below), rendering the system unbootable. Thanks to btrfs and snapper I could roll back....

Jun 01 23:46:00 dragon lvm[644]:   WARNING: found device with duplicate /dev/sdb1
Jun 01 23:46:00 dragon lvm[644]:   WARNING: found device with duplicate /dev/sdd1
Jun 01 23:46:00 dragon lvm[644]:   WARNING: found device with duplicate /dev/md127
Jun 01 23:46:00 dragon lvm[644]:   WARNING: Disabling lvmetad cache which does not support duplicate PVs.
Jun 01 23:46:00 dragon lvm[644]:   WARNING: Scan found duplicate PVs.
Jun 01 23:46:00 dragon lvm[644]:   WARNING: Not using lvmetad because cache update failed.
...
Jun 01 23:46:00 dragon lvm[644]:   WARNING: Not using device /dev/sdb1 for PV ytDoZ3-EJu7-mMKH-hDW1-w1uJ-P3Hg-sC0wLG.
Jun 01 23:46:00 dragon lvm[644]:   WARNING: Not using device /dev/md127 for PV ytDoZ3-EJu7-mMKH-hDW1-w1uJ-P3Hg-sC0wLG.
Jun 01 23:46:00 dragon lvm[644]:   WARNING: Not using device /dev/sdd1 for PV dd586g-9Nr7-5dJ7-cAyY-xIkS-yPOB-tjC97b.
Jun 01 23:46:00 dragon lvm[644]:   WARNING: PV ytDoZ3-EJu7-mMKH-hDW1-w1uJ-P3Hg-sC0wLG prefers device /dev/sda1 because of previous preference.
Jun 01 23:46:00 dragon lvm[644]:   WARNING: PV dd586g-9Nr7-5dJ7-cAyY-xIkS-yPOB-tjC97b prefers device /dev/sdc1 because of previous preference.
Jun 01 23:46:00 dragon lvm[644]:   WARNING: Device mismatch detected for vgssd/leap422 which is accessing /dev/md127 instead of /dev/sda1.
Jun 01 23:46:00 dragon lvm[644]:   1 logical volume(s) in volume group "vgssd" monitored

I would guess that there have been severe changes within LVM causing the problem, right?
Comment 8 Herbert Graeber 2019-06-05 10:21:39 UTC
I had a similar problem with one of my machines. In /etc/lvm/lvm.conf I had to change the filter entry to

  filter = [ "a|/dev/md.*|", "r|/.*/|" ]

This assumes that ALL lvm physical volumes are on mdraid. In other cases the filter entry will look different and has to consider other partitions, too.
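The following simplified Python sketch shows how such a filter is evaluated. According to the lvm.conf documentation, the first regex that matches a device path decides whether the device is accepted ('a') or rejected ('r'). A device that matches no regex is accepted. The sketch assumes a single path per device, whereas real LVM also considers symlink names. It uses Python's re module in place of LVM's own regex engine. The helper names are hypothetical.

```python
import re


def parse_filter(entries: list[str]) -> list[tuple[bool, re.Pattern]]:
    """Turn entries like "a|regex|" into (accept?, compiled regex) pairs."""
    rules = []
    for entry in entries:
        action, delim = entry[0], entry[1]
        end = entry.rindex(delim)
        if action not in ("a", "r") or end <= 1:
            raise ValueError(f"malformed filter entry: {entry!r}")
        rules.append((action == "a", re.compile(entry[2:end])))
    return rules


def accepted(path: str, rules: list[tuple[bool, re.Pattern]]) -> bool:
    """Single-path version of LVM's rule: the first matching regex decides."""
    for accept, regex in rules:
        if regex.search(path):  # patterns are unanchored, so search, not match
            return accept
    return True  # nothing matched: the device is accepted


rules = parse_filter(["a|/dev/md.*|", "r|/.*/|"])
for dev in ["/dev/md127", "/dev/sdb2", "/dev/sdc2"]:
    print(dev, "accepted" if accepted(dev, rules) else "rejected")
```

With the filter from this comment, only /dev/md127 is accepted. Because "r|/.*/|" matches every path, any PV that does not live on md is hidden from LVM. This is why the comment warns that the entry must list other partitions in that case.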
Comment 9 Tobias Abt 2019-06-05 17:07:55 UTC
Just tested: trying to install the current Tumbleweed from a USB stick gives the same problems as Leap 15.1. Just FYI.
Comment 10 Felix Miata 2019-06-10 01:41:26 UTC
Last night I did my first 15.0 to 15.1 online/zypper dup upgrade involving software RAID1 (but no LVM), with the kernel locked. The existing /etc/mdadm.conf, with 16 partitions on two 320GB Seagates comprising 8 md devices, did not get changed. For the initial upgrade and for the kernel installation I used "systemd.log_level=debug printk.devkmsg=on" on the kernel command line, and saved the journal of each. The first boot with the distribution kernel lp151.27.3 worked as expected, as do subsequent boots with the 15.1 update kernel lp151.28.4.1.
Comment 11 Martin Wilck 2019-06-17 14:32:34 UTC
Note that LVM2 has a feature called "md_component_detection", which should prevent this from happening. The findings in this bug suggest that this detection doesn't work as it should, unless the reporter switched md component detection off (which I doubt).

Tobias, what happens if you run with no "filter" expression at all in lvm.conf?
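Md component detection matters here because of where md metadata 0.90 and 1.0 are placed. With these versions, the superblock sits at the end of each member and array data starts at sector 0. As a result, the LVM label near the start of the md device (sector 1 by default) is also visible on every raw member partition. Metadata 1.1 and 1.2 put the superblock at or near the start instead. The commit titles quoted later in this bug refer to exactly the 0.90 and 1.0 cases. Below is a Python sketch of the superblock offsets. It assumes the layout rules mdadm uses, with offsets in 512-byte sectors. The function names are hypothetical.

```python
def md_superblock_offset(version: str, dev_sectors: int) -> int:
    """Sector (512 bytes) at which md metadata `version` sits on a member device."""
    if version == "0.90":
        reserved = 128  # 64 KiB reserved at the end, aligned down to 64 KiB
        return (dev_sectors & ~(reserved - 1)) - reserved
    if version == "1.0":
        return (dev_sectors - 16) & ~7  # 8 KiB before the end, aligned to 4 KiB
    if version == "1.1":
        return 0  # at the very start of the member
    if version == "1.2":
        return 8  # 4 KiB from the start of the member
    raise ValueError(f"unknown md metadata version: {version}")


def member_exposes_array_data_at_start(version: str) -> bool:
    """True if array data (and thus an LVM label) begins at sector 0 of a member."""
    return version in ("0.90", "1.0")


one_gib = 2 * 1024 * 1024  # sectors in a hypothetical 1 GiB partition
for v in ("0.90", "1.0", "1.1", "1.2"):
    print(v, md_superblock_offset(v, one_gib), member_exposes_array_data_at_start(v))
```

For the two "end of device" formats, a plain scan of the first sectors of a member partition finds a valid PV label. LVM therefore has to recognize the md superblock at the end to avoid reporting the partition as a duplicate PV.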
Comment 12 heming zhao 2019-07-03 08:12:10 UTC
Just a note: this bug may be related to bug 1137296.
Comment 13 Stefan Schäfer 2019-10-11 07:23:40 UTC
...and to Bug: https://bugzilla.opensuse.org/show_bug.cgi?id=1099391
Comment 14 heming zhao 2019-10-14 06:48:52 UTC
1099391, 1099329 and 1136641 are the same bug.
The fix needs the patches below (from the lvm2 stable-2.02 branch);
these patches have already been merged in Tumbleweed (lvm2-2.03.05).

patch:
```
commit a188b1e513ed5ca0f5f3702c823490f5610d4495
Author: David Teigland <teigland@redhat.com>
Date:   Fri Nov 30 16:32:32 2018 -0600

    pvscan lvmetad: use udev info to improve md component detection


commit a01e1fec0fe7c2fa61577c0e636e907cde7279ea
Author: David Teigland <teigland@redhat.com>
Date:   Thu Nov 29 14:06:20 2018 -0600

    pvscan lvmetad: use full md filter when md 1.0 devices are present


commit 0e42ebd6d4012d210084a9ccf8d76f853726de3c
Author: Peter Rajnoha <prajnoha@redhat.com>
Date:   Thu Nov 29 11:51:05 2018 -0600

    scan: md metadata version 0.90 is at the end of disk


commit e7bb50880901a4462e350ce0d272a63aa8440781
Author: David Teigland <teigland@redhat.com>
Date:   Thu Oct 18 11:32:32 2018 -0500

    scan: enable full md filter when md 1.0 devices are present


commit de2863739f2ea17d89d0e442379109f967b5919d
Author: David Teigland <teigland@redhat.com>
Date:   Fri Jun 15 11:42:10 2018 -0500

    scan: use full md filter when md 1.0 devices are present


commit c527a0cbfc391645d30407d2dc4a30275c6472f1
Author: David Teigland <teigland@redhat.com>
Date:   Mon Aug 27 11:15:35 2018 -0500

    lvmetad: improve scan for pvscan all
```
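The commit "pvscan lvmetad: use udev info to improve md component detection" relies on information udev already holds about each device. One such piece of information is the ID_FS_TYPE property set by udev's blkid import, which reports md members as linux_raid_member. Below is a rough Python sketch that works on the text printed by `udevadm info --query=property --name=<device>`. The sample text and function names are hypothetical, and this is not the code from the lvm2 patch.

```python
def parse_udev_properties(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines as printed by `udevadm info --query=property`."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")  # split on the first "=" only
        if sep:
            props[key.strip()] = value.strip()
    return props


def is_md_component(udev_text: str) -> bool:
    """True if udev/blkid identified the device as an md RAID member."""
    return parse_udev_properties(udev_text).get("ID_FS_TYPE") == "linux_raid_member"


# Hypothetical property dump for a RAID1 member partition.
SAMPLE_UDEV = """\
DEVNAME=/dev/sdb2
DEVTYPE=partition
ID_FS_TYPE=linux_raid_member
"""

print(is_md_component(SAMPLE_UDEV))
# → True
```

Using the udev database avoids reading the end of every device during a scan, at the cost of depending on udev having processed the device already.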

*** This bug has been marked as a duplicate of bug 1099329 ***
Comment 15 Tobias Abt 2019-10-27 19:20:30 UTC
Hi,

sorry, fixing only Tumbleweed does not help me (and others) at all, as I currently have no upgrade path to Leap 15.1 unless the current lvm2 package in 15.1 is fixed as well.

And 15.0 goes out of support in about a month, and 15.2, which will hopefully incorporate the fix, will not be available for another six months.

So, could you please fix this issue for 15.1, too?!

Thank you!
Comment 16 heming zhao 2019-10-28 02:32:23 UTC
sles-15sp1 patches were merged by request: https://build.suse.de/request/show/203593

sles-15sp2 is lvm2-2.03.05+, which already contains these patches.
Comment 17 heming zhao 2019-10-28 03:36:17 UTC
for leap 15.1: https://build.opensuse.org/request/show/743388
Comment 18 Marcus Meissner 2019-10-28 09:10:43 UTC
15.1 will be imported from SLE15 SP1 once it is released.
Comment 19 Tobias Abt 2019-11-21 17:42:38 UTC
Just wondering, as it has already been more than three weeks since the supposed commit: what is the ETA of the fixed package for SLES 15 SP1, and of it trickling down to Leap 15.1? I can't see either (SLES at work, Leap at home)...

Support for Leap 15.0 is ending, and I would really like to upgrade my system soon without having to lock the old lvm2 packages from 15.0 in order to succeed.

Thank you!
Comment 20 heming zhao 2019-11-22 03:28:44 UTC
I submitted my fix to sles-15sp1 20 days ago.
It is currently in testing. I can't control the speed of the release.
Please be patient.

If you want, you can download my fix from my private project page:
https://build.opensuse.org/package/show/home:hmzhao:branches:openSUSE:Leap:15.1:Update/lvm2
The patch files "bug-1145231_xxx" contain the fixes for this bug.
Comment 21 heming zhao 2019-11-22 04:36:02 UTC
For the SLES product, you can open a ticket with customer support to request a PTF package for this bug.
Comment 22 José Iván López González 2019-12-19 10:09:44 UTC
*** Bug 1156086 has been marked as a duplicate of this bug. ***
Comment 23 Ancor Gonzalez Sosa 2020-01-10 11:24:43 UTC
*** Bug 1160661 has been marked as a duplicate of this bug. ***
Comment 24 Ancor Gonzalez Sosa 2020-01-20 16:19:20 UTC
*** Bug 1161197 has been marked as a duplicate of this bug. ***
Comment 25 Ancor Gonzalez Sosa 2020-02-26 12:00:40 UTC
*** Bug 1164926 has been marked as a duplicate of this bug. ***