Bug 1156086

Summary: Combination of RAID5 with (encrypted) logical volumes blows up system
Product: [openSUSE] openSUSE Distribution
Reporter: Occo Eric Nolf <oen999>
Component: YaST2
Assignee: YaST Team <yast-internal>
Status: RESOLVED DUPLICATE
QA Contact: Jiri Srain <jsrain>
Severity: Normal
Priority: P5 - None
CC: ancor, aschnell, jlopez
Version: Leap 15.1
Target Milestone: ---
Hardware: x86-64
OS: Other
URL: https://trello.com/c/nwAGA5gG
See Also: http://bugzilla.suse.com/show_bug.cgi?id=1136641
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---
Attachments:
  Yast2 logs, as mentioned in the request to report a bug
  Nonsense scenario

Description Occo Eric Nolf 2019-11-06 22:00:35 UTC
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
Build Identifier: 

After creating a RAID5 array and using it (entirely) for LVM, the creation of multiple (encrypted) LVs leads to insurmountable problems. The RAID5 (although not specifically named) is seen as 'dirty'.
During re-install, the RAID5 is visible but the volume group and LVs seem to have disappeared. After re-install, the RAID5 is visible in Yast2, but again without VG and LVs; moreover, the RAID5 cannot be removed, and I am asked to submit a bug report.

I've been away from the Linux world for 10 years and only recently returned, but I'm not exactly a 'beginner' in computer land.
This problem (which was very hard to pinpoint) has come up several times, and also occurs in Tumbleweed. In fact, if the phantom RAID5 array is present the installation of Tumbleweed won't get past the selection of the desktop.

Reproducible: Always

Steps to Reproduce:
1. Build a RAID5 array (I used three 8 TB hard drives, giving 14.55 TiB net)
2. Build a volume group, consisting of the complete RAID5 array.
3. Build encrypted logical volumes; I think the first LV won't cause problems, and maybe the 2nd is OK too, but with 3 LVs the system goes bonkers.
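The 14.55 figure in step 1 matches the usual RAID5 capacity rule (one drive's worth of space goes to parity) once the result is expressed in binary units. A minimal sketch of that arithmetic, assuming the drives are exactly 8 decimal terabytes each; the function name is only illustrative:

```python
# RAID5 stores one drive's worth of parity, so usable space is (n - 1) * size.
def raid5_usable_bytes(drive_count: int, drive_bytes: int) -> int:
    return (drive_count - 1) * drive_bytes

TB = 10**12   # decimal terabyte, as printed on drive labels
TiB = 2**40   # binary tebibyte, as many storage tools report sizes

usable = raid5_usable_bytes(3, 8 * TB)   # assumption: exactly 8 TB per drive
usable_tb = usable / TB
usable_tib = round(usable / TiB, 2)

print(f"{usable_tb:.0f} TB = {usable_tib} TiB")   # prints "16 TB = 14.55 TiB"
```

So the array holds 16 TB in decimal units, which is the same amount of space as roughly 14.55 TiB.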
Actual Results:  
Booting the system usually, but not always, fails.
After booting, Yast2 lists the RAID5 array but without volume group and without any logical volumes. Even so, the RAID5 cannot be removed.

It is possible to remove the RAID5 using mdadm, but it's not very useful as the problem will re-occur next time RAID5 and LVM are set up.

The Yast2 logs will be added to this bug report.
Comment 1 Occo Eric Nolf 2019-11-06 22:03:34 UTC
Created attachment 823549 [details]
Yast2 logs, as mentioned in the request to report a bug
Comment 2 Arvin Schnell 2019-11-11 08:47:31 UTC
From the logs:

2019-11-06 20:43:09 <3> linux-bzid(3677) [Ruby] yast/wfm.rb:253 Client /usr/share/YaST2/clients/partitioner.rb failed with 'undefined method `plain_device' for nil:NilClass' (NoMethodError).

Looks like a problem in the frontend/Ruby code.
Comment 3 Occo Eric Nolf 2019-11-11 11:28:00 UTC
I notice the severity of this bug has been changed from critical to normal.
With all due respect, I doubt the wisdom of this decision.

After the RAID5 has morphed into a phantom array, the volume group it contains and all associated logical volumes have disappeared completely.
That means ALL data is gone ... as far as I've seen, forever.

Frankly, I would have trouble imagining a bug that could cause more damage than wiping out (possibly huge amounts of) data.
Maybe the status of this particular bug should be reconsidered?
Comment 4 Arvin Schnell 2019-11-11 11:47:50 UTC
The logs additionally show that problem of bug #1136641 seems to be
Comment 5 Ancor Gonzalez Sosa 2019-11-11 16:10:07 UTC
We can keep this bug open to investigate the partitioner exception pointed out by Arvin in comment#2. But, as said in comment#4, the main problem here, the one that is "driving the system bonkers", is not in YaST but in the lvm tools.

See the already mentioned bug#1136641, which contains links to a maintenance request to fix the severely broken LVM tools we shipped with 15.1. They simply don't work properly when LVM and RAID are combined.
Comment 6 Stefan Hundhammer 2019-11-20 15:23:26 UTC
Moving to our Trello task queue to fix the nil:NilClass exception from comment #2.

The other part (the broken LVM) is a duplicate of bug #1099329 / bug #1136641.
Comment 7 José Iván López González 2019-12-19 10:09:43 UTC
The scenario recognized by YaST simply does not make sense (see screenshot). There are two LVM PVs, and both are associated with the same two underlying devices (/dev/sda and /dev/md/HDD)! An LVM PV can only sit on a single device.

I would say this nonsense scenario was produced by the issue with the lvm2 package shipped in openSUSE Leap 15.1 [1]. But the lvm2 package has already been fixed [2], so the problem is gone with an updated Leap 15.1 or Tumbleweed.

From the YaST point of view, there is nothing to fix. The nil:NilClass exception occurs because there is an LVM PV without an associated block device, and that PV does not belong to any LVM VG. I would say that such an insane scenario would only be possible with broken lvm tools.
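The failure mechanism can be illustrated with a small Python analogue (the partitioner itself is Ruby). This is only a sketch: the class and function names below are hypothetical stand-ins, not the YaST or libstorage-ng API. A PV whose block device is missing makes the chained call fail, much like calling `plain_device` on nil in the log line from comment #2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlkDevice:
    """Hypothetical stand-in for a block device."""
    name: str

    def plain_device(self) -> "BlkDevice":
        return self


@dataclass
class LvmPv:
    """Hypothetical stand-in for an LVM PV; blk_device may be missing."""
    blk_device: Optional[BlkDevice] = None


def plain_device_name(pv: LvmPv) -> str:
    # Deliberately no None check, as the logged NoMethodError implies.
    return pv.blk_device.plain_device().name


healthy = LvmPv(BlkDevice("/dev/md/HDD"))
orphan = LvmPv()  # PV without an associated block device

healthy_name = plain_device_name(healthy)
try:
    plain_device_name(orphan)
    error = None
except AttributeError as exc:  # Python's counterpart to NoMethodError on nil
    error = str(exc)
```

With a consistent device graph every PV has a block device, so the call chain never meets a missing value; only the orphaned PV triggers the error.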

Closing the bug as a duplicate.

[1] https://bugzilla.suse.com/show_bug.cgi?id=1136641

[2] https://build.opensuse.org/request/show/743388

*** This bug has been marked as a duplicate of bug 1136641 ***
Comment 8 José Iván López González 2019-12-19 10:10:41 UTC
Created attachment 826449 [details]
Nonsense scenario