Bugzilla – Bug 1114113
the latest LVM2 can overwrite extents beyond metadata area
Last modified: 2018-12-22 07:36:17 UTC
I received a bug report from the upstream developer: in the latest versions, LVM2 can overwrite extents beyond the metadata area.
A critical bug has been found in LVM which can cause data corruption in rare cases. Avoid using LVM commands that change Volume Group metadata (e.g. lvcreate, lvextend) while LVs in the VG are being used.
An I/O bug can cause LVM commands to read and write back the first 128KB of data immediately following the LVM metadata at the start of the disk. If these blocks are being modified by another program (or file system) at the same time as the LVM command, then that program's changes could be lost.
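The lost-update pattern described above can be illustrated with a toy model. This is only a sketch, not LVM code: the "disk" is an in-memory bytearray, the 1 MiB metadata size and the simplified layout (metadata at offset 0, data right after it) are assumptions made for illustration, and `simulate_lost_update` is a hypothetical name.

```python
from typing import Tuple

METADATA_SIZE = 1024 * 1024  # assumed metadata area size, for illustration only
OVERRUN = 128 * 1024         # the 128KB following the metadata, per the report


def simulate_lost_update() -> Tuple[bytes, bytes]:
    """Model a read / modify-elsewhere / write-back sequence on a fake disk."""
    disk = bytearray(METADATA_SIZE + 4 * OVERRUN)
    data_offset = METADATA_SIZE  # first byte after the metadata area

    # 1. The metadata-changing command reads its own area plus the 128KB after it.
    stale = bytes(disk[: METADATA_SIZE + OVERRUN])

    # 2. Meanwhile another program updates a block inside that 128KB.
    disk[data_offset : data_offset + 4] = b"NEW!"
    written = bytes(disk[data_offset : data_offset + 4])

    # 3. The command writes its whole buffer back, including the stale 128KB.
    disk[: len(stale)] = stale

    after = bytes(disk[data_offset : data_offset + 4])
    return written, after


if __name__ == "__main__":
    written, after = simulate_lost_update()
    print("program wrote:", written)
    print("on disk after write-back:", after)
```

In this model the other program's write only survives if it lands outside the window between steps 1 and 3, which matches the description that data is lost only when the blocks are modified at the same time as the LVM command runs.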
A fix is being evaluated and will be provided as soon as possible.
More details can be found at the link
The reproducible LV corruption I was seeing was from the following:
setup and start sanlock and lvmlockd
vgcreate --shared --metadatasize 1m foo /dev/sdg
(Note that this creates an internal "lvmlock" LV that sanlock uses to store leases.)
lvcreate 500 inactive LVs in foo
During the vgremove step, sanlock notices that its updates to the internal "lvmlock" LV are periodically lost. It's because when vgremove writes metadata at the end of the metadata area, it also clobbers PEs that were allocated to the lvmlock LV. (sanlock reads/writes blocks to the lvmlock LV and notices if data changes out from under it.)
It should be straightforward to reproduce this same issue without lvmlockd and sanlock. Create an ordinary VG, create an initial small LV (that uses the first PEs in the VG), and start a script or program that reads/writes data to that LV and verifies that what it wrote comes back again. Then start creating 500 other LVs in the VG, and removing those 500 LVs. This causes the LV metadata to grow large and wrap around the end of the metadata area. When lvm writes to the end of the metadata area, it will clobber data that the test program wrote, and the test program should eventually notice that its last write is missing.
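The "script or program that reads/writes data to that LV and verifies that what it wrote comes back" could look roughly like the sketch below. It is a minimal example under stated assumptions, not the program used upstream: `run_verifier`, `make_pattern`, the 4096-byte block size and the `between` hook are all hypothetical names and values. It uses `os.pread`/`os.pwrite`, so it needs a Unix-like system. On a real LV the reads may be served from the page cache, so a real test would probably need to bypass the cache (for example with O_DIRECT), which this sketch does not do.

```python
import os

BLOCK_SIZE = 4096  # assumed block size for this sketch


def make_pattern(seq: int, size: int = BLOCK_SIZE) -> bytes:
    """Build a block whose content depends on the sequence number."""
    stamp = f"seq={seq:012d};".encode()
    reps = size // len(stamp) + 1
    return (stamp * reps)[:size]


def run_verifier(path: str, rounds: int, offset: int = 0, between=None) -> int:
    """Write a fresh pattern each round, then read it back and compare.

    Returns the first round whose write did not come back, or -1 if every
    round verified. `between` is an optional hook called after each write;
    in a real run it would be a sleep while LVM commands execute elsewhere.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        for seq in range(rounds):
            expected = make_pattern(seq)
            os.pwrite(fd, expected, offset)
            os.fsync(fd)
            if between is not None:
                between(seq)
            actual = os.pread(fd, BLOCK_SIZE, offset)
            if actual != expected:
                return seq  # the write from this round was lost
        return -1
    finally:
        os.close(fd)
```

A run against the small LV might call `run_verifier("/dev/foo/testlv", rounds=100000, between=lambda seq: time.sleep(0.01))`, where the device path is a placeholder and `time` has to be imported, while a second shell creates and removes the 500 other LVs.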
Judging from the reproduction steps, I think this bug is triggered when sanlock and lvmlockd are in use. So far we do not declare support for sanlock, so the priority of this bug is not very high, but I will backport the patch soon.
The final fix is available here: https://sourceware.org/git/?p=lvm2.git;a=commit;h=ab27d5dc2a5c3bf23ab8fed438f1542015dc723d
I will backport it to SLE12SP4 and Tumbleweed.
I have submitted the code change to open::factory, sle12sp4update and sle15sp1 code branches.
The patch is now in the relevant code branches, so I am closing this bug.
SUSE-RU-2018:4226-1: An update that has two recommended fixes can now be installed.
Category: recommended (moderate)
Bug References: 1110872,1114113
SUSE Linux Enterprise Software Development Kit 12-SP4 (src): lvm2-2.02.180-9.4.2
SUSE Linux Enterprise Server 12-SP4 (src): lvm2-2.02.180-9.4.2
SUSE Linux Enterprise High Availability 12-SP4 (src): lvm2-2.02.180-9.4.2
SUSE Linux Enterprise Desktop 12-SP4 (src): lvm2-2.02.180-9.4.2