Bug 912170 - Boot fails with BTRFS RAID1 array as /home - open ctree failed
Status: RESOLVED WONTFIX
Duplicates: 1000366
Classification: openSUSE
Product: openSUSE Distribution
Component: Basesystem
Version: 13.2
Hardware: 64bit openSUSE 13.2
Priority: P5 - None, Severity: Major
Assigned To: Jeff Mahoney
E-mail List
ibs:running:3562:low
Reported: 2015-01-08 01:36 UTC by Ryan Kingsbury
Modified: 2018-04-12 13:58 UTC (History)
jeffm: needinfo? (RyanSKingsbury)


Attachments
supportconfig from test machine (1.12 MB, application/x-compressed-tar)
2016-12-05 16:11 UTC, Ednilson Miura
Description Ryan Kingsbury 2015-01-08 01:36:16 UTC
I recently did a clean install of openSUSE 13.2 on a system with three drives: one SSD containing the / partition (btrfs with the default subvolumes) and two identical HDDs containing /home in an encrypted BTRFS RAID1 array. Initially, I was using only one of the HDDs as /home (still encrypted BTRFS). Since I added the second drive and started the RAID1 mirroring, my system no longer boots without manual intervention. I get dropped into "emergency mode", but then all I have to do is exit (Ctrl+D) to continue booting, and everything works normally.

The last entry in the system log before Emergency Mode is invoked is:
Code:

BTRFS: open_ctree failed

It appears that a similar (maybe the same) bug was reported on ArchWiki:
https://wiki.archlinux.org/index.php/Btrfs#BTRFS:_open_ctree_failed

But the solution refers to mkinitcpio rather than dracut, which openSUSE uses, so I'm unsure how to apply it to my system.


Some relevant system information:

excerpt from /etc/fstab:
----
# encrypted RAID1 array containing /home
UUID=e91f611f-524a-43f5-bde5-8ebb9672f146 /home btrfs defaults 0 0
----

/etc/crypttab:
----
encrypted-home-sdb UUID=9c8fb7d0-74e2-4e38-b7c7-6211bbb6d2b1 none luks, retry=1
encrypted-home-sdc UUID=b82e5894-cea2-4fe7-a0a5-80f918e9db61 none luks, retry=1
----

excerpt from fdisk -l:
----
Device     Start        End    Sectors   Size Type
/dev/sdb1   2048 1953523711 1953521664 931.5G Microsoft basic data

Disk /dev/sdc: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: CAC38602-D5A5-4F28-8344-65B559124514

Device     Start        End    Sectors   Size Type
/dev/sdc1   2048 1953525134 1953523087 931.5G Linux filesystem

Disk /dev/mapper/encrypted-home-sdc: 931.5 GiB, 1000201723392 bytes, 1953518991 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/mapper/encrypted-home-sdb: 931.5 GiB, 1000200994816 bytes, 1953517568 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
----

Device UUIDs:
----
/dev/sda1: UUID="04e20317-f554-43cf-a95d-b0385ebd22bb" UUID_SUB="cb9dc431-1855-466b-ae9a-5c0ddcd7f95a" TYPE="btrfs" PTTYPE="dos" PARTLABEL="primary" PARTUUID="860c637a-83c8-4399-8dc4-5b5ab66c0f06" 
/dev/sdb1: UUID="9c8fb7d0-74e2-4e38-b7c7-6211bbb6d2b1" TYPE="crypto_LUKS" PARTLABEL="primary" PARTUUID="532a7bd2-6774-422e-90c0-31f19796098d" 
/dev/sdc1: UUID="1ca9d3ba-c409-4127-91f5-e3d9c21242bd" TYPE="crypto_LUKS" PARTLABEL="Linux filesystem" PARTUUID="5a6830cc-9064-476c-aa79-6189dd0964bd" 
/dev/mapper/encrypted-home-sdc: UUID="e91f611f-524a-43f5-bde5-8ebb9672f146" UUID_SUB="6007c4c7-5119-4b0f-ab21-134832c4774c" TYPE="btrfs" 
/dev/mapper/encrypted-home-sdb: UUID="e91f611f-524a-43f5-bde5-8ebb9672f146" UUID_SUB="020bae26-b64f-413e-8263-a6e832ccb224" TYPE="btrfs"
----
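
The UUID listing shows why the mount depends on both mappings being present: the two /dev/mapper devices carry the same filesystem UUID and differ only in UUID_SUB. As a minimal sketch (not part of the original report), the following Python groups blkid-style lines by filesystem UUID to spot multi-device btrfs filesystems. The function names and the shortened sample text are illustrative assumptions, and the parsing only covers the KEY="value" format shown above.

```python
import re
from collections import defaultdict

# Sample lines shortened from the listing in the report (illustrative excerpt).
SAMPLE_BLKID = """\
/dev/sda1: UUID="04e20317-f554-43cf-a95d-b0385ebd22bb" UUID_SUB="cb9dc431-1855-466b-ae9a-5c0ddcd7f95a" TYPE="btrfs"
/dev/sdb1: UUID="9c8fb7d0-74e2-4e38-b7c7-6211bbb6d2b1" TYPE="crypto_LUKS"
/dev/mapper/encrypted-home-sdc: UUID="e91f611f-524a-43f5-bde5-8ebb9672f146" UUID_SUB="6007c4c7-5119-4b0f-ab21-134832c4774c" TYPE="btrfs"
/dev/mapper/encrypted-home-sdb: UUID="e91f611f-524a-43f5-bde5-8ebb9672f146" UUID_SUB="020bae26-b64f-413e-8263-a6e832ccb224" TYPE="btrfs"
"""


def group_btrfs_members(blkid_output):
    """Map each btrfs filesystem UUID to the device paths that carry it."""
    members = defaultdict(list)
    for line in blkid_output.splitlines():
        device, sep, attrs = line.strip().partition(": ")
        if not sep:
            continue  # not a "device: KEY=..." line
        fields = dict(re.findall(r'(\w+)="([^"]*)"', attrs))
        if fields.get("TYPE") == "btrfs" and "UUID" in fields:
            members[fields["UUID"]].append(device)
    return dict(members)


def multi_device_filesystems(blkid_output):
    """Keep only the filesystems that span more than one device."""
    groups = group_btrfs_members(blkid_output)
    return {uuid: devs for uuid, devs in groups.items() if len(devs) > 1}


if __name__ == "__main__":
    for uuid, devs in multi_device_filesystems(SAMPLE_BLKID).items():
        print(uuid, "->", ", ".join(devs))
```

Only the /home filesystem is reported as multi-device here; the root filesystem on /dev/sda1 has a single member and is unaffected.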
Comment 1 Ryan Kingsbury 2015-01-08 01:40:17 UTC
It was suggested in the forums that I try:

echo 'c /dev/btrfs-control 0660 root root - 10 234' > /etc/tmpfiles.d/btrfs-control.conf

I issued this command as root, then rebooted, but the problem still occurred.
Comment 2 Andrei Borzenkov 2015-01-09 17:38:00 UTC
Yes, I can reproduce it. Setup - two encrypted containers and btrfs on them as raid0 for both data and metadata.

The bug is in patch 1060-udev-use-device-mapper-target-name-for-btrfs-device-ready.patch. When the btrfs builtin runs, the /dev/mapper link has not been created yet, so the builtin fails. As a result SYSTEMD_READY is not set (so the device is never marked as not ready), and systemd immediately attempts to mount the multi-device filesystem before the second device is available.

1392  open("/dev/btrfs-control", O_RDWR|O_CLOEXEC) = 7
1392  ioctl(7, BTRFS_IOC_DEVICES_READY, 0x7ffff80a3520) = -1 ENOENT (No such file or directory)

Cc'ing Jeff, who was the author of the patch. I am not sure what the statement

--><--
If the device is a DM device, udev will have already cached the table name
from sysfs and we can use that to pass /dev/mapper/<name> to the builtin
so that the correct name is used.
--><--

is based on. As far as I can tell, udev passes the argument verbatim, without doing any processing. Nor do I understand what problem was supposed to be fixed here.

@Ryan - as a workaround, copy /usr/lib/udev/rules.d/64-btrfs.rules into /etc/udev/rules.d/64-btrfs.rules and replace the following two lines

ENV{DM_NAME}=="", IMPORT{builtin}="btrfs ready $devnode"
ENV{DM_NAME}=="?*", IMPORT{builtin}="btrfs ready /dev/mapper/$env{DM_NAME}"

with a single one

IMPORT{builtin}="btrfs ready $devnode"

If you have *any* issues, please mention them here.
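
For readers who prefer to script the edit, here is a minimal sketch of the substitution described above, operating on the text of the rules file rather than on the file itself. The function name apply_workaround and the abbreviated sample rules are assumptions for illustration; copying the result to /etc/udev/rules.d/64-btrfs.rules is left to the reader.

```python
# The two device-mapper-aware lines quoted in the workaround.
DM_AWARE_LINES = (
    'ENV{DM_NAME}=="", IMPORT{builtin}="btrfs ready $devnode"',
    'ENV{DM_NAME}=="?*", IMPORT{builtin}="btrfs ready /dev/mapper/$env{DM_NAME}"',
)
# The single replacement line quoted in the workaround.
SINGLE_LINE = 'IMPORT{builtin}="btrfs ready $devnode"'

# Abbreviated stand-in for the packaged rules file (illustrative only).
SAMPLE_RULES = """\
SUBSYSTEM!="block", GOTO="btrfs_end"
ACTION=="remove", GOTO="btrfs_end"

# let the kernel know about this btrfs filesystem, and check if it is complete
ENV{DM_NAME}=="", IMPORT{builtin}="btrfs ready $devnode"
ENV{DM_NAME}=="?*", IMPORT{builtin}="btrfs ready /dev/mapper/$env{DM_NAME}"

# mark the device as not ready to be used by the system
ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
"""


def apply_workaround(rules_text):
    """Replace the two DM-aware lines with the single $devnode line."""
    out = []
    replaced = False
    for line in rules_text.splitlines():
        if line.strip() in DM_AWARE_LINES:
            if not replaced:
                out.append(SINGLE_LINE)  # emit the replacement exactly once
                replaced = True
            continue  # drop both original lines
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(apply_workaround(SAMPLE_RULES), end="")
```

Running the function a second time on its own output changes nothing, so it is safe to reapply after comparing against an updated package file.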
Comment 3 Ryan Kingsbury 2015-01-10 13:26:59 UTC
Andrei,

THANK YOU! Your fix worked. Will this workaround survive future updates to the kernel or udev?


I'll add one other observation that could be relevant.  Before adding the second disk (when everything was working), the plymouth screen behind the prompt for the encryption password would load very quickly, fading from black to the teal background in 1-2 seconds. 

Once both disks were added to the array, the plymouth screen took almost 10 seconds to complete this transition. Could that indicate some kind of performance issue related to this?  Note that the slow loading behavior persists even after I applied your fix.
Comment 4 Andrei Borzenkov 2015-01-11 07:28:35 UTC
(In reply to Ryan Kingsbury from comment #3)
> Andrei,
> 
> THANK YOU! Your fix worked. Will this workaround survive future updates to
> the kernel or udev?
> 

Yes, it will, but it makes sense to compare your copy against the updated file after each update and merge in any other changes.

@systemd-maintainers: note that this causes mounting btrfs to fail even *MANUALLY*. Because the patch effectively skips calling "btrfs ready" for every device-mapper device, when you run "mount /dev/xxx" the kernel knows about only the single device /dev/xxx and the mount fails. The user needs to either run "btrfs device scan" first or repeat the mount for every device that is part of the multi-device btrfs.
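
The manual-mount failure can be pictured with a toy model. This is only a sketch of the behaviour described in this comment, not kernel code: the class ToyBtrfsKernel, its methods, and the idea of passing in the expected member count are all assumptions made for illustration (the real kernel learns the device count from the filesystem itself).

```python
class ToyBtrfsKernel:
    """Toy stand-in for the kernel's list of known btrfs member devices."""

    def __init__(self, expected_devices):
        # expected_devices: {filesystem id: number of member devices}
        self.expected = dict(expected_devices)
        self.seen = {}

    def device_ready(self, fsid, devnode):
        """Register a device; return True once all members are known."""
        self.seen.setdefault(fsid, set()).add(devnode)
        return len(self.seen[fsid]) >= self.expected[fsid]

    def mount(self, fsid, devnode):
        """Mounting registers only the device named on the command line."""
        self.seen.setdefault(fsid, set()).add(devnode)
        if len(self.seen[fsid]) < self.expected[fsid]:
            raise OSError("open_ctree failed")
        return sorted(self.seen[fsid])


if __name__ == "__main__":
    kernel = ToyBtrfsKernel({"home": 2})
    try:
        kernel.mount("home", "/dev/dm-0")
    except OSError as exc:
        print("first mount:", exc)
    print("second mount:", kernel.mount("home", "/dev/dm-1"))
```

In this model the first mount fails because only one member is known, and repeating the mount with the other member succeeds, which mirrors the two manual remedies mentioned above.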
Comment 5 Dr. Werner Fink 2015-01-12 11:54:44 UTC
@ Jeff : Please give a comment here
Comment 6 Jeff Mahoney 2015-01-12 16:12:36 UTC
The problem that patch fixes is that btrfs caches the first name used to refer to a device. So when systemd calls btrfs ready on the device, it caches /dev/dm-# instead of the named LVM volume. That results in things like 'mount' and 'df' showing the /dev/dm-# name instead of the LVM volume name. There was a separate bug for this filed (bsc#888215)

The patch provided is just reverting that patch. So, any fix needs to also address bsc#888215. It's probably an ordering issue that the file system mount attempt shouldn't happen before the device mapper nodes are created.
Comment 7 Andrei Borzenkov 2015-01-12 17:19:45 UTC
(In reply to Jeff Mahoney from comment #6)
> The problem that patch fixes is that btrfs caches the first name used to
> refer to a device. So when systemd calls btrfs ready on the device, it
> caches /dev/dm-# instead of the named LVM volume. That results in things
> like 'mount' and 'df' showing the /dev/dm-# name instead of the LVM volume
> name.

And what exactly does not work then? You still did not answer what *problem* it solves. I have UUID=xxx in /etc/fstab and df shows /dev/sda1. What's wrong with that?

> There was a separate bug for this filed (bsc#888215)
>

Which of course we are not allowed to see.
 
> The patch provided is just reverting that patch. So, any fix needs to also
> address bsc#888215.

Well, *I* cannot address it because I cannot even read it. I can only say that any fix for bsc#888215 also needs to address boo#912170.
Comment 8 Jeff Mahoney 2015-01-12 19:25:04 UTC
I understand that you're frustrated that you can't view that bug, but it was reported by a partner against a SLE12 beta and is partner-confidential. I try to describe the problem, in detail, when creating a patch and I'm not sure what you think is missing from the description in the patch. I gave examples in the patch description.

UUID=blablah is interpreted by mount via blkid and is never passed to the kernel. It will use whatever is resolved by blkid, and if that happens to be on an LVM volume, the volume name will be used.

Mount resolves dm-# devices to canonical names. Btrfsprogs' 'btrfs ready' resolves dm-# devices to canonical names. Systemd's builtin btrfs-ready does not resolve dm-# devices to canonical names, and that is where the bug pops up.

It's possible that the fix for bsc#888215 is incomplete and that's why we're running into this. The rule tests for the presence of the DM_NAME, which typically means that the link will have been created. The link creation is skipped when DM_UDEV_DISABLE_DM_RULES_FLAG is specified to udev, so that might be the issue. I'm not sure what component is responsible for creating the links these days, and how races with that can be reliably protected against.

In any case, the removal of the rule is not the right fix. Adjustment to account for the race potential, sure.
Comment 9 Jeff Mahoney 2015-01-12 20:29:59 UTC
Werner, what are the ordering rules here? I've assumed that if a SYMLINK rule is earlier in the ruleset, it should be executed by the time this rule is executed. Is that not the case?

A quick review of the LVM2 code shows that the flag is only set when udev integration is disabled. That usually needs to be explicit, like by calling the dm commands with various, obviously named options to disable udev or via the udev_rules=0 option in /etc/lvm/lvm.conf

Ryan, have you disabled udev integration explicitly? I expect that the answer is "no" but I'd rather ask than chase ghosts.

Or, perhaps there's an issue with cryptsetup and setting up the dm devices. I'll dig a little deeper there.
Comment 10 Jeff Mahoney 2015-01-12 20:44:53 UTC
(In reply to Andrei Borzenkov from comment #2)
> Yes, I can reproduce it. Setup - two encrypted containers and btrfs on them
> as raid0 for both data and metadata.
> 
> The bug is in patch
> 1060-udev-use-device-mapper-target-name-for-btrfs-device-ready.patch. When
> btrfs builtin runs, /dev/mapper link is not yet created so builtin fails.
> This results in SYSTEMD_READY being not set and systemd attempts immediately
> mount multi-device filesystem without second device being available. 
> 
> 1392  open("/dev/btrfs-control", O_RDWR|O_CLOEXEC) = 7
> 1392  ioctl(7, BTRFS_IOC_DEVICES_READY, 0x7ffff80a3520) = -1 ENOENT (No such
> file or directory)

Do you happen to know which device it's trying to mark ready here? It looks like cryptsetup creates a temporary device. I wonder if we're hitting that.
Comment 11 Ryan Kingsbury 2015-01-12 21:15:24 UTC
> 
> Ryan, have you disabled udev integration explicitly? I expect that the
> answer is "no" but I'd rather ask than chase ghosts.
> 

Not that I'm aware of (at least I haven't done so intentionally). The system should be close to an out-of-the-box 13.2 configuration, other than the specific disk setup I showed above.
Comment 12 Dr. Werner Fink 2015-01-13 08:16:35 UTC
(In reply to Jeff Mahoney from comment #9)

> Werner, what are the ordering rules here? I've assumed that if a SYMLINK rule 
> is earlier in the ruleset, it should be executed by the time this rule is 
> executed. Is that not the case?

This depends on the boundary conditions: if those are hit for both of the two rules, then yes. And indeed the rules in 10-dm.rules, 11-dm-lvm.rules, and 13-dm-disk.rules should already have been processed.

The question arises why ENV{DM_NAME}=="" and ENV{DM_NAME}=="?*" are not hit in 64-btrfs.rules (as the workaround in comment #2 suggests). 

Maybe Robert has an idea what could be wrong here.
Comment 13 Robert Milasan 2015-01-13 08:43:09 UTC
To see the order, you can always run: udevadm test <syspath device>, and you'll see the order in which each rule is read and applied.

I have no idea why it doesn't work. I haven't been involved much with btrfs. Could it be that the dracut initrd is not including everything needed, as this is not a root file system? Just putting some ideas on the table. I haven't read the whole bug, so I might be way off.

Also note here: this shouldn't be used:

ENV{DM_NAME}=="" ....

Better to check if the disk/partition has the filesystem btrfs, something like:

ENV{ID_FS_TYPE}=="btrfs", IMPORT{builtin}="btrfs ready $devnode"

That should be all we need; I don't get why the rule is written the way it is.

This is how I think the rule should look:

--- 64-btrfs.rules.orig	2015-01-13 09:40:43.506199377 +0100
+++ 64-btrfs.rules	2015-01-13 09:41:30.038199073 +0100
@@ -2,12 +2,10 @@
 
 SUBSYSTEM!="block", GOTO="btrfs_end"
 ACTION=="remove", GOTO="btrfs_end"
-ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
 ENV{SYSTEMD_READY}=="0", GOTO="btrfs_end"
 
 # let the kernel know about this btrfs filesystem, and check if it is complete
-ENV{DM_NAME}=="", IMPORT{builtin}="btrfs ready $devnode"
-ENV{DM_NAME}=="?*", IMPORT{builtin}="btrfs ready /dev/mapper/$env{DM_NAME}"
+ENV{ID_FS_TYPE}=="btrfs", IMPORT{builtin}="btrfs ready $devnode"
 
 # mark the device as not ready to be used by the system
 ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
Comment 14 Andrei Borzenkov 2015-01-13 12:44:45 UTC
(In reply to Jeff Mahoney from comment #8)
> Mount resolves dm-# devices to canonical names. Btrfsprogs' 'btrfs ready'
> resolves dm-# devices to canonical names. Systemd's builtin btrfs-ready does
> not resolve dm-# devices to canonical names and there is where the bug pops
> up.
> 

Which bug? The one you tried to fix or the one I'm commenting in right now?

And dm-# *is* canonical name. Anything else is just a convenience alias; the only true and real device name is dm-#. Just like the only true name for SCSI disk is sdX and not /dev/disk/by-whatever/YYY.

> The rule tests for the presence of the DM_NAME, which
> typically means that the link will have been created.

Keyword here is "will". How can you pass link that *will* be created to kernel *now*? How do you expect it to work?

Any link that is specified in a udev rule is created only after all rules have been processed, unless it happened to exist before the event for other reasons.

(In reply to Jeff Mahoney from comment #10)
> > 1392  open("/dev/btrfs-control", O_RDWR|O_CLOEXEC) = 7
> > 1392  ioctl(7, BTRFS_IOC_DEVICES_READY, 0x7ffff80a3520) = -1 ENOENT (No such
> > file or directory)
> 
> Do you happen to know which device it's trying to mark ready here? It looks
> like cryptsetup creates a temporary device. I wonder if we're hitting that.

/dev/mapper/$env{DM_NAME}. cryptsetup creates a normal device-mapper device, which then gets the symlink /dev/mapper/$env{DM_NAME} after this rule is processed. While this rule is being processed, /dev/mapper/$env{DM_NAME} does not exist yet.
Comment 15 Andrei Borzenkov 2015-01-13 12:49:24 UTC
(In reply to Dr. Werner Fink from comment #12)
> The question rises why ENV{DM_NAME}=="" and ENV{DM_NAME}=="?*" are not hit
> in the 64-btrfs.rules

Huh? They are hit, of course, and they attempt to pass /dev/mapper/$DM_NAME to the kernel. But this link does not exist yet because udev is still processing rules.
Comment 16 Dr. Werner Fink 2015-01-13 13:26:18 UTC
(In reply to Andrei Borzenkov from comment #15)

Ouch ... If the numbering scheme does not protect against such races, then the design of udev is somewhat flawed. Or is there a possibility to get such rules in sync? AFAICS dmsetup(8) provides/uses cookies within the environment to do such synchronisation.
Comment 17 Jeff Mahoney 2015-01-13 14:10:05 UTC
Andrei, I know cryptsetup creates a normal device mapper device. Prior to doing that, it *also* sets up a temporary device.

Further, the "convenience" names are the ones that util-linux and btrfsprogs have automatically translated from dm-# names, so if you have a problem there, please take it up with them. The /dev/mapper/volname naming has been in place for a very long time and *nobody* uses the /dev/dm-# names directly unless forced to do so.

This is an issue for btrfs because of the multi device support. It reports the device name via the kernel's mechanism to allow file systems to self-report. It *could* do that by using bdevname() or something, but that would always return the dm-# name. Instead, we try to take the path of least surprise, which would be responding with the /dev/mapper name. That's what users use to mount it (directly or indirectly via UUID=/LABEL=/whatever) and that's what they expect to see in /proc/mounts, etc. Lastly, the /dev/mapper names are a convention, and mapping them automatically in the kernel is essentially implementing policy in the kernel, which there are longstanding rules against doing.

So, simply reverting that fix isn't the answer. It fixed a real issue reported by a real customer. Redefining the scope of the problem to paper over it doesn't make that go away.

Robert, I suspect that the ENV{DM_NAME}=="?*" rule is getting executed. That's what Andrei's strace seems to demonstrate. The issue is that it's executed and the link that it expects to use (and that should have been created via the SYMLINK rule in the DM rules) hasn't been created. Your suggestion may clean up that rule a little bit to avoid multiple gotos, but it's functionally equivalent to the suggestion in comment #2, which is wrong.

This rule *should work fine.*

The issue seems to be that the symlink isn't created. The cookies that DM creates are used, AFAIK, for finer-grained synchronization, so that DM doesn't need to do something like a "udevadm settle" or whatever the command is these days that will wait for *all* events to be processed when it only cares about the device it just created.
Comment 18 Andrei Borzenkov 2015-01-13 14:19:05 UTC
(In reply to Jeff Mahoney from comment #17)
> The issue is that it's executed
> and the link that it expects to use (and that should have been created via
> the SYMLINK rule in the DM rules) hasn't been created.

OK, so here is the source of confusion. SYMLINK does not create anything. SYMLINK adds link to device property. Real link creation happens at the very end of even processing.
Comment 19 Andrei Borzenkov 2015-01-13 14:19:53 UTC
(In reply to Andrei Borzenkov from comment #18)

> end of even processing.

s/even/event/
Comment 20 Robert Milasan 2015-01-13 14:44:13 UTC
Jeff, still this is stupid:

ENV{DM_NAME}=="?*", IMPORT{builtin}="btrfs ready /dev/mapper/$env{DM_NAME}"

Any device, virtual or physical, should have a device node; if it doesn't have one, we shouldn't care about it. The rule for btrfs is broken. Fixing that might not help this bug, but I'm just pointing it out.
Comment 21 Jeff Mahoney 2015-01-13 15:25:12 UTC
Robert, what method would you suggest to pass that name to btrfs ready, then?
Comment 22 Robert Milasan 2015-01-13 15:27:59 UTC
Jeff, check comment #13.
Comment 23 Jeff Mahoney 2015-01-13 15:30:36 UTC
Robert, no. I'm not trying to be difficult, but please read paragraphs 2 and 3 of comment #17.
Comment 24 Robert Milasan 2015-01-13 16:30:38 UTC
Jeff, I understand what you are saying, but relying on possible symlinks is a bad idea. udev (that's all I know) works with device nodes; symlinks are only there to make things a bit easier, but that doesn't mean we should rely on symlinks within udev rules. 'BAD IDEA'.

So, that's all I can say from my side; I might be wrong.
Comment 25 Andrei Borzenkov 2015-01-13 16:31:54 UTC
(In reply to Jeff Mahoney from comment #21)
> what method would you suggest to pass that name to btrfs ready, then?

Which "that" name?

bor@opensuse:~/src/systemd> udevadm info -q symlink -n dm-0 | xargs -n1 /bin/echo
disk/by-id/dm-name-system-root
disk/by-id/dm-uuid-LVM-F1ikgD2RES306Gil9M7iwa4NKWEbV1NVz0zCdWG2lT1LAHFo5MsclirNdUuY2f5T
disk/by-id/raid-system-root
disk/by-uuid/bf08b064-2753-45b5-9af1-4fab01c9cca5
mapper/system-root
root
system/root

Why exactly /dev/mapper/system-root is better than /dev/system/root or /dev/disk/by-id/dm-name-system-root or any other link?
Comment 26 Jeff Mahoney 2015-01-13 16:35:03 UTC
Because those are the links that util-linux's mount command uses. I'm not choosing this arbitrarily. I'm choosing this for consistency with other tools and user expectations.
Comment 27 Robert Milasan 2015-01-13 16:48:03 UTC
Jeff, you're not supposed to keep consistency with the util-linux package; that doesn't make sense. Check most udev rules: we use $devnode, not some symlinks that util-linux uses. util-linux comes into action way later, so we don't care what util-linux likes and doesn't like.

I'm just saying.
Comment 28 Jeff Mahoney 2015-01-22 18:37:59 UTC
Ok, so I went back to the root of the issue, which is btrfs caching names. The reason we do that is that btrfs's multi-device support and online resizing mean that we can remove the device that was used to mount the file system. Even if the file system was mounted with a single device, a new device can be added and the old one removed. As a result, the kernel now has the ability to allow each file system to report which device is used to mount it. Btrfs and nfs are the only file systems that make use of this. Other file systems use the vfsmnt's dev_name pointer, which is a copy of whatever is used to mount the file system. 

This is where util-linux comes in, since libmount will canonicalize dm-# names to their /dev/mapper/<tablename> names before passing into the kernel. This applies even if I attempt to mount the device as /dev/dm-4, for example. The kernel passes that name back out via /proc/mounts and anything that parses it, like df, uses that name. So that's why XFS and ext4 work as expected when it comes to dm devices. The name has been canonicalized in userspace and passed to the kernel during mount, and then passed back out as such.
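
To make the canonicalization step concrete, here is a minimal sketch of mapping a dm-# node to its /dev/mapper name. It is not libmount's implementation; it simply assumes the table name can be read from the sysfs attribute block/dm-N/dm/name, and the function name dm_mapper_path and the sysfs_root parameter (added so the lookup can be pointed at a test directory) are illustrative.

```python
import os


def dm_mapper_path(devnode, sysfs_root="/sys"):
    """Return /dev/mapper/<name> for a dm-# node, else the node unchanged."""
    kernel_name = os.path.basename(devnode)  # e.g. "dm-4"
    name_file = os.path.join(sysfs_root, "block", kernel_name, "dm", "name")
    try:
        with open(name_file) as handle:
            dm_name = handle.read().strip()
    except OSError:
        return devnode  # not a device-mapper device, or sysfs unavailable
    if not dm_name:
        return devnode
    return "/dev/mapper/" + dm_name


if __name__ == "__main__":
    # On a system without /sys/block/dm-4 this prints the input unchanged.
    print(dm_mapper_path("/dev/dm-4"))
```

A userspace tool that applies this mapping before calling mount passes the friendly name into the kernel, which is the behaviour the comment attributes to libmount.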

When we have systemd doing the device ready stuff, it only makes the /dev/dm-* names available to the kernel. We could jump through some hoops to discover the device passed during mount and report that, but it's a layering violation and still leaves us with the problem described above when the device used to mount the file system goes away.

For consistency with the rest of the system, the /dev/mapper/ names should be reflected in /proc/mounts. We *could* use the dm-# names, but that would require every tool reading /proc/mounts to do the canonicalization. There are far fewer tools that write them.

So, it's obvious that the rule I've written only works in a subset of possible circumstances. How do we fix it so that it works in all of them? Is it even possible without the links already existing? Is it possible to invoke the rule after the links are created?
Comment 29 Robert Milasan 2015-01-23 07:18:16 UTC
Jeff, check comment #13.

Andrei, would you be able to test also the patch from comment #13 ?
Comment 36 Andrei Borzenkov 2015-01-23 16:32:01 UTC
(In reply to Robert Milasan from comment #29)
> Andrei, would you be able to test also the patch from comment #13 ?

Yes, of course - it works. It does exactly the same as workaround I suggested - it reverts to upstream behavior.

(In reply to Jeff Mahoney from comment #28)
> Is it possible to
> invoke the rule after the links are created?

Not without cooperation from the kernel. If the kernel sent a CHANGE event for all devices when a btrfs filesystem becomes mountable, this would be possible. It would also allow us to do it properly; the current rule is no more than a hack.
Comment 37 Robert Milasan 2015-01-23 17:00:17 UTC
This little patch might work:

index 74b2209..eba9d40 100644
--- a/src/udev/udev-builtin-btrfs.c
+++ b/src/udev/udev-builtin-btrfs.c
@@ -49,6 +49,10 @@ static int builtin_btrfs(struct udev_device *dev, int argc, char *argv[], bool t
         if (err < 0)
                 return EXIT_FAILURE;
 
+        fd = open(argv[2], O_WRONLY|O_NONBLOCK|O_CLOEXEC);
+        if (fd < 0)
+                return EXIT_FAILURE;
+
         udev_builtin_add_property(dev, test, "ID_BTRFS_READY", err == 0 ? "1" : "0");
         return EXIT_SUCCESS;
 }

This will generate a change event and happens to run from IMPORT{builtin}="btrfs ready $devnode", but we need to put a stop to it after the first run, otherwise it will be an infinite loop.

I'll think about it a bit more and get back to you on it.
Comment 38 Robert Milasan 2015-01-23 17:10:04 UTC
This might work a bit better:

--- a/src/udev/udev-builtin-btrfs.c
+++ b/src/udev/udev-builtin-btrfs.c
@@ -35,6 +35,7 @@
 static int builtin_btrfs(struct udev_device *dev, int argc, char *argv[], bool test) {
         struct btrfs_ioctl_vol_args args = {};
         _cleanup_close_ int fd = -1;
+        const char *value;
         int err;
 
         if (argc != 3 || !streq(argv[1], "ready"))
@@ -49,6 +50,15 @@ static int builtin_btrfs(struct udev_device *dev, int argc, char *argv[], bool t
         if (err < 0)
                 return EXIT_FAILURE;
 
+        value = udev_device_get_property_value(dev, "ID_BTRFS_DONE");
+        if (!value) {
+                fd = open(argv[2], O_WRONLY|O_NONBLOCK|O_CLOEXEC);
+                if (fd < 0)
+                        return EXIT_FAILURE;
+
+                udev_builtin_add_property(dev, test, "ID_BTRFS_DONE", "1");
+        }
+
         udev_builtin_add_property(dev, test, "ID_BTRFS_READY", err == 0 ? "1" : "0");
         return EXIT_SUCCESS;
 }
Comment 39 Jeff Mahoney 2015-01-23 17:23:40 UTC
This work looks promising, Robert. Thanks for looking into it! I'll get a test scenario set up.
Comment 40 Jeff Mahoney 2015-01-23 17:27:28 UTC
This is what I had in mind for the accompanying rule:

ACTION=="add", IMPORT{builtin}="btrfs ready $devnode"
ACTION=="change", ENV{DM_NAME}=="?*", IMPORT{builtin}="btrfs ready /dev/mapper/$env{DM_NAME}"

I suppose the btrfs builtin command could also look at ACTION and only issue the event if it's add.
Comment 41 Jeff Mahoney 2015-01-23 17:28:24 UTC
Oops, no. That's why you have the ID_BTRFS_DONE attribute.
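
The role of the ID_BTRFS_DONE property can be illustrated with a toy event loop. This is a sketch under two assumptions taken from the discussion: that opening the device node for writing leads to a follow-up change event, and that a property set during one event is still visible during the next. The function name count_handled_events and the limit parameter are hypothetical.

```python
from collections import deque


def count_handled_events(use_guard, limit=10):
    """Count events handled for one device, giving up at `limit`."""
    properties = {}         # stands in for the device's stored udev properties
    queue = deque(["add"])  # the initial event for the device
    handled = 0
    while queue and handled < limit:
        queue.popleft()
        handled += 1
        if use_guard and "ID_BTRFS_DONE" in properties:
            continue  # already triggered once; do not reopen the node
        queue.append("change")  # stand-in for the open-for-write side effect
        if use_guard:
            properties["ID_BTRFS_DONE"] = "1"
    return handled


if __name__ == "__main__":
    print("without guard:", count_handled_events(use_guard=False))
    print("with guard:", count_handled_events(use_guard=True))
```

Without the guard the loop only stops because of the artificial limit; with it, the add event is followed by exactly one change event.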
Comment 42 Robert Milasan 2015-01-23 17:40:02 UTC
Just leave the rule as you modified in the previous bug, but add the additional patch to btrfs builtin command.
Comment 43 Robert Milasan 2015-01-23 17:43:18 UTC
One question to you Jeff:

You are a kernel guy, as far as I know. What is the downside of this:

fd = open(argv[2], O_WRONLY|O_NONBLOCK|O_CLOEXEC); ?
Comment 44 Jeff Mahoney 2016-05-03 22:20:03 UTC
Bugzilla cleanup day for me.  Sorry this got lost for so long.  The good news is that I think I have a cleaner fix that doesn't involve any code changes.

Here are the updated rules:

# let the kernel know about this btrfs filesystem, and check if it is complete
IMPORT{builtin}="btrfs ready $devnode"

# Once the device mapper symlink is created, tell btrfs about it
# so we get the friendly name in /proc/mounts (and tools that read it)
ENV{DM_NAME}=="?*", RUN{builtin}+="btrfs ready /dev/mapper/$env{DM_NAME}"

I've tested this to work as expected on my test system using two device mapper devices as components for a RAID1 btrfs setup.  The file system mounts without intervention and the friendly names appear in 'df' output.

The reason this works is: IMPORT rules are executed immediately since they are a dependency for further rule evaluation.  RUN and SYMLINK are executed at the end in the order they are defined.  Since the SYMLINK rule is before the btrfs rules, the symlink is in place by the time the RUN rule executes.

The kernel recognizes it as a device we've already seen and just updates the name we use for it.
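
The ordering argument can be checked with a small model. This is a sketch of the behaviour as described in this thread (IMPORT acts immediately, SYMLINK and RUN take effect after all rules are processed), not a reimplementation of udev; process_event, the tuple-based rule format, and the NODE and LINK constants are illustrative assumptions.

```python
NODE = "/dev/dm-0"                       # device node, exists before the event
LINK = "/dev/mapper/encrypted-home-sdb"  # symlink requested by the DM rules


def process_event(rules, existing_paths=()):
    """Return (kind, path, path_existed) for each btrfs builtin call."""
    paths = set(existing_paths)
    deferred_links = []
    deferred_runs = []
    calls = []
    for kind, path in rules:
        if kind == "SYMLINK":
            deferred_links.append(path)  # only recorded, not created yet
        elif kind == "IMPORT":
            calls.append((kind, path, path in paths))  # runs immediately
        elif kind == "RUN":
            deferred_runs.append(path)  # queued until rules are done
        else:
            raise ValueError("unknown rule kind: %r" % (kind,))
    paths.update(deferred_links)  # links appear after rule processing
    for path in deferred_runs:
        calls.append(("RUN", path, path in paths))
    return calls


if __name__ == "__main__":
    old_rules = [("SYMLINK", LINK), ("IMPORT", LINK)]
    new_rules = [("SYMLINK", LINK), ("IMPORT", NODE), ("RUN", LINK)]
    print(process_event(old_rules, [NODE]))
    print(process_event(new_rules, [NODE]))
```

In the model, the earlier IMPORT-based rule hands the kernel a path that does not exist yet, while the updated pair first registers the device node and only later passes the /dev/mapper name once the link is in place.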
Comment 45 Jeff Mahoney 2016-05-06 20:00:44 UTC
I posted that rule to the systemd list and they rejected it as more appropriate for device-mapper.

They're wrong.  We'll ship the rule in btrfsprogs.
Comment 46 Swamp Workflow Management 2016-05-25 15:13:46 UTC
openSUSE-RU-2016:1398-1: An update that has 5 recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 888215,912170,956819,958562,966257
CVE References: 
Sources used:
openSUSE 13.2 (src):    btrfsprogs-4.5.3-13.1
Comment 48 Franck Bui 2016-09-22 08:23:14 UTC
Jeff, could you propagate the change to 13.1 as well, so we can use the same systemd for both 13.1 and 13.2?
Comment 49 Bernhard Wiedemann 2016-11-03 21:00:48 UTC
This is an autogenerated message for OBS integration:
This bug (912170) was mentioned in
https://build.opensuse.org/request/show/438638 13.2 / btrfsprogs
Comment 50 Jeff Mahoney 2016-11-03 21:07:15 UTC
I've propagated the packaging changes to openSUSE 13.1, 13.2, and Leap 42.1 so that they're the same.
Comment 51 Jeff Mahoney 2016-11-03 21:29:08 UTC
*** Bug 1000366 has been marked as a duplicate of this bug. ***
Comment 53 Bernhard Wiedemann 2016-11-03 23:00:31 UTC
This is an autogenerated message for OBS integration:
This bug (912170) was mentioned in
https://build.opensuse.org/request/show/438641 42.1 / btrfsprogs
Comment 58 Andreas Stieger 2016-11-09 19:47:43 UTC
Incident for 13.2 is running. Packages will appear in the test repositories below. Please test. 
http://download.opensuse.org/repositories/openSUSE:/Maintenance:/5806/
http://download.opensuse.org/update/13.2-test/
Comment 59 Bernhard Wiedemann 2016-11-09 21:00:17 UTC
This is an autogenerated message for OBS integration:
This bug (912170) was mentioned in
https://build.opensuse.org/request/show/439410 13.2 / btrfsprogs
Comment 63 Swamp Workflow Management 2016-11-18 00:08:56 UTC
openSUSE-RU-2016:2854-1: An update that has one recommended fix can now be installed.

Category: recommended (low)
Bug References: 912170
CVE References: 
Sources used:
openSUSE 13.2 (src):    btrfsprogs-4.5.3-23.2
Comment 64 Ednilson Miura 2016-12-05 16:10:00 UTC
I'm testing this update at qam-sle but for my setup, at least, it is not fixed yet:

I'm running this test on a vm, with 3 virtual disks, where:
/dev/vda1 swap
/dev/vda2 rootfs in btrfs

/dev/vdb1 (as linux partition)
/dev/vdc1 (as linux partition)
The 2 disks were set up as follows (basically, I followed
https://www.peterbeard.co/blog/post/i-am-a-data-hoarder-or-how-to-create-a-btrfs-raid-on-multiple-encrypted-disks/)

# created a key to decrypt:
# dd if=/dev/urandom of=~/keyfile bs=1 count=256
# cryptsetup --key-file=keyfile luksFormat /dev/vdc1
# cryptsetup --key-file=keyfile -y luksAddKey /dev/vdc1 (used blank password)
# cryptsetup --key-file=keyfile luksOpen /dev/vdc1 newdrive
# mkfs.btrfs /dev/mapper/newdrive
# mkdir -p /media/newdrive
# mount /dev/mapper/newdrive /media/newdrive
# cryptsetup --key-file=keyfile luksFormat /dev/vdb1
# cryptsetup --key-file=keyfile -y luksAddKey /dev/vdb1
# cryptsetup --key-file=keyfile luksOpen /dev/vdb1 oldmedia
# btrfs device add /dev/mapper/oldmedia /media/newdrive

fstab:
UUID=7e910515-4e60-444b-bf17-cab2c8d47995 /home btrfs defaults 0 0

crypttab:
newdrive UUID=d9b955c5-5710-4062-98cc-eed2f92e371a /root/keyfile luks
oldmedia UUID=65ef76fd-26b1-4959-a65f-83651281b340 /root/keyfile luks

# blkid
/dev/vda1: UUID="172192c1-3db6-49b2-a15b-1ee07fa50f14" TYPE="swap" PARTUUID="00094b3e-01"
/dev/vda2: UUID="7748f02a-55a4-4c67-bb1f-c99668cb25a0" UUID_SUB="45073475-028a-410a-80ed-c53be91c455b" TYPE="btrfs" PTTYPE="dos" PARTUUID="00094b3e-02"
/dev/vdb1: UUID="d9b955c5-5710-4062-98cc-eed2f92e371a" TYPE="crypto_LUKS" PARTUUID="c712057f-01"
/dev/vdc1: UUID="65ef76fd-26b1-4959-a65f-83651281b340" TYPE="crypto_LUKS" PARTUUID="f87bd043-01"
/dev/mapper/oldmedia: UUID="7e910515-4e60-444b-bf17-cab2c8d47995" UUID_SUB="c8686031-d7e6-489f-939a-4a4f825799eb" TYPE="btrfs"
/dev/mapper/newdrive: UUID="7e910515-4e60-444b-bf17-cab2c8d47995" UUID_SUB="f1a02aa9-de05-4fe0-8168-6de699590f25" TYPE="btrfs"

Using this setup I get the exact same "[    7.140104] BTRFS: open_ctree failed" message, both before and after updating the package. Also, I see the same behaviour when pressing CTRL-D (home gets mounted).

# rpm -qf /usr/lib/udev/rules.d/64-btrfs-dm.rules
btrfsprogs-udev-rules-4.1.2-9.5.1.noarch

# cat /usr/lib/udev/rules.d/64-btrfs-dm.rules
SUBSYSTEM!="block", GOTO="btrfs_end"
KERNEL!="dm-[0-9]*", GOTO="btrfs_end"
ACTION!="add|change", GOTO="btrfs_end"
ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"

# Once the device mapper symlink is created, tell btrfs about it
# so we get the friendly name in /proc/mounts (and tools that read it)
ENV{DM_NAME}=="?*", RUN{builtin}+="btrfs ready /dev/mapper/$env{DM_NAME}"

LABEL="btrfs_end"

This setup seems similar to the one the customer is using. 
I'm attaching supportconfig and journal output from this test machine.
Comment 65 Ednilson Miura 2016-12-05 16:11:23 UTC
Created attachment 704856 [details]
supportconfig from test machine
Comment 68 Swamp Workflow Management 2016-12-16 13:07:53 UTC
SUSE-RU-2016:3170-1: An update that has two recommended fixes can now be installed.

Category: recommended (low)
Bug References: 912170,997061
CVE References: 
Sources used:
SUSE Linux Enterprise Software Development Kit 12-SP2 (src):    btrfsprogs-4.5.3-16.1
SUSE Linux Enterprise Server for Raspberry Pi 12-SP2 (src):    btrfsprogs-4.5.3-16.1
SUSE Linux Enterprise Server 12-SP2 (src):    btrfsprogs-4.5.3-16.1
SUSE Linux Enterprise Desktop 12-SP2 (src):    btrfsprogs-4.5.3-16.1
Comment 69 Leonardo Chiquitto 2016-12-19 12:32:43 UTC
Who can help us figure out why the fix works on 12-SP2 but not on 12-SP1? Is some other fix needed, perhaps in the kernel or systemd?
Comment 71 Swamp Workflow Management 2016-12-31 02:08:38 UTC
openSUSE-RU-2016:3311-1: An update that has one recommended fix can now be installed.

Category: recommended (low)
Bug References: 912170
CVE References: 
Sources used:
openSUSE 13.1 (src):    btrfsprogs-3.12-4.25.1
Comment 72 Swamp Workflow Management 2017-01-07 23:08:38 UTC
openSUSE-RU-2017:0048-1: An update that has two recommended fixes can now be installed.

Category: recommended (low)
Bug References: 912170,997061
CVE References: 
Sources used:
openSUSE Leap 42.2 (src):    btrfsprogs-4.5.3-3.1
Comment 73 Tomáš Chvátal 2018-04-12 13:58:01 UTC
This version of openSUSE changed to end-of-life (EOL [1]) status. As such
it is no longer maintained, which means that it will not receive any
further security or bug fix updates.
As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
openSUSE, or consider the bug still valid, please feel free to reopen this
bug against that version, or open a new ticket.

Thank you for reporting this bug and we are sorry it could not be fixed
during the lifetime of the release.

[1] https://en.opensuse.org/Lifetime