Bug 921570 - after dist upgrade system boots to emergency mode
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Distribution
Component: Basesystem
Version: 13.2
Hardware: x86-64 openSUSE 13.2
Priority: P2 - High
Severity: Major
Assigned To: Johannes Thumshirn
Reported: 2015-03-10 14:59 UTC by Steffen Hau
Modified: 2020-04-01 02:32 UTC (History)

Attachments
Output of journalctl -b (390.00 KB, application/x-tar)
2015-03-10 16:14 UTC, Steffen Hau

Description Steffen Hau 2015-03-10 14:59:01 UTC
I've started upgrading systems from openSUSE 13.1 to 13.2. Virtual machines without special setups such as mdadm RAID devices or multipathed FC LUNs upgraded fine.

But systems (IBM HS22 blades) with mdadm RAID devices boot into emergency mode. The systems have three RAID 1 devices (swap, / and /srv or /home). Swap and / are assembled correctly, but the third md device is missing and does not appear in /proc/mdstat either. Manually running "mdadm -A --scan" brings it up, and "systemctl default" then continues booting. To dig deeper into this issue, I installed openSUSE 13.2 from scratch on a spare blade, and there md2 is assembled correctly.
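
The check described above (is every expected array listed in /proc/mdstat?) could be scripted roughly as follows. This is only a sketch: the function name is hypothetical, and md0 and md1 in the usage line are assumed names for the swap and / arrays, since only md2 is named in this report.

```shell
# missing_md_arrays NAME...
# Reads mdstat-format text on stdin and prints each expected array name
# that has no "mdN : ..." status line. The function name is hypothetical.
missing_md_arrays() {
    # Collect the array names that do appear, space-separated.
    present=$(awk '$2 == ":" && $1 ~ /^md/ { print $1 }' | tr '\n' ' ')
    for name in "$@"; do
        case " $present " in
            *" $name "*) ;;            # array is assembled
            *) echo "$name" ;;         # array is missing
        esac
    done
}
```

In emergency mode this would be run as `missing_md_arrays md0 md1 md2 < /proc/mdstat`. If it prints md2, the manual recovery is the one already described: "mdadm -A --scan" followed by "systemctl default".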

This is the content of /etc/systemd/system/ of the upgraded system:
/etc/systemd/system/dbus-org.opensuse.Network.AUTO4.service
/etc/systemd/system/dbus-org.opensuse.Network.DHCP4.service
/etc/systemd/system/dbus-org.opensuse.Network.DHCP6.service
/etc/systemd/system/dbus-org.opensuse.Network.Nanny.service
/etc/systemd/system/default.target
/etc/systemd/system/default.target.wants/sysstat.service
/etc/systemd/system/default.target.wants/systemd-readahead-collect.service
/etc/systemd/system/default.target.wants/systemd-readahead-replay.service
/etc/systemd/system/getty.target.wants/getty@tty1.service
/etc/systemd/system/multi-user.target.wants/acpid.service
/etc/systemd/system/multi-user.target.wants/apache2.service
/etc/systemd/system/multi-user.target.wants/auditd.service
/etc/systemd/system/multi-user.target.wants/cron.service
/etc/systemd/system/multi-user.target.wants/dsmc.service
/etc/systemd/system/multi-user.target.wants/irqbalance.service
/etc/systemd/system/multi-user.target.wants/mcelog.service
/etc/systemd/system/multi-user.target.wants/ntpd.service
/etc/systemd/system/multi-user.target.wants/postfix.service
/etc/systemd/system/multi-user.target.wants/remote-fs.target
/etc/systemd/system/multi-user.target.wants/smartd.service
/etc/systemd/system/multi-user.target.wants/sshd.service
/etc/systemd/system/multi-user.target.wants/syslog-ng.service
/etc/systemd/system/multi-user.target.wants/wicked.service
/etc/systemd/system/network-online.target.wants/wicked.service
/etc/systemd/system/network.service
/etc/systemd/system/sysinit.target.wants/multipathd.service
/etc/systemd/system/syslog.service
/etc/systemd/system/system-update.target.wants/systemd-readahead-drop.service
/etc/systemd/system/timers.target.wants/logrotate.timer
/etc/systemd/system/wickedd.service.wants/wickedd-auto4.service
/etc/systemd/system/wickedd.service.wants/wickedd-dhcp4.service
/etc/systemd/system/wickedd.service.wants/wickedd-dhcp6.service
/etc/systemd/system/wickedd.service.wants/wickedd-nanny.service
/etc/systemd/system/wicked.service.wants/wickedd.service

I've made the scratch system identical to the problematic system (identical installed packages, conf files in /etc, active systemd services, and so on) and it still assembles md2. I've run out of ideas about where to look for possible causes. Please let me know what information I should provide to help you find the cause.
Comment 1 Steffen Hau 2015-03-10 16:03:10 UTC
I hit the submit button a bit too early.

After enabling multipathd.service on the spare blade, it also boots to emergency mode. Disabling multipathd.service and rebooting makes the system boot normally again. The same applies to the upgraded system.

I have no clue why multipathd prevents the system from assembling md2. In emergency mode I checked "multipath -ll" and "dmsetup ls --tree", and neither shows the disk devices used for md2 as being in use.
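
The dmsetup part of that check can be sketched as a small filter over the output of "dmsetup ls --tree", which lists the underlying devices as (major:minor) pairs. The function name is hypothetical, and the numbers in the usage note are the usual values for sda3 and sdb3, not values taken from this system.

```shell
# dm_claims_device MAJOR:MINOR
# Reads `dmsetup ls --tree` output on stdin and succeeds (exit 0) if the
# given device number appears in it, i.e. some device-mapper table uses it.
# The function name is hypothetical.
dm_claims_device() {
    # Match the literal "(MAJOR:MINOR)" so that 8:4 does not match 8:48.
    grep -F -q "($1)"
}
```

For example, `dmsetup ls --tree | dm_claims_device 8:3` tests the partition that is normally /dev/sda3, and 8:19 the one that is normally /dev/sdb3; "ls -l /dev/sda3" shows the actual numbers. It is worth testing the whole disks (normally 8:0 and 8:16) as well, since a multipath map built on a whole disk would list the disk rather than the partition.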

I'll attach the journalctl -b output for both a failed (multipathd enabled) boot and a successful (multipathd disabled) boot. Disabling multipathd is not an option, as the system is equipped with an FC LUN from our SAN.
Comment 2 Steffen Hau 2015-03-10 16:14:43 UTC
Created attachment 626158
Output of journalctl -b

Output of journalctl -b for both a failed and a successful boot on the upgraded and the spare host
Comment 3 Bernhard Wiedemann 2015-03-16 17:58:54 UTC
The failed boot has:
kernel: device-mapper: multipath: version 1.7.0 loaded
kernel: device-mapper: multipath service-time: version 0.2.0 loaded
kernel: device-mapper: table: 253:0: multipath: error getting device
kernel: device-mapper: ioctl: error adding target to table
kernel: device-mapper: table: 253:0: multipath: error getting device
kernel: device-mapper: ioctl: error adding target to table
Comment 4 Steffen Hau 2015-03-17 10:31:54 UTC
These messages do not seem to be the cause. When I disable the multipathd systemd unit and start it manually after a reboot, the same messages appear:

Mär 10 17:16:34 testhost kernel: device-mapper: multipath: version 1.7.0 loaded
Mär 10 17:16:34 testhost kernel: device-mapper: multipath service-time: version 0.2.0 loaded
Mär 10 17:16:34 testhost kernel: device-mapper: table: 253:0: multipath: error getting device
Mär 10 17:16:34 testhost kernel: device-mapper: ioctl: error adding target to table
Mär 10 17:16:34 testhost kernel: device-mapper: table: 253:0: multipath: error getting device
Mär 10 17:16:34 testhost kernel: device-mapper: ioctl: error adding target to table

But the devices are available:
testhost:~ # multipath -ll
360050768018085377800000000000084 dm-0 IBM,2145
size=1000G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| |- 1:0:6:0 sdd 8:48 active ready running
| `- 2:0:6:0 sdf 8:80 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  |- 1:0:1:0 sdc 8:32 active ready running
  `- 2:0:0:0 sde 8:64 active ready running
testhost:~ # dmsetup ls --tree
360050768018085377800000000000084-part1 (253:1)
 └─360050768018085377800000000000084 (253:0)
    ├─ (8:64)
    ├─ (8:32)
    ├─ (8:80)
    └─ (8:48)


I see these device-mapper lines on all systems where multipathd is enabled (including 13.1), so they look harmless to me.
Comment 5 Hannes Reinecke 2015-05-27 09:16:29 UTC
We have scheduled a multipath-tools and dracut update for 13.2 which might help here.
Comment 6 Steffen Hau 2015-06-16 13:19:25 UTC
I've just installed the dracut update, but that didn't help. md2 is still not assembled when multipathd.service is enabled. I'm now waiting for the multipath-tools update.
Comment 7 Steffen Hau 2015-10-05 20:17:56 UTC
I just wanted to ask when the scheduled multipath-tools update will arrive.

This issue has prevented me from upgrading a lot of bare-metal servers that depend on multipath for over half a year now. The issue should be easy to reproduce: I only had to enable multipathd.service, and md2 was missing.

Could you please have another look or provide the updated multipath-tools package?
Comment 8 Johannes Thumshirn 2015-10-21 10:01:49 UTC
Can you try with the version from:

https://build.opensuse.org/package/show/home:morbidrsa:branches:openSUSE:13.2:Update/multipath-tools

It's the version used in Leap and SLE12-SP1.
Comment 9 Steffen Hau 2015-10-22 09:57:54 UTC
Dear Johannes,

I wasn't able to find a package to download with the provided link. "Download package" says "no data".

I've found http://download.opensuse.org/distribution/leap/42.1-Current/repo/oss/suse/x86_64/multipath-tools-0.5.0-48.3.x86_64.rpm but this one requires libdevmapper.so.1.02(DM_1_02_97)(64bit). So I've also fetched http://download.opensuse.org/distribution/leap/42.1-Current/repo/oss/suse/x86_64/device-mapper-1.02.97-61.4.x86_64.rpm. I'll try this out and report back whether it fixes the issue.
Comment 10 Steffen Hau 2015-10-22 10:12:04 UTC
With both updates applied, the system still boots to emergency mode. I'll try switching from multipathd.service to multipathd.socket.
Comment 11 Johannes Thumshirn 2015-10-22 10:29:20 UTC
Forgot to activate the download repo, sorry.

But yes, the Leap 42.1 package is the same.

I'll check whether there's any difference from upstream.
Comment 12 Hannes Reinecke 2015-10-22 10:57:41 UTC
If you were to enable multipath on a non-multipathed root system you need to blacklist the root filesystem.

Also, due to the timing involved multipath might claim the device for MD, so if MD references the devices as raw block devices (ie using /dev/sdX) it won't be able to start.
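
As a rough illustration of the blacklisting suggested above, an entry in /etc/multipath.conf might look like the fragment below. The WWID is a placeholder, not a value from this system; the real one would have to be read from the local disks that back the md arrays.

```
# /etc/multipath.conf (fragment; the WWID below is a placeholder)
blacklist {
    wwid "<WWID-of-local-disk>"
}
```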
Comment 13 Steffen Hau 2015-10-22 11:34:03 UTC
(In reply to Hannes Reinecke from comment #12)
> If you were to enable multipath on a non-multipathed root system you need to
> blacklist the root filesystem.
I don't see why I should have to do that. openSUSE 13.1 works fine with swap, / and /srv each on RAID 1 md devices plus additional multipathed FC LUNs. openSUSE 13.2 also assembles swap and /, but misses /srv if multipathd.service is enabled.


> Also, due to the timing involved multipath might claim the device for MD, so
> if MD references the devices as raw block devices (ie using /dev/sdX) it
> won't be able to start.
I've already written that I checked that point. While in emergency mode, dmsetup does not report /dev/sd[a,b]3 as claimed (see comment 4). I can manually assemble the missing array and continue booting.
Comment 14 Steffen Hau 2016-03-08 11:28:13 UTC
The issue does not exist in 42.1. We will skip 13.2. You can close this issue.
Comment 15 Johannes Thumshirn 2016-03-22 13:33:42 UTC
Closing as it's fixed on 42.1