Bug 1161438 - dracut: `rd.neednet=1 ip=dhcp` boot parameters end in emergency shell (again)
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Basesystem
Current
Other Other
: P5 - None : Normal (vote)
: ---
Assigned To: dracut maintainers
Depends on:
Blocks:
Reported: 2020-01-21 14:44 UTC by Ignaz Forster
Modified: 2021-03-02 16:34 UTC (History)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
rdsosreport.txt (Dell Optiplex 990), rd.debug (3.16 MB, text/plain)
2020-02-05 17:04 UTC, Ignaz Forster
rdsosreport.txt (VirtualBox), rd.debug (2.51 MB, text/plain)
2020-02-19 13:21 UTC, Ignaz Forster
rdsosreport.txt (QEMU/KVM), rd.debug (2.50 MB, text/plain)
2020-02-19 14:11 UTC, Ignaz Forster

Description Ignaz Forster 2020-01-21 14:44:07 UTC
On a Tumbleweed system, calling `dracut -f -a network` from the installed system and then rebooting, adding `ip=dhcp rd.neednet=1` to the kernel command line, will end up in an emergency shell after about 3 minutes, repeatedly printing messages such as
localhost dracut-initqueue[418]: Warning: dracut-initqueue timeout - starting timeout scripts

In `ip addr` the interface is listed, but no network is configured.
Comment 1 Ignaz Forster 2020-01-21 15:23:24 UTC
I should add that this happens on bare metal (Dell Optiplex 990). In a VirtualBox VM the network is configured correctly; in KVM this happened to me quite often, but I currently can't reproduce it.
Comment 2 Ignaz Forster 2020-01-23 12:44:54 UTC
I think I found the trigger: Our KIWI images have `net.ifnames=0` set, and those are the installations that work. When adding that option to the command line options of regular installations, then those also get IP addresses (tested with VirtualBox). This didn't work on the physical Dell machine however, the interface is always called "em1" there.

This leads to the conclusion that dracut doesn't seem to be able to handle any non-ethX names / Predictable Network Interface Names.
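
When comparing a working installation with a failing one, it can help to confirm which of the parameters discussed here are actually on the kernel command line. The following is a minimal sketch, not part of dracut; `has_karg` and `report_kargs` are names invented for this illustration, and parameters are matched as whole words only.

```sh
#!/bin/sh
# Hypothetical helper (not part of dracut): succeed if the given kernel
# command line contains the given parameter as a whole word.
has_karg() {
    # $1 = parameter to look for, $2 = kernel command line string
    for _arg in $2; do
        [ "$_arg" = "$1" ] && return 0
    done
    return 1
}

# Print the state of the three parameters discussed in this report.
report_kargs() {
    # $1 = kernel command line string
    for _want in ip=dhcp rd.neednet=1 net.ifnames=0; do
        if has_karg "$_want" "$1"; then
            echo "$_want: present"
        else
            echo "$_want: absent"
        fi
    done
}
```

On a running system this could be called as `report_kargs "$(cat /proc/cmdline)"`.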
Comment 3 Daniel Molkentin 2020-02-04 15:24:54 UTC
Can you point me to your image? Upstream only uses predictable network interface names, so either this is due to SUSE-only patches, or something else is off.
Comment 4 Ignaz Forster 2020-02-04 15:46:20 UTC
The Tumbleweed images (where it doesn't work) are the regular ones from https://software.opensuse.org/distributions/tumbleweed. The (working) KIWI images I've been using are available from http://download.opensuse.org/tumbleweed/appliances/ (for KVM you want to get openSUSE-MicroOS.x86_64.qcow2).
Comment 5 Daniel Molkentin 2020-02-04 21:38:59 UTC
I tried both, and it doesn't seem to be a dracut issue per se:

(In reply to Ignaz Forster from comment #2)
> I think I found the trigger: Our KIWI images have `net.ifnames=0` set, and
> those are the installations that work. When adding that option to the
> command line options of regular installations, then those also get IP
> addresses (tested with VirtualBox). This didn't work on the physical Dell
> machine however, the interface is always called "em1" there.

Yay, Dell. They use biosdevname to name the interfaces; systemd does not play a role here (and biosdevname is hence unmoved by the parameters). This approach is called "Consistent Network Device Naming". The gory details are laid out here: https://linux.dell.com/files/whitepapers/consistent_network_device_naming_in_linux.pdf
 
> This leads to the conclusion that dracut doesn't seem to be able to handle
> any non-ethX names / Predictable Network Interface Names.

However, both images have net.ifnames=0 set by default, so I think our real problem is Predictable Network Interface Names, because the biosdevname dracut module is probably not part of the default dracut image (see below for how to verify).

It works fine for me; however, there were two caveats:

> The (working) KIWI
> images I've been using are available from
> http://download.opensuse.org/tumbleweed/appliances/ (for KVM you want to get
> openSUSE-MicroOS.x86_64.qcow2).

This works just fine, but comes with af_packet module, as well as the network and network-legacy dracut modules in the initrd.

(In reply to Ignaz Forster from comment #4)
> The Tumbleweed images (where it doesn't work) are the regular ones from
> https://software.opensuse.org/distributions/tumbleweed.

This image seems to lack both of the above. af_packet (Address Family 17) is needed for user networking, which is qemu's default (I don't usually use VirtualBox). And without the network modules, dracut will not care about network parameters at all.

How to try to get it to work:

modprobe af_packet
# For the Dell machine: ensure the biosdevname package is installed; dracut will include the 97biosdevname module automatically on the next run.
dracut -a network -a network-legacy -f
lsinitrd # Check here that the network, network-legacy (and biosdevname) modules as well as af_packet.ko are part of the initramfs.
reboot # Then check whether it sets up the network.

(More details about what the network modules manage to set up can be collected by booting with rd.debug=1 and checking the journal.)

Can you verify my analysis and whether the absence of the modules (in one case) and biosdevname (in both cases for Dell machines) from the initrd actually is the problem?
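
The `lsinitrd` check described in this comment can be scripted. The following is a minimal sketch, not part of dracut: `check_initrd_listing` and the `mod:`/`file:` prefixes are names invented for this illustration. It assumes that dracut module names appear as bare lines in the `lsinitrd` output (as they do in the listings attached to this report) and that files appear as paths.

```sh
#!/bin/sh
# Hypothetical helper: read `lsinitrd` output on stdin and print every
# expected item that is not found. Returns 0 only if nothing is missing.
#   mod:NAME   NAME must appear as a whole line (dracut module name)
#   file:NAME  NAME must appear somewhere in a line (file path fragment)
check_initrd_listing() {
    _listing=$(cat)
    _missing=0
    for _item in "$@"; do
        case $_item in
            mod:*)  _opts=-qxF; _pat=${_item#mod:} ;;
            file:*) _opts=-qF;  _pat=${_item#file:} ;;
            *) echo "unknown item type: $_item" >&2; return 2 ;;
        esac
        if ! printf '%s\n' "$_listing" | grep "$_opts" -- "$_pat"; then
            echo "missing: $_item"
            _missing=1
        fi
    done
    return "$_missing"
}
```

A possible invocation (as root) would be `lsinitrd | check_initrd_listing mod:network mod:network-legacy file:af_packet.ko`, adding `mod:biosdevname` on Dell hardware. The whole-line match for modules avoids counting `kernel-network-modules` or `usr/lib/systemd/network` as proof that the `network` module is present.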
Comment 6 Ignaz Forster 2020-02-05 17:04:33 UTC
Created attachment 829330 [details]
rdsosreport.txt (Dell Optiplex 990), rd.debug

Test 1/3 (Dell Optiplex 990):
On the desktop PC, adding the dracut and kernel modules didn't have the desired effect: The machine is still running into an emergency shell.

The initrd seems to be set up as expected:

e23:/home/ignaz # lsinitrd | grep biosdev
biosdevname
-rwxr-xr-x   1 root     root        39784 Jan 30 09:59 sbin/biosdevname
-rw-r--r--   1 root     root         1130 Jan 30 09:59 usr/lib/udev/rules.d/71-biosdevname.rules

e23:/home/ignaz # lsinitrd | grep af_packet
-rw-r--r--   1 root     root        25064 Feb  5 10:00 lib/modules/5.5.1-4.g267811a-default/kernel/net/packet/af_packet.ko.xz
e23:/home/ignaz # lsinitrd | grep network

Arguments: -f -a 'network' -a 'network-legacy'
network-legacy
network
kernel-network-modules
-rw-r--r--   1 root     root          252 Jan 16 11:21 usr/lib/sysctl.d/51-network.conf
drwxr-xr-x   1 root     root            0 Feb  5 16:57 usr/lib/systemd/network
-rw-r--r--   1 root     root          441 Jan 14 14:31 usr/lib/systemd/network/99-default.link
-rw-r--r--   1 root     root          505 Jan 14 14:31 usr/lib/systemd/system/network-online.target
-rw-r--r--   1 root     root          502 Jan 14 14:31 usr/lib/systemd/system/network-pre.target
-rw-r--r--   1 root     root          521 Jan 14 14:31 usr/lib/systemd/system/network.target
Comment 7 Ignaz Forster 2020-02-19 13:21:37 UTC
Created attachment 830631 [details]
rdsosreport.txt (VirtualBox), rd.debug

Test 2/3 (VirtualBox):
In the VirtualBox VM, adding the dracut and kernel modules also didn't have the desired effect: The machine is still running into an emergency shell.

I have attached the corresponding log.
Comment 8 Ignaz Forster 2020-02-19 13:26:01 UTC
Test 3/3 (Dell Latitude 7480):

Same for my laptop.

To summarize, I can NOT verify that the absence of the modules (in one case) and biosdevname (in both cases for Dell machines) from the initrd actually is the problem. I hope the logs can shed more light on the issue...
Comment 9 Ignaz Forster 2020-02-19 14:11:37 UTC
Created attachment 830643 [details]
rdsosreport.txt (QEMU/KVM), rd.debug

Bonus Test (QEMU/KVM):

I also can't get networking to work in QEMU/KVM with the same messages, so I'm wondering what you did differently. This is a regular Tumbleweed installation with minimal X, booting with the kernel parameters "ip=dhcp rd.neednet=1" appended.

localhost:/home/ignaz # lsinitrd | grep af_packet
-rw-r--r--   1 root     root        25088 Feb 12 01:15 lib/modules/5.5.2-1-default/kernel/net/packet/af_packet.ko.xz
localhost:/home/ignaz # lsinitrd | grep network
Arguments: -a 'network' -a 'network-legacy' -f
network-legacy
network
kernel-network-modules
-rw-r--r--   1 root     root          252 Feb  7 15:14 usr/lib/sysctl.d/51-network.conf
drwxr-xr-x   1 root     root            0 Feb 19 14:51 usr/lib/systemd/network
-rw-r--r--   1 root     root          441 Feb  6 15:13 usr/lib/systemd/network/99-default.link
-rw-r--r--   1 root     root          505 Feb  6 15:13 usr/lib/systemd/system/network-online.target
-rw-r--r--   1 root     root          502 Feb  6 15:13 usr/lib/systemd/system/network-pre.target
-rw-r--r--   1 root     root          521 Feb  6 15:13 usr/lib/systemd/system/network.target
Comment 10 Olivier LAHAYE 2020-05-25 07:18:15 UTC
Same problem for me with dracut-049.1+suse.138.g9068a629-lp152.1.12.x86_64

PROBLEM FOUND: it is in 35network-legacy/ifup.sh around line 629: initqueue/online hook is not run if network is initialized from DHCP.

There are plenty of bugs in there.
1 - line 629 lacks: source_hook initqueue/online $netif
Adding this before the ;; will fix the problem.

2 - line 631 is pointless because we are already inside the same condition from line 616, so this test is always true.

3 - the netroot call is inconsistent across the different network init paths: with DHCP it runs after the case statement, otherwise it runs inside the case statement.

4 - a major rewrite is needed IMHO, as I can see that source initqueue/online $netif is also put in a script created dynamically in $hookdir/initqueue/setup_net_$netif.sh
IMHO having this hook run in multiple places is unmaintainable. Why so much complexity just to set up an interface, run a hook and possibly start netroot???

Anyway, to fix this specific problem: just add:
source_hook initqueue/online $netif
at line 629, rebuild your initrd and it works (tested with SystemImager-NG imager dracut module that requires this hook to be run after network is setup).
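
The shape of the fix described in this comment can be sketched as follows. This is a simplified illustration only and is NOT the code from 35network-legacy/ifup.sh: `bring_up`, `do_dhcp` and `do_static` are stubs invented here, `source_hook` is replaced by a stub that just prints its arguments (assuming that is the name of dracut's hook runner), and the assumption that the non-DHCP branch already runs the hook is taken from the description above rather than from the source file.

```sh
#!/bin/sh
# Simplified illustration; all functions below are stubs, not dracut code.

# Stub standing in for dracut's hook runner: just record the call.
source_hook() { echo "hook: $*"; }
# Stubs standing in for the real interface configuration steps.
do_dhcp()   { echo "dhcp: $1"; }
do_static() { echo "static: $1"; }

bring_up() {
    # $1 = interface name, $2 = configuration method
    case $2 in
        dhcp)
            do_dhcp "$1"
            # The call reported as missing: without it, modules that
            # wait for initqueue/online are never told the link is up.
            source_hook initqueue/online "$1"
            ;;
        static)
            do_static "$1"
            # Assumed to be present already in the non-DHCP path.
            source_hook initqueue/online "$1"
            ;;
        *)
            echo "unsupported method: $2" >&2
            return 1
            ;;
    esac
}
```

Placing the hook call inside each branch, directly before the `;;`, keeps both configuration paths symmetric, which is the consistency problem raised in points 1 and 3.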
Comment 11 Thomas Blume 2020-05-26 06:26:09 UTC
(In reply to Ignaz Forster from comment #9)
> Created attachment 830643 [details]
> rdsosreport.txt (QEMU/KVM), rd.debug
> 

-->
[...]
[   24.510014] localhost kernel: virtio_net virtio0 enp1s0: renamed from eth0
[...]
[   27.963203] localhost dracut-initqueue[431]: /bin/dracut-initqueue@35(): '[' -e /lib/dracut/hooks/initqueue/ifup-eth0.sh ']'
[   27.963203] localhost dracut-initqueue[431]: /bin/dracut-initqueue@36(): job=/lib/dracut/hooks/initqueue/ifup-eth0.sh
[   27.963203] localhost dracut-initqueue[431]: /bin/dracut-initqueue@36(): . /lib/dracut/hooks/initqueue/ifup-eth0.sh
[   27.963203] localhost dracut-initqueue[431]: //lib/dracut/hooks/initqueue/ifup-eth0.sh@1(): '[' -e /lib/dracut/hooks/initqueue/ifup-eth0.sh ']'
[   27.963203] localhost dracut-initqueue[431]: //lib/dracut/hooks/initqueue/ifup-eth0.sh@1(): rm -f -- /lib/dracut/hooks/initqueue/ifup-eth0.sh
[   27.963203] localhost dracut-initqueue[431]: //lib/dracut/hooks/initqueue/ifup-eth0.sh@2(): /sbin/ifup eth0
--<

That is actually a duplicate of bug 1170255.
It is fixed with dracut 050 (see bug#1170255 comment#8).
Daniel, should we backport the upstream patch or just go to version 050 in Tumbleweed?
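
One reading of the log excerpt in this comment is that the kernel renamed eth0 to enp1s0 before the queued job ran `/sbin/ifup eth0`. Whether that is the exact root cause is not established in this report, but the pattern can be searched for in an rdsosreport.txt. The following is a minimal sketch; `find_stale_ifup` is a name invented for this illustration, and it assumes the log line formats shown in the excerpt, with the rename message appearing before the ifup call.

```sh
#!/bin/sh
# Hypothetical helper: read a boot log on stdin and report every
# `/sbin/ifup NAME` call whose NAME was renamed away earlier in the log.
find_stale_ifup() {
    awk '
        # Kernel message "... NEW: renamed from OLD": remember OLD.
        / renamed from / { old[$NF] = 1 }
        # Traced command "... /sbin/ifup NAME": flag NAME if it is stale.
        /\/sbin\/ifup / {
            if ($NF in old)
                print "ifup called with pre-rename name: " $NF
        }
    '
}
```

A possible invocation would be `find_stale_ifup < rdsosreport.txt` on a report captured with rd.debug enabled.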
Comment 12 Thomas Blume 2020-05-27 05:27:16 UTC
(In reply to Thomas Blume from comment #11)

> That is actually a duplicate of bug 1170255.
> It is fixed with dracut 050 (see bug#1170255 comment#8).
> Daniel, should we backport the upstream patch or just go to version 050 in
> Tumbleweed?

Oops, sorry, Tumbleweed is already on dracut version 050.
Ignaz, can you retry your test with a current Tumbleweed and confirm that this is fixed?

Olivier, I have built test packages for Leap 15.2 at:

https://build.opensuse.org/package/show/home:tsaupe:branches:openSUSE:Leap:15.2:Update:dracut-bsc1161438/dracut

Could you give it a try?
Comment 13 Ignaz Forster 2020-05-27 16:37:03 UTC
Successfully retested with current Tumbleweed on the Dell laptop and VirtualBox: An IP address is assigned and the boot succeeds again.
Comment 14 Olivier LAHAYE 2020-05-28 06:55:56 UTC
@Thomas Blume
I'd like to test; unfortunately, the package download page gives me no data...
Comment 15 Thomas Blume 2020-05-28 07:04:44 UTC
(In reply to Olivier LAHAYE from comment #14)
> @Thomas Blume
> I'd like to test; unfortunately, the package download page gives me no
> data...

I have published the package.
Can you try downloading it from:

https://download.opensuse.org/repositories/home:/tsaupe:/branches:/openSUSE:/Leap:/15.2:/Update:/dracut-bsc1161438/standard/

or just adding the repo to zypper and installing dracut from there?
Comment 16 Olivier LAHAYE 2020-05-28 08:04:36 UTC
@Thomas Blume
I confirm that dracut-049.1+suse.145.gbcc76af1-lp152.3.1.x86_64 is working.
initqueue/online hook is sourced when DHCP is used to configure an interface.
Comment 17 Thomas Blume 2020-05-29 15:43:05 UTC
(In reply to Olivier LAHAYE from comment #16)
> @Thomas Blume
> I confirm that dracut-049.1+suse.145.gbcc76af1-lp152.3.1.x86_64 is working.
> initqueue/online hook is sourced when DHCP is used to configure an interface.

Thanks for the feedback. Patches have been submitted to SLES15SP2 and Leap 15.2.
-> closing
Comment 18 Thomas Blume 2020-05-29 15:43:32 UTC
closing