Bug 1119088 - KVM guest start at boot failure
KVM guest start at boot failure
Status: RESOLVED INVALID
Classification: openSUSE
Product: openSUSE Tumbleweed
Classification: openSUSE
Component: Basesystem
Current
Other Other
: P5 - None : Normal (vote)
: ---
Assigned To: Jonathan Kang
E-mail List
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2018-12-11 09:59 UTC by Philip Raets
Modified: 2019-02-18 06:57 UTC (History)
5 users (show)

See Also:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---


Attachments
detailed boot log (112.56 KB, text/x-log)
2018-12-11 09:59 UTC, Philip Raets
Details
libvirt log at boot (2.11 KB, text/x-log)
2018-12-11 10:00 UTC, Philip Raets
Details
networkmanager at boot (17.09 KB, text/x-log)
2018-12-11 10:00 UTC, Philip Raets
Details
samba log at boot (448 bytes, text/x-log)
2018-12-11 10:00 UTC, Philip Raets
Details
networkmanager devices (764 bytes, text/plain)
2018-12-11 10:01 UTC, Philip Raets
Details
virtual machine config (5.36 KB, text/xml)
2018-12-11 10:06 UTC, Philip Raets
Details
full boot log with wicked service (321.36 KB, text/x-log)
2018-12-12 08:26 UTC, Philip Raets
Details
bootlog snapshot 20181204 (422.19 KB, text/x-log)
2018-12-13 11:14 UTC, Philip Raets
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Philip Raets 2018-12-11 09:59:38 UTC
Created attachment 792364 [details]
detailed boot log

situation:
I have a virtual machine in KVM with a windows install. The network is bridged with a bridge in NetworkManager.

The machine is configured to autostart at boot.

After the latest update of Samba the machine doesn't start at boot anymore. I can start it after logging in on the host.

The host is my computer connected to a windows AD domain and I login with a domain user through winbind. I also have a samba share on the host.

Why I suspect the samba update is because I had a previous bug with samba and winbind not starting at boot, because the network wasn't up, which prevented me from logging on to the computer. With testing of newer samba packages, the above events occured (virtual machine not starting at boot).

Now there was an update of samba (to samba 4.9.3) and the virtual machine doesn't start at boot anymore.

logs are attached.
Comment 1 Philip Raets 2018-12-11 10:00:06 UTC
Created attachment 792365 [details]
libvirt log at boot
Comment 2 Philip Raets 2018-12-11 10:00:32 UTC
Created attachment 792366 [details]
networkmanager at boot
Comment 3 Philip Raets 2018-12-11 10:00:50 UTC
Created attachment 792367 [details]
samba log at boot
Comment 4 Philip Raets 2018-12-11 10:01:55 UTC
Created attachment 792368 [details]
networkmanager devices
Comment 5 Philip Raets 2018-12-11 10:06:10 UTC
Created attachment 792369 [details]
virtual machine config
Comment 6 James Fehlig 2018-12-11 15:45:21 UTC
Since a samba update caused the problem, the samba folks should be the first to take a look IMO. It sounds like the service that connects to the windows AD domain is not starting as promptly after the update?
Comment 7 Philip Raets 2018-12-11 16:07:13 UTC
I don't have a problem logging in on the host computer with my AD account. Winbind is started as soon as I have connection.

The problem is that my virtual machine (KVM) doesn't start at boot anymore. I don't know why this is happening, but it happened after the latest samba update.

I had the same problem after applying updates from the attached bug report 1111373. Rolling back to a previous samba then restored the booting of the virtual machine at host boot.
Comment 8 David Mulder 2018-12-11 22:31:41 UTC
(In reply to James Fehlig from comment #6)
> Since a samba update caused the problem, the samba folks should be the first
> to take a look IMO. It sounds like the service that connects to the windows
> AD domain is not starting as promptly after the update?

Hey Jim,

There seems to be a bug in network manager or maybe systemd that is allowing processes to start before the network is up. This happened to cause a crash in samba, which we've fixed, but the same problem is probably causing issues here.
I think that Philip's network is assigning addresses slowly, which brought this to the surface.
Comment 9 James Fehlig 2018-12-12 02:56:08 UTC
(In reply to David Mulder from comment #8)
> There seems to be a bug in network manager or maybe systemd that is allowing
> processes to start before the network is up.

Yep, looks to be the case. From the boot log attached to #0:

dec 10 16:19:13 PRAETS systemd[1]: Reached target Network.
dec 10 16:19:13 PRAETS systemd[1]: Reached target Network is Online.

Fine, we reached target Network. But not really because later in the log we see NetworkManager finally doing something with the bridge device:

dec 10 16:19:13 PRAETS NetworkManager[1056]: <info>  [1544455153.9104] manager: (bridge0): new Bridge device (/org/freedesktop/NetworkManager/Devices/3)
dec 10 16:19:14 PRAETS NetworkManager[1056]: <info>  [1544455154.0106] device (bridge0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
dec 10 16:19:14 PRAETS NetworkManager[1056]: <info>  [1544455154.0189] device (bridge0): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
...
dec 10 16:19:14 PRAETS NetworkManager[1056]: <info>  [1544455154.2822] device (bridge0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
dec 10 16:19:14 PRAETS NetworkManager[1056]: <info>  [1544455154.2824] device (bridge0): IPv4 config waiting until carrier is on
dec 10 16:19:14 PRAETS NetworkManager[1056]: <info>  [1544455154.2824] device (bridge0): IPv6 config waiting until carrier is on

but before the bridge is operational services are started that depend on the Network target, including libvirtd:

dec 10 16:19:14 PRAETS systemd[1]: Starting Virtualization daemon...

libvirtd tries to autostart your VM but qemu fails since it can't initialize the spice server (fails to open sockets):

dec 10 16:19:15 PRAETS.boma.local libvirtd[2369]: 2018-12-10T15:19:15.138674Z qemu-system-x86_64: failed to initialize spice server

Later in the log, after the failed autostart, we finally see NetworkManager finish config of the bridge device:

dec 10 16:19:16 PRAETS.boma.local NetworkManager[1056]: <info>  [1544455156.8630] device (eth0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
dec 10 16:19:16 PRAETS.boma.local NetworkManager[1056]: <info>  [1544455156.8639] device (eth0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
dec 10 16:19:16 PRAETS.boma.local NetworkManager[1056]: <info>  [1544455156.8640] device (eth0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')

As you say, the bug could be in NetworkManager or systemd. I'll reassign to the systemd-maintainers for now since if nothing else they are helpful in debugging service startup dependencies.
Comment 10 Philip Raets 2018-12-12 08:26:34 UTC
Created attachment 792469 [details]
full boot log with wicked service

This morning I had the crazy idea to try this with wicked instead of NetworkManager. With wicked everything works, and the virtual machine starts when the host boots.

Attached are the log files for wicked.

So I think that NetworkManager is slow with setting up the bridge.

The reason I use NetworkManager is because I use the Gnome Desktop, and Networkmanager has a plugin for openVPN which I also use.
Comment 11 James Fehlig 2018-12-12 15:41:12 UTC
(In reply to Philip Raets from comment #10)
> This morning I had the crazy idea to try this with wicked instead of
> NetworkManager.

That is not so crazy since wicked is *by far* the more common setup on a virt host :-). It is the default setup for SLES KVM and Xen hosts.

> So I think that NetworkManager is slow with setting up the bridge.

Right. In the Description you claimed this worked at one time. If that is the case, can you try to isolate what package update caused the regression? Given our investigation so far I doubt it was samba. More likely NetworkManager, dbus, or systemd.
Comment 12 Franck Bui 2018-12-12 17:08:39 UTC
(In reply to James Fehlig from comment #9)
> (In reply to David Mulder from comment #8)
> > There seems to be a bug in network manager or maybe systemd that is allowing
> > processes to start before the network is up.
> 
> Yep, looks to be the case. From the boot log attached to #0:
> 
> dec 10 16:19:13 PRAETS systemd[1]: Reached target Network.
> dec 10 16:19:13 PRAETS systemd[1]: Reached target Network is Online.
> 

Target units related to network (such as network.target, network-online.target) are mostly scheduled by the network management software (NM in this case). So it's up to this component to define what "network is up" means and when it should be activated.

So it may sound a bit buck-passing from me but I think the NM folks should look at this first since I don't think systemd plays a crucial role here.

(Also I'm leaving on vacation so better strike while the iron is hot)

Jonathan, could you have a look ?
Comment 13 Philip Raets 2018-12-13 11:14:48 UTC
Created attachment 792600 [details]
bootlog snapshot 20181204

I've installed tumbleweed-cli so I could get back to previous snapshots. I went back to 20181204, because then it was still working.

I rebooted after installing, and the vm started at boot.

I then started upgrading to the next snapshot and rebooted to find out exactly when this started happening.

I've upgraded to all newer snapshots, but the vm keeps starting at boot now with NetworkManager. I don't know what happened.

Now I'm back at the latest snapshot (20181211).

Attached is the bootlog for 20181204
Comment 14 Jonathan Kang 2018-12-17 03:22:50 UTC
Sorry for the late response. In the first attched log, it seems that
NetworkManager-wait-online.service is not even enabled somehow. So the system
has a faulty state of online.

For more information about "network is up", check out this link[1]

*[1] https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
Comment 15 Jonathan Kang 2018-12-17 03:30:22 UTC
(In reply to Philip Raets from comment #13)
> Created attachment 792600 [details]
> bootlog snapshot 20181204
> 
> I've installed tumbleweed-cli so I could get back to previous snapshots. I
> went back to 20181204, because then it was still working.
> 
> I rebooted after installing, and the vm started at boot.
> 
> I then started upgrading to the next snapshot and rebooted to find out
> exactly when this started happening.
> 
> I've upgraded to all newer snapshots, but the vm keeps starting at boot now
> with NetworkManager. I don't know what happened.
> 
> Now I'm back at the latest snapshot (20181211).
> 
> Attached is the bootlog for 20181204

It could be some misconfiguration and it got fixed when you up/downgrade the
system? That's odd.

Anyway, glad that it works.
Comment 16 Jonathan Kang 2019-02-18 06:57:29 UTC
Closing this bug as it's cannot be reproduced as described in comment#13. Please feel free to reopen it if you're still experiencing this issue.