Bug 1183247 - starting lxc container fails with GDBus.Error:org.freedesktop.machine1.NoMachineForPID: PID 4370 does not belong to any known machine
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Virtualization:Other
Version: Current
Hardware: x86-64 openSUSE Tumbleweed
Priority: P2 - High
Severity: Major
Target Milestone: ---
Assigned To: James Fehlig
Reported: 2021-03-09 18:27 UTC by Stefan Eisenwiener
Modified: 2023-01-23 22:26 UTC
CC List: 5 users



Attachments
reduced log with described error (117.49 KB, text/x-log)
2021-03-09 18:27 UTC, Stefan Eisenwiener
Details
package list diff of snapshots 20210302 and 20210305 (8.89 KB, patch)
2021-03-14 09:51 UTC, Stefan Eisenwiener
Details | Diff
[PATCH] cgroup/LXC: Do not condition availability of v2 by controllers (1.73 KB, patch)
2022-10-14 15:50 UTC, Michal Koutný
Details | Diff
[PATCH] cgroup/LXC: Do not condition availability of v2 by controllers (2.67 KB, patch)
2022-10-20 19:40 UTC, Eric van Blokland
Details | Diff
Fix lxc container initialization with systemd and hybrid cgroups (3.15 KB, patch)
2022-12-07 20:51 UTC, Eric van Blokland
Details | Diff
Fix lxc container initialization with systemd and hybrid cgroups (3.18 KB, patch)
2022-12-24 17:20 UTC, Eric van Blokland
Details | Diff

Description Stefan Eisenwiener 2021-03-09 18:27:23 UTC
Created attachment 846959 [details]
reduced log with described error

After updating Tumbleweed to 20210307, starting an LXC container fails with
GDBus.Error:org.freedesktop.machine1.NoMachineForPID: PID 4370 does not belong to any known machine.
Comment 1 Stefan Eisenwiener 2021-03-13 20:27:19 UTC
I have run some tests with other openSUSE distributions.
For testing I installed a minimal server setup and added libvirt with the LXC driver.
> zypper in -y --no-recommends libvirt-daemon-lxc libvirt-client libvirt-admin libvirt-daemon-config-network libvirt-daemon-config-nwfilter

This was tested with an equally minimal LXC container setup:
<domain type='lxc'>
  <name>vm1</name>
  <memory>500000</memory>
  <os>
    <type>exe</type>
    <init>/bin/sh</init>
  </os>
  <vcpu>1</vcpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/libvirt_lxc</emulator>
    <interface type='network'>
      <source network='default'/>
    </interface>
    <console type='pty' />
  </devices>
</domain>
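
For reference, assuming the XML above is saved as vm1.xml, such a container can be started with virsh (a minimal sketch using standard virsh commands):
# virsh -c lxc:/// create vm1.xml
# virsh -c lxc:/// list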

The results are as follows
openSUSE-42.3       | WORKING
openSUSE-15.1       | WORKING
openSUSE-15.2       | WORKING
openSUSE-tumbleweed | failed with NoMachineForPID Error

I think some access rights are missing or incorrect, but I can't find the root cause because I'm not that familiar with the internal workings of libvirt and/or systemd.
Comment 2 Stefan Eisenwiener 2021-03-14 09:51:42 UTC
Created attachment 847180 [details]
package list diff of snapshots 20210302 and 20210305

I have narrowed it down to the Tumbleweed snapshot that first shows the issue.

tumbleweed-20210302 still working
tumbleweed-20210305 shows the issue
- libvirt         was updated from 7.0.0  to 7.1.0
- gnutls          was updated from 3.6.15 to 3.7.0
- crypto-policies has been newly added (but not completely configured yet)
Comment 3 James Fehlig 2021-03-16 16:30:07 UTC
(In reply to Stefan Eisenwiener from comment #2)
> tumbleweed-20210302 still working
> tumbleweed-20210305 shows the issue
> - libvirt         was updated from 7.0.0  to 7.1.0

Thanks for bisecting to this point! I've bisected libvirt between these releases and found that commit 9c1693eff4 introduced the problem. When starting an LXC container, virLXCProcessStart() calls virCgroupNewDetectMachine(), which now calls virSystemdGetMachineUnitByPID(). The latter fails, even though the container process should be running since the handshake with the lxc driver has already occurred. Investigating...
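
For reference, roughly the same lookup can be reproduced by hand against machined over D-Bus (a sketch using the PID from the original report; the call returns org.freedesktop.machine1.NoMachineForPID when the PID is not tracked by any machine):

# busctl call org.freedesktop.machine1 /org/freedesktop/machine1 \
    org.freedesktop.machine1.Manager GetMachineByPID u 4370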
Comment 4 James Fehlig 2021-03-16 22:46:55 UTC
It smells like a libvirt+cgroups V1 issue. I noticed my TW machine had the 'systemd.legacy_systemd_cgroup_controller=0' kernel parameter, which enables the hybrid hierarchy (both V1 and V2):

# mount | grep cgroup
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)

I suppose the default in TW is hybrid. After removing the parameter I still have V1+V2 (and still encounter this bug)

# mount | grep cgroup
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)

Finally, I forced the unified (aka V2) hierarchy by adding the 'systemd.unified_cgroup_hierarchy=1' kernel parameter and now we have only V2

# mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)

and the container starts successfully.
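
For anyone who wants to make this switch persistent, a sketch of the usual way on openSUSE (assuming the default GRUB setup; the parameter is appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub):

# grep CMDLINE_LINUX_DEFAULT /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="... systemd.unified_cgroup_hierarchy=1"
# grub2-mkconfig -o /boot/grub2/grub.cfg
# reboot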
Comment 5 Michal Koutný 2021-03-17 13:46:28 UTC
(In reply to James Fehlig from comment #4)
> I suppose the default in TW is hybrid.
True (as of now).
>  After removing the parameter I still have V1+V2 (and still encounter this bug)
The switch therefore has no effect; that's expected.

> Finally, I forced the unified (aka V2) hierarchy by adding the
> 'systemd.unified_cgroup_hierarchy=1' kernel parameter and now we have only V2
> [...]
> and the container starts successfully.
Good to know but I guess this should still work for the hybrid mode as well.

> which now calls virSystemdGetMachineUnitByPID(). The latter fails,
This boils down to machined checking /proc/$PID/cgroup and what the path is in the unified hierarchy (the one systemd uses to track processes).

I noticed something in the following dump from the libvirt logs (comment 0):
> 2021-03-08 22:19:26.374+0000: 4250: debug : virCgroupNewDetect:1151 : pid=4370 controllers=-1 group=0x7f35ebffe310
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupDetectPlacement:348 : Detecting placement for pid 4370 path 
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 0:cpu at /sys/fs/cgroup/cpu,cpuacct in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 1:cpuacct at /sys/fs/cgroup/cpu,cpuacct in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 2:cpuset at /sys/fs/cgroup/cpuset in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 3:memory at /sys/fs/cgroup/memory in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 4:devices at /sys/fs/cgroup/devices in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 5:freezer at /sys/fs/cgroup/freezer in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 6:blkio at /sys/fs/cgroup/blkio in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 7:net_cls at /sys/fs/cgroup/net_cls,net_prio in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 8:perf_event at /sys/fs/cgroup/perf_event in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 9:name=systemd at /sys/fs/cgroup/systemd in /system.slice for pid 4370

The last line is suspicious. Beware, it's not the unified hierarchy path but the
named v1 hierarchy path that systemd maintains for compatibility in v1 mode.
However, the unified path should be the same, and that is the suspicion -- there
should be no process directly under the system.slice cgroup.

Q1) In what unified cgroup is the LXC leader process
    (`grep 0:: /proc/$PID/cgroup`)?
Q2) Are there any processes directly under system.slice cgroup
    (`cat /sys/fs/cgroup/unified/system.slice/cgroup.procs`,
     `cat /sys/fs/cgroup/systemd/system.slice/cgroup.procs`)?
    (Would be good to know not only pids but also what processes these are.)
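
One way to get both at once (a sketch; adjust the mount point to whichever hierarchy is present):

# for p in $(cat /sys/fs/cgroup/unified/system.slice/cgroup.procs); do echo "$p $(cat /proc/$p/comm)"; done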

Hypothesis: LXC backend (or libvirt?) migrates the leader process directly
under system.slice (from machine-lxc\x2d4370\x2dcontainer1.scope) and machined
can't find it then.
Comment 6 James Fehlig 2021-03-18 15:57:08 UTC
(In reply to Michal Koutný from comment #5)
> Q1) In what unified cgroup is the LXC leader process
>     (`grep 0:: /proc/$PID/cgroup`)?

It apparently depends on where libvirtd was started. E.g. if started via systemd

0::/system.slice/libvirtd.service

If started in a terminal session

0::/user.slice/user-0.slice/session-7.scope

> Q2) Are there any processes directly under system.slice cgroup
>     (`cat /sys/fs/cgroup/unified/system.slice/cgroup.procs`,
>      `cat /sys/fs/cgroup/systemd/system.slice/cgroup.procs`)?

There are no processes under either location when starting libvirtd in a terminal session. When started by systemd

# cat /sys/fs/cgroup/systemd/system.slice/cgroup.procs
4653

where pid 4653 is the libvirt-lxc process.

> Hypothesis: LXC backend (or libvirt?) migrates the leader process directly
> under system.slice (from machine-lxc\x2d4370\x2dcontainer1.scope) and
> machined
> can't find it then.

I think your hypothesis is correct. It's the libvirt lxc driver that's moving the process. I don't see the same behavior with the qemu driver. Regardless of how libvirtd was started, the unified cgroup for a qemu VM is something like

0::/machine.slice/machine-qemu\x2d1\x2dsle15sp3b2\x2dkvm.scope

I'm not familiar with the libvirt lxc driver, so off to find a needle in the haystack...
Comment 7 Michal Koutný 2021-03-18 18:09:24 UTC
(In reply to James Fehlig from comment #6)
> There are no processes under either location when starting libvirtd in a
> terminal session. 
My bet is you'd find libvirt-lxc under
/sys/fs/cgroup/{unified,systemd}/user.slice/user-0.slice/cgroup.procs

> I think your hypothesis is correct.
Hm, it seems it's migrated there directly from under libvirtd.service and not the scope cgroup. Anyway, it looks like it attempts to move to its parent cgroup relative to its current position.

This caught my eye (in v7.0.0..v7.1.0):
184245f53b94 ("vircgroup: introduce nested cgroup to properly work with systemd")

Interestingly, the qemu VM isn't under the new nested 'libvirt' cgroup despite that.

Not sure what the most effective question would be now; maybe what cgroup(s) libvirt-lxc was in before the update.
Comment 8 Bernd Wachter 2021-06-05 21:34:40 UTC
I've hit this same issue when testing whether we can start upgrading some systems to 15.3. The libvirt version there is also 7.1.0. When forcing the v2-only cgroup hierarchy I can start a container, though I still have other issues that make it unusable:

# virsh -c lxc:// lxc-enter-namespace www1-de --noseclabel /usr/bin/bash
libvirt: Cgroup error : Unable to write to '/sys/fs/cgroup/machine.slice/machine-lxc\x2d1712\x2dwww1\x2dde.scope/cgroup.procs': Device or resource busy
error: internal error: Child process (27649) unexpected exit status 125
Comment 9 Michal Koutný 2021-06-07 15:54:10 UTC
(In reply to Bernd Wachter from comment #8)
> # virsh -c lxc:// lxc-enter-namespace www1-de --noseclabel /usr/bin/bash
> libvirt: Cgroup error : Unable to write to
> '/sys/fs/cgroup/machine.slice/machine-lxc\x2d1712\x2dwww1\x2dde.scope/cgroup.
> procs': Device or resource busy
> error: internal error: Child process (27649) unexpected exit status 125
This sounds like the container's root cgroup has some subcgroups with controllers attached (would be visible in
> /sys/fs/cgroup/machine.slice/machine-lxc\x2d1712\x2dwww1\x2dde.scope/cgroup.subtree_control
)
This is due to the no-internal-process constraint in the v2 hierarchy -- hence this is a different bug than the one originally reported.

It depends on the container's cgroup layout: new processes can only be placed into leaves, and lxc-enter-namespace would need to know the internal structure of a running container to achieve this. IOW, it may be impossible to enter the root cgroup of some containers on the v2 hierarchy.
Comment 10 Bernd Wachter 2021-06-07 17:27:55 UTC
I'm aware this is a different issue - I've just included those bits to show that while it is possible to get containers started by using cgroups v2 only, doing so is not a practical solution.
Comment 12 James Fehlig 2021-06-24 23:18:53 UTC
FYI, I verified the cgroups hybrid issue still exists with libvirt.git master and filed an issue in the upstream tracker

https://gitlab.com/libvirt/libvirt/-/issues/182
Comment 13 James Fehlig 2021-10-07 21:21:43 UTC
FYI, Cole posted a patch to fix the NoMachineForPID issue

https://listman.redhat.com/archives/libvir-list/2021-October/msg00157.html

It works fine with hybrid mode in my testing. While testing the patch with V2 only, I noticed the following unrelated problem

virsh -c lxc:/// create lxc-test.xml 
error: Failed to create domain from lxc-test.xml
error: internal error: Unable to find 'memory' cgroups controller mount

From the libvirtd log

2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectPlacement:199 : group=0x7f65580081a0 path= controllers= selfpath=/system.slice/libvirtd.service
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'cpu' present=no
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'cpuacct' present=yes
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'cpuset' present=no
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'memory' present=no
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'devices' present=yes
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'freezer' present=no
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'io' present=no
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'net_cls' present=no
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'perf_event' present=no
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'name=systemd' present=no
2021-10-07 20:16:00.521+0000: 4131: error : virLXCProcessStart:1236 : internal error: Unable to find 'memory' cgroups controller mount

Hmm, what controllers are along the path to /system.slice/libvirtd.service?

# cat /sys/fs/cgroup/cgroup.controllers 
cpuset cpu io memory hugetlb pids rdma misc
# cat /sys/fs/cgroup/system.slice/cgroup.controllers
pids
# cat /sys/fs/cgroup/system.slice/libvirtd.service/cgroup.controllers
pids

So no memory controller, which, along with cpuacct and devices, is required by the lxc driver. I'm surprised there have been no complaints from other distros. Could there be cgroup.subtree_control differences between distros?
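
For reference, a controller only shows up in a child's cgroup.controllers if every ancestor enables it in its cgroup.subtree_control. A sketch of checking and enabling it by hand (systemd normally manages these files, so a manual change may not persist):

# cat /sys/fs/cgroup/cgroup.subtree_control
# echo "+memory" > /sys/fs/cgroup/cgroup.subtree_control
# echo "+memory" > /sys/fs/cgroup/system.slice/cgroup.subtree_control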
Comment 14 Michal Koutný 2021-10-08 17:24:50 UTC
What I find interesting is that libvirtd seems to be probing _its own_ libvirtd.service cgroup, but it should really be interested in the container's .scope cgroup under machine.slice (to see what was delegated to the container) or the root cgroup (to see what is available on the host for further delegation).

The reason might be that other distros enable DefaultMemoryAccounting=yes, and in that case even libvirtd.service would have the memory controller enabled, which is likely enough to trick the lxc driver.
Comment 15 James Fehlig 2021-10-08 21:04:13 UTC
(In reply to Michal Koutný from comment #14)
> What I find interesting that libvirtd seems to be probing _its_
> libvirtd.service cgroup but it should really be interested in the container
> .scope cgroup under machine.slice (to see what's was delegated to the
> container) or the root cgroup (to see what is available on the host (for
> further delegation)).

Ah, good point. Likely an upstream improvement that could be made to the lxc driver. I'm not familiar with the driver and don't have the motivation to learn it, particularly since there is no support on the enterprise side.
 
> The reason might be that other distros enable DefaultMemoryAccounting=yes
> and in such a case even libvirtd.service would have memory controller
> enabled in itself which is likely enough to trick lxc.

Thanks for the hint! /usr/lib/systemd/system.conf.d/__20-defaults-SUSE.conf has

# Memory accounting incurs performance penalties. Let's keep it
# disabled by default since the actual use case for enabling it is not
# clear (jsc#PM-2229).
DefaultMemoryAccounting=no

A local override of the setting allows me to start lxc domains with cgroups V2. But that's a separate doc bug IMO. The bug reported here against hybrid/V1 cgroups is now fixed and submitted to Factory.
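
For reference, a sketch of such a local override (the drop-in file name is just an example; a reboot or 'systemctl daemon-reexec' is needed for the manager to pick it up):

# /etc/systemd/system.conf.d/10-memory-accounting.conf
[Manager]
DefaultMemoryAccounting=yes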
Comment 16 OBSbugzilla Bot 2021-10-08 22:40:11 UTC
This is an autogenerated message for OBS integration:
This bug (1183247) was mentioned in
https://build.opensuse.org/request/show/924290 Factory / libvirt
Comment 18 Zhufa Sa 2021-10-19 08:08:29 UTC
Is there a plan to backport these patches to Leap 15.3?
It ships libvirt 7.1.0.
Comment 19 James Fehlig 2021-10-19 21:07:36 UTC
(In reply to Zhufa Sa from comment #18)
> Is there a plan for backport these patches to Leap 15.3?
> It ships Libvirt 7.1.0.

I hadn't planned on it since the bug was filed against TW. Are you affected by this issue on 15.3?
Comment 20 Bernd Wachter 2021-10-20 05:09:24 UTC
(In reply to James Fehlig from comment #19)
> (In reply to Zhufa Sa from comment #18)
> > Is there a plan for backport these patches to Leap 15.3?
> > It ships Libvirt 7.1.0.
> 
> I hadn't planned on it since the bug was filed against TW. Are you affected
> by this issue on 15.3?

See https://bugzilla.opensuse.org/show_bug.cgi?id=1183247#c8 - we're stuck on Leap 15.2 for all systems hosting LXC containers until this is fixed.
Comment 21 Zhufa Sa 2021-10-20 09:02:51 UTC
(In reply to James Fehlig from comment #19)
> (In reply to Zhufa Sa from comment #18)
> > Is there a plan for backport these patches to Leap 15.3?
> > It ships Libvirt 7.1.0.
> 
> I hadn't planned on it since the bug was filed against TW. Are you affected
> by this issue on 15.3?

Yes, I am affected by it.
I was also thinking that, since the libvirtd shipped for Leap comes from the SLE-15-SP3 repo, a fix might follow from there ...

But this is not a big problem for me because I can work around it by using the Virtualization repo from OBS. Let it go if there is no plan.

Thanks.
Comment 22 James Fehlig 2021-10-20 16:55:28 UTC
I added the patch to the SLE15SP3/Leap15.3 libvirt package. It will be included in the next maintenance release.
Comment 24 Swamp Workflow Management 2021-11-05 17:17:36 UTC
SUSE-SU-2021:3619-1: An update that contains security fixes can now be installed.

Category: security (moderate)
Bug References: 1177902,1183247,1186398,1190420,1190493,1190693,1190695,1190917
CVE References: 
JIRA References: 
Sources used:
SUSE MicroOS 5.1 (src):    libvirt-7.1.0-6.8.1
SUSE Linux Enterprise Module for Server Applications 15-SP3 (src):    libvirt-7.1.0-6.8.1
SUSE Linux Enterprise Module for Basesystem 15-SP3 (src):    libvirt-7.1.0-6.8.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
Comment 25 Swamp Workflow Management 2021-11-05 17:19:46 UTC
openSUSE-SU-2021:3619-1: An update that contains security fixes can now be installed.

Category: security (moderate)
Bug References: 1177902,1183247,1186398,1190420,1190493,1190693,1190695,1190917
CVE References: 
JIRA References: 
Sources used:
openSUSE Leap 15.3 (src):    libvirt-7.1.0-6.8.1
Comment 26 James Fehlig 2021-12-10 15:51:38 UTC
Apparently the "fix" for this issue has caused other problems and will be reverted upstream

https://gitlab.com/libvirt/libvirt/-/issues/182#note_763329557

I'll wait for an upstream solution before making additional changes to the downstream libvirt packages.
Comment 27 Bernd Wachter 2022-09-22 12:59:15 UTC
I'm now hitting this bug on Leap 15.4 on our test machine to see if we can move from 15.3 to 15.4 without issues.
Comment 28 Eric van Blokland 2022-10-05 14:33:30 UTC
I've been hit by this as well and had to shelve some upgrades.

Tumbleweed is working fine at the moment. I've installed libvirtd 8.7.0 on a Leap 15.4 test machine using https://download.opensuse.org/repositories/Virtualization/15.4/Virtualization.repo but it exhibits the same behavior. So maybe it is a systemd interoperability issue? A glance through the commit history did not reveal anything obviously related.
Comment 29 Michal Koutný 2022-10-05 15:08:28 UTC
(In reply to Eric van Blokland from comment #28)
> I've been hit by this as well and had to shelve some upgrades.

Is it the same issue as in the initial comment? I.e. 'GDBus.Error:org.freedesktop.machine1.NoMachineForPID:'?
What process does the offending PID refer to (is it the PID 1 of the container?)
Can you check what's in /proc/$PID/cgroup (when the error occurs)?
TY

> Tumbleweed is working fine at the moment.

FTR, openSUSE TW defaults to the unified (v2) hierarchy now and it's not affected (was confirmed in comment 4).
Comment 30 Eric van Blokland 2022-10-05 16:59:03 UTC
(In reply to Michal Koutný from comment #29)
> (In reply to Eric van Blokland from comment #28)
> > I've been hit by this as well and had to shelve some upgrades.
> 
> Is it the same issue like the initial comment? I.e.
> 'GDBus.Error:org.freedesktop.machine1.NoMachineForPID:'?
> What process does the offending PID refer to (is it the PID 1 of the
> container?)
> Can you check what's in /proc/$PID/cgroup (when the error occurs)?
> TY
> 
> > Tumbleweed is working fine at the moment.
> 
> FTR, openSUSE TW defaults to the unified (v2) hierarchy now and it's not
> affected (was confirmed in comment 4).

Figures, I thought TW was still hybrid. 

I'm pretty sure it's the same issue. Booting with 'systemd.unified_cgroup_hierarchy=1' resolves it. 

If I'm not mistaken, the offending PID is that of the driver/controller(?), not the actual container. It exits too quickly to dump the cgroup information. I could try to attach a debugger if you're interested, but that would have to wait till tomorrow.
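
(If it helps, one crude way to catch the cgroup info of such a short-lived process is to poll it in a loop while the container is starting; a sketch, assuming libvirt_lxc is the process name of interest:)

# while sleep 0.1; do for p in $(pgrep -f libvirt_lxc); do cp /proc/$p/cgroup /tmp/cgroup.$p 2>/dev/null; done; done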
Comment 31 Michal Koutný 2022-10-05 18:56:26 UTC
(In reply to Eric van Blokland from comment #30)
> It exits too quickly to dump the cgroup information. 

Understood.

> I could try and attach a debugger if you're interested. But that would have
> to wait till tomorrow.

Maybe the debug logs as in comment 0 would suffice. (Although they don't seem
to contain the unified hierarchy path.)
Comment 32 Eric van Blokland 2022-10-14 12:05:19 UTC
(In reply to Michal Koutný from comment #31)
> (In reply to Eric van Blokland from comment #30)
> > It exits too quickly to dump the cgroup information. 
> 
> Understood.
> 
> > I could try and attach a debugger if you're interested. But that would have
> > to wait till tomorrow.
> 
> Maybe the debug logs as in comment 0 would suffice. (Although they don't seem
> to contain the unified hierarchy path.)

As an exercise I've taken a dive into this issue. I have very little knowledge about cgroups so that has been a bit of a handicap.

As mentioned in one of the comments, the lxc control process is started within the scope of the parent, libvirtd. To me it seems as if the control process is correctly added to the new scope. However, the PID remains tied to the old scope. I've tried changing the controller to use cgroup.procs instead of tasks, which does get rid of the process in the old scope. However, the last line of /proc/PID/cgroup never changes and systemd does not find the controller process in the correct unit.
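
(For context, in cgroup v1 the 'tasks' file holds individual thread IDs while 'cgroup.procs' holds process IDs; a sketch of the difference, with the scope path shortened to <scope> for illustration:)

# echo $TID > /sys/fs/cgroup/cpu,cpuacct/machine.slice/<scope>/tasks         # moves a single thread
# echo $PID > /sys/fs/cgroup/cpu,cpuacct/machine.slice/<scope>/cgroup.procs  # moves the whole thread group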
Comment 33 Michal Koutný 2022-10-14 14:26:31 UTC
(In reply to Eric van Blokland from comment #32)
> To me it seems as if the control process is correctly added to the new scope.
> However, the PID remains tied to the old scope. 

There are multiple hierarchies; for systemd tracking purposes only the v2 one is
relevant.

> I've tried changing the controller ...

Do you refer to libvirt or lxc or other code here?

> ... to use cgroup.procs instead of tasks, which does get rid of the process
> in the old scope.

(That should only cause change if you dealt with multi-threaded processes.)

> However, the last line of /proc/PID/cgroup never changes and systemd does
> not find the controller process in the correct unit.

The last row (assuming it's the '0::' v2 hierarchy) is the "source of truth"
for systemd when it comes to unit association (when it runs in hybrid or unified
mode).
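
For illustration, in hybrid mode the tail of /proc/$PID/cgroup typically looks like this (the paths are examples only; the '0::' entry is the v2 row systemd relies on):

1:name=systemd:/system.slice/libvirtd.service
0::/system.slice/libvirtd.service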
Comment 34 Eric van Blokland 2022-10-14 15:00:38 UTC
(In reply to Michal Koutný from comment #33)
> (In reply to Eric van Blokland from comment #32)
> > To me it seems as if the control process is correctly added to the new scope.
> > However, the PID remains tied to the old scope. 
> 
> There are multiple hierarchies, for systemd tracking purposes the only v2 is
> relevant.
> 

Like I said, my understanding of cgroups is extremely limited. I have no clue about the differences between v1 and v2, and what unified is supposed to mean. Somehow I was led to believe that v2 is unified.


> > I've tried changing the controller ...
> 
> Do you refer to libvirt or lxc or other code here?

I'm referring to the code in libvirt, specifically the code for the lxc controller process (lxc_controller) that attempts to associate the process with the correct cgroups.

> 
> > ... to use cgroup.procs instead of tasks, which does get rid of the process
> > in the old scope.
> 
> (That should only cause change if you dealt with multi-threaded processes.)
> 

So I understood and as far as I've seen the lxc_controller process is not threaded. Imagine my confusion. Could the controller be running threads to handle IPC?

> > However, the last line of /proc/PID/cgroup never changes and systemd does
> > not find the controller process in the correct unit.
> 
> The last row (assuming it's the '0::' v2 hierarchy) is the "source of truth"
> for systemd when it comes to unit assocition (when it runs in hybrid or
> unified
> mode).

I glanced over the systemd code and that led me to believe that if it's not using unified cgroups, it uses the controller at 1:name=systemd.
Or does it use the "unified code path" when v2 is available?

While writing this I seem to recall that only v1 was mounted, but now I'm unsure; I would have to check. Anyway, it's clear I'm having some issues with the terminology.
Comment 35 Michal Koutný 2022-10-14 15:50:18 UTC
Created attachment 862186 [details]
[PATCH] cgroup/LXC: Do not condition availability of v2 by controllers

Thanks for the pointer towards lxc_controller.
I see that libvirt uses a combination of direct cgroupfs access and the systemd API for cgroups. I assume the former is only part of the (legacy?) lxc code. The latter is part of the libvirt "core", and hence it fails to properly associate the unit with the PID.

I attach a patch that could solve this (I'm not sure about ramifications on non-systemd, non-LXC setups). Would you be able to apply it and test the behavior on your setup? TY
Comment 36 Eric van Blokland 2022-10-20 19:40:11 UTC
Created attachment 862326 [details]
[PATCH] cgroup/LXC: Do not condition availability of v2 by controllers

So if I understand correctly, in hybrid cgroups the scope needs to be set using V2 for systemd. However, libvirt/lxc treats V2 as unavailable if there are no controllers.

Your patch does not work out of the box. It fails in virCgroupV2BindMount when it tries to create the directory (g_mkdir_with_parents) "/sys/fs/cgroup/unified", which makes sense to me on a read-only file system.
If I make the errors in this function non-fatal, everything seems to work as expected.

If I make virCgroupBindMount do the binds in reverse order (V2 takes precedence over V1), virCgroupV2BindMount succeeds without modifications. That may seem a bit dirty, but to me it's less convoluted than making V2 aware that it should be mounted within V1.

I've tested this patch with hybrid and unified cgroups, but not without systemd.
Comment 37 Michal Koutný 2022-10-21 15:55:54 UTC
(In reply to Eric van Blokland from comment #36)
> Your patch does not work out of the box. It fails in virCgroupV2BindMount
> when it tries to create the directory (g_mkdir_with_parents)
> "/sys/fs/cgroup/unified". 

Thanks for the test!

> That may seem a bit dirty, but to me it's less convoluted than letting V2 be
> aware it should be mounted within V1.

It relies on the tmpfs mount over the read-only path in the v1 virCgroupBindMount; yeah, that's slightly dirty but nothing uncorrectable (it should be explained in the commit message though).

> I've tested this patch with hybrid and unified cgroups, but not without
> systemd.

Would you send the patch (even in its current form) upstream [1]? (To resolve the nuances there.) I can help you if you're unfamiliar with the process; if you send it yourself, please Cc me on that mail.

[1] https://libvirt.org/submitting-patches.html
Comment 38 James Fehlig 2022-10-21 16:12:05 UTC
Just a reminder, please reference the gitlab issue mentioned in #12 in any patches sent to the libvirt list that fix this issue. Thanks a lot for the help!
Comment 39 Eric van Blokland 2022-10-21 19:05:22 UTC
(In reply to Michal Koutný from comment #37)
> (In reply to Eric van Blokland from comment #36)
> > Your patch does not work out of the box. It fails in virCgroupV2BindMount
> > when it tries to create the directory (g_mkdir_with_parents)
> > "/sys/fs/cgroup/unified". 
> 
> Thanks for the test!
> 
> > That may seem a bit dirty, but to me it's less convoluted than letting V2 be
> > aware it should be mounted within V1.
> 
> It relies on the tmpfs mount over the ro-path in the v1 virCgroupBindMount,
> yeah, that's slightly dirty but nothing incorrectable (should be explained
> in the commit message though).
> 

If you have any suggestions on how to improve that, let me know. I recall "do" loops are frowned upon these days, but I wanted to make it explicit that this loop is doing something different from the others, without messing with types. In any case I'll mention it more explicitly in the commit message.

> > I've tested this patch with hybrid and unified cgroups, but not without
> > systemd.
> 
> Would you send the patch (even the current form) to the upstream [1]? (To
> resolve the nuances there.) I can help you if you're unfamiliar; if you send
> yourself, please Cc me on that mail.
> 
> [1] https://libvirt.org/submitting-patches.html

You guessed correctly that I'm not familiar with mailing patches around, but I think I'll manage. I've got time tomorrow in the afternoon to tidy things up. I'll let you know if I get stuck somewhere.

I'll also be sure to cc you and mention the report in the libvirt gitlab.
Comment 40 Eric van Blokland 2022-10-24 11:50:45 UTC
The patch was accepted. I broke the clang builds in the process though. :-/
Comment 41 Eric van Blokland 2022-11-30 16:14:41 UTC
Since my patch broke cgroups, it was reverted. On the mailing list three possible fixes/workarounds were mentioned, but upstream does not seem to care. I would like to see this fixed, though, and I would like to know which fix would be acceptable to the openSUSE maintainers, so I can implement that.

1. Skip obtaining the machine unit name during container startup
2. Let systemd launch the lxc controller process as a transient unit
3. Add a dummy controller to force some v2 logic
Comment 42 James Fehlig 2022-12-07 16:42:44 UTC
(In reply to Eric van Blokland from comment #41)
> Since my patch broke cgroups, it was reverted. On the mailinglist three
> possible fixes/workarounds were mentioned but upstream does not seem to
> care. I would like to see this fixed though. I would like to know which fix
> would be acceptable by the OpenSUSE maintainers, so I can implement that.
> 
> 1. Skip obtaining the machine unit name during container startup
> 2. Let systemd launch the lxc controller process as a transient unit
> 3. Add a dummy controller to force some v2 logic

I'd be receptive to any fix that is isolated to the lxc driver. I'm less fond of a fix that touches common code, which could also affect the qemu driver.
Comment 43 Eric van Blokland 2022-12-07 18:21:47 UTC
In that case it is up to you.
(In reply to James Fehlig from comment #42)
> (In reply to Eric van Blokland from comment #41)
> > Since my patch broke cgroups, it was reverted. On the mailinglist three
> > possible fixes/workarounds were mentioned but upstream does not seem to
> > care. I would like to see this fixed though. I would like to know which fix
> > would be acceptable by the OpenSUSE maintainers, so I can implement that.
> > 
> > 1. Skip obtaining the machine unit name during container startup
> > 2. Let systemd launch the lxc controller process as a transient unit
> > 3. Add a dummy controller to force some v2 logic
> 
> I'd be receptive to any fix that is isolated to the lxc driver. I'm less
> fond of a fix that touches common code, which could also affect the qemu
> driver.

In that case it's a trade-off between maintainability and reliability.

In lxc_process.c there are two calls that cause issues:

virLXCDomainGetMachineName   lxc_process.c:1465
virCgroupNewDetectMachine    lxc_process.c:1473

The purpose of these calls is to abort the container initialization at an early stage if something goes wrong, more specifically if issues occurred with machined registration or cgroup creation.

Option 1:
For easy maintenance I could add a condition to only call these when the cgroup v2 backend is available or when there is no systemd, at the cost of some reliability.

Option 2:
Alter the signature of virCgroupNewDetectMachine to have a flag to optionally check the unit name. When called from lxc_process.c we pass the flag to not check the unit name when the cgroup v2 backend is not available and systemd is present. This way we keep some reliability at the cost of (probably) more maintenance.

Option 3:
Implement a function with the relevant parts of virCgroupNewDetectMachine directly in lxc_process.c as a fallback for when the cgroup v2 backend is not available and systemd is present. Some reliability, more maintenance.

In my opinion option 1 is sufficient. 

Let me know if one of these solutions would be acceptable.
Comment 44 Eric van Blokland 2022-12-07 20:51:46 UTC
Created attachment 863403 [details]
Fix lxc container initialization with systemd and hybrid cgroups

Or Option 4:
In an environment with systemd and hybrid cgroups, retrieve the control process's child PID and perform these particular checks with it.
Comment 45 Till Dörges 2022-12-18 17:48:47 UTC
FWIW:

My host (including LXC containers) was working fine with Leap 15.3.

After 'zypper dup' to 15.4 the LXC containers weren't starting anymore showing the error

  PID ... does not belong to any known machine

Adding systemd.unified_cgroup_hierarchy=1 to the kernel cmdline didn't help. The error changed to

  internal error: Unable to find 'memory' cgroups controller mount


Downgrading to the packages from Leap 15.3 does give me working LXC containers:

  # add 15.3 repos
  http://download.opensuse.org/distribution/leap/15.3/repo/oss/
  http://download.opensuse.org/update/leap/15.3/sle/
  http://download.opensuse.org/update/leap/15.3/oss

  # downgrade
  zypper install --oldpackage libvirt-daemon-driver-lxc-7.1.0-150300.6.35.2 libvirt-7.1.0-150300.6.35.2
  [...]
  The following 52 packages are going to be downgraded:
    kexec-tools libvirt libvirt-client libvirt-daemon libvirt-daemon-config-network libvirt-daemon-config-nwfilter libvirt-daemon-driver-interface libvirt-daemon-driver-libxl libvirt-daemon-driver-lxc libvirt-daemon-driver-network
    libvirt-daemon-driver-nodedev libvirt-daemon-driver-nwfilter libvirt-daemon-driver-qemu libvirt-daemon-driver-secret libvirt-daemon-driver-storage libvirt-daemon-driver-storage-core libvirt-daemon-driver-storage-disk
    libvirt-daemon-driver-storage-gluster libvirt-daemon-driver-storage-iscsi libvirt-daemon-driver-storage-iscsi-direct libvirt-daemon-driver-storage-logical libvirt-daemon-driver-storage-mpath libvirt-daemon-driver-storage-rbd
    libvirt-daemon-driver-storage-scsi libvirt-daemon-lxc libvirt-libs lxd lxd-bash-completion python3-libvirt-python qemu qemu-audio-spice qemu-block-curl qemu-block-nfs qemu-block-rbd qemu-chardev-baum qemu-chardev-spice
    qemu-hw-display-qxl qemu-hw-display-virtio-gpu qemu-hw-display-virtio-gpu-pci qemu-hw-display-virtio-vga qemu-hw-s390x-virtio-gpu-ccw qemu-hw-usb-redirect qemu-hw-usb-smartcard qemu-kvm qemu-tools qemu-ui-curses qemu-ui-gtk
    qemu-ui-opengl qemu-ui-spice-app qemu-ui-spice-core qemu-x86 xen-libs

  The following 3 NEW packages are going to be installed:
    libbrlapi0_7 libvirglrenderer0 libvirt-bash-completion

  The following 3 packages are going to be REMOVED:
    qemu-accel-qtest qemu-accel-tcg-x86 qemu-hw-usb-host
  [...]

  # lock versions
  zypper addlock libvirt-daemon-driver-lxc

  # remove 15.3 repos
Comment 46 James Fehlig 2022-12-22 18:34:03 UTC
(In reply to Till Dörges from comment #45)
> After 'zypper dup' to 15.4 the LXC containers weren't starting anymore
> showing the error
> 
>   PID ... does not belong to any known machine
> 
> Adding systemd.unified_cgroup_hierarchy=1 to the kernel cmdline didn't help.
> The error changed to
> 
>   internal error: Unable to find 'memory' cgroups controller mount

The memory controller is disabled by default on some SUSE distros. See comment #15.
Comment 47 James Fehlig 2022-12-23 00:47:12 UTC
(In reply to Eric van Blokland from comment #44)
> Created attachment 863403 [details]
> Fix lxc container initialization with systemd and hybrid cgroups

Thanks. I've only compile-tested the patch. I assume you've verified it actually solves the problem :-). If others are interested in testing, I've added Eric's patch to a libvirt package branched to my home project:

https://build.opensuse.org/package/show/home:jfehlig:branches:Virtualization:bug1183247/libvirt

Binaries are published to corresponding repos. E.g. for 15.4:

https://mirrorcache-us.opensuse.org/repositories/home:/jfehlig:/branches:/Virtualization:/bug1183247/15.4/

> Or Option 4:
> In an environment with systemd and hybrid cgroups retrieve the control
> process child pid and perform these particular checks with it.

I like this option since it is confined to the lxc driver. From the patch commit message:

> To work around this we retrieve the lxc control process child process pid (the process that is registered with machined) and perform the checks using that pid.

Ok. From the patch:

+    /* In an environment with hybrid cgroups and systemd the v2 backend is not available.
+     * Systemd however depends on V2 for unit naming. This causes the next two checks to fail.
+     * To work around this issue we retrieve the actual container pid and check on that instead. */
+    if (virSystemdHasMachined() == 0 && cgroupBackends[VIR_CGROUP_BACKEND_TYPE_V2]->available() == false) {
+        pidFile = g_strdup_printf("/proc/%u/task/%u/children", vm->pid, vm->pid);
+        if (virFileReadAll(pidFile, 1024 * 1024, &pidStr) < 0)
+            goto cleanup;
+
+        virTrimSpaces(pidStr, NULL);
+
+        pidList = g_strsplit(pidStr, " ", 20);
+        if (!pidList)
+            goto cleanup;
+
+        if (virStrToLong_i(pidList[0], NULL, 10, &checkPid) < 0)
+            goto cleanup;
+
+    } else {
+        checkPid = vm->pid;
+    }

Is the "actual container pid" *always* the first pid in this file? If so, and we don't care about the rest, it seems the g_strsplit 'max_tokens' param could be 2.
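
(For reference, /proc/<pid>/task/<pid>/children is a single space-separated line of the child PIDs of that task; a sketch of inspecting it by hand, assuming a single libvirt_lxc process is running:)

# p=$(pidof libvirt_lxc); cat /proc/$p/task/$p/children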
Comment 48 Eric van Blokland 2022-12-24 17:20:53 UTC
Created attachment 863686 [details]
Fix lxc container initialization with systemd and hybrid cgroups

@James,

Haha, yeah I did actually verify it worked.

I'm fairly certain the forked lxc_process will consistently be the only child process at this stage of the synchronization. I can't be 100% sure, though. I've updated the patch with your suggestion. I've also added a cast on vm->pid to be consistent with other occurrences in the lxc driver.

Thank you for considering my patch, happy holidays!
Comment 49 Till Dörges 2022-12-25 13:42:12 UTC
Just tested with 8.10.0-Virt.150400.1038.1 and my LXC containers are working:

--- snip ---
user@box:~> rpm -qa | egrep -i 'libvirt|Virt.150400' | sort
libvirt-client-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-config-network-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-config-nwfilter-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-interface-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-lxc-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-network-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-nodedev-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-nwfilter-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-qemu-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-secret-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-core-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-disk-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-gluster-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-iscsi-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-iscsi-direct-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-logical-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-mpath-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-rbd-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-scsi-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-lxc-8.10.0-Virt.150400.1038.1.x86_64
libvirt-glib-1_0-0-4.0.0-150400.1.10.x86_64
libvirt-libs-8.10.0-Virt.150400.1038.1.x86_64
python3-libvirt-python-8.0.0-150400.1.6.x86_64
system-group-libvirt-20170617-150400.22.33.noarch
typelib-1_0-LibvirtGLib-1_0-4.0.0-150400.1.10.x86_64
--- snap ---
Comment 50 James Fehlig 2022-12-27 18:34:42 UTC
(In reply to Eric van Blokland from comment #48)
> Created attachment 863686 [details]
> Fix lxc container initialization with systemd and hybrid  cgroups

I've applied this patch to the libvirt package and submitted to Factory. A holiday gift from Eric for all those who have been patient with this bug :-).

I've also backported it to the SLE15 SP4 libvirt package (which feeds Leap 15.4) for a future maintenance update.