Bugzilla – Bug 1183247
starting lxc container fails with GDBus.Error:org.freedesktop.machine1.NoMachineForPID: PID 4370 does not belong to any known machine
Last modified: 2023-01-23 22:26:11 UTC
Created attachment 846959 [details] reduced log with described error

After a Tumbleweed update to 20210307, starting an LXC container fails with GDBus.Error:org.freedesktop.machine1.NoMachineForPID: PID 4370 does not belong to any known machine.
I have run some tests with other openSUSE distributions. For testing I installed a minimal server setup and added libvirt with the LXC driver:

> zypper in -y --no-recommends libvirt-daemon-lxc libvirt-client libvirt-admin libvirt-daemon-config-network libvirt-daemon-config-nwfilter

This was tested with an equally minimal LXC container definition:

<domain type='lxc'>
  <name>vm1</name>
  <memory>500000</memory>
  <os>
    <type>exe</type>
    <init>/bin/sh</init>
  </os>
  <vcpu>1</vcpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/libvirt_lxc</emulator>
    <interface type='network'>
      <source network='default'/>
    </interface>
    <console type='pty'/>
  </devices>
</domain>

The results are as follows:

openSUSE-42.3       | WORKING
openSUSE-15.1       | WORKING
openSUSE-15.2       | WORKING
openSUSE-tumbleweed | failed with NoMachineForPID error

I think some access rights are missing or incorrect, but I can't find the root cause because I'm not that familiar with the libvirt and/or systemd internal workings.
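For reference, the container was defined and started in the usual way (a sketch; the file name vm1.xml is mine):

virsh -c lxc:/// define vm1.xml
virsh -c lxc:/// start vm1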
Created attachment 847180 [details] package list diff of snapshots 20210302 and 20210305

I have narrowed it down to the Tumbleweed snapshot where the issue first appears:

tumbleweed-20210302 still working
tumbleweed-20210305 shows the issue
- libvirt was updated from 7.0.0 to 7.1.0
- gnutls was updated from 3.6.15 to 3.7.0
- crypto-policies has been newly added (but not completely configured yet)
(In reply to Stefan Eisenwiener from comment #2)
> tumbleweed-20210302 still working
> tumbleweed-20210305 shows the issue
> - libvirt was updated from 7.0.0 to 7.1.0

Thanks for bisecting to this point! I've bisected libvirt between these releases and found that commit 9c1693eff4 introduced the problem. When starting an LXC container, virLXCProcessStart() calls virCgroupNewDetectMachine(), which now calls virSystemdGetMachineUnitByPID(). The latter fails, even though the container process should be running since the handshake with the lxc driver has already occurred. Investigating...
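For reference, the failing lookup can be reproduced by hand with roughly this D-Bus call to machined (a sketch, not libvirt's exact code path; 4370 is the leader PID from the error message):

busctl call org.freedesktop.machine1 /org/freedesktop/machine1 \
    org.freedesktop.machine1.Manager GetMachineByPID u 4370

On an affected host this returns the same org.freedesktop.machine1.NoMachineForPID error.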
It smells like a libvirt+cgroups V1 issue. I noticed my TW machine had the 'systemd.legacy_systemd_cgroup_controller=0' kernel parameter, which enables the hybrid hierarchy (both V1 and V2)

# mount | grep cgroup
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)

I suppose the default in TW is hybrid. After removing the parameter I still have V1+V2 (and still encounter this bug)

# mount | grep cgroup
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)

Finally, I forced the unified (aka V2) hierarchy by adding the 'systemd.unified_cgroup_hierarchy=1' kernel parameter and now we have only V2

# mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)

and the container starts successfully.
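Incidentally, a quick way to tell which mode a host is in without eyeballing the whole mount table (a convenience sketch, not from the original report):

# 'cgroup2fs' means pure V2 (unified); 'tmpfs' means legacy or hybrid V1
stat -fc %T /sys/fs/cgroup
# in hybrid mode the V2 hierarchy is additionally mounted here
test -d /sys/fs/cgroup/unified && echo hybrid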
(In reply to James Fehlig from comment #4)
> I suppose the default in TW is hybrid.

True (as of now).

> After removing the parameter I still have V1+V2 (and still encounter this bug)

The switch therefore has no effect; that's expected.

> Finally, I forced the unified (aka V2) hierarchy by adding the
> 'systemd.unified_cgroup_hierarchy=1' kernel parameter and now we have only V2
> [...]
> and the container starts successfully.

Good to know, but I guess this should still work in hybrid mode as well.

> which now calls virSystemdGetMachineUnitByPID(). The latter fails,

This boils down to machined checking /proc/$PID/cgroup, specifically the path in the unified hierarchy (the one that systemd uses to track processes). I noticed this in the following dump from the libvirt logs (comment 0):

> 2021-03-08 22:19:26.374+0000: 4250: debug : virCgroupNewDetect:1151 : pid=4370 controllers=-1 group=0x7f35ebffe310
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupDetectPlacement:348 : Detecting placement for pid 4370 path
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 0:cpu at /sys/fs/cgroup/cpu,cpuacct in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 1:cpuacct at /sys/fs/cgroup/cpu,cpuacct in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 2:cpuset at /sys/fs/cgroup/cpuset in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 3:memory at /sys/fs/cgroup/memory in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 4:devices at /sys/fs/cgroup/devices in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 5:freezer at /sys/fs/cgroup/freezer in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 6:blkio at /sys/fs/cgroup/blkio in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 7:net_cls at /sys/fs/cgroup/net_cls,net_prio in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 8:perf_event at /sys/fs/cgroup/perf_event in /machine.slice/machine-lxc\x2d4370\x2dcontainer1.scope for pid 4370
> 2021-03-08 22:19:26.376+0000: 4250: debug : virCgroupV1ValidatePlacement:418 : Detected mount/mapping 9:name=systemd at /sys/fs/cgroup/systemd in /system.slice for pid 4370

The last line is suspicious. Beware: it's not the unified hierarchy path but the named v1 hierarchy path that systemd maintains for compatibility in v1 mode. However, the unified path should be the same, and that is the suspicion -- there should be no process directly under the system.slice cgroup.

Q1) In what unified cgroup is the LXC leader process (`grep 0:: /proc/$PID/cgroup`)?
Q2) Are there any processes directly under the system.slice cgroup (`cat /sys/fs/cgroup/unified/system.slice/cgroup.procs`, `cat /sys/fs/cgroup/systemd/system.slice/cgroup.procs`)? (It would be good to know not only the PIDs but also what processes these are; see the snippet below.)

Hypothesis: the LXC backend (or libvirt?) migrates the leader process directly under system.slice (from machine-lxc\x2d4370\x2dcontainer1.scope) and machined then can't find it.
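Both checks put together as a runnable snippet (a sketch; assumes the hybrid mounts from comment 4 and that $PID is the leader from the error message):

PID=4370   # leader pid from the error message
grep '^0::' /proc/$PID/cgroup
cat /sys/fs/cgroup/unified/system.slice/cgroup.procs
cat /sys/fs/cgroup/systemd/system.slice/cgroup.procs
# resolve any pids found to process names
for p in $(cat /sys/fs/cgroup/unified/system.slice/cgroup.procs); do
    ps -o pid=,comm= -p "$p"
done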
(In reply to Michal Koutný from comment #5)
> Q1) In what unified cgroup is the LXC leader process
> (`grep 0:: /proc/$PID/cgroup`)?

It apparently depends on where libvirtd was started. E.g. if started via systemd

0::/system.slice/libvirtd.service

If started in a terminal session

0::/user.slice/user-0.slice/session-7.scope

> Q2) Are there any processes directly under the system.slice cgroup
> (`cat /sys/fs/cgroup/unified/system.slice/cgroup.procs`,
> `cat /sys/fs/cgroup/systemd/system.slice/cgroup.procs`)?

There are no processes under either location when starting libvirtd in a terminal session. When started by systemd

# cat /sys/fs/cgroup/systemd/system.slice/cgroup.procs
4653

where pid 4653 is the libvirt-lxc process.

> Hypothesis: the LXC backend (or libvirt?) migrates the leader process
> directly under system.slice (from machine-lxc\x2d4370\x2dcontainer1.scope)
> and machined then can't find it.

I think your hypothesis is correct. It's the libvirt lxc driver that's moving the process. I don't see the same behavior with the qemu driver. Regardless of how libvirtd was started, the unified cgroup for a qemu VM is something like

0::/machine.slice/machine-qemu\x2d1\x2dsle15sp3b2\x2dkvm.scope

I'm not familiar with the libvirt lxc driver, so off to find a needle in the haystack...
(In reply to James Fehlig from comment #6)
> There are no processes under either location when starting libvirtd in a
> terminal session.

My bet is you'd find the libvirt-lxc process under /sys/fs/cgroup/{unified,systemd}/user.slice/user-0.slice/cgroup.procs

> I think your hypothesis is correct.

Hm, it seems it's migrated there directly from under libvirtd.service, not from the scope cgroup. Anyway, it looks like it attempts to move to its parent cgroup relative to its current position. This caught my eye (in v7.0.0..v7.1.0):

184245f53b94 ("vircgroup: introduce nested cgroup to properly work with systemd")

Interestingly, the qemu VM isn't under the new nested 'libvirt' cgroup despite that. I'm not sure what the most effective question is now; maybe what cgroup(s) libvirt-lxc was in before the update.
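A quick way to check whether that nested cgroup is in use on an affected host (a sketch; assumes the nested group is literally named 'libvirt' as the commit message suggests):

find /sys/fs/cgroup -path '*machine.slice*' -type d -name libvirt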
I've hit this same issue when testing whether we can start upgrading some systems to 15.3. The libvirt version there is also 7.1.0. When forcing the v2-only cgroup hierarchy I can start a container, though I still have other issues making it unusable (a persistent way to force v2 is sketched below):

# virsh -c lxc:// lxc-enter-namespace www1-de --noseclabel /usr/bin/bash
libvirt: Cgroup error : Unable to write to '/sys/fs/cgroup/machine.slice/machine-lxc\x2d1712\x2dwww1\x2dde.scope/cgroup.procs': Device or resource busy
error: internal error: Child process (27649) unexpected exit status 125
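For completeness, how the v2-only switch can be made persistent (a sketch of the usual openSUSE grub procedure, not advice specific to this thread):

# append systemd.unified_cgroup_hierarchy=1 to GRUB_CMDLINE_LINUX_DEFAULT
# in /etc/default/grub, then regenerate the config and reboot:
grub2-mkconfig -o /boot/grub2/grub.cfg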
(In reply to Bernd Wachter from comment #8)
> # virsh -c lxc:// lxc-enter-namespace www1-de --noseclabel /usr/bin/bash
> libvirt: Cgroup error : Unable to write to
> '/sys/fs/cgroup/machine.slice/machine-lxc\x2d1712\x2dwww1\x2dde.scope/cgroup.procs': Device or resource busy
> error: internal error: Child process (27649) unexpected exit status 125

This sounds like the container's root cgroup has some subcgroups with controllers attached (this would be visible in /sys/fs/cgroup/machine.slice/machine-lxc\x2d1712\x2dwww1\x2dde.scope/cgroup.subtree_control).

This is due to the "no internal processes" constraint in the v2 hierarchy -- hence this is a different bug than the one originally reported. Depending on the container's cgroup layout, new processes can only be placed into leaves, and lxc-enter-namespace would need to know the internal structure of a running container to achieve this. IOW, it may be impossible to enter the root cgroup of some containers on the v2 hierarchy.
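The constraint can be demonstrated in isolation (a minimal sketch; assumes a v2-only host, root privileges, and the memory controller enabled in the parent's cgroup.subtree_control):

mkdir -p /sys/fs/cgroup/demo/leaf
echo +memory > /sys/fs/cgroup/demo/cgroup.subtree_control
echo $$ > /sys/fs/cgroup/demo/cgroup.procs       # fails: Device or resource busy
echo $$ > /sys/fs/cgroup/demo/leaf/cgroup.procs  # succeeds: only leaves may hold processes

This is exactly the EBUSY that lxc-enter-namespace runs into when the scope has controllers delegated to child cgroups.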
I'm aware of this being a different issue - I've just included those bits to show that, while it is possible to get containers started by using cgroups v2 only, it is not a practical solution to do so.
FYI, I verified the cgroups hybrid issue still exists with libvirt.git master and filed an issue in the upstream tracker https://gitlab.com/libvirt/libvirt/-/issues/182
FYI, Cole posted a patch to fix the NoMachineForPID issue

https://listman.redhat.com/archives/libvir-list/2021-October/msg00157.html

It works fine with hybrid mode in my testing. While testing the patch with V2 only, I noticed the following unrelated problem

virsh -c lxc:/// create lxc-test.xml
error: Failed to create domain from lxc-test.xml
error: internal error: Unable to find 'memory' cgroups controller mount

From the libvirtd log

2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectPlacement:199 : group=0x7f65580081a0 path= controllers= selfpath=/system.slice/libvirtd.service
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'cpu' present=no
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'cpuacct' present=yes
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'cpuset' present=no
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'memory' present=no
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'devices' present=yes
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'freezer' present=no
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'io' present=no
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'net_cls' present=no
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'perf_event' present=no
2021-10-07 20:16:00.521+0000: 4131: debug : virCgroupV2DetectControllers:335 : Controller 'name=systemd' present=no
2021-10-07 20:16:00.521+0000: 4131: error : virLXCProcessStart:1236 : internal error: Unable to find 'memory' cgroups controller mount

Hmm, what controllers are available along the path to /system.slice/libvirtd.service?

# cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc
# cat /sys/fs/cgroup/system.slice/cgroup.controllers
pids
# cat /sys/fs/cgroup/system.slice/libvirtd.service/cgroup.controllers
pids

So no memory controller, which, along with cpuacct and devices, is required by the lxc driver. I'm surprised there have been no complaints from other distros. Could there be cgroup.subtree_control differences between distros?
What I find interesting is that libvirtd seems to be probing _its own_ libvirtd.service cgroup, but it should really be interested in the container's .scope cgroup under machine.slice (to see what was delegated to the container) or the root cgroup (to see what is available on the host for further delegation).

The reason might be that other distros enable DefaultMemoryAccounting=yes; in that case even libvirtd.service would have the memory controller enabled, which is likely enough to trick lxc.
(In reply to Michal Koutný from comment #14)
> What I find interesting is that libvirtd seems to be probing _its own_
> libvirtd.service cgroup, but it should really be interested in the
> container's .scope cgroup under machine.slice (to see what was delegated
> to the container) or the root cgroup (to see what is available on the host
> for further delegation).

Ah, good point. Likely an upstream improvement that could be made to the lxc driver. I'm not familiar with the driver and don't have the motivation to learn it, particularly since there is no support on the enterprise side.

> The reason might be that other distros enable DefaultMemoryAccounting=yes;
> in that case even libvirtd.service would have the memory controller enabled,
> which is likely enough to trick lxc.

Thanks for the hint! /usr/lib/systemd/system.conf.d/__20-defaults-SUSE.conf has

# Memory accounting incurs performance penalties. Let's keep it
# disabled by default since the actual use case for enabling it is not
# clear (jsc#PM-2229).
DefaultMemoryAccounting=no

A local override of the setting allows me to start lxc domains with cgroups V2. But that's a separate doc bug IMO. The bug reported here against hybrid/V1 cgroups is now fixed and submitted to Factory.
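For anyone else hitting the 'memory' controller error on a v2-only host, a sketch of such an override (the drop-in file name is my choice):

mkdir -p /etc/systemd/system.conf.d
cat > /etc/systemd/system.conf.d/99-memory-accounting.conf <<'EOF'
[Manager]
DefaultMemoryAccounting=yes
EOF
# re-execute systemd (or reboot) so the new default takes effect
systemctl daemon-reexec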
This is an autogenerated message for OBS integration: This bug (1183247) was mentioned in https://build.opensuse.org/request/show/924290 Factory / libvirt
Is there a plan to backport these patches to Leap 15.3? It ships libvirt 7.1.0.
(In reply to Zhufa Sa from comment #18)
> Is there a plan to backport these patches to Leap 15.3?
> It ships libvirt 7.1.0.

I hadn't planned on it since the bug was filed against TW. Are you affected by this issue on 15.3?
(In reply to James Fehlig from comment #19)
> (In reply to Zhufa Sa from comment #18)
> > Is there a plan to backport these patches to Leap 15.3?
> > It ships libvirt 7.1.0.
>
> I hadn't planned on it since the bug was filed against TW. Are you affected
> by this issue on 15.3?

See https://bugzilla.opensuse.org/show_bug.cgi?id=1183247#c8 - we're stuck on Leap 15.2 for all systems hosting LXC containers until this is fixed.
(In reply to James Fehlig from comment #19)
> (In reply to Zhufa Sa from comment #18)
> > Is there a plan to backport these patches to Leap 15.3?
> > It ships libvirt 7.1.0.
>
> I hadn't planned on it since the bug was filed against TW. Are you affected
> by this issue on 15.3?

Yes, I'm affected by it. I was also thinking that, since the libvirt shipped in the SLE repo corresponds to SLE-15-SP3, a plan might follow there as well. But this is not a big problem for me because I can work around it by using the Virtualization repo from OBS. Let it go if there is no plan. Thanks.
I added the patch to the SLE15SP3/Leap15.3 libvirt package. It will be included in the next maintenance release.
SUSE-SU-2021:3619-1: An update that contains security fixes can now be installed.

Category: security (moderate)
Bug References: 1177902,1183247,1186398,1190420,1190493,1190693,1190695,1190917
CVE References:
JIRA References:
Sources used:
SUSE MicroOS 5.1 (src): libvirt-7.1.0-6.8.1
SUSE Linux Enterprise Module for Server Applications 15-SP3 (src): libvirt-7.1.0-6.8.1
SUSE Linux Enterprise Module for Basesystem 15-SP3 (src): libvirt-7.1.0-6.8.1

NOTE: This line indicates an update has been released for the listed product(s). At times this might be only a partial fix. If you have questions please reach out to maintenance coordination.
openSUSE-SU-2021:3619-1: An update that contains security fixes can now be installed.

Category: security (moderate)
Bug References: 1177902,1183247,1186398,1190420,1190493,1190693,1190695,1190917
CVE References:
JIRA References:
Sources used:
openSUSE Leap 15.3 (src): libvirt-7.1.0-6.8.1
Apparently the "fix" for this issue has caused other problems and will be reverted upstream https://gitlab.com/libvirt/libvirt/-/issues/182#note_763329557 I'll wait for an upstream solution before making additional changes to the downstream libvirt packages.
I'm now hitting this bug on Leap 15.4 on our test machine to see if we can move from 15.3 to 15.4 without issues.
I've been hit by this as well and had to shelve some upgrades. Tumbleweed is working fine at the moment. I've installed libvirtd 8.7.0 on a Leap 15.4 test machine using https://download.opensuse.org/repositories/Virtualization/15.4/Virtualization.repo but it exhibits the same behavior. So maybe it is a systemd interoperability issue? A glance through the commit history did not reveal anything obviously related.
(In reply to Eric van Blokland from comment #28)
> I've been hit by this as well and had to shelve some upgrades.

Is it the same issue as in the initial comment, i.e. 'GDBus.Error:org.freedesktop.machine1.NoMachineForPID:'?
What process does the offending PID refer to (is it PID 1 of the container)?
Can you check what's in /proc/$PID/cgroup (when the error occurs)?
TY

> Tumbleweed is working fine at the moment.

FTR, openSUSE TW defaults to the unified (v2) hierarchy now and is not affected (as confirmed in comment 4).
(In reply to Michal Koutný from comment #29)
> Is it the same issue as in the initial comment, i.e.
> 'GDBus.Error:org.freedesktop.machine1.NoMachineForPID:'?
> What process does the offending PID refer to (is it PID 1 of the container)?
> Can you check what's in /proc/$PID/cgroup (when the error occurs)?
>
> FTR, openSUSE TW defaults to the unified (v2) hierarchy now and is not
> affected (as confirmed in comment 4).

Figures, I thought TW was still hybrid. I'm pretty sure it's the same issue; booting with 'systemd.unified_cgroup_hierarchy=1' resolves it. If I'm not mistaken, the offending PID is that of the driver/controller(?), not the actual container. It exits too quickly to dump the cgroup information. I could try to attach a debugger if you're interested, but that would have to wait until tomorrow.
(In reply to Eric van Blokland from comment #30)
> It exits too quickly to dump the cgroup information.

Understood.

> I could try to attach a debugger if you're interested, but that would have
> to wait until tomorrow.

Maybe debug logs as in comment 0 would suffice. (Although they don't seem to contain the unified hierarchy path.)
(In reply to Michal Koutný from comment #31)
> Maybe debug logs as in comment 0 would suffice. (Although they don't seem
> to contain the unified hierarchy path.)

As an exercise I've taken a dive into this issue. I have very little knowledge about cgroups, so that has been a bit of a handicap. As mentioned in one of the comments, the lxc control process is started within the scope of its parent, libvirtd. To me it seems as if the control process is correctly added to the new scope. However, the PID remains tied to the old scope. I've tried changing the controller to use cgroup.procs instead of tasks, which does get rid of the process in the old scope. However, the last line of /proc/PID/cgroup never changes and systemd does not find the controller process in the correct unit.
(In reply to Eric van Blokland from comment #32)
> To me it seems as if the control process is correctly added to the new scope.
> However, the PID remains tied to the old scope.

There are multiple hierarchies; for systemd tracking purposes only the v2 one is relevant.

> I've tried changing the controller ...

Do you refer to libvirt or lxc or other code here?

> ... to use cgroup.procs instead of tasks, which does get rid of the process
> in the old scope.

(That should only cause a change if you dealt with multi-threaded processes.)

> However, the last line of /proc/PID/cgroup never changes and systemd does
> not find the controller process in the correct unit.

The last row (assuming it's the '0::' v2 hierarchy) is the "source of truth" for systemd when it comes to unit association (when it runs in hybrid or unified mode).
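To illustrate which rows I mean, this is roughly what the relevant part of /proc/$PID/cgroup looks like on a hybrid host (a sketch; the paths are illustrative, modeled on the dump in comment 5):

# grep -e '^0::' -e 'name=systemd' /proc/$PID/cgroup
1:name=systemd:/system.slice
0::/system.slice/libvirtd.service

Only the '0::' row matters for systemd's unit association; the name=systemd row is just the v1 compatibility copy.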
(In reply to Michal Koutný from comment #33)
> There are multiple hierarchies; for systemd tracking purposes only the v2 one
> is relevant.

Like I said, my understanding of cgroups is extremely limited. I have no clue about the differences between v1 and v2 and what "unified" is supposed to mean. Somehow I was led to believe that v2 is unified.

> > I've tried changing the controller ...
>
> Do you refer to libvirt or lxc or other code here?

I'm referring to the code in libvirt, specifically the code for the lxc controller process (lxc_controller) that attempts to associate the process with the correct cgroups.

> > ... to use cgroup.procs instead of tasks, which does get rid of the process
> > in the old scope.
>
> (That should only cause a change if you dealt with multi-threaded processes.)

So I understood, and as far as I've seen the lxc_controller process is not threaded. Imagine my confusion. Could the controller be running threads to handle IPC?

> The last row (assuming it's the '0::' v2 hierarchy) is the "source of truth"
> for systemd when it comes to unit association (when it runs in hybrid or
> unified mode).

I glanced over the systemd code and that led me to believe that if it's not using unified cgroups, it uses the controller at 1:name=systemd. Or does it use the "unified code path" when v2 is available? While writing this I think I recall that only v1 was mounted, but now I'm unsure; I would have to check. Anyway, it is clear I'm having some issues with the terminology.
Created attachment 862186 [details] [PATCH] cgroup/LXC: Do not condition availability of v2 by controllers

Thanks for the pointer towards lxc_controller. I see that libvirt uses a combination of direct cgroupfs access and the systemd API to manage cgroups. I assume the former is only part of the (legacy?) lxc code. The latter is part of the libvirt "core" and hence it fails to properly associate the unit with the PID. I'm attaching a patch that could solve this (I'm not sure about the ramifications on non-systemd, non-LXC setups). Would you be able to apply it and test the behavior on your setup? TY
Created attachment 862326 [details] [PATCH] cgroup/LXC: Do not condition availability of v2 by controllers

So if I understand correctly, in hybrid cgroups the scope needs to be set using V2 for systemd. However, libvirt/lxc treats V2 as unavailable if there are no controllers.

Your patch does not work out of the box. It fails in virCgroupV2BindMount when it tries to create the directory (g_mkdir_with_parents) "/sys/fs/cgroup/unified", which makes sense to me on a read-only file system. If I make the errors in this function non-fatal, everything seems to work as expected. If I make virCgroupBindMount do the binds in reverse order (V2 takes precedence over V1), virCgroupV2BindMount succeeds without modifications. That may seem a bit dirty, but to me it's less convoluted than letting V2 be aware it should be mounted within V1.

I've tested this patch with hybrid and unified cgroups, but not without systemd.
(In reply to Eric van Blokland from comment #36)
> Your patch does not work out of the box. It fails in virCgroupV2BindMount
> when it tries to create the directory (g_mkdir_with_parents)
> "/sys/fs/cgroup/unified".

Thanks for the test!

> That may seem a bit dirty, but to me it's less convoluted than letting V2 be
> aware it should be mounted within V1.

It relies on the tmpfs mount over the ro-path in the v1 virCgroupBindMount; yeah, that's slightly dirty but nothing incorrectable (it should be explained in the commit message though).

> I've tested this patch with hybrid and unified cgroups, but not without
> systemd.

Would you send the patch (even in its current form) upstream [1]? (To resolve the nuances there.) I can help you if you're unfamiliar; if you send it yourself, please Cc me on that mail.

[1] https://libvirt.org/submitting-patches.html
Just a reminder, please reference the gitlab issue mentioned in #12 in any patches sent to the libvirt list that fix this issue. Thanks a lot for the help!
(In reply to Michal Koutný from comment #37)
> It relies on the tmpfs mount over the ro-path in the v1 virCgroupBindMount;
> yeah, that's slightly dirty but nothing incorrectable (it should be explained
> in the commit message though).

If you have any suggestions on how to improve that, let me know. I recall "do" loops are frowned upon these days, but I wanted to make it explicit that this loop does something different from the others, without messing with types. In any case I'll mention it more explicitly in the commit message.

> Would you send the patch (even in its current form) upstream [1]? (To
> resolve the nuances there.) I can help you if you're unfamiliar; if you send
> it yourself, please Cc me on that mail.
>
> [1] https://libvirt.org/submitting-patches.html

You guessed correctly that I'm not familiar with mailing patches around, but I think I'll manage. I have time tomorrow afternoon to tidy things up. I'll let you know if I get stuck somewhere. I'll also be sure to Cc you and mention the report in the libvirt GitLab.
The patch was accepted. I broke the clang builds in the process though. :-/
Since my patch broke cgroups, it was reverted. On the mailing list three possible fixes/workarounds were mentioned, but upstream does not seem to care. I would like to see this fixed though, so I would like to know which fix would be acceptable to the openSUSE maintainers so I can implement it.

1. Skip obtaining the machine unit name during container startup
2. Let systemd launch the lxc controller process as a transient unit
3. Add a dummy controller to force some v2 logic
(In reply to Eric van Blokland from comment #41)
> Since my patch broke cgroups, it was reverted. On the mailing list three
> possible fixes/workarounds were mentioned, but upstream does not seem to
> care. I would like to see this fixed though, so I would like to know which
> fix would be acceptable to the openSUSE maintainers so I can implement it.
>
> 1. Skip obtaining the machine unit name during container startup
> 2. Let systemd launch the lxc controller process as a transient unit
> 3. Add a dummy controller to force some v2 logic

I'd be receptive to any fix that is isolated to the lxc driver. I'm less fond of a fix that touches common code, which could also affect the qemu driver.
(In reply to James Fehlig from comment #42)
> I'd be receptive to any fix that is isolated to the lxc driver. I'm less
> fond of a fix that touches common code, which could also affect the qemu
> driver.

In that case it's a trade-off between maintainability and reliability. In lxc_process.c there are two calls that cause issues:

virLXCDomainGetMachineName    lxc_process.c:1465
virCgroupNewDetectMachine     lxc_process.c:1473

The purpose of these calls is to abort the container initialization at an early stage if something goes wrong, more specifically if issues occurred with machined registration and cgroup creation.

Option 1: For easy maintenance, I could add a condition to only make these calls when the cgroup v2 backend is available or when there is no systemd, at the cost of some reliability.

Option 2: Alter the signature of virCgroupNewDetectMachine to take a flag that optionally skips the unit name check. When called from lxc_process.c, we pass the flag to skip the check when the cgroup v2 backend is not available and systemd is present. This way we keep some reliability at the cost of (probably) more maintenance.

Option 3: Implement a function with the relevant parts of virCgroupNewDetectMachine directly in lxc_process.c as a fallback for when the cgroup v2 backend is not available and systemd is present. Some reliability, more maintenance.

In my opinion option 1 is sufficient. Let me know if one of these solutions would be acceptable.
Created attachment 863403 [details] Fix lxc container initialization with systemd and hybrid cgroups

Or Option 4: In an environment with systemd and hybrid cgroups, retrieve the child pid of the control process and perform these particular checks with it.
FWIW: My host (including LXC containers) was working fine with Leap 15.3. After 'zypper dup' to 15.4 the LXC containers weren't starting anymore, showing the error

PID ... does not belong to any known machine

Adding systemd.unified_cgroup_hierarchy=1 to the kernel cmdline didn't help. The error changed to

internal error: Unable to find 'memory' cgroups controller mount

Downgrading to the packages from Leap 15.3 does give me working LXC containers:

# add 15.3 repos
http://download.opensuse.org/distribution/leap/15.3/repo/oss/
http://download.opensuse.org/update/leap/15.3/sle/
http://download.opensuse.org/update/leap/15.3/oss

# downgrade
zypper install --oldpackage libvirt-daemon-driver-lxc-7.1.0-150300.6.35.2 libvirt-7.1.0-150300.6.35.2
[...]
The following 52 packages are going to be downgraded:
  kexec-tools libvirt libvirt-client libvirt-daemon libvirt-daemon-config-network
  libvirt-daemon-config-nwfilter libvirt-daemon-driver-interface libvirt-daemon-driver-libxl
  libvirt-daemon-driver-lxc libvirt-daemon-driver-network libvirt-daemon-driver-nodedev
  libvirt-daemon-driver-nwfilter libvirt-daemon-driver-qemu libvirt-daemon-driver-secret
  libvirt-daemon-driver-storage libvirt-daemon-driver-storage-core libvirt-daemon-driver-storage-disk
  libvirt-daemon-driver-storage-gluster libvirt-daemon-driver-storage-iscsi
  libvirt-daemon-driver-storage-iscsi-direct libvirt-daemon-driver-storage-logical
  libvirt-daemon-driver-storage-mpath libvirt-daemon-driver-storage-rbd libvirt-daemon-driver-storage-scsi
  libvirt-daemon-lxc libvirt-libs lxd lxd-bash-completion python3-libvirt-python qemu qemu-audio-spice
  qemu-block-curl qemu-block-nfs qemu-block-rbd qemu-chardev-baum qemu-chardev-spice qemu-hw-display-qxl
  qemu-hw-display-virtio-gpu qemu-hw-display-virtio-gpu-pci qemu-hw-display-virtio-vga
  qemu-hw-s390x-virtio-gpu-ccw qemu-hw-usb-redirect qemu-hw-usb-smartcard qemu-kvm qemu-tools
  qemu-ui-curses qemu-ui-gtk qemu-ui-opengl qemu-ui-spice-app qemu-ui-spice-core qemu-x86 xen-libs

The following 3 NEW packages are going to be installed:
  libbrlapi0_7 libvirglrenderer0 libvirt-bash-completion

The following 3 packages are going to be REMOVED:
  qemu-accel-qtest qemu-accel-tcg-x86 qemu-hw-usb-host
[...]

# lock versions
zypper addlock libvirt-daemon-driver-lxc

# remove 15.3 repos
(In reply to Till Dörges from comment #45)
> After 'zypper dup' to 15.4 the LXC containers weren't starting anymore,
> showing the error
>
> PID ... does not belong to any known machine
>
> Adding systemd.unified_cgroup_hierarchy=1 to the kernel cmdline didn't help.
> The error changed to
>
> internal error: Unable to find 'memory' cgroups controller mount

The memory controller is disabled by default on some SUSE distros. See comment #15.
(In reply to Eric van Blokland from comment #44)
> Created attachment 863403 [details]
> Fix lxc container initialization with systemd and hybrid cgroups

Thanks. I've only compile-tested the patch. I assume you've verified it actually solves the problem :-). If others are interested in testing, I've added Eric's patch to a libvirt package branched to my home project:

https://build.opensuse.org/package/show/home:jfehlig:branches:Virtualization:bug1183247/libvirt

Binaries are published to the corresponding repos, e.g. for 15.4:

https://mirrorcache-us.opensuse.org/repositories/home:/jfehlig:/branches:/Virtualization:/bug1183247/15.4/

> Or Option 4: In an environment with systemd and hybrid cgroups, retrieve
> the child pid of the control process and perform these particular checks
> with it.

I like this option since it is confined to the lxc driver. From the patch commit message:

> To work around this we retrieve the lxc control process child process pid
> (the process that is registered with machined) and perform the checks using
> that pid.

Ok. From the patch:

+    /* In an environment with hybrid cgroups and systemd the v2 backend is not available.
+     * Systemd however depends on V2 for unit naming. This causes the next two checks to fail.
+     * To work around this issue we retrieve the actual container pid and check on that instead. */
+    if (virSystemdHasMachined() == 0 && cgroupBackends[VIR_CGROUP_BACKEND_TYPE_V2]->available() == false) {
+        pidFile = g_strdup_printf("/proc/%u/task/%u/children", vm->pid, vm->pid);
+        if (virFileReadAll(pidFile, 1024 * 1024, &pidStr) < 0)
+            goto cleanup;
+
+        virTrimSpaces(pidStr, NULL);
+
+        pidList = g_strsplit(pidStr, " ", 20);
+        if (!pidList)
+            goto cleanup;
+
+        if (virStrToLong_i(pidList[0], NULL, 10, &checkPid) < 0)
+            goto cleanup;
+
+    } else {
+        checkPid = vm->pid;
+    }

Is the "actual container pid" *always* the first pid in this file? If so, and we don't care about the rest, it seems the g_strsplit 'max_tokens' param could be 2.
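For reference, the proc file the patch reads can be inspected by hand (a sketch; the PID is illustrative and the file requires a kernel built with CONFIG_PROC_CHILDREN):

# space-separated pids of the direct children of the controller's main thread
cat /proc/4370/task/4370/children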
Created attachment 863686 [details] Fix lxc container initialization with systemd and hybrid cgroups

@James, haha, yeah, I did actually verify it worked. I'm fairly certain the forked lxc_process will consistently be the only child process at this stage of the synchronization; I can't be 100% sure though. I've updated the patch with your suggestion. I've also added a cast on vm->pid to be consistent with other occurrences in the lxc driver. Thank you for considering my patch, happy holidays!
Just tested with 8.10.0-Virt.150400.1038.1 and my LXC containers are working:

--- snip ---
user@box:~> rpm -qa | egrep -i 'libvirt|Virt.150400' | sort
libvirt-client-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-config-network-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-config-nwfilter-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-interface-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-lxc-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-network-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-nodedev-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-nwfilter-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-qemu-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-secret-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-core-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-disk-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-gluster-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-iscsi-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-iscsi-direct-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-logical-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-mpath-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-rbd-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-driver-storage-scsi-8.10.0-Virt.150400.1038.1.x86_64
libvirt-daemon-lxc-8.10.0-Virt.150400.1038.1.x86_64
libvirt-glib-1_0-0-4.0.0-150400.1.10.x86_64
libvirt-libs-8.10.0-Virt.150400.1038.1.x86_64
python3-libvirt-python-8.0.0-150400.1.6.x86_64
system-group-libvirt-20170617-150400.22.33.noarch
typelib-1_0-LibvirtGLib-1_0-4.0.0-150400.1.10.x86_64
--- snap ---
(In reply to Eric van Blokland from comment #48) > Created attachment 863686 [details] > Fix lxc container initialization with systemd and hybrid cgroups I've applied this patch to the libvirt package and submitted to Factory. A holiday gift from Eric for all those who have been patient with this bug :-). I've also backported it to the SLE15 SP4 libvirt package (which feeds Leap 15.4) for a future maintenance update.