Bugzilla – Bug 1186256
qemu-linux-user: hardcoded binfmt handler doesn't play well with containers
Last modified: 2022-10-27 18:38:39 UTC
Created attachment 849481 [details]
Proposed patch for qemu-binfmt-conf.sh

Since abbc0ce ("qemu-binfmt-conf: use qemu-ARCH-binfmt"), qemu-binfmt-conf.sh under openSUSE automatically replaces the default qemu binfmt wrapper "qemu-$ARCH" with "qemu-$ARCH-binfmt" in order to ensure that argv[0] is preserved; qemu-$ARCH-binfmt is a link to qemu-binfmt, which is just a simple wrapper that mangles argv to achieve the desired result. This is a SUSE-specific modification which isn't used upstream.

This approach is inconvenient in some situations. In particular for running foreign-arch containers, it's useful to use the binfmt_misc "F" ("fix binary") flag to pre-load the qemu wrapper in the kernel. That way, foreign-arch containers can be run just like native containers, without having to bind-mount interpreters into the container. But that's impossible with the SUSE binfmt wrapper, which needs to exec() a different (native) executable.

In the openSUSE default mode of qemu-binfmt-conf.sh, the user needs to bind-mount both the -binfmt executable and the actual emulator into the container:

> $ podman run -it --rm \
>     -v /usr/bin/qemu-ppc64le-binfmt:/usr/bin/qemu-ppc64le-binfmt \
>     -v /usr/bin/qemu-ppc64le:/usr/bin/qemu-ppc64le \
>     ppc64le/busybox uname -m
> ppc64le

Otherwise, the user gets

> $ podman run -t --rm ppc64le/busybox uname -m
> standard_init_linux.go:219: exec user process caused: no such file or directory

If qemu-binfmt-conf.sh is used with the --persistent flag, qemu-ppc64le-binfmt is loaded into the kernel, but qemu-ppc64le must still be bind-mounted.

If qemu-ppc64le were used directly as the persistent binfmt_misc helper, it would be sufficient to run the container as if it were a native one:

> $ podman run -it --rm ppc64le/busybox uname -m
> ppc64le

I can see why it makes sense to try to preserve argv[0], but for me at least, the "foreign container" use case is more important. Therefore I'd like to be able to switch the behavior of the qemu binfmt_misc helper back to the upstream default.

So far I've worked around the issue by simply using the upstream container "docker.io/multiarch/qemu-user-static", but I'd like to be able to do this easily with openSUSE on-board tools.

The attached patch allows the user to override the default "-binfmt" suffix by running "qemu-binfmt-conf.sh --qemu-suffix ''".

(Note: "qemu-binfmt-conf.sh -F ''" doesn't work; that's a different issue.)
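For readers unfamiliar with the "F" flag, here is a minimal sketch of what a binfmt_misc registration line with that flag looks like. It follows the kernel's documented ":name:type:offset:magic:mask:interpreter:flags" layout. The helper names hex_to_escapes and binfmt_register_line are made up for this illustration and are not part of qemu-binfmt-conf.sh; the magic and mask are the ppc64le values in the hex form the kernel prints under /proc/sys/fs/binfmt_misc. The sketch only prints the line and registers nothing.

```shell
#!/bin/sh
# Hypothetical helper: turn a hex string as printed by the kernel
# (e.g. 7f454c46) into the \xNN escapes used in a registration line.
hex_to_escapes() {
    printf '%s' "$1" | sed 's/../\\x&/g'
}

# Hypothetical helper: build a registration line of the form
#   :name:type:offset:magic:mask:interpreter:flags
# Type M matches on magic bytes; the offset field is left empty (meaning 0).
binfmt_register_line() {
    name=$1 magic_hex=$2 mask_hex=$3 interp=$4 flags=$5
    printf ':%s:M::%s:%s:%s:%s\n' \
        "$name" \
        "$(hex_to_escapes "$magic_hex")" \
        "$(hex_to_escapes "$mask_hex")" \
        "$interp" "$flags"
}

# Illustration only: print the line for ppc64le with the "F" flag,
# pointing at the emulator itself rather than the -binfmt wrapper.
binfmt_register_line qemu-ppc64le \
    7f454c4602010100000000000000000002001500 \
    ffffffffffffff00fffffffffffffffffeffff00 \
    /usr/bin/qemu-ppc64le F
```

Writing such a line to /proc/sys/fs/binfmt_misc/register (as root, with binfmt_misc mounted) is what creates the entry. With "F", the kernel opens the interpreter at registration time, which is why a container started later does not need the file in its own filesystem. In practice qemu-binfmt-conf.sh generates these lines itself, so this is only meant to show where the flag and the interpreter path end up.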
Created attachment 849483 [details] Proposed patch for qemu-binfmt-conf.sh
wrt "-F", I just posted a patch to qemu-devel, subject "qemu-binfmt-conf.sh: fix -F option".
Note: I tried to create an OBS request with these two patches, but I failed to make update_git.sh work.
Hello Martin,

Just added your patch in our stage repo (https://build.opensuse.org/package/revisions/Virtualization/qemu). I'll send a SR to Factory as soon as they finish the QEMU v6.1 update (https://build.opensuse.org/request/show/914458).

Thank you!
Jose
Great, thank you!
This is an autogenerated message for OBS integration: This bug (1186256) was mentioned in https://build.opensuse.org/request/show/917638 Factory / qemu
José, we're not there yet because an upstream bot rejected my -F patch (comment 2) because of a style issue which was definitely not my fault. The overlong line was there before my patch already. I never got this reply (spam folder? no idea), so I was also never able to fix this non-issue. https://lists.gnu.org/archive/html/qemu-devel/2021-05/msg06012.html I'll re-post the patch and cc you. I'd be glad if you could pull it into opensuse before upstream gets to it.
(In reply to Martin Wilck from comment #7)
> José,
>
> we're not there yet because an upstream bot rejected my -F patch (comment 2)
> because of a style issue which was definitely not my fault. The overlong
> line was there before my patch already. I never got this reply (spam folder?
> no idea), so I was also never able to fix this non-issue.
>
> https://lists.gnu.org/archive/html/qemu-devel/2021-05/msg06012.html
>
> I'll re-post the patch and cc you. I'd be glad if you could pull it into
> opensuse before upstream gets to it.

Hello Martin,

Sure, I'll add it here. By the way, your -F patch is in Factory, should be available in this next update.

Thanks
This is an autogenerated message for OBS integration: This bug (1186256) was mentioned in https://build.opensuse.org/request/show/920365 Factory / qemu
The upstream v2 submission fell through the cracks again, it seems. Trying once more. Perhaps an acked-by: of one of you guys might help...
Laurent has reviewed my -F patch now ... https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg05530.html
But FTR, the patch from comment 1 is not yet in factory (qemu-linux-user-6.1.0-34.1.x86_64).
(In reply to Martin Wilck from comment #15)
> But FTR, the patch from comment 1 is not yet in factory
> (qemu-linux-user-6.1.0-34.1.x86_64).
>
Mmm... indeed. And I'm not sure I understand why. Especially, I don't know why https://build.opensuse.org/request/show/920365 contains:

* Patches dropped:
  qemu-binfmt-conf.sh-allow-overriding-SUS.patch

So, it seems that the patch was there (I'm guessing added by Jose?) and was removed.

In any case, I am adding/reinstating it. It will appear here:
https://build.opensuse.org/package/show/home:dfaggioli:devel:Virtualization/qemu
and I'll submit to Factory after a quick test.
(In reply to Martin Wilck from comment #14)
> Laurent has reviewed my -F patch now ...
> https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg05530.html
>
Interesting. I have to look properly at the code, but the change mentioned here ("linux-user: manage binfmt-misc preserve-arg[0] flag") is already in 6.0.0 and 6.1.0.

I think that means we can adjust things in such a way that we then could drop both the old patch from Alex, and also your one from comment 1, at least for distros that have > 5.12 kernel (i.e., TW and 15.4).

What do you think?

While your -F patch will stay, until we ship a QEMU version that has it, I guess.
Thanks, I'm finally starting to understand. I have to say I only partially understood Laurent's response so far.

If Alex' patch is dropped, my patch from comment 1 almost certainly won't be necessary any more. I don't care about preserving argv[0]. All I'm interested in is not having to bind-mount a qemu executable into foreign-arch containers.

But if you drop Alex' patch, you may have to talk to some of the people who are interested in argv[0] preservation.
(In reply to Martin Wilck from comment #18)
> Thanks, I finally start to understand. I have to say I only partially
> understood Laurent's response so far.
>
Hey, so, sorry this is taking a while. As said in comment 16, I have a build with both your patches in.

I have installed qemu and qemu-linux-user from that repo, and I can see the patches there. I.e., from the changelog (obtained with `rpm -q --changelog qemu-linux-user`):

* Fri Dec 03 2021 Dario Faggioli <dfaggioli@suse.com>
* Patches added:
  qemu-binfmt-conf.sh-allow-overriding-SUS.patch
- Replace patch to fix hardcoded binfmt handler (bsc#1186256)
* Patches dropped:
  qemu-binfmt-conf.sh-allow-overriding-SUS.patch
* Patches added:
  qemu-binfmt-conf.sh-should-use-F-as-shor.patch

[1]

I also have manually checked the /usr/sbin/qemu-binfmt-conf.sh file that is installed on the system, and it looks correct (it has both patches applied).

Now I'm doing the following:

virt136:~ # qemu-binfmt-conf.sh -F ''
Setting /usr/bin/qemu-alpha as binfmt interpreter for alpha
Setting /usr/bin/qemu-arm as binfmt interpreter for arm
Setting /usr/bin/qemu-armeb as binfmt interpreter for armeb
...
...
...

Which results in:

virt136:~ # cat /proc/sys/fs/binfmt_misc/qemu-ppc64le
enabled
interpreter /usr/bin/qemu-ppc64le
flags: P
offset 0
magic 7f454c4602010100000000000000000002001500
mask ffffffffffffff00fffffffffffffffffeffff00

But I still see this:

virt136:~ # podman run -t --rm ppc64le/busybox uname -m
standard_init_linux.go:228: exec user process caused: no such file or directory

Do you happen to see what I might be doing wrong?

[1] I finally think I understand what happened... It seems like Jose had added the "override SUSE workaround patch" but then he misunderstood one of the comments and, instead of just adding the "fix -F" patch, he replaced the previously added "override SUSE workaround patch" with it. Well, that does not matter much now, but just FTR...
In fact, if I do just:

virt136:~ # qemu-binfmt-conf.sh

I.e., I don't take advantage of your patches, I then see this:

virt136:~ # cat /proc/sys/fs/binfmt_misc/qemu-ppc64le
enabled
interpreter /usr/bin/qemu-ppc64le-binfmt
flags: P
offset 0
magic 7f454c4602010100000000000000000002001500

And, consistently:

virt136:~ # podman run -t --rm ppc64le/busybox uname -m
standard_init_linux.go:228: exec user process caused: no such file or directory

virt136:~ # podman run -it --rm \
    -v /usr/bin/qemu-ppc64le-binfmt:/usr/bin/qemu-ppc64le-binfmt \
    -v /usr/bin/qemu-ppc64le:/usr/bin/qemu-ppc64le \
    ppc64le/busybox uname -m
ppc64le
Mmm... I also see this:

virt136:~ # ls /usr/bin/qemu-ppc64le* -l
-rwxr-xr-x 1 root root 3940664 Dec 6 14:35 /usr/bin/qemu-ppc64le
lrwxrwxrwx 1 root root 11 Dec 6 14:33 /usr/bin/qemu-ppc64le-binfmt -> qemu-binfmt

Not sure if/how it matters yet, I need to check...
What I want to achieve (being able to simply start a foreign-arch container without having to bind-mount anything from the native environment into it) only works with "fix binary" settings, where the statically linked interpreter binary is loaded into the kernel (--persistent flag of qemu-binfmt-conf.sh, "F" flag in the kernel).

So you need to run e.g.

qemu-binfmt-conf.sh --systemd s390x --persistent yes --qemu-suffix ""

to make this work. The result looks like this:

# cat /proc/sys/fs/binfmt_misc/qemu-s390x
enabled
interpreter /usr/bin/qemu-s390x
flags: PF
offset 0
magic 7f454c4602020100000000000000000000020016
mask ffffffffffffff00fffffffffffffffffffeffff

Hope this makes sense.

(In reply to Dario Faggioli from comment #21)
> Mmm... I also see this:
>
> virt136:~ # ls /usr/bin/qemu-ppc64le* -l
> -rwxr-xr-x 1 root root 3940664 Dec 6 14:35 /usr/bin/qemu-ppc64le
> lrwxrwxrwx 1 root root 11 Dec 6 14:33 /usr/bin/qemu-ppc64le-binfmt ->
> qemu-binfmt

This is the normal SUSE setup.
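As a quick way to tell the two setups apart, the "flags:" line of an entry can be checked for "F". The following is a small sketch under the assumption that the entry file has the "interpreter ..." and "flags: ..." lines shown in the listings in this bug; the function name binfmt_entry_summary is made up for this example and is not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: summarize a binfmt_misc entry. $1 is the entry file,
# normally /proc/sys/fs/binfmt_misc/qemu-<arch>. Prints the interpreter path
# and whether the "F" (fix binary) flag is set.
binfmt_entry_summary() {
    entry_file=$1
    # Pick out the "interpreter <path>" and "flags: <letters>" lines.
    interp=$(sed -n 's/^interpreter //p' "$entry_file")
    flags=$(sed -n 's/^flags: *//p' "$entry_file")
    case $flags in
        *F*) fix_binary=yes ;;
        *)   fix_binary=no ;;
    esac
    printf 'interpreter=%s fix_binary=%s\n' "$interp" "$fix_binary"
}
```

Called as "binfmt_entry_summary /proc/sys/fs/binfmt_misc/qemu-s390x", an entry with "flags: PF" would be reported with fix_binary=yes, while an entry with only "flags: P" would be reported with fix_binary=no. For running containers without bind mounts, the interpreter should also be the emulator itself and not the -binfmt wrapper.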
(In reply to Martin Wilck from comment #22)
> So you need to run e.g.
>
> qemu-binfmt-conf.sh --systemd s390x --persistent yes --qemu-suffix ""
>
Ah, right!

> to make this work. The result looks like this:
>
> # cat /proc/sys/fs/binfmt_misc/qemu-s390x
> enabled
> interpreter /usr/bin/qemu-s390x
> flags: PF
>
Indeed, I was missing one of the flags.

> offset 0
> magic 7f454c4602020100000000000000000000020016
> mask ffffffffffffff00fffffffffffffffffffeffff
>
> Hope this makes sense.
>
It does. Sorry again, but I have not much experience with qemu-binfmt-conf.sh. In fact, I used to set things up manually, and am only now getting familiar with the code.

Like you said, it works. SR coming!
SR 936373 (https://build.opensuse.org/request/show/936373) is in Factory now. It has both patches and, according to my tests, things work as wanted now, so I'm closing this.

Thanks for the patches and for the help reproducing and debugging this!
This is an autogenerated message for OBS integration: This bug (1186256) was mentioned in https://build.opensuse.org/request/show/1008827 Factory / qemu