Bug 1170495 - clock_gettime64() fails with EPERM in a docker container
Status: NEW
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kernel
Importance: P5 - None, Normal
Assigned To: Aleksa Sarai
Reported: 2020-04-24 21:48 UTC by Michael Schröder
Modified: 2020-04-27 10:58 UTC (History)

Description Michael Schröder 2020-04-24 21:48:02 UTC
This is a somewhat weird bug. We got reports that container builds fail for Factory on i586 because zypper cannot find packages. After some debugging we found that mkostemp() did not work correctly in libzypp, leaving libzypp unable to refresh repositories.

After some more debugging I found that this happens when the vdso vsyscall falls back to a real clock_gettime64() syscall. This syscall fails with EPERM when running in a container:

clock_gettime64(CLOCK_MONOTONIC, 0xbffe46dc) = -1 EPERM (Operation not permitted)

I don't see why clock_gettime64() should be restricted in any way.

(It would also be interesting to know why the vdso call needs to do a syscall, but that's a different issue...)
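
The failing call can be probed directly, without going through the vdso. The following is a minimal sketch, not code from this report: it assumes the generic 32-bit Linux syscall number 403 for clock_gettime64 and a two-field 64-bit timespec layout, and the helper names are hypothetical. It is only meaningful when run as a 32-bit process; in a 64-bit process the number does not map to this syscall.

```python
import ctypes
import errno
import os

# Assumption: clock_gettime64 is syscall 403 on 32-bit Linux ABIs (e.g. i386).
SYS_CLOCK_GETTIME64 = 403
CLOCK_MONOTONIC = 1  # Linux value of the clock id


class KernelTimespec(ctypes.Structure):
    """64-bit seconds and nanoseconds, as used by the time64 syscalls."""
    _fields_ = [("tv_sec", ctypes.c_longlong), ("tv_nsec", ctypes.c_longlong)]


def explain_errno(err):
    """Turn the errno of the raw syscall into a short diagnosis."""
    if err == 0:
        return "ok"
    if err == errno.EPERM:
        return "EPERM: blocked, most likely by a seccomp filter"
    if err == errno.ENOSYS:
        return "ENOSYS: not implemented by this kernel or ABI"
    return f"{errno.errorcode.get(err, err)}: {os.strerror(err)}"


def probe_clock_gettime64():
    """Issue the raw syscall and return (errno, tv_sec, tv_nsec)."""
    libc = ctypes.CDLL(None, use_errno=True)
    ts = KernelTimespec()
    ret = libc.syscall(ctypes.c_long(SYS_CLOCK_GETTIME64),
                       ctypes.c_long(CLOCK_MONOTONIC),
                       ctypes.byref(ts))
    err = ctypes.get_errno() if ret == -1 else 0
    return err, ts.tv_sec, ts.tv_nsec
```

Calling `explain_errno(probe_clock_gettime64()[0])` inside and outside the container should show whether the EPERM comes from the container setup rather than from the kernel itself.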
Comment 1 Michael Schröder 2020-04-24 22:07:48 UTC
Hmm, Fabian just wrote me that it's probably just the clock_gettime64 syscall missing from docker's seccomp profile. So this is most likely not a kernel bug.
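
Whether a given profile allows the syscall can be checked offline. This is a rough sketch that assumes the JSON layout used by Docker's seccomp profiles (a `syscalls` list of rules with `names` and an `action`); the sample profile and the function name are made up for illustration, and conditional rules (argument filters, capability or architecture includes) are deliberately ignored.

```python
import json

# Hypothetical, heavily trimmed profile for illustration only.
SAMPLE_PROFILE = """
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["clock_gettime", "read", "write"], "action": "SCMP_ACT_ALLOW"}
  ]
}
"""


def allowed_syscalls(profile):
    """Collect syscall names that have an unconditional-looking ALLOW rule."""
    allowed = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") != "SCMP_ACT_ALLOW":
            continue
        # Newer profiles use "names" (a list); older ones used "name".
        allowed.update(rule.get("names", []))
        if "name" in rule:
            allowed.add(rule["name"])
    return allowed


profile = json.loads(SAMPLE_PROFILE)
allowed = allowed_syscalls(profile)
print("clock_gettime allowed:  ", "clock_gettime" in allowed)
print("clock_gettime64 allowed:", "clock_gettime64" in allowed)
```

With a default action of SCMP_ACT_ERRNO, any syscall missing from the allow list fails with an error instead of running, which would match the EPERM seen in the trace.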

You may want to investigate what's wrong with the vdso call, though. This is independent of docker: changing the clocksource from tsc to hpet makes the vdso code do a syscall.

This was observed with a kernel-pae-5.6.4-1.2.i686 guest running on a kernel-default-4.12.14-lp151.28.48.1.x86_64 host.
Comment 2 Takashi Iwai 2020-04-25 07:05:47 UTC
(In reply to Michael Schröder from comment #1)
> Hmm, Fabian just wrote me that it's probably just the clock_gettime64
> syscall missing from docker's seccomp profile. So this is most likely not a
> kernel bug.

See bug 1163766 as another example.
Comment 3 Michal Suchanek 2020-04-25 09:35:10 UTC
To be very specific here:

Many seccomp profiles only allow specific syscalls.

However, to support post-Y2038 dates, large files, and so on, the 64-bit variant of a syscall needs to be called on a 32-bit system.

Profiles that allow a 32bit syscall but not its 64bit variant are broken.

With support for post-Y2038 dates, many 32-bit applications start calling clock_gettime64 instead of clock_gettime, which exposes this issue.
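
A profile can be audited for this kind of breakage by comparing its allow list against known 32-bit/64-bit pairs. The sketch below covers only a partial list of the time64 syscalls and does not look at the large-file variants; the table and the function name are illustrative assumptions, not part of any existing tool.

```python
# Partial mapping of legacy time syscalls to their time64 counterparts
# on 32-bit Linux. Not exhaustive.
TIME64_VARIANTS = {
    "clock_gettime": "clock_gettime64",
    "clock_settime": "clock_settime64",
    "clock_getres": "clock_getres_time64",
    "clock_nanosleep": "clock_nanosleep_time64",
    "timer_gettime": "timer_gettime64",
    "timer_settime": "timer_settime64",
    "utimensat": "utimensat_time64",
    "ppoll": "ppoll_time64",
    "futex": "futex_time64",
}


def missing_time64_variants(allowed):
    """Return {legacy: time64} for pairs where only the legacy call is allowed."""
    return {
        legacy: modern
        for legacy, modern in TIME64_VARIANTS.items()
        if legacy in allowed and modern not in allowed
    }


# Example allow list: futex is covered in both forms, clock_gettime is not.
example_allowed = {"read", "clock_gettime", "futex", "futex_time64"}
for legacy, modern in sorted(missing_time64_variants(example_allowed).items()):
    print(f"{legacy} is allowed but {modern} is not")
```

An empty result for a profile's full allow list would mean that none of the listed pairs is affected.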
Comment 4 Michael Schröder 2020-04-27 08:46:02 UTC
Note that glibc implements clock_gettime() with a clock_gettime64() call on 32bit, so this is not just some applications...

Anyway, the big question is why the syscall is needed at all and it's not just the vdso code reading the values from memory.
Comment 5 Fabian Vogt 2020-04-27 10:20:42 UTC
Reassigning to docker maintainers, as that's the root cause.

(In reply to Michael Schröder from comment #4)
> Note that glibc implements clock_gettime() with a clock_gettime64() call on
> 32bit, so this is not just some applications...

Unlike the statx case, here it's actually feasible to work around this in glibc (if desirable) by just treating EPERM like ENOSYS. Adding glibc maintainer to CC.
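
The workaround idea can be sketched as follows. This is not glibc's code, only an illustration of the control flow: the two callables stand in for the time64 syscall and the legacy 32-bit syscall, and all names are hypothetical.

```python
import errno


def gettime_with_fallback(time64_call, time32_call, treat_eperm_as_enosys=True):
    """Try the time64 call first and fall back to the legacy call.

    Both arguments are callables that return a value or raise OSError.
    ENOSYS always triggers the fallback; EPERM only when requested.
    """
    fallback_errnos = {errno.ENOSYS}
    if treat_eperm_as_enosys:
        fallback_errnos.add(errno.EPERM)
    try:
        return time64_call()
    except OSError as exc:
        if exc.errno not in fallback_errnos:
            raise
    return time32_call()


def blocked_time64():
    # Simulates a seccomp filter rejecting the syscall.
    raise OSError(errno.EPERM, "Operation not permitted")


def legacy_time32():
    # Simulated result of the legacy syscall: (seconds, nanoseconds).
    return (1587764882, 0)


print(gettime_with_fallback(blocked_time64, legacy_time32))
```

One trade-off of this approach is that a genuine permission error from the time64 call would be masked by the fallback.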

> Anyway, the big question is why the syscall is needed at all and it's not
> just the vdso code reading the values from memory.

Some clocksources don't have vdso support. I see the same here with kvm-clock, but after switching to tsc vdso works again. So this is presumably expected.
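
To see which clocksource a system is using, the kernel's sysfs files can be read. A small sketch, assuming the usual sysfs location; the set of clocksources flagged below only records what was reported in this bug for this 32-bit kernel and is not a general statement about vdso support.

```python
from pathlib import Path

# Usual sysfs location of the clocksource controls on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")

# Clocksources for which commenters in this bug saw the vdso fall back
# to a real syscall. Illustrative only.
REPORTED_IN_THIS_BUG = {"hpet", "kvm-clock"}


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names from sysfs."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def fallback_reported(clocksource):
    """True if this bug reported a syscall fallback for the clocksource."""
    return clocksource in REPORTED_IN_THIS_BUG
```

On a real system, `read_clocksources()` with no argument reads the live values; passing a different directory makes the helper easy to exercise without touching sysfs.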
Comment 6 Michal Suchanek 2020-04-27 10:58:51 UTC
Also, some containers don't have vdso support at all.

IIRC you need the vdso binaries that the kernel package provides in /lib/modules, and that package is typically not installed in containers.