Bug 1084766 - kubernetes-kubelet doesn't create /var/lib/kubelet properly in transactional-update
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Containers
Version: Current
Hardware: Other Other
Priority: P2 - High
Severity: Normal
Assigned To: Maximilian Meister
E-mail List
obs:running:10751:important
: Fix_No_Build
Reported: 2018-03-09 21:26 UTC by Richard Brown
Modified: 2021-05-17 12:55 UTC (History)
Description Richard Brown 2018-03-09 21:26:51 UTC
kubernetes-kubelet doesn't create /var/lib/kubelet properly in transactional-update

This means kubelet cannot be installed on Kubic unless it is done as part of the system's initial installation.

As a result, users cannot move from MicroOS to a functioning Kubernetes cluster.

This should be another good use case for a tmpfiles.d workaround.
Comment 1 Valentin Rothberg 2018-05-08 12:22:53 UTC
It seems that this error has been promoted to a fatal one in Kubernetes 1.10. I cannot get a 1.10.2 kubelet running at the moment.
Comment 2 Flavio Castelli 2018-05-08 12:54:30 UTC
@Richard: can you fix that inside of MicroOS?
Comment 3 Richard Brown 2018-05-08 13:08:09 UTC
(In reply to Flavio Castelli from comment #2)
> @Richard: can you fix that inside of MicroOS?

No - MicroOS cannot pre-emptively make a directory for a package that it doesn't have installed

It should be fixed in the package, and I probably shouldn't be the one fixing the package.
Comment 4 Richard Brown 2018-05-08 13:09:21 UTC
Note: kubelet 1.10 also tries to create the directory /usr/libexec/kubernetes/kubelet-plugins at runtime. This directory should be created by the package as well.
Comment 5 Valentin Rothberg 2018-05-08 13:27:14 UTC
(In reply to Richard Brown from comment #4)
> Note - kubelet 1.10 also wants to make a folder in
> /usr/libexec/kubernetes/kubelet-plugins which it wants to make at runtime -
> this should also be fixed to be made by the package

/usr/libexec/ doesn't exist on openSUSE/SLE, as the RPM macro defaults to /usr/lib/. Is there a configuration option to alter the plugins path? CRI-O and Podman have such switches, which makes them easier to adopt on more distros.
Comment 6 Richard Brown 2018-05-08 14:52:14 UTC
(In reply to Valentin Rothberg from comment #5)
> (In reply to Richard Brown from comment #4)
> > Note - kubelet 1.10 also wants to make a folder in
> > /usr/libexec/kubernetes/kubelet-plugins which it wants to make at runtime -
> > this should also be fixed to be made by the package
> 
> /usr/libexec/ doesn't exist on openSUSE/SLE as the RPM macro defaults to
> /usr/lib/. Is there a configuration option to alter the plugins path? CRI-O
> and Podman have such switches, which makes it easier to be adopted on more
> distros.

--volume-plugin-dir looks like the runtime switch. I guess this could be put into our KUBELET_ARGS in /etc/kubernetes/kubelet?
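
A minimal sketch of that idea, assuming /etc/kubernetes/kubelet is a shell-style file containing a line of the form KUBELET_ARGS="...". The helper name add_volume_plugin_dir and the target directory /usr/lib/kubernetes/kubelet-plugins are illustrative assumptions, not something the package is known to do; only the --volume-plugin-dir switch itself comes from the discussion above.

```shell
# Hypothetical helper: prepend --volume-plugin-dir to KUBELET_ARGS in a
# shell-style config file, unless the flag is already present.
add_volume_plugin_dir() {
  conf_file="$1"    # e.g. /etc/kubernetes/kubelet
  plugin_dir="$2"   # assumed path, e.g. /usr/lib/kubernetes/kubelet-plugins

  # Do nothing if the flag is already set.
  if grep -q -e '--volume-plugin-dir' "$conf_file"; then
    return 0
  fi

  tmp_file="$(mktemp)" || return 1
  # Insert the flag right after the opening quote of KUBELET_ARGS="...".
  # The result is written back with cat so the file keeps its permissions.
  sed "s|^KUBELET_ARGS=\"|KUBELET_ARGS=\"--volume-plugin-dir=${plugin_dir} |" \
    "$conf_file" > "$tmp_file" && cat "$tmp_file" > "$conf_file"
  status=$?
  rm -f "$tmp_file"
  return "$status"
}
```

It could be called as: add_volume_plugin_dir /etc/kubernetes/kubelet /usr/lib/kubernetes/kubelet-plugins. Calling it a second time leaves the file unchanged.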
Comment 7 Flavio Castelli 2018-05-09 14:49:30 UTC
I'm against changing the default locations. This makes things harder for people jumping on Kubic/CaaSP because they don't find what they are looking for in the expected locations.
Comment 8 Richard Brown 2018-05-09 15:13:23 UTC
(In reply to Flavio Castelli from comment #7)
> I'm against changing the default locations. This makes things harder for
> people jumping on Kubic/CaaSP because they don't find what they are looking
> for into the expected locations.

I get where you're coming from but creating /usr/libexec would be madness given /usr/libexec is not common, was forbidden in many versions of the FHS, is not a valid path in any *SUSE distribution...and is generally not 'normal' behaviour

In this case we almost certainly should be using /usr/lib, instead of creating an entirely new structure under /usr just for k8s
Comment 9 Flavio Castelli 2018-05-09 16:11:29 UTC
(In reply to Richard Brown from comment #8)
> I get where you're coming from but creating /usr/libexec would be madness
> given /usr/libexec is not common, was forbidden in many versions of the FHS,
> is not a valid path in any *SUSE distribution...and is generally not
> 'normal' behaviour
> 
> In this case we almost certainly should be using /usr/lib, instead of
> creating an entirely new structure under /usr just for k8s

I see, fine with me then.
Comment 10 Maximilian Meister 2018-05-29 07:22:32 UTC
will fix this as part of the 1.10 update
Comment 11 Maximilian Meister 2018-05-29 09:25:03 UTC
I've now tested 1.10.2 on Kubic, and as expected this error prevents kubelet from starting:

kubelet.go:1354] Failed to start ContainerManager failed to get rootfs info: failed to get device for dir "/var/lib/kubelet": could not find device with major: 0, minor: 46 in cached partitions map

"cat /proc/self/mountinfo" doesn't show anything related to that directory, but it exists on the host.

So what exactly should we do to fix this?
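
One way to repeat this check by hand is to look up the device number from the error message in the mount table. The following is a minimal sketch, not a fix: the helper name find_mount_by_dev is hypothetical, and it relies only on the documented mountinfo layout, where the third field is major:minor and the fifth field is the mount point.

```shell
# Hypothetical helper: print the mount point(s) whose device number
# (major:minor) matches the given value.
find_mount_by_dev() {
  dev="$1"                                # e.g. 0:46
  mountinfo="${2:-/proc/self/mountinfo}"  # can be overridden for testing
  # Field 3 is major:minor, field 5 is the mount point.
  awk -v dev="$dev" '$3 == dev { print $5 }' "$mountinfo"
}
```

Running find_mount_by_dev 0:46 prints nothing when no mount entry carries that device number, which would be consistent with the "could not find device" message above.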
Comment 12 Flavio Castelli 2018-05-29 09:31:23 UTC
We would have to create the subvolume during package installation. We are doing that inside of the kubelet package to address CNI's requirements.

Take a look at line 335 of 
https://build.suse.de/package/view_file/Devel:CASP:Head:ControllerNode/kubernetes/kubernetes.spec?expand=1

%post kubelet
%service_add_post kubelet.service
%if 0%{?suse_version} < 1500
# create some subvolumes needed by CNI
if [ ! -e /var/lib/cni ]; then
  if [ "`findmnt -o FSTYPE -l  /|grep -v FSTYPE`" = "btrfs" ]; then
    /usr/sbin/mksubvolume /var/lib/cni
  fi
fi
%endif
Comment 13 Thorsten Kukuk 2018-05-29 12:54:54 UTC

(In reply to Flavio Castelli from comment #12)
> We would have to create the subvolume during package installation. We are
> doing that inside of the kubelet package to address CNI's requirements.

We don't need to create a subvolume. If customers do a new installation,
/var is a subvolume. If customers update, /var/lib/kubelet exists already.

You want to create a systemd-tmpfiles configuration for kubelet, like we have
with kubernetes.tmp.conf for /var/lib/cni.
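
A sketch of what such a tmpfiles.d entry could look like. The file name kubelet.conf, the mode 0755, and root ownership are assumptions for illustration; the "d" line type, which creates a directory if it does not exist yet, is standard tmpfiles.d syntax. The hypothetical helper below only writes the fragment into a directory of your choice and is not the packaged file.

```shell
# Hypothetical helper: write a tmpfiles.d fragment that asks
# systemd-tmpfiles to create /var/lib/kubelet if it is missing.
write_kubelet_tmpfiles_conf() {
  conf_dir="$1"   # e.g. /usr/lib/tmpfiles.d
  mkdir -p "$conf_dir" || return 1
  cat > "$conf_dir/kubelet.conf" <<'EOF'
# Type Path             Mode User Group Age
d      /var/lib/kubelet 0755 root root  -
EOF
}
```

Once such a file is installed, "systemd-tmpfiles --create /usr/lib/tmpfiles.d/kubelet.conf" applies it immediately, and the entry is also processed on every boot, so the directory is created on the running system rather than shipped in the package payload.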
Comment 14 Maximilian Meister 2018-05-30 04:48:54 UTC
(In reply to Thorsten Kukuk from comment #13)
> 
> (In reply to Flavio Castelli from comment #12)
> > We would have to create the subvolume during package installation. We are
> > doing that inside of the kubelet package to address CNI's requirements.
> 
> We don't need to create a subvolume. If customers do a new installation,
> /var is a subvolume. If customers update, /var/lib/kubelet exist already.
> 
> You want to create a systemd-tmpfiles for kubelet like we have
> with kubernetes.tmp.conf for /var/lib/cni.

I created an image [0] with the changes from this request [1], but unfortunately that still doesn't fix the error. Have I missed something besides adding the entry to that file?

[0] https://download.opensuse.org/repositories/home:/m_meister:/branches:/devel:/CaaSP:/images/images/openSUSE-Tumbleweed-Kubic.x86_64-15.0-CaaSP-Stack-hardware-x86_64-Build5.9.qcow2
[1] https://build.opensuse.org/request/show/612897
Comment 15 Thorsten Kukuk 2018-05-30 04:56:27 UTC
(In reply to Maximilian Meister from comment #14)

> created an image [0] with the changes from this request [1], but
> unfortunately that's still not fixing the error, have i missed something
> besides adding to that file?

The fix looks correct. What exactly is the error you are seeing?
(Sorry, I cannot download the image here and will only be back in the office next week.)
Comment 16 Maximilian Meister 2018-05-30 04:59:31 UTC
(In reply to Thorsten Kukuk from comment #15)
> (In reply to Maximilian Meister from comment #14)
> 
> > created an image [0] with the changes from this request [1], but
> > unfortunately that's still not fixing the error, have i missed something
> > besides adding to that file?
> 
> The fix looks correct. What exactly is the error you are seeing?
> (Sorry, cannot download here the image and will be only back in the office
> next week).

May 30 04:56:24 admin hyperkube[2766]: F0530 04:56:24.779150    2766 kubelet.go:1354] Failed to start ContainerManager failed to get rootfs info: failed to get device for dir "/var/lib/kubelet": could not find device with major: 0, minor: 46 in cached partitions map
Comment 17 Thorsten Kukuk 2018-05-30 05:08:06 UTC
> May 30 04:56:24 admin hyperkube[2766]: F0530 04:56:24.779150    2766
> kubelet.go:1354] Failed to start ContainerManager failed to get rootfs info:
> failed to get device for dir "/var/lib/kubelet": could not find device with
> major: 0, minor: 46 in cached partitions map

But does /var/lib/kubelet exist? If yes, then this bug is solved and that's another, new problem.
Comment 18 Maximilian Meister 2018-05-30 05:27:16 UTC
(In reply to Thorsten Kukuk from comment #17)
>
> But /var/lib/kubelet exist? If yes, then this bug is solved and that's
> another, new problem.

yes it exists:

admin:~ # ls -la /var/lib/kubelet
total 4
drwxr-xr-x 1 root root 124 May 30 04:42 .
drwxr-xr-x 1 root root 322 May 30 04:42 ..
-rw-r--r-- 1 root root  40 May 30 04:42 cpu_manager_state
drwxr-xr-x 1 root root   0 May 30 04:42 device-plugins
drwxr-xr-x 1 root root  44 May 30 04:42 pki
drwx------ 1 root root   0 May 30 04:42 plugin-containers
drwxr-x--- 1 root root   0 May 30 04:42 plugins
drwxr-x--- 1 root root   0 May 30 04:42 pods

we can either change the title or open a new bug then
Comment 19 Maximilian Meister 2018-05-30 05:30:13 UTC
Maybe this makes a difference: I used the Stack-hardware image, which comes with the additional 9p kernel modules and the whole stack (k8s/docker) already installed. I just saw that the bug is filed for plain Tumbleweed and not the Kubic component. Did I use the wrong image?
Comment 20 Maximilian Meister 2018-05-30 05:46:39 UTC
> kubelet.go:1354] Failed to start ContainerManager failed to get rootfs info:
> failed to get device for dir "/var/lib/kubelet": could not find device with
> major: 0, minor: 46 in cached partitions map

Flavio, could this possibly be a regression of https://github.com/google/cadvisor/pull/1668 ?
Comment 21 Thorsten Kukuk 2018-05-30 05:51:04 UTC
(In reply to Maximilian Meister from comment #18)

> we can either change the title or open a new bug then

Please track it in a new bug; it's always confusing to track two different things in the same bug.
Comment 22 Maximilian Meister 2018-05-30 06:09:25 UTC
(In reply to Thorsten Kukuk from comment #21)
> 
> Please track in a new bug, it's always confusing to track to different
> things in the same bug.

moving discussion to https://bugzilla.suse.com/show_bug.cgi?id=1095131

@flavio please comment there
Comment 23 Maximilian Meister 2018-06-18 14:22:00 UTC
fixed as part of https://build.opensuse.org/request/show/617520
Comment 24 Swamp Workflow Management 2018-07-20 15:40:48 UTC
This is an autogenerated message for OBS integration:
This bug (1084766) was mentioned in
https://build.opensuse.org/request/show/624291 Backports:SLE-12-SP3 / kubectl
Comment 25 Swamp Workflow Management 2018-08-03 18:10:47 UTC
This is an autogenerated message for OBS integration:
This bug (1084766) was mentioned in
https://build.opensuse.org/request/show/627379 15.0 / kubectl
Comment 28 Swamp Workflow Management 2018-08-20 16:08:44 UTC
SUSE-RU-2018:2455-1: An update that has two recommended fixes can now be installed.

Category: recommended (moderate)
Bug References: 1065972,1084766
CVE References: 
Sources used:
SUSE CaaS Platform 3.0 (src):    kubernetes-1.9.8-4.3.1, kubernetes-salt-3.0.0+git_r847_a28e485-3.6.1
Comment 29 Maximilian Meister 2018-08-27 11:19:00 UTC
It seems this has been fixed for a while now via https://build.opensuse.org/request/show/617520
Comment 30 Swamp Workflow Management 2018-12-17 15:41:41 UTC
This is an autogenerated message for OBS integration:
This bug (1084766) was mentioned in
https://build.opensuse.org/request/show/658922 15.0 / kubectl
Comment 31 Swamp Workflow Management 2018-12-18 11:41:10 UTC
This is an autogenerated message for OBS integration:
This bug (1084766) was mentioned in
https://build.opensuse.org/request/show/659074 15.0 / kubectl
Comment 32 Swamp Workflow Management 2019-07-11 19:20:52 UTC
This is an autogenerated message for OBS integration:
This bug (1084766) was mentioned in
https://build.opensuse.org/request/show/714707 15.1 / kubernetes
Comment 33 Swamp Workflow Management 2019-07-12 00:20:46 UTC
This is an autogenerated message for OBS integration:
This bug (1084766) was mentioned in
https://build.opensuse.org/request/show/714723 15.1 / kubernetes
Comment 35 Swamp Workflow Management 2020-04-26 19:15:18 UTC
openSUSE-SU-2020:0554-1: An update that solves 7 vulnerabilities and has 22 fixes is now available.

Category: security (important)
Bug References: 1039663,1042383,1042387,1057277,1059207,1061027,1065972,1069469,1084765,1084766,1085009,1086185,1086412,1095131,1095154,1096773,1097473,1100838,1101010,1104598,1104821,1112980,1118897,1118898,1136403,1144065,1155323,1161056,1161179
CVE References: CVE-2016-5195,CVE-2016-8859,CVE-2017-1002101,CVE-2018-1002105,CVE-2018-16873,CVE-2018-16874,CVE-2019-10214
Sources used:
openSUSE Leap 15.1 (src):    cri-o-1.17.1-lp151.2.2, cri-tools-1.18.0-lp151.2.1, go1.14-1.14-lp151.6.1, kubernetes-1.18.0-lp151.5.1