Bug 1139800 - Cilium not working in Kubic
Status: CONFIRMED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Kubic
Version: Current
Hardware: All openSUSE Factory
Priority: P5 - None
Severity: Normal
Assigned To: Michał Rostecki
Reported: 2019-06-30 08:30 UTC by Jason Evans
Modified: 2019-10-22 05:26 UTC

Found By: Community User
Flags: guillaume.gardet: needinfo? (mrostecki)


Description Jason Evans 2019-06-30 08:30:03 UTC
I am testing this on the latest Snapshot20190627.

Init worked well:

>
master:~ # kubicctl init --pod-network cilium
Initializing kubernetes master can take several minutes, please be patient.
Initialize Kubernetes control-plane
Deploy cilium
Deploy Kubernetes Reboot Daemon (kured)
Kubernetes master was succesfully setup.
>

cilium-etcd-operator pod is still not coming up:

>
master:~ # kubectl get pods -n kube-system
NAME                                    READY   STATUS             RESTARTS   AGE
cilium-5fdrl                            0/1     PodInitializing    0          73s
cilium-etcd-operator-5f9468cf8c-6b5bn   0/1     ImagePullBackOff   0          73s
cilium-operator-cb87f5c57-d8qwz         0/1     Pending            0          73s
coredns-fb8b8dccf-c7f8n                 0/1     Pending            0          73s
coredns-fb8b8dccf-wl5rg                 0/1     Pending            0          73s
etcd-master                             1/1     Running            0          30s
kube-apiserver-master                   1/1     Running            0          40s
kube-controller-manager-master          1/1     Running            0          37s
kube-proxy-jgttd                        1/1     Running            0          73s
kube-scheduler-master                   1/1     Running            0          37s
>

Cannot pull the image from registry.opensuse.org/kubic/cilium-etcd-operator:2.0:


>
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  4m28s                 default-scheduler  Successfully assigned kube-system/cilium-etcd-operator-5f9468cf8c-6b5bn to master
  Normal   Pulling    103s (x4 over 4m25s)  kubelet, master    Pulling image "registry.opensuse.org/kubic/cilium-etcd-operator:2.0"
  Warning  Failed     103s (x4 over 4m11s)  kubelet, master    Failed to pull image "registry.opensuse.org/kubic/cilium-etcd-operator:2.0": rpc error: code = Unknown desc = Error reading manifest 2.0 in registry.opensuse.org/kubic/cilium-etcd-operator: name unknown
  Warning  Failed     103s (x4 over 4m11s)  kubelet, master    Error: ErrImagePull
  Normal   BackOff    89s (x6 over 4m10s)   kubelet, master    Back-off pulling image "registry.opensuse.org/kubic/cilium-etcd-operator:2.0"
  Warning  Failed     78s (x7 over 4m10s)   kubelet, master    Error: ImagePullBackOff
>
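A quick way to confirm whether the tag actually exists in the registry, assuming skopeo is available on the host, is to inspect the manifest directly without pulling any layers; a "name unknown" or "manifest unknown" error here matches the kubelet error above:

```shell
# Build the same image reference the kubelet is trying to pull.
IMAGE="registry.opensuse.org/kubic/cilium-etcd-operator"
TAG="2.0"
REF="docker://${IMAGE}:${TAG}"

# skopeo inspect fetches only the manifest; it fails with the same
# registry-side error the kubelet reports if the tag is missing.
skopeo inspect "${REF}" || echo "manifest ${REF} not available"
```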
Comment 1 Andreas Färber 2019-07-09 11:22:27 UTC
Same problem on aarch64 with snapshot 20190607.
Comment 2 Thorsten Kukuk 2019-07-22 11:28:39 UTC
I submitted the missing container image to Factory now; I hope that's all that's missing.
Comment 3 Michał Rostecki 2019-07-23 09:34:02 UTC
(In reply to Andreas Färber from comment #1)
> Same problem on aarch64 with snapshot 20190607.

That's because there are no Cilium packages or images for aarch64. I still need to fix the build of a few dependencies.
Comment 4 Guillaume GARDET 2019-07-23 12:35:39 UTC
cilium-proxy is unresolvable because jwt_verify_lib fails to build for aarch64.
Comment 5 Guillaume GARDET 2019-07-23 13:17:56 UTC
(In reply to Guillaume GARDET from comment #4)
> cilium-proxy is unresolvable because jwt_verify_lib fails to build for
> aarch64.

SR to fix build of jwt_verify_lib:
https://build.opensuse.org/request/show/717893
Comment 6 Jason Evans 2019-07-25 09:06:02 UTC
re: Thorsten
I don't know how long it takes for a container image in Factory to become available in the container registry, but so far we are having the same issue. This is from the latest build:


Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  113s                default-scheduler  Successfully assigned kube-system/cilium-etcd-operator-5585fb77d4-9dztf to master
  Normal   Pulling    31s (x3 over 112s)  kubelet, master    Pulling image "registry.opensuse.org/kubic/cilium-etcd-operator:2.0"
  Warning  Failed     30s (x3 over 111s)  kubelet, master    Failed to pull image "registry.opensuse.org/kubic/cilium-etcd-operator:2.0": rpc error: code = Unknown desc = Error reading manifest 2.0 in registry.opensuse.org/kubic/cilium-etcd-operator: name unknown
  Warning  Failed     30s (x3 over 111s)  kubelet, master    Error: ErrImagePull
  Normal   BackOff    7s (x4 over 110s)   kubelet, master    Back-off pulling image "registry.opensuse.org/kubic/cilium-etcd-operator:2.0"
  Warning  Failed     7s (x4 over 110s)   kubelet, master    Error: ImagePullBackOff
Comment 7 Guillaume GARDET 2019-07-25 10:22:20 UTC
cilium-proxy fails to build for all archs:
https://build.opensuse.org/package/show/devel:kubic/cilium-proxy

I tried older bazel, without success. The maintainer can probably have a look.
Comment 8 Michał Rostecki 2019-07-25 21:27:52 UTC
(In reply to Guillaume GARDET from comment #7)
> cilium-proxy fails to build for all archs:
> https://build.opensuse.org/package/show/devel:kubic/cilium-proxy
> 
> I tried older bazel, without success. The maintainer can probably have a
> look.

I will try to fix it by updating Envoy to the newest version, which depends on Bazel 0.25, though.
Comment 9 Thorsten Kukuk 2019-07-28 16:44:26 UTC
(In reply to Jason Evans from comment #6)
> re: Thorsten
> I don't know how long it takes for a container image in factory to be
> available in the container registry, but so far we are having the same
> issue. This is from the latest build:

You can check the registry yourself: enter "^kubic" in the search field, and all officially released images for Kubic are shown.

How long it takes with openSUSE to get an image released depends on which staging project the image ends up in; some are fast, others are really slow if there are problems with other packages.

But it should work now, all images are now released.
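For a non-interactive check, the published tags can also be queried through the standard Docker Registry v2 HTTP API (the `/v2/<name>/tags/list` endpoint path is from the registry spec; network access to registry.opensuse.org is assumed):

```shell
# List the tags published for the image via the registry's v2 API.
REPO="kubic/cilium-etcd-operator"
URL="https://registry.opensuse.org/v2/${REPO}/tags/list"
curl -sf "$URL" || echo "tags list for ${REPO} not reachable"
```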
Comment 10 Jason Evans 2019-08-01 08:47:43 UTC
While the image is now available, Cilium is still not working.

The main Cilium pod "cilium-xxxxx" is in a repeating CrashLoopBackOff:

Events:
  Type     Reason                  Age    From               Message
  ----     ------                  ----   ----               -------
  Normal   Scheduled               4m3s   default-scheduler  Successfully assigned kube-system/kured-2klhw to worker10
  Warning  FailedCreatePodSandBox  2m31s  kubelet, worker10  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kured-2klhw_kube-system_e65460cd-c824-40e2-872f-2b9a6e69ec42_0(fca46fe4ffc72e146befc14d19c036a1beed6fb8918f5e0dbaaed33319cd58c7): Unable to create endpoint: Put http:///var/run/cilium/cilium.sock/v1/endpoint/cilium-local:0: context deadline exceeded
  Warning  FailedCreatePodSandBox  55s    kubelet, worker10  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kured-2klhw_kube-system_e65460cd-c824-40e2-872f-2b9a6e69ec42_0(9745ea80f03bb52293b43733b87eade1e94f762e7aac75dbc34010033cd5c926): Unable to create endpoint: Put http:///var/run/cilium/cilium.sock/v1/endpoint/cilium-local:0: EOF

Also the kured-xxxxx pods are stuck in "ContainerCreating":

Events:
  Type     Reason                  Age    From               Message
  ----     ------                  ----   ----               -------
  Normal   Scheduled               5m8s   default-scheduler  Successfully assigned kube-system/kured-2klhw to worker10
  Warning  FailedCreatePodSandBox  3m36s  kubelet, worker10  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kured-2klhw_kube-system_e65460cd-c824-40e2-872f-2b9a6e69ec42_0(fca46fe4ffc72e146befc14d19c036a1beed6fb8918f5e0dbaaed33319cd58c7): Unable to create endpoint: Put http:///var/run/cilium/cilium.sock/v1/endpoint/cilium-local:0: context deadline exceeded
  Warning  FailedCreatePodSandBox  2m     kubelet, worker10  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kured-2klhw_kube-system_e65460cd-c824-40e2-872f-2b9a6e69ec42_0(9745ea80f03bb52293b43733b87eade1e94f762e7aac75dbc34010033cd5c926): Unable to create endpoint: Put http:///var/run/cilium/cilium.sock/v1/endpoint/cilium-local:0: EOF
  Warning  FailedCreatePodSandBox  14s    kubelet, worker10  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kured-2klhw_kube-system_e65460cd-c824-40e2-872f-2b9a6e69ec42_0(7f86af61f02880035e8d3915ded95cda717306724ad1a31d70c75387d6f01e21): Unable to create endpoint: Put http:///var/run/cilium/cilium.sock/v1/endpoint/cilium-local:0: context deadline exceeded
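The "Unable to create endpoint ... cilium.sock" errors mean the CNI plugin cannot reach the cilium-agent, so the agent logs and the socket on the node are the places to look. A sketch of how one might narrow this down (the `k8s-app=cilium` label and the socket path are assumptions based on the default Cilium DaemonSet manifest):

```shell
# Check the cilium agent pods and their recent logs (label assumed
# from the default Cilium DaemonSet manifest).
kubectl -n kube-system get pods -l k8s-app=cilium -o wide
kubectl -n kube-system logs -l k8s-app=cilium --tail=50

# On the affected node (worker10 here), verify the agent socket exists:
SOCK="/var/run/cilium/cilium.sock"
ls -l "$SOCK"
```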
Comment 11 Guillaume GARDET 2019-08-01 10:07:12 UTC
@Jason, cilium-proxy is still failing for all archs:
  https://build.opensuse.org/package/show/devel:kubic/cilium-proxy
So we probably need to get it to build properly first.

And kubic-cilium-image is still unresolvable for aarch64: 
  https://build.opensuse.org/package/show/openSUSE:Factory:ARM/kubic-cilium-image
Comment 12 Guillaume GARDET 2019-08-15 13:24:09 UTC
(In reply to Michał Rostecki from comment #8)
> (In reply to Guillaume GARDET from comment #7)
> > cilium-proxy fails to build for all archs:
> > https://build.opensuse.org/package/show/devel:kubic/cilium-proxy
> > 
> > I tried older bazel, without success. The maintainer can probably have a
> > look.
> 
> I will try to fix it by updating Envoy to the newest version. Which depends
> on Bazel 0.25 though.

Any progress on this update?