Kafka on Minikube: Back-off restarting failed container - docker

I need to set up Kafka and Cassandra in Minikube.
Host OS is Ubuntu 16.04
$ uname -a
Linux minikuber 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Minikube started normally:
$ minikube start
Starting local Kubernetes v1.8.0 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.
Services list:
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 1d
Zookeeper and Cassandra are running, but Kafka keeps crashing with a "CrashLoopBackOff" error:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
zookeeper-775db4cd8-lpl95 1/1 Running 0 1h
cassandra-d84d697b8-p5wcs 1/1 Running 0 1h
kafka-6d889c567-w5n4s 0/1 CrashLoopBackOff 25 1h
View logs:
kubectl logs kafka-6d889c567-w5n4s -p
Output:
waiting for kafka to be ready
...
INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
INFO Waiting for keeper state SyncConnected (org.I0Itec.zkclient.ZkClient)
WARN Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
WARN Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
...
INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn)
FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server '' with timeout of 6000 ms
...
INFO shutting down (kafka.server.KafkaServer)
INFO shut down completed (kafka.server.KafkaServer)
FATAL Exiting Kafka. (kafka.server.KafkaServerStartable)
Can anyone help me figure out how to solve the problem of the container restarting?
kubectl describe pod kafka-6d889c567-w5n4s
Output describe:
Name: kafka-6d889c567-w5n4s
Namespace: default
Node: minikube/192.168.99.100
Start Time: Thu, 23 Nov 2017 17:03:20 +0300
Labels: pod-template-hash=284457123
run=kafka
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"kafka-6d889c567","uid":"0fa94c8d-d057-11e7-ad48-080027a5dfed","a...
Status: Running
IP: 172.17.0.5
Created By: ReplicaSet/kafka-6d889c567
Controlled By: ReplicaSet/kafka-6d889c567
Info about Containers:
Containers:
kafka:
Container ID: docker://7ed3de8ef2e3e665ba693186f5125c6802283e1fabca8f3c85eb584f8de19526
Image: wurstmeister/kafka
Image ID: docker-pullable://wurstmeister/kafka@sha256:2aa183fd201d693e24d4d5d483b081fc2c62c198a7acb8484838328c83542c96
Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 27 Nov 2017 09:43:39 +0300
Finished: Mon, 27 Nov 2017 09:43:49 +0300
Ready: False
Restart Count: 1003
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-bnz99 (ro)
Info about Conditions:
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Info about volumes:
Volumes:
default-token-bnz99:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-bnz99
Optional: false
QoS Class: BestEffort
Info about events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulling 38m (x699 over 2d) kubelet, minikube pulling image "wurstmeister/kafka"
Warning BackOff 18m (x16075 over 2d) kubelet, minikube Back-off restarting failed container
Warning FailedSync 3m (x16140 over 2d) kubelet, minikube Error syncing pod
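The log shows Kafka trying to reach Zookeeper at localhost:2181 inside its own pod, so I suspect the deployment needs the Zookeeper connection configured explicitly. Something like the following sketch is what I have in mind (the Service names zookeeper and kafka are my assumptions; the variable names are the ones documented for the wurstmeister/kafka image):
apiVersion: apps/v1  # apps/v1beta2 on the Kubernetes v1.8.0 cluster shown above
kind: Deployment
metadata:
  name: kafka
spec:
  replicas: 1
  selector:
    matchLabels:
      run: kafka
  template:
    metadata:
      labels:
        run: kafka
    spec:
      containers:
      - name: kafka
        image: wurstmeister/kafka
        ports:
        - containerPort: 9092
        env:
        # Point Kafka at the Zookeeper Service instead of localhost.
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: zookeeper:2181
        # Hostname advertised to clients; assumes a Service named "kafka".
        - name: KAFKA_ADVERTISED_HOST_NAME
          value: kafka
        - name: KAFKA_ADVERTISED_PORT
          value: "9092"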

Related

Kubectl create deployment fails to pull image from local docker repo connection refused

I am encountering a very basic error:
I have Docker Desktop and Minikube set up on my Windows 10 machine.
Further, I set up a local Docker registry using the steps here.
Here is what I have when I run docker ps:
c:\>docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6675d2c57a74 registry:2 "/entrypoint.sh /etc…" 7 hours ago Up 2 hours 0.0.0.0:5000->5000/tcp registry
d05edc8f05b0 gcr.io/k8s-minikube/kicbase:v0.0.10 "/usr/local/bin/entr…" 8 hours ago Up 8 hours 127.0.0.1:32771->22/tcp, 127.0.0.1:32770->2376/tcp, 127.0.0.1:32769->5000/tcp, 127.0.0.1:32768->8443/tcp minikube
I pushed my local image to my local registry using docker push, and I deleted the local image using the docker image remove command to avoid any confusion.
I then tried pulling the image from the local registry to see if it works, and it does:
docker pull localhost:5000/dev/my-web:v1
v1: Pulling from dev/my-web
Digest: sha256:b3a0cf5c66ade8d39709c0cbbd0e08c9cc5f5e1c97f039a2bd1afed657dc8b74
Status: Downloaded newer image for localhost:5000/dev/my-web:v1
localhost:5000/dev/my-web:v1
Now I run my kubectl create commands and they fail with a connection refused error:
C:\>kubectl create deployment myweb --image=localhost:5000/dev/my-web:v1
deployment.apps/myweb created
C:\>kubectl get pods
NAME READY STATUS RESTARTS AGE
myweb-7d467f4bc4-r9xhc 0/1 ErrImagePull 0 12s
C:\>kubectl describe pod/myweb-7d467f4bc4-r9xhc
Name: myweb-7d467f4bc4-r9xhc
Namespace: default
Priority: 0
Node: minikube/172.17.0.3
Start Time: Sun, 09 Aug 2020 23:30:07 -0400
Labels: app=myweb
pod-template-hash=7d467f4bc4
Annotations: <none>
Status: Pending
IP: 172.18.0.3
IPs:
IP: 172.18.0.3
Controlled By: ReplicaSet/myweb-7d467f4bc4
Containers:
my-web:
Container ID:
Image: localhost:5000/dev/my-web:v1
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-nr7vj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-nr7vj:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-nr7vj
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 93s default-scheduler Successfully assigned default/myweb-7d467f4bc4-r9xhc to minikube
Normal Pulling 53s (x3 over 92s) kubelet, minikube Pulling image "localhost:5000/dev/my-web:v1"
Warning Failed 53s (x3 over 92s) kubelet, minikube Failed to pull image "localhost:5000/dev/my-web:v1": rpc error: code = Unknown desc = Error response from daemon: Get http://localhost:5000/v2/: dial tcp 127.0.0.1:5000: connect: connection refused
Warning Failed 53s (x3 over 92s) kubelet, minikube Error: ErrImagePull
Normal BackOff 13s (x5 over 91s) kubelet, minikube Back-off pulling image "localhost:5000/dev/my-web:v1"
Warning Failed 13s (x5 over 91s) kubelet, minikube Error: ImagePullBackOff
I am not able to figure out why I am getting a connection refused error when pulling through kubectl while the docker pull command works.
Please help.
Additional notes: (1) I am using the built-in Windows hypervisor (2) I am using the default networking
Well, Kubernetes cannot find your registry. It depends on your setup, but Docker for Windows runs in a VM, and you didn't specify how you are running Minikube, but most likely in another VM. So potentially you have two VMs here that may or may not be able to talk to each other, depending on how you set them up.
And localhost is almost never going to work with Kubernetes because that always resolves to the local IP of your Kubernetes node when it comes to pulling the image. That means that you would have to have your registry and the kubelet pulling the image on the exact same VM.
I would just focus on making it work with the IP address of the VM where your registry is running, and make sure that your Kubernetes VM can reach the registry VM's IP address. Also, remember that when you run your registry you also have to expose your container's port, in this case 5000.
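For example, here is a sketch of a Deployment that references the registry by a routable address instead of localhost (the 192.168.1.10 address is a placeholder for wherever your registry is actually reachable from the Minikube VM):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myweb
  template:
    metadata:
      labels:
        app: myweb
    spec:
      containers:
      - name: my-web
        # An address the kubelet inside the Minikube VM can reach,
        # not localhost (which resolves to the node itself).
        image: 192.168.1.10:5000/dev/my-web:v1
        ports:
        - containerPort: 80
Keep in mind that a plain-HTTP registry on a non-localhost address also has to be allowed as an insecure registry by the Docker daemon inside the Minikube VM.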
P.S. You didn't specify which hypervisor you are running. VirtualBox? VMware? Are you using bridged networking or host-only?

Helm kubernetes on AKS Pod CrashLoopBackOff

I'm trying to deploy a simple Node.js application via Helm to Azure Kubernetes Service, but after pulling my image the pod goes into CrashLoopBackOff.
Here's what I have tried so far:
My Dockerfile:
FROM node:6
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 32000
CMD [ "npm", "start" ]
My server.js:
'use strict';
const express = require('express');
const PORT = 32000;
const HOST = '0.0.0.0';
const app = express();
app.get('/', (req, res) => {
res.send('Hello world from container.\n');
});
app.listen(PORT, HOST);
console.log(`Running on http://${HOST}:${PORT}`);
I have pushed this image to ACR.
New Update: Here's the complete output of kubectl describe pod POD_NAME:
Name: myrel02-mychart06-5dc9d4b86c-kqg4n
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: aks-nodepool1-19665249-0/10.240.0.6
Start Time: Tue, 05 Feb 2019 11:31:27 +0500
Labels: app.kubernetes.io/instance=myrel02
app.kubernetes.io/name=mychart06
pod-template-hash=5dc9d4b86c
Annotations: <none>
Status: Running
IP: 10.244.2.5
Controlled By: ReplicaSet/myrel02-mychart06-5dc9d4b86c
Containers:
mychart06:
Container ID: docker://c239a2b9c38974098bbb1646a272504edd2d199afa50f61d02a0ce335fe60660
Image: registry-1.docker.io/arycloud/docker-web-app:0.5
Image ID: docker-pullable://registry-1.docker.io/arycloud/docker-web-app@sha256:4faab280d161b727e0a6a6d9dfb52b22cf9c6cd7dd07916d6fe164d9af5737a7
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 05 Feb 2019 11:39:56 +0500
Finished: Tue, 05 Feb 2019 11:40:22 +0500
Ready: False
Restart Count: 7
Liveness: http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
KUBERNETES_PORT_443_TCP_ADDR: cluster06-ary-2a187a-dc393b82.hcp.centralus.azmk8s.io
KUBERNETES_PORT: tcp://cluster06-ary-2a187a-dc393b82.hcp.centralus.azmk8s.io:443
KUBERNETES_PORT_443_TCP: tcp://cluster06-ary-2a187a-dc393b82.hcp.centralus.azmk8s.io:443
KUBERNETES_SERVICE_HOST: cluster06-ary-2a187a-dc393b82.hcp.centralus.azmk8s.io
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-gm49w (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-gm49w:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-gm49w
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 10m default-scheduler Successfully assigned default/myrel02-mychart06-5dc9d4b86c-kqg4n to aks-nodepool1-19665249-0
Normal Pulling 10m kubelet, aks-nodepool1-19665249-0 pulling image "registry-1.docker.io/arycloud/docker-web-app:0.5"
Normal Pulled 10m kubelet, aks-nodepool1-19665249-0 Successfully pulled image "registry-1.docker.io/arycloud/docker-web-app:0.5"
Warning Unhealthy 9m30s (x6 over 10m) kubelet, aks-nodepool1-19665249-0 Liveness probe failed: Get http://10.244.2.5:80/: dial tcp 10.244.2.5:80: connect: connection refused
Normal Created 9m29s (x3 over 10m) kubelet, aks-nodepool1-19665249-0 Created container
Normal Started 9m29s (x3 over 10m) kubelet, aks-nodepool1-19665249-0 Started container
Normal Killing 9m29s (x2 over 9m59s) kubelet, aks-nodepool1-19665249-0 Killing container with id docker://mychart06:Container failed liveness probe.. Container will be killed and recreated.
Warning Unhealthy 9m23s (x7 over 10m) kubelet, aks-nodepool1-19665249-0 Readiness probe failed: Get http://10.244.2.5:80/: dial tcp 10.244.2.5:80: connect: connection refused
Normal Pulled 5m29s (x6 over 9m59s) kubelet, aks-nodepool1-19665249-0 Container image "registry-1.docker.io/arycloud/docker-web-app:0.5" already present on machine
Warning BackOff 22s (x33 over 7m59s) kubelet, aks-nodepool1-19665249-0 Back-off restarting failed container
Update: docker logs CONTAINER_ID output:
> nodejs@1.0.0 start /usr/src/app
> node server.js
Running on http://0.0.0.0:32000
How can I avoid this issue?
Thanks in advance!
As I can see from the kubectl describe pod output, the container inside the Pod has already completed with exit code 0 (@4c74356b41 mentioned it in the comments). Reason: Completed indicates a successful completion without any errors or problems. However, the life cycle of the container was very short, so Kubernetes keeps restarting it, while the Liveness and Readiness probes continue to fail against the container.
To keep the Pod running, the container must run a task (process) that keeps running continuously. There are lots of discussions and solutions available on how to solve that kind of issue; more hints can be found here.
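For example, since server.js listens on 32000 while the probes in the describe output go to the port named http (80), the chart's deployment template could be aligned roughly like this sketch (only the container section is shown; the surrounding chart structure is assumed):
      containers:
      - name: mychart06
        image: registry-1.docker.io/arycloud/docker-web-app:0.5
        ports:
        - name: http
          containerPort: 32000  # the port server.js actually listens on
        livenessProbe:
          httpGet:
            path: /
            port: http          # now resolves to 32000
        readinessProbe:
          httpGet:
            path: /
            port: http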
The kubectl logs command only works if the pod is up and running. If it is not, you can look at Kubernetes events instead. They give you a log of events and sometimes (in my experience) also give you clues about what is going on.
kubectl get events -n <your_app_namespace> --sort-by='.metadata.creationTimestamp'
By default it does not sort the events, hence the --sort-by flag.

Kubernetes Calico node 'XXXXXXXXXXX' already using IPv4 Address XXXXXXXXX, CrashLoopBackOff

I used the AWS Kubernetes Quickstart to create a Kubernetes cluster in a VPC and private subnet: https://aws-quickstart.s3.amazonaws.com/quickstart-heptio/doc/heptio-kubernetes-on-the-aws-cloud.pdf. It was running fine for a while. I have Calico installed on my Kubernetes cluster. I have two nodes and a master. The Calico pods on the master are running fine; the ones on the nodes are in CrashLoopBackOff state:
NAME READY STATUS RESTARTS AGE
calico-etcd-ztwjj 1/1 Running 1 55d
calico-kube-controllers-685755779f-ftm92 1/1 Running 2 55d
calico-node-gkjgl 1/2 CrashLoopBackOff 270 22h
calico-node-jxkvx 2/2 Running 4 55d
calico-node-mxhc5 1/2 CrashLoopBackOff 9 25m
Describing one of the crashed pods:
ubuntu@ip-10-0-1-133:~$ kubectl describe pod calico-node-gkjgl -n kube-system
Name: calico-node-gkjgl
Namespace: kube-system
Node: ip-10-0-0-237.us-east-2.compute.internal/10.0.0.237
Start Time: Mon, 17 Sep 2018 16:56:41 +0000
Labels: controller-revision-hash=185957727
k8s-app=calico-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Running
IP: 10.0.0.237
Controlled By: DaemonSet/calico-node
Containers:
calico-node:
Container ID: docker://d89979ba963c33470139fd2093a5427b13c6d44f4c6bb546c9acdb1a63cd4f28
Image: quay.io/calico/node:v3.1.1
Image ID: docker-pullable://quay.io/calico/node@sha256:19fdccdd4a90c4eb0301b280b50389a56e737e2349828d06c7ab397311638d29
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 18 Sep 2018 15:14:44 +0000
Finished: Tue, 18 Sep 2018 15:14:44 +0000
Ready: False
Restart Count: 270
Requests:
cpu: 250m
Liveness: http-get http://:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: http-get http://:9099/readiness delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
ETCD_ENDPOINTS: <set to the key 'etcd_endpoints' of config map 'calico-config'> Optional: false
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: kubeadm,bgp
CALICO_DISABLE_FILE_LOGGING: true
CALICO_K8S_NODE_REF: (v1:spec.nodeName)
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
CALICO_IPV4POOL_CIDR: 192.168.0.0/16
CALICO_IPV4POOL_IPIP: Always
FELIX_IPV6SUPPORT: false
FELIX_IPINIPMTU: 1440
FELIX_LOGSEVERITYSCREEN: info
IP: autodetect
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-b7sfl (ro)
install-cni:
Container ID: docker://b37e0ec7eba690473a4999a31d9f766f7adfa65f800a7b2dc8e23ead7520252d
Image: quay.io/calico/cni:v3.1.1
Image ID: docker-pullable://quay.io/calico/cni@sha256:dc345458d136ad9b4d01864705895e26692d2356de5c96197abff0030bf033eb
Port: <none>
Host Port: <none>
Command:
/install-cni.sh
State: Running
Started: Mon, 17 Sep 2018 17:11:52 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 17 Sep 2018 16:56:43 +0000
Finished: Mon, 17 Sep 2018 17:10:53 +0000
Ready: True
Restart Count: 1
Environment:
CNI_CONF_NAME: 10-calico.conflist
ETCD_ENDPOINTS: <set to the key 'etcd_endpoints' of config map 'calico-config'> Optional: false
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-b7sfl (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
calico-cni-plugin-token-b7sfl:
Type: Secret (a volume populated by a Secret)
SecretName: calico-cni-plugin-token-b7sfl
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoSchedule
:NoExecute
:NoSchedule
:NoExecute
CriticalAddonsOnly
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 4m (x6072 over 22h) kubelet, ip-10-0-0-237.us-east-2.compute.internal Back-off restarting failed container
The logs for the same pod:
ubuntu@ip-10-0-1-133:~$ kubectl logs calico-node-gkjgl -n kube-system -c calico-node
2018-09-18 15:14:44.605 [INFO][8] startup.go 251: Early log level set to info
2018-09-18 15:14:44.605 [INFO][8] startup.go 269: Using stored node name from /var/lib/calico/nodename
2018-09-18 15:14:44.605 [INFO][8] startup.go 279: Determined node name: ip-10-0-0-237.us-east-2.compute.internal
2018-09-18 15:14:44.609 [INFO][8] startup.go 101: Skipping datastore connection test
2018-09-18 15:14:44.610 [INFO][8] startup.go 352: Building new node resource Name="ip-10-0-0-237.us-east-2.compute.internal"
2018-09-18 15:14:44.610 [INFO][8] startup.go 367: Initialize BGP data
2018-09-18 15:14:44.614 [INFO][8] startup.go 564: Using autodetected IPv4 address on interface ens3: 10.0.0.237/19
2018-09-18 15:14:44.614 [INFO][8] startup.go 432: Node IPv4 changed, will check for conflicts
2018-09-18 15:14:44.618 [WARNING][8] startup.go 861: Calico node 'ip-10-0-0-237' is already using the IPv4 address 10.0.0.237.
2018-09-18 15:14:44.618 [WARNING][8] startup.go 1058: Terminating
Calico node failed to start
So it seems like there is a conflict finding the node IP address, or Calico seems to think the IP is already assigned to another node. Doing a quick search I found this thread: https://github.com/projectcalico/calico/issues/1628. I see that this should be resolved by setting IP_AUTODETECTION_METHOD to can-reach=DESTINATION, which I'm assuming would be "can-reach=10.0.0.237". This config is an environment variable set on the calico/node container. I have been attempting to shell into the container itself, but kubectl tells me the container is not found:
ubuntu@ip-10-0-1-133:~$ kubectl exec calico-node-gkjgl --stdin --tty /bin/sh -c calico-node -n kube-system
error: unable to upgrade connection: container not found ("calico-node")
I'm suspecting this is due to Calico being unable to assign IPs. So I logged onto the host and attempted to shell into the container using docker:
root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 /bin/bash
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/bin/bash\": stat /bin/bash: no such file or directory"
So I guess there is no shell to execute in the container, which explains why Kubernetes couldn't execute that. I tried running commands externally to list environment variables, but I haven't been able to find any; I could be running these commands wrong, however:
root@ip-10-0-0-237:~# docker inspect -f '{{range $index, $value := .Config.Env}}{{$value}} {{end}}' k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 printenv IP_AUTODETECTION_METHOD
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"printenv\": executable file not found in $PATH"
root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 /bin/env
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/bin/env\": stat /bin/env: no such file or directory"
Okay, so maybe I am going about this the wrong way. Should I attempt to change the Calico config files using Kubernetes and redeploy it? Where can I find these on my system? I haven't been able to find where to set the environment variables.
If you look at the Calico docs, IP_AUTODETECTION_METHOD already defaults to first-found.
My guess is that the IP address is not being released by the previous 'run' of Calico, or that it is simply a bug in the v3.1.1 version of Calico.
Try:
Delete your Calico pods that are in a CrashLoopBackOff state:
kubectl -n kube-system delete pod calico-node-gkjgl calico-node-mxhc5
Your pods will be re-created and hopefully initialize.
Upgrade Calico to v3.1.3 or later, following these docs. My guess is that Heptio's Calico installation is using the etcd datastore.
Try to understand how Heptio's AWS AMIs work and see if there are any issues with them. This might take some time so you could contact their support as well.
Try a different method to install Kubernetes with Calico. Well documented on https://kubernetes.io
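If you do still want to try the autodetection method from the issue you linked, it is set as an environment variable on the calico-node container in the calico-node DaemonSet (for example via kubectl -n kube-system edit daemonset calico-node). A sketch of the entry, with a destination that makes sense for your network:
        # added under the calico-node container's env: list
        - name: IP_AUTODETECTION_METHOD
          value: can-reach=10.0.0.237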
For me, what worked was to remove leftover Docker networks on the nodes.
I had to list the current networks on each node with docker network list and then remove the unneeded ones with docker network rm <networkName>.
After doing that, the Calico pods were running fine.

Kubernetes cannot pull from insecure registry and cannot run container from local image on offline cluster

I am working on an offline cluster (the machines have no internet access), deploying Docker images using Ansible and docker-compose scripts.
My servers run CentOS 7.
I have set up an insecure Docker registry on the machines. We are changing environments, and I am installing Kubernetes in order to manage my containers.
I follow this guide to install kubernetes:
https://severalnines.com/blog/installing-kubernetes-cluster-minions-centos7-manage-pods-services
After the installation, I tried to launch a test pod. Here is the yml for the pod, launched with:
kubectl create -f nginx.yml
Here is the yml:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: [my_registry_addr]:[my_registry_port]/nginx:v1
    ports:
    - containerPort: 80
I used kubectl describe to get more information on what was wrong:
Name: nginx
Namespace: default
Node: [my node]
Start Time: Fri, 15 Sep 2017 11:29:05 +0200
Labels: <none>
Status: Pending
IP:
Controllers: <none>
Containers:
nginx:
Container ID:
Image: [my_registry_addr]:[my_registry_port]/nginx:v1
Image ID:
Port: 80/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
No volumes.
QoS Class: BestEffort
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2m 2m 1 {default-scheduler } Normal Scheduled Successfully assigned nginx to [my kubernet node]
1m 1m 2 {kubelet [my kubernet node]} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_addr]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving"
54s 54s 1 {kubelet [my kubernet node]} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImagePullBackOff: "Back-off pulling image \"[my_registry_addr]:[my_registry_port]\""
8s 8s 1 {kubelet [my kubernet node]} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Network timed out while trying to connect to https://index.docker.io/v1/repositories/library/[my_registry_addr]/images. You may want to check your internet connection or if you are behind a proxy."
Then, I go to my node and use journalctl -xe:
sept. 15 11:22:02 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:02.350930396+02:00" level=info msg="{Action=create, LoginUID=4294967295, PID=11555}"
sept. 15 11:22:17 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:17.351536727+02:00" level=warning msg="Error getting v2 registry: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
sept. 15 11:22:17 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:17.351606330+02:00" level=error msg="Attempting next endpoint for pull after error: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
sept. 15 11:22:32 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:32.353946452+02:00" level=error msg="Not continuing with pull after error: Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_ip]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving"
sept. 15 11:22:32 [my_node_ip] kubelet[11555]: E0915 11:22:32.354309 11555 docker_manager.go:2161] Failed to create pod infra container: ErrImagePull; Skipping pod "nginx_default(8b5c40e5-99f4-11e7-98db-f8bc12456ee4)": Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_ip]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving
sept. 15 11:22:32 [my_node_ip] kubelet[11555]: E0915 11:22:32.354390 11555 pod_workers.go:184] Error syncing pod 8b5c40e5-99f4-11e7-98db-f8bc12456ee4, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "Error while pulling image: Get https://index.docker.io/v1/repositories/library/[my_registry_ip]/images: dial tcp: lookup index.docker.io on [kubernet_master_ip]:53: server misbehaving"
sept. 15 11:22:44 [my_node_ip] dockerd-current[9861]: time="2017-09-15T11:22:44.350708175+02:00" level=error msg="Handler for GET /v1.24/images/[my_registry_ip]:[my_registry_port]/json returned error: No such image: [my_registry_ip]:[my_registry_port]"
I am sure that my Docker configuration is good, because I am using it every day with Ansible or Mesos.
The Docker version is 1.12.6, the Kubernetes version is 1.5.2.
What can I do now? I didn't find any configuration key for this usage.
When I saw that pulling was failing, I manually pulled the image on all the nodes. I gave the image a tag to ensure that Kubernetes will not try to pull it by default, and set imagePullPolicy: IfNotPresent.
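A sketch of what that looks like in the pod manifest (the image name and tag are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    # a specific (non-latest) tag, pre-pulled manually on every node
    image: [my_registry_addr]:[my_registry_port]/nginx:v1
    imagePullPolicy: IfNotPresent  # use the local image if it is already present
    ports:
    - containerPort: 80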
The syntax for specifying the docker image is :
[docker_registry]/[image_name]:[image_tag]
In your manifest file, you have used ":" to separate docker repository host and the port the repository is listening on. The default port for docker private registry I guess is 5000.
So change your image declaration from
Image: [my_registry_addr]:[my_registry_port]/nginx:v1
to
Image: [my_registry_addr]/nginx:v1
Also, check the network connectivity from the worker node to your docker registry by doing a ping.
ping [my_registry_addr]
If you still want to check whether port 443 is open on the registry, you can do a TCP check on that port on the host running the Docker registry:
curl telnet://[my_registry_addr]:443
Hope that helps.
I finally found what the problem was.
To work, Kubernetes needs a pause container, and it was trying to pull the pause container from the internet.
I deployed a custom pause container image to my registry and configured Kubernetes to use that image as its pause container.
After that, Kubernetes works like a charm.
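For anyone hitting the same issue: the pause image the kubelet uses is controlled by its --pod-infra-container-image flag. With the CentOS packages used by that guide this is, as far as I remember, set in /etc/kubernetes/kubelet, roughly like the line below (the registry address and image name are placeholders), followed by a kubelet restart:
KUBELET_POD_INFRA_CONTAINER="--pod-infra-container-image=[my_registry_addr]:[my_registry_port]/pause:latest"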

Kubernetes does not pull docker image from private repository without https

I configured Docker for the private registry on the same host as my kubernetes-master. Pushing to the private Docker registry without HTTPS was successful. I can also pull the image just using Docker.
When I run Kubernetes for this image, I get the following log with 'kubectl describe pods':
kubectl describe pods
Name: fgpra-250514157-yh6vb
Namespace: default
Node: 5.179.232.64/5.179.232.64
Start Time: Tue, 11 Oct 2016 18:06:59 +0200
Labels: pod-template-hash=250514157,run=fgpra
Status: Pending
IP: <removed myself>
Controllers: ReplicaSet/fgpra-250514157
Containers:
fgpra:
Container ID:
Image: 5.179.232.65:5000/some_api_image
Image ID:
Port: 3000/TCP
QoS Tier:
cpu: BestEffort
memory: BestEffort
State: Waiting
Reason: ErrImagePull
Ready: False
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Ready False
Volumes:
default-token-q7u3x:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-q7u3x
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
4s 4s 1 {default-scheduler } Normal Scheduled Successfully assigned fgpra-250514157-yh6vb to 5.179.232.64
4s 4s 1 {kubelet 5.179.232.64} Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
4s 4s 1 {kubelet 5.179.232.64} spec.containers{fgpra} Normal Pulling pulling image "5.179.232.65:5000/some_api_image"
4s 4s 1 {kubelet 5.179.232.64} spec.containers{fgpra} Warning Failed Failed to pull image "5.179.232.65:5000/some_api_image": unable to ping registry endpoint https://5.179.232.65:5000/v0/
v2 ping attempt failed with error: Get https://5.179.232.65:5000/v2/: http: server gave HTTP response to HTTPS client
v1 ping attempt failed with error: Get https://5.179.232.65:5000/v1/_ping: http: server gave HTTP response to HTTPS client
4s 4s 1 {kubelet 5.179.232.64} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "fgpra" with ErrImagePull: "unable to ping registry endpoint https://5.179.232.65:5000/v0/\nv2 ping attempt failed with error: Get https://5.179.232.65:5000/v2/: http: server gave HTTP response to HTTPS client\n v1 ping attempt failed with error: Get https://5.179.232.65:5000/v1/_ping: http: server gave HTTP response to HTTPS client"
3s 3s 1 {kubelet 5.179.232.64} spec.containers{fgpra} Normal BackOff Back-off pulling image "5.179.232.65:5000/some_api_image"
3s 3s 1 {kubelet 5.179.232.64} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "fgpra" with ImagePullBackOff: "Back-off pulling image \"5.179.232.65:5000/some_api_image\""
I already configured my /etc/init.d/sysconfig/docker to use my insecure private registry.
This is the command to start the Kubernetes deployment:
kubectl run fgpra --image=5.179.232.65:5000/some_api_image --port=3000
How can I set kubernetes to pull from my private docker registry without using ssl?
This is rather a Docker issue than a Kubernetes one. You need to add your HTTP registry as an insecure registry to the Docker daemon on each Kubernetes node.
docker daemon --insecure-registry=5.179.232.65:5000
In most environments there is a file like /etc/default/docker where you can add this parameter.
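For example, on Debian/Ubuntu-style hosts that file would get a line like the following (restart the Docker daemon afterwards); on CentOS the equivalent usually lives in /etc/sysconfig/docker:
DOCKER_OPTS="--insecure-registry=5.179.232.65:5000"
On newer Docker versions the same setting can also go into /etc/docker/daemon.json via the "insecure-registries" key.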
