Kubernetes pod is stuck in Pending status - docker

I am trying to install kubectl, but when I type this in the terminal:
kubectl get pods --namespace knative-serving -w
I get this:
NAME READY STATUS RESTARTS AGE
activator-69b8474d6b-jvzvs 2/2 Running 0 2h
autoscaler-6579b57774-cgmm9 2/2 Running 0 2h
controller-66cd7d99df-q59kl 0/1 Pending 0 2h
webhook-6d9568d-v4pgk 1/1 Running 0 2h
controller-66cd7d99df-q59kl 0/1 Pending 0 2h
controller-66cd7d99df-q59kl 0/1 Pending 0 2h
controller-66cd7d99df-q59kl 0/1 Pending 0 2h
controller-66cd7d99df-q59kl 0/1 Pending 0 2h
controller-66cd7d99df-q59kl 0/1 Pending 0 2h
controller-66cd7d99df-q59kl 0/1 Pending 0 2h
I don't understand why controller-66cd7d99df-q59kl is still pending.
When I tried kubectl describe pods -n knative-serving controller-66cd7d99df-q59kl, I got this:
Name: controller-66cd7d99df-q59kl
Namespace: knative-serving
Node: <none>
Labels: app=controller
pod-template-hash=66cd7d99df
Annotations: sidecar.istio.io/inject=false
Status: Pending
IP:
Controlled By: ReplicaSet/controller-66cd7d99df
Containers:
controller:
Image: gcr.io/knative-releases/github.com/knative/serving/cmd/controller@sha256:5a5a0d5fffe839c99fc8f18ba028375467fdcd83cbee9c7015c1a58d01ca6929
Port: 9090/TCP
Limits:
cpu: 1
memory: 1000Mi
Requests:
cpu: 100m
memory: 100Mi
Environment: <none>
Mounts:
/etc/config-logging from config-logging (rw)
/var/run/secrets/kubernetes.io/serviceaccount from controller-token-d9l64 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
config-logging:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: config-logging
Optional: false
controller-token-d9l64:
Type: Secret (a volume populated by a Secret)
SecretName: controller-token-d9l64
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 40s (x98 over 2h) default-scheduler 0/1 nodes are available: 1 Insufficient cpu.

Please consider the comments above: you have kubectl installed correctly (it's working), and kubectl describe pod/<pod> would help...
But the information you provide appears sufficient for an answer:
FailedScheduling because of Insufficient cpu
The pod that you show (one of several) requests 100m CPU and 100Mi memory, with limits of:
cpu: 1
memory: 1000Mi
Scheduling is based on the requests, and your cluster's single node no longer has enough unreserved CPU to accommodate this pod (and apparently the others).
You should increase the number (and/or size) of the nodes in your cluster to provide the capacity needed for the pods.
You needn't delete these pods because, once the cluster's capacity increases, you should see them deploy successfully.
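How you add that capacity depends on how the cluster was created, which the question doesn't say. The commands below are only a hedged sketch for two common setups; the sizes and names are assumptions to adjust for your machine or provider:
# If this is a local minikube cluster, recreate it with more CPU/RAM:
minikube delete
minikube start --cpus=4 --memory=8192
# If it is a managed cloud cluster (e.g. GKE), grow the node pool instead:
gcloud container clusters resize CLUSTER_NAME --num-nodes=2
With Docker Desktop's built-in Kubernetes, the equivalent is raising the CPU/memory allocation in Docker Desktop's resource settings and restarting it.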

Please verify your CPU resources by running:
kubectl get nodes
kubectl describe node <your-node>
Also take a look at all the information related to:
Capacity:
  cpu:
Allocatable:
  cpu:
The CPU Requests and CPU Limits information can also be helpful.
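For orientation, the relevant sections of kubectl describe node look roughly like the sketch below (all numbers are invented for illustration, not taken from this cluster); the pod can only be scheduled if its CPU request fits into Allocatable cpu minus the CPU already requested by other pods:
Capacity:
  cpu:     2
  memory:  2038904Ki
Allocatable:
  cpu:     2
  memory:  1936504Ki
...
Allocated resources:
  Resource  Requests     Limits
  --------  --------     ------
  cpu       1930m (96%)  2 (100%)
  memory    890Mi (47%)  1390Mi (73%)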

Related

Pod coredns stuck in ContainerCreating state with Weave on k8s

First of all, let me thank you for this amazing guide. I'm very new to Kubernetes, and having a guide like this to follow helps a lot when trying to set up my first cluster!
That said, I'm having some issues with creating deployments, as there are pods that aren't being created and remain stuck in the ContainerCreating state.
[root@master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready control-plane 25h v1.24.0
node1 Ready <none> 24h v1.24.0
node2 Ready <none> 24h v1.24.0
[root@master ~]# kubectl cluster-info
Kubernetes control plane is running at https://192.168.3.200:6443
CoreDNS is running at https://192.168.3.200:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
The problem:
[root@master ~]# kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-6d4b75cb6d-v5pvk 0/1 ContainerCreating 0 114m
kube-system pod/coredns-7599c5f99f-q6nwq 0/1 ContainerCreating 0 114m
kube-system pod/coredns-7599c5f99f-sg4wn 0/1 ContainerCreating 0 114m
kube-system pod/etcd-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-apiserver-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-controller-manager-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-proxy-ftxzx 1/1 Running 2 (3h11m ago) 24h
kube-system pod/kube-proxy-pcl8q 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-proxy-q7dpw 1/1 Running 2 (3h23m ago) 24h
kube-system pod/kube-scheduler-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/weave-net-2p47z 2/2 Running 5 (3h23m ago) 24h
kube-system pod/weave-net-k5529 2/2 Running 4 (3h11m ago) 24h
kube-system pod/weave-net-tq4bs 2/2 Running 7 (3h26m ago) 25h
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 25h
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 25h
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-proxy 3 3 3 3 3 kubernetes.io/os=linux 25h
kube-system daemonset.apps/weave-net 3 3 3 3 3 <none> 25h
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 0/2 2 0 25h
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-6d4b75cb6d 1 1 0 25h
kube-system replicaset.apps/coredns-7599c5f99f 2 2 0 116m
Note that the first three pods, from coredns, fail to start.
[root@master ~]# kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
93m Warning FailedCreatePodSandBox pod/nginx-deploy-99976564d-s4shk (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "fd79c77289f42b3cb0eb0be997a02a42f9595df061deb6e2d3678ab00afb5f67": failed to find network info for sandbox "fd79c77289f42b3cb0eb0be997a02a42f9595df061deb6e2d3678ab00afb5f67"
.
[root@master ~]# kubectl describe pod coredns-6d4b75cb6d-v5pvk -n kube-system
Name: coredns-6d4b75cb6d-v5pvk
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: node2/192.168.3.202
Start Time: Thu, 12 May 2022 19:45:58 +0000
Labels: k8s-app=kube-dns
pod-template-hash=6d4b75cb6d
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-6d4b75cb6d
Containers:
coredns:
Container ID:
Image: k8s.gcr.io/coredns/coredns:v1.8.6
Image ID:
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4bpvz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
kube-api-access-4bpvz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 93s (x393 over 124m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "7d0f8f4b3dbf2dffcf1a8c01b41368e16b1f80bc97ff3faa611c1fd52c0f6967": failed to find network info for sandbox "7d0f8f4b3dbf2dffcf1a8c01b41368e16b1f80bc97ff3faa611c1fd52c0f6967"
Versions:
[root@master ~]# docker --version
Docker version 20.10.15, build fd82621
[root@master ~]# kubelet --version
Kubernetes v1.24.0
[root@master ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:44:24Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
I have no idea where to go from here. I googled keywords like "rpc error weave k8s" and "Failed to create pod sandbox: rpc error", but none of the results solved my problem. I saw some issues mentioning Weave Net; could this be the problem? Maybe I got it wrong, but I'm sure I followed the instructions closely.
Any help would be greatly appreciated!
Looks like you got pretty far! Support for Docker as a container runtime (the dockershim) was removed in 1.24.0. I can't tell if that is what you are using or not, but if you are, that could be your problem.
https://kubernetes.io/blog/2022/05/03/kubernetes-1-24-release-announcement/
You could switch to containerd as your container runtime, but for the purposes of learning you could try the latest 1.23.x version of Kubernetes. Get that to work, then circle back and tackle containerd with Kubernetes v1.24.0.
You can still use Docker on your laptop/desktop, but on the k8s servers you will not be able to use Docker as the runtime on 1.24.x or later.
Hope that helps and good luck!
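A quick way to confirm which runtime each node is actually using is the CONTAINER-RUNTIME column of kubectl get nodes -o wide. The commands below are a minimal sketch; the containerd socket path is the usual default location, not something taken from this cluster:
# Shows docker:// or containerd:// for every node in the CONTAINER-RUNTIME column
kubectl get nodes -o wide
# When (re)initialising a kubeadm node against containerd, the CRI socket can be set explicitly:
kubeadm init --cri-socket unix:///run/containerd/containerd.sock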

How do I know how much memory I should provide to a k8s pod?

I have deployed Elasticsearch on k8s on my Mac (a minikube cluster). The Elasticsearch configuration file is:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.10.0
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: -Xms2g -Xmx2g
          resources:
            requests:
              memory: 4Gi
              cpu: 4
            limits:
              memory: 4Gi
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        storageClassName: standard
        resources:
          requests:
            storage: 1Gi
After I run kubectl apply -f es.yaml, the pod and services are created, but the pod is Pending.
$kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 28h
quickstart-es-default ClusterIP None <none> 9200/TCP 21m
quickstart-es-http ClusterIP 10.103.177.195 <none> 9200/TCP 21m
quickstart-es-transport ClusterIP None <none> 9300/TCP 21m
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
quickstart-es-default-0 0/1 Pending 0 21m
The output of kubectl describe pods is:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 22m (x2 over 22m) default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Warning FailedScheduling 40s (x17 over 22m) default-scheduler 0/1 nodes are available: 1 Insufficient memory.
It seems that I don't have enough memory for my pod. How can I allocate more memory to it?
Minikube itself starts with DefaultMemory = 2048 MB; you are hitting this limit.
When using minikube you should think in advance about how many resources your pods/replicas use in total, so that you can use the --memory flag to allocate the needed amount of RAM.
You already got an answer in a separate question. In addition to that, I would add that you should always do minikube delete before starting minikube with the --memory= option, e.g. minikube start --memory=8192.
You can always check your current memory configuration with kubectl describe node, in the Capacity section, e.g.
Capacity:
cpu: 1
...
memory: 3785984Ki
pods: 110
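Since the Elasticsearch pod requests 4Gi of memory and 4 CPUs, the minikube VM has to be recreated with at least that much headroom. A minimal sketch, where the exact sizes and the default node name minikube are assumptions:
minikube delete
minikube start --memory=8192 --cpus=6
# Confirm the node now advertises enough allocatable memory and CPU for the pod's requests
kubectl describe node minikube | grep -A 5 Allocatable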

nginx-ingress k8s on Google no IP address

I am practicing k8s by following the ingress chapter. I am using a Google (GKE) cluster. The specifications are as follows:
master: 1.11.7-gke.4
node: 1.11.7-gke.4
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
gke-singh-default-pool-a69fa545-1sm3 Ready <none> 6h v1.11.7-gke.4 10.148.0.46 35.197.128.107 Container-Optimized OS from Google 4.14.89+ docker://17.3.2
gke-singh-default-pool-a69fa545-819z Ready <none> 6h v1.11.7-gke.4 10.148.0.47 35.198.217.71 Container-Optimized OS from Google 4.14.89+ docker://17.3.2
gke-singh-default-pool-a69fa545-djhz Ready <none> 6h v1.11.7-gke.4 10.148.0.45 35.197.159.75 Container-Optimized OS from Google 4.14.89+ docker://17.3.2
master endpoint: 35.186.148.93
DNS: singh.hbot.io (master IP)
To keep my question short, I post my source code in the snippets and link back to here.
Files:
deployment.yaml
ingress.yaml
ingress-rules.yaml
Problem:
curl http://singh.hbot.io/webapp1 times out
Description
$ kubectl get deployment -n nginx-ingress
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
nginx-ingress 1 1 1 0 2h
nginx-ingress deployment is not available.
$ kubectl describe deployment -n nginx-ingress
Name: nginx-ingress
Namespace: nginx-ingress
CreationTimestamp: Mon, 04 Mar 2019 15:09:42 +0700
Labels: app=nginx-ingress
Annotations: deployment.kubernetes.io/revision: 1
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"annotations":{},"name":"nginx-ingress","namespace":"nginx-ingress"},"s...
Selector: app=nginx-ingress
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: app=nginx-ingress
Service Account: nginx-ingress
Containers:
nginx-ingress:
Image: nginx/nginx-ingress:edge
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
Args:
-nginx-configmaps=$(POD_NAMESPACE)/nginx-config
-default-server-tls-secret=$(POD_NAMESPACE)/default-server-secret
Environment:
POD_NAMESPACE: (v1:metadata.namespace)
POD_NAME: (v1:metadata.name)
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: nginx-ingress-77fcd48f4d (1/1 replicas created)
Events: <none>
pods:
$ kubectl get pods --all-namespaces=true
NAMESPACE NAME READY STATUS RESTARTS AGE
default webapp1-7d67d68676-k9hhl 1/1 Running 0 6h
default webapp2-64d4844b78-9kln5 1/1 Running 0 6h
default webapp3-5b8ff7484d-zvcsf 1/1 Running 0 6h
kube-system event-exporter-v0.2.3-85644fcdf-xxflh 2/2 Running 0 6h
kube-system fluentd-gcp-scaler-8b674f786-gvv98 1/1 Running 0 6h
kube-system fluentd-gcp-v3.2.0-srzc2 2/2 Running 0 6h
kube-system fluentd-gcp-v3.2.0-w2z2q 2/2 Running 0 6h
kube-system fluentd-gcp-v3.2.0-z7p9l 2/2 Running 0 6h
kube-system heapster-v1.6.0-beta.1-5685746c7b-kd4mn 3/3 Running 0 6h
kube-system kube-dns-6b98c9c9bf-6p8qr 4/4 Running 0 6h
kube-system kube-dns-6b98c9c9bf-pffpt 4/4 Running 0 6h
kube-system kube-dns-autoscaler-67c97c87fb-gbgrs 1/1 Running 0 6h
kube-system kube-proxy-gke-singh-default-pool-a69fa545-1sm3 1/1 Running 0 6h
kube-system kube-proxy-gke-singh-default-pool-a69fa545-819z 1/1 Running 0 6h
kube-system kube-proxy-gke-singh-default-pool-a69fa545-djhz 1/1 Running 0 6h
kube-system l7-default-backend-7ff48cffd7-trqvx 1/1 Running 0 6h
kube-system metrics-server-v0.2.1-fd596d746-bvdfk 2/2 Running 0 6h
kube-system tiller-deploy-57c574bfb8-xnmtj 1/1 Running 0 1h
nginx-ingress nginx-ingress-77fcd48f4d-rfwbk 0/1 CrashLoopBackOff 35 2h
describe pod
$ kubectl describe pods -n nginx-ingress
Name: nginx-ingress-77fcd48f4d-5rhtv
Namespace: nginx-ingress
Priority: 0
PriorityClassName: <none>
Node: gke-singh-default-pool-a69fa545-djhz/10.148.0.45
Start Time: Mon, 04 Mar 2019 17:55:00 +0700
Labels: app=nginx-ingress
pod-template-hash=3397804908
Annotations: <none>
Status: Running
IP: 10.48.2.10
Controlled By: ReplicaSet/nginx-ingress-77fcd48f4d
Containers:
nginx-ingress:
Container ID: docker://5d3ee9e2bf7a2060ff0a96fdd884a937b77978c137df232dbfd0d3e5de89fe0e
Image: nginx/nginx-ingress:edge
Image ID: docker-pullable://nginx/nginx-ingress@sha256:16c1c6dde0b904f031d3c173e0b04eb82fe9c4c85cb1e1f83a14d5b56a568250
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
Args:
-nginx-configmaps=$(POD_NAMESPACE)/nginx-config
-default-server-tls-secret=$(POD_NAMESPACE)/default-server-secret
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Mon, 04 Mar 2019 18:16:33 +0700
Finished: Mon, 04 Mar 2019 18:16:33 +0700
Ready: False
Restart Count: 9
Environment:
POD_NAMESPACE: nginx-ingress (v1:metadata.namespace)
POD_NAME: nginx-ingress-77fcd48f4d-5rhtv (v1:metadata.name)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-token-zvcwt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nginx-ingress-token-zvcwt:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-token-zvcwt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 26m default-scheduler Successfully assigned nginx-ingress/nginx-ingress-77fcd48f4d-5rhtv to gke-singh-default-pool-a69fa545-djhz
Normal Created 25m (x4 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Created container
Normal Started 25m (x4 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Started container
Normal Pulling 24m (x5 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz pulling image "nginx/nginx-ingress:edge"
Normal Pulled 24m (x5 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Successfully pulled image "nginx/nginx-ingress:edge"
Warning BackOff 62s (x112 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Back-off restarting failed container
Fix the container being terminated
Add the following command to ingress.yaml to prevent the container from finishing its run and getting terminated by k8s:
command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]
The ingress still has no IP address from GKE. Let me have a look at the details.
describe ingress:
$ kubectl describe ing
Name: webapp-ingress
Namespace: default
Address:
Default backend: default-http-backend:80 (10.48.0.8:8080)
Rules:
Host Path Backends
---- ---- --------
*
/webapp1 webapp1-svc:80 (<none>)
/webapp2 webapp2-svc:80 (<none>)
webapp3-svc:80 (<none>)
Annotations:
kubectl.kubernetes.io/last-applied-configuration: {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{},"name":"webapp-ingress","namespace":"default"},"spec":{"rules":[{"http":{"paths":[{"backend":{"serviceName":"webapp1-svc","servicePort":80},"path":"/webapp1"},{"backend":{"serviceName":"webapp2-svc","servicePort":80},"path":"/webapp2"},{"backend":{"serviceName":"webapp3-svc","servicePort":80}}]}}]}}
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Translate 7m45s (x59 over 4h20m) loadbalancer-controller error while evaluating the ingress spec: service "default/webapp1-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp2-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp3-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"
From this line I got the ultimate solution, thanks to Christian Roy. Thank you very much.
Fix the ClusterIP
It is the default value, so I have to edit my manifest file to use NodePort as follows:
apiVersion: v1
kind: Service
metadata:
  name: webapp1-svc
  labels:
    app: webapp1
spec:
  type: NodePort
  ports:
  - port: 80
  selector:
    app: webapp1
And that is it.
The answer is in your question. The describe output of your ingress shows the problem.
You did kubectl describe ing and the last part of that output was:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Translate 7m45s (x59 over 4h20m) loadbalancer-controller error while evaluating the ingress spec: service "default/webapp1-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp2-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp3-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"
The important part is:
error while evaluating the ingress spec: service "default/webapp1-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"
Solution
Just change all your services to be of type NodePort and it will work.
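If you prefer not to edit the manifests, the same change can be applied in place with kubectl patch; this is only a sketch using the webapp1-svc name from the question, and it needs to be repeated for webapp2-svc and webapp3-svc:
kubectl patch svc webapp1-svc -p '{"spec": {"type": "NodePort"}}'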
I also have to add a command so that the container does not finish and exit.
command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]

Kubernetes 1.7 on Google Cloud: FailedSync Error syncing pod, SandboxChanged Pod sandbox changed, it will be killed and re-created

My Kubernetes pods and containers are not starting. They are stuck with the status ContainerCreating.
I ran the command kubectl describe po PODNAME, which lists the events, and I see the following error:
Type Reason Message
Warning FailedSync Error syncing pod
Normal SandboxChanged Pod sandbox changed, it will be killed and re-created.
The Count column indicates that these errors are being repeated over and over, roughly once a second. The full output from this command is below, but how do I go about debugging this? I'm not even sure what these errors mean.
Name: ocr-extra-2939512459-3hkv1
Namespace: ocr-da-cluster
Node: gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2/10.240.0.11
Start Time: Tue, 24 Oct 2017 21:05:01 -0400
Labels: component=ocr
pod-template-hash=2939512459
role=extra
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"ocr-da-cluster","name":"ocr-extra-2939512459","uid":"d58bd050-b8f3-11e7-9f9e-4201...
Status: Pending
IP:
Created By: ReplicaSet/ocr-extra-2939512459
Controlled By: ReplicaSet/ocr-extra-2939512459
Containers:
ocr-node:
Container ID:
Image: us.gcr.io/ocr-api/ocr-image
Image ID:
Ports: 80/TCP, 443/TCP, 5555/TCP, 15672/TCP, 25672/TCP, 4369/TCP, 11211/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 31
memory: 10Gi
Liveness: http-get http://:http/ocr/live delay=270s timeout=30s period=60s #success=1 #failure=5
Readiness: http-get http://:http/_ah/warmup delay=180s timeout=60s period=120s #success=1 #failure=3
Environment:
NAMESPACE: ocr-da-cluster (v1:metadata.namespace)
Mounts:
/var/log/apache2 from apachelog (rw)
/var/log/celery from cellog (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-dhjr5 (ro)
log-apache2-error:
Container ID:
Image: busybox
Image ID:
Port: <none>
Args:
/bin/sh
-c
echo Apache2 Error && sleep 90 && tail -n+1 -F /var/log/apache2/error.log
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 20m
Environment: <none>
Mounts:
/var/log/apache2 from apachelog (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-dhjr5 (ro)
log-worker-1:
Container ID:
Image: busybox
Image ID:
Port: <none>
Args:
/bin/sh
-c
echo Celery Worker && sleep 90 && tail -n+1 -F /var/log/celery/worker*.log
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 20m
Environment: <none>
Mounts:
/var/log/celery from cellog (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-dhjr5 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
apachelog:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
cellog:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-dhjr5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-dhjr5
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/instance-type=n1-highcpu-32
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
10m 10m 2 default-scheduler Warning FailedScheduling No nodes are available that match all of the following predicates:: Insufficient cpu (10), Insufficient memory (2), MatchNodeSelector (2).
10m 10m 1 default-scheduler Normal Scheduled Successfully assigned ocr-extra-2939512459-3hkv1 to gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2
10m 10m 1 kubelet, gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "apachelog"
10m 10m 1 kubelet, gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "cellog"
10m 10m 1 kubelet, gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2 Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-dhjr5"
10m 1s 382 kubelet, gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2 Warning FailedSync Error syncing pod
10m 0s 382 kubelet, gke-da-ocr-api-gce-cluster-extra-pool-65029b63-6qs2 Normal SandboxChanged Pod sandbox changed, it will be killed and re-created.
Check your resource limits. I faced the same issue, and the reason in my case was that I was using m instead of Mi for memory limits and memory requests.
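To illustrate the unit difference (a sketch with made-up values, not the manifest from this question): for memory the Mi/Gi suffixes mean mebibytes/gibibytes, while a lowercase m means milli, so 512m is 0.512 bytes, which is almost certainly not what you want:
resources:
  requests:
    cpu: 500m       # for CPU, m (millicores) is correct: half a core
    memory: 512Mi   # correct: 512 mebibytes
  limits:
    memory: 512Mi
    # memory: 512m  # wrong: this means 0.512 bytes, not 512 megabytes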
Are you sure you need 31 CPUs as the initial request (ocr-node)?
That will require a very big node.
I'm seeing similar issues with some of my pods. Deleting them and allowing them to be recreated sometimes helps, but not consistently.
I'm sure there are enough resources available.
See Kubernetes pods failing on "Pod sandbox changed, it will be killed and re-created"

Why do the pods return Error or ExitCode:0 even though they run successfully?

There are two kinds of status for one-shot pods, run from the API or from the command:
kubectl run --restart=Never --image test:v0.1 ....
The pods produce output files on an NFS server, and I've got the files successfully.
kubectl get pods -ao wide:
NAME READY STATUS RESTARTS AGE
test-90 0/1 ExitCode:0 0 23m 192.168.1.43
test-91 0/1 ExitCode:0 0 23m 192.168.1.43
test-92 0/1 ExitCode:0 0 23m 192.168.1.43
test-93 0/1 ExitCode:0 0 23m 192.168.1.43
test-94 0/1 Error 0 23m 192.168.1.46
test-95 0/1 Error 0 23m 192.168.1.46
test-96 0/1 Error 0 23m 192.168.1.46
test-97 0/1 Error 0 23m 192.168.1.46
test-98 0/1 Error 0 23m 192.168.1.46
test-99 0/1 ExitCode:0 0 23m 192.168.1.43
The description of an ExitCode:0 pod:
Name: test-99
Namespace: default
Image(s): test:v0.1
Node: 192.168.1.43/192.168.1.43
Status: Succeeded
Replication Controllers: <none>
Containers:
test:
State: Terminated
Exit Code: 0
Ready: False
Restart Count: 0
The description of an Error pod:
Name: test-98
Namespace: default
Image(s): test:v0.1
Node: 192.168.1.46/192.168.1.46
Status: Succeeded
Replication Controllers: <none>
Containers:
test:
State: Terminated
Reason: Error
Exit Code: 0
Ready: False
Restart Count: 0
Their NFS volumes:
Volumes:
input:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 192.168.1.46
Path: /srv/nfs4/input
ReadOnly: false
output:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 192.168.1.46
Path: /srv/nfs4/output
ReadOnly: false
default-token-nmviv:
Type: Secret (a secret that should populate this volume)
SecretName: default-token-nmviv
kubectl logs returns nothing, since the container just produces output files.
Thanks in advance!
ExitCode 0 means the container terminated normally.
Exit codes can be used if you pipe to another process, so that process knows what to do next (if the previous process failed, do this; else, do something with the data passed...).
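If you want to read the exit code programmatically rather than from the STATUS column, the terminated state is exposed in the pod status. A sketch using the pod names above (the jsonpath assumes a single container per pod):
# Prints the container's exit code (0 for normal termination)
kubectl get pod test-99 -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}'
# The recorded reason (e.g. Completed or Error) sits next to it
kubectl get pod test-98 -o jsonpath='{.status.containerStatuses[0].state.terminated.reason}'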
