I am having an issue where a custom-built RabbitMQ image works in Docker but continuously restarts within Kubernetes.
Dockerfile:
# syntax=docker/dockerfile:1
FROM rabbitmq:management-alpine
ADD rabbitmq.conf /etc/rabbitmq/
ADD definitions.json /etc/rabbitmq/
ENTRYPOINT ["docker-entrypoint.sh"]
EXPOSE 4369 5671 5672 15691 15692 25672
CMD ["rabbitmq-server"]
When run with a simple docker run <IMAGE>, I get logs indicating success, and clearly the service is running in the container:
...
2022-11-25 16:37:41.392367+00:00 [info] <0.229.0> Importing concurrently 7 exchanges...
2022-11-25 16:37:41.394591+00:00 [info] <0.229.0> Importing sequentially 1 global runtime parameters...
2022-11-25 16:37:41.395691+00:00 [info] <0.229.0> Importing concurrently 7 queues...
2022-11-25 16:37:41.400586+00:00 [info] <0.229.0> Importing concurrently 7 bindings...
2022-11-25 16:37:41.403519+00:00 [info] <0.787.0> Resetting node maintenance status
2022-11-25 16:37:41.414900+00:00 [info] <0.846.0> Management plugin: HTTP (non-TLS) listener started on port 15672
2022-11-25 16:37:41.414963+00:00 [info] <0.874.0> Statistics database started.
2022-11-25 16:37:41.415003+00:00 [info] <0.873.0> Starting worker pool 'management_worker_pool' with 3 processes in it
2022-11-25 16:37:41.423652+00:00 [info] <0.888.0> Prometheus metrics: HTTP (non-TLS) listener started on port 15692
2022-11-25 16:37:41.423704+00:00 [info] <0.787.0> Ready to start client connection listeners
2022-11-25 16:37:41.424455+00:00 [info] <0.932.0> started TCP listener on [::]:5672
completed with 4 plugins.
2022-11-25 16:37:41.448054+00:00 [info] <0.787.0> Server startup complete; 4 plugins started.
2022-11-25 16:37:41.448054+00:00 [info] <0.787.0> * rabbitmq_prometheus
2022-11-25 16:37:41.448054+00:00 [info] <0.787.0> * rabbitmq_management
2022-11-25 16:37:41.448054+00:00 [info] <0.787.0> * rabbitmq_web_dispatch
2022-11-25 16:37:41.448054+00:00 [info] <0.787.0> * rabbitmq_management_agent
However, if I take this image and deploy it within my Kubernetes cluster, the pod seems to start and then exit into a "CrashLoopBackOff" state.
kubectl logs <POD> returns:
Segmentation fault (core dumped)
and kubectl describe pod <POD> returns:
Name: rabbitmq-0
Namespace: *****
Priority: 0
Service Account: *****
Node: minikube/*****
Start Time: Thu, 24 Nov 2022 00:35:28 -0500
Labels: app=rabbitmq
controller-revision-hash=rabbitmq-75d6d74c5d
statefulset.kubernetes.io/pod-name=rabbitmq-0
Annotations: <none>
Status: Running
IP: *****
IPs:
IP: *****
Controlled By: StatefulSet/rabbitmq
Containers:
rabbitmq-deployment:
Container ID: docker://32930809a10ced998083d8adacec209da7081b7c7bfda605f7ac87f78cf23fda
Image: *****/<POD>:latest
Image ID: *****
Ports: 5672/TCP, 15672/TCP, 15692/TCP, 4369/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 24 Nov 2022 00:41:26 -0500
Finished: Thu, 24 Nov 2022 00:41:27 -0500
Ready: False
Restart Count: 6
Liveness: exec [rabbitmq-diagnostics status] delay=60s timeout=15s period=60s #success=1 #failure=3
Readiness: exec [rabbitmq-diagnostics ping] delay=20s timeout=10s period=60s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sst9x (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-sst9x:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 35h default-scheduler Successfully assigned mpa/rabbitmq-0 to minikube
Normal Pulled 35h kubelet Successfully pulled image "*****" in 622.632929ms
Normal Pulled 35h kubelet Successfully pulled image "*****" in 233.765678ms
Normal Pulled 35h kubelet Successfully pulled image "*****" in 203.932962ms
Normal Pulling 35h (x4 over 35h) kubelet Pulling image "*****"
Normal Created 35h (x4 over 35h) kubelet Created container rabbitmq-deployment
Normal Started 35h (x4 over 35h) kubelet Started container rabbitmq-deployment
Normal Pulled 35h kubelet Successfully pulled image "*****" in 212.459802ms
Warning BackOff 35h (x52 over 35h) kubelet Back-off restarting failed container
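One thing worth noting from this output: the Liveness and Readiness lines correspond to probes roughly like the following in the StatefulSet's container spec (reconstructed from the describe output above; the rest of my manifest is omitted). Since the container starts and finishes within about a second, well before the 60-second liveness delay or the 20-second readiness delay, the probes do not look like what is killing it.
# probe settings reconstructed from the kubectl describe output above
livenessProbe:
  exec:
    command: ["rabbitmq-diagnostics", "status"]
  initialDelaySeconds: 60
  timeoutSeconds: 15
  periodSeconds: 60
readinessProbe:
  exec:
    command: ["rabbitmq-diagnostics", "ping"]
  initialDelaySeconds: 20
  timeoutSeconds: 10
  periodSeconds: 60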
The section of that describe command that states:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
makes me wonder whether the process is not being left running properly. It's almost as if RabbitMQ starts and then exits once initialized.
Is there something I am missing here? Thank you.
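One debugging step I can take (a sketch; the container name rabbitmq-deployment is taken from the describe output above, and the rest of the StatefulSet pod template is omitted) is to temporarily override the container command so the pod stays up without starting RabbitMQ:
containers:
- name: rabbitmq-deployment
  # temporary override: keep the container alive without starting RabbitMQ
  command: ["tail", "-f", "/dev/null"]
With the pod held up like that, kubectl exec -it rabbitmq-0 -- rabbitmq-server should reproduce the crash interactively and show any output that precedes the segmentation fault.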
EDIT:
kubectl get all gives:
NAME READY STATUS RESTARTS AGE
pod/auth-deployment-9cfd4c64f-c5v99 1/1 Running 0 19m
pod/config-deployment-d4f4c959c-dnspd 1/1 Running 0 20m
pod/rabbitmq-0 0/1 CrashLoopBackOff 8 (4m45s ago) 20m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/auth-service ClusterIP 10.101.181.223 <none> 8080/TCP 19m
service/config-service ClusterIP 10.98.208.163 <none> 8080/TCP 20m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/auth-deployment 1/1 1 1 19m
deployment.apps/config-deployment 1/1 1 1 20m
NAME DESIRED CURRENT READY AGE
replicaset.apps/auth-deployment-9cfd4c64f 1 1 1 19m
replicaset.apps/config-deployment-d4f4c959c 1 1 1 20m
NAME READY AGE
statefulset.apps/rabbitmq 0/1 20m
Related
First of all, let me thank you for this amazing guide. I'm very new to Kubernetes, and having a guide like this to follow helps a lot when trying to set up my first cluster!
That said, I'm having some issues creating deployments, as there are pods that aren't being created and remain stuck in the state ContainerCreating.
[root@master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready control-plane 25h v1.24.0
node1 Ready <none> 24h v1.24.0
node2 Ready <none> 24h v1.24.0
[root@master ~]# kubectl cluster-info
Kubernetes control plane is running at https://192.168.3.200:6443
CoreDNS is running at https://192.168.3.200:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
The problem:
[root@master ~]# kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system pod/coredns-6d4b75cb6d-v5pvk 0/1 ContainerCreating 0 114m
kube-system pod/coredns-7599c5f99f-q6nwq 0/1 ContainerCreating 0 114m
kube-system pod/coredns-7599c5f99f-sg4wn 0/1 ContainerCreating 0 114m
kube-system pod/etcd-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-apiserver-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-controller-manager-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-proxy-ftxzx 1/1 Running 2 (3h11m ago) 24h
kube-system pod/kube-proxy-pcl8q 1/1 Running 3 (3h26m ago) 25h
kube-system pod/kube-proxy-q7dpw 1/1 Running 2 (3h23m ago) 24h
kube-system pod/kube-scheduler-master 1/1 Running 3 (3h26m ago) 25h
kube-system pod/weave-net-2p47z 2/2 Running 5 (3h23m ago) 24h
kube-system pod/weave-net-k5529 2/2 Running 4 (3h11m ago) 24h
kube-system pod/weave-net-tq4bs 2/2 Running 7 (3h26m ago) 25h
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 25h
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 25h
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/kube-proxy 3 3 3 3 3 kubernetes.io/os=linux 25h
kube-system daemonset.apps/weave-net 3 3 3 3 3 <none> 25h
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
kube-system deployment.apps/coredns 0/2 2 0 25h
NAMESPACE NAME DESIRED CURRENT READY AGE
kube-system replicaset.apps/coredns-6d4b75cb6d 1 1 0 25h
kube-system replicaset.apps/coredns-7599c5f99f 2 2 0 116m
Note that the first three pods, from coredns, fail to start.
[root@master ~]# kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
93m Warning FailedCreatePodSandBox pod/nginx-deploy-99976564d-s4shk (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "fd79c77289f42b3cb0eb0be997a02a42f9595df061deb6e2d3678ab00afb5f67": failed to find network info for sandbox "fd79c77289f42b3cb0eb0be997a02a42f9595df061deb6e2d3678ab00afb5f67"
[root@master ~]# kubectl describe pod coredns-6d4b75cb6d-v5pvk -n kube-system
Name: coredns-6d4b75cb6d-v5pvk
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: node2/192.168.3.202
Start Time: Thu, 12 May 2022 19:45:58 +0000
Labels: k8s-app=kube-dns
pod-template-hash=6d4b75cb6d
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/coredns-6d4b75cb6d
Containers:
coredns:
Container ID:
Image: k8s.gcr.io/coredns/coredns:v1.8.6
Image ID:
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4bpvz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
kube-api-access-4bpvz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 93s (x393 over 124m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "7d0f8f4b3dbf2dffcf1a8c01b41368e16b1f80bc97ff3faa611c1fd52c0f6967": failed to find network info for sandbox "7d0f8f4b3dbf2dffcf1a8c01b41368e16b1f80bc97ff3faa611c1fd52c0f6967"
Versions:
[root@master ~]# docker --version
Docker version 20.10.15, build fd82621
[root@master ~]# kubelet --version
Kubernetes v1.24.0
[root@master ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:44:24Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
I have no idea where to go from here. I googled keywords like "rpc error weave k8s" and "Failed to create pod sandbox: rpc error", but none of the results I found solved my problem. I saw some issues mentioning Weave Net; could that be the problem? Maybe I got something wrong, but I'm fairly sure I followed the instructions closely.
Any help would be greatly appreciated!
Looks like you got pretty far! Support for Docker as a container runtime (dockershim) was dropped in Kubernetes 1.24.0. I can't tell from your output whether that is what you are using, but if it is, that could be your problem.
https://kubernetes.io/blog/2022/05/03/kubernetes-1-24-release-announcement/
You could switch to containerd for your container runtime, but for the purposes of learning you could instead try the latest 1.23.x version of Kubernetes. Get that working, then circle back and tackle containerd with Kubernetes v1.24.0.
You can still use Docker on your laptop/desktop, but on the k8s servers you will not be able to use Docker as the runtime on 1.24.x or later.
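You can check which runtime each node is currently using with kubectl get nodes -o wide (look at the CONTAINER-RUNTIME column). If you do go the containerd route, a minimal kubeadm configuration fragment that points the node at the containerd socket looks roughly like this (a sketch, assuming the default socket path; worker nodes would carry the same nodeRegistration block under a JoinConfiguration):
# sketch: tell kubeadm/kubelet to use containerd instead of the removed dockershim
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock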
Hope that helps and good luck!
I have my own Kubernetes environment with 3 nodes. I'm trying to deploy a basic Ubuntu Docker image from Docker Hub, but it gives an error like the one below.
First, I followed the instructions at the link below, and as far as I can tell everything there went fine.
https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/#create-a-secret-in-the-cluster-that-holds-your-authorization-token
Then I created a basic Ubuntu pod YAML file:
apiVersion: v1
kind: Pod
metadata:
  name: private-reg
spec:
  containers:
  - name: private-reg-container
    image: ubuntu:21.04
  imagePullSecrets:
  - name: regcred
But after I apply it, the pod's status stays "ContainerCreating" and it gives an error like the one below:
[root@user]# kubectl describe pod private-reg
Name: private-reg
Namespace: kube-system
Priority: 0
Node: server3/10.100.2.183
Start Time: Tue, 08 Dec 2020 14:44:41 +0200
Labels: <none>
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
private-reg-container:
Container ID:
Image: ubuntu:21.04
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-57296 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-57296:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-57296
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18s default-scheduler Successfully assigned kube-system/private-reg to server3
Warning FailedCreatePodSandBox 74s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "aac36b2a8b1af982eeea04bd914c3614ffda31c7881822c2d5dd335780335cf0" network for pod "private-reg": networkPlugin cni failed to set up pod "private-reg_kube-system" network: error getting ClusterInformation: Get "https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), failed to clean up sandbox container "aac36b2a8b1af982eeea04bd914c3614ffda31c7881822c2d5dd335780335cf0" network for pod "private-reg": networkPlugin cni failed to teardown pod "private-reg_kube-system" network: error getting ClusterInformation: Get "https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")]
Normal SandboxChanged 60s (x2 over 74s) kubelet Pod sandbox changed, it will be killed and re-created.
The pods look like this:
[root@server ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
coredns-f9fd979d6-gsksj 1/1 Running 3 35d
coredns-f9fd979d6-r5gr8 1/1 Running 3 35d
etcd-serverkube1 1/1 Running 4 35d
kube-apiserver-serverkube1 1/1 Running 5 35d
kube-controller-manager-srvkube1 1/1 Running 4 35d
kube-proxy-8ngxt 1/1 Running 1 22d
kube-proxy-lv5b4 1/1 Running 4 35d
kube-proxy-xfswt 1/1 Running 2 22d
kube-scheduler-srvkube1 1/1 Running 4 35d
ubuntu 1/1 Running 5 26d
weave-net-749h4 2/2 Running 2 22d
weave-net-cpj5h 2/2 Running 11 35d
weave-net-fpjqp 2/2 Running 5 22d
How can I solve this problem? Thanks!
I want to trigger a manual workflow in Argo. I am using OpenShift and Argo CD, and I have scheduled workflows that run successfully in Argo, but a manual run of one workflow fails.
The workflow in question is:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: "obslytics-data-exporter-manual-workflow-"
  labels:
    owner: "obslytics-remote-reader"
    app: "obslytics-data-exporter"
    pipeline: "obslytics-data-exporter"
spec:
  arguments:
    parameters:
    - name: start_timestamp
      value: "2020-11-18T20:00:00Z"
  entrypoint: manual-trigger
  templates:
  - name: manual-trigger
    steps:
    - - name: trigger
        templateRef:
          name: "obslytics-data-exporter-workflow-triggers"
          template: trigger-workflow
  volumes:
  - name: "obslytics-data-exporter-workflow-secrets"
    secret:
      secretName: "obslytics-data-exporter-workflow-secrets"
When I run the command:
argo submit trigger.local.yaml
The build pod completes, but the rest of the pods fail:
➜ dh-workflow-obslytics git:(master) ✗ oc get pods
NAME READY STATUS RESTARTS AGE
argo-ui-7fcf5ff95-9k8cc 1/1 Running 0 3d
gateway-controller-76bb888f7b-lq84r 1/1 Running 0 3d
obslytics-data-exporter-1-build 0/1 Completed 0 3d
obslytics-data-exporter-calendar-gateway-fbbb8d7-zhdnf 2/2 Running 1 3d
obslytics-data-exporter-manual-workflow-m7jdg-1074461258 0/2 Error 0 4m
obslytics-data-exporter-manual-workflow-m7jdg-1477271209 0/2 Error 0 4m
obslytics-data-exporter-manual-workflow-m7jdg-1544087495 0/2 Error 0 4m
obslytics-data-exporter-manual-workflow-m7jdg-1979266120 0/2 Completed 0 4m
obslytics-data-exporter-sensor-6594954795-xw8fk 1/1 Running 0 3d
opendatahub-operator-8994ddcf8-v8wxm 1/1 Running 0 3d
sensor-controller-58bdc7c4f4-9h4jw 1/1 Running 0 3d
workflow-controller-759649b79b-s69l7 1/1 Running 0 3d
The pods whose names start with obslytics-data-exporter-manual-workflow are the ones failing. When I attempt to debug by describing one of them:
➜ dh-workflow-obslytics git:(master) ✗ oc describe pods/obslytics-data-exporter-manual-workflow-4hzqz-3278280317
Name: obslytics-data-exporter-manual-workflow-4hzqz-3278280317
Namespace: dh-dev-argo
Priority: 0
PriorityClassName: <none>
Node: avsrivas-dev-ocp-3.11/10.0.111.224
Start Time: Tue, 24 Nov 2020 07:27:57 -0500
Labels: workflows.argoproj.io/completed=true
workflows.argoproj.io/workflow=obslytics-data-exporter-manual-workflow-4hzqz
Annotations: openshift.io/scc=restricted
workflows.argoproj.io/node-message=timeout after 0s
workflows.argoproj.io/node-name=obslytics-data-exporter-manual-workflow-4hzqz[0].trigger[1].run[0].metric-split(0:cluster_version)[0].process-metric(0)
workflows.argoproj.io/template={"name":"run-obslytics","arguments":{},"inputs":{"parameters":[{"name":"metric","value":"cluster_version"},{"name":"start_timestamp","value":"2020-11-18T20:00:00Z"},{"na...
Status: Failed
IP: 10.128.0.69
Controlled By: Workflow/obslytics-data-exporter-manual-workflow-4hzqz
Init Containers:
init:
Container ID: docker://25b95c684ef66b13520ba9deeba353082142f3bb39bafe443ee508074c58047e
Image: argoproj/argoexec:v2.4.2
Image ID: docker-pullable://docker.io/argoproj/argoexec@sha256:4e393daa6ed985cf680bcf0ecf04f7b0758940f0789505428331fcfe99cce06b
Port: <none>
Host Port: <none>
Command:
argoexec
init
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 24 Nov 2020 07:27:59 -0500
Finished: Tue, 24 Nov 2020 07:27:59 -0500
Ready: True
Restart Count: 0
Environment:
ARGO_POD_NAME: obslytics-data-exporter-manual-workflow-4hzqz-3278280317 (v1:metadata.name)
ARGO_CONTAINER_RUNTIME_EXECUTOR: k8sapi
Mounts:
/argo/podmetadata from podmetadata (rw)
/argo/staging from argo-staging (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qpggm (ro)
Containers:
wait:
Container ID: docker://a94e7f1bc1cfec4c8b549120193b697c91760bb8f3af414babef1d6f7ccee831
Image: argoproj/argoexec:v2.4.2
Image ID: docker-pullable://docker.io/argoproj/argoexec@sha256:4e393daa6ed985cf680bcf0ecf04f7b0758940f0789505428331fcfe99cce06b
Port: <none>
Host Port: <none>
Command:
argoexec
wait
State: Terminated
Reason: Completed
Message: timeout after 0s
Exit Code: 0
Started: Tue, 24 Nov 2020 07:28:00 -0500
Finished: Tue, 24 Nov 2020 07:28:01 -0500
Ready: False
Restart Count: 0
Environment:
ARGO_POD_NAME: obslytics-data-exporter-manual-workflow-4hzqz-3278280317 (v1:metadata.name)
ARGO_CONTAINER_RUNTIME_EXECUTOR: k8sapi
Mounts:
/argo/podmetadata from podmetadata (rw)
/mainctrfs/argo/staging from argo-staging (rw)
/mainctrfs/etc/obslytics-data-exporter from obslytics-data-exporter-workflow-secrets (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qpggm (ro)
main:
Container ID: docker://<some_id>
Image: docker-registry.default.svc:5000/<some_id>
Image ID: docker-pullable://docker-registry.default.svc:5000/<some_id>
Port: <none>
Host Port: <none>
Command:
/bin/sh
-e
Args:
/argo/staging/script
State: Terminated
Reason: Error
Exit Code: 126
Started: Tue, 24 Nov 2020 07:28:01 -0500
Finished: Tue, 24 Nov 2020 07:28:01 -0500
Ready: False
Restart Count: 0
Limits:
memory: 1Gi
Requests:
memory: 1Gi
Environment: <none>
Mounts:
/argo/staging from argo-staging (rw)
/etc/obslytics-data-exporter from obslytics-data-exporter-workflow-secrets (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qpggm (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
podmetadata:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.annotations -> annotations
obslytics-data-exporter-workflow-secrets:
Type: Secret (a volume populated by a Secret)
SecretName: obslytics-data-exporter-workflow-secrets
Optional: false
argo-staging:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-qpggm:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-qpggm
Optional: false
QoS Class: Burstable
Node-Selectors: node-role.kubernetes.io/compute=true
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 27m default-scheduler Successfully assigned dh-dev-argo/obslytics-data-exporter-manual-workflow-4hzqz-3278280317 to avsrivas-dev-ocp-3.11
Normal Pulled 27m kubelet, avsrivas-dev-ocp-3.11 Container image "argoproj/argoexec:v2.4.2" already present on machine
Normal Created 27m kubelet, avsrivas-dev-ocp-3.11 Created container
Normal Started 27m kubelet, avsrivas-dev-ocp-3.11 Started container
Normal Pulled 27m kubelet, avsrivas-dev-ocp-3.11 Container image "argoproj/argoexec:v2.4.2" already present on machine
Normal Created 27m kubelet, avsrivas-dev-ocp-3.11 Created container
Normal Started 27m kubelet, avsrivas-dev-ocp-3.11 Started container
Normal Pulling 27m kubelet, avsrivas-dev-ocp-3.11 pulling image "docker-registry.default.svc:5000/dh-dev-argo/obslytics-data-exporter:latest"
Normal Pulled 27m kubelet, avsrivas-dev-ocp-3.11 Successfully pulled image "docker-registry.default.svc:5000/dh-dev-argo/obslytics-data-exporter:latest"
Normal Created 27m kubelet, avsrivas-dev-ocp-3.11 Created container
Normal Started 27m kubelet, avsrivas-dev-ocp-3.11 Started container
The only thing I learn from the description above is that the main container terminated with an error (exit code 126). I am unable to see the actual error message in order to debug this issue.
When I attempt to read the Argo watch logs:
Name: obslytics-data-exporter-manual-workflow-8wzcc
Namespace: dh-dev-argo
ServiceAccount: default
Status: Running
Created: Tue Nov 24 08:01:10 -0500 (8 minutes ago)
Started: Tue Nov 24 08:01:10 -0500 (8 minutes ago)
Duration: 8 minutes 10 seconds
Progress:
Parameters:
start_timestamp: 2020-11-18T20:00:00Z
STEP TEMPLATE PODNAME DURATION MESSAGE
● obslytics-data-exporter-manual-workflow-8wzcc manual-trigger
└───● trigger obslytics-data-exporter-workflow-triggers/trigger-workflow
├───✔ get-labels(0) obslytics-data-exporter-workflow-template/get-labels obslytics-data-exporter-manual-workflow-8wzcc-2604296472 6s
└───● run obslytics-data-exporter-workflow-template/init
└───● metric-split(0:cluster_version) metric-worker
└───● process-metric run-obslytics
├─✖ process-metric(0) run-obslytics obslytics-data-exporter-manual-workflow-8wzcc-4222496183 6s failed with exit code 126
└─◷ process-metric(1) run-obslytics obslytics-data-exporter-manual-workflow-8wzcc-531670266 7m PodInitializing
I have a question about pod termination when using the image kodekloud/throw-dice.
pod-definition.yml
apiVersion: v1
kind: Pod
metadata:
  name: throw-dice-pod
spec:
  containers:
  - image: kodekloud/throw-dice
    name: throw-dice
  restartPolicy: Never
I have checked the steps in the Dockerfile. It runs throw-dice.sh, which randomly returns a number between 1 and 6.
Let's say the container returns 3 the first time. How does the pod below get terminated? Where is the condition defined, at the pod level, that it is supposed to terminate if the number returned by the script is not 6?
The steps below were performed to run pod-definition.yml:
master $ kubectl create -f /root/throw-dice-pod.yaml
pod/throw-dice-pod created
master $ kubectl get pods
NAME READY STATUS RESTARTS AGE
throw-dice-pod 0/1 ContainerCreating 0 9s
master $ kubectl get pods
NAME READY STATUS RESTARTS AGE
throw-dice-pod 0/1 Error 0 12s
master $ kubectl describe pod throw-dice-pod
Name: throw-dice-pod
Namespace: default
Priority: 0
Node: node01/172.17.0.83
Start Time: Sat, 16 May 2020 11:20:01 +0000
Labels: <none>
Annotations: <none>
Status: Failed
IP: 10.88.0.4
IPs:
IP: 10.88.0.4
Containers:
throw-dice:
Container ID: docker://4560c794b58cf8f3e3fad691b2292e37db4e84e20c9286321f026d1735272b5f
Image: kodekloud/throw-dice
Image ID: docker-pullable://kodekloud/throw-dice@sha256:9c70a0f907b99293885a9591b6162e9ec89e127937626a97ca7f9f6be2d98b01
Port: <none>
Host Port: <none>
State: Terminated
Reason: Error
Exit Code: 1
Started: Sat, 16 May 2020 11:20:10 +0000
Finished: Sat, 16 May 2020 11:20:10 +0000
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-nr5kl (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-nr5kl:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-nr5kl
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/throw-dice-pod to node01
Normal Pulling 21s kubelet, node01 Pulling image "kodekloud/throw-dice"
Normal Pulled 19s kubelet, node01 Successfully pulled image "kodekloud/throw-dice"
Normal Created 19s kubelet, node01 Created container throw-dice
Normal Started 18s kubelet, node01 Started container throw-dice
master $ kubectl logs throw-dice-pod
2
The Dockerfile has ENTRYPOINT sh throw-dice.sh, which means the container executes the script and then terminates automatically. If you want the container to keep running, you need to start a long-running process, for example a Java process: ENTRYPOINT ["java", "-jar", "/whatever/your.jar"]
Exit code 1 here is set by the application.
Take a look at throw-dice.sh: the script shuffles an exit code among 0, 1 and 2, and the container always exits with that shuffled value.
If the exit code is 0, 6 is logged.
If the exit code is 1 or 2, a shuffled integer from 1-5 is logged.
So in your case, the shuffled exit code was 1 and the shuffled result written to the log was 2. Kubernetes treats the application's non-zero exit code 1 as the Error reason.
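To make that concrete: there is no "roll again until 6" condition at the pod level at all. Kubernetes only sees the container's exit code and the pod's restartPolicy, and with restartPolicy: Never a non-zero exit simply leaves the pod in Error. If you wanted the cluster to keep re-running the container until it exits 0, you would typically wrap it in a Job instead, roughly like this (a sketch; the Job name and backoffLimit value are illustrative):
# sketch: re-run the dice roll until the container exits 0
apiVersion: batch/v1
kind: Job
metadata:
  name: throw-dice-until-six
spec:
  backoffLimit: 10
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: throw-dice
        image: kodekloud/throw-dice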
I am practicing Kubernetes by following the ingress chapter. I am using a Google (GKE) cluster. The specifications are as follows:
master: 1.11.7-gke.4
node: 1.11.7-gke.4
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
gke-singh-default-pool-a69fa545-1sm3 Ready <none> 6h v1.11.7-gke.4 10.148.0.46 35.197.128.107 Container-Optimized OS from Google 4.14.89+ docker://17.3.2
gke-singh-default-pool-a69fa545-819z Ready <none> 6h v1.11.7-gke.4 10.148.0.47 35.198.217.71 Container-Optimized OS from Google 4.14.89+ docker://17.3.2
gke-singh-default-pool-a69fa545-djhz Ready <none> 6h v1.11.7-gke.4 10.148.0.45 35.197.159.75 Container-Optimized OS from Google 4.14.89+ docker://17.3.2
master endpoint: 35.186.148.93
DNS: singh.hbot.io (master IP)
To keep my question short, I post my source code in the snippets below, which link back to this question.
Files:
deployment.yaml
ingress.yaml
ingress-rules.yaml
Problem:
curl http://singh.hbot.io/webapp1 times out
Description
$ kubectl get deployment -n nginx-ingress
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
nginx-ingress 1 1 1 0 2h
The nginx-ingress deployment is not available.
$ kubectl describe deployment -n nginx-ingress
Name: nginx-ingress
Namespace: nginx-ingress
CreationTimestamp: Mon, 04 Mar 2019 15:09:42 +0700
Labels: app=nginx-ingress
Annotations: deployment.kubernetes.io/revision: 1
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"annotations":{},"name":"nginx-ingress","namespace":"nginx-ingress"},"s...
Selector: app=nginx-ingress
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: app=nginx-ingress
Service Account: nginx-ingress
Containers:
nginx-ingress:
Image: nginx/nginx-ingress:edge
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
Args:
-nginx-configmaps=$(POD_NAMESPACE)/nginx-config
-default-server-tls-secret=$(POD_NAMESPACE)/default-server-secret
Environment:
POD_NAMESPACE: (v1:metadata.namespace)
POD_NAME: (v1:metadata.name)
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: nginx-ingress-77fcd48f4d (1/1 replicas created)
Events: <none>
pods:
$ kubectl get pods --all-namespaces=true
NAMESPACE NAME READY STATUS RESTARTS AGE
default webapp1-7d67d68676-k9hhl 1/1 Running 0 6h
default webapp2-64d4844b78-9kln5 1/1 Running 0 6h
default webapp3-5b8ff7484d-zvcsf 1/1 Running 0 6h
kube-system event-exporter-v0.2.3-85644fcdf-xxflh 2/2 Running 0 6h
kube-system fluentd-gcp-scaler-8b674f786-gvv98 1/1 Running 0 6h
kube-system fluentd-gcp-v3.2.0-srzc2 2/2 Running 0 6h
kube-system fluentd-gcp-v3.2.0-w2z2q 2/2 Running 0 6h
kube-system fluentd-gcp-v3.2.0-z7p9l 2/2 Running 0 6h
kube-system heapster-v1.6.0-beta.1-5685746c7b-kd4mn 3/3 Running 0 6h
kube-system kube-dns-6b98c9c9bf-6p8qr 4/4 Running 0 6h
kube-system kube-dns-6b98c9c9bf-pffpt 4/4 Running 0 6h
kube-system kube-dns-autoscaler-67c97c87fb-gbgrs 1/1 Running 0 6h
kube-system kube-proxy-gke-singh-default-pool-a69fa545-1sm3 1/1 Running 0 6h
kube-system kube-proxy-gke-singh-default-pool-a69fa545-819z 1/1 Running 0 6h
kube-system kube-proxy-gke-singh-default-pool-a69fa545-djhz 1/1 Running 0 6h
kube-system l7-default-backend-7ff48cffd7-trqvx 1/1 Running 0 6h
kube-system metrics-server-v0.2.1-fd596d746-bvdfk 2/2 Running 0 6h
kube-system tiller-deploy-57c574bfb8-xnmtj 1/1 Running 0 1h
nginx-ingress nginx-ingress-77fcd48f4d-rfwbk 0/1 CrashLoopBackOff 35 2h
describe pod
$ kubectl describe pods -n nginx-ingress
Name: nginx-ingress-77fcd48f4d-5rhtv
Namespace: nginx-ingress
Priority: 0
PriorityClassName: <none>
Node: gke-singh-default-pool-a69fa545-djhz/10.148.0.45
Start Time: Mon, 04 Mar 2019 17:55:00 +0700
Labels: app=nginx-ingress
pod-template-hash=3397804908
Annotations: <none>
Status: Running
IP: 10.48.2.10
Controlled By: ReplicaSet/nginx-ingress-77fcd48f4d
Containers:
nginx-ingress:
Container ID: docker://5d3ee9e2bf7a2060ff0a96fdd884a937b77978c137df232dbfd0d3e5de89fe0e
Image: nginx/nginx-ingress:edge
Image ID: docker-pullable://nginx/nginx-ingress@sha256:16c1c6dde0b904f031d3c173e0b04eb82fe9c4c85cb1e1f83a14d5b56a568250
Ports: 80/TCP, 443/TCP
Host Ports: 0/TCP, 0/TCP
Args:
-nginx-configmaps=$(POD_NAMESPACE)/nginx-config
-default-server-tls-secret=$(POD_NAMESPACE)/default-server-secret
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Mon, 04 Mar 2019 18:16:33 +0700
Finished: Mon, 04 Mar 2019 18:16:33 +0700
Ready: False
Restart Count: 9
Environment:
POD_NAMESPACE: nginx-ingress (v1:metadata.namespace)
POD_NAME: nginx-ingress-77fcd48f4d-5rhtv (v1:metadata.name)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-token-zvcwt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nginx-ingress-token-zvcwt:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-token-zvcwt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 26m default-scheduler Successfully assigned nginx-ingress/nginx-ingress-77fcd48f4d-5rhtv to gke-singh-default-pool-a69fa545-djhz
Normal Created 25m (x4 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Created container
Normal Started 25m (x4 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Started container
Normal Pulling 24m (x5 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz pulling image "nginx/nginx-ingress:edge"
Normal Pulled 24m (x5 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Successfully pulled image "nginx/nginx-ingress:edge"
Warning BackOff 62s (x112 over 26m) kubelet, gke-singh-default-pool-a69fa545-djhz Back-off restarting failed container
Fix: container terminated
Add the following command to ingress.yaml to prevent the container from finishing its run and being terminated by k8s:
command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]
The ingress gets no IP address from GKE. Let me look at the details.
describe ingress:
$ kubectl describe ing
Name: webapp-ingress
Namespace: default
Address:
Default backend: default-http-backend:80 (10.48.0.8:8080)
Rules:
Host Path Backends
---- ---- --------
*
/webapp1 webapp1-svc:80 (<none>)
/webapp2 webapp2-svc:80 (<none>)
webapp3-svc:80 (<none>)
Annotations:
kubectl.kubernetes.io/last-applied-configuration: {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{},"name":"webapp-ingress","namespace":"default"},"spec":{"rules":[{"http":{"paths":[{"backend":{"serviceName":"webapp1-svc","servicePort":80},"path":"/webapp1"},{"backend":{"serviceName":"webapp2-svc","servicePort":80},"path":"/webapp2"},{"backend":{"serviceName":"webapp3-svc","servicePort":80}}]}}]}}
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Translate 7m45s (x59 over 4h20m) loadbalancer-controller error while evaluating the ingress spec: service "default/webapp1-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp2-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp3-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"
From this line I got the ultimate solution, thanks to Christian Roy. Thank you very much.
Fix the ClusterIP type
ClusterIP is the default value, so I have to edit my manifest file to use NodePort as follows:
apiVersion: v1
kind: Service
metadata:
  name: webapp1-svc
  labels:
    app: webapp1
spec:
  type: NodePort
  ports:
  - port: 80
  selector:
    app: webapp1
And that is it.
The answer is in your question. The describe of your ingress shows the problem.
You did kubectl describe ing and the last part of that output was:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Translate 7m45s (x59 over 4h20m) loadbalancer-controller error while evaluating the ingress spec: service "default/webapp1-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp2-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"; service "default/webapp3-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"
The important part is:
error while evaluating the ingress spec: service "default/webapp1-svc" is type "ClusterIP", expected "NodePort" or "LoadBalancer"
Solution
Just change all your services to be of type NodePort and it will work.
I also have to add a command so that the container does not finish and exit.
command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]