How to fix the catatonit error received in pods?

I have deployed several microservices in a cluster and am trying to use a log shipper as a sidecar in one of the services.
When I deploy the microservices, all of them come up except one, whose pod gets stuck in CrashLoopBackOff.
That pod contains two containers: one for the service itself and one for the logshipper sidecar.
The error message from the logshipper container is:
ERROR (catatonit:6): failed to exec pid1: No such file or directory
and describing the pod shows:
Backoff 40s restarting failed container=logshipper

Related

Azure AKS - Internal error occurred: Authorization error (user=masterclient, verb=get, resource=nodes, subresource=proxy)

I have two AKS clusters, and on both of them
I am suddenly getting this error:
Waiting: ImagePullBackOff
POD Log: Internal error occurred: Authorization error (user=masterclient, verb=get, resource=nodes, subresource=proxy)
My ACR is connected to AKS correctly, and I have re-added the AKS roles in ACR. The error occurs on a random pod during scaling; for example, when scaling to 9 pods, it may fail at pod 7.
It also happens while installing Kafka, so the problem is not limited to my container registry.

Docker Engine Fails to start on Windows Server 2019

Our application is Docker-based and, since it is a web service, requires a NAT network on the host machine in order to communicate. It had been working for the last 4 months and then suddenly stopped. I checked and found that the Docker service was stopped. I tried restarting the service manually, but it failed to start. The event log shows the error below:
Error:
fatal: failed to start deamon: Error initializing network controller: Error creating default network: failed during hnsCallRawResponse: hnsCall failed in Win32: There are no more endpoints available from endpoint mapper. (0x6d9)
I tried the following steps:
Deleted hns.data and restarted the hns service, then restarted the Docker engine service. The issue persists.
Ran MOFCOMP. Same issue.
Removed Docker and reinstalled it. That did not help.
Tried creating the NAT network manually, but got the same error as above.
Can someone help here? What should be checked, and what could be causing this issue?

How can I investigate what is wrong with my SCDF configuration when I get "Failed to create stream"?

I am trying to deploy my first stream app via the Spring Cloud Data Flow dashboard, but I keep getting the "Failed to create stream" error in the UI. Can someone help me investigate what might be wrong?
I am running SCDF on Kubernetes, and my deployment consists of the following components:
scdf-server
skipper
mariadb
rabbitmq
My stream is the simple time | log example.
Try using kubectl on the scdf-server pod to see if it provides any information. I've seen that error occur if an app I deployed was not accessible - in my case, I'd referenced it by an incorrect filepath which didn't get caught by the server until it tried to deploy the stream.
It could be failing at any point in the deployment. To gain some insight, you can view the events and logs of each pod with the following commands:
kubectl describe pods/<pod-name>
kubectl logs pods/<pod-name>
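To run the two commands above against every pod in a namespace, something like the following may help. It is only a sketch: the function name dump_pod_diagnostics, the "default" fallback namespace, and the 50-line log tail are assumptions made for illustration, not part of SCDF or of the answer above.

```shell
#!/usr/bin/env bash
# Sketch: print events and recent logs for every pod in one namespace.
# dump_pod_diagnostics is a hypothetical helper name; the "default"
# namespace fallback and the 50-line tail are arbitrary choices.
dump_pod_diagnostics() {
  local namespace="${1:-default}"
  local pod
  # "-o name" prints one "pod/<name>" entry per line.
  for pod in $(kubectl get pods -n "$namespace" -o name); do
    echo "=== ${pod} ==="
    # Events (scheduling, image pulls, restarts) appear at the end of describe.
    kubectl describe -n "$namespace" "$pod"
    # --all-containers includes sidecars; ignore failures for pods that
    # have not produced any logs yet.
    kubectl logs -n "$namespace" "$pod" --all-containers=true --tail=50 || true
  done
}
```

After sourcing the file, call it with the namespace that holds scdf-server and skipper, for example: dump_pod_diagnostics <namespace>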

kubelet logs flooding even after pods deleted

Kubernetes version : v1.6.7
Network plugin : weave
I recently noticed that my entire cluster of 3 nodes went down. Initial troubleshooting revealed that /var on all nodes was 100% full.
Digging further into the logs revealed that they were flooded by kubelet messages such as:
Jan 15 19:09:43 test-master kubelet[1220]: E0115 19:09:43.636001 1220 kuberuntime_gc.go:138] Failed to stop sandbox "fea8c54ca834a339e8fd476e1cfba44ae47188bbbbb7140e550d055a63487211" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "<TROUBLING_POD>-1545236220-ds0v1_kube-system" network: CNI failed to retrieve network namespace path: Error: No such container: fea8c54ca834a339e8fd476e1cfba44ae47188bbbbb7140e550d055a63487211
Jan 15 19:09:43 test-master kubelet[1220]: E0115 19:09:43.637690 1220 docker_sandbox.go:205] Failed to stop sandbox "fea94c9f46923806c177e4a158ffe3494fe17638198f30498a024c3e8237f648": Error response from daemon: {"message":"No such container: fea94c9f46923806c177e4a158ffe3494fe17638198f30498a024c3e8237f648"}
The <TROUBLING_POD>-1545236220-ds0v1 pod was started by a cronjob. Because of some misconfigurations, those pods hit errors while running, and more pods kept being spun up.
So I deleted all the jobs and their related pods. The cluster now has no jobs or pods related to my cronjob, yet I still see the same error messages flooding the logs.
I tried the following:
1) Restarted Docker and kubelet on all nodes.
2) Restarted the entire control plane.
3) Rebooted all nodes.
But the logs are still being flooded with the same error messages, even though no such pods are being spun up any more.
So I don't know how to stop kubelet from throwing these errors.
Is there a way to reset the network plugin I am using, or is there something else I can do?
Check if the pod directory exists under /var/lib/kubelet
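A minimal sketch of that check, assuming the kubelet uses its default root directory and keeps per-pod state in a pods/ subdirectory with one directory per pod UID. The function name list_kubelet_pod_dirs is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the pod UID directories that still exist on this node.
# Assumes the default kubelet root (/var/lib/kubelet); pass another
# path as the first argument if the kubelet was started with a
# different root directory.
list_kubelet_pod_dirs() {
  local root="${1:-/var/lib/kubelet}"
  local dir
  # The trailing slash makes the glob match directories only.
  for dir in "$root"/pods/*/; do
    # If nothing matched, the pattern is left as-is and is not a directory.
    [ -d "$dir" ] || continue
    basename "$dir"
  done
}
```

Run it on each node and compare the printed UIDs with the pods the API server still knows about; a directory whose UID matches no existing pod is a candidate leftover from the deleted cronjob pods.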
You're on a very old version of Kubernetes; upgrading will fix this issue.

Openshift cannot create any pods

I am testing an OpenShift v3 starter cluster (ca-central-1) and created a project from a custom Docker image stream (from GitHub). It was running fine, but after I changed a config map, scaled the deployment down to 0 pods, and scaled it back up to 1 pod, OpenShift can no longer start any pods.
The error shown in the web interface (Events tab) is:
Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "hass-19-98vws": Error response from daemon: grpc: the connection is unavailable.
Pod sandbox changed, it will be killed and re-created.
These messages appear in an endless loop. I tried creating a new deployment, but it produces the same events.
What am I doing wrong?
OK, it seems that I was affected by a cluster upgrade. The issue resolved itself after 2 days.
