I am testing an OpenShift v3 Starter cluster (ca-central-1) and created a project from a custom Docker image stream (from GitHub). It was running fine, but after I changed a config map, scaled the deployment down to 0 pods, and scaled it back up to 1 pod, OpenShift can no longer start any pods.
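For reference, a rough sketch of the steps (assuming a DeploymentConfig named hass, inferred from the pod name below; the config map name is a placeholder):

    oc edit configmap my-config     # placeholder config map name
    oc scale dc/hass --replicas=0
    oc scale dc/hass --replicas=1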
The error in the web interface (in the Events tab) is:
Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container
for pod "hass-19-98vws": Error response from daemon: grpc: the connection is unavailable.
Pod sandbox changed, it will be killed and re-created.
These messages appear in an endless loop. I tried to start a new deployment, but it gives the same logs.
What am I doing wrong?
OK, it seems that I was affected by a cluster upgrade. The issue resolved itself after two days.
Related
I have two AKS clusters installed, and in both I'm suddenly getting this error:
Waiting: ImagePullBackOff
POD Log: Internal error occurred: Authorization error (user=masterclient, verb=get, resource=nodes, subresource=proxy)
My ACR is connected fine with AKS, and I added the roles for AKS in ACR again. By the way, the error occurs on some random pod among the replicas, e.g. if I scale to 9 pods, it may fail at pod 7.
It's also happening while installing Kafka, so it's not just my container registry.
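For reference, re-attaching ACR was done roughly like this (a sketch; the cluster, resource group and registry names are placeholders):

    az aks update -n myAKSCluster -g myResourceGroup --attach-acr myRegistry
    # or grant the kubelet identity AcrPull on the registry directly
    az role assignment create --assignee <kubelet-client-id> --role AcrPull \
        --scope $(az acr show -n myRegistry --query id -o tsv)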
I have deployed different microservices in the cluster and am trying to use a log shipper as a sidecar in one of the services.
When I try to deploy the microservices, all of them come up, but one service's pod gets stuck in CrashLoopBackOff.
The service's pod contains two containers: one for the service itself and the other for the logshipper sidecar.
The error message from the logshipper is as below:
ERROR (catatonit:6): failed to exec pid1: No such file or directory
and the pod describe output shows:
Backoff 40s restarting failed container=logshipper
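For reference, this is how the sidecar's configured command can be compared against what the image actually contains (pod and image names are placeholders):

    kubectl get pod <logshipper-pod> -o jsonpath='{.spec.containers[?(@.name=="logshipper")].command}'
    kubectl get pod <logshipper-pod> -o jsonpath='{.spec.containers[?(@.name=="logshipper")].args}'
    # entrypoint/cmd baked into the image itself
    docker inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' <logshipper-image>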
Note: I disabled encryption on both clusters and I'm using a trial license. Both clusters' OneFS version is 9.1, and I also tried replicating between 9.1 and 9.4.
The SyncIQ log /var/log/isi_migrate.log shows the message "Killing policy: No node on source cluster was able to connect to target cluster."
When I try to start the job, I get the above error. Can you tell me what I'm missing, or is there some configuration that needs to be done before running the job?
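A sketch of the basic checks that seem relevant from the source cluster (the policy name and target address are placeholders; commands assume the standard OneFS CLI):

    isi sync policies list
    isi sync policies view <policy-name>
    # check that every source node can reach the target cluster address
    isi_for_array -s 'ping -c 2 <target-cluster-address>'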
Kubernetes version: v1.6.7
Network plugin: weave
I recently noticed that my entire cluster of 3 nodes went down. My initial troubleshooting revealed that /var on all nodes was 100% full.
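Roughly how that was confirmed, for reference:

    df -h /var
    du -xsh /var/log/* | sort -h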
Digging further into the logs revealed that they were flooded by the kubelet stating:
Jan 15 19:09:43 test-master kubelet[1220]: E0115 19:09:43.636001 1220 kuberuntime_gc.go:138] Failed to stop sandbox "fea8c54ca834a339e8fd476e1cfba44ae47188bbbbb7140e550d055a63487211" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "<TROUBLING_POD>-1545236220-ds0v1_kube-system" network: CNI failed to retrieve network namespace path: Error: No such container: fea8c54ca834a339e8fd476e1cfba44ae47188bbbbb7140e550d055a63487211
Jan 15 19:09:43 test-master kubelet[1220]: E0115 19:09:43.637690 1220 docker_sandbox.go:205] Failed to stop sandbox "fea94c9f46923806c177e4a158ffe3494fe17638198f30498a024c3e8237f648": Error response from daemon: {"message":"No such container: fea94c9f46923806c177e4a158ffe3494fe17638198f30498a024c3e8237f648"}
The <TROUBLING_POD>-1545236220-ds0v1 pod was created by a cronjob; due to some misconfiguration, those pods kept failing and more and more of them were being spun up.
So I deleted all the jobs and their related pods. That left a cluster with no jobs or pods related to my cronjob, yet I still see the same ERROR messages flooding the logs.
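For completeness, the cleanup was along these lines (the cronjob name and label selector are placeholders):

    kubectl delete cronjob <troubling-cronjob> -n kube-system
    kubectl delete jobs -n kube-system -l <job-label>
    kubectl get pods -n kube-system | grep <troubling-pod-prefix>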
I did:
1) Restart docker and kubelet on all nodes.
2) Restart the entire control plane.
3) Reboot all nodes.
But the logs are still being flooded with the same error messages, even though no such pods are being spun up anymore.
So I don't know how to stop the kubelet from throwing these errors.
Is there a way for me to reset the network plugin I am using, or something else I can do?
Check if the pod directory exists under /var/lib/kubelet
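For example, on an affected node (the dockershim checkpoint path below is an assumption for a Docker-based kubelet):

    # pod directories the kubelet still knows about
    ls /var/lib/kubelet/pods/
    # compare against pod UIDs the API server knows about
    kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.uid}{"\n"}{end}'
    # dockershim also keeps sandbox checkpoints; stale entries here may keep the teardown loop going (assumption)
    ls /var/lib/dockershim/sandbox/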
You're on a very old version of Kubernetes; upgrading will fix this issue.
I'm trying to run PySpark on a Kubernetes cluster on AWS.
I'm submitting to the cluster with the spark-submit command and viewing the results in the Kubernetes dashboard.
The driver pod is getting created fine, but the executors frequently fail to spin up, failing with either of the following errors:
Failed to pull image "docker.io/joemalt/[image-name]:[tag]": rpc error: code = Unknown desc = Error response from daemon: unauthorized: authentication required
Failed to pull image "docker.io/joemalt/[image name]:[tag]": rpc error: code = Unknown desc = Error response from daemon: error parsing HTTP 404 response body: invalid character 'p' after top-level value: "404 page not found\n"
Kubernetes attempts to recreate the pods, but the errors are frequent enough that it often doesn't manage to get any executor pods working at all.
Neither of these errors occurs when setting up the driver pod, or when pulling the image manually. The repository is public, so the "authentication required" error in particular doesn't make any sense to me. I've tried replacing the Kubernetes cluster, with no success.
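For reference, the submit command looks roughly like this (a sketch; the property name spark.kubernetes.container.image assumes Spark's native Kubernetes support in 2.3+, while the older spark-on-k8s fork used spark.kubernetes.executor.docker.image instead):

    spark-submit \
      --master k8s://https://<api-server-host>:443 \
      --deploy-mode cluster \
      --conf spark.executor.instances=3 \
      --conf spark.kubernetes.container.image=docker.io/joemalt/[image-name]:[tag] \
      local:///opt/spark/examples/src/main/python/pi.py

    # sanity check from a cluster node: can the node itself pull the image?
    docker pull docker.io/joemalt/[image-name]:[tag]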