Pods are not generating application logs continuously - docker

I have pods running in my production cluster. The pods suddenly stopped generating logs. When I restart a pod, it generates logs for some time and then stops again. Can anyone tell me what the issue could be? I also notice that Docker containers have logging driver size options (max-file and max-size); how do we specify those in the deployment YAML file? Will increasing the values help?
Will specifying the TTY=false option help in this case?
I have tried restarting the pod too.
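For what it's worth, max-size and max-file are options of Docker's json-file logging driver, so they are configured on each node's Docker daemon (typically in /etc/docker/daemon.json) or per container with docker run, not in the Deployment YAML. A minimal sketch, assuming the default json-file driver (the values are only placeholders):

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  }
}

The Docker daemon has to be restarted after changing this file. Note that rotation only limits how much log history is kept; it will not, by itself, explain an application that stops writing logs.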

Related

How to check failed container logs in Kubernetes

Before I can check the logs, pods are failing and being removed by Jenkins, and I am unable to see the logs.
How can I check the logs of pods that have been removed?
Is there any simple way to save the logs in Kubernetes?
I don't have any logging system for my Kubernetes cluster.
Within a fraction of a second it keeps creating and deleting pods because of some error, and I want to find out what the error is. Before I can check the logs, the container name has changed.
Thanks,
Most probably you meant "pods are failing and removed by Kubernetes and I am unable to see the logs." It is Kubernetes itself that manages API objects, not Jenkins.
Answering your question directly: you are not able to fetch any logs from any of your containers once the related Pod has been deleted. Deleting a Pod means wiping all of the Pod's containers along with their data, so the logs were deleted the moment your Pod was terminated.
By default, if a container restarts, the kubelet keeps one terminated container with its logs. If a pod is evicted from the node, all corresponding containers are also evicted, along with their logs.
If your pod were still alive you would be able to use the --previous flag to check the logs, but unfortunately that's not your case.
There are a lot of similar questions, and the main suggestion is always to set up a log aggregation system that stores the logs separately. In that case you won't lose them and will at least be able to check them.
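For reference, as long as the Pod object itself still exists, the logs of the previously terminated container can be fetched like this (pod and container names are placeholders):

kubectl logs <pod-name> --previous
kubectl logs <pod-name> -c <container-name> --previous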
Logging at the node level
Cluster-level logging architectures
How to see logs of terminated pods
How to access Logs of Pods in Kubernetes after its deletion

Jenkins slave pods on Kubernetes disappear when there is an influx of running pods

I have a Kubernetes cluster running Jenkins master in a single pod and each build running in a separate slave pod. When there are many builds running, there are many pods being spun up and down and often I will see an error in a job like this:
Cannot contact slave-jenkins-0g9p0: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel#197b6a38:JNLP4-connect connection from 10.10.3.90/10.10.3.90:54418": Remote call on JNLP4-connect connection from 10.10.3.90/10.10.3.90:54418 failed. The channel is closing down or has closed down
Could not connect to slave-jenkins-0g9p0 to send interrupt signal to process
The pod, for example slave-jenkins-0g9p0, just disappears. There is no trace that it existed. While watching information like kubectl describe pod slave-jenkins-0g9p0, there is no error message; it simply stops existing.
I have a feeling that, because there are multiple pods spinning up and down, Kubernetes attempts to balance the load on the nodes and reschedules the pod, but after killing it, it cannot spin the pod up on another node. I cannot be sure, though. Maybe there is a way to tell K8s to tie a pod to a node until it exits on its own? I'm not really sure what/how to debug this case.
Kubernetes version: v1.16.13-eks-2ba888 on AWS EKS
Jenkins version: 2.257
Kubernetes plugin version 1.27.2
Any advice would be appreciated.
Thanks
UPDATE:
I have uploaded three slave pod manifest examples here where you can see the resources allocated. The above issue occurs in each of these running pods.
The node pool is controlled by the Kubernetes autoscaler (v1.14.6) and uses AWS t3a.large (2 CPU, 8GB mem) instances.
UPDATE 2:
I believe that I have found the cause of the problem. I disabled the cluster-autoscaler (https://github.com/kubernetes/autoscaler) (v1.14.6) and the problem stopped.
So what seems to be happening is that the autoscaler is removing the node that the slave pod is running on. I know that taints can be used to tell the autoscaler not to remove a node, but is there a way to do this dynamically, so that it won't remove a node if a certain pod is running on it, without having to develop something new?
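One option, if I read the cluster-autoscaler FAQ correctly, is the safe-to-evict annotation: the autoscaler will not scale down a node that hosts a pod carrying it. A sketch of how it might look in the slave pod template (names and image are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: jenkins-slave
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
  - name: jnlp
    image: jenkins/inbound-agent

The Jenkins Kubernetes plugin should let you add such annotations to the agent pod template, so this can be applied only to build pods rather than to whole nodes.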

Docker desktop - kubernetes failed to start

I have installed Docker Desktop (version : 2.3.0.4) and enabled Kubernetes.
I deployed a couple of Pods and everything was working fine. Since yesterday I have been facing the weird issue mentioned below:
Unable to connect to the server: dial tcp 127.0.0.1:6443: connectex: No
connection could be made because the target machine actively refused it.
No changes were otherwise made to my system. I am using Linux Containers on a Windows 10 machine.
Following steps I have tried:
Restarted the Docker Desktop
Tried the same with minikube and Docker Desktop both
Tried to disable the firewall but due to some permissions, I am not able to turn it off.
I have reset the kubernetes cluster as well.
I tried numerous different changes to fix Docker Desktop Kubernetes failing to start. What finally worked for me was this:
Click the troubleshooting icon (it's a bug icon) and then choose Clean/Purge Data.
Finally, I found the solution for this.
The VPN was causing the issue. I am using my office laptop, and after the restart the VPN was enabled and logged in, and because of this Kubernetes was not working.
After disabling the VPN, the Kubernetes cluster is working fine.
Hope that helps others as well.
For me, just "Clean and Purge" wasn't enough. Here is what I did.
Log off VPN
Go to bug and "Clean and Purge Data"
Also choose "Reset to Factory Defaults"
Restart Docker Desktop
Choose "Enable Kubernetes"
At this point, "Starting" took a while for Kubernetes to be enabled. Now it's all good.
$ kubectl get namespace
NAME STATUS AGE
default Active 80s
kube-node-lease Active 82s
kube-public Active 82s
kube-system Active 82s
I tried Clean/Purge Data and resetting to factory settings, but that didn't work.
I had to reset the Kubernetes cluster from here.
In my case, the corporate proxy server caused the Kubernetes startup to fail. Adding *.docker.internal to the no_proxy hosts solved the issue.
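If it helps, the same thing can be expressed on Windows either in Docker Desktop's proxy settings or via an environment variable; a rough PowerShell sketch (the host list is just an example):

[Environment]::SetEnvironmentVariable("NO_PROXY", "*.docker.internal,localhost,127.0.0.1", "User")

Docker Desktop has to be restarted afterwards so it picks up the new value.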
I had a similar problem.
Install Minikube
I installed Minikube and ran it as follows on Windows 10.
(screenshot: starting kubectl)
Then I gave permission for Docker.
Check cluster-info
When I checked cluster-info, the result was as follows.
(screenshot: cluster-info results)
Try to get pods
When I tried to get pods, I did not get any error.
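Roughly, the commands behind those screenshots would look something like the following (the driver flag is an assumption; adjust it to your setup):

minikube start --driver=docker
kubectl cluster-info
kubectl get pods -A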
As #N-ate mentioned above, after clicking Clean/Purge Data, which removes all downloaded images from my computer, Docker and Kubernetes are now running properly.
As you can see in the image below, I only have Kubernetes images running on Docker and they take most of the allocated memory. I guess the failure to start Kubernetes was related to this memory issue.
In my case, Kubernetes (Docker Desktop on Mac) was not running properly even though I could manage Pods, Services, etc. When I opened Docker Desktop, it said:
Kubernetes failed to start (red background)
I managed to fix the issue by resetting Docker Desktop and pruning/cleaning the storage.
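If anyone prefers to do the storage clean-up from the command line instead, something like this should be equivalent (destructive: it removes unused images and volumes):

docker system prune -a --volumes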
I also had a similar problem after updating to Docker Desktop (version 4.11.1). After I downgraded the version, it worked fine.
Troubleshooting steps
Check whether there are any errors by running the following command:
kubectl get events | grep node
and make sure all pods are in the Running state:
kubectl get pods --namespace kube-system
I don't know about others, but for some reason the options suggested above didn't work for me when fixing K8s on Docker Desktop on Windows. I tried cleaning the cluster, resetting to defaults, restarting the PC, installing previous versions of Docker Desktop, enabling my PC's hypervisor and giving it more resource priority, and more, but K8s still failed to start even though Docker started.
I chanced on Minikube as an alternative tool (without a UI) to create my cluster and interacted with it using kubectl.
And K8s worked for me locally.
I followed this guide - https://minikube.sigs.k8s.io/docs/start/
My Docker Desktop is running behind the company proxy server.
I deleted the following proxy environment variables from my Windows OS:
HTTPS_PROXY: serveraddress
HTTP_PROXY: serveraddress
and I set up a manual proxy in Docker Desktop instead.
My steps:
Restart Docker - it didn't help.
Reset Kubernetes - it didn't help.
Add the missing '.wslconfig' file to C:\Users\[MY USER] - it didn't help.
Restart the computer between any of the steps - it didn't help.
Stop using WSL, then re-enable WSL - it didn't help.
Uninstall Docker, install it again and enable Kubernetes - it didn't help.
Remove the '.kube' folder from C:\Users\[MY USER] and reset Kubernetes - this caused Kubernetes to try to stop, and after the failure I restarted Docker, which succeeded.

Cannot deploy apps in Openshift Online: `Error syncing pod: FailedSync`

So I'm new to OpenShift Online, and I'm looking to deploy a test image that, when run, simply executes a native C++ binary that prints Hello world!.
After pushing the built Docker image to Docker Hub and creating an app that uses that image, I waited for it to deploy. At some point in the process, a warning event arises stating:
Error syncing pod, Reason: FailedSync
Then, the deployment stalls at pending until the deadline runs out, and it reports the deployment failed.
As far as I know, I can't have done anything wrong. I simply created an app with the default settings that uses an image.
The only thing that could be happening is that the image runs as root, which, upon creating the app, caused a warning.
WARNING: Image "me/blahblah:test" runs as the 'root' user which may not be permitted by your cluster administrator.
However, this doesn't seem to be causing the problem, since the app hasn't even been deployed by the time the process stalls and reaches the deadline.
I'll add any extra information that could lead to the problem being solved.
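In case the root-user warning does turn out to matter, here is a minimal sketch of a Dockerfile that runs the binary as a non-root user (base image, paths and UID are only placeholders):

FROM ubuntu:20.04
COPY hello /usr/local/bin/hello
RUN useradd -r -u 1001 appuser && chown 1001 /usr/local/bin/hello
USER 1001
CMD ["/usr/local/bin/hello"]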

Kubernetes Pod with Error Status Stuck Terminating

I have a Kubernetes Pod created by a StatefulSet (not sure if that matters). There are two containers in this pod. When one of the two containers fails and I use the get pods command, 1/2 containers are Ready and the Status is "Error". The second container never attempts a restart, and I am unable to destroy the pod except by using the --grace-period=0 --force flags. A typical delete leaves the pod hanging in a "Terminating" state either forever or for a very, very long time. What could be causing this behavior, and how should I go about debugging it?
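For completeness, the forced delete mentioned above looks like this (the pod name is a placeholder):

kubectl delete pod <pod-name> --grace-period=0 --force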
I encounter a similar problem on a node in my k8s 1.6 cluster, especially when the node has been running for a couple of weeks. It can happen to any node. When this happens, I restart the kubelet on the node and the errors go away.
It's not the best thing to do, but it always solves the problem. It's also not detrimental to the cluster to restart the kubelet, because the running pods stay up.
kubectl get po -o wide will likely reveal to you that the errant pods are running on one node. SSH to that node and restart the kubelet.
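Something along these lines, assuming the node runs the kubelet under systemd (the node address is a placeholder and the restart command may differ per distro):

kubectl get po -o wide    # find which node hosts the stuck pod
ssh <node-address>
sudo systemctl restart kubelet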

Resources