Pods are failing and being removed by Jenkins before I can check their logs, so I am unable to see the logs.
How can I check the logs of pods that have been removed?
Is there a simple way to save logs in Kubernetes?
I don't have a logging system for my Kubernetes cluster.
Pods keep being created and deleted within a fraction of a second because of some error, and I want to find out what the error is. Before I can check the logs, the container name has already changed.
Thanks,
You most probably meant "pods are failing and removed by Kubernetes and I am unable to see the logs." It is Kubernetes itself that manages API objects, not Jenkins.
To answer your question directly: you cannot fetch logs from any container once its pod has been deleted. Deleting a pod wipes all of the pod's containers along with all of their data, so the logs were deleted the moment your pod was terminated.
By default, if a container restarts, the kubelet keeps one terminated
container with its logs. If a pod is evicted from the node, all
corresponding containers are also evicted, along with their logs.
If your pod were still alive, you could use the --previous flag to check the logs of the previous container instance, but unfortunately that is not your case.
There are many similar questions, and the main suggestion is always to set up a log aggregation system that stores logs separately from the pods. That way you won't lose them and will at least be able to check them.
Logging at the node level
Cluster-level logging architectures
How to see logs of terminated pods
How to access Logs of Pods in Kubernetes after its deletion
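Until a log aggregation system is in place, one stopgap is to copy logs off the cluster while the pod still exists. The sketch below is only an illustration: the function name `save_pod_logs` and the output directory are hypothetical, and it relies on the standard `kubectl logs` flags `--all-containers` and `--previous`. It cannot recover anything once the pod object is gone.

```shell
# Hypothetical helper: dump a pod's logs to local files before the pod is deleted.
# Usage: save_pod_logs <pod> [namespace] [output-dir]
save_pod_logs() {
  local pod="$1" ns="${2:-default}" out="${3:-./pod-logs}"
  mkdir -p "$out"
  # Logs of the containers as they are running right now.
  kubectl logs "$pod" -n "$ns" --all-containers=true \
    > "$out/$pod.current.log" 2>&1
  # Logs of the previously terminated container instances; this fails
  # harmlessly when no container in the pod has restarted yet.
  kubectl logs "$pod" -n "$ns" --all-containers=true --previous \
    > "$out/$pod.previous.log" 2>&1 || true
}
```

For pods that live only a fraction of a second, `kubectl get events --sort-by=.lastTimestamp` may still show why they failed, because events are kept for a limited time after the pod is gone.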
Related
I have a Kubernetes cluster running Jenkins master in a single pod and each build running in a separate slave pod. When there are many builds running, there are many pods being spun up and down and often I will see an error in a job like this:
Cannot contact slave-jenkins-0g9p0: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel#197b6a38:JNLP4-connect connection from 10.10.3.90/10.10.3.90:54418": Remote call on JNLP4-connect connection from 10.10.3.90/10.10.3.90:54418 failed. The channel is closing down or has closed down
Could not connect to slave-jenkins-0g9p0 to send interrupt signal to process
The pod, for example slave-jenkins-0g9p0, just disappears. There is no trace that it existed. While watching information like kubectl describe pod slave-jenkins-0g9p0, there is no error message, it simply stops existing.
I have a feeling that, because multiple pods are spinning up and down, Kubernetes attempts to balance the load on the nodes and reschedule the pod, but after killing it, it cannot spin up the pod on another node. I cannot be sure, though. Maybe there is a way to tell K8s to tie a pod to a node until it exits by itself? I'm not really sure how to debug this case.
Kubernetes version: v1.16.13-eks-2ba888 on AWS EKS
Jenkins version: 2.257
Kubernetes plugin version 1.27.2
Any advice would be appreciated
Thanks
UPDATE:
I have uploaded three slave pod manifest examples here where you can see the resources allocated. The above issue occurs in each of these running pods.
The node pool is controlled by the Kubernetes autoscaler (v1.14.6) and uses AWS t3a.large (2 CPU, 8GB mem) instances.
UPDATE 2:
I believe I have found the cause of the problem. I disabled the cluster-autoscaler (https://github.com/kubernetes/autoscaler) (v1.14.6) and the problem stopped.
So what seems to be happening is that the autoscaler is removing the node that the slave pod is running on. I know that taints can be used to tell the autoscaler not to remove a node, but is there a way to do this dynamically, so that it won't remove a node while a certain pod is running on it, without having to develop something new?
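One commonly documented option for this situation is the cluster-autoscaler's pod annotation `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"`, which asks the autoscaler not to scale down a node while such a pod is running on it. The sketch below is a suggestion rather than something taken from this thread: the function name is hypothetical, and with the Jenkins Kubernetes plugin the annotation would more likely be set in the agent pod template than applied by hand.

```shell
# Hypothetical helper: mark a running pod as not safe to evict
# for the cluster-autoscaler.
# Usage: protect_pod_from_scale_down <pod> [namespace]
protect_pod_from_scale_down() {
  local pod="$1" ns="${2:-default}"
  # The annotation value is stored as the string "false".
  kubectl annotate pod "$pod" -n "$ns" --overwrite \
    cluster-autoscaler.kubernetes.io/safe-to-evict=false
}
```

For example, `protect_pod_from_scale_down slave-jenkins-0g9p0` would annotate the pod named in the error message above, assuming it still exists in the default namespace.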
I tried the simple PVC example from here, with nginx claiming an azure-managed-disk, and I am getting an 'unable to mount' error (see below). I also can't remove the created PV with 'kubectl delete pv pvc-3f3c3c78-9779-11e9-a7eb-1aafd0e2f988'.
$ kubectl get events
LAST SEEN TYPE REASON KIND MESSAGE
10m Warning FailedMount Pod MountVolume.WaitForAttach failed for volume "pvc-3f3c3c78-9779-11e9-a7eb-1aafd0e2f988" : azureDisk - WaitForAttach failed within timeout node (aks-agentpool-10844952-2) diskId:(kubernetes-dynamic-pvc-3f3c3c78-9779-11e9-a7eb-1aafd0e2f988) lun:(1)
22s Warning FailedMount Pod Unable to mount volumes for pod "nginx_default(bd16b9c8-97b2-11e9-9018-eaa2ea1705c5)": timeout expired waiting for volumes to attach or mount for pod "default"/"nginx". list of unmounted volumes=[volume]. list of unattached volumes=[volume default-token-92rj6]
My managed AKS cluster is running v1.12.8, and the service principal (SP) has the Contributor role (the Owner role does not help either). The YAML from my simple nginx example (link provided) uses the 'managed-premium' storage class.
There are not enough details in your question to determine the exact cause, so I will just list the possible reasons:
1. It is a simple transient error in which the API call to Azure failed. If so, you just need to delete them and recreate them.
2. The node that the pod runs on already has too many Azure disks attached. If so, you need to schedule the pod on another node that does not have as many disks attached.
3. The Azure disk cannot be unmounted or detached from the old node. This means the PV is in use and attached to another node. If so, you need to create another dynamic PV that is not in use for your pod.
Check your setup carefully against these reasons. In my opinion, the third reason is the most likely one, but of course it all depends on the actual situation. For more details about similar errors, see How to Understand & Resolve “Warning Failed Attach Volume” and “Warning Failed Mount” Errors in Kubernetes on Azure.
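To narrow down which of the possible causes listed above applies, it can help to look at what the cluster itself reports. The following is a minimal sketch using only `kubectl`; the function name is hypothetical, and the PV and node names are taken from the events in the question. It only reads state and changes nothing.

```shell
# Hypothetical helper: gather facts about a PV and the node it should attach to.
# Usage: inspect_volume_attachment <pv-name> <node-name>
inspect_volume_attachment() {
  local pv="$1" node="$2"
  # Phase of the PV (for example Bound or Released) and the claim it references.
  kubectl get pv "$pv" \
    -o jsonpath='{.status.phase}{" "}{.spec.claimRef.namespace}{"/"}{.spec.claimRef.name}{"\n"}'
  # Volumes the node currently reports as attached, one per line.
  kubectl get node "$node" \
    -o jsonpath='{range .status.volumesAttached[*]}{.name}{"\n"}{end}'
  # Warning events in the current namespace, oldest first.
  kubectl get events --field-selector type=Warning --sort-by=.lastTimestamp
}
```

A call such as `inspect_volume_attachment pvc-3f3c3c78-9779-11e9-a7eb-1aafd0e2f988 aks-agentpool-10844952-2` shows how many volumes that node already has attached; running it against the other nodes shows whether the disk is still attached elsewhere.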
I have pods running in my production cluster. The pods suddenly stopped generating logs. When I restart a pod, it generates logs for some time and then stops again. Can anyone explain what the issue could be? I also noticed that Docker containers have logging driver size options (max-file and max-size). How do we specify those in the deployment YAML file? Will increasing the values help?
Would specifying the TTY=false option help me in this case?
I have tried restarting the pod too.
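As far as I know, the pod and deployment specs have no field for the Docker logging driver options; with Docker as the container runtime they are set per node in the Docker daemon configuration (usually /etc/docker/daemon.json) under `log-opts`. The sketch below only writes such a file to a path of your choice: the function name, the default path, and the sizes are assumptions, and on a managed cluster the node configuration may not be yours to change. Whether larger limits would fix the missing logs is not something this thread establishes.

```shell
# Hypothetical helper: write a Docker daemon config that caps json-file logs.
# Usage: write_docker_log_config [file]
write_docker_log_config() {
  local file="${1:-./daemon.json}"
  # max-size and max-file are options of the json-file logging driver;
  # the values (10m, 3) are examples only.
  cat > "$file" <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF
}
```

The Docker daemon has to be restarted to pick up the file, and the new limits apply only to containers created afterwards.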
I am new to Kubernetes.
I have seen pods automatically restart in case of failure.
When a node fails, a new pod is created on another node.
In both cases,
what happens when a pod fails in the middle of a process (say, an HTTP session)? Can we provide the same session to the already logged-in user?
Please forgive me if the question is irrelevant.
Yes, you can use health checks such as readiness and liveness probes for your pod. No traffic is routed to the pod until its readiness check passes, and the container is restarted if its liveness check fails. These checks can be added to your pod spec.
Session management is not handled by k8s. It must be done by the application itself.
Anyhow, if you want to persist some data, you can use a PV and a PVC and bind the volume to your pod.
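The readiness and liveness probes mentioned in this answer could look roughly like the manifest below. It is a minimal sketch: the pod name, the nginx image, the port, and the timing values are all assumptions chosen for illustration, and the helper function is hypothetical.

```shell
# Hypothetical helper: write an example pod manifest with both probe types.
# Usage: write_probe_manifest [file]
write_probe_manifest() {
  local file="${1:-./probe-demo.yaml}"
  cat > "$file" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo            # hypothetical name
spec:
  containers:
    - name: web
      image: nginx            # assumed image serving HTTP on port 80
      ports:
        - containerPort: 80
      readinessProbe:         # no Service traffic until this succeeds
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 10
      livenessProbe:          # container is restarted when this keeps failing
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 15
        periodSeconds: 20
EOF
}
```

After `write_probe_manifest pod.yaml`, the manifest can be applied with `kubectl apply -f pod.yaml`.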
Yes, the normal way to create pods is through one of the higher-level controllers such as Deployments or StatefulSets. These automatically detect when the number of pods is not right and start replacements. As for showing the user the same login session, that is not usually tied to the running pod: a login session on a website is usually stored in a cookie of some kind and references data in the database, not in the web server.
I have a Kubernetes Pod created by a StatefulSet (not sure if that matters). There are two containers in this pod. When one of the two containers fails and I use the get pods command, 1/2 containers are Ready and the Status is "Error." The second container never attempts a restart, and I am unable to destroy the pod except by using the --grace-period=0 --force flags. A typical delete leaves the pod hanging in a "Terminating" state either forever or for a very long time. What could be causing this behavior, and how should I go about debugging it?
I encounter a similar problem on nodes in my k8s 1.6 cluster, especially when a node has been running for a couple of weeks. It can happen to any node. When this happens, I restart kubelet on the node and the errors go away.
It's not the best thing to do, but it always solves the problem. It's also not detrimental to the cluster if you restart kubelet because the running pods continue to stay up.
kubectl get po -o wide will likely reveal to you that the errant pods are running on one node. SSH to that node and restart kubelet.
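To see at a glance whether the stuck pods really sit on one node, the output of `kubectl get po` can be summarized per node. This is a sketch under assumptions: the function name is hypothetical, and it treats any pod with a deletion timestamp as "terminating", which also matches pods that are shutting down normally.

```shell
# Hypothetical helper: count pods that are being deleted, grouped by node.
# Usage: terminating_pods_per_node
terminating_pods_per_node() {
  # custom-columns prints <none> when a field such as deletionTimestamp is unset.
  kubectl get pods --all-namespaces --no-headers \
    -o custom-columns='NAME:.metadata.name,NODE:.spec.nodeName,DELETED:.metadata.deletionTimestamp' |
    awk '$3 != "<none>" { print $2 }' | sort | uniq -c
}
```

Each output line is a count followed by a node name. On the node with the highest count, restarting kubelet would typically be `sudo systemctl restart kubelet`, assuming the node runs kubelet under systemd.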