Kubernetes Garbage Collection fails - FreeDiskSpaceFailed & ImageGCFailed - docker

Apparently the GC of my Kubernetes cluster is failing to delete any image, and the server is running out of disk space.
Can you please point me to where I can find the ImageGC logs with the error from the failed image deletions, or to a reason why this is happening?
3m 5d 1591 ip-xxx.internal Node Warning FreeDiskSpaceFailed {kubelet ip-xxx.internal} failed to garbage collect required amount of images. Wanted to free 6312950988, but freed 0
3m 5d 1591 ip-xxx.internal Node Warning ImageGCFailed {kubelet ip-xxx.internal} failed to garbage collect required amount of images. Wanted to free 6312950988, but freed 0
Thanks!

There may not be much in the way of logs (see this issue), but there may be Kubernetes event data. Look for events of type ImageGCFailed.
Alternatively, you could check the cAdvisor Prometheus metrics to see if they expose any information about container garbage collection.
Docs on the GC feature in general: https://kubernetes.io/docs/concepts/cluster-administration/kubelet-garbage-collection/
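For example, something along these lines should surface the event data, plus the kubelet's own log lines about image GC if it runs as a systemd unit on the node (the grep pattern is just a guess at what to look for):
kubectl get events --all-namespaces --field-selector reason=ImageGCFailed
# on the affected node itself:
journalctl -u kubelet --no-pager | grep -i -E 'image_gc|garbage'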

Most likely your host filesystem is full; check the usage of /var, for example:
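These are standard commands; the second path assumes Docker's default data root:
df -h /var /var/lib/docker
docker system df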
You can use docker-gc to clean up old images.
https://github.com/spotify/docker-gc
Run it like this
docker run --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /etc:/etc:ro spotify/docker-gc

Related

How can I get an insufficient cpu error inside a GKE cluster with autopilot mode?

I created a cluster with autopilot mode. When I try to install an app into this cluster using helm, the workloads fail with the error "Does not have minimum availability". If I click on this error, I get "Cannot schedule pods: Insufficient cpu" and "Cannot schedule pods: Insufficient memory".
If I do kubectl describe node <name> I find 0/3 nodes are available: 1 Insufficient memory, 3 Insufficient cpu.
Isn't GKE autopilot mode supposed to allocate sufficient memory and cpu?
I found where my mistake was. It had nothing to do with CPU or memory. It was a mistake inside my yaml file (wrong host for the database).

What does `docker system prune -a` actually do?

My free space before running docker system prune -a was 900 MB, and after running it I have 65 GB free, although the command reported that it cleaned only 14.5 GB.
Is the report just wrong, or am I missing something here?
The docs don't tell me anything new, and it would be normal if it had cleared only 14.5 GB, which leaves me with only one conclusion: I'm doing something wrong. Any thoughts here?
This will remove the following content from the host machine where Docker is running:
- all stopped containers
- all networks not used by at least one container
- all images without at least one container associated to them
- all build cache
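If you want to see where the space is actually going, docker system df breaks the usage down by images, containers, local volumes, and build cache, and the -v flag adds per-object detail; comparing its output before and after the prune should explain the numbers:
docker system df
docker system df -v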

Using k8s node resources out of k8s

What would happen with Kubernetes scheduling if I have a Kubernetes node, but I use the container (Docker) engine for some other stuff, outside the context of Kubernetes?
For example, if I manually SSH to the respective node and do docker run something, would Kubernetes scheduling take into account the fact that this node is busy running other stuff and might not be able to host any other containers now?
What would happen in the following scenario:
Node with 8 GB RAM
running a pod with resource request 2 GB, limit 4 GB, and current usage 3 GB
SSH to the node and docker run a container with 5 GB, using all of it
P.S. Please skip the "why would you go and run docker run directly on the node" questions. I don't want to, but reasons.
I'm pretty sure Kubernetes's scheduling only considers (a) pods it knows about and not other resources, and (b) only their resource requests.
In the situation you describe, with exactly that resource utilization, things will work fine. The pod can be scheduled on the node because the total resource requests on it are 2 GB out of 8 GB. The total memory usage doesn't exceed the physical memory size either, so you're okay.
Say the pod allocated a little bit more memory. Now the system as a whole is above its physical memory capacity, so the Linux kernel will arbitrarily kill something off. This is often the largest thing. You'll typically see an exit code of 137 (matching SIGKILL) in whichever system manages it.
This behavior is the same even if you run your side job in something like a DaemonSet. It requests 2 GB of RAM, so both pods fit on the same node [4 GB/8 GB], but if it has a resource limit of 6 GB RAM, something will get killed off.
The place where things are different is if you can predict the high memory use. Say your pod requests 3 GB/limits 6 GB of RAM, and your side process will predictably also use 6 GB. If you just docker run it something will definitely get OOM-killed. If you run it as a DaemonSet declaring a 6 GB memory request, the Kubernetes scheduler will know the pod doesn't fit and won't place it there (it may get stuck in "Pending" state if it can't be scheduled anywhere).
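A minimal sketch of that DaemonSet, assuming a hypothetical name and image (side-job); the only part that matters for scheduling is the 6 GB memory request:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: side-job                       # hypothetical name
spec:
  selector:
    matchLabels:
      app: side-job
  template:
    metadata:
      labels:
        app: side-job
    spec:
      containers:
      - name: side-job
        image: registry.example.com/side-job:latest   # hypothetical image
        resources:
          requests:
            memory: 6Gi                # the scheduler reserves this on each node
          limits:
            memory: 6Gi                # above this, the container is OOM-killed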
Kubernetes won't see other processes running on the host, however you can tell the kubelet on that host how much of the host resources to reserve for the host itself, preventing Kubernetes from scheduling pods that would exceed the host capacity. See the --system-reserved flag that you can pass to the kubelet:
--system-reserved=[cpu=100m][,][memory=100Mi][,][ephemeral-storage=1Gi][,][pid=1000]
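If the kubelet is configured through a config file rather than flags, the same reservation can be expressed in the KubeletConfiguration; a minimal sketch with placeholder values:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:
  cpu: 500m
  memory: 1Gi
  ephemeral-storage: 1Gi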

Google Kubernetes logs

Memory cgroup out of memory: Kill process 545486 (python3) score 2016 or sacrifice child Killed process 545486 (python3) total-vm:579096kB, anon-rss:518892kB, file-rss:16952kB
This is what the node logs, and my container keeps restarting randomly. I'm running a Python container with 4 replicas.
The Python application uses a socket with Flask. The Docker image is based on python3.5:slim.
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
gke-XXXXXXX-cluster-highmem-pool-gen2-f2743e02-msv2 682m 17% 11959Mi 89%
This morning the node logged: 0/1 nodes are available: 1 Insufficient cpu.
But the node's CPU usage is only 17%.
There is not much running inside the pod.
Have a look at the best practices and try to adjust the resource requests and limits for CPU and memory. If your app starts hitting its CPU limit, Kubernetes starts throttling your container. Because there is no way to throttle memory usage, if a container goes past its memory limit it will be terminated (and restarted). So, using suitable limits should help you solve the problem with your containers restarting.
If a container's requests exceed its limits, Kubernetes will throw an error similar to the one you have and won't let you run the container.
After adjusting the limits, you could use a monitoring system (like Stackdriver) to find the cause of a potential memory leak.
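As a minimal sketch of what that looks like in a pod spec (the name, image, and values are placeholders; size them from what your app actually uses):
apiVersion: v1
kind: Pod
metadata:
  name: flask-app                      # placeholder name
spec:
  containers:
  - name: app
    image: registry.example.com/flask-app:latest   # placeholder image
    resources:
      requests:
        cpu: 250m                      # what the scheduler reserves on the node
        memory: 512Mi
      limits:
        cpu: "1"                       # beyond this, the container is throttled
        memory: 1Gi                    # beyond this, the container is OOM-killed and restarted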

Kubernetes OOM pod killed because kernel memory grows to much

I am working on a Java service that basically creates files in a network file system to store data. It runs in a k8s cluster on Ubuntu 18.04 LTS.
When we began to limit the memory in Kubernetes (limits: memory: 3Gi), the pods began to be OOMKilled by Kubernetes.
At the beginning we thought it was a memory leak in the Java process, but analyzing more deeply we noticed that the problem is the kernel memory.
We validated that by looking at the file /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes.
We isolated the case to only creating files (without Java) with the dd command, like this:
for i in {1..50000}; do dd if=/dev/urandom bs=4096 count=1 of=file$i; done
With the dd command we saw that the same thing happened (the kernel memory grew until OOM).
After k8s restarted the pod, this is what I got from a describe pod:
Last State:Terminated
Reason: OOMKilled
Exit Code: 143
Creating files causes the kernel memory to grow; deleting those files causes it to decrease. But our service stores data, so it creates a lot of files continuously, until the pod is killed and restarted because it is OOMKilled.
We tested limiting the kernel memory using standalone Docker with the --kernel-memory parameter and it worked as expected: the kernel memory grew to the limit and did not rise any more. But we did not find any way to do that in a Kubernetes cluster.
Is there a way to limit the kernel memory in a K8S environment?
Why does creating files cause the kernel memory to grow, and why is it not released?
Thanks for all this info, it was very useful!
In my app, I solved this by creating a sidecar container that runs a cron job every 5 minutes with the following command:
echo 3 > /proc/sys/vm/drop_caches
(note that you need the sidecar container to run in privileged mode)
It works nicely and has the advantage of being predictable: every 5 minutes, your memory cache will be cleared.
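A minimal sketch of such a sidecar, added next to the main container in the pod spec (the names and image are placeholders, a sleep loop stands in for the cron job, and the privileged security context is what makes /proc/sys writable):
spec:
  containers:
  - name: app                          # your existing application container
    image: registry.example.com/java-service:latest   # placeholder
  - name: drop-caches                  # hypothetical sidecar name
    image: busybox:1.36                # any image with a shell works
    securityContext:
      privileged: true                 # needed to write to /proc/sys/vm/drop_caches
    command: ["/bin/sh", "-c"]
    args:
    - |
      while true; do
        sync
        echo 3 > /proc/sys/vm/drop_caches
        sleep 300
      done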
