tutum node always at full memory consumption - docker

I have a Tutum node and it's always near full memory consumption. If I upgrade the node to more memory, it fills up again.
To be more specific: once I have started a node with, say, 6 services, I can't add another one because the node is always full. I have to start from scratch or buy another node:
> watch -n 5 free
             total       used       free     shared    buffers     cached
Mem:       4048280    3902288     145992      22796     310708    1334052
-/+ buffers/cache:    2257528    1790752
Swap:            0          0          0
The interesting thing is that Tutum itself does not notice this and reports that the node still has more than 50% of its memory free.
What might be the reason for that? There is nothing else running on the node; it has been completely provisioned via Tutum.
The underlying cloud provider is DigitalOcean, and the Docker version is 1.8.3.
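For what it's worth, the -/+ buffers/cache line of that output already subtracts the page cache. A minimal way to watch only the figures that matter (a sketch, assuming the older procps free shown above) would be:

# Watch only the memory rows; the -/+ buffers/cache row excludes reclaimable page cache
$ watch -n 5 "free -m | grep -E 'Mem|buffers/cache'"
# Newer procps versions drop the -/+ row and report an "available" column instead
$ free -m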

Related

Kubernetes: High CPU usage and zombie processes during a "Disk Pressure" event

We run a small on-premises K8s cluster (based on the RKE stack): 1x etcd/control node, 2x worker nodes. The components are:
OS: CentOS 7
Docker version: 19.3.9
K8s: 1.17.2
Another important fact: we're using a Rook-Ceph storage cluster on both worker nodes (Rook v1.2.4, Ceph 14.2.7).
When one of the OS mounts exceeds 90% usage (for example /var), K8s reports "Disk Pressure" and disables the node, which is expected. But when this happens, the load grows into the dozens (for example 30+ or 40+ on a machine with 4 vCPUs), many of the container processes (children of containerd-shim) go into the zombie (defunct) state, and the whole K8s cluster collapses.
At first we thought it was a Rook-Ceph problem with XFS storage (described at https://github.com/rook/rook/issues/3132#issuecomment-580508760), so we switched to EXT4 (because we cannot upgrade the kernel to 5.6+), but last weekend it happened again, and we are sure this case is related to the Disk Pressure event. The last contact with the (by then) dead node was on 21-01 at 13:50, but the load started growing at 13:07 and quickly reached 30.5.
/var usage went from 89.97% to 90%+ at exactly 13:07 that day.
Can you point us to what we need to check in the K8s configuration, logs, or anywhere else to find out what is going on? Why does K8s collapse during a fairly normal event?
(For clarification: we know we're using quite old versions, but we'll do a full upgrade of the environment within a few weeks.)
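A few places one might start checking (a sketch only, since the root cause isn't clear from the description; it assumes kubectl access plus SSH to the affected worker, and that RKE runs the kubelet as a Docker container named kubelet, which is its usual layout):

# When the DiskPressure condition flipped, plus recent events for the node
$ kubectl describe node <worker-node-name>
# Eviction thresholds the kubelet is actually running with
$ ps aux | grep kubelet | grep -o 'eviction[^ ]*'
# Kubelet log around the incident
$ docker logs kubelet --since 24h 2>&1 | grep -i -E 'evict|pressure'
# Confirm the zombie build-up and find the parents of the defunct processes
$ ps -eo pid,ppid,stat,comm | awk '$3 ~ /Z/'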

Using k8s node resources outside of k8s

What would happen to Kubernetes scheduling if I have a Kubernetes node, but I use the container (Docker) engine for some other stuff, outside the context of Kubernetes?
For example, if I manually SSH to the respective node and do docker run something, would Kubernetes scheduling take into account the fact that this node is busy running other stuff and might not be able to host any other containers now?
What would happen in the following scenario:
Node with 8 GB RAM
running a pod with resource request 2 GB, limit 4 GB, and current usage 3 GB
SSH to the node and docker run a container with 5 GB, using all of it
P.S. Please skip the "why would you go and run docker run directly on the node" questions. I don't want to, but reasons.
I'm pretty sure Kubernetes's scheduling only considers (a) pods it knows about and not other resources, and (b) only their resource requests.
In the situation you describe, with exactly that resource utilization, things will work fine. The pod can be scheduled on the node because the total resource requests against it are 2 GB out of 8 GB. The total memory usage doesn't exceed the physical memory size either, so you're okay.
Say the pod allocated a little bit more memory. Now the system as a whole is above its physical memory capacity, so the Linux kernel will arbitrarily kill something off; this is often the largest process. You'll typically see an exit code of 137 (128 plus SIGKILL's signal number 9) in whichever system manages it.
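To confirm after the fact that this is what happened, one option (a sketch; the pod and container names are placeholders) is to look at the last terminated state that was recorded:

# For a Kubernetes pod: reason shows up as OOMKilled, exit code as 137
$ kubectl describe pod <pod-name> | grep -A 5 'Last State'
$ kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
# For a plain docker run container, Docker records the same fact
$ docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' <container-id>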
This behavior is the same even if you run your side job in something like a DaemonSet. If it requests 2 GB of RAM, both pods fit on the same node (4 GB of requests out of 8 GB), but if it has a resource limit of 6 GB of RAM and actually uses that much, something will get killed off.
The place where things are different is if you can predict the high memory use. Say your pod requests 3 GB/limits 6 GB of RAM, and your side process will predictably also use 6 GB. If you just docker run it, something will definitely get OOM-killed. If you run it as a DaemonSet declaring a 6 GB memory request, the Kubernetes scheduler will know the pod doesn't fit and won't place it there (it may get stuck in the "Pending" state if it can't be scheduled anywhere).
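A sketch of that last approach (the manifest below is illustrative, with made-up names and a placeholder image, not something from the question): wrapping the side job in a DaemonSet with an explicit memory request lets the scheduler account for it up front.

$ kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: side-job                 # hypothetical name
spec:
  selector:
    matchLabels:
      app: side-job
  template:
    metadata:
      labels:
        app: side-job
    spec:
      containers:
      - name: side-job
        image: registry.example.com/side-job:latest   # placeholder image
        resources:
          requests:
            memory: 6Gi          # the scheduler now accounts for this
          limits:
            memory: 6Gi
EOF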
Kubernetes won't see other processes running on the host; however, you can tell the kubelet on that host how much of the host's resources to reserve for the host itself, preventing Kubernetes from scheduling pods that would exceed the host's capacity. See the --system-reserved flag that you can pass to the kubelet:
--system-reserved=[cpu=100m][,][memory=100Mi][,][ephemeral-storage=1Gi][,][pid=1000]
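For example (the values are illustrative), reserving capacity for the out-of-band workload and then checking what the scheduler actually has left to hand out:

# Illustrative kubelet flag, reserving 1 CPU and 6 GiB for non-Kubernetes processes:
#   --system-reserved=cpu=1,memory=6Gi
# Capacity vs. Allocatable on the node then reflects the reservation
$ kubectl describe node <node-name> | grep -A 6 -E '^(Capacity|Allocatable)'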

Google Kubernetes logs

Memory cgroup out of memory: Kill process 545486 (python3) score 2016 or sacrifice child Killed process 545486 (python3) total-vm:579096kB, anon-rss:518892kB, file-rss:16952kB
This is what the node logs, and my container keeps restarting randomly. I'm running a Python container with 4 replicas.
The Python application uses sockets with Flask. The Docker image is based on python3.5:slim.
kubectl top nodes
NAME                                                  CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
gke-XXXXXXX-cluster-highmem-pool-gen2-f2743e02-msv2   682m         17%    11959Mi         89%
This morning the node logged: 0/1 nodes are available: 1 Insufficient cpu.
But the node's CPU usage is only 17%.
There is not much running inside the pod.
Have a look at the best practices and try to adjust the resource requests and limits for CPU and memory. If your app starts hitting its CPU limits, Kubernetes starts throttling your container. Because there is no way to throttle memory usage, if a container goes past its memory limit it will be terminated (and restarted). So, using suitable limits should help you solve the problem with your containers restarting.
If your container's resource request exceeds its limit, Kubernetes will throw an error, similar to the one you have, and won't let you run the container.
After adjusting the limits, you could use a monitoring system (like Stackdriver) to find the cause of a potential memory leak.
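As a minimal sketch of what those requests and limits look like in a pod spec (the name, image, and numbers below are placeholders to be tuned against your monitoring data, not recommendations):

$ kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: flask-app                # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/flask-app:latest   # placeholder image
    resources:
      requests:
        cpu: 250m                # what the scheduler reserves on the node
        memory: 512Mi
      limits:
        cpu: 500m                # usage above this is throttled
        memory: 1Gi              # exceeding this gets the container OOM-killed
EOF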

CoreOS Single Container High Memory Usage

So I have a simple Go web app that I deployed as a Docker container. I am running a t2.small instance on AWS with the CoreOS AMI.
The container is very small, only using about 10 MB of memory according to docker stats:
CONTAINER      CPU %   MEM USAGE / LIMIT     MEM %   NET I/O               BLOCK I/O
8e230506e99a   0.00%   11.11 MB / 2.101 GB   0.53%   49.01 MB / 16.39 MB   1.622 MB / 0 B
However the CoreOS instance seems to be using a lot of memory:
$ free
             total       used       free     shared    buffers     cached
Mem:       2051772    1686012     365760      25388     253096    1031836
-/+ buffers/cache:     401080    1650692
Swap:            0          0          0
As you can see, it's using almost 1.7 GB of its 2 GB of total memory, with only about 300 MB left. And this seems to be slowly getting worse.
I've had the instance running for about 3 days now, and the free memory started at around 400 MB after a fresh launch with a single Docker container started.
Is this something I should worry about? Or is CoreOS supposed to use this much memory when my little Go app in its container only uses a tiny 10 MB?
That's because a lot of that memory usage is buffers and cache. The better indicators are your application's usage as reported by Docker (which is likely accurate if it is a small Go app) and the OS total usage minus buffers and cache on the second line of free (which is closer to 400 MB used).
See https://unix.stackexchange.com/a/152301/6515 for a decent explanation.
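A quick way to see this on the host itself (a sketch; older procps prints the -/+ buffers/cache row used above, newer versions show an "available" column instead):

# The -/+ buffers/cache row is memory used/free once reclaimable cache is ignored
$ free -m
# Or read the kernel's figures directly (MemAvailable needs kernel 3.14 or newer)
$ grep -E 'MemTotal|MemFree|MemAvailable|Buffers|^Cached' /proc/meminfo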

Testing free memory on Debian

On my virtual server running Debian, I have the impression that the memory is configured incorrectly, even though my provider claims everything works correctly.
Even with 3 GB of RAM, I keep running out of memory, even though the top command claims there is still enough memory available.
Is there a way to test that the free memory is actually usable? For instance, if I had 1.5 GB of free memory, I would like to allocate a block of 1 GB and see that everything still works correctly.
Thanks,
Which applications are you using? There must be a reason that you are running out of memory.
Try the free command:
$ free -m
             total       used       free     shared    buffers     cached
Mem:          3022       2973         48          0        235       1948
-/+ buffers/cache:         790       2232
Swap:         3907          0       3907
This will show you something like the above (that is a 3 GB machine of my own).
Always check the system log of your machine if you have memory issues.
# tail /var/log/syslog
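To address the allocation test asked about in the question: a rough sketch that grabs and touches a 1 GB block, holds it for a minute so you can watch free from another shell, and then releases it (assumes python3 is installed; stress-ng does the same job if you prefer a packaged tool):

# Allocate ~1 GB of zero-filled memory and hold it for 60 seconds
$ python3 -c "import time; block = bytearray(1024 * 1024 * 1024); time.sleep(60)"
# Equivalent with stress-ng, if installed
$ stress-ng --vm 1 --vm-bytes 1G --vm-keep --timeout 60s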
