Docker OOM: Daemon should kill the entire container, not a random process

I have a docker container running multiple processes started via docker-compose with
mem_limit: 200m
memswap_limit: 200m
When the memory limit is reached, Docker kills a seemingly random process inside the container, but I want the entire container (with all of its processes) to be terminated.
Is there a configuration option for that?
Edited / Additional information:
The container runs a multi-worker Python application, with each worker running in its own process.

If the container was killed by Docker because it hit the cgroup limit, you can inspect the killed container and you should see OOMKilled set to true:
$ docker container inspect --format '{{.State.OOMKilled}}' $container_name
false
If it's false, as above, then most likely the kernel killed a process because the OS ran out of memory, rather than because the cgroup limit was reached. You should see that in the kernel logs (usually /var/log/messages or /var/log/syslog). The lines look like:
[11686.043641] Out of memory: Kill process 2603 (flasherav) score 761 or sacrifice child
[11686.043647] Killed process 2603 (flasherav) total-vm:1498536kB, anon-rss:721784kB, file-rss:4228kB
If the OS is killing processes, that's a sign you either need to reduce the workload on the host, tighten the cgroup limits on the containers, or increase the memory available to the host (a larger VM or adding RAM to the machine). If you hit the cgroup limit set on the container, Docker should terminate the entire container.
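As a rough way to tie these pieces together, here is a sketch (the container name and image are placeholders) of setting the cgroup limit at run time and then checking how the container died:
$ docker run -d --name myapp --memory 200m --memory-swap 200m myimage
$ docker container inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' myapp
$ docker events --filter event=oom
If the cgroup limit was the cause, OOMKilled should come back true; docker events reports oom events from the daemon as they happen.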

Related

How to debug Docker containers killed by host (137)?

I'm running 3 applications together using docker-compose:
Standard Nginx image
Java/Spark API server
Node.js app (backend + frontend)
I can bring the composed service up with docker-compose up no problem, and it runs for a period of time with no issues. At some point something kills the two non-nginx containers with code 137, and the service goes down.
My docker-compose.yml has restart: always on each container, but as I understand it this will not restart the containers if they're getting killed in this way. I verified this with docker kill $CONTAINER on each one, and they are not restarted.
When the application exits, all I see at the end of my logs is:
nginx exited with code 0
java_app exited with code 143
node_app exited with code 137
How can I debug why the host is killing these containers, and either stop this from happening or make them restart on failure?
You may not have enough memory, or your applications may have a memory leak. You can set a memory limit on each container. Also, you can try to create swap space if the host does not have enough memory.
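A rough sketch of how you might confirm an OOM kill and add swap on the host (the size is just an example; node_app is taken from the logs above):
$ docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' node_app
$ dmesg -T | grep -iE 'out of memory|killed process'
$ sudo fallocate -l 2G /swapfile && sudo chmod 600 /swapfile
$ sudo mkswap /swapfile && sudo swapon /swapfile
Exit code 137 together with OOMKilled true (or matching kernel log lines) points at memory pressure rather than an application crash.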

Using KinD to create a local cluster, CPU usage stays high

I'm using KinD to create a local cluster and noticed that CPU usage stays relatively high, between 40% and 60%, for docker.hyperkit on macOS Catalina 10.15.6. Within Docker for Mac I limited the resources to CPUs: 4 and Memory: 6.00 GB.
My KinD cluster consists of a control plane node and three worker nodes. Is this CPU usage normal for docker for mac? Can I check to see the utilization per container?
Each kind "node" is a Docker container, so you can inspect those in "normal" ways.
Try running kind create cluster to create a single-node cluster. If you run docker stats you will get CPU, memory, and network utilization information; you can also get the same data through the Docker Desktop application, selecting (whale) > Dashboard. This brings up some high-level statistics on the container. Sitting idle on a freshly created cluster, this seems to be consistently using about 30% CPU for me. (So 40-60% CPU for a control-plane node and three workers sounds believable.)
Similarly, since each "node" is a container, you can docker exec -it kind-control-plane bash to get an interactive debugging shell in a node container. Once you're there, you can run top and similar diagnostic commands. On my single node I see the top processes as kube-apiserver (10%), kube-controller (5%), etcd (5%), and kubelet (5%). Again, that seems reasonably normal, though it might be nice if it used less CPU sitting idle.
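For a quick per-node snapshot, something along these lines works (the cluster name dev is just an example; the node container is then named dev-control-plane):
$ kind create cluster --name dev
$ docker stats --no-stream --format 'table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}'
$ docker exec -it dev-control-plane top -b -n 1 | head -n 20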

How does Docker allocate memory to the processes in a container?

Docker first initializes a container and then executes the program you want. I wonder how Docker manages the memory addresses of the container and the program inside it.
Docker does not allocate memory; it's the OS (the kernel) that manages the resources used by programs. Docker (internally) uses cgroups, a kernel feature, to limit and account for those resources. The reason the ps command inside a container doesn't show processes from the host or from other containers is that each container runs in its own namespaces (the PID namespace in particular), which isolate it from the rest of the system.
Rather than worrying about Docker's memory handling, you need to look at the underlying host (VM/instance) where you are running the containers. The number of containers you can run is determined by a number of factors, including what the app running in each container does.
See Is there a maximum number of containers running on a Docker host? for the limits that you can run into.
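To see this split in practice, you can compare the limit Docker records with the limit the kernel enforces; a rough sketch (container name and image are examples, and the cgroup path assumes cgroup v2, so it may differ on your host):
$ docker run -d --name limited --memory 200m nginx
$ docker inspect --format '{{.HostConfig.Memory}}' limited
$ docker exec limited cat /sys/fs/cgroup/memory.max
The inspect command prints the limit in bytes as Docker stored it; the last command reads the value the kernel actually applies to the container's cgroup.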

Does the dockerd process CPU usage include the CPU used by the containers?

We have a CentOS machine running Docker with a couple of containers. When running top, I see the dockerd process, which sometimes uses a lot of CPU. Does this CPU utilization include the CPU used inside the containers?

Is there a maximum number of containers running on a Docker host?

Basically, the title says it all: Is there any limit in the number of containers running at the same time on a single Docker host?
There are a number of system limits you can run into (and work around), but there's a significant amount of grey area depending on:
How you configure your Docker containers.
What you run inside your containers.
Which kernel, distribution, and Docker version you are on.
The figures below are from the boot2docker 1.11.1 VM image, which is based on Tiny Core Linux 7. The kernel is 4.4.8.
Docker
Docker creates or uses a number of resources to run a container, on top of what you run inside the container (a few of these can be observed with the commands sketched after this list):
Attaches a virtual ethernet adaptor to the docker0 bridge (1023 max per bridge)
Mounts an AUFS and shm file system (1048576 mounts max per fs type)
Creates an AUFS layer on top of the image (127 layers max)
Forks 1 extra docker-containerd-shim management process (~3MB per container on avg and sysctl kernel.pid_max)
Docker API/daemon internal data to manage the container (~400k per container)
Creates kernel cgroups and namespaces
Opens file descriptors (~15 + 1 per running container at startup; ulimit -n and sysctl fs.file-max)
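A rough way to observe some of these on a running host (the commands assume the default docker0 bridge, the overlay storage driver rather than AUFS, and that dockerd is the daemon process name):
$ ip link show master docker0 | grep -c veth
$ mount | grep -c overlay
$ sudo ls /proc/$(pidof dockerd)/fd | wc -l
$ sysctl kernel.pid_max fs.file-max
These count the virtual ethernet adaptors on the bridge, the layered filesystem mounts, the daemon's open file descriptors, and print the kernel limits mentioned above.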
Docker options
Port mapping -p will run an extra process per port number on the host (~4.5MB per port on avg before 1.12, ~300k per port after 1.12, and also sysctl kernel.pid_max); see the sketch after this list.
--net=none and --net=host would remove the networking overhead.
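For example, you can watch the per-port processes appear (image, name, and port are arbitrary):
$ docker run -d --name web -p 8080:80 nginx
$ ps -ef | grep [d]ocker-proxy
Each published port shows up as one docker-proxy process on the host (unless the userland proxy is disabled in the daemon configuration).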
Container services
The overall limits will normally be decided by what you run inside the containers rather than Docker's overhead (unless you are doing something esoteric, like testing how many containers you can run :)
If you are running apps in a virtual machine (node, ruby, python, java), memory usage is likely to become your main issue.
IO across 1000 processes would cause a lot of IO contention.
1000 processes trying to run at the same time would cause a lot of context switching (see VM apps above for garbage collection).
If you create network connections from 1000 containers, the host's network layer will get a workout.
It's not much different from tuning a Linux host to run 1000 processes, just with some additional Docker overheads to include.
Example
1023 Docker busybox containers running nc -l -p 80 -e echo host use up about 1GB of kernel memory and 3.5GB of system memory.
1023 plain nc -l -p 80 -e echo host processes running on the host use about 75MB of kernel memory and 125MB of system memory.
Starting 1023 containers serially took ~8 minutes.
Killing 1023 containers serially took ~6 minutes.
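For reference, a loop along these lines (the bench- name prefix is just an example) reproduces that kind of test:
$ for i in $(seq 1 1023); do docker run -d --name "bench-$i" busybox nc -l -p 80 -e echo host; done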
From a post on the mailing list, at about 1000 containers you start running into Linux networking issues.
The reason is:
This is a kernel limit: specifically, BR_PORT_BITS in net/bridge/br_private.h cannot be extended because of spanning tree requirements.
With Docker Compose, I am able to run over 6k containers on a single host (with 190GB of memory); the container image is under 10MB. But due to the bridge limitation, I have divided the containers into batches across multiple services, each service having 1k containers and a separate subnet:
docker-compose -f docker-compose.yml up --scale servicename=1000 -d
But after reaching 6k containers, even though around 60GB of memory is still available, scaling stops and memory usage suddenly spikes. It would help if the Docker team published benchmarking figures, but unfortunately they are not available. Kubernetes, on the other hand, clearly publishes benchmarks for the recommended number of pods per node.
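A rough sketch of that batching approach (subnets, network names, and service names are placeholders; each compose service would be attached to its own network in docker-compose.yml):
$ docker network create --subnet 10.10.1.0/24 batch1
$ docker network create --subnet 10.10.2.0/24 batch2
$ docker-compose -f docker-compose.yml up --scale service1=1000 -d
$ docker-compose -f docker-compose.yml up --scale service2=1000 -d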
