I'm trying to figure out how to kill a container whose CPU usage is over 100%, using the docker stats results. I have created the script below, which exports the stats to a file, then looks for container IDs with CPU over 100% and stops them. The problem is that it appears to be killing containers that are only at 40%. The results come back in the format 00.00%, which I think might be the problem, but I'm not sure how awk treats the number when comparing it to the % in the file.
#!/bin/bash
# Snapshot the current stats once into a file
docker stats --no-stream > /tmp/cpu.log
# Strip the CONTAINER header text so every remaining line is a data row
sed -i 's/CONTAINER//g' /tmp/cpu.log
# Pick out the IDs of containers whose CPU column is over the threshold
KILLCPU=$(awk '$2 >= 11000 {print $1}' /tmp/cpu.log)
docker stop $KILLCPU
Add a +0 to the field to get awk to treat the percentage as a number. Without it, a field like 40.00% is not a valid number, so awk falls back to comparing it to 11000 as a string, and "40.00%" sorts after "11000" lexically, which is why containers at 40% get stopped. The +0 forces a numeric conversion, which reads the leading digits and drops the trailing %.
KILLCPU=$(awk '$2+0 >= 110 {print$1}' /tmp/cpu.log)
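For completeness, here is a minimal sketch of the corrected script (the 110 threshold, temp file, and field positions are carried over from the question; the --format variant at the end assumes a Docker version whose docker stats supports --format):
#!/bin/bash
# Snapshot stats once, strip the header token, stop anything over 110% CPU.
docker stats --no-stream > /tmp/cpu.log
sed -i 's/CONTAINER//g' /tmp/cpu.log
# $2+0 converts a value like "341.33%" to 341.33 before the numeric comparison.
KILLCPU=$(awk '$2+0 >= 110 {print $1}' /tmp/cpu.log)
# Only call docker stop if something actually matched.
[ -n "$KILLCPU" ] && docker stop $KILLCPU
Alternatively, without the temp file or the sed:
docker stats --no-stream --format '{{.ID}} {{.CPUPerc}}' |
  awk '$2+0 >= 110 {print $1}' | xargs -r docker stop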
When a container uses > 100% CPU it is running on more than one CPU. Killing containers because they reach a certain percentage is not the correct approach.
I suggest you use the --cpu-shares option to docker run:
See:
https://docs.docker.com/engine/reference/run/#cpu-share-constraint
CPU share constraint

By default, all containers get the same
proportion of CPU cycles. This proportion can be modified by changing
the container’s CPU share weighting relative to the weighting of all
other running containers.
To modify the proportion from the default of 1024, use the -c or
--cpu-shares flag to set the weighting to 2 or higher. If 0 is set, the system will ignore the value and use the default of 1024.
The proportion will only apply when CPU-intensive processes are
running. When tasks in one container are idle, other containers can
use the left-over CPU time. The actual amount of CPU time will vary
depending on the number of containers running on the system.
For example, consider three containers, one has a cpu-share of 1024
and two others have a cpu-share setting of 512. When processes in all
three containers attempt to use 100% of CPU, the first container would
receive 50% of the total CPU time. If you add a fourth container with
a cpu-share of 1024, the first container only gets 33% of the CPU. The
remaining containers receive 16.5%, 16.5% and 33% of the CPU.
On a multi-core system, the shares of CPU time are distributed over
all CPU cores. Even if a container is limited to less than 100% of CPU
time, it can use 100% of each individual CPU core.
For example, consider a system with more than three cores. If you
start one container {C0} with -c=512 running one process, and another
container {C1} with -c=1024 running two processes, this can result in
the following division of CPU shares:
PID container CPU CPU share
100 {C0} 0 100% of CPU0
101 {C1} 1 100% of CPU1
102 {C1} 2 100% of CPU2
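To make the weighting concrete, here is a small sketch (the container names and the myorg/worker image are placeholders, not anything from the docs above):
docker run -d --name important --cpu-shares 1024 myorg/worker
docker run -d --name batch-a --cpu-shares 512 myorg/worker
docker run -d --name batch-b --cpu-shares 512 myorg/worker
# Under contention, "important" gets roughly 1024/(1024+512+512) = 50% of the
# CPU time; when the other two are idle it can still use everything.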
I am running postgres timescaledb on my docker swarm node. I set the CPU limit to 4 and the memory limit to 32G. When I check docker stats, I can see this output:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
c6ce29c7d1a4 pg-timescale.1.6c1hql1xui8ikrrcsuwbsoow5 341.33% 20.45GiB / 32GiB 63.92% 6.84GB / 5.7GB 582GB / 172GB 133
CPU% is oscillating around 400%. The node has 6 CPUs and the average load had been 1 - 2 (1-minute load average), so as I understand it, with my limit of 4 CPUs the maximum load should oscillate around 6. My current load is 20 (1-minute load average), and the output of the top command from inside postgres shows 50-60%.
My service configuration limit:
deploy:
  resources:
    limits:
      cpus: '4'
      memory: 32G
I am confused; all the values are different, so what is the real CPU usage of postgres and how do I limit it? My server load is pushed to the maximum even though the postgres limit is set to 4. Inside postgres I can see from htop that there are 6 cores and 64G of memory, so it looks like it has all the resources of the host. From docker stats the maximum cpu is 400%, which correlates with the limit of 4 cpus.
Load average from commands like top in Linux refers to the number of processes running or waiting to run, averaged over some time period. CPU limits used by docker specify the number of CPU cycles over some timeframe permitted for processes inside of a cgroup. These aren't really measuring the same thing, especially when you factor in things like I/O waiting. You can have a process that wants to run but is blocked waiting on a read from disk; it increases your load measurement on the host without using any CPU cycles.
When calculating how much CPU to allocate to a cgroup, not only do you need to factor in the I/O and other system needs of the process, but you should also consider queuing theory as you approach saturation on the CPU. The closer you get to 100% CPU utilization, the longer the queue of processes ready to run is likely to be, resulting in significant jumps in load measurements.
Setting these limits correctly will likely require trial and error because not all processes are the same, and not all workload on the host is the same. A batch processing job that kicks off at irregular intervals and saturates the drives and network will have a very different impact on the host from a scientific computation that is heavily CPU and memory bound.
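If you want a more direct signal than load average for whether the 4-CPU quota is actually being enforced, the cgroup's own CPU accounting is useful. A quick check, assuming a cgroup v1 layout inside the container (under cgroup v2 the file is /sys/fs/cgroup/cpu.stat instead):
docker exec pg-timescale.1.6c1hql1xui8ikrrcsuwbsoow5 cat /sys/fs/cgroup/cpu/cpu.stat
# nr_periods     -- scheduler periods that have elapsed for the cgroup
# nr_throttled   -- periods in which the container hit its quota and was throttled
# throttled_time -- total time (nanoseconds) its tasks were held back by the quota
A steadily growing nr_throttled means the limit is working; the processes are simply queuing up behind it, which is exactly what drives the load average up.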
I am running a java process as a docker swarm service, but that service eventually hogs my CPU. I tried a CPU limit of 1, and docker stats shows that container consistently at 100%, but I want the container to fail at 95% and be recreated. Is there any way I can accomplish this?
Thanks in advance.
CPU is a compressible resource, unlike memory. When memory requests exceed the limit, the kernel will kill the app. When CPU exceeds the limit, the kernel simply gives that process less time on the CPU and it runs slower.
There's no built in capability to change this behavior. You would need to implement some form of external monitoring with the ability to kill the container when a threshold is exceeded.
More than likely, what you actually want is to set up a healthcheck for your container that detects the application becoming unresponsive. You will need to run the container using swarm mode so that a container with a failing healthcheck is automatically recreated.
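A rough sketch of that approach (the service name, image, port, and /health endpoint are placeholders, not from the question; the flags are the standard docker service create healthcheck options):
docker service create \
  --name my-java-service \
  --limit-cpu 1 \
  --health-cmd 'curl -f http://localhost:8080/health || exit 1' \
  --health-interval 30s \
  --health-timeout 5s \
  --health-retries 3 \
  myorg/my-java-app
# When the health command fails --health-retries times in a row, swarm marks the
# task unhealthy and schedules a replacement: the recreate-on-failure behavior
# being asked for, keyed on responsiveness rather than on a CPU percentage.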
I have a C program running in an Alpine docker container. The image size is 10M on both OSX and on Ubuntu.
On OSX, when I run this image, 'docker stats' shows it using 1M RAM, so in the docker compose file I allocate a max of 5M within my swarm.
However, on Ubuntu 16.04.4 LTS the image is also 10M but when running it uses about 9M RAM, and I have to increase the max allocated memory in my compose file.
Why is there a such a difference in RAM usage between OSX and Ubuntu?
Even though the OSs are different, I would have thought that once you are running inside the same framework you would behave similarly on different machines, and so there should be comparable memory usage.
Update:
Thanks for the comments. So 'stats' may be inaccurate, and there are differences between platforms, so it is best to baseline on Linux. As an aside, which I think is interesting, the reason for asking this question is to understand what happens 'under the hood' in order to tune my setup for a large number of deployed programs. Originally, when I tested, I tried to allocate the smallest amount of maximum RAM on Ubuntu; this resulted in a lot of disk thrashing, something I didn't see or hear on my MacBook (no hard disks!).
Some numbers which are completely my setup but also I think are interesting.
1000 docker containers, 1 C program each, 20M RAM MAX per container, Server load of 98, Server runs 4K processes in total, [1000 C programs total]
20 docker containers, 100 C programs each, 200M RAM MAX per container, Server load of 5 to 50, Server runs 2.3K processes in total, [2000 C programs total].
This all points at giving your docker containers a good amount of MAX RAM, and it is kinder to your server to have fewer docker containers running.
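If you want to sanity-check what 'stats' reports on the Linux host, one option is to compare it with the kernel's own cgroup accounting (a sketch, assuming a cgroup v1 memory controller; myapp is a placeholder container name):
docker stats --no-stream --format '{{.Name}}: {{.MemUsage}} ({{.MemPerc}})' myapp
docker exec myapp cat /sys/fs/cgroup/memory/memory.usage_in_bytes
# The second number is what the memory cgroup has charged to the container,
# which is the figure the compose memory limit is enforced against.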
I'm relatively new to docker. I'm trying to get the percent of cpu quota (actually) used by a container. Is there a default metric emitted by one of the endpoints or is it something that I will have to calculate with other metrics? Thanks!
docker stats --no-stream
CONTAINER ID NAME CPU % (rest of line truncated)
949e2a3724e6 practical_shannon 8.32% (truncated)
As mentioned in the comment from #asuresh4, above, docker stats appears to give the ACTUAL cpu utilization, not the configured values. The output here is from Docker version 17.12.1-ce, build 7390fc6
--no-stream means run stats once, not continuously as it normally does. As you might guess, you can also ask for stats on a single container (specify the container name or id).
In addition to CPU %, MEM USAGE / LIMIT, MEM %, NET I/O, and BLOCK I/O are also shown.
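If you want the number from an API endpoint rather than the CLI, the Engine's /containers/{id}/stats endpoint returns the raw counters, and you can apply the CPU % formula from the API docs yourself. A sketch, assuming curl and jq are available and you have access to the Docker socket (the container ID is the one from the output above):
CONTAINER=949e2a3724e6
curl -s --unix-socket /var/run/docker.sock \
  "http://localhost/containers/$CONTAINER/stats?stream=false" |
jq '(.cpu_stats.cpu_usage.total_usage - .precpu_stats.cpu_usage.total_usage) as $cpu
    | (.cpu_stats.system_cpu_usage - .precpu_stats.system_cpu_usage) as $sys
    | (.cpu_stats.online_cpus // (.cpu_stats.cpu_usage.percpu_usage | length)) as $n
    | if $sys > 0 then ($cpu / $sys) * $n * 100 else 0 end'
# This reproduces the CPU % column of docker stats. Dividing it by the configured
# quota (HostConfig.NanoCpus / 1e9 from docker inspect) then gives the share of
# the quota actually in use.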
If I am running multiple docker containers with bursty memory and CPU utilization, will they be able to use the full capacity of the host machine? Or will they be limited to their CPU and memory limits of the individual container definitions?
For example:
Suppose I were running 3 containers that burst to 1GB of memory once per day, at disjoint times.
And similarly, suppose those same containers were instead CPU heavy and burst to 1 CPU unit once per day, at disjoint times.
Could I run those 3 containers on a box with only 1.1GB of memory, or 1.1 CPU units, respectively?
Docker containers are not VMs.
They run in a cage on top of the host OS kernel, so there's no hypervisor magic behind them.
Processes running inside a container are not much different from host processes from a kernel point of view. They are just highly isolated.
Memory and cpu scheduling will be handled by the "host". What you set in the docker settings are CPU shares, which give priority and bounds to some containers.
So yes, containers with sleeping processes won't consume much cpu/memory, provided the used memory is correctly freed after the processing spike; otherwise, that memory would be swapped out, without much performance impact.
Instantiating a docker container will only consume memory resources. As long as no process is running, you will see zero cpu usage by it.
I would recommend reviewing the cgroups documentation, specifically the docs for cgroups v2, since they are better structured than the v1 docs. See chapter 5 for the cpu and memory controllers: https://www.kernel.org/doc/Documentation/cgroup-v2.txt
When you don't explicitly specify the --memory and --cpu-shares options at container startup, the container will have all the cpu share and memory available for use on the instance. If no other process is consuming the resources, then the container can use all the cpu and memory available.
In theory you should be able to run the 3 containers on the instance.
Make sure none of the containers tie up the memory or cpu resources.
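As a rough illustration of the difference between no limits, a soft weighting, and a hard cap (the myapp image name is a placeholder, and --cpus assumes a reasonably recent Docker):
# No limits: each container can burst into whatever the host has free, which is
# what lets 3 bursty containers share a small box at disjoint times.
docker run -d --name burst-1 myapp
docker run -d --name burst-2 myapp
docker run -d --name burst-3 myapp
# Soft weighting: shares only matter when containers compete for CPU.
docker run -d --name weighted --cpu-shares 512 myapp
# Hard cap: the container is confined to its ceiling even when the host is idle.
docker run -d --name capped --memory 1g --cpus 1 myapp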