percent of cpu quota (actually) used by a container - docker

I'm relatively new to docker. I'm trying to get the percent of cpu quota (actually) used by a container. Is there a default metric emitted by one of the endpoints or is it something that I will have to calculate with other metrics? Thanks!

docker stats --no-stream
CONTAINER ID NAME CPU % (rest of line truncated)
949e2a3724e6 practical_shannon 8.32% (truncated)
As mentioned in the comment from #asuresh4 above, docker stats appears to give the ACTUAL CPU utilization, not the configured limits. The output here is from Docker version 17.12.1-ce, build 7390fc6.
--no-stream means run stats once instead of continuously, as it does by default. As you might guess, you can also ask for stats on a single container by specifying its name or id.
In addition to CPU %, MEM USAGE / LIMIT, MEM %, NET I/O, and BLOCK I/O are also shown.
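If what you want is the usage expressed as a percent of the configured quota rather than of the host CPUs, one option is to combine docker stats with docker inspect. A minimal sketch, assuming the container was started with --cpus so that HostConfig.NanoCpus is set (it is 0 when no limit was configured):

#!/bin/bash
# percent of the configured CPU quota actually used by a container
# (the name practical_shannon is just the container from the output above)
CID=practical_shannon
USED=$(docker stats --no-stream --format '{{.CPUPerc}}' "$CID" | tr -d '%')
NANO=$(docker inspect --format '{{.HostConfig.NanoCpus}}' "$CID")
# NanoCpus is billionths of a CPU (4000000000 for --cpus=4), so
# percent-of-quota = USED * 1e9 / NanoCpus
echo "scale=2; $USED * 1000000000 / $NANO" | bc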

Related

Docker Container shows different CPU Usage with different tools

I am building a project inside a Docker container that was created without any resource limits. When I monitor it, I see different results for CPU usage:
from ctop
from Grafana (Full Node Exporter chart)
and from cAdvisor
I do not understand why the results are different, especially with the ctop command.
But my main question is: does Docker really use all CPUs? This machine has 16 vCPUs and 16 GB RAM.
It's not exactly clear from the node exporter which instance or container you are monitoring, but it appears the node exporter is showing total machine CPU usage on a 0-100% scale, while ctop shows 100% per vCPU.
Also try docker stats; it shows resource usage for all running containers, from CPU to network and disk. There, each vCPU counts as 100%, so your total will be 1600% for 16 vCPUs.
Regarding the cAdvisor output, it doesn't cover the same time range as the Grafana node exporter chart, so it's hard to draw a firm conclusion, but it seems that, like ctop and docker stats, it reports on a per-core basis, only in units of 'cores' rather than a percentage.
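To put the tools on the same scale, you can normalize the per-vCPU percentages from docker stats to the 0-100% whole-machine scale the node exporter chart uses. A rough sketch, assuming the 16 vCPUs from the question:

docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}' \
  | awk -v ncpu=16 '{ gsub("%", "", $2); printf "%s %.1f%% of the whole machine\n", $1, $2 / ncpu }'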

What is real CPU usage of container from docker stats command

I am running Postgres TimescaleDB on my Docker Swarm node. I set the CPU limit to 4 and the memory limit to 32G. When I check docker stats, I see this output:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
c6ce29c7d1a4 pg-timescale.1.6c1hql1xui8ikrrcsuwbsoow5 341.33% 20.45GiB / 32GiB 63.92% 6.84GB / 5.7GB 582GB / 172GB 133
CPU % oscillates around 400%. The node has 6 CPUs and the 1-minute load average has been 1-2, so by my reasoning, with my CPU limit of 4, the maximum load should oscillate around 6. My current 1-minute load average is 20, and the output of top from inside the postgres container shows 50-60%.
My service configuration limit:
deploy:
  resources:
    limits:
      cpus: '4'
      memory: 32G
I am confused; all the values are different, so what is the real CPU usage of postgres and how do I limit it? My server load is pushed to the maximum even though the postgres limit is set to 4. Inside the postgres container, htop shows 6 cores and 64G of memory, so it looks like it sees all the resources of the host. From docker stats the maximum CPU is 400%, which does correlate with the limit of 4 CPUs.
Load average from commands like top in Linux refers to the number of processes running or waiting to run, averaged over some time period. CPU limits used by Docker specify the number of CPU cycles over some timeframe permitted for processes inside a cgroup. These aren't really measuring the same thing, especially when you factor in things like I/O waiting. You can have a process waiting for a read from disk that wants to run but is blocked on that I/O call, increasing your load measurements on the host without using any CPU cycles.
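Concretely, a cpus: '4' limit becomes a CFS quota in the container's cgroup, and that is all it constrains. A quick way to see it on the host (a sketch assuming cgroup v1 with the default cgroupfs driver; the exact path varies with Docker and cgroup versions):

cat /sys/fs/cgroup/cpu/docker/<container-id>/cpu.cfs_period_us   # e.g. 100000, the 100 ms scheduling period
cat /sys/fs/cgroup/cpu/docker/<container-id>/cpu.cfs_quota_us    # e.g. 400000, i.e. 4 CPUs' worth of time per period

Processes in the cgroup can burn at most quota/period CPU time per period, no matter how many of them are runnable or blocked on I/O, which is why CPU % and load average can tell very different stories.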
When calculating how much CPU to allocate to a cgroup, not only do you need to factor in the I/O and other system needs of the process, you should also consider queuing theory as you approach saturation on the CPU. The closer you get to 100% utilization of the CPU, the longer the queue of processes ready to run is likely to be, resulting in significant jumps in load measurements.
Setting these limits correctly will likely require trial and error because not all processes are the same, and not all workload on the host is the same. A batch processing job that kicks off at irregular intervals and saturates the drives and network will have a very different impact on the host from a scientific computation that is heavily CPU and memory bound.

Nifi 1.6.0 memory leak

We're running Docker containers of NiFi 1.6.0 in production and have come across a memory leak.
Once started, the app runs just fine; however, after a period of 4-5 days, the memory consumption on the host keeps increasing. When checked in the NiFi cluster UI, the JVM heap used is hardly around 30%, but memory at the OS level goes to 80-90%.
On running the docker stats command, we found that the NiFi docker container is consuming the memory.
After collecting the JMX metrics, we found that the RSS memory keeps growing. What could be the potential cause of this? In the JVM tab of the cluster dialog, young GC seems to be happening in a timely manner, with old GC counts shown as 0.
How do we go about identifying what's causing the RSS memory to grow?
You need to replicate that in a non-Docker environment, because with Docker the reported memory is known to rise.
As I explained in "Difference between Resident Set Size (RSS) and Java total committed memory (NMT) for a JVM running in Docker container", docker has some bugs (like issue 10824 and issue 15020) which prevent an accurate report of the memory consumed by a Java process within a Docker container.
That is why a plugin like signalfx/docker-collectd-plugin mentions (as of two weeks ago) in its PR (Pull Request) 35 that it will "deduct the cache figure from the memory usage percentage metric":
Currently the calculation for memory usage of a container/cgroup being returned to SignalFX includes the Linux page cache.
This is generally considered to be incorrect, and may lead people to chase phantom memory leaks in their application.
For a demonstration on why the current calculation is incorrect, you can run the following to see how I/O usage influences the overall memory usage in a cgroup:
docker run --rm -ti alpine
# inside the container, note the current figures:
cat /sys/fs/cgroup/memory/memory.stat
cat /sys/fs/cgroup/memory/memory.usage_in_bytes
# write 100MB into the page cache, then look again:
dd if=/dev/zero of=/tmp/myfile bs=1M count=100
cat /sys/fs/cgroup/memory/memory.stat
cat /sys/fs/cgroup/memory/memory.usage_in_bytes
You should see that the usage_in_bytes value rises by 100MB just from creating a 100MB file. That file hasn't been loaded into anonymous memory by an application, but because it's now in the page cache, the container memory usage appears to be higher.
Deducting the cache figure in memory.stat from the usage_in_bytes shows that the genuine use of anonymous memory hasn't risen.
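As a rough illustration of that deduction (cgroup v1 paths assumed, run inside the container):

USAGE=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
CACHE=$(awk '$1 == "cache" {print $2}' /sys/fs/cgroup/memory/memory.stat)
# usage minus page cache approximates the genuine anonymous-memory footprint
echo $(( USAGE - CACHE ))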
The signalFX metric now differs from what is seen when you run docker stats which uses the calculation I have here.
It seems like knowing the page cache use for a container could be useful (though I am struggling to think of when), but knowing it as part of an overall percentage usage of the cgroup isn't useful, since it then disguises your actual RSS memory use.
In a garbage-collected application with a max heap size as large as, or larger than, the cgroup memory limit (e.g. the -Xmx parameter for Java, or .NET Core in server mode), the tendency will be for the percentage to get close to 100% and then just hover there, assuming the runtime can see the cgroup memory limit properly.
If you are using the Smart Agent, I would recommend using the docker-container-stats monitor (to which I will make the same modification to exclude cache memory).
Yes, the NiFi Docker image has memory issues; usage shoots up after a while and the container restarts on its own. The non-Docker deployment, on the other hand, works absolutely fine.
Details:
Docker:
Run it with a 3 GB heap size and immediately after startup it consumes around 2 GB. Run some processors, the machine's fan runs heavily, and it restarts after a while.
Non-Docker:
Run it with a 3 GB heap size and it takes 900 MB and runs smoothly (per jconsole).

How to make a Jupyter Docker container use more memory

I'm running a jupyter/scipy-notebook Docker container.
I have not restricted the memory assigned to the container with the run command.
However, what I see from the docker stats command is that the container limits its memory usage to 2 GB (out of 16 GB available!), even when doing complex calculations.
How is this possible?
Alter the resources (RAM) settings from Docker Desktop - MAC/Windows.
MAC - Docker Desktop
Preferences --> Advanced --> Change Ram Settings
Windows - Docker Desktop
Settings --> Resources --> Change the CPU / RAM / SWAP Settings
Reference: Compiled the solution from #samirko and #Andris Birkmanis. (Added Windows Solution)
I am running Docker on macOS and Jupyter crashed when trying to read a CSV file over 600 MB. Following Andris Birkmanis's instructions to increase the memory allocated to Docker helped tackle the issue.
If everything is going well, Docker shouldn't limit memory usage at all by default. So the MEM USAGE / LIMIT shown by docker stats [containerid] should match your total memory (16 GB in your case), although that memory is not free, just available.
Furthermore, there's no way to set a default Docker memory limit when invoking dockerd,
so the only thing I can propose is to specify a memory limit in docker run:
-m, --memory="" Memory limit (format: <number>[<unit>]). Number is a positive integer. Unit can be one of b, k, m, or g. Minimum is 4M.
--memory-swap="" Total memory limit (memory + swap, format: <number>[<unit>]). Number is a positive integer. Unit can be one of b, k, m, or g.
--memory-reservation="" Memory soft limit (format: <number>[<unit>]). Number is a positive integer. Unit can be one of b, k, m, or g.
--kernel-memory="" Kernel memory limit (format: <number>[<unit>]). Number is a positive integer. Unit can be one of b, k, m, or g. Minimum is 4M.
For more information, please check the Docker documentation on run-time options.
Try your docker run with --memory-reservation=10g ... and let's see.
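For example, a minimal sketch using the jupyter/scipy-notebook image from the question (the sizes and port are only illustrative):

docker run -d -p 8888:8888 \
  --memory=8g \
  --memory-swap=8g \
  --memory-reservation=6g \
  jupyter/scipy-notebook

With --memory-swap equal to --memory, the container gets no additional swap on top of the 8 GB hard limit.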

Killing containers using docker stats results

I'm trying to figure out how to kill a container whose CPU usage is over 100%, using the docker stats results. I have created the script below, which exports the stats to a file, then looks for container ids with CPU over 100% and kills them. The problem is that it appears to be killing containers that are only at 40%. The results come back in the format 00.00%, which I think might be the problem, but I'm not sure how awk treats the number when comparing it against the % in the file.
#!/bin/bash
# snapshot current container stats to a file
docker stats --no-stream > /tmp/cpu.log
# strip the CONTAINER header token so awk only sees data rows
sed -i 's/CONTAINER//g' /tmp/cpu.log
# collect ids whose CPU column exceeds the threshold, then stop them
KILLCPU=$(awk '$2 >= 11000 {print $1}' /tmp/cpu.log)
docker stop $KILLCPU
Add a +0 to the field to get awk to properly recognize the percentage. Without it, awk compares a value like "40.00%" as a string rather than a number, which is why containers at 40% were being matched; the +0 forces a numeric conversion that drops the trailing %.
KILLCPU=$(awk '$2+0 >= 110 {print $1}' /tmp/cpu.log)
When a container uses more than 100% CPU, it is running on more than one CPU. Killing containers because they reach a certain percentage is not the correct approach.
I suggest you use the --cpu-shares option of docker run:
See:
https://docs.docker.com/engine/reference/run/#cpu-share-constraint
CPU share constraint
By default, all containers get the same proportion of CPU cycles. This proportion can be modified by changing the container's CPU share weighting relative to the weighting of all other running containers.
To modify the proportion from the default of 1024, use the -c or --cpu-shares flag to set the weighting to 2 or higher. If 0 is set, the system will ignore the value and use the default of 1024.
The proportion will only apply when CPU-intensive processes are running. When tasks in one container are idle, other containers can use the left-over CPU time. The actual amount of CPU time will vary depending on the number of containers running on the system.
For example, consider three containers, one has a cpu-share of 1024 and two others have a cpu-share setting of 512. When processes in all three containers attempt to use 100% of CPU, the first container would receive 50% of the total CPU time. If you add a fourth container with a cpu-share of 1024, the first container only gets 33% of the CPU. The remaining containers receive 16.5%, 16.5% and 33% of the CPU.
On a multi-core system, the shares of CPU time are distributed over all CPU cores. Even if a container is limited to less than 100% of CPU time, it can use 100% of each individual CPU core.
For example, consider a system with more than three cores. If you start one container {C0} with -c=512 running one process, and another container {C1} with -c=1024 running two processes, this can result in the following division of CPU shares:
PID container CPU CPU share
100 {C0} 0 100% of CPU0
101 {C1} 1 100% of CPU1
102 {C1} 2 100% of CPU2
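A quick way to try this out yourself (the busybox image and busy loops are only illustrative; each loop pins one core):

# C0 gets half the weight and runs a single busy process
docker run -d --name C0 -c 512 busybox sh -c 'while :; do :; done'
# C1 gets the default weight of 1024 and runs two busy processes
docker run -d --name C1 -c 1024 busybox sh -c 'while :; do :; done & while :; do :; done'
# compare their CPU % once both are running
docker stats --no-stream C0 C1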

Resources