Docker contianer limiting CPU resources - docker

I have some upstream flask containers and the CPU usage hit 100% percent when i entertain some requests.
the system shows that the containers are using your CPU 100%.
My questions are:
If i limit the CPU usage on these containers, will they exit with zero error if they hit there allocated resources OR what are the disadvantages of limiting resources against docker containers?
which one is the better approach in terms of resource allocation to docker containers? (For 6 cpu cores)
a) Two containers running with default settings. (Use as much resources as the kernel can provide may be)
b) 4 containers can only use 1 CPU (--limit cpus ='1')
Please let me know if you want me elaborate more.
Thanks in Advance

Containers (and other Linux processes) that try to use more CPU cycles than they have been allocated will just get throttled: the Linux kernel will schedule other processes instead. Going over your CPU limit has no adverse consequences other than your process running slower.
For example, say your program starts 4 threads and each runs some intensive computation using a full core, but you're running this in a Docker container with --cpus=2. All four threads will run, but the combined program will be limited to 200% CPU, and the overall performance will be similar to if you had only launched 2 threads.
You will usually get better overall system utilization if you don't explicitly limit CPU utilization. If you are running 4 containers, and one of them is running the 4-thread computation job described above but the other three are idle, you will fully use the available system resources if you don't have limits.
If you do have a specific computationally intensive container, you may want to limit its CPU utilization to not starve out other processes. If you only have the one worker container and three Web server containers, consider limiting the worker to 3 or 3.5 CPUs on a 4-core system to guarantee some spare cycles for HTTP traffic. This is a tuning optimization, so look into it only if you're seeing a problem.
Note that CPU and memory work differently. You can't really use "too much" CPU, since if you wait there will always be more CPU cycles, but the kernel rations out what your process is able to run. On the other hand, memory is fixed, and your process will get killed if it goes over a memory limit.

Related

What factors can affect different containers processing on one machine at the same time?

For example, I have a 4vCPU, 8GB mem VM. At first, I ran a Nginx container on it and then used a stress test tool to continuously send requests to it and got some information like QPS, average latency. Then I ran three same Nginx containers on the VM and parallelly send the same requests above to these containers.I found that the respective QPS all decreased, and average latency all increased.
So what factors can affect different containers processing on one machine at the same time? I think the CPU and memory are enough to provide resources to these containers. What factors below the docker can affect these, firstly I think is network, but what else? And Specifically, why can network affect these QPS, average latency metrics?

How to set the CPU priority (niceness) of a Docker container?

One of my containers is always busy, and is taking CPU away from other containers (webservers) that need to be responsive and are only active from time to time.
I would like to lower the CPU priority of the CPU-consuming container, so that whenever the other containers need the CPU, it is not clogged.
How do I do this? I have been searching the web for a while now, but I can't find the answer.
I have tried running the container with --entrypoint='nice 10 mybinary', but it turns out --entrypoint can only run binaries, not shell commands.
You can limit CPU resources on the container level. I recommend to use --cpu-shares 512 for your case.
https://docs.docker.com/config/containers/resource_constraints/:
Set this flag to a value greater or less than the default of 1024 to increase or reduce the container’s weight, and give it access to a greater or lesser proportion of the host machine’s CPU cycles. This is only enforced when CPU cycles are constrained. When plenty of CPU cycles are available, all containers use as much CPU as they need. In that way, this is a soft limit. --cpu-shares does not prevent containers from being scheduled in swarm mode. It prioritizes container CPU resources for the available CPU cycles. It does not guarantee or reserve any specific CPU access.
Setting the CPU shares is the most direct answer to your request, and typically preferred over adding capabilities to the container could be used by a malicious actor inside of the container to impact the host. The only reason I can think of to add the SYS_NICE capability to the container is if you have multiple processes inside the container and want to give different priorities to them, or need to change the priority while the container is running.
The more traditional solution to noisy neighbors is to configure each container with a limit on how much CPU and memory it is allowed to use. This is an upper bound, so realize there may be idle CPU resources if you set this low and do not have any other tasks available for the CPU to run.
The easiest way to set the limit on containers from the docker run command line is with --cpus which allows you to configure a fractional number of cores to be available to the container. Passing an option like --cpus 2.5 allows the container to use as many as 2.5 cores before the kernel scheduler throttles the process. If you had a 4 core host, that would ensure that at least 1.5 cores are always available to other processes.
Related to these limits, with Swarm Mode you can also configure a reservation for CPU (and memory). The reservation is a lower limit that Docker ensures has not been reserved for any other containers. This is used to select nodes to schedule containers, and may prevent some containers from being scheduled when there are not enough resources available, rather than scheduling so many jobs on a single node that it fails.
--cpu-shares looks like a good answer, although it's not clear to me how to verify it's working. I'm also curious what the max value is? Document doesn't say.
But, as an alternative for trusted containers, that same document also shows --cap-add=sys_nice that will allow changing process priorities within a container. i.e., if the nice or renice command is available within the container, it should work when you add the sys_nice capability. You'll only want to allow this capability for trusted containers because you don't want untrusted programs changing their own priorities willy nilly.
You can verify by inspecting the NI column for the process in question using top or ps -efl on the host.

How to calculate the docker limits

I create my docker (python flask).
How can I calculate what is the limit to put for memory and CPU?
Do we have some tools that run performance tests on docker with different limitation and then advise what is the best limitation numbers to put?
With an application already running inside of a container, you can use docker stats to see the current utilization of CPU and memory. While there it little harm in setting CPU limits too low (it will just slow down the app, but it will still run), be careful to keep memory limits above the worst case scenario. When apps attempt to exceed their memory limit, they will be killed and usually restarted by a restart policy/orchestration tool. If the limit is set too low, you may find your app in a restart loop.
This is more about the consumption of your specific Flask application, you can probably take use the resource module in Python to calculate them.
More information here and here.

What purpose do CPU limits have in Kubernetes resp. Docker?

I dig into Kubernetes resource restrictions and have a hard time to understand what CPU limits are for. I know Kubernetes passes requests and limits down to the (in my case) Docker runtime.
Example: I have 1 Node with 1 CPU and 2 Pods with CPU requests: 500m and limits: 800m. In Docker, this results in (500m -> 0.5 * 1024 = 512) --cpu-shares=512 and (800m -> 800 * 100) --cpu-quota=80000. The pods get allocated by Kube scheduler because the requests sum does not exceed 100% of the node's capacity; in terms of limits the node is overcommited.
The above allows each container to get 80ms CPU time per 100ms period (the default). As soon as the CPU usage is 100%, the CPU time is shared between the containers based on their weight, expressed in CPU shares. Which would be 50% for each container according to the base value of 1024 and a 512 share fo each. At this point - in my understanding - the limits have no more relevance because none of the containers can get its 80ms anymore. They both would get 50ms. So no matter how much limits I define, when usage reaches critical 100%, it's partitioned by requests anyway.
This makes me wonder: Why should I define CPU limits in the first place, and does overcommitment make any difference at all? requests on the other hand in terms of "how much share do I get when everything is in use" is completely understandable.
One reason to set CPU limits is that, if you set CPU request == limit and memory request == limit, your pod is assigned a Quality of Service class = Guaranteed, which makes it less likely to be OOMKilled if the node runs out of memory. Here I quote from the Kubernetes doc Configure Quality of Service for Pods:
For a Pod to be given a QoS class of Guaranteed:
Every Container in the Pod must have a memory limit and a memory request, and they must be the same.
Every Container in the Pod must have a CPU limit and a CPU request, and they must be the same.
Another benefit of using the Guaranteed QoS class is that it allows you to lock exclusive CPUs for the pod, which is critical for certain kinds of low-latency programs. Quote from Control CPU Management Policies:
The static CPU management policy allows containers in Guaranteed pods with integer CPU requests access to exclusive CPUs on the node. ... Only containers that are both part of a Guaranteed pod and have integer CPU requests are assigned exclusive CPUs.
According to the Motivation for CPU Requests and Limits section of the Assign CPU Resources to Containers and Pods Kubernetes walkthrough:
By having a CPU limit that is greater than the CPU request, you
accomplish two things:
The Pod can have bursts of activity where it makes use of CPU resources that happen to be available.
The amount of CPU resources a Pod can use during a burst is limited to some reasonable amount.
I guess that might leave us wondering why we care about limiting the burst to "some reasonable amount" since the very fact that it can burst seems to seems to suggest there are no other processes contending for CPU at that time. But I find myself dissatisfied with that line of reasoning...
So first off I checked out the command line help for the docker flags you mentioned:
--cpu-quota int Limit CPU CFS (Completely Fair Scheduler) quota
-c, --cpu-shares int CPU shares (relative weight)
Reference to the Linux Completely Fair Scheduler means that in order to understand the value of CPU limit/quota we need to undestand how the underlying process scheduling algorithm works. Makes sense, right? My intuition is that it's not as simple as time-slicing CPU execution according to the CPU shares/requests and allocating whatever is leftover at the end of some fixed timeslice on a first-come, first-serve basis.
I found this old Linux Journal article snippet which seems to be a legit description of how CFS works:
The CFS tries to keep track of the fair share of the CPU that would
have been available to each process in the system. So, CFS runs a fair
clock at a fraction of real CPU clock speed. The fair clock's rate of
increase is calculated by dividing the wall time (in nanoseconds) by
the total number of processes waiting. The resulting value is the
amount of CPU time to which each process is entitled.
As a process waits for the CPU, the scheduler tracks the amount of
time it would have used on the ideal processor. This wait time,
represented by the per-task wait_runtime variable, is used to rank
processes for scheduling and to determine the amount of time the
process is allowed to execute before being preempted. The process with
the longest wait time (that is, with the gravest need of CPU) is
picked by the scheduler and assigned to the CPU. When this process is
running, its wait time decreases, while the time of other waiting
tasks increases (as they were waiting). This essentially means that
after some time, there will be another task with the largest wait time
(in gravest need of the CPU), and the currently running task will be
preempted. Using this principle, CFS tries to be fair to all tasks and
always tries to have a system with zero wait time for each
process—each process has an equal share of the CPU (something an
“ideal, precise, multitasking CPU” would have done).
While I haven't gone as far as to dive into the Linux kernel source to see how this algorithm actually works, I do have some guesses I would like to put forth as to how shares/requests and quotas/limits play into this CFS algorithm.
First off, my intuition leads me to believe that different processes/tasks accumulate wait_runtime at different relative rates based on their assigned CPU shares/requests since Wikipedia claims that CFS is an implementation of weighted fair queuing and this seems like a reasonable way to achieve a shares/request based weighting in the context of an algorithm that attempts to minimize the wait_runtime for all processes/tasks. I know this doesn't directly speak to the question that was asked, but I want to be sure that my explanation as a whole has a place for both concepts of shares/requests and quotas/limits.
Second, with regard to quotas/limits I intuit that these would be applicable in situations where a process/task has accumulated a disproportionately large wait_runtime while waiting on I/O. Remember that the quoted description above CFP prioritizes the process/tasks with the largest wait_runtime? If there were no quota/limit on a given process/task then it seems to me like a burst of CPU usage on that process/task would have the effect of, for as long as it takes for its wait_runtime to reduce enough that another task is allowed to preempt it, blocking all other processes/tasks from execution.
So in other words, CPU quotas/limits in Docker/Kubernetes land is a mechanism that allows the given container/pod/process to burst in CPU activity to play catch up to other processes after waiting on I/O (rather than CPU) without in the course of doing so unfairly blocking other processes from also doing work.
There is no upper bound with just cpu shares. If there are free cycles, you are free to use them. limit is imposed so that one rogue process is not holding up the resource forever.
There should be some fair scheduling. CFS imposes that using cpu quota and cpu period via the limit attribute configured here.
To conclude, this kind of property ensures that when I schedule your task you get a minimum of 50 milliseconds to finish it. If you need more time, then if no one is waiting in the queue I would let you run for few more but not more than 80 milliseconds.
I think it's correct that, during periods where the Node's CPU is being fully utilized, it's the requests (CPU shares) that will determine how much CPU time each container gets, rather than the limits (which are effectively moot at that point). In that sense, a rogue process can't do unlimited damage (by depriving another of its requests).
However, there are still two broad uses for limits:
If you don't want a container to be able to use more than a fixed amount of CPU even if extra CPU is available on the Node. It might seem weird that you wouldn't want excess CPU to be utilized, but there are use cases for this. Some that I've heard:
You're charging customers for the right to use up to x amount of compute resources (a limit), so you don't want to give them more sometimes for free (which might dissuade them from paying for a higher tier on your service).
You're trying to figure out how a service will perform under load, but this gets complicated/unpredictable, because the performance during your load testing depends on how much spare CPU is lying around that the service is able to utilize (which might be a lot more than the spare CPU that'll actually be on the Node during a real-world high-load situation). This is mentioned here as a big risk.
If the requests on all the containers aren't set especially accurately (as is often the case; devs might set the values upfront and forget to update them as the service evolves, or not even set them very carefully initially). In these cases, things sometimes still function well enough if there's enough slack on the Node; limits can then be useful to prevent a buggy workload from eating all the slack and forcing the other pods back to their incorrectly-set(!) requested amounts.

Passenger server upgrade: Processor (CPU) Cores VS Ram?

I went through documentation of Passenger to find out how many application instances it can run with respect to hardware configuration. Documentation only talks about RAM
The optimal value depends on your system’s hardware and the server’s average load. You should experiment with different values. But generally speaking, the value should be at least equal to the number of CPUs (or CPU cores) that you have. If your system has 2 GB of RAM, then we recommend a value of 30. If your system is a Virtual Private Server (VPS) and has about 256 MB RAM, and is also running other services such as MySQL, then we recommend a value of 2.
It says minimum value can be number of CPU/CPU Cores we have. I have a VPS with one VCPU & 1GB RAM & my service provider has an option to just upgrade the RAM. I'm wondering how far I can just keep upgrading only RAM? How important it is to upgrade number of CPUs?
Quick Answer
Depends on what resources are the bottleneck for your app.
Long answer
You'll need to factor in a few things:
How much CPU time does your app need?
How much RAM does any given instance of your app use at peak load?
Does your app spend a lot of time doing IO intensive tasks? (ie: db and file reads/writes, network communication)
There can be other things to factor in, but your bottlenecks will probably be one of the above. If RAM is your main bottleneck, by all means use your newly available RAM. However, if it turns out that your app is being slowed down by CPU availability or flooded IO, no amount of RAM is going to speed things up.
On the topic of CPU cores; my understanding is that the main Apache process that runs Passenger is a single threaded process. Apache spawns new threads to handle concurrency on an as-needed basis. Each additional CPU core theoretically allows you to spawn x*n threads, where x is the number of threads you can optimally run under a single CPU core and n is the number of CPU cores available to Apache.
Disclaimer: I'm not very well read on Passenger internals; though this logic usually holds true for other kinds of Apache configurations.

Resources