Parallel Docker Container Creation - docker

I am using a Docker Setup that consists of 14 different containers. Every container gets a cpu_limit of 2 and a mem_limit of 2g.
To create and run these containers, I've written a Python script that uses the docker-py library. As of now, the containers are created sequentially, which takes approximately 2 minutes.
Now I'm thinking about parallelizing the process. Instead of doing this (it's pseudocode):
for container in containers_to_start:
    create_container(container)
I do
from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
pool.map(create_container, containers_to_start)
As a result, the 14 containers are created twice as fast. BUT: the applications within the containers take significantly longer to boot. In the end I don't gain much: the time until every application is reachable is more or less the same, with or without multithreading.
But I don't really know why, because every container gets the same amount of CPU and memory resources, so I would expect the same boot time no matter how many containers are starting at the same time. Clearly this is not the case. Maybe I'm missing some knowledge here, any explanation would be greatly appreciated.
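A runnable version of the two variants above might look like the following sketch. The create_container function here is a hypothetical stand-in that only sleeps, and the container names are made up; in the real script it would wrap the docker-py calls. Note that this measures creation time only, not the time until each application is reachable, which is the figure that did not improve.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool  # thread-based pool


def create_container(name):
    """Hypothetical stand-in for the real docker-py call; it only sleeps."""
    time.sleep(0.05)
    return name


def start_all(names, workers):
    """Create every container using `workers` threads; return (results, seconds)."""
    began = time.perf_counter()
    with ThreadPool(workers) as pool:
        results = pool.map(create_container, names)
    return results, time.perf_counter() - began


# Illustrative names only; the real list comes from the deployment script.
containers_to_start = [f"service-{i}" for i in range(14)]

sequential, t_seq = start_all(containers_to_start, workers=1)
parallel, t_par = start_all(containers_to_start, workers=4)
print(f"1 worker: {t_seq:.2f}s, 4 workers: {t_par:.2f}s")
```

To compare the numbers that actually matter, the same harness could time a function that also waits until the application inside each container answers.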
System Specs
CPU: Intel i7 @ 2.90 GHz
32GB RAM
I am using Windows 10 with Docker running on the WSL2 backend.

Related

Limit MarkLogic memory consumption in docker container

The project in which I am working develops a Java service that uses MarkLogic 9 in the backend.
We are running a Jenkins build server that executes (amongst others) several tests in MarkLogic written in XQuery.
For those tests MarkLogic is running in a docker container on the Jenkins host (which is running Ubuntu Linux).
The Jenkins host has 12 GB of RAM and 8 GB of swap configured.
Recently I have noticed that the MarkLogic instance running in the container uses a huge amount of RAM (up to 10 GB).
As there are often other build jobs running in parallel, the Jenkins host starts to swap, sometimes even eating up all swap
so that MarkLogic reports it cannot get more memory.
Obviously, this situation leads to failed builds quite often.
To analyse this further I made some tests on my PC running Docker for Windows and found out that the MarkLogic tests
can be run successfully with 5-6 GB RAM. The MarkLogic logs show that it sees all the host memory and wants to use everything.
But as we have other build processes running on that host this behaviour is not desirable.
My question: is there any way to tell MarkLogic not to use so much memory?
We are preparing the docker image during the build, so we could modify some configuration, but it has to be scripted somehow.
The issue of the container not detecting memory limit correctly has been identified, and should be addressed in a forthcoming release.
In the meantime, you might be able to mitigate the issue by:
changing the group cache sizing from automatic to manual and setting cache sizes appropriate for the allocated resources. There are a variety of ways to set these configs: deploying and setting them from an ml-gradle project, making your own Manage API REST calls, or programmatically:
admin:group-set-cache-sizing
admin:group-set-compressed-tree-cache-partitions
admin:group-set-compressed-tree-cache-size
admin:group-set-expanded-tree-cache-partitions
admin:group-set-expanded-tree-cache-size
admin:group-set-list-cache-partitions
admin:group-set-list-cache-size
reducing the in-memory-limit
in memory limit specifies the maximum number of fragments in an in-memory stand. An in-memory stand contains the latest version of any new or changed fragments. Periodically, in-memory stands are written to disk as a new stand in the forest. Also, if a stand accumulates a number of fragments beyond this limit, it is automatically saved to disk by a background thread.

How is the host machine's CPU utilized by Docker containers and other applications running on the host?

I am running a microservice application in a Docker container and have to test it with JMeter. JMeter runs on my host machine, which has 4 cores. I allocate 2 cores to the container with the --cpus=2 flag when starting it, which means it can use up to 2 cores as needed. I leave the remaining 2 cores for JMeter, other applications, and the system.
What I need clarified: what happens if JMeter and the other applications need more than 2 cores while the container also needs its 2 allocated cores fully?
Is there any way to allocate 2 cores fully to the container? (Meaning that no other application, nor the system, can use those 2 cores.)
Thank you in advance.
The answer is most probably "no"; the explanation will differ depending on your operating system.
You can try to implement this by playing with CPU affinity. However, CPU is not the only metric you should be looking at; I would be more concerned about RAM and disk usage.
In general, having the load generator and the application under test on the same physical machine is a very bad idea because both are very resource intensive. Consider using 2 separate machines; otherwise both will suffer from context switches, and you will not be able to monitor the resource usage of JMeter and the application under test using the JMeter PerfMon Plugin.
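As a rough illustration of what "playing with CPU affinity" can mean on Linux, the sketch below restricts the calling process to a chosen set of cores using os.sched_setaffinity. The helper name pin_to_cores is made up for this example, and the call is not available on Windows or macOS.

```python
import os


def pin_to_cores(wanted):
    """Restrict the current process to those `wanted` cores it may already use.

    Returns the affinity set in effect afterwards. Linux only.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is only available on Linux")
    available = os.sched_getaffinity(0)   # cores this process may use right now
    target = set(wanted) & available      # ignore cores that are not available
    if not target:
        raise ValueError(f"none of {sorted(wanted)} is in {sorted(available)}")
    os.sched_setaffinity(0, target)       # pid 0 means "the calling process"
    return os.sched_getaffinity(0)
```

Child processes inherit the affinity, so a launcher script could call pin_to_cores({2, 3}) before starting the load generator while the container is started with --cpuset-cpus="0,1". This only keeps those two workloads off each other's cores; it does not stop the rest of the system from using the container's cores, which is why the answer to the original question remains "no".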

Should Docker release all memory when all containers are closed?

I am debugging a possible memory leak in a web service I have running as a Docker network. The service has a JavaScript front end, Flask REST API, Dask worker pool, the spaCy natural language toolkit...the works. I see intermittent out-of-memory problems and I'm trying to get a handle on what could be going on.
I can run this system on my laptop, a MacBook Pro with 16 GB of memory where I am using Docker Desktop. When there are no containers running, Activity Monitor shows com.docker.hyperkit using about 12 GB. Then I launch the Docker network, which ultimately runs 14 containers to house the various components. I perform a fairly large batch job in the Docker network. It runs for an hour, during which time com.docker.hyperkit's memory creeps up to around 18 GB. This is not surprising--this is a memory intensive service. But when I stop all the containers in the network, I would expect com.docker.hyperkit's memory usage to drop back to 12 GB. Instead it stays at 18 GB. The only way I can get it back to 12 GB is to restart the Docker Desktop.
Is this expected behavior? It looks like a memory leak in Docker.
No, it should not release the memory, and yes, it is expected behavior.
There is no way to run Docker containers natively on MacOS, so you run them inside of a virtual machine. A VM gets memory assigned to it, which it assigns to processes running inside of that VM. When those processes inside of the VM exit, the resources are released back to the VM, but not back to the parent MacOS. That's just how VMs work, and the fact that it didn't take all of the memory up to the limit specified in the Docker preferences immediately on startup is an impressive feat in itself.
The containers themselves are processes running within this VM, and they will release all of their memory back to the VM upon exit. If you run something like docker run --rm busybox free you'll likely see the memory being used and freed within the VM.
For more details on this, there are several extensive threads in the GitHub issues. Most of the comments on these threads appear to be from users assuming MacOS is running containers, rather than a VM that runs containers. Even completely idle, that VM will use some resources to run the kernel, container runtime daemons, volume sharing code, port forwarding code, etc. There's a lot of magic under the covers to make Docker not look like a VM to the user, so that you can just pass paths and connect to ports on the MacOS side. The most helpful comment in the thread to me is here: https://github.com/moby/hyperkit/issues/231#issuecomment-448416559

Docker Swarm CPU overload on deploy with Spring Boot containers

I have created a number of Spring Boot applications, which all work like magic in isolation or when started up one after the other manually.
My challenge is that I want to deploy a stack with all the services in a Docker Swarm.
Initially I didn't understand what was going on, as it seemed like all my containers were hanging.
Turns out running a single Spring Boot application spikes up my CPU utilization to max it out for a good couple of seconds (20s+ to start up).
Now the issue is that Docker Swarm launches 10 of these containers simultaneously, my load average goes above 80, and the system grinds to a halt. The container HEALTHCHECKs start timing out and eventually Docker restarts the containers. This is an endless cycle that may or may not stabilize, and if it does stabilize it takes a minimum of 30 minutes. So much for micro services vs big fat Java EE applications :(
Is there any way to convince Docker to roll out the containers one by one? I'm sure this will help a lot.
There is a rolling update parameter - https://docs.docker.com/engine/swarm/swarm-tutorial/rolling-update/ - but it does not seem applicable to startup deployment.
Your help will be greatly appreciated.
I've also tried systemd (which isn't ideal for distributed micro services). It worked slightly better than Docker, but has the same issue when deploying all the applications at once.
Initially I wanted to try Kubernetes, but I've got enough on my plate and if I can get away with Docker Swarm, that would be awesome.
Thanks!
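The "one by one" rollout asked about above could be approximated outside of Swarm with a small driver script. The following is only a sketch of that idea: start and is_healthy are caller-supplied callables (hypothetical here), and nothing in it is a built-in Swarm feature.

```python
import time


def rollout_one_by_one(services, start, is_healthy, timeout=120.0, poll=1.0):
    """Start each service only after the previous one reports healthy.

    `start(name)` and `is_healthy(name)` are supplied by the caller.
    Returns the names in the order they became healthy.
    """
    ready = []
    for name in services:
        start(name)
        deadline = time.monotonic() + timeout
        while not is_healthy(name):
            if time.monotonic() > deadline:
                raise TimeoutError(f"{name} not healthy after {timeout}s")
            time.sleep(poll)  # wait before asking again
        ready.append(name)
    return ready
```

In a Swarm setup, start might wrap a command such as docker service scale for one service, and is_healthy might request the application's health endpoint; both mappings are assumptions and would need adapting to the actual stack.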

Limiting a Docker Container to a single cpu core

I'm trying to build a system which runs pieces of code in consistent conditions, and one way I imagine this being possible is to run the various programs in docker containers with the same layout, reserving the same amount of memory, etc. However, I can't seem to figure out how to keep CPU usage consistent.
The closest thing I can seem to find are "cpu shares," which, if I understand the documentation, limit cpu usage with respect to what other containers/other processes are running on the system, and what's available on the system. They do not seem to be capable of limiting the container to an absolute amount of cpu usage.
Ideally, I'd like to set up docker containers that would be limited to using a single cpu core. Is this at all possible?
If you use a newer version of Docker, you can use --cpuset-cpus="" in docker run to specify the CPU cores you want to allocate:
docker run --cpuset-cpus="0" [...]
If you use an older version of Docker (< 0.9), which uses LXC as the default execution environment, you can use --lxc-conf to configure the allocated CPU cores:
docker run --lxc-conf="lxc.cgroup.cpuset.cpus = 0" [...]
In both of those cases, only the first CPU core will be available to the docker container. Both of these options are documented in the docker help.
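The same restriction can be expressed through the Docker SDK for Python (docker-py), whose containers.run() accepts a cpuset_cpus argument corresponding to --cpuset-cpus. The sketch below is illustrative: the helper names, the 512m memory limit, and the choice of core 0 are assumptions, and run_pinned needs the docker package and a running daemon.

```python
def single_core_options(core=0, mem_limit="512m"):
    """Build keyword arguments for docker-py's containers.run().

    cpuset_cpus mirrors --cpuset-cpus; the memory limit is an illustrative value.
    """
    return {"cpuset_cpus": str(core), "mem_limit": mem_limit, "remove": True}


def run_pinned(image, command, core=0):
    """Run `command` in `image` pinned to one core and return its output."""
    import docker  # imported here so the helper above works without the SDK

    client = docker.from_env()
    return client.containers.run(image, command, **single_core_options(core))
```

For example, run_pinned("busybox", "nproc") would run a short command inside a container that is only allowed to execute on core 0.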
I've tried to provide a tutorial on container resource alloc.
https://gist.github.com/afolarin/15d12a476e40c173bf5f
