If I have scripts issuing docker run commands in parallel, the Docker engine appears to handle these commands in series. Since running a minimal container image with "docker run" takes around 100 ms to start, does this mean issuing commands in parallel to run 1000 containers will take the Docker engine 100 ms x 1000 = 100 s, or nearly 2 minutes? Is there some reason why the Docker engine is serial instead of parallel? How do people get around this?
How do people get around this?
a/ They don't start 1000 containers at the same time
b/ if they do, they might use a cluster management system like docker swarm to manage the whole process
c/ they start the 1000 containers in advance, so the startup time is already accounted for.
Truly parallelizing docker run commands could be tricky, considering some of those commands might depend on other containers being created/started first (like a docker run --volumes-from=xxx)
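For the cases where the containers really are independent, a minimal sketch of fanning out docker run calls from the client side with xargs -P (the image name "alpine", the command "sleep 30", and the parallelism of 8 are all placeholders, not anything prescribed by Docker):

```shell
# Launch COUNT detached containers, with up to 8 concurrent `docker run`
# client invocations. The image ("alpine") and the container command
# ("sleep 30") are placeholders; substitute your own.
launch_parallel() {
  count=$1
  seq 1 "$count" | xargs -P 8 -I{} docker run -d --rm --name "job-{}" alpine sleep 30
}

# Usage: launch_parallel 1000
```

With -P 8, eight docker run processes run concurrently, so the client-side latency overlaps instead of adding up serially; the per-container setup work still happens inside the daemon.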
Related
I have a docker-compose file with 4 services. Services 1,2,3 are job executors. Service 4 is the job scheduler. After the scheduler has finished running all its jobs on executors, it returns 0 and terminates. However the executor services still need to be shut down. With standard docker-compose this is easy. Just use the "--exit-code-from" option:
Terminate docker compose when test container finishes
However when a version 3.0+ compose file is deployed via Docker Stack, I see no equivalent way to wait for 1 service to complete and then terminate all remaining services. https://docs.docker.com/engine/reference/commandline/stack/
A few possible approaches are discussed here -
https://github.com/moby/moby/issues/30942
The solution from miltoncs seems reasonable at first:
https://github.com/moby/moby/issues/30942#issuecomment-540699206
The concept suggested is querying every second with docker stack ps to get service status. Then removing all services with docker stack rm when done. I'm not sure how all the constant stack ps traffic would scale with thousands of jobs running in a cluster. Potentially bogging down the ingress network?
Does anyone have experience / success with this or similar solutions?
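For concreteness, a hedged sketch of the polling-and-teardown idea from that issue thread (the stack name "mystack", the service name "scheduler", and the poll interval are all assumptions for illustration):

```shell
# Poll the scheduler task's state once per POLL_SECS; tear the whole
# stack down once it reaches a terminal state. Names are placeholders.
wait_and_teardown() {
  stack=$1 service=$2
  while state=$(docker stack ps --filter "name=${stack}_${service}" \
                  --format '{{.CurrentState}}' "$stack" | head -n 1); do
    case $state in Complete*|Shutdown*) break ;; esac
    sleep "${POLL_SECS:-5}"
  done
  docker stack rm "$stack"
}

# Usage: wait_and_teardown mystack scheduler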
I have 10 containers in docker-compose.
I want 9 containers to start working when I bring up docker-compose, and allow docker-compose to run the 10th container after 1 hour.
Currently it's running all containers at once.
How can I achieve this?
Docker Compose doesn’t directly have this functionality. (Kubernetes doesn’t either, though it does have the ability to run a short-lived container at a specified time of day.)
Probably the best workaround to the problem as you’ve stated it is to use a tool like at(1) to run an additional container at a later time (at reads the command from standard input rather than taking it as an argument):
echo "docker run ..." | at now + 1 hour
My experience has generally been that depending on starting and stopping Docker containers for workflow management can get a little messy. You may be better off always starting every container every time, and instead either running a pool of workers against some job-queue system like RabbitMQ and injecting a job after an hour, or using a language-native scheduled-task library in your application.
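If you do want to keep the delay inside the compose file itself, one workaround is to wrap the tenth service's command in a sleep. A sketch, where the service name, image name, and original command (./run-job) are all made up for illustration:

```yaml
services:
  delayed-worker:
    image: my-worker-image        # placeholder image name
    # Sleep an hour, then exec the real command. The other nine
    # services in the file start immediately as usual.
    command: sh -c "sleep 3600 && exec ./run-job"
```

The trade-off versus at(1) is that this container exists and is scheduled for the whole hour; at(1) avoids creating the container until it is actually needed.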
We are using Jenkins and Docker in combination. We have set up Jenkins in a master/slave model, and containers are spun up on the slave agents.
Sometimes, due to a bug in the Jenkins Docker plugin or for some unknown reason, containers are left dangling.
Killing them all takes time, about 5 seconds per container process, and we have about 15000 of them; it will take ~24 hrs to finish running the cleanup job. How can I remove a bunch of containers at once, or efficiently enough that it takes less time?
Will uninstalling the Docker client remove the containers?
Is there a directory where these container processes are kept that could be removed (bad idea)?
Any threading/parallelism to remove them faster?
I am going to run a cron job weekly to patch these bugs, but right now I don't have a whole day to get these removed.
Try this:
Uninstall docker-engine
Reboot host
rm -rf /var/lib/docker
Rebooting effectively stops all of the containers, and uninstalling Docker prevents them from coming back upon reboot (in case they have restart=always set).
If you are interested in only killing the processes because they are not exiting properly (my assessment of what you mean; correct me if I'm wrong), there is a way to walk the running container processes and kill them using the Pid information from each container's metadata. As it appears you don't necessarily care about clean process shutdown at this point (which is why docker kill is taking so long per container: the container may not respond to the right signals, so the engine waits patiently before killing the process), a kill -9 is a much swifter and more drastic way to end these containers and clean up.
A quick test using the latest docker release shows I can kill ~100 containers in 11.5 seconds on a relatively modern laptop:
$ time docker ps --no-trunc --format '{{.ID}}' | xargs -n 1 docker inspect --format '{{.State.Pid}}' | xargs -n 1 sudo kill -9
real 0m11.584s
user 0m2.844s
sys 0m0.436s
A clear explanation of what's happening:
I'm asking the docker engine for a "full container ID only" list of all running containers (the docker ps)
I'm passing that through docker inspect one by one, asking to output only the process ID (.State.Pid), which
I then pass to the kill -9 to have the system directly kill the container process; much quicker than waiting for the engine to do so.
Again, this is not recommended for general use as it does not allow for standard (clean) exit processing for the containerized process, but in your case it sounds like that is not an important criterion.
If there is leftover container metadata for these exited containers you can clean that out by using:
docker rm $(docker ps -q -a --filter status=exited)
This will remove all exited containers from the engine's metadata store (the /var/lib/docker content) and should be relatively quick per container.
So,
docker kill $(docker ps -a -q)
isn't what you need?
EDIT: obviously it isn't. My next take then:
A) somehow create a list of all containers that you want to stop.
B) Partition that list (maybe by just slicing it into n parts).
C) Kick off n jobs in parallel, each one working through one of those list slices.
D) Hope that "docker" is robust enough to handle n processes, each sending its kill requests in sequence, in parallel.
E) If that really works: maybe start experimenting to determine the optimum setting for n.
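The A–E steps above can be sketched with xargs alone, which handles both the partitioning (-n) and the parallelism (-P); the batch size of 25 and parallelism of 8 are arbitrary starting points for the tuning in step E:

```shell
# Force-remove every container: 25 IDs per `docker rm -f` invocation,
# up to 8 invocations running in parallel. `docker rm -f` both kills
# and removes, so no separate docker kill pass is needed.
parallel_rm_all() {
  docker ps -aq | xargs -r -P 8 -n 25 docker rm -f
}

# Usage: parallel_rm_all
```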
I can suspend the processes running inside a container with the PAUSE command. Is it possible to clone the Docker container whilst paused, so that it can be resumed (i.e. via the UNPAUSE command) several times in parallel?
The use case for this is a process which takes long time to start (i.e. ~20 seconds). Given that I want to have a pool of short-living Docker containers running that process in parallel, I would reduce start-up time for each container a lot if this was somehow possible.
No, you can only clone the container's disk image, not any running processes.
Yes, you can, using docker checkpoint (CRIU). This does not have anything to do with pause, though; it is a separate docker command.
Also see here.
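A rough sketch of the checkpoint flow. Note that docker checkpoint is experimental (it requires the daemon to run with experimental mode enabled and CRIU installed on the host), and the container name "app" and checkpoint name "warm" here are placeholders:

```shell
# 1) After the slow (~20 s) startup has finished, snapshot the running
#    process state. By default this stops the container.
make_checkpoint() { docker checkpoint create "$1" "$2"; }

# 2) Later, resume the stopped container from that snapshot, skipping
#    the expensive startup.
resume_from_checkpoint() { docker start --checkpoint "$2" "$1"; }

# Usage: make_checkpoint app warm
#        resume_from_checkpoint app warm
```

One caveat for the pool-of-containers use case: a checkpoint is restored into a container that is not running, so restoring the same snapshot into several containers in parallel needs extra plumbing (one container per restore) rather than a single unpause-many operation.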
I'm curious about the amount of overhead (time taken to start running, assuming I've already pulled the image) Docker adds when doing a docker run, as opposed to me just writing a script that installs the same things that the container would. From my experience, docker run seems to always execute instantly and be ready to go, but I could imagine some more complicated images might have additional overhead? I'm thinking about using something like YARN to bring up services on the fly with Docker, but was wondering if they might come up quicker without it. Any thoughts on this?
Note: I'm not concerned about performance after the container is up right now, I'm concerned about the time taken to bring up the service.
Docker is pretty quick to start, but there are some things to consider.
The quickest way to test the overhead is using the time executable and running this command:
docker run --rm -it ubuntu echo test
Which gives you something like this:
$ time docker run --rm -it ubuntu echo test
test
real 0m0.936s
user 0m0.161s
sys 0m0.008s
What you can read from this is that the CPU only took 0.16 s to run that command, but it took a little less than a second in real time, which includes other overhead (disk I/O, other processes).
But in general, do not worry about performance if you are using containers; the main reason you want to use them is consistency.