How to kill lots of Docker container processes quickly and effectively? - Jenkins

We are using Jenkins and Docker in combination. We have set up Jenkins in a master/slave model, and containers are spun up on the slave agents.
Sometimes, due to a bug in the Jenkins Docker plugin or for unknown reasons, containers are left dangling.
Killing them all takes time: about 5 seconds per container, and we have about 15,000 of them, so the cleanup job would take close to a full day (15,000 × 5 s ≈ 21 hours) to finish. How can I remove a bunch of containers at once, or otherwise make this take less time?
Will uninstalling the Docker client remove the containers?
Is there a directory where these container processes are kept that could simply be removed (probably a bad idea)?
Any threading/parallelism to remove them faster?
I am going to run a weekly cron job to work around these bugs, but right now I don't have a whole day to wait for the cleanup.

Try this:
Uninstall docker-engine
Reboot the host
rm -rf /var/lib/docker
Rebooting effectively stops all of the containers, and uninstalling Docker prevents them from coming back on reboot (in case they have restart=always set).
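A minimal sketch of those steps on an apt/systemd host (the package name and commands vary by distribution and Docker version, so treat this as an outline rather than a recipe):
sudo apt-get purge docker-engine   # package name varies (docker-ce on newer installs)
sudo reboot
# after the reboot, with the engine uninstalled, delete all container state:
sudo rm -rf /var/lib/docker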

If you are interested in only killing the processes, as they are not exiting properly (my assessment of what you mean--correct me if I'm wrong), there is a way to walk the running container processes and kill them using the Pid information from each container's metadata. As it appears you don't necessarily care about clean process shutdown at this point (which is why docker kill is taking so long per container: the container may not respond to the right signals, so the engine waits patiently before killing the process), a kill -9 is a much swifter and more drastic way to end these containers and clean up.
A quick test using the latest docker release shows I can kill ~100 containers in 11.5 seconds on a relatively modern laptop:
$ time docker ps --no-trunc --format '{{.ID}}' | xargs -n 1 docker inspect --format '{{.State.Pid}}' | xargs -n 1 sudo kill -9
real 0m11.584s
user 0m2.844s
sys 0m0.436s
A clear explanation of what's happening:
I'm asking the Docker engine for a full-container-ID-only list of all running containers (the docker ps),
then passing each ID through docker inspect, asking for only the process ID (.State.Pid), which
I then pass to kill -9 so the system kills the container process directly; much quicker than waiting for the engine to do so.
Again, this is not recommended for general use as it does not allow standard (clean) exit processing for the containerized process, but in your case it sounds like that is not an important criterion.
If there is leftover container metadata for these exited containers you can clean that out by using:
docker rm $(docker ps -q -a --filter status=exited)
This will remove all exited containers from the engine's metadata store (the /var/lib/docker content) and should be relatively quick per container.

So,
docker kill $(docker ps -a -q)
isn't what you need?
EDIT: obviously it isn't. My next take, then (a concrete sketch follows the list):
A) Somehow create a list of all containers that you want to stop.
B) Partition that list (maybe by just slicing it into n parts).
C) Kick off n jobs in parallel, each one working through one of those slices.
D) Hope that docker is robust enough to handle n processes sending kill requests in parallel.
E) If that really works: start experimenting to determine the optimum setting for n.
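GNU xargs can do both the slicing and the parallel workers; a minimal sketch (8 workers and batches of 50 are arbitrary starting points for the tuning in step E):
docker ps -q | xargs -P 8 -n 50 docker kill
Since docker kill accepts multiple IDs per invocation, batching with -n also cuts per-process overhead, and the same -P flag can parallelize the kill -9 pipeline from the other answer.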

Related

Remove multiple Docker containers based on creation time

I have many Docker containers, both running and exited. I only want to keep the containers that were created before today/some specified time; I would like to remove all containers created today. What's the easiest way to approach this?
Out of the box, on any OS, you can remove all containers newer than a given one:
docker rm -f $(docker ps -a --filter 'since=<containername>' --format "{{.ID}}")
The container given in since will be kept, but all newer ones will be removed. Maybe that suits your use case.
If you really need an actual period of time, some bash magic can do it (see the sketch after the breakdown below), but specify your needs exactly first.
In detail:
docker rm: removes one or more containers
-f: forces running containers to stop
docker ps -a: lists all containers
--filter 'since=...': keeps only containers created after the given container
--format "{{.ID}}": prints only the ID column

docker container lifecycle confusion

I am new to Docker, and I find that definitions of the container lifecycle differ a lot.
Here is what "Manning.Docker.in.Action.2016.3" shows: [diagram not reproduced here]
Here is what Google gives me: [diagram not reproduced here]
https://medium.com/@nagarwal/lifecycle-of-docker-container-d2da9f85959
And here is what the official documentation says:
status: One of created, restarting, running, removing, paused, exited, or dead
https://docs.docker.com/engine/reference/commandline/ps/
So what's going on here? I guess some new states (and renaming) were introduced in newer versions of Docker?
Thanks in advance
Your linked diagram separates docker create from docker start, includes "die" as a state transition, and shows how to get to the "restarting" state. That's all valid, though it leads to a more complicated state machine.
(docker create wasn't in the very first versions of Docker but it appeared in Docker 1.3.0 in 2014, which should predate your diagram.)
Practically I might suggest an even simpler state machine:
-------> running -+------> stopped ------>
  run             |  stop             rm
                  |
                  \------> exited ------>
                    process exits     rm
That is, never try to restart a container or make changes inside a running container; if you need to tweak anything, delete the existing container and create a new one. This gives you a consistent environment (when the main container process starts you always know what's in its filesystem, up to mounted data). It also matches what happens in cluster environments like Kubernetes, where the cluster manager will routinely create and delete containers for you.
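A sketch of that replace-don't-tweak pattern in practice (the container name, port, and image tag are arbitrary examples):
# never modify a running container; delete and recreate it instead
docker rm -f web 2>/dev/null || true
docker run -d --name web -p 8080:80 nginx:1.25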
When you get into a situation where the internet gives you different answers, you should consider trying things yourself, especially with technologies like Docker, where it is pretty simple to run tests. For example:
I want to run a container (I will use nginx):
docker run -d nginx
docker ps
CONTAINER ID   IMAGE   COMMAND                  CREATED         STATUS         PORTS    NAMES
258cd2edbed8   nginx   "nginx -g 'daemon of…"   3 seconds ago   Up 2 seconds   80/tcp   jolly_golick
Note: docker will keep a container running only if there is a process running in it.
If you started a debian container (for example), you would see it stop immediately, as there is nothing running in it. So you could do
docker run -d debian sleep 10
and see that the container stays up for 10 seconds.
While a container is running, you can do some things with it but not others; for example, you can't remove it. To remove a container, you need to stop it first (or kill it), or force its removal.
Note: you would get all this information from Docker itself if you played around with it, as it returns informative errors. For instance, if you try to remove a running container, you get this error:
Error response from daemon: You cannot remove a running container 258cd2edbed85bed23ab543312968bd893c1fbd9ba81de40366337f434daedff. Stop the container before attempting removal or force remove
I can't cover all possible combinations here; you would get a similar error if you tried to remove a paused container. Just play with it, and you will get a clear picture of how it works.
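A quick tour you can run yourself to watch the states change (the container name demo is arbitrary; run docker ps -a after each step to see the status move):
docker create --name demo nginx   # status: Created
docker start demo                 # status: Up
docker pause demo                 # status: Up (Paused)
docker unpause demo               # status: Up
docker stop demo                  # status: Exited
docker rm demo                    # gone from docker ps -a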

Wondering about differences between Docker vs Supervisor

They both seem to accomplish the same thing: managing processes. What's the difference between Docker and Supervisor?
You can actually use supervisor inside a Docker container, when you want to make sure that exiting your container will kill all your processes.
A container isolates one main process: as long as that process runs, the container runs.
But if your container needs to run several processes, you need a supervisor to manage the propagation of signals, especially the one indicating that a process needs to be terminated.
See more at "Use of Supervisor in docker" to avoid the PID 1 zombie reaping problem (zombie processes are processes that have exited but whose exit status was never collected by a parent, so they linger in the process table as "zombies").
Since Docker 1.12 (Q3 2016), you don't need supervisor just for this anymore, even with multiple processes: the engine can run a minimal init process as PID 1 that forwards signals and reaps zombies:
docker run --init
See PR 26061
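A minimal sketch of the flag (the image, name, and command are arbitrary examples):
docker run -d --init --name worker alpine sleep 3600   # a tiny init runs as PID 1 and reaps orphans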

What's the difference between "docker stop" and "docker rm"?

I initially thought that docker stop is equivalent to vagrant halt, and docker rm to vagrant destroy.
But fundamentally, Docker containers are stateless, except for the VOLUME statement, which AFAIK preserves directory content even after docker rm, unless it is called with -v.
So, what is the difference?
docker stop preserves the container in the docker ps -a list (which gives the opportunity to commit it if you want to save its state in a new image).
It sends SIGTERM first, then, after a grace period, SIGKILL.
docker rm will remove the container from docker ps -a list, losing its "state" (the layered filesystems written on top of the image filesystem). It cannot remove a running container (unless called with -f, in which case it sends SIGKILL directly).
In terms of lifecycle, you are supposed to stop the container first, then remove it. That gives the container's PID 1 a chance to collect zombie processes.
docker rm removes the container's writable layer and metadata from your storage location (e.g. on Debian: /var/lib/docker/containers/), not the image itself, whereas
docker stop simply halts the container.
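A short sequence showing the difference (the container name demo is arbitrary):
docker run -d --name demo nginx
docker stop demo                  # SIGTERM, then SIGKILL after the grace period
docker ps -a --filter name=demo   # still listed, status Exited; could be committed
docker rm demo                    # metadata and writable layer are gone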

Docker daemon memory leak due to logs from long running process

I have the following setup:
a Perl service running in a container and writing logs out to STDERR
logspout to ship those logs out to a remote server for archiving
on a machine with 600 MB of RAM.
I also periodically truncate the log file at:
/var/lib/docker/containers/CID/CID-json.log
as suggested here, to avoid 100%-disk scenarios.
Problem
The Docker daemon starts off with low memory usage, 1% initially, which slowly increases to 40% after two days of running the container.
Reference
The Docker daemon memory leak has been discussed in this issue and this issue, but both are closed now, marked as merged at a commit. I am running the latest major version of Docker (Docker version 1.4.0, build 4595d4f), but still face a monotonically increasing memory usage issue.
EDIT: I ran this experiment: just running a bash process in the container and printing a lot of lines to STDERR makes the Docker daemon's memory usage grow very quickly.
Does Docker do some log buffering and not release memory even after the underlying log file (/var/lib/docker/containers/CID/CID-json.log) is cleared?
There is apparently no way to clear the logs. Will this commit solve this issue for long-running tasks?
I don't know why the Docker daemon's memory usage keeps increasing. How do I debug this issue?
There is still at least one outstanding issue relating to memory leaks with logs: https://github.com/docker/docker/issues/9139
This may not be what you are looking for, but I usually run a daily cron job to restart my containers after a certain amount of time. This ensures that the container always has enough RAM, and I also restrict the container's maximum RAM usage when creating it.
Containers take only a few seconds to restart and serve data, so if you are not running a high-availability service and can afford a few seconds of downtime, consider restarting the container (assuming you don't have persistent volumes); a sketch of this setup follows below.
However, if you do find a solution to your problem, do let us know.
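A minimal sketch of the restart-plus-cap approach (the container name, memory limit, image, and schedule are all arbitrary examples):
docker run -d --name perl-service --memory 256m my-perl-image   # hard RAM cap at creation
# crontab entry: restart the container every day at 04:00
0 4 * * * docker restart perl-service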
docker rm $(docker ps -a -q)
docker rmi --force $(docker images -q)
docker system prune --force
You need to be the root user for the following:
systemctl stop docker
rm -rf /var/lib/docker/aufs
apt-get autoclean
apt-get autoremove
systemctl start docker
