They both seem to accomplish the same thing: managing processes. What's the difference between Docker and Supervisor?
You can actually use supervisor in a Docker container, when you want to make sure that exiting your container will kill all your processes.
A container isolates one main process: as long as that process runs, the container runs.
But if your container needs to run several processes, you need a supervisor to manage the propagation of signals, especially the one indicating that a process needs to be terminated.
See more at "Use of Supervisor in docker" about avoiding the PID 1 zombie reaping problem. (Zombie processes are processes that have already exited but whose exit status was never collected by their parent, so they remain as "zombie" entries in the process table.)
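As a rough illustration (a minimal sketch only; the program names, image, and paths below are made-up placeholders, not something from the original answer), a container that has to run two long-lived processes could hand them to supervisord. A supervisord.conf fragment:

[program:web]
command=/usr/local/bin/my-web-server   ; hypothetical binary
autorestart=true

[program:worker]
command=/usr/local/bin/my-worker       ; hypothetical binary
autorestart=true

and a Dockerfile that makes supervisord the container's main (foreground) process:

FROM ubuntu:22.04
RUN apt-get update && apt-get install -y supervisor
COPY supervisord.conf /etc/supervisor/conf.d/app.conf
CMD ["/usr/bin/supervisord", "-n", "-c", "/etc/supervisor/supervisord.conf"]

As long as supervisord (the main process) keeps running, the container keeps running, and supervisord forwards termination signals to the two programs.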
Since Docker 1.12 (Q3 2016), you no longer need supervisor just to handle signal forwarding and zombie reaping: Docker can run a minimal init in front of your process:
docker run --init
See PR 26061
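To see what that flag does, you can run a throwaway container and look at the process table (a quick check, not something from the PR itself; the exact output varies by Docker version):

docker run --rm --init alpine ps -o pid,comm

PID 1 should show up as docker-init (a bundled tini build); the command you pass runs as its child, and any processes that later get orphaned are reaped by that init instead of piling up as zombies.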
Related
I've seen in some builds the use of supervisor to run the docker-compose up -d command, with the possibility to autostart and/or autorestart.
I'm wondering whether this cohabitation of supervisor and docker-compose works well. Aren't the two autorestart options interfering with each other? Also, what is the benefit of using supervisor in place of plain docker-compose, apart from starting things back up if the server is shut down?
Please share your experience if you have any with using these two tools together.
Thank you
Running multiple single-process containers is almost always better than running a single multiple-process container; avoid supervisord when possible.
Mechanically, the combination should work fine. Supervisord will capture logs and take responsibility for restarting the process in the container. That means docker logs will have no interesting output, and you need to get the file content out of the container. If one of the managed processes fails then supervisord will restart it. The container itself will probably never be restarted, unless supervisord manages to crash somehow.
There are a couple of notable disadvantages to using supervisord:
As noted, it swallows logs, so you need a complex file-oriented approach to read them out.
If one of the processes fails then you'll have difficulty seeing that from outside the container.
If you have a code update you have to delete and recreate the container with the new image, which with supervisord means restarting every process.
In a clustered environment like Kubernetes, every process in the supervisord container runs on the same node, and you can't scale individual processes up to handle additional load.
Given the choice of tools you suggest, I'd pretty much always use Compose with its restart: policies, and not use supervisord at all.
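For comparison, a minimal sketch of the Compose side of that choice (the service names and images are invented for illustration):

version: "3.8"
services:
  web:
    image: my-web-app:latest      # hypothetical image
    restart: unless-stopped
  worker:
    image: my-worker:latest       # hypothetical image
    restart: on-failure

Each process lives in its own container, docker logs (or docker compose logs) works per service, and the Docker daemon handles the restarts, so nothing like supervisord has to run inside the images.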
A few years ago, when I had just started playing with Docker, I remember some blog posts mentioning that if you don't handle your pid(1) process well, you will create a zombie docker container. At that time I just followed the suggestion and started using an init tool called dumb-init, and I never really saw a zombie container get created.
But I am still curious why it's a problem. If I remember correctly, docker stop xxx by default sends SIGTERM to the container's pid(1) process, and if the process cannot stop gracefully within 10s (the default), Docker force-kills it by sending SIGKILL to the pid(1) process. I also know that the pid(1) process is special in a Linux system: it can ignore the SIGKILL signal (link). But even if the process's PID inside the docker container is 1, that is only because namespaces are used to scope its processes; on the host machine the same process has another PID, which can be killed by the kernel.
So my questions are:
Why can't the docker engine just kill the container at the host kernel level, so that no matter what, the user can be sure the container is killed properly?
How can I create a zombie process in docker container? (If someone can share a Gist will be great!)
Not zombie containers, but zombie processes. Write this zombie.py:
#!/usr/bin/env python3
import subprocess
import time
p = subprocess.Popen(['/bin/sleep', '1'])
time.sleep(2)
subprocess.run(['/bin/ps', '-ewl'])
Write this Dockerfile:
FROM python:3
COPY zombie.py /
CMD ["/zombie.py"]
Build and run it:
chmod +x zombie.py
docker build -t zombie .
docker run --rm zombie
What happens here is that the /bin/sleep command runs to completion. The parent process needs to use the wait call to clean up after it, but it doesn't, so when it runs ps you'll see a "Z" zombie process.
But wait, there's more! Say your process does carefully clean up after itself. In this specific example, subprocess.run() includes the required wait call, for instance, and you might change the Popen call to run. If that subprocess launches another subprocess, and it exits (or crashes) without waiting for it, the init process with pid 1 becomes the new parent process of the zombie. (It's worked this way for 40 years.) In a Docker container, though, the main container process runs with pid 1, and if it's not expecting "extra" child processes, you could wind up with stale zombie processes for the life of the container.
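If you want to reproduce that second scenario, a small variation of zombie.py (a sketch; the sh -c trick just gives sleep an intermediate parent that exits immediately) shows the orphan being handed to PID 1 and left unreaped:

#!/usr/bin/env python3
import subprocess
import time

# sh starts sleep in the background and exits right away, so sleep
# is orphaned and reparented to PID 1 (this script) inside the container.
subprocess.run(['/bin/sh', '-c', '/bin/sleep 1 &'])
# sleep exits after a second, but nothing ever wait()s for it...
time.sleep(2)
# ...so ps shows it as a "Z" entry for the rest of the container's life.
subprocess.run(['/bin/ps', '-ewl'])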
This leads to the occasional suggestion that a Docker container should always run some sort of "real" init process, maybe something as minimal as tini, so that something picks up after zombie processes and your actual container job doesn't need to worry about it.
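One way to act on that suggestion (a sketch; the tini path below is where the Debian package installs it, adjust as needed) is to make tini PID 1 in the image and keep the real job as its child:

FROM python:3
RUN apt-get update && apt-get install -y tini
COPY zombie.py /
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["/zombie.py"]

Alternatively, without touching the image, docker run --init asks Docker to inject its own bundled tini as PID 1. Either way, orphaned processes are reparented to an init that actually calls wait() on them.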
I like to use Jupyter Notebook. If I run it in a VM in virtualbox, I can save the state of the VM, and then pick up right where I left off the next day. Can I do something similar if I were to run it in a docker container? i.e. dump the "state" of the container to disk, then crank it back up and reload the "state"?
It looks like docker checkpoint may be the thing I'm attempting to accomplish here. There's not much in the docs that describes it as such. In fact, the docs for docker checkpoint say "Manage checkpoints" which is massively unhelpful.
UPDATE: This IS, in fact, what docker checkpoint is supposed to accomplish. When I checkpoint my jupyter notebook container, it saves it, I can start it back up with docker start --checkpoint [my_checkpoint] jupyter_notebook, and it shows the things I had running as being in a Running state. However, attempts to then use the Running notebooks fail. I'm not sure if this is a CRIU issue or a Jupyter issue, but I'll bring it up in the appropriate git issue tracker.
Anyhoo docker checkpoint is the thing that is supposed to provide VM-save-state/hibernate style functionality.
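For reference, the flow looks roughly like this (checkpointing is still experimental and needs CRIU on the host; the checkpoint name is just a placeholder):

docker checkpoint create jupyter_notebook my_checkpoint
docker start --checkpoint my_checkpoint jupyter_notebook

By default the container is stopped as part of creating the checkpoint; there is a --leave-running flag if you want a live snapshot instead.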
The closest approach I can see is docker pause <container-id>
https://docs.docker.com/engine/reference/commandline/pause/
The docker pause command suspends all processes in the specified containers. On Linux, this uses the cgroups freezer. Traditionally, when suspending a process the SIGSTOP signal is used, which is observable by the process being suspended. With the cgroups freezer the process is unaware, and unable to capture, that it is being suspended, and subsequently resumed.
An important difference from VirtualBox hibernation to take into account: there is no disk persistence of the memory state of the containerized process.
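In practice it is just a pair of commands (using the same container name as the stop/start example below):

docker pause myjupyter
docker unpause myjupyter

The processes stay frozen in memory, so this only helps as long as the container and the host keep running; it is closer to a VM "pause" than to "save state".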
If you just stop the container, it hibernates:
docker stop myjupyter
(hours pass)
docker start myjupyter
docker attach myjupyter
I do this all the time, especially with docker containers which have web browsers in them.
We are using Jenkins and Docker in combination. We have set up Jenkins in a master/slave model, and containers are spun up on the slave agents.
Sometimes, due to a bug in the Jenkins Docker plugin or for other unknown reasons, containers are left dangling.
Killing them all takes time, about 5 seconds per container process, and we have about 15000 of them, so the cleanup job would take ~24hrs to finish. How can I remove a bunch of the containers at once, or at least efficiently enough that it takes less time?
Will uninstalling the docker client remove the containers?
Is there a directory where these container processes are kept that could simply be removed (probably a bad idea)?
Is there any threading/parallelism that could remove them faster?
I am going to run a weekly cron job to work around these bugs, but right now I don't have a whole day to wait for these to be removed.
Try this:
Uninstall docker-engine
Reboot host
rm -rf /var/lib/docker
Rebooting effectively stops all of the containers, and uninstalling Docker prevents them from coming back upon reboot (in case they have restart=always set).
If you are interested only in killing the processes, since they are not exiting properly (my assessment of what you mean--correct me if I'm wrong), there is a way to walk the running container processes and kill them using the Pid information from each container's metadata. As it appears you don't necessarily care about clean process shutdown at this point (which is why docker kill is taking so long per container--the container may not respond to the right signals, so the engine waits patiently before killing the process), a kill -9 is a much swifter and more drastic way to end these containers and clean up.
A quick test using the latest docker release shows I can kill ~100 containers in 11.5 seconds on a relatively modern laptop:
$ time docker ps --no-trunc --format '{{.ID}}' | xargs -n 1 docker inspect --format '{{.State.Pid}}' | xargs -n 1 sudo kill -9
real 0m11.584s
user 0m2.844s
sys 0m0.436s
A clear explanation of what's happening:
I'm asking the docker engine for a "full container ID only" list of all running containers (the docker ps)
I'm passing that through docker inspect one by one, asking to output only the process ID (.State.Pid), which
I then pass to the kill -9 to have the system directly kill the container process; much quicker than waiting for the engine to do so.
Again, this is not recommended for general use as it does not allow for standard (clean) exit processing for the containerized process, but in your case it sounds like that is not an important criterion.
If there is leftover container metadata for these exited containers you can clean that out by using:
docker rm $(docker ps -q -a --filter status=exited)
This will remove all exited containers from the engine's metadata store (the /var/lib/docker content) and should be relatively quick per container.
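On newer Docker releases (1.13+), the same cleanup is also available as a single command, which removes every stopped container in one pass:

docker container prune -f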
So,
docker kill $(docker ps -a -q)
isn't what you need?
EDIT: obviously it isn't. My next take then:
A) somehow create a list of all containers that you want to stop.
B) Partition that list (maybe by just slicing it into n parts).
C) Kick off n jobs in parallel, each one working on one of those list slices.
D) Hope that "docker" is robust enough to handle n processes sending n kill requests in sequence in parallel.
E) If that really works: maybe start experimenting to determine the optimum setting for n.
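A sketch of that idea, letting xargs do the slicing and the parallelism (the batch size and -P value are guesses you would want to tune; this sends a normal docker kill, so it's gentler than the kill -9 approach above):

docker ps -q | xargs -n 10 -P 8 docker kill

and the leftover metadata can be removed the same way:

docker ps -aq --filter status=exited | xargs -n 50 -P 8 docker rm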
I have a deployed application running inside a Docker container, which is, in effect, a websocket client that runs forever. On every deploy I rebuild the container and start it with docker run, using the command set in the Dockerfile.
Now, I've noticed a few times that the process occasionally dies without restarting. When running docker ps, I can see that the container is up, and has been up for 2 weeks, yet the process running inside of it has died without the host being any the wiser.
Do I need to go so far as to have a process manager inside of the docker container to manage the containerized process?
EDIT:
Dockerfile: https://github.com/DVG/catpen-edi/blob/master/Dockerfile
We've developed a process-manager tailor-made for Docker containers and have been using it with quite a bit of success to solve exactly the problem you describe. The best starting point is to take a look at chaperone-docker on github. The readme on the first page contains a quick link to a minimal base image as well as a fully configured LAMP stack so you can try it out and see what a fully-configured image would look like. It's open-source and fully documented.
This is a very interesting problem related to PID 1 and the fact that Docker runs the command specified in CMD or ENTRYPOINT as PID 1. What's happening is that the child process isn't automagically adopted by anything useful if the parent dies and it becomes an orphan (since there is no PID 1 in the sense of a traditional init system like you're used to). Here is some excellent reading to give you a few ideas. You may get some mileage out of their baseimage-docker image, which comes with their simplified init system ("my_init") and will solve some of this problem for you. However, I would strongly caution you against automatically adopting the Phusion mindset for all of your containers, as there exists some ideological friction in that space. I can't recall any discussion on Docker's Github about a potential minimal init system to solve this problem, but I can't imagine it will be a problem forever. Good luck!
If you have two ruby processes it sounds like the child hasn't exited, the application has just stopped working. It's likely the EventMachine reactor is sitting in the background.
Does the EDI app really need to spawn the additional Ruby process? This only adds another layer between Docker and your app. Run the server directly with CMD [ "ruby", "boot.rb" ]. If you find the problem still occurs with a single process then you will need to find what is causing your app to hang.
When a process is running as PID 1 in Docker, it will need to handle the SIGINT and SIGTERM signals too.
# Trap ^C
Signal.trap("INT") {
shut_down
exit
}
# Trap `kill` (SIGTERM, also sent by `docker stop`)
Signal.trap("TERM") {
shut_down
exit
}
Docker also has restart policies for when the container does actually die.
docker run --restart=always
no
Do not automatically restart the container when it exits. This is the default.

on-failure[:max-retries]
Restart only if the container exits with a non-zero exit status. Optionally, limit the number of restart retries the Docker daemon attempts.

always
Always restart the container regardless of the exit status. When you specify always, the Docker daemon will try to restart the container indefinitely. The container will also always start on daemon startup, regardless of the current state of the container.

unless-stopped
Always restart the container regardless of the exit status, but do not start it on daemon startup if the container has been put to a stopped state before.
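For example (the image name and retry count are only illustrative), a long-running client like the one described above could be started with:

docker run -d --restart=on-failure:5 my-websocket-client

If the process exits with a non-zero status, the daemon restarts the container up to 5 times; the important part is that the process actually exits when it breaks, rather than hanging, which is what the signal handling advice above is about.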