In my application consisting of several containers, I pause containers which are currently not needed. When they are needed again, I unpause them. This works fine.
However, if something goes wrong in one of the running containers (i.e. a container exits with a non-zero exit code), docker-compose (which I am also using) tries to stop all the other containers. If a container is paused, it cannot be stopped or killed.
A small example to illustrate what happens (all of these commands are automated in my case):
docker start cd1d8ad01f56
docker pause cd1d8ad01f56
docker stop cd1d8ad01f56
Error response from daemon: Cannot stop container cd1d8ad01f56:
Container cd1d8ad01f56c695a598e168e2eacdcd20a5231b9240029db1579bc0f1dcb903
is paused. Unpause the container before stopping
Error: failed to stop containers: [cd1d8ad01f56]
I want the containers to be stopped, even if they are paused.
Solutions I thought of:
First unpause every paused container, then stop or kill it. This works, but it is an unsuitable solution because it requires manual work.
I could write a script that looks for paused containers and then unpauses and kills them (see the sketch below). But I want compose to just kill all the other stuff and be done with it; I do not want to have to issue another command to run my script.
Is there a way to specify code that runs when a container exits (i.e. tell it to unpause the other containers), so that no container is still paused when compose tries to stop it?
Unpausing every paused container just to then stop or kill it is tedious work, which I would like to automate. I am working in a test environment and do not care how the containers shut down; I just want them to end together with the failed container(s).
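A minimal sketch of such a script, assuming you are happy to unpause and then kill every paused container on the host:

#!/bin/sh
# Collect the IDs of all currently paused containers.
paused=$(docker ps --filter "status=paused" --quiet)
if [ -n "$paused" ]; then
  # A paused container cannot be stopped or killed, so unpause first.
  docker unpause $paused
  # Now they can be killed (use 'docker stop' instead for a graceful shutdown).
  docker kill $paused
fi

This still has to be run as a separate step, which is exactly the extra command I want to avoid.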
Related
I have a docker container running a 10 hour luigi task. I want to pause the container to use my laptop for something else. I tried "docker pause", but when I unpause, the luigi scheduler shows no tasks running, so I have to start again.
Is there any way I can pause and resume exactly where I left off? I suspect it may be luigi that is deleting the task.
I am trying to understand what the difference is between the commands docker stop ContainerID and docker pause ContainerID. According to this page, both of them are used to pause an existing Docker container.
The docker pause command suspends all processes in the specified containers. On Linux, this uses the cgroups freezer. Traditionally, when suspending a process, the SIGSTOP signal is used, which is observable by the process being suspended.
https://docs.docker.com/engine/reference/commandline/pause/
The docker stop command: the main process inside the container will receive SIGTERM, and after a grace period, SIGKILL.
https://docs.docker.com/engine/reference/commandline/stop/#options
SIGTERM is the termination signal. The default behavior is to terminate the process, but it can also be caught or ignored. The intention is to kill the process, gracefully or not, but to first allow it a chance to clean up.
SIGKILL is the kill signal. The only behavior is to kill the process, immediately. As the process cannot catch the signal, it cannot clean up, and thus this is a signal of last resort.
SIGSTOP is the pause signal. The only behavior is to pause the process; the signal cannot be caught or ignored. The shell uses pausing (and its counterpart, resuming via SIGCONT) to implement job control.
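A quick way to observe the difference from a shell; the container name "demo" and the alpine image are assumptions made for this example:

docker run -d --name demo alpine sleep 600
docker pause demo      # freezes all processes; 'docker ps' shows status "Paused"
docker unpause demo    # thaws the cgroup; processes resume where they left off
docker stop -t 5 demo  # SIGTERM, then SIGKILL after a 5-second grace period
docker rm demo         # clean up the stopped container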
In addition to the answers added earlier:
Running docker events after docker stop shows these events:
kill (signal 15): where signal 15 = SIGTERM
die
stop
Running docker events after docker pause shows only one event:
pause
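You can reproduce this yourself; the container name below is an assumption:

docker run -d --name demo alpine sleep 600
docker events --filter container=demo &  # stream this container's events in the background
docker pause demo                        # emits: pause
docker unpause demo                      # emits: unpause
docker stop demo                         # emits: kill (signal 15), die, stop
kill %1                                  # stop the background docker events stream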
Also, docker pause keeps the container's memory allocated while the container is paused; that memory is used again when the container is resumed. docker stop releases the memory once the container has stopped.
This table has even more details.
What is the difference between docker stop and docker pause?
docker stop: Send SIGTERM (termination signal), and if needed, SIGKILL (kill signal)
docker pause: Send SIGSTOP (pause signal)
SIGTERM: The default behavior is to terminate the process, but it can also be caught or ignored. The intention is to kill the process, gracefully or not, but to first give it a chance to clean up.
SIGKILL: The only behavior is to kill the process, immediately. As the process cannot catch the signal, it cannot clean up; thus, this is a signal of last resort.
SIGSTOP: The only behavior is to pause the process; the signal cannot be caught or ignored. The shell uses pausing (and its counterpart, resuming via SIGCONT) to implement job control.
When to use docker stop and docker pause?
docker stop: When you wish to clear up memory or discard all of the processes' cached data. Simply put, you no longer care about the processes in the container and are comfortable with killing them.
docker pause: When you only want to suspend the processes in the container; you do not want them to lose data or state.
Example:
Consider a container with a counter. Assume the counter has reached 3000. Running docker stop will cause the counter to lose its value and you will be unable to retrieve it. Using docker pause, on the other hand, will maintain the counter state and value.
Hope it's clear now!
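A hypothetical way to try the counter example with a throwaway container (the image and names are assumptions):

docker run -d --name counter alpine sh -c 'i=0; while true; do i=$((i+1)); echo "$i"; sleep 1; done'
sleep 5
docker pause counter          # the counter is frozen at its current value
docker unpause counter        # counting resumes from that value
docker logs --tail 3 counter  # the sequence continues without a reset
docker stop counter           # the process is killed; the in-memory value is gone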
docker pause suspends (i.e., sends SIGSTOP to) all the processes in a container.
docker stop stops a container by sending SIGTERM, and if needed SIGKILL, to its main process.
When a running container is issued the docker pause command, the SIGSTOP signal is passed to the processes inside the container (essentially the container itself), putting them into a paused state.
So when docker unpause is issued, the SIGCONT signal is passed to the container's processes to resume them.
When the docker stop command is issued to a running container, the SIGTERM signal is passed to the container's processes and the container stops.
Hence, when docker pause has been issued to a container and the Docker service is then restarted, the cgroups allocated to the container are released (as SIGTERM is passed to all the container's processes).
So after the restart, unpause would not help, as the containers are already stopped.
A few years ago, when I had just started playing with Docker, I remember blog posts mentioning that if you don't handle your pid(1) process well, you will create a zombie Docker container. At the time, I just followed the suggestion and started using an init tool called dumb-init, and I never really saw a zombie container be created.
But I am still curious why it's a problem. If I remember correctly, docker stop xxx by default sends SIGTERM to the container's pid(1) process, and if the process cannot gracefully stop within 10s (the default), Docker force-kills it by sending SIGKILL to the pid(1) process. I also know that the pid(1) process is special on Linux: it can ignore the SIGKILL signal (link). But even though the process's PID inside the container is 1, that is only because namespaces scope its processes; on the host machine the same process has a different PID, and the kernel can kill it.
So my questions are:
Why can't the Docker engine just kill the container at the host kernel level, so that no matter what, the user can ensure the container is killed properly?
How can I create a zombie process in a Docker container? (If someone can share a Gist, that would be great!)
Not zombie containers, but zombie processes. Write this zombie.py:
#!/usr/bin/env python3
import subprocess
import time

# Start a child process that exits after one second.
p = subprocess.Popen(['/bin/sleep', '1'])
# Give the child time to exit. We deliberately never call p.wait(),
# so the kernel keeps the child's exit status around as a zombie entry.
time.sleep(2)
# The process table now shows the defunct child in state "Z".
subprocess.run(['/bin/ps', '-ewl'])
Write this Dockerfile:
FROM python:3
COPY zombie.py /
CMD ["/zombie.py"]
Build and run it:
chmod +x zombie.py
docker build -t zombie .
docker run --rm zombie
What happens here is that the /bin/sleep command runs to completion. The parent process needs to use the wait call to clean up after it, but it doesn't, so when it runs ps you'll see a "Z" zombie process.
But wait, there's more! Say your process does carefully clean up after itself. In this specific example, subprocess.run() includes the required wait call, for instance, and you might change the Popen call to run. If that subprocess launches another subprocess, and it exits (or crashes) without waiting for it, the init process with pid 1 becomes the new parent process of the zombie. (It's worked this way for 40 years.) In a Docker container, though, the main container process runs with pid 1, and if it's not expecting "extra" child processes, you could wind up with stale zombie processes for the life of the container.
This leads to the occasional suggestion that a Docker container should always run some sort of "real" init process, maybe something as minimal as tini, so that something picks up after zombie processes and your actual container job doesn't need to worry about it.
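Docker can supply that minimal init itself: the --init flag makes the daemon run tini as pid 1 inside the container, where it adopts and reaps any orphaned children. (Note that it cannot reap the zombie in the first example above, since its parent is still alive and simply never waits.) Using the image built earlier:

docker run --rm --init zombie  # tini runs as pid 1 and reaps orphans reparented to it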
I have a deployed application running inside a Docker container; it is, in effect, a websocket client that runs forever. On every deploy I rebuild the container and start it with docker run, using the command set in the Dockerfile.
Now, I've noticed a few times that the process occasionally dies without restarting. When running docker ps, I can see that the container is up and has been up for 2 weeks, yet the process running inside of it has died without the host being any the wiser.
Do I need to go so far as to have a process manager inside of the Docker container to manage the containerized process?
EDIT:
Dockerfile: https://github.com/DVG/catpen-edi/blob/master/Dockerfile
We've developed a process manager tailor-made for Docker containers and have been using it with quite a bit of success to solve exactly the problem you describe. The best starting point is to take a look at chaperone-docker on GitHub. The readme on the first page contains a quick link to a minimal base image as well as a fully configured LAMP stack, so you can try it out and see what a fully-configured image looks like. It's open-source and fully documented.
This is a very interesting problem related to PID 1 and the fact that Docker replaces PID 1 with the command specified in CMD or ENTRYPOINT. What's happening is that a child process isn't automagically adopted by anything if its parent dies and it becomes an orphan (since there is no PID 1 in the sense of a traditional init system like you're used to). Here is some excellent reading to give you a few ideas. You may get some mileage out of their baseimage-docker image, which comes with their simplified init system (my_init) and will solve some of this problem for you. However, I would strongly caution you against automatically adopting the Phusion mindset for all of your containers, as there exists some ideological friction in that space. I can't recall any discussion on Docker's GitHub about a potential minimal init system to solve this problem, but I can't imagine it will be a problem forever. Good luck!
If you have two Ruby processes, it sounds like the child hasn't exited; the application has just stopped working. It's likely the EventMachine reactor is sitting in the background.
Does the EDI app really need to spawn the additional Ruby process? This only adds another layer between Docker and your app. Run the server directly with CMD [ "ruby", "boot.rb" ]. If you find the problem still occurs with a single process then you will need to find what is causing your app to hang.
When a process is running as PID 1 in Docker, it will need to handle the SIGINT and SIGTERM signals too.
# 'shut_down' stands in for your app-specific cleanup
def shut_down
  # e.g. close the websocket connection, flush state
end

# Trap ^C (SIGINT)
Signal.trap("INT") {
  shut_down
  exit
}

# Trap SIGTERM (what `docker stop` sends)
Signal.trap("TERM") {
  shut_down
  exit
}
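You can verify the handler fires by sending the signal explicitly; the container name is an assumption:

docker kill --signal=TERM myapp  # invokes the TERM trap instead of the default SIGKILL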
Docker also has restart policies for when the container does actually die.
docker run --restart=always
no
Do not automatically restart the container when it exits. This is the default.
on-failure[:max-retries]
Restart only if the container exits with a non-zero exit status. Optionally, limit the number of restart retries the Docker daemon attempts.
always
Always restart the container regardless of the exit status. When you specify always, the Docker daemon will try to restart the container indefinitely. The container will also always start on daemon startup, regardless of the current state of the container.
unless-stopped
Always restart the container regardless of the exit status, but do not start it on daemon startup if the container has been put to a stopped state before.
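For a crashing process like the one described above, a hypothetical invocation might be (the image name is a placeholder):

docker run -d --restart=on-failure:5 my-edi-image  # restart on non-zero exit, at most 5 times

Note that a restart policy only helps if the container's main process actually exits; it does nothing when the process inside hangs while PID 1 stays alive.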
I can suspend the processes running inside a container with the PAUSE command. Is it possible to clone the Docker container whilst it is paused, so that it can be resumed (i.e. via the UNPAUSE command) several times in parallel?
The use case for this is a process which takes a long time to start (i.e. ~20 seconds). Given that I want a pool of short-lived Docker containers running that process in parallel, I would reduce the start-up time for each container a lot if this were somehow possible.
No, you can only clone the container's disk image, not any running processes.
Yes, you can, using docker checkpoint (based on CRIU). This does not have anything to do with pause, though; it is a separate docker command.
Also see here.
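A rough sketch of the checkpoint workflow. It requires CRIU to be installed and the Docker daemon to run in experimental mode; the container and checkpoint names are assumptions:

docker run -d --name warm alpine sleep 600
docker checkpoint create warm cp1    # snapshot the running process state and stop the container
docker start --checkpoint cp1 warm   # resume the container from the snapshot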