I can suspend the processes running inside a container with the pause command. Is it possible to clone the Docker container whilst paused, so that it can be resumed (i.e. via the unpause command) several times in parallel?
The use case for this is a process which takes a long time to start (i.e. ~20 seconds). Given that I want to have a pool of short-lived Docker containers running that process in parallel, I would reduce the start-up time for each container a lot if this were somehow possible.
No, you can only clone the container's disk image, not any running processes.
Yes, you can, using docker checkpoint (CRIU). This does not have anything to do with pause though; it is a separate docker command.
Related
I have a docker container running a 10-hour luigi task. I want to pause the container to use my laptop for something else. I tried docker pause, but when I unpause, the luigi scheduler shows no tasks running, so I have to start again.
Is there any way I can pause and restart exactly where I left off? I suspect it may be luigi that is deleting the task.
I like to use Jupyter Notebook. If I run it in a VM in virtualbox, I can save the state of the VM, and then pick up right where I left off the next day. Can I do something similar if I were to run it in a docker container? i.e. dump the "state" of the container to disk, then crank it back up and reload the "state"?
It looks like docker checkpoint may be the thing I'm attempting to accomplish here. There's not much in the docs that describes it as such. In fact, the docs for docker checkpoint say "Manage checkpoints" which is massively unhelpful.
UPDATE: This IS, in fact, what docker checkpoint is supposed to accomplish. When I checkpoint my jupyter notebook container, it saves it, I can start it back up with docker start --checkpoint [my_checkpoint] jupyter_notebook, and it shows the things I had running as being in a Running state. However, attempts to then use the Running notebooks fail. I'm not sure if this is a CRIU issue or a Jupyter issue, but I'll bring it up in the appropriate git issue tracker.
Anyhoo, docker checkpoint is the thing that is supposed to provide VM-save-state/hibernate-style functionality.
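For reference, a minimal sketch of that flow (jupyter_notebook is the container name from the update above; docker checkpoint is experimental, so it requires CRIU installed and the daemon's experimental mode enabled):
docker checkpoint create jupyter_notebook cp1
# the container's process state is saved and the container stops
docker start --checkpoint cp1 jupyter_notebook
# processes resume from the saved state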
The closest approach I can see is docker pause <container-id>
https://docs.docker.com/engine/reference/commandline/pause/
The docker pause command suspends all processes in the specified containers. On Linux, this uses the cgroups freezer. Traditionally, when suspending a process the SIGSTOP signal is used, which is observable by the process being suspended. With the cgroups freezer the process is unaware, and unable to capture, that it is being suspended, and subsequently resumed.
An important difference from VirtualBox hibernation: there is no disk persistence of the memory state of the containerized process.
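For completeness, the pause flow itself (myjupyter is a hypothetical container name):
docker pause myjupyter
# processes are frozen in memory and consume no CPU
docker unpause myjupyter
# processes resume exactly where they were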
If you just stop the container, it hibernates:
docker stop myjupyter
# (hours pass)
docker start myjupyter
docker attach myjupyter
I do this all the time, especially with docker containers which have web browsers in them.
In my application consisting of several containers, I pause containers which are currently not needed. When they are needed again, I unpause them. This works fine.
However, if something goes wrong in one of the running containers (container exits with exit code != 0), docker-compose (which I am also using) tries to stop all the other containers. If a container is paused, it cannot be stopped or killed.
A small example to illustrate what happens. (all of these commands are automated in my case)
docker start cd1d8ad01f56
docker pause cd1d8ad01f56
docker stop cd1d8ad01f56
Error response from daemon: Cannot stop container cd1d8ad01f56: Container cd1d8ad01f56c695a598e168e2eacdcd20a5231b9240029db1579bc0f1dcb903 is paused. Unpause the container before stopping
Error: failed to stop containers: [cd1d8ad01f56]
I want the containers to be stopped, even if they are paused.
Solutions I thought of:
First unpause every paused container, then stop or kill it. This is an unsuitable solution that requires manual work. But it works...
I could write a script that looks for paused containers and then unpauses and kills them. But I want Compose to just kill all the other stuff and be done with it. I do not want to have to issue another command to execute my script.
Is there a way to run code when a container exits (i.e. have it unpause the other containers), so that the containers are not paused when docker-compose tries to stop them?
Unpausing every paused container and then stopping or killing it is tedious work which I would like to automate. I am working in a test environment and do not care how the containers shut down. I just want them to end together with the failed container(s).
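Not a Compose-level answer, but the manual cleanup can at least be scripted. A small sketch (status=paused is a real docker ps filter; the loop itself is illustrative):
for c in $(docker ps --filter status=paused -q); do
  docker unpause "$c" && docker stop "$c"
done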
I have a deployed application running inside a Docker container, which is, in effect, a websocket client that runs forever. On every deploy I rebuild the container and start it with docker run, using the command set in the Dockerfile.
Now, I've noticed a few times that the process occasionally dies without restarting. When running docker ps, I can see that the container is up and has been up for 2 weeks, but the process running inside of it has died without the host being any the wiser.
Do I need to go so far as to have a process manager inside of the docker container to manage the containerized process?
EDIT:
Dockerfile: https://github.com/DVG/catpen-edi/blob/master/Dockerfile
We've developed a process manager tailor-made for Docker containers and have been using it with quite a bit of success to solve exactly the problem you describe. The best starting point is to take a look at chaperone-docker on github. The readme on the first page contains a quick link to a minimal base image as well as a fully configured LAMP stack, so you can try it out and see what a fully-configured image would look like. It's open-source and fully documented.
This is a very interesting problem here related to PID 1 and the fact that Docker replaces PID 1 with the command specified in CMD or ENTRYPOINT. What's happening is that the child process isn't automagically adopted by anything if the parent dies, so it becomes an orphan (since there is no PID 1 in the sense of a traditional init system like you're used to). Here is some excellent reading to give you a few ideas. You may get some mileage out of their baseimage-docker image, which comes with their simplified init system (my_init) and will solve some of this problem for you. However, I would strongly caution you against automatically adopting the Phusion mindset for all of your containers, as there exists some ideological friction in that space. I can't recall any discussion on Docker's Github about a potential minimal init system to solve this problem, but I can't imagine it will be a problem forever. Good luck!
If you have two ruby processes it sounds like the child hasn't exited, the application has just stopped working. It's likely the EventMachine reactor is sitting in the background.
Does the EDI app really need to spawn the additional Ruby process? This only adds another layer between Docker and your app. Run the server directly with CMD [ "ruby", "boot.rb" ]. If you find the problem still occurs with a single process then you will need to find what is causing your app to hang.
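As a rough sketch of what that could look like (the base image and file names are assumptions, not taken from the linked repo):
FROM ruby:2.2
WORKDIR /app
COPY . .
RUN bundle install
CMD ["ruby", "boot.rb"]
The exec form of CMD matters here: it makes ruby PID 1 directly, so the signal traps shown below actually receive SIGINT/SIGTERM instead of an intermediate shell swallowing them.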
When a process is running as PID 1 in Docker, it will need to handle the SIGINT and SIGTERM signals too.
# Trap ^C (SIGINT)
Signal.trap("INT") {
  shut_down   # your app's own cleanup routine
  exit
}

# Trap `kill` (SIGTERM)
Signal.trap("TERM") {
  shut_down
  exit
}
Docker also has restart policies for when the container does actually die.
docker run --restart=always
no
Do not automatically restart the container when it exits. This is the default.
on-failure[:max-retries]
Restart only if the container exits with a non-zero exit status. Optionally, limit the number of restart retries the Docker daemon attempts.
always
Always restart the container regardless of the exit status. When you specify always, the Docker daemon will try to restart the container indefinitely. The container will also always start on daemon startup, regardless of the current state of the container.
unless-stopped
Always restart the container regardless of the exit status, but do not start it on daemon startup if the container has been put to a stopped state before.
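For example, to restart a container up to five times on failure (my-image is a placeholder):
docker run --restart=on-failure:5 my-image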
Docker makes it easy to stop & restart containers. It also has the ability to pause and then unpause containers. The Docker docs state:
When the container is exited, the state of the file system and its exit value is preserved. You can start, stop, and restart a container. The processes restart from scratch (their memory state is not preserved in a container), but the file system is just as it was when the container was stopped.
I tested this out by setting up a container with memcached running, wrote a value to memcache (commands sketched below the list) and then
Stopped & then restarted the container - the memcached value was gone
Paused & then unpaused the container - the memcached value was still intact
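A rough sketch of that test (the container name, port mapping, and nc flags are assumptions; netcat options vary between implementations):
docker run -d --name mc -p 11211:11211 memcached
printf 'set mykey 0 0 5\r\nhello\r\n' | nc -q1 localhost 11211
docker pause mc && docker unpause mc
printf 'get mykey\r\n' | nc -q1 localhost 11211   # value is still intact
docker stop mc && docker start mc
printf 'get mykey\r\n' | nc -q1 localhost 11211   # value is gone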
Somewhere in the docs - I can no longer find the precise document - I read that stopped containers do not consume CPU or memory. However:
I suppose the fact that the file system state is preserved means that the container still does consume some space on the host's file system?
Is there a performance hit (other than host disk space consumption) associated with having 10s, or even 100s, of stopped containers in the system? For instance, does it make it any harder for Docker to startup and manage new containers?
And finally, if Paused containers retain their memory state when Unpaused - as demonstrated by their ability to remember memcached keys - do they have a different impact on CPU and memory?
I'd be most obliged to anyone who might be able to clarify these issues.
I am not an expert on Docker's core, but I will try to answer some of these questions.
I suppose the fact that the file system state is preserved means that the container still does consume some space on the host's file system?
Yes. Docker saves all container and image data in /var/lib/docker. The default way to store container and image data is using aufs. The data of each layer is saved under /var/lib/docker/aufs/diff. When a new container is created, a new layer is also created with its own folder, and the changes relative to the source image's layers are stored there.
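You can see how much writable-layer space each container consumes with the --size flag:
docker ps -a --size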
Is there a performance hit (other than host disk space consumption) associated with having 10s, or even 100s, of stopped containers in the system? For instance, does it make it any harder for Docker to startup and manage new containers?
As far as I know, there should not be any performance hit. When you stop a container, the docker daemon sends SIGTERM and then SIGKILL to all the processes of that container, as described in the docker CLI documentation:
Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...]
Stop a running container by sending SIGTERM and then SIGKILL after a grace period
  -t, --time=10   Number of seconds to wait for the container to stop before killing it. Default is 10 seconds.
And finally, if Paused containers retain their memory state when Unpaused - as demonstrated by their ability to remember memcached keys - do they have a different impact on CPU and memory?
As @Usman said, docker implements pause/unpause using the cgroup freezer. If I'm not wrong, when you put a process (or its cgroup) in the freezer, you block the kernel task scheduler from running any tasks of that process (i.e. it stops the process), but you don't kill it, and it keeps consuming the memory it was using (although the kernel may move that memory to swap on disk). The CPU resources used by a paused container I would consider insignificant. For more information, see the pull request for this feature, Docker issue #5948.
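You can observe the freezer state directly (this path assumes cgroup v1 with the default cgroupfs layout; mc is a hypothetical container name):
docker pause mc
cat /sys/fs/cgroup/freezer/docker/$(docker inspect -f '{{.Id}}' mc)/freezer.state
# FROZEN
docker unpause mc
# the same file now reads THAWED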