How can I know why a Docker container stopped?

I have a Docker container that contains a JVM process. When the process ends, the container completes and stops.
While thankfully rare, my JVM can go legs-up suddenly with a hard failure, e.g. an OutOfMemoryError. When this happens, my container just stops, like a normal JVM exit.
I can have distributed logging, etc., for normal JVM logging, but in this hard-fail case I want to know the JVM's dying words, which are typically uttered on stderr.
Is there a way to know why my container stopped, e.g. by looking at its logs, captured stderr, or something along those lines?

You can run docker logs [ContainerName] or docker logs [ContainerID] even on a stopped container. You can list stopped containers with docker ps -a.
Source: Usman Ismail's comment above; this answer just turns that comment into a proper answer.
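For example, assuming the container was named jvm-app (a hypothetical name), something like this should surface the JVM's last words and its recorded exit status:
# List all containers, including stopped ones, to find the name or ID
docker ps -a
# Show the stdout/stderr the container produced before it stopped
docker logs jvm-app
# Check the recorded exit code and whether the kernel OOM-killed the container
docker inspect jvm-app --format='exit={{.State.ExitCode}} oom-killed={{.State.OOMKilled}}'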

Related

Best way to detect Docker container exit

I have some remote servers running Docker images, and want them to "phone home" when the container stops, to alert me to check out the recent logs and figure out what caused the main process in the container to exit. What's the best way to do this? I could just drop the container's CMD/ENTRYPOINT into a wrapper script that does something like ./my_process || phone_home.sh, but I imagine that won't handle actual Docker engine issues, like OOM or someone explicitly doing docker stop my-container.
Is there anything built into Docker for this, or am I better off with some custom sidecar "watcher" service that just constantly checks docker ps output?
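One possible sketch of such a watcher, using docker wait (which blocks until the container stops and then prints its exit code) instead of polling docker ps; the container name and the phone_home.sh notification script are assumptions, not from the original post:
#!/bin/sh
# Block until the container exits, whatever the reason (crash, OOM kill, docker stop)
status=$(docker wait my-container)
# Keep the last log lines around for later inspection
docker logs --tail 100 my-container > /tmp/my-container-last.log
# phone_home.sh is a hypothetical alerting script
./phone_home.sh "my-container exited with status $status"
Because this watcher observes the container from outside, it also fires when the Docker engine itself kills the container, which an in-container wrapper script cannot see.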

Docker container restarting in a loop on GCE

I have correctly deployed a Docker container which runs a Python script that grabs some data from the internet and slaps it in BigQuery. The container works well on my machine and on a GCE instance that I've provisioned.
Now, everything works well for the most part, but I am failing to understand why the Docker container always restarts after exiting (apparently correctly). Logs, in this case, seem to be fairly useless, as there is no error whatsoever. My current hunch is that something is failing silently, forcing the instance to restart.
Is there any way to find out the reboot reason for a given Docker container?
Things tried so far
I've tried to print the exit code of the container in the following way. The result is always 0, regardless of the restart cycles.
while true; do
  docker inspect my_container --format='{{.State.ExitCode}}'
  sleep 1
done
The Google Cloud documentation describes several ways to review your container-related logs, including container starts and stops.
In any case, I think there is no problem with your container: by default, Compute Engine will restart a container on exit, although you can specify a different restart policy if you need to. Please see the relevant documentation.
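To confirm this from the instance itself, you could check the restart policy and restart count Docker has recorded for the container (using the container name from the question), and watch its lifecycle events as they happen; this is just a sketch and is not GCE-specific:
# Check which restart policy the container runs with, how often it has restarted, and its last exit code
docker inspect my_container --format='policy={{.HostConfig.RestartPolicy.Name}} restarts={{.RestartCount}} exit={{.State.ExitCode}}'
# Stream start/die events for this container to see each restart cycle in real time
docker events --filter container=my_container --filter event=start --filter event=die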

What's the difference between a "stopped" and an "exited" container?

Is there a functional difference here? I can docker start either one to make it go again. What's the difference?
It is quite different.
A stopped container can be restarted, unlike an exited container.
Suppose you have a stopped container with an id of 21F123 (that is enough to identify it).
docker start 21F123
may succeed.
If your container has exited, you can try to launch it again, but it will run as a new process with a new, different PID, and it will show up again in
docker ps
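A quick way to see the recorded status and then bring a container back, reusing the id from the example above:
# Show every container with its current status ("Up ..." or "Exited (code) ... ago")
docker ps -a --format 'table {{.ID}}\t{{.Status}}\t{{.Names}}'
# Start the stopped/exited container again; it keeps the same container ID
docker start 21F123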

Is there a way to hibernate a docker container

I like to use Jupyter Notebook. If I run it in a VM in virtualbox, I can save the state of the VM, and then pick up right where I left off the next day. Can I do something similar if I were to run it in a docker container? i.e. dump the "state" of the container to disk, then crank it back up and reload the "state"?
It looks like docker checkpoint may be the thing I'm trying to accomplish here. There's not much in the docs that describes it as such. In fact, the docs for docker checkpoint just say "Manage checkpoints", which is massively unhelpful.
UPDATE: This IS, in fact, what docker checkpoint is supposed to accomplish. When I checkpoint my jupyter notebook container, it saves it, I can start it back up with docker start --checkpoint [my_checkpoint] jupyter_notebook, and it shows the things I had running as being in a Running state. However, attempts to then use the Running notebooks fail. I'm not sure if this is a CRIU issue or a Jupyter issue, but I'll bring it up in the appropriate git issue tracker.
Anyhoo, docker checkpoint is the thing that is supposed to provide VM-save-state/hibernate-style functionality.
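A minimal sketch of that checkpoint/restore flow, using the container and checkpoint names from the update above (docker checkpoint is experimental and requires CRIU plus the daemon's experimental mode):
# Snapshot the running container's in-memory state; by default this also stops the container
docker checkpoint create jupyter_notebook my_checkpoint
# Later, bring the container back up from the saved state
docker start --checkpoint my_checkpoint jupyter_notebook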
The closest approach I can see is docker pause <container-id>
https://docs.docker.com/engine/reference/commandline/pause/
The docker pause command suspends all processes in the specified containers. On Linux, this uses the cgroups freezer. Traditionally, when suspending a process the SIGSTOP signal is used, which is observable by the process being suspended. With the cgroups freezer the process is unaware, and unable to capture, that it is being suspended, and subsequently resumed.
An important difference from VirtualBox hibernation is that there is no disk persistence of the memory state of the containerized process.
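In practice the pause/resume cycle looks like this (the container name is an assumption):
# Freeze every process in the container via the cgroups freezer
docker pause myjupyter
# Resume them; memory state is kept, but only for as long as the container and host keep running
docker unpause myjupyter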
If you just stop the container, it hibernates:
docker stop myjupyter
(hours pass)
docker start myjupyter
docker attach myjupyter
I do this all the time, especially with docker containers which have web browsers in them.

Docker Process Management

I have a deployed application running inside a Docker container, which is, in effect, a websocket client that runs forever. On every deploy I rebuild the container and start it with docker run, using the command set in the Dockerfile.
Now, I've noticed a few times that the process occasionally dies without restarting. When running docker ps, I can see that the container is up, and has been up for 2 weeks; however, the process running inside it has died without the host being any the wiser.
Do I need to go so far as to have a process manager inside of the docker container to manage the containerized process?
EDIT:
Dockerfile: https://github.com/DVG/catpen-edi/blob/master/Dockerfile
We've developed a process-manager tailor-made for Docker containers and have been using it with quite a bit of success to solve exactly the problem you describe. The best starting point is to take a look at chaperone-docker on github. The readme on the first page contains a quick link to a minimal base image as well as a fully configured LAMP stack so you can try it out and see what a fully-configured image would look like. It's open-source and fully documented.
This is a very interesting problem related to PID 1 and the fact that Docker replaces PID 1 with the command specified in CMD or ENTRYPOINT. What's happening is that the child process isn't automagically adopted by anything if the parent dies, so it becomes an orphan (since there is no PID 1 in the sense of a traditional init system like you're used to). Here is some excellent reading to give you a few ideas. You may get some mileage out of their baseimage-docker image, which comes with their simplified init system (my_init) and will solve some of this problem for you. However, I would strongly caution you against automatically adopting the Phusion mindset for all of your containers, as there is some ideological friction in that space. I can't recall any discussion on Docker's GitHub about a potential minimal init system to solve this problem, but I can't imagine it will be a problem forever. Good luck!
If you have two Ruby processes, it sounds like the child hasn't exited; the application has just stopped working. It's likely the EventMachine reactor is sitting in the background.
Does the EDI app really need to spawn the additional Ruby process? This only adds another layer between Docker and your app. Run the server directly with CMD [ "ruby", "boot.rb" ]. If you find the problem still occurs with a single process, then you will need to find what is causing your app to hang.
When a process is running as PID 1 in Docker, it will need to handle the SIGINT and SIGTERM signals too.
# Trap ^C (SIGINT)
Signal.trap("INT") {
  shut_down
  exit
}

# Trap `kill` (SIGTERM)
Signal.trap("TERM") {
  shut_down
  exit
}
Docker also has restart policies for when the container does actually die.
docker run --restart=always
no
    Do not automatically restart the container when it exits. This is the default.
on-failure[:max-retries]
    Restart only if the container exits with a non-zero exit status. Optionally, limit the number of restart retries the Docker daemon attempts.
always
    Always restart the container regardless of the exit status. When you specify always, the Docker daemon will try to restart the container indefinitely. The container will also always start on daemon startup, regardless of the current state of the container.
unless-stopped
    Always restart the container regardless of the exit status, but do not start it on daemon startup if the container has been put to a stopped state before.
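For example, to have Docker bring the websocket client back up automatically if it crashes (the container and image names here are assumptions, not from the original post):
# Restart the container on a non-zero exit, up to 5 times
docker run -d --name edi-client --restart=on-failure:5 my-edi-image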
