Restarting an unhealthy Docker container based on healthcheck

I am using Docker version 17.09.0-ce, and I see that containers are marked as unhealthy. Is there an option to have the container restart automatically instead of being left in the unhealthy state?

A restart-on-unhealthy feature was part of the original PR (https://github.com/moby/moby/pull/22719), but it was removed after discussion, to be revisited later as an enhancement of RestartPolicy.
For now, you can use this workaround to automatically restart unhealthy containers: https://hub.docker.com/r/willfarrell/autoheal/
Here is a sample compose file:
version: '2'
services:
  autoheal:
    restart: always
    image: willfarrell/autoheal
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
Simply run docker-compose up -d on this file.
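If you don't want autoheal to watch every container, a hedged alternative (based on the autoheal README) is to drop AUTOHEAL_CONTAINER_LABEL=all and label only the containers you want monitored when starting them, e.g.:

# Label a container so autoheal (with its default settings) will watch it;
# "myapp" is a hypothetical image name.
docker run -d --label autoheal=true --name myapp myapp:latest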

You can automatically restart an unhealthy container by combining a smart HEALTHCHECK with a proper restart policy.
The Docker restart policy should be one of always or unless-stopped.
The HEALTHCHECK, in turn, should implement logic that kills the container when it becomes unhealthy.
In the following example I used curl with its internal retry mechanism and chained it (on failure, i.e. the service is unhealthy) to the kill command.
HEALTHCHECK --interval=5m --timeout=2m --start-period=45s \
  CMD curl -f --retry 6 --max-time 5 --retry-delay 10 --retry-max-time 60 "http://localhost:8080/health" || bash -c 'kill -s 15 -1 && (sleep 10; kill -s 9 -1)'
The important thing to understand here is that the retry logic is self-contained in the curl command; Docker's own retry counter still applies but is effectively redundant. If the curl HTTP request fails even after its retries, kill is executed. First it sends a SIGTERM to all the processes in the container, to allow them to stop gracefully; then, after 10 seconds, it sends a SIGKILL to forcibly terminate any that remain. Note that when PID 1 of a container dies, the container itself dies and the restart policy is invoked.
kill docs: https://linux.die.net/man/1/kill
curl docs: https://curl.haxx.se/docs/manpage.html
docker restart docs: https://docs.docker.com/compose/compose-file/compose-file-v2/#restart
Gotcha: kill behaves differently in bash than in sh. In bash you can use -1 as the PID to signal all the processes the shell is allowed to signal (i.e. everything with PID greater than 1).
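For completeness, a minimal sketch of how a container built with this HEALTHCHECK might be started so that the restart policy actually kicks in (the image name myapp is a placeholder):

# The HEALTHCHECK in the image kills PID 1 when unhealthy; this restart
# policy then brings the container back up.
docker run -d --name myapp --restart unless-stopped myapp:latest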

For standalone containers, Docker does not have native integration to restart a container on health-check failure, though we can achieve the same using Docker events and a script (a sketch follows below). Health checks are better integrated with Swarm: when a container in a service becomes unhealthy, Swarm automatically shuts down the unhealthy container and starts a new one to maintain the replica count specified for the service.
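As a minimal sketch of the script approach (polling docker ps on the host rather than streaming docker events; the 30-second interval is an arbitrary choice):

#!/bin/sh
# Poll for unhealthy containers every 30 seconds and restart them.
while true; do
  for id in $(docker ps -q -f health=unhealthy); do
    echo "Restarting unhealthy container $id"
    docker restart "$id"
  done
  sleep 30
done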

You can try putting something like this in your Dockerfile:
HEALTHCHECK --interval=5s --timeout=2s CMD curl --fail http://localhost || kill 1
Don't forget the --restart always option.
kill 1 will kill the process with PID 1 in the container and force the container to exit. Usually the process started by CMD or ENTRYPOINT has PID 1.
Unfortunately, this method likely won't change the container's state to unhealthy (the container exits before the failure streak is reached), so be careful with it.

Unhealthy Docker containers can be restarted with a simple crontab rule (the --format template prints a docker restart command for each unhealthy container, which is then piped to sh):
* * * * * docker ps -f health=unhealthy --format "docker restart {{.ID}}" | sh

Docker has a couple of ways to get details on container health. You can configure health checks and how often they run. Health checks can also probe the application running inside a container, e.g. over HTTP (this would use curl's --fail option). You can watch the health_status event to get details.
For detailed information on an unhealthy container, the inspect command comes in handy: docker inspect --format='{{json .State.Health}}' container-name (see https://blog.newrelic.com/2016/08/24/docker-health-check-instruction/ for more details).
You should first resolve the error condition that causes the "unhealthy" tag (applied whenever the health check command exits with code 1). This may or may not require that Docker restart the container, depending on the error. If you are starting/restarting your containers automatically, then either trapping the start errors, or logging them together with the health-check status, can help you address errors quickly. Check the link if you are interested in auto start.
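A quick sketch of both inspection paths (container-name is a placeholder; the exact health_status event filter string is an assumption and may vary across Docker versions):

# Dump the health state of one container, including the failing streak
# and the log of recent check runs.
docker inspect --format='{{json .State.Health}}' container-name

# Stream health transitions as they happen; events are reported as
# "health_status: healthy" / "health_status: unhealthy".
docker events --filter 'type=container' --filter 'event=health_status: unhealthy'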

According to https://codeblog.dotsandbrackets.com/docker-health-check/:
Create the container with restart: always.
When using healthcheck, pay attention to the following point:
For standalone containers, Docker does not have native integration to restart a container on health-check failure, though we can achieve the same using Docker events and a script. Health checks are better integrated with Swarm: when a container in a service becomes unhealthy, Swarm automatically shuts down the unhealthy container and starts a new one to maintain the replica count specified for the service.

Related

How to make docker container restart when stuck automatically? [duplicate]


Difference in docker restart policy between on-failure and unless-stopped?

I have read the docker-compose documentation about the restart policy of containers, but I failed to understand the difference between on-failure and unless-stopped.
When would I use one over the other? In which situations would one policy start a container and the other not?
on-failure will issue a restart only if the exit code indicates a failure, whereas unless-stopped behaves like always and will keep an instance running unless the container is stopped.
You can try with the hello-world image to see the difference.
docker run --restart on-failure hello-world will run once and exit successfully, and running a subsequent docker ps will indicate no currently running instances of the container.
However, docker run --restart unless-stopped hello-world will restart the container even if it successfully exits, so subsequently running docker ps will show you a restarting instance until you stop the container.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4d498ebd13a6 hello-world "/hello" 2 seconds ago Restarting (0) Less than a second ago modest_keldysh
Docker restart policies are there to keep containers active through all possible downfalls; we can leverage them in multiple ways. As an example, if we have a web server running in a container and need to keep it active even on bad requests, we can use the unless-stopped flag: it will keep the server up and running until we stop it manually.
The restart flag can be any one of these:
no: the default value; the container will never be restarted.
on-failure: restarts the container whenever it encounters an error, or, more precisely, whenever the process running inside the container exits with a non-zero exit code (exit code 0 means the process terminated intentionally; any non-zero value is an error).
always: as the name says, it will always restart the container, no matter what the exit code is. It will even restart a container we stopped manually, but only once the Docker daemon is restarted.
unless-stopped: similar to the always flag, the only difference being that once the container is stopped manually it will not restart automatically, even after the Docker daemon restarts, until we start the container manually again.
The difference between unless-stopped and on-failure is that the first will keep restarting the container until we stop it manually, no matter what the exit code is, while the other restarts the container only on real failure, i.e. a non-zero exit code.
Once a container is stopped, its restart flags are ignored; this is one way to escape a restart loop. That's why, in the case of the always flag, once we stop the container manually it will not restart until we restart the Docker daemon.
You can easily test all of these flags by creating a simple Redis server:
docker run -d --restart=always --name redis-server redis # start a redis container
docker container ls # check the status
docker stop redis-server # stop the container manually
docker container ls # check the status again: redis-server did not restart
sudo service docker restart # restart the docker daemon
# check the status again: the container is back up and running
# try the same steps with the restart flag changed to *unless-stopped*
docker update --restart=unless-stopped redis-server # update the restart flag of the running container
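To confirm which policy a container currently has (for instance after a docker update), you can read it back with docker inspect:

# Print the restart policy currently attached to the container
docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' redis-server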

How to wait until `docker start` is finished?

When I run docker start, it seems the container might not be fully started by the time the docker start command returns. Is that so?
Is there a way to wait for the container to be fully started before the command returns? Thanks.
A common technique to make sure a container is fully started (i.e. services running, ports open, etc.) is to wait until a specific string is logged. See this example, Waiting until Docker containers are initialized, dealing with PostgreSQL and Rails.
Edited:
There could be another solution using the HEALTHCHECK feature of Docker containers. The idea is to configure the container with a health check command that is used to determine whether or not the main service is fully started and running normally.
The specified command runs inside the container and sets the health status to starting, healthy, or unhealthy, depending on its exit code (0 - container healthy, 1 - container not healthy). The status of the container can then be retrieved on the host by inspecting the running instance (docker inspect).
Health check options can be configured inside the Dockerfile or when the container is run. Here is a simple example for PostgreSQL:
docker run --name postgres --detach \
  --health-cmd='pg_isready -U postgres' \
  --health-interval='5s' \
  --health-timeout='5s' \
  --health-start-period='20s' \
  postgres:latest && \
until docker inspect --format "{{json .State.Health.Status }}" postgres | \
  grep -m 1 "healthy"; do sleep 1; done
In this case the health command is pg_isready. A web service will typically use curl; other containers have their own specific commands. The Docker community provides this kind of configuration for several official images here.
Now, when we restart the container (docker start), it is already configured and we need only the second part:
docker start postgres && \
until docker inspect --format "{{json .State.Health.Status }}" postgres | \
  grep -m 1 "healthy"; do sleep 1; done
The command will return when the container is marked as healthy.
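For a web service, the same wait-until-healthy pattern presumably looks like this (the webapp image, the /health endpoint on port 8080, and curl being available inside the image are all assumptions):

# Probe an HTTP health endpoint instead of pg_isready
docker run --name webapp --detach \
  --health-cmd='curl -fsS http://localhost:8080/health || exit 1' \
  --health-interval='5s' \
  --health-timeout='5s' \
  --health-start-period='20s' \
  webapp:latest && \
until docker inspect --format "{{json .State.Health.Status }}" webapp | \
  grep -m 1 "healthy"; do sleep 1; done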
Hope that helps.
Disclaimer: I'm not an expert in Docker, and I'd be glad to learn whether a better solution exists.
The Docker engine doesn't really know that a container "may not be fully started", so unfortunately there is nothing in Docker itself to handle this.
Usually, the commands used by the creator of the Docker image (in the Dockerfile) are supposed to be organized in such a way that the container is usable once docker start returns, and that is the best way. However, it's not always the case.
Here is an example:
LocalStack, a set of services for local development with AWS, has a Docker image, but once it has started, the S3 port, for example, is not yet ready to accept connections.
From what I understand, such an exposed-but-not-yet-ready port is the typical situation you refer to.
So, from my experience, the application that talks to the dockerized process should wrap its connection attempts to the server port in a retry loop until the port becomes available.
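A minimal sketch of such a retry loop in shell, assuming nc is available (the host and port are placeholders):

# Block until the TCP port accepts connections
until nc -z localhost 4566; do
  echo "service not ready yet, retrying..."
  sleep 1
done
echo "service is accepting connections"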

Docker - what does `docker run --restart always` actually do?

Although the --restart flag seems simple and straightforward, I came up with a number of questions when experimenting with it:
With respect to ENTRYPOINT definitions - what are the actual semantics during restart?
If I exec into the container (I am on a DDC) and kill -9 the main process, it restarts, but if I do docker kill it does not. Why?
How does restart interact with Shared Data Containers / Named Volumes?
Restart policies
Using the --restart flag on docker run you can specify a restart policy for how a container should or should not be restarted on exit.
When a restart policy is active on a container, it will be shown as either Up or Restarting in docker ps. It can also be useful to use docker events to see the restart policy in effect.
docker run --restart always
Always restart the container regardless of the exit status. When you specify always, the Docker daemon will try to restart the container indefinitely. The container will also always start on daemon startup, regardless of the current state of the container.
I recommend this documentation about restart policies:
Documentation - Restart policies
Update for Docker v19.03
Restart policies (--restart)
Use Docker's --restart to specify a container's restart policy. A restart policy controls whether the Docker daemon restarts a container after exit. Docker supports the following restart policies:
always: Always restart the container regardless of the exit status. When you specify always, the Docker daemon will try to restart the container indefinitely. The container will also always start on daemon startup, regardless of the current state of the container.
$ docker run --restart=always redis
Documentation - Restart policies
To configure the restart policy for a container, use the --restart flag when using the docker run command. The value of the --restart flag can be any of the following:
no: Do not automatically restart the container (the default).
on-failure: Restart the container if it exits due to an error, which manifests as a non-zero exit code.
always: Always restart the container if it stops. If it is manually stopped, it is restarted only when the Docker daemon restarts or the container itself is manually restarted.
unless-stopped: Similar to always, except that when the container is stopped (manually or otherwise), it is not restarted even after the Docker daemon restarts.
The following example starts a Redis container and configures it to always restart unless it is explicitly stopped or Docker is restarted.
$ docker run -d --restart unless-stopped redis
This command changes the restart policy for an already running container named redis.
$ docker update --restart unless-stopped redis
And this command will ensure all currently running containers will be restarted unless stopped.
$ docker update --restart unless-stopped $(docker ps -q)
Restart policy details
Keep the following in mind when using restart policies:
A restart policy only takes effect after a container starts successfully. In this case, starting successfully means that the container is up for at least 10 seconds and Docker has started monitoring it. This prevents a container which does not start at all from going into a restart loop.
If you manually stop a container, its restart policy is ignored until the Docker daemon restarts or the container is manually restarted. This is another attempt to prevent a restart loop.
Restart policies only apply to containers. Restart policies for swarm services are configured differently (see the sketch below).
Documentation
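For swarm services, as noted above, the restart policy is attached to the service rather than the container; a hedged sketch using docker service create (service name and image are arbitrary examples):

# Restart the service's tasks only on failure, at most 3 times, 5s apart
docker service create --name web \
  --restart-condition on-failure \
  --restart-delay 5s \
  --restart-max-attempts 3 \
  nginx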
I had some time to debug this more today. Because I was using an "official" Docker image, I had little to no visibility into what was occurring. To resolve this, I extended the official image and invoked my own entrypoint. The Dockerfile:
FROM officialImage:version
ENV envOne=value1 \
envTwo=value2
COPY wrapper-entrypoint.sh /
ENTRYPOINT ["/wrapper-entrypoint.sh"]
Then I did a 'set -x' in the wrapper-entrypoint.sh script and invoked the original:
#!/bin/bash
set -x
echo "Be pedantic: all args passed: $@"
bash -x ./original-entrypoint.sh "$@"
From this I found:
Restart does call the original ENTRYPOINT with the original arguments. The official image I used detected that it had already initialized and thus acted differently; this is why I was confused about the semantics. Using -x allowed me to see what was really happening.
I still don't know why docker kill stops the restart, but that is what I see - at least on Docker Data Center.
I don't believe shared data volumes affect this in any way, save for the actions a given ENTRYPOINT script might take based on its condition at the time of the restart.

How to check that the docker container is restarted and accessible?

I have a Docker container with a Jetty CMD instruction.
After docker restart, which returns immediately, I cannot access Jetty for about 9-10 seconds. After that the container, or rather the Jetty service, is up again and I can access it.
Question: is there a standard way to check that the Docker container is really up?
Surely I can make a loop with test requests to my service and wait for a 200 response code. But maybe there is a more elegant solution?
Thanks
Sergey, you need to use an init system as a supervisor of your in-Docker processes. You can use a distro-built-in init system like systemd, upstart, or init.d, depending on your OS, to check a container's state.
In theory, you should create an independent service in your init system for each docker run command issued without the -d option, because with -d Docker detaches the container and returns exit status 0 to the init system; as a result, the init system loses control of the target process.
For example, here is a realization of this mechanism in systemd:
Create a something.service file in /etc/systemd/system
and put something like this in it:
[Unit]
Description=Simple Blog Rails Docker Container Service
After=docker.service
Requires=docker.service
[Service]
Restart=on-failure
ExecStartPre=-/usr/bin/docker kill simple-blog-rails-container
ExecStartPre=-/usr/bin/docker rm simple-blog-rails-container
ExecStart=/usr/bin/docker run --name simple-blog-rails-container simple-blog-rails
ExecStop=/usr/bin/docker stop simple-blog-rails-container
[Install]
WantedBy=multi-user.target
Reload the systemd configuration: systemctl daemon-reload
Then just run your container by typing systemctl start something.service (or restart instead of start).
You can check the service state with systemctl status something.service
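Since the unit file has an [Install] section, you can presumably also enable the service so the container is supervised from boot:

# Start the service automatically at boot (uses the [Install] section above)
systemctl enable something.service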
For more information about using systemd and docker you may read this CoreOS manual: https://coreos.com/docs/launching-containers/launching/getting-started-with-systemd/
Keep in mind that docker restart defaults to a wait time of 10 seconds:
restart
Usage: docker restart [OPTIONS] CONTAINER [CONTAINER...]
Restart a running container
  -t, --time=10    Number of seconds to try to stop for before killing the container. Once killed it will then be restarted. Default is 10 seconds.
When I try this with some long-running script in a Docker container, it takes 10 seconds before it's done. If I change the command line to use a different timeout (like -t 4) it comes back in 4 seconds.
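For instance (the container name is a placeholder):

# Give the container only 4 seconds to stop gracefully before it is
# killed and restarted
docker restart -t 4 mycontainer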
If you want to determine whether a container is running (even if the service contained within it is not quite ready yet) you can:
Run docker ps - which will give you a list of all running Docker containers
Use the Docker Remote API command GET /containers/json - this will give you a JSON response with a list of running containers.
https://docs.docker.com/reference/api/docker_remote_api_v1.16/
https://docs.docker.com/reference/commandline/cli/#ps
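A hedged sketch of that Remote API call over the local Unix socket (requires a curl build with --unix-socket support; the unversioned path is assumed to work on the daemon in question):

# List running containers as JSON via the Docker Remote API
curl --unix-socket /var/run/docker.sock http://localhost/containers/json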
