How to make a Docker container restart automatically when stuck? [duplicate] - docker

I am using Docker version 17.09.0-ce, and I see that containers are marked as unhealthy. Is there an option to get the container restart instead of keeping the container as unhealthy?

The ability to restart unhealthy containers was part of the original PR (https://github.com/moby/moby/pull/22719), but it was removed after discussion and left to be done later as an enhancement of RestartPolicy.
For now you can use this workaround to automatically restart unhealthy containers: https://hub.docker.com/r/willfarrell/autoheal/
Here is a sample compose file:
version: '2'
services:
  autoheal:
    restart: always
    image: willfarrell/autoheal
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
Simply run docker-compose up -d against this file.
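If you prefer plain docker run over Compose, the watchdog can also be started directly. The sketch below assumes the image's documented default of only watching containers that carry the label autoheal=true (instead of AUTOHEAL_CONTAINER_LABEL=all); the application image name is a placeholder.
# Start the autoheal watchdog (restarts labeled containers that turn unhealthy)
docker run -d --name autoheal --restart always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  willfarrell/autoheal
# Opt an application container in with the label and give it a health check
docker run -d --name web --label autoheal=true \
  --health-cmd='curl -f http://localhost:8080/health || exit 1' \
  --health-interval=30s \
  my-web-image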

You can automatically restart an unhealthy container by combining a smart HEALTHCHECK with a proper restart policy.
The Docker restart policy should be one of always or unless-stopped.
The HEALTHCHECK, in turn, should implement logic that kills the container when it is unhealthy.
In the following example I used curl with its internal retry mechanism and chained it (in case of failure, i.e. the service is unhealthy) to the kill command.
HEALTHCHECK --interval=5m --timeout=2m --start-period=45s \
CMD curl -f --retry 6 --max-time 5 --retry-delay 10 --retry-max-time 60 "http://localhost:8080/health" || bash -c 'kill -s 15 -1 && (sleep 10; kill -s 9 -1)'
The important point here is that the retry logic is self-contained in the curl command; Docker's own retry setting still applies to a HEALTHCHECK but is effectively redundant in this setup. If the curl HTTP request still fails after all its retries, kill is executed. It first sends a SIGTERM to all processes in the container, to allow them to stop gracefully, then after 10 seconds it sends a SIGKILL to forcibly terminate anything still running. Note that when PID 1 of a container dies, the container itself dies and the restart policy is invoked.
kill docs: https://linux.die.net/man/1/kill
curl docs: https://curl.haxx.se/docs/manpage.html
docker restart docs: https://docs.docker.com/compose/compose-file/compose-file-v2/#restart
Gotchas: kill behaves differently in bash than in sh. In bash you can use -1 to signal all the processes with PID greater than 1 to die.
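To confirm that this combination actually triggers a restart, you can watch the container's health status and restart counter; both fields are available from docker inspect (the container name is a placeholder):
# Health status plus how many times Docker has restarted the container
docker inspect -f 'health={{.State.Health.Status}} restarts={{.RestartCount}}' my-container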

For standalone containers, Docker does not have native integration to restart the container on health check failure though we can achieve the same using Docker events and a script. Health check is better integrated with Swarm. With health check integrated to Swarm, when a container in a service is unhealthy, Swarm automatically shuts down the unhealthy container and starts a new container to maintain the container count as specified in the replica count of a service.
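A minimal sketch of that events-and-script approach: listen for health_status events from the daemon and restart whichever container reports unhealthy. This is only an illustration, not a hardened supervisor.
#!/bin/bash
# Restart any container that reports an unhealthy health check
# (Docker 1.12+ emits health_status events for containers with a HEALTHCHECK)
docker events \
  --filter 'event=health_status' \
  --format '{{.ID}} {{.Status}}' |
while read -r id status; do
  if [[ "$status" == *unhealthy* ]]; then
    echo "$(date) restarting unhealthy container $id"
    docker restart "$id"
  fi
done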

You can try putting something like this in your Dockerfile:
HEALTHCHECK --interval=5s --timeout=2s CMD curl --fail http://localhost || kill 1
Don't forget the --restart always option.
kill 1 will kill the process with PID 1 in the container and force the container to exit. Usually the process started by CMD or ENTRYPOINT has PID 1.
Unfortunately, this method likely won't change the container's state to unhealthy, so be careful with it.

Unhealthy docker containers may be restarted with a simple crontab rule:
* * * * * docker ps -f health=unhealthy --format "docker restart {{.ID}}" | sh
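A variant of the same idea that avoids generating shell commands from a template; note that the -r flag (skip the call when there is no input) is specific to GNU xargs:
* * * * * docker ps -q -f health=unhealthy | xargs -r docker restart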

Docker has a couple of ways to get details on container health. You can configure health checks and how often they run. Health checks can also probe an application running inside the container, e.g. over HTTP (this would use curl's --fail option). You can watch the health_status event to get details.
For detailed information on an unhealthy container the inspect command comes in handy: docker inspect --format='{{json .State.Health}}' container-name (see https://blog.newrelic.com/2016/08/24/docker-health-check-instruction/ for more details).
You should first resolve the error condition causing the "unhealthy" tag (set whenever the health check command exits with code 1). Depending on the error, this may or may not require Docker to restart the container. If you are starting/restarting your containers automatically, then either trapping the start errors, or logging them together with the health check status, can help you address errors quickly. Check the link if you are interested in auto start.
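For example, to watch health transitions as they happen, or to review them for one specific container (my-container is a placeholder name):
docker events --filter event=health_status
docker events --filter event=health_status --filter container=my-container --since 1h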

According to https://codeblog.dotsandbrackets.com/docker-health-check/:
Create the container and add "restart: always".
When using a healthcheck, pay attention to the following points:
For standalone containers, Docker does not have native integration to restart the container on health check failure though we can achieve the same using Docker events and a script. Health check is better integrated with Swarm. With health check integrated to Swarm, when a container in a service is unhealthy, Swarm automatically shuts down the unhealthy container and starts a new container to maintain the container count as specified in the replica count of a service.

Related

Difference in docker restart policy between on-failure and unless-stopped?

I have read the docker-compose documentation about restart policy of containers,
however, I fail to understand the difference between on-failure and unless-stopped.
When would I use one over the other? In which situations would one policy start a container while the other would not?
on-failure will issue a restart if the exit code indicated a failure, whereas unless-stopped behaves like always and will keep an instance running unless the container is stopped.
You can try with the hello-world to see the difference.
docker run --restart on-failure hello-world will run once and exit successfully, and running a subsequent docker ps will indicate no currently running instances of the container.
However, docker run --restart unless-stopped hello-world will restart the container even if it successfully exits, so subsequently running docker ps will show you a restarting instance until you stop the container.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4d498ebd13a6 hello-world "/hello" 2 seconds ago Restarting (0) Less than a second ago modest_keldysh
Docker restart policies exist to keep containers active through all kinds of failures, and we can leverage them in multiple ways. For example, if we have a web server running in a container and need to keep it active even after bad requests, we can use the unless-stopped flag; it will keep the server up and running until we stop it manually.
The restart flag can be any one of these:
"no": the default value; the container will never be restarted automatically.
on-failure: restarts the container whenever it encounters an error, i.e. whenever the process running inside the container exits with a non-zero exit code (exit code 0 means the process terminated intentionally without error; any non-zero value indicates an error).
always: as the name says, it will always restart the container, no matter what the exit code is. It will even bring back a container that was stopped manually, but only after the Docker daemon restarts.
unless-stopped: similar to always; the only difference is that once the container is stopped manually it will not restart automatically, not even after the Docker daemon restarts, until we start it manually again.
The difference between unless-stopped and on-failure is that the former restarts the container regardless of the exit code until we stop it manually, while the latter only restarts the container on a real failure, i.e. a non-zero exit code.
Once a container is stopped manually, its restart flag is ignored; this is one way to break out of a restart loop. That is why, with the always flag, a manually stopped container will not restart until the Docker daemon itself is restarted.
You can easily test all of these flags by creating a simple redis-server:
docker run -d --restart=always --name redis-server redis # start redis image
docker container ls # test the status
docker stop redis-server # stop the container manually
docker container ls # test the status again; note that redis-server was not restarted
sudo service docker restart # restart the docker daemon
# test the status again will find the container is again up and running
# try the same steps by changing the restart flag with *unless-stopped*
docker update --restart=unless-stopped redis-server # will update the restart flag of running container.

How to wait until `docker start` is finished?

When I run docker start, it seems the container might not be fully started at the time the docker start command returns. Is it so?
Is there a way to wait for the container to be fully started before the command returns? Thanks.
A common technique to make sure a container is fully started (i.e. services running, ports open, etc.) is to wait until a specific string is logged. See this example, Waiting until Docker containers are initialized, dealing with PostgreSQL and Rails.
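A rough sketch of that technique in shell: start the container, then poll its logs until the expected marker line shows up. The container name is a placeholder and the marker string depends entirely on the image (for PostgreSQL it is typically "database system is ready to accept connections").
docker start mydb
# Poll the logs until the image prints its ready message
until docker logs mydb 2>&1 | grep -q "database system is ready to accept connections"; do
  sleep 1
done
echo "container is ready"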
Edited:
Another possible solution uses the HEALTHCHECK feature of Docker containers. The idea is to configure the container with a health check command that is used to determine whether the main service is fully started and running normally.
The specified command runs inside the container and sets the health status to starting, healthy or unhealthy depending on its exit code (0 - container healthy, 1 - container not healthy). The status of the container can then be retrieved on the host by inspecting the running instance (docker inspect).
Health check options can be configured in the Dockerfile or when the container is run. Here is a simple example for PostgreSQL:
docker run --name postgres --detach \
--health-cmd='pg_isready -U postgres' \
--health-interval='5s' \
--health-timeout='5s' \
--health-start-period='20s' \
postgres:latest && \
until docker inspect --format "{{json .State.Health.Status }}" postgres | \
grep -m 1 '"healthy"'; do sleep 1; done
In this case the health command is pg_isready (note that the grep pattern includes the surrounding quotes, so that "unhealthy" does not also match). A web service will typically use curl; other containers have their own specific commands.
The Docker community provides this kind of configuration for several official images here.
Now, when we restart the container (docker start), it is already configured and we need only the second part:
docker start postgres && \
until docker inspect --format "{{json .State.Health.Status }}" postgres | \
grep -m 1 '"healthy"'; do sleep 1; done
The command will return when the container is marked as healthy
Hope that helps.
Disclaimer, I'm not an expert in Docker, and will be glad to know by myself whether a better solution exists.
Docker itself doesn't really know that a container "may not be fully started".
So, unfortunately, there is nothing Docker can do about this on its own.
Usually the commands used by the creator of the Docker image (in the Dockerfile) are organized so that the container is usable once docker start returns, and that is the best approach. However, it's not always the case.
Here is an example:
LocalStack, which is a set of services for local development with AWS, has a Docker image, but once it's started, the S3 port, for example, is not yet ready to accept connections.
From what I understand, a port that is exposed but not yet ready is the typical situation you are referring to.
So, from my experience, the application that talks to the dockerized process should wrap its attempts to connect to the server port in retries until the port is actually available.
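A simple retry wrapper along those lines, using only bash's built-in /dev/tcp so the script needs no extra tools; host and port here are placeholders for whatever the container exposes:
# Wait until the TCP port accepts connections before talking to the service
until (exec 3<>/dev/tcp/localhost/4566) 2>/dev/null; do
  echo "port not ready yet, retrying..."
  sleep 1
done
echo "port is open, continuing"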

Restarting an unhealthy docker container based on healthcheck


Docker - what does `docker run --restart always` actually do?

Although it seems like the --restart flag is simple and straightforward, I came up with a number of questions when experimenting with it:
With respect to ENTRYPOINT definitions - what are the actual defined semantics during restart?
If I exec into the container (I am on a DDC) and kill -9 the process, it restarts, but if I do docker kill it does not. Why?
How does restart interact with Shared Data Containers / Named Volumes?
Restart policies
Using the --restart flag on Docker run you can specify a restart policy for how a container should or should not be restarted on exit.
When a restart policy is active on a container, it will be shown as either Up or Restarting in docker ps. It can also be useful to use docker events to see the restart policy in effect.
docker run --restart always
Always restart the container regardless of the exit status. When you
specify always, the Docker daemon will try to restart the container
indefinitely. The container will also always start on daemon startup,
regardless of the current state of the container.
I recommend this documentation about restart policies:
Documentation - Restart policies
Update Docker v19.03
Restart policies (--restart)
Use Docker’s --restart to specify a container’s restart policy. A restart policy controls whether the Docker daemon restarts a container after exit. Docker supports the following restart policies:
always Always restart the container regardless of the exit status. When you specify always, the Docker daemon will try to restart the container indefinitely. The container will also always start on daemon startup, regardless of the current state of the container.
$ docker run --restart=always redis
Documentation - Restart policies
To configure the restart policy for a container, use the --restart flag when using the docker run command. The value of the --restart flag can be any of the following:
no Do not automatically restart the container. (the default)
on-failure Restart the container if it exits due to an error, which
manifests as a non-zero exit code.
always Always restart the container if it stops. If it is manually
stopped, it is restarted only when Docker daemon restarts or the
container itself is manually restarted.
unless-stopped Similar to always, except that when the container is
stopped (manually or otherwise), it is not restarted even after Docker
daemon restarts.
The following example starts a Redis container and configures it to always restart unless it is explicitly stopped or Docker is restarted.
$ docker run -d --restart unless-stopped redis
This command changes the restart policy for an already running container named redis.
$ docker update --restart unless-stopped redis
And this command will ensure all currently running containers will be restarted unless stopped.
$ docker update --restart unless-stopped $(docker ps -q)
Restart policy details
Keep the following in mind when using restart policies:
A restart policy only takes effect after a container starts successfully. In this case, starting successfully means that the container is up for at least 10 seconds and Docker has started monitoring it. This prevents a container which does not start at all from going into a restart loop.
If you manually stop a container, its restart policy is ignored until the Docker daemon restarts or the container is manually restarted. This is another attempt to prevent a restart loop.
Restart policies only apply to containers. Restart policies for swarm services are configured differently.
Documentation
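To double-check which policy a given container ended up with, it is recorded in the container's HostConfig and can be read back with docker inspect:
# Prints the active restart policy, e.g. "unless-stopped" ("no" or empty means none)
docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' redis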
I had some time to debug this more today. Because I was using an 'official' Docker image, I had little to no visibility into what was occurring. To resolve this, I extended the official image and invoked my own entrypoint. The Dockerfile:
FROM officialImage:version
ENV envOne=value1 \
envTwo=value2
COPY wrapper-entrypoint.sh /
ENTRYPOINT ["/wrapper-entrypoint.sh"]
Then I did a 'set -x' in the wrapper-entrypoint.sh script and invoked the original:
#!/bin/bash
set -x
echo "Be pedantic: all args passed: $@"
# Forward the original arguments to the image's original entrypoint
bash -x ./original-entrypoint.sh "$@"
From this I found:
Restart does call the original ENTRYPOINT with the original arguments. The official image I used detected it had already initialized and thus acted differently. This is why I was confused over the semantics. Using -x allowed me to see what was really happening.
I still don't know why docker kill stops the restart, but that is what I see - at least on Docker Data Center.
I don't believe Shared Data Volumes affect this in any way, SAVE for the actions a given ENTRYPOINT script might take based upon its condition at the time of the restart.

How to keep Docker container running after starting services?

I've seen a bunch of tutorials that seem do the same thing I'm trying to do, but for some reason my Docker containers exit. Basically, I'm setting up a web-server and a few daemons inside a Docker container. I do the final parts of this through a bash script called run-all.sh that I run through CMD in my Dockerfile. run-all.sh looks like this:
service supervisor start
service nginx start
And I start it inside of my Dockerfile as follows:
CMD ["sh", "/root/credentialize_and_run.sh"]
I can see that the services all start up correctly when I run things manually (i.e. getting on to the image with -i -t /bin/bash), and everything looks like it runs correctly when I run the image, but it exits once it finishes starting up my processes. I'd like the processes to run indefinitely, and as far as I understand, the container has to keep running for this to happen. Nevertheless, when I run docker ps -a, I see:
➜ docker_test docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c7706edc4189 some_name/some_repo:blah "sh /root/run-all.sh 8 minutes ago Exited (0) 8 minutes ago grave_jones
What gives? Why is it exiting? I know I could just put a while loop at the end of my bash script to keep it up, but what's the right way to keep it from exiting?
If you are using a Dockerfile, try:
ENTRYPOINT ["tail", "-f", "/dev/null"]
(Obviously this is for dev purposes only; you shouldn't need to keep a container alive unless it's running a process, e.g. nginx...)
I just had the same problem and I found out that if you are running your container with the -t and -d flag, it keeps running.
docker run -td <image>
Here is what the flags do (according to docker run --help):
-d, --detach=false Run container in background and print container ID
-t, --tty=false Allocate a pseudo-TTY
The most important one is the -t flag. -d just lets you run the container in the background.
This is not really how you should design your Docker containers.
When designing a Docker container, you're supposed to build it such that there is only one process running (i.e. you should have one container for Nginx, and one for supervisord or the app it's running); additionally, that process should run in the foreground.
The container will "exit" when the process itself exits (in your case, that process is your bash script).
However, if you really need (or want) to run multiple services in your Docker container, consider starting from "Docker Base Image", which uses runit as a pseudo-init process (runit will stay online while Nginx and Supervisor run), which will stay in the foreground while your other processes do their thing.
They have substantial docs, so you should be able to achieve what you're trying to do reasonably easily.
You can run plain cat without any arguments, as mentioned by @Sa'ad, to simply keep the container alive (actually doing nothing but waiting for input). Jenkins' Docker plugin does the same thing.
The reason it exits is that the shell script runs as PID 1, and when it completes, PID 1 is gone; Docker only keeps the container running while PID 1 is alive.
You can use supervisor to do everything, if run with the "-n" flag it's told not to daemonize, so it will stay as the first process:
CMD ["/usr/bin/supervisord", "-n"]
And your supervisord.conf:
[supervisord]
nodaemon=true
[program:startup]
priority=1
command=/root/credentialize_and_run.sh
stdout_logfile=/var/log/supervisor/%(program_name)s.log
stderr_logfile=/var/log/supervisor/%(program_name)s.log
autorestart=false
startsecs=0
[program:nginx]
priority=10
command=nginx -g "daemon off;"
stdout_logfile=/var/log/supervisor/nginx.log
stderr_logfile=/var/log/supervisor/nginx.log
autorestart=true
Then you can have as many other processes as you want and supervisor will handle the restarting of them if needed.
That way you could use supervisord in cases where you might need nginx and php5-fpm and it doesn't make much sense to have them apart.
Motivation:
There is nothing wrong with running multiple processes inside a docker container. If one likes to use docker as a lightweight VM - so be it. Others like to split their applications into microservices. Me thinks: a LAMP stack in one container? Just great.
The answer:
Stick with a good base image like the phusion base image. There may be others. Please comment.
And this is yet just another plea for supervisor, because the phusion base image provides supervisor besides some other things like cron and locale setup - stuff you want set up when running such a lightweight VM. For what it's worth, it also provides ssh connections into the container.
The phusion image itself will just start and keep running if you issue this basic docker run statement:
moin@stretchDEV:~$ docker run -d phusion/baseimage
521e8a12f6ff844fb142d0e2587ed33cdc82b70aa64cce07ed6c0226d857b367
moin@stretchDEV:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS
521e8a12f6ff phusion/baseimage "/sbin/my_init" 12 seconds ago Up 11 seconds
Or dead simple:
If a base image is not for you... For the quick CMD to keep it running I would suppose something like this for bash:
CMD exec /bin/bash -c "trap : TERM INT; sleep infinity & wait"
Or this for busybox:
CMD exec /bin/sh -c "trap : TERM INT; (while true; do sleep 1000; done) & wait"
This is nice, because it will exit immediately on a docker stop.
Just plain sleep or cat will take a few seconds before the container is forcefully killed by docker.
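You can see that difference yourself by timing docker stop against a container that traps the signal versus one that just sleeps (both use the stock busybox image):
docker run -d --name trapper busybox sh -c 'trap : TERM INT; (while true; do sleep 1000; done) & wait'
docker run -d --name sleeper busybox sleep 1000
time docker stop trapper   # returns almost immediately
time docker stop sleeper   # waits ~10s for the SIGTERM grace period, then Docker sends SIGKILL
docker rm trapper sleeper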
Updates
As response to Charles Desbiens concerning running multiple processes in one container:
This is an opinion. And the docs are pointing in this direction. A quote: "It’s ok to have multiple processes, but to get the most benefit out of Docker, avoid one container being responsible for multiple aspects of your overall application." For sure it is obviously much more powerful to divide your complex service into multiple containers. But there are situations where it can be beneficial to go the one-container route. Especially for appliances. The GitLab Docker image is my favourite example of a multi-process container. It makes deployment of this complex system easy. There is no way for misconfiguration. GitLab retains all control over their appliance. Win-win.
Make sure that you add daemon off; to your nginx.conf or run it with CMD ["nginx", "-g", "daemon off;"] as per the official nginx image.
Then use the following to run both supervisor as service and nginx as foreground process that will prevent the container from exiting
service supervisor start && nginx
In some cases you will need to have more than one process in your container, so forcing the container to have exactly one process won't work and can create more problems in deployment.
So you need to understand the trade-offs and make your decision accordingly.
Since Docker Engine v1.25 there is an option called init.
Docker Compose supports this option as of file format version 3.7.
So my current CMD when running a container that should run indefinitely is:
CMD ["sleep", "infinity"]
and then run it using:
docker build -t app .
docker run --rm --init app
cf.:
rm docs and init docs
Capture the PID of the nginx process in a variable (for example $NGINX_PID) and at the end of the entrypoint file do
wait $NGINX_PID
That way, your container runs as long as nginx is alive; when nginx stops, the container stops as well.
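A minimal entrypoint sketch of that idea; the extra setup step is a placeholder and the script assumes nginx is available inside the image:
#!/bin/bash
# entrypoint.sh: start nginx in the background, remember its PID, then block on it
nginx -g 'daemon off;' &
NGINX_PID=$!
# ... any other startup work can happen here ...
# The container lives exactly as long as nginx does
wait "$NGINX_PID"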
Along with having something along the lines of ENTRYPOINT ["tail", "-f", "/dev/null"] in your Dockerfile, you should also run the container with the -td option. This is particularly useful when the container runs on a remote machine. Think of it as if you had ssh'ed into a remote machine that has the image and started the container there. In this case, when you exit the ssh session, the container will get killed unless it was started with the -td option. A sample command for running your image would be: docker run -td <any other additional options> <image name>
This holds good for Docker version 20.10.2.
There are some cases during development when there is no service yet but you want to simulate it and keep the container alive.
It is very easy to write a bash placeholder that simulates a running service:
while true; do
sleep 100
done
You replace this with something more serious as the development progresses.
How about using the supervise form of service if available?
service YOUR_SERVICE supervise
Once supervise is successfully running, it will not exit unless it is
killed or specifically asked to exit.
Saves having to create a supervisord.conf
