I have a docker-compose.yml file with the following:
services:
  kafka_listener:
    build: .
    command: bundle exec ./kafka foreground
    restart: always
  # other services
Then I start containers with: docker-compose up -d
On my Amazon instance, kafka-server (for example) sometimes fails to start, so the ./kafka foreground script fails. When typing docker ps I see a message: Restarting (1) 11 minutes ago. I thought Docker would restart a failed container instantly, but it seems it doesn't. In the end, the container was restarted about 30 minutes after the first failed attempt.
Is there any way to tell Docker-Compose to restart the container instantly after a failure?
You can use one of these restart policies:
on-failure
The on-failure policy is a bit interesting: it tells Docker to restart the container if the exit code indicates an error, but not if the exit code indicates success. You can also specify a maximum number of times Docker will automatically restart the container; for example, on-failure:3 will retry 3 times.
unless-stopped
The unless-stopped restart policy behaves the same as always with one exception. When a container is stopped and the server is rebooted or the Docker service is restarted, the container will not be restarted.
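For example, applied to the compose file from the question (a minimal sketch; note that the retry-limit form on-failure:3 is the docker run --restart syntax, and whether your compose version accepts it in the restart key may vary):

services:
  kafka_listener:
    build: .
    command: bundle exec ./kafka foreground
    restart: on-failure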
Hope this helps with your problem.
Thank you!
I want my docker container deconz to restart after a reboot or when the container fails to start.
My compose file is
version: "3.3"
services:
deconz:
image: marthoc/deconz
container_name: deconz
network_mode: host
restart: "always"
volumes:
- /sharedfolders/media/AppData/deconz:/root/.local/share/dresden-elektronik/deCONZ
devices:
- /dev/ttyACM0
environment:
- DECONZ_WEB_PORT=8083
- DECONZ_WS_PORT=8443
- DEBUG_INFO=1
- DEBUG_APS=0
- DEBUG_ZCL=0
- DEBUG_ZDP=0
- DEBUG_OTAU=0
I use the command docker-compose up -d to start the container. Now I assume that after a reboot the container starts before the USB device is recognized. I want Docker to keep trying to restart it until it succeeds. I assumed that restart: always or restart: unless-stopped would do this, but apparently I am mistaken.
Docker (docker-compose) will not help you directly with this task. The only thing the Docker orchestrator does is recognize that the container has failed and create a new container to replace it.
Other orchestrators like Kubernetes have improved handling of the lifecycle, by allowing the orchestrator to recognize the internal state of the containers. Based on the internal state, the orchestrator will manage the lifecycle of that container and also the lifecycle of the related containers.
In your particular case, even moving to Kubernetes will not really help, since it is the container's task to recognize whether all the resources it needs are ready before it starts working.
What you need to do is create a startup script for the container that checks that all of the required resources are ready before it proceeds with the start. When you prepare the script, you can choose to exit from it after waiting a certain time (in which case Docker will detect a container failure and handle it based on the restart rules) or to wait forever until the resources are ready. I prefer to wait for a while and then fail if the resources are still not ready. This makes it easier for the administrator to recognize that the container is not healthy.
The most trivial example of such a script would be:
#!/bin/sh
testfile="/dev/usbdrive/Iamthedrive.txt"
while :
do
  if [ -e "$testfile" ]
  then
    echo "drive is mounted."
    start_the_container_main_process.sh
    exit 0
  fi
  echo "drive is still not ready, waiting 10s"
  sleep 10
done
Make sure you sleep for a certain amount of time between checks to go easy on the system resources.
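One possible way to wire such a script into the compose service from the question (a sketch only; the script name /wait-for-usb.sh and the assumption that the image's normal startup can simply be invoked from inside the script are mine, not something verified against the marthoc/deconz image):

services:
  deconz:
    image: marthoc/deconz
    restart: always
    devices:
      - /dev/ttyACM0
    volumes:
      - ./wait-for-usb.sh:/wait-for-usb.sh:ro
    entrypoint: ["/bin/sh", "/wait-for-usb.sh"]

With restart: always plus a script that exits non-zero when the device never shows up, Docker keeps recreating the container until the device is finally present.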
I have installed gitlab-ce on my server with the following docker-compose.yml:
gitlab:
  image: 'gitlab/gitlab-ce:latest'
  hostname: 'gitlab.devhunt.eu'
  restart: 'always'
  environment:
    # ...
  volumes:
    - '/var/lib/gitlab/config:/etc/gitlab'
    - '/var/lib/gitlab/logs:/var/log/gitlab'
    - '/var/lib/gitlab/data:/var/opt/gitlab'
I used it for a while and now I want to remove it. I noticed that when I did docker stop gitlab (which is the container name), it kept coming back, so I figured it was because of the restart: always. Thus the struggle began:
I tried docker update --restart=no gitlab before docker stop gitlab. Still coming back.
I did docker stop gitlab && docker rm gitlab. It got deleted but came back soon after
I went to change the docker-compose.yml to restart: no, did docker-compose down. The container got stopped and deleted but came back soon after
I did docker-compose up to apply the change in the compose file, checked that it was successfully taken into account with docker inspect -f "{{ .HostConfig.RestartPolicy }}" gitlab. The response was {no 0}. I did docker-compose down. Again, it got stopped and deleted but came back soon after
I did docker stop gitlab && docker rm gitlab && docker image rm fcc1e4187c43 (the image hash I had for gitlab-ce). The container got stopped and deleted, and the image got deleted. It seemed that I had finally managed to kill the beast... but one hour later, the gitlab container was recreated with another image hash (3cc8e8a0764d) and was starting again.
I would stop the Docker daemon, but I have production websites and databases running and I would like to avoid downtime if possible. Any idea what I can do?
You've set the restart policy to always; set it to unless-stopped instead.
Check the docs: https://docs.docker.com/config/containers/start-containers-automatically/
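For example (a sketch using the container name from the question):

# switch the running container to unless-stopped without recreating it,
# then stop it; the manual stop is now respected
docker update --restart=unless-stopped gitlab
docker stop gitlab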
TL;DR: I have two almost identical services in my compose file except for the name of the service and the published ports. When deploying with docker stack deploy..., why does the first service fail with a no such image error, while the second service using the same image runs perfectly fine?
Full: I have a docker-compose file with two Apache Tomcat services pulling the same image from my private GitLab registry. The only difference between the two services in my docker-compose.yml is the name of the service (*_dev vs. *_prod) and the published ports. I deploy this docker-compose file to my swarm using GitLab CI with a gitlab-ci.yml. For the deployment of my docker-compose in this gitlab-ci.yml I use two commands:
...
script:
  - docker pull $REGISTRY:$TAG
  - docker stack deploy -c docker-compose.yml webapp1 --with-registry-auth
...
(I use a docker pull [image] command to have the image on the right node, since my --with-registry-auth is not working properly, but this is not my problem currently).
Now the strange thing is that for the first service, I obtain a No such image: error and the service is stopped, while for the second service everything seems to run perfectly fine. Both services are on the same worker node. This is what I get from docker service ps:
:~$ docker service ps webapp1_tomcat_dev
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
xxx1 webapp1_tomcat_dev.1 url/repo:tag worker1 node Shutdown Rejected 10 minutes ago "No such image: url/repo:tag#xxx…"
xxx2 \_ webapp1_tomcat_dev.1 url/repo:tag worker1 node Shutdown Rejected 10 minutes ago "No such image: url/repo:tag#xxx…"
:~$ docker service ps webapp1_tomcat_prod
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
xxx3 webapp1_tomcat_prod.1 url/repo:tag worker1 node Running Running 13 minutes ago
I have used the --no-trunc option to see that the IMAGE used by *_prod and *_dev is identical.
The restart_policy in my docker-compose explains why the first service fails three minutes after the second service started. Here is my docker-compose:
version: '3.2'
services:
  tomcat_dev:
    image: url/repo:tag
    deploy:
      restart_policy:
        condition: on-failure
        delay: 60s
        window: 120s
        max_attempts: 1
    ports:
      - "8282:8080"
  tomcat_prod:
    image: url/repo:tag
    deploy:
      restart_policy:
        condition: on-failure
        delay: 60s
        window: 120s
        max_attempts: 1
    ports:
      - "8283:8080"
Why does the first service fail with a no such image error? Is it, for example, simply not possible to have two services that use the same image running on the same worker node?
(I cannot simply scale-up one service, since I need to upload files to the webapp which are different for production and development - e.g. dev vs prod licenses - and hence I need two distinct services)
EDIT: Second service works because it is created first:
$ docker stack deploy -c docker-compose.yml webapp1 --with-registry-auth
Creating service webapp1_tomcat_prod
Creating service webapp1_tomcat_dev
I found a workaround by separating my services over two different docker compose files (docker-compose-prod.yml and docker-compose-dev.yml) and perform the docker stack deploy command in my gitlab-ci.yml twice:
...
script:
  - docker pull $REGISTRY:$TAG
  - docker stack deploy -c docker-compose-prod.yml webapp1 --with-registry-auth
  - docker pull $REGISTRY:$TAG
  - docker stack deploy -c docker-compose-dev.yml webapp1 --with-registry-auth
...
My gut says my restart_policy in my docker-compose was too strict as well (it had max_attempts: 1), and maybe because of this the image couldn't be used in time / within one restart (as suggested by @Ludo21South). Hence I allowed more attempts, but since I had already separated the services over two files (which worked), I have not checked whether this hypothesis is true.
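For reference, a less strict restart_policy would look something like this (the exact values are guesses, not something verified against the failing deployment):

services:
  tomcat_dev:
    image: url/repo:tag
    deploy:
      restart_policy:
        condition: on-failure
        delay: 10s
        max_attempts: 5
        window: 120s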
I have a docker-compose file that creates 3 Hello World applications and uses nginx to load balance traffic across the different containers.
The docker-compose code is as follows:
version: '3.2'
services:
  backend1:
    image: rafaelmarques7/hello-node:latest
    restart: always
  backend2:
    image: rafaelmarques7/hello-node:latest
    restart: always
  backend3:
    image: rafaelmarques7/hello-node:latest
    restart: always
  loadbalancer:
    image: nginx:latest
    restart: always
    links:
      - backend1
      - backend2
      - backend3
    ports:
      - '80:80'
    volumes:
      - ./container-balancer/nginx.conf:/etc/nginx/nginx.conf:ro
I would like to verify that the restart: always policy actually works.
The approach I tried is as follows:
First, I run my application with docker-compose up;
I identify the container IDs with docker container ps;
I kill/stop one of the containers with docker stop ID_Container or docker kill ID_Container.
I was expecting that after the 3rd step (stopping/killing the container, which makes it exit with code 137), the restart policy would kick in and start a new container again.
However, this does not happen. I have read that this is intentional, so that there is a way to manually stop containers that have a restart policy.
Despite this, I would like to know how I can kill a container in such a way that it triggers the restart policy so that I can actually verify that it is working.
Thank you for your help.
If you run ps on the host you will be able to see the actual processes in all of your Docker containers. Once you find a container's main process's process ID, you can sudo kill it (you will have to be root). That will look more like a "crash", especially if you kill -11 it to send SIGSEGV.
For validation scenarios like this, it is occasionally useful to have an endpoint that crashes your application, which you can enable in test builds, and other similar silly things. Just make sure you have a gate so that those endpoints don't exist in production builds. (In old-school C, an #ifdef TEST would do the job; some languages have equivalents but many don't.)
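A sketch of the host-side approach (backend1 stands in for the real container name, which compose usually prefixes with the project name; docker inspect exposes the main process PID as .State.Pid):

# find the container's main process PID on the host and send it SIGSEGV
pid=$(docker inspect -f '{{.State.Pid}}' backend1)
sudo kill -11 "$pid"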
You can docker exec into the running container and kill processes. If your entrypoint process (pid 1) starts a subprocess, find it and kill it:
docker exec -it backend3 /bin/sh
ps -ef
Find the process whose parent is pid 1 and kill -9 it.
If your entrypoint is the only process (pid 1), it cannot be killed by the kill command. Consider replacing your entrypoint with a script that calls your actual process, which will allow you to use the idea suggested above (see the sketch after the notes below).
This should simulate a crashing container and should kick off the restart process.
NOTES:
See explanation in https://unix.stackexchange.com/questions/457649/unable-to-kill-process-with-pid-1-in-docker-container
See why not run NodeJS as pid 1 in https://www.elastic.io/nodejs-as-pid-1-under-docker-images/
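A minimal sketch of such a wrapper entrypoint (assuming the app is started with node server.js; adjust to your actual command):

#!/bin/sh
# the shell stays at pid 1 and the app runs as a child process,
# so it can be killed from `docker exec` to simulate a crash
node server.js
# the script exits with the app's exit code, which triggers the restart policy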
We have a Flask service application which connects to a MySQL database for data. This Flask app is served via gunicorn in a Docker container. We are using docker-compose for this.
When the application starts we make the connection to the database. If the connection to the database fails (3 attempts), the application fails to initialize and exits. But I am noticing that the container still starts. How can I cause the container to fail to start as well when my app fails to start?
First, you have to tell docker-compose that you want all containers to stop upon exit of your main service. This is done using the --abort-on-container-exit command line argument. Let's say you have 2 services:
docker-compose.yml
version: '3'
services:
  db:
    ...
  flask:
    ...
then the command line will look something like:
docker-compose up --exit-code-from flask --abort-on-container-exit
This tells docker-compose that your flask service is the main one and that you don't want to continue when it exits.
Second, you configure your flask main process (PID 1) to exit (preferably with a non-zero exit code) if it fails to connect to the database. That's it.
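A minimal entrypoint sketch for that second point (the app:app module path and the port are assumptions about the project layout):

#!/bin/sh
# exec keeps gunicorn as PID 1, so its non-zero exit code (after the app's
# failed DB connection attempts) becomes the container's exit code
exec gunicorn --bind 0.0.0.0:8000 app:app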
Use restart: "no".
There are four options for the restart policy:
- restart: "no"
- restart: always
- restart: on-failure
- restart: unless-stopped
If it really starts (and is not stuck permanently restarting), try adding trap 'exit' ERR at the top of your entrypoint's script.
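A sketch of that entrypoint pattern (the setup step and the final command are placeholders for whatever your entrypoint actually runs; the ERR trap is a bash feature):

#!/bin/bash
# exit the entrypoint (and therefore the container) as soon as any command fails
trap 'exit 1' ERR
./wait-for-db.sh          # hypothetical setup step that may fail
exec gunicorn app:app     # hypothetical main process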