Docker swarm stop grace period doesn't work as expected

I am running Docker in swarm mode with several nodes in the cluster.
According to the documentation here: https://docs.docker.com/engine/reference/commandline/service_update/ and here: https://docs.docker.com/engine/reference/commandline/service_create/, the --stop-grace-period option sets the time to wait before force-killing a container.
Expected behavior -
I expected Docker to wait this long before trying to stop a running container during a rolling update.
Actual behavior -
Docker sends the termination signal a few seconds after the new container, running the new version of the image, starts.
Steps to reproduce the behavior
docker service create --replicas 1 --stop-grace-period 60s --update-delay 60s --update-monitor 5s --update-order start-first --name nginx nginx:1.15.8
Wait for the service to start up the container (approx. 2 minutes)
docker service update --image nginx:1.15.9 nginx
docker ps -a
As you can see, the new container started and, a second later, the old one was killed by Docker.
Any idea why?
I also opened an issue on GitHub, here: https://github.com/docker/for-linux/issues/615

The --stop-grace-period value is how long Docker waits, after sending SIGTERM, for the container to exit gracefully. Once the grace period has elapsed, it kills the container with SIGKILL.
The sequence of events seems to be happening as designed, based on your description of your setup. Your container exits cleanly and quickly when it gets its SIGTERM, so Docker never needs to send a SIGKILL.
I see you also specified --update-delay 60s, but that won't take effect since you only have one replica. The update delay tells Docker to wait 60 seconds after updating one task before updating the next, so it only helps with 2 or more replicas.
It seems like you want your single-replica service to run a new task and an old task concurrently for 60 seconds, but swarm mode is happy to get rid of the old container with SIGTERM as soon as the new container is up.
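To make the sequence concrete, here is a minimal shell sketch of "send SIGTERM, wait up to the grace period, then SIGKILL". It only imitates the behavior with ordinary local processes. The function name stop_with_grace and the 10-second grace period are made up for illustration, and nothing here is Docker's actual implementation.

```shell
#!/bin/sh
# Hypothetical helper (not a Docker API): mimic "SIGTERM, wait, then SIGKILL".
stop_with_grace() {
    pid=$1
    grace=$2    # grace period in whole seconds
    kill -TERM "$pid" 2>/dev/null || true
    waited=0
    # Poll once per second until the process is gone or the grace period ends.
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        # Still alive after the grace period: force it.
        kill -KILL "$pid" 2>/dev/null || true
        stop_result="needed SIGKILL after ${grace}s"
    else
        stop_result="exited on SIGTERM"
    fi
    echo "$stop_result"
}

# A well-behaved process: sleep exits as soon as it receives SIGTERM,
# so most of the grace period is never used.
sleep 300 &
stop_with_grace "$!" 10
```

The grace period is an upper bound on the wait, not a fixed delay, which matches what the question observed with nginx.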

I think you can close the issue on GitHub.
--stop-grace-period is the period between stop (SIGTERM) and kill (SIGKILL).
Of course, you can change SIGTERM to another signal with the --stop-signal option. How the application inside the container behaves when it receives the stop signal is your responsibility.
Here is a good article explaining how all of this works.
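As an illustration of handling the stop signal yourself, here is a small shell sketch of a long-running loop that traps SIGTERM and exits cleanly. The worker function and its messages are hypothetical. A real container entrypoint would do its own cleanup in the handler.

```shell
#!/bin/sh
# Hypothetical worker loop with a SIGTERM handler.
worker() {
    trap 'echo "cleaning up"; kill "$child" 2>/dev/null; exit 0' TERM
    while :; do
        # Run sleep in the background and wait on it, so the trap
        # fires as soon as the signal arrives.
        sleep 1 &
        child=$!
        wait "$child" || true
    done
}

worker &
pid=$!
sleep 1                      # give the worker time to install its trap
kill -TERM "$pid"            # what a stop request would deliver
status=0
wait "$pid" || status=$?
echo "worker exit status: $status"
```

Because the handler exits promptly, a supervisor waiting out a grace period would never need to escalate to SIGKILL.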

Related

Difference between docker stop, docker restart, docker-compose stop and uptime

I've read that docker stop sends SIGTERM then SIGKILL to the main process, presumably pid 1 (https://www.edureka.co/community/50826/what-the-process-stopping-and-restarting-docker-container), which as a process leader, therefore kills all the processes running beneath it--i.e., "the container".
However, when I do a docker-compose stop, and then docker-compose start, I see 2 unexpected things:
the uptime of a particular container is unchanged--i.e., over 10 days.
the /tmp contents were NOT deleted (it's Debian GNU/Linux 11 (bullseye))
Two questions:
Is it possible to stop a container but keep its uptime unchanged?
Shouldn't the /tmp contents be deleted, as happens on bare metal restarts?

How to stop a running docker container 'X' minutes from startup?

I would like to stop my running docker container after a specific time, let's say 2 hrs after startup. So far my research has led to the following solutions. I just wanted to know if there were better ways to do it.
Use a cron job to stop the container by calling the docker stop command.
Use an entry point like sleep 5000, but this does not suit my use case.
Using --stop-timeout in the docker run command; I believe this is just the maximum time given for the container to shut down gracefully. Am I missing something here?

You can use the timeout command that's part of the coreutils package which is already installed in the debian images (and probably many others).
This will run the container for 30 seconds and then stop it:
docker run debian timeout 30 tail -f /dev/null
Basically, add timeout 7200 in front of the command you want to run in the container, and it'll be killed after 2 hours.
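You can try the same command outside Docker to see what it does. This sketch assumes GNU coreutils timeout, which documents exit status 124 when the command times out. Other implementations may differ.

```shell
#!/bin/sh
# Let "sleep 5" run for at most 1 second; timeout sends it SIGTERM.
status=0
timeout 1 sleep 5 || status=$?
echo "exit status: $status"
```

With GNU timeout you can also add -k DURATION to send SIGKILL if the command is still running that long after the first signal.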

celery beat container gets stuck

I started running a celery beat worker in a dedicated container. This works fine sometimes, but now I get the following error trying to remove or re-deploy my containers:
An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
Also, I cannot access the container anymore and the following commands just get stuck:
docker restart beat
docker logs beat
docker exec beat bash

Docker container restarts instantly, even though I have set the "-t" (timeout) option

Here is my snippet:
docker restart -t 5 waitforit_
Then docker ps returns immediately:
status => run since 1s
How is this possible?
Any hint would be great,
thanks

I believe docker restart is equivalent to docker stop; docker start. The -t option isn’t a hard wait. Rather, it says that if the process doesn’t stop on its own after receiving SIGTERM, then send it SIGKILL (kill -9) after that many seconds.
If your process is well-behaved and exits promptly when it receives SIGTERM, then docker restart will in fact be pretty quick, regardless of whatever value you pass as -t.

Which one should I use: docker kill or docker stop?

Will docker stop fail if processes running inside the container fail to stop? If I use docker kill, can unsaved data inside the container be preserved? Is docker stop time-consuming compared to docker kill? I want to shut down the container without losing any data (and without high latency for the kill or stop to complete).

Line reference:
docker stop: Stop a running container (send SIGTERM, and then SIGKILL after grace period) [...] The main process inside the container will receive SIGTERM, and after a grace period, SIGKILL. [emphasis mine]
docker kill: Kill a running container (send SIGKILL, or specified signal) [...] The main process inside the container will be sent SIGKILL, or any signal specified with option --signal. [emphasis mine]
You can get more info from this post: https://superuser.com/questions/756999/whats-the-difference-between-docker-stop-and-docker-kill

Docker stop:
When you issue a docker stop command, a signal is sent to the main process inside that container. In the case of docker stop it is SIGTERM, short for "terminate signal": a message received by the process telling it, essentially, to shut down in its own time.
SIGTERM is used any time that you want to stop a process inside of your container and shut the container down, and you want to give that process inside there a little bit of time to shut itself down and do a little bit of clean up.
A lot of different programming languages have the ability for you to listen for these signals inside of your code base, and as soon as you get that signal you could attempt to do a little bit of cleanup or maybe save some file or emit some message or something like that.
On the other hand, the docker kill command issues a SIGKILL (kill signal) to the primary running process inside the container. It essentially means you have to shut down right now, and you do not get to do any additional work.
So ideally we always stop a container with the docker stop command, to give the running process inside it a little bit of time to shut itself down. If it feels like the container has locked up and is not responding to docker stop, we can issue docker kill instead.
One little oddity about docker stop: if the container does not stop within 10 seconds (the default timeout), Docker automatically falls back to sending SIGKILL, just as docker kill would.
So essentially docker stop is us being nice, but the process has only got 10 seconds to actually shut down.
A good example could be ping command.
sudo docker run busybox ping google.com
Now, if you stop the container with docker stop container_id, you will see it takes 10 seconds to shut down, because the ping command does not respond to the SIGTERM message here. In other words, the ping command doesn't really have the ability to say, oh yeah, I understand you want me to shut down.
So after we waited those 10 seconds, the kill signal was eventually sent and ping was terminated on the spot.
But if you use docker kill container_id, you will see that it is dead instantly.
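The difference between the two signals can be reproduced without Docker. This sketch starts a local process that ignores SIGTERM (a stand-in for the unresponsive ping, not the real container setup), shows that SIGTERM has no effect, and then ends it with SIGKILL. The variable names are arbitrary.

```shell
#!/bin/sh
# Start a process that ignores SIGTERM; an ignored signal stays ignored
# across exec, so the sleep inherits it.
sh -c 'trap "" TERM; exec sleep 30' &
pid=$!
sleep 1                      # let the child install the trap first

kill -TERM "$pid"            # polite request, ignored by this process
sleep 1
if kill -0 "$pid" 2>/dev/null; then
    term_effect="ignored"
else
    term_effect="exited"
fi

kill -KILL "$pid" 2>/dev/null || true   # cannot be caught or ignored
status=0
wait "$pid" || status=$?
# A process ended by SIGKILL is reported by the shell as 128 + 9 = 137.
echo "SIGTERM: $term_effect, exit status after SIGKILL: $status"
```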

You should use docker stop since it stops the container gracefully - like shutting down your laptop - instead of killing it, which is like forcibly cutting the laptop's power by pulling its battery.
But Docker will force a shutdown (kill the processes) if they take more than 10 seconds to stop gracefully.

docker stop sends SIGTERM (terminate signal) to the process, and the process then has 10 seconds (by default) to clean up, like saving files or emitting some messages.
Use docker kill when the container is locked up and not responding.
