Docker container not shutting down in Swarm cluster - docker

On my docker swarm cluster, when I perform a docker stack deploy with a new version of my service's image, or run docker service update --force, the old containers of the service(s) get the desired state Shutdown, but they remain with a current state of Running.
However, they don't seem to be actually running: I can't do anything with them. docker logs, docker inspect, docker exec, ... nothing works.
The only way to get rid of them is to restart the docker daemon.
What would you look at to try to understand and fix this recurring issue?

We faced the same issue a few days ago: it turned out we had a logging driver configured, but the logging server was not available. We had stopped using it anyway, but forgot to remove the configuration from the service:
logging:
  driver: fluentd
  options:
    fluentd-address: localhost:24224
    fluentd-async-connect: "true"
Removing this configuration fixed the issue for future containers. Old instances were still hanging around, but restarting Docker helped.
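If you need to keep the fluentd driver, a possible mitigation (an assumption based on Docker's generic logging options, not something verified in this thread) is to make log delivery non-blocking, so an unreachable log server cannot wedge the container:

```yaml
logging:
  driver: fluentd
  options:
    fluentd-address: localhost:24224
    # fluentd-async replaces the deprecated fluentd-async-connect option
    fluentd-async: "true"
    # non-blocking mode buffers/drops logs instead of stalling the
    # container when the fluentd endpoint is unreachable
    mode: "non-blocking"
```

This trades log reliability for container stability, which may or may not be acceptable in your setup.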

Related

Kubernetes Cluster - Containers do not restart after reboot

I have a kubernetes cluster setup at home on two bare metal machines.
I used kubespray to install both and it uses kubeadm behind the scenes.
The problem I encounter is that all containers within the cluster have restartPolicy: no, which breaks my cluster when I restart the main node.
I have to manually run "docker container start" for all containers in "kube-system" namespace to make it work after reboot.
Does anyone have an idea where the problem might be coming from?
Docker provides restart policies to control whether your containers start automatically when they exit, or when Docker restarts. Here your containers have the restart policy no, which means the container will never be automatically started under any circumstance.
You need to change the restart policy to always, which restarts the container whenever it stops. If it is manually stopped, it is restarted only when the Docker daemon restarts or the container itself is manually restarted.
You can change the restart policy of an existing container using docker update. Pass the name of the container to the command. You can find container names by running docker ps -a.
docker update --restart=always <CONTAINER NAME>
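To avoid running docker container start by hand for each container, the kube-system containers can be picked out of docker ps output and updated in one pass. This is a sketch that assumes the dockershim naming convention k8s_&lt;container&gt;_&lt;pod&gt;_&lt;namespace&gt;_...; adjust the pattern to your setup. (Note the asker's own answer below explains that kube-system containers use restartPolicy no intentionally, so treat this as a workaround only.)

```shell
# Print IDs of containers whose name contains the kube-system namespace.
# Assumes names follow the k8s_<container>_<pod>_<namespace>_... convention.
kube_system_ids() {
  awk '$2 ~ /_kube-system_/ {print $1}'
}

# Intended usage on the node (requires docker):
# docker ps -a --format '{{.ID}} {{.Names}}' | kube_system_ids \
#   | xargs -r docker update --restart=always
```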
Restart policy details:
Keep the following in mind when using restart policies:
A restart policy only takes effect after a container starts successfully. In this case, starting successfully means that the container is up for at least 10 seconds and Docker has started monitoring it. This prevents a container which does not start at all from going into a restart loop.
If you manually stop a container, its restart policy is ignored until the Docker daemon restarts or the container is manually restarted. This is another attempt to prevent a restart loop.
I am answering my own question:
It probably wasn't very clear, but I was talking about the kube-system pods that manage the whole cluster and that should start automatically when the machine restarts.
It turns out those pods (e.g. coredns, kube-proxy, etc.) intentionally have a restart policy of "no", and it is the kubelet service on the node that spins up the whole cluster when you restart the node.
https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
In my case kubelet could not start due to missing cri-dockerd process.
Check the issue I opened at kubespray:
You can verify the kubelet logs like so:
journalctl -u kubelet -f
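The missing cri-dockerd failure mode described above can be checked with a small script. This sketch assumes cri-dockerd's default socket path /var/run/cri-dockerd.sock; your install may place it elsewhere:

```shell
# Succeed if the given CRI socket exists; the default path is an assumption
# about a standard cri-dockerd install.
check_cri_socket() {
  [ -S "${1:-/var/run/cri-dockerd.sock}" ]
}

# Usage on the node:
# check_cri_socket || echo "cri-dockerd socket missing; kubelet will not start"
# journalctl -u kubelet -f   # then watch the kubelet logs for the exact error
```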

Docker containers lost for no reason

I have been using 3-4 Docker containers for 1-2 months. For the last few weeks, however, I have been hibernating my PC instead of shutting it down, stopping the Docker engine before each hibernate. Today I cannot see my containers; there is only a "No containers running" message on the Docker dashboard. I restarted many times, finally updated to the latest version and restarted the PC, but still no containers. I also tried a Docker factory reset, but nothing changed. So, how can I access my containers?
I tried to list containers via docker container ls, but no containers are listed. So, have my containers gone for no reason?
Normally you can list stopped containers with
docker container ls -a
Then check the logs on those containers if they will not start. However...
I also tried Docker factory reset
At this point those containers, images, and volumes are most likely gone. I don't believe there's a recovery after that step.

Does restarting docker service kills all containers?

I'm having trouble with docker where docker ps won't return and is stuck.
I found suggestions to restart the docker service, with something like
sudo service docker restart (https://forums.docker.com/t/what-to-do-when-all-docker-commands-hang/28103/4)
However, I'm worried that it will kill all the running containers. (I guess the service is what allows docker containers to run?)
In the default configuration, your assumption is correct: if the docker daemon is stopped, all running containers are shut down. But, as outlined in the link, this behaviour can be changed on docker >= 1.12 by adding
{
"live-restore": true
}
to /etc/docker/daemon.json. Crux: the daemon must be restarted for this change to take effect. Please take note of the limitations of live restore, e.g. only patch version upgrades are supported, not major version upgrades.
Another possibility is to define a restart policy when starting a container. To do so, pass one of the following values as value for the command line argument --restart when starting the container via docker run:
no: Do not automatically restart the container. (the default)
on-failure: Restart the container if it exits due to an error, which manifests as a non-zero exit code.
always: Always restart the container if it stops. If it is manually stopped, it is restarted only when the Docker daemon restarts or the container itself is manually restarted. (See the second bullet listed in restart policy details.)
unless-stopped: Similar to always, except that when the container is stopped (manually or otherwise), it is not restarted even after the Docker daemon restarts.
For your specific situation, this would mean that you could:
Restart all containers with --restart always (more on that further below)
Re-configure the docker daemon to allow for live restore
Restart the docker daemon (which is not yet configured for live restore, but will be after this restart)
This restart would shut down and then restart all your containers once. But from then on, you should be free to stop the docker daemon without your containers terminating.
Handling major version upgrades
As mentioned above, live restore cannot handle major version upgrades. For a major version upgrade, you have to tear down all running containers. With a restart policy of always, however, the containers will be restarted after the docker daemon comes back up from the upgrade.

Cron job to kill all hanging docker containers

I am new to docker containers. We have containers being deployed, and due to some internal application network bugs the process running in a container hangs while the container is never terminated. While we debug this issue, I would like a way to find those containers and set up a cron job to periodically check for and kill them.
So how would I determine from "docker ps -a" which containers should be dropped, and how would I go about it? Any ideas? We are eventually moving to Kubernetes, which will help with these issues.
Docker already has a command to clean up the docker environment; you can run it manually or set up a job to run the following command:
$ docker system prune
Remove all unused containers, networks, images (both dangling and
unreferenced), and optionally, volumes.
Refer to the documentation for more details on advanced usage.
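Note that docker system prune only removes stopped containers, so it will not catch a hung-but-running process. A hedged sketch of the cron approach: if your images define a HEALTHCHECK, hung containers show up as (unhealthy) in docker ps, and those can be filtered and removed. The helper below only parses ID/status lines, so the docker-specific parts stay in the commented usage:

```shell
# Print the IDs of containers whose status line reports (unhealthy).
pick_unhealthy() {
  awk '/\(unhealthy\)/ {print $1}'
}

# Intended cron usage (requires docker and a HEALTHCHECK in the images):
# docker ps --format '{{.ID}} {{.Status}}' | pick_unhealthy | xargs -r docker rm -f
#
# With a recent docker, the filter can do the work directly:
# docker ps -q --filter health=unhealthy | xargs -r docker rm -f
```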

high availability with docker swarm mode

I have a problem using docker swarm mode.
I want to have high availability with swarm mode.
I think I can do that with rolling update of swarm.
Something like this...
docker service update --env-add test=test --update-parallelism 1 --update-delay 10s 6bwm30rfabq4
However there is a problem.
My docker image has an entrypoint. Because of this there is a small delay before the service (I mean the docker container) is really up. But docker thinks the service is already running, because the status of the container is 'Up', even though the entrypoint is still doing some work. So some containers return errors when I try to connect to the service.
For example, say I create a docker service named 'test', scale it up to 4 replicas on port 8080, and access test:8080 in a web browser. Then I do a rolling update with --update-parallelism 1 --update-delay 10s and try to connect to the service again: one container returns an error, because docker considers it already running even though the entrypoint has not finished. After 10s another container returns an error, because the update has moved on and docker again thinks that container is already up.
So.. Is there any solution to solve this problem?
Should I add some nginx settings to disconnect from the erroring container and reconnect to another one?
The HEALTHCHECK Dockerfile instruction works for this use case. You specify how Docker should check whether the container is available, and the check is used during updates as well as for monitoring service health in Swarm.
There's a good article about it here: Reducing Deploy Risk With Docker’s New Health Check Instruction.
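A minimal sketch of what that looks like in a Dockerfile, assuming a hypothetical HTTP service on port 8080 and curl available in the image; --start-period gives the entrypoint time to finish before failed probes count against the container:

```dockerfile
# Hypothetical service listening on 8080; adjust the probe to your app.
HEALTHCHECK --interval=10s --timeout=3s --start-period=30s --retries=3 \
  CMD curl -fsS http://localhost:8080/ || exit 1
```

With a health check defined, a rolling docker service update waits for new tasks to report healthy before routing traffic to them, which closes the window described in the question.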
