Docker hangs and gets corrupted on reboot

We are running a scheduling engine with Docker, Chronos & Mesos, with 2 Mesos slaves on each node.
Sometimes too many jobs get executed on a node, Docker becomes unresponsive, and Docker ends up corrupted when the server is rebooted. Is there anything wrong with the setup? I'm not sure why Docker hangs and gets corrupted on reboot.
Thanks

Running two Mesos agents on one node won't work properly with Docker containers, because restarting one agent will cause the Docker containers managed by the other agent to be deleted.
Check out the --cgroups_root flag in
https://github.com/apache/mesos/blob/master/docs/configuration/agent.md
This flag only applies to the MesosContainerizer (which can also be used to launch Docker containers).
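If you do need two agents on one node, a minimal sketch of keeping them from clobbering each other (the flag values and paths here are illustrative, not from the thread):

# Give each agent its own cgroup root and work dir so that one agent's
# recovery does not delete containers belonging to the other.
# (Each agent also needs a distinct --port.)
mesos-agent --cgroups_root=mesos_agent1 --work_dir=/var/lib/mesos/agent1 --master=zk://master.example:2181/mesos
mesos-agent --cgroups_root=mesos_agent2 --work_dir=/var/lib/mesos/agent2 --master=zk://master.example:2181/mesos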

Related

Reboot Docker container from inside

I'm working with a Docker container with Debian 11 inside, running a server.
I need to update this server and do other things on a regular basis. I've written several scripts that can do it, but I encountered a serious problem.
If I want to update the server and other packages, I need to reboot the container.
I'm obviously able to do so from the computer Docker is installed on (in my case Docker Desktop running with WSL2 on Windows 10); I can reboot the container easily, but I need to automate it.
The simplest way would be to add a shutdown command to the scripts I've written. I was reading about it but found nothing. Is there any way to reboot this container from the Debian inside it? If not, how can it be achieved, and how complicated is it?
I tried invoking the standard Linux commands to shut down or reboot the system on the Debian inside the container.
I'd appreciate a guide if this is possible and worth the effort.
The only way to trigger a restart of a container from within the container is to first set a restart policy on the container such as --restart=on-failure and then simply stop the container, i.e., let the main process terminate itself. The Docker engine would then restart the container.
This, however, is not the way Docker is intended to be used! Docker containers are not VMs and instead are meant to be ephemeral:
By "ephemeral", we mean that the container can be stopped and destroyed, then rebuilt and replaced with an absolute minimum set up and configuration.
This means you shouldn't be updating the server within a running container; instead, you should update/rebuild the image and start a new container from it!
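For completeness, a minimal sketch of the restart-policy trick described above (the container name, image, and command are illustrative):

# Start the container with a restart policy:
docker run -d --name myserver --restart=on-failure debian:11 ./run-server.sh
# Inside the container, end the main process with a non-zero status, e.g.:
kill 1    # works only if PID 1 handles SIGTERM; otherwise exit the main process itself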

Docker containers lost for no reason

I have been using 3-4 Docker containers for 1-2 months. However, I hibernate my PC instead of shutting down, and for the last weeks I have stopped the Docker engine every day before hibernating. Today I cannot see my containers; there is only a "No containers running" message on the Docker dashboard. I restarted many times and finally updated to the latest version and restarted the PC, but still no containers. I also tried a Docker factory reset, but nothing changed. So, how can I access my containers?
I tried to list containers via docker container ls, but no container is listed. So, have my containers gone for no reason?
Normally you can list stopped containers with
docker container ls -a
Then check the logs on those containers if they will not start. However...
I also tried Docker factory reset
At this point those containers, images, and volumes are most likely gone. I don't believe there's a recovery after that step.
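A short sketch of that first diagnostic step (the container name is illustrative):

docker container ls -a     # lists stopped containers as well
docker logs mycontainer    # inspect why a container exited
docker start mycontainer   # try to bring it back up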

Does restarting the docker service kill all containers?

I'm having trouble with Docker where docker ps won't return and is stuck.
I found that restarting the Docker service might help, with something like
sudo service docker restart (https://forums.docker.com/t/what-to-do-when-all-docker-commands-hang/28103/4)
However, I'm worried that it will kill all the running containers. (I guess the service is what lets Docker containers run?)
In the default configuration, your assumption is correct: if the Docker daemon is stopped, all running containers are shut down. But, as outlined in the link, this behaviour can be changed on Docker >= 1.12 by adding
{
"live-restore": true
}
to /etc/docker/daemon.json. The crux: the daemon must be restarted for this change to take effect. Please take note of the limitations of live restore, e.g. only patch version upgrades are supported, not major version upgrades.
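Once the daemon has been restarted, you can check whether the setting took effect (the template field below is my assumption of the docker info key; plain docker info also lists it):

docker info --format '{{.LiveRestoreEnabled}}'    # should print true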
Another possibility is to define a restart policy when starting a container. To do so, pass one of the following values for the --restart command-line argument of docker run:
no: Do not automatically restart the container (the default).
on-failure: Restart the container if it exits due to an error, which manifests as a non-zero exit code.
always: Always restart the container if it stops. If it is manually stopped, it is restarted only when the Docker daemon restarts or the container itself is manually restarted (see the second bullet listed in the restart policy details).
unless-stopped: Similar to always, except that when the container is stopped (manually or otherwise), it is not restarted even after the Docker daemon restarts.
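For example, a container started like this survives daemon restarts without being resurrected after a manual docker stop (the image is just an illustration):

docker run -d --restart unless-stopped nginx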
For your specific situation, this would mean that you could:
Restart all containers with --restart always (more on that further below)
Re-configure the Docker daemon to allow for live restore
Restart the Docker daemon (which is not yet configured for live restore, but will be after this restart)
This restart would shut down and then restart all your containers once; a sketch follows below. But from then on, you should be free to stop the Docker daemon without your containers terminating.
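A hedged sketch of those three steps (the container selection and file contents are illustrative, and an existing daemon.json should be merged rather than overwritten):

docker update --restart=always $(docker ps -q)                        # 1. apply the policy to all running containers
echo '{ "live-restore": true }' | sudo tee /etc/docker/daemon.json    # 2. enable live restore
sudo service docker restart                                           # 3. one last restart that still stops the containers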
Handling major version upgrades
As mentioned above, live restore cannot handle major version upgrades. For a major version upgrade, one has to tear down all running containers. With a restart policy of always, however, the containers will be restarted after the Docker daemon is restarted following the upgrade.

Cron job to kill all hanging docker containers

I am new to Docker containers, but we have containers being deployed, and due to some internal application network bugs the process running in a container hangs and the container is not terminated. While we debug this issue, I would like a way to find all those containers and set up a cron job to periodically check for and kill the relevant containers.
So how would I determine from docker ps -a which containers should be dropped, and how would I go about it? Any ideas? We are eventually moving to Kubernetes, which will help with these issues.
Docker already has a command to clean up the Docker environment; you can run it manually, or perhaps set up a job to run the following command:
$ docker system prune
Remove all unused containers, networks, images (both dangling and
unreferenced), and optionally, volumes.
Refer to the documentation for more details on advanced usage.
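A hedged sketch of wiring that into cron (the schedule, binary path, and log file are illustrative; note that prune only removes stopped containers, so a truly hung container may first need a docker kill):

# crontab entry: force-prune every hour without a confirmation prompt
0 * * * * /usr/bin/docker system prune -f >> /var/log/docker-prune.log 2>&1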

Marathon says Docker in waiting state, docker engine says container is running

I have a Marathon-Mesos-Docker setup in an inconsistent state: Marathon says the task is in a Waiting state, Mesos keeps trying to restart the task, but the container is actually running in Docker.
Has anyone else seen this and, if so, what did you do to fix it?
--John
Okay, I was able to figure it out. The Mesos agent had crashed and therefore could not send ZooKeeper state changes to Marathon. Once I restarted the Mesos agent, the state of the Marathon task switched to running.
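For reference, a hedged sketch of that fix (the service name depends on the packaging; mesos-slave is common for older packages):

sudo systemctl restart mesos-slave    # or: sudo service mesos-slave restart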
