Docker container restarting in a loop on GCE - docker

I have correctly deployed a Docker container which runs a Python script that grabs some data from the internet and slaps it in BigQuery. The container works well on my machine and on a GCE instance that I've provisioned.
Now, everything works well for the most part but I am failing to understand why the docker container always restarts after exiting (apparently correctly). Logs, in this case, seems to be fairly useless as there is no error whatsoever. My current hunch is that something is failing silently, forcing the instance to restart.
Is there any way to find out the reboot reason for a given Docker container?
Things tried so far
I've tried to print the exit code of the container in the following way. The result is always 0, no matter those restart cycles.
while true
do
docker inspect my_container --format='{{.State.ExitCode}}'
sleep 1
done

The Google Cloud documentation provides you different ways in which you can review your container related logs including container starts and stops.
In any way, I think there is no problem with your container: by default Compute Engine will restart a container on exit, although you can specify a different restart policy if you need to. Please, see the relevant documentation.

Related

docker container lifecycle confusion

I am new to Docker, and I find the definitions of containers' lifecycle differ a lot.
here is what "Manning.Docker.in.Action.2016.3" shows:
here is what google gives me:
https://medium.com/#nagarwal/lifecycle-of-docker-container-d2da9f85959
here is what the official document says:
status: One of created, restarting, running, removing, paused, exited, or dead
https://docs.docker.com/engine/reference/commandline/ps/
So what's going on here? I guess some new states(and renaming) are introduced in newer version of Docker?
Thanks in advance
Your linked diagram separates docker create from docker start, it includes "die" as a state transition, and it shows how to get to the "restarting" state. That's all valid, though it leads to a more complicated state machine.
(docker create wasn't in the very first versions of Docker but it appeared in Docker 1.3.0 in 2014, which should predate your diagram.)
Practically I might suggest an even simpler state machine:
-------> running -+------> stopped ------>
run | stop rm
\------> exited ------>
process exits rm
That is, never try to restart a container or make changes inside a running container; if you need to tweak anything, delete the existing container and create a new one. This gives you a consistent environment (when the main container process starts you always know what's in its filesystem, up to mounted data). It also matches what happens in cluster environments like Kubernetes, where the cluster manager will routinely create and delete containers for you.
When you get in a situation where internet gives you different answers, you should consider trying it yourself. Especially with technologies like docker, where it is pretty simple to make tests. For example:
I want to run a container (I will use nginx):
docker run -d nginx
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
258cd2edbed8 nginx "nginx -g 'daemon of…" 3 seconds ago Up 2 seconds 80/tcp jolly_golick
Note: docker will keep a container running only if there is a process running in it.
If you would start a debian container (for example), you would see how it immediately stop, as there is nothing running in it. So you could do
docker run -d debian sleep 10
and see that the container is up for 10 seconds.
When a container is running, you can do some things on it. You can't do other things, like removing it. To remove a container, you need to stop it first (or kill it), or force container removal.
Note: You would get all this info from docker itself, if you would be playing around with it, as it would return these info. Like if you would try to remove a running container, you would get this error:
Error response from daemon: You cannot remove a running container 258cd2edbed85bed23ab543312968bd893c1fbd9ba81de40366337f434daedff. Stop the container before attempting removal or force remove
I can't do all possible combinations here. You would get a similar error if you would try removing a paused container. Just play with it, and you will get a clear picture of how it works.

Docker Compose "Ghost Containers"

I am using docker-compose to deploy an application combining a number of different images.
Using Docker version 18.09.2, build 6247962
Docker-compose 1.117
Primarily, I have
ZooKeeper
Kafka
MYSQLDb
I notice a strange problem where i could not start my application with docker-compose up due to port already being assigned. I then checked docker stats and saw that there were three containers named "test_ZooKeeper.1slehgaior"
"test_Kafka.kgjdorgsr"
"test_MYSQLDB.kgjdorgsr"
I have tried kill the containers, removing them and pruning the system. When ever I kill one of these containers, it instantly restarts and I cannot for the life of me determine where they are being created from!
Please help :)
If you look into your docker-compose.yaml I'm pretty sure you'll find a restart:always somewhere. If you want to correctly shut down a running docker container managed by docker-compose, one way is to use docker-compose down from the directory where your yaml sits.
More information on the subject:
https://docs.docker.com/config/containers/start-containers-automatically/
Otherwise, you might try out to stop a single running container instead of killing it, which according to my memory tells docker not to restart it again, while a killed container looks to the service like it just has crashed. Not too sure about the last part though.

docker swarm services stuck in preparing

I have a swarm stack deployed and I removed couple services from the stack and tried to deploy them again. these services are showing with desired state remove and current state preparing .. also their name got changed from the custom service name to a random docker name. swarm also trying to start these services which are also stuck in preparing. I ran docker system prune on all nodes and them removed the stack. all the services in the stack are not existent anymore except for the random ones. now I cant delete them and they still in preparing state. the services are not running anywhere in the swarm but I want to know if there is a way to remove them.
I had the same problem. Later I found that the current state, 'Preparing' indicates that docker is trying to pull images from docker hub. But there is no clear indicator in docker service logs <serviceName> available in the docker-compose-version above '3.1'.
But it sometimes imposes the latency due to n\w bandwidth or other docker internal reasons.
Hope it helps! I will update the answer if I find more relevant information.
P.S. I identified that docker stack deploy -c <your-compose-file> <appGroupName> is not stuck when switching the command to docker-compose up. For me, it took 20+ minutes to download my image for some reasons.
So, it proves that there is no open issues with docker stack deploy,
Adding reference from Christian to club and complete this answer.
Use docker-machine ssh to connect to a particular machine:
docker-machine ssh <nameOfNode/Machine>
Your prompt will change. You are now inside another machine. Inside this other machine do this:
tail -f /var/log/docker.log
You'll see the "daemon" log for that machine. There you'll see if that particular daemon is doing the "pull" or what's is doing as part of the service preparation. In my case, I found something like this:
time="2016-09-05T19:04:07.881790998Z" level=debug msg="pull progress map[progress:[===========================================> ] 112.4 MB/130.2 MB status:Downloading
Which made me realise that it was just downloading some images from my docker account.

docker-compose where can I get a detail log (info) about what happened

I'm trying to pull and set a container with an database (mysql) with the following command:
docker-compose up
But for some reason is failing to start the container is showing me this:
some_db exited with code 0
But that doesn't give me any information of why that happened so I can fix it, so how can I see the details of that error?.
Instead of looking for how to find an error in your composition, you should change your process. With docker it is easy to fall into using all of the lovely tools to make things work seamlessly together, but this sounds like an individual container issue, not an issue with docker compse. You should try building an empty container with the baseimage you want then using docker exec -it <somecontainername> sh to go into it and run the commands you're trying to execute in your entrypoint manually to find the specific breakpoint.
Try at least docker logs <exited_container_name/id> (or more recently: docker container logs <exited_container_name/id>)
That way, you can check if there was any error message.
An exited container means the main process stopped: you need to make sure that main process remains up in order for the container to not exit immediately.
Try also docker inspect <exited_container_name/id> (or, again, docker container inspect <exited_container_name/id>)
If you see for instance "OOMKilled": true, that would mean the container memory was exceeded. You can also look for the "ExitCode" for clues.
See more at "Ten tips for debugging Docker containers" from Mark Betz.

Docker Process Management

I have a deployed application running inside a Docker container, which is, in effect, an websocket client that runs forever. Every deploy I'm rebuilding the container and starting it with docker run using the command set in the Dockerfile.
Now, I've noticed a few times that the process occasionally dies without restarting. When running docker ps, I can see that the container is up, and has been up for 2 weeks, however the process running inside of it has died without the host being any the wiser
Do I need to go so far as to have a process manager inside of the docker container to manage the containerized process?
EDIT:
Dockerfile: https://github.com/DVG/catpen-edi/blob/master/Dockerfile
We've developed a process-manager tailor-made for Docker containers and have been using it with quite a bit of success to solve exactly the problem you describe. The best starting point is to take a look at chaperone-docker on github. The readme on the first page contains a quick link to a minimal base image as well as a fully configured LAMP stack so you can try it out and see what a fully-configured image would look like. It's open-source and fully documented.
This is a very interesting problem here related to PID1 and the fact that docker replaces PID1 with the command specified in CMD or ENTRYPOINT. What's happening is that the child process isn't automagically adopted by anything if the parent dies and it becomes an orphan (since there is no PID1 in the sense of a traditional init system like you're used to). Here is some excellent reading to give you a few ideas. You may get some mileage out of their baseimage-docker image which comes with their simplified init system ("my_app"), which will solve some of this problem for you. However, I would strongly caution you against automatically adopting the Phusion mindset for all of your containers, as there exists some ideological friction in that space. I can't recall any discussion on Docker's Github about a potential minimal init system to solve this problem, but I can't imagine it will be a problem forever. Good luck!
If you have two ruby processes it sounds like the child hasn't exited, the application has just stopped working. It's likely the EventMachine reactor is sitting in the background.
Does the EDI app really need to spawn the additional Ruby process? This only adds another layer between Docker and your app. Run the server directly with CMD [ "ruby", "boot.rb" ]. If you find the problem still occurs with a single process then you will need to find what is causing your app to hang.
When a process is running as PID 1 is docker it will need handle the SIGINT and SIGTERM signals too.
# Trap ^C
Signal.trap("INT") {
shut_down
exit
}
# Trap `Kill `
Signal.trap("TERM") {
shut_down
exit
}
Docker also has restart policies for when the container does actually die.
docker run --restart=always
no
Do not automatically restart the container when it exits. This is
the default.
on-failure[:max-retries]
Restart only if the container
exits with a non-zero exit status. Optionally, limit the number of
restart retries the Docker daemon attempts.
always
Always restart the
container regardless of the exit status. When you specify always, the
Docker daemon will try to restart the container indefinitely. The
container will also always start on daemon startup, regardless of the
current state of the container.
unless-stopped
Always restart the
container regardless of the exit status, but do not start it on daemon
startup if the container has been put to a stopped state before.

Resources