How to debug Docker containers killed by host (137)?

I'm running 3 applications together using docker-compose:
Standard Nginx image
Java/Spark API server
Node.js app (backend + frontend)
I can bring the composed service up with docker-compose up no problem, and it runs for a period of time with no issues. At some point something kills the two non-nginx containers with code 137, and the service goes down.
My docker-compose.yml has restart: always on each container, but as I understand it this will not restart the containers if they're getting killed in this way. I verified this with docker kill $CONTAINER on each one, and they are not restarted.
When the application exits, all I see at the end of my logs is:
nginx exited with code 0
java_app exited with code 143
node_app exited with code 137
How can I debug why the host is killing these containers, and either stop this from happening or make them restart on failure?

You probably don't have enough memory, or your applications have a memory leak. You can limit the memory of each container. You can also try creating swap space if the host doesn't have enough memory.
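Exit code 137 is 128 + 9, i.e. the process received SIGKILL, which is what the kernel OOM killer sends (143 is 128 + 15, SIGTERM). As a quick check, assuming the container names from the logs above, you can see whether Docker recorded an OOM kill:

docker inspect --format '{{.State.OOMKilled}} (exit code {{.State.ExitCode}})' node_app
dmesg | grep -iE 'killed process|out of memory'

If it was an OOM kill, you can cap each service. This fragment assumes a version 2 compose file, where mem_limit is a service-level key (in version 3 the limit moves under deploy.resources.limits):

services:
  node_app:
    mem_limit: 512m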

Related

Docker: how to use a restart policy?

Documentation says:
unless-stopped Similar to always, except that when the container is stopped (manually or otherwise), it is not restarted even after Docker daemon restarts.
OK, I understand what manually means: docker stop container_name. But what does or otherwise stand for?
The paragraph after the table clarifies (emphasis mine):
configures it to always restart unless it is explicitly stopped or Docker is restarted.
One example is if the host reboots. Containers will be implicitly stopped (the container metadata and filesystems exist but the main container process does not), and at this point restart policies apply as well.
Event                   | no      | on-failure | unless-stopped | always
------------------------|---------|------------|----------------|-----------
docker stop             | Stopped | Stopped    | Stopped        | Stopped
Host reboot             | Stopped | Stopped    | Stopped        | Restarted
Process exits (code=0)  | Stopped | Stopped    | Restarted      | Restarted
Process exits (code≠0)  | Stopped | Restarted  | Restarted      | Restarted
The documentation hints that this also applies if the Docker daemon is restarted, but this is a somewhat unusual case. In my experience, this event often doesn't seem to affect running containers at all.
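For completeness, a restart policy can be set when the container is started or changed on an existing container; the container name here is a placeholder:

docker run -d --restart unless-stopped nginx
docker update --restart unless-stopped my_container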

Is there a way to run a standby service in docker swarm?

I have a docker swarm setup with a typical web app stack (nginx & php). I need redis as a service in docker swarm. The swarm has 2 nodes, and each node should have the web stack and the redis service. But only one redis container should be active at a time (and be able to communicate with each web stack); the other one must be there in standby mode, so that if the first redis fails, this one can take over quickly.
When you work with docker swarm, keeping a backup, standby container around is considered an anti-pattern. The recommended approach to running a reliable container in swarm is to add a HEALTHCHECK command to your Dockerfile. You can set a start interval after which the health check takes effect, so your container has time to warm up.
Now combine the HEALTHCHECK functionality with the fact that docker swarm always maintains the specified number of containers. Make your health check script exit with code 1 if the service becomes unhealthy. As soon as swarm marks the container unhealthy, it kills the container and, to maintain the number of containers, spins up a new one.
The whole replacement process is quick and works seamlessly. Run multiple containers if the warm-up time is long; this will prevent your service from becoming unavailable if one of the containers goes down.
Example of a healthcheck command:
HEALTHCHECK --interval=5m --timeout=3s CMD curl -f http://localhost/ || exit 1
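For a swarm service, the same check can be declared in the compose file instead of the Dockerfile. A minimal sketch for the redis case above, assuming the official redis image (which ships redis-cli) and a compose file version recent enough to support start_period and deploy:

services:
  redis:
    image: redis
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 30s
    deploy:
      replicas: 2   # multiple containers, as suggested above, to cover the warm-up window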

Instances of Dockerized spring-boot services keep failing in swarm mode. How to find the problem?

I have a test environment with 2 machines, 1 manager and 1 worker, in swarm mode. I deploy a stack of 10 services on the worker machine with 1 container for each service. The services start, and after some time some instances die; the manager then puts them back into pending, and this keeps happening. The spring-boot services themselves have no problem (I checked the logs). It seems to me that the worker is not able to handle the 10 instances, but I am not sure.
Are there any docker commands to find out what's going on here? For example, some command that might say the container was killed because it was out of memory?
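A couple of commands that usually surface this kind of failure (the service name and container id are placeholders): docker service ps --no-trunc shows each task's exit state and error message, docker inspect on a dead container reports whether the kernel OOM-killed it, and docker events streams container events (including die and oom) as they happen:

docker service ps --no-trunc mystack_myservice
docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' <container_id>
docker events --filter type=container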

How do you kill a docker container's default command without killing the entire container?

I am running a docker container which contains a node server. I want to attach to the container, kill the running server, and restart it (for development). However, when I kill the node server it kills the entire container (presumably because I am killing the process the container was started with).
Is this possible? This answer helped, but it doesn't explain how to kill the container's default process without killing the container (if possible).
If what I am trying to do isn't possible, what is the best way around this problem? Adding command: bash -c "while true; do echo 'Hit CTRL+C'; sleep 1; done" to each image in my docker-compose, as suggested in the comments of the linked answer, doesn't seem like the ideal solution, since it forces me to attach to my containers after they are up and run the command manually.
This is by design in Docker. Each container is supposed to be a stateless instance of a service: if that service is interrupted, the container is destroyed; if that service is requested/started, it is created. At least that is the model if you're using an orchestration platform like k8s, swarm, mesos, cattle, etc.
There are applications designed to run as PID 1 instead of the service itself, although this goes against the design philosophy of microservices and containers. Here is an example of an init system that can run as PID 1 and let you kill and spawn processes within your container at will: https://github.com/Yelp/dumb-init
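A minimal Dockerfile sketch of that approach; the dumb-init release version here is illustrative (check the project's releases page) and the CMD is a placeholder for your node server. Note that dumb-init mostly handles signal forwarding and zombie reaping, and it still exits when its child process does:

RUN wget -O /usr/local/bin/dumb-init \
      https://github.com/Yelp/dumb-init/releases/download/v1.2.5/dumb-init_1.2.5_x86_64 \
 && chmod +x /usr/local/bin/dumb-init
ENTRYPOINT ["/usr/local/bin/dumb-init", "--"]
CMD ["node", "server.js"]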
Why do you want to reboot the node server? To apply changes from a config file or something? If so, you're looking for a solution in the wrong direction. You should instead define a persistent volume so that when the container respawns the service would reread said config file.
https://docs.docker.com/engine/admin/volumes/volumes/
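For the development workflow in the question, that usually means bind-mounting the source or config from the host, so a respawned container always sees the current files. A compose sketch with placeholder paths and service name:

services:
  node_app:
    build: .
    volumes:
      - ./src:/usr/src/app/src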
If you need to restart the process that's running the container, then simply run:
docker restart $container_name_or_id
Exec'ing into a container shouldn't be needed for normal operations, consider that a debugging tool.
Rather than changing the script that gets run to automatically restart, I'd move that out to the docker engine so it's visible if your container is crashing:
docker run --restart=unless-stopped ...
When a container is run with the above option, docker will restart it for you, unless you intentionally run a docker stop on the container.
As for why killing pid 1 in the container shuts it down, it's the same as killing pid 1 on a linux server. If you kill init/systemd, the box will go down. Inside the namespace of the container, similar rules apply and cannot be changed.

Why does docker stop already running containers during startup?

I have some containers running with the docker server up. If the daemon crashes for some reason (I killed it using kill -9 $Pid_of_daemon to reproduce this behavior) and I then start the docker server again, why does it kill the old running containers? The behavior I want is for it to carry on even if there are already running containers. The only reason I have found is that when the daemon crashes, it loses its stdin/stdout pipes to the containers, so it can no longer attach to the old running containers. But if my container does not need stdin, stdout or stderr, why would the daemon kill it during startup? Please help.
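If the goal is to keep containers running across a daemon crash or restart, Docker's live restore option is designed for that; as a sketch, assuming a daemon version that supports it, add this to /etc/docker/daemon.json and restart the daemon:

{
  "live-restore": true
}

With live restore enabled, the daemon leaves running containers alone while it is down and reattaches to them when it comes back up.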
