celery beat container gets stuck - docker

I started running a celery beat worker in a dedicated container. This usually works fine, but now I get the following error when trying to remove or re-deploy my containers:
An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
Also, I cannot access the container anymore and the following commands just get stuck:
docker restart beat
docker logs beat
docker exec beat bash
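If the daemon is merely slow rather than fully hung, the Compose error above can sometimes be avoided by raising the client-side timeout it mentions; a minimal sketch, assuming the containers are deployed with docker-compose (the 300-second value is an arbitrary example):
# raise the HTTP timeout for a single invocation (value in seconds)
COMPOSE_HTTP_TIMEOUT=300 docker-compose up -d
# or persist it in the project's .env file so every docker-compose call picks it up
echo "COMPOSE_HTTP_TIMEOUT=300" >> .env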

Related

Increase docker timeout to join a network

In our environment we use Docker Swarm to run jobs that consume a lot of resources. Those jobs get started and end after some time.
We start the containers using docker run --rm --memory=18000m --memory-swap=18000m --oom-score-adj=900 -v /some/mount/point:/opt/workspace/mount-point --network someOverlayNetwork --name someName privateRegistry/imagename:version -c "do what to do"
Sometimes this call fails with
docker: Error response from daemon: Could not attach to network someOverlayNetwork: context deadline exceeded.
As far as I found out, context deadline exceeded is Go's generic way of saying that some timeout happened. I also see that the error occurs almost exactly 20 seconds after the docker run command, which makes sense: there can be quite a lot going on in the cluster in terms of load and network traffic.
I have no problem waiting longer until the next job run starts, but having the job start fail causes problems for us.
So the question: is it possible to increase the timeout docker run allows for attaching to a network?
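As far as I know, docker run does not expose a flag for this particular attach timeout, so a common workaround is to retry the start from a small wrapper script. A minimal sketch under that assumption, reusing the placeholder names from the command above:
# retry the job start a few times when the network attach times out
# note: this retries on any non-zero exit, including genuine job failures
for attempt in 1 2 3; do
  docker rm -f someName >/dev/null 2>&1 || true   # clean up a half-created container before retrying
  docker run --rm --memory=18000m --memory-swap=18000m --oom-score-adj=900 \
    -v /some/mount/point:/opt/workspace/mount-point \
    --network someOverlayNetwork --name someName \
    privateRegistry/imagename:version -c "do what to do" && break
  echo "attempt $attempt failed, waiting before the next try" >&2
  sleep 30
done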

Restart docker automatically

I have a web application in a Docker container that serves requests over HTTP. But when there are too many requests, the app stops working. I am busy with other tasks, so I don't really have time to fix it properly. When it crashes, the container is still running, but the app responds with a 500 error. Is there any way to track this and restart the container automatically, because I don't have the option to check it all the time?
I suggest you:
Create the container with the restart policy set to always, unless-stopped, or on-failure.
Instrument a Docker health check, e.g. HEALTHCHECK --interval=5m --timeout=3s CMD curl -f http://localhost/ || exit 1, as sketched below.
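One caveat: a restart policy only kicks in when the container's main process exits, and a plain HEALTHCHECK only marks the container as unhealthy, so acting on health status outside of swarm mode needs a helper such as willfarrell/autoheal. A minimal sketch, assuming a hypothetical image named webapp:latest whose app listens on port 80:
# run the app with a restart policy so it comes back whenever the process exits
docker run -d --restart unless-stopped --name webapp webapp:latest
# optional helper: autoheal watches health status and restarts unhealthy containers
docker run -d --restart always \
  -e AUTOHEAL_CONTAINER_LABEL=all \
  -v /var/run/docker.sock:/var/run/docker.sock \
  willfarrell/autoheal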

Unable to kill a Docker container which restarts every 10 seconds

I have problems with a rogue Docker container... I tried to follow a tutorial to build an OpenVPN server on my new Raspberry Pi (the first one in my life)... and I think I did something really wrong... I tried to run it with the restart policy "always".
The container throws an error each time it tries to run:
standard_init_linux.go:211: exec user process caused "exec format error"
It tries to run every 10 seconds, for roughly 3 seconds each time, always with a different Docker container ID and a different PID...
I've tried some solutions I found on the Internet, trying to stop this madness...
It seems you are using a systemd unit.
You should try this command:
systemctl stop docker-openvpn@NAME.service
Replace NAME with whatever name you have given to your service.
It is stated in their documentation:
In the event the service dies (crashes, or is killed) systemd will attempt to restart the service every 10 seconds until the service is stopped with systemctl stop docker-openvpn@NAME.service
In case you forgot your service name, you can list the systemd services and find it with:
systemctl --type=service
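Putting the answer together, a short sketch, assuming the unit follows the docker-openvpn@<name> template used by that tutorial:
# find the exact unit name in case you forgot it
systemctl --type=service | grep -i openvpn
# stop the restart loop and keep the unit from starting again on boot
sudo systemctl stop docker-openvpn@NAME.service
sudo systemctl disable docker-openvpn@NAME.service
# confirm nothing is still being respawned
docker ps --filter name=openvpn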

The nginx_phpfpm container goes unhealthy when running as a task on ECS

Why is this really happening on AWS ECS? I tested the Docker image locally before pushing it to ECR. It runs smoothly and stays healthy.
Now, when I push the same image to ECR and run it as a task, setting up the task definition in ECS, it keeps stopping and starting again after a short period of time.
The health status shows unhealthy. I am not using the ALB for the health check but the Docker health check built into ECS. I thought it might be a problem with the command, so I tried all the options hinted at by people online:
CMD-SHELL, curl -f http://localhost/ || exit 1
But nothing seems to work here.
What might be the exact cause that a Docker image which runs so well locally does not run on ECS?
I even thought that maybe it was not running in the background, so I added this command to the Entrypoint setting in the ECS task definition:
systemctl start nginx
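Without the image's Dockerfile this is only a guess, but two things are worth checking: the curl binary must actually exist inside the image for that health check command to pass, and there is normally no systemd inside a container, so systemctl start nginx will not work there; nginx is usually kept in the foreground as the main process instead. A small sketch for verifying both locally (<container> stands for the container name or ID):
# confirm curl exists and the health check command succeeds inside the running container
docker exec <container> sh -c "curl -f http://localhost/ || echo 'health check would fail'"
# run nginx in the foreground instead of relying on systemctl;
# this is what the official nginx image does in its CMD
nginx -g 'daemon off;'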

Docker swarm stop grace period doesn't work as expected

I am running Docker in swarm mode with several nodes in the cluster.
According to the documentation here: https://docs.docker.com/engine/reference/commandline/service_update/ and here: https://docs.docker.com/engine/reference/commandline/service_create/, the --stop-grace-period option sets the time to wait before force killing a container.
Expected behavior -
My expectation was that, during a rolling update, Docker would wait this period of time before trying to stop the running container.
Actual behavior -
Docker sends the termination signal a few seconds after the new container, with the new version of the image, starts.
Steps to reproduce the behavior
docker service create --replicas 1 --stop-grace-period 60s --update-delay 60s --update-monitor 5s --update-order start-first --name nginx nginx:1.15.8
Wait for the service to start up the container (approx. 2 minutes)
docker service update --image nginx:1.15.9 nginx
docker ps -a
As you can see, the new container started and after a second the old one was killed by Docker.
Any idea why?
I also opened an issue on Github, here: https://github.com/docker/for-linux/issues/615
The --stop-grace-period value is the amount of time that Docker will wait after sending a SIGTERM before it gives up waiting for the container to exit gracefully. Once the grace period is over, it kills the container with a SIGKILL.
Based on your description of your setup, the sequence of events seems to happen as designed. Your container exits cleanly and quickly when it gets its SIGTERM, so Docker never needs to send a SIGKILL.
I see you also specified --update-delay 60s, but that won't take effect since you only have one replica. The update delay tells Docker to wait 60 seconds after cycling the first task, so it is only helpful with 2 or more replicas.
It seems like you want your single-replica service to run a new task and an old task concurrently for 60 seconds, but swarm mode is happy to get rid of old containers with SIGTERM as soon as the new container is up.
I think you can close the issue on GitHub.
--stop-grace-period is the period between the stop signal (SIGTERM) and the kill (SIGKILL).
Of course, you can change SIGTERM to another signal by using the --stop-signal option; how the application inside the container behaves when it receives the stop signal is your responsibility.
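As an illustration of that last point, a minimal sketch (an assumption about the desired behaviour, not the original poster's exact setup): nginx treats SIGQUIT as its graceful-shutdown signal, so the stop signal and the grace period can be combined like this:
# send SIGQUIT (nginx's graceful shutdown) on stop and wait up to 60s before SIGKILL
docker service create --replicas 1 \
  --stop-signal SIGQUIT \
  --stop-grace-period 60s \
  --name nginx nginx:1.15.8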

Resources