I am trying to stop a Docker container automatically after 1 hour. That is, if there is no process going on or the container has been idle for 1 hour, stop that container. Is it possible to do this programmatically within the Dockerfile? Any thoughts would be helpful.
Thanks in advance.
The closest solution a Dockerfile supports for your problem would be the HEALTHCHECK directive, e.g. HEALTHCHECK [OPTIONS] CMD command. Here you can specify the interval (e.g. 1 hour) and the timeout; the available options are listed below, followed by a short example.
--interval=DURATION (default: 30s)
--timeout=DURATION (default: 30s)
--start-period=DURATION (default: 0s)
--retries=N (default: 3)
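For example, a minimal sketch in a Dockerfile, assuming a hypothetical check-idle.sh inside the image that exits non-zero when the container is idle:

HEALTHCHECK --interval=1h --timeout=30s --retries=1 \
  CMD /usr/local/bin/check-idle.sh || exit 1

Keep in mind that a failing healthcheck only marks the container as unhealthy; on its own it does not stop the container.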
Other than that, you would have to create a custom shell script that is triggered by a cron job every hour. In this script you would stop the foreground process, thereby stopping the running container.
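A rough sketch of that approach, assuming a hypothetical is_idle.sh check inside the container and a container named myapp:

# host crontab entry: run the check at the top of every hour
0 * * * * /usr/local/bin/stop-if-idle.sh

# /usr/local/bin/stop-if-idle.sh
#!/bin/sh
# Ask the container whether it is idle; if so, stop it (docker stop sends SIGTERM to PID 1).
if docker exec myapp /usr/local/bin/is_idle.sh; then
    docker stop myapp
fi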
As far as I know such a scenario is not part of the docker workflow.
The container is alive as long as its main process is alive. When that process (PID 1) exits (with error or success), the container also stops.
So the only way I see is to either build this logic inside your program (the main process in the container) or wrap the program in a shell script that kills the process based on some rule (like no log entries for a certain amount of time).
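For example, a minimal wrapper sketch, assuming the program logs to /var/log/app.log and an hour of log silence counts as idle (the program name and log path are placeholders):

#!/bin/sh
# Start the real workload in the background and remember its PID.
/usr/local/bin/my-program &
PID=$!

# Once a minute, compare the log file's modification time against the current time.
while kill -0 "$PID" 2>/dev/null; do
    sleep 60
    last=$(stat -c %Y /var/log/app.log 2>/dev/null || echo 0)
    now=$(date +%s)
    if [ $((now - last)) -ge 3600 ]; then
        # No log entries for an hour: kill the main process, which stops the container.
        kill "$PID"
        break
    fi
done
wait "$PID"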
I have a docker-compose.yml file that defines a number of services. One is a redis instance, and another is a queue-worker.
The queue-worker fetches jobs from redis and performs the necessary work.
Currently, I have the queue-worker's stop_grace_period set to 5m, within my docker-compose.yml. The idea here is that when I run docker-compose down, the queue-worker will have 5 minutes to deal with any remaining jobs in the queue, before shutting down.
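For reference, the relevant part of the compose file looks roughly like this (image names simplified):

services:
  redis:
    image: redis
  queue-worker:
    image: my-queue-worker
    depends_on:
      - redis
    stop_grace_period: 5m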
I would like to improve the situation, if possible, by performing a check when docker-compose down is called. If the result of the check is true, e.g. curl http://project/total-jobs-in-queue == 0, then go ahead and stop the queue-worker container immediately. If the result is false, i.e. there are still jobs in the work queue, delay container shutdown until the result of the check is true.
I could write a bash script to perform this check and stop the containers individually from within the script, something like the sketch below; however, if it were possible to configure this in such a way that the standard docker-compose up/down commands could continue to be used, that would be much preferable.
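A rough version of that fallback script, using the jobs endpoint from above:

#!/bin/bash
# Poll the jobs endpoint until the queue is empty, then bring the stack down.
until [ "$(curl -s http://project/total-jobs-in-queue)" = "0" ]; do
    sleep 10
done
docker-compose down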
Is this possible?
The Docker docs say what a HEALTHCHECK instruction is and how to check the health of a container. But I am not able to figure out what happens when a healthcheck fails. Will the container be restarted, or stopped, or either of the two according to a user instruction?
Further the example quoted is:
HEALTHCHECK --interval=5m --timeout=3s CMD curl -f http://localhost/ || exit 1
What is the exit 1 about?
When running HEALTHCHECK you can specify:
--interval=DURATION (default 30s)
--timeout=DURATION (default 30s)
--retries=N (default 3)
And the container can have three states:
starting – Initial status when the container is still starting.
healthy – When the command succeeds.
unhealthy – When a single run of the check either fails (non-zero exit) or takes longer than the specified timeout. When this happens the check is retried, and the container is declared "unhealthy" after the configured number of consecutive failures.
When the check fails the specified number of times in a row, the failed container will:
stay in the "unhealthy" state if it is running in standalone mode
be restarted (the task is replaced) if it is running as a Swarm service
Otherwise, when the check command exits with code 0, the container is considered "healthy".
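You can check which of these states a container is currently in with docker inspect, for example:

docker inspect --format '{{.State.Health.Status}}' <container-name>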
I hope it makes things more clear.
There is a good video I have seen where this is explained pretty well:
https://youtu.be/dLBGoaMz7dQ
Basically, let's say you are running a web server in production and it has 3 replicas.
During a deployment you want to make sure that you don't lose any requests.
The HEALTHCHECK basically helps with identifying when the server is actually running.
It takes a second or two for your server to start listening, and in that time window you can lose requests.
By using HEALTHCHECK you can make sure that the server is running; that is why people sometimes use curl here (not a best practice).
exit 1 is the exit code returned by the healthcheck command:
0 - success (healthy)
1 - unhealthy
2 - reserved (do not use)
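So a custom healthcheck command just has to end with the right exit code. A minimal sketch (the URL and port are placeholders):

#!/bin/sh
# Report healthy only if the app answers on its HTTP endpoint.
if curl -fs http://localhost:8080/health > /dev/null; then
    exit 0   # healthy
else
    exit 1   # unhealthy; 2 is reserved, so don't use it
fi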
I have a Docker container that runs a custom PHP file or, say, a Unix shell script.
If the script executes fine, the Docker container should continue to run; however, if the script fails due to an error or a custom check, I wish to terminate (stop) it, i.e. change the status of that Docker container to "Exited".
Sample case 1: The Unix shell script periodically checks for a particular file or data on a file system / URL. If that data / file is not found, I would like the Docker container to shut down (Exited status); otherwise it should continue to run.
Sample case 2: The script runs and checks the stuck thread count for a different process. If the stuck thread count is more than 5, I would like the Docker container to shut down (Exited status); otherwise it should continue to run.
I know how to shut down a container from outside; however, in this case I wish to trigger the container shutdown from within the container when the custom script's failure condition is met.
Can you please suggest an approach?
Every Docker container has some main process, whatever was launched as the ENTRYPOINT or CMD. That process has pid 1, with the rights and responsibilities that entails. The lifetime of the container is exactly the length of that main process: the only way to cause the container to exit is to cause pid 1 to exit. Since pid 1 is special, it may not work to kill 1.
If I was going to implement this, I'd write a program (probably in C) that could both execute the health checks and run the main process. If the process exited normally, the supervisor would wait(2) for it and then exit itself, causing the container to exit. If a health check failed, the supervisor would kill(2) its child, wait(2) for it, and then exit itself.
I'm not immediately aware of a prebuilt implementation of this concept. It is not dissimilar from what supervisord does, except that supervisord expects to run as an init process that never exits.
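I don't have a ready-made C example, but a rough shell approximation of the same supervisor idea would be (my-server and health-check.sh are placeholders):

#!/bin/sh
# This wrapper runs as pid 1; the real server is its child.
/usr/local/bin/my-server &
CHILD=$!

# Run the health check every 30 seconds; on failure, kill the child.
while kill -0 "$CHILD" 2>/dev/null; do
    sleep 30
    if ! /usr/local/bin/health-check.sh; then
        kill "$CHILD"
        break
    fi
done

# Reap the child and exit with its status, which ends pid 1 and therefore the container.
wait "$CHILD"
exit $?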
Another possibility is to implement the health checks within your application itself. Then you're just running the one process, and if a health check fails, it can kill itself (exit(3), for example). Higher-level orchestrators like Kubernetes also have a health check concept that can be tied to a network request or a command that runs inside a container (for Kubernetes, see Container probes).
I have a Docker image that needs to be run in an environment where I have no admin privileges, using Slurm 17.11.8 in RHEL. I am using udocker to run the container.
In this container, there are two applications that need to run:
[1] ROS simulation (there is a rosnode that is a TCP client talking to [2])
[2] An executable (TCP server)
So [1] and [2] need to run together, and they share some common files as well. Usually, I run them in separate terminals. But I have no idea how to do this with Slurm.
Possible Solution:
(A) Use two containers of the same image, but their files will be stored locally. I could use volumes instead, but this requires me to change my code significantly and maybe break compatibility when I am not running it as containers (e.g. in Eclipse).
(B) Use a bash script to launch two terminals and run [1] and [2]. Then srun this script.
I am looking at (B) but have no idea how to approach it. I looked into other approaches but they address sequential executions of multiple processes. I need these to be concurrent.
If it helps, I am using xfce-terminal though I can switch to other terminals such as Gnome, Konsole.
This is a shot in the dark since I don't work with udocker.
In your Slurm submit script, to be submitted with sbatch, you could allocate enough resources for both jobs to run on the same node (so you just need to reference localhost for your client/server). Start your first process in the background with something like:
udocker container_name container_args &
The & should start the first container in the background.
You would then start the second container:
udocker 2nd_container_name more_args
This runs without & to keep the process in the foreground. Ideally, when the second container completes, the script will complete and Slurm cleanup will kill the first container. If both containers come to an end cleanly, you can put a wait at the end of the script.
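Putting that together, the submit script might look roughly like this (the udocker invocations follow the placeholder form above, and resource options depend on your site):

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=2

# Start the first container (e.g. the ROS simulation) in the background.
udocker container_name container_args &

# Run the second container (the TCP server) in the foreground.
udocker 2nd_container_name more_args

# If both containers end cleanly, wait for the background one to finish.
wait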
Caveats:
Depending on how Slurm is configured, processes may not be properly cleaned up at the end. You may need to capture the PID of the first udocker as a variable and kill it before you exit.
The first container may still be processing when the second completes. You may need to add a sleep command at the end of your submission script to give it time to finish.
Any number of other gotchas may exist that you will need to find and hopefully work around.
Problem domain
Imagine that a stateful container is being managed by Swarm, e.g. a database, and another container relies on it, e.g. a service executing a long-running job (minutes, sometimes hours) that cannot tolerate the database (or even itself) going down while it is executing.
To give an example: a database importing a multi-GB dump.
There's also a CI/CD system in place which takes care of building new versions of the containers and deploying them to the Swarm, or pushing the image to Docker Hub which then calls a defined webhook which fires off the deployment event.
Question
Is there any way I can build my containers so that Swarm can know whether it's OK to update them or not? Similarly to how HEALTHCHECK reports whether a container needs to be restarted, something that would let Swarm know "it's safe to restart this container now".
Or is it the CI/CD system's responsibility to check whether the stateful containers are safe to restart, and only then issue the update command to swarm?
Thanks in advance!
Docker will not check with a container if it is ready to be stopped, once you give docker the command to stop a container it will perform that action. However it performs the stop in two steps. The first step is a SIGTERM that your container can trap and gracefully handle. By default, after 10 seconds, a SIGKILL is sent that the Linux kernel immediately applies and cannot be trapped by the container. For your goals, you'll want to make sure your app knows when it's safe to exit after receiving the first signal, and you'll probably want to extend the time to much longer than 10 seconds between signals.
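A minimal sketch of that kind of signal handling in the container's entrypoint script (the actual drain logic is a placeholder):

#!/bin/sh
# Handle SIGTERM: finish outstanding work, then exit cleanly.
term_handler() {
    echo "SIGTERM received, draining remaining work..."
    # ... wait for in-flight work to finish here ...
    exit 0
}
trap term_handler TERM

# Placeholder main loop standing in for the real workload.
while true; do
    # ... do one unit of work ...
    sleep 1
done

Combine this with a longer stop timeout, e.g. docker stop -t 300 <container> or stop_grace_period in a compose file, so the SIGKILL doesn't arrive after the default 10 seconds.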
The healthcheck won't tell docker that your container is at a safe point to stop. It does tell swarm when your container has finished starting, or when it's misbehaving and needs to be stopped and replaced. The healthcheck defines a command to run inside your container, and the exit code is checked for whether it's 0 (healthy) or 1 (unhealthy). No other exit codes are currently valid.
If you need more than the simple signal handling inside the container, then yes, you're likely moving up the stack to a ci/cd tool to manage the deployment.