The Docker docs explain what a HEALTHCHECK instruction is and how to check the health of a container, but I am not able to figure out what happens when the healthcheck fails. Will the container be restarted, stopped, or either of the two as per user instruction?
Further the example quoted is:
HEALTHCHECK --interval=5m --timeout=3s CMD curl -f http://localhost/ || exit 1
What is the exit 1 about?
When defining a HEALTHCHECK you can specify the following options (a combined example follows the list):
--interval=DURATION (default 30s)
--timeout=DURATION (default 30s)
--retries=N (default 3)
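For example, a HEALTHCHECK combining all three options might look like this (the port and path are only placeholders for illustration):
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1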
And the container can have three states:
starting – Initial status when the container is still starting.
healthy – When the command succeeds.
unhealthy – When a single run of the check takes longer than the specified timeout or exits with a non-zero code. When this happens, Docker retries the check and declares the container "unhealthy" if it still fails after the configured number of retries.
When the check fails the specified number of times in a row, the failed container will:
stay in the "unhealthy" state if it is running in standalone mode (you can inspect this state yourself, as shown below)
be restarted (the task is stopped and replaced) if it is in Swarm mode
Otherwise, when the check command exits with code 0, the container is considered "healthy".
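In standalone mode you can inspect the reported health state yourself, for example (my-container is just a placeholder name):
docker inspect --format '{{.State.Health.Status}}' my-container
This prints starting, healthy or unhealthy for a container that has a healthcheck defined.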
I hope it makes things more clear.
There is a good video I have seen where the topic is explained pretty well:
https://youtu.be/dLBGoaMz7dQ
Basically, let's say you are running a web server in production and it has 3 replicas.
During a deployment you want to make sure that you don't lose any requests.
The HEALTHCHECK helps with identifying when the server is actually running.
It takes a second or two for your server to start listening, and in that time window you can lose requests.
By using healthchecks you can make sure that the server is really serving traffic; that is why people sometimes use curl (not a best practice).
exit 1 is the exit code returned by the healthcheck command; a minimal check script is sketched after this list. The possible codes are:
0 - success
1 - unhealthy
2 - reserved
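As a minimal sketch, a dedicated check script only has to translate "is the server up?" into one of those exit codes (the path and port below are assumptions):
#!/bin/sh
# healthcheck.sh (hypothetical): exit 0 if the server answers, 1 otherwise.
if curl -fsS http://localhost:8080/health > /dev/null; then
  exit 0   # healthy
else
  exit 1   # unhealthy
fi
In the Dockerfile it would then be referenced as HEALTHCHECK CMD /healthcheck.sh.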
Related
I have looked for a bit on Stack Overflow for a way to have a container start up and wait for an external connection but have not seen anything.
Here is what my process looks like currently:
Non-Docker external process reaches out at X interval and tells system to run a command.
Command runs.
System should remain idle until the next interval.
Now I have seen a few options with --wait or sleep but I would think that would not allow the container to receive the connection.
I also looked at the wait-for-container script that is often recommended, but in this case I need the container to wait for a script to call it at undefined intervals.
I have tried having the container just run the help command for my process, but it then exits the container after a bit of time and makes it a mess to find anything.
Additionally I have tried to have the container start with no command just to run the base OS and wait for the call but that did not work either.
I was looking at this wrong.
Ended up just running it like any other web server and database server.
I am trying to stop the docker container automatically after 1 hour. I mean, if there is no process going on or the container is idle for 1 hour, then stop that container. Is it possible to do this programmatically within the Dockerfile? Any thoughts would be helpful.
Thanks in advance.
The closest solution to your problem that is supported by the Dockerfile would be the HEALTHCHECK directive, e.g. HEALTHCHECK [OPTIONS] CMD command. Here you can specify the interval (e.g. 1 hour) and timeout; a one-line example follows the option list.
--interval=DURATION (default: 30s)
--timeout=DURATION (default: 30s)
--start-period=DURATION (default: 0s)
--retries=N (default: 3)
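For the one-hour case that could look like the line below; check-idle.sh is a hypothetical script you would write yourself, and keep in mind that in standalone mode a failing check only marks the container "unhealthy" rather than stopping it:
HEALTHCHECK --interval=1h --timeout=30s --retries=1 CMD /usr/local/bin/check-idle.sh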
Other than that, you would have to create a custom shell script that is triggered by a cron job every hour. In this script you would stop the foreground process, and by that stop the running container.
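A rough sketch of that cron approach, run on the Docker host (the container name worker and the "no log output for an hour" idleness rule are assumptions):
# /etc/cron.d/stop-idle-worker: run the check at the top of every hour
0 * * * * root /usr/local/bin/stop-if-idle.sh

# /usr/local/bin/stop-if-idle.sh
#!/bin/sh
CONTAINER=worker
# If the container produced no log output in the last hour, treat it as idle and stop it.
if [ -z "$(docker logs --since 1h "$CONTAINER" 2>&1)" ]; then
  docker stop "$CONTAINER"
fi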
As far as I know such a scenario is not part of the docker workflow.
The container is alive as long as its main process is alive. When that process (PID 1) exits (with error or success), the container also stops.
So the only way I see is to either build this logic inside your program (the main process in the container) or wrap the program in a shell script that kills the process based on some rule (like no log entries for a certain amount of time).
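A sketch of such a wrapper, using "no writes to the log file for 60 minutes" as the rule (the program and log paths are assumptions):
#!/bin/sh
# Start the real program in the background; this wrapper is the container's main process.
/usr/local/bin/my-program > /var/log/my-program.log 2>&1 &
PID=$!

# While the program is still running, check once a minute whether it has gone idle.
while kill -0 "$PID" 2>/dev/null; do
  sleep 60
  # find prints the file only if it has not been modified for more than 60 minutes.
  if [ -n "$(find /var/log/my-program.log -mmin +60 2>/dev/null)" ]; then
    kill "$PID"
    wait "$PID"
    exit 0
  fi
done
wait "$PID"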
I have a docker container that runs a custom php file or say a unix shell script.
If the script executes fine, the docker container should continue to RUN; however, if the script fails due to an error or a custom check, then I wish to terminate (stop) it, i.e. change the status of that docker container to "Exited".
Sample case 1: The unix shell script periodically checks for a particular file or data on a file system / URL. If that data / file is not found, I would like the docker container to shut down (Exited status); otherwise it should continue to run.
Sample case 2: The script runs and checks the stuck thread count for a different process. If the stuck thread count is more than 5, I would like the docker container to shut down (Exited status); otherwise it should continue to run.
I know how to shutdown a container from outside however, in this case I wish to trigger container shutdown from within the container depending upon the custom script's failure condition being met.
Can you please suggest?
Every Docker container has some main process, whatever was launched as the ENTRYPOINT or CMD. That process has pid 1, with the rights and responsibilities that entails. The lifetime of the container is exactly the length of that main process: the only way to cause the container to exit is to cause pid 1 to exit. Since pid 1 is special, it may not work to kill 1.
If I was going to implement this, I'd write a program (probably in C) that could both execute the health checks and run the main process. If the process exited normally, the supervisor would wait(2) for it and then exit itself, causing the container to exit. If a health check failed, the supervisor would kill(2) its child, wait(2) for it, and then exit itself.
I'm not immediately aware of a prebuilt implementation of this concept. It is not dissimilar from what supervisord does, except that supervisor expects to run as an init process that never exits.
Another possibility is to implement the health checks within your application itself. Then you're just running the one process, and if a health check fails, it can kill itself (exit(3), for example). Higher-level orchestrators like Kubernetes also have a health check concept that can be tied to a network request or a command that runs inside a container (for Kubernetes, see Container probes).
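A rough shell approximation of the supervisor idea above (my-app and health-check.sh are hypothetical names; a real implementation would also need to handle signals properly):
#!/bin/sh
# Run the main process as a child of this supervisor script.
/usr/local/bin/my-app &
APP_PID=$!

# While the child is alive, run the health check periodically.
while kill -0 "$APP_PID" 2>/dev/null; do
  sleep 30
  if ! /usr/local/bin/health-check.sh; then
    # Health check failed: kill the child, reap it, then exit so the container exits.
    kill "$APP_PID"
    wait "$APP_PID"
    exit 1
  fi
done

# The child exited on its own; propagate its exit status.
wait "$APP_PID"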
Problem domain
Imagine that a stateful container is being managed by Swarm, e.g. a database, and another container is relying on it, e.g. a service that is executing a long-running job (minutes, sometimes hours) that cannot tolerate the database (or even itself) going down while it's executing.
To give an example, a database importing a multi-GB dump.
There's also a CI/CD system in place which takes care of building new versions of the containers and deploying them to the Swarm, or pushing the image to Docker Hub which then calls a defined webhook which fires off the deployment event.
Question
Is there any way I can build my containers so that Swarm can know whether it's ok to update them or not? Similarly to how HEALTHCHECK reports whether a container needs to be restarted, something that would let Swarm know "It's safe to restart this container now".
Or is it the CI/CD system's responsibility to check whether the stateful containers are safe to restart, and only then issue the update command to swarm?
Thanks in advance!
Docker will not check with a container if it is ready to be stopped, once you give docker the command to stop a container it will perform that action. However it performs the stop in two steps. The first step is a SIGTERM that your container can trap and gracefully handle. By default, after 10 seconds, a SIGKILL is sent that the Linux kernel immediately applies and cannot be trapped by the container. For your goals, you'll want to make sure your app knows when it's safe to exit after receiving the first signal, and you'll probably want to extend the time to much longer than 10 seconds between signals.
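A minimal entrypoint sketch of that signal handling (my-app is a placeholder); you can combine it with a longer grace period, e.g. docker stop -t 600 <container>, or --stop-grace-period on a service:
#!/bin/sh
# Forward SIGTERM to the app and wait for it to finish its current work before exiting.
term_handler() {
  kill -TERM "$APP_PID"
  wait "$APP_PID"
  exit 0
}
trap term_handler TERM

/usr/local/bin/my-app &
APP_PID=$!
wait "$APP_PID"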
The healthcheck won't tell docker that your container is at a safe point to stop. It does tell swarm when your container has finished starting, or when it's misbehaving and needs to be stopped and replaced. The healthcheck defines a command to run inside your container, and the exit code is checked for whether it's 0 (healthy) or 1 (unhealthy). No other exit codes are currently valid.
If you need more than simple signal handling inside the container, then yes, you're likely moving up the stack to a CI/CD tool to manage the deployment.
I've set up a docker swarm mode cluster, with two managers and one worker. This is on Centos 7. They're on machines dkr1, dkr2, dkr3. dkr3 is the worker.
I was upgrading to v1.13 the other day and wanted zero downtime, but it didn't work exactly as expected. I'm trying to work out the correct way to do it, since this is one of the main goals of having a cluster.
The swarm is in 'global' mode. That is, one replica per machine. My method for upgrading was to drain the node, stop the daemon, yum upgrade, start daemon. (Note that this wiped out my daemon config settings for ExecStart=...! Be careful if you upgrade.)
Our client/ESB hits dkr2, which does its load-balancing magic over the swarm. dkr2 is the leader; dkr1 is 'reachable'.
I brought down dkr3. No issues. Upgraded docker. Brought it back up. No downtime from bringing down the worker.
Brought down dkr1. No issue at first. Still working when I brought it down. Upgraded docker. Brought it back up.
But during startup, it 404'ed. Once up, it was OK.
Brought down dkr2. I didn't actually record what happened then, sorry.
Anyway, while my app was starting up on dkr1, it 404'ed, since the server hadn't started yet.
Any idea what I might be doing wrong? I would suppose I need a health check of some sort, because the container is obviously ok, but the server isn't responding yet. So that's when I get downtime.
You are correct: you need to specify a healthcheck to run against your app inside the container in order to make sure it is ready. Your container will not receive traffic until this healthcheck has passed.
A simple curl to an endpoint should suffice. Use the HEALTHCHECK instruction in your Dockerfile to specify a healthcheck to perform.
An example of the healthcheck line in a Dockerfile to check if an endpoint returned 200 OK would be:
HEALTHCHECK CMD curl -f 'http://localhost:8443/somepath' || exit 1
If you can't modify your Dockerfile, then you can also specify your healthcheck manually at deployment time using the compose file healthcheck format.
If that's not possible either and you need to update a running service, you can do a service update and use a combination of the health flags to specify your healthcheck.
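For example, a sketch of adding a healthcheck to a running service with docker service update (the service name, port and path are placeholders):
docker service update \
  --health-cmd "curl -f http://localhost:8443/somepath || exit 1" \
  --health-interval 10s \
  --health-timeout 3s \
  --health-retries 3 \
  my-service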