I have a docker swarm setup with a typical web app stack (nginx & php). I need redis as a service in docker swarm. The swarm has 2 nodes and each node should have the web stack and redis service. But only one redis container should be active at a time (and be able to communicate on with each web stack), the other one must be there but in standby mode so that if the first redis fails, this one could switch quickly.
When you work with docker swarm, having a backup, standby container would be considered anti-pattern. A more recommended approach to deploy a reliable container using swarm would be to have a HEALTHCHECK command as part of your Dockerfile. You can set a specific interval after which the healthcheck commands comes into effect for your container to be able to warm up.
Now, club the HEALTHCHECK functionality with the fact that docker-swarm always maintains the specified number of contianers. Make your healthcheck script throw the exit code 1 if it becomes unhealthy. As soon as the swarm detects exit code 1, it kills the container and to maintain the number of containers, it spins up a new one.
The entire process takes only milliseconds and works seamlessly. Have multiple containers in case the warm-up time is long. This will prevent your service from becoming unavailable if one of the containers goes down.
Example of a healthcheck command:
HEALTHCHECK --interval=5m --timeout=3s CMD curl -f http://localhost/ || exit 1
Related
I need some clarification in regards to using HEALTHCHECK on a docker service.
Context:
We are experimenting with a multi-node mariadb cluster and by utilizing HEALTHCHECK we would like the bootstrapping containers to remain unhealthy until bootstrapping is complete. We want this so that front-end users don’t access that particular container in the service until it is fully online and sync’d with the cluster. The issue is that bootstrapping relies on the network between containers in order to do a state transfer and it won’t work when a container isn’t accessible on the network.
Question:
When a container’s status is either starting or unhealthy does HEALTHCHECK completely kill network access to and from the container?
As an example, when a container is healthy I can run the command getent hosts tasks.<service_name>
inside the container which returns the IP address of other containers in a service. However, when the same container is unhealthy that command does not return anything… Hence my suspicion that HEALTHCHECK kills the network at the container level (as opposed to at the service/load balancer level) if the container isn’t healthy.
Thanks in advance
I ran some more tests and found my own answer. Basically docker does not kill container networking when it is either in the started or unhealthy phase. The reason getent hosts tasks.<service_name> command does not work during those phases is that that command goes back to get the container IP address through the service which does not have the unhealthy container(s) assigned to it.
So I have two services run in docker containers, configured in a docker-compose.yaml. There is a dependency between them. Unlike regular dependencies where one container must be up before the other container starts, I have a service which must finish before starting the other service: service 1 updates a DB and service 2 reads from the DB.
Is there some way to perform this type of dependency check?
Both containers will start at the same time, but you could have the code in the second container wait for the first container to signal that it is finishing before it starts. See here:
https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/
Sidecar containers in Kubernetes Jobs?
"Sidecar" containers in Kubernetes pods
After running docker stack deploy to deploy some services to swarm is there a way to programmatically test if all containers started correctly?
The purpose would be to verify in a staging CI/CD pipeline that the containers are actually running and didn't fail on startup. Restart is disabled via restart_policy.
I was looking at docker stack services, is the replicas column useful for this purpose?
$ docker stack services --format "{{.ID}} {{.Replicas}}" my-stack-name
lxoksqmag0qb 0/1
ovqqnya8ato4 0/1
Yes, there are ways to do it, but it's manual and you'd have to be pretty comfortable with docker cli. Docker does not provide an easy built-in way to verify that docker stack deploy succeeded. There is an open issue about it.
Fortunately for us, community has created a few tools that implement docker's shortcomings in this regard. Some of the most notable ones:
https://github.com/issuu/sure-deploy
https://github.com/sudo-bmitch/docker-stack-wait
https://github.com/ubirak/docker-php
Issuu, authors of sure-deploy, have a very good article describing this issue.
Typically in CI/CD I see everyone using docker or docker-compose. A container runs the same in docker as it does docker swarm with respects to "does this container work by itself as intended".
That being said, if you still wanted to do integration testing in a multi-tier solution with swarm, you could do various things in automation. Note this would all be done on a single node swarm to make testing easier (docker events doesn't pull node events from all nodes, so tracking a single node is much easier for ci/cd):
Have something monitoring docker events, e.g. docker events -f service=<service-name> to ensure containers aren't dying.
always have healthchecks in your containers. They are the #1 way to ensure your app is healthy (at the container level) and you'll see them succeed or fail in docker events. You can put them in Dockerfiles, service create commands, and stack/compose files. Here's some great examples.
You could attach another container to the same network to test your services remotely 1-by-1 using tasks. with reverse DNS. This will avoid the VIP and let you talk to a specific replica(s).
You might get some stuff out of docker inspect <service-id or task-id>
Another solution might be to use docker service scale - it will not return until service is converged to specified amount of replicas or will timeout.
export STACK=devstack # swarm stack name
export SERVICE_APP=yourservice # service name
export SCALE_APP=2 # desired amount of replicas
docker stack deploy $STACK --with-registry-auth
docker service scale ${STACK}_${SERVICE_APP}=${SCALE_APP}
One drawback of that method is that you need to provide service names and their replica counts (but these can be extracted from compose spec file using jq).
Also, in my use case I had to specify timeout by prepending timeout command, i.e. timeout 60 docker service scale, because docker service scale was waiting its own timeout even if some containers failed, which could potentially slow down continuous delivery pipelines
References
Docker CLI: docker service scale
jq - command-line JSON processor
GNU Coreutils: timeout command
you can call this for every service. it returns when converged. (all ok)
docker service update STACK_SERVICENAME
I am having a problem trying to implement the best way to add new container to an existing cluster while all containers run in docker.
Assuming I have a docker swarm, and whenever a container stops/fails for some reason, the swarm bring up new container and expect it to add itself to the cluster.
How can I make any container be able to add itself to a cluster?
I mean, for example, if I want to create a RabbitMQ HA cluster, I need to create a master, and then create slaves, assuming every instance of RabbitMQ (master or slave) is a container, let's now assume that one of them fails, we have 2 options:
1) slave container has failed.
2) master container has failed.
Usually, a service which have the ability to run as a cluster, it also has the ability to elect a new leader to be the master, so, assuming this scenerio is working seemlesly without any intervention, how would a new container added to the swarm (using docker swarm) will be able to add itself to the cluster?
The problem here is, the new container is not created with new arguments every time, the container is always created as it was deployed first time, which means, I can't just change it's command line arguments, and this is a cloud, so I can't hard code an IP to use.
Something here is missing.
Maybe trying to declare a "Service" in the "docker Swarm" level, will acctualy let the new container the ability to add itself to the cluster without really knowing anything the other machines in the cluster...
There are quite a few options for scaling out containers with Swarm. It can range from being as simple as passing in the information via a container environment variable to something as extensive as service discovery.
Here are a few options:
Pass in IP as container environment variable. e.g. docker run -td -e HOST_IP=$(ifconfig wlan0 | awk '/t addr:/{gsub(/.*:/,"",$2);print$2}') somecontainer:latest
this would set the internal container environment variable HOST_IP to the IP of the machine it was started on.
Service Discovery. Querying a known point of entry to determine the information about any required services such as IP, Port, ect.
This is the most common type of scale-out option. You can read more about it in the official Docker docs. The high level overview is that you set up a service like Consul on the masters, which you have your services query to find the information of other relevant services. Example: Web server requires DB. DB would add itself to Consul, the web server would start up and query Consul for the databases IP and port.
Network Overlay. Creating a network in swarm for your services to communicate with each other.
Example:
$ docker network create -d overlay mynet
$ docker service create –name frontend –replicas 5 -p 80:80/tcp –network mynet mywebapp
$ docker service create –name redis –network mynet redis:latest
This allows the web app to communicate with redis by placing them on the same network.
Lastly, in your example above it would be best to deploy it as 2 separate containers which you scale individually. e.g. Deploy one MASTER and one SLAVE container. Then you would scale each dependent on the number you needed. e.g. to scale to 3 slaves you would go docker service scale <SERVICE-ID>=<NUMBER-OF-TASKS> which would start the additional slaves. In this scenario if one of the scaled slaves fails swarm would start a new one to bring the number of tasks back to 3.
https://docs.docker.com/engine/reference/builder/#healthcheck
Docker images have a new layer for health check.
Use a health check layer in your containers for example:
RUN ./anyscript.sh
HEALTHCHECK exit 1 or (Any command you want to add)
HEALTHCHECK check the status code of command 0 or 1 and than result as
1. healthy
2. unhealthy
3. starting etc.
Docker swarm auto restart the unhealthy containers in swarm cluster.
I have some problem using docker swarm mode .
I want to have high availability with swarm mode.
I think I can do that with rolling update of swarm.
Something like this...
docker service update --env-add test=test --update-parallelism 1 --update-delay 10s 6bwm30rfabq4
However there is a problem.
My docker image have entrypoint. Because of this there is a little delay before the service(I mean docker container) is really up. But docker service just think the service is already running, because status of the container is 'Up'. Even the service still do some work on entrypoint. So some container return error when I try to connect the service.
For example, if I create docker service named 'test' and scale up to 4 with port 8080. I can access test:8080 on web browser. And I try to rolling update with --update-parallelism 1 --update-delay 10s options. After that I try to connect the service again.. one container return error.. Because Docker service think that container already run..even the container still doesn't up because of entrypoint. And after 10s another container return error.. because update is started and docker service also think that container is already up.
So.. Is there any solution to solve this problem?
Should I make some nginx settings for disconnect connection to error container and reconnect another one?
The HEALTHCHECK Dockerfile command works for this use case. You specify how Docker should check if the container is available, and it gets used during updates as well as checking service levels in Swarm.
There's a good article about it here: Reducing Deploy Risk With Docker’s New Health Check Instruction.