I am currently attempting a Kafka cluster deployment in Docker Swarm. Kafka does not work with the replica feature of Swarm because each Kafka broker (node) needs to be configured and reachable individually (i.e. no load balancer in front of it). Therefore, each broker is configured as an individual service with replicas=1, e.g. kafka1, kafka2 and kafka3 services.
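For illustration, a minimal sketch of that layout in a stack file (the image name and broker settings are placeholders):

version: "3.7"
services:
  kafka1:
    image: my-kafka:latest      # placeholder image
    environment:
      KAFKA_BROKER_ID: "1"      # each broker needs a fixed, unique identity
    deploy:
      replicas: 1               # exactly one task per broker service
  kafka2:
    image: my-kafka:latest
    environment:
      KAFKA_BROKER_ID: "2"
    deploy:
      replicas: 1
  kafka3:
    image: my-kafka:latest
    environment:
      KAFKA_BROKER_ID: "3"
    deploy:
      replicas: 1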
Every now and then the configuration or image for the Kafka brokers will need to be changed via a docker stack deploy (done by a person or CI/CD pipeline). Then Swarm will recreate all containers simultaneously and as a result, the Kafka cluster is temporarily unavailable, which is not acceptable for a critical piece of infrastructure that is supposed to run 24/7. And I haven't even mentioned the Zookeeper cluster underneath Kafka yet, for which the same applies.
The desired behavior is that Swarm recreates the container of the kafka1 service, waits until it has fully started up and synchronized with the other brokers (all topic partitions are in sync), and only then Swarm restarts kafka2 service and so on.
I think I can construct a health check within the Kafka Docker image that would tell the Docker engine when the Kafka broker is fully synchronized. But how do I make Swarm perform what amounts to a rolling update across service boundaries? It ignores the depends_on setting that Docker Compose knows, and rolling-update policies apply only to the replicas of a single service.
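One direction I'm considering is sketched below; the check-insync.sh script is hypothetical and would have to verify that all of the broker's partitions are in sync:

# In the Kafka image: report healthy only once the broker is fully in sync
HEALTHCHECK --interval=10s --timeout=5s --retries=3 CMD /opt/kafka/bin/check-insync.sh || exit 1

# In a deploy script, outside of Swarm's per-service rolling update:
# restart the broker services one at a time
for svc in kafka1 kafka2 kafka3; do
  docker service update --force --detach=false "$svc"  # returns only once the new task reports healthy
done

But that loop has to live in a deploy script outside the stack file, which is exactly the cross-service orchestration I was hoping Swarm could handle itself. Any ideas?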
Related
I need help distributing already-running containers onto a newly added docker swarm worker node.
I am running Docker swarm mode on Docker version 18.09.5. I am using AWS Auto Scaling to create 3 masters and 4 workers. For high availability, if one of the workers goes down, all the containers from that worker node are rebalanced onto the other workers. When Auto Scaling brings a new node up, I add that worker node to the current swarm using some automation, but Docker swarm does not balance any containers onto the new worker node. Even after redeploying the stack, Swarm still does not rebalance the containers. Is it because of a different node ID? How can I customize this? I am deploying the stack with a Compose file:
docker stack deploy -c dockerstack.yml NAME
The only (current) way to force re-balancing is to force-update the services. See https://docs.docker.com/engine/swarm/admin_guide/#force-the-swarm-to-rebalance for more information.
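For example (the service name is a placeholder):

$ docker service update --force mystack_myservice

The forced update restarts the service's tasks, giving the scheduler a chance to place them on the newly joined node.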
When running Flink JobManager and Flink TaskManager in a Docker Swarm cluster, there is no guarantee that the JobManager will run on any particular node.
If I want to access the Web UI on port 8080, do I need to find out which machine is running JobManager and go to http://ip_address:8080?
What if the node that is running JobManager changes?
That doesn't look like a very straightforward way of working. Is there a way to force the containerised JobManager to run on a specific node?
I am currently creating the services using the Docker Swarm scripts from:
https://github.com/apache/flink/tree/master/flink-contrib/docker-flink
Thank you very much.
To access a service running in a swarm, you can call any node that is participating in the swarm. The swarm node load balancer will then route the request to an available container. This is provided by the docker swarm routing mesh.
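As a sketch, assuming the JobManager service publishes the web UI port from the question (image and command follow the official Flink image's conventions):

$ docker service create --name jobmanager --publish 8080:8080 flink jobmanager
$ curl http://<any-node-ip>:8080   # the routing mesh answers on every swarm node

So you never need to know which node the JobManager landed on; any node's address (or a DNS name covering all of them) reaches the web UI.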
I am having a problem trying to implement the best way to add a new container to an existing cluster while all containers run in docker.
Assume I have a docker swarm, and whenever a container stops/fails for some reason, the swarm brings up a new container and expects it to add itself to the cluster.
How can I make any container be able to add itself to a cluster?
I mean, for example, if I want to create a RabbitMQ HA cluster, I need to create a master and then create slaves. Assuming every instance of RabbitMQ (master or slave) is a container, let's now assume that one of them fails. We have 2 options:
1) A slave container has failed.
2) The master container has failed.
Usually, a service that has the ability to run as a cluster also has the ability to elect a new leader as master. So, assuming this scenario works seamlessly without any intervention, how would a new container added to the swarm (using docker swarm) be able to add itself to the cluster?
The problem here is that the new container is not created with new arguments every time; the container is always created exactly as it was deployed the first time. That means I can't just change its command-line arguments, and since this is a cloud, I can't hard-code an IP to use.
Something here is missing.
Maybe declaring a "Service" at the "docker Swarm" level would actually give the new container the ability to add itself to the cluster without really knowing anything about the other machines in the cluster...
There are quite a few options for scaling out containers with Swarm. They range from something as simple as passing the information in via a container environment variable to something as extensive as service discovery.
Here are a few options:
Pass in the IP as a container environment variable, e.g. docker run -td -e HOST_IP=$(ifconfig wlan0 | awk '/inet addr:/{gsub(/.*:/,"",$2);print $2}') somecontainer:latest
This sets the internal container environment variable HOST_IP to the IP of the machine it was started on.
Service Discovery. Querying a known point of entry to determine information about any required services, such as IP, port, etc.
This is the most common type of scale-out option. You can read more about it in the official Docker docs. The high-level overview is that you set up a service like Consul on the masters and have your services query it to find the information of other relevant services. Example: a web server requires a DB. The DB would add itself to Consul, and the web server would start up and query Consul for the database's IP and port.
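As a concrete sketch of that flow (the service name db and the agent address consul:8500 are assumptions):

$ curl http://consul:8500/v1/catalog/service/db
# returns JSON listing every registered db instance with its address and port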
Network Overlay. Creating a network in swarm for your services to communicate with each other.
Example:
$ docker network create -d overlay mynet
$ docker service create --name frontend --replicas 5 -p 80:80/tcp --network mynet mywebapp
$ docker service create --name redis --network mynet redis:latest
This allows the web app to communicate with redis by placing them on the same network.
Lastly, in your example above, it would be best to deploy it as 2 separate services that you scale individually, e.g. one MASTER service and one SLAVE service. Then you would scale each depending on the number you need, e.g. to scale to 3 slaves you would run docker service scale <SERVICE-ID>=<NUMBER-OF-TASKS>, which would start the additional slaves. In this scenario, if one of the scaled slaves fails, swarm would start a new one to bring the number of tasks back to 3.
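A rough sketch of that layout (image choice and cluster wiring are placeholders):

$ docker service create --name rabbit-master --network mynet rabbitmq:3
$ docker service create --name rabbit-slave --network mynet --replicas 3 rabbitmq:3
$ docker service scale rabbit-slave=5   # scale the slaves independently of the master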
https://docs.docker.com/engine/reference/builder/#healthcheck
Docker images can define a health check via the HEALTHCHECK instruction.
Add a health check to your images, for example:
HEALTHCHECK --interval=30s --timeout=5s CMD ./anyscript.sh || exit 1
Docker periodically runs the given command and maps its exit status (0 = success, 1 = failure) to a container health state:
1. healthy
2. unhealthy
3. starting (while the container is within its start-up grace period)
Docker Swarm automatically restarts unhealthy containers in the swarm cluster.
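The same probe can also be declared per service in a stack file; a minimal sketch reusing the script from above (the timing values are arbitrary):

services:
  myservice:
    image: myimage:latest
    healthcheck:
      test: ["CMD", "./anyscript.sh"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s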
So I've got a Plex server running on my Docker swarm!! If I kill a node, it'll magically start Plex somewhere else. This is great! Now comes the fun part...
With old-school containers I would just forward port 32400 on my router to the server that was running Plex and it would work fine. Now that Plex can run in multiple different places, I need to figure out how to forward the port to some static resource. I could use HAProxy bound to some bridge interface and run it on every node to provide failover... but I'd like to see if there's an easier way to accomplish this.
What's the best way to forward ports to services in Docker Swarm?
Port forwarding is built into the new swarm mode. There's a section on load balancing in the documentation:
The swarm manager uses ingress load balancing to expose the services you want to make available externally to the swarm. The swarm manager can automatically assign the service a PublishedPort, or you can configure a PublishedPort for the service in the 30000-32767 range. External components, such as cloud load balancers, can access the service on the PublishedPort of any node in the cluster, whether or not the node is currently running the task for the service. All nodes in the swarm cluster route ingress connections to a running task instance.

Swarm mode has an internal DNS component that automatically assigns each service in the swarm a DNS entry. The swarm manager uses internal load balancing to distribute requests among services within the cluster based upon the DNS name of the service.
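For Plex that means publishing the port on the service and pointing the router at any node (the image name assumes the official Plex image):

$ docker service create --name plex --publish 32400:32400 plexinc/pms-docker

Port 32400 then answers on every swarm node, wherever the Plex task actually runs, so the router can forward to any one of them.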
Update
The following article discusses how to integrate a proxy load balancer with the Docker engine:
https://technologyconversations.com/2016/08/01/integrating-proxy-with-docker-swarm-tour-around-docker-1-12-series/
I have a setup where I am deploying a spring-cloud-consul application inside a docker swarm overlay network. On the overlay network I run a consul container on each node. When I spin up the spring-cloud-consul application, I have to specify the host name of the consul agent it should talk to, such as "discovery", so it can advertise itself and query for service discovery. The issue here is that every container then queries the same consul agent. When I remove this particular consul agent, the Ribbon DiscoveryClient seems to rely on its own cache rather than using one of the other consul nodes.
What is the proper way of starting up a microservice application using spring-cloud-consul and Consul such that the services are not reliant on one fixed consul agent?
Solutions I have thought of trying:
Having multiple compose files, each specifying a different consul agent.
Somehow having the docker image identify the node it is on and then set itself to use the consul agent local to that node. (Not sure how to accomplish this yet; a possible sketch follows this list.)
Package a consul agent with the spring-boot application.
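For the second option, Swarm's service templates may be one way to get there; a sketch, assuming an agent listens on each node's host network and that the application honours the standard SPRING_CLOUD_CONSUL_HOST variable:

$ docker service create --name myapp --env "SPRING_CLOUD_CONSUL_HOST={{.Node.Hostname}}" myapp:latest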
Thank you for your help.
The consul agent must run on every node in the cluster. It is not necessary to run the consul agent inside every docker container, just on every node. You have the choice of installing the consul agent on every node, or running the consul agent in a docker container on every node.
For the consul-agent-in-a-container solution, you will need to ensure the consul agent container is running before other containers are started.
For details on running the consul agent in client mode in a docker container, see https://hub.docker.com/_/consul/ and search for "Running Consul Agent in Client Mode". That setup runs the agent container with --net=host networking, so the agent behaves as if it were installed natively, when it is actually in a docker container.
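A minimal sketch of that per-node agent (the bind and join addresses are placeholders; see the linked page for the full set of flags):

$ docker run -d --name=consul-agent --net=host consul agent -bind=<this-node-ip> -retry-join=<consul-server-ip>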