I am looking for a lead on the problem described below.
My understanding:
Docker Swarm includes an ingress network and a DNS server that resolves services by their names. It also runs a built-in load balancer on every node in the cluster.
We can reach any published service using the IP address of any machine participating in swarm mode. If that machine does not host the service, the load balancer routes the request to a machine that does.
As a best practice, we can put a load-balancer container (NGINX/HAProxy) in front as a reverse proxy to route requests according to a chosen algorithm (round-robin, hash, IP hash, least connections, least bandwidth, etc.).
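For example, a minimal sketch of that routing-mesh behavior (the service name, image, and ports are placeholders, not taken from my setup):

```bash
# Publish a service on the swarm's routing mesh; names/ports are illustrative.
docker service create \
  --name web \
  --replicas 3 \
  --publish published=8080,target=80 \
  nginx:alpine

# Port 8080 now answers on every node in the swarm; the ingress mesh
# forwards each request to one of the running tasks, wherever it lives.
curl http://<any-node-ip>:8080/
```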
Problem statement:
I want to build a cluster of two or three machines where I will deploy all the required technical services: a mini QA environment.
Since a service is identified by its name, I cannot create another service with the same name. As a developer, I want to have my own copy of a service up and running on my localhost, which is also part of the Docker Swarm cluster. Obviously I cannot give it the same name, so let's say I name it myIP_serviceName. The DNS entry Docker Swarm keeps will then be based on this name.
I want a mechanism where, if I make a call to any service using my IP address as the host, the load balancer first looks for a service registered in DNS as myIP_serviceName. If it finds one, the call is routed to that service; if it doesn't, the call follows the regular path. This should hold true for every subsequent request that is part of the same round trip.
I have not explored Kubernetes yet; please suggest whether Kubernetes could achieve this goal more elegantly.
Please correct my understanding if I am wrong, and share any suggestions.
HAProxy has written about HAProxy on Docker Swarm: Load Balancing and DNS Service Discovery; maybe this will point you in the right direction.
Related
I have a Docker Swarm set up on DigitalOcean.
For now there's only 1 master node but more will be added soon.
I use Cloudflare DNS for this one, and provided the master node's IP as the A record.
It does work, but I am not really sure this is the correct way.
Furthermore, I am wondering which IP I should provide to Cloudflare when there are multiple master nodes.
Any advice on this will be much appreciated, thanks.
You can use any IP you want and Swarm will do the routing and load balancing for you. Because of the routing mesh, your service is automatically published on every node that is part of the swarm network.
As your services evolve you may need more custom routing rules. In that case you can introduce an extra layer of load balancing with nginx, HAProxy, Traefik, or a similar tool.
The simplest setup beyond the default swarm routing is to use nginx as the load balancer.
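For example, a minimal sketch assuming a backend service named app and an overlay network named frontend (all names, images, and ports here are illustrative):

```bash
# Overlay network shared by the backend and the nginx edge service.
docker network create --driver overlay frontend

# Minimal nginx config; "app" resolves through the swarm's internal DNS.
cat > nginx.conf <<'EOF'
events {}
http {
  server {
    listen 80;
    location / {
      proxy_pass http://app:8080;
    }
  }
}
EOF

# Distribute the config through swarm and start both services.
docker config create edge_nginx_conf nginx.conf
docker service create --name app --network frontend --replicas 3 myorg/app:latest
docker service create --name edge --network frontend \
  --publish published=80,target=80 \
  --config source=edge_nginx_conf,target=/etc/nginx/nginx.conf \
  nginx:alpine
```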
Is it possible to make the Docker load balancer, which uses round robin, direct requests to only one container of a global Docker service deployed on multiple hosts? If this container goes down, requests would then be forwarded to the other running containers.
The only way I can think of is using an external load balancer like nginx, but that requires an additional Docker service.
You can achieve the same result by using replicated mode with only one replica of the container running. In this case you rely on Docker to ensure that an instance is always available.
Alternatively, the recommended way is to use an external load balancer. Check Use swarm mode routing mesh to see the different usages.
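A minimal sketch of the single-replica approach (service and image names are placeholders):

```bash
# One replica means every request lands on the same container, and Swarm
# reschedules it on another node if its current node fails.
docker service create \
  --name single \
  --replicas 1 \
  --publish published=8080,target=80 \
  myorg/app:latest
```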
I'm working on a project to set up a cloud architecture using docker-swarm. I know that with swarm I could deploy replicas of a service which means multiple containers of that image will be running to serve requests.
I also read that docker has an internal load balancer that manages this request distribution.
However, I need help in understanding the following:
Say I have a container that exposes a service as a REST API, or say it's a web app. And if I have multiple containers (replicas) deployed in the swarm, and other containers (running some apps) that talk to this HTTP/REST service.
Then, when I write those apps, which IP:PORT combination do I use? Is it any of the worker node IPs running these services? Will doing so take care of distributing the load appropriately even amongst other workers/managers running the same service?
Or should I call the manager which in turn takes care of routing appropriately (even if the manager node does not have a container running this specific service)?
Thanks.
when I write those apps which IP:PORT combination do I use? Is it any of the worker node IPs running these services?
You can use any node that is participating in the swarm, even if no replica of the service in question exists on that node.
So you will use a Node:HostPort combination. The ingress routing mesh will route the request to an active container.
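For example (the service name web and port 8080 are assumptions, not from the question):

```bash
# See which nodes actually run a task of the service...
docker service ps web

# ...then hit a node that has none: the ingress mesh still answers on the
# published port and forwards the request to an active task elsewhere.
curl http://<ip-of-node-without-a-web-task>:8080/
```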
One picture is worth ten thousand words: see the ingress routing mesh diagram in the Docker documentation.
Will doing so take care of distributing the load appropriately even amongst other workers/managers running the same service?
The ingress controller will do round robin by default.
Now the clients could use DNS round robin to access the service on the Docker Swarm nodes, but then the classic DNS caching problem will occur. To avoid that, we can use an external load balancer like HAProxy.
Some important additional information to the existing answer:
The advantage of using a proxy (HAProxy) in front of Docker Swarm is that the swarm nodes can reside on a private network that is accessible to the proxy server but not publicly accessible. This makes your cluster more secure.
If you are using an AWS VPC, you can create a private subnet, place your swarm nodes inside it, and put the proxy server in a public subnet from which it forwards traffic to the swarm nodes.
When you access the HAProxy load balancer, it forwards requests to nodes in the swarm. The swarm routing mesh routes the request to an active task. If, for any reason, the swarm scheduler dispatches tasks to different nodes, you don't need to reconfigure the load balancer.
For more details please read https://docs.docker.com/engine/swarm/ingress/
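A rough sketch of the proxy-in-front pattern described above, with placeholder private IPs, ports, and file paths:

```bash
# HAProxy on a public host forwarding to swarm nodes on a private subnet.
cat > haproxy.cfg <<'EOF'
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend http_in
    bind *:80
    default_backend swarm_nodes

backend swarm_nodes
    balance roundrobin
    # Any node will do: the ingress routing mesh forwards to an active task,
    # so rescheduled tasks never require touching this file.
    server node1 10.0.1.11:8080 check
    server node2 10.0.1.12:8080 check
    server node3 10.0.1.13:8080 check
EOF

docker run -d --name edge-proxy -p 80:80 \
  -v "$(pwd)/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro" \
  haproxy:2.8
```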
I'm trying to figure out whether Docker Swarm or Kubernetes are a good choice for my use case.
Basically, I want to build a small cluster of forward proxies (via squid, nginx or a custom nodejs script), and be able to deploy/start/stop/purge them all together.
I should be able to access the proxy cluster via a single IP address, the manager should be able to load-balance requests to a node, and each proxy node must use a unique outgoing IP address.
I'm wondering:
Are Docker Swarm and/or Kubernetes the right way to go about it?
If so, should I set up Docker Swarm and/or Kubernetes and its worker nodes (running the proxy) on a single dedicated server or on separate virtual servers?
Is it also possible for all the cluster nodes to share file system storage for caching, common config, etc.?
Any other tips to get this working?
Thanks!
Docker running in swarm mode should work well for this.
Run Docker on a single dedicated server; I see no need for virtual servers. You could also run the swarm across multiple dedicated servers.
Docker secrets (https://docs.docker.com/engine/swarm/secrets/) work well for some settings and configurations. If you require significant storage, simply add a database service to your cluster.
Docker swarm mode fits your requirements quite well; requests are automatically balanced across your swarm and each service instance can be configured to have a unique address. You should check out the swarm mode tutorial: https://docs.docker.com/engine/swarm/swarm-tutorial/
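A hedged sketch of such a setup; the image name (ubuntu/squid), port, and secret name are assumptions rather than a recommendation:

```bash
# One forward-proxy task per node via global mode, so each task's outbound
# traffic leaves through its own node's public IP.
docker service create \
  --name proxy \
  --mode global \
  --publish published=3128,target=3128 \
  ubuntu/squid:latest

# Shared settings can be handed to every task as swarm secrets:
printf 'example-credential' | docker secret create proxy_credential -
docker service update --secret-add proxy_credential proxy
```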
Elasticsearch is designed to run in cluster mode; all I have to do is define the relevant node IPs in the cluster via an environment variable, and as long as network connectivity is available it will connect and join the other nodes to the cluster.
I have 3 nodes: 1 acting as the Docker Swarm manager and the other two as workers. I have initialized the manager and joined the worker nodes, and everything looks OK from that standpoint.
Now I'm trying to run the Elasticsearch container in a way that joins all nodes to the same Elasticsearch cluster. However, I want the nodes to join using their overlay network interface, which means I need to know the containers' internal IP addresses at the time I run the docker service create command. How can I do this? Do I have to use something like Consul to achieve this?
Some clarifications:
I need to know, at the time of service creation, the IP addresses (or DNS names) of all Elasticsearch participants so I can start the cluster correctly. This has to be at creation time, not afterwards. Also, as I understand it, I could expose ports 9200/9300 for all services, work with the external machine IPs, and get it to work, but I would like the overlay network to carry all this communication (I thought that is what swarm mode is for).
Only a partial solution here.
So, when attaching your services to a custom overlay network, you do indeed get access to Docker's service discovery feature. I'll detail the networking features of Docker Swarm mode before trying to tie them to your problem.
I'll be using the distinct terms service and task, where a service could be elasticsearch, whereas a task is a single instance of that elasticsearch service.
Docker networking
The idea is that for each service you create, Docker assigns a virtual IP (VIP) and a custom DNS alias. You can retrieve this VIP using the docker service inspect myservice command.
However, there are two modes for attaching a service to an overlay network: dnsrr and vip. You select between them with the --endpoint-mode option of docker service create.
VIP mode (I believe it is the default, or at least the most used) assigns the virtual IP to the service's DNS alias. This means that an nslookup servicename returns a single VIP which, behind the scenes, is linked to one of your containers in a round-robin fashion. But there is also a special DNS alias that gives you access to all of your instances' IPs (all of your tasks' IPs): tasks.myservice.
So in VIP mode you can retrieve all of your tasks' IPs with a simple nslookup tasks.myservice, where myservice is the service name.
The other mode is dnsrr. This mode simply gets rid of the VIP and connects the DNS alias to the different tasks (= service instances) in a round-robin way. This way, you only have to do an nslookup myservice to retrieve the different service instances' IPs.
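A small sketch of the two modes, assuming an overlay network named mynet already exists and using placeholder service/image names:

```bash
# Default vip mode: "app-vip" resolves to one virtual IP,
# while tasks.app-vip resolves to every task's IP.
docker service create --name app-vip --network mynet --replicas 3 myorg/app:latest

# dnsrr mode: no VIP; the service name itself resolves to the task IPs.
docker service create --name app-rr --network mynet \
  --endpoint-mode dnsrr --replicas 3 myorg/app:latest

# From any container attached to mynet:
#   nslookup app-vip          -> a single VIP
#   nslookup tasks.app-vip    -> one A record per task
#   nslookup app-rr           -> one A record per task
```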
Elasticsearch clustering
OK, so first of all, I'm not really familiar with the way Elasticsearch lets you cluster. From what I understood of your question, when running the elasticsearch binary you need to give it, as a parameter, the addresses of all the other nodes it needs to cluster with.
So what I would do is create a custom Elasticsearch image, probably based on the one from the default library, and add a custom entrypoint that first runs a script to retrieve the other tasks' IPs.
I believe staying in VIP mode is suitable for you, since the tasks.myservice DNS alias is available. You'll then need to parse the output to retrieve the tasks' IPs (and probably remove your own). You can then save them in a config file or an environment variable, or pass them as a runtime option to your elasticsearch binary.
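A rough, untested sketch of such an entrypoint script; the service name, entrypoint path, and Elasticsearch settings are assumptions that depend on your image and Elasticsearch version:

```bash
#!/bin/sh
# Hypothetical entrypoint for the custom image described above.

SERVICE_NAME="${SERVICE_NAME:-elasticsearch}"
MY_IP="$(hostname -i | awk '{print $1}')"

# Resolve every task IP for the service via Docker's embedded DNS,
# drop our own address, and join the rest into a comma-separated list.
PEERS="$(getent hosts "tasks.${SERVICE_NAME}" | awk '{print $1}' | grep -v "^${MY_IP}\$" | paste -sd, -)"

# Setting names differ by version: discovery.zen.ping.unicast.hosts on
# Elasticsearch 6.x, discovery.seed_hosts on 7.x and later.
exec /usr/local/bin/docker-entrypoint.sh elasticsearch \
  -E network.host="${MY_IP}" \
  -E discovery.seed_hosts="${PEERS}"
```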
Edit: To create a custom overlay network, you will need to use the docker network create command, and then use the --network option of docker service create.
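For illustration, with placeholder names and the custom image idea from above:

```bash
# Create the overlay network, then attach the service to it.
docker network create --driver overlay es-net
docker service create --name elasticsearch --network es-net \
  --replicas 3 myorg/custom-elasticsearch:latest
```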
This answer is mainly based on the Swarm mode networking documentation.