Cluster of forward proxies - docker

I'm trying to figure out whether Docker Swarm or Kubernetes is a good choice for my use case.
Basically, I want to build a small cluster of forward proxies (via Squid, nginx or a custom Node.js script), and be able to deploy/start/stop/purge them all together.
I should be able to access the proxy cluster via a single IP address, the manager should load-balance each request to a node, and each proxy node must use a unique outgoing IP address.
I'm wondering:
Are Docker Swarm and/or Kubernetes the right way to go about it?
If so, should I set up Docker Swarm and/or Kubernetes and its worker nodes (running the proxy) on a single dedicated server or on separate virtual servers?
Is it also possible for all the cluster nodes to share file system storage for caching, common config, etc.?
Any other tips to get this working?
Thanks!

Docker running in swarm mode should work well for this.
Run Docker on a single dedicated server; I see no need for virtual servers. You could also run the swarm across multiple dedicated servers.
Docker secrets (https://docs.docker.com/engine/swarm/secrets/) work well for some settings and configurations. If you require significant storage, simply add a database service to your cluster.
Docker swarm mode fits your requirements quite well; requests are automatically balanced across your swarm and each service instance can be configured to have a unique address. You should check out the swarm mode tutorial: https://docs.docker.com/engine/swarm/swarm-tutorial/
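To illustrate the "single IP address" requirement from the client side, here is a minimal Python sketch using only the standard library. It assumes the proxy service is published on port 3128 (Squid's default) and reachable under one address; the host name in PROXY_URL is made up for illustration and is not something from the question or the answer.

```python
import urllib.request

# Hypothetical entry point: one address for the whole proxy cluster.
# 3128 is Squid's default port; the host name is an assumption.
PROXY_URL = "http://proxy-cluster.example.internal:3128"


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```

Calling build_proxy_opener(PROXY_URL).open("http://example.com/") would then send the request to the cluster address, and which replica actually serves it is left to the swarm's load balancing.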

Related

Deploying couchbase in a docker swarm environment

I'm trying to deploy couchbase community edition in a docker swarm environment. I followed the steps suggested by Arun Gupta, though I'm not sure if a master-worker model is desired as Couchbase doesn't have the notion of master/slave model.
Following are the problems I encountered. I'm wondering if anyone is able to run Couchbase successfully in a swarm mode.
Docker swarm assigns a different IP address each time the service is restarted. Sometimes Docker moves the service to a new node, which again assigns a different IP address. It appears that Couchbase doesn't start if it finds a new IP address (the log says "address on which the service is configured is not up. Waiting for the interface to be brought up"). I'm using a host-mounted volume as the data folder (/opt/couchbase/var) to persist data across restarts.
I tried to read the overlay network address used internally and update the ip and ip_start files in a run script within the container. This doesn't help either: the server comes up as a new instance without loading the old data. This is a real problem, as production data can be lost if Docker swarm moves services around.
Docker swarm's internal router assigns an address from the overlay network in addition to the other interfaces. I tried using localhost, master.overlaynet, the IP address of the overlaynet, the private address assigned by Docker to the container, etc. as the server address in the Couchbase cluster configuration. While the cluster servers are able to communicate with each other, this created another problem with client connections. A client normally connects to an address/port exposed by the swarm cluster, which is different from the cluster node address. The Python client, for instance, reads the Couchbase cluster server addresses and tries to connect to them; if an overlay address was given as the server address when joining the cluster, the client times out because that address is not reachable.
I might be able to add a network address constraint to the yaml file to ensure that the master node always comes up with the same address. For example:
networks:
  default:
    ipv4_address: 172.20.x.xx
The above approach may not work for worker nodes, as it would limit the ability to scale worker nodes based on load/growth.
In this model (master/worker), how does a worker get elected as leader if master node goes down? Is master/worker the right approach for a Couchbase cluster in swarm environment?
It will be helpful if I can get some references to Couchbase swarm mode setup or some suggestions on how to handle IP address change.
We ran into the same problem (couchbase server 5.1.1) and our temporary solution is to use fixed IPs on a new docker bridge network.
networks:
  default:
    ipv4_address: 172.19.0.x
Although this works, it is not a good solution, as we lose auto-scaling as mentioned above. Here is what we learned during setup:
You can run a single-node couchbase setup with dynamic IP. You can stop/restart this container and update couchbase-server version with no limitations.
When you add a second node, this initially works with dynamic IPs as well: you can add the server and rebalance the cluster. But when you stop/restart a Couchbase container, or scale it to 0 and back to 1, it won't start up anymore because Docker provides a new IP (10.0.0.x with the default network).
Changing the "ip" or "ip_start" files (/opt/couchbase/var/lib/couchbase/config) to update the IP does NOT work. After changing the IP in "ip" and "ip_start", the server starts up as a "new" server, although it still has all the data on disk, so you can back up your data at that point if you need to. So even after you have "switched" to a fixed IP you can't restart the server directly; you need to cbbackup and cbrestore.
The documentation for using hostnames (https://docs.couchbase.com/server/5.1/install/hostnames.html) is a little misleading, as it only documents how to "find" a new server while configuring a cluster. Even if you specify hostnames, Couchbase still configures all nodes with their static IPs.
Starting your Docker swarm services with host networking might be a solution, but we run multiple instances of other containers on a single host, so we would like to avoid that.
So always have a backup of the node/cluster. We always make both a file backup and a cluster backup with cbbackup, as restoring from a file backup is much faster.
There is a discussion at https://github.com/couchbase/docker/issues/82 on this issue, but this involves using AWS for static IPs, which we don't use.
I am aware of the Couchbase Autonomous Operator for Kubernetes, but for now we would like to stay with Docker swarm. If anybody has a nicer solution for this, such as a way to configure Couchbase to use hostnames, please share.

Can all docker swarm instances run on same machine?

I have a couple of Docker swarm questions (Sorry for not splitting them up but they are all closely related):
Do all instances in a swarm have to run on different machines or can they all run on the same? (if having limited amount of hardware and just wanting to try swarm mode)
Do I have to run swarm mode to be able to communicate between instances?
What is the key difference between swarm mode and just running a number of containers as regular?
What are the options of communication between instances of containers? (in swarm and in regular mode) http? named pipes? other?
If using http communication between containers on the same machine, will it be roughly as fast as named pipes?
Is there any built in support for a message bus or similar in Docker?
Is there support for any consensus protocol in Docker?
Are there any GUI's for designing, managing, testing and/or debugging Docker swarms?
Can a container list other containers, stop/restart some and start new ones? (to be able to function as a manager for other containers)
Can a container be given access to OS-features (Linux in my case) to configure for instance a reverse proxy or port forwarding on the WAN?
Background: What I'm trying to figure out is how I should go about and build a micro service mesh using Docker. The containers will be running .NET Core. I'm not too keen on relying too much on specifically Docker since it may not be the preferred tech in a couple of years. What can/should I do with Docker and what can/should I do inside the containers. That's what I'm trying to figure out.
I've copied your questions and tried to answer them.
Do all instances in a swarm have to run on different machines or can they all run on the same? (if having limited amount of hardware and just wanting to try swarm mode)
You can have only one machine in a swarm and run multiple tasks of the same service or in other words your scale of a service can be more than the number of actual machines. I have a testing swarm with a single machine and one with three and it works the same way.
Do I have to run swarm mode to be able to communicate between instances?
You have to run your docker in swarm mode in order to create a service, please see this link
What is the key difference between swarm mode and just running a number of containers as regular?
The key difference afaik is that when a task goes down, Docker automatically starts another one. You can also easily scale your services, which means you can have multiple tasks just by scaling your service up or down. With a plain container, when it goes down you have to start another one manually.
What are the options of communication between instances of containers? (in swarm and in regular mode) http? named pipes? other?
I've currently only tested with a couple of wildfly servers in a swarm, which are on the same network. I'm not sure about others, but would love to find out. I've only read about RabbitMQ, but can't seem to find the link atm.
If using http communication between containers on the same machine, will it be roughly as fast as named pipes?
I can't say.
Is there any built in support for a message bus or similar in Docker?
I can't say.
Are there any GUI's for designing, managing, testing and/or debugging Docker swarms?
I've tested rancher and portainer.io, for a list of them I found this link
Can a container list other containers, stop/restart some and start new ones?
I'm not sure why would you want to do that? And I guess it's possible, see this link
Can a container be given access to OS-features (Linux in my case) to configure for instance a reverse proxy or port forwarding on the WAN?
I can't say.
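To make the first answer above more concrete (a service can be scaled beyond the number of machines, down to a single-node swarm), here is a toy Python sketch of spreading replicas over nodes. It is a deliberate simplification and not Docker's actual scheduler; the function name and node names are invented for illustration.

```python
def spread_tasks(nodes: list[str], replicas: int) -> dict[str, int]:
    """Place replicas one at a time on the node that currently runs the fewest tasks."""
    if not nodes:
        raise ValueError("at least one node is required")
    counts = {node: 0 for node in nodes}
    for _ in range(replicas):
        # min() returns the first node with the lowest task count.
        target = min(nodes, key=lambda node: counts[node])
        counts[target] += 1
    return counts
```

With a single node, spread_tasks(["node-1"], 5) puts all five tasks on that node; with three nodes the same five tasks end up split 2/2/1.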
@namokarm did a great job, and I'm filling in the gaps:
Benefits of Swarm over docker run or docker-compose.
All communication between containers has to be TCP/UDP etc. You could force two containers to only run on a single machine, then bind-mount their socket so they skip the network, but that would be a bit of an anti-pattern. Swarm is designed for everything to be distributed and TCP/UDP.
In a few cases, such as PHP-FPM + Nginx, I recommend bundling both in the same container (against Docker best practices, but trust me, it's easier than separate containers). This will ensure they scale together (1-to-1 relationship) and stay fast, since they use local sockets to communicate. I only recommend this for a few setups like this, the other being ColdFusion + Nginx, because they are two parts of the same tool that provide an HTTP response... I don't recommend bundling images together in nearly all other cases, but I'm open to ideas :).
Rancher is no longer supporting Swarm. Portainer and SwarmPit are GUI options.
Yes, a container running something like Portainer/SwarmPit, or one controlling the Docker socket through a bind-mount or TCP, can control the whole Swarm. This is how all Docker management works :)
For reverse proxy, you would run a container-based proxy like Traefik or Docker Flow Proxy, which sets up HAProxy for Docker and Swarm.
Many of these topics are discussed in my DockerCon talks: https://www.bretfisher.com/dockercon18/
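As a rough illustration of "controlling the Docker socket through a bind-mount", the sketch below lists containers by speaking HTTP over the Unix socket with only the Python standard library. It assumes the host's /var/run/docker.sock has been bind-mounted into the container at the same path; UnixHTTPConnection and list_containers are illustrative names, not part of any Docker library. GET /containers/json is the Docker Engine API endpoint for listing containers.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 10.0):
        # The host name is only used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def list_containers(socket_path: str = "/var/run/docker.sock") -> list:
    """Return the parsed JSON list from GET /containers/json."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/containers/json")
        response = conn.getresponse()
        if response.status != 200:
            raise RuntimeError(f"Docker API returned HTTP {response.status}")
        return json.loads(response.read())
    finally:
        conn.close()
```

Anything that can reach this socket effectively has full control over the Docker engine, so it is worth mounting it only into containers you trust with that.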

Difference between Docker container and service

I'm wondering whether there are any differences between the following docker setups.
Administrating two separate docker engines via the remote api.
Administrating two docker swarm nodes via one single docker engine.
If you can administer a swarm and still run a container on a specific node, are there any use cases for having separate Docker engines?
The difference between the two is swarm mode. When a docker engine is running services in swarm mode you get:
Orchestration from the manager to continuously try to correct any differences between the current state and the target state. This can also include HA using the quorum model (as long as a majority of the managers are reachable to make decisions).
Overlay networking which allows containers on different hosts to talk to each other on their own container network. That can also involve IPSEC for security.
Mesh networking for published ports and a VIP for the service that doesn't change like container IPs do. The VIP prevents problems from DNS caching. The mesh networking has all nodes in the swarm publish the port and route traffic to a container providing the service.
Rolling upgrades to avoid any downtime with replicated services.
Load balancing across multiple nodes when scaling up a service.
More details on swarm mode are available from docker's documentation.
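The quorum rule mentioned in the list above ("a majority of the managers") comes down to simple arithmetic. A small Python sketch, with function names chosen here for illustration:

```python
def quorum_size(managers: int) -> int:
    """Smallest number of managers that forms a majority."""
    if managers < 1:
        raise ValueError("a swarm needs at least one manager")
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many managers can be lost while a majority is still reachable."""
    return managers - quorum_size(managers)
```

This is why odd manager counts are usually preferred: going from 3 to 4 managers raises the quorum from 2 to 3 but still only tolerates one failure.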
The downside of swarm mode is that you are one layer removed from the containers when they run on a remote node. You can't run an exec command on a task to investigate a container; you need to run it against the container itself, on the node where it's currently running. Docker also removed some options from services, like --volumes-from, which don't apply when containers may be running on different machines.
If you think you may grow beyond running containers on a single node, need to communicate between the containers on different nodes, or simply want the orchestration features like rolling upgrades, then I would recommend swarm mode. I'd only manage containers directly on the hosts if you have a specific requirement that prevents swarm mode from being an option. And you can always do both, manage some containers directly and others as a service or stack inside of swarm, on the same nodes.

How is load balancing done in Docker-Swarm mode

I'm working on a project to set up a cloud architecture using docker-swarm. I know that with swarm I could deploy replicas of a service which means multiple containers of that image will be running to serve requests.
I also read that docker has an internal load balancer that manages this request distribution.
However, I need help in understanding the following:
Say I have a container that exposes a service as a REST API, or say it's a web app. I have multiple containers (replicas) of it deployed in the swarm, and I have other containers (running some apps) that talk to this HTTP/REST service.
Then, when I write those apps which IP:PORT combination do I use? Is it any of the worker node IP's running these services? Will doing so take care of distributing the load appropriately even amongst other workers/manager running the same service?
Or should I call the manager which in turn takes care of routing appropriately (even if the manager node does not have a container running this specific service)?
Thanks.
when I write those apps which IP:PORT combination do I use? Is it any of the worker node IP's running these services?
You can use any node that is participating in the swarm, even if no replica of the service in question exists on that node.
So you will use a Node:HostPort combination. The ingress routing mesh will route the request to an active container.
Will doing so take care of distributing the load appropriately even amongst other workers/manager running the same service?
The ingress controller will do round robin by default.
Clients could use DNS round robin to reach the service on the Docker swarm nodes, but then the classic DNS caching problem occurs. To avoid that, we can use an external load balancer such as HAProxy.
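A toy Python model of the behaviour described in this answer: a request may enter at any node, and the mesh hands it to the next active task in round-robin order. This is only a sketch of the idea, not how Docker implements it, and the class, task and node names are invented for illustration.

```python
from itertools import cycle


class RoutingMeshSketch:
    """Toy model: every node accepts the request, tasks are picked round robin."""

    def __init__(self, active_tasks: list[str]):
        if not active_tasks:
            raise ValueError("the service has no active tasks")
        self._next_task = cycle(active_tasks)

    def route(self, entry_node: str) -> str:
        # entry_node is deliberately ignored: the node that receives the
        # request does not need to run a replica of the service itself.
        return next(self._next_task)
```

For example, with tasks on node-a and node-b, four requests that all enter through node-c would alternate between the two tasks.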
An important addition to the existing answer:
The advantage of using a proxy (HAProxy) in front of Docker swarm is that the swarm nodes can reside on a private network that is accessible to the proxy server but not publicly accessible. This makes your cluster more secure.
If you are using AWS VPC, you can create a private subnet and place your swarm nodes inside the private subnet and place the proxy server in public subnet which can forward the traffic to the swarm nodes.
When you access the HAProxy load balancer, it forwards requests to nodes in the swarm. The swarm routing mesh routes the request to an active task. If, for any reason the swarm scheduler dispatches tasks to different nodes, you don’t need to reconfigure the load balancer.
For more details please read https://docs.docker.com/engine/swarm/ingress/

service discovery in docker without using consul

I'm new to docker and microservices. I've started to decompose my web-app into microservices and currently, I'm doing manual configuration.
After some study, I came across docker swarm mode which allows service discovery. Also, I came across other tools for service discovery such as Eureka and Consul.
My main aim is to replace IP addresses in curl call with service name and load balance between multiple instances of same service.
For example, I want to go from curl http://192.168.0.11:8080/ to curl http://my-service.
I have to keep my services language independent.
Do I need to use Consul with Docker swarm for service discovery, or can I do it without Consul? What are the advantages?
With the new "swarm mode", you can use docker services to create clustered services across multiple swarm nodes. You can then access those same services, load-balanced, by using the service name rather than the node name in your requests.
This only applies to nodes within the swarm's overlay network. If your client systems are part of the same swarm, then discovery should work out-of-the-box with no need for any external solutions.
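From inside a container on the same overlay network, this discovery is just a DNS lookup of the service name. Below is a minimal Python sketch; my-service is the name from the question, while the port, the function name and the injectable resolver parameter (there only to make the sketch easy to try without a swarm) are assumptions. In swarm mode the plain service name normally resolves to the service's virtual IP, while tasks.<service-name> resolves to the individual task addresses.

```python
import socket


def resolve_service(name: str, port: int, resolver=socket.getaddrinfo) -> list[str]:
    """Return the distinct IP addresses that DNS reports for a service name."""
    infos = resolver(name, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Inside the swarm, resolve_service("my-service", 8080) would typically return the single VIP, and resolve_service("tasks.my-service", 8080) one address per running task.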
On the other hand, if you want to be able to discover the services from systems outside the swarm, you have a few options:
For stateless services, you could use docker's routing mesh, which will make the service port available across all swarm nodes. That way you can just point at any node in the swarm, and docker will direct your request to a node that is running the service (regardless of whether the node you hit has the service or not).
Use an actual load balancer in front of your swarm services if you need to control routing or deal with different states. This could either be another Docker service (e.g. haproxy, nginx) launched with the --mode global option to ensure it runs on all nodes, or a separate load balancer like a Citrix NetScaler. You would need to have your service containers reconfigure the LB through their startup scripts or via provisioning tools (or add them manually).
Use something like consul for external service discovery. Possibly in conjunction with registrator to add services automatically. In this scenario you just configure your external clients to use the consul server/cluster for DNS resolution (or use the API).
You could of course just move your service consumers into the swarm as well. If you're separating the clients from the services in different physical VLANs (or VPCs etc) though, you would need to launch your client containers in separate overlay networks to ensure you don't effectively defeat any physical network segregation already in place.
Service discovery (via DNS) has been built into Docker since version 1.12. When you create a custom network (such as a bridge network, or an overlay network if you have multiple hosts), containers can simply talk to each other by name as long as they are part of the same network. You can also give containers a network alias; a lookup of the alias is then round-robined across the containers that share it. For a simple example, see:
https://linuxctl.com/docker-networking-options-bridge
As long as you are using the bridge mode for your docker network and creating your containers inside that network, service discovery is available to you out of the box.
You will need help from other tools once your infrastructure starts to span multiple servers with microservices distributed across them.
Swarm is a good tool to start with, however, I would like to stick to consul if it comes to any IaaS provider like Amazon for my production loads.

Resources