I'm trying to deploy couchbase community edition in a docker swarm environment. I followed the steps suggested by Arun Gupta, though I'm not sure if a master-worker model is desired as Couchbase doesn't have the notion of master/slave model.
Following are the problems I encountered. I'm wondering if anyone is able to run Couchbase successfully in a swarm mode.
Docker swarm assigns different IP address each time the service is restarted. Sometimes, docker moves the service to a new node which, again assigns a different IP address. It appears that Couchbase doesn't start if it finds a new IP address. (log says "address on which the service is configured is not up. Waiting for the interface to be brought up"). I'm using a host mounted volume as the data folder (/opt/couchase/var) to persist data across restarts.
I tried to read overlay network address used internally and update ip and ip_start files in a run script within the container. This doesn't help either. Server comes up as a new instance without loading old data. This is a real problem as production data can be lost if docker swarm moves services around.
docker swarm's internal router assigns an address from overlay network in addition to other interfaces. I tried using localhost, master.overlaynet, IP address of the overlaynet, private address assigned by docker to container etc. as server address in the Couchbase cluster configuration. While the cluster servers are able to communicate to each other, this created another problem with client connections. Client normally connects to an address/port exposed by the swarm cluster. This is different from cluster node address. In case of a python client, it reads Couchbase cluster server addresses and tried to connect to that if overlay address is given as server address at the time of joining the cluster. The client times out as the address is not reachable.
I might be able to add a network address constraint to the yaml file to ensure that master node will come up with the same address. For eg.
networks:
default:
ipv4_address: 172.20.x.xx
Above approach may not work for worker nodes as that will impact ability to scale worker nodes based on load/growth.
In this model (master/worker), how does a worker get elected as leader if master node goes down? Is master/worker the right approach for a Couchbase cluster in swarm environment?
It will be helpful if I can get some references to Couchbase swarm mode setup or some suggestions on how to handle IP address change.
We ran into the same problem (couchbase server 5.1.1) and our temporary solution is to use fixed IPs on a new docker bridge network.
networks:<br>
default:<br>
ipv4_address: 172.19.0.x
Although this works, this is not a good solution as we loose auto-scaling as mentioned above. We had some learnings during setup. Just to let you know:
You can run a single-node couchbase setup with dynamic IP. You can stop/restart this container and update couchbase-server version with no limitations.
When you add a second node this initially works with dynamic IP as well during setup. You can add the server and rebalance the cluster. But when you stop/restart/scale 0/1 a couchbase container, it won't start up anymore due to a new IP provides by docker (10.0.0.x with default network).
Changing the "ip" or "ip_start" files (/opt/couchbase/var/lib/couchbase/config) to update the IP does NOT work. Server starts up as "new" server, when changing the ip in "ip" and "ip_start" but it still has all the data. So you can backup your data, if you need now. So even after you "switched" to fixed IP you can't re-start the server directly, but need to cbbackup and cbrestore.
https://docs.couchbase.com/server/5.1/install/hostnames.html documentation for using hostnames is a little misleading as this only documents how to "find" a new server while configuring a cluster. If you specify hostnames couchbase anyway configures all nodes with the static IPs.
You might start your docker swarm with host network might be a solution, but we run multiple instances of other containers on a single host, so we would like to avoid that solution.
So always have a backup of the node/cluster. We always make a file-backup and a cluster-backup with cbbackup. As restoring from a file backup is much faster.
There is a discussion at https://github.com/couchbase/docker/issues/82 on this issue, but this involves using AWS for static IPs, which we don't.
I am aware of couchbase autonomous operator for kubernetes, but for now we would like to stay with docker swarm. If anybody has a nicer solution for this, how to configure couchbase to use hostnames, please share.
Related
I created a standalone container for php based website that is suppose to get data from mysql database.
On same VM, i started swarm service for mysql.
I noticed following
1) When I publish ports in mysql service, it was by default connecting to ingress network (which is obvious) so no questions there
2) When I do not publish any port, it got connected to "bridge" network by default. Should it connect to docker_gwbridge network created by swarm? or may I am missing something
2nd point related to DNS resolution, I was able to create a user defined bridge network for my php website and also manually attached the mysql container (which was created inside swarm) directly to that user defined network (I know not a good practice, but was trying to play around). This way, I was able to use DNS resolution as they both being on same network, I just provided mysql container name and my website started working. But ofcourse this will not long live, as my containers can get change over the period of time.
Hence My 2nd question is -
Can I provide DNS name of my swarm service (rather container) in my php db connection string (created in stand alone container). I was not able to make it work as service belongs to different network it seems
I have some experience with Docker Compose and container linking. In a non-swarm environment, you could easily connect from, e.g, the web container to the db_mysql container using its name (for example, in PHP I can configure the MySQL connection to be:
$dsn = 'mysql:host=db_mysql;
I am having a hard time understanding how that works with Docker in Swarm mode, especially considering the "replicas" and "load balancing" mechanisms.
Let's say I have 5 different Docker Machines, each having a different public IP, participating in a Swarm. I also have a web service and a db service that's replicated across these 5 different machines (1 instance per each machine).
My question is: how do I make any of the 5 web containers, communicate to any of the 5 db_mysql containers without forcing these web containers to have knowledge of any Docker Machine public IPs or the fact that these containers live within a Swarm?
You use the service name. This will resolve in DNS to either a VIP or the 5 ip addresses (one for each replica) of the service. Under the covers, the VIP uses IPVS to round robin to one of the healthy replicas without suffering from stale DNS issues. You can also get all the replica IP addresses using service_name.tasks even if you use the default VIP.
In Docker's DNS implementation, you can resolve the container name, and any network alias. The network alias includes the service name with DNSRR (used by docker-compose without swarm). Or the service name resolves to a VIP in swarm mode. The hostname of the container does not resolve, likely because it can change outside of the control (and therefore knowledge) of the docker engine.
Using Docker version 19.03.5 the correct DNS name to query in order to obtain all the IP addresses of the replica of a service is the following:
tasks.<service-name>
How do you access remote Docker container by its hostname?
I need to access remote Docker containers by its hostnames (or some constant IP's) for development and testing purposes. I have tried:
looking for any DNS approach (have not found any clues),
importing /ets/hosts (probably impossible),
creating tunnes (only this works but it is very time consuming).
It's the same as running any other process on a host, Docker or not Docker: you access it via the host name or IP address of the host and the port the service is listening on (the first port of the docker run -p argument). Docker containers don't have externally visible individual IP addresses any more than non-Docker HTTP or ssh daemons do.
If you do have DNS infrastructure available to you, you could set up CNAME records to resolve particular service names to the specific hosts that are running them.
One solution that may help you is some sort of service registry; in the past I've used Consul with some success. You can configure Consul with some health checks or other probes ("look for an HTTP service on port 12345 that answers GET / calls"), and it will provide its own DNS service ("okay, http://whatevername.service.consul:12345/ will reach your service on whichever hosts it happens to be running on").
Nothing in the Docker infrastructure specifically helps this. Using /etc/hosts is distinctly not a best practice: the name-to-IP mapping needs to be kept in sync across all machines and you'll start wishing you had a network service to publish it for you, which is exactly what DNS is for.
I'm trying to figure out whether Docker Swarm or Kubernetes are a good choice for my use case.
Basically, I want to build a small cluster of forward proxies (via squid, nginx or a custom nodejs script), and be able to deploy/start/stop/purge them all together.
I should be able to access the proxy cluster via a single IP address, manager should be able to load-balance the request to a node, and each proxy node must use a unique outgoing IP address.
I'm wondering:
Are Docker Swarm and/or Kubernetes the right way to go about it?
If so, should I set-up Docker Swarm and/or Kubernetes and its worker nodes (running the proxy) on a single dedicated server or separate virtual servers?
Is it also possible for all the cluster nodes to share a file system storage for caching, common config etc.
Any other tips to get this working.
Thanks!
Docker running in swarm mode should work well for this
Run docker on a single dedicated server; I see no need for virtual servers. You could also run the swarm across multiple dedicated servers.
https://docs.docker.com/engine/swarm/secrets/ work well for some settings and configurations. If you require significant storage, simply add a database service to your cluster
Docker swarm mode fits your requirements quite well; requests are automatically balanced across your swarm and each service instance can be configured to have a unique address. You should check out the swarm mode tutorial: https://docs.docker.com/engine/swarm/swarm-tutorial/
Elasticsearch is designed to run in cluster mode, all I have to do is to define the relevant node IPs in the cluster via environment variable and as long as network connectivity is available it will connect and join the other nodes to the cluster.
I have 3 nodes, 1 is acting as the docker swarm manager and the other two are workers. I have initialized the manager and joined the worker nodes and everything looks ok from that standpoint.
Now I'm trying to run the elasticsearch container in a way that will allow me to join all nodes to the same elasticsearch cluster, however, I want the nodes to join using their overlay network interface and that means that I need to know the container internal IP addresses at the time of running the docker service create command, how can I do this? Do I have to use something like consul to achieve this?
Some clarifications:
I need to know, at the time of service creation the IP addresses (or DNS names) for all Elasticsearch participants so I could start the cluster correctly. This has to be at the time of creation and not afterwards. Also, as I understand, I can expose ports 9200/9300 for all services and work with the external machine IPs and get it to work, but I would like to use the overlay network to do all these communications (I thought this is what swarm mode is for).
Only a partial solution here.
So, when attaching your services to a custom overlay network, you indeed have access to Docker's custom Service discovery feature. I'll detail the networking feature of Docker Swarm mode, before trying to tie it to your problem.
I'll be using the different term of services and tasks, in which a service could be elasticsearch, whereas a task is a single instance of that elasticsearch service.
Docker networking
The idea is that for each services you create, docker assigns a Virtual IP (VIP), and a custom dns alias. You can retrieve this VIP using the docker service inspect myservice command.
But, there is two modes to attach a service to an overlay network dnsrr and VIP. You can select these options using the --endpoint-mode options of docker service create.
The VIP mode (I believe it is the default one, or at least the most used), affects the virtual ip to the service's dns alias. This means that doing an nslookup servicename would return to you a single vip, that behind the scenes, would be linked to one of your container in a round robin fashion. But, there is also a special dns alias that lets you access all of your instances ips (all of your tasks ips) : tasks.myservice.
So in VIP mode you can retrieve all of your tasks ips using a simple nslookup tasks.myservice, where myservice is a service name.
The other mode is dnsrr. This mode simply gets rid of the VIP, and connects the dns alias to the different tasks (=service instances), in a round robin way. This way, you simply have to do a nslookup myservice to retrieve the different service instances ip.
Elasticsearch clustering
Ok so first of all I'm not really familiar with the way elasticsearch lets you cluster. From what I understood from your question, you need when running the elasticsearch binary, give it as a parameter, the adress of all of the other nodes it needs to cluster with.
So what I would do, is to create a custom Elasticsearch image, probably based on the one from the default library, to which I would add a custom Entrypoint that would firstly run a script to retrieve the other tasks ip.
I'd believe that staying in VIP mode is suitable for you, since there is the tasks.myservice dns alias. You'll then need to parse the output to retrieve the tasks ip (and probably remove yours). Then you'll be able to save them in a config file environment variable, or use them as a runtime option for your elasticsearch binary.
Edit: To create a custom overlay network, you will need to use the docker network create command, and use the --network option of docker service create
This is answer is mainly based on the Swarm mode networking documentation