add nodes to docker swarm from different servers - docker

I am new to docker swarm. I read the documentation and googled the topic, but the results were vague.
Is it possible to add worker or manager nodes from distinct, separate virtual private servers?

The idea is to connect many unrelated hosts into a swarm, which then gives you distribution over many systems and resiliency in case of any hardware failures. The only things you need to watch out for are that the internet connection between the hosts is stable and that all of the ports required by the official documentation are open. And you are good to go :)
Oh, and between managers you want a VERY stable internet connection without random ping spikes, or you may encounter weird behaviour (because of the Raft consensus and decision making between managers).
Other than that, it is good.
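For reference, joining nodes from separate VPSes is the standard init/join flow; a minimal sketch, assuming a placeholder public IP and an Ubuntu host with ufw for the firewall part:

    # On the first VPS (203.0.113.10 is a placeholder public IP):
    docker swarm init --advertise-addr 203.0.113.10

    # On every other VPS, using the token printed by "swarm init":
    docker swarm join --token SWMTKN-1-<worker-token> 203.0.113.10:2377

    # To print a join token for an additional manager instead:
    docker swarm join-token manager

    # Ports to open between the hosts (per the official docs), e.g. with ufw:
    ufw allow 2377/tcp   # cluster management traffic
    ufw allow 7946/tcp   # node-to-node communication
    ufw allow 7946/udp
    ufw allow 4789/udp   # overlay network (VXLAN) data traffic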

Refer to Administer and maintain a swarm of Docker Engines
In production, the best practice to maximise swarm HA is to spread your swarm managers across multiple availability zones. Availability zones are geographically close but distinct sites: i.e. instead of having a single London data centre, have three, each connected to a different internet and power utility. That way, if any single ISP or power utility has an outage, you still have two data centres connected to the internet.
Swarm was designed with this kind of highly available topology in mind and can scale to having its managers - and workers - distributed across nodes in different data centres.
However, Swarm is sensitive to latency over longer distances, so global distribution is not a good idea. Within a single city, data centre to data centre latencies will be in the low tens of milliseconds, which is fine.
Connecting data centres in different cities or continents pushes the latency into the low-to-mid hundreds of milliseconds, which does cause problems and leads to instability.
Otherwise, go ahead. Build your swarm across AZ-distributed nodes.
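As a hedged sketch of spreading a workload across AZ-labelled nodes (node names, the label, and the image are illustrative):

    # Label each node with its availability zone (run on a manager):
    docker node update --label-add az=az1 node1
    docker node update --label-add az=az2 node2
    docker node update --label-add az=az3 node3

    # Spread a service's replicas evenly across the zones:
    docker service create --name web --replicas 6 \
      --placement-pref 'spread=node.labels.az' \
      nginx:alpine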

Related

Why should I run multiple elasticsearch nodes on a single docker host?

There are a lot of articles online about running an Elasticsearch multi-node cluster using docker-compose, including the official documentation for Elasticsearch 8.0. However, I cannot find a reason why you would set up multiple nodes on the same docker host. Is this the recommended setup for a production environment, or is it just an example for putting theory into practice?
You shouldn't consider this a production environment. The guides are examples, often for lab environments and testing scenarios with the application. I would not consider them production ready, and compose is often not considered a production-grade tool, since everything it does applies to a single docker node, whereas in production you typically want multiple nodes spread across multiple availability zones.
Since an ES node's heap should never be more than half the available memory (and should stay below ~30.5GB), one reason it makes sense to have several nodes on a given host is when you have hosts with ample memory (say 128GB+). In that case you could run 2 ES nodes (with 64GB of memory each: 30.5GB heap and the rest for Lucene) on the same host by correctly constraining each Docker container.
Note that the above is not related to Docker; you can always configure several nodes per host, whether with Docker or not.
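As a hedged illustration of "correctly constraining each Docker container" (the memory values are illustrative, and a real multi-node setup also needs discovery/network settings that are omitted here):

    # Two ES nodes on one 128GB host, each capped at 64GB of container
    # memory with a ~30GB heap, leaving the rest to Lucene/page cache:
    docker run -d --name es01 -m 64g \
      -e ES_JAVA_OPTS="-Xms30g -Xmx30g" \
      docker.elastic.co/elasticsearch/elasticsearch:8.0.0
    docker run -d --name es02 -m 64g \
      -e ES_JAVA_OPTS="-Xms30g -Xmx30g" \
      docker.elastic.co/elasticsearch/elasticsearch:8.0.0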
Regarding production, and given the fact that 2+ nodes would run on the same host: if you lose that host, you lose two nodes, which is not good. However, depending on how many hosts you have, it might be a lesser problem, if and only if each host is in a different availability zone and you have the appropriate cluster/shard allocation awareness settings configured, which ensures that your data is redundantly copied to 2+ availability zones. In this case, losing a host (2 nodes) would still keep your cluster running, although in degraded mode.
It's worth noting that Elastic Cloud Enterprise (which powers Elastic Cloud) is designed to run several nodes per host depending on the sizing of the nodes and the available hardware. You can find more info on hardware prerequisites as well as how medium and large scale deployments make use of one or more large 256GB hosts per availability zone.

Docker swarm regionalization for a latency-sensitive topology

We are currently operating a backend stack in Central Europe, Japan, and Taiwan, and are preparing our stack to transition to docker swarm.
We are working with real-time data streams from sensor networks to provide fast disaster warnings, which means that latency is critical for some services. Therefore, we currently have brokers (rabbitmq) running on dedicated servers in each region, as well as a backend instance digesting the data that is sent across these brokers.
I'm uncertain how to best achieve a comparable topology using docker swarm. Is it possible to group nodes, let's say by region, and then deploy a latency-critical service stack to each of these groups? Or should I create separate swarms for each region (which feels conceptually contradictory to docker swarm)?
The swarm managers should be in a low-latency zone. Swarm workers can be anywhere. You can use a node label to indicate the location of the node, and restrict your workloads to a particular label as needed (see the sketch below).
Latency-critical considerations on the container-to-container network across large regional boundaries may be relevant, depending on your required data path. If the only latency-critical data path is to the rabbitmq service, which is external to the swarm, then you won't need to worry about container-to-container latency.
It is also a valid pattern to have one swarm per region. If you need to be able to lose any region without impacting services in another region, then you'd want to split it up. If you have multiple low-latency regions, then you can spread the manager nodes across those.
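A hedged sketch of the label/constraint approach (node names, label values, and the image are placeholders):

    # On a manager, label nodes by region:
    docker node update --label-add region=eu node-eu-1
    docker node update --label-add region=jp node-jp-1

    # Pin a latency-critical service to one region:
    docker service create --name ingest-eu \
      --constraint 'node.labels.region == eu' \
      myorg/ingest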

How to emulate a network of 500-50000 (docker) worker nodes?

So I have a worker docker image. I want to spin up a network of 500-50000 nodes to emulate what happens to a private blockchain such as Ethereum at different scales. What would be a recommendation for an open-source tool/library for such a job:
a) one that would make sure that even on a low-endish setup (say one 40-core node) all workers will be moved forward in time equally (not realtime)
b) one that would allow (a) in a distributed setting (say 10 low-endish nodes on a single LAN)
In other words, I am not looking for realtime network emulation, so I can wait 10 hours to simulate 1 minute and it would be good enough for me. I thought about Kathara, yet a problem still stands: how to make sure that, say, 10000 containers are given the same amount of ticks in a round-robin manner?
So how can I emulate a complex network of docker workers?
I'm making the assumption that you will run each worker inside of a container. To ensure each container runs with similar CPU access, you can configure CPU reservations and limits on each replica. These numbers get computed down to fractional slices of a core, so on an 8-core system, you could give each container 0.01 of a core to run upwards of 800 containers. See the compose documentation on how to set resource constraints. And with swarm mode, you could spread these replicas across multiple nodes, sharing a network.
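A hedged sketch of what those constraints could look like in swarm mode (the image name is a placeholder; compose's deploy.resources keys express the same limits):

    # ~800 replicas, each reserved/limited to 1% of a core; on an
    # 8-core host this fully subscribes the CPUs.
    docker service create --name workers \
      --replicas 800 \
      --reserve-cpu 0.01 --limit-cpu 0.01 \
      myorg/blockchain-worker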
That said, I think the advice to run shorter simulations on more hardware is good. You will find a significant portion of the time is spent context switching between processes, possibly invalidating any measurements you want to take.
You will also encounter scalability issues with docker and the orchestration tool you choose. For example, you'll need to adjust the subnet size for any shared network, which defaults to a /24 with around 253 available IPs. The docker engine itself will likely spend a non-trivial amount of CPU time maintaining the state of all the running containers.
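For example, the default /24 can be widened when the shared network is created (the network name is illustrative):

    # An overlay network with a /16 (~65k addresses) instead of the
    # default /24; must be run on a swarm manager node.
    docker network create --driver overlay --subnet 10.10.0.0/16 simnet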

How many containers should exist per host in production? How should services be split?

I'm trying to understand the benefits of Docker better and I am not really understanding how it would work in production.
Let's say I have a web frontend, a rest api backend and a db. That makes 3 containers.
Let's say that I want 3 of the frontend, 5 of the backend and 7 of the db. (Minor question: Does it ever make sense to have fewer DBs than backend servers?)
Now, given the above scenario, if I package them all on the same host then I gain the benefit of efficiently using the resources of the host, but then I am DOA when that machine fails or has a network partition.
If I separate them into 1 full application (i.e. 1 FE, 1 BE & 1 DB) per host, and put extra containers on their own hosts, I get some of the advantages of using resources efficiently, but it seems to me that I still lose significantly when I have a network partition, since it will take down multiple services.
Hence I'm almost leaning to the conclusion that I should be putting 1 container per host, but then that means I am using my resources pretty inefficiently, and then what are the benefits of containers in production? I mean, an OS might be an extra couple of gigs per machine in storage size, but most cloud providers give you a minimum of 10 gigs of storage. And let's face it, a rest api backend or a web front end is not even going to come close to 10 gigs... even including the OS.
So, after all that, I'm trying to figure out if I'm missing the point of containers? Are the benefits of keeping all containers of an application on 1 host, mostly tied to testing and development benefits?
I know there are benefits from moving containers amongst different providers/machines easily, but for the most part, I don't see that as a huge gain personally since that was doable with images...
Are there any other benefits for containers in production that I am missing? Or are the main benefits for testing and development? (Am I thinking about containers in production wrong)?
Note: The question is very broad and could fill an entire book but I'll shed some light.
Benefits of containers
The exciting part about containers is not their use on a single host, but their use across hosts connected in a large cluster. Do not look at your machines as independent docker hosts, but as a pool of resources to host your containers.
Containers alone are not ground-breaking (i.e. Docker's CTO stated at the last DockerCon that "nobody cares about containers"), but coupled with state-of-the-art schedulers and container orchestration frameworks, they become a very powerful abstraction for handling production-grade software.
As to the argument that this also applies to virtual machines: yes, it does, but containers have some technical advantages (see: How is Docker different from a normal virtual machine?) over VMs that make them convenient to use.
On a Single host
On a single host, the benefits you can get from containers are (amongst many others):
Use as a development environment mimicking the behavior on a real production cluster.
Reproducible builds independent of the host (convenient for sharing)
Testing new software without bloating your machine with packages you won't use daily.
Extending from a single host to a pool of machines (cluster)
When time comes to manage a production cluster, there are two approaches:
Create a couple of docker hosts and run/connect containers together "manually" through scripts or using solutions like docker-compose. Monitoring the lifetime of your services/containers is your responsibility, and you should be prepared to handle service downtime.
Let a container orchestrator deal with everything and monitor the lifetime of your services to better cope with failures.
There are plenty of container orchestrators: Kubernetes, Swarm, Mesos, Nomad, Cloud Foundry, and probably many others. They power many large-scale companies and infrastructures, like eBay, so they have surely found a benefit in using them.
Pick the right replication strategy
A container is best used as a disposable resource, meaning you can stop and restart the DB independently and it shouldn't impact the backend (other than throwing an error because the DB is down). As such, you should be able to handle any kind of network partition as long as your services are properly replicated across several hosts.
You need to pick a proper replication strategy, to make sure your service stays up and running. You can for example replicate your DB across Cloud provider Availability Zones so that when an entire zone goes down, your data remains available.
Using Kubernetes, for example, you can put each of your containers (1 FE, 1 BE & 1 DB) in a pod. Kubernetes will deal with replicating this pod on many hosts and monitoring that these pods are always up and running; if not, a new pod will be created to cope with the failure.
If you want to mitigate the effect of network partitions, specify node affinities, hinting the scheduler to place containers on the same subset of machines and replicate on an appropriate number of hosts.
How many containers per host?
It really depends on the number of machines you use and the resources they have.
The rule is that you shouldn't bloat a host with too many containers if you don't specify any resource constraints (in terms of CPU or memory). Otherwise, you risk compromising the host and exhausting its resources, which in turn will impact all the other services on the machine. A good replication strategy is not only important at the single-service level, but also for ensuring good health of the pool of services that are sharing a host.
Resource constraints should be set depending on the type of your workload: a DB will probably use more resources than your front-end container, so you should size accordingly.
As an example, using Swarm, you can explicitly specify the number of CPUs or the amount of memory you need for a given service (see the docker service documentation). There are many possibilities, and you can also give an upper bound/lower bound in terms of CPU or memory usage. Depending on the values chosen, the scheduler will place the service on a machine with the resources available.
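A brief sketch of what that looks like on the command line (service name, image, and sizes are illustrative):

    # Reserve a floor and cap a ceiling for CPU and memory per task:
    docker service create --name db \
      --reserve-cpu 1 --limit-cpu 2 \
      --reserve-memory 4g --limit-memory 8g \
      postgres:15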
Kubernetes works pretty much the same way, and you can specify limits for your pods (see documentation).
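The rough Kubernetes equivalent, applied to an existing deployment (the deployment name "be" is a placeholder):

    # Set requests (lower bound for scheduling) and limits (hard cap):
    kubectl set resources deployment be \
      --requests=cpu=500m,memory=1Gi \
      --limits=cpu=1,memory=2Gi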
Mesos has more fine-grained resource management policies, with frameworks (for specific workloads like Hadoop, Spark, and many more) and over-committing capabilities. Mesos is especially convenient for big-data workloads.
How should services be split?
It really depends on the orchestration solution:
In Docker Swarm, you would create a service for each component (FE, BE, DB) and set the desired replication number for each service.
In Kubernetes, you can either create a pod encompassing the entire application (FE, BE, DB and the volume attached to the DB) or create separate pods for the FE, BE, DB+volume.
Generally: use one service per type of container. For groups of containers, evaluate whether it is more convenient to scale the entire group of containers (as an atomic unit, i.e. a pod) than to manage them separately.
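To make the Swarm option concrete, a minimal sketch matching the 3/5/7 example from the question (image names are placeholders):

    docker service create --name fe --replicas 3 myorg/frontend
    docker service create --name be --replicas 5 myorg/backend
    # a real DB would also need volumes and its own replication/
    # clustering story, not just stateless replicas:
    docker service create --name db --replicas 7 myorg/db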
Sum up
Containers are better used with an orchestration framework/platform. There are plenty of available solutions to deal with container scheduling and resource management. Pick one that might fit your use case, and learn how to use it. Always pick an appropriate replication strategy, keeping in mind possible failure modes. Specify resource constraints for your containers/services when possible to avoid resource exhaustion which could potentially lead to bringing a host down.
This depends on the type of application you run in your containers. Off the top of my head, I can think of a couple of different ways to look at this:
is your application disk-space heavy?
does the application need to be failsafe across multiple machines?
can you run multiple instances of different applications on the same host without decreasing their performance?
do you use software like kubernetes or swarm to handle your machines?
I think most of these questions are interesting to answer even without containers. Containers might free you from thinking about single hosts, but you still have to decide on and measure the load of your host machines yourself.
Minor question: Does it ever make sense to have fewer DBs than backend servers?
Yes.
Consider cases where you issue normal SQL select statements (without many joins) to get data from the database, but your business logic demands a lot of computation. In those cases you might consider keeping your back-end service count high and your database service count low.
It all depends on the use case being solved.
The number of containers per host depends on the design ratio of the host and the workload ratio of the containers. Both ratios are throughput/capacity ratios. In the old days, this was called E/B, for execution/bandwidth: execution was CPU and bandwidth was I/O. Solutions were said to be CPU-bound or I/O-bound.
Today memories are very large, so the critical factor is usually CPU/nest capacity. We describe workloads as CPU-intense or nest-intense. A useful proxy for nest capacity is the size of the highest-level cache, and a useful design-ratio estimator is (clock x cores)/cache. For the same core count, the machine with the lower design ratio will hold more containers, in part because the machine with more cache will scale better and see less saturation at higher utilization.

Bosun HA and scalability

I have a small bosun setup, and it's collecting metrics from numerous services, and we are planning to scale these services in the cloud.
This will mean more data coming into bosun, and hence the load/efficiency/scale of bosun is affected.
I am afraid of losing data due to network overhead and in case of failures.
I am looking for any performance benchmark reports for bosun, or any inputs on benchmarking/testing bosun for scale and HA.
Also, any inputs on good practices to be followed to scale bosun will be helpful.
My current thinking is to run numerous bosun binaries as a cluster, backed by a distributed opentsdb setup.
Also, I am wondering whether it is worthwhile to run some bosun executors as plain 'collectors' of scollector data (with the bosun -n command), and some just to calculate the alerts.
The problem with this approach is that the same alerts might be triggered from multiple bosun instances (those running without the -n option). Is there a better way to de-duplicate the alerts?
The current best practices are:
Use https://godoc.org/bosun.org/cmd/tsdbrelay to forward metrics to opentsdb. This gets the bosun binary out of the "critical path". It should also forward the metrics to bosun for indexing, and can duplicate the metric stream to multiple data centers for DR/Backups.
Make sure your hadoop/opentsdb cluster has at least 5 nodes. You can't do live maintenance on a 3 node cluster, and hadoop usually runs on a dozen or more nodes. We use Cloudera Manager to manage the hadoop cluster, and others have recommended Apache Ambari.
Use a load balancer like HAProxy to split the /api/put write traffic across multiple instances of tsdbrelay in an active/passive mode. We run one instance on each node (with tsdbrelay forwarding to the local opentsdb instance) and direct all write traffic at a primary write node (with multiple secondary/backup nodes).
Split the /api/query traffic across the remaining nodes, pointed directly at opentsdb (no need to go through the relay), in an active/active mode (aka round-robin or hash-based routing). This improves query performance by balancing queries across the non-write nodes.
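For illustration, a minimal haproxy.cfg sketch of that split (server names and IPs are placeholders; OpenTSDB's default port 4242 is assumed, and the real configs are linked below):

    frontend tsdb_in
        bind *:4242
        mode http
        acl is_put path_beg /api/put
        use_backend relay_write if is_put
        default_backend tsdb_read

    backend relay_write
        mode http
        # active/passive: all writes go to relay1 unless it is down
        server relay1 10.0.0.1:4242 check
        server relay2 10.0.0.2:4242 check backup

    backend tsdb_read
        mode http
        # active/active: round-robin queries across the non-write nodes
        balance roundrobin
        server tsdb2 10.0.0.2:4242 check
        server tsdb3 10.0.0.3:4242 check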
We only run a single bosun instance in each datacenter, with the DR site using the read only flag (any failover would be manual). It really isn't designed for HA yet, but in the future may allow two nodes to share a redis instance and allow active/active or active/passive HA.
By using tsdbrelay to duplicate the metric streams you don't have to deal with opentsdb/hbase replication and instead can setup multiple isolated monitoring systems in each datacenter and duplicate the metrics to whichever sites are appropriate. We have a primary and a DR site, and choose to duplicate all metrics to both data centers. I actually use the DR site daily for Grafana queries since it is closer to where I live.
You can find more details about production setups at http://bosun.org/resources including copies of all of the haproxy/tsdbrelay/etc configuration files we use at Stack Overflow.
