I am using Docker Swarm in an AWS environment.
I also use auto-scaling.
By default there are two instances, and when auto-scaling kicks in and adds an instance, I want the number of Swarm nodes and containers to grow accordingly.
For example, the service is currently set to replicas=4, so the manager node and the worker node each run two containers. If auto-scaling adds one instance, I want that new node to also run two containers, bringing the total number of containers to six. If yet another instance is added, the total should become eight.
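For reference, a minimal sketch of the starting point described above (the service name and image are placeholders, not from the original):

# With 2 nodes and replicas=4, the Swarm scheduler spreads 2 containers per node
docker service create --name web --replicas 4 nginx

# Today the replica count does not grow by itself when a third node joins;
# it has to be raised by hand, e.g.
docker service scale web=6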
There are a lot of articles online about running an Elasticsearch multi-node cluster using docker-compose, including the official documentation for Elasticsearch 8.0. However, I cannot find a reason why you would set up multiple nodes on the same Docker host. Is this the recommended setup for a production environment? Or is it just meant as an illustrative example?
You shouldn't consider this a production environment. The guides are examples, often for lab environments and testing scenarios with the application. I would not consider them production ready, and Compose is generally not considered a production-grade tool, since everything it does applies to a single Docker node, whereas in production you typically want multiple nodes spread across multiple availability zones.
Since one ES node's heap should never exceed half of the available memory (and should stay below ~30.5GB), one reason it makes sense to run several nodes on a given host is when you have hosts with ample memory (say 128GB+). In that case you could run 2 ES nodes (with 64GB of memory each: 30.5GB for the heap and the rest for Lucene) on the same host by correctly constraining each Docker container.
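For illustration, a hedged sketch of constraining each container on such a host (container names, network, and version are assumptions; cluster discovery and security settings are omitted):

docker network create esnet

# Node 1: cap the container at 64GB of RAM and the JVM heap at ~30GB
docker run -d --name es01 --network esnet --memory 64g \
  -e ES_JAVA_OPTS="-Xms30g -Xmx30g" \
  docker.elastic.co/elasticsearch/elasticsearch:8.0.0

# Node 2: same constraints, same host
docker run -d --name es02 --network esnet --memory 64g \
  -e ES_JAVA_OPTS="-Xms30g -Xmx30g" \
  docker.elastic.co/elasticsearch/elasticsearch:8.0.0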
Note that the above is not specific to Docker; you can always configure several nodes per host, whether you use Docker or not.
Regarding production, and given the fact that 2+ nodes would run on the same host: if you lose that host, you lose two nodes, which is not good. However, depending on how many hosts you have, it might be a lesser problem if, and only if, each host is in a different availability zone and you have the appropriate cluster/shard allocation awareness settings configured, which ensure that your data is redundantly copied across 2+ availability zones. In this case, losing a host (2 nodes) would still keep your cluster running, although in degraded mode.
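As a hedged sketch of that awareness configuration (zone names and container name are assumptions, and this relies on the official image's support for passing settings as environment variables):

# On a node in zone us-east-1a: tag the node with its zone and enable zone awareness
docker run -d --name es01 \
  -e node.attr.zone=us-east-1a \
  -e cluster.routing.allocation.awareness.attributes=zone \
  docker.elastic.co/elasticsearch/elasticsearch:8.0.0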
It's worth noting that Elastic Cloud Enterprise (which powers Elastic Cloud) is designed to run several nodes per host depending on the sizing of the nodes and the available hardware. You can find more info on hardware prerequisites, as well as on how medium- and large-scale deployments make use of one or more large 256GB hosts per availability zone.
If you have a host machine with, say, 3 VMs (or Docker containers) each running a different service, what's the point of adding a replica of one of these VMs/containers on the same host machine, and when would you need to do so? If the host machine is under a lot of traffic, which will lead to problems with CPU utilization and memory, how will creating even more instances help?
Docker swarm also allows users to create new instances of a running container without adding new nodes to the cluster. How can this possibly help?
When more traffic is going to your containers, you want more instances of them. With orchestrators such as Kubernetes you can spread your instances across many hosts and make them accessible through a single address.
Your assumption that replicas are supposed to be on the same host is wrong.
The very idea of replicas is to provide fault tolerance, and thus they need to be on different hosts so that if one host goes down, your service is still available on a different node. [Think node clusters]
That said, nothing stops you from creating the new instances on the same node, but that makes little sense and provides no added fault-tolerance benefit.
Coming to the part where you ask: if the host machine is already under stress due to load, how will it help to spawn a new instance there?
Well, it won't. That is precisely why we spawn it on a different node in the cluster. And behind the same IP, Kubernetes/Docker Swarm makes sure to load-balance between all of them.
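A minimal sketch of this with Docker Swarm (service name, image, and ports are assumptions, not from the original):

# Run 3 replicas behind a single published port; the scheduler spreads the
# tasks across the nodes in the cluster and the routing mesh load-balances between them
docker service create --name api --replicas 3 --publish 8080:80 my-image:latest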
Container-level scaling increases fault-tolerance.
Node-level scaling increases throughput.
You might simply run all services on one node (e.g. one VM) and, when it is overloaded, add another full instance of it. That works for on-demand resources such as CPU and disk I/O, which sit unused when idle, but each service also carries some fixed overhead, like RAM or database connections, and that overhead is wasted when you scale up everything rather than just the containers that need it.
By being able to scale on both the container level and the node level, you can add resources to your cluster (RAM, CPU) but allocate them only to the services that need them (see the sketch after the layout below).
So the scaling should be something like this:
Node 1:
service A
service B
service C
Node 2:
service A
service B
Node 3:
service B
Any doubles per node just helps with fault tolerance.
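For illustration, a minimal sketch of how that layout could be grown, assuming Docker Swarm and that A, B and C are Swarm services (names and placeholders are hypothetical):

# Add capacity to the cluster by joining a new node
# (run on the new machine; the token comes from `docker swarm join-token worker` on a manager)
docker swarm join --token <worker-token> <manager-ip>:2377

# Then allocate the new resources only to the service that needs them
docker service scale B=3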
Our current Docker cluster has nodes of mixed sizes, i.e. some nodes have more memory and storage than others.
Is there any way I can create two separate node groups for low-end and high-end nodes, so that I can provision heavy containers on the high-end nodes only?
I understand that using a constraint filter (https://docs.docker.com/swarm/scheduler/filter/) I can provision a container on a particular node by ID or name. But then I can't scale it dynamically if that node goes down or new nodes are added to the cluster.
No, you can't subdivide nodes within a swarm; you need to do it through labels. You apply labels either on the Docker engine (for the old standalone Docker Swarm) or on the node (for the new Swarm Mode).
Adding a label would be part of your onboarding for a new node - so all nodes have the appropriate labels and the scheduler can manage your services as you want.
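For example, a hedged sketch with Swarm Mode (label names, node names, and the image are assumptions):

# Label nodes as part of onboarding
docker node update --label-add tier=high big-node-1
docker node update --label-add tier=low small-node-1

# Constrain heavy services to the high-end group; the scheduler keeps
# honoring the constraint as nodes join or leave
docker service create --name heavy-app \
  --constraint 'node.labels.tier == high' \
  my-heavy-image:latest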
We have a little farm of Docker containers, spread over several Amazon instances.
Would it make sense to have fewer, bigger host instances (in terms of RAM and size) that each host multiple smaller containers at once, or to have one host instance per container, sized according to the container's needs?
EDIT #1
The issue here is that we need to decide up-front. I understand that we can decide later using various monitoring stats, but we need to make some architecture and infrastructure decisions before it goes into use. Moreover, we do not have control over what content is going to be deployed.
You should read "An Updated Performance Comparison of Virtual Machines and Linux Containers":
http://domino.research.ibm.com/library/cyberdig.nsf/papers/0929052195DD819C85257D2300681E7B/$File/rc25482.pdf
and "Resource management in Docker":
https://goldmann.pl/blog/2014/09/11/resource-management-in-docker/
You need to check how much memory, CPU, I/O, etc. your containers consume, and then you can draw your own conclusions.
At the very least, you can easily check a few things with docker stats and docker top my_container.
The associated docs:
https://docs.docker.com/engine/reference/commandline/stats/
https://docs.docker.com/engine/reference/commandline/top/
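For instance:

# One-shot snapshot of CPU, memory, network and block I/O for all running containers
docker stats --no-stream

# Processes running inside a specific container
docker top my_container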
When using Kubernetes to manage your Docker containers, particularly when using the replication controller, when should you increase an image's number of running container instances to more than one? I understand that Kubernetes can spawn as many container replicas as needed in the replication controller configuration file, but why spawn multiple running containers (for the same image) when you can just increase the VM's compute size? I would think that when you need more compute power you would increase the machine's CPU/RAM, and only when you reach the maximum compute size allowed (approximately 32 cores currently at Google) would you need to spawn multiple containers.
However, it would seem that spawning multiple containers regardless of VM size would give a more highly available service, yet Kubernetes will respawn failed containers even in a one-container replication controller setup. So what I can't figure out is: for what reason would I want more than one running container (for the same image) other than having run out of VM instance compute size?
I think you laid out the issues pretty well. The two kinds of scaling you described are called "vertical scaling" (increasing memory or CPU of a single instance) and "horizontal scaling" (increasing number of instances).
On availability: As you observed, you can achieve pretty good availability even with a single container, thanks to auto-restart (at the node level or replication controller level). But it can never be 100% because you will always have the downtime associated with restarting the process, either on the same machine or (if the machine failed) on a new machine. In contrast, horizontal scaling (running multiple replicas of the container) allows effectively "zero downtime" from the end-user's perspective, assuming you have some kind of load balancing or failover mechanism in place among the replicas, and your application is written in a way that allows replication.
On scalability: This is highly application-dependent. For example, vertically scaling CPU for a single-threaded application will not increase the workload it can handle, but running multiple replicas of it behind a load balancer (horizontal scaling) will. On the other hand, some applications aren't written in a way that allows them to be replicated, so for those vertical scaling is your only choice. Many applications (especially "cloud native" applications) are amenable to both horizontal and vertical scaling, but the details are application-dependent. Note that once you need to scale beyond the workload that a single node can handle (due to CPU or memory), you have no choice but to replicate (horizontal scaling).
So the short answer to your question is that people replicate for both availability and scalability.
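As a minimal sketch, horizontal scaling of the replication controller mentioned in the question looks like this (the controller name is an assumption):

# Scale the replication controller out to 3 replicas; the scheduler spreads
# them across the cluster and the service load-balances between them
kubectl scale rc my-app --replicas=3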
There are a variety of reasons why you would scale an application up or down.
The Kubernetes project is looking to provide auto-scaling in the future as a feature to dynamically size up and size down (potentially to 0) a replication controller in response to observed traffic. For a good discussion on auto-scaling, see the following write-up:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/proposals/autoscaling.md
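For reference, that proposal later materialized as the Horizontal Pod Autoscaler; a hedged example against a replication controller (name and thresholds are assumptions):

# Keep between 2 and 5 replicas, targeting 80% CPU utilization
kubectl autoscale rc my-app --min=2 --max=5 --cpu-percent=80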