Hi,
I have recently come across container-based virtualization, and my question is about auto-scaling.
I have read that when the current container runs out of resources, a new container is added to the host. How does adding a new container save resources?
Can anyone compare the scale of:
Scenario 1: a single container using all of the host's resources (leaving some resources for the host OS)
vs.
Scenario 2: two containers running on the same host, each using half of the resources consumed in the previous case
If the scale is greater in scenario 2, can anyone explain how scale has increased by having two containers even though the total resources are the same?
Consider the following scenario regarding the workload of an application (only CPU and memory are considered in this example):
Normal Workload
Requires 4 cores and 16 GB of memory.
Maximum Workload
Requires 8 cores and 32 GB of memory.
Assume that the burst of activity (max workload) happens for only 2 hours per day.
Case 1 - When the application is not containerized
The application has to reserve 8 cores and 32 GB of memory so that it can handle the max workload with the expected performance.
But 4 cores and 16 GB of memory are wasted for 22 hours a day.
Case 2 - When the application is containerized
Let's assume that a container with 4 cores and 16 GB of memory is spawned. The remaining resources, 4 cores and 16 GB of memory, are then available to other applications in the cluster, and a second container of the same configuration is spawned for this application only during the 2 hours per day when the max workload occurs.
Therefore, the cluster's resources are used optimally when applications are containerized.
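The savings can be made concrete with a little core-hour arithmetic (a rough sketch in Python; the core counts and the 2-hour burst window come from the example above):

```python
# Core-hour arithmetic for the example above (4 cores normal, 8 cores max, 2h burst).
HOURS_PER_DAY = 24
BURST_HOURS = 2      # max workload lasts 2 hours per day
NORMAL_CORES = 4     # normal workload needs 4 cores
MAX_CORES = 8        # max workload needs 8 cores

# Case 1: the non-containerized app reserves for the max workload all day.
reserved_case1 = MAX_CORES * HOURS_PER_DAY  # 192 core-hours/day

# Case 2: containers reserve only what is needed, when it is needed:
# 4 cores for 22 hours plus 8 cores for the 2 burst hours.
reserved_case2 = (NORMAL_CORES * (HOURS_PER_DAY - BURST_HOURS)
                  + MAX_CORES * BURST_HOURS)  # 104 core-hours/day

freed_for_cluster = reserved_case1 - reserved_case2
print(freed_for_cluster)  # 88 core-hours/day available to other applications
```

So in this example the containerized layout frees 88 core-hours per day (and, by the same arithmetic, the matching memory) for other applications in the cluster.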
What if a single machine does not have all the resources required by the application?
In such cases, if the application is containerized, containers/resources can be allocated from multiple machines in the cluster.
Increased fault tolerance
If an application is running on a single machine and the machine goes down, the whole application becomes unavailable. But a containerized application runs on different machines in the cluster, so if a machine fails, only a few containers become unavailable.
Regarding your question: if the application's workload is going to be uniform throughout its lifetime, then there is no benefit, in terms of scale, in breaking the application into smaller containers. You may still consider containerizing it for the other benefits. In terms of scale, containerizing an application pays off only when the workload varies or more workload is anticipated in the future.
how scale has increased by having two containers even though total resources are the same?
It usually doesn't. If you have a single-threaded application on a multi-core host, then scaling to multiple instances of the container on the same host will give that application access to more cores. This can also help a multi-threaded application if it is limited by internal resource contention rather than using all of the cores. But for most multi-threaded processes, scaling the application up on a single host will not improve performance.
What scaling does help with in a multi-node environment is allowing the application to run on other hosts in the cluster that are not fully utilized. This is the horizontal scaling that most people target with 12-factor apps. It allows apps deployed to cloud environments to scale out with more replicas/shards by adding more nodes, rather than trying to find more resources for a single large node.
Related
I understand the use of replicas in Docker Swarm mode. It is mainly to eliminate points of failure and reduce the amount of downtime. It is well explained in this post.
Since having more replicas is more useful for a system as a whole, why don't companies just initialise as many replicas as possible, e.g. 1,000 replicas for a Docker service? I can imagine that a large corporation running a back-end system may face multiple points of failure at any given time, and it would benefit from having more instances of the particular service.
I would like to know how many replicas are considered TOO MUCH and what are the factors affecting the performance of a Docker Swarm?
I can think of hardware overhead being a limiting factor.
Let's say you're running a Rails app. Each instance requires 128 MB of RAM and 10% CPU usage. Nine instances is a touch over 1 GB of memory and one entire CPU.
While that does not sound like a lot, imagine an organization with 100+ teams, each with 3-5 applications. The hardware requirements to operate applications at acceptable levels quickly ramp up.
Then there is network chatter. 10 MB/s is typical in big org/corporate settings. While a heartbeat check for a couple of instances is barely noticeable, heartbeats on hundreds of instances can jam up the network.
At the end of the day it comes down to constraints. What are the boundaries within the software, hardware, environment, budget, and support systems? It is often hard to imagine the pressures present when (technical) decisions are made.
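A back-of-the-envelope sketch of that ramp-up (the per-instance figures are from the Rails example above; the team and app counts are hypothetical):

```python
# Per-instance footprint from the Rails example above.
RAM_PER_INSTANCE_MB = 128
CPU_PER_INSTANCE = 0.10  # 10% of one core

def footprint(instances):
    """Return (RAM in GB, CPU cores) consumed by `instances` replicas."""
    return instances * RAM_PER_INSTANCE_MB / 1024, instances * CPU_PER_INSTANCE

# 9 instances: a touch over 1 GB of memory and almost one entire CPU.
ram_gb, cores = footprint(9)
print(ram_gb, cores)

# Hypothetical organization: 100 teams, 4 apps each, 9 replicas per app.
total_ram_gb, total_cores = footprint(100 * 4 * 9)
print(total_ram_gb, total_cores)  # hundreds of GB of RAM and hundreds of cores
```

Even a modest replica count per service adds up to hundreds of gigabytes of RAM and hundreds of cores across an organization, before counting the orchestration and network overhead.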
So I have a worker Docker image. I want to spin up a network of 500-50,000 nodes to emulate what happens to a private blockchain such as Ethereum at different scales. What would be a recommendation for an open-source tool/library for such a job:
a) one that would make sure that even on a low-end machine (say, one 40-core node) all workers are moved forward in time equally (not in real time)
b) one that would allow (a) in a distributed setting (say, 10 low-end nodes on a single LAN)
In other words, I am not looking for real-time network emulation, so I can wait 10 hours to simulate 1 minute and it would be good enough for me. I thought about Kathara, yet a problem still stands: how to make sure that, say, 10,000 containers are given the same number of ticks in a round-robin manner?
So how do I emulate a complex network of Docker workers?
I'm assuming you will run each worker inside a container. To ensure each container runs with similar CPU access, you can configure CPU reservations and limits on each replica. These numbers are computed down to fractional slices of a core, so on an 8-core system you could give each container 0.01 of a core and run upwards of 800 containers. See the compose documentation on how to set resource constraints. And with swarm mode, you can spread these replicas across multiple nodes sharing a network.
That said, I think the advice to run shorter simulations on more hardware is good. You will find a significant portion of the time is spent in context switching between each process, possibly invalidating any measurements you want to take.
You will also encounter scalability issues with Docker and the orchestration tool you choose. For example, you'll need to adjust the subnet size for any shared network, which defaults to a /24 with around 253 available IPs. The Docker engine itself will likely spend a non-trivial amount of CPU time maintaining the state of all the running containers.
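As a sketch of both settings together (the service and image names are placeholders; the syntax is the compose `deploy.resources` form used with swarm mode):

```yaml
version: "3.8"
services:
  worker:
    image: my-blockchain-worker    # placeholder image
    deploy:
      replicas: 800
      resources:
        limits:
          cpus: "0.01"             # 1/100 of a core per container
networks:
  default:
    ipam:
      config:
        - subnet: 10.0.0.0/16      # roomier than the default /24 (~253 IPs)
```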
Do we gain efficiency in terms of load handling when the same container (in this case, a container with an Apache server and a PHP application) is deployed 5 or more times (i.e. 5 or more containers are deployed) on the same host or VM?
Here, efficiency means: is the application in such an architecture able to serve more requests, or serve requests faster?
As far as I am aware, each request launches a new Apache/PHP thread, and if we have 5 containers handling the requests, won't it be inefficient, since the threads launched by Apache will now be context-switched out more often?
Scaling an application requires understanding why the application has reached its limit. For this, you need to gather metrics from the application and the host when it is fully loaded. Without testing and gathering metrics, you're only guessing why you're at capacity.
If the application is fully utilizing one or more CPU cores, but not all of them, then it is either not multi-threaded, or it is encountering locks that prevent all the cores from being used. Adding more containers to the host may help scale in this scenario.
Typically, horizontal scaling is done because a single host is using all of some resource, such as disk I/O, network bandwidth, memory, or CPU. If you find that the app is using all of one or more of these resources under heavy load, then you need more hosts, not more containers on the same host.
This all assumes you haven't configured Docker to limit resources on the containers. If you reach capacity with one container and have resource limits configured, then the easiest way to get further performance is to remove or relax those limits.
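For reference, a sketch of where such limits typically live in a compose file (the service name, image, and numbers are placeholders); if you're hitting capacity with one container and see settings like these, raising or removing them is the first thing to try:

```yaml
services:
  app:
    image: my-app            # placeholder
    deploy:
      resources:
        limits:
          cpus: "2.0"        # raise or remove if the container is CPU-capped
          memory: 4096M      # raise or remove if the container is memory-capped
```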
On this page the Docker documentation shows an example of a cluster of 4 nodes and 8 web services. The deployment strategy deploys the containers evenly: I assume 2 per node. A load-balancer service is also added.
If I understand correctly, you will have 3 nodes with 2 web app containers each, and a fourth node with 2 web app containers plus a load-balancer container.
Is there a real performance gain to load balancing on the same node?
Would the node with the load balancer ever load balance to itself while it's busy load balancing?
In my opinion, it depends on:
Your web application architecture,
The engine that runs it, and
Your server capacity (CPU, memory, IO bandwidth, etc.)
A Node web app, for example, is a single threaded application, and depending on how it's written, it may or may not perform optimally for the machine that it runs on. For instance, if you have a compute-heavy application, or if parts of your application perform blocking IO (file, http, etc.) operations, then you'll quickly hit the limits of a single threaded application running on a single core. If your CPU has additional cores, your Node app won't be using them, and that additional power will not be utilized.
In this case, yes, running multiple instances of the application ("load balancing" between them) can offer visible improvement, as long as your CPU and other resources are not already exhausted by any one instance. In other words, if a single instance of your application does not fully utilize the entire CPU capacity available on your host, then yes, running multiple instances of it will help.
If your web application and the engine that runs it, however, are capable of multi-threading and utilizing multiple CPU cores, then running multiple instances of it won't add any value, and may in fact adversely affect your server's performance.
Your ultimate goal in architecting your application and configuring your server should be to optimally utilize all resources available to you: CPU, memory, disk and network bandwidth. If a single instance of your application can exhaust any one of those resources, then there's no benefit in starting additional instances. But if a single instance is unable to use all resources (either naturally, like Node's single threaded nature, or by design, like when your application throttles itself from using too much memory or disk access, etc.) then you can utilize the remaining capacity by running multiple instances side-by-side.
When using Kubernetes to manage your Docker containers, particularly when using the replication controller, when should you increase an image's running container instances to more than 1? I understand that Kubernetes can spawn as many container replicas as needed per the replication controller configuration file, but why spawn multiple running containers (of the same image) when you can just increase the Compute VM size? I would think that when you need more compute power you should increase the machine's CPU/RAM, and only when you reach the maximum available compute power (approximately 32 cores currently at Google) would you need to spawn multiple containers.
However, it would seem that spawning multiple containers regardless of VM size provides a more highly available service, yet Kubernetes will respawn failed containers even in a 1-container replication controller environment. So what I can't figure out is: for what reason would I want more than 1 running container (of the same image) other than running out of VM instance compute capacity?
I think you laid out the issues pretty well. The two kinds of scaling you described are called "vertical scaling" (increasing memory or CPU of a single instance) and "horizontal scaling" (increasing number of instances).
On availability: As you observed, you can achieve pretty good availability even with a single container, thanks to auto-restart (at the node level or replication controller level). But it can never be 100% because you will always have the downtime associated with restarting the process, either on the same machine or (if the machine failed) on a new machine. In contrast, horizontal scaling (running multiple replicas of the container) allows effectively "zero downtime" from the end-user's perspective, assuming you have some kind of load balancing or failover mechanism in place among the replicas, and your application is written in a way that allows replication.
On scalability: This is highly application-dependent. For example, vertically scaling CPU for a single-threaded application will not increase the workload it can handle, but running multiple replicas of it behind a load balancer (horizontal scaling) will. On the other hand, some applications aren't written in a way that allows them to be replicated, so for those vertical scaling is your only choice. Many applications (especially "cloud native" applications) are amenable to both horizontal and vertical scaling, but the details are application-dependent. Note that once you need to scale beyond the workload that a single node can handle (due to CPU or memory), you have no choice but to replicate (horizontal scaling).
So the short answer to your question is that people replicate for both availability and scalability.
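A minimal sketch with the replication controller API discussed here (the names, image, and resource numbers are placeholders): horizontal scaling is the `replicas` field, vertical scaling is the per-container `resources`:

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: my-app                  # placeholder name
spec:
  replicas: 3                   # horizontal scaling: 3 identical pods
  selector:
    app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.0     # placeholder image
          resources:
            limits:             # vertical scaling: per-pod CPU/memory
              cpu: 500m
              memory: 512Mi
```

Scaling out later is then just `kubectl scale rc my-app --replicas=5`.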
There are a variety of reasons for why you would scale an application up or down.
The Kubernetes project is looking to provide auto-scaling in the future as a feature to dynamically size up and size down (potentially to 0) a replication controller in response to observed traffic. For a good discussion on auto-scaling, see the following write-up:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/proposals/autoscaling.md