AWS EKS Cluster Autoscaling - docker

I have an AWS EKS cluster (version 1.12) running 6 applications, and everything is working fine. When creating the nodes I added an autoscaling node group that spans availability zones, with a minimum of 3 and a maximum of 6 nodes, so the desired 3 nodes are running fine.
My scenario is this: when a memory spike happens I need more nodes to be added, up to the maximum I set in the auto scaling group, but at cluster setup time I did not install the Cluster Autoscaler.
Could somebody please address the following doubts:
1. As per the AWS documentation, the Cluster Autoscaler does not support a node group that spans multiple AZs.
2. If we do need to create multiple node groups as per the AWS docs, how do we specify min/max nodes - per node group, or for the entire cluster?
3. How can I autoscale on a memory metric, since this does not come out of the box like the CPU metric?

You should create one node group per AZ. So if your cluster size is 6 nodes, create a node group of 2 instances in each of the 3 AZs. You can also spread the pods across AZs for high availability. If you look at the cluster autoscaler documentation, it recommends:
Cluster autoscaler does not support Auto Scaling Groups which span multiple Availability Zones; instead you should use an Auto Scaling Group for each Availability Zone and enable the --balance-similar-node-groups feature. If you do use a single Auto Scaling Group that spans multiple Availability Zones you will find that AWS unexpectedly terminates nodes without them being drained because of the rebalancing feature.
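As a rough sketch of what per-AZ node groups could look like with eksctl (the cluster name, region, AZs and sizes below are purely illustrative), note that each node group carries its own min/max, which also answers the second doubt: the limits are per node group, not per cluster.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster          # hypothetical cluster name
  region: us-east-1         # hypothetical region
nodeGroups:
  - name: ng-1a
    availabilityZones: ["us-east-1a"]
    minSize: 1
    maxSize: 2
    desiredCapacity: 1
  - name: ng-1b
    availabilityZones: ["us-east-1b"]
    minSize: 1
    maxSize: 2
    desiredCapacity: 1
  - name: ng-1c
    availabilityZones: ["us-east-1c"]
    minSize: 1
    maxSize: 2
    desiredCapacity: 1
```

The cluster autoscaler deployment itself would then be run with the --balance-similar-node-groups flag so the three groups are kept roughly the same size.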
I am assuming you want to scale the pods based on memory. For that you will have to use metrics-server or Prometheus and create an HPA which scales based on memory. You can find a working example here.
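A minimal sketch of such an HPA using the autoscaling/v2beta2 API (available around Kubernetes 1.12), assuming metrics-server is installed and a Deployment named main-app exists; the names and threshold are illustrative:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: main-app-hpa            # hypothetical HPA name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: main-app              # hypothetical Deployment to scale
  minReplicas: 3
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70   # scale out above ~70% of the requested memory
```

Note that memory utilization is computed against the pods' memory requests, so the pods must declare requests for this to work.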

Related

Docker Swarm scaling

I am using Docker Swarm in AWS environment.
I also use auto-scaling.
By default there are two instances, and when auto-scaling kicks in and adds an instance, I want the number of Swarm nodes to grow accordingly.
For example, the service is currently set to replicas=4, with two containers on each of the manager and the worker node. If auto-scaling adds one instance, I want that instance to also run two containers, so the total number of containers becomes 6. As another instance is added, the total should become 8.

Why should I run multiple elasticsearch nodes on a single docker host?

There are a lot of articles online about running an Elasticsearch multi-node cluster using docker-compose, including the official documentation for Elasticsearch 8.0. However, I cannot find a reason why you would set up multiple nodes on the same docker host. Is this the recommended setup for a production environment? Or is it an example of theory in practice?
You shouldn't consider this a production environment. The guides are examples, often for lab environments, and testing scenarios with the application. I would not consider them production ready, and compose is often not considered a production grade tool since everything it does is to a single docker node, where in production you typically want multiple nodes spread across multiple availability zones.
Since a single ES node's heap should never exceed half the available memory (and should stay below ~30.5GB), one case where it makes sense to run several nodes on a given host is when the host has ample memory (say 128GB+). In that case you could run 2 ES nodes (with 64GB of memory each: 30.5GB of heap and the rest for Lucene) on the same host by correctly constraining each Docker container, as in the compose sketch below.
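A rough docker-compose sketch of that layout, assuming an Elasticsearch 8.x image on a 128GB host; the node names, version, and memory/heap values are illustrative, security and kernel settings are omitted, and the deploy limits assume swarm mode or a compose version that honours them:

```yaml
services:
  es-node-1:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0
    environment:
      - node.name=es-node-1
      - cluster.name=demo-cluster
      - discovery.seed_hosts=es-node-2
      - cluster.initial_master_nodes=es-node-1,es-node-2
      - "ES_JAVA_OPTS=-Xms30g -Xmx30g"   # heap under ~30.5GB, rest left to Lucene
    deploy:
      resources:
        limits:
          memory: 64g                    # cap the container at half the host RAM
  es-node-2:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0
    environment:
      - node.name=es-node-2
      - cluster.name=demo-cluster
      - discovery.seed_hosts=es-node-1
      - cluster.initial_master_nodes=es-node-1,es-node-2
      - "ES_JAVA_OPTS=-Xms30g -Xmx30g"
    deploy:
      resources:
        limits:
          memory: 64g
```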
Note that the memory sizing above is not specific to Docker; you can always configure several nodes per host, whether you use Docker or not.
Regarding production, and given that 2+ nodes would run on the same host: if you lose that host, you lose two nodes, which is not good. However, depending on how many hosts you have, it can be a lesser problem, provided each host is in a different availability zone and you have the appropriate cluster/shard allocation awareness settings configured, which ensure that your data is redundantly copied in 2+ availability zones. In that case, losing a host (2 nodes) would still keep your cluster running, although in degraded mode.
It's worth noting that Elastic Cloud Enterprise (which powers Elastic Cloud) is designed to run several nodes per host depending on the sizing of the nodes and the available hardware. Its documentation covers the hardware prerequisites as well as how medium and large scale deployments make use of one or more large 256GB hosts per availability zone.

Scheduling and scaling pods in kubernetes

I am running a k8s cluster on GKE with 4 node pools in different configurations:
Node pool 1 (single node, cordoned): runs Redis & RabbitMQ
Node pool 2 (single node, cordoned): runs monitoring & Prometheus
Node pool 3 (single large node): application pods
Node pool 4 (single node, autoscaling enabled): application pods
Currently I am running a single replica of each service on GKE, except for the main service, which mostly manages everything and runs 3 replicas.
When scaling this main service with HPA, I have sometimes seen the node crash or the kubelet restart frequently, leaving pods in an Unknown state.
How do I handle this scenario? If the node crashes, GKE takes time to auto-repair it, which causes service downtime.
Question 2:
Node pools 3 and 4 run the application pods. The application includes 3-4 memory-intensive microservices, and I am thinking of using a node selector to pin them to one node, while only the small node pool runs the main service, with HPA and node autoscaling working for that pool.
However, I feel a node selector is not the best way to do this.
It's always best to run more than one replica of each service, but currently we are running only a single replica of each service, so please give suggestions with that in mind.
As Patrick W rightly suggested in his comment:
if you have a single node, you leave yourself with a single point of failure. Also keep in mind that autoscaling takes time to kick in and is based on resource requests. If your node suffers OOM because of memory intensive workloads, you need to readjust your memory requests and limits – Patrick W
you may need to redesign your infrastructure a bit so that you have more than a single node in every node pool, as well as readjust memory requests and limits.
You may want to take a look at the following sections in the official kubernetes docs and Google Cloud blog:
Managing Resources for Containers
Assign CPU Resources to Containers and Pods
Configure Default Memory Requests and Limits for a Namespace
Resource Quotas
Kubernetes best practices: Resource requests and limits
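As a concrete illustration of the requests/limits advice above, here is a minimal sketch; the Deployment name, image and values are purely illustrative and would need tuning for your workload:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: main-service            # hypothetical memory-hungry service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: main-service
  template:
    metadata:
      labels:
        app: main-service
    spec:
      containers:
        - name: main-service
          image: example.com/main-service:latest   # placeholder image
          resources:
            requests:
              memory: "512Mi"   # what the scheduler reserves on the node
              cpu: "250m"
            limits:
              memory: "1Gi"     # the container is OOM-killed above this
              cpu: "500m"
```

The memory request is what the scheduler and the cluster autoscaler reason about, so undersized requests are a common cause of overcommitted nodes and OOM-killed pods.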
How do I handle this scenario? If the node crashes, GKE takes time to auto-repair it, which causes service downtime.
That's why having more than just one node in a single node pool can be a much better option. It greatly reduces the likelihood that you'll end up in the situation described above. The GKE auto-repair feature needs time to take effect (usually a few minutes), and if this is your only node, you cannot do much about it and need to accept possible downtime.
Node pools 3 and 4 run the application pods. The application includes 3-4 memory-intensive microservices, and I am thinking of using a node selector to pin them to one node, while only the small node pool runs the main service, with HPA and node autoscaling working for that pool. However, I feel a node selector is not the best way to do this.
You may also want to take a look at node affinity and anti-affinity as well as taints and tolerations.
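For instance, here is a sketch of pinning a memory-intensive pod to a dedicated, tainted node pool; the workload=memory-intensive label/taint, the image and the sizes are all illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-intensive-app
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: workload
                operator: In
                values: ["memory-intensive"]      # only schedule on labelled nodes
  tolerations:
    - key: workload
      operator: Equal
      value: memory-intensive
      effect: NoSchedule                          # allows scheduling on the tainted pool
  containers:
    - name: app
      image: example.com/heavy-service:latest     # placeholder image
      resources:
        requests:
          memory: "2Gi"
        limits:
          memory: "3Gi"
```

Compared to a bare node selector, the taint also keeps other pods off that pool, so the memory-intensive services don't compete with unrelated workloads.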

How to emulate 500-50000 worker (docker) nodes network?

So I have a worker docker image. I want to spin up a network of 500-50000 nodes to emulate what happens to a private blockchain such as Ethereum at different scales. What would be a recommendation for an open-source tool/library for such a job:
a) one that would make sure that, even on a low-endish setup (say one 40-core node), all workers are moved forward in time equally (not realtime)
b) one that would allow (a) in a distributed setting (say 10 low-endish nodes on a single LAN)
In other words, I am not looking for realtime network emulation, so I can wait 10 hours to simulate 1 minute and that would be good enough for me. I thought about Kathara, yet one problem still stands: how to make sure that, say, 10000 containers are given the same amount of ticks in a round-robin manner?
So how do I emulate a complex network of docker workers?
I'm assuming you will run each worker inside of a container. To ensure each container gets similar CPU access, you can configure CPU reservations and limits on each replica. These numbers get computed down to fractional slices of a core, so on an 8-core system you could give each container 0.01 of a core and run upwards of 800 containers. See the compose documentation on how to set resource constraints. And with swarm mode, you could spread these replicas across multiple nodes sharing a network.
That said, I think the advice to run shorter simulations on more hardware is good. You will find a significant portion of the time is spent in context switching between each process, possibly invalidating any measurements you want to take.
You will also encounter scalability issues with Docker and the orchestration tool you choose. For example, you'll need to adjust the subnet size for any shared network, which defaults to a /24 with around 253 available IPs. The Docker engine itself will likely spend a non-trivial amount of CPU time maintaining the state of all the running containers.
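Putting those two points together, a compose sketch of the general idea (the worker image, replica count, CPU slice and subnet are placeholders and would need tuning):

```yaml
services:
  worker:
    image: example.com/blockchain-worker:latest   # placeholder worker image
    deploy:
      replicas: 800                               # spread across the swarm nodes
      resources:
        limits:
          cpus: "0.01"                            # ~1% of one core per replica
          memory: 64M
networks:
  default:
    driver: overlay
    ipam:
      config:
        - subnet: 10.10.0.0/16                    # room for far more than 253 IPs
```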

When should you create more than one docker container image instance with Kubernetes Replication Controller?

When using Kubernetes to manage your docker containers, particularly when using the replication controller, when should you increase an image's running container instances to more than 1? I understand that Kubernetes can spawn as many container replicas as needed in the replication controller configuration file, but why spawn multiple running containers (for the same image) when you can just increase the Compute VM size? I would think that when you need more compute power, you increase the machine's CPU/RAM, and only when you reach the maximum available compute power (approximately 32 cores currently at Google) would you need to spawn multiple containers.
However, it would seem that spawning multiple containers regardless of VM size would provide a more highly available service, yet Kubernetes will respawn failed containers even in a 1-container replication controller environment. So what I can't figure out is: for what reason would I want more than 1 running container (for the same image) other than running out of VM instance compute size?
I think you laid out the issues pretty well. The two kinds of scaling you described are called "vertical scaling" (increasing memory or CPU of a single instance) and "horizontal scaling" (increasing number of instances).
On availability: As you observed, you can achieve pretty good availability even with a single container, thanks to auto-restart (at the node level or replication controller level). But it can never be 100% because you will always have the downtime associated with restarting the process, either on the same machine or (if the machine failed) on a new machine. In contrast, horizontal scaling (running multiple replicas of the container) allows effectively "zero downtime" from the end-user's perspective, assuming you have some kind of load balancing or failover mechanism in place among the replicas, and your application is written in a way that allows replication.
On scalability: This is highly application-dependent. For example, vertically scaling CPU for a single-threaded application will not increase the workload it can handle, but running multiple replicas of it behind a load balancer (horizontal scaling) will. On the other hand, some applications aren't written in a way that allows them to be replicated, so for those vertical scaling is your only choice. Many applications (especially "cloud native" applications) are amenable to both horizontal and vertical scaling, but the details are application-dependent. Note that once you need to scale beyond the workload that a single node can handle (due to CPU or memory), you have no choice but to replicate (horizontal scaling).
So the short answer to your question is that people replicate for both availability and scalability.
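To make the horizontal case concrete, here is a minimal replication controller sketch (the nginx image and resource values are just placeholders): raising replicas is horizontal scaling, while raising the per-container resources would be vertical scaling.

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: web-rc
spec:
  replicas: 3              # horizontal scaling: three identical pods behind a service
  selector:
    app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25     # placeholder image
          resources:
            requests:
              cpu: "250m"       # vertical scaling would raise these instead
              memory: "256Mi"
```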
There are a variety of reasons for why you would scale an application up or down.
The Kubernetes project is looking to provide auto-scaling in the future as a feature to dynamically size up and size down (potentially to 0) a replication controller in response to observed traffic. For a good discussion on auto-scaling, see the following write-up:
https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/proposals/autoscaling.md
