Why would one choose many smaller machine types instead of fewer big machine types? - google-cloud-dataflow

In a clustered high-performance computing framework such as Google Cloud Dataflow (or, for that matter, Apache Spark or Kubernetes clusters, etc.), I would think that it's far more performant to have fewer really BIG machine types rather than many small machine types, right? As in, it's more performant to have 10 n1-highcpu-96 machines rather than, say, 120 n1-highcpu-8 machines, because
the CPUs can use shared memory, which is way, way faster than network communication
if a single thread needs access to lots of memory for a single-threaded operation (e.g. a sort), it has access to that larger memory on a BIG machine rather than on a smaller one
And since the price is the same (e.g. 10 n1-highcpu-96 machines cost the same as 120 n1-highcpu-8 machines), why would anyone opt for the smaller machine types?
As well, I have a hunch that with the n1-highcpu-96 machine type we'd occupy the whole host, so we wouldn't need to worry about competing demands on the host from another Google Cloud customer's VM (e.g. contention in the CPU caches or motherboard bandwidth, etc.), right?
Finally, although I don't think the Google Compute Engine VMs report the "true" CPU topology of the host system, if we do choose the n1-highcpu-96 machine type, the reported CPU topology is presumably a touch closer to the truth, because the VM is using up the whole host. Any program running on that VM that tries to take advantage of the topology (e.g. the NUMA-aware option in Java?) therefore has a better chance of making the "right decisions".

Whether to choose many instances of a smaller machine type or a few instances of a big machine type depends on many factors.
VM sizes differ not only in the number of cores and amount of RAM, but also in network I/O performance.
Instances with small machine types are limited in CPU and I/O power and can be inadequate for heavy workloads.
Also, if you are planning to grow and scale, it is better to design and develop your application to run across several instances. Having small VMs gives you a better chance of having them distributed across the physical servers in the datacenter that have the best resource situation at the time the machines are provisioned.
Having many small instances also helps to isolate fault domains. If one of your small nodes crashes, that only affects a small number of processes. If a large node crashes, many more processes go down with it.
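To make the fault-domain point concrete, here is a tiny illustrative sketch (the 960-vCPU total simply mirrors the 10 x n1-highcpu-96 vs. 120 x n1-highcpu-8 comparison from the question):

```python
# Illustration: losing one node removes a much smaller share of total capacity
# when the cluster is built from many small machines.
TOTAL_VCPUS = 960  # e.g. 10 x n1-highcpu-96 or 120 x n1-highcpu-8

for machine, vcpus in (("n1-highcpu-96", 96), ("n1-highcpu-8", 8)):
    nodes = TOTAL_VCPUS // vcpus
    lost_fraction = vcpus / TOTAL_VCPUS
    print(f"{nodes:>4} x {machine}: one node failure removes "
          f"{lost_fraction:.1%} of cluster capacity")

# Output:
#   10 x n1-highcpu-96: one node failure removes 10.0% of cluster capacity
#  120 x n1-highcpu-8: one node failure removes 0.8% of cluster capacity
```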
It also depends on the application you are running on your cluster and on the workload. I would also recommend going through this link to see the sizing recommendations for an instance.

Related

Does containerization always lead to cpu, ram and storage cost savings as compared to VMs?

Being a novice in the world of containers, and after reading a lot of literature online, I was wondering if someone could offer some guidance.
I wanted to know if containers always lead to cost savings in terms of CPU, memory and storage when compared with the same application running inside a VM.
I can think of a scenario where they won't: when the scale set of VMs running under an orchestrator like Kubernetes is large, leading to greater compute consumption.
I was wondering what the general understanding is here.
Containerization is not just about cost savings in terms of CPU/RAM/storage; it is about a lot more.
When an app gets deployed on a VM, you need specific tools like Ansible/Chef/Puppet to manage deployments, additional tools to monitor the load and increase/decrease the number of VMs running, further tools to provide WideIP support across the running services in the case of a REST API, and the list goes on.
With containers running on Kubernetes, you have all of these features built in to some extent, and when you deploy a service-mesh framework like Istio, you get additional features that add a lot of value with minimal effort, including circuit breakers, retries, authentication, etc.

What is the Impact of having more replicas in Docker Swarm mode?

I understand the use of replicas in Docker Swarm mode. It is mainly to eliminate points of failure and reduce the amount of downtime. It is well explained in this post.
Since having more replicas is more useful for a system as a whole, why don't companies just initialise as many replicas as possible, e.g. 1000 replicas for a Docker service? I can imagine that a large corporation running a back-end system may face multiple points of failure at any given time, and it would benefit from having more instances of the particular service.
I would like to know how many replicas are considered TOO MANY, and what factors affect the performance of a Docker Swarm.
I can think of hardware overhead being a limiting factor.
Let's say you're running a Rails app. Each instance requires 128 MB of RAM and 10% of a CPU. Nine instances is a touch over 1 GB of memory and nearly an entire CPU.
While that does not sound like a lot, imagine an organization with 100+ teams, each with 3-5 applications. The hardware required to operate every application at acceptable levels quickly ramps up.
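To make that ramp-up concrete, a quick back-of-the-envelope calculation using the per-instance numbers above (purely illustrative):

```python
# Back-of-the-envelope: aggregate footprint of N replicas, using the
# per-replica figures from the answer (128 MB of RAM, 10% of a CPU each).
PER_REPLICA_RAM_MB = 128
PER_REPLICA_CPU = 0.10

for replicas in (9, 100, 1000):
    ram_gb = replicas * PER_REPLICA_RAM_MB / 1024
    cpus = replicas * PER_REPLICA_CPU
    print(f"{replicas:>5} replicas -> ~{ram_gb:.1f} GB RAM, ~{cpus:.0f} CPU(s)")

# 9 replicas is ~1.1 GB and ~1 CPU; 1000 replicas is ~125 GB and ~100 CPUs,
# before counting the overhead of the orchestrator itself.
```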
Then there is network chatter. 10 MB/s is typical in big org/corporate settings. While a heartbeat check for a couple of instances is barely noticeable, heartbeats across hundreds of instances could jam up the network.
At the end of the day it comes down to constraints. What are the boundaries of the software, hardware, environment, budget, and support systems? It is often hard to imagine the pressures present when (technical) decisions are made.

Machine type for google cloud dataflow jobs

I noticed there is an option that allows specifying a machine type.
What is the criteria I should use to decide whether to override the default machine type?
In some experiments I saw that throughput is better with smaller instances, but on the other hand jobs tend to experience more "system" failures when many small instances are used instead of a smaller number of default instances.
Thanks,
G
Dataflow will eventually optimize the machine type for you. In the meantime here are some scenarios I can think of where you might want to change the machine type.
If your ParDo operation needs a lot of memory, you might want to change the machine type to one of the high-memory machines that Google Compute Engine provides.
Optimizing for cost and speed. If your CPU utilization is less than 100%, you could probably reduce the cost of your job by picking a machine with fewer CPUs. Alternatively, if you increase the number of machines and reduce the number of CPUs per machine (so the total CPU count stays approximately constant), you can make your job run faster but cost approximately the same.
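If you want to experiment with that trade-off yourself, here is a minimal sketch of launching the same job with different machine-type/worker-count splits. It assumes the Apache Beam Python SDK and its Dataflow worker options (--worker_machine_type, --num_workers); the project and bucket names are placeholders:

```python
# Sketch: run the same trivial pipeline on Dataflow with two worker shapes
# that add up to roughly the same total vCPU count.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run(machine_type: str, num_workers: int) -> None:
    options = PipelineOptions([
        "--runner=DataflowRunner",
        "--project=my-project",                # placeholder project id
        "--region=us-central1",
        "--temp_location=gs://my-bucket/tmp",  # placeholder bucket
        f"--worker_machine_type={machine_type}",
        f"--num_workers={num_workers}",
    ])
    with beam.Pipeline(options=options) as p:
        (p | beam.Create(range(1000))
           | beam.Map(lambda x: x * x))

# Few big workers vs. many small workers, ~96 vCPUs either way:
# run("n1-highcpu-96", 1)
# run("n1-highcpu-8", 12)
```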
Can you please elaborate more on what type of system failures you are seeing? A large class of failures (e.g. VM interruptions) is probabilistic, so you would expect to see a larger absolute number of failures as the number of machines increases. However, failures like VM interruptions should be fairly rare, so I'd be surprised if you noticed an increase unless you were using an order of magnitude more VMs.
On the other hand, it's possible you are seeing more failures because of resource contention due to the increased parallelism of using more machines. If that's the case, we'd really like to know about it to see if this is something we can address.

Defining Scaling Threshold for Azure Web Roles

Azure embraces the notion of elastic scaling and I've been able to achieve this with my Worker Roles. However, when it comes to my Web Roles (e.g. MVC apps) I am not sure what to monitor (or how) to determine when it's a good time to increase (or decrease) the number of running instances. I'm assuming I need to monitor one or many performance counters, but I'm not sure where to start.
Can anyone recommend a best practice for assessing an MVC Web Role instance's load relative to scaling decisions?
This question is a bit open-ended, as monitoring is typically app-specific. Having said that:
Start with simple measurements that you'd look at on a local server, representing KPIs for your app (network utilization, for instance). This TechNet article describes performance counters collected by System Center for Windows Azure, such as:
ASP.NET Applications Requests/sec
Network Interface Bytes Received/sec
Network Interface Bytes Sent/sec
Processor % Processor Time Total
LogicalDisk Free Megabytes
LogicalDisk % Free Space
Memory Available Megabytes
You may also want to watch # of requests queued and request wait time.
Network utilization is interesting, since your NIC provides approx. 100Mbps per core and could end up being a bottleneck even when CPU and other resources are underutilized. You may need to scale out to more instances to handle high-bandwidth scenarios.
Also: I tend to give less importance to CPU utilization, even though it's so easy to measure (and shows up so frequently in examples). Running a CPU at near capacity is a good thing usually, since you're paying for it and might as well use as much as possible.
As far as decreasing: this needs to be handled a bit more carefully. Windows Azure compute is billed by the hour. If, say, you scale out to an extra instance at 11:50 and scale in again at 12:10, you've just incurred two instance-hours. Also: you don't want to scale out, then take new measurements and decide you can now scale back again (effectively creating a constant pulse of adding and removing instances). To make things easier, consider the Autoscaling Application Block (WASABi), found in the Enterprise Library. This has all the scale rules baked in (such as the ones I just mentioned) and is very straightforward to use.
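As a language-neutral illustration of those two rules (scale out promptly, scale in only after the instance has earned its billed hour, and avoid the pulse), here is a sketch of the decision logic only. It is not the WASABi or Azure API; get_load, get_count and set_count are hypothetical hooks you would wire up to your own monitoring and management calls:

```python
# Sketch of a cooldown-aware autoscaling loop. The thresholds and the
# 45-minute scale-in cooldown are illustrative values, not recommendations.
import time

SCALE_OUT_THRESHOLD = 0.75      # e.g. fraction of request-queue or NIC capacity
SCALE_IN_THRESHOLD = 0.30
SCALE_IN_COOLDOWN_S = 45 * 60   # don't drop instances right after adding them

def autoscale(get_load, get_count, set_count, min_count=2, max_count=20):
    last_scale_out = 0.0
    while True:
        load = get_load()       # your KPI: requests queued, bytes/sec, ...
        count = get_count()
        now = time.time()
        if load > SCALE_OUT_THRESHOLD and count < max_count:
            set_count(count + 1)        # scale out promptly under pressure
            last_scale_out = now
        elif (load < SCALE_IN_THRESHOLD and count > min_count
              and now - last_scale_out > SCALE_IN_COOLDOWN_S):
            set_count(count - 1)        # scale in only after a sustained lull
        time.sleep(60)
```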

How scalable is distributed Erlang?

Part A:
Erlang has a lot of success stories about running concurrent agents e.g. the millions of simultaneous Facebook chats. That's millions of agents, but of course it's not millions of CPUs across a network. I'm having trouble finding metrics on how well Erlang scales when scaling is "horizontal" across a LAN/WAN.
Let's assume that I have many (tens of thousands) physical nodes (running Erlang on Linux) that need to communicate and synchronize small infrequent amounts of data across the LAN/WAN. At what point will I have communications bottlenecks, not between agents, but between physical nodes? (Or will this just work, assuming a stable network?)
Part B:
I understand (as an Erlang newbie, meaning I could be totally wrong) that Erlang nodes attempt to all connect to and be aware of each other, resulting in an N^2 connection point-to-point network. Assuming that part A won't just work with N = 10K's, can Erlang be configured easily (using out-of-the-box config or trivial boilerplate, not writing a full implementation of grouping/routing algorithms myself) to cluster nodes into manageable groups and route system-wide messages through the cluster/group hierarchy?
We should specify that we are talking about the horizontal scalability of physical machines -- that is the only problem. The CPUs on one machine will be handled by one VM, no matter how many of them there are.
node = machine.
To begin, I can say that you get 30-60 nodes out of the box (a vanilla OTP installation) with any custom application written on top of it (in Erlang). Proof: ejabberd.
~100-150 is possible with an optimized custom application. I mean, it has to be good code, written with knowledge of GC, the characteristics of the data types, message passing, etc.
Over 150 is all right, but when we talk about numbers like 300 or 500, it will require optimizations and customizations of the TCP layer. Also, our app has to be aware of the cost of, e.g., synchronous calls across the cluster.
The other thing is the DB layer. Mnesia (built-in), due to its design, will not be effective beyond ~20 nodes (my experience -- I may be wrong). Solution: just use something else: Dynamo-style DBs, a separate cluster of MySQL servers, HBase, etc.
The most common technique for balancing the cost of building a high-quality application against scalability is a federation of clusters of ~20-50 nodes each. So internally it is an efficient mesh of ~50 Erlang nodes, connected via any suitable protocol to N other 50-node clusters. To sum up, such a system is a federation of N Erlang clusters.
Distributed Erlang is designed to run in one data center. If you need more, geographically distant nodes, then use federations.
There are lots of config options, e.g. ones that do not connect all nodes to each other. They may be helpful; however, in a ~50-node cluster the Erlang overhead is not significant. You can also create a graph of Erlang nodes using 'hidden' connections, which do not join the full mesh, but which also cannot benefit from being connected to all nodes.
The biggest problem I see in this kind of system is designing it as a masterless system. If you do not need that, everything should be OK.
