How many containers should exist per host in production? How should services be split? - docker

I'm trying to understand the benefits of Docker better and I am not really understanding how it would work in production.
Let's say I have a web frontend, a rest api backend and a db. That makes 3 containers.
Let's say that I want 3 of the front end, 5 of the backend and 7 of the db. (Minor question: Does it ever make sense to have less dbs than backend servers?)
Now, given the above scenario, if I package them all on the same host then I gain the benefit of efficiently using the resources of the host, but then I am DOA when that machine fails or has a network partition.
If I separate them into 1 full application (ie 1 FE, 1 BE & 1 DB) per host, and put extra containers on their own host, I get some advantages of using resources efficiently, but it seems to me that I still lose significantly when I have a network partition since it will take down multiple services.
Hence I'm almost leaning to the conclusion that I should be putting in 1 container per host, but then that means I am using my resources pretty inefficiently and then what are the benefits of containers in production? I mean, an OS might be an extra couple gigs per machine in storage size, but most cloud providers give you a minimum of 10 gigs storage. And let's face it, a rest api backend or a web front end is not gonna even come close to the 10 gigs...even including the OS.
So, after all that, I'm trying to figure out if I'm missing the point of containers? Are the benefits of keeping all containers of an application on 1 host, mostly tied to testing and development benefits?
I know there are benefits from moving containers amongst different providers/machines easily, but for the most part, I don't see that as a huge gain personally since that was doable with images...
Are there any other benefits for containers in production that I am missing? Or are the main benefits for testing and development? (Am I thinking about containers in production wrong)?

Note: The question is very broad and could fill an entire book but I'll shed some light.
Benefits of containers
The exciting part about containers is not about their use on a single host, but their use across hosts connected on a large cluster. Do not look at your machines as independent docker hosts, but as a pool of resource to host your containers.
Containers alone are not ground-breaking (ie. Docker's CTO stating at the last DockerCon that "nobody cares about containers"), but coupled to state of the art schedulers and container orchestration frameworks, they become a very powerful abstraction to handle production-grade software.
As to the argument that it also applies to Virtual Machines, yes it does, but containers have some technical advantage (See: How is Docker different from a normal virtual machine) over VMs that makes them convenient to use.
On a Single host
On a single host, the benefits you can get from containers are (amongst many others):
Use as a development environment mimicking the behavior on a real production cluster.
Reproducible builds independent of the host (convenient for sharing)
Testing new software without bloating your machine with packages you won't use daily.
Extending from a single host to a pool of machines (cluster)
When time comes to manage a production cluster, there are two approaches:
Create a couple of docker hosts and run/connect containers together "manually" through scripts or using solutions like docker-compose. Monitoring the lifetime of your services/containers is at your charge, and you should be prepared to handle service downtime.
Let a container orchestrator deal with everything and monitor the lifetime of your services to better cope with failures.
There are plenty of container orchestrators: Kubernetes, Swarm, Mesos, Nomad, Cloud Foundry, and probably many others. They power many large-scale companies and infrastructures, like Ebay, so they sure found a benefit in using these.
Pick the right replication strategy
A container is better used as a disposable resource meaning you can stop and restart the DB independently and it shouldn't impact the backend (other than throwing an error because the DB is down). As such you should be able to handle any kind of network partition as long as your services are properly replicated across several hosts.
You need to pick a proper replication strategy, to make sure your service stays up and running. You can for example replicate your DB across Cloud provider Availability Zones so that when an entire zone goes down, your data remains available.
Using Kubernetes for example, you can put each of your containers (1 FE, 1 BE & 1 DB) in a pod. Kubernetes will deal with replicating this pod on many hosts and monitor that these pods are always up and running, if not a new pod will be created to cope with the failure.
If you want to mitigate the effect of network partitions, specify node affinities, hinting the scheduler to place containers on the same subset of machines and replicate on an appropriate number of hosts.
How many containers per host?
It really depends on the number of machines you use and the resources they have.
The rule is that you shouldn't bloat a host with too many containers if you don't specify any resource constraint (in terms of CPU or Memory). Otherwise, you risk compromising the host and exhaust its resources, which in turn will impact all the other services on the machine. A good replication strategy is not only important at a single service level, but also to ensure good health for the pool of services that are sharing a host.
Resource constraint should be dealt with depending on the type of your workload: a DB will probably use more resources than your Front-end container so you should size accordingly.
As an example, using Swarm, you can explicitely specify the number of CPUs or Memory you need for a given service (See docker service documentation). Although there are many possibilities and you can also give an upper bound/lower bound in terms of CPU or Memory usage. Depending on the values chosen, the scheduler will pin the service to the right machine with available resources.
Kubernetes works pretty much the same way and you can specify limits for your pods (See documentation).
Mesos has more fine grained resource management policies with frameworks (for specific workloads like Hadoop, Spark, and many more) and with over-commiting capabilities. Mesos is especially convenient for Big Data kind of workloads.
How should services be split?
It really depends on the orchestration solution:
In Docker Swarm, you would create a service for each component (FE, BE, DB) and set the desired replication number for each service.
In Kubernetes, you can either create a pod encompassing the entire application (FE, BE, DB and the volume attached to the DB) or create separate pods for the FE, BE, DB+volume.
Generally: use one service per type of container. Regarding groups of containers, evaluate if it is more convenient to scale the entire group of container (as an atomic unit, ie. a pod) than to manage them separately.
Sum up
Containers are better used with an orchestration framework/platform. There are plenty of available solutions to deal with container scheduling and resource management. Pick one that might fit your use case, and learn how to use it. Always pick an appropriate replication strategy, keeping in mind possible failure modes. Specify resource constraints for your containers/services when possible to avoid resource exhaustion which could potentially lead to bringing a host down.

This depends on the type of application you run in your containers. From the top of my head I can think of a couple different ways to look at this:
is your application diskspace heavy?
do you need the application fail save on multiple machines?
can you run multiple different instance of different applications on the same host without decreasing performance of them?
do you use software like kubernetes or swarm to handle your machines?
I think most of the question are interesting to answer even without containers. Containers might free you of thinking about single hosts, but you still have to decide and measure the load of your host machines yourself.

Minor question: Does it ever make sense to have less dbs than backend servers?
Yes.
Consider cases where you hit normal(without many joins) SQL select statements to get data from the database but your Business Logic demands too much computation. In those cases you might consider keeping your Back-End Service count high and Database Service count low.
It all depends on the use case which is getting solved.

The number of containers per host depends on the design ratio of the host and the workload ratio of the containers. Both ratios are
Throughput/Capacity ratios. In the old days, this was called E/B for execution/bandwidth. Execution was cpu and banwidth was I/o. Solutions were said to be cpu or I/o bound.
Today memories are very large the critical factor is usually cpu/nest
capacity. We describe workloads as cpu intense or nest intense. A useful proxy for nest capacity is the size of highest level cache. A useful design ratio estimator is (clock x cores)/cache. Fir the same core count the machine with a lower design ratio will hold more containers. In part this is because the machine with more cache will scale better and see less saturation at higher utilization. By

Related

Why should I run multiple elasticsearch nodes on a single docker host?

There are a lot of articles online about running an Elasticsearch multi-node cluster using docker-compose, including the official documentation for Elasticsearch 8.0. However, I cannot find a reason why you would set up multiple nodes on the same docker host. Is this the recommended setup for a production environment? Or is it an example of theory in practice?
You shouldn't consider this a production environment. The guides are examples, often for lab environments, and testing scenarios with the application. I would not consider them production ready, and compose is often not considered a production grade tool since everything it does is to a single docker node, where in production you typically want multiple nodes spread across multiple availability zones.
Since one ES node heap memory should never get more than half the available memory (and less than ~30.5GB), one reason it makes sense to have several nodes on a given host is when you have hosts with ample memory (say 128GB+). In that case you could run 2 ES nodes (with 64GB of memory each, 30.5GB heap and the rest for Lucene) on the same host by correctly constraining each Docker container.
Note that the above is not related to Docker, you can always configure several nodes per host, whether Docker or not.
Regarding production and given the fact that 2+ nodes would run on the same host, if you lose that host, you lose two nodes, which is not good. However, depending on how many hosts you have, it might be a lesser problem, if and only if, each host is in a different availability zone and you have the appropriate cluster/shard allocation awareness settings configured, which would ensure that your data is redundantly copied in 2+ availability zones. In this case, losing a host (2 nodes) would still keep your cluster running, although in degraded mode.
It's worth noting that Elastic Cloud Enterprise (which powers Elastic Cloud) is designed to run several nodes per hosts depending on the sizing of the nodes and the available hardware. You can find more info on hardware pre-requisites as well as how medium and large scale deployments make use of one or more large 256GB hosts per availability zones.

Guidance on when to chose virtual machines or physical machines over containers

There are many articles and videos comparing containers, virtual machines, physical machines. However almost all information is theoretical: containers are fast, VMs are secure, etc. But I could not find description of specific use cases or guidance on when to choose virtual machines, physical machines, but not containers. So, currently I cannot imagine situation when somebody gives recommendation to not use containers.
Question:
Could you please list specific applications or solutions when you would recommend using VMs, but not container?
Could you please list specific applications or solutions when you would recommend using OS over bare metal, but not containers or VMs?
Here is example of answer I would appreciate to get (note, that I am not sure if this information is correct):
Use case 1: Edge Router
Edge router is a router which connects organizational network to the Internet. Also, in this case it is assumed, that vendor of the router provides it not as device but as a software package (virtualized router).
Edge router most probably will be one of target of hacker's attacks. Thus security requirements come to the first place.
Containers are not recommended in this case. By default containers provide mediocre level of security. Strong security can be achieved with complex configuration (what configuration?) but this is more difficult than in case of VM or bare metal. In addition, high security level may require special hardened Linux kernel, however containers technology does not allow adjusting kernel configuration.
Virtual Machines would be a good choice if vendor of the router provides software as VM image or when organization has many edge routers (for example, many offices with internet access points), and has (or is ready to create) well-established process of preparation of VM images. In this case using VMs will simplify rollout, update and healing the virtualized edge router. VM also provides high security level; nevertheless is it still recommended to place such a VM in a separate server and to not share same server with other applications/VMs to avoid cross-VM attacks.
Physical machine would be a good choice if router vendor provides router's software as an application package (not as a VM) such as .rpm, and rollout, update and healing processes are not expected to take much efforts; this might be the case when when company has few routers (so updates can be performed manually or automated with tools like Ansible), and couple of hour of planned and unplanned downtime is acceptable.
Use case 2: ...
Thank you in advance.
The question is a bit vague so I'll try my best:
you'd usually allocate work to containers when you have a few separate applications with limited physical resources and you'd like to run them each with their own different environment (different runtime version, architecture and dependencies) which managing on a machine (physical or virtual) would be cumbersome.
you'd use a VM when you want specifically a feature that containers couldn't satisfy or it would just be a headache to set them up with it and a simple quick and easy VM could solve (and again you have limited resources you'd like to share between use cases)
and finally, a physical machine when performance is of the essence like I/O requests and latency around that.
you can also mix and match to match each tier needs:
we need to run many applications that VM would be too much of an overhead for them and containers would make their handling more automated and streamline so containers with k8s, but on the other hand, we want local storage offered to those containers to be very fast so we run the k8s cluster on physical machines.
if recoverability would be of the essence we would have used VM due to the options of snapshotting VM states over time.
It's all a big LEGO set you can mix and match depending on your use case and needs

Question regarding Monolithic vs. Microservice Architecture

I'm currently rethinking an architecture I was planning.
So suppose I have a system where there are about 8 different services interacting with a single database. Some services listen and react to database events and do stuff like sending SMS.
Then there's an API layer sitting on top of the database and a frontend connected to this API. So in my understanding this is rather monolithic.
In fact I don't see any advantage of using containers in this scenario. Their real advantage is that they can be swapped out, right? My intuition tells me that there is often no purpose in doing that except maybe some load balancing on API level. Instead many companies just seem to blindly jump on the hype train of containerizing everything.
Now the question arises, is docker the right tool for this context? In each forum people refrain from using docker for the sole purpose of a more resource efficient "VM" aggregating all services within a single container. However this is the only real scenario I'd see any advantages in using docker (the environment, e.g. alpine-linux, is the same on all customer's computers when rolling out the system).
Even docker-compose is not "grouping" containers together as a complete system only exposing port 443 but instead starts an infrastructure of multiple interacting containers. Oftentimes services like Kubernetes are then used for deploying these infrastructures on "nodes", i.e. VMs.
However, in my opinion it would be great to have a single self-contained container without putting them into a VM. This container would include every necessary service only exposing one port, e.g. 443.
Since I'm rather confused now, I'd really appreciate your help here.
Thanks in advance!
Kubernetes does many things and has many useful features. But Kubernetes also require that you architect your apps to follow The Twelve-Factor App principles. An important thing here is that your apps are stateless.
When the app is stateless, it is easy to scale out horizontally - this can also be done automatically when the load increases.
When the app is stateless, it is easy to do Rolling Deployments that upgrade the app to a new version without downtime.
You can run containers on bare metal Linux servers, but this is mostly very big servers. If you use a cloud, you probably want more VM instances, but distributed to 3 Availability Zones - for increased availability.
"Self-contained container - exposing one port". With Kubernetes, you typically use a private network and you only expose services via a single load balancer - typically on a port, but different URLs send traffic to different services.
Some services listen and react to database events and do stuff like sending SMS.
As I said, many things is easier when it is horizontal scalable, but this kind of app - that listen for events and react - is one of few examples where you can not scale horizontally. But it is a good fit for a serverless architecture instead, possibly on Kubernetes using Knative.
Now the question arises, is docker the right tool for this context?
My opinion is that most workload will run in containers. It is more a question about how it should be run in Kubernetes - one or multiple replicas. As stateless Deployments or stateful StatefulSet or some other way.

Container technologies: docker, rkt, orchestration, kubernetes, GKE and AWS Container Service

I'm trying to get a good understanding of container technologies but am somewhat confused. It seems like certain technologies overlap different portions of the stack and different pieces of different technologies can be used as the DevOps team sees fit (e.g., can use Docker containers but don't have to use the Docker engine, could use engine from cloud provider instead). My confusion lies in understanding what each layer of the "Container Stack" provides and who the key providers are of each solution.
Here's my layman's understanding; would appreciate any corrections and feedback on holes in my understanding
Containers: self-contained package including application, runtime environment, system libraries, etc.; like a mini-OS with an application
It seems like Docker is the de-facto standard. Any others that are notable and widely used?
Container Clusters: groups of containers that share resources
Container Engine: groups containers into clusters, manages resources
Orchestrator: is this any different from a container engine? How?
Where do Docker Engine, rkt, Kubernetes, Google Container Engine, AWS Container Service, etc. fall between #s 2-4?
This may be a bit long and present some oversimplification but should be sufficient to get the idea across.
Physical machines
Some time ago, the best way to deploy simple applications was to simply buy a new webserver, install your favorite operating system on it, and run your applications there.
The cons of this model are:
The processes may interfere with each other (because they share CPU and file system resources), and one may affect the other's performance.
Scaling this system up/down is difficult as well, taking a lot of effort and time in setting up a new physical machine.
There may be differences in the hardware specifications, OS/kernel versions and software package versions of the physical machines, which make it difficult to manage these application instances in a hardware-agnostic manner.
Applications, being directly affected by the physical machine specifications, may need specific tweaking, recompilation, etc, which means that the cluster administrator needs to think of them as instances at an individual machine level. Hence, this approach does not scale. These properties make it undesirable for deploying modern production applications.
Virtual Machines
Virtual machines solve some of the problems of the above:
They provide isolation even while running on the same machine.
They provide a standard execution environment (the guest OS) irrespective of the underlying hardware.
They can be brought up on a different machine (replicated) quite quickly when scaling (order of minutes).
Applications typically do not need to be rearchitected for moving from physical hardware to virtual machines.
But they introduce some problems of their own:
They consume large amounts of resources in running an entire instance of an operating system.
They may not start/go down as fast as we want them to (order of seconds).
Even with hardware assisted virtualization, application instances may see significant performance degradation over an application running directly on the host.
(This may be an issue only for certain kinds of applications)
Packaging and distributing VM images is not as simple as it could be.
(This is not as much a drawback of the approach, as it is of the existing tooling for virtualization.)
Containers
Then, somewhere along the line, cgroups (control groups) were added to the linux kernel. This feature lets us isolate processes in groups, decide what other processes and file system they can see, and perform resource accounting at the group level.
Various container runtimes and engines came along which make the process of creating a "container", an environment within the OS, like a namespace which has limited visibility, resources, etc, very easy. Common examples of these include docker, rkt, runC, LXC, etc.
Docker, for example, includes a daemon which provides interactions like creating an "image", a reusable entity that can be launched into a container instantly. It also lets one manage individual containers in an intuitive way.
The advantages of containers:
They are light-weight and run with very little overhead, as they do not have their own instance of the kernel/OS and are running on top of a single host OS.
They offer some degree of isolation between the various containers and the ability to impose limits on various resources consumed by them (using the cgroup mechanism).
The tooling around them has evolved rapidly to allow easy building of reusable units (images), repositories for storing image revisions (container registries) and so on, largely due to docker.
It is encouraged that a single container run a single application process, in order to maintain and distribute it independently. The light-weight nature of a container make this preferable, and leads to faster development due to decoupling.
There are some cons as well:
The level of isolation provided is a less than that in case of VMs.
They are easiest to use with stateless 12-factor applications being built afresh and a slight struggle if one tries to deploy legacy applications, clustered distributed databases and so on.
They need orchestration and higher level primitives to be used effectively and at scale.
Container Orchestration
When running applications in production, as the complexity grows, it tends to have many different components, some of which scale up/down as necessary, or may need to be scaled. The containers themselves do not solve all our problems. We need a system that solves problems associated with real large-scale applications such as:
Networking between containers
Load balancing
Managing storage attached to these containers
Updating containers, scaling them, spreading them across nodes in a multi-node cluster and so on.
When we want to manage a cluster of containers, we use a container orchestration engine. Examples of these are Kubernetes, Mesos, Docker Swarm etc. They provide a host of functionality in addition to those listed above and the goal is to reduce the effort involved in dev-ops.
GKE (Google Container Engine) is hosted Kubernetes on Google Cloud Platform. It lets a user simply specify that they need an n-node kubernetes cluster and exposes the cluster itself as a managed instance. Kubernetes is open source and if one wanted to, one could also set it up on Google Compute Engine, a different cloud provider, or their own machines in their own data-center.
ECS is a proprietary container management/orchestration system built and operated by Amazon and available as part of the AWS suite.
To answer your questions specifically:
Docker engine: A tool to manage the lifecycle of a docker container and docker images. Create, restart, delete docker containers. Create, rename, delete docker images.
rkt: Analogous to docker engine, but different implementation
Kubernetes: A collection of tools to manage the lifecycle of a distributed application that uses containers. Contains tooling to manage containers, groups of containers, configuration for containers, orchestrating containers, scheduling them on actual instances, tooling to help developers write and maintain other services/tools to deal with containers.
Google Container Engine: Instead of getting VMs, installing "docker-engine" on them, installing kubernetes on them and getting it all to work with things like the right permissions to your infrastructure etc. imagine if it all came together so that you can choose the types of machines and the size of your cluster that has all of this just working. Things like pulling images from your project specific docker repository (google container registry) or claiming persistent volumes, or provisioning load-balancers just work without worrying about service accounts and permissions and what not.
ECS: Analogous to GKE (4) but without Kubernetes.
To address the points in your understanding: you are loosely right about things (except container engine I think). It's important to understand that the only important thing to understand is what a container is. The rest of it is just marketing/product names. It's also important to understand that today's understanding of containers is very warped by what Docker containers are and a lot of the opinions enforced by Docker and tooling around Docker. Containers have been around for a long time.
So once you understand what a (docker) container is, a container engine is just a tool to manage them, a container cluster is a just a group of containers, an orchestrator is just a tool to manage where containers run based on some parameters. IMHO, you really don't need to worry too much about what the rest of the tooling is once you understand and build a solid mental model around containers. The rest will just fit in automatically.
The best way to understand all of this? Build & deploy a decently complex application with Docker (persist data/use a database in your app) and everything will make sense.

In Docker Swarm mode is there any point in replicating a service more than the number of hosts available?

I have been looking into the new Docker Swarm mode that will be available in Docker 1.12. In this Docker Swarm Mode Walkthrough video, they create a simple Nginx service that is composed of a single Nginx container. In the video, they have 4 nodes in the Swarm cluster. During the scaling demonstration, they increase the replication factor to 10, thus creating 10 copies of the Nginx container across all 4 machines in the cluster.
I get that the video is just a demonstration, but in the real world, what is the point of creating more replicas of a container (or service) than there are nodes in the Swarm cluster? It seems to be pointless since two containers on the same machine would be sharing that machines finite computing resources anyway. I don't get what the benefit is.
So my question is, is there any real world benefit to replicating a Docker service or container beyond the number of nodes in the Swarm cluster?
Thanks
It depends on how the application handles threading and multiple requests. A single threaded application, or job that only handles one request at a time, may use a fraction of the OS resources and benefit from running multiple instances on a single host. An application that's been tuned to process requests concurrently and which fully utilizes the OS will see no benefit and will in fact incur a penalty of taking away resources to run multiple instances of the application.
One advantage can be performing live zero-downtime software updates. See the Docker 0.12rc2 Swarm tutorial on rolling updates
You have a RabbitMQ or other Queue System with a high load on data. You can start more Containers with workers than nodes to handle the high data load on your RabbitMQ.
Hardware resource constrain is not the only thing one needs to consider when you have your services replicated.
A simple example would be if you are having a service to provide security details. The resource consumption by this service will be low (read a record from Db/Cache and send it out). However if there are 20 or 30 requests to be handled by the same service the requests will be queued up.
Yes there are better ways to implement my example but I believe is good enough to illustrate why one might replicate a service on the same host/node.

Resources