what does it mean by 'one application per one operating system' in hyper-visor virtualization? - docker

I was going through the documentation for docker. It was providing the concepts for virtual machine before container. The author stated that, a server can be divided into multiple virtual machines having their own operating system. He also stated that, this way, multiple applications can be run in one physical server by running each of them in separate virtual machine (one virtual machine for one application). I was little bit confused here. Can't multiple applications run in one virtual machine (operating system) without the need for other vm ? By applications, what do we mean? I am a total beginner in this topic. If anyone could make me understand this terminology, I would be very grateful. Thank You.

An application is a service or a process such as: Nginx, PHP, Redis, Apache, Memcached and so on.
The reason why is recommended this way it is because containers have been designed to isolate a process by giving its own userspace and filesystem.
Therefore, this comes of benefits such as: having just one process per container makes it easily re-usable for another projects, easily scalable and you also separate worries so for example if run 2 applications inside a container and you want to shut one of them then will that process gracefully stop or you will have to stop the entire container?

Related

Guidance on when to chose virtual machines or physical machines over containers

There are many articles and videos comparing containers, virtual machines, physical machines. However almost all information is theoretical: containers are fast, VMs are secure, etc. But I could not find description of specific use cases or guidance on when to choose virtual machines, physical machines, but not containers. So, currently I cannot imagine situation when somebody gives recommendation to not use containers.
Question:
Could you please list specific applications or solutions when you would recommend using VMs, but not container?
Could you please list specific applications or solutions when you would recommend using OS over bare metal, but not containers or VMs?
Here is example of answer I would appreciate to get (note, that I am not sure if this information is correct):
Use case 1: Edge Router
Edge router is a router which connects organizational network to the Internet. Also, in this case it is assumed, that vendor of the router provides it not as device but as a software package (virtualized router).
Edge router most probably will be one of target of hacker's attacks. Thus security requirements come to the first place.
Containers are not recommended in this case. By default containers provide mediocre level of security. Strong security can be achieved with complex configuration (what configuration?) but this is more difficult than in case of VM or bare metal. In addition, high security level may require special hardened Linux kernel, however containers technology does not allow adjusting kernel configuration.
Virtual Machines would be a good choice if vendor of the router provides software as VM image or when organization has many edge routers (for example, many offices with internet access points), and has (or is ready to create) well-established process of preparation of VM images. In this case using VMs will simplify rollout, update and healing the virtualized edge router. VM also provides high security level; nevertheless is it still recommended to place such a VM in a separate server and to not share same server with other applications/VMs to avoid cross-VM attacks.
Physical machine would be a good choice if router vendor provides router's software as an application package (not as a VM) such as .rpm, and rollout, update and healing processes are not expected to take much efforts; this might be the case when when company has few routers (so updates can be performed manually or automated with tools like Ansible), and couple of hour of planned and unplanned downtime is acceptable.
Use case 2: ...
Thank you in advance.
The question is a bit vague so I'll try my best:
you'd usually allocate work to containers when you have a few separate applications with limited physical resources and you'd like to run them each with their own different environment (different runtime version, architecture and dependencies) which managing on a machine (physical or virtual) would be cumbersome.
you'd use a VM when you want specifically a feature that containers couldn't satisfy or it would just be a headache to set them up with it and a simple quick and easy VM could solve (and again you have limited resources you'd like to share between use cases)
and finally, a physical machine when performance is of the essence like I/O requests and latency around that.
you can also mix and match to match each tier needs:
we need to run many applications that VM would be too much of an overhead for them and containers would make their handling more automated and streamline so containers with k8s, but on the other hand, we want local storage offered to those containers to be very fast so we run the k8s cluster on physical machines.
if recoverability would be of the essence we would have used VM due to the options of snapshotting VM states over time.
It's all a big LEGO set you can mix and match depending on your use case and needs

When should I use a Docker and when should I use a Virtual Machine? [duplicate]

This question already has answers here:
How is Docker different from a virtual machine?
(23 answers)
Closed 3 years ago.
Is there any guidelines on when to use Dockers over VM's? (or vice versa)
It seems to me that services like NGINX, Apache, or Redis, should be a docker, but I am unsure if say an ElasticSearch docker should be used in a HPC environment.
Is a Docker always better then a VM?
First, it is a container; docker is one implementation of a container, neither the first nor the last.
A virtual machine (VM) is a superset of a container, so the question isn't about better, it is about depth - namely how much isolation do you need. At the most trivial level, isolation is about getting away from the .so-insanity that plagues the universe; determine what you need, dump it in a container, and voila, no more compatibility problems. At this level, the container is mainly about packaging; and nothing can go wrong with making packaging magic. Just make sure your resume is up to date.
At deeper levels, containers can be involved in isolated deployments. These work, but typically require substantial amounts of management software: orchestration software (k8s), service mess (istio :), [A-Z]AAS. Somewhere in this wilderness they intersect with Virtual Machines, which pre-package many concepts in the other layers, albeit with a different management platform.
Within the domain of VMs, we see a continuum of awareness within the guest of the host which extends from similar to the container (ie. highly dependent) to blithely ignorant (ie. classic virtual machine). The selection criteria in this arena mainly falls in the domain of trust -- this less you trust the guest, the more you want to isolate it; or the less the guest trusts the host environment, the more isolated it wants to be.
To review; in the realm of isolation, containers and VMs occupy overlapping extremes of a continuum. The container is the lightest way to manage packaging, but as the isolation needs increase, the VM becomes increasingly attractive. Within the VM continuum, there are trade offs between trust and performance. There is a tonne of software supporting every stop along the way; but that software is not (yet) unified.

Container technologies: docker, rkt, orchestration, kubernetes, GKE and AWS Container Service

I'm trying to get a good understanding of container technologies but am somewhat confused. It seems like certain technologies overlap different portions of the stack and different pieces of different technologies can be used as the DevOps team sees fit (e.g., can use Docker containers but don't have to use the Docker engine, could use engine from cloud provider instead). My confusion lies in understanding what each layer of the "Container Stack" provides and who the key providers are of each solution.
Here's my layman's understanding; would appreciate any corrections and feedback on holes in my understanding
Containers: self-contained package including application, runtime environment, system libraries, etc.; like a mini-OS with an application
It seems like Docker is the de-facto standard. Any others that are notable and widely used?
Container Clusters: groups of containers that share resources
Container Engine: groups containers into clusters, manages resources
Orchestrator: is this any different from a container engine? How?
Where do Docker Engine, rkt, Kubernetes, Google Container Engine, AWS Container Service, etc. fall between #s 2-4?
This may be a bit long and present some oversimplification but should be sufficient to get the idea across.
Physical machines
Some time ago, the best way to deploy simple applications was to simply buy a new webserver, install your favorite operating system on it, and run your applications there.
The cons of this model are:
The processes may interfere with each other (because they share CPU and file system resources), and one may affect the other's performance.
Scaling this system up/down is difficult as well, taking a lot of effort and time in setting up a new physical machine.
There may be differences in the hardware specifications, OS/kernel versions and software package versions of the physical machines, which make it difficult to manage these application instances in a hardware-agnostic manner.
Applications, being directly affected by the physical machine specifications, may need specific tweaking, recompilation, etc, which means that the cluster administrator needs to think of them as instances at an individual machine level. Hence, this approach does not scale. These properties make it undesirable for deploying modern production applications.
Virtual Machines
Virtual machines solve some of the problems of the above:
They provide isolation even while running on the same machine.
They provide a standard execution environment (the guest OS) irrespective of the underlying hardware.
They can be brought up on a different machine (replicated) quite quickly when scaling (order of minutes).
Applications typically do not need to be rearchitected for moving from physical hardware to virtual machines.
But they introduce some problems of their own:
They consume large amounts of resources in running an entire instance of an operating system.
They may not start/go down as fast as we want them to (order of seconds).
Even with hardware assisted virtualization, application instances may see significant performance degradation over an application running directly on the host.
(This may be an issue only for certain kinds of applications)
Packaging and distributing VM images is not as simple as it could be.
(This is not as much a drawback of the approach, as it is of the existing tooling for virtualization.)
Containers
Then, somewhere along the line, cgroups (control groups) were added to the linux kernel. This feature lets us isolate processes in groups, decide what other processes and file system they can see, and perform resource accounting at the group level.
Various container runtimes and engines came along which make the process of creating a "container", an environment within the OS, like a namespace which has limited visibility, resources, etc, very easy. Common examples of these include docker, rkt, runC, LXC, etc.
Docker, for example, includes a daemon which provides interactions like creating an "image", a reusable entity that can be launched into a container instantly. It also lets one manage individual containers in an intuitive way.
The advantages of containers:
They are light-weight and run with very little overhead, as they do not have their own instance of the kernel/OS and are running on top of a single host OS.
They offer some degree of isolation between the various containers and the ability to impose limits on various resources consumed by them (using the cgroup mechanism).
The tooling around them has evolved rapidly to allow easy building of reusable units (images), repositories for storing image revisions (container registries) and so on, largely due to docker.
It is encouraged that a single container run a single application process, in order to maintain and distribute it independently. The light-weight nature of a container make this preferable, and leads to faster development due to decoupling.
There are some cons as well:
The level of isolation provided is a less than that in case of VMs.
They are easiest to use with stateless 12-factor applications being built afresh and a slight struggle if one tries to deploy legacy applications, clustered distributed databases and so on.
They need orchestration and higher level primitives to be used effectively and at scale.
Container Orchestration
When running applications in production, as the complexity grows, it tends to have many different components, some of which scale up/down as necessary, or may need to be scaled. The containers themselves do not solve all our problems. We need a system that solves problems associated with real large-scale applications such as:
Networking between containers
Load balancing
Managing storage attached to these containers
Updating containers, scaling them, spreading them across nodes in a multi-node cluster and so on.
When we want to manage a cluster of containers, we use a container orchestration engine. Examples of these are Kubernetes, Mesos, Docker Swarm etc. They provide a host of functionality in addition to those listed above and the goal is to reduce the effort involved in dev-ops.
GKE (Google Container Engine) is hosted Kubernetes on Google Cloud Platform. It lets a user simply specify that they need an n-node kubernetes cluster and exposes the cluster itself as a managed instance. Kubernetes is open source and if one wanted to, one could also set it up on Google Compute Engine, a different cloud provider, or their own machines in their own data-center.
ECS is a proprietary container management/orchestration system built and operated by Amazon and available as part of the AWS suite.
To answer your questions specifically:
Docker engine: A tool to manage the lifecycle of a docker container and docker images. Create, restart, delete docker containers. Create, rename, delete docker images.
rkt: Analogous to docker engine, but different implementation
Kubernetes: A collection of tools to manage the lifecycle of a distributed application that uses containers. Contains tooling to manage containers, groups of containers, configuration for containers, orchestrating containers, scheduling them on actual instances, tooling to help developers write and maintain other services/tools to deal with containers.
Google Container Engine: Instead of getting VMs, installing "docker-engine" on them, installing kubernetes on them and getting it all to work with things like the right permissions to your infrastructure etc. imagine if it all came together so that you can choose the types of machines and the size of your cluster that has all of this just working. Things like pulling images from your project specific docker repository (google container registry) or claiming persistent volumes, or provisioning load-balancers just work without worrying about service accounts and permissions and what not.
ECS: Analogous to GKE (4) but without Kubernetes.
To address the points in your understanding: you are loosely right about things (except container engine I think). It's important to understand that the only important thing to understand is what a container is. The rest of it is just marketing/product names. It's also important to understand that today's understanding of containers is very warped by what Docker containers are and a lot of the opinions enforced by Docker and tooling around Docker. Containers have been around for a long time.
So once you understand what a (docker) container is, a container engine is just a tool to manage them, a container cluster is a just a group of containers, an orchestrator is just a tool to manage where containers run based on some parameters. IMHO, you really don't need to worry too much about what the rest of the tooling is once you understand and build a solid mental model around containers. The rest will just fit in automatically.
The best way to understand all of this? Build & deploy a decently complex application with Docker (persist data/use a database in your app) and everything will make sense.

Is it better to start multiple erlang nodes per machine, or just one per machine?

Preface: When I say "machine" below, I mean either a physical dedicated server, or a virtual private server. When I say "node" I mean, an instance of the erlang virtual machine, of which there could be multiple running as separate processes under a single unix kernel.
I've got a project that involves multiple erlang/OTP applications. The applications will be running together and talking to each other on the same machine. They will all be hitting the disk, using memory and spawning erlang processes. They will also be using network resources because they will be talking to similar machines with the same set of applications running on them in a cluster.
Almost all of this communication is via HTTP. Thus I could separate each erlang OTP application into a separate instance of the erlang VM on the same machine and they could still talk to each other.
My question is: Is it better to have them running all under one erlang VM so that this erlang VM process can allocate access to resources among them, and schedule the execution of the various erlang processes.
Or is it better to have separate erlang nodes on a given server?
If one is better than the other, why?
I'm assuming running all of these apps in a single erlang vm which is given, essentially, full run of the server, will result in better performance. The OS is just managing the disk and ram at the low level, and only has one significant process (the erlang VM) to switch with... and the erlang VM is probably smarter about allocating resources when it has the holistic view of all the erlang processes.
This may be something that I need to test, but I'm not in a position to do so effectively in the near term.
The answer is: it depends.
Advantages of using a single node:
Memory is controlled by a single Erlang VM. It is way easier.
Inter-application communication (if using erlang-messaging) is faster.
Less operating system context switches happens
Advantages of using multiple nodes:
If the system is linking in C code to the VM, death of one node due to a bug in C will not kill the others.
Agree with #I GIVE CRAP ANSWERS
I would go with one VM. Here is why:
dynamic handling of run time queues belonging to schedulers (with varied origin of CPU load its important)
fewer VMs to monitor
better understanding of memory allocation and easier to spot malicious process (can compare all of them at once)
much easier inter app supervision
I wouldn't care about VM crash - you need to be prepared any way. Heart works especially well in the cluster of equal units.
We've always used one VM per application because it's easier to manage.
The scheduler and SMP support in Erlang have come a long way in the past few years, so there isn't as much reason as there used to be to run multiple VMs on the same node.
I Agree with previous answers but there is a case scenario where having multiple nodes per cpu is the answer: When a heavy task hits the node. A task may take multiple minutes to complete and in such case a gen server will hold the node until completion of the task.

Should separate Erlang applications share the same VM on the same machine?

I have a CouchDB instance running on one machine, and thus with its own Erlang VM process. If I have another separate Erlang application running on that machine too, is it be better to share the same VM between CouchDB and my application, or is it recommended to start a new Erlang node?
While many would recommend decoupling these subsystems I would take the opposite approach. Erlang has a built in strategy to run many applications on the same release. If your applications talk to each other directly it might make sense for you to bundle them together into a release. This will make calls between the applications faster. Some will argue that all you applications now share the same fate should you need to take the system down for an upgrade that only one of the applications needs. This is a moot point with Erlang where you are distributing your applications across many nodes. Also most upgrades can be done with hot code loading.
There is no problem with running several VMs on the same machnie (at least recent OTP releases), however it is quite handy if you have all your applications on one Erlang node. Easier communication, dependency management, supervision, fault-tolerance - you get it for free in this case, not mentioning maintaining only one 'node' in source control system.
The problem starts with CouchDB. It does not have decent build system which let you use it as one of independent Erlang node applications. So in this case you need to have at least 2 VMs (one acts as Couch daemon, the other one hosts your application)

Resources