I've heard about Docker some days ago and wanted to go across.
But in fact, I don't know what is the purpose of this "container"?
What is a container?
Can it replace a virtual machine dedicated to development?
What is the purpose, in simple words, of using Docker in companies? The main advantage?
VM: Using virtual machine (VM) software, for example, Ubuntu can be installed inside a Windows. And they would both run at the same time. It is like building a PC, with its core components like CPU, RAM, Disks, Network Cards etc, within an operating system and assemble them to work as if it was a real PC. This way, the virtual PC becomes a "guest" inside an actual PC which with its operating system, which is called a host.
Container: It's same as above but instead of using an entire operating system, it cut down the "unnecessary" components of the virtual OS to create a minimal version of it. This lead to the creation of LXC (Linux Containers). It therefore should be faster and more efficient than VMs.
Docker: A docker container, unlike a virtual machine and container, does not require or include a separate operating system. Instead, it relies on the Linux kernel's functionality and uses resource isolation.
Purpose of Docker: Its primary focus is to automate the deployment of applications inside software containers and the automation of operating system level virtualization on Linux. It's more lightweight than standard Containers and boots up in seconds.
(Notice that there's no Guest OS required in case of Docker)
[ Note, this answer focuses on Linux containers and may not fully apply to other operating systems. ]
What is a container ?
It's an App: A container is a way to run applications that are isolated from each other. Rather than virtualizing the hardware to run multiple operating systems, containers rely on virtualizing the operating system to run multiple applications. This means you can run more containers on the same hardware than VMs because you only have one copy of the OS running, and you do not need to preallocate the memory and CPU cores for each instance of your app. Just like any other app, when a container needs the CPU or Memory, it allocates them, and then frees them up when done, allowing other apps to use those same limited resources later.
They leverage kernel namespaces: Each container by default will receive an environment where the following are namespaced:
Mount: filesystems, / in the container will be different from / on the host.
PID: process id's, pid 1 in the container is your launched application, this pid will be different when viewed from the host.
Network: containers run with their own loopback interface (127.0.0.1) and a private IP by default. Docker uses technologies like Linux bridge networks to connect multiple containers together in their own private lan.
IPC: interprocess communication
UTS: this includes the hostname
User: you can optionally shift all the user id's to be offset from that of the host
Each of these namespaces also prevent a container from seeing things like the filesystem or processes on the host, or in other containers, unless you explicitly remove that isolation.
And other linux security tools: Containers also utilize other security features like SELinux, AppArmor, Capabilities, and Seccomp to limit users inside the container, including the root user, from being able to escape the container or negatively impact the host.
Package your apps with their dependencies for portability: Packaging an application into a container involves assembling not only the application itself, but all dependencies needed to run that application, into a portable image. This image is the base filesystem used to create a container. Because we are only isolating the application, this filesystem does not include the kernel and other OS utilities needed to virtualize an entire operating system. Therefore, an image for a container should be significantly smaller than an image for an equivalent virtual machine, making it faster to deploy to nodes across the network. As a result, containers have become a popular option for deploying applications into the cloud and remote data centers.
Can it replace a virtual machine dedicated to development ?
It depends: If your development environment is running Linux, and you either do not need access to hardware devices, or it is acceptable to have direct access to the physical hardware, then you'll find a migration to a Linux container fairly straight forward. The ideal target for a docker container are applications like web based API's (e.g. a REST app), which you access via the network.
What is the purpose, in simple words, of using Docker in companies ? The main advantage ?
Dev or Ops: Docker is typically brought into an environment in one of two paths. Developers looking for a way to more rapidly develop and locally test their application, and operations looking to run more workload on less hardware than would be possible with virtual machines.
Or Devops: One of the ideal targets is to leverage Docker immediately from the CI/CD deployment tool, compiling the application and immediately building an image that is deployed to development, CI, prod, etc. Containers often reduce the time to move the application from the code check-in until it's available for testing, making developers more efficient. And when designed properly, the same image that was tested and approved by the developers and CI tools can be deployed in production. Since that image includes all the application dependencies, the risk of something breaking in production that worked in development are significantly reduced.
Scalability: One last key benefit of containers that I'll mention is that they are designed for horizontal scalability in mind. When you have stateless apps under heavy load, containers are much easier and faster to scale out due to their smaller image size and reduced overhead. For this reason you see containers being used by many of the larger web based companies, like Google and Netflix.
Same questions were hitting my head some days ago and what i found after getting into it, let's understand in very simple words.
Why one would think about docker and containers when everything seems fine with current process of application architecture and development !!
Let's take an example that we are developing an application using nodeJs , MongoDB, Redis, RabbitMQ etc services [you can think of any other services].
Now we face these following things as problems in application development and shipping process if we forget about existence of docker or other alternatives of containerizing applications.
Compatibility of services(nodeJs, mongoDB, Redis, RabbitMQ etc.) with OS(even after finding compatible versions with OS, if something unexpected happens related to versions then we need to relook the compatibility again and fix that).
If two system components requires a library/dependency with different versions in application in OS(That need a relook every time in case of an unexpected behaviour of application due to library and dependency version issue).
Most importantly , If new person joins the team, we find it very difficult to setup the new environment, person has to follow large set of instructions and run hundreds of commands to finally setup the environment And it takes time and effort.
People have to make sure that they are using right version of OS and check compatibilities of services with OS.And each developer has to follow this each time while setting up.
We also have different environment like dev, test and production.If One developer is comfortable using one OS and other is comfortable with other OS And in this case, we can't guarantee that our application will behave in same way in these two different situations.
All of these make our life difficult in process of developing , testing and shipping the applications.
So we need something which handles compatibility issue and allows us to make changes and modifications in any system component without affecting other components.
Now we think about docker because it's purpose is to
containerise the applications and automate the deployment of applications and ship them very easily.
How docker solves above issues-
We can run each service component(nodeJs, MongoDB, Redis, RabbitMQ) in different containers with its own dependencies and libraries in the same OS but with different environments.
We have to just run docker configuration once then all our team developers can get started with simple docker run command, we have saved lot of time and efforts here:).
So containers are isolated environments with all dependencies and
libraries bundled together with their own process and networking
interfaces and mounts.
All containers use the same OS resources
therefore they take less time to boot up and utilise the CPU
efficiently with less hardware costs.
I hope this would be helpful.
Why use docker:
Docker makes it really easy to install and running software without worrying about setup or dependencies. Docker is really made it easy and really straight forward for you to install and run software on any given computer not just your computer but on web servers as well or any cloud based computing platform. For example when I went to install redis in my computer by using bellow command
wget http://download.redis.io/redis-stable.tar.gz
I got error,
Now I could definitely go and troubleshoot this install that program and then try installing redis again, and I kind of get into endless cycle of trying to do all bellow troubleshooting as you I am installing and running software.
Now let me show you how easy it is to run read as if you are making use of Docker instead. just run the command docker run -it redis, this command will install docker without any error.
What docker is:
To understand what is docker you have to know about docker Ecosystem.
Docker client, server, Machine, Images, Hub, Composes are all projects tools pieces of software that come together to form a platform where ecosystem around creating and running something called containers, now if you run the command docker run redis something called docker CLI reached out to something called the Docker Hub and it downloaded a single file called an image.
An image is a single file containing all the dependencies and all the configuration required to run a very specific program, for example redis this which is what the image that you just downloaded was supposed to run.
This is a single file that gets stored on your hard drive and at some point time you can use this image to create something called a container.
A container is an instance of an image and you can kind of think it as being like a running program with it's own isolated set of hardware resources so it kind of has its own little set or its own little space of memory has its own little space of networking technology and its own little space of hard drive space as well.
Now lets examine when you give bellow command:
sudo docker run hello-world
Above command will starts up the docker client or docker CLI, Docker CLI is in charge of taking commands from you kind of doing a little bit of processing on them and then communicating the commands over to something called the docker server, and docker server is in charge of the heavy lifting when we ran the command Docker run hello-world,
That meant that we wanted to start up a new container using the image with the name of hello world, the hello world image has a tiny tittle program inside of it whose sole purpose or sole job is to print out the message that you see in the terminal.
Now when we ran that command and it was issued over to the docker server a series of actions very quickly occurred in background. The Docker server saw that we were trying to start up a new container using an image called hello world.
The first thing that the docker server did was check to see if it already had a local copy like a copy on your personal machine of the hello world image or that hello world file.So the docker server looked into something called the image cache.
Now because you and I just installed Docker on our personal computers that image cache is currently empty, We have no images that have already been downloaded before.
So because the image cache was empty the docker server decided to reach out to a free service called Docker hub. The Docker Hub is a repository of free public images that you can freely download and run on your personal computer. So Docker server reached out to Docker Hub and and downloaded the hello world file and stored it on your computer in the image-cache, where it can now be re-run at some point the future very quickly without having to re-downloading it from the docker hub.
After that the docker server will use it to create an instance of a container, and we know that a container is an instance of an image, its sole purpose is to run one very specific program. So the docker server then essentially took that image file from image cache and loaded it up into memory to created a container out of it and then ran a single program inside of it. And that single programs purpose was to print out the message that you see.
What a container is:
A container is a process or a set of processes that have a grouping of resource specifically assigned to it, in the bellow is a diagram that anytime that we think about a container we've got some running process that sends a system call to a kernel, the kernel is going to look at that incoming system call and direct it to a very specific portion of the hard drive, the RAM, CPU or what ever else it might need and a portion of each of these resources is made available to that singular process.
Let me try to provide as simple answers as possible:
But in fact, I don't know what is the purpose of this "container"?
What is a container?
Simply put: a package containing software. More specifically, an application and all its dependencies bundled together. A regular, non-dockerised application environment is hooked directly to the OS, whereas a Docker container is an OS abstraction layer.
And a container differs from an image in that a container is a runtime instance of an image - similar to how objects are runtime instances of classes in case you're familiar with OOP.
Can it replace a virtual machine dedicated to development?
Both VMs and Docker containers are virtualisation techniques, in that they provide abstraction on top of system infrastructure.
A VM runs a full “guest” operating system with virtual access to host resources through a hypervisor. This means that the VM often provides the environment with more resources than it actually needs In general, VMs provide an environment with more resources than most applications need. Therefore, containers are a lighter-weight technique. The two solve different problems.
What is the purpose, in simple words, of using Docker in companies?
The main advantage?
Containerisation goes hand-in-hand with microservices. The smaller services that make up the larger application are often tested and run in Docker containers. This makes continuous testing easier.
Also, because Docker containers are read-only they enforce a key DevOps principle: production services should remain unaltered
Some general benefits of using them:
Great isolation of services
Great manageability as containers contain everything the app needs
Encapsulation of implementation technology (in the containers)
Efficient resource utilisation (due to light-weight os virtualisation) in comparison to VMs
Fast deployment
If you don't have any prior experience with Docker this answer will cover the basics needed as a developer.
Docker has become a standard tool for DevOps as it is an effective application to improve operational efficiencies. When you look at why Docker was created and why it is very popular, it is mostly for its ability to reduce the amount of time it takes to set up the environments where applications run and are developed.
Just look at how long it takes to set up an environment where you have React as the frontend, a node and express API for backend, which also needs Mongo. And that's just to start. Then when your team grows and you have multiple developers working on the same front and backend and therefore they need to set up the same resources in their local environment for testing purposes, how can you guarantee every developer will run the same environment resources, let alone the same versions? All of these scenarios play well into Docker's strengths where it's value comes from setting containers with specific settings, environments and even versions of resources. Simply type a few commands to have Docker set up, install, and run your resources automatically.
Let's briefly go over the main components. A container is basically where your application or specific resource is located. For example, you could have the Mongo database in one container, then the frontend React application, and finally your node express server in the third container.
Then you have an image, which is from what the container is built. The images contains all the information that a container needs to build a container exactly the same way across any systems. It's like a recipe.
Then you have volumes, which holds the data of your containers. So if your applications are on containers, which are static and unchanging, the data that change is on the volumes.
And finally, the pieces that allow all these items to speak is networking. Yes, that sounds simple, but understand that each container in Docker have no idea of the existence of each container. They're fully isolated. So unless we set up networking in Docker, they won't have any idea how to connect to one and another.
There are really good answers above which I found really helpful.
Below I had drafted a simpler answer:
Reasons to dockerize my web application?
a. One OS for multiple applications ( Resources are shared )
b. Resource manangement ( CPU / RAM) is efficient.
c. Serverless Implementation made easier -Yes, AWS ECS with Fargate, But serverless can be achieved with Lamdba
d. Infra As Code - Agree, but IaC can be achieved via Terraforms
e. "It works in my machine" Issue
Still, below questions are open when choosing dockerization
A simple spring boot application
a. Jar file with size ~50MB
b. creates a Docker Image ~500MB
c. Cant I simply choose a small ec2 instance for my microservices.
Financial Benefits (reducing the individual instance cost) ?
a. No need to pay for individual OS subscription
b. Is there any monetary benefit like the below implementation?
c. let say select t3.2xlarge ( 8 core / 32 GB) and start 4-5 docker images ?
I keep rereading the Docker documentation to try to understand the difference between Docker and a full VM. How does it manage to provide a full filesystem, isolated networking environment, etc. without being as heavy?
Why is deploying software to a Docker image (if that's the right term) easier than simply deploying to a consistent production environment?
Docker originally used LinuX Containers (LXC), but later switched to runC (formerly known as libcontainer), which runs in the same operating system as its host. This allows it to share a lot of the host operating system resources. Also, it uses a layered filesystem (AuFS) and manages networking.
AuFS is a layered file system, so you can have a read only part and a write part which are merged together. One could have the common parts of the operating system as read only (and shared amongst all of your containers) and then give each container its own mount for writing.
So, let's say you have a 1 GB container image; if you wanted to use a full VM, you would need to have 1 GB x number of VMs you want. With Docker and AuFS you can share the bulk of the 1 GB between all the containers and if you have 1000 containers you still might only have a little over 1 GB of space for the containers OS (assuming they are all running the same OS image).
A full virtualized system gets its own set of resources allocated to it, and does minimal sharing. You get more isolation, but it is much heavier (requires more resources). With Docker you get less isolation, but the containers are lightweight (require fewer resources). So you could easily run thousands of containers on a host, and it won't even blink. Try doing that with Xen, and unless you have a really big host, I don't think it is possible.
A full virtualized system usually takes minutes to start, whereas Docker/LXC/runC containers take seconds, and often even less than a second.
There are pros and cons for each type of virtualized system. If you want full isolation with guaranteed resources, a full VM is the way to go. If you just want to isolate processes from each other and want to run a ton of them on a reasonably sized host, then Docker/LXC/runC seems to be the way to go.
For more information, check out this set of blog posts which do a good job of explaining how LXC works.
Why is deploying software to a docker image (if that's the right term) easier than simply deploying to a consistent production environment?
Deploying a consistent production environment is easier said than done. Even if you use tools like Chef and Puppet, there are always OS updates and other things that change between hosts and environments.
Docker gives you the ability to snapshot the OS into a shared image, and makes it easy to deploy on other Docker hosts. Locally, dev, qa, prod, etc.: all the same image. Sure you can do this with other tools, but not nearly as easily or fast.
This is great for testing; let's say you have thousands of tests that need to connect to a database, and each test needs a pristine copy of the database and will make changes to the data. The classic approach to this is to reset the database after every test either with custom code or with tools like Flyway - this can be very time-consuming and means that tests must be run serially. However, with Docker you could create an image of your database and run up one instance per test, and then run all the tests in parallel since you know they will all be running against the same snapshot of the database. Since the tests are running in parallel and in Docker containers they could run all on the same box at the same time and should finish much faster. Try doing that with a full VM.
From comments...
Interesting! I suppose I'm still confused by the notion of "snapshot[ting] the OS". How does one do that without, well, making an image of the OS?
Well, let's see if I can explain. You start with a base image, and then make your changes, and commit those changes using docker, and it creates an image. This image contains only the differences from the base. When you want to run your image, you also need the base, and it layers your image on top of the base using a layered file system: as mentioned above, Docker uses AuFS. AuFS merges the different layers together and you get what you want; you just need to run it. You can keep adding more and more images (layers) and it will continue to only save the diffs. Since Docker typically builds on top of ready-made images from a registry, you rarely have to "snapshot" the whole OS yourself.
It might be helpful to understand how virtualization and containers work at a low level. That will clear up lot of things.
Note: I'm simplifying a bit in the description below. See references for more information.
How does virtualization work at a low level?
In this case the VM manager takes over the CPU ring 0 (or the "root mode" in newer CPUs) and intercepts all privileged calls made by the guest OS to create the illusion that the guest OS has its own hardware. Fun fact: Before 1998 it was thought to be impossible to achieve this on the x86 architecture because there was no way to do this kind of interception. The folks at VMware were the first who had an idea to rewrite the executable bytes in memory for privileged calls of the guest OS to achieve this.
The net effect is that virtualization allows you to run two completely different OSes on the same hardware. Each guest OS goes through all the processes of bootstrapping, loading kernel, etc. You can have very tight security. For example, a guest OS can't get full access to the host OS or other guests and mess things up.
How do containers work at a low level?
Around 2006, people including some of the employees at Google implemented a new kernel level feature called namespaces (however the idea long before existed in FreeBSD). One function of the OS is to allow sharing of global resources like network and disks among processes. What if these global resources were wrapped in namespaces so that they are visible only to those processes that run in the same namespace? Say, you can get a chunk of disk and put that in namespace X and then processes running in namespace Y can't see or access it. Similarly, processes in namespace X can't access anything in memory that is allocated to namespace Y. Of course, processes in X can't see or talk to processes in namespace Y. This provides a kind of virtualization and isolation for global resources. This is how Docker works: Each container runs in its own namespace but uses exactly the same kernel as all other containers. The isolation happens because the kernel knows the namespace that was assigned to the process and during API calls it makes sure that the process can only access resources in its own namespace.
The limitations of containers vs VMs should be obvious now: You can't run completely different OSes in containers like in VMs. However you can run different distros of Linux because they do share the same kernel. The isolation level is not as strong as in a VM. In fact, there was a way for a "guest" container to take over the host in early implementations. Also you can see that when you load a new container, an entire new copy of the OS doesn't start like it does in a VM. All containers share the same kernel. This is why containers are light weight. Also unlike a VM, you don't have to pre-allocate a significant chunk of memory to containers because we are not running a new copy of the OS. This enables running thousands of containers on one OS while sandboxing them, which might not be possible if we were running separate copies of the OS in their own VMs.
Good answers. Just to get an image representation of container vs VM, have a look at the one below.
Source
I like Ken Cochrane's answer.
But I want to add additional point of view, not covered in detail here. In my opinion Docker differs also in whole process. In contrast to VMs, Docker is not (only) about optimal resource sharing of hardware, moreover it provides a "system" for packaging application (preferable, but not a must, as a set of microservices).
To me it fits in the gap between developer-oriented tools like rpm, Debian packages, Maven, npm + Git on one side and ops tools like Puppet, VMware, Xen, you name it...
Why is deploying software to a docker image (if that's the right term) easier than simply deploying to a consistent production environment?
Your question assumes some consistent production environment. But how to keep it consistent?
Consider some amount (>10) of servers and applications, stages in the pipeline.
To keep this in sync you'll start to use something like Puppet, Chef or your own provisioning scripts, unpublished rules and/or lot of documentation... In theory servers can run indefinitely, and be kept completely consistent and up to date. Practice fails to manage a server's configuration completely, so there is considerable scope for configuration drift, and unexpected changes to running servers.
So there is a known pattern to avoid this, the so called immutable server. But the immutable server pattern was not loved. Mostly because of the limitations of VMs that were used before Docker. Dealing with several gigabytes big images, moving those big images around, just to change some fields in the application, was very very laborious. Understandable...
With a Docker ecosystem, you will never need to move around gigabytes on "small changes" (thanks aufs and Registry) and you don't need to worry about losing performance by packaging applications into a Docker container at runtime. You don't need to worry about versions of that image.
And finally you will even often be able to reproduce complex production environments even on your Linux laptop (don't call me if doesn't work in your case ;))
And of course you can start Docker containers in VMs (it's a good idea). Reduce your server provisioning on the VM level. All the above could be managed by Docker.
P.S. Meanwhile Docker uses its own implementation "libcontainer" instead of LXC. But LXC is still usable.
Docker isn't a virtualization methodology. It relies on other tools that actually implement container-based virtualization or operating system level virtualization. For that, Docker was initially using LXC driver, then moved to libcontainer which is now renamed as runc. Docker primarily focuses on automating the deployment of applications inside application containers. Application containers are designed to package and run a single service, whereas system containers are designed to run multiple processes, like virtual machines. So, Docker is considered as a container management or application deployment tool on containerized systems.
In order to know how it is different from other virtualizations, let's go through virtualization and its types. Then, it would be easier to understand what's the difference there.
Virtualization
In its conceived form, it was considered a method of logically dividing mainframes to allow multiple applications to run simultaneously. However, the scenario drastically changed when companies and open source communities were able to provide a method of handling the privileged instructions in one way or another and allow for multiple operating systems to be run simultaneously on a single x86 based system.
Hypervisor
The hypervisor handles creating the virtual environment on which the guest virtual machines operate. It supervises the guest systems and makes sure that resources are allocated to the guests as necessary. The hypervisor sits in between the physical machine and virtual machines and provides virtualization services to the virtual machines. To realize it, it intercepts the guest operating system operations on the virtual machines and emulates the operation on the host machine's operating system.
The rapid development of virtualization technologies, primarily in cloud, has driven the use of virtualization further by allowing multiple virtual servers to be created on a single physical server with the help of hypervisors, such as Xen, VMware Player, KVM, etc., and incorporation of hardware support in commodity processors, such as Intel VT and AMD-V.
Types of Virtualization
The virtualization method can be categorized based on how it mimics hardware to a guest operating system and emulates a guest operating environment. Primarily, there are three types of virtualization:
Emulation
Paravirtualization
Container-based virtualization
Emulation
Emulation, also known as full virtualization runs the virtual machine OS kernel entirely in software. The hypervisor used in this type is known as Type 2 hypervisor. It is installed on the top of the host operating system which is responsible for translating guest OS kernel code to software instructions. The translation is done entirely in software and requires no hardware involvement. Emulation makes it possible to run any non-modified operating system that supports the environment being emulated. The downside of this type of virtualization is an additional system resource overhead that leads to a decrease in performance compared to other types of virtualizations.
Examples in this category include VMware Player, VirtualBox, QEMU, Bochs, Parallels, etc.
Paravirtualization
Paravirtualization, also known as Type 1 hypervisor, runs directly on the hardware, or “bare-metal”, and provides virtualization services directly to the virtual machines running on it. It helps the operating system, the virtualized hardware, and the real hardware to collaborate to achieve optimal performance. These hypervisors typically have a rather small footprint and do not, themselves, require extensive resources.
Examples in this category include Xen, KVM, etc.
Container-based Virtualization
Container-based virtualization, also known as operating system-level virtualization, enables multiple isolated executions within a single operating system kernel. It has the best possible performance and density and features dynamic resource management. The isolated virtual execution environment provided by this type of virtualization is called a container and can be viewed as a traced group of processes.
The concept of a container is made possible by the namespaces feature added to Linux kernel version 2.6.24. The container adds its ID to every process and adding new access control checks to every system call. It is accessed by the clone() system call that allows creating separate instances of previously-global namespaces.
Namespaces can be used in many different ways, but the most common approach is to create an isolated container that has no visibility or access to objects outside the container. Processes running inside the container appear to be running on a normal Linux system although they are sharing the underlying kernel with processes located in other namespaces, same for other kinds of objects. For instance, when using namespaces, the root user inside the container is not treated as root outside the container, adding additional security.
The Linux Control Groups (cgroups) subsystem, the next major component to enable container-based virtualization, is used to group processes and manage their aggregate resource consumption. It is commonly used to limit the memory and CPU consumption of containers. Since a containerized Linux system has only one kernel and the kernel has full visibility into the containers, there is only one level of resource allocation and scheduling.
Several management tools are available for Linux containers, including LXC, LXD, systemd-nspawn, lmctfy, Warden, Linux-VServer, OpenVZ, Docker, etc.
Containers vs Virtual Machines
Unlike a virtual machine, a container does not need to boot the operating system kernel, so containers can be created in less than a second. This feature makes container-based virtualization unique and desirable than other virtualization approaches.
Since container-based virtualization adds little or no overhead to the host machine, container-based virtualization has near-native performance
For container-based virtualization, no additional software is required, unlike other virtualizations.
All containers on a host machine share the scheduler of the host machine saving need of extra resources.
Container states (Docker or LXC images) are small in size compared to virtual machine images, so container images are easy to distribute.
Resource management in containers is achieved through cgroups. Cgroups does not allow containers to consume more resources than allocated to them. However, as of now, all resources of host machine are visible in virtual machines, but can't be used. This can be realized by running top or htop on containers and host machine at the same time. The output across all environments will look similar.
Update:
How does Docker run containers in non-Linux systems?
If containers are possible because of the features available in the Linux kernel, then the obvious question is how do non-Linux systems run containers. Both Docker for Mac and Windows use Linux VMs to run the containers. Docker Toolbox used to run containers in Virtual Box VMs. But, the latest Docker uses Hyper-V in Windows and Hypervisor.framework in Mac.
Now, let me describe how Docker for Mac runs containers in detail.
Docker for Mac uses https://github.com/moby/hyperkit to emulate the hypervisor capabilities and Hyperkit uses hypervisor.framework in its core. Hypervisor.framework is Mac's native hypervisor solution. Hyperkit also uses VPNKit and DataKit to namespace network and filesystem respectively.
The Linux VM that Docker runs in Mac is read-only. However, you can bash into it by running:
screen ~/Library/Containers/com.docker.docker/Data/vms/0/tty.
Now, we can even check the Kernel version of this VM:
# uname -a
Linux linuxkit-025000000001 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:86_64 Linux.
All containers run inside this VM.
There are some limitations to hypervisor.framework. Because of that Docker doesn't expose docker0 network interface in Mac. So, you can't access containers from the host. As of now, docker0 is only available inside the VM.
Hyper-v is the native hypervisor in Windows. They are also trying to leverage Windows 10's capabilities to run Linux systems natively.
Most of the answers here talk about virtual machines. I'm going to give you a one-liner response to this question that has helped me the most over the last couple years of using Docker. It's this:
Docker is just a fancy way to run a process, not a virtual machine.
Now, let me explain a bit more about what that means. Virtual machines are their own beast. I feel like explaining what Docker is will help you understand this more than explaining what a virtual machine is. Especially because there are many fine answers here telling you exactly what someone means when they say "virtual machine". So...
A Docker container is just a process (and its children) that is compartmentalized using cgroups inside the host system's kernel from the rest of the processes. You can actually see your Docker container processes by running ps aux on the host. For example, starting apache2 "in a container" is just starting apache2 as a special process on the host. It's just been compartmentalized from other processes on the machine. It is important to note that your containers do not exist outside of your containerized process' lifetime. When your process dies, your container dies. That's because Docker replaces pid 1 inside your container with your application (pid 1 is normally the init system). This last point about pid 1 is very important.
As far as the filesystem used by each of those container processes, Docker uses UnionFS-backed images, which is what you're downloading when you do a docker pull ubuntu. Each "image" is just a series of layers and related metadata. The concept of layering is very important here. Each layer is just a change from the layer underneath it. For example, when you delete a file in your Dockerfile while building a Docker container, you're actually just creating a layer on top of the last layer which says "this file has been deleted". Incidentally, this is why you can delete a big file from your filesystem, but the image still takes up the same amount of disk space. The file is still there, in the layers underneath the current one. Layers themselves are just tarballs of files. You can test this out with docker save --output /tmp/ubuntu.tar ubuntu and then cd /tmp && tar xvf ubuntu.tar. Then you can take a look around. All those directories that look like long hashes are actually the individual layers. Each one contains files (layer.tar) and metadata (json) with information about that particular layer. Those layers just describe changes to the filesystem which are saved as a layer "on top of" its original state. When reading the "current" data, the filesystem reads data as though it were looking only at the top-most layers of changes. That's why the file appears to be deleted, even though it still exists in "previous" layers, because the filesystem is only looking at the top-most layers. This allows completely different containers to share their filesystem layers, even though some significant changes may have happened to the filesystem on the top-most layers in each container. This can save you a ton of disk space, when your containers share their base image layers. However, when you mount directories and files from the host system into your container by way of volumes, those volumes "bypass" the UnionFS, so changes are not stored in layers.
Networking in Docker is achieved by using an ethernet bridge (called docker0 on the host), and virtual interfaces for every container on the host. It creates a virtual subnet in docker0 for your containers to communicate "between" one another. There are many options for networking here, including creating custom subnets for your containers, and the ability to "share" your host's networking stack for your container to access directly.
Docker is moving very fast. Its documentation is some of the best documentation I've ever seen. It is generally well-written, concise, and accurate. I recommend you check the documentation available for more information, and trust the documentation over anything else you read online, including Stack Overflow. If you have specific questions, I highly recommend joining #docker on Freenode IRC and asking there (you can even use Freenode's webchat for that!).
Through this post we are going to draw some lines of differences between VMs and LXCs. Let's first define them.
VM:
A virtual machine emulates a physical computing environment, but requests for CPU, memory, hard disk, network and other hardware resources are managed by a virtualization layer which translates these requests to the underlying physical hardware.
In this context the VM is called as the Guest while the environment it runs on is called the host.
LXCs:
Linux Containers (LXC) are operating system-level capabilities that make it possible to run multiple isolated Linux containers, on one control host (the LXC host). Linux Containers serve as a lightweight alternative to VMs as they don’t require the hypervisors viz. Virtualbox, KVM, Xen, etc.
Now unless you were drugged by Alan (Zach Galifianakis- from the Hangover series) and have been in Vegas for the last year, you will be pretty aware about the tremendous spurt of interest for Linux containers technology, and if I will be specific one container project which has created a buzz around the world in last few months is – Docker leading to some echoing opinions that cloud computing environments should abandon virtual machines (VMs) and replace them with containers due to their lower overhead and potentially better performance.
But the big question is, is it feasible?, will it be sensible?
a. LXCs are scoped to an instance of Linux. It might be different flavors of Linux (e.g. a Ubuntu container on a CentOS host but it’s still Linux.) Similarly, Windows-based containers are scoped to an instance of Windows now if we look at VMs they have a pretty broader scope and using the hypervisors you are not limited to operating systems Linux or Windows.
b. LXCs have low overheads and have better performance as compared to VMs. Tools viz. Docker which are built on the shoulders of LXC technology have provided developers with a platform to run their applications and at the same time have empowered operations people with a tool that will allow them to deploy the same container on production servers or data centers. It tries to make the experience between a developer running an application, booting and testing an application and an operations person deploying that application seamless, because this is where all the friction lies in and purpose of DevOps is to break down those silos.
So the best approach is the cloud infrastructure providers should advocate an appropriate use of the VMs and LXC, as they are each suited to handle specific workloads and scenarios.
Abandoning VMs is not practical as of now. So both VMs and LXCs have their own individual existence and importance.
Docker encapsulates an application with all its dependencies.
A virtualizer encapsulates an OS that can run any applications it can normally run on a bare metal machine.
They both are very different. Docker is lightweight and uses LXC/libcontainer (which relies on kernel namespacing and cgroups) and does not have machine/hardware emulation such as hypervisor, KVM. Xen which are heavy.
Docker and LXC is meant more for sandboxing, containerization, and resource isolation. It uses the host OS's (currently only Linux kernel) clone API which provides namespacing for IPC, NS (mount), network, PID, UTS, etc.
What about memory, I/O, CPU, etc.? That is controlled using cgroups where you can create groups with certain resource (CPU, memory, etc.) specification/restriction and put your processes in there. On top of LXC, Docker provides a storage backend (http://www.projectatomic.io/docs/filesystems/) e.g., union mount filesystem where you can add layers and share layers between different mount namespaces.
This is a powerful feature where the base images are typically readonly and only when the container modifies something in the layer will it write something to read-write partition (a.k.a. copy on write). It also provides many other wrappers such as registry and versioning of images.
With normal LXC you need to come with some rootfs or share the rootfs and when shared, and the changes are reflected on other containers. Due to lot of these added features, Docker is more popular than LXC. LXC is popular in embedded environments for implementing security around processes exposed to external entities such as network and UI. Docker is popular in cloud multi-tenancy environment where consistent production environment is expected.
A normal VM (for example, VirtualBox and VMware) uses a hypervisor, and related technologies either have dedicated firmware that becomes the first layer for the first OS (host OS, or guest OS 0) or a software that runs on the host OS to provide hardware emulation such as CPU, USB/accessories, memory, network, etc., to the guest OSes. VMs are still (as of 2015) popular in high security multi-tenant environment.
Docker/LXC can almost be run on any cheap hardware (less than 1 GB of memory is also OK as long as you have newer kernel) vs. normal VMs need at least 2 GB of memory, etc., to do anything meaningful with it. But Docker support on the host OS is not available in OS such as Windows (as of Nov 2014) where as may types of VMs can be run on windows, Linux, and Macs.
Here is a pic from docker/rightscale :
1. Lightweight
This is probably the first impression for many docker learners.
First, docker images are usually smaller than VM images, makes it easy to build, copy, share.
Second, Docker containers can start in several milliseconds, while VM starts in seconds.
2. Layered File System
This is another key feature of Docker. Images have layers, and different images can share layers, make it even more space-saving and faster to build.
If all containers use Ubuntu as their base images, not every image has its own file system, but share the same underline ubuntu files, and only differs in their own application data.
3. Shared OS Kernel
Think of containers as processes!
All containers running on a host is indeed a bunch of processes with different file systems. They share the same OS kernel, only encapsulates system library and dependencies.
This is good for most cases(no extra OS kernel maintains) but can be a problem if strict isolations are necessary between containers.
Why it matters?
All these seem like improvements, not revolution. Well, quantitative accumulation leads to qualitative transformation.
Think about application deployment. If we want to deploy a new software(service) or upgrade one, it is better to change the config files and processes instead of creating a new VM. Because Creating a VM with updated service, testing it(share between Dev & QA), deploying to production takes hours, even days. If anything goes wrong, you got to start again, wasting even more time. So, use configuration management tool(puppet, saltstack, chef etc.) to install new software, download new files is preferred.
When it comes to docker, it's impossible to use a newly created docker container to replace the old one. Maintainance is much easier!Building a new image, share it with QA, testing it, deploying it only takes minutes(if everything is automated), hours in the worst case. This is called immutable infrastructure: do not maintain(upgrade) software, create a new one instead.
It transforms how services are delivered. We want applications, but have to maintain VMs(which is a pain and has little to do with our applications). Docker makes you focus on applications and smooths everything.
Docker, basically containers, supports OS virtualization i.e. your application feels that it has a complete instance of an OS whereas VM supports hardware virtualization. You feel like it is a physical machine in which you can boot any OS.
In Docker, the containers running share the host OS kernel, whereas in VMs they have their own OS files. The environment (the OS) in which you develop an application would be same when you deploy it to various serving environments, such as "testing" or "production".
For example, if you develop a web server that runs on port 4000, when you deploy it to your "testing" environment, that port is already used by some other program, so it stops working. In containers there are layers; all the changes you have made to the OS would be saved in one or more layers and those layers would be part of image, so wherever the image goes the dependencies would be present as well.
In the example shown below, the host machine has three VMs. In order to provide the applications in the VMs complete isolation, they each have their own copies of OS files, libraries and application code, along with a full in-memory instance of an OS.
Whereas the figure below shows the same scenario with containers. Here, containers simply share the host operating system, including the kernel and libraries, so they don’t need to boot an OS, load libraries or pay a private memory cost for those files. The only incremental space they take is any memory and disk space necessary for the application to run in the container. While the application’s environment feels like a dedicated OS, the application deploys just like it would onto a dedicated host. The containerized application starts in seconds and many more instances of the application can fit onto the machine than in the VM case.
Source: https://azure.microsoft.com/en-us/blog/containers-docker-windows-and-trends/
There are three different setups that providing a stack to run an application on (This will help us to recognize what a container is and what makes it so much powerful than other solutions):
1) Traditional Servers(bare metal)
2) Virtual machines (VMs)
3) Containers
1) Traditional server stack consist of a physical server that runs an operating system and your application.
Advantages:
Utilization of raw resources
Isolation
Disadvantages:
Very slow deployment time
Expensive
Wasted resources
Difficult to scale
Difficult to migrate
Complex configuration
2) The VM stack consist of a physical server which runs an operating system and a hypervisor that manages your virtual machine, shared resources, and networking interface. Each Vm runs a Guest Operating System, an application or set of applications.
Advantages:
Good use of resources
Easy to scale
Easy to backup and migrate
Cost efficiency
Flexibility
Disadvantages:
Resource allocation is problematic
Vendor lockin
Complex configuration
3) The Container Setup, the key difference with other stack is container-based virtualization uses the kernel of the host OS to rum multiple isolated guest instances. These guest instances are called as containers. The host can be either a physical server or VM.
Advantages:
Isolation
Lightweight
Resource effective
Easy to migrate
Security
Low overhead
Mirror production and development environment
Disadvantages:
Same Architecture
Resource heavy apps
Networking and security issues.
By comparing the container setup with its predecessors, we can conclude that containerization is the fastest, most resource effective, and most secure setup we know to date. Containers are isolated instances that run your application. Docker spin up the container in a way, layers get run time memory with default storage drivers(Overlay drivers) those run within seconds and copy-on-write layer created on top of it once we commit into the container, that powers the execution of containers. In case of VM's that will take around a minute to load everything into the virtualize environment. These lightweight instances can be replaced, rebuild, and moved around easily. This allows us to mirror the production and development environment and is tremendous help in CI/CD processes. The advantages containers can provide are so compelling that they're definitely here to stay.
In relation to:-
"Why is deploying software to a docker image easier than simply
deploying to a consistent production environment ?"
Most software is deployed to many environments, typically a minimum of three of the following:
Individual developer PC(s)
Shared developer environment
Individual tester PC(s)
Shared test environment
QA environment
UAT environment
Load / performance testing
Live staging
Production
Archive
There are also the following factors to consider:
Developers, and indeed testers, will all have either subtlely or vastly different PC configurations, by the very nature of the job
Developers can often develop on PCs beyond the control of corporate or business standardisation rules (e.g. freelancers who develop on their own machines (often remotely) or contributors to open source projects who are not 'employed' or 'contracted' to configure their PCs a certain way)
Some environments will consist of a fixed number of multiple machines in a load balanced configuration
Many production environments will have cloud-based servers dynamically (or 'elastically') created and destroyed depending on traffic levels
As you can see the extrapolated total number of servers for an organisation is rarely in single figures, is very often in triple figures and can easily be significantly higher still.
This all means that creating consistent environments in the first place is hard enough just because of sheer volume (even in a green field scenario), but keeping them consistent is all but impossible given the high number of servers, addition of new servers (dynamically or manually), automatic updates from o/s vendors, anti-virus vendors, browser vendors and the like, manual software installs or configuration changes performed by developers or server technicians, etc. Let me repeat that - it's virtually (no pun intended) impossible to keep environments consistent (okay, for the purist, it can be done, but it involves a huge amount of time, effort and discipline, which is precisely why VMs and containers (e.g. Docker) were devised in the first place).
So think of your question more like this "Given the extreme difficulty of keeping all environments consistent, is it easier to deploying software to a docker image, even when taking the learning curve into account ?". I think you'll find the answer will invariably be "yes" - but there's only one way to find out, post this new question on Stack Overflow.
There are many answers which explain more detailed on the differences, but here is my very brief explanation.
One important difference is that VMs use a separate kernel to run the OS. That's the reason it is heavy and takes time to boot, consuming more system resources.
In Docker, the containers share the kernel with the host; hence it is lightweight and can start and stop quickly.
In Virtualization, the resources are allocated in the beginning of set up and hence the resources are not fully utilized when the virtual machine is idle during many of the times.
In Docker, the containers are not allocated with fixed amount of hardware resources and is free to use the resources depending on the requirements and hence it is highly scalable.
Docker uses UNION File system .. Docker uses a copy-on-write technology to reduce the memory space consumed by containers. Read more here
With a virtual machine, we have a server, we have a host operating system on that server, and then we have a hypervisor. And then running on top of that hypervisor, we have any number of guest operating systems with an application and its dependent binaries, and libraries on that server. It brings a whole guest operating system with it. It's quite heavyweight. Also there's a limit to how much you can actually put on each physical machine.
Docker containers on the other hand, are slightly different. We have the server. We have the host operating system. But instead a hypervisor, we have the Docker engine, in this case. In this case, we're not bringing a whole guest operating system with us. We're bringing a very thin layer of the operating system, and the container can talk down into the host OS in order to get to the kernel functionality there. And that allows us to have a very lightweight container.
All it has in there is the application code and any binaries and libraries that it requires. And those binaries and libraries can actually be shared across different containers if you want them to be as well. And what this enables us to do, is a number of things. They have much faster startup time. You can't stand up a single VM in a few seconds like that. And equally, taking them down as quickly.. so we can scale up and down very quickly and we'll look at that later on.
Every container thinks that it’s running on its own copy of the operating system. It’s got its own file system, own registry, etc. which is a kind of a lie. It’s actually being virtualized.
Source: Kubernetes in Action.
I have used Docker in production environments and staging very much. When you get used to it you will find it very powerful for building a multi container and isolated environments.
Docker has been developed based on LXC (Linux Container) and works perfectly in many Linux distributions, especially Ubuntu.
Docker containers are isolated environments. You can see it when you issue the top command in a Docker container that has been created from a Docker image.
Besides that, they are very light-weight and flexible thanks to the dockerFile configuration.
For example, you can create a Docker image and configure a DockerFile and tell that for example when it is running then wget 'this', apt-get 'that', run 'some shell script', setting environment variables and so on.
In micro-services projects and architecture Docker is a very viable asset. You can achieve scalability, resiliency and elasticity with Docker, Docker swarm, Kubernetes and Docker Compose.
Another important issue regarding Docker is Docker Hub and its community.
For example, I implemented an ecosystem for monitoring kafka using Prometheus, Grafana, Prometheus-JMX-Exporter, and Docker.
For doing that, I downloaded configured Docker containers for zookeeper, kafka, Prometheus, Grafana and jmx-collector then mounted my own configuration for some of them using YAML files, or for others, I changed some files and configuration in the Docker container and I build a whole system for monitoring kafka using multi-container Dockers on a single machine with isolation and scalability and resiliency that this architecture can be easily moved into multiple servers.
Besides the Docker Hub site there is another site called quay.io that you can use to have your own Docker images dashboard there and pull/push to/from it. You can even import Docker images from Docker Hub to quay then running them from quay on your own machine.
Note: Learning Docker in the first place seems complex and hard, but when you get used to it then you can not work without it.
I remember the first days of working with Docker when I issued the wrong commands or removing my containers and all of data and configurations mistakenly.
This is how Docker introduces itself:
Docker is the company driving the container movement and the only
container platform provider to address every application across the
hybrid cloud. Today’s businesses are under pressure to digitally
transform but are constrained by existing applications and
infrastructure while rationalizing an increasingly diverse portfolio
of clouds, datacenters and application architectures. Docker enables
true independence between applications and infrastructure and
developers and IT ops to unlock their potential and creates a model
for better collaboration and innovation.
So Docker is container based, meaning you have images and containers which can be run on your current machine. It's not including the operating system like VMs, but like a pack of different working packs like Java, Tomcat, etc.
If you understand containers, you get what Docker is and how it's different from VMs...
So, what's a container?
A container image is a lightweight, stand-alone, executable package of
a piece of software that includes everything needed to run it: code,
runtime, system tools, system libraries, settings. Available for both
Linux and Windows based apps, containerized software will always run
the same, regardless of the environment. Containers isolate software
from its surroundings, for example differences between development and
staging environments and help reduce conflicts between teams running
different software on the same infrastructure.
So as you see in the image below, each container has a separate pack and running on a single machine share that machine's operating system... They are secure and easy to ship...
There are a lot of nice technical answers here that clearly discuss the differences between VMs and containers as well as the origins of Docker.
For me the fundamental difference between VMs and Docker is how you manage the promotion of your application.
With VMs you promote your application and its dependencies from one VM to the next DEV to UAT to PRD.
Often these VM's will have different patches and libraries.
It is not uncommon for multiple applications to share a VM. This requires managing configuration and dependencies for all the applications.
Backout requires undoing changes in the VM. Or restoring it if possible.
With Docker the idea is that you bundle up your application inside its own container along with the libraries it needs and then promote the whole container as a single unit.
Except for the kernel the patches and libraries are identical.
As a general rule there is only one application per container which simplifies configuration.
Backout consists of stopping and deleting the container.
So at the most fundamental level with VMs you promote the application and its dependencies as discrete components whereas with Docker you promote everything in one hit.
And yes there are issues with containers including managing them although tools like Kubernetes or Docker Swarm greatly simplify the task.
Feature
Virtual Machine
(Docker) Containers
OS
Each VM Does contains an Operating System
Each Docker Container Does Not contains an Operating System
H/W
Each VM contain a virtual copy of the hardware that OS requires to run.
There is No virtualization of H/W with containers
Weight
VM's are heavy -- reason sited above--
containers are lightweight and, thus, fast
Required S/W
Virtuliazation achieve using software called a hypervisor
Containerzation achieve using software called a Docker
Core
Virtual machines provide virtual hardware (or hardware on which an operating system and other programs can be installed)
Docker containers don’t use any hardware virtualization. **It helps to use container
Abstraction
Virtual machines provide hardware abstractions so you can run multiple operating systems.
Containers provide OS abstractions so you can run multiple containers.
Boot-Time
It takes a long time (often minutes) to create and require significant resource overhead because they run a whole operating system in addition to the software you want to use.
It takes less time because Programs running inside Docker containers interface directly with the host’s Linux kernel.
Containers isolates libraries and software packages from the system so that you can install different versions of same software and libraries without conflict. It uses minimal storage and ram, almost no overhead using same base os kernel and available libraries with a small delta difference if possible. You can expose your hardware directly or indirectly to containers so that you can use acceleration such as gpu for computations.
In practice you use docker for pre-made containers. You install them and run them in one line. Installing tensorflow-gpu is as easy as docker run -it tensorflow-gpu. Although I could not stumble upon many premade containers of lxd (lxc containers),I find them easier to customize and more stable and performant.
Both containers and VMs can be used to distribute the load. But since containers has almost no overhead, container management software are focused on creating container clusters so that you distribute them, thus the load, to metal machines easily.
Real Life example:
Suppose that you need more than 50 types of computation environment and 50 types of services such as mysql, webhosting and cloud based services (like jenkins and object storage) and you have more than 50 different bare metal servers. Typically its an academic environment with many faculties. And you need to use resources efficiently and you need high availability. When one server goes down users should not experience any problem.
To solve this, what you do is basically installing all types of containers on all servers. And distribute the load to all metal machines. As one type of container is needed more it is possible to automatically spawn more of them on one or more bare metal machines. So that many different users can use different services and environments continuously and flexibly.
In that setup suppose there are 100 students using the system at the same time. 95 of them are using servers for rudimentary services such as checking GPA's, curriculum, library database etc. But 5 of them are performing 5 different types of engineering simulations. You will see that 49 bare metal servers are fully dedicated to engineering simulation each having 5 different types of computation containers tying to race each but balanced as %20 hw resource use. When you add 2500 more students for rudimentary tasks, that either will use %5 of all bare metal machines. Rest will be used for computations.
Thus the most important distinguishing features of container providing such flexibility benefits are:
ready to deploy premade containers, almost no overhead, fast spawnability
with live-adjustable quotas
using .cpu_allowencess , .ram_allowances or directly cgroup.
Kubernetes does all of this for you. After fiddling with docker and lxd you may want to check it out.
In my opinion it depends, it can be seen from the needs of your application, why decide to deploy to Docker because Docker breaks the application into small parts according to its function, this becomes effective because when one application / function is an error it has no effect on other applications , in contrast to using full vm, it will be slower and more complex in configuration, but in some ways safer than docker
The docker documentation (and self-explanation) makes a distinction between "virtual machines" vs. "containers". They have the tendency to interpret and use things in a little bit uncommon ways. They can do that because it is up to them, what do they write in their documentation, and because the terminology for virtualization is not yet really exact.
Fact is what the Docker documentation understands on "containers", is paravirtualization (sometimes "OS-Level virtualization") in the reality, contrarily the hardware virtualization, which is docker not.
Docker is a low quality paravirtualisation solution. The container vs. VM distinction is invented by the docker development, to explain the serious disadvantages of their product.
The reason, why it became so popular, is that they "gave the fire to the ordinary people", i.e. it made possible the simple usage of typically server ( = Linux) environments / software products on Win10 workstations. This is also a reason for us to tolerate their little "nuance". But it does not mean that we should also believe it.
The situation is made yet more cloudy by the fact that docker on Windows hosts used an embedded Linux in HyperV, and its containers have run in that. Thus, docker on Windows uses a combined hardware and paravirtualization solution.
In short, Docker containers are low-quality (para)virtual machines with a huge advantage and a lot of disadvantages.