Understanding Docker from a layman's point of view

I am just one day old with Docker, so it is all very new to me.
I read docker.io but could not find answers to a few basic questions. Here is my understanding:
Docker is basically a tool which allows you to make use of images and spin up your own customised images by installing software on them, so that you can then use those images to create VMs.
1. Is this what Docker is all about, from a 10,000 ft bird's-eye point of view?
2. What exactly is the meaning of a container? Is it a synonym for image?
3. I remember reading somewhere that it allows you to deploy applications. Is this correct? In other words, will it behave like IIS for deploying .NET applications?
Please answer my questions above so that I can understand it better and take it forward.

1) What is Docker all about, from a 10,000 ft bird's-eye point of view?
From the website: Docker is an open-source engine that automates the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere.
Drilling down a little further, a more thorough explanation of what Docker addresses and why can be found here:
https://www.docker.io/the_whole_story/
https://www.docker.io/the_whole_story/#Why-Should-I-Care-(For-Developers)
https://www.docker.io/the_whole_story/#Why-Should-I-Care-(For-Devops)
Further depth can be found in the technology documentation:
http://docs.docker.io/introduction/technology/
2) What exactly is the meaning of a container? Is it a synonym for image?
An image is the set of layers that are built up and can be moved around. Images are read-only.
http://docs.docker.io/en/latest/terms/image/
http://docs.docker.io/en/latest/terms/layer/
A container is an active (or inactive if exited) stateful instantiation of an image.
http://docs.docker.io/en/latest/terms/container/
See also: In Docker, what's the difference between a container and an image?
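To make the distinction concrete, here is a minimal sketch using the public redis image as an arbitrary example (any image would do):

    docker pull redis                 # fetch the read-only image
    docker run -d --name c1 redis     # container 1: a live instance of the image
    docker run -d --name c2 redis     # container 2: same image, independent state
    docker ps                         # lists containers
    docker images                     # lists images

Each container gets its own thin writable layer on top of the shared read-only image layers, which is why two containers started from the same image can diverge in state.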
3) I remember reading somewhere that it allows you to deploy applications. Is this correct? In other words, will it behave like IIS for deploying .NET applications?
Yes, Docker can be used to deploy applications. You can deploy single components of the application stack or multiple components within a container, depending on the use case. See the First steps with Docker page here: http://docs.docker.io/use/basics/ (a minimal sketch follows the links below).
See also:
http://docs.docker.io/examples/nodejs_web_app/
http://docs.docker.io/examples/python_web_app/
http://docs.docker.io/examples/running_redis_service/
http://docs.docker.io/examples/using_supervisord/
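As a hedged, minimal sketch of what deploying a single component looks like (the file names, image tag and port below are made up for illustration, not taken from the linked examples):

    # A trivial Dockerfile for a hypothetical web app might read:
    #   FROM python:3-slim
    #   COPY app.py /app/app.py
    #   CMD ["python", "/app/app.py"]
    docker build -t mywebapp:1.0 .            # bake the app into an image
    docker run -d -p 8080:8080 mywebapp:1.0   # deploy it as a running container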

So.
It's about providing the separation of processes that you get with virtualisation, without the overhead. Of course this doesn't come without a cost; the largest one in this case is that your Docker containers will all be running under the same kernel.
A container is roughly a chroot (with better process encapsulation) plus some Ethernet virtualisation. The image is the filesystem (plus a few bits) that is mounted to provide the container's root filesystem.^1
"Deploy" is just the term Docker uses for spinning up a container instance.
Effectively, each running instance of a container thinks that it is the only thing^2 running on that machine (much like a cloud appliance is typically designed). It provides more separation of processes than running on the host OS would, and allows for easily spinning up multiple separate copies of the container as needed, while imposing much, much lower overheads than full virtualisation would (see the short sketch after the footnotes).
^1: Actually there may be several layers of file-system sandwiched together to form the root file system.
^2: Docker does support multiple processes running within a single instance, but that is generally considered to be somewhat advanced usage.
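As a rough illustration of the "multiple separate copies" point (the image and container names here are made up):

    docker run -d --name app1 myimage   # instance 1
    docker run -d --name app2 myimage   # instance 2, fully isolated from app1
    docker run -d --name app3 myimage   # instance 3

Each copy starts in roughly a second or less, versus the minutes a full VM boot would typically take.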

Related

Can concurrent Docker containers speed up computations?

As a disclaimer, let me just say that I am a beginner with Docker, so the question might sound a bit naive.
I am exploring parallelization options to speed up some computations. I'm working with Python, so I followed the official guidelines to create my first image and then run it as a container.
For the time being, I use a dummy program that generates a very large NumPy random matrix (say 4000 x 4000) and then finds how many elements in each row fall into a predefined range [min, max].
I then launched a second container from the same image, obviously with a different port and name. I didn't get the speed-up in the computations that I was somehow expecting, presumably because:
a) I haven't developed any mechanism for the 2 containers to somehow "talk to each other" and share intermediate results
b) I am not sure if the program itself is suitable for speedups in such a way.
So my questions corresponding to a, b above are:
Is parallelism a "feature" supported by Docker deployments, and in what sense? Is it just load sharing? And if I implement a load balancer, how does Docker know how to transfer intermediate results from one container to the other?
If the previous question is not the right one to ask, do I then need to write "parallel" versions of my programs to assign to each container? Isn't this equivalent to writing MPI versions of my program and assigning them to different cores in my system? What would be the benefit of a Docker architecture then?
Thanks in advance.
Docker is just a way of deploying your application; using it does not in itself give you parallelism. Your application itself needs to support parallelism. Docker (and Kubernetes, etc.) can help you scale out in parallel easily, but your applications need to be able to support that scaling out.
So if you could run multiple instances of your application in parallel now (however you might do that) and it would not deliver any performance improvement, then Docker will not help you. If running multiple instances now does improve performance, then Docker will help you scale out.
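As a hedged sketch of what "the application itself needs to support it" means here: if the matrix program were changed to accept a row range (the ROW_START/ROW_END variables and the matrix-app image name below are hypothetical), two containers could each process half the rows:

    # Each container handles a disjoint slice of the 4000 rows
    docker run -d --name worker1 -e ROW_START=0    -e ROW_END=2000 matrix-app
    docker run -d --name worker2 -e ROW_START=2000 -e ROW_END=4000 matrix-app

Without a change like that, the second container simply repeats the same full computation, which is exactly why no speed-up was observed.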

Why might an image run differently in Kubernetes than in Docker?

I'm experiencing an issue where an image I'm running as part of a Kubernetes deployment is behaving differently from the expected and consistent behavior of the same image run with docker run <...>. My understanding of the main purpose of containerizing a project is that it will always run the same way, regardless of the host environment (ignoring the influence of the user and of outside data). Is this wrong?
Without going into too much detail about my specific problem (since I feel the solution may likely be far too specific to be of help to anyone else on SO, and because I've already detailed it here), I'm curious if someone can detail possible reasons to look into as to why an image might run differently in a Kubernetes environment than locally through Docker.
The general answer of why they're different is resources, but the real answer is that they should both be identical given identical resources.
Kubernetes uses Docker for its container runtime, at least in most cases I've seen. There are some other runtimes (cri-o and rkt) that are less widely adopted, so using those may also contribute to variance in how things work.
On your local docker it's pretty easy to mount things like directories (volumes) into the image, and you can populate the directory with some content. Doing the same thing on k8s is more difficult, and probably involves more complicated mappings, persistent volumes or an init container.
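For example, pre-populating a mounted directory is a one-liner with local Docker (the paths and image name are illustrative):

    mkdir -p ./config && echo "debug=true" > ./config/app.ini
    docker run -d -v "$(pwd)/config:/app/config" myimage

On Kubernetes there is no direct "mount this local directory" analogue for a remote cluster; the same effect usually takes a ConfigMap, a PersistentVolumeClaim, or an init container that populates the volume first.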
Running docker on your laptop and k8s on a server somewhere may give you different hardware resources:
different amounts of RAM
different size of hard disk
different processor features
different core counts
The last one is most likely what you're seeing: Flask is probably looking up the core count on the two systems, seeing two different values, and so running with two different thread/worker counts.
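A quick way to check this, assuming the image has the nproc utility available (the image name is illustrative):

    docker run --rm myimage nproc                    # by default, reports all host cores
    docker run --rm --cpuset-cpus=0,1 myimage nproc  # pinned to 2 cores; reports 2

Note that Kubernetes CPU limits are typically enforced via CFS quotas rather than cpusets, so nproc inside a limited pod can still report the node's full core count; discrepancies like that are exactly what drives a framework to pick a different worker count.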

Docker Swarm - Deploying stack with shared code base across hosts

I have a question related to best practices for deploying applications to production based on Docker Swarm.
To simplify the discussion related to this question/issue, let's consider the following scenario:
Our swarm contains:
6 servers (different hosts)
on each of these servers, we will have one service
each service will have only one task/replica docker running
Memcached1 and Memcached2 use public images from Docker Hub
"Recycle data 1" and "Recycle data 2" use a custom image from a private repository
"Client 1" and "Client 2" use a custom image from a private repository
So in the end, for our example application, we have 6 containers running across 6 different servers. 2 of them are Memcached, and 4 of them are clients which communicate with Memcached.
"Client 1" and "Client 2" are going to insert data into Memcached based on some kind of rules. "Recycle data 1" and "Recycle data 2" are going to update or delete data from Memcached based on some kind of rules. Simple as that.
Our applications which communicate with Memcached are custom ones, written by us. The code for these applications resides on GitHub (or any other repository). What is the best way to deploy this application to production:
Build images which contain the code copied into the image, and use those images to deploy to the swarm
Build an image which uses a volume, with the code residing outside the image
Bearing in mind that I am deploying a swarm to production for the first time, I can see a lot of issues with option 1. Having the code incorporated into the images seems illogical to me, given that 99% of the time the updates that happen will be code-based. That would require building a new image every time you want to update the code which runs in a specific container (no matter how small the change is).
Option 2 seems much more logical to me, but at this moment I am not sure whether it is possible. So there are a number of questions here:
What is the best approach in a case where we are going to host multiple containers which run the same code in the background?
Is it possible in Docker Swarm to have one central host/server (manager, anywhere) where we can clone our repositories and share those repositories as volumes across the swarm? (In our example, all 4 client services would mount the volume where our code is hosted.)
If this is possible, what is the docker-compose.yml implementation for it?
After digging deeper and working with Docker and Docker Swarm mode for the last 3 months, these are the answers to the questions above:
Answer 1: In general, you should consider your Docker image as a "compiled" version of your program. Your image should contain either the code base or a compiled version of the program (depending on which programming language you are using), and that specific image represents one version of your app. Every time you want to deploy your next version, you generate a new image.
This is probably the best approach for 99% of the apps hosted with Docker (the exceptions being development environments and apps where you really want to shell in and control things directly from within the container itself).
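In practice the cycle looks roughly like this (the registry, image and service names below are hypothetical):

    docker build -t registry.example.com/client:1.0.1 .   # new code, new image version
    docker push registry.example.com/client:1.0.1
    docker service update --image registry.example.com/client:1.0.1 client1

The service update rolls the new image out to the service's tasks, replacing the containers that were running the old version.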
Answer 2: It is possible, but it is an extremely bad approach. As mentioned in answer 1, the best practice is to copy the app code directly into the image and treat your image (and its running container) as the app itself.
I was not able to wrap my head around this concept at the beginning, because it does not allow you to simply go to the server (or wherever you are hosting your Docker containers), change the app, and restart the container (obviously, after a restart the container will be back at the same starting point, using the same image and the same code base you deployed with that image). Any kind of change SHOULD and NEEDS to be deployed as a different image with a different version. That is what Docker is all about.
Additionally, the initial idea of sharing the same code base across multiple swarm services is possible, but it totally ruins the purpose of versioning across Docker Swarm.
Consider having 3 services used as redundant services (failover), where you want to run a new version on one of them as a beta test. That is not possible with a shared code base.

Container technologies: docker, rkt, orchestration, kubernetes, GKE and AWS Container Service

I'm trying to get a good understanding of container technologies but am somewhat confused. It seems like certain technologies overlap different portions of the stack and different pieces of different technologies can be used as the DevOps team sees fit (e.g., can use Docker containers but don't have to use the Docker engine, could use engine from cloud provider instead). My confusion lies in understanding what each layer of the "Container Stack" provides and who the key providers are of each solution.
Here's my layman's understanding; I would appreciate any corrections and feedback on holes in it:
Containers: self-contained package including application, runtime environment, system libraries, etc.; like a mini-OS with an application
It seems like Docker is the de-facto standard. Any others that are notable and widely used?
Container Clusters: groups of containers that share resources
Container Engine: groups containers into clusters, manages resources
Orchestrator: is this any different from a container engine? How?
Where do Docker Engine, rkt, Kubernetes, Google Container Engine, AWS Container Service, etc. fall between #s 2-4?
This may be a bit long and present some oversimplification but should be sufficient to get the idea across.
Physical machines
Some time ago, the best way to deploy simple applications was to simply buy a new webserver, install your favorite operating system on it, and run your applications there.
The cons of this model are:
The processes may interfere with each other (because they share CPU and file system resources), and one may affect the other's performance.
Scaling this system up/down is difficult as well, taking a lot of effort and time in setting up a new physical machine.
There may be differences in the hardware specifications, OS/kernel versions and software package versions of the physical machines, which make it difficult to manage these application instances in a hardware-agnostic manner.
Applications, being directly affected by the physical machine specifications, may need specific tweaking, recompilation, etc, which means that the cluster administrator needs to think of them as instances at an individual machine level. Hence, this approach does not scale. These properties make it undesirable for deploying modern production applications.
Virtual Machines
Virtual machines solve some of the problems of the above:
They provide isolation even while running on the same machine.
They provide a standard execution environment (the guest OS) irrespective of the underlying hardware.
They can be brought up on a different machine (replicated) quite quickly when scaling (order of minutes).
Applications typically do not need to be rearchitected for moving from physical hardware to virtual machines.
But they introduce some problems of their own:
They consume large amounts of resources in running an entire instance of an operating system.
They may not start/go down as fast as we want them to (order of seconds).
Even with hardware assisted virtualization, application instances may see significant performance degradation over an application running directly on the host.
(This may be an issue only for certain kinds of applications)
Packaging and distributing VM images is not as simple as it could be.
(This is not as much a drawback of the approach, as it is of the existing tooling for virtualization.)
Containers
Then, somewhere along the line, cgroups (control groups) were added to the Linux kernel. This feature lets us isolate processes in groups, decide what other processes and file systems they can see, and perform resource accounting at the group level.
Various container runtimes and engines came along that make it very easy to create a "container": an environment within the OS, like a namespace, with limited visibility, limited resources, and so on. Common examples of these include Docker, rkt, runC, LXC, etc.
Docker, for example, includes a daemon which provides interactions like creating an "image", a reusable entity that can be launched into a container instantly. It also lets one manage individual containers in an intuitive way.
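A minimal illustration of that image-to-container workflow:

    docker pull nginx                  # fetch a reusable image
    docker run -d --name web nginx     # launch it into a container, near-instantly
    docker stop web && docker rm web   # manage the individual container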
The advantages of containers:
They are light-weight and run with very little overhead, as they do not have their own instance of the kernel/OS and are running on top of a single host OS.
They offer some degree of isolation between the various containers and the ability to impose limits on various resources consumed by them (using the cgroup mechanism).
The tooling around them has evolved rapidly to allow easy building of reusable units (images), repositories for storing image revisions (container registries) and so on, largely due to docker.
It is encouraged that a single container run a single application process, so that it can be maintained and distributed independently. The light-weight nature of a container makes this preferable, and leads to faster development due to decoupling.
There are some cons as well:
The level of isolation provided is less than in the case of VMs.
They are easiest to use with stateless 12-factor applications built afresh, and a slight struggle if one tries to deploy legacy applications, clustered distributed databases, and so on.
They need orchestration and higher level primitives to be used effectively and at scale.
Container Orchestration
When running applications in production, as complexity grows, an application tends to have many different components, some of which need to scale up/down as necessary. The containers themselves do not solve all our problems. We need a system that solves problems associated with real large-scale applications, such as:
Networking between containers
Load balancing
Managing storage attached to these containers
Updating containers, scaling them, spreading them across nodes in a multi-node cluster and so on.
When we want to manage a cluster of containers, we use a container orchestration engine. Examples of these are Kubernetes, Mesos, Docker Swarm, etc. They provide a host of functionality in addition to what is listed above, and the goal is to reduce the effort involved in DevOps.
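To give a flavour of those higher-level primitives, with Kubernetes as the example (the deployment name myapp is hypothetical):

    kubectl scale deployment myapp --replicas=5   # scale one component out
    kubectl get pods -o wide                      # see how the pods are spread across nodes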
GKE (Google Container Engine) is hosted Kubernetes on Google Cloud Platform. It lets a user simply specify that they need an n-node Kubernetes cluster and exposes the cluster itself as a managed instance. Kubernetes is open source, and if one wanted to, one could also set it up on Google Compute Engine, a different cloud provider, or one's own machines in one's own data-center.
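For instance, creating such a managed cluster is roughly a one-liner (the cluster name and node count here are illustrative):

    gcloud container clusters create my-cluster --num-nodes=3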
ECS is a proprietary container management/orchestration system built and operated by Amazon and available as part of the AWS suite.
To answer your questions specifically:
Docker Engine: A tool to manage the lifecycle of Docker containers and Docker images. Create, restart, delete Docker containers. Create, rename, delete Docker images.
rkt: Analogous to docker engine, but different implementation
Kubernetes: A collection of tools to manage the lifecycle of a distributed application that uses containers. Contains tooling to manage containers, groups of containers, configuration for containers, orchestrating containers, scheduling them on actual instances, tooling to help developers write and maintain other services/tools to deal with containers.
Google Container Engine: Instead of getting VMs, installing "docker-engine" on them, installing Kubernetes on them, and getting it all to work with things like the right permissions to your infrastructure, imagine if it all came together so that you could just choose the machine types and the size of your cluster and have all of that working. Things like pulling images from your project-specific Docker repository (Google Container Registry), claiming persistent volumes, or provisioning load-balancers just work, without worrying about service accounts, permissions and whatnot.
ECS: Analogous to GKE (4) but without Kubernetes.
To address the points in your understanding: you are loosely right about things (except container engine, I think). The only important thing to understand is what a container is; the rest is just marketing/product names. It's also important to understand that today's notion of containers is heavily shaped by what Docker containers are and by the opinions enforced by Docker and the tooling around it. Containers have been around for a long time.
So once you understand what a (Docker) container is: a container engine is just a tool to manage them, a container cluster is just a group of containers, and an orchestrator is just a tool to manage where containers run based on some parameters. IMHO, you really don't need to worry too much about what the rest of the tooling is once you understand and build a solid mental model around containers. The rest will just fit in automatically.
The best way to understand all of this? Build & deploy a decently complex application with Docker (persist data/use a database in your app) and everything will make sense.

How to upgrade an application inside a Docker image

I am very new to this Docker thing, and as such might not have been able to frame my search well enough to find the answer. I am trying to build a test image which would contain a few test applications.
But I see a problem there.
If I commit them all to one image and then need to upgrade one of the applications, I would need to rebuild my entire image again and then redistribute this image to all remotes (is this correct?).
Do I then use data containers for my applications and just have a Linux base image?
Regards
You should split your single container into multiple containers, each with one microservice.
Microservices is an approach to application development in which a large application is built as a suite of modular services. Each module supports a specific business goal and uses a simple, well-defined interface to communicate with other modules.
In your case, you can start by putting each application into its own container.
Example:
You have a web application; the first step would be having one container for the web app and another for the database.
Volumes are used for persistent data, like the database files that you want to keep after removing the container. It's not good practice to have your entire app in these volumes.
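A hedged sketch of that split (the image names are made up, and the database password is a placeholder):

    docker network create appnet        # shared network so the services can talk
    docker run -d --name db --network appnet \
        -e POSTGRES_PASSWORD=example \
        -v dbdata:/var/lib/postgresql/data postgres   # database files live in a named volume
    docker run -d --name web --network appnet -p 8080:80 mywebapp

Upgrading the web app now means rebuilding and replacing only the web container; the database container and its volume stay untouched.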
