Docker Base Images and Scaling Architecture - docker

I have an app running on MongoDB, Node JS Api, React front end, Nginx proxy, etc. I have all of these setup as individual images and running locally (OSX) in separate linked containers, which I run with Docker Compose. In production, I have setup a (one) Ubuntu server on Digital Ocean at the moment, and expect to quickly scale as needed to multiple servers.
My question is what is the best way to handle the underlying Linux base image for each of these containers?
1) Should all of the linux setup (apt-gets, node / mongo installs, etc) exist on the Linux machine and outside of Docker and one could simply create a snapshot of this image, spin up a new server instance, and run the desired Docker container if you needed to quickly scale, or
2) Should all of the linux setup exist within a 'base' Ubuntu image, which the mongo, node, and nginx images build on top of. This results in each image's size growing significantly since they each have a separate instance of Ubuntu, plus all of the package dependencies to run mongo, node, and nginx, or
3) Should each process (mongo, node, nginx) have a separate linux base Docker image since they each have separate dependencies? Again, each image would be grow because they each would run an instance of Ubuntu.
What is the proper way to handle this with Docker?

The answer is #2, but I suspect you may not fully understand the relationship between container and image.
How Docker uses images
First of all an image from the the Docker docs:
Containers are created from images. An image is only downloaded and cached locally. Images are distributed via Registries.
Image layers
What makes Docker images different from virtual machine images is how they're built and stored. Again from the docs:
Each image consists of a series of layers.
Docker makes use of union file systems to combine these
layers into a single image. Union file systems allow files and
directories of separate file systems, known as branches, to be
transparently overlaid, forming a single coherent file system.
One of the reasons Docker is so lightweight is because of these
layers. When you change a Docker image—for example, update an
application to a new version— a new layer gets built. Thus, rather
than replacing the whole image or entirely rebuilding, as you may do
with a virtual machine, only that layer is added or updated. Now you
don’t need to distribute a whole new image, just the update, making
distributing Docker images faster and simpler.
So, your mongo, node, and nginx images will be thin layers on top of a base image containing your basic Linux setup. That base image will only be downloaded once and will be re-used as a component layer by the other images.

Related

How to create docker images for a system with multiple applications

I have installed and configured a system in EC2s using Ansible. It is 1 EC2 master with a few EC2 workers. Sometimes when I use ansible to update or reinstall configuration, it fails because either some package has been removed from open-source repositories, or the package is updated so not compatible with some other packages. And I learned that using docker-container can resolve these kind of configuration problems.
However, according to what I learned, each docker image will create image of one application (I guess one application means one process). But mine is a system which has airflow master webserver, airflow worker webserver, flower webserver, rabbitmq, airflow celery, several configuration files, etc. how can I create docker images for that? Should I create one docker image for each process? How do I know which linux folder should I go to create each docker image? How do I know which applications/processes I need to create? And how to combine these images to make them work together as a system?
Or maybe in my case I should not use docker image, Instead I should just create an EC2 image?
Use docker-compose.
Compose is a tool for defining and running multi-container Docker applications
https://docs.docker.com/compose/
each docker image will create image of one application (I guess one application means one process)
That is basically correct. You should create one docker-container per application. In theory you can have multiple process per container, but that doesn't matter in this case.
how can I create docker images for that?
In your case you should make one docker-container for airflow master webserver, one for airflow worker webserver, one for flower webserver, etc. And the you use a docker-compose.yml to link them all together.
Should I create one docker image for each process?
generally yes. (It may depend on your exact setup though)
And how to combine these images to make them work together as a system?
docker-compose.
How do I know which linux folder should I go to create each docker image?
I don't understand that question
How do I know which applications/processes I need to create?
You could create a deployment-diagram and then start from there.

What is the overhead of creating docker images?

I'm exploring using docker so that we deploy new docker images instead of specific file changes, so all of the needs of the application come with each deployment etc.
Question 1:
If I add a new application file, say 10 MB, to a docker image when I deploy the new image, using the tools in Docker tool box, will this require the deployment of an entirely new image to my containers or do docker deployments just take the difference between the 2, similar to git version control?
Another way to put it, I looked on a list of docker base images and saw a version of ubuntu that is 188 MB. If I commit a new application to a docker image, using this base image, will my docker containers need to pull the full 188 MB, which they are already running, plus the application or is there a differential way of just getting what has changed?
Supplementary Question
Am I correct in assuming when using docker, deploying images is the intended approach? Meaning any new changes should require a new image deployment so that images are treated as immutable? When I was using AWS we followed this approach with AMI (Amazon Machine Images) but storing AMIs had low overhead, for docker I don't know yet.
Or is it a better practice to deploy dockerfiles and have the new image be built on the container itself?
Docker uses a layered union filesystem, only one copy of a layer will be pulled by a docker engine and stored on its filesystem. When you build an image, docker will check its layer cache to see if the same parent layer and same command have been used to build an existing layer, and if so, the cache is reused instead of building a new layer. Once any step in the build creates a new layer, all following steps will create new layers, so the order of your Dockerfile matters. You should add frequently changing steps to the end of the Dockerfile so the earlier steps can be cached.
Therefore, if you use a 200MB base image, have 50MB of additions, but only 10MB are new additions at the end of your Dockerfile, you'd push 250MB the first time to a docker engine, but only 10MB to an engine that already had a previous copy of that image, or 50MB to an engine that just had the 200MB base image.
The best practice with images is to build them once, push them to a registry (either self hosted using the registry image, cloud hosted by someone like AWS, or on Docker Hub), and then pull that image to each engine that needs to run it.
For more details on the layered filesystem, see https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/
You can also work a little, in order to create smaller images.
You can use Alpine or Busybox instead of using bigger Ubuntu, Debian or Bitnami (Debian light).
A smaller image is more secure as less tools are available.
Some reading
http://blog.xebia.com/how-to-create-the-smallest-possible-docker-container-of-any-image/
https://www.dajobe.org/blog/2015/04/18/making-debian-docker-images-smaller/
You have 2 great tools in order to make smaller docker images
https://github.com/docker-slim/docker-slim
and
https://github.com/mvanholsteijn/strip-docker-image
Some examples with docker-slim
https://hub.docker.com/r/k3ck3c/grafana-xxl.slim/
shows
size before -> 357.3 MB
and using docker-slim -> 18.73 MB
or about simh
https://hub.docker.com/r/k3ck3c/simh_bitnami.slim/
size 5.388 MB
when the original
k3ck3c/simh_bitnami 88.86 MB
a popular netcat image
chilcano/netcat is 135.2 MB
when a netcat based on Alpine is 7.812 MB
and based on busybox will need 2 or 3 MB

Docker base images, what do they compose of?

I am trying to wrap my head around the Docker architecture, in particular figuring out what exactly a base image consists of, and in doing I have been exploring some of the images found on the docker hub. Specifically when looking at the following repo it references the centos-7.2.1511-docker.tar.xz file.
I've downloaded and examined the contents of the tar and it has your typical Linux filesystem.
As I understand it, this is not a complete Linux OS and is just a replica of a linux filesystem with all the non essentials removed? Where all other requirements are drawn from the Host OS when a container is run(?)
My question essentially boils down to how one would go about creating that tar file? What exactly do you need. My intention is not to create one but rather understand what portion of files/data/dependencies come from a target OS to create an image and what gets used on the Host OS
A Docker container is a set of processes, running a sandbox enabled by Linux namespaces, on top of the host kernel.
A Docker image is a set of layers, which are often simply tarballs, of files that are unpacked, and made to look as if they are the root of the filesystem when used to start a container.
A Docker image could be just a single statically-linked executable! You can create your own Docker image from scratch by simply creating a tarball of a single executable, and giving it to docker load which wI'll store it as the appropriate internal format and register it as an image.
As you can see then, a Docker image need not be much. It certainly doesn't need a kernel, or any of the components normally used for configuring the system, networking daemons, or even things like cron. Those are all left to the host.
Things that are usually available in an image are a dynamic library runtime, and files like /etc/hosts, /etc/resolv.conf, and other files which are referenced directly by libc. This allows you to add typical dynamically-linked executables which interact with the system as if they're running on a traditionalal OS.
I have successfully "Dockerized" a legacy CentOS 6-based VM by uninstalling as many packages as possible, then tar-ing up the filesystem (excluding directories like /proc, /sys, /dev, etc.) and loading this via docker load. Afterwards, I started a container and (sometimes forcefully) removed additional "system" packages that serve no purpose in a Docker image, like kernel, udev, etc.
This blog post goes into some of the specifics of docker load:
http://tuhrig.de/difference-between-save-and-export-in-docker/

Different images in containers

I want to create separated containers with a single service in each (more or less). I am using the php7-apache image which seems to use a base image of debian:jessie, php7 and apache. Since apache and php in this case are pretty intertwined I don't mind using this container.
I want to start adding other services to their own containers (git for example) and was considering using a tiny base image like busybox or alpinebox for these containers to keep image size down.
That said, I have read that using the same base image as other containers only gives you the 'penalty' of the one time image download of the base OS (debian jessie) which is then cached - while using tiny OSes in other containers will download those OSes on top of the base OS.
What is the best practice in this case? Should I use the same base image (debian jessie) for all the containers in this case?
You may want to create a base image from scratch. Create a base image from scratch.
From docker documentation
You can use Docker’s reserved, minimal image, scratch, as a starting point for building containers. Using the scratch “image” signals to the build process that you want the next command in the Dockerfile to be the first filesystem layer in your image.
While scratch appears in Docker’s repository on the hub, you can’t pull it, run it, or tag any image with the name scratch. Instead, you can refer to it in your Dockerfile. For example, to create a minimal container using scratch:
This example creates the hello-world image used in the tutorials. If you want to test it out, you can clone the image repo

Docker: One image per user? Or one image for all users?

Question about Docker best practices/intended use:
I have docker working, and it's cool. I run a PaaS company, and my intent is maybe to use docker to run individual instances of our service for a given user.
So now I have an image that I've created that contains all the stuff for our service... and I can run it. But once I want to set it up for a specific user, theres a set of config files that I will need to modify for each user's instance.
So... the question is: Should that be part of my image filesystem, and hence, I then create a new image (based on my current image, but with their specific config files inside it) for each user?
Or should I put those on the host filesystem in a set of directories, and map the host filesystem config files into the correct running container for each user (hence, having only one image shared among all users)?
Modern PAAS systems favour building an image for each customer, creating versioned copies of both software and configuration. This follows the "Build, release, run" recommendation of the 12 factor app website:
http://12factor.net/
An docker based example is Deis. It uses Heroku build packs to customize the software application environment and the environment settings are also baked into a docker image. At run-time these images are run by chef on each application server.
This approach works well with Docker, because images are easy to build. The challenge I think is managing the docker images, something the docker registry is designed to support.

Resources