What is the overhead of creating docker images?

I'm exploring using Docker so that we deploy new Docker images instead of specific file changes, so that everything the application needs ships with each deployment.
Question 1:
If I add a new application file, say 10 MB, to a Docker image, then when I deploy the new image using the tools in Docker Toolbox, will this require deploying an entirely new image to my containers, or do Docker deployments just take the difference between the two, similar to git version control?
Another way to put it: I looked at a list of Docker base images and saw a version of Ubuntu that is 188 MB. If I commit a new application to a Docker image using this base image, will my Docker containers need to pull the full 188 MB (which they are already running) plus the application, or is there a differential way of getting just what has changed?
Supplementary Question
Am I correct in assuming that when using Docker, deploying images is the intended approach? Meaning any new change should require a new image deployment, so that images are treated as immutable? When I was using AWS we followed this approach with AMIs (Amazon Machine Images), but storing AMIs had low overhead; for Docker I don't know yet.
Or is it better practice to deploy Dockerfiles and have the new image built on the target machine itself?

Docker uses a layered union filesystem; only one copy of a layer is pulled by a Docker engine and stored on its filesystem. When you build an image, Docker checks its layer cache to see whether the same parent layer and same command have already been used to build an existing layer, and if so, the cached layer is reused instead of building a new one. Once any step in the build creates a new layer, all following steps create new layers, so the order of your Dockerfile matters: put frequently changing steps at the end of the Dockerfile so the earlier steps can be cached.
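As a minimal sketch of that ordering (the Node.js base image and file names here are illustrative, not from the question), the rarely changing dependency steps go before the frequently changing application code:
# Rarely changing steps first so their layers stay cached
FROM node:18-alpine
WORKDIR /app

# Dependency manifests change less often than application code,
# so installing dependencies before copying the code keeps this
# layer cached across most builds.
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# Frequently changing application code goes last; only the layers
# from here down are rebuilt and re-pushed on a typical code change.
COPY . .
CMD ["node", "server.js"]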
Therefore, if you use a 200MB base image, have 50MB of additions, but only 10MB are new additions at the end of your Dockerfile, you'd push 250MB the first time to a docker engine, but only 10MB to an engine that already had a previous copy of that image, or 50MB to an engine that just had the 200MB base image.
The best practice with images is to build them once, push them to a registry (either self hosted using the registry image, cloud hosted by someone like AWS, or on Docker Hub), and then pull that image to each engine that needs to run it.
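For example (a sketch; the registry hostname and tag are made up), that workflow looks like this:
# Build and tag the image once, on a build machine
docker build -t registry.example.com/myapp:1.0.0 .

# Push it to the registry
docker push registry.example.com/myapp:1.0.0

# On each engine that needs to run it, pull and start it;
# only the layers missing locally are downloaded
docker pull registry.example.com/myapp:1.0.0
docker run -d registry.example.com/myapp:1.0.0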
For more details on the layered filesystem, see https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/

You can also put in a little extra work to create smaller images.
You can use Alpine or BusyBox instead of bigger bases such as Ubuntu, Debian, or Bitnami's minideb (a slimmed-down Debian).
A smaller image is also more secure, since fewer tools are available inside it.
Some reading
http://blog.xebia.com/how-to-create-the-smallest-possible-docker-container-of-any-image/
https://www.dajobe.org/blog/2015/04/18/making-debian-docker-images-smaller/
There are two great tools for making smaller Docker images:
https://github.com/docker-slim/docker-slim
and
https://github.com/mvanholsteijn/strip-docker-image
Some examples with docker-slim:
https://hub.docker.com/r/k3ck3c/grafana-xxl.slim/ shows a size of 357.3 MB before and 18.73 MB after using docker-slim.
For simh, https://hub.docker.com/r/k3ck3c/simh_bitnami.slim/ is 5.388 MB, while the original k3ck3c/simh_bitnami is 88.86 MB.
A popular netcat image, chilcano/netcat, is 135.2 MB, while a netcat based on Alpine is 7.812 MB, and one based on BusyBox needs only 2 or 3 MB.
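As a rough sketch of how docker-slim is typically invoked (check the project's README for the exact flags of your version; the image name below is only an example):
# Analyze and minify an existing image; docker-slim runs the image,
# observes what it actually uses, and produces a much smaller .slim variant.
docker-slim build --http-probe=false my-org/my-app:latest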

Related

Will Docker assist as a high level compressor?

I have a package of around 2+ GB on an Ubuntu machine. I have created a Docker image from a Dockerfile and its size is around 86 MB. Now I have created another image by updating the Dockerfile with a COPY command that copies the package from the Ubuntu machine into the image while building it.
Now I see that the Docker image size is around 2 GB, since the image contains the package as well. I want to know whether Docker helps in any way to reduce the size of the package once it is made into an image.
I want the Docker image to contain the package, which is around 2 GB, but I don't want the image size to cross 200 MB.
Docker will not compress an image's data for you. Hoping for 10x compression on an arbitrary installed package is rather optimistic, Docker or otherwise.
While Docker uses some clever and complicated kernel mechanisms, in most cases a file in a container maps to an ordinary file on disk in a strange filesystem layout. If you're trying to install a 2 GB package into an image, it will require 2 GB of local disk space. There are some mechanisms to share data that can help reduce overall disk usage (if you run 10 containers based on that image, they will share the 2 GB base image and not use extra disk space) but no built-in compression.
One further note: once you've taken action in a Dockerfile to use space in some form, it's permanently used. Each line in the Dockerfile creates a new "layer" or separate image which records the changes from the previous layer. If you're trying to install a package that's not in a repository of some sort, you're stuck with its space utilization in the image, unless you can use multi-stage builds for it:
FROM ubuntu:18.04
COPY package.deb .
# At this point there is a layer containing the .deb file and
# it uses space in the final image.
RUN dpkg --install package.deb && rm package.deb
# The file isn't "in the image", but the image remembers adding it
# and then removing it, so it still uses space.
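A sketch of the multi-stage variant mentioned above (the /opt/mypackage path is an assumption for illustration): the .deb is copied and installed in a throwaway build stage, and only the installed files you actually need are copied into the final image, so the layer holding the .deb never reaches it.
# Stage 1: install the package in a throwaway build stage
FROM ubuntu:18.04 AS build
COPY package.deb .
RUN dpkg --install package.deb && rm package.deb

# Stage 2: start clean and copy only the installed files you need.
# The layers holding package.deb stay in the build stage; the installed
# files themselves still take their full size in the final image.
FROM ubuntu:18.04
COPY --from=build /opt/mypackage /opt/mypackage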
As mentioned by David, this isn't possible with Docker, which uses overlay (or a similar layered filesystem) to represent images.
But there is a workaround to keep the size of the image low (the size of the running container will still be big!). It will increase the startup time of the app in the container and may add some complications to the arguments you pass to the app. A possible way to reduce the size:
- compress the package before adding it,
- copy the compressed package while building the image,
- use a script as the ENTRYPOINT/CMD (or whatever you use to start the container) which performs the uncompression (you'll need the decompression tool installed in the container) and then starts the application (you can pass through the arguments received by docker run as "$@", or even validate them beforehand); a sketch of such a script follows below.
Check by compressing your package file with gzip, 7zip, bzip2, or any compressor of your choice to see whether the output file is meaningfully smaller. If it can't be compressed enough, this workaround won't help much.
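A minimal sketch of such an entrypoint script, assuming the compressed package was copied to /app.tar.gz and the real binary ends up at /app/start (both paths are made up for illustration), wired in with ENTRYPOINT ["/entrypoint.sh"] in the Dockerfile:
#!/bin/sh
# entrypoint.sh - decompress the package on first start, then exec the app.
set -e

# Unpack only if it hasn't been unpacked already (keeps restarts fast).
if [ ! -d /app ]; then
    mkdir -p /app
    tar -xzf /app.tar.gz -C /app
fi

# Hand all docker run arguments straight to the application.
exec /app/start "$@"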

The best way to create docker image "offline installer"

I use a docker-compose file to run an Elasticsearch / Logstash / Kibana stack. Everything works fine; the
docker-compose build
command creates three images, about 600 MB each, downloading the needed layers from the Docker repository.
Now, I need to do the same on a machine with no Internet access, where downloading from repositories is impossible. I need to create an "offline installer". The best way I found is
docker save image1 image2 image3 -o archivebackup.tar
but the created file is almost 2 GB. During the
docker-compose build
command, some data is downloaded from the Internet, but it is definitely less than 2 GB.
What is a better way to create my "offline installer", to avoid making it so big?
The save command is the way to go for running docker images offline.
The size difference that you are noticing is because when you pull images from a registry, some layers might already exist locally and are thus not pulled; you only download the layers that you don't have locally.
On the other hand, when you save the image to a tar, all the layers need to be stored.
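On the offline machine, the archive produced by docker save is restored with docker load, for example:
# Copy archivebackup.tar to the offline machine first, then restore the images
docker load -i archivebackup.tar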
The best way to create the Docker offline installer is to:
1) get the CI/CD pipeline to generate the TAR files as part of the build process,
2) create a local folder with the required TAR files,
3) write a script to load these TAR files on the target machine (a sketch follows below), and
4) have the same script fire the docker-compose up -d command to bring up the whole service ecosystem.
Note: it is important to load the images before bringing up the services.
Regarding the size issue, the answer by Yamenk points to the reason why the size increases: docker pull skips the shared layers that already exist locally, while docker save has to store all of them.
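A minimal sketch of such a load-and-start script, assuming the TAR files sit in an images/ folder next to the docker-compose.yml (the folder layout and file names are assumptions):
#!/bin/sh
# install-offline.sh - load the saved images, then start the stack without Internet access.
set -e

# Load every saved image archive first...
for tarfile in images/*.tar; do
    docker load -i "$tarfile"
done

# ...then bring up the services defined in docker-compose.yml.
docker-compose up -d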

Different images in containers

I want to create separated containers with a single service in each (more or less). I am using the php7-apache image which seems to use a base image of debian:jessie, php7 and apache. Since apache and php in this case are pretty intertwined I don't mind using this container.
I want to start adding other services in their own containers (git for example) and was considering using a tiny base image like BusyBox or Alpine for these containers to keep the image size down.
That said, I have read that using the same base image as other containers only gives you the 'penalty' of a one-time image download of the base OS (debian:jessie), which is then cached, while using tiny OSes in other containers will download those OSes on top of the base OS.
What is the best practice in this case? Should I use the same base image (debian jessie) for all the containers in this case?
You may want to create a base image from scratch.
From the Docker documentation:
You can use Docker’s reserved, minimal image, scratch, as a starting point for building containers. Using the scratch “image” signals to the build process that you want the next command in the Dockerfile to be the first filesystem layer in your image.
While scratch appears in Docker’s repository on the hub, you can’t pull it, run it, or tag any image with the name scratch. Instead, you can refer to it in your Dockerfile. For example, to create a minimal container using scratch:
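The docs' example Dockerfile at this point is roughly the following (the hello binary is a small, statically compiled executable built beforehand):
# The image starts from nothing; the copied binary is essentially the whole filesystem.
FROM scratch
COPY hello /
CMD ["/hello"]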
This example creates the hello-world image used in the tutorials. If you want to test it out, you can clone the image repo

Docker Base Images and Scaling Architecture

I have an app running on MongoDB, Node JS Api, React front end, Nginx proxy, etc. I have all of these setup as individual images and running locally (OSX) in separate linked containers, which I run with Docker Compose. In production, I have setup a (one) Ubuntu server on Digital Ocean at the moment, and expect to quickly scale as needed to multiple servers.
My question is what is the best way to handle the underlying Linux base image for each of these containers?
1) Should all of the Linux setup (apt-gets, node / mongo installs, etc.) exist on the Linux machine and outside of Docker, so that one could simply create a snapshot of this machine, spin up a new server instance, and run the desired Docker container if you needed to scale quickly, or
2) Should all of the linux setup exist within a 'base' Ubuntu image, which the mongo, node, and nginx images build on top of. This results in each image's size growing significantly since they each have a separate instance of Ubuntu, plus all of the package dependencies to run mongo, node, and nginx, or
3) Should each process (mongo, node, nginx) have a separate Linux base Docker image, since they each have separate dependencies? Again, each image would grow because each would run an instance of Ubuntu.
What is the proper way to handle this with Docker?
The answer is #2, but I suspect you may not fully understand the relationship between container and image.
How Docker uses images
First of all, here is how the Docker docs describe an image:
Containers are created from images. An image is only downloaded and cached locally. Images are distributed via Registries.
Image layers
What makes Docker images different from virtual machine images is how they're built and stored. Again from the docs:
Each image consists of a series of layers. Docker makes use of union file systems to combine these layers into a single image. Union file systems allow files and directories of separate file systems, known as branches, to be transparently overlaid, forming a single coherent file system.
One of the reasons Docker is so lightweight is because of these layers. When you change a Docker image (for example, update an application to a new version) a new layer gets built. Thus, rather than replacing the whole image or entirely rebuilding, as you may do with a virtual machine, only that layer is added or updated. Now you don't need to distribute a whole new image, just the update, making distributing Docker images faster and simpler.
So, your mongo, node, and nginx images will be thin layers on top of a base image containing your basic Linux setup. That base image will only be downloaded once and will be re-used as a component layer by the other images.
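A sketch of what option #2 looks like in Dockerfiles (the my-org/base name, package lists, and paths are illustrative assumptions): one shared base image, then thin per-service images built on top of it.
# base/Dockerfile - shared Linux setup, built once and tagged my-org/base:1.0
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y curl ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# node/Dockerfile - thin layer for the Node API on top of the shared base
FROM my-org/base:1.0
RUN apt-get update && apt-get install -y nodejs npm \
 && rm -rf /var/lib/apt/lists/*
COPY . /srv/api
CMD ["node", "/srv/api/server.js"]
Engines that already have my-org/base:1.0 only pull the small service-specific layers for the node, mongo, and nginx images.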

How to reduce docker image size?

I'm using docker official rails onbuild pack (https://registry.hub.docker.com/_/rails/) to build and create rails image with application. But each application is taking about 900MB. Is there any way this size can be reduced?
Here's my workflow ->
add dockerfile to the project -> build -> run
The problem is there can be N number of apps on this system and if each application takes 1G of disk space it would be an issue. If we reduce layers will it reduce size? If yes how can that be done?
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
blog2 latest 9d37aaaa3beb About a minute ago 931.1 MB
my-rails-app latest 9904zzzzc2af About an hour ago 931.1 MB
Since they are all coming from the same base image, they don't each take 900MB (assuming you're using AUFS as your file system*). There will be one copy of the base image (_/rails) and then the changes you've made will be stored in separate (usually much smaller) layers.
If you would like to see the size of each image layer, you might like to play with this tool. I've also containerized it here.
*) If you're using docker on Ubuntu or Debian, you're probably defaulting to AUFS. Other host Linux versions can use different file systems for the images, and they don't all share base images well.
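A built-in way to see how much space each layer contributes is docker history; for example, using the image tag from the question:
# Show every layer of the image together with the size it adds
docker history my-rails-app:latest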
