Reducing docker image size - docker

I downloaded an image that has only debian OS inside and started to build on it. The debian image was about 700mb when I first started with it. After installing LAMP, drupal site, varnish, tomcat, solr and some other services. The image has gone up significantly (i,e) upto 13 gb and after importing a 3gb mysql dump to that image the size of the image doubled to about 29gb.
Why is this happening? Is there a way to reduce the overall size of the image?

One small trick is to use apt-get install with the --no-install-recommends flag passed. This will install the minimum dependencies for the packages you are installing.
That being said, putting everything in a Dockerfile is an antipattern. Each "application" should be it's own Docker container. So you'd have one for Varnish, one for Solr, one for the database, one for the drupal site etc. Each will be relatively small by comparison. They'll all share the base debian image as well. To coordinate them, use something like Docker Compose.

Related

Docker best practice: use OS or application as base image?

I would like to build a docker image which contains Apache and Python3. What is the suggested base image to use in this case?
There is a offical apache:2.4.43-alpine image I can use as the base image, or I can install apache on top of a alpine base image.
What would be the best approach in this case?
Option1:
FROM apache:2.4.43-alpine
<Install python3>
Option2:
FROM alpine:3.9.6
<Install Apache>
<Install Python3>
Here are my rules.
rule 1: If the images are official images (such as node, python, apache, etc), it is fine to use them as your application's base image directly, more than you build your own.
rule 2:, if the images are built by the owner, such as hashicorp/terraform, hashicorp is the owner of terraform, then it is better to use it, more than build your own.
rule 3: If you want to save time only, choice the most downloaded images with similar applications installed as base image
Make sure you can view its Dockerfile. Otherwise, don't use it at all, whatever how many download counted.
rule 4: never pull images from public registry servers, if your company has security compliance concern, build your own.
Another reason to build your own is, the exist image are not built on the operation system you prefer. Such as some images proved by aws, they are built with amazon linux 2, in most case, I will rebuild with my own.
rule 5: When build your own, never mind from which base image, no need reinvent the wheel and use exis image's Dockerfile from github.com if you can.
Avoid Alpine, it will often make Python library installs muuuuch slower (https://pythonspeed.com/articles/alpine-docker-python/)
In general, Python version is more important than Apache version. Latest Apache from stable Linux distro is fine even if not latest version, but latest Python might be annoyingly too old. Like, when 3.9 comes out, do you want to be on 3.7?
As such, I would recommend python:3.8-slim-buster (or whatever Python version you want), and install Apache with apt-get.

Will Docker assist as a high level compressor?

I have a package sizing around 2+GB in an ubuntu machine. I have created a docker image from docker file and the size is around 86MB. Now, I have created another image by updating the docker file with the copy command which is to copy the package in the ubuntu machine to the docker image while building a new docker image.
Now, I am seeing the docker image size is around 2GB since the docker image contains the package as well. I want to know does this docker phenomena helps us in anyway to reduce the size of the package after making as a docker image.
I want the docker image to be created in such a way it should also contains the package which is around 2GB but I don't want docker image size should cross 200MB.
Regards,
Karthik
Docker will not compress an image's data for you. Hoping for 10x compression on an arbitrary installed package is rather optimistic, Docker or otherwise.
While Docker uses some clever and complicated kernel mechanisms, in most cases a file in a container maps to an ordinary file on disk in a strange filesystem layout. If you're trying to install a 2 GB package into an image, it will require 2 GB of local disk space. There are some mechanisms to share data that can help reduce overall disk usage (if you run 10 containers based on that image, they will share the 2 GB base image and not use extra disk space) but no built-in compression.
One further note: once you've taken action in a Dockerfile to use space in some form, it's permanently used. Each line in the Dockerfile creates a new "layer" or separate image which records the changes from the previous layer. If you're trying to install package that's not in a repository of some sort, you're stuck with its space utilization in the image, unless you can use multi-stage builds for it:
FROM ubuntu:18.04
COPY package.deb .
# At this point there is a layer containing the .deb file and
# it uses space in the final image.
RUN dpkg --install package.deb && rm package.deb
# The file isn't "in the image", but the image remembers adding it
# and then removing it, so it still uses space.
As mentioned by David it's not possible using docker which uses overlay or something similar layered FS representation.
But there is a workaround to keep the size of image low (but the size of the container will be big !!), but it'll increase the startup time of app from the docker container, also may add some complications to the arguments that you pass to the app. A possible way to reduce size :
compress the package before adding
copy the compressed one while building an image
use some script as the ENTRYPOINT/CMD whatever you use to start the container which should perform
uncompression [you'll need the uncompression package installed in the container] and
start the application.(you can reuse the arguments received to docker run as $# while passing to the application - or even validate the arguments beforehand)
Check by compressing your package file with gzip / 7zip / bzip2 or any compressor of your choice - all to see the output file is of reduced size.
If it can't be compressed enough

Is it necessary to have an base OS for docker image?

I am using Windows OS but to use docker I use CentOS VM over Oracle VM Virtualbox. I have seen a Dockerfile where centos is used as base image. First line of my Dockerfile is
FROM centos
If I check the Dockerfile of CentOS on Docker Hub then first line is
FROM scratch
scratch is used to build an explicitly empty image, especially for building images. Here I can understand that if I start traversing upward using "FROM " line then finally I will end up at "scratch" image. I can see that scratch can be used to create a minimal container.
Question: If I want to create some bigger applications using web server, database etc, then is it necessary to add a base OS image?
I have tried to search for mysql and tomcat and noticed that it finally uses a OS image.
My understanding of Container was that I can "just bundle the required software and my service" in the container. Please clarify.
Your understanding is correct, however "just bundle the required software and my service" may be cumbersome, especially if you also have some shell scripts that make further use of other support programs.
Using some base image that contains already all the necessary stuff is more convenient. You can share the same base image for several services and due to docker's layered images will have no overhead regarding disk space.

What is the overhead of creating docker images?

I'm exploring using docker so that we deploy new docker images instead of specific file changes, so all of the needs of the application come with each deployment etc.
Question 1:
If I add a new application file, say 10 MB, to a docker image when I deploy the new image, using the tools in Docker tool box, will this require the deployment of an entirely new image to my containers or do docker deployments just take the difference between the 2, similar to git version control?
Another way to put it, I looked on a list of docker base images and saw a version of ubuntu that is 188 MB. If I commit a new application to a docker image, using this base image, will my docker containers need to pull the full 188 MB, which they are already running, plus the application or is there a differential way of just getting what has changed?
Supplementary Question
Am I correct in assuming when using docker, deploying images is the intended approach? Meaning any new changes should require a new image deployment so that images are treated as immutable? When I was using AWS we followed this approach with AMI (Amazon Machine Images) but storing AMIs had low overhead, for docker I don't know yet.
Or is it a better practice to deploy dockerfiles and have the new image be built on the container itself?
Docker uses a layered union filesystem, only one copy of a layer will be pulled by a docker engine and stored on its filesystem. When you build an image, docker will check its layer cache to see if the same parent layer and same command have been used to build an existing layer, and if so, the cache is reused instead of building a new layer. Once any step in the build creates a new layer, all following steps will create new layers, so the order of your Dockerfile matters. You should add frequently changing steps to the end of the Dockerfile so the earlier steps can be cached.
Therefore, if you use a 200MB base image, have 50MB of additions, but only 10MB are new additions at the end of your Dockerfile, you'd push 250MB the first time to a docker engine, but only 10MB to an engine that already had a previous copy of that image, or 50MB to an engine that just had the 200MB base image.
The best practice with images is to build them once, push them to a registry (either self hosted using the registry image, cloud hosted by someone like AWS, or on Docker Hub), and then pull that image to each engine that needs to run it.
For more details on the layered filesystem, see https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/
You can also work a little, in order to create smaller images.
You can use Alpine or Busybox instead of using bigger Ubuntu, Debian or Bitnami (Debian light).
A smaller image is more secure as less tools are available.
Some reading
http://blog.xebia.com/how-to-create-the-smallest-possible-docker-container-of-any-image/
https://www.dajobe.org/blog/2015/04/18/making-debian-docker-images-smaller/
You have 2 great tools in order to make smaller docker images
https://github.com/docker-slim/docker-slim
and
https://github.com/mvanholsteijn/strip-docker-image
Some examples with docker-slim
https://hub.docker.com/r/k3ck3c/grafana-xxl.slim/
shows
size before -> 357.3 MB
and using docker-slim -> 18.73 MB
or about simh
https://hub.docker.com/r/k3ck3c/simh_bitnami.slim/
size 5.388 MB
when the original
k3ck3c/simh_bitnami 88.86 MB
a popular netcat image
chilcano/netcat is 135.2 MB
when a netcat based on Alpine is 7.812 MB
and based on busybox will need 2 or 3 MB

How to reduce docker image size?

I'm using docker official rails onbuild pack (https://registry.hub.docker.com/_/rails/) to build and create rails image with application. But each application is taking about 900MB. Is there any way this size can be reduced?
Here's my workflow ->
add dockerfile to the project -> build -> run
The problem is there can be N number of apps on this system and if each application takes 1G of disk space it would be an issue. If we reduce layers will it reduce size? If yes how can that be done?
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
blog2 latest 9d37aaaa3beb About a minute ago 931.1 MB
my-rails-app latest 9904zzzzc2af About an hour ago 931.1 MB
Since they are all coming from the same base image, they don't each take 900MB (assuming you're using AUFS as your file system*). There will be one copy of the base image (_/rails) and then the changes you've made will be stored in separate (usually much smaller) layers.
If you would like to see the size of each image layer, you might like to play with this tool. I've also containerized it here.
*) If you're using docker on Ubuntu or Debian, you're probably defaulting to AUFS. Other host Linux versions can use different file systems for the images, and they don't all share base images well.

Resources