Why does the Ubuntu docker base image only take 89 MB? - docker

In the Docker in Practice book, it explains that the todoapp that they create is layered on top of the node image, which is layered on top of the ubuntu image (see image here). Why is the Ubuntu docker base image shown to take up only 89 MB, when in comparison a production OS install of Ubuntu takes approximately 10 times that much space?

If you read the ubuntu Dockerfile, you'll see that it's based on a tarball from https://cloud-images.ubuntu.com/.
If you inspect the images at https://cloud-images.ubuntu.com/minimal/releases/bionic/release/, you'll see that roughly 75MB is a typical compressed size.
To explain why this is so: the minimal package set is listed at https://packages.ubuntu.com/bionic/ubuntu-minimal. You'll see that it's a very small list -- no GUI, no development tools, no optional servers or services (the expectation is that those will be installed later).
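If you want to see for yourself how little is inside, you can list the packages installed in the stock image (assuming you have Docker available locally):
docker run --rm ubuntu:18.04 dpkg -l
# Lists every installed package: a short list compared to a desktop or
# server install, which is why the image stays well under 100 MB.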

Related

Will Docker assist as a high level compressor?

I have a package of around 2+ GB on an Ubuntu machine. I created a Docker image from a Dockerfile and its size is around 86MB. I then built another image after adding a COPY instruction to the Dockerfile that copies the 2 GB package from the Ubuntu machine into the image during the build.
Now the resulting image is around 2GB, since it contains the package as well. I want to know whether Docker helps in any way to reduce the size of the package once it has been made part of an image.
I want the image to contain the package, which is around 2GB, but I don't want the image size to cross 200MB.
Regards,
Karthik
Docker will not compress an image's data for you. Hoping for 10x compression on an arbitrary installed package is rather optimistic, Docker or otherwise.
While Docker uses some clever and complicated kernel mechanisms, in most cases a file in a container maps to an ordinary file on disk in a strange filesystem layout. If you're trying to install a 2 GB package into an image, it will require 2 GB of local disk space. There are some mechanisms to share data that can help reduce overall disk usage (if you run 10 containers based on that image, they will share the 2 GB base image and not use extra disk space) but no built-in compression.
One further note: once you've taken action in a Dockerfile to use space in some form, it's permanently used. Each line in the Dockerfile creates a new "layer" or separate image which records the changes from the previous layer. If you're trying to install a package that's not in a repository of some sort, you're stuck with its space utilization in the image, unless you can use multi-stage builds for it:
FROM ubuntu:18.04
COPY package.deb .
# At this point there is a layer containing the .deb file and
# it uses space in the final image.
RUN dpkg --install package.deb && rm package.deb
# The file isn't "in the image", but the image remembers adding it
# and then removing it, so it still uses space.
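A hedged sketch of that multi-stage alternative; the install path /opt/myapp is illustrative, not taken from the question:
# Stage 1: install the package; this stage's layers never reach the final image
FROM ubuntu:18.04 AS installer
COPY package.deb .
RUN dpkg --install package.deb && rm package.deb
# Stage 2: start from a clean base and copy over only what the package installed
FROM ubuntu:18.04
COPY --from=installer /opt/myapp /opt/myapp
The final image then contains the installed files, but not the layer that held the .deb.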
As David mentioned, Docker itself won't do this for you; it uses OverlayFS or a similar layered filesystem representation.
There is, however, a workaround that keeps the image small (the running container will still be big!), at the cost of a slower application startup and some extra handling of the arguments you pass to the app. A possible way to reduce the image size:
compress the package before adding it
copy the compressed archive into the image while building it
use a script as the ENTRYPOINT/CMD (or whatever you use to start the container) which should:
decompress the package [the decompression tool must be installed in the container], and
start the application (you can forward the arguments received from docker run to the application as "$@", or even validate them beforehand)
First check how well your package file actually compresses with gzip / 7zip / bzip2 or any compressor of your choice, to see whether the output file is small enough. If it can't be compressed enough, this workaround won't get the image under your target size.
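A rough sketch of that approach. The archive name package.tar.gz and the binary /opt/myapp/run are placeholders, not taken from the question:
# Dockerfile
FROM ubuntu:18.04
# Only the compressed archive is baked into the image
COPY package.tar.gz /opt/package.tar.gz
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
# entrypoint.sh
#!/bin/sh
# Decompress into the container's writable layer at startup (slower start, small image)
tar -xzf /opt/package.tar.gz -C /opt
# Forward the arguments given to "docker run" to the application
exec /opt/myapp/run "$@"
Keep in mind the extracted files still consume disk space in the running container, just not in the image.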

Minimal Ubuntu docker image is claimed to be 29MB, so why does "docker images" command say 84.1MB?

https://blog.ubuntu.com/2018/07/09/minimal-ubuntu-released
says
The 29MB Docker image for Minimal Ubuntu 18.04 LTS serves as a highly
efficient container...
...
On Dockerhub, the new Ubuntu 18.04 LTS image is now the new Minimal
Ubuntu 18.04 image. Launching a Docker instance with docker run
ubuntu:18.04 therefore launches a Docker instance with the latest
Minimal Ubuntu.
I ran the exact command mentioned:
docker run ubuntu:18.04
Then I ran "docker images" which said:
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu 18.04 16508e5c265d 5 days ago 84.1MB
Why does that output say 84.1MB? The Ubuntu web page I quoted says it should be 29MB.
Am I doing something wrong?
Am I measuring the size incorrectly?
How can I get an Ubuntu image that's only 29MB?
The article states that Docker Hub hosts a "standard" image, which is bigger than the cloud image. The cloud image is the new thing they introduced and it weighs 29MB while the standard image weighs 32MB.
The cloud image is not available on Docker Hub.
But then where does the 84MB come from? What you download from the registry is the compressed image, which in this case weighs only 32MB. Once downloaded, it's decompressed into its usable format and stored locally on your machine, and that decompressed size is what "docker images" reports.
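If you want to verify the local, decompressed size yourself, one way (using the same tag as in the question) is:
docker image inspect ubuntu:18.04 --format '{{.Size}}'
# Prints the unpacked size in bytes (roughly the 84MB above); the ~32MB figure
# is the compressed size shown on Docker Hub and during "docker pull".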
Meaning everything is in order. Where do you get that cloud image from? Well, I'd start by looking at:
[...] are available for use now in Amazon EC2, Google Compute Engine (GCE) [...]
If you'd like to use it with a private cloud, you can download the image from the Ubuntu Minimal Cloud Images page.
-edit-
addressing your comment, those private cloud sizes may vary. This is at least partially, if not mostly, due to differences between various hypervisor stacks. As is hinted at in the article:
Cloud images also contain the optimised kernel for each cloud and supporting boot utilities.
--
Just as an update, these days (~three years later) the latest 18.04 image weighs 25MB in its compressed format, so the exact numbers from my original answer are no longer accurate, but the point still stands.

Reducing docker image size

I downloaded an image that contains only the Debian OS and started to build on it. The Debian image was about 700 MB when I first started with it. After installing LAMP, a Drupal site, Varnish, Tomcat, Solr and some other services, the image grew significantly, i.e. up to 13 GB, and after importing a 3 GB MySQL dump into that image its size roughly doubled to about 29 GB.
Why is this happening? Is there a way to reduce the overall size of the image?
One small trick is to use apt-get install with the --no-install-recommends flag passed. This will install the minimum dependencies for the packages you are installing.
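For example, a hedged Dockerfile fragment (the package names here are placeholders):
RUN apt-get update \
 && apt-get install -y --no-install-recommends apache2 php libapache2-mod-php \
 && rm -rf /var/lib/apt/lists/*
# Removing the apt package lists in the same RUN step keeps that layer small;
# doing it in a later step would not reclaim the space.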
That being said, putting everything into one Dockerfile is an antipattern. Each "application" should be its own Docker container. So you'd have one for Varnish, one for Solr, one for the database, one for the Drupal site, etc. Each will be relatively small by comparison, and they'll all share the base Debian image. To coordinate them, use something like Docker Compose.
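A minimal docker-compose.yml sketch of that split, with illustrative service and image names:
version: "3"
services:
  drupal:
    image: drupal            # your site image, built from its own small Dockerfile
    depends_on:
      - db
  db:
    image: mysql:5.7
  varnish:
    image: varnish
    ports:
      - "80:80"
  solr:
    image: solr
Each service gets its own image and container, and each can be rebuilt and redeployed independently.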

What is the overhead of creating docker images?

I'm exploring using docker so that we deploy new docker images instead of specific file changes, so all of the needs of the application come with each deployment etc.
Question 1:
If I add a new application file, say 10 MB, to a docker image when I deploy the new image, using the tools in Docker tool box, will this require the deployment of an entirely new image to my containers or do docker deployments just take the difference between the 2, similar to git version control?
Another way to put it, I looked on a list of docker base images and saw a version of ubuntu that is 188 MB. If I commit a new application to a docker image, using this base image, will my docker containers need to pull the full 188 MB, which they are already running, plus the application or is there a differential way of just getting what has changed?
Supplementary Question
Am I correct in assuming when using docker, deploying images is the intended approach? Meaning any new changes should require a new image deployment so that images are treated as immutable? When I was using AWS we followed this approach with AMI (Amazon Machine Images) but storing AMIs had low overhead, for docker I don't know yet.
Or is it a better practice to deploy dockerfiles and have the new image be built on the container itself?
Docker uses a layered union filesystem; only one copy of each layer is pulled by a docker engine and stored on its filesystem. When you build an image, docker checks its layer cache to see if the same parent layer and same command have been used to build an existing layer, and if so, the cache is reused instead of building a new layer. Once any step in the build creates a new layer, all following steps create new layers, so the order of your Dockerfile matters. You should put frequently changing steps at the end of the Dockerfile so the earlier steps can be cached.
Therefore, if you use a 200MB base image, have 50MB of additions, but only 10MB are new additions at the end of your Dockerfile, you'd push 250MB the first time to a docker engine, but only 10MB to an engine that already had a previous copy of that image, or 50MB to an engine that just had the 200MB base image.
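A common way to exploit that ordering, sketched here for a hypothetical Node.js app (the question doesn't specify a stack), is to copy the dependency manifest before the application code:
FROM node:10
WORKDIR /app
# Rarely changes: this layer and the npm install layer stay cached
COPY package.json package-lock.json ./
RUN npm install
# Changes on every deploy: only this layer and later ones are rebuilt and re-pushed
COPY . .
CMD ["node", "server.js"]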
The best practice with images is to build them once, push them to a registry (either self hosted using the registry image, cloud hosted by someone like AWS, or on Docker Hub), and then pull that image to each engine that needs to run it.
For more details on the layered filesystem, see https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/
You can also put in a little work to create smaller images.
You can use Alpine or BusyBox instead of the bigger Ubuntu, Debian or Bitnami (a lightweight Debian) bases.
A smaller image is also more secure, since fewer tools are available inside it.
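For instance, a hedged minimal image based on Alpine (the installed package is just an example):
FROM alpine:3.12
# apk is Alpine's package manager; --no-cache avoids keeping the package index in the layer
RUN apk add --no-cache curl
CMD ["curl", "--version"]
The resulting image is a few megabytes plus whatever you install.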
Some reading
http://blog.xebia.com/how-to-create-the-smallest-possible-docker-container-of-any-image/
https://www.dajobe.org/blog/2015/04/18/making-debian-docker-images-smaller/
There are also two great tools for making docker images smaller:
https://github.com/docker-slim/docker-slim
and
https://github.com/mvanholsteijn/strip-docker-image
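Typical docker-slim usage, as far as I recall (check the project's README for the current syntax), is roughly:
docker-slim build my-image:latest
# Runs and analyses the container, then produces a minimized my-image.slim image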
Some examples with docker-slim:
https://hub.docker.com/r/k3ck3c/grafana-xxl.slim/ shows a size of 357.3 MB before, and 18.73 MB after using docker-slim.
For simh, https://hub.docker.com/r/k3ck3c/simh_bitnami.slim/ is 5.388 MB, while the original k3ck3c/simh_bitnami is 88.86 MB.
A popular netcat image, chilcano/netcat, is 135.2 MB, while a netcat image based on Alpine is 7.812 MB, and one based on BusyBox needs only 2 or 3 MB.

How to reduce docker image size?

I'm using the official Docker Rails onbuild image (https://registry.hub.docker.com/_/rails/) to build a Rails image containing my application. But each application image is taking about 900MB. Is there any way this size can be reduced?
Here's my workflow ->
add dockerfile to the project -> build -> run
The problem is that there can be any number of apps on this system, and if each application takes 1 GB of disk space it becomes an issue. If we reduce the number of layers, will that reduce the size? If yes, how can that be done?
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
blog2 latest 9d37aaaa3beb About a minute ago 931.1 MB
my-rails-app latest 9904zzzzc2af About an hour ago 931.1 MB
Since they are all coming from the same base image, they don't each take 900MB (assuming you're using AUFS as your file system*). There will be one copy of the base image (_/rails) and then the changes you've made will be stored in separate (usually much smaller) layers.
If you would like to see the size of each image layer, you might like to play with this tool. I've also containerized it here.
*) If you're using docker on Ubuntu or Debian, you're probably defaulting to AUFS. Other host Linux versions can use different file systems for the images, and they don't all share base images well.
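If you just want a quick look at per-layer sizes without an extra tool, docker history works too (using the image name from the question):
docker history my-rails-app:latest
# Shows each layer, the Dockerfile step that created it, and its size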
