Will Docker assist as a high level compressor? - docker

I have a package of around 2+ GB on an Ubuntu machine. I created a Docker image from a Dockerfile and its size is around 86 MB. I then created another image by updating the Dockerfile with a COPY instruction that copies the package from the Ubuntu machine into the image while building it.
Now the image size is around 2 GB, since the image contains the package as well. I want to know whether this Docker mechanism helps in any way to reduce the size of the package once it is part of an image.
I want the Docker image to be built in such a way that it still contains the ~2 GB package, but I don't want the image size to cross 200 MB.
Regards,
Karthik

Docker will not compress an image's data for you. Hoping for 10x compression on an arbitrary installed package is rather optimistic, Docker or otherwise.
While Docker uses some clever and complicated kernel mechanisms, in most cases a file in a container maps to an ordinary file on disk in a strange filesystem layout. If you're trying to install a 2 GB package into an image, it will require 2 GB of local disk space. There are some mechanisms to share data that can help reduce overall disk usage (if you run 10 containers based on that image, they will share the 2 GB base image and not use extra disk space) but no built-in compression.
One further note: once you've taken an action in a Dockerfile that uses space, that space is permanently used. Each line in the Dockerfile creates a new "layer", a separate image which records the changes from the previous layer. If you're trying to install a package that's not in a repository of some sort, you're stuck with its space utilization in the image, unless you can use multi-stage builds for it:
FROM ubuntu:18.04
COPY package.deb .
# At this point there is a layer containing the .deb file and
# it uses space in the final image.
RUN dpkg --install package.deb && rm package.deb
# The file isn't "in the image", but the image remembers adding it
# and then removing it, so it still uses space.
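A sketch of the multi-stage alternative mentioned above: the .deb only exists in a throwaway first stage, and the final image copies in just the installed files. The install path /opt/app is an assumption about where the package puts its files:

```dockerfile
# Stage 1: a throwaway build stage that carries the .deb and its layers.
FROM ubuntu:18.04 AS installer
COPY package.deb /tmp/package.deb
RUN dpkg --install /tmp/package.deb

# Stage 2: the final image. Only the installed files are copied over,
# so the .deb itself never appears in any layer of this image.
FROM ubuntu:18.04
COPY --from=installer /opt/app /opt/app
```

The installer stage's layers are discarded once the build finishes; only the layers of the second stage are shipped.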

As David mentioned, this isn't possible with Docker, which uses OverlayFS or a similar layered filesystem representation.
But there is a workaround to keep the size of the image low (the size of the container will still be big!). It will increase the app's startup time from the container and may add some complications to the arguments that you pass to the app. A possible way to reduce the size:
compress the package before adding it
copy the compressed package while building the image
use a script as the ENTRYPOINT/CMD (whatever you use to start the container), which should perform
the decompression (you'll need the decompression tool installed in the container), and
start the application (you can forward the arguments received by docker run to the application via "$@", or even validate the arguments beforehand)
Check by compressing your package file with gzip / 7zip / bzip2 or any compressor of your choice to see how much the output file shrinks. If it can't be compressed enough, this approach won't help.
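The ENTRYPOINT script described above might look like the following sketch. The paths /opt/app.tar.gz and /opt/app and the start command inside the package are assumptions; both paths are overridable via environment variables so you can adapt them:

```shell
# Write the hypothetical entrypoint script; PKG and APP_DIR are assumed
# paths and can be overridden via the environment.
cat > entrypoint.sh <<'EOF'
#!/bin/sh
set -e
PKG="${PKG:-/opt/app.tar.gz}"    # compressed package baked into the image
APP_DIR="${APP_DIR:-/opt/app}"   # where the package is extracted

# Extract only on the first start; later restarts skip this step.
if [ ! -d "$APP_DIR" ]; then
    mkdir -p "$APP_DIR"
    tar -xzf "$PKG" -C "$APP_DIR"
fi

# exec replaces the shell so the app runs as PID 1 and receives signals;
# "$@" forwards whatever arguments were given to `docker run`.
exec "$APP_DIR/start" "$@"
EOF
chmod +x entrypoint.sh
```

Note the use of "$@" (not "$#", which is the argument count) to forward the docker run arguments unchanged.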

Related

Docker container export and importing it into image

I started using Docker recently. Initially the Docker image was 2-3 GB in size. I am saving the work done in the container into images, so the image size has grown significantly (~6 GB). I want to delete the images while preserving the work done. When I export the container to a gzipped file, the size of that file is ~1 GB. Will it work fine if I delete the current ~6 GB image and create a new one from the gzipped file with docker import? The description of the import command says it will create a filesystem image. Is that a Docker image or something else, i.e. will I be able to create containers from that image?
You can save the image (see more details here), for example:
docker save busybox > busybox.tar
Another alternative is to write a Dockerfile which contains all the instructions necessary to build your image. The main advantage is that this is a text file which can be version controlled, so you can keep track of all the changes you make to your image. Another advantage is that you can deploy that image elsewhere without having to copy images across systems: instead of copying a 1 GB or 6 GB image, you just need to copy the Dockerfile and build the image on the new host. More details about the Dockerfile can be found here
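For reference, the two round-trips behave differently: save/load preserves an image's layers, tags and history, while export/import flattens a container's filesystem into a single-layer image and drops metadata such as CMD. A command sketch, assuming an image named busybox and a container named mycontainer:

```shell
# Image round-trip: keeps layers, tags and history.
docker save busybox > busybox.tar
docker load < busybox.tar

# Container round-trip: flattens the filesystem into one layer;
# metadata like CMD/ENTRYPOINT must be re-specified afterwards.
docker export mycontainer > rootfs.tar
docker import rootfs.tar myimage:flattened
```

The flattening is why the exported gzipped file can be much smaller than the sum of the image's layers.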

What is docker's scratch image?

I'm new to docker and I was trying out the first hello world example in the docs. As I understand the hello-world image is based on top of the scratch image. Could someone please explain how the scratch image works? As I understand it is essentially blank. How is the binary executed in the hello-world image then?
The scratch image is the most minimal image in Docker. This is the base ancestor for all other images. The scratch image is actually empty. It doesn't contain any folders/files ...
The scratch image is mostly used for building other base images. For instance, the debian image is built from scratch as such:
FROM scratch
ADD rootfs.tar.xz /
CMD ["bash"]
The rootfs.tar.xz contains all the filesystem files. The Debian image adds those filesystem folders on top of the empty scratch image.
As I understand it is essentially blank. How is the binary executed in the hello-world image then?
The scratch image is blank. The hello-world executable added to the scratch image is statically compiled, meaning that it is self-contained and doesn't need any additional libraries to execute.
As stated in the official Docker docs:
Assuming you built the “hello” executable example from the Docker GitHub example C-source code, and you compiled it with the -static flag, you can then build this Docker image using: docker build --tag hello
This confirms that the hello-world executable is statically compiled. For more info about static compiling, read here.
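As a sketch of the same idea, any statically linked binary can run on scratch with nothing else in the image (the file name hello here is illustrative):

```dockerfile
# Dockerfile for a single static binary on scratch.
# `hello` must be statically compiled first, e.g.: gcc -static -o hello hello.c
FROM scratch
COPY hello /
CMD ["/hello"]
```

The resulting image is the size of the binary plus a little metadata, because scratch contributes zero bytes.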
A bit late to the party, but adding to the answer of @yamenk.
Scratch isn't technically an image; it's merely a reference to an empty starting point. Container images make use of the underlying kernel, which provides the system calls; because in Linux everything is a file, you can add any self-contained binary, or an entire operating system's files, as files in this filesystem.
This means that an image built from scratch relies directly on the kernel of the host system, with only the files you add loaded on top of it. That's also why building from scratch is effectively a no-op: when you add just a single binary, the size of the image is only the size of that binary plus a bit of overhead.
The resources you can assign when executing an image in a container are enforced by leveraging the cgroups functionality, and the networking makes use of the Linux network namespacing technique.
In short, the official scratch image contains nothing at all: zero bytes.
But a container instance is not the same as the container image it was created from. Even though the scratch image is empty, when a runtime like runC starts an instance from an image built from scratch, it needs more things (a root filesystem, etc.) than what you can see in the Dockerfile.

The best way to create docker image "offline installer"

I use docker-compose file to get Elasticsearch Logstash Kibana stack. Everything works fine,
docker-compose build
command creates three images, about 600 MB each, downloading the needed layers from the Docker registry.
Now I need to do the same on a machine with no Internet access, where downloading from repositories is impossible. I need to create an "offline installer". The best way I found is
docker save image1 image2 image3 -o archivebackup.tar
but the created file is almost 2 GB. During
docker-compose build
command some data is downloaded from the Internet, but it is definitely less than 2 GB.
What is a better way to create my "offline installer", to avoid making it so big?
The save command is the way to go for running Docker images offline.
The size difference that you are noticing is because when you pull images from a registry, some layers might already exist locally and are thus not pulled. So you are not pulling all the image layers, only the ones that you don't have locally.
On the other hand, when you are saving the image to a tar, all the layers need to be stored.
The best way to create the Docker offline installer is to
Get the CI/CD pipeline to generate the TAR files as part of the build process
Then create a local folder with the required TAR files
Write a script to load these TAR files on the target machine
The same script can fire the docker-compose up -d command to bring up the whole service ecosystem
Note: it is important to load the images before bringing up the services
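The load-then-up script from the steps above might look like this sketch. The folder name images and the commands are parametrized via environment variables (assumptions, chosen so the script can also be dry-run with DOCKER=echo):

```shell
# Write the hypothetical offline-installer script.
cat > load-images.sh <<'EOF'
#!/bin/sh
set -e
DOCKER="${DOCKER:-docker}"
COMPOSE="${COMPOSE:-docker-compose}"
IMAGE_DIR="${IMAGE_DIR:-./images}"

# Load the images first: docker-compose up would otherwise try to
# pull them from a registry that this offline machine cannot reach.
for tarball in "$IMAGE_DIR"/*.tar; do
    "$DOCKER" load -i "$tarball"
done

# Only now bring up the whole service ecosystem.
"$COMPOSE" up -d
EOF
chmod +x load-images.sh
```

Running `DOCKER=echo COMPOSE=true ./load-images.sh` prints the load commands without touching the Docker daemon, which is handy for checking the tarball list.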
Regarding the size issue, the answer by Yamenk points to the reason why the size increases: docker save stores all of an image's layers, including the shared ones that a pull would skip.

What is the overhead of creating docker images?

I'm exploring using Docker so that we deploy new Docker images instead of specific file changes, and all of the application's needs come with each deployment.
Question 1:
If I add a new application file, say 10 MB, to a Docker image, when I deploy the new image using the tools in Docker Toolbox, will this require the deployment of an entirely new image to my containers, or do Docker deployments just take the difference between the two, similar to Git version control?
Another way to put it, I looked on a list of docker base images and saw a version of ubuntu that is 188 MB. If I commit a new application to a docker image, using this base image, will my docker containers need to pull the full 188 MB, which they are already running, plus the application or is there a differential way of just getting what has changed?
Supplementary Question
Am I correct in assuming that when using Docker, deploying images is the intended approach? Meaning any new change should require a new image deployment, so that images are treated as immutable? When I was using AWS we followed this approach with AMIs (Amazon Machine Images), but storing AMIs had low overhead; for Docker I don't know yet.
Or is it a better practice to deploy dockerfiles and have the new image be built on the container itself?
Docker uses a layered union filesystem, only one copy of a layer will be pulled by a docker engine and stored on its filesystem. When you build an image, docker will check its layer cache to see if the same parent layer and same command have been used to build an existing layer, and if so, the cache is reused instead of building a new layer. Once any step in the build creates a new layer, all following steps will create new layers, so the order of your Dockerfile matters. You should add frequently changing steps to the end of the Dockerfile so the earlier steps can be cached.
Therefore, if you use a 200MB base image, have 50MB of additions, but only 10MB are new additions at the end of your Dockerfile, you'd push 250MB the first time to a docker engine, but only 10MB to an engine that already had a previous copy of that image, or 50MB to an engine that just had the 200MB base image.
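To exploit that cache, order Dockerfile steps from least to most frequently changing. A sketch for a hypothetical Python app (the file names requirements.txt and main.py are illustrative):

```dockerfile
FROM ubuntu:18.04

# Rarely-changing step first: this layer is cached across builds.
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# The dependency list changes occasionally; copying it alone means
# editing app code does not invalidate the install layer below.
COPY requirements.txt /app/
RUN pip3 install -r /app/requirements.txt

# Application code changes most often, so it goes last; only these
# final layers are rebuilt and re-pushed on a typical code change.
COPY . /app/
CMD ["python3", "/app/main.py"]
```

With this ordering, a code-only change re-pushes just the last layers rather than the dependency or base layers.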
The best practice with images is to build them once, push them to a registry (either self hosted using the registry image, cloud hosted by someone like AWS, or on Docker Hub), and then pull that image to each engine that needs to run it.
For more details on the layered filesystem, see https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/
You can also put in a little work to create smaller images.
You can use Alpine or Busybox instead of the bigger Ubuntu, Debian, or Bitnami (Debian light) images.
A smaller image is also more secure, as fewer tools are available in it.
Some reading
http://blog.xebia.com/how-to-create-the-smallest-possible-docker-container-of-any-image/
https://www.dajobe.org/blog/2015/04/18/making-debian-docker-images-smaller/
You have two great tools to make smaller Docker images:
https://github.com/docker-slim/docker-slim
and
https://github.com/mvanholsteijn/strip-docker-image
Some examples with docker-slim:
https://hub.docker.com/r/k3ck3c/grafana-xxl.slim/
shows size before -> 357.3 MB, and using docker-slim -> 18.73 MB.
Or for simh:
https://hub.docker.com/r/k3ck3c/simh_bitnami.slim/
is 5.388 MB, when the original k3ck3c/simh_bitnami is 88.86 MB.
A popular netcat image, chilcano/netcat, is 135.2 MB, while a netcat based on Alpine is 7.812 MB, and one based on busybox will need 2 or 3 MB.
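As a sketch of the Alpine approach, a small netcat image can be built like this (the package name netcat-openbsd is an assumption about the current Alpine repositories):

```dockerfile
# A netcat image of only a few MB on an Alpine base.
FROM alpine:3.18
RUN apk add --no-cache netcat-openbsd
ENTRYPOINT ["nc"]
```

The --no-cache flag keeps the apk index out of the image, which is part of how Alpine-based images stay small.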

How to reduce docker image size?

I'm using the official Docker Rails onbuild image (https://registry.hub.docker.com/_/rails/) to build Rails images containing the application. But each application image is taking about 900 MB. Is there any way this size can be reduced?
Here's my workflow ->
add dockerfile to the project -> build -> run
The problem is there can be any number of apps on this system, and if each application takes 1 GB of disk space it would be an issue. If we reduce layers, will it reduce the size? If yes, how can that be done?
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
blog2 latest 9d37aaaa3beb About a minute ago 931.1 MB
my-rails-app latest 9904zzzzc2af About an hour ago 931.1 MB
Since they are all coming from the same base image, they don't each take 900MB (assuming you're using AUFS as your file system*). There will be one copy of the base image (_/rails) and then the changes you've made will be stored in separate (usually much smaller) layers.
If you would like to see the size of each image layer, you might like to play with this tool. I've also containerized it here.
*) If you're using docker on Ubuntu or Debian, you're probably defaulting to AUFS. Other host Linux versions can use different file systems for the images, and they don't all share base images well.
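Docker's built-in docker history command also shows per-layer sizes, which is a quick way to verify that the apps share the base image. A command sketch, using the image names from the question:

```shell
# Show the size contributed by each layer of one app image; the large
# base layers should be identical across blog2 and my-rails-app.
docker history my-rails-app

# Summarize total disk usage; shared space is accounted separately.
docker system df -v
```

If the base layers are shared, the "virtual size" shown by docker images overstates the actual disk usage considerably.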

Resources