How to reduce docker image size? - ruby-on-rails

I'm using the official Rails onbuild image (https://registry.hub.docker.com/_/rails/) to build a Rails image containing my application. But each application image takes about 900 MB. Is there any way this size can be reduced?
Here's my workflow ->
add dockerfile to the project -> build -> run
The problem is that there can be any number of apps on this system, and if each application takes 1 GB of disk space it becomes an issue. If we reduce the number of layers, will that reduce the size? If yes, how can that be done?
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
blog2 latest 9d37aaaa3beb About a minute ago 931.1 MB
my-rails-app latest 9904zzzzc2af About an hour ago 931.1 MB
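For reference, with the onbuild variant the Dockerfile in this workflow is essentially a single line; the sketch below follows the image's documented usage rather than anything specific to my apps:
FROM rails:onbuild
# The onbuild triggers copy the application into the image and run bundle install at build time.
# Typical build/run (image name is illustrative):
#   docker build -t my-rails-app .
#   docker run -p 3000:3000 my-rails-app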

Since they are all coming from the same base image, they don't each take 900MB (assuming you're using AUFS as your file system*). There will be one copy of the base image (_/rails) and then the changes you've made will be stored in separate (usually much smaller) layers.
If you would like to see the size of each image layer, you might like to play with this tool. I've also containerized it here.
*) If you're using docker on Ubuntu or Debian, you're probably defaulting to AUFS. Other host Linux versions can use different file systems for the images, and they don't all share base images well.
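If you just want per-layer sizes from the command line, the built-in docker history command shows them as well (using the blog2 image from the listing above):
docker history blog2:latest
# Each line is one layer, together with the size that layer adds to the image.
docker system df
# Overall disk usage for images, containers and volumes.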

Related

How to reduce size of docker image by removing files and committing

I have created a docker image and started installing many packages I needed for some tests.
However, after I was done, I realized I could remove some folders in order to reduce image size.
So that's what I did, and I committed those changes to a new image.
However, the image size remained the same. I saw someone with a similar issue here on SO, and there is an answer explaining that Docker uses layers for its storage, so the image size only ever increases.
So my question is whether it is possible to reduce the image size by deleting folders and files, or should I start from scratch?
docker image ls output for reference:
REPOSITORY TAG IMAGE ID CREATED SIZE
abacate melancia 9c1b3acdf62c 3 seconds ago 34.8GB
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx v2 68f6862f8371 9 minutes ago 34.8GB
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx v1 2090d0a74e9d 5 days ago 34.8GB
Yes, you should start from scratch. The initial layers are indeed adding content to your image, as the existing answer notes.
If you kept your Dockerfile, you should be able to just replay the changes you made to your original image. This is a key way of working with Docker: the outcome of a build process isn't very valuable when the process itself is repeatable. Keep the Dockerfile under source control, and Docker images become almost as ephemeral as Docker containers.
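A rough sketch of what that replay looks like; the base image and packages here are placeholders for whatever the tests actually need:
FROM ubuntu:18.04
# Install only what the tests need, and clean up in the same RUN step
# so the apt cache never becomes part of any layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends build-essential curl \
 && rm -rf /var/lib/apt/lists/*
COPY . /tests
WORKDIR /tests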

Will Docker assist as a high level compressor?

I have a package of around 2+ GB on an Ubuntu machine. I created a Docker image from a Dockerfile and its size is around 86 MB. Then I created another image by adding a COPY instruction to the Dockerfile that copies the package from the Ubuntu machine into the image while it is being built.
Now the image size is around 2 GB, since the image contains the package as well. I want to know whether Docker helps in any way to reduce the size of the package once it is packed into an image.
I want the image to contain the package, which is around 2 GB, but I don't want the image size to exceed 200 MB.
Docker will not compress an image's data for you. Hoping for 10x compression on an arbitrary installed package is rather optimistic, Docker or otherwise.
While Docker uses some clever and complicated kernel mechanisms, in most cases a file in a container maps to an ordinary file on disk in a strange filesystem layout. If you're trying to install a 2 GB package into an image, it will require 2 GB of local disk space. There are some mechanisms to share data that can help reduce overall disk usage (if you run 10 containers based on that image, they will share the 2 GB base image and not use extra disk space) but no built-in compression.
One further note: once you've taken action in a Dockerfile to use space in some form, it's permanently used. Each line in the Dockerfile creates a new "layer" or separate image which records the changes from the previous layer. If you're trying to install a package that's not in a repository of some sort, you're stuck with its space usage in the image, unless you can use a multi-stage build for it:
FROM ubuntu:18.04
COPY package.deb .
# At this point there is a layer containing the .deb file and
# it uses space in the final image.
RUN dpkg --install package.deb && rm package.deb
# The file isn't "in the image", but the image remembers adding it
# and then removing it, so it still uses space.
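If the package can be installed in a separate build stage, a multi-stage build avoids keeping the .deb in the final image. A hedged sketch, assuming the package installs everything under /opt/mypackage (a placeholder path):
# Throwaway stage that holds and installs the large .deb
FROM ubuntu:18.04 AS installer
COPY package.deb /tmp/package.deb
RUN dpkg --install /tmp/package.deb
# Final image: copy only the installed files out of the build stage,
# so the .deb itself never appears in any of its layers.
FROM ubuntu:18.04
COPY --from=installer /opt/mypackage /opt/mypackage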
As mentioned by David, that's not possible with Docker, which uses OverlayFS or a similar layered filesystem representation.
But there is a workaround that keeps the image size low (although the running container will be big!), at the cost of a slower application startup and possibly some extra complexity around the arguments you pass to the app. A possible way to reduce the size:
compress the package before adding it
copy the compressed file into the image while building it
use a script as the ENTRYPOINT/CMD (or whatever you use to start the container) which should perform
decompression [you'll need the decompression tool installed in the container] and
start the application (you can forward the arguments received by docker run to the application as "$@", or even validate them beforehand)
Check first by compressing your package file with gzip / 7zip / bzip2 or any compressor of your choice, to see whether the output really is much smaller.
If it can't be compressed enough, this approach won't help.
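A rough sketch of such an entrypoint script, assuming the compressed payload was copied in as /app.tar.gz and the application's launcher is /opt/app/start (both placeholder names):
#!/bin/sh
set -e
# Unpack the compressed payload on first start only.
if [ ! -x /opt/app/start ]; then
    mkdir -p /opt/app
    tar -xzf /app.tar.gz -C /opt/app
fi
# Forward the arguments given to `docker run` straight to the application.
exec /opt/app/start "$@"
Note that the unpacked files land in the container's writable layer, which is why the image stays small but the running container does not.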

Why does the Ubuntu docker base image only take 89 MB?

In the Docker in Practice book, it explains that the todoapp that they create is layered on top of the node image, which is layered on top of the ubuntu image (see image here). Why is the Ubuntu Docker base image shown as taking up only 89 MB, when a production OS install of Ubuntu takes approximately ten times that much space?
If you read the ubuntu Dockerfile, you'll see that it's based on a tarball from https://cloud-images.ubuntu.com/.
If you inspect the images at https://cloud-images.ubuntu.com/minimal/releases/bionic/release/, you'll see that 75 MB is a typical compressed size.
To explain why this is so: the minimal package set is listed at https://packages.ubuntu.com/bionic/ubuntu-minimal. You'll see that it's a very small list: no GUI, no development tools, no optional servers or services (the expectation is that those will be installed later).
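You can confirm the numbers locally with the standard CLI; exact sizes vary a little by tag and architecture:
docker pull ubuntu:18.04       # downloads the compressed layers
docker images ubuntu:18.04     # the SIZE column is the uncompressed size on disk
docker history ubuntu:18.04    # lists the layers the image was built from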

What are docker child images

What are docker child images and why can't I delete them?
I have been working with a Kali Linux image; I commit my changes and call the result Kaliupdate1, make more changes and call it Kaliupdate2, and then I try to remove Kaliupdate1, but it doesn't work...
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
kaliupdate2 latest e57f94c32fac 18 hours ago 2.25 GB
kaliupdate1 latest 16da215f736c 18 hours ago 1.12 GB
kaliupdate latest a841aa8bb8a9 19 hours ago 1.07 GB
So from your question, assuming that your workflow has been to start a container, work interactively inside the container and then commit the changes to a new image, the answer is that what you're essentially doing is creating a new layer on top of the existing kali base image.
As such, the full stack of layers is required to operate. This doesn't mean that the disk space taken is 2.25 + 1.12 + 1.07 GB, however, as Docker shares the lower layers.
That said, this isn't a great way to create Docker images, as doing things like chown and mv can leave redundant files in the image.
A better way is to create a new Dockerfile based on the original kali image (using FROM kali:latest in the Dockerfile), make the changes you want in the Dockerfile, and execute a build to give you the final image.
There's more information on Docker's website here
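A minimal sketch of that Dockerfile approach, with the RUN step standing in for whatever was previously done interactively (the installed package is just an example):
FROM kali:latest
# Replay here the changes that were previously made by hand inside a container,
# so the image can be rebuilt from scratch at any time.
RUN apt-get update \
 && apt-get install -y nmap \
 && rm -rf /var/lib/apt/lists/*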

What is the overhead of creating docker images?

I'm exploring Docker so that we deploy new Docker images instead of specific file changes, so that everything the application needs comes with each deployment.
Question 1:
If I add a new application file, say 10 MB, to a Docker image, will deploying the new image (using the tools in Docker Toolbox) require shipping an entirely new image to my containers, or do Docker deployments just transfer the difference between the two, similar to git version control?
To put it another way: I looked at a list of Docker base images and saw a version of Ubuntu that is 188 MB. If I commit a new application to a Docker image using this base image, will my Docker hosts need to pull the full 188 MB, which they are already running, plus the application, or is there a differential way of getting just what has changed?
Supplementary Question
Am I correct in assuming that, when using Docker, deploying images is the intended approach? Meaning any new change should require a new image deployment, so that images are treated as immutable? When I was using AWS we followed this approach with AMIs (Amazon Machine Images), but storing AMIs had low overhead; for Docker I don't know yet.
Or is it better practice to deploy Dockerfiles and have the new image built on the target host itself?
Docker uses a layered union filesystem, only one copy of a layer will be pulled by a docker engine and stored on its filesystem. When you build an image, docker will check its layer cache to see if the same parent layer and same command have been used to build an existing layer, and if so, the cache is reused instead of building a new layer. Once any step in the build creates a new layer, all following steps will create new layers, so the order of your Dockerfile matters. You should add frequently changing steps to the end of the Dockerfile so the earlier steps can be cached.
Therefore, if you use a 200MB base image, have 50MB of additions, but only 10MB are new additions at the end of your Dockerfile, you'd push 250MB the first time to a docker engine, but only 10MB to an engine that already had a previous copy of that image, or 50MB to an engine that just had the 200MB base image.
The best practice with images is to build them once, push them to a registry (either self hosted using the registry image, cloud hosted by someone like AWS, or on Docker Hub), and then pull that image to each engine that needs to run it.
For more details on the layered filesystem, see https://docs.docker.com/engine/userguide/storagedriver/imagesandcontainers/
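To make the caching point concrete, here is a small sketch for a hypothetical Node app (file names are illustrative): the dependency manifests are copied and installed before the application code, so a routine code change only invalidates, rebuilds, and pushes the small final layer:
FROM node:18-slim
WORKDIR /app
# Dependency manifests change rarely, so these layers are almost always cached.
COPY package.json package-lock.json ./
RUN npm ci
# Application code changes often, so it goes last.
COPY . .
CMD ["node", "server.js"]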
You can also put in a little extra work to create smaller images.
You can use Alpine or Busybox instead of the bigger Ubuntu, Debian or Bitnami (Debian light) images.
A smaller image is also more secure, as fewer tools are available.
Some reading
http://blog.xebia.com/how-to-create-the-smallest-possible-docker-container-of-any-image/
https://www.dajobe.org/blog/2015/04/18/making-debian-docker-images-smaller/
There are two great tools for making smaller Docker images:
https://github.com/docker-slim/docker-slim
and
https://github.com/mvanholsteijn/strip-docker-image
Some examples with docker-slim:
https://hub.docker.com/r/k3ck3c/grafana-xxl.slim/ shows a size of 357.3 MB before and 18.73 MB after docker-slim.
For simh, https://hub.docker.com/r/k3ck3c/simh_bitnami.slim/ is 5.388 MB, while the original k3ck3c/simh_bitnami is 88.86 MB.
A popular netcat image, chilcano/netcat, is 135.2 MB, while a netcat image based on Alpine is 7.812 MB, and one based on busybox needs only 2 or 3 MB.
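As a rough illustration of how small the Alpine route can stay (netcat-openbsd is the usual Alpine package name, but verify against the repository you target):
FROM alpine:3.19
# The Alpine base is only a few MB; adding netcat keeps the image well under 10 MB.
RUN apk add --no-cache netcat-openbsd
ENTRYPOINT ["nc"]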
