How to create the smallest possible Docker image after installing apt dependencies

I've created a Docker image using debian as the parent image. In my Dockerfile I've installed some dependencies using apt and pip.
Now, I want to get rid of everything that isn't strictly necessary to run my app, which, of course, still needs the dependencies installed.
For now I have the following lines in my Dockerfile after installing the dependencies.
RUN rm -rf /var/lib/apt/lists/* \
&& rm -rf /usr/share/doc /usr/share/man \
&& apt-get clean
I've also installed the dependencies using the --no-install-recommends option.
Anything else I can do to reduce the footprint of my Docker image?
PS: just in case, this is how I installed the dependencies:
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
sudo systemd \
build-essential libffi-dev libssl-dev \
python-pip python-dev python-setuptools python-wheel

To reduce the size of the image, you need to combine your RUN commands into one. When you create files in one layer and delete them in another, the files still exist on the drive and are shipped over the network. Their existence is just hidden when the layers of the filesystem are assembled for your container.
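For example, the install and cleanup from your Dockerfile could be collapsed into a single RUN so the deleted files never land in any layer (a sketch reusing the question's package list):
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
sudo systemd \
build-essential libffi-dev libssl-dev \
python-pip python-dev python-setuptools python-wheel \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /usr/share/doc /usr/share/man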
The Dockerfile best practices explain this in more detail: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#run
I'd also recommend (temporarily) building with docker build --rm=false --no-cache . and then reviewing the output of docker diff on each of the intermediate containers to see which files each layer creates. (docker diff works on containers, not images, which is why the intermediate containers need to be kept.)
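Something like this, for instance (the container ID is a placeholder for whatever IDs the build leaves behind):
docker build --rm=false --no-cache .
docker ps -a          # list the intermediate containers kept by --rm=false
docker diff <container-id>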

Related

How can I build a similar docker image based on alpine that works on ubuntu?

I am trying to rewrite a Dockerfile (https://github.com/orangefoil/rcssserver-docker/blob/master/Dockerfile) so that it uses alpine instead of ubuntu. The goal is to reduce the image size.
In the original image the robocup soccer server is built from scratch using g++, flex, bison, etc.
FROM ubuntu:18.04 AS build
ARG VERSION=16.0.0
WORKDIR /root
RUN apt update && \
apt -y install autoconf bison clang flex libboost-dev libboost-all-dev libc6-dev make wget
RUN wget https://github.com/rcsoccersim/rcssserver/archive/rcssserver-$VERSION.tar.gz && \
tar xfz rcssserver-$VERSION.tar.gz && \
cd rcssserver-rcssserver-$VERSION && \
./bootstrap && \
./configure && \
make && \
make install && \
ldconfig
I tried to do the same on alpine and had to exchange some packages:
FROM alpine:latest
ARG VERSION=16.0.0
WORKDIR /root
# Add basics first
RUN apk --no-cache update \
&& apk upgrade \
&& apk add autoconf bison clang-dev flex-dev boost-dev make wget automake libtool-dev g++ build-base
RUN wget https://github.com/rcsoccersim/rcssserver/archive/rcssserver-$VERSION.tar.gz
RUN tar xfz rcssserver-$VERSION.tar.gz
RUN cd rcssserver-rcssserver-$VERSION && \
./bootstrap && \
./configure && \
make && \
make install && \
ldconfig
Unfortunately, my version doesn't work yet. It fails with
/usr/lib/gcc/x86_64-alpine-linux-musl/9.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lrcssclangparser
From what I've found so far, this can happen if dev packages are not installed (see "ld cannot find an existing library"), but I switched to dev packages where I could find them and still had no luck.
So my current assumption is that ubuntu has some package installed that I need to add to my alpine image. I would rule out a code problem, since the ubuntu version works.
Any ideas what could be missing? I would also be happy to learn how to compare the packages myself, but the package names differ between ubuntu and alpine, so I find it pretty hard to figure this out.
You should break this up using a multi-stage build. In the image you're building now, the final image contains the C toolchain and all of the development libraries and headers that those -dev packages install; you don't need any of those to actually run the built application. The basic idea is to build the application exactly as you have it now, but then COPY only the built application into a new image with fewer dependencies.
That would look something like this (untested):
FROM ubuntu:18.04 AS build
# ... exactly what's in the original question ...
FROM ubuntu:18.04
# Install the shared libraries you need to run the application,
# but not -dev headers or the full C toolchain. You may need to
# run `ldd` on the built binary to see what exactly it needs.
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install --assume-yes --no-install-recommends \
libboost-atomic1.65.1 \
libboost-chrono1.65.1 \
# ... more libboost-* libraries as required ...
# Get the built application out of the original image.
# Autoconf's default is to install into /usr/local, and in a
# typical Docker base image nothing else will be installed there.
COPY --from=build /usr/local /usr/local
RUN ldconfig
# Describe how to run a container.
EXPOSE 12345
CMD ["/usr/local/bin/rcssserver"]
Compared to the size of the C toolchain, header files, and build-time libraries, the difference between an Alpine and Ubuntu image is pretty small, and Alpine has well-documented library compatibility issues with its minimal libc implementation.

Docker: should I combine my apt-get install / build / cleanup steps into one big RUN?

I have a Dockerfile that looks like this:
FROM debian:stable-slim
RUN apt-get update && \
apt-get install -y --no-install-recommends fp-compiler fp-units-fcl fp-units-net libc6-dev
COPY src /whatwg/wattsi/src
WORKDIR /whatwg/wattsi/src
RUN ./build.sh
RUN rm -rf /whatwg/wattsi/src && \
apt-get purge -y fp-compiler fp-units-fcl fp-units-net libc6-dev && \
apt-get autoremove -y
ENTRYPOINT ["/whatwg/wattsi/bin/wattsi"]
As you can see, there are three separate RUN steps: one to install dependencies, one to build, and one to cleanup after building.
I've been poking around to try to figure out why the resulting image is relatively large, and it seems like it's because, even though I do a cleanup step, a layer is retained containing all the installed dependencies.
Should I restructure my Dockerfile like so?
FROM debian:stable-slim
COPY src /whatwg/wattsi/src
WORKDIR /whatwg/wattsi/src
RUN apt-get update && \
apt-get install -y --no-install-recommends fp-compiler fp-units-fcl fp-units-net libc6-dev && \
./build.sh && \
rm -rf /whatwg/wattsi/src && \
apt-get purge -y fp-compiler fp-units-fcl fp-units-net libc6-dev && \
apt-get autoremove -y
ENTRYPOINT ["/whatwg/wattsi/bin/wattsi"]
This feels a bit "squashed", and I can't find any documentation explicitly recommending it. All the documentation that says "minimize RUN commands" seems to focus on not doing multiple apt-get steps; it doesn't talk about squashing everything into one. But maybe it's the right thing to do?
Each layer in a Docker image is like a commit in version control: it can't change previous layers, just like deleting a file in Git won't remove it from history. So deleting a file from a previous layer doesn't make the image smaller.
Since a layer is created at the end of each RUN, doing what you're doing is indeed one way to make smaller images. The other, as someone mentioned, is multi-stage builds.
The downside of the single RUN variant is that you have to rerun the whole thing every time source code changes. So you need to apt-get all those packages each time instead of relying on Docker's build caching (I wrote a thing explaining the caching here: https://pythonspeed.com/articles/docker-caching-model/).
So multi-stage builds give you both faster builds via caching and small images, but they're more complicated to get right; what you did is simpler and easier.
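For reference, a multi-stage version of your Dockerfile could look something like this (an untested sketch; it assumes build.sh writes the binary to /whatwg/wattsi/bin, as your ENTRYPOINT suggests):
FROM debian:stable-slim AS build
RUN apt-get update && \
apt-get install -y --no-install-recommends fp-compiler fp-units-fcl fp-units-net libc6-dev
COPY src /whatwg/wattsi/src
WORKDIR /whatwg/wattsi/src
RUN ./build.sh
FROM debian:stable-slim
COPY --from=build /whatwg/wattsi/bin /whatwg/wattsi/bin
ENTRYPOINT ["/whatwg/wattsi/bin/wattsi"]
The compiler and source tree never enter the final image, and the install layer stays cached when only src changes.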

Docker multistage build vs. keeping artifacts in git

My target container is a build environment container, so my team would build an app in a uniform environment.
This app doesn't necessarily run as a container - it runs on physical machine. The container is solely for building.
The app depends on third-party components. Some I can install with apt-get in a Dockerfile RUN command, and some I must build myself because they require special build steps.
I was wondering which way is better.
Using multistage build seems cool; Dockerfile for example:
FROM ubuntu:18.04 AS third_party
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
...
ADD http://.../boost.tar.gz /
RUN tar xzf boost.tar.gz && \
... && \
make --prefix /boost_out ...
FROM ubuntu:18.04 AS final
COPY --from=third_party /boost_out/ /usr/
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
...
CMD ["bash"]
...
Pros:
Automatically built when I build my final container
Easy to change third party version (boost in this example)
Cons
The ADD command downloads the ~100MB file on every build, which makes the image build slower
I want to use --cache-from so I would be able to cache third_party and build from different docker host machine. Meaning I need to store ~1.6GB image in a docker registry. That's pretty heavy to pull/push.
On the other hand
I could just build boost (with this third_party image) and store its artifacts on some storage, git for example. It would take ~200MB, which is better than storing a 1.6GB image.
Pros:
Smaller disc space
Cons:
Cumbersome build
Manually build and push artifacts to git when changing boost version.
Somehow link Docker build and git to pull newest artifacts and COPY to the final image.
Either way I need a third_party image that builds the third parties uniformly and automatically. In option 1 that image is bigger than in option 2, which will contain just the build tools and not the build artifacts.
Is this the trade-off?
Option 1 is more automatic but consumes more disk space and push/pull time;
option 2 is cumbersome but consumes less disk space and push/pull time?
Are there any other virtues for any of these ways?
I'd like to propose changing your first attempt to something like this:
FROM ubuntu:18.04 as third_party
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
...
RUN wget http://.../boost.tar.gz -O /boost.tar.gz && \
tar xvf boost.tar.gz && \
... && \
make --prefix /boost_out ... && \
find -name \*.o -delete && \
rm /boost.tar.gz # this is important!
FROM ubuntu:18.04 AS final
COPY --from=third_party /boost_out/ /usr/
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
...
CMD ["bash"]
This way, you pay for the download of boost only once (when building the image without a cache), and you don't pay for storing or pulling the original tar-ed sources. Additionally, you should remove unneeded intermediate files (the .o object files) from the build in the same step in which they are generated; otherwise they are stored and pulled as well.
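To share the third_party stage between build machines, as you describe, you can tag and push that stage on its own and then seed later builds with --cache-from; a rough sketch, with a hypothetical registry path:
# build and publish the heavy stage once
docker build --target third_party -t registry.example.com/myapp/third_party .
docker push registry.example.com/myapp/third_party
# on another machine, reuse its layers instead of rebuilding boost
docker pull registry.example.com/myapp/third_party
docker build --cache-from registry.example.com/myapp/third_party -t myapp .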
If you are at liberty to post the whole Dockerfile, I'll gladly take a deeper look at it and give you some hints.

Install RPM package in a pre-built node image

I am writing a Node app that I want to containerize using the pre-built node image (https://hub.docker.com/_/node/). I need to deploy an application that I only have an RPM package for, and I can't figure out where to start looking for documentation or a small example of how to do this.
The examples I'm looking at use yum, which (from my understanding) isn't available in the pre-built node image.
COPY src/MyApp/lib/3rdPartyApp.x86_64.rpm ./3rdPartyApp.x86_64.rpm
RUN yum localinstall 3rdPartyApp.x86_64.rpm; yum clean all && \
rm ./3rdPartyApp.x86_64.rpm
My other option is to use a CentOS Docker image, which has yum, but I'm running into problems getting Node installed there with NVM. I'm also reading that I shouldn't use NVM when building a Docker container and that there is a better way.
You can use alien to convert packages from one format to another.
FROM node
RUN apt-get update && apt-get install -y alien
COPY src/MyApp/lib/3rdPartyApp.x86_64.rpm ./3rdPartyApp.x86_64.rpm
RUN alien -d -i 3rdPartyApp.x86_64.rpm
This will leave a lot of extra files in your image. You can use a two-stage build to clean that up.
FROM node AS builder
RUN apt-get update && apt-get install -y alien
COPY src/MyApp/lib/3rdPartyApp.x86_64.rpm ./3rdPartyApp.x86_64.rpm
RUN alien -d 3rdPartyApp.x86_64.rpm
FROM node
# Note: alien may rename the generated .deb; adjust the filename to match its output.
COPY --from=builder 3rdPartyApp.x86_64.deb .
RUN dpkg -i 3rdPartyApp.x86_64.deb && rm 3rdPartyApp.x86_64.deb
Alternatively, if you'd rather start from CentOS and install Node.js from EPEL:
FROM centos:centos7.6.1810
# Enable EPEL to install Node.js and npm
RUN rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm && \
    yum -y update && \
    yum install -y npm git && \
    yum clean all

Docker commands require keyboard interaction

I'm trying to create a Docker image for ripping CDs (using abcde).
Here's the relevant portion of the Dockerfile:
FROM ubuntu:17.10
MAINTAINER Graham Nicholls <graham@rockcons.co.uk>
RUN apt update && apt -y install eject vim ruby abcde
...
Unfortunately, the package "abcde" pulls in a mail client (I'm not sure which), and apt tries to configure it by asking what type of mail connection to set up (smarthost/relay, etc.).
When docker runs, it doesn't appear to read from stdin, so I can't redirect input into the docker process.
I've tried using --nodeps with apt (and replacing apt with apt-get); unfortunately --nodeps no longer seems to be a supported option and returns:
E: Command line option --nodeps is not understood in combination with the other options
Someone has suggested using expect in response to a similar question, which I'd rather avoid. This seems to be a "difficult to google" problem - I can't find anything.
So, is there a way of passing the configuration answer to apt, or, better, of preventing apt from pulling in a mail client at all? I'm not planning on sending updates to cddb.
The typical template to install apt packages in a docker container looks like:
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
eject \
vim \
ruby \
abcde \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
Running the install with the "noninteractive" frontend removes any prompts. You don't want to set DEBIAN_FRONTEND as an ENV, since that would also affect any interactive commands you run inside the container.
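If you'd rather not repeat the variable on every apt-get line, one common alternative is declaring it as a build-time ARG, which, unlike ENV, is not persisted into the running container; a minimal sketch for your Dockerfile:
FROM ubuntu:17.10
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update \
&& apt-get install -y --no-install-recommends eject vim ruby abcde \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*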
You also want to clean up the package database when finished, to reduce the layer size and to avoid reusing a stale cached package database in a later step.
The no-install-recommends option will reduce the number of packages installed by only installing the required dependencies, not the additional recommended packages. This cuts the size of the root filesystem down by half for me.
If you need to pass a non-default configuration to a package, use debconf. First, run your install somewhere interactively and enter the options you want to save. Install debconf-utils, then run:
debconf-get-selections | grep "${package_name}"
to view all the options you configured for that package. You can then pipe these options to debconf-set-selections in your container before running your install, e.g.:
RUN echo "postfix postfix/main_mailer_type select No configuration" \
| debconf-set-selections \
&& apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
....
or save your selections to a file that you copy in:
COPY debconf-selections /
RUN debconf-set-selections </debconf-selections \
&& apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
....
