Efficient Dockerfile for many apt-get packages - docker

I've created a Dockerfile for an application I'm building that has a lot of large apt-get package dependencies. It looks something like this:
FROM ubuntu:15.10
RUN apt-get update && apt-get install -y \
lots-of-big-packages
RUN install_my_code.sh
As I develop my application, I keep coming up with unanticipated package dependencies. However, since all the packages are in one Dockerfile instruction, even adding one more breaks the cache and requires the whole lot to be downloaded and installed, which takes forever. Is there a better way to structure my Dockerfile?
One thought would be to put a separate RUN apt-get update && apt-get install -y command for each package, but running apt-get update lots of times probably eats up any savings.
The simplest solution would be to just add a second RUN apt-get update && apt-get install -y right after the first as a catch-all for all of the unanticipated packages, but that divides the packages in an unintuitive way (i.e., "when I realized I needed it"). I suppose I could combine them once the dependencies are more stable, but I find I'm always overly optimistic about when that will be.
Anyway, if anyone has a better way to structure it, I'd love to hear it. (All of my other ideas run up against Docker's principle of reproducibility.)

I think you need to run apt-get update only once within the Dockerfile, typically before any other apt-get commands.
You could just start with the large list of known packages to install, and if you come up with a new one, add a new RUN apt-get install -y abc to your Dockerfile and let Docker continue from the previously cached command. Periodically (once a week, once a month?) you could reorganize them as you see fit, or just merge everything back into a single command.
I suppose I could combine them when dependencies are more stable, but
I find I'm always overly optimistic about when that is.
Oh, you actually mentioned this solution already; anyway, there is no harm in doing these "tweaks" every now and then. Just run apt-get update only once, as in the sketch below.
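A minimal sketch of that approach, assuming the placeholder package names from the question and that the cached apt-get update layer is still recent enough for the later installs to resolve:
FROM ubuntu:15.10
# Stable, known-up-front dependencies: one big, cached layer.
RUN apt-get update && apt-get install -y \
    lots-of-big-packages
# Dependencies discovered later: each gets its own small layer,
# so the big layer above stays cached.
RUN apt-get install -y newly-discovered-package
RUN apt-get install -y another-late-addition
RUN install_my_code.sh
Once the list settles down, the late additions can be folded back into the first RUN so the image ends up with fewer layers.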

Related

Vulnerabilities and package deletion when building Docker image

Docker layers are additive, meaning that purging packages in a subsequent layer will not remove them from the previous one, and thus from the image.
My understanding is that an additional masking layer is created, in which those packages are no longer visible.
Indeed, if I build the MWE below and then run apt list --installed | grep libpython3.9-minimal after the purging, the package cannot be found.
However, I still don't understand entirely what happens under the hood.
Are the packages effectively still there, but masked?
If one of these packages has known vulnerabilities, is purging (i.e., masking) a solution, or will we still be exposed without knowing it (because the package appears to be removed and so does not show up in an image scan, but is still there)?
FROM openjdk:11
# Remove packages
RUN apt-get purge -y libpython3.9-minimal
RUN apt-get autoremove -y
ENTRYPOINT ["/bin/bash"]
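One quick way to observe this, assuming a placeholder tag name: build the MWE and list its layers. The purge and autoremove steps show up as additional layers, while the base image's layers, which contain the package, keep their size.
# Build the MWE above, then show each layer and its size.
docker build -t purge-test .
docker history purge-test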

Simple & effective way to update packages inherited from parent docker image?

For my Docker-based app, I would like to quickly update all inherited packages FROM the parent image with the latest security patches.
I am fully aware that the best practice here is to just rebuild the image from the Dockerfile, which would automatically install all the latest packages, but unfortunately this is not an option.
Is there any other quick & dirty way to achieve this besides doing the following?
FROM baseimage:1.0.0
RUN apt update && apt upgrade -y
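One practical detail about that snippet, as a hedged sketch (the image name and tag are placeholders): because of layer caching, a rebuild may simply reuse the old apt upgrade layer, so force a fresh run when you actually want new patches picked up.
# --pull refreshes the parent image; --no-cache makes the apt upgrade layer
# run again instead of being served from the build cache.
docker build --pull --no-cache -t myapp:patched .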

How to confirm installations made in a docker container

I've got a Docker container and I'm trying to install Python. I'm using yum for it:
yum install -y https://centos7.iuscommunity.org/ius-release.rpm
=> Note that I'm using CentOS 8. I found other install tutorials online, but they used dnf, which I don't have.
yum update
=> This returns a screen with the well-known prompt: Total download size: 24 M
Is this ok [y/N]:
but Docker doesn't let me type anything here; it quits automatically, leaving Operation aborted. instead of y or n. How can I confirm my installation?
I've found that when you run
yum update -y
it automatically answers y to all the questions asked, so I could install it this way.
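The same idea carries over to a Dockerfile, where there is no terminal to answer prompts at all. A minimal sketch, with the base image and package name as illustrative assumptions:
FROM centos:8
# -y answers every prompt, so the build never stops at "Is this ok [y/N]:".
RUN yum update -y && \
    yum install -y python3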

Ubuntu Docker image with minimal mono runtime in order to run F# app

I need to build a "slim" Docker image which only contains the mono runtime in order to execute a pre-compiled F# app. In other words, I want to create the leanest possible image for executing mono apps, without any of the additional tooling for compiling/building apps. I am using Ubuntu:16.04 as my base image (which weighs in at around 47 MB).
If I try to install mono on top of that image (using apt-get install mono-devel), the image grows to a whopping 500 MB. This of course happens because the entire mono development toolchain is installed.
How can I proceed to only create an image containing the mono runtime? Is there a way to install just the mono runtime through apt-get?
I'm answering the question as it is stated:
How can I proceed to only create an image containing the mono runtime?
For that, the answer is yes. There is a package for just the runtime called mono-runtime. In addition to that, there is an apt option to ignore installing recommended packages (usually docs and other stuff that may not be necessary for a runtime) with --no-install-recommends. Combining the two, we can get down to around 240 MB on the Ubuntu base:
FROM ubuntu
RUN apt update && apt install -qy --no-install-recommends mono-runtime libfsharp-core4.3-cil
As also mentioned in the comments, there are some more minimal images based on Alpine Linux that may be of interest, such as https://hub.docker.com/r/frolvlad/alpine-mono/ (which at the moment is around 200 MB).
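A hedged usage sketch, with the image tag, host directory, and assembly name as assumptions: build the slim image above, then mount the pre-compiled app and run it with the mono runtime.
docker build -t fsharp-runtime .
docker run --rm -v "$PWD/out:/app" fsharp-runtime mono /app/MyApp.exe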

Conventional way to resolve docker derived image build time vs. image size tradeoff

Two constraints are often important in writing Dockerfiles: image size and image build time.
It's a commonplace observation that time and space usage can often be traded off for one another. However, it can be useful to avoid that choice by going for fast build time in development and small-but-slower builds in production.
For example, if I write something like this in a project I can quickly rebuild the images in development when frequently_changing_source_code changes, because there is a layer with build-essential installed that can be reused in the derived image:
base image:
RUN apt install build-essential python-dev && \
pip install some-pypi-project
ADD frequently_changing_source_code
derived image:
FROM base_image
RUN pip install another-pypi-project-requiring-build-essential
ADD more_stuff
The above results in larger builds than this next version, which achieves the same functionality but sacrifices build times. Now whenever frequently_changing_source_code changes, rebuilding the derived image results in a re-install of build-essential:
base image:
RUN apt install build-essential python-dev && \
pip install some-pypi-project && \
apt remove build-essential python-dev
ADD frequently_changing_source_code
derived image:
FROM base_image
RUN apt install build-essential python-dev && \
pip install another-pypi-project-requiring-build-essential && \
apt remove build-essential python-dev
ADD more_stuff
I can imagine ways of solving this: for example, writing a slightly more complicated set of Dockerfiles that are parameterized on some sort of development flag, which has the first behaviour for development builds, and the second for production builds. I suspect that would not result in Dockerfiles that people like to read and use, though.
So, how can I best achieve my ends without surprising other developers: i.e. using Dockerfiles that respect docker conventions as much as I can?
Some notes about answers I've considered:
I'm aware of the layer-caching behaviour of docker (that is why the ADD commands for both images in my example are at the end).
I'm aware that one can mount code using -v. Using -v is my usual practice, but this question is about building images, which is also something that happens in development (from time to time, it happens quite a lot).
One obvious suggestion is to eliminate the base image. However, note that for the projects concerned, the base image is typically a base for multiple images, so merging the base with those would result in a bunch of repeated directives in each of those Dockerfiles. Perhaps this is the least-worst option, though.
Also note that (again, in the projects I'm involved with) the mere presence of frequently_changing_source_code does not by itself significantly contribute to build times: it is re-installs of packages like build-essential that do. another-pypi-project-requiring-build-essential typically does contribute significantly to build times, but perhaps not enough to need to eliminate that step in development builds too.
Finally, though it is a commonly-cited nice feature of docker that it's possible to use the same configuration in development as in production, this particular source of variation is not a significant concern for us.
In the past there hasn't really been a good answer to this. You either build two different images, one for fast moving developers and the other for compact distribution, or you pick one that's less than ideal for others. There's a potential workaround if the developers compile the code themselves and simply mount their compiled product directly into the container as a volume for testing without a rebuild.
But last week Docker added the ability to do a multi-stage build in 17.05.0-ce-rc1 (see PR 32063). Multi-stage builds let you build parts of the app in separate stages and copy the results into another image at the end, with caching of all the layers, while the final image only contains the layers of the last stage of the build. So for your scenario, you could have something like:
FROM debian:latest as build-env
# you can split these run lines now since these layers are only used at build
RUN apt install build-essential python-dev
RUN pip install some-pypi-project
RUN pip install another-pypi-project-requiring-build-essential
# you only need this next remove if the build tools are in the same folders as the app
RUN apt remove build-essential python-dev
FROM debian:latest
# update this copy command depending on the pip install location
COPY --from=build-env /usr/bin /usr/bin
ADD frequently_changing_source_code
ADD more_stuff
All the layers in the first build environment stick around in the cache, letting developers add and remove as they need to, without having to rerun the build-essential install. But in the final image, there are just 3 layers added: one copy command from the build-env and a couple of adds, resulting in a small image. And if they only change files in those ADD commands, then only those steps run.
Here's an early blog post going into it in more detail. This is now available as an RC and you can expect it in the 17.05 edge release from Docker, hopefully in the next few weeks. If you want to see another example of this really put to use, have a look at the miragesdk Dockerfile.
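As a small illustration of the caching behaviour described above (the tag name is a placeholder): after the first build, editing the source only re-runs the final COPY/ADD steps, and the apt/pip steps in build-env are reported as cached.
docker build -t myapp .
# ...edit frequently_changing_source_code, then rebuild...
docker build -t myapp .    # the build-env steps show up as "Using cache"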
