Ubuntu Docker image with minimal mono runtime in order to run F# app - docker

I need to build a "slim" Docker image which only contains the Mono runtime, in order to execute a pre-compiled F# app. In other words, I want to create the leanest possible image for executing Mono apps, without any of the additional stuff for compiling/building apps. I am using Ubuntu 16.04 as my base image (which weighs in at around 47 MB).
If I try to install Mono on top of that image (using apt-get install mono-devel), then the image grows to a whopping 500 MB. This of course happens because the entire Mono development toolchain gets installed.
How can I proceed to create an image containing only the Mono runtime? Is there a way to install just the Mono runtime through apt-get?

I'm answering the question as it is stated:
How can I proceed to create an image containing only the Mono runtime?
For that, the answer is yes. There is a package for just the runtime called mono-runtime. In addition to that, there is an apt option to ignore installing recommended packages (usually docs and other stuff that may not be necessary for a runtime) with --no-install-recommends. Combining the two, we can get down to around 240 MB on the Ubuntu base:
FROM ubuntu
RUN apt update && apt install -qy --no-install-recommends mono-runtime libfsharp-core4.3-cil
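For completeness, here is a slightly fuller sketch along the same lines; the /app path and MyApp.exe name are hypothetical placeholders for your pre-compiled output, and clearing the apt lists trims a few more megabytes:
FROM ubuntu:16.04
RUN apt update && \
    apt install -qy --no-install-recommends mono-runtime libfsharp-core4.3-cil && \
    rm -rf /var/lib/apt/lists/*
# copy the pre-compiled F# app (built outside the image) and run it with mono
COPY ./out /app
WORKDIR /app
CMD ["mono", "MyApp.exe"]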
Also mentioned in the comments, there are some more minimal images based on Alpine Linux that may be of interest, such as https://hub.docker.com/r/frolvlad/alpine-mono/ (which at the moment is around 200 MB).
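For comparison, running on that Alpine-based image would look something like this (same hypothetical paths as above):
FROM frolvlad/alpine-mono
COPY ./out /app
WORKDIR /app
CMD ["mono", "MyApp.exe"]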

Related

pip3 installs a lot of other packages when using requirements.txt, making the build fail due to disk space issues

I'm trying to set up a Docker container (RHEL8) with Kaniko. In the Dockerfile I specified installing Python 3.8 and pip3 in order to install the Python libraries requested for this specific container. requirements.txt lists about 9 libraries (joblib, nltk, numpy, pandas, scikit-learn, scipy, spacy, torch, transformers), some of which are quite large (for example torch: 890M). But then, when I run
RUN python3.8 -m pip install -r requirements.txt
it runs through requirements.txt from top to bottom and downloads them, but then after the last line it also downloads a lot of other packages too, some quite huge in size, like:
nvidia-cublas-cu11 : 317M
nvidia-cudnn-cu11 : 557M
It installs a lot of packages, like MarkupSafe, blis, catalogue, certifi, charset-normalizer, click, confection, cymem, filelock, huggingface-hub, idna, jinja, langcodes, murmurhash, etc., and the list is quite long.
I had to increase the disk size of the runner by 6G just to cope with the increased amount of downloaded data, but the build still fails at "Taking a snapshot of the full filesystem", due to running out of free disk space.
I have increased the free disk space from 8G to 10G and then, as a second attempt, to 14G, but the build still fails. I have also specified the --single-snapshot option for Kaniko so that it only takes one single snapshot at the end, instead of creating separate snapshots at every step (RUN, COPY). I have installed an Nvidia driver in the container, picking a fairly lightweight one (450.102.04), which should not take up too much space either.
My question is: are the packages installed by pip3 after the list specified in requirements.txt basically dependencies that I still must install, or are they optional?
Is there any option to overcome this excessive disk space usage? When I start the build process (via GitLab CI and Kaniko) the available free space on the xfs filesystem is 12G out of 14G, so that should be enough, but the build still fails with exit code 1 and the message "no space left on drive".
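Those nvidia-* packages are in all likelihood transitive dependencies of torch's default CUDA-enabled wheels rather than optional extras. As a sketch of what typically shrinks the footprint, assuming GPU support is not needed inside this container (the index URL and flags below are assumptions to verify against your pip and torch versions):
# sketch only: skip pip's download cache and install a CPU-only torch wheel
# first, so the large nvidia-* CUDA libraries are never pulled in
RUN python3.8 -m pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu && \
    python3.8 -m pip install --no-cache-dir -r requirements.txt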

Simple & effective way to update packages inherited from parent docker image?

For my Docker-based app, I would like to quickly update all inherited packages FROM the parent image with the latest security patches.
I am fully aware that the best practice here is to just rebuild using the Dockerfile, which would automatically install all the latest packages, but unfortunately this is not an option.
Is there any other quick & dirty way to achieve this besides doing the following?
FROM baseimage:1.0.0
RUN apt update && apt upgrade -y
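If you do stay with the apt upgrade layer, one detail worth noting as a sketch (myapp:patched is a placeholder tag): on rebuilds that layer can be served from cache, so forcing a fresh base image and bypassing the cache makes sure the upgrade actually runs:
docker build --pull --no-cache -t myapp:patched .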

Manage apt-get based dependencies on Docker

We have a large C++ code base at my company and we are trying to move to a microservice infrastructure based on Docker.
We have a couple of in-house libraries with helper functions and utilities that we regularly use in our code. Our idea was to create a base image for developers with these libraries already installed and make it available as our "base" image. This would give us the benefit of all our software always using the latest version of our own libraries.
My question is related to Docker's cache system in relation to CI and external dependencies. Let's say we have a Dockerfile like this:
FROM ubuntu:latest
# Install external dependencies
RUN apt update && apt install -y \
    boost-libs \
    etc...
# Copy our software
...
# Build it
...
# Install it
...
If our code changes we can trigger the CI, and Docker will understand that it can use the cached image that was created before, up to the point where it copies our software. What happens if one of our external dependencies offers a newer version? Will the cache automatically be invalidated? How can we trigger a CI build in case any of our packages receives a new version?
In essence, how do we make sure we are always using the latest packages available for our external dependencies?
Please keep in mind the above Dockerfile is just an example to illustrate; we are also trying to use other tricks in the playbook, such as using a lighter base image (not Ubuntu) and multi-stage builds to avoid dev packages in our production containers.
Docker's caching algorithm is fairly simple. It looks at the previous state of the image build, and the string of the command you are running. If you are performing a COPY or ADD, it also looks at a hash of those files being copied. If a previous build is found on the server with the same previous state and command being run, it will reuse the cache.
That means an external change, e.g. pulling packages from an external repository, will not be detected and the cache will be reused instead of rerunning that line. There are two solutions that I've seen to this:
Option 1: Change your command by adding versions to the dependencies. When one of those dependencies changes, you'll need to update your build. This is added work, but also guarantees that you only pull in the versions that you are ready for. That would look like (fixing boost-libs to a 1.5 version number):
# Install external dependencies
RUN apt update && apt install -y \
    boost-libs=1.5 \
    etc...
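To find the exact version string worth pinning, apt can list what the repositories offer (boost-libs here is just the placeholder name from the example above):
apt-cache policy boost-libs    # shows the installed and candidate versions
apt-cache madison boost-libs   # lists every version each repository provides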
Option 2: Change a build arg. Build args are injected as environment variables into the RUN commands, and Docker sees a change in the environment as a different command to run. That would look like:
# Install external dependencies
ARG UNIQUE_VAR
RUN apt update && apt install -y \
    boost-libs \
    etc...
And then you could build the above with the following to trigger the cache to be recreated daily on that line:
docker build --build-arg "UNIQUE_VAR=$(date +%Y%m%d)" ...
Note, there's also the option to build without the cache any time you wish with:
docker build --no-cache ...
This would cause the cache to be ignored for all steps (except the FROM line).

Conventional way to resolve docker derived image build time vs. image size tradeoff

Two constraints are often important in writing Dockerfiles: image size and image build time.
It's a commonplace observation that time and space usage can often be traded off for one another. However, it can be useful to avoid that choice by going for fast build time in development and small-but-slower builds in production.
For example, if I write something like this in a project I can quickly rebuild the images in development when frequently_changing_source_code changes, because there is a layer with build-essential installed that can be reused in the derived image:
base image:
RUN apt install build-essential python-dev && \
    pip install some-pypi-project
ADD frequently_changing_source_code
derived image:
FROM base_image
RUN pip install another-pypi-project-requiring-build-essential
ADD more_stuff
The above results in larger builds than this next version, which achieves the same functionality but sacrifices build times. Now whenever frequently_changing_source_code changes, rebuilding the derived image results in a re-install of build-essential:
base image:
RUN apt install build-essential python-dev && \
    pip install some-pypi-project && \
    apt remove build-essential python-dev
ADD frequently_changing_source_code
derived image:
FROM base_image
RUN apt install build-essential python-dev && \
    pip install another-pypi-project-requiring-build-essential && \
    apt remove build-essential python-dev
ADD more_stuff
I can imagine ways of solving this: for example, writing a slightly more complicated set of Dockerfiles that are parameterized on some sort of development flag, which has the first behaviour for development builds, and the second for production builds. I suspect that would not result in Dockerfiles that people like to read and use, though.
So, how can I best achieve my ends without surprising other developers: i.e. using Dockerfiles that respect docker conventions as much as I can?
Some notes about answers I've considered:
I'm aware of the layer-caching behaviour of docker (that is why the ADD commands for both images in my example are at the end).
I'm aware that one can mount code using -v. Using -v is my usual practice, but this question is about building images, which is also something that happens in development (from time to time, it happens quite a lot).
One obvious suggestion is to eliminate the base image. However, note that for the projects concerned, the base image is typically a base for multiple images, so merging the base with those would result in a bunch of repeated directives in each of those Dockerfiles. Perhaps this is the least-worst option, though.
Also note that (again, in the projects I'm involved with) the mere presence of frequently_changing_source_code does not by itself significantly contribute to build times: it is re-installs of packages like build-essential that do. another-pypi-project-requiring-build-essential typically does contribute significantly to build times, but perhaps not enough to need to eliminate that step in development builds too.
Finally, though it is a commonly-cited nice feature of docker that it's possible to use the same configuration in development as in production, this particular source of variation is not a significant concern for us.
In the past there hasn't really been a good answer to this. You either build two different images, one for fast-moving developers and the other for compact distribution, or you pick one that's less than ideal for the other use case. There's a potential workaround if the developers compile the code themselves and simply mount their compiled product directly into the container as a volume for testing without a rebuild.
But last week Docker added the ability to do a multi-stage build in 17.05.0-ce-rc1 (see PR 32063). Multi-stage builds let you build parts of the app in separate pieces and copy the results into another image at the end, with caching of all the layers, while the final image only contains the layers of the last section of the build. So for your scenario, you could have something like:
FROM debian:latest as build-env
# you can split these run lines now since these layers are only used at build time
RUN apt update && apt install -y build-essential python-dev
RUN pip install some-pypi-project
RUN pip install another-pypi-project-requiring-build-essential
# you only need this next remove if the build tools are in the same folders as the app
RUN apt remove -y build-essential python-dev

FROM debian:latest
# update this copy command depending on the pip install location
COPY --from=build-env /usr/bin /usr/bin
ADD frequently_changing_source_code
ADD more_stuff
All the layers in the first build environment stick around in the cache, letting developers add and remove as they need to without having to rerun the build-essential install. But in the final image there are just three layers added: one COPY from build-env and a couple of ADDs, resulting in a small image. And if they only change files in those ADD commands, then only those steps rerun.
Here's an early blog post going into it in more detail. This is now available as an RC and you can expect it in the 17.05 edge release from docker, hopefully in the next few weeks. If you want to see another example of this really put to use, have a look at the miragesdk Dockerfile.
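As a side note on the same feature: developers can build only the first stage with --target while iterating, so the heavy tooling layers stay cached and the small final stage is skipped entirely (image names here are placeholders):
docker build --target build-env -t myapp:build-env .
docker build -t myapp:release .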

Install python3-gi for Travis-CI and Python >= 3.3

What is the correct way to install python3-gi on Travis-CI using the .travis.yml file?
The past recommendation was to use Python 3.2 (Travis-ci & Gobject introspection), but I would prefer testing against more recent versions.
I did try a few sensible combinations of commands, but my knowledge of the Travis-CI environment is very basic:
This, for example, fails both with and without system_site_packages: true:
before_install:
  - sudo apt-get install -qq python3-gi
virtualenv:
  system_site_packages: true
Two examples of repositories that have this working (as far as I can tell):
https://github.com/ignatenkobrain/gnome-news (CircleCI)
https://github.com/devassistant/devassistant (Travis-CI)
In order to use a newer version you would either have to build it yourself or use a container system like Docker.
gnome-news has an example of a PyGObject project using CircleCI (another free alternative to Travis-CI). They are using Fedora Rawhide in Docker, which has the latest versions of the entire GNOME stack.
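A rough sketch of that approach on Travis-CI, assuming the docker service is enabled in .travis.yml; the python3-gobject package name and the pytest command are placeholders to adapt:
# run the test suite inside a Fedora Rawhide container instead of the Travis VM
docker run --rm -v "$PWD":/src -w /src fedora:rawhide \
    sh -c "dnf install -y python3-gobject && python3 -m pytest"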
