Docker child image doesn't inherit packages installed in base image

I need to have a hierarchy of the following Docker images:
A "base"image:
FROM python:3.5-slim-stretch
RUN apt-get update && apt-get install -y python3-enchant enchant libpq-dev gcc && apt-get clean
And a child image that inherits from the base, like so:
FROM myprivaterepo:30999/base-image
ENV PATH /usr/lib/postgresql/9.5/bin:$PATH
COPY requirements.txt .
RUN pip3 install -r requirements.txt
The requirements.txt includes packages that need gcc to build, and one of them needs to find the pg_config binary provided by the libpq-dev package. The problem is that the pip install cannot find them, even though the child image builds normally from the base image. (If I install the same packages in the child image instead, it all works, but that's not what I want.)
Any idea what I'm doing wrong? Many thanks.

Have you ever built the base image without that software? If so, it might be a Docker image caching problem, i.e. your child image is based on an old cached version of the base image.
Verify that the following hashes match:
Building your base image prints as its last line:
Successfully built <hash>
Building your child image prints at the beginning:
Step 1/x : FROM myprivaterepo:30999/base-image
---> <hash>
The two <hash> values should be identical.
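If the hashes differ, rebuilding without the cache usually fixes it. A minimal sketch, assuming the two Dockerfiles live in ./base and ./child (directory names here are made up):
# rebuild the base image from scratch and push it to the private registry
docker build --no-cache -t myprivaterepo:30999/base-image ./base
docker push myprivaterepo:30999/base-image
# --pull forces docker to fetch the latest base-image instead of reusing a stale local copy
docker build --pull -t child-image ./child
You can also sanity-check that the tools are really present in the base image with something like docker run --rm myprivaterepo:30999/base-image which gcc pg_config.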

Related

Generate requirements.txt from pyproject.toml

I'm using poetry in my project and now working on a feature that will allow to run the app inside of a docker container. Now, my Dockerfile looks like this:
COPY pyproject.toml /
...
RUN poetry install
The last command takes around 4 minutes, which is quite a lot, so I thought of somehow caching these dependencies. I'm trying to convert my pyproject.toml to requirements.txt so I could feed that to Docker, which would cache the layer as long as the file hasn't changed since the last run.
Now I'm trying:
poetry export -f requirements.txt --output requirements.txt
And it only writes dependencies from the [tool.poetry.dependencies] section, but I have other sections and would like to see the dependencies from those in my requirements.txt file as well. How should I modify the command above so that it takes dependencies from the other sections too?
P.S. If you know other ways to cache poetry install in docker, I'd really appreciate that!
I think you can do a two-step dependency install with poetry to cache the dependencies, as in the example here: https://pythonspeed.com/articles/poetry-vs-docker-caching/, so there is no need to migrate to requirements.txt. The idea is to copy only the toml, install the dependencies (this way the dependency layer is cached and only needs updating when the toml changes), then copy your source files (which change more often than the toml) and run the install again. There is a more detailed explanation in the linked article.
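A minimal sketch of that two-step install, assuming a pyproject.toml and poetry.lock at the project root (the base image tag and paths here are illustrative, not from the original answer):
FROM python:3.10-slim
RUN pip install poetry
# install into the system environment rather than a virtualenv
ENV POETRY_VIRTUALENVS_CREATE=false
WORKDIR /app
# step 1: copy only the manifests, so this layer stays cached until they change
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-root
# step 2: copy the source, which changes far more often, then install the project itself
COPY . .
RUN poetry install
If you still prefer the requirements.txt route, note that newer poetry versions can export additional dependency groups with poetry export --with <group> (older versions used --dev).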

Vulnerabilities and package deletion when building Docker image

Docker layers are additive, meaning that purging packages in a subsequent layer will not remove them from the previous one, and thus from the image.
In my understanding, what happens is that an additional masking layer is created, in which those packages are not shown anymore.
Indeed, if I build the MWE below and then run apt list --installed | grep libpython3.9-minimal after the purging, the package cannot be found.
However, I still don't understand entirely what happens under the hood.
Are the packages effectively still there, but masked?
If one of these packages causes vulnerability issues, is purging=masking a solution, or will we still have issues while being unaware of them (because the package seems to be removed and so does not show in an image scan, but is still there)?
FROM openjdk:11
# Remove packages
RUN apt-get purge -y libpython3.9-minimal
RUN apt-get autoremove -y
ENTRYPOINT ["/bin/bash"]
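One way to see what happens under the hood is to export the image and inspect its layer tarballs: deletions are recorded as whiteout entries in the purge step's layer, while the earlier layers still contain the package's files in full. A rough sketch (the tag name is made up):
docker build -t purge-mwe .
docker save purge-mwe -o purge-mwe.tar
# each layer is its own tar inside the archive; the layer created by the
# purge step contains ".wh.<name>" whiteout files marking the deletions
tar -tf purge-mwe.tar
In other words, the purged files are still shipped inside the earlier layers; only the merged filesystem view hides them.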

How to resolve libwkhtmltox.so reference in .Net AWS Lambda Docker image

I'm converting a .NET Core 2.1 lambda to 3.1 (or higher) and struggling to resolve the references that convert HTML to PDF. I'm currently using code from this solution, https://github.com/HakanL/WkHtmlToPdf-DotNet, which works fine when running a console app in the container. The lambda package introduces issues that break this logic. Using a new lambda solution with this WkHtmlToPdf-DotNet project, the deployed image fails with this exception:
GetModule WkHtmlModuleLinux64 Exception System.DllNotFoundException: Unable to load shared library '/var/task/runtimes/linux-x64/native/libwkhtmltox.so' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: libjpeg.so.62: cannot open shared object file: No such file or directory
I am using the LD_DEBUG environment variable which shows before the exception: file=runtimes/linux-x86/native/libwkhtmltox [0]; dynamically loaded by /var/lang/bin/shared/Microsoft.NETCore.App/5.0.12/libcoreclr.so [0]
I also log a search for the file, which yields this line:
GetFilePath res: /var/task/runtimes/linux-x64/native/libwkhtmltox.so
Any suggestions on how to continue troubleshooting this?
Thanks,
Reuven
I was able to resolve this issue by installing a few of the packages required by the DinkToPdf library in a docker container environment.
Installing those packages, however, was not straightforward on Amazon Linux 2 instances. Below is the Dockerfile I had to use to get DinkToPdf working properly.
FROM public.ecr.aws/lambda/dotnet:core3.1
WORKDIR /var/task
COPY "bin/Release/lambda-publish" .
RUN yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
RUN yum install -y libgdiplus \
    ca-certificates \
    fontconfig \
    freetype \
    libX11 \
    libXext \
    libXrender
For this to run I also had to copy the three dependent library files after the build: libwkhtmltox.dll, libwkhtmltox.dylib, and libwkhtmltox.so.
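For illustration only, that copy step might look something like the following in the Dockerfile above; the source path is hypothetical, and the destination matches where the DllNotFoundException shows the runtime probing:
# hypothetical paths: place the native wkhtmltox libraries under /var/task
# where the loader looks for them
COPY "bin/Release/lambda-publish/runtimes/linux-x64/native/" ./runtimes/linux-x64/native/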

Reducing copy layer size

Currently I am copying pre-downloaded packages into the docker image and then installing them. The COPY layer currently has the same size as the directory being copied, and that directory is erased in a later layer. The Dockerfile looks as follows:
COPY python-packages /tmp/python-packages
RUN pip install -f /tmp/python-packages --no-index <pkg-name> \
&& rm -rf /tmp/*
Is there a way to copy files without having a layer the same size as the directory being copied? Any way to reduce COPY layer size?
Unfortunately, as of this time, you cannot reduce the size of the layer or eliminate it: RUN, COPY and ADD each create a layer every time.
What you can do is use pip to install directly from your version control
e.g. pip install git+https://git.example.com/MyProject#egg=MyProject
More info: https://pip.pypa.io/en/stable/reference/pip_install/#vcs-support
The downside is that you will have to give pip access to your code if it is private, and it introduces the need for network connectivity (to your private network or the internet, depending on where the code is hosted) at docker build time.
You could also use a multi-stage build: install the python module with pip in another docker image and then copy just the artifacts into the final docker image. I do not recommend this unless you have no choice and understand the risks. You would have to copy all the folders and/or files pip touches during the install, possibly create others that it expects to be present, and get the permissions right in the final image. That is hard to get right without deep diving into pip internals, and hard to maintain in the long run, since pip might change its file and folder locations and/or structure in the future.
More on multi stage builds: https://docs.docker.com/develop/develop-images/multistage-build/
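With those caveats in mind, a rough sketch of the multi-stage variant might look like this (the python version and the site-packages path are assumptions and must match between the two stages; <pkg-name> is the same placeholder as above):
FROM python:3.9-slim AS builder
COPY python-packages /tmp/python-packages
RUN pip install -f /tmp/python-packages --no-index <pkg-name>

FROM python:3.9-slim
# copy only the installed artifacts; the big COPY of /tmp/python-packages
# never enters this image's layers
COPY --from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin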

Conventional way to resolve docker derived image build time vs. image size tradeoff

Two constraints are often important in writing Dockerfiles: image size and image build time.
It's a commonplace observation that time and space usage can often be traded off for one another. However, it can be useful to avoid that choice by going for fast build time in development and small-but-slower builds in production.
For example, if I write something like this in a project I can quickly rebuild the images in development when frequently_changing_source_code changes, because there is a layer with build-essential installed that can be reused in the derived image:
base image:
RUN apt-get update && apt-get install -y build-essential python-dev && \
    pip install some-pypi-project
ADD frequently_changing_source_code .
derived image:
FROM base_image
RUN pip install another-pypi-project-requiring-build-essential
ADD more_stuff .
The above results in larger builds than this next version, which achieves the same functionality but sacrifices build times. Now whenever frequently_changing_source_code changes, rebuilding the derived image results in a re-install of build-essential:
base image:
RUN apt-get update && apt-get install -y build-essential python-dev && \
    pip install some-pypi-project && \
    apt-get remove -y build-essential python-dev
ADD frequently_changing_source_code .
derived image:
FROM base_image
RUN apt-get update && apt-get install -y build-essential python-dev && \
    pip install another-pypi-project-requiring-build-essential && \
    apt-get remove -y build-essential python-dev
ADD more_stuff .
I can imagine ways of solving this: for example, writing a slightly more complicated set of Dockerfiles that are parameterized on some sort of development flag, which has the first behaviour for development builds, and the second for production builds. I suspect that would not result in Dockerfiles that people like to read and use, though.
So, how can I best achieve my ends without surprising other developers: i.e. using Dockerfiles that respect docker conventions as much as I can?
Some notes about answers I've considered:
I'm aware of the layer-caching behaviour of docker (that is why the ADD commands for both images in my example are at the end).
I'm aware that one can mount code using -v. Using -v is my usual practice, but this question is about building images, which is also something that happens in development (from time to time, it happens quite a lot).
One obvious suggestion is to eliminate the base image. However, note that for the projects concerned, the base image is typically a base for multiple images, so merging the base with those would result in a bunch of repeated directives in each of those Dockerfiles. Perhaps this is the least-worst option, though.
Also note that (again, in the projects I'm involved with) the mere presence of frequently_changing_source_code does not by itself significantly contribute to build times: it is re-installs of packages like build-essential that do that. another-pypi-project-requiring-build-essential typically does contribute significantly to build times, but perhaps not enough to need to eliminate that step in development builds too.
Finally, though it is a commonly-cited nice feature of docker that it's possible to use the same configuration in development as in production, this particular source of variation is not a significant concern for us.
In the past there hasn't really been a good answer to this. You either build two different images, one for fast-moving developers and another for compact distribution, or you pick one that's less than ideal for the other use. There's a potential workaround if the developers compile the code themselves and simply mount the compiled product directly into the container as a volume for testing, without a rebuild.
But last week docker added support for multi-stage builds in 17.05.0-ce-rc1 (see PR 32063). Multi-stage builds let you build parts of the app in separate stages and copy the results into another image at the end, with caching of all the layers, while the final image contains only the layers from the last stage of the build. So for your scenario, you could have something like:
FROM debian:latest as build-env
# you can split these run lines now, since these layers are only used at build time
RUN apt-get update && apt-get install -y build-essential python-dev
RUN pip install some-pypi-project
RUN pip install another-pypi-project-requiring-build-essential
# you only need this next remove if the build tools are in the same folders as the app
RUN apt-get remove -y build-essential python-dev

FROM debian:latest
# update this copy command depending on the pip install location
COPY --from=build-env /usr/bin /usr/bin
ADD frequently_changing_source_code .
ADD more_stuff .
All the layers in the first build environment stay in the cache, letting developers add and remove steps as they need to without rerunning the build-essential install. But the final image gains just 3 layers: one COPY from build-env and a couple of ADDs, resulting in a small image. And if they only change the files in those ADD commands, then only those steps rerun.
Here's an early blog post going into it in more detail. This is now available as an RC and you can expect it in the 17.05 edge release from docker, hopefully in the next few weeks. If you want to see another example of this really put to use, have a look at the miragesdk Dockerfile.
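As a usage note, multi-stage builds also accept a --target flag on docker build, so developers can iterate on just the first stage while production builds the whole file (image tags here are made up):
# build only the build-env stage while iterating on dependencies
docker build --target build-env -t myapp:build-env .
# build the full Dockerfile for the small production image
docker build -t myapp .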
