Docker: how to ADD a file without committing it to an image?

I have a ~300MB zipped local file that I ADD to a Docker image. The next step then extracts the archive.
The problem is that the ADD statement results in a commit that creates a new filesystem layer, making the image ~300MB larger than it needs to be.
ADD /files/apache-stratos.zip /opt/apache-stratos.zip
RUN unzip -q apache-stratos.zip && \
rm apache-stratos.zip && \
mv apache-stratos-* apache-stratos
Question: Is there a work-around to ADD local files without causing a commit?
One option is to run a simple web server (e.g. python -m SimpleHTTPServer) before starting the docker build, and then use wget to retrieve the file, but that seems a bit messy:
RUN wget http://localhost:8000/apache-stratos.zip && \
unzip -q apache-stratos.zip && \
rm apache-stratos.zip && \
mv apache-stratos-* apache-stratos
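For reference, the workaround end to end might look like this (a sketch; note that inside the build, localhost refers to the build container, so the wget would need the host's address as seen from the container, commonly the docker0 bridge IP 172.17.0.1, which varies per setup):
# Serve the build-context directory over HTTP (Python 2; use
# python3 -m http.server 8000 with Python 3)
python -m SimpleHTTPServer 8000 &

# Run the build; the Dockerfile's wget fetches from the host's bridge IP
docker build -t apache-stratos .

# Stop the temporary web server
kill %1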
Another option is to extract the zipped file at container start-up instead of at build time, but I would prefer to keep start-up as quick as possible.

According to the documentation, if you pass a local tar archive (not a URL) to ADD in the Dockerfile, with a destination path rather than a path + filename, it will unpack the archive into the given directory.
If <src> is a local tar archive in a recognized compression format
(identity, gzip, bzip2 or xz) then it is unpacked as a directory.
Resources from remote URLs are not decompressed. When a directory is
copied or unpacked, it has the same behavior as tar -x: the result is
the union of:
1) Whatever existed at the destination path and 2) The contents of the
source tree, with conflicts resolved in favor of "2." on a file-by-file basis.
try:
ADD /files/apache-stratos.zip /opt/
and see if the files are there, without further decompression.
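Note that the quoted documentation only covers tar archives, so a plain .zip will most likely be copied as-is rather than unpacked. A quick way to check (a sketch, assuming the image is tagged your_image):
docker run --rm your_image ls -la /opt/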

With Docker 17.05+ you can use a multi-stage build to avoid shipping the extra layers in the final image.
FROM ... as stage1
# No need to clean up here, these layers will be discarded
ADD /files/apache-stratos.zip /opt/apache-stratos.zip
WORKDIR /opt
RUN unzip -q apache-stratos.zip && \
    mv apache-stratos-* apache-stratos

FROM ...
COPY --from=stage1 /opt/apache-stratos/ /opt/apache-stratos/

You can use docker-squash to squash newly created layers. That should reduce the image size significantly.
Unfortunately, the mentioned workarounds (RUN curl ... && unzip ... && rm ..., or unpacking at container start) are the only options at the moment (Docker 1.11).
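For example, a sketch assuming the jwilder/docker-squash tool (flags may differ between versions; check its README):
# Squash the layers of an already-built image into one
docker save your_image | docker-squash -t your_image:squashed | docker load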

There are currently 3 options I can think of.
Option 1: you can switch from the zip file to a tar or compressed tar format, and then let ADD decompress the file for you.
ADD /files/apache-stratos.tgz /opt/
The only downside is that any other change, like a directory rename, will trigger the copy-on-write again, so you need to make sure your tar file has the contents in the final directory structure.
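For example, a one-time repackaging on the host (a sketch) so the tar already contains the final directory name:
unzip -q apache-stratos.zip
mv apache-stratos-* apache-stratos
tar czf files/apache-stratos.tgz apache-stratos
rm -rf apache-stratos
With this, the ADD above unpacks straight to /opt/apache-stratos and no further RUN step is needed.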
Option 2: Use a multi-stage build. Extract the file in an early stage, perform any changes, and then copy the resulting directory to your final stage. This is a good option for any build engines that cannot use BuildKit. augurar's answer covers this so I won't repeat the same Dockerfile he already has.
Option 3: BuildKit (available in 18.09 and newer) allows you to mount files from other locations, including your build context, within a RUN command. This currently requires the experimental syntax. The resulting Dockerfile looks like:
# syntax=docker/dockerfile:experimental
FROM ...
...
# The mounted zip never becomes part of the image, so no rm is needed
RUN --mount=type=bind,source=/files/apache-stratos.zip,target=/opt/apache-stratos.zip \
    unzip -q /opt/apache-stratos.zip -d /opt && \
    mv /opt/apache-stratos-* /opt/apache-stratos
Then to build it, you set a variable when running your build (you could also export it in your .bashrc or equivalent):
DOCKER_BUILDKIT=1 docker build -t your_image .
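Or, to enable BuildKit for the whole shell session:
export DOCKER_BUILDKIT=1
docker build -t your_image .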
More details on BuildKit's experimental features are available here: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md

Related

How can I use a several line command in a Dockerfile in order to create a file within the resulting Image

I'm following installation instructions for RedhawkSDR, which rely on having a Centos7 OS. Since my machine uses Ubuntu 22.04, I'm creating a Docker container to run Centos7 then installing RedhawkSDR in that.
One of the RedhawkSDR installation instructions is to create a file with the following command:
cat<<EOF|sed 's#LDIR#'`pwd`'#g'|sudo tee /etc/yum.repos.d/redhawk.repo
[redhawk]
name=REDHAWK Repository
baseurl=file://LDIR/
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhawk
EOF
How do I get a Dockerfile to execute this command when creating an image?
(Also, although I can see that this command creates the file /etc/yum.repos.d/redhawk.repo, which consists of the lines from [redhawk] to gpgkey=...., I have no idea how to parse this command and understand exactly why it does that...)
Using the text editor of your choice, create the file on your local system. Remove the word sudo from it; give it an additional first line #!/bin/sh. Make it executable using chmod +x create-redhawk-repo.
Now it is an ordinary shell script, and in your Dockerfile you can just RUN it.
COPY create-redhawk-repo ./
RUN ./create-redhawk-repo
But! If you look at what the script actually does, it just writes a file into /etc/yum.repos.d with a LDIR placeholder replaced with some other directory. The filesystem layout inside a Docker image is fixed, and there's no particular reason to use environment variables or build arguments to hold filesystem paths most of the time. You could use a fixed path in the file
[redhawk]
name=REDHAWK Repository
baseurl=file:///redhawk-yum/
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhawk
and in your Dockerfile, just COPY that file in as-is, and make sure the downloaded package archive is in that directory. Adapting the installation instructions:
ARG redhawk_version=3.0.1
RUN wget https://github.com/RedhawkSDR/redhawk/releases/download/$redhawk_version/\
redhawk-yum-$redhawk_version-el7-x86_64.tar.gz \
&& tar xzf redhawk-yum-$redhawk_version-el7-x86_64.tar.gz \
&& rm redhawk-yum-$redhawk_version-el7-x86_64.tar.gz \
&& mv redhawk-yum-$redhawk_version-el7-x86_64 redhawk-yum \
&& rpm -i redhawk-yum/redhawk-release*.rpm
COPY redhawk.repo /etc/yum.repos.d/
Remember that, in a Dockerfile, you are root unless you've switched to another USER (and in that case you can use USER root to switch back); you generally do not need sudo in Docker at all, and can just delete sudo where it appears in these instructions.
How do I get a Dockerfile to execute this command when creating an image?
Just use printf and run this command as a single line:
FROM image_name:image_tag
ARG LDIR="/default/folder/if/argument/not/set"
# if container has sudo command and default user is not root
# you should choose this variant
RUN printf '[redhawk]\nname=REDHAWK Repository\nbaseurl=file://%s/\nenabled=1\ngpgcheck=1\ngpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhawk\n' "$LDIR" | sudo tee /etc/yum.repos.d/redhawk.repo
# if default container user is root this command without piping may be used
RUN printf '[redhawk]\nname=REDHAWK Repository\nbaseurl=file://%s/\nenabled=1\ngpgcheck=1\ngpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhawk\n' "$LDIR" > /etc/yum.repos.d/redhawk.repo
where LDIR is a build argument, and the docker build should be run like:
docker build ./ --build-arg LDIR=`pwd`
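To check the generated file, a quick sketch (assuming the build was tagged with -t image_name):
docker run --rm image_name cat /etc/yum.repos.d/redhawk.repo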

Docker proxy config not working for ADD in Dockerfile

I am trying to write a Dockerfile that adds a file to the image like this:
ADD https://repository.internal/file.zip /tmp/
The repository.internal host is only reachable through a proxy. I provide the proxy configuration with the --config option, but the ADD command seems not to use the proxy and fails.
I know the proxy configuration is correct because I added the line
RUN curl https://repository.internal/file.zip
which is working fine.
Is there any possibility to run the ADD command also with the proxy config?
As per my comments above, I believe this has something to do with the way the Docker build process handles the ADD and RUN commands internally. I can't find documentation to back this up, so someone with greater internal knowledge may confirm or deny it, but it makes sense: a RUN command is executed in a container layered on the image being built, whereas the ADD command is performed by the build process itself and its results are baked into the image.
Whichever way this is being handled, you can achieve what you need by moving to the RUN method as follows:
FROM <your base image>
RUN curl https://repository.internal/file.zip >> /tmp/file.zip \
&& cd /tmp \
&& unzip file.zip \
&& rm file.zip
And you will have the files unzipped.
The rm at the end is required: unzip leaves the original zip file in place.
As you mentioned, this would rely on the curl and unzip packages being available on the image... however you could potentially avoid having these within your final application image by using Docker Multi Stage Builds
Your Dockerfile would then look something like:
FROM <some useful base image> as collector
RUN apt-get update && apt-get install -y curl unzip
RUN mkdir /tmp/files \
&& curl https://repository.internal/file.zip >> /tmp/files/file.zip \
&& cd /tmp/files \
&& unzip file.zip \
&& rm file.zip
FROM <your final desired base image>
COPY --from=collector /tmp/files /tmp
This would then utilise an image to have curl and unzip in to collect and deal with the extraction of your files without having to install them on your final application image.

configured a downloaded package with ./configure, how to remove it completely from centos

I did the following in setup.sh while creating a Docker image:
wget -qO- https://downloads.sourceforge.net/project/libpng/zlib/1.2.8/zlib-1.2.8.tar.gz | tar zvx
cd zlib-1.2.8
./configure
make
make install
At the end of the Dockerfile I want to remove all binaries of this package to reduce the size of the image. How can I do that?
You probably want to run make uninstall after you have done what you want with this library, and also remove the zlib-1.2.8 folder. Your Dockerfile should look like:
FROM centos:7
RUN ./setup.sh \
&& ./do_stuff_with_zlib.sh \
&& ./uninstall_zlib.sh
The uninstall_zlib.sh script should contain:
#!/usr/bin/env sh
(cd zlib-1.2.8; make uninstall)  # uninstall binaries
rm -rf zlib-1.2.8                # also remove the source folder to regain some space
Note that ./setup.sh and ./uninstall_zlib.sh should be run in the same layer (same RUN directive), otherwise the resulting image size will not be reduced (unless you squash the image afterwards).
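To illustrate why, a minimal sketch:
# NOT reduced: the layer created by the first RUN still contains the binaries
RUN ./setup.sh && ./do_stuff_with_zlib.sh
RUN ./uninstall_zlib.sh

# Reduced: install, use, and cleanup all happen within a single layer
RUN ./setup.sh && ./do_stuff_with_zlib.sh && ./uninstall_zlib.sh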

How to verify if the content of two Docker images is exactly the same?

How can we determine that two Docker images have exactly the same file system structure, and that the content of corresponding files is the same, irrespective of file timestamps?
I tried the image IDs but they differ when building from the same Dockerfile and a clean local repository. I did this test by building one image, cleaning the local repository, then touching one of the files to change its modification date, then building the second image, and their image IDs do not match. I used Docker 17.06 (the latest version I believe).
If you want to compare the content of images, you can use the docker inspect <imageName> command and look at the RootFS section:
docker inspect redis
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:eda7136a91b7b4ba57aee64509b42bda59e630afcb2b63482d1b3341bf6e2bbb",
"sha256:c4c228cb4e20c84a0e268dda4ba36eea3c3b1e34c239126b6ee63de430720635",
"sha256:e7ec07c2297f9507eeaccc02b0148dae0a3a473adec4ab8ec1cbaacde62928d9",
"sha256:38e87cc81b6bed0c57f650d88ed8939aa71140b289a183ae158f1fa8e0de3ca8",
"sha256:d0f537e75fa6bdad0df5f844c7854dc8f6631ff292eb53dc41e897bc453c3f11",
"sha256:28caa9731d5da4265bad76fc67e6be12dfb2f5598c95a0c0d284a9a2443932bc"
]
}
If all layers are identical, then the images contain identical content.
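For example, to compare two images directly (a sketch; the image names are placeholders, and the process substitution requires bash):
diff <(docker inspect -f '{{json .RootFS.Layers}}' image1) \
     <(docker inspect -f '{{json .RootFS.Layers}}' image2)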
After some research I came up with a solution which is fast and clean per my tests.
The overall solution is this:
Create a container for your image via docker create ...
Export its entire file system to a tar archive via docker export ...
Pipe the archive's directory names, link names, link targets, file names, and file contents to a hash function (e.g., MD5)
Compare the hashes of different images to verify if their contents are equal or not
And that's it.
Technically, this can be done as follows:
1) Create file md5docker, and give it execution rights, e.g., chmod +x md5docker:
#!/bin/sh
dir=$(dirname "$0")
docker create $1 | { read cid; docker export $cid | $dir/tarcat | md5; docker rm $cid > /dev/null; }
2) Create file tarcat, and give it execution rights, e.g., chmod +x tarcat:
#!/usr/bin/env python3
# coding=utf-8
if __name__ == '__main__':
    import sys
    import tarfile

    with tarfile.open(fileobj=sys.stdin.buffer, mode="r|*") as tar:
        for tarinfo in tar:
            if tarinfo.isfile():
                print(tarinfo.name, flush=True)
                with tar.extractfile(tarinfo) as file:
                    sys.stdout.buffer.write(file.read())
            elif tarinfo.isdir():
                print(tarinfo.name, flush=True)
            elif tarinfo.issym() or tarinfo.islnk():
                print(tarinfo.name, flush=True)
                print(tarinfo.linkname, flush=True)
            else:
                print("\33[0;31mIGNORING:\33[0m ", tarinfo.name, file=sys.stderr)
3) Now invoke ./md5docker <image>, where <image> is your image name or id, to compute an MD5 hash of the entire file system of your image.
To verify if two images have the same contents just check that their hashes are equal as computed in step 3).
Note that this solution only considers directory structure, regular file contents, and links (symbolic and hard) as content. If you need more, just change the tarcat script by adding more elif clauses testing for the content you wish to include (see Python's tarfile module, and look for the TarInfo.isXXX() methods corresponding to the needed content).
The only limitation I see in this solution is its dependency on Python (I am using Python 3, but it should be very easy to adapt to Python 2). A better solution, without any dependency and probably faster (hey, this is already very fast), would be to write the tarcat script in a language supporting static linking, so that a standalone executable file is enough (i.e., one not requiring anything beyond the OS). I leave this as a future exercise in C, Rust, OCaml, Haskell, you choose.
Note, if MD5 does not suit your needs, just replace md5 inside the first script with your hash utility.
Hope this helps anyone reading.
Amazes me that docker doesn't do this sort of thing out of the box. Here's a variant on @mljrg's technique:
#!/bin/sh
docker create $1 | {
read cid
docker export $cid | tar Oxv 2>&1 | shasum -a 256
docker rm $cid > /dev/null
}
It's shorter, doesn't need a python dependency or a second script at all, I'm sure there are downsides but it seems to work for me with the few tests I've done.
There doesn't seem to be a standard way of doing this. The best way I can think of is using the Docker multi-stage build feature.
For example, here I am comparing the alpine and debian images. In your case, set the image names to the ones you want to compare.
I basically copy all the files from each image into a git repository and commit after each copy.
FROM alpine as image1
FROM debian as image2
FROM ubuntu
RUN apt-get update && apt-get install -y git
RUN git config --global user.email "you@example.com" &&\
git config --global user.name "Your Name"
RUN mkdir images
WORKDIR images
RUN git init
COPY --from=image1 / .
RUN git add . && git commit -m "image1"
COPY --from=image2 / .
RUN git add . && git commit -m "image2"
CMD tail -f /dev/null
This will give you an image with a git repository that records the differences between the two images.
docker build -t compare .
docker run -it compare bash
Now if you do a git log you can see the logs and you can compare the two commits using git diff <commit1> <commit2>
Note: If the image building fails at the second commit, this means that the images are identical, since a git commit will fail if there are no changes to commit.
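For example, inside the container (a sketch):
git log --oneline            # shows the "image1" and "image2" commits
git diff HEAD~1 HEAD --stat  # summary of the files that differ between the images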
If we rebuild the image from the Dockerfile, it is almost certainly going to produce a new hash.
The only way to create an image with the same hash is to use docker save and docker load. See https://docs.docker.com/engine/reference/commandline/save/
We could then use Bukharov Sergey's answer (i.e. docker inspect) to inspect the layers, looking at the section with key 'RootFS'.

Manually installing SonarQube plugins on Docker image

I want to create my custom SonarQube Docker image with some plugins already installed, but every time I run my container, the plugins are not there. It's like something removes the plugins from /opt/sonarqube/extensions/plugins and copies the bundled plugins there.
My Dockerfile
FROM sonarqube
ENV SONARQUBE_HOME /opt/sonarqube
RUN wget "http://downloads.sonarsource.com/plugins/org/codehaus/sonar-plugins/sonar-scm-git-plugin/1.1/sonar-scm-git-plugin-1.1.jar" \
&& wget "https://github.com/SonarSource/sonar-java/releases/download/3.12-RC2/sonar-java-plugin-3.12-build4634.jar" \
&& wget "https://github.com/SonarSource/sonar-github/releases/download/1.1-M9/sonar-github-plugin-1.1-SNAPSHOT.jar" \
&& wget "https://github.com/SonarSource/sonar-auth-github/releases/download/1.0-RC1/sonar-auth-github-plugin-1.0-SNAPSHOT.jar" \
&& wget "https://github.com/QualInsight/qualinsight-plugins-sonarqube-badges/releases/download/qualinsight-plugins-sonarqube-badges-1.2.1/qualinsight-sonarqube-badges-1.2.1.jar" \
&& mv *.jar $SONARQUBE_HOME/extensions/plugins \
&& ls -lah $SONARQUBE_HOME/extensions/plugins
I tried listing the folder, and it lists my desired plugins. But if I list the same folder after I started the container, they are gone.
I've also tried removing the bundled-plugins with no luck.
Any ideas?
The SonarQube image uses a volume for the /extensions/ directory, which results in the files in that directory not being stored in the image's filesystem; see the Dockerfile.
To package those extensions in your image, you need to keep them outside of that directory and copy them to /extensions/ in an entrypoint script, or store your plugins in a separate image and mount those plugins as a volume when running the container; you can find an example of that here: https://github.com/SonarSource/docker-sonarqube/blob/master/recipes.md
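For example, a sketch of the entrypoint approach (the staging path and script name are hypothetical):
#!/bin/sh
# custom-entrypoint.sh (hypothetical): copy plugins staged outside the volume
# into the volume-backed plugins directory, then hand off to the original command
cp /opt/sonarqube/preinstalled-plugins/*.jar /opt/sonarqube/extensions/plugins/
exec "$@"
In the Dockerfile you would COPY the jars into the staging directory, COPY this script in, and set it as the ENTRYPOINT, keeping the original start command as CMD.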
Note that the accepted answer is no longer true: it should work if you directly use a recent parent sonarqube image. So if the Dockerfile mentioned in the question does not work, you have a different problem.
See commit 80366e3419d698b4bba4447f153418ef64b3b705 for more info.
Remove volume for "conf", "logs" and "extensions" directories
Explicit declaration of volume is appropriate only for data stored by
application, volume for configurable things is almost never
appropriate (see docker-library/official-images#2437).
This reverts commit 69fca2e. And additionally removes declaration of
volume for "extensions" directory.
Before SonarQube 5.6, plugins were stored in a volume, so the appropriate command is (after starting the sonarqube container):
wget https://sonarsource.bintray.com/Distribution/sonar-auth-github-plugin/sonar-auth-github-plugin-1.2.jar -P `docker inspect -f '{{ (index .Mounts 1).Source }}' sonarqube`/plugins
docker restart sonarqube
