How to remove intermediate images from a build after the build? - docker

When you build your multi-stage Dockerfile with
docker build -t myimage .
it produces the final image tagged myimage, and also intermediate images. To be completely clear, we are talking here not about containers, but about images. In the docker images output they appear with <none> for both repository and tag. See these <none> images? These are what I'm talking about.
Now this "issue" has been discussed to some extent here and here.
Here are some relevant parts:
If these intermediate images would be purged/pruned automatically, the build cache would be gone with each build, therefore forcing you to rebuild the entire image each time.
So okay, it does not make sense to prune them automatically.
Some people do this:
For now, I'm using docker image prune -f after my docker build -t app . command to cleanup those intermediate images.
But unfortunately this is not something I can do. As one discussion participant commented:
It removes "all dangling images", so in shared environments (like Jenkins slave) it's more akin to shooting oneself in the foot. :)
And this is a scenario I found myself in.
So nothing to be "fixed" on Docker side. But how can I remove those extra images, from a single particular build only?
Update
After reading the very nice answer from d4nyll below, which is a big step forward, I'd like to add some more constraints to the question ;) First, let me sum up the answer:
One can use ARG to pass a build id from CI/CD to Dockerfile builder
Then one can use LABEL syntax to add build id metadata to the stage images being built
Then one can use the --filter option of docker image prune command to remove only the images with the current build id
This is a big step forward, but I'm still struggling with how to fit it into my usage scenario without adding unnecessary complexity.
In my case, the requirement is that the application developers who author the Dockerfiles and check them into the source control system are responsible for making sure that their Dockerfiles build the image to their satisfaction. They are not required to craft all their Dockerfiles in a specific way "so our CI/CD process does not break". They simply have to provide a Dockerfile that produces a correct Docker image.
Thus, I'm not really in a position to ask them to add stuff to the Dockerfile of every single application just for the sake of the CI/CD pipeline. This is something the CI/CD pipeline is expected to handle all by itself.
The only way I can see to make this work is to write a Dockerfile parser that detects multi-stage builds, injects a label per stage, and then builds the modified Dockerfile. That is complexity I'm very hesitant to add to the CI/CD pipeline.
Do I have a better (read simpler) options?

As ZachEddy and thaJeztah mentioned in one of the issues you linked to, you can label the intermediate images and docker image prune those images based on this label.
Dockerfile (using multi-stage builds)
FROM node as builder
LABEL stage=builder
...
FROM node:dubnium-alpine
...
After you've built your image, run:
$ docker image prune --filter label=stage=builder
For Automation Servers (e.g. Jenkins)
If you are running the builds in an automation server (e.g. Jenkins), and want to remove only the intermediate images from that build, you can
Set a unique build ID as an environment variable inside your Jenkins build
Add an ARG instruction for this build ID inside your Dockerfile
Pass the build ID to docker build through the --build-arg flag
FROM node as builder
ARG BUILD_ID
LABEL stage=builder
LABEL build=$BUILD_ID
...
FROM node:dubnium-alpine
...
$ docker build --build-arg BUILD_ID .
$ docker image prune --filter label=stage=builder --filter label=build=$BUILD_ID
If you want to persist the build ID in the image (perhaps as a form of documentation accessible within the container), you can add an ENV instruction that takes the value of the ARG build argument. This also allows you to use environment replacement to set the label value to the build ID.
FROM node as builder
ARG BUILD_ID
ENV BUILD_ID=$BUILD_ID
LABEL stage=builder
LABEL build=$BUILD_ID
...
FROM node:dubnium-alpine
...

We're doing exactly this, applying labels to the Dockerfile at build time (here ${env.BUILD_TAG} is expanded by the Jenkins Pipeline before the shell runs) like this:
sed -i '/^FROM/a\
LABEL build_id=${env.BUILD_TAG}\
' Dockerfile
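A standalone sketch of the same injection, with a plain shell variable standing in for ${env.BUILD_TAG} and a made-up two-stage Dockerfile (GNU sed assumed):

```shell
# A made-up two-stage Dockerfile to run the injection against
cat > Dockerfile.demo <<'EOF'
FROM node as builder
RUN npm ci
FROM node:dubnium-alpine
COPY --from=builder /app /app
EOF

# Append a LABEL after every FROM line, so each stage's image
# carries the build tag and can later be pruned by label
BUILD_TAG=jenkins-myjob-42
sed -i "/^FROM/a\\
LABEL build_id=${BUILD_TAG}" Dockerfile.demo

cat Dockerfile.demo
```

After this, `docker image prune --filter label=build_id=jenkins-myjob-42` would only touch images from that particular build.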
Probably too late to help the OP, but hopefully this will be useful to someone facing the same problem.

You can run docker build with an extra flag which automatically removes the intermediate containers:
docker build --force-rm -t myimage .

The easy way is to run the command docker rmi -f $(docker images -f "dangling=true" -q)

A little late, but the best option is
docker builder prune -a

If you do not want to use the cache at all, you can use the --no-cache=true option on the docker build command. See "Leverage build cache" in the Docker documentation.

Use the command below to delete all intermediate images:
docker rmi $(docker images -a|grep "<none>"|awk '$1=="<none>" {print $3}')
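The filtering part of that pipeline can be seen in isolation on some canned docker images output (the IDs below are invented); note that awk's $1=="<none>" test already restricts the rows, so the grep is just a pre-filter:

```shell
# Canned `docker images -a` output with invented IDs
images='REPOSITORY   TAG              IMAGE ID       CREATED          SIZE
myimage      latest           1f2d3c4b5a69   2 minutes ago    85MB
<none>       <none>           0a1b2c3d4e5f   2 minutes ago    250MB
<none>       <none>           9f8e7d6c5b4a   3 minutes ago    250MB
node         dubnium-alpine   deadbeef0001   2 weeks ago      75MB'

# The filter from the answer: print the IMAGE ID of every <none> row
echo "$images" | awk '$1=="<none>" {print $3}'
# prints:
# 0a1b2c3d4e5f
# 9f8e7d6c5b4a
```

Those IDs are then what docker rmi receives through the command substitution.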

Related

Is it possible to run a Dockerfile without building an image (or to immediately/transparently discard the image)?

I have a use case where I call docker build . on one of our build machines.
During the build, a volume mount from the host machine is used to persist intermediate artifacts.
I don't care about this image at all. I have been tagging it with -t during the build and calling docker rmi after it's been created, but I was wondering if there was a one-liner/flag that could do this.
The docker build steps don't seem to have an appropriate flag for this behavior, but it may be simply because build is the wrong term.

Is it possible to specify a custom Dockerfile for docker run?

I have searched high and low for an answer to this question. Perhaps it's not possible!
I have some Dockerfiles in directories such as dev/Dockerfile and live/Dockerfile. I cannot find a way to provide these custom Dockerfiles to docker run from the root of the app. docker build has the -f option, but I cannot find a corresponding option for docker run. At the moment I think I will write my npm/gulp script to simply change into those directories, but this is clearly not ideal.
Any ideas?
You can't - that's not how Docker works:
The Dockerfile is used with docker build to build an image
The resulting image is used with docker run to run a container
If you need to make changes at run-time, then you need to modify your base image so that it can take, e.g. a command-line option to docker run, or a configuration file as a mount.

Can I obtain the Docker layer history on non-final stage Docker builds?

I'm working out a way to do Docker layer caching in CircleCI, and I've got a working solution. However, I am trying to improve it. The problem in any form of CI is that the image history is wiped for every build, so one needs to work out what files to restore, using the CI system's caching directives, and then what to load back into Docker.
First I tried this, inspired by this approach on Travis. To restore:
if [ -f /caches/${CIRCLE_PROJECT_REPONAME}.tar.gz ]; then gunzip -c /caches/${CIRCLE_PROJECT_REPONAME}.tar.gz | docker load; docker images; fi
And to create:
docker save $(docker history -q ${CIRCLE_PROJECT_REPONAME}:latest | grep -v '<missing>') | gzip > /caches/${CIRCLE_PROJECT_REPONAME}.tar.gz
This seemed to work OK, but my Dockerfile uses a two-stage build, and as soon as I COPYed files from the first to the final, it stopped referencing the cache. I assume this is because (a) docker history only applies to the final build, and (b) the non-cached changes in the first build stage have a new mtime, and so when they are copied to the final stage, they are regarded as new.
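For reference, the grep -v '&lt;missing&gt;' in the docker save command above just drops the placeholder rows that docker history -q prints for layers whose intermediate images aren't available locally. On canned output (invented IDs):

```shell
# Canned `docker history -q` output; IDs are invented, and <missing>
# marks layers whose intermediate images are not available locally
history_out='1a2b3c4d5e6f
6f5e4d3c2b1a
<missing>
<missing>'

# The same filter used inside the docker save command
echo "$history_out" | grep -v '<missing>'
# prints:
# 1a2b3c4d5e6f
# 6f5e4d3c2b1a
```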
To get around this problem, I decided to try saving all images to the cache:
docker save $(docker images -a -q) | gzip > /caches/${CIRCLE_PROJECT_REPONAME}.tar.gz
This worked! However, it has a new problem: when I modify my Dockerfile, the old image cache will be loaded, new images will be added, and then everything will be stored in the cache. This will accumulate dead layers I will never need again, presumably until the CI provider's cache size limits are hit.
I think this can be fixed by caching all the stages of the build, but I am not sure how to reference the first stage. Is there a command I can run, similar to docker history -q -a, that will give me the hashes either for all non-last stages (since I can do the last one already) or for all stages including the last stage?
I was hoping docker build -q might do that, but it only prints the final hash, not all intermediate hashes.
Update
I have an inelegant solution, which does work, but there is surely a better way than this! I search the output of docker build for --->, which is Docker's way of announcing layer hashes and cache information. I strip out cache messages and arrows, leaving just the complete build layer hash list for all build stages:
docker build -t imagename . | grep '\-\-\->' | grep -v 'Using cache' | sed -e 's/[ >-]//g'
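On a canned fragment of classic-builder output (hashes invented), the pipeline behaves like this:

```shell
# Canned fragment of classic `docker build` output; hashes are invented
build_log=' ---> 3f4e5d6c7b8a
Step 5/8 : COPY --from=builder /app /app
 ---> Using cache
 ---> 9a8b7c6d5e4f
Successfully built 9a8b7c6d5e4f'

# Keep the "--->" lines, drop cache notices, then strip spaces,
# arrows and dashes so only the layer hashes remain
echo "$build_log" | grep -- '--->' | grep -v 'Using cache' | sed -e 's/[ >-]//g'
# prints:
# 3f4e5d6c7b8a
# 9a8b7c6d5e4f
```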
(I actually do the build twice - once for the build CI step proper, and a second time to gather the hashes. I could do it just once, but it feels nice to have the actual build in a separate step. The second build will always be cached, and will only take a few seconds to run).
Can this be improved upon, perhaps using Docker commands?
This is a summary of a conversation in the comments.
One option is to push all build stages to a remote. If there are two build stages, with the first one being named build and the second one unnamed, then one can do this:
docker build --target build --tag image-name-build .
docker build --tag image-name .
One can then push image-name (the final build artifact) and image-name-build (the first stage, which is normally thrown away) to a remote registry.
When rebuilding images, one can pull both of these onto the fresh CI build machine, and then do:
docker build --cache-from image-name-build --target build --tag image-name-build .
docker build --cache-from image-name --tag image-name .
As BMitch says, --cache-from indicates that these images can be trusted for the purposes of using them as a local layer cache.
Comparison
The temporary solution in the question is good if you have a CI-native cache system to store files in, and you would rather not clutter up your registry with intermediate build stage images that are normally thrown away.
The --cache-from solution is nice because it is tidier, and uses Docker-native features rather than having to grep build output. It will also be very useful if your CI solution does not provide a file caching system, since it uses a remote registry instead.

How can I make Docker not eat up disk space when used in continuous integration

I am playing with docker and plan to use it in a GitLab CI environment to package the current project state to containers and provide running instances to do reviews.
I use a very simple Dockerfile as follows:
FROM php:7.0-apache
RUN sed -i 's!/var/www/html!/var/www/html/public!g' /etc/apache2/sites-available/000-default.conf
COPY . /var/www/html/
Now, as soon as I add a new (empty) file (touch foobar) to the current directory and call
docker build -t test2 --rm .
again, a full new layer is created, containing all of the code.
If I do not create a new file, the old image seems to be nicely reused.
I have a half-way solution using the following Dockerfile:
FROM test2:latest
RUN sed -i 's!/var/www/html!/var/www/html/public!g' /etc/apache2/sites-available/000-default.conf
COPY . /var/www/html/
After digging into that issue and switching the storage driver to overlay, this seems to be what I want - only a few bytes are added as a new layer.
But now I am wondering how I could integrate this into my CI setup - basically I would need two different Dockerfiles, depending on whether or not the image already exists.
Is there a better solution for this?
Build your images with the same tag, or with no tag at all:
docker build -t myapp:ci-build ....
or
docker build ....
If you use the same tag, the old images will be untagged and show up with <none> as their name. If you don't tag them at all, they will likewise show <none>.
Now you can schedule the command below:
docker system prune -f
This will remove all dangling images, stopped containers, etc.
One suggestion is to use the command docker image prune to clean dangling images. This can save you a lot of space. You can run this command regularly in your CI.

Labelling images in docker

I've got a jenkins server monitoring a git repo and building a docker image on code change. The .git directory is ignored as part of the build, but I want to associate the git commit hash with the image so that I know exactly what version of the code was used to make it and check whether the image is up to date.
The obvious solution is to tag the image with something like "application-name-branch-name:commit-hash", but for many develop branches I only want to keep the last good build, and adding more tags will make cleaning up old builds harder (rather than using the jenkins build number as the image is built, then retagging to :latest and untagging the build number)
The other possibility is labels, but while these looked promising initially, they proved more complicated in practice.
The only way I can see to apply a label directly to an image is in the Dockerfile, which cannot use the build environment variables, so I'd need to use some kind of templating to produce a custom Dockerfile.
The other way to apply a label is to start up a container from the image with some simple command (e.g. bash) and passing in the labels as docker run arguments. The container can then be committed as the new image. This has the unfortunate side effect of making the image's default command whatever was used with the labelling container (so bash in this case) rather than whatever was in the original Dockerfile. For my application I cannot use the actual command, as it will start changing the application state.
None of these seem particularly ideal - has anyone else found a better way of doing this?
Support for this was added in docker v1.9.0, so updating your docker installation to that version would fix your problem if that is OK with you.
Usage is described in the pull-request below:
https://github.com/docker/docker/pull/15182
As an example, take the following Dockerfile file:
FROM busybox
ARG GIT_COMMIT=unknown
LABEL git-commit=$GIT_COMMIT
and build it into an image named test as anyone would do naïvely:
docker build -t test .
Then inspect the test image to check what value ended up for the git-commit label:
docker inspect -f '{{index .ContainerConfig.Labels "git-commit"}}' test
unknown
Now, build the image again, but this time using the --build-arg option:
docker build -t test --build-arg GIT_COMMIT=0123456789abcdef .
Then inspect the test image to check what value ended up for the git-commit label:
docker inspect -f '{{index .ContainerConfig.Labels "git-commit"}}' test
0123456789abcdef
References:
Docker build command documentation for the --build-arg option
Dockerfile reference for the ARG directive
Dockerfile reference for the LABEL directive
You can specify a label on the command line when creating your image. So you would write something like
docker build -t myproject --label "myproject.version=githash" .
Instead of hard-coding the version, you can also get it directly from git:
docker build -t myproject --label "myproject.version=`git describe`" .
To read out the label from your images you can use docker inspect with a format string:
docker inspect -f '{{index .Config.Labels "myproject.version"}}' myproject
If you are using docker-compose, you could add the following to the build section:
labels:
  git-commit-hash: ${COMMIT_HASH}
where COMMIT_HASH is your environment variable holding the commit hash.
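A fuller sketch of such a Compose file (the service and image names here are illustrative):

```yaml
version: "3.8"
services:
  app:
    build:
      context: .
      labels:
        git-commit-hash: ${COMMIT_HASH}
    image: myproject:latest
```

COMMIT_HASH can then be exported before building, e.g. export COMMIT_HASH=$(git rev-parse HEAD).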