Disable cache for specific RUN commands - docker

I have a few RUN commands in my Dockerfile that I would like to run with -no-cache each time I build a Docker image.
I understand the docker build --no-cache will disable caching for the entire Dockerfile.
Is it possible to disable cache for a specific RUN command?

There's always an option to insert some meaningless and cheap-to-run command before the region you want to disable cache for.
As proposed in this issue comment, one can add a build argument block (name can be arbitrary):
ARG CACHEBUST=1
before such region, and modify its value each run by adding --build-arg CACHEBUST=$(date +%s) as a docker build argument (value can also be arbitrary, here it is current datetime, to ensure its uniqueness across runs).
This will, of course, disable cache for all following blocks too, as hash sum of the intermediate image will be different, which makes truly selective cache disabling a non-trivial problem, taking into account how docker currently works.

Use
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
before the RUN line you want to always run. This works because ADD will always fetch the file/URL and the above URL generates random data on each request, Docker then compares the result to see if it can use the cache.
I have also tested this and works nicely since it does not require any additional Docker command line arguments and also works from a Docker-compose.yaml file :)

If your goal is to include the latest code from Github (or similar), one can use the Github API (or equivalent) to fetch information about the latest commit using an ADD command.
docker build will always fetch an URL from an ADD command, and if the response is different from the one received last time docker build ran, it will not use the subsequent cached layers.
eg.
ADD "https://api.github.com/repos/username/repo_name/commits?per_page=1" latest_commit
RUN curl -sLO "https://github.com/username/repo_name/archive/main.zip" && unzip main.zip

As of February 2016 it is not possible.
The feature has been requested at GitHub

Not directly but you can divide your Dockerfile in several parts, build an image, then FROM thisimage at the beginning of the next Dockerfile, and build the image with or without caching

the feature added a week ago.
ARG FOO=bar
FROM something
RUN echo "this won't be affected if the value of FOO changes"
ARG FOO
RUN echo "this step will be executed again if the value of FOO changes"
FROM something-else
RUN echo "this won't be affected because this stage doesn't use the FOO build-arg"
https://github.com/moby/moby/issues/1996#issuecomment-550020843

Building on #Vladislav’s solution above I used in my Dockerfile
ARG CACHEBUST=0
to invalidate the build cache from hereon.
However, instead of passing a date or some other random value, I call
docker build --build-arg CACHEBUST=`git rev-parse ${GITHUB_REF}` ...
where GITHUB_REF is a branch name (e.g. main) whose latest commit hash is used. That means that docker’s build cache is being invalidated only if the branch from which I build the image has had commits since the last run of docker build.

I believe that this is a slight improvement on #steve's answer, above:
RUN git clone https://sdk.ghwl;erjnv;wekrv;qlk#gitlab.com/your_name/your_repository.git
WORKDIR your_repository
# Calls for a random number to break the cahing of the git clone
# (https://stackoverflow.com/questions/35134713/disable-cache-for-specific-run-commands/58801213#58801213)
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache
RUN git pull
This uses the Docker cache of the git clone, but then runs an uncached update of the repository.
It appears to work, and it is faster - but many thanks to #steve for providing the underlying principles.

Another quick hack is to write some random bytes before your command
RUN head -c 5 /dev/random > random_bytes && <run your command>
writes out 5 random bytes which will force a cache miss

Related

Right way to use secret flag in docker buildkit

I am struggling with the same problem mentioned by Gavin on
this question.
Specifically in with new
docker build secret information
What is the right way to use it that feature?
Looking around on the internet I only found some variations of the same example in docker documentation mentioned above, which prints the secret on build time. Maybe I didn't fully understand the example, so please help me.
If there is no way to get the secret in build time an use in another part of a Dockerfile (e.g. an ARG variable or RUN command), when and how that new feature can be used to truly keep my secret safe and also do the work?
My go is to use this new feature in build time and also keep my secret information safe in case someone get my image file and execute a history on it.
For example, ff I have a Dockerfile like this:
FROM influxdb:2.0
ENV DOCKER_INFLUXDB_INIT_MODE=setup
ENV DOCKER_INFLUXDB_INIT_USERNAME=admin
ENV DOCKER_INFLUXDB_INIT_PASSWORD="mypassword"
How can I use that new feature mentioned in docker documentation to set my variable (DOCKER_INFLUXDB_INIT_PASSWORD), for example, in a way that it will not logged in image history?
Thanks in advance
How can I use that new feature mentioned in docker documentation to
set my variable (DOCKER_INFLUXDB_INIT_PASSWORD), for example, in a way
that it will not logged in image history?
It depends on whether you need the secret only at build time, or
whether you actually want to use it at runtime. The latter situation
is probably more common, and there's already a canonical solution:
If you want to keep DOCKER_INFLUXDB_INIT_PASSWORD out of your image
history, just don't set it during the build process. Require it to be
set a runtime, e.g., via the -e VAR=VALUE argument to docker run
(or the --env-file option):
docker run -e DOCKER_INFLUXDB_INIT_PASSWORD=mypassword myimage
You could have an ENTRYPOINT script that checks for the variable at
runtime and exits with a helpful error message if it's not set.
The official Docker images for things like MySQL and PostgreSQL work
this way.
In contrast, a build secret is meant for secrets that are only
required at build time. For example, let's say your build process
needs to do something like this:
RUN curl -o /data/mydataset -u username:password \
https://example.com/dataset
You'd like to share your Dockerfile and associated sources with
other people, but you don't want to share your username and password.
This is where build secrets come in. You would instead stick your
authentication information in a file, and modify your Dockerfile to
read that information from a secret:
RUN --mount=type=secret,id=mysecret \
curl -o /data/mydataset -u $(cat /run/secrets/mysecret) \
https://example.com/dataset
In this example, once we've copied the remote file into the image,
we're done with the secret: we don't need it in order to run a
container from the image; it was only required during the build
process.
NB: The above assumes that you're providing the secret at build time as described in the documentation, so your build command might look something like:
DOCKER_BUILDKIT=1 docker build --secret id=mysecret,src=mysecret.txt -t myimage .

Docker build not using cache when copying Gemfile while using --cache-from

On my local machine, I have built the latest image, and running another docker build uses cache everywhere it should.
Then I upload the image to the registry as the latest, and then on my CI server, I'm pulling the latest image of my app in order to use it as the build cache to build the new version :
docker pull $CONTAINER_IMAGE:latest
docker build --cache-from $CONTAINER_IMAGE:latest \
--tag $CONTAINER_IMAGE:$CI_COMMIT_SHORT_SHA \
.
From the build output we can see the COPY of the Gemfile is not using the cache from the latest image, while I haven't updated that file :
Step 15/22 : RUN gem install bundler -v 1.17.3 && ln -s /usr/local/lib/ruby/gems/2.2.0/gems/bundler-1.16.0 /usr/local/lib/ruby/gems/2.2.0/gems/bundler-1.16.1
---> Using cache
---> 47a9ad7747c6
Step 16/22 : ENV BUNDLE_GEMFILE=$APP_HOME/Gemfile BUNDLE_JOBS=8
---> Using cache
---> 1124ad337b98
Step 17/22 : WORKDIR $APP_HOME
---> Using cache
---> 9cd742111641
Step 18/22 : COPY Gemfile $APP_HOME/
---> f7ff0ee82ba2
Step 19/22 : COPY Gemfile.lock $APP_HOME/
---> c963b4c4617f
Step 20/22 : RUN bundle install
---> Running in 3d2cdf999972
Aside node : It is working perfectly on my local machine.
Looking at the Docker documentation Leverage build cache doesn't seem to explain the behaviour here as neither the Dockerfile, nor the Gemfile has changed, so the cache should be used.
What could prevent Docker from using the cache for the Gemfile?
Update
I tried to copy the files setting the right permissions using COPY --chown=user:group source dest but it still doesn't use the cache.
Opened Docker forum topic: https://forums.docker.com/t/docker-build-not-using-cache-when-copying-gemfile-while-using-cache-from/69186
Let me share with you some information that helped me to fix some issues with Docker build and --cache-from, while optimizing a CI build.
I had struggled for several days because I didn't have the correct understanding, I was basing myself on incorrect explanations found on the webs.
So I'm sharing this here hoping it will be useful to you.
When providing multiple --cache-from, the order matters
The order is very important, because at the first match, Docker will stop looking for other matches and it will use that one for all the rest of the commands.
This is explained by the person who implemented the feature in the Github PR:
When using multiple --cache-from they are checked for a cache hit in the order that user specified. If one of the images produces a cache hit for a command only that image is used for the rest of the build.
There is also a lenghtier explanation in the initial ticket proposal:
Specifying multiple --cache-from images is bit problematic. If both images match there is no way(without doing multiple passes) to figure out what image to use. So we pick the first one(let user control the priority) but that may not be the longest chain we could have matched in the end. If we allow matching against one image for some commands and later switch to a different image that had a longer chain we risk in leaking some information between images as we only validate history and layers for cache. Currently I left it so that if we get a match we only use this target image for rest of the commands.
Using --cache-from is exclusive: the local Docker cache won't be used
This means that it doesn't add new caching sources, the image tags you provide will be the only caching source for the Docker build.
Even if you just built the same image locally, the next time you run docker build for it, in order to benefit from the cache, you need to either:
provide the correct tag(s) with --cache-from (and with the correct precedence); or
not use --cache-from at all (so that it will use the local build cache)
If the parent image changes, the cache will be invalidated
For example, if you have an image based on docker:stable, and docker:stable gets updated, the cached builds of your image will not be valid anymore as the layers of the base image were changed.
This is why, if you're configuring a CI build, it can be useful to docker pull the base image as well and include it in the --cache-from, as mentioned in this comment in yet another Github discussion.
I struggled with this problem, and in my case I used COPY when the checksum might have changed (but only technically, the content was functionally identical). So, I worked around this way:
Dockerfile:
ARG builder_image=base-builder
# Compilation/build stage
FROM golang:1.16 AS base-builder
RUN echo "build the app" > /go/app
# This step is required to facilitate docker cache. With the definition of a `builder_image` build tag
# we can essentially skip the build stage and use a prebuilt-image directly.
FROM $builder_image AS builder
# myapp docker image
FROM ubuntu:20.04 AS myapp
COPY --from=builder /go/app /opt/my-app/bin/
Then, I can run the following:
# build cache
DOCKER_BUILDKIT=1 docker build --target base-builder -t myapp-builder .
docker push myapp-builder
# use cache
DOCKER_BUILDKIT=1 docker build --target myapp --build-arg=builder_image=myapp-builder -t myapp .
docker push myapp
This way we can force Docker to use a prebuilt image as a cache.
For whoever is fighting with DockerHub automated builds and --cache-from. I realized images built from DockerHub would always lead to cache bust on COPY commands when pulled and used as build cache source. It seems to be also the case for #Marcelo (refs his comment).
I investigated by creating a very simple image doing a couple of RUN commands and later COPY. Everything is using the cache except the COPY. Even though content and permissions of the file being copied is the same on both the pulled image and the one built locally (verified via sha1sum and ls -l).
The solution for me was to publish the image to the registry from the CI (Travis in my case) rather than letting DockerHub automated build doing it. Let me emphasis here that I'm talking here about a specific case where files are definitely the same and should not cache bust, but you're using DockerHub automated builds.
I'm not sure why is that, but I know for instance old docker-engine version e.g. prior 1.8.0 didn't ignore file timestamp to decide whether to use the cache or not, refs https://docs.docker.com/release-notes/docker-engine/#180-2015-08-11 and https://github.com/moby/moby/pull/12031.
For a COPY command to be cached, the checksum needs to be identical on the source being copied. You can compare the checksum in the docker history output between the cache image and the one you just built. Most importantly, the checksum includes metadata like the file owner and file permission, in addition to file contents. Whitespace changes inside a file like changing to linefeeds between Linux and Windows styles will also affect this. If you download the code from a repo, it's likely the metadata, like the owner, will be different from the cached value.

Selecting different code branches when using a shared base image in Docker

I am containerising a codebase that serves multiple applications. I have created three images;
app-base:
FROM ubuntu
RUN apt-get install package
COPY ./app-code /code-dir
...
app-foo:
FROM app-base:latest
RUN foo-specific-setup.sh
and app-buzz which is very similar to app-foo.
This works currently, except I want to be able to build versions of app-foo and app-buzz for specific code branches and versions. It's easy to do that for app-base and tag appropriately, but app-foo and app-buzz can't dynamically select that tag, they are always pinned to app-base:latest.
Ultimately I want this build process automated by Jenkins. I could just dynamically re-write the Dockerfile, or not have three images and just have two nearly-but-not-quite identical Dockerfiles for each app that would need to be kept in sync manually (later increasing to 4 or 5). Each of those solutions has obvious drawbacks however.
I've seen lots of discussions in the past about things such as an INCLUDE statement, or dynamic tags. None seemed to come to anything.
Does anyone have a working, clean(ish) solution to this problem? As long as it means Dockerfile code can be shared across images, I'd be happy. If it also means that the shared layers of images don't need to be rebuilt for each app, then even better.
You could still use build args to do this.
Dockerfile:
FROM ubuntu
ARG APP_NAME
RUN echo $APP_NAME-specific-setup.sh >> /root/test
ENTRYPOINT cat /root/test
Build:
docker build --build-arg APP_NAME=foo -t foo .
Run:
$ docker run --rm foo
foo-specific-setup.sh
In your case you could run the correct script in the RUN using the argument you just set before. You would have one Dockerfile per app-base variant and run the correct set-up based on the build argument.
FROM ubuntu
RUN apt-get install package
COPY ./app-code /code-dir
ARG APP_NAME
RUN $APP_NAME-specific-setup.sh
Any layers before setting the ARG would not need to be rebuilt when creating other versions.
You can then push the built images to separate docker repositories for each app.
If your apps need different ENTRYPOINT instructions, you can have an APP_NAME-entrypoint.sh per app and rename it to entrypoint.sh within your APP_NAME-specific-setup.sh (or pass it through as an argument to run).

Check if docker used cache for build or not

is there any way to check if Docker used cache on every step of Docker build ?
Return value is 0 for successfull build - not saying anything about whether steps have been performed using cache or not.
I'm executing docker commands in bash script running in circleci environment and I'd like to skip Docker save, if every build step ran through cache.
Thanks for the answer.
I suspect the easiest way is to compare at the image ID - if this hasn't changed, the cache must have been used.
One interesting thing about the cache is that once one command invalidates it, all following commands skip it. From the docs:
Once the cache is invalidated, all subsequent Dockerfile commands will generate new images and the cache will not be used.
This means that if your last step is cached, all other steps before it were cached too - and your image has not changed.

Re-running Docker only until a certain step using caches?

Is there any option to force Docker to run a build without using caches from that step on?
A particular usecase is something like this:
...
ADD some.cfg some.cfg
RUN do something with some.cfg
While working on the Dockerfile it is often necessary to adjust configurations and test them.
From the Docker point of view the steps remain unmodified however from my point as a Dockerfile
write I want Docker to run the build using caches until the ADD operation and from that point on
without caches. Is this possible?
As Mykola suggests, Docker will take a checksum of the files and should invalidate the cache if the content changes. However, you can always force cache invalidation at a given line in a Dockerfile by setting or changing an environment variable at that point e.g:
...
ENV updated-adds-on 14-DEC-14
ADD...
According to this, you don't have to disable cache on the added file change as docker will examine the contents and skip cache on change. Otherwise
you can use the --no-cache=true option on the docker build command.

Resources