Bust cache bust within Dockerfile without providing external build args

Bust cache bust within Dockerfile without providing external build args - docker

We have a Dockerfile, where at a certain point would like no caching to happen.
Currently we are using
ENV CACHE_BUST=$($RANDOM)
Upon further inspection, funny enough that gets cached:
Step 1/1 : ENV CACHE_BUST=$($RANDOM)
---> Using cache
Is there any way from inside the Dockerfile to bust the cache without passing in a unique build-arg (Like docker build . --build-arg CACHE_BUST=$(date +%s)) in the build step?

Update: Reviewing this one, it looks like you injected the cache busting option incorrectly in two ways:
ENV is not an ARG
The $(x) syntax is not a variable expansion, you need curly brackets (${}), not parenthesis ($()).
To break the cache on the next run line, the syntax is:
ARG CACHE_BUST
RUN echo "command with external dependencies"
And then build with:
docker build --build-arg CACHE_BUST=$(date +%s) .
Why does that work? Because during the build, the values for ARG are injected into RUN commands as environment variables. Changing an environment variable results in a cache miss on the new build.
To bust the cache, one of the inputs needs to change. If the command being run is the same, the cache will be reused even if the command has external dependencies that have changed, since docker cannot see those external dependencies.
Options to work around this include:
Passing a build arg that changes (e.g. setting it to a date stamp).
Changing a file that gets included into the image with COPY or ADD.
Running your build with the --no-cache option.
Since you do not want to do option 1, there is a way to do option 3 on a specific line, but only if you can split up your Dockerfile into 2 parts. The first Dockerfile has all the lines as you have today up to the point you want to break the cache. Then the second Dockerfile has a FROM line to depend on the first Dockerfile, and you build that with the --no-cache option. E.g.
Dockerfile1:
FROM base
RUN normal steps
Dockerfile2
FROM intermediate
RUN curl external.jar>file.jar
RUN other lines that cannot be cached
CMD your cmd
Then build with:
docker build -f Dockerfile1 -t intermediate .
docker build -f Dockerfile2 -t final --no-cache .
The only other option I can think of is to make a new frontend with BuildKit that allows you to inject an explicit cache break, or unique variable that results in a cache break.

You can add ADD layer with the downloading of some dynamic page from stable source in the beginning of Dockerfile. Image will be always re-built without using a cache.
Just an example of Dockerfile:
FROM alpine:3.9
ADD https://google.com cache_bust
RUN apk add --no-cache wget
p.s. I believe you are aware of docker build --no-cache option.

Related

Docker question: Does Docker link FROM images at runtime or during build?

Does Docker link images at build time or execution time? I am presenting a very simple case here to demonstrate my dilemma but I am presently rebuilding a chain of Docker files every time I change the base docker image and Im not sure if I even need to.
I have two docker files:
my base-docker-file contains
# this is not based on another image
RUN build commands here
which I build with
docker build --no-cache -t base-image -f base-docker-file.
and my from-docker-file here:
# this is based on the base-image
FROM base-image
RUN build commands here
which I build with
docker build --no-cache -t from-image -f from-docker-file.
Based on this simple setup, do I need to rebuild my from-image whenever I make a change to my base-image or does image linking occur at run time?

you will have to build each time if you make a change to the base-image. You could add these individual commands in a script to be executed together rather than running them separate everytime

1. Multi-stage build
Multi-stage build allows you to isolate images in one Dockerfile. 1
Example:
# syntax=docker/dockerfile:1
FROM alpine:latest AS builder
RUN apk --no-cache add build-base
FROM builder
COPY source1.cpp source.cpp
RUN g++ -o /binary source.cpp
2. Turn off no-cache when building the base-image.
Simple. If there are no changes in the base image, it will not rebuild due to cache.

How can I get the output of the echo command?

I have a Dockerfile with a RUN instruction like this:
RUN echo "$PWD"
When generating the Docker image I only get this in console:
Step 6/17 : RUN echo "$PWD"
---> Using cache
---> 0e27924953b9
How can I get the output of the echo command in my console?

Each line in the dockerfile creates a new layer in the resulting filesystem. Unless a modification is made to your dockerfile that hasn't previously been encountered, Docker optimizes the build by reusing existing layers that have previously been built. You can see these intermediate images with the command docker images --all.
This means that Docker only needs to build from the 1st changed line in the dockerfile onwards, saving much time with repeated builds on a well-crafted dockerfile. You've already built this layer in a previous build, so it's being skipped and taken from cache.
docker build --no-cache .
should prevent the build process from using cached layers.
Change the final path parameter above to suit your environment.

How to create a custom Docker image by applying a custom patch to a file without having status 'Exited (0)' [duplicate]

I'm confused about when should I use CMD vs RUN. For example, to execute bash/shell commands (i.e. ls -la) I would always use CMD or is there a situation where I would use RUN? Trying to understand the best practices about these two similar Dockerfile directives.

RUN is an image build step, the state of the container after a RUN command will be committed to the container image. A Dockerfile can have many RUN steps that layer on top of one another to build the image.
CMD is the command the container executes by default when you launch the built image. A Dockerfile will only use the final CMD defined. The CMD can be overridden when starting a container with docker run $image $other_command.
ENTRYPOINT is also closely related to CMD and can modify the way a container is started from an image.

RUN - command triggers while we build the docker image.
CMD - command triggers while we launch the created docker image.

I found this article very helpful to understand the difference between them:
RUN -
RUN instruction allows you to install your application and packages
required for it. It executes any commands on top of the current image
and creates a new layer by committing the results. Often you will find
multiple RUN instructions in a Dockerfile.
CMD -
CMD instruction allows you to set a default command, which will be
executed only when you run container without specifying a command.
If Docker container runs with a command, the default command will be
ignored. If Dockerfile has more than one CMD instruction, all but last
CMD instructions are ignored.

The existing answers cover most of what anyone looking at this question would need. So I'll just cover some niche areas for CMD and RUN.
CMD: Duplicates are Allowed but Wasteful
GingerBeer makes an important point: you won't get any errors if you put in more than one CMD - but it's wasteful to do so. I'd like to elaborate with an example:
FROM busybox
CMD echo "Executing CMD"
CMD echo "Executing CMD 2"
If you build this into an image and run a container in this image, then as GingerBeer states, only the last CMD will be heeded. So the output of that container will be:
Executing CMD 2
The way I think of it is that "CMD" is setting a single global variable for the entire image that is being built, so successive "CMD" statements simply overwrite any previous writes to that global variable, and in the final image that's built the last one to write wins. Since a Dockerfile executes in order from top to bottom, we know that the bottom-most CMD is the one gets this final "write" (metaphorically speaking).
RUN: Commands May not Execute if Images are Cached
A subtle point to notice about RUN is that it's treated as a pure function even if there are side-effects, and is thus cached. What this means is that if RUN had some side effects that don't change the resultant image, and that image has already been cached, the RUN won't be executed again and so the side effects won't happen on subsequent builds. For example, take this Dockerfile:
FROM busybox
RUN echo "Just echo while you work"
First time you run it, you'll get output such as this, with different alphanumeric IDs:
docker build -t example/run-echo .
Sending build context to Docker daemon 9.216kB
Step 1/2 : FROM busybox
---> be5888e67be6
Step 2/2 : RUN echo "Just echo while you work"
---> Running in ed37d558c505
Just echo while you work
Removing intermediate container ed37d558c505
---> 6f46f7a393d8
Successfully built 6f46f7a393d8
Successfully tagged example/run-echo:latest
Notice that the echo statement was executed in the above. Second time you run it, it uses the cache, and you won't see any echo in the output of the build:
docker build -t example/run-echo .
Sending build context to Docker daemon 9.216kB
Step 1/2 : FROM busybox
---> be5888e67be6
Step 2/2 : RUN echo "Just echo while you work"
---> Using cache
---> 6f46f7a393d8
Successfully built 6f46f7a393d8
Successfully tagged example/run-echo:latest

RUN - Install Python , your container now has python burnt into its image
CMD - python hello.py , run your favourite script

Note: Don’t confuse RUN with CMD. RUN actually runs a command and
commits the result; CMD does not execute anything at build time, but
specifies the intended command for the image.
from docker file reference
https://docs.docker.com/engine/reference/builder/#cmd

RUN Command:
RUN command will basically, execute the default command, when we are building the image. It also will commit the image changes for next step.
There can be more than 1 RUN command, to aid in process of building a new image.
CMD Command:
CMD commands will just set the default command for the new container. This will not be executed at build time.
If a docker file has more than 1 CMD commands then all of them are ignored except the last one. As this command will not execute anything but just set the default command.

RUN: Can be many, and it is used in build process, e.g. install multiple libraries
CMD: Can only have 1, which is your execute start point (e.g. ["npm", "start"], ["node", "app.js"])

There has been enough answers on RUN and CMD. I just want to add a few words on ENTRYPOINT. CMD arguments can be overwritten by command line arguments, while ENTRYPOINT arguments are always used.
This article is a good source of information.

force a docker build to 'rebuild' a single step

I know docker has a --no-cache=true option to force a clean build of a docker image. For me however, all I'd really like to do is force the last step to run in my dockerfile, which is a CMD command that runs a shell script.
For whatever reason, when I modify that script and save it, a typical docker build will reuse the cached version of that step. Is there a way to force docker not to do so, just on that one portion?

Note that this would invalidate the cache for all Dockerfile directives after that line. This is requested in Issue 1996 (not yet implemented, and now (2021) closed), and issue 42799 (mentioned by ub-marco in the comments).
The current workaround is:
FROM foo
ARG CACHE_DATE=2016-01-01
<your command without cache>
docker build --build-arg CACHE_DATE=$(date) ....
That would invalidate cache after the ARG CACHE_DATE line for every build.
acdcjunior reports in the comments having to use:
docker build --build-arg CACHE_DATE=$(date +%Y-%m-%d_%H:%M:%S)
Another workaround from azul:
Here's what I am using to rebuild in CI if changes in git happened:
export LAST_SERVER_COMMIT=`git ls-remote $REPO "refs/heads/$BRANCH" | grep -o "^\S\+"`
docker build --build-arg LAST_SERVER_COMMIT="$LAST_SERVER_COMMIT"
And then in the Dockerfile:
ARG LAST_SERVER_COMMIT
RUN git clone ...
This will only rebuild the following layers if the git repo actually changed.

Create dynamic environment variables at build time in Docker

My specific use case is that I want to organize some data about the EC2 instance a container is running on and make i available as an environment variable. I'd like to do this when the container is built.
I was hoping to be able to do something like ENV VAR_NAME $(./script/that/gets/var) in my Dockerfile, but unsurprisingly that does not work (you just get the string $(./script...).
I should mention that I know the docker run --env... will do this, but I specifically want it to be built into the container.
Am I missing something obvious? Is this even possible?

Docker v1.9 or newer
If you are using Docker v1.9 or newer, this is possible via support for build time arguments. Arguments are declared in the Dockerfile by using the ARG statement.
ARG REQUIRED_ARGUMENT
ARG OPTIONAL_ARGUMENT=default_value
When you later actually build your image using docker build you can pass arguments via the flag --build-arg as described in the docker docs.
$ docker build --build-arg REQUIRED_ARGUMENT=this-is-required .
Please note that it is not recommended to use build-time variables for passwords or secrets such as keys or credentials.
Furthermore, build-time variables may have great impact on caching. Therefore the Dockerfile should be constructed with great care to be able to utilize caching as much as possible and therein speed up the building process.
Edit: the "docker newer than v1.9"-part was added after input from leedm777:s answer.
Docker before v1.9
If you are using a Docker-version before 1.9, the ARG/--build-arg approach was not possible. You couldn't resolve this kind of info during the build so you had to pass them as parameters to the docker run command.
Docker images are to be consistent over time whereas containers can be tweaked and considered as "throw away processes".
More info about ENV
A docker discussion about dynamic builds
The old solution to this problem was to use templating. This is not a neat solution but was one of very few viable options at the time. (Inspiration from this discussion).
save all your dynamic data in a json or yaml file
create a docker file "template" where the dynamic can later be expanded
write a script that creates a Dockerfile from the config data using some templating library that you are familiar with

Docker 1.9 has added support for build time arguments.
In your Dockerfile, you add an ARG statement, which has a similar syntax to ENV.
ARG FOO_REQUIRED
ARG BAR_OPTIONAL=something
At build time, you can pass pass a --build-arg argument to set the argument for that build. Any ARG that was not given a default value in the Dockerfile must be specified.
$ docker build --build-arg FOO_REQUIRED=best-foo-ever .

To build ENV VAR_NAME $(./script/that/gets/var) into the container, create a dynamic Dockerfile at build time:
$ docker build -t awesome -f Dockerfile .
$ # get VAR_NAME value:
$ VAR_VALUE=`docker run --rm awesome \
bash -c 'echo $(./script/that/gets/var)'`
$ # use dynamic Dockerfile:
$ {
echo "FROM awesome"
echo "ENV VAR_NAME $VAR_VALUE"
} | docker build -t awesome -
https://github.com/42ua/docker-autobuild/blob/master/emscripten-sdk/README.md#build-docker-image

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart