I have three Dockerfiles with two of them providing base images to their siblings as follows:
# Dockerfile.a (for building image-a)
FROM debian:latest
# ...
# Dockerfile.b (for building image-b)
ARG INDEX
FROM image-a
# ...
ENV INDEX=$INDEX
# ...
# Dockerfile.c (for building image-c)
ARG INDEX
FROM image-b
# ...
ENV INDEX=$INDEX
# ...
Builds are done with: for ABC in a b c; do podman build -f Dockerfile.$ABC -t image-$ABC:latest --build-arg INDEX=1 ...; done
What I currently observe is that in a running instance of image-c $INDEX is empty, i.e. does not equal 1. Am I making an obvious mistake here? Is the repeated use of INDEX both as ARG and ENV and across the chain of inheritance perhaps interfering with the normal semantics of ARG INDEX ... ENV INDEX=$INDEX in Dockerfile.c somehow?
Use ARG before FROM if you need to parametrize your base image. In order to use the same build argument later in the build, you have to declare the ARG again after FROM:
ARG INDEX
# parametrized base image, just as an example
FROM image-b-${INDEX}
ARG INDEX
ENV INDEX=${INDEX}
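Applied to the Dockerfiles from the question, a minimal sketch (keeping the image-a/image-b names from the question) might look like this:
# Dockerfile.b
FROM image-a
# re-declare the build arg after FROM so it is in scope for ENV
ARG INDEX
ENV INDEX=$INDEX
# Dockerfile.c
FROM image-b
ARG INDEX
ENV INDEX=$INDEX
Built as before with --build-arg INDEX=1, a container started from image-c should then report INDEX=1. Note that the ENV set in image-b is inherited by image-c anyway, so re-declaring it in Dockerfile.c mainly matters if you want to be able to override it there.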
Related
Currently I have 2 Dockerfiles, Dockerfile-py2:
FROM python:2.7
# stuff
and Dockerfile-py3:
FROM python:3.4
# stuff
where both instances of # stuff are identical.
I build two docker images using an invoke task:
from invoke import task

@task
def docker(ctx):
    """Build docker images.
    """
    tag = ctx.run('git log -1 --pretty=%h').stdout.strip()
    for pyversion in '23':
        name = 'myrepo/myimage{pyversion}'.format(pyversion=pyversion)
        image = '{name}:{tag}'.format(name=name, tag=tag)
        latest = '{name}:latest'.format(name=name)
        ctx.run('docker build -t {image} -f Dockerfile-py{pyversion} .'.format(image=image, pyversion=pyversion))
        ctx.run('docker tag {image} {latest}'.format(image=image, latest=latest))
        ctx.run('docker push {name}'.format(name=name))
Is there any way to prevent the duplication of # stuff, so that I can't get into a situation where someone edits one file but not the other?
Here is one way, using a Dockerfile ARG along with docker build --build-arg:
ARG version
FROM python:${version}
RUN echo "$(python --version)"
# stuff
Now you build for python2.7 like so:
docker build -t myimg/tmp --build-arg version=2.7 .
In the output you will see:
Step 3/3 : RUN echo "$(python --version)"
---> Running in 06e28a29a3d2
Python 2.7.16
And in the same way, for python3.4:
docker build -t myimg/tmp --build-arg version=3.4 .
In the output you will see:
Step 3/3 : RUN echo "$(python --version)"
---> Running in 2283edc1b65d
Python 3.4.10
As you can imagine, you can also set a default value for ${version} in your Dockerfile:
ARG version=3.4
FROM python:${version}
RUN echo "$(python --version)"
# stuff
Now if you just do docker build -t myimg/tmp . you will build for python3.4. But you can still override with the previous two commands.
So to answer your question: no, you don't need two different Dockerfiles.
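For completeness, a rough shell equivalent of the invoke task above, building both versions from the single parameterized Dockerfile (repository name taken from the question), could look like:
for version in 2.7 3.4; do
    docker build -t "myrepo/myimage${version%%.*}:latest" --build-arg version="$version" .
done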
For certain reasons, I have to set the "http_proxy" and "https_proxy" ENV variables in my Dockerfile. I would now like to unset them, because there are also some build steps that can't be done through the proxy.
# dockerfile
# ... some process
ENV http_proxy=http://...
ENV https_proxy=http://...
# ... some process that needs the proxy to finish
UNSET ENV http_proxy # how do I unset the proxy ENV here?
UNSET ENV https_proxy
# ... some process that can't use the proxy
It depends on what effect you are trying to achieve.
Note that, as a matter of pragmatics (i.e. how developers actually speak), "unsetting a variable" can mean two things: removing it from the environment, or setting the variable to an empty value. Technically, these are two different operations. In practice though I have not run into a case where the software I'm trying to control differentiates between the variable being absent from the environment, and the variable being present in the environment but set to an empty value. I generally can use either method to get the same result.
If you don't care whether the variable is in the layers produced by Docker, but leaving it with a non-empty value causes problems in later build steps.
For this case, you can use ENV VAR_NAME= at the point in your Dockerfile from which you want to unset the variable. Syntactic note: Docker allows two syntaxes for ENV: this ENV VAR=1 is the same as ENV VAR 1. You can separate the variable name from the value with a space or an equal sign. When you want to "unset" a variable by setting it to an empty value you must use the equal sign syntax or you get an error at build time.
So for instance, you could do this:
ENV NOT_SENSITIVE some_value
RUN something
ENV NOT_SENSITIVE=
RUN something_else
When something runs, NOT_SENSITIVE is set to some_value. When something_else runs, NOT_SENSITIVE is set to the empty string.
It is important to note that doing unset NOT_SENSITIVE as a shell command will not affect anything else than what executes in this shell. Here's an example:
ENV NOT_SENSITIVE some_value
RUN unset NOT_SENSITIVE && printenv NOT_SENSITIVE || echo "does not exist"
RUN printenv NOT_SENSITIVE
The first RUN will print does not exist because NOT_SENSITIVE is unset when printenv executes and because it is unset printenv returns a non-zero exit code which causes the echo to execute. The second RUN is not affected by the unset in the first RUN. It will print some_value to the screen.
But what if I need to remove the variable from the environment, not just set it to an empty value?
In this case using ENV VAR_NAME= won't work. I don't know of any way to tell Docker "from this point on, you must remove this variable from the environment, not just set it to an empty value".
If you still want to use ENV to set your variable, then you'll have to start each RUN in which you want the variable to be unset with unset VAR_NAME, which will unset it for that specific RUN only.
If you want to prevent the variable from being present in the layers produced by Docker.
Suppose that variable contains a secret and the layer could fall into the hands of people who should not have the secret. In this case you CANNOT use ENV to set the variable. A variable set with ENV is baked into the layers to which it applies and cannot be removed from those layers. In particular, (assuming the variable is named SENSITIVE) running
RUN unset SENSITIVE
does not do anything to remove it from the layer. The unset command above only removes SENSITIVE from the shell process that RUN starts. It affects only that shell. It won't affect shells spawned by CMD, ENTRYPOINT, or any command provided through running docker run at the command line.
In order to prevent the layers from containing the secret, I would use docker build --secret= and RUN --mount=type=secret.... For instance, assuming that I've stored my secret in a file named sensitive, I could have a RUN like this:
RUN --mount=type=secret,id=sensitive,target=/root/sensitive \
    export SENSITIVE=$(cat /root/sensitive) \
    && [[ ... do stuff that requires SENSITIVE ... ]]
Note that the command given to RUN does not need to end with unset SENSITIVE. Due to the way processes and their environments are managed, setting SENSITIVE in the shell spawned by RUN does not have any effect beyond what that shell itself spawns. Environment changes in this shell won't affect future shells, nor will they affect what Docker bakes into the layers it creates.
Then the build can be run with:
$ DOCKER_BUILDKIT=1 docker build --secret id=sensitive,src=path/to/sensitive [...]
The environment for the docker build command needs DOCKER_BUILDKIT=1 to use BuildKit because this method of passing secrets is only available if Docker uses BuildKit to build the images.
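If you want to double-check, inspecting the resulting image's environment (the image name here is a placeholder) should show no SENSITIVE entry at all:
docker inspect -f '{{.Config.Env}}' my-image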
If one needs env vars during the image build but they should not persist, just clear them. In the following example, the running container shows empty env vars.
Dockerfile
# set proxy
ARG http_proxy
ARG https_proxy
ARG no_proxy
ENV http_proxy=$http_proxy
ENV https_proxy=$https_proxy
ENV no_proxy=$no_proxy
# ... do stuff that needs the proxy during the build, like apt-get, curl, et al.
# unset proxy
ENV http_proxy=
ENV https_proxy=
ENV no_proxy=
build.sh
docker build -t the-image \
--build-arg http_proxy="$http_proxy" \
--build-arg https_proxy="$https_proxy" \
--build-arg no_proxy="$no_proxy" \
--no-cache \
.
run.sh
docker run --rm -i \
the-image \
sh << COMMANDS
env
COMMANDS
Output
no_proxy=
https_proxy=
http_proxy=
...
According to the Docker docs, you need to use a shell command instead:
FROM alpine
RUN export ADMIN_USER="mark" \
&& echo $ADMIN_USER > ./mark \
&& unset ADMIN_USER
CMD sh
See https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#env for more details.
Short answer:
Try to avoid unnecessary environment variables, so you don't need to unset them.
In case you have to unset for a command you can do the following:
RUN unset http_proxy https_proxy no_proxy \
&& execute_your_command_here
In case you have to unset for the built image you can do the following:
FROM ubuntu_with_http_proxy
ENV http_proxy= \
https_proxy= \
no_proxy=
Once environment variables are set using the ENV instruction, we can't really unset them, as detailed here:
Each ENV line creates a new intermediate layer, just like RUN commands. This means that even if you unset the environment variable in a future layer, it still persists in this layer and its value can be dumped.
See: Best practices for writing Dockerfiles
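To see that persistence for yourself, the layer history of an image built this way should still reveal the original ENV line with its value (the image name is a placeholder):
docker history --no-trunc the-image | grep -i http_proxy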
Details:
I prefer to define http_proxy as an argument during build like the following:
FROM ubuntu:20.04
ARG http_proxy=http://host.docker.internal:3128
ARG https_proxy=http://host.docker.internal:3128
ARG no_proxy=.your.domain,localhost,127.0.0.1,.docker.internal
On a corporate proxy we need authentication anyway, so we need to configure a local proxy server listening on 127.0.0.1:3128, which is accessible as host.docker.internal:3128 from containers. This way it also works on Docker Desktop if we connect to the corporate network over VPN (with the local/home network blocked).
Setting no_proxy is also important to avoid flooding the proxy server.
See the following article for more details on no_proxy related topics:
Can we standardize NO_PROXY?
Sometimes it is also good to read the related documentation:
ENV
ARG
In case we need to configure those environment variables, we can use the following commands:
during build (link):
docker build ... --build-arg http_proxy='http://alternative.proxy:3128/' ...
at runtime (link):
docker run ... --env http_proxy='http://alternative.proxy:3128/' ...
Also note that we don't even need to declare the proxy-related arguments, since those are already predefined according to the following section:
Dockerfile reference - Predefined ARGs
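As a sketch of what that means (proxy address reused from above), even a Dockerfile with no proxy-related ARG or ENV lines at all should pick the proxy up during the build:
FROM ubuntu:20.04
# no ARG/ENV needed: http_proxy is one of the predefined build args
RUN apt-get update
built with:
docker build --build-arg http_proxy=http://host.docker.internal:3128 .
If I read the reference correctly, these predefined proxy arguments are also excluded from docker history output, so the value does not end up in the image metadata.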
You can add the below lines to the Dockerfile:
ENV http_proxy ""
ENV https_proxy ""
I found the secret approach didn't work for me, because I needed the env variable to persist in the container when I ran it in interactive mode, but then needed to completely remove the variable in a later build stage for production.
What worked: when building the development stage, I appended the environment variable to the /root/.bashrc file:
RUN echo export AWS_PROFILE=role-name >> /root/.bashrc
In the production stage of the build I then removed the last line of /root/.bashrc:
RUN sed -i '$ d' /root/.bashrc
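Roughly, as a multi-stage sketch (the stage names and base image are made up):
FROM some-base AS dev
# development image keeps the profile in every interactive shell
RUN echo "export AWS_PROFILE=role-name" >> /root/.bashrc

FROM dev AS prod
# production image drops that last line again
RUN sed -i '$ d' /root/.bashrc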
I'm using a code generator tool which is provided as a Docker image with an ENTRYPOINT. I.e. for the manual use case I execute the following command line:
$ docker run --rm -v ${PWD}:/local some/codegen-image:latest \
generate ... parameters for code generator tool ...
So far, so good.
But I want to integrate the code generator image into my own multi-stage image build. I.e. the first stage should call the ENTRYPOINT of the base image to generate the code that will be consumed by the second stage:
# stage 1
FROM some/codegen-image:latest as codegen
... build set up steps for stage 1 ...
# now run ENTRYPOINT from base image, copy & pasted from the output of
#
# docker inspect -f '{{json .Config.Entrypoint}}' some/codegen-image:latest
#
RUN ["some_command", "option1", ..., "optionN", \
"generate", \
... parameters for code generator tool ... \
]
# stage 2
FROM some/other-image as stage2
... build set up steps for stage 2 ...
# copy-in generated code from stage 1
COPY --from=codegen /tmp/build/ .
This works but it violates the DRY principle, i.e. I need to update my Dockerfile every time the upstream project makes an incompatible change to its ENTRYPOINT.
Can I avoid the copy & paste from docker inspect output? My own research has turned up nothing so far...
Multi-stage Dockerfiles were introduced to optimize the overall size of the image (docs).
The FROM directive just brings in the content of the specified image; you have to explicitly tell the build what command should be executed.
The feature you are expecting is not yet supported by docker.
E.g. (hypothetical syntax):
FROM some/codegen-image:latest as codegen
ARGS_ENTRYPOINT_OF_CODEGEN ["generate","parameters"]
.
.
.
FROM some/other-image as stage2
COPY --from=codegen /tmp/build/ .
It seems your approach is correct at this moment and the only way around it.
I have a situation where I need to set an ENV based on runtime condition like thus:
RUN if [ "$RUNTIME" = "prod" ] then VARIABLE="Some Production URL"; else VARIABLE="Some QA URL"; fi;
ENV={VARIABLE}
Been looking at different solutions but none of them seem to be panning out (for example the basic one where VARIABLE is lost when RUN exits). What would be an elegant way to achieve this?
It is an unfortunate constraint that you only have this "dev/qa/prod" environment variable. However, it is possible to achieve what you want.
First, you might consider baking your environment-specific configuration into the image for all environments. (Normally I would discourage doing this!)
For example you can COPY three files into your image:
dev-env.sh: contains your dev config in the form:
ELASTICSEARCH_URL=http://elastic-dev:123
qa-env.sh (similar)
prod-env.sh (similar)
Then you evaluate at run-time (not at build-time) in which environment you are: You add an ENTRYPOINT script to your image which will source the correct file, depending on the ENVIRONMENT_NAME variable.
Dockerfile (part):
ENTRYPOINT ["docker-entrypoint.sh"]
docker-entrypoint.sh (copied into WORKDIR of the image):
#!/bin/bash
set -e
if [ "$ENVIRONMENT_NAME" = "prod" ]; then
source prod-env.sh
fi
# else if qa ... , else if dev ..., else fail
exec "$@"
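At runtime you then pick the environment when starting the container, e.g. (the image name is a placeholder):
docker run -e ENVIRONMENT_NAME=prod my-image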
This script will run when you launch the docker container, so this approach is not an option if you need the variables to be available to Dockerfile instructions (at build time).
Another (build-time) workaround is described here and consists of using temporary files to store environment variables across multiple image layers.
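A minimal sketch of that temporary-file idea (the file path is arbitrary, and RUNTIME is assumed to be a build arg) could look like this; note it only helps later RUN steps, not ENV instructions:
ARG RUNTIME=qa
RUN if [ "$RUNTIME" = "prod" ]; then echo "Some Production URL"; else echo "Some QA URL"; fi > /tmp/variable_url
RUN VARIABLE="$(cat /tmp/variable_url)" && echo "building against $VARIABLE"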
The literal conditional execution can be achieved with multistage build and ONBUILD.
ARG mode=prod
FROM alpine as img_prod
ONBUILD ENV ELASTICSEARCH_URL=whatever/for/prod
FROM alpine as img_qa
ONBUILD ENV ELASTICSEARCH_URL=whatever/for/qa
FROM img_${mode}
...
Then you build with docker build --build-arg mode=qa .
Wouldn't passing an env var with docker run be the solution you need? Something like this:
docker run -e YOUR_VARIABLE="Some Production URL" ...
Is there any way to make a build argument mandatory during docker build? The expected behaviour would be for the build to fail if the argument is missing.
For example, for the following Dockerfile:
FROM ubuntu
ARG MY_VARIABLE
ENV MY_VARIABLE $MY_VARIABLE
RUN ...
I would like the build to fail at ARG MY_VARIABLE when built with docker build -t my-tag . and pass when built with docker build -t my-tag --build-arg MY_VARIABLE=my_value ..
Is there any way to achieve that behaviour? Setting a default value doesn't really do the trick in my case.
(I'm running Docker 1.11.1 on darwin/amd64.)
EDIT:
One way of doing that I can think of is to run a command that fails when MY_VARIABLE is empty, e.g.:
FROM ubuntu
ARG MY_VARIABLE
RUN test -n "$MY_VARIABLE"
ENV MY_VARIABLE $MY_VARIABLE
RUN ...
but it doesn't seem to be a very idiomatic solution to the problem at hand.
I tested the RUN test -n "$MY_VARIABLE" approach that @konradstrack mentioned in the original (edited) post; it seems to do the job of mandating that the variable is passed as a build-time argument to the docker build command:
FROM ubuntu
ARG MY_VARIABLE
RUN test -n "$MY_VARIABLE"
ENV MY_VARIABLE $MY_VARIABLE
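With that in place, the two invocations from the question should behave as expected (the first fails at the RUN test step, the second builds successfully):
docker build -t my-tag .
docker build -t my-tag --build-arg MY_VARIABLE=my_value .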
You can also use shell parameter expansion to achieve this.
Let's say your mandatory build argument is called MANDATORY_BUILD_ARGUMENT, and you want it to be set and non-empty, your Dockerfile could look like this:
FROM debian:stretch-slim
MAINTAINER Evel Knievel <evel@kniev.el>
ARG MANDATORY_BUILD_ARGUMENT
RUN \
    # Check for mandatory build arguments
    : "${MANDATORY_BUILD_ARGUMENT:?Build argument needs to be set and non-empty.}" \
    # Install libraries
    && apt-get update \
    && apt-get install -y \
        cowsay \
        fortune \
    # Cleanup
    && apt-get clean \
    && rm -rf \
        /var/lib/apt/lists/* \
        /var/tmp/* \
        /tmp/*
CMD ["/bin/bash", "-c", "/usr/games/fortune | /usr/games/cowsay"]
Of course, you would also want to use the build argument for something, which I did not, but still, I recommend building this Dockerfile and taking it for a test run :)
EDIT
As mentioned in @Jeffrey Wen's answer, to make sure that this errors out on a centos:7 image (and possibly others; I admittedly haven't tested this on images other than stretch-slim):
Ensure that you're executing the RUN command with the bash shell.
RUN ["/bin/bash", "-c", ": ${MYUID:?Build argument needs to be set and not null.}"]
Another simple way:
RUN test -n "$MY_VARIABLE" || (echo "MY_VARIABLE not set" && false)
A long time ago I needed to introduce a required (mandatory) ARG and, for better UX, to include the check at the beginning:
FROM ubuntu:bionic
ARG MY_ARG
RUN [ -z "$MY_ARG" ] && echo "MY_ARG is required" && exit 1 || true
...
RUN ./use-my-arg.sh
But this busts the build cache for every single layer after the initial MY_ARG, because MY_ARG=VALUE is prepended to every RUN command afterwards.
Whenever I changed MY_ARG it would end up rebuilding the whole image, instead of rerunning the last RUN command only.
To bring caching back, I have changed my build to a multi-staged one:
The first stage uses MY_ARG and checks its presence.
The second stage proceeds as usual and declares ARG MY_ARG right at the end.
FROM alpine:3.11.5
ARG MY_ARG
RUN [ -z "$MY_ARG" ] && echo "MY_ARG is required" && exit 1 || true
FROM ubuntu:bionic
...
ARG MY_ARG
RUN ./use-my-arg.sh
Since ARG MY_ARG in the second stage is declared right before it is used, all the previous steps in that stage are unaffected and thus cache properly.
You could do something like this...
FROM ubuntu:14.04
ONBUILD ARG MY_VARIABLE
ONBUILD RUN if [ -z "$MY_VARIABLE" ]; then echo "NOT SET - ERROR"; exit 1; else : ; fi
Then docker build -t my_variable_base .
Then build your images based on this...
FROM my_variable_base
...
It's not super clean, but at least it abstracts the 'bleh' stuff away to the base image.
I cannot comment yet because I do not have 50 reputation, but I would like to add onto @Jan Nash's solution because I had a little difficulty getting it to work with my image.
If you copy/paste @Jan Nash's solution, it will work and spit out the error message that the build argument is not specified.
What I want to add
When I tried getting it to work on a CentOS 7 image (centos:7), Docker ran the RUN command without erroring out.
Solution
Ensure that you're executing the RUN command with the bash shell.
RUN ["/bin/bash", "-c", ": ${MYUID:?Build argument needs to be set and not null.}"]
I hope that helps future incoming people. Otherwise, I believe @Jan Nash's solution is just brilliant.
In case anybody is looking for the same solution but with docker compose build, I used mandatory variables:
version: "3.9"
services:
  my-service:
    build:
      context: .
      args:
        - ENVVAR=${ENVVAR:?See build instructions}
After running docker compose build:
Before exporting ENVVAR: Invalid template: "required variable ENVVAR is missing a value: See build instructions"
After exporting ENVVAR: build proceeds
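So the passing case looks something like this (the value is arbitrary):
export ENVVAR=some-value
docker compose build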
Support for Required Environment variables
Compose Environment Variables
None of these answers worked for me. I wanted ${MY_VARIABLE:?} but did not want to print anything, so I did it like this:
ARG MY_VARIABLE
RUN test -n ${MY_VARIABLE:?}
Nothing is printed on success. On error you see this, which is a good enough error:
ERROR RUN test -n ${MY_VARIABLE:?}
/bin/sh: MY_VARIABLE: parameter not set or null
executor failed running [/bin/sh -c test -n ${MY_VARIABLE:?}]: exit code: 2