Why does docker rebuild all layers every time I change build args - docker

I have a docker file which has a lot of layers. At the top of the file I have some args like
FROM ubuntu:18.04
ARGS USER=test-user
ARGS UID=1000
#ARGS PW=test-user
# Then several Layers which does not use any ARGS. Example
LABEL version="1.0"
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
RUN mkdir ~/mapped-volume
RUN apt-get update && apt-get install -y wget bzip2 ca-certificates build-essential curl git-core htop pkg-config unzip unrar tree freetds-dev vim \
sudo nodejs npm net-tools flex perl automake bison libtool byacc
# And so on
# And finally towards the end
# Setup User
RUN useradd -m -d /home/${USER} --uid ${UID} -G sudo -s /bin/bash ${USER}
# && echo "${USER}:${PW}" | chpasswd
# Couple of more commands to change dir, entry point etc. Example
When I build this docker file with any arg value different from the last build and/or after small changes in the last two layers, the build builds everything again. It does not use cached layer. The command I use to build is something like this
docker build --build-arg USER=new-user --build-arg UID=$UID -t my-image:1.0 .
And every time I change the values the build goes all through again. With a truncated top like below
UID -t my-image:1.0 .
Sending build context to Docker daemon 44.54kB
Step 1/23 : FROM ubuntu:18.04
---> ccc6e87d482b
Step 2/23 : ARG USER=ml-user
---> Using cache
---> 6c0c5d5c5056
Step 3/23 : ARG UID=1000
---> Using cache
---> b25867c282c7
Step 4/23 : LABEL version="1.0"
---> Running in 1ffff70d56c1
Removing intermediate container 1ffff70d56c1
---> 0f1277def3ca
Step 5/23 : ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
---> Running in 49d08c41b233
Removing intermediate container 49d08c41b233
---> f5b345573c1f
Step 6/23 : RUN mkdir ~/mapped-volume
---> Running in e4f8a5956450
Removing intermediate container e4f8a5956450
---> 1b22731d9051
Step 7/23 : RUN apt-get update && apt-get install -y wget bzip2 ca-certificates build-essential curl git-core htop pkg-config unzip unrar tree freetds-dev vim sudo nodejs npm net-tools flex perl automake bison libtool byacc
---> Running in ffc297de6234
Get:1 http://archive.ubuntu.com/ubuntu bionic InRelease [242 kB]
So from step 7 it keeps doing all steps without using the cache of that layer which should have a bunch of packages
Why? How can I stop this? Previously when I did not have args, this layer and other layers used to be picked up from cache.

Move your args to just before you need them. Docker does not replace args in the RUN commands before running them. Instead, the args are passed as environment variables and expanded by the shell within the temporary container. Because of that, a change to an arg is a change to the environment, and a miss of the build cache for that step. Once one step misses the cache, all following steps must be rebuilt.
FROM ubuntu:18.04
# Then several Layers which does not use any ARGS. Example
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
RUN mkdir ~/mapped-volume
RUN apt-get update && apt-get install -y wget bzip2 ca-certificates build-essential curl git-core htop pkg-config unzip unrar tree freetds-dev vim \
sudo nodejs npm net-tools flex perl automake bison libtool byacc
# And so on
# And finally towards the end
# Setup User
ARGS USER=test-user
ARGS UID=1000
RUN useradd -m -d /home/${USER} --uid ${UID} -G sudo -s /bin/bash ${USER}
# && echo "${USER}:${PW}" | chpasswd
# Couple of more commands to change dir, entry point etc. Example
LABEL version="1.0"
Also, labels, environment variables that aren't needed at build time, exposed ports, and any other meta data is often best left to the end of the Dockerfile since they have minimal impact on build time and there's no need to miss the cache when they change.

Related

Is there a way to hardcode user inputs in a Dockerfile or run docker build in interactive mode?

I am using a Dockerfile to install a tool. I am running docker build -f Dockerfile -t ubuntu:mytool . command to initiate the build. The line RUN ./toolPackageInstaller expects two user inputs (1) install path selection and (2) an integer for timezone info halfway through the installation. How do I hardcode this info in a dockerfile or run docker build in interactive mode so the user can input these values during the install process?
FROM ubuntu:bionic
WORKDIR /tmp
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
build-essential \
sudo \
git \
make
COPY ToolPackage.tar.xz /tmp
RUN tar xvfJp /tmp/ToolPackage.tar.xz
WORKDIR /tmp/ToolPackage
RUN chmod +x toolPackageInstaller
RUN ./toolPackageInstaller
Use build arguments for those desired arguments:
FROM ubuntu:bionic
ARG ARGUMENT_1=<HARDCODED_VALUE>
ARG ARGUMENT_2=<HARDCODED_VALUE>
WORKDIR /tmp
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
build-essential \
sudo \
git \
make
COPY ToolPackage.tar.xz /tmp
RUN tar xvfJp /tmp/ToolPackage.tar.xz
WORKDIR /tmp/ToolPackage
RUN chmod +x toolPackageInstaller
RUN ./toolPackageInstaller $ARGUMENT_1 $ARGUMENT_2
And configure the script on toolPackageInstaller to use those values as input (referring to them with $1 and $2)
By default it will run with the hardcoded value, and also you can override it if you desire:
docker build --build-arg ARGUMENT_1=<NEW_VALUE> --build-arg ARGUMENT_2=<ANOTHER_NEW_VALUE> -t ubuntu:mytool .

Docker containers retaining data

I'm learning Docker and I've been crafting a Dockerfile for an Ubuntu container.
My problem is I keep getting persistent information between different containers. I have exited, removed the container and then removed its image. After I made changes to my Dockerfile, I executed docker build -t playbuntu . Executing the following Dockerfile:
FROM ubuntu:latest
## for apt to be noninteractive
ENV DEBIAN_FRONTEND noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN true
## preesed tzdata, update package index, upgrade packages and install needed software
RUN echo "tzdata tzdata/Areas select Europe" > /tmp/preseed.txt; \
echo "tzdata tzdata/Zones/Europe select London" >> /tmp/preseed.txt; \
debconf-set-selections /tmp/preseed.txt && \
apt-get update && \
apt-get install -y tzdata
RUN apt-get update -y && apt-get upgrade -y && apt-get install tree nano vim -y && apt-get install less -y && apt-get install lamp-server^ -y
RUN echo "ServerName localhost" >> /etc/apache2/apache2.conf
EXPOSE 80
WORKDIR /var/www
COPY ./index.php /var/www
COPY ./000-default.conf /etc/apache2/sites-available
CMD [ "apache2ctl", "start", "&&", "apache2ctl", "restart" ]
Once I execute winpty docker run -it -p 80:80 playbuntu bash, my problem is, instead of my index.php file outputting the following:
<?php
print "<center><h3>Sisko's LAMP activated!</h3><center>";
phpinfo();
I get the following debug code I experimented with hours ago:
<?php
print "...responding";
phpinfo();
Is there a caching system Docker might be using? I pruned all Docker volumes just in case that is how Docker might be caching. All except 2 volumes which are being used by other containers unrelated to my project.
I suspect that, after you changed index.php on your host machine you did not rerun the docker build ... command. You will need to recreate the container image any time any of the image's content changes.
Please confirm whether running the docker build ... command again solves the issue.
Background
Docker images comprise one of more layers.
Image (layers) and Volumes are distinct.
Since your docker run ... command contains no e.g. --volume= bind mounts (or equivalent), I suspect Docker Volumes are not relevant to this question.
There is a possibility that rebuilding images does not replace image layers and thus caching does occur. However, in your case, I think this is not the issue. See the Docker documentation (link) for an overview of the Dockerfile commands that add layers.
Because of the way that layers work, if the preceding layers in your Dockerfile were unchanged and the index.php were unchanged, then Docker would not rebuild these layers. However, because your Dockerfile includes a layer that apt-get update && apt-get install ...., the layer will be invalidated, recreated and so will subsequent layers.
If you change index.php on the host and rebuild the image, this layer will always be rebuilt.
I built your Dockerfile twice. Here's the beginning of the second (!) build command. NB the Using cache commands for the unchanged layers preceding the RUN apt-get update... layer which is rebuilt.
docker build --rm -f "Dockerfile" -t 59582820:0939 "."
Sending build context to Docker daemon 3.584kB
Step 1/10 : FROM ubuntu:latest
---> 549b9b86cb8d
Step 2/10 : ENV DEBIAN_FRONTEND noninteractive
---> Using cache
---> 1529d0e293f3
Step 3/10 : ENV DEBCONF_NONINTERACTIVE_SEEN true
---> Using cache
---> 1ba10410d06a
Step 4/10 : RUN echo "tzdata tzdata/Areas select Europe" > /tmp/preseed.txt; echo "tzdata tzdata/Zones/Europe select London" >> /tmp/preseed.txt; debconf-set-selections /tmp/preseed.txt && apt-get update && apt-get install -y tzdata
---> Using cache
---> afb861da52e4
Step 5/10 : RUN apt-get update -y && apt-get upgrade -y && apt-get install tree nano vim -y && apt-get install less -y && apt-get install lamp-server^ -y
---> Running in 6f05bbb8e80a
The evidence suggests to me that you didn't rebuild after changing the index.php.
You can use volumes to persist some folders like configs or properties etc. Docker Storage

Docker-entrypoint.sh results in "not found" for ARM image with golang

My problem is that I get an error when running my container on an ARM arch system(RaspberryPI with Raspbian). Image was built on that same Raspberry.
This is my dockerfile:
FROM arm32v7/golang
COPY qemu-arm-static /usr/bin
ENV STATUSOK_VERSION 0.1.1
RUN apt-get update \
&& apt-get install -y unzip \
&& wget https://github.com/sanathp/statusok/releases/download/$STATUSOK_VERSION/statusok_linux.zip \
&& unzip statusok_linux.zip \
&& mv ./statusok_linux/statusok /go/bin/StatusOk \
&& rm -rf ./statusok_linux* \
&& apt-get remove -y unzip git \
&& apt-get autoremove -y \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
VOLUME /config
COPY ./docker-entrypoint.sh /docker-entrypoint.sh
ENTRYPOINT /docker-entrypoint.sh
I'm able to succesfully build this on a RaspberryPI running Raspbian:
root#raspberrypi:~/armstatusok# docker build . -t armstatusok
Sending build context to Docker daemon 6.656kB
Step 1/7 : FROM arm32v7/golang
---> 8bbfdfd01a06
Step 2/7 : COPY qemu-arm-static /usr/bin
---> Using cache
---> 2572fd1e03a0
Step 3/7 : ENV STATUSOK_VERSION 0.1.1
---> Using cache
---> 25d39a4c6eb5
Step 4/7 : RUN apt-get update && apt-get install -y unzip && wget https://github.com/sanathp/statusok/releases/download/$STATUSOK_VERSION/statusok_linux.zip && unzip statusok_linux.zip && mv ./statusok_linux/statusok /go/bin/StatusOk && rm -rf ./statusok_linux* && apt-get remove -y unzip git && apt-get autoremove -y && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
---> Using cache
---> bfb1cfa9a985
Step 5/7 : VOLUME /config
---> Using cache
---> 3bfbce28329b
Step 6/7 : COPY ./docker-entrypoint.sh /docker-entrypoint.sh
---> Using cache
---> a1795ca4f40c
Step 7/7 : ENTRYPOINT /docker-entrypoint.sh
---> Using cache
---> d0ce74911ba3
Successfully built d0ce74911ba3
Successfully tagged armstatusok:latest
Next step is to run it, and where I get into trouble:
root#raspberrypi:~/armstatusok# docker run --name=armstatusok -v $PWD:/config armstatusok
/docker-entrypoint.sh: 1: /docker-entrypoint.sh: /go/bin/StatusOk: not found
I went into the container commenting line one of the docker-entrypoint.sh and checked if /go/bin/StatusOk was actually there, and it was.
My docker-entrypoint.sh:
root#raspberrypi:~/armstatusok# cat docker-entrypoint.sh
/go/bin/StatusOk --config /config/config.json
Now my question is, does anybody have a clue where to start? I also tested this dockerfile on x86 arch, and there it worked. I only changed the FROM line to the x86 flavour and removed the COPY qemu-arm-static /usr/bin since that line is there to make it work on ARM arch, according to documentation.
I copied this Dockerfile and start script verbatim and it builds and runs perfectly for me. I get
Config file not present at the given location: /config/config.json give correct file location using --config parameter
because I don't have access to the config file you're using. But the fact I get that message means that StatusOk is running. So I don't know what to suggest.
The only difference I made was to add a shebang #!/bin/sh to the start of the docker-entrypoint.sh file, and ensure it has execute permission, by running ls -al, and if it doesn't have x in the permissions, running chmod +rwx. Don't know if that made any difference as to how the script tried to access /go/bin/StatusOk.
Full docker-entrypoint.sh contents:
#!/bin/sh
/go/bin/StatusOk --config /config/config.json

Setting up our Rasa/NLU container, error?

I have this file Dockerfile.nlu
FROM chatbot/spacy:latest
WORKDIR /app
COPY nlu ./agent_nlu
RUN python –m rasa_nlu.train --config agent_nlu/config.yml --data agent_nlu/data/ --path agent_nlu/agent --fixed_model_name default
and I get the error below:
]$ sudo docker build -t nlu:latest -f docker/Dockerfile.nlu .
Sending build context to Docker daemon 9.216kB
Step 1/4 : FROM chatbot/spacy:latest
---> 496dc6a38abb
Step 2/4 : WORKDIR /app
---> Using cache
---> 7f02012c8452
Step 3/4 : COPY nlu ./agent_nlu
COPY failed: stat /var/lib/docker/tmp/docker-builder363868051/nlu: no such file or directory
It doesn't look like Docker can find the nlu directory. Are you sure it exists? Are you sure that you are executing the command from the correct directory?
But you also aren't installing Rasa at all or any of it's requirements. Is there a reason you aren't using the pre-built Rasa images? available here with docs here.
Here is a fully functional Docker file pulled from their repo.
FROM python:3.6-slim
ENV RASA_NLU_DOCKER="YES" \
RASA_NLU_HOME=/app \
RASA_NLU_PYTHON_PACKAGES=/usr/local/lib/python3.6/dist-packages
# Run updates, install basics and cleanup
# - build-essential: Compile specific dependencies
# - git-core: Checkout git repos
RUN apt-get update -qq \
&& apt-get install -y --no-install-recommends build-essential git-core openssl libssl-dev libffi6 libffi-dev curl \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
WORKDIR ${RASA_NLU_HOME}
COPY . ${RASA_NLU_HOME}
# use bash always
RUN rm /bin/sh && ln -s /bin/bash /bin/sh
RUN pip install -r alt_requirements/requirements_spacy_sklearn.txt
RUN pip install -e .
RUN pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_md-2.0.0/en_core_web_md-2.0.0.tar.gz --no-cache-dir > /dev/null \
&& python -m spacy link en_core_web_md en \
&& pip install https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-2.0.0/de_core_news_sm-2.0.0.tar.gz --no-cache-dir > /dev/null \
&& python -m spacy link de_core_news_sm de
COPY sample_configs/config_spacy.yml ${RASA_NLU_HOME}/config.yml
VOLUME ["/app/projects", "/app/logs", "/app/data"]
EXPOSE 5000
ENTRYPOINT ["./entrypoint.sh"]
CMD ["start", "-c", "config.yml", "--path", "/app/projects"]

rebuild docker image from specific step

I have the below Dockerfile.
FROM ubuntu:14.04
MAINTAINER Samuel Alexander <samuel#alexander.com>
RUN apt-get -y install software-properties-common
RUN apt-get -y update
# Install Java.
RUN echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | debconf-set-selections
RUN add-apt-repository -y ppa:webupd8team/java
RUN apt-get -y update
RUN apt-get install -y oracle-java8-installer
RUN rm -rf /var/lib/apt/lists/*
RUN rm -rf /var/cache/oracle-jdk8-installer
# Define working directory.
WORKDIR /work
# Define commonly used JAVA_HOME variable
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
# JAVA PATH
ENV PATH /usr/lib/jvm/java-8-oracle/bin:$PATH
# Install maven
RUN apt-get -y update
RUN apt-get -y install maven
# Install Open SSH and git
RUN apt-get -y install openssh-server
RUN apt-get -y install git
# clone Spark
RUN git clone https://github.com/apache/spark.git
WORKDIR /work/spark
RUN mvn -DskipTests clean package
# clone and build zeppelin fork
RUN git clone https://github.com/apache/incubator-zeppelin.git
WORKDIR /work/incubator-zeppelin
RUN mvn clean package -Pspark-1.6 -Phadoop-2.6 -DskipTests
# Install Supervisord
RUN apt-get -y install supervisor
RUN mkdir -p var/log/supervisor
# Configure Supervisord
COPY conf/supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# bash
RUN sed -i s#/home/git:/bin/false#/home/git:/bin/bash# /etc/passwd
EXPOSE 8080 8082
CMD ["/usr/bin/supervisord"]
While building image it failed in step 23 i.e.
RUN mvn clean package -Pspark-1.6 -Phadoop-2.6 -DskipTests
Now when I rebuild it starts to build from step 23 as docker is using cache.
But if I want to rebuild the image from step 21 i.e.
RUN git clone https://github.com/apache/incubator-zeppelin.git
How can I do that?
Is deleting the cached image is the only option?
Is there any additional parameter to do that?
You can rebuild the entire thing without using the cache by doing a
docker build --no-cache -t user/image-name
To force a rerun starting at a specific line, you can pass an arg that is otherwise unused. Docker passes ARG values as environment variables to your RUN command, so changing an ARG is a change to the command which breaks the cache. It's not even necessary to define it yourself on the RUN line.
FROM ubuntu:14.04
MAINTAINER Samuel Alexander <samuel#alexander.com>
RUN apt-get -y install software-properties-common
RUN apt-get -y update
# Install Java.
RUN echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | debconf-set-selections
RUN add-apt-repository -y ppa:webupd8team/java
RUN apt-get -y update
RUN apt-get install -y oracle-java8-installer
RUN rm -rf /var/lib/apt/lists/*
RUN rm -rf /var/cache/oracle-jdk8-installer
# Define working directory.
WORKDIR /work
# Define commonly used JAVA_HOME variable
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
# JAVA PATH
ENV PATH /usr/lib/jvm/java-8-oracle/bin:$PATH
# Install maven
RUN apt-get -y update
RUN apt-get -y install maven
# Install Open SSH and git
RUN apt-get -y install openssh-server
RUN apt-get -y install git
# clone Spark
RUN git clone https://github.com/apache/spark.git
WORKDIR /work/spark
RUN mvn -DskipTests clean package
# clone and build zeppelin fork, changing INCUBATOR_VER will break the cache here
ARG INCUBATOR_VER=unknown
RUN git clone https://github.com/apache/incubator-zeppelin.git
WORKDIR /work/incubator-zeppelin
RUN mvn clean package -Pspark-1.6 -Phadoop-2.6 -DskipTests
# Install Supervisord
RUN apt-get -y install supervisor
RUN mkdir -p var/log/supervisor
# Configure Supervisord
COPY conf/supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# bash
RUN sed -i s#/home/git:/bin/false#/home/git:/bin/bash# /etc/passwd
EXPOSE 8080 8082
CMD ["/usr/bin/supervisord"]
And then just run it with a unique arg:
docker build --build-arg INCUBATOR_VER=20160613.2 -t user/image-name .
To change the argument with every build, you can pass a timestamp as the arg:
docker build --build-arg INCUBATOR_VER=$(date +%Y%m%d-%H%M%S) -t user/image-name .
or:
docker build --build-arg INCUBATOR_VER=$(date +%s) -t user/image-name .
As an aside, I'd recommend the following changes to keep your layers smaller, the more you can merge the cleanup and delete steps on a single RUN command after the download and install, the smaller your final image will be. Otherwise your layers will include all the intermediate steps between the download and cleanup:
FROM ubuntu:14.04
MAINTAINER Samuel Alexander <samuel#alexander.com>
RUN DEBIAN_FRONTEND=noninteractive \
apt-get -y install software-properties-common && \
apt-get -y update
# Install Java.
RUN echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | debconf-set-selections && \
add-apt-repository -y ppa:webupd8team/java && \
apt-get -y update && \
DEBIAN_FRONTEND=noninteractive \
apt-get install -y oracle-java8-installer && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /var/cache/oracle-jdk8-installer && \
# Define working directory.
WORKDIR /work
# Define commonly used JAVA_HOME variable
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
# JAVA PATH
ENV PATH /usr/lib/jvm/java-8-oracle/bin:$PATH
# Install maven
RUN apt-get -y update && \
DEBIAN_FRONTEND=noninteractive \
apt-get -y install
maven \
openssh-server \
git \
supervisor && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# clone Spark
RUN git clone https://github.com/apache/spark.git
WORKDIR /work/spark
RUN mvn -DskipTests clean package
# clone and build zeppelin fork
ARG INCUBATOR_VER=unknown
RUN git clone https://github.com/apache/incubator-zeppelin.git
WORKDIR /work/incubator-zeppelin
RUN mvn clean package -Pspark-1.6 -Phadoop-2.6 -DskipTests
# Configure Supervisord
RUN mkdir -p var/log/supervisor
COPY conf/supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# bash
RUN sed -i s#/home/git:/bin/false#/home/git:/bin/bash# /etc/passwd
EXPOSE 8080 8082
CMD ["/usr/bin/supervisord"]
One workaround:
Locate the step you want to execute from.
Before that step put a simple dummy operation like "RUN pwd"
Then just build your Dockerfile. It will take everything up to that step from the cache and then execute the lines after the dummy command.
To complete Dmitry's answer, you can use uniq arg like date +%s to keep always same commanline
docker build --build-arg DUMMY=`date +%s` -t me/myapp:1.0.0
Dockerfile:
...
ARG DUMMY=unknown
RUN DUMMY=${DUMMY} git clone xxx
...
A simpler technique.
Dockerfile:Add this line where you want the caching to start being skipped.
COPY marker /dev/null
Then build using
date > marker && docker build .
Another option is to delete the cached intermediate image you want to rebuild.
Find the hash of the intermediate image you wish to rebuild in your build output:
Step 27/42 : RUN lolarun.sh
---> Using cache
---> 328dfe03e436
Then delete that image:
$ docker image rmi 328dfe03e436
Or if it gives you an error and you're okay with forcing it:
$ docker image rmi -f 328dfe03e436
Finally, rerun your build command, and it will need to restart from that point because it's not in the cache.
If place ARG INCUBATOR_VER=unknown at top, then cache will not be used in case of change of INCUBATOR_VER from command line (just tested the build).
For me worked:
# The rebuild starts from here
ARG INCUBATOR_VER=unknown
RUN INCUBATOR_VER=${INCUBATOR_VER} git clone https://github.com/apache/incubator-zeppelin.git
As there is no official way to do this, the most simple way is to temporarily change the specific RUN command to include something harmless like echo:
RUN echo && apt-get -qq update && \
apt-get install nano
After building remove it.

Resources