docker-compose update from S3 bucket - docker

Our Dockerfile invokes a python script which copies a binary from S3 to /usr/bin. This works fine the first time. But from then on "docker-compose build" does nothing because everything is cached. This is a problem if the binary has changed.
Short of building with --no-cache, what is the best way to make sure "docker-compose build" will always pick up the new binary if there is one. We don't mind if it unnecessarily downloads the binary even if unchanged, so long as it does work then the binary has changed.
Seems like we want a Dockerfile step that always executes?
FROM ubuntu:trusty
RUN apt-get update
RUN apt-get -y install software-properties-common
RUN apt-get -y install --reinstall ca-certificates
RUN add-apt-repository ppa:fkrull/deadsnakes
RUN apt-get update && apt-get install -y \
curl \
wget \
vim \
git \
python3.5 \
python3-pip \
python3-setuptools \
libpcap0.8-dev
RUN ln -sf /usr/bin/python3.5 /usr/bin/python3
ADD . /app
WORKDIR /app
# Install Python Requirements
RUN pip3 install -r etc/python/requirements.txt
# Download/Install processor and associated libs
RUN python3 setup_processor.py
RUN mkdir -p /logs
ENTRYPOINT ["/app/entrypoint.sh"]
Where setup_processor.py downloads directly from S3 to /usr/bin.

So as of now there is no direct feature like this. But there is a workaround to your solution.
Add Build argument before your download step
ARG BUILD_ON=now
# Download/Install processor and associated libs
RUN python3 setup_processor.py
While building the image use below
docker build --build-arg BUILD_ON=$(date) ....
This will always make sure that you get a change in the ARG step and all steps cache after that will be invalidated
A feature has already been requested and being worked out on below thread
https://github.com/moby/moby/issues/1996

Related

Docker image Package Patch within Dockerfile

I have below docker image, where I need to update patch to curl package, in below Docker image, in Line number 3 I am already doing update, but it is shown up in Vulnerabilities report.
I have added, RUN yum -y update curl at the end of Dockerfile, then it is not showing up in Vulnerabilities report.
Any fix?, All Packages must install with latest version, I dont want to be mention explicitly
or any mistakes in Dockerfile?
FROM centos:7 AS base
FROM base AS build
# Install all dependenticies
RUN yum -y update \
&& yum install -y openssl-devel bzip2-devel libffi-devel \
zlib-devel wget gcc make
# Below compile python from source
FROM base
ENV LD_LIBRARY_PATH=/usr/local/lib64:/usr/local/lib
COPY --from=build /usr/local/ /usr/local/
# Copy Code
COPY . /app/
WORKDIR /app
#Install code dependecies.
RUN /usr/local/bin/python -m pip install --upgrade pip \
&& pip install -r requirements.txt
# Why, I need this step, when I already update RUN in line 3?, If I won't perform I see in compliance report, any fix?
RUN yum -y update curl
# run Application
ENTRYPOINT ["python"]
CMD ["test.py"]
In order to understand what constitutes an image, you need to look at a Dockerfile in a different way:
Every step (with the exception of FROM) creates a new image, with the results of the previous step as a base.
FROM doesn't use the previous step, but an explicitly specified one.
Now, looking at your Dockerfile, you seem to wonder why RUN yum -y update curl doesn't work as expected. For easier understanding, let's trace it backwards:
RUN yum -y update curl
RUN /usr/local/bin/python -m pip install --upgrade pip \ && pip install -r requirements.txt
WORKDIR /app
COPY . /app/
COPY --from=build /usr/local/ /usr/local/
ENV LD_LIBRARY_PATH=/usr/local/lib64:/usr/local/lib
FROM base -- at this point, the previous step is changed to the last step of base
FROM centos:7 AS base -- here, the previous step is changed to centos:7
As you see, nowhere in the earlier steps is yum update -y curl!
BTW: Typing this, I'm wondering what your precise question is, i.e. whether this works or doesn't or whether you wonder why it's necessary. Are you aware of the difference between yum update and yum update curl even?
docker build and friends have a cache system, based on the text of the input. So if the text of the command yum -y update doesn't change, it will continue using the same cached version of the output forever (or until the cache is deleted). Try running the build with --no-cache and see if that helps.

Packages wont install in docker

I am trying to install tesseract-ocr to docker from a Dockerfile. When I build the Dockerfile everything looks normal and i get no errors but when I run the container tesseract is not installed.
If I access the container using sudo docker exec -t -i <container_id> /bin/bash and manually install tesseract using apt-get install -y tesseract-ocr-all it installs and works perfectly. Why doesn't it work when I try to install it during the build process?
My Dockerfile looks like this:
FROM ubuntu:20.04
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update \
&& apt-get install -y tesseract-ocr-all
RUN tesseract --version
FROM python:3.7
WORKDIR ocr
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
Thanks!
It looks like you are leveraging Docker multi-stage builds without realizing it.
When you put FROM python:3.7, you essentially throw away everything you have done above that, since you start a new stage.
The easiest solution I can see is to move
RUN apt-get update \
&& apt-get install -y tesseract-ocr-all
RUN tesseract --version
into the FROM python:3.7 stage, and remove the FROM ubuntu:20.04 stage.
You need to switch your user, since you likely don't have permission to run those commands. Something like this should work:
USER root
RUN apt-get update \
&& apt-get install -y tesseract-ocr-all
USER <switch back to previous user>
You'll need to figure out what the default user is to switch back, which you can probably find in the Ubuntu docs or using whoami.

copy or add command not executed on docker hub

This dockerfile works as expected on my laptop. But it fails if I use automated builds on docker hub.
FROM ubuntu
# Install required software via apt and pip
RUN apt-get -y update && \
apt-get install -y \
awscli \
python \
python-pip \
software-properties-common \
&& add-apt-repository ppa:ubuntugis/ppa \
&& apt-get -y update \
&& apt-get install -y \
gdal-bin \
&& pip install boto3
# Copy Build Thumbnail script to Docker image and add execute permissions
COPY build-thumbnails.py build-thumbnails.py
RUN chmod +x build-thumbnails.py
The error is:
Step 6/7 : COPY build-thumbnails.py build-thumbnails.py
COPY failed: stat /var/lib/docker/tmp/docker-builder259560514/build-thumbnails.py: no such file or directory
The repo is here...
https://github.com/shantanuo/docker/blob/master/batch/Dockerfile
Why would copy or add command not work for automated builds?
Seems like other people have had the same issue see here:
https://forums.docker.com/t/docker-build-failing-on-docker-hub/76191/2
The solution is to set the build context appropriately so that the relative path >in the Dockerfile COPY is correct.
In your Docker Hub repository go to “Builds” and click on “Configure Automated >Builds”. There you can set the “Build Context” for each build rule.
Check the last answer on this page too:
https://github.com/docker/hub-feedback/issues/811
Let me know if that helps!

Prevent docker from building the image from scratch after making changes to the code

A docker newbie, trying to develop in a docker container; I have a problem which is every time I make a single line change of code and try to rerun the container, docker will rebuild the image from scratch which takes a very long time; How should I set up the project correctly so it makes the best use of cache? Pretty sure it doesn't have to reinstall all the apt-get and pip installs (btw I am developing in python) whenever I make some changes to the source code. Anyone have any idea what I am missing. Appreciate any help.
My current docker file:
FROM tiangolo/uwsgi-nginx-flask:python3.6
# Copy the current directory contents into the container at /app
ADD ./app /app
# Run python's package manager and install the flask package
RUN apt-get update -y \
&& apt-get -y install default-jre \
&& apt-get install -y \
build-essential \
gfortran \
libblas-dev \
liblapack-dev \
libxft-dev \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
ADD ./requirements.txt /app/requirements.txt
RUN pip3 install -r requirements.txt
Once the cache breaks in a Dockerfile, all of the following lines will need to be rebuilt since they no longer have a cache hit. The cache search looks for an existing previous layer and an identical command (or contents of something like a COPY) to reuse the cache. If both do not match, then you have a cache miss and it performs the build step. For your scenario, you simply need to reorder your lines to make sure the frequently changing part is at the end rather than the beginning of the file:
FROM tiangolo/uwsgi-nginx-flask:python3.6
# Run python's package manager and install the flask package
RUN apt-get update -y \
&& apt-get -y install default-jre \
&& apt-get install -y \
build-essential \
gfortran \
libblas-dev \
liblapack-dev \
libxft-dev \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip3 install -r requirements.txt
# Copy the current directory contents into the container at /app
COPY app /app
I've also modified your ADD lines to COPY because you don't need the extra features provided by ADD.
During development, I'd recommend mounting app as a volume in your container so you don't need to rebuild the image for every code change. You can leave the COPY app /app inside your Dockerfile and the volume mount will simply overlay the directory, hiding anything in your image at that location. You only need to restart your container to pickup your modifications. Once finished, a build will create an image that looks identical to your development environment.

rebuild docker image from specific step

I have the below Dockerfile.
FROM ubuntu:14.04
MAINTAINER Samuel Alexander <samuel#alexander.com>
RUN apt-get -y install software-properties-common
RUN apt-get -y update
# Install Java.
RUN echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | debconf-set-selections
RUN add-apt-repository -y ppa:webupd8team/java
RUN apt-get -y update
RUN apt-get install -y oracle-java8-installer
RUN rm -rf /var/lib/apt/lists/*
RUN rm -rf /var/cache/oracle-jdk8-installer
# Define working directory.
WORKDIR /work
# Define commonly used JAVA_HOME variable
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
# JAVA PATH
ENV PATH /usr/lib/jvm/java-8-oracle/bin:$PATH
# Install maven
RUN apt-get -y update
RUN apt-get -y install maven
# Install Open SSH and git
RUN apt-get -y install openssh-server
RUN apt-get -y install git
# clone Spark
RUN git clone https://github.com/apache/spark.git
WORKDIR /work/spark
RUN mvn -DskipTests clean package
# clone and build zeppelin fork
RUN git clone https://github.com/apache/incubator-zeppelin.git
WORKDIR /work/incubator-zeppelin
RUN mvn clean package -Pspark-1.6 -Phadoop-2.6 -DskipTests
# Install Supervisord
RUN apt-get -y install supervisor
RUN mkdir -p var/log/supervisor
# Configure Supervisord
COPY conf/supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# bash
RUN sed -i s#/home/git:/bin/false#/home/git:/bin/bash# /etc/passwd
EXPOSE 8080 8082
CMD ["/usr/bin/supervisord"]
While building image it failed in step 23 i.e.
RUN mvn clean package -Pspark-1.6 -Phadoop-2.6 -DskipTests
Now when I rebuild it starts to build from step 23 as docker is using cache.
But if I want to rebuild the image from step 21 i.e.
RUN git clone https://github.com/apache/incubator-zeppelin.git
How can I do that?
Is deleting the cached image is the only option?
Is there any additional parameter to do that?
You can rebuild the entire thing without using the cache by doing a
docker build --no-cache -t user/image-name
To force a rerun starting at a specific line, you can pass an arg that is otherwise unused. Docker passes ARG values as environment variables to your RUN command, so changing an ARG is a change to the command which breaks the cache. It's not even necessary to define it yourself on the RUN line.
FROM ubuntu:14.04
MAINTAINER Samuel Alexander <samuel#alexander.com>
RUN apt-get -y install software-properties-common
RUN apt-get -y update
# Install Java.
RUN echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | debconf-set-selections
RUN add-apt-repository -y ppa:webupd8team/java
RUN apt-get -y update
RUN apt-get install -y oracle-java8-installer
RUN rm -rf /var/lib/apt/lists/*
RUN rm -rf /var/cache/oracle-jdk8-installer
# Define working directory.
WORKDIR /work
# Define commonly used JAVA_HOME variable
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
# JAVA PATH
ENV PATH /usr/lib/jvm/java-8-oracle/bin:$PATH
# Install maven
RUN apt-get -y update
RUN apt-get -y install maven
# Install Open SSH and git
RUN apt-get -y install openssh-server
RUN apt-get -y install git
# clone Spark
RUN git clone https://github.com/apache/spark.git
WORKDIR /work/spark
RUN mvn -DskipTests clean package
# clone and build zeppelin fork, changing INCUBATOR_VER will break the cache here
ARG INCUBATOR_VER=unknown
RUN git clone https://github.com/apache/incubator-zeppelin.git
WORKDIR /work/incubator-zeppelin
RUN mvn clean package -Pspark-1.6 -Phadoop-2.6 -DskipTests
# Install Supervisord
RUN apt-get -y install supervisor
RUN mkdir -p var/log/supervisor
# Configure Supervisord
COPY conf/supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# bash
RUN sed -i s#/home/git:/bin/false#/home/git:/bin/bash# /etc/passwd
EXPOSE 8080 8082
CMD ["/usr/bin/supervisord"]
And then just run it with a unique arg:
docker build --build-arg INCUBATOR_VER=20160613.2 -t user/image-name .
To change the argument with every build, you can pass a timestamp as the arg:
docker build --build-arg INCUBATOR_VER=$(date +%Y%m%d-%H%M%S) -t user/image-name .
or:
docker build --build-arg INCUBATOR_VER=$(date +%s) -t user/image-name .
As an aside, I'd recommend the following changes to keep your layers smaller, the more you can merge the cleanup and delete steps on a single RUN command after the download and install, the smaller your final image will be. Otherwise your layers will include all the intermediate steps between the download and cleanup:
FROM ubuntu:14.04
MAINTAINER Samuel Alexander <samuel#alexander.com>
RUN DEBIAN_FRONTEND=noninteractive \
apt-get -y install software-properties-common && \
apt-get -y update
# Install Java.
RUN echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | debconf-set-selections && \
add-apt-repository -y ppa:webupd8team/java && \
apt-get -y update && \
DEBIAN_FRONTEND=noninteractive \
apt-get install -y oracle-java8-installer && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /var/cache/oracle-jdk8-installer && \
# Define working directory.
WORKDIR /work
# Define commonly used JAVA_HOME variable
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
# JAVA PATH
ENV PATH /usr/lib/jvm/java-8-oracle/bin:$PATH
# Install maven
RUN apt-get -y update && \
DEBIAN_FRONTEND=noninteractive \
apt-get -y install
maven \
openssh-server \
git \
supervisor && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# clone Spark
RUN git clone https://github.com/apache/spark.git
WORKDIR /work/spark
RUN mvn -DskipTests clean package
# clone and build zeppelin fork
ARG INCUBATOR_VER=unknown
RUN git clone https://github.com/apache/incubator-zeppelin.git
WORKDIR /work/incubator-zeppelin
RUN mvn clean package -Pspark-1.6 -Phadoop-2.6 -DskipTests
# Configure Supervisord
RUN mkdir -p var/log/supervisor
COPY conf/supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# bash
RUN sed -i s#/home/git:/bin/false#/home/git:/bin/bash# /etc/passwd
EXPOSE 8080 8082
CMD ["/usr/bin/supervisord"]
One workaround:
Locate the step you want to execute from.
Before that step put a simple dummy operation like "RUN pwd"
Then just build your Dockerfile. It will take everything up to that step from the cache and then execute the lines after the dummy command.
To complete Dmitry's answer, you can use uniq arg like date +%s to keep always same commanline
docker build --build-arg DUMMY=`date +%s` -t me/myapp:1.0.0
Dockerfile:
...
ARG DUMMY=unknown
RUN DUMMY=${DUMMY} git clone xxx
...
A simpler technique.
Dockerfile:Add this line where you want the caching to start being skipped.
COPY marker /dev/null
Then build using
date > marker && docker build .
Another option is to delete the cached intermediate image you want to rebuild.
Find the hash of the intermediate image you wish to rebuild in your build output:
Step 27/42 : RUN lolarun.sh
---> Using cache
---> 328dfe03e436
Then delete that image:
$ docker image rmi 328dfe03e436
Or if it gives you an error and you're okay with forcing it:
$ docker image rmi -f 328dfe03e436
Finally, rerun your build command, and it will need to restart from that point because it's not in the cache.
If place ARG INCUBATOR_VER=unknown at top, then cache will not be used in case of change of INCUBATOR_VER from command line (just tested the build).
For me worked:
# The rebuild starts from here
ARG INCUBATOR_VER=unknown
RUN INCUBATOR_VER=${INCUBATOR_VER} git clone https://github.com/apache/incubator-zeppelin.git
As there is no official way to do this, the most simple way is to temporarily change the specific RUN command to include something harmless like echo:
RUN echo && apt-get -qq update && \
apt-get install nano
After building remove it.

Resources