Prevent Docker from building the image from scratch after making changes to the code

I'm a Docker newbie trying to develop inside a Docker container. The problem is that every time I make a single-line change to the code and try to rerun the container, Docker rebuilds the image from scratch, which takes a very long time. How should I set up the project so it makes the best use of the cache? I'm pretty sure it doesn't have to reinstall all the apt-get and pip packages (I'm developing in Python) whenever I change the source code. Any idea what I'm missing? Appreciate any help.
My current docker file:
FROM tiangolo/uwsgi-nginx-flask:python3.6
# Copy the current directory contents into the container at /app
ADD ./app /app
# Run python's package manager and install the flask package
RUN apt-get update -y \
&& apt-get -y install default-jre \
&& apt-get install -y \
build-essential \
gfortran \
libblas-dev \
liblapack-dev \
libxft-dev \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
ADD ./requirements.txt /app/requirements.txt
RUN pip3 install -r requirements.txt

Once the cache breaks in a Dockerfile, all of the following lines will need to be rebuilt since they no longer have a cache hit. The cache search looks for an existing previous layer and an identical command (or contents of something like a COPY) to reuse the cache. If either does not match, you have a cache miss and the build step is executed. For your scenario, you simply need to reorder your lines so that the frequently changing part is at the end rather than the beginning of the file:
FROM tiangolo/uwsgi-nginx-flask:python3.6
# Run python's package manager and install the flask package
RUN apt-get update -y \
&& apt-get -y install default-jre \
&& apt-get install -y \
build-essential \
gfortran \
libblas-dev \
liblapack-dev \
libxft-dev \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip3 install -r requirements.txt
# Copy the current directory contents into the container at /app
COPY app /app
I've also modified your ADD lines to COPY because you don't need the extra features provided by ADD.
During development, I'd recommend mounting app as a volume in your container so you don't need to rebuild the image for every code change. You can leave the COPY app /app inside your Dockerfile; the volume mount will simply overlay the directory, hiding anything in your image at that location. You only need to restart your container to pick up your modifications. Once you're finished, a build will create an image that looks identical to your development environment.
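For example, a minimal sketch of that workflow with plain docker run (the image name myflaskapp is a placeholder; the tiangolo base image serves the app from /app on port 80):
# build the image once
docker build -t myflaskapp .
# run it with the local ./app directory overlaying /app inside the container
docker run -d -p 80:80 -v "$(pwd)/app:/app" myflaskapp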

Related

Installing a python project using Poetry in a Docker container

I am using Poetry to install a Python project in a Docker container. Below you can find my Dockerfile, which used to work fine until recently, when I switched to a new version of Poetry (1.2.1) and the new recommended Poetry installer:
# pull official base image
FROM ubuntu:20.04
ENV PATH = "${PATH}:/home/poetry/bin"
ENV APP_HOME=/home/app/web
RUN apt-get -y update && \
apt upgrade -y && \
apt-get install -y \
python3-pip \
curl \
netcat \
gunicorn && \
rm -fr /var/lib/apt/lists
# alias python2 to python3
RUN ln -s /usr/bin/python3 /usr/bin/python
# Install Poetry
RUN mkdir -p /home/poetry && \
curl -sSL https://install.python-poetry.org | POETRY_HOME=/home/poetry python -
# Cleanup
RUN apt-get remove -y curl && \
apt-get clean
RUN pip install --upgrade pip && \
pip install cryptography && \
pip install psycopg2-binary
# create directory for the app user
# create the app user
# create the appropriate directories
RUN adduser --system --group app && \
mkdir -p $APP_HOME/static-incdtim && \
mkdir -p $APP_HOME/mediafiles
# copy project
COPY . $APP_HOME
WORKDIR $APP_HOME
# Install Python packages
RUN poetry config virtualenvs.create false
RUN poetry install --only main
# copy entrypoint-prod.sh
COPY ./entrypoint.incdtim.prod.sh $APP_HOME/entrypoint.sh
RUN chmod a+x $APP_HOME/entrypoint.sh
# chown all the files to the app user
RUN chown -R app:app $APP_HOME
# change to the app user
USER app
# run entrypoint.prod.sh
ENTRYPOINT ["/home/app/web/entrypoint.sh"]
The poetry install works fine, I attached to a running container and run it myself and found that it works without problems. However, when I open a Python console and try to import a module (django) which is installed by the Poetry project, the module is not found. Please note that I am installing my project in the system environment (poetry config virtualenvs.create false). I verified, and there is only one version of python installed in the docker container. The specific error I get when trying to import a python module installed by Poetry is: ModuleNotFoundError: No module named xxxx
Although this is not an answer, it is too long to fit within the comment section. It is rather a piece of advice:
declare your ENV at the top of the Dockerfile to make it easier to read.
merge the multiple RUN commands together to avoid creating useless intermediate layers. In the particular case of apt-get install, this also prevents you from installing packages from a stale package list: since the text of a RUN apt-get update line never changes, Docker will not re-execute it and thus will not refresh the package list (a short before/after sketch follows this list).
avoid copying all the files in "." when you have previously copied specific files to specific places.
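As a quick illustration of the second point (an illustrative sketch, not taken from the Dockerfile below):
# Stale: the update layer is cached once and never re-executed, so a later
# install may pull packages from an outdated package list
RUN apt-get -y update
RUN apt-get install -y curl

# Better: update and install live in one layer, so whenever the install
# re-runs, the package list is refreshed too, then cleaned up
RUN apt-get -y update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*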
Here, your Dockerfile could rather look like:
# pull official base image
FROM ubuntu:20.04
ENV PATH = "${PATH}:/home/poetry/bin"
ENV HOME=/home/app
ENV APP_HOME=/home/app/web
RUN apt-get -y update && \
apt upgrade -y && \
apt-get install -y \
python3-pip \
curl \
netcat \
gunicorn && \
rm -fr /var/lib/apt/lists
# alias python2 to python3
RUN ln -s /usr/bin/python3 /usr/bin/python
# Install Poetry
RUN mkdir -p /home/poetry && \
curl -sSL https://install.python-poetry.org | POETRY_HOME=/home/poetry python -
# Cleanup
RUN apt-get remove -y \
curl && \
apt-get clean
RUN pip install --upgrade pip && \
pip install cryptography && \
pip install psycopg2-binary
# create directory for the app user
# create the app user
# create the appropriate directories
RUN mkdir -p /home/app && \
adduser --system --group app && \
mkdir -p $APP_HOME/static-incdtim && \
mkdir -p $APP_HOME/mediafiles
WORKDIR $APP_HOME
# copy project
COPY . $APP_HOME
# Install Python packages
RUN poetry config virtualenvs.create false && \
poetry install --only main
# copy entrypoint-prod.sh
RUN cp $APP_HOME/entrypoint.incdtim.prod.sh $APP_HOME/entrypoint.sh && \
chmod a+x $APP_HOME/entrypoint.sh && \
chown -R app:app $APP_HOME
# change to the app user
USER app
# run entrypoint.prod.sh
ENTRYPOINT ["/home/app/web/entrypoint.sh"]
UPDATE:
Let's get back to your question. Having your program running okay when you "run it yourself" does not mean all the dependencies are met. Indeed, this can mean that your module has not been imported yet (and thus has not triggered the ModuleNotFoundError exception yet).
In order to validate this theory, you can either:
create a simple application which imports the failing module and then quits (a quick check along these lines is sketched after this list). If the import succeeds then there is something weird indeed.
list the installed modules with poetry show --latest. If the package is listed, then there is something weird indeed.
If none of the above indicates the module is installed, that just means the module is not installed and you should update your Dockerfile to install it.
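For the first check, a one-liner run against the container is usually enough (django stands in for whichever module fails to import, and <container-id> is your running container):
# try the import directly inside the container
docker exec -it <container-id> python -c "import django; print(django.__version__)"
If that prints a version, the module is installed and the problem lies elsewhere; if it raises ModuleNotFoundError, the install step is the culprit.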
NOTE: I do not know much about Poetry, but you may want to keep a list of external dependencies to be met for your application. In the case of pip3, the list is expressed as a file named requirements.txt and can be installed with pip3 install -r requirements.txt.
It turns out this is a known bug in Poetry: https://github.com/python-poetry/poetry/issues/6459

How to prevent having to rebuild image on code changes

I started using Docker for a personal project and realized that it increases my development time by an unacceptable amount. I would rather spin up an LXC instance if I had to rebuild images for every code change.
I heard there was a way to mount the code, but I'm not sure exactly how to go about it. I also have a docker-compose YAML file, but I think you mount a volume or something in the Dockerfile? The goal is for code changes not to require rebuilding the container image.
FROM ubuntu:18.04
EXPOSE 5000
# update apt
RUN apt-get update -y
RUN apt-get install -y --no-install-recommends build-essential gcc wget
# pip installs
FROM python:3.10
# TA-Lib
RUN wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz && \
tar -xvzf ta-lib-0.4.0-src.tar.gz && \
cd ta-lib/ && \
./configure && \
make && \
make install
RUN rm -R ta-lib ta-lib-0.4.0-src.tar.gz
ADD app.py /
RUN pip install --upgrade pip setuptools
RUN pip install pymysql
COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
RUN pip freeze >> /tmp/requirement.txt
COPY . /tmp
CMD ["python", "/tmp/app.py"]
RUN chmod +x ./tmp/start.sh
RUN ./tmp/start.sh
version: '3.8'
services:
  db:
    image: mysql:8.0.28
    command: '--default-authentication-plugin=mysql_native_password'
    restart: always
    environment:
      - MYSQL_DATABASE=#########
      - MYSQL_ROOT_PASSWORD=####
  # client:
  #   build: client
  #   ports: [3000]
  #   restart: always
  server:
    build: server
    ports: [5000]
    restart: always
Here's what I would suggest to make dev builds faster:
Bind mount code into the container
A bind mount is a directory shared between the container and the host. Here's the syntax for it:
version: '3.8'
services:
  # ... other services ...
  server:
    build: server
    ports: [5000]
    restart: always
    volumes:
      # Map the server directory on the host to /code in the container
      - ./server:/code
The first part of the mount, ./server, is relative to the directory that the docker-compose.yml file is in. If the server directory and the docker-compose.yml file are in different directories, you'll need to change this part.
After that, you'd remove the part of the Dockerfile which copies code into the container. Something like this:
# pip installs
FROM python:3.10
# TA-Lib
RUN wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz && \
tar -xvzf ta-lib-0.4.0-src.tar.gz && \
cd ta-lib/ && \
./configure && \
make && \
make install
RUN rm -R ta-lib ta-lib-0.4.0-src.tar.gz
RUN pip install --upgrade pip setuptools
RUN pip install pymysql
COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
CMD ["python", "/code/app.py"]
The advantage of this approach is that when you hit 'save' in your editor, the change will be immediately propagated into the container, without requiring a rebuild.
Documentation on syntax
Note about production builds: I don't recommend bind mounts when running your production server. In that case, I would recommend copying your code into the container instead of using a bind mount. This makes it easier to upgrade a running server. I typically write two Dockerfiles and two docker-compose.yml files: one set for production, and one set for development.
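One way to keep the two setups side by side (a sketch based on Compose's default override mechanism; the file split is illustrative, not something from the question): put the production definition in docker-compose.yml and add the dev-only bind mount in docker-compose.override.yml, which docker-compose up picks up automatically and which you simply leave out in production:
# docker-compose.override.yml (development only)
version: '3.8'
services:
  server:
    volumes:
      - ./server:/code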
Install dependencies before copying code into container
One part of your Dockerfile is causing most of the slowness. It's this part:
ADD app.py /
# ... snip two lines ...
COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
This defeats Docker's layer caching. Docker caches each layer and reuses it if nothing in that layer has changed; however, once a layer changes, every layer after it is rebuilt. This means that changing app.py causes the pip install --requirement /tmp/requirements.txt line to run again.
To make use of caching, you should follow the rule that the least-frequently changing file goes in first, and most-frequently changing file goes last. Since you change the code in your project more often than you change which dependencies you're using, that means you should copy app.py in after you've installed the dependencies.
The Dockerfile would change like this:
COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
# After installing dependencies
ADD app.py /
In my projects, I find that rebuilding a container without changing dependencies takes about a second, even if I'm not using the bind-mount trick.
For more information, see the documentation on layer caching.
Remove unused stage
You have two stages in your Dockerfile:
FROM ubuntu:18.04
# ... snip ...
FROM python:3.10
A FROM line starts a new stage: everything built so far is thrown away (unless you explicitly copy it in with COPY --from) and the build starts again from the new base image. This means that everything in between these two lines is not really doing anything. To fix this, remove everything before the second FROM statement.
Why would you use multi-stage builds at all? Sometimes it's useful to install a compiler, compile something, then copy only the result into a fresh image.
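For instance, a minimal sketch of that pattern (hello.c and the image choices are made up for illustration):
# Stage 1: build stage with the full toolchain
FROM gcc:12 AS build
WORKDIR /src
COPY hello.c .
RUN gcc -static -o hello hello.c
# Stage 2: tiny runtime image that only receives the compiled binary
FROM scratch
COPY --from=build /src/hello /hello
ENTRYPOINT ["/hello"]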
Merge install and remove step
If you want to remove a file, you should do it in the same layer where you created the file. The reason for this is that deleting a file in a previous layer does not fully remove the file: the file still takes up space in the image. A tool like dive can show you files which are having this problem.
Here's how I would suggest changing this section:
RUN wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz && \
tar -xvzf ta-lib-0.4.0-src.tar.gz && \
cd ta-lib/ && \
./configure && \
make && \
make install
RUN rm -R ta-lib ta-lib-0.4.0-src.tar.gz
Merge the rm into the previous step:
RUN wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz && \
tar -xvzf ta-lib-0.4.0-src.tar.gz && \
cd ta-lib/ && \
./configure && \
make && \
make install && \
cd .. && \
rm -R ta-lib ta-lib-0.4.0-src.tar.gz

Docker image Package Patch within Dockerfile

I have the Docker image below, where I need to apply a patch update to the curl package. In the Dockerfile below, on line 3 I am already running an update, but curl still shows up in the vulnerabilities report.
When I add RUN yum -y update curl at the end of the Dockerfile, it no longer shows up in the vulnerabilities report.
Is there a fix? All packages should be installed at their latest version, and I don't want to mention each one explicitly. Or are there any mistakes in my Dockerfile?
FROM centos:7 AS base
FROM base AS build
# Install all dependenticies
RUN yum -y update \
&& yum install -y openssl-devel bzip2-devel libffi-devel \
zlib-devel wget gcc make
# Below compile python from source
FROM base
ENV LD_LIBRARY_PATH=/usr/local/lib64:/usr/local/lib
COPY --from=build /usr/local/ /usr/local/
# Copy Code
COPY . /app/
WORKDIR /app
# Install code dependencies.
RUN /usr/local/bin/python -m pip install --upgrade pip \
&& pip install -r requirements.txt
# Why do I need this step when I already run yum update on line 3? If I don't, curl shows up in the compliance report. Any fix?
RUN yum -y update curl
# run Application
ENTRYPOINT ["python"]
CMD ["test.py"]
In order to understand what constitutes an image, you need to look at a Dockerfile in a different way:
Every step (with the exception of FROM) creates a new image, with the results of the previous step as a base.
FROM doesn't use the previous step, but an explicitly specified one.
Now, looking at your Dockerfile, you seem to wonder why RUN yum -y update curl doesn't work as expected. For easier understanding, let's trace it backwards:
RUN yum -y update curl
RUN /usr/local/bin/python -m pip install --upgrade pip \ && pip install -r requirements.txt
WORKDIR /app
COPY . /app/
COPY --from=build /usr/local/ /usr/local/
ENV LD_LIBRARY_PATH=/usr/local/lib64:/usr/local/lib
FROM base -- at this point, the previous step is changed to the last step of base
FROM centos:7 AS base -- here, the previous step is changed to centos:7
As you can see, the yum -y update from the build stage appears nowhere in this chain: only /usr/local is copied over from it, so the curl in your final image still comes straight from centos:7 unless you update it in the final stage.
BTW: Typing this, I'm wondering what your precise question is, i.e. whether this works or doesn't or whether you wonder why it's necessary. Are you aware of the difference between yum update and yum update curl even?
docker build and friends have a cache system, based on the text of the input. So if the text of the command yum -y update doesn't change, it will continue using the same cached version of the output forever (or until the cache is deleted). Try running the build with --no-cache and see if that helps.
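For example (the image tag is just a placeholder):
# rebuild every layer instead of reusing the cache
docker build --no-cache -t myapp:latest .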

copy or add command not executed on docker hub

This Dockerfile works as expected on my laptop, but it fails when I use automated builds on Docker Hub.
FROM ubuntu
# Install required software via apt and pip
RUN apt-get -y update && \
apt-get install -y \
awscli \
python \
python-pip \
software-properties-common \
&& add-apt-repository ppa:ubuntugis/ppa \
&& apt-get -y update \
&& apt-get install -y \
gdal-bin \
&& pip install boto3
# Copy Build Thumbnail script to Docker image and add execute permissions
COPY build-thumbnails.py build-thumbnails.py
RUN chmod +x build-thumbnails.py
The error is:
Step 6/7 : COPY build-thumbnails.py build-thumbnails.py
COPY failed: stat /var/lib/docker/tmp/docker-builder259560514/build-thumbnails.py: no such file or directory
The repo is here...
https://github.com/shantanuo/docker/blob/master/batch/Dockerfile
Why would copy or add command not work for automated builds?
It seems like other people have had the same issue; see here:
https://forums.docker.com/t/docker-build-failing-on-docker-hub/76191/2
The solution is to set the build context appropriately so that the relative path in the Dockerfile COPY is correct.
In your Docker Hub repository go to “Builds” and click on “Configure Automated Builds”. There you can set the “Build Context” for each build rule.
Check the last answer on this page too:
https://github.com/docker/hub-feedback/issues/811
Let me know if that helps!

docker-compose update from S3 bucket

Our Dockerfile invokes a python script which copies a binary from S3 to /usr/bin. This works fine the first time. But from then on "docker-compose build" does nothing because everything is cached. This is a problem if the binary has changed.
Short of building with --no-cache, what is the best way to make sure "docker-compose build" will always pick up the new binary if there is one? We don't mind if it unnecessarily downloads the binary even when it hasn't changed, as long as it does pick it up when the binary has changed.
It seems like we want a Dockerfile step that always executes?
FROM ubuntu:trusty
RUN apt-get update
RUN apt-get -y install software-properties-common
RUN apt-get -y install --reinstall ca-certificates
RUN add-apt-repository ppa:fkrull/deadsnakes
RUN apt-get update && apt-get install -y \
curl \
wget \
vim \
git \
python3.5 \
python3-pip \
python3-setuptools \
libpcap0.8-dev
RUN ln -sf /usr/bin/python3.5 /usr/bin/python3
ADD . /app
WORKDIR /app
# Install Python Requirements
RUN pip3 install -r etc/python/requirements.txt
# Download/Install processor and associated libs
RUN python3 setup_processor.py
RUN mkdir -p /logs
ENTRYPOINT ["/app/entrypoint.sh"]
Where setup_processor.py downloads directly from S3 to /usr/bin.
As of now there is no direct feature for this, but there is a workaround:
Add a build argument before your download step:
ARG BUILD_ON=now
# Download/Install processor and associated libs
RUN python3 setup_processor.py
While building the image, use the command below:
docker build --build-arg BUILD_ON="$(date)" ...
This ensures the ARG step always changes, so the cache for every step after it is invalidated.
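Since you build through docker-compose, the same argument can be passed there as well; a sketch, assuming your Compose version supports --build-arg on build:
# force the ARG layer (and everything after it) to be rebuilt
docker-compose build --build-arg BUILD_ON="$(date)"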
A feature for this has already been requested and is being worked on in the thread below:
https://github.com/moby/moby/issues/1996
