How to prevent having to rebuild image on code changes - docker

I started using Docker for a personal project and realized that it increases my development time by an unacceptable amount. I would rather spin up an LXC instance than rebuild an image for every code change.
I heard there was a way to mount the code, but I wasn't sure exactly how to go about it. I also have a docker-compose YAML file, but I think you mount a volume or something in the Dockerfile? The goal is for code changes not to require rebuilding the container image.
FROM ubuntu:18.04
EXPOSE 5000
# update apt
RUN apt-get update -y
RUN apt-get install -y --no-install-recommends build-essential gcc wget
# pip installs
FROM python:3.10
# TA-Lib
RUN wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz && \
tar -xvzf ta-lib-0.4.0-src.tar.gz && \
cd ta-lib/ && \
./configure && \
make && \
make install
RUN rm -R ta-lib ta-lib-0.4.0-src.tar.gz
ADD app.py /
RUN pip install --upgrade pip setuptools
RUN pip install pymysql
COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
RUN pip freeze >> /tmp/requirement.txt
COPY . /tmp
CMD ["python", "/tmp/app.py"]
RUN chmod +x ./tmp/start.sh
RUN ./tmp/start.sh
version: '3.8'
services:
  db:
    image: mysql:8.0.28
    command: '--default-authentication-plugin=mysql_native_password'
    restart: always
    environment:
      - MYSQL_DATABASE=#########
      - MYSQL_ROOT_PASSWORD=####
  # client:
  #   build: client
  #   ports: [3000]
  #   restart: always
  server:
    build: server
    ports: [5000]
    restart: always

Here's what I would suggest to make dev builds faster:
Bind mount code into the container
A bind mount is a directory shared between the container and the host. Here's the syntax for it:
version: '3.8'
services:
  # ... other services ...
  server:
    build: server
    ports: [5000]
    restart: always
    volumes:
      # Map the server directory into the container at /code
      - ./server:/code
The first part of the mount, ./server, is relative to the directory that the docker-compose.yml file is in. If the server directory and the docker-compose.yml file are in different directories, you'll need to adjust this path.
After that, you'd remove the part of the Dockerfile which copies code into the container. Something like this:
# pip installs
FROM python:3.10
# TA-Lib
RUN wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz && \
tar -xvzf ta-lib-0.4.0-src.tar.gz && \
cd ta-lib/ && \
./configure && \
make && \
make install
RUN rm -R ta-lib ta-lib-0.4.0-src.tar.gz
RUN pip install --upgrade pip setuptools
RUN pip install pymysql
COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
CMD ["python", "/code/app.py"]
The advantage of this approach is that when you hit 'save' in your editor, the change will be immediately propagated into the container, without requiring a rebuild.
Documentation on syntax
Note about production builds: I don't recommend bind mounts when running your production server. In that case, I would recommend copying your code into the container instead of using a bind mount. This makes it easier to upgrade a running server. I typically write two Dockerfiles and two docker-compose.yml files: one set for production, and one set for development.
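As a minimal sketch of how that split might look, the development overrides could live in a second compose file layered on top of the production one. The file names docker-compose.dev.yml and Dockerfile.dev below are illustrative, not something from your project:
# docker-compose.dev.yml (hypothetical name) - development-only overrides
version: '3.8'
services:
  server:
    build:
      context: server
      dockerfile: Dockerfile.dev   # a dev Dockerfile that skips the code COPY
    volumes:
      - ./server:/code             # bind mount for live code changes
You would then start the development stack with docker-compose -f docker-compose.yml -f docker-compose.dev.yml up, while production uses only the base docker-compose.yml.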
Install dependencies before copying code into container
One part of your Dockerfile is causing most of the slowness. It's this part:
ADD app.py /
# ... snip two lines ...
COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
This defeats Docker's layer caching. Docker caches each layer and reuses it if nothing in that layer has changed; however, once a layer changes, every layer after it must be rebuilt. Because app.py is added before the dependencies are installed, changing app.py will cause the pip install --requirement /tmp/requirements.txt line to run again.
To make use of caching, you should follow the rule that the least-frequently changing files go in first and the most-frequently changing files go last. Since you change the code in your project more often than you change which dependencies you're using, you should copy app.py in after you've installed the dependencies.
The Dockerfile would change like this:
COPY requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt
# After installing dependencies
ADD app.py /
In my projects, I find that rebuilding a container without changing dependencies takes about a second, even if I'm not using the bind-mount trick.
For more information, see the documentation on layer caching.
Remove unused stage
You have two stages in your Dockerfile:
FROM ubuntu:18.04
# ... snip ...
FROM python:3.10
A second FROM statement starts a new build stage from a fresh base image, throwing away everything built so far. That means everything between these two lines has no effect on the final image. To fix this, remove everything before the second FROM statement.
Why would you use multistage builds? Sometimes it's useful to install a compiler, compile something, then copy it into a fresh image. Example.
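For illustration, here's a rough sketch of what a multi-stage build could look like for the TA-Lib part of your Dockerfile: compile it in a throwaway builder stage, then copy only the installed files into a clean image. The /usr/local paths assume TA-Lib's default install prefix, so verify them before relying on this:
# Stage 1: build TA-Lib with a full compiler toolchain (this stage is discarded)
FROM python:3.10 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends build-essential wget
RUN wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz && \
    tar -xvzf ta-lib-0.4.0-src.tar.gz && \
    cd ta-lib/ && \
    ./configure && \
    make && \
    make install

# Stage 2: start clean and copy over only the compiled library and headers
FROM python:3.10
COPY --from=builder /usr/local/lib/libta_lib* /usr/local/lib/
COPY --from=builder /usr/local/include/ta-lib /usr/local/include/ta-lib
# Refresh the linker cache so the copied library can be found
RUN ldconfig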
Merge install and remove step
If you want to remove a file, you should do it in the same layer where you created the file. The reason for this is that deleting a file in a previous layer does not fully remove the file: the file still takes up space in the image. A tool like dive can show you files which are having this problem.
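For example, assuming your image is tagged something like my-app:dev (a placeholder tag), you could inspect what each layer adds and how much space is wasted with:
# Explore the image layer by layer (dive is a separate tool you would install first)
dive my-app:dev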
Here's how I would suggest changing this section:
RUN wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz && \
tar -xvzf ta-lib-0.4.0-src.tar.gz && \
cd ta-lib/ && \
./configure && \
make && \
make install
RUN rm -R ta-lib ta-lib-0.4.0-src.tar.gz
Merge the rm into the previous step:
RUN wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz && \
tar -xvzf ta-lib-0.4.0-src.tar.gz && \
cd ta-lib/ && \
./configure && \
make && \
make install && \
cd .. && \
rm -R ta-lib ta-lib-0.4.0-src.tar.gz

Related

Dockerfile cannot find executable script (no such file or directory)

I'm writing a Dockerfile in order to create an image for a web server (a Shiny server, more precisely). It works well, but it depends on a huge database folder (db/) that is not distributed with the package, so I want to do all this preprocessing while creating the image, by running the corresponding script in the Dockerfile.
I expected this to be simple, but I'm struggling to figure out where my files end up within the image.
This repo has the following structure:
Dockerfile
preprocessing_files
configuration_files
app/
    application_files
    db/
        processed_files
So that app/db/ does not exist, but is created and filled with files when preprocessing_files are run.
The Dockerfile is the following:
# Install R version 3.6
FROM r-base:3.6.0
# Install Ubuntu packages
RUN apt-get update && apt-get install -y \
sudo \
gdebi-core \
pandoc \
pandoc-citeproc \
libcurl4-gnutls-dev \
libcairo2-dev/unstable \
libxml2-dev \
libxt-dev \
libssl-dev
# Download and install ShinyServer (latest version)
RUN wget --no-verbose https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/VERSION -O "version.txt" && \
VERSION=$(cat version.txt) && \
wget --no-verbose "https://s3.amazonaws.com/rstudio-shiny-server-os-build/ubuntu-12.04/x86_64/shiny-server-$VERSION-amd64.deb" -O ss-latest.deb && \
gdebi -n ss-latest.deb && \
rm -f version.txt ss-latest.deb
# Install R packages that are required
RUN R -e "install.packages(c('shiny', 'flexdashboard','rmarkdown','tidyverse','plotly','DT','drc','gridExtra','fitdistrplus'), repos='http://cran.rstudio.com/')"
# Copy configuration files into the Docker image
COPY shiny-server.conf /etc/shiny-server/shiny-server.conf
COPY /app /srv/shiny-server/
COPY /app/db /srv/shiny-server/app/
# Make the ShinyApp available at port 80
EXPOSE 80
CMD ["/usr/bin/shiny-server"]
The above file works well if preprocessing_files are run in advance, so that app/application_files can successfully read app/db/processed_files. How could this script be run from the Dockerfile? To me the intuitive solution would simply be to write:
RUN bash -c "preprocessing.sh"
before the COPY instructions, but then preprocessing_files are not found. If the instruction is placed after the COPY lines, together with WORKDIR app/, the same error happens. I cannot understand why.
You cannot execute code on the host machine from a Dockerfile; the RUN command executes inside the container being built. You can:
Copy preprocessing_files into the Docker image and run preprocessing.sh during the build (this will increase the size of the image); a sketch of this is shown after this list
Create a Makefile or build.sh script which launches preprocessing.sh before executing docker build
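Here's a very rough sketch of the first option, assuming preprocessing.sh sits at the root of the repository and creates app/db/ relative to the directory it runs in (both assumptions; adjust to your actual layout). The idea is to copy the inputs in, run the script during the build, and then move the generated files where the server expects them:
# Copy the build context (including the preprocessing files) into the image
COPY . /build
WORKDIR /build
# Run the preprocessing during the build so app/db/ is created inside the image
RUN bash preprocessing.sh
# Place the app, now including the generated db/, where shiny-server expects it
RUN cp -r /build/app/. /srv/shiny-server/
This would replace the original COPY /app ... lines. The second bullet avoids the extra image size by generating db/ on the host before docker build runs.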

Duplication in Dockerfiles

I have a Django Web-Application that uses celery in the background for periodic tasks.
Right now I have three docker images
one for the django application
one for celery workers
one for the celery scheduler
whose Dockerfiles all look like this:
FROM alpine:3.7
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
COPY Pipfile Pipfile.lock ./
RUN apk update && \
apk add python3 postgresql-libs jpeg-dev git && \
apk add --virtual .build-deps gcc python3-dev musl-dev postgresql-dev zlib-dev && \
pip3 install --no-cache-dir pipenv && \
pipenv install --system && \
apk --purge del .build-deps
COPY . ./
# Run the image as a non-root user
RUN adduser -D noroot
USER noroot
EXPOSE $PORT
CMD <Different CMD for all three containers>
So they are all exactly the same except the last line.
Would it make sense here to create some kind of base image that contains everything except the CMD, and have all three images use that as a base and add only their respective CMD?
Or won't that give me any advantages, because everything is cached anyway?
Is a separation like the one above reasonable?
Two small bonus questions:
Sometimes the apk update ... layer is cached by Docker. How does Docker know that there are no updates here?
I often read that I should reduce the number of layers as far as possible to shrink the image. But doesn't that work against the caching idea and result in longer builds?
I would suggest using one Dockerfile and just overriding the CMD at runtime. A small modification will work both locally and on Heroku.
As far as Heroku is concerned, it lets you set environment variables that are available when the container starts.
heroku set-up-your-local-environment-variables
FROM alpine:3.7
ENV PYTHONUNBUFFERED 1
ENV APPLICATION_TO_RUN=default_application
RUN mkdir /code
WORKDIR /code
COPY Pipfile Pipfile.lock ./
RUN apk update && \
apk add python3 postgresql-libs jpeg-dev git && \
apk add --virtual .build-deps gcc python3-dev musl-dev postgresql-dev zlib-dev && \
pip3 install --no-cache-dir pipenv && \
pipenv install --system && \
apk --purge del .build-deps
COPY . ./
# Run the image as a non-root user
RUN adduser -D noroot
USER noroot
EXPOSE $PORT
CMD $APPLICATION_TO_RUN
So when you run your container, pass the application name in the run command:
docker run -it --name test -e APPLICATION_TO_RUN="celery beat" --rm test
I would recommend looking at docker-compose to simplify management of multiple containers.
Use a single Dockerfile like the one you posted above, then create a docker-compose.yml that might look something like this:
version: '3'
services:
  # a django service serving an application on port 80
  django:
    build: .
    command: python manage.py runserver
    ports:
      - 8000:80
  # the celery worker
  worker:
    build: .
    command: celery worker
  # the celery scheduler
  scheduler:
    build: .
    command: celery beat
Of course, modify the commands here to be whatever you are using for your currently separate Dockerfiles.
When you want to rebuild the image, docker-compose build will rebuild your container image from your Dockerfile for the first service, then reuse the built image for the other services (because they already exist in the cache). docker-compose up will spin up 3 instances of your container image, but overriding the run command each time.
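In day-to-day use, that looks something like this (run from the directory containing the docker-compose.yml):
# Build the image once; the worker and scheduler services reuse the same build from cache
docker-compose build
# Start the django, worker, and scheduler containers, each with its own command override
docker-compose up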
If you want to get more sophisticated, there are plenty of resources out there for the very common combination of django and celery.

Docker port forwarding cannot see the output on browser

I am a newbie to Docker. I'm using Ubuntu 14.04 as my OS and I've installed Docker Community Edition by following the instructions from https://docs.docker.com/engine/installation/linux/docker-ce/ubuntu/#set-up-the-repository
I have created a Dockerfile for my project and run it using a docker-compose file.
My Dockerfile is as follows.
# ImageName
FROM node:8.8.1
# Create app required directories
ENV appDir /usr/src/app
RUN mkdir -p /usr/src/app /usr/src/app/datas /usr/log/supervisor
# Change working directory
WORKDIR ${appDir}
# Install dependencies
RUN apt-get update && \
apt-get -y install vim\
supervisor \
python3 \
python3-pip \
python3-setuptools \
groff \
less \
&& pip3 install --upgrade pip \
&& apt-get clean
RUN pip3 --no-cache-dir install --upgrade awscli
# Install app dependencies
COPY graphql/package.json /usr/src/app
RUN npm install
RUN npm install -g webpack
# Copy app source code
COPY graphql/ /usr/src/app
COPY datas/ /usr/src/app/datas
# Set Environment Variables
RUN echo export DATA_DIR=/usr/src/app/datas/ >> ~/.data_variables && \
echo "source ~/.data_variables" >> ~/.bash_login && \
echo "source ~/.data_variables" >> ~/.bashrc
COPY supervisord.conf /etc/supercvisor/conf.d/supervisord.conf
# Expose API port to the outside
EXPOSE 5000
# Launch application
CMD ["/usr/bin/supervisord", "-c", "/etc/supercvisor/conf.d/supervisord.conf"]
My docker-compose file
version: '3'
services:
  web:
    build: .
    image: graphql_img
    container_name: graphql_img_master
    ports:
      - "5000:5000"
My supervisord.conf file
[supervisord]
nodaemon=true
[program:babelWatch]
command=npm run babelWatch
[program:monitor]
command=npm run monitor
As you can see, I've exposed port 5000, but when I try to check the output in the browser at localhost:5000/graphql it shows an error
This site can’t be reached
I even tried to find the IP address of the Docker container using the "docker inspect" command and used that container IP address with the port, but I'm still getting the error. Can somebody please help me out with this? Any help would be much appreciated.
Additionally, it would also be really helpful to know how to make the "npm run monitor" program run in the foreground using supervisor.

Prevent docker from building the image from scratch after making changes to the code

I'm a Docker newbie trying to develop in a Docker container. My problem is that every time I make a single-line code change and try to rerun the container, Docker rebuilds the image from scratch, which takes a very long time. How should I set up the project so it makes the best use of the cache? I'm pretty sure it doesn't have to reinstall all the apt-get and pip packages (I'm developing in Python) whenever I change the source code. Does anyone have an idea what I'm missing? I appreciate any help.
My current Dockerfile:
FROM tiangolo/uwsgi-nginx-flask:python3.6
# Copy the current directory contents into the container at /app
ADD ./app /app
# Run python's package manager and install the flask package
RUN apt-get update -y \
&& apt-get -y install default-jre \
&& apt-get install -y \
build-essential \
gfortran \
libblas-dev \
liblapack-dev \
libxft-dev \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
ADD ./requirements.txt /app/requirements.txt
RUN pip3 install -r requirements.txt
Once the cache breaks in a Dockerfile, all of the following lines need to be rebuilt, since they no longer have a cache hit. The cache lookup searches for an existing previous layer plus an identical command (or identical contents, for something like a COPY). If either does not match, you get a cache miss and the build step is performed. For your scenario, you simply need to reorder your lines so that the frequently changing part is at the end of the file rather than the beginning:
FROM tiangolo/uwsgi-nginx-flask:python3.6
# Run python's package manager and install the flask package
RUN apt-get update -y \
&& apt-get -y install default-jre \
&& apt-get install -y \
build-essential \
gfortran \
libblas-dev \
liblapack-dev \
libxft-dev \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip3 install -r requirements.txt
# Copy the current directory contents into the container at /app
COPY app /app
I've also modified your ADD lines to COPY because you don't need the extra features provided by ADD.
During development, I'd recommend mounting app as a volume in your container so you don't need to rebuild the image for every code change. You can leave the COPY app /app inside your Dockerfile; the volume mount will simply overlay the directory, hiding anything in your image at that location. You only need to restart your container to pick up your modifications. Once finished, a build will create an image that looks identical to your development environment.
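As a hedged sketch of that workflow with plain docker run (the image tag my-flask-app is a placeholder, and port 80 is assumed from the base image's defaults):
# Bind-mount the local ./app directory over /app in the container;
# edits on the host show up inside the container without a rebuild
docker run --rm -p 8080:80 -v "$(pwd)/app:/app" my-flask-app
The application would then be reachable on host port 8080 while you edit the code locally.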

docker-compose update from S3 bucket

Our Dockerfile invokes a python script which copies a binary from S3 to /usr/bin. This works fine the first time. But from then on "docker-compose build" does nothing because everything is cached. This is a problem if the binary has changed.
Short of building with --no-cache, what is the best way to make sure "docker-compose build" will always pick up the new binary if there is one? We don't mind if it unnecessarily downloads the binary even when it's unchanged, as long as it does fetch the new one when the binary has changed.
Seems like we want a Dockerfile step that always executes?
FROM ubuntu:trusty
RUN apt-get update
RUN apt-get -y install software-properties-common
RUN apt-get -y install --reinstall ca-certificates
RUN add-apt-repository ppa:fkrull/deadsnakes
RUN apt-get update && apt-get install -y \
curl \
wget \
vim \
git \
python3.5 \
python3-pip \
python3-setuptools \
libpcap0.8-dev
RUN ln -sf /usr/bin/python3.5 /usr/bin/python3
ADD . /app
WORKDIR /app
# Install Python Requirements
RUN pip3 install -r etc/python/requirements.txt
# Download/Install processor and associated libs
RUN python3 setup_processor.py
RUN mkdir -p /logs
ENTRYPOINT ["/app/entrypoint.sh"]
Where setup_processor.py downloads directly from S3 to /usr/bin.
As of now there is no direct feature for this, but there is a workaround.
Add a build argument before your download step:
ARG BUILD_ON=now
# Download/Install processor and associated libs
RUN python3 setup_processor.py
While building the image, use:
docker build --build-arg BUILD_ON="$(date)" ....
This ensures the argument value changes on every build, so the cache for the RUN steps after the ARG declaration is invalidated while the earlier layers stay cached.
A feature for this has already been requested and is being discussed in the thread below:
https://github.com/moby/moby/issues/1996
