Pip install local package invalidates docker cache in upper layers

I created a multi-stage Dockerfile where the base image prepares an Anaconda environment with the required packages, and the final image copies the Anaconda installation and installs the local package.
I noticed that on every CI build and push, all of the layers are recomputed and pushed, including the one big Anaconda layer.
Here is how I build it
DOCKER_BUILDKIT=1 docker build -t my_image:240beac6 \
-f docker/dockerfiles/Dockerfile . \
--build-arg BASE_IMAGE=base_image:240beac64 --build-arg BUILDKIT_INLINE_CACHE=1 \
--cache-from my_image:latest
docker push my_image:240beac6
Here is the Dockerfile:
ARG BASE_IMAGE
FROM $BASE_IMAGE as base
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
# enable conda
ENV PATH=/root/miniconda3/bin/:${PATH}
COPY --from=base /opt/fast_align/* /usr/bin/
COPY --from=base /usr/local/bin/yq /usr/local/bin/yq
COPY --from=base /root/miniconda3 /root/miniconda3
COPY . /opt/my_package
# RUN pip install --no-deps /opt/my_package
If I leave the last RUN command commented out, Docker rebuilds only the last COPY layer (and only if some file in the context changed).
However, if I enable the install, every layer is invalidated.
Is it because the pip install changes /root/miniconda3?
If so, I am surprised: I was hoping that later RUN commands could not affect earlier layers.
Is there a way to copy the conda environment from the base image, install the local package in a separate command, and still benefit from caching?
Any help is much appreciated.

One solution, albeit a bit hacky, would be to replace the last RUN with CMD and install the package when the container starts. It would be almost instant, as the requirements are already installed in the base image.
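A minimal sketch of that workaround, assuming the package can be started with python -m my_package (that entry point is hypothetical and not from the question). As in the original, BASE_IMAGE has no default and must be passed with --build-arg. The install moves from build time to container start, so no build layer depends on it:

```dockerfile
ARG BASE_IMAGE
FROM $BASE_IMAGE AS base

FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
# enable conda
ENV PATH=/root/miniconda3/bin/:${PATH}
COPY --from=base /root/miniconda3 /root/miniconda3
COPY . /opt/my_package
# Install the local package when the container starts instead of at build time.
# "python -m my_package" is a hypothetical entry point; substitute the real one.
CMD ["sh", "-c", "pip install --no-deps /opt/my_package && exec python -m my_package"]
```

The trade-off is that every container start pays for the install, and the image on its own does not contain the installed package.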

Related

Confused by Dockerfile

I feel confused by the Dockerfile and build process. Specifically, I am working my way through the book Docker on AWS and I feel stuck until I can work my way through a few more of the details. The book had me write the following Dockerfile.
#Test stage
FROM alpine as test
LABEL application=todobackend
#Install basic utilities
RUN apk add --no-cache bash git
#Install build dependencies
RUN apk add --no-cache gcc python3-dev libffi-dev musl-dev linux-headers mariadb-dev py3-pip
RUN ../../usr/bin/pip3 install wheel
#Copy requirements
COPY /src/requirements* /build/
WORKDIR /build
#Build and install requirements
RUN pip3 wheel -r requirements_test.txt --no-cache-dir --no-input
RUN pip3 install -r requirements_test.txt -f /build --no-index --no-cache-dir
# Copy source code
COPY /src /app
WORKDIR /app
# Test entrypoint
CMD ["python3","manage.py","test","--noinput","--settings=todobackend.settings_test"]
The following is a list of the things I understand versus don't understand.
I understand this.
#Test stage
FROM alpine as test
LABEL application=todobackend
It is defining a 'test' stage so I can run commands like docker build --target test and will execute all of the following commands until the next FROM / as command indicates a different target. LABEL is labeling the specific docker image that is built and from which containers will be 'born' (not sure if that is the right word to use). I don't feel any confusion about that EXCEPT if that tag translates to containers spawned from that image.
So NOW I start to feel confused.
I PARTLY understand this
#Install basic utilities
RUN apk add --no-cache bash git
I understand that apk is an overloaded term that represents both the package manager on Alpine Linux and a file type. In this context, it is a package manager command to install (or upgrade) a package on the running system. HOWEVER, I am supposed to be building / packaging up an application and all of its dependencies into an enclosed 'environment'. Sooo... where / when does this 'environment' come in? That is where I feel confused. When the Dockerfile runs apk, is it just saying "locally, on your current machine, please install these the normal way" (i.e., the equivalent of a bash script where apk installs to its working directory)? When I run docker build --target test -t todobackend-test on my previously pasted Dockerfile, is the docker command doing both a native command execution AND a Docker Engine call to create an isolated environment for my Docker image? I feel like what must be happening is that when the docker command is run, it acts like a wrapper around both the built-in package manager / bash / pip functionality AND the Docker Engine, but I don't know.
Anyways, I hope this made sense. I just want some implementation details. Feel free to link documentation, but it can feel super tedious and unnecessarily detailed OR obfuscated sometimes.
I DO want to point out that if I run an apk command in my Dockerfile with a bad dependency name (e.g. python3-pip instead of py3-pip), I get a very interesting error:
/bin/sh: pip3: not found
Notice the command path. I am assuming anyone reading this will understand why that feels hella confusing.

Why are two images created instead of one?

Please see the command below:
docker build -t iansbasecontainer:v1 -f DotnetDebug.Dockerfile .
It creates one image, as shown below:
DotnetDebug.Dockerfile looks like this:
FROM microsoft/aspnetcore:2.0 AS base
# Install the SSHD server
RUN apt-get update \
&& apt-get install -y --no-install-recommends openssh-server \
&& mkdir -p /run/sshd \
&& echo "root:Docker!" | chpasswd
#Copy settings file. See elsewhere to find them.
COPY sshd_config /etc/ssh/sshd_config
COPY authorized_keys root/.ssh/authorized_keys
# Install Visual Studio Remote Debugger
RUN apt-get install zip unzip
RUN curl -sSL https://aka.ms/getvsdbgsh | bash /dev/stdin -v latest -l ~/vsdbg
EXPOSE 2222
I then run this command:
docker build -t iansimageusesbasecontainer:v1 -f DebugASP.Dockerfile .
However, two images appear:
DebugASP.Dockerfile looks like this:
FROM iansbasecontainer:v1 AS base
WORKDIR /app
MAINTAINER Vladimir Vladimir#akopyan.me
FROM microsoft/aspnetcore-build:2.0 AS build
WORKDIR /src
COPY ./DebugSample .
RUN dotnet restore
FROM build AS publish
RUN dotnet publish -c Debug -o /app
FROM base AS final
COPY --from=publish /app /app
COPY ./StartSSHAndApp.sh /app
EXPOSE 5000
CMD /app/StartSSHAndApp.sh
#If you wish to only have SSH running and start
#your service when you start debugging
#then use just the SSH server, you don't need the script
#CMD ["/usr/sbin/sshd", "-D"]
Why do two images appear? Please note I am relatively new to Docker so this may be a simple answer. I have spent the last few hours Googling it.
Also, why are the repository and tag set to <none>?
Why do two images appear?
As mentioned here:
When using multi-stage builds, each stage produces a new image. That image is stored in the local image cache and will be used on subsequent builds (as part of the caching mechanism). You can run each build-stage (and/or tag the stage, if desired).
Read more about multi-stage builds here.
Docker produces intermediate (aka <none>:<none>) images for each layer, which are later used for the final image. You can see them if you run the docker images -a command.
But what you see here is called a dangling image. It appears because some intermediate image is no longer used by the final image. In multi-stage builds, the images for previous stages are not used in the final image, so they become dangling.
Dangling images are useless and take up disk space, so it's recommended to get rid of them regularly (this is called pruning). You can do that with the command:
docker image prune

Docker isn't caching Alpine apk add command

Every time I build the container, I have to wait for apk add docker to finish, which takes a long time.
Since it downloads the same thing every time, can I somehow force Docker to cache apk's downloads for development purposes?
Here's my Dockerfile:
FROM golang:1.13.5-alpine
WORKDIR /go/src/app
COPY src .
RUN go get -d -v ./...
RUN go install -v ./...
RUN apk add --update docker
CMD ["app"]
BTW, I am using this part volumes: - /var/run/docker.sock:/var/run/docker.sock in my docker-compose.yml to use sibling containers, if that matters.
EDIT: I've found that Google copies a docker.tgz into the image in Chromium:
# add docker client -- do not install docker via apk -- it will try to install
# docker engine which takes a lot of space as well (we don't need it, we need
# only the small client to communicate with the host's docker server)
ADD build/docker/docker.tgz /
What is that docker.tgz? How can I get it?
Reorder your Dockerfile and it should work.
FROM golang:1.13.5-alpine
RUN apk add --update docker
WORKDIR /go/src/app
COPY src .
RUN go get -d -v ./...
RUN go install -v ./...
CMD ["app"]
Because you copy src before installing Docker, the cache for the Docker installation is invalidated whenever you change something in src.
Whenever you have a COPY command, if any of the files involved change, every command after it is re-run. If you move your RUN apk add ... command to the start of the file, before it COPYs anything, it will be cached across builds.
A fairly generic recipe for most Dockerfiles to accommodate this pattern looks like:
FROM some-base-image
# Install OS-level dependencies
RUN apk add or apt-get install ...
WORKDIR /app
# Install language-level dependencies
COPY requirements.txt requirements.lock ./
RUN something install -r requirements.txt
# Install the rest of the application
COPY main.app ./
COPY src src/
# Set up standard run-time metadata
EXPOSE 12345
CMD ["/app/main.app"]
(Go and Java applications need the additional step of compiling the application, which often lends itself to a multi-stage build, but this same pattern can be repeated in both stages.)
You can download the Docker x86_64 binaries for Mac, Linux, or Windows, unzip/untar them, and make them executable.
Whenever you install packages in a Docker image, those steps should go at the beginning of the Dockerfile so they are not rerun on every build; the COPY of your application code should go near the end.

How to Edit Docker Image?

I did a basic search in the community and could not find a suitable answer, so I am asking here. Sorry if it was asked earlier.
Basically, I am working on a project where we change the code at regular intervals. So we need to rebuild the Docker image every time, which means installing the dependencies from requirement.txt from scratch, and that takes around 10 minutes every time.
How can I make changes directly to a Docker image, and how do I configure the entrypoint (in the Dockerfile) so that it reflects the changes in the pre-built Docker image?
You don't edit an image once it's been built. You always run docker build from the start; it always runs in a clean environment.
The flip side of this is that Docker caches built images. If you had image 01234567, ran RUN pip install -r requirements.txt, and got image 2468ace0 out, then the next time you run docker build it will see the same source image and the same command, skip doing the work, and jump directly to the output image. A COPY or ADD whose files have changed invalidates the cache for all later steps.
So the standard pattern is
# arbitrary choice of language
FROM node:10
WORKDIR /app
# Copy in _only_ the requirements and package lock files
COPY package.json yarn.lock ./
# Install dependencies (once)
RUN yarn install
# Copy in the rest of the application and build it
COPY src/ src/
RUN yarn build
# Standard application metadata
EXPOSE 3000
CMD ["yarn", "start"]
If you only change something in your src tree, docker build will reuse the cache for everything up to the COPY src/ src/ step, since the package.json and yarn.lock files haven't changed.
I was facing the same problem: after every minor change, I was building the image again and again.
My old Dockerfile:
FROM python:3.8.0
WORKDIR /app
# Install system libraries
RUN apt-get update && \
apt-get install -y git && \
apt-get install -y gcc
# Install project dependencies
COPY ./requirements.txt .
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt --use-deprecated=legacy-resolver
# Don't use terminal buffering, print all to stdout / err right away
ENV PYTHONUNBUFFERED=1
COPY . .
What I did was first create a base image file like this (leaving out the last line, so my code is not copied):
FROM python:3.8.0
WORKDIR /app
# Install system libraries
RUN apt-get update && \
apt-get install -y git && \
apt-get install -y gcc
# Install project dependencies
COPY ./requirements.txt .
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt --use-deprecated=legacy-resolver
# Don't use terminal buffering, print all to stdout / err right away
ENV PYTHONUNBUFFERED=1
and then build this image using
docker build -t my_base_img:latest -f base_dockerfile .
then the final Dockerfile
FROM my_base_img:latest
WORKDIR /app
COPY . .
When I started a container from this image, it would not come up because of problems in my copied Python code. You can edit the code inside the running container to fix such issues, which let me avoid rebuilding the image again and again.
Once my code was fixed, I copied the changes from the container back to my code base and then built the final image.
There are 4 steps:
1. Start the image you want to edit (e.g. docker run ...)
2. Shell into the running container with docker exec -it <container-id> bash (you can get the container id with docker ps)
3. Make any modifications (install new things, make a directory or file)
4. In a new terminal tab/window run docker commit c7e6409a22bf my-new-image (substituting in the container id of the container you want to save)
An example
# Run an existing image
docker run -dt existing-image
# See that it's running
docker ps
# CONTAINER ID IMAGE COMMAND CREATED STATUS
# c7e6409a22bf existing-image "R" 6 minutes ago Up 6 minutes
# Shell into it
docker exec -it c7e6409a22bf bash
# Make a new directory for demonstration purposes
# (note that this is inside the running container)
mkdir NEWDIRECTORY
# Open another terminal tab/window, and save the running container you modified
docker commit c7e6409a22bf my-new-image
# Inspect to ensure it saved correctly
docker image ls
# REPOSITORY TAG IMAGE ID CREATED SIZE
# existing-image latest a7dde5d84fe5 7 minutes ago 888MB
# my-new-image latest d57fd15d5a95 2 minutes ago 888MB

how to use pip to install pkg from requirement file without reinstall

I am trying to build a Docker image. My Dockerfile looks like this:
FROM python:2.7
ADD . /code
WORKDIR /code
RUN pip install -r requirement.txt
CMD ["python", "manage.py", "runserver", "0.0.0.0:8300"]
And my requirement.txt file like this:
wheel==0.29.0
numpy==1.11.3
django==1.10.5
django-cors-headers==2.0.2
gspread==0.6.2
oauth2client==4.0.0
Now I have made a small change to my code and I need pandas, so I added it to the requirement.txt file:
wheel==0.29.0
numpy==1.11.3
pandas==0.19.2
django==1.10.5
django-cors-headers==2.0.2
gspread==0.6.2
oauth2client==4.0.0
pip install -r requirement.txt will install all the packages in that file, although almost all of them were installed before. My question is: how can I make pip install only pandas? That would save time when building the image.
Thank you
If you rebuild your image after changing requirement.txt with docker build -t <your_image> ., I guess it can't be done: each time docker build runs, it starts an intermediate container from the base image, which is a new environment, so pip will need to install all of the dependencies.
You can consider building your own base image on python:2.7 with the common dependencies pre-installed, and then building your application image on top of that base image. When you need to add more dependencies, manually rebuild the base image on top of the previous one with only the extra dependencies installed, and then optionally docker push it back to your registry.
Hope this could be helpful :-)
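As a complementary sketch (not part of the answer above), the Dockerfile from the question can be reordered so that the pip step depends only on requirement.txt, and a BuildKit cache mount can keep pip's download cache between builds. This assumes BuildKit is enabled and that pip's cache lives at /root/.cache/pip, which is the usual location when running as root. pip still runs in full when requirement.txt changes, but packages it has already downloaded or built can be reused from the cache.

```dockerfile
# syntax=docker/dockerfile:1
FROM python:2.7
WORKDIR /code
# Copy only the requirements file first, so code changes do not invalidate the install layer.
COPY requirement.txt ./
# Cache mount (requires BuildKit): pip's cache directory persists across builds.
# Assumption: pip's cache is at /root/.cache/pip in this image.
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirement.txt
# Copy the application code last.
COPY . .
CMD ["python", "manage.py", "runserver", "0.0.0.0:8300"]
```

With this layout, a code-only change skips the pip step entirely, and adding pandas to requirement.txt re-runs it but should mostly need to fetch only the new package.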
