I am trying to build a Docker image. My Dockerfile is like this:
FROM python:2.7
ADD . /code
WORKDIR /code
RUN pip install -r requirement.txt
CMD ["python", "manage.py", "runserver", "0.0.0.0:8300"]
And my requirement.txt file is like this:
wheel==0.29.0
numpy==1.11.3
django==1.10.5
django-cors-headers==2.0.2
gspread==0.6.2
oauth2client==4.0.0
Now I have made a small change in my code and I need pandas, so I added it to the requirement.txt file:
wheel==0.29.0
numpy==1.11.3
pandas==0.19.2
django==1.10.5
django-cors-headers==2.0.2
gspread==0.6.2
oauth2client==4.0.0
pip install -r requirement.txt will install all packages in that file, although almost all of them were already installed before. My question is: how can I make pip install only pandas? That would save time when building the image.
Thank you
If you rebuild your image after changing requirement.txt with docker build -t <your_image> ., I guess it can't be done, because each time Docker runs docker build it starts an intermediate container from the base image, and that is a fresh environment, so pip obviously needs to install all of the dependencies again.
You can consider building your own base image on top of python:2.7 with the common dependencies pre-installed, then building your application image on top of your own base image. Once there's a need to add more dependencies, manually rebuild the base image on top of the previous one with only the extra dependencies installed, and then optionally docker push it back to your registry.
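For example, a minimal sketch of that split (the base image name my-python-base and the file name base_dockerfile are just placeholders):
# base_dockerfile -- rebuild this image only when the shared dependencies change
FROM python:2.7
COPY requirement.txt /tmp/requirement.txt
RUN pip install -r /tmp/requirement.txt
# application Dockerfile, built on top of the base image
FROM my-python-base
ADD . /code
WORKDIR /code
# packages already present in the base image are skipped; only new ones (e.g. pandas) get installed
RUN pip install -r requirement.txt
CMD ["python", "manage.py", "runserver", "0.0.0.0:8300"]
The base image would be built once with docker build -t my-python-base -f base_dockerfile . and only rebuilt when the shared dependencies change; the application image then rebuilds quickly because only newly added packages need to be downloaded.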
Hope this could be helpful :-)
I created a multi-stage Dockerfile where in the base image I prepare an Anaconda environment with the required packages, and in the final image I copy the Anaconda environment over and install the local package.
I noticed that on every CI build and push all of the layers are recomputed and pushed, including the one big anaconda layer.
Here is how I build it
DOCKER_BUILDKIT=1 docker build -t my_image:240beac6 \
-f docker/dockerfiles/Dockerfile . \
--build-arg BASE_IMAGE=base_image:240beac64 --build-arg BUILDKIT_INLINE_CACHE=1 \
--cache-from my_image:latest
docker push my_image:240beac6
ARG BASE_IMAGE
FROM $BASE_IMAGE as base
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
# enable conda
ENV PATH=/root/miniconda3/bin/:${PATH}
COPY --from=base /opt/fast_align/* /usr/bin/
COPY --from=base /usr/local/bin/yq /usr/local/bin/yq
COPY --from=base /root/miniconda3 /root/miniconda3
COPY . /opt/my_package
# RUN pip install --no-deps /opt/my_package
If I leave the last RUN command commented out, Docker rebuilds only the last COPY layer (if some file in the context changed).
However, if I try to install it, it invalidates everything.
Is it because I change /root/miniconda3 with the pip install?
If so, I am surprised by that; I was hoping that commands lower in the Dockerfile couldn't affect the layers above them.
Is there a way to copy the conda environment from the base image, install the local package in a separate command, and still benefit from the caching?
Any help is much appreciated.
One solution, albeit a bit hacky, would be to replace the last RUN with CMD and install the package on start of the container. It would be almost instant, as the requirements are already installed in the base image.
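A minimal sketch of that idea, assuming the package exposes a start command called my_app (a hypothetical name; substitute your real entry point):
# leave the build-time install commented out so the COPY layers above stay cacheable...
# RUN pip install --no-deps /opt/my_package
# ...and install the package when the container starts instead
CMD pip install --no-deps /opt/my_package && exec my_app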
I am trying to create a Python-based image with some packages installed, but I want the image layers not to show anything about the packages I installed.
I am trying to use a multi-stage build.
e.g.:
FROM python:3.9-slim-buster as builder
RUN pip install django # (I don't want this command to be visible when checking the Docker image layers, so that's why I'm using a multi-stage build)
FROM python:3.9-slim-buster
# Here I want to copy all the site-packages
COPY --from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
Now build the image:
docker build -t python_3.9-slim-buster_custom:latest .
and later check the image layers
dive python_3.9-slim-buster_custom:latest
This will not show the RUN pip install django line.
Will this be a good way to achieve what I want (hiding all the pip install commands)?
Whether this will be sufficient or not depends on what you are installing. Some Python libraries add binaries to your system that they rely on.
FROM python:3.9-alpine as builder
# install stuff
FROM python:3.9-alpine
# this is for sure required
COPY --from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
# this depends on what you are installing
COPY --from=builder /usr/local/bin /usr/local/bin
The usual approach I see for this is to use a virtual environment in an earlier build stage, then copy the entire virtual environment into the final image. Remember that virtual environments are very specific to a single Python build and installation path.
If your application has its own setup.cfg or setup.py file, then a minimal version of this could look like:
FROM python:3.9-slim-buster as builder
# If you need build-only tools, like build-essential for Python C
# extensions, install them first
# RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install ...
WORKDIR /src
# Create and "activate" the virtual environment
RUN python3 -m venv /app
ENV PATH=/app/bin:$PATH
# Install the application as normal
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
RUN pip install .
FROM python:3.9-slim-buster
# If you need runtime libraries, like a database client C library,
# install them first
# RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install ...
# Copy the entire virtual environment over
COPY --from=builder /app /app
ENV PATH=/app/bin:$PATH
# Run an entry_points script from the setup.cfg as the main command
CMD ["my_app"]
Note that this has only minimal protection against a curious user seeing what's in the image. The docker history or docker inspect output will show the /app container directory, you can docker run --rm the-image pip list to see the package dependencies, and the application and library source will be present in a human-readable form.
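For instance, any of these (using the placeholder image name the-image) will reveal what the multi-stage build was meant to hide:
# the layer history still records the COPY of /app
docker history the-image
# the installed packages are still listed
docker run --rm the-image pip list
# and the application source under /app is readable
docker run --rm the-image ls /app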
Currently what's working for me is:
FROM python:3.9-slim-buster as builder
# DO ALL YOUR STUFF HERE
FROM python:3.9-slim-buster
COPY --from=builder / /
I need your help: I have a standard Python library which is in a .tar.gz file. I have to manually copy the file into the git repo to use it all the time.
I need to create a Docker container which will have this file and install the libraries from that standard library.
I am looking for a Dockerfile for this.
I tried a Dockerfile as below:
FROM python:3.6
COPY . /app
WORKDIR /app
RUN ls -ltr
EXPOSE 8080
RUN pip install pipenv
RUN pipenv install --system --deploy --skip-lock
I have a .tar.gz file which I need to copy into the Docker image, install the packages from it, and use them in containers.
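For what it's worth, pip can install a local sdist archive directly, so a minimal sketch might look like this (mylib-1.0.tar.gz is a placeholder for your real archive name):
FROM python:3.6
WORKDIR /app
# the vendored .tar.gz is copied in along with the rest of the code
COPY . /app
# install the library straight from the local archive
RUN pip install mylib-1.0.tar.gz
# then install the remaining dependencies as before
RUN pip install pipenv && pipenv install --system --deploy --skip-lock
EXPOSE 8080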
I did a basic search in the community and could not find a suitable answer, so I am asking here. Sorry if it was asked earlier.
Basically, I am working on a certain project and we keep changing code at regular intervals, so we need to build the Docker image every time; because of that we need to install the dependencies from requirement.txt from scratch, which takes around 10 minutes every time.
How can I make a direct change to a Docker image, and how do I configure the entrypoint (in the Dockerfile) so that it reflects changes in a pre-built Docker image?
You don't edit an image once it's been built. You always run docker build from the start; it always runs in a clean environment.
The flip side of this is that Docker caches built images. If you had image 01234567, ran RUN pip install -r requirements.txt, and got image 2468ace0 out, then the next time you run docker build it will see the same source image and the same command, skip doing the work, and jump directly to the output image. COPYing or ADDing files that have changed invalidates the cache for all subsequent steps.
So the standard pattern is
FROM node:10 # arbitrary choice of language
WORKDIR /app
# Copy in _only_ the requirements and package lock files
COPY package.json yarn.lock ./
# Install dependencies (once)
RUN yarn install
# Copy in the rest of the application and build it
COPY src/ src/
RUN yarn build
# Standard application metadata
EXPOSE 3000
CMD ["yarn", "start"]
If you only change something in your src tree, docker build will reuse the cache up through the RUN yarn install step, since the package.json and yarn.lock files haven't changed, and only re-run the steps from COPY src/ src/ onward.
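The same pattern translated to Python, as a sketch assuming a requirements.txt and a Django manage.py like the ones in the questions above:
FROM python:3.9-slim-buster
WORKDIR /app
# Copy in _only_ the requirements file
COPY requirements.txt ./
# Install dependencies (cached as long as requirements.txt is unchanged)
RUN pip install --no-cache-dir -r requirements.txt
# Copy in the rest of the application
COPY . .
# Standard application metadata
EXPOSE 8000
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]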
In my case I was facing the same problem: after minor changes I was building the image again and again.
My old Dockerfile:
FROM python:3.8.0
WORKDIR /app
# Install system libraries
RUN apt-get update && \
apt-get install -y git && \
apt-get install -y gcc
# Install project dependencies
COPY ./requirements.txt .
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt --use-deprecated=legacy-resolver
# Don't use terminal buffering, print all to stdout / err right away
ENV PYTHONUNBUFFERED 1
COPY . .
So what I did was create a base image Dockerfile first, like this (I left out the last line and did not copy my code):
FROM python:3.8.0
WORKDIR /app
# Install system libraries
RUN apt-get update && \
apt-get install -y git && \
apt-get install -y gcc
# Install project dependencies
COPY ./requirements.txt .
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt --use-deprecated=legacy-resolver
# Don't use terminal buffering, print all to stdout / err right away
ENV PYTHONUNBUFFERED 1
and then built this image using
docker build -t my_base_img:latest -f base_dockerfile .
then the final Dockerfile
FROM my_base_img:latest
WORKDIR /app
COPY . .
And since I was not able to bring the container up from this image because of issues with my copied Python code, I edited the code inside the container to fix those issues; by this means I avoided the task of building images again and again.
When my code was fixed, I copied the changes from the container back to my code base and then finally created the final image.
There are 4 steps:
Start the image you want to edit (e.g. docker run ...)
Modify the running container by shelling into it with docker exec -it <container-id> (you can get the container id with docker ps)
Make any modifications (install new things, make a directory or file)
In a new terminal tab/window run docker commit c7e6409a22bf my-new-image (substituting in the container id of the container you want to save)
An example
# Run an existing image
docker run -dt existing_image
# See that it's running
docker ps
# CONTAINER ID IMAGE COMMAND CREATED STATUS
# c7e6409a22bf existing-image "R" 6 minutes ago Up 6 minutes
# Shell into it
docker exec -it c7e6409a22bf bash
# Make a new directory for demonstration purposes
# (note that this is inside the existing image)
mkdir NEWDIRECTORY
# Open another terminal tab/window, and save the running container you modified
docker commit c7e6409a22bf my-new-image
# Inspect to ensure it saved correctly
docker image ls
# REPOSITORY TAG IMAGE ID CREATED SIZE
# existing-image latest a7dde5d84fe5 7 minutes ago 888MB
# my-new-image latest d57fd15d5a95 2 minutes ago 888MB
I'm using this Dockerfile as part of this docker compose file.
Right now, every time I want to add a new pip requirement, I stop my containers, add the new pip requirement, run docker-compose -f local.yml build, and then restart the containers with docker-compose -f local.yml up. This takes a long time, and it even looks like it's recompiling the container for Postgres if I just add a pip dependency.
What's the fastest way to add a single pip dependency to a container?
This is related to the fact that the Docker build cache is being invalidated. When you edit requirements.txt, the step RUN pip install --no-cache-dir -r /requirements/production.txt and all subsequent instructions in the Dockerfile get invalidated, so they get re-executed.
As a best practice, you should avoid invalidating the build cache as much as possible. This is achieved by moving the steps that change often to the bottom of the Dockerfile. You can edit the Dockerfile and, while developing, add separate pip installation steps to the end.
...
USER django
WORKDIR /app
RUN pip install --no-cache-dir <new package>
RUN pip install --no-cache-dir <new package2>
...
And once you are sure of all the dependencies needed, add them to the requirements file. That way you avoid invalidating the build cache early on, and only the steps starting from the installation of the new packages onward get rebuilt.