I am building a Docker image and need to run pip install against a private PyPI that requires credentials.
What is the best way to secure the credentials?
Putting them in configuration files (pip.conf, requirements.txt, .netrc) is still a vulnerability even if I delete those files afterwards, because they can be recovered from the image layers.
Environment variables are also visible.
What's the most secure approach?
I understand that you want to provide those credentials at build time and get rid of them afterwards.
The most secure way to handle this with pip is a multi-stage build.
First, declare an initial build image with the configuration files and any dependencies needed to download or compile your packages. Don't worry about those files being recoverable: they only ever exist in the build stage.
Then define your final image without the build dependencies, copying in only the source code you want to run and the installed dependencies from the build stage. The resulting image never contained the configuration files, so they cannot be recovered from it.
FROM python:3.10-slim as build
RUN apt-get update
RUN apt-get install -y --no-install-recommends \
build-essential gcc
WORKDIR /usr/app
RUN python -m venv /usr/app/venv
ENV PATH="/usr/app/venv/bin:$PATH"
# [HERE YOU COPY YOUR CONFIGURATION FILES WITH CREDENTIALS]
COPY requirements.txt .
RUN pip install -r requirements.txt
FROM python:3.10-slim
WORKDIR /usr/app
COPY --from=build /usr/app/venv ./venv
# [HERE YOU COPY YOUR SOURCE CODE INTO YOUR CURRENT WORKDIR]
ENV PATH="/usr/app/venv/bin:$PATH"
ENTRYPOINT ["python", "whatever.py"]
I need an application in a Docker image which requires specific versions of libraries that have to be built from source.
So I am building them during the Docker build process.
The problem is that it takes a long time (about 30 minutes).
I am wondering if it's possible to cache those layers and skip them the next time the build runs.
Here is the critical part of the Dockerfile:
ADD https://sqlite.org/2022/sqlite-autoconf-3380200.tar.gz sqlite-autoconf-3380200.tar.gz
RUN tar -xvzf sqlite-autoconf-3380200.tar.gz
WORKDIR sqlite-autoconf-3380200
RUN ./configure
RUN make
RUN make install
WORKDIR /tmp
ADD https://download.osgeo.org/proj/proj-9.0.0.tar.gz proj-9.0.0.tar.gz
RUN tar -xvzf proj-9.0.0.tar.gz
WORKDIR proj-9.0.0
RUN mkdir build
WORKDIR build
RUN cmake ..
RUN cmake --build .
RUN cmake --build . --target install
RUN projsync --system-directory --list-files
The important detail about Docker layer caching is that if any previous step has changed, then all of the following steps will be rebuilt. So for your setup, if you change anything in one of the earlier dependencies, all of the later steps will be rebuilt as well.
This is a case where Docker multi-stage builds can help. The idea is that you'd build each library in its own image, and therefore each library build can be independently cached. You can then copy all of the build results into a final image.
The specific approach I'll describe here assumes (a) all components install into /usr/local, (b) /usr/local is initially empty, and (c) there aren't conflicts between the different library installations. You should be able to adapt it to other filesystem layouts.
Everything below is in the same Dockerfile.
I'd make a very first stage selecting a base Linux-distribution image. If you know you'll always need to install something – TLS CA certificates, mandatory package updates – you can put it here. Having this helps ensure that everything is being built against a consistent base.
FROM ubuntu:20.04 AS base
# empty in this example
Since you have multiple things you need to build, a next stage will install any build-time dependencies. The C toolchain and its dependencies are large, so having this separate saves time and space since the toolchain can be shared across the later stages.
FROM base AS build-deps
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install --no-install-recommends --assume-yes \
build-essential \
cmake
# libfoo-dev
Now for each individual library, you have a separate build stage that downloads the source, builds it, and installs it into /usr/local.
FROM build-deps AS sqlite
WORKDIR /sqlite
ADD https://sqlite.org/2022/sqlite-autoconf-3380200.tar.gz sqlite-autoconf-3380200.tar.gz
...
RUN make install
FROM build-deps AS proj
WORKDIR /proj
ADD https://download.osgeo.org/proj/proj-9.0.0.tar.gz proj-9.0.0.tar.gz
...
RUN cmake --build . --target install
To actually build your application, you'll need the C toolchain, plus you'll also need these various libraries.
FROM build-deps AS app
COPY --from=sqlite /usr/local/ /usr/local/
COPY --from=proj /usr/local/ /usr/local/
WORKDIR /app
COPY ./ ./
RUN ./configure && make && make install
Once you've done all of this, in the app image, the /usr/local tree will have all of the installed libraries (COPYed from the previous image) plus your application. So for the final stage, start from the original OS image (without the C toolchain) and COPY the /usr/local tree in (without the original sources).
FROM base
COPY --from=app /usr/local/ /usr/local/
EXPOSE 12345
CMD ["myapp"] # in `/usr/local/bin`
Let's say you update to a newer patch version of proj. In the sqlite path, the base and build-deps layers haven't changed and the ADD and RUN commands are the same, so this stage runs entirely from cache. proj is rebuilt. That will cause the COPY --from=proj step to invalidate the cache in the app stage, and you'll rebuild your application against the newer library.
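For instance, if a hypothetical proj 9.0.1 were released, the only change would be the ADD line in the proj stage (the version and URL below are illustrative, not a real release I'm confirming):
ADD https://download.osgeo.org/proj/proj-9.0.1.tar.gz proj-9.0.1.tar.gz
The sqlite stage and everything before it would still come entirely from cache, while the proj stage and the app stage would be rebuilt.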
I am trying to create a Python-based image with some packages installed, but I want the image layers not to show anything about the packages I installed.
I am trying to use a multi-stage build,
e.g.:
FROM python:3.9-slim-buster as builder
RUN pip install django  # (I don't want this command to be visible when checking the Docker image layers, which is why I'm using a multi-stage build)
FROM python:3.9-slim-buster
# Here I want to copy all the site-packages
COPY --from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
Now build the image:
docker build -t python_3.9-slim-buster_custom:latest .
and later check the image layers:
dive python_3.9-slim-buster_custom:latest
This will not show the RUN pip install django line.
Will this be a good way to achieve what I want (hiding all the pip install commands)?
Whether this is sufficient depends on what you are installing. Some Python libraries add binaries to your system that they rely on.
FROM python:3.9-alpine as builder
# install stuff
FROM python:3.9-alpine
# this is for sure required
COPY --from=builder /usr/local/lib/python3.9/site-packages /usr/local/lib/python3.9/site-packages
# this depends on what you are installing
COPY --from=builder /usr/local/bin /usr/local/bin
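If you're unsure whether a package installs anything outside site-packages (such as console scripts in /usr/local/bin), you can check its file listing inside the builder image first; django here is just an illustrative package name:
pip show --files django
# entries like ../../../bin/django-admin indicate scripts installed into the bin directory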
The usual approach I see for this is to use a virtual environment in an earlier build stage, then copy the entire virtual environment into the final image. Remember that virtual environments are very specific to a single Python build and installation path.
If your application has its own setup.cfg or setup.py file, then a minimal version of this could look like:
FROM python:3.9-slim-buster as builder
# If you need build-only tools, like build-essential for Python C
# extensions, install them first
# RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install ...
WORKDIR /src
# Create and "activate" the virtual environment
RUN python3 -m venv /app
ENV PATH=/app/bin:$PATH
# Install the application as normal
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
RUN pip install .
FROM python:3.9-slim-buster
# If you need runtime libraries, like a database client C library,
# install them first
# RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install ...
# Copy the entire virtual environment over
COPY --from=builder /app /app
ENV PATH=/app/bin:$PATH
# Run an entry_points script from the setup.cfg as the main command
CMD ["my_app"]
Note that this has only minimal protection against a curious user seeing what's in the image. The docker history or docker inspect output will show the /app container directory, you can docker run --rm the-image pip list to see the package dependencies, and the application and library source will be present in a human-readable form.
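For instance (reusing the image tag from the question purely for illustration), anyone who can pull the image can still run:
docker history python_3.9-slim-buster_custom:latest
docker run --rm python_3.9-slim-buster_custom:latest pip list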
Currently, what's working for me is:
FROM python:3.9-slim-buster as builder
# DO ALL YOUR STUFF HERE
FROM python:3.9-slim-buster
COPY --from=builder / /
Using the official golang Docker image, I can use the protoc command to generate the x.pb.go and x_grpc.pb.go files. The problem is that it uses the latest plugin versions, while I want to generate them using whichever versions are in the go.mod file.
I tried to start from the golang image, copy in my project's go.mod file, download the dependencies, and generate from there. Here is my Dockerfile:
FROM golang:1.15
WORKDIR /app
RUN apt-get update
RUN apt install -y protobuf-compiler
COPY go.* .
RUN go mod download
RUN go get all
RUN export PATH="$PATH:$(go env GOPATH)/bin"
RUN mkdir /api
Then I try to bind-mount the .proto file and the /pb output folder, and use the protoc command again (I'm running it directly inside the container right now). Something like this:
protoc --proto_path=/api --go_out=/pb --go-grpc_out=/pb /api/x.proto
I'm getting this error though:
protoc-gen-go: program not found or is not executable
--go_out: protoc-gen-go: Plugin failed with status code 1.
My go.sum file has google.golang.org/protobuf v1.25.0 in it, so how come it is not found?
go.mod & go.sum are used for versioning when building go programs. This is not what you need here. You want the protoc compiler to use the correct plugin versions when running it against your .proto file(s).
To install the desired protoc-gen-go (and protoc-gen-go-grpc if using gRPC) plugins, install them directly. Update your Dockerfile like so:
FROM golang:1.15
WORKDIR /app
RUN apt-get update
RUN apt install -y protobuf-compiler
RUN GO111MODULE=on \
    go get google.golang.org/protobuf/cmd/protoc-gen-go@v1.25.0 \
    google.golang.org/grpc/cmd/protoc-gen-go-grpc@v1.1.0
# export is redundant here: /go/bin is already on the golang image's PATH
# (and in any case, any environment change made inside a RUN step is lost once that step completes)
# RUN export PATH="$PATH:$(go env GOPATH)/bin"
RUN mkdir /api
If you want the latest version of either plugin, either use @latest or drop the @version suffix entirely.
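For example, a sketch of the same step pinned to the latest releases instead of specific versions:
RUN GO111MODULE=on \
    go get google.golang.org/protobuf/cmd/protoc-gen-go@latest \
    google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest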
I'm using this Dockerfile as part of this docker compose file.
Right now, every time I want to add a new pip requirement, I stop my containers, add the new pip requirement, run docker-compose -f local.yml build, and then restart the containers with docker-compose -f local.yml up. This takes a long time, and it even looks like it's rebuilding the Postgres container when I just add a pip dependency.
What's the fastest way to add a single pip dependency to a container?
This is related to the fact that the Docker build cache is being invalidated. When you edit requirements.txt, the step RUN pip install --no-cache-dir -r /requirements/production.txt and all subsequent instructions in the Dockerfile are invalidated, so they get re-executed.
As a best practice, you should avoid invalidating the build cache as much as possible. This is achieved by moving the steps that change often towards the bottom of the Dockerfile. While developing, you can edit the Dockerfile and add separate pip installation steps at the end.
...
USER django
WORKDIR /app
RUN pip install --no-cache-dir <new package>
RUN pip install --no-cache-dir <new package2>
...
And once you are sure of all the dependencies needed, add them to the requirements file. That way you avoid invalidating the build cache early on, and only the steps from the installation of the new packages onward are rebuilt.
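You can also rebuild only the affected service instead of the whole compose project; the service name django below is an assumption about what's in your local.yml:
docker-compose -f local.yml build django
docker-compose -f local.yml up -d django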
I have a multi-stage Dockerfile. In stage one, I git clone from a GitHub repo. In a later stage, I do other things like pip installs and use a file from stage one. I'd like to disable caching for the first stage only.
It looks like docker build --target stage1 --no-cache doesn't do what I want.
Is there a way to disable only a certain stage?
My Dockerfile looks like this:
FROM yijian/git-alpine
WORKDIR /tmp
RUN git clone https://github.com/abc/abc.git
FROM python:3.5.3-slim
RUN mkdir /app
ADD requirements.txt /app
ADD pip/pip.conf /root/.pip/pip.conf
WORKDIR /app
RUN pip3 install --upgrade pip && \
pip3 install pbr && \
pip3 install -r requirements.txt
ADD server.py /app
ADD docker/start.sh /app
RUN chmod a+x /app/start.sh
COPY --from=0 /tmp/abc/directory /usr/local/lib/python3.5/site-packages/abc/directory
EXPOSE 9092
ENTRYPOINT ["./start.sh"]
I don't believe that a single Dockerfile can have caching disabled for a specific stage. That might make a nice feature request, but I would rather see it as a declarative statement in the file than on the command line.
According to Docker's reference site:
https://docs.docker.com/engine/reference/commandline/build/#usage
The "--target" flag allows you to select a target stage from a Dockerfile, meaning that it would only run that part of the Dockerfile. I would expect that the --no-cache flag would work in conjunction with this flag, however I wouldn't expect the other sections of the Dockerfile to run.
I believe that what you want to occur would take multiple commands which may defeat the purpose of having a multistage Dockerfile.
It would take more work, but depending on what you want to cache, you could possibly include a script, such as bash or PowerShell, which accomplishes this goal.
Another option (depending on your needs) may be to use a separate Docker image which caches just what you need. For instance, I created a CI build which uses a Dockerfile that only imports dependencies, and then my main build happens in a container that references that first image. I have done this with dotnet restore commands so that the dependencies are preloaded, and I have also done it using npm install. This method works with any package-management tool that lets you specify a source: where you have a project.json, you can extract the common dependencies into a cache.package.json, then build a base image that has already done the heavy downloading for you; ideally, when you run your more frequent builds, they need to pull far less. Take advantage of the layered approach Docker offers!
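As a rough sketch of that idea adapted to pip (the cache.requirements.txt split and the deps image name are assumptions, not part of your setup):
# Dockerfile.deps - rebuilt only occasionally, e.g. tagged and pushed as my-registry/python-deps
FROM python:3.5.3-slim
COPY cache.requirements.txt /tmp/
RUN pip3 install --no-cache-dir -r /tmp/cache.requirements.txt
Your main Dockerfile would then start FROM my-registry/python-deps instead of python:3.5.3-slim, so the heavy downloads are already baked in.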
If your earlier stages change more often than the later ones, you might want to consider reversing the order of the stages.
Stage 1: set up your environment with pip (possibly in a virtual env).
Stage 2: copy the environment files for the virtual env from the previous stage and then do the git clone.
As long as stage 1 doesn't need to change, the cache can be used there and only the git clone part will be updated.
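A rough sketch of that reversal for the Dockerfile above; the /venv layout and package placement are assumptions, so treat it as an outline rather than a drop-in replacement:
# stage 1: dependencies in a virtual env (changes rarely, stays cached)
FROM python:3.5.3-slim AS deps
ADD pip/pip.conf /root/.pip/pip.conf
ADD requirements.txt /tmp/
RUN python3 -m venv /venv && \
    /venv/bin/pip install --upgrade pip pbr && \
    /venv/bin/pip install -r /tmp/requirements.txt

# stage 2: copy the environment, then do the clone (changes often, rebuilt as needed)
FROM python:3.5.3-slim
COPY --from=deps /venv /venv
ENV PATH=/venv/bin:$PATH
RUN apt-get update && apt-get install -y --no-install-recommends git && \
    git clone https://github.com/abc/abc.git /tmp/abc
# then place /tmp/abc/directory into the venv's site-packages, as the original COPY --from=0 did
WORKDIR /app
ADD server.py /app
ADD docker/start.sh /app
RUN chmod a+x /app/start.sh
EXPOSE 9092
ENTRYPOINT ["./start.sh"]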