AWS CodeArtifact and Docker build cache - docker

I'm trying to use AWS CodeArtifact as my pip repository.
Every time I build a Docker image I need to log in or generate a token.
I tried this: How to use AWS CodeArtifact *within* a Dockerfile in AWS CodeBuild,
but in each build the pip.conf file is different (new token), which breaks the Docker cache.
For now I want to avoid a base image with all the packages pre-installed.
Does anyone have a solution for this problem?
Thanks!

Looks like Docker BuildKit is the answer.
Makefile:
docker_build:
#	$(eval CODEARTIFACT_AUTH_TOKEN := $(shell aws codeartifact get-authorization-token --domain your-domain --domain-owner your-id --region your-region --query authorizationToken --output text --duration-seconds 900))
#	pip config set global.index-url "https://aws:${CODEARTIFACT_AUTH_TOKEN}@<your-domain>-<your-id>.d.codeartifact.<your-region>.amazonaws.com/pypi/your-repo/simple/"
	cp ~/.config/pip/pip.conf /tmp/pip.conf
	DOCKER_BUILDKIT=1 docker build --progress=plain --secret id=pip.conf,src=/tmp/pip.conf -t tmp_docker_image .
Dockerfile:
FROM python:3.8.8-slim-buster
WORKDIR /code
ADD requirements.txt /code/requirements.txt
RUN --mount=type=secret,id=pip.conf,dst=/root/.pip/pip.conf \
pip install -r ./requirements.txt
I have tested it a couple of times, changing the token on each run, and it looks good.
This one helped: https://dev.to/hugoprudente/managing-secrets-during-docker-build-3682
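If you'd rather not copy your user-level pip config into the build, a variation is to write a throwaway pip.conf straight to /tmp before the build (a sketch, reusing the placeholder domain/owner/region/repo from the commented-out lines above):
CODEARTIFACT_AUTH_TOKEN=$(aws codeartifact get-authorization-token \
  --domain your-domain --domain-owner your-id --region your-region \
  --query authorizationToken --output text --duration-seconds 900)
cat > /tmp/pip.conf <<EOF
[global]
index-url = https://aws:${CODEARTIFACT_AUTH_TOKEN}@<your-domain>-<your-id>.d.codeartifact.<your-region>.amazonaws.com/pypi/your-repo/simple/
EOF
DOCKER_BUILDKIT=1 docker build --progress=plain --secret id=pip.conf,src=/tmp/pip.conf -t tmp_docker_image .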

Related

Pip install local package invalidates docker cache in upper layers

I created a multi-stage Dockerfile where in the base image I prepare an Anaconda environment with the required packages, and in the final image I copy the Anaconda environment and install the local package.
I noticed that on every CI build and push all of the layers are recomputed and pushed, including the one big Anaconda layer.
Here is how I build it
DOCKER_BUILDKIT=1 docker build -t my_image:240beac6 \
-f docker/dockerfiles/Dockerfile . \
--build-arg BASE_IMAGE=base_image:240beac64 --build-arg BUILDKIT_INLINE_CACHE=1 \
--cache-from my_image:latest
docker push my_image:240beac6
The Dockerfile:
ARG BASE_IMAGE
FROM $BASE_IMAGE as base
FROM ubuntu:20.04
ENV DEBIAN_FRONTEND=noninteractive
# enable conda
ENV PATH=/root/miniconda3/bin/:${PATH}
COPY --from=base /opt/fast_align/* /usr/bin/
COPY --from=base /usr/local/bin/yq /usr/local/bin/yq
COPY --from=base /root/miniconda3 /root/miniconda3
COPY . /opt/my_package
# RUN pip install --no-deps /opt/my_package
If I leave the last RUN command commented out, Docker rebuilds only the last COPY layer (when some file in the context changed).
However, if I try to install the package, it invalidates everything.
Is it because I change /root/miniconda3 with the pip install?
If so, I am surprised by that; I was hoping the later RUN commands couldn't invalidate the earlier layers.
Is there a way to copy the conda environment from the base image, install the local package in a separate command, and still benefit from the caching?
Any help is much appreciated.
One solution, albeit a bit hacky, would be to replace the last RUN with a CMD and install the package when the container starts. It would be almost instant, as the requirements are already installed in the base image.
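A minimal sketch of that workaround (python -m my_package is a hypothetical stand-in for whatever the container actually runs):
COPY . /opt/my_package
# install at container start instead of at build time, so the big Anaconda layer stays cached
CMD pip install --no-deps /opt/my_package && python -m my_package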

Using GitHub Secrets in a Dockerfile does not work with GitHub Actions

I have a GitHub Action to build an image from a Dockerfile located in the same repo as the Action.
In the Dockerfile I use sensitive data, so I chose to use GitHub Secrets.
Here is my Dockerfile:
FROM python:3.9.5
ARG NEXUS_USER
ARG NEXUS_PASS
RUN pip install --upgrade pip
RUN pip config set global.extra-index-url https://${NEXUS_USER}:${NEXUS_PASS}@<my nexus endpoint>
RUN pip config set global.trusted-host <my nexus endpoint>
COPY ./src/python /python-scripts
ENTRYPOINT [ "python", "/python-scripts/pipe.py" ]
GitHub Actions builds an image from this Dockerfile with the following workflow:
jobs:
  docker:
    runs-on: self-hosted
    .
    .
    .
    .
    .
    - name: build
      run: |
        docker build -t ${GITHUB_REPO} .
The Action fails when calling the GitHub secrets from the Dockerfile. What is the proper way to do that? As you can see, I tried to add ARG in the Dockerfile, but that didn't work either.
It's not clear where you are calling secrets from in the Dockerfile. BTW, you could pass the credentials to the build command using the --build-arg flag, like:
docker build \
--build-arg "NEXUS_USER=${{ secrets.NEXUS_USER }}" \
--build-arg "NEXUS_PASS=${{ secrets.NEXUS_PASS }}" \
-t ${GITHUB_REPO} .
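Wired into the workflow from the question, that step could look something like this (a sketch; the step name and self-hosted runner are kept from the question):
jobs:
  docker:
    runs-on: self-hosted
    steps:
      - name: build
        run: |
          docker build \
            --build-arg "NEXUS_USER=${{ secrets.NEXUS_USER }}" \
            --build-arg "NEXUS_PASS=${{ secrets.NEXUS_PASS }}" \
            -t ${GITHUB_REPO} .
Keep in mind that values passed with --build-arg are recorded in the image metadata (visible via docker history), so for truly sensitive credentials a BuildKit secret mount, as shown in other answers on this page, is the safer option.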

Problem running a Docker container in GitLab CI/CD

I am trying to build and run my Docker image using GitLab CI/CD, but there is one issue I can't fix, even though locally everything works well.
Here's my Dockerfile:
FROM <internal_docker_repo_image>
RUN apt update && \
apt install --no-install-recommends -y build-essential gcc
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir --user -r /requirements.txt
COPY . /src
WORKDIR /src
ENTRYPOINT ["python", "-m", "dvc", "repro"]
This is how I run the container:
docker run --volume ${PWD}:/src --env=GOOGLE_APPLICATION_CREDENTIALS=<path_to_json> <image_name> ./dvc_configs/free/dvc.yaml --force
Everything works great when running this locally, but it fails when run in GitLab CI/CD.
stages:
  - build_image

build_image:
  stage: build_image
  image: <internal_docker_repo_image>
  script:
    - echo "Building Docker image..."
    - mkdir ~/.docker
    - cat $GOOGLE_CREDENTIALS > ${CI_PROJECT_DIR}/key.json
    - docker build . -t <image_name>
    - docker run --volume ${PWD}:/src --env=GOOGLE_APPLICATION_CREDENTIALS=<path_to_json> <image_name> ./dvc_configs/free/dvc.yaml --force
  artifacts:
    paths:
      - "./data/*csv"
    expire_in: 1 week
This results in the following error:
ERROR: you are not inside of a DVC repository (checked up to mount point '/src')
In case you don't know what DVC is: it's a tool used in machine learning for versioning models, datasets, and metrics, and for setting up pipelines, which is what I use it for here.
Essentially, it requires the two folders .dvc and .git to exist in the directory from which dvc repro is executed.
In this particular case, I have no idea why it's not able to run this command, given that the contents of the folders are exactly the same and both .dvc and .git exist.
Thanks in advance!
Your COPY . /src is problematic for the same reason as Hidden file .env not copied using Docker COPY. You probably need !.dvc in your .dockerignore.
Additionally, docker run --volume ${PWD}:/src will overwrite the container's /src, so $PWD itself will need .git and .dvc, etc. You don't seem to have cloned a repo before running these script commands.
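For the .dockerignore point, a sketch: if hidden files are currently excluded with a pattern like .* (an assumption about your .dockerignore), re-include what DVC needs:
.*
!.dvc
!.git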

Docker: Copy file out of container while building it

I build the following image with docker build -t mylambda .
I now try to export lambdatest.zip to my local host while building it, so I see the .zip file on my Desktop. So far I used docker cp <Container ID>:/var/task/lambdatest.zip ~/Desktop, but that doesn't work inside my Dockerfile (?). Do you have any ideas?
FROM lambci/lambda:build-python3.7
COPY lambda_function.py .
RUN python3 -m venv venv
RUN . venv/bin/activate
# ZIP
RUN pushd /var/task/venv/lib/python3.7/site-packages/
# Execute "zip" in bash for explanation of -9qr
RUN zip -9qr /var/task/lambdatest.zip *
Dockerfile (updated):
FROM lambci/lambda:build-python3.7
RUN python3 -m venv venv
RUN . venv/bin/activate
RUN pip install --upgrade pip
RUN pip install pystan==2.18
RUN pip install fbprophet
WORKDIR /var/task/venv/lib/python3.7/site-packages
COPY lambda_function.py .
COPY .lambdaignore .
RUN echo "Package size: $(du -sh | cut -f1)"
RUN zip -9qr lambdatest.zip *
RUN cat .lambdaignore | xargs zip -9qr /var/task/lambdatest.zip * -x
The typical answer is you do not. A Dockerfile does not have access to write files out to the host, by design, just as it does not have access to read arbitrary files from outside of the build context. There are various reasons for that, including security (you don't want an image build dropping a backdoor on a build host in the cloud) and reproducibility (images should not have dependencies outside of their context).
As a result, you need to take an extra step to extract the contents of an image back to the host. Typically this involves creating a container and running a docker cp command, along the lines of the following:
docker build -t your_image .
docker create --name extract your_image
docker cp extract:/path/to/files /path/on/host
docker rm extract
Or it can involve I/O pipes, where you run a tar command inside the container to package the files, and pipe that to a tar command running on the host to save the files.
docker build -t your_image .
docker run --rm your_image tar -cC /path/in/container . | tar -xC /path/on/host
Recently, Docker has been working on buildx which is currently experimental. Using that, you can create a stage that consists of the files you want to export to the host and use the --output option to write that stage to the host rather than to an image. Your Dockerfile would then look like:
FROM lambci/lambda:build-python3.7 as build
COPY lambda_function.py .
RUN python3 -m venv venv
RUN . venv/bin/activate
# ZIP
RUN pushd /var/task/venv/lib/python3.7/site-packages/
# Execute "zip" in bash for explanation of -9qr
RUN zip -9qr /var/task/lambdatest.zip *
FROM scratch as artifact
COPY --from=build /var/task/lambdatest.zip /lambdatest.zip
FROM build as release
And then the build command to extract the zip file would look like:
docker buildx build --target=artifact --output type=local,dest=$(pwd)/out/ .
I believe buildx is still marked as experimental in the latest release, so to enable that, you need at least the following json entry in $HOME/.docker/config.json:
{ "experimental": "enabled" }
And then for all the buildx features, you will want to create a non-default builder with docker buildx create.
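For example (a sketch; the builder name is arbitrary):
docker buildx create --name mybuilder --use
docker buildx build --target=artifact --output type=local,dest=$(pwd)/out/ .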
With recent versions of the docker CLI, integration to buildkit has exposed more options. Now it's no longer needed to run buildx to get access to the output flag. That means the above changes to:
docker build --target=artifact --output type=local,dest=$(pwd)/out/ .
If buildkit hasn't been enabled on your version (should be on by default in 20.10), you can enable it in your shell with:
export DOCKER_BUILDKIT=1
or for the entire host, you can make it the default with the following in /etc/docker/daemon.json:
{
"features": {"buildkit": true }
}
And to use the daemon.json the docker engine needs to be reloaded:
systemctl reload docker
Since version 18.09, Docker natively supports a custom backend called BuildKit:
DOCKER_BUILDKIT=1 docker build -o target/folder myimage
This allows you to copy your latest stage to target/folder. If you want only specific files and not an entire filesystem, you can add a stage to your build:
FROM XXX as builder-stage
# Your existing dockerfile stages
FROM scratch
COPY --from=builder-stage /file/to/export /
Note: You will need your docker client and engine to be compatible with Docker Engine API 1.40+, otherwise docker will not understand the -o flag.
Reference: https://docs.docker.com/engine/reference/commandline/build/#custom-build-outputs

Dockerfile COPY and RUN in one layer

I have a script used in the preparation of a Docker image. I have this in the Dockerfile:
COPY my_script /
RUN bash -c "/my_script"
The my_script file contains secrets that I don't want in the image (it deletes itself when it finishes).
The problem is that the file remains in the image despite being deleted because the COPY is a separate layer. What I need is for both COPY and RUN to affect the same layer.
How can I COPY and RUN a script so that both actions affect the same layer?
Take a look at multi-stage builds:
Use multi-stage builds
With multi-stage builds, you use multiple FROM statements in your
Dockerfile. Each FROM instruction can use a different base, and each
of them begins a new stage of the build. You can selectively copy
artifacts from one stage to another, leaving behind everything you
don’t want in the final image. To show how this works, let’s adapt the
Dockerfile from the previous section to use multi-stage builds.
Dockerfile:
FROM golang:1.7.3
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
As of 18.09 you can use docker build --secret to use secret information during the build process. The secrets are mounted into the build environment and aren't stored in the final image.
RUN --mount=type=secret,id=script,dst=/my_script \
bash -c /my_script
$ DOCKER_BUILDKIT=1 docker build --secret id=script,src=my_script.sh .
The script wouldn't need to delete itself.
This can be handled by BuildKit:
# syntax=docker/dockerfile:experimental
FROM ...
RUN --mount=type=bind,target=/my_script,source=my_script,rw \
bash -c "/my_script"
You would then build with:
DOCKER_BUILDKIT=1 docker build -t my_image .
This also sounds like you are trying to inject secrets into the build, e.g. to pull from a private git repo. BuildKit also allows you to specify:
# syntax=docker/dockerfile:experimental
FROM ...
RUN --mount=type=secret,target=/creds,id=cred \
bash -c "/my_script -i /creds"
You would then build with:
DOCKER_BUILDKIT=1 docker build -t my_image --secret id=creds,src=./creds .
With both of the BuildKit options, the mount command never actually adds the file to your image. It only makes the file available as a bind mount during that single RUN step. As long as that RUN step does not output the secret to another file in your image, the secret is never injected in the image.
For more on the BuildKit experimental syntax, see: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md
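A quick way to convince yourself that nothing leaked into the image (assuming it was tagged my_image as in the build commands above):
docker run --rm --entrypoint ls my_image -l /my_script
# should fail with "No such file or directory": the bind/secret mount only existed during that single RUN step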
I guess you can use a workaround to do this:
Put my_script on a local HTTP server, for example using python -m SimpleHTTPServer; the file can then be accessed at http://http_server_ip:8000/my_script.
Then, in the Dockerfile, use:
RUN curl http://http_server_ip:8000/my_script > /my_script && chmod +x /my_script && bash -c "/my_script"
This workaround ensures the file is added and deleted in the same layer; of course, you may also need to install curl in the Dockerfile.
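For completeness, the host side of that workaround is just a one-off static file server started in the directory containing my_script (paths and port are placeholders):
cd /path/containing/my_script
python -m SimpleHTTPServer 8000     # Python 2
# or: python3 -m http.server 8000   # Python 3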
I think RUN --mount=type=bind,source=my_script,target=/my_script bash /my_script in BuildKit can solve your problem.
First, prepare BuildKit
export DOCKER_CLI_EXPERIMENTAL=enabled
export DOCKER_BUILDKIT=1
docker buildx create --name mybuilder --driver docker-container
docker buildx use mybuilder
Then, write your Dockerfile.
# syntax = docker/dockerfile:experimental
FROM debian
## something
RUN --mount=type=bind,source=my_script,target=/my_script bash -c /my_script
The first line must be # syntax = docker/dockerfile:experimental because it's an experimental feature.
This method does not work in Play with Docker, but it works on my computer...
My computer runs Ubuntu 20.04 with Docker 19.03.12.
Then, build it with
docker buildx build --platform linux/amd64 -t user/imgname -f ./Dockerfile . --push
