I'm trying to speed up my build pipeline by using a previously built docker image as a cache. This works locally, but on Azure DevOps the pipeline rebuilds the docker image from scratch every time.
I've split up the instructions in the Dockerfile so that changes in the source code should only affect the last layer of the image.
Dockerfile:
FROM my_teams_own_baseimage
# Set index for package installation of our own python packages
ARG INDEX_URL
ENV PIP_EXTRA_INDEX_URL=$INDEX_URL
# Copy the requirements file, set the working directory, and install
# python requirements for this project.
COPY requirements.txt /work/
WORKDIR /work
RUN python -m pip install pip --upgrade \
&& pip install --no-cache -r requirements.txt
# Copy all the remaining stuff into /work
COPY . /work/
Relevant pipeline steps:
Authenticate pip to create the PIP_EXTRA_INDEX_URL, which is passed to the docker build with --build-arg
Log in to the Azure Container Registry
Pull the "latest" image
Build the new image using the "latest" image as cache
- task: PipAuthenticate@1
  displayName: 'Pip authenticate'
  inputs:
    artifactFeeds: $(ArtifactFeed)
    onlyAddExtraIndex: true

- task: Docker@1
  displayName: 'Docker login'
  inputs:
    containerregistrytype: 'Azure Container Registry'
    azureSubscriptionEndpoint: '$(AzureSubscription)'
    azureContainerRegistry: '$(ACR)'
    command: 'login'

- script: "docker pull $(ACR)/$(ImagePrefix)$(ImageName):latest"
  displayName: "Pull latest image for layer caching"
  continueOnError: true # for first build, no cache

- task: Docker@1
  displayName: 'Build image'
  inputs:
    containerregistrytype: 'Azure Container Registry'
    azureSubscriptionEndpoint: '$(AzureSubscription)'
    azureContainerRegistry: '$(ACR)'
    command: 'Build an image'
    dockerFile: 'Dockerfile'
    arguments: |
      --cache-from $(ACR)/$(ImagePrefix)$(ImageName):latest
      --build-arg INDEX_URL=$(PIP_EXTRA_INDEX_URL)
    imageName: '$(ACR)/$(ImagePrefix)$(ImageName):$(ImageTag)'
When I do this locally, I get the expected result: all layers use the cached versions except the last one. On ADO, however, the only thing that is cached is the set of layers needed to download the image specified in the FROM instruction (so clearly something is cached), and then, weirdly, the second step but not the first.
Output from the ADO pipeline log:
Step 1/11 : FROM my_teams_own_baseimage
---> 257aee2d50ca
Step 2/11 : ARG INDEX_URL
---> Using cache
---> 51b3ddad9198
Step 3/11 : ENV PIP_EXTRA_INDEX_URL=$INDEX_URL
---> Running in 01338566e424
Removing intermediate container 01338566e424
---> df80a24236d0
Step 4/11 : COPY requirements.txt /work/
---> 10c7e91c753e
Step 5/11 : WORKDIR /work
---> Running in af615be24108
Removing intermediate container af615be24108
---> f01b0b69df75
Step 6/11 : RUN python -m pip install pip --upgrade && pip install --no-cache -r requirements.txt
---> Running in 0266deda77c6
I have tried not using the Docker@1 ADO task and instead using an inline script like I would locally, but the result is the same.
My only idea is that the INDEX_URL is actually different each time and therefore invalidates the subsequent layers. I cannot get it printed out to check, because it's a "secret" and ADO prints **** instead.
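One way to check whether the masked value changes between runs, without exposing it, is to log only a fingerprint of it. This is just a sketch, not part of the original pipeline, and it assumes the variable is available in the script step's environment:
# Hypothetical check: hash the secret instead of printing it; if the hash differs
# between pipeline runs, the index URL is changing and will bust the cache.
echo -n "$PIP_EXTRA_INDEX_URL" | sha256sum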
EDIT
After some more trial and error, it appears that the PIP_EXTRA_INDEX_URL created by the PipAuthenticate@1 task is unique every time and only lasts for a day or two. Because the ARG that receives this value sits at the top of the Dockerfile (it is needed for the pip install command), every subsequent image layer is un-cached whenever it changes. I cannot find a way around this, except for a static PIP_EXTRA_INDEX_URL, but that seems non-ideal in a cloud environment.
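The effect is easy to reproduce locally (the URLs below are purely illustrative): build the same Dockerfile twice with different INDEX_URL values and watch where the cache stops being used.
docker build --build-arg INDEX_URL=https://feed.example/run-1 -t cache-demo .
docker build --build-arg INDEX_URL=https://feed.example/run-2 -t cache-demo .
# In the second build, Step 2 (ARG) still reports "Using cache", but the ENV layer and
# every step after it are rebuilt, matching the pipeline log above.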
Are you using self-hosted or Microsoft-hosted agent pools for your pipeline? If you are running your pipeline on Microsoft-hosted agents, a new ephemeral agent is provisioned for each pipeline run. A newly provisioned agent has no history of previous pipeline executions, so it does not have any cache from your Docker build stage. The best solution in this case is to use a self-hosted agent pool, where you have more control over your pipeline behaviour. If you are running more than one agent in a pool, you can always point your pipeline to a particular agent by specifying agent.name in the demands to keep the cache of the Docker build task.
pool:
  name: Your Pool
  demands:
  - Agent.Name -equals agentName
I guess the answer to the question is that there is no way to use the cache when also passing in a variable (here the PIP_EXTRA_INDEX_URL) at the top of the Dockerfile that changes every time, because all subsequent layers will be invalidated for caching purposes.
The way around this that I found was to do this part in our base image and regenerate that once a day. Then only the FROM line of the Dockerfile changes, which means that the pip install step can be cached from an earlier version of the same build.
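A rough sketch of that workaround (image names, registry, and file names below are purely illustrative, not our actual setup): the credential-dependent pip install moves into a base image that a scheduled pipeline rebuilds, and the per-commit Dockerfile is reduced to layers that only change with the source.
# Scheduled (e.g. nightly) job: the only build that sees the rotating PIP_EXTRA_INDEX_URL.
cat > Dockerfile.base <<'EOF'
FROM my_teams_own_baseimage
ARG INDEX_URL
ENV PIP_EXTRA_INDEX_URL=$INDEX_URL
COPY requirements.txt /work/
WORKDIR /work
RUN python -m pip install pip --upgrade \
 && pip install --no-cache -r requirements.txt
EOF
docker build -f Dockerfile.base \
  --build-arg INDEX_URL="$PIP_EXTRA_INDEX_URL" \
  -t myregistry.azurecr.io/myproject-base:latest .

# Per-commit job: no build-arg needed, so only the final COPY layer changes between builds.
cat > Dockerfile <<'EOF'
FROM myregistry.azurecr.io/myproject-base:latest
WORKDIR /work
COPY . /work/
EOF
docker build --cache-from myregistry.azurecr.io/myproject:latest \
  -t myregistry.azurecr.io/myproject:latest .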
Related
I need to build 2 stages based on a common one
$ ls
Dockerfile dev other prod
$ cat Dockerfile
FROM scratch as dev
COPY dev /
FROM scratch as other
COPY other /
FROM scratch as prod
COPY --from=dev /dev /
COPY prod /
As you can see, the prod stage does not depend on the other stage; however, it is built anyway.
$ docker build . --target prod
Sending build context to Docker daemon 4.096kB
Step 1/7 : FROM scratch as dev
--->
Step 2/7 : COPY dev /
---> 64c24f1f1d8c
Step 3/7 : FROM scratch as other
--->
Step 4/7 : COPY other /
---> 9b0753ec4353
Step 5/7 : FROM scratch as prod
--->
Step 6/7 : COPY --from=dev /dev /
---> Using cache
---> 64c24f1f1d8c
Step 7/7 : COPY prod /
---> 9fe8cc3d3ac1
Successfully built 9fe8cc3d3ac1
Why does Docker need to build the other stage?
How can I build prod without other? Do I have to use another Dockerfile?
There are two different backends for docker build. The "classic" backend works exactly the way you describe: it runs through the entire Dockerfile until it reaches the final stage, so even if a stage is unused it will still be executed. The newer BuildKit backend can do some dependency analysis and determine that a stage is never used and skip over it as you request.
Very current versions of Docker use BuildKit as their default backend. Slightly older versions have BuildKit available, but it isn't the default. You can enable it by running
export DOCKER_BUILDKIT=1
in your shell environment where you run docker build.
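For example, a one-off invocation with BuildKit enabled (equivalent to exporting the variable first); with BuildKit, the unused other stage is skipped entirely:
# Enable BuildKit for this single build and only build the stages prod depends on.
DOCKER_BUILDKIT=1 docker build . --target prod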
(It's often a best practice to run the same Docker image in all environments, and to use separate Dockerfiles for separate components. That avoids any questions around which stages exactly get run.)
Problem
Working with Azure DevOps, we use a Dockerfile to build and statically serve an Angular application:
Dockerfile
FROM node:12.14-alpine AS build
WORKDIR /usr/etc/app
COPY *.json ./
RUN npm install
COPY . .
RUN npm run build -- -c stage
FROM node:alpine as runtime
WORKDIR /app
RUN yarn add express
COPY --from=build /usr/etc/app/dist/production ./dist
COPY --from=build /usr/etc/app/server.js .
ENV NODE_ENV=production
EXPOSE 8080
ENTRYPOINT ["node", "server.js"]
Locally, the container builds as expected. However, running this Dockerfile (or a similar one) in the pipeline gives the following output:
Pipeline Output
Starting: Build frontend image
==============================================================================
Task : Docker
Description : Build, tag, push, or run Docker images, or run a Docker command
Version : 1.187.2
Author : Microsoft Corporation
Help : https://learn.microsoft.com/azure/devops/pipelines/tasks/build/docker
==============================================================================
/usr/bin/docker pull =build /usr/etc/app/server.js .
invalid reference format
/usr/bin/docker inspect =build /usr/etc/app/server.js .
Error: No such object: =build /usr/etc/app/server.js .
[]
/usr/bin/docker build -f /home/***/YYY/app/myagent/_work/1/s/Frontend/Dockerfile --label com.azure.dev.image.system.teamfoundationcollectionuri=XXXX --label com.azure.dev.image.build.sourceversion=6440c30bb386************d370f2bc6387 --label com.azure.dev.image.system.teamfoundationcollectionuri=
Sending build context to Docker daemon 508.4kB
Step 1/18 : FROM node:12.14-alpine AS build
...
# normal build until finish, successful
(note the duplicate teamfoundationcollectionuri labelling, but this is another issue)
Questions
We don't understand:
how and why the first command is constructed (/usr/bin/docker pull =build /usr/etc/app/server.js .)
how and why the second command is constructed (/usr/bin/docker inspect =build /usr/etc/app/server.js .)
how the docker agent does not recognize the --from clause at first, but builds successfully (and correctly) nevertheless
why the docker agent warns about an invalid reference format but then goes on to recognise every single instruction correctly.
btw, all these errors also happen when building the .NET backend (with a similar dockerfile).
We now DO understand that this problem only happens with task version 1.187.2 (or 0.187.2, see link below), but not with the previous 1.181.0 (resp. 0.181.0).
Additional Sources
All we could find about this problem is an old issue thread from 2018 that has been archived by Microsoft; the only link is via the IP address, with no valid certificate. The user has the exact same problem, but the thread was closed. Interestingly enough, the minor and patch version numbers are identical to our system.
Came across this question while searching for an answer to the same issue. I have spent the last few hours digging through source code for the Docker task and I think I can answer your questions.
It appears that the Docker task tries to parse the Dockerfile to determine the base image, and there is (was) a bug in the task: it looked for any line containing FROM and incorrectly picked up the --from from the COPY --from=build line.
It then passes that "base image" to docker pull and docker inspect prior to calling docker build. The first two commands fail because they are being passed garbage, but the third (docker build) reads the Dockerfile correctly and does a pull anyway, so it succeeds.
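A rough illustration of the parse (not the task's actual source code): a match on any line containing FROM also hits the COPY --from lines, and the text after --from then gets treated as the base image.
grep -i 'from' Dockerfile
#   FROM node:12.14-alpine AS build
#   COPY --from=build /usr/etc/app/dist/production ./dist
#   COPY --from=build /usr/etc/app/server.js .
#   FROM node:alpine as runtime
# The remainder of a COPY --from line, "=build /usr/etc/app/server.js .", is what the task
# ended up treating as the base image and passing to docker pull and docker inspect above.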
It looks like this was fixed on 2021-08-17 to only parse lines that start with FROM, so I assume it will make it to DevOps agents soon.
I'm using multi-stage building with a Dockerfile like this:
#####################################
## Build the client
#####################################
FROM node:12.19.0 as web-client-builder
WORKDIR /workspace
COPY web-client/package*.json ./
# Running npm install before we update our source allows us to take advantage
# of docker layer caching. We are excluding node_modules in .dockerignore
RUN npm ci
COPY web-client/ ./
RUN npm run test:ci
RUN npm run build
#####################################
## Host the client on a static server
#####################################
FROM nginx:1.19 as web-client
COPY --from=web-client-builder /workspace/nginx-templates /etc/nginx/templates/
COPY --from=web-client-builder /workspace/nginx.conf /etc/nginx/nginx.conf
COPY --from=web-client-builder /workspace/build /var/www/
#####################################
## Build the server
#####################################
FROM openjdk:11-jdk-slim as server-builder
WORKDIR /workspace
COPY build.gradle settings.gradle gradlew ./
COPY gradle ./gradle
COPY server/ ./server/
RUN ./gradlew --no-daemon :server:build
#####################################
## Start the server
#####################################
FROM openjdk:11-jdk-slim as server
WORKDIR /app
ARG JAR_FILE=build/libs/*.jar
COPY --from=server-builder /workspace/server/$JAR_FILE ./app.jar
ENTRYPOINT ["java","-jar","/app/app.jar"]
I also have a docker-compose.yml like this:
version: "3.8"
services:
server:
restart: always
container_name: server
build:
context: .
dockerfile: Dockerfile
target: server
image: server
ports:
- "8090:8080"
web-client:
restart: always
container_name: web-client
build:
context: .
dockerfile: Dockerfile
target: web-client
image: web-client
environment:
- LISTEN_PORT=80
ports:
- "8091:80"
The two images involved here, web-client and server, are completely independent. I'd like to take advantage of multi-stage build parallelization.
When I run docker-compose build (I'm on docker-compose 1.27.4), I get output like this
λ docker-compose build
Building server
Step 1/24 : FROM node:12.19.0 as web-client-builder
---> 1f560ce4ce7e
... etc ...
Step 6/24 : RUN npm run test:ci
---> Running in e9189b2bff1d
... Runs tests ...
... etc ...
Step 24/24 : ENTRYPOINT ["java","-jar","/app/app.jar"]
---> Using cache
---> 2ebe48e3b06e
Successfully built 2ebe48e3b06e
Successfully tagged server:latest
Building web-client
Step 1/11 : FROM node:12.19.0 as web-client-builder
---> 1f560ce4ce7e
... etc ...
Step 6/11 : RUN npm run test:ci
---> Using cache
---> 0f205b9549e0
... etc ...
Step 11/11 : COPY --from=web-client-builder /workspace/build /var/www/
---> Using cache
---> 31c4eac8c06e
Successfully built 31c4eac8c06e
Successfully tagged web-client:latest
Notice that my tests (npm run test:ci) run twice (Step 6/24 for the server target and then again at Step 6/11 for the web-client target). I'd like to understand why this is happening, but I guess it's not a huge problem, because at least it's cached by the time it gets around to the tests the second time.
Where this gets to be a bigger problem is when I try to run my build in parallel. Now I get output like this:
λ docker-compose build --parallel
Building server ...
Building web-client ...
Building server
Building web-client
Step 1/11 : FROM node:12.19.0 as web-client-builderStep 1/24 : FROM node:12.19.0 as web-client-builder
---> 1f560ce4ce7e
... etc ...
Step 6/24 : RUN npm run test:ci
---> e96afb9c14bf
Step 6/11 : RUN npm run test:ci
---> Running in c17deba3c318
---> Running in 9b0faf487a7d
> web-client#0.1.0 test:ci /workspace
> react-scripts test --ci --coverage --reporters=default --reporters=jest-junit --watchAll=false
> web-client#0.1.0 test:ci /workspace
> react-scripts test --ci --coverage --reporters=default --reporters=jest-junit --watchAll=false
... Now my tests run in parallel twice, and the output is interleaved for both parallel runs ...
It's clear that the tests are running twice now, because now that I'm running the builds in parallel, there's no chance for them to cache.
Can anyone help me understand this? I thought that one of the high points of docker multi-stage builds was that they were parallelizable, but this behavior doesn't make sense to me. What am I misunderstanding?
Note
I also tried enabling BuildKit for docker-compose. I had a harder time making sense of the output. I don't believe it was running things twice, but I'm also not sure that it was parallelizing. I need to dig more into it, but my main question stands: I'm hoping to understand why multi-stage builds don't run in parallel in the way I expected without BuildKit.
You can split this into two separate Dockerfiles. I might write a web-client/Dockerfile containing the first two stages (changing the relative COPY paths to ./), and leave the root-directory Dockerfile to build the server application. Then your docker-compose.yml file can point at these separate directories:
services:
  server:
    build: . # equivalent to {context: ., dockerfile: Dockerfile}
  web-client:
    build: web-client
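For illustration, a rough sketch of what the separate web-client/Dockerfile could look like (the first two stages from the question, with the COPY paths rewritten relative to the web-client directory; not an exact file):
cat > web-client/Dockerfile <<'EOF'
FROM node:12.19.0 as web-client-builder
WORKDIR /workspace
COPY package*.json ./
RUN npm ci
COPY ./ ./
RUN npm run test:ci
RUN npm run build

FROM nginx:1.19 as web-client
COPY --from=web-client-builder /workspace/nginx-templates /etc/nginx/templates/
COPY --from=web-client-builder /workspace/nginx.conf /etc/nginx/nginx.conf
COPY --from=web-client-builder /workspace/build /var/www/
EOF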
As @Stefano notes in their answer, multi-stage builds are more optimized around building a single final image, and in the "classic" builder they always run from the beginning up through the named target stage without any particular logic for where to start.
why multi-stage builds don't run in parallel in the way I expected without BuildKit.
That's the high point of BuildKit.
The main purpose of multi-stage builds in Docker is to produce smaller images by keeping only what the application needs in order to run properly, e.g.
FROM node as builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . /app
RUN npm run build

FROM nginx
COPY --from=builder --chown=nginx /app/dist /var/www
All the development tools required for building the project are simply not copied into the final image. This translates into smaller final images.
EDIT:
From the BuildKit documentation:
BuildKit builds are based on a binary intermediate format called LLB that is used for defining the dependency graph for processes running part of your build. tl;dr: LLB is to Dockerfile what LLVM IR is to C.
In other words, BuildKit is able to evaluate the dependencies for each stage allowing parallel execution.
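If the goal is to get that dependency analysis from docker-compose as well, a short sketch (assuming docker-compose 1.25+, which can delegate builds to the Docker CLI) is:
# Let compose call the Docker CLI and enable BuildKit; stages that the requested target
# does not reference are then skipped, so e.g. the web-client tests no longer run while
# building the server image.
export DOCKER_BUILDKIT=1
export COMPOSE_DOCKER_CLI_BUILD=1
docker-compose build --parallel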
I have a Dockerfile that uses the COPY --chown $UID:$GID syntax. I can use Docker to build images from this Dockerfile without issue on my MacBook and Linux workstation both of which are running Docker version 19.03.5. However my automated builds on DockerHub fail with the following error.
Step 16/26 : COPY --chown=$UID:$GID environment.yml requirements.txt postBuild ./
unable to convert uid/gid chown string to host mapping: can't find uid for user $UID: no such user: $UID
build hook failed! (1)
I was surprised by this failure as I assumed that DockerHub would be using a recent version of Docker for automated builds.
How can I check the version of Docker used by DockerHub? Even better would be if I could specify the version of Docker that DockerHub should use in my build hook.
It seems that the variables $UID and $GID are not set in their environment.
In that case you should use the id 1000 like this:
COPY --chown=1000:1000 environment.yml requirements.txt postBuild ./
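If hard-coding the id feels too rigid, an alternative sketch (illustrative only, assuming the Dockerfile declares ARG UID=1000 and ARG GID=1000 with defaults, and that the builder expands build args in --chown as recent Docker versions do) is to keep the values overridable at build time:
# Hypothetical invocation: pass the host ids locally; automated builds that pass nothing
# fall back to the Dockerfile's default of 1000.
docker build --build-arg UID="$(id -u)" --build-arg GID="$(id -g)" -t myimage .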
For a cron-job-triggered script, I use python:3.6.8-slim as the base image for my script to run in.
The script runs every hour and does so successfully until the docker system prune job runs.
After that, the script fails to pull the image with the message "ERROR: error pulling image configuration: unknown blob"
When rebuilding and pushing the image to the registry again, the docker pull command runs without any problems until the prune job.
I am using sonatype nexus3 as my private docker registry.
I do not understand why the docker system prune job causes this behaviour, since the Nexus 3 registry is running in its very own container.
my cron job:
30 * * * * docker pull my.registry.com/path/name:tag && docker run --rm my.registry.com/path/name:tag
my dockerfile:
FROM python:3.6.8-slim
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY ./ ./src/
CMD ["python", "src/myscript.py"]