Why does docker-compose build run my steps twice? - docker

I'm using multi-stage building with a Dockerfile like this:
#####################################
## Build the client
#####################################
FROM node:12.19.0 as web-client-builder
WORKDIR /workspace
COPY web-client/package*.json ./
# Running npm install before we update our source allows us to take advantage
# of docker layer caching. We are excluding node_modules in .dockerignore
RUN npm ci
COPY web-client/ ./
RUN npm run test:ci
RUN npm run build
#####################################
## Host the client on a static server
#####################################
FROM nginx:1.19 as web-client
COPY --from=web-client-builder /workspace/nginx-templates /etc/nginx/templates/
COPY --from=web-client-builder /workspace/nginx.conf /etc/nginx/nginx.conf
COPY --from=web-client-builder /workspace/build /var/www/
#####################################
## Build the server
#####################################
FROM openjdk:11-jdk-slim as server-builder
WORKDIR /workspace
COPY build.gradle settings.gradle gradlew ./
COPY gradle ./gradle
COPY server/ ./server/
RUN ./gradlew --no-daemon :server:build
#####################################
## Start the server
#####################################
FROM openjdk:11-jdk-slim as server
WORKDIR /app
ARG JAR_FILE=build/libs/*.jar
COPY --from=server-builder /workspace/server/$JAR_FILE ./app.jar
ENTRYPOINT ["java","-jar","/app/app.jar"]
I also have a docker-compose.yml like this:
version: "3.8"
services:
  server:
    restart: always
    container_name: server
    build:
      context: .
      dockerfile: Dockerfile
      target: server
    image: server
    ports:
      - "8090:8080"
  web-client:
    restart: always
    container_name: web-client
    build:
      context: .
      dockerfile: Dockerfile
      target: web-client
    image: web-client
    environment:
      - LISTEN_PORT=80
    ports:
      - "8091:80"
The two images involved here, web-client and server, are completely independent. I'd like to take advantage of multi-stage build parallelization.
When I run docker-compose build (I'm on docker-compose 1.27.4), I get output like this
λ docker-compose build
Building server
Step 1/24 : FROM node:12.19.0 as web-client-builder
---> 1f560ce4ce7e
... etc ...
Step 6/24 : RUN npm run test:ci
---> Running in e9189b2bff1d
... Runs tests ...
... etc ...
Step 24/24 : ENTRYPOINT ["java","-jar","/app/app.jar"]
---> Using cache
---> 2ebe48e3b06e
Successfully built 2ebe48e3b06e
Successfully tagged server:latest
Building web-client
Step 1/11 : FROM node:12.19.0 as web-client-builder
---> 1f560ce4ce7e
... etc ...
Step 6/11 : RUN npm run test:ci
---> Using cache
---> 0f205b9549e0
... etc ...
Step 11/11 : COPY --from=web-client-builder /workspace/build /var/www/
---> Using cache
---> 31c4eac8c06e
Successfully built 31c4eac8c06e
Successfully tagged web-client:latest
Notice that my tests (npm run test:ci) run twice (Step 6/24 for the server target and then again at Step 6/11 for the web-client target). I'd like to understand why this is happening, but I guess it's not a huge problem, because at least it's cached by the time it gets around to the tests the second time.
Where this gets to be a bigger problem is when I try to run my build in parallel. Now I get output like this:
λ docker-compose build --parallel
Building server ...
Building web-client ...
Building server
Building web-client
Step 1/11 : FROM node:12.19.0 as web-client-builderStep 1/24 : FROM node:12.19.0 as web-client-builder
---> 1f560ce4ce7e
... etc ...
Step 6/24 : RUN npm run test:ci
---> e96afb9c14bf
Step 6/11 : RUN npm run test:ci
---> Running in c17deba3c318
---> Running in 9b0faf487a7d
> web-client#0.1.0 test:ci /workspace
> react-scripts test --ci --coverage --reporters=default --reporters=jest-junit --watchAll=false
> web-client#0.1.0 test:ci /workspace
> react-scripts test --ci --coverage --reporters=default --reporters=jest-junit --watchAll=false
... Now my tests run in parallel twice, and the output is interleaved for both parallel runs ...
It's clear that the tests really are running twice: with the builds going in parallel, the second build has no chance to reuse the first build's cache.
Can anyone help me understand this? I thought that one of the high points of docker multi-stage builds was that they were parallelizable, but this behavior doesn't make sense to me. What am I misunderstanding?
Note
I also tried enabling BuildKit for docker-compose. I had a harder time making sense of the output. I don't believe it was running things twice, but I'm also not sure that it was parallelizing. I need to dig more into it, but my main question stands: I'm hoping to understand why multi-stage builds don't run in parallel in the way I expected without BuildKit.

You can split this into two separate Dockerfiles. I might write a web-client/Dockerfile containing the first two stages (changing the relative COPY paths to ./), and leave the root-directory Dockerfile to build the server application. Then your docker-compose.yml file can point at these separate directories:
services:
  server:
    build: .  # equivalent to {context: ., dockerfile: Dockerfile}
  web-client:
    build: web-client
As @Stefano notes in their answer, multi-stage builds are optimized around producing a single final image, and the "classic" builder always runs from the first stage up through the named target stage, with no particular logic for skipping stages the target doesn't need.
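For illustration, a sketch of what that web-client/Dockerfile could look like, reusing the two stages from the question with the COPY paths made relative to the new build context (this assumes nginx-templates, nginx.conf, and the npm build output all live under web-client/):
#####################################
## Build the client
#####################################
FROM node:12.19.0 as web-client-builder
WORKDIR /workspace
COPY package*.json ./
RUN npm ci
COPY . ./
RUN npm run test:ci
RUN npm run build
#####################################
## Host the client on a static server
#####################################
FROM nginx:1.19 as web-client
COPY --from=web-client-builder /workspace/nginx-templates /etc/nginx/templates/
COPY --from=web-client-builder /workspace/nginx.conf /etc/nginx/nginx.conf
COPY --from=web-client-builder /workspace/build /var/www/
The root-directory Dockerfile would then keep just the server-builder and server stages.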

why multi-stage builds don't run in parallel in the way I expected without BuildKit.
That's the high point of BuildKit.
The main purpose of multi-stage builds in Docker is to produce smaller images by keeping only what the application needs at runtime, e.g.:
FROM node as builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM nginx
COPY --from=builder --chown=nginx /app/dist /var/www
All the development tools required for building the project are simply not copied into the final image. This translates into smaller final images.
EDIT:
From the BuildKit documentation:
BuildKit builds are based on a binary intermediate format called LLB that is used for defining the dependency graph for processes running part of your build. tl;dr: LLB is to Dockerfile what LLVM IR is to C.
In other words, BuildKit is able to evaluate the dependencies for each stage allowing parallel execution.
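For reference, with docker-compose 1.25+ BuildKit can be enabled from the environment before running the build; these are the documented Docker/Compose variables, and whether stages then run concurrently depends on the dependency graph BuildKit derives from the Dockerfile:
# Make the docker CLI use BuildKit
export DOCKER_BUILDKIT=1
# Make docker-compose delegate builds to the docker CLI (and thus BuildKit)
export COMPOSE_DOCKER_CLI_BUILD=1
docker-compose build --parallel
With BuildKit, building the server target should also skip the web-client stages entirely, since they are not part of that target's dependency graph.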

Related

How should I dockerize and deploy a NestJS monorepo project?

I have a NestJS monorepo project with the following structure:
...
apps
app1
app2
app3
...
If I understand the idea correctly, it should be possible to run all the applications at the same time, i.e. I run one command and can reach the apps at paths like http://my.domain/app1/, http://my.domain/app2/, http://my.domain/app3/ or in some similar way. And I need to put all the apps in Docker container(s) and run them from there.
I haven't found anything about this process. Did I understand the idea correctly, and where can I learn more about deploying a NestJS monorepo project?
This is how I solved it:
apps
app1
Dockerfile
...
app2
Dockerfile
...
app3
Dockerfile
...
docker-compose.yml
Each Dockerfile does the same:
FROM node:16.15.0-alpine3.15 AS development
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
FROM node:16.15.0-alpine3.15 AS production
ARG NODE_ENV=production
ENV NODE_ENV=${NODE_ENV}
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm install --only=production --omit=dev
COPY --from=development /usr/src/app/dist ./dist
CMD ["npm", "run", "start-app1:prod"]
The last line starts the application, so adjust it to match your project's script naming.
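As a possible variation (not part of the original setup), a single shared Dockerfile could select the app with a build argument; the APP_NAME argument below is hypothetical, and the start-<app>:prod script names follow the example above:
# Production stage of a shared, parameterized Dockerfile;
# the development stage stays the same as above.
FROM node:16.15.0-alpine3.15 AS production
ARG APP_NAME=app1
ENV APP_NAME=${APP_NAME}
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm install --only=production --omit=dev
COPY --from=development /usr/src/app/dist ./dist
# The shell expands APP_NAME at run time, e.g. into: npm run start-app1:prod
CMD ["sh", "-c", "npm run start-${APP_NAME}:prod"]
It would then be built per app with docker build --build-arg APP_NAME=app2 -t app2:version1 .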
Later you should build each of the images in your CI/CD pipeline and deploy them separately. To run the docker build from the root folder of the project, you just need to provide the Dockerfile path with the -f parameter, for example:
docker build -f apps/app1/Dockerfile -t app1:version1 .
docker build -f apps/app2/Dockerfile -t app2:version1 .
docker build -f apps/app3/Dockerfile -t app3:version1 .
To run everything locally for testing, use a docker-compose.yml:
version: '3.8'
services:
  app1:
    image: app1:version1
    ports:
      - 3000:3000 # set according to your project setup
  app2:
    ...
  app3:
    ...
And start it by calling docker compose up
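If you prefer to let Compose build the images for local testing instead of pre-building them, the same -f idea can be expressed with build.context and build.dockerfile (a sketch; adjust names to your project):
version: '3.8'
services:
  app1:
    build:
      context: .
      dockerfile: apps/app1/Dockerfile
    ports:
      - 3000:3000
  app2:
    build:
      context: .
      dockerfile: apps/app2/Dockerfile
Then docker compose up --build rebuilds and starts them together.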

Multiple Dockerfiles re-using shared library in project

I have the following code structure and I am trying to organize my Dockerfile(s) so as to maximize caching and the like.
serverfoo/
Dockerfile
main.go
serverbar/
Dockerfile
main.go
proto/
Dockerfile
sharedproto.proto // Generates a sharedproto.pb.go file to be imported.
Both serverfoo and serverbar import the compiled sharedproto.pb.go file, which I manually regenerate on my workstation. This works fine, but now I am attempting to containerize my two servers.
The Dockerfiles inside my server folders cannot (by default) copy proto/ content. Ideally I would pre-compile the protobufs into a sharedproto.pb.go and then import a cached version of that file into the two server images. The goal is to cache the compiled protobufs until the underlying protos are modified.
I am new to Docker and need some best practices for this type of thing. I want to avoid a root Dockerfile in my project's directory that just has code to compile a zillion different servers.
I am open to restructuring my project to some degree.
NOTE: I assume your goal is to have, in each specific server container, both the compiled Go binary (from that server's main.go) and the compiled protocol buffer file (from the shared sharedproto.proto).
Assuming your files are organized as follow on your workstation:
serverfoo/
Dockerfile
main.go
serverbar/
Dockerfile
main.go
proto/
Dockerfile
sharedproto.proto
You can structure each specific server Dockerfile using a multi-stage build as follows (e.g. the serverbar Dockerfile):
#####
# The serverbar Dockerfile
#####
#----
# Compile proto stage
#----
FROM moul/protoc-gen-gotemplate AS protostage
WORKDIR /workspace
# Copy .proto file
COPY proto/sharedproto.proto .
# Compile .pb.go
RUN protoc -I=. --go_out=. sharedproto.proto
#----
# Build stage
#----
FROM golang:1.12.4-alpine3.9 as buildstage
WORKDIR /workspace
COPY serverbar/main.go .
RUN GOOS=linux GOARCH=amd64 go build -o serverbar main.go
#----
# Final stage
#----
FROM alpine:3.7
WORKDIR /home
COPY --from=buildstage workspace/serverbar .
COPY --from=protostage workspace/sharedproto.pb.go .
CMD ["./serverbar"]
Using this approach you basically have the following 3 stages:
proto stage: In this stage the shared protocol buffer source file is compiled into sharedproto.pb.go, which will then be included in the third, final stage. This requires the protoc compiler and the related Go plugin, but, as usual with Docker, there is already an image that includes the needed tools: for this purpose we can start from the moul/protoc-gen-gotemplate image.
Specifically, the following Dockerfile instruction generates workspace/sharedproto.pb.go:
RUN protoc -I=. --go_out=. sharedproto.proto
build stage: Here the server source file is compiled into an executable, which will also be included in the third, final stage. To avoid installing Go ourselves, we can start from the golang:1.12.4-alpine3.9 image, which already includes all the needed tools.
Specifically, the following Dockerfile instruction generates the workspace/serverbar executable:
RUN GOOS=linux GOARCH=amd64 go build -o serverbar main.go
final stage: This is the server image that we'll then push to our Docker registry for test or production, where we copy in the files compiled in the previous two stages with the following instructions:
COPY --from=buildstage workspace/serverbar .
COPY --from=protostage workspace/sharedproto.pb.go .
One of the advantages of this solution is that, for each server build, you can cache the compiled protobufs until the underlying protos are modified.
Example:
Building the serverbar image for the first time, we can see that the .proto compilation is performed in a new intermediate container, producing layer 92ae211bd27d:
> docker build -f serverbar/Dockerfile .
Sending build context to Docker daemon 10.24kB
Step 1/13 : FROM moul/protoc-gen-gotemplate AS protostage
---> 635345fde953
Step 2/13 : WORKDIR /workspace
---> Using cache
---> de8890a5e775
Step 3/13 : COPY proto/sharedproto.proto .
---> 1253fa0576aa
Step 4/13 : RUN protoc -I=. --go_out=. sharedproto.proto
---> Running in 8426f5810b98
Removing intermediate container 8426f5810b98
---> 92ae211bd27d <=========================================
Step 5/13 : FROM golang:1.12.4-alpine3.9 as buildstage
---> b97a72b8e97d
Step 6/13 : WORKDIR /workspace
....
Building a second time without modifying sharedproto.proto, we can see that layer 92ae211bd27d is reused from the cache:
> docker build -f serverbar/Dockerfile .
Sending build context to Docker daemon 10.24kB
Step 1/13 : FROM moul/protoc-gen-gotemplate AS protostage
---> 635345fde953
Step 2/13 : WORKDIR /workspace
---> Using cache
---> de8890a5e775
Step 3/13 : COPY proto/sharedproto.proto .
---> Using cache
---> 1253fa0576aa
Step 4/13 : RUN protoc -I=. --go_out=. sharedproto.proto
---> Using cache <=========================================
---> 92ae211bd27d
Step 5/13 : FROM golang:1.12.4-alpine3.9 as buildstage
---> b97a72b8e97d
....
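As a side note, the same Dockerfile can also be used to regenerate sharedproto.pb.go on a workstation by stopping at the proto stage and copying the file out of a temporary container; a sketch (the protostage tag and proto-tmp name are illustrative):
docker build -f serverbar/Dockerfile --target protostage -t protostage .
docker create --name proto-tmp protostage true
docker cp proto-tmp:/workspace/sharedproto.pb.go proto/
docker rm proto-tmp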

Webpack app in docker needs environment variables before it can be built

New to docker so maybe I'm missing something obvious...
I have an app split into a web client and a back end server. The back end is pretty easy to create an image for via a Dockerfile:
COPY source
RUN npm install, npm run build
CMD npm run start
The already-built back end app will then access the environment variables at runtime.
With the web client it's not as simple, because webpack needs the environment variables before the application is built. As far as I'm aware, this leaves only two options:
1. Require the user to build their own image from the application source
2. Build the web client when the container runs, by running npm run build in CMD
Currently I'm doing #2 but both options seem wrong to me. What's the best solution?
FROM node:latest
COPY ./server /app/server
COPY ./web /app/web
WORKDIR /app/web
CMD ["sh", "-c", "npm install && npm run build && cd ../server && npm install && npm run build && npm run start"]
First, it would be a good idea for both the backend server and web client to each have their own Dockerfile/image. Then it would be easy to run them together using something like docker-compose.
The way you are going to want to provide environment variables to the web Dockerfile is with build arguments. Build arguments are available while the Docker image is being built. You declare them with the ARG instruction in the Dockerfile and set them by passing the --build-arg flag to docker build.
Here is an example Dockerfile for your web client based on what you provided:
FROM node:latest
ARG NODE_ENV=dev
COPY ./web /app/web
WORKDIR /app/web
RUN npm install \
&& npm run build
CMD ["npm", "run", "start"]
This Dockerfile uses the ARG instruction to create a variable with a default value of dev.
The value of NODE_ENV can then be overridden when running docker build.
Like so:
docker build -t <myimage> --build-arg NODE_ENV=production .
Whether you override it or not, NODE_ENV will be available to webpack at build time. This allows you to build a single image and distribute it to many people without them having to build the web client themselves.
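If you build the image through docker-compose rather than calling docker build directly, the same value can be passed under the service's build.args; a sketch, with an illustrative service name:
services:
  web-client:
    build:
      context: .
      args:
        NODE_ENV: production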
Hopefully this helps you out.

Artifact caching for multistage docker builds

I have a Dockerfile like this:
# build-home
FROM node:10 AS build-home
WORKDIR /usr/src/app
COPY /home/package.json /home/yarn.lock /usr/src/app/
RUN yarn install
COPY ./home ./
RUN yarn build
# build-dashboard
FROM node:10 AS build-dashboard
WORKDIR /usr/src/app
COPY /dashboard/package.json /dashboard/yarn.lock /usr/src/app/
RUN yarn install
COPY ./dashboard ./
RUN yarn build
# run
FROM nginx
EXPOSE 80
COPY nginx.conf /etc/nginx/nginx.conf
COPY --from=build-home /usr/src/app/dist /usr/share/nginx/html/home
COPY --from=build-dashboard /usr/src/app/dist /usr/share/nginx/html/dashboard
Here two React applications are built and the build artifacts are then served by nginx. To improve build performance, I need to cache the dist folder from the build-home and build-dashboard build stages.
For this I create a volume in docker-compose.yml:
...
  web:
    container_name: web
    build:
      context: ./web
    volumes:
      - ./web-build-cache:/usr/src/app
    ports:
      - 80:80
    depends_on:
      - api
I've stopped at this point because I don't understand how to attach the volume created by docker-compose to the build-home stage first, and then to the build-dashboard stage.
Maybe I should create two volumes and attach one to each build stage, but how do I do that?
UPDATE:
Initial build.
Home application:
Install modules: 100.91s
Build app: 39.51s
Dashboard application:
Install modules: 100.91s
Build app: 50.38s
Overall time:
real 8m14.322s
user 0m0.560s
sys 0m0.373s
Second build (without code or dependencies change):
Home application:
Install modules: Using cache
Build app: Using cache
Dashboard application:
Install modules: Using cache
Build app: Using cache
Overall time:
real 0m2.933s
user 0m0.309s
sys 0m0.427s
Third build (with small change in code in first app):
Home application:
Install modules: Using cache
Build app: 50.04s
Dashboard application:
Install modules: Using cache
Build app: Using cache
Overall time:
real 0m58.216s
user 0m0.340s
sys 0m0.445s
Initial build of home application without Docker: 89.69s
real 1m30.111s
user 2m6.148s
sys 2m17.094s
Second build of home application without Docker, the dist folder exists on disk (without code or dependencies change): 18.16s
real 0m18.594s
user 0m20.940s
sys 0m2.155s
Third build of home application without Docker, the dist folder exists on disk (with small change in code): 20.44s
real 0m20.886s
user 0m22.472s
sys 0m2.607s
Inside the Docker container, the third build of the application takes roughly twice as long. This shows that when the result of the first build is still on disk, subsequent builds complete faster. In the Docker container, every build after the first takes as long as the first, because there is no dist folder left over from the previous build.
If you're using multi-stage builds, there's a catch with the Docker cache: the final image doesn't contain the layers from the intermediate build stages. By using --target and --cache-from together you can save those layers as images and reuse them on rebuilds.
You need something like
docker build \
  --target build-home \
  --cache-from build-home:latest \
  -t build-home:latest \
  .

docker build \
  --target build-dashboard \
  --cache-from build-dashboard:latest \
  -t build-dashboard:latest \
  .

docker build \
  --cache-from build-dashboard:latest \
  --cache-from build-home:latest \
  -t my-image:latest \
  .
You can find more details at
https://andrewlock.net/caching-docker-layers-on-serverless-build-hosts-with-multi-stage-builds---target,-and---cache-from/
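If the builds go through docker-compose, the same hint can be expressed with cache_from under build (Compose file format 3.2+); a sketch assuming the stage images are tagged as in the commands above:
services:
  web:
    build:
      context: ./web
      cache_from:
        - build-home:latest
        - build-dashboard:latest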
You can't use volumes during image building, and in any case Docker already does the caching you're asking for. If you leave your Dockerfile as-is and don't try to mount your code as volumes in the docker-compose.yml, you should get caching of the built JavaScript files across rebuilds as you expect.
When you run docker build, Docker looks at each step in turn. If the input to the step hasn't changed, the step itself hasn't changed, and any files that are being added haven't changed, then Docker will just reuse the result of running that step previously. In your Dockerfile, if you only change the nginx config, it will skip over all of the Javascript build steps and reuse their result from the previous time around.
(The other relevant technique, which you already have, is to build applications in two steps: first copy in files like package.json and yarn.lock that name dependencies, and install dependencies; then copy in and build your application. Since the "install dependencies" step is frequently time-consuming and the dependencies change relatively infrequently, you want to encourage Docker to reuse the last build's node_modules directory.)

Use Docker to run a build process

I'm using docker and docker-compose to set up a build pipeline. I've got a front-end that's written in javascript and needs to be built before being used. The backend is written in go.
To make this component integrate with the rest of our docker-compose setup, I want to do the building in a docker image as well.
This is the flow I'm going for:
during build do:
- build the frontend stuff and put it in /output (which is bound to the output volume)
- build the backend server
when running do:
- run the server; it has access to the build files in /output
I'm quite new to docker and docker-compose so I'm not sure if this is possible, or even the right thing to do.
For reference, here's my docker-compose.yml:
version: '2'
volumes:
  output:
    driver: local
services:
  frontend:
    build: .
    volumes:
      - output:/output
  backend:
    build: ./backend
    depends_on:
      - frontend
    volumes:
      - output:/output
and Dockerfile:
FROM node
# create working dir
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
ADD package.json /usr/src/app/package.json
# install packages
RUN npm install
COPY . /usr/src/app
# build frontend files and place results in /output
RUN npm build
RUN cp /usr/src/app/build/* /output
And backend/Dockerfile:
FROM go
# copy and build server
COPY . /usr/src/backend
WORKDIR /usr/src/backend
RUN go build
# run the server
ENTRYPOINT ["/usr/src/backend/main"]
Something is wrong here, but I do not know what. It seems as though the output of the build step is not persisted in the output volume. What can I do to fix this?
You cannot attach a volume during docker build.
The reason for this is that the goal of the docker build command is to build an image and nothing else; it doesn't need volumes, because the Dockerfile has ADD / COPY for bringing files in.
To produce your output, you should create a script which does roughly the npm install ; npm build ; cp /usr/src/app/build/* /output part of your current Dockerfile, and use this script as the ENTRYPOINT / CMD in your Dockerfile.
I'm not sure Compose can orchestrate this by itself; in any case, I find it clearer wrapped in a shell script that first runs the frontend builder container and then runs the backend container with the output directory as a volume.
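A sketch of that wrapper, following the paths in the question's Dockerfile (the script name is illustrative, and the build step is assumed to be npm run build):
#!/bin/sh
# build.sh: run the frontend build when the container starts,
# i.e. once the /output volume is actually mounted.
set -e
npm install
npm run build
cp -r /usr/src/app/build/* /output/
The frontend Dockerfile would then end with something like CMD ["sh", "./build.sh"] instead of running the build and copy at image-build time.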
