Build multiple docker images without building binaries in each Dockerfile

I have a .NET Core solution with 6 runnable applications (APIs) and multiple netstandard projects. In a build pipeline on Azure DevOps I need to create 6 Docker images and push them to the Azure Registry.
Right now I build the images one by one, and each of these 6 Dockerfiles builds the solution from scratch (restore, build, publish). That takes a few minutes per image, and the whole pipeline runs for almost 30 minutes.
My goal is to optimize the build time. I see two possible (and independent) ways of doing that:
drop the explicit restore and build steps and run only publish (since publish restores packages and builds anyway)
publish the code once (for all runnable applications) and have the Dockerfiles only copy the binaries, without building again
Are both approaches doable? I can't figure out how to make the second one work - should I run dotnet publish for each runnable application, place the Dockerfiles next to the published binaries, and run docker build there? My concern is that I will need to copy the required .dll files into the image - but how do I pick the right ones without listing them explicitly?
EDIT:
I'm using Linux containers. I don't write my Dockerfiles - they are autogenerated by Visual Studio. I'll show you one example:
FROM mcr.microsoft.com/dotnet/core/aspnet:2.2-stretch-slim AS base
WORKDIR /app
EXPOSE 80
EXPOSE 443
FROM mcr.microsoft.com/dotnet/core/sdk:2.2-stretch AS build
WORKDIR /src
COPY ["Application.WebAPI/Application.WebAPI.csproj", "Application.WebAPI/"]
COPY ["Processing.Dependency/Processing.Dependency.csproj", "Processing.Dependency/"]
COPY ["Processing.QueryHandling/Processing.QueryHandling.csproj", "Processing.QueryHandling/"]
COPY ["Model.ViewModels/Model.ViewModels.csproj", "Model.ViewModels/"]
COPY ["Core.Infrastructure/Core.Infrastructure.csproj", "Core.Infrastructure/"]
COPY ["Model.Values/Model.Values.csproj", "Model.Values/"]
COPY ["Sql.Business/Sql.Business.csproj", "Sql.Business/"]
COPY ["Model.Events/Model.Events.csproj", "Model.Events/"]
COPY ["Model.Messages/Model.Messages.csproj", "Model.Messages/"]
COPY ["Model.Commands/Model.Commands.csproj", "Model.Commands/"]
COPY ["Sql.Common/Sql.Common.csproj", "Sql.Common/"]
COPY ["Model.Business/Model.Business.csproj", "Model.Business/"]
COPY ["Processing.MessageBus/Processing.MessageBus.csproj", "Processing.MessageBus/"]
COPY ["Processing.CommandHandling/Processing.CommandHandling.csproj", "Processing.CommandHandling/"]
COPY ["Processing.EventHandling/Processing.EventHandling.csproj", "Processing.EventHandling/"]
COPY ["Sql.System/Sql.System.csproj", "Sql.System/"]
COPY ["Application.Common/Application.Common.csproj", "Application.Common/"]
RUN dotnet restore "Application.WebAPI/Application.WebAPI.csproj"
COPY . .
WORKDIR "/src/Application.WebAPI"
RUN dotnet build "Application.WebAPI.csproj" -c Release -o /app
FROM build AS publish
RUN dotnet publish "Application.WebAPI.csproj" -c Release -o /app
FROM base AS final
WORKDIR /app
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "Application.WebApi.dll"]
One more thing - the problem is that Azure DevOps has a job which builds an image, and I simply copied that job 6 times, pointing each copy at a different Dockerfile. That's why they don't reuse anything - I would love to change that so they are all based on the same binaries. Here are the steps in Azure DevOps:
Get sources
Build and push image no. 1
Build and push image no. 2
Build and push image no. 3
Build and push image no. 4
Build and push image no. 5
Build and push image no. 6
Every single 'Build and push image' does:
dotnet restore
dotnet build
dotnet publish
I want to get rid of this overhead - is it possible?

It's hard to say without seeing your Dockerfiles, but you probably are making some mistakes that are adding time to the image build. For example, each command in a Dockerfile results in a layer. Docker caches these layers and only rebuilds the layer if it or previous layers have changed.
A very common mistake people make is to copy their entire project with all the files within first, and then run dotnet restore. When you do that, any change to any file invalidates that copy layer and thus also the dotnet restore layer, meaning that you have to restore packages every single build. The only thing necessary for the dotnet restore is the project file(s), so if you copy just those, run dotnet restore, and then copy all the files, those layers will be cached, unless the project file itself changes. Since that normally only happens when you change packages (add, update, remove, etc.), most of the time, you will not have to repeat the restore step, and the build will go much quicker.
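A minimal sketch of that layering, using a hypothetical MyApi project (the image tag and paths are illustrative only, not your actual solution):
FROM mcr.microsoft.com/dotnet/core/sdk:2.2 AS build
WORKDIR /src
# copy only the project file first so the restore layer stays cached
COPY MyApi/MyApi.csproj MyApi/
RUN dotnet restore MyApi/MyApi.csproj
# now copy the rest of the sources; edits here no longer invalidate the restore layer
COPY . .
RUN dotnet publish MyApi/MyApi.csproj -c Release -o /app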
Another issue can occur when you're using npm and Linux images on Windows. This one bit me personally. In order to support Linux images, Docker uses a Linux VM (MobyLinux). At the start of a build, Docker lifts the entire filesystem context (i.e. where you run the docker command) into the MobyLinux VM, first, as all the Dockerfile commands will be run actually in the VM, and thus the files will need to reside there. If you have a node_modules directory, it can take a significant amount of time to move all that over. You can solve this by adding node_modules to your .dockerignore file.
There are other similar kinds of mistakes you might be making. We'd really need to see your Dockerfiles to help you further. Regardless, you should not go with either of your proposed approaches. Just running publish will suffer from the same issues described above, and gives you no recourse to alleviate the problem at that point. Publishing outside of the image can lead to platform inconsistencies and other problems unless you're very careful. It also adds a bunch of manual steps to the image building process, which defeats a lot of the benefit Docker provides. Your images will be larger as well, unless you just happen to publish on exactly the same architecture as what the image will use. If you're developing on Windows, but using Linux images, for example, you'll have to include the full ASP.NET Core runtime. If you build and publish within the image, you can include the SDK only in a stage to build and publish, and then target something like Alpine Linux, with a self-contained, architecture-specific publish.
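As a rough illustration of that last point - not a drop-in Dockerfile - the image tags and runtime identifier below are assumptions you would adapt to your SDK version and project:
FROM mcr.microsoft.com/dotnet/core/sdk:2.2 AS build
WORKDIR /src
COPY . .
# with a runtime identifier, publish produces a self-contained, architecture-specific output on this SDK
RUN dotnet publish Application.WebAPI/Application.WebAPI.csproj -c Release -r linux-musl-x64 -o /app

FROM mcr.microsoft.com/dotnet/core/runtime-deps:2.2-alpine AS final
WORKDIR /app
COPY --from=build /app .
ENTRYPOINT ["./Application.WebAPI"]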

Related

Why does Docker care about the build context if we do not copy all of it?

In various places on the official Docker site, there are warnings about the folder that is sent to the Docker daemon (which they call the context) when building a new image with docker build. For example, from understand-build-context:
Inadvertently including files that are not necessary for building an
image results in a larger build context and larger image size. This
can increase the time to build the image, time to pull and push it,
and the container runtime size. To see how big your build context is,
look for a message like this when building your Dockerfile:
Sending build context to Docker daemon 187.8MB
I do not understand why the context is so important if we do not use all its content.
Let's say that my build context is a 1GB folder, but in the Dockerfile I have only one COPY command for a 1KB file. Then why do we care about the rest? How could the rest affect the size of my image?
Similarly, why do we have .dockerignore at all? If files are not used in the Dockerfile, aren't they effectively ignored anyway? If not, then what is it used for?
Let's say that my build context is a 1GB folder, but in the Dockerfile...
The Dockerfile is normally transferred as part of the build context. Perhaps the easiest place to see this is in the "build an image" Docker HTTP API: the dockerfile parameter is explicitly a path within the build context, which is expected to be transferred in the HTTP body as a tar file. In that low-level API there's no way to pass the Dockerfile outside of that build-context tar-file HTTP body.
So first you send the build context to the Docker daemon, then the daemon unpacks it, and then it reads the Dockerfile and sees
I have only one COPY command of a file of 1KB.
so only that one file is copied into the resulting image; the rest of the context is just ignored.
Then why do we care about the rest? How could the rest affect the size of my image? Similarly, why do we have .dockerignore?
Sending the build context is surprisingly slow. Even if you're not using remote Docker, and working directly on a native-Linux host, it can take multiple seconds to send that 1 GB tar-file build context over the Unix socket. So smaller build contexts can result in faster builds, and .dockerignore is a convenient way to cause things you're not going to use to be omitted from the build context.
It is very common to copy the entire build context into an image, though, and in this case it's important to control what goes in there. Let's consider a typical Node application. In day-to-day development I might just use Node, so I'll have a package.json file and a src subdirectory, but Node installs all of its dependencies in a node_modules subdirectory as well. A typical Node Dockerfile will look something like
FROM node:lts
WORKDIR /app
# Copy and install dependencies
COPY package*.json ./
RUN npm ci
# Copy and build the rest of the application - IMPORTANT, see the note below
COPY ./ ./
RUN npm run build
# Explain how to run the container
EXPOSE 3000
CMD ["node", "./build/index.js"]
The RUN npm ci line recreates the node_modules directory inside the image. In the next line I copy the entire build context - my src directory, webpack configuration, TypeScript configuration, static assets, the whole works - into the image, with enough parts and local files that I'd prefer not to list them out individually.
In that context it's important that COPY ./ ./ not include the host's node_modules directory. The host might be a different OS, or a different C library version, or any of several other things that might cause incompatibilities. That's where putting it in .dockerignore lets me say "copy everything, except this".
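For that Node example, the .dockerignore can be as small as a single line (assuming nothing else in the project needs excluding):
node_modules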
Your question hints at a very carefully curated build-context directory. That's a possibility too; in particular it's something that made sense with a compiled language, on a native-Linux host, before Docker multi-stage builds existed. You could consider writing something like a Makefile that copied specific files from your source tree into a dedicated docker directory, and then used that directory as the build context. Then you'd know exactly what was in the build context and therefore exactly what was going into the image. With modern Docker and multi-stage builds, I feel like this setup is a little unusual though.
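Roughly, the shell equivalent of such a Makefile rule could look like this (all of the file names here are hypothetical):
# assemble a dedicated, minimal build context and build from it
mkdir -p docker
cp Dockerfile app.bin config.yml docker/
docker build -t myapp:latest docker/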
The documentation was written before buildkit became standard in docker, but it's still a good practice for older build tooling. The reason for this in the classic builder is that docker is a client/server based app. To run a build, the client sends over the entire context, Dockerfile, and all the parameters for the server to build, and the server runs that build, pulling parts out of the context that the Dockerfile requests. As much as it looks like everything is happening locally, and often is, the server could be a remote host without direct access to your filesystem, and the build process is a JSON REST API that sends the request and then monitors for the build to complete.
Buildkit, however, changes this. Both the server and the client communicate with each other, and the server has a cache of not only the previous builds, but of the previous build contexts. So when a file changes in the context between builds, it can perform the equivalent of an rsync to send just that one file, and only when the server requests it from the client.
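If your Docker version does not yet use BuildKit by default, it can be enabled per invocation with an environment variable (the tag here is just a placeholder):
DOCKER_BUILDKIT=1 docker build -t myapp:latest .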
There is still a need for a .dockerignore since even with buildkit, you often want to exclude files within the build that would otherwise be copied in a wildcard match. For example, if you have the step:
COPY . /src
Then even with the buildkit caching, you'll include every file in the directory, even if a number of those files aren't needed to build your app (like the .git folder, the Dockerfile itself, the README, LICENSE, etc). That not only bloats your image and makes your builds slower, but it risks causing a cache miss when the resulting image would normally be unchanged.
Some will make the .dockerignore look similar to their .gitignore with some added files that don't affect the build. I often do the reverse, excluding everything, and then reincluding only the files I need with the ! prefix. E.g. the following would include only the Makefile, src, and static folders:
*
!Makefile
!src/
!static/
If you do that, make sure you remember to update it when adding new files or directories to your builds.

Docker multistage build without copying from previous image?

Does it have any advantages to use a multi-stage build in Docker if you don't copy any files from the previously built image?
e.g.
FROM some_base_image as base
#Some random commands
RUN mkdir /app
RUN mkdir /app2
RUN mkdir /app3
#ETC
#Second stage starts from first stage
FROM base
#Add some files to image
COPY foo.txt /app
Does this result in a smaller image or offer any other advantages compared to a non-multi-stage version? Or are multi-stage builds only useful for preparing some files and then copying those into another base image?
Or are multi-stage builds only useful for preparing some files and then copying those into another base image?
This is the main use-case discussed in "Use multi-stage builds"
The main goal is to reduce the number of layers by copying files from one image to another, without including the build environment needed to produce said files.
But another goal could be to not rebuild the entire Dockerfile, including every stage.
Then your suggestion (not copying) could still apply.
You can specify a target build stage. The following command assumes you are using the previous Dockerfile but stops at the stage named builder:
$ docker build --target builder -t alexellis2/href-counter:latest .
A few scenarios where this might be very powerful are:
Debugging a specific build stage
Using a debug stage with all debugging symbols or tools enabled, and a lean production stage (see the sketch after this list)
Using a testing stage in which your app gets populated with test data, but building for production using a different stage which uses real data
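As a hedged sketch of that debug/production scenario (the base images, Go commands, and stage names below are purely illustrative, assuming a Go module at the context root):
FROM golang:1.19 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

FROM build AS debug
# the debug stage keeps the full Go toolchain and sources available
ENTRYPOINT ["go", "run", "."]

FROM alpine AS production
# the lean production stage carries only the compiled binary
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
You would then build either image from the same Dockerfile with docker build --target debug ... or docker build --target production ...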

Including pkg in .dockerignore file

Right now my .dockerignore file has this contents:
.vscode
.idea
.git
bin
pkg
and my Dockerfile looks like:
FROM golang:latest
RUN mkdir -p /app
WORKDIR /app
COPY . .
ENV GOPATH /app
RUN go install huru
EXPOSE 3000
ENTRYPOINT /app/bin/huru
My question is - should I be copying the pkg folder from host to image or not? Right now I am not, as my dockerignore file makes clear.
I get the feeling that I should just COPY the pkg folder from host to image, because that might have pre-built files in it that go install can use instead of re-downloading the source from github etc?
Personally, I think copying the pkg folder from the host into the image is not a good idea, because:
it tightly couples the place where you are building the image (your host) and the image itself. You could end up with different resulting images depending on where you build the image, which is probably not what you want
moreover, if you have automated builds (from CI for example), you're probably rebuilding the whole application from a clean environment each time, so there is no initial pkg folder to copy.
If you're familiar with the Java world: I've already encountered that problem with images built with Maven. To speed up the build, some people copy their local Maven repository (~/.m2) into the image to avoid re-downloading artifacts. I don't particularly agree with that, since there is always a risk that their .m2 folder contains corrupted artifacts: in that case, the image built on their machine will be different from one built in a clean environment. It depends on whether you want consistent builds or quick builds (I prefer the former).
In conclusion, I think that building images from a clean environment, without depending on the host where the image is built, is good practice. That's why I personally would not copy any files (except the application source code!) into the image.

How to create a docker container using a project solution where lib projects are located one level higher than the building context

I have a VS2017 (v5.18.0) solution which contains a .NET Core 2.0 console application "ReferenceGenerator" as the "startup" application. The solution also contains two .NET Core 2.0 lib projects, FwCore and LibReferenceGenerator, which are "homegrown" libs. I have added Docker support (Linux), so all files needed to create a Docker application are present. I can debug the application in "docker-compose" mode with Docker for Windows in Linux mode, and the application works fine. If I try to build a release version, I get an error that a COPY occurs from an illegal path. The Dockerfile looks like this:
FROM microsoft/dotnet:2.0-runtime AS base
WORKDIR /app
FROM microsoft/dotnet:2.0-sdk AS build
WORKDIR /src
COPY ReferenceGenerator/ReferenceGenerator.csproj ReferenceGenerator/
COPY ../LibReferenceGenerator/LibReferenceGenerator.csproj ../LibReferenceGenerator/
COPY ../FwCore/FwCore/FwCore.csproj ../FwCore/FwCore/
RUN dotnet restore ReferenceGenerator/ReferenceGenerator.csproj
COPY . .
WORKDIR /src/ReferenceGenerator
RUN dotnet build ReferenceGenerator.csproj -c Release -o /app
FROM build AS publish
RUN dotnet publish ReferenceGenerator.csproj -c Release -o /app
FROM base AS final
WORKDIR /app
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "ReferenceGenerator.dll"]
The line with the following content:
COPY ../LibReferenceGenerator/LibReferenceGenerator.csproj ../LibReferenceGenerator/
is causing the error:
Step 6/17 : COPY ../LibReferenceGenerator/LibReferenceGenerator.csproj ../LibReferenceGenerator/
1>Service 'referencegenerator' failed to build: COPY failed: Forbidden path outside the build context: ../LibReferenceGenerator/LibReferenceGenerator.csproj ()
I have read that relative paths are not allowed, so be it. But the compiler output is already complete in the bin directory of the ReferenceGenerator project. I already tried removing the two COPY lines referencing the libs, but then the build complains about the missing lib project files at the dotnet build stage.
Having some home-built lib projects included in a solution seems to me like a very common situation. I am a newbie with Docker containers and I have no idea how to fix this - anyone?
Additional info my file structure looks like this:
/Production/ReferenceGenerator/ReferenceGenerator.sln
/Production/ReferenceGenerator/ReferenceGenerator/ReferenceGenerator.csproj
/Production/LibReferenceGenerator/LibReferenceGenerator.csproj
/Production/FwCore/FwCore/FwCore.csproj
/Production/ReferenceGenerator/ReferenceGenerator/Dockerfile
Please, anyone. The people who tried to help me have not succeeded in doing so. I'm completely stuck in development...
The answer is, there is no solution...
If you need libraries, you must include them as (private) NuGet packages.
It is not a neat solution, because while debugging you do not have the sources of your libraries available, but including libs from outside the build context is a no-go, as I learned researching the internet...
Also, in a micro-service environment, sharing code should be minimized to avoid teams breaking other teams' code. Sorry for all developers who would like to have a solution for this problem; again, besides the workaround of using NuGet packages, there is none!
As the error says, you can't copy files that exist outside of the build context. When you run a command like docker image build ., that last argument (.) specifies the build context. That context is copied to the Docker engine for building. Files outside of that (e.g., ../LibReferenceGenerator/LibReferenceGenerator.csproj) simply don't exist.
So, for your example to work, you need to adjust your build context up one level in order to access LibReferenceGenerator and FwCore. Then, make the source of your COPY instructions relative to that one-level up context.
Note that the default location of the Dockerfile is a file named Dockerfile at your build context. You'll need to either move your Dockerfile, or specify a custom path using the -f, --file option.
docker image build documentation
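With the directory layout above, that could look something like this, run from the /Production directory (the image tag is just an example):
docker build -f ReferenceGenerator/ReferenceGenerator/Dockerfile -t referencegenerator .
The COPY sources inside the Dockerfile are then written relative to /Production, e.g. COPY LibReferenceGenerator/LibReferenceGenerator.csproj LibReferenceGenerator/ instead of a ../ path.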
You are missing one level in the copy.
It should be:
COPY ../../LibReferenceGenerator/LibReferenceGenerator.csproj ../LibReferenceGenerator/

What is a Docker build stage?

As far as I understand, build stages in Docker are fundamental things, and I have a practical understanding of them, but I have trouble coming up with a proper definition, and I also can't seem to find one.
So: what is the definition of a Docker build stage?
Edit: I'm not asking "how do I use a build stage?" or "how can I use multi-build stages?" which people seem very eager to answer :-)
The reason I have this question is because I saw the following sentences in the docs:
"The FROM instruction initializes a new build stage"
"a name can be given to a new build stage"
Which left me wondering: what exactly is a build stage?
I don't think there will ever be a strict definition for a Docker build stage, because a build stage is in general something conceptual which:
can be defined by you
depends on your case (language / libraries)
In this question: Difference between build and deploy? one of the answers says...
Build means to Compile the project.
I think you can see it this way too. A build stage is any procedure that generates something which can later be taken and used.
The idea with docker multi-stage builds is to:
generate what you are going to need
leave behind what you don't need and use the product of step 1 in a more lightweight way
If you have read the docs, Alex Ellis has a nice example where the same logic takes place:
he starts with a golang image, adds libraries, builds his app (Go generates a binary executable file)
after that, he doesn't need golang and the libraries to ship/run it, so he picks an alpine image, adds the executable file from step 1, and ships his app with a much smaller image (see the sketch below).
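In Dockerfile form, that example looks roughly like this (the binary name, Go version, and build flags are simplified assumptions, not the exact example from the docs):
FROM golang:1.19 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /href-counter .

FROM alpine:latest
RUN apk --no-cache add ca-certificates
COPY --from=build /href-counter /href-counter
CMD ["/href-counter"]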
Since version 17, Docker supports multiple stages during a docker build execution.
This means that you no longer need to define only one source image in your Dockerfile and do the whole build in a single run; you can define multiple stages with different images in your Dockerfile, one for each stage, using multiple FROM definitions:
# Build stage
FROM microsoft/aspnetcore
# ..do a build with a dev image for creating ./app artifact
# Publish - use a hardened, production image
FROM alpine:latest
CMD ["./app"]
This gives you the benefit of breaking your image-building process into stages, each optimized for the task it performs - for example, the stages could be (see the sketch after this list):
use an image with extra linting dependencies to check your source
use a dev-image with all development dependencies already installed to build your source
use another image including test frameworks to run various tests on the artifacts
and once everything passed ok, use a minimal-sized, optimized, hardened image to capture the final artifacts for production
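Sketched as a single Dockerfile, such a pipeline could look roughly like this (every image, project, and tool name below is a placeholder, not a real dependency):
# lint stage - extra static-analysis tooling
FROM my-registry/linter AS lint
COPY src /src
RUN run-linter /src

# build stage - full SDK image
FROM microsoft/dotnet:2.0-sdk AS build
COPY src /src
RUN dotnet publish /src/App.csproj -c Release -o /app

# test stage - test frameworks layered on top of the build stage
FROM build AS test
RUN dotnet test /src/App.Tests.csproj

# final stage - minimal, hardened runtime image with only the artifacts
FROM microsoft/dotnet:2.0-runtime AS final
COPY --from=build /app /app
ENTRYPOINT ["dotnet", "/app/App.dll"]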
Read more details about multi-stage builds:
https://docs.docker.com/develop/develop-images/multistage-build/
A stage is the creation of an image. In a multi-stage build, you go through the process of creating more than one image; however, you typically only tag a single one (exceptions being multiple builds, building a multi-architecture image manifest with a tool like buildx, and anything else docker releases after this answer).
Each stage, building a distinct image, starts from a FROM line in the Dockerfile. One stage doesn't inherit anything done in previous stages, it is based on its own base image. So if you have the following:
FROM alpine as stage1
RUN apk add your_tool
FROM alpine as stage2
RUN your_tool some args
you will get an error since your_tool is not installed in the second stage.
Which stage do you get as output from the build? By default the last stage, but you can change that with docker image build --target stage1 . to build the stage with that name, stage1 in this example. The classic docker build will run from the top of the Dockerfile until it finishes the target stage. Buildkit builds a dependency graph and builds stages concurrently and only if needed, so do not depend on this ordering to control something like a testing workflow in your Dockerfile (buildkit can see if nothing in the test stage is needed in your release stage and skip building the test stage).
What's the value of multiple stages? Typically, it's done to separate the build environment from the runtime environment. It allows you to perform the entire build inside of docker. This has two advantages.
First, you don't require an external Makefile and various compilers and other tools installed on the host to compile the binaries that then get copied into the image with a COPY line, anyone with docker can build your image.
And second, the resulting image doesn't include all the compilers or other build time tooling that isn't needed at runtime, resulting in smaller and more secure images. The typical example is a java app with maven and a full JDK to build, a runtime with just the jar file and the JRE.
If each stage makes a separate image, how do you get the jar file from the build stage to the run stage? That comes from a new option to the COPY command, --from. An oversimplified multi-stage build looks like:
FROM maven as build
COPY src /app/src
WORKDIR /app/src
RUN mvn install
FROM openjdk:jre as release
COPY --from=build /app/src/target/app.jar /app
CMD java -jar /app/app.jar
With that COPY --from=build we are able to take the artifact built in the build stage and add it to the release stage, without including anything else from that first stage (no layers of compile tools like JDK or Maven get added to our second stage).
How is the FROM x as y and the COPY --from=y /a /b working together? The FROM x as y is defining an image name for the duration of this build, in this case y. Anywhere later in the Dockerfile that you would put an image name, you can put y and you'll get the result of this stage as your input. So you could say:
FROM upstream as mybuilder
RUN apk add common_tools
FROM mybuilder as stage2
RUN some_tool arg2
FROM mybuilder as stage3
RUN some_tool arg3
FROM minimal_base as release
COPY --from=stage2 /bin2 /
COPY --from=stage3 /bin3 /
Note how stage2 and stage3 are each FROM mybuilder that is the output of the first stage.
The COPY --from=y allows you to change the context where you are copying from to be another image instead of the build context. It doesn't have to be another stage. So, for example, you could do the following to get a docker binary in your image:
FROM alpine
COPY --from=docker:stable /usr/local/bin/docker /usr/local/bin/
Further documentation on this is available at: https://docs.docker.com/develop/develop-images/multistage-build/
A build stage starts at a FROM statement and ends at the step before the next FROM statement.
stage | steɪdʒ |
noun
a point, period, or step in a process or development
Take a practical example: you want to build an image which contains a production-ready web server with TypeScript files compiled to JavaScript. You want to build that TypeScript within a Docker container to simplify dependency management. So you need:
node.js
Typescript
any dependencies needed for compilation
Webpack or whatever
nginx/Apache/whatever
In your final image you only really need the compiled .js files and, say, nginx. But to get there, you need all that other stuff first. When you upload that final image, it will contain all the intermediate layers, even if they're unnecessary for the final product.
Docker build stages now allow you to actually separate those stages, or steps, into separate images, while still using just one Dockerfile and not needing to glue several Dockerfiles together with external shell scripts or such. E.g.:
FROM node as builder
RUN npm install ...
# whatever you need to build your files
FROM nginx as production
COPY --from=builder /final.js /var/www/html
The final result of this Dockerfile is a small image with nginx as its base plus just the final .js file. It does not contain all the unnecessary stuff like node.js and the npm dependencies.
builder here is the first stage, production is the second stage. In this case the first stage will be discarded at the end of the process, but you can also choose to build a specific stage using docker build --target=builder. A new FROM introduces a new, separate stage. They're essentially separate Dockerfiles, but they can share data using COPY --from.
