Including pkg in .dockerignore file - docker

Right now my .dockerignore file has these contents:
.vscode
.idea
.git
bin
pkg
and my Dockerfile looks like:
FROM golang:latest
RUN mkdir -p /app
WORKDIR /app
COPY . .
ENV GOPATH /app
RUN go install huru
EXPOSE 3000
ENTRYPOINT /app/bin/huru
My question is: should I be copying the pkg folder from the host into the image or not? Right now I am not, as my .dockerignore file makes clear.
I get the feeling that I should just COPY the pkg folder from the host into the image, because it might contain pre-built files that go install can use instead of re-downloading the source from GitHub, etc.

Personally, I think copying the pkg folder from the host into the image is not a good idea, because:
It tightly couples the image to the place where you build it (your host). You could end up with different images depending on where you build them, and that's probably not what you want.
Moreover, if you have automated builds (from CI, for example), you're probably rebuilding the whole application from a clean environment each time, so there is no initial pkg folder to copy.
If you're familiar with the Java world, I've already encountered this problem with images built with Maven. To speed up the build, some people copy their local Maven repository (~/.m2) into the image to avoid redownloading artifacts. I don't particularly agree with that, since there is always a risk that their .m2 folder contains corrupted artifacts; the image built on their machine could then be different from one built in a clean environment. It depends on whether you want consistent builds or quick builds (I prefer the former).
In conclusion, I think that building images from a clean environment, without depending on the host where the image is built, is a good practice. That's why I personally would not copy any files (except application source code!) inside the image.
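To illustrate building from a clean environment, here is a minimal multi-stage sketch of the Dockerfile from the question. It assumes the legacy GOPATH layout the question implies (sources under /app/src/huru, with dependencies resolvable without go get), and that the binary has no cgo dependencies so it can run on a slim Debian base. The stage name build and the base image debian:bookworm-slim are my own choices, not from the question.

```dockerfile
# Build stage: starts from a clean golang image. bin/ and pkg/ from the host
# are excluded by .dockerignore, so everything is compiled inside the image.
FROM golang:latest AS build
WORKDIR /app
ENV GOPATH=/app
# Assumption: legacy GOPATH layout as in the question. Recent Go releases
# default to module mode, so GOPATH mode is switched back on explicitly.
ENV GO111MODULE=off
# Assumption: no cgo dependencies, so a static binary can be produced.
ENV CGO_ENABLED=0
COPY . .
# Installs the binary to $GOPATH/bin, i.e. /app/bin/huru.
RUN go install huru

# Runtime stage: only the compiled binary is carried over, so neither the
# Go toolchain nor any intermediate build output ends up in the final image.
FROM debian:bookworm-slim
COPY --from=build /app/bin/huru /usr/local/bin/huru
EXPOSE 3000
ENTRYPOINT ["/usr/local/bin/huru"]
```

Because every build starts from the same base image and the same source files, two builds of the same commit should produce equivalent images regardless of what happens to be in the host's pkg directory.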

Related

Build .deb package in one Docker build stage, install in another stage, but without the .deb package itself taking up space in the final image?

I have a multistage Docker build file. The first stage creates an installable package (e.g. a .deb file). The second stage needs only to install the .deb package.
Here's an example:
FROM debian:buster AS build_layer
COPY src/ /src/
WORKDIR /src
RUN ./build_deb.sh
# A lot of stuff happens here and a big huge .deb package pops out.
# Suppose the package is 300MB.
FROM debian:buster AS app_layer
COPY --from=build_layer /src/myapp.deb /
RUN dpkg -i /myapp.deb && rm /myapp.deb
ENTRYPOINT ["/usr/bin/myapp"]
# This image will be well over 600MB, since it contains both the
# installed package as well as the deleted copy of the .deb file.
The problem is that the COPY instruction runs in its own layer, and that layer contains the large .deb package. The next step installs the package and removes the .deb file, but the removal only hides the file in a later layer; the layer created by COPY still holds it, so the .deb package still takes up room in the final image. If it were a small package you might just live with it, but in my case the package file is hundreds of MB, so its presence in the image's layers increases the image size appreciably with no benefit.
There are related posts on SO, such as this one, which discusses files containing secrets, and this one, which is about copying a large installer from outside the container into it (the solution for that one is still kinda janky, requiring you to run a temporary local HTTP server). However, neither of these addresses the situation of needing to copy from another build stage without retaining the copied file in the final image.
The only way I could think of to do this would be to extend the idea of a web server and make an SFTP or similar server available so that the build layer can upload the package somewhere. But this also requires extra infrastructure, and now you're also dealing with SSH secrets and such; this starts to get really complex and is a lot less reproducible on another developer's system or in a CI/CD environment.
Alternatively I could use the experimental --squash option of docker build, but this kills the advantages of the layer system. I can't then reuse similar layers across multiple images (e.g. the image now can't take advantage of the fact that the Debian base image might already exist on the end user's system). This would minimize space usage, but wouldn't be ideal for a lot of other reasons.
What's the recommended way to approach this?
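One possible approach, assuming the build runs with BuildKit: instead of COPY, bind-mount the build stage's filesystem for the duration of a single RUN step with RUN --mount. Files seen through the mount are not committed to the resulting layer, so only the installed package contents end up in the image. The stage and file names come from the example above; the mount target /mnt/build is an arbitrary choice.

```dockerfile
# syntax=docker/dockerfile:1
FROM debian:buster AS build_layer
COPY src/ /src/
WORKDIR /src
RUN ./build_deb.sh

FROM debian:buster AS app_layer
# Mount /src from the build_layer stage (read-only by default) at /mnt/build
# for this RUN step only. The .deb is read through the mount and is never
# written into a layer of app_layer.
RUN --mount=type=bind,from=build_layer,source=/src,target=/mnt/build \
    dpkg -i /mnt/build/myapp.deb
ENTRYPOINT ["/usr/bin/myapp"]
```

BuildKit is the default builder in recent Docker releases; on older ones it can be enabled with DOCKER_BUILDKIT=1. As in the original, dpkg -i does not resolve package dependencies, so those would still need to be installed separately.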

Why does Docker care about the build context if we do not copy all of it?

In various places, the official Docker documentation warns about the folder that is sent to the Docker daemon (which they call the context) to build a new image with docker build. For example, from understand-build-context:
Inadvertently including files that are not necessary for building an
image results in a larger build context and larger image size. This
can increase the time to build the image, time to pull and push it,
and the container runtime size. To see how big your build context is,
look for a message like this when building your Dockerfile:
Sending build context to Docker daemon 187.8MB
I do not understand why the context is so important if we do not use all its content.
Let's say that my build context is a 1GB folder, but in the Dockerfile I have only one COPY command, for a 1KB file. Then why do we bother about the rest? How could the rest affect the size of my image?
Similarly, why do we have .dockerignore? If I do not use those files in the Dockerfile, aren't they ignored anyway? If not, what is .dockerignore used for?
Let say that my build context is a 1GB folder, but in Dockerfile...
The Dockerfile is normally transferred as part of the build context. Perhaps the easiest place to see this is in the "build an image" Docker HTTP API: the dockerfile parameter is explicitly a path within the build context, which is expected to be transferred in the HTTP body as a tar file. In that low-level API there's no way to pass the Dockerfile outside of that build-context tar-file HTTP body.
So first you send the build context to the Docker daemon, then the daemon unpacks it, and then it reads the Dockerfile and sees
I have only one COPY command of a file of 1KB.
so only that one file is copied into the resulting image; the rest of the context is just ignored.
Then why do we bother about the rest? How could the rest affect the size of my image? Similarly, why do we have .dockerignore?
Sending the build context is surprisingly slow. Even if you're not using remote Docker, and working directly on a native-Linux host, it can take multiple seconds to send that 1 GB tar-file build context over the Unix socket. So smaller build contexts can result in faster builds, and .dockerignore is a convenient way to cause things you're not going to use to be omitted from the build context.
It is very common to copy the entire build context into an image, though, and in this case it's important to control what goes in there. Let's consider a typical Node application. In day-to-day development I might just use Node, so I'll have a package.json file and a src subdirectory, but Node installs all of its dependencies in a node_modules subdirectory as well. A typical Node Dockerfile will look something like
FROM node:lts
WORKDIR /app
# Copy and install dependencies
COPY package*.json ./
RUN npm ci
# Copy and build the rest of the application
# IMPORTANT: copy the entire build context
COPY ./ ./
RUN npm run build
# Explain how to run the container
EXPOSE 3000
CMD ["node", "./build/index.js"]
The RUN npm ci line recreates the node_modules directory inside the image. The next line copies the entire build context (my src directory, webpack configuration, TypeScript configuration, static assets, the whole works) into the image; there are enough pieces and local files that I'd prefer not to list them out individually.
In that context it's important that COPY ./ ./ not include the host's node_modules directory. The host might be a different OS, or a different C library version, or any of several other things that might cause incompatibilities. That's where putting it in .dockerignore lets me say "copy everything, except this".
Your question hints at a very carefully curated build-context directory. That's a possibility too; in particular it's something that made sense with a compiled language, on a native-Linux host, before Docker multi-stage builds existed. You could consider writing something like a Makefile that copied specific files from your source tree into a dedicated docker directory, and then used that directory as the build context. Then you'd know exactly what was in the build context and therefore exactly what was going into the image. With modern Docker and multi-stage builds, I feel like this setup is a little unusual though.
The documentation was written before BuildKit became the standard builder in Docker, but it's still good practice for older build tooling. The reason for this in the classic builder is that Docker is a client/server application. To run a build, the client sends over the entire context, the Dockerfile, and all the build parameters, and the server runs the build, pulling out of the context the parts that the Dockerfile requests. As much as it looks like everything is happening locally (and often it is), the server could be a remote host without direct access to your filesystem, and the build is driven through a JSON REST API that sends the request and then monitors the build until it completes.
BuildKit, however, changes this. The server and the client communicate in both directions, and the server caches not only previous builds but also previous build contexts. So when a file in the context changes between builds, it can do the equivalent of an rsync to send just that one file, and only when the server requests it from the client.
There is still a need for .dockerignore even with BuildKit, since you often want to exclude files that would otherwise be copied by a wildcard match. For example, if you have the step:
COPY . /src
Then even with the buildkit caching, you'll include every file in the directory, even if a number of those files aren't needed to build your app (like the .git folder, the Dockerfile itself, the README, LICENSE, etc). That not only bloats your image and makes your builds slower, but it risks causing a cache miss when the resulting image would normally be unchanged.
Some will make the .dockerignore look similar to their .gitignore with some added files that don't affect the build. I often do the reverse, excluding everything, and then reincluding only the files I need with the ! prefix. E.g. the following would include only the Makefile, src, and static folders:
*
!Makefile
!src/
!static/
If you do that, make sure you remember to update it when adding new files or directories to your builds.
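As a sketch of how such an allowlist .dockerignore pairs with a broad COPY: with the file above in place, COPY . /src only sees Makefile, src/ and static/, so edits to the README, .git and similar files should not invalidate the cache for this step. The base image and the assumption that the Makefile's default target builds the app are hypothetical.

```dockerfile
# Hypothetical base image; the toolchain you need depends on your project.
FROM debian:bookworm AS build
RUN apt-get update \
 && apt-get install -y --no-install-recommends build-essential \
 && rm -rf /var/lib/apt/lists/*
WORKDIR /src
# With the allowlist .dockerignore, this copies only Makefile, src/ and static/.
COPY . /src
# Assumption: the Makefile's default target builds the application.
RUN make
```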

In Dockerfile, COPY all contents of current directory except one directory

In my Dockerfile, I have the following:
COPY . /var/task
...which copies my app code into the image.
I need to exclude the vendor/ directory when performing this copy.
I cannot add vendor/ to .dockerignore, because that directory needs to be part of the image when it gets built within the image with a RUN composer install.
I cannot specify every file and directory that should be copied, because they may change and I can't rely on other developers to keep the list updated.
I've tried the following, with the following errors:
COPY [^vendor$]* /var/task
When using COPY with more than one source file, the destination must be a directory and end with a /
COPY [^vendor$]*/ /var/task
COPY failed: no source files were specified
It is actually enough to add the vendor directory to the .dockerignore file.
You can broadly follow the flow of files through docker build in three phases:
docker build reads files from the directory you name, ignoring things in the .dockerignore file, and sends them to the Docker daemon as the build context.
The COPY instruction copies files from the build context into the container filesystem.
RUN instructions do further transformation or processing.
If you put vendor in the .dockerignore file, it prevents the directory from being included in the build context. The build will go somewhat faster, and COPY won't have the files to copy into the image. It won't prevent a RUN composer install step later on from creating its own vendor directory in the image.
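A minimal sketch of that combination, assuming the official composer image and a .dockerignore containing the line vendor/: COPY brings in everything except vendor/, and composer install then creates a fresh vendor/ inside the image. The --no-dev and --no-interaction flags are my own choices for a production-style build, not from the question.

```dockerfile
# Assumed .dockerignore entry:  vendor/
FROM composer:2
WORKDIR /var/task
# Copies the app code; vendor/ is absent because it never reached the build context.
COPY . /var/task
# Recreates vendor/ inside the image from composer.json / composer.lock.
RUN composer install --no-dev --no-interaction
```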
I don't think there is an easy solution to this problem.
If you need vendor for RUN composer install and you're not using a multi-stage build, then it doesn't matter whether you exclude the vendor folder from the COPY command. If you've copied it into the build earlier, it's going to be present in your final image, even if you don't copy it over in your COPY step.
One way to get around this is with multi-stage builds, like so:
FROM debian AS base
COPY . /var/task/
RUN rm -rf /var/task/vendor
FROM debian
COPY --from=base /var/task /var/task
If you can use this pattern in your larger build file then the final image will contain all the files in your working directory except vendor.
There's still a performance hit though. You're still going to have to copy the entire vendor directory into the build, and depending on what docker features you're using that will still take a long time. But if you need it for composer install then there's really no way around this.

How to create a docker container using a project solution where lib projects are located one level higher than the building context

I have a VS2017 (v5.18.0) solution which contains a .NET Core 2.0 console application, "ReferenceGenerator", as the startup project. The solution also contains two homegrown .NET Core 2.0 library projects, FwCore and LibReferenceGenerator. I have added Docker support (Linux), so all the files needed to create a Docker application have been added. I can debug the application in "docker-compose" mode with Docker for Windows in Linux mode, and the application works fine. But if I try to build a release version, I get an error saying that a COPY uses an illegal path. The Dockerfile looks like this:
FROM microsoft/dotnet:2.0-runtime AS base
WORKDIR /app
FROM microsoft/dotnet:2.0-sdk AS build
WORKDIR /src
COPY ReferenceGenerator/ReferenceGenerator.csproj ReferenceGenerator/
COPY ../LibReferenceGenerator/LibReferenceGenerator.csproj ../LibReferenceGenerator/
COPY ../FwCore/FwCore/FwCore.csproj ../FwCore/FwCore/
RUN dotnet restore ReferenceGenerator/ReferenceGenerator.csproj
COPY . .
WORKDIR /src/ReferenceGenerator
RUN dotnet build ReferenceGenerator.csproj -c Release -o /app
FROM build AS publish
RUN dotnet publish ReferenceGenerator.csproj -c Release -o /app
FROM base AS final
WORKDIR /app
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "ReferenceGenerator.dll"]
This line:
COPY ../LibReferenceGenerator/LibReferenceGenerator.csproj ../LibReferenceGenerator/
is causing this error:
Step 6/17 : COPY ../LibReferenceGenerator/LibReferenceGenerator.csproj ../LibReferenceGenerator/
1>Service 'referencegenerator' failed to build: COPY failed: Forbidden path outside the build context: ../LibReferenceGenerator/LibReferenceGenerator.csproj ()
I have read that relative paths are not allowed, so be it. But the output of the compiler is already complete in the bin directory of the project ReferenceGenerator. I already tried to remove the two copy lines referencing the libs but then the build complains about the missing lib project files at the dotnet build stage.
Having some homegrown library projects included in a solution seems to me a very common situation. I am a newbie with Docker containers and I have no idea how to fix this. Can anyone help?
Additional info my file structure looks like this:
/Production/ReferenceGenerator/ReferenceGenerator.sln
/Production/ReferenceGenerator/ReferenceGenerator/ReferenceGenerator.csproj
/Production/LibReferenceGenerator/LibReferenceGenerator.csproj
/Production/FwCore/FwCore/FwCore.csproj
/Production/ReferenceGenerator/ReferenceGenerator/Dockerfile
Please, anyone? The people who tried to help me have not succeeded. I'm completely stuck in development...
The answer is: there is no solution...
If you need libraries, you must include them as (private) NuGet packages.
It is not a neat solution, because while debugging you do not have the sources of your libraries available, but including libs from outside the build context is a no-go, as I learned researching the internet...
Also, in a microservice environment, shared code should be minimized to avoid teams breaking other teams' code. Sorry to all developers who would have liked a solution to this problem; again, besides the workaround using NuGet packages, there is none!
As the error says, you can't copy files that exist outside of the build context. When you run a command like docker image build ., that last argument (.) specifies the build context. That context is copied to the Docker engine for building. Files outside of that (e.g., ../LibReferenceGenerator/LibReferenceGenerator.csproj) simply don't exist.
So, for your example to work, you need to adjust your build context up one level in order to access LibReferenceGenerator and FwCore. Then, make the source of your COPY instructions relative to that one-level up context.
Note that the default location of the Dockerfile is a file named Dockerfile at your build context. You'll need to either move your Dockerfile, or specify a custom path using the -f, --file option.
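A sketch of what that could look like for the layout in the question, assuming the build is started from /Production with the Dockerfile left where it is, and that the csproj ProjectReference entries use the usual relative paths (they still resolve because /src mirrors the /Production layout):

```dockerfile
# Assumed invocation, run from /Production (the new build context):
#   docker build -f ReferenceGenerator/ReferenceGenerator/Dockerfile .
FROM microsoft/dotnet:2.0-runtime AS base
WORKDIR /app

FROM microsoft/dotnet:2.0-sdk AS build
WORKDIR /src
# Source paths are now relative to /Production, so the libraries are inside the context.
COPY ReferenceGenerator/ReferenceGenerator/ReferenceGenerator.csproj ReferenceGenerator/ReferenceGenerator/
COPY LibReferenceGenerator/LibReferenceGenerator.csproj LibReferenceGenerator/
COPY FwCore/FwCore/FwCore.csproj FwCore/FwCore/
RUN dotnet restore ReferenceGenerator/ReferenceGenerator/ReferenceGenerator.csproj
# /src now mirrors /Production, so relative project references keep working.
COPY . .
WORKDIR /src/ReferenceGenerator/ReferenceGenerator
RUN dotnet build ReferenceGenerator.csproj -c Release -o /app

FROM build AS publish
RUN dotnet publish ReferenceGenerator.csproj -c Release -o /app

FROM base AS final
WORKDIR /app
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "ReferenceGenerator.dll"]
```

If the build is started from docker-compose instead, the service's build context would need to point at /Production in the same way, with the dockerfile path given relative to it.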
docker image build documentation
You are missing one level in the copy.
It should be:
COPY ../../LibReferenceGenerator/LibReferenceGenerator.csproj ../LibReferenceGenerator/

Conditionally ignoring a .dockerignore file on COPY

I have a front-end build that uses variations of a Dockerfile for multiple steps: dev, CI (with Jenkins), and production. I'd like to avoid downloading node_modules again for each of the CI and production build images (both of which are built one after the other on the same box). Dev's node_modules are hosted on a volume to lower the overhead of restarting the dev container.
The three stages all share the same .dockerignore file which has a line excluding node_modules. Is it possible to add node_modules in via something like COPY node_modules/* node_modules/? I've searched in vain for a way to use a bind mount during the build portion of both CI and production builds. This doesn't seem to be possible.
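When this answer was written, there was no way to provide a different .dockerignore file. Newer BuildKit-based builds do support a Dockerfile-specific ignore file named after the Dockerfile (for example, Dockerfile.ci.dockerignore next to Dockerfile.ci), which takes precedence over the root .dockerignore.
As an alternative, you can copy node_modules to a different directory, such as ./node_new_modules, using cp on the host, or integrate that cp command into your CI.
After that, you can use the new ./node_new_modules directory to copy the node modules in your Dockerfile:
# Copy the directory itself (not node_new_modules/*) so each package keeps its own folder
COPY ./node_new_modules/ node_modules/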
Hope this helps or gives you a way to solve this problem.
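Another option, assuming the CI and production builds run with BuildKit, is a cache mount: RUN --mount=type=cache keeps npm's download cache on the build host between builds without it ever becoming part of the image or the build context. A sketch, assuming the node:lts image running as root, whose npm cache lives in /root/.npm:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:lts
WORKDIR /app
COPY package*.json ./
# The cache directory persists on the builder across builds, so successive
# CI and production builds on the same box can reuse downloaded packages;
# node_modules itself is still created fresh inside the image.
RUN --mount=type=cache,target=/root/.npm npm ci
# node_modules stays excluded by the shared .dockerignore, so this does not overwrite it.
COPY . .
RUN npm run build
```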
