Override .dockerignore file when using ADD - docker

I have one Rockerfile that builds 4 images; I also have one central .dockerignore file. For one of the images I require assets that are blocked by the .dockerignore file -- is there a way when doing ADD or COPY to force add / ignore this list?
It'll be a lot easier to do this in one file as opposed to three separate...!

Not in a simple way.
The .dockerignore file filters what goes into the build before the Dockerfile is even read.
The Docker daemon does not see your build folder. When the build starts, all the files in the build context folder are packed (and possibly compressed) and sent to the daemon; only then does it read your Dockerfile and build the image from the files it received.
More content about .dockerignore: https://docs.docker.com/engine/reference/builder/#/dockerignore-file
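The packing step can be approximated locally without Docker. The sketch below is not what the Docker client actually runs; it is a rough stand-in that feeds .dockerignore to tar's exclude-file option, which only behaves similarly for simple entries such as plain file or directory names. The function name list_context is hypothetical.

```shell
#!/usr/bin/env bash
# Approximate which files would end up in the build context.
# Assumption: .dockerignore holds only simple names that tar's exclude
# matching also understands. Docker's real rules (comments, "!" exceptions)
# are NOT reproduced here.
list_context() {
  local dir="$1"
  if [ -f "$dir/.dockerignore" ]; then
    # Pack the directory, skipping the listed entries, then list the archive.
    tar -C "$dir" -cf - --exclude-from="$dir/.dockerignore" . | tar -tf -
  else
    tar -C "$dir" -cf - . | tar -tf -
  fi
}
```

Calling list_context . in a project directory prints the paths that survive the filter. Anything missing from that list is something ADD or COPY would not be able to reference.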

In a normal Docker build the .dockerignore file affects the "build context" that is packaged up and sent to the docker server at the beginning of the build. If the "build context" doesn't contain the files then you can't reference them, so this is how the files are excluded. They don't "exist" for the build.
Rocker claims to run differently by not sending a build context to the server. The code looks like each ADD/COPY step is composed into a tar file that ignores the files. Also, the .dockerignore is read once at startup and cached.
Because Rocker does not send the build context before each build and only filters for each ADD/COPY command, there is hope. However, because the ignore data is read only once at startup, you can't do anything funky like swapping in different .dockerignore files at different stages of the build.
Use MOUNT
One option is to continue using the .dockerignore as is and use a Rocker MOUNT command to manually copy the ignored directories. Their last example in the mount section demonstrates:
FROM debian:jessie
ADD . /app # assets/ in .dockerignore
WORKDIR /app
MOUNT .:/context
RUN cp -r /context/assets /app # include assets/
Change App Structure
The only other useful option I can think of is to split your ADD or COPY into multiple commands so that you don't rely on the .dockerignore to filter files for the other 3 images. This would probably require your assets directory to be stored outside of your application root.

Related

Why does docker bother of context if we do not copy all

In various places on the official Docker website, there are warnings about the folder that is sent to the Docker daemon (called the context) to build a new image with docker build. For example, from understand-build-context:
Inadvertently including files that are not necessary for building an
image results in a larger build context and larger image size. This
can increase the time to build the image, time to pull and push it,
and the container runtime size. To see how big your build context is,
look for a message like this when building your Dockerfile:
Sending build context to Docker daemon 187.8MB
I do not understand why the context is so important if we do not use all of its content.
Let's say that my build context is a 1GB folder, but in the Dockerfile I have only one COPY command of a file of 1KB. Then why do we bother about the rest? How could the rest affect the size of my image?
Similarly, why do we have .dockerignore? If I do not use those files in the Dockerfile, aren't they ignored anyway? If not, what are they used for?
Let's say that my build context is a 1GB folder, but in the Dockerfile...
The Dockerfile is normally transferred as part of the build context. Perhaps the easiest place to see this is in the "build an image" Docker HTTP API: the dockerfile parameter is explicitly a path within the build context, which is expected to be transferred in the HTTP body as a tar file. In that low-level API there's no way to pass the Dockerfile outside of that build-context tar-file HTTP body.
So first you send the build context to the Docker daemon, then the daemon unpacks it, and then it reads the Dockerfile and sees "I have only one COPY command of a file of 1KB", so only that one file is copied into the resulting image; the rest of the context is just ignored.
Then why do we bother about the rest? How could the rest affect the size of my image? Similarly, why do we have .dockerignore?
Sending the build context is surprisingly slow. Even if you're not using remote Docker, and working directly on a native-Linux host, it can take multiple seconds to send that 1 GB tar-file build context over the Unix socket. So smaller build contexts can result in faster builds, and .dockerignore is a convenient way to cause things you're not going to use to be omitted from the build context.
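To get a feel for how much data would be shipped, the size of an uncompressed tar of the directory is a reasonable proxy. This is a sketch under assumptions: context_bytes is a hypothetical helper, the single skipped path stands in for one .dockerignore entry, and the number is only an estimate of what Docker reports.

```shell
#!/usr/bin/env bash
# Print the size in bytes of an uncompressed tar of a directory,
# optionally skipping one path (a stand-in for a .dockerignore entry).
context_bytes() {
  local dir="$1"
  local skip="${2:-}"
  if [ -n "$skip" ]; then
    tar -C "$dir" -cf - --exclude="$skip" . | wc -c | tr -d ' '
  else
    tar -C "$dir" -cf - . | wc -c | tr -d ' '
  fi
}
```

Comparing context_bytes . with context_bytes . node_modules gives a quick estimate of how much an ignore entry would save before touching .dockerignore at all.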
It is very common to copy the entire build context into an image, though, and in this case it's important to control what goes in there. Let's consider a typical Node application. In day-to-day development I might just use Node, so I'll have a package.json file and a src subdirectory, but Node installs all of its dependencies in a node_modules subdirectory as well. A typical Node Dockerfile will look something like
FROM node:lts
WORKDIR /app
# Copy and install dependencies
COPY package*.json ./
RUN npm ci
# Copy and build the rest of the application
# IMPORTANT: this copies the entire build context
COPY ./ ./
RUN npm run build
# Explain how to run the container
EXPOSE 3000
CMD ["node", "./build/index.js"]
The RUN npm ci line recreates the node_modules directory inside the image. In the next line I copy the entire build context – my src directory, webpack.js configuration, .typescript configuration, static assets, the whole works - into the image, with enough parts and local files that I'd prefer to not list them out individually.
In that context it's important that COPY ./ ./ not include the host's node_modules directory. The host might be a different OS, or a different C library version, or any of several other things that might cause incompatibilities. That's where putting it in .dockerignore lets me say "copy everything, except this".
Your question hints at a very carefully curated build-context directory. That's a possibility too; in particular it's something that made sense with a compiled language, on a native-Linux host, before Docker multi-stage builds existed. You could consider writing something like a Makefile that copied specific files from your source tree into a dedicated docker directory, and then used that directory as the build context. Then you'd know exactly what was in the build context and therefore exactly what was going into the image. With modern Docker and multi-stage builds, I feel like this setup is a little unusual though.
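The curated-directory approach described above can be sketched in shell rather than a Makefile. The helper name stage_context and the example paths are hypothetical; the idea is only to copy an explicit list of paths into a fresh directory and use that directory as the build context.

```shell
#!/usr/bin/env bash
# Copy an explicit list of paths from a source tree into a fresh staging
# directory. Note: the staging directory is deleted and recreated each time.
stage_context() {
  local src="$1"
  local dest="$2"
  local path
  shift 2
  rm -rf "$dest"
  mkdir -p "$dest"
  for path in "$@"; do
    # Recreate the parent directory so the relative layout is preserved.
    mkdir -p "$dest/$(dirname "$path")"
    cp -R "$src/$path" "$dest/$path"
  done
}
```

A call such as stage_context . ./docker-context Dockerfile package.json src would leave only those three entries in ./docker-context, which could then be passed to docker build as the context directory.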
The documentation was written before buildkit became standard in docker, but it's still a good practice for older build tooling. The reason for this in the classic builder is that docker is a client/server based app. To run a build, the client sends over the entire context, Dockerfile, and all the parameters for the server to build, and the server runs that build, pulling parts out of the context that the Dockerfile requests. As much as it looks like everything is happening locally, and often is, the server could be a remote host without direct access to your filesystem, and the build process is a JSON REST API that sends the request and then monitors for the build to complete.
Buildkit, however, changes this. Both the server and the client communicate with each other, and the server has a cache of not only the previous builds, but of the previous build contexts. So when a file changes in the context between builds, it can perform the equivalent of an rsync to send just that one file, and only when the server requests it from the client.
There is still a need for a .dockerignore since even with buildkit, you often want to exclude files within the build that would otherwise be copied in a wildcard match. For example, if you have the step:
COPY . /src
Then even with the buildkit caching, you'll include every file in the directory, even if a number of those files aren't needed to build your app (like the .git folder, the Dockerfile itself, the README, LICENSE, etc). That not only bloats your image and makes your builds slower, but it risks causing a cache miss when the resulting image would normally be unchanged.
Some will make the .dockerignore look similar to their .gitignore with some added files that don't affect the build. I often do the reverse, excluding everything, and then reincluding only the files I need with the ! prefix. E.g. the following would include only the Makefile, src, and static folders:
*
!Makefile
!src/
!static/
If you do that, make sure you remember to update it when adding new files or directories to your builds.
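Since the allowlist has to be kept in sync by hand, one option is to generate it from a single list of paths. This is a minimal sketch; write_allowlist is a hypothetical helper, and it simply emits the same "ignore everything, then re-include" layout shown above.

```shell
#!/usr/bin/env bash
# Write an allowlist-style .dockerignore into a directory: ignore
# everything, then re-include each path passed as an argument.
write_allowlist() {
  local dir="$1"
  local keep
  shift
  {
    printf '%s\n' '*'
    for keep in "$@"; do
      printf '!%s\n' "$keep"
    done
  } > "$dir/.dockerignore"
}
```

For example, write_allowlist . Makefile src/ static/ reproduces the four-line file from the answer.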

In Dockerfile, COPY all contents of current directory except one directory

In my Dockerfile, I have the following:
COPY . /var/task
...which copies my app code into the image.
I need to exclude the vendor/ directory when performing this copy.
I cannot add vendor/ to .dockerignore, because that directory needs to be part of the image when it gets built within the image with a RUN composer install.
I cannot specify every file and directory that should be copied, because they may change and I can't rely on other developers to keep the list updated.
I've tried the following, with the following errors:
COPY [^vendor$]* /var/task
When using COPY with more than one source file, the destination must be a directory and end with a /
COPY [^vendor$]*/ /var/task
COPY failed: no source files were specified
It is actually enough to add the vendor directory to the .dockerignore file.
You can broadly follow the flow of files through docker build in three phases:
1. docker build reads files from the directory you name, ignoring things in the .dockerignore file, and sends them to the Docker daemon as the build context.
2. The COPY instruction copies files from the build context into the container filesystem.
3. RUN instructions do further transformation or processing.
If you put vendor in the .dockerignore file, it prevents the directory from being included in the build context. The build will go somewhat faster, and COPY won't have the files to copy into the image. It won't prevent a RUN composer install step later on from creating its own vendor directory in the image.
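The three phases can be laid out as a small scaffold. Everything below is illustrative rather than taken from the question: the helper name scaffold_composer_build, the composer:2 base image, and the --no-dev flag are assumptions, and only /var/task comes from the original Dockerfile.

```shell
#!/usr/bin/env bash
# Write a .dockerignore and a Dockerfile that together show the three phases.
# Assumptions: composer:2 as base image and --no-dev are illustrative choices.
scaffold_composer_build() {
  local dir="$1"
  # Phase 1: keep the host's vendor directory out of the build context.
  printf 'vendor\n' > "$dir/.dockerignore"
  cat > "$dir/Dockerfile" <<'EOF'
FROM composer:2
WORKDIR /var/task
# Phase 2: copy the context (already without vendor) into the image.
COPY . /var/task
# Phase 3: create a fresh vendor directory inside the image.
RUN composer install --no-dev
EOF
}
```

With this layout the host's vendor directory never reaches the daemon, yet the image still ends up with a vendor directory produced by the RUN step.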
I don't think there is an easy solution to this problem.
If you need vendor for RUN composer install and you're not using a multistage build then it doesn't matter if you remove the vendor folder in the copy command. If you've copied it into the build earlier then it's going to be present in your final image, even if you don't copy it over in your COPY step.
One way to get around this is with multi-stage builds, like so:
FROM debian AS base
COPY . /var/task/
RUN rm -rf /var/task/vendor
FROM debian
COPY --from=base /var/task /var/task
If you can use this pattern in your larger build file then the final image will contain all the files in your working directory except vendor.
There's still a performance hit though. You're still going to have to copy the entire vendor directory into the build, and depending on what docker features you're using that will still take a long time. But if you need it for composer install then there's really no way around this.

Docker COPY all files and folders except some files/folders

The Dockerfile COPY command accepts Go-style glob patterns (filepath.Match rules, not full regular expressions). But with such a pattern, I am not able to omit a particular folder.
For example, if the directory contains:
public
dist
webapp
somefile.txt
anotherfile.txt
Now, how should I write the expression for COPY such that it omits 'webapp' and copy all other files and folders?
NOTE: I know I can put it to .dockerignore, but in later build stage in the same Dockerfile, I want to copy that folder - 'webapp'
You have two choices:
List all directories you want to copy directly:
COPY ["foldera", "folderc", "folderd", ..., "/dstPath/"]
Try to exclude some paths but also make sure that all paths patterns are not including the path we want to exclude:
COPY ["folder[^b]*", "file*", "/dstPath/"]
Also you can read more about available solutions in this issue: https://github.com/moby/moby/issues/15771
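It helps to check which names a bracket pattern really matches before putting it in a COPY line. The sketch below uses the shell's own glob matching as a stand-in for Docker's matcher, which is an assumption: for a simple pattern like this one the two should agree on plain names, but that is not guaranteed for every pattern. The shell spells the negated class [!b] where the COPY pattern uses [^b]; matches_pattern is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Report whether a name matches the equivalent of folder[^b]* from the answer.
# Assumption: shell globbing stands in for Docker's pattern matching.
matches_pattern() {
  case "$1" in
    # "folder", then one character that is not "b", then anything.
    folder[!b]*) echo yes ;;
    *) echo no ;;
  esac
}
```

Note that the class consumes exactly one character, so a directory named just "folder" does not match, and neither does any name that continues with "b" (for example "folderbackup"), which may exclude more than intended.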
COPY with exclusions work around
I have a PHP + Node app, with both node_modules and vendor directories, with layer caching in place.
I was looking to exclude my dependencies, by excluding some files from being copied, but since Docker COPY does not support exclusions, I took a different approach, to get my dependencies cached in a different layer.
It took a combination of 3 different steps:
Step 1
Script the tarring of the node_modules and vendor directories in my build process:
tar -czf ./node_modules.tgz --directory=./src/node_modules .
tar -czf ./vendor.tgz --directory=./src/vendor .
docker build ...
rm node_modules.tgz vendor.tgz
docker push ...
Step 2
Use .dockerignore to ignore the node_modules and vendor directories:
src/node_modules
src/vendor
Step 3
Add the tar files to the project, before copying the rest of my source code:
ADD node_modules.tgz /var/www/node_modules
ADD vendor.tgz /var/www/vendor
COPY ./src /var/www
Obviously, the first build is slow while the layer gets cached, and so is any build after the cache is invalidated (e.g. by new packages).
Credit to jason-kane from here for inspiration: https://github.com/moby/moby/issues/15771#issuecomment-207113714
Something else to note: my vendor and node_modules directories are in the same folder as the source code.

how to reduce Build Context

I have a Dockerfile in which a 7GB SQL Server database bak file is copied from the host. This increases the build context. If the bak file is ignored in .dockerignore, COPY fails, as expected. How do I handle this without increasing the build context?
My Folder structure is
C:\proj
    artifacts   (contains SQL bak file)
    docker      (contains Dockerfile)
    scripts     (contains PowerShell script for restoring the DB)
PS C:\proj> docker image build -t testdb:v14 -f .\docker\wcp_db.dockerfile .
Here, the build context includes the bak file and image size increases.
Sending build context to Docker daemon 7.196GB
If I add a .dockerignore to skip the artifacts folder or the bak file, the build context gets reduced.
Sending build context to Docker daemon 11.26kB
However, COPY fails as expected, since .dockerignore removes the folder/file from the context.
Step 5/6 : COPY ./artifacts/testDB.bak .
COPY failed: CreateFile \\?\C:\ProgramData\Docker\tmp\docker-builder101517566\artifacts\testDB.bak: The system cannot find the file specified.
I believe we cannot copy a file that is outside the build context.
My dockerfile is below:
#escape = `
FROM microsoft/mssql-server-windows-express:latest
ENV ACCEPT_EULA="Y" `
sa_password="someSApwd012#"
WORKDIR C:\workspace
COPY ./scripts/DeployDatabase.ps1 .
COPY ./artifacts/testDB.bak .
CMD powershell ./DeployDatabase.ps1 -sa_password $env:sa_password -dbName 'testDB' -serverName ".\sqlexpress" -sourceBackupFolder "C:\workspace" -Verbose
How can I handle this situation so that I can still copy the bak file while keeping the build context at a minimum?
Filtering of the build context is usually done with a .dockerignore file. If the bak file is required in the image, it has to be present in the build context, AFAIK.
One way around this is to download the bak file at runtime. In that case you'll have to modify the CMD to run a script that first downloads the file and then runs the DB deployment script.
The image will then be small and the build faster. However, this is less about the Docker ecosystem and more about how you want to run the deployment process, and whether it is acceptable in your case for the backup file to be absent from the image.

What are the files that the .dockerignore works on?

I don't really understand how .dockerignore works.
Is it intended to be used like the following:
First I add some patterns to it, such as *.md
Then I put this .dockerignore into the container.
After that I run and enter the container.
I create a new file named test.md and commit this container to the new image.
The new image will ignore this file so it will not be in the new container.
Before explaining the use of the .dockerignore file we must spend a little time understanding what docker build does.
Docker build: what happens when I build an image?
When you build an image from a Dockerfile using the docker build command, the client creates a build context from the directory you pass to the command (often ., the current directory) and sends it to the daemon. By default that context contains everything in that directory.
What does .dockerignore do and why use it?
The .dockerignore file allows you to exclude files from the context, just as a .gitignore file allows you to exclude files from your git repository.
It helps make builds faster and lighter by excluding from the context large files or directories that are not used in the build.
docker build has a step where it tars up the CONTEXT directory and sends it to the docker daemon. This is because the daemon and client might not exist on the same server.
The tar and network send is why unused files can slow down the build. These happen even if the daemon runs locally.
Then I put this .dockerignore in container.
No, don't do that. The .dockerignore file belongs in the root of the build context (usually the same directory as your Dockerfile). It is intended to speed up the docker build command by excluding, at build time, files that won't be used to build the Docker image.
