The Dockerfile COPY instruction accepts Go-style glob patterns (Go's filepath.Match rules, not full regular expressions), but I can't get a pattern to omit one particular folder.
For example, if the directory contains:
public
dist
webapp
somefile.txt
anotherfile.txt
Now, how should I write the expression for COPY so that it omits 'webapp' and copies all other files and folders?
NOTE: I know I can put it in .dockerignore, but in a later build stage in the same Dockerfile, I want to copy that folder ('webapp').
You have two choices:
List all directories you want to copy directly:
COPY ["foldera", "folderc", "folderd", ..., "/dstPath/"]
Exclude paths with a pattern, making sure that none of your patterns also matches the path you want to exclude:
COPY ["folder[^b]*", "file*", "/dstPath/"]
Also you can read more about available solutions in this issue: https://github.com/moby/moby/issues/15771
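As a rough illustration of the second choice applied to the directory from the question, the sketch below recreates those five entries in a scratch directory and lists what a "does not start with w" pattern selects. It uses POSIX shell globbing, which writes the negated class as [!w], as a stand-in for the Go-style [^w] that COPY would take. The helper name list_matching and the scratch directory are made up for the demonstration, and shell globbing is only assumed to agree with Docker's matcher for simple names like these.

```shell
#!/bin/sh
# list_matching DIR PATTERN: print the entries of DIR whose names match PATTERN.
# Hypothetical helper; shell globbing stands in for Docker's Go-style matcher.
list_matching() {
    (
        cd "$1" || exit 1
        # $2 is intentionally unquoted so the shell expands the glob.
        for name in $2; do
            if [ -e "$name" ]; then
                printf '%s\n' "$name"
            fi
        done
    )
}

# Recreate the layout from the question in a scratch directory.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/public" "$demo_dir/dist" "$demo_dir/webapp"
touch "$demo_dir/somefile.txt" "$demo_dir/anotherfile.txt"

# [!w]* in the shell corresponds to [^w]* in a Dockerfile COPY source.
selected=$(list_matching "$demo_dir" '[!w]*')
printf '%s\n' "$selected"
```

Everything except webapp is listed. One caveat, as far as I know: when a COPY source is a directory, its contents are copied rather than the directory itself, so public and dist would end up merged into the destination.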
COPY with exclusions work around
I have a PHP + Node app, with both node_modules and vendor directories, with layer caching in place.
I was looking to exclude my dependencies, by excluding some files from being copied, but since Docker COPY does not support exclusions, I took a different approach, to get my dependencies cached in a different layer.
It took a combination of 3 different steps:
Step 1
Script the tarring of the node_modules and vendor directories in my build process:
tar -cf ./node_modules.tgz --directory=./src/node_modules .
tar -cf ./vendor.tgz --directory=./src/vendor .
docker build ...
rm node_modules.tgz vendor.tgz
docker push ...
Step 2
Use .dockerignore to ignore the node_modules and vendor directories:
src/node_modules
src/vendor
Step 3
Add the tar files to the project, before copying the rest of my source code:
ADD node_modules.tgz /var/www/node_modules
ADD vendor.tgz /var/www/vendor
COPY ./src /var/www
Obviously, the first build is slow while the layer gets cached, and so is any build after the cache is invalidated (e.g. by new packages).
Credit to jason-kane from here for inspiration: https://github.com/moby/moby/issues/15771#issuecomment-207113714
Something else to note: my vendor and node_modules directories are in the same folder as the source code.
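A minimal sketch of how the packing and cleanup around the build could be wrapped in two functions, assuming the same layout as above (dependencies under a src directory, archives written next to the Dockerfile). The function names pack_deps and clean_deps are hypothetical and not part of the original script.

```shell
#!/bin/sh
# pack_deps SRC_DIR OUT_DIR: archive each dependency directory separately so
# that ADD can unpack it into its own cached layer. Hypothetical helper name.
pack_deps() {
    src_dir=$1
    out_dir=$2
    for dep in node_modules vendor; do
        # --directory makes the archive paths relative to the dependency folder.
        tar -cf "$out_dir/$dep.tgz" --directory="$src_dir/$dep" . || return 1
    done
}

# clean_deps OUT_DIR: remove the temporary archives again.
clean_deps() {
    rm -f "$1/node_modules.tgz" "$1/vendor.tgz"
}
```

In a build script this might be called as pack_deps ./src . before docker build ... and clean_deps . afterwards, ideally with the cleanup in a trap so that a failed build does not leave the archives behind.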
In my Dockerfile, I have the following:
COPY . /var/task
...which copies my app code into the image.
I need to exclude the vendor/ directory when performing this copy.
I cannot add vendor/ to .dockerignore, because that directory needs to be part of the image when it gets built within the image with a RUN composer install.
I cannot specify every file and directory that should be copied, because they may change and I can't rely on other developers to keep the list updated.
I've tried the following, with the following errors:
COPY [^vendor$]* /var/task
When using COPY with more than one source file, the destination must be a directory and end with a /
COPY [^vendor$]*/ /var/task
COPY failed: no source files were specified
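A rough way to see why both attempts misbehave: in glob syntax, [^vendor$] is a character class meaning "one character that is not v, e, n, d, o, r or $", not "anything except the word vendor". The sketch below checks a few made-up file names against that pattern using the shell's own case matching, after translating the leading ^ into the ! that POSIX shells use for negation. The helper name glob_match and the sample names are hypothetical, and shell matching is only assumed to behave like Docker's matcher for simple names without slashes.

```shell
#!/bin/sh
# glob_match PATTERN NAME: succeed if NAME matches the glob PATTERN.
glob_match() {
    # Go-style globs negate a class with [^...]; POSIX shells use [!...].
    pattern=$(printf '%s' "$1" | sed 's/\[^/[!/g')
    case "$2" in
        $pattern) return 0 ;;
        *) return 1 ;;
    esac
}

for name in vendor composer.json node_modules docs README.md; do
    if glob_match '[^vendor$]*' "$name"; then
        echo "copied:  $name"
    else
        echo "skipped: $name"
    fi
done
```

Here vendor is skipped, but so are node_modules and docs, simply because they start with one of the listed letters, while composer.json and README.md are copied. That presumably also explains the first error message: the pattern matched more than one source, so COPY insisted on a destination ending in /.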
It is actually enough to add the vendor directory to the .dockerignore file.
You can broadly follow the flow of files through docker build in three phases:
docker build reads files from the directory you name, ignoring things in the .dockerignore file, and sends them to the Docker daemon as the build context.
The COPY instruction copies files from the build context into the container filesystem.
RUN instructions do further transformation or processing.
If you put vendor in the .dockerignore file, it prevents the directory from being included in the build context. The build will go somewhat faster, and COPY won't have the files to copy into the image. It won't prevent a RUN composer install step later on from creating its own vendor directory in the image.
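To get a feel for the first phase without running a build, the sketch below packs a scratch directory with tar while excluding whatever .dockerignore lists, then prints what is left. This only approximates what docker build does: tar's exclude patterns are not the same syntax as .dockerignore (they are assumed to agree for a plain directory name like vendor), and the function name preview_context and the scratch layout are invented for the example.

```shell
#!/bin/sh
# preview_context DIR: list roughly what would be sent as the build context.
# Approximation only: tar exclude patterns stand in for .dockerignore rules.
preview_context() {
    (
        cd "$1" || exit 1
        tar -cf - --exclude-from=.dockerignore . | tar -tf -
    )
}

# Scratch project with a vendor/ directory that should stay out of the context.
project=$(mktemp -d)
mkdir -p "$project/src" "$project/vendor/lib"
touch "$project/composer.json" "$project/src/index.php" "$project/vendor/lib/dep.php"
echo 'vendor' > "$project/.dockerignore"

context_listing=$(preview_context "$project")
printf '%s\n' "$context_listing"
```

The listing contains composer.json and src/index.php but nothing under vendor, which is why a later COPY has no vendor directory to copy while a RUN composer install step is still free to create one inside the image.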
I don't think there is an easy solution to this problem.
If you need vendor for RUN composer install and you're not using a multistage build then it doesn't matter if you remove the vendor folder in the copy command. If you've copied it into the build earlier then it's going to be present in your final image, even if you don't copy it over in your COPY step.
One way to get around this is with multi-stage builds, like so:
FROM debian as base
COPY . /var/task/
RUN rm -rf /var/task/vendor
FROM debian
COPY --from=base /var/task /var/task
If you can use this pattern in your larger build file then the final image will contain all the files in your working directory except vendor.
There's still a performance hit though. You're still going to have to copy the entire vendor directory into the build, and depending on what docker features you're using that will still take a long time. But if you need it for composer install then there's really no way around this.
In Xcode 11.7, when I add a Copy Files build phase, it only allows me to select files, not directories, and while I can select files from within sub-directories, they all get flattened into the top level.
I can work around it by adding multiple Copy Files phases with a different subpath so that the structure ends up correct, but this is tedious and feels wrong.
Is there a better way to make the output directory structure of Copy Files match the input?
I figured out how to do it - instead of a Copy Files phase, I added a Run Script phase and used the cp -R command.
In my case, I wanted to copy a TargetApp/www folder to the "Wrapper" (root of the app), so my full script is:
echo "copying $SRCROOT/TargetApp/www to $TARGET_BUILD_DIR/$CONTENTS_FOLDER_PATH"
cp -R "$SRCROOT/TargetApp/www" "$TARGET_BUILD_DIR/$CONTENTS_FOLDER_PATH"
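A slightly more defensive variant of the same idea, as a sketch: it wraps the copy in a function that fails loudly if the source folder is missing and creates the destination first. The function name copy_tree is hypothetical; the cp -R call is the same one as in the script above.

```shell
#!/bin/sh
# copy_tree SRC DEST_PARENT: copy directory SRC, with its whole subtree,
# into DEST_PARENT. Hypothetical helper name.
copy_tree() {
    if [ ! -d "$1" ]; then
        echo "error: source directory not found: $1" >&2
        return 1
    fi
    mkdir -p "$2" || return 1
    # -R recurses, so nested folders keep their relative layout.
    cp -R "$1" "$2"
}
```

In the Run Script phase it would be called as copy_tree "$SRCROOT/TargetApp/www" "$TARGET_BUILD_DIR/$CONTENTS_FOLDER_PATH".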
FROM nixos/nix@sha256:af330838e838cedea2355e7ca267280fc9dd68615888f4e20972ec51beb101d8
# FROM nixos/nix:2.3
ADD . /build
WORKDIR /build
RUN nix-build
ENTRYPOINT /build/result/bin/app
I have the very simple Dockerfile above that can successfully build my application. However, each time I modify any of the files within the application directory (.), it has to rebuild from scratch and download all the nix store dependencies again.
Can I somehow grab a "list" of store dependencies downloaded and then add them in on the beginning of the Dockerfile for the purpose of caching them independently (for the ultimate goal of saving time + bandwidth)?
I'm aware I could build this Docker image using nix natively, which has its own caching functionality (well, the nix store), but I'm trying to keep this buildable in a non-nix environment (hence using Docker).
I suggest splitting the source into two parts. The idea is to create a separate Docker layer that contains only the dependencies and therefore changes rarely:
FROM nixos/nix:2.3
# the trailing slash makes /build a directory rather than a file named "build"
ADD ./default.nix /build/
# if you have any other Nix files, put them in the ./nix subdirectory
ADD ./nix /build/nix
# run from /build so that nix-shell can find default.nix
WORKDIR /build
# now let's download all the dependencies
RUN nix-shell --run exit
# At this point, Docker has cached all the dependencies. We can perform the build
ADD . /build
RUN nix-build
ENTRYPOINT /build/result/bin/app
Right now my .dockerignore file has these contents:
.vscode
.idea
.git
bin
pkg
and my Dockerfile looks like:
FROM golang:latest
RUN mkdir -p /app
WORKDIR /app
COPY . .
ENV GOPATH /app
RUN go install huru
EXPOSE 3000
ENTRYPOINT /app/bin/huru
My question is - should I be copying the pkg folder from host to image or not? Right now I am not, as my dockerignore file makes clear.
I get the feeling that I should just COPY the pkg folder from host to image, because that might have pre-built files in it that go install can use instead of re-downloading the source from github etc?
Personally, I think copying the pkg folder from host to image is not a good idea because:
it tightly couples the place where you build the image (your host) to the image itself. You could end up with different images depending on where you build them, and that's probably not what you want
moreover, if you have automated builds (from CI, for example), you're probably rebuilding the whole application from a clean environment each time, so there is no initial pkg folder to copy.
If you're familiar with the Java world, I've already encountered this problem with images built with Maven. To speed up the build, some people copy their local Maven repository (~/.m2) into the image to avoid redownloading artifacts. I don't particularly agree with that, since there is always a risk that their .m2 folder contains corrupted artifacts; the image built on their machine would then differ from one built in a clean environment. It depends on whether you want consistent builds or quick builds (I prefer the former).
In conclusion, I think that building images from a clean environment, without depending on the host where the image is built, is a good practice. That's why I personally would not copy any files (except application source code!) inside the image.
I have one Rockerfile that builds 4 images; I also have one central .dockerignore file. For one of the images I require assets that are blocked by the .dockerignore file -- is there a way when doing ADD or COPY to force add / ignore this list?
It'll be a lot easier to do this in one file as opposed to three separate...!
Not in a simple way.
The .dockerignore file is used to filter what will be used in the build before the Dockerfile is even read.
The Docker daemon does not see your build folder. When the build starts, all the files in the build context folder are packed (and possibly compressed) and sent to the daemon; only then does it read your Dockerfile and build the image from the files it received.
More content about .dockerignore: https://docs.docker.com/engine/reference/builder/#/dockerignore-file
In a normal Docker build the .dockerignore file affects the "build context" that is packaged up and sent to the docker server at the beginning of the build. If the "build context" doesn't contain the files then you can't reference them, so this is how the files are excluded. They don't "exist" for the build.
Rocker claims to run differently by not sending a build context to the server. The code looks like each ADD/COPY step is composed into a tar file that ignores the files. Also, the .dockerignore is read once at startup and cached.
As Rocker does not send the build context before each build and only filters for each ADD/COPY command, there is hope. However, because the ignore data is read only once at startup, you can't do anything funky like swapping in different .dockerignore files at different stages of the build.
Use MOUNT
One option is to continue using the .dockerignore as is and use a Rocker MOUNT command to manually copy the ignored directories. Their last example in the mount section demonstrates:
FROM debian:jessie
# assets/ is listed in .dockerignore, so this ADD skips it
ADD . /app
WORKDIR /app
MOUNT .:/context
# include assets/ by copying it from the mounted context
RUN cp -r /context/assets /app
Change App Structure
The only other useful option I can think of is to split your ADD or COPY into multiple commands so that you don't rely on the .dockerignore to filter files for the other 3 images. This would probably require your assets directory to be stored outside of your application root.