Recursively COPY files matching filename and preserve directory structure - docker

Assume I have the following dir structure:
languages
-en-GB
--page1.json
--page2.json
-fr-FR
--page1.json
--page2.json
Now let's assume I want to copy the folder structure, but only page1.json content:
I've tried this:
COPY ["languages/**/*page1.json", "./"]
Which results in the folders being copied, but no files.
What I want to end up with is
languages
-en-GB
--page1.json
-fr-FR
--page1.json
Copied into my image

I am not sure you can use wildcards to produce the filtered result you are looking for.
I believe there are a few clean and clear ways to achieve this:
Option 1: Copy everything, and cleanup later:
FROM alpine
WORKDIR /languages
COPY languages .
RUN rm -r **/page2.json
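Note that the page2.json files removed in Option 1 still exist in the earlier COPY layer, so the final image does not actually shrink. If image size matters, a hedged variant (not part of the original options) does the copy-and-cleanup in a throwaway stage and copies the cleaned tree from it:
FROM alpine AS full
WORKDIR /languages
COPY languages .
RUN rm -r */page2.json
FROM alpine
WORKDIR /languages
# only the cleaned tree ends up in a layer of the final image
COPY --from=full /languages .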
Option 2: Add files you don't want into your .dockerignore
# .dockerignore
languages/**/page*.json
!languages/**/page1.json
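With that .dockerignore in place, the Dockerfile itself can simply copy the whole tree, because the page2.json files never reach the build context. A minimal sketch, assuming the same alpine base as Option 1:
FROM alpine
WORKDIR /languages
# only page1.json files are in the build context, so only they are copied
COPY languages .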
Option 3: Copy everything to a temporary directory, then copy what you need from inside the container using more flexible tools
FROM alpine
WORKDIR /languages
COPY languages /tmp/langs
# busybox cp has no --parents flag, so install GNU coreutils first
RUN apk add --no-cache coreutils \
 && cd /tmp/langs \
 && find . -name 'page1.json' -exec cp --parents {} /languages \;
CMD ls -lR /languages

Related

How to use cp command in dockerfile

I want to decrease the number of layers used in my Dockerfile, so I want to combine the COPY commands into a single RUN cp.
dependencies
folder1
file1
file2
Dockerfile
The following commands work; I want to combine them into a single RUN cp command:
COPY ./dependencies/file1 /root/.m2
COPY ./dependencies/file2 /root/.sbt/
COPY ./dependencies/folder1 /root/.ivy2/cache
The following command fails with a "No such file or directory" error. Where could I be going wrong?
RUN cp ./dependencies/file1 /root/.m2 && \
cp ./dependencies/file2 /root/.sbt/ && \
cp ./dependencies/folder1 /root/.ivy2/cache
You can't do that.
COPY copies from the host to the image.
RUN cp copies from a location in the image to another location in the image.
To get it all into a single COPY statement, you can create the file structure you want on the host and then use tar to make it a single file. Then when you COPY or ADD that tar file, Docker will unpack it and put the files in the correct place. But with the current structure your files have on the host, it's not possible to do in a single COPY command.
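As a rough sketch of that tar approach, assuming the same file names as in the question (the staging directory name is arbitrary), you lay out the final structure on the host and pack it:
mkdir -p staging/root/.m2 staging/root/.sbt staging/root/.ivy2/cache
cp dependencies/file1 staging/root/.m2/
cp dependencies/file2 staging/root/.sbt/
cp -r dependencies/folder1/. staging/root/.ivy2/cache/
tar -C staging -cf deps.tar .
Then a single instruction in the Dockerfile unpacks it into place:
ADD deps.tar /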
Problem
The COPY is used to copy files from your host to your container. So, when you run
COPY ./dependencies/file1 /root/.m2
COPY ./dependencies/file2 /root/.sbt/
COPY ./dependencies/folder1 /root/.ivy2/cache
Docker will look for file1, file2, and folder1 on your host.
However, when you do it with RUN, the commands are executed inside the container, and ./dependencies/file1 (and so on) does not exist in your container yet, which leads to the "no such file or directory" error.
In short, COPY and RUN are not interchangeable.
How to fix
If you don't want to use multiple COPY commands, you can use one COPY to copy all files from your host to your container, then use the RUN command to move them to the proper location.
To avoid copying unnecessary files, use .dockerignore. For example:
.dockerignore
./dependencies/no-need-file
./dependencies/no-need-directory/
Dockerfile
COPY ./dependencies/ /root/dependencies/
RUN mv /root/dependencies/file1 /root/.m2 && \
mv /root/dependencies/file2 /root/.sbt/ && \
mv /root/dependencies/folder1 /root/.ivy2/cache
You are missing the final slash in /root/.ivy2/cache/.

.NET package restore in Docker cached separately from build

How does one build a Docker image of a .NET 5/C# app so that the restored NuGet packages are cached properly? By proper caching I mean that when sources (but not project files) are changed, the layer containing restored packages is still taken from cache during docker build.
It is a best practice in Docker to perform package restore before adding the full sources and building the app itself as it makes it possible to cache the restore separately, which significantly speeds up the builds. I know that not only the packages directory, but also the bin and obj directories of individual projects have to be preserved from dotnet restore to dotnet publish --no-restore so that everything works together. I also know that once the cache is busted, all following layers are built anew.
My issue is that I cannot come up with a way to COPY just the *.csproj. If I copy more than just the *.csproj, source changes bust the cache. I could copy them into one place outside the docker build and simply COPY them inside the build, but I want to be able to build the image even outside the pipeline, manually, with a reasonably simple command. (Is it an unreasonable requirement?)
For the web app that consists of multiple projects in a pretty standard folder structure src/*/*.csproj, I came up with this attempt that tries to compensate for too many files being copied into the image (which still busts the cache):
FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build-env
WORKDIR /src
COPY NuGet.Config NuGet.Config
COPY src/ src/
RUN find . -name NuGet.Config -prune -o \! -type d \! -name \*.csproj -exec rm -f '{}' + \
&& find . -depth -type d -empty -exec rmdir '{}' \;
RUN dotnet restore src/Company.Product.Component.App/Company.Product.Component.App.csproj
COPY src/ src/
RUN dotnet publish src/Company.Product.Component.App/Company.Product.Component.App.csproj -c Release --no-restore -o /out
FROM mcr.microsoft.com/dotnet/aspnet:5.0 AS run-env
WORKDIR /app
COPY --from=build-env /out .
ENTRYPOINT ["dotnet", "Company.Product.Component.App.dll"]
I also tried splitting the build-env stage into two just after the restore, copying the /root/.nuget/packages and /src to the build stage, but that did not help either.
The first RUN line and the one immediately before should be replaced with something that copies just the *.csproj files, but I don't know what that is. The obvious laborious solution is to have a separate COPY line for each *.csproj, but that does not feel right, as projects tend to be added and removed, which makes the Dockerfile hard to maintain. I have tried COPY src/*/*.csproj src/ and then fixing the flattened paths, which is a trick that I googled, but it did not work for me, as my Docker processes the wildcards only in file names and interprets directory names literally, emitting an error for the nonexistent src/* directory. I am using Docker Desktop 3.5.2 (66501), which uses the BuildKit backend to build the images, but I am open to changing the tooling if it helps.
This leaves me clueless about how to satisfy the relatively simple set of my requirements. My options seem exhausted. Have I missed something? Do I have to accept a tradeoff and drop some of my requirements?
The lack of support for wildcards in directory names is likely a missing feature in BuildKit. The issue has already been reported at moby/buildkit GitHub as #1900.
Until the issue is fixed, disable BuildKit if you don't need any of its features (both options are sketched below). Either
set the environment variable DOCKER_BUILDKIT to zero (0), or
edit the Docker daemon config so that the "buildkit" feature is set to false and restart the daemon.
In Docker Desktop, the config is easily accessible in Settings > Docker Engine. This method of turning off the feature is recommended by the Docker Desktop 3.2.0 release notes where BuildKit was first enabled by default.
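Concretely, either of these does it; the "features" key is where the daemon config expects the setting, and the daemon needs a restart after changing it:
# one-off, for a single build
DOCKER_BUILDKIT=0 docker build .
# or persistently, in the daemon config (Settings > Docker Engine in Docker Desktop):
# {
#   "features": { "buildkit": false }
# }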
Once BuildKit is disabled, replace
COPY src/ src/
RUN find . -name NuGet.Config -prune -o \! -type d \! -name \*.csproj -exec rm -f '{}' + \
&& find -depth -type d -empty -exec rmdir '{}' \;
with
COPY src/*/*.csproj src/
RUN for from in src/*.csproj; do to=$(echo "$from" | sed 's/\/\([^/]*\)\.csproj$/\/\1&/') \
&& mkdir -p "$(dirname "$to")" && mv "$from" "$to"; done
The COPY will succeed without busting the cache and the RUN will fix the paths. It relies on the fact that the projects live in the src directory, each in a directory with the same name as the project file; for example, the flattened src/Company.Product.Component.App.csproj ends up back at src/Company.Product.Component.App/Company.Product.Component.App.csproj.
This is basically the solution at the bottom of VonC's answer to a related question. The answer also mentions Moby issue #15858, which has an interesting discussion on the topic.
There is a dotnet tool for the paths fixup, too, but I have not tested it.
An alternate solution that does not require disabling BuildKit is to split the original stage into two right after cleaning up the copied files, i.e. just before the restore (not after!).
FROM mcr.microsoft.com/dotnet/sdk:5.0 AS projects-env
WORKDIR /src
COPY NuGet.Config NuGet.Config
COPY src/ src/
RUN find . -name NuGet.Config -prune -o \! -type d \! -name \*.csproj -exec rm -f '{}' + \
&& find . -depth -type d -empty -exec rmdir '{}' \;
FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build-env
WORKDIR /src
COPY --from=projects-env /src /src
RUN dotnet restore src/Company.Product.Component.App/Company.Product.Component.App.csproj
COPY src/ src/
RUN dotnet publish src/Company.Product.Component.App/Company.Product.Component.App.csproj -c Release --no-restore -o /out
FROM mcr.microsoft.com/dotnet/aspnet:5.0 AS run-env
WORKDIR /app
COPY --from=build-env /out .
ENTRYPOINT ["dotnet", "Company.Product.Component.App.dll"]
The COPY src/ src/ layer in the projects-env stage is invalidated by source changes, but cache invalidation works separately for each stage. Because the files copied over to the build-env are the same across builds, the COPY --from=projects-env layer is not invalidated, so the RUN dotnet restore layer is taken from the cache, too.
I suspect there are other solutions using the BuildKit mounts (RUN --mount=...), but I haven't tested any.
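For completeness, a minimal sketch of that cache-mount idea, untested and assuming the same project path as above; restore and publish happen in one RUN so both see the mounted NuGet cache:
# syntax=docker/dockerfile:1
FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build-env
WORKDIR /src
COPY . .
# the package folder lives in a BuildKit cache, not in an image layer,
# so a busted layer cache no longer forces a full re-download
RUN --mount=type=cache,target=/root/.nuget/packages \
    dotnet publish src/Company.Product.Component.App/Company.Product.Component.App.csproj -c Release -o /out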
Here's an alternative way to solve the problem.
First, copy the .sln and .csproj files (the exact COPY lines depend on the solution folder structure):
COPY *.sln ./
COPY **/*.csproj ./
COPY **/**/**/*.csproj ./
After that run the following script:
RUN dotnet sln list | grep ".csproj" \
| while read -r line; do \
mkdir -p $(dirname $line); \
mv $(basename $line) $(dirname $line); \
done;
The script simply moves the .csproj files back to the same locations they occupy in the host filesystem.
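For illustration, assuming a hypothetical project entry src/Service/Service.csproj in the solution, the loop effectively runs:
mkdir -p src/Service
mv Service.csproj src/Service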

Dockerfile - only copy files which match an extension whilst maintaining folder structure

Suppose I have a very nested folder structure with lots of project files:
src
  projectA
    projectA.csproj
    someFile.txt
  projectB
    projectB.csproj
    someFile.txt
  projectC
    projectC.csproj
    someFile.txt
In this case I want my Dockerfile to copy over the full folder structure, but only include .csproj files:
src
  projectA
    projectA.csproj
  projectB
    projectB.csproj
  projectC
    projectC.csproj
I can do this for each file line by line, but is there a cleaner way?
COPY src/projectA/projectA.csproj src/projectA/projectA.csproj
COPY src/projectB/projectB.csproj src/projectB/projectB.csproj
COPY src/projectC/projectC.csproj src/projectC/projectC.csproj
I've faced a similar situation and the only solution I found was to prepare a .tgz file containing what I needed and copy it into the Docker image using the ADD directive.
e.g.
this is a run.sh script similar to what I used:
#!/bin/bash
tar cvfz csproj.tgz $( find src -name "*.csproj" )
docker build -t test .
docker run -it --rm test
this is a test Dockerfile:
FROM alpine
RUN mkdir /src
ADD csproj.tgz /src
CMD ls -alR /src
This solution is not very pleasant, but it did do what I needed at the time.
The ADD directive (see https://docs.docker.com/engine/reference/builder/#add) is able to copy files (like the COPY directive) and, per the documentation:
If <src> is a local tar archive in a recognized compression format (identity, gzip, bzip2 or xz) then it is unpacked as a directory. Resources from remote URLs are not decompressed.
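Another option in the same spirit is a multi-stage prefilter: copy everything into a throwaway stage, filter there with ordinary shell tools, and COPY --from only the result. A sketch assuming the src layout above (GNU coreutils is installed because busybox cp has no --parents flag):
FROM alpine AS prep
COPY src /tmp/src
RUN apk add --no-cache coreutils \
 && mkdir -p /filtered \
 && cd /tmp/src \
 && find . -name '*.csproj' -exec cp --parents {} /filtered \;
FROM alpine
# only the filtered tree (.csproj files with their folder structure) is copied
COPY --from=prep /filtered /src
CMD ls -alR /src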

Where should I place my bash setup files when using docker?

My Dockerfile is below.
Currently I copy the dotfiles (which are referenced within the .bashrc) to /root.
Is there a better way to organize them?
FROM alpine:latest
LABEL maintainer="Michael Durrant<junk#snap2web.com>"
RUN apk add bash git vim
COPY alpine_bashrc /root/.bashrc
COPY .bash_functions.sh /root
COPY .bash_aliases /root
COPY .git-completion.bash /root
RUN "/bin/bash"
Instead of having one COPY directive per file, it might be advisable to copy a whole directory instead. The limitation is that the files must already be named as they should appear in the container.
$ ls .
Dockerfile
dotfiles/
  .bashrc
  .git-completion.bash
  .bash_functions.sh
  .bash_aliases
Dockerfile
...
COPY dotfiles/ /root/
Each of those COPY directives creates a new layer in your image. Save space and time by having one directive.
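Applied to the Dockerfile from the question, a hedged sketch of the consolidated version:
FROM alpine:latest
LABEL maintainer="Michael Durrant<junk#snap2web.com>"
RUN apk add bash git vim
# one COPY layer instead of four; the files in dotfiles/ already carry their
# final names (.bashrc instead of alpine_bashrc, and so on)
COPY dotfiles/ /root/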

Is there a more elegant way to copy specific files using Docker COPY to the working directory?

Attempting to create a container with microsoft/dotnet:2.1-aspnetcore-runtime. The .NET Core solution has multiple projects nested underneath the solution file, each with its own .csproj file. I am attempting to create a more elegant COPY instruction for the sub-projects.
The sample available here https://github.com/dotnet/dotnet-docker/tree/master/samples/aspnetapp has a solution file with only one .csproj, so it creates the Dockerfile like this:
COPY *.sln .
COPY aspnetapp/*.csproj ./aspnetapp/
RUN dotnet restore
It works this way
COPY my_solution_folder/*.sln .
COPY my_solution_folder/project/*.csproj my_solution_folder/
COPY my_solution_folder/subproject_one/*.csproj subproject_one/
COPY my_solution_folder/subproject_two/*.csproj subproject_two/
COPY my_solution_folder/subproject_three/*.csproj subproject_three/
for a solution folder structure of:
my_solution_folder\my_solution.sln
my_solution_folder\project\my_solution.csproj
my_solution_folder\subproject_one\subproject_one.csproj
my_solution_folder\subproject_two\subproject_two.csproj
my_solution_folder\subproject_three\subproject_three.csproj
but this doesn't (was a random guess)
COPY my_solution_folder/*/*.csproj working_dir_folder/*/
Is there a more elegant solution?
2021: with BuildKit, see ".NET package restore in Docker cached separately from build" from Palec.
2018: Considering that wildcards are not well supported by COPY (moby issue 15858), you can:
either experiment with adding .dockerignore entries for the folders you don't want to copy (while excluding the ones you do want): it is cumbersome
or, as shown here, make a tar of all the folders you want
Here is an example, to be adapted in your case:
find .. -name '*.csproj' -o -name 'Finomial.InternalServicesCore.sln' -o -name 'nuget.config' \
| sort | tar cf dotnet-restore.tar -T - 2> /dev/null
With a Dockerfile including:
ADD docker/dotnet-restore.tar ./
The idea is: the archive gets automatically expanded with ADD.
The OP sturmstrike mentions in the comments "Optimising ASP.NET Core apps in Docker - avoiding manually copying csproj files (Part 2)" from Andrew Lock "Sock"
The alternative solution actually uses the wildcard technique I previously dismissed, but with some assumptions about your project structure, a two-stage approach, and a bit of clever bash-work to work around the wildcard limitations.
We take the flat list of csproj files, and move them back to their correct location, nested inside sub-folders of src.
# Copy the main source project files
COPY src/*/*.csproj ./
RUN for file in $(ls *.csproj); do mkdir -p src/${file%.*}/ && mv $file src/${file%.*}/; done
L01nl suggests in the comments an alternative approach that doesn't require compression: "Optimising ASP.NET Core apps in Docker - avoiding manually copying csproj files", from Andrew Lock "Sock".
FROM microsoft/aspnetcore-build:2.0.6-2.1.101 AS builder
WORKDIR /sln
COPY ./*.sln ./NuGet.config ./
# Copy the main source project files
COPY src/*/*.csproj ./
RUN for file in $(ls *.csproj); do mkdir -p src/${file%.*}/ && mv $file src/${file%.*}/; done
# Copy the test project files
COPY test/*/*.csproj ./
RUN for file in $(ls *.csproj); do mkdir -p test/${file%.*}/ && mv $file test/${file%.*}/; done
RUN dotnet restore
# Remainder of build process
This solution is much cleaner than my previous tar-based effort, as it doesn't require any external scripting, just standard docker COPY and RUN commands.
It gets around the wildcard issue by copying across csproj files in the src directory first, moving them to their correct location, and then copying across the test project files.
One other option to consider is using a multi-stage build to prefilter / prep the desired files. This is mentioned on the same moby issue 15858.
For those building on .NET Framework, you can take it a step further and leverage robocopy.
For example:
FROM mcr.microsoft.com/dotnet/framework/sdk:4.8 AS prep
# Gather only artifacts necessary for NuGet restore, retaining directory structure
COPY / /temp/
RUN Invoke-Expression 'robocopy C:/temp C:/nuget /s /ndl /njh /njs *.sln nuget.config *.csproj packages.config'
[...]
# New build stage, independent cache
FROM mcr.microsoft.com/dotnet/framework/sdk:4.8 AS build
# Copy prepped NuGet artifacts, and restore as distinct layer
COPY --from=prep ./nuget ./
RUN nuget restore
# Copy everything else, build, etc
COPY src/ ./src/
RUN msbuild
[...]
The big advantage here is that there are no assumptions made about the structure of your solution. The robocopy '/s' flag will preserve any directory structure for you.
Note the '/ndl /njh /njs' flags are there just to cut down on log noise.
In addition to VonC's answer (which is correct), I am building from a Windows 10 OS and targeting Linux containers. The equivalent to the above answer using Windows and 7z (which I normally have installed anyway) is:
7z a -r -ttar my_project_files.tar .\*.csproj .\*.sln .\*nuget.config
followed by the ADD in the Dockerfile to decompress.
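For example, the Dockerfile side can be as small as:
ADD my_project_files.tar ./
RUN dotnet restore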
Be aware that after installing 7-zip, you will need to add the installation folder to your environment path to call it in the above fashion.
Looking at moby issue 15858, you will see a Bash script being executed to generate the tar file, followed by a Dockerfile using ADD to extract it. You can fully automate this either with a batch file or with PowerShell, as shown in "Pass PowerShell variables to Docker commands".
Another solution, maybe a bit slower, but all in one place: everything in one file and one command, docker build .
I've split my Dockerfile into two stages:
The first stage tars the *.csproj files.
The second stage uses the tar and sets up the project.
code:
FROM ubuntu:18.04 as tar_files
WORKDIR /tar
COPY . .
RUN find . -name "*.csproj" -print0 | tar -cvf projectfiles.tar --null -T -
FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS build
WORKDIR /source
# copy sln
COPY *.sln .
# Copy all the csproj files from previous image
COPY --from=tar_files /tar/projectfiles.tar .
RUN tar -xvf projectfiles.tar
RUN rm projectfiles.tar
RUN dotnet restore
# Remainder of build process
I use this script
COPY SolutionName.sln SolutionName.sln
COPY src/*/*.csproj ./
COPY tests/*/*.csproj ./
RUN cat SolutionName.sln \
| grep "\.csproj" \
| awk '{print $4}' \
| sed -e 's/[",]//g' \
| sed 's#\\#/#g' \
| xargs -I {} sh -c 'mkdir -p $(dirname {}) && mv $(basename {}) $(dirname {})/'
RUN dotnet restore "/src/Service/Service.csproj"
COPY ./src ./src
COPY ./tests ./tests
RUN dotnet build "/src/Service/Service.csproj" -c Release -o /app/build
Copy solution file
Copy project files
(optional) Copy test project files
Make linux magic (scan sln-file for projects and restore directory structure)
Restore packages for service project
Copy sources
(optional) Copy test sources
Build service project
This works for all Linux containers.
