Dockerfile COPY and keep folder structure

Dockerfile COPY and keep folder structure - docker

I'm trying to create a Dockerfile that copies all package.json files into the image but keeps the folder structure.
This what I have now:
FROM node:15.9.0-alpine as base
WORKDIR /app/
COPY ./**/package.json ./
CMD ls -laR /app
Running with: sudo docker run --rm -it $(sudo docker build -q .)
But it only copies 1 package.json and puts it in the base dir (/app)
Here is the directory I'm testings on:
├── Dockerfile
├── t1
│   └── package.json
└── t2
└── ttt
├── b.txt
└── package.json
And i would like it to look like this inside the container:
├── Dockerfile
├── t1
│   └── package.json
└── t2
└── ttt
└── package.json

The Dockerfile COPY directive is documented as using the Go filepath.Match function for glob expansion. That only supports the basic glob characters *, ?, [a-z], but not extensions like ** that some shells support.
Since COPY only takes a filename glob as input and it likes to flatten the file structure, I don't think there's a way to do the sort of selective copy you're describing in a single command.
Instead you need to list out the individual files you want to copy. COPY will create directories as needed, but that means you need to repeat paths on both sides of COPY.
COPY t1/package*.json t1/
COPY t2/ttt/package*.json t2/ttt/
I can imagine some hacky approaches using multi-stage builds; have an initial stage that copies in the entire source tree but then deletes all of the files except package*.json, then copies that into the actual build stage. I'd contemplate splitting my repository into smaller modules with separate Dockerfiles per module first.

Related

COPY files in current local directory only and do not include subdirectories in Docker?

Given a Git repository as Docker context:
my_project_dir
├── Dockerfile
├── run_myapp.py
├── requirements.txt
├── dir1
│ └── ... some files
└── dir2
└── ... some files
I want to use COPY to move the run_myapp.py and requirements.txt but not the two directories. I want to keep my Docker image light so I don't want to include the directories; those are used for other services.
Currently I have COPY used as follows:
...
COPY run_myapp.py run_myapp.py
COPY requirements.txt requirements.txt
...
I don't want to use COPY . . since this will copy everything.
I there a way to specify all docs but not directories?
What I've tried
I read Copy current directory in to docker image and some similar questions outside of StackOverflow like https://www.geeksforgeeks.org/docker-copy-instruction/ but none answer my question.

The Dockerfile COPY syntax supports shell globs but doesn't support any sort of matching on file type. You can copy all things with a given name *.py but not "only files". For the actual glob syntax it delegates to the Go path/filepath module which supports only the basic *, ?, and [a-z] characters as "special".
You aren't limited to a single file in a COPY command, though, as #JoachimSauer notes in a comment, and you don't have to spell out the destination directory or filename on the right-hand side. A relative path like . is relative to the current WORKDIR. So here I might write
WORKDIR /app
COPY run_myapp.py requirements.txt .

Using .dockerignore file with the following content should do the trick:
*/*

How do I exclude everything except .go files in .dockerignore?

So, I have a dummy project, which file structure looks like this:
docker-magic
├── Dockerfile
├── .dockerignore
├── just_a_file
├── src
│   ├── folder_one
│   │   ├── main.go
│   │   └── what_again.sh
│   ├── folder_two
│   │   └── what.kts
│   └── level_one.go
└── top_level.go
My Dockerfile looks like this:
FROM ubuntu:latest as builder
WORKDIR /workdir
COPY ./ ./
RUN find . -type f
ENTRYPOINT ["echo", "wow"]
I build this image with a docker build . -t docker-magic:test --no-cache command to avoid caching results.
The idea is simple - I copy all of the files from docker-magic folder into my image and then list all of them via find . -type f.
Also I want to ignore some of the files. I do that with a .dockerignore file. According to the official Docker docs:
The placement of ! exception rules influences the behavior: the last line of the .dockerignore that matches a particular file determines whether it is included or excluded.
Let's consider following contents of .dockerignore:
**/*.go
It should exclude all the .go files. And it does! I get following contents from find:
./.dockerignore
./src/folder_one/what_again.sh
./src/folder_two/what.kts
./just_a_file
./Dockerfile
Next, let's ignore everything. .dockerignore is now:
**/*
And, as expected, I get empty output from find.
Now it gets difficult. I want to ignore all the files except .go files. According to the docs, it would be the following:
**/*
!**/*.go
But I get the following output from find:
./top_level.go
Which is obviously not what is expected, because other .go files, as we have seen, also match this pattern. How do I get the result I wanted - copying only .go files to my image?
EDIT: my Docker version is 20.10.5, build 55c4c88.

Docker COPY lost intermediate path

Dockerfile and build context file tree, the core-site.xml file's relative path IS conf/etc/hadoop
.
├── Dockerfile
└── conf
├── etc
│   └── hadoop
│   └── core-site.xml
└── xxx.conf
Dockerfile(very simple) as below
FROM alpine:latest
RUN mkdir -p /data
WORKDIR /data
COPY conf/* /data/
After docker build -t any/any:any ., the COPY layer's file list as below. Intermediate path etc/ of file core-site.xml lost. Where the etc/ gone ?
data/
data/xxx.conf
data/hadoop/
data/hadoop/.wh..wh..opq
data/hadoop/core-site.xml

Remove the * so you just have COPY conf /data
The COPY command treats the copying of files and directories a bit differently. Your COPY statement expands to
COPY conf/etc /data/
COPY conf/xxx.conf /data/
The first statement actually copies the contents of conf/etc into the /data directory on the container.

Can you have a non top-level Dockerfile when invoking COPY?

Have a Dockerfile to build releases for an Elixir/Phoenix application...The tree directory structure is as follows, where the Dockerfile (which has a dependency on this other Dockerfile) is in the "infra" subfolder and needs access to all the files one level above "infra".
.
├── README.md
├── assets
│   ├── css
│   ├── js
│   ├── node_modules
│   ├── package-lock.json
│   ├── package.json
├── lib
├── infra
│   ├── Dockerfile
│   ├── config.yaml
│   ├── deployment.yaml
The Dockerfile looks like:
# https://github.com/bitwalker/alpine-elixir
FROM bitwalker/alpine-elixir:latest
# Set exposed ports
EXPOSE 4000
ENV PORT=4000
ENV MIX_ENV=prod
ENV APP_HOME /app
ENV APP_VERSION=0.0.1
COPY ./ ${HOME}
WORKDIR ${HOME}
RUN mix deps.get
RUN mix compile
RUN MIX_ENV=${MIX_ENV} mix distillery.release
RUN echo $HOME
COPY ${HOME}/_build/${MIX_ENV}/rel/my_app/releases/${APP_VERSION}/my_app.tar.gz .
RUN tar -xzvf my_app.tar.gz
USER default
CMD ./bin/my_app foreground
The command "mix distillery.release" is what builds the my_app.tar.gz file in the path indicated by the COPY command.
I invoke the docker build as follows in the top-level directory (the parent directory of "infra"):
docker build -t my_app:local -f infra/Dockerfile .
I basically then get an error with COPY:
Step 13/16 : COPY ${HOME}/_build/${MIX_ENV}/rel/my_app/releases/${APP_VERSION}/my_app.tar.gz .
COPY failed: stat /var/lib/docker/tmp/docker-builder246562111/opt/app/_build/prod/rel/my_app/releases/0.0.1/my_app.tar.gz: no such file or directory
I understand that the COPY command depends on the "build context" but I thought that by issuing the "docker build" in the parent directory of infra meant I had the appropriate context set for the COPY, but clearly that doesn't seem to be the case. Is there a way to have a Dockerfile one level below the parent directory that contains all the files needed to build an Elixir/Phoenix "release" (the my_app.tar.gz and associated files created via the command mix distillery.release)? What bits am I missing?

Docker follow symlink outside context

Yet another Docker symlink question. I have a bunch of files that I want to copy over to all my Docker builds. My dir structure is:
parent_dir
- common_files
- file.txt
- dir1
- Dockerfile
- symlink -> ../common_files
In above example, I want file.txt to be copied over when I docker build inside dir1. But I don't want to maintain multiple copies of file.txt.
Per this link, as of docker version 0.10, docker build must
Follow symlinks inside container's root for ADD build instructions.
But I get no such file or directory when I build with either of these lines in my Dockerfile:
ADD symlink /path/dirname or
ADD symlink/file.txt /path/file.txt
mount option will NOT solve it for me (cross platform...).
I tried tar -czh . | docker build -t without success.
Is there a way to make Docker follow the symlink and copy the common_files/file.txt into the built container?

That is not possible and will not be implemented. Please have a look at the discussion on github issue #1676:
We do not allow this because it's not repeatable. A symlink on your machine is the not the same as my machine and the same Dockerfile would produce two different results. Also having symlinks to /etc/paasswd would cause issues because it would link the host files and not your local files.

If anyone still has this issue I found a very nice solution on superuser.com:
https://superuser.com/questions/842642/how-to-make-a-symlinked-folder-appear-as-a-normal-folder
It basically suggests using tar to dereference the symlinks and feed the result into docker build:
$ tar -czh . | docker build -

One possibility is to run the build in the parent directory, with:
$ docker build [tags...] -f dir1/Dockerfile .
(Or equivalently, in child directory,)
$ docker build [tags...] -f Dockerfile ..
The Dockerfile will have to be configured to do copy/add with appropriate paths. Depending on your setup, you might want a .dockerignore in the parent to leave out
things you don't want to be put into the context.

I know that it breaks portability of docker build, but you can use hard links instead of symbolic:
ln /some/file ./hardlink

I just had to solve this issue in the same context. My solution is to use hierarchical Docker builds. In other words:
parent_dir
- common_files
- Dockerfile
- file.txt
- dir1
- Dockerfile (FROM common_files:latest)
The disadvantage is that you have to remember to build common_files before dir1. The advantage is that if you have a number of dependant images then they are all a bit smaller due to using a common layer.

I got frustrated enough that I made a small NodeJS utility to help with this: file-syncer
Given the existing directory structure:
parent_dir
- common_files
- file.txt
- my-app
- Dockerfile
- common_files -> symlink to ../common_files
Basic usage:
cd parent_dir
// starts live-sync of files under "common_files" to "my-app/HardLinked/common_files"
npx file-syncer --from common_files --to my-app/HardLinked
Then in your Dockerfile:
[regular commands here...]
# have docker copy/overlay the HardLinked folder's contents (common_files) into my-app itself
COPY HardLinked /
Q/A
How is this better than just copying parent_dir/common_files to parent_dir/my-app/common_files before Docker runs?
That would mean giving up the regular symlink, which would be a loss, since symlinks are helpful and work fine with most tools. For example, it would mean you can't see/edit the source files of common_files from the in-my-app copy, which has some drawbacks. (see below)
How is this better than copying parent_dir/common-files to parent_dir/my-app/common_files_Copy before Docker runs, then having Docker copy that over to parent_dir/my-app/common_files at build time?
There are two advantages:
file-syncer does not "copy" the files in the regular sense. Rather, it creates hard links from the source folder's files. This means that if you edit the files under parent_dir/my-app/HardLinked/common_files, the files under parent_dir/common_files are instantly updated, and vice-versa, because they reference the same file/inode. (this can be helpful for debugging purposes and cross-project editing [especially if the folders you are syncing are symlinked node-modules that you're actively editing], and ensures that your version of the files is always in-sync/identical-to the source files)
Because file-syncer only updates the hard-link files for the exact files that get changed, file-watcher tools like Tilt or Skaffold detect changes for the minimal set of files, which can mean faster live-update-push times than you'd get with a basic "copy whole folder on file change" tool would.
How is this better than a regular file-sync tool like Syncthing?
Some of those tools may be usable, but most have issues of one kind or another. The most common one is that the tool either cannot produce hard-links of existing files, or it's unable to "push an update" for a file that is already hard-linked (since hard-linked files do not notify file-watchers of their changes automatically, if the edited-at and watched-at paths differ). Another is that many of these sync tools are not designed for instant responding, and/or do not have run flags that make them easy to use in restricted build tools. (eg. for Tilt, the --async flag of file-syncer enables it to be used in a local(...) invokation in the project's Tiltfile)

One tool that allows to "link" a directory in a way that is accepted by docker, is docker itself.
It is possible to run a temporary docker container, with all necessary files/directories mounted in adequate paths, and build image from within such container. For example:
docker run -it \
--rm \
-v /var/run/docker.sock:/var/run/docker.sock \
--mount "type=bind,source=$ImageRoot/Dockerfile,destination=/Image/Dockerfile,readonly" \
--mount "type=bind,source=$ImageRoot/.dockerignore,destination=/Image/.dockerignore,readonly" \
--mount "type=bind,source=$ReposRoot/project1,destination=/Image/project1,readonly" \
--mount "type=bind,source=$ReposRoot/project2,destination=/Image/project2,readonly" \
--env DOCKER_BUILDKIT=1 \
docker:latest \
docker build "/Image" --tag "my_tag"
In above example I assume variables $ImageRoot and $ReposRoot are set.

instead of using simlinks it is possible to solve problem administratively by just moving files from sites_available to sites_enabled instead of copying or making simlinks
so your site config will be in one copy only in site_available folder if it stopped or something or in sites_enabled if it should be used

Use a small wrapper script to copy the needed dir to the Dockerfile's location;
build.sh;
.
#!/bin/bash
[ -e bin ] && rm -rf bin
cp -r ../../bin .
docker build -t "sometag" .

Commonly I isolate build instructions to subfolder, so application and logic levels are higher located:
.
├── app
│   ├── package.json
│   ├── modules
│   └── logic
├── deploy
│   ├── back
│   │   ├── nginx
│   │   │   └── Chart.yaml
│   │   ├── Containerfile
│   │   ├── skaffold.yaml
│   │   └── .applift -> ../../app
│   ├── front
│   │   ├── Containerfile
│   │   ├── skaffold.yaml
│   │   └── .applift -> ../../app
│   └── skaffold.yaml
└── .......
I utilize name ".applift" for those symbolic links
.applift -> ../../app
And now follow symlink via realpath without care about path depth
dir/deploy/front$ docker build -f Containerfile --tag="build" `realpath .applift`
or pack in func
dir/deploy$ docker_build () { docker build -f "$1"/Containerfile --tag="$2" `realpath "$1/.applift"`; }
dir/deploy$ docker_build ./back "front_builder"
so
COPY app/logic/ ./app/
in Containerfile will work
Yes, in this case you will loose context for other layers. But generally there is no any other context files located in build-directory

I had a situation where the parent_dir contained common libraries in common_files/ and a common docker/Dockerfile. dir1/ contained the contents of a different code repository but I wanted that repository to have access to those parent code repository folders. I solved it without using symlinks as follows:
parent_dir
- common_files
- file.txt
- docker
- Dockerfile
- dir1
- docker-compose.yml --> ../common_files
--> ../docker/Dockerfile
So I created a docker-compose.yml file, where I specified where the files were located relative to the docker-compose.yml where it would be executed from. I also tried to minimise changes to the Dockerfile since it would be used by both repositories, so I provided DIR argument to specify a subdirectory to run:
version: "3.8"
services:
dev:
container_name: template
build:
context: "../"
dockerfile: ./docker/Dockerfile
args:
- DIR=${DIR}
volumes:
- ./dir1:/app
- ./common_files:/common_files
I ran the following from within the dir1/ folder and it ran successfully:
export DIR=./dir1 && docker compose -f docker-compose.yml build
This is the original Dockerfile:
...
WORKDIR /app
COPY . /app
RUN my_executable
...
And this is a snippet with changes I made to the Dockerfile:
...
ARG DIR=${DIR}
WORKDIR /app
COPY . /app
RUN cd ${DIR} && my_executable && cd /app
...
This worked and the parent repository could still run the Dockerfile with the same outcome even though I had introduced the DIR argument since if the parent repository called it DIR would be an empty string and it would behave like it did before.

If you're on mac, rembember to do
brew install gnu-tar
and use gtar instead of tar. Seems there are some differences between the two.
gtar worked for me at least.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart