Best practice to cache downloaded resources between builds - docker

I am building a web application I'd like to deploy as a Docker container. The application depends on a set of assets stored in a separate Git repository. The reason for using a separate repository is that the history of that repository is much larger than the current checkout and we'd like to have a way to throw away that history without touching the history of the repository containing the source code.
In the example below, containing only the relevant parts, I'm passing the assets repository commit ID into the build process using a file:
FROM something:something
# [install Git and stuff]
COPY ["assets_git_id", "/root/"]
RUN git clone --bare git://lala/assets.git /root/assets.git \
&& mkdir -p /srv/app/assets \
&& git --git-dir=/root/assets.git --work-tree=/srv/app/assets checkout $(</root/assets_git_id) . \
&& rm -r /root/assets.git
# [set up the rest of the application]
The problem here is that whenever that ID changes, the whole repository is cloned during the build process and most of the data is thrown away.
What is the canonical way to reduce the wasted resources in such a case? Ideally I'd like to have access to a directory from inside the container during the build whose contents are kept between multiple runs of the same build. The RUN script could then just update the repository and copy the relevant data from it instead of cloning the whole repository each time.

Use multi-stage builds
# Stage 1
FROM something:something as GitSource
# [install Git]
RUN git clone --bare git://lala/assets.git /root/assets.git
COPY ["assets_git_id", "/root/"]
# a bare repository has no work tree, so fetch (rather than pull) to update it
RUN git --git-dir=/root/assets.git fetch origin '+refs/heads/*:refs/heads/*'
RUN mkdir -p /srv/app/assets
RUN git --git-dir=/root/assets.git --work-tree=/srv/app/assets checkout $(</root/assets_git_id) .
# Stage 2
FROM something:something
COPY --from=GitSource /srv/app/assets /srv/app/assets
# [set up the rest of the application]
The final image discards everything done in Stage 1 except what is copied into Stage 2.
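A minimal sketch of how the caching plays out (the image tag and the path to the assets checkout are illustrative): because the clone in Stage 1 happens before assets_git_id is copied in, the expensive clone layer stays cached, and only the fetch/checkout layers rerun when the ID changes.
# record the desired assets commit in the build context, then build
git -C ../assets rev-parse HEAD > assets_git_id
docker build -t org_name/app .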

Related

Build docker-image from remote repository (github-actions, gitlab-ci) with env and secrets from another remote repo?

I have a (private) repository on GitHub with my project and an integrated GitHub Actions workflow which builds a Docker image and pushes it directly to GHCR.
But I have a problem with storing and passing secrets to my build image. I have the following structure in my project:
.git (gitignored)
.env (gitignored)
config (gitignored) / config files (jsons)
src (git) / other folders and files
As you can see, I have a .env file and a config folder. Both of them store data or files which are not in the repo but are required in the built environment.
So I'd like to ask: is there any option not to push all these files to my main remote repo (even though it's private) but to pull them in during the build stage within GitHub Actions?
It's not a problem to publish the env & configs somewhere else, privately, in another separate private remote repo. The point is not to push these files to the main private repo, because the RBAC logic doesn't allow me to restrict access to selected files.
P.S. Any other advice about using GitLab CI or Bitbucket, if you know how to solve the problem there, is also appreciated. Don't be shy to share it!
Since this question seems to be getting some attention, I have found an answer for it.
The example shown below is based on a Node.js/NestJS app and pulls the private remote repo from GitHub.
In my case, the scenario was about pulling config files and other secrets from a separate private repo and merging them with the project during the container build. This option isn't about the security of secrets inside the container itself; it is about making one part of the project (the repo with the business logic) available to other developers without them seeing the credentials and configs kept in the separate private repo, which has its own access permissions.
You will need a personal access token (PAT); on GitHub you can create one under Settings > Developer settings > Personal access tokens.
As for GitLab, the flow is the same: you'll need to pass a token from somewhere in the settings. Also, a piece of advice: create not just one but two Dockerfiles before testing it.
Why HTTPS instead of SSH? With SSH you would also need to pass SSH keys and configure the client correctly, which is a bit more complicated because of CRLF and LF line endings, the crypto algorithms supported by SSH, and so on.
# it could be Go, PHP, what-ever
FROM node:17
# you will need your GitHub token from settings
# we will pass it to build env via GitHub action
ARG CR_PAT
ENV CR_PAT=$CR_PAT
# update OS in build container
RUN apt-get update
RUN apt-get install -y git
# workdir app, it is a cd (directory)
WORKDIR /usr/src/app
# installing nest library
RUN npm install -g @nestjs/cli
# config git with credentials
# we will use https since it is much easier to configure than ssh
# ${github_username} is a placeholder; pass it as an ARG or substitute your username
RUN git config --global url."https://${github_username}:${CR_PAT}@github.com/".insteadOf "https://github.com/"
# cloning the repo to WORKDIR
RUN git clone https://github.com/${github_username}/${repo_name}.git
# we move all files from pulled repo to root of WORKDIR
# including files named with dot at the beginning (like .env)
RUN mv ${repo_name}/* ${repo_name}/.[^.]* . && rmdir ${repo_name}/
# node.js stuff
COPY package.json ./
RUN yarn install
COPY . .
RUN nest build app
# adjust the entrypoint to your build output (a default Nest build lands in dist/)
CMD ["node", "dist/main.js"]
As a result, you'll get a full container with your code merged with the files and code from the separate repo we pulled from.
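To connect the pieces: in the GitHub Actions workflow the token is stored as a repository secret and handed to docker build as a build argument. A rough sketch of that build step (the image name and the way the secret is exposed as an environment variable are illustrative):
# CR_PAT is exposed to the job as an environment variable from a repository secret
docker build --build-arg CR_PAT="$CR_PAT" -t ghcr.io/your_user/your_app:latest .
docker push ghcr.io/your_user/your_app:latest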

Dockerfile, RUN git clone does not create required files/folders

English is not my native language, so I apologize in advance for any errors.
I'm new to Docker and Linux in general, and I'm trying to learn right now.
I have a task: create a Dockerfile based on tomcat:9.0-alpine and, in that Dockerfile, clone a provided repository. Then I need to build an image from the Dockerfile, run a container, and visit index.html in a browser, where I will see a specific page.
This is how my Dockerfile looks:
FROM tomcat:9.0-alpine
RUN apk update
RUN apk add git
RUN git clone https://github.com/a1qatraineeship/docker_task.git $TOMCAT_HOME/webapps/whateverApp/
When I build an image from this, I see that the repo was cloned into the directory that I specified:
#7 [4/4] RUN git clone https://github.com/a1qatraineeship/docker_task.git $TOMCAT_HOME/webapps/aquaApp/
#7 sha256:72b802c3b98dad7151daeba7db257b7b1f1089dc26fb5809fee52f881e19edb5
#7 0.319 Cloning into '/webapps/whateverApp'...
#7 DONE 1.9s
But when I run the container and go to http://localhost:8888/whateverApp/, I get "404 - Not Found".
If I go to http://localhost:8888, I see the Tomcat default page, so Tomcat is definitely working.
If I bash into the container and go to /webapps, I don't see the folder I specified (whateverApp) there.
What am I missing? Everything seems to work, no errors are thrown, but I don't see the supposedly cloned repository. The examples I saw made no mention of access restrictions or anything like that. My teacher even said the whole Dockerfile would essentially consist of four lines, and the only thing I really need to figure out is where to clone the repo for everything to work properly. But if it doesn't clone at all, how can I verify that I placed the files in the right place?
The problem is that your environment variable TOMCAT_HOME is empty when the image is built, so the repository is cloned relative to the filesystem root instead of into Tomcat's webapps directory. Define it explicitly:
FROM tomcat:9.0-alpine
ENV TOMCAT_HOME=/usr/local/tomcat
RUN apk update
RUN apk add git
RUN git clone https://github.com/a1qatraineeship/docker_task.git $TOMCAT_HOME/webapps/whateverApp/
With that ENV set, it should work. Good luck!
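A quick way to verify the fix (assuming the container maps host port 8888 to Tomcat's 8080, as the URLs above suggest; the image and container names are illustrative):
docker build -t docker-task .
docker run -d -p 8888:8080 --name docker-task docker-task
# the cloned app should now be visible in Tomcat's webapps directory
docker exec docker-task ls /usr/local/tomcat/webapps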

How to configure docker specific image dependencies which are managed in a different source code repository

How do I configure Docker-specific artifact dependencies that are managed in a different source code repository? My Docker image depends on jar files (say project-auth) and configuration (say project-theme) which are maintained in a different repository than the Docker image.
What would be the best way to copy these dependencies into a Docker image (say, a project-deploy repo) prior to building the image? I.e., in the above case project-deploy needs the jar files and configuration, which need to be mounted as a volume from the current folder.
I don't want these to be committed, as the dependencies tend to get stale, and I want the Docker image creation to be part of the build process itself.
You can use Docker multi-stage builds for this purpose.
With multi-stage builds, you use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base, and each of them begins a new stage of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image.
For example:
Suppose that the source code for dependencies is present in repo - "https://github.com/demo/demo.git"
Using multi-stage builds, you can create a stage in which you clone the git repo and build the dependency jar (or anything else you need) at image build time.
Finally, you can copy the jar into your final image.
# Use any base image. I took centos here
FROM centos:7 as builder
# Install only those packages which are required.
RUN yum install -y maven git \
&& git clone <YOUR_GIT_REPO_URL> /myfolder
WORKDIR /myfolder
# Create jar at run time. You can update this step according to your project requirements.
RUN mvn clean package
# From here our normal Dockerfile steps starts.
FROM centos:7
# Add all the necessary steps required to build your image
.
.
.
# This is how you copy the jar that was created in the builder stage into your final docker image.
COPY --from=builder SOURCE_PATH DESTINATION_PATH
Please refer to the Docker documentation on multi-stage builds to get a better understanding of how they work.
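If the dependency build misbehaves, it can help to build only the first stage and inspect it; a small sketch (the image name and the target path inside the stage are illustrative):
# build just the "builder" stage, then look at what Maven produced
docker build --target builder -t project-deploy-builder .
docker run --rm -it project-deploy-builder ls /myfolder/target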

How to benefit of the docker image cache when cloning a git repo

I have a private git repository that I have to add to my Docker image. For that I clone it locally into the same directory as the Dockerfile and then use the following Dockerfile instruction:
ADD my_repo_clone /usr/src/
My repo has a version tag that I clone, v1. So the files that I clone are always the same.
The problem is that when I build this docker image I always get a new image instead of replacing the old one:
docker build --rm -t "org_name/image_name" .
Apparently, because the ctime of the files changes, the Docker cache does not see my files as identical, so I always get a new image, which I want to avoid.
I tried to touch the cloned repo and change atime and mtime to a fixed date, but it is still not enough.
How can I stop Docker from creating new images every time (without changing the Docker source code that computes the file hashes and rebuilding it)?
Or how can I clone the repo during the image build process? (For this I need SSH forwarding since the repo is private, and I could not make SSH agent forwarding work during an image build either.)
Since you don't care about the repository itself and just need the files for tag v1, you can use git archive instead of git clone to produce a tar archive holding the files for tag v1.
Then use the Docker ADD directive to inject the archive into the image.
The mtime of the produced tar archive will be the time of the tag as documented:
git archive behaves differently when given a tree ID versus when given a commit ID or tag ID. In the first case the current time is used as the modification time of each file in the archive. In the latter case the commit time as recorded in the referenced commit object is used instead.
try:
git archive --format=tar --remote=https://my.git.server.com/myoproject.git refs/tags/v1 > v1.tar
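In the Dockerfile you can then rely on ADD unpacking local tar archives automatically, roughly like this (the destination path is illustrative):
# ADD extracts a local tar archive into the target directory
ADD v1.tar /usr/src/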

git checkout master command gives error that a file would be overwritten

I just tried to do git checkout master and I got this error:
macoss-MacBook-Pro-10:Marketing owner12$ git checkout master
error: The following untracked working tree files would be overwritten by checkout:
Marketing.xcodeproj/project.xcworkspace/xcuserdata/owner12.xcuserdatad/UserInterfaceState.xcuserstate
Please move or remove them before you can switch branches.
Aborting
but I am not sure how to handle this situation. I don't mind having this file overwritten by what is in the repo. What is the correct way for me to proceed here?
Thanks!
You have files that are not being tracked. Either
rm untracked.file1 untracked.file2
or
git add . && git commit -m "adding new previously untracked files that serve a purpose"
if you're having permission issues:
git add --ignore-errors .
Either delete the file if you don't care about it, stash it if you think you will need it in the future, or simply rename it.
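Stashing works for untracked files too if you include them explicitly; a quick sketch:
git stash --include-untracked   # short form: git stash -u
git checkout master
git stash pop   # bring the stashed file back later if you still need it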
Commit the files you want to keep and then do a git clean to remove the extra files you don't want to keep. This article on the git ready website describes it very well.
If you just want to get rid of one or two files in your working directory then you can do a dry run first and see which files would be cleaned up using:
git clean -n
And then when you are sure do this:
git clean -f
git clean has a -d switch if you want to clean up directories as well. And you can use that together with the other switches, so this is what I would normally use (and then after the dry run change -n to -f):
git clean -n -d
Then after your git clean, use:
git status
to make sure that you have no untracked files or uncommitted changes. And lastly switch to master with:
git checkout master
