How to configure docker specific image dependencies which are managed in a different source code repository - docker

How do I configure Docker-specific artifact dependencies which are managed in a different source code repository? My docker image depends on jar files (say project-auth) and configuration (say project-theme), which are maintained in a different repository than the docker image.
What would be the best way to copy dependencies for a docker image (say the project-deploy repo) prior to building the image? I.e. in the above case project-deploy needs jar files and configuration, which need to be mounted as a volume from the current folder.
I don't want these to be committed, as the dependencies tend to get stale, and I want the docker image creation to be part of the build process itself.

You can use Docker multi-stage builds for this purpose.
With multi-stage builds, you use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base, and each of them begins a new stage of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image.
For example:
Suppose that the source code for dependencies is present in repo - "https://github.com/demo/demo.git"
Using multi-stage builds, you can create a stage in which you clone the git repo and create the dependency jar (or anything else that you need) at build time.
Finally, you can copy the jar into your final image.
# Use any base image. I took centos here
FROM centos:7 as builder
# Install only those packages which are required, then clone the repo into /myfolder.
RUN yum install -y maven git \
&& git clone <YOUR_GIT_REPO_URL> /myfolder
WORKDIR /myfolder
# Create the jar at build time. You can update this step according to your project requirements.
RUN mvn clean package
# From here our normal Dockerfile steps start.
FROM centos:7
# Add all the necessary steps required to build your image
# ...
# This is how you can copy the jar which was created above (in the builder stage) into your final docker image.
COPY --from=builder SOURCE_PATH DESTINATION_PATH
Please refer to the Docker documentation on multi-stage builds for a better understanding.
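Applied to the question's layout, the same idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the repository URLs, image tags, and the paths /src, /theme and /opt/app are hypothetical, and it assumes project-auth builds with Maven and project-theme is plain configuration files.

```dockerfile
# Sketch only: repo URLs, image tags and all paths are hypothetical placeholders.

# Stage 1: build the project-auth jar from its own repository.
FROM maven:3.9-eclipse-temurin-17 AS auth-builder
ARG AUTH_REPO=https://example.com/acme/project-auth.git
RUN apt-get update && apt-get install -y --no-install-recommends git \
 && rm -rf /var/lib/apt/lists/*
RUN git clone --depth 1 "$AUTH_REPO" /src
WORKDIR /src
RUN mvn -q clean package

# Stage 2: fetch the project-theme configuration.
FROM alpine:3.19 AS theme-fetch
ARG THEME_REPO=https://example.com/acme/project-theme.git
RUN apk add --no-cache git \
 && git clone --depth 1 "$THEME_REPO" /theme

# Final stage: only the artifacts are copied in; Maven, git and the
# cloned sources stay behind in the earlier stages.
FROM eclipse-temurin:17-jre
COPY --from=auth-builder /src/target/*.jar /opt/app/lib/
COPY --from=theme-fetch /theme /opt/app/theme/
```

One thing to keep in mind: Docker caches the RUN git clone layers, so a rebuild may reuse an old clone. Building with docker build --no-cache, or passing a changed --build-arg value, forces a fresh fetch.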

Related

Build docker-image from remote repository (github-actions, gitlab-ci) with env and secrets from another remote repo?

I have a (private) repository at GitHub with my project and integrated GitHub-actions which is building a docker-image and pushing it directly to GHCR.
But I have a problem with storing and passing secrets to my build image. I have the following structure in my project:
.git (gitignored)
.env (gitignored)
config (gitignored) / config files (jsons)
src (git) / other folders and files
As you may see, I have .env file and config folder. Both of them store data or files, which are not in the repo but are required to be in the built environment.
So I'd like to ask, is there any option not to pass all these files to my main remote repo (even if it's private) but to link them during the build stage within the github-actions?
It's not a problem to publish env & configs somewhere else, privately, in another separate private remote-repo. The point is not to push these files to the main-private-repo, because RBAC logic doesn't allow me to restrict access to the selected files.
P.S. Any other advice of using GitLab CI or BitBucket, if you know how to solve the problem is also appreciated. Don't be shy to share it!
So it seems that this question is a bit hot, so I have found an answer for it.
The example shown below is based on a Node.js and NestJS app and pulls the private remote repo from GitHub.
In my case, the scenario was pulling config files and other secrets from a separate private repo and merging them with the project during the container build. This option isn't about the security of secrets inside the container itself. It is about making one part of the project (the repo with the business logic) available to other developers, who won't see the credentials and configs in the development repo, while keeping a secret private repo with separate access permissions.
You will need your personal access token (PAT); on GitHub you can create one in your account settings.
As for GitLab, the flow is the same: you'll need to pass in a token created in the settings. Also, a piece of advice: create not just one but two Dockerfiles before testing it.
Why HTTPS instead of SSH? With SSH you would also need to pass SSH keys and configure the client correctly. It's a bit more complicated because of CRLF and LF formats, the crypto algorithms supported by SSH, and so on.
# it could be Go, PHP, what-ever
FROM node:17
# you will need your GitHub token from settings
# we will pass it to build env via GitHub action
ARG CR_PAT
ENV CR_PAT=$CR_PAT
# update OS in build container
RUN apt-get update
RUN apt-get install -y git
# workdir app, it is a cd (directory)
WORKDIR /usr/src/app
# installing nest library
RUN npm install -g @nestjs/cli
# config git with credentials
# we will use https since it is much easier to config instead of ssh
# (github_username and repo_name are placeholders: declare them with ARG or substitute your own values)
RUN git config --global url."https://${github_username}:${CR_PAT}@github.com/".insteadOf "https://github.com/"
# cloning the repo to WORKDIR
RUN git clone https://github.com/${github_username}/${repo_name}.git
# we move all files from pulled repo to root of WORKDIR
# including files named with dot at the beginning (like .env)
RUN mv repo_folder/* repo_folder/.[^.]* . && rmdir repo_folder/
# node.js stuff
COPY package.json ./
RUN yarn install
COPY . .
RUN nest build app
# start the app (the entry point path is an assumption; adjust it to your build output)
CMD ["node", "dist/main.js"]
As a result, you'll get a complete container with your code merged with the files and code from the separate repo that we pull from.
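If keeping the token out of the image does matter, a different technique than the ARG/ENV approach above is a BuildKit secret mount. The following is a minimal sketch, assuming BuildKit is enabled; the names GITHUB_USER, CONFIG_REPO and the secret id cr_pat are made up for illustration.

```dockerfile
# syntax=docker/dockerfile:1
# Sketch only: GITHUB_USER, CONFIG_REPO and the secret id "cr_pat" are assumed names.
FROM node:17
RUN apt-get update && apt-get install -y git
WORKDIR /usr/src/app
ARG GITHUB_USER
ARG CONFIG_REPO
# The token is readable at /run/secrets/cr_pat only while this RUN executes;
# it is not written to an image layer or to the image environment.
# The .git folder is removed in the same step because git stores the
# clone URL (including the token) in .git/config.
RUN --mount=type=secret,id=cr_pat \
    git clone "https://${GITHUB_USER}:$(cat /run/secrets/cr_pat)@github.com/${GITHUB_USER}/${CONFIG_REPO}.git" /tmp/config \
 && rm -rf /tmp/config/.git \
 && cp -a /tmp/config/. . \
 && rm -rf /tmp/config

# Build, reading the token from the CR_PAT environment variable:
#   docker build --secret id=cr_pat,env=CR_PAT \
#     --build-arg GITHUB_USER=<user> --build-arg CONFIG_REPO=<repo> .
```

The files pulled from the config repo still end up in the image, as in the original answer; only the token itself is kept out.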

How can I cache a nix derivations's dependencies when built via Docker?

FROM nixos/nix@sha256:af330838e838cedea2355e7ca267280fc9dd68615888f4e20972ec51beb101d8
# FROM nixos/nix:2.3
ADD . /build
WORKDIR /build
RUN nix-build
ENTRYPOINT /build/result/bin/app
I have the very simple Dockerfile above that can successfully build my application. However, each time I modify any of the files within the application directory (.), it has to rebuild from scratch and download all the nix store dependencies again.
Can I somehow grab a "list" of store dependencies downloaded and then add them in on the beginning of the Dockerfile for the purpose of caching them independently (for the ultimate goal of saving time + bandwidth)?
I'm aware I could build this docker image using nix natively, which has its own caching functionality (the nix store), but I'm trying to keep this buildable in a non-nix environment (hence using docker).
I suggest splitting the source in two parts. The idea is to create a separate Docker layer containing only the dependencies, which changes rarely:
FROM nixos/nix:2.3
# set the working directory first so that nix-shell can find default.nix
WORKDIR /build
ADD ./default.nix /build/default.nix
# if you have any other Nix files, put them to ./nix subdirectory
ADD ./nix /build/nix
# now let's download all the dependencies
RUN nix-shell --run exit
# At this point, Docker has cached all the dependencies. We can perform the build
ADD . /build
RUN nix-build
ENTRYPOINT /build/result/bin/app

Docker multistage build without copying from previous image?

Does using a multi-stage build in Docker have any advantages if you don't copy any files from the previously built image?
E.g.:
FROM some_base_image as base
#Some random commands
RUN mkdir /app
RUN mkdir /app2
RUN mkdir /app3
#ETC
#Second stage starts from first stage
FROM base
#Add some files to image
COPY foo.txt /app
Does this result in a smaller image or offer any other advantages compared to a non multi-stage version? Or are multi stage builds only useful for preparing some files and then copying those into another base image?
"Or are multi stage builds only useful for preparing some files and then copying those into another base image?"
This is the main use-case discussed in "Use multi-stage builds"
The main goal is to reduce the number of layers by copying files from one image to another, without including the build environment needed to produce said files.
But another goal could be to avoid rebuilding the entire Dockerfile, including every stage.
In that case your suggestion (not copying) could still apply.
From the same documentation: you can specify a target build stage. The following command assumes you are using the previous Dockerfile but stops at the stage named builder:
$ docker build --target builder -t alexellis2/href-counter:latest .
A few scenarios where this might be very powerful are:
Debugging a specific build stage
Using a debug stage with all debugging symbols or tools enabled, and a lean production stage
Using a testing stage in which your app gets populated with test data, but building for production using a different stage which uses real data
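A minimal sketch of the debug/production scenario, where no files are copied between stages and each stage simply extends a shared base. The image tag, stage names, and the strace package are assumptions chosen for illustration.

```dockerfile
# Sketch only: image tag, stage names and file names are assumptions.
FROM alpine:3.19 AS base
RUN mkdir /app
COPY foo.txt /app/

# Debug stage: same content as base plus extra tooling.
FROM base AS debug
RUN apk add --no-cache strace

# Production stage: listed last, so a plain "docker build ." produces it.
FROM base AS production
CMD ["cat", "/app/foo.txt"]

# Build a specific stage:
#   docker build --target debug -t myapp:debug .
#   docker build --target production -t myapp:prod .
```

Here the production image never receives the debugging tools, even though nothing is copied with COPY --from.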

How can I build an image in Docker without downloading all dependencies every time?

I have a Django app that uses Docker and has a bunch of library dependencies in requirements.txt. Any time I add a new dependency, I have to rebuild the image and it downloads all of the dependencies from scratch. Is there a way to cache dependencies when building a docker image?
The most common solution is to create a base image that already has all the dependencies and build your image on top of it. However, if you update your dependencies very regularly, it might be easier to set up a CI process that builds a new base image every so often (every week? every day?).
Multi-stage builds might not help here because the dependencies are part of your base image, so when you run docker build . it will always want to pull all the dependencies when it reaches pip3 install -r requirements.txt.
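Separately from the base-image approach, a commonly used pattern is to copy requirements.txt before the rest of the source and, with BuildKit, to mount pip's download cache. The sketch below assumes BuildKit is enabled; the base image tag, the /app path, and the start command are illustrative guesses about the project.

```dockerfile
# syntax=docker/dockerfile:1
# Sketch only: base image tag, paths and start command are assumptions.
FROM python:3.12-slim
WORKDIR /app
# Copy only the dependency manifest first, so this layer and the next one
# stay cached until requirements.txt itself changes.
COPY requirements.txt .
# The cache mount (BuildKit) keeps pip's download cache between builds, so
# adding one dependency does not re-download all the others.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
# Source changes only invalidate the layers from here on.
COPY . .
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]
```

With this ordering, editing application code reuses the cached install layer, and editing requirements.txt reruns pip but can reuse previously downloaded packages from the cache mount.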

What is a Docker build stage?

As far as I understand build stages in Docker are fundamental things, and I have a practical understanding of them but I have trouble coming up with a proper definition, and I also can't seem to find one.
So: what is the definition of a Docker build stage?
Edit: I'm not asking "how do I use a build stage?" or "how can I use multi-build stages?" which people seem very eager to answer :-)
The reason I have this question is because I saw the following sentences in the docs:
"The FROM instruction initializes a new build stage"
"a name can be given to a new build stage"
Which left me wondering: what exactly is a build stage?
I don't think there will ever be a strict definition for Docker build stage because a build stage is in general something theoretical which:
can be defined by you
depends on your case (language / libraries)
In this question: Difference between build and deploy? one of the answers says...
Build means to Compile the project.
I think you can see it this way too. A build stage is any procedure that generates something which can later be taken and used.
The idea with docker multi-stage builds is to:
generate what you are going to need
leave behind what you don't need and use the product of step 1 in a more lightweight way
If you have read the docs, Alex Ellis has a nice example where the same logic takes place:
he starts with a golang image, adds libraries, builds his app (Go generates a binary executable file)
after that, he doesn't need golang and the libraries to ship/run it so, he picks an alpine image, adds the executable file from step 1 and ships his app with an image that has much smaller size.
Since version 17.05, Docker supports multiple stages during a docker build execution.
This means that you no longer need to define only one source image in your Dockerfile and do the whole build in a single run; you can define multiple stages, each with its own image, using multiple FROM definitions:
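# Build stage
FROM microsoft/aspnetcore AS build
# ..do a build with a dev image for creating ./app artifact
# Publish - use a hardened, production image
FROM alpine:latest
# copy the artifact produced by the build stage (the source path is illustrative)
COPY --from=build /app ./app
CMD ["./app"]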
This lets you break the image-building process into stages, each optimized for the task it performs. For example, the stages could be:
use an image with extra linting dependencies to check your source
use a dev-image with all development dependencies already installed to build your source
use another image including test frameworks to run various tests on the artifacts
and once everything passed ok, use a minimal-sized, optimized, hardened image to capture the final artifacts for production
Read more about multi-stage builds in detail:
https://docs.docker.com/develop/develop-images/multistage-build/
A stage is the creation of an image. In a multi-stage build, you go through the process of creating more than one image; however, you typically only tag a single one (exceptions being multiple builds, building a multi-architecture image manifest with a tool like buildx, and anything else docker releases after this answer).
Each stage, building a distinct image, starts from a FROM line in the Dockerfile. One stage doesn't inherit anything done in previous stages; it is based on its own base image. So if you have the following:
FROM alpine as stage1
RUN apk add your_tool
FROM alpine as stage2
RUN your_tool some args
you will get an error since your_tool is not installed in the second stage.
Which stage do you get as output from the build? By default the last stage, but you can change that with docker image build --target stage1 . to build the stage named stage1 in this example. The classic docker build will run from the top of the Dockerfile until it finishes the target stage. BuildKit builds a dependency graph, builds stages concurrently, and builds them only if needed, so do not depend on this ordering to control something like a testing workflow in your Dockerfile (BuildKit can see that nothing in the test stage is needed in your release stage and skip building the test stage).
What's the value of multiple stages? Typically, it's done to separate the build environment from the runtime environment. It allows you to perform the entire build inside of docker. This has two advantages.
First, you don't require an external Makefile and various compilers and other tools installed on the host to compile the binaries that then get copied into the image with a COPY line, anyone with docker can build your image.
And second, the resulting image doesn't include all the compilers or other build time tooling that isn't needed at runtime, resulting in smaller and more secure images. The typical example is a java app with maven and a full JDK to build, a runtime with just the jar file and the JRE.
If each stage makes a separate image, how do you get the jar file from the build stage to the run stage? That comes from a new option to the COPY command, --from. An oversimplified multi-stage build looks like:
FROM maven as build
COPY src /app/src
WORKDIR /app/src
RUN mvn install
FROM openjdk:jre as release
COPY --from=build /app/src/target/app.jar /app/
CMD java -jar /app/app.jar
With that COPY --from=build we are able to take the artifact built in the build stage and add it to the release stage, without including anything else from that first stage (no layers of compile tools like JDK or Maven get added to our second stage).
How do FROM x as y and COPY --from=y /a /b work together? The FROM x as y defines an image name for the duration of this build, in this case y. Anywhere later in the Dockerfile that you would put an image name, you can put y and you'll get the result of this stage as your input. So you could say:
FROM upstream as mybuilder
RUN apk add common_tools
FROM mybuilder as stage2
RUN some_tool arg2
FROM mybuilder as stage3
RUN some_tool arg3
FROM minimal_base as release
COPY --from=stage2 /bin2 /
COPY --from=stage3 /bin3 /
Note how stage2 and stage3 are each FROM mybuilder that is the output of the first stage.
The COPY --from=y allows you to change the context where you are copying from to be another image instead of the build context. It doesn't have to be another stage. So, for example, you could do the following to get a docker binary in your image:
FROM alpine
COPY --from=docker:stable /usr/local/bin/docker /usr/local/bin/
Further documentation on this is available at: https://docs.docker.com/develop/develop-images/multistage-build/
a build stage starts at a FROM statement and ends at the step before the next FROM statement
stage | steɪdʒ |
noun
a point, period, or step in a process or development
Take a practical example: you want to build an image which contains a production ready web server with Typescript files compiled to Javascript. You want to build that Typescript within a Docker container to simplify dependency management. So you need:
node.js
Typescript
any dependencies needed for compilation
Webpack or whatever
nginx/Apache/whatever
In your final image you only really need the compiled .js files and, say, nginx. But to get there, you need all that other stuff first. Without build stages, when you upload that final image it will contain all the intermediate layers, even if they're unnecessary for the final product.
Docker build stages now allow you to actually separate those stages, or steps, into separate images, while still using just one Dockerfile and not needing to glue several Dockerfiles together with external shell scripts or such. E.g.:
FROM node as builder
RUN npm install ...
# whatever you need to build your files
FROM nginx as production
COPY --from=builder /final.js /var/www/html
The final result of this Dockerfile is a small image with nginx as its base plus just the final .js file. It does not contain all the unnecessary stuff like node.js and the npm dependencies.
builder here is the first stage, production is the second stage. In this case the first stage will be discarded at the end of the process, but you can also choose to build a specific stage using docker build --target=builder. A new FROM introduces a new, separate stage. They're essentially separate Dockerfiles, but they can share data using COPY --from.
