How to use docker images when building artefacts in Actions? - docker

TL;DR: On a self-hosted Actions runner (itself a docker container on my docker engine), I would like to use specific docker images to build artefacts, move those artefacts between the build phases, and end up with a standalone executable (not a docker container to be deployed). I do not know how to use docker containers as "building engines" in Actions.
Details: I have a home project consisting of a backend in Go (cross compiled to a standalone binary) and a frontend in Javascript (actually a framework: Quasar).
I develop on my laptop in Windows and use GitHub as the SCM.
The manual steps I do are:
1. build a static version of the frontend, which lands in a directory spa
2. copy that directory to the backend directory
3. compile the executable that embeds the spa directory
4. copy (scp) this executable to the final destination
For development purposes this works fine.
I now would like to use Actions to automate the whole thing. I use docker based self-hosted runners (tcardonne/github-runner).
My problem: the containers do a great job isolating the build environment from the server they run on. However, they are reused across build jobs, which may create conflicts. More importantly, the default versions of the software provided by these containers are not the ones I need (usually the latest).
The solution would be to run the build phases in disposable docker containers (based on the right image, which would also shorten the build time as a nice side effect). Unfortunately, I do not know how to set this up.
Note: I do not want to ultimately create docker containers, I just want to use them as "building engines" and extract the artefacts from them, and share between the jobs (in my specific case - one job would be to build the front with quasar and generate a directory, the other one would be a compilation ending up with a standalone executable copied elsewhere)

Interesting premise, you can certainly do this!
I think you may be slightly mistaken with regards to:
They are however reused across build jobs and this may create conflicts
If you run a new container from an image, then you will start with a fresh instance of that container. Files, software, etc, all adhering to the original image definition. Which is good, as this certainly aids your efforts. Let me know if I have the wrong end of the stick in regards to the above though.
Base Image
You can define your own image for building, in order to mitigate the shortfalls of public images that may not be up to date or may not suit your requirements. In fact, this is a common pattern for CI, and Google does something similar with their cloud build configuration. For either approach below, you will likely want to do something like the following to ensure you have all the build tools you may need.
As a rough example:
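FROM golang:1.16.7-buster
# Install build tools, then create an unprivileged build user and an output directory it can write to
RUN apt-get update && apt-get install -y \
    git \
    make \
    ...
    && useradd <myuser> \
    && mkdir /dist \
    && chown <myuser> /dist
USER <myuser>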
You could build and publish this with the following tag:
docker build . -t <containerregistry>/buildr/golang
docker push <containerregistry>/buildr/golang
It would also be recommended that you maintain a separate builder image for other types of projects, such as node, python, etc.
Approaches
Building with layers
If you're looking to leverage build caching for your applications, this will be the better option for you. A cached layer is only reused when the instruction and the files it depends on are unchanged, and since each project is built in isolation, relying on the cache is relatively safe.
Building your app may look something like the following:
FROM <containerregistry>/buildr/golang AS builder
COPY src/ .
RUN make dependencies
RUN make
RUN mv /path/to/compiled/app /dist

FROM scratch
COPY --from=builder /dist /dist
The gist of this is that you start building your app within the builder image, which includes all the build dependencies you require, and then use a multi-stage Dockerfile to publish a final static image that contains only your compiled output, with no build dependencies (using the scratch image, the smallest image possible).
Getting the final files out of your image is a bit harder with this approach: you would have to create a container from the published image and then either run it with a mounted volume to persist the files to disk, or use docker cp to copy the files from that container (docker cp operates on a container, running or stopped, not on an image) to your disk.
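As a minimal sketch of the docker cp route (the image name and host directory are placeholders, and the dummy command is an assumption that applies to images built FROM scratch without a CMD), a container can be created without being started, have /dist copied out of it, and then be removed:

```shell
#!/bin/sh
# Sketch: copy /dist out of a built image without ever starting a container.
# Usage: extract_dist <image> <host-dir>   (both values are placeholders)
extract_dist() {
    image="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # An image built FROM scratch usually has no default command, so a dummy
    # one is supplied; it never runs because the container is only created.
    cid="$(docker create "$image" /dummy)" || return 1
    # "/dist/." copies the contents of /dist rather than the directory itself.
    docker cp "$cid:/dist/." "$dest"
    status=$?
    docker rm "$cid" >/dev/null
    return "$status"
}
```

Calling extract_dist user/app:latest ./out would leave the compiled files in ./out. With BuildKit, docker build --output type=local,dest=./out . may be worth a look as well, since it writes the final stage's filesystem to a local directory.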
In GitHub Actions, this would look like running a step that builds a Docker image; the step can run anywhere Docker is accessible.
For example:
jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      ...
      - name: Build and push
        id: docker_build
        uses: docker/build-push-action@v2
        with:
          push: true
          tags: user/app:latest
Building as a process
This approach cannot leverage build caching as effectively, but you may be able to do clever things like mounting a host npm cache into your container to speed up steps like npm install.
This approach differs from the former in that the way you build your app will be defined via CI / a purposeful script, as opposed to the Dockerfile.
In this scenario, it would make more sense to define the CMD in the parent image and mount your source code in, so that you do not maintain an image per project you are building.
This would shift the responsibility of building your application from the buildtime of the image, to the runtime. Retrieving your code from the container would be doable through volume mounting for example:
docker run --rm -v /path/to/src:/src -v /path/to/dist:/dist <containerregistry>/buildr/golang
If the CMD was defined in the builder, that single script would execute, build the mounted-in source code, and publish the result to /dist in the container, which would then be persisted to your host via that volume mapping.
Of course, this applies if you're building locally. It actually becomes a bit nicer in a GitHub Actions context if you wish to keep your build instructions there. You can choose to run steps within your builder container using something like the following:
jobs:
  ...
  container:
    runs-on: ubuntu-latest
    container: <containerregistry>/buildr/golang
    steps:
      - run: |
          echo This job does specify a container.
          echo It runs in the container instead of the VM.
        name: Run in container
Within that run: spec, you could choose to call a build script, or enter the commands that might be present in the script yourself.
What you do with the compiled output is largely up to you once acquired 👍
Chaining (Frontend / Backend)
You mentioned that you build static assets for your site and then embed them into your golang binary to be served.
Something like that introduces complications of course, but nothing untoward. If you do not need to retrieve your web files until you build your golang container, then you may consider taking the first approach, and copying the content from the published image as part of a Docker directive. This makes more sense if you have two separate projects, one for frontend and backend.
If everything is in one folder, then it sounds like you may just want to extend your build image to support both Go and JS, and then take the latter approach and define those build instructions in a script, a Makefile, or the run: config in your Actions file.
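If everything lives in one repository, such a build script might look like the following sketch. The directory names (frontend, backend, dist), the output name app, and the use of npm ci are assumptions for illustration; the spa directory comes from the question, and Quasar is assumed to write its SPA build to dist/spa.

```shell
#!/bin/sh
# Sketch of a single build script for frontend + backend.
# Assumed layout (hypothetical): <root>/frontend (Quasar), <root>/backend (Go).
# <root> must be an absolute path.
build_all() {
    root="$1"
    # 1. Build the static frontend.
    (cd "$root/frontend" && npm ci && npx quasar build) || return 1
    # 2. Copy the generated spa directory next to the Go sources.
    rm -rf "$root/backend/spa"
    cp -R "$root/frontend/dist/spa" "$root/backend/spa" || return 1
    # 3. Compile the standalone executable that embeds spa.
    mkdir -p "$root/dist" || return 1
    (cd "$root/backend" && CGO_ENABLED=0 go build -o "$root/dist/app" .)
}
```

Inside a job that runs in a builder container, a step could then call something like build_all "$PWD" and hand dist/app to the deployment step.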
Conclusion
This is a lot of info; I hope it's digestible for you and, more importantly, I hope it gives you some ideas as to how you can tackle your current issue. Let me know in the comments if you would like any clarification.

Related

Why does docker bother of context if we do not copy all

Several pages of the official Docker documentation warn about the folder that is sent to the Docker daemon (which they call the build context) when building a new image with docker build. For example, from understand-build-context:
Inadvertently including files that are not necessary for building an
image results in a larger build context and larger image size. This
can increase the time to build the image, time to pull and push it,
and the container runtime size. To see how big your build context is,
look for a message like this when building your Dockerfile:
Sending build context to Docker daemon 187.8MB
I do not understand why the context is so important if we do not use all its content.
Let say that my build context is a 1GB folder, but in Dockerfile I have only one COPY command of a file of 1KB. Then why do we bother about the rest? How could the rest affect the size of my image?
Similarly, why do we have .dockerignore? If I do not use those files in the Dockerfile, aren't they ignored anyway? If not, what are they used for?
Let say that my build context is a 1GB folder, but in Dockerfile...
The Dockerfile is normally transferred as part of the build context. Perhaps the easiest place to see this is in the "build an image" Docker HTTP API: the dockerfile parameter is explicitly a path within the build context, which is expected to be transferred in the HTTP body as a tar file. In that low-level API there's no way to pass the Dockerfile outside of that build-context tar-file HTTP body.
So first you send the build context to the Docker daemon, then the daemon unpacks it, and then it reads the Dockerfile and sees
I have only one COPY command of a file of 1KB.
so only that one file is copied into the resulting image; the rest of the context is just ignored.
Then why do we bother about the rest? How could the rest affect the size of my image? Similarly, why do we have .dockerignore?
Sending the build context is surprisingly slow. Even if you're not using remote Docker, and working directly on a native-Linux host, it can take multiple seconds to send that 1 GB tar-file build context over the Unix socket. So smaller build contexts can result in faster builds, and .dockerignore is a convenient way to cause things you're not going to use to be omitted from the build context.
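To get a rough feel for how much data is sent, the context can be approximated by the size of an uncompressed tar of the directory. This is only an estimate that assumes tar is available; it does not read .dockerignore, so exclusions are passed by hand:

```shell
#!/bin/sh
# Rough estimate of the build-context size in bytes.
# Usage: context_bytes <dir> [extra tar options, e.g. --exclude=node_modules]
context_bytes() {
    dir="$1"
    shift
    # Stream an uncompressed tar of the directory and count the bytes.
    tar -cf - "$@" -C "$dir" . | wc -c | tr -d ' '
}
```

Comparing context_bytes . with context_bytes . --exclude=node_modules gives an idea of what a node_modules entry in .dockerignore would save.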
It is very common to copy the entire build context into an image, though, and in this case it's important to control what goes in there. Let's consider a typical Node application. In day-to-day development I might just use Node, so I'll have a package.json file and a src subdirectory, but Node installs all of its dependencies in a node_modules subdirectory as well. A typical Node Dockerfile will look something like
FROM node:lts
WORKDIR /app
# Copy and install dependencies
COPY package*.json ./
RUN npm ci
# Copy and build the rest of the application
# IMPORTANT: this copies the entire build context
COPY ./ ./
RUN npm run build
# Explain how to run the container
EXPOSE 3000
CMD ["node", "./build/index.js"]
The RUN npm ci line recreates the node_modules directory inside the image. In the next line I copy the entire build context – my src directory, webpack.js configuration, .typescript configuration, static assets, the whole works - into the image, with enough parts and local files that I'd prefer to not list them out individually.
In that context it's important that COPY ./ ./ not include the host's node_modules directory. The host might be a different OS, or a different C library version, or any of several other things that might cause incompatibilities. That's where putting it in .dockerignore lets me say "copy everything, except this".
Your question hints at a very carefully curated build-context directory. That's a possibility too; in particular it's something that made sense with a compiled language, on a native-Linux host, before Docker multi-stage builds existed. You could consider writing something like a Makefile that copied specific files from your source tree into a dedicated docker directory, and then used that directory as the build context. Then you'd know exactly what was in the build context and therefore exactly what was going into the image. With modern Docker and multi-stage builds, I feel like this setup is a little unusual though.
The documentation was written before buildkit became standard in docker, but it's still a good practice for older build tooling. The reason for this in the classic builder is that docker is a client/server based app. To run a build, the client sends over the entire context, Dockerfile, and all the parameters for the server to build, and the server runs that build, pulling parts out of the context that the Dockerfile requests. As much as it looks like everything is happening locally, and often is, the server could be a remote host without direct access to your filesystem, and the build process is a JSON REST API that sends the request and then monitors for the build to complete.
Buildkit, however, changes this. Both the server and the client communicate with each other, and the server has a cache of not only the previous builds, but of the previous build contexts. So when a file changes in the context between builds, it can perform the equivalent of an rsync to send just that one file, and only when the server requests it from the client.
There is still a need for a .dockerignore since even with buildkit, you often want to exclude files within the build that would otherwise be copied in a wildcard match. For example, if you have the step:
COPY . /src
Then even with the buildkit caching, you'll include every file in the directory, even if a number of those files aren't needed to build your app (like the .git folder, the Dockerfile itself, the README, LICENSE, etc). That not only bloats your image and makes your builds slower, but it risks causing a cache miss when the resulting image would normally be unchanged.
Some will make the .dockerignore look similar to their .gitignore with some added files that don't affect the build. I often do the reverse, excluding everything, and then reincluding only the files I need with the ! prefix. E.g. the following would include only the Makefile, src, and static folders:
*
!Makefile
!src/
!static/
If you do that, make sure you remember to update it when adding new files or directories to your builds.

How to use Gitlab's Auto DevOps for multi-container application?

I have a multi-container application, with nginx as web server and reverse-proxy, and a simple 'Hello World' Streamlit app.
It is available on my Gitlab.
I am totally new to DevOps, and would therefore like to leverage Gitlab's Auto DevOps so as to make it easy.
By default Gitlab's Auto DevOps expects one Dockerfile only, and at the root of the project (source)
Surprisingly, I found only one resource on my multi-container use case that aimed to answer this issue: https://forum.gitlab.com/t/auto-build-for-multiple-docker-containers/46949
I followed the advice and made only slight changes to the .gitlab-ci.yml for the paths to my Dockerfiles.
But then I have an issue with the Dockerfiles not recognizing the files in their own folders:
The app's Dockerfile doesn't find requirements.txt.
And Nginx's Dockerfile doesn't find project.conf.
It seems that the DOCKERFILE_PATH: src/nginx/Dockerfile variable gives access only to the Dockerfile itself, and is not understood as the location for the build.
How can I customize this .gitlab-ci.yml so that the build passes correctly ?
Thank you very much !
The files are not being found because of how Docker's build context works. Since you're running docker build from the project root, the context is the root rather than the directory containing your Dockerfile. That means that your docker build command is looking for requirements.txt in the project root instead of at src/app/requirements.txt. You can fix this relatively easily by executing a cd into your src/app directory before you run docker build, and removing the -f flag from your docker build (since you no longer need to specify the Dockerfile path).
Since each job executes in an isolated container, you don't need to worry about CDing back to your build root, since your job never runs any other non-docker commands.
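An equivalent option, sketched below, is to stay in the project root and pass the Dockerfile's directory as the build context. The image tags are hypothetical; the src/app and src/nginx paths are taken from the question:

```shell
#!/bin/sh
# Build an image using the Dockerfile's own directory as the build context.
# Usage: build_image <dir-containing-Dockerfile> <tag>
build_image() {
    context_dir="$1"
    tag="$2"
    # The last argument is the context, so COPY requirements.txt resolves
    # relative to $context_dir rather than to the project root.
    docker build -t "$tag" -f "$context_dir/Dockerfile" "$context_dir"
}
```

With this, build_image src/app myproject/app and build_image src/nginx myproject/nginx would each see only their own folder.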

gcloud rebuilds complete container but Dockerfile is the same, only the script has changed

I am building Docker containers using gcloud:
gcloud builds submit --timeout 1000 --tag eu.gcr.io/$PROJECT_ID/dockername Dockerfiles/folder_with_dockerfile
The last 2 steps of the Dockerfile contain this:
COPY script.sh .
CMD bash script.sh
Many of the changes I want to test are in the script, so the Dockerfile stays intact. Building those Dockerfiles on Linux with docker-compose results in a very quick build because it detects that nothing has changed. However, when doing this on gcloud, I notice the complete image being rebuilt even though only a minor change was made to script.sh.
Any way to prevent this behavior?
Your local build is fast because you already have all remote resources cached locally.
It looks like using kaniko-cache would speed up your build a lot (see https://cloud.google.com/cloud-build/docs/kaniko-cache#kaniko-build).
To enable the cache on your project run
gcloud config set builds/use_kaniko True
The first time you build the container it will feed the cache (for 6h by default) and the rest will be faster since dependencies will be cached.
If you need to further speed up your build, I would use two containers and have both in my local GCP container registry:
The first one as a cache with all remote dependencies (OS / language / framework / etc).
The second one is the one you need with just the COPY and CMD using the cache container as base.
Actually, gcloud has a lot to do:
The gcloud builds submit command:
compresses your application code, Dockerfile, and any other assets in the current directory as indicated by .;
uploads the files to a storage bucket;
initiates a build using the uploaded files as input;
tags the image using the provided name;
pushes the built image to Container Registry.
Therefore the complete build process could be time consuming.
There are recommended practices for speeding up builds such as:
building leaner containers;
using caching features;
using a custom high-CPU VM;
excluding unnecessary files from upload.
Those could optimize the overall build process.

Building software in docker - at `build` or at `run` time?

I am currently using docker to create a reproducible build environment (for building Android ROMs). Now I would like to run multiple builds, each with slight variations. Every build consists of several steps, e.g.
1. Build Linux kernel
2. Build Android
3. Include custom apps
4. Package image
If two builds only vary at step 3, it would be great to be able to reuse the first two steps.
I am thinking of two options:
1. Enter my docker container, run the build, and save the build artifacts at each step. Later check if I can reuse them. This would require quite a bit of coding, and manual management of build artifacts.
2. Abuse docker build. Create a Dockerfile for each configuration, with one RUN command for each step. I think this will let me use docker's caching: if two builds only differ at step 3, docker will reuse a layer containing steps 1 and 2. I would only ever "run" the container I built to copy out the finished ROM.
Is there a "best" or canonical way to do this? Is there any downside to using docker build in this way?
You could build what's called a "base image" and push that up to a docker registry. Then, for the two branches of that image, you use the FROM keyword, but instead of using a base image like FROM ubuntu:latest, you use your own base image:
To use the base image:
FROM repo/base-image:tag
So your base could be:
FROM ubuntu:14.04
# Step 1
COPY /tmp /tmp
# Step 2
ADD /src /src
You build and push that:
docker build -t repo/base-image:tag .
docker push repo/base-image:tag
Then, in your other two Dockerfiles...
Dockerfile1
FROM repo/base-image:tag
# Step 3 specific to this Dockerfile1
ADD /something /somewhere
# Do different things
EXPOSE 443
Dockerfile2
FROM repo/base-image:tag
# Step 3 specific to this Dockerfile2
ADD /something-else /somewhere-else
# Do different things
EXPOSE 80
That way they have the first two layers in common and differ only in the third. Roughly speaking, each instruction in a Dockerfile adds a layer on top of the previous ones, a bit like descending a tree: the more instructions you have, the more layers (levels) you have. The FROM repo/img:tag line tells Docker which image to inherit ALL previous layers from.
The second option (relying on Dockerfiles + docker build) is definitely the way to go.
Indeed as you already mentioned it in your question, this will enable Docker to use caching.
Also, I recall that even if only one Dockerfile with a single FROM ... command is involved, Docker's caching is already active. That is why the order of commands in a Dockerfile matters: it is preferable to put the commands that are unlikely to change between builds first, and the commands that are likely to change (such as the compilation of custom apps) afterwards.
You can thus follow the steps detailed in @JabariDash's answer, but if you notice that an intermediate image repo/some-image is only used once (via a command FROM repo/some-image in another Dockerfile), note that you can avoid defining this repo/some-image in a separate Dockerfile: you can put several FROM ... commands in the same Dockerfile, and rely on the so-called multi-stage builds feature of Docker >= 17.05.
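As a sketch of such a multi-stage layout (the stage names base, variant1, and variant2, the rom tag prefix, the ubuntu:22.04 base, and the echo commands standing in for the real build steps are all placeholders), a small script can write one Dockerfile and build either variant with --target:

```shell
#!/bin/sh
# Sketch: one Dockerfile with a shared base stage and two variant stages.
# Usage: write_dockerfile <path>; build_variant <path> <stage>
write_dockerfile() {
    cat > "$1" <<'EOF'
FROM ubuntu:22.04 AS base
# Steps 1 and 2: slow, rarely changing work goes first so it stays cached
RUN echo "build kernel" > /step1.txt
RUN echo "build android" > /step2.txt

FROM base AS variant1
# Step 3, specific to variant 1
RUN echo "custom apps A" > /step3.txt

FROM base AS variant2
# Step 3, specific to variant 2
RUN echo "custom apps B" > /step3.txt
EOF
}

build_variant() {
    # --target selects the stage to build; the shared base stage is
    # reused from the build cache when its instructions are unchanged.
    docker build --target "$2" -t "rom:$2" -f "$1" "$(dirname "$1")"
}
```

Running build_variant ./Dockerfile variant1 followed by build_variant ./Dockerfile variant2 should only redo the step that differs between the two.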

Wrap origin public Dockerfile to manage build args, etc

I'm very new to Docker, so I wonder whether I can change official public images from Docker Hub (which I use in the FROM directive) on the fly while using them in my own container builds, a bit like Chef's chef-rewind does.
For example, say I need to pass build args to openresty/latest-centos to build it without the modules I won't use. I need to put this
FROM openresty/latest-centos
in my Dockerfile, but what else should I do so that openresty is built only with the modules I need?
When you use the FROM directive in a Dockerfile, you are simply instructing Docker to use the named image as the base for the image that will be built with your Dockerfile. This does not cause the base image to be rebuilt, so there is no way to "pass parameters" to the build process.
If the openresty image does not meet your needs, you could:
Clone the openresty git repository,
Modify the Dockerfile,
Run docker build ... to build your own image
Alternatively, you can save yourself that work and just use the existing image and live with a few unused modules hanging around. If the modules are separate components, you could also issue the necessary commands in your Dockerfile to remove them.

Resources