How to share gradle cache with docker containers

We run gradle builds within docker containers (the reason is that the build requires software we don't want to install on the host: Node, Wine, etc. Not even Java or Gradle is installed on the host).
Launching each container with an empty cache is annoyingly slow.
I've set up Gradle 4.0's HTTP build cache. That avoided the need to java-compile in most cases. The performance gain is quite small, though, because build time is dominated by downloading dependencies. gradlew --parallel helped to mitigate that a bit, but to really boost the build, downloading should be avoided altogether.
Sharing ~/.gradle as a docker volume is problematic, because it will cause contention when containers run in parallel (https://github.com/gradle/gradle/issues/851).
So, what else can be done to avoid downloading the same artifacts over and over again?

While it is problematic to share Gradle caches between containers running in parallel, it is absolutely OK to reuse Gradle caches when containers run sequentially. Builds that are launched by Jenkins run sequentially.
Jenkins builds can be sped up by using a docker volume for the .gradle folder. The only drawback is that each job requires its own volume.
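A minimal sketch of such a per-job volume, assuming the build runs inside a Gradle container (the image tag, the in-container Gradle home path, and the use of Jenkins' JOB_NAME and WORKSPACE variables are assumptions):

# one named volume per Jenkins job, reused across that job's sequential builds
docker run --rm \
  -v "gradle-home-${JOB_NAME}:/home/gradle/.gradle" \
  -v "${WORKSPACE}:/project" \
  -w /project \
  gradle:4.0 gradle --build-cache build

Because the volume name includes the job name, parallel jobs never share a cache, which sidesteps the contention issue mentioned above.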

You could build a docker image containing a cache, then use this image to run the building containers.
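A hedged sketch of that idea, assuming the project ships the Gradle wrapper (the base image, file names, and the task used to warm the cache are assumptions):

FROM openjdk:8-jdk
WORKDIR /warmup
# copy only the wrapper and build scripts so this layer changes only
# when the dependency declarations change
COPY gradlew build.gradle settings.gradle ./
COPY gradle ./gradle
# download dependencies into the image's Gradle cache (~/.gradle);
# '|| true' keeps the image build going even if a task fails
RUN ./gradlew --no-daemon dependencies || true

Containers started from the resulting image already have a populated ~/.gradle, at the cost of rebuilding the image whenever the dependency declarations change.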

Related

Debug why gradle caching fails across successful docker build instances?

I am trying to make the Gradle 6.9 cache work in a Docker CI build invoked by Jenkins running in Kubernetes, without access to scan.gradle.org.
The idea is to save an image after gradle --build-cache --no-daemon classes bootJar and use that as the FROM of subsequent builds. This works for me on my own machine, but I cannot make it work on the Jenkins server. Everything happens in the Gradle home directory, so everything should be cached. I am wondering whether the path to that matters, as it is deep in a Kubernetes mount under /var, and this is the only difference between the two docker builds I can think of.
Caching would preferably be for the whole build, but just caching the Maven dependencies would be a substantial saving.
What am I missing? Is there a way to get insight into why Gradle decides to reuse what it already has or not?
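Purely to illustrate the shape of the approach described above, a hedged Dockerfile sketch (the image name, registry, and paths are hypothetical):

# previous successful build, pushed as a cache image after running
# gradle --build-cache --no-daemon classes bootJar
FROM registry.example.com/myapp-build-cache:latest
COPY . /home/gradle/project
WORKDIR /home/gradle/project
RUN gradle --build-cache --no-daemon classes bootJar

After a successful build, the resulting image would be tagged and pushed as registry.example.com/myapp-build-cache:latest again so the next build starts from it.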

How to start docker containers using shell commands in Jenkins

I'm trying to start two containers (each with a different image) using Jenkins shell commands. I tried installing the Docker extension in Jenkins and/or configuring Docker under global tool configuration. I am also doing all this in a pipeline. After executing docker run... I'm getting a "Docker: not found" error in the Jenkins console output.
I am also having a hard time finding a guide on the internet that describes exactly what I wish to accomplish. If it is of any importance, I'm trying to start a Selenium Grid and a Selenium Chrome node and then use Maven (which is configured and works correctly) to send a test suite to that node.
If you have any experience with something similar to what I wish to accomplish, please share your thoughts on the best approach to this situation.
Cheers.
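For illustration only, the kind of commands the question describes would typically be run from a pipeline sh step and might look like the following (the image names are real Docker Hub images from the Selenium project; the network and environment wiring is an assumption and differs between Selenium versions):

docker network create grid
docker run -d --net grid --name selenium-hub selenium/hub
docker run -d --net grid -e HUB_HOST=selenium-hub selenium/node-chrome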
That's because docker images that you probably create within your pipeline cannot also run (become containers) within the pipeline environment, because that environment isn't designed to also host applications.
You need to find a hosting provider for your docker images (e.g. Azure or GCP). Once you set up the hosting part, you need to add a step to your pipeline to upload/push the image to that provider's docker registry or to the free public Docker Hub. Then, finally, add a step to your pipeline that sends a command to your hosting to download the image from whichever docker registry you chose and launch it into a container (this last download-and-launch part is what docker run covers). Only at that point do you have a running app.
Good luck.
Somewhat relevant (maybe it'll help you understand how some of those things work):
Command docker build is comparable to the process of producing an installer package such as an MSI.
Docker image is comparable to an installation package (e.g. MSI).
Command docker run is comparable to running an installer package with the goal of installing an app. So, using the same analogy, running an MSI installs an app.
Container is comparable to an installed application. Just like an app, a docker container can be running or stopped. This depends on the environment, which I referred to as "hosting" above.
Just like you can build an MSI package on one machine and run it on other machines, you build docker images on one machine (pipeline host, in your case), but you need to host them in environments that support that.

Concurrent build within Docker with regards to multi staging

I have a monolithic repo that contains all of my projects. The current setup I have is to bring up a build container, mount my monolithic repo, and build my projects sequentially. Copy out the binaries, and build their respective runtime (production) containers sequentially.
I find this process quite slow and want to improve the speed. The two main approaches I want to take are:
Within the build container, build my project binaries concurrently instead of sequentially.
Like approach #1, also build my runtime (production) containers concurrently.
I did some research and it seems there are two Docker features of interest here:
Multi-stage builds, which allow me to skip worrying about the build container and put everything into one Dockerfile.
--parallel option for docker-compose, which would solve approach #2, allowing me to build my runtime containers concurrently.
However, there are still two main issues:
How do I glue the two features together?
How do I build my binaries concurrently inside the build Docker? In other words, how can I achieve approach #1?
Clarifications
Regardless of whether multi-stage is used or not, there are two logical phases.
First is the binary-building phase. During this phase, the artifacts are the compiled executables (binaries) from the build container. Since I'm not using multi-stage builds, I'm copying these binaries out to the host, so the host serves as an intermediate staging area. Currently the binaries are built sequentially; I want to build them concurrently inside the build container. Hence approach #1.
Second is the image-building phase. During this phase, the binaries from the previous phase, which are now stored on the host, are used to build my production images. I also want to build these images concurrently, hence approach #2.
Multi-stage allows me to eliminate the need for an intermediate staging area (the host). And --parallel allows me to build the production images concurrently.
What I'm wondering is how I can achieve approaches #1 and #2 using multi-stage and --parallel. For every project, I could define a separate multi-stage Dockerfile and call --parallel on all of them to have their images built separately. This would achieve approach #2, but it would spawn a separate build container for each project and take up a lot of resources (I use the same build container for all my projects and it's 6 GB). On the other hand, I could write a script to build my project binaries concurrently inside the build container. This would achieve approach #1, but then I can't use multi-stage if I want to build the production images concurrently.
What I really want is a Dockerfile like this:
FROM alpine:latest AS builder
RUN concurrent_build.sh binary_a binary_b
FROM builder AS prod_img_a
COPY binary_a .
FROM builder AS prod_img_b
COPY binary_b .
And be able to run a docker-compose command like this (I'm making this up):
docker-compose --parallel prod_img_a prod_img_b
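For what it's worth, Compose does have a real flag for this. A hedged sketch, assuming a docker-compose.yml whose services prod_img_a and prod_img_b set build.target to the corresponding stages of the Dockerfile above:

docker-compose build --parallel prod_img_a prod_img_b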
Further clarifications
The run-time binaries and run-time containers are not separate things. I just want to be able to build the binaries AND the production images in parallel.
--parallel does not use different hosts, but my build container is huge. If I use multi-stage builds, running something like 15 of these build containers in parallel on my local dev machine could be bad.
I'm also thinking about building the binaries and the run-time containers separately, but I'm not finding an easy way to do that. I have never used docker commit; would that sacrifice the docker cache?
Results
My mono-repo contains 16 projects; some are microservices of only a few MB, and some are bigger services of about 300 to 500 MB.
The build includes the compilation of two prerequisites: one is gRPC and the other is XDR. Both are trivially small, taking only 1 or 2 seconds to build.
The build also includes a node_modules installation phase. npm install and build is THE bottleneck of the project and by far the slowest step.
The strategy I am using is to split the build into two stages:
The first stage is to spin up a monolithic build container, mount the mono-repo into it as a bind volume with cached consistency, and build all of my containers' binary dependencies inside it in parallel using Goroutines. Each Goroutine calls a build.sh bash script that does the building. The resulting binaries are written to the same mounted volume. A cache in the form of a mounted docker volume is used, and the binaries are preserved across runs on a best-effort basis.
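The author does this with Goroutines; purely as an illustration of the same fan-out inside the build container, a shell sketch (the project names and the build.sh path are hypothetical):

# launch every project's build in the background, then wait for all of them
for p in project_a project_b project_c; do
  ( cd "/repo/$p" && ./build.sh ) &
done
wait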
The second stage is to build the images in parallel, using Docker's Go SDK (documented here), again in parallel using Goroutines. Nothing else is special about this stage besides some basic optimizations.
I do not have performance data for the old build system, but building all 16 projects easily took upwards of 30 minutes. The old build was extremely basic: it did not build the images in parallel or use any optimizations.
The new build is extremely fast. If everything is cached and there are no changes, the build takes ~2 minutes. In other words, the overhead of bringing up the build system, checking the cache, and rebuilding the same cached docker images is ~2 minutes. With no cache at all, the new build takes ~5 minutes. A HUGE improvement over the old build.
Thanks to @halfer for the help.
So, there are several things to try here. Firstly, yes, do try --parallel; it would be interesting to see the effect on your overall build times. It looks like you have no control over the number of parallel builds though, so I wonder if it would try to do them all in one go.
If you find that it does, you could write docker-compose.yml files that only contain a subset of your services, such that you only have five at a time, and then build against each one in turn. Indeed, you could write a script that reads your existing YAML config and splits it up, so that you do not need to maintain your overall config and your split-up configs separately.
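A minimal sketch of that splitting idea (the file names are hypothetical; each file would contain roughly five services):

docker-compose -f docker-compose.batch1.yml build --parallel
docker-compose -f docker-compose.batch2.yml build --parallel
docker-compose -f docker-compose.batch3.yml build --parallel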
I suggested in the comments that multi-stage would not help, but I think now that this is not the case. I was wondering whether the second stage in a Dockerfile would block until the first one is completed, but this should not be so - if the second stage starts from a known image then it should only block when it encounters a COPY --from=first_stage command, which you can do right at the end, when you copy your binary from the compilation stage.
Of course, if it is the case that multi-stage builds are not parallelised, then docker commit would be worth a try. You've asked whether this uses the layer cache, and the answer is I don't think it matters - your operation here would be as follows:
Spin up the binary container to run a shell or a sleep command
Spin up the runtime container in the same way
Use docker cp to copy the binary from the first one to the second one
Use docker commit to create a new runtime image from the new runtime container
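A hedged shell sketch of those four steps (the image names and paths are hypothetical; the copy is staged through the host because docker cp does not copy directly between two containers):

docker run -d --name binary1  my-binaries-image sleep 3600
docker run -d --name runtime1 my-runtime-base   sleep 3600
docker cp binary1:/build/app /tmp/app
docker cp /tmp/app runtime1:/usr/local/bin/app
docker commit runtime1 my-runtime:latest
docker rm -f binary1 runtime1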
This does not involve any network operations, and so should be pretty quick - you will have benefited greatly from the parallelisation already at this point. If the binaries are of non-trivial size, you could even try parallelising your copy operations:
# docker cp cannot copy container-to-container directly, so stage through the host
docker cp binary1:/path/to/binary /tmp/binary1 && docker cp /tmp/binary1 runtime1:/path/to/binary &
docker cp binary2:/path/to/binary /tmp/binary2 && docker cp /tmp/binary2 runtime2:/path/to/binary &
docker cp binary3:/path/to/binary /tmp/binary3 && docker cp /tmp/binary3 runtime3:/path/to/binary &
Note, though, that these are disk-bound operations, so you may find there is no advantage over doing them serially.
Could you give this a go and report back on:
your existing build times per container
your existing build times overall
your new build times after parallelisation
Do it all locally to start off with, and if you get some useful speed-up, try it on your build infrastructure, where you are likely to have more CPU cores.

Docker multi stage builds, Kubernetes, and Distroless compatibility

I am facing "theoretical" compatibility issues when using distroless-based containers with Kubernetes 1.10.
Actually, distroless requires Docker 17.05 (https://github.com/GoogleContainerTools/distroless), whereas Kubernetes supports only version 17.03 (https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#external-dependencies).
Is it possible to run distroless containers within Kubernetes 1.10 clusters without any issue?
Is it possible to build distroless-based images on a build server running Docker 17.05 and then deploy them on a Kubernetes 1.10 cluster (Docker 17.03)?
The requirement for 17.05 applies only to building a "distroless" image with docker build using a multi-stage Dockerfile. Once the image is built, there is nothing stopping it from running on older Docker / containerd versions.
Docker has supported images with no distribution for ages now via FROM scratch, leaving it to the image author to populate whatever the software needs, which in some cases, like fully static binaries, might be only the binary of the software and nothing more :)
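A hedged sketch of such a multi-stage Dockerfile (the Go program and the image tags are assumptions; only the machine running docker build needs 17.05+, and the resulting image runs anywhere):

# build stage: full toolchain, only needed at build time
FROM golang:1.10 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

# final stage: distroless base, no shell, no package manager
FROM gcr.io/distroless/base
COPY --from=build /app /app
ENTRYPOINT ["/app"]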
It seems that you might need Docker 17.05+ only for building images using multi-stage files.
After you build an image with a multi-stage Dockerfile, it will be the same kind of image in the registry as if you had built it the old-fashioned way.
Taken from Use multi-stage builds:
With multi-stage builds, you use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base, and each of them begins a new stage of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image.
The end result is the same tiny production image as before, with a significant reduction in complexity.
Kubernetes does not use Dockerfiles for creating pods. It uses ready-to-run images from a Docker registry instead.
That's why I believe that you can use such images in Kubernetes Pods without any issues.
But anyway, to create and push your images, you have to use a build machine with Docker 17.05+ that can consume the new multi-stage syntax in the Dockerfile.

How to cache downloaded dependencies for a Jenkins Docker SSH Slave (Gradle)

We have a Jenkins Docker slave template that successfully builds a piece of software, for example a Gradle project. It is based on https://hub.docker.com/r/evarga/jenkins-slave/.
When we fire up the Docker slave, the dependencies are downloaded every time we do a build. We would like to speed up the build so that downloaded dependencies can be reused by the same build or even by other builds.
Is there a way to specify an external folder so that cache is used? Or another solution that reuses the same cache?
I think the described answers only work with an exclusive cache for every build job. If I have different Jenkins jobs running on Docker slaves, I will get into trouble with this scenario: if the jobs run at the same time and write to the same mounted cache in the host filesystem, it can become corrupted. Otherwise you must mount a folder with the job name as part of the filesystem path (a Jenkins job runs only once at a time).
Here's an example for Maven dependencies; it's exactly what Opal suggested. You create a volume which refers to a cache folder on the host.
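A hedged sketch of the kind of host-folder mount being described, for a Maven local repository (the host path, image tag, and in-container repository path are assumptions):

# reuse the host's Maven repository across builds of this slave
docker run --rm \
  -v /var/cache/jenkins-m2:/root/.m2 \
  -v "$PWD:/workspace" \
  -w /workspace \
  maven:3-jdk-8 mvn -B package

As noted above, a shared mount like this is only safe if the jobs that use it never run at the same time, or if the path includes the job name.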
