How to stop TeamCity from rebuilding docker dependencies every time?

I have a TeamCity build project that parameterizes a docker-compose.yml template with the build versions of a dozen Docker containers. In order to get the build_counter from each container, I have them set as snapshot dependencies in the docker-compose build job. Each container's Dockerfile and other files are in their own BitBucket repo, and they have triggers for the appropriate files. In the snapshot dependencies of the docker-compose build I have them set to "Do not run new build if there is a suitable one", but it still tries to run all of the dependent builds even though there aren't any changes in their respective repos.
This makes what should be a very simple and quick build into a very long one. Often, one of the dependent builds will fail with "could not collect changes: connection refused", and I suspect it has to do with TeamCity trying to hit all of these different repos at once.
Is there something I can do to not trigger a build of every dependency every time the docker-compose build is run?
Edit:
Here's an example of what our docker-compose.yml.j2 looks like: http://termbin.com/b2xy
Obviously, I've sanitized it for sharing, and our real docker-compose template has about a dozen services listed.
Here is an example Dockerfile for one of the services: http://termbin.com/upins

Rather than changing the source code of your build (the parameterized docker-compose.yml) and brute-forcing your build every time, you could consider building the containers independently, tagging them with a version increment and labels. After the build, store the images in a local registry. Use docker-compose to suit your runtime needs. docker-compose can use multiple yaml files, so if you need other images for a particular build, just pull the other images you need. For production, use another yaml file that composes the system to run. Add LABEL to your Dockerfile. See http://label-schema.org//rc1/ for a set of labels that suit your needs.
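As a minimal sketch of that flow (the registry address, service name, version and override file name below are made up):

# build, label and version one service independently, then publish it to the local registry
docker build -t registry.local:5000/service-a:1.0.42 --label org.label-schema.vcs-ref=$(git rev-parse --short HEAD) .
docker push registry.local:5000/service-a:1.0.42

# compose the runtime from the published images; a second yaml file layers environment-specific settings on top
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d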

I know this is an old question, but I came across this issue and you can't do what sounds reasonable, i.e. get recent green builds without rebuilding. This is partly down to what snapshot dependencies are designed to do by JetBrains.
The basic idea is that dependencies are for synchronized revisions of code: that is if you build Compose at a certain time, it will need to use not just its own versions of source code at that point in time but also the code for all the dependencies that also comes from that point of time, regardless of whether anything significant has changed.
In my example, there were always changes because the same repo was used for lots of projects and had unrelated changes that would not trigger a build but would make the project appear behind and cause a build.
If your dependencies have NOT changed and show no changes pending, then they shouldn't build. In this case, you need to tick "Do not run new build if there is a suitable one". "Enforce Revisions Synchronization" is slightly confusing. If ticked, it will find older builds that match the first build after your build was triggered. If unticked, it can use newer builds.

Related

Minimize build time for Gradle project on Docker

Imagine that I need to build a big CUBA application (it uses Gradle to manage dependencies and produces a .war in the build).
I need to dockerize both the build and the application. The latter is run in a Tomcat image into which the .war is copied.
Most of the dependencies actually remain unchanged between consecutive builds of the project, but the build seems to go over them each time, taking forever...
I'd like to produce a custom Docker image from gradle:jdk8 (kinda) that imports all the Gradle dependencies.
This image will be used for consecutive builds to produce the .wars and will be rebuilt only when there is a change in the dependencies' versions.
Though, I'm quite new to Gradle and I don't know:
if it is possible to import the dependencies without building the project;
if it is actually possible to use previously imported dependencies to build the project in a shorter time.
Any advice/suggestion? Is this possible?
Hope my question is clear, but I have difficulties in explaining my aim. Ask me for a better explanation.
Thanks in advance.
You mean that you want to build a Docker image for a build runner (or build agent), right?
It's not possible to import the dependencies without building the project, because Gradle resolves dependencies lazily, only when they are needed.
E.g. artifacts to build a CUBA theme are resolved only when the web theme is built.
Yes, it's possible to re-use previously downloaded library artifacts (cached in ~/.gradle/caches) to build the project in a shorter time.
So in your case you need to create build runner's docker image by fully building your project once in a Docker container. Dependencies will be downloaded and cached in the file system. Then you can pull that image and use it again for subsequent builds, avoiding re-downloading artifacts.
If you change CUBA platform version in your project, you'll need to re-create the build runner image if you want to avoid downloading cuba-*.jar artifacts for every build.
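A minimal sketch of such a build-runner Dockerfile, assuming the project sources are in the build context (the paths and cache location are made up):

FROM gradle:jdk8
# keep the dependency cache in an image layer; the base image may declare the default
# Gradle home as a volume, which would discard anything cached there during the build
ENV GRADLE_USER_HOME=/home/gradle/cache
COPY --chown=gradle:gradle . /home/gradle/project
WORKDIR /home/gradle/project
# one full build so Gradle resolves and caches every artifact the project needs
RUN gradle assemble --no-daemon

Subsequent builds run from this image will find the artifacts already cached and only re-download anything whose version has changed.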

Concurrent build within Docker with regards to multi staging

I have a monolithic repo that contains all of my projects. The current setup I have is to bring up a build container, mount my monolithic repo, and build my projects sequentially. Copy out the binaries, and build their respective runtime (production) containers sequentially.
I find this process quite slow and want to improve the speed. The two main approaches I want to take are:
1. Within the build container, build my project binaries concurrently instead of sequentially.
2. Like step 1, also build my runtime (production) containers concurrently.
I did some research and it seems like there are two Docker features that are of interest to me:
Multi-stage building, which allows me to skip worrying about the build container and put everything into one Dockerfile.
--parallel option for docker-compose, which would solve approach #2, allowing me to build my runtime containers concurrently.
However, there are still two main issues:
How do I glue the two features together?
How do I build my binaries concurrently inside the build Docker? In other words, how can I achieve approach #1?
Clarifications
Regardless of whether multi-stage is used or not, there are two logical phases.
First is the binary building phase. During this phase, the artifacts are the compiled executables (binaries) from the build containers. Since I'm not using multi-stage builds, I'm copying these binaries out to the host, so the host serves as an intermediate staging area. Currently the binaries are built sequentially; I want to build them concurrently inside the build container. Hence approach #1.
Second is the image building phase. During this phase, the binaries from the previous phase, which are now stored on the host, are used to build my production images. I also want to build these images concurrently, hence approach #2.
Multi-stage allows me to eliminate the need for an intermediate staging area (the host). And --parallel allows me to build the production images concurrently.
What I'm wondering is how I can achieve approaches #1 & #2 using multi-stage and --parallel. For every project, I can define a separate multi-stage Dockerfile and call --parallel on all of them to have their images built separately. This would achieve approach #2, but it would spawn a separate build container for each project and take up a lot of resources (I use the same build container for all my projects and it's 6 GB). On the other hand, I can write a script to build my project binaries concurrently inside the build container. This would achieve approach #1, but then I can't use multi-stage if I want to build the production images concurrently.
What I really want is a Dockerfile like this:
FROM alpine:latest AS builder
RUN concurrent_build.sh binary_a binary_b
FROM builder AS prod_img_a
COPY binary_a .
FROM builder AS prod_img_b
COPY binary_b .
And be able to run a docker-compose command like this (I'm making this up):
docker-compose --parallel prod_img_a prod_img_b
Further clarifications
The run-time binaries and run-time containers are not separate things. I just want to be able to build the binaries AND the production images in parallel.
--parallel does not use different hosts, but my build container is huge. If I use multi-stage builds, running something like 15 of these build containers in parallel on my local dev machine could be bad.
I'm also thinking about building the binaries and run-time containers separately, but I'm not finding an easy way to do that. I have never used docker commit; would that sacrifice the Docker cache?
Results
My mono-repo contains 16 projects; some are microservices of a few MBs, some are bigger services of about 300 to 500 MBs.
The build includes the compilation of two prerequisites, one being gRPC and the other XDR. Both are trivially small, taking only 1 or 2 seconds to build.
The build contains a node_modules installation phase. NPM install and build is THE bottleneck of the project and by far the slowest.
The strategy I am using is to split the build into two stages:
The first stage is to spin up a monolithic build container and mount the mono-repo into it as a bind volume with cached consistency, then build all of my containers' binary dependencies inside it in parallel using Goroutines (a shell sketch of the same fan-out follows below). Each Goroutine calls a build.sh bash script that does the building. The resulting binaries are written to the same mounted volume. A cache is used in the form of a mounted docker volume, and the binaries are preserved across runs on a best-effort basis.
The second stage is to build the images in parallel. This is done using Docker's Go SDK documented here. This is also done in parallel using Goroutines. Nothing else is special about this stage besides some basic optimizations.
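To illustrate the fan-out in the first stage, here is a plain-shell equivalent of the Goroutine approach (build.sh is the script described above; the project names are made up):

#!/bin/sh
set -e
# launch one background build per project inside the build container
pids=""
for project in service_a service_b service_c; do
    ./build.sh "$project" &    # compiles one project and drops its binary on the mounted volume
    pids="$pids $!"
done
# wait on each job individually so a failed build fails the whole step
for pid in $pids; do
    wait "$pid"
done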
I do not have performance data for the old build system, but building all 16 projects easily took upwards of 30 minutes. That build was extremely basic and did not build the images in parallel or use any optimizations.
The new build is extremely fast. If everything is cached and there are no changes, the build takes ~2 minutes. In other words, the overhead of bringing up the build system, checking the cache, and building the same cached docker images is ~2 minutes. If there is no cache at all, the new build takes ~5 minutes. A HUGE improvement over the old build.
Thanks to #halfer for the help.
So, there are several things to try here. Firstly, yes, do try --parallel; it would be interesting to see the effect on your overall build times. It looks like you have no control over the number of parallel builds though, so I wonder if it would try to do them all in one go.
If you find that it does, you could write docker-compose.yml files that only contain a subset of your services, such that you only have five at a time, and then build against each one in turn. Indeed, you could write a script that reads your existing YAML config and splits it up, so that you do not need to maintain your overall config and your split-up configs separately.
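For instance, a hypothetical subset file (the file and service names are made up):

# docker-compose.subset-a.yml -- one slice of the full set of services
version: "3"
services:
  service_a:
    build: ./service_a
  service_b:
    build: ./service_b

which you would then build with:

docker-compose -f docker-compose.subset-a.yml build --parallel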
I suggested in the comments that multi-stage would not help, but I think now that this is not the case. I was wondering whether the second stage in a Dockerfile would block until the first one is completed, but this should not be so - if the second stage starts from a known image then it should only block when it encounters a COPY --from=first_stage command, which you can do right at the end, when you copy your binary from the compilation stage.
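That ordering looks something like this sketch (the image, script and binary names are made up):

FROM build_env:latest AS compile
RUN ./build.sh service_a                       # assumed to produce /out/service_a

FROM alpine:latest AS runtime
RUN apk add --no-cache ca-certificates         # runtime setup that does not need the compile stage
# the only point where this stage depends on the compile stage, kept right at the end:
COPY --from=compile /out/service_a /usr/local/bin/service_a
CMD ["/usr/local/bin/service_a"]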
Of course, if it turns out that multi-stage builds are not parallelised, then docker commit would be worth a try. You've asked whether this uses the layer cache, and the answer is I don't think it matters - your operation here would be:
Spin up the binary container to run a shell or a sleep command
Spin up the runtime container in the same way
Use docker cp to copy the binary from the first one to the second one
Use docker commit to create a new runtime image from the new runtime container
This does not involve any network operations, and so should be pretty quick - you will have benefited greatly from the parallelisation already at this point. Note that docker cp cannot copy directly between two containers, so each copy streams a tar through the host. If the binaries are of non-trivial size, you could even try parallelising your copy operations:
docker cp binary1:/path/to/binary - | docker cp - runtime1:/path/to &
docker cp binary2:/path/to/binary - | docker cp - runtime2:/path/to &
docker cp binary3:/path/to/binary - | docker cp - runtime3:/path/to &
Note, though, that these are disk-bound operations, so you may find there is no advantage over doing them serially.
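Putting the four steps together for one binary/runtime pair, a hypothetical run (all container and image names are made up) might look like:

docker run -d --name binary1 compiled_binaries_img sleep 3600
docker run -d --name runtime1 runtime_base_img sleep 3600
docker cp binary1:/path/to/binary - | docker cp - runtime1:/path/to
docker commit runtime1 prod_img_a:candidate
docker rm -f binary1 runtime1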
Could you give this a go and report back on:
your existing build times per container
your existing build times overall
your new build times after parallelisation
Do it all locally to start off with, and if you get some useful speed-up, try it on your build infrastructure, where you are likely to have more CPU cores.

Build chain in the cloud?

(I understand this question is somewhat out of scope for Stack Overflow, because it contains several problems and is somewhat vague. Suggestions on how to ask it in the proper way are welcome.)
I have some open source projects that depend on each other.
The code resides in GitHub, the builds happen on Shippable, using Docker images which in turn are built on Docker Hub.
I have set up an artifact repo and a Debian repository where Shippable builds put the packages, and the Docker builds use them.
The build chain looks like this in terms of deliverables:
pre-zenta docker image
zenta docker image (two steps of docker build because it would time out otherwise)
zenta debian package
zenta-tools docker image
zenta-tools debian package
xslt docker image
adadocs artifacts
Currently I am triggering the builds by pushing to GitHub, and sometimes rerunning failed builds on Shippable after the Docker build has run.
I am looking for solutions for the following problems:
Where to put the Dockerfiles? Now they are in the repo of the package that needs the resulting Docker image for its build. This way all the information needed to build the package is in one place, but sometimes I have to trigger an extra build to have the package actually built.
How to trigger builds automatically?
..., in a way that supports git-flow? For example, if I change the code in the zenta develop branch, I want to make sure that zenta-tools will build and test against the development version of it before merging to master.
Is there a tool with which I can get an overview of the health of the whole build chain?
Since your question is related to Shippable, I've created a support issue for you here - https://github.com/Shippable/support/issues/2662. If you are interested in discussing the best way to handle your scenario, you can also send me an email at support#shippable.com. You can set up your entire flow, including building the Docker images, using Shippable.

Where to keep Dockerfiles in a project?

I am gaining knowledge about Docker and I have the following questions:
Where are Dockerfiles kept in a project?
Are they kept together with the source?
Are they kept outside of the source? Do you have a separate Git repository just for the Dockerfiles?
If the CI server should create a new image for each build and run that on the test server, do you keep the previous image? I mean, do you tag the previous image or do you remove the previous image before creating the new one?
I am a Java EE developer, so I use Maven, Jenkins etc., if that matters.
The only restriction on where a Dockerfile is kept is that any files you ADD to your image must be beneath the Dockerfile in the file system. I normally see them at the top level of projects, though I have a repo that combines a bunch of small images where I have something like
top/
  project1/
    Dockerfile
    project1_files
  project2/
    Dockerfile
    project2_files
The Jenkins docker plugin can point to an arbitrary directory with a Dockerfile, so that's easy. As for CI, the most common strategy I've seen is to tag each image built with CI as 'latest'. This is the default if you don't add a tag to a build. Then releases get their own tags. Thus, if you just run an image with no arguments you get the last image built by CI, but if you want a particular release it's easy to say so.
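As a sketch of that tagging scheme (the image name and release version are made up):

docker build -t mycompany/service-a .                        # CI build: no tag given, so it becomes :latest
docker tag mycompany/service-a mycompany/service-a:1.4.0     # a release additionally gets its own tag
docker push mycompany/service-a:latest
docker push mycompany/service-a:1.4.0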
I'd recommend keeping the Dockerfile with the source as you would a makefile.
The build context issue means most Dockerfiles are kept at or near the top-level of the project. You can get around this by using scripts or build tooling to copy Dockerfiles or source folders about, but it gets a bit painful.
I'm unaware of best practice with regard to tags and CI. Tagging with the git hash or similar might be a good solution. You will want to keep at least one generation of old images in case you need to rollback.
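For example, tagging with the commit hash might look like this (the image name is made up):

docker build -t mycompany/service-a:$(git rev-parse --short HEAD) .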

TFS 2013 build agents sharing common build folder

I'm using TFS 2013 on premises. I have four build agents configured on a build machine. Several build definitions compile ASP.NET websites. I configured the MSBuild parameters to deploy the IIS application to the integration server, which sits out there in Rackspace.
By default Web Deploy does differential deployments by comparing file dates. In my case that's a big plus, because copying files from our network to Rackspace takes quite some time. Now, in order to preserve file dates, the build agent has to compile the same base set of source code. On every build, only the changed source code yields new DLLs, minimizing the number of files deployed.
All of that works fine, with a caveat: a given build definition has to be assigned to a build agent (by agent name or tag). The problem is that this creates a lot of contention when all the builds assigned to the same agent are queued up. They wait in line until the previous build is done.
In an ideal world any agent should be able to take care of any build, but the source code being compiled has to be the same, regardless of the agent.
I tried changing the working folder of all agents to point to the same location but I get an error because two agents can't be mapped to the same folder. I guess there is one workspace per agent.
Any ideas?
Finally I found a way to do this. Here are all the changes you need to do:
By default the working folder of each agent is $(SystemDrive)\Builds\$(BuildAgentId)\$(BuildDefinitionPath). That means there's one working folder per BuildAgentId. I changed it so that all agents share the same folder: $(SystemDrive)\Builds\WorkingFolder\$(BuildDefinitionPath)
By default, at runtime the workflow creates a workspace named like "[BuildDefinitionId][AgentId][MachineName]". Because all agents share the same working folder, there's an error when trying to create each separate workspace. The solution to this is in the build definition: edit the XAML and look for an activity called "Get sources from Team Foundation Version Control". There's a property called WorkspaceName. Since I want to have one workspace per build definition, I set that property to BuildDetail.BuildDefinition.Name.
Save your customized build template and create a build that uses it.
Make sure the option "1. TF VersionControl/1. Clean workspace" is set to False. Otherwise the build will wipe out all the source code on every build.
Make sure the option "2. Build/3. Clean build" is set to False. Otherwise the build will wipe out the output binaries on every build.
With this setup you can queue up the same build on any agent, and all of them will point to the same source code and bin output. When the source code changes, only the affected binaries are recompiled. I have a custom step in the template that deploys the output files to IIS, to all the servers in our web farm, using msdeploy.exe. Now my builds + deployments take one or two minutes, because only the DLLs or content that changed during the build are synchronized to the servers.
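That custom deployment step is roughly equivalent to running something like the following per server (the site name, paths and server name are made up; authentication options omitted):

msdeploy.exe -verb:sync -source:contentPath="C:\Builds\WorkingFolder\MySite\_PublishedWebsites\MySite" -dest:contentPath="Default Web Site/MySite",computerName="web01"

Because Web Deploy compares file dates, only the files whose timestamps changed in the shared output folder get pushed across.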
You can't run two build agents in the same folder. The point of build agents is to run multiple builds in parallel, usually on separate PCs. If you try to run them on the same source code, then (a) it's pointless as two build of exactly the same source should produce identical results, and (b) they are almost certainly going to trip over each other and cause the builds to fail or produce unexpected results.
If you want to be able to build and then deploy a series of versions of your codebase, then there are two options:
if you queue up multiple builds, then the last one will "win", so the intermediate builds are of no real value. So if you check in new code before your first build completes, you may as well stop the active build and start a new one. You should be asking yourself why the build is so slow, or why you are checking in changes so often that this is necessary.
if each build produces an incremental update to the deployed result, then you need to pass the output of your builds to some deployment agent that is able to diff it against the deployed version and send only the changes to be deployed. This could be set up to gather results from multiple build agents if that would be beneficial.
But I wonder if perhaps your build is slow because you are doing a complete build each time (which cleans the build folder, gets all the sources, and does a full rebuild), when what you want is an incremental build (which gets the latest changes, compiles only what is affected, and completes quickly). Perhaps you should investigate making your build incremental.
