How do I build a large project with Bazel across two servers to speed up the build process?
When you have a large codebase, chains of dependencies can become very deep. Even simple binaries can often depend on tens of thousands of build targets.
https://bazel.build/basics/dependencies
The documentation for remote builds starts here:
https://bazel.build/remote/rbe
In particular, you might look into the self-service and commercial remote execution services listed here:
https://bazel.build/community/remote-execution-services
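For a concrete starting point: a remote setup usually comes down to a handful of flags in your .bazelrc. The sketch below assumes you run (or subscribe to) a remote execution service reachable at a gRPC endpoint; the endpoint URL, instance name, and job count are placeholders, not values from the Bazel docs.

```
# .bazelrc (sketch): point builds at a remote execution and cache endpoint.
# The grpcs:// URL and instance name below are placeholders for your own service.
build --remote_executor=grpcs://remote.example.com:443
build --remote_cache=grpcs://remote.example.com:443
build --remote_instance_name=main
# Allow many actions in flight, since the work now runs on remote machines.
build --jobs=200
```

With that in place, a plain `bazel build //your:target` will ship actions to the remote executors and reuse cached results across machines and users.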
Related
I'm looking for a way to use Jenkins to build a single code base for multiple CPU architectures. At the moment this is amd64 and armhf, although this may expand in the future. The ideal situation would be to run the build over a number of different Jenkins slaves with different CPU architectures.
These build jobs are not compiler-based (Maven, Gradle, etc.) but system-independent shell scripts (bash and Python) which auto-detect their CPU architecture and produce build artifacts to match the CPU.
I may be missing something really obvious, but I don't see a way to automatically run a build a number of times over different architectures or bind a specific build to a specific architecture.
Could anyone point me in the right direction?
Funny you should ask this question right now. Published last Friday (2019-11-22)...
You should review the Jenkins blog post, Welcome to the Matrix:
"I often find myself needing to run the same actions on a bunch of different configurations. Up to now, that meant I had to make multiple copies of the same stages in my pipelines. When I needed to make changes, I had to make the same changes in multiple places throughout my pipeline. Maintaining even a small number of configurations was difficult for larger pipelines."
The post walks through a single-configuration pipeline, a pipeline for multiple platforms and browsers, and excluding invalid combinations.
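As a rough illustration of the matrix directive applied to the amd64/armhf case above, here is a minimal declarative Jenkinsfile sketch. The agent labels, the ./build.sh script, and the dist/** artifact path are assumptions; substitute whatever labels your nodes carry and whatever your scripts produce.

```groovy
// Jenkinsfile (sketch): one pipeline, one matrix cell per CPU architecture.
pipeline {
    agent none
    stages {
        stage('BuildAndTest') {
            matrix {
                // Run each cell on a node labelled with the architecture (assumed labels).
                agent { label "${ARCH}" }
                axes {
                    axis {
                        name 'ARCH'
                        values 'amd64', 'armhf'
                    }
                }
                stages {
                    stage('Build') {
                        steps {
                            // The script auto-detects the CPU and produces matching artifacts.
                            sh './build.sh'
                        }
                    }
                    stage('Archive') {
                        steps {
                            // 'dist/**' is a placeholder for wherever your artifacts land.
                            archiveArtifacts artifacts: 'dist/**', fingerprint: true
                        }
                    }
                }
            }
        }
    }
}
```

Each axis value becomes its own cell, so adding a new architecture later is a one-line change to the values list.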
Originally I used Gradle to build my Android project, but recently I migrated it to Bazel, and I find that Bazel really is faster than Gradle. I want to know why, but the Bazel docs don't say much about this. Can anyone help me?
Thanks very much!
Full disclosure: I work on Bazel.
That's not an easy question to answer for two reasons. First, performance is highly dependent on the scenario. For example, we'd generally expect a clean build to be slower than a build where only a single file has changed. Second, I don't know how Gradle works internally, and they've done a lot of work recently to improve Gradle performance.
But I can talk about Bazel and what we're doing to make it fast. We've been working on build performance for ~10 years, starting long before we made it public.
The key feature is that we require all dependencies to be declared, and we track them explicitly. If you use a header file in C++, or depend on a Java library, you must declare this dependency in your BUILD file (and we enforce that these are declared by sandboxing individual actions). There are three effects from this:
First, we can heavily parallelize the build, because we know which things depend on which other things.
Second, we can make incremental builds very fast, because we can tell what parts of the build have to be re-done when you change a specific file (BUILD file, header file, source file, ...).
Third, we almost never have to do clean builds. Other build tools often require 'make clean' to get into a predictable state - since Bazel knows all the dependencies, it can get to a predictable state on every single build.
Another effect is that we can cache remotely (i.e., across users), and even execute on another machine, although neither of these is fully supported at the time of this writing.
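To make the declared-dependency point concrete, here is a minimal BUILD file sketch (the target and file names are purely illustrative). Because every edge is explicit, Bazel can schedule independent targets in parallel and rebuild only what a changed file can actually affect.

```python
# BUILD (sketch): dependencies are declared explicitly, so Bazel knows the full graph.
cc_library(
    name = "math",
    srcs = ["math.cc"],
    hdrs = ["math.h"],  # headers are listed, not silently discovered
)

cc_binary(
    name = "app",
    srcs = ["main.cc"],
    deps = [":math"],  # explicit edge: editing math.h only rebuilds targets that depend on it
)
```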
I have a large solution, with many projects and many files, and only one build configuration, Release. I am using TFS, and the complete rebuild takes like 2 hours.
Is it possible to distribute the build across several agents, so that they compile different projects, or, even better, different files? Something like distcc? I can distribute the build across 10+ different machines, but the build currently runs on only one.
For now, my impression is that agents can only take on specialized jobs, like build, run tests, etc., but cannot split up and distribute a single build.
I have already tried optimizing the build, but the project is still big and could benefit from a parallel build.
You can, but you must roll up your sleeves: there is no built-in template that helps, but Jim explains how to make one.
Do not forget that you can also leverage multiple CPUs/cores, as explained in Building Multiple Projects in Parallel with MSBuild.
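For reference, enabling parallel project builds is a single switch on the msbuild command line; the solution name and configuration below are placeholders.

```
rem /m with no number lets MSBuild use one build node per available CPU core.
msbuild MySolution.sln /m /p:Configuration=Release
```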
Your best option would be to break your solution down into distinct components that can be built separately.
If you separate each piece, and build and test it before publishing it as a NuGet package, you can easily distribute the work across build servers and even rebuild only the pieces that have changed.
This process will also work in the new build system coming in 2015 that does not use XAML.
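If you go the NuGet route, the packaging step per component is small enough to script; the component name, version, and feed URL below are placeholders for your own private feed.

```
rem Package one component and push it to a private feed that the other
rem build servers (and the main solution) consume.
nuget pack MyComponent\MyComponent.csproj -Properties Configuration=Release
nuget push MyComponent.1.0.0.nupkg -Source https://nuget.example.com/feed -ApiKey <key>
```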
Our project group stored the binary files of the project we are working on in an SVN repository for over a year. In the end our repository grew out of control; taking backups of the SVN repo became impossible at one point, since each binary that is checked in is around 20 MB.
Now we have switched to TFS. We are not responsible for backing the repository up; our IT team takes care of it, and because of that we have more network and storage capacity for backups, but we still want to decide what to do with the binaries. As far as I know TFS stores deltas, but for binary files the deltas will be huge, and we might end up reaching our disk space quota one day. I would like to plan things better from the start; I don't want to get caught in a bad situation when it's too late to fix the problem.
I would prefer not to keep builds in source control, but our project group insists on keeping a copy of every binary for reproducing the problems that we see in the production system. I can't get them to pull the source code from TFS, build it, and create the binary, because according to them it is not straightforward.
Does TFS offer a better build versioning method? If someone can share some insight I'd really be grateful.
As a general rule you should not be storing build output in TFS. Occasionally you may want to store binaries for common libraries used by many applications, but tools such as NuGet get around that.
Build output has a few phases in its life, and each phase should be stored in a separate place. For example:
Build output: When code is built (by TFS / Jenkins / Hudson etc.) the output is stored in a drop location. This storage should be considered volatile as you'll be producing a lot of builds, many of which will be discarded.
Builds that have been passed to testers: These are builds that have passed some very basic QA, e.g. it compiles, static code analysis tools are happy, unit tests pass. Once a build has been deemed good enough to be given to test, it should be moved from the drop location to another area. This could be a network share (non-production, as the build can be reproduced). There may be a number of builds that get promoted during the lifetime of a project, and you will want to keep track of which versions the testers are using in each environment.
Builds that have passed test and are in production: Your test team deem the build to be of a high enough quality to ship. As part of your go live process, you should take the build that has been signed off by test and store it in a 3rd location. In ITIL speak this is a Definitive Media Library. This can be a simple file share, but it should be considered to be "production" and have the same backup and resilience criteria as any other production system.
The DML is the place where you store the binaries that are in production (and associated configuration items such as install instructions, symbol files etc.) The tool producing the build should also have labelled the source in TFS so that you can work out what code was used to produce the binary. Your branching strategy will also help with being able to connect the binary to the code.
It's also a good idea to have a "live-like" environment; this should be separate from your regular dev and test environments. As the name suggests, it contains only the code that has been released to production. This enables you to quickly reproduce bugs seen in production.
Two methods that may help you:
Use Team Foundation Build System. One of the advantages is that you can set up retention periods for finished builds. For example, you can order TFS to store the 10 latest successful builds, and the two latest failed ones. You can also tell TFS to store certain builds (e.g. "production builds"/final releases) indefinitely. These binaries folders can of course also be backed up externally, if needed.
Use a different collection for your binaries, with another (less frequent) backup schedule. TFS needs to back up whole collections, but by separating data that doesn't change as frequently as the source you can lower the backup cost. This of course depends on how frequently you are required to have the binaries backed up.
You might want to look into creating build definitions in TFS to give your project group an easy 'one button' push to grab the source code from a particular branch and then build it and drop it to a location. That way they get to have their binaries, and you don't have to source control them.
If you are using a branching strategy where you create Release or RTM branches when you push something to production, then you can point your build definitions at those branches and they can manually trigger them from the TFS portal or from within Visual Studio.
How would you manage the lifecycle and automated build process when some of the projects (C# .csproj projects) are part of the actual build system?
Example:
A .csproj project uses MSBuild tasks that are implemented in BuildEnv.csproj.
Both projects are part of the same product (meaning BuildEnv.csproj changes frequently as the product is being developed; it is not a rarely updated third-party component).
You must factor this out into two separate "projects"; otherwise you'll spend ages chasing your tail trying to find out whether a broken build is due to changes in the build system or changes in the code being developed.
Previously we've factored the two systems out into separate projects in CVS.
You want to be able to vary one thing while keeping the other constant to limit what you would have to look at when performing forensic analysis.
Hope that helps.