Handling complex and large dependencies

Problem
I've been developing a game in C++ in my spare time and I've opted to use Bazel as my build tool since I have never had a ton of luck (or fun) working with make or cmake. I also have dependencies in other languages (Python for some of the high-level scripting). I'm using glfw for basic window handling and high-level graphics support, and that works well enough, but now comes the problem: I'm unsure how I should handle dependencies like glfw in a Bazel world.
For some of my dependencies (like gtest and fruit) I can just reference them in my WORKSPACE file and Bazel handles them automagically but glfw hasn't adopted Bazel. So all of this leads me to ask, what should I do about dependencies that don't use Bazel inside a Bazel project?
Current approach
For many of the simpler dependencies I have, I simply created a new_git_repository entry in my WORKSPACE file and created a BUILD file for the library. This works great until you get to really complicated libraries like glfw that have a number of dependencies on their own.
When building glfw for a Linux machine running X11 you now have a dependency on X11, which would mean adding X11 to my Bazel setup. X11 comes with its own set of dependencies (X11 libraries like Xcursor) and so on.
glfw also tries to provide basic joystick support, which is available by default on Linux, which is great! Except that it is provided by the kernel, which means that the kernel is also a dependency of my project. Now, I shouldn't need anything more than the kernel headers, but this still seems like a lot to bring in.
Alternative Options
The reason I took the approach I've taken so far is to make the dependencies required to spin up a machine that can successfully build my game very minimal. In theory they just need a C/C++ compiler, Java 8, and Bazel and they're off to the races. This is great since it also means I can create a Docker container that has Bazel installed and do CI/CD really easily.
I could sacrifice this ease and just say that you need to have libraries like glfw installed before attempting to compile the game, but that brings back the whole "which version is installed and how is it all configured" problem that Bazel is supposed to help solve.
Surely there is a simpler solution and I'm overthinking this?

If the glfw project has no BUILD files, then you have the following options:
Build glfw inside a genrule.
If glfw supports some other build system like make, you could create a genrule that runs the tool. This approach has obvious drawbacks, like the not-to-be-underestimated impracticality of having to declare all inputs of that genrule, but it'd be the simplest way of Bazel'izing glfw.
Pre-build glfw.o and check it into your source tree.
You can create a cc_library rule for it, and put the .o file in the srcs. Even though this solution is the least flexible of all because you not only restrict the target platform to whatever the .o was built for, but also make it harder to reproduce the whole build, the benefits are sometimes worth the costs.
I view this approach as a last resort. Even in Bazel's own source code there's one cc_library.srcs that includes a raw object file, because it was worth it, as the commit message of 92caf38 explains.
Require that glfw be installed.
You already considered this option. Some people may prefer this to the other approaches.

Related

With Bazel, how do I make sure objects taken from the cache have been built for the right system/libraries?

I got some strange glibc-related linker errors for builds with distributed build cache configured on build nodes running different Linux distributions.
Now I somehow suspect build artifacts from those machines with different glibc versions getting mixed up, but I don't know how to investigate this.
How do I find out what Bazel takes into account when building the hash for a certain build artifact?
I know I can explicitly set environment variables which then will affect the hash. But how can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?
And how do I check/compare what's been taken into account?

This is a complex topic and a multifaceted question. I am going to answer in the following order:
How do I check/compare what's been taken into account?
How to investigate against which glibc a build linked?
How can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?
How do I check/compare what's been taken into account?
To answer this, you should look into the execution log; specifically, you can read up on https://bazel.build/remote/cache-remote#compare-logs. The *.json execution log should contain everything you need to know (granted, it might be a bit verbose) and is a little easier to process with shell-magic/your editor.
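As a rough illustration of processing two such logs, here is a minimal Python sketch. It assumes each log is a stream of concatenated JSON objects (one per action) and that every object carries a `listedOutputs` list and an `inputs` list whose entries have `path` and `digest.hash`. Those field names are assumptions and should be checked against your own log; the helper names are hypothetical.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON object from a stream of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def index_by_output(log_text):
    """Map the first listed output of each action to {input path: digest hash}.

    Field names (listedOutputs, inputs, path, digest.hash) are assumptions.
    """
    index = {}
    for spawn in iter_json_objects(log_text):
        outputs = spawn.get("listedOutputs") or []
        if not outputs:
            continue
        index[outputs[0]] = {
            f.get("path", ""): f.get("digest", {}).get("hash")
            for f in spawn.get("inputs", [])
        }
    return index


def diff_logs(log_a, log_b):
    """Return {output: sorted input paths whose digests differ between the logs}."""
    a, b = index_by_output(log_a), index_by_output(log_b)
    result = {}
    for output in sorted(a.keys() & b.keys()):
        paths = set(a[output]) | set(b[output])
        changed = sorted(p for p in paths if a[output].get(p) != b[output].get(p))
        if changed:
            result[output] = changed
    return result
```

Running `diff_logs` on the logs from two build nodes would point at inputs (for example system headers) that differ between the machines, or show that nothing differs, which suggests the differing file was never declared as an input.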
How to investigate against which glibc a build linked?
From the execution log, you can get all the required hashes to retrieve cached artifacts/binaries from your remote cache. Given these files, you should be able to use standard tools to get to the glibc version (ldd -r -v binary | grep GLIBC).
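The `grep GLIBC` output lists versioned symbol requirements such as `GLIBC_2.34`; the highest one is the minimum glibc the binary needs at run time. A small Python sketch for pulling that number out of the tool output follows. The function name is hypothetical, and the regular expression assumes the usual `GLIBC_<major>.<minor>` spelling.

```python
import re

# Matches versioned symbol tags such as GLIBC_2.34 or GLIBC_2.2.5.
_GLIBC_RE = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def required_glibc(tool_output):
    """Return the highest GLIBC_x.y version mentioned in the text, or None."""
    versions = {
        tuple(int(part) for part in match.split("."))
        for match in _GLIBC_RE.findall(tool_output)
    }
    if not versions:
        return None
    # Compare numerically so that 2.34 ranks above 2.4.
    return ".".join(str(part) for part in max(versions))
```

Feeding it the output captured for artifacts from two different build nodes makes it easy to see whether one of them was linked against a newer glibc than the other node provides.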
How can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?
This depends on how you have set up your compilation toolchain. The best case would be a fully hermetic compilation toolchain, where all necessary files are declared using attributes like https://bazel.build/reference/be/c-cpp#cc_toolchain.compiler_files.
But this would also mean locking down the compiler sysroot, which should include all libraries you are linking against if you want full hermeticity. If you want to use some system libraries, you need to tell Bazel where to find them and to factor in their hash: https://stackoverflow.com/a/43419786/20546409 or https://www.stevenengelhardt.com/2021/09/22/practical-bazel-depending-on-a-system-provided-c-cpp-library/
If you use the auto-detected compiler toolchain, some tricks are used to lock down the sysroot paths, but expect some non-hermeticity. https://github.com/limdor/bazel-examples/tree/master/linux_toolchain is a nice write-up on how to move from the auto-detected toolchain to something more hermetic.
The hack
Of course, you can hack around this. Note, this is inherently a bad idea:
create a script that inspects the system and determines everything important, like the glibc version and maybe the Linux distribution (flavor)
have it create a string describing this variation and hash it
use that hash as the instance key/name for your remote cache
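The three steps above can be sketched in Python roughly as follows. This is only an illustration of the hack, not a recommendation: which properties count as "important" is an assumption (libc version, machine architecture, distribution ID), and the function names are hypothetical.

```python
import hashlib
import platform


def read_distro_id(path="/etc/os-release"):
    """Return 'ID-VERSION_ID' from an os-release style file, or 'unknown'."""
    fields = {}
    try:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                key, sep, value = line.strip().partition("=")
                if sep:
                    fields[key] = value.strip('"')
    except OSError:
        return "unknown"
    return f"{fields.get('ID', 'unknown')}-{fields.get('VERSION_ID', 'unknown')}"


def describe_system(libc=None, machine=None, distro=None):
    """Build a stable string describing the host; arguments override detection."""
    if libc is None:
        name, version = platform.libc_ver()
        libc = f"{name}-{version}"
    if machine is None:
        machine = platform.machine()
    if distro is None:
        distro = read_distro_id()
    return f"libc={libc};machine={machine};distro={distro}"


def cache_instance_key(description):
    """Hash the description into a short key usable as a cache instance name."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    print(cache_instance_key(describe_system()))
```

The printed key could then be handed to Bazel, for example through `--remote_instance_name`, so that machines with different system libraries do not share cache entries. Anything the script fails to inspect still leaks into the cache unnoticed, which is why this remains a hack.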

Bazel: how to disentangle the 81 ways to configure a C/C++ build

I'm writing a new C/C++ library with tasks. It's low level, so it'll need the ability to tune the build for CPU, O/S, whether libraries (or tasks) are built opt or dbg, and which network library is used (e.g. generic TCP, Solarflare, Mellanox, Infiniband). This seems like an ideal Bazel use-case.
I have basic Bazel working e.g. I can build sample tasks, libraries, with dependencies. And so far that works nicely.
Now, to the point: as the code is just C with some C++, it seems impractical to build a whole new toolchain. What Bazel ships with and/or can do on GCC/Clang is good enough; most defaults are fine. Can't I just customize it? Still, I need a simple way to accomplish the following:
allow developers to choose their compiler, typically clang or gcc, and its version
allow developers to decide how to build their code, e.g. x86-32 dbg libraries but x86-32 opt binaries, or 64-bit versions of the same ...
allow developers to select which network library to link in, and therefore build it but not the others
for dbg builds, developers may want to toss in an extra compilation flag and be assured it's used everywhere when building libraries and tasks
limit developers to valid configurations. For example, if I know the code works on Intel x86 chips, 32-bit or 64-bit, but PPC processors aren't supported, then developers may have lots of dials to turn for x86, but it's desirable to stop with an error when cpu=ppc
Which basic approach is best?
allow Bazel to auto-discover the platform, toolchain and instruct developers to modify it to task? How?
provide a copy-and-pasted-and-edited custom c/c++ tool chain .bzl file?
Focus on CROSSTOOL only?
Ship the library with customized platform and tool chain files?
TIA

Using large non-bazel dependencies in a bazel project

I would like to use a very large non-bazel system in a bazel project. Specifically, ROS2. This dependency provides a large number of python, C, and C++ libraries which are built using its own hand-rolled buildsystem. Obviously, I would like to avoid having to translate the entire buildsystem over to bazel.
Broadly, what's the best way of doing this? My instinct was to use a custom repository rule to download the source (since it's split across many repositories), then use a genrule to call the ROS2 build system, and then write simple cc_import and py_library rules for each of the individual components that I need.
However, I'm having trouble with the bit where I need to call the foreign build system. It seems that genrules require a list of output files to be specified, while I would like it to make an entire build directory available.
Before I spend any more time on this, I thought I'd ask whether I'm on the right lines, since I'm new to Bazel. Is this a good strategy? How would you approach this problem? Are there any other projects that mainly use Bazel but call other build systems in this way that I can look at?

More recently, you can use rules_foreign_cc to build native CMake or configure/make-style projects from Bazel.

What exactly is "building" from source and how does it work

I really can't understand how this works, so let me explain. First, just in case you need it, I am running Ubuntu 12.04 64-bit on a laptop.
As a build tool I am using CMake. I want to load OpenCV, MRPT (http://www.mrpt.org/) and libfreenect into my project. All of them have "source code". What I don't understand is what they mean by "build from source". How do I make a project with all of them?
Do I need to build each one individually and then somehow bring them into my project, OR do I download the source code and build them all together at once? As you can see, I'm really confused about what I have to do... do I run the CMakeLists.txt from each source tree and then run one CMakeLists.txt that includes all the other CMakeLists.txt files?
In fewer words: if I want to build two or more libraries from source, how do I do that?
I would like a general answer (how this "build from source" works) and an answer specific to the ones I mentioned (CMake, OpenCV, MRPT, libfreenect). I hope I made clear what I don't really understand.

It depends on the 'master' project. In general, in the C/C++ universe, your project must know how to invoke the build process of each subproject/library, OR your project needs to know how to include and link the results after you build each external project yourself.
You can also mix the two approaches if needed, but I think it is cleaner to stick to one if possible.
In the first case, if all the subprojects offer CMake build files (CMakeLists.txt), you may try to add_subdirectory() each and see if there are any conflicts. For example, Google Test can be easily included this way, and it gives your project some global variables that ease linking later.
Alternatively, if the above approach gives problems (for example, targets that conflict with your project) or the subproject doesn't provide a CMakeLists.txt, you can use ExternalProject_Add(). It takes more work, and you have to handle the include/link configuration with your project manually, but it makes the subproject more independent.
The last approach involves building and installing the subprojects separately, using configuration variables in your project to point to the include/library paths of each subproject. Check CMake:How To Find Libraries for details.

Could Free Pascal benefit from something like Apache Maven?

Apache Maven is a very popular build and dependency management tool in the Java open source ecosphere. I did some tests to find out if it can handle compiled Free Pascal / Delphi units and found it easy to implement. So it would be possible to
release open source libraries precompiled for Free Pascal (or Delphi) in a public Maven repository
include metadata in this repository which contains dependency information
use Maven on the command line to download the open source library from the public repository, and automatically resolve all dependencies
local repositories, working as proxies, could be used to cache frequently used binaries
automatic checksum generation and verification (provided by Maven) would reduce the risk of downloading corrupted binaries
source code and even documentation files could be provided with the binaries
binaries can be provided with or without debug information
continuous integration servers like Hudson, TeamCity or CruiseControl can be used to build projects whenever changes have been submitted to the source control system and notify developers about build errors
This way of dependency management could be very beneficial for open source projects which use many third party libraries with complex dependencies. It would avoid typical conflicts caused by using wrong versions.
For the developer, the workflow for editing and building a project would be reduced to a minimum:
checkout the project source from internal version control system
edit source file(s)
run mvn package to automatically download all required third party libraries (precompiled units) if they are not yet in the workstation's local repository
compile and run
The only additional file for Apache Maven which is required in the project folder is the POM.XML file containing the project information.
Edit: while Maven is usable for some of the required tasks, implementing a solution like Maven in native Free Pascal would have some advantages: no Java SDK required, support for all development platforms where Free Pascal is available, maintenance and plugin development in Pascal.
Usage of a Maven-like tool would not be helpful for open source projects only - commercial projects could access and use the artifacts in public Maven repositories in the same way as well.
Maven features are listed at http://maven.apache.org/maven-features.html
Update:
one use case could be the build of Lazarus, where Maven would download all required libraries and invoke the compiler with the necessary build path arguments. Changes in the dependencies on lower levels would be propagated automatically up to the parent build.
Possible benefits:
less time needed to set up a new work station, no manual installation of third party libraries required
less errors caused by wrong library versions, detection of version conflicts (for example if two libraries depend on different versions of a third library)
artifacts which are created inhouse can be added to the local maven repository and shared between developers and project, central storage of all artifacts with metadata
builds are reproducible, just by using the same source and project metadata file (pom.xml)
can reduce development time and increase project stability
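To make the version-conflict case concrete (two libraries depending on different versions of a third), here is a minimal Python sketch of such a check over declared dependency metadata. It is a generic illustration, not Maven's actual resolution algorithm, and the library names and versions in the usage example are made up.

```python
from collections import defaultdict


def find_version_conflicts(dependencies):
    """Find dependencies that are requested in more than one version.

    dependencies: {library: {dependency name: required version}}
    Returns {dependency name: {version: sorted list of requesting libraries}}.
    """
    required = defaultdict(lambda: defaultdict(list))
    for library, deps in dependencies.items():
        for name, version in deps.items():
            required[name][version].append(library)
    return {
        name: {version: sorted(libs) for version, libs in versions.items()}
        for name, versions in required.items()
        if len(versions) > 1
    }


if __name__ == "__main__":
    # Hypothetical metadata: both libraries agree on rtl but not on zlib.
    declared = {
        "libA": {"zlib": "1.2", "rtl": "3.0"},
        "libB": {"zlib": "1.3", "rtl": "3.0"},
    }
    print(find_version_conflicts(declared))
```

A check like this only works if every library publishes its dependency metadata, which is the point of keeping that metadata in the repository next to the binaries.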
Update #2: FPMake
the FPMake build system for Free Pascal seems to be a tool with much potential, in many details it is quite similar to Maven:
FPMake is a pascal based build system developed for and distributed with FPC
FPMake standardizes the building by defining some limits like standard directories
the command fppkg <packagename> will look in a database for the package, extract it, and then compile fpmake.pp and run it
it has standard build targets (clean, build, install, ...)
it can create a 'manifest' file suitable for import into a repository (like mvn deploy or mvn install), the manifest is an XML file which looks very similar to a pom.xml in Maven:
FPMake manifest file:
<packages>
  <package name="my-package">
    <version major="0" minor="7" micro="6" build="1"/>
    <filename>my-package-0.7.6-1.zip</filename>
    <author>my name</author>
    <license>GPL</license>
    <homepageurl>http://www.freepascal.org/</homepageurl>
    <email>myname#freepascal.org</email>
    <description>this is the package description</description>
    <dependencies>
      <dependency>
        <package packagename="rtl"/>
      </dependency>
    </dependencies>
  </package>
</packages>
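A repository tool would need to read such a manifest to learn package names, versions and dependencies. The Python sketch below does that for the example above. It relies only on the element and attribute names visible in that example and is not a description of the full FPMake manifest schema; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_manifest(xml_text):
    """Return a list of (name, version, dependency names) tuples."""
    root = ET.fromstring(xml_text)
    packages = []
    for package in root.findall("package"):
        version_el = package.find("version")
        version = None
        if version_el is not None:
            # Mirror the file name convention of the example: 0.7.6-1
            version = "{}.{}.{}-{}".format(
                version_el.get("major"),
                version_el.get("minor"),
                version_el.get("micro"),
                version_el.get("build"),
            )
        deps = [
            dep.get("packagename")
            for dep in package.findall("dependencies/dependency/package")
        ]
        packages.append((package.get("name"), version, deps))
    return packages
```

Applied to the manifest shown above, this yields one entry for my-package with version 0.7.6-1 and a single dependency on rtl.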

Free Pascal has been working on a package system of its own, a cross between apt-get and FreeBSD ports style (download source/build/install automatically), called fppkg.
However, work has stalled. People investing time are the bottleneck, not people wanting to choose tools.
As far as Maven goes, I don't like auxiliary tools that need installation of huge external runtimes. It might be fine for a big major app (like Open Office), but not for a utility.
I also prefer a tool that is designed for the FPC reality and workflow.
Documentation tools, build tools, download systems, testsuite systems are already all there; it just needs a person who dedicates a lot of time to it to make it happen.
Some typical problems when introducing a new technology in a project like FPC, and why it has a tendency to make its own tools:
need to train 20+ part-time committers.
The only COMMON programming language you can assume is Free Pascal. Even Delphi inner workings can't be taken for granted to be known (many committers came directly to FPC or even still via TP or a Mac Pascal)
Obviously that makes something with plugins in a different language annoying.
Bash script is a close second. (g)make third, but already a magnitude less.
All servers are *nix-like (FreeBSD, OS X, Linux), but not all run Apache. (e.g. my FreeBSD mirror runs XSHTTPD)
somebody most knowledgeable must be the dedicated maintainer for a long time: fix problems, update/do migrations etc. Preferably more than one, for obvious reasons.
a major pain are Linux distributions (and FreeBSD to a lesser degree); most maintainers of *nix packages are not capable of more than "./configure;make;make install", and must be spoonfed with a near buildable repository and auxiliary files.
In-distribution packaging of FPC/Lazarus has always been important, and is still increasing
All distributions have their own special rules about metadata, dependencies, and how sources must be published. Particularly Debian/Ubuntu is very bureaucratic and slow.
Most don't like third party auto-installers on top of their systems (since that bypasses their dependency control)
This all leads to the effective practice that own tools in Pascal with minimal scripting work best. Some tools used:
Gmake is mainly used to parameterise the build process on a per directory level, a successor, fpcmake (not really a make derivative despite the name) has begun, but the migration hasn't completed.
LaTeX and a LaTeX to HTML conversion (tex4ht, but Debian uses hevea) are used in the documentation building (the non library documentation)
The community site (netscape community server which uses TCL scripting, a heavy complex application server) has been trouble ever since it started, but especially lately since the maintainer became less active.
Mantis has been a problem (especially the email module, which would crash or cripple the server due to the volume), but it has been whipped into shape during successive updates and hard work of several Lazarus devels. Currently it is a decent workhorse.
The lazarus.freepascal.org PHPBB forum OTOH is relatively painless since a lot of younger people know how to deal with it.
The same goes for Subversion (though the more advanced scale needs some adjusting, not everybody is deep into the ins and outs of mergetracking)
If somebody was really serious about Maven, I usually would ask him:
to CRITICALLY investigate the use for the project, in a very concrete way, with schedule and time estimates. Bird's-eye level "everything's possible" overviews are essentially worthless.
Give some thought on future change of used technologies. Every technology is eventually replaced, even the in-house ones, in 18 year+ projects. A new technology must not make migrations of other infrastructural components hard or involved. The new technology to end all new technologies doesn't exist.
Make a migration plan. Migration is often underrated and underestimated.
And in the end, there is always the 1000000 Euro question, who will do the daily maintenance?
Keep in mind that in a company you just kick the person responsible for the application server. But in an informal environment this is way harder, especially long term, since people's lives, occupations and time spent on the project vary.

Sounds like an interesting plan, but the Delphi community (and FPC even more so, I'd imagine!) values libraries as source far more than precompiled libraries. The general consensus is that anyone who uses a binary-only library is a fool, for two reasons: You can't fix any bugs you find in it, and compiler changes will break compatibility.
