With bazel how do I be/make sure objects taken from cache have been build for the right system/libraries? - bazel

I got some strange glibc-related linker errors for builds with distributed build cache configured on build nodes running different Linux distributions.
Now I somehow suspect build artifacts from those machines with different glibc versions getting mixed up, but I don't know how to investigate this.
How do I find out what Bazel takes into account when building the hash for a certain build artifact?
I know I can explicitly set environment variables which then will affect the hash. But how can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?
And how do I check/compare what's been taken into account?

This is a complex topic and a multi-facet question. I am going to answer in the following order:
How do I check/compare what's been taken into account?
How to investigate against which glibc a build linked?
How can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?
How do I check/compare what's been taken into account?
To answer this, you should look into the the execution look, specifically you can read up on https://bazel.build/remote/cache-remote#compare-logs. The *.json execution log should contain everything you need to know (granted, it might be a bit verbose) and is a little easier to process with shell-magic/your editor.
How to investigate against which glibc a build linked?
From the execution log, you can get all the required hashes to retrieve cached artifacts/binaries from your remote cache. Given these files, you should be able to use standard tools to get to the glibc version (ldd -r -v binary | grep GLIBC).
How can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?
This depends on the way you have setup for compilation toolchain. The best case would be a fully hermetic compilation toolchain, where all necessary files are declared using attributes like https://bazel.build/reference/be/c-cpp#cc_toolchain.compiler_files.
But this would also mean to lock-down the compiler sysroot. This should include all libraries you are linking against if you want full hermeticity. If you want to use some system libraries, you need to tell bazel where to find them and to factor in their hash: https://stackoverflow.com/a/43419786/20546409 or https://www.stevenengelhardt.com/2021/09/22/practical-bazel-depending-on-a-system-provided-c-cpp-library/
If you use the auto-detected compiler toolchain, some tricks are used to lock-down the sysroot paths, but expect some non-hermiticity. https://github.com/limdor/bazel-examples/tree/master/linux_toolchain is a nice write-up how to move from the auto-detected toolchain to something more hermetic.
The hack
Of course, you can hack around this. Note, this is inherently a bad idea:
create a script that inspects the system, determines everything important like the glibc version, maybe the linux distribution (flavor)
creates a string describing this variation and hash-summing it
use that as the instance key/name for your remote cache

Related

Can the same code and same compiler produce a different binary on different machines?

The idea of Nixos binary caches led me to considering this question.
In nix, every compiled binary is associated with a hash key which is obtained from hashing all the dependencies and build script, i.e. a 'derivation' in nix-speak. That is my understanding, anyway.
But couldn't the same derivation lead to different binaries, when compiled on different machines?
If machine A's processor has a slightly different instruction set than machine B's processor, and the compiler took this different instruction set into account, wouldn't the binary produced by compiling the derivation on machine A be distinguishable from the binary produced by compiling the derivation on machine B? If so, then couldn't different binaries could have the same derivation and thus the same nix hash?
Does the same derivation built on machines with different instruction sets always produce the same binary?
This depends on the compiler implementation and options passed to it. For example, GCC by default does not seems to pay attention to the specifics of the current processor, unless you specify -march=native or -mtune=native.
So yes, if you use flags like these or a compiler with default behavior like these flags, you will get a different output on a machine with a different model of cpu.
A build can be non-reproducible for other reasons as well, such as inappropriate use of clock values or random values or even counters that are accessed in non-deterministically interleaved patterns by threads.
Nix does provide a sandbox that removes some sources of entropy; primarily the supposedly unrelated software that may be present on a machine. It does not remove all of these sources for practical reasons.
For these reasons, reproducibility will have to be a consideration, even when packaging with Nix; not something that is solved completely by it.
I'll quote the menu "Achieve deterministic builds
" from https://reproducible-builds.org/docs/ and annotate it with the effect of Nix to the best of my knowledge. Don't quote me on this.
SOURCE_DATE_EPOCH: solved; set by Nixpkgs
Deterministic build systems: partially solved; Nixpkgs may include patches
Volatile inputs can disappear: solvable with Nix if you upload sources to the (binary) cache. Hercules CI does this.
Stable order for inputs: mostly solved. Nix language preserves source order and sorts attributes.
Value initialization: low-level problem not solved by Nix
Version information: not solved; clock is accessible in sandbox
Timestamps: same as above
Timezones: solved by sandbox
Locales: solved by sandbox
Archive metadata: not solved
Stable order for outputs: use of randomness not solvable by sandbox
Randomness: same
Build path: partially; linux uses /build; macOS may differ depending on installation method
System images: broad issue taking elements from previous items
JVM: same

How does Bazel track files so quickly?

I can't find any information about how Bazel tracks file. The documentation doesn't mention if they use something like facebook's watchman.
It obviously takes some kind of hash and compares, but how exactly does it do it? Because it knows if things hasn't changed immediately and it wouldn't be able to read all those files in such a short time.
Also if you are watching many files it would take up a lot of space with a mono repo like Google? I know that is one of the problems scaling git because "git status" will become to slow unless some intelligent caching is used.
Bazel uses OS filesystem monitoring APIs like inotify on Linux and FSEvents on Mac OS
Check out these classes:
https://github.com/bazelbuild/bazel/blob/c5d0b208f39353ae3696016c2df807e2b50848f4/src/main/java/com/google/devtools/build/lib/skyframe/DiffAwareness.java
https://github.com/bazelbuild/bazel/blob/1d2932ae332ca0c517570f559c6dc0bac430b263/src/main/java/com/google/devtools/build/lib/skyframe/LocalDiffAwareness.java
https://github.com/bazelbuild/bazel/blob/c5d0b208f39353ae3696016c2df807e2b50848f4/src/main/java/com/google/devtools/build/lib/skyframe/MacOSXFsEventsDiffAwareness.java

Handling complex and large dependencies

Problem
I've been developing a game in C++ in my spare time and I've opted to use Bazel as my build tool since I have never had a ton of luck (or fun) working with make or cmake. I also have dependencies in other languages (python for some of the high level scripting). I'm using glfw for basic window handling and high level graphics support and that works well enough but now comes the problem. I'm uncertain on how I should handle dependencies like glfw in a Bazel world.
For some of my dependencies (like gtest and fruit) I can just reference them in my WORKSPACE file and Bazel handles them automagically but glfw hasn't adopted Bazel. So all of this leads me to ask, what should I do about dependencies that don't use Bazel inside a Bazel project?
Current approach
For many of the simpler dependencies I have, I simply created a new_git_repository entry in my WORKSPACE file and created a BUILD file for the library. This works great until you get to really complicated libraries like glfw that have a number of dependencies on their own.
When building glfw for a Linux machine running X11 you now have a dependency on X11 which would mean adding X11 to my Bazel setup. X11 Comes with its own set of dependencies (the X11 libraries like X11Cursor) and so on.
glfw also tries to provide basic joystick support which is provided by default in Linux which is great! Except that this is provided by the kernel which means that the kernel is also a dependency of my project. Now I shouldn't need anything more than the kernel headers this still seems like a lot to bring in.
Alternative Options
The reason I took the approach I've taken so far is to make the dependencies required to spin up a machine that can successfully build my game very minimal. In theory they just need a C/C++ compiler, Java 8, and Bazel and they're off to the races. This is great since it also means I can create a Docker container that has Bazel installed and do CI/CD really easily.
I could sacrifice this ease and just say that you need to have libraries like glfw installed before attempting to compile the game but that brings the whole which version is installed and how is it all configured problem back up that Bazel is supposed to help solve.
Surely there is a simpler solution and I'm overthinking this?
If the glfw project has no BUILD files, then you have the following options:
Build glfw inside a genrule.
If glfw supports some other build system like make, you could create a genrule that runs the tool. This approach has obvious drawbacks, like the not-to-be-underestimated impracticality of having to declare all inputs of that genrule, but it'd be the simplest way of Bazel'izing glfw.
Pre-build glfw.o and check it into your source tree.
You can create a cc_library rule for it, and put the .o file in the srcs. Even though this solution is the least flexible of all because you not only restrict the target platform to whatever the .o was built for, but also make it harder to reproduce the whole build, the benefits are sometimes worth the costs.
I view this approach as a last resort. Even in Bazel's own source code there's one cc_library.srcs that includes a raw object file, because it was worth it, as the commit message of 92caf38 explains.
Require that glfw be installed.
You already considered this option. Some people may prefer this to the other approaches.

Install same software/version with different compilers side-by-side

For the development/testing of our CFD code I like to frequently switch between Clang (for its strictness/warnings) and GCC (for performance), but this requires some of its dependencies (like e.g. NetCDF) to be compiled with the same compiler.
I know that Homebrew has the option to install multiple versions of software side-by-side and switch between them, but is it possible to do something similar using the same software version, but compiled with different compilers (by setting HOMEBREW_CC and HOMEBREW_CXX)?
Something like (wishful thinking, after somehow installing NetCDF with both Clang and GCC):
brew switch netcdf 4.3.3-gcc
brew switch netcdf 4.3.3-clang
I think this is possible only if you explicitly have different version numbers like in the example you used "4.3.3-gcc" and "4.3.3-clang".
If the version number is identical then there is no difference in the builds and brew can't differentiate them.
Also I wouldn't do this.
By compiling the same library multiple different ways you now start going into a dependency nightmare.
Dependency conflicts. Because even if you swap out "netcdf", how do you know you swapped out all it's dependencies too? If they are not compiled with the same compiler bad things can happen, for example conflicts can arise due to the differences in how calls are made or due to options enabled in one compiler for both the app and it's dependencies that weren't enabled in another build.
I don't recommend doing this, it's too much of a hassle.
But, if you really need two builds (such as for testing) then I would build them to isolated folder trees outside of your system path and do any testing against them there. Brew is not the best way to address this issue, as this is a non-standard use-case.

I'm writing a script to install postgresql modules. How can I find the "contrib" directory?

I'm writing a script that will install some of the optional postgresql modules. Is there a way to programmatically figure out where the contrib directory is or do I have to prompt for the path? I've looked at a few examples and it seems inconsistent; doesn't appear in pg_config.
(The script might be run on OS X or linux, and I can't make assumptions about how postgresql was installed).
There is no easy works everywhere answer to this, so it's a matter of how much intelligence you want to try and put into your installer. And for some people it will be impossible to do what you hope for, the best you'll be able to do is offer guidance about what's missing. There is a lot of variation on packaging here between Linux distributions, versions of PostgreSQL, and the various ways you can install PostgreSQL on OS X (MacPorts, homebrew, etc.)
First off, only source code installs will have a contrib directory with source code in it that allows building the optional modules. In the packaged builds for Linux, all the contrib binaries may only be available via an optional package, called something like postgresql-contrib. That's the only way to make the optional modules that come with the database available: install the package the binaries are included in. You may see some variation in the OS X builds here too.
If you want to install extensions (what these are officially called as of PostgreSQL 9.1 now, rather than 'modules') using binaries you provide instead, what you then want to know is where to put the resulting shared libraries and matching SQL files that reference them. What pgconfig returns for pkglibdir will tell you where the binaries go, while sharedir points toward the default place to put the SQL. Providing binaries is a losing game though; the job of syncing with every platform to build them is a huge one.
And here are the sort of additional complications you'll run into in this area, if you wanted to ship source code and try to build things yourself in an automatic way:
PostgreSQL 9.1 now installs these using the CREATE EXTENSION
mechanism, so you'll need to handle both the pre-9.1 method and the
new one introduced there.
Not all PostgreSQL installations will have
pg_config. It's considered a development tool, and which package it
gets bundled with (and whether that package is mandatory or not)
varies. Debian/Ubuntu put it into the optional liqpq-dev, RedHat
derived RPMs have it in postgresql-devel or
postgresql-[version]-devel.
Due to pg_config being necessary for
compiling the new 9.1 extensions, packagers have started
reconsidering where pg_config goes; it's considered a lot more
important now than it used to be. 9.1 or later packages might alter which package it's contained in. That doesn't really change what you can and can't do though. It just impacts what advice you might offer for correcting situations your program can't deal with.
I've been describing the standard
Linux packaging here when I talk about that OS. There are also installers for both Linux and
OS X from EnterpriseDB, what they call their "one-click installers".
These use a different standard altogether for what people do and don't get installed in this area. I don't follow the commercial packaging to know what is actually different, but it's another variable you can expect people to encounter.
Recent OS X versions may
have some system PostgreSQL components floating around too. No idea
yet how this handle extensions though.
Basically, all three of version/packager/platform can vary how this will work, and the idea that you'll find any solution that handles even the majority of those permutations is optimistic. Installing extensions is known to be difficult in PostgreSQL, which is one thing that motivated all the 9.1 changes to turn it into a simple CREATE EXTENSION for many of them. But for now, those changes have just added another whole set of variation into the mix, actually making this harder during the transition period. It will be a while until PostgreSQL versions supporting that are the only ones in popular use.

Resources