Bazel Clang cross platform

I'm trying to get bazel to use clang as the compiler on both Windows and Linux. (Debian 10 if that matters)
On Windows I managed to get it working by adding a windows-clang platform and registering the toolchain as described here.
Is there a similarly easy way to switch to using clang on Linux?

I've dug around a bit and as far as I can tell, you have two (or three) options:
You can configure your own cc_toolchain. This is arguably the "correct" solution.
You can try to tweak cc_configure() from @bazel_tools//tools/cpp:cc_configure.bzl and use it in your WORKSPACE. But that is both challenging and not necessarily pretty.
Neither of these two ways is likely to qualify as "easy". For that, the quickest would appear to be:
Automagic toolchain resolution considers the environment variable CC (only, not CXX), and if it is set, its value is used for toolchain configuration. Hence, for instance, this would do:
CC=/usr/bin/clang bazel build //:some_tgt
I hope I haven't missed anything, but as of today I have not spotted a way to choose a compiler through platforms (without having your own toolchain definition in place).
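If setting the variable on every invocation gets tedious, the same effect can be pinned in a .bazelrc entry (a minimal sketch; --repo_env forwards the variable to the repository rules that auto-configure the C++ toolchain):

# .bazelrc
build --repo_env=CC=/usr/bin/clang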

Related

With bazel how do I make sure objects taken from the cache have been built for the right system/libraries?

I got some strange glibc-related linker errors for builds with distributed build cache configured on build nodes running different Linux distributions.
Now I somehow suspect build artifacts from those machines with different glibc versions getting mixed up, but I don't know how to investigate this.
How do I find out what Bazel takes into account when building the hash for a certain build artifact?
I know I can explicitly set environment variables which then will affect the hash. But how can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?
And how do I check/compare what's been taken into account?
This is a complex topic and a multi-faceted question. I am going to answer in the following order:
How do I check/compare what's been taken into account?
How to investigate against which glibc a build linked?
How can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?
How do I check/compare what's been taken into account?
To answer this, you should look into the execution log; specifically, you can read up on https://bazel.build/remote/cache-remote#compare-logs. The *.json execution log should contain everything you need to know (granted, it might be a bit verbose) and is a little easier to process with shell magic/your editor.
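For concreteness, a hedged example of producing comparable logs on two machines (the --execution_log_json_file flag writes the log that page describes; target and paths are illustrative):

# On each machine, write an execution log for the same build:
bazel build //:some_tgt --execution_log_json_file=/tmp/exec_a.json
# Then diff the two JSON files (or slice them with jq) to spot
# inputs whose hashes differ between the machines.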
How to investigate against which glibc a build linked?
From the execution log, you can get all the required hashes to retrieve cached artifacts/binaries from your remote cache. Given these files, you should be able to use standard tools to get to the glibc version (ldd -r -v binary | grep GLIBC).
How can I be sure a given compiler, a certain version of glibc, etc. will lead to different hashes for built artifacts?
This depends on how you have set up your compilation toolchain. The best case would be a fully hermetic compilation toolchain, where all necessary files are declared using attributes like https://bazel.build/reference/be/c-cpp#cc_toolchain.compiler_files.
But this would also mean locking down the compiler sysroot, which should include all libraries you are linking against if you want full hermeticity. If you want to use some system libraries, you need to tell Bazel where to find them and to factor in their hash: https://stackoverflow.com/a/43419786/20546409 or https://www.stevenengelhardt.com/2021/09/22/practical-bazel-depending-on-a-system-provided-c-cpp-library/
If you use the auto-detected compiler toolchain, some tricks are used to lock down the sysroot paths, but expect some non-hermeticity. https://github.com/limdor/bazel-examples/tree/master/linux_toolchain is a nice write-up of how to move from the auto-detected toolchain to something more hermetic.
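To give a flavor of what such declarations look like, here is a minimal cc_toolchain sketch that routes all of the compiler's own files through Bazel so they feed into action hashes (labels are illustrative, and the cc_toolchain_config rule it references is omitted for brevity):

# BUILD (sketch)
filegroup(
    name = "clang_all_files",
    srcs = glob(["clang/**"]),  # the vendored compiler tree
)

cc_toolchain(
    name = "hermetic_clang",
    all_files = ":clang_all_files",
    compiler_files = ":clang_all_files",
    linker_files = ":clang_all_files",
    ar_files = ":clang_all_files",
    strip_files = ":clang_all_files",
    objcopy_files = ":clang_all_files",
    dwp_files = ":clang_all_files",
    toolchain_config = ":clang_toolchain_config",  # not shown here
)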
The hack
Of course, you can hack around this. Note, this is inherently a bad idea:
create a script that inspects the system and determines everything important, like the glibc version and maybe the Linux distribution (flavor)
create a string describing this variation and hash-sum it
use that hash as the instance key/name for your remote cache, as in the sketch below
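A minimal shell sketch of that hack, assuming the remote cache is addressed through Bazel's --remote_instance_name flag (the inspection here covers only the glibc version and CPU architecture; extend as needed):

# Derive a cache instance name from the host's "important" properties.
glibc=$(ldd --version | head -n1 | awk '{print $NF}')
variation="${glibc}-$(uname -m)"
instance=$(printf '%s' "$variation" | sha256sum | cut -c1-16)
bazel build //:some_tgt --remote_instance_name="$instance"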

What is the difference between cc_toolchain_suite and register_toolchains?

In the context of C++ toolchains, I am trying to understand the conceptual difference between cc_toolchain_suite and register_toolchains; to me it seems they achieve the same purpose: select a toolchain based on command line parameters.
See https://docs.bazel.build/versions/master/toolchains.html for register_toolchains
See https://docs.bazel.build/versions/master/cc-toolchain-config-reference.html for cc_toolchain_suite
Can someone please help me understand the subtlety behind these two concepts?
TL;DR: cc_toolchain_suite is part of the legacy toolchain configuration system. It still exists in Bazel because the migration to the new API is not complete. register_toolchains is part of the newer, unified toolchain API. When possible, use register_toolchains instead of cc_toolchain_suite/--crosstool_top.
Originally the concept of a 'toolchain' was not standardised in Bazel, so a Java toolchain would be implemented very differently than a cc toolchain.
Before the unified starlark toolchain API, cc toolchains were specified in proto-text formatted 'CROSSTOOL' files.
With the introduction of the platforms API and the unified toolchains API, the concepts in the CROSSTOOL files were converted almost 1:1 to the new unified platforms/toolchains Starlark API. This was mostly to ensure compatibility between the old and new APIs.
One of the concepts in the older 'CROSSTOOL' configuration system was a 'toolchain suite', which allowed you to define a group of toolchains targeting different CPUs (this was before the platforms API was introduced).
As far as I understand, the only reason cc_toolchain_suite is still part of Bazel's Starlark API is that some of the Apple/Android toolchains have not yet been completely migrated across.
Here are a few examples where I've opted for the newer register_toolchains approach. Note that these toolchains do not use cc_toolchain_suite anymore.
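For orientation, a minimal sketch of the newer approach (names are illustrative; depending on your Bazel version, cc toolchain resolution via platforms may also need --incompatible_enable_cc_toolchain_resolution):

# WORKSPACE (sketch)
register_toolchains("//toolchains:linux_clang")

# toolchains/BUILD (sketch): the generic wrapper the unified API resolves
toolchain(
    name = "linux_clang",
    toolchain = ":linux_clang_cc_toolchain",  # a cc_toolchain target, not shown
    toolchain_type = "@bazel_tools//tools/cpp:toolchain_type",
    exec_compatible_with = ["@platforms//os:linux"],
    target_compatible_with = ["@platforms//os:linux"],
)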

Restricting a Bazel target to specific platforms

I have a target which can only be built for Linux (in this case, because it depends on syscalls only available on Linux and there is no desire to try and make this cross-platform). How can I express this in my BUILD files?
I can see from the Platforms documentation that there exists a Linux platform definition as @bazel_tools//platforms:linux, but it is not clear to me how to make use of this to restrict a target. Trying to specify this in compatible_with results in an error like this:
(13:27:09) ERROR: /foo/BUILD:4:1: in compatible_with attribute of go_library rule //foo:go_default_library: constraint_value rule '@bazel_tools//platforms:linux' is misplaced here (expected environment). Since this rule was created by the macro 'go_library_macro', the error might have been caused by the macro implementation in /foo/BUILD:4:1
So I have a few related questions:
The error seems to indicate I've supplied the wrong type of rule to compatible_with. What is an environment and how do I provide one? (I've struggled to find documentation on this)
I gather that the migration to Platforms might not yet be complete and rules_go might not have been updated. If it's not possible with Platforms, is there an "old way" to do this instead?
Ideally, I would like this not to result in build errors when running commands like bazel test //:all on a different (non-Linux) platform, i.e. I'd prefer it just exclude these, or something. Is this possible?
Thanks for your help 😊
It turns out that this is an open issue; once it is fixed, I think this should be possible: https://github.com/bazelbuild/bazel/issues/3780
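For later readers: Bazel has since gained a target_compatible_with attribute that addresses exactly this, and wildcard commands like bazel test //... skip incompatible targets instead of failing. A minimal sketch, assuming the standard @platforms constraints:

# BUILD (sketch)
go_library(
    name = "go_default_library",
    srcs = ["foo_linux.go"],  # illustrative; code using Linux-only syscalls
    target_compatible_with = ["@platforms//os:linux"],
)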

Handling complex and large dependencies

Problem
I've been developing a game in C++ in my spare time, and I've opted to use Bazel as my build tool since I have never had a ton of luck (or fun) working with make or cmake. I also have dependencies in other languages (Python for some of the high-level scripting). I'm using glfw for basic window handling and high-level graphics support, and that works well enough, but now comes the problem: I'm uncertain how I should handle dependencies like glfw in a Bazel world.
For some of my dependencies (like gtest and fruit) I can just reference them in my WORKSPACE file and Bazel handles them automagically but glfw hasn't adopted Bazel. So all of this leads me to ask, what should I do about dependencies that don't use Bazel inside a Bazel project?
Current approach
For many of the simpler dependencies I have, I simply created a new_git_repository entry in my WORKSPACE file and created a BUILD file for the library. This works great until you get to really complicated libraries like glfw that have a number of dependencies of their own.
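For reference, such an entry looks roughly like this (a sketch; the tag and the build_file label are illustrative):

# WORKSPACE (sketch)
load("@bazel_tools//tools/build_defs/repo:git.bzl", "new_git_repository")

new_git_repository(
    name = "glfw",
    remote = "https://github.com/glfw/glfw.git",
    tag = "3.3.8",  # illustrative version
    build_file = "//third_party:glfw.BUILD",  # the hand-written BUILD file
)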
When building glfw for a Linux machine running X11, you now have a dependency on X11, which would mean adding X11 to my Bazel setup. X11 comes with its own set of dependencies (the X11 libraries like Xcursor), and so on.
glfw also tries to provide basic joystick support, which is provided by default on Linux, which is great! Except that this is provided by the kernel, which means the kernel is also a dependency of my project. Even if I shouldn't need anything more than the kernel headers, this still seems like a lot to bring in.
Alternative Options
The reason I took the approach I've taken so far is to make the dependencies required to spin up a machine that can successfully build my game very minimal. In theory they just need a C/C++ compiler, Java 8, and Bazel and they're off to the races. This is great since it also means I can create a Docker container that has Bazel installed and do CI/CD really easily.
I could sacrifice this ease and just say that you need to have libraries like glfw installed before attempting to compile the game, but that brings back the whole "which version is installed and how is it configured" problem that Bazel is supposed to help solve.
Surely there is a simpler solution and I'm overthinking this?
If the glfw project has no BUILD files, then you have the following options:
Build glfw inside a genrule.
If glfw supports some other build system like make, you could create a genrule that runs the tool. This approach has obvious drawbacks, like the not-to-be-underestimated impracticality of having to declare all inputs of that genrule, but it'd be the simplest way of Bazel'izing glfw.
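A rough sketch of what that could look like, assuming the glfw sources are vendored under glfw/ in the workspace (paths, cmake flags, and the library's output location are illustrative, and every input really must appear in srcs):

# BUILD (sketch)
genrule(
    name = "glfw_cmake_build",
    srcs = glob(["glfw/**"]),
    outs = ["libglfw3.a"],
    cmd = """
        out=$$PWD/$(location libglfw3.a)
        tmp=$$(mktemp -d)
        cp -rL glfw "$$tmp"/
        (cd "$$tmp"/glfw && cmake -DGLFW_BUILD_EXAMPLES=OFF . && make glfw)
        cp "$$tmp"/glfw/src/libglfw3.a "$$out"
    """,
)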
Pre-build glfw.o and check it into your source tree.
You can create a cc_library rule for it, and put the .o file in the srcs. Even though this solution is the least flexible of all because you not only restrict the target platform to whatever the .o was built for, but also make it harder to reproduce the whole build, the benefits are sometimes worth the costs.
I view this approach as a last resort. Even in Bazel's own source code there's one cc_library.srcs that includes a raw object file, because it was worth it, as the commit message of 92caf38 explains.
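Such a rule could look like this (a sketch; file names are illustrative):

# BUILD (sketch)
cc_library(
    name = "glfw_prebuilt",
    srcs = ["prebuilt/glfw.o"],  # the checked-in object file
    hdrs = glob(["include/GLFW/*.h"]),
    includes = ["include"],
)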
Require that glfw be installed.
You already considered this option. Some people may prefer this to the other approaches.
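In Bazel terms, even this usually still means a thin wrapper target so the rest of the build can depend on a label (a sketch; assumes glfw's headers and library sit on the compiler's default search paths):

# BUILD (sketch)
cc_library(
    name = "system_glfw",
    linkopts = ["-lglfw"],
)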

Clang dynamic memory analyzer not referencing back to source code Red Hat 6.3

We recently built the 3.3 release of clang/llvm, using the Fedora 20 packaging process as a guide to unpacking, moving the different parts to the correct locations, and building the compiler toolchain. All seems to be working correctly, except that the dynamic memory analyzer is not referencing back to the source code. The same usage on the Fedora platform does reference back to the source code.
This is our first attempt to use the clang/llvm tool set. Also, this is the first question I have asked in this forum, which seems a bit different in its organization from all the others I have participated in, so my apologies in advance if I have not figured out the nuances of posting a question here. It does seem odd that the main projects do not seem to have a way of asking questions.
We found a solution, though we do not know quite why we needed to add the extra environment setup. Compiling as follows:
PATH=/net/fas4045/home3/jq031c/llvm_sandbox/bin:$PATH make -j 16 \
    DEPFILES= CXX=clang++ CC=clang \
    CXXFLAGS="-fsanitize=memory -fsanitize-memory-track-origins -fno-omit-frame-pointer" \
    LDXFLAGS=-fsanitize=memory
Running as follows:
MSAN_SYMBOLIZER_PATH=/net/fas4045/home3/jq031c/llvm_sandbox/bin/llvm-symbolizer ./runtests.sh
We can understand that we need to add the sanitizer option to the link flags, since we do a two-step build of compile followed by link. The discovery after searching was the need to define the path to llvm-symbolizer with an environment variable, which none of the other dynamic analysis options seem to need.
