Are there any tools out there for introspecting a collection of Bazel build files, so I can run queries against a codebase? I'm thinking of a simple case of collecting all defined tags used in a codebase. Some sort of Bazel meta-query capability that would let me scope out the conventions and usages across a repo with a substantial number of build files.
It would even be nice to be able to do a cross-tabulation of cc_test and py_test rules against their collective tags. Ideally there'd be a Python client to introspect the Bazel files.
bazel query provides information about your target dependency graph, with a highly expressive query language. It can output in various formats such as DOT, XML, and Protobuf, as well as the text representation of the expanded BUILD files themselves (with any macros expanded) for post-processing. See: Bazel query how-to, Bazel query reference.
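For example, assuming a hypothetical "integration" tag, a query along these lines would list every cc_test or py_test target carrying it (the tag name and target patterns are illustrative):
bazel query 'attr("tags", "integration", kind("cc_test|py_test", //...))' --output=label_kind
The XML and Protobuf outputs include each rule's tags attribute, so something like bazel query 'kind("cc_test|py_test", //...)' --output=xml can be post-processed in a script to build the cross-tabulation you describe.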
bazel cquery does the same as query, but also performs the analysis phase, which computes information about configurations (e.g. CPU, API levels) over the target dependency graph. This takes slightly longer, but gives you a more accurate representation of the graph that Bazel brings to the execution phase. See: Bazel cquery reference.
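A minimal sketch (the target label is illustrative):
bazel cquery 'deps(//app:main)'
The default output lists each configured target together with an identifier for its configuration.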
bazel aquery is not directly related to BUILD file introspection; it presents information about executable actions, which are a few layers of computation after BUILD file parsing and analysis. See: Bazel aquery reference.
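For instance (the mnemonic and target label are illustrative):
bazel aquery 'mnemonic("CppCompile", deps(//app:main))'
This prints the concrete compile actions, including their command lines, inputs, and outputs.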
query, cquery and aquery don't operate on the syntax of the BUILD files. If you want to work with the Starlark syntax / AST, check out the buildozer and buildifier tooling in the bazelbuild/buildtools repository.
If there is information about your build graph that cannot be retrieved using these mechanisms, please file a feature request on the Bazel GitHub project.
Related
Looking at the outputs from Bazel's build event protocol, I can see that the BEP output only has one unique invocation_id (which makes sense) and also only one unique build_id (even when building multiple targets at once).
In that case, what's the point of build_id?
A higher-level system that uses Bazel (e.g., a CI system) may have a concept of a build that is broader than one Bazel invocation. For instance, the higher-level system may retry entire Bazel invocations under certain circumstances, or allow a "build" to contain multiple Bazel build or test steps. The build_id allows the multiple Bazel invocations that compose such a higher-level build to be correlated in Bazel's metadata emissions (most notably the Build Event Protocol).
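As a sketch, a CI wrapper might pin one build id across several invocations; this assumes your Bazel version exposes the --build_request_id and --invocation_id flags, and the wrapper script itself is hypothetical:
# Hypothetical CI wrapper: both invocations share a single build id.
BUILD_ID="$(uuidgen)"
bazel build --build_request_id="$BUILD_ID" --invocation_id="$(uuidgen)" //app/...
bazel test --build_request_id="$BUILD_ID" --invocation_id="$(uuidgen)" //app/...
The two BEP streams then share one build_id while keeping distinct invocation_ids.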
A few years ago I wrote a set of wrappers for Bazel that enabled me to use it to build FPGA code. The FPGA bit is only relevant because the full clean build takes many CPU days so I really care about caching and minimizing rebuilds.
Using Bazel v0.28 I never found a way to have my Bazel package depend on a single source file from somewhere else in the git repo. It felt like this wasn't something Bazel was designed for.
We want to do this because we have a library of VHDL source files that are parameterized and the parameters are set in the instantiating VHDL source. (VHDL generics). If we declare this library as a Bazel package in its own right then a change to one library file would rebuild everything (at huge time cost) when in practice only a couple of steps might need to be rebuilt.
I worked around this with a python script to copy all the individual source files into a subdirectory and then generate the BUILD file to reference these copies. The resulting build process is:
call python preparation script
bazel build //:allfpgas
call python result extractor
This is clearly quite ugly but the benefits were huge so we live with it.
Now we want to leverage Bazel to build our Java, C++, etc., so I wanted to revisit this and try to make everything work with Bazel alone.
In the latest Bazel, is there a way to have a BUILD package depend on individual source files outside of the package directory? If Bazel can't, would Buck, Pants, or Please.build work better for our use case?
The Bazel rules for most languages support doing something like this already. For example, the Python rules bundle source files from multiple packages together, and the C++ rules manage include files from other packages. Somehow the rule has to pass the source files around in providers, so that another rule can generate actions which use them. Hard to be more specific without knowing which rules you're using.
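As a rough illustration of the provider mechanism, here is a sketch of a custom rule that exposes its sources to dependents; the names are hypothetical and not part of any existing VHDL ruleset:
# vhdl.bzl (hypothetical)
VhdlSourceInfo = provider(fields = ["srcs"])

def _vhdl_library_impl(ctx):
    # Expose this package's sources so a rule in another package can
    # register actions that consume them directly.
    return [VhdlSourceInfo(srcs = depset(ctx.files.srcs))]

vhdl_library = rule(
    implementation = _vhdl_library_impl,
    attrs = {"srcs": attr.label_list(allow_files = [".vhd"])},
)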
If you just want to copy the files, you can do that in Bazel with a genrule. In the package with the source file:
exports_files(["templated1.vhd", "templated2.vhd"])
In the package that uses it:
genrule(
    name = "copy_templates",
    srcs = [
        "//somewhere:templated1.vhd",
        "//somewhere:templated2.vhd",
    ],
    outs = ["templated1.vhd", "templated2.vhd"],
    cmd = "cp $(SRCS) $(RULEDIR)",
)

some_library(
    srcs = ["templated1.vhd", "templated2.vhd", "other.vhd"],
)
If you want to deduplicate that across multiple packages that use it, put the filenames in a list and write a macro to create the genrule.
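A minimal sketch of such a macro, assuming the shared sources live in //somewhere (the file name and labels are illustrative):
# copy_templates.bzl
def copy_templates(name, templates):
    """Copies the shared template sources into the calling package."""
    native.genrule(
        name = name,
        srcs = ["//somewhere:" + t for t in templates],
        outs = templates,
        cmd = "cp $(SRCS) $(RULEDIR)",
    )
Each consuming package can then call copy_templates(name = "copy_templates", templates = ["templated1.vhd", "templated2.vhd"]) and list the copies in its srcs.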
Given a rather big repository built with Bazel and tons of third-party dependencies in multiple languages (including heavy Docker containers), I have the following problem:
running Bazel queries triggers the downloading of many of these dependencies, resulting in slow query performance. Hence, the question:
Is there a way to run bazel query without having to download the dependencies?
Typical query: bazel query 'kind("source file", deps(//...) except deps(//3rdparty/...))'
I'm aware of the caching options, which I mostly use, but depending on the languages, things can still be slow.
After asking on Bazel's Slack channel, the response (from Sahin Yort) is not encouraging:
I don't believe that's possible due to the nature of workspace files. A load from a workspace leads to a fetch of the given workspace, because Bazel has to expand the workspaces in order to know their targets. At that point, it is up to the repository rule to fetch whatever it needs, either eagerly or lazily. Workspace rules usually expand BUILD files using various patterns, e.g. running an executable or using expand_template. I have little faith that it is possible to get what you want.
I'll be looking into other ways to speed things up: a likely culprit for the slowness is the action/analysis cache being invalidated because some flags changed.
If you only need to parse the AST without resolving dependencies, you can use buildozer instead:
buildozer "print srcs" "//some:target"
It also supports -output_json for machine-readable output.
New to Bazel. Looking to see if there is a way to create a Bazel rule that allows me to get a list of all targets and then feed that data into a Kotlin file or something like that.
I was able to run bazel query //... --output xml > temp.xml, which gives me all the targets and their BUILD file info, but I would like to retrieve this info using a Bazel rule. Any ideas of how I could go about this?
The short answer is no. The closest thing to this in Bazel is a genquery, though it's worth noting that there are some caveats to this approach, as mentioned in the docs:
In order to keep the build consistent, the query is allowed only to visit the transitive closure of the targets specified in the scope attribute. Queries violating this rule will fail during execution if strict is unspecified or true (if strict is false, the out of scope targets will simply be skipped with a warning). The easiest way to make sure this does not happen is to mention the same labels in the scope as in the query expression.
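A minimal genquery sketch following that advice (the target label is illustrative):
genquery(
    name = "target_list",
    expression = "deps(//some:target)",
    scope = ["//some:target"],
)
The generated output file can then be consumed like any other build output, e.g. as an input to a rule that generates the Kotlin file.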
If you are happy to blow past the warnings around 'build consistency' it might be possible to achieve this using a similar approach to the buildifier rules, where you would determine the path of the workspace and run Bazel query as a subprocess of Bazel. Personally, I wouldn't recommend this and instead would suggest that you just use the output of bazel query directly.
I'm taking a glance over at the buildtools repo (https://github.com/bazelbuild/buildtools) and trying to understand the scope of its responsibilities as it relates to the three phases of a Bazel build (loading, analysis, execution).
The repo's description states that it is "A bazel BUILD file formatter and editor." I find much logic in the repo, written in Go, that provides a complete AST parser, Starlark syntax interpretation, and reformatting and rewriting of BUILD files; basically, logic designed to operate on a single Starlark file at a time. Rereading the repo description in this light leads me to conclude that buildtools is really a single-file-scoped effort, presenting tools that only intersect (perhaps only partially), functionality-wise, with the loading operations Bazel conducts while building.
Question: Is it accurate that the focus of buildtools is on a single Starlark file?
If that's true, then all the multi-file Starlark analysis logic and so forth seems to actually be maintained over at https://github.com/bazelbuild/bazel/tree/master/src/main/java/com/google/devtools/build/lib, and I should not expect to find any tools for the analysis phase and beyond in the buildtools repo. Is that right?
I don't work on buildtools, but I agree: these tools seem to focus on BUILD / .bzl files in isolation. They let you process these files in parallel, to do similar operations on them.
If you wonder whether these tools understand relations between these files, the answer seems to be no.
If you further wonder which tools do understand those relations, the answer is Bazel's query, cquery, and aquery. I'm not aware of a programmable API for these queries though; you have to run Bazel to perform them.
buildtools has tools working on a syntactic level (it looks at the syntax tree). These tools are outside of Bazel and have no knowledge of Bazel build phases. In the future, we may expand the code to work on multiple files (for the static analysis), but it will still be independent from Bazel phases.
https://github.com/bazelbuild/bazel/tree/master/src/main/java/com/google/devtools/build/lib/ is the source code of Bazel. The syntax/ directory includes the code for reading and evaluating the Starlark files. The code there is called by Skyframe. The interpreter is called by Skyframe many times in parallel, both during the loading and the analysis phases.
If you have a more specific question (what are you trying to do?), I can help more. :)