How to write Bazel rules that work with external repositories?

The Bazel Starlark API does strange things with files in external repositories. I have the following Starlark snippet:
print(ctx.genfiles_dir)
print(ctx.genfiles_dir.path)
print(output_filename)
ret = ctx.new_file(ctx.genfiles_dir, output_filename)
print(ret.path)
It is creating the following output:
DEBUG: build_defs.bzl:292:5: <derived root>
DEBUG: build_defs.bzl:293:5: bazel-out/k8-fastbuild/genfiles
DEBUG: build_defs.bzl:294:5: google/protobuf/descriptor.upb.c
DEBUG: build_defs.bzl:296:5: bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/descriptor.upb.c
That extra external/com_google_protobuf comes seemingly out of nowhere, and it makes my rule fail:
1. I tell protoc to generate into ctx.genfiles_dir.path (which is bazel-out/k8-fastbuild/genfiles).
2. So protoc generates bazel-out/k8-fastbuild/genfiles/google/protobuf/descriptor.upb.c.
3. Bazel fails because I didn't generate bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/descriptor.upb.c.
Likewise, when I try to call file.short_path on a source file from an external repository, I get a result like ../com_google_protobuf/google/protobuf/descriptor.proto. This seems quite unhelpful, so I just wrote some manual code to strip off the leading ../com_google_protobuf/.
Am I missing something? How can I write this rule in a way that doesn't feel like I'm fighting Bazel the whole time?

Am I missing something?
The basic problem, as you already realized, is that you have two path "namespaces": the one that protoc sees (i.e. import paths) and the one that Bazel sees (i.e. the path you pass to declare_file()).
Two things to note:
1) All paths declared with declare_file() get the path <bin dir>/<package path incl. workspace>/<path you passed to declare_file()>.
2) All actions are executed from <bin dir> (unless output_to_genfiles = True, in which case this switches to <gen dir>, as in your example).
Trying to solve the exact same problem you encountered, I resorted to stripping the known path from the output file's path to determine which directory to pass to --cpp_out:
# This code is run from the context of the external protobuf dependency
proto_path = "google/a/b.proto"
output_file = ctx.actions.declare_file(proto_path)
# output_file.path would be `<gen_dir>/external/protobuf/google/a/b.proto`
# Strip the known proto_path from output_file.path
protoc_prefix = output_file.path[:-len(proto_path)]
print(protoc_prefix) # Prints: <gen_dir>/external/protobuf
command = "{protoc} {proto_paths} {cpp_out} {plugin} {plugin_options} {proto_file}".format(
    ...
    cpp_out = "--cpp_out=" + protoc_prefix,
    ...
)
Alternatives
You may also be able to construct the same path with ctx.bin_dir, ctx.label.workspace_name, ctx.label.package, and ctx.label.name.
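For instance, a minimal sketch of that alternative, assuming the action writes under genfiles as in the question (the helper name _protoc_prefix is illustrative, untested):
def _protoc_prefix(ctx):
    # Mirrors note 1) above: <gen dir>[/external/<repo>][/<package>].
    parts = [ctx.genfiles_dir.path]
    if ctx.label.workspace_name:
        # Targets in external repositories get an extra external/<repo> segment.
        parts += ["external", ctx.label.workspace_name]
    if ctx.label.package:
        parts.append(ctx.label.package)
    return "/".join(parts)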
Misc.
proto_library recently gained a strip_import_prefix attribute. When it is used, the above is not correct, as all dependent files are symlinked into a new directory, in which they have the relative paths produced by strip_import_prefix.
The path format is:
<bin dir>/<repo>/<package>/_virtual_imports/<label name>/<path `import`ed in .proto files>
i.e.
<bin dir>/external/protobuf/_virtual_imports/b_proto/google/a/b.proto
This assumes you are building an external repo called protobuf, which contains a BUILD file at its root with a target named b_proto, which in turn relies on a proto_library wrapping google/a/b.proto AND uses the strip_import_prefix attribute.
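For concreteness, a hypothetical BUILD file at the root of that protobuf repo matching the layout above (the target name and prefix value are illustrative):
proto_library(
    name = "b_proto",
    srcs = ["google/a/b.proto"],
    # "/" re-roots the imports at the repository root, so the file keeps
    # its google/a/b.proto import path inside the virtual directory.
    strip_import_prefix = "/",
)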

Related

Reading the content of directory declared with `actions.declare_directory`

Imagine I have a java_binary target triggered by a custom rule that generates source code and places the generated sources under a directory, let's call it "root".
So after the code generation we will have something like this:
// bazel-bin/...../src/com/example/root
root:
-> Foo.java
-> Bar.java
-> utils
   -> Baz.java
Now, I have another target, a java_library, that depends on the previously generated sources, so it depends on the custom rule.
My custom rule definition currently looks something like this:
def _code_generator(ctx):
    outputDir = ctx.actions.declare_directory("root")
    files = [
        ctx.actions.declare_file("root/Foo.java"),
        ctx.actions.declare_file("root/Bar.java"),
        ctx.actions.declare_file("root/utils/Baz.java"),
        # and many,
        # many other files
    ]
    outputs = []
    outputs.append(outputDir)
    outputs.extend(files)
    ctx.actions.run(
        executable = ...,  # executable pointing to the java_binary
        outputs = outputs,
        # ....
    )
This works. But as you can see, every anticipated file that is to be generated is hard-coded in the rule definition. This makes it very fragile, should the code generation produce a different set of files in the future (which it will).
(Without specifying each of the files, as shown above, Bazel will fail the build saying that the files have no generating action)
So I was wondering, is there a way to read the content of the root directory and automatically, somehow, declare each of the files as an output?
What I tried:
The documentation of declare_directory says:
The contents of the directory are not directly accessible from Starlark, but can be expanded in an action command with Args.add_all().
And add_all says:
[...] Each directory File item is replaced by all Files recursively contained in that directory.
This sounds like there could be a way to get access to the individual files in the directory, but I am not sure how.
I tried:
outputDir = ctx.actions.declare_directory("root")
# ...
args = ctx.actions.args()
args.add_all(outputDir)
with the intention of accessing the individual files later from args, but the build fails with: "Error in add_all: expected value of type sequence or depset for values, got File".
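The error itself points at the type mismatch: add_all() expects a sequence or depset, so the directory would need to be wrapped in a list, e.g.:
args = ctx.actions.args()
# The directory File is expanded to its contained files only when the
# action executes, per the declare_directory docs quoted above, so this
# does not make the file list visible to Starlark.
args.add_all([outputDir])
That fixes the call, but it doesn't solve the underlying problem.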
Any other ideas on how to implement the rule, so that I don't have to hard-code each and every file that will be generated?

How to integrate C/C++ analysis tooling in Bazel?

I have a code analysis tool that I'd like to run for each cc_library (and cc_binary; silently implied for the rest of the question). The tool has a CLI interface taking:
- A tool project file
- Compiler specifics, such as type sizes, built-ins, macros, etc.
- Files to analyze
  - File path, includes, defines
  - Rules to (not) apply
- Files to add to the project
- Options for synchronizing files with build data
  - JSON compilation database
  - Parse build log
- Analyze and generate analysis report
I've been looking at how to integrate this in Bazel so that the files to analyze AND the associated includes and defines are updated automatically, and so that any analysis result is properly cached. Generating the JSON compilation database (using a third-party lib) or parsing the build log both require separate runs and updating the source tree. For this question I consider those workarounds I'm trying to remove.
What I've tried so far is using aspects, adding an analysis aspect to any library. The general idea is having a base project file holding the library-invariant configuration, appended with the cc_library files to analyze, and finally an analysis is triggered, generating the report. But I'm having trouble executing this, and I'm not sure it's even possible.
This is my aspect implementation so far, trying to iterate through cc_library attributes and target compilation context:
def _print_aspect_impl(target, ctx):
    # Make sure the rule has a srcs attribute
    if hasattr(ctx.rule.attr, 'srcs'):
        # Iterate through the files
        for src in ctx.rule.attr.srcs:
            for f in src.files.to_list():
                if f.path.endswith(".c"):
                    print("file: ")
                    print(f.path)
                    print("includes: ")
                    print(target[CcInfo].compilation_context.includes)
                    print("quote_includes: ")
                    print(target[CcInfo].compilation_context.quote_includes)
                    print("system_includes: ")
                    print(target[CcInfo].compilation_context.system_includes)
                    print("defines: ")
                    print(ctx.rule.attr.defines)
                    print("local_defines: ")
                    print(ctx.rule.attr.local_defines)
                    print("")  # empty line to separate file prints
    return []
What I cannot figure out is how to get ALL includes and defines used when compiling the library:
- From libraries depended upon, recursively: copts, defines, includes
- From the toolchain: features, cxx_builtin_include_directories
Questions:
- How do I get the missing flags, continuing with the presented technique?
- Can I somehow retrieve the compile action command string, to be appended to the analysis project using the build log API?
- Some other solution entirely? Perhaps there is something one can do with cc_toolchain instead of aspects...
Aspects are the right tool to do that. The information you're looking for is contained in the providers, fragments, and toolchains of the cc_* rules the aspect has access to. Specifically, CcInfo has the target-specific pieces, the cpp fragment has the pieces configured from command-line flags, and CcToolchainInfo has the parts from the toolchain.
CcInfo in target tells you if the current target has that provider, and target[CcInfo] accesses it.
The rules_cc my_c_compile example is where I usually look for pulling out a complete compiler command based on a CcInfo. Something like this should work from the aspect:
load("#rules_cc//cc:action_names.bzl", "C_COMPILE_ACTION_NAME")
load("#rules_cc//cc:toolchain_utils.bzl", "find_cpp_toolchain")
[in the impl]:
cc_toolchain = find_cpp_toolchain(ctx)
feature_configuration = cc_common.configure_features(
    ctx = ctx,
    cc_toolchain = cc_toolchain,
    requested_features = ctx.features,
    unsupported_features = ctx.disabled_features,
)
c_compiler_path = cc_common.get_tool_for_action(
    feature_configuration = feature_configuration,
    action_name = C_COMPILE_ACTION_NAME,
)
[in the loop]:
c_compile_variables = cc_common.create_compile_variables(
    feature_configuration = feature_configuration,
    cc_toolchain = cc_toolchain,
    user_compile_flags = ctx.fragments.cpp.copts + ctx.fragments.cpp.conlyopts,
    source_file = src.path,
)
command_line = cc_common.get_memory_inefficient_command_line(
    feature_configuration = feature_configuration,
    action_name = C_COMPILE_ACTION_NAME,
    variables = c_compile_variables,
)
env = cc_common.get_environment_variables(
    feature_configuration = feature_configuration,
    action_name = C_COMPILE_ACTION_NAME,
    variables = c_compile_variables,
)
That example only handles C files (not C++); you'll have to change the action names and which parts of the fragment it uses appropriately.
You have to add toolchains = ["@bazel_tools//tools/cpp:toolchain_type"] and fragments = ["cpp"] to the aspect invocation to use those. Also see the note in find_cc_toolchain.bzl about the _cc_toolchain attr if you're using legacy toolchain resolution.
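From there, one simple way to hand each reconstructed command to the analysis step is to persist it per source file; a minimal sketch inside the loop (the output file name is illustrative, and an aspect would still need to surface these files, e.g. via OutputGroupInfo):
compile_cmd = ctx.actions.declare_file(f.basename + ".compile_command")
ctx.actions.write(
    output = compile_cmd,
    # The compiler binary followed by its full argument list, one file per source.
    content = " ".join([c_compiler_path] + command_line),
)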
The information coming from the rules and the toolchain is already structured. Depending on what your analysis tool wants, it might make more sense to extract it directly instead of generating a full command line. Most of the provider, fragment, and toolchain is well-documented if you want to look at those directly.
You might pass required_providers = [CcInfo] to aspect to limit propagation to rules which include it, depending on how you want to manage propagation of your aspect.
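Putting those pieces together, the aspect declaration might look like this sketch (the aspect name is illustrative):
print_aspect = aspect(
    implementation = _print_aspect_impl,
    attr_aspects = ["deps"],
    # Needed for ctx.fragments.cpp and find_cpp_toolchain() above.
    fragments = ["cpp"],
    toolchains = ["@bazel_tools//tools/cpp:toolchain_type"],
    # Optional: only propagate to rules that advertise CcInfo.
    required_providers = [CcInfo],
)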
The Integrating with C++ Rules documentation page also has some more info.

Have all Bazel packages expose their documentation files (or any file with a given extension)

Bazel has been working great for me recently, but I've stumbled upon a question for which I have yet to find a satisfactory answer:
How can one collect all files bearing a certain extension from the workspace?
Another way of phrasing the question: how could one obtain the functional equivalent of doing a glob() across a complete Bazel workspace?
Background
The goal in this particular case is to collect all markdown files to run some checks and generate a static site from them.
At first glance, glob() sounds like a good idea, but will stop as soon as it runs into a BUILD file.
Current Approaches
The current approach is to run the collection/generation logic outside of the sandbox, but this is a bit dirty, and I'm wondering if there is a way that is both "proper" and easy (i.e., not requiring that each BUILD file explicitly exposes its markdown files).
Is there any way to specify, in the workspace, some default rules that will be added to all BUILD files?
You could write an aspect for this to aggregate markdown files in a bottom-up manner and create actions on those files. There is an example of a file_collector aspect here. I modified the aspect's extensions for your use case. This aspect aggregates all .md and .markdown files across targets on the deps attribute edges.
FileCollector = provider(
    fields = {"files": "collected files"},
)

def _file_collector_aspect_impl(target, ctx):
    # This function is executed for each dependency the aspect visits.

    # Collect files from the srcs
    direct = [
        f
        for f in ctx.rule.files.srcs
        if ctx.attr.extension == f.extension
    ]

    # Combine direct files with the files from the dependencies.
    files = depset(
        direct = direct,
        transitive = [dep[FileCollector].files for dep in ctx.rule.attr.deps],
    )

    return [FileCollector(files = files)]

markdown_file_collector_aspect = aspect(
    implementation = _file_collector_aspect_impl,
    attr_aspects = ["deps"],
    attrs = {
        "extension": attr.string(values = ["md", "markdown"]),
    },
)
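Since the aspect takes an explicit extension parameter, the usual way to trigger it is through a rule attribute. A minimal sketch of such a driver rule (names are illustrative):
def _collect_markdown_impl(ctx):
    # Merge the FileCollector providers the aspect attached to each dep.
    files = depset(transitive = [dep[FileCollector].files for dep in ctx.attr.deps])
    manifest = ctx.actions.declare_file(ctx.label.name + ".manifest")
    ctx.actions.write(manifest, "\n".join([f.short_path for f in files.to_list()]))
    return [DefaultInfo(files = depset([manifest]))]

collect_markdown = rule(
    implementation = _collect_markdown_impl,
    attrs = {
        # Must match the aspect's attribute name to parameterize it.
        "extension": attr.string(default = "md", values = ["md", "markdown"]),
        "deps": attr.label_list(aspects = [markdown_file_collector_aspect]),
    },
)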
Another way is to do a query on file targets (input and output files known to the Bazel action graph), and process these files separately. Here's an example querying for .bzl files in the rules_jvm_external repo:
$ bazel query //...:* | grep -e ".bzl$"
//migration:maven_jar_migrator_deps.bzl
//third_party/bazel_json/lib:json_parser.bzl
//settings:stamp_manifest.bzl
//private/rules:jvm_import.bzl
//private/rules:jetifier_maven_map.bzl
//private/rules:jetifier.bzl
//:specs.bzl
//:private/versions.bzl
//:private/proxy.bzl
//:private/dependency_tree_parser.bzl
//:private/coursier_utilities.bzl
//:coursier.bzl
//:defs.bzl

Does Bazel need external-repo BUILD files to be in $WORKSPACE_ROOT/external?

I made a repository for glfw with this:
load("#bazel_tools//tools/build_defs/repo:git.bzl", "new_git_repository")
new_git_repository(
name = "glfw",
build_file = "BUILD.glfw",
remote = "https://github.com/glfw/glfw.git",
tag = "3.2.1",
)
I put BUILD.glfw in the WORKSPACE root. When I built, I saw:
no such package '@glfw//': Not a regular file: [snipped]/external/BUILD.glfw
I moved BUILD.glfw to external/BUILD.glfw and it seems to work, but I couldn't find documentation about this. The docs about new_git_repository say that build_file "...is a label relative to the main workspace."; I don't see anything about 'external' there.
This is due to an inconsistent semantic difference between the native and the (newer) Starlark versions of new_git_repository. To use the native new_git_repository, comment out or remove the load statement:
# load("@bazel_tools//tools/build_defs/repo:git.bzl", "new_git_repository")
Assuming that new_git_repository has the same problem that http_archive has, then per Bazel issue 6225 you need to refer to the BUILD file for glfw as @//:BUILD.glfw.
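So, assuming that workaround applies, a hypothetical version of the original snippet would be:
new_git_repository(
    name = "glfw",
    # Explicitly anchor the BUILD file label to the main workspace root.
    build_file = "@//:BUILD.glfw",
    remote = "https://github.com/glfw/glfw.git",
    tag = "3.2.1",
)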

Why can't waf find a path that exists?

Let's say I have an x.y file in /mydir/a/b (on Linux).
When I run waf, it does not find the file.
def configure(context):
    pass

def build(build_context):
    build_context(source='/mydir/a/b/x.y',
                  rule='echo ${SRC} > ${TGT}',
                  target='test.out')
Result: source not found: '/mydir/a/b/x.y' in bld(features=[], idx=1, meths=['process_rule', 'process_source'] ...
Ok, maybe you want a relative path, Waf? And you are not telling me?
def build(context):
    path_str = '/mydir/a/b'
    xy_node = context.path.find_dir(path_str)
    if xy_node is None:
        exit("Error: Failed to find path {}".format(path_str))

    # just refer to the current script
    orig_path = context.path.find_resource('wscript')
    rel_path = xy_node.path_from(orig_path)
    print "Relative path: ", rel_path
Result: Error: Failed to find path /mydir/a/b
But that directory exists! What's up with that?
And, by the way, the relative path it computes for some subdirectory (which it can find) is off by one level: e.g., a/b under the current directory results in the relative path "../a/b". I'd expect "a/b".
In general there are (at least) two node objects in each context:
- path: points to the location of the wscript
- root: points to the filesystem root
So in your case the solution is to use context.root:
def build(context):
    print context.path.abspath()
    print context.root.abspath()
    print context.root.find_dir('/mydir/a/b')
Hmm, looks like I found an answer on the waf-users group forum, answered by Mr. Nagy himself:
The source files must be present under the top-level directory. You may either:
- create a symlink to the source directory
- copy the external source files into the build directory (which may cause problems if there is a structure of folders to copy)
- set top to a common folder such as '/' (may require superuser permissions, so it is a bad idea in general)
The recommendation in conclusion is to add a symlink to the outside directory during the configuration step. I wonder how that would work, if I need this on both Linux and Windows...
Just pass the Node to the copy rule instead of passing the string representing the path:
def build(build_context):
    source_node = build_context.root.find_node('/mydir/a/b/x.y')
    build_context(source=source_node,
                  rule='echo ${SRC} > ${TGT}',
                  target='test.out')
Waf will be able to find the file even if it is outside of the top-level directory.
