I have a yaml file in a Bazel monorepo that has constants I'd like to use in several languages. This is kind of like how protobuffs are created and used.
How can I parse this yaml file at build time and depend on the outputs?
For instance:
item1: "hello"
item2: "world"
nested:
nested1: "I'm nested"
nested2: "I'm also nested"
I then need to parse this yaml file so it can be used in many different languages (e.g., Rust, TypeScript, Python, etc.). For instance, here's the desired output for TypeScript:
export default {
item1: "hello",
item2: "world",
nested: {
nested1: "I'm nested",
nested2: "I'm also nested",
}
}
Notice, I don't want TypeScript code that reads the yaml file and converts it into an object. That conversion should be done in the build process.
For the actual conversion, I'm thinking of writing that in Python, but it doesn't need to be. This would then mean the python also needs to run at build time.
P.S. I care mostly about the functionality, so I'm flexible with the exactly how it's done. I'm even fine using another file format aside from yaml.
Thanks to help from #oakad, I was able to figure it out. Essentially, you can use genrule to create outputs.
So, assuming you have some target like python setup to generate the output named parse_config, you can just do this:
genrule(
name = "generated_output",
srcs = [],
outs = ["output.txt"],
cmd = "./$(execpath :parse_config) > $#" % name,
exec_tools = [":parse_config"],
visibility = ["//visibility:public"],
)
The generated file is output.txt. And you can access it via //lib/config:generated_output.
Note, essentially the cmd is piping the stdout into the file contents. In Python that means anything printed will appear in the generated file.
Related
Imagine I have a java_binary target triggered by a custom rule that generates source code and places the generated sources under a directory, let's call it "root".
So after the code generation we will have something like this:
// bazel-bin/...../src/com/example/root
root:
-> Foo.java
-> Bar.java
-> utils
-> Baz.java
Now, I have another target, a java_library, that depends on the previously generated sources, so it depends on the custom rule.
My custom rule definition currently looks something like this:
def _code_generator(ctx):
outputDir = ctx.actions.declare_directory("root")
files = [
ctx.actions.declare_file("root/Foo.java"),
ctx.actions.declare_file("root/Bar.java"),
ctx.actions.declare_file("root/utils/Baz.java"),
// and many,
// many other files
]
outputs = []
outputs.append(outputDir)
outputs.extend(files)
ctx.actions.run(
executable = // executable pointing to the java_binary
outputs = outputs
// ....
)
This works. But as you can see, every anticipated file that is to be generated, is hard-coded in the rule definition. This makes it very fragile, should the code generation produce a different set of files in the future (which it will).
(Without specifying each of the files, as shown above, Bazel will fail the build saying that the files have no generating action)
So I was wondering, is there a way to read the content of the root directory and automatically, somehow, declare each of the files as an output?
What I tried:
The documentation of declare_directory says:
The contents of the directory are not directly accessible from Starlark, but can be expanded in an action command with Args.add_all().
And add_all says:
[...] Each directory File item is replaced by all Files recursively contained in that directory.
This sounds like there could be a way to get access to the individual files in the directory, but I am not sure how.
I tried:
outputDir = ctx.actions.declare_directory("root")
//...
args = ctx.actions.args()
args.add_all(outputDir)
with the intention to access the individual files later from args, but the build fails with: "Error in add_all: expected value of type sequence or depset for values, got File".
Any other ideas on how to implement the rule, so that I don't have to hard-code each and every file that will be generated?
What is the best way to refer to an external package's path in any arbitrary files processed by Bazel?
I'm trying to understand how Bazel preprocesses BUILD and .bzl files. I see instances where strings contain calls to package() and I am wondering how it works (and could not find any relevant documentation). Here is an example of this:
I have a toolchain which BUILD file contains the following expression :
cc_toolchain_config(
name = "cc-toolchain-config",
abi_libc_version = "glibc_" + host_gcc8_bundle()["pkg_version"]["glibc"],
abi_version = "gcc-" + host_gcc8_bundle()["version"],
compiler = "gcc-" + host_gcc8_bundle()["version"],
cpu = "x86_64",
cxx_builtin_include_directories = [
"%package(#host_gcc8_toolchain//include/c++/8)%",
"%package(#host_gcc8_toolchain//lib64/gcc/x86_64-unknown-linux-gnu/8/include-fixed)%",
"%package(#host_gcc8_kernel_headers//include)%",
"%package(#host_gcc8_glibc//include)%",
],
host_system_name = "x86_64-unknown-linux-gnu",
target_libc = "glibc_" + host_gcc8_bundle()["pkg_version"]["glibc"],
target_system_name = "x86_64-unknown-linux-gnu",
toolchain_identifier = "host_linux_gcc8",
)
From my understanding, the cxx_builtin_include_directories defines a list of strings to serve as the --sysroot option passed to GCC as detailed in https://docs.bazel.build/versions/0.23.0/skylark/lib/cc_common.html These strings are in the format %sysroot%.
Since package(#host_gcc8_toolchain//include/c++/8) for example, does not mean anything to GCC, bazel has to somehow expand this function to produce the actual path to the files included in the package before passing them to the compiler driver.
But how can it determine that this needs to be expanded and that it is not a regular string ? So how does Bazel preprocess the BUILD file ? Is it because of the % ... % pattern ? Where is this documented ?
is "%package(#external_package//target)%" a pattern that can be used elsewhere ? In any BUILD file ? Where do I find Bazel documentation showing how this works ?
These directives are expanded by cc_common.create_cc_toolchain_config_info within the cc_toolchain_config rule implementation not any sort of preprocessing on the BUILD file (I.e., "%package(#host_gcc8_glibc//include)%" is literally passed into the cc_toolchain_config rule.) I'm not aware that these special expansions are completely documented anywhere but the source.
Bazel has been working great for me recently, but I've stumbled upon a question for which I have yet to find a satisfactory answer:
How can one collect all files bearing a certain extension from the workspace?
Another way of phrasing the question: how could one obtain the functional equivalent of doing a glob() across a complete Bazel workspace?
Background
The goal in this particular case is to collect all markdown files to run some checks and generate a static site from them.
At first glance, glob() sounds like a good idea, but will stop as soon as it runs into a BUILD file.
Current Approaches
The current approach is to run the collection/generation logic outside of the sandbox, but this is a bit dirty, and I'm wondering if there is a way that is both "proper" and easy (ie, not requiring that each BUILD file explicitly exposes its markdown files.
Is there any way to specify, in the workspace, some default rules that will be added to all BUILD files?
You could write an aspect for this to aggregate markdown files in a bottom-up manner and create actions on those files. There is an example of a file_collector aspect here. I modified the aspect's extensions for your use case. This aspect aggregates all .md and .markdown files across targets on the deps attribute edges.
FileCollector = provider(
fields = {"files": "collected files"},
)
def _file_collector_aspect_impl(target, ctx):
# This function is executed for each dependency the aspect visits.
# Collect files from the srcs
direct = [
f
for f in ctx.rule.files.srcs
if ctx.attr.extension == f.extension
]
# Combine direct files with the files from the dependencies.
files = depset(
direct = direct,
transitive = [dep[FileCollector].files for dep in ctx.rule.attr.deps],
)
return [FileCollector(files = files)]
markdown_file_collector_aspect = aspect(
implementation = _file_collector_aspect_impl,
attr_aspects = ["deps"],
attrs = {
"extension": attr.string(values = ["md", "markdown"]),
},
)
Another way is to do a query on file targets (input and output files known to the Bazel action graph), and process these files separately. Here's an example querying for .bzl files in the rules_jvm_external repo:
$ bazel query //...:* | grep -e ".bzl$"
//migration:maven_jar_migrator_deps.bzl
//third_party/bazel_json/lib:json_parser.bzl
//settings:stamp_manifest.bzl
//private/rules:jvm_import.bzl
//private/rules:jetifier_maven_map.bzl
//private/rules:jetifier.bzl
//:specs.bzl
//:private/versions.bzl
//:private/proxy.bzl
//:private/dependency_tree_parser.bzl
//:private/coursier_utilities.bzl
//:coursier.bzl
//:defs.bzl
The Bazel Starlark API does strange things with files in external repositories. I have the following Starlark snippet:
print(ctx.genfiles_dir)
print(ctx.genfiles_dir.path)
print(output_filename)
ret = ctx.new_file(ctx.genfiles_dir, output_filename)
print(ret.path)
It is creating the following output:
DEBUG: build_defs.bzl:292:5: <derived root>
DEBUG: build_defs.bzl:293:5: bazel-out/k8-fastbuild/genfiles
DEBUG: build_defs.bzl:294:5: google/protobuf/descriptor.upb.c
DEBUG: build_defs.bzl:296:5: bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/descriptor.upb.c
That extra external/com_google_protobuf comes seemingly out of nowhere, and it makes my rule fail:
I tell protoc to generate into ctx.genfiles_dir.path (which is bazel-out/k8-fastbuild/genfiles).
So protoc generates bazel-out/k8-fastbuild/genfiles/google/protobuf/descriptor.upb.c
Bazel fails because I didn't generate bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/descriptor.upb.c
Likewise, when I try to call file.short_path on a source file from an external repository, I get a result like ../com_google_protobuf/google/protobuf/descriptor.proto. This seems quite unhelpful, so I just wrote some manual code to strip off the leading ../com_google_protobuf/.
Am I missing something? How can I write this rule in a way that doesn't feel like I'm fighting Bazel the whole time?
Am I missing something?
The basic problem, as you already realized, is that you have two path "namespaces" the one that protoc sees (i.e. import paths) and the one that bazel sees (i.e. the path you pass to declare_file().
2 things to note:
1) All paths declared with declare_file() get the path <bin dir>/<package path incl. workspace>/<path you passed to declare_file()>
2) All actions are executed from <bin dir> (unless output_to_genfils=True in which case this switches to <gen dir> as in your example.
Trying to solve the exact same problem you encountered, I resorted to stripping the known path from the output_file's path to determine which directory to pass as p:
# This code is run from the context of the external protobuf dependency
proto_path = "google/a/b.proto"
output_file = ctx.actions.declare_file(proto_path)
# output_file.path would be `<gen_dir>/external/protobuf/google/a/b.proto`
# Strip the known proto_path from output_file.path
protoc_prefix = output_file.path[:-len(proto_path)]
print(protoc_prefix) # Prints: <gen_dir>/external/protobuf
command = "{protoc} {proto_paths} {cpp_out} {plugin} {plugin_options} {proto_file}".format(
...
cpp_out = "--cpp_out=" + protoc_prefix,
...
)
Alternatives
You may also be able to construct the same path with ctx.bin_dir, ctx.label.workspace_name, ctx.label.package, and ctx.label.name.
Misc.
proto_library recently gained an attribute strip_import_prefix. When used, the above is not correct, as all dependent files are symlinked into a new directory from which they have the relative paths declared with strip_import_prefix.
The path format is:
<bin dir>/<repo>/<package>/_virtual_base/<label name>/<path `import`ed in .proto files>
i.e.
<bin dir>/external/protobuf/_virtual_base/b_proto/google/a/b.proto
Assuming you are building an external repo called protobuf, which contains a BUILD file at its root with a target named b_proto, which in turn, relies on a proto_library wrapping google/a/b.proto AND uses the strip_import_prefix attribute.
Suppose I have this macro which creates a native.cc_binary target:
def build_it(name, **kwargs):
native.cc_binary(
name = name + ".out",
linkopts = [
"-Lsomedir",
"-lsomelib"
],
**kwargs)
And I also have this rule which takes some sources, runs a tool on them, and generates a value, writing that value to an output file:
def _write_value_impl:
args = [f.path for f in ctx.files.srcs] + [ctx.outputs.out.path]
ctx.actions.run(
inputs = ctx.files.srcs,
outputs = [ctx.outputs.out],
arguments = args,
executable = ctx.executable._tool
)
write_value = rule(
implementation=_write_value_impl,
attrs = {
"srcs": attr.label_list(allow_files = True),
"out": attr.output(mandatory = True),
"_tool": attr.label(
executable = True,
allow_files = True,
default = Label("//tools:generate_value")
}
)
Okay, what I'd like to do is modify the macro so that it adds the value generated by the write_value rule to the linkopts. Something like this:
def build_it(name, value, **kwargs):
native.cc_binary(
name = name + ".out",
linkopts = [
"-Lsomedir",
"-lsomelib",
"-Wl,--defsym=SOME_SYMBOL={}".format(value)
],
**kwargs)
How do I make this work? The problem is that the target of build_it is generated at analysis time, but the value it needs is generated at evaluation time. Also, the value got put into a file. How do I get the value out of the file and give it to the macro?
I suspect that instead of a macro, I need a rule, but how do I get the rule to call native.cc_binary?
You can write a repository_rule() to create files and generate values prior to the loading phase, and then files in the #external_repo//... will be accessible by the rules during analysis. https://docs.bazel.build/versions/master/skylark/repository_rules.html
This can't be done in Bazel, precisely because of what you mentioned. All the inputs to rules need to be determined in the analysis phase rather than the execution phase. Bazel wants to build the complete action graph before executing any actions, and this would require the write_value rule to run before build_it could be analyzed.
A workaround might be to generate the BUILD file yourself outside of Bazel beforehand, and then use the generated BUILD file during your build.
Another workaround is to hard-code the linkopts to specify them to what you expect them to be. Then in write_value check if they are what you expected and if not throw an exit code. That way Bazel will at least warn you when they aren't going to match, but it will take some effort to update both places so they're aligned again.
For your specific problem there is a concept of linker scripts, and even implicit linker scripts. Perhaps you could generate one and supply that to cc_binary in the srcs attribute. You may need to name it as a .o file (even though it's not an object file). The GCC linker documentation says:
If you specify a linker input file which the linker can not recognize
as an object file or an archive file, it will try to read the file as
a linker script.