Aggregate filegroups in subpackages into a large filegroup in Bazel?

I have a parent directory foo, and child directories bar, baz, and qux. All four directories contain a Bazel BUILD file and define filegroup rules that contain all files in their directory (plus various other rules).

The problem is that the filegroup in the parent directory foo cannot use a glob to ensure that all files are included, because globs do not cross package boundaries. Instead, I'm currently listing all of the children's rules manually as srcs in foo, and this manual listing is error-prone: when another child of foo is added, the author must remember to add it to the srcs of foo.

I tried to make some progress by adding a genquery rule in foo (thinking I could somehow extract a list of srcs from it programmatically at build time), but recursive patterns are not allowed in the expression of a genquery rule, so this was unsuccessful.
What is the least mistake-prone way of creating such a filegroup? Is there anything better than my current manual construction of it by listing srcs?

The srcs of a filegroup is a list of labels.
Hence, you can (and should) do:
filegroup(
    name = "foo_supergroup",
    srcs = [
        "//foo/bar:smallergroup",
        "//foo/baz:smallergroup",
        "//foo/qux:smallergroup",
    ],
)
Edit: You can then add a presubmit check that these dependencies are the same as the subgroups.
For this purpose, I suggest you introduce a tag "yeah":
Each child package's BUILD file (foo/bar/BUILD, etc.) contains
filegroup(
    name = "smallergroup",
    srcs = glob(["*.txt"]),
    tags = ["yeah"],
)
Thanks to this tag, you can query for all the tagged filegroups:
bazel query 'attr("tags", ".*yeah.*", deps(//foo/...))'
//foo/bar:smallergroup
//foo/baz:smallergroup
//foo/qux:smallergroup
It then becomes easy to compare with the sources of the supergroup:
bazel query 'deps(//foo:foo_supergroup, 1)'
//foo:foo_supergroup
//foo/bar:smallergroup
//foo/baz:smallergroup
//foo/qux:smallergroup
In fact, you don't need a specific presubmit check. You can use a sh_test (running diff) to compare the output of these two queries, materialized at build time with genquery.
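As an illustration, here is a minimal sketch of such a check written as a standalone presubmit script (the file name check_supergroup.sh is made up). A genquery-based sh_test works the same way, except that genquery expressions cannot use recursive patterns like //foo/..., so the tag query there would have to be phrased against a fixed scope.

#!/bin/bash
# check_supergroup.sh: fail if the tagged subgroups and the supergroup's
# direct srcs differ.
set -euo pipefail
diff \
  <(bazel query 'attr("tags", ".*yeah.*", deps(//foo/...))' | sort) \
  <(bazel query 'deps(//foo:foo_supergroup, 1)' | grep -v '^//foo:foo_supergroup$' | sort)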

Related

How does Bazel interpret special lexemes like `/`, `:`, `//` and `@` in labels?

I'm having trouble understanding how to construct proper label forms when dealing with external repositories (directories with their own WORKSPACE).
What is the semantic meaning of characters like /, :, // or @?
For example:
@foo/bar
@foo:bar
//foo
foo
Do they preserve their meaning when used in an external repository? Also, is //external special in any way?
/ is a separator for package and target names.
relative/package/to/my:target
//absolute/package/to:my/file/target.java
A package is defined as a directory containing a BUILD or BUILD.bazel file.
: is the lexeme for selecting a rule or file target in a package.
//my/package:my_java_binary
Selects the target my_java_binary defined in <workspace root>/my/package/BUILD
//my/package:file.go
Selects the file <workspace root>/my/package/file.go if <workspace root>/my/package/BUILD exists, and if there's a rule in that BUILD file that references it.
//:my/nested/file.txt
Selects the file <workspace root>/my/nested/file.txt if <workspace root>/BUILD exists, but not in the my and my/nested subdirectories.
// is the location of the current or closest parent directory containing a WORKSPACE file.
Otherwise known as workspace root.
@ is used for referencing a repository by its name when used to the left of //
@io_bazel_rules_scala//scala:scala.bzl: look into your WORKSPACE file for a repository named io_bazel_rules_scala. Usually defined using http_archive or git_repository.
@//my/package:target: @ alone refers to the current workspace.
As of Bazel 0.16.0, @ can be used in package names.
Do they preserve their meaning when used in an external repository?
Yes, think of the @<repository> syntax as a namespace mechanism.
Also, is //external special in any way?
Yes, it's used for the bind function, which is not recommended anymore. bind lets you give a target an alias in //external.
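For illustration, a bind() entry in a WORKSPACE file looks roughly like this (the repository and target names are placeholders); nowadays an alias() target in a regular BUILD file is the usual substitute:

bind(
    name = "six",                   # becomes addressable as //external:six
    actual = "@six_archive//:six",  # the real target it points to
)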

How to retrieve fully-qualified names for target's dependencies?

I want to create a tarball of a binary and all libs it depends upon using pkg_tar(). I can retrieve a list of the binary's dependencies with
deps = native.existing_rule('my_binary')['deps']
However, the items in the list lack the @repo_name// prefix that was specified in the cc_binary() rule. For example, @system//:ace becomes :ace; when I try to operate on :ace, bazel rightfully tells me there is no such target.
I've looked through the entire dictionary returned by native.existing_rule and don't see a way to find the missing info. Is it not possible to retrieve this information with native.existing_rule or similar?
I know I can write a macro that creates the cc_binary target and the pkg_tar target, sharing the list of deps between them. This would be more elegant - but it seems quite strange if the deps can't be retrieved from the rule.
Have you considered using aspects? You can attach an aspect to dependencies of a given target and propagate information (in this case, fully-qualified label strings?) up to the root.
Let me know if you need any additional guidance!
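For illustration, a minimal sketch of such an aspect (the provider and aspect names are made up, not part of Bazel):

CollectedLabelsInfo = provider(fields = ["labels"])

def _collect_labels_aspect_impl(target, ctx):
    # Start with this target's own fully-qualified label...
    labels = [str(target.label)]
    # ...and merge in whatever the aspect already collected on its deps.
    for dep in getattr(ctx.rule.attr, "deps", []):
        if CollectedLabelsInfo in dep:
            labels += dep[CollectedLabelsInfo].labels
    return [CollectedLabelsInfo(labels = labels)]

collect_labels_aspect = aspect(
    implementation = _collect_labels_aspect_impl,
    attr_aspects = ["deps"],
)

A rule that needs the list would then attach the aspect to its own deps attribute (attr.label_list(aspects = [collect_labels_aspect])) and read dep[CollectedLabelsInfo].labels in its implementation, for instance to feed the labels into a packaging step.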

How to deal with implicit dependencies (e.g. C++ includes) for incremental builds in a custom Skylark rule

Problem
I wonder how to inform Bazel about dependencies that are unknown at declaration time but known at build time (a.k.a. implicit dependencies, dynamic dependencies, ...). For instance, when compiling C++ sources, a .cpp source file depends on some header files, and this information is not available when writing the BUILD file; it needs to be retrieved at build time. Whatever the mechanism for obtaining the information (a dry run, generating a depfile, parsing stdout), it has to happen at build time and the result needs to be fed back into Bazel's build graph.
Since Skylark does not allow I/O (for instance, reading a generated depfile or parsing stdout output containing a dependency list), I have no clue how to deal with this.
Beyond the implicit dependencies themselves, what I am really after is correct incremental builds.
Example
To experiment with this problem I have created a simple tool, just_a_tool.exe, which takes an input file, reads a list of files from it, and concatenates the content of all these files into an output file.
command line example:
just_a_tool.exe --input input.txt --depfile dep.d output.txt
dep.d contains the list of all the read files.
Issue
If I change the content of test1.txt, test2.txt, or test3.txt, bazel does not rebuild the output.txt file. Of course not, because it does not know about these dependencies.
Example files
just_a_tool.bzl
def _impl(ctx):
    exec_path = "C:/Code/JustATool/just_a_tool.exe"
    source_path = ctx.file.source.path
    output_path = ctx.outputs.out.path
    dep_file = ctx.actions.declare_file("dep.d")
    args = ["--input", source_path, "--depfile", dep_file.path, output_path]
    ctx.actions.run(
        outputs = [ctx.outputs.out, dep_file],
        executable = exec_path,
        # Only the declared source is an input; the files listed inside
        # input.txt are not, which is exactly the problem described above.
        inputs = [ctx.file.source],
        arguments = args,
    )

jat_convert = rule(
    implementation = _impl,
    attrs = {
        "source": attr.label(mandatory = True, allow_files = True, single_file = True),
    },
    outputs = {"out": "%{name}.txt"},
)
BUILD
load("//tool:just_a_tool.bzl", "jat_convert")
jat_convert(
name="my_output",
source=":input.txt"
)
input.txt
test1.txt
test2.txt
test3.txt
Goal
I want correct and fast incremental builds for the following situations:
Generating reflection data from C++ sources; this custom tool's execution depends on header files included in my source files.
Using an internal tool to build asset files which can include other files.
Running a custom preprocessor on my shaders that supports a #include feature.
Thanks!
Bazel's extension language doesn't support creating actions with a dynamic set of inputs, where this set depends on the output of a previous action. In other words, custom rules cannot run an action, read the action's output, then create actions with those inputs or update (or prune the set of) inputs of already created actions.
Instead, I suggest adding attribute(s) to your rule where the user can declare the set of files that the sources may include. I call this "the universe of headers". The actions you create depend on this user-defined universe, so the set of action inputs is completely defined. Of course this means these actions potentially depend on more files than the .cpp files they process actually include.
This approach is analogous to how the cc_* rules work: a file in cc_*.srcs can include other files in the srcs of the same rule and from hdrs of dependencies, but nothing else. Thus the union of srcs + hdrs of (direct & transitive) dependencies defines the universe of header files that a cpp file may include.
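To make this concrete, here is a hedged sketch that extends the jat_convert rule from the question with a user-declared universe attribute (the attribute name and the glob are illustrative, not a fixed Bazel convention):

def _impl(ctx):
    dep_file = ctx.actions.declare_file("dep.d")
    args = ["--input", ctx.file.source.path, "--depfile", dep_file.path, ctx.outputs.out.path]
    ctx.actions.run(
        outputs = [ctx.outputs.out, dep_file],
        executable = "C:/Code/JustATool/just_a_tool.exe",
        # The declared universe is part of the action's inputs, so changing
        # test1.txt, test2.txt, or test3.txt now invalidates the action.
        inputs = [ctx.file.source] + ctx.files.universe,
        arguments = args,
    )

jat_convert = rule(
    implementation = _impl,
    attrs = {
        "source": attr.label(mandatory = True, allow_files = True, single_file = True),
        # All files the input may reference; over-approximating is fine.
        "universe": attr.label_list(allow_files = True),
    },
    outputs = {"out": "%{name}.txt"},
)

The BUILD file would then declare the universe explicitly, for example:

jat_convert(
    name = "my_output",
    source = ":input.txt",
    universe = glob(["*.txt"]),
)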

Mismatched names for Bazel external dependencies in different projects

Say there are two Bazel projects, they both depend on the Python package six.
Project A adds six with the name six_1_10_0:
new_http_archive(
    name = "six_1_10_0",
    ...
)

py_binary(
    name = "lib_a",
    deps = ["@six_1_10_0//:six"],
)
Project B adds six with the name six_archive:
new_http_archive(
    name = "six_archive",
    ...
)

py_binary(
    name = "lib_b",
    deps = ["@six_archive//:six"],
)
In my project, I depend on both A and B. Is there a way to let them use the same six?
To change the BUILD file contents of a dependency, the simplest way I can think of is to use one of the new_* repository rules (e.g. new_git_repository). Using the build_file or build_file_content attribute, write a new BUILD file containing a py_binary rule whose deps point at your canonical @six repository, keeping all other attributes the same.
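A hedged sketch of that suggestion, assuming project B is fetched from a Git repository (the remote, tag, and file names are placeholders; depending on your Bazel version you may need to load new_git_repository from @bazel_tools//tools/build_defs/repo:git.bzl):

new_git_repository(
    name = "project_b",
    remote = "https://github.com/example/project_b.git",  # placeholder
    tag = "v1.0",                                          # placeholder
    # Replace project B's BUILD file so lib_b depends on the canonical six.
    build_file_content = """
py_binary(
    name = "lib_b",
    srcs = ["lib_b.py"],
    deps = ["@six_1_10_0//:six"],
    visibility = ["//visibility:public"],
)
""",
)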
There isn't a straightforward way of doing this, because Bazel makes no assumption about why Project A uses a different version of six than Project B.
The only way Bazel knows that they're using the same version is if both new_http_archive rules specify the same SHA checksum. If the checksums are identical, you can use --experimental_repository_cache=/some/path to avoid downloading the same archive twice.
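For example (the cache directory is arbitrary):

bazel build --experimental_repository_cache="$HOME/.bazel_repo_cache" //...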

Skylark - How to execute a jar from a repository rule

Context
I am writing a repository rule that invokes another Bazel project. My current approach is to build the additional project as a deploy jar. I would like a user to be able to instantiate the rule like:
jar_path = "some/relative/path"
my_rule(name = "something", p_arg = "m_arg", binary = jar_path)
and then given the jar_path and the arguments, I would like the repository rule to execute the following command in the shell:
java -jar $(SOME_JAR) $(ARGUMENTS_PROVIDED_BY_RULE)
Problem
First, it's unclear how best to accomplish the deploy jar approach. So far, I have attempted two different approaches, with varying levels of success. As examples, I have skimmed through the scala_rules, the maven_rules, and the Skylark cookbook.
Second, and more importantly, I am not sure whether the deploy jar is the best route to accomplishing my goals. Again, my interest is to invoke a target from an external Bazel project that is currently hosted on GitHub. (So feasibly, I could try to fetch the project using the http_archive rule.)
Below, I describe the attempts I have made.
Approach 1
My first approach involved trying to execute the command using the command field in ctx.action. I tried various enumerations of
java -jar {computed_absolute_path_of_deploy_jar} {args_passed_from_instantiation}.
My biggest issue here was determining the absolute path of the deploy jar. The file's root path would contain some additional information. For example, it would look something like this:
/abs/olute/path[ something ]/rela/tive/path
As a side note, I'm not sure if this is a bug/nit, but File.root.path evaluated to None, despite File.root not being None.
My first attempt here was to try to use skylark [ctx.binary].
Approach 2
The next thing I tried was to mimic the input binary example from the docs. This was also unsuccessful; the issue was that the actual binary could not be found. Here is how I configured it.
First, I relaxed the repository rule into a regular skylark rule.
def _test_binary(ctx):
    ctx.action(
        ....
        arguments = [ctx.attr.p_arg],
        executable = ctx.executable.binary,
    )

test_binary = rule(
    ...
    attrs = {
        "binary": attr.label(mandatory = True, cfg = "host", allow_files = True, executable = True),
        ...
    },
)
Then, in my external project, I loaded the skylark rule into the WORKSPACE file. Finally, I called the macro from one of my BUILD files as follows:
load("#something_rule//:something_rule.bzl", "test_binary")
test_binary(name = "hello", p_arg = "hello", binary = "script.sh")
The script is a one-liner, java -jar something_deploy.jar -- -arg:$1, and is in the same directory as the BUILD file.
Bazel complains that src/script.sh does not exist. I presume because it is looking for the file in /private/var/tmp/_bazel_username/somehash/relative_path. In response, I tried to pass the absolute path, which is not allowed.
Cheers.
It looks like you're mixing up repository rules with build extensions ("normal" rules). A good rule of thumb is:
Repository rules are for getting sources onto your system or symlinking them to a place Bazel can see them.
Build extensions are for everything else: compiling, copying files, running binaries, etc.
I don't actually think you need to use either, for this. You say that the other project is on GitHub, so you can add the following to your WORKSPACE file:
http_archive(
    name = "other_project",
    ...
)
Then, in your BUILD file:
genrule(
    name = "run-a-jar",
    srcs = ["@other_project//some/relative:path"],
    cmd = "java -jar $(location @other_project//some/relative:path) -- arg1 arg2 > $@",
    outs = ["jar-output"],
)
You shouldn't need to use the _deploy.jar target, since you're not moving the jar out of its project (_deploy.jar is useful when you need to relocate it).
Other things from your question:
I'm not sure if this is a bug/nit, but File.root.path evaluated to None,
Are you sure it didn't evaluate to ""? The path is relative to the execution root, so for sources, it will always be "" (for outputs, it'll be bazel-out/local-fastbuild/bin or similar).
Bazel complains that src/script.sh does not exist.
Passing -s to Bazel can really help debugging Skylark rules. You can see exactly where it is looking.
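For example (the target label is a placeholder):

bazel build -s //some/package:hello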
