Different names for the same Bazel external dependency in different projects

Say there are two Bazel projects, and both depend on the Python package six.
Project A adds six under the name six_1_10_0:
new_http_archive(
    name = "six_1_10_0",
    ...
)

py_binary(
    name = "lib_a",
    deps = ["@six_1_10_0//:six"],
)
Project B adds six under the name six_archive:
new_http_archive(
    name = "six_archive",
    ...
)

py_binary(
    name = "lib_b",
    deps = ["@six_archive//:six"],
)
In my project, I depend on both A and B. Is there a way to make them use the same six?

To change the BUILD file contents of a dependency, the simplest way I can think of is to use one of the new_* repository rules (e.g. new_git_repository). Use the build_file or build_file_content attribute to write a new BUILD file: write a new py_binary rule whose deps contain your canonical @six repository, keeping all other attributes the same.
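Since WORKSPACE dependencies are not transitive, your own project has to declare both repositories anyway, so it can point the two names at the same archive. A minimal sketch, assuming new_http_archive as in the question; the URL, strip_prefix, and BUILD content below are illustrative assumptions, not taken from either project:

SIX_URL = "https://pypi.python.org/packages/source/s/six/six-1.10.0.tar.gz"

SIX_BUILD = """
py_library(
    name = "six",
    srcs = ["six.py"],
    visibility = ["//visibility:public"],
)
"""

# Both names resolve to identical contents; giving both the same sha256
# also lets --experimental_repository_cache deduplicate the download.
new_http_archive(
    name = "six_1_10_0",
    url = SIX_URL,
    strip_prefix = "six-1.10.0",
    build_file_content = SIX_BUILD,
)

new_http_archive(
    name = "six_archive",
    url = SIX_URL,
    strip_prefix = "six-1.10.0",
    build_file_content = SIX_BUILD,
)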
There isn't a straightforward way of doing this automatically, because Bazel makes no assumptions about why Project A uses a differently named six than Project B.
The only way Bazel can know that they're using the same version is if both new_http_archive rules specify the same SHA checksum. If the checksums are identical, you can use --experimental_repository_cache=/some/path to avoid downloading the same archive twice.

Related

Bazel WORKSPACE conditionally define exactly one of two `git_repository`s

I'm maintaining two Python libraries A and B, each partially using Bazel for building non-Python code. Library B depends on A in Bazel terms, so B needs a remote repository for A.
For the released version of B, I'd like to reference the remote repository of A in canonical form, for example git_repository pinned to a commit hash:
git_repository(
    name = "A",
    commit = "...",
    remote = "https://github.com/foo/A",
)
During development, I'd like to reference the remote repository of A in symbolic form, for example git_repository tracking the master branch:
git_repository(
    name = "A",
    branch = "master",
    remote = "https://github.com/foo/A",
)
And I'd like to use exactly one of them at a time. After some research I found there is no "conditional branch" mechanism (fed from command-line flags or environment variables) that I can use at the WORKSPACE level. So I'm asking about any options I may have missed.
The following are the alternatives I've explored, none of which I'm 100% happy with.
Using local_repository during development is not an attractive solution: in reality there are 8+ libraries with chained dependencies, and I don't think it is realistic to manually clone and regularly pull all of them.
Using alias() with select() at the BUILD level is also not very attractive, because it turns out B uses tens of A's targets, and defining aliases for all of them is not maintainable at scale (or is there a way to define an alias at the package level?):
# WORKSPACE
git_repository(name = "A", ...)
git_repository(name = "A_master", ...)

# BUILD
config_setting(name = "use_master", ...)

alias(
    name = "A_pkg_label",  # There are too many targets to declare.
    actual = select({
        ":use_master": "@A_master//pkg:label",
        "//conditions:default": "@A//pkg:label",
    }),
)
Using two WORKSPACE files seems feasible, but I couldn't find a clean way to select one WORKSPACE file over the other short of manually renaming them.
Defining a custom repository_rule that branches on a repository_ctx.os.environ value seemed promising, until I figured out that I cannot reuse other repository rules inside the implementation.
While you can't reuse other repository rules in general, in practice many of them are written in Starlark and are easy to reuse. For example, git_repository's implementation looks like this:
def _git_repository_implementation(ctx):
    update = _clone_or_update(ctx)
    patch(ctx)
    ctx.delete(ctx.path(".git"))
    return _update_git_attrs(ctx.attr, _common_attrs.keys(), update)
Most of those utility functions are either no-ops if you're only using the basic features, or can be loaded from your own Starlark code. You could do a barebones replacement with just this:
load("#bazel_tools//tools/build_defs/repo:git_worker.bzl", "git_repo")
def _my_git_repository_implementation(ctx):
directory = str(ctx.path("."))
git_repo(ctx, directory)
ctx.delete(ctx.path(".git"))
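Tying this back to the question's repository_rule idea, here is a minimal sketch of a rule that branches on an environment variable at fetch time. It shells out to git via ctx.execute instead of reusing git_worker.bzl, and the variable name A_USE_MASTER is an illustrative assumption:

# my_git.bzl -- a hedged sketch, not a drop-in git_repository replacement.
def _switchable_git_repository_impl(ctx):
    # During development, export A_USE_MASTER=1 to track master; otherwise
    # the pinned commit is used.
    ref = "master" if ctx.os.environ.get("A_USE_MASTER") else ctx.attr.commit
    ctx.execute(["git", "clone", ctx.attr.remote, "."])
    ctx.execute(["git", "checkout", ref])
    ctx.delete(ctx.path(".git"))

switchable_git_repository = repository_rule(
    implementation = _switchable_git_repository_impl,
    attrs = {
        "remote": attr.string(mandatory = True),
        "commit": attr.string(mandatory = True),
    },
    # Re-fetch the repository whenever this variable changes.
    environ = ["A_USE_MASTER"],
)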

Aggregate filegroups in subpackages into a large filegroup in bazel?

I have a parent directory foo, and child directories bar, baz, and qux. All four directories contain a bazel BUILD file and define filegroup rules that contain all files in the subdirectory (plus various other rules). The problem with this is that the filegroup in the parent directory foo cannot use a glob to ensure that all files are included (because globs do not cross package boundaries). Instead, I'm currently manually listing all of the children's rules as srcs in foo, and this manual listing is error-prone: when another child of foo is added, the author must remember to add it to the srcs of foo.

I tried to make some progress on this problem by looking at adding a genquery rule in foo (thinking I could somehow extract a list of srcs from it programmatically at build time), but recursive patterns are not allowed in the expression of a genquery rule, so this was unsuccessful.
What is the least mistake-prone way of creating such a filegroup? Is there anything better than my current manual construction of it by listing srcs?
The srcs attribute of a filegroup is a list of labels.
Hence, you can (and should) write:
filegroup(
    name = "foo_supergroup",
    srcs = [
        "//foo/bar:smallergroup",
        "//foo/baz:smallergroup",
        "//foo/qux:smallergroup",
    ],
)
Edit: You can then add a presubmit check that these dependencies are the same as the subgroups.
For this purpose, I suggest you introduce a tag, say "yeah":
Each child's BUILD file (e.g. foo/bar/BUILD) contains:
filegroup(
    name = "smallergroup",
    srcs = glob(["*.txt"]),
    tags = ["yeah"],
)
Thanks to this tag, you can query for all the smaller groups:
blaze query 'attr("tags", ".*yeah.*", deps(//foo/...))'
//foo/bar:smallergroup
//foo/baz:smallergroup
//foo/qux:smallergroup
It then becomes easy to compare with the sources of the supergroup:
blaze query 'deps(//foo:foo_supergroup, 1)'
//foo:foo_supergroup
//foo/bar:smallergroup
//foo/baz:smallergroup
//foo/qux:smallergroup
In fact, you don't need a specific presubmit: you can use a sh_test (running diff) to compare the outputs of these two queries, materialized at build time with genquery, as sketched below.
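A hedged sketch of that test; the target names and the diff_test.sh wrapper script are assumptions. Note that genquery expressions may not use recursive patterns such as //foo/..., so both queries are anchored at the supergroup:

genquery(
    name = "tagged_groups",
    expression = 'attr("tags", ".*yeah.*", deps(//foo:foo_supergroup))',
    scope = ["//foo:foo_supergroup"],
)

genquery(
    name = "supergroup_direct_deps",
    expression = "deps(//foo:foo_supergroup, 1)",
    scope = ["//foo:foo_supergroup"],
)

sh_test(
    name = "supergroup_complete_test",
    # diff_test.sh might be as simple as:
    #   diff <(sort "$1") <(grep -v :foo_supergroup "$2" | sort)
    srcs = ["diff_test.sh"],
    data = [":tagged_groups", ":supergroup_direct_deps"],
    args = [
        "$(location :tagged_groups)",
        "$(location :supergroup_direct_deps)",
    ],
)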

How to deal with implicit dependencies (e.g. C++ includes) for incremental builds in a custom Skylark rule

Problem
I wonder how to inform Bazel about dependencies that are unknown at declaration time but known at build time (a.k.a. implicit dependencies, dynamic dependencies, ...). For instance, when compiling C++ sources, a .cpp source file depends on some header files, and this information is not available when writing the BUILD file; it needs to be retrieved at build time. Whatever the mechanism for getting the information (dry run, generating a depfile, parsing stdout), it has to run at build time, and the information needs to be fed back into Bazel's build graph.
Since Skylark does not allow I/O, for instance reading a generated depfile or parsing a stdout result containing a dependency list, I have no clue how to deal with this.
Behind implicit dependencies, what I am really after is correct incremental builds.
Example
To experiment with this problem I have created a simple tool, just_a_tool.exe, which takes an input file, reads a list of files from it, and concatenates the content of all these files into an output file.
Command-line example:
just_a_tool.exe --input input.txt --depfile dep.d output.txt
dep.d contains the list of all the read files.
Issue
If I change the content of test1.txt, test2.txt, or test3.txt, Bazel does not rebuild the output.txt file. Of course it doesn't: it does not know about these dependencies.
Example files
just_a_tool.bzl
def _impl(ctx):
    exec_path = "C:/Code/JustATool/just_a_tool.exe"
    # single_file=True guarantees exactly one file in source.
    source_path = ctx.file.source.path
    output_path = ctx.outputs.out.path
    dep_file = ctx.actions.declare_file("dep.d")
    args = ["--input", source_path, "--depfile", dep_file.path, output_path]
    ctx.actions.run(
        outputs = [ctx.outputs.out, dep_file],
        executable = exec_path,
        inputs = ctx.attr.source.files,
        arguments = args,
    )
jat_convert = rule(
    implementation = _impl,
    attrs = {
        "source": attr.label(mandatory = True, allow_files = True, single_file = True),
    },
    outputs = {"out": "%{name}.txt"},
)
BUILD
load("//tool:just_a_tool.bzl", "jat_convert")
jat_convert(
name="my_output",
source=":input.txt"
)
input.txt
test1.txt
test2.txt
test3.txt
Goal
I want correct and fast incremental builds for the following situations:
Generating reflection data from C++ sources; this custom tool's execution depends on the header files included by my source files.
Using an internal tool to build asset files which can include other files.
Running a custom preprocessor on my shaders that supports a #include feature.
Thanks!
Bazel's extension language doesn't support creating actions with a dynamic set of inputs, where this set depends on the output of a previous action. In other words, custom rules cannot run an action, read the action's output, then create actions with those inputs or update (or prune the set of) inputs of already created actions.
Instead, I suggest adding attribute(s) to your rule where the user can declare the set of files that the sources may include; I call this "the universe of headers". The actions you create depend on this user-defined universe, so the set of action inputs is completely defined. Of course this means these actions potentially depend on more files than the .cpp files they process actually include.
This approach is analogous to how the cc_* rules work: a file in cc_*.srcs can include other files in the srcs of the same rule and from hdrs of dependencies, but nothing else. Thus the union of srcs + hdrs of (direct & transitive) dependencies defines the universe of header files that a cpp file may include.
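Applied to the jat_convert rule from the question, a minimal sketch of this approach; the "universe" attribute name is an assumption:

def _impl(ctx):
    dep_file = ctx.actions.declare_file("dep.d")
    ctx.actions.run(
        outputs = [ctx.outputs.out, dep_file],
        executable = "C:/Code/JustATool/just_a_tool.exe",
        # The user-declared universe is part of the inputs, so editing any
        # file the source may reference re-triggers this action.
        inputs = [ctx.file.source] + ctx.files.universe,
        arguments = [
            "--input", ctx.file.source.path,
            "--depfile", dep_file.path,
            ctx.outputs.out.path,
        ],
    )

jat_convert = rule(
    implementation = _impl,
    attrs = {
        "source": attr.label(mandatory = True, allow_files = True, single_file = True),
        "universe": attr.label_list(allow_files = True),
    },
    outputs = {"out": "%{name}.txt"},
)

The BUILD target then declares the files the input may reference, e.g. universe = glob(["*.txt"]), trading some over-approximation for correctness.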

How to retrieve the attributes of a Bazel workspace rule using aspects

I'm writing a post-build tool that synthesizes maven pom files after a bazel build. I'm using aspects to gather relevant information on the various targets.
One of the features involves adding external jar dependencies to the relevant pom files.
Let's assume our workspace contains the following target:

maven_jar(
    name = "com_google_guava_guava",
    artifact = "com.google.guava:guava:19.0",
)
and one of our BUILD files contains a target which has guava as a dependency:

scala_library(
    name = "somename",
    srcs = glob(["*.scala"]) + glob(["*.java"]),
    deps = [
        "@com_google_guava_guava//jar:file",
    ],
)
In an aspect applied to this target, how can one retrieve the attributes of maven_jar, specifically artifact?
(The closest I was able to get, using ctx.rule.attr.srcs, was:
[InputFileConfiguredTarget(@com_google_guava_guava//jar:guava-19.0.jar)])
I could probably just parse the WORKSPACE file's external jar targets and build a map from name to artifact as a hybrid solution,
but a much more elegant solution would be for the aspect to provide the artifact by itself. Is that possible?
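For context, a minimal sketch of the kind of aspect involved (all names here are illustrative); it can reach the jar file through the deps edge, but nothing of the originating maven_jar:

# pom_aspect.bzl -- a hedged sketch; there is no path from here to the
# repository rule's "artifact" attribute.
def _pom_deps_aspect_impl(target, ctx):
    for dep in getattr(ctx.rule.attr, "deps", []):
        # For @com_google_guava_guava//jar:file this prints the configured
        # target wrapping guava-19.0.jar -- and nothing about the artifact.
        print(dep)
    return []

pom_deps_aspect = aspect(
    implementation = _pom_deps_aspect_impl,
    attr_aspects = ["deps"],
)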
The "artifact" attribute is an attribute of a repository rule, which are not accessible from Skylark. The artifact seems like an information that could be integrated into the jar target in some way, feel free to file a feature request at https://github.com/bazelbuild/bazel/issues/new, with the reason why you need that.

Aliasing jar target of maven_jar rule

I have the following maven_jar in my workspace:
maven_jar(
    name = "com_google_code_findbugs_jsr305",
    artifact = "com.google.code.findbugs:jsr305:3.0.1",
    sha1 = "f7be08ec23c21485b9b5a1cf1654c2ec8c58168d",
)
In my project I reference it through @com_google_code_findbugs_jsr305//jar. However, I now want to depend on a third-party library that references @com_google_code_findbugs_jsr305 without the jar target.
I tried looking into both bind and alias; however, alias cannot be used inside the WORKSPACE, and bind doesn't seem to let you define targets inside external repositories.
I could rename the version I use so it doesn't conflict, but that feels like the wrong solution.
IIUC, your code needs to depend on both @com_google_code_findbugs_jsr305//jar and @com_google_code_findbugs_jsr305//:com_google_code_findbugs_jsr305. Unfortunately, there isn't any pre-built rule that generates BUILD files for both of those targets, so you basically have to define the BUILD files yourself. Fortunately, @jart has written most of it for you in the closure rule you linked to. You just need to add //jar:jar by appending a couple of lines; after line 69 add something like:
repository_ctx.file(
    'jar/BUILD',
    "\n".join([
        "package(default_visibility = ['//visibility:public'])",
    ] + _make_java_import('jar', '//:com_google_code_findbugs_jsr305.jar')),
)
This creates a //jar:jar (or equivalently, //jar) target in the repository.
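With that in place, both label spellings resolve; a hedged usage sketch (the library name is illustrative, and @repo alone is shorthand for @repo//:repo; a real target would pick one form, both are shown only for demonstration):

java_library(
    name = "my_lib",
    srcs = glob(["*.java"]),
    deps = [
        "@com_google_code_findbugs_jsr305//jar",  # your existing references
        "@com_google_code_findbugs_jsr305",       # what the third-party library expects
    ],
)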
