How to run tests on the source tree in Bazel? - bazel

I am migrating a project from CMake to Bazel. I have a folder that contains some Python code and some genrules.
I had a test script that runs all the Python tests in this folder recursively.
So basically I need all .py files under this folder as data for the test script. But because some genrules need to run, the subfolders contain BUILD files, and glob(["**/*.py"]) does not cross those package boundaries.
For example, we have a folder python that contains the following files:
python/BUILD
python/test_a.py
python/folder_a/BUILD (this one has a genrule in it)
python/folder_a/folder_b/BUILD (this one has a genrule as well)
python/folder_a/folder_b/folder_c/test_b.py
I want to run the test script under python/, and it will run all the test_*.py files recursively. Now we want to wrap it as an sh_test in Bazel, so we need to list all the test_*.py files in the data attribute. But there is no easy way to do that, since glob() can't get past python/folder_a/BUILD and python/folder_a/folder_b/BUILD.
It would be quite convenient if I could run this script in the source tree, but it seems that Bazel doesn't provide this. Adding local = 1 to the sh_test only makes the runfiles tree writable.
I know this is not a good way to use Bazel for testing, but sometimes it is too much work to migrate everything at the same time.

I can't think of an easy way to obtain all of the target names in a BUILD file, but there is a Bazel query you can run to get the target names, which you can then collect in a filegroup target to reference in the sh_test.data attribute.
bazel query 'filter(".*:test_.*\.py", kind("source file", //python/...:*) + kind("generated file", //python/...:*))'
Breaking this down:
kind("source file", //python/...:*) queries all source file targets in the //python package and its subpackages. This collects the normal source files.
kind("generated file", //python/...:*) queries all generated file targets in the //python package and its subpackages. This collects the genrule'd files.
filter(".*:test_.*\.py", ...) narrows the results to targets of the form //any/package:test_name.py.
For example, running
bazel query 'filter(".*:test_.*\.py", kind("source file", //src/...:* + //tools/...:*) + kind("generated file", //src/...:* + //tools/...:*))'
on Bazel's own source tree finds one target: //src/test/py/bazel:test_base.py
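The labels printed by the query can then be collected into the filegroup mentioned above. A minimal sketch in Python, assuming the query output is one label per line; the helper name render_filegroup and the target name all_py_tests are hypothetical and not part of Bazel:

```python
def render_filegroup(query_output, name="all_py_tests"):
    """Render a filegroup rule listing every label found in `bazel query` output."""
    # One label per line; drop blank lines and duplicates, sort for a stable result.
    labels = sorted({line.strip() for line in query_output.splitlines() if line.strip()})
    srcs = "".join('        "%s",\n' % label for label in labels)
    return (
        "filegroup(\n"
        '    name = "%s",\n'
        "    srcs = [\n"
        "%s"
        "    ],\n"
        ")\n"
    ) % (name, srcs)


# Example with the two test files from the question (hypothetical query output).
sample = "//python:test_a.py\n//python/folder_a/folder_b/folder_c:test_b.py\n"
print(render_filegroup(sample))
```

The printed rule could be pasted into python/BUILD and referenced from the sh_test's data attribute. The subpackages may need to make the listed files visible to //python for this to work.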

Related

Bazel rules with unknown output filenames

I have a command that compiles and runs a program, but the intermediate files are randomly named (but contained within a directory). E.g.
build foo.src bar.src -o output_dir
run output_dir
Bazel requires me to pre-declare all of the outputs of my rule, but I can't do that because they're randomly named. Can I somehow name an entire directory instead?
The only alternative I can think of is having the rule zip/unzip the directory before/after it runs the commands, which is a pretty awful solution.
Edit: I found an issue exactly describing the "just zip/unzip everything" solution here. The closing comment says to just use the rules from rules_pkg to zip/unzip stuff. Unfortunately it requires Python too.
Some of the comments in that thread suggest you can use declare_directory() but I don't think that really works.
There are tree artifacts. An example of how to use a tree artifact can be found here.
Tree artifacts are problematic for caching because Bazel is not aware of the content of the corresponding directory. If for some reason the content of a tree artifact differs between two machines that share the same Bazel cache and the same Bazel configuration, you are in trouble.
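One way to check whether a directory output really is the same on two machines is to digest it on each and compare the results. A minimal sketch in Python; the function name directory_digest is hypothetical, and this is not how Bazel itself hashes tree artifacts:

```python
import hashlib
import os


def directory_digest(root):
    """Return a SHA-256 hex digest over relative file names and file contents."""
    digest = hashlib.sha256()
    for current, dirs, files in os.walk(root):
        dirs.sort()  # os.walk honors in-place sorting, which gives a stable order
        for fname in sorted(files):
            path = os.path.join(current, fname)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            # Hash the name as well as the bytes: a renamed file changes the digest.
            digest.update(rel.encode("utf-8") + b"\0")
            with open(path, "rb") as f:
                digest.update(f.read())
            digest.update(b"\0")
    return digest.hexdigest()
```

Because file names are part of the digest, randomly named intermediate files like the ones in the question would produce a different digest on every run, even when their contents are identical.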

How do you enumerate and copy multiple files to the source folder in Bazel?

I'm new to Bazel and I am trying to replace a non-Bazel build step that is effectively cp -R with an idiomatic Bazel solution. Concrete use cases are:
copying .proto files to a sub-project where they will be picked up by a non-Bazel build system. There are N .proto files in N Bazel packages, all in one protos/ directory of the repository.
copying numerous .gotmpl template files to a different folder where they can be picked up in a docker volume for a local docker-compose development environment. There are M template files in one Bazel package in a small folder hierarchy. Example code below.
copying those same .gotmpl files to a gitops-type repo for a remote terraform to send to prod.
All sources are regular, checked in files in places where Bazel can enumerate them. All target directories are also Bazel packages. I want to write to the source folder, not just to bazel-bin, so other non-Bazel tools can see the output files.
Currently when adding a template file or a proto package, a script must be run outside of bazel to pick up that new file and add it to a generated .bzl file, or perform operations completely outside of Bazel. I would like to eliminate this step to move closer to having one true build command.
I could accomplish this with symlinks but it still has an error-prone manual step for the .proto files and it would be nice to gain the option to manipulate the files programmatically in Bazel in the build.
Some solutions I've looked into and hit dead ends:
glob seems to be relative to current package and I don't see how it can be exported since it needs to be called from BUILD. A filegroup solves the export issue but doesn't seem to allow enumeration of the underlying files in a way that a bazel run target can take as input.
Rules like cc_library that happily input globs as srcs are built into the Bazel source code, not written in Starlark
genquery and aspects seem to have powerful meta-capabilities but I can't see how to actually accomplish this task with them.
The "bazel can write to the source folder" pattern and write_source_files from aspect-build/bazel-lib might be great if I could programmatically generate the files parameter.
Here is the template example which is the simpler case. This was my latest experiment to bazel-ify cp -R. I want to express src/templates/update_templates_bzl.py in Bazel.
src/templates/BUILD:
# [...]
exports_files(glob(["**/*.gotmpl"]))
# [...]
src/templates/update_templates_bzl.py:
#!/usr/bin/env python
from pathlib import Path
parent = Path(__file__).parent
template_files = [str(f.relative_to(parent)) for f in list(parent.glob('**/*.gotmpl'))]
as_python = repr(template_files).replace(",", ",\n ")
target_bzl = Path(__file__).parent / "templates.bzl"
target_bzl.write_text(f""""Generated template list from {Path(__file__).relative_to(parent)}"
TEMPLATES = {as_python}""")
src/templates/copy_templates.bzl:
"""Utility for working with this list of template files"""
load("@aspect_bazel_lib//lib:write_source_files.bzl", "write_source_files")
load("templates.bzl", "TEMPLATES")
def copy_templates(name, prefix):
    files = {
        "%s/%s" % (prefix, f): "//src/templates:%s" % f
        for f in TEMPLATES
    }
    write_source_files(
        name = name,
        files = files,
        visibility = ["//visibility:public"],
    )
other/module:
load("//src/templates:copy_templates.bzl", "copy_templates")
copy_templates(
    name = "write_template_files",
    prefix = "path/to/gitops/repo/templates",
)
One possible method to do this would be to use google/bazel_rules_install.
As mentioned in the project README.md, you need to add the following to your WORKSPACE file:
# file: WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
    name = "com_github_google_rules_install",
    urls = ["https://github.com/google/bazel_rules_install/releases/download/0.3/bazel_rules_install-0.3.tar.gz"],
    sha256 = "ea2a9f94fed090859589ac851af3a1c6034c5f333804f044f8f094257c33bdb3",
    strip_prefix = "bazel_rules_install-0.3",
)
load("@com_github_google_rules_install//:deps.bzl", "install_rules_dependencies")
install_rules_dependencies()
load("@com_github_google_rules_install//:setup.bzl", "install_rules_setup")
install_rules_setup()
Then in your src/templates directory you can add the following to bundle all your templates into one target.
# file: src/templates/BUILD.bazel
load("@com_github_google_rules_install//installer:def.bzl", "installer")
installer(
    name = "install_templates",
    data = glob(["**/*.gotmpl"]),
)
Then you can use the installer to install into your chosen directory like so.
bazel run //src/templates:install_templates -- path/to/gitops/repo/templates
It's also worth checking out bazelbuild/rules_docker for building your development environments using only Bazel.
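For comparison, the cp -R behavior from the question can be sketched as a small Python tool that a bazel run target could wrap. This is an illustration under stated assumptions, not the implementation of bazel_rules_install or write_source_files; the function names copy_tree_matching and workspace_root are hypothetical. It relies on bazel run setting the BUILD_WORKSPACE_DIRECTORY environment variable to the workspace root.

```python
import os
import shutil
from pathlib import Path


def copy_tree_matching(src_root, dest_root, pattern="**/*.gotmpl"):
    """Copy files under src_root that match pattern into dest_root, keeping the layout."""
    src_root, dest_root = Path(src_root), Path(dest_root)
    copied = []
    for src in sorted(src_root.glob(pattern)):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dest = dest_root / rel
        # Recreate the folder hierarchy on the destination side.
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        copied.append(rel.as_posix())
    return copied


def workspace_root():
    """Return the workspace root under `bazel run`, else the current directory."""
    return Path(os.environ.get("BUILD_WORKSPACE_DIRECTORY", os.getcwd()))
```

A caller might run copy_tree_matching(workspace_root() / "src/templates", workspace_root() / "path/to/gitops/repo/templates"). Because the files are enumerated at run time, adding a new template needs no regenerated .bzl list, at the cost of Bazel not tracking the copied files as build outputs.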

When to prefix a BUILD file (*.BUILD) in Bazel

In its C++ unit testing tutorial, Bazel suggests adding a gtest.BUILD file to the workspace root in order to properly integrate Google Test into the test project.
https://docs.bazel.build/versions/master/cpp-use-cases.html
Why would one create a new BUILD file and add a gtest prefix to it rather than adding a new build rule to an existing BUILD file in the workspace? Is it just a minor style preference?
Because if you added a BUILD file somewhere in the workspace (e.g. under //third_party/gtest/BUILD) then that file would create a package there.
Then, if you had targets declared in that BUILD file, would their files exist under //third_party/gtest, or would they exist in the zip file that the http_archive downloads? If the former, then there's no need for a http_archive because the files are already in the source tree; if the latter, then the BUILD file references non-existent files in its own package. Both scenarios are flawed.
Better to call gtest's BUILD-file-to-be something that doesn't create a package, but that's descriptive of its purpose.
The build_file attribute of http_archive can reference any file; there is no requirement on its name. The name gtest.BUILD is mostly stylistic, yes, but it also avoids creating a package where it shouldn't. You could say it's an "inactive" BUILD file that becomes "active" when Bazel downloads the http_archive, extracts it somewhere, and creates in that directory a symlink called BUILD which points to gtest.BUILD.
Another advantage of having such "inactive" BUILD files is that you can have multiple of them within one package, for multiple http_archives.

Clean up unreachable generated files in Bazel

Suppose I have a very minimal project with an empty WORKSPACE and a single package defined at the project root that simply uses touch to create a file called a, as follows:
genrule(
    name = "target",
    cmd = "touch $@",
    outs = ["a"],
)
If I now run
bazel build //:target
the package will be "built" and the a file will be available under bazel-genfiles.
Suppose I now change the BUILD to write the output to a different file, as follows:
genrule(
    name = "target",
    cmd = "touch $@",
    outs = ["b"],
)
Building the same target will result in the file b being available under bazel-genfiles. a will still be there though, even though at this point it's "unreachable" from within the context of the build definition.
Is there a way to ask Bazel to perform some sort of "garbage collection" and remove files (and possibly other content) generated by previous builds that are no longer reachable as-per the current build definition, without getting rid of the entire directory? The bazel clean command seems to adopt the latter behavior.
There seems to be a feature in the works, but apparently it cannot be triggered on demand; it runs automatically once a certain threshold has been reached.
Note that running bazel clean will not actually delete the external directory. To remove all external artifacts, use bazel clean --expunge
bazel clean is the way to remove these.
The stale outputs aren't visible to actions, provided you build with sandboxing. (Not yet available on Windows, only on Linux and macOS.)
What trouble do these files make?

How can I tell which Bazel aspect outputs are still relevant?

As part of our efforts to create a Bazel-Maven transition interop tool (one that creates Maven-sized jars from more granular Bazel-sized jars),
we have written an aspect that runs on a bazel build of the entire Bazel repo and writes important information to txt output files (e.g. jar file paths, compile deps targets, runtime deps targets, etc.).
We ran into an issue where the repo's code was changed such that some of the txt files were no longer written. But the old txt files from previous runs (before the code change) remained!
Is there a way to know that these txt files are no longer relevant?
You should be able to run with --build_event_json_file=file.json and then locate the generated artifacts in that file. For example, we use it on ci.bazel.io to locate the test.xml files that were actually generated: https://github.com/bazelbuild/continuous-integration/blob/09975cbb487a84a62ca1e43aa43e7c6fe078f058/jenkins/lib/src/build/bazel/ci/BazelUtils.groovy#L218
The definition of the protocol can be found in build_event_stream.proto
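A minimal Python sketch of that idea follows. It assumes the JSON file holds one event per line and that output files appear under a namedSetOfFiles event with a files list whose entries carry a uri. Those field names are my reading of build_event_stream.proto and should be checked against your Bazel version. The sketch also assumes the aspect's txt files belong to a requested output group, so that they are reported at all. The function names are hypothetical.

```python
import json


def outputs_from_bep(lines):
    """Collect the output file URIs reported in namedSetOfFiles events."""
    uris = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Assumed layout: {"namedSetOfFiles": {"files": [{"name": ..., "uri": ...}]}}
        for entry in event.get("namedSetOfFiles", {}).get("files", []):
            if "uri" in entry:
                uris.add(entry["uri"])
    return uris


def stale_files(existing_paths, bep_lines):
    """Return the existing paths that the last build did not report as outputs."""
    prefix = "file://"
    # Simplification: only plain file:// URIs are handled, without percent-decoding.
    produced = {u[len(prefix):] for u in outputs_from_bep(bep_lines) if u.startswith(prefix)}
    return sorted(p for p in existing_paths if p not in produced)
```

Passing the txt files currently on disk as existing_paths, and the lines of file.json as bep_lines, yields the candidates left over from earlier builds.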
