Clean up unreachable generated files in Bazel

Suppose I have a very minimal project with an empty WORKSPACE and a single package defined at the project root that simply uses touch to create a file called a, as follows:
genrule(
    name = "target",
    cmd = "touch $@",
    outs = ["a"],
)
If I now run
bazel build //:target
the package will be "built" and the a file will be available under bazel-genfiles.
Suppose I now change the BUILD to write the output to a different file, as follows:
genrule(
    name = "target",
    cmd = "touch $@",
    outs = ["b"],
)
Building the same target will result in the file b being available under bazel-genfiles. a will still be there though, even though at this point it's "unreachable" from within the context of the build definition.
Is there a way to ask Bazel to perform some sort of "garbage collection" and remove files (and possibly other content) generated by previous builds that are no longer reachable as per the current build definition, without getting rid of the entire directory? The bazel clean command seems to do the latter.
There seems to be a feature in the works, but apparently it cannot be triggered on demand; instead, it runs automatically once a certain threshold has been reached.

Note that running bazel clean will not actually delete the external directory. To remove all external artifacts, use bazel clean --expunge.

bazel clean is the way to remove these.
The stale outputs aren't visible to actions, provided you build with sandboxing. (Not yet available on Windows, only on Linux and macOS.)
What trouble do these files cause?
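If the stale files do need to be identified without a full bazel clean, one option is to compare what is on disk against the outputs the current build declares. The following is only a minimal sketch under stated assumptions: find_stale_files is a hypothetical helper, not a Bazel feature, and the list of reachable outputs is assumed to come from somewhere else (for instance, from a query over the current targets). It only reports candidates and deletes nothing.

```python
import os
import tempfile


def find_stale_files(output_dir, reachable):
    """Return files under output_dir whose relative path is not in `reachable`.

    Hypothetical helper: `reachable` is assumed to hold the output paths
    (relative to output_dir) declared by the current build definition.
    """
    reachable = {os.path.normpath(p) for p in reachable}
    stale = []
    for dirpath, _dirnames, filenames in os.walk(output_dir):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), output_dir)
            if os.path.normpath(rel) not in reachable:
                stale.append(rel)
    return sorted(stale)


if __name__ == "__main__":
    # Simulate the output directory after the two builds from the question.
    with tempfile.TemporaryDirectory() as genfiles:
        for name in ("a", "b"):
            open(os.path.join(genfiles, name), "w").close()
        # Only "b" is declared by the current BUILD file, so "a" is stale.
        print(find_stale_files(genfiles, ["b"]))  # ['a']
```

Reporting rather than deleting is deliberate: removing files from Bazel's output tree behind its back is not something the answers above recommend.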

Related

Bazel builds from scratch ignoring cache

I observe that my Bazel build agent frequently builds the project from scratch (including compiling grpc, which remains unchanged) instead of taking results from the cache. Is there a way, like query or cquery (pardon my ignorance), to determine why the cache is considered invalid for a particular target? Or are there any techniques to tackle the cache invalidation problem?
This is how a Bazel build works:
When running a build or a test, Bazel does the following: Loads the BUILD files relevant to the target. Analyzes the inputs and their dependencies, applies the specified build rules, and produces an action graph. Executes the build actions on the inputs until the final build outputs are produced.
If you have any specific suspicions, can you please share the complete details?
This is most likely due to the rebuild sensitivity to particular environment variables. Many build actions will read from environment variables and use them to change the outputs. Bazel keeps track of this and will rebuild seemingly unchanged remote targets when your env changes.
To demonstrate this:
1. Build grpc twice (to ensure it is cached the second time).
2. Change the PATH environment variable (your IDE may do this without you knowing):
   mkdir ~/bin && export PATH=$PATH:~/bin
3. Rebuild grpc. (This should trigger a complete rebuild.)
There are a couple of helpful flags to combat this rebuild sensitivity, and I'd recommend adding them to your bazelrc.
--incompatible_strict_action_env: Freezes your environment and doesn't source environment variables from your shell.
--action_env: Sets environment variables as needed for your build.
# file //.bazelrc
# Don't source environment from shell
build --incompatible_strict_action_env
# Use action_env as needed for your project
build --action_env=CC=clang
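To see why a changed PATH can invalidate an otherwise unchanged action, it may help to think of the cache key as a digest over the command and its environment. The sketch below is a toy model and not Bazel's actual implementation: action_key, effective_env, the STATIC_PATH value, and the compile command are all assumptions made for illustration.

```python
import hashlib
import json

# Illustrative fixed PATH used when the shell environment is ignored (assumption).
STATIC_PATH = "/bin:/usr/bin:/usr/local/bin"


def effective_env(shell_env, strict, action_env=None):
    """Model the environment an action sees (toy model, not Bazel's real logic)."""
    env = {"PATH": STATIC_PATH if strict else shell_env.get("PATH", "")}
    env.update(action_env or {})  # explicit overrides, like --action_env
    return env


def action_key(command, env):
    """Digest a command line together with the environment it runs in."""
    payload = json.dumps({"command": command, "env": env}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    compile_cmd = ["clang", "-c", "grpc.cc", "-o", "grpc.o"]  # hypothetical action
    shell_before = {"PATH": "/usr/bin:/bin"}
    shell_after = {"PATH": "/usr/bin:/bin:/home/me/bin"}

    for strict in (False, True):
        k1 = action_key(compile_cmd, effective_env(shell_before, strict, {"CC": "clang"}))
        k2 = action_key(compile_cmd, effective_env(shell_after, strict, {"CC": "clang"}))
        print("strict=%s key unchanged: %s" % (strict, k1 == k2))
```

In this model the key changes with the shell's PATH only when the environment is inherited, which matches the rebuild behavior described above.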

bazel WORKSPACE file not behaving as documented?

Consider the following hierarchy:
WORKSPACE
foo/
    BUILD
    foo.sh
bar/
    BUILD
    bar.sh
Where, e.g., foo/BUILD contains
sh_binary(
    name = "foo",
    srcs = ["foo.sh"],
)
and similarly for bar/BUILD. As expected, bazel cquery //... prints:
INFO: Analyzed 2 targets (0 packages loaded, 0 targets configured).
INFO: Found 2 targets...
//bar:bar (a5d130b)
//foo:foo (a5d130b)
According to https://docs.bazel.build/versions/master/build-ref.html, "Bazel ignores any directory trees in a workspace rooted at a subdirectory containing a WORKSPACE file (as they form another workspace)."
Therefore, if I touch bar/WORKSPACE, I should expect bar to no longer be part of my workspace, and its contents should be ignored by bazel. Why, then, do I still get the same query results?
$ ls bar
BUILD WORKSPACE bar.sh
$ bazel cquery //...
INFO: Analyzed 2 targets (0 packages loaded, 0 targets configured).
INFO: Found 2 targets...
//bar:bar (a5d130b)
//foo:foo (a5d130b)
Bazel version is 3.7.0.
I answered this in the Slack thread the author created, but for posterity: the confusion stems from the author's interpretation of the documentation (and possibly from the documentation's poor wording).
A WORKSPACE file alone doesn't "instantiate" or "define" a wholly separate workspace, as the author of this question assumed, so Bazel won't treat that subdirectory any differently from any other package. Instead, the workspace needs to be defined as an "external repository", meaning "external" to the parent workspace, not necessarily a repository whose source code lives somewhere else.
You can do this using local_repository like so:
WORKSPACE
local_repository(
    name = "bar_workspace_name",
    path = "bar",
)
I may be projecting Bazel's actual behavior onto the intended meaning of that sentence, but I suspect the "ignoring" it describes works in the other direction: not when descending into nested directories, but when crawling up towards the root. Essentially:
If you have that bar/WORKSPACE file, it does not change the behavior when looking at it from bar/'s parent, and //bar still appears as a package in that workspace.
However, it does affect the behavior if you run bazel with bar/ as its working directory. As Bazel searches upward for the workspace root, it stops at the first WORKSPACE file it finds, so bar/ becomes its own // in that case (and, by extension, its parent is not accessible as part of the same workspace).
Or in practical terms... try the same thing:
bazel cquery //...
but in bar:
(cd bar/ && bazel cquery //... )
Once with and once without bar/WORKSPACE.
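The upward search described in this answer can be sketched as follows. This is a simplified illustration rather than Bazel's own code: find_workspace_root is a hypothetical function, and it only looks for a file literally named WORKSPACE (Bazel also accepts WORKSPACE.bazel, which is ignored here).

```python
import os
import tempfile


def find_workspace_root(start):
    """Walk upward from `start`; return the first directory holding a WORKSPACE file."""
    current = os.path.abspath(start)
    while True:
        if os.path.isfile(os.path.join(current, "WORKSPACE")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            return None
        current = parent


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        bar = os.path.join(root, "bar")
        os.makedirs(bar)
        open(os.path.join(root, "WORKSPACE"), "w").close()

        # Without bar/WORKSPACE, the search from bar/ ends at the parent.
        print(find_workspace_root(bar) == os.path.abspath(root))

        # With bar/WORKSPACE, the search stops in bar/ itself.
        open(os.path.join(bar, "WORKSPACE"), "w").close()
        print(find_workspace_root(bar) == os.path.abspath(bar))
```

Note that the search started from the parent directory is unaffected by bar/WORKSPACE in either case, which is consistent with the cquery output in the question.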

How to avoid deleting cached files after build in Bazel

I have a genrule in Bazel that is supposed to manipulate some files. I think I'm not accessing these files by the correct path, so I want to look at the directory structure that Bazel is creating so I can debug.
I added some echo statements to my genrule and I can see that Bazel is working in the directory /home/lyft/.cache/bazel/_bazel_lyft/8de0a1069de8d166c668173ca21c04ae/sandbox/linux-sandbox/1/execroot/. However, after Bazel finishes running, this directory is gone, so I can't look at the directory structure.
How can I prevent Bazel from deleting its temporary files so that I can debug what's happening?
Since this question is a top result for the Google search "keep sandbox files after build bazel", and the solution wasn't obvious to me from the accepted answer, I feel the need to write this answer.
Short answer
Use --sandbox_debug. If this flag is passed, Bazel will not delete the files inside the sandbox folder after the build finishes.
Longer answer
Run bazel build with the --sandbox_debug option:
$ bazel build mypackage:mytarget --sandbox_debug
Then you can inspect the contents of the sandbox folder for the project.
To get the location of the sandbox folder for the current project, navigate to the project and then run:
$ bazel info output_base
/home/johnsmith/.cache/bazel/_bazel_johnsmith/d949417420413f64a0b619cb69f1db69 # output will be something like this
Inside that directory there will be a sandbox folder.
Possible caveat (I'm NOT sure about this): some of the files may be missing from the sandbox folder if you previously ran a build without the --sandbox_debug flag and it partially succeeded. The reason is that Bazel won't rerun the parts of the build that already succeeded, so the files corresponding to those parts might not end up in the sandbox.
If you want to make sure all the sandbox files are there, clean the project first using either bazel clean or bazel clean --expunge.
You can use --spawn_strategy=standalone.
You can also use --sandbox_debug to see which directories are mounted to the sandbox.
You can also set the genrule's cmd to find . > $@ to debug what's available to the genrule.
Important: declare all srcs/outs/tools that the genrule will read/write/use, and use $(location //label/of:target) to look up their path. Example:
genrule(
    name = "x1",
    srcs = ["//foo:input1.txt", "//bar:generated_file"],
    outs = ["x1out.txt", "x1err.txt"],
    tools = ["//util:bin1"],
    cmd = "$(location //util:bin1) --input1=$(location //foo:input1.txt) --input2=$(location //bar:generated_file) --some_flag --other_flag >$(location x1out.txt) 2>$(location x1err.txt)",
)

How to run test on the source tree in bazel?

I am migrating a project from CMake to Bazel. I have a folder that contains some Python code and some genrules.
I had a test script that runs all the Python tests in this folder recursively.
So basically I need all .py files under this folder as data for the test script. But because some genrules need to run, there are BUILD files in subdirectories, and glob(["**/*.py"]) does not cross those package boundaries.
For example, we have a folder python that contains the following files:
python/BUILD
python/test_a.py
python/folder_a/BUILD (this one has a genrule in it)
python/folder_a/folder_b/BUILD (this one has a genrule as well)
python/folder_a/folder_b/folder_c/test_b.py
I want to run the test script under python/, and it should run all the test_*.py files recursively. Now we want to wrap it as an sh_test in Bazel, so we need to specify all the test_*.py files in the data attribute. But there is no easy way to do that, since glob() does not descend past python/folder_a/BUILD and python/folder_a/folder_b/BUILD.
It would be quite convenient if I could run this script in the source tree, but Bazel doesn't seem to provide this. Adding local = 1 to the sh_test only makes the runfiles tree writable.
I know this is not a good way to use Bazel for testing, but sometimes it is too much work to migrate everything at the same time.
I can't think of an easy way to obtain all of the target names in a BUILD file, but there is a Bazel query you can run to get the target names, which you can then collect in a filegroup target to reference in the sh_test.data attribute.
bazel query 'filter(".*:test_.*\.py", kind("source file", //python/...:*) + kind("generated file", //python/...:*))'
Breaking this down:
kind("source file", //python/...:*) queries all source file targets in the //python package recursively. This collects the normal source files.
kind("generated file", //python/...:*) queries all generated file targets in the //python package recursively. This collects the genrule'd files.
filter(".*:test_.*\.py", ...) filters the results down to only those targets of the form //any/package:test_name.py.
For example, running
bazel query 'filter(".*:test_.*\.py", kind("source file", //src/...:* + //tools/...:*) + kind("generated file", //src/...:* + //tools/...:*))'
on Bazel's own source tree finds one target: //src/test/py/bazel:test_base.py
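The step of collecting the query results into a filegroup could be scripted. The following is a rough sketch, not an established workflow: render_filegroup and the target name py_tests are hypothetical, and the labels are assumed to be the lines printed by the query above. Files that live in other packages may additionally need to be made visible to the package that declares the filegroup.

```python
def render_filegroup(name, labels):
    """Render a filegroup stanza listing the given labels (hypothetical helper)."""
    # Drop blank lines and duplicates that may appear in raw query output.
    cleaned = sorted({label.strip() for label in labels if label.strip()})
    lines = ["filegroup(", '    name = "%s",' % name, "    srcs = ["]
    for label in cleaned:
        lines.append('        "%s",' % label)
    lines += ["    ],", ")"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in for the output of the bazel query command shown above.
    query_output = """\
//python:test_a.py
//python/folder_a/folder_b/folder_c:test_b.py
"""
    print(render_filegroup("py_tests", query_output.splitlines()))
```

The generated stanza can be pasted into python/BUILD and then referenced from the sh_test as data = [":py_tests"].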

Run single PS script from Release definition without pulling down entire project

In my release definition, I want to run a single PS script which lives in source control (TFVC in my case). I don't see a way to do this without TFS pulling down the entire source tree containing the one script on the agent machine. I currently have an unversioned copy of the script out on the agent machine, and I reference its absolute path from the release definition. This works, but I'm not guaranteed the latest version of this script to be run at release time.
You have at least two ways to do it:
Define a mapping that picks only what you need. You can narrow a mapping down to a single file, e.g. cloak $/ and map $/path_to_my_file.
Use a dummy build that collects the files you need and saves them as artifacts. I explained this technique in http://blog.casavian.eu/2017/03/04/mixing-tfvc-and-git/
