Don't discard analysis cache when --action_env changes - bazel

I have an --action_env variable that I sometimes pass to Bazel, but each time I remove it or add it back, Bazel discards the analysis cache, which triggers a re-analysis that takes several minutes because I'm working in a large repo. Is there a way to prevent this? I'm already using --trim_test_configuration.

Answer
Not really. You can think of the set of environment variables as a file that almost every Bazel action depends on. As a simple example, let's say you have a build file that looks something like this:
genrule(
    name = "foo_header",
    outs = ["foo.h"],
    # '#' must be quoted for the shell, $$ escapes Bazel's make-variable
    # expansion so the shell sees the env var, and $@ is the declared output.
    cmd = "echo '#define FOO' $$FOO_FROM_ENV > $@",
)

cc_library(
    name = "my_library_that_everything_depends_on",
    hdrs = [":foo_header"],
)
In this simple case it's not hard to see that if you change --action_env=FOO_FROM_ENV=7 to another value, everything that depends on foo.h now has to be completely rebuilt and re-analysed. So while frustrating, it's probably a good thing that Bazel does this; otherwise you'd end up with an inconsistent build.
Partial workarounds
- Remove usage of use_default_shell_env from your rules/actions (this is far from trivial, as most of the standard rules use it).
- Use a centralised cache to prevent Bazel from deleting the artifact cache, e.g. I add these to ~/.bazelrc so the action/repository cache is shared between all my projects. This only helps if you are switching between env values on a regular basis rather than using new values each time. Also, be careful with this, as Bazel doesn't presently do any garbage collection, so the cache directories can end up very large.
# ~/.bazelrc
common --disk_cache=~/.cache/shared_bazel_action_cache
common --repository_cache=~/.cache/shared_bazel_repository_cache
- Add --incompatible_strict_action_env to your project bazelrc; this will prevent changes in your user shell from triggering Bazel to discard the analysis cache.
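For example, in the same style as the shared-cache snippet above (this flag pins PATH and stops actions from inheriting your shell's environment unless you explicitly pass variables via --action_env):
# project .bazelrc (checked into the repo)
common --incompatible_strict_action_env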

Related

Can I add static analysis to a py_binary or py_library rule?

I have a repo which uses bazel to build a bunch of Python code. I would like to introduce various flavors of static analysis into the build and have the build fail if these static analyses throw errors. What is the best way to do this?
For example, I'd like to declare something like:
py_library_with_static_analysis(
    name = "foo",
    srcs = ["foo.py"],
)

py_library_with_static_analysis(
    name = "bar",
    srcs = ["bar.py"],
    deps = [":foo"],
)
in a BUILD file and have it error out if there are mypy/flake8/etc. errors in foo.py. I would like to be able to do this gradually, converting libraries/binaries to static analysis one target at a time. I'm not sure if I should do this via a new rule, a macro, an aspect, or something else.
Essentially, I think I'm asking how to run an additional command while building a py_binary/py_library and fail if that command fails.
I could create my own version of a py_library rule and have it run static analysis within the implementation but that seems like something which is really easy to get wrong (my guess is that native.py_library is quite complex?) and there doesn't seem to be a way to instantiate a native.py_library within a custom rule.
I've also played around with macros a bit, but haven't been able to get that to work either. I think my issue there is that a macro doesn't actually specify new commands, only new targets, and I can't figure out how to make the static analysis target get force-built along with the py_library/py_binary I'm interested in.
A macro that adds implicit test targets is not such a bad idea: The test targets will be picked up automatically when you run bazel test //..., which you could do in a gating CI to prevent imperfect code from merging.
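A rough sketch of such a macro (the name py_library_with_static_analysis matches the question; the per-package run_mypy.py runner script is an assumption, not a standard rule):
# lint.bzl - hypothetical wrapper macro around the native rules.
def py_library_with_static_analysis(name, srcs = [], **kwargs):
    native.py_library(
        name = name,
        srcs = srcs,
        **kwargs
    )
    # Implicit test target; `bazel test //...` picks these up automatically.
    native.py_test(
        name = name + "_mypy_test",
        srcs = ["run_mypy.py"],  # hypothetical runner script in the same package
        main = "run_mypy.py",
        data = srcs,
        args = ["$(rootpath %s)" % s for s in srcs],
    )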
Bazel supports a BUILD prelude (which is underdocumented) that you could use to replace all py_binary, py_library, and even py_test with your test-adding wrapper macros with minimal changes to existing code.
If you somehow fail the build instead it will make it harder to quickly prototype things. Sometimes you want to just quickly try something out, and you don't care about any pydoc violations yet.
In case you do want to fail the build, you might be able to use the Validations Output Group of a rule that you implement to wrap or replace your py_libraries.
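A minimal sketch of the validation approach, assuming a hypothetical //tools:mypy wrapper binary that checks the sources and writes a marker file on success (Bazel builds the _validation output group automatically and fails the build if the action fails):
# checked_py.bzl - sketch only; attribute and tool names are assumptions.
def _py_static_analysis_impl(ctx):
    marker = ctx.actions.declare_file(ctx.label.name + ".analysis_ok")
    ctx.actions.run(
        outputs = [marker],
        inputs = ctx.files.srcs,
        executable = ctx.executable._checker,
        arguments = [f.path for f in ctx.files.srcs] + [marker.path],
        mnemonic = "StaticAnalysis",
    )
    return [OutputGroupInfo(_validation = depset([marker]))]

py_static_analysis = rule(
    implementation = _py_static_analysis_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = [".py"]),
        "_checker": attr.label(
            default = "//tools:mypy",  # hypothetical checker binary
            executable = True,
            cfg = "exec",
        ),
    },
)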

Are absolute paths safe to use in Bazel?

I am experimenting with Bazel, to be added alongside an old, make/shell based build system. I can easily write shell commands which return an absolute path to some tool or library built by the old build system as an early prerequisite. These commands I can use in a genrule(), which copies the needed files (like headers and libs) into Bazel proper, to be exposed in the form of a cc_library().
I found out that genrule() does not detect a dependency if the command uses a file with an absolute path - it is not caught by the sandbox. In a way I am (ab)using that behavior.
Is it safe? Will some future update of Bazel refuse access to files referenced by absolute path that way in a genrule command?
Most of Bazel's sandboxes allow access to most paths outside of the source tree by default. Details depend on which sandbox implementation you're using. The docker sandbox, for example, allows access to all those paths inside of a docker image. It's kind of hard to make promises about future Bazel versions, but I think it's unlikely that a sandbox will prevent accessing /bin/bash (for example), which means other absolute paths will probably continue to work too.
--sandbox_block_path can be used to explicitly block a path if you want.
If you always have the files available on every machine you build on, your setup should work. Keep in mind that Bazel will not recognize when the contents of those files change, so you can easily get stale results in various caches. You can avoid that by ensuring the external paths change whenever their contents do.
new_local_repository might be a better fit to avoid those problems, if you know the paths ahead of time.
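A minimal WORKSPACE sketch (the path and the BUILD file label are assumptions about your layout):
new_local_repository(
    name = "legacy_output",                     # hypothetical name
    path = "/opt/legacy/output",                # wherever the old build system installs to
    build_file = "//third_party:legacy.BUILD",  # defines cc_library targets over those files
)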
If you don't know the paths ahead of time, you can write a custom repository rule which runs arbitrary commands via repository_ctx.execute to retrieve the paths and then symlinks them in with repository_ctx.symlink.
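A sketch of such a repository rule, assuming (hypothetically) that the old build system can print its install prefix via a make target:
# legacy_repo.bzl - sketch only; the make invocation and paths are assumptions.
def _legacy_repo_impl(repository_ctx):
    result = repository_ctx.execute(
        ["make", "-s", "-C", "/path/to/old/build", "print-prefix"],  # hypothetical
    )
    if result.return_code != 0:
        fail("Could not locate the legacy build output: " + result.stderr)
    prefix = result.stdout.strip()
    repository_ctx.symlink(prefix + "/include", "include")
    repository_ctx.symlink(prefix + "/lib", "lib")
    repository_ctx.file("BUILD", """
cc_library(
    name = "legacy",
    srcs = glob(["lib/**"]),
    hdrs = glob(["include/**"]),
    visibility = ["//visibility:public"],
)
""")

legacy_repo = repository_rule(implementation = _legacy_repo_impl)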
Tensorflow's third_party/sycl/sycl_configure.bzl has an example of doing something similar (you would do something other than looking at environment variables like find_computecpp_root does, and you might symlink entire directories instead of all the files in them):
_COMPUTECPP_TOOLKIT_PATH = "COMPUTECPP_TOOLKIT_PATH"  # env var the rule reads

def _symlink_dir(repository_ctx, src_dir, dest_dir):
    """Symlinks all the files in a directory.

    Args:
      repository_ctx: The repository context.
      src_dir: The source directory.
      dest_dir: The destination directory to create the symlinks in.
    """
    files = repository_ctx.path(src_dir).readdir()
    for src_file in files:
        repository_ctx.symlink(src_file, dest_dir + "/" + src_file.basename)

def find_computecpp_root(repository_ctx):
    """Find ComputeCpp compiler."""
    sycl_name = ""
    if _COMPUTECPP_TOOLKIT_PATH in repository_ctx.os.environ:
        sycl_name = repository_ctx.os.environ[_COMPUTECPP_TOOLKIT_PATH].strip()
    if sycl_name.startswith("/"):
        return sycl_name
    fail("Cannot find SYCL compiler, please correct your path")

def _sycl_autoconf_imp(repository_ctx):
    <snip>
    computecpp_root = find_computecpp_root(repository_ctx)
    <snip>
    _symlink_dir(repository_ctx, computecpp_root + "/lib", "sycl/lib")
    _symlink_dir(repository_ctx, computecpp_root + "/include", "sycl/include")
    _symlink_dir(repository_ctx, computecpp_root + "/bin", "sycl/bin")

How to pass an array from the Bazel CLI to rules?

Let's say I have a rule like this.
foo(
    name = "helloworld",
    myarray = [
        ":bar",
        "//path/to:qux",
    ],
)
In this case, myarray is static.
However, I want it to be given by cli, like
bazel run //:helloworld --myarray=":bar,//path/to:qux,:baz,:another"
How is this possible?
Thanks
To get exactly what you're asking for, Bazel would need to support LABEL_LIST in Starlark-defined command line flags, which are documented here:
https://docs.bazel.build/versions/2.1.0/skylark/lib/config.html
and here: https://docs.bazel.build/versions/2.1.0/skylark/config.html
Unfortunately that's not implemented at the moment.
If you don't actually need a list of labels (i.e., to create dependencies between targets), then maybe STRING_LIST will work for you.
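A minimal sketch of a string-list build setting (file and flag names are assumptions):
# myflags.bzl
ArrayProvider = provider(fields = ["values"])

def _my_array_impl(ctx):
    return [ArrayProvider(values = ctx.build_setting_value)]

my_array_flag = rule(
    implementation = _my_array_impl,
    build_setting = config.string_list(flag = True),
)

# BUILD
load(":myflags.bzl", "my_array_flag")

my_array_flag(
    name = "myarray",
    build_setting_default = [],
)
You could then set it with e.g. bazel run //:helloworld --//:myarray=bar,qux (comma-separated), and a rule can consume the provider through an attribute whose default is //:myarray.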
If you do need a list of labels, and the different possible values are known, then you can use --define, config_setting(), and select():
https://docs.bazel.build/versions/2.1.0/configurable-attributes.html
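For instance (a sketch; the define name and values are placeholders):
config_setting(
    name = "extra_targets",
    define_values = {"variant": "extra"},
)

foo(
    name = "helloworld",
    myarray = select({
        ":extra_targets": [":bar", "//path/to:qux", ":baz", ":another"],
        "//conditions:default": [":bar", "//path/to:qux"],
    }),
)
and then: bazel run //:helloworld --define variant=extra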
The question is what you are really after. Passing a variable or array into a bazel build/run isn't really possible as such, and mostly not without (very likely unwanted) side effects. Aren't you perhaps really just looking to pass arguments directly to what is being run by the run, i.e. to the executable itself, not to bazel?
There are a few ways you could sneak stuff in (you'd also in most cases need to come up with a syntax to pass the data on the CLI and unpack the array in a rule), but many come with a relatively substantial price.
You can define your array in a .bzl file and load it from where the rule uses it. You can then regenerate the .bzl content to rewrite your build/run configuration (which also makes the change obvious and traceable) and load the bits from the rule (only affecting the rules loading and using the variable). E.g., BUILD file:
load(":myarray.bzl", "myarray")
foo(
name = "helloworld",
myarray = myarray,
],
)
And you can then call your build:
$ echo 'myarray=[":bar", "//path/to:qux", ":baz", ":another"]' > myarray.bzl
$ bazel run //:helloworld
Which you can of course put in a single wrapper script. If this really needs to be a Bazel array, this is probably the cleanest way to do it.
--workspace_status_command: you can collect information about your environment, add either or both of the resulting status files (depending on whether the inputs are meant to invalidate the rule results or not, you could use the volatile or stable status file) as a dependency of your rule, and process the incoming file in whatever is executed by the rule (at which point one would wonder why not pass the data directly as command line arguments). Note that if you use the stable status file, every other rule depending on it is also invalidated by any change.
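A sketch of the consuming side (the STABLE_MYARRAY key is an assumption; ctx.info_file is the stable status file and ctx.version_file the volatile one):
# status.bzl - sketch only.
def _env_consumer_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".args")
    ctx.actions.run_shell(
        inputs = [ctx.info_file],  # stable-status.txt
        outputs = [out],
        command = "grep '^STABLE_MYARRAY ' {status} | cut -d' ' -f2- > {out}".format(
            status = ctx.info_file.path,
            out = out.path,
        ),
    )
    return [DefaultInfo(files = depset([out]))]

env_consumer = rule(implementation = _env_consumer_impl)
This would pair with a --workspace_status_command script that prints a line like STABLE_MYARRAY :bar,//path/to:qux.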
You can do a similar thing by using --action_env. From within the executable/tool/script underpinning the rule, you can directly access the defined environment variable. However, this also means the environment of every rule is affected (not just the one you're targeting); and again, why would it parse the information out of the environment rather than accept arguments on the command line?
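For example (a sketch; MYARRAY is a placeholder variable name):
genrule(
    name = "dump_array",
    outs = ["array.txt"],
    # $$ so that the shell, not Bazel's make-variable expansion, reads MYARRAY.
    cmd = "echo $$MYARRAY > $@",
)
built with: bazel build //:dump_array --action_env=MYARRAY=":bar,//path/to:qux"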
There is also --define, but you would not really get direct access to its value; rather, you could select() a choice out of possible options, as shown in the earlier answer.

Bazel Monorepo - How to Rebuild and Publish Only Changed Docker Images?

Objective
I have a monorepo setup with a growing number of services. When I deploy the application I run a command and every service is rebuilt and the final Docker images are published.
But as the number of services grows, the time it takes to rebuild all of them gets longer and longer, although changes were made to only a few of them.
Why does my setup rebuild all Docker images although only a few have changed? My goal is to rebuild and publish only the images that have actually changed.
Details
I am using Bazel to build my Docker images, thus in the root of my project there is one BUILD file which contains the target I run when I want to deploy. It is just a collection of k8s_objects, where every service is included:
load("#io_bazel_rules_k8s//k8s:objects.bzl", "k8s_objects")
k8s_objects(
name = "kubernetes_deployment",
objects = [
"//services/service1",
"//services/service2",
"//services/service3",
"//services/service4",
# ...
]
)
Likewise there is one BUILD file for every service, which first creates a TypeScript library from all the source files, then creates the Node.js image and finally passes the image to the Kubernetes object:
load("#npm_bazel_typescript//:index.bzl", "ts_library")
ts_library(
name = "lib",
srcs = glob(
include = ["**/*.ts"],
exclude = ["**/*.spec.ts"]
),
deps = [
"//packages/package1",
"//packages/package2",
"//packages/package3",
],
)
load("#io_bazel_rules_docker//nodejs:image.bzl", "nodejs_image")
nodejs_image(
name = "image",
data = [":lib", "//:package.json"],
entry_point = ":index.ts",
)
load("#k8s_deploy//:defaults.bzl", "k8s_deploy")
k8s_object(
name = "service",
template = ":service.yaml",
kind = "deployment",
cluster = "my-cluster"
images = {
"gcr.io/project/service:latest": ":image"
},
)
Note that the Typescript lib also depends on some packages, which should also be accounted for when redeploying!
To deploy I run bazel run :kubernetes_deployment.apply
Initially, one reason I decided to choose Bazel was that I thought it would handle building only changed services itself. But obviously this is either not the case or my setup is faulty in some way.
If you need more detailed insight into the project you can check it out here: https://github.com/flolude/cents-ideas
Looks like Bazel repo itself does something similar:
https://github.com/bazelbuild/bazel/blob/ef0f8e61b5d3a139016c53bf04361a8e9a09e9ab/scripts/ci/ci.sh
The rough steps are:
- Calculate the list of files that have changed.
- Use the file list to find their dependents (e.g. bazel query 'kind(".*_binary", rdeps(//..., set(file1.txt file2.txt)))' will find all binary targets that depend on either file1.txt or file2.txt).
- Build/test that list of targets.
You will need to adapt this script to your needs (e.g. make sure it finds the Docker image targets).
To find out the kind of a target, you can use bazel query //... --output label_kind.
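A hedged shell sketch of those steps (the base branch and target kinds are assumptions; container_push and k8s_object are the rules_docker / rules_k8s target kinds):
#!/bin/bash
# ci_affected.sh - sketch only: build just the image targets affected by a change.
# Caveat (see the EDIT below): deleted/moved files need extra handling, since
# they cannot appear in set(); --diff-filter=d excludes deletions here.
set -euo pipefail

# 1. Files changed relative to the base branch (assumed to be origin/master).
changed=$(git diff --name-only --diff-filter=d origin/master...HEAD | tr '\n' ' ')
if [ -z "$changed" ]; then exit 0; fi

# 2. Image/deployment targets that transitively depend on those files.
targets=$(bazel query "kind('container_push|k8s_object', rdeps(//..., set($changed)))")
if [ -z "$targets" ]; then exit 0; fi

# 3. Build (or `bazel run` the .apply targets of) only what was affected.
bazel build $targets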
EDIT:
A bit of warning for anyone who wants to go down this rabbit hole (especially if you absolutely do not want to miss tests in CI). You need to think about:
- Deleted files / BUILD files (and who depended on them). Note that moved files == deleted + added as well, and you cannot query the reverse deps of files/BUILD files that no longer exist!
- Modified BUILD files (to be safe, make sure all reverse deps of all targets in the BUILD file get built).
I think there is a ton of complexity in going down this route (if it is even possible). It might be less error-prone to rely on Bazel itself to figure out what changed, using remote caches and --subcommands to calculate which side-effects need to be performed.

Skylark - How to execute a jar from a repository rule

Context
I am writing a repository rule that invokes another Bazel project. My current approach is to build the additional project as a deploy jar. I would like a user to be able to instantiate the rule like:
jar_path = "some/relative/path"
my_rule(name = "something", p_arg = "m_arg", binary = jar_path)
and then given the jar_path and the arguments, I would like the repository rule to execute the following command in the shell:
java -jar $(SOME_JAR) $(ARGUMENTS_PROVIDED_BY_RULE)
Problem
First, it's unclear how best to accomplish the deploy jar approach. So far, I have attempted two different approaches, with varying levels of success. For examples, I have skimmed through the scala_rules, the maven_rules, and the skylark cookbook.
Second, and more importantly, I am not sure whether the deploy jar is the best route to accomplishing my goals. Again, my interest is to invoke a target from an external Bazel project that is currently hosted on GitHub. (So feasibly, I could try to fetch the project using the http_archive rule.)
Below, I describe the attempts I have made.
Approach 1
My first approach involved trying to execute the command using the command field in ctx.action. I tried various enumerations of
java -jar {computed_absolute_path_of_deploy_jar} {args_passed_from_instantiation}.
My biggest issue here was with determining the absolute path of the deploy jar. The file's root path would contain some additional information. For example, it would look something like this:
/abs/olute/path[ something ]/rela/tive/path
As a side note, I'm not sure if this is a bug/nit, but File.root.path evaluated to None, despite File.none not being None.
My first approach was to try to use skylark [ctx.binary].
Approach 2
The next thing I tried was to mimic the input binary example from the docs. This was also unsuccessful; the issue was that the actual binary could not be found. Here is how I configured it.
First, I relaxed the repository rule into a regular skylark rule.
def _test_binary(ctx):
    ctx.action(
        ...
        arguments = [ctx.attr.p_arg],
        executable = ctx.executable.binary,
    )

test_binary = rule(
    ...
    attrs = {
        "binary": attr.label(mandatory = True, cfg = "host", allow_files = True, executable = True),
        ...
    },
)
Then, in my external project, I loaded the skylark rule into the WORKSPACE file. Finally, I called the macro from one of my BUILD files as follows:
load("#something_rule//:something_rule.bzl", "test_binary")
test_binary(name = "hello", p_arg = "hello", binary = "script.sh")
The script is a one-liner, java -jar something_deploy.jar -- -arg:$1, and it is in the same directory as the BUILD file.
Bazel complains that src/script.sh does not exist. I presume because it is looking for the file in /private/var/tmp/-bazel_username/somehash/relative_path. In response, I tried to pass the absolute path, which is not allowed.
Cheers.
It looks like you're mixing up repository rules with build extensions ("normal" rules). A good rule of thumb is:
- Repository rules are for getting sources onto your system or symlinking them to a place Bazel can see them.
- Build extensions are for everything else: compiling, copying files, running binaries, etc.
I don't actually think you need to use either, for this. You say that the other project is on GitHub, so you can add the following to your WORKSPACE file:
http_archive(
    name = "other_project",
    ...
)
Then, in your BUILD file:
genrule(
    name = "run-a-jar",
    srcs = ["@other_project//some/relative:path"],
    outs = ["jar-output"],
    # $@ is the declared output file of the genrule.
    cmd = "java -jar $(location @other_project//some/relative:path) -- arg1 arg2 > $@",
)
You shouldn't need to use the _deploy.jar target, since you're not moving the jar out of its project (_deploy.jar is useful when you need to relocate it).
Other things from your question:
I'm not sure if this is a bug/nit, but the File.root.path, evaluated to None,
Are you sure it didn't evaluate to ""? The path is relative to the execution root, so for sources, it will always be "" (for outputs, it'll be bazel-out/local-fastbuild/bin or similar).
Bazel complains that src/script.sh does not exist.
Passing -s to Bazel can really help debugging Skylark rules. You can see exactly where it is looking.
