Bazel clean only a subset of the cached rules - bazel

I am currently developing in a monorepo that has a pretty large workspace file.
Right now, I am noting that one of my testing rules, is not getting its dependency rules re-built when I update one of my tests. Here is an example of this:
load("#npm//#grafana/toolkit:index.bzl", "grafana_toolkit")
load("#build_bazel_rules_nodejs//:index.bzl", "copy_to_bin")
APPLICATION_DEPS = glob(
[
# My updated test file is included in this glob
"src/**/*",
],
) + [
"my-config-files.json"
]
RULE_DEPS = [
"#npm//#grafana/data",
"#npm//#grafana/ui",
"#npm//emotion",
"#npm//fs-extra",
]
copy_to_bin(
name = "bin_files",
srcs = APPLICATION_DEPS,
)
grafana_toolkit(
name = "test",
args = [
"plugin:test",
],
chdir = package_name(),
data = RULE_DEPS + [
":bin_files",
],
)
I then have a file called maybe something.test.ts. I run bazel run :test and my test might show that I failed and I see the problem and fix it. The problem is that the next time I run my test, I see from the output that it's still failing because it's running the old test instead of the new test.
The Problem
The way that I normally fix this sort of issue with stale files not updating, is by running bazel clean. The problem is that doing bazel clean means I clean EVERYTHING. And that makes re-running all the build steps take pretty damn long. I'm wondering if there is a way I can specify that I only clean a subset of the cache (maybe only the output of my bin_files rule, for example). That way, rather than starting all over again, I only rebuild what I want to rebuild.

I've actually found a pretty quick and easy way to do what I was originally asking is basically to just go to the bazel-bin directory, and delete the output of whichever rule it is I want to re-run. So maybe in this case, I could delete the bin_files output in my bazel-bin directory then run my bin_files rule again.
With that being said, I think #ahumsky might be right in that if you're needing to do this, it's more likely a bug with something else. In my case, I was running a build version of my rule instead of a test version of my rule. So, cleaning a subset of my cache didn't really have anything to do with my original problem.

Related

Can I add static analysis to a py_binary or py_library rule?

I have a repo which uses bazel to build a bunch of Python code. I would like to introduce various flavors of static analysis into the build and have the build fail if these static analyses throw errors. What is the best way to do this?
For example, I'd like to declare something like:
py_library_with_static_analysis(
name = "foo",
srcs = ["foo.py"],
)
py_library_with_static_analysis(
name = "bar",
srcs = ["bar.py"],
deps = [":foo"],
)
In a build file and have it error out if there are mypy/flake/etc errors in foo.py. I would like to be able to do this gradually, converting libraries/binaries to static analysis one target at a time. I'm not sure if I should do this via a new rule, a macro, an aspect or something else.
Essentially, I think I'm asking how to run an additional command while building a py_binary/py_library and fail if that command fails.
I could create my own version of a py_library rule and have it run static analysis within the implementation but that seems like something which is really easy to get wrong (my guess is that native.py_library is quite complex?) and there doesn't seem to be a way to instantiate a native.py_library within a custom rule.
I've also played around with macros a bit, but haven't been able to get that to work either. I think my issue there is that a macro doesn't actually specify new commands, only new targets and I can't figure out how to make the static analysis target get force built along with the py_library/py_binary I'm interested in.
A macro that adds implicit test targets is not such a bad idea: The test targets will be picked up automatically when you run bazel test //..., which you could do in a gating CI to prevent imperfect code from merging.
Bazel supports a BUILD prelude (which is underdocumented) that you could use to replace all py_binary, py_library, and even py_test with your test-adding wrapper macros with minimal changes to existing code.
If you somehow fail the build instead it will make it harder to quickly prototype things. Sometimes you want to just quickly try something out, and you don't care about any pydoc violations yet.
In case you do want to fail the build, you might be able to use the Validations Output Group of a rule that you implement to wrap or replace your py_libraries.

Don't discard analysis cache when --action_env changes

I have an --action_env variable I'm passing into Bazel sometimes, but each time I remove it or add it back, it discards the analysis cache, which triggers a re-analysis that takes several minutes because I'm working in a large repo. Is there a way to prevent this? I'm already using --trim_test_configuration
Answer
Not really. You can think of the set of environment variables as a file that almost every Bazel action depends on. As a simple example let's say you have a build file that looks something like this;
genrule(
name = "foo_header",
cmd = "echo #define FOO $FOO_FROM_ENV > foo.h",
outs = ["foo.h"],
)
cc_library(
name = "my_library_that_everything_depends_on",
hdrs = [":foo_header"],
)
In this simple case it's not hard to see that if you change --action_env=FOO_FROM_ENV=7 to another value that everything that depends on foo.h now has to be completely rebuilt and analysed. So while frustrating it's probably a good thing that Bazel does this otherwise you'd end up with an inconsistent build.
Partial workarounds
Remove usage of use_default_shell_env from your rules/actions (this is far from trivial as most of the standard rules do this)
Use a centralised cache to prevent Bazel from deleting the artifact cache. e.g. I add these to ~/.bazelrc as then it shares the action/repository cache between all my projects. This only helps if you are switching between env variables on a regular basis rather than using new env values each time. Also, be careful with this as Bazel doesn't presently do any garbage collection so the cache directories can end up very large.
# ~/.bazelrc
common --disk_cache=~/.cache/shared_bazel_action_cache
common --repository_cache=~/.cache/shared_bazel_repository_cache
Add --incompatible_strict_action_env to your project bazelrc, this will prevent changes in your user shell triggering Bazel to discard the analysis cache.

Bazel Monorepo - How Rebuild and Publish only Changed Docker Images?

Objective
I have a monorepo setup with a growing number of services services. When I deploy the application I run a command and every service will be rebuilt and the final Docker images will be published.
But as the number of services grows the time it takes to rebuilt all of them gets longer and longer, although changes were made to only a few of them.
Why does my setup rebuilt all Docker images although only a few have changed? My goal is to rebuilt and publish only the images that have actually changed.
Details
I am using Bazel to build my Docker images, thus in the root of my project there is one BUILD file which contains the target I run when I want to deploy. It is just a collection of k8s_objects, where every service is included:
load("#io_bazel_rules_k8s//k8s:objects.bzl", "k8s_objects")
k8s_objects(
name = "kubernetes_deployment",
objects = [
"//services/service1",
"//services/service2",
"//services/service3",
"//services/service4",
# ...
]
)
Likewise there is one BUILD file for every service which first creates a Typescript library from all the source files, then creates the Node.Js image and finally passes the image to the Kubernetes object:
load("#npm_bazel_typescript//:index.bzl", "ts_library")
ts_library(
name = "lib",
srcs = glob(
include = ["**/*.ts"],
exclude = ["**/*.spec.ts"]
),
deps = [
"//packages/package1",
"//packages/package2",
"//packages/package3",
],
)
load("#io_bazel_rules_docker//nodejs:image.bzl", "nodejs_image")
nodejs_image(
name = "image",
data = [":lib", "//:package.json"],
entry_point = ":index.ts",
)
load("#k8s_deploy//:defaults.bzl", "k8s_deploy")
k8s_object(
name = "service",
template = ":service.yaml",
kind = "deployment",
cluster = "my-cluster"
images = {
"gcr.io/project/service:latest": ":image"
},
)
Note that the Typescript lib also depends on some packages, which should also be accounted for when redeploying!
To deploy I run bazel run :kubernetes_deployment.apply
Initially one reason I decided to choose Bazel is because I thought it would handle building only changed services itself. But obviously this is either not the case or my setup is faulty in some way.
If you need more detailed insight into the project you can check it out here: https://github.com/flolude/cents-ideas
Looks like Bazel repo itself does something similar:
https://github.com/bazelbuild/bazel/blob/ef0f8e61b5d3a139016c53bf04361a8e9a09e9ab/scripts/ci/ci.sh
The rough steps are:
Calculate the list of files that has changed
Use the file list and find their dependents (e.g. The bazel querykind(.*_binary, rdeps(//..., set(file1.txt file2.txt))) will find all binary targets which are dependents of either file1.txt or file2.txt)
build/test the list of targets
You will need to adapt this script to your need (e.g. make sure it finds docker image targets)
To find out the kind of a target, you can use bazel query //... --output label_kind
EDIT:
A bit of warning for anyone who wants to go down this rabbit hole (especially if you absolutely do not want to miss tests in CI):
You need to think about:
Deleted files / BUILD files (who depended on them)
Note that moved files == Deleted + Added as well
Also you cannot query reverse deps of files/BUILD that doesn't exist anymore!
Modified BUILD files (To be safe, make sure all reverse deps of all targets in the BUILD are built)
I think there is a ton of complexity here going down this route (if even possible). It might be less erro-prone to rely on Bazel itself to figure out what changed, using remote caches & --subcommands to calculate which side-effects need to be performed.

Access runfiles path in bazel BUILD file

I try to write a very simple cc_test in Bazel that builds a test runner and hands it the path to a test file as command line argument.
I tried to use the following snippet which seemed to do the trick according to 1 and 2.
cc_test(
name = "my_test",
srcs = [...],
deps = [...],
data = [":test_file"],
args = ["$(location :test_file)"]
)
My runner gets a relative path to the testfile, which, however, is not the proper path and the test fails.
This seems to be related to symlink issues with bazel under Windows (http://jayconrod.com/posts/108/writing-bazel-rules--data-and-runfiles), but I cannot believe there is no way to easily achieve what I am trying to.
I am aware of this answer, but I am searching for a purely BUILD-file based solution which doesn't use custom rules (whether this is a proper choice is different question, but I have the feeling that I am just missing something very fundamental here).

Skylark - How to execute a jar from a repository rule

Context
I am writing a repository rule that invokes another Bazel project. My current approach is to build the additional project as a deploy jar. I would like a user to be able to instantiate the rule like:
jar_path = some/relative/path
my_rule(name = "something", p_arg="m_arg", binary=jar_path)
and then given the jar_path and the arguments, I would like the repository rule to execute the following command in the shell:
java -jar $(SOME_JAR) $(ARGUMENTS_PROVIDED_BY_RULE)
Problem
First, it's unclear how best to accomplish the deploy jar approach. So far, I have attempt two different approaches, with varying levels of success. For examples, I have skimmed through the scala_rules, the maven_rules, and the skylark cookbook.
Second, and more importantly, I am not sure whether the deploy jar is the best route to accomplishing my goals. Again, my interest is to invoke a target from an external Bazel project, that is currently hosted on github. (So feasibly, I could try to fetch the project using the http_archive rule).
Below, I describe the attempts I have made.
Approach 1
My first approach involved trying to execute the command using the command field in ctx.action. I tried various enumerations of
java -jar {computed_absolute_path_of_deploy_jar} {args_passed_from_instantiation}.
My biggest issue here was with determining the absolute path of the deploy jar. The file's root path, would contain some additional information. For example, it would like something like this.
/abs/olute/path[ something ]/rela/tive/path
As a side note, I'm not sure if this is a bug/nit, but the File.root.path, evaluated to None, despite File.none not being None.
My first approach involved was to was to try to use skylark [ctx.binary]
Approach 2
Next thing I tried was to mimic the input binary example from the docs. This was also unsuccessful. The issue was that the actual binary could not be found. Here is how I configured it.
First, I relaxed the repository rule into a regular skylark rule.
def _test_binary(ctx):
ctx.action(
....
arguments = [ctx.attr.p_arg],
executable = ctx.executable.binary)
test_binary = rule(
...
attrs = {
"binary":attr.label(mandatory=True, cfg="host", allow_files=True, executable=True),
...
}
Then, in my external project, I loaded the skylark rule into the WORKSPACE file. Finally, I called the macro from one of my BUILD files as follows:
load("#something_rule//:something_rule.bzl", "test_binary")
test_binary(name = "hello", p_arg = "hello", binary = "script.sh")
The script is a one line java -jar something_deploy.jar -- -arg:$1, and is in the same directory as the BUILD file.
Bazel complains that src/script.sh does not exist. I presume because it is looking for the file in /private/var/tmp/-bazel_username/somehash/relative_path. In response, I tried to pass the absolute path, which is not allowed.
Cheers.
It looks like you're mixing up repository rules with build extensions ("normal" rules). A good rule of thumb is:
Repository rules are for getting sources onto your system or symlinking them to a place Bazel can see them.
Build extension are for everything else: compiling, copying files, running binaries, etc.
I don't actually think you need to use either, for this. You say that the other project is on GitHub, so you can add the following to your WORKSPACE file:
http_archive(
name = "other_project",
...
)
Then, in your BUILD file:
genrule(
name = "run-a-jar",
srcs = ["#other_project//some/relative:path"],
cmd = "java -jar $(location #other_project//some/relative:path) -- arg1 arg2 > $#",
outs = ["jar-output"],
)
You shouldn't need to use the _deploy.jar target, since you're not moving the jar out of its project (_deploy.jar is useful when you need to relocate it).
Other things from your question:
I'm not sure if this is a bug/nit, but the File.root.path, evaluated to None,
Are you sure it didn't evaluate to ""? The path is relative to the execution root, so for sources, it will always be "" (for outputs, it'll be bazel-out/local-fastbuild/bin or similar).
Bazel complains that src/script.sh does not exist.
Passing -s to Bazel can really help debugging Skylark rules. You can see exactly where it is looking.

Resources