How can I build custom rules using the output of workspace_status_command? - bazel

The bazel build flag --workspace_status_command supports calling a script to retrieve e.g. repository metadata, this is also known as build stamping and available in rules like java_binary.
I'd like to create a custom rule using this metadata.
I want to use this for a common support function. It should receive the git version and some other attributes and create a version.go output file usable as a dependency.
So I started a journey looking at rules in various bazel repositories.
Rules like rules_docker support stamping with stamp in container_image and let you reference the status output in attributes.
rules_go supports it in the x_defs attribute of go_binary.
This would be ideal for my purpose and I dug in...
It looks like I can get what I want with ctx.actions.expand_template using the entries in ctx.info_file or ctx.version_file as a dictionary for substitutions. But I didn't figure out how to get a dictionary of those files. And those two files seem to be "unofficial", they are not part of the ctx documentation.
Building on what I found out already: How do I get a dict based on the status command output?
If that's not possible, what is the shortest/simplest way to access workspace_status_command output from custom rules?

I've been exactly where you are and I ended up following the path you've started exploring. I generate a JSON description that also includes information collected from git to package with the result and I ended up doing something like this:
def _build_mft_impl(ctx):
args = ctx.actions.args()
args.add('-f')
args.add(ctx.info_file)
args.add('-i')
args.add(ctx.files.src)
args.add('-o')
args.add(ctx.outputs.out)
ctx.actions.run(
outputs = [ctx.outputs.out],
inputs = ctx.files.src + [ctx.info_file],
arguments = [args],
progress_message = "Generating manifest: " + ctx.label.name,
executable = ctx.executable._expand_template,
)
def _get_mft_outputs(src):
return {"out": src.name[:-len(".tmpl")]}
build_manifest = rule(
implementation = _build_mft_impl,
attrs = {
"src": attr.label(mandatory=True,
allow_single_file=[".json.tmpl", ".json_tmpl"]),
"_expand_template": attr.label(default=Label("//:expand_template"),
executable=True,
cfg="host"),
},
outputs = _get_mft_outputs,
)
//:expand_template is a label in my case pointing to a py_binary performing the transformation itself. I'd be happy to learn about a better (more native, fewer hops) way of doing this, but (for now) I went with: it works. Few comments on the approach and your concerns:
AFAIK you cannot read in (the file and perform operations in Skylark) itself...
...speaking of which, it's probably not a bad thing to keep the transformation (tool) and build description (bazel) separate anyways.
It could be debated what constitutes the official documentation, but ctx.info_file may not appear in the reference manual, it is documented in the source tree. :) Which is case for other areas as well (and I hope that is not because those interfaces are considered not committed too yet).
For sake of comleteness in src/main/java/com/google/devtools/build/lib/skylarkbuildapi/SkylarkRuleContextApi.java there is:
#SkylarkCallable(
name = "info_file",
structField = true,
documented = false,
doc =
"Returns the file that is used to hold the non-volatile workspace status for the "
+ "current build request."
)
public FileApi getStableWorkspaceStatus() throws InterruptedException, EvalException;
EDIT: few extra details as asked in the comment.
In my workspace_status.sh I would have for instance the following line:
echo STABLE_GIT_REF $(git log -1 --pretty=format:%H)
In my .json.tmpl file I would then have:
"ref": "${STABLE_GIT_REF}",
I've opted for shell like notation of text to be replaced, since it's intuitive for many users as well as easy to match.
As for the replacement, relevant (CLI kept out of this) portion of the actual code would be:
def get_map(val_file):
"""
Return dictionary of key/value pairs from ``val_file`.
"""
value_map = {}
for line in val_file:
(key, value) = line.split(' ', 1)
value_map.update(((key, value.rstrip('\n')),))
return value_map
def expand_template(val_file, in_file, out_file):
"""
Read each line from ``in_file`` and write it to ``out_file`` replacing all
${KEY} references with values from ``val_file``.
"""
def _substitue_variable(mobj):
return value_map[mobj.group('var')]
re_pat = re.compile(r'\${(?P<var>[^} ]+)}')
value_map = get_map(val_file)
for line in in_file:
out_file.write(re_pat.subn(_substitue_variable, line)[0])
EDIT2: This is how the Python script is how I expose the python script to rest of bazel.
py_binary(
name = "expand_template",
main = "expand_template.py",
srcs = ["expand_template.py"],
visibility = ["//visibility:public"],
)

Building on Ondrej's answer, I now use somthing like this (adapted in SO editor, might contain small errors):
tools/bazel.rc:
build --workspace_status_command=tools/workspace_status.sh
tools/workspace_status.sh:
echo STABLE_GIT_REV $(git rev-parse HEAD)
version.bzl:
_VERSION_TEMPLATE_SH = """
set -e -u -o pipefail
while read line; do
export "${line% *}"="${line#* }"
done <"$INFILE" \
&& cat <<EOF >"$OUTFILE"
{ "ref": "${STABLE_GIT_REF}"
, "service": "${SERVICE_NAME}"
}
EOF
"""
def _commit_info_impl(ctx):
ctx.actions.run_shell(
outputs = [ctx.outputs.outfile],
inputs = [ctx.info_file],
progress_message = "Generating version file: " + ctx.label.name,
command = _VERSION_TEMPLATE_SH,
env = {
'INFILE': ctx.info_file.path,
'OUTFILE': ctx.outputs.version_go.path,
'SERVICE_NAME': ctx.attr.service,
},
)
commit_info = rule(
implementation = _commit_info_impl,
attrs = {
'service': attr.string(
mandatory = True,
doc = 'name of versioned service',
),
},
outputs = {
'outfile': 'manifest.json',
},
)

Related

How is vendorSha256 computed?

I'm trying to understand how the vendorSha256 is calculated when using buildGoModule. In nixpkgs manual the only info I get is:
"vendorSha256: is the hash of the output of the intermediate fetcher derivation."
Is there a way I can calculate the vendorSha256 for a nix expression I'm writing? To take a specific example, how was the "sha256-Y4WM+o+5jiwj8/99UyNHLpBNbtJkKteIGW2P1Jd9L6M=" generated here:
{ lib, buildGoModule, fetchFromGitHub }:
buildGoModule rec {
pname = "oapi-codegen";
version = "1.6.0";
src = fetchFromGitHub {
owner = "deepmap";
repo = pname;
rev = "v${version}";
sha256 = "sha256-doJ1ceuJ/gL9vlGgV/hKIJeAErAseH0dtHKJX2z7pV0=";
};
vendorSha256 = "sha256-Y4WM+o+5jiwj8/99UyNHLpBNbtJkKteIGW2P1Jd9L6M=";
# Tests use network
doCheck = false;
meta = with lib; {
description = "Go client and server OpenAPI 3 generator";
homepage = "https://github.com/deepmap/oapi-codegen";
license = licenses.asl20;
maintainers = [ maintainers.j4m3s ];
};
}
From the manual:
The function buildGoModule builds Go programs managed with Go modules.
It builds a Go Modules through a two phase build:
An intermediate fetcher derivation. This derivation will be used to fetch all of the dependencies of the Go module.
A final derivation will use the output of the intermediate derivation to build the binaries and produce the final output.
You can see that, when you're trying to build the above expression in the ouput of nix-build. If you run:
nix-build -E 'with import <nixpkgs> { }; callPackage ./yourExpression.nix { }'
you see the first 2 lines of output:
these 2 derivations will be built:
/nix/store/j13s3dvlwz5w9xl5wbhkcs7lrkgksv3l-oapi-codegen-1.6.0-go-modules.drv
/nix/store/4wyj1d9f2m0521nlkjgr6al0wfz12yjn-oapi-codegen-1.6.0.drv
The first derivation will be used to fetch all dependencies for your Go module, and the second will be used to build your actual module. So vendorSha256 is the hash of the output of that first derivation.
When you write a Nix expression to build a Go module you don't know in advance that hash. You only know it that the first derivation has been realised(download dependencies and find the hash based on them). However you can use Nix validation to find out the value of vendorSha256.
Modify your Nix expression like so:
{ lib, buildGoModule, fetchFromGitHub }:
buildGoModule rec {
pname = "oapi-codegen";
version = "1.6.0";
src = fetchFromGitHub {
owner = "deepmap";
repo = pname;
rev = "v${version}";
sha256 = "sha256-doJ1ceuJ/gL9vlGgV/hKIJeAErAseH0dtHKJX2z7pV0=";
};
vendorSha256 = lib.fakeSha256;
# Tests use network
doCheck = false;
meta = with lib; {
description = "Go client and server OpenAPI 3 generator";
homepage = "https://github.com/deepmap/oapi-codegen";
license = licenses.asl20;
maintainers = [ maintainers.j4m3s ];
};
}
The only difference is vendorSha256 has now the value of lib.fakeSha256, which is just a fake/wrong sha256 hash. Nix will try to build the first derivation and will check the hash of the dependencies against this value. Since they will not match, an error will occur:
error: hash mismatch in fixed-output derivation '/nix/store/j13s3dvlwz5w9xl5wbhkcs7lrkgksv3l-oapi-codegen-1.6.0-go-modules.drv':
specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
got: sha256-Y4WM+o+5jiwj8/99UyNHLpBNbtJkKteIGW2P1Jd9L6M=
error: 1 dependencies of derivation '/nix/store/4wyj1d9f2m0521nlkjgr6al0wfz12yjn-oapi-codegen-1.6.0.drv' failed to build
So this answer your question. The value of vendorSha256 you need is sha256-Y4WM+o+5jiwj8/99UyNHLpBNbtJkKteIGW2P1Jd9L6M=. Copy and add it your file and you're good to go!

How to define string based on host os in bazel rule definition?

I have the following rule definition:
helm_action = rule(
attrs = {
…
"cluster_aliases": attr.string_dict(
doc = "key value pair matching for creating a cluster alias where the name used to evoke a cluster alias is different than the actual cluster's name",
default = DEFAULT_CLUSTER_ALIASES,
),
…
},
…
)
I'd like for DEFAULT_CLUSTER_ALIASES value to be based on the host os but
DEFAULT_CLUSTER_ALIASES = {
"local": select({
"#platforms//os:osx": "docker-desktop",
"#platforms//os:linux": "minikube",
})
}
errors with:
Error in string_dict: expected value of type 'string' for dict value element, but got select({"#platforms//os:osx": "docker-desktop", "#platforms//os:linux": "minikube"}) (select)
How do I go about defining DEFAULT_CLUSTER_ALIASES based on the host os?
Judging from https://github.com/bazelbuild/bazel/issues/2045, selecting based on host os is not possible.
When you create a rule or macro, it is evaluated during the loading phase, before command-line flags are evaluated. Bazel needs to know the default value in your build rule helm_action during the loading phase but can't because it hasn't parsed the command line and analysed the build graph.
The command line is parsed and select statements are evaluated during the analysis phase. As a broad rule, if your select statement isn't in a BUILD.bazel then it's not going to work. So the easiest way to achieve what you are after is to create a macro that uses your rule injecting the default. e.g.
# helm_action.bzl
# Add an '_' prefix to your rule to make the rule private.
_helm_action = rule(
attrs = {
…
"cluster_aliases": attr.string_dict(
doc = "key value pair matching for creating a cluster alias where the name used to evoke a cluster alias is different than the actual cluster's name",
# Remove default attribute.
),
…
},
…
)
# Wrap your rule in a publicly exported macro.
def helm_action(**kwargs):
_helm_action(
name = kwargs["name"],
# Instantiate your rule with a select.
cluster_aliases = DEFAULT_CLUSTER_ALIASES,
**kwargs,
)
It's important to note the difference between a macro and a rule. A macro is a way of generating a set of targets using other build rules, and actually expands out roughly equivalent to it's contents when used in a BUILD file. You can check this by querying a target with the --output build flag. e.g.
load(":helm_action.bzl", "helm_action")
helm_action(
name = "foo",
# ...
)
You can query the output using the command;
bazel query //:foo --output build
This will demonstrate that the select statement is being copied into the BUILD file.
A good example of this approach is in the rules_docker repository.
EDIT: The question was clarified, so I've got an updated answer below but will keep the above answer in case it is useful to others.
A simple way of achieving what you are after is to use Bazels toolchain api. This is a very flexible API and is what most language rulesets use in Bazel. e.g.
Create a build file with your toolchains;
# //helm:BUILD.bazel
load(":helm_toolchains.bzl", "helm_toolchain")
toolchain_type(name = "toolchain_type")
helm_toolchain(
name = "osx",
cluster_aliases = {
"local": "docker-desktop",
},
)
toolchain(
name = "osx_toolchain",
toolchain = ":osx",
toolchain_type = ":toolchain_type",
exec_compatible_with = ["#platforms//os:macos"],
# Optionally use to restrict target platforms too.
# target_compatible_with = []
)
helm_toolchain(
name = "linux",
cluster_aliases = {
"local": "minikube",
},
)
toolchain(
name = "linux_toolchain",
toolchain = ":linux",
toolchain_type = ":toolchain_type",
exec_compatible_with = ["#platforms//os:linux"],
)
Register your toolchains so that Bazel knows what to look for;
# //:WORKSPACE
# the rest of your workspace...
register_toolchains("//helm:all")
# You may need to register your execution platforms too...
# register_execution_platforms("//your_platforms/...")
Implement the toolchain backend;
# //helm:helm_toolchains.bzl
HelmToolchainInfo = provider(fields = ["cluster_aliases"])
def _helm_toolchain_impl(ctx):
toolchain_info = platform_common.ToolchainInfo(
helm_toolchain_info = HelmToolchainInfo(
cluster_aliases = ctx.attr.cluster_aliases,
),
)
return [toolchain_info]
helm_toolchain = rule(
implementation = _helm_toolchain_impl,
attrs = {
"cluster_aliases": attr.string_dict(),
},
)
Update helm_action to use toolchains. e.g.
def _helm_action_impl(ctx):
cluster_aliases = ctx.toolchains["#your_repo//helm:toolchain_type"].helm_toolchain_info.cluster_aliases
#...
helm_action = rule(
_helm_action_impl,
attrs = {
#…
},
toolchains = ["#your_repo//helm:toolchain_type"]
)

Conditionally create a Bazel rule based on --config

I'm working on a problem in which I only want to create a particular rule if a certain Bazel config has been specified (via '--config'). We have been using Bazel since 0.11 and have a bunch of build infrastructure that works around former limitations in Bazel. I am incrementally porting us up to newer versions. One of the features that was missing was compiler transitions, and so we rolled our own using configs and some external scripts.
My first attempt at solving my problem looks like this:
load("#rules_cc//cc:defs.bzl", "cc_library")
# use this with a select to pick targets to include/exclude based on config
# see __build_if_role for an example
def noop_impl(ctx):
pass
noop = rule(
implementation = noop_impl,
attrs = {
"deps": attr.label_list(),
},
)
def __sanitize(config):
if len(config) > 2 and config[:2] == "//":
config = config[2:]
return config.replace(":", "_").replace("/", "_")
def build_if_config(**kwargs):
config = kwargs['config']
kwargs.pop('config')
name = kwargs['name'] + '_' + __sanitize(config)
binary_target_name = kwargs['name']
kwargs['name'] = binary_target_name
cc_library(**kwargs)
noop(
name = name,
deps = select({
config: [ binary_target_name ],
"//conditions:default": [],
})
)
This almost gets me there, but the problem is that if I want to build a library as an output, then it becomes an intermediate dependency, and therefore gets deleted or never built.
For example, if I do this:
build_if_config(
name="some_lib",
srcs=[ "foo.c" ],
config="//:my_config",
)
and then I run
bazel build --config my_config //:some_lib
Then libsome_lib.a does not make it to bazel-out, although if I define it using cc_library, then it does.
Is there a way that I can just create the appropriate rule directly in the macro instead of creating a noop rule and using a select? Or another mechanism?
Thanks in advance for your help!
As I noted in my comment, I was misunderstanding how Bazel figures out its dependencies. The create a file section of The Rules Tutorial explains some of the details, and I followed along here for some of my solution.
Basically, the problem was not that the built files were not sticking around, it was that they were never getting built. Bazel did not know to look in the deps variable and build those things: it seems I had to create an action which uses the deps, and then register an action by returning a (list of) DefaultInfo
Below is my new noop_impl function
def noop_impl(ctx):
if len(ctx.attr.deps) == 0:
return None
# ctx.attr has the attributes of this rule
dep = ctx.attr.deps[0]
# DefaultInfo is apparently some sort of globally available
# class that can be used to index Target objects
infile = dep[DefaultInfo].files.to_list()[0]
outfile = ctx.actions.declare_file('lib' + ctx.label.name + '.a')
ctx.actions.run_shell(
inputs = [infile],
outputs = [outfile],
command = "cp %s %s" % (infile.path, outfile.path),
)
# we can also instantiate a DefaultInfo to indicate what output
# we provide
return [DefaultInfo(files = depset([outfile]))]

How do I generate declared files and directories using java_common.compile.annotation_processor_additional_outputs?

I use an annotation processor to generate language bindings from interfaces defined in Java. I want to replace actions created by ctx.actions.run with actions generated by java_common.compile, so that I can exploit Bazel's native support for persistent javac workers.
Here's a mockup of the original working Bazel rule implementation using ctx.actions.run:
def _impl(ctx):
output_dir = ctx.actions.declare_directory(output)
args = ctx.actions.args()
args.add(output_dir.path, format = "-AoutputDir=%s")
ctx.actions.run(
...,
outputs = [output_dir],
arguments = [args],
)
return DefaultInfo(files = depset([output_dir]))
What I'd now like to do is swap out ctx.actions.run with java_common.compile. Here's a mockup of what I've come up with:
def _impl(ctx):
output_dir = ctx.actions.declare_directory(output)
java_common.compile(
...
output = ctx.label.name + "-placeholder.jar", # -proc:only
javac_opts = [
"-proc:only",
"-AoutputDir={}".format(output_dir.path),
],
plugins = [ctx.attr._emitter[JavaInfo]],
annotation_processor_additional_outputs = [output_dir],
)
return DefaultInfo(files = depset([output_dir]))
Here's my problem: after building my target, output_dir is created but empty. I'm able to locate my files in Bazel's output root by running find -L bazel-out -name uniqueOutputDir, but they're buried under bazel-out/darwin-fastbuild/bin/my_packages/_javac/my_target/my_target-placeholder_sourcegenfiles/uniqueOutputDir, and then rolled into bazel-bin/my_package/my_target-placeholder-gensrc.jar.
Any ideas? Like, how is annotation_processor_additional_outputs supposed to work? How can I specify via javac_opts to write directly to output_dir.path without the ...sourcegenfiles prefix?
Thanks!
Convincing java_compile to let my processor emit directly to output_dir appears to be a non-starter. (Output location seems to be governed by the Bazel-impl-defined sourcegendir.)
So with that in mind, after java_common.compile, I added an action which extracted my outputs from the generated source jar, and this appears to solve my problem.
def _impl(ctx):
output_dir = ctx.actions.declare_directory(output)
java_common.compile(
...
output = ctx.label.name + "-placeholder.jar", # -proc:only
javac_opts = [
"-proc:only",
"-AoutputDir={}".format(output_dir.basename),
],
plugins = [ctx.attr._emitter[JavaInfo]],
annotation_processor_additional_outputs = [output_dir],
)
extract_args = ctx.actions.args()
extract_args.add(output_dir.dirname)
extract_args.add(java_info.annotation_processing.source_jar)
ctx.actions.run_shell(
inputs = [java_info.annotation_processing.source_jar],
outputs = [output_dir],
arguments = [extract_args],
command = """
set -euo pipefail
output_root=$1
gensrcjar=$2
unzip -q -d $output_root $gensrcjar
""",
)
return DefaultInfo(files = depset([output_dir]))

How to write a Bazel test rule using a provided tool rather than a rule-built one?

I have a test tool (roughly, a diffing tool) that takes two inputs, and returns both an output (the difference between the two inputs), and a return code (0 if the two inputs are matching, 1 otherwise). It's built in Kotlin, and available at //java/fr/enoent/phosphorus in my repo.
I want to write a rule that tests that a file generated by something is identical to the reference file already present in the repository. I tried something with ctx.actions.run, the problem being that my rule, having test = True set, needs to return an executable built by that rule (so not a tool provided to the rule). I then tried to wrap it in a shell script following the example, like this:
def _phosphorus_test_impl(ctx):
output = ctx.actions.declare_file("{name}.phs".format(name = ctx.label.name))
script = phosphorus_compare(
ctx,
reference = ctx.file.reference,
comparison = ctx.file.comparison,
out = output,
)
ctx.actions.write(
output = ctx.outputs.executable,
content = script,
)
runfiles = ctx.runfiles(files = [ctx.executable._phosphorus_tool, ctx.file.reference, ctx.file.comparison])
return [DefaultInfo(runfiles = runfiles)]
phosphorus_test = rule(
_phosphorus_test_impl,
attrs = {
"comparison": attr.label(
allow_single_file = [".phs"],
doc = "File to compare to the reference",
mandatory = True,
),
"reference": attr.label(
allow_single_file = [".phs"],
doc = "Reference file",
mandatory = True,
),
"_phosphorus_tool": attr.label(
default = "//java/fr/enoent/phosphorus",
executable = True,
cfg = "host",
),
},
doc = "Compares two files, and fails if they are different.",
test = True,
)
(phosphorus_compare is just a macro generating the actual command.)
However, this approach has two issues:
The output can't be declared this way. It's not linked to any action (and Bazel is complaining about it). Maybe I don't really need to declare an output for a test? Does Bazel make anything in the test folder available when the test fails?
The runfiles necessary to run the tool don't seem to be available when the test runs:
java/fr/enoent/phosphorus/phosphorus: line 359: /home/kernald/.cache/bazel/_bazel_kernald/58c025fbb926eac6827117ef80f7d2fa/sandbox/linux-sandbox/1979/execroot/fr_enoent/bazel-out/k8-fastbuild/bin/tools/phosphorus/tests/should_pass.runfiles/remotejdk11_linux/bin/java: No such file or directory
Overall I feel like using a shell script is just adding an unnecessary indirection, and losing some context (e.g. tools' runfiles). Ideally, I would just use ctx.actions.run and rely on its return code, but it doesn't seem to be an option as a test apparently needs to generate an executable. What would be the correct approach to write such a rule?
Turns out, generating a script is the correct approach, it's (as far as I understood) impossible to return some kind of pointer to a ctx.actions.run. A test rule needs to have an executable output.
Regarding the output file that the tool is generating: there's no need to declare it, at all. I just need to make sure that it's generated in $TEST_UNDECLARED_OUTPUTS_DIR. Every single file in this directory will be added to an archive called output.zip by Bazel. This is (partly) documented here.
Concerning the runfiles, well, I had the tool's binary, but not its own runfiles. Here is the fixed rule:
def _phosphorus_test_impl(ctx):
script = phosphorus_compare(
ctx,
reference = ctx.file.reference,
comparison = ctx.file.comparison,
out = "%s.phs" % ctx.label.name,
)
ctx.actions.write(
output = ctx.outputs.executable,
content = script,
)
return [
DefaultInfo(
runfiles = ctx.runfiles(
files = [
ctx.executable._phosphorus_tool,
ctx.file.reference,
ctx.file.comparison,
],
).merge(ctx.attr._phosphorus_tool[DefaultInfo].default_runfiles),
executable = ctx.outputs.executable,
),
]
def phosphorus_test(size = "small", **kwargs):
_phosphorus_test(size = size, **kwargs)
_phosphorus_test = rule(
_phosphorus_test_impl,
attrs = {
"comparison": attr.label(
allow_single_file = [".phs"],
doc = "File to compare to the reference",
mandatory = True,
),
"reference": attr.label(
allow_single_file = [".phs"],
doc = "Reference file",
mandatory = True,
),
"_phosphorus_tool": attr.label(
default = "//java/fr/enoent/phosphorus",
executable = True,
cfg = "target",
),
},
doc = "Compares two files, and fails if they are different.",
test = True,
)
The key part being .merge(ctx.attr._phosphorus_tool[DefaultInfo].default_runfiles) in the returned DefaultInfo.
I also made a small mistake about the configuration, as this test is intended to run on the target configuration, not host, it's been fixed accordingly.

Resources