Does Bazel offer a variable substitution for a temp directory in genrules?
Sometimes I need a staging area before creating the final output artefact.
I am imagining something like this:
genrule(
name = "example",
srcs = [ "a.txt" ],
cmd = "cp $< $(TMP)/b.txt && cp $(TMP)/b.txt $@",
)
$(TMP) would be a folder generated for me by Bazel on each rule execution.
No, it doesn't (as of Bazel 0.23.1).
It does set $TMPDIR though (even with --incompatible_strict_action_env), so mktemp should work. But $TMPDIR is by no means a dedicated temp directory (it's often just /tmp), so be careful what you clobber.
I migrated my genrule to a full Starlark rule. There I can do
tmp = ctx.actions.declare_directory("TMP_" + ctx.label.name)
and just use that directory as my temp in further actions.
It is similar to what the Starlark tutorial shows, in https://docs.bazel.build/versions/2.0.0/skylark/rules-tutorial.html#creating-a-file. The difference is that I do not register that directory as an output. That is, I don't do something like
return [DefaultInfo(files = depset([tmp]))]
You can make your own inside the bash code:
export TMP="$(mktemp -d || mktemp -d -t bazel-tmp)"
trap 'rm -rf "$TMP"' EXIT # Delete on exit; quoted so paths with spaces are handled
# Do things...
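As a rough sketch of the staging pattern from the question, the function below copies a file through a private temporary directory and always removes that directory afterwards. The name stage_copy and the file name b.txt are only illustrative and are not part of Bazel. Inside a genrule cmd, every `$` would additionally have to be written as `$$`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Bazel feature): stage a file in a private
# temporary directory, then copy it to its final destination.
stage_copy() {
  local src="$1" dst="$2" tmp status
  # Create a fresh directory under $TMPDIR, falling back to /tmp if unset.
  tmp="$(mktemp -d "${TMPDIR:-/tmp}/stage.XXXXXX")" || return 1
  # Stage first, then publish the result.
  cp "$src" "$tmp/b.txt" && cp "$tmp/b.txt" "$dst"
  status=$?
  # Remove only the directory we created, never $TMPDIR itself.
  rm -rf "$tmp"
  return "$status"
}
```

Because the function only deletes the directory it created itself, it avoids clobbering anything else that lives in a shared $TMPDIR.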
Related
I have set up bazel to build a number of CLI tools that perform various database maintenance tasks. Each one is a py_binary or cc_binary target that is called from the command line with the path to some data file: it processes that file and stores the results in a database.
Now, I need to create a dependent package that contains data files and shell scripts that call these CLI tools to perform application-specific database operations.
However, there doesn't seem to be a way to depend on the existing py_binary or cc_binary targets from a new package that only contains sh_binary targets and data files. Trying to do so results in an error like:
ERROR: /workspace/shbin/BUILD.bazel:5:12: in deps attribute of sh_binary rule //shbin:run: py_binary rule '//pybin:counter' is misplaced here (expected sh_library)
Is there a way to call/depend on an existing bazel binary target from a shell script using sh_binary?
I have implemented a full example here:
https://github.com/psigen/bazel-mixed-binaries
Notes:
I cannot use py_library and cc_library instead of py_binary and cc_binary. This is because (a) I need to call mixes of the two languages to process my data files and (b) these tools are from an upstream repository where they are already designed as CLI tools.
I also cannot put all the data files into the CLI tool packages -- there are multiple application-specific packages and they cannot be mixed.
You can either create a genrule to run these tools as part of the build, or create a sh_binary that depends on the tools via the data attribute and runs them.
The genrule approach
This is the easier way and lets you run the tools as part of the build.
genrule(
name = "foo",
tools = [
"//tool_a:py",
"//tool_b:cc",
],
srcs = [
"//source:file1",
":file2",
],
outs = [
"output_file1",
"output_file2",
],
cmd = "$(location //tool_a:py) --input=$(location //source:file1) --output=$(location output_file1) && $(location //tool_b:cc) < $(location :file2) > $(location output_file2)",
)
The sh_binary approach
This is more complicated, but lets you run the sh_binary either as part of the build (if it is in a genrule.tools, similar to the previous approach) or after the build (from under bazel-bin).
In the sh_binary you have to data-depend on the tools:
sh_binary(
name = "foo",
srcs = ["my_shbin.sh"],
data = [
"//tool_a:py",
"//tool_b:cc",
],
)
Then, in the sh_binary you have to use the so-called "Bash runfiles library" built into Bazel to look up the runtime-path of the binaries. This library's documentation is in its source file.
The idea is:
the sh_binary has to depend on a specific target
you have to copy-paste some boilerplate code to the top of the sh_binary (reason is described here)
then you can use the rlocation function to look up the runtime-path of the binaries
For example your my_shbin.sh may look like this:
#!/bin/bash
# --- begin runfiles.bash initialization ---
...
# --- end runfiles.bash initialization ---
path=$(rlocation "__main__/tool_a/py")
if [[ ! -f "${path:-}" ]]; then
echo >&2 "ERROR: could not look up the Python tool path"
exit 1
fi
"$path" --input="$1" --output="$2"
The __main__ in the rlocation path argument is the name of the workspace. Since your WORKSPACE file does not contain a "workspace" rule, which would define the workspace's name, Bazel uses the default workspace name, __main__.
An easier approach for me is to add the cc_binary as a dependency in the data section. In prefix/BUILD
cc_binary(name = "foo", ...)
sh_test(name = "foo_test", srcs = ["foo_test.sh"], data = [":foo"])
Inside foo_test.sh, the working directory is different, so you need to find the right prefix for the binary
#! /usr/bin/env bash
executable=prefix/foo
$executable ...
A clean way to do this is to use args and $(location):
Contents of BUILD:
py_binary(
name = "counter",
srcs = ["counter.py"],
main = "counter.py",
)
sh_binary(
name = "run",
srcs = ["run.sh"],
data = [":counter"],
args = ["$(location :counter)"],
)
Contents of counter.py (your tool):
print("This is the counter tool.")
Contents of run.sh (your bash script):
#!/bin/bash
set -eEuo pipefail
counter="$1"
shift
echo "This is the bash script, about to call the counter tool."
"$counter"
And here's a demo showing the bash script calling the Python tool:
$ bazel run //example:run 2>/dev/null
This is the bash script, about to call the counter tool.
This is the counter tool.
It's also worth mentioning this note (from the docs):
The arguments are not passed when you run the target outside of bazel (for example, by manually executing the binary in bazel-bin/).
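One way to cope with that note, sketched here under assumptions, is to let the script fall back to a default path when no argument was supplied. The function name resolve_tool and the fallback path are hypothetical; where the tool actually lives when the script is run by hand depends on your layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the tool path given as an argument, or the
# fallback when the script was started without arguments (for example,
# when it is executed directly instead of through "bazel run").
# Usage: resolve_tool <fallback> [<path-from-args>]
resolve_tool() {
  local fallback="$1"
  shift
  if [ "$#" -gt 0 ] && [ -n "$1" ]; then
    printf '%s\n' "$1"
  else
    printf '%s\n' "$fallback"
  fi
}
```

In run.sh this could be used as `counter="$(resolve_tool ./counter "$@")"`, where ./counter is an assumed location and not something Bazel guarantees.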
I'm just getting started working with Bazel. So, I apologize in advance that I haven't been able to figure this out.
I'm trying to run a command that outputs a bunch of files to a directory and make this directory available for subsequent targets. I have two different attempts:
Use genrule
Write my own rule
I was naively hoping to just do this with a genrule. But, it doesn't seem you can say "I don't know exactly what this command is going to output" and put a directory in outs. Now I'm trying to write a rule that can use ctx.actions.declare_directory but I haven't gotten it quite right. I can't seem to get the tools over from my workspace and into my rule.
My genrule attempt looks something like this:
genrule(
name = "doit",
srcs = [
"doitConfigA",
"doitConfigB",
],
cmd = 'HOME=. ./$(location path/to/doit) install',
# Neither of the below outs work - seems like bazel wants to know
# exactly this list of files. I don't know the files that
# will be output ahead of time.
# This one looks at the `out_dir` that I already have and
# expects the files to be the same which they might not be
outs = glob(["out_dir/**/*.*"]),
# this fails with:
# "declared output 'out_dir' was not
# created by genrule. This is probably because the genrule actually
# didn't create this output, or because the output was a directory
# and the genrule was run remotely (note that only the contents of
# declared file outputs are copied from genrules run remotely)"
outs = ['out_dir'],
tools = ['path/to/doit'],
)
My custom rule attempt looks something like this:
def _impl(ctx):
dir = ctx.actions.declare_directory("out_dir")
ctx.actions.run_shell(
outputs=[dir],
progress_message="Running doit install ...",
command="HOME=. ./path/to/doit install",
tools=[ctx.attr.tools],
)
doit = rule(
implementation=_impl,
attrs={
"tools": attr.label_list(allow_files=True),
},
outputs={"out": "out_dir"},
)
Then, to run my doit rule, my BUILD file looks like this:
doit(
name = 'doit',
tools = ['path/to/doit'],
)
In my genrule, the command runs, but Bazel doesn't seem to accept a directory in outs. In my custom rule, I can't seem to tell Bazel that I want to use ./path/to/doit as a tool from my workspace; I get errors such as: expected type 'File' for 'tools' element but got type 'list' instead ...
Seems like I must be missing something basic because surely this is a common situation to run a command and output a bunch of unknown stuff to a directory?
The output of a genrule must be a fixed list of files. As a work-around, you can create a zip from the output directory.
I used this approach to manipulate the output of yarn install where the usual method was not viable:
genrule(
name = "node_modules",
srcs = [
"package.json",
"yarn.lock",
],
cmd = " && ".join([
"yarn install --pure-lockfile",
"zip -r $@ node_modules",
]),
outs = [
"node_modules.zip",
],
)
Then a rule that consumes the zip:
# Rule that generates a list of the folders in node_modules
genrule(
name = "node_modules_ls",
srcs = [
":node_modules",
],
cmd = " && ".join([
"unzip $(location :node_modules) -d . ",
"ls > $@",
]),
outs = [
"out.txt",
],
)
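The round trip behind this workaround can be tried outside Bazel. The sketch below packs a directory with unknown contents into a single archive file and unpacks it again. It uses tar instead of zip, purely because tar is more commonly preinstalled. The function names pack_dir and unpack_list are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the "single archive as output" idea.

# pack_dir <dir> <archive>: store <dir>, whatever it contains, in one file.
pack_dir() {
  tar -cf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# unpack_list <archive> <dest>: extract into <dest> and list its top level.
unpack_list() {
  mkdir -p "$2" && tar -xf "$1" -C "$2" && ls "$2"
}
```

The producing rule only has to declare the archive as its output, and the consuming rule recovers the directory tree by unpacking it.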
A while ago I created this example showing how to use directories with skylark action: How to build static library from the Generated source files using Bazel Build. Maybe it still works :)
A genrule won't work here; this use case is too advanced for it.
https://github.com/aspect-build/bazel-lib/blob/main/docs/run_binary.md has a similar API to genrule, and it supports directory outputs.
I'm looking for a good recipe to run "checks" or "verify" steps in Bazel, like go vet, gofmt, pylint, cppcheck. These steps don't create any output file. The only thing that matters is the return code (like a test).
Right now I'm using the following recipe:
sh_test(
name = "verify-pylint",
srcs = ["verify-pylint.sh"],
data = ["//:all-srcs"],
)
And verify-pylint.sh looks like this:
find . -name '*.py' | xargs pylint
This has two problems:
The verify logic is split between the shell script and the BUILD file. Ideally I would like to have both in the same place (in the BUILD file)
Anytime one of the source file changes (in //:all-srcs), bazel test verify-pylint re-runs pylint on every single file (and that can be expensive/slow).
What is the idiomatic way in bazel to run these steps?
There is more than one solution.
The cleanest way is to do the verification at build time: create a genrule for each file (or batch of files) you want to verify. If verification succeeds, the genrule writes an output; if it fails, the genrule writes nothing, which automatically fails the build as well.
Since success of verification depends on the file's contents, and the same input should yield the same output, the genrules should produce an output file that's dependent on the contents of the input(s). The most convenient thing is to write the digest of the file(s) to the output if verification succeeded, and no output if verification fails.
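The "digest on success, nothing on failure" idea can be sketched as a plain shell function. This is only an illustration: verify_and_stamp is a made-up name, the verifier is any command that takes the file as its argument, and cksum stands in for md5sum because it is available on more systems.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a verifier on a source file and write a checksum
# of the file to <out> only if verification succeeds.
# Usage: verify_and_stamp <verifier-command> <src> <out>
verify_and_stamp() {
  local verifier="$1" src="$2" out="$3"
  # Make sure a stale output from an earlier run cannot mask a failure.
  rm -f "$out"
  # A failing verifier leaves no output file behind.
  "$verifier" "$src" || return 1
  # Reading from stdin keeps the file name out of the checksum line.
  cksum < "$src" > "$out"
}
```

Since the output depends only on the file's contents, identical inputs produce identical outputs, which is what lets the build cache skip files that have not changed.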
To make the verifier reusable, you could create a Skylark macro and use it in all your packages.
To put this all together, you'd write something like the following.
Contents of //tools:py_verify_test.bzl:
def py_verify_test(name, srcs, visibility = None):
rules = {"%s-file%d" % (name, hash(s)): s for s in srcs}
for rulename, src in rules.items():
native.genrule(
name = rulename,
srcs = [src],
outs = ["%s.md5" % rulename],
cmd = "$(location //tools:py_verifier) $< && md5sum $< > $@",
tools = ["//tools:py_verifier"],
visibility = ["//visibility:private"],
)
native.sh_test(
name = name,
srcs = ["//tools:build_test.sh"],
data = rules.keys(),
visibility = visibility,
)
Contents of //tools:build_test.sh:
#!/bin/true
# If the test rule's dependencies could be built,
# then all files were successfully verified at
# build time, so this test can merely return true.
Contents of //tools:BUILD:
# I just use sh_binary as an example, this could
# be a more complicated rule of course.
sh_binary(
name = "py_verifier",
srcs = ["py_verifier.sh"],
visibility = ["//visibility:public"],
)
Contents of any package that wants to verify files:
load("//tools:py_verify_test.bzl", "py_verify_test")
py_verify_test(
name = "verify",
srcs = glob(["**/*.py"]),
)
A simple solution.
In your BUILD file:
load(":gofmt.bzl", "gofmt_test")
gofmt_test(
name = "format_test",
srcs = glob(["*.go"]),
)
In gofmt.bzl:
def gofmt_test(name, srcs):
cmd = """
export TMPDIR=.
out=$$(gofmt -d $(SRCS))
if [ -n "$$out" ]; then
echo "gofmt failed:"
echo "$$out"
exit 1
fi
touch $@
"""
native.genrule(
name = name,
cmd = cmd,
srcs = srcs,
outs = [name + ".out"],
tools = ["gofmt.sh"],
)
Some remarks:
If your wrapper script grows, you should put it in a separate .sh file.
In the genrule command, we need $$ instead of $ due to escaping (see documentation).
gofmt_test is actually not a test and will run with bazel build :all. If you really need a test, see Laszlo's example and call sh_test.
I call touch to create a file because genrule requires an output to succeed.
export TMPDIR=. is needed because by default the sandbox prevents writing in other directories.
To cache results for each file (and avoid rechecking a file that hasn't changed), you'll need to create multiple actions. See Laszlo's for loop.
To simplify the code, we could provide a generic rule. Maybe this is something we should put in a standard library.
I seem to have hit a quirk regarding Makefiles, list processing, and pattern rules. When my source is installed in a directory whose name includes '%', it fails to build correctly. I've boiled it down to a simple test case anybody could run.
I have the following Makefile:
INSTALL_DIR=$(shell pwd)/idir
INSTALL_TREE =
INSTALL_TREE += $(INSTALL_DIR)/dir1
INSTALL_TREE += $(INSTALL_DIR)/dir2
INSTALL_TREE += $(INSTALL_DIR)/dir3
create_tree: $(INSTALL_TREE)
$(INSTALL_TREE):
mkdir -p $@
I put this in a directory called test1, and when I run it I get exactly what I want... all the directories listed in INSTALL_TREE are created:
% make -n
mkdir -p /home/christopher.arguin/test/test1/idir/dir1
mkdir -p /home/christopher.arguin/test/test1/idir/dir2
mkdir -p /home/christopher.arguin/test/test1/idir/dir3
Now use the exact same Makefile in a directory called test2%, and this is what I get:
% make -n
mkdir -p /home/christopher.arguin/test/test2%/idir/dir1
It stops after the first directory. Interestingly, if the first directory does not have a % sign, they all get generated..., e.g.,
INSTALL_TREE =
INSTALL_TREE += ./idir/dir4
INSTALL_TREE += $(INSTALL_DIR)/dir1
INSTALL_TREE += $(INSTALL_DIR)/dir2
INSTALL_TREE += $(INSTALL_DIR)/dir3
gives me:
mkdir -p dir4
mkdir -p /home/christopher.arguin/test/test2%/idir/dir1
mkdir -p /home/christopher.arguin/test/test2%/idir/dir2
mkdir -p /home/christopher.arguin/test/test2%/idir/dir3
The other interesting thing is that the files names are correct... if the % were being misinterpreted as a prefix rule or something you might expect some substitution, but if I let it create the directories it does the absolute right thing.
I tried escaping the % character, but to no avail. Adding this code:
PERCENT := %
ITREE = $(subst $(PERCENT),$(PERCENT)$(PERCENT),$(INSTALL_TREE))
create_tree2: $(ITREE)
$(ITREE):
mkdir -p $@
Just yielded
mkdir -p /home/christopher.arguin/test/test2%%/idir/dir1
The substitution worked, but it didn't actually escape the percent sign. Neither did replacing "%" with "$(PERCENT)", as the filename just contained $(PERCENT) in it afterward.
I found two related questions,
Does GNU Make support '%' in a filename?
and How to correctly escape "%" sign when using pattern rules and patsubst in GNU make?
but neither of those suffered the basic problem I have... that the directory my source is in is out of my control.
Background:
The reason this is an issue for me is that I am trying to migrate to Jenkins Multibranch Pipeline. When Jenkins detects a commit, it creates a workspace based on the name of the branch. If your branch naming convention happens to have slashes in it, Jenkins does the right thing and converts those to "%2F". My makefiles are running afoul of those.
A target containing % defines a pattern rule, and pattern rules with multiple targets are processed differently:
https://www.gnu.org/software/make/manual/make.html#Pattern-Intro
Pattern rules may have more than one target. Unlike normal rules, this does not act as many different rules with the same prerequisites and recipe. If a pattern rule has multiple targets, make knows that the rule’s recipe is responsible for making all of the targets. The recipe is executed only once to make all the targets.
You can fix this by escaping % with a backslash when defining the targets of the rule. This isn't necessary for the prerequisites of create_tree, since that is a regular rule; in fact it won't work if you do, because make will then look for targets with a literal \%.
INSTALL_DIR := $(abspath .)/idir
INSTALL_TREE += $(INSTALL_DIR)/dir1
INSTALL_TREE += $(INSTALL_DIR)/dir2
INSTALL_TREE += $(INSTALL_DIR)/dir3
.PHONY: create_tree
create_tree: $(INSTALL_TREE)
$(subst %,\%,$(INSTALL_TREE)):
mkdir -p $@
The program is very simple:
#!/bin/csh -f
foreach path ( fileA.txt fileB.txt )
wc -l $path
grep "test" $path
end
However, the output is:
fileA.txt/wc: Not a directory.
fileA.txt/grep: Not a directory.
fileB.txt/wc: Not a directory.
fileB.txt/grep: Not a directory.
So what's wrong with the code and what's the correct way of doing it?
You should never use path as a generic variable name in C-Shell since it contains the current search directories for the shell to find the command programs.
This will work much better than your code:
#!/bin/csh -f
foreach mypath ( fileA.txt fileB.txt )
wc -l $mypath
grep "test" $mypath
end