I'm working on a problem in which I only want to create a particular rule if a certain Bazel config has been specified (via '--config'). We have been using Bazel since 0.11 and have a bunch of build infrastructure that works around former limitations in Bazel. I am incrementally porting us up to newer versions. One of the features that was missing was compiler transitions, and so we rolled our own using configs and some external scripts.
My first attempt at solving my problem looks like this:
load("#rules_cc//cc:defs.bzl", "cc_library")
# use this with a select to pick targets to include/exclude based on config
# see __build_if_role for an example
def noop_impl(ctx):
pass
noop = rule(
implementation = noop_impl,
attrs = {
"deps": attr.label_list(),
},
)
def __sanitize(config):
if len(config) > 2 and config[:2] == "//":
config = config[2:]
return config.replace(":", "_").replace("/", "_")
def build_if_config(**kwargs):
config = kwargs['config']
kwargs.pop('config')
name = kwargs['name'] + '_' + __sanitize(config)
binary_target_name = kwargs['name']
kwargs['name'] = binary_target_name
cc_library(**kwargs)
noop(
name = name,
deps = select({
config: [ binary_target_name ],
"//conditions:default": [],
})
)
This almost gets me there, but the problem is that if I want to build a library as an output, then it becomes an intermediate dependency, and therefore gets deleted or never built.
For example, if I do this:
build_if_config(
name="some_lib",
srcs=[ "foo.c" ],
config="//:my_config",
)
and then I run
bazel build --config my_config //:some_lib
Then libsome_lib.a does not make it to bazel-out, although if I define it using cc_library, then it does.
Is there a way that I can just create the appropriate rule directly in the macro instead of creating a noop rule and using a select? Or another mechanism?
Thanks in advance for your help!
As I noted in my comment, I was misunderstanding how Bazel figures out its dependencies. The create a file section of The Rules Tutorial explains some of the details, and I followed along here for some of my solution.
Basically, the problem was not that the built files were not sticking around, it was that they were never getting built. Bazel did not know to look in the deps variable and build those things: it seems I had to create an action which uses the deps, and then register an action by returning a (list of) DefaultInfo
Below is my new noop_impl function
def noop_impl(ctx):
if len(ctx.attr.deps) == 0:
return None
# ctx.attr has the attributes of this rule
dep = ctx.attr.deps[0]
# DefaultInfo is apparently some sort of globally available
# class that can be used to index Target objects
infile = dep[DefaultInfo].files.to_list()[0]
outfile = ctx.actions.declare_file('lib' + ctx.label.name + '.a')
ctx.actions.run_shell(
inputs = [infile],
outputs = [outfile],
command = "cp %s %s" % (infile.path, outfile.path),
)
# we can also instantiate a DefaultInfo to indicate what output
# we provide
return [DefaultInfo(files = depset([outfile]))]
Related
I have the following rule definition:
helm_action = rule(
attrs = {
…
"cluster_aliases": attr.string_dict(
doc = "key value pair matching for creating a cluster alias where the name used to evoke a cluster alias is different than the actual cluster's name",
default = DEFAULT_CLUSTER_ALIASES,
),
…
},
…
)
I'd like for DEFAULT_CLUSTER_ALIASES value to be based on the host os but
DEFAULT_CLUSTER_ALIASES = {
"local": select({
"#platforms//os:osx": "docker-desktop",
"#platforms//os:linux": "minikube",
})
}
errors with:
Error in string_dict: expected value of type 'string' for dict value element, but got select({"#platforms//os:osx": "docker-desktop", "#platforms//os:linux": "minikube"}) (select)
How do I go about defining DEFAULT_CLUSTER_ALIASES based on the host os?
Judging from https://github.com/bazelbuild/bazel/issues/2045, selecting based on host os is not possible.
When you create a rule or macro, it is evaluated during the loading phase, before command-line flags are evaluated. Bazel needs to know the default value in your build rule helm_action during the loading phase but can't because it hasn't parsed the command line and analysed the build graph.
The command line is parsed and select statements are evaluated during the analysis phase. As a broad rule, if your select statement isn't in a BUILD.bazel then it's not going to work. So the easiest way to achieve what you are after is to create a macro that uses your rule injecting the default. e.g.
# helm_action.bzl
# Add an '_' prefix to your rule to make the rule private.
_helm_action = rule(
attrs = {
…
"cluster_aliases": attr.string_dict(
doc = "key value pair matching for creating a cluster alias where the name used to evoke a cluster alias is different than the actual cluster's name",
# Remove default attribute.
),
…
},
…
)
# Wrap your rule in a publicly exported macro.
def helm_action(**kwargs):
_helm_action(
name = kwargs["name"],
# Instantiate your rule with a select.
cluster_aliases = DEFAULT_CLUSTER_ALIASES,
**kwargs,
)
It's important to note the difference between a macro and a rule. A macro is a way of generating a set of targets using other build rules, and actually expands out roughly equivalent to it's contents when used in a BUILD file. You can check this by querying a target with the --output build flag. e.g.
load(":helm_action.bzl", "helm_action")
helm_action(
name = "foo",
# ...
)
You can query the output using the command;
bazel query //:foo --output build
This will demonstrate that the select statement is being copied into the BUILD file.
A good example of this approach is in the rules_docker repository.
EDIT: The question was clarified, so I've got an updated answer below but will keep the above answer in case it is useful to others.
A simple way of achieving what you are after is to use Bazels toolchain api. This is a very flexible API and is what most language rulesets use in Bazel. e.g.
Create a build file with your toolchains;
# //helm:BUILD.bazel
load(":helm_toolchains.bzl", "helm_toolchain")
toolchain_type(name = "toolchain_type")
helm_toolchain(
name = "osx",
cluster_aliases = {
"local": "docker-desktop",
},
)
toolchain(
name = "osx_toolchain",
toolchain = ":osx",
toolchain_type = ":toolchain_type",
exec_compatible_with = ["#platforms//os:macos"],
# Optionally use to restrict target platforms too.
# target_compatible_with = []
)
helm_toolchain(
name = "linux",
cluster_aliases = {
"local": "minikube",
},
)
toolchain(
name = "linux_toolchain",
toolchain = ":linux",
toolchain_type = ":toolchain_type",
exec_compatible_with = ["#platforms//os:linux"],
)
Register your toolchains so that Bazel knows what to look for;
# //:WORKSPACE
# the rest of your workspace...
register_toolchains("//helm:all")
# You may need to register your execution platforms too...
# register_execution_platforms("//your_platforms/...")
Implement the toolchain backend;
# //helm:helm_toolchains.bzl
HelmToolchainInfo = provider(fields = ["cluster_aliases"])
def _helm_toolchain_impl(ctx):
toolchain_info = platform_common.ToolchainInfo(
helm_toolchain_info = HelmToolchainInfo(
cluster_aliases = ctx.attr.cluster_aliases,
),
)
return [toolchain_info]
helm_toolchain = rule(
implementation = _helm_toolchain_impl,
attrs = {
"cluster_aliases": attr.string_dict(),
},
)
Update helm_action to use toolchains. e.g.
def _helm_action_impl(ctx):
cluster_aliases = ctx.toolchains["#your_repo//helm:toolchain_type"].helm_toolchain_info.cluster_aliases
#...
helm_action = rule(
_helm_action_impl,
attrs = {
#…
},
toolchains = ["#your_repo//helm:toolchain_type"]
)
I have a project that involves multiple BUILD files in a single WORKSPACE, within a fairly complex build system. My goal in short: for some specific target, I want all of its recursive dependencies to be built with an extra set of attributes (copts/defines) compared to when those dependency targets are built in any other way. I have not yet found a way to do this cleanly.
For example, target G is normally built with copts = []. If target P depends on target G, and I run bazel build :P, I want both targets to be built with copts = ["-DMY_DEFINE"], along with all dependencies of target G, etc.
The cc_binary.defines argument propagates in the opposite direction: all targets that depend on some target A will receive all of target A's defines.
Limitations:
prefer to avoid custom command line flags, I don't control how people call bazel {build,test}
duplicating the entire tree of dependency targets is not practical
It doesn't appear possible to set the value of a config_setting from within a BUILD file or a target, so it seems a select-based solution couldn't work.
Previous work:
https://groups.google.com/g/bazel-discuss/c/rZps4nqYqt8/m/YS_pZD6oAQAJ - 2017, recommends "parallel trees" or custom macros (of which we already have many, it would be challenging to wrap them in another)
Propagate copts to all dependencies in Bazel - I believe these all depend on custom command line flags as well
Creating a user-defined build setting doesn't require command-line flags. If you set flag = False, then it actually can't be set on the command line. You can use a user-defined transition to set it instead.
I think something like this will do what you're looking for (save it in extra_copts.bzl):
def _extra_copts_impl(ctx):
context = cc_common.create_compilation_context(
defines = depset(ctx.build_setting_value)
)
return [CcInfo(compilation_context = context)]
extra_copts = rule(
implementation = _extra_copts_impl,
build_setting = config.string_list(flag = False),
)
def _use_extra_copts_implementation(ctx):
return [ctx.attr._copts[CcInfo]]
use_extra_copts = rule(
implementation = _use_extra_copts_implementation,
attrs = "_copts": attr.label(default = "//:extra_copts")},
)
def _add_copts_impl(settings, attr):
return {"//:extra_copts": ["MY_DEFINE"]}
_add_copts = transition(
implementation = _add_copts_impl,
inputs = [],
outputs = ["//:extra_copts"],
)
def _with_extra_copts_implementation(ctx):
infos = [d[CcInfo] for d in ctx.attr.deps]
return [cc_common.merge_cc_infos(cc_infos = infos)]
with_extra_copts = rule(
implementation = _with_extra_copts_implementation,
attrs = {
"deps": attr.label_list(cfg = _add_copts),
"_allowlist_function_transition": attr.label(
default = "#bazel_tools//tools/allowlists/function_transition_allowlist"
)
},
)
and then in the BUILD file:
load("//:extra_copts.bzl", "extra_copts", "use_extra_copts", "with_extra_copts")
extra_copts(name = "extra_copts", build_setting_default = [])
use_extra_copts(name = "use_extra_copts")
cc_library(
name = "G",
deps = [":use_extra_copts"],
)
with_extra_copts(
name = "P_deps",
deps = [":G"],
)
cc_library(
name = "P",
deps = [":P_deps"],
)
extra_copts is the build setting. It returns a CcInfo directly, which means it's straightforward to do any other C++ library swapping with the same approach. Its default is effectively an "empty" CcInfo which won't do anything to libraries that depend on it.
with_extra_copts wraps a set of dependencies, configured to use a different CcInfo. This is the rule that actually changes the value, to create the second version of G with different flags.
_add_copts is the transition which with_extra_copts uses to change the value of the extra_copts build setting. It could examine attr to do something more sophisticated than adding a hard-coded list.
use_extra_copts pulls the CcInfo out of extra_copts so a cc_library can use them.
To avoid rewriting the builtin C++ rules, this uses wrapper rules to pull the copts out and do the transition. You might want to create macros to bundle the wrapper rules along with the corresponding cc_library. Alternatively, you could use rules_cc's my_c_archive as a starting point to create custom rules that reuse the core implementation of the builtin C++ rules while integrating the transition and use of the build setting into a single rule.
I use an annotation processor to generate language bindings from interfaces defined in Java. I want to replace actions created by ctx.actions.run with actions generated by java_common.compile, so that I can exploit Bazel's native support for persistent javac workers.
Here's a mockup of the original working Bazel rule implementation using ctx.actions.run:
def _impl(ctx):
output_dir = ctx.actions.declare_directory(output)
args = ctx.actions.args()
args.add(output_dir.path, format = "-AoutputDir=%s")
ctx.actions.run(
...,
outputs = [output_dir],
arguments = [args],
)
return DefaultInfo(files = depset([output_dir]))
What I'd now like to do is swap out ctx.actions.run with java_common.compile. Here's a mockup of what I've come up with:
def _impl(ctx):
output_dir = ctx.actions.declare_directory(output)
java_common.compile(
...
output = ctx.label.name + "-placeholder.jar", # -proc:only
javac_opts = [
"-proc:only",
"-AoutputDir={}".format(output_dir.path),
],
plugins = [ctx.attr._emitter[JavaInfo]],
annotation_processor_additional_outputs = [output_dir],
)
return DefaultInfo(files = depset([output_dir]))
Here's my problem: after building my target, output_dir is created but empty. I'm able to locate my files in Bazel's output root by running find -L bazel-out -name uniqueOutputDir, but they're buried under bazel-out/darwin-fastbuild/bin/my_packages/_javac/my_target/my_target-placeholder_sourcegenfiles/uniqueOutputDir, and then rolled into bazel-bin/my_package/my_target-placeholder-gensrc.jar.
Any ideas? Like, how is annotation_processor_additional_outputs supposed to work? How can I specify via javac_opts to write directly to output_dir.path without the ...sourcegenfiles prefix?
Thanks!
Convincing java_compile to let my processor emit directly to output_dir appears to be a non-starter. (Output location seems to be governed by the Bazel-impl-defined sourcegendir.)
So with that in mind, after java_common.compile, I added an action which extracted my outputs from the generated source jar, and this appears to solve my problem.
def _impl(ctx):
output_dir = ctx.actions.declare_directory(output)
java_common.compile(
...
output = ctx.label.name + "-placeholder.jar", # -proc:only
javac_opts = [
"-proc:only",
"-AoutputDir={}".format(output_dir.basename),
],
plugins = [ctx.attr._emitter[JavaInfo]],
annotation_processor_additional_outputs = [output_dir],
)
extract_args = ctx.actions.args()
extract_args.add(output_dir.dirname)
extract_args.add(java_info.annotation_processing.source_jar)
ctx.actions.run_shell(
inputs = [java_info.annotation_processing.source_jar],
outputs = [output_dir],
arguments = [extract_args],
command = """
set -euo pipefail
output_root=$1
gensrcjar=$2
unzip -q -d $output_root $gensrcjar
""",
)
return DefaultInfo(files = depset([output_dir]))
The bazel build flag --workspace_status_command supports calling a script to retrieve e.g. repository metadata, this is also known as build stamping and available in rules like java_binary.
I'd like to create a custom rule using this metadata.
I want to use this for a common support function. It should receive the git version and some other attributes and create a version.go output file usable as a dependency.
So I started a journey looking at rules in various bazel repositories.
Rules like rules_docker support stamping with stamp in container_image and let you reference the status output in attributes.
rules_go supports it in the x_defs attribute of go_binary.
This would be ideal for my purpose and I dug in...
It looks like I can get what I want with ctx.actions.expand_template using the entries in ctx.info_file or ctx.version_file as a dictionary for substitutions. But I didn't figure out how to get a dictionary of those files. And those two files seem to be "unofficial", they are not part of the ctx documentation.
Building on what I found out already: How do I get a dict based on the status command output?
If that's not possible, what is the shortest/simplest way to access workspace_status_command output from custom rules?
I've been exactly where you are and I ended up following the path you've started exploring. I generate a JSON description that also includes information collected from git to package with the result and I ended up doing something like this:
def _build_mft_impl(ctx):
args = ctx.actions.args()
args.add('-f')
args.add(ctx.info_file)
args.add('-i')
args.add(ctx.files.src)
args.add('-o')
args.add(ctx.outputs.out)
ctx.actions.run(
outputs = [ctx.outputs.out],
inputs = ctx.files.src + [ctx.info_file],
arguments = [args],
progress_message = "Generating manifest: " + ctx.label.name,
executable = ctx.executable._expand_template,
)
def _get_mft_outputs(src):
return {"out": src.name[:-len(".tmpl")]}
build_manifest = rule(
implementation = _build_mft_impl,
attrs = {
"src": attr.label(mandatory=True,
allow_single_file=[".json.tmpl", ".json_tmpl"]),
"_expand_template": attr.label(default=Label("//:expand_template"),
executable=True,
cfg="host"),
},
outputs = _get_mft_outputs,
)
//:expand_template is a label in my case pointing to a py_binary performing the transformation itself. I'd be happy to learn about a better (more native, fewer hops) way of doing this, but (for now) I went with: it works. Few comments on the approach and your concerns:
AFAIK you cannot read in (the file and perform operations in Skylark) itself...
...speaking of which, it's probably not a bad thing to keep the transformation (tool) and build description (bazel) separate anyways.
It could be debated what constitutes the official documentation, but ctx.info_file may not appear in the reference manual, it is documented in the source tree. :) Which is case for other areas as well (and I hope that is not because those interfaces are considered not committed too yet).
For sake of comleteness in src/main/java/com/google/devtools/build/lib/skylarkbuildapi/SkylarkRuleContextApi.java there is:
#SkylarkCallable(
name = "info_file",
structField = true,
documented = false,
doc =
"Returns the file that is used to hold the non-volatile workspace status for the "
+ "current build request."
)
public FileApi getStableWorkspaceStatus() throws InterruptedException, EvalException;
EDIT: few extra details as asked in the comment.
In my workspace_status.sh I would have for instance the following line:
echo STABLE_GIT_REF $(git log -1 --pretty=format:%H)
In my .json.tmpl file I would then have:
"ref": "${STABLE_GIT_REF}",
I've opted for shell like notation of text to be replaced, since it's intuitive for many users as well as easy to match.
As for the replacement, relevant (CLI kept out of this) portion of the actual code would be:
def get_map(val_file):
"""
Return dictionary of key/value pairs from ``val_file`.
"""
value_map = {}
for line in val_file:
(key, value) = line.split(' ', 1)
value_map.update(((key, value.rstrip('\n')),))
return value_map
def expand_template(val_file, in_file, out_file):
"""
Read each line from ``in_file`` and write it to ``out_file`` replacing all
${KEY} references with values from ``val_file``.
"""
def _substitue_variable(mobj):
return value_map[mobj.group('var')]
re_pat = re.compile(r'\${(?P<var>[^} ]+)}')
value_map = get_map(val_file)
for line in in_file:
out_file.write(re_pat.subn(_substitue_variable, line)[0])
EDIT2: This is how the Python script is how I expose the python script to rest of bazel.
py_binary(
name = "expand_template",
main = "expand_template.py",
srcs = ["expand_template.py"],
visibility = ["//visibility:public"],
)
Building on Ondrej's answer, I now use somthing like this (adapted in SO editor, might contain small errors):
tools/bazel.rc:
build --workspace_status_command=tools/workspace_status.sh
tools/workspace_status.sh:
echo STABLE_GIT_REV $(git rev-parse HEAD)
version.bzl:
_VERSION_TEMPLATE_SH = """
set -e -u -o pipefail
while read line; do
export "${line% *}"="${line#* }"
done <"$INFILE" \
&& cat <<EOF >"$OUTFILE"
{ "ref": "${STABLE_GIT_REF}"
, "service": "${SERVICE_NAME}"
}
EOF
"""
def _commit_info_impl(ctx):
ctx.actions.run_shell(
outputs = [ctx.outputs.outfile],
inputs = [ctx.info_file],
progress_message = "Generating version file: " + ctx.label.name,
command = _VERSION_TEMPLATE_SH,
env = {
'INFILE': ctx.info_file.path,
'OUTFILE': ctx.outputs.version_go.path,
'SERVICE_NAME': ctx.attr.service,
},
)
commit_info = rule(
implementation = _commit_info_impl,
attrs = {
'service': attr.string(
mandatory = True,
doc = 'name of versioned service',
),
},
outputs = {
'outfile': 'manifest.json',
},
)
Situation
I have two Skylark extension rules: blah_library and blah_binary. All of a blah_library's transitive dependencies are propagated by returning a provider(transitive_deps=...), and are handled appropriately by any ultimate dependent blah_binary target.
What I want to do
I want each blah_library to also create a filegroup with all the transitive dependencies mentioned above, so that I can access them separately. E.g., I'd like to be able to pass them in as data dependencies to a cc_binary. In other words:
# Somehow have this automatically create a target named `foo__trans_deps`?
blah_library(
name = "foo",
srcs = [...],
deps = [...],
)
cc_binary(
...,
data = [":foo__trans_deps"],
)
How should I do this? Any help would be appreciated!
What I've tried
Make a macro
I tried making a macro like so:
_real_blah_library = rule(...)
def blah_library(name, *args, **kwargs):
native.filegroup(
name = name + "__trans_deps",
srcs = ???,
)
_real_blah_library(name=name, *args, **kwargs)
But I'm not sure how to access the provider provided by _real_blah_library from within the macro, so I don't know how to populate the filegroup's srcs field...
Modify the blah_library rule's implementation
Right now I have something like:
_blah_provider = provider(fields=['transitive_deps'])
def _blah_library_impl(ctx):
...
trans_deps = []
for dep in ctx.attr.deps:
trans_deps += dep[_blah_provider].trans_deps
return _blah_provider(trans_deps=trans_deps)
blah_library = rule(impl=_blah_library_impl, ...)
I tried adding the following to _blah_library_impl, but it didn't work because apparently native.filegroup can't be called within a rule's implementation ("filegroup() cannot be called during the analysis phase"):
def _blah_library_impl(ctx):
...
trans_deps = []
for dep in ctx.attr.deps:
trans_deps += dep[_blah_provider].trans_deps
native.filegroup(
name = ctx.attr.name + "__trans_deps",
srcs = trans_deps,
)
return _blah_provider(trans_deps=trans_deps)
You can't easily create a filegroup like that, but you can still achieve what you want.
If you want to use the rule in genrule.srcs, filegroup.srcs, cc_binary.data, etc., then return a DefaultInfo provider (along with _blah_provider) and set the files field to the transitive closure of files.
You can refine the solution if you want a different set of files when the rule is in a data attribute vs. when in any other (e.g. srcs): just also set the runfiles-related members in DefaultInfo. (Frankly I don't know the difference between them, I'd just set all runfiles-fields to the same value.)
I ended up making my own special filegroup-like rule, as discussed in the comments under #Laszlo's answer. Here's the raw code in case it's a useful starting point for anyone:
def _whl_deps_filegroup_impl(ctx):
input_wheels = ctx.attr.src[_PyZProvider].transitive_wheels
output_wheels = []
for wheel in input_wheels:
file_name = wheel.basename
output_wheel = ctx.actions.declare_file(file_name)
# TODO(josh): Use symlinks instead of copying. Couldn't figure out how
# to do this due to issues with constructing absolute paths...
ctx.actions.run(
outputs=[output_wheel],
inputs=[wheel],
arguments=[wheel.path, output_wheel.path],
executable="cp",
mnemonic="CopyWheel")
output_wheels.append(output_wheel)
return [DefaultInfo(files=depset(output_wheels))]
whl_deps_filegroup = rule(
_whl_deps_filegroup_impl,
attrs = {
"src": attr.label(),
},
)