Substitutions and Java library lifecycle - bazel

I've got a Java project I'm converting to Bazel.
As is typical with Java projects, there are property files with placeholders that need to be resolved/substituted at build time.
Some of the values can be hardcoded in a BUILD or BZL file:
BUILD_PROPERTIES = { "pom.version": "1.0.0", "pom.group.id": "com.mygroup"}
Some of the variables are "stamps" (e.g. BUILD_TIMESTAMP, GIT_REVISION, etc): The source for these variables are volatile-status.txt and stable-status.txt
I must generate a POM for publish, so I use #bazel_common//tools/maven:pom_file in BUILD
(assume that I need ALL the values described above for my pom template):
_local_build_properties = {}
_local_build_properties.update(BUILD_PROPERTIES)
# somehow add workspace status properties?
# add / override
_local_build_properties.update({
"pom.project.name": "my-submodule",
"pom.project.description": "My submodule description",
"pom.artifact.id": "my-submodule",
})
# Variable placeholders in the pom template are wrapped with {}
_pom_substitutions = { '{'+k+'}':v for (k,v) in _local_build_properties.items()}
pom_file(
name = "my_submodule_pom",
targets = [
"//my-submodule",
],
template_file = "//:pom_template.xml",
substitutions = _pom_substitutions,
)
So, my questions are:
How do I get key-value pairs from volatile/stable -status.txt into the
dictionary I need for pom_file.substitutions?
pom_file depends on java_library so that it can write its dependencies
into the POM. How do I update the jar generated by java_library with the
pom?
Once I have the pom and the updated jar containing the pom, how do I publish to a Maven repo?
When I look at existing code, for example rules_docker, it seems that the implementation always bails to a local executable (shell | python | go) to do the real work of substitution, jar manipulation and image publication. Am I trying to do too much in BUILD and BZL files? Should I be thinking, "Ultimately, what do I need to pass to local shell/python/go scripts to get real build work done?

(Answered on bazel-discuss group)
Hi,
You can't get these values from Starlark. You need a genrule to read the stable/volatile files and do the substitutions using an
external tool like 'sed'.
A file cannot be both an input and output of an action, i.e. you can't update the .jar from which you generate the pom. The action has
to produce a new .jar file.
I don't know -- how would you publish outside of Bazel, is there a tool to do so? Can you write a genrule / Starlark rule to wrap this
tool?
Cheers, László

Related

Have all Bazel packages expose their documentation files (or any file with a given extension)

Bazel has been working great for me recently, but I've stumbled upon a question for which I have yet to find a satisfactory answer:
How can one collect all files bearing a certain extension from the workspace?
Another way of phrasing the question: how could one obtain the functional equivalent of doing a glob() across a complete Bazel workspace?
Background
The goal in this particular case is to collect all markdown files to run some checks and generate a static site from them.
At first glance, glob() sounds like a good idea, but will stop as soon as it runs into a BUILD file.
Current Approaches
The current approach is to run the collection/generation logic outside of the sandbox, but this is a bit dirty, and I'm wondering if there is a way that is both "proper" and easy (ie, not requiring that each BUILD file explicitly exposes its markdown files.
Is there any way to specify, in the workspace, some default rules that will be added to all BUILD files?
You could write an aspect for this to aggregate markdown files in a bottom-up manner and create actions on those files. There is an example of a file_collector aspect here. I modified the aspect's extensions for your use case. This aspect aggregates all .md and .markdown files across targets on the deps attribute edges.
FileCollector = provider(
fields = {"files": "collected files"},
)
def _file_collector_aspect_impl(target, ctx):
# This function is executed for each dependency the aspect visits.
# Collect files from the srcs
direct = [
f
for f in ctx.rule.files.srcs
if ctx.attr.extension == f.extension
]
# Combine direct files with the files from the dependencies.
files = depset(
direct = direct,
transitive = [dep[FileCollector].files for dep in ctx.rule.attr.deps],
)
return [FileCollector(files = files)]
markdown_file_collector_aspect = aspect(
implementation = _file_collector_aspect_impl,
attr_aspects = ["deps"],
attrs = {
"extension": attr.string(values = ["md", "markdown"]),
},
)
Another way is to do a query on file targets (input and output files known to the Bazel action graph), and process these files separately. Here's an example querying for .bzl files in the rules_jvm_external repo:
$ bazel query //...:* | grep -e ".bzl$"
//migration:maven_jar_migrator_deps.bzl
//third_party/bazel_json/lib:json_parser.bzl
//settings:stamp_manifest.bzl
//private/rules:jvm_import.bzl
//private/rules:jetifier_maven_map.bzl
//private/rules:jetifier.bzl
//:specs.bzl
//:private/versions.bzl
//:private/proxy.bzl
//:private/dependency_tree_parser.bzl
//:private/coursier_utilities.bzl
//:coursier.bzl
//:defs.bzl

why I use "local_repository" in a .bzl file, then it tell me name 'local_repository' is not defined?

I wanna build envoy via bazel,i mannual download some package in my pc, then I change http_archive to local_repository, but it tell me name 'local_repository' is not defined. Did local_repository need any load action?
local_repository can be used in WORKSPACE,but can not in my .bzl file
WORKSPACE:
workspace(name = "envoy")
load("//bazel:api_repositories.bzl", "envoy_api_dependencies")
envoy_api_dependencies()
load("//bazel:repositories.bzl", "GO_VERSION", "envoy_dependencies")
load("//bazel:cc_configure.bzl", "cc_configure")
envoy_dependencies()
`repositories.bzl`:
local_repository(
name = "com_google_protobuf",
path = "/home/user/com_google_protobuf",
)
local_repository is a workspace rule so I think it's not available outside of the WORKSPACE file.
If you want to call local_repository from a .bzl file you can define a function in there, using native, and call it from WORKSPACE, e.g.:
# repositories.bzl
def deps():
native.local_repository(
name = "com_google_protobuf",
path = "/home/user/com_google_protobuf",
)
# WORKSPACE
load("//:repositories.bzl", "deps")
deps()
I've seen this pattern, for example, in the grpc project.
In a .bzl file, you have to use native.local_repository instead of just local_repository.
All symbols in .bzl files are expected to be defined in Starlark, but local_repository is a special rule that is defined natively within Bazel.

How to write Bazel rules that work with external repositories?

The Bazel Starlark API does strange things with files in external repositories. I have the following Starlark snippet:
print(ctx.genfiles_dir)
print(ctx.genfiles_dir.path)
print(output_filename)
ret = ctx.new_file(ctx.genfiles_dir, output_filename)
print(ret.path)
It is creating the following output:
DEBUG: build_defs.bzl:292:5: <derived root>
DEBUG: build_defs.bzl:293:5: bazel-out/k8-fastbuild/genfiles
DEBUG: build_defs.bzl:294:5: google/protobuf/descriptor.upb.c
DEBUG: build_defs.bzl:296:5: bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/descriptor.upb.c
That extra external/com_google_protobuf comes seemingly out of nowhere, and it makes my rule fail:
I tell protoc to generate into ctx.genfiles_dir.path (which is bazel-out/k8-fastbuild/genfiles).
So protoc generates bazel-out/k8-fastbuild/genfiles/google/protobuf/descriptor.upb.c
Bazel fails because I didn't generate bazel-out/k8-fastbuild/genfiles/external/com_google_protobuf/google/protobuf/descriptor.upb.c
Likewise, when I try to call file.short_path on a source file from an external repository, I get a result like ../com_google_protobuf/google/protobuf/descriptor.proto. This seems quite unhelpful, so I just wrote some manual code to strip off the leading ../com_google_protobuf/.
Am I missing something? How can I write this rule in a way that doesn't feel like I'm fighting Bazel the whole time?
Am I missing something?
The basic problem, as you already realized, is that you have two path "namespaces" the one that protoc sees (i.e. import paths) and the one that bazel sees (i.e. the path you pass to declare_file().
2 things to note:
1) All paths declared with declare_file() get the path <bin dir>/<package path incl. workspace>/<path you passed to declare_file()>
2) All actions are executed from <bin dir> (unless output_to_genfils=True in which case this switches to <gen dir> as in your example.
Trying to solve the exact same problem you encountered, I resorted to stripping the known path from the output_file's path to determine which directory to pass as p:
# This code is run from the context of the external protobuf dependency
proto_path = "google/a/b.proto"
output_file = ctx.actions.declare_file(proto_path)
# output_file.path would be `<gen_dir>/external/protobuf/google/a/b.proto`
# Strip the known proto_path from output_file.path
protoc_prefix = output_file.path[:-len(proto_path)]
print(protoc_prefix) # Prints: <gen_dir>/external/protobuf
command = "{protoc} {proto_paths} {cpp_out} {plugin} {plugin_options} {proto_file}".format(
...
cpp_out = "--cpp_out=" + protoc_prefix,
...
)
Alternatives
You may also be able to construct the same path with ctx.bin_dir, ctx.label.workspace_name, ctx.label.package, and ctx.label.name.
Misc.
proto_library recently gained an attribute strip_import_prefix. When used, the above is not correct, as all dependent files are symlinked into a new directory from which they have the relative paths declared with strip_import_prefix.
The path format is:
<bin dir>/<repo>/<package>/_virtual_base/<label name>/<path `import`ed in .proto files>
i.e.
<bin dir>/external/protobuf/_virtual_base/b_proto/google/a/b.proto
Assuming you are building an external repo called protobuf, which contains a BUILD file at its root with a target named b_proto, which in turn, relies on a proto_library wrapping google/a/b.proto AND uses the strip_import_prefix attribute.

How can I use the JAR tool with Bazel v0.19+?

Starting with Bazel v0.19, if you have Starlark (formerly known as "Skylark") code that references #bazel_tools//tools/jdk:jar, you see messages like this at build time:
WARNING: <trimmed-path>/external/bazel_tools/tools/jdk/BUILD:79:1: in alias rule #bazel_tools//tools/jdk:jar: target '#bazel_tools//tools/jdk:jar' depends on deprecated target '#local_jdk//:jar': Don't depend on targets in the JDK workspace; use #bazel_tools//tools/jdk:current_java_runtime instead (see https://github.com/bazelbuild/bazel/issues/5594)
I think I could make things work with #bazel_tools//tools/jdk:current_java_runtime if I wanted access to the java command, but I'm not sure what I'd need to do to get the jar tool to work. The contents of the linked GitHub issue didn't seem to address this particular problem.
I stumbled across a commit to Bazel that makes a similar adjustment to the Starlark java rules. It uses the following pattern: (I've edited the code somewhat)
# in the rule attrs:
"_jdk": attr.label(
default = Label("//tools/jdk:current_java_runtime"),
providers = [java_common.JavaRuntimeInfo],
),
# then in the rule implementation:
java_runtime = ctx.attr._jdk[java_common.JavaRuntimeInfo]
jar_path = "%s/bin/jar" % java_runtime.java_home
ctx.action(
inputs = ctx.files._jdk + other inputs,
outputs = [deploy_jar],
command = "%s cmf %s" % (jar_path, input_files),
)
Additionally, java is available at str(java_runtime.java_executable_exec_path) and javac at "%s/bin/javac" % java_runtime.java_home.
See also, a pull request with a simpler example.
Because my reference to the jar tool is inside a genrule within top-level macro, rather than a rule, I was unable to use the approach from Rodrigo's answer. I instead explicitly referenced the current_java_runtime toolchain and was then able to use the JAVABASE make variable as the base path for the jar tool.
native.genrule(
name = genjar_rule,
srcs = [<rules that create files being jar'd>],
cmd = "some_script.sh $(JAVABASE)/bin/jar $# $(SRCS)",
tools = ["some_script.sh", "#bazel_tools//tools/jdk:current_java_runtime"],
toolchains = ["#bazel_tools//tools/jdk:current_java_runtime"],
outs = [<some outputs>]
)

Does Bazel need external-repo BUILD files to be in $WORKSPACE_ROOT/external?

I made a repository for glfw with this:
load("#bazel_tools//tools/build_defs/repo:git.bzl", "new_git_repository")
new_git_repository(
name = "glfw",
build_file = "BUILD.glfw",
remote = "https://github.com/glfw/glfw.git",
tag = "3.2.1",
)
I put BUILD.glfw in the WORKSPACE root. When I built, I saw:
no such package '#glfw//': Not a regular file: [snipped/external/BUILD.glfw
I moved BUILD.glfw to external/BUILD.glfw and it seems to work, but I couldn't find documentation about this. The docs about new_git_repository say that build_file "...is a label relative to the main workspace."; I don't see anything about 'external' there.
This is due to an inconsistent semantical difference between the native and (newer) Skylark versions of new_git_repository. To use the native new_git_repository, comment/remove the load statement:
# load("#bazel_tools//tools/build_defs/repo:git.bzl", "new_git_repository")
Assuming that new_git_repository has the same problem that http_archive has, per Bazel issue 6225 you need to refer to the BUILD file for glfw as #//:BUILD.glfw

Resources