Can I load common rules from a .bzl file? - bazel

We frequently need common combinations of rules per tech stack.
That currently wastes a lot of space in WORKSPACE, and those combinations have to be kept in sync across multiple repos. The block is 50+ lines after buildifier and contains too many URLs, versions, and hashes.
Now say I have a "technology stack" repo and do something like
load("#techstack_repo//mylang.bzl", "load_rules")
load_rules()
where load_rules would pull in pinned versions of e.g. rules_go, bazel-gazelle, rules_docker, and rules_proto, and initialize all of them in the right order so they are visible in WORKSPACE?
I did not get this to work in my tests because load apparently cannot be called from inside a function in a .bzl file; it is not a function itself.
Is there a way to do this?
Here's an example of what I tested for Java:
load("#io_bazel_rules_docker//repositories:repositories.bzl", container_repositories = "repositories")
load("#io_bazel_rules_docker//repositories:deps.bzl", container_deps = "deps")
load("#io_bazel_rules_docker//container:container.bzl", "container_pull")
load("#rules_proto//proto:repositories.bzl", "rules_proto_dependencies", "rules_proto_toolchains")
load(
"#io_grpc_grpc_java//:repositories.bzl",
"IO_GRPC_GRPC_JAVA_ARTIFACTS",
"IO_GRPC_GRPC_JAVA_OVERRIDE_TARGETS",
"grpc_java_repositories",
)
load("#rules_jvm_external//:defs.bzl", "maven_install")
def prepare_stack(maven_deps = []):
container_repositories()
container_deps()
container_pull(
name = "java_base",
# https://console.cloud.google.com/gcr/images/distroless/GLOBAL/java-debian10
# tag = "11", # OpenJDK 11 as of 2020-03-04
digest = "sha256:eda9e5ae2facccc9c7016f0c2d718d2ee352743bda81234783b64aaa402679b6",
registry = "gcr.io",
repository = "distroless/java-debian10",
)
rules_proto_dependencies()
rules_proto_toolchains()
maven_install(
artifacts = maven_deps + IO_GRPC_GRPC_JAVA_ARTIFACTS,
# for improved debugging in IDE
fetch_sources = True,
generate_compat_repositories = True,
override_targets = IO_GRPC_GRPC_JAVA_OVERRIDE_TARGETS,
repositories = [
"https://repo.maven.apache.org/maven2/",
"https://repo1.maven.org/maven2",
],
strict_visibility = True,
)
grpc_java_repositories()
... all http_archive calls for the rule repos are in WORKSPACE and I want to move them in here, but that did not work at all.
As is, I get this error:
ERROR: Failed to load Starlark extension '@rules_python//python:pip.bzl'.
Cycle in the workspace file detected. This indicates that a repository is used prior to being defined.
The following chain of repository dependencies lead to the missing definition.
- @rules_python
This could either mean you have to add the '@rules_python' repository with a statement like `http_archive` in your WORKSPACE file (note that transitive dependencies are not added automatically), or move an existing definition earlier in your WORKSPACE file.
Adding rules_python does not help either.

I found a solution:
Split it into two files.
One with imports like this:
load("#bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")
load("#bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
load("#bazel_tools//tools/build_defs/repo:utils.bzl", "maybe")
def declare():
maybe(
git_repository,
name = "rules_cc",
commit = "34ca16f4aa4bf2a5d3e4747229202d6cb630ebab",
remote = "https://github.com/bazelbuild/rules_cc.git",
shallow_since = "1584036492 -0700",
)
# ... for me requires at least rules_cc, rules_python, bazel_skylib
# for later proto, docker, go, java support
and another using the declared external sources:
# go
load("@io_bazel_rules_go//go:deps.bzl", "go_register_toolchains", "go_rules_dependencies")
load("@bazel_gazelle//:deps.bzl", "gazelle_dependencies")

# protobuf
load("@rules_proto//proto:repositories.bzl", "rules_proto_dependencies", "rules_proto_toolchains")

# container
load("@io_bazel_rules_docker//container:container.bzl", "container_pull")
load("@io_bazel_rules_docker//repositories:repositories.bzl", container_repositories = "repositories")
load("@io_bazel_rules_docker//repositories:deps.bzl", container_deps = "deps")
load("@io_bazel_rules_docker//go:image.bzl", go_image_repositories = "repositories")

def init_rules():
    go_rules_dependencies()
    go_register_toolchains()
    gazelle_dependencies()
    rules_proto_dependencies()
    rules_proto_toolchains()
    container_repositories()
    container_deps()
    go_image_repositories()
    container_pull(
        name = "go_static",
        digest = "sha256:9b60270ec0991bc4f14bda475e8cae75594d8197d0ae58576ace84694aa75d7a",
        registry = "gcr.io",
        repository = "distroless/static",
    )
It's a bit of a hassle: fetch this repo with http_archive or git_repository, load the first file and call declare(), then load the second file and call init_rules().
It may be a little convoluted, but it still helps to unify the stack and simplify your WORKSPACE.
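For illustration, here is a minimal sketch of the consuming WORKSPACE (the repository name, URL, sha256, and the .bzl file names are placeholders, not taken from the answer above):

WORKSPACE:
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Hypothetical "technology stack" repo; name, URL and sha256 are placeholders.
http_archive(
    name = "techstack_repo",
    sha256 = "<sha256 of the release archive>",
    urls = ["https://example.org/techstack/archive/v1.0.tar.gz"],
)

# First file: declare the pinned rule repositories (rules_cc, rules_python, ...).
load("@techstack_repo//:declare.bzl", "declare")
declare()

# Second file: initialize the rules, which is only possible once the
# repositories declared above exist.
load("@techstack_repo//:init.bzl", "init_rules")
init_rules()

In a WORKSPACE file a load() may only refer to a repository that has already been defined above it, which is why the declare/init split has to stay in two files.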

Related

Extract a subset of files from TreeArtifact as a list of files in bazel?

I'm trying to consume the output of https://github.com/OpenAPITools/openapi-generator-bazel, which happily takes my openapi.json file and generates a tree of source code for python.
Unfortunately I can't use it directly, because I need only the python files in a subdirectory to be used as sources in a py_library rule.
...but it seems to produce a single generated file, which is a ctx.actions.declare_directory sort of artifact.
I'm flummoxed. I can write a rule that extracts a subdirectory from that tree using ctx.actions.run_shell, but since a rule has to declare every output file and the directory is opaque to Bazel, I can't declare the outputs up front, and I can't find any way to iterate over the input directory's contents.
Surely, surely, surely there is a way to filter a TreeArtifact by inspection. Any ideas?
It's not totally clear how the authors intended for openapi_generator to be used, because in general directory outputs are not well supported as outputs of targets themselves. E.g. py_library and java_library don't know to look inside the directory outputs of other targets. At least for now, typically directory outputs are more for passing things between actions within the implementations of rules.
Indeed there's an open issue for this on OpenAPI: https://github.com/OpenAPITools/openapi-generator-bazel/issues/22
And there's a related Bazel bug about taking directory outputs as srcs, at least for the Java rules: https://github.com/bazelbuild/bazel/issues/11996
Compare to protocol buffers, for example, where (usually) a .proto file corresponds to 1 output python file (foo.proto -> foo_pb2.py), so the outputs can be derived from the py_proto_library's srcs directly.
Anyway, one workaround is to explicitly list the expected output files:
defs.bzl:
def _get_openapi_files(ctx):
    for out in ctx.outputs.outs:
        ctx.actions.run_shell(
            inputs = ctx.files.src,
            outputs = [out],
            command = "cp {src} {dst}".format(
                src = ctx.files.src[0].path + "/" + out.short_path,
                dst = out.path,
            ),
        )
    return [DefaultInfo(files = depset(ctx.outputs.outs))]

get_openapi_files = rule(
    implementation = _get_openapi_files,
    attrs = {
        "src": attr.label(mandatory = True),
        "outs": attr.output_list(mandatory = True),
    },
)
BUILD:
load("#openapi_tools_generator_bazel//:defs.bzl", "openapi_generator")
load(":defs.bzl", "get_openapi_files")
openapi_generator(
name = "gen_petstore_python",
generator = "python",
spec = "petstore.yaml",
)
get_openapi_files(
name = "get_petstore_python_files",
src = ":gen_petstore_python",
outs = [
"openapi_client/models/__init__.py",
"openapi_client/apis/__init__.py",
"openapi_client/__init__.py",
"openapi_client/model_utils.py",
"openapi_client/api/__init__.py",
"openapi_client/api/pets_api.py",
"openapi_client/rest.py",
"openapi_client/configuration.py",
"openapi_client/exceptions.py",
"openapi_client/api_client.py",
"openapi_client/model/__init__.py",
"openapi_client/model/pets.py",
"openapi_client/model/pet.py",
"openapi_client/model/error.py",
],
)
py_library(
name = "petstore_python",
srcs = [":get_petstore_python_files"],
)
py_binary(
name = "petstore_main",
srcs = [":petstore_main.py"],
deps = [":petstore_python"],
)
petstore_main.py:
from openapi_client.model import pet
p = pet.Pet(123, "lassie")
print(p)
petstore.yaml is https://github.com/OpenAPITools/openapi-generator-bazel/blob/fb7e302de4597277bea12757836f2ce988c805ee/internal/test/petstore.yaml
$ bazel run petstore_main
INFO: Analyzed target //:petstore_main (45 packages loaded, 600 targets configured).
INFO: Found 1 target...
Target //:petstore_main up-to-date:
bazel-bin/petstore_main
INFO: Elapsed time: 2.150s, Critical Path: 1.42s
INFO: 20 processes: 5 internal, 15 linux-sandbox.
INFO: Build completed successfully, 20 total actions
INFO: Build completed successfully, 20 total actions
{'id': 123, 'name': 'lassie'}
The obvious downside is that any time you modify the API definition in a way that changes what files are created, you have to go and update the BUILD file. And creating the list of output files might be tedious if you have a lot of API definitions.
Another workaround is to take advantage of the fact that Python doesn't really get compiled in the build system, and play some symlink tricks. However, this requires setting --experimental_allow_unresolved_symlinks (which can be added to the .bazelrc file):
defs.bzl:
def _symlink_openapi_files_impl(ctx):
    symlink = ctx.actions.declare_symlink("openapi_client")
    ctx.actions.symlink(
        output = symlink,
        target_path = ctx.files.src[0].path + "/openapi_client",
    )
    return [
        DefaultInfo(
            default_runfiles = ctx.runfiles(files = ctx.files.src + [symlink]),
        ),
        PyInfo(transitive_sources = depset(ctx.files.src)),
    ]

symlink_openapi_files = rule(
    implementation = _symlink_openapi_files_impl,
    attrs = {
        "src": attr.label(mandatory = True),
    },
)
BUILD:
load("#openapi_tools_generator_bazel//:defs.bzl", "openapi_generator")
load(":defs.bzl", "symlink_openapi_files")
openapi_generator(
name = "gen_petstore_python",
generator = "python",
spec = "petstore.yaml",
)
symlink_openapi_files(
name = "symlink_petstore_python_files",
src = ":gen_petstore_python",
)
py_binary(
name = "petstore_main",
srcs = [":petstore_main.py"],
deps = [":symlink_petstore_python_files"],
)
$ bazel run petstore_main --experimental_allow_unresolved_symlinks
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
INFO: Analyzed target //:petstore_main (48 packages loaded, 634 targets configured).
INFO: Found 1 target...
Target //:petstore_main up-to-date:
bazel-bin/petstore_main
INFO: Elapsed time: 1.797s, Critical Path: 1.38s
INFO: 7 processes: 6 internal, 1 linux-sandbox.
INFO: Build completed successfully, 7 total actions
INFO: Build completed successfully, 7 total actions
{'id': 123, 'name': 'lassie'}
Another alternative is to use a repository rule to generate the files. A repository rule can do things outside the regular rule/target execution model, like generating BUILD files; however, this amounts to basically reimplementing OpenAPI's Bazel integration.
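A minimal sketch of that alternative, assuming an openapi-generator-cli binary is available on the PATH of the machine running Bazel (the rule name, attributes, and generated BUILD content here are illustrative only, not the OpenAPI project's integration):

openapi_repo.bzl:
def _openapi_py_repo_impl(repository_ctx):
    # Run the generator (assumed to be on PATH) inside the external repository
    # directory during the loading phase.
    result = repository_ctx.execute([
        "openapi-generator-cli",
        "generate",
        "-i", str(repository_ctx.path(repository_ctx.attr.spec)),
        "-g", "python",
        "-o", ".",
    ])
    if result.return_code != 0:
        fail("openapi-generator-cli failed: " + result.stderr)

    # Because the files now exist on disk, the generated BUILD file can simply
    # glob them -- no explicit output list is needed.
    repository_ctx.file("BUILD.bazel", """
py_library(
    name = "client",
    srcs = glob(["openapi_client/**/*.py"]),
    imports = ["."],
    visibility = ["//visibility:public"],
)
""")

openapi_py_repo = repository_rule(
    implementation = _openapi_py_repo_impl,
    attrs = {"spec": attr.label(allow_single_file = True)},
)

It would then be instantiated from WORKSPACE, e.g. openapi_py_repo(name = "petstore_python", spec = "//:petstore.yaml"), and consumed as "@petstore_python//:client".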
Good response from ahumesky! It turns out, at least in the Python case, that there's another way. One option is to do as ahumesky says and explicitly declare all the output files in a separate rule, and that works well. Another is to declare your own rule which can accept a TreeArtifact; the sneaky way to do it is to wrap the output in a PyInfo provider, as is done at https://github.com/jvolkman/rules_pycross/blob/main/pycross/private/wheel_library.bzl, which I cannibalized for this (although I will probably change to using the symlink approach above):
load("@bazel_skylib//lib:paths.bzl", "paths")  # needed for paths.join below

def _python_client_from_openapi_impl(ctx):
    """Rule that generates the python library from the openapi-generator source
    directory (a tree)"""
    output = ctx.actions.declare_directory(ctx.attr.package_name)  # label.name)

    # because the openapi generator imports the package directly ("import [packagename]")
    # and names things appropriately, we can't so easily just use it fully-qualified;
    # we'll have to add it to the import path.
    # If the package is 'jwt_generator/service', this adds '__main__/jwt_generator/service'
    # to the import path, so if the client packagename was "jwt_client", you can just
    # import jwt_client
    imp = paths.join(
        ctx.label.workspace_name or ctx.workspace_name,
        ctx.label.package,
    )
    print(imp)
    imports = depset(
        direct = [imp],
        transitive = [d[PyInfo].imports for d in ctx.attr.deps],
    )
    ctx.actions.run_shell(
        outputs = [output],
        inputs = ctx.files.srcs,
        command = "cp -r {}/{}/* {}".format(
            ctx.files.srcs[0].path, ctx.attr.package_name, output.path,
        ),
    )

    # now all the relative python sources are in our output directory
    transitive_sources = depset(direct = [output], transitive = [])

    # runfiles (https://bazel.build/rules/lib/runfiles)
    # "a set of files required at runtime execution" -- which in the case of
    # interpreted languages generally means much of the source
    runfiles = ctx.runfiles(files = [output])
    return [
        DefaultInfo(files = depset([output]), runfiles = runfiles),
        PyInfo(
            has_py2_only_sources = False,
            has_py3_only_sources = True,
            imports = imports,
            transitive_sources = transitive_sources,
        ),
    ]

python_client_from_openapi = rule(
    implementation = _python_client_from_openapi_impl,
    attrs = {
        "srcs": attr.label(allow_files = True, doc = "Output of openapi_client rule"),
        "package_name": attr.string(),
        "deps": attr.label_list(
            doc = "A list of the client's dependencies; typically just urllib3 and python-dateutil",
            providers = [DefaultInfo, PyInfo],
        ),
    },
)
This actually works pretty well, although it was a sort of first-try hack and not robust or efficient, so I'm probably going to use a combination of the two.

Bazel: share macro between multiple http_archive BUILD files

My project depends on some external libraries which I have to bazelfy myself. Thus, my WORKSPACE:
http_archive(
    name = "external_lib_component1",
    build_file = "//third_party:external_lib_component1.BUILD",
    sha256 = "xxx",
    urls = ["https://example.org/external_lib_component1.tar.gz"],
)

http_archive(
    name = "external_lib_component2",
    build_file = "//third_party:external_lib_component2.BUILD",
    sha256 = "yyy",
    urls = ["https://example.org/external_lib_component2.tar.gz"],
)
...
The two entries above are similar, and external_lib_component{1, 2}.BUILD share a lot of code.
What is the best way to share code (macros) between them?
Just putting a shared_macros.bzl file into third_party/ won't work, because it will not be copied into
the archive location on build (only the build_file is copied).
If you place a .bzl file such as ./third_party/shared_macros.bzl into your tree, as you've mentioned, then in the //third_party:external_lib_component1.BUILD and //third_party:external_lib_component2.BUILD files you provide for your external dependencies, you can load symbols from that shared file using:
load("#//third_party:shared_macros.bzl", ...)
Labels starting with @// refer to packages from the main repository, even when used in an external dependency (whereas labels starting with just // would be rooted in the external repository). You can check the docs on labels, in particular the last paragraph.
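For example, a minimal sketch (the macro name and the library layout are made up for illustration):

third_party/shared_macros.bzl:
def external_cc_lib(name):
    """Hypothetical shared macro: declares a cc_library with the layout common to both archives."""
    native.cc_library(
        name = name,
        srcs = native.glob(["src/**/*.cc"]),
        hdrs = native.glob(["include/**/*.h"]),
        includes = ["include"],
        visibility = ["//visibility:public"],
    )

third_party/external_lib_component1.BUILD:
load("@//third_party:shared_macros.bzl", "external_cc_lib")

external_cc_lib(name = "external_lib_component1")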
Alternatively you can also refer to the "parent" project by its name. If in your WORKSPACE file you've had:
workspace(name = "parent")
You could say:
load("#parent//third_party:shared_macros.bzl", ...)
Note: in versions prior to 2.0.0 you might want to add --incompatible_remap_main_repo if you mixed both of the above approaches in your project.

How can I use the JAR tool with Bazel v0.19+?

Starting with Bazel v0.19, if you have Starlark (formerly known as "Skylark") code that references @bazel_tools//tools/jdk:jar, you see messages like this at build time:
WARNING: <trimmed-path>/external/bazel_tools/tools/jdk/BUILD:79:1: in alias rule @bazel_tools//tools/jdk:jar: target '@bazel_tools//tools/jdk:jar' depends on deprecated target '@local_jdk//:jar': Don't depend on targets in the JDK workspace; use @bazel_tools//tools/jdk:current_java_runtime instead (see https://github.com/bazelbuild/bazel/issues/5594)
I think I could make things work with @bazel_tools//tools/jdk:current_java_runtime if I wanted access to the java command, but I'm not sure what I'd need to do to get the jar tool to work. The contents of the linked GitHub issue didn't seem to address this particular problem.
I stumbled across a commit to Bazel that makes a similar adjustment to the Starlark java rules. It uses the following pattern: (I've edited the code somewhat)
# in the rule attrs:
"_jdk": attr.label(
    default = Label("//tools/jdk:current_java_runtime"),
    providers = [java_common.JavaRuntimeInfo],
),

# then in the rule implementation:
java_runtime = ctx.attr._jdk[java_common.JavaRuntimeInfo]
jar_path = "%s/bin/jar" % java_runtime.java_home
ctx.action(
    inputs = ctx.files._jdk + other_inputs,  # other_inputs: placeholder for the files being jar'd
    outputs = [deploy_jar],
    command = "%s cmf %s" % (jar_path, input_files),
)
Additionally, java is available at str(java_runtime.java_executable_exec_path) and javac at "%s/bin/javac" % java_runtime.java_home.
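For example, continuing the sketch above (where java_runtime is the JavaRuntimeInfo provider retrieved from the _jdk attribute):

java_path = str(java_runtime.java_executable_exec_path)
javac_path = "%s/bin/javac" % java_runtime.java_home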
See also, a pull request with a simpler example.
Because my reference to the jar tool is inside a genrule within a top-level macro, rather than in a rule, I was unable to use the approach from Rodrigo's answer. I instead explicitly referenced the current_java_runtime toolchain and was then able to use the JAVABASE make variable as the base path for the jar tool.
native.genrule(
    name = genjar_rule,
    srcs = [<rules that create files being jar'd>],
    cmd = "some_script.sh $(JAVABASE)/bin/jar $@ $(SRCS)",
    tools = ["some_script.sh", "@bazel_tools//tools/jdk:current_java_runtime"],
    toolchains = ["@bazel_tools//tools/jdk:current_java_runtime"],
    outs = [<some outputs>],
)

Does Bazel need external-repo BUILD files to be in $WORKSPACE_ROOT/external?

I made a repository for glfw with this:
load("#bazel_tools//tools/build_defs/repo:git.bzl", "new_git_repository")
new_git_repository(
name = "glfw",
build_file = "BUILD.glfw",
remote = "https://github.com/glfw/glfw.git",
tag = "3.2.1",
)
I put BUILD.glfw in the WORKSPACE root. When I built, I saw:
no such package '@glfw//': Not a regular file: [snipped]/external/BUILD.glfw
I moved BUILD.glfw to external/BUILD.glfw and it seems to work, but I couldn't find documentation about this. The docs about new_git_repository say that build_file "...is a label relative to the main workspace."; I don't see anything about 'external' there.
This is due to an inconsistency in semantics between the native and the (newer) Skylark versions of new_git_repository. To use the native new_git_repository, comment out or remove the load statement:
# load("#bazel_tools//tools/build_defs/repo:git.bzl", "new_git_repository")
Assuming that new_git_repository has the same problem that http_archive has, per Bazel issue 6225 you need to refer to the BUILD file for glfw as @//:BUILD.glfw
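A minimal sketch of what that would look like, following the workaround from that issue (untested for new_git_repository):

new_git_repository(
    name = "glfw",
    build_file = "@//:BUILD.glfw",  # label explicitly rooted in the main workspace
    remote = "https://github.com/glfw/glfw.git",
    tag = "3.2.1",
)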

Using bazel macros across repositories with labels

I've got two repositories, Client and Library.
Inside of Client's WORKSPACE file Client imports Library as a http_archive with the name "foo".
Inside of Client, I want to use Library macros that reference targets inside Library. My problem is that the Library macros don't know that they were imported as "foo", so when the macro is expanded the targets are not found.
library/WORKSPACE:
workspace(name = "library")
library/some.bzl:
def my_macro():
    native.java_library(
        name = "my_macro_lib",
        deps = ["@library//:my_macro_lib_dependnecy"],
    )
library/BUILD.bazel:
java_library(
    name = "my_macro_lib_dependnecy",
    ...
)
client/WORKSPACE:
workspace(name = "client")

http_archive(
    name = "library",
    urls = [...],
    strip_prefix = ...,
    sha256 = ...,
)
Because both workspaces use the same name for the library workspace (name = "library"), and because the macro refers to that workspace name in its dependencies (@library//:my_macro_lib_dependnecy), this works.
Note: this works, but has some quirks which will be resolved in 0.17.0.
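For completeness, a minimal sketch of the client side consuming the macro (assuming some.bzl sits at the root of the published archive):

client/BUILD.bazel:
load("@library//:some.bzl", "my_macro")

my_macro()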
