Skylark - How to execute a jar from a repository rule - bazel

Context
I am writing a repository rule that invokes another Bazel project. My current approach is to build the additional project as a deploy jar. I would like a user to be able to instantiate the rule like:
jar_path = some/relative/path
my_rule(name = "something", p_arg="m_arg", binary=jar_path)
and then given the jar_path and the arguments, I would like the repository rule to execute the following command in the shell:
java -jar $(SOME_JAR) $(ARGUMENTS_PROVIDED_BY_RULE)
Problem
First, it's unclear how best to accomplish the deploy jar approach. So far, I have attempt two different approaches, with varying levels of success. For examples, I have skimmed through the scala_rules, the maven_rules, and the skylark cookbook.
Second, and more importantly, I am not sure whether the deploy jar is the best route to accomplishing my goals. Again, my interest is to invoke a target from an external Bazel project, that is currently hosted on github. (So feasibly, I could try to fetch the project using the http_archive rule).
Below, I describe the attempts I have made.
Approach 1
My first approach involved trying to execute the command using the command field in ctx.action. I tried various enumerations of
java -jar {computed_absolute_path_of_deploy_jar} {args_passed_from_instantiation}.
My biggest issue here was with determining the absolute path of the deploy jar. The file's root path, would contain some additional information. For example, it would like something like this.
/abs/olute/path[ something ]/rela/tive/path
As a side note, I'm not sure if this is a bug/nit, but the File.root.path, evaluated to None, despite File.none not being None.
My first approach involved was to was to try to use skylark [ctx.binary]
Approach 2
Next thing I tried was to mimic the input binary example from the docs. This was also unsuccessful. The issue was that the actual binary could not be found. Here is how I configured it.
First, I relaxed the repository rule into a regular skylark rule.
def _test_binary(ctx):
ctx.action(
....
arguments = [ctx.attr.p_arg],
executable = ctx.executable.binary)
test_binary = rule(
...
attrs = {
"binary":attr.label(mandatory=True, cfg="host", allow_files=True, executable=True),
...
}
Then, in my external project, I loaded the skylark rule into the WORKSPACE file. Finally, I called the macro from one of my BUILD files as follows:
load("#something_rule//:something_rule.bzl", "test_binary")
test_binary(name = "hello", p_arg = "hello", binary = "script.sh")
The script is a one line java -jar something_deploy.jar -- -arg:$1, and is in the same directory as the BUILD file.
Bazel complains that src/script.sh does not exist. I presume because it is looking for the file in /private/var/tmp/-bazel_username/somehash/relative_path. In response, I tried to pass the absolute path, which is not allowed.
Cheers.

It looks like you're mixing up repository rules with build extensions ("normal" rules). A good rule of thumb is:
Repository rules are for getting sources onto your system or symlinking them to a place Bazel can see them.
Build extension are for everything else: compiling, copying files, running binaries, etc.
I don't actually think you need to use either, for this. You say that the other project is on GitHub, so you can add the following to your WORKSPACE file:
http_archive(
name = "other_project",
...
)
Then, in your BUILD file:
genrule(
name = "run-a-jar",
srcs = ["#other_project//some/relative:path"],
cmd = "java -jar $(location #other_project//some/relative:path) -- arg1 arg2 > $#",
outs = ["jar-output"],
)
You shouldn't need to use the _deploy.jar target, since you're not moving the jar out of its project (_deploy.jar is useful when you need to relocate it).
Other things from your question:
I'm not sure if this is a bug/nit, but the File.root.path, evaluated to None,
Are you sure it didn't evaluate to ""? The path is relative to the execution root, so for sources, it will always be "" (for outputs, it'll be bazel-out/local-fastbuild/bin or similar).
Bazel complains that src/script.sh does not exist.
Passing -s to Bazel can really help debugging Skylark rules. You can see exactly where it is looking.

Related

How to use --save_temps in Bazel rule instead of command line?

Is there a way to control the Bazel build to generate wanted temp files for a list of source files instead of just using the command line option "--save_temps"?
One way is using a cc_binary, and add "-E" option in the "copts", but the obj file name will always have a ".o". This kind of ".o" files will be overwriten by the other build targets. I don't know how to control the compiler output file name in Bazel.
Any better ideas?
cc_library has an output group with the static library, which you can then extract. Something like this:
filegroup(
name = "extract_archive",
srcs = [":some_cc_library"],
output_group = "archive",
)
Many tools will accept the static archive instead of an object file. If the tool you're using does, then that's easy. If not, things get a bit more complicated.
Extracting the object file from the static archive is a bit trickier. You could use a genrule with the $(AR) Make variable, but that won't work with some C++ toolchains that require additional flags to configure architectures etc.
The better (but more complicated) answer is to follow the guidance in integrating with C++ rules. You can get the ar from the toolchain and the flags to use it in a custom rule, and then create an action to extract it. You could also access the OutputGroupInfo from the cc_library in the rule directly instead of using filegroup if you've already got a custom rule.
Thanks all for your suggestions.
Now I think I can solve this problem in two steps(Seems Bazel does not allow to combine two rules into one):
Step1, add a -E option like a normal cc_libary, we can call it a pp_library. It is easy.
Step2, in a new rules, its input is the target of pp_library, then in this rule find out the obj files(can be found via : action.outputs.to_list()) and copy them to the a new place via ctx.actions.run_shell() run_shell.
I take Bazel: copy multiple files to binary directory as a reference.

Are absolute paths safe to use in Bazel?

I am experimenting with Bazel to be added along with an old, make/shell based build system. I can easily make shell commands which returns an absolute path to some tool or library build by the old build system as early prerequisites. These commands I can use in a genrule(), which copies the needed files (like headers and libs) into Bazel proper to be exposed in form of a cc_library().
I found out that genrule() does not detect a dependency if the command uses a file with absolute path - it is not caught by the sandbox. In a way I am (ab)using that behavior.
It is it safe? Will some future update of Bazel refuse access to files based on absolute path in that way in a command in genrule?
Most of Bazel's sandboxes allow access to most paths outside of the source tree by default. Details depend on which sandbox implementation you're using. The docker sandbox, for example, allows access to all those paths inside of a docker image. It's kind of hard to make promises about future Bazel versions, but I think it's unlikely that a sandbox will prevent accessing /bin/bash (for example), which means other absolute paths will probably continue to work too.
--sandbox_block_path can be used to explicitly block a path if you want.
If you always have the files available on every machine you build on, your setup should work. Keep in mind that Bazel will not recognize when the contents of those files change, so you can easily get stale results in various caches. You can avoid that by ensuring the external paths change whenever their contents do.
new_local_repository might be a better fit to avoid those problems, if you know the paths ahead of time.
If you don't know the paths ahead of time, you can write a custom repository rule which runs arbitrary commands via repository_ctx.execute to retrieve the paths and them symlinks them in with repository_ctx.symlink.
Tensorflow's third_party/sycl/sycl_configure.bzl has an example of doing something similar (you would do something other than looking at environment variables like find_computecpp_root does, and you might symlink entire directories instead of all the files in them):
def _symlink_dir(repository_ctx, src_dir, dest_dir):
"""Symlinks all the files in a directory.
Args:
repository_ctx: The repository context.
src_dir: The source directory.
dest_dir: The destination directory to create the symlinks in.
"""
files = repository_ctx.path(src_dir).readdir()
for src_file in files:
repository_ctx.symlink(src_file, dest_dir + "/" + src_file.basename)
def find_computecpp_root(repository_ctx):
"""Find ComputeCpp compiler."""
sycl_name = ""
if _COMPUTECPP_TOOLKIT_PATH in repository_ctx.os.environ:
sycl_name = repository_ctx.os.environ[_COMPUTECPP_TOOLKIT_PATH].strip()
if sycl_name.startswith("/"):
return sycl_name
fail("Cannot find SYCL compiler, please correct your path")
def _sycl_autoconf_imp(repository_ctx):
<snip>
computecpp_root = find_computecpp_root(repository_ctx)
<snip>
_symlink_dir(repository_ctx, computecpp_root + "/lib", "sycl/lib")
_symlink_dir(repository_ctx, computecpp_root + "/include", "sycl/include")
_symlink_dir(repository_ctx, computecpp_root + "/bin", "sycl/bin")

Access runfiles path in bazel BUILD file

I try to write a very simple cc_test in Bazel that builds a test runner and hands it the path to a test file as command line argument.
I tried to use the following snippet which seemed to do the trick according to 1 and 2.
cc_test(
name = "my_test",
srcs = [...],
deps = [...],
data = [":test_file"],
args = ["$(location :test_file)"]
)
My runner gets a relative path to the testfile, which, however, is not the proper path and the test fails.
This seems to be related to symlink issues with bazel under Windows (http://jayconrod.com/posts/108/writing-bazel-rules--data-and-runfiles), but I cannot believe there is no way to easily achieve what I am trying to.
I am aware of this answer, but I am searching for a purely BUILD-file based solution which doesn't use custom rules (whether this is a proper choice is different question, but I have the feeling that I am just missing something very fundamental here).

Default, platform specific, Bazel flags in bazel.rc

I was wondering if its possible for platform-specific default Bazel build flags.
For example, we want to use --workspace_status_command but this must be a shell script on Linux and must point towards a batch script for Windows.
Is there a way we can write in the tools/bazel.rc file something like...
if platform=WINDOWS build: --workspace_status_command=status_command.bat
if platform=LINUX build: --workspace_status_command=status_command.sh
We could generate a .bazelrc file by having the users run a script before building, but it would be cleaner/nicer if this was not neccessary.
Yes, kind of. You can specify config-specific bazelrc entries, which you can select by passing --config=<configname>.
For example your bazelrc could look like:
build:linux --cpu=k8
build:linux --workspace_status_command=/path/to/command.sh
build:windows --cpu=x64_windows
build:windows --workspace_status_command=c:/path/to/command.bat
And you'd build like so:
bazel build --config=linux //path/to:target
or:
bazel build --config=windows //path/to:target
You have to be careful not to mix semantically conflicting --config flags (Bazel doesn't prevent you from that). Though it will work, the results may be unpredictable when the configs tinker with the same flags.
Passing --config to all commands is tricky, it depends on developers remembering to do this, or controlling the places where Bazel is called.
I think a better answer would be to teach the version control system how to produce the values, like by putting a git-bazel-stamp script on the $PATH/%PATH% so that git bazel-stamp works.
Then we need workspace_status_command to allow commands from the PATH rather than a path on disk.
Proper way to do this is to wrap your cc_library with a custom macro, and pass hardcoded flags to copts. For full reference, look at envoy_library.bzl.
In short, your steps:
Define a macro to wrap cc_library:
def my_cc_library(
name,
copts=[],
**kwargs):
cc_library(name, copts=copts + my_flags(), **kwargs)
Define my_flags() macro as following:
config_setting(
name = "windows_x86_64",
values = {"cpu": "x64_windows"},
)
config_setting(
name = "linux_k8",
values = {"cpu": "k8"},
)
def my_flags():
x64_windows_options = ["/W4"]
k8_options = ["-Wall"]
return select({
":windows_x86_64": x64_windows_options,
":linux_k8": k8_options,
"//conditions:default": [],
})
How it works:
Depending on --cpu flag value my_flags() will return different flags.
This value is resolved automatically based on a platform. On Windows, it's x64_windows, and on Linux it's k8.
Then, your macro my_cc_library will supply this flags to every target in a project.
A better way of doing this has been added since you asked--sometime in 2019.
If you add
common --enable_platform_specific_config to your .bazelrc, then --config=windows will automatically apply on windows hosts, --config=macos on mac, --config=linux on linux, etc.
You can then add lines to your .bazelrc like:
build:windows --windows-flags
build:linux --linux-flags
There is one downside, though. This works based on the host rather than the target. So if you're cross-compiling, e.g. to mobile, and want different flags there, you'll have to go with a solution like envoy's (see other answer), or (probably better) add transitions into your graph targets. (See discussion here and here. "Flagless builds" are still under development, but there are usable hacks in the meantime.) You could also use the temporary platform_mappings API.
References:
Commit that added this functionality.
Where it appears in the Bazel docs.

How to deal with implicit dependency (e.g C++ include) for incremental build in custom Skylark rule

Problem
I wonder how to inform bazel about dependencies unknown at declaration time, but known at build time (a.k.a implicit dependencies, dynamic dependencies, ...). For instance when compiling C++ sources, the .cpp source file will depends on some header files and this information is not available when writing the BUILD file. It needs to be retrieve at build time. Whatever is the solution to get the information (dry-run, generating depfile, parsing stdout), it needs to be done at build time and the information need to be retrieved to bazel build graph.
Since skylark does not allow to do I/O, for instance to read a generated depfile or to parse stdout result containing a dependency list, I have no clue on how to deal with it.
Behind implicit dependencies, I am looking for correct incremental build.
Example
To experiment this problem I have created a simple tool, just_a_tool.exe which takes an input file, read a list of file from it, and concatenate the content of all these file to an output file.
command line example:
just_a_tool.exe --input input.txt --depfile dep.d output.txt
dep.d contains the list of all the read files.
Issue
If I change the content of test1.txt, test2.txt, or test3.txt, bazel does not rebuild output.txt file. Of course, because it does not know there were dependencies.
Example files
just_a_tool.bzl
def _impl(ctx):
exec_path = "C:/Code/JustATool/just_a_tool.exe"
for f in ctx.attr.source.files:
source_path = f.path
output_path = ctx.outputs.out.path
dep_file = ctx.actions.declare_file("dep.d")
args = ["--input", source_path, "--dep_file", dep_file.path, output_path]
ctx.actions.run(
outputs=[ctx.outputs.out, dep_file],
executable=exec_path,
inputs=ctx.attr.source.files,
arguments=args
)
jat_convert = rule(
implementation = _impl,
attrs = {
"source" : attr.label(mandatory=True, allow_files=True, single_file=True)
},
outputs = {"out": "%{name}.txt"}
)
BUILD
load("//tool:just_a_tool.bzl", "jat_convert")
jat_convert(
name="my_output",
source=":input.txt"
)
input.txt
test1.txt
test2.txt
test3.txt
Goal
I want to do correct and fast incremental build for the following situation:
Generate reflection data from C++ sources, this custom tool execution depends on header file included in my source files.
Use a internal tool to build asset file which can include other files
Run a custom preprocessor on my shaders allowing a #include feature
Thanks!
Bazel's extension language doesn't support creating actions with a dynamic set of inputs, where this set depends on the output of a previous action. In other words, custom rules cannot run an action, read the action's output, then create actions with those inputs or update (or prune the set of) inputs of already created actions.
Instead I suggest adding attribute(s) to your rule where the user can declare the set of files that the sources may include. I call this "the universe of headers". The actions you create depend on this user-defined universe, so the set of action inputs is completely defined. Of course this means these actions potentially depend on more files than the cpp files, which they process, include.
This approach is analogous to how the cc_* rules work: a file in cc_*.srcs can include other files in the srcs of the same rule and from hdrs of dependencies, but nothing else. Thus the union of srcs + hdrs of (direct & transitive) dependencies defines the universe of header files that a cpp file may include.

Resources