Are absolute paths safe to use in Bazel? - bazel

I am experimenting with Bazel to be added along with an old, make/shell based build system. I can easily make shell commands which returns an absolute path to some tool or library build by the old build system as early prerequisites. These commands I can use in a genrule(), which copies the needed files (like headers and libs) into Bazel proper to be exposed in form of a cc_library().
I found out that genrule() does not detect a dependency if the command uses a file with absolute path - it is not caught by the sandbox. In a way I am (ab)using that behavior.
It is it safe? Will some future update of Bazel refuse access to files based on absolute path in that way in a command in genrule?

Most of Bazel's sandboxes allow access to most paths outside of the source tree by default. Details depend on which sandbox implementation you're using. The docker sandbox, for example, allows access to all those paths inside of a docker image. It's kind of hard to make promises about future Bazel versions, but I think it's unlikely that a sandbox will prevent accessing /bin/bash (for example), which means other absolute paths will probably continue to work too.
--sandbox_block_path can be used to explicitly block a path if you want.
If you always have the files available on every machine you build on, your setup should work. Keep in mind that Bazel will not recognize when the contents of those files change, so you can easily get stale results in various caches. You can avoid that by ensuring the external paths change whenever their contents do.
new_local_repository might be a better fit to avoid those problems, if you know the paths ahead of time.
If you don't know the paths ahead of time, you can write a custom repository rule which runs arbitrary commands via repository_ctx.execute to retrieve the paths and them symlinks them in with repository_ctx.symlink.
Tensorflow's third_party/sycl/sycl_configure.bzl has an example of doing something similar (you would do something other than looking at environment variables like find_computecpp_root does, and you might symlink entire directories instead of all the files in them):
def _symlink_dir(repository_ctx, src_dir, dest_dir):
"""Symlinks all the files in a directory.
Args:
repository_ctx: The repository context.
src_dir: The source directory.
dest_dir: The destination directory to create the symlinks in.
"""
files = repository_ctx.path(src_dir).readdir()
for src_file in files:
repository_ctx.symlink(src_file, dest_dir + "/" + src_file.basename)
def find_computecpp_root(repository_ctx):
"""Find ComputeCpp compiler."""
sycl_name = ""
if _COMPUTECPP_TOOLKIT_PATH in repository_ctx.os.environ:
sycl_name = repository_ctx.os.environ[_COMPUTECPP_TOOLKIT_PATH].strip()
if sycl_name.startswith("/"):
return sycl_name
fail("Cannot find SYCL compiler, please correct your path")
def _sycl_autoconf_imp(repository_ctx):
<snip>
computecpp_root = find_computecpp_root(repository_ctx)
<snip>
_symlink_dir(repository_ctx, computecpp_root + "/lib", "sycl/lib")
_symlink_dir(repository_ctx, computecpp_root + "/include", "sycl/include")
_symlink_dir(repository_ctx, computecpp_root + "/bin", "sycl/bin")

Related

How to get Bazel output base from within rule Args map_each function?

I'm writing a Bazel rule which processes an input depsets with lots of jar files in it. My goal is to get the full (absolute) path of the jars for launching an application.
I use java_binary for the initial application. However, the application itself later then dynamically loads additional jars into the JVM. I need to give that application the absolute path to those jars.
As the depset is pretty expensive to process I want to move that into the execution phase. This requires the use of Args in combination with a map_each function like this:
args.add_joined("--jars", dependencies, join_with = ",", map_each = _expand_jars)
I'm now stuck with how to turn the jar path into an absolute path. In an older version (which was running in the analysis phase) I passed in the path using a rule attribute. This is required because (for some reason) the bootstrap application couldn't access the jars in the external/ directory. In the mapping function I have it hard coded to "bazel-myworkspace/../../", which I'd like to get rid of. It does follow the symlink into the exec root and from there I'm able to get to the output base.
def _expand_jars(file):
if file.path.startswith("external/"):
return "bazel-myworkspace/../../" + file.path
else:
return file.path
Any ideas?

Determine if files are part of any package

Given I have a list of files, e.g foo/src/main.cpp, foo/src/bar.cpp, foo/README.md is it possible to determine which of those files are part of a bazel package?
In my example, the output would e.g. be foo/src/main.cpp, foo/src/bar.cpp since the README.md would not be part of the build.
One way to do this would be to call bazel query on each file and see if it results in an output, but that is quite inefficient and so I was wondering if there is an easier way.
Background: I am trying to determine if a changes in a set of files have an impact on a target, and I want to use bazel query somepath(//some/target, set($FILES)) for that, but this will fail if any of the files in $FILES is not part of a BUILD file.
How about flipping it around and querying for all the source files of the target with:
bazel query 'kind("source file", deps(//some:target))'
and then checking if the result has any of the files in the set

how to find and deploy the correct files with Bazel's pkg_tar() in Windows?

please take a look at the bin-win target in my repository here:
https://github.com/thinlizzy/bazelexample/blob/master/demo/BUILD#L28
it seems to be properly packing the executable inside a file named bin-win.tar.gz, but I still have some questions:
1- in my machine, the file is being generated at this directory:
C:\Users\John\AppData\Local\Temp_bazel_John\aS4O8v3V\execroot__main__\bazel-out\x64_windows-fastbuild\bin\demo
which makes finding the tar.gz file a cumbersome task.
The question is how can I make my bin-win target to move the file from there to a "better location"? (perhaps defined by an environment variable or a cmd line parameter/flag)
2- how can I include more files with my executable? My actual use case is I want to supply data files and some DLLs together with the executable. Should I use a filegroup() rule and refer its name in the "srcs" attribute as well?
2a- for the DLLs, is there a way to make a filegroup() rule to interpret environment variables? (e.g: the directories of the DLLs)
Thanks!
Look for the bazel-bin and bazel-genfiles directories in your workspace. These are actually junctions (directory symlinks) that Bazel updates after every build. If you bazel build //:demo, you can access its output as bazel-bin\demo.
(a) You can also set TMP and TEMP in your environment to point to e.g. c:\tmp. Bazel will pick those up instead of C:\Users\John\AppData\Local\Temp, so the full path for the output directory (that bazel-bin points to) will be c:\tmp\aS4O8v3V\execroot\__main__\bazel-out\x64_windows-fastbuild\bin.
(b) Or you can pass the --output_user_root startup flag, e.g. bazel--output_user_root=c:\tmp build //:demo. That will have the same effect as (a).
There's currently no way to get rid of the _bazel_John\aS4O8v3V\execroot part of the path.
Yes, I think you need to put those files in pkg_tar.srcs. Whether you use a filegroup() rule is irrelevant; filegroup just lets you group files together, so you can refer to the group by name, which is useful when you need to refer to the same files in multiple rules.
2.a. I don't think so.

Is it possible to get $pwd value from Bazel .bzl rule?

I am trying to get a value of the full current directory path from within .bzl rule. I have tried following:
ctx.host_configuration.default_shell_env.PATH returns "/Users/[user_name]/.rbenv/shims:/usr/local/bin:/usr/bin:/bin:...
ctx.bin_dir.path returns bazel-out/local-fastbuild/bin
pwd = ctx.expand_make_variables("cmd", "$$PWD", {}) returns string $PWD - I don't think this rule is helpful for me, but may be just using it wrong.
What I need is the directory under which the cmd that runs Bazel .bzl rule is running. For example: /Users/[user_name]/git/workspace/path/to/bazel/rule.bzl or at least first part of the path prior to the WORKSPACE directory.
I can't use pwd because I need this value before I call ctx.actions.run_shell()
Are there no attributes in Bazel configurations that hold this value?
The goal is to have hermetic builds, so you shouldn't depend on the absolute path.
Feel free to use pwd inside the command of ctx.actions.run_shell() (for reproducible builds, be careful, avoid putting the absolute path in the generated files).
Edit.
Technically, there are some workarounds. For example, you can pass the path via the --define flag:
bazel build :all --define=path=$(pwd)
Then the value will be available using ctx.var["path"].
Based on your comment below, you want the path to declare an output. Let me repeat: You shouldn't use an absolute path to declare the output file. Declare an output in your package. Then ask the tool you call to use that output.
For example, when you call gcc, you can use -o to specify the output. When a tool writes to stdout, use the shell to redirect it. If the tool is really not flexible, you may want to wrap it with your own script (e.g. call the tool and copy the output file).
Using an absolute path here is not the right solution. For example, it should be possible to execute the action on a remote machine (where your absolute path won't make sense.
Zip may be a reasonable solution. It's useful when you cannot know in advance the number or the names of the output files.

Skylark - How to execute a jar from a repository rule

Context
I am writing a repository rule that invokes another Bazel project. My current approach is to build the additional project as a deploy jar. I would like a user to be able to instantiate the rule like:
jar_path = some/relative/path
my_rule(name = "something", p_arg="m_arg", binary=jar_path)
and then given the jar_path and the arguments, I would like the repository rule to execute the following command in the shell:
java -jar $(SOME_JAR) $(ARGUMENTS_PROVIDED_BY_RULE)
Problem
First, it's unclear how best to accomplish the deploy jar approach. So far, I have attempt two different approaches, with varying levels of success. For examples, I have skimmed through the scala_rules, the maven_rules, and the skylark cookbook.
Second, and more importantly, I am not sure whether the deploy jar is the best route to accomplishing my goals. Again, my interest is to invoke a target from an external Bazel project, that is currently hosted on github. (So feasibly, I could try to fetch the project using the http_archive rule).
Below, I describe the attempts I have made.
Approach 1
My first approach involved trying to execute the command using the command field in ctx.action. I tried various enumerations of
java -jar {computed_absolute_path_of_deploy_jar} {args_passed_from_instantiation}.
My biggest issue here was with determining the absolute path of the deploy jar. The file's root path, would contain some additional information. For example, it would like something like this.
/abs/olute/path[ something ]/rela/tive/path
As a side note, I'm not sure if this is a bug/nit, but the File.root.path, evaluated to None, despite File.none not being None.
My first approach involved was to was to try to use skylark [ctx.binary]
Approach 2
Next thing I tried was to mimic the input binary example from the docs. This was also unsuccessful. The issue was that the actual binary could not be found. Here is how I configured it.
First, I relaxed the repository rule into a regular skylark rule.
def _test_binary(ctx):
ctx.action(
....
arguments = [ctx.attr.p_arg],
executable = ctx.executable.binary)
test_binary = rule(
...
attrs = {
"binary":attr.label(mandatory=True, cfg="host", allow_files=True, executable=True),
...
}
Then, in my external project, I loaded the skylark rule into the WORKSPACE file. Finally, I called the macro from one of my BUILD files as follows:
load("#something_rule//:something_rule.bzl", "test_binary")
test_binary(name = "hello", p_arg = "hello", binary = "script.sh")
The script is a one line java -jar something_deploy.jar -- -arg:$1, and is in the same directory as the BUILD file.
Bazel complains that src/script.sh does not exist. I presume because it is looking for the file in /private/var/tmp/-bazel_username/somehash/relative_path. In response, I tried to pass the absolute path, which is not allowed.
Cheers.
It looks like you're mixing up repository rules with build extensions ("normal" rules). A good rule of thumb is:
Repository rules are for getting sources onto your system or symlinking them to a place Bazel can see them.
Build extension are for everything else: compiling, copying files, running binaries, etc.
I don't actually think you need to use either, for this. You say that the other project is on GitHub, so you can add the following to your WORKSPACE file:
http_archive(
name = "other_project",
...
)
Then, in your BUILD file:
genrule(
name = "run-a-jar",
srcs = ["#other_project//some/relative:path"],
cmd = "java -jar $(location #other_project//some/relative:path) -- arg1 arg2 > $#",
outs = ["jar-output"],
)
You shouldn't need to use the _deploy.jar target, since you're not moving the jar out of its project (_deploy.jar is useful when you need to relocate it).
Other things from your question:
I'm not sure if this is a bug/nit, but the File.root.path, evaluated to None,
Are you sure it didn't evaluate to ""? The path is relative to the execution root, so for sources, it will always be "" (for outputs, it'll be bazel-out/local-fastbuild/bin or similar).
Bazel complains that src/script.sh does not exist.
Passing -s to Bazel can really help debugging Skylark rules. You can see exactly where it is looking.

Resources