Is it possible to get $pwd value from Bazel .bzl rule? - bazel

I am trying to get a value of the full current directory path from within .bzl rule. I have tried following:
ctx.host_configuration.default_shell_env.PATH returns "/Users/[user_name]/.rbenv/shims:/usr/local/bin:/usr/bin:/bin:...
ctx.bin_dir.path returns bazel-out/local-fastbuild/bin
pwd = ctx.expand_make_variables("cmd", "$$PWD", {}) returns string $PWD - I don't think this rule is helpful for me, but may be just using it wrong.
What I need is the directory under which the cmd that runs Bazel .bzl rule is running. For example: /Users/[user_name]/git/workspace/path/to/bazel/rule.bzl or at least first part of the path prior to the WORKSPACE directory.
I can't use pwd because I need this value before I call ctx.actions.run_shell()
Are there no attributes in Bazel configurations that hold this value?

The goal is to have hermetic builds, so you shouldn't depend on the absolute path.
Feel free to use pwd inside the command of ctx.actions.run_shell() (for reproducible builds, be careful, avoid putting the absolute path in the generated files).
Edit.
Technically, there are some workarounds. For example, you can pass the path via the --define flag:
bazel build :all --define=path=$(pwd)
Then the value will be available using ctx.var["path"].
Based on your comment below, you want the path to declare an output. Let me repeat: You shouldn't use an absolute path to declare the output file. Declare an output in your package. Then ask the tool you call to use that output.
For example, when you call gcc, you can use -o to specify the output. When a tool writes to stdout, use the shell to redirect it. If the tool is really not flexible, you may want to wrap it with your own script (e.g. call the tool and copy the output file).
Using an absolute path here is not the right solution. For example, it should be possible to execute the action on a remote machine (where your absolute path won't make sense.
Zip may be a reasonable solution. It's useful when you cannot know in advance the number or the names of the output files.

Related

Can I get the arguments passed to bazel itself?

I want to create some "build traceability" functionality, and include the actual bazel command that was run to produce one of my build artifacts. So if the user did this:
bazel run //foo/bar:baz --config=blah
I want to actually get the string "bazel run //foo/bar:baz --config=blah" and write that to a file during the build. Is this possible?
Stamping is the "correct" way to get information like that into a Bazel build. Note the implications around caching though. You could also just write a wrapper script that puts the command line into a file or environment variable. Details of each approach below.
I can think of three ways to get the information you want via stamping, each with differing tradeoffs.
First way: Hijack --embed_label. This shows up in the BUILD_EMBED_LABEL stamping key. You'd add a line like build:blah --embed_label blah in your .bazelrc. This is easy, but that label is often used for things like release_50, which you might want to preserve.
Second way: hijack the hostname or username. These show up in the BUILD_HOST and BUILD_USER stamping keys. On Linux, you can write a shell script at tools/bazel which will automatically be used to wrap bazel invocations. In that shell script, you can use unshare --uts --map-root-user, which will work if the machine is set up to enable bazel's sandboxing. Inside that new namespace, you can easily change the hostname and then exec the real bazel binary, like the default /usr/bin/bazel shell script does. That shell script has full access to the command line, so it can encode any information you want.
Third way: put it into an environment variable and have a custom --workspace_status_command that extracts it into a stamping key. Add a line like build:blah --action_env=MY_BUILD_STYLE=blah to your .bazelrc, and then do echo STABLE_MY_BUILD_STYLE ${MY_BUILD_STYLE} in your workspace status script. If you want the full command line, you could have a tools/bazel wrapper script put that into an environment variable, and then use build --action_env=MY_BUILD_STYLE to preserve the value and pass it to all the actions.
Once you pick a stamping key to use, src/test/shell/integration/stamping_test.sh in the Bazel source tree is a good example of writing stamp information to a file. Something like this:
genrule(
name = "stamped",
outs = ["stamped.txt"],
cmd = "grep BUILD_EMBED_LABEL bazel-out/volatile-status.txt | cut -d ' ' -f 2 >\$#",
stamp = True,
)
If you want to do it without stamping, just write the information to a file in the source tree in a tools/bazel wrapper. You'd want to put that file in your .gitignore, of course. echo "$#" > cli_args is all it takes to dump them to a file, and then you can use that as a source file like normal in your build. This approach is simplest, but interacts the most poorly with Bazel's caching, because everything that depends on that file will be rebuilt every time with no way to control it.

Are absolute paths safe to use in Bazel?

I am experimenting with Bazel to be added along with an old, make/shell based build system. I can easily make shell commands which returns an absolute path to some tool or library build by the old build system as early prerequisites. These commands I can use in a genrule(), which copies the needed files (like headers and libs) into Bazel proper to be exposed in form of a cc_library().
I found out that genrule() does not detect a dependency if the command uses a file with absolute path - it is not caught by the sandbox. In a way I am (ab)using that behavior.
It is it safe? Will some future update of Bazel refuse access to files based on absolute path in that way in a command in genrule?
Most of Bazel's sandboxes allow access to most paths outside of the source tree by default. Details depend on which sandbox implementation you're using. The docker sandbox, for example, allows access to all those paths inside of a docker image. It's kind of hard to make promises about future Bazel versions, but I think it's unlikely that a sandbox will prevent accessing /bin/bash (for example), which means other absolute paths will probably continue to work too.
--sandbox_block_path can be used to explicitly block a path if you want.
If you always have the files available on every machine you build on, your setup should work. Keep in mind that Bazel will not recognize when the contents of those files change, so you can easily get stale results in various caches. You can avoid that by ensuring the external paths change whenever their contents do.
new_local_repository might be a better fit to avoid those problems, if you know the paths ahead of time.
If you don't know the paths ahead of time, you can write a custom repository rule which runs arbitrary commands via repository_ctx.execute to retrieve the paths and them symlinks them in with repository_ctx.symlink.
Tensorflow's third_party/sycl/sycl_configure.bzl has an example of doing something similar (you would do something other than looking at environment variables like find_computecpp_root does, and you might symlink entire directories instead of all the files in them):
def _symlink_dir(repository_ctx, src_dir, dest_dir):
"""Symlinks all the files in a directory.
Args:
repository_ctx: The repository context.
src_dir: The source directory.
dest_dir: The destination directory to create the symlinks in.
"""
files = repository_ctx.path(src_dir).readdir()
for src_file in files:
repository_ctx.symlink(src_file, dest_dir + "/" + src_file.basename)
def find_computecpp_root(repository_ctx):
"""Find ComputeCpp compiler."""
sycl_name = ""
if _COMPUTECPP_TOOLKIT_PATH in repository_ctx.os.environ:
sycl_name = repository_ctx.os.environ[_COMPUTECPP_TOOLKIT_PATH].strip()
if sycl_name.startswith("/"):
return sycl_name
fail("Cannot find SYCL compiler, please correct your path")
def _sycl_autoconf_imp(repository_ctx):
<snip>
computecpp_root = find_computecpp_root(repository_ctx)
<snip>
_symlink_dir(repository_ctx, computecpp_root + "/lib", "sycl/lib")
_symlink_dir(repository_ctx, computecpp_root + "/include", "sycl/include")
_symlink_dir(repository_ctx, computecpp_root + "/bin", "sycl/bin")

Programatically and reliably retrieve the path to the go executable in container without 'which' with Go

How can I retrieve the path to the go binary in a container that does not have which programatically?
One option would be to execute which go as follows:
bytes, err := exec.Command("which", "go").Output()
However, I would like to not depend on which being available. Does go provide any in-built mechanism to retrieve this and if not, what is the alternative apart from having the user pass in the path themselves?
From the man page of which:
Which takes one or more arguments. For each of its arguments it prints to stdout the full path of the executables that would have been executed when this argument had been entered at the shell prompt. It does this by searching for an executable or script in the directories listed in the environment variable PATH using the same algorithm as bash(1).
Go's os/exec.LookPath function is very close to this functionality:
LookPath searches for an executable named file in the directories named by the PATH environment variable. If file contains a slash, it is tried directly and the PATH is not consulted. The result may be an absolute path or a path relative to the current directory.
Use path/filepath.Abs if you need a guaranteed absolute path.
I don't expect this to be the best answer, but it is one that I just found. I was hoping for more of a go-specific one, but in the meantime type in linux is a default built-in one available in bash and sh (alpine).
You can test this yourself by running type type which yields:
type is a shell builtin
The usage in go would look like this:
b, err := exec.Command("type", "go").Output()
if err != nil {
/* 'type' is not available on the O/S */
}
goPath := strings.TrimPrefix(strings.TrimSuffix(string(b), "\n"), "go is ")
The reason the Trim functions are required is because the output would look like this:
go is /usr/local/go/bin/go\n
This isn't the nicest way of going about it, but it works.

how to find and deploy the correct files with Bazel's pkg_tar() in Windows?

please take a look at the bin-win target in my repository here:
https://github.com/thinlizzy/bazelexample/blob/master/demo/BUILD#L28
it seems to be properly packing the executable inside a file named bin-win.tar.gz, but I still have some questions:
1- in my machine, the file is being generated at this directory:
C:\Users\John\AppData\Local\Temp_bazel_John\aS4O8v3V\execroot__main__\bazel-out\x64_windows-fastbuild\bin\demo
which makes finding the tar.gz file a cumbersome task.
The question is how can I make my bin-win target to move the file from there to a "better location"? (perhaps defined by an environment variable or a cmd line parameter/flag)
2- how can I include more files with my executable? My actual use case is I want to supply data files and some DLLs together with the executable. Should I use a filegroup() rule and refer its name in the "srcs" attribute as well?
2a- for the DLLs, is there a way to make a filegroup() rule to interpret environment variables? (e.g: the directories of the DLLs)
Thanks!
Look for the bazel-bin and bazel-genfiles directories in your workspace. These are actually junctions (directory symlinks) that Bazel updates after every build. If you bazel build //:demo, you can access its output as bazel-bin\demo.
(a) You can also set TMP and TEMP in your environment to point to e.g. c:\tmp. Bazel will pick those up instead of C:\Users\John\AppData\Local\Temp, so the full path for the output directory (that bazel-bin points to) will be c:\tmp\aS4O8v3V\execroot\__main__\bazel-out\x64_windows-fastbuild\bin.
(b) Or you can pass the --output_user_root startup flag, e.g. bazel--output_user_root=c:\tmp build //:demo. That will have the same effect as (a).
There's currently no way to get rid of the _bazel_John\aS4O8v3V\execroot part of the path.
Yes, I think you need to put those files in pkg_tar.srcs. Whether you use a filegroup() rule is irrelevant; filegroup just lets you group files together, so you can refer to the group by name, which is useful when you need to refer to the same files in multiple rules.
2.a. I don't think so.

Bazel action output path

How do I get the absolute path that the sandbox worker will be executing in from skylark?
I've got a number of rules where I need to add an argument to the action command in skylark. The argument is always the equivalent of "-fdebug-prefix-map=$(/bin/readlink -f .)=.". I need the path so I can teach my tools to strip off the sandbox path and leave a relative path. What's the best way to get access to that path?
As a workaround you can use the blaze-bin symlink.
However, in general, you cannot (particularly if you use --define).
You should wrap your script in a blaze rule and use blaze run.

Resources