How to pass an array from the Bazel CLI to rules?

Let's say I have a rule like this.
foo(
    name = "helloworld",
    myarray = [
        ":bar",
        "//path/to:qux",
    ],
)
In this case, myarray is static.
However, I want it to be given by cli, like
bazel run //:helloworld --myarray=":bar,//path/to:qux,:baz,:another"
How is this possible?
Thanks

To get exactly what you're asking for, Bazel would need to support LABEL_LIST in Starlark-defined command line flags, which are documented here:
https://docs.bazel.build/versions/2.1.0/skylark/lib/config.html
and here: https://docs.bazel.build/versions/2.1.0/skylark/config.html
Unfortunately that's not implemented at the moment.
If you don't actually need a list of labels (i.e., to create dependencies between targets), then maybe STRING_LIST will work for you.
If you do need a list of labels, and the different possible values are known, then you can use --define, config_setting(), and select():
https://docs.bazel.build/versions/2.1.0/configurable-attributes.html

The question is what you are really after. Passing a variable or an array into bazel build/run isn't really possible as such, and mostly not without (very likely unwanted) side effects. Aren't you perhaps really just looking to pass arguments directly to whatever is being run by bazel run, i.e. to the executable itself rather than to Bazel?
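If passing the values to the executable is enough, a minimal sketch of the receiving side could look like the following. It assumes the target behind //:helloworld is a Python program and that the flag is called --myarray as in the question; both are assumptions for illustration. The values arrive as plain strings, so Bazel does not create any dependencies for them.

```python
import argparse
import sys


def parse_labels(argv):
    """Parse a comma-separated --myarray flag into a list of strings."""
    parser = argparse.ArgumentParser(description="helloworld (hypothetical tool)")
    parser.add_argument(
        "--myarray",
        default="",
        help="comma-separated list of values, e.g. ':bar,//path/to:qux'",
    )
    args = parser.parse_args(argv)
    # Drop empty items so a trailing comma or an empty flag yields no entries.
    return [item.strip() for item in args.myarray.split(",") if item.strip()]


if __name__ == "__main__":
    for label in parse_labels(sys.argv[1:]):
        print(label)
```

Everything after a bare -- on the bazel run command line is handed to the executable, so the call would be: bazel run //:helloworld -- --myarray=":bar,//path/to:qux,:baz,:another"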
There are a few ways you could sneak values in (in most cases you would also need to come up with a syntax for passing the data on the CLI and unpack the array in a rule), but many come at a relatively substantial price.
You can define your array in a .bzl file and load it from where the rule uses it. You can then rewrite the .bzl content to change your build/run configuration (which also makes the change obvious and traceable); only the loading of the rule that uses the variable is affected. E.g., BUILD file:
load(":myarray.bzl", "myarray")

foo(
    name = "helloworld",
    myarray = myarray,
)
And you can then call your build:
$ echo 'myarray=[":bar", "//path/to:qux", ":baz", ":another"]' > myarray.bzl
$ bazel run //:helloworld
Which you can of course put in a single wrapper script. If this really needs to be a bazel array, this one is probably the cleanest way to do that.
--workspace_status_command: you can collect information about your environment, add either or both of the resulting status files as a dependency of your rule (the volatile or the stable one, depending on whether the inputs are meant to invalidate the rule's results or not), and process the incoming file in whatever is executed by the rule (at which point one would wonder why not pass the data as command line arguments directly). If you use the stable status file, every other rule depending on it is also invalidated by any change.
You can do a similar thing with --action_env. From within the executable/tool/script underpinning the rule, you can directly access the defined environment variable. However, this also means the environment of every rule is affected (not just the one you're targeting); and again, why would the tool parse the information from its environment rather than accept arguments on the command line?
There is also --define, but you would not really get direct access to its value so much as select() one choice out of a set of predefined options.
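For the first option (rewriting myarray.bzl), the wrapper script could be sketched roughly like this. The script name, the comma-separated input syntax and the assumption that it runs from the package directory containing the BUILD file are all illustrative choices, and labels containing double quotes are not handled.

```python
import subprocess
import sys


def render_bzl(labels):
    """Return the text of a .bzl file defining `myarray` as a list of strings."""
    items = "".join('    "{}",\n'.format(label) for label in labels)
    return "myarray = [\n{}]\n".format(items)


def main(argv):
    # argv[0] is a comma-separated list, mirroring the syntax in the question.
    labels = [item for item in argv[0].split(",") if item]
    with open("myarray.bzl", "w") as f:
        f.write(render_bzl(labels))
    # Hand over to Bazel; the BUILD file loads myarray.bzl as shown above.
    return subprocess.call(["bazel", "run", "//:helloworld"])


# Only act when a list was actually passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved as, say, run_helloworld.py, it would be called as: python run_helloworld.py ":bar,//path/to:qux,:baz,:another"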


Can I add static analysis to a py_binary or py_library rule?

I have a repo which uses bazel to build a bunch of Python code. I would like to introduce various flavors of static analysis into the build and have the build fail if these static analyses throw errors. What is the best way to do this?
For example, I'd like to declare something like:
py_library_with_static_analysis(
    name = "foo",
    srcs = ["foo.py"],
)

py_library_with_static_analysis(
    name = "bar",
    srcs = ["bar.py"],
    deps = [":foo"],
)
In a build file and have it error out if there are mypy/flake/etc errors in foo.py. I would like to be able to do this gradually, converting libraries/binaries to static analysis one target at a time. I'm not sure if I should do this via a new rule, a macro, an aspect or something else.
Essentially, I think I'm asking how to run an additional command while building a py_binary/py_library and fail if that command fails.
I could create my own version of a py_library rule and have it run static analysis within the implementation but that seems like something which is really easy to get wrong (my guess is that native.py_library is quite complex?) and there doesn't seem to be a way to instantiate a native.py_library within a custom rule.
I've also played around with macros a bit, but haven't been able to get that to work either. I think my issue there is that a macro doesn't actually specify new commands, only new targets and I can't figure out how to make the static analysis target get force built along with the py_library/py_binary I'm interested in.
A macro that adds implicit test targets is not such a bad idea: The test targets will be picked up automatically when you run bazel test //..., which you could do in a gating CI to prevent imperfect code from merging.
Bazel supports a BUILD prelude (which is underdocumented) that you could use to replace all py_binary, py_library, and even py_test with your test-adding wrapper macros with minimal changes to existing code.
If you somehow fail the build instead it will make it harder to quickly prototype things. Sometimes you want to just quickly try something out, and you don't care about any pydoc violations yet.
In case you do want to fail the build, you might be able to use the Validations Output Group of a rule that you implement to wrap or replace your py_libraries.
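Whichever mechanism you pick (an implicit test target or a validation action), something has to run the tools and turn their result into a pass/fail exit status. A minimal sketch of such a runner follows; the tool list, the function names and the assumption that mypy and flake8 are installed in the Python environment running the script are mine, not part of any Bazel rule.

```python
import subprocess
import sys

# Hypothetical tool invocations; both tools are assumed to be installed in
# the Python environment that runs this script.
CHECKS = [
    [sys.executable, "-m", "mypy"],
    [sys.executable, "-m", "flake8"],
]


def failed_checks(commands, files):
    """Run every command on `files` and return the commands that exited non-zero."""
    failures = []
    for command in commands:
        result = subprocess.run(list(command) + list(files))
        if result.returncode != 0:
            failures.append(command)
    return failures


def main(files):
    failures = failed_checks(CHECKS, files)
    for command in failures:
        print("FAILED: " + " ".join(command), file=sys.stderr)
    # A non-zero exit status is what makes the enclosing test target fail.
    return 1 if failures else 0


# Only act when source files were actually passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

A wrapper macro could then pass the srcs of each library to this script as arguments of the generated test target.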

Don't discard analysis cache when --action_env changes

I have an --action_env variable I'm passing into Bazel sometimes, but each time I remove it or add it back, it discards the analysis cache, which triggers a re-analysis that takes several minutes because I'm working in a large repo. Is there a way to prevent this? I'm already using --trim_test_configuration
Answer
Not really. You can think of the set of environment variables as a file that almost every Bazel action depends on. As a simple example, let's say you have a build file that looks something like this:
genrule(
    name = "foo_header",
    outs = ["foo.h"],
    # "$$" yields a literal "$" for the shell, the quotes keep "#" from starting
    # a shell comment, and "$@" is the declared output file.
    cmd = "echo \"#define FOO $$FOO_FROM_ENV\" > $@",
)

cc_library(
    name = "my_library_that_everything_depends_on",
    hdrs = [":foo_header"],
)
In this simple case it's not hard to see that if you change --action_env=FOO_FROM_ENV=7 to another value, everything that depends on foo.h has to be completely re-analysed and rebuilt. So while frustrating, it's probably a good thing that Bazel does this; otherwise you'd end up with an inconsistent build.
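To make the "environment as an input file" picture concrete, here is a toy model of a cache key that covers both the command line and the environment. It is a deliberately simplified illustration, not how Bazel actually computes its keys; the function name and the JSON encoding are assumptions.

```python
import hashlib
import json


def action_key(command, env):
    """Toy cache key: a digest over the command line and the action environment."""
    payload = json.dumps({"command": command, "env": env}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


command = ["bash", "-c", 'echo "#define FOO $FOO_FROM_ENV" > foo.h']
key_a = action_key(command, {"FOO_FROM_ENV": "7"})
key_b = action_key(command, {"FOO_FROM_ENV": "8"})
key_c = action_key(command, {})  # the flag was removed again

# Each distinct environment yields a distinct key, so earlier results do not match.
print(key_a != key_b, key_a != key_c)
# Going back to a previous value reproduces the old key, so results stored
# under that key could be found again.
print(key_a == action_key(command, {"FOO_FROM_ENV": "7"}))
```

In this model nothing can be reused across different values, but toggling between the same few values keeps hitting the same keys.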
Partial workarounds
Remove usage of use_default_shell_env from your rules/actions (this is far from trivial as most of the standard rules do this)
Use a centralised cache to prevent Bazel from deleting the artifact cache. e.g. I add these to ~/.bazelrc as then it shares the action/repository cache between all my projects. This only helps if you are switching between env variables on a regular basis rather than using new env values each time. Also, be careful with this as Bazel doesn't presently do any garbage collection so the cache directories can end up very large.
# ~/.bazelrc
common --disk_cache=~/.cache/shared_bazel_action_cache
common --repository_cache=~/.cache/shared_bazel_repository_cache
Add --incompatible_strict_action_env to your project bazelrc; this will prevent changes in your user shell from triggering Bazel to discard the analysis cache.

How can a Bazel `repository_rule` adjust a `label_flag` (or a `config_setting` more generally)?

I can create a label_flag in Bazel to allow command line flags to in turn be matched with a config_setting in a Bazel BUILD file.
However, I'd like to not hard-code the default value of the label_flag, and instead compute a good default based on the system when evaluating a repository_rule (or some other part of the WORKSPACE file).
The best (but awful) way I've come up with to do this is to have the default value loaded from a .bzl file that is generated using the template function on the repository_ctx.
I feel like generating a new file by doing textual substitutions probably isn't the right way to do this, but I can't find anything else. Ideas? help?
Generating a bzl file using the repository rule that inspects the host system is the only way to achieve what you need right now. So you're holding it "right" :)

Default, platform specific, Bazel flags in bazel.rc

I was wondering if it's possible to have platform-specific default Bazel build flags.
For example, we want to use --workspace_status_command but this must be a shell script on Linux and must point towards a batch script for Windows.
Is there a way we can write in the tools/bazel.rc file something like...
if platform=WINDOWS build: --workspace_status_command=status_command.bat
if platform=LINUX build: --workspace_status_command=status_command.sh
We could generate a .bazelrc file by having the users run a script before building, but it would be cleaner/nicer if this was not necessary.
Yes, kind of. You can specify config-specific bazelrc entries, which you can select by passing --config=<configname>.
For example your bazelrc could look like:
build:linux --cpu=k8
build:linux --workspace_status_command=/path/to/command.sh
build:windows --cpu=x64_windows
build:windows --workspace_status_command=c:/path/to/command.bat
And you'd build like so:
bazel build --config=linux //path/to:target
or:
bazel build --config=windows //path/to:target
You have to be careful not to mix semantically conflicting --config flags (Bazel doesn't prevent you from doing that). Though it will work, the results may be unpredictable when the configs tinker with the same flags.
Passing --config to all commands is tricky: it depends on developers remembering to do this, or on controlling the places where Bazel is called.
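One way to control the places where Bazel is called is a small launcher that picks the config from the host. The sketch below is an assumption-laden illustration: the mapping table, the function name and the restriction to build, test and run (the commands that pick up build:<name> lines from the bazelrc) are my choices, and startup options before the command are not handled.

```python
import platform
import subprocess
import sys

# Assumed mapping from platform.system() values to the config names used in
# the bazelrc above; extend it if other hosts matter.
CONFIG_BY_SYSTEM = {"Linux": "linux", "Windows": "windows"}
# Commands that pick up "build:<name>" lines from the bazelrc.
BUILD_LIKE_COMMANDS = {"build", "test", "run"}


def with_platform_config(bazel_args, system_name=None):
    """Insert --config=<name> right after the Bazel command, if applicable."""
    system_name = system_name or platform.system()
    config = CONFIG_BY_SYSTEM.get(system_name)
    if config is None or not bazel_args or bazel_args[0] not in BUILD_LIKE_COMMANDS:
        return list(bazel_args)
    return [bazel_args[0], "--config=" + config] + list(bazel_args[1:])


# Only call Bazel when a command was actually passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(subprocess.call(["bazel"] + with_platform_config(sys.argv[1:])))
```

Developers would then run the launcher instead of bazel, e.g. python launcher.py build //path/to:target, and never type --config themselves.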
I think a better answer would be to teach the version control system how to produce the values, like by putting a git-bazel-stamp script on the $PATH/%PATH% so that git bazel-stamp works.
Then we need workspace_status_command to allow commands from the PATH rather than a path on disk.
The proper way to do this is to wrap your cc_library with a custom macro and pass hardcoded flags to copts. For a full reference, look at envoy_library.bzl.
In short, your steps:
Define a macro to wrap cc_library:
def my_cc_library(
        name,
        copts = [],
        **kwargs):
    # Rules accept keyword arguments only. Inside a .bzl file the built-in
    # rule is reached through native (newer setups may load it from rules_cc).
    native.cc_library(name = name, copts = copts + my_flags(), **kwargs)
Define the my_flags() macro as follows:
# In a BUILD file:
config_setting(
    name = "windows_x86_64",
    values = {"cpu": "x64_windows"},
)

config_setting(
    name = "linux_k8",
    values = {"cpu": "k8"},
)

# In the .bzl file, next to my_cc_library:
def my_flags():
    x64_windows_options = ["/W4"]
    k8_options = ["-Wall"]
    return select({
        ":windows_x86_64": x64_windows_options,
        ":linux_k8": k8_options,
        "//conditions:default": [],
    })
How it works:
Depending on the --cpu flag value, my_flags() will return different flags.
This value is resolved automatically based on the platform. On Windows, it's x64_windows, and on Linux it's k8.
Then, your macro my_cc_library will supply these flags to every target in the project.
A better way of doing this has been added since you asked, sometime in 2019.
If you add
common --enable_platform_specific_config
to your .bazelrc, then --config=windows will automatically apply on Windows hosts, --config=macos on Mac, --config=linux on Linux, etc.
You can then add lines to your .bazelrc like:
build:windows --windows-flags
build:linux --linux-flags
There is one downside, though. This works based on the host rather than the target. So if you're cross-compiling, e.g. to mobile, and want different flags there, you'll have to go with a solution like envoy's (see other answer), or (probably better) add transitions into your graph targets. (See discussion here and here. "Flagless builds" are still under development, but there are usable hacks in the meantime.) You could also use the temporary platform_mappings API.
References:
Commit that added this functionality.
Where it appears in the Bazel docs.

Options for MeCab Japanese tokenizer on iOS?

I'm using the iPhone library for MeCab found at https://github.com/FLCLjp/iPhone-libmecab . I'm having some trouble getting it to tokenize all possible words. Specifically, I cannot tokenize "吉本興業" into two pieces "吉本" and "興業". Are there any options that I could use to fix this? The iPhone library does not expose anything, but it uses C++ underneath the objective-c wrapper. I assume there must be some sort of setting I could change to give more fine-grained control, but I have no idea where to start.
By the way, if anyone wants to tag this 'mecab' that would probably be appropriate. I'm not allowed to create new tags yet.
UPDATE: The iOS library is calling mecab_sparse_tonode2() defined in libmecab.cpp. If anyone could point me to some English documentation on that file it might be enough.
There is nothing iOS-specific in this. The dictionary you are using with mecab (probably ipadic) contains an entry for the company name 吉本興業. Although both parts of the name are listed as separate nouns as well, mecab has a strong preference to tag the compound name as one word.
Mecab lacks a feature that allows the user to choose whether or not compounds should be split into parts. Note that such a feature is generally hard to implement because not everyone agrees on which compounds can be split and which ones can't. E.g. is 容疑者 a compound made up of 容疑 and 者? From a purely morphological point of view perhaps yes, but for most practical applications probably no.
If you have a list of compounds you'd like to get segmented, a quick fix is to create a user dictionary for the parts they consist of, and make mecab use this in addition to the main dictionary.
There is Japanese documentation on how to do this here. For your particular example, it would involve the steps below.
Make a user dictionary with two entries, one for 吉本 and one for 興業:
吉本,,,100,名詞,固有名詞,人名,名,*,*,よしもと,ヨシモト,ヨシモト
興業,,,100,名詞,一般,*,*,*,*,こうぎょう,コウギョウ,コウギョウ
I suspect that both entries exist in the default dictionary already, but by adding them to a user dictionary and giving them a relatively low cost (the fourth column; I've used 100 for both, and the lower the value, the more likely the compound is to be split), you can get mecab to prefer the parts over the whole.
Compile the user dictionary:
$> $MECAB/libexec/mecab/mecab-dict-index -d /usr/lib64/mecab/dic/ipadic -u mydic.dic -f utf-8 -t utf-8 ./mydic
You may have to adjust the command. The above assumes:
Mecab was installed from source in $MECAB. If you use mecab installed by a package manager, you might have difficulties finding the mecab-dict-index tool. Best install from source.
The default dictionary is in /usr/lib64/mecab/dic/ipadic. This is not part of the mecab package; it comes as a separate package (e.g. this) and you may have difficulties finding this, too.
mydic is the name of the user dictionary created in step 1. mydic.dic is the name of the compiled dictionary you'll get as output (it need not exist beforehand).
The user dictionary source is encoded in UTF-8 (-f option), and the compiled dictionary is written in UTF-8 as well (-t option) to match the system dictionary. This may be wrong for your setup, in which case you'll get an error message later when you use mecab.
Modify the mecab configuration. In a system-wide installation, this is a file named /usr/lib64/mecab/dic/ipadic/dicrc or similar. In your case it may be located somewhere else. Add the following line to the end of the configuration file:
userdic = /home/myhome/mydic.dic
Make sure the absolute path to the dictionary compiled above is correct.
If you then run mecab against your input, it will split the compound into its parts (I tested it, using mecab 0.994 on a Linux system).
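If you have a longer list of compounds, the user dictionary entries from the first step can be generated rather than typed by hand. The sketch below only reproduces the 13-column layout of the two entries shown above; the function names and parameter names are mine, and the left and right context IDs are left empty exactly as in those entries.

```python
def ipadic_row(surface, pos, base, reading, pronunciation=None, cost=100):
    """Format one user dictionary line in the 13-column layout used above."""
    # Pad the part-of-speech columns with "*" up to six fields.
    pos_fields = list(pos) + ["*"] * (6 - len(pos))
    if pronunciation is None:
        pronunciation = reading
    # Columns 2 and 3 (context IDs) stay empty, as in the entries above.
    fields = [surface, "", "", str(cost)] + pos_fields + [base, reading, pronunciation]
    return ",".join(fields)


def write_user_dictionary(path, rows):
    """Write the rows as UTF-8, matching the -f utf-8 option used above."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(row + "\n")


rows = [
    ipadic_row("吉本", ["名詞", "固有名詞", "人名", "名"], "よしもと", "ヨシモト"),
    ipadic_row("興業", ["名詞", "一般"], "こうぎょう", "コウギョウ"),
]
```

Calling write_user_dictionary("mydic", rows) would then produce the input file for the mecab-dict-index command above.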
A more thorough fix would be to get the source of the default dictionary, manually remove all compound nouns you want to have split, and then recompile the dictionary. As a general remark, using a CJK tokenizer for a serious application in production over a longer period of time usually involves a certain amount of regular dictionary maintenance (adding/removing entries).
