What is the best way to refer to an external package's path in any arbitrary files processed by Bazel?
I'm trying to understand how Bazel preprocesses BUILD and .bzl files. I see instances where strings contain calls to package() and I am wondering how it works (and could not find any relevant documentation). Here is an example of this:
I have a toolchain whose BUILD file contains the following expression:
cc_toolchain_config(
    name = "cc-toolchain-config",
    abi_libc_version = "glibc_" + host_gcc8_bundle()["pkg_version"]["glibc"],
    abi_version = "gcc-" + host_gcc8_bundle()["version"],
    compiler = "gcc-" + host_gcc8_bundle()["version"],
    cpu = "x86_64",
    cxx_builtin_include_directories = [
        "%package(@host_gcc8_toolchain//include/c++/8)%",
        "%package(@host_gcc8_toolchain//lib64/gcc/x86_64-unknown-linux-gnu/8/include-fixed)%",
        "%package(@host_gcc8_kernel_headers//include)%",
        "%package(@host_gcc8_glibc//include)%",
    ],
    host_system_name = "x86_64-unknown-linux-gnu",
    target_libc = "glibc_" + host_gcc8_bundle()["pkg_version"]["glibc"],
    target_system_name = "x86_64-unknown-linux-gnu",
    toolchain_identifier = "host_linux_gcc8",
)
From my understanding, cxx_builtin_include_directories defines a list of strings that serve as the --sysroot option passed to GCC, as detailed in https://docs.bazel.build/versions/0.23.0/skylark/lib/cc_common.html. These strings are in the format %sysroot%.
Since package(@host_gcc8_toolchain//include/c++/8), for example, does not mean anything to GCC, Bazel has to somehow expand this function to produce the actual path to the files included in the package before passing them to the compiler driver.
But how can it determine that this needs to be expanded and that it is not a regular string? How does Bazel preprocess the BUILD file? Is it because of the % ... % pattern? Where is this documented?
Is "%package(@external_package//target)%" a pattern that can be used elsewhere? In any BUILD file? Where do I find Bazel documentation showing how this works?
These directives are expanded by cc_common.create_cc_toolchain_config_info within the cc_toolchain_config rule implementation, not by any sort of preprocessing of the BUILD file (i.e., "%package(@host_gcc8_glibc//include)%" is passed literally into the cc_toolchain_config rule). I'm not aware of these special expansions being completely documented anywhere but the source.
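To illustrate where the string ends up, here is a minimal sketch of what such a cc_toolchain_config rule might look like. The rule body and attribute names are assumptions modelled on the attributes used in the question, not the asker's actual rule; only cc_common.create_cc_toolchain_config_info and its parameters are real Bazel API.

```starlark
# Hypothetical cc_toolchain_config rule, for illustration only.
def _cc_toolchain_config_impl(ctx):
    return cc_common.create_cc_toolchain_config_info(
        ctx = ctx,
        toolchain_identifier = ctx.attr.toolchain_identifier,
        host_system_name = ctx.attr.host_system_name,
        target_system_name = ctx.attr.target_system_name,
        target_cpu = ctx.attr.cpu,
        target_libc = ctx.attr.target_libc,
        compiler = ctx.attr.compiler,
        abi_version = ctx.attr.abi_version,
        abi_libc_version = ctx.attr.abi_libc_version,
        # Entries such as "%package(@repo//dir)%" arrive here as plain
        # strings; any expansion happens inside this call, not in the
        # BUILD file.
        cxx_builtin_include_directories = ctx.attr.cxx_builtin_include_directories,
    )

cc_toolchain_config = rule(
    implementation = _cc_toolchain_config_impl,
    attrs = {
        "abi_libc_version": attr.string(),
        "abi_version": attr.string(),
        "compiler": attr.string(),
        "cpu": attr.string(),
        "cxx_builtin_include_directories": attr.string_list(),
        "host_system_name": attr.string(),
        "target_libc": attr.string(),
        "target_system_name": attr.string(),
        "toolchain_identifier": attr.string(),
    },
    provides = [CcToolchainConfigInfo],
)
```

Because the attribute is an ordinary attr.string_list, the same pattern placed in an unrelated rule's attribute would most likely stay a literal string.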
I have a code analysis tool that I'd like to run for each cc_library (and cc_binary, silently implied for the rest of the question). The tool has a CLI interface that takes:
A tool project file
Compiler specifics, such as type sizes, built-ins, macros etc.
Files to analyze
File path, includes, defines
Rules to (not) apply
Files to add to the project
Options for synchronizing files with build data
JSON compilation database
Parse build log
Analyze and generate analysis report
I've been looking at how to integrate this in Bazel so that the files to analyze AND the associated includes and defines are updated automatically, and so that any analysis result is properly cached. Generating a JSON compilation database (using a third-party library) or parsing the build log both require separate runs and updating the source tree. For this question, I consider that a workaround I'm trying to remove.
What I've tried so far is using aspects, adding an analysis aspect to any library. The general idea is to have a base project file holding library-invariant configuration, append the cc_library files to analyze, and finally trigger an analysis that generates the report. But I'm having trouble implementing this, and I'm not sure it's even possible.
This is my aspect implementation so far, trying to iterate through cc_library attributes and target compilation context:
def _print_aspect_impl(target, ctx):
    # Make sure the rule has a srcs attribute
    if hasattr(ctx.rule.attr, 'srcs'):
        # Iterate through the files
        for src in ctx.rule.attr.srcs:
            for f in src.files.to_list():
                if f.path.endswith(".c"):
                    print("file: ")
                    print(f.path)
                    print("includes: ")
                    print(target[CcInfo].compilation_context.includes)
                    print("quote_includes: ")
                    print(target[CcInfo].compilation_context.quote_includes)
                    print("system_includes: ")
                    print(target[CcInfo].compilation_context.system_includes)
                    print("defines: ")
                    print(ctx.rule.attr.defines)
                    print("local_defines: ")
                    print(ctx.rule.attr.local_defines)
                    print("")  # empty line to separate file prints
    return []
What I cannot figure out is how to get ALL includes and defines used when compiling the library:
From libraries depended upon, recursively
copts, defines, includes
From the toolchain
features, cxx_builtin_include_directories
Questions:
How do I get the missing flags, continuing on presented technique?
Can I somehow retrieve the compile action command string?
Appended to analysis project using the build log API
Some other solution entirely?
Perhaps there is something one can do with cc_toolchain instead of aspects...
Aspects are the right tool for that. The information you're looking for is contained in the providers, fragments, and toolchains of the cc_* rules the aspect has access to. Specifically, CcInfo has the target-specific pieces, the cpp fragment has the pieces configured from command-line flags, and CcToolchainInfo has the parts from the toolchain.
CcInfo in target tells you if the current target has that provider, and target[CcInfo] accesses it.
The rules_cc my_c_compile example is where I usually look for pulling out a complete compiler command based on a CcInfo. Something like this should work from the aspect:
load("@rules_cc//cc:action_names.bzl", "C_COMPILE_ACTION_NAME")
load("@rules_cc//cc:toolchain_utils.bzl", "find_cpp_toolchain")

# In the aspect implementation:
cc_toolchain = find_cpp_toolchain(ctx)
feature_configuration = cc_common.configure_features(
    ctx = ctx,
    cc_toolchain = cc_toolchain,
    requested_features = ctx.features,
    unsupported_features = ctx.disabled_features,
)
c_compiler_path = cc_common.get_tool_for_action(
    feature_configuration = feature_configuration,
    action_name = C_COMPILE_ACTION_NAME,
)

# In the loop over the source files:
c_compile_variables = cc_common.create_compile_variables(
    feature_configuration = feature_configuration,
    cc_toolchain = cc_toolchain,
    user_compile_flags = ctx.fragments.cpp.copts + ctx.fragments.cpp.conlyopts,
    source_file = src.path,
)
command_line = cc_common.get_memory_inefficient_command_line(
    feature_configuration = feature_configuration,
    action_name = C_COMPILE_ACTION_NAME,
    variables = c_compile_variables,
)
env = cc_common.get_environment_variables(
    feature_configuration = feature_configuration,
    action_name = C_COMPILE_ACTION_NAME,
    variables = c_compile_variables,
)
That example only handles C files (not C++), so you'll have to change the action names and the parts of the fragment it uses accordingly.
You have to add toolchains = ["@bazel_tools//tools/cpp:toolchain_type"] and fragments = ["cpp"] to the aspect() call to use those. Also see the note in find_cc_toolchain.bzl about the _cc_toolchain attr if you're using legacy toolchain resolution.
The information coming from the rules and the toolchain is already structured. Depending on what your analysis tool wants, it might make more sense to extract it directly instead of generating a full command line. Most of the provider, fragment, and toolchain is well-documented if you want to look at those directly.
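As a small illustration of working with the structured data directly, the helper below turns already-flattened include and define lists into GCC-style preprocessor flags. It is only a sketch: the function name and the choice of flag spellings are assumptions, and a given analysis tool may expect a different format. It is plain Starlark that also happens to be valid Python.

```python
def analysis_flags(includes, quote_includes, system_includes, defines):
    """Builds GCC-style preprocessor flags from plain lists of strings.

    In an aspect, the lists would come from calling .to_list() on the
    corresponding depsets of target[CcInfo].compilation_context.
    """
    flags = []
    for d in includes:
        flags.append("-I" + d)
    for d in quote_includes:
        flags.extend(["-iquote", d])
    for d in system_includes:
        flags.extend(["-isystem", d])
    for d in defines:
        flags.append("-D" + d)
    return flags
```

Keeping the formatting in a pure function like this makes it easy to adapt when the tool wants, say, a JSON project fragment instead of flags.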
You might pass required_providers = [CcInfo] to aspect to limit propagation to rules which include it, depending on how you want to manage propagation of your aspect.
The Integrating with C++ Rules documentation page also has some more info.
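Putting the pieces together, a minimal aspect sketch might look like the following. The aspect name, the output file name, the output group name, and the line format are all assumptions made for illustration; the providers, fragment fields, and aspect() parameters are the ones mentioned above.

```starlark
load("@rules_cc//cc:toolchain_utils.bzl", "find_cpp_toolchain")

def _analysis_aspect_impl(target, ctx):
    # required_providers below guarantees that CcInfo is present.
    cc = target[CcInfo].compilation_context
    cc_toolchain = find_cpp_toolchain(ctx)

    lines = []

    # These depsets already contain what the deps propagate transitively.
    lines += ["include: " + d for d in cc.includes.to_list()]
    lines += ["quote_include: " + d for d in cc.quote_includes.to_list()]
    lines += ["system_include: " + d for d in cc.system_includes.to_list()]
    lines += ["define: " + d for d in cc.defines.to_list()]

    # copts from the command line and from the rule itself.
    lines += ["copt: " + o for o in ctx.fragments.cpp.copts]
    lines += ["copt: " + o for o in getattr(ctx.rule.attr, "copts", [])]

    # Built-in include directories reported by the toolchain.
    lines += ["builtin_include: " + d for d in cc_toolchain.built_in_include_directories]

    # Hypothetical output: one text file per target, cached like any action.
    out = ctx.actions.declare_file(ctx.label.name + ".analysis_inputs.txt")
    ctx.actions.write(output = out, content = "\n".join(lines) + "\n")
    return [OutputGroupInfo(analysis_inputs = depset([out]))]

analysis_aspect = aspect(
    implementation = _analysis_aspect_impl,
    attr_aspects = ["deps"],
    required_providers = [CcInfo],
    fragments = ["cpp"],
    toolchains = ["@bazel_tools//tools/cpp:toolchain_type"],
)
```

Assuming the file lives at //tools:analysis.bzl (a made-up location), it could be tried with bazel build //some:target --aspects=//tools:analysis.bzl%analysis_aspect --output_groups=analysis_inputs. A real integration would replace ctx.actions.write with an action that runs the analysis tool.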
By default, the cc_binary rule of bazel produces an output file without any extension on Linux.
My compiler generates output with a .s19 file extension (I have extended the toolchain).
Is there a way to specify the output file's extension?
I get a "Linking 'App-name' failed: not all outputs were created or valid" although the expected output file 'App-name.s19' is generated.
My second question is:
In addition to an 'App-name.s19' file, my compiler also generates an 'App-name.map' file. Is there a way to tell Bazel to verify both the 'App-name.s19' and 'App-name.map' files, i.e. to verify multiple outputs generated by cc_binary?
Question 1)
When configuring your toolchain, you can load artifact_name_pattern from @bazel_tools//tools/cpp:cc_toolchain_config_lib.bzl and set the artifact_name_patterns attribute of cc_common.create_cc_toolchain_config_info():
artifact_name_patterns = [
    artifact_name_pattern(
        category_name = "executable",
        prefix = "",
        extension = ".s19",
    ),
]
Question 2)
It seems that cc_binary doesn't yet support map files: https://github.com/bazelbuild/bazel/issues/6718
A possible workaround would be to use a genrule in your BUILD file to compile your code and generate the map file:
genrule(
    name = "map",
    srcs = ["hello-world.cc"],
    outs = ["hello-world.exe", "output.map"],
    output_to_bindir = 1,
    cmd = "/usr/bin/x86_64-w64-mingw32-gcc -o $(location hello-world.exe) $(location hello-world.cc) -Wl,-Map=\"$(location output.map)\" -lstdc++",
)
or alternatively https://groups.google.com/g/bazel-discuss/c/A00d7Ui1f8s/m/vybgGEPIBwAJ
It's not clear to me what the difference is between the DefaultInfo runfiles' transitive_files and PyInfo's transitive_sources. Are they redundant, or is there an important difference?
For example, I have a custom starlark rule which I want to conform as a PyInfo provider, but I want to add an additional provider so I can't use the native py_library rule.
transitive_sources = [dep[PyInfo].transitive_sources for dep in ctx.attr.deps]
return struct(providers = [
    DefaultInfo(
        files = depset(sources + outs),
        runfiles = ctx.runfiles(files = sources + outs, transitive_files = transitive_sources),
    ),
    PyInfo(
        transitive_sources = depset(direct = sources + outs, transitive = transitive_sources),
        imports = depset(
            direct = [_path_join(ctx.workspace_name, ctx.label.package, im) for im in ctx.attr.imports],
            transitive = [dep[PyInfo].imports for dep in ctx.attr.deps],
        ),
    ),
    _EggLibraryInfo(aditional_info = "other stuff"),
])
I'm creating redundant depsets to satisfy these providers, which makes me think maybe I'm doing it wrong.
I have also tried another method of looping over all the default_runfiles of the deps, and using runfiles.merge for DefaultInfo. For simple cases, these methods appear equivalent, but I don't know if there are other scenarios where the approaches would diverge.
The PyInfo documentation could use a section on how transitive_sources fits into DefaultInfo, and why additional mechanisms outside of runfiles needs to be provided. https://docs.bazel.build/versions/master/skylark/lib/PyInfo.html
DefaultInfo is a known type to Bazel:
files controls which files are built when you bazel build the target,
runfiles defines which files need to be present in the sandbox when executing the target.
PyInfo is exclusively used by Python rules and is used to propagate metadata to consuming targets.
My guess is that the duplication is necessary because the values may differ, so removing the duplication will either mean Bazel doesn't build/include the right files, or consuming Python rules are missing information.
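When the two sets of files do coincide, one way to avoid building the same depset twice is to construct it once and hand it to both providers. The sketch below assumes a hypothetical egg_library rule with srcs and deps attributes and assumes PyInfo is available as a global, as in the question. Note that ctx.runfiles takes a single depset for transitive_files, not a list of depsets.

```starlark
def _egg_library_impl(ctx):
    sources = ctx.files.srcs

    # Built once, shared by both providers below.
    transitive_sources = depset(
        direct = sources,
        transitive = [dep[PyInfo].transitive_sources for dep in ctx.attr.deps],
    )

    # Runfiles may legitimately contain more than Python sources (for
    # example data files), so the deps' own runfiles are merged in too.
    runfiles = ctx.runfiles(transitive_files = transitive_sources)
    for dep in ctx.attr.deps:
        runfiles = runfiles.merge(dep[DefaultInfo].default_runfiles)

    return [
        DefaultInfo(files = depset(sources), runfiles = runfiles),
        PyInfo(
            transitive_sources = transitive_sources,
            imports = depset(
                transitive = [dep[PyInfo].imports for dep in ctx.attr.deps],
            ),
        ),
    ]

# Hypothetical rule name and attributes.
egg_library = rule(
    implementation = _egg_library_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = [".py"]),
        "deps": attr.label_list(providers = [PyInfo]),
    },
)
```

The merge loop is where the two approaches can diverge: a dependency whose runfiles include non-Python data would lose that data if only transitive_sources were used.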
Starting with Bazel v0.19, if you have Starlark (formerly known as "Skylark") code that references @bazel_tools//tools/jdk:jar, you see messages like this at build time:
WARNING: <trimmed-path>/external/bazel_tools/tools/jdk/BUILD:79:1: in alias rule @bazel_tools//tools/jdk:jar: target '@bazel_tools//tools/jdk:jar' depends on deprecated target '@local_jdk//:jar': Don't depend on targets in the JDK workspace; use @bazel_tools//tools/jdk:current_java_runtime instead (see https://github.com/bazelbuild/bazel/issues/5594)
I think I could make things work with @bazel_tools//tools/jdk:current_java_runtime if I wanted access to the java command, but I'm not sure what I'd need to do to get the jar tool to work. The contents of the linked GitHub issue didn't seem to address this particular problem.
I stumbled across a commit to Bazel that makes a similar adjustment to the Starlark java rules. It uses the following pattern: (I've edited the code somewhat)
# In the rule attrs:
"_jdk": attr.label(
    default = Label("//tools/jdk:current_java_runtime"),
    providers = [java_common.JavaRuntimeInfo],
),

# Then, in the rule implementation:
java_runtime = ctx.attr._jdk[java_common.JavaRuntimeInfo]
jar_path = "%s/bin/jar" % java_runtime.java_home
ctx.actions.run_shell(
    inputs = ctx.files._jdk + other_inputs,  # other_inputs: the files being added to the jar
    outputs = [deploy_jar],
    command = "%s cmf %s" % (jar_path, input_files),
)
Additionally, java is available at str(java_runtime.java_executable_exec_path) and javac at "%s/bin/javac" % java_runtime.java_home.
See also, a pull request with a simpler example.
Because my reference to the jar tool is inside a genrule within a top-level macro, rather than a rule, I was unable to use the approach from Rodrigo's answer. Instead, I explicitly referenced the current_java_runtime toolchain and was then able to use the JAVABASE make variable as the base path for the jar tool.
native.genrule(
    name = genjar_rule,
    srcs = [<rules that create files being jar'd>],
    cmd = "some_script.sh $(JAVABASE)/bin/jar $@ $(SRCS)",
    tools = ["some_script.sh", "@bazel_tools//tools/jdk:current_java_runtime"],
    toolchains = ["@bazel_tools//tools/jdk:current_java_runtime"],
    outs = [<some outputs>],
)
Suppose I have this macro which creates a native.cc_binary target:
def build_it(name, **kwargs):
    native.cc_binary(
        name = name + ".out",
        linkopts = [
            "-Lsomedir",
            "-lsomelib",
        ],
        **kwargs
    )
And I also have this rule which takes some sources, runs a tool on them, and generates a value, writing that value to an output file:
def _write_value_impl(ctx):
    args = [f.path for f in ctx.files.srcs] + [ctx.outputs.out.path]
    ctx.actions.run(
        inputs = ctx.files.srcs,
        outputs = [ctx.outputs.out],
        arguments = args,
        executable = ctx.executable._tool,
    )

write_value = rule(
    implementation = _write_value_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
        "out": attr.output(mandatory = True),
        "_tool": attr.label(
            executable = True,
            allow_files = True,
            default = Label("//tools:generate_value"),
        ),
    },
)
Okay, what I'd like to do is modify the macro so that it adds the value generated by the write_value rule to the linkopts. Something like this:
def build_it(name, value, **kwargs):
    native.cc_binary(
        name = name + ".out",
        linkopts = [
            "-Lsomedir",
            "-lsomelib",
            "-Wl,--defsym=SOME_SYMBOL={}".format(value),
        ],
        **kwargs
    )
How do I make this work? The problem is that the target of build_it is generated at analysis time, but the value it needs is only generated at execution time. The value also ends up in a file. How do I get the value out of the file and give it to the macro?
I suspect that instead of a macro, I need a rule, but how do I get the rule to call native.cc_binary?
You can write a repository_rule() to create files and generate values prior to the loading phase; the files in @external_repo//... are then accessible to rules during analysis. See https://docs.bazel.build/versions/master/skylark/repository_rules.html
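A rough sketch of that idea follows. All names here (generated_value, value.bzl, VALUE, the tool and srcs attributes) are hypothetical, and it assumes the tool is a script or prebuilt binary checked into the source tree that prints the value to stdout, since a repository rule cannot depend on a target that Bazel itself has to build.

```starlark
def _generated_value_impl(repository_ctx):
    # Resolve the labels to real paths in the source tree.
    tool = repository_ctx.path(repository_ctx.attr.tool)
    srcs = [repository_ctx.path(s) for s in repository_ctx.attr.srcs]

    # Runs before the loading phase of the main build.
    result = repository_ctx.execute([tool] + srcs)
    if result.return_code != 0:
        fail("value generation failed: " + result.stderr)

    # Expose the value as a Starlark constant in the generated repository.
    repository_ctx.file("BUILD.bazel", "")
    repository_ctx.file("value.bzl", "VALUE = %r\n" % result.stdout.strip())

generated_value = repository_rule(
    implementation = _generated_value_impl,
    attrs = {
        "tool": attr.label(allow_single_file = True, mandatory = True),
        "srcs": attr.label_list(allow_files = True),
    },
)

# In WORKSPACE (hypothetical labels):
#   generated_value(
#       name = "generated_value",
#       tool = "//tools:generate_value.sh",
#       srcs = ["//src:input.txt"],
#   )
#
# In the .bzl file that defines build_it:
#   load("@generated_value//:value.bzl", "VALUE")
```

With that in place, the macro can format VALUE into linkopts like any other Starlark string, at the cost of running the tool outside the normal action graph.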
This can't be done in Bazel, precisely because of what you mentioned. All the inputs to rules need to be determined in the analysis phase rather than the execution phase. Bazel wants to build the complete action graph before executing any actions, and this would require the write_value rule to run before build_it could be analyzed.
A workaround might be to generate the BUILD file yourself outside of Bazel beforehand, and then use the generated BUILD file during your build.
Another workaround is to hard-code the linkopts to the values you expect. Then have write_value check that the generated value matches and fail with a non-zero exit code if it doesn't. That way Bazel will at least tell you when they don't match, although it takes some effort to update both places so they're aligned again.
For your specific problem there is a concept of linker scripts, and even implicit linker scripts. Perhaps you could generate one and supply that to cc_binary in the srcs attribute. You may need to name it as a .o file (even though it's not an object file). The GCC linker documentation says:
If you specify a linker input file which the linker can not recognize
as an object file or an archive file, it will try to read the file as
a linker script.
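As an untested sketch of the linker-script idea, the macro below generates a one-line script from the value file and hands it to the linker. It is a variation on the suggestion above: instead of renaming the script to .o and placing it in srcs, it uses cc_binary's additional_linker_inputs together with a $(location) expansion in linkopts. The value_file parameter and the generated target names are assumptions, and the sketch assumes the file written by write_value contains only the value.

```starlark
def build_it(name, value_file, **kwargs):
    script = name + "_symbols.ld"

    # Turns the generated value into "SOME_SYMBOL = <value>;" at execution
    # time, so the macro itself never needs to know the value.
    native.genrule(
        name = name + "_symbols",
        srcs = [value_file],
        outs = [script],
        cmd = "echo \"SOME_SYMBOL = $$(cat $(location {}));\" > $@".format(value_file),
    )

    native.cc_binary(
        name = name + ".out",
        # Makes the script an input of the link action.
        additional_linker_inputs = [":" + script],
        linkopts = [
            "-Lsomedir",
            "-lsomelib",
            # Passed as a plain linker input; since it is neither an object
            # nor an archive, the linker should read it as an implicit
            # linker script.
            "$(location :{})".format(script),
        ],
        **kwargs
    )
```

A BUILD file would then call something like build_it(name = "app", value_file = "value.txt", srcs = ["main.c"]), where value.txt is the out of a write_value target in the same package.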