How to modify just one file when repackaging an archive? - bazel

I'm trying to take a zip file that was not produced by Bazel, modify some of the files in it while leaving most of them alone, and ultimately produce a new tarball with roughly the original content (plus my modifications).
I'm having trouble specifying my rules in a clean way, and I'd appreciate suggestions on how to do it.
I'm importing the original zip file via the 'new_http_archive' WORKSPACE rule. This works pretty well. I put the build file in a package one level under the root. Let's call this 'foo_repackage'.
In foo_repackage/BUILD.root_archive:
package(default_visibility = ["//visibility:public"])
filegroup(
name = "all_files",
srcs = glob(
["**"],
exclude = ["*", "share/doc/api/**"]
),
)
The bigger issue is in the foo_repackage/BUILD file. I'd like to take all of the files from the all_files group above, except for a few that I will modify, and I can't see how to do this easily. It seems that every file I want to modify has to be excluded from the above glob and given a new rule that specifies that file, which means I have to keep modifying the exclude list of the global all_files rule.
If I could create a new filegroup that was all of the above files with some files excluded, that'd be ideal.
I should mention that the last step is, of course, to use pkg_tar to repackage the result. This is in foo_repackage/BUILD:
pkg_tar(
name = "OutputTarball",
files = ["@root_archive//:all_files"],
deps = [":layers_of_modified_files"],
strip_prefix = "/../root_archive",
)
Does anyone have a better way to do this?
Thanks, Sean

Could you use a variable like:
MODIFIABLE_FILES = [
"some/file",
"another/file",
...
]
filegroup(
name = "static-files",
srcs = glob(["**"], exclude = MODIFIABLE_FILES)
)
filegroup(
name = "modifiable-files",
srcs = MODIFIABLE_FILES,
)
Then the list of static files and modifiable files will be kept in sync and you'll get a build error if you accidentally specify a non-existent modifiable file.
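The effect of sharing one list between the glob's exclude and the second filegroup can be sketched outside Bazel. The following is plain Python, not Starlark. The helper partition_files and the sample paths are made up for illustration, and exclude is modelled as exact path matching only (no wildcards).

```python
def partition_files(all_files, modifiable):
    """Split all_files into (static, modifiable), mirroring the two filegroups.

    Raises ValueError when a modifiable path is not in the archive. That is
    only a stand-in for the build error Bazel reports for a missing source.
    """
    present = set(all_files)
    missing = [path for path in modifiable if path not in present]
    if missing:
        raise ValueError("not in archive: " + ", ".join(missing))
    excluded = set(modifiable)
    # Everything that is not explicitly listed as modifiable stays static.
    static = [path for path in all_files if path not in excluded]
    return static, list(modifiable)


# Hypothetical archive contents, for illustration only.
archive = ["bin/tool", "some/file", "another/file", "lib/libfoo.so"]
static, modifiable = partition_files(archive, ["some/file", "another/file"])
print(static)      # files that would be repackaged untouched
print(modifiable)  # files that get their own modification rules
```

Because both results are derived from the same list, adding a path to it moves that file from one group to the other in a single step.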

Related

bazel: how to create a rule that strips relative path of all files in subfolders

I'm trying to write a Bazel BUILD file for GSL.
The problem is that it has various gsl_*.h header files in subfolders, but they are always included as #include <gsl/gsl_somename.h>. For example, the header gsl_errno.h that lives in gsl/err/gsl_errno.h is included as #include <gsl/gsl_errno.h>, and gsl_math.h that lives in gsl/gsl_math.h is also included as #include <gsl/gsl_math.h>.
I tried to create a separate cc_library for each folder and use strip_include_prefix and include_prefix like so:
cc_library(
name = "gsl_sys",
visibility = ["//visibility:public"],
srcs = [
"sys/minmax.c",
"sys/prec.c",
"sys/hypot.c",
"sys/log1p.c",
"sys/expm1.c",
"sys/coerce.c",
"sys/invhyp.c",
"sys/pow_int.c",
"sys/infnan.c",
"sys/fdiv.c",
"sys/fcmp.c",
"sys/ldfrexp.c",
],
hdrs = glob(
include = ["sys/*.h"],
),
strip_include_prefix = "sys/",
include_prefix = "gsl/",
)
The problem is that if I go folder by folder, there are circular dependencies: for example, gsl/gsl_math.h includes gsl/sys/gsl_sys.h, but some files in gsl/sys include gsl_*.h files that live in the gsl/ root folder.
I think that optimally I'd have one cc_library with all the gsl_*.h files, such that they are all accessible as #include <gsl/gsl_*.h> regardless of which subfolder they are in.
How can I achieve that?
I would copy them all to a new folder, and then use those new copied versions for your cc_library. A genrule is the simplest way to do this. Something like this in a BUILD file at the top level (don't put BUILD files in any of the subfolders; you want it all in the same package so one rule can handle all the files):
# You could list all the headers instead of the glob, or something
# similar, if you only want a subset of them.
all_headers = glob(["gsl/*/*.h"])
# This is the logic for actually remapping the paths. Consider a macro
# instead of writing it inline like this if it gets more complicated.
# Every header lands in unified_gsl/gsl/ under its original file name.
unified_header_paths = ["unified_gsl/gsl/" + p.split("/")[-1] for p in all_headers]
genrule(
name = "unified_gsl",
srcs = all_headers,
outs = unified_header_paths,
cmd = "\n".join(["cp $(location %s) $(location %s)" %
(src, dest) for src, dest in zip(all_headers, unified_header_paths)]),
)
The files would end up like this after copying:
unified_gsl/gsl/gsl_math.h
unified_gsl/gsl/gsl_sys.h
And then you can write a cc_library like:
cc_library(
name = "gsl_headers",
hdrs = [":unified_gsl"],
strip_include_prefix = "unified_gsl/",
)
cc_library.hdrs is looking for files, so it will grab all the outputs from the genrule.
If you want to do more complicated things with the files than just moving them around, consider a full custom rule. If you include all the copied headers in your DefaultInfo.files, then just passing the target's label to cc_library.hdrs will work like it does with the genrule.
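Flattening every header into one directory only works if no two headers share a file name, since each output path of the genrule has to be distinct. A minimal Python sketch of the remapping with that check follows. The helper flatten_headers, the unified_gsl/gsl/ destination prefix, and the sample paths are assumptions for illustration, not part of GSL or Bazel.

```python
def flatten_headers(headers, dest_dir="unified_gsl/gsl/"):
    """Map each header to dest_dir + its file name.

    Returns (by_dest, collisions): by_dest maps destination path to the
    source that claimed it first, collisions lists (first, later) source
    pairs whose file names clash.
    """
    by_dest = {}
    collisions = []
    for src in headers:
        dest = dest_dir + src.split("/")[-1]
        if dest in by_dest:
            collisions.append((by_dest[dest], src))
        else:
            by_dest[dest] = src
    return by_dest, collisions


# Hypothetical header list, for illustration only.
headers = ["gsl/err/gsl_errno.h", "gsl/sys/gsl_sys.h", "gsl/gsl_math.h"]
by_dest, collisions = flatten_headers(headers)
for dest, src in by_dest.items():
    print("cp %s %s" % (src, dest))
if collisions:
    print("clashing file names:", collisions)
```

If collisions is non-empty, the clashing headers would need distinct destination names before a single flat directory can hold them.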

Bazel: share macro between multiple http_archive BUILD files

My project depends on some external libraries which I have to bazelfy myself. Thus, my WORKSPACE:
http_archive(
name = "external_lib_component1",
build_file = "//third_party:external_lib_component1.BUILD",
sha256 = "xxx",
urls = ["https://example.org/external_lib_component1.tar.gz"],
)
http_archive(
name = "external_lib_component2",
build_file = "//third_party:external_lib_component2.BUILD",
sha256 = "yyy",
urls = ["https://example.org/external_lib_component2.tar.gz"],
)
...
The two entries above are similar, and external_lib_component{1, 2}.BUILD share a lot of code.
What is the best way to share code (macros) between them?
Just putting a shared_macros.bzl file into third_party/ won't work, because it will not be copied into
the archive location on build (only the build_file is copied).
If you place a .bzl file such as ./third_party/shared_macros.bzl into your tree, as you've mentioned, then in the //third_party:external_lib_component1.BUILD and //third_party:external_lib_component2.BUILD files you provide for your external dependencies, you can load symbols from that shared file using:
load("@//third_party:shared_macros.bzl", ...)
Labels starting with @// refer to packages from the main repository, even when used in an external dependency (labels starting with // would otherwise be rooted in that external repository). You can check the docs on labels, in particular the last paragraph.
Alternatively you can also refer to the "parent" project by its name. If in your WORKSPACE file you've had:
workspace(name = "parent")
You could say:
load("@parent//third_party:shared_macros.bzl", ...)
Note: in versions prior to 2.0.0 you might want to add --incompatible_remap_main_repo if you mixed both of the above approaches in your project.
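The rooting rule for labels can be modelled in a few lines. This is a simplified Python sketch rather than Bazel's actual label parser: resolve_label is a hypothetical helper, it handles only absolute labels of the three forms shown in this answer, it represents the main repository by the empty string, and it does not model the mapping of a workspace name such as parent back to the main repository.

```python
def resolve_label(label, current_repo):
    """Return (repo, path) for an absolute label seen from current_repo.

    "" stands for the main repository. Only "//pkg:target",
    "@//pkg:target" and "@repo//pkg:target" are handled.
    """
    if label.startswith("//"):
        # No repository given: rooted in whichever repository uses the label.
        return current_repo, label
    if label.startswith("@"):
        repo, sep, rest = label[1:].partition("//")
        if not sep:
            raise ValueError("unsupported label: " + label)
        return repo, "//" + rest
    raise ValueError("unsupported label: " + label)


# Seen from inside the external repository external_lib_component1:
print(resolve_label("//third_party:shared_macros.bzl", "external_lib_component1"))
print(resolve_label("@//third_party:shared_macros.bzl", "external_lib_component1"))
```

The first call stays inside the external repository, which is why a plain // label cannot reach the shared file. The second one points at the main repository.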

How to generate cc_library from an output directory from a genrule?

I have a binary that takes as input a single file and produces an unknown number of header and source C++ files into a single directory. I would like to be able to write a target like:
x_library(
name = "my_x_library",
src = "source.x",
)
where x_library is a macro that ultimately produces the cc_library from the output files. However, I can't bundle all the output files inside the rule implementation or inside the macro. I tried this answer but it doesn't seem to work anymore.
What's the common solution to this problem? Is it possible at all?
Small example of a macro using a genrule (not a huge fan) to get one C file and one header and provide them as a cc_library:
def x_library(name, src):
    srcfile = "{}.c".format(name)
    hdrfile = "{}.h".format(name)
    native.genrule(
        name = "files_{}".format(name),
        srcs = [src],
        outs = [srcfile, hdrfile],
        # $(location ...) resolves the tool's path instead of assuming
        # it sits in the working directory.
        cmd = "$(location generator.sh) $< $(OUTS)",
        tools = ["generator.sh"],
    )
    native.cc_library(
        name = name,
        srcs = [srcfile],
        hdrs = [hdrfile],
    )
Then use it like this:
load(":myfile.bzl", "x_library")
x_library(
name = "my_x_library",
src = "source.x",
)
cc_binary(
name = "tgt",
srcs = ["mysrc.c"],
deps = ["my_x_library"],
)
You should be able to extend that to any number of files (and to C++ content; IIRC the suffixes are used to decide automatically how to call the tools), as long as the mapping from generator input to generated content is known and stable (generally a good thing for a build). Otherwise you can no longer use a genrule, and you need a custom rule (probably a good thing anyway) that uses a TreeArtifact as described in the linked answer. Or two of them, one with a .cc suffix and one with a .hh suffix, so that you can pass them to cc_library.

How to use same filegroup definition in different subprojects

I need to declare specific resources for some of my subprojects, and I'm doing it in the following way:
filegroup(
name = "some_resources",
visibility = ["//:app"],
srcs = glob([
"src/my/resources/**/*.resources",
]),
)
In every subproject, however, the path where the resources can be found is the same. My question is: what is the most bazelian (bazelish?) way to minimize code duplication in this particular case?
Basically, I want to call something like expose_some_resources() in the relevant subprojects and then make these resources visible to every app.
You can put the filegroup into a macro in a .bzl file, and load and run that macro in the relevant subprojects.
so something like:
workspace/resources.bzl:
def expose_some_resources():
    native.filegroup(
        name = "some_resources",
        visibility = ["//:app"],
        srcs = native.glob([
            "src/my/resources/**/*.resources",
        ]),
    )
workspace/subproject/BUILD:
load("//:resources.bzl", "expose_some_resources")
expose_some_resources()
You might also consider adding some error checking to the macro, like checking that the macro is called only once per package using native.existing_rule, or checking that the glob returns 1 or more files.

How do I unzip a file in bazel properly if I don't know the contents of the zip?

I want to define a rule that unzips a given zip file. However, I don't know the contents of the zip, so I cannot specify outs in a genrule, for example. This seems like a common problem, and googling around leads me to people who have encountered similar scenarios, but I haven't yet seen a specific example of how to solve this.
I want something like:
genrule(
name="unzip",
src="file.zip",
outs=glob(["**"]), # except you're not allowed to use glob here
cmd = "unzip $(location file)",
)
You could use a Workspace Rule to create a BUILD file for the zip that globs everything.
Something like this in your WORKSPACE file:
new_http_archive(
name = "my_zip",
url = "http://example.com/my_zip.zip",
build_file_content = """
filegroup(
name = "srcs",
srcs = glob(["**"]),
visibility = ["//visibility:public"]
)
"""
)
Then from a BUILD file you can reference this as an input using @my_zip//:srcs
