Installing python packages with extras using Bazel pip_parse - bazel

I use Python Rules for Bazel to build my python projects. I use pip_parse to install pip packages, as described in the guide, but that doesn't seem to be working for packages with extras.
For example, I have the following dependency in my requirements.txt:
ray[air,data,default]==2.1.0
WORKSPACE
pip_parse(
    name = "pip",
    python_interpreter_target = interpreter,
    requirements_lock = "//python:requirements.txt",
)
load("@pip//:requirements.bzl", "install_deps")
install_deps()
BUILD.bazel
load("@pip//:requirements.bzl", "requirement")
py_library(
    name = "lib",
    srcs = glob(["*.py"]),
    deps = [
        requirement("ray"),
    ],
)
py_binary(
    name = "app",
    srcs = ["app.py"],
    deps = [":lib"],
)
When I run py_binary with bazel run :app I see the following error:
File "/private/var/tmp/_bazel_andrii/.../app.py", line 2, in <module>
import ray
File "/private/var/tmp/_bazel_andrii/...pip_ray/site-packages/ray/__init__.py", line 171, in <module>
from ray import data # noqa: E402,F401
ImportError: cannot import name 'data' from partially initialized module 'ray' (most likely due to a circular import)
If I change BUILD.bazel to use requirement("ray[data]") instead, I see a different error:
invalid label '@pip_ray[data]//:pkg' in element 1 of attribute 'deps' in 'py_library' rule: invalid repository name '@pip_ray[data]': workspace names may contain only A-Z, a-z, 0-9, '-', '_' and '.'
How do I install and use pip packages with extras?
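The second error can be reproduced outside Bazel. The sketch below assumes, based only on the error message above, that the label is formed by prefixing the requirement string with pip_. The names hypothetical_label, strip_extras, and is_valid_repo_name are illustrative and are not part of rules_python. It shows that the bracketed extras suffix is what makes the repository name invalid, while the bare package name passes the character check quoted in the error.

```python
import re

# Character set quoted in the Bazel error message:
# "A-Z, a-z, 0-9, '-', '_' and '.'"
_VALID_REPO_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def strip_extras(requirement_name):
    """Drop an extras suffix such as "[data]" from a requirement name."""
    return requirement_name.split("[", 1)[0].strip()


def hypothetical_label(requirement_name, hub="pip"):
    """Assumed label scheme, inferred from the error message only."""
    return "@{}_{}//:pkg".format(hub, requirement_name)


def is_valid_repo_name(name):
    """Check a repository name against the quoted character set."""
    return bool(_VALID_REPO_NAME.match(name))


if __name__ == "__main__":
    for req in ("ray", "ray[data]"):
        repo = "pip_" + req
        print(req, "->", hypothetical_label(req), "valid:", is_valid_repo_name(repo))
```

Under that assumption, only the bare name can be passed to requirement(); the extras would have to be expressed in requirements.txt, as the question already does.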

Related

Generating C++ files via py_binary and genrule

I have a Python script named blob_to_cpp.py (located at scripts/blob_to_cpp.py relative to the WORKSPACE.bazel file). The Python script takes an input file (e.g. weights/rt_alb.tza) and generates from it a C++ header (.h) and source file (.cpp) that I want to add to a cc_binary.
The source code of my minimal reproducible example can be found here.
The Python script can be called via:
bazel run //:blob_to_cpp -- -o weights/rt_alb.cpp -H weights/rt_alb.h weights/rt_alb.tza
I try to use a genrule to invoke the python script (bazelized via py_binary as //:blob_to_cpp)
bazel/odin_generate_cpp_from_blob.bzl:
"""
SPDX-FileCopyrightText: 2023 Julian Amann <dev@vertexwahn.de>
SPDX-License-Identifier: Apache-2.0
"""
def generate_cpp_from_blob_cc_library(name, **kwargs):
    native.genrule(
        name = "%s_weights_gen" % name,
        srcs = ["weights/" + name],
        outs = [
            "weights/" + name[0:-4] + ".cpp",
            "weights/" + name[0:-4] + ".h",
        ],
        cmd = "./$(location //:blob_to_cpp) weights/%s -o weights/%s.cpp -H weights/%s.h" % (name, name[0:-4], name[0:-4]),
        tools = ["//:blob_to_cpp"],
    )
    native.cc_library(
        name = name,
        srcs = ["weights/" + name[0:-4] + ".cpp"],
        hdrs = ["weights/" + name[0:-4] + ".h"],
        **kwargs
    )
When the generate_cpp_from_blob_cc_library Bazel macro is invoked, I receive the following error messages (bazel build //:Demo):
ERROR: /Users/vertexwahn/dev/Piper/BazelDemos/intermediate/Cpp/BlobToCpp/BUILD.bazel:14:34: declared output 'weights/rt_alb.cpp' was not created by genrule. This is probably because the genrule actually didn't create this output, or because the output was a directory and the genrule was run remotely (note that only the contents of declared file outputs are copied from genrules run remotely)
ERROR: /Users/vertexwahn/dev/Piper/BazelDemos/intermediate/Cpp/BlobToCpp/BUILD.bazel:14:34: declared output 'weights/rt_alb.h' was not created by genrule. This is probably because the genrule actually didn't create this output, or because the output was a directory and the genrule was run remotely (note that only the contents of declared file outputs are copied from genrules run remotely)
ERROR: /Users/vertexwahn/dev/Piper/BazelDemos/intermediate/Cpp/BlobToCpp/BUILD.bazel:14:34: Executing genrule //:rt_alb.tza_weights_gen failed: not all outputs were created or valid
Target //:Demo failed to build
My goal is to generate the files weights/rt_alb.cpp and weights/rt_alb.h. I need them in the weights folder since my cc_binary is expecting that the header file is within the weights folder (#include "weights/rt_alb.h").
My BUILD.bazel file looks like this:
load("//bazel:odin_generate_cpp_from_blob.bzl", "generate_cpp_from_blob_cc_library")
py_binary(
    name = "blob_to_cpp",
    srcs = ["scripts/blob_to_cpp.py"],
    data = ["weights/rt_alb.tza"],
)
generate_cpp_from_blob_cc_library(
    name = "rt_alb.tza",
)
cc_binary(
    name = "Demo",
    srcs = ["main.cpp"],
    deps = [":rt_alb.tza"],
)
Any hints to get this working are welcome!
The problem
declared output 'weights/rt_alb.cpp' was not created by genrule
usually means the command in the genrule is putting the files someplace other than where bazel expects them. You can use $(location target) for inputs and outputs, as well as for tools:
# Copyright 2023 Google LLC.
# SPDX-License-Identifier: Apache-2.0
def generate_cpp_from_blob_cc_library(name, **kwargs):
    src = "weights/" + name
    cpp_out = "weights/" + name[0:-4] + ".cpp"
    header_out = "weights/" + name[0:-4] + ".h"
    native.genrule(
        name = "%s_weights_gen" % name,
        srcs = [src],
        outs = [
            cpp_out,
            header_out,
        ],
        cmd = ("./$(location //:blob_to_cpp) $(location {src}) " +
               "-o $(location {cpp_out}) " +
               "-H $(location {header_out})").format(
            src = src,
            cpp_out = cpp_out,
            header_out = header_out,
        ),
        tools = ["//:blob_to_cpp"],
    )
    native.cc_library(
        name = name,
        srcs = [cpp_out],
        hdrs = [header_out],
        **kwargs
    )
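To see what the macro hands to the genrule, the same string construction can be run as plain Python, since the slicing and str.format calls used here behave the same way in Starlark. build_cmd is a hypothetical helper written only for this illustration. Bazel replaces each $(location ...) with the real path at build time, which is why the tool then writes its outputs where Bazel expects them.

```python
def build_cmd(name, tool="//:blob_to_cpp"):
    """Rebuild the genrule cmd string from the macro for a given blob name.

    `name` is expected to end in a four-character suffix such as ".tza",
    because the macro strips it with name[0:-4].
    """
    stem = name[0:-4]
    src = "weights/" + name
    cpp_out = "weights/" + stem + ".cpp"
    header_out = "weights/" + stem + ".h"
    return ("./$(location {tool}) $(location {src}) " +
            "-o $(location {cpp_out}) " +
            "-H $(location {header_out})").format(
        tool=tool,
        src=src,
        cpp_out=cpp_out,
        header_out=header_out,
    )


if __name__ == "__main__":
    print(build_cmd("rt_alb.tza"))
```

Note that the name[0:-4] slice silently produces a wrong stem for any input whose extension is not exactly three characters plus the dot.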

Reuse different parts of downloaded package (directory output)?

New to Bazel, so please bear with me :) I have a genrule which basically downloads and unpacks a package:
genrule(
    name = "extract_pkg",
    srcs = ["@deb_pkg//file:pkg.deb"],
    outs = ["pkg_dir"],
    cmd = "dpkg-deb --extract $< $(@D)/pkg_dir",
)
Naturally, pkg_dir here is a directory. Another rule uses this rule as input to create an executable, but the main point is that I now need to add a rule (or something) that lets me use some headers from that package. That rule is used as an input to a cc_library, which is then used in other parts of the repository to get access to the headers. I tried this:
genrule(
    name = "pkg_headers",
    srcs = [":extract_pkg"],
    outs = [
        "pkg_dir/usr/include/pkg/h1.h",
        "pkg_dir/usr/include/pkg/h2.h",
    ],
)
But it seems Bazel doesn't like the fact that both rules use the same directory as output, even though the second one doesn't do anything (?):
output file 'pkg_dir' of rule 'extract_pkg' conflicts with output file 'pkg_dir/usr/include/pkg/h1.h' of rule 'pkg_headers'
It works fine if I use different "root" directory for both rules, but I think there must be some better way to do this.
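The conflict reported above is a prefix conflict: one declared output (pkg_dir) is a parent directory of another declared output. The following is a rough model of that check in Python, not Bazel's actual implementation; find_prefix_conflicts is a hypothetical helper name.

```python
from itertools import combinations
from pathlib import PurePosixPath


def find_prefix_conflicts(outputs):
    """Return (parent, child) pairs where one output path contains another."""
    conflicts = []
    for a, b in combinations(sorted(set(outputs)), 2):
        path_a, path_b = PurePosixPath(a), PurePosixPath(b)
        if path_a in path_b.parents:
            conflicts.append((a, b))
        elif path_b in path_a.parents:
            conflicts.append((b, a))
    return conflicts


if __name__ == "__main__":
    declared = [
        "pkg_dir",                       # output of extract_pkg
        "pkg_dir/usr/include/pkg/h1.h",  # outputs of pkg_headers
        "pkg_dir/usr/include/pkg/h2.h",
    ]
    for parent, child in find_prefix_conflicts(declared):
        print("output %r conflicts with %r" % (parent, child))
```

This also matches the observation that giving the two rules different root directories makes the error go away: no path is then a prefix of another.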
EDIT
I tried to use declare_directory as follows (compiled from different sources):
unpack_deb.bzl:
def _unpack_deb_impl(ctx):
    input_deb_file = ctx.file.deb
    output_dir = ctx.actions.declare_directory(ctx.attr.name + ".cc")
    print(input_deb_file.path)
    print(output_dir.path)
    ctx.actions.run_shell(
        inputs = [input_deb_file],
        outputs = [output_dir],
        arguments = [input_deb_file.path, output_dir.path],
        progress_message = "Unpacking %s to %s" % (input_deb_file.path, output_dir.path),
        command = "dpkg-deb --extract \"$1\" \"$2\"",
    )
    return [DefaultInfo(files = depset([output_dir]))]
unpack_deb = rule(
    implementation = _unpack_deb_impl,
    attrs = {
        "deb": attr.label(
            mandatory = True,
            allow_single_file = True,
            doc = "The .deb file to be unpacked",
        ),
    },
    doc = """
Unpacks a .deb file and returns a directory.
""",
)
BUILD.bazel:
load(":unpack_deb.bzl", "unpack_deb")
unpack_deb(
    name = "pkg_dir",
    deb = "@deb_pkg//file:pkg.deb",
)
cc_library(
    name = "headers",
    linkstatic = True,
    srcs = ["pkg_dir"],
    hdrs = [
        "pkg_dir.cc/usr/include/pkg/h1.h",
        "pkg_dir.cc/usr/include/pkg/h2.h",
    ],
    strip_include_prefix = "pkg_dir.cc/usr/include",
)
The trick with adding .cc so the input can be accepted by cc_library was stolen from this answer. However the command fails on
ERROR: missing input file 'blah/blah/pkg_dir.cc/usr/include/pkg/h1.h'
From the library.
When I run with debug, I can see the command being "executed" (strange thing is that I don't always see this printout):
SUBCOMMAND: # //blah/pkg:pkg_dir [action 'Unpacking tmp/deb_pkg/file/pkg.deb to blah/pkg/pkg_dir.cc', configuration: xxxx]
(cd /home/user/.../execroot/src && \
exec env - \
/bin/bash -c 'dpkg-deb --extract "$1" "$2"' '' tmp/deb_pkg/file/pkg.deb bazel-out/.../pkg/pkg_dir.cc)
After execution, bazel-out/.../pkg/pkg_dir.cc exists but is empty. If I run the command manually it extracts files correctly. What might be the reason? Also, is it correct that there's an empty string directly after bash command line string?
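On the last point: the empty string is expected. With sh -c or bash -c, the first argument after the command string is assigned to $0, and the remaining arguments become $1, $2, and so on. That is why the SUBCOMMAND line shows '' before the two paths. A small Python sketch of the same convention, assuming a POSIX sh is on PATH (the helper name and the sample arguments are illustrative):

```python
import subprocess


def run_with_positional_args(command, *args):
    """Run `sh -c command` with an empty $0, then args as $1, $2, ..."""
    result = subprocess.run(
        ["sh", "-c", command, ""] + list(args),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Prints the values of $0, $1 and $2 separated by "|".
    print(run_with_positional_args(
        'printf "%s|%s|%s" "$0" "$1" "$2"', "in.deb", "out_dir"))
```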
Bazel's genrule doesn't work very well with directory outputs. See https://docs.bazel.build/versions/master/be/general.html#general-advice
Bazel mostly works with individual files, although there's some support for working with directories in Starlark rules with https://docs.bazel.build/versions/master/skylark/lib/actions.html#declare_directory
Your best bet is probably to extract all the files you're interested in in the genrule, then create filegroups for the different groups of files:
genrule(
    name = "extract_pkg",
    srcs = ["@deb_pkg//file:pkg.deb"],
    outs = [
        "pkg_dir/usr/include/pkg/h1.h",
        "pkg_dir/usr/include/pkg/h2.h",
        "pkg_dir/other_files/file1",
        "pkg_dir/other_files/file2",
    ],
    cmd = "dpkg-deb --extract $< $(@D)/pkg_dir",
)
filegroup(
    name = "pkg_headers",
    srcs = [
        ":pkg_dir/usr/include/pkg/h1.h",
        ":pkg_dir/usr/include/pkg/h2.h",
    ],
)
filegroup(
    name = "pkg_other_files",
    srcs = [
        ":pkg_dir/other_files/file1",
        ":pkg_dir/other_files/file2",
    ],
)
If you've seen glob, you might be tempted to use glob(["pkg_dir/usr/include/pkg/*.h"]) or similar for the srcs of the filegroup, but note that glob works only with "source files", which means files already on disk, not with the outputs of other rules.
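Since the outs list has to be written out by hand, a small script can at least generate it. The sketch below assumes you already have the archive's member paths as plain strings (for example, copied from a listing of the package); outs_from_members and the sample paths are hypothetical and only mirror the names used above.

```python
import posixpath


def outs_from_members(members, prefix="pkg_dir"):
    """Turn archive member paths into genrule `outs` entries under `prefix`.

    Directory entries (paths ending in "/") are skipped, because a genrule
    output has to be a file.
    """
    outs = []
    for member in members:
        if member.endswith("/"):
            continue
        # "./usr/x" -> "usr/x"; also drop a leading "/" so join keeps the prefix.
        relative = posixpath.normpath(member).lstrip("/")
        outs.append(posixpath.join(prefix, relative))
    return sorted(outs)


if __name__ == "__main__":
    members = [
        "./usr/include/pkg/",
        "./usr/include/pkg/h1.h",
        "./usr/include/pkg/h2.h",
        "./other_files/file1",
    ]
    for out in outs_from_members(members):
        print('    "%s",' % out)
```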
There are rules for creating debs, but I'm not aware of rules for importing them. It's possible to write such rules using Starlark:
https://docs.bazel.build/versions/master/skylark/repository_rules.html
With repository rules, it's possible to avoid having to explicitly write out all the files you want to extract, among other things. Might be more work than you want to do though.

In Bazel, how can I make a C++ library depend on a general rule?

I have a library that depends on graphics files that are generated by a shell script.
I would like the library, when it is compiled, to use the shell script to generate the graphics files, which should be copied as if it were a 'data' statement, but whenever I try to make the library depend on the genrule, I get
in deps attribute of cc_library rule //graphics_assets
genrule rule '//graphics_assets:assets_gen_rule' is misplaced here
(expected cc_inc_library, cc_library, objc_library or
cc_proto_library)
# This is the correct format.
# Here we want to run all the shader .glsl files through the program
# file_utils:archive_tool (which we also build elsewhere) and copy the
# output .arc file to the data.
# 1. List the source files
filegroup(
    name = "shader_source",
    srcs = glob([
        "shaders/*.glsl",
    ]),
)
# 2. invoke file_utils::archive_tool on the shaders
genrule(
    name = "shaders_gen_rule",
    srcs = [":shader_source"],
    outs = ["shaders.arc"],
    cmd = "echo $(locations shader_source) > temp.txt ; \
        $(location //common/file_utils:archive_tool) \
        --create_from_list=temp.txt \
        --archive $(OUTS) ; \
        $(location //common/file_utils:archive_tool) \
        --ls --archive $(OUTS) ",
    tools = ["//common/file_utils:archive_tool"],
)
# 3. when a binary depends on this tool the arc file will be copied.
# This is the thing I had trouble with
cc_library(
    name = "shaders",
    srcs = [],  # Something
    data = [":shaders_gen_rule"],
    linkstatic = 1,
)

How can I create a custom bazel build rule that uses the runfiles path of another rule?

I'm trying to create a custom build rule to build a pip package from a py_binary output. To create my pip package I want to invoke a shell script. The shell script builds a pip package by creating a zip file from the runfiles of the py_binary output.
For example suppose I have
py_binary(
    name = "some_binary",
    srcs = ["some_binary.py"],
    srcs_version = "PY2AND3",
)
Building this rule produces
bazel-bin/some_binary.runfiles
I would now like to create a custom build rule that will invoke my shell script with the location bazel-bin/some_binary.runfiles
I tried creating a macro
def build_pip_package(
        name, py_binary=None, setup_file=None):
    """Create a pip package from a py_binary.
    The source file should be a text file with python formatting i.e.
    Args:
      name: Name for the rule.
      py_binary: Build rule producing the py_binary
      setup_file: Build rule producing the setup.py file to use to produce
        the package.
    """
    output = "somefile"
    native.genrule(
        name=name,
        outs=[output],
        cmd="echo $(location //:build_pip_package_script) "
            + "--py_runfiles_path=$(locations %s)" % py_binary
            + " --setup_file=$(location %s) " % setup_file,
        tools=["//:build_pip_package_script"],
        srcs=[setup_file, py_binary])
This ends up invoking my shell script with
bazel-out/local-fastbuild/bin/some_binary/model_train some_binary.py
How can I invoke my script with the location of the runfiles directory corresponding to the some_binary target?
If you put the py_binary in the tools attribute instead of srcs, Bazel will include its runfiles tree. You can then access it via "$(location %s).runfiles" % py_binary.
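As a minimal sketch of that suggestion, the command string for the genrule could be assembled as below. The string handling is plain Python and behaves the same in Starlark. pip_package_cmd and the //:setup.py label are hypothetical names for illustration, while the two flags are the ones from the question's macro.

```python
def pip_package_cmd(script_label, py_binary, setup_file):
    """Build a genrule cmd that passes the binary's runfiles directory.

    Assumes `py_binary` is listed in the genrule's `tools`, so that
    "$(location <py_binary>).runfiles" refers to an existing runfiles tree.
    """
    return (
        "$(location {script}) "
        "--py_runfiles_path=$(location {binary}).runfiles "
        "--setup_file=$(location {setup})"
    ).format(script=script_label, binary=py_binary, setup=setup_file)


if __name__ == "__main__":
    print(pip_package_cmd(
        "//:build_pip_package_script", "//:some_binary", "//:setup.py"))
```

Note the singular $(location ...) for the binary: the plural $(locations ...) in the question expands to every file of the target, which is where the extra some_binary.py argument came from.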

Bazel: copy multiple files to binary directory

I need to copy some files to binary directory while preserving their names. What I've got so far:
filegroup(
    name = "resources",
    srcs = glob(["resources/*.*"]),
)
genrule(
    name = "copy_resources",
    srcs = ["//some/package:resources"],
    outs = [ ],
    cmd = "cp $(SRCS) $(@D)",
    local = 1,
    output_to_bindir = 1,
)
Now I have to specify file names in outs but I can't seem to figure out how to resolve the labels to obtain the actual file names.
To make a filegroup available to a binary (executed using bazel run) or to a test (when executed using bazel test) then one usually lists the filegroup as part of the data of the binary, like so:
cc_binary(
    name = "hello-world",
    srcs = ["hello-world.cc"],
    data = [
        "//your_project/other/deeply/nested/resources:other_test_files",
    ],
)
# known to work at least as of bazel version 0.22.0
Usually the above is sufficient.
However, the executable must then recurse through the directory structure "other/deeply/nested/resources/" in order to find the files from the indicated filegroup.
In other words, when populating the runfiles of an executable, bazel preserves the directory nesting that spans from the WORKSPACE root to all the packages enclosing the given filegroup.
Sometimes, this preserved directory nesting is undesirable.
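To make the nesting concrete, the sketch below maps a file label to the path a program would use relative to the workspace's runfiles root, and contrasts it with the flattened layout this answer goes on to build. Both helper names and the data.txt file are hypothetical, and only labels of the form //package:name are handled.

```python
import posixpath


def runfiles_relative_path(label):
    """Map "//pkg/path:name" to "pkg/path/name" (nesting preserved)."""
    if not label.startswith("//") or ":" not in label:
        raise ValueError("expected a label of the form //package:name")
    package, name = label[2:].split(":", 1)
    return posixpath.join(package, name)


def flattened_path(consumer_package, label):
    """Path of the same file once copied, by basename, into consumer_package."""
    name = label.split(":", 1)[1]
    return posixpath.join(consumer_package, posixpath.basename(name))


if __name__ == "__main__":
    label = "//your_project/other/deeply/nested/resources:data.txt"
    print(runfiles_relative_path(label))
    print(flattened_path("your_project/app", label))
```

Flattening by basename only works while the basenames are unique across all the filegroups being merged.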
THE CHALLENGE:
In my case, I had several filegroups located at various points in my project directory tree, and I wanted all the individual files of those groups to end up side-by-side in the runfiles collection of the test binary that would consume them.
My attempts to do this with a genrule were unsuccessful.
In order to copy individual files from multiple filegroups, preserving the basename of each file but flattening the output directory, it was necessary to create a custom rule in a bzl bazel extension.
Thankfully, the custom rule is fairly straightforward.
It uses cp in a shell command much like the unfinished genrule listed in the original question.
The extension file:
# contents of a file you create named: copy_filegroups.bzl
# known to work in bazel version 0.22.0
def _copy_filegroup_impl(ctx):
    # t.files is a depset; to_list() is needed because newer Bazel
    # versions no longer allow iterating over a depset directly.
    all_input_files = [
        f for t in ctx.attr.targeted_filegroups for f in t.files.to_list()
    ]
    all_outputs = []
    for f in all_input_files:
        out = ctx.actions.declare_file(f.basename)
        all_outputs += [out]
        ctx.actions.run_shell(
            outputs=[out],
            inputs=depset([f]),
            arguments=[f.path, out.path],
            # This is what we're all about here. Just a simple 'cp' command.
            # Copy the input to CWD/f.basename, where CWD is the package where
            # the copy_filegroups_to_this_package rule is invoked.
            # (To be clear, the files aren't copied right to where your BUILD
            # file sits in source control. They are copied to the 'shadow tree'
            # parallel location under `bazel info bazel-bin`)
            command="cp $1 $2")
    # Small sanity check
    if len(all_input_files) != len(all_outputs):
        fail("Output count should be 1-to-1 with input count.")
    return [
        DefaultInfo(
            files=depset(all_outputs),
            runfiles=ctx.runfiles(files=all_outputs))
    ]
copy_filegroups_to_this_package = rule(
    implementation=_copy_filegroup_impl,
    attrs={
        "targeted_filegroups": attr.label_list(),
    },
)
Using it:
# inside the BUILD file of your exe
load(
    "//your_project:copy_filegroups.bzl",
    "copy_filegroups_to_this_package",
)
copy_filegroups_to_this_package(
    name = "other_files_unnested",
    # you can list more than one filegroup:
    targeted_filegroups = ["//your_project/other/deeply/nested/library:other_test_files"],
)
cc_binary(
    name = "hello-world",
    srcs = ["hello-world.cc"],
    data = [
        ":other_files_unnested",
    ],
)
You can clone a complete working example here.
