Bazel: How to extend an existing Docker image?

I know that in a Dockerfile I can extend an existing Docker image using:
FROM python/python
RUN pip install requests
But how do I extend an image in Bazel?
I am not sure if I should use container_import, but with that I am getting the following error:
container_import(
    name = "postgres",
    base_image_registry = "some.artifactory.com",
    base_image_repository = "/existing-image:v1.5.0",
    layers = [
        "//docker/new_layer",
    ],
)
root@ba5cc0a3f0b7:/tcx# bazel build pkg:postgres-instance --verbose_failures --sandbox_debug
ERROR: /tcx/docker/postgres-operator/BUILD.bazel:12:17: in container_import rule //docker/postgres-operator:postgres:
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/2f47bbce04529f9da11bfed0fc51707c/external/io_bazel_rules_docker/container/import.bzl", line 98, column 35, in _container_import_impl
"config": ctx.files.config[0],
Error: index out of range (index is 0, but sequence has 0 elements)
ERROR: Analysis of target '//pkg:postgres-instance' failed; build aborted: Analysis of target '//docker/postgres-operator:postgres' failed
INFO: Elapsed time: 0.209s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (1 packages loaded, 2 targets configured)

container_import is the correct rule to import an existing image. However, all it does is import; it doesn't pull the image from anywhere. I think you're looking for container_pull instead, which pulls an image from a repository and then automatically uses container_import to translate it for the other rules_docker rules.
To add a new layer, use container_image with base set to the imported image and tars set to the additional files you want to add. Or, if you want to add things in other formats, see the docs for the alternatives to tars (like debs or files).
Putting it all together, something like this in your WORKSPACE:
container_pull(
    name = "postgres",
    registry = "some.artifactory.com",
    repository = "existing-image",
    tag = "v1.5.0",
)
and then this in a BUILD file:
container_image(
    name = "postgres_plus",
    base = "@postgres//image",
    tars = ["//docker/new_layer"],
)
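For completeness, the //docker/new_layer target passed to tars could be an ordinary pkg_tar; a minimal sketch, where the file and directory names are hypothetical (and note that on newer rules_pkg versions the load path is @rules_pkg//pkg:tar.bzl instead):
load("@rules_pkg//:pkg.bzl", "pkg_tar")

# Collects the files that should become the new layer, rooted at package_dir.
pkg_tar(
    name = "new_layer",
    srcs = ["postgres.conf"],  # hypothetical file to add to the image
    package_dir = "/etc/postgres",
)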
The specific problem you're running into is that container_import.layers isn't for adding new layers; it's for specifying the layers of the image you're importing. If you're doing something unusual, you could fetch those layers some other way (http_archive, checked-in tar files, etc.) and then specify them all by hand instead of using container_pull.
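If you do go the manual route, note what the traceback in the question is actually complaining about: container_import requires a config (the image's config JSON), and ctx.files.config[0] fails with "index out of range" because none was given. A hand-written import looks roughly like this sketch, with hypothetical file names and assuming the tars and config are checked in:
container_import(
    name = "postgres_base",
    # The image's config JSON -- this is the attribute missing in the
    # question, which causes the "index out of range" error.
    config = "existing-image.config.json",
    layers = [
        "layer0.tar",
        "layer1.tar",
    ],
)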

Related

Bazel's container_pull failing to pull aws-cli image

tl;dr: When I try to pull an AWS CLI image from Docker Hub using Bazel, I'm getting odd 404 errors. Pulling other images in the same way works fine.
I'm trying to use Bazel in my monorepo to (among many other things) create several Docker images. One of the Docker images I'm creating uses the verified AWS CLI image as a base.
I'm following the rules_docker documentation along with the examples provided in that repo.
WORKSPACE File:
load("#bazel_tools//tools/build_defs/repo:http.bzl", "http_file")
load("#bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
name = "io_bazel_rules_docker",
sha256 = "b1e80761a8a8243d03ebca8845e9cc1ba6c82ce7c5179ce2b295cd36f7e394bf",
urls = ["https://github.com/bazelbuild/rules_docker/releases/download/v0.25.0/rules_docker-v0.25.0.tar.gz"],
)
load(
"#io_bazel_rules_docker//repositories:repositories.bzl",
container_repositories = "repositories",
)
container_repositories()
load("#io_bazel_rules_docker//repositories:deps.bzl", container_deps = "deps")
container_deps()
load(
"#io_bazel_rules_docker//container:container.bzl",
"container_pull",
)
load("#io_bazel_rules_docker//contrib:dockerfile_build.bzl",
"dockerfile_image")
container_pull(
name = "alpine_linux_amd64",
digest = "sha256:954b378c375d852eb3c63ab88978f640b4348b01c1b3456a024a81536dafbbf4",
registry = "index.docker.io",
repository = "library/alpine",
# tag field is ignored since digest is set
tag = "3.8",
)
container_pull(
name = "aws_cli",
digest = "sha256:abb7e318502e78ec99d85bfa0121d5fbc11d8c49bb95f7f12db0b546ebd5ff99",
registry = "index.docker.io",
repository = "library/amazon",
# tag field is ignored since digest is set
tag = "2.9.9",
)
http_file(
name = "sam_archive",
downloaded_file_path = "aws-sam-cli-linux-x86_64.zip",
sha256 = "74264b224f133461e324e7877ed8218fe38ac2320ba498024f0c297de7bb3e95",
urls = [
"https://github.com/aws/aws-sam-cli/releases/download/v1.67.0/aws-sam-cli-linux-x86_64.zip",
],
)
And BUILD file:
load("#io_bazel_rules_docker//container:container.bzl", "container_image", "container_layer")
load("#io_bazel_rules_docker//contrib:test.bzl", "container_test")
load("#io_bazel_rules_docker//docker/util:run.bzl", "container_run_and_commit")
# Includes the aws-cli installation archive
container_image(
name = "aws_cli",
base = "#aws_cli//image"
)
container_image(
name = "basic_alpine",
base = "#alpine_linux_amd64//image",
cmd = ["Hello World!"],
entrypoint = ["echo"],
)
Building basic_alpine works fine:
$ bazel build //:basic_alpine
INFO: Analyzed target //:basic_alpine (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:basic_alpine up-to-date:
bazel-bin/basic_alpine-layer.tar
INFO: Elapsed time: 1.140s, Critical Path: 0.99s
INFO: 50 processes: 16 internal, 34 linux-sandbox.
INFO: Build completed successfully, 50 total actions
Admittedly I'm new to Bazel and maybe I'm not doing this correctly, but building aws_cli fails:
$ bazel build //:aws_cli
INFO: Repository aws_cli instantiated at:
/home/jdibling/repos/stream-ai.io/products/filedrop/monorepo/WORKSPACE:38:15: in <toplevel>
Repository rule container_pull defined at:
/home/jdibling/.cache/bazel/_bazel_jdibling/4ce73e7de2c4ac9889a94fb9b2da25fc/external/io_bazel_rules_docker/container/pull.bzl:294:33: in <toplevel>
ERROR: An error occurred during the fetch of repository 'aws_cli':
Traceback (most recent call last):
File "/home/jdibling/.cache/bazel/_bazel_jdibling/4ce73e7de2c4ac9889a94fb9b2da25fc/external/io_bazel_rules_docker/container/pull.bzl", line 240, column 13, in _impl
fail("Pull command failed: %s (%s)" % (result.stderr, " ".join([str(a) for a in args])))
Error in fail: Pull command failed: 2022/12/23 08:31:25 Running the Image Puller to pull images from a Docker Registry...
2022/12/23 08:31:29 Image pull was unsuccessful: reading image "index.docker.io/library/amazon@sha256:abb7e318502e78ec99d85bfa0121d5fbc11d8c49bb95f7f12db0b546ebd5ff99": GET https://index.docker.io/v2/library/amazon/manifests/sha256:abb7e318502e78ec99d85bfa0121d5fbc11d8c49bb95f7f12db0b546ebd5ff99: UNAUTHORIZED: authentication required; [map[Action:pull Class: Name:library/amazon Type:repository]]
(/home/jdibling/.cache/bazel/_bazel_jdibling/4ce73e7de2c4ac9889a94fb9b2da25fc/external/go_puller_linux_amd64/file/downloaded -directory /home/jdibling/.cache/bazel/_bazel_jdibling/4ce73e7de2c4ac9889a94fb9b2da25fc/external/aws_cli/image -os linux -os-version -os-features -architecture amd64 -variant -features -name index.docker.io/library/amazon@sha256:abb7e318502e78ec99d85bfa0121d5fbc11d8c49bb95f7f12db0b546ebd5ff99)
ERROR: /home/jdibling/repos/stream-ai.io/products/filedrop/monorepo/WORKSPACE:38:15: fetching container_pull rule //external:aws_cli: Traceback (most recent call last):
File "/home/jdibling/.cache/bazel/_bazel_jdibling/4ce73e7de2c4ac9889a94fb9b2da25fc/external/io_bazel_rules_docker/container/pull.bzl", line 240, column 13, in _impl
fail("Pull command failed: %s (%s)" % (result.stderr, " ".join([str(a) for a in args])))
Error in fail: Pull command failed: 2022/12/23 08:31:25 Running the Image Puller to pull images from a Docker Registry...
2022/12/23 08:31:29 Image pull was unsuccessful: reading image "index.docker.io/library/amazon@sha256:abb7e318502e78ec99d85bfa0121d5fbc11d8c49bb95f7f12db0b546ebd5ff99": GET https://index.docker.io/v2/library/amazon/manifests/sha256:abb7e318502e78ec99d85bfa0121d5fbc11d8c49bb95f7f12db0b546ebd5ff99: UNAUTHORIZED: authentication required; [map[Action:pull Class: Name:library/amazon Type:repository]]
(/home/jdibling/.cache/bazel/_bazel_jdibling/4ce73e7de2c4ac9889a94fb9b2da25fc/external/go_puller_linux_amd64/file/downloaded -directory /home/jdibling/.cache/bazel/_bazel_jdibling/4ce73e7de2c4ac9889a94fb9b2da25fc/external/aws_cli/image -os linux -os-version -os-features -architecture amd64 -variant -features -name index.docker.io/library/amazon@sha256:abb7e318502e78ec99d85bfa0121d5fbc11d8c49bb95f7f12db0b546ebd5ff99)
ERROR: /home/jdibling/repos/stream-ai.io/products/filedrop/monorepo/BUILD:6:16: //:aws_cli depends on @aws_cli//image:image in repository @aws_cli which failed to fetch. no such package '@aws_cli//image': Pull command failed: 2022/12/23 08:31:25 Running the Image Puller to pull images from a Docker Registry...
2022/12/23 08:31:29 Image pull was unsuccessful: reading image "index.docker.io/library/amazon@sha256:abb7e318502e78ec99d85bfa0121d5fbc11d8c49bb95f7f12db0b546ebd5ff99": GET https://index.docker.io/v2/library/amazon/manifests/sha256:abb7e318502e78ec99d85bfa0121d5fbc11d8c49bb95f7f12db0b546ebd5ff99: UNAUTHORIZED: authentication required; [map[Action:pull Class: Name:library/amazon Type:repository]]
(/home/jdibling/.cache/bazel/_bazel_jdibling/4ce73e7de2c4ac9889a94fb9b2da25fc/external/go_puller_linux_amd64/file/downloaded -directory /home/jdibling/.cache/bazel/_bazel_jdibling/4ce73e7de2c4ac9889a94fb9b2da25fc/external/aws_cli/image -os linux -os-version -os-features -architecture amd64 -variant -features -name index.docker.io/library/amazon@sha256:abb7e318502e78ec99d85bfa0121d5fbc11d8c49bb95f7f12db0b546ebd5ff99)
ERROR: Analysis of target '//:aws_cli' failed; build aborted: Analysis failed
INFO: Elapsed time: 4.171s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 0 targets configured)
Just a quick sanity check: should that be library/amazonlinux? AFAICT library/amazon does not exist. However, that one does not have a tag with the sha256 that you specify.
The link you have in the intro is for the amazon/aws-cli image, which does have that tag, so maybe that's the one you meant to pull?
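If amazon/aws-cli is indeed the image you meant, the pull would look something like the sketch below. Note that organization-owned images have no library/ prefix; the digest shown is simply the one from the question, and I haven't verified it exists under this repository:
container_pull(
    name = "aws_cli",
    digest = "sha256:abb7e318502e78ec99d85bfa0121d5fbc11d8c49bb95f7f12db0b546ebd5ff99",
    registry = "index.docker.io",
    repository = "amazon/aws-cli",  # no library/ prefix for org images
    # tag field is ignored since digest is set
    tag = "2.9.9",
)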

Bazel not pulling from remote cache in CI when download_pkgs and install_pkgs used together

I've noticed that when I run a Bazel rule that depends on download_pkgs and then install_pkgs into a Docker image, we aren't getting any cache hits from the remote cache, even though on consecutive runs I can see that the packages fetched by the download_pkgs rule have the same hash.
I found this link from a while back which explains an issue whereby running download_pkgs and then immediately install_pkgs can lead to non-deterministic builds, but I didn't think this would happen when the hashes of the packages from download_pkgs were consistent.
I'm wondering whether anyone else has seen this issue, and whether the fix is the workaround from the above link (push the downloaded packages elsewhere as a tar and then use http_file to fetch them; a sketch of that workaround follows the example rules below), or whether there is some fundamental configuration for running Bazel in Docker with remote caching that I am missing.
example rules below:
download_pkgs(
    name = "download_ruby_apt_packages",
    packages = [
        "ca-certificates",
        "debsums",
        "g++",
        "git",
        "gnupg2",
        "libperconaserverclient20-dev",
        "libssl-dev",
        "make",
        "mysql-common",
        "percona-server-client-5.7",
        "percona-server-common-5.7",
        "ruby2.7",
        "ruby2.7-dev",
        "zlib1g-dev",
    ],
)

install_pkgs(
    name = "ubuntu2004_with_base_pkgs",
    image_tar = "@ubuntu2004//image",
    installables_tar = ":download_ruby_apt_packages.tar",
    installation_cleanup_commands = "rm -rf /var/lib/apt/lists/*",
    output_image_name = "ubuntu2004_with_base_pkgs",
)
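For reference, the workaround from the linked issue might look roughly like this: publish the tar produced by download_pkgs somewhere stable, then fetch it with http_file so Bazel always sees identical bytes. The URL and sha256 below are placeholders. In the WORKSPACE:
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

http_file(
    name = "ruby_apt_packages",
    downloaded_file_path = "packages.tar",
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000",  # placeholder: sha256 of the published tar
    urls = ["https://artifacts.example.com/ruby_apt_packages.tar"],  # placeholder URL
)
and then point install_pkgs at the fetched file instead of the download_pkgs output:
install_pkgs(
    name = "ubuntu2004_with_base_pkgs",
    image_tar = "@ubuntu2004//image",
    installables_tar = "@ruby_apt_packages//file",
    installation_cleanup_commands = "rm -rf /var/lib/apt/lists/*",
    output_image_name = "ubuntu2004_with_base_pkgs",
)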

How do I set an ENV VAR in my WORKSPACE in Bazel

I am trying to use Bazel with Pybind, and it requires that I set the following variables:
"""Repository rule for Python autoconfiguration.
`python_configure` depends on the following environment variables:
* `PYTHON_BIN_PATH`: location of python binary.
* `PYTHON_LIB_PATH`: Location of python libraries.
"""
https://github.com/pybind/pybind11_bazel/blob/master/python_configure.bzl
I don't want to have to pass them in manually when building my libraries; how can I hardcode these env vars in my WORKSPACE?
To set an environment variable for repository rule consumption, you can use the --repo_env command line option. And if you want to include it with every invocation in your workspace, you can add these flags to your .bazelrc file.
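For example, a .bazelrc along these lines (the paths here are hypothetical) would set the variables from the question on every build:
# .bazelrc at the workspace root
build --repo_env=PYTHON_BIN_PATH=/usr/bin/python3
build --repo_env=PYTHON_LIB_PATH=/usr/lib/python3.8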
Now, the wisdom of doing that could be questioned. If it's actually project (repo) configuration and not build host configuration, it would probably make more sense, and be more targeted and explicit, if it were an attribute of the given rule, checked in with the rest of the build configuration.
And looking at the name, there may be another question about specifying a Python configuration (from outside the Bazel build) instead of actually using a correctly resolved Python toolchain (but I have no background in what the given rule is about or what it's trying to accomplish, so this is just a general comment).
To address your comment: I don't know what other factors make it "not accept" the variables, or what exactly that looks like, but if I have this mini-example:
.
├── BUILD
├── WORKSPACE
└── customrule.bzl
Where customrule.bzl reads:
def _run_me(repo_ctx):
    repo_ctx.file(
        "WORKSPACE",
        'workspace(name = "{}")\n'.format(repo_ctx.name),
        executable = False,
    )
    repo_ctx.file(
        "BUILD",
        'exports_files(["var.sh"], visibility=["//visibility:public"])',
        executable = False,
    )
    repo_ctx.file(
        "var.sh",
        "echo {}\n".format(repo_ctx.os.environ.get("var1")),
        executable = True,
    )

wsrule = repository_rule(
    implementation = _run_me,
    environ = ["var1"],
)
The WORKSPACE is:
load(":customrule.bzl", "wsrule")
wsrule(
    name = "extdep",
)
And BUILD:
sh_binary(
    name = "tgt",
    srcs = ["@extdep//:var.sh"],
)
Then I do get:
$ bazel run --repo_env var1=val1 tgt
val1
and:
$ bazel run --repo_env var1=val2 tgt
val2
I.e., this is a way to pass variables to a repo rule, and it does (as such) work.
If you absolutely know you must call the build with some variable set to a certain value (which, as mentioned above, is itself a requirement worth closer examination) and you want that associated with the project/repo, you can always check in a build.sh or similar file that wraps your bazel call to be exactly what it must be. But again, this looks likely not to be entirely "The Right Thing" to do or want.
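Such a wrapper could be as small as the following sketch (the values are again hypothetical):
#!/bin/sh
# build.sh -- pin the environment this project's repo rules expect,
# then forward any extra arguments to bazel.
exec bazel build \
    --repo_env=PYTHON_BIN_PATH=/usr/bin/python3 \
    --repo_env=PYTHON_LIB_PATH=/usr/lib/python3.8 \
    "$@"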

Bazel get location of external dependency as command line arg for py_binary

I need the path to an external (or internal) dependency so I can pass it as an argument to a function inside. We need the location of the folder, not of specific files. Also, sometimes we need the path to the folder containing a shared library generated by cc_library.
Python file
import cppyy
cppyy.add_include_path('path/to/external/dependency/1')
cppyy.add_library_path('path/to/another/external/dependency/2')
cppyy.add_include_path('path/to/another/internal/dependency')
cppyy.include('file/in/external/dependency')
BUILD file
py_binary(
    name = "sample",
    srcs = ["sample.py"],
    deps = [
        "@cppyy_archive//:cppyy",
    ],
    data = [
        "@external-dependency//location:target",
        "//internal-dependency/location:target2",
    ],
)
From https://docs.bazel.build/versions/master/external.html#layout:
You can see the external directory by running:
ls $(bazel info output_base)/external
What the paths in external actually look like depends on the rule used for the archive.
For example, if it's declared using an http_file in the WORKSPACE file:
load("#bazel_tools//tools/build_defs/repo:http.bzl", "http_file")
http_file(
name = "fenix",
urls = ["https://github.com/mozilla-mobile/fenix/archive/v76.0.0-beta.2.tar.gz"],
sha256 = "94050c664e5ec5b66cd2ca9f6a8b898987ab63d9602090533217df1a3f2dc5a9"
)
You will find that v76.0.0-beta.2.tar.gz file as external/fenix/file/downloaded:
user#host:~$ file $(bazel info output_base)/external/fenix/file/downloaded
/home/user/.cache/bazel/_bazel_user/761044447e04744e746cd54d0b4b5056/external/fenix/file/downloaded: gzip compressed data, from Unix, original size modulo 2^32 15759360
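Coming back to the question: rather than hard-coding paths under external/, one option on reasonably recent Bazel is to let Bazel expand the locations via $(rootpath ...) in args and read them from sys.argv. This is a sketch reusing the hypothetical targets from the question, assuming each target produces a single file; note that args only takes effect under bazel run or bazel test:
py_binary(
    name = "sample",
    srcs = ["sample.py"],
    args = [
        # Expanded by Bazel to runfiles-relative paths of the data files.
        "$(rootpath @external-dependency//location:target)",
        "$(rootpath //internal-dependency/location:target2)",
    ],
    data = [
        "@external-dependency//location:target",
        "//internal-dependency/location:target2",
    ],
    deps = ["@cppyy_archive//:cppyy"],
)
and in sample.py:
import os
import sys

# argv[1] / argv[2] are the expanded file paths; dirname gives the
# containing folders, which is what cppyy's add_*_path calls want.
include_dir = os.path.dirname(sys.argv[1])
library_dir = os.path.dirname(sys.argv[2])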

bazel with protobuf / gRPC-gateway / golang - getting started

So I am trying to convert a monorepo of microservices (C#, Go, NodeJS) to use Bazel. I'm just playing with it for now.
To get started, I focused on one Go service and isolated it as a WORKSPACE.
The Go service is a gRPC service that uses protobuf (obviously), grpc-gateway with protoc-gen-swagger, and also protoc-gen-gorm (which does not support Bazel).
The code builds using a command like go build cmd/server/server.go.
I am hoping to get some guidance on how to get started building this project with all its dependencies.
I see several rules available for protobuf/Go, and I am not yet comfortable browsing through them or deciding which is better (I cannot get any of them to work, due to grpc-gateway or protoc-gen-gorm):
- https://github.com/stackb/rules_proto
- https://github.com/bazelbuild/rules_go
- https://github.com/stackb/rules_proto/tree/master/github.com/grpc-ecosystem/grpc-gateway
Code structure looks like this:
/repo
├── svc1
├── svc2
└── svc3
    ├── cmd/server
    │   ├── BUILD.bazel
    │   └── server.go
    ├── pkg  (contains folders and some Go files, with a BUILD.bazel in each)
    ├── proto
    │   ├── BUILD.bazel
    │   └── test.proto
    ├── WORKSPACE
    └── BUILD.bazel
Right now I only work on svc3. Later I will probably move the WORKSPACE to the parent folder.
My WORKSPACE looks like this:
load("#bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
name = "io_bazel_rules_go",
sha256 = "96b1f81de5acc7658e1f5a86d7dc9e1b89bc935d83799b711363a748652c471a",
urls = [
"https://storage.googleapis.com/bazel-mirror/github.com/bazelbuild/rules_go/releases/download/0.19.2/rules_go-0.19.2.tar.gz",
"https://github.com/bazelbuild/rules_go/releases/download/0.19.2/rules_go-0.19.2.tar.gz",
],
)
load("#io_bazel_rules_go//go:deps.bzl", "go_register_toolchains", "go_rules_dependencies")
go_rules_dependencies()
go_register_toolchains()
http_archive(
name = "bazel_gazelle",
urls = [
"https://storage.googleapis.com/bazel-mirror/github.com/bazelbuild/bazel-gazelle/releases/download/0.18.1/bazel-gazelle-0.18.1.tar.gz",
"https://github.com/bazelbuild/bazel-gazelle/releases/download/0.18.1/bazel-gazelle-0.18.1.tar.gz",
],
sha256 = "be9296bfd64882e3c08e3283c58fcb461fa6dd3c171764fcc4cf322f60615a9b",
)
load("#bazel_gazelle//:deps.bzl", "gazelle_dependencies", "go_repository")
gazelle_dependencies()
load("#bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")
git_repository(
name = "com_google_protobuf",
commit = "09745575a923640154bcf307fba8aedff47f240a",
remote = "https://github.com/protocolbuffers/protobuf",
shallow_since = "1558721209 -0700",
)
load("#com_google_protobuf//:protobuf_deps.bzl", "protobuf_deps")
protobuf_deps()
+ a bunch of go_repository() created by Gazelle
Running gazelle created a bunch of BUILD.bazel files for my Go project, one in each folder.
Next to the .proto, I have a generated BUILD.bazel file:
load("#io_bazel_rules_go//go:def.bzl", "go_library")
load("#io_bazel_rules_go//proto:def.bzl", "go_proto_library")
proto_library(
name = "svc_proto",
srcs = ["test.proto"],
visibility = ["//visibility:public"],
deps = [
# the two github below are referenced as go_repository
"#com_github_infobloxopen_protoc_gen_gorm//options:proto_library", # not sure what to put after the colon
"#com_github_grpc_ecosystem_grpc_gateway//protoc-gen-swagger/options:proto_library",
"#go_googleapis//google/api:annotations_proto",
],
)
go_proto_library(
name = "svc_go_proto",
compilers = ["#io_bazel_rules_go//proto:go_grpc"],
importpath = "src/test/proto/v1",
proto = ":svc_proto",
visibility = ["//visibility:public"],
deps = [
"//github.com/infobloxopen/protoc-gen-gorm/options:go_default_library",
"//github.com/grpc-ecosystem/grpc-gateway/protoc-gen-swagger/options:go_default_library",
"#go_googleapis//google/api:annotations_go_proto",
],
)
go_library(
name = "go_default_library",
embed = [":svc_go_proto"],
importpath = "src/test/proto/v1",
visibility = ["//visibility:public"],
)
Now the questions:
1. I'm not sure what to put to reference other proto files. Is "@com_github_infobloxopen_protoc_gen_gorm//options:proto_library" right? And I'm not sure this is the best way to reference other external libraries from git.
2. If I build the above using bazel build //proto/v1:svc_proto, I get: no such target '@com_github_grpc_ecosystem_grpc_gateway//protoc-gen-swagger/options:proto_library': target 'proto_library' not declared in package 'protoc-gen-swagger/options'. Probably linked to 1.
3. I am not sure which rules to use. As I need grpc-gateway, I guess I need to exclusively use https://github.com/stackb/rules_proto/tree/master/github.com/grpc-ecosystem/grpc-gateway, but I can't make those work either.
4. I use statik (https://github.com/rakyll/statik) to package the swagger file in Go in order to serve it. Is there any alternative, or if not, how can I call a custom bash command as part of the build process? (See the genrule sketch at the end of this question.)
In summary, I am pretty sure my BUILD.bazel file for building the proto and library is structured wrong, and I would appreciate some up-to-date guidance (GitHub is full of repos that are outdated, use outdated rules, or simply don't work).
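On the statik point in question 4: a genrule is Bazel's general mechanism for running a custom command as part of the build, so the embedding step can be expressed roughly like this sketch (the tool target and file names are hypothetical):
genrule(
    name = "embed_swagger",
    srcs = ["test.swagger.json"],
    outs = ["swagger_embed.go"],
    # $(location ...) expands to the input's path; $@ is the declared output.
    # The embedding tool is listed in tools so Bazel builds and tracks it.
    cmd = "$(location //tools:embed_swagger_tool) --in $(location test.swagger.json) --out $@",
    tools = ["//tools:embed_swagger_tool"],
)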
