Local cache for bazel remote repos

We are using codeship to run CI for a C++ project. Our CI build consists of a Docker image into which we install system dependencies, then a bazel build step that builds our tests.
Our bazel WORKSPACE file pulls in various external dependencies, such as gtest:
new_http_archive(
    name = "gtest",
    url = "https://github.com/google/googletest/archive/release-1.7.0.zip",
    build_file = "thirdparty/gtest.BUILD",
    strip_prefix = "googletest-release-1.7.0",
    sha256 = "b58cb7547a28b2c718d1e38aee18a3659c9e3ff52440297e965f5edffe34b6d0",
)
During CI builds, a lot of time is spent downloading these files. Is it possible to set up Bazel to use a local cache for these archives?

I think Bazel already caches external repositories in the output_base (it should; if not, it's a bug worth reporting). Is it an option for you to keep the cache hot in the Docker container? E.g. by fetching the code and running bazel fetch //... or some more specific target? Note you can also specify where Bazel's output_base is by using bazel --output_base=/foo build //.... You might find this doc section relevant.
[EDIT: Our awesome Kristina comes to save the day]:
You can use --experimental_repository_cache=/path/to/some/dir
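For example, a minimal sketch of wiring this into CI (the cache path is an assumption and should point at a directory that is persisted between CI runs; on newer Bazel releases the flag has graduated to plain --repository_cache):
# .bazelrc
build --experimental_repository_cache=/ci-cache/bazel-repos
fetch --experimental_repository_cache=/ci-cache/bazel-repos
A warm-up step such as bazel fetch //... while building the Docker image then keeps later builds from re-downloading the archives.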
Does this help?

Related

Bazel builds from scratch ignoring cache

I observe that my Bazel build agent frequently builds the project from scratch (including compiling grpc, which is unchanged) instead of taking results from the cache. Is there a way, like query or cquery (pardon my ignorance), to determine why the cache is considered invalid for a particular target? Or any techniques to tackle the cache invalidation problem?
This is how the Bazel build works:
When running a build or a test, Bazel does the following: loads the BUILD files relevant to the target; analyzes the inputs and their dependencies, applies the specified build rules, and produces an action graph; executes the build actions on the inputs until the final build outputs are produced.
If you have any concrete observations, please share the complete details!
This is most likely due to the rebuild sensitivity to particular environment variables. Many build actions will read from environment variables and use them to change the outputs. Bazel keeps track of this and will rebuild seemingly unchanged remote targets when your env changes.
To demonstrate this:
Build grpc (twice, to ensure it is cached the second time)
Change the PATH environment variable (your IDE may do this without you knowing):
mkdir ~/bin && export PATH=$PATH:~/bin
Rebuild grpc (this should trigger a complete rebuild)
There are a couple of helpful flags to combat this rebuild sensitivity, and I'd recommend adding them to your bazelrc.
--incompatible_strict_action_env: freezes your environment and doesn't source environment variables from your shell.
--action_env: modifies environment variables as needed for your build.
# file //.bazelrc
# Don't source environment from shell
build --incompatible_strict_action_env
# Use action_env as needed for your project
build --action_env=CC=clang

Bazel - Build, Push, Deploy Docker Containers to Kubernetes within Monorepo

I have a monorepo with some backend (Node.js) and frontend (Angular) services. Currently my deployment process looks like this:
Check if tests pass
Build docker images for my services
Push docker images to container registry
Apply changes to Kubernetes cluster (GKE) with kubectl
I'm aiming to automate all those steps with the help of Bazel and Cloud Build. But I am really struggling to get started with Bazel:
To make it work I'll probably need to add a WORKSPACE file with my external dependencies and multiple BUILD files for my own packages/services. I need help with the actual implementation:
How to build my Dockerfiles with Bazel?
How to push those images to a registry (preferably GCR)?
How to apply changes to Google Kubernetes Engine automatically?
How to integrate this toolchain with Google Cloud Build?
More information about the project
I've put together a tiny sample monorepo to showcase my use-case
Structure
├── kubernetes
├── packages
│ ├── enums
│ ├── utils
└── services
├── gateway
General
Gateway service depends on enums and utils
Everything is written in Typescript
Every service/package is a Node module
There is a Dockerfile inside the gateway folder, which I want to be built
The Kubernetes configuration is located in the kubernetes folder.
Note, that I don't want to publish any npm packages!
What we want is a portable Docker container that holds our Angular app along with its server and whatever base image it requires, and that we can bring up on any cloud provider. We are going to make the entire pipeline incremental. The Docker rules are fast: essentially, they provide incrementality by adding new Docker layers, so that the changes you make to the app are the only things sent over the wire to the cloud host. In addition, since Docker images are tagged with a SHA, we only re-deploy images that changed. To manage our production deployment, we will use Kubernetes, for which Bazel rules also exist. Building a Docker image from a Dockerfile using Bazel is not possible to my knowledge, because it's by design not allowed due to the non-hermetic nature of Dockerfiles. (Source: Building deterministic Docker images with Bazel)
The changes made to the source code are going to get deployed to the Kubernetes cluster. This is one way to achieve that using Bazel.
We have to put Bazel in watch mode; deploy.replace tells the Kubernetes cluster to update the deployed version of the app.
Command: ibazel run :deploy.replace
In case there are any source code changes, make them in the Angular app.
Bazel incrementally re-builds just the parts of the build graph that depend on the changed file. In this case, that includes the ng_module that was changed, the Angular app that includes that module, and the nodejs_image Docker image that holds the server. Since we asked it to update the deployment, after the build is complete it pushes the new Docker container to Google Container Registry and the Kubernetes Engine instance starts serving it. Bazel understands the build graph, so it only re-builds what has changed.
Here are a few snippet-level tips which can actually help.
WORKSPACE FILE:
Create a Bazel WORKSPACE file. The WORKSPACE file tells Bazel that this directory is a "workspace", which is like a project root. Things to be done inside the Bazel WORKSPACE are listed below.
• The name of the workspace should match the npm package where we publish, so that these imports also make sense when referencing the published package.
• Mention all the rules in the Bazel WORKSPACE using "http_archive". As we are using Angular and Node, the rules should be mentioned for rxjs, angular, angular_material, io_bazel_rules_sass, angular-version, build_bazel_rules_typescript, and build_bazel_rules_nodejs (see the sketch after this list).
• Next we have to load the dependencies using "load": sass_repositories, ts_setup_workspace, angular_material_setup_workspace, ng_setup_workspace.
• Also load the Docker base images; in our case it's "@io_bazel_rules_docker//nodejs:image.bzl".
• Don't forget to mention the browser and web test repositories:
web_test_repositories()
browser_repositories(
    chromium = True,
    firefox = True,
)
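As referenced in the list above, here is a rough sketch of pulling one of those rule sets in with http_archive (the release URL and sha256 are placeholders; pin whichever version your project actually uses):
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "build_bazel_rules_nodejs",
    # Placeholder release URL and checksum; substitute the release you depend on.
    urls = ["https://github.com/bazelbuild/rules_nodejs/releases/download/1.0.0/rules_nodejs-1.0.0.tar.gz"],
    sha256 = "...",
)
The other rule repositories (rules_sass, rules_typescript, rules_docker, rules_k8s) are declared the same way.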
"BUILD.bazel" file.
• Load the Modules which was downloaded ng_module, the project module etc.
• Set the Default visiblity using the "default_visibility"
• if you have any Jasmine tests use the ts_config and mention the depndencies inside it.
• ng_module (Assets,Sources and Depndeencies should be mentioned here )
• If you have Any Lazy Loading scripts mention it as part of the bundle
• Mention the root directories in the web_package.
• Finally Mention the data and the welcome page / default page.
Sample Snippet:
load("#angular//:index.bzl", "ng_module")
ng_module(
name = "src",
srcs = glob(["*.ts"]),
tsconfig = ":tsconfig.json",
deps = ["//src/hello-world"],
)
load("#build_bazel_rules_nodejs//:future.bzl", "rollup_bundle")
rollup_bundle(
name = "bundle",
deps = [":src"]
entry_point = "angular_bazel_example/src/main.js"
)
Build the bundle using the below command:
bazel build :bundle
Pipeline: through Jenkins
Create the pipeline through Jenkins; to run the pipeline there are stages. Each stage does a separate task, but in our case we use a stage to publish the image using bazel run.
pipeline {
    agent any
    stages {
        stage('Publish image') {
            steps {
                sh 'bazel run //src/server:push'
            }
        }
    }
}
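The //src/server:push target that the pipeline runs is not shown in the question, so here is a rough sketch of what it could look like with rules_docker (the target names, entry point, and GCR repository are assumptions for illustration):
load("@io_bazel_rules_docker//nodejs:image.bzl", "nodejs_image")
load("@io_bazel_rules_docker//container:container.bzl", "container_push")

# Build the Node server into a layered Docker image (no Dockerfile involved).
nodejs_image(
    name = "image",
    entry_point = "angular_bazel_example/src/server/index.js",  # hypothetical entry point
    data = [":server_lib"],  # hypothetical server library target
)

# Push the image to Google Container Registry.
container_push(
    name = "push",
    format = "Docker",
    image = ":image",
    registry = "gcr.io",
    repository = "my-gcp-project/server",  # hypothetical GCR repository
    tag = "dev",
)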
Note:
bazel run :dev.apply
dev.apply maps to kubectl apply, which will create or replace an existing configuration. (For more information see the kubectl documentation.) This applies the resolved template, which includes republishing images. This action is intended to be the workhorse of fast-iteration development (rebuilding / republishing / redeploying).
If you want to pull containers using the WORKSPACE file, use the below rule:
container_pull(
    name = "debian_base",
    digest = "sha256:**",
    registry = "gcr.io",
    repository = "google-appengine/debian9",
)
If GKE is used, the gcloud SDK needs to be installed, and since we are using GKE (Google Kubernetes Engine), it can be authenticated using the below method:
gcloud container clusters get-credentials <CLUSTER NAME>
The Deployment object should be mentioned in the below format:
load("#io_bazel_rules_k8s//k8s:object.bzl", "k8s_object")
k8s_object(
name = "dev",
kind = "deployment",
template = ":deployment.yaml",
images = {
"gcr.io/rules_k8s/server:dev": "//server:image"
},
)
Sources :
https://docs.bazel.build/versions/0.19.1/be/workspace.html
https://github.com/thelgevold/angular-bazel-example
https://medium.com/@Jakeherringbone/deploying-an-angular-app-to-kubernetes-using-bazel-preview-91432b8690b5
https://github.com/bazelbuild/rules_docker
https://github.com/GoogleCloudPlatform/gke-bazel-demo
https://github.com/bazelbuild/rules_k8s#update
https://codefresh.io/howtos/local-k8s-draft-skaffold-garden/
https://github.com/bazelbuild/rules_k8s
A few months later, I've gotten relatively far in the whole process.
Posting every detail here would just be too much!
So here is the open-source project which has most of the requirements implemented: https://github.com/flolu/fullstack-bazel
Feel free to contact me with specific questions! :)
Good luck
Flo, have you considered using terraform and a makefile for auto-building the cluster?
In my recent project, I automated infrastructure end to end with make & terraform. Essentially, that approach builds the entire cluster and builds and deploys the entire project with one single command within 3-5 minutes, depending on how fast GCP is on a given day.
There is a Google sample project showing the idea, although the terraform config is outdated and needs to be replaced with a config adhering to the current 0.13 / 0.14 syntax.
https://github.com/GoogleCloudPlatform/gke-bazel-demo#build--deploy-with-bazel
The makefile that enables the one-command end to end automation:
https://github.com/GoogleCloudPlatform/gke-bazel-demo/blob/master/Makefile
Again, replace or customize the scripts for your project; I actually wrote two more scripts, one for checking / installing requirements on the client (i.e. git, kubectl & gcloud), and another one for checking or configuring & authenticating gcloud in case it's not yet configured and authenticated. From there, the terraform script takes over and builds the entire cluster, and once that's done, the usual auto-deployment kicks in.
I find the idea of layering make over terraform & bazel for end to end automation just brilliant.
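A rough sketch of what such a Makefile layer could look like (target names, script paths, and the Bazel label are assumptions, not taken from the linked repo; recipe lines must be indented with tabs):
# Makefile: one command to check tools, build the cluster, and deploy
all: check infra deploy

check:
	./scripts/check-requirements.sh   # hypothetical script: verifies git, kubectl, gcloud

infra:
	cd terraform && terraform init && terraform apply -auto-approve

deploy:
	bazel run //kubernetes:dev.apply   # hypothetical deployment target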

How can Cloud Build take dynamic parameters to increment a registry tag?

I want my Cloud Build to push an image to a registry with an incremented tag. So, when the trigger arrives from GitHub, build the image, and if the latest tag was 1.10, tag the new one 1.11. Similarly, the 1.11 value will serve in multiple other steps in the build.
Reading the registry and incrementing the tag is easy (in a bash Cloud Build step), but Cloud Build has no way to pass parameters. (Substitutions come from outside the Cloud Build process, for example from the Git tags, and are not generated inside the process.)
This StackOverflow question and this article say that Cloud Build steps can communicate by writing files to the workspace directory.
That is clumsy. But worse, this requires using shell steps exclusively, not the native docker-building steps, nor the native image command.
How can I do this?
Sadly you can't. Each Cloud Builder image has its own sandbox, and only the /workspace directory is mounted. As a result, environment variables, installed binaries, and so on don't persist from one container to the next.
You have to use a shell script each time :( The easiest way is to have a file in your /workspace directory (for example an env.var file):
# load the environment variable
source /workspace/env.var
# Add variable
echo "NEW=Variable" >> /workspace/env.var
For this, Cloud Build is boring...

remotely-built `java_binary`s can't run locally

I'm using Bazel 0.29.1 with remote execution to build java_binary targets. They are straightforward targets with a few sources and deps, e.g.
java_binary(
    name = "foo",
    main_class = "my.foo",
    runtime_deps = [
        "//my/foo",
        "//third_party/jvm/org/apache/logging/log4j:log4j_core",
    ],
)
The remote execution config is using rbe_autoconfig from Bazel toolchains 0.29.8 and the default build container.
The binary builds fine with bazel build --config=remote //:foo. However, it fails when run with bazel run --config=remote //:foo:
/home/kgessner/.cache/bazel/_bazel_kgessner/[snip]/foo: line 359: /usr/lib/jvm/java-8-openjdk-amd64/bin/java: No such file or directory
The java_binary wrapper/launcher has the wrong java path: /usr/lib/jvm/java-8-openjdk-amd64/bin/java is the path to java in the build container, but not locally.
I can't find the right combination of java flags to make this work: build remotely but use the local JRE when it's run. What's the trick?
Sounds like a bug. Could you please file it on GitHub?

How to stop Bazel from trying to download packages in offline environment

Bazel is trying to download packages when running a Python test. I've written a simple piece of Python code, and a test file testing it.
I'm running bazel test //test:python-test and I get the following error:
/Path/to/build/external/bazel_tools/tools/jdk/BUILD:305:1: no such package '@remotejdk_linux//': java.io.IOException: error downloading [ unknown host: mirror.bazel.build and referenced by '@bazel_tools//tools/jdk:remote_jdk'
Now, that's obviously a problem in my workspace, where we work offline. Is there any way to work offline with bazel?
Using the following flags will force Bazel to use your local Java:
--host_javabase=@bazel_tools//tools/jdk:absolute_javabase --define=ABSOLUTE_JAVABASE=/path/to/my/jdk
You can add them to your local .bazelrc file to keep the command line shorter.
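A sketch of what that .bazelrc entry could look like (the JDK path is a placeholder for your local installation):
build --host_javabase=@bazel_tools//tools/jdk:absolute_javabase
build --define=ABSOLUTE_JAVABASE=/path/to/my/jdk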
You can manually download the requested artifacts and put them in the cache before calling build. Bazel will not download an artifact if it already exists in the local cache.
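For example, a rough sketch of pre-warming the repository cache on a machine that does have network access and then reusing it offline (the cache path is an assumption; --distdir pointed at a directory of pre-downloaded archives is an alternative):
# While online: fetch everything the workspace needs into a shared cache
bazel fetch --repository_cache=/path/to/shared/repo-cache //...
# Later, in the offline environment, point Bazel at the same cache
bazel test --repository_cache=/path/to/shared/repo-cache //test:python-test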
