submitting cloud ml engine job for py_binary built with bazel - bazel

Are there any best practices for deploying CloudML Engine jobs when using bazel?
Let's say I have /src/project/pipeline/trainer, which contains my Tensorflow trainer application. Since I'm using bazel, my imports are relative to the workspace root. I basically need to tar my .runfiles directory and upload it, but pkg_tar with include_runfiles set does not work (https://github.com/bazelbuild/bazel/issues/4383), and pkg_runfiles rule is not open source.
I can think of a few ways to do this, but they're all pretty hacky.
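For reference, the pkg_tar attempt referred to above would look roughly like this (a sketch only, assuming the rules_pkg pkg_tar and an illustrative //src/project/pipeline/trainer py_binary target; per bazelbuild/bazel#4383 the include_runfiles attribute does not actually pull in the runfiles tree):

load("@rules_pkg//:pkg.bzl", "pkg_tar")

# Sketch of the broken approach: tar the trainer together with its runfiles.
# As tracked in bazelbuild/bazel#4383, include_runfiles does not produce the
# expected .runfiles tree, so this cannot be uploaded to Cloud ML Engine as-is.
pkg_tar(
    name = "trainer_package",
    srcs = ["//src/project/pipeline/trainer"],  # illustrative target label
    include_runfiles = True,
    extension = "tar.gz",
)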

Related

Bazel rules with unknown output filenames

I have a command that compiles and runs a program, but the intermediate files are randomly named (but contained within a directory). E.g.
build foo.src bar.src -o output_dir
run output_dir
Bazel requires me to pre-declare all of the outputs of my rule, but I can't do that because they're randomly named. Can I somehow name an entire directory instead?
The only alternative I can think of is having the rule zip/unzip the directory before/after it runs the commands, which is a pretty awful solution.
Edit: I found an issue exactly describing the "just zip/unzip everything" solution here. The closing comment says to just use the rules from rules_pkg to zip/unzip stuff. Unfortunately it requires Python too.
Some of the comments in that thread suggest you can use declare_directory() but I don't think that really works.
There are tree artifacts. An example of how to use a tree artifact can be found here.
Tree artifacts are problematic for caching, since Bazel is not aware of the contents of the corresponding directory; if for some reason the contents of a tree artifact differ between two machines that use the same Bazel cache and the same Bazel configuration, you are in trouble.
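For illustration, a minimal sketch of a rule that uses a tree artifact via ctx.actions.declare_directory (the build/run tools are the hypothetical ones from the question):

def _compile_and_run_impl(ctx):
    # Declare a directory output; Bazel tracks the directory as a whole,
    # so the randomly named files inside it do not need to be declared.
    out_dir = ctx.actions.declare_directory(ctx.label.name + "_out")
    ctx.actions.run_shell(
        inputs = ctx.files.srcs,
        outputs = [out_dir],
        command = "build {srcs} -o {out} && run {out}".format(
            srcs = " ".join([f.path for f in ctx.files.srcs]),
            out = out_dir.path,
        ),
    )
    return [DefaultInfo(files = depset([out_dir]))]

compile_and_run = rule(
    implementation = _compile_and_run_impl,
    attrs = {"srcs": attr.label_list(allow_files = True)},
)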

Build Beam pipelines using Bazel (with DataflowRunner)

I use Bazel to build my Beam pipeline. The pipeline works well using the DirectRunner; however, I have some trouble managing dependencies when I use the DataflowRunner: Python cannot find local dependencies (e.g. generated by py_library) on the Dataflow workers. Is there any way to hint Dataflow to use the Python binary (the py_binary zip file) in the worker container to resolve the issue?
Thanks,
Please see here for more details on setting up dependencies for the Python SDK on Dataflow. If you are using a local dependency, you should probably look into developing a Python package and using the extra_package option, or developing a custom container.
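Not part of the answer above, but as a rough Python sketch of those two options (all project names, buckets and paths are placeholders): either ship a source distribution of the local dependency via extra_packages, or point the job at a custom worker container.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-gcp-project",            # placeholder
    region="us-central1",                # placeholder
    temp_location="gs://my-bucket/tmp",  # placeholder
    # Option 1: ship an sdist of the local dependency to the workers.
    extra_packages=["dist/my_local_dep-0.1.tar.gz"],  # placeholder path
    # Option 2: use a custom worker image that already contains the code.
    # sdk_container_image="gcr.io/my-gcp-project/beam-worker:latest",
)

with beam.Pipeline(options=options) as p:
    _ = p | beam.Create([1, 2, 3]) | beam.Map(print)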

How to get files produced during a Travis-CI build?

I am using Travis-CI to test code in a repository. There are quite a few files left after the testing, and I would like to keep them in a persistent place. How can I do that in the context of Travis-CI?
As an artificial example, suppose my Travis-CI server runs a C program that stores a large number of integers in a specific file. The file can be found on the Travis-CI server after the build. But how can I get that file? In my use case, this file is large and it would not make sense to read it from the console of Travis-CI; in other words, I would not consider using "cat ..." in .travis.yml.
After some search, here is what I got:
The most convenient way seems to deploy the generated files to GitHub pages. The process is explained here: https://docs.travis-ci.com/user/deployment/pages/. In short:
first, create a GitHub page from the repository under test. This can be done through the repository's GitHub web interface. The outcome is an additional remote branch called gh-pages.
then, in .travis.yml, use the deploy section to specify the condition to do the deployment.
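A minimal deploy section along the lines of the linked documentation might look like this (the local_dir path is a placeholder for wherever your build writes its files):

deploy:
  provider: pages
  skip_cleanup: true
  github_token: $GITHUB_TOKEN  # set as an encrypted variable in the Travis settings
  local_dir: build/output      # placeholder: directory containing the generated files
  on:
    branch: master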

Bazel - Build, Push, Deploy Docker Containers to Kubernetes within Monorepo

I have a monorepo with some backend (Node.js) and frontend (Angular) services. Currently my deployment process looks like this:
Check if tests pass
Build docker images for my services
Push docker images to container registry
Apply changes to Kubernetes cluster (GKE) with kubectl
I'm aiming to automate all those steps with the help of Bazel and Cloud Build. But I am really struggling to get started with Bazel:
To make it work I'll probably need to add a WORKSPACE file with my external dependencies and multiple BUILD files for my own packages/services? I need help with the actual implementation:
How to build my Dockerfiles with Bazel?
How to push those images into a registry (preferably GCR)?
How to apply changes to Google Kubernetes Engine automatically?
How to integrate this toolchain with Google Cloud Build?
More information about the project
I've put together a tiny sample monorepo to showcase my use-case
Structure
├── kubernetes
├── packages
│   ├── enums
│   └── utils
└── services
    └── gateway
General
Gateway service depends on enums and utils
Everything is written in Typescript
Every service/package is a Node module
There is a Dockerfile inside the gateway folder, which I want to be built
The Kubernetes configuration is located in the kubernetes folder.
Note that I don't want to publish any npm packages!
What we want is a portable Docker container that holds our Angular app along with its server and whatever machine image it requires, and that we can bring up on any cloud provider. We are going to make the entire pipeline incremental. The "Docker Rules" are fast: essentially, they provide incrementality by adding new Docker layers, so that the changes you make to the app are the only things sent over the wire to the cloud host. In addition, since Docker images are tagged with a SHA, we only re-deploy images that changed. To manage our production deployment we will use Kubernetes, for which Bazel rules also exist. Building a Docker image from a Dockerfile using Bazel is not possible to my knowledge, because it is disallowed by design due to the non-hermetic nature of Dockerfiles (source: Building deterministic Docker images with Bazel).
The changes made to the source code get deployed to the Kubernetes cluster. This is one way to achieve it using Bazel:
Put Bazel in watch mode; deploy.replace tells the Kubernetes cluster to update the deployed version of the app.
Command: ibazel run :deploy.replace
Then make any source code changes in the Angular app as usual. Bazel incrementally re-builds just the parts of the build graph that depend on the changed file; in this case, that includes the ng_module that was changed, the Angular app that includes that module, and the Docker nodejs_image that holds the server. Since we asked to update the deployment, after the build is complete Bazel pushes the new Docker container to Google Container Registry and the Kubernetes Engine instance starts serving it. Because Bazel understands the build graph, it only re-builds what has changed.
Here are a few snippet-level tips that can actually help.
WORKSPACE FILE:
Create a Bazel WORKSPACE file. The WORKSPACE file tells Bazel that this directory is a "workspace", which is like a project root. The things to be done inside the Bazel workspace are listed below.
• The name of the workspace should match the npm package where we publish, so that these imports also make sense when referencing the published package.
• Declare all the rule sets in the WORKSPACE using http_archive. As we are using Angular and Node, rules should be declared for rxjs, angular, angular_material, io_bazel_rules_sass, angular-version, build_bazel_rules_typescript and build_bazel_rules_nodejs.
• Next, load the dependencies using load(): sass_repositories, ts_setup_workspace, angular_material_setup_workspace, ng_setup_workspace, etc.
• Load the Docker base images as well; in our case that is "@io_bazel_rules_docker//nodejs:image.bzl".
• Don't forget to declare the browser and web test repositories:
web_test_repositories()

browser_repositories(
    chromium = True,
    firefox = True,
)
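Putting the WORKSPACE pieces together, a rough sketch could look like the following (rule-set versions and SHA-256 values are placeholders, not the ones used in the linked example):

workspace(name = "angular_bazel_example")

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "build_bazel_rules_nodejs",
    sha256 = "<sha256 of the release archive>",  # placeholder
    urls = ["https://github.com/bazelbuild/rules_nodejs/releases/download/<version>/release.tar.gz"],  # placeholder
)

http_archive(
    name = "io_bazel_rules_docker",
    sha256 = "<sha256 of the release archive>",  # placeholder
    urls = ["https://github.com/bazelbuild/rules_docker/archive/<version>.tar.gz"],  # placeholder
)

# ... one http_archive per rule set listed above, followed by the setup macros,
# e.g. ng_setup_workspace(), sass_repositories(), web_test_repositories(), etc.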
"BUILD.bazel" file.
• Load the modules that were downloaded: ng_module, the project module, etc.
• Set the default visibility using default_visibility.
• If you have any Jasmine tests, use ts_config and declare the dependencies inside it.
• ng_module (assets, sources and dependencies should be declared here).
• If you have any lazy-loading scripts, declare them as part of the bundle.
• Declare the root directories in the web_package.
• Finally, declare the data and the welcome page / default page.
Sample Snippet:
load("#angular//:index.bzl", "ng_module")
ng_module(
name = "src",
srcs = glob(["*.ts"]),
tsconfig = ":tsconfig.json",
deps = ["//src/hello-world"],
)
load("#build_bazel_rules_nodejs//:future.bzl", "rollup_bundle")
rollup_bundle(
name = "bundle",
deps = [":src"]
entry_point = "angular_bazel_example/src/main.js"
)
Build the bundle using the command below.
bazel build :bundle
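For context, the //src/server:push style target used in the Jenkins stage below could be defined with rules_docker roughly as follows (target names, paths and the GCR repository are illustrative):

load("@io_bazel_rules_docker//nodejs:image.bzl", "nodejs_image")
load("@io_bazel_rules_docker//container:container.bzl", "container_push")

# Builds a Node.js server image without a Dockerfile.
nodejs_image(
    name = "image",
    entry_point = "angular_bazel_example/src/server.js",  # illustrative
    data = [":server_lib", ":bundle"],                     # illustrative deps
)

# Pushes the image to Google Container Registry.
container_push(
    name = "push",
    image = ":image",
    format = "Docker",
    registry = "gcr.io",
    repository = "my-gcp-project/server",  # illustrative
    tag = "dev",
)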
Pipeline: through Jenkins
We create the pipeline through Jenkins, and to run the pipeline there are stages. Each stage does a separate task, but in our case we use a stage to publish the image using bazel run.
pipeline {
    agent any
    stages {
        stage('Publish image') {
            steps {
                sh 'bazel run //src/server:push'
            }
        }
    }
}
Note:
bazel run :dev.apply
Dev Apply maps to kubectl apply, which will create or replace an existing configuration. (For more information see the kubectl documentation.) This applies the resolved template, which includes republishing images. This action is intended to be the workhorse of fast-iteration development (rebuilding / republishing / redeploying).
If you want to pull base containers in the WORKSPACE file, use container_pull:
container_pull(
    name = "debian_base",
    digest = "sha256:**",
    registry = "gcr.io",
    repository = "google-appengine/debian9",
)
If GKE is used, the gcloud SDK needs to be installed, and since we are using GKE (Google Container Engine), it can be authenticated using the method below.
gcloud container clusters get-credentials <CLUSTER NAME>
The deployment object should be declared in the format below:
load("#io_bazel_rules_k8s//k8s:object.bzl", "k8s_object")
k8s_object(
name = "dev",
kind = "deployment",
template = ":deployment.yaml",
images = {
"gcr.io/rules_k8s/server:dev": "//server:image"
},
)
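Per the rules_k8s README linked in the sources below, a k8s_object named "dev" generates runnable convenience targets (given cluster credentials are configured), for example:

bazel run :dev.create   # kubectl create from the resolved template
bazel run :dev.apply    # kubectl apply, create or update (as noted above)
bazel run :dev.replace  # kubectl replace, used by the ibazel watch loop
bazel run :dev.delete   # tear the deployment down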
Sources:
https://docs.bazel.build/versions/0.19.1/be/workspace.html
https://github.com/thelgevold/angular-bazel-example
https://medium.com/@Jakeherringbone/deploying-an-angular-app-to-kubernetes-using-bazel-preview-91432b8690b5
https://github.com/bazelbuild/rules_docker
https://github.com/GoogleCloudPlatform/gke-bazel-demo
https://github.com/bazelbuild/rules_k8s#update
https://codefresh.io/howtos/local-k8s-draft-skaffold-garden/
https://github.com/bazelbuild/rules_k8s
A few months later and I've gone relatively far in the whole process.
Posting every detail here would just be too much!
So here is the open-source project which has most of the requirements implemented: https://github.com/flolu/fullstack-bazel
Feel free to contact me with specific questions! :)
Good luck
Flo, have you considered using terraform and a makefile for auto-building the cluster?
In my recent project, I automated the infrastructure end to end with make & terraform. Essentially, that approach builds the entire cluster and builds and deploys the entire project with one single command within 3 - 5 minutes, depending on how fast GCP is on a given day.
There is a Google sample project showing the idea, although the terraform config is outdated and needs to be replaced with a config adhering to the current 0.13 / 0.14 syntax.
https://github.com/GoogleCloudPlatform/gke-bazel-demo#build--deploy-with-bazel
The makefile that enables the one-command end to end automation:
https://github.com/GoogleCloudPlatform/gke-bazel-demo/blob/master/Makefile
Again, replace or customize the scripts for your project. I actually wrote two more scripts: one for checking / installing requirements on the client, i.e. git, kubectl & gcloud, and another one for checking or configuring & authenticating gcloud in case it's not yet configured and authenticated. From there, the terraform script takes over and builds the entire cluster, and once that's done, the usual auto-deployment kicks in.
I find the idea of layering make over terraform & bazel for end to end automation just brilliant.

Building a non-uberjar Docker image with leiningen

I have a clojure project that depends on a Java library, that does not work, when it gets included in an uberjar. (It needs different XML descriptors using the same filename in different JAR files.)
Everything I find on using Docker with leiningen depends on building and packaging a uberjar. That's also how I built all clojure Docker images so far.
Is there any leiningen plugin out there, that understands to package a Docker image using several jar files like io.fabric8/docker-maven-plugin does?
Whenever you package (uberjar, war), the big file that is created contains .class files and a directory structure. Where are these XML files supposed to be (class)loaded from? You can experiment with packing manually; after all, it (whether uberjar, war or jar) is just a zip file.
When you know exactly the layout you need, SBT is flexible enough to ensure you can package from the many input jar files. Unfortunately, lein plugins will do things like always overwriting duplicates, and you can't control the packaging behaviour. I can't remember exactly what the inflexibilities were, but I couldn't control how the packaging process went or what decisions were made.
For doing it manually I use a Linux tool called Archive Manager, which I found to be much better than what I used on Windows. Doing it manually may be all you need. The downside of SBT, of course, is that you have to learn it, which includes a bit of Scala.
It needs different XML descriptors using the same filename in different JAR files.
Just thinking about this, is it that you need to append the contents of each file that is in a different jar into the one file that is in the uberjar? You can try it out. If it works and you need to package up often enough that manually creating and renaming a zip file every time becomes a pain, then I believe that SBT will be your best bet.
I have to package my container with the original jar file and then reference this jar in the classpath when starting the application
The classloader loads classes rather than jars. It is the container's job to unpackage all the things you give it, such as .class files, (uber)jars, wars. Any program that dynamically loads from the classpath is loading either classes or resources (things like .xml files). I suppose a .jar file could be a resource, in which case you would put the jar file in the uberjar. So it is still possible to package it up.
