Using a build system to run tests or interact with clusters - Bazel

What is the purpose of projects like the ones below, which use Bazel for things other than building software?
https://github.com/bazelbuild/rules_webtesting
https://github.com/bazelbuild/rules_k8s
Are they just a convenient way to provide an environment for the run command (as opposed to building portable executables), or am I missing something?
The best I can tell is that Bazel could be used to run only a subset of E2E tests based on knowledge of what changed.

Disclaimer: I have only cursory knowledge of k8s and Docker.
Bazel isn't just used for building and testing; it can also deploy, as you've discovered with the rules in those projects.
The best I can tell is that Bazel could be used to run only a subset of E2E tests based on knowledge of what changed.
Correct, but the same idea extends to deployments. If you've only changed a single string in your Go binary that's injected into an image, Bazel is able to use rules_k8s, rules_docker, and rules_go to:
incrementally and reproducibly rebuild the minimum set of files needed to build the new Go executable
create a new image layer containing the Go executable (without using Docker)
push the image to the registry
redeploy the changed pod(s) into the cluster
The notable thing is that if you didn't change the source file, Bazel will always create an image with the same digest due to its reproducibility. That means that you can now trust the deployment workflow to not redeploy/restart a pod even if you do a bazel run twice or more.
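As a rough illustration of how that is driven from the command line (the target names //cmd/server:image and //k8s:staging are made up for this sketch, and assume go_image and k8s_object targets already exist in the workspace):
bazel build //cmd/server:image      # rebuilds only what changed; the image digest is reproducible
bazel run //k8s:staging.apply       # pushes the image and applies the Kubernetes manifest to the cluster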
For more, check out this BazelCon 2017 talk: Using Bazel for Fast, Correct Docker Deployments w/ Databricks
Fun fact: you can also use bazel run to start a REPL for your build target, from 0.15.0 onwards. Haskell and Scala rules use this.

Related

Best practice for running Symfony 5 project with Docker and Docker-Swarm

I have an existing Symfony 5 project with a MySQL database and an nginx web server. I want to dockerize this project, but on the web I have found different opinions on how to do that.
My plan is to write a multi-stage Dockerfile with at least a dev and a prod stage and build it with Docker Swarm. In my opinion it is useful to install the complete code during the build and to have multiple composer.json files (one for every stage). On the web I have found opinions that say not to install the app fresh on every build but to copy the vendor and var folders into the container. Another opinion was to start the installation only after the container is built, but then I think the service is not ready when the app is deployed.
What do you think is the best practice here?
Build exactly the same image for all environments
Do not build two different images for prod and dev. One of the main Docker benefits is that you can provide exactly the same environment for production and dev.
You should control your environment with ENV vars. For example, you can enable Xdebug for dev and disable it for prod.
Composer has an option to install dev and production packages. You should use this feature.
If you decide to install some packages only for dev, try to use the same Dockerfile for both environments. Do not use Dockerfile.prod and Dockerfile.dev; it will introduce some mess in the future.
Multistage build
You can do a multistage build, described in the official Docker documentation, if your build environment requires many more dependencies than your runtime.
An example is compiling a program: during compilation you need a lot of libraries, but you produce a single binary, so your runtime does not need all the dev libraries.
The first stage does the build; in the second stage you just copy the binary, and that's it.
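A minimal sketch of that pattern for a PHP app could look like the following (the base images, paths, and flags are assumptions, not taken from the question):
FROM composer:2 AS vendor
WORKDIR /app
COPY composer.json composer.lock ./
# --no-dev keeps development-only packages out of the production image
RUN composer install --no-dev --no-scripts --prefer-dist

FROM php:8.2-fpm-alpine AS app
WORKDIR /var/www/html
# only the installed dependencies and the application code end up in the final image
COPY --from=vendor /app/vendor ./vendor
COPY . .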
Build all packages into the docker image
You should build your application while the Docker image is being built. All libraries and packages should be copied into the image; you should not install them when the application is starting. Reasons:
The application starts faster when everything is already installed
Some of the libraries can change or be removed in the future; you will be in trouble and will probably spend a lot of time debugging
Implement health checks
You should implement a health check. Applications require external dependencies, like passwords, API keys, and some non-sensitive data. Usually we inject this data with environment variables.
You should check that all required variables are passed and have the right format before your application starts. You can implement a health check, or you can check it in the entrypoint.
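For example, a minimal entrypoint check might look like this (the variable names are placeholders):
#!/bin/sh
# fail fast if a required environment variable is missing
for var in DATABASE_URL APP_SECRET; do
  if [ -z "$(printenv "$var")" ]; then
    echo "Missing required environment variable: $var" >&2
    exit 1
  fi
done
exec "$@"   # hand over to the real application command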
Test your solution before it is released
You should implement a mechanism for testing your images. For example, in CI (a sketch follows this list):
Run your unit tests before the image is built
Build the Docker image
Start the new application image with dummy data; if you require a PostgreSQL DB, you can start another container
Run integration tests
Publish the new version of the image only if all tests pass
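A sketch of those steps as a CI script (image names, the registry, and the test commands are assumptions):
set -e
vendor/bin/phpunit                                              # 1. unit tests before the image is built
docker build -t myapp:candidate .                               # 2. build the image
docker network create ci                                        # 3. start dummy dependencies
docker run -d --name db --network ci -e POSTGRES_PASSWORD=ci postgres:15
docker run --rm --network ci -e DATABASE_URL="postgresql://postgres:ci@db:5432/app" \
  myapp:candidate bin/run-integration-tests                     # 4. integration tests against the new image
docker tag myapp:candidate registry.example.com/myapp:1.2.3     # 5. publish only if everything passed
docker push registry.example.com/myapp:1.2.3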

Bazel: Mixing a Linux remote execution platform with a Mac OS local platform

Goal
I am using Bazel to build a multiplatform C++ client (iOS, OSX, Android, Windows).
iOS and OSX are built locally on my Mac (out of necessity). Android and Windows are built inside a Docker container.
At the end of the build I have a Bazel rule that takes each cc_binary rule for each platform and puts them in a .zip.
I'd like to utilize Bazel's remote execution API to build some of my binaries in the container and others locally and then reference a shared cache to collate them together -- all with one bazel build command.
Bazel support
Bazel claims that these types of multiplatform builds -- where the host (OSX x64), execution (Linux x64), and target platforms (many) are all different -- are possible.
See https://docs.bazel.build/versions/master/platforms.html
My experience
However, I hit this exact issue: https://github.com/bazelbuild/bazel/issues/5397 (where docker-sandbox is a correct proxy for remote builds.)
This, alongside the GitHub issue below, makes me question Bazel's claim about multiplatform builds.
https://github.com/bazelbuild/bazel/issues/5309
Fundamentally, these issues seem to say that local targets for one platform (e.g., OSX) cannot be built alongside remote targets on another platform (e.g. Linux).
Question
I was wondering:
(1) Is what I am trying to do fundamentally at odds with Bazel's design? If so, what is meant by Bazel being multiplatform?
(2) Is there a workaround I can employ that maintains hermiticity and stays within the Bazel build system? It could be possible to mount a Docker volume and then write a script that combines the Docker cache with my local cache, but it seems like Bazel was built to handle my use case. Am I missing something here?
Related question: Does bazel support remote execution on different platforms? (It doesn't have a satisfactory answer.)
(1) Is what I am trying to do fundamentally at odds with Bazel's design?
In theory no, in practice yes. Bazel provides functionality that allows users to support your use case, but it is not implemented by default.
Specifically, as described in the linked Bazel issues, Bazel rules currently make assumptions about the relationship between the host and target platforms which don't hold in your case. For example, Bazel will auto-detect the JDK files on your host (macOS) and then default to using these JDK files across all Java actions - regardless of target platform.
If so, what is meant by Bazel being multiplatform?
In practice, it means that you can run bazel build ... on multiple platforms and expect that Bazel will transform your inputs into outputs compatible for the current platform.
(2) Is there a workaround I can employ that maintains hermiticity and stays within the Bazel build system?
Yes, you can run bazel build ... from within a Windows VM or Docker container. This was the workaround that the Bazel team recommended when I asked this question.
Relevant advanced Bazel features:
If you want to build for multiple target platforms with one Bazel invocation, have a look at Bazel user-defined transitions (this would allow you to build the same rule for multiple platforms, e.g. iOS and macOS, at once, but requires you to write your own rules).
If you don't want to run bazel build from within containers/VMs, you can write your own C++ toolchain. At its core, Bazel gives each action a sandbox with all dependent files and guarantees that it will execute a specific command. In a custom C++ toolchain, you could tell Bazel to call a script instead of clang, which takes the command plus the files and executes them from within a VM or container. This is likely a lot of work, but it is definitely possible.
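As a sketch of that wrapper idea (the image name, the mount layout, and the use of Docker rather than a VM are all assumptions), the "compiler" registered in the custom toolchain could be a script like:
#!/bin/sh
# forward the compile command Bazel issues into a Linux build container
exec docker run --rm -v "$PWD:$PWD" -w "$PWD" my-linux-cc-image:latest clang "$@"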

Concurrent build within Docker with regards to multi staging

I have a monolithic repo that contains all of my projects. The current setup is to bring up a build container, mount my monolithic repo, build my projects sequentially, copy out the binaries, and then build their respective runtime (production) containers sequentially.
I find this process quite slow and want to improve the speed. The two main approaches I want to take are:
Within the build container, build my project binaries concurrently instead of sequentially.
Like step 1, also build my runtime (production) containers concurrently.
I did some research and it seems like there are two Docker features of interest:
Multi-stage building, which allows me to skip worrying about the build container and put everything into one Dockerfile.
The --parallel option for docker-compose, which would solve approach #2, allowing me to build my runtime containers concurrently.
However, there are still two main issues:
How do I glue the two features together?
How do I build my binaries concurrently inside the build container? In other words, how can I achieve approach #1?
Clarifications
Regardless of whether multi-stage is used or not, there are two logical phases.
First is the binary building phase. During this phase, the artifacts are the compiled executables (binaries) produced in the build container. Since I'm not using multi-stage builds, I'm copying these binaries out to the host, so the host serves as an intermediate staging area. Currently the binaries are built sequentially; I want to build them concurrently inside the build container. Hence approach #1.
Second is the image building phase. During this phase, the binaries from the previous phase, which are now stored on the host, are used to build my production images. I also want to build these images concurrently, hence approach #2.
Multi-stage allows me to eliminate the need for an intermediate staging area (the host), and --parallel allows me to build the production images concurrently.
What I'm wondering is how I can achieve approaches #1 and #2 using multi-stage and --parallel. For every project, I can define a separate multi-stage Dockerfile and call --parallel on all of them to have their images built separately. This would achieve approach #2, but it would spawn a separate build container for each project and take up a lot of resources (I use the same build container for all my projects and it's 6 GB). On the other hand, I can write a script to build my project binaries concurrently inside the build container. This would achieve approach #1, but then I can't use multi-stage if I want to build the production images concurrently.
What I really want is a Dockerfile like this:
FROM alpine:latest AS builder
RUN concurrent_build.sh binary_a binary_b
FROM builder AS prod_img_a
COPY binary_a .
FROM builder AS prod_img_b
COPY binary_b .
And be able to run a docker-compose command like this (I'm making this up):
docker-compose --parallel prod_img_a prod_img_b
Further clarifications
The run-time binaries and run-time containers are not separate things. I just want to be able to build the binaries AND the production images in parallel.
--parallel does not use different hosts, but my build container is huge. If I use multi-stage builds, running something like 15 of these build containers in parallel on my local dev machine could be bad.
I'm thinking about compiling the binaries and building the run-time containers separately too, but I'm not finding an easy way to do that. I have never used docker commit; would that sacrifice the Docker cache?
Results
My mono-repo contains 16 projects; some are microservices of only a few MBs, some are bigger services of about 300 to 500 MBs.
The build includes the compilation of two prerequisites, gRPC and XDR. Both are trivially small, taking only 1 or 2 seconds to build.
The build also includes a node_modules installation phase. NPM install and build is THE bottleneck of the project and by far the slowest part.
The strategy I am using is to split the build into two stages:
The first stage spins up a monolithic build container and mounts the mono-repo into it as a bind volume with cached consistency. All of my containers' binary dependencies are built inside it in parallel using goroutines; each goroutine calls a build.sh bash script that does the building. The resulting binaries are written to the same mounted volume. Caching takes the form of a mounted Docker volume, and the binaries are preserved across runs on a best-effort basis.
The second stage builds the images in parallel, using Docker's Go SDK documented here. This is also done in parallel using goroutines. Nothing else is special about this stage besides some basic optimizations.
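The implementation described above uses Go and Docker's Go SDK; purely to show the shape of the two stages, a rough shell equivalent (project names, scripts, and image names are made up) would be:
# stage 1: build all binaries in parallel inside a single build container
docker run --rm -v "$PWD:/src" -v build-cache:/cache build-image:latest \
  sh -c 'cd /src && for p in svc-a svc-b svc-c; do ./build.sh "$p" & done; wait'
# stage 2: build the production images in parallel on the host
for p in svc-a svc-b svc-c; do
  docker build -t "$p:latest" -f "docker/$p.Dockerfile" . &
done
wait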
I do not have performance data for the old build system, but building all 16 projects easily took the upper bound of 30 minutes. The old build was extremely basic and did not build the images in parallel or use any optimizations.
The new build is extremely fast. If everything is cached and there are no changes, the build takes ~2 minutes. In other words, the overhead of bringing up the build system, checking the cache, and rebuilding the same cached Docker images is ~2 minutes. With no cache at all, the new build takes ~5 minutes. A HUGE improvement over the old build.
Thanks to @halfer for the help.
So, there are several things to try here. Firstly, yes, do try --parallel; it would be interesting to see the effect on your overall build times. It looks like you have no control over the number of parallel builds, though, so I wonder if it would try to do them all in one go.
If you find that it does, you could write docker-compose.yml files that only contain a subset of your services, such that you only have five at a time, and then build against each one in turn. Indeed, you could write a script that reads your existing YAML config and splits it up, so that you do not need to maintain your overall config and your split-up configs separately.
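Something along these lines, assuming newer Compose versions where --parallel is a flag of the build subcommand (the split into batch files is illustrative):
docker-compose -f docker-compose.batch1.yml build --parallel
docker-compose -f docker-compose.batch2.yml build --parallel
docker-compose -f docker-compose.batch3.yml build --parallel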
I suggested in the comments that multi-stage would not help, but I now think that this is not the case. I was wondering whether the second stage in a Dockerfile would block until the first one is completed, but this should not be so - if the second stage starts from a known image then it should only block when it encounters a COPY --from=first_stage command, which you can do right at the end, when you copy your binary from the compilation stage.
Of course, if it turns out that multi-stage builds are not parallelised, then docker commit would be worth a try. You've asked whether this uses the layer cache, and the answer is that I don't think it matters - your operation here would be (sketched after the steps):
Spin up the binary container to run a shell or a sleep command
Spin up the runtime container in the same way
Use docker cp to copy the binary from the first one to the second one
Use docker commit to create a new runtime image from the new runtime container
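A sketch of those four steps (image names, paths, and the CMD are placeholders):
docker run -d --name binary1 build-image:latest sleep infinity     # 1. keep the build container alive
docker run -d --name runtime1 runtime-base:latest sleep infinity   # 2. same for the runtime container
docker cp binary1:/out/app ./app && docker cp ./app runtime1:/usr/local/bin/app   # 3. copy via the host
docker commit --change 'CMD ["/usr/local/bin/app"]' runtime1 myapp:latest         # 4. snapshot as a new image
docker rm -f binary1 runtime1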
This does not involve any network operations, and so should be pretty quick - you will have benefited greatly from the parallelisation already at this point. If the binaries are of non-trivial size, you could even try parallelising your copy operations:
# docker cp cannot copy directly between two containers, so each copy goes via the host
docker cp binary1:/path/to/binary ./binary1 && docker cp ./binary1 runtime1:/path/to/binary &
docker cp binary2:/path/to/binary ./binary2 && docker cp ./binary2 runtime2:/path/to/binary &
docker cp binary3:/path/to/binary ./binary3 && docker cp ./binary3 runtime3:/path/to/binary &
Note though that these are disk-bound operations, so you may find there is no advantage over doing them serially.
Could you give this a go and report back on:
your existing build times per container
your existing build times overall
your new build times after parallelisation
Do it all locally to start off with, and if you get some useful speed-up, try it on your build infrastructure, where you are likely to have more CPU cores.

DevOps vs Docker

I am wondering how exactly Docker fits into CI/CD.
I understand that with the help of containers you can focus on code rather than on dependencies/environment. But once you check in your code, you expect tools like TeamCity, Jenkins, or Bamboo to take care of the integration build, integration/unit tests, and deployment to target servers (after approvals), where you expect the same Docker container image to run the built code.
However, in all of the above, Docker is nowhere in the CI/CD cycle; it only comes into play when execution happens on the server. So why do I see articles listing it as one of the things for DevOps?
I could be wrong, as I am not a DevOps guru. Please enlighten me!
Docker is just another tool available to DevOps engineers, DevOps practitioners, or whatever you want to call them. What Docker does is encapsulate code and its dependencies in a single unit (a container) that can be run anywhere the Docker engine is installed. Why is this useful? For multiple reasons; but in terms of CI/CD it can help engineers separate configuration from code, decrease the amount of time spent on dependency management, and be used to scale (with the help of some other tools, of course). The list goes on.
For example: if I had a single code repository, in my build script I could pull in environment-specific dependencies to create a container that functionally behaves the same in each environment, since I'm building from the same source repository, while containing a set of environment-specific certificates, configuration files, etc.
Another example: if you have multiple build servers, you can create a bunch of utility Docker containers to be used in your CI/CD pipeline, pulling down a container to perform a certain operation during a stage. The only dependency on your build servers then becomes the Docker engine, and you can change, add, or modify these utility containers independently of any operation performed by another utility container.
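For instance, a build step can borrow its toolchain from a throwaway container instead of installing it on the build server (the image and commands are only an example):
# run the test suite with Node.js provided by the container, not by the build server
docker run --rm -v "$PWD:/work" -w /work node:18 sh -c "npm ci && npm test"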
Having said all of that, there really is a great deal you can do to utilize Docker in your CI/CD pipelines. I think an understanding of what Docker is and what Docker can do is more important than a "how to use Docker in your CI/CD" guide. While there are some common patterns out there, it all comes down to the problem(s) you are trying to solve, and certain patterns may not apply to a certain use case.
Docker facilitates the notion of "configuration as code". I can write a Dockerfile that specifies a particular base image that has all the frameworks I need, along with the custom configuration files that are checked into my repository. I can then build that image using the Dockerfile, push it to my docker registry, then tell my target host to pull the latest image, and then run the image. I can do all of this automatically, using target hosts that have nothing but Linux installed on them.
This is a simple scenario that illustrates how Docker can contribute to CI/CD.
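A hedged sketch of that flow (the registry, tag, and host name are made up):
docker build -t registry.example.com/myapp:42 .      # build from the Dockerfile in the repository
docker push registry.example.com/myapp:42            # publish to the registry
ssh target-host \
  'docker pull registry.example.com/myapp:42 && docker run -d --name myapp registry.example.com/myapp:42'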
Docker is also useful for building your applications. If you have multiple applications with different dependencies, you can avoid having a lot of dependencies and conflicts on your CI machine by building everything in Docker containers that have the necessary dependencies. If you need to scale in the future, all you need is another machine running your CI tool (like a Jenkins slave) and an installation of Docker.
When using microservices this is very important: one application can depend on an old version of a framework while another needs the new version. With containers that's not a problem.
Docker is a DevOps Enabler, Not DevOps Itself: Using Docker, developers can support new development, enhancement, and production support tasks easily. Docker containers define the exact versions of software in use, which means we can decouple a developer's environment from the application that needs to be serviced or enhanced.
Without Pervasive Automation, Docker Won't Do Much for You: You can't achieve DevOps with bad code. You must first ensure that the code being delivered is of the highest quality by automating all developer code delivery tasks, such as unit testing, integration testing, automated acceptance testing (AAT), static code analysis, code review sign-offs and pull request workflow, and security analysis.
Leapfrogging to Docker Without Virtualization Know-How Won't Work: Leapfrogging as an IT strategy rarely works. More often than not, new technologies bring abstractions over existing technologies. It is true that such abstractions increase productivity, but they are not an excuse to skip understanding how a piece of technology works.
Docker is a First-Class Citizen on All Computing Platforms: This is the right time to jump on the Docker bandwagon. For the first time ever, Docker is supported on all major computing platforms. There are two kinds of servers: Linux servers and Windows servers. Native Docker support for Linux has existed from day one, and Linux support has since been heavily optimized.
Agile is a Must to Achieve DevOps: DevOps is likewise a must to achieve agile. The point of agile is adding and demonstrating value iteratively to all stakeholders; without DevOps you likely won't be able to demonstrate that value in a timely manner. So why is agile also a must to achieve DevOps? It takes a lot of discipline to create a stream of continuous improvement, and an agile framework like Scrum defines the fundamental qualities a team must possess to begin delivering iteratively.
Docker saves your organization capital and resources by containerizing your applications. Containers on a single host are isolated from each other and use the same OS resources, which frees up RAM, CPU, and storage. Docker makes it easy to package an application along with all its required dependencies in an image. For most applications there are readily available base images, and you can create a customized base image as well. We build our own custom image by writing a simple Dockerfile. We can ship this image to a central registry, from where we can pull it to deploy into various environments like QA, STAGE, and PROD. All these activities can be automated by CI tools like Jenkins.
In a CI/CD pipeline you can expect Docker to come into the picture when the build is ready. Initially the CI server (Jenkins) checks out the code from SCM into a temporary workspace where the application is built. Once you have the build artifact ready, you can package it as an image together with its dependencies. Jenkins does this by executing simple docker build commands.
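In a Jenkins job that often reduces to a couple of shell steps like these (the artifact build command, registry, and tag are assumptions):
mvn -B package                                               # produce the build artifact
docker build -t registry.example.com/app:${BUILD_NUMBER} .   # package the artifact and its dependencies as an image
docker push registry.example.com/app:${BUILD_NUMBER}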
Docker removes the well-known "matrix from hell" problem by making environments independent through its container technology. The open source Docker project changed the game by simplifying container workflows, and this has resulted in a lot of excitement around using containers in all stages of the software delivery lifecycle, from development to production.
It is not just about containers; it involves building Docker images, managing your images and dependencies in a Docker registry, deploying to an orchestration platform, and so on, and it all comes under the CI/CD process.
DevOps is a culture, methodology, or set of practices for delivering development work very fast. Docker is one of the tools in that DevOps culture, used to deploy applications as containers (using fewer resources to deploy an application).
Docker just packages the developer's environment to run on other systems, so developers need not worry about code that works on their machine but not in production due to differences in environment and operating system.
It just makes the code portable to other environments.

Should I Compile My Application Inside of a Docker Image

Most of the time I am developing Java apps and simply using Maven, so my builds should be reproducible (at least that's what Maven says).
But say you are compiling a C++ program or something a little more involved - should you build inside of Docker?
Or, ideally, use Vagrant or another technology to produce reproducible builds?
How do you manage reproducible builds with Docker?
You can, but not in your final image, as that would mean a much larger image than necessary: it would include all the compilation tools, instead of being limited to only what you need to execute the resulting binary.
You can see an alternative in "How do I build a Docker image for a Ruby project without build tools?"
I use an image to build,
I commit the resulting stopped container as a new image (with a volume including the resulting binary)
I use an execution image (one which only contains what you need to run) and copy the binary from the other image, then commit the resulting container again.
The final image includes the compiled binary and the execution environment.
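A rough sketch of that build-then-copy pattern (names and paths are assumptions; a multi-stage Dockerfile achieves the same result in a single file nowadays):
docker run --name build build-image:latest make         # compile inside the build image
docker cp build:/src/app ./app                          # extract only the binary
docker build -t myapp:latest -f Dockerfile.runtime .    # runtime-only image that copies ./app in
docker rm build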
I wanted to post an answer to this as well, to build on VonC's answer. I just had Red Hat OpenShift training, and they use a tool called Source-to-Image (s2i), which uses Docker to create Docker images. This strategy is great for managing a private (or public) cloud, where your build may be compiled on different machines but you need to keep the build environment consistent.
