Google Compute Engine: Caching when running Builds on Preemptible VMs

Google Compute Engine: Caching when running Builds on Preemptible VMs - jenkins

We just migrated our Jenkins builds from a build agent running 24/7 to use GCE Preemtible VMs via the google-compute-engine-plugin.
Now our builds take much longer, because all builds need to resolve all dependencies (Docker Images, Maven Artifacts, NPM Packages, etc.) almost every time. The Caching on the VM is no longer effective, because the VMs are stopped after a couple of minutes.
Is there a quick solution or a best practice for this that works for the different use cases (Docker, Maven, NPM)?
For example
can I enable a Proxy or CDN that is "closer" (in terms of network latency) to the VMs in the Google Cloud?
Or would mounting a bucket for persisting the Images, local Maven Repo and NPM Cache speed things up?
Any other ideas?

CDN would cache HTTP(S) load balanced content so not sure whether its a right fit for your use case or not. Proxy could be a possible workaround in terms of latency but it may also depend on your design and use case. However, I was looking at this where they advised to use Google Cloud Storage (GCS). If you use GCS at the same region as VM, it seems would help speed up the process.

Related

What is the state of the art running Jenkins - dedicated server or containers?

I've been running Jenkins in container for about 6 months, only one controller/master and no additional nodes, because its not needed in my case, I think. It works OK. However I find it to be a hassle to make changes to it, not because I'm afraid it will crash, but because it takes a long time to build the image (15+ min), installing SDK's etc. (1.3G).
My question is what is the state of the art running Jenkins? Would it be better to move Jenkins to a dedicated server (VM) with a webserver (reverse proxy)?
what-are-the-advantages-of-running-jenkins-in-a-docker-container

Is 15 mins a long time because you make a change, build, find out something is wrong and need to make another change?
I would look at how you are building the container and get all the SDKs installed in the early layers so that rebuilding the container can use those layers from cache and move your layers that change to as late as possible so as little of the container needs rebuilding.
It is then worth looking at image layer caches if you clean your build environment regularly (I use Artifactory)
Generally, I would advocate that as little building is done on the Controller and this is shipped out to agents which are capable of running Docker.
This way you don't need to install loads of SDKs and change your Controller that often etc as you do all that in containers as and when you need them.
I use the EC2 cloud plugin to dynamically spin up agents as and when they are needed. But you could have a static pool of agents if you are not working in a cloud provider.

Is there a reason running CI builds on kubernetes cluster?

I don't know much about kubernetes, but as far as I know, it is a system that enables you to control and manage containerized applications. So, generally speaking, the essence of the benefit that we get from kubernetes is the ability to "tell" kubernetes what containers we want running, how many of them, on which machines, among other details, and kubernetes will take care of doing that for us. Is that correct?
If so, I just can't see the benefit of running a CI pipeline using a kubernetes pod, as I understand that some people do. Let's say you have your build tools on Docker containers instead of having them installed on a specific machine, that's great - you can just use those containers in the build process, why kubernetes? Is there any performance gain or something like this?
Appreciate some insights.

It is highly recommended to get a good understanding of what Kubernetes is and what it can and cannot do.
Generally, containers combined with an orchestration tools can provide a better management of your machines and services. It can significantly improve the reliability of your application and reduce the time and resources spent on DevOps.
Some of the features worth noting are:
Horizontal infrastructure scaling: New servers can be added or removed easily.
Auto-scaling: Automatically change the number of running containers, based on CPU utilization or other application-provided metrics.
Manual scaling: Manually scale the number of running containers through a command or the interface.
Replication controller: The replication controller makes sure your cluster has an equal amount of pods running. If there are too many pods, the replication controller terminates the extra pods. If there are too few, it starts more pods.
Health checks and self-healing: Kubernetes can check the health of nodes and containers ensuring your application doesn’t run into any failures. Kubernetes also offers self-healing and auto-replacement so you don’t need to worry about if a container or pod fails.
Traffic routing and load balancing: Traffic routing sends requests to the appropriate containers. Kubernetes also comes with built-in load balancers so you can balance resources in order to respond to outages or periods of high traffic.
Automated rollouts and rollbacks: Kubernetes handles rollouts for new versions or updates without downtime while monitoring the containers’ health. In case the rollout doesn’t go well, it automatically rolls back.
Canary Deployments: Canary deployments enable you to test the new deployment in production in parallel with the previous version.
However you should also know what Kubernetes is not:
Kubernetes is not a traditional, all-inclusive PaaS (Platform as a
Service) system. Since Kubernetes operates at the container level
rather than at the hardware level, it provides some generally
applicable features common to PaaS offerings, such as deployment,
scaling, load balancing, and lets users integrate their logging,
monitoring, and alerting solutions. However, Kubernetes is not
monolithic, and these default solutions are optional and pluggable.
Kubernetes provides the building blocks for building developer
platforms, but preserves user choice and flexibility where it is
important.
Especially in your use case note that Kubernetes:
Does not deploy source code and does not build your application.
Continuous Integration, Delivery, and Deployment (CI/CD) workflows are
determined by organization cultures and preferences as well as
technical requirements.
The decision is yours but having in mind the main concepts above will help you make it.

An important detail is that you do not tell Kubernetes what nodes a given pod should run on; it picks itself, and if the cluster is low on resources, in many cases it can actually allocate more nodes on its own (via the cluster autoscaler).
So if your CI system is fairly busy, and uses all containers for everything, it could make more sense to run an individual build job as a Kubernetes Job. If you have 100 builds that all start at the same time, it's possible for the cluster to give itself more hardware, and the build queue will clear out faster. Particularly if you're using Kubernetes for other tasks, this can save you same administrative effort over maintaining a dedicated pool of CI-system workers that need to be separately updated and will sit mostly idle until that big set of builds arrives.
Kubernetes's security settings are also substantially better than Docker's. Say your CI system needs to launch containers as part of a build. In Kubernetes, it can run under a service account, and be given permissions to create and delete deployments in a specific namespace, and nothing else. In Docker the standard approach is to give your CI system access to the host's Docker socket, but this can be easily exploited to take over the host.

NPM CI takes different amount of time on the same Jenkins Pipeline

I have a pipeline running on Jenkins that does a few steps before running my lint, unit and integration tests. One of those steps is to install the dependencies. That's done using npm ci.
I'm trying to figure out what is causing this step to take different amount of time, sometimes it's around 15sec sometimes more than 1min. Unfortunately it's been hard to find anything online that explains this random behaviour.
The pipeline is running on the same code base, so no changes have been made to the dependencies.
Would be very helpful if someone could explain what is causing this difference, or point me to a resource that might help.

This expected behaviour and you should not expect always the same amount of time.
There are many factors while installing node modules, for example
NPM registry server might be busy mean more laod so you can expect latency
Your local server stats, for instance, what if your jenkins CPU is 100% then can I expect constant installation time?
Network traffic etc
So you should not rely on the registry to response you always on the same amount of time.
you can reproduce easily by adding and removing node modules.
If you hate latency that you can configure your own NPM registry.
Verdaccio is a simple, zero-config-required local private npm
registry. No need for an entire database just to get started!
Verdaccio comes out of the box with its own tiny database, and the
ability to proxy other registries (eg. npmjs.org), caching the
downloaded modules along the way. For those looking to extend their
storage capabilities, Verdaccio supports various community-made
plugins to hook into services such as Amazon's s3, Google Cloud
Storage or create your own

Automated deployment of a dockerized application on a single machine

I have a web application consisting of a few services - web, DB and a job queue/worker. I host everything on a single Google VM and my deployment process is very simple and naive:
I manually install all services like the database on the VM
a bash script scheduled by crontab polls a remote git repository for changes every N minutes
if there were changes, it would simply restart all services using supervisord (job queue, web, etc)
Now, I am starting a new web project where I enjoy using docker-compose for local development. However, I seem to suck in analysis paralysis deciding between available options for production deployment - I looked at Kubernetes, Swarm, docker-compose, container registries and etc.
I am looking for a recipe that will keep me productive with a single machine deployment. Ideally, I should be able to scale it to multiple machines when the time comes, but simplicity and staying frugal (one machine) is more important for now. I want to consider 2 options - when the VM already exists and when a new bare VM can be allocated specifically for this application.
I wonder if docker-compose is a reasonable choice for a simple web application. Do people use it in production and if so, how does the entire process look like from bare VM to rolling out an updated application? Do people use Kubernetes or Swarm for a simple single-machine deployment or is it an overkill?

I wonder if docker-compose is a reasonable choice for a simple web application.
It can be, sure, if the development time is best spent focused on the web application and less on the non-web stuff such as the job queue and database. The other asterisk is whether the development environment works ok with hot-reloads or port-forwarding and that kind of jazz. I say it's a reasonable choice because 99% of the work of creating an application suitable for use in a clustered environment is the work of containerizing the application. So if the app already works under docker-compose, then it is with high likelihood that you can take the docker image that is constructed on behalf of docker-compose and roll it out to the cluster.
Do people use it in production
I hope not; I am sure there are people who use docker-compose to run in production, just like there are people that use Windows batch files to deploy, but don't be that person.
Do people use Kubernetes or Swarm for a simple single-machine deployment or is it an overkill?
Similarly, don't be a person that deploys the entire application on a single virtual machine or be mentally prepared for one failure to wipe out everything that you value. That's part of what clustering technologies are designed to protect against: one mistake taking down the entirety of the application, web, queuing, and persistence all in one fell swoop.
Now whether deploying kubernetes for your situation is "overkill" or not depends on whether you get benefit from the other things that kubernetes brings aside from mere scaling. We get benefit from developer empowerment, log aggregation, CPU and resource limits, the ability to take down one Node without introducing any drama, secrets management, configuration management, using a small number of Nodes for a large number of hosted applications (unlike creating a single virtual machine per deployed application because the deployments have no discipline over the placement of config file or ports or whatever). I can keep going, because kubernetes is truly magical; but, as many people will point out, it is not zero human cost to successfully run a cluster.

Many companies I have worked with are shifting their entire production environment towards Kubernetes. That makes sense because all cloud providers are currently pushing Kubernetes and we can be quite positive about Kubernetes being the future of cloud-based deployment. If your application is meant to run in any private or public cloud, I would personally choose Kubernetes as operating platform for it. If you plan to add additional services, you will be easily able to connect them and scale your infrastructure with a growing number of requests to your application. However, if you already know that you do not expect to scale your application, it may be over-powered to use a Kubernetes cluster to run it although Google Cloud etc. make it fairly easy to setup such a cluster with a few clicks.
Regarding an automated development workflow for Kubernetes, you can take a look at my answer to this question: How to best utilize Kubernetes/minikube DNS for local development

What are the advantages of running Jenkins in a docker container

I've found quite a few blogs on how to run your Jenkins in Docker but none really explain the advantages of doing it.
These are the only reasons I found:reasons to use Docker.
1) I want most of the configuration for the server to be under version control.
2) I want the ability to run the build server locally on my machine when I’m experimenting with new features or configurations
3) I want to easily be able to set up a build server in a new environment (e.g. on a local server, or in a cloud environment such as AWS)
Luckily I have people who take care of my Jenkins server for me so these points don't matter as much.
Are these the only reasons or are there better arguments I'm overlooking, like automated scaling and load balancing when many builds are triggered at once (I assume this would be possible with Docker)?

This answer for Docker, what is it and what is the purpose
covered What is docker? and Why docker?
Docker official site also provides an explanation.
The simple guide here is:
Faster delivery of your applications
Deploy and scale more easily
Get higher density and run more workloads
Faster deployment makes for easier management
For Jenkins usage, it's faster and easier to deploy/install in the docker way.
Maybe you don't need the scale more easily feature right now. And since the docker is quite lightweight, so you can run more workloads.
However
The docker way would also bring some other problem. Generally speaking, it's the accessing privilege.
Like when you need to run Docker inside the Jenkins(in Docker), it would become complicated somehow. This blog would provide you with some knowledge of that situation.
So there is no silver bullet as always. There is no single development, in either technology or in management technique, that by itself promises even one order-of-magnitude improvement in productivity, in reliability, in simplicity.
The choice should be made based on the specific scenario.

Jenkins as Code
You list mainly the advantages of having "Jenkins as Code". Which is a very powerfull setup indeed, but does not necessary requires Docker.
So why is Docker the best choice for a Jenkins as Code setup?
Docker
The main reason is that Jenkins pipelines work really well with Docker. Without Docker you need to install additional tools and add different agents to Jenkins. With Docker,
there is no need to install additional tools, you just use images of these tools. Jenkins will download them from internet for you (Docker Hub).
For each stage in the pipeline you can use a different image (i.e. tool). Essentially you get "micro Jenkins agents" which only exists temporary. Hence you do not need fixed agents anymore. This makes your Jenkins setup much more clean.
Getting started
A while ago I have written an small blog on how to get started with Jenkins and Docker, i.e. create a Jenkins image for development which you can launch and destroy in seconds.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart