In Jenkins, how do I set SCM behavior for the master node rather than the build nodes?

I'm aware I'm lacking basic Jenkins concepts, but with my current knowledge it's hard for me to research this successfully - maybe you can give me some hints I can use to re-word my question if needed.
Currently I'm facing a situation in which, in a setup with several build nodes, the Jenkins master machine is running out of disk space because Jenkins clones git repositories on both the master and the build nodes (and the master only has limited space). This question explains why.
Note: the master node itself does not build anything - it just clones the repo to a local workspace folder (I guess it just needs the Jenkinsfiles).
Going through the job configurations and googling this issue, I find options regarding shallow and sparse clones, or cleaning up the workspace before or after the build using the Workspace Cleanup Plugin. But those settings and plugins only affect the checkout done with checkout(scm) on the build nodes, not on the master.
But if I want to leave the situation as it is on the build nodes and only keep the workspace folders on the Jenkins master machine slim, how do I approach this? What do I have to search for?
And as a side question - isn't it possible to have something like "git exports"? I.e. having the .git folders removed after checking out the commit I need?
In case it depends on the kind of job I use, I'm using scripted pipeline jobs.

I've got a similar setup: A master node, multiple build nodes.
Simply, I set the number of executors to 0 on the master node (from Manage Jenkins -> Manage Nodes), so every job lands on the build nodes.
The only repo cloned on the master is the shared library.
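For reference, a scripted pipeline then just requests an agent by label; a minimal sketch, assuming a hypothetical agent label linux:

```groovy
// With zero executors on the master, the node step below can only be
// satisfied by a build agent (the label 'linux' is an assumption).
node('linux') {
    checkout scm        // the repository is cloned on the agent only
    sh 'make build'     // hypothetical build step
}
```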

Running Jenkins builds in the master node is discouraged for two main reasons:
First, the usability of the Jenkins platform might be affected by many ongoing builds, for example through delays in certain operations.
Second, it is a well-known security problem, as pointed out by the documentation:
Any builds running on the built-in node have the same level of access to the controller file system as the Jenkins process.
It is therefore highly advisable to not run any builds on the built-in node, instead using agents (statically configured or provided by clouds) to run builds.
That same documentation page gives details on these security problems, like what an attacker can do, and describes an alternative that lets you keep building on the master node while mitigating some of the listed issues. That solution is based on the Job Restrictions Plugin.
That said, the most common choice is to let the agent nodes do the builds:
To prevent builds from running on the built-in node directly, navigate to Manage Jenkins » Manage Nodes and Clouds. Select master in the list, then select Configure in the menu. Set the number of executors to 0 and save. Make sure to also set up clouds or build agents to run builds on, otherwise builds won’t be able to start.
If you really have strong reasons to build on the master node, you can always apply a different git clone strategy based on the value of the env.NODE_NAME environment variable. It is set to master if the pipeline job is run on the master node, otherwise it is filled with the node name (of course). Nonetheless, I have never seen anyone customizing the git clone command based on the node used, so... Don't do it 😉
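If you did want to go down that road anyway, here is a minimal sketch of what it could look like; the branch name, repository URL and the shallow-clone fallback are all assumptions:

```groovy
// Hedged sketch only; as said above, customizing the clone per node is
// not recommended. Branch and URL are placeholders.
node {
    if (env.NODE_NAME == 'master') {
        // Keep the master's copy small with a shallow clone.
        checkout([$class: 'GitSCM',
                  branches: [[name: '*/main']],
                  userRemoteConfigs: [[url: 'https://example.com/repo.git']],
                  extensions: [[$class: 'CloneOption', shallow: true, depth: 1]]])
    } else {
        checkout scm    // full default checkout on the build agents
    }
}
```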
About the sparse checkout and the sparse/shallow clone:
The former creates an incomplete working directory: instead of materializing all the trees and blobs present in the current commit, it checks out only those you specify. Do you save that much space? Or rather, is your project tree so heavy that you would need to do something like this? Sparse checkout is generally used when you want a clean working tree, without unnecessary files.
A shallow clone can be useful sometimes to reduce the download time, especially when you have a huge history. The most common option is --depth=1, which instructs git to retrieve only the most recent commit. As far as I know, Jenkins already applies some optimizations to speed up the clone process, but it generally keeps the entire history. Again, I am not sure you would gain much more space.
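To illustrate the two concepts with plain git (wrapped in a pipeline sh step; the repository URL and paths are placeholders, and sparse-checkout needs Git 2.25 or newer):

```groovy
node {
    sh '''
        # shallow clone: only the most recent commit, no history
        git clone --depth=1 https://example.com/repo.git shallow-copy

        # sparse checkout: materialize only the directories you ask for
        # (top-level files such as the Jenkinsfile stay present in cone mode)
        cd shallow-copy
        git sparse-checkout init --cone
        git sparse-checkout set src
    '''
}
```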
A valid (at least for me) alternative to space optimizations on git files is to build in Docker containers. Jenkins has reached a good level of integration with Docker and there are a lot of advantages to using it, among which the disposal of the workspace after the job finishes.
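For illustration, a minimal sketch using the Docker Pipeline plugin; the image name, agent label and build command are examples:

```groovy
// The build runs inside a throwaway container; nothing it produces has
// to outlive the job (label, image and command are assumptions).
node('linux') {
    checkout scm
    docker.image('maven:3-eclipse-temurin-17').inside {
        sh 'mvn -B clean verify'
    }
}
```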

I haven't used the pipeline feature myself so far, but conceptually it is clear that the master requires initial access to the Jenkinsfile. It will therefore be difficult to avoid this step entirely.
If Jenkins itself does not provide an option to fine-tune the clone/checkout behavior on the master side, then I'd see these options:
Create a custom version of Jenkins (or of the corresponding plugin) which hard-codes the behavior that you need (like, shallow/sparse clone). Modifying and building both Jenkins and its plugins is surprisingly simple; often, the most difficult part is to locate the code that you need to touch.
Tune the master's clone in place. Shallowness and sparse-checkout properties can be set on existing clones. If you set these properties after the initial clone (possibly in the Jenkinsfile itself or in a post-build step), Jenkins may well preserve them; see the sketch after this list.
Constantly re-cloning and deleting the repo on the master side increases the load both on the Jenkins master and on your Git server, so be careful with that (especially since your repository is already at a size where disk space matters). If you really want to go that way, you could try to force-remove the clone on the master in a post-build step; this should be relatively easy to implement. You would need to check that this hack does not interfere with Jenkins' access to the Jenkinsfile.
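A hedged sketch of the in-place tuning mentioned above, wrapped in a pipeline sh step for illustration (it could equally run from a post-build step or an admin cron job):

```groovy
// Shallow an existing clone and reclaim the disk space afterwards.
// Running this on the master assumes it still executes such steps.
node('master') {
    sh '''
        git fetch --depth=1                   # shorten the existing history
        git reflog expire --expire=now --all  # drop references to old commits
        git gc --prune=now --aggressive       # actually free the disk space
    '''
}
```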


Why is Jenkins changing node to do another build when it doesn't need to?
We have a Jenkins setup with 3 Mac and 3 Windows nodes, building freestyle projects. We do not use the master for builds. The projects are set to run on any of the 3 nodes suitable for their platform. We are using labels for this purpose.
Some of the time, when we do a build it will do the build on the same node as last time.
But sometimes, without any obvious pattern, it will change to a different node, even though the previously used node is available and not busy. This potentially wastes a lot of time, as incremental builds on the same node are much faster than pulling and building everything from scratch.
Jenkins claims it should allocate jobs to the same node if possible, when multiple possibilities exist.
"As a result, from the user's point of view, it looks as if Jenkins tries to always use the same node for the same job, unless it's not available, in which case it'll build elsewhere. But as soon as the preferred node is available, the build comes back to it." (reference)
Jenkins version is 2.289.2 presently.
The jobs are freestyle builds with shell script/command prompt steps.
Repositories are Git and Mercurial.
I think if you read about the Restrict where this project can be run option, that might provide the answer to your question.
As its definition says: "By default, builds of this project may be executed on any agents that are available and configured to accept new builds."
So if you don't specify an agent and don't restrict the build to run on a particular node, Jenkins may pick any eligible node.
"This potentially wastes a lot of time, as incremental builds on the same node are much faster than pulling and building everything from scratch."
To solve exactly this problem, we have the Restrict where this project can be run feature in Jenkins (the original answer illustrated the corresponding job configuration with a screenshot).
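For pipeline jobs, the equivalent is to pin the build to a label or node name; a minimal sketch (the node name is an example):

```groovy
// Scripted-pipeline counterpart of "Restrict where this project can be run":
// the label expression pins the build to one specific node.
node('mac-builder-01') {        // always reuse the same Mac node
    checkout scm
    sh './build.sh'             // hypothetical incremental build step
}
```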

Execute Build Jobs/Pipelines not on Master but only on Build Agent

Following the Jenkins best practices, I want to prevent Build Jobs/Pipelines from being executed on my Jenkins Master.
To do so, I've installed the Job Restrictions Plugin, using it to configure the Master to run only some Maintenance Pipelines.
The problem is that now the Build Pipelines that are configured to run on specific Agents are not executed anymore. I see that the Build Queue grows continuously, and the Pipelines are never run. I think this behaviour could be related to the Flyweight Executors of the Master.
So, the question is the following: how can I execute just a small subset of Maintenance Pipelines on the Master and, at the same time, execute Build Pipelines only on specific Agents?
You can configure the master node to only be used when explicitly named. Just click the master node, go to Configure, and change Use this node as much as possible to Only build jobs with label expressions matching this node.
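Your maintenance pipelines then have to request the master explicitly, for example (the maintenance command is a placeholder):

```groovy
// After the switch above, only jobs that name the master's label run there.
node('master') {
    sh 'df -h "$JENKINS_HOME"'   // example maintenance task: check disk usage
}
```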
I found the solution that perfectly fits my needs here.
To quickly sum up the solution, I was able to exclude all the user builds from the Master and run on it only the Jobs/Pipelines of a specific Jenkins folder (IuA in my case), by configuring the Job Restrictions Plugin accordingly (the original answer showed the plugin configuration in a screenshot).
In order to better understand the logic behind this solution, I recommend you take a look at the link that I posted above.

Why does gitlab-ci default to using git clone for each job instead of building a docker image first?

GitLab CI's default mode is to use git clone in every job in a pipeline.
This is time-consuming, especially since after cloning we need to install/update all dependencies.
I'd like to flip the order of our pipelines, and start with git clone + docker build, then run all subsequent jobs using that image without cloning and rebuilding for each job.
Am I missing something?
Is there any reason that I'd want to clone the repo separately for each job if I already have an image with the current code?
You are not missing anything. If you know what you are doing, you don't need to clone your repo for each stage in your pipeline. If you set the GIT_STRATEGY variable to none, your test jobs, or whatever they are, will run faster and you can simply run your docker pull commands and the tests that you need. Just make sure that you use the correct docker images, even if you start many parallel jobs. You could for example use CI_COMMIT_REF_NAME as part of the name of the docker image.
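A minimal .gitlab-ci.yml sketch of this approach; the registry path, image name and test command are placeholders:

```yaml
build:
  stage: build
  script:
    - docker build -t registry.example.com/app:$CI_COMMIT_REF_NAME .
    - docker push registry.example.com/app:$CI_COMMIT_REF_NAME

test:
  stage: test
  variables:
    GIT_STRATEGY: none          # skip the default clone/fetch for this job
  script:
    - docker pull registry.example.com/app:$CI_COMMIT_REF_NAME
    - docker run --rm registry.example.com/app:$CI_COMMIT_REF_NAME ./run-tests.sh
```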
As to why GitLab defaults to using git clone, my guess is that this is the least surprising behavior. If you imagine someone new to GitLab and new to CI, it will be much easier for them to get up and running if each job simply clones the whole repo. You also have to remember that not everyone builds docker images in their jobs. I would guess that the most common setup is either with programming languages that don't need to be compiled, for example Python, or with a build job that produces binaries and a test job that runs them. They can then use artifacts to send the binaries from the build job to the test job.
This is easy and it works. When people then realize that a lot of their test jobs' time is spent just cloning the repository, they might look into how to change the GIT_STRATEGY and do other things to optimize their particular build.
One of the reasons for using a CI system is to execute jobs against a fresh state of your repo. This cannot be guaranteed if you skip the git clone process in certain jobs. A job may modify the repo's state by deleting files or generating new ones; only the artifacts that are explicitly declared in the pipeline should be shared between jobs, nothing else.

Jenkins Multibranch Pipeline - Issues with deleting jobs

Use case: using a Jenkinsfile to auto-create builds for branches
Summary:
For a variety of reasons, the Jenkins master sometimes fails to connect to the SCM server. When this occurs, Jenkins deletes that job's directory on the master because it no longer sees the branches. However, the slaves are not cleaned up, so they still have the old workspace paths (which are uniquely named based on the build # in my setup). When the Jenkins master reconnects to the SCM server, it recreates a new job folder on the master, and the build counter is reset to #1.
This creates the following issues:
When a build starts, it executes on a slave. Since the master has a new counter, the job is #1. But this path may already exist from a previous build on that slave, so the artifact is built with content that was checked out for the original old build (e.g. Maven uses the /target directory inside the workspace, which already existed from the previous build). So the end result is an artifact that potentially contains the wrong code.
This can create build storms. After the connection issues are resolved, Jenkins sees all the repositories and branches with Jenkinsfiles and starts to build them. So in a setup of, say, 20 repositories with 10 branches each, this creates 200 new builds, and the number grows with additional repositories and branches. This is obviously not desired.
Solutions:
One quick solution I can think of is to update the Jenkinsfile to delete the workspace, if it exists, before running the job inside it (see the sketch after this list). But this is just a workaround: I would not want to mask the connection issues, and I would like to retain the actual build history of a pipeline (not have it keep erasing itself).
Minimize connection issues. This obviously cannot always be guaranteed, though. Plus, sometimes maintenance must force servers offline. While I can structure maintenance in a way that limits or works around such issues, there will still be rare cases where downtime is required across the board. It would be best if Jenkins could handle this use case.
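A minimal sketch of the workaround from the first item; the agent label and build step are assumptions:

```groovy
// Wipe any leftover workspace before checking out, so a reset build
// counter cannot silently reuse stale content (e.g. an old /target dir).
node('build') {                 // 'build' is a hypothetical agent label
    deleteDir()                 // remove the workspace contents if present
    checkout scm                // fresh checkout into the now-empty directory
    sh 'mvn -B clean verify'    // hypothetical build step
}
```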
I'm curious whether anyone has run into this issue and what the thoughts are on this problem.

Keeping two jenkins server in sync

I have
a regular Jenkins master, jettysburg:8888/jenkins,
and a failover Jenkins master, jettyperry:8888/jenkins.
I am planning to use mercurial to keep the jobs directory of the two masters in sync.
Most of the time, new jobs and builds are defined and executed at jettysburg:8888. I would then need to sync jettyperry:8888 with whatever took place on jettysburg:8888, and I plan to perform the push once a day.
This means that, after a failover to jettyperry:8888, the push would be performed in the opposite direction.
I have been using Mercurial for my non-programming/coding files (like Word, Excel and text files) for that purpose, as a means to perform incremental and redundant backups of my "mission critical" files, as well as to version them.
I am also hoping to rely on Mercurial to back out changes made to Jenkins jobs.
Is using mercurial to sync two Jenkins masters a good idea? Is there a better way to keep two Jenkins servers in sync? In this case, I am syncing only the jobs tree.
As long as you don't need shared job state between the servers (they run in their own little universes) and you keep the same plugin modules and libraries on both Jenkins servers, using some form of version control to keep the actual job definitions in sync is fine.
My office does this with git. We have a development and a production set of Jenkins servers. We maintain a base Linux image with Jenkins installed, with all necessary modules and locally installed libraries (like Node.js and such). We then spin up an instance of the image and pull down the jobs.
The one thing that can be a challenge is keeping things like credentials and Jenkins config settings in sync; you might need to keep them as part of the base image.
If you need the job queues to persist and be shared (like a master-master setup), you can look at this plugin, which allows multiple Jenkins masters to share the same job queue: https://wiki.jenkins-ci.org/display/JENKINS/Gearman+Plugin
There are two approaches to this:
Instead of approaching a solution with a hot backup, why not consider clustered masters, so you get an active-active solution? You may want to look at https://wiki.jenkins-ci.org/display/JENKINS/Gearman+Plugin. This helps you cluster a bunch of masters, so one going down should not be an issue.
Consider running Jenkins in containers and externalizing the Jenkins projects directory as an external volume on NFS, so you can bring up another container when one goes down. Keeping both containers running at once would be a challenge with concurrent writes (if there are any).
Hope this helps.
