How to throttle an entire pipeline in Jenkins


I'm new to Jenkins Pipeline and trying to understand how I can throttle an "entire" pipeline, which basically means the following:
1) I can run the same pipeline up to a maximum number of concurrent runs, say MAX_CONCURRENT_RUNS = 2.
2) Each run (essentially a build) can have its own parameters, with the extra requirement that two (or more) different builds CAN (if required) be given the same parameters.
3) If at a particular point in time there are already MAX_CONCURRENT_RUNS builds (runs) of the pipeline, then run number MAX_CONCURRENT_RUNS + 1 will "hold" itself until the first of the currently running builds terminates, and only then start to execute.
I have looked at this SO question and also this SO question, but neither is exactly applicable to my situation (requirements).
I'm using Jenkins server version 2.176.1

After some research, mainly in these two links:
the throttle plugin's official GitHub page and the JENKINS-45140 issue (where some of the comments were very useful), I have composed this solution:
1) First, install the required plugin. It can be found under Manage Jenkins --> Manage Plugins by typing throttle-concurrents in the search box (the official plugin page can be found here).
2) A "simple" throttle category needs to be added to the plugin's settings within Jenkins' global configuration. Go to Manage Jenkins --> Configure System; there, under the "Throttle Concurrent Builds" section, add the new category. In the example below, I have set the name of the category to simpleThrottleCatagory, with the following parameters:
This way, the pipeline can run several builds at the same time, up to an upper limit on the number of builds, which is essentially MAX_CONCURRENT_RUNS (in this case 2).
3) In this example I will keep the pipeline implementation itself as simple as possible in order to focus on the throttling considerations and not on the common pipeline stuff.
3.1) The "simple concurrent pipeline" will simply receive two parameters from the user:
The number of seconds to sleep: NumSecToSleep.
A sample choice parameter named BocaOrRiver with two possible values: boca or river.
3.2) The entire pipeline implementation in this case is as follows (note that some extra "approvals" need to take place so that the Calendar.getInstance().getTime().format('YYYY/MM/dd-hh:mm:ss',TimeZone.getTimeZone('CST')) call will work. If you are unable to make these changes, replace the two lines that contain this call with any other implementation that gets the current timestamp):
// Do NOT place within the pipeline block
properties([ [ $class: 'ThrottleJobProperty',
               categories: ['simpleThrottleCatagory'],
               limitOneJobWithMatchingParams: false,
               maxConcurrentPerNode: 2,
               maxConcurrentTotal: 2,
               paramsToUseForLimit: '',
               throttleEnabled: true,
               throttleOption: 'category' ] ])
pipeline
{
    agent any
    parameters
    {
        string(name: "NumSecToSleep", description: "Number of seconds to sleep in the Sleep stage")
        choice(name: "BocaOrRiver", choices: "boca\nriver", description: "Which Team in Buenos Aires do you prefer?")
    }
    stages
    {
        stage("First stage")
        {
            steps
            {
                echo "WORKSPACE is:${WORKSPACE}"
                echo "Build number is:${env.BUILD_NUMBER}"
            }
        }
        stage("Sleep stage")
        {
            steps
            {
                script
                {
                    def time = params.NumSecToSleep
                    echo "Sleeping for ${params.NumSecToSleep} seconds"
                    def timeStamp = Calendar.getInstance().getTime().format('YYYY/MM/dd-hh:mm:ss',TimeZone.getTimeZone('CST'))
                    println("Before sleeping current time is:" + timeStamp)
                    sleep time.toInteger() // seconds
                    timeStamp = Calendar.getInstance().getTime().format('YYYY/MM/dd-hh:mm:ss',TimeZone.getTimeZone('CST'))
                    println("After sleeping current time is:" + timeStamp)
                    echo "Done sleeping for ${params.NumSecToSleep} seconds"
                }
            }
        }
    }
}
3.3) NOTES:
3.3.1) The code within the actual pipeline block is straightforward: it displays some build-specific values, just to be sure that each build of the job gets its own user-defined parameters, and then sleeps for some number of seconds so that two (in this case) or more builds really do run concurrently and we can see with our own eyes (at run time) that the builds run together (in parallel).
3.3.2) The more interesting part of the pipeline is the properties block (at the top):
3.3.2.1) Note that it needs to be defined OUTSIDE of the pipeline block section.
3.3.2.2) I think that most of the settings defined within this properties block are very "self explanatory" YET the two that should be mentioned are:
$class: 'ThrottleJobProperty': This is a "predefined" value of Jenkins to indicate that this "job" (can be also pipeline) can be throttled.
categories: ['simpleThrottleCatagory']: This is the "global throttle category" defined in the previous step.
4) Basic illustration:
The figure below is a screenshot of a situation where three builds were started one after the other, with enough time to sleep in each of them so that the first two (builds 17 & 18, marked as points 2 & 3 respectively) won't finish too soon, meaning that the third build (build 19) has to wait for an available executor (marked as point 4):
5) Here I have described a very simple and minimal yet (IMO) representative implementation, along with the global configuration, of an "entire" concurrent pipeline. Of course this topic can be discussed MUCH further; for example, it is also possible to throttle only a single step within a pipeline.
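As a rough illustration of throttling only part of a pipeline, here is a minimal Scripted Pipeline sketch that uses the throttle step provided by the same plugin. It assumes the simpleThrottleCatagory category from step 2 already exists; the stage names and the sleep time are made up for the example.

```groovy
// Scripted Pipeline sketch (stage names and sleep time are illustrative only).
// Assumes the global category 'simpleThrottleCatagory' from step 2 exists.

stage('Unthrottled stage') {
    // Not wrapped in throttle, so the category limits do not apply here
    echo 'This part runs without any throttling'
}

// Only the node block wrapped by throttle counts against the category limits
throttle(['simpleThrottleCatagory']) {
    node {
        stage('Throttled stage') {
            echo "Build ${env.BUILD_NUMBER} is inside the throttled block"
            sleep 30 // seconds
        }
    }
}
```

With this layout, every build can start and run the first stage right away, and only the wrapped node block waits for a free slot in the category.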

Related

Jenkins: coordinating multiple pipelines

I am developing software for an embedded device. The steps involved in building and verifying it all are complicated: creating the build environment (via containers), building the actual SD card image, running unit tests, automated tests on target hardware, license compliance checks and so on - details aren't important here.
Currently I have this in one long declarative Jenkinsfile as a multibranch-pipeline (for all intents and purposes here, we're doing gitflow). In doing this I've hit a limit on the size of a Jenkinsfile (https://issues.jenkins.io/browse/JENKINS-37984) and can't actually get all the stages in that I want to.
It's too big, so I need to cut this massive pipeline up. I broke it all up into little pipeline jobs with parameters to pass data/context between each part of the pipeline and came up with something like this:
I've colour-coded the A and B artifacts as they're used a lot and the lines would make things messy. What this tries to show is an order of running things, where things in a column depend on artifacts created in column to the left.
I'm struggling to discover how to do the "waiting" for multiple upstream jobs (for instance in Job Foxtrot in the diagram) before starting another downstream job that depends on them.
I specifically do not want to turn each column in the diagram into a parallel group of things, because for instance Job Delta might take 2 minutes but Job Charlie take 20 minutes. The exact duration of each job is variable and unpredictable as for some parameter combinations will mean building from scratch and others will cause an existing artifact to be output.
I think I need something like the join plugin (https://plugins.jenkins.io/join/), but for pipeline jobs (join only works on freestyle jobs and is quite aged).
The one approach I've explored is to have a "controller" job (maybe job Alpha in the diagram?) that uses the build step (https://www.jenkins.io/doc/pipeline/steps/pipeline-build-step/) with the wait parameter set to false to trigger the downstream jobs in the correct order, with the correct parameters. It would involve searching Jenkins.instance.getItems() to locate the Runs for the downstream projects which have an upstream cause that matches the currently executing "controller" job. This involves polling while waiting for the job to appear and then polling for the job to complete. This feels like I'm "doing it wrong". Below is the source for this polling approach - be gentle, I'm new to Groovy!
Is this polling approach a good way? What problems could I encounter with this approach? Should I be using the ItemListener Jenkins ExtensionPoint and writing a plugin to do this sort of thing in a generic way? Is there another way I've not found?
I feel like I'm not "holding it right" when it comes to the overall pipeline design/architecture here.
Finally, after writing this I notice that Jobs India, Juliet and Kilo could be collapsed into a single job, but I don't think that would solve much.
@NonCPS
Integer getTriggeredBuildNumber(String project, String causeJobName, Integer causeBuildNumber) {
    // find the job/project first
    def job = Jenkins.instance.getAllItems(org.jenkinsci.plugins.workflow.job.WorkflowJob.class).find { item -> item.getFullName() == project }
    if (job == null) {
        // no such job, so there is no triggered build either
        return -1
    }
    // find a build for this job that was caused by the current build
    def build = job.getBuilds().find { run ->
        run.getCauses().findAll { it.class == hudson.model.Cause.UpstreamCause.class }.find { cause ->
            cause.getUpstreamProject() == causeJobName && cause.getUpstreamBuild() == causeBuildNumber
        } != null
    }
    if (build != null) {
        return build.getNumber()
    } else {
        return -1
    }
}

@NonCPS
Boolean isBuildComplete(String jobName, Integer buildNumber) {
    def job = Jenkins.instance.getAllItems(org.jenkinsci.plugins.workflow.job.WorkflowJob.class).find { item -> item.getFullName() == jobName }
    if (job) {
        def build = job.getBuildByNumber(buildNumber)
        // a build that does not exist (yet) is treated as not complete
        return build != null && build.isBuilding() == false && build.getResult() != null
    } else {
        println "WARNING: job '" + jobName + "' not found."
        return false
    }
}

We've hit the "Code too large" error too many times, and the way to cope with it is to refactor your pipeline so that it stays well under the limit. The following may be used:
You can run a combination of scripted and declarative pipeline. So some stages in the beginning and/or in the end may be refactored out.
You can build some of the parallel stages dynamically. This code would not be counted towards the limited code size.
Lastly, the issue mentions transformation variables, and that can help too.
We used the combination of the above and have expanded our pipeline well beyond what it was when we first encountered the issue you're facing.
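To make the second point a bit more concrete, here is a minimal sketch of building parallel branches dynamically from a list and running them from a script block. The helper name makeBranches and the branch names are hypothetical, and the echo is a placeholder for the real work.

```groovy
// Sketch only: makeBranches and the branch names are made up for illustration.
def makeBranches(List<String> names) {
    def branches = [:]
    for (String n in names) {
        def name = n // fresh variable per iteration, captured by the closure below
        branches[name] = {
            stage(name) {
                echo "Running ${name}" // placeholder for the real steps
            }
        }
    }
    return branches
}

pipeline {
    agent any
    stages {
        stage('Dynamic parallel') {
            steps {
                script {
                    // parallel returns only after every branch has finished
                    parallel makeBranches(['unit-tests', 'license-check', 'hardware-tests'])
                }
            }
        }
    }
}
```

The branches are generated at run time from a short list, so adding another branch means adding a list entry rather than another block of declarative code.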

CI for a monorepo with Jenkins and BlueOcean

I'm trying to figure out what options I have for building a good CI/CD pipeline for a monorepo.
I'd like to have something like this (it is only a pseudo pipeline, not what I'm using at the moment in my monorepo or what I will end up with).
Explanation:
Pre: work out what I should build, test, etc., and dynamically build a parallel step that gives me the capabilities explained below.
Foo: run the parallel step and comfortably wait :)
This is the only way I could think of to get these features:
* The build process can be shared among the P's, and I can generate some waitUntil statements to make this work, I guess...
* Every P is independent of the others: if, for example, the Ut of P2 fails, it doesn't affect the progress of the rest of the pipeline, or, if I want it to, that is only a failFast configuration.
* Every step along the way is likewise unrelated to the progress of the other P's, so when Ut finishes in any of the P's, its St starts immediately (though this might change according to some configuration I'll probably need).
The main problems with this are:
1. I lose the ability to restart single steps (since I can only restart top-level stages).
2. It requires me to do a lot more with Scripted Pipeline, and BlueOcean's support for that (which is kind of critical to me) looks questionable; BlueOcean seems to be better supported within the scope of Declarative Pipeline.
NOTE: I could probably split every P into a separate Jenkins job, but that would cost a lot of time in checking out the workspace and preparing the monorepo, and, as I said, the "build" step may be shared between the P's, so it is more efficient to do it this way.
I will appreciate any feedback or suggestions :)

There's no problem whatsoever with doing what you want with a Declarative pipeline, since stage can have a stages child. So:
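pipeline {
    agent any
    stages {
        // the echo steps are placeholders for the real work
        stage("Pre") { steps { echo "Pre" } }
        stage("Foo") {
            parallel {
                stage("P1") {
                    stages {
                        stage("P1-Build") { steps { echo "P1 build" } }
                        stage("P1-Ut") { steps { echo "P1 unit tests" } }
                        stage("P1-St") { steps { echo "P1 system tests" } }
                    }
                }
                stage("P2") {
                    stages {
                        stage("P2-Build") { steps { echo "P2 build" } }
                        stage("P2-Ut") { steps { echo "P2 unit tests" } }
                    }
                }
                // etc..
            }
        }
    }
}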
Stages P1..P4 will run in parallel but within each their Build-unittest-test stages will run sequentially.
You won't be able to restart separate stages but it's not a good feature anyway.
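Regarding the failFast behaviour mentioned in the question: in Declarative Pipeline this is a flag on the stage that contains the parallel block. A minimal sketch, with placeholder stage names and echo steps:

```groovy
pipeline {
    agent any
    stages {
        stage('Foo') {
            // true: abort the remaining branches as soon as one of them fails
            // false (the default): let the other branches run to completion
            failFast true
            parallel {
                stage('P1') {
                    steps { echo 'P1 work' } // placeholder
                }
                stage('P2') {
                    steps { echo 'P2 work' } // placeholder
                }
            }
        }
    }
}
```

Leaving failFast out (or setting it to false) keeps the P's independent, which is the behaviour described in the question.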

What is the most straightforward way to restrict pipeline stages to a specific shared resource?

We have an existing Jenkins install that is testing firmware running on an embedded target. The multi-stage pipeline looks something like: Checkout -> Build -> Download -> Smoke tests -> Unit tests. This is working great, except it takes 9 hours to run the pipeline. To speed things up and also to test different target variants we have added 3 more targets to the system (UUT#1, #2, and so on).
My question is, what is the most straightforward way to allow the parallelization to happen while also restricting the suites to UUTs with specific properties? For example, our Unit tests contain about 10 different suites (suite1, suite2 and so on), and what I’d like to do is spread those out amongst the 4 UUTs (thus having 4 suites running at a time) but restrict the execution this way:
Suite1 can only run on a UUT that has ‘USB’
Suite2 can only run on a UUT that has ‘LCD-display’
Suite3 can run anywhere
.. and so on, then my UUTs might have properties like:
UUT#1 ‘USB LCD-display’
UUT#2 ‘Ethernet’
UUT#3 ‘RS-232 USB’
Etc.
Reading about agents, it seems that a label on an agent may allow this, but agents seem to carry a lot of overhead and I’m not sure if they’re appropriate.
Long-time Jenkins user, but this is the first time I’ve ever attempted anything this complicated and pipelines are a new concept for me.

A straightforward way is to use the Lockable Resources plugin.
This can be used as a step as well as a stage option (undocumented). The latter comes in handy if you have nested stages which all depend on the resource to be locked.
Stage option in declarative pipeline
pipeline {
    agent any
    stages {
        stage('Test') {
            options {
                // Lock a single resource from all resources labeled 'mylabel'
                lock( label: 'mylabel',
                      quantity: 1,
                      variable: 'MyResourceName' )
            }
            steps { // or 'parallel' or 'stages'
                echo "Locked resource $MyResourceName"
                sleep 10
                echo "Resource will be unlocked after this stage"
            }
        }
    }
}
Step in scripted pipeline
node {
    stage('Test') {
        lock( label: 'mylabel',
              quantity: 1,
              variable: 'MyResourceName' ) {
            echo "Locked resource $MyResourceName"
            sleep 10
            echo "Resource will be unlocked after this stage"
        }
    }
}
Caveats
If lock is used as a step in declarative pipeline, you may get an error:
Missing required parameter: "resource"
This seems to be a little bug in argument checking. According to the documentation, you only need to specify either resource or label parameter. Simply pass null as the value for this parameter.
If parameter quantity is not specified, all resources that match the given label will be locked.
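Applied to the UUT scenario from the question, a rough Scripted Pipeline sketch could look like the following. It assumes each UUT is defined as a lockable resource whose labels are its properties (USB, LCD-display, ...) plus a common label that every UUT carries; the label name uut, the variable name UUT_NAME and the echo step are all made up for the example.

```groovy
// Sketch only: the resource labels, the 'uut' label and UUT_NAME are assumptions.
def suites = [
    [name: 'suite1', label: 'USB'],         // needs a UUT with USB
    [name: 'suite2', label: 'LCD-display'], // needs a UUT with an LCD display
    [name: 'suite3', label: 'uut'],         // can run on any UUT
]

def branches = [:]
for (s in suites) {
    def suite = s.name
    def label = s.label
    branches[suite] = {
        // Waits until one resource with a matching label is free, then locks it
        lock(label: label, quantity: 1, variable: 'UUT_NAME') {
            echo "Running ${suite} on ${env.UUT_NAME}" // placeholder for the real test run
        }
    }
}

node {
    parallel branches
}
```

All suites are queued at once, and each one starts as soon as a UUT with the required label becomes free, so no suite holds a UUT it cannot use.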

Matrix configuration with Jenkins pipelines

The Jenkins Pipeline plugin (aka Workflow) can be extended with other Multibranch plugins to build branches and pull requests automatically.
What would be the preferred way to run multiple configurations? For example, building with Java 7 and Java 8. This is often called matrix configuration (because of the multiple combinations such as language version, framework version, ...) or build variants.
I tried:
executing them serially as separate stage steps. Good, but takes more time than necessary.
executing them inside a parallel step, with or without nodes allocated inside them. Works but I cannot use the stage step inside parallel for known limitations on how it would be visualized.
Is there a recommended way to do this?

TLDR: Jenkins.io wants you to use nodes for each build.
Jenkins.io: In pipeline coding contexts, a "node" is a step that does two things, typically by enlisting help from available executors on agents:
Schedules the steps contained within it to run by adding them to the Jenkins build queue (so that as soon as an executor slot is free on a node, the appropriate steps run)
It is a best practice to do all material work, such as building or running shell scripts, within nodes, because node blocks in a stage tell Jenkins that the steps within them are resource-intensive enough to be scheduled, request help from the agent pool, and lock a workspace only as long as they need it.
Vanilla Jenkins Node blocks within a stage would look like:
stage('build') {
    node('java7-build'){ ... }
    node('java8-build'){ ... }
}
Further extending this notion Cloudbees writes about parallelism and distributed builds with Jenkins. Cloudbees workflow for you might look like:
stage('build') {
    parallel 'java7-build':{
        node('mvn-java7'){ ... }
    }, 'java8-build':{
        node('mvn-java8'){ ... }
    }
}
Your requirement of visualizing the different builds in the pipeline could be satisfied with either workflow, but I trust the Jenkins documentation for best practice.
EDIT
To address the visualization @Stephen would like to see, he's right - it doesn't work! The issue has been raised with Jenkins and is documented here; the resolution, which involves the use of 'labelled blocks', is still in progress :-(
Q: Is there documentation letting pipeline users not to put stages inside of parallel steps?
A: No, and this is considered to be an incorrect usage if it is done; stages are only valid as top-level constructs in the pipeline, which is why the notion of labelled blocks as a separate construct has come to be ... And by that, I mean remove stages from parallel steps within my pipeline.
If you try to use a stage in a parallel job, you're going to have a bad time.
ERROR: The ‘stage’ step must not be used inside a ‘parallel’ block.

I would suggest Declarative Matrix as a preferred way to run multiple configurations in Jenkins. It allows you to execute the defined stages for every configuration without code duplication.
Example:
pipeline {
    agent none
    stages {
        stage('Test') {
            matrix {
                agent {
                    label "${NODENAME}"
                }
                axes {
                    axis {
                        name 'NODENAME'
                        values 'java7node', 'java8node'
                    }
                }
                stages {
                    stage('Test') {
                        steps {
                            echo "Do Test for ${NODENAME}"
                        }
                    }
                }
            }
        }
    }
}
Note that declarative Matrix is a native declarative Pipeline feature, so no additional Plugin installation needed.
Jenkins blog post about the matrix directive.
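For real combinations (language version times framework version, as in the question), the same directive takes several axes and can skip unwanted cells with excludes. A minimal sketch; the axis names JAVA_VERSION and FRAMEWORK and their values are made up for illustration:

```groovy
pipeline {
    agent none
    stages {
        stage('BuildAndTest') {
            matrix {
                agent any
                axes {
                    axis {
                        name 'JAVA_VERSION' // hypothetical axis
                        values '7', '8'
                    }
                    axis {
                        name 'FRAMEWORK'    // hypothetical axis
                        values 'v1', 'v2'
                    }
                }
                excludes {
                    // Skip the one cell where both conditions match: Java 7 with framework v2
                    exclude {
                        axis {
                            name 'JAVA_VERSION'
                            values '7'
                        }
                        axis {
                            name 'FRAMEWORK'
                            values 'v2'
                        }
                    }
                }
                stages {
                    stage('Build') {
                        steps {
                            echo "Building with Java ${JAVA_VERSION} and framework ${FRAMEWORK}"
                        }
                    }
                }
            }
        }
    }
}
```

This runs the Build stage for three of the four combinations, each as its own matrix cell.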

As noted by @StephenKing, Blue Ocean will show parallel branches better than the current stage view. A planned upcoming version of the stage view will be able to show all the branches, though it will not visually indicate any nesting structure (it would look the same as if you ran the configurations serially).
In any event, the deeper issue is that you will essentially only get a pass/fail status for the build overall, pending a resolution to JENKINS-27395 and related requests.

In order to test each commit on several platforms, I've used this base Jenkinsfile skeleton:
def test_platform(label, with_stages = false)
{
    node(label)
    {
        // Checkout
        if (with_stages) stage label + ' Checkout'
        ...
        // Build
        if (with_stages) stage label + ' Build'
        ...
        // Tests
        if (with_stages) stage label + ' Tests'
        ...
    }
}
/*
parallel ( failFast: false,
    Windows: { test_platform("Windows") },
    Linux: { test_platform("Linux") },
    Mac: { test_platform("Mac") },
)
*/
test_platform("Windows", true)
test_platform("Mac", true)
test_platform("Linux", true)
With this it's relatively easy to switch from a sequential to a parallel execution, each of them having their pros and cons:
Parallel execution runs much faster, but it doesn't contain the stages labelling
Sequential execution is much slower, but you get a detailed report thanks to stages, labelled as "Windows Checkout", "Windows Build", "Windows Tests", "Mac Checkout", etc.
I'm using the sequential execution for the time being, until I find a better solution.
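For what it's worth, more recent Pipeline versions accept the block form of stage inside parallel branches, so the trade-off above may no longer apply. A minimal sketch, assuming such a version and agents labelled Windows, Linux and Mac; the helper name testPlatform and the echo steps are placeholders:

```groovy
// Sketch only: assumes a Pipeline version that allows block-scoped stages inside
// parallel branches, and agents labelled 'Windows', 'Linux' and 'Mac'.
def testPlatform(String label) {
    node(label) {
        stage("${label} Checkout") {
            echo "Checking out on ${label}" // placeholder
        }
        stage("${label} Build") {
            echo "Building on ${label}"     // placeholder
        }
        stage("${label} Tests") {
            echo "Testing on ${label}"      // placeholder
        }
    }
}

parallel(
    failFast: false,
    Windows: { testPlatform('Windows') },
    Linux:   { testPlatform('Linux') },
    Mac:     { testPlatform('Mac') }
)
```

How well the nested stages are displayed still depends on the UI in use.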

It seems like there is relief coming at least with the BlueOcean UI. Here is what I got (the tk-* nodes are the parallel steps):

Multiple concurrent builds of the same project in Jenkins

On my team, we have a project that we want to do continuous-integration-style testing on. Our build takes around 2 hours and is triggered by the "Poll SCM" trigger (using Perforce as the server), and we have two build nodes.
Currently, if someone checks in a change, one build node will start up pretty much right away, but if another change gets checked in, the other node will not kick in, as it's waiting for the previous job to finish. However, I would like the other build node to start a build with the newer checkin as soon as possible, so that we can maximize the amount of continuous testing that's occurring (so that if e.g. one build fails we know sooner rather than later).
Is there any simple way to configure a Jenkins job (using Poll SCM against a Perforce server) to not block while another instance of the job is already running?
Unfortunately, due to the nature of the project it's not possible to simply break the project up into multiple build jobs that get pipelined across multiple slaves (as much as I'd like to change it to work in this way).

Use the "Execute concurrent builds if necessary" option in Jenkins configuration.

Just to register here in case someone needs it, in the version I'm using (Jenkins 2.249.3) I had to uncheck the option Do not allow concurrent builds in the child job that is called multiple times from the parent job.
The code is more or less like that:
stage('STAGE IN THE PARENT JOB') {
    def subParallelJobs = [:]
    LIST_OF_PARAMETERS = LIST_OF_PARAMETERS.split(",")
    for (int i = 0; i < LIST_OF_PARAMETERS.size(); i++) {
        MY_PARAMETER_VALUE = LIST_OF_PARAMETERS[i].trim()
        MY_KEY_USING_THE_PARAMETER_TO_MAKE_IT_UNIQUE = "JOB_KEY_${MY_PARAMETER_VALUE}"
        def jobParams = [ string(name: 'MY_JOB_PARAMETER', value: MY_PARAMETER_VALUE) ]
        // use the variable (not a string literal) as the key so that every branch gets its own entry
        subParallelJobs.put(MY_KEY_USING_THE_PARAMETER_TO_MAKE_IT_UNIQUE.toString(), {build (job: "MY_CHILD_JOB", parameters: jobParams)})
    }
    parallel(subParallelJobs)
}
