I'm building a Jenkins job that will run all my staging tests continuously, but not all at once (they rely on shared hardware). So I'm creating parallel jobs, with a semaphore to ensure that only a limited number run at once.
Here's a simplified version of my pipeline that reproduces the issue:
import java.util.concurrent.Semaphore

def run(job) {
    return {
        this.limiter.acquire();
        try {
            println "running ${job}"
            build job
            println "finished ${job}"
        } finally {
            this.limiter.release();
        }
    }
}

def getJobs() {
    def allJobs = Jenkins.getInstance().getJobNames()
    def stagingJobs = []
    for (String job : allJobs) {
        if (job.startsWith("staging/temp")) {
            stagingJobs.add(job)
        }
    }
    println "${stagingJobs.size()} jobs were found."
    return stagingJobs
}

this.limiter = new Semaphore(2)

def jobs = [:]
for (job in getJobs()) {
    jobs[job] = run(job)
}
parallel jobs
When I run without the semaphore, everything works fine. But with the code above, I get no output except:
[Pipeline] echo
6 jobs were found.
[Pipeline] parallel
[Pipeline] [staging/temp1] { (Branch: staging/temp1)
[Pipeline] [staging/temp2] { (Branch: staging/temp2)
[Pipeline] [staging/temp3] { (Branch: staging/temp3)
[Pipeline] [staging/temp4] { (Branch: staging/temp4)
[Pipeline] [staging/temp5] { (Branch: staging/temp5)
[Pipeline] [staging/temp6] { (Branch: staging/temp6)
If I view the pipeline steps, I can see the first two jobs start and their log messages appear. However, it seems the runner never receives a notification that the staging jobs have finished. As a result, the semaphore is never released and the other four jobs never manage to start. Here's a thread dump mid-test, after the downstream builds have definitely finished:
Thread #7
at DSL.build(unsure what happened to downstream build)
at WorkflowScript.run(WorkflowScript:9)
at DSL.parallel(Native Method)
at WorkflowScript.run(WorkflowScript:38)
Thread #8
at DSL.build(unsure what happened to downstream build)
at WorkflowScript.run(WorkflowScript:9)
Thread #11
at WorkflowScript.run(WorkflowScript:6)
Thread #12
at WorkflowScript.run(WorkflowScript:6)
Eventually it times out with several java.lang.InterruptedException errors.
Is it possible to use semaphores in a pipeline, or is there a better way to ensure only a portion of jobs run at once? I would rather avoid spinning up nodes for what amounts to a simple test runner.
The Concurrent Step plugin was just released and should work nicely for this use case.
With this, you can simplify your code:
def semaphore = createSemaphore permit: 2

def run(job) {
    return {
        acquireSemaphore(semaphore) {
            println "running ${job}"
            build job
            println "finished ${job}"
        }
    }
}
...
Possible workaround with lock step
The Lockable Resources plugin has no semaphore capability.
It took me a long time to figure out how to squeeze semaphore-like behavior out of the lock step... it would be nice if it could do this on its own. Here's an example...
int concurrency = 3
List colors = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
Map tasks = [failFast: false]

for (int i = 0; i < colors.size(); i++) {
    String color = colors[i]
    int lock_id = i % concurrency
    tasks["Code ${color}"] = { ->
        stage("Code ${color}") {
            lock("color-lock-${lock_id}") {
                echo "This color is ${color}"
                sleep 30
            }
        }
    }
}

// execute the tasks in parallel with concurrency limits
stage("Rainbow") {
    parallel(tasks)
}
The above will create custom locks:
color-lock-0
color-lock-1
color-lock-2
All concurrent tasks will then race for one of the three locks. It's not perfectly efficient (certainly not as efficient as a real semaphore), but it does a good enough job...
Hopefully that helps others.
Limitations
Your pipeline will only move as fast as its slowest lock. So if you are unlucky and several long-running jobs race for the same lock (e.g. color-lock-1), your pipeline could take longer than it would with a proper semaphore.
Example:
color-lock-0 takes 20 seconds to cycle through all of its jobs.
color-lock-1 takes 30 minutes to cycle through all of its jobs.
color-lock-2 takes 2 minutes to cycle through all of its jobs.
Then your pipeline will take 30 minutes to run, whereas with a true semaphore it would have been much faster, because the longer-running jobs would take the next available lock rather than being blocked.
Better than nothing; it's what I have so far. Sounds like a good time to open a feature request with the lockable resources plugin.
For at least a year now there has been a plugin that helps you achieve what you intend, and it can also be used in pipeline jobs: the Lockable Resources Plugin.
Basically you wrap your shared resource, and the job queues at the lock statement if the resource is not free.
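For example, a minimal sketch (the resource name staging-hardware is an assumption; define it under Manage Jenkins > Lockable Resources, or let the lock step create it on demand):
node {
    lock('staging-hardware') {
        // only one build at a time gets past this point for this resource
        echo 'Running tests that need the shared hardware'
    }
}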
If you are interested in parallelizing your tests, you can also have a look at the Parallel Test Executor Plugin.
Related
Below is a simplified case.
I have one node named comp01, and a Jenkins job named Compatibility.
Compatibility is scheduled as follows:
0 12 * * 1 %IntegrationNode=Software_1
0 17 * * 1 %IntegrationNode=Software_2
0 22 * * 1 %IntegrationNode=Software_3
0 2 * * 2 %IntegrationNode=Software_4
0 7 * * 2 %IntegrationNode=Software_5
The jobs start as scheduled. But sometimes, because of a verification failure, the previous job takes longer than expected, so the next job starts before the previous job has completed.
Is there a way in Jenkins to keep the next scheduled job in the queue until the previous job is complete? Or can we schedule based on the previous job's status?
We have tried limiting executors for this job, but when more than a couple of jobs are queued, the expected behavior is not observed.
We have also tried creating resource groups and adding multiple nodes to them, but the expected behavior is still not observed when multiple jobs are in the queue.
EDIT-1:
We can't use options { disableConcurrentBuilds() } since we start the job concurrently on different nodes. What we are struggling to ensure is that when a job has started on a node, the other scheduled jobs for the same node wait until the current job completes.
Have you tried setting the below option?
options { disableConcurrentBuilds() }
Update
AFAIK there is no OOB solution for your problem, but you can definitely implement something. Without seeing your actual Pipelines I can't give a concrete answer, but here are some options.
Option 01
Use the Lockable Resources plugin: create a resource per Jenkins IntegrationNode and acquire it when running the job; the next build will wait until the lock is released.
lock(resource: 'IntegrationNode1', skipIfLocked: false) {
    echo "Run your logic"
}
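In the context of the Compatibility job above, the resource name could be derived from the IntegrationNode parameter, so only builds targeting the same node queue behind each other. A sketch, assuming IntegrationNode is the build parameter shown in the schedule:
lock(resource: "IntegrationNode-${params.IntegrationNode}") {
    echo "Running verification for ${params.IntegrationNode}"
    // long-running verification steps go here
}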
Option 02
You can implement waiting logic that checks the status of the previous build. Here is a sample Pipeline and possible Groovy code you can leverage.
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                script {
                    echo "Waiting"
                    def jobName = "JobA"
                    def buildNum = "92"
                    waitUntil { !isPending(jobName, buildNum) }
                    echo "Actual Run"
                }
            }
        }
    }
}

def isPending(def JobName, def buildNumber) {
    def buildA = Jenkins.instance.getItemByFullName(JobName).getBuild(buildNumber)
    return buildA.isInProgress()
}
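A possible way to apply the same helper to the current job's immediately preceding build, using env.JOB_NAME and env.BUILD_NUMBER (a sketch; it needs the same script approvals and assumes the previous build still exists in the build history):
def previousBuild = env.BUILD_NUMBER.toInteger() - 1
if (previousBuild > 0) {
    // wait until the previous build of this job has finished
    waitUntil { !isPending(env.JOB_NAME, previousBuild.toString()) }
}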
I have two nodes added in Jenkins,
- master
- node-1
Here is my pipeline script. I would like to lock all the executors on "node-1" before executing anything on "master":
node("master") {
stage("stage-1") {
// here lock node-1
//Execute script
}
}
Is there a way to achieve this (i.e. lock node-1)?
My strategy is to mark the node offline as soon as your build job grabs an executor, and then wait for all the other executors to complete. At that point your executor is still active, but the other executors can't pick up new builds since the node is offline, so the node is all yours. When you are done, you can mark the node online again.
This requires some serious admin approval for the script.
For instance:
final int sleeptimeSeconds = 10
final int timeoutminutes = 60   // not in the original snippet; pick an upper bound that suits your builds
final String agentname = 'node-1'

echo "Waiting for an executor for ${agentname}..."
node(agentname) {
    try {
        timeout(time: timeoutminutes, unit: 'MINUTES') {
            markAgentOnlineOfOffline(agentname, false)
            sleep 5
            Computer computer = Jenkins.getInstance().getComputer(agentname)
            echo "Waiting for other executors to complete on ${agentname}..."
            while (computer.countBusy() > 1) {
                sleep sleeptimeSeconds
            }
            echo "Ready to do work on '${agentname}' in exclusive mode..."
            ...
        }
    } catch (e) {
        markAgentOnlineOfOffline(agentname, true)
        throw e
    }
}

def markAgentOnlineOfOffline(String nodeName, boolean online) {
    ...
}
That last function, markAgentOnlineOfOffline, can be implemented however you find logical (e.g. I use an "offline" label myself, which my jobs reject, i.e. the label requirement includes !offline). But you could use the Jenkins API to mark the node truly offline.
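For the Jenkins API route, here is a minimal sketch of what markAgentOnlineOfOffline could look like, using the core Computer.setTemporarilyOffline call (this too needs script approval; the offline message is an arbitrary example):
import hudson.slaves.OfflineCause   // place the import at the top of the Jenkinsfile

def markAgentOnlineOfOffline(String nodeName, boolean online) {
    def computer = Jenkins.getInstance().getComputer(nodeName)
    if (computer == null) {
        error "No such agent: ${nodeName}"
    }
    // temporarily-offline keeps the agent connected but stops it accepting new builds
    computer.setTemporarilyOffline(!online, online ? null : new OfflineCause.ByCLI("Reserved for exclusive build"))
}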
I'm running a Jenkins Declarative Pipeline controlled by a central Jenkins master and running on 2 slaves at 2 different sites, siteA and siteB.
I have one stage that needs to run on both sites (ideally in parallel to save time) and that waits until some resources are loaded. The stage basically runs a script that checks whether the resources are loaded and, if not, waits X seconds and tries again until all resources are loaded.
What happens is that resource loading is faster at one site than at the other, so when one site finishes, the whole stage is marked done even though the other site is not yet complete.
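For reference, a minimal sketch of what such a polling helper could look like (checkResourcesLoaded.sh is a hypothetical stand-in for the actual check script):
def waitForResourcesLoaded(String site) {
    waitUntil {
        // returnStatus keeps the build alive while the check still fails; waitUntil retries with a back-off
        sh(returnStatus: true, script: "./checkResourcesLoaded.sh ${site}") == 0
    }
}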
The pipeline for this stage looks like this:
stage('myStage') {
    parallel {
        stage('myStage-siteA') {
            agent {
                node {
                    label 'siteA'
                }
            }
            steps {
                waitForResourcesLoaded('siteA')
            }
        }
        stage('myStage-siteB') {
            agent {
                node {
                    label 'siteB'
                }
            }
            steps {
                waitForResourcesLoaded('siteB')
            }
        }
    }
}
Is there any way to "synchronize" each parallel stage so that the overall stage "myStage" will only be marked complete once each sub-stage has completed?
I'm new to Jenkins and configuring its scripts, so please forgive me if I say anything stupid.
I have a scripted Jenkins pipeline which redistributes building of the codebase to multiple nodes, implemented using a node block wrapped with parallel block. Now, the catch is that after the building, I would like to do a certain action with files that were just built, on one of the nodes that was building the code - but only after all of the nodes are done. Essentially, what I would like to have is something similar to barrier, but between Jenkins' nodes.
Simplified, my Jenkinsfile looks like this:
def buildConf = ["debug", "release"]
parallel buildConf.collectEntries { conf ->
[ conf, {
node {
sh "./checkout_and_build.sh"
// and here I need a barrier
if (conf == "debug") {
// I cannot do this outside this node block,
// because execution may be redirected to a node
// that doesn't have my files checked out and built
sh "./post_build.sh"
}
}
}]
}
Is there any way I can achieve this?
What you can do is add a global counter that counts the number of completed tasks. Each task that has post-build work then waits until the counter equals the total number of tasks before running its post-build part. Like this:
def buildConf = ["debug", "release"]
def doneCounter = 0
parallel buildConf.collectEntries { conf ->
[ conf, {
node {
sh "./checkout_and_build.sh"
doneCounter++
// and here I need a barrier
if (conf == "debug") {
waitUntil { doneCounter == buildConf.size() }
// I cannot do this outside this node block,
// because execution may be redirected to a node
// that doesn't have my files checked out and built
sh "./post_build.sh"
}
}
}]
}
Please note, each task that has a post-build part will block its executor until all the other parallel tasks are done and the post part can be executed. If you have plenty of executors, or the tasks are fairly short, this is probably not a problem, but with few executors it could lead to congestion. If the number of executors is less than or equal to the total number of parallel tasks that need post work, you can run into a deadlock!
I am experimenting with Jenkins pipeline and milestones and cannot figure out why Jenkins is not cancelling the previous build when a new build crosses the milestone.
Example Jenkinsfile
pipeline {
    agent any
    parameters {
        booleanParam(defaultValue: true, description: '', name: 'userFlag')
    }
    stages {
        stage("foo") {
            steps {
                milestone(ordinal: 1, label: "BUILD_START_MILESTONE")
                sh 'sleep 1000'
            }
        }
    }
}
Triggering this pipeline twice does not cancel the 1st job
Try this:
/* This method should be added to your Jenkinsfile and called at the very beginning of the build */
@NonCPS
def cancelPreviousBuilds() {
    def jobName = env.JOB_NAME
    def buildNumber = env.BUILD_NUMBER.toInteger()
    /* Get the job object by its full name */
    def currentJob = Jenkins.instance.getItemByFullName(jobName)
    /* Iterate over the builds of that job */
    for (def build : currentJob.builds) {
        /* If a build is currently running and it is not the current build... */
        if (build.isBuilding() && build.number.toInteger() != buildNumber) {
            /* ...stop it */
            build.doStop()
        }
    }
}
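A minimal usage sketch, calling the helper at the very start of the Jenkinsfile (Jenkins.instance and doStop need script approval when running in the Groovy sandbox):
cancelPreviousBuilds()

pipeline {
    agent any
    stages {
        stage("foo") {
            steps {
                sh 'sleep 1000'
            }
        }
    }
}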
I don't think the behavior is "if I'm a newer build that crosses this milestone, then all older builds that crossed this milestone will be cancelled".
The actual behavior of the milestone step is that once a more recent pipeline crosses it first, older pipelines are prevented from crossing that milestone.
I have a simple workaround with the milestone step. According to the documentation:
Builds pass milestones in order (taking the build number as sorter field).
Older builds will not proceed (they are aborted) if a newer one already passed the milestone.
When a build passes a milestone, any older build that passed the previous milestone but not this one is aborted.
Once a build passes the milestone, it will never be aborted by a newer build that didn't pass the milestone yet.
you can try something like this:
pipeline {
    agent any
    stages {
        stage('Stop Old Build') {
            steps {
                milestone label: '', ordinal: Integer.parseInt(env.BUILD_ID) - 1
                milestone label: '', ordinal: Integer.parseInt(env.BUILD_ID)
            }
        }
    }
}
You can put this at the start of any pipeline.
Assume you have just started a new build, #5. The first milestone is used to pass #4's second milestone, and the second milestone (of #5) is used to kill #4's build if it is still running.
The disableConcurrentBuilds property has been added to Pipeline. The Pipeline syntax snippet generator offers the following syntax hint:
properties([disableConcurrentBuilds(abortPrevious: true)])
That property is used on ci.jenkins.io to cancel older plugin build jobs when newer plugin build jobs start.
Declarative Pipeline also includes the disableConcurrentBuilds option that is documented in the Pipeline syntax page.
The declarative directive generator suggests the following:
options {
disableConcurrentBuilds abortPrevious: true
}
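Putting it together, a minimal Declarative sketch (assumes a Jenkins/plugin version recent enough for disableConcurrentBuilds to accept abortPrevious):
pipeline {
    agent any
    options {
        // abort the older running build when a newer one starts
        disableConcurrentBuilds abortPrevious: true
    }
    stages {
        stage('Build') {
            steps {
                echo 'Only the newest concurrent build keeps running'
            }
        }
    }
}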
As per https://jenkins.io/blog/2016/10/16/stage-lock-milestone/, a pair of milestone() steps works for me to kill previous builds when the pipeline is kicked off multiple times:
stage('Build') {
    // The first milestone step starts tracking concurrent build order
    milestone()
    node {
        echo "Building"
    }
}

// The Deploy stage does not limit concurrency but requires manual input
// from a user. Several builds might reach this step waiting for input.
// When a user promotes a specific build all preceding builds are aborted,
// ensuring that the latest code is always deployed.
stage('Deploy') {
    timeout(time: 60, unit: 'SECONDS') { input "Deploy?" }
    milestone()
    node {
        echo "Deploying"
    }
}
The last milestone helps kill previous builds once it is reached, e.g. when the Deploy input is approved in the case above, or when the locked resource is released in the case below:
// This locked resource contains both Test stages as a single concurrency unit.
// Only 1 concurrent build is allowed to utilize the test resources at a time.
// Newer builds are pulled off the queue first. When a build reaches the
// milestone at the end of the lock, all jobs started prior to the current
// build that are still waiting for the lock will be aborted
lock(resource: 'myResource', inversePrecedence: true) {
    node('test') {
        stage('Unit Tests') {
            echo "Unit Tests"
        }
        stage('System Tests') {
            echo "System Tests"
        }
    }
    milestone()
}
Building on D.W.'s answer, I found a simple pattern that works. It seems to fit into D.W.'s bullet #3 (which is from the official docs): when a build passes a milestone, Jenkins aborts older builds that passed the previous milestone but not this one.
Adding an earlier milestone that everything will pass, and then one after the thing that is going to wait, makes it all work like you think it should. In my case:
steps {
    milestone 1
    input 'ok'
    milestone 2
}
Create two active builds with this, and only approve the second one. You'll see the first one get automatically canceled, because build 2 passed milestone 2 first.
Try taking out milestone 1, and you'll see that build 1 does not get canceled when build 2 passes milestone 2.
Adding the early milestone satisfies that requirement: a build has to have passed some milestone before a later milestone passed by a newer build will cause it to be canceled.