How can I wait for all executors inside Jenkinsfile's "parallel" block? - jenkins

I'm new to Jenkins and configuring its scripts, so please forgive me if I say anything stupid.
I have a scripted Jenkins pipeline which distributes building of the codebase to multiple nodes, implemented as node blocks wrapped in a parallel block. Now, the catch is that after building, I would like to perform a certain action with the files that were just built, on one of the nodes that did the building - but only after all of the nodes are done. Essentially, what I would like to have is something similar to a barrier, but between Jenkins nodes.
Simplified, my Jenkinsfile looks like this:
def buildConf = ["debug", "release"]
parallel buildConf.collectEntries { conf ->
    [ conf, {
        node {
            sh "./checkout_and_build.sh"
            // and here I need a barrier
            if (conf == "debug") {
                // I cannot do this outside this node block,
                // because execution may be redirected to a node
                // that doesn't have my files checked out and built
                sh "./post_build.sh"
            }
        }
    }]
}
Is there any way I can achieve this?

You can add a global counter that tracks the number of completed tasks. Each task that has post-build work then waits until the counter equals the total number of tasks before running its post-build part. Like this:
def buildConf = ["debug", "release"]
def doneCounter = 0
parallel buildConf.collectEntries { conf ->
    [ conf, {
        node {
            sh "./checkout_and_build.sh"
            doneCounter++
            // and here I need a barrier
            if (conf == "debug") {
                waitUntil { doneCounter == buildConf.size() }
                // I cannot do this outside this node block,
                // because execution may be redirected to a node
                // that doesn't have my files checked out and built
                sh "./post_build.sh"
            }
        }
    }]
}
Please note that each task with post-build work will block its executor until all other parallel tasks are done and the post-build part can run. If you have plenty of executors, or the tasks are fairly short, this is probably not a problem. But with few executors it can lead to congestion, and if the number of executors is less than or equal to the total number of parallel tasks that need post-build work, you can run into a deadlock!
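If tying up an executor while waiting is a concern, one alternative (not part of the counter approach above, just a sketch using the built-in stash/unstash steps) is to save the files built by the debug branch and run the post-build step on whichever node is free once parallel returns:
def buildConf = ["debug", "release"]
parallel buildConf.collectEntries { conf ->
    [ conf, {
        node {
            sh "./checkout_and_build.sh"
            if (conf == "debug") {
                // "**" is a placeholder; stash only what post_build.sh really needs
                stash name: "debug-build", includes: "**"
            }
        }
    }]
}
// only reached once every parallel branch has finished
node {
    unstash "debug-build"
    sh "./post_build.sh"
}
Note that stash copies files through the controller and is intended for relatively small file sets, so for a large build tree the counter approach above may still be the better trade-off.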

Related

Multiple job for same cronjob with different params in Jenkins

We are using a third-party service to create and use vouchers. There are 80k+ vouchers already made. One of our cronjobs checks the status (used/unused) of each voucher one by one synchronously and updates it in our server database. It takes 2 hours to complete one pass, then it continues from the first voucher for the next pass.
Constraints:
The third party supports 6 queries per second (QPS).
We have only a primary Jenkins server and no agent nodes.
With one Jenkins server, can we improve the execution time?
Can we set up multiple jobs executing in parallel on the primary Jenkins server for the same cronjob? For example, the first 50k records are processed by one job and the rest by another.
If you have room to vertically scale your VM in case you hit a resource (CPU, memory) bottleneck, you should be able to achieve the performance you need. In my view the best option is using parallel stages in your pipeline. If you know the batch sizes beforehand, you can hardcode them within each stage. If you want some logic that determines how many records you have and then allocates records based on that, you can create a pipeline with dynamic stages, something like the below.
pipeline {
    agent any
    stages {
        stage('Parallel') {
            steps {
                script {
                    parallel parallelJobs()
                }
            }
        }
    }
}
def getBatches() {
    // Have some logic to allocate batches
    return ["1-20000", "20000-30000", "30000-50000"]
}
def parallelJobs() {
    def jobs = [:]
    for (batch in getBatches()) {
        def thisBatch = batch // local copy so the closure doesn't capture the shared loop variable
        jobs[thisBatch] = {
            stage(thisBatch) {
                echo "Processing Batch $thisBatch"
            }
        }
    }
    return jobs
}
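If the batch boundaries are known up front, the hardcoded variant mentioned above could look roughly like this (the batch ranges here are placeholders):
pipeline {
    agent any
    stages {
        stage('Process Vouchers') {
            parallel {
                stage('Batch 1') {
                    steps {
                        // placeholder: process vouchers 1-40000 here
                        echo "Processing vouchers 1-40000"
                    }
                }
                stage('Batch 2') {
                    steps {
                        // placeholder: process vouchers 40001-80000 here
                        echo "Processing vouchers 40001-80000"
                    }
                }
            }
        }
    }
}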

Jenkins - how to run a single stage using 2 agents

I have a script that acts as a "test driver" (TD). That is, it drives test operations on a "system under test" (SUT). When I run my test framework script (tfs.sh) on my TD, it takes a SUT as an argument. The manual workflow looks like this:
TD ~ $ ./tfs.sh --sut=<IP of SUT>
I want to have a cluster of SUTs (they will have different OSes, and each will repeat a few times), and a few TDs (like, 4 or 5, so driving tests won't be a bottleneck, actually executing them will be).
I don't know the Jenkins primitive with which to accomplish this. I would like it if a Jenkins stage could simply be invoked with 2 agents. One would obviously be the TD, that's what would actually run the script. And the other would be the SUT. Jenkins would manage locking & resource contention like this.
As a workaround, I could simply have all my SUTs entirely unmanaged by Jenkins, and manually implement locking of the SUTs so 2 different TDs don't try to grab the same one. But why re-invent the wheel? And besides, I'd rather work on a Jenkins plugin to accomplish this than on a manual solution.
How can I run a single Jenkins stage on 2 (or more) agents?
If I understand your requirement correctly, you have a static list of SUTs and you want Jenkins to start the TDs by allocating SUTs for each TD. I'm assuming TDs and SUTs have a one-to-one relationship. Following is a very simple example of how you can achieve what you need.
pipeline {
    agent any
    stages {
        stage('parallel-run') {
            steps {
                script {
                    try {
                        def tests = getTestExecutionMap()
                        parallel tests
                    } catch (e) {
                        currentBuild.result = "FAILURE"
                    }
                }
            }
        }
    }
}
def getTestExecutionMap() {
    def tests = [:]
    def sutList = ["IP1", "IP2", "IP3"]
    int count = 0
    for (String ip : sutList) {
        def sutIp = ip // local copy so each closure keeps its own SUT address
        tests["TEST${count}"] = {
            node {
                stage("TD with SUT ${sutIp}") {
                    script {
                        sh "./tfs.sh --sut=${sutIp}"
                    }
                }
            }
        }
        count++
    }
    return tests
}
The above pipeline will run one parallel branch (a TD stage) per SUT.
Further, if you want to select the agent on which each TD runs, you can specify the agent's name in the node block: node(NAME) {...}. You can improve the agent selection criteria accordingly; for example, you can check how many Jenkins executors are idle on a given agent and then decide how many TDs you will start there.
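As a rough sketch of that last idea (the agent name td-agent-1 is a placeholder, and calling the Jenkins API from a pipeline script typically requires script approval):
// Count the idle executors on a named agent using the core Jenkins API.
def idleExecutors(String agentName) {
    def computer = Jenkins.instance.getNode(agentName)?.toComputer()
    return computer ? computer.countIdle() : 0
}
// Hypothetical use: only start another TD on 'td-agent-1' if it has a free executor.
if (idleExecutors('td-agent-1') > 0) {
    node('td-agent-1') {
        sh "./tfs.sh --sut=IP1"
    }
}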

How to restrict parallel jobs to particular agents in Declarative Pipeline

I have 3 nodes: A, B, C
On each of these nodes I set up a Jenkins agent with its own root directory
They all have the following label: test && database && mysql
I want to run a job in parallel on all 3 nodes, to clean the workspace folder on them
To achieve that, I wrote this Jenkins script
def labels = "test && mysql && database"
def getNodesName(labels){
    def targets = []
    def nodes = Jenkins.instance.getLabel(labels).getNodes()
    for(node in nodes){
        targets.add(node.getNodeName())
    }
    return targets
}
def nodes = getNodesName(labels)
def cleanWSTasks(targets){
    tasks = [:]
    for(target in targets){
        tasks[target] = {
            node(target){
                script {
                    cleanWs()
                }
            }
        }
    }
    return tasks
}
pipeline{
    agent none
    stages{
        stage ('Clean Workspace'){
            steps{
                script{
                    parallel cleanWSTasks(nodes)
                }
            }
        }
    }
}
So I thought that with node(target) in the cleanWSTasks function I had already told Jenkins to restrict the execution of each task to the particular target node I want, so that all 3 nodes would start cleaning their own workspaces at the same time.
However, what I see is that only one node picks up the task to clean up the workspace, and it does it 3 times.
For example, it shows:
Running on node A in ...
clean up workspace ..
Running on node A in ...
clean up workspace ..
Running on node A in ...
clean up workspace ..
What did I do wrong in my code? Please help.
The node step is working correctly, the problem you're coming across has to do with how you're defining your tasks.
In your for loop, you're assigning this closure:
{
    node(target){
        script {
            cleanWs()
        }
    }
}
to tasks[target].
The code inside the closure won't get evaluated until you execute the closure. So even though you assign node(target) inside the for loop, target's value won't get evaluated until parallel tasks runs, which is when the closure is executed. That happens after the for loop has finished running and so target's value is the name of the last node in your list of nodes.
An easy fix for this is to create a variable in your for loop that's equal to target and use that inside the closure, because you will force the evaluation of target to happen inside your for loop, instead of when the closure runs.
That would look like this:
def cleanWSTasks(targets){
    tasks = [:]
    for(target in targets){
        def thisTarget = target
        tasks[thisTarget] = {
            node(thisTarget){
                script {
                    cleanWs()
                }
            }
        }
    }
    return tasks
}
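An equivalent sketch using .each instead of the for loop avoids the extra variable entirely, because the closure parameter is already scoped per iteration (this assumes a reasonably recent Pipeline: Groovy plugin, where .each is CPS-safe):
def cleanWSTasks(targets) {
    def tasks = [:]
    targets.each { target ->
        // 'target' is a closure parameter here, so each task closure captures its own value
        tasks[target] = {
            node(target) {
                script {
                    cleanWs()
                }
            }
        }
    }
    return tasks
}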

Continue using node after parallel runs

After running tests in parallel, I need to immediately send out notifications. Currently, the parallel nodes are run, then the node is given up, and the send-notifications step sometimes waits for the next available node.
// List of tasks, one for each marker/label type.
def farmTasks = ['ac', 'dc']
// Create a number of agent tasks that matches the marker/label type.
// Deploys the image to the board, then checks out the code on the agent and
// runs the tests against the board.
timestamps {
    stage('Test') {
        def test_tasks = [:]
        for (int i = 0; i < farmTasks.size(); i++) {
            String farmTask = farmTasks[i]
            test_tasks["${farmTask}"] = {
                node("linux && ${farmTask}") {
                    stage("${farmTask}: checkout on ${NODE_NAME}") {
                        // Checkout without clean
                        doCheckout(false)
                    }
                    stage("${farmTask} tests") {
                        <code>
                    }
                } // end of node
            } // end of test_tasks
        } // end of for
        parallel test_tasks
        node('linux') {
            sendMyNotifications();
        }
    } // end of Test stage
} // end of timestamps
Frankly, this code seems totally fine. I have yet to understand why the notification needs to wait for a node (do you have more pipelines that use these agents? are there multiple instances of this pipeline running concurrently?), but the workaround to this issue is simple:
Set up another agent (it can reside on machines that already host existing agents) and give it a unique label (e.g. notifications) so its sole use will be to send notifications.
It's not perfect because you get a single point of failure, but it helps remedy the situation while you figure out what causes the "real" agents to be unavailable after the parallel steps.
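With such an agent in place, only the label in the final node step of the pipeline above needs to change (a sketch, assuming the dedicated agent is labelled notifications):
parallel test_tasks
// 'notifications' is reserved for this one purpose, so this step
// does not have to wait behind the busy test agents
node('notifications') {
    sendMyNotifications()
}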

Jenkins Pipeline and semaphores

I'm building a Jenkins job that will run all my staging tests continuously, but not all at once (they rely on shared hardware). So, I'm creating parallel jobs, with a semaphore to ensure that only a limited number run at once.
Here's a simplified version of my pipeline that reproduces the issue:
import java.util.concurrent.Semaphore

def run(job) {
    return {
        this.limiter.acquire();
        try {
            println "running ${job}"
            build job
            println "finished ${job}"
        } finally {
            this.limiter.release();
        }
    }
}

def getJobs() {
    def allJobs = Jenkins.getInstance().getJobNames()
    def stagingJobs = []
    for(String job : allJobs) {
        if (job.startsWith("staging/temp")) {
            stagingJobs.add(job)
        }
    }
    println "${stagingJobs.size()} jobs were found."
    return stagingJobs
}

this.limiter = new Semaphore(2)
def jobs = [:]
for (job in getJobs()) {
    jobs[job] = run(job)
}
parallel jobs
When I run without the semaphores, everything works fine. But with the code above, I get nothing outputted except:
[Pipeline] echo
6 jobs were found.
[Pipeline] parallel
[Pipeline] [staging/temp1] { (Branch: staging/temp1)
[Pipeline] [staging/temp2] { (Branch: staging/temp2)
[Pipeline] [staging/temp3] { (Branch: staging/temp3)
[Pipeline] [staging/temp4] { (Branch: staging/temp4)
[Pipeline] [staging/temp5] { (Branch: staging/temp5)
[Pipeline] [staging/temp6] { (Branch: staging/temp6)
If I view the pipeline steps, I can see the first two jobs start, and their log messages output. However, it seems like the runner never receives a notification that the staging jobs finish. As a result, the semaphore never releases and the other 4 jobs never manage to start. Here's a thread dump mid test, after the downstream builds have definitely finished:
Thread #7
at DSL.build(unsure what happened to downstream build)
at WorkflowScript.run(WorkflowScript:9)
at DSL.parallel(Native Method)
at WorkflowScript.run(WorkflowScript:38)
Thread #8
at DSL.build(unsure what happened to downstream build)
at WorkflowScript.run(WorkflowScript:9)
Thread #11
at WorkflowScript.run(WorkflowScript:6)
Thread #12
at WorkflowScript.run(WorkflowScript:6)
Eventually it times out with several java.lang.InterruptedException errors.
Is it possible to use semaphores in a pipeline, or is there a better way to ensure only a portion of jobs run at once? I would rather avoid spinning up nodes for what amounts to a simple test runner.
The Concurrent Step plugin was just released and should work nicely for this use case.
With this, you can simplify your code:
def semaphore = createSemaphore permit: 2
def run(job) {
    return {
        acquireSemaphore(semaphore) {
            println "running ${job}"
            build job
            println "finished ${job}"
        }
    }
}
...
Possible workaround with lock step
Lockable resources plugin has no semaphore capabilities.
It took me a long time to figure out how to coax semaphore-like behavior out of the lock step... it would be nice if it could do this on its own. Here's an example...
int concurrency = 3
List colors = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
Map tasks = [failFast: false]

for(int i = 0; i < colors.size(); i++) {
    String color = colors[i]
    int lock_id = i % concurrency
    tasks["Code ${color}"] = { ->
        stage("Code ${color}") {
            lock("color-lock-${lock_id}") {
                echo "This color is ${color}"
                sleep 30
            }
        }
    }
}

// execute the tasks in parallel with concurrency limits
stage("Rainbow") {
    parallel(tasks)
}
The above will create custom locks:
color-lock-0
color-lock-1
color-lock-2
All concurrent tasks will race for one of the three locks. It's not perfectly efficient (certainly not as efficient as a real semaphore), but it does a good enough job...
Hopefully that helps others.
Limitations
Your pipeline will take as long as your slowest locks. So if you unfortunately have several long-running jobs racing for the same lock (e.g. color-lock-1), then your pipeline could take longer than it would with a proper semaphore.
Example:
color-lock-0 takes 20 seconds to cycle through all jobs.
color-lock-1 takes 30 minutes to cycle through all jobs.
color-lock-2 takes 2 minutes to cycle through all jobs.
Then your job will take 30 minutes to run... whereas with a true semaphore it would have been much faster, because the longer-running jobs would take the next available lock rather than be blocked.
Better than nothing; it's what I have so far. Sounds like a good time to open a feature request with the lockable resources plugin.
For at least a year now there has been a plugin that will help you get what you intend, and it can also be used in pipeline jobs: the Lockable Resources Plugin.
Basically you wrap your shared resource, and the job will queue at the lock statement if the resource is not free.
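A minimal sketch of that usage (the resource name staging-hardware is just a placeholder):
// The build queues at the lock step until the named resource is free again.
lock(resource: 'staging-hardware') {
    build job: 'staging/temp1'
}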
If you are interested in parallelizing your tests you also can have a look at the Parallel Test Executor Plugin
