Is there a way to check in a Jenkins pipeline whether an executor is already running a job?
I would like to use different environment variables based on this condition.
The pseudocode of the pipeline I want is as follows:
IF
    Build of Job-A is triggered
THEN
    Use Environment_Variable_1
    Use Executor-1 for Job-A
ELSE IF
    Job-A is running on Executor-1 AND build of Job-A is triggered again
THEN
    Use Environment_Variable_2
    Use Executor-2 for Job-A
The environment variables will hold paths to different folders, because the job makes changes to the folder it is pointed at. So when the job is triggered again on Executor-2, I would like it to change the other folder.
Yes. With jenkins.model.Jenkins.instance.nodes you can get all configured nodes. From each node you can get its Computer object with node.toComputer(), and from the Computer object you can retrieve all Executors on that computer.
for (node in jenkins.model.Jenkins.instance.nodes) {
    def computer = node.toComputer() // computer behind the node
    def executors = computer.getExecutors()
    for (executor in executors) {
        println("Node name: " + node.getDisplayName())
        println("Computer name: " + computer.getDisplayName())
        println("Executor name: " + executor.getDisplayName())
        println("Executor number: " + executor.getNumber())
        println("Is executor busy: " + executor.isBusy())
    }
}
Documentation Jenkins Core API:
Class Node
Class Computer
Class Executor
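To answer the original question more directly (is a particular job already running on some executor?), you can inspect each busy executor's current work item via Executor.getCurrentExecutable(). Below is a hedged sketch for the script console or a system Groovy step; the job name "Job-A" is an assumption, and note that for Pipeline jobs the executable's parent task may have a placeholder display name, so you may need to adapt the comparison:

```groovy
// Sketch: check whether a job named "Job-A" (assumed name) is currently
// running on any executor. This only runs inside a Jenkins JVM
// (script console or system Groovy build step).
import jenkins.model.Jenkins

def isJobRunning(String jobName) {
    for (node in Jenkins.instance.nodes) {
        for (executor in node.toComputer().getExecutors()) {
            def executable = executor.getCurrentExecutable()
            // getParent() returns the SubTask (the owning job) of this run
            if (executable != null && executable.getParent().getDisplayName() == jobName) {
                return true
            }
        }
    }
    return false
}

println(isJobRunning("Job-A"))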
Related
I've been trying to get our CI job in Jenkins to run on spot instances in EC2 (using the Amazon EC2 plugin), and I'm having trouble figuring out how to retry consistently when they get interrupted. The test run is parallelized across several Jenkins nodes that run on EC2 instances. This is the relevant script for the pipeline:
for (int i = 0; i < numNodes; i++) {
    int index = i
    def nodeDisplayName = "node_${i.toString().padLeft(2, '0')}"
    env["NODE_${index}_RETRY_COUNT"] = 0
    nodes[nodeDisplayName] = {
        retry(2) {
            timeout(time: 90, unit: 'MINUTES') {
                int retryCount = env["NODE_${index}_RETRY_COUNT"]
                nodeLabel = (retryCount == 0) ? "ec2-spot" : "ec2-on-demand"
                env["NODE_${index}_RETRY_COUNT"] = retryCount + 1
                node(nodeLabel) {
                    stage('Debug info') {
                        // ...
                    }
                    stage('Run tests') {
                        // ...
                    }
                }
            }
        }
    }
}
parallel nodes
Most of the time, this works. If a spot-based node gets interrupted, it retries. But occasionally, the retry just doesn't happen, and I see nothing in the logs (or anywhere else) about why it didn't retry.
One thing I've noticed is that the "Agent was removed" message appears on the build page the same number of times as there were successful retries.
In other words, if 20 nodes were interrupted and 19 of them were retried, I will see the "Agent was removed" message 19 times. It seems that for some reason Jenkins is not always detecting that the agent disappeared.
Another clue is that at the end of the logs from each node, there's a difference between what gets logged for ones that retry vs ones that didn't. On the ones that retry, the log looks like this:
Cannot contact EC2 (ec2-spot) - Jenkins Agent Image (sir-688pdhsm): hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel#4b2fd30b:EC2 (ec2-spot) - Jenkins Agent Image (sir-688pdhsm)": Remote call on EC2 (ec2-spot) - Jenkins Agent Image (sir-688pdhsm) failed. The channel is closing down or has closed down
Could not connect to EC2 (ec2-spot) - Jenkins Agent Image (sir-688pdhsm) to send interrupt signal to process
For nodes that don't retry, the end of the log looks like this:
Cannot contact EC2 (ec2-spot) - Jenkins Agent Image (sir-24h6etnm): hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel#63450caa:EC2 (ec2-spot) - Jenkins Agent Image (sir-24h6etnm)": Remote call on EC2 (ec2-spot) - Jenkins Agent Image (sir-24h6etnm) failed. The channel is closing down or has closed down
Note that the final line from the first log does not appear. I'm not sure what this means, but I'm hoping someone else might have a clue.
On my Jenkins machine I'm able to reset the executor node when it hits the suspended state with the following Groovy command:
import jenkins.model.Jenkins
Jenkins.instance.getNode('nodeName').toComputer().setAcceptingTasks(true)
I was wondering if there is a similar way in Groovy to bring a node back online when it hits an offline state. Maybe hudson.model.Hudson.instance has an option like computer.connect(true) to trigger a restart of the node.
Something in the likes of:
Jenkins.instance.getNode('').toComputer().setComputerConnect(true)
This is the Groovy script that I have set up to this point:
int exitcode = 0
println("Looking for offline slaves:")
for (slave in hudson.model.Hudson.instance.slaves) {
    if (slave.getComputer().isOffline()) {
        println(' * Slave ' + slave.name + " is offline!")
        if (slave.name == "<nodeName>") {
            println('This is <nodeName>.')
            exitcode++
        } // if slave.name
    } // if slave offline
} // for slave in slaves
println "<nodeName> is offline: " + hudson.model.Hudson.instance.getNode("<nodeName>").getComputer().isOffline()
With this I'm able to report that the node is offline.
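There is indeed a counterpart in the same API for bringing a node back: Computer.connect(boolean forceReconnect) attempts to (re)launch the agent, and setTemporarilyOffline(false, null) clears a temporary-offline mark. A hedged sketch building on the script above (whether connect() succeeds depends on the node's launcher configuration):

```groovy
// Sketch for the Jenkins script console: try to reconnect offline slaves.
// connect(false) is a no-op if the computer is already online or connecting;
// it returns a Future that completes when the connection attempt finishes.
for (slave in hudson.model.Hudson.instance.slaves) {
    def computer = slave.getComputer()
    if (computer.isOffline()) {
        println("Reconnecting slave " + slave.name)
        computer.setTemporarilyOffline(false, null) // clear any temp-offline flag
        computer.connect(false)
    }
}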
Our Jenkins setup consists of master nodes and different / dedicated worker nodes for running jobs in dev, test and prod environment. How do I go about creating a scripted pipeline code that allows users to select environment (possibly from master node) and depending upon the environment selected would execute the rest of the job in the node selected? Here is my initial thought:
stage('Select environment') {
    script {
        def userInput = input(id: 'userInput', message: 'Merge to?',
            parameters: [[$class: 'ChoiceParameterDefinition', defaultValue: 'strDef',
                description: 'describing choices', name: 'Env', choices: "dev\ntest\nprod"]])
        println(userInput)
    }
    echo "Environment here ${params.Env}" // prints null here
    stage("Build") {
        node(${params.Env}) { // schedule job based upon the environment selected earlier
            echo "My test here"
        }
    }
}
Am I on the right path, or should I be looking at something else?
Another follow up question is that the job that is running on the worker node also requires additional user input. Is there a way to combine the user input in one go such that the users would not be prompted with multiple user screens?
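On the follow-up question: a single input step can carry several parameters, and when more than one parameter is defined, input() returns a map keyed by parameter name, so both values can be collected in one prompt. A hedged sketch in the same style as above (the extra "Version" parameter is purely illustrative):

```groovy
// One prompt collecting both the environment and an extra value.
// With multiple parameters, input() returns a map keyed by parameter name.
def answers = input(
    id: 'combinedInput', message: 'Configure the run',
    parameters: [
        [$class: 'ChoiceParameterDefinition', name: 'Env',
         choices: "dev\ntest\nprod", description: 'Target environment'],
        [$class: 'StringParameterDefinition', name: 'Version',
         defaultValue: '1.0.0', description: 'Illustrative extra value']
    ])

node(answers.Env) {
    echo "Running ${answers.Version} on ${answers.Env}"
}
```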
If you pass the environment as a build parameter when kicking off the job, and you have appropriate labels on your nodes, you could do something like:
agent = params.WHAT_NODE
agentLabels = "deploy && ${agent}"

pipeline {
    agent { label agentLabels }
    ....
}
Ended up doing the following for a scripted pipeline:
The code for selecting the environment can run on any node (whether master or a slave with an agent running). The selected value can be injected into an environment variable via the env global.
node {
    stage('Select Environment') {
        env.Env = input(id: 'userInput', message: 'Select Environment',
            parameters: [[$class: 'ChoiceParameterDefinition',
                defaultValue: 'strDef',
                description: 'describing choices',
                name: 'Env',
                choices: "jen-dev-worker\njen-test-worker\njen-prod-worker"]])
        println(env.Env)
    }
    stage('Display Environment') {
        println(env.Env)
    }
}
The following code snippet ensures that the script is executed on the environment selected in the last step. It requires Jenkins workers with the labels jen-dev-worker, jen-test-worker, and jen-prod-worker to be available.
node (env.Env) {
echo "Hello world, I am running on ${env.Env}"
}
In Jenkins, we can block job A while job B is running by using the Build Blocker plugin.
Similarly, I would like a job (for example, another_dumb_job) to NOT run (to wait and sit in the queue) if there are any in-progress jobs running on any user-selected slave(s), until those slaves are free again.
For example, I don't want to run a job (which will delete a bunch of slaves, either offline or online, using a downstream job or by calling some Groovy/Scriptler script) while any of those slaves have active/in-progress jobs running on them.
The end goal is to delete Jenkins slave nodes gracefully: the node/slave is marked OFFLINE first, any jobs already running on it are allowed to complete, and only then is the slave deleted.
For deleting all offline nodes, tweak the script below to run doDelete() only on slaves where isOffline() is true (or isOnline() is false). If you want to delete all nodes (be careful), then don't use the following if statement:
if ( aSlave.name.indexOf(slaveStartsWith) == 0) {
I'm also ignoring one slave (for when you want a slave to ALWAYS be excluded from deletion); this could be enhanced to use a list of slaves to ignore.
Anyway, the following script gracefully deletes any Jenkins slave nodes whose names start with a given prefix (so that you have more control). It marks them offline immediately, but deletes each one only after any jobs running on it are complete. Thought I should share it here.
Using Jenkins Scriptler Plugin, one can import/upload/run this script: https://github.com/gigaaks/jenkins-scripts/blob/7eaf41348e886db108bad9a72f876c3827085418/scriptler/disableSlaveNodeStartsWith.groovy
/*** BEGIN META {
"name" : "Disable Jenkins Hudson slaves nodes gracefully for all slaves starting with a given value",
"comment" : "Disables Jenkins Hudson slave nodes gracefully - waits until running jobs are complete.",
"parameters" : [ 'slaveStartsWith'],
"core": "1.350",
"authors" : [
{ name : "GigaAKS" }, { name : "Arun Sangal" }
]
} END META**/
// This scriptler script will mark Jenkins slave nodes offline for all slaves which starts with a given value.
// It will wait for any slave nodes which are running any job(s) and then delete them.
// It requires only one parameter named: slaveStartsWith and value can be passed as: "swarm-".
import java.util.*
import jenkins.model.*
import hudson.model.*
import hudson.slaves.*
def atLeastOneSlaveRunning = true
def time = new Date().format("HH:mm MM/dd/yy z", TimeZone.getTimeZone("EST"))

while (atLeastOneSlaveRunning) {
    // First thing - reset the flag to false.
    atLeastOneSlaveRunning = false
    time = new Date().format("HH:mm MM/dd/yy z", TimeZone.getTimeZone("EST"))
    for (aSlave in hudson.model.Hudson.instance.slaves) {
        println "-- Time: " + time
        println ""
        // Don't do anything if the slave name is "ansible01"
        if (aSlave.name == "ansible01") {
            continue
        }
        if (aSlave.name.indexOf(slaveStartsWith) == 0) {
            println "Active slave: " + aSlave.name
            println('\tcomputer.isOnline: ' + aSlave.getComputer().isOnline())
            println('\tcomputer.countBusy: ' + aSlave.getComputer().countBusy())
            println ""
            if (aSlave.getComputer().isOnline()) {
                aSlave.getComputer().setTemporarilyOffline(true, null)
                println('\tcomputer.isOnline: ' + aSlave.getComputer().isOnline())
                println ""
            }
            if (aSlave.getComputer().countBusy() == 0) {
                time = new Date().format("HH:mm MM/dd/yy z", TimeZone.getTimeZone("EST"))
                println("-- Shutting down node: " + aSlave.name + " at " + time)
                aSlave.getComputer().doDoDelete()
            } else {
                atLeastOneSlaveRunning = true
            }
        }
    }
    // Sleep 60 seconds before checking again
    if (atLeastOneSlaveRunning) {
        println ""
        println "------------------ sleeping 60 seconds -----------------"
        sleep(60 * 1000)
        println ""
    }
}
Now I can create a free-style Jenkins job, add a Scriptler script build step, and use the above script to gracefully delete slaves starting with a given prefix (a job parameter passed to the Scriptler script).
If you are fast enough to catch the following error message, it means you ran the Scriptler script (as shown above) in a job that was restricted to a non-master node/slave machine. Scriptler scripts are SYSTEM Groovy scripts, i.e. they must run on the Jenkins master's JVM to access all Jenkins resources and tweak them. To fix the issue, create a job restricted to run on the master (i.e. on the Jenkins master JVM) that accepts one parameter for the Scriptler script, and call that job from the first job ("Trigger a project and block until the job is complete"):
21:42:43 Execution of script [disableSlaveNodesWithPattern.groovy] failed - java.lang.NullPointerException: Cannot get property 'slaves' on null objectorg.jenkinsci.plugins.scriptler.util.GroovyScript$ScriptlerExecutionException: java.lang.NullPointerException: Cannot get property 'slaves' on null object
21:42:43 at org.jenkinsci.plugins.scriptler.util.GroovyScript.call(GroovyScript.java:131)
21:42:43 at hudson.remoting.UserRequest.perform(UserRequest.java:118)
21:42:43 at hudson.remoting.UserRequest.perform(UserRequest.java:48)
21:42:43 at hudson.remoting.Request$2.run(Request.java:328)
21:42:43 at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
21:42:43 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
21:42:43 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
21:42:43 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
21:42:43 at java.lang.Thread.run(Thread.java:745)
21:42:43 Caused by: java.lang.NullPointerException: Cannot get property 'slaves' on null object
In other words: if the Scriptler script build step runs in a job that is not running on the master Jenkins machine/JVM, the above error will occur. To solve it, create a job "disableSlaveNodesStartsWith", restrict it to run on master (safer side), have it call the Scriptler script with the parameter, and trigger this job from the other job.
I'm using the MultiJob plugin and have a job (Job-A) that triggers Job-B several times.
My requirement is to copy some artifacts (XML files) from each build.
The difficulty is that using the Copy Artifact plugin with the "last successful build" option will only take the last build of Job-B, while I need to copy from all builds that were triggered by the same build of Job-A.
The flow looks like:
Job-A starts and triggers:
    Job-B build #1
    Job-B build #2
    Job-B build #3
** copy artifacts of all last 3 builds, not just #3 **
Note: Job-B could execute on different slaves in the same run (I set the slave to run on dynamically via a parameter on the upstream Job-A).
When all builds are complete, I want Job-A to copy artifacts from builds #1, #2 and #3, not just from the last build.
How can I do this?
Here is a more generic Groovy script; it uses the Groovy plugin and the Copy Artifact plugin; see instructions in the code comments.
It simply copies artifacts from all downstream jobs into the upstream job's workspace.
If you call the same job several times, you could use the build number in copyArtifact's 'target' parameter to keep the artifacts separate.
// This script copies artifacts from downstream jobs into the upstream job's workspace.
//
// To use, add a "Execute system groovy script" build step into the upstream job
// after the invocation of other projects/jobs, and specify
// "/var/lib/jenkins/groovy/copyArtifactsFromDownstream.groovy" as script.
import hudson.plugins.copyartifact.*
import hudson.model.AbstractBuild
import hudson.Launcher
import hudson.model.BuildListener
import hudson.FilePath
for (subBuild in build.builders) {
    println(subBuild.jobName + " => " + subBuild.buildNumber)
    copyTriggeredResults(subBuild.jobName, Integer.toString(subBuild.buildNumber))
}

// Inspired by http://kevinormbrek.blogspot.com/2013/11/using-copy-artifact-plugin-in-system.html
def copyTriggeredResults(projName, buildNumber) {
    def selector = new SpecificBuildSelector(buildNumber)
    // CopyArtifact(String projectName, String parameters, BuildSelector selector,
    //              String filter, String target, boolean flatten, boolean optional)
    def copyArtifact = new CopyArtifact(projName, "", selector, "**", null, false, true)
    // use reflection because a direct call invokes a deprecated method
    // perform(Build<?, ?> build, Launcher launcher, BuildListener listener)
    def perform = copyArtifact.class.getMethod("perform", AbstractBuild, Launcher, BuildListener)
    perform.invoke(copyArtifact, build, launcher, listener)
}
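To keep artifacts from repeated invocations of the same job separate, the 'target' argument (passed as null above) can point at a per-build subdirectory. A hedged variant of the call inside copyTriggeredResults (the directory layout here is an assumption, not part of the plugin):

```groovy
// Hypothetical variant: write each build's artifacts into its own
// workspace subdirectory instead of the workspace root.
def copyArtifact = new CopyArtifact(projName, "", selector, "**",
        "artifacts/" + projName + "/" + buildNumber, false, true)
```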
I suggest the following approach:
Use Execute System Groovy script from Groovy Plugin to execute the following script:
import hudson.model.*

// get upstream job
def jobName = build.getEnvironment(listener).get('JOB_NAME')
def job = Hudson.instance.getJob(jobName)
def upstreamJob = job.upstreamProjects.iterator().next()

// prepare build numbers
def n1 = upstreamJob.lastBuild.number
def n2 = n1 - 1
def n3 = n1 - 2

// set parameters
def pa = new ParametersAction([
    new StringParameterValue("UP_BUILD_NUMBER1", n1.toString()),
    new StringParameterValue("UP_BUILD_NUMBER2", n2.toString()),
    new StringParameterValue("UP_BUILD_NUMBER3", n3.toString())
])
Thread.currentThread().executable.addAction(pa)
This script creates three environment variables corresponding to the last three build numbers of the upstream job.
Add three "Copy artifacts from another project" build steps to copy artifacts from the last three builds of the upstream project (use the environment variables from the script above to set the build number):
Run the build and check the build log; you should see something like this:
Copied 2 artifacts from "A" build number 4
Copied 2 artifacts from "A" build number 3
Copied 1 artifact from "A" build number 2
Note: the script may need to be adjusted to handle unusual cases, such as "the upstream project has only two builds", "the current job doesn't have an upstream job", or "the current job has more than one upstream job".
You can use the following example from an "Execute shell" build step.
Please note it can only run on the Jenkins master machine, and the job calling this step must also be the one that triggered the MultiJob.
#--------------------------------------
# Copy Artifacts from MultiJob Project
#--------------------------------------
PROJECT_NAME="MY_MULTI_JOB"
ARTIFACT_PATH="archive/target"
TARGET_DIRECTORY="target"

mkdir -p "$TARGET_DIRECTORY"

runCount="TRIGGERED_BUILD_RUN_COUNT_${PROJECT_NAME}"
for ((i=1; i<=${!runCount}; i++)); do
    buildNumber="${PROJECT_NAME}_${i}_BUILD_NUMBER"
    cp "$JENKINS_HOME/jobs/$PROJECT_NAME/builds/${!buildNumber}/$ARTIFACT_PATH/"* "$TARGET_DIRECTORY"
done
#--------------------------------------
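The script above relies on Bash indirect expansion, `${!var}`, which resolves a variable whose name is stored in another variable. A minimal standalone sketch (the MultiJob-style variable names and values are simulated here, not real Jenkins exports):

```shell
#!/usr/bin/env bash
# The MultiJob plugin exports variables like TRIGGERED_BUILD_RUN_COUNT_<JOB>
# and <JOB>_<i>_BUILD_NUMBER; simulate them to demonstrate the expansion.
TRIGGERED_BUILD_RUN_COUNT_MY_MULTI_JOB=2
MY_MULTI_JOB_1_BUILD_NUMBER=101
MY_MULTI_JOB_2_BUILD_NUMBER=102

PROJECT_NAME="MY_MULTI_JOB"
# runCount holds the NAME of the variable; ${!runCount} yields its VALUE.
runCount="TRIGGERED_BUILD_RUN_COUNT_${PROJECT_NAME}"
echo "runs: ${!runCount}"
for ((i=1; i<=${!runCount}; i++)); do
    buildNumber="${PROJECT_NAME}_${i}_BUILD_NUMBER"
    echo "build $i -> ${!buildNumber}"
done
```

This is why the loop works without knowing the triggered job's build numbers in advance: only the variable names are constructed, and Bash dereferences them at runtime.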