Jenkins Matrix Groovy Execution Strategy Plugin hangs if label is offline - jenkins

I am trying to implement a more complex combination filter for Jenkins using the Matrix Groovy Execution Strategy Plugin. See my previous question for more details. It seems to work otherwise but if the nodes where the label is set are offline, the matrix job hangs and does not put the rest of the matrix items into the job queue.
This is enough Groovy to cause the same effect in the plugin:
combinations.each{
result[it.cfg] = result[it.cfg] ?: []
result[it.cfg] << it
}
return [result, true]
If I set the execution strategy to "Classic", all the job labels go into the queue even if some nodes are offline. I have "Execute concurrent builds if necessary" enabled if that makes any difference.
Is there some setting I need to fix or is this a plugin issue?

Thats because the classic strategy puts all the keystone jobs in the queue and then the others afterwards.
This plugin will schedule them in sections and if the node is offline then they will wait, which is standard behaviour
you could try this
Note: I wrote the matrix execution strategy plugin
Incorporated comments
You can force all the combinations to submit in one go by doing the following:
combinations.each{
result["a"] = result["a"] ?: []
result["a"] << it
}
return [result, true]

Related

Jenkins: coordinating multiple pipelines

I am developing software for an embedded device. The steps involved in building and verifying it all are complicated: creating the build environment (via containers), building the actual SD card image, running unit tests, automated tests on target hardware, license compliance checks and so on - details aren't important here.
Currently I have this in one long declarative Jenkinsfile as a multibranch-pipeline (for all intents and purpose here, we're doing gitflow). In doing this I've hit a limit on the size of a Jenkinsfile (https://issues.jenkins.io/browse/JENKINS-37984) and can't actually get all the stages in that I want to.
It's too big so i need to cut this massive pipeline up. I broke this all up in little pipeline jobs with parameters to pass data/context between each part of the pipeline and came up with something like this:
I've colour-coded the A and B artifacts as they're used a lot and the lines would make things messy. What this tries to show is an order of running things, where things in a column depend on artifacts created in column to the left.
I'm struggling to discover how to do the "waiting" for multiple upstream jobs (for instance in Job Foxtrot in the diagram) before starting another downstream job that depends on them.
I specifically do not want to turn each column in the diagram into a parallel group of things, because for instance Job Delta might take 2 minutes but Job Charlie take 20 minutes. The exact duration of each job is variable and unpredictable as for some parameter combinations will mean building from scratch and others will cause an existing artifact to be output.
I think I need something like the join plugin (https://plugins.jenkins.io/join/), but for pipeline jobs (join only works on freestyle jobs and is quite aged).
The one approach I've explored is to have a "controller" job (maybe job Alpha in the diagram?) that uses the build step (https://www.jenkins.io/doc/pipeline/steps/pipeline-build-step/_) with the wait parameter set to false to trigger the downstream jobs in correct order, with the correct parameters. It would involve searching Jenkins.instance.getItems() to locate the Runs for the downstream projects, which have an upstream cause that matches the currently executing "controller" job. This involves polling waiting for the job to appear and then polling for the job to complete. This feels like I'm "doing it wrong". Below is the source for this polling approach - be gentle, i'm new to groovy!
Is this polling approach a good way? What problems could I encounter with this approach? Should I be using the ItemListener Jenkins ExtensionPoint and writing a plugin to do this sort of thing in a generic way? Is there another way I've not found?
I feel like I'm not "holding it right" when it comes to the overall pipeline design/architecture here.
Finally after writing this I notice that Jobs India, Juliet and Kilo could be collapsed into a single Job, but I don't think that solve much.
#NonCPS
Integer getTriggeredBuildNumber(String project, String causeJobName, Integer causeBuildNumber) {
//find the job/project first
def job = Jenkins.instance.getAllItems(org.jenkinsci.plugins.workflow.job.WorkflowJob.class).find { job -> job.getFullName() == project }
//find a build for this job that was caused by the current build
def build = job.getBuilds().find { build ->
build.getCauses().findAll{ it.class == hudson.model.Cause.UpstreamCause.class }.find { cause ->
cause.getUpstreamProject() == causeJobName && cause.getUpstreamBuild() == causeBuildNumber
} != null
}
if(build != null) {
return build.getNumber()
} else {
return -1
}
}
#NonCPS
Boolean isBuildComplete(String jobName, Integer buildNumber) {
def job = Jenkins.instance.getAllItems(org.jenkinsci.plugins.workflow.job.WorkflowJob.class).find { job -> job.getFullName() == jobName }
if(job) {
def build = job.getBuildByNumber(buildNumber)
return build.isBuilding() == false && build.getResult() != null
} else {
println "WARNING: job '" + jobName + "' not found."
return false
}
}
We've hit the "Code too large" too many times, but the way to cope with it is to refactor your pipeline to remain deep under the limit. The following may be used:
You can run a combination of scripted and declarative pipeline. So some stages in the beginning and/or in the end may be refactored out.
You can build some of the parallel stages dynamically. This code would not be counted towards the limited code size.
Lastly, the issue mentions transformation variables, and that can help too.
We used the combination of the above and have expanded our pipeline well beyond what it was when we first encountered the issue you're facing.

Jenkins pipeline queue gets full when all agents are offline

I am using a Jenkins pipeline script and when all nodes are offline, the builds keep on queuing up. How do I stop Jenkins from adding jobs to the queue while all slaves are offline?
pipeline {
triggers {
pollSCM('H/3 * * * 1-5')
}
}
Is your agent's availability configured to 'Keep this agent online as much as possible' ?
One way to tackle this situation is, run the below script on master node and build your pipeline(s) only if at least one of the nodes is online. You can pass the online node name to your downstream job as a parameter.
def axis = []
for (slave in jenkins.model.Jenkins.instance.getNodes()) {
if (slave.toComputer().isOnline()) {
axis += slave.getDisplayName()
}
}
return axis
Above script source: Jenkins: skip if node is offline
Other links that may help are:
Monitor and restart your slave nodes - https://wiki.jenkins.io/display/JENKINS/Monitor+and+Restart+Offline+Slaves
I found this script handy in some situations:
https://github.com/jenkinsci/jenkins-scripts/blob/master/scriptler/clearBuildQueue.groovy
I'm not into pipeline jobs, but for regular freestyle jobs, this kind of queueing will only happen if your builds are parameterized. Seperate builds are needed then to ensure that the project will run seperately for each and every parameter value (it does not matter whether the value is actually different).
So, removing build parameters in your project might solve the problem.

jenkins fails on building a downstream job

I'm trying to trigger a downstream job from my current job like so
pipeline {
stages {
stage('foo') {
steps{
build job: 'my-job', propagate: true, wait: true
}
}
}
}
The purpose is to wait on the job result and fail or succeed according to that result. Jenkins is always failing with the message Waiting for non-job items is not supported . The job mentioned above does not have any parameters and is defined like the rest of my jobs, using multibranch pipeline plugin.
All i can think of is that this type of jenkins item is not supported as a build step input, but that seems counterintuitive and would prove to be a blocker to me. Can anyone confirm if this is indeed the case?
If so, can anyone suggest any workarounds?
Thank you
I actually managed to fix this by paying more attention to the definition of the build step. Since all my downstream jobs are defined as multibranch pipeline jobs, their structure is folder-like, with each item in the folder representing a separate job. Thus the correct way to call the downstream jobs was not build job: 'my-job', propagate: true, wait: true, but rather build job: "my-job/my-branch-name", propagate: true, wait: true.
Also, unrelated to the question but related to the issue at hand, make sure you always have at least one more executor free on the jenkins machine, since the wait on syntax will consume one thread for the waiting job and one for the job being waited on, and you can easily find yourself in a resource-starvation type situation.
Hope this helps
This looks like JENKINS-45443 which includes the comment
Pipeline has no support for the upstream/downstream job system, in part due to technical limitations, in part due to the fact that there is no static job configuration that would make this possible except by inspecting recent build metadata.
But it also offer the workaround:
as long as the solution is still ongoing, I include here our workaround. It is based in the rtp (Rich Text Publisher) plugin, that you should have installed to make it work:
At the end of our Jenkinsfile and after triggering the job, we wait it to finish. In that case, build() returns the object used to run the downstream job. We get the info from it.
Warning: getAbsoluteUrl() function is a critical one. Use it at your own risk!
def startedBld = build(
job: YOUR_DOWNSTREAM_JOB,
wait: true, // VERY IMPORTANT, otherwise build () does not return expected object
propagate: true
)
// Publish the started build information in the Build result
def text = '<h2>Downstream jobs</h2>Started job ' + startedBld.rawBuild.toString () + ''
rtp (nullAction: '1',parserName: 'HTML', stableText: text)
This issue is part of JENKINS-29913, opened for the past two years:
Currently DependencyGraph is limited to AbstractProject, making it impossible for Workflows to participate in upstream/downstream relationships (in cases where job chaining is required, for example due to security constraints).
It refers the RFE (Request for Enhancement) JENKINS-37718, based on another (unanswered) Stack Overflow question.

How to abort Jenkins pipeline build if label is not matched

I have a Jenkinsfile multibranch pipeline script, which runs on two different Jenkins systems. Jenkinsfile relies on a specific label name. In one of the systems, the label based agent is available and in another not (intentionally). In the former it runs fine. In the Jenkins system without the matching label, the job just hangs because it cant find a matching agent.
Is there a way to specify an option to abort (or not start) a build if a label is not found?
Some discussion here:
https://issues.jenkins-ci.org/browse/JENKINS-35905
Might not be possible anytime soon
If they are calling in to a shared library then you can check for label being online/available and then fail the build
def computers = Jenkins.instance.computers
for(computer in computers){
if(computer.isOnline()){
labelStr = computer.node.getLabelString()
}
if labelStr ~= /user input/
break;
}
System.exit(1); // no label
For a declarative pipeline it may be possible to use when{beforeAgent} to test whether a label exists.
This would only be useful where the agent is specified for a stage rather than the whole pipeline.
...and caveat that this is an as yet untested hypothesis.
Just a workaround, but in order to avoid dependency on shared lib I run the below every X minutes to clean-up culprits from queue:
import hudson.model.*
def q = Jenkins.instance.queue
q.items.each {
if (it =~ /someregex or match all/) {
why = it.getWhy()
if (why =~ /.*There are no nodes with the label.*/) {
println "No node found for $it.task.runId. It's stuck in damn jenkins queue forever and ever. Killing it"
q.cancel(it.task)
}
}
}

Multiple concurrent builds of the same project in Jenkins

On my team, we have a project that we want to do continuous-integration-style testing on. Our build takes around 2 hours and is triggered by the "Poll SCM" trigger (using Perforce as the server), and we have two build nodes.
Currently, if someone checks in a change, one build node will start up pretty much right away, but if another change gets checked in, the other node will not kick in, as it's waiting for the previous job to finish. However, I could like the other build node to start a build with the newer checkin as soon as possible, so that we can maximize the amount of continuous testing that's occurring (so that if e.g. one build fails we know sooner rather than later).
Is there any simple way to configure a Jenkins job (using Poll SCM against a Perforce server) to not block while another instance of the job is already running?
Unfortunately, due to the nature of the project it's not possible to simply break the project up into multiple build jobs that get pipelined across multiple slaves (as much as I'd like to change it to work in this way).
Use the "Execute concurrent builds if necessary" option in Jenkins configuration.
Just to register here in case someone needs it, in the version I'm using (Jenkins 2.249.3) I had to uncheck the option Do not allow concurrent builds in the child job that is called multiple times from the parent job.
The code is more or less like that:
stage('STAGE IN THE PARENT JOB') {
def subParallelJobs = [:]
LIST_OF_PARAMETERS = LIST_OF_PARAMETERS.split(",")
for (int i = 0; i < LIST_OF_PARAMETERS.size(); i++) {
MY_PARAMETER_VALUE = LIST_OF_PARAMETERS[i].trim()
MY_KEY_USING_THE_PARAMETER_TO_MAKE_IT_UNIQUE = "JOB_KEY_${MY_PARAMETER_VALUE}"
def jobParams = [ string(name: 'MY_JOB_PARAMETER', value: MY_PARAMETER_VALUE) ]
subParallelJobs.put("MY_KEY_USING_THE_PARAMETER_TO_MAKE_IT_UNIQUE", {build (job: "MY_CHILD_JOB", parameters: jobParams)})
}
parallel(subParallelJobs)
}
}

Resources