Restart Jenkins job if a particular error is found - jenkins

We are building a large number of variants of our code in every nightly build and naturally there will often be intermittent errors, even if the chance of a single error is only a fraction of a percent. Some of the most common ones are slaves that disconnect during a build and servers that don't respond.
The Build Failure Analyzer plugin can categorize different failure causes, but what we need is a plugin that can act on those problems and retrigger the build if there is an intermittent error. Preferably the solution should fit into our build flow so that the results propagate to the job that creates the build report.
Is there such a plugin or other tool for doing this?

The Naginator Plugin does this.
With it you can:
Rerun build for unstable builds as well as failures
Only rebuild the job if the build's log output contains a given regular expression
Rerun build only for the failed parts of a matrix job

If you are using a Pipeline, you can define a generic retry method that takes an invocation and a retriable strategy as input.
It tees the output of the execution into a separate file (using the tee step of the Pipeline Utility Steps Plugin), then reads the file and checks whether the retriable strategy applies.
Something like:
def retry(execution, isRetriable) {
    def retryCount = 0
    while (true) {
        // Use a fresh log file for every attempt
        def file = "execution-${System.currentTimeMillis()}.log"
        try {
            tee(file) {
                execution()
            }
            break
        } catch (Exception e) {
            def output = readFile(file: file)
            if (isRetriable(output)) {
                retryCount++
                // Give up after the fifth failed attempt
                if (retryCount == 5) {
                    throw e
                }
            } else {
                throw e
            }
        }
    }
}
And then wrap your invocation with retry:
retry(
    { stepThatOccasionallyFails() },
    { output -> output.contains('a random error!') }
)
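As a minimal sketch of a retriable strategy for the retry method above, the predicate below checks the captured output against a list of known intermittent-error markers. The marker strings and ./build.sh are hypothetical; substitute the messages your own agents and servers actually log.

```groovy
// Hypothetical markers for intermittent failures; adjust to your own logs.
def isIntermittent(String output) {
    def markers = ['Connection reset', 'Agent went offline']
    for (marker in markers) {
        if (output.contains(marker)) {
            return true
        }
    }
    return false
}

node {
    // tee and readFile need a workspace, so the call runs inside node { }
    retry(
        { sh './build.sh' },                   // hypothetical build command
        { output -> isIntermittent(output) }
    )
}
```

A plain for loop is used instead of collection closures to stay on the conservative side of what Pipeline's CPS-transformed Groovy handles.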

Related

Jenkins set build failed if number of tests decreases?

Is there a way to mark a test run failed if the number of tests decreases in jenkins. We have a test report in JUnit.
For some reason, the number of tests sometimes decreases, and that often indicates a critical error. Is there a way to tell Jenkins that the test status should be red in such a case? (Maybe some plugin.)
Thanks a lot for any hint!
There is a way.
Install the Groovy Postbuild Plugin and add a Post Build Step
Post-Build-Actions -> Add Step -> Groovy Postbuild
import jenkins.model.*
def currentBuild = manager.build
def totalCurrent
def totalPrevious
// evaluate test count of current build
def result = currentBuild.getAction(hudson.tasks.junit.TestResultAction.class).result
if (result != null) {
    totalCurrent = result.hasProperty('totalCount') ? result.totalCount : null
}
// evaluate test count of previous build
result = currentBuild.previousBuild.getAction(hudson.tasks.junit.TestResultAction.class).result
if (result != null) {
    totalPrevious = result.hasProperty('totalCount') ? result.totalCount : null
}
// fail the build if test count reduced
if (totalCurrent < totalPrevious) {
    manager.buildFailure()
}
You may need to add some more null-safe checks.
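A null-safe variant might look like the sketch below. It assumes the same Groovy Postbuild manager object as the script above and uses Groovy's safe-navigation operator, so a missing previous build or a missing JUnit report yields null instead of a NullPointerException. The log message is my own wording.

```groovy
import hudson.tasks.junit.TestResultAction

// ?. returns null when the build or its test report does not exist,
// e.g. on the very first build or after a run that published no results.
def totalCurrent = manager.build.getAction(TestResultAction)?.result?.totalCount
def totalPrevious = manager.build.previousBuild?.getAction(TestResultAction)?.result?.totalCount

// Compare only when both counts are known.
if (totalCurrent != null && totalPrevious != null && totalCurrent < totalPrevious) {
    manager.listener.logger.println("Test count dropped from ${totalPrevious} to ${totalCurrent}")
    manager.buildFailure()
}
```

Note that this sketch skips the comparison when the current build has no test report at all; if that case should also fail the build, it needs its own check.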
Even if there is a way to do this, it sounds like a bad idea considering the fact that your number of tests executed could actually decrease from build to build for legitimate reasons.
First I would try to find out why this is happening unexpectedly.

Inform Jenkins about failed TestNG tests from Gradle run task to mark build as Failed

For reasons I cannot influence, the project launches TestNG through the following mechanism.
In short, it creates a new instance of TestNG, adds listeners, specifies classes and runs the tests. Then all this dirty code is called from the Gradle run task (which is actually empty and, as far as I understand, simply calls the TestManager.main() method).
I removed the part of code just to show the main direction:
class TestManager {
    static void main(String[] args) {
        try {
            runTests(args[0])
        } catch (Exception e) {
            e.printStackTrace()
            System.in.read()
        }
    }

    private static void runTests(Application app) {
        TagsConfig.runs.each { run ->
            if (run.execute) {
                List<TagsSuite> suites = TagsConfig.suites
                suites.each { suite ->
                    if (suite.execute) {
                        Reflections reflections = new Reflections("${app.packageName}.${suite.name}")
                        def classes = reflections.getSubTypesOf(${suite.name})
                        if (classes.size() > 0) {
                            TestNG testNG = new TestNG()
                            testNG.testClasses = classes
                            testNG.groupByInstances = true
                            testNG.outputDirectory = "testng-output"
                            testNG.addListener(new TestListenerAdapter())
                            testNG.addListener(new ExceptionListener())
                            testNG.addListener(new AllureTestListener())
                            if (TagsConfig.isSmoke) {
                                testNG.setGroups("smoke")
                            } else if (TagsConfig.isExtendedSmoke) {
                                testNG.setGroups("extended_smoke")
                            }
                            testNG.run()
                        }
                    }
                }
            }
        }
    }
}
So the test launch looks like this:
gradle clean
gradle run
I cannot change the way the tests are started now, and one of the problems is that the build in Jenkins is always successful even if there are failed/broken/skipped tests.
So, I can get the number of failed tests from TestListenerAdapter, but how can I let Jenkins know that there were failed tests?
Maybe by returning an exit code from a Gradle run task or by installing some plugin that will check the count of failed tests in TestListenerAdapter?
For now, I set a "FAILED_TESTS_COUNT" system property in the onFinish() event and change the build status in the pipeline if it is not 0, but this looks really dirty.
Jenkins 2.89.3
Gradle 3.5.1
TestNG 6.9.8
I personally use the Jenkins Text Finder post-build action. It looks for a pattern in the standard output and can set the build to "unstable" (orange) when the pattern is found.
In your case, just uncheck both checkboxes and choose a pattern that matches the output of your failed tests.
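The exit-code idea raised in the question could be sketched as follows, on the assumption that TestManager.main() can be edited at all. The class name is hypothetical and stands in for TestManager; only the TestNG calls (run(), hasFailure(), hasSkip()) come from the library itself.

```groovy
import org.testng.TestNG

class TestManagerExitCodeSketch {   // hypothetical name, stands in for TestManager
    static void main(String[] args) {
        boolean problems = false

        // In the real code this part sits inside the suites.each loop of runTests(),
        // with the test classes, groups and listeners configured as before.
        TestNG testNG = new TestNG()
        testNG.run()
        // hasFailure() and hasSkip() report the status of the run that just finished.
        problems = problems || testNG.hasFailure() || testNG.hasSkip()

        // Exit explicitly so that the process status reflects the test outcome.
        System.exit(problems ? 1 : 0)
    }
}
```

Gradle's run task is a JavaExec task, which by default fails when the launched process returns a non-zero exit value, so the Jenkins build step would fail as well.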

How to handle exceptions in a Jenkins Job DSL seed job?

If I had a Git repository full of Job DSL groovy scripts and a typical seed job e.g.:
job('seed') {
    //... scm, triggers etc.
    steps {
        dsl {
            external 'jobs/**/*.groovy'
        }
    }
    //... more config etc.
}
what happens if just one of the job dsl scripts throws an exception for some reason, for example:
job('deliberate-fail') {
    throw new Exception("Arrrgggghhh")
}
Is it possible to handle this exception in the seed job or will the whole seed job fail?
If all but one would work - is it possible for the seed job to record an UNSTABLE result rather than FAILURE?
I don't really want one bad apple to spoil the bunch.
Based on Opal's suggestion to use a try-catch, I modified the job to capture the exception and print an error to the console.
job('deliberate-fail') {
    try {
        throw new Exception("Arrrgggghhh")
    } catch (Exception ex) {
        println("deliberate-fail job is [UNSTABLE]")
    }
}
As I am currently using the Job DSL plugin (and not a Jenkins Pipeline script), I don't think Opal's suggestion to use "currentBuild.result = 'UNSTABLE'" was available to me. After a little digging I found I could use the Text-Finder plugin to search the console for the "[UNSTABLE]" error and change the seed job state accordingly.
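job('seed-job') {
    steps {
        dsl {
            external '**/*_jobdsl.groovy'
        }
    }
    publishers {
        // The brackets are escaped so the regex matches the literal text "[UNSTABLE]";
        // unescaped, [UNSTABLE] is a character class that matches any one of those letters.
        textFinder(/\[UNSTABLE\]/, '', true, false, true)
    }
}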
A bit convoluted but it seems to work!

How to differentiate build triggers in Jenkins Pipeline

I'm hoping to add a conditional stage to my Jenkinsfile that runs depending on how the build was triggered. Currently we are set up such that builds are either triggered by:
changes to our git repo that are picked up on branch indexing
a user manually triggering the build using the 'build now' button in the UI.
Is there any way to run different pipeline steps depending on which of these actions triggered the build?
The following code should work to determine whether a user started the pipeline, as opposed to a timer or another trigger:
def isStartedByUser = currentBuild.rawBuild.getCause(hudson.model.Cause$UserIdCause) != null
In Jenkins Pipeline without currentBuild.rawBuild access the build causes could be retrieved in the following way:
// started by commit
currentBuild.getBuildCauses('jenkins.branch.BranchEventCause')
// started by timer
currentBuild.getBuildCauses('hudson.triggers.TimerTrigger$TimerTriggerCause')
// started by user
currentBuild.getBuildCauses('hudson.model.Cause$UserIdCause')
You can get a boolean value with:
isTriggeredByTimer = !currentBuild.getBuildCauses('hudson.triggers.TimerTrigger$TimerTriggerCause').isEmpty()
Or, since getBuildCauses() returns an array and an empty array is false under Groovy truth, you can test the result directly:
if (currentBuild.getBuildCauses('hudson.triggers.TimerTrigger$TimerTriggerCause')) {
    // triggered by timer
}
The ability to get causes for a workflow run was released in version 2.22 (2018 Nov 02) to the Pipeline Supporting APIs Plugin. The feature was requested in JENKINS-41272.
A couple methods were added to the currentBuild global variable with that release:
getBuildCauses
Returns a JSON array of build causes for the current build
EXPERIMENTAL - MAY CHANGE getBuildCauses(String causeClass)
Takes a string representing the fully qualified Cause class and returns a JSON array of build causes filtered by that type for the current build, or an empty JSON array if no causes of the specified type apply to the current build
And an example from my own testing:
echo "${currentBuild.buildCauses}" // same as currentBuild.getBuildCauses()
echo "${currentBuild.getBuildCauses('hudson.model.Cause$UserCause')}"
echo "${currentBuild.getBuildCauses('hudson.triggers.TimerTrigger$TimerTriggerCause')}"
And the output:
[Pipeline] echo
[[_class:hudson.model.Cause$UserIdCause, shortDescription:Started by user anonymous, userId:null, userName:anonymous], [_class:org.jenkinsci.plugins.workflow.cps.replay.ReplayCause, shortDescription:Replayed #12]]
[Pipeline] echo
[]
[Pipeline] echo
[]
[Pipeline] End of Pipeline
Finished: SUCCESS
NOTE
There appears to be an issue with the currentBuild.getBuildCauses(type) when the type is a type of Cause contributed by a plugin. For example, currentBuild.getBuildCauses('org.jenkinsci.plugins.workflow.cps.replay.ReplayCause') fails with a java.lang.ClassNotFoundException. This was reported in JENKINS-54673 for the 2.22 version of the Pipeline: Supporting APIs (workflow-support) plugin. It is reportedly fixed in the 2.24 version.
I might be missing something, but you can achieve what you want easily by making use of the when directive:
pipeline {
    agent any
    stages {
        stage('Always') {
            steps {
                echo "I am always executed"
            }
        }
        stage('ManualTimed') {
            steps {
                echo "I am only executed when triggered manually or timed"
            }
            when {
                beforeAgent true
                anyOf {
                    triggeredBy 'TimerTrigger'
                    triggeredBy cause: 'UserIdCause'
                }
            }
        }
        stage('GitLabWebHookCause') {
            steps {
                echo "I am only executed when triggered by SCM push"
            }
            when {
                beforeAgent true
                triggeredBy 'GitLabWebHookCause'
            }
        }
    }
}
You will find many similar useful examples for various use cases in the documentation of the when directive.
Edit:
Thanks to Jean-Francois Larvoire's answer, I was able to figure out that GitLabWebHookCause was the trigger I needed for my use case.
@vitalii-blagodir:
Your answer works for detecting builds triggered by users and timers, but not by commits.
Instead, I found this to work in my case:
def isTriggeredByIndexing = currentBuild.getBuildCauses('jenkins.branch.BranchIndexingCause').size()
def isTriggeredByCommit = currentBuild.getBuildCauses('com.cloudbees.jenkins.GitHubPushCause').size()
def isTriggeredByUser = currentBuild.getBuildCauses('hudson.model.Cause$UserIdCause').size()
def isTriggeredByTimer = currentBuild.getBuildCauses('hudson.triggers.TimerTrigger$TimerTriggerCause').size()
The .size() call returns the number of matching causes: 0 if there is none, or 1 if one is present. Groovy treats 0 as false and any other number as true, so the result is usable as a boolean.
For finding the object name to use, I found it convenient to display this in the log:
echo "# Build causes"
def buildCauses = currentBuild.buildCauses
def numCause = 0
for (cause in buildCauses) {
    echo "${numCause++}: ${cause.shortDescription}" // Display a human-readable index and description
    echo "${cause}" // Display the object class name. This allows knowing what names to use in getBuildCauses(name) calls below.
}
Finally, if the goal is to abort a pipeline build in specific cases, then the test must be done before the beginning of the pipeline.
For example, we had a problem with the branch indexing triggering extra useless builds. This was fixed by adding this before the pipeline:
// Avoid useless builds: the branch indexing should only trigger the initial build of a new branch.
def isTriggeredByBranchIndexing = currentBuild.getBuildCauses('jenkins.branch.BranchIndexingCause').size()
if (isTriggeredByBranchIndexing && currentBuild.previousBuild) { // Then it's not the initial build.
    echo "# Reindexing a branch already built. It is useless to rebuild it now. Aborting."
    currentBuild.result = 'SUCCESS' // Make sure the build is not displayed in red in the Jenkins UI.
    return // Abort before the pipeline even starts. (Inside the pipeline, this would only abort one stage.)
}
I think that the answers here are incomplete and do not provide an actual ready to use answer. Here's my code to get it working:
import com.cloudbees.groovy.cps.NonCPS

@NonCPS
def isStartedByTimer() {
    def buildCauses = currentBuild.rawBuild.getCauses()
    echo buildCauses
    boolean isStartedByTimer = false
    for (buildCause in buildCauses) {
        if ("${buildCause}".contains("hudson.triggers.TimerTrigger\$TimerTriggerCause")) {
            isStartedByTimer = true
        }
    }
    echo isStartedByTimer
    return isStartedByTimer
}

// [...]
// Other pipeline stuff
script {
    isStartedByTimer()
}
When started by user:
00:00:01.353 [hudson.model.Cause$UserIdCause@fa5cb22a]
[Pipeline] echo
00:00:01.358 false
When started by timer:
00:00:01.585 [hudson.triggers.TimerTrigger$TimerTriggerCause@5]
[Pipeline] echo
00:00:01.590 true
Note: the @NonCPS annotation is needed because otherwise the next non-script step will throw.
Assuming the two different build causes are "timer" and "push" (to a git repo), you can add the following stage to your Jenkinsfile (in a declarative Jenkins pipeline) to make use of getBuildCauses():
pipeline {
    agent any // a declarative pipeline requires an agent section
    stages {
        stage('preparation') {
            steps {
                script {
                    // get build cause (time triggered vs. SCM change)
                    def buildCause = currentBuild.getBuildCauses()[0].shortDescription
                    echo "Current build was caused by: ${buildCause}\n"
                    // e.g. "Current build was caused by: Started by GitHub push by mirekphd"
                    // vs. "Started by timer"
                }
            }
        }
    }
}
Then I can decide whether to perform certain stages conditionally (depending on the build cause). For example, pulling a docker base image and inspecting for changes in system libraries (likely security updates) should be done periodically, regardless of whether there was a source code change or not.
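The periodic base-image refresh described above might be sketched like this in a declarative pipeline. The cron schedule and the image name are hypothetical placeholders; the when condition uses the same triggeredBy 'TimerTrigger' form shown in an earlier answer.

```groovy
pipeline {
    agent any
    triggers {
        cron('H 3 * * *')   // hypothetical nightly schedule
    }
    stages {
        stage('refresh base image') {
            // Runs only for timer-triggered builds, not for pushes or manual runs
            when {
                triggeredBy 'TimerTrigger'
            }
            steps {
                sh 'docker pull ubuntu:22.04'   // hypothetical base image
            }
        }
        stage('build') {
            steps {
                echo 'Runs for every trigger'
            }
        }
    }
}
```

Putting the condition in a when block keeps the decision declarative, so the skipped stage still shows up in the stage view of builds that were not started by the timer.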
We can use the "BUILD_CAUSE" variable to find out who initiated the run.
For [jenkins-pipeline] you may use
currentBuild.rawBuild.getCauses()
(see github.com/jenkinsci/pipeline-examples/blob/master/… for more details)
I had a similar requirement: the success/failure notification had to include details of the user who triggered the build. The job already had a time-based trigger, so I could not use wrap([$class: 'BuildUser']) directly.
I used the step below, which records the user name if the job was triggered manually, or "TimerTrigger" if it was triggered by the timer:
pipeline {
    agent any
    stages {
        stage('Test') {
            steps {
                script {
                    env.buildCauses = currentBuild.rawBuild.getCauses()
                    if (buildCauses.contains("hudson.triggers.TimerTrigger")) {
                        env.builduser = "TimerTrigger"
                    } else {
                        wrap([$class: 'BuildUser']) {
                            env.builduser = "${BUILD_USER}"
                        }
                    }
                }
                echo "Initiated by: ${env.builduser}"
            }
        }
    }
}

Force Jenkins to run build on a different node in a label everytime?

I have a project set up to run on the MISC label every time it builds, and it had been working great.
However, I've encountered a problem where, if the previous build on one machine fails, it can cause further builds on that machine to fail as well. It would be fine on another slave.
We would like the job to run on a different node in the label, if possible, in case this happens again in the future.
Thanks,
I've run into similar problems. My solution is to take the node offline if certain types of errors happen.
I'm using this plugin to run a Groovy script after every build: https://wiki.jenkins-ci.org/display/JENKINS/Global+Post+Script+Plugin
My script looks like this
import jenkins.model.Jenkins
import hudson.model.*
import hudson.slaves.OfflineCause

// this script is designed to be called by https://wiki.jenkins-ci.org/display/JENKINS/Global+Post+Script+Plugin
if (BUILD_RESULT == "FAILURE") {
    println("The job failed. The build failure cause will be checked.")
    job = Jenkins.instance.getItemByFullName(JOB_NAME)
    build = job.getBuildByNumber(BUILD_NUMBER.toInteger())
    def buildLog = build.log
    if (buildLog.contains("something indicating an unrecoverable error")) {
        Node buildNode = build.getBuiltOn();
        // Never set master offline
        if (Hudson.getInstance() != buildNode) {
            println("This is fatal. The node ${NODE_NAME} is being taken offline.")
            buildNode.toComputer().setTemporarilyOffline(true, OfflineCause.create(new OfflineMessage()));
        } else {
            println("The error is marked to take the node offline, but the node is not being taken offline because it is the master")
        }
    }
}

class OfflineMessage extends org.jvnet.localizer.Localizable {
    def message

    OfflineMessage() {
        super(null, null, [])
        def timestr = new Date().format("HH:mm dd/MM/yy z", TimeZone.getDefault())
        this.message = "This node was taken offline because of a failed job at " + timestr
    }

    String toString() {
        this.message
    }

    String toString(java.util.Locale l) {
        toString()
    }
}
