Occasionally Jenkins Configuration Matrix jobs report failure on success

General Symptoms
We use Jenkins to build & test on multiple platforms. We use the Configuration Matrix plugin to help with this. Occasionally (increasingly often) Jenkins will mark the Configuration Matrix master job or subjobs as failed when the jobs seem to have succeeded (the console output reports success). We have no idea why this is happening. Any suggestions?
Some clues:
The exit code of the Jenkins job's script is not relevant. We've had test sub-jobs that simply exit 0 and they can still exhibit this bad behavior.
The failures are bunchy. They seem to come in groups.
The failures tend to affect our Windows platforms more heavily, but the issue occurs on our Mac nodes as well.
The failures clump on a single node for a time but they are not exclusive to any 1 node.
We've noticed that the failures happen most often under load. In particular, failed sub-jobs tend to start later than their successful sibling sub-jobs (often after other siblings have already completed).
We suspect that the sub-jobs are somehow considered completed by the master before they actually complete. Since they're not done, the master sees them as failed. Later the sub-job really does complete (thus the console output says Success). We suspect this because we've added comments to "failed" jobs whose logs looked incomplete, only to return later and find additional console output.

It turns out that there is a bug in either the Jenkins "Set Build Name" plugin or the "Configuration Matrix" plugin. When you use them together, you're subject to two problems. The first is the symptom I described above. The second is that the names set on builds can be wrong (race conditions mean that the wrong name can be applied).
There is a ticket open against Jenkins here. Unfortunately there isn't currently a posted work-around. We may simply stop using one of the plugins.

Related

Jenkins pipeline TestNG publisher - test configuration failure as unstable?

We're migrating from "classic" Jenkins to Kubernetes with pipelines - for now we're using scripted pipeline, but I'm not sure whether declarative ones would solve this issue for us.
On "old" Jenkins, when the test configuration failed the build was marked as unstable. I'm struggling to do it with pipelines - because by default the failed test configuration leaves the build result unchanged, which shows the successful build in the end.
The documentation mentions the flag failureOnFailedTestConfig, but that one flips the build all the way to FAILURE, which is not what we want (we leave that to compilation issues).
I would like to work around it by switching any non-success result to UNSTABLE, but that is not possible. Once the build is FAILURE, there is no way I'm aware of to lower the result to UNSTABLE. And when it's SUCCESS, I have no idea whether some test configurations failed or not.
I also checked whether the step produces any return value, perhaps I can work with that... but no, there are no stats I can use in the pipeline afterwards.
The failure* flags do not actually cause the step to fail, so I can't handle it with catchError as I like - it merely marks the result so I'm out of options.
It seems like some unstableOnFailedTestConfig flag is missing there. Or is there any other way to get the same effect? I've been searching the Internet for weeks now, but I couldn't find anything about this problem.
Checking skipped test count is not an option for us, we have some tests that are expected to be skipped based on environment.
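The one-way behavior described in this question (a SUCCESS can become UNSTABLE, but a FAILURE cannot be lowered to UNSTABLE) can be modeled in a few lines. This is a hedged Python sketch, not Jenkins code: it assumes the severity ordering SUCCESS < UNSTABLE < FAILURE and that setting a result can only make it worse, as the question reports. The names Result and worsen are hypothetical.

```python
from enum import IntEnum


class Result(IntEnum):
    """Build results ordered from best to worst (assumed ordering)."""
    SUCCESS = 0
    UNSTABLE = 1
    FAILURE = 2


def worsen(current: Result, requested: Result) -> Result:
    """Return the worse of the two results; a result never improves."""
    return max(current, requested)


# A successful build can be downgraded to UNSTABLE ...
assert worsen(Result.SUCCESS, Result.UNSTABLE) is Result.UNSTABLE
# ... but a FAILURE cannot be lowered back to UNSTABLE.
assert worsen(Result.FAILURE, Result.UNSTABLE) is Result.FAILURE
```

Under this model, a step that has already recorded FAILURE leaves nothing to correct afterwards, which is why the question asks for a way to have the publisher report UNSTABLE in the first place.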

How can aborted builds be ignored concerning the Jenkins job status preview?

For internal reasons, one of my jobs is able to run concurrently, but new builds abort themselves if another build is already running (disabling concurrency doesn't help, since I don't want new jobs to be scheduled for execution once the current build is done).
However, this behaviour is detrimental to the job status preview (the colored ball next to the job name when inside the job's parent folder). It often shows the status as "aborted", which is undesirable - I want to view the latest running build as the source of the job status.
I tried deleting aborted builds from within their own execution, but that's unfortunately neither trivial nor stable, and thus not suitable for this situation. I could probably get a workaround running that deletes them from a separate job, but that's not ideal either.
Anyway, I'm now hoping that I can just tell Jenkins to ignore "aborted" builds in the calculation of the job preview. Unfortunately, I wasn't able to find a setting or plugin that allows me to do this.
Is this possible at all? Or do I need to find another way?
Would something like this help?
https://plugins.jenkins.io/build-blocker-plugin/
I haven't used it myself but it supports blocking builds completely if a build is already running.

Process leaked file descriptors error when triggering builds

I get the following error when running a Jenkins task:
Process leaked file descriptors
After reading this Stack Overflow post, I think I understand the problem. I have a build that triggers many tasks; for example, it can trigger 4 builds. However, sometimes when one dies, the others die too. I guess it's because of the link that exists between the parent and child builds, as explained on the Jenkins site.
The solution there, however, relies on the command line, and I don't use a command line to trigger my builds.
Is there a way to get a similar solution when triggering a build with the "Trigger/call builds on other projects" action?

Can Jenkins (continuous build) pinpoint the commit that caused a build failure?

Jenkins says a build succeeded or failed, but can it identify the exact commit (and author!) that caused a build to fail?
This issue would seem to indicate no.
Edit: From my exchange with Pace:
What I see is "include culprits", which is everyone since the last
build. I don't want that. I want THE culprit, with Jenkins doing the
binary search. If Jenkins does two builds 10 commits apart, I don't
want 10 possible culprits, I want it to find the one.
I haven't yet heard how to do that.
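The binary search asked for here is the same idea that git bisect applies to a commit range. As a minimal sketch only: the Python below assumes the commits are ordered oldest to newest, that the newest one is known to be bad, and that once a commit breaks the build every later commit is broken too. The function find_first_bad and the is_good callback (which stands in for "check out this commit and run the build") are hypothetical names, not part of Jenkins.

```python
from typing import Callable, Sequence


def find_first_bad(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first commit for which is_good() is False.

    Assumes commits are ordered oldest to newest, the last one is bad,
    and every commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid + 1  # the culprit is somewhere after mid
        else:
            hi = mid      # mid is bad, so the culprit is mid or earlier
    return commits[lo]


# Hypothetical example: ten commits since the last good build; c7 broke it.
commits = [f"c{i}" for i in range(1, 11)]
culprit = find_first_bad(commits, lambda c: int(c[1:]) < 7)
print(culprit)  # c7
```

With 10 candidate commits this needs about four builds rather than ten, but it only gives a meaningful answer when a single commit introduced the failure.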
That page was talking about the "find bugs" plugin, not the normal build cycle. Depending on how things are set up, Jenkins can identify the exact commit and author that caused a failure. If Jenkins has the appropriate source control plugins installed and is configured to know about the repository the build is tied to, then for every build it will list the changes since the last build.
In addition, Jenkins has the capability in many of its reporting plugins to blame the faulty committer. It can, for example, send an e-mail notification on a failed build to the developer that made the faulty commit.
However, many setups make it difficult for Jenkins to know. For example, if Jenkins is configured for daily builds then there are likely many commits which could have caused the issue. It's also possible that Jenkins isn't configured to know about the source control repository, or there is no source control repository. All of these issues could cause Jenkins to be unable to identify the build breaker.
Specifically for e-mailing faulty committers you can use the email-ext plugin which has options to send e-mails to everyone that committed since the last successful build.
For a humorous take on this subject check out this approach.
I think what you're asking for is impossible in some cases. Determining who the culprit is requires insight into conflict resolution that only a human can provide. Even then, sometimes a manager has to be involved in order to arbitrate. Say, for instance, you get 3 commits (A, B, C) that depend on a preexisting function definition. However, another commit (D) modifies the behavior of that function. Which do you revert? Perhaps it's the business plan to keep A, B, C as is and return D to its original state. The opposite, modifying A, B, C to adapt to the changes of D, is also possible.
In the cases where a machine can handle the arbitration, it is the responsibility of unit tests, and static analyzers, to determine the culprit (although still imperfect). Static analyzers sometimes have built in features that email the person who committed a violation. Unit tests can be written that notify teams or team members responsible for a failed test. Both could work in the same way that identifies who was the last committer on a particular line that failed. Still, if it is a problem with linking, then perhaps some members should be associated with the particular makefile.

Jenkins conditional project

The projects concerned in my linked solution are the initialise database, import database and export database.
If the initialisation succeeds then 'export' should be called. If it fails then 'import' should be called.
      dbinit
      /    \
 export    import
Logically this is simple enough; however, due to my lack of Jenkins experience, it's causing considerable grief.
I've looked at the following plugins:
Conditional BuildStep - this basically adds an 'if' statement to the build. I investigated this with the idea that the export/import projects could be combined into one project, using the condition to decide which course of action to take. This could work if I were able to check the condition of the upstream build (success or failure).
Post Build Task - executes a shell script based on the log output. This would go in the dbinit project. The problem with this is that I would like the import/export jobs to be separated from dbinit. This would work IF I could call another job from the shell.
Parameterized Trigger - This could be perfect. This would basically solve the problem by deciding which job to run based on the status of that build. However, at the time of writing, this plugin does not perform correctly with Jenkins version 1.481 or above. This problem was raised a month ago (see error link, dated the 12th Sep 2012) and has still not been fixed, therefore I am still looking for another solution.
Can anyone tell me how to overcome the identified problems with any of these plugins?
Or is there another route that I've overlooked?
Many Thanks,
Rory
If Jenkins 1.481 or later doesn't give you anything you need, and Parameterized Trigger works on earlier versions, then simply use 1.480 and wait until the problem gets fixed (it is sure to be fixed, since the plugin is so popular).
Would the Build Result Trigger help you?
With the BuildResultPlugin, you configure jobB to monitor jobA's build result. A build is scheduled if a new build result matches your criteria (unstable, failure, ...).
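Whichever plugin ends up doing the triggering, the branching described in the question is small. The following Python sketch only illustrates that logic and is not Jenkins configuration: run_db_chain and its three callables are hypothetical stand-ins for the dbinit, export, and import jobs.

```python
from typing import Callable


def run_db_chain(dbinit: Callable[[], bool],
                 export_db: Callable[[], None],
                 import_db: Callable[[], None]) -> str:
    """Run dbinit, then export on success or import on failure.

    Returns the name of the follow-up job that was run.
    """
    if dbinit():
        export_db()
        return "export"
    import_db()
    return "import"


# Hypothetical usage: initialisation fails, so the import job runs.
ran = run_db_chain(lambda: False,
                   lambda: print("exporting"),
                   lambda: print("importing"))
```

In Jenkins terms, this corresponds to two downstream jobs that each watch dbinit for a different result, rather than one job that branches internally.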

Resources