Jenkins says a build succeeded or failed, but can it identify the exact commit (and author!) that caused a build to fail?
This issue would seem to indicate no.
Edit: From my exchange with Pace:
What I see is "include culprits", which is everyone since the last
build. I don't want that. I want THE culprit, with Jenkins doing the
binary search. If Jenkins does two builds 10 commits apart, I don't
want 10 possible culprits, I want it to find the one.
I haven't yet heard how to do that.
That page was talking about the "find bugs" plugin, not the normal build cycle. Depending on how things are setup Jenkins can identify the exact commit and author that caused a failure. If Jenkins has the appropriate source control plugins installed and is configured to know about the repository the build is tied to then for every build it will list the changes since the last build.
In addition, Jenkins has the capability in many of its reporting plugins to blame the faulty committer. It can, for example, send an e-mail notification on a failed build to the developer that made the faulty commit.
However, many setups make it difficult for Jenkins to know. For example, if Jenkins is configured for daily builds then there are likely many commits which could have caused the issue. It's also possible that Jenkins isn't configured to know about the source control repository, or there is no source control repository. All of these issues could cause Jenkins to be unable to identify the build breaker.
Specifically for e-mailing faulty committers you can use the email-ext plugin which has options to send e-mails to everyone that committed since the last successful build.
For a humorous take on this subject check out this approach.
I think what you're asking for is impossible in some cases. Determining who the culprit is requires insight into conflict resolution that only a human can decide. Even still, sometimes a manager has to be involved in order to arbitrate. Say for instance you get 3 commits (A,B,C) that depend on a preexisting definition. However, another commit (D) modifies the behavior of that function. Which do you revert? Perhaps it's the business plan to keep A,B,C as is and return D to its original state. The opposite, modifying A,B,C to adapt to the changes of D, is also possible.
In the cases where a machine can handle the arbitration, it is the responsibility of unit tests, and static analyzers, to determine the culprit (although still imperfect). Static analyzers sometimes have built in features that email the person who committed a violation. Unit tests can be written that notify teams or team members responsible for a failed test. Both could work in the same way that identifies who was the last committer on a particular line that failed. Still, if it is a problem with linking, then perhaps some members should be associated with the particular makefile.
Related
I am trying to improve our continuous integration process using Jenkins and our source control system (currently svn, but git soon).
Maybe I am thinking about this overly complicated, or maybe I have not yet seen the right hints.
The process I envisioned has three steps and associated roles:
one or more developers would do their job and ultimately submit the code changes for the actual software ("main software") as well as unit tests into source control (git, or something else). Jenkins shall build the software, run unit tests and perhaps some other steps (e.g. static code analysis). If none of this fails, the work of the developers is done. As part of the build, the build number is baked into the main software itself as part of the version number.
one or more test engineers will subsequently pickup the build and perform tests. Some of them may be manual, most of them are desired to be automated/scripted tests. These shall ultimately be submitted into source control as well and be executed through the build server. However, this shall not trigger a new build of the main software (since there is nothing changed). If none of this fails, the test engineers are done. Note that our automated tests currently take several hours to complete.
As a last step, a project manager authorizes release of the software, which executes whatever delivery/deployment steps are needed. Also, the source of the main software, unit tests, and automated test scripts, the jenkins build script - and ideally all build artifacts ("binaries") - are archived (tagged) in the source control system.
Ideally, developers are able to also manually trigger execution of the automated tests to "preview" the outcome of their build.
I have been unable to figure out how to do this with Jenkins and Git - or any other source control system.
Jenkin's pipelines seem to assume that all steps are carried out in sequence automatically. It also seems to assume that committing code into source control starts at the beginning (which I believe is not true if the commit was "merely" automated test scripts). Triggering an unnecessary build of the main software really hurts our process, as it basically invalidates and manual testing and documentation, as it results in a new build number baked into the software.
If my approach is so uncommon, please direct me how to do this correctly. Otherwise I would appreciate pointers how to get this done (conceptually).
I will try to reply with some points. This is indeed conceptually approach as there are a lot of details and different approaches too, this is only one.
You need git :)
You need to setup a git branching strategy which will allow to have multiple developers to work simultaneously, pushing code and validating it agains the static code analysis. I would suggest that you start with Git Flow, it is widely used and can be adapted to whatever reality you do have - you do not need to use it in its pure state, so give some thought how to adapt it. Fundamentally, it will allow for each feature to be tested. Then, each developer can merge it on the develop branch - from this point on, you have your features validated and you can start to deploy and test.
Have a look at multibranch pipelines. This will allow you to test the several feature branches that you might have and to have different flows for the other branches - develop, release and master - depending on your deployment needs. So, when you have a merge on develop branch, you can trigger testing or just use it to run static code analysis.
To overcome the problem that you mention on your second point, there are ways to read your change sets on the pipeline, and in case the changes are only taken on testing scripts, you should not build your SW - check here how to read changes, and here an example of how to read changes and also to prevent your pipeline to build all the stages according to the changes being pushed to git.
In case you still have manual testing taking place, pipelines are pausable which means that you can pause the pipeline asking for approval to proceed. Before approving, testers should do whatever they have to, and whenever they are ready to proceed, just approve the build to proceed for the next steps.
Regarding deployments authorization, it is done the same way that I mention on the last point, with the approvals, but in this case, you can specify which users/roles are allowed to approve that step.
Whatever you need to keep from your builds, Jenkins has an archive artifacts utility. Let me just note that ideally you would look into a proper artefact repository such as Nexus.
To trigger manually a set of tests... You can have a manually triggered job on Jenkins apart from your CI/CD pipeline, that will only execute the automated tests. You can even trigger this same job as one pipeline stage - how to trigger another jobs
Lastly let me say that the branching strategy is the starting point.
Think on your big picture, what SDLC flows you need to have and setup those flows on your multibranch pipeline. Different git branches will facilitate whatever flows you need within the same Jenkinsfile - your pipeline definition. It really depends on how many environments you have to deploy to and what kind of steps you need.
The most aggressive build retention policy one can set for pull request builds is described in "Clean up pull request builds"
a policy that keeps a minimum of 0 builds
Still, it means that successful PR builds (with artifacts no one will ever need) will be deleted only after the next automatic retention cleanup - usually the next day, but in reality it results in nearly two days worth of no longer needed builds.
In our particular case it seems to be desirable to find a way to clean successful PR builds ASAP due to their frequency and artifact's sheer size that may periodically strain our not yet fully organized infrastructure dedicated to PR handling (it will be significantly improved, but not as soon as we'd like to, and those successful PR builds would still remain no less of a dead weight).
And as far as I see the only way to do it would be to delete builds manually.
While it is not too difficult to implement, I'd still like to check whether there is a simpler standard way to delete successful PR builds automatically.
P.S.: There is one particularity in our heavily customized build process - we have multiple dependent artifacts. Like create A, use it to build B, create C to test B... So trying not to Publish artifacts on overall successful build with custom condition like it is suggested below is not exactly feasible.
Let's look at the problem from a different perspective: The problem isn't that builds are retained, the problem is that your PR builds are publishing artifacts.
You can make the Publish Artifacts steps conditional so that they don't run during PRs. Something like and(succeeded(), ne(variables['Build.Reason'], 'PullRequest')) will make the task only run if it's not a PR.
I don't understand what Promoted build really is and how it works. Can someone please explain to me like to a 10 years old kid. If you can provide some sample examples would help me a lot.
Thanks
In a typical software developing organization with CI system, there are 10's or 100's of continuous builds daily. Only one of those builds (usually the latest stable) is selected and "promoted" to be a Release Candidate (RC), which goes to the next quality gate - usually the QA department. Then, they select one of those RC's (others are dropped) and again, "promote" it to the next level - either to staging environment, validation etc. Then, finally... one of these builds is again "promoted" to be an official release.
Why is that important?
Visibility: You would want to distinguish many "regular", continuous builds from few, selected "RC" builds.
Retention: If you commit often (which is the best practice), you will likely get lot of daily builds, and would like to implement a retention policy (e.g. only keep last 100 builds or only builds from the last 7 days). You will then want to make sure promoted builds (RCs) are locked against retention. This is mostly important if you deploy binaries to customers, and may need the exact binary to reproduce an escaping bug in the future (though you still have the source code in the repository, I've seen cases where escaping bugs relate to the build process rather than the source code - due to rapid changes to the build process, or time-of-build sensitive data like digital signatures).
Permissions: you may want to prevent access to builds with "half baked" features from non-developers.
Binary Repositories: you may want to publish only meaningful builds to an external binary repository.
Builds in Jenkins can be "promoted" either manually or automatically, using plugins like Promoted Builds Plugin. You can also create your entire "promotion" workflow using pipeline scripts. Here's an example:
a "Continuous" job that polls SCM and builds on every change. It has a retention policy to keep only the last 50 builds. Access is restricted only to developers;
a "Release Candidates" job that copies artifacts from a manually selected build (using parameters). Access is allowed to QA testers;
a "Releases" jobs that copies artifacts from a manually selected RC. Access is allowed to the entire organization. Binaries are released to external/public repository.
I hope this answers your question :-)
We set up TFS 2013 recently and tried to set up gated check-in. In our experiment, it correctly failed the build and rejected the bad check-in. However, both the build notification tray icon and the build tab on the TFS web access show the failure, and it is this way for all users. This will make everyone think "the" build is broken when it's just one person's "gate." It will skew metrics, too.
A) Why would this be the default behavior? It seems very counter-productive and counter-intuitive. [Or maybe this isn't the default behavior and our setup is hosed?]
B) Is there a configuration for the build tab where a rejected gated check-in/build is visible only to the person who broke it?
C) How can we make the build notifier tool ignore gated failures?
a) This is a public build. Why should it be failing. If the gated build is failing it shows a lack of care by the developers. Are they checking in any old thing? Are they running their unit tests first? It really sounds like you have a quality issue there.
That is why the gated-builds are still public builds.
b) no - Anyone can however send a Private build to the build server if they think that there may be an issue that they can't catch locally..
c) Each user can uncheck the monitoring of the gated build. I would however suggest that they should concentrate on not failing the build however rather than sweeping it under the carpet...
A filed build, even a gated one, should be the exception and not the rule...
MrHinsh answered my question from a largely philosophical standpoint, and that led me to this article:
[http://adamstephensen.com/2012/11/01/gated-checkins-mask-dysfunction/]
I agree in principle with the article. However, because we are attacking agile processes in baby steps, we still want this extra safety check for the time being.
Below isn't the exact solution we wanted, but it is a hybrid of the approach in the article and my original question. It doesn't force gated check-ins, but it allows them to be done, giving risky check-ins a safety net. Code quality beyond build-ability will be addressed with other mechanisms outside the scope of this question.
The compromise solution is that "vanilla" changes would be checked in normally (after local testing, of course), and then the CI build would kick in. Fixing broken builds would be high priority to be addressed immediately.
But anything out of the ordinary (configuration changes, referencing new/different external dependencies, or just anything a developer is unsure of) would use a private gated check-in, but not on the main build, because we don't want everyone to be tricked into thinking the build is broken upon rejections. This build definition is a clone of the main build definition, but with the addition that upon successful check-in, it chains a build of the main build definition using this:
[How to chain TFS builds?
While the main build is redundant in most cases (race conditions may apply), it solves the immediate need.
I am evaluating the Hudson build system for use as a centralized, "sterile" build environment for a large company with very distributed development (from both a geographical and managerial perspective). One goal is to ensure that builds are only a function of the contents of a source control tree and a build script (also part of that tree). This way, we can be certain that the code placed into a production environment actually originated from our source control system.
Hudson seems to provide an ant script with the full set of rights assigned to the user invoking the Hudson server itself. Because we want to allow individual development groups to modify their build scripts without administrator intervention, we would like a way to sandbox the build process to (1) limit the potential harm caused by an errant build script, and (2) avoid all the games one might play to insert malicious code into a build.
Here's what I think I want (at least for Ant, we aren't using Maven/Ivy right now):
The Ant build script only has access to its workspace directory
It can only read from the source tree (so that svn updates can be trusted and no other code is inserted).
It could perhaps be allowed read access to certain directories (Ant distribution, JDK, etc.) that are required for the build classpath.
I can think of three ways to implement this:
Write an ant wrapper that uses the Java security model to constrain access
Create a user for each build and assign the rights described above. Launch builds in this user space.
(Updated) Use Linux "Jails" to avoid the burden of creating a new user account for each build process. I know little about these though, but we will be running our builds on a Linux box with a recent RedHatEL distro.
Am I thinking about this problem correctly? What have other people done?
Update: This guy considered the chroot jail idea:
https://www.thebedells.org/blog/2008/02/29/l33t-iphone-c0d1ng-ski1lz
Update 2: Trust is an interesting word. Do we think that any developers might attempt anything malicious? Nope. However, I'd bet that, with 30 projects building over the course of a year with developer-updated build scripts, there will be several instances of (1) accidental clobbering of filesystem areas outside of the project workspace, and (2) build corruptions that take a lot of time to figure out. Do we trust all our developers to not mess up? Nope. I don't trust myself to that level, that's for sure.
With respect to malicious code insertion, the real goal is to be able to eliminate the possibility from consideration if someone thinks that such a thing might have happened.
Also, with controls in place, developers can modify their own build scripts and test them without fear of catastrophe. This will lead to more build "innovation" and higher levels of quality enforced by the build process (unit test execution, etc.)
This may not be something you can change, but if you can't trust the developers then you have a larger problem then what they can or can not do to your build machine.
You could go about this a different way, if you can't trust what is going to be run, you may need a dedicated person(s) to act as build master to verify not only changes to your SCM, but also execute the builds.
Then you have a clear path of responsibilty for builds to not be modified after the build and to only come from that build system.
Another option is to firewall off outbound requests from the build machine to only allow certain resources like your SCM server, and your other operational network resources like e-mail, os updates etc.
This would prevent people from making requests in Ant to off the build system for resources not in source control.
When using Hudson you can setup a Master/Slave configuration and then not allow builds to be performed on the Master. If you configure the Slaves to be in a virtual machine, that can be easily snapshotted and restored, then you don't have to worry about a person messing up the build environment. If you apply a firewall to these Slaves, then it should solve your isolation needs.
I suggest you have 1 Hudson master instance, which is an entry point for everyone to see/configure/build the projects. Then you can set up multiple Hudson slaves, which might very well be virtual machines or (not 100% sure if this is possible) simply unprivileged users on the same machine.
Once you have this set up, you can tie builds to specific nodes, which are not allowed - either by virtual machine boundaries or by Linux filesystem permissions - to modify other workspaces.
How many projects will Hudson be building? Perhaps one Hudson instance would be too big, given the security concerns you are expressing. Have you considered distributing the Hudson instances out - one per team. This avoids the permission issue entirely.