Team Build - Automatically Reenable Agent After Becoming Unreachable - tfs

We have a central Team Foundation Server (2008) deployment where all projects get stored. Each project sets up their own build server running Team Build to do their own automated builds.
Here's the problem. When a connection error is detected between TFS and the Team Build server, it moves the build agent's status to 'unreachable' which means it's not available for any subsequent builds. Our servers have scheduled reboot windows and when TFS can't communicate with those agents (or vice-versa) during that window, it moves the agent to 'unreachable'. Every morning we come in and find that we have to manually go in and reenable the agent.
Is it possible to have the team build agents come back online as soon as they're available again? Or perhaps write a script that brings them back online automatically?

In TFS2008, the AT (application tier) should ping the unreachable build agent at a regular interval (15-30 minutes, can't remember the exact value at the moment) to see if it is back up. Are you not seeing this behaviour - do yours stay unreachable?
That said, it is possible to write a bit of .NET code that you could run periodically to set the status of the build agent. Alternatively, you could run it as a scheduled task after start-up on the Windows machine that is running as your build agent, so that it talks to TFS and sets its status back to good.
To write the code, you want to use the TFS Build API (Microsoft.TeamFoundation.Build.Client). In particular you want to look at the IBuildAgent. Get the appropriate one from the IBuildServer, change the status and then call buildAgent.Save().

I've also seen that problem myself. Here's a PowerShell script that will iterate over all build agents on all Team Projects and enable them. Note that the agents are set to Enabled immediately, regardless of whether they are actually reachable (so if the build server is still down when the script runs, the agent will revert to Unreachable as soon as a build triggers).
$serverName = "TFSRTM08"
# Load the TFS client assemblies
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.TeamFoundation.Client")
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.TeamFoundation.WorkItemTracking.Client")
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.TeamFoundation.Build.Client")
$tfs = [Microsoft.TeamFoundation.Client.TeamFoundationServerFactory]::GetServer($serverName)
$wit = $tfs.GetService("Microsoft.TeamFoundation.WorkItemTracking.Client.WorkItemStore")
$bld = $tfs.GetService("Microsoft.TeamFoundation.Build.Client.IBuildServer")
$prjs = $wit.Projects
# Walk every Team Project and re-enable any agent that is not Enabled
foreach ($proj in $prjs)
{
    $agents = $bld.QueryBuildAgents($proj.Name)
    foreach ($agent in $agents)
    {
        if ($agent.Status -ne "Enabled")
        {
            # One interpolated string, so the message is written as a single line
            Write-Output "Enabling Build Agent: $($agent.Name) on Team Project: $($proj.Name) status was $($agent.Status)"
            $agent.Status = "Enabled"
            $agent.Save()
        }
    }
}

Related

Provide access to workspace files in log while Jenkins Build is running

We want to have a pipeline that builds our application, then pauses, and, after the built application has been manually tested, resumes and delivers the tested application.
So I came up with the idea of using an input step to pause the pipeline like this:
...
stage ("Build"){
    // build application here and archive it as artefact
}
timeout(time:5, unit:'DAYS') {
    input message:'Approve deployment?'
}
stage ("Deliver"){
    // deliver the built application
}
The tester gets 5 days to test the application, then resumes the pipeline, and the application gets delivered.
My problem here is that, while the build is still running, the tester can't yet access the artifact on the status page.
So is there any way to provide some kind of download link in the log output that points to the application file I archived in the build stage?
Or is there any other good way to achieve this build->pause->test->resume->deliver workflow in one single pipeline job?
Automation of the test in the pipeline is not an option, as the application needs to be manually flashed on some hardware.
This will get you to the artifacts list (you can add more after artifact if you want the link to point to the specific file):
...
timeout(time:5, unit:'DAYS') {
    echo "Archive available for download: ${env.BUILD_URL}artifact"
    input message:'Approve deployment?'
}
This requires the value of JENKINS_URL to be set in the system configuration: From your Jenkins home, click on Manage Jenkins --> Configure System and look for the Jenkins URL under Jenkins Location
If you don't have admin access to Jenkins, and JENKINS_URL is not set, you could fudge this with something like
https://known-jenkins-url/job/${JOB_NAME}/${BUILD_NUMBER}/artifact
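The fallback above can be sketched as a small shell helper. This is only an illustration: the function name artifact_url and the base URL are made up for the example, while JOB_NAME and BUILD_NUMBER are the variables Jenkins sets for a build.

```shell
#!/bin/sh
# Hypothetical helper: build the artifact link by hand when BUILD_URL is unusable.
# $1 = Jenkins base URL (assumed to be known), $2 = job name, $3 = build number
artifact_url() {
    base="${1%/}"   # drop one trailing slash, if present
    printf '%s/job/%s/%s/artifact\n' "$base" "$2" "$3"
}

# Example (inside a Jenkins shell step, where JOB_NAME and BUILD_NUMBER are set):
# artifact_url "https://known-jenkins-url" "$JOB_NAME" "$BUILD_NUMBER"
```

Jobs inside folders have a JOB_NAME containing slashes and need /job/ between each path segment; this sketch does not handle that case.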

Establish relationship between two Jenkins Jobs available on different Jenkins server

I am setting up a Jenkins job for Test / QA automation scripts; let's call it TEST_JOB. For the application, I have a Jenkins build of the application source code; call it DEV_JOB.
My scenario is: when DEV_JOB completes successfully, execute TEST_JOB immediately. I am aware of the project upstream / downstream setting [ Build after other projects are built ] for this kind of task. But the problem here is that DEV_JOB is on a different server than TEST_JOB, so TEST_JOB fails to recognize DEV_JOB.
How would I achieve this scenario?
You can use the Jenkins API to trigger the job remotely.
Say DEV_JOB runs on JENKINS_1: add a penultimate step (or an upstream/downstream project containing only this step) which invokes TEST_JOB through the remote API of the JENKINS_2 server.
An example command would be
curl --user "username:password" "http://JENKINS_2/job/TEST_JOB/buildWithParameters?SOMEPARAMETER=$SOMEPARAMETER"
username:password is a valid user on JENKINS_2.
Avoid using your own account here but rather a 'build trigger' account that only has permissions to start those jobs.
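As a rough sketch of how that remote trigger might look as a reusable step: the function names and the TRIGGER_USER / TRIGGER_TOKEN variables below are assumptions for illustration (credentials of the restricted 'build trigger' account), not something Jenkins defines. The request is sent as a POST, which Jenkins expects for build triggers on current versions.

```shell
#!/bin/sh
# Sketch only: TRIGGER_USER / TRIGGER_TOKEN are hypothetical variable names for
# the credentials of the restricted 'build trigger' account on JENKINS_2.
trigger_url() {
    # $1 = base URL of the remote Jenkins, $2 = job name, $3 = parameter value
    printf '%s/job/%s/buildWithParameters?SOMEPARAMETER=%s\n' "${1%/}" "$2" "$3"
}

trigger_test_job() {
    # --fail makes curl exit non-zero on an HTTP error, so the calling job fails too
    curl --fail --silent --show-error -X POST \
        --user "${TRIGGER_USER}:${TRIGGER_TOKEN}" \
        "$(trigger_url "$1" "$2" "$3")"
}

# Example call from the triggering step of DEV_JOB:
# trigger_test_job "http://JENKINS_2" "TEST_JOB" "$SOMEPARAMETER"
```

The parameter value is inserted as-is, so values containing spaces or special characters would need URL encoding first.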

Jenkins - make agents wait for other agent to finish

I'm new to Jenkins and I'm trying to set up a project which will use a few build executors.
The flow shall be as follows:
two build executors with the webservice label return their IP addresses and wait for the third build executor to finish its job
a third build executor with the tester label collects those IP addresses and performs some long-running job (e.g. sends HTTP requests to the webservices deployed on those two agents)
How to achieve such behavior in Jenkins?
I've found that when a build executor finishes its job it is immediately released, and I don't know how to make it wait for other build executors to finish their jobs.
Edit:
I forgot to mention that I want the build executors with the webservice label to be reserved (not available for other jobs) until the build executor with the tester label finishes its long-running job.
Also, all these build executors should be on separate slaves. That means each slave has only one build executor.
I've finally managed to do this using Pipeline and the script below:
node('webservice') {
    def firstHostname = getHostname()
    node('webservice') {
        def secondHostname = getHostname()
        node('tester') {
            println 'Running tests against ' + firstHostname + ' and ' + secondHostname
            // ...
        }
    }
}
def getHostname() {
    sh 'hostname > output'
    readFile('output').trim()
}
It acquires two build executors with the webservice label. I get their hostnames (I'm using them instead of the IP addresses) and pass them to the build executor with the tester label. Finally, the tester runs some long-running tests.
Those two webservice build executors are blocked till the tester finishes its job, and no other project may use them during that time.
As Alex O mentioned, you can configure an upstream/downstream relationship between the projects in Jenkins. There is an option for that: "Build Triggers" -> "Build after other projects are built"
or use plugin to achieve it
https://wiki.jenkins-ci.org/display/JENKINS/Parameterized+Trigger+Plugin
or
https://wiki.jenkins-ci.org/display/JENKINS/Join+Plugin
What you actually want is probably that your job uses three slaves at the same time.
Re-thinking the setup in that way, it won't be necessary to consider the collection of IPs and the subsequent usage of the slaves as three different steps that must be aligned in some way.
Unfortunately, Jenkins does not support using multiple slaves for one build out of the box, but it is possible to achieve what you want, e.g. using the Multijob plugin and the Join plugin that Aaron already mentioned.
See also this question for information on how to use two slaves at the same time.

How do I cancel another build when a new build starts in Jenkins?

I want to cancel a build for A when a new build for B starts. How do I do that using Jenkins?
Is there a plugin that allows me to do that?
At the moment there is no automated solution for stopping builds that are currently running. However, it's not a bad idea to let the build finish; if it fails, you will at least know why and where the error is. Once that build has finished, the newest build in the queue will be built.
However, there is a manual way of stopping a build that is in progress. Simply open the project and click on the red cross next to the running build.
You can do it using HTTP request in a shell step, e.g.:
curl -X POST "${JENKINS_URL}job/A/lastBuild/stop"
or using Groovy system build step, e.g.:
job = Jenkins.instance.getItem('A')
for (build in job.builds) {
    if (build.isBuilding()) {
        build.doStop()
    }
}

How to stop an unstoppable zombie job on Jenkins without restarting the server?

Our Jenkins server has a job that has been running for three days, but is not doing anything. Clicking the little X in the corner does nothing, and the console output log doesn't show anything either. I've checked on our build servers and the job doesn't actually seem to be running at all.
Is there a way to tell jenkins that the job is "done", by editing some file or lock or something? Since we have a lot of jobs we don't really want to restart the server.
I had the same problem and fixed it via the Jenkins Script Console.
Go to "Manage Jenkins" > "Script Console" and run a script:
Jenkins.instance.getItemByFullName("JobName")
    .getBuildByNumber(JobNumber)
    .finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborting build"));
You just have to specify your JobName and JobNumber.
Go to "Manage Jenkins" > "Script Console" to run a script on your server to interrupt the hanging thread.
You can get all the live threads with Thread.getAllStackTraces() and interrupt the one that's hanging.
Thread.getAllStackTraces().keySet().each { t ->
    if (t.getName() == "YOUR THREAD NAME") { t.interrupt() }
}
UPDATE:
The above solution using threads may not work on more recent Jenkins versions. To interrupt frozen pipelines refer to this solution (by alexandru-bantiuc) instead and run:
Jenkins.instance.getItemByFullName("JobName")
    .getBuildByNumber(JobNumber)
    .finish(
        hudson.model.Result.ABORTED,
        new java.io.IOException("Aborting build")
    );
If you have a Multibranch Pipeline job (and you are a Jenkins admin), run this script in the Jenkins Script Console:
Jenkins.instance
    .getItemByFullName("<JOB NAME>")
    .getBranch("<BRANCH NAME>")
    .getBuildByNumber(<BUILD NUMBER>)
    .finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborting build"));
From https://issues.jenkins-ci.org/browse/JENKINS-43020
If you aren't sure what the full name (path) of the job is, you may use the following snippet to list the full name of all items:
Jenkins.instance.getAllItems(AbstractItem.class).each {
    println(it.fullName)
};
From https://support.cloudbees.com/hc/en-us/articles/226941767-Groovy-to-list-all-jobs
Without having to use the script console or additional plugins, you can simply abort a build by entering /stop, /term, or /kill after the build URL in your browser.
Quoting verbatim from the above link:
Pipeline jobs can be stopped by sending an HTTP POST request to URL endpoints of a build.
<BUILD ID URL>/stop - aborts a Pipeline.
<BUILD ID URL>/term - forcibly terminates a build (should only be used if stop does not work).
<BUILD ID URL>/kill - hard kill a pipeline. This is the most destructive way to stop a pipeline and should only be used as a last resort.
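The escalation order described in the quote could be scripted roughly as follows. This is a sketch under assumptions: the function names and the JENKINS_USER / JENKINS_TOKEN variables are invented for the example, and depending on how your Jenkins handles CSRF protection an additional crumb header may be required.

```shell
#!/bin/sh
# Sketch only: JENKINS_USER / JENKINS_TOKEN are hypothetical names for credentials.
# $1 = build URL, e.g. https://my-jenkins/job/A/12/
# Prints the three endpoints in escalation order, least destructive first.
stop_endpoints() {
    base="${1%/}"
    for action in stop term kill; do
        printf '%s/%s\n' "$base" "$action"
    done
}

# POST to a single endpoint; move on to the next one only if the build keeps running.
post_endpoint() {
    curl --fail --silent --show-error -X POST \
        --user "${JENKINS_USER}:${JENKINS_TOKEN}" "$1"
}

# Example: try the gentlest endpoint first
# post_endpoint "$(stop_endpoints "https://my-jenkins/job/A/12/" | head -n 1)"
```

Keeping the URL construction separate from the request makes it easy to check which endpoint is about to be hit before sending anything destructive.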
The first proposed solution is pretty close. If you use stop() instead of interrupt() it even kills runaway threads, that run endlessly in a groovy system script. This will kill any build, that runs for a job.
Here is the code:
Thread.getAllStackTraces().keySet().each() {
    if (it.name.contains('YOUR JOBNAME')) {
        println "Stopping $it.name"
        it.stop()
    }
}
I once encountered a build which could not be stopped from the "Script Console". I finally solved the problem with these steps:
ssh onto the jenkins server
cd to .jenkins/jobs/<job-name>/builds/
rm -rf <build-number>
restart jenkins
I use the Monitoring Plugin for this task. After the installation of the plugin
Go to Manage Jenkins > Monitoring of Hudson/Jenkins master
Expand the Details of Threads, the small blue link on the right side
Search for the Job Name that is hung
The Thread's name will start like this
Executor #2 for master : executing <your-job-name> #<build-number>
Click the red, round button at the far right of the table row for your job
If you have an unstoppable Pipeline job, try the following:
Abort the job by clicking the red X next to the build progress bar
Click on "Pause/resume" on the build to pause
Click on "Pause/resume" again to resume the build
Jenkins will realize that the job should be terminated and stops the build
I guess it is too late to answer, but this might help some people.
Install the monitoring plugin. (http://wiki.jenkins-ci.org/display/JENKINS/Monitoring)
Go to jenkinsUrl/monitoring/nodes
Go to the Threads section at the bottom
Click on the details button on the left of the master
Sort by User time (ms)
Then look at the name of the thread, you will have the name and number of the build
Kill it
I don't have enough reputation to post images sorry.
Hope it can help
The top answer almost worked for me, but I had one major problem: I had a very large number (~100) of zombie jobs due to a particularly poorly-timed Jenkins restart, so manually finding the job name and build number of each and every zombie job and then manually killing them was infeasible. Here's how I automatically found and killed the zombie jobs:
Jenkins.instance.getItemByFullName(multibranchPipelineProjectName).getItems().each { repository->
    repository.getItems().each { branch->
        branch.builds.each { build->
            if (build.getResult().equals(null)) {
                build.doKill()
            }
        }
    }
}
This script loops over all builds of all jobs and uses getResult().equals(null) to determine whether or not the job has finished. A build that's in the queue but not yet started will not be iterated over (since that build won't be in job.builds), and a build that's finished already will return something other than null for build.getResult(). A legitimately running job will also have a build result of null, so make sure you have no running jobs that you don't want to kill before running this.
The multiple nested loops are mainly necessary to discover every branch/PR for every repository in a Multibranch Pipeline project; if you're not using Multibranch Pipelines you can just loop over all your jobs directly with something like Jenkins.instance.getItems().each.
The Build-timeout Plugin can come in handy for such cases. It will kill the job automatically if it takes too long.
I've looked at the Jenkins source and it appears that what I'm trying to do is impossible, because stopping a job appears to be done via a Thread interrupt. I have no idea why the job is hanging though..
Edit:
Possible reasons for unstoppable jobs:
if Jenkins is stuck in an infinite loop, it can never be aborted.
if Jenkins is doing a network or file I/O within the Java VM (such as lengthy file copy or SVN update), it cannot be aborted.
Alexandru Bantiuc's answer worked well for me to stop the build, but my executors were still showing up as busy. I was able to clear the busy executor status using the following:
server_name_pattern = /your-servers-[1-5]/
jenkins.model.Jenkins.instance.getComputers().each { computer ->
    if (computer.getName().find(server_name_pattern)) {
        println computer.getName()
        execList = computer.getExecutors()
        for( exec in execList ) {
            busyState = exec.isBusy() ? ' busy' : ' idle'
            println '--' + exec.getDisplayName() + busyState
            if (exec.isBusy()) {
                exec.interrupt()
            }
        }
    }
}
Recently I came across a node/agent which had one executor occupied for days by a build "X" of a pipeline job, although that job's page claimed build "X" did not exist anymore (discarded after 10 subsequent builds (!), as configured in the pipeline job). I verified that on disk: build "X" was really gone.
The solution: it was the agent/node which wrongly reported that the occupied executor was busy running build "X". Interrupting that executor's thread immediately released it.
def executor = Jenkins.instance.getNode('NODENAME').computer.executors.find {
    it.isBusy() && it.name.contains('JOBNAME')
}
println executor?.name
if (executor?.isBusy()) executor.interrupt()
Other answers considered:
The answer from #cheffe: did not work (see next point, and update below).
The answers with Thread.getAllStackTraces(): no matching thread.
The answer from #levente-holló and all answers with getBuildByNumber(): did not apply as the build wasn't really there anymore!
The answer from #austinfromboston: that came close to my needs, but it would also have nuked any other builds running at the moment.
Update:
I experienced a similar situation again, where an executor was occupied for days by a (still existing) finished pipeline build. This code snippet was the only working solution.
Had this same issue, but there was no matching thread in the stack traces. We deleted the build by using this snippet in the Jenkins Script Console. Replace jobname and buildnum with yours.
def jobname = "Main/FolderName/BuildDefinition"
def buildnum = 6
Jenkins.instance.getItemByFullName(jobname).getBuildByNumber(buildnum).delete();
This works for me every time:
Thread.getAllStackTraces().keySet().each() {
    if (it.name.contains('YOUR JOBNAME')) {
        println "Stopping $it.name"
        it.stop()
    }
}
Thanks to funql.org
I usually use jenkins-cli in such cases. You can download the jar from the page http://your-jenkins-host:PORT/cli . Then run
java -jar jenkins-cli.jar delete-builds name_of_job_to_delete hanging_job_number
Auxiliary info:
You may also pass a range of builds like 350:400.
General help available by running
java -jar jenkins-cli.jar help
Context command help for delete-builds by
java -jar jenkins-cli.jar delete-builds
I had the same issue within the last half hour...
I was not able to delete a zombie build running in my multibranch pipeline.
Even server restarts, from the UI or from the command line via sudo service jenkins restart, did not unblock the execution... The build was not stoppable... It always reappeared.
Used Version: Jenkins ver 2.150.2
I was very annoyed, but... when looking into the log of the build I found something interesting at the end of the log:
The red marked parts are the "frustrating parts"...
As you can see I always wanted to Abort the build from UI but it did not work...
But there is a hyperlink with the text "Click here to forcibly terminate running steps" (first green one).
I pressed that link.
After that, a message about "Still paused" appeared with another link, "Click here to forcibly kill entire build" (second green one).
After pressing this link as well, the build was finally hard killed.
So this seems to work without any special plugins (except the multibranch-pipeline build plugin itself).
VERY SIMPLE SOLUTION
The reason I was seeing this issue was that the link on the page that should stop the job used http instead of https. All you need to do is edit the onclick attribute in the HTML page, as follows:
Open the console log of the job (pipeline) that hung
Click whatever is available to kill the job (x icon, "Click here to forcibly terminate running steps" etc) to get "Click here to forcibly kill entire build" link visible (it's NOT gonna be clickable at the moment)
Open the browser's console (use any one of three for chrome: F12; ctrl + shift + i; menu->more tools->developer tools)
Locate "Click here to forcibly kill entire build" link manually or using "select an element in the page" button of the console
Double click on onclick attribute to edit its value
Append s to http to have https
Press enter to submit the changes
Click "Click here to forcibly kill entire build" link
Use screenshot for reference
I had many zombie jobs, so I used the following script:
for (int x = 1000; x < 1813; x = x + 1) {
    // ?. skips build numbers that no longer exist instead of throwing
    Jenkins.instance.getItemByFullName("JOBNAME/BRANCH")
        .getBuildByNumber(x)
        ?.finish(hudson.model.Result.ABORTED, new java.io.IOException("Aborting build"))
}
Using the Script console at https://my-jenkins/script
import hudson.model.Job
import org.jenkinsci.plugins.workflow.job.WorkflowRun
Collection<Job> jobs = Jenkins.instance.getItem('My-Folder').getAllJobs()
for (int i = 0; i < jobs.size(); i++) {
    def job = jobs[i]
    for (int j = 0; j < job.builds.size(); j++) {
        WorkflowRun build = job.builds[j]
        if (build.isBuilding()) {
            println("Stopping $job ${build.number}")
            build.setResult(Result.FAILURE)
        }
    }
}
I have had the same problem happen to me twice now; the only fix so far has been to restart the Tomcat server and restart the build.
A utility I wrote called jkillthread can be used to stop any thread in any Java process, so long as you can log in to the machine running the service under the same account.
None of these solutions worked for me. I had to reboot the machine the server was installed on. The unkillable job is now gone.
If the "X" button is not working and the job is stuck, then just delete the specific build number. It will free up the executor.
In my case, even though the job was completed, it was still stuck in the executor for hours. Deleting the build worked for me.
You can just copy the job and delete the old one, if it doesn't matter that you lose the old build logs.
Here is how I fixed this issue in version 2.100 with Blue Ocean
The only plugins I have installed are for bitbucket.
I only have a single node.
ssh into my Jenkins box
cd ~/.jenkins (where I keep jenkins)
cd job/<job_name>/branches/<problem_branch_name>/builds
rm -rf <build_number>
After this, you can optionally change the number in nextBuildNumber (I did this)
Finally, I restarted jenkins (brew services restart jenkins) This step will obviously be different depending how you manage and install Jenkins.
Enter the blue-ocean UI.
Try to stop the job from there.

Resources