TFS 2017 - BuildAgent in Queue is ignored - tfs

I have two BuildAgents in my Default queue. Both checkboxes for active are set, both are online and running and also shown as online and running. When I start a build on that queue, it is only sent to one Agent, never to the other. If I stop this one Agent, I get the error message, that there is no agent available. But there is!
Has anyone an idea, whats going on here?
Specs: I have an On-premise-TFS-2017 (it was upgraded from 2012). Build Agents where installed the way, that TFS-2017 describes it on the interface.

You get the message means that the build agent Capabilities don't meet the conditions which set in build definition (Demands settings) or build requirements.
Please try below things to narrow down the issue:
Check the Demands in your build definition, make sure the demands
you added are existing on build agent Capabilities. If not existing
there, just try to Add Capability for the agent manually.
Somethings the agent cannot automatically identify some components as
the system Capabilities. In this case you can try comparing the
Capabilities between the two build agents to identify the
differents. Add the missed Capability for the failed agent manually.
Then try it again.
Deploy a new agent to check that.
Reference this article : TFS/VSTS Build – System Capabilities and Demands

Related

TFS Build Agent - Waiting for an agent to be requested

I am in the process of testing a TFS 2013 to TFS 2018 onprem upgrade. I have installed 2018.1 on a new system (and upgraded a copy of my TFS databases). I have installed a build agent on a new host which shows up under Agent Queues (as online and enabled).
I'm now trying to create a build. I set things up as I feel they should be and it sits at this screen:
Build
Waiting for an available agent
Console
Waiting for an agent to be requested
The VSTS Agent service is running on the build agent system. so I feel that is OK. I'm somewhat at a loss. Any assistance is appreciated.
Just try the below items to narrow down the issue:
Check the build definition requirements (Demands section) and the agent offering. Make sure it has the required capabilities installed on the agent machine.
When a build is queued, the system sends the job only to agents that have the capabilities demanded by the build definition.
Check if the service "Visual Studio Team Foundation Background Job Agent" is running on the TFS application tier server.
If it's not started, just start the service.
If the status is Running, just try to Restart the service.
Make sure the account that the agent is run under is in the "Agent Pool Service Account" role.
Try to change a domain account which is a member of the Build Agent Service Accounts group and belongs to "Agent Pool Service Account" role, to see whether the agent would work or not.
We have just spent five days trying to diagnose this issue and believe we have finally nailed the cause (and the solution!).
TL;DR version:
We're using TFS 2017 Update 3, YMMV. We believe the problem is a result of a badly configured old version of an Elastic Search component which is used by the Code Search extension. If you do not use the Code Search feature please disable or uninstall this extension and report back - we have seen huge improvements as a result.
Detailed explanation:
So what we discovered was that MS have repurposed an Elastic Search component to provide the code search facility within TFS - the service is installed when TFS is installed if you choose to include the search feature.
For those unfamiliar with Elastic, one particularly important aspect is that it uses a multi-node architecture, shifting load between nodes and balancing the workload across the cluster and herein lies the MS Code Search problem.
The Elastic Search component installed in TFS is (badly) configured to be single node, with a variety of features intentionally suppressed or disabled. With the high water-mark setting set to 85%, as soon as the search data reaches 85% of the available disk space on the data drive, the node stops creating new indexes and will only accept data to existing indexes.
In a normal Elastic cluster, this would cause another node to create a new index to accept the new data but, since MS have lobotomised the cluster down to one node, the fall-back... is the same node - rinse and repeat.
The behaviour we saw, looking at the communications between the build agent and the build controller, suggests that the Build Controller tries to communicate with Elastic and eventually fails. Over time, Elastic becomes more unresponsive and chokes this communication which manifests as the controller taking longer and longer to respond to build requests.
It is only because we actually use Elastic Search that we were able to interpret the behaviour and logs to come to this conclusion. Without that knowledge it would be almost impossible to determine the actual cause.
How to fix this?
There are a number of ways that you can fix this:
Don't install the TFS Search feature
If you don't want to use the Code Search feature, don't install it. The problem will not occur.
Remove the TFS Search feature [what we did]
If you don't use the Code Search feature, uninstall it. The problem will go away - you can either just disable the extension in all collections or you can use the server installer to fully remove it. I have detailed instructions from MS for anyone who wants to eradicate it completely, just ask.
Point the Search feature to a properly configured, real Elastic cluster
If you use Elastic properly, rather than stuffing it in a small box on its own, the problem will not occur.
Ensure the search data disk never hits the 85% water-mark
Elastic will continue to function "properly" and should return search results as expected, within the limited parameters.
Hope this helps someone out there avoid the pain we suffered.
The TF Background Job Agent wasn't running on the application tier, because that account didn't have 'log on as a service'.
I was facing the same issue and in my case, it was resolved by restarting the TFS server (TFS is hosted locally in our office).

TFS 2015 and later - Tagging agents

Does anyone know if the feature to tag agents has dissapeared?
I could not find anything related to over the internet.
My idea is to have certain builds use a specific agent. On TFS 2013 I would use tagging, but i no longer see that option.
On the other hand, I see that it is possible to connect a build definition to a certain agent queue.
There is no more agent tags for TFS 2015 or later version. If you want to use a particular build definition and a specific build Agent which used to run the build.
You could add a user Capability to that specific build agent then in the build definition you put that capability as a demand (General tab).
Another way is directly using Agent.Name or Agent.ComputerName demands in build def or when queuing a build. Take a look at this blog: How to send TFS build to a specific agent or server, which also support on TFS2015.
Oren: Is this feature works in TFS15 SP3?
Reply Eric Parvin: Yes, this should work on TFS 2015 to the newest version.
Use demands and capabilities for this. Add a custom capability to the build agent, and then add a matching demand to the build definition.
Demands and capabilities would work, and you could also create a specific queue, with a single available agent in it, and set the queue to your desired build, so you would achieve the same behavior as desired one.

TFS, Jenkins and how to update work items with build numbers

We are using TFS and the TFS Build Service. We are considering to migrate the Build service to Jenkins but we came across some issues. According to this site, there are some things that do not work very well with the TFS and Jenkins plugins. All of them we use a lot:
Associated Change sets – Team Build automatically associates a list of change sets that are included in the build
Associated Work Items – Team Build analysis the relationships and also associates Work Items with a build. Indeed it walks the work item tree (parent) and maintains that association in the chain.
Is this still true? We have this scenario:
A developer checks in a code that fix a bug or resolve a User Story. It does that by associating his check in with the work item ID.
His check in triggers a build that will associate the work item with his changeset. For bugs, the build will update the "Integrated in Build" field with the build number. We use this field to know in witch version the bug was fixed.
Is there any way to make Jenkins behave and do what TFS build service does?
Another option is to mix the two using dummy builds on the TFS side that set the records straight and kick-off the Jenkins' builds. Some hints
How to trigger Jenkins builds remotely and to pass parameters and “Fake” a TFS Build.
This approach requires a bit of effort but has many advantages:
No big-bang, use Jenkins opportunistically
Can continue using existing builds
Having a build identifier in TFS allows you an overall monitoring and to use the Test features
I have a VSTS build definition for one of our projects that requires jenkins to build, but we still have all our other products using VSTS natively. To maintain consistency, this build definition triggers a jenkins build. We configured the build definition to not sync code as jenkins will download it (save time) and not to publish the artifacts back to the agent (i have another script for that found here). This allows developers to continue to use git as normal, and the build/release process is consistent with our other products. Along with task tracking and such.

Discard build in case of failure

I'm creating a custom build process template that allows the person who is queueing the build to define the build and revision numbers (as they are used on the assembly version info).
However, if the build fails, they can't queue a new build for the same version (but they should be able).
Is there a way to automatically discard the build if any step in the workflow or MSBuild Script fails?
TFS maintains assigned build numbers in the database itself, for its own administrative purposes. This maintains its internal consistency with all the assets that are produced and (intermediate) work products.
The only way to free up a previously used build number is to actively DESTROY it from the database. Please see http://geekswithblogs.net/jehan/archive/2011/04/23/tf42064-the-build-number-already-exists-for-build-definition-error.aspx for a further explaination.
One of the features added in TFS 2012 is the ability to retry a build.
This allows you to right click a failed build and select retry instead of queuing a new build. This should allow you to run a build with the same configuration settings without getting errors on your build numbers.

Can I specify the OS-level build user on a per-job basis?

Our team is sharing a Jenkins server with other teams, and this currently means that we are sharing the same OS-level build-user account. The different teams' OS-level build-user settings (Maven settings, bash settings, user-level Ant libraries, etc...) have collided a few times--"fixing" the settings for one team's jobs inadvertently "breaks" another team's jobs. The easiest sol'n that occurs to me is giving each team its own OS-level build-user account with which to execute its Jenkins jobs--but I cannot find a way to do this.
I have checked with Google, and also here
https://wiki.jenkins-ci.org/display/JENKINS/Use+Jenkins
and here
https://wiki.jenkins-ci.org/display/JENKINS/Plugins
to no avail.
Is there a way to do this? If not, can you recommend any best practices for segregating sets of builds from one another?
Maven Specific
You have two options that come to mind,
Add additional installations of Maven into your Jenkins global configuration, each using their own Home directory, and thus settings files. This will allow you to use totally different version of Maven, and selected based on Job requirements (You are given the option to select which "version" of maven you wish to use on the job itself.
Similar to (1), but specify specific settings configurations using Maven command line arguments. Its a little less "obvious" but may be quicker to implement
Multi-slave
You could possibly make use of multiple slaves on each machine. It increases the overheads of the builds quite significantly, and the implementation is such that you'd have multiple user accounts on a machine, each setup as needed, and then one slave instance for each user.
I'm not sure these solutions will totally answer your problem, I'll have a think and see if anything else pops into mind, but it might give some starting points
Key builds to a specific team directory that contains that team's settings. For example, provide a parameter 'TEAM' to every build, set its default value to the appropriate team name, and use that parameter as a key to a directory that contains the team's settings (so instead of using ${HOME} as in what you want to do, you'll use something like ${TEAM_SETTINGS}/${TEAM}).
You can set per-job users (who has access to/can build a particular job).
Under "Manage Jenkins" > "Configure System" >
Click on Enable Security
Check Project-based Matrix Authorization Strategy
However, I do not think there is a "per-build" option for a single job.
If you have the same project that you are sharing between teams, you could (and probably should) create two jobs for this project, and have different libraries/scripts be used in each.
You could also parametrize the build (On the Job Page, "Configure" > This build is parametrized) and supply the library versions, etc via string parameters.
You could also use a parameter to be the team's name, and in your build script change libraries based on the parameter:
For example, have a parameter called "TEAM", with choices: TEAM_A and TEAM_B, and in your script, have
if [ $TEAM == "TEAM_A" ]
then
ANT_HOME=/opt/ant/libA
else
ANT_HOME=/opt/ant/libB
fi
======================================================================
Have you considered sourcing your settings? In Linux, you could do this by saving your OS settings in a script file (for example paths, etc), and using source /path/to/settings/file, in Windows it would be call /path/to/settings/batch/file.
Can you give examples of OS level settings that you would require and per-build user for?
You problem is a common one.
Whenever something nonstandard is installed on a build server, something will break for someone.
The only solutions I know are
Set up a separate build slave for each team or product. Then they can install whatever they want on the build slave and any mess they create is all their own fault.
Any dependencies required by a job need to come with the job. This is my preferred way of working. For example: If a job needs a library or a tool, the library or tool is not installed on the build server but in the source tree and the build uses it from the source tree.
Sometimes the latter way is more work. You need to set up the tools or library so it works when it is installed in the source tree. Some tools have hard-coded paths and they do not work. In that case you can install the source of the tool and compile the tool during the build.
An even better solution is to set up separate Jenkins jobs for all the tools and libraries and the jobs that need a library or tool will download them from the Jenkins jobs.
This way you can control all your dependencies and different jobs do not conflict when e.g. one needs an older version of a library and one a newer version. And if someone upgrades the library, it is immediately visible in the version control who did what.

Resources