We're running TFS 2015 Update 2. When a build starts, it says "Waiting for an agent" and then, 2 seconds later, the build fails. I checked whether the agent pools were running, and all of them were green (which, as I understand it, is expected). I also checked whether the TfsJobAgent service was running, and it's OK. If I download the zip log from the build process, it is empty, so I don't know what I'm doing wrong. I tried creating a new agent pool and running the build on that agent, but I got the same result.
Any ideas on how to tackle/solve this?
PS: All the builds were working fine a week ago.
The build service account needs to be a member of the Build Service Accounts group.
You also need to check whether the build agent service is running.
Related
I just created a new build server and added it to an existing queue. I turned off the other agents in the queue so that the new build server would get the job. I submitted a very simple .NET build, but the build window says "Waiting for an available agent".
I assume I missed installing something on the build server: when the job is submitted, it looks for an agent whose "capabilities" satisfy it. How can I see which capabilities the job needs, so I can work out why it's stuck?
As far as I know, if the agent capabilities don't meet the build's demands, a warning message is shown in the build result or at queue time.
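To reason about why a job sits at "Waiting for an available agent", it can help to compare the job's demands against an agent's capabilities by hand. The sketch below models that check under simplifying assumptions: each demand is a hypothetical (name, required value) pair where None means the capability only has to exist, and names are compared exactly. It is an illustration of the idea, not TFS's own matching code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: (capability name, required value), None means "must exist".
Demand = Tuple[str, Optional[str]]


def unmet_demands(demands: List[Demand], capabilities: Dict[str, str]) -> List[Demand]:
    """Return the demands that the given agent capabilities do not satisfy."""
    unmet = []
    for name, required in demands:
        if name not in capabilities:
            # "exists" check fails: the agent does not report this capability at all.
            unmet.append((name, required))
        elif required is not None and capabilities[name] != required:
            # "equals" check fails: the capability exists but has a different value.
            unmet.append((name, required))
    return unmet
```

For example, `unmet_demands([("msbuild", None), ("visualstudio", None)], {"msbuild": "C:\\MSBuild"})` returns `[("visualstudio", None)]`, which would point at a missing Visual Studio install on the new build server. The capability names and values here are illustrative.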
You could try the following troubleshooting steps:
1. Navigate to Local machine -> Services and check whether the Visual Studio Team Foundation Background Job Agent service is running on the TFS application tier server. You could start or restart this service.
2. Make sure the account the agent runs under is a domain account and is in the Agent Pool Service Account role or in Project Collection Build Service Accounts:
TFS2015 -> Agent Pool Service Account
TFS2017/TFS2018 -> Project Collection Build Service Accounts
Alternatively, switch the agent service to another available account and restart the service.
3. Restart the whole TFS server and check whether that does the trick. It may take some time.
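The service check described above can also be scripted. A minimal sketch, assuming the Background Job Agent's service name is TfsJobAgent and that `sc query` prints its usual `STATE : <code> <NAME>` line; only the parsing is exercised here, and `query_service` only works on Windows.

```python
import re
import subprocess
from typing import Optional


def service_state(sc_output: str) -> Optional[str]:
    """Extract the state name (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return match.group(1) if match else None


def query_service(name: str = "TfsJobAgent") -> Optional[str]:
    """Run `sc query <name>` (Windows only) and return the parsed state.

    The default service name is an assumption; check it in the Services console.
    """
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return service_state(result.stdout)
```

If `query_service()` returns anything other than "RUNNING" (or None because the service was not found), start or restart the service before retrying the build.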
Here is another ticket with a similar issue that you could refer to.
Hope this helps.
We detached one of our main collections and it failed to detach, so we restarted it and everything appeared to be fine. But this morning we found that the build pipelines are not picking up new builds. The build servers are communicating and showing online, but nothing is queuing. Has anyone experienced this during a detach/reattach process?
Update
The customer quiesced and unquiesced TFS, and that fixed the pipeline issues.
Detaching and reattaching a collection usually does not affect builds.
First, I suggest checking Event Viewer for any logs that could help with troubleshooting.
If you are using vNext builds, check the agent pool and agents, which should all be green.
Make sure you have selected the right agent queue in your build definition. Also try creating an empty build definition with no build tasks to see whether the issue is related to the definition, and restart the agent service on your TFS server.
Also check if the service Visual Studio Team Foundation Background Job Agent is running. If not, start it manually and try the build again.
Note: The service runs on the TFS server, not on the build server.
For troubleshooting logs, check Event Viewer and the logs in \agent_diag on the build agent for any useful information.
If you still get this issue, deploy and configure a brand-new build agent. This will help narrow down the issue.
I am using TFS to execute a nightly build that includes several steps that use the TFS Test Agent. I am running the latest version of TFS/Test Agent (2015 Update 3), and no other builds run at that time. Often (maybe half the time), when the nightly job runs, the "Visual Studio Test Agent Deployment" step fails with the following error:
The job has been abandoned because agent Agent-XXX did not renew the
lock. Ensure agent is running, not sleeping, and has not lost
communication with the service.
This is due to the error found in the Test Agent's log file (under _diag):
The session for this agent already exists. Sleeping for 30 seconds
before next retry.
Microsoft.TeamFoundation.DistributedTask.WebApi.TaskAgentSessionConflictException:
The task agent Agent-XXX already has an active session for owner XXX.
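Before restarting anything, it can be worth confirming how often this conflict shows up. A minimal sketch that scans an agent's diagnostic folder for the exception name quoted above; the `*.log` file pattern and the folder path you pass in are assumptions about how your agent lays out its _diag directory.

```python
from pathlib import Path
from typing import List, Tuple

MARKER = "TaskAgentSessionConflictException"


def find_session_conflicts(diag_dir: str, pattern: str = "*.log") -> List[Tuple[str, int]]:
    """Return (file name, line number) for every log line mentioning the conflict.

    `pattern` is an assumed file-name glob; adjust it to match your agent's logs.
    """
    hits = []
    for log in sorted(Path(diag_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with log.open(encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if MARKER in line:
                    hits.append((log.name, lineno))
    return hits
```

Running it each morning against the test agent's _diag folder would show whether the conflicts cluster around the nightly build or appear earlier.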
This issue is referenced directly here, and discussed indirectly here.
The solution I've found is to restart the server the test agent runs on. This clears any dead sessions, and after the server starts back up, the tests run just fine. I think this is effectively what the previously mentioned post does: resetting the configuration restarts the service.
Although it is presented as a solution in the linked article, it is only temporary. Even after the server has been restarted and the build runs successfully, the issue reappears the next day, requiring manual intervention to get the build to run.
I could schedule a task to reset the service, or even restart the server, directly before the nightly build runs, but that strikes me as a bandage rather than a fix. Has anyone experienced this issue before, and if so, is there any way to prevent it from occurring in the first place?
Update 1
I simply set up a build that runs 5 minutes before my main tests and executes a batch script to restart all the servers hosting my test agents. This is a workaround, but it seems to resolve the issue. Hopefully someday someone will come up with a better solution, but for now, this is how I have to run automated testing in TFS.
Update 2
I have three servers now, and all three exhibit the same issue, though it is hard to pin down exactly when it occurs. Scaling up the workaround without creating downtime is proving to be quite challenging.
Update 3
A better day came: I upgraded TFS to 2018 and the build agent to the latest version, and this issue no longer occurs. I think it was a bug in the old build agent. I still don't have a solution for the original version of the build agent.
It sounds like an Agent.Listener.exe process was running somewhere on the machine, maybe as a service (not in a logged-in user session).
Note: if an agent process is abruptly terminated while it has an active session, the session will eventually time out (after 5 minutes, I think). On startup, if an agent encounters a session conflict, it will retry for up to 5.5 minutes, I think, before giving up (enough time for an abruptly terminated session to expire).
I'm going to go ahead and close this and assume a process was running somewhere. We haven't had any issues in this area and haven't heard any other reports, so I don't think there is an issue with the agent here. If you find a repro, or it looks like I'm wrong, please reopen.
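The retry behavior described in this note can be sketched as a deadline-bounded loop. This is a model, not the agent's actual code: `SessionConflict` and `try_create` are hypothetical stand-ins, the 30-second delay comes from the "Sleeping for 30 seconds" log line quoted earlier, and the 330-second budget is the approximate 5.5 minutes mentioned above. The clock and sleep functions are injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable


class SessionConflict(Exception):
    """Hypothetical stand-in for the server reporting an existing active session."""


def create_session_with_retry(
    try_create: Callable[[], str],
    max_wait_s: float = 330.0,    # ~5.5 minutes (approximate figure from the note)
    retry_delay_s: float = 30.0,  # matches "Sleeping for 30 seconds" in the log
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Keep trying to create a session until it succeeds or the budget runs out."""
    deadline = clock() + max_wait_s
    while True:
        try:
            return try_create()
        except SessionConflict:
            # Give up if the next attempt would land past the deadline:
            # the stale session did not expire in time.
            if clock() + retry_delay_s > deadline:
                raise
            sleep(retry_delay_s)
```

With a 5-minute session timeout and a 5.5-minute retry budget, a session left behind by a crashed process should expire before the new agent gives up. A conflict that outlasts the whole budget therefore suggests a live process still holds the session, which matches the diagnosis above.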
I am using a TFS agent to run some builds from triggers. My issue is that when the build starts, it tries to get the latest version (sync the repository), but I get an unusual error:
I have literally no idea where to begin; it's as if it can't reach the server or something.
I can't comment yet, so I'm posting this as an answer.
Does the account your agent runs under have rights to the TFS repository mapped in the build?
For more details, check the logs in the D:\Kits\agent_diag folder.
Our Release Management has a job that is stuck "In Progress".
The error is:
Communication with the deployer was lost during the deployment. Please
make sure (1) the deployer machine has not rebooted during
installation and (2) the component timeout is sufficient to copy the
files from the drop location to the deployer machine and install the
package.
I can't stop or abandon the release. The buttons are all disabled. How can I kill this?
From Release Management, go to the Release tab. Open the details of the actual release, go to the step that is pending, and you will see a "Stop" button at the top. That will stop the step and change the status of the release.
Is the build stuck? Can you restart the build controller and / or the build agent? You can look for them by editing the build definition.
Don't take my word for it, as Release Management is pretty new, but the error is about the connection between the RM Server and the RM Deployer service (i.e., the RM agent). The RM Server doesn't know anything more about the agent, so your option is to connect to the target machine(s) and check the deployment status manually. If it completed, restart the RM Deployer service and cross your fingers.
I faced the same issue of the release being stuck in the 'In Progress' state. It turned out the password of the credentials I was using had changed. Once the new password was specified in the deployment agent, the release managed to complete. This was months ago, and now I am facing the same issue on another server. No clue what the reason is this time.
We had this problem on TFS 2018, where all releases got stuck.
Because of a connectivity issue with SQL when a release completes, the status may not be updated in the DB in some cases under heavy load, so the release stays stuck in the InProgress state and keeps consuming a pipeline in SQL. Other releases cannot move ahead either, because the pipeline is blocked. Once we increased the pipeline count, the problematic release could move out as release processing resumed.
Once the problematic release was canceled by the system, we set the pipeline count back to its original value of 1, and the other releases could be seen progressing instead of being stuck.
Solution:
Increase the pipeline count to, say, 25. Then create a new release pipeline and queue it; this will push through all the pipelines that got stuck. Once pipelines start queuing, set the count back to one or to its original value.
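Why raising the pipeline count unblocks everything can be seen with a toy model. The sketch below is a hypothetical simulation, not how TFS schedules releases internally: releases are served in FIFO order by a fixed number of pipeline slots, a "stuck" release takes a slot and never returns it, and every other release finishes immediately.

```python
from collections import deque
from typing import List, Set


def releases_started(queue: List[str], stuck: Set[str], slots: int) -> List[str]:
    """Simulate a FIFO release queue served by `slots` concurrent pipelines.

    Releases in `stuck` hold their slot forever; all others finish at once and
    free their slot. Returns the releases that ever got a slot, in order.
    """
    pending = deque(queue)
    held = 0  # slots permanently held by stuck releases
    started = []
    while pending and held < slots:
        release = pending.popleft()
        started.append(release)
        if release in stuck:
            held += 1  # this slot is never given back
    return started
```

With one slot, `releases_started(["R1", "R2", "R3"], {"R1"}, 1)` returns only `["R1"]`: the stuck release starves everything behind it. With 25 slots the same queue returns all three, which mirrors the temporary bump to 25 described above.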
Reference - https://blogs.msdn.microsoft.com/tfssetup/2017/11/14/understanding-build-and-release-pipelines-visual-studio-team-servicesteam-foundation-server/