401 Unauthorized out of nowhere with Jenkins and TFS


EDIT 2
Okay, it turns out this has nothing to do with TFS or MSBuild. This is entirely a problem with SonarQube: the SonarQube service is the one that sends back the 401 (Unauthorized) status, not TFS. Since I run 5.4, I have no clue how to specify a SonarQube user, because both of those fields are greyed out in Jenkins.
I am running Jenkins as a Windows service, and about 2 hours ago it completed a successful build. Now, seemingly out of nowhere, Jenkins keeps reporting 401 (Unauthorized) no matter which build job I try to start.
All the jobs follow the same steps:
1. Start the SonarQube scanner.
2. Run MSBuild.
3. Run a rebuild command, which SonarQube needs in order to read the analysis.
4. Run the End SonarQube analysis step, which collects the data to put on our SonarQube portal.
What I don't understand is that the last change I made was deleting a file from an ASP project, and now none of the jobs work, even those that have nothing to do with this ASP project. All the projects are stored on our Team Foundation Server (not locally hosted).
The only other thing that changed was that we wanted the Jenkins and SonarQube services to be reachable from outside the server they are hosted on, so we created two sites in the local IIS and added DNS entries pointing at them. Reading the error log, I first see status 302, which is a redirect, before I reach 401. When I go to "Configure Jenkins" I am told that my proxy settings failed, or something along those lines.
Any idea what might cause this behaviour?
EDIT
Here is a part of the error log:
INFO: SCM changes detected in CSharp Build Job. Triggering #1
Apr 20, 2016 11:04:08 AM com.microsoft.tfs.core.config.httpclient.DefaultHTTPClientFactory logHTTPClientConfiguration
INFO: HttpClient configured for https://omitted.visualstudio.com/, authenticating as it#omitted.dk
Apr 20, 2016 11:04:10 AM com.microsoft.tfs.core.ws.runtime.client.SOAPService executeSOAPRequestInternal
INFO: SOAP method='GetRegistrationEntries', status=302, content-length=0, server-wait=1164 ms, parse=0 ms, total=1164 ms, throughput=0 B/s, uncompressed
Apr 20, 2016 11:04:11 AM com.microsoft.tfs.core.httpclient.HttpMethodDirector processWWWAuthChallenge
INFO: Failure authenticating with BASIC #omitted.visualstudio.com:443
Apr 20, 2016 11:04:11 AM com.microsoft.tfs.core.ws.runtime.client.SOAPService executeSOAPRequestInternal
INFO: SOAP method='GetRegistrationEntries', status=401, content-length=0, server-wait=578 ms, parse=0 ms, total=578 ms, throughput=0 B/s, uncompressed
Apr 20, 2016 11:04:11 AM com.microsoft.tfs.core.TFSTeamProjectCollection getServerDataProvider
WARNING: Error getting data provider
com.microsoft.tfs.core.exceptions.TFSUnauthorizedException: Access denied connecting to TFS server https://omitted.visualstudio.com/ (authenticating as it#omitted.dk)

Try changing the Server URL from https://omitted.visualstudio.com/ to https://omitted.visualstudio.com/DefaultCollection. Also use a personal access token or alternate credentials for the user name and password.
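To check the credentials outside Jenkins, something like the following could help. It is only a sketch: it builds a Basic-auth request against the collection URL suggested above without sending it. The `_apis/projects` path, the api-version value and the token are my own placeholders, not anything taken from the question.

```python
import base64
import urllib.request


def basic_auth_header(username: str, secret: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_probe_request(collection_url: str, username: str, secret: str) -> urllib.request.Request:
    """Build (but do not send) a GET request against the collection.

    The "_apis/projects" path and the api-version are assumptions
    chosen for illustration only.
    """
    url = collection_url.rstrip("/") + "/_apis/projects?api-version=1.0"
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, secret))
    return request


if __name__ == "__main__":
    # Placeholder values: with a personal access token, the token is
    # commonly sent as the password and the user name left empty.
    req = build_probe_request(
        "https://omitted.visualstudio.com/DefaultCollection", "", "my-token"
    )
    print(req.full_url)
    # Sending it with urllib.request.urlopen(req) would raise
    # urllib.error.HTTPError (code 401) if the credentials are rejected.
```

If the request succeeds with a token but Jenkins still fails, the problem is more likely in the stored Jenkins credentials than in the account itself.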

Okay, I have found the solution. It's so stupid I can't believe I didn't think of it.
In my SonarQube.Analysis.xml found at
...Jenkins\.jenkins\tools\hudson.plugins.sonar.MsBuildSQRunnerInstallation\MSBuild_2.0\SonarQube.Analysis.xml
I remembered that I had written the SonarQube username and password there, and at some point I had changed them in SonarQube from the default values to something else. The file still held the old credentials, and that broke all the builds.
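For anyone wanting to check quickly whether their own file carries credentials, here is a rough sketch. It assumes the file uses `Property` elements with a `Name` attribute and the `sonar.login` / `sonar.password` keys; the sample content is made up, so compare it against your actual file.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed layout of SonarQube.Analysis.xml.
SAMPLE = """<SonarQubeAnalysisProperties xmlns="http://www.sonarsource.com/msbuild/integration/2015/1">
  <Property Name="sonar.host.url">http://localhost:9000</Property>
  <Property Name="sonar.login">admin</Property>
  <Property Name="sonar.password">admin</Property>
</SonarQubeAnalysisProperties>
"""

CREDENTIAL_KEYS = ("sonar.login", "sonar.password")


def read_properties(xml_text: str) -> dict:
    """Collect every <Property Name="..."> value, ignoring XML namespaces."""
    root = ET.fromstring(xml_text)
    props = {}
    for element in root.iter():
        tag = element.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
        if tag == "Property" and "Name" in element.attrib:
            props[element.attrib["Name"]] = (element.text or "").strip()
    return props


def credential_keys(props: dict) -> list:
    """Return the credential-related keys that are present in the file."""
    return sorted(key for key in props if key in CREDENTIAL_KEYS)


if __name__ == "__main__":
    found = credential_keys(read_properties(SAMPLE))
    # Only the key names are printed, never the stored values.
    print("Credential properties present:", found)
```

If either key shows up, the values in the file have to match whatever is currently configured in SonarQube.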

Related

DevOps Server 2019.0.1 (Azure DevOps Error - TF30063: You are not authorized to access tfs.)

After updating from TFS 2018 v3 to DevOps Server 2019.0.1 last weekend I now receive this authentication error when attempting to manage security:
TF30063: You are not authorized to access tfs.
I receive this error when attempting to manage security from the Server Administration Console via Application Tier > Administer Security. I also receive the error when I attempt to set permissions via tfssecurity cli tool. I am in the local administrator group and I am listed in the console administration user section.
I'm trying to set permissions because after the update I received several reports from employees that receive errors when they try to access their projects. Those errors are:
TF40049: You do not have licensing rights to access this feature: Code.
*** Edit: Update
This error recurred when I upgraded from 2019 to 2020 RC1. The difference is that this upgrade required migrating the server, since the server requirements changed for the new version of DevOps Server.

I spent 8 hours working through this issue yesterday, and this is what fixed our problem:
1. Deleted the DevOps Server cache (the cache location is listed in the DevOps admin console on the server).
2. Rebooted the server.
I deleted the cache on the server because of an article I read about the same error: a user was having security/permissions issues with Visual Studio, and deleting the VS cache on their local machine solved the problem. I don't know whether deleting the cache or the reboot alone would have fixed it, because I did both as a single troubleshooting step.
Hope this helps someone.
** Edit: Update 08/13/20 **
After upgrading again, I have run into the same issue, and this no longer fixes the error. I've tried deleting the server cache, rebooting, reapplying permissions, configuring a new service account, reapplying changes, rebooting again, etc. I still have not found a solution for this error. I cannot schedule backups through the supplied backup scheduler without permission to manage security settings through the configuration panel.

jenkins List of flow heads unset for CpsFlowExecution perhaps due to broken storage warning

I'm getting this warning in Jenkins logs on start.
Feb 25, 2017 9:32:40 PM hudson.WebAppMain$3 run
INFO: Jenkins is fully up and running
--> setting agent port for jnlp
--> setting agent port for jnlp... done
Feb 25, 2017 9:32:58 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution getCurrentHeads
WARNING: List of flow heads unset for CpsFlowExecution[null], perhaps due to broken storage
Feb 25, 2017 9:32:58 PM org.jenkinsci.plugins.workflow.cps.CpsFlowExecution getCurrentHeads
WARNING: List of flow heads unset for CpsFlowExecution[null], perhaps due to broken storage
Feb 25, 2017 9:48:02 PM jenkins.branch.MultiBranchProject$BranchIndexing run
INFO: bible-server #20170225.214800 branch indexing action completed: SUCCESS in 2.4 sec
workflow-cps, which seems to be the source of the warning, is part of the famous Pipeline plugin, which I am using.
https://wiki.jenkins-ci.org/display/JENKINS/Pipeline+Plugin
It doesn't seem to have any unwanted side effects other than this annoying warning in the logs.
Anyone got ideas how to fix this up?

I was seeing the same thing. Looking at the source code for the plugin, it appears this is related to a pipeline run that completed abnormally.
In my case, I had a run of a pipeline that ran all the way through but got a Java exception trying to send mail because the VM lost network connectivity. Once I deleted that failed run (not the pipeline itself), I stopped seeing those warnings in the logs.

Why is Web Deploy using the wrong account?

I've verified that Web Deploy works (using NTLM authorization) when I fire it from Visual Studio on my local machine. Now I want my build server to auto-deploy (if appropriate) every night. I'm using Jenkins on the build server, and I've granted the account access in IIS on the remote machine. My parameters to MSBuild are as follows:
/p:DeployOnBuild=true
/p:Configuration=Debug
/p:Platform=x86
/p:PublishProfile=DEV
/p:AuthType=NTLM
/p:AllowUntrustedCertificate=True
/p:Username=
The DEV publish profile specifies my DEV server, which uses a self-signed certificate, hence the need to allow an untrusted certificate. The NTLM auth type and blank username should make it connect as the current user/account.
However, the Jenkins job's MSBuild step fails with this error:
msdeploy error ERROR_USER_UNAUTHORIZED: Web deployment task failed. (Connected to the remote computer ("DEV-SERVER") using the Web Management Service, but could not authorize. Make sure that you are using the correct user name and password, that the site you are connecting to exists, and that the credentials represent a user who has permissions to access the site. Learn more at: http://go.microsoft.com/fwlink/?LinkId=221672#ERROR_USER_UNAUTHORIZED.)
When I look at the IIS logs on DEV-SERVER, I see the following:
2016-01-06 23:55:10 159.212.19.186 HEAD /msdeploy.axd site=MySite 8172 - 159.212.19.123 - 401 2 5 0
2016-01-06 23:55:10 159.212.19.186 HEAD /msdeploy.axd site=MySite 8172 CO\BUILD-SERVER$ 159.212.19.123 - 401 2 64 78
I was expecting to see CO\jenkins, the account Jenkins is running under, instead of CO\BUILD-SERVER$. (And what's with the $ on the end?) Am I correct in thinking the wrong account is being used? What do I need to do to get this working?

CO\BUILD-SERVER$ is the machine account of your build server.
If you have a slave running on that machine, is it running as a Windows service? If so, it's probably running as "System", in which case it authenticates over the network as the machine account.
Also, regarding Selenium tests: if the tests run on the build server, the service may need to be set to run interactively so that the tests can run against a UI.
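To make the reading of those two IIS log lines concrete, here is a small sketch that splits them into named fields and flags machine accounts (Windows computer account names end in `$`). The field order is an assumption inferred from the lines in the question; the `#Fields:` header of your own log file is the authority.

```python
# Assumed field order; verify it against the "#Fields:" line in the log.
ASSUMED_FIELDS = [
    "date", "time", "s-ip", "cs-method", "cs-uri-stem", "cs-uri-query",
    "s-port", "cs-username", "c-ip", "cs(User-Agent)",
    "sc-status", "sc-substatus", "sc-win32-status", "time-taken",
]

LINES = [
    r"2016-01-06 23:55:10 159.212.19.186 HEAD /msdeploy.axd site=MySite 8172 - 159.212.19.123 - 401 2 5 0",
    r"2016-01-06 23:55:10 159.212.19.186 HEAD /msdeploy.axd site=MySite 8172 CO\BUILD-SERVER$ 159.212.19.123 - 401 2 64 78",
]


def parse_line(line: str) -> dict:
    """Split one space-separated log line into a field-name -> value dict."""
    values = line.split()
    if len(values) != len(ASSUMED_FIELDS):
        raise ValueError(f"expected {len(ASSUMED_FIELDS)} fields, got {len(values)}")
    return dict(zip(ASSUMED_FIELDS, values))


def describe_account(username: str) -> str:
    """Classify the cs-username value of a request."""
    if username == "-":
        return "anonymous"
    if username.endswith("$"):
        return "machine account"
    return "user account"


if __name__ == "__main__":
    for line in LINES:
        record = parse_line(line)
        print(record["cs-username"], record["sc-status"],
              describe_account(record["cs-username"]))
```

A request arriving as a machine account rather than CO\jenkins suggests the process that launched MSBuild is not running under the jenkins user.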

Troubleshooting TFS 2013

I have TFS installed locally on my machine. It used to work fine, but for a week or so now it has not worked most of the time.
Connecting from Visual Studio 2013 through the Team menu item now fails almost every time; it connects only once in many tries.
Git Fetch, Git Pull and Git Push from Git Bash take a long time to show the login prompt, and sometimes do not work at all, reporting that they could not connect to localhost:8080.
Fetch, Push and Pull from Visual Studio work once in a while.
Connecting through the web always works, though it is sometimes slow.
Lately, Git Push from Git Bash and Git GUI gives the error below:
POST git-receive-pack (8010 bytes)
fatal: The remote end hung up unexpectedly
fatal: The remote end hung up unexpectedly
error: RPC failed; result=7, HTTP code = 0
Pushing to http://localhost:8080/tfs/col1/_git/project1
Everything up-to-date
I've read so many articles and none seem to work. Is there a way to troubleshoot TFS to find out where the problem is coming from so that it can be corrected?

It turned out this had nothing to do with TFS: requests to localhost in general were failing most of the time. It may be an issue with how the system handles localhost. In any case, I disabled IPv6 and it still didn't work.
What worked, however, was RawCap. I noticed that while RawCap was running to monitor 127.0.0.1, all calls went through successfully. It appears that something it did rectified the issue. Hope it helps someone else who has the same issue with localhost.

Watch for -1 statuses in TFS Activity Log (http://localhost:8080/tfs/_oi). Right click -> Show Detail to get the full exception. Also look for errors in the IIS logs.
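Since the problem turned out to be with localhost rather than TFS, a quick probe like this might help narrow it down. It is only a sketch: it resolves a host name to every address it maps to (IPv4 and IPv6) and times a TCP connect to each one. The host and port are passed on the command line; localhost and 8080 are taken from the question.

```python
import socket
import sys
import time


def probe(host: str, port: int, timeout: float = 2.0) -> list:
    """Try a TCP connect to every address the host name resolves to.

    Returns a list of (address, outcome, seconds) tuples.
    """
    results = []
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        started = time.monotonic()
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            outcome = "ok"
        except OSError as exc:
            outcome = f"failed: {exc}"
        results.append((sockaddr[0], outcome, time.monotonic() - started))
    return results


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python probe.py localhost 8080
    for address, outcome, seconds in probe(sys.argv[1], int(sys.argv[2])):
        print(f"{address:<40} {outcome:<40} {seconds:.3f}s")
```

Running it a few times in a row shows whether failures are tied to one address family or are intermittent across all of them.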

Diagnosing TFS Build Hanging after 'Copy Files to Drop Location' step

I need some advice on how to diagnose a hanging build. It’s only been happening in the last week or two, and I have good reason to suspect it’s something that I’ve done recently and not just a coincidence.
Setup
TFS 2013
4 machine setup - 2 app tiers (in process of deprecating one of them), 1 sql server, 1 build server running 2 agents.
Build Controller is running on 2nd app tier along with the Job Agent
1st App tier is serving the website (although that machine will soon be shutdown and everything will be passed to the 2nd app tier as the machine is getting old)
Symptoms
All executed builds (it doesn’t appear to matter which build process template) never get marked as done; the last step always seems to be the same one: “Copy Files to Drop Location”/“Workspace and Copy Files to Drop Location”/“Copy Binaries to drop, Reset the environment” (named differently in each build template)
The files appear to be getting dropped successfully in the build drop folder
Looking at the task manager it appears that all the build processes on the build server are exited (only TFSBuildServiceHost
Builds show their normal steps/logging while executing
Primary app tier has related warnings in the event logs (see warnings below)
Recent Changes
Installed Xamarin Android/iOS on the build server
Installed a few custom-built plugins for Job Agent, Message Queue, and Web Services (been using them for years, just had them disabled the last few weeks due to an app tier migration)
Installed Tiago’s Task Board Enhancer (again been using this for a long time, just had it disabled recently)
About a month ago we added the 2nd app tier and moved the sql off to another machine
What I’ve Tried
Rebooting both App tiers and build server
Uninstalling Xamarin (although I suspect some parts are still floating around as the Bonjour service appears to still be installed)
Removing the custom plugins
Turned logging diagnostics right up on one of the builds – nothing particularly of interest seems to turn up
Run the Best Practice Analyzer (nothing too unusual shows up)
Multiple build process templates (defaulttemplate, defaulttemplate.11.1, tfvctemplate.12.xaml)
Multiple build definitions
Checked the event logs of both AppTiers and Build server
The Team Foundation service host request monitor has detected the
following condition: Date (UTC): 3/02/2014 12:54:06 a.m. Machine:
CODEBASE Application Domain: /LM/W3SVC/1/ROOT/tfs-1-130357641583538280
Assembly: Microsoft.TeamFoundation.Framework.Server, Version=12.0.0.0,
Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a; v4.0.30319 Service
Host: 0dc282b5-59a8-4941-b541-a4f7d314cd0f Process Details: Process
Name: w3wp Process Id: 2508 Thread Id: 2504
Detailed Message: A request for service host XXXX has been executing
for 37 seconds, exceeding the warning threshold of 30.
Request details: Request Context Details
Url: /tfs/XXXX/XXXX/_api/_build/stop?__v=4
Method: ApiBuild.stop
Parameters: uri = vstfs:///Build/Build/34064
User Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36
Unique Id: 00000000-0000-0000-0000-000000000000
The Team Foundation service host request monitor has detected the
following condition: Date (UTC): 30/01/2014 11:10:01 p.m. Machine:
CODEBASE Application Domain: /LM/W3SVC/1/ROOT/tfs-1-130355232548668648
Assembly: Microsoft.TeamFoundation.Framework.Server, Version=12.0.0.0,
Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a; v4.0.30319 Service
Host: 0dc282b5-59a8-4941-b541-a4f7d314cd0f Process Details: Process
Name: w3wp Process Id: 70320 Thread Id: 14540
Detailed Message: A request for service host XXXX has been executing
for 37 seconds, exceeding the warning threshold of 30.
Request details: Request Context Details
Url: /tfs/XXXX/Build/v4.0/BuildService.asmx
Method: StopBuilds
Parameters: uris[0] = vstfs:///Build/Build/34051 uris = Count = 1
User Agent: Team Foundation (devenv.exe, 12.0.21005.1, Premium, SKU:16)
Unique Id: 4d2d3213-fd41-4c4d-8ab0-b87619c96a42
The Team Foundation service host request monitor has detected the
following condition: Date (UTC): 31/01/2014 3:14:17 a.m. Machine:
CODEBASE Application Domain: /LM/W3SVC/1/ROOT/tfs-1-130355232548668648
Assembly: Microsoft.TeamFoundation.Framework.Server, Version=12.0.0.0,
Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a; v4.0.30319 Service
Host: Process Details: Process Name: w3wp Process Id: 70320
Thread Id: 14540
Detailed Message: There are no active requests for service host XXXX
that exceed the warning threshold of 30.
A quick google suggests upping the timeout in the tfs registry (http://xavierdilipkumar.com/post/2013/07/04/TFS-event-7005-and-7006-warning.aspx) I've tried that and it doesn't appear to change anything.

Can you look in the TFS Build Service logs at
Event Viewer -> Applications and Services Logs -> Microsoft -> Team Foundation Server -> Build-Services -> Operational
These timeouts generally relate to permissions. You should look for TF215106 access denied events. Although the files appear to be there, do they all have the current date, or do some have different (older) dates? Also, are there any alerts/steps happening when the file drop occurs?
Other than that, it could be timing out because one of the dependencies is being used by another service.

You might fire up Sysinternals Process Monitor to see when the processes actually exit and what they were doing (Process Monitor monitors "real-time file system, Registry and process/thread activity").

The best course of action is to call Microsoft Support and open a Service Request. Make sure it gets priority A - your TFS production environment is not working - and be prepared to give them support and access.
The only hint from the log is that call to ApiBuild.stop. It suggests that the build workflow completed, so the code hosting it is calling back to the AT to mark the build completed. As you have no warnings from previous calls, it could be some problem at the database level. You may try activating SQL Tracing but it's not a trivial task, as you should be able to compare the trace with a working one.
Good luck

I'm reluctant to mark this as an answer because I'm not entirely sure why it worked.
Suspecting something was wrong with the build machine I created a new Build Agent on a fresh install - the hanging issue still occurred.
I then added a Build Controller to that machine and noticed that new builds using that controller would complete. This suggested that there was a communication issue between either the BA and the BC, or the BA and the primary AT.
Given that our primary AT had other issues, we decided to remove it from the picture: we switched the DNS to point at the second AT and disabled all services on the old primary. Builds instantly started to complete (including the ones that had been stuck for a number of days).
I still don't know which component was broken or why, especially since it worked fine in this configuration for a month prior. I can only assume there was either another change that I am not aware of, or the corruption of the primary AT was causing bigger issues.

We were having the same problem here: the builds were kept open even after successfully passing all workflow stages.
I logged into the build machine and noticed the build controller was "running 6 builds" for some reason, even though no builds at all were showing in the queue in Visual Studio.
After restarting the controller, the next build worked the first time.
Just wanted to leave this here as a possible answer. I'm not sure yet why the controller had those stuck builds though.

I had this issue when an activity tried to log a huge message in the build log (namely the FxCopCmd activity from the CodePlex TFS Build Extensions project).
The build agent would successfully finish the build but the controller had to chew the huge message into the build log, and it was silently crashing/hanging.
I was able to track the issue down by navigating to C:\Users\[TfsServiceAccount]\AppData\Local\Temp\BuildAgent\[AgentNumber]\Logs\[BuildNumber]\ActivityLog.xml.
The last build message was truncated and by looking at the content, I recognized the FxCop output. In my case, I just set the LogToConsole parameter to False for the FxCop activity in the build process template, and the build completed successfully.
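If the ActivityLog.xml is too large to open comfortably, a sketch like the following prints its size and only the last part of it, which is where I spotted the truncated message. The path is whatever your agent produced, passed as an argument; the 2000-byte window is an arbitrary choice.

```python
import os
import sys


def tail_text(path: str, max_bytes: int = 2000) -> tuple:
    """Return (file size in bytes, last max_bytes of the file as text)."""
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        handle.seek(max(0, size - max_bytes))
        data = handle.read()
    # Undecodable bytes are replaced instead of raising, since the cut
    # may land in the middle of a multi-byte character.
    return size, data.decode("utf-8", errors="replace")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Example: python tail_log.py path\to\ActivityLog.xml
    size, tail = tail_text(sys.argv[1])
    print(f"{size} bytes total; last part of the file:")
    print(tail)
```

A very large size together with a final message that stops mid-sentence points at the activity that is flooding the log.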

This also appears to happen if the build agent cannot connect to the build controller server on port 9191.
That is easy to test with a telnet client.
In my case, it appears that my server decided it was on an unknown network and kicked the firewall into overdrive. (This was the second time I got this issue; I'm not sure it was the reason the first time, but it seems reasonable.)
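If no telnet client is installed on the build agent, a few lines of Python can do the same check. This is a minimal sketch; the host name is whatever your build controller is called, and 9191 is the port mentioned above.

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python check_port.py my-build-controller 9191
    host, port = sys.argv[1], int(sys.argv[2])
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

Run it from the build agent machine, since the firewall rule that matters is the one between the agent and the controller.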
