TFS - Cannot Check-In "big" files - tfs

I've been working with TFS for months without a problem, now suddently I can't check-in "big" files (2kb files are ok, but 50kb files or multiple files are not). TFS is hosted on a server in the same network.
When I try to check-in, it gives-me an error like: "Check In: Operation not performed : The underlying connection was closed: An unexpected error occurred on a receive.
Please refer to the Output window for more information". The "more information" in output window is just the same error.
The event viewer of the server shows nothing, and I've been looking in Google for the past couple hours and turned out nothing yet.

The error message The underlying connection was closed is an indicator that something between the client and the server is dropping the connection unexpectedly.
Some things to investigate/try:
Is the application pool on the server restarting? Look at the Application Event Log on the AT server. Look for ASP.NET and W3SVC warnings/errors that indicate the app pool has restarted.
How is the client connecting to the server? Is there a HTTP proxy in the middle? Is the server behind a load balancer or firewall device? What is the idle timeout set to? Is it honoring HTTP Keep-Alive settings?
Is it failing for all clients? Can you checkin the same file on the TFS server itself?
If none of those seem to lead you in the right direction, you'll need to setup a Fiddler or NetMon trace on your client and/or your server.

We had a similar problem in TFS 2008, where the lock table was getting big and causing some problems. Take a look at tbl_lock in your TfsVersionControl database. This should be a very small number of rows.
In our experience when this number approached or exceeded 1,000 rows, we started seeing significant issues with check-ins.
Our resolution to this was to turn on merging for the binary files that we were storing.

Related

Request timed out (System.Web.HttpException)

In periods we are experiencing many "Request timed out" exceptions (System.Web.HttpException) from a specific endpoint that is called often.
It appears not to be related to high-peak periods and has been experienced right after deployment and at random times. No pattern.
The solution is not to increase the execution timeout as the requests are normally completed within seconds.
Neither the web server nor the backend SQL Server is stressed. We have even seen low CPU usage during an incident period.
From ApplicationInsights I got the exact endpoint failing, which is a standard controller action. However, there is no additional information. No stack trace. No error code. Nothing. The exception is thrown at any time between 1 second and minutes after the request start.
From ApplicationInsight I can see that some of the requests to the failing endpoint are completed. However, the response time is extremely long (up to 8 minutes).
I have found nothing in the IIS logs. We have set up the failed request logging and waiting for the next incident. However, we do not expect to get more information than we already got from ApplicationInsights.
I'm uncertain whether this is an ASP.NET MVC application issue or an IIS configuration. It puzzles me, that no stack trace is available.
Any suggestions on how to approach this challenge? Pointers to articles/blogs that can help me solve the issue are very much appreciated.
UPDATE
I was looking through our trace logs and realized that they were not complete, i.e., entries were missing. We use ApplicationInsights (AI) for tracing. AI is configured to keep all traces, exceptions, and events, and it is working flawlessly in DEV and STAGING.
We have two AI environments: AI-PROD and AI-TEST. The environment is selected in web.config via instrumentation key. The entire AI config is in the ApplicationInsights.config and this file is the same in DEV, STAGING, and PROD.
I tried to connect STAGING to the AI-PROD environment to verify that it was not a problem with the environment. It worked flawlessly.
I disabled AI in PROD and the server started without throwing “Request timed out” errors during startup. When PROD is connected to either the AI-PROD or the AI-TEST environment I get “Request timed out” errors during startup.

TFS 2015 Intermittent connection issues

For the last few weeks our office has experienced intermittent and fairly crippling connection issues to our on premises TFS 2015 (w/update3). When it occurs visual studio basically becomes useless and web view pages of TFS either don't load the page, load the header toolbars or take several minutes and eventually open. Queries take minutes to run and so on. Then suddenly all will be well and it works again perfectly.
There are no errors shown when this happens, either to the user or in the TFS application event logs. The system is not overloaded on any resources. I have tried various things like; rebooting (obviously!), iisrest, cleared cache on app tier and who knows what else at this point.
Are there any other logs I could be looking at or things I could try to diagnose?
Worth noting: Users have all recently migrated to a new domain but the TFS servers are still on the old domain. However we had migrated in my office long before these issues occurred. Other offices who connect in have only recently migrated into new domain.
System setup is VMware 6.0, TFS app tier with separate SQL data tier and analysis database.
According to your description: There are no errors shown when this happens, either to the user or in the TFS application event logs.
Agree with Daniel in the comment, this kind of issue should not related to TFS server side. If the server always be slowly, this should be a performance issue. However, you said suddenly all will be well and it works again perfectly. Then the most possibility should be network related issue. Suggest your team use some Network Analyzer Tools to trouble shoot it.
For example, double check the DNS related area. Your TFS server is on one domain and some users are on another. First make sure the domains are trusted each other, double check that if the slow performance was being caused by an authentication issue.
One of the fix is to have users log in using the full domain name. A sample: if they are currently logging in with DEV\MyUserAccount, then they should instead log in with DEV.COM\MyUserAccount.
It has something to so with how the TFS server is looking up the accounts when a short domain name is used. It is pre-pending the name to all of the dns suffixes which in turn ends up creating bad ones and causes delays as it's not finding any valid domains.
Besides, about the performance issue, you could also take a look at this great answer from jessehouwing.

Azure Unexpected TCP issue

I've got an application that's been working for 9 months with no issue, but as of this morning (with no programming changes on my part), I'm timing out with SQL Server Error 10060.
In checking the server via SQL Studio there is some latency, and I've also had the same message, but less frequently.
In looking at the portal.azure page I'm assured that everything is OK.
What should one do when this happens? My ISP looks fine; I've got the same speed as usual.

Issue with non responding website. How to debug?

We have a website created in asp-mvc4 running on iss on windows server 2012 and using MSSQL 2012 as data storage. Connections are done using entity framework-6... Very standard stuff.
We are not a high volume website (max 3000 users around the world so hitting it in different timezones)
The issue is that sometimes without warning the site becomes unresponsive (browser does not show it and time out). Nothing special but here is the strange issues:
The server itself is working fine if you terminal server into it
Restarting the ISS does not help er there are no error logs
SQL server have around 100 connections from the website all sleeping (but killing theses processes does not make the site recover)
SQL server at the time show half of them as waiting tasks but it is still responsive if executing sql from SSMS or even remote from excel (remote reporting)
Looking at SQL Profiler website is still sending in SQL request despite being down but they are all request like this: if db_id('dbname') is not null else select... (Not something specifically written in the website)
the really strange one: If we restart the SQL server the website becomes responsive again)
I know this is not a lot to go on but we are very puzzled and don't really know how to proceed. Northing indicate error in any kind of log (website, iss, sql server or windows). I can deduct it must be the website thinking SQL cannot give it what it need because connection pool or something is used up but why it is only freed up with a complete sql server restart and not just killing the processes really puzzles me, and why the connection pool buildup happen in the first place since and sql is handled in entity framework
Any advice on how to debug further is most welcome

TFS check in timeout of changeset containing "larger" binary files

I'm performing a TFS Integration migration from tfs.visualstudio to an on-premise 2012 server. I'm running into a problem with a particular changeset that contains multiple binary files in excess of 1 MB, a few of which are 15-16 MB. [I'm working remotely (WAN) with the on-premise TFS]
From the TFSI logs, I'm seeing: Microsoft.TeamFoundation.VersionControl.Client.VersionControlException: C:\TfsIPData\42\******\Foo.msi: The request was aborted: The request was canceled.
---> System.Net.WebException: The request was aborted: The request was canceled.
---> System.IO.IOException: Cannot close stream until all bytes are written.
Doing some Googling, I've encountered others running into similar issues, not necessarily concerning TFS Integration. I'm confident this same issue would arise if I were just checking in a changeset like normal that met the same criteria. As I understand it, when uploading files (checking in), the default chunk size is 16MB and the timeout is 5 minutes.
My internet upload speed is only 1Mbit/s at this site. (While I think the problem would mitigated with sufficient upload bandwidth, it wouldn't solve the problem).
Using TCPView, I've watched the connections to the TFS server from my client while the upload was in progress. What I see is 9 simultaneous connections. My bandwidth is thus getting shared among 9 file uploads. Sure enough, after about 5 minutes the connections crap out before the upload byte counts can finish.
My question is, how can I configure my TFS client to utilize fewer concurrent connections, and/or smaller chunk sizes, and/or increased timeouts? Can this be done globally somewhere to cover VS, TF.EXE, and TFS Integration?
After spending some time with IL DASM poking around in Microsoft.TeamFoundation.VersionControl.Client.dll FileUploader, I discovered in the constructor the string VersionControl.UploadChunkSize. It looked like it is used to override the default chunk size (DefaultUploadChunkSize = 0x01000000).
So, I added this to TfsMigrationShell.exe.config
<appSettings>
<add key="VersionControl.UploadChunkSize" value="2097152" />
</appSettings>
and ran the VC migration again -- this time it got past the problem changeset!
Basically the TFS client DLL will try and upload multiple files simultaneously (9 in my case). Your upload bandwidth will be split among the files, and if any individual file transfer cannot complete 16MB in 5 minutes, the operation will fail. So you can see that with modest upload bandwidths, changesets containing multiple binary files can possibly timeout. The only thing you can control is the bytecount of each 5 minute timeout chunk. The default is 16MB, but you can reduce it. I reduced mine to 2MB.
I imagine this could be done to devenv.exe.config to deal with the same problem when performing developer code check ins. Hopefully this information will help somebody else out and save them some time.

Resources