Upgrading from .NET 5 to .NET 6 causes timeouts for external HTTP(S) calls via WebClient

We decided to upgrade our ASP.NET Core website from .NET 5 to .NET 6; we simply changed the target framework of the web application from .NET 5 to .NET 6. There were no compilation errors, we gave it a test in our development environment, and all seemed well.
There were no code changes at all, and the .NET 5 application had previously been running for many months without issue (and before that, .NET Framework 4.8).
When we deployed our app to our live production environment, within a few minutes we noticed a slowdown of external calls (calls to HTTPS endpoints, often REST-like). We log any calls that take more than 5 seconds, and over the space of a few minutes all calls went from slow to timing out (20 seconds).
We are using System.Net.WebClient for all of our calls, which I understand is now obsolete in .NET 6; however, I would not expect this to suddenly change behaviour, and in any case we tried switching to HttpClient, the recommended approach, with the same results.
I feel like I must be missing something really fundamental: we just upgraded the target framework and redeployed, and now all calls made by WebClient eventually time out.
It feels like a "running out of resources" issue in code, given the slowdown followed by timeouts, but I am at a loss to explain what is going on here.
To be clear, we are not doing anything special: just calling about 3 external services via WebClient for each user, with maybe 100 users a minute at peak. Previously there were no timeouts.
Any pointers on what might be causing the timeouts would be greatly appreciated.

I guess time will tell if this is the answer, but we changed all of our calls to use DownloadStringTaskAsync and UploadStringTaskAsync, i.e. all calls went from blocking to async/await, and after 24 hours we have not seen the same behaviour in our live environment under full load.
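For anyone making the same change, here is a minimal sketch of the before/after (the class and endpoint are made up for illustration; the point is simply swapping the blocking WebClient methods for their task-based counterparts and awaiting them):

using System.Net;
using System.Threading.Tasks;

public class ExternalServiceClient
{
    // Hypothetical endpoint; in our case these are third-party REST-like services.
    private const string ServiceUrl = "https://example.com/api/status";

    // Before: DownloadString blocks a thread-pool thread for the whole round trip.
    public string GetStatusBlocking()
    {
        using var client = new WebClient(); // obsolete in .NET 6 (SYSLIB0014) but still works
        return client.DownloadString(ServiceUrl);
    }

    // After: DownloadStringTaskAsync releases the thread back to the pool while the call is in flight.
    public async Task<string> GetStatusAsync()
    {
        using var client = new WebClient();
        return await client.DownloadStringTaskAsync(ServiceUrl);
    }
}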
Why a web app on .NET 5 would not have these issues while .NET 6 does is hard to understand. For context, we are not under crazy high load; we are talking a peak of perhaps 150 users per minute, but that is what we are seeing.
Perhaps it was something specific to our set up, but I am writing this to save someone else the pain of trying to debug this issue in the future.

That is suspicious and unexpected. If you have an HttpClient repro, can you please post it on GitHub at https://github.com/dotnet/runtime/issues? (Ideally a minimal repro we can run locally for debugging.)
If your repro is not transferable to another machine, or requires specific endpoints you can't expose, we may have to guide you through some local debugging ...
-Karel (.NET Networking team)

Related

Large percent of requests in CLRThreadPoolQueue

We have an ASP.NET MVC application hosted in an Azure App Service. After running the profiler to help diagnose possible slow requests, we were surprised to see this:
An unusually high percentage of slow requests spent their time in the CLRThreadPoolQueue. We've now run multiple profiling sessions, and each came back with between 40-80% in the CLRThreadPoolQueue (something we'd never seen in previous profiles). CPU each time was below 40%, and after checking our metrics we aren't seeing sudden spikes in requests.
The majority of the requests listed as slow are very simple API calls. We've added response caching and made them async. The only thing they do is hit a database looking for a single record. We've checked the metrics on the database, and the average query run time is around 50ms or less. Application Insights for these requests confirms this, and shows that the database query doesn't take place until the very end of the request timeline (I assume this is the request sitting in the queue).
Recently we started including SignalR in a portion of our application. It's not fully in use, but it is in the code base. We have since switched to Azure SignalR Service and saw no change. The addition of SignalR is the only "major" change we've made since encountering this issue.
I understand we can scale up and/or increase minWorkerThreads. However, this feels like I'm just treating the symptom, not the cause.
Things we've tried:
Finding the most frequent requests and making them async (they weren't before)
Response caching to frequent requests
Using Azure SignalR Service rather than hosting it in the same web app
Running memory dumps and contacting Azure support (they found nothing).
Scaling up to an S3
Profiling with and without thread report
-- None of these steps have resolved our issue --
How can we determine what requests and/or code is causing requests to pile up in the CLRThreadPoolQueue?
We encountered a similar problem; I guess internally SignalR must be using up a lot of threads or some other contended resource.
We did three things that helped a lot:
Call ThreadPool.SetMinThreads(400, 1) on app startup to make sure that the thread pool has enough threads to handle all the incoming requests from the start (see the sketch after this list)
Create a second App Service with the same code deployed to it. In the javascript, set the SignalR URL to point to that second instance. That way, all the SignalR requests go to one app service, and all the app's HTTP requests go to the other. Obviously this requires a SignalR backplane to be set up, but assuming your app service has more than 1 instance you'll have had to do this anyway
Review the code for any synchronous code paths (e.g. making a non-async call to the database or to an API) and convert them to async code paths
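A minimal sketch of the first step in a classic ASP.NET MVC Global.asax (the 400/1 values are simply the ones we used; treat them as a starting point to tune, not a recommendation):

using System.Threading;
using System.Web;
using System.Web.Mvc;

public class MvcApplication : HttpApplication
{
    protected void Application_Start()
    {
        // Raise the worker-thread floor so request bursts don't have to wait for the
        // thread pool's gradual thread injection once the minimum is exhausted.
        ThreadPool.SetMinThreads(workerThreads: 400, completionPortThreads: 1);

        AreaRegistration.RegisterAllAreas();
        // ... existing route/bundle/filter registration goes here ...
    }
}

Keep in mind that SetMinThreads only changes how eagerly the pool injects threads; converting the blocking code paths (the third point above) is what actually reduces how many threads are needed.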

Unity DI misbehaving on async operations

We have a legacy multi-tenant system (each tenant has its own database, but they all share the same code and VMs) built on ASP.NET MVC 4. Due to some performance problems, we started to change it a little, piece by piece.
One thing we did was to introduce DI/IoC using Unity. So far the only thing registered in the container is the EF DbContext, using PerRequestLifetimeManager; no other items are registered yet. It is resolved whenever a service or controller gets instantiated.
Another thing we did was to make some operations async... we plan to make them all async, but we're going one by one.
Our in-house tests were successful and we deployed to production.
After a few hours of real traffic, we started to notice some problems: inexplicable errors where the system reported a bunch of things whose root cause was "Id does not exist". These were very far apart for any specific tenant (happening less than 10 times a day per tenant, against average usage of 3k operations per day per tenant), but in total this became very concerning. Capturing the failing operation and executing it manually always returned the expected result.
By mistake, one dev at some point had logged the full connection string EF was using, and to our surprise the wrong database was being hit! Client A was indeed trying to read something from client B's database!
Looking all over the place, we went through TransactionScopeAsyncFlowOption.Enabled and <add key="aspnet:UseTaskFriendlySynchronizationContext" value="true" /> but the errors are still happening...
We figured there are 2 possible locations for the root cause here: either we're screwing up when creating the DbContext, or Unity is handing out the wrong instance when we call Resolve.
Because the creation logic did not change at all and this was not happening before, we believe that Unity is at fault here.
It's important to note that, as far as we know, no sync operations that use DI (about 95% of the system now; 5% are async ops) have ever had such problems.
Does anyone have any idea what might be going on?
Details:
-Hosted on Azure App Services, framework version 4.6
-Horizontal scale. But this happens even when only 1 instance can handle the load
Resolve everything up front (in the constructor, for example) and it will all be fine...
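A minimal sketch of what "resolve up front" looks like with constructor injection (the types are made up; the point is that Unity builds the DbContext while the original request's HttpContext is still current, rather than a Resolve call happening later inside an async continuation):

using System.Data.Entity;
using System.Threading.Tasks;
using System.Web.Mvc;

// Hypothetical entity and EF context, standing in for the tenant-specific
// DbContext registered with PerRequestLifetimeManager.
public class Order { public int Id { get; set; } }

public class TenantDbContext : DbContext
{
    public DbSet<Order> Orders { get; set; }
}

public class OrdersController : Controller
{
    private readonly TenantDbContext _db;

    // The context is injected (and therefore resolved) when the controller is built,
    // at the very start of the request.
    public OrdersController(TenantDbContext db)
    {
        _db = db;
    }

    public async Task<ActionResult> Details(int id)
    {
        // The context was captured before any await, so continuations keep using
        // the same (correct-tenant) instance.
        var order = await _db.Orders.FindAsync(id);
        if (order == null)
        {
            return HttpNotFound();
        }
        return View(order);
    }
}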

MVC3 site running fast locally, very slow on live host

So I've been running into some speed issues with my site which has been online for a few weeks now. It's an MVC3 site using MySQL on discountasp.net.
I cleaned up the structure of the site and got it working pretty fast on my local machine, around 800-1100ms to load with no caching. The strange thing is, when I visit the live site I get times of around 15-16 seconds, sometimes freezing up for as long as 30 seconds. I switched off the viewstate in web.config, and now the local site loads in 1.3 seconds (yes, oddly a little longer) and the live site is down to 8-9 seconds most of the time, but that's still pretty poor.
Without making this problem too specific to my case (since there can be a million reasons sites go slow), I am curious whether there are any reasons why load times on the local Visual Studio server or IIS Express would be so fast while the live site runs so slow. Wouldn't anything code-wise or dependency-wise affect both equally? I just can't think of a reason that would affect the live site but not the local one.
Any thoughts?
Further thoughts: I have the site set up as a sub-folder which I'm using IIS URL Rewriting to map to a subdomain. I've not heard of this causing issues before, but could this be a problem?
Further further updates: I uploaded a simple page that does nothing but query all the records in the largest table I have, with no caching. On my local machine it averages around 110ms (which still seems slow...), and on the live site it's usually over double that. If I'm hitting the database several times to load the page, it makes sense that this would heavily affect the page load time. I'm still not sure if the issue is with LINQ or MySQL or MVC in general (maybe even discountasp.net).
I had a similar problem once and the culprit was the initialization of the user session. Turns out a lot of objects were being read/write to the session state on each request, but for some reason this wasn't affecting my local machine (I probably had InProc mode enabled locally).
So try adding an attribute to some of your controllers and see if that speeds things up:
// Requires 'using System.Web.Mvc;' and 'using System.Web.SessionState;'
[SessionState(SessionStateBehavior.Disabled)]
public class MyController : Controller
{
    // ... actions that never touch Session ...
}
On another note, I ran some tests, and surprisingly, it was faster to read some of those objects from the DB on each request than to read them once, then put them in the session state. That kinda makes sense, since session state mode in production was SqlServer, and serialization/deserialization was apparently slower than just assigning values to properties from a DataReader. Plus, changing that had the nice side-effect of avoiding deserialization errors when deploying a new version of the assembly...
By the way, even 992ms is too much, IMHO. Can you use output caching to shave that off a bit?
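For reference, output caching in MVC 3 is just an attribute on the action; a small sketch (controller, action, and the 60-second duration are only examples):

using System.Web.Mvc;

public class ProductsController : Controller
{
    // Caches the rendered output on the server for 60 seconds, varied by the id parameter.
    [OutputCache(Duration = 60, VaryByParam = "id")]
    public ActionResult Details(int id)
    {
        // LoadDetails stands in for whatever data access the real page does.
        return View(LoadDetails(id));
    }

    private object LoadDetails(int id)
    {
        return new { Id = id };
    }
}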
So, as I mentioned above, I had caching turned off for development, but only on my local machine. What I didn't realise was that there was a problem WITH the caching that was turned on for the LIVE server, which I never turned off because I thought it was helping fix the slow speeds! It all makes sense now :)
Fixing my cache issue (I was caching an IQueryable<> at the top of a dataset that was supposed to cache the entire table... >_>), my speeds have increased 10-fold.
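For anyone hitting the same trap, a small sketch of the difference (the Product type and the data source are made up; the key point is that an IQueryable<> is a query definition, not query results):

using System.Linq;
using System.Web;

public class Product { public bool IsActive { get; set; } }

public static class ProductCacheExample
{
    // Bug: storing the IQueryable caches the *query*, so every consumer that
    // enumerates it still round-trips to the database.
    public static void CacheTheQuery(IQueryable<Product> products)
    {
        HttpContext.Current.Cache["products"] = products.Where(p => p.IsActive);
    }

    // Fix: materialize the results once with ToList(), then cache the list itself.
    public static void CacheTheResults(IQueryable<Product> products)
    {
        HttpContext.Current.Cache["products"] = products.Where(p => p.IsActive).ToList();
    }
}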
Thanks to everyone who assisted!

ASP.NET MVC App takes ages to respond sometimes

I have an ASP.NET MVC app running in IIS 7. Sometimes when I type the URL into a browser it takes 5-10 seconds before it responds; after that it is fine. It's like it's taking its time starting up / waking up.
How should I best proceed with trying to identify the problem?
This is probably normal behaviour rather than a problem. The application is likely going to sleep because it hasn't had any requests for a while. You can try changing the Idle Time-out under the application pool's advanced settings, or, if you're running .NET 4.0, you can keep the application always running.
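Roughly what the relevant settings look like in an applicationHost.config excerpt on IIS 7.5 or later (pool and site names are placeholders; preloadEnabled on IIS 7.5 additionally requires the Application Initialization module):

<applicationPools>
  <!-- Keep the worker process alive instead of shutting it down after 20 idle minutes -->
  <add name="MyAppPool" startMode="AlwaysRunning">
    <processModel idleTimeout="00:00:00" />
  </add>
</applicationPools>
<sites>
  <site name="MySite">
    <!-- Warm the application up as soon as the pool starts -->
    <application path="/" applicationPool="MyAppPool" preloadEnabled="true" />
  </site>
</sites>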
If this is happening right after you've edited ASPX files or rebuilt other .NET sources (e.g., *.cs), then it's probably because of the JIT native code generation of the .NET code. This can be solved by a warm-up utility like the (currently defunct, sadly) IIS warm-up module.
You can use ASP.NET MiniProfiler to check what takes too long to load. Re-saving web.config causes your application to restart, and if your website is not precompiled it takes longer to load as well, as does a heavy database call during startup. In other words, there's no general solution for this; it depends on how you designed the application, and then on improving whatever turns out to be inefficient.

Web App Performance Problem

I have a website that hangs every 5 or 10 requests. When it works, it works fast, but if you leave the browser sitting for a couple of minutes and then click a link, it just hangs without responding. The user has to hit refresh a few times in the browser, and then it runs fast again.
I'm running .NET 3.5 and ASP.NET MVC 1.0 on IIS 7.0 (Windows Server 2008). The web app connects to a SQL Server 2005 DB that runs locally on the same instance. The DB has about 300 MB of RAM, and the rest is free for web requests, I presume.
It's hosted on GoGrid's cloud servers, and this instance has 1GB of RAM and 1 Core. I realize that's not much, but currently I'm the only one using the site, and I still receive these hangs.
I know it's a difficult thing to troubleshoot, but I was hoping someone could point me in the right direction as to possible IIS configuration problems, or what the "rough" average hardware requirements would be for these technologies per 1000 users, etc. Maybe for a web server the minimum I should have is 2 cores, so that if it's busy you still get a response. Or maybe the Slashdot people are right and I'm an idiot for using Windows, period, lol. In my experience, though, it's usually MY algorithm/configuration error and not the underlying technology's fault.
Any insights are appreciated.
What diagnostics are available to you? Can you tell what happens when the user first hits the button? Does your application see that request and then take ages to process it, or is there a delay and then your app gets going and works as quickly as ever? Or does that first request just get lost completely?
My guess is that there's some kind of paging going on; I believe Windows has a habit of paging out apps that haven't been used recently and then paging them back in on demand. Is that happening to your app, or to the DB, or both?
As an experiment, what happens if you add a sneaky little "howAreYou" page to your app? It should do the tiniest possible amount of work, such as getting a usage count from the DB and displaying it. Have a little monitoring client hit that page every minute or so, and measure performance over time. Spikes? Consistency? Does the very presence of activity maintain your application's presence and prevent paging?
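A sketch of what such a "howAreYou" page could look like as an MVC action (the controller name and the count query are placeholders):

using System.Web.Mvc;

public class HealthController : Controller
{
    // Hit this from an external monitor every minute or so; it does the smallest
    // realistic unit of work (one cheap DB read) and reports how long it took.
    public ActionResult HowAreYou()
    {
        var timer = System.Diagnostics.Stopwatch.StartNew();
        int userCount = GetUserCount(); // placeholder for a trivial query against your DB
        timer.Stop();
        return Content("OK - users: " + userCount + ", ms: " + timer.ElapsedMilliseconds);
    }

    private int GetUserCount()
    {
        // Stand-in for e.g. "SELECT COUNT(*) FROM Users"
        return 0;
    }
}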
Another idea: do you rely on any caching? Do you have any kind of aging on that cache?
Your application pool may be shutting down because of inactivity. There is an Idle Time-out setting per pool, in minutes (it's under the pool's Advanced Settings - Process Model). It will take some time for the application to start again once it shuts down.
Of course, it might just be the virtualization like others suggested, but this is worth a shot.
Is the site getting significant traffic? If so I'd look for poorly-optimized queries or queries that are being looped.
Your configuration sounds fine assuming your overall traffic is relatively low.
Too many database connections that aren't being released?
A call to some service/component that is causing a timeout?
Resources not being released properly?
Network traffic?
Queries being looped, or loops in the code logic?
