Production website becomes unresponsive on certain pages - asp.net-mvc

I have a weird issue that just started popping up for our customers. The portal they've been using for years has started freezing on some of the pages that users navigate to. I tried restarting the IIS server, the site within it, and the application pool under which the site is running. No difference.
In Chrome Dev Tools I can see that it is always one of these three calls that takes a long time to complete:
When it happens, one of those three calls will report that the request is not finished, like this:
When eventually the call completes, I can see that the Content Download took 3.8 minutes. Not sure whether it is relevant or not, but it is always 3.8 minutes:
Has anyone else encountered a similar situation? Is there a suggestion on how to figure out what is suddenly triggering this type of behaviour?
TIA,
Ed
Edit: The resource that fails to load after 3.8 minutes always generates a net::ERR_CONNECTION_RESET error:
Edit2: Thanks to all of you trying to help. A little update: I was able to isolate the problem to the server not serving some of the files, either *.css or *.js. The setup is two identical servers placed behind a load balancer. Apparently, the load balancer software was recently updated, and right after that we started having these issues. I am working closely with the IT department of our client, trying to figure out what in the newer version seems to have triggered all this drama.
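One way to narrow this kind of problem down is to request the same static files from each backend server directly and through the load balancer, then compare how long the headers and the body take to arrive. The following is only a minimal sketch under stated assumptions: the host names and file paths in the usage note are hypothetical placeholders, and the function names `time_download` and `compare_backends` are invented here for illustration.

```python
import http.client
import time
import urllib.request

# Opener that ignores any system proxy so requests go straight to the host.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def time_download(url, timeout=30.0):
    """Fetch url; return (status, bytes_read, seconds_to_headers, seconds_total)."""
    start = time.monotonic()
    with _DIRECT.open(url, timeout=timeout) as response:
        headers_at = time.monotonic()   # status line and headers received
        body = response.read()          # the "Content Download" phase
        status = response.status
    done = time.monotonic()
    return status, len(body), headers_at - start, done - start


def compare_backends(base_urls, paths, timeout=30.0):
    """Time every path against every base URL; record failures instead of raising."""
    results = []
    for base in base_urls:
        for path in paths:
            url = base.rstrip("/") + "/" + path.lstrip("/")
            try:
                status, size, to_headers, total = time_download(url, timeout)
                results.append((url, status, size, to_headers, total, None))
            except (OSError, http.client.HTTPException) as exc:
                # Covers timeouts, connection resets, HTTP errors and truncated bodies.
                results.append((url, None, 0, None, None, repr(exc)))
    return results
```

A call such as `compare_backends(["http://server-a.example", "http://server-b.example", "http://portal.example"], ["/Content/site.css", "/Scripts/app.js"])` (all names hypothetical) returns one row per URL. If the direct requests finish quickly while the same file stalls only through the load balancer address, that points at the load balancer rather than IIS.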

Related

Recurring job in Hangfire works intermittently

I have 3 websites configured in IIS which use the same application pool. Each uses the same code base (by nature the database is different for each client) and executes a Hangfire recurring job each day. For 2 of the websites I don't have any problems, but for one of the websites the job does not run every day. Since the job starts immediately when a user accesses the website, this makes me think that the application pool is suspended and is "awakened" when the user accesses the website.
I have already implemented the instructions at http://docs.hangfire.io/en/latest/deployment-to-production/making-aspnet-app-always-running.html so that the application is always running. As I mentioned, it works fine for the other 2, and it is just 1 website where it does not always work. Has anybody else encountered such things before? Or is Hangfire showing signs of instability, where the same code runs perfectly fine for 2 sites and intermittently for 1?
Thanks
I asked this question on the Hangfire forum and someone suggested that the server itself did not have enough resources to run everything and would force-sleep inactive apps even when told not to in the config. There was no evidence to either support or contradict that, but I thought it plausible because the problem mostly occurred on weekends. What I am doing now is pinging the application every hour so the application pool remains active. This mechanism is incorporated within the website and is also scheduled through Hangfire. This has solved the problem and I have not had a single failure since. See https://discuss.hangfire.io/t/recurring-job-does-not-run-sometimes/1860 for further details.
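The answer above keeps the pool awake from inside the website via Hangfire. For comparison, the same idea can be sketched as a small external pinger that requests a URL once per hour. This is an illustration only, not the author's implementation: the function names `ping` and `keep_alive` and the URL in the usage note are assumptions.

```python
import http.client
import time
import urllib.request

# Opener that ignores any system proxy so the ping reaches the site directly.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def ping(url, timeout=10.0):
    """Return True if url answers with a 2xx status within the timeout."""
    try:
        with _DIRECT.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Timeouts, refused or reset connections, and HTTP error statuses.
        return False


def keep_alive(url, interval_seconds=3600.0, rounds=None, sleep=time.sleep):
    """Ping url every interval_seconds; rounds=None loops forever.

    Returns the number of failed pings when rounds is finite.
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        if not ping(url):
            failures += 1
            print("keep-alive ping failed:", url)
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_seconds)
    return failures
```

Running `keep_alive("http://client-site.example/")` (hypothetical URL) from a scheduled task on another machine would keep requests arriving even when the site itself has been suspended, which an in-process scheduler cannot do once its own worker process has stopped.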

Why is my Rails app hanging?

I am using Rails 4.2 and Ruby 2.2.0. Recently, I have noticed that my app will hang for long periods of time when a certain worker is working. This is an app that has been under development for several years and I have never seen this sort of hanging behavior before (nor am I unfortunately completely aware of what changes I've made since it was last not hanging).
What I mean by "hang" is: No pages will load (no controllers can be reached, and even the default homepage which is basically a static page won't load). None of my logs will update. Standard output is not being written to. The application also has many background worker processes (managed by Resque), which also appear to be hung.
I have looked at top and netstat and various other Unix utilities and I don't see anything alarming there, either.
Finally, I'm not sure if it's worth mentioning but this application is running via Foreman on an EC2 instance.
EDIT: I don't believe that it is query related, because nothing abnormal shows up on top. Also, I'm a little inexperienced with rails, but each job has its own entry in the process table, if that helps.

Rails: long delays AFTER webrick renders page before browser displays page (mac development machine)

Does anyone have any experience with page display in Rails development erratically slowing down and speeding up (the page appears 15-20 seconds after the console says the entire page has been rendered)?
My development environment is rails 3.0.17 on Mac (Lion), WEBrick 1.3.1, ruby 1.9.2, with Postgres v11 (app supplied by Heroku) for the development database.
Recently, I've noticed some very long delays loading pages... 15 to 20 seconds at times, and the delay is completely unrelated to the complexity of the page, and a given page might load fast several times, then load slow. It might be pretty bad for several minutes, then go away for an hour.
And whether the page loads slow or fast, the rails log always shows the page has rendered fairly quickly... I'll see something like "Completed 200 OK in 486ms" but then the browser might say "waiting for localhost" for another 15 seconds before displaying the page.
Does not seem to depend on the browser (FF or Safari act the same).
Does not seem to relate to updating the code base. I can be testing some UI elements and everything is snappy, then suddenly several pages hang for 15-120 seconds.
Still happens even after I added the gem 'rails-dev-boost' to my Gemfile in development.
I also added to my development machine gem 'http_logger' to log calls to external http requests like S3, but don't see that as a factor (the delays often happen on pages that don't access external APIs).
My layouts have a couple of external .js dependencies such as http://static.twilio.com/libs/twiliojs/1.0/twilio.min.js, but I have added that to the bottom of my layout so I would (presumably) still see the page contents quickly if that were being delayed. (Besides, it is probably cached by my browser.)
Of course I have tried rebooting the machine, and verified that Activity Monitor shows normal CPU and memory usage. It does not seem to correlate with when Time Machine is doing its thing.
MORE INFO: per the suggestion in the comment from AKG below, I used Firebug's web console. Most of the time the stylesheets load in a couple of milliseconds, but when the slowdown occurs I'm seeing delays of 8-20 seconds for most of the stylesheets... which suggests WEBrick is failing to serve the files in a timely manner?
It could be that buggy/slow middleware is executing after the render is called, but before the response is sent. I recently ran into this issue with pauses of more than 1 minute in some cases, and the culprit turned out to be the Bullet gem, which is designed to help us detect N+1 queries. I removed the middleware and the issue went away!
I got the same problem a few times. I switched to Mongrel. That automatically resolved the issue. However, I got this with Rails 2.3.8.

My app fails to connect to the server sometimes

I've been helplessly observing this problem for a couple months now, and have decided this is my best shot.
I'm not sure what the cause of the problem is, but I can list some of the things I'm doing. I have an iOS app that uses AFNetworking to connect to a remote server hosted by Google App Engine using HTTP POST requests.
Now, everything works great, but sometimes, very sporadically and at random, I get failed requests. The activity indicator spins and spins for about a minute, and I get no feedback at the end, just a failed request. I check my server logs, and I don't see any errors. After the failed request, I try again, and it works fine. It works fine for the whole day. And then at another random time the issue repeats itself, sometimes spinning for 10 seconds before failing, sometimes for a minute.
Generally, what can possibly be the cause of this? Is it normal to have some failed connections randomly? Is that something on my part?
But the weird thing is that while the app is running on my iPhone, with the indicator spinning as it tries to connect, I can connect from the iOS simulator and the connection works just fine. I try again on the iPhone, and it doesn't work.
If I close the app completely and start again, then it works again. So it sounds like it may be a software issue rather than a connection issue, but then again I have no evidence or data whatsoever.
I know it's vague, but I'm hoping someone may have had a similar problem. Anything helps.
There is a known issue with instance start on GAE for Java. You can star the issue at http://code.google.com/p/googleappengine/issues/detail?id=7706.
The same problem was reported for Python, but there it is not such a big problem.
I think you should check the logging level you use on App Engine and monitor all your calls. Instance start usually takes more time, so you will be able to see how much time is spent on start and whether it really is a timeout problem.
For the Java version you could try changing the log level to debug:
.level = FINE
in your logging.properties file. (java.util.logging has no DEBUG level; FINE is its closest equivalent.) It will give you more information about the instance start process.

Windows Azure WebRole stuck in a deployment loop

I've been struggling with this one for a couple of days now. My current Windows Azure WebRole is stuck in a loop where the status keeps changing between Initializing, Busy, Stopping and Stopped.
It never goes live, and as a result I can never see the website. The WebRole is an "out of the box" MVC 2 application with Copy Local set to true on the Mvc dll. I haven't even tried hooking up storage or a WorkerRole yet, and there is nothing happening inside the Start method that I can see would crash.
I've really tried going back to basics to ensure nothing can complicate the process and the website launches without a problem on the Dev Fabric and yes it looks just like the standard "Home", "About" MVC app - just can't get it running in the cloud!
Funny thing is, a few days ago, this exact package worked on the staging area in the cloud, and I could even see it in the browser - but could never get it swapped over to production, so I deleted everything and started from scratch, and now I can't even get it running on staging...
Does anyone have any ideas on what I could do to diagnose this problem myself? Since I logged this problem on the forums 2 days ago, there has been no improvement or feedback.
Any help appreciated,
Regards,
Rob G
Turns out there are a number of things that can cause this to happen. A full thread on the Microsoft forums goes through most of them and details my adventures in the arena.
http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/1482c1af-16e3-46ca-846e-14f511c35750
Hope this helps...
I think the best starting point is enabling remote desktop on all role instances.
It saves a lot of heartache wondering why the heck the diagnostics aren't logging anything.
By remoting in you can eyeball the event logs and find lots of reasons for Azure unhappiness.

Resources