Windows Azure WebRole stuck in a deployment loop - asp.net-mvc

I've been struggling with this one for a couple of days now. My current Windows Azure WebRole is stuck in a loop where the status keeps changing between Initializing, Busy, Stopping and Stopped.
It never goes live, and I can never see the website as a result. The WebRole is an "out of the box" MVC 2 application with Copy Local set to true on the Mvc dll. I haven't even tried hooking up storage or a WorkerRole yet, and there is nothing happening inside the Start method that I can see would crash.
I've really tried going back to basics to ensure nothing can complicate the process and the website launches without a problem on the Dev Fabric and yes it looks just like the standard "Home", "About" MVC app - just can't get it running in the cloud!
Funny thing is, a few days ago, this exact package worked on the staging area in the cloud, and I could even see it in the browser - but could never get it swapped over to production, so I deleted everything and started from scratch, and now I can't even get it running on staging...
Does anyone have any ideas on what I could do to diagnose this problem myself? Since I logged it on the forums 2 days ago, there has been no improvement or feedback.
Any help appreciated,
Regards,
Rob G

Turns out there are a number of things that can cause this to happen. A full thread on the Microsoft forums goes through most of them and details my adventures in the arena.
http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/1482c1af-16e3-46ca-846e-14f511c35750
Hope this helps...

I think the best starting point is enabling remote desktop on all role instances.
It saves a lot of heartache wondering why the diagnostics aren't logging anything.
By remoting in you can eyeball the event logs and find lots of reasons for Azure unhappiness.

Related

Upgrading from .net 5 to .net 6 causes timeouts for external http(s) calls via webclient

We decided to upgrade our website's ASP.NET Core code from .net5 to .net6. We simply changed the 'target framework' of the web application from .net5 to .net6. There were no compilation errors, we gave it a test in our development environment, and all seemed well.
There were no code changes at all made, and previously the .net5 application has been running for many months without issue (and before that .net framework 4.8).
When we deployed our app to our live production environment, within a few minutes we noticed a slowdown of external calls (calls to https endpoints, often REST-like). We log any calls that take more than 5 seconds, and over the space of a few minutes all calls went from slow to timing out (20 seconds).
We are using System.Net.WebClient for all of our calls, which I understand is now obsolete in .net6, however, I would not expect this to suddenly change behavior, and even so, we attempted to change to HttpClient, the recommended approach, with the same results.
I feel like I must be missing something really fundamental, we just upgraded the target framework and redeployed and now all calls made by WebClient eventually timeout.
It feels like a "running out of resources" issue, in code, due to the slow down then timeout, but I am at a loss to explain what is going on here.
To be clear, we are not doing anything special, just calling about 3 external services via WebClient for each user, and we have maybe 100 users a minute at peak, previously, there have been no timeouts.
Any pointers on what might be causing the timeouts would be greatly appreciated.
I guess time will tell if this is the answer, but we changed all of our calls to use DownloadStringTaskAsync and UploadStringTaskAsync, i.e. all calls from blocking to async await, and after 24hrs, we have not seen the same behaviour in our live environment under full load.
Why a web app using .net 5 would not have these issues but .net 6 would is hard to understand. For context, we are not under crazy high load: we are talking about a peak of perhaps 150 users per minute, but that is what we are seeing.
Perhaps it was something specific to our set up, but I am writing this to save someone else the pain of trying to debug this issue in the future.
That is suspicious and unexpected. If you have an HttpClient repro, can you please post it on GitHub at https://github.com/dotnet/runtime/issues? (Ideally a minimal repro we can run locally for debugging.)
If your repro is not transferable to another machine, or requires specific endpoints you can't expose, we may have to guide you through some local debugging ...
-Karel (.NET Networking team)

Production website becomes unresponsive on certain pages

I have a weird issue that just started popping up for our customers. The portal they've been using for years has started freezing on some of the pages that the user navigates to. I tried restarting the IIS server, the site within it, and the application pool under which the site is running. No difference.
In Chrome Dev Tools I can see that it is always one of these three calls that take time to complete:
When it happens, one of those three calls will report that the request is not finished, like this:
When eventually the call completes, I can see that the Content Download took 3.8 minutes. Not sure whether it is relevant or not, but it is always 3.8 minutes:
Has anyone else encountered a similar situation? Is there a suggestion on how to figure out what is suddenly triggering this type of behaviour?
TIA,
Ed
Edit: The resource that fails to load after 3.8 minutes always generates a net::ERR_CONNECTION_RESET error:
Edit2: Thanks to all of you trying to help. A little update: I was able to isolate the problem to the server not serving some of the files, either *.css or *.js. The setup is two identical servers placed behind a load balancer. Apparently, the load balancer software was recently updated, and right after that we started having these issues. I am working closely with the IT department of our client, trying to figure out what it is about the newer version that seems to have triggered all this drama.

Rails 3 - All users stuck while someone is processing

We have a web-app here that makes hundreds of calculations with money and other very specific numbers. It has been developed in Rails 3, with which I don't have much experience.
Recently, I have noticed that even on the Digital Ocean server where it runs in production, whenever a user requests a page that runs any query against the database, everyone else gets stuck until the processing finishes.
It just looks like rails is running on a "single-threaded" mode, similar to running the same scenario with Tomcat in development mode in Eclipse.
Is there any trick to make it work "multi-threaded"?
I don't see any reason or even logic for it to run like this, there's no sense in making everyone wait until the request of another user finishes.
Thanks in advance to everyone!
Well, I've found the solution.
To enable Rails with WEBrick to work in a "multi-threaded" way, you'll have to simply go to the config file, and add :
# Enable threaded mode
config.threadsafe!
After that, restart the server and the trick is done. It "solved" my problem, but ended up creating a thousand others, because the application wasn't developed with a "threadsafe" scenario in mind. That means that every time I try to run similar requests at the same time, the application mixes up everything in memory and crashes.
Well, one problem at a time!
The original problem from this topic is solved.
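To illustrate the kind of "mixing up in memory" described above, here is a minimal sketch in plain Ruby (no Rails required). The class names `UnsafeTotals` and `SafeTotals` and the helper `run_concurrently` are hypothetical and are not taken from the application in question. The sketch assumes the problem is shared mutable state being read and written by several request threads at once, and shows one common remedy, guarding the read-modify-write with a `Mutex`.

```ruby
# Hypothetical stand-in for state shared between requests, for example a
# class-level accumulator that several controller actions update.
class UnsafeTotals
  attr_reader :total

  def initialize
    @total = 0
  end

  def add(amount)
    current = @total            # read the shared value
    sleep(0.001)                # simulate work; other threads can run here
    @total = current + amount   # write back a result based on a stale read
  end
end

# Same logic, but the whole read-modify-write is guarded by a Mutex so only
# one thread at a time can execute it.
class SafeTotals
  attr_reader :total

  def initialize
    @total = 0
    @lock = Mutex.new
  end

  def add(amount)
    @lock.synchronize do
      current = @total
      sleep(0.001)
      @total = current + amount
    end
  end
end

# Calls totals.add(1) from several threads at once and returns the final total.
def run_concurrently(totals, threads: 10)
  Array.new(threads) { Thread.new { totals.add(1) } }.each(&:join)
  totals.total
end

unsafe_result = run_concurrently(UnsafeTotals.new)
safe_result   = run_concurrently(SafeTotals.new)

puts "unsafe total: #{unsafe_result} (updates can be lost)"
puts "safe total:   #{safe_result}"
```

The unguarded version can lose updates because threads overwrite each other's results, while the guarded version always ends with one increment per thread. In a real application the more robust fix is usually to avoid sharing mutable state between requests in the first place; a lock is only a stopgap for state that genuinely has to be shared.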

Recurring job in Hangfire works intermittently

I have 3 websites configured in IIS which use the same application pool. Each uses the same code base (the database is different for each client) and executes a Hangfire recurring job each day. For 2 of the websites I don't have any problems, but for one of them the job does not run every day. Since the job starts immediately when a user accesses the website, this makes me think that the application pool is suspended and is "awakened" when the user accesses the website.
I have already implemented the instructions at http://docs.hangfire.io/en/latest/deployment-to-production/making-aspnet-app-always-running.html so that the application is always running. As I mentioned, it works fine for the other 2, and it is just 1 website where it does not always work. Has anybody else encountered this before? Or is Hangfire showing signs of instability, where the same code runs perfectly fine for 2 sites and intermittently for 1?
Thanks
I asked this question on the Hangfire forum and someone suggested that the server itself did not have enough resources to run everything and would force-sleep inactive apps even when told not to in the config. There was nothing to either support or contradict this, but I thought it was plausible, as the problem mostly occurred on weekends. What I am doing now is pinging the application every hour so the application pool remains active. This mechanism is incorporated within the website and is also scheduled through Hangfire. This has solved the problem and I have not had a single failure since. See https://discuss.hangfire.io/t/recurring-job-does-not-run-sometimes/1860 for further details.

Rails app stuck

I am running a rails app on Dreamhost.
Today, a strange thing happened.
A page is almost loaded (it seems to be fully loaded but the status is not 'Done') and after that, the app didn't respond on any page.
I checked out the log and even the log was not complete.
How do I know it?
There are 3 missing images on the problem page and the log showed only 2 missing images and stopped there.
So I guess that something happened between the 2nd and the 3rd missing images.
I couldn't even start 'script/console production'.
After 14 minutes, it began to behave normally.
I asked the hosting company and they said that the process was killed due to over-use of memory.
Probably something was running heavily during the period.
The same thing happened one more time.
I had to kill the process to unlock the stuck app.
Passenger version is 2.2.4 and rails version is 2.3.2.
I am afraid that I can't give more specific info.
What do you think could cause such a problem?
Thanks.
Sam
As theIV stated, look at the last action called. Start the app up locally and try to go through what was happening on the server to see if it's reproducible, or if you just get any sort of general hiccups. I've run Rails apps on Dreamhost for a while and have not experienced this before, so I would guess that it's not Dreamhost's fault, but I can't be 100% sure of that.
Good luck!
This sounds pretty app specific. I would start by looking at what action was last hit before the process started hoggin' and then work backwards from there to see if there are any calls that might be doing something you weren't expecting. Other than that, no clue. :(
Try using NewRelic RPM or TuneUp Lite to see what process is using most of your memory. You can run them locally, but it would be better to test on production.
