We have a website built with ASP.NET MVC 4, running on IIS on Windows Server 2012 and using SQL Server 2012 as data storage. Connections are made through Entity Framework 6... very standard stuff.
We are not a high-volume website (at most 3,000 users around the world, so they hit it from different time zones).
The issue is that sometimes, without warning, the site becomes unresponsive (the browser shows nothing and eventually times out). Nothing special so far, but here are the strange parts:
The server itself works fine if you connect to it over Remote Desktop (terminal services).
Restarting IIS does not help, and there are no error logs.
SQL Server shows around 100 connections from the website, all sleeping (but killing these processes does not make the site recover).
At the time, SQL Server shows half of them as waiting tasks, yet it is still responsive when executing SQL from SSMS or even remotely from Excel (remote reporting).
Looking at SQL Profiler, the website is still sending SQL requests despite being down, but they are all requests like this: if db_id('dbname') is not null else select... (not something explicitly written in the website; this looks like Entity Framework's own database-existence check that runs during context initialization).
And the really strange one: if we restart SQL Server, the website becomes responsive again.
I know this is not a lot to go on, but we are very puzzled and don't really know how to proceed. Nothing indicates an error in any kind of log (website, IIS, SQL Server or Windows). I can deduce that the website must think SQL Server cannot give it what it needs, because the connection pool or something like it is used up (100 also happens to be ADO.NET's default Max Pool Size). But why it is only freed up by a complete SQL Server restart, and not by killing the processes, really puzzles me, and so does why the connection pool builds up in the first place, since all SQL is handled by Entity Framework.
Any advice on how to debug further is most welcome
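In case it helps with suggestions: here is a minimal sketch of how we could test the pool-exhaustion theory from inside the web process next time it hangs. It assumes the default System.Data.SqlClient provider under EF6; the connection string is a placeholder, and it has to run inside the web app itself (e.g. behind an admin-only page), since connection pools are per-process:

    using System;
    using System.Data.SqlClient;

    public static class PoolProbe
    {
        public static string Run()
        {
            // Placeholder; in practice read the real string from web.config.
            var cs = "Server=.;Database=dbname;Integrated Security=true;Connect Timeout=5";

            // Throw away every pooled connection held by this process. If the site
            // recovers after this alone, the pool was exhausted by leaked connections
            // on our side rather than by anything inside SQL Server.
            SqlConnection.ClearAllPools();

            using (var conn = new SqlConnection(cs))
            using (var cmd = new SqlCommand("SELECT 1", conn))
            {
                conn.Open();   // forces a fresh physical connection
                return "SQL answered: " + cmd.ExecuteScalar();
            }
        }
    }

If ClearAllPools() revives the site without a SQL Server restart, we'd know to hunt for connections that are opened and never disposed.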
I have a Windows service running using Topshelf. The service makes a lot of SQL Server queries. When the host computer is restarted, this almost always causes errors in my service because SQL Server stops in the middle of one of my queries. I've been asked to fix this so the logs won't fill up with errors, since these computers are restarted frequently.
Topshelf has some built-in WhenShutdown logic that you can use to run code when the computer is shut down or restarted (my wiring is sketched below), but there is still no guarantee that my service will stop before SQL Server, and judging by the error frequency it pretty much always stops after it. I have also tried using Topshelf's WhenCustomCommandReceived to listen for the Windows PreShutdown event as shown here, but my tests (logging any custom command received and then rebooting my machine) show no logs. I also tried adding SQL Server as a dependency of my service, but that still doesn't guarantee mine will stop before SQL Server.
I have also tried adding the logic from this solution, but again I never see any logs indicating that the code is even executed on a restart. Any tips on how I can better solve this issue?
tl;dr: how do I ensure my Topshelf service stops before SQL Server on a computer restart/shutdown?
Thanks!
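For reference, here is a simplified sketch of the wiring described above, plus the fallback I'm considering: a flag set during shutdown so the query loop can downgrade "SQL Server went away" errors instead of logging them as failures. MyService, its loop and the service name are placeholders:

    using System;
    using Topshelf;

    public class MyService
    {
        // Set once shutdown begins; the query loop checks it and logs
        // SQL errors as info instead of errors while it is true.
        public volatile bool ShuttingDown;

        public void Start() { /* start the polling/query loop */ }

        public void Stop()
        {
            ShuttingDown = true;   // tell the loop to wind down quietly
            /* stop the loop */
        }
    }

    class Program
    {
        static void Main()
        {
            HostFactory.Run(x =>
            {
                x.Service<MyService>(s =>
                {
                    s.ConstructUsing(name => new MyService());
                    s.WhenStarted(svc => svc.Start());
                    s.WhenStopped(svc => svc.Stop());
                    // Invoked on machine shutdown/restart (needs EnableShutdown below).
                    s.WhenShutdown(svc => svc.Stop());
                });
                x.EnableShutdown();     // opt in to the shutdown notification
                x.DependsOnMsSql();     // declare the dependency on MSSQLSERVER
                x.SetServiceName("MyQueryService");
            });
        }
    }

As noted above, the declared dependency doesn't seem to guarantee stop ordering on shutdown, so the flag-based downgrade of shutdown-time SQL errors may be the pragmatic fix.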
I've got an application that's been working for nine months with no issues, but as of this morning (with no programming changes on my part) I'm getting timeouts with SQL Server error 10060.
Checking the server via SQL Server Management Studio, there is some latency, and I've had the same message there too, though less frequently.
Looking at the portal.azure page, I'm assured that everything is OK.
What should one do when this happens? My ISP looks fine; I've got the same speed as usual.
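For anyone else hitting this: one quick way to characterise the failures is to time repeated raw connection attempts from your own machine. This is only a sketch; the server, database and credentials are placeholders, and Pooling=False forces a real network connect on every attempt instead of reusing a pooled connection:

    using System;
    using System.Data.SqlClient;
    using System.Diagnostics;

    class ConnectProbe
    {
        static void Main()
        {
            // Placeholder server/credentials.
            var cs = "Server=tcp:myserver.database.windows.net,1433;Database=mydb;" +
                     "User ID=user;Password=pass;Encrypt=True;" +
                     "Pooling=False;Connect Timeout=30";

            for (var i = 0; i < 10; i++)
            {
                var sw = Stopwatch.StartNew();
                try
                {
                    using (var conn = new SqlConnection(cs))
                    {
                        conn.Open();
                        Console.WriteLine("attempt " + i + ": connected in " +
                                          sw.ElapsedMilliseconds + " ms");
                    }
                }
                catch (SqlException ex)
                {
                    // Error 10060 here means the TCP connection itself timed out.
                    Console.WriteLine("attempt " + i + ": failed after " +
                                      sw.ElapsedMilliseconds + " ms (error " + ex.Number + ")");
                }
            }
        }
    }

If failures are intermittent from your network but SSMS works fine from another one, that points at the path between you and Azure rather than at the database itself.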
I got the above error message running Heroku Postgres Basic (as per this question) and have been trying to diagnose the problem.
One of the suggestions is to use connection pooling, but it seems Rails has this built in. Another suggestion is that the app is misconfigured and opens too many connections.
My app manages all its connections through Active Record, and I had one direct connection to the database from Navicat (or at least I thought I had).
How would I debug this?
RESOLUTION
Turns out it was a Heroku issue. From Heroku support:
We've detected an issue on the server running your Basic database. While we pinpoint this and address it, we would recommend you provision a new Basic database and migrate over with PGBackups as detailed here: https://devcenter.heroku.com/articles/upgrade-heroku-postgres-with-pgbackups. That should put your database on a new server. I apologize for this disruption – we're working to fix this issue and prevent it from occurring in the future.
This has happened a few times on my app -- somehow there is a connection leak, and then all of a sudden the database is getting ten times as many connections as it should. If you are being swamped by an error like this rather than by traffic, try running this:
heroku pg:killall
That will terminate all connections to the database. Be careful if cutting off in-flight queries could be dangerous in your situation. I just have a Rails app, and if it goes down, losing a couple of queries is not a big deal, because the browser requests will have looooooong since timed out anyway.
You might be able to find out why you have so many connections by inspecting the pg_stat_activity view:
SELECT * FROM pg_stat_activity
Most likely, you have some stray loop that opens new connections without closing them.
To save you the support call, here's the response I got from Heroku Support for a similar issue:
Hello,
One of the limitations of the hobby tier databases is unannounced maintenance. Many hobby databases run on a single shared server, and we will occasionally need to restart that server for hardware maintenance purposes, or migrate databases to another server for load balancing. When that happens, you'll see an error in your logs or have problems connecting. If the server is restarting, it might take 15 minutes or more for the database to come back online.
Most apps that maintain a connection pool (like ActiveRecord in Rails) can just open a new connection to the database. However, in some cases an app won't be able to reconnect. If that happens, you can heroku restart your app to bring it back online.
This is one of the reasons we recommend against running hobby databases for critical production applications. Standard and Premium databases include notifications for downtime events, and are much more performant and stable in general. You can use pg:copy to migrate to a standard or premium plan.
If this continues, you can try provisioning a new database (on a different server) with heroku addons:add, then use pg:copy to move the data. Keep in mind that hobby tier rules apply to the $9 basic plan as well as the free database.
Thanks,
Bradley
I have a very peculiar problem and I'm looking for suggestions that might help me get to the bottom of it.
I have an application in .NET 3.5 (MVC3) on a SQL Server 2008 R2 database.
Locally and on two other servers, it runs fine. But on the live server there is a stored procedure that always times out after 30 seconds.
If I run the stored procedure directly on the database, it takes a couple of seconds. But if the stored procedure is called by the application, Profiler says it took over 30 seconds.
The exact query that Profiler captures runs immediately if we execute it directly on the DB.
Furthermore, the same problem doesn't occur on any of the other three servers.
As you can understand, it's driving me nuts and I don't even have a clue how to diagnose this.
The event logs just show the timeout as a warning.
Has anyone had anything like this before and where could I start looking for a fix?
Many thanks
You probably have some locking taking place in your application that doesn't occur when running the query on the server.
To test this, run your query from your application using READ UNCOMMITTED or the NOLOCK hint (a sketch follows below). If it works, check your sequence of calls, or check whether your isolation level is too aggressive.
These can be tricky to nail down.
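A minimal way to run that test from ADO.NET, treating it purely as a diagnostic (dbo.SlowProc and the connection string are placeholders):

    using System;
    using System.Data;
    using System.Data.SqlClient;

    class TimeoutTest
    {
        static void Main()
        {
            // Placeholder connection string and procedure name.
            var cs = "Server=LIVESERVER;Database=AppDb;Integrated Security=true";

            using (var conn = new SqlConnection(cs))
            {
                conn.Open();
                // READ UNCOMMITTED ignores blocking locks, at the cost of dirty reads.
                using (var tx = conn.BeginTransaction(IsolationLevel.ReadUncommitted))
                using (var cmd = new SqlCommand("dbo.SlowProc", conn, tx))
                {
                    cmd.CommandType = CommandType.StoredProcedure;
                    cmd.CommandTimeout = 60;   // above the 30 s default that's firing now
                    cmd.ExecuteNonQuery();
                    tx.Commit();
                }
            }
        }
    }

If this version returns in a couple of seconds while the normal call still times out at 30, blocking is the likely culprit -- don't ship NOLOCK as the fix, use it to find the blocker.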
I have an Umbraco website that I have to restart every morning in order for the users to be able to publish content. Are there any solutions that would help me get around doing this each morning?
1 - Document why you "have to" restart IIS every morning
like the web app can't re-establish its connection with SQL Server
or one process gets so huge that it's obviously leaking
or one process heats up with huge CPU usage and IIS keeps dropping requests
etc. etc. You have to check log files and the EventLog; SQL Server has its own log as well.
2 - Document the usage patterns of the site
like does it sit idle for 8-10 h or is it busy all night
if it's busy, then log files (including the IIS log) will provide some info on when a problem started
if it's idle for a long time, check that the AppPool for the site has automatic recycling of the worker process set, say after 1 h of inactivity - you can also set different recycling tactics
if it's the SQL connection failing after a long idle period, the Kerberos ticket for the account may have expired
you do have a domain account under which that AppPool runs, I hope
to fix that, look at the DB connection string (normally in web.config) and check MSDN for the params
or bring up a new web site or app that keeps pinging a web method which just does a little query (like a count on some table) and returns the result as a kind of admin heartbeat (sketched below) -- this helps only if you actually see a SQL connection issue
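A minimal sketch of such a heartbeat endpoint, assuming the site is ASP.NET MVC; the controller name and table are placeholders, and the connection-string key is assumed to be Umbraco's usual umbracoDbDSN:

    using System.Configuration;
    using System.Data.SqlClient;
    using System.Web.Mvc;

    // Hitting /heartbeat/ping keeps the worker process warm and tells ops
    // whether the app can still reach SQL Server. All names are placeholders.
    public class HeartbeatController : Controller
    {
        public ActionResult Ping()
        {
            // "umbracoDbDSN" is Umbraco's conventional connection-string name;
            // adjust to whatever web.config actually uses.
            var cs = ConfigurationManager.ConnectionStrings["umbracoDbDSN"].ConnectionString;

            using (var conn = new SqlConnection(cs))
            using (var cmd = new SqlCommand("SELECT COUNT(*) FROM dbo.SomeSmallTable", conn))
            {
                conn.Open();
                return Content("OK " + cmd.ExecuteScalar());  // anything else means trouble
            }
        }
    }

Point any external monitor (or a scheduled task on another box) at it on a short interval; the moment it starts hanging or erroring, you get a timestamp to line up with the IIS and SQL Server logs.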
3 - Check whether you have multiple sites / web apps running on the server
that each has its own AppPool and that they run under domain accounts
that each app has its own, separate folder for logs and any other writable files
that each AppPool has recycling tactics suited to its actual usage pattern
it needs different recycling tactics if it's busy all the time
ask for some minimal kind of heartbeat web service (like the one sketched above) to be developed and pinged for ops needs
running as part of each web app and using the same SQL connection
if you don't have the budget for this, raise some hell
makes you feel good :-)