One of the service's container is constantly restarting. From the logs I can see that some request take like 20s, and for some of them there are exceptions like: An exception occurred in the database while iterating the results of a query. System.InvalidOperationException: An operation is already in progress. at Npgsql.NpgsqlConnection or Timeouts. When I try to access the db with the local environment, I cannot reproduce such exceptions. On random requests, taking too long, the container restarts. Have somebody had some similar issue?
As the exception says, your application is likely trying to use the same physical connection at the same time from multiple threads - but it's impossible to know without seeing some code. Make sure you understand exactly when connections are being used and by which thread, and if you're still stuck try to post a minimal code sample that demonstrates the issue.
If you are using ELB( Elastic Load Balancer ) then increase timeout limit of it .
Related
I have a windows service running using Topshelf. This service makes a lot of SQL server queries. When the hosting computer is restarted it almost always causes errors in my service due to SQL Server stopping in the middle of my service making a query. I've been asked to solve this so the logs won't have so many errors as these computers are restarted frequently.
Topshelf has some built-in WhenShutdown logic that you can use to run when the computer is shutdown/restarted, but there is still no guarantee that my service will stop before SQL Server, and based on the error frequency it pretty much always happens that way. I have tried to also use Topshelfs WhenCustomCommandReceived to listen for windows PreShutdown event as shown here, but my tests when logging any custom command received and then rebooting my computer shows no logs. I also tried adding SQL Server as a dependency to my service, but this still doesn't guarantee mine will stop before SQL Server.
I have also tried adding in the logic from this solution, but again I never see any logs indicating this code is even being executed on a restart. Any tips on how I can better solve this issue?
tldr: how to ensure my topshelf service stops before SQL server on a computer restart/shutdown
Thanks!
In periods we are experiencing many "Request timed out" exceptions (System.Web.HttpException) from a specific endpoint that is called often.
It appears not to be related to high-peak periods and has been experienced right after deployment and at random times. No pattern.
The solution is not to increase the execution timeout as the requests are normally completed within seconds.
Neither the web server nor the backend SQL Server is stressed. We have even seen low CPU usage during an incident period.
From ApplicationInsights I got the exact endpoint failing, which is a standard controller action. However, there is no additional information. No stack trace. No error code. Nothing. The exception is thrown at any time between 1 second and minutes after the request start.
From ApplicationInsight I can see that some of the requests to the failing endpoint are completed. However, the response time is extremely long (up to 8 minutes).
I have found nothing in the IIS logs. We have set up the failed request logging and waiting for the next incident. However, we do not expect to get more information than we already got from ApplicationInsights.
I'm uncertain whether this is an ASP.NET MVC application issue or an IIS configuration. It puzzles me, that no stack trace is available.
Any suggestions on how to approach this challenge? Pointers to articles/blogs that can help me solve the issue are very much appreciated.
UPDATE
I was looking through our trace logs and realized that they were not complete, i.e., entries were missing. We use ApplicationInsights (AI) for tracing. AI is configured to keep all traces, exceptions, and events, and it is working flawlessly in DEV and STAGING.
We have two AI environments: AI-PROD and AI-TEST. The environment is selected in web.config via instrumentation key. The entire AI config is in the ApplicationInsights.config and this file is the same in DEV, STAGING, and PROD.
I tried to connect STAGING to the AI-PROD environment to verify that it was not a problem with the environment. It worked flawlessly.
I disabled AI in PROD and the server started without throwing “Request timed out” errors during startup. When PROD is connected to either the AI-PROD or the AI-TEST environment I get “Request timed out” errors during startup.
I have a ruby app in production that uses sidekiq (that uses redis) and I have managed to discover that flushall commands are being called which cause the database to be wiped (thus removing all the processed and scheduled jobs).
I don't know or understand what could be causing this.
Does anyone know how I can begin to trace the call to flushall?
Thanks,
It is most likely that your Redis server is open to the public network without any protection - that is just calling for trouble because anyone can connect and do much more damage than just a FLUSHALL. If that it the case, use password authentication at the very least, after burning the compromised server - the attacker may have gained access to your server's operating system and from there who knows where. More information at: http://antirez.com/news/96
If that isn't the case and you have a rogue application somewhere that randomly calls unwanted commands, you can try tracking it by combining the MONITOR and CLIENT LIST.
Lastly, you can consider renaming/disabling the FLUSHALL command, at least temporarily, until you get to the bottom of this.
Shortly, our project uses a Thrift server and mobile clients with multiplexing.
While I was developing the iOS client, I encountered a strange problem;
When I first created the client and made calls, it is OK and it works as expected.
Since there is no close method for Cocoa Thrift client, I am hoping ARC will take care of it.
After some time, I create another client for the same service and do the same things, but this time, when I made a service call, client hangs and after some time in throws a "'TTransportException', reason: 'Cannot read. Remote side has closed.'".
In the server, operation is successfully completed and the value returned.
Does anybody have an idea about what I am doing wrong?
Thanks in advance!
Reading your question i remembered that we encountered a very similar problem in very a different environment. If ARC takes care of your client and closes the connection, especially the port, this might be the reason why recreating the client again with the same port is the root of your problem. Opening the same port shortly after closing it can take a very long time (minutes) depending on timeouts.
Sorry no real answer to your problem but maybe a hint were to look for.
I have a quite big application, running from inside spree extension. Now the issue is, all requests are very slow even locally. I am getting messages like 'Waiting for localhost" or "waiting for server" in my browser status bar for 3 - 4 seconds for each request issued, before it starts execution. I can see execution time logged in log file is quite good. But overall response time is poor because of initial delay. So please suggest me, where can I start looking into improving this situation?
One possible root cause for this kind of problem is that initial DNS name resolution is failing before eventually resolving. You can check if this is the case using tcpdump (if that's available for your platform) or wireshark. Look for taffic to and from your client host on port 53 and see if the name responses are happening in a timely fashion.
If it turns out that this is the problem then you need to make sure that the client is configured such that the first resolver it trys knows about your server addresses (I'm guessing these are local LAN addresses that are failing). Different platforms have different ways of configuring this. A quick hack would be to put the address of your server in the client's hosts file to see if that fixes it.
Once you send in your request, you will see 'waiting for host' right up until the Ruby work is done, and it starts sending a response. So, if there is pretty much any processing work that is slowing you down, you'd see this error. What you'd want to do is start looking at the functions that youre seeing the behaviour on, and breaking them down into pieces to see which peices are slow. If EVERYTHING is slow, than you need to look at the things that are common to every function - before functions, or Application Controller code, or something similar. What I do, when I'm just playing around to see what I need to fix is just put 'puts' statements in my code at different stages, to print the current time, then I can see which stage is taking a long time, you know?