Every night between the hours of 12 am to 4 am the app runs out of connections. We use c3p0 to manage the connection pool. Currently have a maxpoolsize of 20. Its very intriguing that this happens only in the night. Would appreciate if someone could guide me on how can i troubleshoot this. The reason i am stumped is that why does it not happen during the day.
Here are the steps that I have either taken or in the process of taking, I will appreciate if a pro can add further to this.
Already looked at the eventViewer logs of the machine where the app is running.
Have asked for eventViewer logs of the machine where the db is running
Have asked for sqlserver logs during that time
Have asked the DBA team to provide information on any jobs that might be running during that time
What else can i do from C3P0 side, I guess I could increase the logging, is configuring C3P0 logging as simple as just adding a logger category in my log4j.xml ? I read somewhere that I could use JMX to monitor connection pooling, would that help ? Anything anyone can tell me more, would appreciate that too.
c3p0 has some facilities built-in to debug Connection pool exhaustion. please see http://www.mchange.com/projects/c3p0/index.html#configuring_to_debug_and_workaround_broken_clients
Related
I have a Prometheus service running in a docker container and we have a group of servers that are rotating reporting up and down with the error "context deadline exceeded".
Our time interval is 15 seconds and timeout is 10 second.
The servers have been polled with no issues for months, no new changes have been identified. At first I suspected a networking issues but I have triple checked the entire path and all containers and everything is okay. I have even tcpdumped on the destination server and Prometheus polling server and can see the connections establish and complete, yet still being reported as down.
Can anyone tell me where I can find logs relating to "content deadline exceeded"? Is there any additional information I can find on what is causing this?
From other thread it seems like this is a timeout issue, but the servers are a subsecond away and again there is no packetloss occurring anywhere.
Thanks for any help.
NewRelic is showing me that over 80% of execution time in the app server is taking place in "Middleware ActiveRecord::QueryCache#call"
Here is a gist of the relevant code tested (although I see similar results on other API endpoints).
Gist
I'm running the app server on AWS Elastic Beanstalk on a t2.medium instance and a t2.small Postgres RDS DB with max_connections set to 100. I'm testing this via loader.io, doing a test of 100 users with the maintain client load setting (this means about 6000 requests a minute).
Does anyone have an idea why the QueryCache is taking so much time?
Unfortunately, this issue with QueryCache is quite common and seems to have multiple causes, but the most common is that the connection between your EC2 app server and DB was temporarily severed, and QueryCache doesn't handle this particularly well.
Remedies include increasing your default connection pool size substantially (e.g. an order of magnitude higher), disabling QueryCache entirely, or increasing read_timeout in database.yml to 15 seconds or more depending on your environment.
If the read_timeout setting resolves the problem, you may want to investigate why there are so many disconnects between your app server and db.
Another path which might not be an option for you would be to run the app server on the same machine as the db, but that doesn't work for everyone due to their architecture. It certainly can be an effective test to see if eliminating the network variable helps. Good luck.
I've been trying to set up a Redmine on google compute engine with the mysql 5.5 database hosted on google cloud sql (d1, 512mb of ram, always-on, europe, package-billed).
Unfortunately, Redmine stops responding (really stops, I set the timeout to 1hour and nothing happens) to requests after a few minutes. Using newrelic I found out that it's database-related - ActiveRecord seems to have some problems with the database ..
In order to find out if the problems are really related to the cloud sql database, I set up a new database on my own server and it's working fine since then. So there definitely is an issue with the cloud sql database and redmine/ruby.
Does anyone have an idea what I can try to solve the problem?
Best,
Jan
GCE idle connections are closed automatically after 10 minutes as explained in [1]. As you are connecting to CloudSQL from a GCE instance, this is most likely the cause for your issue.
Additionally, take into account Cloud SQL instances can go down and come back anytime due to maintenances and connections must be managed accordingly. Checking the CloudSQL instance operation list would confirm this. Hope this helps.
[1] https://cloud.google.com/sql/docs/gce-access
We have just put our system into production and we have a lot of users on the production system. Our servers keep failing and we are not sure why. It seems to start with one server then it elects a new master and then a few minutes later all the servers go down in the cluster. I have it setup to send all the writes to the read databases and to leave the writes to the master. I have looked through the logs and cannot seem to find a root cause. Let me know what logs I should upload and or where I should look. Today alone we have had to restart the servers 4 times and it fixes it for a bit but its not a cure for the issue.
All databases are 16GB ram and 8 cpus and SSDs. I have them setup with the following settings in the neo4j.properties
neostore.nodestore.db.mapped_memory=1024M
neostore.relationshipstore.db.mapped_memory=2048M
neostore.propertystore.db.mapped_memory=6144M
neostore.propertystore.db.strings.mapped_memory=512M
neostore.propertystore.db.arrays.mapped_memory=512M
We are using newrelic to monitor the server and we do not see the hardware getting above 50% CPU and 40% memory so we are pretty sure that is not it.
Any help is appreciated :)
I got the above error message running Heroku Postgres Basic (as per this question) and have been trying to diagnose the problem.
One of the suggestions is to use connection pooling but it seems Rails has this built in. Another suggestion is that the app is configured improperly and opens too many connections.
My app manages all it's connections through Active Record, and I had one direct connection to the database from Navicat (or at least I thought I had).
How would I debug this?
RESOLUTION
Turns out it was an Heroku issue. From Heroku support:
We've detected an issue on the server running your Basic database.
While we pinpoint this and address it, we would recommend you
provision a new Basic database and migrate over with PGBackups as
detailed here:
https://devcenter.heroku.com/articles/upgrade-heroku-postgres-with-pgbackups
. That should put your database on a new server. I apologize for this
disruption – we're working to fix this issue and prevent it from
occurring in the future.
This has happened a few times on my app -- somehow there is a connection leak, then all of a sudden the database is getting 10 times as many connections as it should. If it is the case that you are getting swamped by an error like this, not traffic, try running this:
heroku pg:killall
That will terminate all connections to the database. If it is dangerous for your situation to possibly cut off queries be careful. I just have a rails app, and if it goes down, losing a couple queries is not a big deal, because the browser requests will have looooooong since timed out anyway.
You might be able to find why you have so many connections by inspecting view pg_stat_activity:
SELECT * FROM pg_stat_activity
Most likely, you have some stray loop that opens new connection(s) without closing it.
To save you the support call, here's the response I got from Heroku Support for a similar issue:
Hello,
One of the limitations of the hobby tier databases is unannounced maintenance. Many hobby databases run on a single shared server, and we will occasionally need to restart that server for hardware maintenance purposes, or migrate databases to another server for load balancing. When that happens, you'll see an error in your logs or have problems connecting. If the server is restarting, it might take 15 minutes or more for the database to come back online.
Most apps that maintain a connection pool (like ActiveRecord in Rails) can just open a new connection to the database. However, in some cases an app won't be able to reconnect. If that happens, you can heroku restart your app to bring it back online.
This is one of the reasons we recommend against running hobby databases for critical production applications. Standard and Premium databases include notifications for downtime events, and are much more performant and stable in general. You can use pg:copy to migrate to a standard or premium plan.
If this continues, you can try provisioning a new database (on a different server) with heroku addons:add, then use pg:copy to move the data. Keep in mind that hobby tier rules apply to the $9 basic plan as well as the free database.
Thanks,
Bradley