High percentage of network I/O in Dynatrace when using pgbouncer with Postgres 12

We are running some tests in Google Cloud and we are using CloudSQL Postgres 12 with cloudsqlproxy and pgbouncer.
[x] bug report
Describe the issue
When we run tests in Google Cloud using K8s and CloudSQL Postgres 12 with the Cloud SQL Auth Proxy and pgbouncer, we are observing a lot of network I/O while executing queries.
Driver Version?
42.5.0
(Dynatrace screenshots: method hotspots and network I/O)
Java Version?
11
OS Version?
CentOS
PostgreSQL Version?
12
To Reproduce
Steps to reproduce the behaviour:
It happens almost all the time when going through pgbouncer to Postgres; we didn't observe it earlier when connecting to Postgres directly.
Logs
There are no error logs; queries are simply taking a long time to execute. Even queries as simple as a lookup by primary key hang in SocketRead.
Any ideas would be really helpful. Let me know if more information needs to be shared. What could be possible reasons for so much network I/O? It only seems to be an issue when working with pgbouncer.
We looked into the pgbouncer logs and found nothing concrete there. There isn't any waiting showing up in the logs. We also played around with the configuration in pgbouncer.ini, but nothing worked. If anyone has faced a similar issue, please share what worked.
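One interaction worth checking, assuming pgbouncer is running in transaction-pooling mode: the pgjdbc driver promotes a statement to a server-side prepared statement after prepareThreshold executions (5 by default), but transaction pooling does not guarantee the same backend connection between executions, which can cause extra round trips or failed prepared-statement lookups. A common workaround is to use session pooling, or to disable server-side prepared statements on the driver side. The host, port, and database names below are placeholders, not values from the report:

```ini
; pgbouncer.ini -- illustrative fragment, not a complete config
[pgbouncer]
pool_mode = session      ; or keep transaction pooling and set
                         ; prepareThreshold=0 on the JDBC side instead

; JDBC URL (driver side): disable server-side prepared statements
; jdbc:postgresql://pgbouncer-host:6432/mydb?prepareThreshold=0
```

Either change alone is usually enough; applying both at once just makes it harder to see which one helped.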

Related

How to diagnose slow startup time in Cloud Run containers?

I am running some services with Google Cloud Run. While performance has been satisfactory, there's a recurrent issue with extremely slow startup time, which leads to occasionally dropped requests when new containers can't spin up in time.
Currently, with first-gen execution environments and startup CPU boost enabled, Google's dashboard reports around 18 to 50 seconds of startup time. The image is based on ruby:3.0.2, and it runs a Ruby on Rails 6 application. In a development environment, startup (timed from run to the container accepting requests) never seems to take more than 5 seconds.
I want to know what tools are available to diagnose this issue, and if there are any obvious pitfalls with my specific case that I might be missing.
I've tried playing around with the service's configuration options, to no avail. The biggest suspect is a startup bash script that handles migrations on first boot, and asset compiling in development. However, I've tried building with an empty script, and the problem persists. I also suspect the container images might be too large (around 700 MB), but I haven't gotten around to slimming them down, nor found evidence that this is the problem.

Why did my script error out with a Net::ReadTimeout error?

I wrote a RoR Rake script which interfaces with a Python ORM over XMLRPC to import thousands of products from one Postgres server to the Python ORM's server (also on Postgres).
The script was running fine. I stepped away from my computer and returned in 5 minutes to see the following error:
rake aborted!
Net::ReadTimeout: Net::ReadTimeout
What might have caused this error?
NOTE: I am writing this question with the intention of providing my own answer so as to help anyone in the future who might encounter this issue.
The error was caused by my computer going to sleep, or at least the hard drive spinning down into low-power mode.
I know the cause of the error may seem obvious, but I figure it's worth throwing out there.
I was running macOS Mojave 10.14.
I addressed the error by going to System Preferences > Energy Saver > Power Adapter, checking "Prevent computer from sleeping automatically when the display is off", and unchecking "Put hard disks to sleep when possible" (although I doubt the second one was necessary; I did it just in case).
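As a belt-and-braces measure in the script itself, a long-running import can also be made resilient to transient timeouts. A minimal sketch of a retry wrapper for the XMLRPC calls; the method name, retry count, and backoff are illustrative, not taken from the original script:

```ruby
require "net/http"   # defines Net::ReadTimeout

# Retry a block a few times when the connection times out, e.g. after
# the machine wakes from sleep mid-import.
def with_read_timeout_retry(max_retries: 3, backoff: 2)
  attempts = 0
  begin
    yield
  rescue Net::ReadTimeout
    attempts += 1
    raise if attempts > max_retries
    sleep backoff * attempts   # simple linear backoff before retrying
    retry
  end
end
```

Each individual XMLRPC call in the import loop would be wrapped in this, so one dropped connection costs a retry instead of aborting thousands of imported products.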

neo4j on CentOS 6.9 keeps stopping

I'm running Neo4j 3.2.6 on CentOS 6.9. I have two problems:
First, Neo4j keeps stopping (typically about 5 times per day). There is nothing in the debug.log at the time of the failure (although there are many lines from the time I restart the service). How can I identify the problem? Which log files would give me a clue to the problem? Happy to share log files here if someone tells me which files to share.
Second, the above problem is compounded by the fact that I can't get Neo4j to restart automatically when it fails. I believe CentOS 6.9 uses Upstart, but I'm not having much luck setting this up. How do I set up Neo4j to restart on failure on CentOS 6.9?
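For the auto-restart part, Upstart's respawn stanza is the usual mechanism. A minimal sketch of an Upstart job, assuming Neo4j is installed under /opt/neo4j and runs as a neo4j user (both assumptions; adjust to your install), and keeping in mind that the Upstart shipped with CentOS 6 (0.6.x) predates the setuid stanza:

```
# /etc/init/neo4j.conf -- minimal sketch; paths and user are assumptions
description "Neo4j graph database"

start on runlevel [2345]
stop on runlevel [016]

respawn                 # restart the process if it dies
respawn limit 5 60      # give up after 5 restarts within 60 seconds

# CentOS 6's Upstart lacks the setuid stanza, so drop privileges via su
exec su -s /bin/bash neo4j -c "/opt/neo4j/bin/neo4j console"
```

This restarts Neo4j on crashes but still won't explain them; the respawn limit at least stops a crash loop from thrashing the box while you dig through the logs.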

Neo4j database maxing CPU following upgrade to 3.0

I upgraded to Community version 3.0 and now when I open my database the CPU stays consistently above 85%. I've tried uninstalling and reinstalling, deleting the old installs and their config files and reinstalling, and letting it run in case Neo4j was reindexing or similar and just needed time. The database was running very well under 3.0.0-M02, but I don't have that executable to reinstall it. I've tried 3.0.0-M05, which didn't help.
Can anyone suggest a way for me to get the database running properly again?
Is it doing recovery? If you start the database, does it fully start as expected and then go into this "mode"? Can you take a thread dump and paste it here? To capture a thread dump, use jps -lm to figure out which process ID your neo4j process has, then capture the dump with jstack <pid>, e.g. jstack 15432 > myfile.txt

Elastic Beanstalk Ruby processes consuming CPU

I have had a Rails 3 app deployed on Elastic Beanstalk for close to 2 years now. For the most part, I haven't had any issues; however, I recently upgraded to one of their new Ruby configurations (64bit Amazon Linux 2014.09 v1.0.9 running Ruby 2.1 (Passenger Standalone)) and I've been fighting an issue for several days where one or more Ruby processes will consume the CPU, to the point where my site becomes unresponsive. I was using a single m3.medium instance, but I've since moved to an m3.large, which only buys me some time to manually log into the EC2 instance and kill the runaway process(es). I would say this happens once or twice a day.
The only thing I had an issue with when moving to the new Ruby config was that I had to add the following to my .ebextensions folder so Nokogiri could install (with bundle install):
commands:
  build_nokogiri:
    command: "bundle config build.nokogiri --use-system-libraries"
I don't think this would cause these hanging processes, but I could be wrong. I also don't want to rule out something unrelated to the Elastic Beanstalk upgrade, but I can't think of any other significant change that would cause this problem. I realize this isn't a whole lot of information, but has anyone experienced anything similar? Does anyone have suggestions for tracing these processes to their root cause?
Thanks in advance!
Since you upgraded your Beanstalk configuration, I guess you also upgraded your Ruby/Rails version, which bumped up all gem versions. The performance issue probably originates from one of these changes (and not the hardware change).
So this brings us into the domain of RoR performance troubleshooting:
1. Check the Beanstalk logs for errors. If you're lucky, you'll find a configuration issue this way. Give it an hour.
2. Assuming all is well there, try to set up the exact same versions on your localhost (Passenger + Ruby 2.1 + gem versions). If you're lucky, you will witness the same slowness and be able to debug.
3. If you'd like to shoot straight for production debugging, I suggest installing New Relic (or any other application monitoring tool) and then drilling into the details of the slowness in its dashboard. I found it extremely useful.
I was able to resolve my runaway Ruby process issue by SSHing into my EC2 instance and installing/running gdb. Here's a link with the steps I followed: http://isotope11.com/blog/getting-a-ruby-backtrace-from-gnu-debugger. I did have to sudo yum install gdb first.
gdb uncovered an infinite loop in a section of my code that was looping through days in a date range.
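For anyone hitting the same class of bug: a date-range loop hangs forever if the cursor is never advanced (or advanced by zero). A minimal sketch of the pattern; the method name and the dates in the usage line are illustrative, not the actual application code:

```ruby
require "date"

# Collect every day in a range. The `d += 1` line is the one that matters:
# dropping it (or stepping by 0) makes the `while` condition永hold and the
# loop spin forever -- exactly the kind of hang a gdb backtrace exposes.
def days_between(start_date, end_date)
  days = []
  d = start_date
  while d <= end_date
    days << d
    d += 1   # advance the cursor; without this the loop never terminates
  end
  days
end

# days_between(Date.new(2024, 1, 1), Date.new(2024, 1, 3))
# => [2024-01-01, 2024-01-02, 2024-01-03]
```

An idiomatic guard against this whole bug class is to iterate the range directly, e.g. (start_date..end_date).to_a, so there is no manual cursor to forget.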
