dse-agents leaking file descriptors and memory - datastax-enterprise

Every time we back up our 3-node DSE cluster (5.1.2), the DataStax agents (6.1.2) leak memory (~25 MB) and file descriptors (~1800).
Since we make quite frequent backups, this is rather annoying.
Is this a known issue or normal behaviour (maybe the FD and RAM usage settles after a while)?

This could be OPSC-12900, 'Backups leaking file handles', an issue seen with the datastax-agent.
If you review the output of lsof -p <agent pid> and find that the vast majority of handles/descriptors reference snapshot directories, this is a likely candidate.
I recommend raising a ticket with DataStax Support to get the latest status on this.
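For reference, a quick way to check (a sketch; the pgrep pattern is an assumption, so substitute your agent's pid if it differs, and 'snapshots' matches the default Cassandra snapshot directory layout):

    # total open descriptors held by the agent, and how many reference snapshot dirs
    AGENT_PID=$(pgrep -f datastax-agent | head -n 1)
    lsof -p "$AGENT_PID" | wc -l
    lsof -p "$AGENT_PID" | grep -c snapshots

If the second count grows by roughly the same amount after each backup, that matches the OPSC-12900 pattern described above.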

Related

Memory leak behavior in Neo4j community edition

I previously posted on the neo4j mailing list (https://groups.google.com/forum/#!topic/neo4j/zn-7lKHVvNI) but haven't received any response from the community, so I'm cross-posting here...
I've noticed what appears to be memory-leak behavior in Neo4j community edition. Running this test code (https://gist.github.com/mlaldrid/85a03fc022170561b807) against 2.1.2 (also tested against 2.0.3) with a 512MB heap results in GC churn after a few hundred thousand Cypher queries. Eventually I either get an OutOfMemoryError or Jetty times out.
However, when I run the same test code against an eval copy of the Neo4j enterprise edition, it proceeds through 3.5M queries with no sign of bumping up against the 512MB heap limit. I killed the test after that, satisfied that the behavior was sufficiently different from the community edition.
My questions are thus: Why is this memory leak behavior different in the community and enterprise editions? Is it something that the enterprise edition's "advanced caching" feature solves? Is it a known but opaque limitation of the community edition?
Thanks for any insight on this issue.
This is a recently discovered memory leak in 2 of the 4 cache types available for the community edition (the weak and soft caches). It does not affect enterprise, as enterprise uses the 'hpc' cache by default.
It only affects deployments where you are unlikely to read from the existing data in the db, or where the majority of load on the system is writes.
We've got a fix for this which will go out in subsequent releases. For now, if your use case is unfortunate enough to trigger this issue, you'll need to use either the 'strong' cache or 'none' in community, or switch to enterprise until the next patch release.
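For reference, in Neo4j 2.x the cache type is selected in conf/neo4j.properties; a minimal sketch of the workaround (assuming the default install layout, with a restart afterwards):

    # conf/neo4j.properties (community edition) - workaround sketch
    # use the 'strong' cache (or 'none') instead of the leaking 'soft'/'weak' caches
    cache_type=strong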
I'm posting the output of jvisualvm's Sampler.
I guess this answers the question, as the leak is still there in 2.2.0.
Edit:
The problem was my use of ExecutionEngine. I used the execute method on GraphDatabaseService instead, and that solved my problem.

Why is my PostgreSQL server CPU constrained?

My database is very CPU constrained, and I can't find the root cause of the issue. I currently have two application servers, each with a Rails API connecting to PostgreSQL via the ruby-pg gem. Both application servers also have Sidekiq running background jobs, and I have a handful of support servers processing new posts from a national feed via Sidekiq. If I were running out of memory, the solution would seem straightforward. Any general ideas why I am CPU constrained?
Database Specs:
Rackspace 8GB Performance Tier cloud VM (8GB RAM, 8-core CPU, SSD)
Debian 7 Wheezy Linux OS
PostgreSQL 9.1 with PostGIS extension
Possible Problems:
PostgreSQL 9.1 is bad at indexes
The database has nearly 10GB of indexes. I am going to upgrade my database to PostgreSQL version >= 9.2, since index-only scans were introduced in 9.2.
Too many connections
In postgresql.conf, I have set max_connections to 500. Usually throughout the day only about 175 connections are in use, but during peak times Sidekiq tasks will push the current connections up to 350. How many connections are recommended for an 8GB server instance?
Idle Connections
When I take a look at pg_stat_activity in the psql console, I see Sidekiq is leaving a lot of IDLE connections. Could these connections result in CPU inflation? Does the fix belong in the API or in Sidekiq?
Need a more powerful server
Maybe there is no bug. I might simply need a bigger server instance. Again, this would make more sense if I were memory bound. However, both app servers and 3 of the support Sidekiq servers are 4GB performance-tier instances. Essentially, the servers that interact with the database have, combined, more than double the resources of the database. Should this even matter?
Additional questions:
What tools/techniques should I employ to troubleshoot the issue?
Any basic settings in postgresql.conf related to CPU usage?
Are there any known issues related to Rails, Sidekiq, or the pg gem that could be a contributing factor? (I haven't seen any open issues.)
Are there any general PostgreSQL guidelines for CPU usage?
Any other ideas or thoughts that might help my search?
You are using massively too many concurrent connections. PostgreSQL will be wasting lots of its time on housekeeping and juggling concurrent queries. All the concurrent work will be fighting for CPU and buffer space, there'll be heavy contention on spinlocks, and it'll all generally be a mess.
On an 8 core machine, you should probably not have more than 20 actively working connections if you're mostly CPU constrained. If you're I/O limited, you can go higher, but 350 is just ridiculous.
If possible, put a PgBouncer in transaction pooling mode in front of your PostgreSQL instance, so queries get queued up and executed rapidly in series instead of slowly in parallel.
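A minimal pgbouncer.ini sketch of that setup (the database name, auth file and pool size are placeholders; a pool roughly the size of your core count is a common starting point):

    ; pgbouncer.ini (sketch)
    [databases]
    myapp_production = host=127.0.0.1 port=5432 dbname=myapp_production

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction
    ; keep roughly one backend per core actually working
    default_pool_size = 16
    max_client_conn = 500

The Rails and Sidekiq connection settings would then point at port 6432 instead of 5432, and the 350 client connections share a small pool of real backends.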
See number of database connections (Pg wiki).
Additionally, PostGIS can be very CPU-heavy. It sometimes needs to do very complex calculations. I suggest using the auto_explain module to record long running queries, and using pg_stat_statements / pg_stat_plans to record what's taking up resources. Examine these queries to see if they need improvement.
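For example, a sketch of the postgresql.conf side (the threshold is arbitrary; changing shared_preload_libraries requires a server restart, and on 9.1 you may also need custom_variable_classes for the auto_explain settings):

    # postgresql.conf (sketch)
    shared_preload_libraries = 'auto_explain,pg_stat_statements'
    auto_explain.log_min_duration = '250ms'   # log plans of anything slower than this
    auto_explain.log_analyze = off            # 'on' adds real row counts but has overhead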
Your idle in transaction sessions must be dealt with, too. Depending on why they're idle and whether they hold a transaction ID or not, they might be causing serious table bloat. They also create unnecessary signalling overhead within PostgreSQL, as it has to do more coordination with backends that are actively doing things. Finally, the number of open transactions itself increases the cost of some internal housekeeping operations.
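To find them, something like this in the psql console can help (a sketch for 9.2+; on 9.1 the columns are procpid and current_query, and idle-in-transaction shows up as '<IDLE> in transaction' in current_query):

    -- long-lived idle-in-transaction sessions, oldest first
    SELECT pid, usename, application_name,
           now() - xact_start AS xact_age,
           query
    FROM pg_stat_activity
    WHERE state = 'idle in transaction'
    ORDER BY xact_age DESC;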
So. Your DB will probably perform better if you reduce the connection counts, put a PgBouncer in transaction pooling mode in front, and fix those idle connections.
Most likely you are CPU constrained because your work needs a lot of CPU. :)
9.1 is not generally bad at indexes. There may be some specific issues, as any version might have, and exactly what they are can change from version to version.
Index-only scans are mostly a benefit when you are I/O constrained. I wouldn't hold out much hope for that being a magic bullet for you.
350 connections are certainly not helpful, but probably are not very harmful, either. But when they are harmful, it can be downright catastrophic. The correct value is determined more by the number of cores than by the amount of RAM. If it is easy to throttle down the Sidekiq connections, do it, even if you can't prove that it helps.
If the connections are just IDLE, not IDLE in transaction, then they probably aren't very harmful, but again there are a few cases where they can be. That is pretty much the same issue as the number of connections.
The connection you showed from top was idle in transaction. That status shouldn't be taking up much CPU, so it probably means the session is rapidly cycling through statements and top just happens to catch it between them. But you didn't say how many similar lines there were in top; if it is just that one, it suggests your code is not running concurrently and 7 of your 8 CPUs are wasted.
Regarding the db server versus the other servers, if the database is fundamentally the limit, beating on it with a bigger hammer is not going to help. Often there is some flexibility about where computation is done. If you can get the app servers to do more computation that is currently done on the db and let the db focus on ACID issues, that would be good. But no one but you can know if that is possible or feasible.
My first stop would be to use pg_stat_statements to see what SQL statements are taking the most time. Maybe just adding an index to the slowest/most frequent query would make the problem magically go away.
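A sketch of that (assumes pg_stat_statements is in shared_preload_libraries, as above, and the extension is created in the database; on 9.1/9.2 the timing column is total_time):

    -- once, as a superuser in the target database
    CREATE EXTENSION pg_stat_statements;

    -- the statements eating the most cumulative time
    SELECT calls, total_time, rows, query
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10;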

Appfog instances vs memory

I'm developing an API on AppFog and want to know what to focus on (more memory with fewer instances, or more instances with less memory each).
AppFog gives you 2GB of RAM for free, which allows up to 16 instances if each instance gets 128MB of RAM.
My application uses PHP, MySql and Memcachier.
I want to launch it soon and want to know which configuration is best for my server.
What is the benefit with more RAM or instances?
Thanks for helping :)
Best Regards,
Johnny
You want as many instances as your app will run without running out of memory :). More instances means better performance and uptime. However, if an instance runs out of memory it will be shut down, leaving your app running with fewer instances until they all collapse. You can diagnose this problem with the af apps and af logs <appname> --all commands. If the app is regularly shown as running at less than 100%, the instance memory budget may be too low. When there are down instances, the logs command may reveal memory-limit-reached errors.
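For example (the app name is a placeholder):

    # instance count, memory budget and health for each app
    af apps
    # logs from every instance; look for memory-limit / OOM messages
    af logs myapp --all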
Memory Recommendations
Here are some memory recommendations to start out with: WordPress with several installed plugins will need > 512MB to be stable. For lean custom PHP apps, 128MB is usually sufficient but should be watched. If an app uses a framework, try 256MB. These memory limits may seem high, but it's really the peak memory usage that matters, not the average usage.
Load Test
Load testing with Siege can help find a memory/instance balance by determining whether your app peaks out over the memory limit. Scale the app down to 1 instance and run Siege with 5, 10, and 15 concurrent connections, progressively increasing by 5 until the app falls over. If the app does fall over, bump the memory up and try again.
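A rough sketch of such a run with Siege (the URL is a placeholder; -c sets the number of concurrent users and -t the duration):

    # scale the app down to 1 instance first, then step concurrency up: 5, 10, 15, ...
    siege -c 5 -t 1M http://your-app.example.com/
    siege -c 10 -t 1M http://your-app.example.com/
    siege -c 15 -t 1M http://your-app.example.com/
    # if instances start dying, raise the per-instance memory and repeat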

Reducing Redmine's memory usage - Low Hanging Fruit

I am running a Redmine instance with Passenger and Nginx. With only a handful of issues in the database, Redmine consumes over 80MB of RAM.
Can anyone share tips for reducing Redmine's memory usage? The Redmine instance is used by 3 people and I am willing to sacrifice speed.
There are not really any low-hanging fruits. And if there were, we would have already included and activated them by default.
80 MB RSS (as opposed to virtual size, which can be much more) is actually pretty good. In normal operation, Redmine will use between 70 and 120 MB RSS per process (depending on the deployment model; rather toward the low end on Passenger).
As andrea suggested, you can reduce your overall memory footprint by about one third when you use REE (Ruby Enterprise Edition, which is also free). But this saving can only be achieved when you run more than one process (each requiring the above memory). REE achieves the saving by optimizing Ruby for copy-on-write, so that additional application processes take less memory.
So I'm sorry, your (hypothetical) 128 MB vServer will probably not suffice. For a small installation, you might be able to squeeze a minimal installation into 256MB, but it only starts to be anything but a complete pain in the ass at 512 MB (including database).
That's because of how Rails applications work, in contrast to things like PHP. They require a running application server instance. Each instance is typically able to answer one request at a time, using about the same amount of memory all the time. So your memory consumption is roughly proportional to the number of application processes you run, independent of actual load. But if you tune your system properly, you can get quite a number of reqs/s out of one process.
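If you're on Passenger with Nginx as described, one concrete knob is capping how many of those application processes stay resident. A sketch using standard Passenger directives (the values are only an example for a 3-user instance, not a recommendation):

    # nginx.conf, http block (Passenger for Nginx) - sketch
    passenger_max_pool_size 2;      # keep at most 2 Redmine processes resident
    passenger_pool_idle_time 300;   # stop idle processes after 5 minutes

You can check what each process actually uses with passenger-memory-stats.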
Maybe I am replying very late, but I got stuck on the same issue and found a link on optimizing Ruby/Rails memory usage, which worked for me:
http://community.webfaction.com/questions/2476/how-can-i-reduce-my-rubyrails-memory-usage-when-running-redmine
It may be helpful for someone else.

Delayed Jobs leaking memory?

I'm using collectiveidea's delayed_job with my Ruby on Rails app (v2.3.8), and running about 40 background jobs with it on an 8GB RAM Slicehost machine (Ubuntu 10.04 LTS, Apache 2).
Let's say I ssh into my server with no workers running. When I do free -m, I see I'm generally using about 1GB of RAM out of 8. Then, after starting the workers and waiting about a minute for them to be utilized by the code, I'm up to about 4GB. If I come back in an hour or two, I'll be at 8GB and into swap, and my website will be generating 502 errors.
So far I've just been killing the workers and restarting them, but I'd rather fix the root of the problem. Any thoughts? Is this a memory leak? Or, as a friend suggested, do I need to figure out a way to run garbage collection?
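Before hunting for the leak, it can help to confirm which worker processes are actually growing, rather than only watching free. A sketch (the grep pattern is an assumption; adjust it to however your workers appear in ps):

    # resident size (RSS, in KB) of each delayed_job worker, refreshed every 30 seconds
    watch -n 30 'ps -eo pid,rss,etime,args | grep [d]elayed_job'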
Actually, Delayed::Job 3.0 leaks memory in Ruby 1.9.2 if your models have serialized attributes. (I'm in the process of researching a solution.)
Here's someone who seems to have solved it: http://spacevatican.org/2012/1/26/memory-leak-in-yaml-on-ruby-1-9-2
Here's the issue from Delayed::Job: https://github.com/collectiveidea/delayed_job/issues/336
Just about every time someone asks about this, the problem is in their code. Try using one of the available profiling tools to find where your job is leaking. ( https://github.com/wycats/ruby-prof or similar.)
Triggering GC at the end of each job will reduce your max memory usage at the cost of thrashing your throughput. It won't stop Ruby from bloating to the max size required by any individual job, however, since Ruby can't free memory back to the OS. I don't recommend taking this approach.

Resources