Ruby requests more memory when there are plenty of free heap slots - ruby-on-rails

We have a server running:
Sidekiq 4.2.9
Rails 4.2.8
MRI 2.1.9
This server periodically imports data from external APIs, performs some calculations on it, and saves the resulting values to the database.
About 3 weeks ago the server started hanging. As I can see from New Relic (and when SSH'd into it), it consumes more and more memory over time, eventually occupying all available RAM, at which point the server hangs.
I've read some articles about how Ruby's GC works, but I still can't understand why, at ~5:30 AM, the heap size jumps from ~2.3M to ~3M slots when there are still ~1M free heap slots available (GC settings are default).
A second graph shows similar behavior at 3:35 PM.
So, the questions are:
How can I make Ruby fill free heap slots instead of requesting new slots from the OS?
How can I make it release free heap slots back to the OS?

How can I make Ruby fill free heap slots instead of requesting new slots from the OS?
Your graph does not have "full" fidelity. It is a lot to assume that GC.stat was sampled by New Relic (or whatever agent) at exactly the right moment.
It is very likely that you ran out of slots, the heap grew, and since heaps don't shrink in Ruby, you are stuck with a somewhat bloated heap.
To alleviate some of the pain you can cap heap growth by setting RUBY_GC_HEAP_GROWTH_MAX_SLOTS to a sane number; something like 100,000 will do. (I am trying to lobby for such a default in Ruby core.)
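For example, in the environment that launches Sidekiq (a sketch; 100,000 is the value suggested above, but tune it for your workload):

export RUBY_GC_HEAP_GROWTH_MAX_SLOTS=100000  # cap how many slots Ruby adds per heap growth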
Also:
Create a persistent log of the jobs that run and when they ran (duration and so on), and gather GC.stat before and after each job runs (see the sketch after this list).
Split up your jobs by queue; run one queue on one server and the other queue on another, and see which queue and which job is responsible for the problem.
Profile the various jobs you have using flamegraph or other profiling tools.
Reduce the number of concurrent jobs you run as an experiment, or place a mutex between certain job types. It is possible that one "job A" at a time is OK-ish, while 20 concurrent "job A"s at a time will bloat memory.
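A minimal sketch of that logging idea as Sidekiq server middleware (the class name and log format are my own invention; note the GC.stat key names below are the MRI 2.1 ones, while newer Rubies use heap_live_slots/heap_free_slots):

# hypothetical Sidekiq initializer
class GcStatLogger
  def call(worker, job, queue)
    before  = GC.stat
    started = Time.now
    yield
  ensure
    after = GC.stat
    Sidekiq.logger.info(
      "job=#{job['class']} queue=#{queue} " \
      "duration=#{(Time.now - started).round(2)}s " \
      "live_slots=#{before[:heap_live_slot]}->#{after[:heap_live_slot]} " \
      "free_slots=#{before[:heap_free_slot]}->#{after[:heap_free_slot]}"
    )
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add GcStatLogger
  end
end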

Related

Neo4j randomly high CPU

Neo4j 3.5.12 Community Edition
Ubuntu Server 20.04.2
RAM: 32 GB
EC2 instance with 4 or 8 CPUs (I change this to accommodate the processing load at the moment)
Database files: 6.5 GB
Python, WSGI, Flask
dbms.memory.heap.initial_size=17g
dbms.memory.heap.max_size=17g
dbms.memory.pagecache.size=11g
I'm seeing high CPU use on the server in what appears to be a random pattern. I've profiled all the queries for the pages that I know people are visiting at those times, and they are all optimised, with execution times under 50 ms in all cases. The CPU use doesn't seem linked to user numbers, which are very low at most times anyway (max 40 concurrent users). I've checked all the queries in cron jobs too.
I reduced the database nodes significantly and that made no difference to performance.
I warm the database by preloading all nodes into RAM with MATCH (n) OPTIONAL MATCH (n)-[r]->() RETURN count(n.prop) + count(r.prop);
The pattern is that there will be a few minutes of very low CPU use (as I would expect from this setup with these user numbers) and then processing on most CPU cores goes up to the high 90%s and the machine becomes unresponsive to new requests. Changing to an 8-CPU instance sorts it, but that shouldn't be needed for this level of traffic.
I would like to profile the queries with query logging, but the Community Edition doesn't support that.
Thanks.
Run a CPU profiler such as perf to record where CPU time is spent. You can then visualize it as a FlameGraph or, since your bursts only occur at random intervals, visualize it over time with Netflix's FlameScope.
Since Neo4j is a Java application, it might also be worthwhile to have a look at async-profiler, which is priceless when it comes to profiling Java applications (it generates similar FlameGraphs and can output log files compatible with FlameScope or JMC).
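A rough sketch of both approaches (the PID, duration, and output paths are placeholders; check each tool's README for the exact flags your version supports):

# perf: sample all stacks for 60 s, then dump the samples for FlameScope
perf record -F 99 -g -p <neo4j_pid> -- sleep 60
perf script > neo4j.perf  # load this file into FlameScope

# async-profiler: attach to the running JVM and emit a flame graph directly
./profiler.sh -d 60 -e cpu -f /tmp/neo4j-flame.svg <neo4j_pid>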

Multiple workers on one machine - Memory management (Resque - Rails)

We've migrated our Resque background workers from a ton of individual instances on Heroku to between four and ten m4.2xlarge (32 GB memory) instances on EC2, which is much cheaper and faster.
A few things I've noticed: the Heroku instances we were using had 1 GB of RAM and rarely ran out of memory. I am currently allocating 24 workers to one machine, so about 1.3 GB of memory per worker. However, because the processing power on these machines is so much greater, I think the OS has trouble reclaiming the memory fast enough, and each worker ends up consuming more on average. Most of the day the system has 17-20 GB of memory free, but when the memory-intensive jobs run, all 24 workers grab a job almost at the same time and then start growing. They get through a few jobs, but then the system hasn't had time to reclaim memory and crashes if there is no intervention.
I've written a daemon to pause the workers before a crash and wait for the OS to free memory. I could reduce the number of workers per machine overall, or have half of them unsubscribe from the problematic queues; I just feel there must be a better way to manage this (see the sketch after the specs below). I would prefer to be using more than 20% of the memory for 99% of the day.
The workers are set up to fork a process when they pick up a job from the queue. The master worker processes are run as services managed with Upstart. I'm aware there are a number of managers, such as God and Monit, that simply restart a process when it consumes a certain amount of memory, but that seems like a heavy-handed solution which will end with too many jobs killed under normal circumstances.
Is there a better strategy I can use to get higher utilization with a lower risk of running into Errno::ENOMEM?
System specs:
OS: Ubuntu 12.04
Instance: m4.2xlarge
Memory: 32 GB
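One way to express the pause-before-crash idea directly in the workers is a Resque before_fork hook that waits for memory headroom (a sketch: the 500 MB threshold is arbitrary, and on a 12.04-era kernel /proc/meminfo has no MemAvailable line, hence the MemFree fallback):

# hypothetical initializer loaded by the Resque master process
Resque.before_fork = proc do |_job|
  loop do
    meminfo = File.read("/proc/meminfo")  # values are reported in kB
    available_kb = meminfo[/MemAvailable:\s+(\d+)/, 1] || meminfo[/MemFree:\s+(\d+)/, 1]
    break if available_kb.to_i > 500_000  # require ~500 MB free before forking
    sleep 5  # give the OS time to reclaim memory before the next job
  end
end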

Rails application servers

I've been reading for a while about how different Rails application servers work, and a few things have me confused, probably because of my lack of knowledge in this field:
Puma's README has the following line about the number of workers in clustered mode:
On a ruby implementation that offers native threads, you should tune this number to match the number of cores available
So if I have, let's say, 2 cores and use Rubinius as the Ruby implementation, should I still use more than one process, considering that Rubinius uses native threads and doesn't have a global interpreter lock, and thus uses all the CPU cores anyway, even with one process?
My understanding is that I'd then only need to increase the thread pool of that single process if I upgraded to a machine with more cores and memory; if that's not correct, please explain it to me.
I've read some articles on using Server-Sent Events with Puma which, as far as I understand, block a Puma thread since the browser keeps the connection open. So if I have 16 threads and 16 people are using my site, the 17th would have to wait until one of those 16 leaves before they could connect? That's not very efficient, is it? Or what am I missing?
If I have a 1-core machine with 3 GB of RAM, just for the sake of the question, and use Unicorn as my application server, and 1 worker takes 300 MB of memory while its CPU usage is insignificant, how many workers should I have? Some say the number of workers should equal the number of cores, but if I set the worker count to, let's say, 7 (since I have enough RAM for it), it will be able to handle 7 concurrent requests, won't it? So is it just a question of memory and CPU usage and the amount of RAM? Or what am I missing?
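For reference, the knobs in question live in Puma's config file; a clustered-mode sketch (the numbers are illustrative, not a recommendation):

# config/puma.rb
workers 2       # one process per core is the usual MRI starting point
threads 8, 16   # min and max threads in each worker's pool
preload_app!    # load the app before forking so workers share memory via CoW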

Rails rake parallelization thresholds and caveats

This is the first time I've actually run into timing issues with a task I have to tackle. I need to run a calculation (against a web service) for approximately 7M records. This would take more than 180 hours, so I was thinking about running multiple instances of the web service on EC2 and just running rake tasks in parallel.
Since I have never done this before, I was wondering what needs to be considered.
More precisely:
What's the maximum number of rake tasks I can run (is there any limit at all besides my own machine's power)?
What's the maximum number of concurrent connections to a Postgres 9.3 DB?
Is there anything to consider when running multiple active_record.save actions at the same time?
I am looking forward to hearing your thoughts.
Best,
Phil
rake instances
Every time you run rake, you are starting a new instance of your Ruby app, with all the associated memory and dependency-loading costs. Look in your Rakefile for the initialization it performs.
your number of instances is limited by the memory and CPU used
you must profile each task's memory and CPU to know how many can be run
you could write a program to monitor and calculate what's possible, but heuristics will work better for one-off, first experiments.
datastore
heuristically explore your database capacity, too.
watch for write-locks that create blocking
watch for slow reads due to missing indices
look at your postgres configs for concurrency limits, cache size, etc. (see the snippet below)
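For instance, you can check the server's connection cap from a Rails console (a sketch; the default is typically 100, and each parallel rake task will hold at least one connection):

# hypothetical Rails console session
ActiveRecord::Base.connection.execute("SHOW max_connections").first
# => {"max_connections"=>"100"}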
.save
each rake task is its own Ruby process, so multiple active_record.save actions at once impact:
blocking/waiting due to write-locking
one instance getting 'stale' data that was read prior to another instance's .save (see the locking sketch below)
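A hedged sketch of guarding against that stale-read case with pessimistic row locking (Record and recalculate are hypothetical names; optimistic locking via a lock_version column is the lighter-weight alternative):

# re-read the row under a FOR UPDATE lock before writing
Record.transaction do
  record = Record.lock.find(id)       # blocks competing saves on this row
  record.value = recalculate(record)  # hypothetical per-record computation
  record.save!
end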
operational complexity
the number of records (7MM) is just a multiplier for all of the operations that occur upon each record. The operational complexity is the source of the limitation, since, theoretically, running 7MM workers would solve the problem in the minimum timescale.
if 180 hr is accurate (dubious), then (180 * 60 * 60 * 1000) / 7,000,000 ≈ 92.57 ms per record.
Look for any shared-resource that is an IO blocker.
look for any common calculation that you can do in advance and cache. A lookup beats a calc.
errata
leave headroom for base OS processes. These will vary by environment; you mention AWS, but it's best to learn conceptually how to monitor any system for activity.
run top in a separate screen/terminal as the rakes are running.
prefer to run 2 tops in different screens: sort one by memory, sort the other by CPU (see the sketch after this list).
have a way to monitor the rakes
watch for events that bubble up the top processes.
if you do this long / well enough, you've profiled your headroom
run more rakes to fill your headroom
don't overrun your memory or you'll get swapping
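A tiny sketch of that two-terminal setup (the -o flag needs a reasonably recent procps top; on older builds use the interactive M and P sort keys instead):

top -o %MEM  # terminal 1: sorted by memory
top -o %CPU  # terminal 2: sorted by CPU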
You may want to consider beanstalk instead, but my guess is you'll find that more complicated than learning all these good foundations first.

Reducing Redmine's memory usage - Low Hanging Fruit

I am running a Redmine instance with Passenger and Nginx. With only a handful of issues in the database, Redmine consumes over 80 MB of RAM.
Can anyone share tips for reducing Redmine's memory usage? The Redmine instance is used by 3 people and I am willing to sacrifice speed.
There isn't really any low-hanging fruit. If there were, we would've already included and activated it by default.
80 MB RSS (as opposed to virtual size, which can be much more) is actually pretty good. In normal operation, it will use between 70 and 120 MB RSS per process (depending on the deployment model; rather less on Passenger).
As andrea suggested, you can reduce your overall memory footprint by about one third when you use REE (Ruby Enterprise Edition, which is also free). But this saving can only be achieved when you run more than one process (each requiring the above memory). REE achieves this saving by optimizing Ruby for a technique called copy-on-write, so that additional application processes take less memory.
So I'm sorry, your (hypothetical) 128 MB vServer will probably not suffice. For a small installation, you might be able to squeeze a minimal installation into 256 MB, but it only starts to be anything but a complete pain in the ass at 512 MB (including the database).
That's because of how Rails applications work, in contrast to things like PHP. They require a running application server instance, and that instance is typically able to answer one request at a time, using about the same amount of memory all the time. So your memory consumption is roughly proportional to the number of application processes you run, independent of actual load. But if you tune your system properly, you can get quite a number of reqs/s out of one process.
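Since memory scales with the process count, the main knob on a Passenger + Nginx setup is the pool size; a config sketch (the values are illustrative for a tiny 3-user instance):

# nginx http block
passenger_max_pool_size 1;     # run a single Redmine process
passenger_pool_idle_time 300;  # stop it after 5 idle minutes to free the RAM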
Maybe I am replying very late, but I got stuck on the same issue and found a link about reducing Ruby/Rails memory usage which works for me:
http://community.webfaction.com/questions/2476/how-can-i-reduce-my-rubyrails-memory-usage-when-running-redmine
It may be helpful for someone else.
