JRuby, Garbage Collector, Redis - memory

I have a JRuby on Rails app which uses several web services (WS) to gather data. The app processes the data and displays it to the user; the user makes changes, and the result is sent back to the WS.
The problem is that I store everything in a session-based cache backed by the memory store, but from time to time, with no clear reason (for me at least), this error pops up:
ActionView::Template::Error (GC overhead limit exceeded)
I read what I could find about it, and apparently it means that the garbage collector is spending too much time trying to free memory without making any real progress. My guess is that, since everything is cached in memory, the GC keeps trying to free it, can't, and throws this error.
So here are the questions.
How can I get around this?
If I switch from the memory store to Redis, and if my assumptions are correct, will this problem still appear?
Will the GC try to free Redis's memory area? (Might be a stupid question but ... please help nonetheless :) )
Thank you.

Redis runs as a separate process, so your app's garbage collector won't touch it. However, it is possible that Redis is using up memory that would otherwise be available to the app, or that you are actually using an in-process memory cache and not Redis, which is a different type of memory store.
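For reference, pointing the Rails cache at Redis keeps the cached data out of the application process entirely, so the process's garbage collector never has to scan it. A minimal sketch, assuming Rails 5.2+ and the redis gem (the URL and expiry values are placeholders):

    # config/environments/production.rb
    Rails.application.configure do
      # Cache entries now live in the Redis process, not in this app's heap.
      config.cache_store = :redis_cache_store, {
        url: ENV.fetch("REDIS_URL", "redis://localhost:6379/1"),
        expires_in: 1.hour
      }
    end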

Related

How do I figure out why a large chunk of memory is not being garbage collected in Rails?

I'm pretty new to Ruby, Rails, and everything else in that ecosystem. I've joined a team that has a Ruby 3.1.2 / Rails 6.1.7 app backed by a Postgres database.
We have a scenario where, sometimes, memory usage on one of our running instances jumps up significantly and is never relinquished (we've waited days). Until today, we didn't know what was causing it or how to reproduce it.
It turns out that it's caused by an internal tool which was running an unbounded ActiveRecord query -- no limit and no paging. When pointing this tool at a more active customer, it takes many seconds, returns thousands of records, and memory usage increases by tens of MB. No amount of waiting will lead to the memory usage going back down again.
Having discovered that, we recently added paging to that particular tool, and in the ~week since, we have not seen usage increasing in giant chunks anymore. However, there are other scenarios which have similar behavior but with smaller payloads; these cause memory usage to increase gradually over time. We deploy this application often enough that it hasn't been a big deal, but I am looking to gain a better understanding of what's happening and to determine if there's a problem here, because that's not what we should see from a stable application that's free of memory leaks.
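For reference, the paging fix described above could look roughly like ActiveRecord's built-in batching; the model name, query, and batch size here are hypothetical, not taken from the actual tool:

    # Instead of materializing every matching record at once:
    #   orders = Order.where(customer_id: customer.id).to_a
    # iterate in fixed-size batches so only one batch is live at a time.
    Order.where(customer_id: customer.id).find_each(batch_size: 1_000) do |order|
      process(order) # hypothetical per-record work
    end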
My first suspicion was a memoized instance variable on the controller, but a quick check indicates that Rails controllers are discarded as soon as the request finishes processing, so I don't think that's it.
My next suspicion was that ActiveRecord was caching my resultset, but I've done a bunch of research on how this works and my understanding is that any cached queries/relations should be released when the request completes. Even if I have that wrong, a subsequent identical request takes just as long and causes another jump in memory usage, so either that's not it, or caching is broken on our system.
My Google searches turn up lots of results about various caching capabilities in Rails 2, 3, and 5 -- but not much about 6.x, so maybe something significant has changed and I just haven't found it.
I did find ruby-prof, memory-profiler, and get_process_mem -- these all seem like they are only suitable for high-level analysis and wouldn't help me here.
Can I explore the contents of the object graph currently in memory on an existing, live instance of my app? I'm imagining that this would happen in the Rails console, but that's not a constraint on the question. If not, is there some other way that I could find out what is currently in memory, and whether it's just a bunch of fragmented pages or if there's actually something that isn't getting garbage collected?
EDIT
#engineersmnky pointed out in the comments that maybe everything is fine and that perhaps Ruby is just still holding on to the OS page due to some other still-valid object therein. However, if this is the case, it strikes me as unlikely that memory usage would not go back down to the previous baseline after several days of production usage.
Loading tens of MB worth of resultset into memory should result in the allocation of more than 1,000 16 KB memory pages in just a handful of seconds (e.g., 20 MB / 16 KB ≈ 1,250 pages). It seems reasonable to assume that the vast majority of those would contain exclusively this resultset, and could therefore be released as soon as the resultset is garbage collected.
Furthermore, I can reproduce the increased memory usage by running the same unbounded ActiveRecord query in the Rails console, and when I close that console, the memory goes down almost immediately -- exactly what I was expecting to see when the web request completes. I don't fully understand how the Rails console works when connecting to a running application, though, so this may not be relevant.
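One way to approach the "what is currently in memory" question, not suggested in the thread itself but a possible starting point, is Ruby's built-in objspace library, which can dump every live object for offline analysis. A minimal sketch (the dump path is arbitrary):

    require 'objspace'
    require 'json'

    # Dump every live object (type, size, references) as JSON lines.
    # On a running app this could be triggered from a Rails console.
    File.open("/tmp/heap.dump", "w") { |io| ObjectSpace.dump_all(output: io) }

    # Count objects by type to get a rough picture of what occupies the heap.
    counts = Hash.new(0)
    File.foreach("/tmp/heap.dump") { |line| counts[JSON.parse(line)["type"]] += 1 }
    pp counts.sort_by { |_, n| -n }.first(20)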

Sidekiq does not release memory after job is processed

I have a Ruby on Rails app where we validate records from huge Excel files (200k records) in the background via Sidekiq. We also use Docker, and hence a separate container for Sidekiq. When Sidekiq starts up, memory usage is approximately 120 MB, but once the validation worker begins, it climbs to about 500 MB (and that's after a lot of optimisation).
The issue is that even after the job is processed, memory usage stays at 500 MB and is never freed, leaving no room for new jobs.
I manually trigger garbage collection with GC.start after every 10k records and again after the job completes, but it still doesn't help.
This is most likely not related to Sidekiq, but to how Ruby allocates memory from, and releases it back to, the OS.
Most likely the memory cannot be released because of fragmentation. Besides optimizing your program (process the data chunkwise instead of reading it all into memory), you could try to tweak the allocator's settings or switch to a different allocator.
There has been a lot written about this specific issue with Ruby and memory; I really like this post by Nate Berkopec, which goes into all the details: https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html
The simple "solution" is:
Use jemalloc or, if not possible, set MALLOC_ARENA_MAX=2.
The more complex solution would be to try and optimize your program further, so that it does not load that much data in the first place.
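As a sketch of the "process data chunkwise" suggestion (the answer gives no code; the file format is swapped to CSV here because it streams easily, and the validation call and batch size are hypothetical):

    require 'csv'

    # Stream the file row by row instead of loading it all, and work in
    # fixed-size slices so only one slice of rows is live at a time.
    CSV.foreach("records.csv", headers: true).each_slice(10_000) do |batch|
      batch.each { |row| validate(row) } # hypothetical validation
      GC.start                           # optional; mirrors the poster's approach
    end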
I was able to cut memory usage in a project from 12 GB to under 3 GB by switching to jemalloc. That project dealt with a lot of imports/exports and was written quite poorly, so it was an easy win.

DelayedJob doesn't release memory

I'm using Puma server and DelayedJob.
It seems that the memory taken by each job isn't released, and I slowly accumulate bloat that forces me to restart my dyno (Heroku).
Any reason why the dyno won't return to the memory usage it had before the job was performed?
Any way to force it to be released? I tried calling the GC, but it doesn't seem to help.
You can have one of the following problems. Or actually all of them:
Number 1. This is not an actual problem, but a misconception about how Ruby releases memory to the operating system. Short answer: it doesn't. Long answer: Ruby manages an internal free list of objects. Whenever your program needs to allocate new objects, it takes them from this free list. If there are none left, Ruby allocates new memory from the operating system. When objects are garbage collected, they go back onto the free list, so Ruby still holds the allocated memory. To illustrate: imagine your program normally uses 100 MB. If at some point it allocates 1 GB, it will hold on to that memory until you restart it.
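A quick way to see this behaviour yourself is GC.stat, which reports how many heap pages the process has allocated; after a large allocation is collected, the page count typically stays elevated rather than being returned to the OS. A minimal sketch:

    before = GC.stat(:heap_allocated_pages)

    data = Array.new(1_000_000) { "x" * 100 } # large, short-lived allocation
    data = nil
    GC.start

    after = GC.stat(:heap_allocated_pages)
    # `after` is typically much larger than `before`: the objects are gone,
    # but the pages stay on Ruby's free list instead of going back to the OS.
    puts "pages before: #{before}, after GC: #{after}"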
There are some good resources to learn more about it here and here.
What you should do is to increase your dyno size and monitor your memory usage over time. It should stabilize at some level. This will show you your normal memory usage.
Number 2. You may have an actual memory leak, either in your code or in some gem. Check out this repository; it contains information about well-known memory leaks and other memory issues in popular gems. delayed_job is actually listed there.
Number 3. You may have unoptimized code that uses more memory than needed; you should investigate its memory usage and try to reduce it. If you are processing large files, maybe you should do it in smaller batches, etc.

Can a Memcached daemon ever free() unused memory, without terminating the process?

I believe that you can't force a running Memcached instance to de-allocate memory, short of terminating that Memcached instance (and freeing all of the memory it held). Does anyone know of a definitive piece of documentation, or even a mailing list or blog posting from a reliable source, that can confirm or deny this impression?
As I understand it, a Memcached process initially allocates a chunk of memory (the exact initial allocation size is configurable), and then monotonically increases its memory utilization over its lifetime, limited by the daemon's maximum memory allocation size (also configurable). At no point does the Memcached daemon ever free any memory, regardless of whether the daemon has any ongoing need for the memory it holds.
I know that this question might sound a little whiny, with a tone of "I DEMAND that open source project X support my specific need!" That's not it, at all--I'm purely interested in the exact technical answer, here, and I swear I'm not harshing on Memcached. For the curious, this question came out of a discussion about possible methods for gracefully juggling multiple Memcached instances on a single server, given an application where the cost of a cache flush can be quite high.
However, I'd appreciate it if you save your application suggestions/advice for a different question (re-architecting my application, using a different caching implementation, etc.). I do appreciate a good brainstorm, but I think this question will be most valuable if it stays focused on the technical specifics of how Memcached does and does not work. If you don't have the answer to this specific question, there is probably still value in what you have to say, but I'd guess that there's a different, better place to post the more speculative comments/suggestions/advice.
This is probably the hardest problem we have to solve for memcached currently (well, a variation of it, anyway).
Freeing a chunk of memory requires us to know that a) nothing within the chunk is in use and b) nothing will start using it while we're in the process of purging it for reuse/freeing. I've heard some really good ideas for how we might solve our slab rebalancing problem, which is basically the same thing, except we're not trying to free the memory, but to give it to something else (a common problem in a few large installations).
Also, whether free actually reduces the RSS of your process is implementation dependent. In many cases, a malloc/fill/free will leave the memory mapped in (unless your allocator uses mmap instead of sbrk).
I'm pretty sure this isn't possible with memcached. I don't see any technical reason why it couldn't be implemented though. Lock cache operations, expire enough keys to reach the desired size, update the size, unlock. (I'm sure there's nicer ways to avoid blocking the server during that time.)
The standard and default memory-management mechanism in memcached is the slab allocator. This means that memory is allocated to the process and never released back to the operating system. Basically, when memory is no longer used to store some data, it is held by the process so it can be reused later when needed. The operating system only reclaims the memory when the process exits, which is why memory is released when you kill/stop memcached.
There is a compile-time option in memcached to enable a malloc/free mechanism, so that when free() is called, memory might be released back to the operating system (this depends on the C standard library implementation). But doing so can cause fragmentation and hurt performance.
Please read more about the issue here:
Why not use malloc/free
Memcached memory management

Ejabberd Memory Consumption (or Leak?)

I'm using ejabberd + mochiweb on our server. The longer I keep ejabberd and mochiweb running, the more memory is consumed (last night it was consuming 35% of memory; right now it's a bit above 50%). I thought this was just a Mnesia garbage collection issue, so I installed Erlang R13B3 and restarted ejabberd. This didn't fix it, though.
So I'm noticing now that, at a bit above 50% of full memory consumption, ejabberd seems to be starting to "let go" of memory and staying at around 50%. Is this normal? Is ~50% a threshold for ejabberd, so that when it reaches it, it says, "hey, time to actually let some memory go..." and maybe keeps the rest around for quick access (like caching Mnesia)?
I appreciate any input. Thanks!
Run erlang:memory(). in your shell every now and then. You can also give erlang:system_info(Type). a try, with allocated_areas and allocator as the Type.
These should give you a hint about what kind of memory is leaking.
You can also set up memsup to warn you about processes allocating too much memory.
Turns out there is no memory leak (yay!). Ejabberd is taking up only < 40 MB. I finally saw the light when I looked at the usage graphs on EngineYard - only 288 MB is actually being used, 550 MB is buffered, and 175 MB is cached. My ejabberd server receives an update every 2.5 seconds from each client, so that may explain why so much is being buffered/cached.
Thanks for all of your help.
Newly created atoms in Erlang processes never get garbage collected. This can be an issue when processes are registered by an algorithm that creates atom names from random input, e.g. randomly generated strings.
