Sidekiq causing memory bloat in Rails app

I have a Rails app with Sidekiq workers performing jobs in the background, and originally it ran around 30 threads. We found this was causing high memory usage, and reducing the worker thread count reduced the memory bloat, but I don't understand why. Can anyone please explain?

From a quick Google search it sounds like you are experiencing memory fragmentation, which is pretty normal for Sidekiq. Are you using class variables at all? Does your code require classes at execution time? How many ActiveRecord queries are you executing? Many AR queries create thousands, if not millions, of objects and throw them away. Is your code thread-safe? As per this post from the author of Sidekiq, memory bloat comes from a large number of memory arenas in multithreaded applications. There are details of a solution in that article, and even the README of the Sidekiq repo is very helpful, but it is worth outlining the causation to understand why memory bloat happens in Rails/Ruby.
Memory allocation in Ruby involves three layers: the interpreter, the OS memory allocator library, and the kernel. Ruby organises objects in memory arenas called Ruby heap pages. A Ruby heap page is divided into equal-sized slots, where one object occupies one slot. Slots are either occupied or free, and when Ruby allocates a new object it tries to claim a free slot. If there are no free slots, it allocates a new heap page. Each slot has a fixed byte size; if an object is larger than that limit, the slot holds a pointer to memory for the object obtained separately from the allocator.
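You can see this layout from inside a running process by poking at GC.stat. A minimal sketch (the stat key names below are from recent CRuby releases and may differ slightly between versions):

    # Inspect the heap-page layout of the current process.
    stats = GC.stat
    puts "heap pages:  #{stats[:heap_allocated_pages]}"
    puts "total slots: #{stats[:heap_available_slots]}"
    puts "live slots:  #{stats[:heap_live_slots]}"
    puts "free slots:  #{stats[:heap_free_slots]}"

    # Allocating objects consumes free slots; once none are left, Ruby
    # allocates new heap pages rather than squeezing into occupied ones.
    objs = Array.new(500_000) { Object.new }
    puts "heap pages after allocating: #{GC.stat[:heap_allocated_pages]}"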
Memory fragmentation arises from this allocation pattern and is quite common in heavily threaded applications. When garbage collection happens, the heap page marks a cleared slot as free, allowing the slot to be reused. If all objects in a heap page are marked as free, the whole page is released back to the memory allocator and potentially the kernel. Ruby does not guarantee that all dead objects are collected at once, so what happens when a large number of heap pages are left only partially filled? Those pages have free slots that Ruby can allocate into, but from the memory allocator's point of view the pages are still allocated memory. The memory allocator does not release an entire OS heap at once; it can release an individual OS page only once every allocation on that page has been freed.
Threading compounds the problem: each thread tries to allocate memory from the same OS heap at the same time, and they contend for access. Only one thread can perform an allocation at a time, which reduces multithreaded memory-allocation performance. The memory allocator tries to optimise for this by creating multiple OS heaps and assigning each thread its own OS heap.
If you have access to Ruby 2.7 you can call GC.compact to combat this. It finds objects that can be moved, condenses them, and reduces the number of heap pages in use. Free slots that GC has left scattered between occupied slots can now be consolidated. Say, for example, you have a heap page with four slots, and only slots one, two and four hold objects. The compact call checks whether the object in slot four is movable, moves it to slot three, and redirects any references to the object so they point at slot three. Slot four is then marked with a T_MOVED placeholder, and the final GC pass replaces the T_MOVED object with T_EMPTY, ready for assignment.
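A minimal sketch of what that looks like in practice, assuming Ruby 2.7 or later (whether the page count actually drops depends on your workload):

    # Trigger compaction manually and compare page usage.
    before = GC.stat(:heap_allocated_pages)

    GC.compact  # slides movable objects together, leaving T_MOVED
                # placeholders that a later GC turns into empty slots

    after = GC.stat(:heap_allocated_pages)
    puts "heap pages before: #{before}, after: #{after}"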
Personally, I would not rely solely on GC.compact; you could also apply the simple MALLOC_ARENA_MAX trick (for example, setting MALLOC_ARENA_MAX=2 in the worker's environment to cap the number of glibc malloc arenas). Have a read of the source documents and you should find a suitable solution.

Related

Writing millions of records to an mnesia table takes up a lot of memory (RAM) which is not reclaimed even after the records are deleted

I am running an Erlang application that often writes millions of records to an mnesia table as part of a scheduler. When the time is due, the records get executed and removed from the table. The table is configured with disc_copies and {type, ordered_set}. I use transactions for writing and dirty operations for deleting records.
In an experiment that writes 2 million records and then deletes all of them, the RAM was not reclaimed after it finished. There is a spike that roughly doubles the memory when I start deleting the records. For example, the beam process starts at 75 MB and sits at 410 MB after the experiment. I used erlang:memory() to inspect memory before and after, and found that it was consumed by process_used and binary, although I did not actually do anything with binaries. If I call erlang:garbage_collect(Pid) on all running processes, the memory is reclaimed, leaving 180 MB.
Any suggestions for troubleshooting this issue would be highly appreciated. Thank you so much.
Answer from Rickard Green of the Erlang/OTP team:
The above does not indicate a bug.
A process is not garbage collected unless it reaches certain limits, for example when it needs to allocate heap data and there is no free heap available. If a process stops executing, it does not matter how much time passes: it won't garbage collect by itself unless it reaches one of these limits. A garbage collection can be forced by calling erlang:garbage_collect(), though.
A process that has had a lot of live data (and has therefore grown large) but has no live data at the time of the garbage collection won't shrink to its original size immediately. It will instead keep a relatively large heap. That heap space is free for the process to use, but from the system's point of view it is allocated. The relatively large heap is chosen in order to avoid triggering garbage collections unnecessarily frequently.
Your own processes are not the only ones affected when you execute: other processes may also build up heap in order to serve yours.
If you look at memory consumption via top or similar, it is also expected that memory usage will have increased after execution, even if you are able to garbage collect every process down to its initial size. This is due to memory allocators that place memory blocks into larger chunks of memory, which cannot be returned to the OS until the whole chunk is free. More or less every memory allocation system in existence has this characteristic.

How does memory work in Ruby?

I'm trying to understand the idea behind memory usage in Ruby. I'm currently going through memory issues on my Rails web app and API.
Here's a simple question:
If I load many records inside a variable like so:
users = User.where(work: 'cook')
This would probably hold in my app's memory for the time I'm using this variable, right?
But would it help to free memory by doing the following after I'm done using the variable in my code?
users = nil
Thank you for your help. I'm also open to answers that answer the question on a broader topic.
Yes, setting users to nil would indeed reduce the required memory (very slightly), but it's not necessary, as the garbage collector will eventually sweep it. In production you should assume your Ruby process will always grow over time and should be periodically restarted if you're concerned about memory management. The maximum heap-space reduction you'll ever see in Ruby is minimal compared to its growth over time, so I wouldn't bother setting large collections to nil just to save a few bytes slightly earlier than the GC would have swept them anyway.

Ruby allocates objects in a heap space that consists of heap pages. Assuming you're using Ruby 2.1 or later, the heap space is divided into used (aka eden) and empty (aka tomb) heap pages. When instantiating objects, Ruby looks for free space in the eden pages first, and only if no space is available will it take a page from the tomb. When you then overwrite the variable with nil and the objects are collected, those heap pages are added back to the tomb. Moving pages from the eden to the tomb reduces heap size slightly, but Ruby's garbage collector won't drastically shrink the heap, because it assumes that if you've created a large collection of objects once, you'll do it again.

One book I recommend diving into is "Ruby Performance Optimization", as it covers Ruby's garbage collector in depth.
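You can watch the eden/tomb split yourself. A small sketch, assuming a CRuby version (2.2+) that exposes the heap_eden_pages and heap_tomb_pages keys in GC.stat; the array here is just a stand-in for a large query result:

    def pages
      { eden: GC.stat(:heap_eden_pages), tomb: GC.stat(:heap_tomb_pages) }
    end

    puts "start:     #{pages}"

    # Stand-in for loading a large ActiveRecord result set.
    users = Array.new(500_000) { |i| "user-#{i}" }
    puts "allocated: #{pages}"   # eden grows to hold the new objects

    users = nil
    GC.start
    puts "after GC:  #{pages}"   # some pages move to the tomb, but the heap
                                 # rarely shrinks back to its starting size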

Does Elixir Garbage Collector suffer from stop the world pause? [duplicate]

I want to know technical details about garbage collection (GC) and memory management in Erlang/OTP.
But I cannot find them on erlang.org or in its documentation.
I have found some articles online which talk about GC in a very general manner, such as what garbage collection algorithm is used.
To classify things, let's define the memory layout and then talk about how GC works.
Memory Layout
In Erlang, each thread of execution is called a process. Each process has its own memory and that memory layout consists of three parts: Process Control Block, Stack and Heap.
PCB: Process Control Block holds information like process identifier (PID), current status (running, waiting), its registered name, and other such info.
Stack: It is a downward growing memory area which holds incoming and outgoing parameters, return addresses, local variables and temporary spaces for evaluating expressions.
Heap: It is an upward growing memory area which holds process mailbox messages and compound terms. Binary terms which are larger than 64 bytes are NOT stored in process private heap. They are stored in a large Shared Heap which is accessible by all processes.
Garbage Collection
Currently Erlang uses generational garbage collection, which runs inside each Erlang process's private heap independently, plus reference-counting garbage collection for the global shared heap.
Private Heap GC: It is generational, so it divides the heap into two segments: young and old generations. There are also two collection strategies: Generational (Minor) and Fullsweep (Major). A generational GC collects just the young heap, but a fullsweep collects both the young and old heaps.
Shared Heap GC: It is reference counting. Each object in the shared heap (Refc) has a counter of references to it held by other objects (ProcBin), which are stored inside the private heaps of Erlang processes. If an object's reference counter reaches zero, the object has become inaccessible and will be destroyed.
To get more details and performance hints, just look at my article which is the source of the answer: Erlang Garbage Collection Details and Why It Matters
A reference paper for the algorithm: One Pass Real-Time Generational Mark-Sweep Garbage Collection by Joe Armstrong and Robert Virding (1995, at CiteSeerX)
Abstract:
Traditional mark-sweep garbage collection algorithms do not allow reclamation of data until the mark phase of the algorithm has terminated. For the class of languages in which destructive operations are not allowed we can arrange that all pointers in the heap always point backwards towards "older" data. In this paper we present a simple scheme for reclaiming data for such language classes with a single pass mark-sweep collector. We also show how the simple scheme can be modified so that the collection can be done in an incremental manner (making it suitable for real-time collection). Following this we show how the collector can be modified for generational garbage collection, and finally how the scheme can be used for a language with concurrent processes.
Erlang has a few properties that make GC actually pretty easy.
1 - Every variable is immutable, so a variable can never point to a value that was created after it.
2 - Values are copied between Erlang processes, so the memory referenced in a process is almost always completely isolated.
Both of these (especially the latter) significantly limit the amount of the heap that the GC has to scan during a collection.
Erlang uses a copying GC. During a GC, the process is stopped, then the live data is copied from the from-space to the to-space. I forget the exact percentages, but the heap will be grown if only something like 25% of it can be collected during a collection, and it will be shrunk if 75% of the process heap can be collected. A collection is triggered when a process's heap becomes full.
The only exception is large values that are sent to another process. These are copied into a shared space and reference counted. When a reference to a shared object is collected, the count is decreased; when the count reaches 0, the object is freed. No attempt is made to handle fragmentation in the shared heap.
One interesting consequence of this is that, for a shared object, the size of the object does not contribute to the calculated size of a process's heap; only the size of the reference does. That means that if you have a lot of large shared objects, your VM could run out of memory before a GC is triggered.
Most of this is taken from the talk Jesper Wilhelmsson gave at EUC2012.
I don't know your background, but apart from the paper already pointed out by jj1bdx, you can also give Jesper Wilhelmsson's thesis a chance.
BTW, if you want to monitor memory usage in Erlang to compare it to e.g. C++ you can check out:
Erlang Instrument Module
Erlang OS_MON Application
Hope this helps!

In programming environments that have automatic memory management, how often are the OS memory allocation routines invoked at runtime?

Do implementations pre-allocate blocks of memory for objects using malloc? When these blocks are used up, will additional memory be requested? When garbage collection runs and compaction occurs, will memory be returned to the OS via calls to free?
Do implementations pre-allocate blocks of memory for objects using malloc?
Yes. Most often they pre-allocate contiguous blocks of memory and implement their own allocation mechanism inside them (for example, one based on an allocation pointer: it points at the memory address for the next object, so allocating an object simply returns that address and bumps the pointer by the requested number of bytes). This is faster than relying on OS calls and gives better control over those memory regions. For example, in the case of the CLR on Windows, those blocks are called segments and are managed via VirtualAlloc/VirtualFree calls. First a fairly large memory region is reserved, and then more and more pages are committed as they are needed. malloc (or, more generally, the Heap API in the case of Windows) is not used in the CLR.
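To make the allocation-pointer idea concrete, here is a toy sketch in Ruby (the class and method names are purely illustrative, not any runtime's real API):

    # A toy bump-pointer allocator: a large block is "reserved" up front,
    # and allocation just returns the current offset and advances it.
    class BumpAllocator
      def initialize(capacity)
        @block = Array.new(capacity)  # stands in for a reserved memory region
        @alloc_ptr = 0                # next free offset within the block
      end

      def allocate(size)
        raise "block full: request a new one" if @alloc_ptr + size > @block.size
        address = @alloc_ptr
        @alloc_ptr += size            # bump the pointer; that's the whole "malloc"
        address
      end

      # After compaction, live objects are slid to the front of the block
      # and the allocation pointer is simply moved back (to "the left").
      def reset_after_compaction(live_size)
        @alloc_ptr = live_size
      end
    end

    heap = BumpAllocator.new(1024)
    a = heap.allocate(40)   # => 0
    b = heap.allocate(24)   # => 40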
When these blocks are used up, will additional memory be requested?
Yes, more blocks may be created, but first the existing ones grow "inside" by committing (consuming) already-reserved memory.
When garbage collection runs and compaction occurs, will memory be returned to the OS via calls to free?
It depends on the specific runtime implementation, but you should not treat this as the main memory-reclamation mechanism. Compaction works inside those preallocated memory blocks; for example, the allocation pointer is moved back to the left after compaction has occurred. But yes, in general segments may be returned to the OS when the GC decides they are no longer needed (for example when all objects living inside have been reclaimed). However, on 32-bit architectures with quite limited virtual address space this could lead to unwanted memory fragmentation, and reusing such blocks was a better option. On 64-bit this is less of a problem, but reusing those blocks may still be a good idea.

DelayedJob doesn't release memory

I'm using Puma server and DelayedJob.
It seems that the memory taken by each job isn't released, and the process slowly bloats, forcing me to restart my dyno (Heroku).
Any reason why the dyno won't return to the same memory usage figure before the job was performed?
Any way to force releasing it? I tried calling GC but it doesn't seem to help.
You can have one of the following problems. Or actually all of them:
Number 1. This is not an actual problem, but a misconception about how Ruby releases memory to the operating system. Short answer: it doesn't. Long answer: Ruby manages an internal list of free objects. Whenever your program needs to allocate new objects, it takes them from this free list. If there are no more objects there, Ruby allocates new memory from the operating system. When objects are garbage collected they go back onto the free list, so Ruby still holds the allocated memory. To illustrate: imagine your program normally uses 100 MB. If at some point it allocates 1 GB, it will hold on to that memory until you restart it.
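A small experiment makes this visible. A sketch, assuming a Unix-like system where ps can report the resident set size of the current process:

    # Resident set size of this process in MB, via `ps`.
    def rss_mb
      `ps -o rss= -p #{Process.pid}`.to_i / 1024
    end

    puts "baseline: #{rss_mb} MB"

    big = Array.new(1_000_000) { |i| "record #{i}" }
    puts "after allocating: #{rss_mb} MB"

    big = nil
    GC.start
    # The collected slots return to Ruby's internal free list, but the
    # process's memory footprint barely shrinks:
    puts "after GC: #{rss_mb} MB (free slots: #{GC.stat(:heap_free_slots)})"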
There are some good resources to learn more about this here and here.
What you should do is increase your dyno size and monitor your memory usage over time. It should stabilize at some level, which will show you your normal memory usage.
Number 2. You may have an actual memory leak, either in your code or in some gem. Check out this repository; it contains information about well-known memory leaks and other memory issues in popular gems. delayed_job is actually listed there.
Number 3. You may have unoptimized code that uses more memory than needed, and you should investigate memory usage and try to decrease it. If you are processing large files or large sets of records, consider doing it in smaller batches, as sketched below.
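For example, with ActiveRecord you can stream records in fixed-size batches instead of materializing the whole result set at once. A sketch reusing the User model from the earlier question (process is a hypothetical per-record step):

    # Loads and instantiates every matching record at once; the whole
    # result set stays in memory for the duration of the job.
    User.where(work: 'cook').each { |user| process(user) }

    # Streams records in batches of 1,000, so each batch can be garbage
    # collected before the next one is fetched.
    User.where(work: 'cook').find_each(batch_size: 1_000) do |user|
      process(user)  # hypothetical per-record step
    end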
