How to analyze a Ruby on Rails memory leak?

I am dealing with a legacy system (Ruby 2.7.6) that suffers from a memory leak, which led the previous developers to use puma worker killer to work around the memory issue by restarting the process every 30 minutes.
As traffic increases, we now need to add instances and shorten the 30-minute kill interval, down to as little as 20 minutes.
We would like to investigate the source of this memory leak, which apparently originates in one of our many gem dependencies (information given by a previous developer).
The system runs on AWS (Elastic Beanstalk) but can also run on Docker.
Can anyone suggest a good tool, and a guide on how to find the source of this memory leak?
Thanks
** UPDATE:
I made use of rack-mini-profiler and took some memory snapshots to see the influence of about 100 requests on the server [BEFORE, DURING, AFTER].
Judging by the outputs, it does not seem that there is a memory leak in Ruby, but the memory usage did increase and stay up, even though it does not seem to be used by us...
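(For reference, reports in the format shown below come from the memory_profiler gem; rack-mini-profiler can expose the same report via its ?pp=profile-memory parameter when memory_profiler is installed. A minimal sketch of producing one directly, assuming the gem is in the bundle and using a trivial placeholder workload:)

require "memory_profiler"

# Wrap whatever code path you suspect; the block below is only placeholder work.
report = MemoryProfiler.report do
  100.times { |i| "agent-#{i}".upcase }
end

# Prints the "Total allocated", "Total retained" and per-gem breakdowns
# in the same format as the snapshots below.
report.pretty_print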
BEFORE:
KiB Mem : 2007248 total, 628156 free, 766956 used, 612136 buff/cache
KiB Swap: 2097148 total, 2049276 free, 47872 used. 1064852 avail Mem
Total allocated: 115227 bytes (1433 objects)
Total retained: 21036 bytes (147 objects)
allocated memory by gem
33121 activesupport-6.0.4.7
21687 actionpack-6.0.4.7
14484 activerecord-6.0.4.7
12582 var/app
9904 ipaddr
6957 rack-2.2.4
3512 actionview-6.0.4.7
2680 mysql2-0.5.3
1813 rack-mini-profiler-3.0.0
1696 audited-5.0.2
1552 concurrent-ruby-1.1.10
DURING:
KiB Mem : 2007248 total, 65068 free, 1800424 used, 141756 buff/cache
KiB Swap: 2097148 total, 2047228 free, 49920 used. 58376 avail Mem
Total allocated: 225272583 bytes (942506 objects)
Total retained: 1732241 bytes (12035 objects)
allocated memory by gem
106497060 maxmind-db-1.0.0
58308032 psych
38857594 user_agent_parser-2.7.0
4949108 activesupport-6.0.4.7
3967930 other
3229962 activerecord-6.0.4.7
2154670 rack-2.2.4
1467383 actionpack-6.0.4.7
1336204 activemodel-6.0.4.7
AFTER:
KiB Mem : 2007248 total, 73760 free, 1817688 used, 115800 buff/cache
KiB Swap: 2097148 total, 2032636 free, 64512 used. 54448 avail Mem
Total allocated: 109563 bytes (1398 objects)
Total retained: 14988 bytes (110 objects)
allocated memory by gem
29745 activesupport-6.0.4.7
21495 actionpack-6.0.4.7
13452 activerecord-6.0.4.7
12502 var/app
9904 ipaddr
7237 rack-2.2.4
3128 actionview-6.0.4.7
2488 mysql2-0.5.3
1813 rack-mini-profiler-3.0.0
1360 audited-5.0.2
1360 concurrent-ruby-1.1.10
Where can the leak be, then? Is it Puma?

It seems from the statistics in the question that most objects get freed properly by the memory allocator.
However, when you have a lot of repeated allocations, the system's malloc can sometimes (and often does) hold on to the memory without releasing it to the system (Ruby isn't aware of this memory, which malloc considers "free").
This is done for two main reasons:
Most importantly: heap fragmentation (the allocator is unable to free the memory and unable to use parts of it for future allocations).
The system's memory allocator knows it would probably need this memory again soon (that's in relation to the part of the memory that can be freed and doesn't suffer from fragmentation).
This can be solved by trying to replace the system's memory allocator with an allocator that's tuned for your specific needs (e.g., jemalloc, as suggested here and here and asked about here).
You could also try to use gems that have a custom memory allocator when using C extensions (the iodine gem does that, but you could make other gems do it too).
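If you want to experiment with a different allocator, one quick way to check whether the Ruby you are running was built against jemalloc is to inspect RbConfig (a minimal sketch; the LD_PRELOAD path in the comment is only a typical Debian/Ubuntu location, not taken from your setup):

require "rbconfig"

# A Ruby compiled with --with-jemalloc usually mentions -ljemalloc in its link flags.
linked = RbConfig::CONFIG.values.grep(/jemalloc/).any?
puts(linked ? "Ruby appears to be linked against jemalloc" : "no jemalloc in this Ruby build")

# If it is not compiled in, jemalloc can also be injected at process start, e.g.:
#   LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 bundle exec puma
# (example path only; adjust to wherever libjemalloc lives on your image)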
This approach should help mitigate the issue, but the fact is that some of your gems appear memory hungry... I mean...:
Is the maxmind-db gem using 106,497,060 bytes (106 MB) of memory, or did it allocate that number of objects?
And why is psych so hungry? Are there any round trips between data and YAML that could be skipped?
There seem to be a lot of user agent strings stored concurrently (the user_agent_parser gem)... maybe you could cache these strings instead of holding a lot of duplicates. For example, you could make a Set of these strings and replace each String object with the object in the Set. That way equal strings would point at the same object, preventing some object duplication and freeing up some memory (see the sketch after this list).
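A possible shape for that kind of deduplication (a sketch only; the class name is made up, and on Ruby 2.5+ the unary minus operator already interns equal frozen strings, which achieves much the same thing):

# A tiny intern pool: equal user agent strings collapse into one shared frozen object.
class UserAgentPool
  def initialize
    @pool = {}
  end

  def intern(ua_string)
    @pool[ua_string] ||= ua_string.dup.freeze
  end
end

pool = UserAgentPool.new
a = pool.intern("Mozilla/5.0 (X11; Linux x86_64)")
b = pool.intern("Mozilla/5.0 (X11; Linux x86_64)")
a.equal?(b) # => true, both names point at the same object

# Built-in alternative on Ruby 2.5+: the frozen string table.
(-"Mozilla/5.0 (X11; Linux x86_64)").equal?(-"Mozilla/5.0 (X11; Linux x86_64)") # => true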
Is it Puma?
Probably not.
Although I am the author of the iodine web server, I really love the work the Puma team did over the years and think it's a super solid server for what it offers. I really doubt the leak is from the server, but you can always switch and see what happens.
Re: the difference between the Linux report and the Ruby profiler
The difference is in the memory held by malloc - "free" memory that isn't returned to the system and that Ruby doesn't know about.
Ruby profilers test the memory Ruby allocated ("live" memory, if you will). They have access to the number of objects allocated and the memory held by those objects.
The malloc library isn't part of Ruby. It's part of the C runtime library on top of which Ruby sits.
There's memory allocated for the process by malloc that isn't used by Ruby. That memory is either waiting to be used (retained by malloc for future use) or waiting to be released back to the system (or fragmented and lost for the moment).
That difference between what Ruby uses and what malloc holds should explain the gap between the Linux reporting and the Ruby profiler's reporting.
Some gems might be using their own custom-made memory allocator (e.g., iodine does that). These behave the same as malloc in the sense that the memory they hold will not show up in the Ruby profiler (at least not completely).
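A rough way to see that gap on a running process (a Linux-only sketch; the numbers will never match exactly, because malloc bookkeeping, fragmentation and C-extension data all live in the difference):

require "objspace"

# Bytes Ruby itself is tracking: the sizes of live Ruby objects.
ruby_bytes = ObjectSpace.memsize_of_all

# Bytes the OS considers resident for the whole process (Linux /proc interface).
rss_kb = File.read("/proc/self/status")[/VmRSS:\s+(\d+)/, 1].to_i

puts "Ruby object memory: #{ruby_bytes / 1024} KiB"
puts "Process RSS:        #{rss_kb} KiB"
puts "Held outside Ruby (malloc caches, fragmentation, C data): #{rss_kb - ruby_bytes / 1024} KiB"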

Related

Redis hash structure does not make memory usage as efficient as described

My Redis version is 3.2.9 and I modified redis.conf:
hash-max-ziplist-entries 256
hash-max-ziplist-value 4096
However, the results are not as described in Memory Optimization (the Redis hash structure can make memory usage more efficient). The capacity assessment also confuses me; I will show the results I get below.
As shown above, for Redis string key-values: the first picture shows that 3085 and 4086 use the same memory. The second picture shows that 4096 uses more memory (about 1024 bytes per key), not 4096 bytes per key (jemalloc).
I hope someone can help me, thank you.
Internally, for optimisation purposes, Redis stores entries in a data structure called a ziplist, which works directly with memory addresses.
So the optimisation is actually compaction and reduction of memory wastage in using and maintaining pointers.
ziplist:
+----+----+----+
| a | b | c |
+----+----+----+
Now, let's say we updated the value for b and the value size increased from, say, 10 to 20 bytes.
We have no way to fit that value in place, so we do a ziplist resizing.
ziplist:
+----+--------+----+
| a | bb | c |
+----+--------+----+
So, when doing the resizing, it will create a new block of memory with the larger size, copy the old data to that newly allocated memory, and then deallocate the old memory area.
Since memory is moved in such cases it leads to memory fragmentation.
Redis also does memory de-fragmentation which can bring this ratio down to less than 1.
This fragmentation ratio is calculated as:
(resident memory) / (memory allocation)
How can resident memory be less than allocated memory, you ask?
Normally the allocated memory should be fully contained in the resident memory, nevertheless there are a few exceptions:
If parts of the virtual memory are paged out to disk, the resident memory can be smaller than the allocated memory.
There are cases of shared memory where the shared memory is marked as used, but not as resident.

kbmmemtable EOutOfMemory error after LoadFromDataset

I am using Delphi 7 Enterprise under Windows 7 64 bit.
My computer has 16 GB of RAM.
I try to use kbmMemTable 7.70.00 Professional Edition (http://news.components4developers.com/products_kbmMemTable.html) .
My table has 150,000 records, but when I try to copy the data from the Dataset to the kbmMemTable, it only copies 29,000 records and I get this error: EOutOfMemory.
I saw this message:
https://groups.yahoo.com/neo/groups/memtable/conversations/topics/5769,
but it didn't solve my problem.
An out-of-memory error can happen for various reasons:
Your application uses too much memory in general. A 32-bit application typically runs out of memory when it has allocated 1.4 GB using the FastMM memory manager. Other memory managers may have worse or better ranges.
Memory fragmentation. There may not be enough space in memory for a single large allocation that is requested. kbmMemTable will attempt to allocate roughly 200,000 x 4 bytes as its own largest single allocation. That shouldn't be a problem.
Too many small allocations, leading to the above memory fragmentation. kbmMemTable will allocate from 1 to n blocks of memory per record depending on the setting of the Performance property.
If Performance is set to fast, then 1 block will be allocated (unless blobs fields exists, in which case an additional allocation will be made per not null blob field).
If Performance is balanced or small, then each string field will allocate another block of memory per record.
best regards
Kim/C4D

Golang. Zero Garbage propagation or efficient use of memory

From time to time I come across concepts like zero garbage or efficient use of memory, etc. As an example, in the Features section of the well-known package httprouter you can see the following:
Zero Garbage: The matching and dispatching process generates zero bytes of garbage. In fact, the only heap allocations that are made, is by building the slice of the key-value pairs for path parameters. If the request path contains no parameters, not a single heap allocation is necessary.
Also this package shows very good benchmark results compared to standard library's http.ServeMux:
BenchmarkHttpServeMux 5000 706222 ns/op 96 B/op 6 allocs/op
BenchmarkHttpRouter 100000 15010 ns/op 0 B/op 0 allocs/op
As far as I understand, the second one has (from the table) no heap memory allocation and zero allocations made per repetition on average.
The question: I want to gain a basic understanding of memory management: when the garbage collector allocates/deallocates memory, what the benchmark numbers mean (the last two columns of the table), and how people know when the heap is being allocated.
I'm absolutely new to memory management, so it's really difficult to understand what's going on "under the hood". The articles I've read:
https://golang.org/ref/mem
https://golang.org/doc/effective_go.html
http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html
http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
The garbage collector doesn't allocate memory :-), it just deallocates. Go's garbage collector is evolving, for the details have a look at the design document https://docs.google.com/document/d/16Y4IsnNRCN43Mx0NZc5YXZLovrHvvLhK_h0KN8woTO4/preview?sle=true and follow the discussion on the golang mailing lists.
The last two columns in the benchmark output are dead simple: how many bytes have been allocated in total and how many allocations have happened during one iteration of the benchmark code (these columns appear when you run go test with the -benchmem flag or call b.ReportAllocs() in the benchmark). This allocation is done by your code, not by the garbage collector. As any allocation is a potential creation of garbage, reducing these numbers might be a design goal.
When are things allocated on the heap? Whenever the Go compiler decides to! The compiler tries to allocate on the stack, but sometimes it must use the heap, especially if a value escapes from the local stack-based scopes. This escape analysis is currently undergoing rework, so it is not easy to tell which values will be heap- or stack-allocated, especially as this changes from compiler version to version (you can ask the compiler to print its escape-analysis decisions with go build -gcflags='-m').
I wouldn't be too obsessed with avoiding allocations until your benchmarking shows too much GC overhead.

Erlang: discrepancy of memory usage figures

When I ran my WebSocket test, I found the following interesting memory usage results:
Server started, no connections
[{total,573263528},
{processes,17375688},
{processes_used,17360240},
{system,555887840},
{atom,472297},
{atom_used,451576},
{binary,28944},
{code,3774097},
{ets,271016}]
44 processes,
System:705M,
Erlang Residence:519M
100K Connections
[{total,762564512},
{processes,130105104},
{processes_used,130089656},
{system,632459408},
{atom,476337},
{atom_used,456484},
{binary,50160},
{code,3925064},
{ets,7589160}]
100044 processes,
System: 1814M,
Erlang Residence: 950M
200K Connections
(server restarted and connections created from 0, not continued from case 2)
[{total,952040232},
{processes,243161192},
{processes_used,243139984},
{system,708879040},
{atom,476337},
{atom_used,456484},
{binary,70856},
{code,3925064},
{ets,14904760}]
200044 processes,
System:3383M,
Erlang: 1837M
The figures with "System:" and "Erlang:" are provided htop, others are output of memory() call from erlang shell. Please look at the total and erlang residence memory. When there is no connection, these two are roughly same, with 100K connections, residence memory is a little larger than total, with 200K connections, residence memory is almost double the total.
Can anybody explain?
The most probable answer to your question is memory fragmentation.
Allocating OS memory is expensive, so Erlang tries to manage memory for you.
When Erlang allocates memory, it creates an entity called a "carrier", which consists of many "blocks". Erlang's memory(total) reports the sum of all the block sizes (memory actually used). The OS reports the sum of all carrier sizes (the sum of memory used and preallocated). Both the sum of block sizes and the sum of carrier sizes can be read from the Erlang VM. If (block sizes)/(carrier sizes) << 1, then the VM has a hard time freeing the carriers. There might be many big carriers with only a couple of blocks used. You can read it with erlang:system_info({allocator, Type}), but there is an easier way. You can check it using the Recon library:
http://ferd.github.io/recon/recon_alloc.html
Firstly check:
recon_alloc:fragmentation(current).
and next:
recon_alloc:fragmentation(max).
This should explain the difference between the total memory reported by the Erlang VM and by the OS. If you are sending many small messages over websockets, you can decrease the fragmentation by running Erlang with two options:
erl +MBas aobf +MBlmbcs 512
The first option changes the block allocation strategy from best fit to address order best fit, which could help squeeze more blocks into the first carriers; the second one decreases the maximum multiblock carrier size, which makes carriers smaller (and should make them easier to free).

GC and memory limit issues with R

I am using R on some relatively big data and am hitting some memory issues. This is on Linux. I have significantly less data than the available memory on the system, so it's an issue of managing transient allocation.
When I run gc(), I get the following listing
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 2147186 114.7 3215540 171.8 2945794 157.4
Vcells 251427223 1918.3 592488509 4520.4 592482377 4520.3
yet R appears to have 4 GB allocated in resident memory and 2 GB in swap. I'm assuming this is OS-allocated memory that R's memory management system will allocate and GC as needed. However, let's say that I don't want to let R OS-allocate more than 4 GB, to prevent swap thrashing. I could always ulimit, but then it would just crash instead of working within the reduced space and GCing more often. Is there a way to specify an arbitrary maximum for the gc trigger and make sure that R never OS-allocates more? Or is there something else I could do to manage memory usage?
In short: no. I found that you simply cannot micromanage memory management and gc().
On the other hand, you could try to keep your data in memory, but 'outside' of R. The bigmemory package makes that fairly easy. Of course, using a 64-bit version of R and ample RAM may make the problem go away too.
