Cannot allocate on heap - but memory is there - erlang

I am chasing a problem: every few days my system crashes with
Slogan: eheap_alloc: Cannot allocate 600904 bytes of memory (of type "heap").
System version: Erlang/OTP 22 [erts-10.5.4] [source] [64-bit] [smp:2:2] [ds:2:2:10] [async-threads:1] [hipe]
I used to have some OOM situations, but they were caused by e.g. infinite loops making message queues fill the entire memory. This time it is something different. The memory usage at the moment of the crash, as reported by cdv, was 6081 MB. The system runs on a VM with 14 GB of RAM, and nothing else runs on this VM. Memory monitoring shows stable usage at around 6 GB, consistent with the crash dump. So at the moment of the crash the BEAM still had about 8 GB available, yet it could not allocate 600 kB.
Not sure if this is relevant, but most of the memory used by the system is in one Mnesia ETS table; at the moment of the crash it was 5631 MB.
Is anyone aware of a possible cause for a situation where Erlang cannot allocate memory even though plenty is still available?

Related

How to find a memory leaking process from an erlang crash dump?

I have a similar problem to Examining Erlang crash dumps - how to account for all memory?, my app crashed with eheap_alloc: Cannot allocate 34385784 bytes of memory (of type "old_heap") and I can't figure out which process caused it.
According to the Memory tab in the crash dump viewer, Process used is 2153 MB, but when I sum up all the Memory: lines in the erl_crash.dump (which are in bytes, see the guide), the result is only around 285 MB. Old heap would be another 62 MB, but I think that's included in Memory:. Where could the rest be coming from? Usually the app has a total memory usage of around 300 MB.
Also, at the top of the dump file it says Calling Thread: scheduler:0, but there is no further information about it. There are only entries for scheduler:1 and scheduler:2. Could they be involved, or are the other scheduler processes unrelated?
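To sanity-check the per-process total, the Memory: lines can be summed mechanically. This is only a sketch; the dump fragment below is made up, so point the awk command at your real erl_crash.dump instead (each =proc section carries a Memory: field in bytes):

```shell
# Create a tiny, made-up crash-dump fragment for illustration.
cat > sample.dump <<'EOF'
=proc:<0.1.0>
Memory: 34000
=proc:<0.2.0>
Memory: 120000
EOF

# Sum every per-process "Memory:" field (bytes) and print MB.
awk '/^Memory:/ { total += $2 }
     END { printf "%.2f MB\n", total / (1024 * 1024) }' sample.dump
```

Against a real dump, a large gap between this total and the Memory tab in the crash dump viewer usually points at non-process memory (ETS, binaries, allocator overhead) rather than a summing mistake.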

How OS handles memory leaks

I searched quite a lot but was unable to find my exact question, although it seems general enough that it might have been asked and answered somewhere.
I wanted to know what happens after a process causes a memory leak and then terminates. In my opinion it's not a big deal because of virtual memory: the physical pages can still be allocated to other/new processes once the leaking process has terminated.
But I also read somewhere that due to memory leaks you need to restart your system, and I don't understand why.
Recommended reading : Operating Systems: Three Easy Pieces
On common OSes (e.g. Linux, Windows, macOS, Android) each process has its own virtual address space (the heap memory used by malloc or mmap lives inside that virtual address space), and when the process terminates, its entire virtual address space is destroyed.
So memory leaks don't survive the process itself.
There could be subtle corner cases involving shared memory, which can outlive the process that created it (see shm_overview(7) and shmget(2)).
Read (for Linux) proc(5) and try cat /proc/self/maps. Learn to use valgrind and AddressSanitizer.
Read also about Garbage Collection. It is quite relevant.
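The per-process nature of the heap is easy to see on Linux (an assumption here; /proc does not exist on macOS or Windows): every process has its own [heap] mapping in /proc/<pid>/maps, and the whole table disappears when the process exits.

```shell
# List this shell's own heap and stack mappings.
# Each address range is private to this process; when the
# process terminates, the kernel tears all of them down,
# leaked or not.
grep -E '\[(heap|stack)\]' /proc/self/maps
```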
In modern operating systems the address space is divided into a user space and a system space. The system space is the same for all processes.
When you kill a process, that destroys the user space for the process. If an application has a memory leak, killing the process remedies the leak.
However, the operating system can also allocate memory in the system space. When there is a memory leak in the operating system's allocation of system-space memory, killing processes does not free it up.
That is the type of memory leak that forces you to reboot the system.

Why Erlang / Elixir observer memory usage numbers do not add up?

I am starting out with Elixir and observing some strange behavior when connecting to my remote production node using iex.
As in the screenshot below, the observer reports that a total of 92 MB of memory is in use. However, when you sum up the memory consumption of processes, atoms, binaries, code and ETS, it only comes to ~69 MB:
Processes 19.00 MB
Atoms 0.97 MB (969 kB)
Binaries 13.00 MB
Code 28.00 MB
ETS 7.69 MB (7685 kB)
-------------------
Total 68.66 MB
So, my first question is: where is this extra 23 MB of memory coming from? I am pretty sure it's not just a reporting issue, because when I look at my Kubernetes pod's memory consumption, it is ~102 MB, which is in line with the numbers observer is showing.
The only thing I can think of is that those 23 MB have not been garbage collected yet. Is my assumption valid? If so, it's been 6 hours since this container started, and I have been monitoring the memory consumption from the very beginning. Shouldn't this have been garbage collected by now?
And second question: are there any Erlang VM / Elixir configuration tweaks I can make to optimize on memory footprint?
I have also been attempting to solve issues regarding memory management in OTP applications and one tool that has been particularly useful for me is the library written by Fred Hebert called recon. Especially the recon_alloc module that provides very useful information on memory usage in the Erlang VM.
The missing MegaBytes
The following quote is directly taken from the documentation of the recon_alloc:memory() function and might provide you an insight of what's going on :
The memory reported by `allocated' should roughly match what the OS
reports. If this amount is different by a large margin, it may be the
sign that someone is allocating memory in C directly, outside of
Erlang's own allocator -- a big warning sign. There are currently
three sources of memory allocation that are not counted towards this
value: The cached segments in the mseg allocator, any memory allocated
as a super carrier, and small pieces of memory allocated during
startup before the memory allocators are initialized. Also note that
low memory usages can be the sign of fragmentation in memory, in which
case exploring which specific allocator is at fault is recommended.
So I think that the extra 23 MB of memory usage might be caused by some undesired allocations, or perhaps due to fragmentation.
Tweaking ( with great caution /!\ )
As for your second question, there is a facility in Erlang called erts_alloc whose documentation describes manual configuration of the memory allocators. It is done by passing command-line flags to the emulator, for example:
erl +SOMEFLAG +SOMEOTHERFLAG
But there's a big red warning in the documentation that strongly suggests that messing with these flags can result in much worse behaviour than with the default configuration.
So my advice would be to resort to these modifications only if it is really the only way to solve the problem. In that case, there is a book about the Erlang Runtime System that has helped me understand some of these aspects, so I would also recommend giving it a read beforehand.
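As a purely illustrative sketch (the flag names are real erts_alloc switches, but the values here are placeholders, not recommendations), such a tweak could look like:

```
# +MBas aobf    -- allocation strategy "address order best fit"
#                  for the binary allocator
# +MHlmbcs 5120 -- largest multiblock carrier size (in KB)
#                  for the eheap allocator
erl +MBas aobf +MHlmbcs 5120
```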
NOTE : Wild shot in the dark here and not answering your question directly, but it might be useful to double check what is going on with your binaries, as I see that there are 13 MB reported by the observer. Depending on their size (smaller or larger than 64 bytes), they are stored in process heaps or accessed by reference. I have faced case #1 with lots of small binaries piling up and ultimately crashing my system.
There are a few other helpful resources I found while trying to fix those problems :
This specific snippet from a blog post authored by Fred Hebert as well :
[erlang:garbage_collect(Pid) || Pid <- processes()].
It will trigger a GC on all running processes immediately. In my case it has done wonders. You can add an option to call it asynchronously too, so you don't have to block until it's all done :
[erlang:garbage_collect(Pid, [{async, RequestId}]) || Pid <- processes()].
This article about GC in Erlang
Efficiency guidelines in the Erlang docs for binaries, that provide useful implementation details.
Stuff Goes Bad: Erlang in Anger, another free ebook written by ... yes, it is Fred Hebert.
Hope this helps :)

Dart: Find memory leak using Observatory

We are using Dart VM version: 1.24.3 (Wed Dec 13 16:10:39 2017) on "linux_x64" and I've been using the observatory to find out how memory is allocated.
Under Allocation Profile -> Old Generation in observatory, I can see the following: 83.7MB of 498.5MB used
My questions are
I'm assuming 83.7MB is the memory used, but then what's the 498.5MB? Amount of memory allocated by the VM?
I can see the 498.5MB increasing over time even though the memory used is not that high. Why would the VM allocate more and more memory even though the app doesn't use more than half of it?
When would Dart's VM release memory back to system? GC doesn't lower the allocated memory much.
How else could I narrow down where the potential memory leak is?
Thanks!

Ruby process memory structure

I'm trying to figure out an issue with memory usage in a ruby process. I tried to take a heap dump of the ruby process using the ObjectSpace module to understand what's happening. What's puzzling is that, the "top" command in linux reports that the process uses 17.8 GB of virtual memory and 15GB of resident memory. But, the size of the heap dumps are only around 2.7-2.9 GB.
Based on the Ruby documentation, the ObjectSpace.dump_all method dumps the contents of the Ruby heap as JSON.
I'm not able to understand what is hogging the rest of the memory. It would be helpful, if someone could help me to understand what's happening.
Thank you.
It is likely that your application is allocating objects that are later reclaimed by the garbage collector. You can check this with a call to GC.stat.
Ruby (if you're running MRI) does not release memory back to the operating system in any meaningful way. Consequently, if you allocate 18 GB of memory and 15 GB gets garbage collected, you'll end up with your ~3 GB of heap data.
The Ruby MRI GC is not a compacting garbage collector, so as long as there is any data in a heap page, that page will not be released. This leads to memory fragmentation and the values that you see in your app.
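To see how much of the gap is Ruby-level heap versus allocator-held memory, GC.stat is a quick first probe. A minimal sketch (the key names assume MRI 2.1 or later):

```ruby
# GC.stat exposes MRI's internal heap bookkeeping. The slot
# counts cover only Ruby objects; memory held by the malloc
# heap or by C extensions never shows up here, which is one
# way RSS can dwarf an ObjectSpace dump.
stat = GC.stat
puts "live object slots: #{stat[:heap_live_slots]}"
puts "free object slots: #{stat[:heap_free_slots]}"
puts "completed GC runs: #{stat[:count]}"
```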
