I'm trying to understand the idea behind memory usage in Ruby. I'm currently going through memory issues on my Rails web app and API.
Here's a simple question:
If I load many records inside a variable like so:
users = User.where(work: 'cook')
This would probably hold in my app's memory for the time I'm using this variable, right?
But would it help to free memory by doing the following after I'm done using the variable in my code?
users = nil
Thank you for your help. I'm also open to answers that answer the question on a broader topic.
Yes setting users to nil would indeed reduce required memory (very slightly) but it's not necessary as the Garbage Collector will eventually sweep it. In production you should assume your Ruby process will always grow over time and should be periodically restarted if your concerned about memory management. The maximum heap space reduction you'll ever see in ruby is minimal compared to its growth over time so I wouldn't concern yourself with setting large collections to nil to save a few bytes here and there a little earlier than the GC would have swept it anyway. Ruby allocates objects in a heap space that consists of heap pages. Assuming you're using Ruby2.1 or better, the heap space is divided into used (aka Eden) and empty (aka Tomb) heap pages. When instantiating objects, ruby looks for free space in the eden pages first and only if no space is available will it take a page from tomb. When you then overwrite the object with nil, those heap pages are added back to the tomb. Moving pages from the eden to the tomb will reduct heap size slightly however Ruby's Garbage Collector won't drastically reduce it because it assumes if you've created a large collection of objects before, you'll do it again. One book I recommend diving into is "Ruby Performance Optimization" as it goes through ruby's Garbage Collector in depth.
Related
I have a rails app with sidekiq workers performing processes in the background and originally had around 30 threads to perform tasks. We found this was causing high memory usage and reducing the thread count for the workers reduced the memory bloat, but I don't understand why. Can anyone please explain?
From a quick google it sounds like you are experiencing memory fragmentation which is pretty normal for Sidekiq. Are you using class variables at all? Does your code require classes during execution time? How many AR queries are you executing? Many AR queries create thousands, if not millions, of objects and throw them away. Is your code thread-safe? As per this post from the author of Sidekiq, we can see memory bloat happens from a large number of memory arenas in multithreaded applications. There are some details of a solution in that article and even the readme of the Sidekiq repo that are very helpful, but it might be worth outlining the causation to understand why memory bloat happens in 'rails/ruby'.
Memory allocation in Ruby involves three layers: interpreter, OS memory allocator library and the kernal. Ruby organises objects in memory arenas called Ruby heap pages and a ruby heap page is divided into equal-sized slots, where one object occupies a slot. These slots are either occupied or free and when Ruby allocates a new object, it tries to occupy a free slot. If there are no free slots, it will allocate a new heap page. Each slot has a byte limit and if an object is higher than the byte limit, a pointer is placed in the heap page to the object.
Memory fragmentation is when these allocations happen and is quite frequent in high thread applications. When garbage collection happens the heap page marks a cleared slot as free and allows the slot to be reused. If all objects in the heap page are marked as free then the heap page is freed back to the memory allocator and potentially the kernal. Ruby does not promise garbage collection on all objects, so what happens when not all free slots are freed and there a large amount of heap pages that are partially filled? The heap pages have available slots for Ruby to allocate to but the memory allocator still thinks they are allocated memory. The memory allocator does not release the entire OS heaps at once and can release any individual OS page, just once all allocations are released for said page.
So threading plays an issue as each thread try's to allocate memory from the same OS heap at the same time and they contend for access. Only one thread can perform an allocation at a time, which reduces multithreaded memory allocation performance. The memory allocator attempts to optimize performance by creating multiple OS heaps and tries to assign different threads to its own OS heap.
If you have access to ruby 2.7 you can call GC.compact to combat this. It provides a way to find objects that can be moved in Ruby and condenses them and reduces the amount of heap pages used. Empty slots that have been freed through GC in-between consumed slots can now be condensed against. Say, for example, you have a heap page with four slots and only slot one, two and four have an object assigned. The compact call will evaluate if object four is a movable object and will assign it to slot three and any references associated with the object and redirect to slot three. Slot four is now placed with a T_MOVED object and the final GC replaces the T_MOVED object with T_EMPTY, ready for assignment.
Personally, I would not rely solely on GC.compact and you could do the simple MALLOC_ARENA_MAX trick, but have a read of the source documents and you should find a suitable solution.
I'm using Puma server and DelayedJob.
It seems that the memory taken by each job isn't released and I slowly get a bloat causing me to restart my dyno (Heroku).
Any reason why the dyno won't return to the same memory usage figure before the job was performed?
Any way to force releasing it? I tried calling GC but it doesn't seem to help.
You can have one of the following problems. Or actually all of them:
Number 1. This is not an actual problem, but a misconception about how Ruby releases memory to operating system. Short answer: it doesn't. Long answer: Ruby manages an internal list of free objects. Whenever your program needs to allocate new objects, it will get those objects from this free list. If there are no more objects there, Ruby will allocate new memory from operating system. When objects are garbage collected they go back to the free list. So Ruby still have the allocated memory. To illustrate it better, imagine that your program is normally using 100 MB. When at some point program will allocate 1 GB, it will hold this memory until you restart it.
There are some good resource to learn more about it here and here.
What you should do is to increase your dyno size and monitor your memory usage over time. It should stabilize at some level. This will show you your normal memory usage.
Number 2. You can have an actual memory leak. It can be in your code or in some gem. Check out this repository, it contains information about well known memory leaks and other memory issues in popular gems. delayed_job is actually listed there.
Number 3. You may have unoptimized code that is using more memory than needed and you should try to investigate memory usage and try to decrease it. If you are processing large files, maybe you should do it in smaller batches etc.
I have been reading up on how processes are implemented by computers, and found mention of stacks and heaps. This was pretty cool to me, since a lack of any kind of computer science background means that these things have been pretty esoteric to me.
My current understanding is that, on an extremely basic level, a process is represented in RAM as a fix-sized 'Stack' and associated variably-sized data structures that collectively are known as 'heap' space. Frames are added to the stack that may result in creating, editing or deleting data stored in heap space thereby changing a processes 'state').
So my question is, can all RAM usage be categorized either as being part of a heap or part of a stack?
What else could be stored in RAM that doesn't fall into either of these categories?
Yes, in extremely simple systems, especially old and not supporting multitasking, all user memory can be used by a combination of user code, user data, user heap data, user stack data. It's up to the programmer what to use memory for. The memory doesn't mind. But, naturally, there needs to be code and, in most cases, stack. Everything else is optional.
I'm using instruments on a device to try to figure out if I have any memory leaking or abandoned. Specifically I am using leaks and allocations. While instruments doesn't point out any leaks, that doesn't mean I don't have memory issues. I've been working on this for weeks, and I don't seem to be any closer to figuring out what issues I have (ugh).
I am testing a particular action by taking a heapshot after the action and repeating. After the first few "settling" generations, I can see that the growth and persistent count all start out at a certain number (several kb). After many repeated iterations (say 10-20), some (not all) slowly end up draining to 0. It takes a while, but it does happen. The generations where there remains persistent memory never actually show me anything that I find helpful, as the stack trace show all system libraries.
So my questions are:
What does this type of behavior indicate? Do I have memory issues? Is there some type of lazy release of memory somewhere?
In a sea of iterations that show persistent memory, what does one zero heap growth iteration mean?
If the stack trace for a particular generation points only to system libraries, does this mean the heap growth for that generation is valid or that there is a bug? Or could it still mean that there is something holding onto the memory on my end?
What does it mean when the stack trace shows your library and method, but it is greyed out like the system code and has a little house icon, vs a a line with your library and method that is in black and has a little person icon?
If I have something like a retain cycle - wouldn't the persistent growth be consistent?
Any answers to insights would be extremely helpful!
I'll take a stab at your questions:
What does this type of behavior indicate? Do I have memory issues? Is
there some type of lazy release of memory somewhere?
Since you can't know how the system frameworks manage their private memory needs, you must assume that yes, there could be lazy/deferred release of memory happening any time you call into the system frameworks, which in most apps is "all the time". Beyond not being able to rule it out, I can say with some certainty that there definitely are long-lived allocations triggered by seemingly-innocuous system framework usage. (See the discussion of UIWebView's long-lived memory use in this answer for an example.)
In a sea of
iterations that show persistent memory, what does one zero heap growth
iteration mean?
Hard to say. A good first-order guess might be that the heap growth associated with the iteration was somehow exactly offset by a lazy/deferred release of the memory allocated for a previous iteration.
If the stack trace for a particular generation points
only to system libraries, does this mean the heap growth for that
generation is valid or that there is a bug? Or could it still mean
that there is something holding onto the memory on my end?
If Instruments shows heap growth, then that heap growth almost certainly exists. Whether that heap growth is something you have direct control over depends. If you make no calls into system frameworks (not likely), then it's definitely your fault. Once you make a call into the system frameworks, you have to accept the possibility that the framework might allocate memory that stays allocated after your call returns.
What does
it mean when the stack trace shows your library and method, but it is
greyed out like the system code and has a little house icon, vs a a
line with your library and method that is in black and has a little
person icon?
Lines being greyed out indicates that Instruments doesn't have debug symbols for that line. That's all. It doesn't indicate anything specific with regard to memory use.
If I have something like a retain cycle - wouldn't the
persistent growth be consistent?
If each iteration created a new object graph with cyclic retains, then yes, you would expect that each iteration would cause heap growth by at least the size of that object graph. That said, small object graphs can easily be lost in the "noise." If you have suspicions, one way is to have objects of a "suspect" class perform a huge allocation that will make them stand out from the "noise." For instance, make your object malloc a megabyte (or more) for every instance (and, obviously, free it when the instance is deallocated.) This can help problem areas stick out where they might not have originally.
Note: 32 bit application, which is not planned to be migrated to 64 bit.
I'm working with a very memory consuming application and have pretty much optimized all the relevant paths in respect to memory allocation/de-allocation. (there are no memory leaks, no handle leaks, no any other kind of leaks in the application itself AFAIK and tested. 3rd party libs which I cannot touch are of course candidates but unlikely in my scenario)
The application will frequently allocate large single and bi-dimensional dynamic arrays of single and packed records of up to 4 singles. By large I mean 5000x5000 of record(single,single,single,single) is normal. Also having even 6 or 7 such arrays in work at a given time. This is needed as there are a lot of cross-computations made on these arrays and having them read from disk would be a real performance killer.
Having this clarified, I am getting out of memory errors a lot because of these large dynamic arrays which will not go away after releasing them, no matter if I setlength them to 0 or finalize them. This is of course something FastMM is doing in order to be fast, I know that much.
I am tracking both FastMM allocated blocks and process consumed memory (RAM + PF) by using:
function CurrentProcessMemory(AWaitForConsistentRead:boolean): Cardinal;
var
MemCounters: TProcessMemoryCounters;
LastRead:Cardinal;
maxCnt:integer;
begin
result := 0;// stupid D2010 compiler warning
maxCnt := 0;
repeat
Inc(maxCnt);
// this is a stabilization loop;
// in tight loops, the system doesn't get
// much chance to release allocated resources, which in turn will get falsely
// reported by this function as still being used, resulting in a false-positive
// memory leak report in the application.
// so we do a tight loop here, waiting, until the application reported memory
// gets stable.
LastRead := result;
MemCounters.cb := SizeOf(MemCounters);
if GetProcessMemoryInfo(GetCurrentProcess,
#MemCounters,
SizeOf(MemCounters)) then
Result := MemCounters.WorkingSetSize + MemCounters.PagefileUsage
else
RaiseLastOSError;
if AWaitForConsistentRead and (LastRead <> 0) and (abs(LastRead - result)>1024) then
begin
sleep(60);
application.processmessages;
end;
until (not AWaitForConsistentRead) or (abs(LastRead - result)<1024) or (maxCnt>1000);
// 60 seconds wait is a bit too much
// so if the system is that "unstable", let's just forget it.
end;
function CurrentFastMMMemory:Cardinal;
var mem:TMemoryManagerUsageSummary;
begin
GetMemoryManagerUsageSummary(mem);
result := mem.AllocatedBytes + mem.OverheadBytes;
end;
I am running the code on a 64bit computer and my top memory consumption before crashes is about 3.3 - 3.4 GB. After that, I get memory/resources related crashes anywhere in the application. Took me some time to pin it down on the large dynamic arrays usage which were buried down in some 3rd party library.
The way I am getting over this is that I made the application resume itself from where it left off, by re-starting itself and closing with certain parameters.
This is all nice and dandy if memory consumption is fair and current operation finishes.
The big problem happens when the current memory usage is 1GB and the next operation to process requires 2.5 GB memory or more to be processed. My current code limited itself to an upper value of 1.5 GB used memory before resuming, but in this situation, I'd have to drop the limit down under 1 GB which would basically have the application resume itself after each operation and not even that guaranteeing that everything will be fine.
What if another operation will have a larger data set to process and it will require a total of 4GB or more memory?
To note that I am not talking about actual 4 GB in memory, but consumed memory by allocating huge dynamic arrays which the OS doesn't get back once de-allocated and hence it still sees it as consumed, so it adds up.
So, my next point of attack is to force fastmm to release all (or at least part of) memory to the OS. I'm specifically targeting the huge dynamic arrays here. Again, these are in a 3rd party library so re-coding that is not really in the top options. It's much easier and faster to tinker in the fastmm code and write a proc to release the memory.
I can't switch from FastMM as currently the entire application and some of the 3rd party libs are heavily coded around the use of PushAllocationGroup in order to quickly find and pinpoint any memory leaks. I know I can write a dummy FastMM unit to solve the compilation references, but I will be left without this quick and certain leak detection.
In conclusion: is there any way I can force FastMM to release at least some of it's large blocks to the OS? (well, sure there is, the actual question is: did anybody write it and if so, mind sharing?)
Thanks
later edit:
I will come up with a small relevant test application soon. It doesn't appear to be that easy to mock up one
I doubt that the issue is actually down to FastMM. For huge memory blocks, FastMM will not do any sub-allocation. Your allocation request will be handled with a straight VirtualAlloc. And then deallocation is VirtualFree.
That's assuming that you are allocating those 380MB objects in one contiguous block. I suspect that what you actually have are ragged 2D dynamic arrays. And they are not single allocations. a 5000x5000 ragged 2D dynamic arrays takes 5001 allocations to initialise. One for the row pointers, and 5000 for the rows. Those will be medium FastMM blocks. There will be sub-allocation.
I think you are asking too much. In my experience, any time you need over 3GB of memory in a 32 bit process, it's game over. Fragmentation of address space will stop you before you run out of memory. You cannot hope for this to work. Switch to 64 bit, or use a cleverer, less demanding allocation pattern. Or do you really need dense 2D arrays? Can you use sparse storage?
If you cannot alleviate your memory demands that way, you could use memory mapped files. This would allow you to make use of the extra memory that your 64 bit system has. The system's disk cache can be larger than 4GB and so your app can traverse more than 4GB of memory without actually needing to hit the disk.
You could certainly try different memory managers. I honestly do not hold out any hope that it would help. You could write a trivial replacement memory manager that used HeapAlloc. And enable the low fragmentation heap (enabled by default from Vista on). But I sincerely doubt that it will help. I'm afraid that there won't be a quick fix for you. To resolve this you face a more fundamental modification to your code.
Your issue as others have said is most likely attributable to memory fragmentation. You could test this by using VirtualQuery to create a picture of how memory is allocated to your application. You will very likely find that although you may have more than enough total memory for a new array, you don't have enough contiguous memory.
FastMem already does a lot to try and avoid problems due to memory fragmentation. "Small" allocations are done at the low end of the address space, whereas "large" allocations are done at the high end. This avoids a common problem where a series of large then small allocations followed by all large allocations being released results in a large amount of fragmented memory that is almost unusable. (Certainly unusable by anything slightly larger than the original large allocations.)
To see the benfits of FastMem's approach, imagine your memory layed out as follows:
Each digit represent a 100mb block.
[0123456789012345678901234567890123456789]
Small allocations represented by "s".
Large allocations repestented by capital letters.
[0sssss678901GGGGFFFFEEEEDDDDCCCCBBBBAAAA]
Now if you free all your large blocks, you should have no trouble performing similar large allocations later.
[0sssss6789012345678901234567890123456789]
The problem is that "large" and "small" are relative, and highly dependent on the nature of your application. FastMem defines a dividing line between "large" and "small". If you happen to have some small allocations that FastMem would classify as large, you may encounter the following problem.
[0sss4sGGGGsFFFFsEEEEsDDDDsCCCCsBBBBsAAAA]
Now if you free the large blocks you're left with:
[0sss4s6789s1234s6789s1234s6789s1234s6789]
And an attempt to allocate something larger than 400mb will fail.
Options
You may be able to tweak the FastMem settings so that all your "small" allocations are also considered small by FastMem. However, there are a few situations where this won't work:
Any DLLs you use that allocate memory to your application but bypass FastMem may still cause fragmentation.
If you don't release all your large blocks together, those that remain may induce fragmentation which will slowly get worse over time.
You could take on the task of memory management yourself.
Allocate one very large block e.g. 3.5GB which you keep for the entire lifetime of the application.
Instead of using dynamic arrays, you determine the pointer locations to use when setting up a new array.
Of course the simplest alternative would be to go 64-bit.
You could consider alternate data structures.
Do you really need array lookup capability? If not, another structure that allocates in smaller chunks may suffice.
Even if you do need array lookup, consider a paged array. Sparse arrays are a combination of arrays and linked lists. Data is stored on pages, with linked lists chaining each page.
A simple variant (since you mentioned your arrays are 2 dimensional) would be to leverage that: One dimension forms its own array providing a lookup into one of multiple arrays for the second dimension.
Related to the alternate data structures option, consider storing some data on disk. Yes performance will be slower. But if an efficient caching mechanism can be found, then maybe not so much. It would be better to be a little slower, but not crashing.
Dynamic arrays are reference counted in Delphi, so they should be automatic released when they are not used anymore.
Like strings, they are handled with COW (copy on write) when shared/stored in several variables/objects. So it seems you have some kind of memory/reference leak (e.g. an object in memory that holds still are reference to an array).
Just to be sure: you are not doing any kind of low level pointer tricks, aren't you?
So please yes, post a test program (or send the complete program private via email) so one of us can take a look at it.