DelayedJob doesn't release memory - ruby-on-rails

I'm using Puma server and DelayedJob.
It seems that the memory taken by each job isn't released and I slowly get a bloat causing me to restart my dyno (Heroku).
Any reason why the dyno won't return to the same memory usage figure before the job was performed?
Any way to force releasing it? I tried calling GC but it doesn't seem to help.

You can have one of the following problems. Or actually all of them:
Number 1. This is not an actual problem, but a misconception about how Ruby releases memory to operating system. Short answer: it doesn't. Long answer: Ruby manages an internal list of free objects. Whenever your program needs to allocate new objects, it will get those objects from this free list. If there are no more objects there, Ruby will allocate new memory from operating system. When objects are garbage collected they go back to the free list. So Ruby still have the allocated memory. To illustrate it better, imagine that your program is normally using 100 MB. When at some point program will allocate 1 GB, it will hold this memory until you restart it.
There are some good resource to learn more about it here and here.
What you should do is to increase your dyno size and monitor your memory usage over time. It should stabilize at some level. This will show you your normal memory usage.
Number 2. You can have an actual memory leak. It can be in your code or in some gem. Check out this repository, it contains information about well known memory leaks and other memory issues in popular gems. delayed_job is actually listed there.
Number 3. You may have unoptimized code that is using more memory than needed and you should try to investigate memory usage and try to decrease it. If you are processing large files, maybe you should do it in smaller batches etc.

Related

Sidekiq does not release memory after job is processed

I have a ruby on rails app, where we validate records from huge excel files(200k records) in background via sidekiq. We also use docker and hence a separate container for sidekiq. When the sidekiq is up, memory used is approx 120Mb, but as the validation worker begins, the memory reaches upto 500Mb (that's after a lot of optimisation).
Issue is even after the job is processed, the memory usage stays at 500Mb and is never freed, not allowing any new jobs to be added.
I manually start garbage collection using GC.start after every 10k records and also after the job is complete, but still no help.
This is most likely not related to Sidekiq, but to how Ruby allocates from and releases memory back to the OS.
Most likely the memory can not be released because of fragmentation. Besides optimizing your program (process data chunkwise instead of reading it all into memory) you could try and tweak the allocator or change the allocator.
There has been a lot written about this specific issue with Ruby/Memory, I really like this post by Nate Berkopec: https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html which goes into all the details.
The simple "solution" is:
Use jemalloc or, if not possible, set MALLOC_ARENA_MAX=2.
The more complex solution would be to try and optimize your program further, so that it does not load that much data in the first place.
I was able to cut memory usage in a project from 12GB to < 3GB by switching to jemalloc. That project dealt with a lot of imports/exports and was written quite poorly and it was an easy win.

Why Dart goes very slow after some time?

We have some problems with Dart. It seems like after some period of time the garbage collector can't clear the memory in VM, so application hangs. Anyone with this issue? Are there any memory limits?
You should reuse your objects instead of creating new ones. You should use pool pattern:
http://en.wikipedia.org/wiki/Object_pool_pattern
Be careful about canvas and it's proper destruction.
Another GC performance papers:
http://blog.tojicode.com/2012/03/javascript-memory-optimization-and.html
http://qt-project.org/doc/qt-5/qtquick-performance.html
Are there any memory limits?
Yes. Dart apparently runs with a maximum sizes that can be configured at launch time:
How to run a dart program with big memory?
(The following applies to all garbage-collected languages ...)
If your application starts to run out of space (i.e. the heap is slowly filing with objects that the GC can't remove) then you may get into a nasty situation where the GC runs more and more frequently, and manages to reclaim less and less memory each time. Eventually you run out of memory, but before that happens the application gets really slow.
The solution is typically to do one or both of the following:
Find what is causing the memory to run out. It is typically not that you are allocating too many objects. Rather, the typical cause is that the unwanted objects are all still reachable ... via some data structure that your application has built.
Set the "quick death" tuning option for the GC .... if available. For example, Java garbage collectors can be configured to measure the time spent garbage collecting. (The GC overhead.) When the GC overhead exceeds a preset ratio, the Java virtual machine throws an OutOfMemoryError to "pull the plug".

How to reserve memory for my application and leave a specified amount remaining?

I'm planning an application which will involve loading many pictures at one time and thus requires a large chunk of memory. For example, I might have 50 image objects created at once, taking a total of 1GB of RAM. But when the user goes to load 20 more pictures, I'd like to make sure that amount of memory is already reserved and ready.
Now this part might seem a little backwards from normal. Rather than specifying how much memory my application shall reserve, instead I need to specify how much memory to leave free for other applications, and adjust my application's memory periodically according to this specification. I must say I've never worked with reserving memory at all, and especially won't know how to leave this remaining available memory.
So for example, if the computer has 2048 MB of RAM, and the option is set to leave 50 MB free for other applications, and there is already 10MB of RAM being used by other apps, then it should reserve 2048-50-10 = 1988 MB for my app.
The trouble I foresee is suppose the user opens another application which requires 1GB. My app has to catch this and shrink its self.
Does this even sound like a feasible approach? Basically, I need to make sure there is as much memory reserved as possible at any given time, while leaving a decent amount available for other apps. Would it make a significant impact on performance if I do this, or not much at all? I might be loading and unloading images at rapid paces, and I don't want it to reserve/free this memory on demand, I want it to stay reserved.
+1 for Sertac's mentioning of how SQL Server rides the line of allocating memory it needs, but releasing memory when Windows complains.
Applications can receive Window's complaints by using the CreateMemoryResourceNotification:
hLowMemory := CreateMemoryResourceNotification(LowMemoryResourceNotification);
Applications can use memory resource notification events to scale the
memory usage as appropriate. If available memory is low, the
application can reduce its working set. If available memory is high,
the application can allocate more memory.
Any thread of the calling
process can specify the memory resource notification handle in a call
to the QueryMemoryResourceNotification function or one of the wait functions.
The state of the object is signaled when the specified
memory condition exists. This is a system-wide event, so all
applications receive notification when the object is signaled. Note
that there is a range of memory availability where neither the
LowMemoryResourceNotification or HighMemoryResourceNotification object
is signaled. In this case, applications should attempt to keep the
memory use constant.
But it's also worth mentioning that you might as well allocate memory that you need. Your operating system has a very sophisiticated set of algorithms to swap out the least used memory when memory pressure is high. You can take advantage of this by simply allocating all the memory that you need. When Windows starts to run low, it will find those pages of memory that you are using the least and swap them out to disk. (This is how a well-known reverse proxy works).
The only thing left is to decide if you want to free some images when Windows says it's running low on RAM. But if you're not using the memory, it is going to be swapped out to disk for you.
It's not realistic to account for other apps. Just ignore them. The system will page things in and out as needed. If you really wanted to do this you'd have to dynamically adapt to other processes as they start and finish. That's really not realistic. What's more it's not practical to inquire of other processes how much memory they need. Leave it all to the system.
Set a budget for your app and make sure you don't exceed it. Keep the most recently used images in memory and when you approach your memory budget throw away the least recently used images to make space.
If you are stressing the available resources then make sure you use FastMM and enable LARGE_ADDRESS_AWARE for your app so that you get 4GB address space when running on a 64 bit OS.

How to use AQTime's memory allocation profiler in a program that uses a large amount of memory?

I'm finding AQTime hard to use because it interferes with the original program too much. If I have a program that uses, for example, 300MB of ram I can use AQTime's allocation profiler without a problem, and find out where most of the memory is being used. However I notice that running under AQTime, the original program uses more like 1GB while it's being profiled.
Right now I'm trying to reduce memory usage in a program which is using 1.4GB of memory. If I run it under AQTime, then the original program uses all of the 2GB address space and crashes. I can of course invent a smaller set of test data and estimate how the memory usage will scale with the full data set - but the reason I'm using a profiler in the first place is to try to avoid this sort of guesswork.
I already have AQTime set to 'Collect stack information - None' and all the check boxes to do with checking memory integrity are switched off, and I've tried restricting the area being profiled to just a few classes but this doesn't seem to improve anything. Is there a way to use AQTime that produces a smaller overhead? Or failing that, what other approaches are there to get a good idea of the memory being used?
The app is written in Delphi 2010 and I'm using AQTime 6.
NB: On top of the increased memory usage, running under AQTime slows the app down an awful lot, making the whole exercise not just impossible but impractical too :-P
AFAIK the allocation profiler will track memory block allocation regardless of profiling areas. Profiling areas are used to track classes instantiation. Of course memory-profiling an application that allocates a large amount of memory is a issue, you may try to use the LARGE_ADRESS_AWARE flag, and the /3GB boot switch, or use a 64 bit system (as long as you have at least 4GB of memory, or more). Also you can take snapshot of the application state before it crashes, to see where the memory is allocated. Profiling takes time, anyway, you may have to let it run for a while.

When to call SetProcessWorkingSetSize? (Convincing the memory manager to release the memory)

In a previous post ( My program never releases the memory back. Why? ) I show that FastMM can cache (read as hold for itself) pretty large amounts of memory. If your application just loaded a large data set in RAM, after releasing the data, you will see that impressive amounts of RAM are not released back to the memory pool.
I looked around and it seems that calling the SetProcessWorkingSetSize API function will "flush" the cache to disk. However, I cannot decide when to call this function. I wanted to call it at the end of the OnClick event on the button that is performing the RAM intensive operation. However, some people are saying that this may cause AV.
If anybody used this function successfully, please let me (us) know.
Many thanks.
Edit:
1. After releasing the data set, the program still takes large amounts of RAM. After calling SetProcessWorkingSetSize the size returns to few MB. Some argue that nothing was released back. I agree. But the memory foot print is now small AND it is NOT increasing back after using the program normally (for example when performing normal operation that does not involves loading large data sets). Unfortunately, there is no way to demonstrate that the memory swapped to disk is ever loaded back into memory, but I think it is not.
2. I have already demonstrated (I hope) this is not a memory leak:
My program never releases the memory back. Why?
How to convince the memory manager to release unused memory
If SetProcessWorkingSetSize would solve your problem, then your problem is not that FastMM is keeping hold of memory. Since this function will just trim the workingset of your application by writing the memory in RAM to the page file. Nothing is released back to Windows.
In fact you only have made accessing the memory again slower, since it now has to be read from disc. This method has the same effect as minimising your application. Then Windows presumes you are not going to use the application again soon and also writes the workingset in RAM to the pagefile. Windows does a good job of deciding when to write RAM to the pagefile and tries to keep the most used memory in RAM as long as it can. It will make the workinset size smaller (write to pagefile) when there is little RAM left. I would not mess with it just to give the illusion that you program is using less memory while in fact it is using just as much as before, only now it is slower to access. Memory that is accessed again will be loaded into RAM again and make the workinset size grow again. Touching less memory keeps the workingset size smaller.
So no, this will not help you forcing FastMM to release the memory. If your goal is for your application to use less memory you should look elsewhere. Look for leaks, look for heap fragmentations look for optimisations and if you think FastMM is keeping you from doing so you should try to find facts to support it. If your goal is to keep your workinset size small you could try to keep your memory access local. Maybe FastMM or another memory manager could help you with it, but it is a very different problem compared to using to much memory. And maybe this function does help you solve the problem you are having, but I would use it with care and certainly not use it just to keep the illusion that your program has a low memory usage.
I agree with Lars Truijens 100%, if you don't than you can check the FasttMM memory usage via FasttMM calls GetMemoryManagerState and GetMemoryManagerUsageSummary before and after calling API SetProcessWorkingSetSize.
Are you sure there is a problem? Working sets might only decrease when there really is a memory shortage.
Problem solved:
I don't need to use SetProcessWorkingSetSize. FastMM will eventually release the RAM.
To confirm that this behavior is generated by FastMM (as suggested by Barry Kelly) I crated a second program that allocated A LOT of RAM. As soon as Windows ran out of RAM, my program memory utilization returned to its original value.
I used this function just once, when I implemented TWebBrowser. This component took me so much memory even if I destroyed the instance.

Resources