How much memory your program takes? (FastMM vs Borland MM)

How much memory your program takes? (FastMM vs Borland MM) - delphi

I have seen recently a strange behavior in my program. After creating large amounts of objects (500MB of RAM) then releasing them, the program's memory footprint does not return to its original size. It still shows a footprint of 160MB (Private working set).
Normal behavior?
Borland's memory manager does not behave like this, so if possible please confirm (or infirm) this is a normal behavior for FastMM: If you have a handy program in which you create a rather complex MDI child (containing several controls/objects), can you create in a loop 250 instances of that MDI child in memory (at the same time) then release them all and check the memory footprint. Please make sure that you consume at least 200-300MB or RAM with those MDI childs.
Especially those that still using Delphi 7 can see the difference by temporary disabling FastMM.
Thanks
If anybody is interested, especially if you want some proof this is not a memory leak (I hope it is not a mem leak in my code - this is also one of the points of this post: to check if it is my fault), here are the original discussions:
My program never releases the memory back. Why?
How to convince the memory manager to release unused memory

Dear Altar, I'm dazzled at how off the point you are in your guesses and how you don't listen to what people told you many times before.
Let's set some things straight. Memory management 101. Please read thoroughly.
When you allocate memory in Delphi, there are two memory managers involved.
System memory manager
First one is a system memory manager. This one is built into Windows and it gives memory in 4kb sized pages.
But it doesn't always give you memory in RAM (or physical memory). Your data can be kept on the hard drive, and read back every time you need to access it. This is awfully slow.
In other words, imagine you have 512Mb of physical memory. You run two programs, each requesting 1Gb of memory. What does OS do?
It grants both requests. Both apps get 1Gb of memory each. Both think all the memory is "in memory". But in fact, only 512Mb can be kept in RAM. The rest is stored in page file, although your app does not know that. It just works slow.
Working set size
Now, what is a "working set size" you are measuring?
It's the part of the allocated memory that is kept in RAM.
If you have an application which allocates 1Gb of memory, and you only have 512 Mb of RAM, then it's working set size will be 512Mb. Although it "uses" 1Gb of memory!
When you run another application which needs memory, OS will automatically free some RAM by moving rarely used blocks of "memory" to the hard drive.
Your virtual memory allocation will stay the same, but more pages will be on the hard drive and less in RAM. Working set size will decrease.
From this, you should have understood by this point, that it's pointless to try and minimize the working set size. You're achieving nothing. You're not freeing memory in any sense. You're just offloading the data to the hard drive.
But the system will do that automatically when it needs to. And there's no point making room in RAM until it's needed. You're just slowing down your application, that's all.
TLDR: "Working set size" is not "how much memory application uses". It's "how much is ready right now". Don't try to minimize it, you're just making things worse.
Delphi memory manager
OS gives you virtual memory in pages of 4Kb. But often you need it in much smaller chunks. For instance, 4 bytes for your integer, or 32 bytes for some structure. The solution?
Application memory manager, such as FastMM or BorlandMM or others.
It's job is to allocate memory in pages from the operating system, then give you small chunks of those pages when you need it.
In other words, when you ask for 14 bytes of memory, this is what happens:
You ask FastMM for 14 bytes of memory.
FastMM asks OS for 1 page of memory (4096 bytes).
OS grants one page of memory, backing it up with RAM (it's stored in actual RAM).
FastMM saves that page, cuts 14 bytes of it and gives to you.
When you ask for another 14 bytes, FastMM just cuts another 14 bytes from the same page.
What happens when you release memory? The same thing backwards:
You release 14 bytes to FastMM. Nothing happens.
You release another 14 bytes. FastMM sees that the 4096 byte page it allocated is now completely unused.
Therefore it releases the page, returning it to the system.
It's worth noting that FastMM cannot release just 14 bytes to the system. It has to release memory in pages. Until the whole page is free, FastMM cannot do a thing. Nobody can.
So, why is my working set size so big, even though I released everything?
First, your working set size is not what you should be measuring. Virtual memory consumption is. But if you have big working set size, your virtual memory consumption will be high too.
What's the problem? You should be able to figure out by this point.
Let's say you allocate 1kb, then 3kb of memory. How much virtual memory have you allocated? 4kb, 1 page.
Now you release 3Kb. How much virtual memory do you use now? 1Kb? No, it's still 1 page. You cannot allocate less than 1 page from the system. You're still using 4096 bytes of virtual memory.
Imagine if you do that 1000 times. 1kb, 3kb, 1kb, 3kb, 1kb, 3kb and so on. You allocate 1000 * 4kb = 4 mb like that, and then you release all the 3kb parts. How much virtual memory do you use now?
Still 4 mb. Because you allocated 1000 pages at first. Of every page you took 1kb and 3kb chunks. Even if you release 3kb chunks, 1kb chunks will continue to keep every single page you allocated in memory. And every page takes 4kb of virtual memory.
Memory manager cannot magically "move" all of your 1kb chunks together. This is impossible, because their virtual addresses can be referenced from somewhere in code. It's not a trait of FastMM.
But why with BorlandMM everything works better?
Coincidence. Maybe it just so happens that BorlandMM gives you memory in a slightly different way than FastMM does. Next thing you know, you change something in your app and BorlandMM acts just like FastMM did. It's impossible for a memory manager to completely prevent this effect, called memory fragmentation.
So what do I do?
Short answer is, not much until this bothers you.
You see, with modern operating systems, you're not really eating anyone's RAM. Per above, OS will automatically swap your pages out when it needs RAM for other applications. This should not be a concern.
And the "excessive" memory isn't lost. Although pages are allocated, 3kb of each is marked as "free". Next time your app needs memory, memory manager will use that space.
But if you really want to help it, you should reorganize your allocations so that the ones you're planning on keeping are done first, and the ones you will soon release are all allocated after that.
Like this: 1kb, 1kb, 1kb, ..., 3kb, 3kb, 3kb...
If you now release all the 3kb chunks, your virtual memory consumption will drop significantly.
This is not always possible. If it's impossible, then just do nothing. It's more or less alright like it is.
And P.S.
You shouldn't be allocating 500 forms in the first place. This is clearly not a way to go. Fix this, and you won't even have a need to think about memory allocation and releasing.
I hope this clears things up, because four posts on the same topic, frankly, is a bit too much.

IIRC, the Delphi memory manager does not immediately return free'd memory to the OS.
Memory is allocated in chunks of small, medium and large sizes, called blocks.
These blocks are kept for a while after their contents have been disposed to have them readyly available when another allocation is requested afterwards.
This limits the amount of system calls required for succesive allocation of multiple objects, and helps avoiding heap fragmentation.

Infirming: Delphi 2007, default memory manager (should be FastMM variation). Several tests on heavy objects:
Initial memory 2Mb, peak memory 30Mb, final memory 4Mb.
Initial memory 2Mb, peak memory 1Gb, final memory 5.5Mb.

What are the heapmanager stats (GetHeapStatus) on the point that 160MB is still allocated?

SOLVED
To confirm that this behavior is generated by FastMM (as suggested by Barry Kelly) I created a second program that allocated A LOT of RAM. As soon as Windows ran out of RAM, my program memory utilization returned to its original value.
Problem solved. Special thanks to Barry Kelly, the only person that pointed to the real "problem".

Related

What's the right statistic for iOS Memory footprint. Live Bytes? Real Memory? Other?

I'm definitely confused on this point.
I have an iPad application that shows 'Live Bytes' usage of 6-12mb in the object allocation instrument. If I pull up the memory monitor or activity monitor, the 'Real Memory' Column consistently climbs to around 80-90mb after some serious usage.
So do I have a normal memory footprint or a high one?
This answer and this answer claim you should watch 'Live Bytes' as the 'Real Memory' column shows memory blocks that have been released, but the OS hasn't yet reclaimed it.
On the other hand, this answer claims you need to pay attention to that memory monitor, as the 'Live Bytes' doesn't include things like interface elements.
What is the deal with iOS memory footprint!? :)

Those are simply two different metrics for measuring memory use. Which one is the "right" one depends on what question you're trying to answer.
In a nutshell, the difference between "live bytes" and "real memory" is the difference between the amount of memory currently used for stuff that your app has created and the total amount of physical memory currently attributed to your app. There are at least two reasons that those are different:
code: Your app's code has to be loaded into memory, of course, and the virtual memory system surely attributes that to your app even though it's not memory that your app allocated.
memory pools: Most allocators work by maintaining one or more pools of memory from which they can carve off pieces for individual objects or allocated memory blocks. Most implementations of malloc work that way, and I expect that the object allocator does too. These pools aren't automatically resized downward when an object is deallocated -- the memory is just marked 'free' in the pool, but the whole pool will still be attributed to your app.
There may be other ways that memory is attributed to your app without being directly allocated by your code, too.
So, what are you trying to learn about your application? If you're trying to figure out why your app crashed due to low memory, look at both "live bytes" (to see what your app is using now) and "real memory" (to see how much memory the VM system says your app is using). If you're trying to improve your app's memory performance, looking at "live bytes" or "live objects" is more likely to help, since that's the memory that you can do something about.

Seeing as how I wrote the last answer you linked to, I'll have to stand by that. If you want a total, accurate count of the current memory usage for your application, use the Memory Monitor instrument.
For reasons that I describe in this answer, Allocations hides the memory sizes of certain elements, meaning that its memory usage totals are significantly lower than your application's in-memory size. Many people find this out the hard way when they try to get their application functional on older iOS devices. On the older hardware, you had a hard memory ceiling of ~30 MB, where if you exceeded that your application was hard-killed.
Many developers (myself included) saw that we only had ~1-2 MB of live bytes in Allocations and thought we were good, until our applications started receiving memory warnings and early terminations. If you looked at Memory Monitor, you could see the true in-memory size of these applications being >20 MB, and you could see the applications being terminated the instant they crossed the 30 MB barrier in Memory Monitor.
Therefore, if you want an accurate assessment of your total application memory usage, use Memory Monitor. Allocations is great to find out the specific objects that are in memory, particularly when you use the heap shots to find things that might be accumulating (as leaks, retain cycles, or for other reasons). Just don't trust it when determining your application's actual size in memory.

'Live bytes' means memory allocated by your code (for example by malloc), so you have access to this memory. 'Real memory' shows physical amount of memory used by your app. This include also OpenGL textures, (possibly) sounds from Open AL...
Live bytes is useful to check when you allocate and release memory in your code. Real memory is good indicator for memory optimization efficiency. And it's overhead causes 'low memory' warnings.

How to reserve memory for my application and leave a specified amount remaining?

I'm planning an application which will involve loading many pictures at one time and thus requires a large chunk of memory. For example, I might have 50 image objects created at once, taking a total of 1GB of RAM. But when the user goes to load 20 more pictures, I'd like to make sure that amount of memory is already reserved and ready.
Now this part might seem a little backwards from normal. Rather than specifying how much memory my application shall reserve, instead I need to specify how much memory to leave free for other applications, and adjust my application's memory periodically according to this specification. I must say I've never worked with reserving memory at all, and especially won't know how to leave this remaining available memory.
So for example, if the computer has 2048 MB of RAM, and the option is set to leave 50 MB free for other applications, and there is already 10MB of RAM being used by other apps, then it should reserve 2048-50-10 = 1988 MB for my app.
The trouble I foresee is suppose the user opens another application which requires 1GB. My app has to catch this and shrink its self.
Does this even sound like a feasible approach? Basically, I need to make sure there is as much memory reserved as possible at any given time, while leaving a decent amount available for other apps. Would it make a significant impact on performance if I do this, or not much at all? I might be loading and unloading images at rapid paces, and I don't want it to reserve/free this memory on demand, I want it to stay reserved.

+1 for Sertac's mentioning of how SQL Server rides the line of allocating memory it needs, but releasing memory when Windows complains.
Applications can receive Window's complaints by using the CreateMemoryResourceNotification:
hLowMemory := CreateMemoryResourceNotification(LowMemoryResourceNotification);
Applications can use memory resource notification events to scale the
memory usage as appropriate. If available memory is low, the
application can reduce its working set. If available memory is high,
the application can allocate more memory.
Any thread of the calling
process can specify the memory resource notification handle in a call
to the QueryMemoryResourceNotification function or one of the wait functions.
The state of the object is signaled when the specified
memory condition exists. This is a system-wide event, so all
applications receive notification when the object is signaled. Note
that there is a range of memory availability where neither the
LowMemoryResourceNotification or HighMemoryResourceNotification object
is signaled. In this case, applications should attempt to keep the
memory use constant.
But it's also worth mentioning that you might as well allocate memory that you need. Your operating system has a very sophisiticated set of algorithms to swap out the least used memory when memory pressure is high. You can take advantage of this by simply allocating all the memory that you need. When Windows starts to run low, it will find those pages of memory that you are using the least and swap them out to disk. (This is how a well-known reverse proxy works).
The only thing left is to decide if you want to free some images when Windows says it's running low on RAM. But if you're not using the memory, it is going to be swapped out to disk for you.

It's not realistic to account for other apps. Just ignore them. The system will page things in and out as needed. If you really wanted to do this you'd have to dynamically adapt to other processes as they start and finish. That's really not realistic. What's more it's not practical to inquire of other processes how much memory they need. Leave it all to the system.
Set a budget for your app and make sure you don't exceed it. Keep the most recently used images in memory and when you approach your memory budget throw away the least recently used images to make space.
If you are stressing the available resources then make sure you use FastMM and enable LARGE_ADDRESS_AWARE for your app so that you get 4GB address space when running on a 64 bit OS.

How to use AQTime's memory allocation profiler in a program that uses a large amount of memory?

I'm finding AQTime hard to use because it interferes with the original program too much. If I have a program that uses, for example, 300MB of ram I can use AQTime's allocation profiler without a problem, and find out where most of the memory is being used. However I notice that running under AQTime, the original program uses more like 1GB while it's being profiled.
Right now I'm trying to reduce memory usage in a program which is using 1.4GB of memory. If I run it under AQTime, then the original program uses all of the 2GB address space and crashes. I can of course invent a smaller set of test data and estimate how the memory usage will scale with the full data set - but the reason I'm using a profiler in the first place is to try to avoid this sort of guesswork.
I already have AQTime set to 'Collect stack information - None' and all the check boxes to do with checking memory integrity are switched off, and I've tried restricting the area being profiled to just a few classes but this doesn't seem to improve anything. Is there a way to use AQTime that produces a smaller overhead? Or failing that, what other approaches are there to get a good idea of the memory being used?
The app is written in Delphi 2010 and I'm using AQTime 6.
NB: On top of the increased memory usage, running under AQTime slows the app down an awful lot, making the whole exercise not just impossible but impractical too :-P

AFAIK the allocation profiler will track memory block allocation regardless of profiling areas. Profiling areas are used to track classes instantiation. Of course memory-profiling an application that allocates a large amount of memory is a issue, you may try to use the LARGE_ADRESS_AWARE flag, and the /3GB boot switch, or use a 64 bit system (as long as you have at least 4GB of memory, or more). Also you can take snapshot of the application state before it crashes, to see where the memory is allocated. Profiling takes time, anyway, you may have to let it run for a while.

When to call SetProcessWorkingSetSize? (Convincing the memory manager to release the memory)

In a previous post ( My program never releases the memory back. Why? ) I show that FastMM can cache (read as hold for itself) pretty large amounts of memory. If your application just loaded a large data set in RAM, after releasing the data, you will see that impressive amounts of RAM are not released back to the memory pool.
I looked around and it seems that calling the SetProcessWorkingSetSize API function will "flush" the cache to disk. However, I cannot decide when to call this function. I wanted to call it at the end of the OnClick event on the button that is performing the RAM intensive operation. However, some people are saying that this may cause AV.
If anybody used this function successfully, please let me (us) know.
Many thanks.
Edit:
1. After releasing the data set, the program still takes large amounts of RAM. After calling SetProcessWorkingSetSize the size returns to few MB. Some argue that nothing was released back. I agree. But the memory foot print is now small AND it is NOT increasing back after using the program normally (for example when performing normal operation that does not involves loading large data sets). Unfortunately, there is no way to demonstrate that the memory swapped to disk is ever loaded back into memory, but I think it is not.
2. I have already demonstrated (I hope) this is not a memory leak:
My program never releases the memory back. Why?
How to convince the memory manager to release unused memory

If SetProcessWorkingSetSize would solve your problem, then your problem is not that FastMM is keeping hold of memory. Since this function will just trim the workingset of your application by writing the memory in RAM to the page file. Nothing is released back to Windows.
In fact you only have made accessing the memory again slower, since it now has to be read from disc. This method has the same effect as minimising your application. Then Windows presumes you are not going to use the application again soon and also writes the workingset in RAM to the pagefile. Windows does a good job of deciding when to write RAM to the pagefile and tries to keep the most used memory in RAM as long as it can. It will make the workinset size smaller (write to pagefile) when there is little RAM left. I would not mess with it just to give the illusion that you program is using less memory while in fact it is using just as much as before, only now it is slower to access. Memory that is accessed again will be loaded into RAM again and make the workinset size grow again. Touching less memory keeps the workingset size smaller.
So no, this will not help you forcing FastMM to release the memory. If your goal is for your application to use less memory you should look elsewhere. Look for leaks, look for heap fragmentations look for optimisations and if you think FastMM is keeping you from doing so you should try to find facts to support it. If your goal is to keep your workinset size small you could try to keep your memory access local. Maybe FastMM or another memory manager could help you with it, but it is a very different problem compared to using to much memory. And maybe this function does help you solve the problem you are having, but I would use it with care and certainly not use it just to keep the illusion that your program has a low memory usage.

I agree with Lars Truijens 100%, if you don't than you can check the FasttMM memory usage via FasttMM calls GetMemoryManagerState and GetMemoryManagerUsageSummary before and after calling API SetProcessWorkingSetSize.

Are you sure there is a problem? Working sets might only decrease when there really is a memory shortage.

Problem solved:
I don't need to use SetProcessWorkingSetSize. FastMM will eventually release the RAM.
To confirm that this behavior is generated by FastMM (as suggested by Barry Kelly) I crated a second program that allocated A LOT of RAM. As soon as Windows ran out of RAM, my program memory utilization returned to its original value.

I used this function just once, when I implemented TWebBrowser. This component took me so much memory even if I destroyed the instance.

40 million page faults. How to fix this?

I have an application that loads 170 files (let’s say they are text files) from disk in individual objects and kept in memory all the time. The memory is allocated once when I load those files from disk. So, there is no memory fragmentation involved. I also use FastMM to make sure my applications never leaks memory.
The application compares all these files with each other to find similarities. Over-simplified we can say that we compare text strings but the algorithm is way more complex as I have to allow some differences between strings. Each file is about 300KB. Loaded in memory (the object that holds it) it takes about 0.4MB of RAM. So, the running app takes about 60MB or RAM (working set). It processes the data for about 15 minutes. The thing is that it generates over 40 million page faults.
Why? I have about 2GB of free RAM. From what I know Page Faults are slow. How much they are slowing down my program?
How can I optimize the program to reduce these page faults? I guess it has something to do with data locality. Does anybody know some example algorithms for this (Delphi)?
Update:
But looking at the number of page faults (no other application in Task Manager comes close to mine, not even by far) I guess that I could increase the speed of my application IF I manage to optimize memory layout (reduce the page faults).
Delphi 7, Win 7 32 bit, RAM 4GB (3GB visible, 2GB free).

Caveat - I'm only addressing the page faulting issue.
I cannot be sure but have you considered using Memory Mapped files? In this way windows will use the files themselves as the paging file (rather than the main paging file pagrefile.sys). If the files are read only then the number of page faults should theoretically decrease as the pages won't need to written out to disk via the paging file as windows will just load the data from the file itself as needed.
Now to reduce files from paging in and out you need to try and go through the data in one direction so that as new data is read, older pages can be discarded for ever. Here is where you trade off going over the files again and caching data - the cache has to be stored somewhere.
Note that Memory Mapped files is how windows loads .dlls and .exes amongst other things. I've used them to scan though gigabyte files without hitting memory limits (we had MBs in those days and not GBs of ram).
However from the data you describe I'd suggest the ability to not go back ovver files will reduce the amount of repaging going on.

On my machine most pagefaults are reported for developer studio which is reported to have 4M page faults after 30+ minutes total CPU time. You get 10 times more, in half the time. And memory is scarce on my system. So 40M faults seems like a lot.
It could just maybe be you have a memory leak.
the working set is only the physical memory in use for your application. If you leak memory, and don't touch it, it will get paged out. You will see the virtual memory useage (or page file use) increase. These pages might be swapped back in when the heap memory walks the heap, to get swapped out again by windows.
Because you have a lot of RAM, the swapped out pages will stay in physical memory, as nobody else needs them. (a page recovered from RAM counts as a soft fault, from disk as a hard one)

Do you use an exponential resize system ?
If you grow the block of memory in too small increments while loading, it might constantly request large blocks from the system, copy the data over, and then release the old block (assuming that fastmm (de)allocates very large blocks directly from the OS).
Maybe somehow this causes a loop where the OS releases memory from your app's process, and then adds it again, causing page faults on first write.
Also avoid Tstringlist.load* methods for very large files, IIRC these consume twice the space needed.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart