How do programs allocate large amounts of memory? - memory

I have 3 questions concerning memory allocation that I thought better to put into one question than 3.
When memory is allocated as I understand, it is allocated on the heap, which is just 16mb. How hen do programs such as video games or modern browsers manage to use over 1GB?
Since it is obviously possible for this much memory to be used, why can it not be allocated at the start? I have found the most I can allocate in High Level Assembly language is around 100MB. This is a lot more than 16MB, and far less than I have 3, so where does this limitation come from?
Why allocate memory in the first place, rather than allocating variables and letting the compiler/system handle it?

When memory is allocated as I understand, it is allocated on the heap,
which is just 16mb. How hen do programs such as video games or modern
browsers manage to use over 1GB?
The heap can grow. It isn't limited to any value and certainly not 16MB. You can easily allocate 1GB of heap, just make a program test and you'll see.
Since it is obviously possible for this much memory to be used, why
can it not be allocated at the start? I have found the most I can
allocate in High Level Assembly language is around 100MB. This is a
lot more than 16MB, and far less than I have 3, so where does this
limitation come from?
I'm not sure why your OS isn't filling larger allocation requests. Perhaps due to memory fragmentation? It's going to be a problem specific to your setup, which you didn't share. I can allocation much more memory than that without an issue.
You can try to use the mmap system call if malloc (which uses the brk system call) is having some sort of issue. Note that for GNU libc, malloc actually uses mmap instead of brk when the allocation is large enough (over 128k I think).
Why allocate memory in the first place, rather than allocating
variables and letting the compiler/system handle it?
Variable must live in memory somewhere. What you are saying is "why manually manage memory? Why can't some algorithm do that for me?". It is actually very common for the compiler and a runtime component to handle allocation/freeing - it's called garbage collection.

Related

How to analyze excessive memory consumption (PageFileUsage) in a Delphi Application?

This is a follow-up to this question: What could explain the difference in memory usage reported by FastMM or GetProcessMemoryInfo?
My Delphi XE application is using a very large amount of memory which sometimes lead to an out of memory exception. I'm trying to understand why and what is causing this memory usage and while FastMM is reporting low memory usage, when requesting for TProcessMemoryCounters.PageFileUsage I can clearly see that a lot of memory is used by the application.
I would like to understand what is causing this problem and would like some advise on how to handle it:
Is there a way to know what is contained in that memory and where it has been allocated ?
Is there some tool to track down memory usage by line/procedure in a Delphi application ?
Any general advise on how to handle such a problem ?
EDIT 1 : Here are two screenshots of FastMMUsageTracker indicating that memory has been allocate by the system.
Before process starts:
After process ends:
Legend: Light red is FastMM allocated and dark gray is system allocated.
I'd like to understand what is causing the system to use that much memory. Probably by understanding what is contained in that memory or what line of code or procedure did cause that allocation.
EDIT 2 : I'd rather not use the full version of AQTime for multiple reasons:
I'm using multiple virtual machines for development and their licensing system is a PITA (I'm already a registered user of TestComplete)
LITE version doesn't provide enough information and I won't waste money without making certain the FULL version will give me valuable information
Any other suggestions ?
Another problem might be heap fragmentation. This means you have enough memory free, but all the free blocks are to small. You might see it visually by using the source version of FastMM and use the FastMMUsageTracker.pas as suggested here.
You need a profiler, but even that won't be enough in lots of places and cases. Also, in your case, you would need the full featured AQTime, not the lite version that comes with Delphi XE and XE2. (AQTIME is extremely expensive, and annoyingly node-locked, so don't think I'm a shill for SmartBear software.)
The thing is that people often mistake AQTime Allocation Profiler as only a way to find leaks. It can also tell you where your memory goes, at least within the limits of the tool. While running, and consuming lots of memory, I click Run -> Get Results.
Here is one of my applications being profile in AQTime with its Allocation Profiler showing exactly what class is allocating how many instances on the heap and how much memory those use. Since you report low Delphi heap usage with FastMM, that tells me that most of AQTime's ability to analyze by delphi class name will also be useless to you. However by using AQTime's events and triggers, you might be able to figure out what areas of your application are causing you a "memory usage expense" and when those occur, what the expense is. AQTime's real-time instrumentation may be sufficient to help you narrow down the cause even though it might not find for you what function call is causing the most memory usage automatically.
The column names include "Object Name" which includes things like this:
* All delphi classes, and their instance count and heap usage.
* Virtual Memory blocks allocated via Win32 calls.
It can detect Delphi and C/C++ library allocations on the heap, and can see certain Windows-API level memory allocations.
Note the live count of objects, the amount of memory from the heap that is used.
I usually try to figure out the memory cost of a particular operation by measuring heap memory use before, and just after, some expensive operation, but before the cleanup (freeing) of the memory from that expensive operation. I can set event points inside AQTime and when a particular method gets hit or a flag gets turned on by me, I can measure before, and after values, and then compare them.
FastMM alone can not even detect a non-delphi allocation or an allocation from a heap that is not being managed by FastMM. AQTime is not limited in that way.

How does a program know how much memory to release?

I suspect the answer to my question is language specific, so I'd like to know about C and C++. When I call free() on a buffer or use delete[], how does the program know how much memory to free?
Where is the size of the buffer or of the dynamically allocated array stored and why isn't it available to the programmer as well?
Each implementation will be different, but typically the runtime allocates a bit more than asked for, and uses some hidden fields at the start of the block to remember the allocated size. The address returned to the caller is therefore offset a bit from the start of the memory claimed from the heap.
It isn't available to the caller because the true amount of memory claimed from the heap is an implementation detail, and will vary between compilers and platforms. As for knowing how much the caller asked for, rather than how much was allocated from the heap... well, the language designers assume that the programmer is capable of remembering this if needed.
The heap keeps track of all memory blocks, both allocated and free, specifically for that purpose. Typical (if naive) implemenation allocates memory, uses several bytes in the beginning for bookkeeping, and returns the address past those bytes. On subsequent operations (free/realloc), it would subtract a few bytes to get to the bookkeeping area.
Some heap implementations (say, Windows' GlobalAlloc()) let you know the block size given the starting address. But in the C/C++ RTL heap, no such service.
Note that the malloc() sometimes overallocates memory, so the information about mallocated block size would be of limited utility. C++ new[]'ed arrays, that's a whole another matter - for those, knowing exact array size is essential for array destruction to work properly. Still, there's no such thing in C++ as a dynamic_sizeof operator.
The memory allocator that gave you that chunk of memory is responsible for all that maintenance data. Typically it's stored in the beginning of the chunk (right before the actual address you use) so it's easy to access on freeing.
Regarding to your other question: why should your app know about it? It's not your concern. It decouples memory allocation management from the app so you can use different allocators (for performance or debugging reasons).
It's stored internally in a location dependent on the language/compiler/OS.
Sometimes it is available (.Length in C# for example), though that may only refer to how much memory you're allowed to use, and not the object's total size.
Usually because the size to free is stored somewhere within the allocated buffer. A common technique is to have the size stored in memory just previous to the returned pointer.
Why isn't such information available to the programmer? I don't really know. I guess its because an implementation may be able to provide memory allocation without actually needing to store its size, and such implementation -if it exists- shouldn't be penalized by the others.
It's not so much language specific. It's all done by the memory manager.
How it knows depends on how the memory manager manages memory. The general idea is that the memory manager allocates more memory than you ask for. It stores extra data about the allocated blocks of memory in those locations. Thus, when you release the memory, it uses the information stored in those locations (reconstructed based on the given pointer) and figures out how much actual memory to stop managing.
Don't confound deallocation and destruction.
free() knows the size of the memory because of some internal magic ("implementation-defined"), e.g. the allocator could keep a list of all the allocated memory regions indexed by their corresponding pointers and just look up the pointer to know what to deallocate; or that information could be stored next to the allocated memory itself in some hidden block of data.
The array-delete expression delete[] arr; does not only deallocate memory, but it also invokes all destructors. For that purpose, it is not sufficient to just know the memory size, but we also need to know the number of elements. For that purpose, new T[N] actually allocates more than sizeof(T) * N bytes of memory, so the array-deleter knows how many destructors to call. All that memory is properly deallocated by the corresponding delete-operator.

Inside Dynamics memory management

i am student and want to know more about the dynamics memory management. For C++, calling operator new() can allocate a memory block under the Heap(Free Store ). In fact, I have not a full picture how to achieve it.
There are a few questions:
1) What is the mechanism that the OS can allocate a memory block?? As I know, there are some basic memory allocation schemes like first-fit, best-fit and worst-fit. Does OS use one of them to allocate memory dynamically under the heap?
2) For different platform like Android, IOS, Window and so on, are they used different memory allocation algorithms to allocate a memory block?
3) For C++, when i call operator new() or malloc(), Does the memory allocator allocate a memory block randomly in the heap?
Hope anyone can help me.
Thanks
malloc is not a system call, it is library (libc) routine which goes through some of its internal structures to give you address of a free piece of memory of the required size. It only does a system call if the process' data segment (i.e. virtual memory it can use) is not "big enough" according to the logic of malloc in question. (On Linux, the system call to enlarge data segment is brk)
Simply said, malloc provides fine-grained memory management, while OS manages coarser, big chunks of memory made available to that process.
Not only different platforms, but also different libraries use different malloc; some programs (e.g. python) use its internal allocator instead as they know its own usage patterns and can increase performance that way.
There is a longthy article about malloc at wikipedia.

How much memory your program takes? (FastMM vs Borland MM)

I have seen recently a strange behavior in my program. After creating large amounts of objects (500MB of RAM) then releasing them, the program's memory footprint does not return to its original size. It still shows a footprint of 160MB (Private working set).
Normal behavior?
Borland's memory manager does not behave like this, so if possible please confirm (or infirm) this is a normal behavior for FastMM: If you have a handy program in which you create a rather complex MDI child (containing several controls/objects), can you create in a loop 250 instances of that MDI child in memory (at the same time) then release them all and check the memory footprint. Please make sure that you consume at least 200-300MB or RAM with those MDI childs.
Especially those that still using Delphi 7 can see the difference by temporary disabling FastMM.
Thanks
If anybody is interested, especially if you want some proof this is not a memory leak (I hope it is not a mem leak in my code - this is also one of the points of this post: to check if it is my fault), here are the original discussions:
My program never releases the memory back. Why?
How to convince the memory manager to release unused memory
Dear Altar, I'm dazzled at how off the point you are in your guesses and how you don't listen to what people told you many times before.
Let's set some things straight. Memory management 101. Please read thoroughly.
When you allocate memory in Delphi, there are two memory managers involved.
System memory manager
First one is a system memory manager. This one is built into Windows and it gives memory in 4kb sized pages.
But it doesn't always give you memory in RAM (or physical memory). Your data can be kept on the hard drive, and read back every time you need to access it. This is awfully slow.
In other words, imagine you have 512Mb of physical memory. You run two programs, each requesting 1Gb of memory. What does OS do?
It grants both requests. Both apps get 1Gb of memory each. Both think all the memory is "in memory". But in fact, only 512Mb can be kept in RAM. The rest is stored in page file, although your app does not know that. It just works slow.
Working set size
Now, what is a "working set size" you are measuring?
It's the part of the allocated memory that is kept in RAM.
If you have an application which allocates 1Gb of memory, and you only have 512 Mb of RAM, then it's working set size will be 512Mb. Although it "uses" 1Gb of memory!
When you run another application which needs memory, OS will automatically free some RAM by moving rarely used blocks of "memory" to the hard drive.
Your virtual memory allocation will stay the same, but more pages will be on the hard drive and less in RAM. Working set size will decrease.
From this, you should have understood by this point, that it's pointless to try and minimize the working set size. You're achieving nothing. You're not freeing memory in any sense. You're just offloading the data to the hard drive.
But the system will do that automatically when it needs to. And there's no point making room in RAM until it's needed. You're just slowing down your application, that's all.
TLDR: "Working set size" is not "how much memory application uses". It's "how much is ready right now". Don't try to minimize it, you're just making things worse.
Delphi memory manager
OS gives you virtual memory in pages of 4Kb. But often you need it in much smaller chunks. For instance, 4 bytes for your integer, or 32 bytes for some structure. The solution?
Application memory manager, such as FastMM or BorlandMM or others.
It's job is to allocate memory in pages from the operating system, then give you small chunks of those pages when you need it.
In other words, when you ask for 14 bytes of memory, this is what happens:
You ask FastMM for 14 bytes of memory.
FastMM asks OS for 1 page of memory (4096 bytes).
OS grants one page of memory, backing it up with RAM (it's stored in actual RAM).
FastMM saves that page, cuts 14 bytes of it and gives to you.
When you ask for another 14 bytes, FastMM just cuts another 14 bytes from the same page.
What happens when you release memory? The same thing backwards:
You release 14 bytes to FastMM. Nothing happens.
You release another 14 bytes. FastMM sees that the 4096 byte page it allocated is now completely unused.
Therefore it releases the page, returning it to the system.
It's worth noting that FastMM cannot release just 14 bytes to the system. It has to release memory in pages. Until the whole page is free, FastMM cannot do a thing. Nobody can.
So, why is my working set size so big, even though I released everything?
First, your working set size is not what you should be measuring. Virtual memory consumption is. But if you have big working set size, your virtual memory consumption will be high too.
What's the problem? You should be able to figure out by this point.
Let's say you allocate 1kb, then 3kb of memory. How much virtual memory have you allocated? 4kb, 1 page.
Now you release 3Kb. How much virtual memory do you use now? 1Kb? No, it's still 1 page. You cannot allocate less than 1 page from the system. You're still using 4096 bytes of virtual memory.
Imagine if you do that 1000 times. 1kb, 3kb, 1kb, 3kb, 1kb, 3kb and so on. You allocate 1000 * 4kb = 4 mb like that, and then you release all the 3kb parts. How much virtual memory do you use now?
Still 4 mb. Because you allocated 1000 pages at first. Of every page you took 1kb and 3kb chunks. Even if you release 3kb chunks, 1kb chunks will continue to keep every single page you allocated in memory. And every page takes 4kb of virtual memory.
Memory manager cannot magically "move" all of your 1kb chunks together. This is impossible, because their virtual addresses can be referenced from somewhere in code. It's not a trait of FastMM.
But why with BorlandMM everything works better?
Coincidence. Maybe it just so happens that BorlandMM gives you memory in a slightly different way than FastMM does. Next thing you know, you change something in your app and BorlandMM acts just like FastMM did. It's impossible for a memory manager to completely prevent this effect, called memory fragmentation.
So what do I do?
Short answer is, not much until this bothers you.
You see, with modern operating systems, you're not really eating anyone's RAM. Per above, OS will automatically swap your pages out when it needs RAM for other applications. This should not be a concern.
And the "excessive" memory isn't lost. Although pages are allocated, 3kb of each is marked as "free". Next time your app needs memory, memory manager will use that space.
But if you really want to help it, you should reorganize your allocations so that the ones you're planning on keeping are done first, and the ones you will soon release are all allocated after that.
Like this: 1kb, 1kb, 1kb, ..., 3kb, 3kb, 3kb...
If you now release all the 3kb chunks, your virtual memory consumption will drop significantly.
This is not always possible. If it's impossible, then just do nothing. It's more or less alright like it is.
And P.S.
You shouldn't be allocating 500 forms in the first place. This is clearly not a way to go. Fix this, and you won't even have a need to think about memory allocation and releasing.
I hope this clears things up, because four posts on the same topic, frankly, is a bit too much.
IIRC, the Delphi memory manager does not immediately return free'd memory to the OS.
Memory is allocated in chunks of small, medium and large sizes, called blocks.
These blocks are kept for a while after their contents have been disposed to have them readyly available when another allocation is requested afterwards.
This limits the amount of system calls required for succesive allocation of multiple objects, and helps avoiding heap fragmentation.
Infirming: Delphi 2007, default memory manager (should be FastMM variation). Several tests on heavy objects:
Initial memory 2Mb, peak memory 30Mb, final memory 4Mb.
Initial memory 2Mb, peak memory 1Gb, final memory 5.5Mb.
What are the heapmanager stats (GetHeapStatus) on the point that 160MB is still allocated?
SOLVED
To confirm that this behavior is generated by FastMM (as suggested by Barry Kelly) I created a second program that allocated A LOT of RAM. As soon as Windows ran out of RAM, my program memory utilization returned to its original value.
Problem solved. Special thanks to Barry Kelly, the only person that pointed to the real "problem".

Multithreaded Heap Management

In C/C++ I can allocate memory in one thread and delete it in another thread. Yet whenever one requests memory from the heap, the heap allocator needs to walk the heap to find a suitably sized free area. How can two threads access the same heap efficiently without corrupting the heap? (Is this done by locking the heap?)
In general, you do not need to worry about the thread-safety of your memory allocator. All standard memory allocators -- that is, those shipped with MacOS, Windows, Linux, etc. -- are thread-safe. Locks are a standard way of providing thread-safety, though it is possible to write a memory allocator that only uses atomic operations rather than locks.
Now it is an entirely different question whether those memory allocators scale; that is, is their performance independent of the number of threads performing memory operations? In most cases, the answer is no; they either slow down or can consume a lot more memory. The first scalable allocator in both dimensions (speed and space) is Hoard (which I wrote); the Mac OS X allocator is inspired by it -- and cites it in the documentation -- but Hoard is faster. There are others, including Google's tcmalloc.
Yes an "ordinary" heap implementation supporting multithreaded code will necessarily include some sort of locking to ensure correct operation. Under fairly extreme conditions (a lot of heap activity) this can become a bottleneck; more specialized heaps (generally providing some sort of thread-local heap) are available which can help in this situation. I've used Intel TBB's "scalable allocator" to good effect. tcmalloc and jemalloc are other examples of mallocs implemented with multithreaded scaling in mind.
Some timing comparisons comparisons between single threaded and multithread-aware mallocs here.
This is an Operating Systems question, so the answer is going to depend on the OS.
On Windows, each process gets its own heap. That means multiple threads in the same process are (by default) sharing a heap. Thus the OS has to thread-synchronize its allocation and deallocation calls to prevent heap corruption. If you don't like the idea of the possible contention that may ensue, you can get around it by using the Heap* routines. You can even overload malloc (in C) and new (in C++) to call them.
I found this link.
Basically, the heap can be divided into arenas. When requesting memory, each arena is checked in turn to see whether it is locked. This means that different threads can access different parts of the heap at the same time safely. Frees are a bit more complicated because each free must be freed from the arena that it was allocated from. I imagine a good implementation will get different threads to default to different arenas to try to minimize contention.
Yes, normally access to the heap has to be locked. Any time you have a shared resource, that resource needs to be protected; memory is a resource.
This will depend heavily on your platform/OS, but I believe this is generally OK on major sytems. C/C++ do not define threads, so by default I believe the answer is "heap is not protected", that you must have some sort of multithreaded protection for your heap access.
However, at least with linux and gcc, I believe that enabling -pthread will give you this protection automatically...
Additionally, here is another related question:
C++ new operator thread safety in linux and gcc 4

Resources