How does a program know how much memory to release? - memory

I suspect the answer to my question is language specific, so I'd like to know about C and C++. When I call free() on a buffer or use delete[], how does the program know how much memory to free?
Where is the size of the buffer or of the dynamically allocated array stored and why isn't it available to the programmer as well?

Each implementation will be different, but typically the runtime allocates a bit more than asked for, and uses some hidden fields at the start of the block to remember the allocated size. The address returned to the caller is therefore offset a bit from the start of the memory claimed from the heap.
It isn't available to the caller because the true amount of memory claimed from the heap is an implementation detail, and will vary between compilers and platforms. As for knowing how much the caller asked for, rather than how much was allocated from the heap... well, the language designers assume that the programmer is capable of remembering this if needed.

The heap keeps track of all memory blocks, both allocated and free, specifically for that purpose. Typical (if naive) implemenation allocates memory, uses several bytes in the beginning for bookkeeping, and returns the address past those bytes. On subsequent operations (free/realloc), it would subtract a few bytes to get to the bookkeeping area.
Some heap implementations (say, Windows' GlobalAlloc()) let you know the block size given the starting address. But in the C/C++ RTL heap, no such service.
Note that the malloc() sometimes overallocates memory, so the information about mallocated block size would be of limited utility. C++ new[]'ed arrays, that's a whole another matter - for those, knowing exact array size is essential for array destruction to work properly. Still, there's no such thing in C++ as a dynamic_sizeof operator.

The memory allocator that gave you that chunk of memory is responsible for all that maintenance data. Typically it's stored in the beginning of the chunk (right before the actual address you use) so it's easy to access on freeing.
Regarding to your other question: why should your app know about it? It's not your concern. It decouples memory allocation management from the app so you can use different allocators (for performance or debugging reasons).

It's stored internally in a location dependent on the language/compiler/OS.
Sometimes it is available (.Length in C# for example), though that may only refer to how much memory you're allowed to use, and not the object's total size.

Usually because the size to free is stored somewhere within the allocated buffer. A common technique is to have the size stored in memory just previous to the returned pointer.
Why isn't such information available to the programmer? I don't really know. I guess its because an implementation may be able to provide memory allocation without actually needing to store its size, and such implementation -if it exists- shouldn't be penalized by the others.

It's not so much language specific. It's all done by the memory manager.
How it knows depends on how the memory manager manages memory. The general idea is that the memory manager allocates more memory than you ask for. It stores extra data about the allocated blocks of memory in those locations. Thus, when you release the memory, it uses the information stored in those locations (reconstructed based on the given pointer) and figures out how much actual memory to stop managing.

Don't confound deallocation and destruction.
free() knows the size of the memory because of some internal magic ("implementation-defined"), e.g. the allocator could keep a list of all the allocated memory regions indexed by their corresponding pointers and just look up the pointer to know what to deallocate; or that information could be stored next to the allocated memory itself in some hidden block of data.
The array-delete expression delete[] arr; does not only deallocate memory, but it also invokes all destructors. For that purpose, it is not sufficient to just know the memory size, but we also need to know the number of elements. For that purpose, new T[N] actually allocates more than sizeof(T) * N bytes of memory, so the array-deleter knows how many destructors to call. All that memory is properly deallocated by the corresponding delete-operator.

Related

In programming environments that have automatic memory management, how often are the OS memory allocation routines invoked at runtime?

Do implementations pre-allocate blocks of memory for objects using malloc? When these blocks are used up, will additional memory be requested? When garbage collection runs and compaction occurs, will memory be returned to the OS via calls to free?
Do implementations pre-allocate blocks of memory for objects using malloc?
Yes. Most often they pre-allocate continuous blocks of memory and implement they own allocation mechanism inside (for example based on allocation pointer - pointing the memory address for the next object so allocating an object is simply returning this address and moving this pointer by given amount of bytes). This is faster than relying on OS calls and gives better control of those memory regions. For example, in case of CLR on Windows, those blocks are called segments and are managed via VirtualAlloc/VirtualFree calls. First quite a big memory region is reserved and then more and more pages are being committed as they are needed. Malloc (or more general - HeapAPI in case of Windows) is not used in CLR.
When these blocks are used up, will additional memory be requested?
Yes, they may be more blocks created but first they grow "inside" by committing (consuming) reserved memory.
When garbage collection runs and compaction occurs, will memory be returned to the OS via calls to free?
It depends on specific runtime implementation but you should not look at it as a main memory reclamation mechanism. Compaction works inside those preallocated memory blocks - for example, allocation pointer will be moved back to the left after compaction occurred. But yes, in general, segments may be returned to OS when GC decides that it is no longer needed (like all objects living inside have been reclaimed). However, on 32-bit architectures with quite limited virtual memory space it could lead to unwanted memory fragmentation and reusing such memory block was a better option. On 64-bit this may not be so big problem, however, reusing those blocks still may be a just good idea.

memory management and segmentation faults in modern day systems (Linux)

In modern-day operating systems, memory is available as an abstracted resource. A process is exposed to a virtual address space (which is independent from address space of all other processes) and a whole mechanism exists for mapping any virtual address to some actual physical address.
My doubt is:
If each process has its own address space, then it should be free to access any address in the same. So apart from permission restricted sections like that of .data, .bss, .text etc, one should be free to change value at any address. But this usually gives segmentation fault, why?
For acquiring the dynamic memory, we need to do a malloc. If the whole virtual space is made available to a process, then why can't it directly access it?
Different runs of a program results in different addresses for variables (both on stack and heap). Why is it so, when the environments for each run is same? Does it not affect the amount of addressable memory available for usage? (Does it have something to do with address space randomization?)
Some links on memory allocation (e.g. in heap).
The data available at different places is very confusing, as they talk about old and modern times, often not distinguishing between them. It would be helpful if someone could clarify the doubts while keeping modern systems in mind, say Linux.
Thanks.
Technically, the operating system is able to allocate any memory page on access, but there are important reasons why it shouldn't or can't:
different memory regions serve different purposes.
code. It can be read and executed, but shouldn't be written to.
literals (strings, const arrays). This memory is read-only and should be.
the heap. It can be read and written, but not executed.
the thread stack. There is no reason for two threads to access each other's stack, so the OS might as well forbid that. Moreover, the tread stack can be de-allocated when the tread ends.
memory-mapped files. Any changes to this region should affect a specific file. If the file is open for reading, the same memory page may be shared between processes because it's read-only.
the kernel space. Normally the application should not (or can not) access that region - only kernel code can. It's basically a scratch space for the kernel and it's shared between processes. The network buffer may reside there, so that it's always available for writes, no matter when the packet arrives.
...
The OS might assume that all unrecognised memory access is an attempt to allocate more heap space, but:
if an application touches the kernel memory from user code, it must be killed. On 32-bit Windows, all memory above 1<<31 (top bit set) or above 3<<30 (top two bits set) is kernel memory. You should not assume any unallocated memory region is in the user space.
if an application thinks about using a memory region but doesn't tell the OS, the OS may allocate something else to that memory (OS: sure, your file is at 0x12341234; App: but I wanted to store my data there). You could tell the OS by touching the end of your array (which is unreliable anyways), but it's easier to just call an OS function. It's just a good idea that the function call is "give me 10MB of heap", not "give me 10MB of heap starting at 0x12345678"
If the application allocates memory by using it then it typically does not de-allocate at all. This can be problematic as the OS still has to hold the unused pages (but the Java Virtual Machine does not de-allocate either, so hey).
Different runs of a program results in different addresses for variables
This is called memory layout randomisation and is used, alongside of proper permissions (stack space is not executable), to make buffer overflow attacks much more difficult. You can still kill the app, but not execute arbitrary code.
Some links on memory allocation (e.g. in heap).
Do you mean, what algorithm the allocator uses? The easiest algorithm is to always allocate at the soonest available position and link from each memory block to the next and store the flag if it's a free block or used block. More advanced algorithms always allocate blocks at the size of a power of two or a multiple of some fixed size to prevent memory fragmentation (lots of small free blocks) or link the blocks in a different structures to find a free block of sufficient size faster.
An even simpler approach is to never de-allocate and just point to the first (and only) free block and holds its size. If the remaining space is too small, throw it away and ask the OS for a new one.
There's nothing magical about memory allocators. All they do is to:
ask the OS for a large region and
partition it to smaller chunks
without
wasting too much space or
taking too long.
Anyways, the Wikipedia article about memory allocation is http://en.wikipedia.org/wiki/Memory_management .
One interesting algorithm is called "(binary) buddy blocks". It holds several pools of a power-of-two size and splits them recursively into smaller regions. Each region is then either fully allocated, fully free or split in two regions (buddies) that are not both fully free. If it's split, then one byte suffices to hold the size of the largest free block within this block.

Inside Dynamics memory management

i am student and want to know more about the dynamics memory management. For C++, calling operator new() can allocate a memory block under the Heap(Free Store ). In fact, I have not a full picture how to achieve it.
There are a few questions:
1) What is the mechanism that the OS can allocate a memory block?? As I know, there are some basic memory allocation schemes like first-fit, best-fit and worst-fit. Does OS use one of them to allocate memory dynamically under the heap?
2) For different platform like Android, IOS, Window and so on, are they used different memory allocation algorithms to allocate a memory block?
3) For C++, when i call operator new() or malloc(), Does the memory allocator allocate a memory block randomly in the heap?
Hope anyone can help me.
Thanks
malloc is not a system call, it is library (libc) routine which goes through some of its internal structures to give you address of a free piece of memory of the required size. It only does a system call if the process' data segment (i.e. virtual memory it can use) is not "big enough" according to the logic of malloc in question. (On Linux, the system call to enlarge data segment is brk)
Simply said, malloc provides fine-grained memory management, while OS manages coarser, big chunks of memory made available to that process.
Not only different platforms, but also different libraries use different malloc; some programs (e.g. python) use its internal allocator instead as they know its own usage patterns and can increase performance that way.
There is a longthy article about malloc at wikipedia.

How do programs allocate large amounts of memory?

I have 3 questions concerning memory allocation that I thought better to put into one question than 3.
When memory is allocated as I understand, it is allocated on the heap, which is just 16mb. How hen do programs such as video games or modern browsers manage to use over 1GB?
Since it is obviously possible for this much memory to be used, why can it not be allocated at the start? I have found the most I can allocate in High Level Assembly language is around 100MB. This is a lot more than 16MB, and far less than I have 3, so where does this limitation come from?
Why allocate memory in the first place, rather than allocating variables and letting the compiler/system handle it?
When memory is allocated as I understand, it is allocated on the heap,
which is just 16mb. How hen do programs such as video games or modern
browsers manage to use over 1GB?
The heap can grow. It isn't limited to any value and certainly not 16MB. You can easily allocate 1GB of heap, just make a program test and you'll see.
Since it is obviously possible for this much memory to be used, why
can it not be allocated at the start? I have found the most I can
allocate in High Level Assembly language is around 100MB. This is a
lot more than 16MB, and far less than I have 3, so where does this
limitation come from?
I'm not sure why your OS isn't filling larger allocation requests. Perhaps due to memory fragmentation? It's going to be a problem specific to your setup, which you didn't share. I can allocation much more memory than that without an issue.
You can try to use the mmap system call if malloc (which uses the brk system call) is having some sort of issue. Note that for GNU libc, malloc actually uses mmap instead of brk when the allocation is large enough (over 128k I think).
Why allocate memory in the first place, rather than allocating
variables and letting the compiler/system handle it?
Variable must live in memory somewhere. What you are saying is "why manually manage memory? Why can't some algorithm do that for me?". It is actually very common for the compiler and a runtime component to handle allocation/freeing - it's called garbage collection.

In what situations Static Allocation fares better than Dynamic Allocation?

I was going through some of the decisions made to make Xara Xtreme, an open source SVG graphics application. Their memory management decision was quite intriguing to me since I naively took it for granted that on-demand dynamic allocation as the way of writing object oriented application.
The explanation from the documentation is
How on earth can static allocations be efficient?
If you are used to large dynamic data structures, this may seem strange
to you. Firstly, all our objects (and
thus allocation size) are far smaller
(on average) than each dynamic area
allocation within a program such as
Impression. This means that though
there are likely to be many holes
within memory, they are small. Also,
we have far more allocated objects
within memory, and thus these holes
quickly get filled. Furthermore,
virtual memory managers will free up
any pages of memory that contain no
allocations and give this memory back
to the operating system so that it may
be used again (either by us, or by
another task).
We benefit greatly from
the fact that whenever we allocate
memory in this manner, we do not have
to move any memory about. This proved
a bottleneck in ArtWorks which also
had many small allocations being used
concurrently. more
In brief, the presence of plenty of small objects and the need to prevent memory move are the reasons given for choosing static allocation. I don't have clear understanding about the reasons mentioned.
Though this talks about static allocation, what I see from the cursory look at the code is that a block of memory is dynamically allocated at the application start and kept alive till the application ends, roughly simulating static allocation.
Could you explain in what situations Static Allocation fares better than on-demand Dynamic Allocation in order to consider it as the main mode of allocation in a serious applications?
It's quicker because you avoid the overhead of calling a system routine to manage your storage. malloc() maintains a heap, so every request requires a scan for an appropriately-sized block, possibly resizing the block, updating the block list to mark this block as used, etc. If you're allocating a lot of small objects, this overhead can be excessive. With static allocation you can create an allocation pool and just maintain a simple bitmap to show which areas are in use. This assumes that each object is the same size, so you commonly create one pool per object type.
In short, there's really no such thing as static allocation other than the space allocated for your functions themselves and other read-only kinds of memory. (Do an assemble-only "gcc -S" and look for all the memory blocks, if you're interested.) If you're making and breaking objects, you're dynamically allocating. That being said, there's nothing to stop you from tightly controlling the allocation mechanism itself.
That's what functions like mallinfo() and mallopt() do for controlling how malloc() does its magic. However, that might not even be good enough for you. If you know all your chunks are going to be the same size, you can allocate and deallocate much more efficiently. And if you know you have 3 sizes of stuff, you can keep 3 arenas of memory each with their own allocator.
On top of this, you have the situation at runtime where the process doesn't have enough room and needs to ask the os for more - that involves a system call that is more expensive than just incrementing an array index. On unix, it's usually brk() or sbrk() or the like. And that can take valuable time.
Another, rarer situation, would be if you need to multiply-allocate things. Like 3 threads need to share information and only when all 3 release it does it get freed. That's something nonstandard and not generally covered by typical mallopt() or even pthread-specific memory or mutex/semaphore-locked chunks.
So if you have high speed optimization issues or you are running on an embedded system where you need to squeeze all you can out of the available memory, then "static allocation", or at least controlling the allocation mechanism, may be the way to go.

Resources