In what situations Static Allocation fares better than Dynamic Allocation? - memory

I was going through some of the decisions made to make Xara Xtreme, an open source SVG graphics application. Their memory management decision was quite intriguing to me since I naively took it for granted that on-demand dynamic allocation as the way of writing object oriented application.
The explanation from the documentation is
How on earth can static allocations be efficient?
If you are used to large dynamic data structures, this may seem strange
to you. Firstly, all our objects (and
thus allocation size) are far smaller
(on average) than each dynamic area
allocation within a program such as
Impression. This means that though
there are likely to be many holes
within memory, they are small. Also,
we have far more allocated objects
within memory, and thus these holes
quickly get filled. Furthermore,
virtual memory managers will free up
any pages of memory that contain no
allocations and give this memory back
to the operating system so that it may
be used again (either by us, or by
another task).
We benefit greatly from
the fact that whenever we allocate
memory in this manner, we do not have
to move any memory about. This proved
a bottleneck in ArtWorks which also
had many small allocations being used
concurrently. more
In brief, the presence of plenty of small objects and the need to prevent memory move are the reasons given for choosing static allocation. I don't have clear understanding about the reasons mentioned.
Though this talks about static allocation, what I see from the cursory look at the code is that a block of memory is dynamically allocated at the application start and kept alive till the application ends, roughly simulating static allocation.
Could you explain in what situations Static Allocation fares better than on-demand Dynamic Allocation in order to consider it as the main mode of allocation in a serious applications?

It's quicker because you avoid the overhead of calling a system routine to manage your storage. malloc() maintains a heap, so every request requires a scan for an appropriately-sized block, possibly resizing the block, updating the block list to mark this block as used, etc. If you're allocating a lot of small objects, this overhead can be excessive. With static allocation you can create an allocation pool and just maintain a simple bitmap to show which areas are in use. This assumes that each object is the same size, so you commonly create one pool per object type.

In short, there's really no such thing as static allocation other than the space allocated for your functions themselves and other read-only kinds of memory. (Do an assemble-only "gcc -S" and look for all the memory blocks, if you're interested.) If you're making and breaking objects, you're dynamically allocating. That being said, there's nothing to stop you from tightly controlling the allocation mechanism itself.
That's what functions like mallinfo() and mallopt() do for controlling how malloc() does its magic. However, that might not even be good enough for you. If you know all your chunks are going to be the same size, you can allocate and deallocate much more efficiently. And if you know you have 3 sizes of stuff, you can keep 3 arenas of memory each with their own allocator.
On top of this, you have the situation at runtime where the process doesn't have enough room and needs to ask the os for more - that involves a system call that is more expensive than just incrementing an array index. On unix, it's usually brk() or sbrk() or the like. And that can take valuable time.
Another, rarer situation, would be if you need to multiply-allocate things. Like 3 threads need to share information and only when all 3 release it does it get freed. That's something nonstandard and not generally covered by typical mallopt() or even pthread-specific memory or mutex/semaphore-locked chunks.
So if you have high speed optimization issues or you are running on an embedded system where you need to squeeze all you can out of the available memory, then "static allocation", or at least controlling the allocation mechanism, may be the way to go.

Related

Kernel memory management: where do I begin?

I'm a bit of a noob when it comes to kernel programming, and was wondering if anyone could point me in the right direction for beginning the implementation of memory management in a kernel setting. I am currently working on a toy kernel and am doing a lot of research on the subject but I'm a bit confused on the topic of memory management. There are so many different aspects to it like paging and virtual memory mapping. Is there a specific order that I should implement things or any do's and dont's? I'm not looking for any code or anything, I just need to be pointed in the right direction. Any help would be appreciated.
There are multiple aspects that you should consider separately:
Managing the available physical memory.
Managing the memory required by the kernel and it's data structures.
Managing the virtual memory (space) of every process.
Managing the memory required by any process, i.e. malloc and free.
To be able to manage any of the other memory demands you need to know actually how much physical memory you have available and what parts of it are available to your use.
Assuming your kernel is loaded by a multiboot compatible boot loader you'll find this information in the multiboot header that you get passed (in eax on x86 if I remember correctly) from the boot loader.
The header contains a structure describing which memory areas are used and which are free to use.
You also need to store this information somehow, and keep track of what memory is allocated and freed. An easy method to do so is to maintain a bitmap, where bit N indicates whether the (fixed size S) memory area from N * S to (N + 1) * S - 1 is used or free. Of course you probably want to use more sophisticated methods like multilevel bitmaps or free lists as your kernel advances, but a simple bitmap as above can get you started.
This memory manager usually only provides "large" sized memory chunks, usually multiples of 4KB. This is of course of no use for dynamic memory allocation in style of malloc and free that you're used to from applications programming.
Since dynamic memory allocation will greatly ease implementing advanced features of your kernel (multitasking, inter process communication, ...) you usually write a memory manager especially for the kernel. It provides means for allocation (kalloc) and deallocation (kfree) of arbitrary sized memory chunks. This memory is from pool(s) that are allocated using the physical memory manager from above.
All of the above is happening inside the kernel. You probably also want to provide applications means to do dynamic memory allocation. Implementing this is very similar in concept to the management of physical memory as done above:
A process only sees its own virtual address space. Some parts of it are unusable for the process (for example the area where the kernel memory is mapped into), but most of it will be "free to use" (that is, no actually physical memory is associated with it). As a minimum the kernel needs to provide applications means to allocate and free single pages of its memory address space. Allocating a page results (under the hood, invisible to the application) in a call to the physical memory manager, and in a mapping from the requested page to this newly allocated memory.
Note though that many kernels provide its processes either more sophisticated access to their own address space or directly implement some of the following tasks in the kernel.
Being able to allocate and free pages (4KB mostly) as before doesn't help with dynamic memory management, but as before this is usually handled by some other memory manager which is using these large memory chunks as pool to provide smaller chunks to the application. A prominent example is Doug Lea's allocator. Memory managers like these are usually implemented as library (part of the standard library most likely) that is linked to every application.

Mapping and allocating

I am little confused with term mapping, for example, when we say mapping memory for database, it means that we assigning specific amount of memory at some memory location to that database?
Also is allocating memory synonym for reserving memory?
Very often I encounter these two terms, and they aren't so clear to me.
If someone can clarify these two terms, I will be very thankful.
This might be a question better asked to the software community at stackoverflow. However, I am a CS.
I would say that terms aren't always used accurately and precisely.
In general allocating memory is making memory available to a program for an active purpose, such as allocating memory for buffers to hold a file or in in-memory structure now.
Reserving memory is often used to mean the same thing. However, it is sometimes more passive. For example reserving memory in case their is a future requirement, or protecting against too much memory allocation for a different purpose.
Often when the term 'mapping' is used, it is for a file. It may mean exactly the same as allocating. Or it means more; mapping may be using an underlying mechanism provided by virtual memory management systems, where part of virtual memory is 'mapped' to the file, without actually reading the file into physical memory. The trick is, as the memory-mapped file is accessed, the block/page being accessed is read in 'invisibly' to the process when necessary. This uses a mechanism called demand paging. It's benefit is a program can access the file as if it is all read into memory, but only the parts actually accessed are retrieved from the persistent storage system (disk, flash, whatever), which can be a huge win if only small parts of the file are needed.
Further, it simplifies the program, which can be written as if the whole file is in memory. Instead of the application developer trying to keep track of which parts of the file have been loaded into memory, the operating system does that instead.
Even better, the Operating system can be asked to track which blocks/pages have their contents changed, and it can be asked to periodically write that back out to persistent storage. This can even further simplify the application program.
This is popular with some databases.
Mapping basically means assigning. Except we often want a 1 to 1 mapping in the case of functions. If you define the function of an object, physical or just logical, and define it's relationships and how it changes under transformation then you have mapped it.

How to solve memory segmentation and force FastMM to release memory to OS?

Note: 32 bit application, which is not planned to be migrated to 64 bit.
I'm working with a very memory consuming application and have pretty much optimized all the relevant paths in respect to memory allocation/de-allocation. (there are no memory leaks, no handle leaks, no any other kind of leaks in the application itself AFAIK and tested. 3rd party libs which I cannot touch are of course candidates but unlikely in my scenario)
The application will frequently allocate large single and bi-dimensional dynamic arrays of single and packed records of up to 4 singles. By large I mean 5000x5000 of record(single,single,single,single) is normal. Also having even 6 or 7 such arrays in work at a given time. This is needed as there are a lot of cross-computations made on these arrays and having them read from disk would be a real performance killer.
Having this clarified, I am getting out of memory errors a lot because of these large dynamic arrays which will not go away after releasing them, no matter if I setlength them to 0 or finalize them. This is of course something FastMM is doing in order to be fast, I know that much.
I am tracking both FastMM allocated blocks and process consumed memory (RAM + PF) by using:
function CurrentProcessMemory(AWaitForConsistentRead:boolean): Cardinal;
var
MemCounters: TProcessMemoryCounters;
LastRead:Cardinal;
maxCnt:integer;
begin
result := 0;// stupid D2010 compiler warning
maxCnt := 0;
repeat
Inc(maxCnt);
// this is a stabilization loop;
// in tight loops, the system doesn't get
// much chance to release allocated resources, which in turn will get falsely
// reported by this function as still being used, resulting in a false-positive
// memory leak report in the application.
// so we do a tight loop here, waiting, until the application reported memory
// gets stable.
LastRead := result;
MemCounters.cb := SizeOf(MemCounters);
if GetProcessMemoryInfo(GetCurrentProcess,
#MemCounters,
SizeOf(MemCounters)) then
Result := MemCounters.WorkingSetSize + MemCounters.PagefileUsage
else
RaiseLastOSError;
if AWaitForConsistentRead and (LastRead <> 0) and (abs(LastRead - result)>1024) then
begin
sleep(60);
application.processmessages;
end;
until (not AWaitForConsistentRead) or (abs(LastRead - result)<1024) or (maxCnt>1000);
// 60 seconds wait is a bit too much
// so if the system is that "unstable", let's just forget it.
end;
function CurrentFastMMMemory:Cardinal;
var mem:TMemoryManagerUsageSummary;
begin
GetMemoryManagerUsageSummary(mem);
result := mem.AllocatedBytes + mem.OverheadBytes;
end;
I am running the code on a 64bit computer and my top memory consumption before crashes is about 3.3 - 3.4 GB. After that, I get memory/resources related crashes anywhere in the application. Took me some time to pin it down on the large dynamic arrays usage which were buried down in some 3rd party library.
The way I am getting over this is that I made the application resume itself from where it left off, by re-starting itself and closing with certain parameters.
This is all nice and dandy if memory consumption is fair and current operation finishes.
The big problem happens when the current memory usage is 1GB and the next operation to process requires 2.5 GB memory or more to be processed. My current code limited itself to an upper value of 1.5 GB used memory before resuming, but in this situation, I'd have to drop the limit down under 1 GB which would basically have the application resume itself after each operation and not even that guaranteeing that everything will be fine.
What if another operation will have a larger data set to process and it will require a total of 4GB or more memory?
To note that I am not talking about actual 4 GB in memory, but consumed memory by allocating huge dynamic arrays which the OS doesn't get back once de-allocated and hence it still sees it as consumed, so it adds up.
So, my next point of attack is to force fastmm to release all (or at least part of) memory to the OS. I'm specifically targeting the huge dynamic arrays here. Again, these are in a 3rd party library so re-coding that is not really in the top options. It's much easier and faster to tinker in the fastmm code and write a proc to release the memory.
I can't switch from FastMM as currently the entire application and some of the 3rd party libs are heavily coded around the use of PushAllocationGroup in order to quickly find and pinpoint any memory leaks. I know I can write a dummy FastMM unit to solve the compilation references, but I will be left without this quick and certain leak detection.
In conclusion: is there any way I can force FastMM to release at least some of it's large blocks to the OS? (well, sure there is, the actual question is: did anybody write it and if so, mind sharing?)
Thanks
later edit:
I will come up with a small relevant test application soon. It doesn't appear to be that easy to mock up one
I doubt that the issue is actually down to FastMM. For huge memory blocks, FastMM will not do any sub-allocation. Your allocation request will be handled with a straight VirtualAlloc. And then deallocation is VirtualFree.
That's assuming that you are allocating those 380MB objects in one contiguous block. I suspect that what you actually have are ragged 2D dynamic arrays. And they are not single allocations. a 5000x5000 ragged 2D dynamic arrays takes 5001 allocations to initialise. One for the row pointers, and 5000 for the rows. Those will be medium FastMM blocks. There will be sub-allocation.
I think you are asking too much. In my experience, any time you need over 3GB of memory in a 32 bit process, it's game over. Fragmentation of address space will stop you before you run out of memory. You cannot hope for this to work. Switch to 64 bit, or use a cleverer, less demanding allocation pattern. Or do you really need dense 2D arrays? Can you use sparse storage?
If you cannot alleviate your memory demands that way, you could use memory mapped files. This would allow you to make use of the extra memory that your 64 bit system has. The system's disk cache can be larger than 4GB and so your app can traverse more than 4GB of memory without actually needing to hit the disk.
You could certainly try different memory managers. I honestly do not hold out any hope that it would help. You could write a trivial replacement memory manager that used HeapAlloc. And enable the low fragmentation heap (enabled by default from Vista on). But I sincerely doubt that it will help. I'm afraid that there won't be a quick fix for you. To resolve this you face a more fundamental modification to your code.
Your issue as others have said is most likely attributable to memory fragmentation. You could test this by using VirtualQuery to create a picture of how memory is allocated to your application. You will very likely find that although you may have more than enough total memory for a new array, you don't have enough contiguous memory.
FastMem already does a lot to try and avoid problems due to memory fragmentation. "Small" allocations are done at the low end of the address space, whereas "large" allocations are done at the high end. This avoids a common problem where a series of large then small allocations followed by all large allocations being released results in a large amount of fragmented memory that is almost unusable. (Certainly unusable by anything slightly larger than the original large allocations.)
To see the benfits of FastMem's approach, imagine your memory layed out as follows:
Each digit represent a 100mb block.
[0123456789012345678901234567890123456789]
Small allocations represented by "s".
Large allocations repestented by capital letters.
[0sssss678901GGGGFFFFEEEEDDDDCCCCBBBBAAAA]
Now if you free all your large blocks, you should have no trouble performing similar large allocations later.
[0sssss6789012345678901234567890123456789]
The problem is that "large" and "small" are relative, and highly dependent on the nature of your application. FastMem defines a dividing line between "large" and "small". If you happen to have some small allocations that FastMem would classify as large, you may encounter the following problem.
[0sss4sGGGGsFFFFsEEEEsDDDDsCCCCsBBBBsAAAA]
Now if you free the large blocks you're left with:
[0sss4s6789s1234s6789s1234s6789s1234s6789]
And an attempt to allocate something larger than 400mb will fail.
Options
You may be able to tweak the FastMem settings so that all your "small" allocations are also considered small by FastMem. However, there are a few situations where this won't work:
Any DLLs you use that allocate memory to your application but bypass FastMem may still cause fragmentation.
If you don't release all your large blocks together, those that remain may induce fragmentation which will slowly get worse over time.
You could take on the task of memory management yourself.
Allocate one very large block e.g. 3.5GB which you keep for the entire lifetime of the application.
Instead of using dynamic arrays, you determine the pointer locations to use when setting up a new array.
Of course the simplest alternative would be to go 64-bit.
You could consider alternate data structures.
Do you really need array lookup capability? If not, another structure that allocates in smaller chunks may suffice.
Even if you do need array lookup, consider a paged array. Sparse arrays are a combination of arrays and linked lists. Data is stored on pages, with linked lists chaining each page.
A simple variant (since you mentioned your arrays are 2 dimensional) would be to leverage that: One dimension forms its own array providing a lookup into one of multiple arrays for the second dimension.
Related to the alternate data structures option, consider storing some data on disk. Yes performance will be slower. But if an efficient caching mechanism can be found, then maybe not so much. It would be better to be a little slower, but not crashing.
Dynamic arrays are reference counted in Delphi, so they should be automatic released when they are not used anymore.
Like strings, they are handled with COW (copy on write) when shared/stored in several variables/objects. So it seems you have some kind of memory/reference leak (e.g. an object in memory that holds still are reference to an array).
Just to be sure: you are not doing any kind of low level pointer tricks, aren't you?
So please yes, post a test program (or send the complete program private via email) so one of us can take a look at it.

Xcode Instrument : Memory Terms Live Bytes and Overall Bytes (Real Memory) confusion

I am working on a Browser application in which I use a UIWebView for opening web pages. I run the Instruments tool with Memory Monitor. I am totally confused by the terms which are used in Instruments and why they're important. Please explain some of my questions with proper reasons:
Live Bytes is important for checking memory optimization or memory consumption? Why ?
Why would I care about the Overall Bytes/ Real Memory, if it contains also released objects?
When and why are these terms used (Live Bytes/ Overall Bytes/Real Memory)?
Thanks
"Live Bytes" means "memory which has been allocated, but not yet deallocated." It's important because it's the most easily graspable measure of "how much memory your app is using."
"Overall Bytes" means "all memory which has ever been allocated including memory that has been deallocated." This is less useful, but gives you some idea of "heap churn." Churn leads to fragmentation, and heap fragmentation can be a problem (albeit a pretty obscure one these days.)
"Real Memory" is an attempt to distinguish how much physical RAM is in use (as opposed to how many bytes of address space are valid). This is different from "Live Bytes" because "Live Bytes" could include ranges of memory that correspond to memory-mapped files (or shared memory, or window backing stores, or whatever) that are not currently paged into physical RAM. Even if you don't use memory-mapped files or other exotic VM allocation methods, the system frameworks do, and you use them, so this distinction will always have some importance to every process.
EDIT: Since you're clearly concerned about memory use incurred by using UIWebView, let me see if I can shed some light on that:
There is a certain memory "price" to using UIWebView at all (i.e. global caches and the like). These include various global font caches, JavaScript JIT caches, and stuff like that. Most of these are going to behave like singletons: allocated the first time you use them (indirectly by using UIWebView) and never deallocated until the process ends. There are also some variable size global caches (like those that cache web responses; CFURL typically manages these) but those are expected to be managed by the system. The collective "weight" of these things with respect to UIWebView is, as you've seen, non-trivial.
I don't have any knowledge of UIKit or WebKit internals, but I would expect that if you had a discussion with someone who did, their response to the question of "Why is my use of UIWebView causing so much memory use?" would be two pronged: The first prong would be "this is the price of admission for using UIWebView -- it's basically like running a whole web browser in your process." The second prong would be "system framework caches are automatically managed by the system" by which they would mean that, for instance, the CFURL caches (which is one of the things that using UIWebView causes to be created) are managed by the system, so if a memory warning came in, the system frameworks would be responsible for evicting things from those caches to reduce the memory consumed by them; you have no control over those, and you just have to trust that the system frameworks will do what needs to be done. (That doesn't help you in the case where whatever the system cache managers do isn't aggressive enough for you, but you're not going to get any more control over them, so you need to attack the issue from another angle, either way.) If you're wondering why the memory use doesn't go down once you deallocate your UIWebView, this is your answer. There's a bunch of stuff it's doing behind the scenes, that you can't control.
The expectation that allocating, using, and then deallocating a UIWebView is a net-zero operation ignores some non-trivial, inherent and unavoidable side-effects. The existence of such side-effects is not (in and of itself) indicative of a bug in UIWebView. There are side effects like this all over the place. If you were to create a trivial application that did nothing but launch and then terminate after one spin of the run loop, and you set a breakpoint on exit(), and looked at the memory that had been allocated and never freed, there would be thousands of allocations. This is a very common pattern used throughout the system frameworks and in virtually every application.
What does this mean for you? It means that you effectively have two choices: Use UIWebView and pay the "price of admission" in memory consumption, or don't use UIWebView.

What is meaning of small footprint in terms of programming?

I heard many libraries such as JXTA and PjSIP have smaller footprints. Is this pointing to small resource consumption or something else?
Footprint designates the size occupied by your application in computer RAM memory.
Footprint can have different meaning when speaking about memory consumption.
In my experience, memory footprint often doesn't include memory allocated on the heap (dynamic memory), or resource loaded from disc etc. This is because dynamic allocations are non constant and may vary depending on how the application or module is used. When reporting "low footprint" or "high footprint", a constant or top measure of the required space is usually wanted.
If for example including dynamic memory in the footprint report of an image editor, the footprint would entirely depend on the size of the image loaded into the application by the user.
In the context of a third party library, the library author can optimize the static memory footprint of the library by assuring that you never link more code into your application binary than absolutely needed. A common method used for doing this in for instance C, is to distribute library functions to separate c-files. This is because most C linkers will link all code from a c-file into your application, not just the function you call. So if you put a single function in the c-file, that's all the linker will incoporate into your application when calling it. If you put five functions in the c-file the linker will probably link all of them into your app even if you only use one of them.
All this being said, the general (academic) definition of footprint includes all kinds of memory/storage aspects.
From Wikipedia Memory footprint article:
Memory footprint refers to the amount of main memory that a program uses or references while running.
This includes all sorts of active memory regions like code segment containing (mostly) program instructions (and occasionally constants), data segment (both initialized and uninitialized), heap memory, call stack, plus memory required to hold any additional data structures, such as symbol tables, debugging data structures, open files, shared libraries mapped to the current process, etc., that the program ever needs while executing and will be loaded at least once during the entire run.
Generally it's the amount of memory it takes up - the 'footprint' it leaves in memory when running. However it can also refer to how much space it takes up on your harddrive - although these days that's less of an issue.
If you're writing an app and have memory limitations, consider running a profiler to keep track of how much your program is using.
It does refer to resources. Particularly memory. It requires a smaller amount of memory when running.
yes, resources such as memory or disk
Footprint in Computing i-e for computer programs or Computer machines is referred as the occupied device memory , for a program , process , code ,etc

Resources