How to convince the memory manager to release unused memory

How to convince the memory manager to release unused memory - delphi

In a recent post ( My program never releases the memory back. Why? ) I show that when using FastMM, the application does not release substantial amounts of memory back to the system.
Recently I created an artificial test program to make sure the issue it is not a memory and that it only appears with FastMM.
In this program I create and destroy an object (same as the one used in the previous post) 500 times.
The memory requirements are ("Private working set"):
Without FastMM
Before running the loop: 1.2MB
After running the loop: 2.1MB
With FastMM (aggressive debug mode)
Before running the loop: 2.1MB
After running the loop: 25MB
With FastMM (release mode)
Before running the loop: 1.8MB
After running the loop: 3MB
If I run the loop several times, the memory requirement does not increase. Which means that the unreleased memory is re-used so this is not a memory leak (a memory leak would increase the memory footprint with several KB/MB at each run).
My questions are:
How can I disable this behavior in FastMM? Is it even possible? I know, if I release the program without FastMM or with FastMM Release Mode it will "waste" moderate amounts of RAM. But disabling this behavior on demand, will help me (us?) identifying memory leaks. Actually in my first post (see link) many people suggested that I have a leak. The confusion was created obviously just because of this behavior. No, it is obvious there is no leak. It is just the memory manager that refuses to release large amounts of memory.
It will ever release the extra memory? When? What triggers this? Can the programmer trigger it? For example when I know that I have finished a RAM-intensive task and the user may not use the program for a while (minimize it), can I flush the RAM back to the system? What happens when the user open multiple instances of my program? Won't they compete for RAM?

You shouldn't think about it as "wasting" RAM, really. Think about it as "caching" unused RAM. The memory manager is holding onto the unused memory instead of releasing it back to the OS for a reason, and in fact you've hit upon that reason in your question.
You said that you keep re-running the same operations in a loop. When you do that, it still has the old memory available and it can assign it immediately, instead of having to ask Windows for a new chunk of heap. This is one of the tricks that puts the "Fast" in "FastMM," and if it didn't do that you'd find your program running a lot more slowly.
You don't need to worry about the FastMM debug mode figure. That's only for debugging, and you're not going to release a program compiled against FullDebugMode. And the difference between "without FastMM" and "with FastMM Release Mode" is about 1 MB, which is negligible on modern hardware. For the low cost of only 1 extra MB, you get a big performance boost. So don't worry about it.

Part of what makes FastMM fast is that it will allocate a large block of memory and carve smaller uniformly sized pieces out of it. If any part of the block is in use, none of it can be released back to the OS.
You're welcome to use a different memory manager. One approach would be to route all allocations directly to VirtualAlloc. Allocations will be rounded up to occupy an entire page at a time, so your program may suffer if you have lots of small allocations, but when you call VirtualFree, you can be confident that the memory definitely doesn't belong to your program anymore.
Another option is to route everything to the OS heap. Use HeapAlloc. You can even enable the low-fragmentation heap for your program (on by default as of Windows Vista), which will make the OS employ a strategy similar to the one used by FastMM, but it will allow you to use some debugging and analysis tools from Microsoft to track your program's memory usage over time. Beware, though, that after you call HeapFree, some metrics might still show the memory as belonging to your program.
Besides, the working set refers to the memory that's currently in physical RAM. That you observed the number go up does not mean that your program has allocated any more memory. It can simply mean that your program touched some memory that it had previously allocated, but which had not yet been put into RAM. During your loop, you touched that memory, and the OS has not decided to page it back out to disk yet.

I use the following as a memory manager. I do so because it performs much better under thread contention than FastMM which is actually rather poor. I know that a scalable manager such as Hoard would be better, but this is works fine for my needs.
unit msvcrtMM;
interface
implementation
type
size_t = Cardinal;
const
msvcrtDLL = 'msvcrt.dll';
function malloc(Size: size_t): Pointer; cdecl; external msvcrtDLL;
function realloc(P: Pointer; Size: size_t): Pointer; cdecl; external msvcrtDLL;
procedure free(P: Pointer); cdecl; external msvcrtDLL;
function GetMem(Size: Integer): Pointer;
begin
Result := malloc(size);
end;
function FreeMem(P: Pointer): Integer;
begin
free(P);
Result := 0;
end;
function ReallocMem(P: Pointer; Size: Integer): Pointer;
begin
Result := realloc(P, Size);
end;
function AllocMem(Size: Cardinal): Pointer;
begin
Result := GetMem(Size);
if Assigned(Result) then begin
FillChar(Result^, Size, 0);
end;
end;
function RegisterUnregisterExpectedMemoryLeak(P: Pointer): Boolean;
begin
Result := False;
end;
const
MemoryManager: TMemoryManagerEx = (
GetMem: GetMem;
FreeMem: FreeMem;
ReallocMem: ReallocMem;
AllocMem: AllocMem;
RegisterExpectedMemoryLeak: RegisterUnregisterExpectedMemoryLeak;
UnregisterExpectedMemoryLeak: RegisterUnregisterExpectedMemoryLeak
);
initialization
SetMemoryManager(MemoryManager);
end.
This isn't an answer to your question, but it's too long to fit into a comment and you may find it interesting to run your app against this MM. My guess is that it will perform the same way as FastMM.

SOLVED
As suggested by Barry Kelly the memory will be released automatically by FastaMM.
To confirm that this I crated a second program that allocated A LOT of RAM. As soon as Windows ran out of RAM, my program memory utilization returned to its original value.
Problem solved.
Thanks Barry.

Related

How to Free Memory when Out-of-memory exception occurs in Delphi using SetLength

I have a piece of Delphi code
var
a: array of array of array of integer;
begin
try
SetLength(a, 100000, 100000, 10000); // out of memory here
doStuffs(a);
except
a = nil; // try to free the memory
end;
end;
The above code tries to allocate lots of memory and out-of-memory will be caught. The a=nil will be executed, but the memory isn't freed.
Is there a way to free the memory in the case of out-of-memory exception?
I tried SetLength(a, 0, 0, 0) and Finalize(a), and both won't work either.

In general, it's not possible to recover from an out of memory error. At that point the heap is most likely corrupted. The appropriate response is to terminate the process.
In this specific case, the allocation is performed by DynArraySetLength in the System unit. This performs repeated allocations. Only as the last act of DynArraySetLength is the return value, a in your code above, actually assigned. And if errors occur in DynArraySetLength then the runtime makes no effort to tidy up. Which means that in case of failure, any memory allocated is leaked and cannot be recovered. You have no way to refer to it in order to free it.
You may think that DynArraySetLength should do more to tidy up. However, it's approach is justifiable. Since out of memory conditions invariably result in corrupt heap, attempts to tidy up would just prolong the agony. Once the heap is dead, there's no point in trying to deallocate memory.

Need multi-threading memory manager

I will have to create a multi-threading project soon I have seen experiments ( delphitools.info/2011/10/13/memory-manager-investigations ) showing that the default Delphi memory manager has problems with multi-threading.
So, I have found this SynScaleMM. Anybody can give some feedback on it or on a similar memory manager?
Thanks

Our SynScaleMM is still experimental.
EDIT: Take a look at the more stable ScaleMM2 and the brand new SAPMM. But my remarks below are still worth following: the less allocation you do, the better you scale!
But it worked as expected in a multi-threaded server environment. Scaling is much better than FastMM4, for some critical tests.
But the Memory Manager is perhaps not the bigger bottleneck in Multi-Threaded applications. FastMM4 could work well, if you don't stress it.
Here are some (not dogmatic, just from experiment and knowledge of low-level Delphi RTL) advice if you want to write FAST multi-threaded application in Delphi:
Always use const for string or dynamic array parameters like in MyFunc(const aString: String) to avoid allocating a temporary string per each call;
Avoid using string concatenation (s := s+'Blabla'+IntToStr(i)) , but rely on a buffered writing such as TStringBuilder available in latest versions of Delphi;
TStringBuilder is not perfect either: for instance, it will create a lot of temporary strings for appending some numerical data, and will use the awfully slow SysUtils.IntToStr() function when you add some integer value - I had to rewrite a lot of low-level functions to avoid most string allocation in our TTextWriter class as defined in SynCommons.pas;
Don't abuse on critical sections, let them be as small as possible, but rely on some atomic modifiers if you need some concurrent access - see e.g. InterlockedIncrement / InterlockedExchangeAdd;
InterlockedExchange (from SysUtils.pas) is a good way of updating a buffer or a shared object. You create an updated version of of some content in your thread, then you exchange a shared pointer to the data (e.g. a TObject instance) in one low-level CPU operation. It will notify the change to the other threads, with very good multi-thread scaling. You'll have to take care of the data integrity, but it works very well in practice.
Don't share data between threads, but rather make your own private copy or rely on some read-only buffers (the RCU pattern is the better for scaling);
Don't use indexed access to string characters, but rely on some optimized functions like PosEx() for instance;
Don't mix AnsiString/UnicodeString kind of variables/functions, and check the generated asm code via Alt-F2 to track any hidden unwanted conversion (e.g. call UStrFromPCharLen);
Rather use var parameters in a procedure instead of function returning a string (a function returning a string will add an UStrAsg/LStrAsg call which has a LOCK which will flush all CPU cores);
If you can, for your data or text parsing, use pointers and some static stack-allocated buffers instead of temporary strings or dynamic arrays;
Don't create a TMemoryStream each time you need one, but rely on a private instance in your class, already sized in enough memory, in which you will write data using Position to retrieve the end of data and not changing its Size (which will be the memory block allocated by the MM);
Limit the number of class instances you create: try to reuse the same instance, and if you can, use some record/object pointers on already allocated memory buffers, mapping the data without copying it into temporary memory;
Always use test-driven development, with dedicated multi-threaded test, trying to reach the worse-case limit (increase number of threads, data content, add some incoherent data, pause at random, try to stress network or disk access, benchmark with timing on real data...);
Never trust your instinct, but use accurate timing on real data and process.
I tried to follow those rules in our Open Source framework, and if you take a look at our code, you'll find out a lot of real-world sample code.

If your app can accommodate GPL licensed code, then I'd recommend Hoard. You'll have to write your own wrapper to it but that is very easy. In my tests, I found nothing that matched this code. If your code cannot accommodate the GPL then you can obtain a commercial licence of Hoard, for a significant fee.
Even if you can't use Hoard in an external release of your code you could compare its performance with that of FastMM to determine whether or not your app has problems with heap allocation scalability.
I have also found that the memory allocators in the versions of msvcrt.dll distributed with Windows Vista and later scale quite well under thread contention, certainly much better than FastMM does. I use these routines via the following Delphi MM.
unit msvcrtMM;
interface
implementation
type
size_t = Cardinal;
const
msvcrtDLL = 'msvcrt.dll';
function malloc(Size: size_t): Pointer; cdecl; external msvcrtDLL;
function realloc(P: Pointer; Size: size_t): Pointer; cdecl; external msvcrtDLL;
procedure free(P: Pointer); cdecl; external msvcrtDLL;
function GetMem(Size: Integer): Pointer;
begin
Result := malloc(size);
end;
function FreeMem(P: Pointer): Integer;
begin
free(P);
Result := 0;
end;
function ReallocMem(P: Pointer; Size: Integer): Pointer;
begin
Result := realloc(P, Size);
end;
function AllocMem(Size: Cardinal): Pointer;
begin
Result := GetMem(Size);
if Assigned(Result) then begin
FillChar(Result^, Size, 0);
end;
end;
function RegisterUnregisterExpectedMemoryLeak(P: Pointer): Boolean;
begin
Result := False;
end;
const
MemoryManager: TMemoryManagerEx = (
GetMem: GetMem;
FreeMem: FreeMem;
ReallocMem: ReallocMem;
AllocMem: AllocMem;
RegisterExpectedMemoryLeak: RegisterUnregisterExpectedMemoryLeak;
UnregisterExpectedMemoryLeak: RegisterUnregisterExpectedMemoryLeak
);
initialization
SetMemoryManager(MemoryManager);
end.
It is worth pointing out that your app has to be hammering the heap allocator quite hard before thread contention in FastMM becomes a hindrance to performance. Typically in my experience this happens when your app does a lot of string processing.
My main piece of advice for anyone suffering from thread contention on heap allocation is to re-work the code to avoid hitting the heap. Not only do you avoid the contention, but you also avoid the expense of heap allocation – a classic twofer!

It is locking that makes the difference!
There are two issues to be aware of:
Use of the LOCK prefix by the Delphi itself (System.dcu);
How does FastMM4 handles thread contention and what it does after it failed to acquire a lock.
Use of the LOCK prefix by the Delphi itself
Borland Delphi 5, released in 1999, was the one that introduced the lock prefix in string operations. As you know, when you assign one string to another, it does not copy the whole string but merely increases the reference counter inside the string. If you modify the string, it is de-references, decreasing the reference counter and allocating separate space for the modified string.
In Delphi 4 and earlier, the operations to increase and decrease the reference counter were normal memory operations. The programmers that have used Delphi knew about and, and, if they were using strings across threads, i.e. pass a string from one thread to another, have used their own locking mechanism only for the relevant strings. Programmers did also use read-only string copy that did not modify in any way the source string and did not require locking, for example:
function AssignStringThreadSafe(const Src: string): string;
var
L: Integer;
begin
L := Length(Src);
if L <= 0 then Result := '' else
begin
SetString(Result, nil, L);
Move(PChar(Src)^, PChar(Result)^, L*SizeOf(Src[1]));
end;
end;
But in Delphi 5, Borland have added the LOCK prefix to the string operations and they became very slow, compared to Delphi 4, even for single-threaded applications.
To overcome this slowness, programmers became to use "single threaded" SYSTEM.PAS patch files with lock's commented.
Please see https://synopse.info/forum/viewtopic.php?id=57&p=1 for more information.
FastMM4 Thread Contention
You can modify FastMM4 source code for a better locking mechanism, or use any existing FastMM4 fork, for example https://github.com/maximmasiutin/FastMM4
FastMM4 is not the fastest one for multicore operation, especially when the number of threads is more than the number of physical sockets is because it, by default, on thread contention (i.e. when one thread cannot acquire access to data, locked by another thread) calls Windows API function Sleep(0), and then, if the lock is still not available enters a loop by calling Sleep(1) after each check of the lock.
Each call to Sleep(0) experiences the expensive cost of a context switch, which can be 10000+ cycles; it also suffers the cost of ring 3 to ring 0 transitions, which can be 1000+ cycles. As about Sleep(1) – besides the costs associated with Sleep(0) – it also delays execution by at least 1 millisecond, ceding control to other threads, and, if there are no threads waiting to be executed by a physical CPU core, puts the core into sleep, effectively reducing CPU usage and power consumption.
That’s why, on multithreded wotk with FastMM, CPU use never reached 100% - because of the Sleep(1) issued by FastMM4. This way of acquiring locks is not optimal. A better way would have been a spin-lock of about 5000 pause instructions, and, if the lock was still busy, calling SwitchToThread() API call. If pause is not available (on very old processors with no SSE2 support) or SwitchToThread() API call was not available (on very old Windows versions, prior to Windows 2000), the best solution would be to utilize EnterCriticalSection/LeaveCriticalSection, that don’t have latency associated by Sleep(1), and which also very effectively cedes control of the CPU core to other threads.
The fork that I've mentioned uses a new approach to waiting for a lock, recommended by Intel in its Optimization Manual for developers - a spinloop of pause + SwitchToThread(), and, if any of these are not available: CriticalSections instead of Sleep(). With these options, the Sleep() will never be used but EnterCriticalSection/LeaveCriticalSection will be used instead. Testing has shown that the approach of using CriticalSections instead of Sleep (which was used by default before in FastMM4) provides significant gain in situations when the number of threads working with the memory manager is the same or higher than the number of physical cores. The gain is even more evident on computers with multiple physical CPUs and Non-Uniform Memory Access (NUMA). I have implemented compile-time options to take away the original FastMM4 approach of using Sleep(InitialSleepTime) and then Sleep(AdditionalSleepTime) (or Sleep(0) and Sleep(1)) and replace them with EnterCriticalSection/LeaveCriticalSection to save valuable CPU cycles wasted by Sleep(0) and to improve speed (reduce latency) that was affected each time by at least 1 millisecond by Sleep(1), because the Critical Sections are much more CPU-friendly and have definitely lower latency than Sleep(1).
When these options are enabled, FastMM4-AVX it checks: (1) whether the CPU supports SSE2 and thus the "pause" instruction, and (2) whether the operating system has the SwitchToThread() API call, and, if both conditions are met, uses "pause" spin-loop for 5000 iterations and then SwitchToThread() instead of critical sections; If a CPU doesn't have the "pause" instrcution or Windows doesn't have the SwitchToThread() API function, it will use EnterCriticalSection/LeaveCriticalSection.
You can see the test results, including made on a computer with multiple physical CPUs (sockets) in that fork.
See also the Long Duration Spin-wait Loops on Hyper-Threading Technology Enabled Intel Processors article. Here is what Intel writes about this issue - and it applies to FastMM4 very well:
The long duration spin-wait loop in this threading model seldom causes a performance problem on conventional multiprocessor systems. But it may introduce a severe penalty on a system with Hyper-Threading Technology because processor resources can be consumed by the master thread while it is waiting on the worker threads. Sleep(0) in the loop may suspend the execution of the master thread, but only when all available processors have been taken by worker threads during the entire waiting period. This condition requires all worker threads to complete their work at the same time. In other words, the workloads assigned to worker threads must be balanced. If one of the worker threads completes its work sooner than others and releases the processor, the master thread can still run on one processor.
On a conventional multiprocessor system this doesn't cause performance problems because no other thread uses the processor. But on a system with Hyper-Threading Technology the processor the master thread runs on is a logical one that shares processor resources with one of the other worker threads.
The nature of many applications makes it difficult to guarantee that workloads assigned to worker threads are balanced. A multithreaded 3D application, for example, may assign the tasks for transformation of a block of vertices from world coordinates to viewing coordinates to a team of worker threads. The amount of work for a worker thread is determined not only by the number of vertices but also by the clipped status of the vertex, which is not predictable when the master thread divides the workload for working threads.
A non-zero argument in the Sleep function forces the waiting thread to sleep N milliseconds, regardless of the processor availability. It may effectively block the waiting thread from consuming processor resources if the waiting period is set properly. But if the waiting period is unpredictable from workload to workload, then a large value of N may make the waiting thread sleep too long, and a smaller value of N may cause it to wake up too quickly.
Therefore the preferred solution to avoid wasting processor resources in a long duration spin-wait loop is to replace the loop with an operating system thread-blocking API, such as the Microsoft Windows* threading API,
WaitForMultipleObjects. This call causes the operating system to block the waiting thread from consuming processor resources.
It refers to Using Spin-Loops on Intel Pentium 4 Processor and Intel Xeon Processor application note.
You can also find a very good spin-loop implementation here at stackoverflow.
It also loads normal loads just to check before issuing a lock-ed store, just to not flood the CPU with locked operations in a loop, that would lock the bus.
FastMM4 per se is very good. Just improve the locking and you will get an excelling multi-threaded memory manager.
Please also be aware that each small block type is locked separately in FastMM4.
You can put padding between the small block control areas, to make each area have own cache line, not shared with other block sizes, and to make sure it begins at a cache line size boundary. You can use CPUID to determine the size of the CPU cache line.
So, with locking correctly implemented to suit your needs (i.e. whether you need NUMA or not, whether to use lock-ing releases, etc., you may obtain the results that the memory allocation routines would be several times faster and would not suffer so severely from thread contention.

FastMM deals with multi-threading just fine. It is the default memory manager for Delphi 2006 and up.
If you are using an older version of Delphi (Delphi 5 and up), you can still use FastMM. It's available on SourceForge.

You could use TopMM:
http://www.topsoftwaresite.nl/
You could also try ScaleMM2 (SynScaleMM is based on ScaleMM1) but I have to fix a bug regarding to interthread memory, so not production ready yet :-(
http://code.google.com/p/scalemm/

Deplhi 6 memory manager is outdated and outright bad. We were using RecyclerMM both on a high-load production server and on a multi-threaded desktop application, and we had no issues with it: it's fast, reliable and doesn't cause excess fragmentation. (Fragmentation was Delphi's stock memory manager worst issue).
The only drawback of RecyclerMM is that it isn't compatible with MemCheck out of the box. However, a small source alteration was enough to render it compatible.

FastMM4, Delphi6, Leak of TApplication?

I checked the FastMM4 with D6.
When I debug a simple application with uses "Forms", I everytime got 3 lines for memory leak.
This application has leaked memory.
The small block leaks are (excluding
expected leaks registered by pointer):
13 - 20 bytes: TObjectList x 3,
Unknown x 3 29 - 36 bytes:
TWinHelpViewer x 1 37 - 52 bytes:
THelpManager x 1
Is this normal?
Which thing causes this?
Thanks:
dd

The RTL/VCL that ships with Delphi 6 contains some memory leaks. In later releases of Delphi the use of FastMM led to these memory leaks being removed from the RTL/VCL.
What you need to do is register these known and expected memory leaks with FastMM. Once you have registered the leaks that FastMM won't report them. Although these leaks are real, they are best ignored for various reasons:
The leaked memory from these known VCL leaks is tiny and doesn't grow during the lifetime of the process.
The memory returned to the system as soon as the process terminates anyway.
Since the leaks are in code beyond your control, there's not a huge amount you can do. You could fix them and use your own version of the VCL units in question, but is it worth it?
The only time these leaks could matter is if you had a DLL which was loaded and unloaded from the same process thousands of times during the lifetime of that process. I don't believe this is a very realistic scenario.
If you don't register the leaks then the FastMM leak reporting becomes largely ineffective because it shows every time. If it shows every time you learn to ignore it. This leak reporting is very valuable, but it is only valuable if it shows leaks that you have some control over.
In my Delphi 6 project I have the following code in my .dpr file:
// Register expected VCL memory leaks caused by Delphi unit HelpIntfs.
FastMM4.RegisterExpectedMemoryLeak(36, 2); // THelpManager x 1, THTMLHelpViewer x 1
FastMM4.RegisterExpectedMemoryLeak(20, 7); // TObjectList x 3, THelpSelector x 1, Unknown x 3
FastMM4.RegisterExpectedMemoryLeak(52); // TWinHelpViewer x 1
I also have the following in a TForm descendant from which all forms in my app descend:
var
ExpectedHelpStringMemoryLeakRegistered: Boolean;
procedure TMyForm.WMHelp(var Message: TWMHelp);
begin
if not (biHelp in BorderIcons) and not ExpectedHelpStringMemoryLeakRegistered then begin
// Register expected VCL memory leaks caused by Delphi unit HelpIntfs.
FastMM4.RegisterExpectedMemoryLeak(44); // TString x 1
ExpectedHelpStringMemoryLeakRegistered := True;
end;
inherited;
end;
Depending on exactly which units you use in the RTL/VCL and how you use them, you may need to register different memory leaks.

I would guess it is normal unless you patched the sources. IIRC, when there was 'memproof', its author 'Atanas Stoyanov' kept a list of bugs that caused memory leaks. The leak in the 'classes.pas', f.i., effected every VCL forms application. Though the product does not exist anymore, you can find the list at 'Automated QA's site. Here's the list for D6.

Why doesn't my program's memory usage return to normal after I free memory?

consider the next sample application
program TestMemory;
{$APPTYPE CONSOLE}
uses
PsAPI,
Windows,
SysUtils;
function GetUsedMemoryFastMem: cardinal;
var
st: TMemoryManagerState;
sb: TSmallBlockTypeState;
begin
GetMemoryManagerState(st);
result := st.TotalAllocatedMediumBlockSize + st.TotalAllocatedLargeBlockSize;
for sb in st.SmallBlockTypeStates do
begin
result := result + sb.UseableBlockSize * sb.AllocatedBlockCount;
end;
end;
function GetUsedMemoryWindows: longint;
var
ProcessMemoryCounters: TProcessMemoryCounters;
begin
Result:=0;
ProcessMemoryCounters.cb := SizeOf(TProcessMemoryCounters);
if GetProcessMemoryInfo(GetCurrentProcess(), #ProcessMemoryCounters, ProcessMemoryCounters.cb) then
Result:= ProcessMemoryCounters.WorkingSetSize
else
RaiseLastOSError;
end;
procedure Test;
const
Size = 1024*1024;
var
P : Pointer;
begin
GetMem(P,Size);
Writeln('Inside');
Writeln('FastMem '+FormatFloat('#,', GetUsedMemoryFastMem));
Writeln('Windows '+FormatFloat('#,', GetUsedMemoryWindows));
Writeln('');
FreeMem(P);
end;
begin
Writeln('Before');
Writeln('FastMem '+FormatFloat('#,', GetUsedMemoryFastMem));
Writeln('Windows '+FormatFloat('#,', GetUsedMemoryWindows));
Writeln('');
Test;
Writeln('After');
Writeln('FastMem '+FormatFloat('#,', GetUsedMemoryFastMem));
Writeln('Windows '+FormatFloat('#,', GetUsedMemoryWindows));
Writeln('');
Readln;
end.
the results returned by the app are
Before
FastMem 1.844
Windows 3.633.152
Inside
FastMem 1.050.612
Windows 3.637.248
After
FastMem 2.036
Windows 3.633.152
I wanna know why the results of the memory usage are different in the Before and After:

Any memory manager (including FastMM) incurs some overhead, otherwise Delphi could have just used the Windows memory management.
The difference you observe is the overhead:
structures that FastMM uses to keep track of memory usage,
pieces of memory that FastMM did not yet return to the Windows memory management to optimize similar memory allocations in the future.

Because the memory manager is doing clever things in the background to speed up performance.

How does it work getmem/malloc/free?
Heap Allocator - As used by malloc...
1) Internally allocates large chunks(typically 64K up to 1Megabyte) of memory and then sub-divides the chunks up to give you the 100byte and 200byte objects and strings in the program.
When you free memory all that happens is the place where it was allocated from in the internal buffer or chunk is then marked as free. NOTHING ACTUALLY HAPPENS!
2) So you can think of the HEAP as a list of big chunks of memory, and all the objects in your program are just little parts of those chunks.
3) The big internal chunks of memory are only freed when all the objects inside them have been freed, so the usual case is that when you free some object that nothing actually happens except some bits get marked as being available.
That is a fairly naive description of the heap system, but most heaps work in a similar way, but do a lot more optimization than that. But your question is why does the memory not go down and the answer is because nothing actually gets freed. The internal pages of memory are retained for the next call to "new" or "malloc" etc...
PICTURE IT
INSIDE HEAP IS ONE HUGE BLOCK OF 100Kb
You call "malloc(1000)" or "getmem(1000)" to get a 1K block of memory.
Then all that happens is that the 1K block of memory is taken from the 100kb block of memory leaving 99K of memory available in that block. If you keep calling malloc or getmem then it will just keep dividing the larger block up until it needs another larger block.
Each little block of memory allocated with a call to malloc or getmem actually gets about 16 or 24 extra bytes(depending on allocator) extra memory. That memory is bits that the allocator uses to know what is allocated and also where it is allocated.

String Sharing/Reference issue with objects in Delphi

My application builds many objects in memory based on filenames (among other string based information). I was hoping to optimise memory usage by storing the path and filename separately, and then sharing the path between objects in the same path. I wasn't trying to look at using a string pool or anything, basically my objects are sorted so if I have 10 objects with the same path I want objects 2-10 to have their path "pointed" at object 1's path (eg object[2].Path=object[1].Path);
I have a problem though, I don't believe that my objects are in fact sharing a reference to the same string after I think I am telling them to (by the object[2].Path=object[1].Path assignment).
When I do an experiment with a string list and set all the values to point to the first value in the list I can see the "memory conservation" in action, but when I use objects I see absolutely no change at all, admittedly I am only using task manager (private working set) to watch for memory use changes.
Here's a contrived example, I hope this makes sense.
I have an object:
TfileObject=class(Tobject)
FpathPart: string;
FfilePart: string;
end;
Now I create 1,000,000 instances of the object, using a new string for each one:
var x: integer;
MyFilePath: string;
fo: TfileObject;
begin
for x := 1 to 1000000 do
begin
// create a new string for every iteration of the loop
MyFilePath:=ExtractFilePath(Application.ExeName);
fo:=TfileObject.Create;
fo.FpathPart:=MyFilePath;
FobjectList.Add(fo);
end;
end;
Run this up and task manager says I am using 68MB of memory or something. (Note that if I allocated MyFilePath outside of the loop then I do save memory because of 1 instance of the string, but this is a contrived example and not actually how it would happen in the app).
Now I want to "optimise" my memory usage by making all objects share the same instance of the path string, since it's the same value:
var x: integer;
begin
for x:=1 to FobjectList.Count-1 do
begin
TfileObject(FobjectList[x]).FpathPart:=TfileObject(FobjectList[0]).FpathPart;
end;
end;
Task Manager shows absouletly no change.
However if I do something similar with a TstringList:
var x: integer;
begin
for x := 1 to 1000000 do
begin
FstringList.Add(ExtractFilePath(Application.ExeName));
end;
end;
Task Manager says 60MB memory use.
Now optimise with:
var x: integer;
begin
for x := 1 to FstringList.Count - 1 do
FstringList[x]:=FstringList[0];
end;
Task Manager shows the drop in memory usage that I would expect, now 10MB.
So I seem to be able to share strings in a string list, but not in objects. I am obviously missing something conceptually, in code or both!
I hope this makes sense, I can really see the ability to conserve memory using this technique as I have a lot of objects all with lots of string information, that data is sorted in many different ways and I would like to be able to iterate over this data once it is loaded into memory and free some of that memory back up again by sharing strings in this way.
Thanks in advance for any assistance you can offer.
PS: I am using Delphi 2007 but I have just tested on Delphi 2010 and the results are the same, except that Delphi 2010 uses twice as much memory due to unicode strings...

When your Delphi program allocates and deallocates memory it does this not by using Windows API functions directly, but it goes through the memory manager. What you are observing here is the fact that the memory manager does not release all allocated memory back to the OS when it's no longer needed in your program. It will keep some or all of it allocated for later, to speed up later memory requests in the application. So if you use the system tools the memory will be listed as allocated by the program, but it is not in active use, it is marked as available internally and is stored in lists of usable memory blocks which the MM will use for any further memory allocations in your program, before it goes to the OS and requests more memory.
If you want to really check how any changes to your programs affect the memory consumption you should not rely on external tools, but should use the diagnostics the memory manager provides. Download the full FastMM4 version and use it in your program by putting it as the first unit in the DPR file. You can get detailed information by using the GetMemoryManagerState() function, which will tell you how much small, medium and large memory blocks are used and how much memory is allocated for each block size. For a quick check however (which will be completely sufficient here) you can simply call the GetMemoryManagerUsageSummary() function. It will tell you the total allocated memory, and if you call it you will see that your reassignment of FPathPart does indeed free several MB of memory.
You will observe different behaviour when a TStringList is used, and all strings are added sequentially. Memory for these strings will be allocated from larger blocks, and those blocks will contain nothing else, so they can be released again when the string list elements are freed. If OTOH you create your objects, then the strings will be allocated alternating with other data elements, so freeing them will create empty memory regions in the larger blocks, but the blocks won't be released as they contain still valid memory for other things. You have basically increased memory fragmentation, which could be a problem in itself.

As noted by another answer, memory that is not being used is not always immediately released to the system by the Delphi Memory Manager.
Your code guarantees a large quantity of such memory by dynamically growing the object list.
A TObjectList (in common with a TList and a TStringList) uses an incremental memory allocator. A new instance of one of these containers starts with memory allocated for 4 items (the Capacity). When the number of items added exceeds the Capacity additional memory is allocated, initially by doubling the capacity and then once a certain number of items has been reached, by increasing the capacity by 25%.
Each time the Count exceeds the Capacity, additional memory is allocated, the current memory copied to the new memory and the previously used memory released (it is this memory which is not immediately returned to the system).
When you know how many items are to be loaded into one of these types of list you can avoid this memory re-allocation behaviour (and achieve a significant performance improvement) by pre-allocating the Capacity of the list accordingly.
You do not necessarily have to set the precise capacity needed - a best guess (that is more likely to be nearer, or higher than, the actual figure required is still going to be better than the initial, default capacity of 4 if the number of items is significantly > 64)

Because task manager does not tell you the whole truth. Compare with this code:
var
x: integer;
MyFilePath: string;
fo: TfileObject;
begin
MyFilePath:=ExtractFilePath(Application.ExeName);
for x := 1 to 1000000 do
begin
fo:=TfileObject.Create;
fo.FpathPart:=MyFilePath;
FobjectList.Add(fo);
end;
end;

To share a reference, strings need to be assigned directly and be of the same type (Obviously, you can't share a reference between UnicodeString and AnsiString).
The best way I can think of to achieve what you want is as follow:
var StrReference : TStringlist; //Sorted
function GetStrReference(const S : string) : string;
var idx : Integer;
begin
if not StrReference.Find(S,idx) then
idx := StrReference.Add(S);
Result := StrReference[idx];
end;
procedure YourProc;
var x: integer;
MyFilePath: string;
fo: TfileObject;
begin
for x := 1 to 1000000 do
begin
// create a new string for every iteration of the loop
MyFilePath := GetStrReference(ExtractFilePath(Application.ExeName));
fo := TfileObject.Create;
fo.FpathPart := MyFilePath;
FobjectList.Add(fo);
end;
end;
To make sure it has worked correctly, you can call the StringRefCount(unit system) function. I don't know in which version of delphi that was introduced, so here's the current implementation.
function StringRefCount(const S: UnicodeString): Longint;
begin
Result := Longint(S);
if Result <> 0 then
Result := PLongint(Result - 8)^;
end;
Let me know if it worked as you wanted.
EDIT: If you are afraid of the stringlist growing too big, you can safely scan it periodically and delete from the list any string with a StringRefCount of 1.
The list could be wiped clean too... But that will make the function reserve a new copy of any new string passed to the function.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart