What bookkeeping data does a Delphi dynamic array contain?

Here's a simple program to check memory allocation. Checking before and after values with Task Manager suggests that each dynamic array takes up 20 bytes of memory at size = 1. The element size is 4, which means 16 bytes of overhead for bookkeeping data.
From looking through System.pas, I can find an array length field at offset -4 and a reference count at offset -8, but I can't seem to find any references to the other 8 bytes. Does anyone know what they are for?
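For reference, the two fields that are documented can be peeked at directly; a minimal sketch, assuming the 32-bit layout described above (length at offset -4, reference count at offset -8) and a compiler that has NativeInt:
program PeekDynArrayHeader;
{$APPTYPE CONSOLE}
var
  arr: array of Integer;
  p: PInteger;
begin
  SetLength(arr, 1);
  p := PInteger(NativeInt(Pointer(arr)) - 4); // length field
  Writeln('Length field:    ', p^);           // prints 1
  p := PInteger(NativeInt(Pointer(arr)) - 8); // reference count field
  Writeln('Reference count: ', p^);           // prints 1
  Readln;
end.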
Sample program:
program Project1;
{$APPTYPE CONSOLE}
type
  TDynArray = array of integer;
  TLotsOfArrays = array[1..1000000] of TDynArray;
  PLotsOfArrays = ^TLotsOfArrays;

procedure allocateArrays;
var
  arrays: PLotsOfArrays;
  i: integer;
begin
  new(arrays);
  for i := 1 to 1000000 do
    setLength(arrays^[i], 1);
end;

begin
  readln;
  allocateArrays;
  readln;
end.

I had a look into System.pas as well and noticed that the GetMem call in _DynArrayCopyRange supports your analysis: the allocated size is count * element size + 2 * SizeOf(Longint). So maybe the numbers you get from Task Manager aren't very accurate. You could try Pointer(someDynArray) := nil and check which memory leak size FastMM reports for more reliable numbers.
Edit: I did a little test program:
program DynArrayLeak;
{$APPTYPE CONSOLE}
uses
  SysUtils;

procedure Test;
var
  arr: array of Integer;
  i: Integer;
begin
  for i := 1 to 6 do
  begin
    SetLength(arr, i);
    Pointer(arr) := nil;
  end;
end;

begin
  ReportMemoryLeaksOnShutdown := True;
  Test;
end.
This yields
An unexpected memory leak has occurred. The unexpected small block leaks are:
1 - 12 bytes: Unknown x 1
13 - 20 bytes: Unknown x 2
21 - 28 bytes: Unknown x 2
29 - 36 bytes: Unknown x 1
which supports the 8-byte overhead theory.

Memory allocations have granularity to ensure all allocations are aligned. The extra bytes you are seeing are just the slop caused by that granularity.

Updated...
I actually went to check the code (which I should've done before) and came to the same conclusion as Ulrich: it's not storing any type information, just the two-Longint overhead followed by NbElements * ElementSize.
And Task Manager is not accurate for this kind of measurement.
One oddity: if you measure the memory actually used by the dynamic array, it increases non-linearly with the element size. For a record of 2 or 3 Integers the total is the same (20 bytes); with 4 or 5 it's 28... following the granularity of the memory manager's block sizes.
Memory measured with:
// Return the total memory used as reported by the memory manager
function MemoryUsed: Cardinal;
var
  MemMgrState: TMemoryManagerState;
  SmallBlockState: TSmallBlockTypeState;
begin
  GetMemoryManagerState(MemMgrState);
  Result := MemMgrState.TotalAllocatedMediumBlockSize + MemMgrState.TotalAllocatedLargeBlockSize;
  for SmallBlockState in MemMgrState.SmallBlockTypeStates do begin
    Result := Result + SmallBlockState.UseableBlockSize * SmallBlockState.AllocatedBlockCount;
  end;
end;
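A quick way to put it to use; a sketch (the procedure and variable names are illustrative):
procedure MeasureOneArray;
var
  before, after: Cardinal;
  arr: array of Integer;
begin
  before := MemoryUsed;
  SetLength(arr, 1);
  after := MemoryUsed;
  Writeln('One-element dynamic array uses ', after - before, ' bytes');
end;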

Related

What is the purpose or function of the TQueue.Capacity property?

Delphi's generic TQueue class has a property called Capacity. If the number of items in the TQueue exceeds its capacity, additional items are still added to the queue. The documentation says the property "gets or sets the queue capacity, that is, the maximum size of the queue without resizing." It sounds like a queue is, memory-wise, kind of like a fixed-length array until it's full, at which point it becomes more like a dynamic array. Is that accurate?
When would a programmer want or need to get or set a TQueue's capacity?
Theory
Consider the following example, which generates a dynamic array of random integers:
program DynArrAlloc;
{$APPTYPE CONSOLE}
{$R *.res}
uses
Windows, System.SysUtils;
const
N = 100000000;
var
a: TArray<Integer>;
i: Integer;
tc1, tc2: Cardinal;
begin
tc1 := GetTickCount;
SetLength(a, 0);
for i := 1 to N do
begin
SetLength(a, Succ(Length(a)));
a[High(a)] := Random(1000);
end;
tc2 := GetTickCount;
Writeln(tc2 - tc1);
Readln;
end.
On my system, it takes 4.5 seconds to run.
Notice that in each iteration I reallocate the array so it can hold one more item.
It would be better if I allocated a large enough array from the beginning:
program DynArrAlloc;
{$APPTYPE CONSOLE}
{$R *.res}
uses
  Windows, System.SysUtils;
const
  N = 100000000;
var
  a: TArray<Integer>;
  i: Integer;
  tc1, tc2: Cardinal;
begin
  tc1 := GetTickCount;
  SetLength(a, N);
  for i := 1 to N do
    a[i - 1] := Random(1000);
  tc2 := GetTickCount;
  Writeln(tc2 - tc1);
  Readln;
end.
This time, the program only takes 0.6 seconds.
Hence, one should always try not to reallocate unnecessarily. Each time I reallocate in the first example, I need to ask for more memory, copy the array to the new location, and finally free the old memory. In the worst case, growing one element at a time copies 1 + 2 + ... + (N - 1), roughly N²/2, elements in total, so the work is quadratic in N. Clearly, this is very inefficient.
Unfortunately, it isn't always possible to allocate a large enough array at the start. You simply might not know the final element count.
A common strategy then is to allocate in steps: when the array is full and you need one more slot, allocate several more slots but keep track of the actual number of used slots:
program DynArrAlloc;
{$APPTYPE CONSOLE}
{$R *.res}
uses
  Windows, System.SysUtils;
const
  N = 100000000;
  AllocStep = 1024;
var
  a: TArray<Integer>;
  i: Integer;
  tc1, tc2: Cardinal;
  ActualLength: Integer;
begin
  tc1 := GetTickCount;
  SetLength(a, AllocStep);
  ActualLength := 0;
  for i := 1 to N do
  begin
    if ActualLength = Length(a) then
      SetLength(a, Length(a) + AllocStep);
    a[ActualLength] := Random(1000);
    Inc(ActualLength);
  end;
  // Trim the excess:
  SetLength(a, ActualLength);
  tc2 := GetTickCount;
  Writeln(tc2 - tc1);
  Readln;
end.
Now we need only 1.3 seconds.
In this example, I allocate in fixed-size blocks. A more common strategy is to double the array size at each reallocation (or multiply it by 1.5, say), or to combine these approaches in a smart way; a sketch of the doubling variant follows.
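A minimal sketch of the doubling variant, adapted from the fixed-step program above (the initial capacity of 16 is an arbitrary choice):
program DynArrAllocDouble;
{$APPTYPE CONSOLE}
{$R *.res}
uses
  Windows, System.SysUtils;
const
  N = 100000000;
var
  a: TArray<Integer>;
  i: Integer;
  tc1, tc2: Cardinal;
  ActualLength: Integer;
begin
  tc1 := GetTickCount;
  SetLength(a, 16);                 // small initial capacity
  ActualLength := 0;
  for i := 1 to N do
  begin
    if ActualLength = Length(a) then
      SetLength(a, Length(a) * 2);  // double the capacity whenever it runs out
    a[ActualLength] := Random(1000);
    Inc(ActualLength);
  end;
  // Trim the excess:
  SetLength(a, ActualLength);
  tc2 := GetTickCount;
  Writeln(tc2 - tc1);
  Readln;
end.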
Applying the theory
Under the hood, TList<T>, TQueue<T>, TStack<T>, TStringList etc. need to dynamically allocate space for an unlimited number of items. To make this performant, these classes allocate more memory than strictly necessary. Capacity is the number of elements that fit in the currently allocated memory, while Count (always <= Capacity) is the actual number of elements in the container.
You can set the Capacity property to reduce the need for intermediate allocation when you fill a container and you do know the final number of elements from the beginning:
var
  L: TList<Integer>;
begin
  L := TList<Integer>.Create;
  try
    while not Something.EOF do
      L.Add(Something.GetNextValue);
  finally
    L.Free;
  end;
is OK and probably requires only a few reallocations, but
  L := TList<Integer>.Create;
  try
    L.Capacity := Something.Count;
    while not Something.EOF do
      L.Add(Something.GetNextValue);
  finally
    L.Free;
  end;
will be faster since there will be no intermediate reallocations.
Internally, TQueue contains a dynamic array that stores the elements.
When the item count reaches the current capacity, the array is reallocated (doubling its size, for example) so you can keep adding elements.
If you know a reliable limit for the maximum item count, it is worth setting Capacity up front: you will avoid the memory reallocations and save some time.
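For instance, a minimal sketch (the element count of one million is arbitrary):
uses
  System.Generics.Collections;

procedure FillQueue;
var
  Q: TQueue<Integer>;
  i: Integer;
begin
  Q := TQueue<Integer>.Create;
  try
    Q.Capacity := 1000000;  // one allocation up front instead of repeated growth steps
    for i := 1 to 1000000 do
      Q.Enqueue(i);
  finally
    Q.Free;
  end;
end;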

Isn't it dangerous to use the Longint count with the Int64 size in Stream.read?

I was examining the TMemoryStream class and found the following routine:
procedure TMemoryStream.LoadFromStream(Stream: TStream);
var
  Count: Longint;
begin
  Stream.Position := 0;
  Count := Stream.Size; // <-- assigning Int64 to Longint
  SetSize(Count);
  if Count <> 0 then Stream.ReadBuffer(FMemory^, Count);
end;
I have seen this pattern a lot, where an Int64 is assigned to a Longint.
My understanding is that Longint is four bytes and Int64 is eight bytes on both 32-bit and 64-bit Windows, so if my file size is $1FFFFFFFF = 8589934591 (8 GB), then this routine will simply fail to read anything, because the truncated count will be $FFFFFFFF = -1.
I do not understand how this is allowed and apparently not taken into consideration (maybe not many people are trying to read 8+ GB files).
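The truncation is easy to reproduce in isolation; a minimal sketch:
var
  Size: Int64;
  Count: Longint;
begin
  Size := $1FFFFFFFF;  // 8589934591, the 8 GB size from above
  Count := Size;       // silently truncated to the low 32 bits
                       // (with range checking off; under {$R+} this raises ERangeError)
  Writeln(Count);      // prints -1 ($FFFFFFFF read as a signed 32-bit value)
end;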
I logged a ticket for this, and it has apparently been fixed in Tokyo 10.2. This is an issue for 64-bit compilation.
https://quality.embarcadero.com/browse/RSP-19094
There are problems with large (>2 GB) files in both TCustomMemoryStream and TMemoryStream. In TMemoryStream the issues are simple: the local variables need to be declared as NativeInt instead of Longint, and Capacity needs to be changed to a NativeInt. In TCustomMemoryStream they are more subtle, because both TCustomMemoryStream.Read methods assign the result of an Int64 - Int64 calculation directly to a Longint. This will overflow even when the result of the calculation isn't larger than a Longint.
If you want to fix this in Seattle, you will need to either apply a code hook, replace the System.Classes unit, or roll out your own replacement class for TMemoryStream. Bear in mind that for the last option you will also need to replace TBytesStream and TStringStream, because these descend from TMemoryStream.
The other problem with the last option is that third-party components won't have your "fixes". For us, only a couple of places needed to work with files larger than 2 GB, so we switched those across.
The fix for TCustomMemoryStream.Read (it must be applied to both overloads) will look something like this:
function TCustomMemoryStream.Read(var Buffer; Count: Longint): Longint;
{ These 2 lines are new }
var
  remaining: Int64;
begin
  if (FPosition >= 0) and (Count >= 0) then
  begin
    remaining{Result} := FSize - FPosition;
    if remaining{Result} > 0 then
    begin
      if remaining{Result} > Count then
        Result := Count
      else
        Result := remaining;
      Move((PByte(FMemory) + FPosition)^, Buffer, Result);
      Inc(FPosition, Result);
      Exit;
    end;
  end;
  Result := 0;
end;
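The second overload (the TBytes-based Read) would need the analogous treatment. A sketch, assuming the Seattle-era signature; this is not the verbatim RTL fix:
function TCustomMemoryStream.Read(Buffer: TBytes; Offset, Count: Longint): Longint;
var
  remaining: Int64;
begin
  if (FPosition >= 0) and (Count >= 0) then
  begin
    remaining := FSize - FPosition; // keep the difference in 64 bits
    if remaining > 0 then
    begin
      if remaining > Count then
        Result := Count
      else
        Result := remaining;
      Move((PByte(FMemory) + FPosition)^, Buffer[Offset], Result);
      Inc(FPosition, Result);
      Exit;
    end;
  end;
  Result := 0;
end;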

Faster way to get virtual allocation addresses

I would like to process (save) the virtual memory blocks allocated to the current process. Here is the code I am using:
program Project38;
{$APPTYPE CONSOLE}
{$R *.res}
uses
  System.SysUtils,
  Winapi.Windows;

procedure DoProcess(aStart: Pointer; aSize: Cardinal);
begin
  // process it
end;

procedure ProcessVirtualMemory;
var
  addr: Pointer;
  systemInfo: SYSTEM_INFO;
  startAddress, stopAddress: Pointer;
  size: size_t;
  memInfo: MEMORY_BASIC_INFORMATION;
begin
  GetSystemInfo(systemInfo);
  startAddress := systemInfo.lpMinimumApplicationAddress;
  stopAddress := systemInfo.lpMaximumApplicationAddress;
  addr := startAddress;
  while NativeUInt(addr) < NativeUInt(stopAddress) do begin
    size := VirtualQuery(addr, memInfo, SizeOf(MEMORY_BASIC_INFORMATION));
    if (size = SizeOf(MEMORY_BASIC_INFORMATION)) and
       (memInfo.State = MEM_COMMIT) and
       (memInfo.Type_9 = MEM_PRIVATE) and
       (memInfo.RegionSize > 0) and
       (memInfo.Protect = PAGE_READWRITE) then
    begin
      DoProcess(memInfo.BaseAddress, memInfo.RegionSize);
      addr := Pointer(NativeUInt(addr) + memInfo.RegionSize);
    end
    else
      addr := Pointer(NativeUInt(addr) + systemInfo.dwPageSize);
  end;
end;

begin
  ProcessVirtualMemory;
end.
This code runs against a huge application, and collecting this information without any processing takes 10-12 seconds. Is there a faster way of getting the addresses of the virtual memory blocks?
Your program does contain a mistake. In case the memory block does not match your search criteria, you only increment by the page size rather than the region size. Your loop should really look like this:
while NativeUInt(addr) < NativeUInt(stopAddress) do begin
  size := VirtualQuery(addr, memInfo, SizeOf(MEMORY_BASIC_INFORMATION));
  if size = 0 then begin
    // handle error
    break;
  end;
  if (size = SizeOf(MEMORY_BASIC_INFORMATION)) and
     (memInfo.State = MEM_COMMIT) and
     (memInfo.Type_9 = MEM_PRIVATE) and
     (memInfo.RegionSize > 0) and
     (memInfo.Protect = PAGE_READWRITE) then begin
    DoProcess(memInfo.BaseAddress, memInfo.RegionSize);
  end;
  addr := Pointer(NativeUInt(addr) + memInfo.RegionSize);
end;
The problem with your version is that when there are gaps in the virtual address space, your code walks over them one page at a time, rather than skipping the entire region.
Even without that change, I don't believe that enumeration was the bottleneck. I made the following alteration to your original program, to report the time taken:
var
  Stopwatch: TStopwatch; // TStopwatch lives in System.Diagnostics
begin
  Stopwatch := TStopwatch.StartNew;
  ProcessVirtualMemory;
  Writeln(Stopwatch.ElapsedMilliseconds);
  Readln;
end.
On my machine, for a 32 bit release build, this reported around 500ms.
Then I allocated some memory:
var
i: Integer;
Stopwatch: TStopwatch;
begin
for i := 1 to 100000 do begin
HeapAlloc(GetProcessHeap, 0, Random(100000));
end;
Stopwatch := TStopwatch.StartNew;
ProcessVirtualMemory;
Writeln(Stopwatch.ElapsedMilliseconds);
Readln;
end.
No matter what I did, playing around with the constants, the program still reports around 500ms. If you fix the non-matching increment code as described at the start of this answer then the times come down to around 100ms.
Of course, for a 64 bit process, it's a little different. The defect in your code means that the program effectively gets stuck walking the huge 64 bit address space one page at a time, calling VirtualQuery for each page in the address space. I never even waited for that process to finish.
My conclusion therefore is that the main bottleneck in your program is not the code that you present, the code that finds the virtual memory blocks, but rather the code inside DoProcess, the code that you have and we don't. Even when you fix the defect I described at the start, you will still spend significant time in that function. You should expect the virtual memory space enumeration itself to take in the region of 100ms.

Enumeration set size in x64

I found that SizeOf for a set is different in 32-bit and 64-bit: the example below shows 5 bytes for 32-bit and 8 for 64-bit. But I found no information about changes to set sizes in 64-bit. Is there any Embarcadero documentation about it, or a compiler directive to get the same results on 32-bit and 64-bit?
program Project1;
{$APPTYPE CONSOLE}
{$R *.res}
uses
  System.SysUtils;
type
  { Enumeration of properties }
  TProperty1 = (p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14,
    p15, p16, p17, p18, p19, p20, p21, p22, p23, p24, p25, p26, p27, p28,
    p29, p30, p31, p32, p33, p34, p35, p36, p37);
  TProperties1 = set of TProperty1;
begin
  WriteLn(SizeOf(TProperties1));
  ReadLn;
end.
To answer your question: I couldn't find anything on the Embarcadero site regarding the differences, or a compiler directive to change the behavior. My research indicates the following.
Sets have the following sizes in bytes in 32-bit:
Up to 8 elements - 1 byte
9 to 16 elements - 2 bytes
17 to 32 elements - 4 bytes
From this point onwards it adds bytes as needed, one at a time. So 33 to 40 elements use 5 bytes and 41 to 48 elements use 6 bytes.
In 64-bit mode, things are slightly different:
Up to 8 elements - 1 byte
9 to 16 elements - 2 bytes
17 to 32 elements - 4 bytes
33 to 64 elements - 8 bytes
From this point onwards it adds bytes as needed, one at a time. So 65 to 72 elements use 9 bytes and 73 to 80 elements use 10 bytes.
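These sizes are easy to verify; a minimal sketch (compile it for both targets and compare the output):
program SetSizes;
{$APPTYPE CONSOLE}
type
  TSet8  = set of 0..7;
  TSet16 = set of 0..15;
  TSet32 = set of 0..31;
  TSet40 = set of 0..39;
  TSet64 = set of 0..63;
begin
  Writeln(SizeOf(TSet8));   // 1 on both targets
  Writeln(SizeOf(TSet16));  // 2 on both targets
  Writeln(SizeOf(TSet32));  // 4 on both targets
  Writeln(SizeOf(TSet40));  // 5 on 32-bit, 8 on 64-bit
  Writeln(SizeOf(TSet64));  // 8 on both targets
  Readln;
end.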
To get around this, you are either going to need to use something like WriteSet in TWriter.WriteProperty and TReader.ReadSet, or you can do something like this:
procedure SaveSetToStream(aStream: TStream; const aSet: TProperties1);
var
  streamData: array[0..7] of Byte;
begin
  Assert(SizeOf(aSet) <= SizeOf(streamData), 'Set is too large to save. Increase the array length.');
  FillChar(streamData, SizeOf(streamData), 0);
  Move(aSet, streamData, SizeOf(aSet));
  aStream.Write(streamData, SizeOf(streamData));
end;

function ReadFromStream(aStream: TStream): TProperties1;
var
  streamData: array[0..7] of Byte;
begin
  Assert(SizeOf(Result) <= SizeOf(streamData), 'Set is too large to load. Increase the array length.');
  aStream.Read(streamData, SizeOf(streamData));
  Move(streamData, Result, SizeOf(Result));
end;
Another workaround, to make sure a 32-bit machine can read a stream from a 64-bit machine and vice versa, is to create a function
function SizeCheck(const p: Integer): Integer;
begin
  if p in [5..8] then Result := 8 else Result := p; // adjust for 64-bit set sizes
end;
and then use
Stream.Write(set, SizeCheck(SizeOf(set)));
Obviously, only use this for sets.

How can I find out how much memory is used by a specific component or class?

Is it possible to retrieve the amount of memory that is used by a single component in Delphi?
I'm downloading simple strings from the internet, and I see that the memory usage is up to a gigabyte by the end of the downloading process. But when I look at the saved file which contains everything I downloaded, it is only in the kilobyte range. Clearly there is something going on with the components, even though I destroy them.
Example:
procedure TForm1.OnCreate(Sender: TObject);
var
  list: TStringList;
begin
  list := TStringList.Create;
  list.LoadFromFile('10MB_of_Data.txt');
  list.Destroy;
end;
How can I know that "list" as a TStringList is using 10 MB worth of space in memory?
Thank you.
I think comparing the memory usage before and after is the way to go with this, as there is no simple way of seeing what memory was allocated by a block of code after the fact. For example, with the string list above, the class itself will only take up a small amount of memory, as it is made up of pointers to other allocations (i.e. the array of strings), and that itself is an array of pointers to the actual strings... and this is a comparatively simple case.
Anyway, this can be done with FastMM using a function like the following...
uses
  FastMM4;

function CheckAllocationBy(const AProc: TProc): NativeUInt;
var
  lOriginalAllocated: NativeUInt;
  lFinalAllocated: NativeUInt;
  lUsage: TMemoryManagerUsageSummary;
begin
  GetMemoryManagerUsageSummary(lUsage);
  lOriginalAllocated := lUsage.AllocatedBytes;
  try
    AProc;
  finally
    GetMemoryManagerUsageSummary(lUsage);
    lFinalAllocated := lUsage.AllocatedBytes;
  end;
  Result := lFinalAllocated - lOriginalAllocated;
end;
And it can be used like so...
lAllocatedBytes := CheckAllocationBy(
  procedure
  begin
    list := TStringList.Create;
    list.LoadFromFile('10MB_of_Data.txt');
    list.Free;
  end);
This will tell you how much memory your string list left behind. (Interestingly, I get 40 bytes for this on the first of repeated calls and 0 afterwards; consulting the usage logs before and after the call shows that the 40 bytes are two encoding class instances created on the first call.) If you want to check where leaked memory was allocated, it's simple to use FastMM to do that as well (although I agree with the comments above that if it's third party, it shouldn't be your problem).
First of all, please be patient: this is not exactly an answer to your question, but it is too large to post as a comment. This code was written and compiled using FPC, but it can give you some estimates. Maybe somebody knows how to port it to Delphi.
program project4;
uses
  SysUtils,
  Classes;
var
  p: Pointer;
  sl: TStringList;
  a: TFPCHeapStatus;
begin
  a := GetFPCHeapStatus;
  writeln('== 1 ==');
  //writeln(a.MaxHeapSize);
  writeln(a.MaxHeapUsed);
  //writeln(a.CurrHeapSize);
  writeln(a.CurrHeapUsed);
  //writeln(a.CurrHeapFree);
  GetMem(p, 1024);
  a := GetFPCHeapStatus;
  writeln('== 2 ==');
  writeln(a.MaxHeapUsed);
  writeln(a.CurrHeapUsed);
  sl := TStringList.Create;
  a := GetFPCHeapStatus;
  writeln('== 3 ==');
  writeln(a.MaxHeapUsed);
  writeln(a.CurrHeapUsed);
  sl.Add('To beer or not to beer? That is the question!');
  a := GetFPCHeapStatus;
  writeln('== 4 ==');
  writeln(a.MaxHeapUsed);
  writeln(a.CurrHeapUsed);
  Readln;
end.
and output:
== 1 ==
2448
2448
== 2 ==
3488
3488
== 3 ==
3568
3568
== 4 ==
3616
3616
And another test with a large text file:
sl.LoadFromFile('tolstoy - war and peace.txt');
a := GetFPCHeapStatus;
writeln('== 4 ==');
writeln(a.MaxHeapUsed);
writeln(a.CurrHeapUsed);
Output:
== 3 ==
3568
3568
== 4 ==
8837104
4643776
File size: 3.1 MB (3,280,005 bytes), ANSI encoding
