Following on from this question (Dynamic arrays and memory management in Delphi), if I create a dynamic array in Delphi, how do I access the reference count?
SetLength(a1, 100);
a2 := a1;
// The reference count for the array pointed to by both
// a1 and a2 should be 2. How do I retrieve this?
Additionally, if the reference count can be accessed, can it also be modified manually? This latter question is mainly theoretical rather than for use practically (unlike the first question above).
You can see how the reference count is managed by inspecting the code in the System unit. Here are the pertinent parts from the XE3 source:
type
PDynArrayRec = ^TDynArrayRec;
TDynArrayRec = packed record
{$IFDEF CPUX64}
_Padding: LongInt; // Make 16 byte align for payload..
{$ENDIF}
RefCnt: LongInt;
Length: NativeInt;
end;
....
procedure _DynArrayAddRef(P: Pointer);
begin
if P <> nil then
AtomicIncrement(PDynArrayRec(PByte(P) - SizeOf(TDynArrayRec))^.RefCnt);
end;
function _DynArrayRelease(P: Pointer): LongInt;
begin
Result := AtomicDecrement(PDynArrayRec(PByte(P) - SizeOf(TDynArrayRec))^.RefCnt);
end;
A dynamic array variable holds a pointer. If the array is empty, then the pointer is nil. Otherwise the pointer contains the address of the first element of the array. Immediately before the first element of the array is stored the metadata for the array. The TDynArrayRec type describes that metadata.
So, if you wish to read the reference count you can use the exact same technique as does the RTL. For instance:
function DynArrayRefCount(P: Pointer): LongInt;
begin
if P <> nil then
Result := PDynArrayRec(PByte(P) - SizeOf(TDynArrayRec))^.RefCnt
else
Result := 0;
end;
If you want to modify the reference count then you can do so by exposing the functions in System:
procedure DynArrayAddRef(P: Pointer);
asm
JMP System.#DynArrayAddRef
end;
function DynArrayRelease(P: Pointer): LongInt;
asm
JMP System.#DynArrayRelease
end;
Note that the name DynArrayRelease as chosen by the RTL designers is a little mis-leading because it merely reduces the reference count. It does not release memory when the count reaches zero.
I'm not sure why you would want to do this mind you. Bear in mind that once you start modifying the reference count, you have to take full responsibility for getting it right. For instance, this program leaks:
{$APPTYPE CONSOLE}
var
a, b: array of Integer;
type
PDynArrayRec = ^TDynArrayRec;
TDynArrayRec = packed record
{$IFDEF CPUX64}
_Padding: LongInt; // Make 16 byte align for payload..
{$ENDIF}
RefCnt: LongInt;
Length: NativeInt;
end;
function DynArrayRefCount(P: Pointer): LongInt;
begin
if P <> nil then
Result := PDynArrayRec(PByte(P) - SizeOf(TDynArrayRec))^.RefCnt
else
Result := 0;
end;
procedure DynArrayAddRef(P: Pointer);
asm
JMP System.#DynArrayAddRef
end;
function DynArrayRelease(P: Pointer): LongInt;
asm
JMP System.#DynArrayRelease
end;
begin
ReportMemoryLeaksOnShutdown := True;
SetLength(a, 1);
Writeln(DynArrayRefCount(a));
b := a;
Writeln(DynArrayRefCount(a));
DynArrayAddRef(a);
Writeln(DynArrayRefCount(a));
a := nil;
Writeln(DynArrayRefCount(b));
b := nil;
Writeln(DynArrayRefCount(b));
end.
And if you make a call to DynArrayRelease that takes the reference count to zero then you would also need to dispose of the array, for reasons discussed above. I've never encountered a problem that would require manipulation of the reference count, and strongly suggest that you avoid doing so.
One final point. The RTL does not offer this functionality through its public interface. Which means that all of the above is private implementation detail. And so is subject to change in a future release. If you do attempt to read or modify the reference count then you must recognise that doing so relies on such implementation detail.
After some googling, I found an excellent article by Rudy Velthuis. I highly recommend to read it. Quoting dynamic arrays part from http://rvelthuis.de/articles/articles-pointers.html#dynarrays
At the memory location below the address to which the pointer points, there are two more fields, the number of elements allocated, and the reference count.
If, as in the diagram above, N is the address in the dynamic array variable, then the reference count is at address N-8, and the number of allocated elements (the length indicator) at N-4. The first element is at address N.
How to access these:
SetLength(a1, 100);
a2 := a1;
// Reference Count = 2
refCount := PInteger(NativeUInt(#a1[0]) - SizeOf(NativeInt) - SizeOf(Integer))^;
// Array Length = 100
arrLength := PNativeInt(NativeUInt(#a1[0]) - SizeOf(NativeInt))^;
The trick in computing proper offsets is to account for differences between 32bit and 64bit platforms code. Fields size in bytes is as follows:
32bit 64bit
RefCount 4 4
Length 4 8
Related
Given a buffer and its size in bytes, is there a way to convert this to TBytes without copying it?
Example:
procedure HandleBuffer(_Buffer: PByte; _BufSize: integer);
var
Arr: TBytes;
i: Integer;
begin
// some clever code here to get contents of the buffer into the Array
for i := 0 to Length(Arr)-1 do begin
HandleByte(Arr[i]);
end;
end;
I could of course copy the data:
procedure HandleBuffer(_Buffer: PByte; _BufSize: integer);
var
Arr: TBytes;
i: Integer;
begin
// this works but is very inefficient
SetLength(Arr, _BufSize);
Move(PByte(_Buffer)^, Arr[0], _BufSize);
//
for i := 0 to Length(Arr)-1 do begin
HandleByte(Arr[i]);
end;
end;
But for a large buffer (about a hundred megabytes) this would mean I have double the memory requirement and also spend a lot of time unnecessarily copying data.
I am aware that I could simply use a PByte to process each byte in the buffer, I'm only interested in a solution to use a TBytes instead.
I think it's not possible, but I have been wrong before.
No, this is not possible (without unreasonable hacks).
The problem is that TBytes = TArray<Byte> = array of Byte is a dynamic array and the heap object for a non-empty dynamic array has a header containing the array's reference count and length.
A function that accepts a TBytes parameter, when given a plain pointer to an array of bytes, might (rightfully) attempt to read the (non-existing) header, and then you are in serious trouble.
Also, dynamic arrays are managed types (as indicated by the reference count I mentioned), so you might have problems with that as well.
However, in your particular example code, you don't actually use the dynamic array nature of the data at all, so you can work directly with the buffer:
procedure HandleBuffer(_Buffer: PByte; _BufSize: integer);
var
i: Integer;
begin
for i := 0 to _BufSize - 1 do
HandleByte(_Buffer[i]);
end;
I use Delphi 10.1 Berlin in Windows 10.
I have two records of different sizes. I wrote code to loop through two TList<T> of these records to test elapsed times. Looping through the list of the larger record runs much slower.
Can anyone explain the reason, and provide a solution to make the loop run faster?
type
tTestRecord1 = record
Field1: array[0..4] of Integer;
Field2: array[0..4] of Extended;
Field3: string;
end;
tTestRecord2 = record
Field1: array[0..4999] of Integer;
Field2: array[0..4999] of Extended;
Field3: string;
end;
procedure TForm1.Button1Click(Sender: TObject);
var
_List: TList<tTestRecord1>;
_Record: tTestRecord1;
_Time: TTime;
i: Integer;
begin
_List := TList<tTestRecord1>.Create;
for i := 0 to 4999 do
begin
_List.Add(_Record);
end;
_Time := Time;
for i := 0 to 4999 do
begin
if _List[i].Field3 = 'abcde' then
begin
Break;
end;
end;
Button1.Caption := FormatDateTime('s.zzz', Time - _Time); // 0.000
_List.Free;
end;
procedure TForm1.Button2Click(Sender: TObject);
var
_List: TList<tTestRecord2>;
_Record: tTestRecord2;
_Time: TTime;
i: Integer;
begin
_List := TList<tTestRecord2>.Create;
for i := 0 to 4999 do
begin
_List.Add(_Record);
end;
_Time := Time;
for i := 0 to 4999 do
begin
if _List[i].Field3 = 'abcde' then
begin
Break;
end;
end;
Button2.Caption := FormatDateTime('s.zzz', Time - _Time); // 0.045
_List.Free;
end;
First of all, I want to consider the entire code, even the code that populates the list which I do realise you have not timed. Because the second record is larger in size more memory needs to be copied when you make an assignment of that record type. Further when you read from the list the larger record is less cache friendly than the smaller record which impacts performance. This latter effect is likely less significant than the former.
Related to this is that as you add items the list's internal array of records has to be resized. Sometimes the resizing leads to a reallocation that cannot be performed in-place. When that happens a new block of memory is allocated and the previous content is copied to this new block. That copy is clearly ore expensive for the larger record. You can mitigate this by allocating the array once up front if you know it's length. The list Capacity is the mechanism to use. Of course, not always will you know the length ahead of time.
Your program does very little beyond memory allocation and memory access. Hence the performance of these memory operations dominates.
Now, your timing is only of the code that reads from the lists. So the memory copy performance difference on population is not part of the benchmarking that you performed. Your timing differences are mainly down to excessive memory copy when reading, as I will explain below.
Consider this code:
if _List[i].Field3 = 'abcde' then
Because _List[i] is a record, a value type, the entire record is copied to an implicit hidden local variable. The code is actually equivalent to:
var
tmp: tTestRecord2;
...
tmp := _List[i]; // copy of entire record
if tmp.Field3 = 'abcde' then
There are a few ways to avoid this copy:
Change the underlying type to be a reference type. This changes the memory management requirements. And you may have good reason to want to use a value type.
Use a container class that can return the address of an item rather than a copy of an item.
Switch from TList<T> to dynamic array TArray<T>. That simple change will allow the compiler to access individual fields directly without copying entire records.
Use the TList<T>.List to obtain access to the list object's underlying array holding the data. That would have the same effect as the previous item.
Item 4 above is the simplest change you could make to see a large difference. You would replace
if _List[i].Field3 = 'abcde' then
with
if _List.List[i].Field3 = 'abcde' then
and that should yield a very significant change in performance.
Consider this program:
{$APPTYPE CONSOLE}
uses
System.Diagnostics,
System.Generics.Collections;
type
tTestRecord2 = record
Field1: array[0..4999] of Integer;
Field2: array[0..4999] of Extended;
Field3: string;
end;
procedure Main;
const
N = 100000;
var
i: Integer;
Stopwatch: TStopwatch;
List: TList<tTestRecord2>;
Rec: tTestRecord2;
begin
List := TList<tTestRecord2>.Create;
List.Capacity := N;
for i := 0 to N-1 do
begin
List.Add(Rec);
end;
Stopwatch := TStopwatch.StartNew;
for i := 0 to N-1 do
begin
if List[i].Field3 = 'abcde' then
begin
Break;
end;
end;
Writeln(Stopwatch.ElapsedMilliseconds);
end;
begin
Main;
Readln;
end.
I had to compile it for 64 bit to avoid an out of memory condition. The output on my machine is around 700. Change List[i].Field3 to List.List[i].Field3 and the output is in single figures. The timing is rather crude, but I think this demonstrates the point.
The issue of the large record not being cache friendly remains. That is more complicated to deal with and would require a detailed analysis of how the real world code operated on its data.
As an aside, if you care about performance then you won't use Extended. Because it has size 10, not a power of two, memory access is frequently mis-aligned. Use Double or Real which is an alias to Double.
As the topic indicates above, I'm wondering if there's a good example of a clean and efficient way to handle pointers as passed in function parms when processing the data sequentially. What I have is something like:
function myfunc(inptr: pointer; inptrsize: longint): boolean;
var
inproc: pointer;
i: integer;
begin
inproc := inptr;
for i := 1 to inptrsize do
begin
// do stuff against byte data here.
inc(longint(inproc), 1);
end;
end;
The idea is that instead of finite pieces of data, I want it to be able to process whatever is pushed its way, no matter the size.
Now when it comes to processing the data, I've figured out a couple of ways to do it successfully.
Assign the parm pointers to identical temporary pointers, then use those to access each piece of data, incrementing them to move on. This method is quickest, but not very clean looking with all the pointer increments spread all over the code. (this is what I'm talking about above)
Assign the parm pointers to a pointer representing a big array value and then incremently process that using standard table logic. Much cleaner, but about 500 ms slower than #1.
Is there another way to efficiently handle processing pointers in this way, or is there some method I'm missing that will both be clean and not time inefficient?
Your code here is basically fine. I would always choose to increment a pointer than cast to a fake array.
But you should not cast to an integer. That is semantically wrong and you'll pay the penalty anytime you compile on a platform that has pointer size different from your integer size. Always use a pointer to an element of the right size. In this case a pointer to byte.
function MyFunc(Data: PByte; Length: Integer): Boolean;
var
i: Integer;
begin
for i := 1 to Length do
begin
// do stuff against byte data here.
inc(Data);
end;
end;
Unless the compiler is having a really bad day, you won't find it easy to get better performing code than this. What's more, I think this style is actually rather clear and easy to understand. Most of the clarity gain comes in avoiding the need to cast. Always strive to remove casts from your code.
If you want to allow any pointer type to be passed then you can write it like this:
function MyFunc(P: Pointer; Length: Integer): Boolean;
var
i: Integer;
Data: PByte;
begin
Data := P;
for i := 1 to Length do
begin
// do stuff against byte data here.
inc(Data);
end;
end;
Or if you want to avoid pointers in the interface, then use an untyped const parameter.
function MyFunc(const Buffer; Length: Integer): Boolean;
var
i: Integer;
Data: PByte;
begin
Data := PByte(#Buffer);
for i := 1 to Length do
begin
// do stuff against byte data here.
inc(Data);
end;
end;
Use a var parameter if you need to modify the buffer.
I have a different opinion: For sake of readability I would use an array. Pascal was not designed to be able to access memory directly. Original pascal did not even have pointer arithmetic.
This is how I would use an array:
function MyFunc(P: Pointer; Length: Integer): Boolean;
var
ArrayPtr : PByteArray Absolute P;
I : Integer;
begin
For I := 0 to Length-1 do
// do stuff against ArrayPtr^[I]
end;
But if performance matters, I would write it like this
function MyFunc(P: Pointer; Length: Integer): Boolean;
var
EndOfMemoryBlock: PByte;
begin
EndOfMemoryBlock := PByte(Int_Ptr(Data)+Length);
While P<EndOfMemoryBlock Do begin
// do stuff against byte data here.
inc(P);
end;
end;
I'm going maintain and port to Delphi XE2 a bunch of very old Delphi code that is full of VarArrayCreate constructs to fake dynamic arrays having a lower bound that is not zero.
Drawbacks of using Variant types are:
quite a bit slower than native arrays (the code does a lot of complex financial calculations, so speed is important)
not type safe (especially when by accident a wrong var... constant is used, and the Variant system starts to do unwanted conversions or rounding)
Both could become moot if I could use dynamic arrays.
Good thing about variant arrays is that they can have non-zero lower bounds.
What I recollect is that dynamic arrays used to always start at a lower bound of zero.
Is this still true? In other words: Is it possible to have dynamic arrays start at a different bound than zero?
As an illustration a before/after example for a specific case (single dimensional, but the code is full of multi-dimensional arrays, and besides varDouble, the code also uses various other varXXX data types that TVarData allows to use):
function CalculateVector(aSV: TStrings): Variant;
var
I: Integer;
begin
Result := VarArrayCreate([1,aSV.Count-1],varDouble);
for I := 1 to aSV.Count-1 do
Result[I] := CalculateItem(aSV, I);
end;
The CalculateItem function returns Double. Bounds are from 1 to aSV.Count-1.
Current replacement is like this, trading the space zeroth element of Result for improved compile time checking:
type
TVector = array of Double;
function CalculateVector(aSV: TStrings): TVector;
var
I: Integer;
begin
SetLength(Result, aSV.Count); // lower bound is zero, we start at 1 so we ignore the zeroth element
for I := 1 to aSV.Count-1 do
Result[I] := CalculateItem(aSV, I);
end;
Dynamic arrays always have a lower bound of 0. So, low(A) equals 0 for all dynamic arrays. This is even true for empty dynamic arrays, i.e. nil.
From the documentation:
Dynamic arrays are always integer-indexed, always starting from 0.
Having answered your direct question already, I also offer you the beginnings of a generic class that you can use in your porting.
type
TSpecifiedBoundsArray<T> = class
private
FValues: TArray<T>;
FLow: Integer;
function GetHigh: Integer;
procedure SetHigh(Value: Integer);
function GetLength: Integer;
procedure SetLength(Value: Integer);
function GetItem(Index: Integer): T;
procedure SetItem(Index: Integer; const Value: T);
public
property Low: Integer read FLow write FLow;
property High: Integer read GetHigh write SetHigh;
property Length: Integer read GetLength write SetLength;
property Items[Index: Integer]: T read GetItem write SetItem; default;
end;
{ TSpecifiedBoundsArray<T> }
function TSpecifiedBoundsArray<T>.GetHigh: Integer;
begin
Result := FLow+System.High(FValues);
end;
procedure TSpecifiedBoundsArray<T>.SetHigh(Value: Integer);
begin
SetLength(FValues, 1+Value-FLow);
end;
function TSpecifiedBoundsArray<T>.GetLength: Integer;
begin
Result := System.Length(FValues);
end;
procedure TSpecifiedBoundsArray<T>.SetLength(Value: Integer);
begin
System.SetLength(FValues, Value);
end;
function TSpecifiedBoundsArray<T>.GetItem(Index: Integer): T;
begin
Result := FValues[Index-FLow];
end;
function TSpecifiedBoundsArray<T>.SetItem(Index: Integer; const Value: T);
begin
FValues[Index-FLow] := Value;
end;
I think it's pretty obvious how this works. I contemplated using a record but I consider that to be unworkable. That's down to the mix between value type semantics for FLow and reference type semantics for FValues. So, I think a class is best here.
It also behaves rather weirdly when you modify Low.
No doubt you'd want to extend this. You'd add a SetBounds, a copy to, a copy from and so on. But I think you may find it useful. It certainly shows how you can make an object that looks very much like an array with non-zero lower bound.
Normally, in Delphi one would declare a function with a variable number of arguments using the 'array of const' method. However, for compatibility with code written in C, there's an much-unknown 'varargs' directive that can be added to a function declaration (I learned this while reading Rudy's excellent 'Pitfalls of convering' document).
As an example, one could have a function in C, declared like this :
void printf(const char *fmt, ...)
In Delphi, this would become :
procedure printf(const fmt: PChar); varargs;
My question is : How can I get to the contents of the stack when implementing a method which is defined with the 'varargs' directive?
I would expect that some tooling for this exists, like Dephi translations of the va_start(), va_arg() and va_end() functions, but I can't find this anywhere.
Please help!
PS: Please don't drift off in discussions about the 'why' or the 'array of const' alternative - I need this to write C-like patches for functions inside Xbox games (see the Delphi Xbox emulator project 'Dxbx' on sourceforge for details).
OK, I see the clarification in your question to mean that you need to implement a C import in Delphi. In that case, you need to implement varargs yourself.
The basic knowledge needed is the C calling convention on the x86: the stack grows downwards, and C pushes arguments from right to left. Thus, a pointer to the last declared argument, after it is incremented by the size of the last declared argument, will point to the tail argument list. From then, it's simply a matter of reading the argument out and incrementing the pointer by an appropriate size to move deeper into the stack. The x86 stack in 32-bit mode is 4-byte aligned generally, and this also means that bytes and words are passed as 32-bit integers.
Anyhow, here's a helper record in a demo program that shows how to read out data. Note that Delphi seems to be passing Extended types in a very odd way; however, you likely won't have to worry about that, as 10-byte floats aren't generally widely used in C, and aren't even implemented in the latest MS C, IIRC.
{$apptype console}
type
TArgPtr = record
private
FArgPtr: PByte;
class function Align(Ptr: Pointer; Align: Integer): Pointer; static;
public
constructor Create(LastArg: Pointer; Size: Integer);
// Read bytes, signed words etc. using Int32
// Make an unsigned version if necessary.
function ReadInt32: Integer;
// Exact floating-point semantics depend on C compiler.
// Delphi compiler passes Extended as 10-byte float; most C
// compilers pass all floating-point values as 8-byte floats.
function ReadDouble: Double;
function ReadExtended: Extended;
function ReadPChar: PChar;
procedure ReadArg(var Arg; Size: Integer);
end;
constructor TArgPtr.Create(LastArg: Pointer; Size: Integer);
begin
FArgPtr := LastArg;
// 32-bit x86 stack is generally 4-byte aligned
FArgPtr := Align(FArgPtr + Size, 4);
end;
class function TArgPtr.Align(Ptr: Pointer; Align: Integer): Pointer;
begin
Integer(Result) := (Integer(Ptr) + Align - 1) and not (Align - 1);
end;
function TArgPtr.ReadInt32: Integer;
begin
ReadArg(Result, SizeOf(Integer));
end;
function TArgPtr.ReadDouble: Double;
begin
ReadArg(Result, SizeOf(Double));
end;
function TArgPtr.ReadExtended: Extended;
begin
ReadArg(Result, SizeOf(Extended));
end;
function TArgPtr.ReadPChar: PChar;
begin
ReadArg(Result, SizeOf(PChar));
end;
procedure TArgPtr.ReadArg(var Arg; Size: Integer);
begin
Move(FArgPtr^, Arg, Size);
FArgPtr := Align(FArgPtr + Size, 4);
end;
procedure Dump(const types: string); cdecl;
var
ap: TArgPtr;
cp: PChar;
begin
cp := PChar(types);
ap := TArgPtr.Create(#types, SizeOf(string));
while True do
begin
case cp^ of
#0:
begin
Writeln;
Exit;
end;
'i': Write(ap.ReadInt32, ' ');
'd': Write(ap.ReadDouble, ' ');
'e': Write(ap.ReadExtended, ' ');
's': Write(ap.ReadPChar, ' ');
else
Writeln('Unknown format');
Exit;
end;
Inc(cp);
end;
end;
type
PDump = procedure(const types: string) cdecl varargs;
var
MyDump: PDump;
function AsDouble(e: Extended): Double;
begin
Result := e;
end;
function AsSingle(e: Extended): Single;
begin
Result := e;
end;
procedure Go;
begin
MyDump := #Dump;
MyDump('iii', 10, 20, 30);
MyDump('sss', 'foo', 'bar', 'baz');
// Looks like Delphi passes Extended in byte-aligned
// stack offset, very strange; thus this doesn't work.
MyDump('e', 2.0);
// These two are more reliable.
MyDump('d', AsDouble(2));
// Singles passed as 8-byte floats.
MyDump('d', AsSingle(2));
end;
begin
Go;
end.
I found this (from a guy we know :))
To write this stuff properly you'll need to use BASM, Delphi's built in
assembler, and code the call sequence in asm. Hopefully you've got a good
idea of what you need to do. Perhaps a post in the .basm group will help if
you get stuck.
Delphi doesn't let you implement a varargs routine. It only works for importing external cdecl functions that use this.
Since varargs is based on the cdecl calling convention, you basically need to reimplement it yourself in Delphi, using assembly and/or various kinds of pointer manipulation.