I have several versions of my data record stored on disk:
TRec_v1 = record
Type: UInt32;
DT1: TDateTime;
end;
TRec_v2 = record
Type: UInt32;
DT1: TDateTime;
DT2: TDateTime;
end;
TRec_v3 = record
Type: UInt32;
DT1: TDateTime;
DT2: TDateTime;
DT3: TDateTime;
end;
Which is the fastest method to read it?
Currently I use this method:
var
Rec: TRec_v3;
Rec1: TRec_v1;
Rec2: TRec_v2;
FStream := TFileStream.Create(RecPath, fmOpenRead);
try
if FStream.Size = SizeOf(TRec_v1) then
// read to Rec1, assign to Rec
else
if FStream.Size = SizeOf(TRec_v2) then
// read to Rec2, assign to Rec
else
if FStream.Size = SizeOf(TRec_v3) then
// read to Rec
finally
FStream.Free;
end;
Note: every newer version contains all fields from the previous version plus new fields.
If there is only one record stored in the file, you can use a case statement instead of a series of if statements. And since your newer records contain the same fields as your older records, you don't need separate variables, either:
var
Rec: TRec_v3;
RecSize: Integer;
FStream := TFileStream.Create(RecPath, fmOpenRead);
try
RecSize := FStream.Size;
case RecSize of
SizeOf(TRec_v1),
SizeOf(TRec_v2),
SizeOf(TRec_v3):
begin
FStream.ReadBuffer(Rec, RecSize);
end;
else
raise Exception.Create('Unsupported record size detected');
end;
finally
FStream.Free;
end;
// use Rec fields depending on RecSize...
Alternatively:
type
TRec_v1 = record
Type: UInt32;
DT1: TDateTime;
end;
TRec_v2 = record
Type: UInt32;
DT1: TDateTime;
DT2: TDateTime;
end;
TRec_v3 = record
Type: UInt32;
DT1: TDateTime;
DT2: TDateTime;
DT3: TDateTime;
end;
TRec = record
case Integer of
0: (v1: TRec_v1);
1: (v2: TRec_v2);
2: (v3: TRec_v3);
end;
var
Rec: TRec;
RecSize: Integer;
FStream := TFileStream.Create(RecPath, fmOpenRead);
try
RecSize := FStream.Size;
case RecSize of
SizeOf(TRec_v1),
SizeOf(TRec_v2),
SizeOf(TRec_v3):
begin
FStream.ReadBuffer(Rec, RecSize);
end;
else
raise Exception.Create('Unsupported record size detected');
end;
finally
FStream.Free;
end;
// use Rec.v1, Rec.v2, or Rec.v3 depending on RecSize...
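One detail the single-variable approach leaves open is what the fields missing from an older file contain after the read. A minimal sketch, assuming the default 8-byte alignment was in effect when the files were written, zero-fills the record first so those fields read as 0. The field name Kind is a stand-in for Type (a reserved word), and ReadRec is a hypothetical helper; a TMemoryStream simulates the file.

```pascal
program ReadVersionedRec;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$A8} // assumption: 8-byte field alignment, matching the stored files

uses
  SysUtils, Classes;

type
  TRec_v1 = record
    Kind: UInt32;
    DT1: TDateTime;
  end;
  TRec_v2 = record
    Kind: UInt32;
    DT1: TDateTime;
    DT2: TDateTime;
  end;
  TRec_v3 = record
    Kind: UInt32;
    DT1: TDateTime;
    DT2: TDateTime;
    DT3: TDateTime;
  end;

// Hypothetical helper: reads a v1..v3 record from the start of Stream
// into Rec and returns the number of bytes that were stored.
function ReadRec(Stream: TStream; out Rec: TRec_v3): Integer;
begin
  Result := Stream.Size;
  case Result of
    SizeOf(TRec_v1), SizeOf(TRec_v2), SizeOf(TRec_v3):
      begin
        // Fields absent from older versions end up as 0.
        FillChar(Rec, SizeOf(Rec), 0);
        Stream.Position := 0;
        Stream.ReadBuffer(Rec, Result);
      end;
  else
    raise Exception.Create('Unsupported record size detected');
  end;
end;

var
  Stream: TMemoryStream;
  Old: TRec_v1;
  Rec: TRec_v3;
  Size: Integer;
begin
  Old.Kind := 1;
  Old.DT1 := EncodeDate(2020, 1, 2);
  Stream := TMemoryStream.Create;
  try
    Stream.WriteBuffer(Old, SizeOf(Old)); // simulate a file written as v1
    Size := ReadRec(Stream, Rec);
    Writeln('bytes read: ', Size);
    Writeln('DT2 is zero: ', Rec.DT2 = 0);
  finally
    Stream.Free;
  end;
end.
```

This only works because each older layout is a byte-for-byte prefix of the newer one, which holds for these records but not for versioned records in general.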
Which is the fastest method to read it?
The performance of the code to read the record will be completely dominated by the file access. The majority of the time is spent opening the file, as written in the question.
Using a case statement or if statements is simply a matter of preference and will not lead to observable performance changes.
If this code is buried in a greater whole, then I don't think anyone can advise on the performance without a clear sight of that greater code.
Given the code in the question, the only scope for improving the performance in a measurable way is to evaluate the stream size one time only rather than multiple times.
var
Size: Int64;
....
Size := Stream.Size;
// test Size
Even here I doubt you will see a discernible performance impact. However, it is better not to repeat yourself, as a general rule, and this change results in better factored code.
You must measure performance to assess a proposed optimization.
Finally, your entire approach is brittle. If you add an integer to the v3 structure the record size is increased by 8, with padding due to alignment. Add another integer and the size doesn't change, that second integer fitting in the padding. Discriminating based on the type field would be more robust and extendable.
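The padding effect described above can be checked with SizeOf. This is a small sketch, assuming 8-byte field alignment; TRec_v4 and TRec_v5 are hypothetical extensions of the v3 record, and Kind stands in for the Type field because Type is a reserved word.

```pascal
program RecSizes;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$A8} // assumption: 8-byte field alignment

type
  TRec_v3 = record
    Kind: UInt32;
    DT1, DT2, DT3: TDateTime;
  end;

  // Hypothetical v4: v3 plus one Integer at the end.
  TRec_v4 = record
    Kind: UInt32;
    DT1, DT2, DT3: TDateTime;
    A: Integer;
  end;

  // Hypothetical v5: v4 plus a second Integer, which fits in v4's padding.
  TRec_v5 = record
    Kind: UInt32;
    DT1, DT2, DT3: TDateTime;
    A, B: Integer;
  end;

begin
  Writeln('v3: ', SizeOf(TRec_v3));
  Writeln('v4: ', SizeOf(TRec_v4));
  Writeln('v5: ', SizeOf(TRec_v5));
end.
```

Under that alignment assumption the sizes should come out as 32, 40 and 40, so a size-based check could not tell the two hypothetical versions apart.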
I would recommend creating, reading and writing a variant record, then differentiating between them with a tag:
type recordTypeName = record
fieldList1: type1;
...
fieldListn: typen;
case tag: ordinalType of
constantList1: (variant1);
...
constantListn: (variantn);
end;
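Applied to the records from the question, the template above might be filled in as follows. This is only a sketch of a new file layout, not one compatible with the existing files: the Version tag, the V2_/V3_ field names and DateFieldCount are hypothetical, and Kind replaces the reserved word Type. A TMemoryStream stands in for the file.

```pascal
program TaggedVariantRec;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Common fields first, then a tag that says which trailing
  // fields are meaningful. Field names must be unique across variants.
  TRec = record
    Kind: UInt32;
    DT1: TDateTime;
    case Version: Byte of
      1: ();
      2: (V2_DT2: TDateTime);
      3: (V3_DT2, V3_DT3: TDateTime);
  end;

// Number of TDateTime fields that are valid for the stored version.
function DateFieldCount(const Rec: TRec): Integer;
begin
  case Rec.Version of
    1: Result := 1;
    2: Result := 2;
    3: Result := 3;
  else
    raise Exception.Create('Unknown record version');
  end;
end;

var
  Stream: TMemoryStream;
  Written, Loaded: TRec;
begin
  FillChar(Written, SizeOf(Written), 0);
  Written.Kind := 7;
  Written.DT1 := EncodeDate(2020, 1, 2);
  Written.Version := 2;
  Written.V2_DT2 := EncodeDate(2021, 3, 4);

  Stream := TMemoryStream.Create;
  try
    // The full record is always written, whatever the version.
    Stream.WriteBuffer(Written, SizeOf(Written));
    Stream.Position := 0;
    Stream.ReadBuffer(Loaded, SizeOf(Loaded));
  finally
    Stream.Free;
  end;

  Writeln('version: ', Loaded.Version);          // version: 2
  Writeln('date fields: ', DateFieldCount(Loaded)); // date fields: 2
end.
```

The trade-off is that every record occupies the size of the largest variant on disk, in exchange for a reader that never has to infer the version from the file size.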
Related
While coding a test program I ran across this sample code:
var MyData: TArray<TOrderByCustomer>;
procedure AppendData(AItem: TOrderByCustomer);
var i: Integer;
tmp: TArray<TOrderByCustomer>;
begin
SetLength(tmp, Length(MyData)+1);
for i:=0 to High(MyData) do
tmp[i]:=MyData[i];
tmp[Length(tmp)-1]:=AItem;
MyData:=tmp;
end;
In the past I have simply used SetLength() to change the length of the actual array. Why did the author of this sample create a temporary array and then assign all of the values to it and copy it back to the original array? Is there a good reason for this, or was he just being weird?
There is no good reason for this code. It introduces the potential overhead of needless copying. The function should be written like this:
procedure AppendData(const AItem: TOrderByCustomer);
var
N: Integer;
begin
N := Length(MyData);
SetLength(MyData, N + 1);
MyData[N] := AItem;
end;
I use Delphi 10.1 Berlin in Windows 10.
I have two records of different sizes. I wrote code to loop through two TList<T> of these records to test elapsed times. Looping through the list of the larger record runs much slower.
Can anyone explain the reason, and provide a solution to make the loop run faster?
type
tTestRecord1 = record
Field1: array[0..4] of Integer;
Field2: array[0..4] of Extended;
Field3: string;
end;
tTestRecord2 = record
Field1: array[0..4999] of Integer;
Field2: array[0..4999] of Extended;
Field3: string;
end;
procedure TForm1.Button1Click(Sender: TObject);
var
_List: TList<tTestRecord1>;
_Record: tTestRecord1;
_Time: TTime;
i: Integer;
begin
_List := TList<tTestRecord1>.Create;
for i := 0 to 4999 do
begin
_List.Add(_Record);
end;
_Time := Time;
for i := 0 to 4999 do
begin
if _List[i].Field3 = 'abcde' then
begin
Break;
end;
end;
Button1.Caption := FormatDateTime('s.zzz', Time - _Time); // 0.000
_List.Free;
end;
procedure TForm1.Button2Click(Sender: TObject);
var
_List: TList<tTestRecord2>;
_Record: tTestRecord2;
_Time: TTime;
i: Integer;
begin
_List := TList<tTestRecord2>.Create;
for i := 0 to 4999 do
begin
_List.Add(_Record);
end;
_Time := Time;
for i := 0 to 4999 do
begin
if _List[i].Field3 = 'abcde' then
begin
Break;
end;
end;
Button2.Caption := FormatDateTime('s.zzz', Time - _Time); // 0.045
_List.Free;
end;
First of all, I want to consider the entire code, even the code that populates the list, which I realise you have not timed. Because the second record is larger, more memory needs to be copied whenever you assign a value of that record type. Further, when you read from the list, the larger record is less cache friendly than the smaller record, which impacts performance. This latter effect is likely less significant than the former.
Related to this is that as you add items the list's internal array of records has to be resized. Sometimes the resizing leads to a reallocation that cannot be performed in-place. When that happens a new block of memory is allocated and the previous content is copied to this new block. That copy is clearly more expensive for the larger record. You can mitigate this by allocating the array once up front if you know its length. The list's Capacity property is the mechanism to use. Of course, you will not always know the length ahead of time.
Your program does very little beyond memory allocation and memory access. Hence the performance of these memory operations dominates.
Now, your timing is only of the code that reads from the lists. So the memory copy performance difference on population is not part of the benchmarking that you performed. Your timing differences are mainly down to excessive memory copy when reading, as I will explain below.
Consider this code:
if _List[i].Field3 = 'abcde' then
Because _List[i] is a record, a value type, the entire record is copied to an implicit hidden local variable. The code is actually equivalent to:
var
tmp: tTestRecord2;
...
tmp := _List[i]; // copy of entire record
if tmp.Field3 = 'abcde' then
There are a few ways to avoid this copy:
1. Change the underlying type to be a reference type. This changes the memory management requirements. And you may have good reason to want to use a value type.
2. Use a container class that can return the address of an item rather than a copy of an item.
3. Switch from TList<T> to dynamic array TArray<T>. That simple change will allow the compiler to access individual fields directly without copying entire records.
4. Use the TList<T>.List to obtain access to the list object's underlying array holding the data. That would have the same effect as the previous item.
Item 4 above is the simplest change you could make to see a large difference. You would replace
if _List[i].Field3 = 'abcde' then
with
if _List.List[i].Field3 = 'abcde' then
and that should yield a very significant change in performance.
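The dynamic array alternative from the list above could look roughly like the following in Delphi. It is a sketch only: IndexOfField3 is a hypothetical helper, and the array is kept small so the program runs in a 32-bit process.

```pascal
program ArrayAccess;
{$APPTYPE CONSOLE}

type
  tTestRecord2 = record
    Field1: array[0..4999] of Integer;
    Field2: array[0..4999] of Extended;
    Field3: string;
  end;

// Hypothetical helper: index of the first record whose Field3 equals
// Value, or -1 if there is none.
function IndexOfField3(const Items: TArray<tTestRecord2>;
  const Value: string): Integer;
var
  i: Integer;
begin
  for i := 0 to High(Items) do
    // Indexing a dynamic array addresses the element in place,
    // so only Field3 is read and no record is copied.
    if Items[i].Field3 = Value then
    begin
      Result := i;
      Exit;
    end;
  Result := -1;
end;

var
  Items: TArray<tTestRecord2>;
begin
  SetLength(Items, 100); // new elements are zero-initialised, Field3 = ''
  Items[42].Field3 := 'abcde';
  Writeln(IndexOfField3(Items, 'abcde')); // 42
end.
```

The cost is that TList<T> conveniences such as Add and automatic growth have to be handled by hand with SetLength.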
Consider this program:
{$APPTYPE CONSOLE}
uses
System.Diagnostics,
System.Generics.Collections;
type
tTestRecord2 = record
Field1: array[0..4999] of Integer;
Field2: array[0..4999] of Extended;
Field3: string;
end;
procedure Main;
const
N = 100000;
var
i: Integer;
Stopwatch: TStopwatch;
List: TList<tTestRecord2>;
Rec: tTestRecord2;
begin
List := TList<tTestRecord2>.Create;
List.Capacity := N;
for i := 0 to N-1 do
begin
List.Add(Rec);
end;
Stopwatch := TStopwatch.StartNew;
for i := 0 to N-1 do
begin
if List[i].Field3 = 'abcde' then
begin
Break;
end;
end;
Writeln(Stopwatch.ElapsedMilliseconds);
end;
begin
Main;
Readln;
end.
I had to compile it for 64 bit to avoid an out of memory condition. The output on my machine is around 700. Change List[i].Field3 to List.List[i].Field3 and the output is in single figures. The timing is rather crude, but I think this demonstrates the point.
The issue of the large record not being cache friendly remains. That is more complicated to deal with and would require a detailed analysis of how the real world code operated on its data.
As an aside, if you care about performance then you won't use Extended. Because it has size 10, not a power of two, memory access is frequently mis-aligned. Use Double or Real which is an alias to Double.
As the topic indicates above, I'm wondering if there's a good example of a clean and efficient way to handle pointers as passed in function parms when processing the data sequentially. What I have is something like:
function myfunc(inptr: pointer; inptrsize: longint): boolean;
var
inproc: pointer;
i: integer;
begin
inproc := inptr;
for i := 1 to inptrsize do
begin
// do stuff against byte data here.
inc(longint(inproc), 1);
end;
end;
The idea is that instead of finite pieces of data, I want it to be able to process whatever is pushed its way, no matter the size.
Now when it comes to processing the data, I've figured out a couple of ways to do it successfully.
1. Assign the parm pointers to identical temporary pointers, then use those to access each piece of data, incrementing them to move on. This method is quickest, but not very clean looking with all the pointer increments spread all over the code. (This is what I'm talking about above.)
2. Assign the parm pointers to a pointer representing a big array value and then incrementally process that using standard table logic. Much cleaner, but about 500 ms slower than #1.
Is there another way to efficiently handle processing pointers in this way, or is there some method I'm missing that will both be clean and not time inefficient?
Your code here is basically fine. I would always choose to increment a pointer rather than cast to a fake array.
But you should not cast to an integer. That is semantically wrong and you'll pay the penalty anytime you compile on a platform that has pointer size different from your integer size. Always use a pointer to an element of the right size. In this case a pointer to byte.
function MyFunc(Data: PByte; Length: Integer): Boolean;
var
i: Integer;
begin
for i := 1 to Length do
begin
// do stuff against byte data here.
inc(Data);
end;
end;
Unless the compiler is having a really bad day, you won't find it easy to get better performing code than this. What's more, I think this style is actually rather clear and easy to understand. Most of the clarity gain comes in avoiding the need to cast. Always strive to remove casts from your code.
If you want to allow any pointer type to be passed then you can write it like this:
function MyFunc(P: Pointer; Length: Integer): Boolean;
var
i: Integer;
Data: PByte;
begin
Data := P;
for i := 1 to Length do
begin
// do stuff against byte data here.
inc(Data);
end;
end;
Or if you want to avoid pointers in the interface, then use an untyped const parameter.
function MyFunc(const Buffer; Length: Integer): Boolean;
var
i: Integer;
Data: PByte;
begin
Data := PByte(@Buffer);
for i := 1 to Length do
begin
// do stuff against byte data here.
inc(Data);
end;
end;
Use a var parameter if you need to modify the buffer.
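As an illustration of the var variant, here is a minimal sketch that modifies the buffer in place. XorBytes and its Key parameter are hypothetical, chosen only to give the loop something to do.

```pascal
program XorBuffer;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical example: XORs every byte of Buffer with Key, in place.
procedure XorBytes(var Buffer; Length: Integer; Key: Byte);
var
  i: Integer;
  Data: PByte;
begin
  Data := @Buffer; // address of the caller's data, no cast to integer
  for i := 1 to Length do
  begin
    Data^ := Data^ xor Key;
    Inc(Data);
  end;
end;

var
  Bytes: array[0..3] of Byte;
begin
  Bytes[0] := 1;
  Bytes[1] := 2;
  Bytes[2] := 3;
  Bytes[3] := 4;
  XorBytes(Bytes, SizeOf(Bytes), $FF);
  Writeln(Bytes[0], ' ', Bytes[3]); // 254 251
end.
```

Because the parameter is untyped, the caller passes the variable itself rather than its address, and the compiler cannot check that Length matches the size of the data.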
I have a different opinion: for the sake of readability I would use an array. Pascal was not designed to access memory directly. The original Pascal did not even have pointer arithmetic.
This is how I would use an array:
function MyFunc(P: Pointer; Length: Integer): Boolean;
var
ArrayPtr : PByteArray Absolute P;
I : Integer;
begin
For I := 0 to Length-1 do
// do stuff against ArrayPtr^[I]
end;
But if performance matters, I would write it like this:
function MyFunc(P: Pointer; Length: Integer): Boolean;
var
Data, EndOfMemoryBlock: PByte;
begin
Data := P;
// One past the last byte of the block.
EndOfMemoryBlock := PByte(NativeUInt(P) + NativeUInt(Length));
while NativeUInt(Data) < NativeUInt(EndOfMemoryBlock) do
begin
// do stuff against byte data here.
Inc(Data);
end;
end;
So I'm working in Delphi 2007 and I am cleaning up my code. I have come to notice that in a great many procedures I declare a number of different variables of the same type.
For example, in the procedure I am looking at now I declare 4 different string lists, and I have to type var1 := TStringList.Create for each one.
I had the idea to make a procedure that took in an open array of variables, my list of 4 variables and then create them all. The call would be something like this
CreateStringLists([var1,var2,var3,var4]);
But to my knowledge you cannot pass the open array by reference, and therefore cannot do what I was hoping to. Does anyone have any interesting ideas about this?
Often in refactoring you need to take a very wide view of the code. Why "cleanup" a couple of operations like this, when most likely you shouldn't be doing any of these operations at all?
In this case, it seems suspicious to me that you have one routine that needs to deal with 4 separate string lists. That doesn't seem very likely to have good cohesion. Perhaps instead it should be one string list-handling routine called four times. So I'd really like to see the entire routine, rather than comment on how to make this one nit in it prettier.
You can do anything (or nearly anything) with Delphi. I don't recommend using the following code; it is only here to show that the trick is possible:
type
PStringList = ^TStringList;
procedure CreateStringLists(const SL: array of PStringList);
var
I: Integer;
begin
for I:= 0 to High(SL) do begin
SL[I]^:= TStringList.Create;
end;
end;
procedure TForm1.Button2Click(Sender: TObject);
var
SL1, SL2, SL3: TStringList;
begin
CreateStringLists([@SL1, @SL2, @SL3]);
SL3.Add('123');
Caption:= SL3[0];
SL1.Free;
SL2.Free;
SL3.Free;
end;
Actually, what's the problem with 4 constructors?
If it makes sense in your context, you can aggregate declarations inside a specialized TObjectList.
type
TMyList<T:class,constructor> = class(TObjectList<T>)
public
procedure CreateItems(const ACount : integer);
end;
procedure TMyList<T>.CreateItems(const ACount: integer);
var
Index: Integer;
begin
for Index := 0 to (ACount - 1) do Add(T.Create);
end;
// Test procedure
procedure TestMe;
var
MyStringsList : TMyList<TStringList>;
begin
MyStringsList := TMyList<TStringList>.Create(True);
MyStringsList.CreateItems(10);
// ...
FreeAndNil(MyStringsList);
end;
This way you can specialize your list.
You could create a series of overloaded versions with 2, 3, 4 etc. parameters. For example:
procedure CreateStringLists(var L1, L2: TStringList); overload;
procedure CreateStringLists(var L1, L2, L3: TStringList); overload;
procedure CreateStringLists(var L1, L2, L3, L4: TStringList); overload;
procedure CreateStringLists(var L1, L2: TStringList);
begin
L1 := nil;
L2 := nil;
Try
L1 := TStringList.Create;
L2 := TStringList.Create;
Except
FreeAndNil(L2);
FreeAndNil(L1);
raise;
End;
end;
// etc.
If I were doing this, I'd write a script to generate the code.
As an aside, in my own code, I would write InitialiseNil(L1, L2) at the start of that function, and FreeAndNil(L2, L1) in the exception handler. InitialiseNil and FreeAndNil are functions generated by a very simple Python script that is included in the codebase as a comment so that it can be re-run. A routine like CreateStringLists as defined above is only useful if you have a matching routine to free them all in one shot. This allows you to write:
CreateStringLists(L1, L2);
Try
// do stuff with L1, L2
Finally
FreeAndNil(L2, L1);
End;
Finally, I'm not saying that I would necessarily do this, but this is meant as a naive and direct answer to the question. As @T.E.D. states, the need to do this suggests deeper problems in the codebase.
I am trying to write a kind of object/record serializer with Delphi 2010 and wonder if there is a way to detect whether a record is a variant record. E.g. the TRect record as defined in Types.pas:
TRect = record
case Integer of
0: (Left, Top, Right, Bottom: Longint);
1: (TopLeft, BottomRight: TPoint);
end;
As my serializer should work recursively on my data structures, it will descend into the TPoint records and generate redundant information in my serialized file. Is there a way to avoid this by getting detailed information on the record?
One solution could be as follows:
procedure SerializeRecord(RttiRecord: TRttiRecordType);
var
AField : TRttiField;
Offset : Integer;
begin
Offset := 0;
for AField in RttiRecord.GetFields do
begin
if AField.Offset < Offset then Exit;
Offset := AField.Offset; //store last offset
SerializeField (AField);
end;
end;
But this solution is not a proper solution for all cases. It only works for serialization, if the different variants contain the same information and the same types. If you have something like the following (from wikipedia.org):
type
TVarRec = packed record
case Byte of
0: (FByte: Byte;
FDouble: Double);
1: (FStr: ShortString);
end;
Would you serialize
FByte=6
FDouble=1.81630607010916E-0310
or would it be better to serialize
FStr=Hello!
Admittedly, to a computer both are the same, but not for a file that should be readable or even editable by humans.
So I think, the only way to solve the problem is using an Attribute, to define, which variant should be used for serialization.