Passing open array into an anonymous function - delphi

What is the least wasteful way (i.e. avoiding copying if at all possible) to pass the content of an open string array into an anonymous function and from there into another function that expects an open array?
The problem is that open arrays cannot be captured in anonymous functions in Delphi XE2.
This illustrates the problem:
procedure TMyClass.DoSomething(const aStrings: array of string);
begin
  EnumItems(
    function (aItem: string) : Boolean
    begin
      Result := IndexText(aItem, aStrings) >= 0;
    end);
end;
The compiler complains: "Cannot capture symbol 'aStrings'".
An obvious solution is to make a copy of aStrings in a dynamic array and capture that. But I don't want to make a copy. (While my specific problem involves a string array and making a copy would only copy the pointers not the string data itself due to reference counting, it would also be instructive to learn how to solve the problem for an arbitrarily large array of a non-reference counted type.)
So I tried capturing a PString pointer to the first string in aStrings and an Integer value of the length instead. But then I couldn't figure out a way to pass these to IndexText.
One other constraint: I want it to be possible to call DoSomething([a, b, c]).
Is there a way to do this without making a copy of the array, and without writing my own version of IndexText, and without being hideously ugly? If so, how?
For the sake of this question I've used IndexText, but it would be instructive to find a solution for a function that could not be trivially rewritten to accept a pointer and length parameter instead of an open array.
An acceptable answer to this question would be: No, you can't do that (at least not without making a copy or rewriting IndexText) though if so I'd also like to know the fundamental reason why not.

If you don't want to copy the array, then you should change the signature of DoSomething to take a TArray<string> instead. Of course, you then have to change the call sites that pass the elements directly, like DoSomething([a, b, c]); only since XE7 can you pass an array constant to a dynamic array parameter that way.
My advice is not to mess around with internal pointers and the like, especially not for an open array.
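A minimal sketch of this suggestion as a self-contained console program, assuming Delphi XE2 or later (anonymous methods, TArray<T>). The question does not show EnumItems, so TItemPredicate, CountMatching and CountKnown below are hypothetical stand-ins; IndexText is the real StrUtils function.

```pascal
program DynArrayCapture;

{$APPTYPE CONSOLE}

uses
  System.StrUtils;

type
  // Hypothetical callback type; the real EnumItems signature is not shown.
  TItemPredicate = reference to function(aItem: string): Boolean;

// Hypothetical stand-in for EnumItems: counts the items the predicate accepts.
function CountMatching(const aItems: array of string; aPredicate: TItemPredicate): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to High(aItems) do
    if aPredicate(aItems[I]) then
      Inc(Result);
end;

// A dynamic-array parameter can be captured, unlike an open array parameter.
function CountKnown(const aItems: array of string; const aStrings: TArray<string>): Integer;
begin
  Result := CountMatching(aItems,
    function(aItem: string): Boolean
    begin
      // A TArray<string> is accepted where IndexText expects an open array.
      Result := IndexText(aItem, aStrings) >= 0;
    end);
end;

begin
  // Before XE7 the caller has to build the dynamic array explicitly.
  Writeln(CountKnown(['a', 'B', 'x'], TArray<string>.Create('A', 'b', 'c')));
end.
```

No element copy happens inside CountKnown; the price is that the caller, not the callee, constructs the dynamic array.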

There's no way to do this without making a copy. Open arrays cannot be captured as you have found, and you cannot get the information into the anonymous method without capture. You must capture, in general, because you need to extend the life of the variable.
So, you cannot do this with an open array and avoid a copy. You could instead:
Switch from open array to a dynamic array, TArray<string>.
Make a copy of the array. You would not be copying the string data, just the array of references to the strings.
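A sketch of the second option, keeping the open-array signature so callers can still pass [a, b, c]. As before, TItemPredicate, CountMatching and CountKnown are hypothetical stand-ins for the question's EnumItems and DoSomething, and Delphi XE2 or later is assumed.

```pascal
program OpenArrayCopyCapture;

{$APPTYPE CONSOLE}

uses
  System.StrUtils;

type
  // Hypothetical callback type standing in for whatever EnumItems expects.
  TItemPredicate = reference to function(aItem: string): Boolean;

// Hypothetical stand-in for EnumItems: counts the items the predicate accepts.
function CountMatching(const aItems: array of string; aPredicate: TItemPredicate): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to High(aItems) do
    if aPredicate(aItems[I]) then
      Inc(Result);
end;

function CountKnown(const aItems, aStrings: array of string): Integer;
var
  Captured: TArray<string>;
  I: Integer;
begin
  // Copy the open array into a capturable dynamic array. For strings this
  // copies references (and bumps reference counts), not the character data.
  SetLength(Captured, Length(aStrings));
  for I := 0 to High(aStrings) do
    Captured[I] := aStrings[I];
  Result := CountMatching(aItems,
    function(aItem: string): Boolean
    begin
      Result := IndexText(aItem, Captured) >= 0;
    end);
end;

begin
  Writeln(CountKnown(['a', 'B', 'x'], ['A', 'b', 'c']));
end.
```

For a non-reference-counted element type the same loop would copy the element values themselves, which is exactly the cost the question hoped to avoid.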

Related

String lifetime management, in records

I am working on getting rid of shortstring.
One of the many places shortstring is currently used within our programs is in records.
A lot of these records are kept in AVL trees.
The AVL tree used is a generic one, holding a pointer to a number of bytes (ElemSize), which has worked well so far.
The memory for each record in the AVL tree is allocated with GetMem, and copied with Move.
However, with string being a pointer to a reference-counted structure, copying the memory back to a record no longer works, as the string referenced is often freed (automatically, by reference counting).
With only a pointer and a size of the "data block", I assume it is not possible to have the reference count of the strings increased.
I'm looking for a way to get the reference count of the strings to be taken into account when storing the record in an AVL tree.
Can I pass the record type to the tree constructor, then cast the pointer to this type and thus get the references increased? Or a similar fix, where I can isolate the changes to primarily be in the AVL unit and calls to its constructor.
Current code for allocation of space to store the record in AVL; XData is a pointer to the record to be stored:
New(RootPtr); { create new memory space }
GetMem(RootPtr^.TreeData, ElemSize);
WITH RootPtr^ DO BEGIN
{ copy data }
Move(XData^, RootPtr^.TreeData^, ElemSize);
In essence the question you are asking is:
How can I allocate, copy and deallocate a record when all I know about its type is its size?
The simple answer is that you can use GetMem, Move and FreeMem provided that the record does not contain managed types. You wish to work with records that contain Delphi strings, which are managed. And so your current approach using GetMem and Move does not suffice.
There are plenty of ways to solve this. You could write your own code to do reference counting, so long as you knew where in the record the managed types were. I don't recommend this. You could make your user data be a class and use polymorphism to help.
The option I'd like to discuss continues to support records and indeed allows the user to choose whatever type they like. The reasoning is as follows:
If the type contains managed types, then operating on it requires knowledge of the type. If the tree is to be generic, then it cannot have that knowledge. Ergo, the knowledge must be supplied by the user of the tree.
This leads you to events. Let the tree offer events that the user can supply handlers for. The types would look like this:
type
  PTreeNodeUserData = type Pointer;
  TTreeNodeCreateUserDataEvent = function: PTreeNodeUserData of object;
  TTreeNodeDestroyUserDataEvent = procedure(Data: PTreeNodeUserData) of object;
  TTreeNodeCopyUserDataEvent = procedure(Source, Dest: PTreeNodeUserData) of object;
Then you can arrange for your tree to publish events with these types that the user can subscribe to.
The point being that this allows the user of the tree to supply the missing knowledge about the user data type.
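To make the event idea concrete, here is a sketch under simplifying assumptions: TDataHolder is a hypothetical one-node stand-in for the AVL tree, and TPerson/TPersonHandler are an example user record and handler. The holder never looks inside the data; only the handler knows the record type, so only it uses New, Dispose and :=.

```pascal
program TreeUserDataEvents;

{$APPTYPE CONSOLE}

type
  // Event types from the answer above.
  PTreeNodeUserData = type Pointer;
  TTreeNodeCreateUserDataEvent = function: PTreeNodeUserData of object;
  TTreeNodeDestroyUserDataEvent = procedure(Data: PTreeNodeUserData) of object;
  TTreeNodeCopyUserDataEvent = procedure(Source, Dest: PTreeNodeUserData) of object;

  // Hypothetical, heavily simplified container standing in for one tree node.
  TDataHolder = class
  private
    FData: PTreeNodeUserData;
  public
    OnCreateUserData: TTreeNodeCreateUserDataEvent;
    OnDestroyUserData: TTreeNodeDestroyUserDataEvent;
    OnCopyUserData: TTreeNodeCopyUserDataEvent;
    procedure Store(Source: PTreeNodeUserData);
    destructor Destroy; override;
    property Data: PTreeNodeUserData read FData;
  end;

  // Example user record with a managed (long string) field.
  PPerson = ^TPerson;
  TPerson = record
    Name: string;
  end;

  // Supplied by the user of the tree, who does know the record type.
  TPersonHandler = class
  public
    function CreateData: PTreeNodeUserData;
    procedure DestroyData(Data: PTreeNodeUserData);
    procedure CopyData(Source, Dest: PTreeNodeUserData);
  end;

procedure TDataHolder.Store(Source: PTreeNodeUserData);
begin
  if FData = nil then
    FData := OnCreateUserData();
  OnCopyUserData(Source, FData);
end;

destructor TDataHolder.Destroy;
begin
  if FData <> nil then
    OnDestroyUserData(FData);
  inherited;
end;

function TPersonHandler.CreateData: PTreeNodeUserData;
var
  P: PPerson;
begin
  New(P);                               // New initialises Name to ''
  Result := PTreeNodeUserData(P);
end;

procedure TPersonHandler.DestroyData(Data: PTreeNodeUserData);
begin
  Dispose(PPerson(Data));               // finalises Name, then frees the memory
end;

procedure TPersonHandler.CopyData(Source, Dest: PTreeNodeUserData);
begin
  PPerson(Dest)^ := PPerson(Source)^;   // := keeps the reference counts correct
end;

// Stores a copy, clears the original, then reads the stored name back.
function StoreAndReadBack(const AName: string): string;
var
  Handler: TPersonHandler;
  Holder: TDataHolder;
  Original: TPerson;
begin
  Handler := TPersonHandler.Create;
  Holder := TDataHolder.Create;
  try
    Holder.OnCreateUserData := Handler.CreateData;
    Holder.OnDestroyUserData := Handler.DestroyData;
    Holder.OnCopyUserData := Handler.CopyData;
    Original.Name := AName;
    Holder.Store(@Original);
    Original.Name := '';                // the stored copy keeps its own reference
    Result := PPerson(Holder.Data)^.Name;
  finally
    Holder.Free;                        // freed first: its destructor calls Handler
    Handler.Free;
  end;
end;

begin
  Writeln(StoreAndReadBack('Alice'));
end.
```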
One of the main benefits of using records is the simplicity with which they can be copied (without using Move). So your best solution is to simply replace Move with a normal assignment operator :=. This will correctly consider the reference counts for all managed types involved.
Is there a particular reason you're not using the normal assignment operator?
PS: You need to ensure that the memory for all managed types (including long strings) is correctly initialised and finalised. I suggest you do some additional reading on the Initialize and Finalize routines.
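A sketch of what the PS above means when raw GetMem/FreeMem allocation is kept, assuming the code doing the copy knows the record type (TItem, CloneItem, FreeItem and CloneAndReadKey are illustrative names). New and Dispose would perform the Initialize/Finalize steps for you.

```pascal
program ManagedRecordCopy;

{$APPTYPE CONSOLE}

type
  // Example record with a managed field (a long string).
  PItem = ^TItem;
  TItem = record
    Key: string;
    Value: Integer;
  end;

// Raw allocation, but the managed field is handled by the compiler.
function CloneItem(const Source: TItem): PItem;
begin
  GetMem(Result, SizeOf(TItem));
  Initialize(Result^);  // sets Key to '' so the assignment does not release garbage
  Result^ := Source;    // record assignment, unlike Move, adjusts Key's reference count
end;

procedure FreeItem(Item: PItem);
begin
  Finalize(Item^);      // releases the reference held by Key
  FreeMem(Item);
end;

// Clones a record, clears the original, and reads the clone's key.
function CloneAndReadKey(const AKey: string): string;
var
  Original: TItem;
  Clone: PItem;
begin
  Original.Key := AKey;
  Original.Value := 1;
  Clone := CloneItem(Original);
  try
    Original.Key := '';
    Result := Clone^.Key;
  finally
    FreeItem(Clone);
  end;
end;

begin
  Writeln(CloneAndReadKey('abc'));
end.
```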
The tree is general, it can hold a given lump of data. I hoped I could extend the functionality without making a new tree class per record.
In that case you need your "copy behaviour" to vary depending on what it's working with. A couple of options:
If your tree is wrapped in a class you can easily modify it to use a callback event to perform the copy operation. (This option might be easiest even if you first have to work on encapsulating the tree in a class.)
Modify your nodes and/or data to be objects with polymorphic copy functionality. Then each subtype will know how to copy itself correctly, and you can write something along the lines of Root.TreeData := XData.CreateCopy;
If you are working at such a low level, and don't want the compiler to help you, then you need to use PChar strings instead of regular strings.

Delphi: how to dynamically "split" a string into substrings according to a (dynamic) mask

This is my situation: I have a text file containing a lot of equal-length strings representing records to be loaded into an SQL DB table, so I'll have to generate SQL code from these strings.
I then have a table in that DB (let's call it the "formatting table") that tells me how the strings are formatted and where to load them (each record of that table contains the destination table name, field name, data position and length, referred to the strings from the text file).
I have already solved that problem in a way I think is well known to every Delphi programmer, using the Copy(string, pos, length) function and iterating through each field, based on the information from the "formatting table".
That works well, but it's slow, especially when we talk of source text files with a million or more lines, each representing several tens or even hundreds of data fields.
What I'm trying to do now is to "see" the source strings in a way that they appear already split, avoiding the Copy() function that continuously creates new strings, copying the content from the original string, allocating and freeing memory and so on. What I'd like to say is "I have the whole string; let's see it in a way that represents each 'piece' (field) of it in a single step, without creating substrings from it".
What could solve my problem would be some way to define a dynamic structure like a dynamic record or a dynamic array (not what Delphi calls a dynamic array, more something like a "dynamic static array") to "superimpose" on the string in order to "watch" it from that point of view... I don't know if I'm sufficiently clear in that explanation... However, Delphi (to my knowledge) doesn't implement such dynamic structures.
This is a piece of (static) code that does what I want, apart from the lack of dynamism.
procedure TForm1.FormCreate(Sender: TObject);
type
  PDecodeStr = ^TDecodeStr;
  TDecodeStr = record
    s1: Array[0..3] of AnsiChar;
    s2: Array[0..9] of AnsiChar;
    s3: Array[0..4] of AnsiChar;
    s4: Array[0..7] of AnsiChar;
    s5: Array[0..2] of AnsiChar;
  end;
var
  cWholeStr: AnsiString;
begin
  cWholeStr := '123456789012345678901234567890';
  Memo1.Lines.Add(PDecodeStr(PAnsiString(cWholeStr)).s1);
  Memo1.Lines.Add(PDecodeStr(PAnsiString(cWholeStr)).s2);
  Memo1.Lines.Add(PDecodeStr(PAnsiString(cWholeStr)).s3);
  Memo1.Lines.Add(PDecodeStr(PAnsiString(cWholeStr)).s4);
  Memo1.Lines.Add(PDecodeStr(PAnsiString(cWholeStr)).s5);
end;
Any idea on how to solve this problem?
Thanks in advance.
You can't really avoid creating extra strings. Your example at the end of your question creates strings.
Memo1.Lines.Add(PDecodeStr(PAnsiString(cWholeStr)).s1)
Your call to TStrings.Add() in this code creates a dynamic string implicitly from the parameter you pass and then this string is passed to Add().
The solution with Copy is probably the way to go since I don't see any easy way to avoid the copying of memory if you wish to do anything with the split strings.
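That said, if the goal is only to postpone copying until a field is really needed, the "superimposed structure" from the question can be made dynamic: turn the formatting table into (position, length) pairs and keep pointer/length views into the line. This is a sketch with hypothetical names (TFieldSpec, TFieldView, MakeView, ViewToStr). It does not avoid the copy once a field becomes a string, it does no range checking, and every view dangles as soon as the source line is freed or modified.

```pascal
program FieldViews;

{$APPTYPE CONSOLE}

type
  // One row of the (hypothetical) formatting table: 1-based start and length.
  TFieldSpec = record
    Start: Integer;
    Len: Integer;
  end;

  // A view into the source line: just a pointer and a length, no allocation.
  TFieldView = record
    P: PAnsiChar;
    Len: Integer;
  end;

const
  // Same layout as the static TDecodeStr record in the question.
  Specs: array[0..4] of TFieldSpec = (
    (Start: 1; Len: 4), (Start: 5; Len: 10), (Start: 15; Len: 5),
    (Start: 20; Len: 8), (Start: 28; Len: 3));
  SampleLine: AnsiString = '123456789012345678901234567890';

// Points into Line (no check against Length(Line)); valid only while Line lives.
function MakeView(const Line: AnsiString; const Spec: TFieldSpec): TFieldView;
begin
  Result.P := PAnsiChar(Line) + Spec.Start - 1;
  Result.Len := Spec.Len;
end;

// Creates a real string only when one is actually needed.
function ViewToStr(const View: TFieldView): AnsiString;
begin
  SetString(Result, View.P, View.Len);
end;

var
  I: Integer;
begin
  for I := Low(Specs) to High(Specs) do
    Writeln(ViewToStr(MakeView(SampleLine, Specs[I])));
end.
```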
I don't think there is a much more efficient way in Delphi than using Copy.
But another solution is to load all the strings directly into a one-column temporary table and then do the split with an SQL query.
The total time depends on a lot of parameters, so the best way is to test!

Are there any optimisations for retrieving of return value in Delphi?

I'm trying to find an elegant way to access the fields of some objects in some other part of my program through the use of a record that stores a byte and accesses fields of another record through the use of functions with the same name as the record's fields.
TAilmentP = Record // actually a number but acts like a pointer
  private
    Ordinal: Byte;
  public
    function Name: String; inline;
    function Description: String; inline;
    class operator Implicit (const Number: Byte): TAilmentP; inline;
End;
TSkill = Class
  Name: String;
  Power: Word;
  Ailment: TAilmentP;
End;
class operator TAilmentP.Implicit (const Number: Byte): TAilmentP;
begin
  Result.Ordinal := Number;
  ShowMessage (IntToStr (Integer (@Result))); // for release builds
end;
function StrToAilment (const S: String): TAilmentP; // inside same unit
var i: Byte;
begin
  for i := 0 to Length (Ailments) - 1 do
    if Ailments [i].Name = S then
    begin
      ShowMessage (IntToStr (Integer (@Result))); // for release builds
      Result := i; // uses the Implicit operator
      Exit;
    end;
  raise Exception.Create ('"' + S + '" is not a valid Ailment');
end;
Now, I was trying to make my life easier by overloading the conversion operator so that when I try to assign a byte to a TAilmentP object, it assigns that to the Ordinal field.
However, as I've checked, it seems that this attempt is actually costly in terms of performance since any call to the implicit "operator" will create a new TAilmentP object for the return value, do its business, and then return the value and make a byte-wise copy back into the object that called it, as the addresses differ.
My code calls this method quite a lot, to be honest, and it seems like this is slower than just assigning my value directly to the Ordinal field of my object.
Is there any way to make my program actually assign the value directly to my field through the use of ANY method/function? Even inlining doesn't seem to work. Is there a way to return a reference to a (record) variable, rather than an object itself?
Finally (and sorry for being off topic a bit), why is operator overloading done through static functions? Wouldn't making them instance methods make it faster since you can access object fields without dereferencing them? This would really come in handy here and other parts of my code.
[EDIT] This is the assembler code for the Implicit operator with all optimizations on and no debugging features (not even "Debug Information" for breakpoints).
add al, [eax] /* function entry */
push ecx
mov [esp], al /* copies Byte parameter to memory */
mov eax, [esp] /* copies stored Byte back to register; function exit */
pop edx
ret
What's even funnier is that the next function has a mov eax, eax instruction at start-up. Now that looks really useful. :P Oh yeah, and my Implicit operator didn't get inlined either.
I'm pretty much convinced [esp] is the Result variable, as it has a different address than what I'm assigning to. With optimizations off, [esp] is replaced with [ebp-$01] (what I'm assigning to) and [ebp-$02] (the Byte parameter), one more instruction is added to move [ebp-$02] into AL (which then puts it in [ebp-$01]), and the redundant mov instruction is still there with [ebp-$02].
Am I doing something wrong, or does Delphi not have return-value optimizations?
Return types — even records — that will fit in a register are returned via a register. It's only larger types that are internally transformed into "out" parameters that get passed to the function by reference.
The size of your record is 1. Making a copy of your record is just as fast as making a copy of an ordinary Byte.
The code you've added for observing the addresses of your Result variables is actually hurting the optimizer. If you don't ask for the address of the variable, then the compiler is not required to allocate any memory for it. The variable could exist only in a register. When you ask for the address, the compiler needs to allocate stack memory so that it has an address to give you.
Get rid of your "release mode" code and instead observe the compiler's work in the CPU window. You should be able to observe how your record exists primarily in registers. The Implicit operator might even compile down to a no-op since the input and output registers should both be EAX.
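A small sketch to check the size claim, using a cut-down copy of the question's record (Name and Description are omitted, and the Value accessor is added only for illustration). The generated code itself is still best inspected in the CPU window, as suggested above.

```pascal
program TinyRecordReturn;

{$APPTYPE CONSOLE}

type
  // Cut-down copy of the question's record.
  TAilmentP = record
  private
    Ordinal: Byte;
  public
    class operator Implicit(const Number: Byte): TAilmentP;
    function Value: Byte;  // hypothetical accessor, for illustration only
  end;

class operator TAilmentP.Implicit(const Number: Byte): TAilmentP;
begin
  // No @Result here, so the compiler is free to keep Result in a register.
  Result.Ordinal := Number;
end;

function TAilmentP.Value: Byte;
begin
  Result := Ordinal;
end;

var
  A: TAilmentP;
begin
  A := 5;  // calls the Implicit operator
  Writeln('SizeOf(TAilmentP) = ', SizeOf(TAilmentP), ', value = ', A.Value);
end.
```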
Whether operators are instance methods or static doesn't make much difference, especially not in terms of performance. Instance methods still receive a reference to the instance they're called on. It's just a matter of whether the reference has a name you choose or whether it's called Self and passed implicitly. Although you wouldn't have to write "Self." in front of your field accesses, the Self variable still needs to get dereferenced just like the parameters of a static operator method.
All I'll say about optimizations in other languages is that you should look up return-value optimization (RVO) and its named variant, NRVO. It's been covered on Stack Overflow before. It has nothing to do with inlining.
Delphi is supposed to optimize return assignment by using pointers. This is also true for C++ and other OOP compiled languages. I stopped writing Pascal before operator overloading was introduced, so my knowledge is a bit dated. What follows is what I would try:
What I'm thinking is this... can you create an object on the heap (use New) and pass a pointer back from your "Implicit" method? This should avoid unnecessary overhead, but will cause you to deal with the return value as a pointer. Overload your methods to deal with pointer types?
I'm not sure if you can do this with the built-in operator overloading. Like I mentioned, overloading is something I wanted in Pascal for nearly a decade and never got to play with. I think it's worth a shot. You might need to accept that you'll have to give up on your dreams of elegant type casting.
There are some caveats with inlining. You probably already know that the hint is disabled (by default) for debug builds. You need to be in release mode to profile / benchmark or modify your build settings. If you haven't gone into release mode (or altered build settings) yet, it's likely that your inline hints are being ignored.
Be sure to use const to hint the compiler to optimize further. Even if it doesn't work for your case, it's a great practice to get into. Marking what should not change will prevent all kinds of disasters... and additionally give the compiler the opportunity to aggressively optimize.
Man, I wish I knew whether Delphi allows cross-unit inlining by now, but I simply don't. Many C++ compilers only inline within the same source file, unless you put the code in the header (headers have no counterpart in Pascal). It's worth a search or two. Try to make inlined functions/methods local to their callers, if you can. It'll at least help compile time, if not more.
All out of ideas. Hopefully, this meandering helps.
Now that I think about it, maybe it's absolutely necessary to have the return value in a different memory space and copied back into the one being assigned to.
I'm thinking of the cases where the return value may need to be de-allocated, like for example calling a function that accepts a TAilmentP parameter with a Byte value... I don't think you can directly assign to the function's parameters since it hasn't been created yet, and fixing that would break the normal and established way of generating function calls in assembler (i.e.: trying to access a parameter's fields before it's created is a no-no, so you have to create that parameter before that, then assign to it what you have to assign OUTSIDE a constructor and then call the function in assembler).
This is especially true and obvious for the other operators (with which you could evaluate expressions and thus need to create temporary objects), just not so much this one since you'd think it's like the assignment operator in other languages (like in C++, which can be an instance member), but it's actually much more than that - it's a constructor as well.
For example
procedure ShowAilmentName (Ailment: TAilmentP);
begin
  ShowMessage (Ailment.Name);
end;
[...]
begin
  ShowAilmentName (5);
end.
Yes, the implicit operator can do that too, which is quite cool. :D
In this case, I'm thinking that 5, like any other Byte, would be converted into a TAilmentP (as in creating a new TAilmentP object based on that Byte) given the implicit operator, the object then being copied byte-wise into the Ailment parameter; then the function body is entered and does its job, and on return the temporary TAilmentP object obtained from the conversion is destroyed.
This is even more obvious if Ailment were const, since it would have to be a reference, and a constant one too (no modifying after the function was called).
In C++, the assignment operator would have no business with function calls. Instead, one could've used a constructor for TAilmentP which accepts a Byte parameter. The same can be done in Delphi, and I suspect it would take precedence over the implicit operator, however what C++ doesn't support but Delphi does is to have down-conversion to primitive types (Byte, Integer, etc.) since the operators are overloaded using class operators. Thus, a procedure like "procedure ShowAilmentName (Number: Byte);" would never be able to accept a call like "ShowAilmentName (SomeAilment)" in C++, but in Delphi it can.
So, I guess this is a side-effect of the Implicit operator also being like a constructor, and this is necessary since records can not have prototypes (thus you could not convert both one way and the other between two records by just using constructors).
Anyone else think this might be the cause?

How does WriteLn() really work?

Since the age of the dinosaurs, Turbo Pascal and nowadays Delphi have a Write() and WriteLn() procedure that quietly do some neat stuff.
The number of parameters is variable;
Each variable can be of all sorts of types; you can supply integers, doubles, strings, booleans, and mix them all up in any order;
You can provide additional parameters for each argument:
Write('Hello':10,'World!':7); // alignment parameters
It even shows up in a special way in the code-completion dropdown:
Write ([var F:File]; P1; [...,PN] )
WriteLn ([var F:File]; [ P1; [...,PN]] )
Now that I was typing this I've noticed that Write and WriteLn don't have the same brackets in the code completion dropdown. Therefore it looks like this was not automatically generated, but it was hard-coded by someone.
Anyway, am I able to write procedures like these myself, or is all of this some magic hardcoded compiler trickery?
Writeln is what we call a compiler "magic" function. If you look in System.pas, you won't find a Writeln that is declared anything like what you would expect. The compiler literally breaks it all down into individual calls to various special runtime library functions.
In short, there is no way to implement your own version that does all the same things as the built-in writeln without modifying the compiler.
As Allen said, you can't write your own function that does all the same things.
You can, however, write a text file driver that does something custom and then use the standard Write(ln) to write to your text file driver. We did that in ye old DOS days :)
("Driver" in the context of the previous statement is just a piece of Pascal code that is hooked into the system by switching a pointer in the System unit IIRC. Been a long time since I last used this trick.)
As far as I know, the Pascal standards don't include variable arguments.
Having said that, IIRC, GNU Pascal lets you say something like:
procedure Foo(a: Integer; b: Integer; ...);
Try searching in your compiler's language docs for "Variable Argument Lists" or "conformant arrays". Here's an example of the latter: http://www.gnu-pascal.de/demos/conformantdemo.pas.
As the prev poster said, writeln() is magic. I think the problem has to do with how the stack is assembled in a pascal function, but it's been a real long time since I've thought about where things were on the stack :)
However, unless you're writing the "writeln" function (which is already written), you probably don't need to implement a procedure with variable arguments. Try iteration or recursion instead :)
It is magic compiler behaviour rather than a regular procedure. And no, there is no way to write such subroutines (unfortunately!). Code generation resolves the count of actual parameters and their types and translates them into the appropriate RTL calls (e.g. Str()) at compile time. This contrasts with the frequently suggested array of const (a single variant open array formal parameter, actually), which does the same work at runtime. I find the latter approach clumsy: it impairs code readability somewhat, and Bugland (Borland/Inprise/CodeGear/Embarcadero/name it) broke Code Insight for variant open array constructors (yes, I do care; I use OutputDebugString(PChar(Format('...', [...])))), and code completion does not work properly (or at all) there.
So the closest possible way to simulate the magic behaviour is to declare a lot of overloaded subroutines (really a lot of them, one per specific formal parameter type in each specific position). One could call this a kludge too, but it is the only way to get the flexibility of a variable parameter list, and it can be hidden in a separate module.
PS: I intentionally left format specifiers aside, because the syntax doesn't allow colons where Str(), Write() and Writeln() accept them.
Yes, you can do it in Delphi and friends (e.g. free pascal, Kylix, etc.) but not in more "standard" pascals. Look up variant open array parameters, which are used with a syntax something like this:
procedure MyProc(args : array of const);
(It's been a few years and I don't have the manuals at hand, so check the details before proceeding.) This gives you an open array of TVarRec records, each tagged with a VType field that tells you the type of the corresponding argument.
One note though: I don't think you'll be able to match the x:y formatting syntax (that is special), and will probably have to go with a slightly more verbose wrapper.
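A minimal sketch of the array of const approach, assuming Delphi 2009 or later (for vtUnicodeString). JoinArgs is a hypothetical name, only a handful of VType tags are handled, and there is no equivalent of the :x:y syntax. Which tag a string literal arrives with can differ between compiler versions, so several string-like tags are covered.

```pascal
program ArrayOfConstDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Converts each argument to text according to its VType tag and joins them.
// Only a few common tags are handled; anything else becomes '?'.
function JoinArgs(const Args: array of const): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(Args) do
    case Args[I].VType of
      vtInteger:       Result := Result + IntToStr(Args[I].VInteger);
      vtInt64:         Result := Result + IntToStr(Args[I].VInt64^);
      vtBoolean:       Result := Result + BoolToStr(Args[I].VBoolean, True);
      vtExtended:      Result := Result + FloatToStr(Args[I].VExtended^);
      vtChar:          Result := Result + Char(Args[I].VChar);  // ASCII only
      vtWideChar:      Result := Result + Args[I].VWideChar;
      vtString:        Result := Result + string(Args[I].VString^);
      vtPChar:         Result := Result + string(Args[I].VPChar);
      vtPWideChar:     Result := Result + string(Args[I].VPWideChar);
      vtAnsiString:    Result := Result + string(AnsiString(Args[I].VAnsiString));
      vtUnicodeString: Result := Result + string(Args[I].VUnicodeString);
    else
      Result := Result + '?';
    end;
end;

begin
  Writeln(JoinArgs(['x=', 42, ', ok=', True]));
end.
```

The type dispatch happens at runtime here, which is exactly the difference from Writeln, where the compiler resolves each argument's type at compile time.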
Most is already said, but I like to add a few things.
First, you can use the Format function. It is great for converting almost any kind of variable to a string and controlling its width, although it has its flaws:
myvar := 1;
while myvar < 10000 do begin
  Memo.Lines.Add(Format('(%3d)', [myVar]));
  myvar := myvar * 10;
end;
Produces:
(  1)
( 10)
(100)
(1000)
So the size is the minimal size (just like the :x:y construction).
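For comparison, the Write('Hello':10,'World!':7) call from the question corresponds roughly to width specifiers in Format. A short sketch (a console program; the exact strings are checked in the assertions rather than stated here):

```pascal
program FormatWidths;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

begin
  // Roughly Write('Hello':10, 'World!':7): each value right-aligned in a minimum width.
  Writeln(Format('%10s%7s', ['Hello', 'World!']));
  // A '-' flag left-aligns the value within the width instead.
  Writeln(Format('%-10s|', ['Hello']));
  // The width is only a minimum; a precision on %s sets a maximum length.
  Writeln(Format('(%3d) (%.3s)', [12345, 'Hello']));
end.
```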
To get a limited number of variable arguments, you can work with default parameters and overloaded functions:
procedure WriteSome(const A1: string; const A2: string = ''; const A3: string = '');
or
procedure WriteSome(const A1: string); overload;
procedure WriteSome(const A1: Integer); overload;
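The two declaration styles above, expanded into a compilable sketch. The names ToText and JoinSome are hypothetical, and they return strings instead of writing them so the results are easy to check.

```pascal
program OverloadedWriters;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// One overload per argument type: the compiler picks one at compile time.
function ToText(const A: string): string; overload;
begin
  Result := A;
end;

function ToText(const A: Integer): string; overload;
begin
  Result := IntToStr(A);
end;

function ToText(const A: Boolean): string; overload;
begin
  Result := BoolToStr(A, True);
end;

// Default parameters allow calls with one, two or three arguments.
function JoinSome(const A1: string; const A2: string = ''; const A3: string = ''): string;
begin
  Result := A1 + A2 + A3;
end;

begin
  Writeln(JoinSome(ToText('n='), ToText(7)));
  Writeln(JoinSome(ToText(True)));
end.
```

Unlike Writeln, the number of arguments is capped by the declarations, and every extra argument type needs its own overload.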
You cannot write your own Write/Writeln in old Pascal. They are generated by the compiler: formatting, justification, etc. That's one reason some programmers like C: even flexible standard functions such as printf and scanf can be implemented by any competent programmer.
You can even create an identical printf function for C if you are inclined to create something more performant than the one implemented by the C vendor. There's no magic trickery in them; your code just needs to "walk" the variable arguments.
P.S.
But as MarkusQ has pointed out, some variants of Pascal (Free Pascal, Kylix, etc.) can facilitate variable arguments. I last tinkered with Pascal back in the DOS days, with Turbo Pascal 7.
Writeln is not "array of const" based, but decomposed by the compiler into various calls that convert the arguments to strings and then call the primitive writestring. The "LN" is just a function that writes the line ending as a string (OS-dependent). The procedure variables (function pointers) for the primitives are part of the file type (TextRec/FileRec), which is why they can be customized (e.g. AssignCrt in TP).
If {$I+} mode is on, after each element, a call to the iocheck function is made.
The GPC construct mentioned above is, AFAIK, the unbounded C-style argument list. FPC (and AFAIK Delphi too) supports this as well, but with a different syntax.
procedure something(a: array of const); cdecl;
will be converted to be ABI-compatible with C, printf style. This means that the relevant function (something in this case) can't get the number of arguments, but must rely on parsing the format string. So this is something different from array of const, which is safe.
Although not a direct answer to you question, I would like to add the following comment:
I have recently rewritten some code using Writeln(...) syntax into using a StringList, filling the 'lines' with Format(...) and just plain IntToStr(...), FloatToStr(...) functions and the like.
The main reason for this change was speed improvement. Using a StringList and SaveToFile is much, much quicker than the Write/WriteLn combination.
If you are writing a program which creates a lot of text files (I was working on a web site creation program), this makes a lot of difference.

Are Delphi strings immutable?

As far as I know, strings are immutable in Delphi. I kind of understand that means if you do:
string1 := 'Hello';
string1 := string1 + ' World';
the first string is destroyed and you get a reference to a new string, 'Hello World'.
But what happens if you have the same string in different places around your code?
I have a string hash assigned for identifying several variables, so for example a "change" is identified by a hash value of the properties of that change. That way it's easy for me to check two "changes" for equality.
Now, each hash is computed separately (not all the properties are taken into account, so two separate instances can be equal even if they differ on some values).
The question is, how does Delphi handle those strings? If I compute two separate hashes to the same 10-byte string, what do I get? Two memory blocks of 10 bytes or two references to the same memory block?
Clarification: A change is composed of some properties read from the database and is generated by an individual thread. The TChange class has a GetHash method that computes a hash based on some of the values (but not all), resulting in a string. Now, other threads receive the change and have to compare it to previously processed changes so that they don't process the same (logical) change. Hence the hash and, as they have separate instances, two different strings are computed. I'm trying to determine whether it would be a real improvement to change from a string to something like a 128-bit hash, or whether it would just be wasting my time.
Edit: Version of Delphi is Delphi 7.0
Delphi strings are copy on write. If you modify a string (without using pointer tricks or similar techniques to fool the compiler), no other references to the same string will be affected.
Delphi strings are not interned. If you create the same string from two separate sections of code, they will not share the same backing store - the same data will be stored twice.
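A small sketch that makes both points observable by comparing the string data pointers directly (the function names are illustrative). It relies only on the System unit, so it should compile with Delphi 7 as well as later versions.

```pascal
program StringSharing;

{$APPTYPE CONSOLE}

// True if S2 := S1 makes both variables point at the same character buffer.
function SharedAfterAssignment: Boolean;
var
  S1, S2: string;
begin
  S1 := 'Hello';
  S1 := S1 + ' World';   // runtime concatenation yields a new heap string
  S2 := S1;              // no character copy: only the reference count goes up
  Result := Pointer(S1) = Pointer(S2);
end;

// Writing through one reference leaves the other untouched (copy-on-write).
function AfterWrite: string;
var
  S1, S2: string;
begin
  S1 := 'Hello';
  S1 := S1 + ' World';
  S2 := S1;
  S2[1] := 'J';          // S2 gets its own unique buffer before the write
  Result := S1 + '/' + S2;
end;

// Two equal strings built separately are not interned: two buffers.
function EqualButSeparate: Boolean;
var
  S1, S2: string;
begin
  S1 := 'Hello';
  S1 := S1 + ' World';
  S2 := 'Hello';
  S2 := S2 + ' World';
  Result := (S1 = S2) and (Pointer(S1) <> Pointer(S2));
end;

begin
  Writeln(SharedAfterAssignment);
  Writeln(AfterWrite);
  Writeln(EqualButSeparate);
end.
```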
Delphi strings are not immutable (try: string1[2] := 'a') but they are reference-counted and copy-on-write.
The consequences for your hashes are not clear, you'll have to detail how they are stored etc.
But a hash should only depend on the contents of a string, not on how it is stored. That makes the whole question moot, unless you can explain it better.
As others have said, Delphi strings are not generally immutable. Here are a few references on strings in Delphi.
http://blog.marcocantu.com/blog/delphi_super_duper_strings.html
http://conferences.codegear.com/he/article/32120
http://www.codexterity.com/delphistrings.htm
The Delphi version may be important to know. The good old Delphi BCL handles strings as copy-on-write, which basically means that a new instance is created when something in the string is changed. So yes, they are more or less immutable.
