String lifetime management in records - Delphi

I am working on getting rid of shortstring.
One of the many places shortstring is currently used within our programs is in records.
A lot of these records are kept in AVL trees.
The AVL tree used is a generic one, holding a pointer to a number of bytes (ElemSize), which has worked well so far.
The memory for each record in the AVL tree is allocated with GetMem, and copied with Move.
However, since a string is a pointer to a reference-counted structure, copying the memory back into a record no longer works: the string referenced has often already been freed (automatically, when its reference count drops to zero).
With only a pointer and the size of the "data block", I assume it is not possible to have the reference counts of the strings increased.
I'm looking for a way to get the reference counts of the strings taken into account when storing the record in an AVL tree.
Can I pass the record type to the tree constructor, then cast the pointer to this type and thus get the reference counts increased? Or is there a similar fix, where the changes are isolated primarily to the AVL unit and the calls to its constructor?
Current code for allocating space to store the record in the AVL tree; XData is a pointer to the record to be stored:
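New(RootPtr); { create new memory space }
GetMem(RootPtr^.TreeData, ElemSize); { raw, uninitialised bytes }
WITH RootPtr^ DO BEGIN
  { copy data: Move duplicates string pointers without adjusting their reference counts }
  Move(XData^, RootPtr^.TreeData^, ElemSize);
  { ... }
END;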

In essence the question you are asking is:
How can I allocate, copy and deallocate a record when all I know about its type is its size?
The simple answer is that you can use GetMem, Move and FreeMem provided that the record does not contain managed types. You wish to work with records that contain Delphi strings, which are managed. And so your current approach using GetMem and Move does not suffice.
There are plenty of ways to solve this. You could write your own code to do reference counting, so long as you knew where in the record the managed types were. I don't recommend this. You could make your user data be a class and use polymorphism to help.
The option I'd like to discuss continues to support records and indeed allows the user to choose whatever type they like. The reasoning is as follows:
If the type contains managed types, then operating on it requires knowledge of the type. If the tree is to be generic, then it cannot have that knowledge. Ergo, the knowledge must be supplied by the user of the tree.
This leads you to events. Let the tree offer events that the user can supply handlers for. The types would look like this:
type
PTreeNodeUserData = type Pointer;
TTreeNodeCreateUserDataEvent = function: PTreeNodeUserData of object;
TTreeNodeDestroyUserDataEvent = procedure(Data: PTreeNodeUserData) of object;
TTreeNodeCopyUserDataEvent = procedure(Source, Dest: PTreeNodeUserData) of object;
Then you can arrange for your tree to publish events with these types that the user can subscribe to.
The point being that this allows the user of the tree to supply the missing knowledge about the user data type.
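A minimal sketch of how those events could be wired up, assuming Delphi 7 or later. TUserDataStore is a hypothetical stand-in for the AVL unit (only the allocate/copy/free part is shown), and TCustomer and TCustomerHandlers are illustrative names. The tree only ever handles PTreeNodeUserData, while the user's handlers use New, Dispose and record assignment, all of which respect string reference counts.

```pascal
program TreeUserDataEventsDemo;

{$APPTYPE CONSOLE}

type
  // The event types from the answer.
  PTreeNodeUserData = type Pointer;
  TTreeNodeCreateUserDataEvent = function: PTreeNodeUserData of object;
  TTreeNodeDestroyUserDataEvent = procedure(Data: PTreeNodeUserData) of object;
  TTreeNodeCopyUserDataEvent = procedure(Source, Dest: PTreeNodeUserData) of object;

  // Hypothetical stand-in for the AVL unit: it never looks inside the
  // user data and never uses GetMem/Move/FreeMem on it.
  TUserDataStore = class
  private
    FOnCreate: TTreeNodeCreateUserDataEvent;
    FOnDestroy: TTreeNodeDestroyUserDataEvent;
    FOnCopy: TTreeNodeCopyUserDataEvent;
  public
    function StoreCopy(XData: PTreeNodeUserData): PTreeNodeUserData;
    procedure Release(Data: PTreeNodeUserData);
    property OnCreate: TTreeNodeCreateUserDataEvent read FOnCreate write FOnCreate;
    property OnDestroy: TTreeNodeDestroyUserDataEvent read FOnDestroy write FOnDestroy;
    property OnCopy: TTreeNodeCopyUserDataEvent read FOnCopy write FOnCopy;
  end;

  // Hypothetical user record with a managed (reference-counted) field.
  TCustomer = record
    Id: Integer;
    Name: string;
  end;
  PCustomer = ^TCustomer;

  // The user of the tree supplies the type knowledge through these handlers.
  TCustomerHandlers = class
  public
    function CreateData: PTreeNodeUserData;
    procedure DestroyData(Data: PTreeNodeUserData);
    procedure CopyData(Source, Dest: PTreeNodeUserData);
  end;

function TUserDataStore.StoreCopy(XData: PTreeNodeUserData): PTreeNodeUserData;
begin
  Result := FOnCreate();   // allocate and initialise via the user's handler
  FOnCopy(XData, Result);  // typed copy, so reference counts are adjusted
end;

procedure TUserDataStore.Release(Data: PTreeNodeUserData);
begin
  FOnDestroy(Data);        // finalise and free via the user's handler
end;

function TCustomerHandlers.CreateData: PTreeNodeUserData;
var
  P: PCustomer;
begin
  New(P);                  // New also initialises Name to ''
  Result := PTreeNodeUserData(P);
end;

procedure TCustomerHandlers.DestroyData(Data: PTreeNodeUserData);
begin
  Dispose(PCustomer(Data)); // Dispose finalises Name, then frees the memory
end;

procedure TCustomerHandlers.CopyData(Source, Dest: PTreeNodeUserData);
begin
  PCustomer(Dest)^ := PCustomer(Source)^; // record assignment handles the string
end;

var
  Handlers: TCustomerHandlers;
  Store: TUserDataStore;
  Local: TCustomer;
  Stored: PCustomer;
begin
  Handlers := TCustomerHandlers.Create;
  Store := TUserDataStore.Create;
  try
    Store.OnCreate := Handlers.CreateData;
    Store.OnDestroy := Handlers.DestroyData;
    Store.OnCopy := Handlers.CopyData;

    Local.Id := 1;
    Local.Name := 'Smith';
    Stored := PCustomer(Store.StoreCopy(PTreeNodeUserData(@Local)));
    Local.Name := 'Jones';   // changing the caller's record does not affect the copy
    Writeln(Stored^.Id, ' ', Stored^.Name); // prints "1 Smith"
    Store.Release(PTreeNodeUserData(Stored));
  finally
    Store.Free;
    Handlers.Free;
  end;
end.
```

The design keeps the tree unit free of any knowledge about TCustomer; only the handler class, written next to the record type, knows how to create, copy and destroy it.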

One of the main benefits of using records is the simplicity with which they can be copied (without using Move). So your best solution is to simply replace Move with a normal assignment operator :=. This will correctly consider the reference counts for all managed types involved.
Is there a particular reason you're not using the normal assignment operator?
PS: You need to ensure that the memory for all managed types (including long strings) is correctly initialised and finalised. I suggest you do some additional reading on the Initialize and Finalize routines.
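As a sketch of those routines when the record type is known: GetMem followed by Initialize, then :=, and later Finalize followed by FreeMem. TCustomer, CloneCustomer and FreeCustomer are hypothetical names; New and Dispose perform the same initialise/finalise steps for you automatically.

```pascal
program ManagedRecordCopyDemo;

{$APPTYPE CONSOLE}

type
  // Hypothetical record with a managed field, standing in for the user data.
  TCustomer = record
    Id: Integer;
    Name: string;
  end;
  PCustomer = ^TCustomer;

// Allocates raw memory for one TCustomer and copies Source into it.
function CloneCustomer(const Source: TCustomer): PCustomer;
begin
  GetMem(Result, SizeOf(TCustomer));
  Initialize(Result^);  // GetMem returns garbage: make Name a valid empty string first
  Result^ := Source;    // := (not Move) increments the string's reference count
end;

// Releases a block created by CloneCustomer.
procedure FreeCustomer(P: PCustomer);
begin
  Finalize(P^);         // releases the reference held by Name
  FreeMem(P);
end;

var
  Local: TCustomer;
  Stored: PCustomer;
begin
  Local.Id := 42;
  Local.Name := 'Smith';
  Stored := CloneCustomer(Local);
  Local.Name := '';     // the stored copy keeps its own reference
  Writeln(Stored^.Id, ' ', Stored^.Name); // prints "42 Smith"
  FreeCustomer(Stored);
end.
```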
The tree is general; it can hold any given lump of data. I hoped I could extend the functionality without making a new tree class per record.
In that case you need your "copy behaviour" to vary depending on what it's working with. A couple of options:
If your tree is wrapped in a class you can easily modify it to use a callback event to perform the copy operation. (This option might be easiest even if you first have to work on encapsulating the tree in a class.)
Modify your nodes and/or data to be objects with polymorphic copy functionality. Then each subtype will know how to copy itself correctly, and you can write something along the lines of Root.TreeData := XData.CreateCopy;
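A rough illustration of the second option, assuming the tree is changed to store TTreeData references instead of raw byte blocks. TTreeData and TCustomerData are hypothetical names; CreateCopy is the virtual method suggested above, and each descendant copies its own fields with ordinary assignments.

```pascal
program PolymorphicCopyDemo;

{$APPTYPE CONSOLE}

type
  // Hypothetical base class that the tree would store instead of raw bytes.
  TTreeData = class
  public
    function CreateCopy: TTreeData; virtual; abstract;
  end;

  TCustomerData = class(TTreeData)
  public
    Id: Integer;
    Name: string;
    function CreateCopy: TTreeData; override;
  end;

function TCustomerData.CreateCopy: TTreeData;
var
  Clone: TCustomerData;
begin
  Clone := TCustomerData.Create;
  Clone.Id := Id;
  Clone.Name := Name;   // normal string assignment, reference counted
  Result := Clone;
end;

var
  XData, Stored: TTreeData;
begin
  XData := TCustomerData.Create;
  try
    TCustomerData(XData).Id := 7;
    TCustomerData(XData).Name := 'Smith';
    Stored := XData.CreateCopy;  // the tree only needs to know about TTreeData
    try
      TCustomerData(XData).Name := 'Jones';
      Writeln(TCustomerData(Stored).Id, ' ', TCustomerData(Stored).Name); // prints "7 Smith"
    finally
      Stored.Free;
    end;
  finally
    XData.Free;
  end;
end.
```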

If you are working at such a low level and don't want the compiler to help you, then you need to use PChar strings instead of regular strings.

Related

Trying to understand this profiling result from F# code

A quick screenshot with the point of interest:
There are 2 questions here.
This happens in a tight loop. The 12.8% code is this:
{
this with Side = side; PositionPrice = position'; StopLossPrice = sl'; TakeProfitPrice = tp'; Volume = this.Volume + this.Quantity * position'
}
This object is passed around a lot and has 23 fields so it's not tiny. It looks like immutability is great for stable code, but it's horrible for performance.
Since this recursive loop is run in parallel, I need to store its context variables in an object.
I am looking for a general idea of what makes sense, not something specific to that code because I have a few tight loops with a bunch of math which I need to profile as well. I am sure I'll find the same pattern in several places.
The flaw here is that I store both the context for the calculations and its variables in a single type that gets passed in the loop. As the variable fields get updated, the whole object has to be recreated.
What would make sense here (in general for this type of situations)?
make the fields that can change mutable. In this case, that means keeping the type as is (23 fields) and make some fields mutable (only 5 fields get regularly changed)
move the mutable fields to their own type to have a general context object and one holding all the variables. In this case, that means having a context with (23 - 5 fields) and a separate 5 fields type
make the mutable fields variables and move them out of the type. In this case, these 5 fields would be passed as variables in the recursive loop?
and for the second question:
I have no idea what the 10.0% line with get_Tag is. I have nothing called 'Tag' in the code, so I assume that's a .NET internal thing.
I have a type called Side and there is a field with the same name used in the loop, but what is the 'Tag' part?
What I would suggest is not to modify your existing immutable type at all. Instead, create a new type with mutable fields that is only used within your tight loop. If the type leaves that loop, convert it back to your immutable type (assuming you don't need a copy to go through the rest of your program with every iteration).
get_Tag in this case is likely the auto-generated get-only property on a discriminated union; it's just how the F# compiler represents this sort of type in the CLR. The property can most easily be seen when looking at F# code from C#. Here's a great page on decompiled F#:
https://fsharpforfunandprofit.com/posts/fsharp-decompiled/#unions
For the performance issues I can only offer some suggestions:
If you can constrain the context object to your code only, then try making a mutable version and see which effect it has.
You mention that the context object is quite large, is it possible to split it up?

Delphi object without Create? [duplicate]

In Delphi sane people use a class to define objects.
In Turbo Pascal for Windows we used object and today you can still use object to create an object.
The difference is that an object lives on the stack and a class lives on the heap.
And of course object is deprecated.
Putting all that aside:
is there a benefit to be had, speed wise by using object instead of class?
I know that object is broken in Delphi 2009, but I've got a special use case1) where speed matters, and I'm trying to find out whether using object will make my thing faster without making it buggy.
This code base is in Delphi 7, but I may port it to Delphi 2007, haven't decided yet.
1) Conway's game of life
Long comment
Thanks all for pointing me in the right direction.
Let me explain a bit more. I'm trying to do a faster implementation of hashlife; see also here or here for simple source code.
The current record holder is Golly, but Golly uses a straight translation of Bill Gosper's original Lisp code (which is brilliant as an algorithm, but not optimized at the micro level at all). Hashlife enables you to calculate a generation in O(log(n)) time.
It does this by using a space/time trade-off. And for this reason hashlife needs a lot of memory; gigabytes are not unheard of. In return you can calculate generation 2^128 (340282366920938463463374607431768211456) using generation 2^127 (170141183460469231731687303715884105728) in O(1) time.
Because hashlife needs to compute hashes for all sub-patterns that occur in a larger pattern, allocation of objects needs to be fast.
Here's the solution I've settled upon:
Allocation optimization
I allocate one big block of physical memory (user-settable), let's say 512 MB. Inside this blob I allocate what I call cheese stacks. A cheese stack is a normal stack where I push and pop, but a pop can also be from the middle of the stack. If that happens, I mark the slot on the free list (which is a normal stack). When pushing, I check the free list first; if nothing is free, I push as normal. I'll be using records as advised; it looks like the solution with the least amount of overhead.
Because of the way hashlife works, very little popping and a lot of pushing take place. I keep separate stacks for structures of different sizes, making sure to keep memory access aligned on 4/8/16-byte boundaries.
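A minimal sketch of the cheese stack described above, assuming Delphi 2006 or later for records with methods. It uses a preallocated dynamic array instead of one large raw memory block, and TNode and TCheeseStack are made-up names; alignment and the separate per-size stacks are left out.

```pascal
program CheeseStackDemo;

{$APPTYPE CONSOLE}

type
  // Hypothetical fixed-size node; hashlife would use one stack per node size.
  TNode = record
    Key: Cardinal;
    Level: Integer;
  end;

  // A preallocated block of slots used as a stack; slots freed from the
  // middle are remembered on a separate free-list stack and reused first.
  TCheeseStack = record
    Slots: array of TNode;
    Top: Integer;               // number of slots handed out from the top
    FreeList: array of Integer; // indices of holes left by middle pops
    FreeCount: Integer;
    procedure Init(Capacity: Integer);
    function Push(const Node: TNode): Integer;
    procedure Pop(Index: Integer);
  end;

procedure TCheeseStack.Init(Capacity: Integer);
begin
  SetLength(Slots, Capacity);   // allocate the whole block once
  SetLength(FreeList, Capacity);
  Top := 0;
  FreeCount := 0;
end;

function TCheeseStack.Push(const Node: TNode): Integer;
begin
  if FreeCount > 0 then
  begin
    Dec(FreeCount);             // reuse a hole first
    Result := FreeList[FreeCount];
  end
  else
  begin
    Assert(Top < Length(Slots), 'cheese stack full');
    Result := Top;              // otherwise push as normal
    Inc(Top);
  end;
  Slots[Result] := Node;
end;

procedure TCheeseStack.Pop(Index: Integer);
begin
  if Index = Top - 1 then
    Dec(Top)                    // ordinary pop from the top
  else
  begin
    FreeList[FreeCount] := Index; // pop from the middle: leave a hole
    Inc(FreeCount);
  end;
end;

var
  Stack: TCheeseStack;
  N: TNode;
  A, B, C, D: Integer;
begin
  Stack.Init(16);
  N.Level := 0;
  N.Key := 1; A := Stack.Push(N);
  N.Key := 2; B := Stack.Push(N);
  N.Key := 3; C := Stack.Push(N);
  Stack.Pop(B);                   // middle pop: slot B becomes a hole
  N.Key := 4; D := Stack.Push(N); // reuses slot B
  Writeln(A, ' ', B, ' ', C, ' ', D); // prints "0 1 2 1"
end.
```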
Other optimizations
recursion removal
cache optimization
use of inline
precalculation of hashes (akin to rainbow tables)
detection of pathological cases and use of fall-back algorithm
use of GPU
For normal OOP programming, you should always use the class kind. You'll have the most powerful object model in Delphi, including interfaces and generics (in later Delphi versions).
1. Records, pointers and objects
Records can be evil (slow hidden copy if you forgot to declare a parameter as const, record hidden slow cleanup code, a fillchar would make any string in record become a memory leak...), but they are sometimes very convenient to access a binary structure (e.g. some "smallish value"), via a pointer.
A dynamic array of tiny records (e.g. with one integer and one double field) will be much faster than a TList of small classes; with our TDynArray wrapper, you will have high-level access to the records, with serialization, sorting, hashing and such.
If using pointers, you must know what you are doing. It's definitely preferable to stick with classes, and TPersistent if you want to use the magical "VCL component ownership model".
Inheritance is not allowed for records. You'll need either to use a "variant record" (using the case keyword in its type definition) or to use nested records. When using C-like APIs, you'll sometimes have to use object-oriented structures. Using nested records or variant records is IMHO much less clear than the good old "object" inheritance model.
2. When to use object
But there are some places where objects are a good way of accessing already existing data.
Even the object model is better than the new record model, because it handles simple inheritance.
In a Blog entry last summer, I posted some possibilities to still use objects:
A memory mapped file, which I want to parse very quickly: a pointer to such an object is just great, and you still have methods at hand; I use this for TFileHeader or TFileInfo which map the .zip header, in SynZip.pas;
A Win32 structure, as defined by an API call, in which I put handy methods for easy access to the data (you may use a record for that, but if there is some object orientation in the struct, which is very common, you'll have to nest records, which is not very handy);
A temporary structure defined on the stack, just used during a procedure: I use this for TZStream in SynZip.pas, or for our RTTI-related classes, which map the Delphi-generated RTTI in an object-oriented way, unlike TypeInfo, which is function/procedure oriented. By mapping the RTTI memory content directly, our code is faster than using the new RTTI classes created on the heap. We don't instantiate any memory, which, for an ORM framework like ours, is good for its speed. We need a lot of RTTI info, but we need it quickly, and we need it directly.
3. How object implementation is broken in modern Delphi
The fact that object is broken in modern Delphi is a shame, IMHO.
Normally, if you define a record on the stack containing some reference-counted variables (like a string), it will be initialized by some compiler magic code at the beginning of the method/function:
type
  TObj = object
    Int: integer;
    Str: string;
  end;

procedure Test;
var
  O: TObj;
begin // here, an _InitializeRecord(@O, TypeInfo(TObj)) call is made
  O.Str := 'test';
  // ...
end; // here, a _FinalizeRecord(@O, TypeInfo(TObj)) call is made
Those _InitializeRecord and _FinalizeRecord will "prepare" then "release" the O.Str variable.
With Delphi 2010, I found out that this _InitializeRecord() call was sometimes not made.
If the type has only some non-public fields, the hidden calls are sometimes not generated by the compiler.
Just build the source again, and there will be...
The only solution I found out was using the record keyword instead of object.
So here is what the resulting code looks like:
/// used to store and retrieve Words in a sorted array
// - is defined either as an object or as a record, due to a bug
// in Delphi 2010 compiler (at least): this structure is not initialized
// if defined as an object on the stack, but will be as a record
TSortedWordArray = {$ifdef UNICODE}record{$else}object{$endif}
public
Values: TWordDynArray;
Count: integer;
/// add a value into the sorted array
// - return the index of the new inserted value into the Values[] array
// - return -(foundindex+1) if this value is already in the Values[] array
function Add(aValue: Word): PtrInt;
/// return the index if the supplied value in the Values[] array
// - return -1 if not found
function IndexOf(aValue: Word): PtrInt; {$ifdef HASINLINE}inline;{$endif}
end;
The {$ifdef UNICODE}record{$else}object{$endif} is awful... but the code generation error hasn't occurred since.
The resulting modifications in the source code are not huge, but a bit disappointing. I found out that older versions of the IDE (e.g. Delphi 6/7) are not able to parse such a declaration, so the class hierarchy will be broken in the editor... :(
Backward compatibility should include regression tests. A lot of Delphi users stay with this product because of their existing code. Breaking features is very problematic for Delphi's future, IMHO: if you have to rewrite a lot of code, why shouldn't you just switch the project to C# or Java?
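For readers curious about the contract spelled out in the TSortedWordArray comments, here is a hypothetical implementation of Add and IndexOf using a binary search. It is not the SynCommons code: it uses the record branch of the declaration (Delphi 2006 or later), assumes PtrInt is a plain Integer, and grows Values geometrically.

```pascal
program SortedWordArrayDemo;

{$APPTYPE CONSOLE}

type
  TWordDynArray = array of Word;
  PtrInt = Integer; // assumption for this sketch; SynCommons declares its own PtrInt

  TSortedWordArray = record
  public
    Values: TWordDynArray;
    Count: Integer;
    function Add(aValue: Word): PtrInt;
    function IndexOf(aValue: Word): PtrInt;
  private
    function Search(aValue: Word; out Found: Boolean): PtrInt;
  end;

// Binary search: returns the index of aValue if present (Found = True),
// otherwise the index where it should be inserted (Found = False).
function TSortedWordArray.Search(aValue: Word; out Found: Boolean): PtrInt;
var
  L, R, M: PtrInt;
begin
  L := 0;
  R := Count - 1;
  while L <= R do
  begin
    M := (L + R) shr 1;
    if Values[M] = aValue then
    begin
      Found := True;
      Result := M;
      Exit;
    end;
    if Values[M] < aValue then
      L := M + 1
    else
      R := M - 1;
  end;
  Found := False;
  Result := L;
end;

function TSortedWordArray.Add(aValue: Word): PtrInt;
var
  Found: Boolean;
  I: PtrInt;
begin
  Result := Search(aValue, Found);
  if Found then
  begin
    Result := -(Result + 1);           // already present: -(foundindex+1)
    Exit;
  end;
  if Count = Length(Values) then
    SetLength(Values, Count * 2 + 4);  // grow geometrically
  for I := Count downto Result + 1 do
    Values[I] := Values[I - 1];        // shift the tail right by one slot
  Values[Result] := aValue;
  Inc(Count);
end;

function TSortedWordArray.IndexOf(aValue: Word): PtrInt;
var
  Found: Boolean;
begin
  Result := Search(aValue, Found);
  if not Found then
    Result := -1;
end;

var
  A: TSortedWordArray;
begin
  A.Count := 0;
  A.Add(30);
  A.Add(10);
  A.Add(20);
  Writeln(A.Values[0], ' ', A.Values[1], ' ', A.Values[2]); // prints "10 20 30"
  Writeln(A.Add(20));     // prints "-2": already present at index 1
  Writeln(A.IndexOf(30)); // prints "2"
  Writeln(A.IndexOf(99)); // prints "-1"
end.
```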
Object was not the Delphi 1 method of setting up objects; it was the short-lived Turbo Pascal method of setting up objects, which was replaced by the Delphi TObject model in Delphi 1. It was kept around for backwards compatibility, but it should be avoided for a few reasons:
As you noted, it's broken in more recent versions. And AFAIK there are no plans to fix it.
It's a conceptually wrong object model. The entire point of Object Oriented Programming, the one thing that really distinguishes it from procedural programming, is Liskov substitution (inheritance and polymorphism), and inheritance and value types don't mix.
You lose support for a lot of features that require TObject descendants.
If you really need value types that don't need to be dynamically allocated and initialized, you can use records instead. You can't inherit from them, but you can't do that very well with object either so you're not losing anything here.
As for the rest of the question, there aren't all that many speed benefits. The TObject model is plenty fast, especially if you're using the FastMM memory manager to speed up creation and destruction of objects, and if your objects contain lots of fields they can even be faster than records in a lot of cases, because they're passed by reference and don't have to be copied around for each function call.
When given a choice between "fast and possibly broken" and "fast and correct," always choose the latter.
Old-style objects offer no speed incentive over plain old records, so wherever you might be tempted to use old-style objects, you can use records instead without the risk of having uninitialized compiler-managed types or broken virtual methods. If your version of Delphi doesn't support records with methods, then just use standalone procedures instead.
Way back in older versions of Delphi, which did not support records with methods, using object was the way to get your objects allocated on the stack. Very occasionally that would yield worthwhile performance benefits. Nowadays record is better. The only feature missing from record is the ability to inherit from another record.
You give up a lot when you change from class to record so only consider it if the performance benefits are overwhelming.
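To illustrate the record-instead-of-object advice, a small sketch (Delphi 2006 or later; TCounter is a made-up name) of a record with methods used as a stack-allocated value, with its string field initialised and finalised by the compiler:

```pascal
program RecordWithMethodsDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Where older code might have declared "TCounter = object", a record with
  // methods gives the same stack allocation and method-call syntax.
  TCounter = record
  private
    FName: string;   // managed field: initialised and finalised by the compiler
    FCount: Integer;
  public
    procedure Init(const AName: string);
    procedure Increment;
    function Describe: string;
  end;

procedure TCounter.Init(const AName: string);
begin
  FName := AName;
  FCount := 0;
end;

procedure TCounter.Increment;
begin
  Inc(FCount);
end;

function TCounter.Describe: string;
begin
  Result := FName + '=' + IntToStr(FCount);
end;

function Demo: string;
var
  C: TCounter;       // lives on the stack, no Create/Free needed
begin
  C.Init('clicks');
  C.Increment;
  C.Increment;
  Result := C.Describe;
end;

begin
  Writeln(Demo);     // prints "clicks=2"
end.
```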

Why should we use classes rather than records, or vice versa?

I've been using Delphi for quite some time now, but rather than coming from a CS background I have learnt "on the job" - mostly from my Boss, and augmented by bits and pieces picked up from the web, users guides, examples, etc.
Now my boss is old school, started programming using Pascal, and hasn't necessarily kept up-to-date with the latest changes to Delphi.
Just recently I've been wondering whether one of our core techniques is "wrong".
Most of our applications interface with MySQL. In general we will create a record with a structure to store data read from the DB, and these records will be stored in a TList. Generally we will have a unit that defines the various records that we have in an application, and the functions and procedures that seed and read the records. We don't use record procedures such as outlined here
After reviewing some examples I've started wondering whether we'd be better off using classes rather than records, but I'm having difficulty finding strong guidance either way.
The sort of thing that we are dealing with would be User information: Names, DOB, Events, Event Types. Or Timesheet information: Hours, Jobs, etc...
The big difference is that records are value types and classes are reference types. In a nutshell what this means is that:
For a value type, when you use assignment, a := b, a copy is made. There are two distinct instances, a and b.
For a reference type, when you use assignment, a := b, both variables refer to the same instance. There is only one instance.
The main consequence of this is what happens when you write a.Field := 42. For a record, the value type, the assignment a.Field changes the value of the member in a, but not in b. That's because a and b are different instances. But for a class, since a and b both refer to the same instance, then after executing a.Field := 42 you are safe to assert that b.Field = 42.
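A minimal console sketch of that difference; TDataRec and TDataObj are illustrative names, each with a single Field member.

```pascal
program ValueVsReferenceDemo;

{$APPTYPE CONSOLE}

type
  TDataRec = record   // value type
    Field: Integer;
  end;

  TDataObj = class    // reference type
  public
    Field: Integer;
  end;

var
  RA, RB: TDataRec;
  CA, CB: TDataObj;
begin
  RB.Field := 0;
  RA := RB;           // a copy is made: two distinct instances
  RA.Field := 42;
  Writeln('record: a=', RA.Field, ' b=', RB.Field); // prints "record: a=42 b=0"

  CB := TDataObj.Create;
  try
    CB.Field := 0;
    CA := CB;         // both variables now refer to the same instance
    CA.Field := 42;
    Writeln('class: a=', CA.Field, ' b=', CB.Field); // prints "class: a=42 b=42"
  finally
    CB.Free;          // only one instance exists, so it is freed once
  end;
end.
```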
There's no hard and fast rule that says that you should always use value types, or always use reference types. Both have their place. In some situations, it will be preferable to use one, and in other situations it will be preferable to use the other. Essentially the decision always comes down to a decision on what you want the assignment operator to mean.
You have an existing code base, and presumably programmers familiar with it, that has made particular choices. Unless you have a compelling reason to switch to using reference types, making the change will almost certainly lead to defects. And defects both in the existing code (switch to reference type changes meaning of assignment operator), and in code you write in the future (you and your colleagues have developed intuition as to meaning of assignment operator in specific contexts, and that intuition will break if you switch).
What's more, you state that your types do not use methods. A type that consists only of data, and has no methods associated with it is very likely best represented by a value type. I cannot say that for sure, but my instincts tell me that the original developers made the right choice.

What datatype/structure to store file list info?

I have an application that searches files on the computer (configurable path, type, etc.). Currently it adds information to a database as soon as a matching file is found. Rather than that, I want to hold the information in memory for further manipulation before inserting it into the database. The list may contain a lot of items. I consider performance an important factor. I may need to iterate through the items, so a structure that can be coded easily is another key issue. And how can I achieve PHP-style associative arrays for this job?
If you're using Delphi 2009, you can use a TDictionary. It takes two generic parameters. The first should be a string, for the filename, and the second would be whatever data type you're associating with. It also has three built-in enumerators, one for key-value pairs, one for keys only and one for values only, which makes iterating easy.
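A sketch of the TDictionary approach, assuming Delphi 2009 or later. TFileInfo is a hypothetical record for whatever data you collect per file; note that the enumeration order of a dictionary is not defined.

```pascal
program FileDictionaryDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Generics.Collections;

type
  // Hypothetical per-file data; the question does not specify the fields.
  TFileInfo = record
    Size: Int64;
    Modified: TDateTime;
  end;

var
  Files: TDictionary<string, TFileInfo>;
  Info: TFileInfo;
  Pair: TPair<string, TFileInfo>;
  FileName: string;
begin
  Files := TDictionary<string, TFileInfo>.Create;
  try
    Info.Size := 1024;
    Info.Modified := Now;
    Files.Add('C:\data\a.txt', Info);
    Info.Size := 2048;
    Files.AddOrSetValue('C:\data\b.txt', Info);

    // lookup by key, similar to $files['C:\data\a.txt'] in PHP
    if Files.TryGetValue('C:\data\a.txt', Info) then
      Writeln('a.txt: ', Info.Size, ' bytes');

    // key/value enumeration
    for Pair in Files do
      Writeln(Pair.Key, ' = ', Pair.Value.Size);

    // keys only (Files.Values works the same way for values only)
    for FileName in Files.Keys do
      Writeln(FileName);
  finally
    Files.Free;
  end;
end.
```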
Another solution would be to use just a standard TStringList.
As long as it's sorted and has a Duplicates setting other than dupAccept, you can use IndexOf or IndexOfName to find items in the list quickly.
It also has the Objects property, which allows you to store an object attached to each name. Starting with D2009, TStringList has the OwnsObjects property, which allows you to delegate object cleanup to the TStringList. Prior to D2009 you have to handle that yourself.
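A sketch of the sorted TStringList variant, assuming Delphi 2009 or later for OwnsObjects (on older versions, free the attached objects yourself before freeing the list). TFileInfo is a hypothetical class.

```pascal
program SortedStringListDemo;

{$APPTYPE CONSOLE}

uses
  Classes;

type
  // Hypothetical object attached to each file name.
  TFileInfo = class
  public
    Size: Int64;
    constructor Create(ASize: Int64);
  end;

constructor TFileInfo.Create(ASize: Int64);
begin
  inherited Create;
  Size := ASize;
end;

var
  Files: TStringList;
  Idx: Integer;
begin
  Files := TStringList.Create;
  try
    Files.Sorted := True;          // lets IndexOf use a binary search
    Files.Duplicates := dupIgnore; // any setting other than dupAccept
    Files.OwnsObjects := True;     // D2009+: the list frees the attached objects
    Files.AddObject('C:\data\b.txt', TFileInfo.Create(2048));
    Files.AddObject('C:\data\a.txt', TFileInfo.Create(1024));

    Idx := Files.IndexOf('C:\data\a.txt');
    if Idx >= 0 then
      Writeln(Files[Idx], ': ', TFileInfo(Files.Objects[Idx]).Size, ' bytes');
  finally
    Files.Free; // also frees the TFileInfo objects because OwnsObjects = True
  end;
end.
```

One caveat with dupIgnore: if AddObject is called with a name that is already present, the item is silently ignored, so the object passed in is not owned by the list and must be freed by the caller.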
Much of this will depend on how you are going to use the list and at what scale. If you are going to use it as a stack or queue, then a TList would work fine. If you need to search through the list for a specific item, then you will need something that allows faster retrieval. TDictionary (2009) or TStringList (pre-2009) would be the most likely choice.
Dynamic arrays are also a possibility, but if you use them you will want to minimize the use of SetLength, as each call can reallocate memory. TList manages this for you, which is why I suggested using a TList. If you KNOW how many items you will deal with in advance, then use a dynamic array and set its length at the outset.
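A sketch of keeping SetLength calls rare: track Count separately from the array's length and double the capacity when it runs out; when the final size is known, set it once. TFileEntry and TFileEntryList are hypothetical names, and records with methods need Delphi 2006 or later.

```pascal
program DynArrayGrowthDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TFileEntry = record
    FileName: string;
    Size: Int64;
  end;

  // Dynamic array plus a separate Count; capacity doubles when full,
  // so SetLength (a reallocation) runs only O(log n) times.
  TFileEntryList = record
    Items: array of TFileEntry;
    Count: Integer;
    procedure Add(const FileName: string; Size: Int64);
  end;

procedure TFileEntryList.Add(const FileName: string; Size: Int64);
begin
  if Count = Length(Items) then
  begin
    if Count = 0 then
      SetLength(Items, 16)
    else
      SetLength(Items, Count * 2); // double the capacity
  end;
  Items[Count].FileName := FileName;
  Items[Count].Size := Size;
  Inc(Count);
end;

var
  List: TFileEntryList;
  Fixed: array of TFileEntry;
  I, Known: Integer;
begin
  List.Count := 0;
  for I := 1 to 1000 do
    List.Add('file' + IntToStr(I) + '.txt', I);

  // When the number of items is known up front, allocate once.
  Known := 500;
  SetLength(Fixed, Known);

  Writeln(List.Count, ' ', Length(List.Items), ' ', Length(Fixed)); // prints "1000 1024 500"
end.
```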
If you have more items than will fit in memory, then your choices also change. At that point I would either use a database table or a TFileStream to store the records to be processed, then seek to the beginning of the table/stream for processing.
Try using the AVL-Tree by http://sourceforge.net/projects/alcinoe/ as your associative Array. It has an iterate-method for fast iteration. You may need to derive from his baseclass and implement your own comparator, but it's easy to use.
Examples are included.

Pointer to generic type

In the process of transforming a given efficient pointer-based hash map implementation into a generic hash map implementation, I stumbled across the following problem:
I have a class representing a hash node (the hash map implementation uses a binary tree)
THashNode <KEY_TYPE, VALUE_TYPE> = class
public
Key : KEY_TYPE;
Value : VALUE_TYPE;
Left : THashNode <KEY_TYPE, VALUE_TYPE>;
Right : THashNode <KEY_TYPE, VALUE_TYPE>;
end;
In addition to that there is a function that should return a pointer to a hash node. I wanted to write
PHashNode = ^THashNode <KEY_TYPE, VALUE_TYPE>
but that doesn't compile (';' expected but '<' found).
How can I have a pointer to a generic type?
And addressed to Barry Kelly, if you read this: yes, this is based on your hash map implementation. You haven't written a generic version of your implementation yourself, have you? That would save me some time :)
Sorry, Smasher. Pointers to open generic types are not supported because generic pointer types are not supported, although it is possible (compiler bug) to create them in certain circumstances (particularly pointers to nested types inside a generic type); this "feature" can't be removed in an update in case we break someone's code. The limitation on generic pointer types ought to be removed in the future, but I can't make promises when.
If the type in question is the one in JclStrHashMap I wrote (or the ancient HashList unit), well, the easiest way to reproduce it would be to change the node type to be a class and pass around any double-pointers as Pointer with appropriate casting. However, if I were writing that unit again today, I would not implement buckets as binary trees. I got the opportunity to write the dictionary in the Generics.Collections unit, though with all the other Delphi compiler work time was too tight before shipping for solid QA, and generic feature support itself was in flux until fairly late.
I would prefer to implement the hash map buckets as one of double-hashing, per-bucket dynamic arrays or linked lists of cells from a contiguous array, whichever came out best from tests using representative data. The logic is that cache miss cost of following links in tree/list ought to dominate any difference in bucket search between tree and list with a good hash function. The current dictionary is implemented as straight linear probing primarily because it was relatively easy to implement and worked with the available set of primitive generic operations.
That said, the binary tree buckets should have been an effective hedge against poor hash functions; if they were balanced binary trees (=> even more modification cost), they would give O(1) average and O(log n) worst-case performance.
To actually answer your question, you can't make a pointer to a generic type, because "generic types" don't exist. You have to make a pointer to a specific type, with the type parameters filled in.
Unfortunately, the compiler doesn't like finding angle brackets after a ^. But it will accept the following:
TGeneric<T> = record
value: T;
end;
TSpecific = TGeneric<string>;
PGeneric = ^TSpecific;
But "PGeneric = ^TGeneric<string>;" gives a compiler error. Sounds like a glitch to me. I'd report that over at QC if I was you.
Why are you trying to make a pointer to an object, anyway? Delphi objects are a reference type, so they're pointers already. You can just cast your object reference to Pointer and you're good.
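A complete version of the alias workaround described in this answer, assuming Delphi 2009 or later for generics: close the generic type with an alias first, then declare the pointer type against the alias.

```pascal
program GenericPointerDemo;

{$APPTYPE CONSOLE}

type
  TGeneric<T> = record
    Value: T;
  end;

  TSpecific = TGeneric<string>; // close the generic type first...
  PSpecific = ^TSpecific;       // ...then point to the closed type

var
  Rec: TSpecific;
  P: PSpecific;
begin
  Rec.Value := 'hello';
  P := @Rec;
  P^.Value := P^.Value + ' world'; // modifies Rec through the pointer
  Writeln(Rec.Value);              // prints "hello world"
end.
```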
If Delphi supported generic pointer types at all, it would have to look like this:
type
PHashNode<K, V> = ^THashNode<K, V>;
That is, mention the generic parameters on the left side where you declare the name of the type, and then use those parameters in constructing the type on the right.
However, Delphi does not support that. See QC 66584.
On the other hand, I'd also question the necessity of having a pointer to a class type at all, generic or not. Such pointers are needed only very rarely.
There's a generic hash map called TDictionary in the Generics.Collections unit. Unfortunately, it's badly broken at the moment, but it's apparently going to be fixed in update #3, which is due out within a matter of days, according to Nick Hodges.
