Delphi non-visual TTree implementation

I'm looking for a non-visual persistent tree (TStringTree) implementation. If anyone knows of a good implementation, please let me know.
Thanks.

You'll find a flexible, non-visual tree structure in the DI Containers library (commercial). However, as others have noted above, it's really quite easy to roll your own, adding only the functionality that you need.
You can get by with just two base classes: TNode and a TNodeList (e.g. a TObjectList descendant). At minimum, TNode needs just three members: your string data, a reference to its parent node (nil if the node is the root), and a TNodeList holding its child nodes. What remains is the (somewhat tedious) implementation of the various attendant methods such as Add(), Delete(), IndexOf(), MoveTo(), GetFirstChild(), GetNext() etc. The basic tree should take less than one night.
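A minimal sketch of that design, using a plain TObjectList for the child list (all names here are illustrative, not from any particular library):

uses Contnrs;

type
  TNode = class
  private
    FData: string;
    FParent: TNode;         // nil when this node is the root
    FChildren: TObjectList; // owns the child TNode instances
  public
    constructor Create(AParent: TNode; const AData: string);
    destructor Destroy; override;
    function AddChild(const AData: string): TNode;
    function Child(Index: Integer): TNode;
    function Count: Integer;
    property Data: string read FData write FData;
    property Parent: TNode read FParent;
  end;

constructor TNode.Create(AParent: TNode; const AData: string);
begin
  inherited Create;
  FParent := AParent;
  FData := AData;
  FChildren := TObjectList.Create(True); // True: freeing a node frees its subtree
  if AParent <> nil then
    AParent.FChildren.Add(Self);
end;

destructor TNode.Destroy;
begin
  FChildren.Free;
  inherited;
end;

function TNode.AddChild(const AData: string): TNode;
begin
  Result := TNode.Create(Self, AData);
end;

function TNode.Child(Index: Integer): TNode;
begin
  Result := TNode(FChildren[Index]);
end;

function TNode.Count: Integer;
begin
  Result := FChildren.Count;
end;

From there, Delete(), IndexOf(), MoveTo() and the traversal helpers are straightforward additions on top of the underlying TObjectList.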

What kind of tree? B-tree? Splay tree? Red-black tree? These are all common types of tree algorithms.
You might want to look at Julian Bucknall's book, The Tomes of Delphi: Algorithms and Data Structures. It has all sorts of tree implementations with full Delphi source; you could easily adapt any of them to work with strings.

And of course there is still the funky DECAL (previously Rosetta), an attempt at creating a kind of STL using interfaces and variants.
http://sourceforge.net/projects/decal/
More for people who prefer flexibility over speed, though. The base tree structure is red-black, IIRC.

Why not simply use an XML DOM document?
It may be overkill for a truly trivial string-tree, but would not be too burdensome to use for that purpose and has the benefit of being capable of accommodating just about any extension to a string-tree (for storing additional data with each string in the tree - as attributes etc) should the need arise.
In my experience often what starts out as a trivial need can quickly grow beyond the initial or anticipated requirement. :)
If you are concerned about the "overhead" of the COM-based XML implementation and the VCL wrapper around it, you might look into TNativeXML.
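For illustration, a tree of strings via the stock XMLDoc wrapper might look like this (a rough sketch, assuming Delphi 6 or later; the element and attribute names are made up):

uses XMLDoc, XMLIntf;

procedure BuildStringTree;
var
  Doc: IXMLDocument;
  Root, Child: IXMLNode;
begin
  // note: the default DOM vendor is COM-based (MSXML),
  // so a console app needs CoInitialize first
  Doc := NewXMLDocument;
  Root := Doc.AddChild('tree');
  Child := Root.AddChild('node');
  Child.Attributes['text'] := 'first child';  // extra data fits naturally as attributes
  Child.AddChild('node').Attributes['text'] := 'grandchild';
  Doc.SaveToFile('tree.xml');                 // persistence comes for free
end;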

You could just use a TStringList and add objects which are other TStringLists... crude, but it works if your data can be represented as string data only.
Child := TStringList.Create;
ParentList.AddObject('Child', Child);
Of course a better solution would be to create your own objects, each containing a TObjectList of child objects.

Delphi object without Create?

In Delphi, sane people use a class to define objects.
In Turbo Pascal for Windows we used object, and today you can still use object to create an object.
The difference is that an object lives on the stack and a class lives on the heap.
And of course object is deprecated.
Putting all that aside:
is there a benefit to be had, speed-wise, by using object instead of class?
I know that object is broken in Delphi 2009, but I've got a special use case 1) where speed matters, and I'm trying to find out whether using object will make my code faster without making it buggy.
This code base is in Delphi 7, but I may port it to Delphi 2007; I haven't decided yet.
1) Conway's Game of Life
Long comment
Thanks all for pointing me in the right direction.
Let me explain a bit more. I'm trying to do a faster implementation of hashlife (see also here or here for simple source code).
The current record holder is Golly, but Golly uses a straight translation of Bill Gosper's original Lisp code (which is brilliant as an algorithm, but not optimized at the micro level at all). Hashlife enables you to calculate a generation in O(log(n)) time.
It does this by using a space/time trade-off, and for this reason hashlife needs a lot of memory; gigabytes are not unheard of. In return you can calculate generation 2^128 (340282366920938463463374607431768211456) from generation 2^127 (170141183460469231731687303715884105728) in O(1) time.
Because hashlife needs to compute hashes for all sub-patterns that occur in a larger pattern, allocation of objects needs to be fast.
Here's the solution I've settled upon:
Allocation optimization
I allocate one big block of physical memory (user-settable), let's say 512 MB. Inside this blob I allocate what I call cheese stacks. This is a normal stack where I push and pop, but a pop can also be from the middle of the stack. If that happens, I mark the slot in the free list (the free list is a normal stack). When pushing I check the free list first; if nothing is free I push as normal. I'll be using records as advised; that looks like the solution with the least overhead.
Because of the way hashlife works, very little popping takes place but a lot of pushing does. I keep separate stacks for structures of different sizes, making sure to keep memory access aligned on 4/8/16-byte boundaries.
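A minimal sketch of such a cheese stack (assuming a dynamic array in place of a slice of the big 512 MB block, and a made-up TCell record; hashlife's real node layout will differ):

type
  TCell = record                 // hypothetical quadtree node
    NW, NE, SW, SE: Integer;     // indices of the four quadrant sub-nodes
  end;

  TCheeseStack = record
    Items: array of TCell;       // stands in for the big preallocated block
    Top: Integer;                // next unused slot at the end
    FreeSlots: array of Integer; // the free list: slots popped from the middle
    FreeTop: Integer;
  end;

procedure InitStack(var S: TCheeseStack; Capacity: Integer);
begin
  SetLength(S.Items, Capacity);
  SetLength(S.FreeSlots, Capacity);
  S.Top := 0;
  S.FreeTop := 0;
end;

// push: reuse a slot from the free list if one exists,
// otherwise take the next slot at the end
function Push(var S: TCheeseStack; const Value: TCell): Integer;
begin
  if S.FreeTop > 0 then
  begin
    Dec(S.FreeTop);
    Result := S.FreeSlots[S.FreeTop];
  end
  else
  begin
    Result := S.Top;
    Inc(S.Top);
  end;
  S.Items[Result] := Value;
end;

// pop from the middle: the slot is simply remembered on the free list
procedure PopAt(var S: TCheeseStack; Index: Integer);
begin
  S.FreeSlots[S.FreeTop] := Index;
  Inc(S.FreeTop);
end;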
Other optimizations
recursion removal
cache optimization
use of inline
precalculation of hashes (akin to rainbow tables)
detection of pathological cases and use of fall-back algorithm
use of GPU
For normal OOP programming, you should always use the class kind. You'll have the most powerful object model in Delphi, including interfaces and generics (in later Delphi versions).
1. Records, pointers and objects
Records can be evil (a slow hidden copy if you forget to declare a parameter as const, hidden slow record cleanup code, a FillChar that turns any string in the record into a memory leak...), but they are sometimes very convenient for accessing a binary structure (e.g. some "smallish value") via a pointer.
A dynamic array of tiny records (e.g. with one integer and one double field) will be much faster than a TList of small classes; with our TDynArray wrapper, you will have high-level access to the records, with serialization, sorting, hashing and such.
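As a rough illustration of that claim (TSample and FillSamples are made up; TDynArray itself belongs to the author's Synopse library and is not shown here):

type
  TSample = record
    ID: Integer;
    Value: Double;
  end;
  TSampleDynArray = array of TSample;

procedure FillSamples;
var
  Samples: TSampleDynArray;
  i: Integer;
begin
  SetLength(Samples, 1000000); // one contiguous heap block for a million records
  for i := 0 to High(Samples) do
  begin
    Samples[i].ID := i;          // no per-item allocation, unlike a TList
    Samples[i].Value := i * 0.5; // holding a million tiny class instances
  end;
end;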
If using pointers, you must know what you are doing. It's definitely preferable to stick with classes, and with TPersistent if you want to use the magical "VCL component ownership model".
Inheritance is not allowed for records. You'll need to use either a "variant record" (using the case keyword in its type definition) or nested records. When using a C-like API, you'll sometimes have to use object-oriented structures. Using nested records or variant records is IMHO much less clear than the good old object inheritance model.
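For reference, the variant-record workaround mentioned above looks like this (a small sketch with made-up names):

type
  TShapeKind = (skCircle, skRectangle);

  TShape = record
    X, Y: Double;
    // the variant part: the fields after "case" overlap in memory,
    // so only the variant matching Kind is meaningful at any time
    case Kind: TShapeKind of
      skCircle:    (Radius: Double);
      skRectangle: (Width, Height: Double);
  end;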
2. When to use object
But there are some places where objects are a good way of accessing already existing data.
The object model is even better than the new record model in one respect: it handles simple inheritance.
In a blog entry last summer, I posted some possibilities to still use objects:
A memory mapped file, which I want to parse very quickly: a pointer to such an object is just great, and you still have methods at hand; I use this for TFileHeader or TFileInfo which map the .zip header, in SynZip.pas;
A Win32 structure, as defined by an API call, in which I put handy methods for easy access to the data (for that you may use a record, but if there is some object orientation in the struct - which is very common - you'll have to nest records, which is not very handy);
A temporary structure defined on the stack, just used during a procedure: I use this for TZStream in SynZip.pas, or for our RTTI-related classes, which map the Delphi-generated RTTI in an object-oriented way, unlike the TypInfo unit, which is function/procedure oriented. By mapping the RTTI memory content directly, our code is faster than using the new RTTI classes created on the heap. We don't instantiate any memory, which, for an ORM framework like ours, is good for speed. We need a lot of RTTI info, but we need it quick, and we need it directly.
3. How object implementation is broken in modern Delphi
The fact that object is broken in modern Delphi is a shame, IMHO.
Normally, if you define a record on the stack, containing some reference-counted variables (like a string), it will be initialized by some compiler magic code, at the begin level of the method/function:
type
  TObj = object
    Int: integer;
    Str: string;
  end;

procedure Test;
var
  O: TObj;
begin // here, an _InitializeRecord(@O, TypeInfo(TObj)) call is made
  O.Str := 'test';
  (...)
end; // here, a _FinalizeRecord(@O, TypeInfo(TObj)) call is made
Those _InitializeRecord and _FinalizeRecord will "prepare" then "release" the O.Str variable.
With Delphi 2010, I found out that sometimes this _InitializeRecord() call was not generated.
If the record type has only non-public fields, the hidden calls are sometimes not generated by the compiler.
Just build the source again, and there they are...
The only solution I found was to use the record keyword instead of object.
So here is what the resulting code looks like:
/// used to store and retrieve Words in a sorted array
// - is defined either as an object or as a record, due to a bug
// in the Delphi 2010 compiler (at least): this structure is not initialized
// if defined as a record on the stack, but is if defined as an object
TSortedWordArray = {$ifdef UNICODE}record{$else}object{$endif}
public
  Values: TWordDynArray;
  Count: integer;
  /// add a value into the sorted array
  // - returns the index of the newly inserted value in the Values[] array
  // - returns -(foundindex+1) if this value is already in the Values[] array
  function Add(aValue: Word): PtrInt;
  /// return the index of the supplied value in the Values[] array
  // - returns -1 if not found
  function IndexOf(aValue: Word): PtrInt; {$ifdef HASINLINE}inline;{$endif}
end;
The {$ifdef UNICODE}record{$else}object{$endif} is awful... but the code generation error hasn't occurred since.
The resulting modifications to the source code are not huge, but a bit disappointing. I found out that older versions of the IDE (e.g. Delphi 6/7) are not able to parse such a declaration, so the class hierarchy will be broken in the editor... :(
Backward compatibility should include regression tests. A lot of Delphi users stay with this product because of their existing code. Breaking changes are very problematic for Delphi's future, IMHO: if you have to rewrite a lot of code, why shouldn't you just switch the project to C# or Java?
Object was not the Delphi 1 method of setting up objects; it was the short-lived Turbo Pascal method of setting up objects, which was replaced by the Delphi TObject model in Delphi 1. It was kept around for backwards compatibility, but it should be avoided for a few reasons:
As you noted, it's broken in more recent versions. And AFAIK there are no plans to fix it.
It's a conceptually wrong object model. The entire point of object-oriented programming, the one thing that really distinguishes it from procedural programming, is Liskov substitution (inheritance and polymorphism), and inheritance and value types don't mix.
You lose support for a lot of features that require TObject descendants.
If you really need value types that don't need to be dynamically allocated and initialized, you can use records instead. You can't inherit from them, but you can't do that very well with object either, so you're not losing anything here.
As for the rest of the question, there aren't all that many speed benefits. The TObject model is plenty fast, especially if you're using the FastMM memory manager to speed up creation and destruction of objects, and if your objects contain lots of fields they can even be faster than records in a lot of cases, because they're passed by reference and don't have to be copied around for each function call.
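A sketch of that by-reference point (hypothetical declarations, not from the answer above): a large record parameter is copied on every call unless declared const, while a class instance is always passed as a pointer-sized reference.

type
  TBigRecord = record
    Buffer: array[0..1023] of Byte;
  end;

  TBigObject = class
  public
    Buffer: array[0..1023] of Byte;
  end;

// the whole 1 KB record is copied for each call...
procedure TakeRecord(R: TBigRecord);
begin
end;

// ...while const lets the compiler pass a reference under the hood...
procedure TakeRecordByConst(const R: TBigRecord);
begin
end;

// ...and a class instance is always just a reference
procedure TakeObject(O: TBigObject);
begin
end;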
When given a choice between "fast and possibly broken" and "fast and correct," always choose the latter.
Old-style objects offer no speed incentive over plain old records, so wherever you might be tempted to use old-style objects, you can use records instead without the risk of having uninitialized compiler-managed types or broken virtual methods. If your version of Delphi doesn't support records with methods, then just use standalone procedures instead.
Way back, in older versions of Delphi that did not support records with methods, using object was the way to get your objects allocated on the stack. Very occasionally that would yield worthwhile performance benefits. Nowadays record is better. The only feature missing from record is the ability to inherit from another record.
You give up a lot when you change from class to record so only consider it if the performance benefits are overwhelming.

Why are mutables allowed in F#? When are they essential?

Coming from C#, trying to get my head around the language.
From what I understand, one of the main benefits of F# is that you ditch the concept of state, which should (in many cases) make things much more robust.
If this is the case (and correct me if it's not), why allow us to break this principle with mutables? To me it feels like they don't belong in the language. I understand you don't have to use them, but they give you the tools to go off track and think in an OOP manner.
Can anyone provide an example of where a mutable value is essential?
Current compilers for declarative (stateless) code are not very smart. This results in lots of memory allocations and copy operations, which are rather expensive. Mutating some property of an object allows you to reuse the object in its new state, which is much faster.
Imagine you make a game with 10000 units moving around at 60 ticks a second. You can do this in F#, including collisions with a mutable quad- or octree, on a single CPU core.
Now imagine the units and quadtree are immutable. The compiler would have no better idea than to allocate and construct 600000 units per second and create 60 new trees per second. And this excludes any changes in other management structures. In a real-world use case with complex units, this kind of solution will be too slow.
F# is a multi-paradigm language that enables the programmer to write functional, object-oriented, and, to an extent, imperative programs. Currently, each variant has its valid uses. Maybe, at some point in the future, better compilers will allow for better optimization of declarative programs, but right now, we have to fall back to imperative programming when performance becomes an issue.
Having the ability to use mutable state is often important for performance reasons, among other things.
Consider implementing the API List.take: count : int -> list : 'a list -> 'a list which returns a list consisting of only the first count elements from the input list.
If you are bound by immutability, lists can only be built up back-to-front. Implementing take then boils down to:
Build up the result list back-to-front with the first count elements from the input: O(count)
Reverse that result and return it: O(count)
The F# runtime, for performance reasons, has the magic special ability to build lists front-to-back when needed (i.e. to mutate the tail of the last element to point to a new tail element). The basic algorithm used for List.take is:
Build up the result list front-to-back with the first count elements from the input: O(count)
Return the result
Same asymptotic performance, but in practical terms it's twice as fast to use mutation in this case.
Pervasive mutable state can be a nightmare as code is difficult to reason about. But if you factor your code so that mutable state is tightly encapsulated (e.g. in implementation details of List.take), then you can enjoy its benefits where it makes sense. So making immutability the default, but still allowing mutability, is a very practical and useful feature of the language.
First of all, what makes F# powerful is, in my opinion, not just the immutability by default, but a whole mix of features: immutability by default, type inference, lightweight syntax, sum types (DUs) and product types (tuples), pattern matching and currying by default. Possibly more.
These make F# very functional by default and they make you program in a certain way. In particular they make you feel uncomfortable when you use mutable state, as it requires the mutable keyword. Uncomfortable in this sense means more careful. And that is exactly what you should be.
Mutable state is not forbidden or evil per se, but it should be controlled. The need to explicitly use mutable is like a warning sign making you aware of danger. And a good way to control it is to use it only internally, within a function. That way you can have your own internal mutable state and still be perfectly thread-safe, because you don't have shared mutable state. In fact, your function can still be referentially transparent even if it uses mutable state internally.
As for why F# allows mutable state: it would be very difficult to write usual real-world code without the possibility of it. For instance, in Haskell, something like a random number cannot be obtained the way it can be in F#, but rather needs the state threaded through explicitly.
When I write applications, I tend to have about 95% of the code base in a very functional style that would be pretty much 1:1 portable to, say, Haskell without any trouble. But then at the system boundaries, or in some performance-critical inner loop, mutable state is used. That way you get the best of both worlds.

Why does F# Set need IComparable?

So I am trying to use the F# Set as a hash table, but my element type doesn't implement the IComparable interface (although it implements IEquatable). I got an error saying the construction is not allowed because of the comparison constraint. Through some further reading, I discovered that F# Set is implemented using a binary tree, which makes insertion O(log(n)). This looks weird to me; why is the Set structure designed this way?
Edit: So I learned that Set in F# is actually a sorted set. And I guess the question becomes: why is a sorted set somehow preferable to a general hash set as an immutable/functional data structure?
There are two important points that should help you understand how sets in F# (and in functional languages in general) work and how they are used:
Implementing immutable hashtables (like .NET HashSet) is hard - when you remove or add elements, you want to avoid copying everything in the data structure, and (as far as I know) there is no general way of doing that (you would end up copying too much, so it would be inefficient).
For this reason, most functional sets are implemented as some form of tree, which requires comparison to build a sorted structure. The nice property of balanced trees is that removing and adding elements does not have to copy everything in the tree, so even the worst-case scenario is reasonably efficient (though a mutable hashtable is still faster).
Now, F# is functional-first, which means that immutable structures are preferred, but it is perfectly fine to use mutable data structures (especially if you limit their usage to some well-defined and restricted scope). For this reason, F# programmers often use Dictionary or HashSet, especially when the usage is confined to the scope of a single function.

Linked list single class vs multiple classes

In my second term as a computer science student, we focused almost the whole term on writing linked lists in different variations (stack, queue, ...). The design of these lists always came down to this:
class List<T> {
    class ListElement {
        T value;
        ListElement next;
    }
    ListElement root;
}
with variations in which methods were implemented and how they worked (I have left out constructors and properties for simplicity here).
Some day I started learning Scala and focusing on functional programming. This also came to the point where a linked list was written, but with a different style of implementation:
class List[T](head: T, tail: List[T])
Despite the different syntax and immutability, this is in my opinion a different approach.
And I thought to myself: "Well, you could have implemented lists the same way in C# or Java, with one class fewer than the approach you learned."
I can see why you would implement a linked list like that in a functional language, where recursion is not as dangerous as in C# or Java, because at least to my way of thinking a recursive implementation of all the usual methods on a linked list is very intuitive for this design.
What I do not understand is: why are linked lists in C# or Java typically implemented in the first fashion, when you could implement them the other way with less code but equal verbosity? (I am not talking about the implementation of lists in the libraries of the language, but about the lists you typically write as a programmer-to-be.)
The only benefit I can see with the first approach is that you can hide the implementation from the user a bit better, but is this the reason, and is it worth the additional class?
I wouldn't even need to expose my implementation to the user, as I could still implement my list internally in a different way, and merely choose to have a constructor like that and provide functionality to retrieve the first element of the list as head and the rest as tail.
The reasons linked lists are "implemented in the first fashion", as you put it, include:
Performance.
Time and space complexity are the two most important concerns when writing algorithms or implementing data structures that support operations like search and sort. As you have mentioned, lists created the recursive way aren't mutable! The very purpose of creating a list is attaining faster operations on it. So designers prefer the 'first fashion'.
Object orientation
When solving real-world problems, the initial object-oriented analysis and design (OOAD) matter a lot. With an object model that resembles real-world objects/things as closely as possible, designers can achieve better solutions. The recursive approach seems to miss out on this aspect.
Scalability
Designers of APIs/libraries keep scalability in mind when they draft their designs. Code written in the 'first fashion' is much more scalable and easier to comprehend.
Other design concerns
This is not an exhaustive list of the reasons in any way. There are many other factors and experience-based lessons in programming folklore that lead to the choice of the first fashion.

How to design a set of file readers and writers for different format

Digging into a legacy project (C++) that needs to be extended, I realized that there are about 40 reader/writer/parser classes. They are used to read and write various types of data (different objects) in different file formats (binary, HDF5, XML, text, ...); one type of object is typically bound to one or two file formats. Most of the classes have no knowledge of the others. Interfaces and inheritance were apparently unknown to the author, as were design patterns.
It seems to me a horrendous mess. On the other hand, I am not exactly sure how to handle this situation. I will at least extract interfaces. I would also like to see if I can move common code into some parent classes, for example whatever is specific to an HDF5 reader/writer. I also thought the Abstract Factory pattern could help, but the objects I get out of the readers are completely different.
How would you handle this situation? How would you design the classes? What design pattern would you use, if any? Would you keep the reading and writing parts split?
The Abstract Factory pattern isn't the right track. You usually only need interfaces if you anticipate multiple implementations for a given file type and want them all to operate the same way.
Question: can one class be written to multiple file types? As in, does object 'a' (of type Class A) potentially need to be written to either or both of the XML and text formats?
If that is true, you need to decouple the classes from the readers/writers. Take a look at this question: What design pattern should I use for import/export?
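To make the decoupling concrete, here is a rough sketch (written in Delphi to match the rest of this page; every name is invented): each object describes its fields through a neutral interface, and each file format gets its own sink implementing that interface.

type
  // format-neutral sink that domain objects write themselves to
  IRecordSink = interface
    procedure WriteField(const Name, Value: string);
  end;

  TXmlSink = class(TInterfacedObject, IRecordSink)
  public
    procedure WriteField(const Name, Value: string);
  end;

  TTextSink = class(TInterfacedObject, IRecordSink)
  public
    procedure WriteField(const Name, Value: string);
  end;

procedure TXmlSink.WriteField(const Name, Value: string);
begin
  Writeln('<', Name, '>', Value, '</', Name, '>');
end;

procedure TTextSink.WriteField(const Name, Value: string);
begin
  Writeln(Name, '=', Value);
end;

// a domain object then only knows how to describe itself, not how files work:
//   procedure TSomeObject.WriteTo(const Sink: IRecordSink);
//   begin
//     Sink.WriteField('Name', FName);
//   end;

This keeps reading and writing symmetric (an IRecordSource counterpart can mirror the sink) without any reader or writer needing to know about the others.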
