I'm having a problem with an interface that consists of a number of frames (normally 25) within a TScrollBox.
There are 2 problems, and I am hoping that one is a consequence of the other...
Background:
When the application starts up, I create 25 frames, each containing approx. 20 controls, which are then populated with the default information. The user can then click on a control to limit the search to a subset of information, at which point I free and recreate my frames (as the search may return fewer than 25 records).
The problem:
If I quit the application after the initial search then it takes approx. 5 seconds to return to Delphi. After the 2nd search (and the dispose/recreate of frames) it takes approx. 20 seconds.
Whilst I could rewrite the application to only create the frames once, I would like to understand what is going on.
Here is my create routine:
procedure TMF.CreateFrame(i: Integer; var FrameBottom: Integer);
var
  NewFrame: TSF;
begin
  NewFrame := TSF.Create(Self);
  NewFrame.Name := 'SF' + IntToStr(i);
  if i = 0 then
    NewFrame.Top := 8
  else
    NewFrame.Top := FrameBottom + 8;
  FrameBottom := NewFrame.Top + NewFrame.Height;
  NewFrame.Parent := ScrollBox1;
  FrameList.Add(NewFrame);
end;
And here is my delete routine:
procedure TMF.ClearFrames;
var
  i: Integer;
  SF: TSF;
begin
  for i := 0 to FrameList.Count - 1 do
  begin
    SF := FrameList[i];
    SF.Free;
  end;
  FrameList.Clear;
end;
What am I missing?
As you are taking control of the frames' lifetime by Free'ing them yourself, there's no need to pass Self as the owner parameter in the constructor. Pass nil instead, to prevent the owner from also trying to free the frames.
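Applied to the create routine above, that means (a minimal sketch; the rest of CreateFrame stays the same):
NewFrame := TSF.Create(nil); // no owner: FrameList and ClearFrames control the lifetime
NewFrame.Parent := ScrollBox1; // Parent only handles display, not lifetime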
Also, I don't like the look of your ClearFrames routine. Try this instead:
while FrameList.Count > 0 do
begin
  TSF(FrameList[0]).Free;
  FrameList.Delete(0);
end;
If you want to know why your app is taking so long to do something, profile it, for example by running Sampling Profiler against your program. The help file explains how to limit profiling to a specific section of your app, which you could use to get sampling results on only the clearing or creating parts. This should show you where you're actually spending most of the time and take a lot of the guesswork out of it.
I want to read an entire table from an MS Access file, and I'm trying to do it as fast as possible. When testing with a big sample I found that the loop counter increases faster when it's reading the top records compared to the last records of the table. Here's a sample code that demonstrates this:
procedure TForm1.Button1Click(Sender: TObject);
const
MaxRecords = 40000;
Step = 5000;
var
I, J: Integer;
Table: TADOTable;
T: Cardinal;
Ts: TCardinalDynArray;
begin
Table := TADOTable.Create(nil);
Table.ConnectionString :=
'Provider=Microsoft.ACE.OLEDB.12.0;'+
'Data Source=BigMDB.accdb;'+
'Mode=Read|Share Deny Read|Share Deny Write;'+
'Persist Security Info=False';
Table.TableName := 'Table1';
Table.Open;
J := 0;
SetLength(Ts, MaxRecords div Step);
T := GetTickCount;
for I := 1 to MaxRecords do
begin
Table.Next;
if ((I mod Step) = 0) then
begin
T := GetTickCount - T;
Ts[J] := T;
Inc(J);
T := GetTickCount;
end;
end;
Table.Free;
// Chart1.SeriesList[0].Clear;
// for I := 0 to Length(Ts) - 1 do
// begin
// Chart1.SeriesList[0].Add(Ts[I]/1000, Format(
// 'Records: %s %d-%d %s Duration:%f s',
// [#13, I * Step, (I + 1)*Step, #13, Ts[I]/1000]));
// end;
end;
And the result on my PC (timing chart not reproduced here; it shows each successive chunk of records taking longer to read):
The table has two string fields, one double and one integer. It has no primary key nor index field. Why does it happen and how can I prevent it?
I can reproduce your results using an AdoQuery with an MS SQL Server dataset of a similar size to yours.
However, after doing a bit of line-profiling, I think I've found the answer to this, and it's slightly counter-intuitive. I'm sure everyone who does DB programming in Delphi is used to the idea that looping through a dataset tends to be much quicker if you surround the loop with calls to Disable/EnableControls. But who would bother to do that if there are no db-aware controls attached to the dataset?
Well, it turns out that in your situation, even though there are no DB-aware controls, the speed increases hugely if you use Disable/EnableControls regardless.
The reason is that TCustomADODataSet.InternalGetRecord in AdoDB.Pas contains this:
if ControlsDisabled then
RecordNumber := -2 else
RecordNumber := Recordset.AbsolutePosition;
and according to my line profiler, the while not AdoQuery1.Eof do AdoQuery1.Next loop spends 98.8% of its time executing the assignment
RecordNumber := Recordset.AbsolutePosition;
The calculation of Recordset.AbsolutePosition is hidden, of course, on the "wrong side" of the Recordset interface, but the fact that the time to call it apparently increases the further you go into the recordset makes it reasonable imo to speculate that it's calculated by counting from the start of the recordset's data.
Of course, ControlsDisabled returns true if DisableControls has been called and not undone by a call to EnableControls. So, retest with the loop surrounded by Disable/EnableControls and hopefully you'll get a similar result to mine. It looks like you were right that the slowdown isn't related to memory allocations.
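In other words, the cure is just the standard idiom, even with no data-aware controls attached. A minimal sketch using the same AdoQuery1 (always pair the calls in try/finally so an exception can't leave the dataset with controls disabled):
AdoQuery1.DisableControls;
try
  AdoQuery1.First;
  while not AdoQuery1.Eof do
  begin
    // process the current record here
    AdoQuery1.Next;
  end;
finally
  AdoQuery1.EnableControls;
end;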
Using the following code:
procedure TForm1.btnLoopClick(Sender: TObject);
var
I: Integer;
T: Integer;
Step : Integer;
begin
Memo1.Lines.BeginUpdate;
I := 0;
Step := 4000;
if cbDisableControls.Checked then
AdoQuery1.DisableControls;
T := GetTickCount;
{.$define UseRecordSet}
{$ifdef UseRecordSet}
while not AdoQuery1.Recordset.Eof do begin
AdoQuery1.Recordset.MoveNext;
Inc(I);
if I mod Step = 0 then begin
T := GetTickCount - T;
Memo1.Lines.Add(IntToStr(I) + ':' + IntToStr(T));
T := GetTickCount;
end;
end;
{$else}
while not AdoQuery1.Eof do begin
AdoQuery1.Next;
Inc(I);
if I mod Step = 0 then begin
T := GetTickCount - T;
Memo1.Lines.Add(IntToStr(I) + ':' + IntToStr(T));
T := GetTickCount;
end;
end;
{$endif}
if cbDisableControls.Checked then
AdoQuery1.EnableControls;
Memo1.Lines.EndUpdate;
end;
I get the following results (with DisableControls not called except where noted):
Using CursorLocation = clUseClient
AdoQuery.Next AdoQuery.RecordSet AdoQuery.Next
.MoveNext + DisableControls
4000:157 4000:16 4000:15
8000:453 8000:16 8000:15
12000:687 12000:0 12000:32
16000:969 16000:15 16000:31
20000:1250 20000:16 20000:31
24000:1500 24000:0 24000:16
28000:1703 28000:15 28000:31
32000:1891 32000:16 32000:31
36000:2187 36000:16 36000:16
40000:2438 40000:0 40000:15
44000:2703 44000:15 44000:31
48000:3203 48000:16 48000:32
=======================================
Using CursorLocation = clUseServer
AdoQuery.Next AdoQuery.RecordSet AdoQuery.Next
.MoveNext + DisableControls
4000:1031 4000:454 4000:563
8000:1016 8000:468 8000:562
12000:1047 12000:469 12000:500
16000:1234 16000:484 16000:532
20000:1047 20000:454 20000:546
24000:1063 24000:484 24000:547
28000:984 28000:531 28000:563
32000:906 32000:485 32000:500
36000:1016 36000:531 36000:578
40000:1000 40000:547 40000:500
44000:968 44000:406 44000:562
48000:1016 48000:375 48000:547
Calling AdoQuery1.Recordset.MoveNext calls directly into the MDAC/ADO layer, of course, whereas AdoQuery1.Next involves all the overhead of the standard TDataSet model. As Serge Kraikov said, changing the CursorLocation certainly makes a difference and doesn't exhibit the slowdown we noticed, though obviously it's significantly slower than using clUseClient and calling DisableControls. I suppose it depends on exactly what you're trying to do whether you can take advantage of the extra speed of using clUseClient with RecordSet.MoveNext.
When you open a table, the ADO dataset internally creates special data structures to navigate the dataset forward/backward - a "dataset CURSOR". During navigation, ADO stores the list of already-visited records to provide bidirectional navigation.
It seems the ADO cursor code uses a quadratic-time O(n²) algorithm to store this list.
But there is a workaround - use a server-side cursor:
Table.CursorLocation := clUseServer;
I tested your code using this fix and got linear fetch time - fetching each successive chunk of records takes the same time as the previous one.
PS: Some other data access libraries provide special "unidirectional" datasets - these datasets can only traverse forward and don't even store already-traversed records - so you get constant memory consumption and linear fetch time.
DAO is native to Access and (IMHO) is typically faster.
Whether or not you switch, use the GetRows method. Both DAO and ADO support it.
There is no looping. You can dump the entire recordset into an array with a couple of lines of code. Air code:
yourrecordset.MoveLast
yourrecordset.MoveFirst
yourarray = yourrecordset.GetRows(yourrecordset.RecordCount)
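In Delphi/ADO terms the same call might look like this (a sketch only, assuming the standard ADOInt type-library import where _Recordset.GetRows takes (Rows, Start, Fields) and unused parameters are passed as EmptyParam from the Variants unit):
var
  Rows: OleVariant;
begin
  AdoTable1.Recordset.MoveLast;  // forces an accurate RecordCount
  AdoTable1.Recordset.MoveFirst;
  // GetRows returns a two-dimensional variant array indexed [field, record]
  Rows := AdoTable1.Recordset.GetRows(AdoTable1.Recordset.RecordCount, EmptyParam, EmptyParam);
end;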
I have to check if I have duplicate paths in a FileListBox (the FileListBox plays the role of a kind of job list or playlist).
Using Delphi's SameText, CompareStr, or CompareText takes 6 seconds. So I came up with my own compare function, which is (just) a bit faster but not fast enough. Any ideas how to improve it?
function SameFile(CONST Path1, Path2: string): Boolean;
VAR i: Integer;
begin
Result:= Length(Path1) = Length(Path2); { if they have different lengths then obviously they are not the same file }
if Result then
for i:= Length(Path1) downto 1 DO { start from the end because it is more likely to find the difference there }
if Path1[i]<> Path2[i] then
begin
Result:= FALSE;
Break;
end;
end;
I use it like this:
for x:= JList.Count-1 downto 1 DO
begin
sMaster:= JList.Items[x];
for y:= x-1 downto 0 DO
if SameFile(sMaster, JList.Items[y]) then
begin
JList.Items.Delete (x); { REMOVE DUPLICATES }
Break;
end;
end;
Note: The chance of having duplicates is small, so Delete is not called often. Also, the list cannot be sorted because the items are added by the user and sometimes the order may be important.
Update:
The thing is that I lose the advantage of my code because it is written in Pascal.
It would be nice if the comparison loop (Path1[i] <> Path2[i]) were optimized to use Borland's ASM code.
Delphi 7, Win XP 32 bit. Tests were done with 577 items in the list. Deleting the items from the list IS NOT A PROBLEM because it happens rarely.
CONCLUSION
As Svein Bringsli pointed out, my code is slow not because of the comparison algorithm but because of TListBox. The BEST solution was provided by Marcelo Cantos. Thanks a lot, Marcelo.
I accepted Svein's answer because it directly answers my question "how do I make my comparison function faster" with "there is no point in making it faster".
For the moment I implemented the quick-and-dirty solution: when I have under 200 files, I use my slow code to check for duplicates. If there are more than 200 files I use dwrbudr's solution (which is damn fast), considering that if the user has that many files, the order is irrelevant anyway (a human brain cannot track that many items).
I want to thank you all for ideas and especially Svein for revealing the truth: (Borland's) visual controls are damn slow!
Don't waste time optimising the assembler. You can go from O(n²) to O(n log n), bringing the time down to milliseconds, by sorting the list and then doing a linear scan for duplicates.
While you're at it, forget the SameFile function. The algorithmic improvement will dwarf anything you can achieve there.
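A minimal sketch of that sort-then-scan idea (CollectDuplicates is a name made up for illustration; it reports duplicates without touching the original order):
procedure CollectDuplicates(Source, Dupes: TStrings);
var
  Sorted: TStringList;
  i: Integer;
begin
  Sorted := TStringList.Create;
  try
    Sorted.Assign(Source); // work on a copy, so the original order is untouched
    Sorted.Sort;           // O(n log n); duplicates become adjacent
    for i := 1 to Sorted.Count - 1 do
      if SameText(Sorted[i], Sorted[i - 1]) then
        Dupes.Add(Sorted[i]); // linear scan picks up each duplicate
  finally
    Sorted.Free;
  end;
end;
The caller can then walk the original list backwards and delete the entries found in Dupes.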
Edit: Based on feedback in the comments...
You can perform an order-preserving O(n log n) de-duplication as follows:
1. Sort a copy of the list.
2. Identify and copy duplicated entries to a third list, along with their duplication count minus one.
3. Walk the original list backwards, as per your original version.
4. In the inner (for y := ...) loop, traverse the duplication list instead. If an outer item matches, delete it, decrement the duplication count, and delete the duplication entry when the count reaches zero.
This is obviously more complicated but it will still be orders of magnitude faster, even if you do horrible dirty things like storing duplication counts as strings, C:\path1\file1=2, and using code like:
y := dupes.IndexOfName(sMaster);
if y <> -1 then
begin
JList.Items.Delete(x);
c := StrToInt(dupes.ValueFromIndex[y]);
if c > 1 then
  dupes.Values[sMaster] := IntToStr(c - 1)
else
  dupes.Delete(y);
end;
Side note: A binary chop would be more efficient than the for y := ... loop, but given that duplicates are rare, the difference ought to be negligible.
Using your code as a starting point, I modified it to take a copy of the list before searching for duplicates. The time went from 5.5 seconds to about 0.5 seconds.
vSL := TStringList.Create;
try
vSL.Assign(jList.Items);
vSL.Sorted := true;
for x:= vSL.Count-1 downto 1 DO
begin
sMaster:= vSL[x];
for y:= x-1 downto 0 DO
if SameFile(sMaster, vSL[y]) then
begin
vSL.Delete (x); { REMOVE DUPLICATES }
jList.Items.Delete (x);
Break;
end;
end;
finally
vSL.Free;
end;
Obviously, this is not a good way to do it, but it demonstrates that TFileListBox is in itself quite slow. I don't believe you can gain much by optimizing your compare function.
To demonstrate this, I replaced your SameFile function with the following, but kept the rest of your code:
function SameFile(CONST Path1, Path2: string): Boolean;
VAR i: Integer;
begin
Result := false; //Pretty darn fast code!!!
end;
The time went from 5.6 seconds to 5.5 seconds. I don't think there's much more to gain there :-)
Create another sorted list with sortedList.Duplicates := dupIgnore, add your strings to that list, then copy them back.
vSL := TStringList.Create;
try
vSL.Sorted := true;
vSL.Duplicates := dupIgnore;
for x:= 0 to jList.Items.Count - 1 do
vSL.Add(jList.Items[x]);
jList.Items.Clear;
for x:= 0 to vSL.Count - 1 do
jList.Items.Add(vSL[x]);
finally
vSL.Free;
end;
The absolute fastest way, bar none (as alluded to before), is to use a routine that generates a unique 64/128/256-bit hash code for a string (I use the SHA256Managed class in C#). Run down the list of strings, generate the hash code for each string, and check for it in a sorted list of hash codes; if it's found, the string is a duplicate, otherwise add the hash code to the sorted list.
This will work for strings, file names, images (you can get a unique hash code for an image), etc., and I guarantee that this will be as fast as or faster than any other implementation.
PS You can use a string list for the hash codes by representing the hash codes as strings. I've used a hex representation in the past (256 bits -> 64 characters) but in theory you can do it any way you like.
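A sketch of that idea in Delphi terms (HashOf is a placeholder for whatever digest routine you have available - e.g. a third-party SHA-256 unit returning a hex string - not a standard RTL function):
var
  Hashes: TStringList;
  Doomed: TList;
  x: Integer;
begin
  Hashes := TStringList.Create;
  Doomed := TList.Create;
  try
    Hashes.Sorted := True; // IndexOf becomes a binary search
    for x := 0 to JList.Items.Count - 1 do
      if Hashes.IndexOf(HashOf(JList.Items[x])) >= 0 then // HashOf is hypothetical
        Doomed.Add(Pointer(x)) // later duplicate: mark its index for deletion
      else
        Hashes.Add(HashOf(JList.Items[x]));
    for x := Doomed.Count - 1 downto 0 do
      JList.Items.Delete(Integer(Doomed[x])); // delete back to front so indexes stay valid
  finally
    Doomed.Free;
    Hashes.Free;
  end;
end;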
4 seconds for how many calls? Great performance if you call it a billion times...
Anyway, does Length(Path1) get evaluated every time through the loop? If so, store that in an Integer variable prior to looping.
Pointers may yield some speed over the strings.
Try in-lining the function with:
function SameFile(blah blah): Boolean; Inline;
That will save some time, if this is being called thousands of times per second. I would start with that and see if it saves anything.
EDIT: I didn't realize that your list wasn't sorted. Obviously, you should do that first! Then you don't have to compare against every other item in the list - just the prior or next one.
I use a modified Ternary Search Tree (TST) to dedupe lists. You simply load the items into the tree, using the whole string as the key, and on each item you get back an indication of whether the key is already there (and can delete your visible entry). Then you throw away the tree. Our TST load function can typically load 100,000 80-byte items in well under a second, and repainting your list should take no longer than that, with proper use of BeginUpdate and EndUpdate. The TST is memory-hungry, but not so much that you would notice it at all when you only have on the order of 500 items. And it is much simpler than sorting, comparisons and assembler (if you have a suitable TST implementation, of course).
No need to use a hash table - a single sorted list gives me a result of 10 milliseconds. That's 0.01 seconds, which is about 500 times faster! Here is my test code using a TListBox:
procedure TForm1.Button1Click(Sender: TObject);
var
lIndex1: Integer;
lString: string;
lIndex2: Integer;
lStrings: TStringList;
lCount: Integer;
lItems: TStrings;
begin
ListBox1.Clear;
for lIndex1 := 1 to 577 do begin
lString := '';
for lIndex2 := 1 to 100 do
if (lIndex2 mod 6) = 0 then
lString := lString + Chr(Ord('a') + Random(2))
else
lString := lString + 'a';
ListBox1.Items.Add(lString);
end;
CsiGlobals.AddLogMsg('Start', 'Test', llBrief);
lStrings := TStringList.Create;
try
lStrings.Sorted := True;
lCount := 0;
lItems := ListBox1.Items;
with lItems do begin
BeginUpdate;
try
for lIndex1 := Count - 1 downto 0 do begin
lStrings.Add(Strings[lIndex1]);
if lStrings.Count = lCount then
Delete(lIndex1)
else
Inc(lCount);
end;
finally
EndUpdate;
end;
end;
finally
lStrings.Free;
end;
CsiGlobals.AddLogMsg('Stop', 'Test', llBrief);
end;
I'd also like to point out that your solution would take an extreme amount of time if applied to a huge list (containing, say, 100,000,000 items or more). Even constructing a hashtable or sorted list would take too much time.
In cases like that you could try another approach: hash each member, but instead of populating a full-blown hashtable, create a bitset with a number of slots close to (some multiple of) the number of input items, and just set the bit at the offset indicated by the hash function. If the bit was 0, change it to 1. If it was already 1, take note of the offending string index in a separate list and continue. This results in a list of string indexes that had a collision in the hash, so you'll have to run a second pass to find the first cause of those collisions. After that, you should sort and de-dupe the string indexes in this list (as all indexes apart from the first one will be present twice). Once that's done, sort the list again, but this time sort it on the string contents, in order to easily spot duplicates in a single following scan.
Granted, it may be a bit extreme to go to all this length, but at least it's a workable solution for very large volumes! (Oh, and this still won't work if the number of duplicates is very high, if the hash function has a bad spread, or if the number of slots in the 'hashtable' bitset is chosen too small - all of which would give many collisions that aren't really duplicates.)
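A sketch of the bitset part (TBits is from the Classes unit; the FNV-1a hash here is just one self-contained choice, and the slot count is an assumption you'd tune to your data; compile with overflow checking off, as the hash relies on Cardinal wraparound):
uses Classes;

function SimpleHash(const S: string; Slots: Cardinal): Cardinal;
var
  i: Integer;
begin
  Result := 2166136261; // FNV-1a offset basis
  for i := 1 to Length(S) do
    Result := (Result xor Ord(S[i])) * 16777619; // FNV-1a prime
  Result := Result mod Slots;
end;

procedure FindCollisionCandidates(Items: TStrings; Candidates: TList);
const
  Slots = 1 shl 24; // ~16M slots; size it to several times the item count
var
  Bits: TBits;
  i: Integer;
  Slot: Cardinal;
begin
  Bits := TBits.Create;
  try
    Bits.Size := Slots;
    for i := 0 to Items.Count - 1 do
    begin
      Slot := SimpleHash(Items[i], Slots);
      if Bits[Slot] then
        Candidates.Add(Pointer(i)) // possible duplicate (or mere hash collision): needs the second pass
      else
        Bits[Slot] := True;
    end;
  finally
    Bits.Free;
  end;
end;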
I have a simple program that sorts a text file according to the length of the words on each line.
This program works without problems on my old XP-based machine.
Now when I run it on my new Win7/Intel Core i5 machine, it freezes the whole system and returns to normal after it finishes its work.
I've investigated the code and found the line causing the freeze.
It was this specific line:
caption := IntToStr(i) + '..' + IntTostr(ii);
I changed it to
caption := IntTostr(ii); //slow rate change
and there is no freeze.
Then I changed it to
caption := IntTostr(i); //fast rate change
and it freezes again.
My procedure code is:
var tword : widestring;
i,ii,li : integer;
begin
tntlistbox1.items.LoadFromFile('d:\new folder\ch.txt');
tntlistbox2.items.LoadFromFile('d:\new folder\uy.txt');
For ii := 15 Downto 1 Do //slow change
Begin
For I := 0 To TntListBox1.items.Count - 1 Do //very fast change
Begin
caption := IntToStr(i) + '..' + IntTostr(ii); //problematic line
tword := TntListBox1.items[i];
LI := Length(tword);
If lI = ii Then
Begin
tntlistbox3.items.Add(Trim(tntlistbox1.Items[i]));
tntlistbox4.items.Add(Trim(tntlistbox2.Items[i]));
End;
End;
End;
end;
Any idea why, and how to fix it?
I use Delphi 2007/Win32.
Is this happening inside an event handler on a form? I'm going to guess that it is. In that case, "Caption" is in the scope of the form. The form's caption text isn't managed by the VCL but by Windows, and you're sending a new WM_SETTEXT message on every iteration of the loop.
A thorough explanation of why this is doing what it's doing would require knowledge of Windows internals that I don't have, but if I were to take a guess, I'd say it's something like this:
Every time you send that WM_SETTEXT message with a new caption, Windows checks to make sure it's not identical to the existing caption. If it is identical, it can exit immediately. That's why the infrequent change (the one that only uses ii) doesn't slow your system down. But if the text does change on every iteration, then Windows has to perform some sort of task switch in order to change it.
As for why that would bog down the entire system under a Vista kernel (including Win7) but not XP, that's completely outside my area of expertise. But if you're trying to do this as some sort of progress indicator, there are better ways, especially if this loop is as tight as it looks.
The best way to handle progress updates in a tight loop is to count iterations and only fire once every X times (100 or 1000 can be good values for X, depending on how many times the loop runs and how long the whole thing takes); see the sketch below. This is basically what the ii-only option does. You could also try putting a progress bar on the form to measure progress instead of doing it through the form's caption.
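For example, a sketch of the counting idea applied to the loop in the question:
For I := 0 To TntListBox1.Items.Count - 1 Do
Begin
  if I mod 1000 = 0 then // fire the expensive caption update only every 1000th iteration
    caption := IntToStr(I) + '..' + IntToStr(ii);
  // ... rest of the loop body unchanged
End;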
Changing a form's caption triggers a whole bunch of actions - especially under Vista and Win7 with Aero active.
A quick thing to try would be using a TLabel instead for displaying progress. Something like
Label1.Caption := IntToStr(i) + '..' + IntTostr(ii); //problematic line, moved to a label
Label1.Refresh; // or Repaint
should do the trick unless your label is transparent or on a glass area.
It would probably be best to follow Mason Wheeler's advice and use a progress bar. As the overall number of iterations is 15 * TntListBox1.Items.Count, you can calculate the progress value quite easily, as sketched below.
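For example (assuming a TProgressBar named ProgressBar1 dropped on the form; ii runs 15 downto 1 as in the question):
ProgressBar1.Max := 15 * TntListBox1.Items.Count;
ProgressBar1.Position := (15 - ii) * TntListBox1.Items.Count + I;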
First: you forgot the tntlistbox3.Items.BeginUpdate/tntlistbox3.Items.EndUpdate calls (same for tntlistbox4).
Second: Why does my program run faster if I click and hold the caption bar?
Solution (example):
const
UpdateInterval = 500; // half a second
var
...
LastUpdate: Cardinal;
begin
...
LastUpdate := GetTickCount + 100000; // forces first update
For ii := 15 Downto 1 Do //slow change
Begin
For I := 0 To TntListBox1.items.Count - 1 Do //very fast change
Begin
if (GetTickCount > (LastUpdate + UpdateInterval)) or
   (GetTickCount < LastUpdate) then // second test handles GetTickCount wraparound
begin
  caption := IntToStr(i) + '..' + IntTostr(ii); // previously problematic line
  LastUpdate := GetTickCount; // don't update again until the interval has passed
end;
...
end;
end;
I tried to use a TListView component to display rather large data lists (like 4000 rows large), and creating the list is incredibly slow - it takes something like 2-3 secs, which makes the UI all laggy and close to unusable.
I fill the TListView.Items inside a BeginUpdate/EndUpdate block, with only preallocated strings - I mean: I build a list of all strings to store (which takes no humanly noticeable time), then I put them in the TListView.
I wish to display the TListView's content in vsReport mode with several columns.
The code looks like this:
MyList.Items.BeginUpdate;
for i := 0 to MyCount - 1 do
begin
ListItem := MyList.Items.Add;
ListItem.Caption := StrCaptions[i];
ListItem.SubItems.Add(StrSubItems1[i]);
ListItem.SubItems.Add(StrSubItems2[i]);
end;
MyList.Items.EndUpdate;
Is there some other hack I missed in the TListView component's logic? Or should I just forget about using this component for performance reasons?
You can use the listview in virtual mode. Have a look at the virtuallistview.dpr demo.
You can try Virtual Treeview component. It says "Virtual Treeview is extremely fast. Adding one million nodes takes only 700 milliseconds"
Use a separate structure for holding your data. Set OwnerData of the TListView to True.
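A minimal sketch of that virtual mode, assuming the captions/subitems live in string lists as in the question (MyListData is wired to the list view's OnData event):
procedure TForm1.FormCreate(Sender: TObject);
begin
  MyList.OwnerData := True;      // virtual mode: the control stores no strings itself
  MyList.Items.Count := MyCount; // just tell it how many rows exist
end;

procedure TForm1.MyListData(Sender: TObject; Item: TListItem);
begin // called on demand for each row that actually needs painting
  Item.Caption := StrCaptions[Item.Index];
  Item.SubItems.Add(StrSubItems1[Item.Index]);
  Item.SubItems.Add(StrSubItems2[Item.Index]);
end;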
At 4000 rows I get only ~700 ms (D2009). For more responsiveness you could move the work to another thread or add a dirty Application.ProcessMessages() call into the loop.
The rows were generated with this code in 16 ms:
MyCount := 4000;
dw := GetTickCount();
for i := 0 to MyCount - 1 do begin
StrCaptions.Add('caption'+IntToStr(i));
StrSubItems1.Add('sub1'+IntToStr(i));
StrSubItems2.Add('sub2'+IntToStr(i));
end;
ShowMessageFmt('%u ms', [GetTickCount() - dw]);
Printed with:
MyList.Clear;
dw := GetTickCount();
MyList.Items.BeginUpdate;
for i := 0 to MyCount - 1 do
begin
ListItem := MyList.Items.Add;
ListItem.Caption := StrCaptions[i];
ListItem.SubItems.Add(StrSubItems1[i]);
ListItem.SubItems.Add(StrSubItems2[i]);
end;
MyList.Items.EndUpdate;
ShowMessageFmt('%u ms', [GetTickCount() - dw]);
EDIT:
I inserted Application.ProcessMessages() into the print loop, but for some reason performance stays the same.
I have an app that loads records from a binary log file and displays them in a virtual TListView. There are potentially millions of records in a file, and the display can be filtered by the user, so I do not load all of the records in memory at one time, and the ListView item indexes are not a 1-to-1 relation with the file record offsets (List item 1 may be file record 100, for instance). I use the ListView's OnDataHint event to load records for just the items the ListView is actually interested in. As the user scrolls around, the range specified by OnDataHint changes, allowing me to free records that are not in the new range, and allocate new records as needed.
This works fine, speed is tolerable, and the memory footprint is very low.
I am currently evaluating TVirtualStringTree as a replacement for the TListView, mainly because I want to add the ability to expand/collapse records that span multiple lines (I can fudge it with the TListView by incrementing/decrementing the item count dynamically, but this is not as straightforward as using a real tree).
For the most part, I have been able to port the TListView logic and have everything work as I need. I notice that TVirtualStringTree's virtual paradigm is vastly different, though. It does not have the same kind of OnDataHint functionality that TListView does (I can use the OnScroll event to fake it, which allows my memory buffer logic to continue working), and I can use the OnInitializeNode event to associate nodes with records that are allocated.
However, once a tree node is initialized, it seems that it remains initialized for the lifetime of the tree. That is not good for me. As the user scrolls around and I remove records from memory, I need to reset those non-visual nodes without removing them from the tree completely, or losing their expand/collapse states. When the user scrolls them back into view, I can re-allocate the records and re-initialize the nodes. Basically, I want to make TVirtualStringTree act as much like TListView as possible, as far as its virtualization is concerned.
I have seen that TVirtualStringTree has a ResetNode() method, but I encounter various errors whenever I try to use it. I must be using it wrong. I also thought of just storing a data pointer inside each node to my record buffers and, as I allocate and free memory, updating those pointers accordingly. The end effect does not work so well, either.
Worse, my largest test log file has ~5 million records in it. If I initialize the TVirtualStringTree with that many nodes at one time (when the log display is unfiltered), the tree's internal overhead for its nodes takes up a whopping 260MB of memory (without any records being allocated yet). Whereas with the TListView, loading the same log file and all the memory logic behind it, I can get away with using just a few MBs.
Any ideas?
You probably shouldn't switch to VST unless you have a use for at least some of the nice features of VST that a standard listbox/listview doesn't have. But there is of course a large memory overhead compared to a flat list of items.
I don't see a real benefit in using TVirtualStringTree only to be able to expand and collapse items that span multiple lines. You write
mainly because I want to add the ability to expand/collapse records that span multiple lines (I can fudge it with the TListView by incrementing/decrementing the item count dynamically, but this is not as straightforward as using a real tree).
but you can implement that easily without changing the item count. If you set the Style of the listbox to lbOwnerDrawVariable and implement the OnMeasureItem event you can adjust the height as required to draw either only the first or all lines. Drawing the expander triangle or the little plus symbol of a tree view manually should be easy. The Windows API functions DrawText() or DrawTextEx() can be used both to measure and draw the (optionally word-wrapped) text.
Edit:
Sorry, I completely missed the fact that you are using a listview right now, not a listbox. Indeed, there is no way to have rows with different heights in a listview, so that's no option. You could still use a listbox with a standard header control on top, but that may not support everything you are using now from listview functionality, and it may itself be as much or even more work to get right than dynamically showing and hiding listview rows to simulate collapsing and expanding.
If I understand it correctly, the memory requirement of TVirtualStringTree should be:
nodecount * (SizeOf(TVirtualNode) + YourNodeDataSize + DWORD-align-padding)
To minimize the memory footprint, you could perhaps initialize the nodes with only pointers to offsets in a memory-mapped file. Resetting nodes which have already been initialized doesn't seem necessary in this case - the memory footprint should be nodecount * (44 + 4 + 0) bytes, i.e. about 230 MB for 5 million records.
IMHO you can't get any better with the tree but using a memory-mapped file would allow you to read the data directly from the file without allocating even more memory and copying the data to it.
You could also consider using a tree structure instead of a flat view to present the data. That way you could initialize child nodes of a parent node on demand (when the parent node is expanded) and resetting the parent node when it's collapsed (therefore freeing all its child nodes). In other words, try not to have too many nodes at the same level.
To meet your requirement "to expand/collapse records that span multiple lines", I'd simply use a drawgrid. To check it out, drag a drawgrid onto a form, then plug in the following Delphi 6 code. You can collapse and expand 5,000,000 multiline records (or whatever quantity you want) with essentially no overhead. It's a simple technique, doesn't require much code, and works surprisingly well.
unit Unit1;
interface
uses
Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms, Dialogs, Grids, StdCtrls;
type
TForm1 = class(TForm)
DrawGrid1: TDrawGrid;
procedure DrawGrid1DrawCell(Sender: TObject; ACol, ARow: Integer; Rect: TRect; State: TGridDrawState);
procedure DrawGrid1SelectCell(Sender: TObject; ACol, ARow: Integer; var CanSelect: Boolean);
procedure DrawGrid1TopLeftChanged(Sender: TObject);
procedure DrawGrid1DblClick(Sender: TObject);
procedure FormCreate(Sender: TObject);
private
procedure AdjustGrid;
end;
var
Form1: TForm1;
implementation
{$R *.dfm}
// Display a large number of multi-line records that can be expanded or collapsed, using minimal overhead.
// LinesInThisRecord() and RecordContents() are faked; change them to return actual data.
const TOTALRECORDS = 5000000; // arbitrary; a production implementation would probably determine this at run time
// keep track of whether each record is expanded or collapsed
var isExpanded: packed array[1..TOTALRECORDS] of boolean; // initially all FALSE
function LinesInThisRecord(const RecNum: integer): integer;
begin // how many lines (rows) does the record need to display when expanded?
result := (RecNum mod 10) + 1; // make something up, so we don't have to use real data just for this demo
end;
function LinesDisplayedForRecord(const RecNum: integer): integer;
begin // how many lines (rows) of info are we currently displaying for the given record?
if isExpanded[RecNum] then result := LinesInThisRecord(RecNum) // all lines show when expanded
else result := 1; // show only 1 row when collapsed
end;
procedure GridRowToRecordAndLine(const RowNum: integer; var RecNum, LineNum: integer);
var LinesAbove: integer;
begin // for a given row number in the drawgrid, return the record and line numbers that appear in that row
RecNum := Form1.DrawGrid1.TopRow; // for simplicity, TopRow always displays the record with that same number
if RecNum > TOTALRECORDS then RecNum := 0; // avoid overflow
LinesAbove := 0;
while (RecNum > 0) and ((LinesDisplayedForRecord(RecNum) + LinesAbove) < (RowNum - Form1.DrawGrid1.TopRow + 1)) do
begin // accumulate the tally of lines in expanded or collapsed records until we reach the row of interest
inc(LinesAbove, LinesDisplayedForRecord(RecNum));
inc(RecNum); if RecNum > TOTALRECORDS then RecNum := 0; // avoid overflow
end;
LineNum := RowNum - Form1.DrawGrid1.TopRow + 1 - LinesAbove;
end;
function RecordContents(const RowNum: integer): string;
var RecNum, LineNum: integer;
begin // display the data that goes in the grid row. for now, fake it
GridRowToRecordAndLine(RowNum, RecNum, LineNum); // convert row number to record and line numbers
if RecNum = 0 then result := '' // out of range
else
begin
result := 'Record ' + IntToStr(RecNum);
if isExpanded[RecNum] then // show line counts too
result := result + ' line ' + IntToStr(LineNum) + ' of ' + IntToStr(LinesInThisRecord(RecNum));
end;
end;
procedure TForm1.AdjustGrid;
begin // don't allow scrolling past last record
if DrawGrid1.TopRow > TOTALRECORDS then DrawGrid1.TopRow := TOTALRECORDS;
if RecordContents(DrawGrid1.Selection.Top) = '' then // move selection back on to a valid cell
DrawGrid1.Selection := TGridRect(Rect(0, TOTALRECORDS, 0, TOTALRECORDS));
DrawGrid1.Refresh;
end;
procedure TForm1.DrawGrid1DrawCell(Sender: TObject; ACol, ARow: Integer; Rect: TRect; State: TGridDrawState);
var s: string;
begin // time to draw one of the grid cells
if ARow = 0 then s := 'Data' // we're in the top row, get the heading for the column
else s := RecordContents(ARow); // painting a record, get the data for this cell from the appropriate record
// draw the data in the cell
ExtTextOut(DrawGrid1.Canvas.Handle, Rect.Left, Rect.Top, ETO_CLIPPED or ETO_OPAQUE, @Rect, pchar(s), length(s), nil);
end;
procedure TForm1.DrawGrid1SelectCell(Sender: TObject; ACol, ARow: Integer; var CanSelect: Boolean);
var RecNum, ignore: integer;
begin
GridRowToRecordAndLine(ARow, RecNum, ignore); // convert selected row number to record number
CanSelect := RecNum <> 0; // don't select unoccupied rows
end;
procedure TForm1.DrawGrid1TopLeftChanged(Sender: TObject);
begin
AdjustGrid; // keep last page looking good
end;
procedure TForm1.DrawGrid1DblClick(Sender: TObject);
var RecNum, ignore, delta: integer;
begin // expand or collapse the currently selected record
GridRowToRecordAndLine(DrawGrid1.Selection.Top, RecNum, ignore); // convert selected row number to record number
isExpanded[RecNum] := not isExpanded[RecNum]; // mark record as expanded or collapsed; subsequent records might change their position in the grid
delta := LinesInThisRecord(RecNum) - 1; // amount we grew or shrank (-1 since record already occupied 1 line)
if isExpanded[RecNum] then // just grew
else delta := -delta; // just shrank
DrawGrid1.RowCount := DrawGrid1.RowCount + delta; // keep rowcount in sync
AdjustGrid; // keep last page looking good
end;
procedure TForm1.FormCreate(Sender: TObject);
begin
Caption := FormatFloat('#,##0 records', TOTALRECORDS);
DrawGrid1.RowCount := TOTALRECORDS + 1; // +1 for column heading
DrawGrid1.ColCount := 1;
DrawGrid1.DefaultColWidth := 300; // arbitrary
DrawGrid1.DefaultRowHeight := 12; // arbitrary
DrawGrid1.Options := DrawGrid1.Options - [goVertLine, goHorzLine, goRangeSelect] + [goDrawFocusSelected, goThumbTracking]; // change some defaults
end;
end.
You shouldn't use ResetNode, because this method invokes InvalidateNode and initializes the node again, leading to the opposite of the intended effect.
I don't know if it's possible to induce VST to free the memory specified by NodeDataSize without actually removing the node. But why not set NodeDataSize to SizeOf(Pointer) (see Delphi, VirtualStringTree - classes (objects) instead of records) and manage the data yourself? Just an idea...
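A sketch of that pointer-per-node idea (names like VST and the lazy-loading policy are illustrative, not from the original answer):
procedure TForm1.FormCreate(Sender: TObject);
begin
  VST.NodeDataSize := SizeOf(Pointer); // each node carries just one pointer
end;

procedure TForm1.VSTInitNode(Sender: TBaseVirtualTree; ParentNode,
  Node: PVirtualNode; var InitialStates: TVirtualNodeInitStates);
begin
  PPointer(Sender.GetNodeData(Node))^ := nil; // no record buffered yet; set it when the
                                              // record is loaded, clear it when evicted
end;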
Give "DeleteChildren" a try. Here's what this procedure's comment says:
// Removes all children and their children from memory without changing the vsHasChildren style by default.
I've never used it, but as I read it, you can use it in the OnCollapsed event to free the memory allocated to nodes that just became invisible, and then re-generate those nodes in OnExpanding so the user never knows the node went away from memory.
But I can't be sure, as I've never had a need for such behaviour.
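A hedged sketch of that idea (untested, as noted above; LoadChildCountFor is a hypothetical helper that knows how many children a node should have):
procedure TForm1.VSTCollapsed(Sender: TBaseVirtualTree; Node: PVirtualNode);
begin
  Sender.DeleteChildren(Node); // frees the child nodes; vsHasChildren stays set by default,
                               // so the expand button remains visible
end;

procedure TForm1.VSTExpanding(Sender: TBaseVirtualTree; Node: PVirtualNode;
  var Allowed: Boolean);
begin
  Sender.ChildCount[Node] := LoadChildCountFor(Node); // re-create the children on demand
  Allowed := True;
end;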