In Delphi, is there a fast way of emptying a TStringgrid (containing in excess of 5000 rows) that will also free the memory?
Setting the rowcount to 1, empties the grid but does not free the memory.
Thanks in advance,
Paul
This should uninitialize the allocated strings (from the string list where the row texts are stored). Cleaning is done by columns since you have a lot of rows.
procedure TForm1.Button1Click(Sender: TObject);
var
I: Integer;
begin
for I := 0 to StringGrid1.ColCount - 1 do
StringGrid1.Cols[I].Clear;
StringGrid1.RowCount := 1;
end;
By "does not free the memory", do you mean that if you set RowCount := 1, and then set the RowCount := 10' you can still see the old content of theCells`?
If so, this is an old issue and has nothing to do with the memory not being freed; it's simply because you just happen to see the previous content of the memory when it's allocated again, because memory isn't zero'd out.
I have a pretty standard routine in a utility unit that deals with this visual glitch, and unless the grid is huge works fast enough. Just pass the TStringGrid before you change the RowCount or ColCount to a lower value.
procedure ClearStringGrid(const Grid: TStringGrid);
var
c, r: Integer;
begin
for c := 0 to Pred(Grid.ColCount) do
for r := 0 to Pred(Grid.RowCount) do
Grid.Cells[c, r] := '';
end;
Use it like this:
ClearStringGrid(StringGrid1);
StringGrid1.RowCount := 1;
I would suggest storing your string values in your own memory that you have full control over, and then use a TDrawGrid, or better a virtual TListView, to display the contents of that memory as needed.
The fastest way to use a TStringGrid is using OnGetValue/OnSetValue.
This way only the text of visible cells are requested dynamically.
Adding and removing rows is then lighting fast, otherwise TStringgrid is
very slooow when you have more than 5000 records.
This way I can fill and clear a grid with 700.000 records within a second!
When memory usage is the critical argument, consider using another grid. For example, NLDStringGrid that is (re)written by myself, and which has an additional property called MemoryOptions. It controls whether data can be stored beyond ColCount * RowCount, whether the storage is proportional (less memory usage for partially filled rows and columns), whether to store the Cols and Rows property results and whether the data is stored in sparse manner.
To clear such grid that has moBeyondGrid excluded from the memory options, setting RowCount to FixedRows suffices.
It's open source and downloadable from here.
Related
In another question, David Heffernan posted a comment about his "all time least favourite Delphi construct":
SetLength(FItems, Length(FItems)+1);
The construct he dislikes was heavily used in the Pebongo project, such as in this excerpt:
procedure TBSONDocument.ReadStream( F: TStream );
var
len : integer;
elmtype : byte;
elmname : string;
begin
Clear;
f.Read( len, sizeof( len ) );
f.Read( elmtype, sizeof( byte ) );
while elmtype <> BSON_EOF do // Loop
begin
elmname := _ReadString( f );
SetLength( FItems, length( FItems ) + 1 ); // This, spotted by TOndrej
case elmtype of
BSON_ARRAY: FItems[high( FItems )] := TBSONArrayItem.Create;
BSON_BINARY: FItems[high( FItems )] := TBSONBinaryItem.Create;
...
end;
f.Read( elmtype, sizeof( byte ) );
end;
end;
What are the alternatives?
It's not the SetLength() itself that is bad but incrementing the length in the loop. Example of bad code:
SetLength( Result.FItems, 0 );
for i := 0 to high( FItems ) do
begin
SetLength(Result.FItems, Length(Result.Fitems)+1);
Result.FItems[i] := FItems[i].Clone;
end;
In this case array is rearranged and reallocating memory on each iteration. Your posted example doesn't show bad usage of SetLength()
What I was talking about was growing dynamic arrays 1 element at a time:
SetLength(FItems, Length(FItems)+1);
In a loop, and with large arrays, this can lead to memory address fragmentation. When this happens you can find yourself unable to allocate a large contiguous block of memory even though the total available address space is large. If you are constrained by 32 bit address space this can be a very real problem. In addition, performance can also be an issue.
There are a variety of ways to avoid the problem:
Pre-allocate the array. Sometimes this involves iterating twice, once to count the items and once to populate the array. This can be perfectly efficient. In fact, that is precisely what TBSONDocument.Clone in your question does.
Use a TList or TList<T> container. Although these use dynamic arrays as their underlying storage they implement a capacity based approach to allocation. When they are full, they allocate a large number of extra items in anticipation for future additions. This again can be perfectly efficient.
Yet another option, to be used as a last resort, is to move away from contiguously allocated memory. Allocate memory in chunks and use an indexed property of a class or record to map between indices and the actual chunk and location where the storage lives. This is particularly effective at avoiding address space fragmentation.
An alternative would be to use a list like TList, TObjectList, or a generic list.
Actually, the part you posted (SetLength(Result.FItems, Length(FItems));) is fine. I believe David was referring to another part: SetLength(FItems, Length(FItems) + 1); which is supposed to grow a dynamic array and can be inefficient in a loop.
An alternative you can use when the number of items is unknown beforehand: Increase the length of the array every X items, and at the end adjust the length of the array to the number of items added.
Example:
Count := 0;
try
while not Query.EOF do
begin
if Length(MyArray) = Count then
SetLength(MyArray, Length(MyArray) + 50); //increase length with 50 items
MyArray[Count] := Query.Fields(0).AsString;
Inc(Count);
Query.Next;
end;
finally
SetLength(MyArray, Count); //adjust length to number of items added
end;
A TList utilizes the non-contiguous memory style that David refers to. This is an addon to his answer.
Here's a real world analogy:
"Cars in a parking lot": An analogy for List of Objects versus an Array of Records
The Array
You have a row of cars in the parking lot filled up. The row of cars is an array. The actual cars are stored in the array. The size of the array in memory is the size of all of the real cars combined.
Adding a car to an Array
Let's say you want to add another car, but there's no more room in the row. So, you have to build a new row that's bigger and move all the cars over into the new row. You have to do this each time you want to add one to the row. That's an array.
The List
What if you put the cars in a list by writing their VINs onto a piece of paper? Each VIN refers to a car out in the parking lot. That list on the paper is a TList. It just stores a reference to each of the cars.
Adding a car to a TList
Now, let's say you want to add a car to the list. Well, you can park the car anywhere. If your lot's full, park it in Alaska. It doesn't matter. Just add the new VIN to the list, and you're done. If your paper fills up, get another piece of paper. But, don't move the cars.
TList is good because the actual list items aren't stored in the list.
It's just pointers to the items. Those pointers are relatively small,
plus TList pre-allocates space (in the analogy, each sheet of paper).
I have a TStringGrid with 10 columns. Adding 500 rows to it takes around 2 seconds. Is this normal performance?
It seems a bit slow to me.
I am getting the data from a database query. If I loop through the query but don't write the results to the StringGrid, the process takes around 100ms, so it's not the database that's slowing things down.
Once the rows are added, the StringGrid performance is fine.
Here is the code I am using
Grid.RowCount := Query.RecordCount;
J := 0;
while not Query.EOF do
begin
Grid.Cells[0,J]:=Query.FieldByName('Value1').AsString;
Grid.Cells[1,J]:=Query.FieldByName('Value2').AsString;
Grid.Cells[2,J]:=Query.FieldByName('Value3').AsString;
// etc for other columns.
Inc(J);
Query.Next();
end;
The real code is actually a bit more complex (the table columns do not correspond exactly to the query columns) but that's the basic idea
One other thing I have found to be very important when going through a lot of records is to use proper TField variables for each field. FieldByName iterates through the Fields collection every time so is not the most performant option.
Before the loop define each field as in:
var
f1, f2: TStringField;
f3: TIntegerField;
begin
// MyStringGrid.BeginUpdate; // Can't do this
// Could try something like this instead:
// MyStringGrid.Perform(WM_SETREDRAW, 0, 0);
try
while ... do
begin
rowvalues[0] := f1.AsString;
rowvalues[1] := f2.AsString;
rowvalues[2] := Format('%4.2d', f3.AsInteger);
// etc
end;
finally
// MyStringGrid.EndUpdate; // Can't - see above
// MyStringGrid.Perform(WM_SETREDRAW, 1, 0);
// MyStringGrid.Invalidate;
end;
end;
That along with BeginUpdate/Endupdate and calling Query.DisableControls if appropriate.
The solution was to add all values in a row at once, using the "Rows" property.
My code now looks like this:
Grid.RowCount := Query.RecordCount;
rowValues:=TStringList.Create;
J := 0;
while not Query.EOF do
begin
rowValues[0]:=Query.FieldByName('Value1').AsString;
rowValues[1]:=Query.FieldByName('Value2').AsString;
rowValues[2]:=Query.FieldByName('Value3').AsString;
// etc for other columns.
Grid.Rows[J]:=rowValues;
Inc(J);
Query.Next();
end;
rowValues.Free; // for the OCD among us
This brought the time down from 2 seconds to about 50ms.
FieldByName used in a loop is very slow since it is calculated each time. You should do it out of the loop and then just use results inside of a loop.
TStringGrid works OK for a small number of records, but don't try it for more than 10.000 records.
We had severe performance problems with TAdvStringGrid from TMS (which is based on Delphi TStringGrid) when loading/sorting/grouping large grid sets, but also when inserting one row at the top of the grid (expanding a grid group node). Also memory usage was high.
And yes, I used the beginupdate/endupdate already. Also other tricks. But after diving into the structure of TStringGrid I concluded it could never be fast for many records.
As a general tip (for large grids): use the OnGetText (and OnSetText) event. This event is used for filling the grid on demand (only the cells that are displayed). Store the data in your own data objects. This made our grid very fast (1.000.000 record is no problem anymore, loads within seconds!)
First optimization is to replace very slow Query.FieldByName('Value1') calls by a local TQuery.
var
F1, F2, F3: TField;
Grid.RowCount := Query.RecordCount;
J := 0;
F1 := Query.FieldByName('Value1');
F2 := Query.FieldByName('Value2');
F3 := Query.FieldByName('Value3');
while not Query.EOF do
begin
Grid.Cells[0,J]:=F1.AsString;
Grid.Cells[1,J]:=F2.AsString;
Grid.Cells[2,J]:=F3.AsString;
// etc for other columns.
Inc(J);
Query.Next();
end;
If this is not enough, use the grid in virtual mode, i.e. retrieve all content in a TStringList or any in-memory structure, then use the OnGetText or OnDrawCell methods.
I believe it's slow because it has to repaint itself everytime you add a row. Since you are taking the values from a query i think it would be better for you to use a TDBGrid instead.
Best regards.
If you know how many rows you're about to add, store the current rowcount in a temporary variable, set the grid's rowcount to accommodate the current rowcount plus the rows you're about to add, then assign the new values to the rows (using the former rowcount you stored) rather than adding them. This will reduce a lot of background processing.
Try testing with AQTime or similar tool (profilers).
Without any code is difficult, but I thinks thar the poor performance is due to FieldByName, not StringGrid.
FieldByName make a liear search:
for I := 0 to FList.Count - 1 do
begin
Result := FList.Items[I];
...
If your Dataset have many columns (fields) the performance will still be lower.
Regards.
I was going to say "why not just use beginupdate/endupdate?" but now I see that the regular string grid doesn't support it.
While googling that, I found a way to simulate beginupdate/endupdate:
http://www.experts-exchange.com/Programming/Languages/Pascal/Delphi/Q_21832072.html
See the answer by ZhaawZ, where he uses a pair of WM_SETREDRAW messages to disable/enable the repainting. If this works, use in conjunction with the "eliminate use of FieldbyName" trick, and it should take no time to draw.
Set Grid.RowCount = 2 before the loop then when the loop is finished set the rowcount to the correct value.
That avoids lots of calls to the OnPaint event.
In my case it turned out that the Debug build was slow and the Release build was fast - a Heisenbug.
More specifically, FastMM4 FullDebugMode triggered the slowness.
I have to check if I have duplicate paths in a FileListBox (FileListBox has the role of some kind of job list or play list).
Using Delphi's SameText, CompareStr, CompareText, takes 6 seconds. So I came with my own compare function which is (just) a bit faster but not fast enough. Any ideas how to improve it?
function SameFile(CONST Path1, Path2: string): Boolean;
VAR i: Integer;
begin
Result:= Length(Path1)= Length(Path2); { if they have different lenghts then obviously are not the same file }
if Result then
for i:= Length(Path1) downto 1 DO { start from the end because it is more likely to find the difference there }
if Path1[i]<> Path2[i] then
begin
Result:= FALSE;
Break;
end;
end;
I use it like this:
for x:= JList.Count-1 downto 1 DO
begin
sMaster:= JList.Items[x];
for y:= x-1 downto 0 DO
if SameFile(sMaster, JList.Items[y]) then
begin
JList.Items.Delete (x); { REMOVE DUPLICATES }
Break;
end;
end;
Note: The chance of having duplicates is small so Delete is not called often. Also the list cannot be sorted because the items are added by user and sometimes the order may be important.
Update:
The thing is that I lose the asvantage of my code because it is Pascal.
It would be nice if the comparison loop ( Path1[i]<> Path2[i] ) would be optimized to use Borland's ASM code.
Delphi 7, Win XP 32 bit, Tests were done with 577 items in the list. Deleting the items from list IS NOT A PROBLEM because it happens rarely.
CONCLUSION
As Svein Bringsli pointed, my code is slow not because of the comparing algorithm but because of TListBox. The BEST solution was provided by Marcelo Cantos. Thanks a lot Marcelo.
I accepted Svein's answer because it answers directly my question "how to make my comparison function faster" with "there is no point to make it faster".
For the moment I implemented the dirty and quick to implement solution: when I have under 200 files, I use my slow code to check the duplicates. If there are more than 200 files I use dwrbudr's solution (which is damn fast) considering that if the user has so many files, the order is irrelevant anyway (human brain cannot track so many items).
I want to thank you all for ideas and especially Svein for revealing the truth: (Borland's) visual controls are damn slow!
Don't waste time optimising the assembler. You can go from O(n2) to O(n log(n)) — bringing the time down to milliseconds — by sorting the list and then doing a linear scan for duplicates.
While you're at it, forget the SameFile function. The algorithmic improvement will dwarf anything you can achieve there.
Edit: Based on feedback in the comments...
You can perform an order-preserving O(n log(n)) de-duplication as follows:
Sort a copy of the list.
Identify and copy duplicated entries to a third list along with their duplication count minus one.
Walk the original list backwards as per your original version.
In the inner (for y := ...) loop, traverse the duplication list instead. If an outer item matches, delete it, decrement the duplication count, and delete the duplication entry if the count reaches zero.
This is obviously more complicated but it will still be orders of magnitude faster, even if you do horrible dirty things like storing duplication counts as strings, C:\path1\file1=2, and using code like:
y := dupes.IndexOfName(sMaster);
if y <> -1 then
begin
JList.Items.Delete(x);
c := StrToInt(dupes.ValueFromIndex(y));
if c > 1 then
dupes.Values[sMaster] = IntToStr(c - 1);
else
dupes.Delete(y);
end;
Side note: A binary chop would be more efficient than the for y := ... loop, but given that duplicates are rare, the difference ought to be negligible.
Using your code as a starting point, I modified it to take a copy of the list before searching for duplicates. The time went from 5,5 seconds to about 0,5 seconds.
vSL := TStringList.Create;
try
vSL.Assign(jList.Items);
vSL.Sorted := true;
for x:= vSL.Count-1 downto 1 DO
begin
sMaster:= vSL[x];
for y:= x-1 downto 0 DO
if SameFile(sMaster, vSL[y]) then
begin
vSL.Delete (x); { REMOVE DUPLICATES }
jList.Items.Delete (x);
Break;
end;
end;
finally
vSL.Free;
end;
Obviously, this is not a good way to do it, but it demonstrates that TFileListBox is in itself quite slow. I don't believe you can gain much by optimizing your compare-function.
To demonstrate this, I replaced your SameFile function with the following, but kept the rest of your code:
function SameFile(CONST Path1, Path2: string): Boolean;
VAR i: Integer;
begin
Result := false; //Pretty darn fast code!!!
end;
The time went from 5,6 seconds to 5,5 seconds. I don't think there's much more to gain there :-)
Create another sorted list with sortedList.Duplicates := dupIgnore and add your strings to that list, then back.
vSL := TStringList.Create;
try
vSL.Sorted := true;
vSL.Duplicates := dupIgnore;
for x:= 0 to jList.Count - 1 do
vSL.Add(jList[x]);
jList.Clear;
for x:= 0 to vSL.Count - 1 do
jList.Add(vSL[x]);
finally
vSL.Free;
end;
The absolute fastest way, bar none (as alluded to before) is to use a routine that generates a unique 64/128/256 bit hash code for a string (I use the SHA256Managed class in C#). Run down the list of strings, generate the hash code for the strings, check for it in the sorted hash code list, and if found then the string is a duplicate. Otherwise add the hash code to the sorted hash code list.
This will work for strings, file names, images (you can get the unique hash code for an image), etc, and I guarantee that this will be as fast or faster than any other impementation.
PS You can use a string list for the hash codes by representing the hash codes as strings. I've used a hex representation in the past (256 bits -> 64 characters) but in theory you can do it any way you like.
4 seconds for how many calls? Great performance if you call it a billion times...
Anyway, does Length(Path1) get evaluated every time through the loop? If so, store that in an Integer variable prior to looping.
Pointers may yield some speed over the strings.
Try in-lining the function with:
function SameFile(blah blah): Boolean; Inline;
That will save some time, if this is being called thousands of times per second. I would start with that and see if it saves anything.
EDIT: I didn't realize that your list wasn't sorted. Obviously, you should do that first! Then you don't have to compare against every other item in the list - just the prior or next one.
I use a modified Ternary Search Tree (TST) to dedupe lists. You simply load the items into the tree, using the whole string as the key, and on each item you can get back an indication if the key is already there (and delete your visible entry). Then you throw away the tree. Our TST load function can typically load 100000 80-byte items in well under a second. And it could not take any more than this to repaint your list, with proper use of begin- and end-update. The TST is memory-hungry, but not so that you would notice it at all if you only have of the order of 500 items. And much simpler than sorting, comparisons and assembler (if you have a suitable TST implementation, of course).
No need to use a hash table, a single sorted list gives me a result of 10 milliseconds, that's 0.01 seconds, which is about 500 times faster! Here is my test code using a TListBox:
procedure TForm1.Button1Click(Sender: TObject);
var
lIndex1: Integer;
lString: string;
lIndex2: Integer;
lStrings: TStringList;
lCount: Integer;
lItems: TStrings;
begin
ListBox1.Clear;
for lIndex1 := 1 to 577 do begin
lString := '';
for lIndex2 := 1 to 100 do
if (lIndex2 mod 6) = 0 then
lString := lString + Chr(Ord('a') + Random(2))
else
lString := lString + 'a';
ListBox1.Items.Add(lString);
end;
CsiGlobals.AddLogMsg('Start', 'Test', llBrief);
lStrings := TStringList.Create;
try
lStrings.Sorted := True;
lCount := 0;
lItems := ListBox1.Items;
with lItems do begin
BeginUpdate;
try
for lIndex1 := Count - 1 downto 0 do begin
lStrings.Add(Strings[lIndex1]);
if lStrings.Count = lCount then
Delete(lIndex1)
else
Inc(lCount);
end;
finally
EndUpdate;
end;
end;
finally
lStrings.Free;
end;
CsiGlobals.AddLogMsg('Stop', 'Test', llBrief);
end;
I'd also like to point out that your solution would take an extreme amount of time if applied to a huge list (like containing 100,000,000 items or more). Even constructing a hashtable or sorted list would take too much time.
In cases like that you could try another approach : Hash each member, but instead of populating a full-blown hashtable, create a bitset (large enough to contain a close factor to as many slots as there are input items) and just set each bit at the offset indicated by the hashfunction. If the bit was 0, change it to 1. If it was already 1, take note of the offending string-index in a separate list and continue. This results in a list of string-indexes that had a collision in the hash, so you'll have to run it a second time to find the first cause of those collisions. After that, you should sort & de-dupe the string-indexes in this list (as all indexes apart from the first one will be present twice). Once that's done you should sort the list again, but this time sort it on the string-contents in order to easily spot duplicates in a following single scan.
Granted it could be a bit extreme to go this all this length, but at least it's a workable solution for very large volumes! (Oh, and this still won't work if the number of duplicates is very high, when the hash-function has a bad spread or when the number of slots in the 'hashtable' bitset is chosen too small - which would give many collisions which aren't really duplicates.)
I tried to use a TListView component to display rather large data lists (like 4000 rows large), and creating the list is incredibly slow - it takes something like 2-3 secs, which makes the UI all laggy and close to unusable.
I fill the TListView.Items inside a BeginUpdate/EndUpdate block, with only preallocated strings - I mean : I build a list of all strings to store (which takes no humanly noticeable time), then I put them in the TListView.
I wish to display the TListView's content in vsReport mode with several columns.
The code looks like this :
MyList.Items.BeginUpdate;
for i := 0 to MyCount - 1 do
begin
ListItem := MyList.Items.Add;
ListItem.Caption := StrCaptions[i];
ListItem.SubItems.Add(StrSubItems1[i]);
ListItem.SubItems.Add(StrSubItems2[i]);
end;
MyList.Items.EndUpdate;
Is there some other hack I missed in the TListView component's logic ? or should I just forget about using this component for performances ?
You can use listview in virtual mode. Have a look at the virtuallistview.dpr demo.
You can try Virtual Treeview component. It says "Virtual Treeview is extremely fast. Adding one million nodes takes only 700 milliseconds"
Use separate structure for holding your data. Set OwnerData of TListView to True.
#4000 rows I get only ~700 ms (D2009) times. For more responsiveness you could separate to other thread or add dirty Application.ProcessMessages() into loop.
rows generated with this code in 16 ms:
MyCount := 4000;
dw := GetTickCount();
for i := 0 to MyCount - 1 do begin
StrCaptions.Add('caption'+IntToStr(i));
StrSubItems1.Add('sub1'+IntToStr(i));
StrSubItems2.Add('sub2'+IntToStr(i));
end;
ShowMessageFmt('%u ms', [GetTickCount() - dw]);
Printed with:
MyList.Clear;
dw := GetTickCount();
MyList.Items.BeginUpdate;
for i := 0 to MyCount - 1 do
begin
ListItem := MyList.Items.Add;
ListItem.Caption := StrCaptions[i];
ListItem.SubItems.Add(StrSubItems1[i]);
ListItem.SubItems.Add(StrSubItems2[i]);
end;
MyList.Items.EndUpdate;
ShowMessageFmt('%u ms', [GetTickCount() - dw]);
EDIT:
I inserted Application.ProcessMessages() into print, but somewhy performance stays same
I have an app that loads records from a binary log file and displays them in a virtual TListView. There are potentially millions of records in a file, and the display can be filtered by the user, so I do not load all of the records in memory at one time, and the ListView item indexes are not a 1-to-1 relation with the file record offsets (List item 1 may be file record 100, for instance). I use the ListView's OnDataHint event to load records for just the items the ListView is actually interested in. As the user scrolls around, the range specified by OnDataHint changes, allowing me to free records that are not in the new range, and allocate new records as needed.
This works fine, speed is tolerable, and the memory footprint is very low.
I am currently evaluating TVirtualStringTree as a replacement for the TListView, mainly because I want to add the ability to expand/collapse records that span multiple lines (I can fudge it with the TListView by incrementing/decrementing the item count dynamically, but this is not as straight forward as using a real tree).
For the most part, I have been able to port the TListView logic and have everything work as I need. I notice that TVirtualStringTree's virtual paradigm is vastly different, though. It does not have the same kind of OnDataHint functionality that TListView does (I can use the OnScroll event to fake it, which allows my memory buffer logic to continue working), and I can use the OnInitializeNode event to associate nodes with records that are allocated.
However, once a tree node is initialized, it sees that it remains initialized for the lifetime of the tree. That is not good for me. As the user scrolls around and I remove records from memory, I need to reset those non-visual nodes without removing them from the tree completely, or losing their expand/collapse states. When the user scrolls them back into view, I can re-allocate the records and re-initialize the nodes. Basically, I want to make TVirtualStringTree act as much like TListView as possible, as far as its virtualization is concerned.
I have seen that TVirtualStringTree has a ResetNode() method, but I encounter various errors whenever I try to use it. I must be using it wrong. I also thought of just storing a data pointer inside each node to my record buffers, and I allocate and free memory, update those pointers accordingly. The end effect does not work so well, either.
Worse, my largest test log file has ~5 million records in it. If I initialize the TVirtualStringTree with that many nodes at one time (when the log display is unfiltered), the tree's internal overhead for its nodes takes up a whopping 260MB of memory (without any records being allocated yet). Whereas with the TListView, loading the same log file and all the memory logic behind it, I can get away with using just a few MBs.
Any ideas?
You probably shouldn't switch to VST unless you have a use for at least some of the nice features of VST that a standard listbox / listview don't have. But there is of course a large memory overhead compared to a flat list of items.
I don't see a real benefit in using TVirtualStringTree only to be able to expand and collapse items that span multiple lines. You write
mainly because I want to add the ability to expand/collapse records that span multiple lines (I can fudge it with the TListView by incrementing/decrementing the item count dynamically, but this is not as straight forward as using a real tree).
but you can implement that easily without changing the item count. If you set the Style of the listbox to lbOwnerDrawVariable and implement the OnMeasureItem event you can adjust the height as required to draw either only the first or all lines. Drawing the expander triangle or the little plus symbol of a tree view manually should be easy. The Windows API functions DrawText() or DrawTextEx() can be used both to measure and draw the (optionally word-wrapped) text.
Edit:
Sorry, I completely missed the fact that you are using a listview right now, not a listbox. Indeed, there is no way to have rows with different heights in a listview, so that's no option. You could still use a listbox with a standard header control on top, but that may not support everything you are using now from listview functionality, and it may itself be as much or even more work to get right than dynamically showing and hiding listview rows to simulate collapsing and expanding.
If I understand it correctly, the memory requirement of TVirtualStringTree should be:
nodecount * (SizeOf(TVirtualNode) + YourNodeDataSize + DWORD-align-padding)
To minimize the memory footprint, you could perhaps initialize the nodes with only pointers to offsets to a memory-mapped file. Resetting nodes which have already been initialized doesn't seem necessary in this case - the memory footprint should be nodecount * (44 + 4 + 0) - for 5 million records, about 230 MB.
IMHO you can't get any better with the tree but using a memory-mapped file would allow you to read the data directly from the file without allocating even more memory and copying the data to it.
You could also consider using a tree structure instead of a flat view to present the data. That way you could initialize child nodes of a parent node on demand (when the parent node is expanded) and resetting the parent node when it's collapsed (therefore freeing all its child nodes). In other words, try not to have too many nodes at the same level.
To meet your requirement "to expand/collapse records that span multiple lines", I'd simply use a drawgrid. To check it out, drag a drawgrid onto a form, then plug in the following Delphi 6 code. You can collapse and expand 5,000,000 multiline records (or whatever quantity you want) with essentially no overhead. It's a simple technique, doesn't require much code, and works surprisingly well.
unit Unit1;
interface
uses
Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms, Dialogs, Grids, StdCtrls;
type
TForm1 = class(TForm)
DrawGrid1: TDrawGrid;
procedure DrawGrid1DrawCell(Sender: TObject; ACol, ARow: Integer; Rect: TRect; State: TGridDrawState);
procedure DrawGrid1SelectCell(Sender: TObject; ACol, ARow: Integer; var CanSelect: Boolean);
procedure DrawGrid1TopLeftChanged(Sender: TObject);
procedure DrawGrid1DblClick(Sender: TObject);
procedure FormCreate(Sender: TObject);
private
procedure AdjustGrid;
end;
var
Form1: TForm1;
implementation
{$R *.dfm}
// Display a large number of multi-line records that can be expanded or collapsed, using minimal overhead.
// LinesInThisRecord() and RecordContents() are faked; change them to return actual data.
const TOTALRECORDS = 5000000; // arbitrary; a production implementation would probably determine this at run time
// keep track of whether each record is expanded or collapsed
var isExpanded: packed array[1..TOTALRECORDS] of boolean; // initially all FALSE
function LinesInThisRecord(const RecNum: integer): integer;
begin // how many lines (rows) does the record need to display when expanded?
result := (RecNum mod 10) + 1; // make something up, so we don't have to use real data just for this demo
end;
function LinesDisplayedForRecord(const RecNum: integer): integer;
begin // how many lines (rows) of info are we currently displaying for the given record?
if isExpanded[RecNum] then result := LinesInThisRecord(RecNum) // all lines show when expanded
else result := 1; // show only 1 row when collapsed
end;
procedure GridRowToRecordAndLine(const RowNum: integer; var RecNum, LineNum: integer);
var LinesAbove: integer;
begin // for a given row number in the drawgrid, return the record and line numbers that appear in that row
RecNum := Form1.DrawGrid1.TopRow; // for simplicity, TopRow always displays the record with that same number
if RecNum > TOTALRECORDS then RecNum := 0; // avoid overflow
LinesAbove := 0;
while (RecNum > 0) and ((LinesDisplayedForRecord(RecNum) + LinesAbove) < (RowNum - Form1.DrawGrid1.TopRow + 1)) do
begin // accumulate the tally of lines in expanded or collapsed records until we reach the row of interest
inc(LinesAbove, LinesDisplayedForRecord(RecNum));
inc(RecNum); if RecNum > TOTALRECORDS then RecNum := 0; // avoid overflow
end;
LineNum := RowNum - Form1.DrawGrid1.TopRow + 1 - LinesAbove;
end;
function RecordContents(const RowNum: integer): string;
var RecNum, LineNum: integer;
begin // display the data that goes in the grid row. for now, fake it
GridRowToRecordAndLine(RowNum, RecNum, LineNum); // convert row number to record and line numbers
if RecNum = 0 then result := '' // out of range
else
begin
result := 'Record ' + IntToStr(RecNum);
if isExpanded[RecNum] then // show line counts too
result := result + ' line ' + IntToStr(LineNum) + ' of ' + IntToStr(LinesInThisRecord(RecNum));
end;
end;
procedure TForm1.AdjustGrid;
begin // don't allow scrolling past last record
if DrawGrid1.TopRow > TOTALRECORDS then DrawGrid1.TopRow := TOTALRECORDS;
if RecordContents(DrawGrid1.Selection.Top) = '' then // move selection back on to a valid cell
DrawGrid1.Selection := TGridRect(Rect(0, TOTALRECORDS, 0, TOTALRECORDS));
DrawGrid1.Refresh;
end;
procedure TForm1.DrawGrid1DrawCell(Sender: TObject; ACol, ARow: Integer; Rect: TRect; State: TGridDrawState);
var s: string;
begin // time to draw one of the grid cells
if ARow = 0 then s := 'Data' // we're in the top row, get the heading for the column
else s := RecordContents(ARow); // painting a record, get the data for this cell from the appropriate record
// draw the data in the cell
ExtTextOut(DrawGrid1.Canvas.Handle, Rect.Left, Rect.Top, ETO_CLIPPED or ETO_OPAQUE, #Rect, pchar(s), length(s), nil);
end;
procedure TForm1.DrawGrid1SelectCell(Sender: TObject; ACol, ARow: Integer; var CanSelect: Boolean);
var RecNum, ignore: integer;
begin
GridRowToRecordAndLine(ARow, RecNum, ignore); // convert selected row number to record number
CanSelect := RecNum <> 0; // don't select unoccupied rows
end;
procedure TForm1.DrawGrid1TopLeftChanged(Sender: TObject);
begin
AdjustGrid; // keep last page looking good
end;
procedure TForm1.DrawGrid1DblClick(Sender: TObject);
var RecNum, ignore, delta: integer;
begin // expand or collapse the currently selected record
GridRowToRecordAndLine(DrawGrid1.Selection.Top, RecNum, ignore); // convert selected row number to record number
isExpanded[RecNum] := not isExpanded[RecNum]; // mark record as expanded or collapsed; subsequent records might change their position in the grid
delta := LinesInThisRecord(RecNum) - 1; // amount we grew or shrank (-1 since record already occupied 1 line)
if isExpanded[RecNum] then // just grew
else delta := -delta; // just shrank
DrawGrid1.RowCount := DrawGrid1.RowCount + delta; // keep rowcount in sync
AdjustGrid; // keep last page looking good
end;
procedure TForm1.FormCreate(Sender: TObject);
begin
Caption := FormatFloat('#,##0 records', TOTALRECORDS);
DrawGrid1.RowCount := TOTALRECORDS + 1; // +1 for column heading
DrawGrid1.ColCount := 1;
DrawGrid1.DefaultColWidth := 300; // arbitrary
DrawGrid1.DefaultRowHeight := 12; // arbitrary
DrawGrid1.Options := DrawGrid1.Options - [goVertLine, goHorzLine, goRangeSelect] + [goDrawFocusSelected, goThumbTracking]; // change some defaults
end;
end.
You shouldn't use ResetNode because this method invokes InvalidateNode and initializes node again, leading to opposite effect than expected.
I don't know if it's possible to induce VST to free memory size specified in NodeDataSize without actually removing node. But why not set NodeDataSize to size of Pointer ( Delphi, VirtualStringTree - classes (objects) instead of records ) and manage data yourself? Just an idea...
Give "DeleteChildren" a try. Here's what this procedure's comment says:
// Removes all children and their children from memory without changing the vsHasChildren style by default.
Never used it, but as I read it, you can use that in the OnCollapsed event to free the memory allocated to nodes that just became invisible. And then re-generate those nodes in OnExpading so the user never knows the node went away from memory.
But I can't be sure, I never had a need for such behaviour.