Poor performance of TStringGrid - delphi

I have a TStringGrid with 10 columns. Adding 500 rows to it takes around 2 seconds. Is this normal performance?
It seems a bit slow to me.
I am getting the data from a database query. If I loop through the query but don't write the results to the StringGrid, the process takes around 100ms, so it's not the database that's slowing things down.
Once the rows are added, the StringGrid performance is fine.
Here is the code I am using:
Grid.RowCount := Query.RecordCount;
J := 0;
while not Query.EOF do
begin
Grid.Cells[0,J]:=Query.FieldByName('Value1').AsString;
Grid.Cells[1,J]:=Query.FieldByName('Value2').AsString;
Grid.Cells[2,J]:=Query.FieldByName('Value3').AsString;
// etc for other columns.
Inc(J);
Query.Next();
end;
The real code is actually a bit more complex (the table columns do not correspond exactly to the query columns) but that's the basic idea

One other thing I have found to be very important when going through a lot of records is to use proper TField variables for each field. FieldByName iterates through the Fields collection every time so is not the most performant option.
Before the loop define each field as in:
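var
  f1, f2: TStringField;
  f3: TIntegerField;
begin
  // Look each field up once, before the loop. The field names are the ones
  // from the question; the casts assume the fields really have these types.
  f1 := Query.FieldByName('Value1') as TStringField;
  f2 := Query.FieldByName('Value2') as TStringField;
  f3 := Query.FieldByName('Value3') as TIntegerField;
  // MyStringGrid.BeginUpdate; // Can't do this
  // Could try something like this instead:
  // MyStringGrid.Perform(WM_SETREDRAW, 0, 0);
  try
    while ... do
    begin
      rowvalues[0] := f1.AsString;
      rowvalues[1] := f2.AsString;
      rowvalues[2] := Format('%4.2d', [f3.AsInteger]); // Format takes an open array
      // etc
    end;
  finally
    // MyStringGrid.EndUpdate; // Can't - see above
    // MyStringGrid.Perform(WM_SETREDRAW, 1, 0);
    // MyStringGrid.Invalidate;
  end;
end;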
That along with BeginUpdate/Endupdate and calling Query.DisableControls if appropriate.

The solution was to add all values in a row at once, using the "Rows" property.
My code now looks like this:
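Grid.RowCount := Query.RecordCount;
rowValues := TStringList.Create;
try
  // A new TStringList is empty, so add one blank entry per column first;
  // assigning rowValues[0] on an empty list raises "List index out of bounds".
  for I := 0 to Grid.ColCount - 1 do
    rowValues.Add('');
  J := 0;
  while not Query.EOF do
  begin
    rowValues[0] := Query.FieldByName('Value1').AsString;
    rowValues[1] := Query.FieldByName('Value2').AsString;
    rowValues[2] := Query.FieldByName('Value3').AsString;
    // etc for other columns.
    Grid.Rows[J] := rowValues;
    Inc(J);
    Query.Next;
  end;
finally
  rowValues.Free; // for the OCD among us
end;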
This brought the time down from 2 seconds to about 50ms.

FieldByName used in a loop is very slow because the field lookup is repeated on every iteration. Do the lookup once, before the loop, and use the resulting TField inside the loop.

TStringGrid works OK for a small number of records, but don't try it for more than 10.000 records.
We had severe performance problems with TAdvStringGrid from TMS (which is based on Delphi TStringGrid) when loading/sorting/grouping large grid sets, but also when inserting one row at the top of the grid (expanding a grid group node). Also memory usage was high.
And yes, I used the beginupdate/endupdate already. Also other tricks. But after diving into the structure of TStringGrid I concluded it could never be fast for many records.
As a general tip (for large grids): use the OnGetText (and OnSetText) event. This event is used for filling the grid on demand (only the cells that are displayed). Store the data in your own data objects. This made our grid very fast (1.000.000 record is no problem anymore, loads within seconds!)

The first optimization is to replace the very slow Query.FieldByName('Value1') calls with local TField variables.
var
F1, F2, F3: TField;
Grid.RowCount := Query.RecordCount;
J := 0;
F1 := Query.FieldByName('Value1');
F2 := Query.FieldByName('Value2');
F3 := Query.FieldByName('Value3');
while not Query.EOF do
begin
Grid.Cells[0,J]:=F1.AsString;
Grid.Cells[1,J]:=F2.AsString;
Grid.Cells[2,J]:=F3.AsString;
// etc for other columns.
Inc(J);
Query.Next();
end;
If this is not enough, use the grid in virtual mode, i.e. retrieve all content in a TStringList or any in-memory structure, then use the OnGetText or OnDrawCell methods.
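A minimal sketch of the virtual-mode idea, assuming a TDrawGrid (or TStringGrid) painted through its OnDrawCell event. TGridFeeder and its methods are hypothetical names invented for this example, not part of the VCL. The data lives in a plain array and the grid only asks for the cells it has to paint.

```pascal
unit VirtualGridSketch;

interface

uses
  // Delphi 7 unit names; newer versions use Winapi.Windows, System.Classes, Vcl.Grids
  Windows, Classes, Grids;

type
  // Hypothetical helper: owns the cell text and paints cells on demand.
  TGridFeeder = class
  private
    FData: array of array of string; // FData[Row][Col]
    procedure DrawCell(Sender: TObject; ACol, ARow: Integer;
      Rect: TRect; State: TGridDrawState);
  public
    procedure SetSize(ARowCount, AColCount: Integer);
    procedure SetValue(ARow, ACol: Integer; const Value: string);
    procedure Attach(Grid: TDrawGrid);
  end;

implementation

procedure TGridFeeder.SetSize(ARowCount, AColCount: Integer);
begin
  SetLength(FData, ARowCount, AColCount);
end;

procedure TGridFeeder.SetValue(ARow, ACol: Integer; const Value: string);
begin
  FData[ARow][ACol] := Value; // no grid involved, so this is cheap
end;

procedure TGridFeeder.Attach(Grid: TDrawGrid);
begin
  // Only the counts are handed to the grid; no strings are copied into it.
  Grid.RowCount := Length(FData);
  if Length(FData) > 0 then
    Grid.ColCount := Length(FData[0]);
  Grid.OnDrawCell := DrawCell;
  Grid.Invalidate;
end;

procedure TGridFeeder.DrawCell(Sender: TObject; ACol, ARow: Integer;
  Rect: TRect; State: TGridDrawState);
begin
  // Called only for cells that are actually visible.
  if (ARow < Length(FData)) and (ACol < Length(FData[ARow])) then
    (Sender as TDrawGrid).Canvas.TextRect(Rect, Rect.Left + 2, Rect.Top + 2,
      FData[ARow][ACol]);
end;

end.
```

Typical use would be to fill the feeder from the query loop with SetValue and call Attach once at the end. Fixed rows and columns are painted from the same array in this sketch.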

I believe it's slow because it has to repaint itself every time you add a row. Since you are taking the values from a query, I think it would be better for you to use a TDBGrid instead.
Best regards.

If you know how many rows you're about to add, store the current rowcount in a temporary variable, set the grid's rowcount to accommodate the current rowcount plus the rows you're about to add, then assign the new values to the rows (using the former rowcount you stored) rather than adding them. This will reduce a lot of background processing.

Try testing with AQTime or similar tool (profilers).
Without seeing the code it is difficult to say, but I think that the poor performance is due to FieldByName, not the StringGrid.
FieldByName does a linear search:
for I := 0 to FList.Count - 1 do
begin
Result := FList.Items[I];
...
If your dataset has many columns (fields), the performance will be even lower.
Regards.

I was going to say "why not just use beginupdate/endupdate?" but now I see that the regular string grid doesn't support it.
While googling that, I found a way to simulate beginupdate/endupdate:
http://www.experts-exchange.com/Programming/Languages/Pascal/Delphi/Q_21832072.html
See the answer by ZhaawZ, where he uses a pair of WM_SETREDRAW messages to disable/enable the repainting. If this works, use in conjunction with the "eliminate use of FieldbyName" trick, and it should take no time to draw.
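A minimal sketch of the WM_SETREDRAW pair described above, wrapped in two helper procedures. LockGridRedraw and UnlockGridRedraw are hypothetical names chosen for this example; only Perform, WM_SETREDRAW and Invalidate come from the VCL and the Windows API.

```pascal
unit GridRedrawSketch;

interface

uses
  // Delphi 7 unit names; newer versions use Winapi.Windows, Winapi.Messages,
  // Vcl.Controls, Vcl.Grids
  Windows, Messages, Controls, Grids;

procedure LockGridRedraw(Grid: TStringGrid);
procedure UnlockGridRedraw(Grid: TStringGrid);

implementation

procedure LockGridRedraw(Grid: TStringGrid);
begin
  // wParam = 0 tells the window to stop repainting itself.
  Grid.Perform(WM_SETREDRAW, 0, 0);
end;

procedure UnlockGridRedraw(Grid: TStringGrid);
begin
  // wParam = 1 re-enables painting; Invalidate then forces one full repaint.
  Grid.Perform(WM_SETREDRAW, 1, 0);
  Grid.Invalidate;
end;

end.
```

The calls are meant to bracket the fill loop: LockGridRedraw(Grid), then try, the loop that fills the grid, finally UnlockGridRedraw(Grid), end. The finally ensures the grid is not left frozen if the loop raises an exception.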

Set Grid.RowCount = 2 before the loop then when the loop is finished set the rowcount to the correct value.
That avoids lots of calls to the OnPaint event.

In my case it turned out that the Debug build was slow and the Release build was fast - a Heisenbug.
More specifically, FastMM4 FullDebugMode triggered the slowness.

Related

Loading millions of records into a stringlist can be very slow

How can I load millions of records from a TADOTable into a string list very fast?
procedure TForm1.SlowLoadingIntoStringList(StringList: TStringList);
begin
StringList.Clear;
with SourceTable do
begin
Open;
DisableControls;
try
while not EOF do
begin
StringList.Add(FieldByName('OriginalData').AsString);
Next;
end;
finally
EnableControls;
Close;
end;
end;
end;
In your loop you look the field up on every iteration.
Look the field up once, outside the loop:
procedure TForm1.SlowLoadingIntoStringList(StringList: TStringList);
var
oField: TField;
begin
StringList.Clear;
with SourceTable do
begin
Open;
DisableControls;
try
oField:= FindField('OriginalData'); // FindField returns nil if the field is missing; FieldByName would raise an exception instead
if oField<>Nil then
begin
while not EOF do
begin
StringList.Add(oField.AsString);
Next;
end;
end;
finally
EnableControls;
Close;
end;
end;
end;
Unfortunately, you can't do this quickly. It is an inherently slow operation that involves large amounts of CPU time and memory bandwidth to achieve. You could throw more hardware at it, but I suspect you should be re-thinking your task instead.
With 'millions of records' you may consider :
1/ Change your Query from
SELECT * FROM MYTABLE;
in
SELECT OriginalData FROM MYTABLE;
You'll use less memory and be more efficient.
2/ Consider a component other than TStringList, depending on your needs.
3/ Follow the good advice already given, mainly:
don't use FieldByName inside the loop
use a direct link to the OLE DB provider
Is it sorted?
// Turn off the sort for now
StringList.Sorted := False;
// Preallocate the space
StringList.Capacity := recordCount;
// Now add the data with Append()
...
// Now turn the sort back on
StringList.Sorted := True;
Seriously? Millions of records in a stringlist?
Ok, let's assume you really do need to take this approach...
There are some good suggestions already posted.
If you want to experiment with a different approach you could consider concatenating the individual records server side (via a stored procedure) and then returning the concatenated data as a blob (or possibly nvarchar(max)), which is basically the list of concatenated strings delimited by say a carriage return (assuming this is a reasonable delimiter for your needs).
You can then simply assign the returned value to the Text property of the TStringList.
Even if you cannot do all of the strings in a single hit, you could do them in groups of say 1000 at a time.
This should save you a ton of time looping around each record client side.
Expanding on @Ravaut123's answer, I would suggest the following code:
Make sure your Query is not connected to any other visual component and does not have any events set that fire on row changes, because this will cause it to do updates on every change of the active record, slowing things way down.
You can disable the visual controls using DisableControls, but not the events and non-visual controls.
...
SQLatable:= 'SELECT SingleField FROM atable ORDER BY indexedfield ASC';
AQuery:= TAdoQuery.Create(Form1);
AQuery.Connection:= ....
AQuery.SQL.Text:= SQLatable;
Using a Query makes sure you only select 1 field, in the order that you want, this reduces network traffic. A table fetches all fields, causing much more overhead.
function TForm1.LoadingAllIntoStringList(AQuery: TAdoQuery): TStringList;
var
Field1: TField;
begin
Result:= nil;
try
if not(AQuery.Active) then begin
AQuery.Open;
end else begin
AQuery.First;
end;
AQuery.DisableControls;
AQuery.Filtered:= false; //Filter in the SQL `where` clause
AQuery.FetchAll; //Preload all data into memory
Result:= TStringlist.Create;
except
{ignore error, will return nil}
end;
try
Result.Sorted:= false; //Make sure you don't enable sorting
Result.Capacity:= AQuery.RecordCount; //Preallocate the needed space
Field1:= AQuery.FieldByName('SingleField'); //Never use `fieldbyname` in a loop!
while not AQuery.EOF do begin
Result.Add(Field1.AsString);
AQuery.Next;
end; {while}
AQuery.EnableControls;
except
FreeAndNil(Result);
end;
end;
If you want to load the data into the stringlist to do some processing, consider doing that in the SQL statement instead. The DB can use indexes and other optimizations that the stringlist cannot use.
If you want to save that data into a CSV file, consider using a built-in DB function for that.
e.g. MySQL has:
SELECT X FROM table1 INTO OUTFILE 'c:/filename_of_csv_file.txt'
Which will create a CSV file for you.
Many databases have similar functions.

Emptying string grid in Delphi

In Delphi, is there a fast way of emptying a TStringgrid (containing in excess of 5000 rows) that will also free the memory?
Setting the rowcount to 1, empties the grid but does not free the memory.
Thanks in advance,
Paul
This should uninitialize the allocated strings (from the string list where the row texts are stored). Cleaning is done by columns since you have a lot of rows.
procedure TForm1.Button1Click(Sender: TObject);
var
I: Integer;
begin
for I := 0 to StringGrid1.ColCount - 1 do
StringGrid1.Cols[I].Clear;
StringGrid1.RowCount := 1;
end;
By "does not free the memory", do you mean that if you set RowCount := 1 and then set RowCount := 10, you can still see the old content of the Cells?
If so, this is an old issue and has nothing to do with the memory not being freed; it's simply because you just happen to see the previous content of the memory when it's allocated again, because memory isn't zero'd out.
I have a pretty standard routine in a utility unit that deals with this visual glitch, and unless the grid is huge works fast enough. Just pass the TStringGrid before you change the RowCount or ColCount to a lower value.
procedure ClearStringGrid(const Grid: TStringGrid);
var
c, r: Integer;
begin
for c := 0 to Pred(Grid.ColCount) do
for r := 0 to Pred(Grid.RowCount) do
Grid.Cells[c, r] := '';
end;
Use it like this:
ClearStringGrid(StringGrid1);
StringGrid1.RowCount := 1;
I would suggest storing your string values in your own memory that you have full control over, and then use a TDrawGrid, or better a virtual TListView, to display the contents of that memory as needed.
The fastest way to use a TStringGrid is using OnGetValue/OnSetValue.
This way only the text of the visible cells is requested dynamically.
Adding and removing rows is then lightning fast; otherwise TStringGrid is very slow when you have more than 5000 records.
This way I can fill and clear a grid with 700.000 records within a second!
When memory usage is the critical argument, consider using another grid. For example, NLDStringGrid that is (re)written by myself, and which has an additional property called MemoryOptions. It controls whether data can be stored beyond ColCount * RowCount, whether the storage is proportional (less memory usage for partially filled rows and columns), whether to store the Cols and Rows property results and whether the data is stored in sparse manner.
To clear such grid that has moBeyondGrid excluded from the memory options, setting RowCount to FixedRows suffices.
It's open source and downloadable from here.

Really fast function to compare the name (full path) of two files

I have to check if I have duplicate paths in a FileListBox (FileListBox has the role of some kind of job list or play list).
Using Delphi's SameText, CompareStr, CompareText, takes 6 seconds. So I came with my own compare function which is (just) a bit faster but not fast enough. Any ideas how to improve it?
function SameFile(CONST Path1, Path2: string): Boolean;
VAR i: Integer;
begin
Result:= Length(Path1)= Length(Path2); { if they have different lengths then they are obviously not the same file }
if Result then
for i:= Length(Path1) downto 1 DO { start from the end because it is more likely to find the difference there }
if Path1[i]<> Path2[i] then
begin
Result:= FALSE;
Break;
end;
end;
I use it like this:
for x:= JList.Count-1 downto 1 DO
begin
sMaster:= JList.Items[x];
for y:= x-1 downto 0 DO
if SameFile(sMaster, JList.Items[y]) then
begin
JList.Items.Delete (x); { REMOVE DUPLICATES }
Break;
end;
end;
Note: The chance of having duplicates is small so Delete is not called often. Also the list cannot be sorted because the items are added by user and sometimes the order may be important.
Update:
The thing is that I lose the advantage of my code because it is Pascal.
It would be nice if the comparison loop ( Path1[i]<> Path2[i] ) would be optimized to use Borland's ASM code.
Delphi 7, Win XP 32 bit, Tests were done with 577 items in the list. Deleting the items from list IS NOT A PROBLEM because it happens rarely.
CONCLUSION
As Svein Bringsli pointed, my code is slow not because of the comparing algorithm but because of TListBox. The BEST solution was provided by Marcelo Cantos. Thanks a lot Marcelo.
I accepted Svein's answer because it answers directly my question "how to make my comparison function faster" with "there is no point to make it faster".
For the moment I implemented the dirty and quick to implement solution: when I have under 200 files, I use my slow code to check the duplicates. If there are more than 200 files I use dwrbudr's solution (which is damn fast) considering that if the user has so many files, the order is irrelevant anyway (human brain cannot track so many items).
I want to thank you all for ideas and especially Svein for revealing the truth: (Borland's) visual controls are damn slow!
Don't waste time optimising the assembler. You can go from O(n²) to O(n log n), bringing the time down to milliseconds, by sorting the list and then doing a linear scan for duplicates.
While you're at it, forget the SameFile function. The algorithmic improvement will dwarf anything you can achieve there.
Edit: Based on feedback in the comments...
You can perform an order-preserving O(n log(n)) de-duplication as follows:
1. Sort a copy of the list.
2. Identify and copy duplicated entries to a third list along with their duplication count minus one.
3. Walk the original list backwards as per your original version.
4. In the inner (for y := ...) loop, traverse the duplication list instead. If an outer item matches, delete it, decrement the duplication count, and delete the duplication entry if the count reaches zero.
This is obviously more complicated but it will still be orders of magnitude faster, even if you do horrible dirty things like storing duplication counts as strings, C:\path1\file1=2, and using code like:
y := dupes.IndexOfName(sMaster);
if y <> -1 then
begin
  JList.Items.Delete(x);
  c := StrToInt(dupes.ValueFromIndex[y]); // ValueFromIndex is an array property
  if c > 1 then
    dupes.Values[sMaster] := IntToStr(c - 1) // no semicolon before else
  else
    dupes.Delete(y);
end;
Side note: A binary chop would be more efficient than the for y := ... loop, but given that duplicates are rare, the difference ought to be negligible.
Using your code as a starting point, I modified it to take a copy of the list before searching for duplicates. The time went from 5,5 seconds to about 0,5 seconds.
vSL := TStringList.Create;
try
vSL.Assign(jList.Items);
vSL.Sorted := true;
for x:= vSL.Count-1 downto 1 DO
begin
sMaster:= vSL[x];
for y:= x-1 downto 0 DO
if SameFile(sMaster, vSL[y]) then
begin
vSL.Delete (x); { REMOVE DUPLICATES }
jList.Items.Delete (x);
Break;
end;
end;
finally
vSL.Free;
end;
Obviously, this is not a good way to do it, but it demonstrates that TFileListBox is in itself quite slow. I don't believe you can gain much by optimizing your compare-function.
To demonstrate this, I replaced your SameFile function with the following, but kept the rest of your code:
function SameFile(CONST Path1, Path2: string): Boolean;
VAR i: Integer;
begin
Result := false; //Pretty darn fast code!!!
end;
The time went from 5,6 seconds to 5,5 seconds. I don't think there's much more to gain there :-)
Create another sorted list with Duplicates := dupIgnore, add your strings to that list, then copy them back. Note that the items come back in sorted order.
vSL := TStringList.Create;
try
  vSL.Sorted := true;
  vSL.Duplicates := dupIgnore;
  for x:= 0 to jList.Items.Count - 1 do
    vSL.Add(jList.Items[x]);
  jList.Items.Clear;
  for x:= 0 to vSL.Count - 1 do
    jList.Items.Add(vSL[x]);
finally
  vSL.Free;
end;
The absolute fastest way, bar none (as alluded to before) is to use a routine that generates a unique 64/128/256 bit hash code for a string (I use the SHA256Managed class in C#). Run down the list of strings, generate the hash code for the strings, check for it in the sorted hash code list, and if found then the string is a duplicate. Otherwise add the hash code to the sorted hash code list.
This will work for strings, file names, images (you can get the unique hash code for an image), etc, and I guarantee that this will be as fast or faster than any other implementation.
PS You can use a string list for the hash codes by representing the hash codes as strings. I've used a hex representation in the past (256 bits -> 64 characters) but in theory you can do it any way you like.
4 seconds for how many calls? Great performance if you call it a billion times...
Anyway, does Length(Path1) get evaluated every time through the loop? If so, store that in an Integer variable prior to looping.
Pointers may yield some speed over the strings.
Try in-lining the function with:
function SameFile(blah blah): Boolean; Inline;
That will save some time, if this is being called thousands of times per second. I would start with that and see if it saves anything.
EDIT: I didn't realize that your list wasn't sorted. Obviously, you should do that first! Then you don't have to compare against every other item in the list - just the prior or next one.
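A minimal sketch of the "sort first, then compare each item only with its neighbour" idea. FindDuplicates is a hypothetical helper written for this example. It works on a sorted copy, so the original list keeps its order, and it assumes a case-insensitive comparison is what you want for Windows file paths.

```pascal
unit DupScanSketch;

interface

uses
  SysUtils, Classes; // newer Delphi: System.SysUtils, System.Classes

// Hypothetical helper: fills Dupes with each string that occurs more than
// once in Source (compared case-insensitively). Source is left unchanged.
procedure FindDuplicates(Source, Dupes: TStrings);

implementation

procedure FindDuplicates(Source, Dupes: TStrings);
var
  Sorted: TStringList;
  I: Integer;
begin
  Dupes.Clear;
  Sorted := TStringList.Create;
  try
    Sorted.AddStrings(Source); // copy, so the caller's order survives
    Sorted.Sort;               // CaseSensitive is False by default
    // After sorting, equal strings sit next to each other, so one linear
    // pass over neighbouring pairs is enough.
    for I := 1 to Sorted.Count - 1 do
      if AnsiSameText(Sorted[I - 1], Sorted[I]) and
         ((Dupes.Count = 0) or
          not AnsiSameText(Dupes[Dupes.Count - 1], Sorted[I])) then
        Dupes.Add(Sorted[I]); // report each duplicated string once
  finally
    Sorted.Free;
  end;
end;

end.
```

Sorting costs O(n log n) and the scan is linear, so for a few hundred paths this should finish far faster than the nested loop over the list box. If Dupes comes back empty, nothing has to be deleted from the visible list at all.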
I use a modified Ternary Search Tree (TST) to dedupe lists. You simply load the items into the tree, using the whole string as the key, and on each item you can get back an indication if the key is already there (and delete your visible entry). Then you throw away the tree. Our TST load function can typically load 100000 80-byte items in well under a second. And it could not take any more than this to repaint your list, with proper use of begin- and end-update. The TST is memory-hungry, but not so that you would notice it at all if you only have of the order of 500 items. And much simpler than sorting, comparisons and assembler (if you have a suitable TST implementation, of course).
No need to use a hash table, a single sorted list gives me a result of 10 milliseconds, that's 0.01 seconds, which is about 500 times faster! Here is my test code using a TListBox:
procedure TForm1.Button1Click(Sender: TObject);
var
lIndex1: Integer;
lString: string;
lIndex2: Integer;
lStrings: TStringList;
lCount: Integer;
lItems: TStrings;
begin
ListBox1.Clear;
for lIndex1 := 1 to 577 do begin
lString := '';
for lIndex2 := 1 to 100 do
if (lIndex2 mod 6) = 0 then
lString := lString + Chr(Ord('a') + Random(2))
else
lString := lString + 'a';
ListBox1.Items.Add(lString);
end;
CsiGlobals.AddLogMsg('Start', 'Test', llBrief);
lStrings := TStringList.Create;
try
lStrings.Sorted := True;
lCount := 0;
lItems := ListBox1.Items;
with lItems do begin
BeginUpdate;
try
for lIndex1 := Count - 1 downto 0 do begin
lStrings.Add(Strings[lIndex1]);
if lStrings.Count = lCount then
Delete(lIndex1)
else
Inc(lCount);
end;
finally
EndUpdate;
end;
end;
finally
lStrings.Free;
end;
CsiGlobals.AddLogMsg('Stop', 'Test', llBrief);
end;
I'd also like to point out that your solution would take an extreme amount of time if applied to a huge list (like containing 100,000,000 items or more). Even constructing a hashtable or sorted list would take too much time.
In cases like that you could try another approach : Hash each member, but instead of populating a full-blown hashtable, create a bitset (large enough to contain a close factor to as many slots as there are input items) and just set each bit at the offset indicated by the hashfunction. If the bit was 0, change it to 1. If it was already 1, take note of the offending string-index in a separate list and continue. This results in a list of string-indexes that had a collision in the hash, so you'll have to run it a second time to find the first cause of those collisions. After that, you should sort & de-dupe the string-indexes in this list (as all indexes apart from the first one will be present twice). Once that's done you should sort the list again, but this time sort it on the string-contents in order to easily spot duplicates in a following single scan.
Granted, it could be a bit extreme to go to all this length, but at least it's a workable solution for very large volumes! (Oh, and this still won't work if the number of duplicates is very high, when the hash function has a bad spread, or when the number of slots in the 'hashtable' bitset is chosen too small, which would give many collisions that aren't really duplicates.)

TListView performance issues

I tried to use a TListView component to display rather large data lists (like 4000 rows large), and creating the list is incredibly slow - it takes something like 2-3 secs, which makes the UI all laggy and close to unusable.
I fill the TListView.Items inside a BeginUpdate/EndUpdate block, with only preallocated strings - I mean : I build a list of all strings to store (which takes no humanly noticeable time), then I put them in the TListView.
I wish to display the TListView's content in vsReport mode with several columns.
The code looks like this :
MyList.Items.BeginUpdate;
for i := 0 to MyCount - 1 do
begin
ListItem := MyList.Items.Add;
ListItem.Caption := StrCaptions[i];
ListItem.SubItems.Add(StrSubItems1[i]);
ListItem.SubItems.Add(StrSubItems2[i]);
end;
MyList.Items.EndUpdate;
Is there some other hack I missed in the TListView component's logic ? or should I just forget about using this component for performances ?
You can use listview in virtual mode. Have a look at the virtuallistview.dpr demo.
You can try Virtual Treeview component. It says "Virtual Treeview is extremely fast. Adding one million nodes takes only 700 milliseconds"
Use separate structure for holding your data. Set OwnerData of TListView to True.
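A minimal sketch of that approach, assuming the three string lists from the question already hold the data. TListFeeder is a hypothetical helper class invented for this example; OwnerData, OnData and Items.Count are the standard TListView members for virtual mode. The list view's columns are assumed to be defined already.

```pascal
unit VirtualListSketch;

interface

uses
  Classes, ComCtrls; // newer Delphi: System.Classes, Vcl.ComCtrls

type
  // Hypothetical helper: serves list items on demand from three string lists.
  TListFeeder = class
  private
    FCaptions, FSub1, FSub2: TStrings; // not owned; the caller keeps them alive
    procedure ListData(Sender: TObject; Item: TListItem);
  public
    constructor Create(ACaptions, ASub1, ASub2: TStrings);
    procedure Attach(List: TListView);
  end;

implementation

constructor TListFeeder.Create(ACaptions, ASub1, ASub2: TStrings);
begin
  inherited Create;
  FCaptions := ACaptions;
  FSub1 := ASub1;
  FSub2 := ASub2;
end;

procedure TListFeeder.Attach(List: TListView);
begin
  List.ViewStyle := vsReport;
  List.OwnerData := True;               // the control stores no items itself
  List.OnData := ListData;
  List.Items.Count := FCaptions.Count;  // only tells the control how many rows exist
end;

procedure TListFeeder.ListData(Sender: TObject; Item: TListItem);
begin
  // Called only for the rows the control currently needs to display.
  Item.Caption := FCaptions[Item.Index];
  Item.SubItems.Add(FSub1[Item.Index]);
  Item.SubItems.Add(FSub2[Item.Index]);
end;

end.
```

With this setup there is no per-item Add loop any more, so filling the list is reduced to setting Items.Count. The feeder object and the string lists must outlive the list view.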
At 4000 rows I get only ~700 ms (D2009). For more responsiveness you could move the work to another thread or add a dirty Application.ProcessMessages() call to the loop.
rows generated with this code in 16 ms:
MyCount := 4000;
dw := GetTickCount();
for i := 0 to MyCount - 1 do begin
StrCaptions.Add('caption'+IntToStr(i));
StrSubItems1.Add('sub1'+IntToStr(i));
StrSubItems2.Add('sub2'+IntToStr(i));
end;
ShowMessageFmt('%u ms', [GetTickCount() - dw]);
Printed with:
MyList.Clear;
dw := GetTickCount();
MyList.Items.BeginUpdate;
for i := 0 to MyCount - 1 do
begin
ListItem := MyList.Items.Add;
ListItem.Caption := StrCaptions[i];
ListItem.SubItems.Add(StrSubItems1[i]);
ListItem.SubItems.Add(StrSubItems2[i]);
end;
MyList.Items.EndUpdate;
ShowMessageFmt('%u ms', [GetTickCount() - dw]);
EDIT:
I inserted Application.ProcessMessages() into the print loop, but for some reason the performance stays the same.

Converting a program to multithreading, taking advantage of a multicore CPU

I have a simple program with one procedure.
Procedure TForm1.btnKeywrdTransClick(Sender: TObject);
Var
i, ii : integer;
ch_word, zword, uy_word: widestring;
Begin
TntListBox1.items.LoadFromFile('d:\new folder\chh.txt'); //Chinese
TntListBox2.items.LoadFromFile('d:\new folder\uyy.txt'); //Uyword
TntListBox4.items.LoadFromFile(Edit3.text); //list of poi files
For I := 0 To TntListBox4.items.Count - 1 do
Begin
TntListBox3.items.LoadFromFile(TntListBox4.Items[i]);
zword := tntlistbox3.Items.Text; //Poi
For ii := 0 To TntListBox1.Items.count - 1 Do
Begin
loopz;
ch_word := tntlistbox1.Items[ii];
uy_word := ' ' + TntListBox2.items[ii] + ' ';
zword := wideFastReplace(zword, ch_word, uy_word, [rfReplaceAll]); //fastest, and better for large text
End;
TntListBox3.Items.text := zword;
TntListBox3.items.SaveToFile(TntListBox4.Items[i]);
end;
end;
My new computer has 4 cores. Will making this program multithreaded make it run faster (if I use 4 threads, one thread per core)?
I have no experience with multithreading; I need your help.
Thanks.
PS: this is the loopz procedure:
Procedure loopz;
Var
msg : tmsg;
Begin
While PeekMessage(Msg, 0, 0, 0, pm_Remove) Do
Begin
If Msg.Message = wm_Quit Then Halt(Msg.WParam);
TranslateMessage(Msg);
DispatchMessage(Msg);
End;
End;
Update 1:
From the answers, I am going to:
1 - use a profiler to find the most time-consuming code
2 - try to eliminate GUI-related things if possible
3 - use threads.
I'll report back. Thanks all.
First of all, make the algorithm as effective as it can be in its current incarnation: Stop using TListBox to store your data!!! (sorry for shouting) Replace them with TStringList and you'll get a HUGE performance improvement. That's a required first step anyway, because you can't use GUI objects from multiple threads (in fact you may only use them from the "main" thread). While you're changing TListBox to TStringList, please give your variables meaningful names. I don't know how many people around here figured out that you're storing a list of file names in ListBox4, loading each file in ListBox3, using ListBox1 as a "keyword list" and ListBox2 as a "value list"... really, it's a big mess! Here's how it would look with TStringList and proper names:
Procedure TForm1.btnKeywrdTransClick(Sender: TObject);
Var
i, ii : integer;
ch_word, zword, uy_word: widestring;
PoiFilesList:TStringList; // This is the list of files that need work
PoiFile:TStringList; // This is the file I'm working on right now
KeywordList, ValueList:TStringList; // I'll replace all keywords with corresponding values
Begin
PoiFilesList := TStringList.Create;
PoiFile := TStringList.Create;
KeywordList := TStringList.Create;
ValueList := TStringList.Create;
try
PoiFilesList.LoadFromFile(Edit3.text); //list of poi files
KeywordList.LoadFromFile('d:\new folder\chh.txt'); //Chinese
ValueList.LoadFromFile('d:\new folder\uyy.txt'); //Uyword
For I := 0 To PoiFilesList.Count - 1 do
Begin
PoiFile.LoadFromFile(PoiFilesList[i]);
zword := PoiFile.Text; //Poi
For ii := 0 To KeywordList.count - 1 Do
Begin
ch_word := KeywordList[ii];
uy_word := ' ' + ValueList[ii] + ' ';
zword := wideFastReplace(zword, ch_word, uy_word, [rfReplaceAll]);
End;
PoiFile.text := zword;
PoiFile.SaveToFile(PoiFilesList[i]);
end;
finally
PoiFilesList.Free;
PoiFile.Free;
KeywordList.Free;
ValueList.Free;
end;
end;
If you look at the code now, it's obvious what it does, and it's obvious how to multi-thread it. You've got a text file containing names of files. You open up each one of those files and replace all Keywords with the corresponding Values. You save the file back to disk. It's easy! Load the KeywordList and ValueList into memory once, split the list of files into 4 smaller lists, and start up 4 threads, each working with its own smaller file list.
I don't want to write the whole multi-threaded variant of the code, because if I write it myself you might not understand how it works. Give it a chance and ask for help if you get into trouble.
First you should profile your code to see if reading from TntListBox is slowing you down or if it is WideFastReplace. But even before that, remove the 'loopz' call - it is slowing you the most! Why are you processing messages inside this loop at all?
To find the bottleneck, simply time your loop twice, but the second time comment out the WideFastReplace call. (And make sure you are timing only the loop, not the assignment to the TntListBox3 or saving into file or loading from file.)
When you will know what's slowing you down, report back ...
BTW, calling WideFastReplace in parallel would be almost impossible as it is always operating on the same source. I don't see any good way to parallelize your code.
A possible parallelization approach:
Split zword on an appropriate word delimiter (I'm assuming here you are only replacing words, not phrases) into N strings where N is the number of cores.
Do the full replacement (all search/replacement pairs) for each of those N strings in parallel. Of course, you would have to read search/replacement pairs first from the TntListBoxes into some internal structure (TStringList would suffice) and then use this structure in all N threads.
Concatenate those partial strings back together.
Of course, there's no point in doing that if WideFastReplace is not the time-consuming part of the code. Do the profiling first!
It looks like you are interfacing with GUI elements.
99% of all GUI code must be interfaced from one and only one thread.
If you refactor your code to perform the text replacements in a series of threads, dividing the text amongst them, and then have the GUI thread place it into your list box, you could improve performance.
Note that creating and synchronizing threads is not cheap. Unless you have thousands of entries to work on, you will likely slow down your program by adding threads.
You should gain quite a bit of improvement by using only one thread for the whole thing. With this you can omit the loopz call completely.
Be aware that you should replace the TntListboxes with local TWideStringList instances in your routine.
When you have gotten somewhat familiar with multithreading, you can go and split the work into multiple threads. This can be done for instance by splitting the list of poi files (listbox4) in multiple (say 3-4) lists, one for each thread.
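A rough sketch of that split, under several assumptions: TReplaceThread and ReplaceInFiles are hypothetical names, the lists are plain TStringList instances, and StringReplace from SysUtils stands in for the WideFastReplace routine used in the question. On a pre-Unicode Delphi you would need wide-string lists and a wide replace function instead. Each thread gets private copies of its lists and never touches a GUI control.

```pascal
unit ReplaceThreadsSketch;

interface

uses
  SysUtils, Classes; // newer Delphi: System.SysUtils, System.Classes

type
  // Hypothetical worker: processes its own private list of files.
  TReplaceThread = class(TThread)
  private
    FFiles, FKeywords, FValues: TStringList;
  protected
    procedure Execute; override;
  public
    constructor Create(AFiles, AKeywords, AValues: TStrings);
    destructor Destroy; override;
  end;

// ThreadCount must be at least 1.
procedure ReplaceInFiles(AllFiles, Keywords, Values: TStrings;
  ThreadCount: Integer);

implementation

constructor TReplaceThread.Create(AFiles, AKeywords, AValues: TStrings);
begin
  // Private copies, so no list is shared between threads.
  FFiles := TStringList.Create;
  FFiles.AddStrings(AFiles);
  FKeywords := TStringList.Create;
  FKeywords.AddStrings(AKeywords);
  FValues := TStringList.Create;
  FValues.AddStrings(AValues);
  inherited Create(False); // the thread starts once construction has finished
end;

destructor TReplaceThread.Destroy;
begin
  inherited Destroy; // waits for the thread if it is still running
  FFiles.Free;
  FKeywords.Free;
  FValues.Free;
end;

procedure TReplaceThread.Execute;
var
  I, K: Integer;
  Content: TStringList;
  Body: string;
begin
  Content := TStringList.Create;
  try
    for I := 0 to FFiles.Count - 1 do
    begin
      if Terminated then
        Break;
      Content.LoadFromFile(FFiles[I]);
      Body := Content.Text;
      for K := 0 to FKeywords.Count - 1 do
        Body := StringReplace(Body, FKeywords[K], ' ' + FValues[K] + ' ',
          [rfReplaceAll]);
      Content.Text := Body;
      Content.SaveToFile(FFiles[I]);
    end;
  finally
    Content.Free;
  end;
end;

procedure ReplaceInFiles(AllFiles, Keywords, Values: TStrings;
  ThreadCount: Integer);
var
  Parts: array of TStringList;
  Threads: array of TReplaceThread;
  I: Integer;
begin
  SetLength(Parts, ThreadCount);
  SetLength(Threads, ThreadCount);
  for I := 0 to ThreadCount - 1 do
    Parts[I] := TStringList.Create;
  try
    for I := 0 to AllFiles.Count - 1 do
      Parts[I mod ThreadCount].Add(AllFiles[I]); // deal files out round-robin
    for I := 0 to ThreadCount - 1 do
      Threads[I] := TReplaceThread.Create(Parts[I], Keywords, Values);
    for I := 0 to ThreadCount - 1 do
    begin
      Threads[I].WaitFor; // blocks the caller until this worker is done
      Threads[I].Free;
    end;
  finally
    for I := 0 to ThreadCount - 1 do
      Parts[I].Free;
  end;
end;

end.
```

Because every file is handled by exactly one thread, the workers need no locking. The sketch keeps error handling minimal: the caller is blocked in WaitFor until all workers finish, and an exception inside Execute simply ends that worker. Whether this is faster than the single-threaded version depends on how much of the time is spent in the replace calls rather than on disk, so measure before and after.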
Operations that could be run in parallel benefit from multitasking - those that have to be run one after another can't. The larger the operation, the larger the benefit. In your procedure you could parallelize the file loadings (although I guess they hold not so many elements) and you could parallelize the replace operation having multiple threads operating each on different list elements. How much faster it will run depends of the files size.
I guess you pay a bigger speed penalty for using GUI elements to store data instead of working directly on in-memory structures, because that means redrawing the controls often, which is an expensive operation.
Here is your answer
1. If you can, do not wait until the user clicks to start the work. Do it beforehand, for example in FormCreate:
Put them into a wrapper object.
Run it in a thread; once it finishes, mark the wrapper as ready to be used.
When the user clicks the action, check the marker. If it is not done yet, loop and wait, something like:
btnKeywrdTrans.Enabled := False;
while not wrapper.done do
begin
Sleep(500);
Application.Processmessages;
end;
..... your further logic
btnKeywrdTrans.Enabled := True;
2. Replace the list boxes with TStringList or TWideStringList.
Cheers
Pham
