Really fast function to compare the name (full path) of two files - delphi

I have to check if I have duplicate paths in a FileListBox (the FileListBox plays the role of some kind of job list or play list).
Using Delphi's SameText, CompareStr, or CompareText takes 6 seconds. So I came up with my own compare function, which is (just) a bit faster, but not fast enough. Any ideas how to improve it?
function SameFile(CONST Path1, Path2: string): Boolean;
VAR i: Integer;
begin
  Result := Length(Path1) = Length(Path2); { if they have different lengths then obviously they are not the same file }
  if Result then
    for i := Length(Path1) downto 1 DO { start from the end because it is more likely to find the difference there }
      if Path1[i] <> Path2[i] then
      begin
        Result := FALSE;
        Break;
      end;
end;
I use it like this:
for x := JList.Count - 1 downto 1 DO
begin
  sMaster := JList.Items[x];
  for y := x - 1 downto 0 DO
    if SameFile(sMaster, JList.Items[y]) then
    begin
      JList.Items.Delete(x); { REMOVE DUPLICATES }
      Break;
    end;
end;
Note: The chance of having duplicates is small, so Delete is not called often. Also, the list cannot be sorted because the items are added by the user and sometimes the order may be important.
Update:
The thing is that I lose the advantage of my code because it is written in Pascal.
It would be nice if the comparison loop (Path1[i] <> Path2[i]) were optimized to use Borland's ASM code.
Delphi 7, Win XP 32-bit. Tests were done with 577 items in the list. Deleting items from the list IS NOT A PROBLEM because it happens rarely.
CONCLUSION
As Svein Bringsli pointed out, my code is slow not because of the comparing algorithm but because of TListBox. The BEST solution was provided by Marcelo Cantos. Thanks a lot, Marcelo.
I accepted Svein's answer because it directly answers my question "how to make my comparison function faster" with "there is no point in making it faster".
For the moment I implemented the quick-and-dirty solution: when I have under 200 files, I use my slow code to check for duplicates. If there are more than 200 files I use dwrbudr's solution (which is damn fast), considering that if the user has so many files, the order is irrelevant anyway (a human brain cannot track that many items).
I want to thank you all for the ideas, and especially Svein for revealing the truth: (Borland's) visual controls are damn slow!

Don't waste time optimising the assembler. You can go from O(n²) to O(n log(n)) — bringing the time down to milliseconds — by sorting the list and then doing a linear scan for duplicates.
While you're at it, forget the SameFile function. The algorithmic improvement will dwarf anything you can achieve there.
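A minimal sketch of the idea (order is not preserved here; vSL and x are illustrative locals):
vSL := TStringList.Create;
try
  vSL.Assign(JList.Items);
  vSL.Sort;
  for x := vSL.Count - 1 downto 1 do
    if vSL[x] = vSL[x - 1] then
      vSL.Delete(x); { after sorting, duplicates sit next to each other }
finally
  vSL.Free;
end;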
Edit: Based on feedback in the comments...
You can perform an order-preserving O(n log(n)) de-duplication as follows:
Sort a copy of the list.
Identify and copy duplicated entries to a third list along with their duplication count minus one.
Walk the original list backwards as per your original version.
In the inner (for y := ...) loop, traverse the duplication list instead. If an outer item matches, delete it, decrement the duplication count, and delete the duplication entry if the count reaches zero.
This is obviously more complicated but it will still be orders of magnitude faster, even if you do horrible dirty things like storing duplication counts as strings, C:\path1\file1=2, and using code like:
y := dupes.IndexOfName(sMaster);
if y <> -1 then
begin
  JList.Items.Delete(x);
  c := StrToInt(dupes.ValueFromIndex[y]);
  if c > 1 then
    dupes.Values[sMaster] := IntToStr(c - 1)
  else
    dupes.Delete(y);
end;
Side note: A binary chop would be more efficient than the for y := ... loop, but given that duplicates are rare, the difference ought to be negligible.
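For completeness, one possible sketch of steps 1 and 2 (building the dupes list) under the same name=value convention; sorted, x and c are illustrative locals:
var
  sorted, dupes: TStringList;
  x, c: Integer;
begin
  sorted := TStringList.Create;
  dupes := TStringList.Create;
  sorted.Assign(JList.Items);
  sorted.Sort; { step 1: sort a copy }
  x := 0;
  while x < sorted.Count do
  begin
    c := 1;
    while (x + c < sorted.Count) and (sorted[x + c] = sorted[x]) do
      Inc(c); { count a run of equal entries }
    if c > 1 then
      dupes.Add(sorted[x] + '=' + IntToStr(c - 1)); { duplication count minus one }
    Inc(x, c);
  end;
  sorted.Free;
end;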

Using your code as a starting point, I modified it to take a copy of the list before searching for duplicates. The time went from 5.5 seconds to about 0.5 seconds.
vSL := TStringList.Create;
try
  vSL.Assign(jList.Items);
  vSL.Sorted := true;
  for x := vSL.Count - 1 downto 1 DO
  begin
    sMaster := vSL[x];
    for y := x - 1 downto 0 DO
      if SameFile(sMaster, vSL[y]) then
      begin
        vSL.Delete(x); { REMOVE DUPLICATES }
        jList.Items.Delete(x);
        Break;
      end;
  end;
finally
  vSL.Free;
end;
Obviously, this is not a good way to do it, but it demonstrates that TFileListBox is in itself quite slow. I don't believe you can gain much by optimizing your compare-function.
To demonstrate this, I replaced your SameFile function with the following, but kept the rest of your code:
function SameFile(CONST Path1, Path2: string): Boolean;
VAR i: Integer;
begin
  Result := false; //Pretty darn fast code!!!
end;
The time went from 5.6 seconds to 5.5 seconds. I don't think there's much more to gain there :-)

Create another sorted list with sortedList.Duplicates := dupIgnore, add your strings to that list, then copy them back.
vSL := TStringList.Create;
try
  vSL.Sorted := true;
  vSL.Duplicates := dupIgnore;
  for x := 0 to jList.Count - 1 do
    vSL.Add(jList[x]);
  jList.Clear;
  for x := 0 to vSL.Count - 1 do
    jList.Add(vSL[x]);
finally
  vSL.Free;
end;

The absolute fastest way, bar none (as alluded to before) is to use a routine that generates a unique 64/128/256 bit hash code for a string (I use the SHA256Managed class in C#). Run down the list of strings, generate the hash code for the strings, check for it in the sorted hash code list, and if found then the string is a duplicate. Otherwise add the hash code to the sorted hash code list.
This will work for strings, file names, images (you can get the unique hash code for an image), etc., and I guarantee that this will be as fast as or faster than any other implementation.
PS You can use a string list for the hash codes by representing the hash codes as strings. I've used a hex representation in the past (256 bits -> 64 characters) but in theory you can do it any way you like.
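For illustration only, here is the same pattern sketched in Delphi with a simple FNV-1a hash standing in for SHA-256 (unlike a cryptographic hash it can collide, which would wrongly drop a unique string):
{$Q-} { let the Cardinal arithmetic wrap instead of raising overflow errors }
function HashOf(const S: string): string;
var
  h: Cardinal;
  i: Integer;
begin
  h := 2166136261;
  for i := 1 to Length(S) do
    h := (h xor Ord(S[i])) * 16777619;
  Result := IntToHex(Integer(h), 8); { hex form, so it fits in a TStringList }
end;

procedure RemoveDupesByHash(Items: TStrings);
var
  hashes: TStringList;
  x: Integer;
begin
  hashes := TStringList.Create;
  try
    hashes.Sorted := True; { sorted list = binary search inside IndexOf }
    x := 0;
    while x < Items.Count do
      if hashes.IndexOf(HashOf(Items[x])) >= 0 then
        Items.Delete(x) { hash already seen: treat as duplicate }
      else
      begin
        hashes.Add(HashOf(Items[x]));
        Inc(x);
      end;
  finally
    hashes.Free;
  end;
end;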

4 seconds for how many calls? Great performance if you call it a billion times...
Anyway, does Length(Path1) get evaluated every time through the loop? If so, store that in an Integer variable prior to looping.
Pointers may yield some speed over the strings.
Try in-lining the function with:
function SameFile(blah blah): Boolean; Inline;
That will save some time, if this is being called thousands of times per second. I would start with that and see if it saves anything.
EDIT: I didn't realize that your list wasn't sorted. Obviously, you should do that first! Then you don't have to compare against every other item in the list - just the prior or next one.

I use a modified Ternary Search Tree (TST) to dedupe lists. You simply load the items into the tree, using the whole string as the key, and on each item you can get back an indication of whether the key is already there (and delete your visible entry). Then you throw away the tree. Our TST load function can typically load 100,000 80-byte items in well under a second, and it could not take any more than this to repaint your list, with proper use of begin- and end-update. The TST is memory-hungry, but not so much that you would notice it at all if you only have on the order of 500 items. And it is much simpler than sorting, comparisons and assembler (if you have a suitable TST implementation, of course).

No need to use a hash table, a single sorted list gives me a result of 10 milliseconds, that's 0.01 seconds, which is about 500 times faster! Here is my test code using a TListBox:
procedure TForm1.Button1Click(Sender: TObject);
var
  lIndex1: Integer;
  lString: string;
  lIndex2: Integer;
  lStrings: TStringList;
  lCount: Integer;
  lItems: TStrings;
begin
  ListBox1.Clear;
  for lIndex1 := 1 to 577 do begin
    lString := '';
    for lIndex2 := 1 to 100 do
      if (lIndex2 mod 6) = 0 then
        lString := lString + Chr(Ord('a') + Random(2))
      else
        lString := lString + 'a';
    ListBox1.Items.Add(lString);
  end;
  CsiGlobals.AddLogMsg('Start', 'Test', llBrief);
  lStrings := TStringList.Create;
  try
    lStrings.Sorted := True;
    lCount := 0;
    lItems := ListBox1.Items;
    with lItems do begin
      BeginUpdate;
      try
        for lIndex1 := Count - 1 downto 0 do begin
          lStrings.Add(Strings[lIndex1]);
          if lStrings.Count = lCount then
            Delete(lIndex1)
          else
            Inc(lCount);
        end;
      finally
        EndUpdate;
      end;
    end;
  finally
    lStrings.Free;
  end;
  CsiGlobals.AddLogMsg('Stop', 'Test', llBrief);
end;

I'd also like to point out that your solution would take an extreme amount of time if applied to a huge list (one containing 100,000,000 items or more). Even constructing a hashtable or sorted list would take too much time.
In cases like that you could try another approach: hash each member, but instead of populating a full-blown hashtable, create a bitset (with a slot count within a reasonable factor of the number of input items) and just set the bit at the offset indicated by the hash function. If the bit was 0, change it to 1. If it was already 1, take note of the offending string index in a separate list and continue. This results in a list of string indexes that had a collision in the hash, so you'll have to run it a second time to find the first cause of those collisions. After that, you should sort and de-dupe the string indexes in this list (as all indexes apart from the first one will be present twice). Once that's done, you should sort the list again, but this time on the string contents, in order to easily spot duplicates in a following single scan.
Granted, it could be a bit extreme to go to all this length, but at least it's a workable solution for very large volumes! (Oh, and this still won't work if the number of duplicates is very high, when the hash function has a bad spread, or when the number of slots in the 'hashtable' bitset is chosen too small - any of which would give many collisions that aren't really duplicates.)
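A rough Delphi sketch of the bitset pass (names are illustrative; FNV1a is any Cardinal-valued hash):
{$Q-} { wrap-around arithmetic for the hash }
function FNV1a(const S: string): Cardinal;
var
  i: Integer;
begin
  Result := 2166136261;
  for i := 1 to Length(S) do
    Result := (Result xor Ord(S[i])) * 16777619;
end;

procedure CollectSuspects(Items: TStrings; Suspects: TList);
const
  SlotCount = 1 shl 20; { ~1M slots; pick several times the item count }
var
  bits: array of Byte;
  x: Integer;
  slot: Cardinal;
begin
  SetLength(bits, SlotCount div 8); { dynamic arrays start zero-filled }
  for x := 0 to Items.Count - 1 do
  begin
    slot := FNV1a(Items[x]) mod SlotCount;
    if (bits[slot div 8] and (1 shl (slot mod 8))) <> 0 then
      Suspects.Add(Pointer(x)) { a duplicate OR a mere hash collision }
    else
      bits[slot div 8] := bits[slot div 8] or Byte(1 shl (slot mod 8));
  end;
  { Suspects now holds candidate indexes only; they still need the second
    pass and the content sort described above to weed out collisions }
end;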

Related

Find duplicates in a stringlist very fast

What is the fastest way to find duplicates in a TStringList? I get the data I need to search for duplicates in a StringList. My current idea goes like this:
var TestStringList, DataStringList : TStringList;
for i := 0 to DataStringList.Count - 1 do
begin
  if TestStringList.IndexOf(DataStringList[i]) < 0 then
  begin
    TestStringList.Add(DataStringList[i])
  end
  else
  begin
    memo1.Lines.Add('duplicate item found');
  end;
end;
....
Just for completeness (and because your code doesn't actually use the duplicate, but just indicates one has been found): Delphi's TStringList has the built-in ability to deal with duplicate entries, in its Duplicates property. Setting it to dupIgnore will simply discard any duplicates you attempt to add. Note that the destination list has to be sorted, or Duplicates has no effect.
TestStringList.Sorted := True;
TestStringList.Duplicates := dupIgnore;
for i := 0 to DataStringList.Count - 1 do
  TestStringList.Add(DataStringList[i]);
Memo1.Lines.Add(Format('%d duplicates discarded',
  [DataStringList.Count - TestStringList.Count]));
A quick test shows that the entire loop can be removed if you use Sorted and Duplicates:
TestStringList.Sorted := True;
TestStringList.Duplicates := dupIgnore;
TestStringList.AddStrings(DataStringList);
Memo1.Lines.Add(Format('%d duplicates discarded',
  [DataStringList.Count - TestStringList.Count]));
See the TStringList.Duplicates documentation for more info.
I think that you are looking for duplicates. If so, then do the following:
Case 1: The string list is ordered
In this scenario, duplicates must appear at adjacent indices, in which case you simply loop from 1 to Count-1 and check whether the element at index i is the same as the one at index i-1.
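A minimal sketch of the ordered case:
for i := 1 to List.Count-1 do
  if List[i] = List[i-1] then
    // duplicate found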
Case 2: The string list is not ordered
In this scenario we need a double for loop. It looks like this:
for i := 0 to List.Count-1 do
  for j := i+1 to List.Count-1 do
    if List[i] = List[j] then
      // duplicate found
There are performance considerations. If the list is ordered the search is O(N). If the list is not ordered the search is O(N²). Clearly the former is preferable. Since a list can be sorted with complexity O(N log N), if performance becomes a factor then it will be advantageous to sort the list before searching for duplicates.
Judging by the use of IndexOf, you use an unsorted list. The scaling factor of your algorithm then is n^2. That is slow. You can optimize it as David showed by limiting the search area of the inner search, and then the average factor would be n^2/2 - but that still scales badly.
Note: the scaling factor here makes sense for limited workloads, say dozens or hundreds of strings per list. For larger sets of data an asymptotic O(...) measure would suit better. However, finding the O-measures for QuickSort and for hash lists is a trivial task.
Option 1: sort the list. Using quicksort it would have a scaling factor of n + n*log(n), or O(n*log(n)) for large loads.
Set Duplicates to accept
Set Sorted to True
Iterate the sorted list and check if the next string exists and is the same
http://docwiki.embarcadero.com/Libraries/XE3/en/System.Classes.TStringList.Duplicates
http://docwiki.embarcadero.com/Libraries/XE3/en/System.Classes.TStringList.Sorted
Option 2: use a hashed list helper. In modern Delphi that would be TDictionary<String,Boolean>; in older Delphi there is a class used by TMemIniFile.
You iterate your string list and check whether each string was already added into the helper collection.
If it was not - you add it with a "false" value.
If it was - you switch the value to "true".
The scaling factor would be a constant for small data chunks and O(1) for large ones - see http://docwiki.embarcadero.com/Libraries/XE2/en/System.Generics.Collections.TDictionary.ContainsKey
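A minimal sketch of that pattern (needs Delphi 2009+ and Generics.Collections; List stands for your string list):
var
  seen: TDictionary<string, Boolean>;
  i: Integer;
begin
  seen := TDictionary<string, Boolean>.Create;
  try
    for i := 0 to List.Count - 1 do
      if seen.ContainsKey(List[i]) then
        seen[List[i]] := True // seen before: switch the value to "true"
      else
        seen.Add(List[i], False); // first sighting: add with "false"
    // afterwards, every key mapped to True occurred more than once
  finally
    seen.Free;
  end;
end;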
For older Delphi you can use THashedStringList in a similar pattern (thanks #FreeConsulting)
http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/IniFiles_THashedStringList_IndexOf.html
Unfortunately it is unclear what you want to do with the duplicates. Your else clause suggests you just want to know whether there is one (or more) duplicate(s). Although that could be the end goal, I assume you want more.
Extracting duplicates
The previously given answers delete or count the duplicate items. Here is an answer for keeping them.
procedure ExtractDuplicates1(List1, List2: TStringList; Dupes: TStrings);
var
  Both: TStringList;
  I: Integer;
begin
  Both := TStringList.Create;
  try
    Both.Sorted := True;
    Both.Duplicates := dupAccept;
    Both.AddStrings(List1);
    Both.AddStrings(List2);
    for I := 0 to Both.Count - 2 do
      if (Both[I] = Both[I + 1]) then
        if (Dupes.Count = 0) or (Dupes[Dupes.Count - 1] <> Both[I]) then
          Dupes.Add(Both[I]);
  finally
    Both.Free;
  end;
end;
Performance
The following alternatives were tried in order to compare performance with the above routine.
procedure ExtractDuplicates2(List1, List2: TStringList; Dupes: TStrings);
var
  Both: TStringList;
  I: Integer;
begin
  Both := TStringList.Create;
  try
    Both.AddStrings(List1);
    Both.AddStrings(List2);
    Both.Sort;
    for I := 0 to Both.Count - 2 do
      if (Both[I] = Both[I + 1]) then
        if (Dupes.Count = 0) or (Dupes[Dupes.Count - 1] <> Both[I]) then
          Dupes.Add(Both[I]);
  finally
    Both.Free;
  end;
end;
procedure ExtractDuplicates3(List1, List2, Dupes: TStringList);
var
  I: Integer;
begin
  Dupes.Sorted := True;
  Dupes.Duplicates := dupAccept;
  Dupes.AddStrings(List1);
  Dupes.AddStrings(List2);
  for I := Dupes.Count - 1 downto 1 do
    if (Dupes[I] <> Dupes[I - 1]) or (I > 1) and (Dupes[I] = Dupes[I - 2]) then
      Dupes.Delete(I);
  if (Dupes.Count > 1) and (Dupes[0] <> Dupes[1]) then
    Dupes.Delete(0);
  while (Dupes.Count > 1) and (Dupes[0] = Dupes[1]) do
    Dupes.Delete(0);
end;
Although ExtractDuplicates3 performs marginally better, I prefer ExtractDuplicates1 because it reads better and the TStrings parameter provides more usability. ExtractDuplicates2 performs noticeably worse, which demonstrates that sorting all items afterwards in a single run takes more time than keeping the list sorted while every single item is added.
Note
This answer is part of this recent answer for which I was about to ask the same question: "how to keep duplicates?". I didn't, but if anyone knows or finds a better solution, please comment, add or update this answer.
This is an old thread but I thought this solution may be useful.
An option is to pump the values from one string list to another with TestStringList.Duplicates := dupError set, and then trap the exception.
var TestStringList, DataStringList : TStringList;
TestStringList.Sorted := True;
TestStringList.Duplicates := dupError;
for i := 0 to DataStringList.Count - 1 do
begin
  try
    TestStringList.Add(DataStringList[i])
  except
    on E : EStringListError do begin
      memo1.Lines.Add('duplicate item found');
    end;
  end;
end;
....
Just note that trapping the exception also masks the following errors:
there is not enough memory to expand the list; the list tried to grow beyond its maximal capacity; a non-existent element of the list was referenced (i.e. the list index was out of bounds).
function TestDuplicates(const dataStrList: TStringList): integer;
var
  it: Integer;
begin
  Result := 0;
  with TStringList.Create do begin
    {Duplicates := dupIgnore;}
    for it := 0 to dataStrList.Count - 1 do begin
      if IndexOf(dataStrList[it]) < 0 then
        Add(dataStrList[it])
      else
        inc(Result)
    end;
    Free;
  end;
end;

Removing duplicates from List

I wrote this function to remove duplicates from a TList descendant. Now I was wondering if this could give me problems in certain conditions, and how it does performance-wise.
It seems to work with object pointers.
function TListClass.RemoveDups: integer;
var
  total, i, j: integer;
begin
  total := 0;
  i := 0;
  while i < count do begin
    j := i + 1;
    while j < count do begin
      if items[i] = items[j] then begin
        remove(items[j]);
        inc(total);
      end
      else
        inc(j);
    end;
    inc(i);
  end;
  result := total;
end;
Update:
Does this work faster?
function TDrawObjectList.RemoveDups: integer;
var
  total, i: integer;
  templist: TList;
begin
  templist := TList.Create;
  total := 0;
  i := 0;
  while i < count do
    if templist.IndexOf(items[i]) = -1 then begin
      templist.Add(items[i]);
      inc(i);
    end else begin
      remove(items[i]);
      inc(total);
    end;
  result := total;
  templist.Free;
end;
You do need another List.
As noted, the solution is O(N^2), which makes it really slow on a big set of items (1000s), but as long as the count stays low it's the best bet because of its simplicity and ease of implementation, whereas pre-sorted and other solutions need more code and are more prone to implementation errors.
This may be the same code written in a different, more compact form. It runs through all elements of the list, and for each it removes duplicates to the right of the current element. Removal is safe as long as it's done in a reverse loop.
function TListClass.RemoveDups: Integer;
var
  I, K: Integer;
begin
  Result := 0;
  for I := 0 to Count - 1 do //Compare to everything on the right
    for K := Count - 1 downto I + 1 do //Reverse loop allows to Remove items safely
      if Items[K] = Items[I] then
      begin
        Remove(Items[K]);
        Inc(Result);
      end;
end;
I would suggest leaving optimizations to a later time, if you really end up with a 5000-item list. Also, as noted above, if you check for duplicates while adding items to the list, you gain on two fronts:
The check for duplicates gets distributed in time, so it won't be as noticeable to the user
You can hope to quit early if a dupe is found
Just hypothetical:
Interfaces
If you have interfaced objects in a TInterfaceList that are only in that list, you could check the reference count of each object. Just loop through the list backwards and delete all objects with a refcount > 1.
Custom counter
If you can edit these objects, you could do the same without interfaces. Increment a counter on the object when it is added to the list and decrement it when it is removed.
Of course, this only works if you can actually add a counter to these objects, but the boundaries weren't exactly clear in your question, so I don't know if this is allowed.
The advantage is that you don't need to look for other items at all, neither when inserting nor when removing duplicates. Finding a duplicate in a sorted list could be faster (as mentioned in the comments), but not having to search at all will beat even the fastest lookup.

Poor performance of TStringGrid

I have a TStringGrid with 10 columns. Adding 500 rows to it takes around 2 seconds. Is this normal performance?
It seems a bit slow to me.
I am getting the data from a database query. If I loop through the query but don't write the results to the StringGrid, the process takes around 100ms, so it's not the database that's slowing things down.
Once the rows are added, the StringGrid performance is fine.
Here is the code I am using
Grid.RowCount := Query.RecordCount;
J := 0;
while not Query.EOF do
begin
  Grid.Cells[0, J] := Query.FieldByName('Value1').AsString;
  Grid.Cells[1, J] := Query.FieldByName('Value2').AsString;
  Grid.Cells[2, J] := Query.FieldByName('Value3').AsString;
  // etc for other columns.
  Inc(J);
  Query.Next();
end;
The real code is actually a bit more complex (the table columns do not correspond exactly to the query columns) but that's the basic idea
One other thing I have found to be very important when going through a lot of records is to use proper TField variables for each field. FieldByName iterates through the Fields collection every time, so it is not the most performant option.
Before the loop define each field as in:
var
  f1, f2: TStringField;
  f3: TIntegerField;
begin
  // MyStringGrid.BeginUpdate; // Can't do this
  // Could try something like this instead:
  // MyStringGrid.Perform(WM_SETREDRAW, 0, 0);
  try
    while ... do
    begin
      rowvalues[0] := f1.AsString;
      rowvalues[1] := f2.AsString;
      rowvalues[2] := Format('%4.2d', [f3.AsInteger]);
      // etc
    end;
  finally
    // MyStringGrid.EndUpdate; // Can't - see above
    // MyStringGrid.Perform(WM_SETREDRAW, 1, 0);
    // MyStringGrid.Invalidate;
  end;
end;
That along with BeginUpdate/Endupdate and calling Query.DisableControls if appropriate.
The solution was to add all values in a row at once, using the "Rows" property.
My code now looks like this:
Grid.RowCount := Query.RecordCount;
rowValues := TStringList.Create;
J := 0;
while not Query.EOF do
begin
  rowValues.Clear;
  rowValues.Add(Query.FieldByName('Value1').AsString);
  rowValues.Add(Query.FieldByName('Value2').AsString);
  rowValues.Add(Query.FieldByName('Value3').AsString);
  // etc for other columns.
  Grid.Rows[J] := rowValues;
  Inc(J);
  Query.Next();
end;
rowValues.Free; // for the OCD among us
This brought the time down from 2 seconds to about 50ms.
FieldByName used in a loop is very slow, since the lookup is performed each time. You should call it once outside the loop and then just use the resulting TField references inside the loop.
TStringGrid works OK for a small number of records, but don't try it for more than 10,000 records.
We had severe performance problems with TAdvStringGrid from TMS (which is based on Delphi's TStringGrid) when loading/sorting/grouping large grid sets, but also when inserting one row at the top of the grid (expanding a grid group node). Memory usage was also high.
And yes, I used BeginUpdate/EndUpdate already. Also other tricks. But after diving into the structure of TStringGrid I concluded it could never be fast for many records.
As a general tip (for large grids): use the OnGetText (and OnSetText) event. This event is used for filling the grid on demand (only the cells that are displayed). Store the data in your own data objects. This made our grid very fast (1,000,000 records are no problem anymore; it loads within seconds!)
First optimization is to replace the very slow Query.FieldByName('Value1') calls with local TField variables.
var
  F1, F2, F3: TField;

Grid.RowCount := Query.RecordCount;
J := 0;
F1 := Query.FieldByName('Value1');
F2 := Query.FieldByName('Value2');
F3 := Query.FieldByName('Value3');
while not Query.EOF do
begin
  Grid.Cells[0, J] := F1.AsString;
  Grid.Cells[1, J] := F2.AsString;
  Grid.Cells[2, J] := F3.AsString;
  // etc for other columns.
  Inc(J);
  Query.Next();
end;
If this is not enough, use the grid in virtual mode, i.e. retrieve all content in a TStringList or any in-memory structure, then use the OnGetText or OnDrawCell methods.
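For a standard TStringGrid, a rough sketch of the OnDrawCell route (FData here is a hypothetical in-memory structure holding your rows):
procedure TForm1.GridDrawCell(Sender: TObject; ACol, ARow: Integer;
  Rect: TRect; State: TGridDrawState);
begin
  { paint cells on demand from your own data; the grid itself holds nothing }
  with TStringGrid(Sender).Canvas do
    TextRect(Rect, Rect.Left + 2, Rect.Top + 2, FData[ARow][ACol]);
end;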
I believe it's slow because it has to repaint itself every time you add a row. Since you are taking the values from a query, I think it would be better for you to use a TDBGrid instead.
Best regards.
If you know how many rows you're about to add, store the current RowCount in a temporary variable, set the grid's RowCount to accommodate the current row count plus the rows you're about to add, then assign the new values to the rows (using the former row count you stored) rather than adding them. This will cut out a lot of background processing.
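A sketch of that idea (oldCount and NewRows are illustrative names):
oldCount := Grid.RowCount;
Grid.RowCount := oldCount + NewRows; { grow once, up front }
for i := 0 to NewRows - 1 do
  Grid.Cells[0, oldCount + i] := ...; { assign into the new rows rather than appending }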
Try testing with AQTime or a similar tool (a profiler).
Without any code it is difficult to say, but I think that the poor performance is due to FieldByName, not the StringGrid.
FieldByName does a linear search:
for I := 0 to FList.Count - 1 do
begin
  Result := FList.Items[I];
  ...
If your dataset has many columns (fields), the performance will be even lower.
Regards.
I was going to say "why not just use beginupdate/endupdate?" but now I see that the regular string grid doesn't support it.
While googling that, I found a way to simulate beginupdate/endupdate:
http://www.experts-exchange.com/Programming/Languages/Pascal/Delphi/Q_21832072.html
See the answer by ZhaawZ, where he uses a pair of WM_SETREDRAW messages to disable/enable the repainting. If this works, use it in conjunction with the "eliminate FieldByName" trick, and it should take no time to draw.
Set Grid.RowCount := 2 before the loop, then when the loop is finished set the RowCount to the correct value.
That avoids lots of calls to the OnPaint event.
In my case it turned out that the Debug build was slow and the Release build was fast - a Heisenbug.
More specifically, FastMM4 FullDebugMode triggered the slowness.

converting program to Multithreading, taking advantage of multicore cpu

I have a simple program with one procedure:
Procedure TForm1.btnKeywrdTransClick(Sender: TObject);
Var
  i, ii : integer;
  ch_word, zword, uy_word: widestring;
Begin
  TntListBox1.items.LoadFromFile('d:\new folder\chh.txt'); //Chinese
  TntListBox2.items.LoadFromFile('d:\new folder\uyy.txt'); //Uyword
  TntListBox4.items.LoadFromFile(Edit3.text); //list of poi files
  For I := 0 To TntListBox4.items.Count - 1 do
  Begin
    TntListBox3.items.LoadFromFile(TntListBox4.Items[i]);
    zword := tntlistbox3.Items.Text; //Poi
    For ii := 0 To TntListBox1.Items.count - 1 Do
    Begin
      loopz;
      ch_word := tntlistbox1.Items[ii];
      uy_word := ' ' + TntListBox2.items[ii] + ' ';
      zword := wideFastReplace(zword, ch_word, uy_word, [rfReplaceAll]); //fastest, and better for large text
    End;
    TntListBox3.Items.text := zword;
    TntListBox3.items.SaveToFile(TntListBox4.Items[i]);
  end;
end;
Now my new computer has 4 cores. Will making this program multithreaded (say, 4 threads, one per core) make it run faster?
I have no experience with multithreading; I need your help.
Thanks.
PS: this is the loopz procedure:
Procedure loopz;
Var
  msg: tmsg;
Begin
  While PeekMessage(Msg, 0, 0, 0, pm_Remove) Do
  Begin
    If Msg.Message = wm_Quit Then Halt(Msg.WParam);
    TranslateMessage(Msg);
    DispatchMessage(Msg);
  End;
End;
Update 1:
From the answers, I'm going to:
1 - use a profiler to find the most time-consuming code
2 - try to eliminate GUI-related things if possible
3 - use threads.
I'll report back. Thanks all.
First of all, make the algorithm as effective as it can be in its current incarnation: stop using TListBox to store your data!!! (sorry for shouting) Replace them with TStringList and you'll get a HUGE performance improvement. That's a required first step anyway, because you can't use GUI objects from multiple threads (in fact you may only use them from the "main" thread). While you're changing TListBox to TStringList, please give your variables meaningful names. I don't know how many people around here figured out that you're storing a list of file names in ListBox4, loading each file in ListBox3, using ListBox1 as a "keyword list" and ListBox2 as a "value list"... really, it's a big mess! Here's how it would look with TStringList and proper names:
Procedure TForm1.btnKeywrdTransClick(Sender: TObject);
Var
  i, ii : integer;
  ch_word, zword, uy_word: widestring;
  PoiFilesList: TStringList; // This is the list of files that need work
  PoiFile: TStringList; // This is the file I'm working on right now
  KeywordList, ValueList: TStringList; // I'll replace all keywords with corresponding values
Begin
  PoiFilesList := TStringList.Create;
  PoiFile := TStringList.Create;
  KeywordList := TStringList.Create;
  ValueList := TStringList.Create;
  try
    PoiFilesList.LoadFromFile(Edit3.text); //list of poi files
    KeywordList.LoadFromFile('d:\new folder\chh.txt'); //Chinese
    ValueList.LoadFromFile('d:\new folder\uyy.txt'); //Uyword
    For I := 0 To PoiFilesList.Count - 1 do
    Begin
      PoiFile.LoadFromFile(PoiFilesList[i]);
      zword := PoiFile.Text; //Poi
      For ii := 0 To KeywordList.count - 1 Do
      Begin
        ch_word := KeywordList[ii];
        uy_word := ' ' + ValueList[ii] + ' ';
        zword := wideFastReplace(zword, ch_word, uy_word, [rfReplaceAll]);
      End;
      PoiFile.text := zword;
      PoiFile.SaveToFile(PoiFilesList[i]);
    end;
  finally
    PoiFilesList.Free;
    PoiFile.Free;
    KeywordList.Free;
    ValueList.Free;
  end;
end;
If you look at the code now, it's obvious what it does, and it's obvious how to multi-thread it. You've got a text file containing the names of files. You open up each one of those files and replace all Keywords with the corresponding Values. You save the file back to disk. It's easy! Load the KeywordList and ValueList into memory once, split the list of files into 4 smaller lists, and start up 4 threads, each working with its own smaller file list.
I don't want to write the whole multi-threaded variant of the code, because if I write it myself you might not understand how it works. Give it a chance and ask for help if you get into trouble.
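For a rough idea only, an untested sketch of one way to split the work across threads (TPoiWorker and every name here are illustrative; error handling omitted):
type
  TPoiWorker = class(TThread)
  private
    FFiles: TStringList; { this worker's share of the poi file list; owned }
    FKeywords, FValues: TStringList; { shared, treated as read-only }
  protected
    procedure Execute; override;
  public
    constructor Create(AFiles, AKeywords, AValues: TStringList);
    destructor Destroy; override;
  end;

constructor TPoiWorker.Create(AFiles, AKeywords, AValues: TStringList);
begin
  FFiles := AFiles;
  FKeywords := AKeywords;
  FValues := AValues;
  inherited Create(False); { start running immediately }
end;

destructor TPoiWorker.Destroy;
begin
  FFiles.Free;
  inherited;
end;

procedure TPoiWorker.Execute;
var
  i, ii: Integer;
  PoiFile: TStringList;
  zword: WideString;
begin
  PoiFile := TStringList.Create;
  try
    for i := 0 to FFiles.Count - 1 do
    begin
      PoiFile.LoadFromFile(FFiles[i]);
      zword := PoiFile.Text;
      for ii := 0 to FKeywords.Count - 1 do
        zword := wideFastReplace(zword, FKeywords[ii],
          ' ' + FValues[ii] + ' ', [rfReplaceAll]);
      PoiFile.Text := zword;
      PoiFile.SaveToFile(FFiles[i]);
    end;
  finally
    PoiFile.Free;
  end;
end;
The main thread would split PoiFilesList into four smaller lists, create four TPoiWorker objects, WaitFor each of them, then Free them; the workers never touch the GUI at all.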
First you should profile your code to see whether reading from the TntListBox is slowing you down or whether it is WideFastReplace. But even before that, remove the 'loopz' call - it is slowing you down the most! Why are you processing messages inside this loop at all?
To find the bottleneck, simply time your loop twice, but the second time comment out the WideFastReplace call. (And make sure you are timing only the loop, not the assignment to the TntListBox3 or saving to file or loading from file.)
When you know what's slowing you down, report back ...
BTW, calling WideFastReplace in parallel would be almost impossible as it is always operating on the same source. I don't see any good way to parallelize your code.
A possible parallelization approach:
Split zword on an appropriate word delimiter (I'm assuming here you are only replacing words, not phrases) into N strings where N is the number of cores.
Do the full replacement (all search/replacement pairs) for each of those N strings in parallel. Of course, you would have to read search/replacement pairs first from the TntListBoxes into some internal structure (TStringList would suffice) and then use this structure in all N threads.
Concatenate those partial strings back together.
Of course, there's no point in doing that if WideFastReplace is not the time-consuming part of the code. Do the profiling first!
It looks like you are interfacing with GUI elements.
99% of all GUI code must be interfaced from one and only one thread.
If you refactor your code to perform the text replacements in a series of threads, dividing the text amongst them, and then have the GUI thread place it into your list box, you could improve performance.
Note that creating and synchronizing threads is not cheap. Unless you have thousands of entries to work on, you will likely slow down your program by adding threads.
You should gain quite a bit of improvement by using only one thread for the whole thing. With this you can omit the loopz call completely.
Be aware that you should replace the TntListboxes with local TWideStringList instances in your routine.
When you have gotten somewhat familiar with multithreading, you can go and split the work into multiple threads. This can be done for instance by splitting the list of poi files (listbox4) in multiple (say 3-4) lists, one for each thread.
Operations that can run in parallel benefit from multithreading; those that have to run one after another can't. The larger the operation, the larger the benefit. In your procedure you could parallelize the file loading (although I guess the files hold not so many elements) and you could parallelize the replace operation by having multiple threads operate each on different list elements. How much faster it will run depends on the file sizes.
I guess you incur more of a speed penalty by using GUI elements to store data instead of working directly on in-memory structures, because that means redrawing the controls often, which is an expensive operation.
Here is your answer:
1. If you can, do not wait until the user clicks to react to the action. Do it beforehand, e.g. on FormCreate:
Put the lists into a wrapper object
Run it under a thread; once finished, mark it as ready to be used
When the user clicks on the action, check for the marker. If it is not done yet, do a while loop and wait, something like:
btnKeywrdTrans.Enabled := False;
while not wrapper.done do
begin
  Sleep(500);
  Application.ProcessMessages;
end;
..... your further logic
btnKeywrdTrans.Enabled := True;
2. Replace the TntListBoxes with TStringList or TWideStringList.
Cheers
Pham

How to increase the FOR-loop value in a FOR-loop statement?

I want to know how to increase the value in a FOR-loop statement.
This is my code.
function Check(var MemoryData: Array of byte; MemorySignature: Array of byte; Position: integer): boolean;
var i: byte;
begin
  for i := 0 to Length(MemorySignature) - 1 do
  begin
    while (MemorySignature[i] = $FF) do inc(i); //<< ERROR <<
    if (memorydata[i + position] <> MemorySignature[i]) then Result := false;
  end;
  Result := True;
end;
The error is: E2081 Assignment to FOR-Loop variable 'i'.
I'm trying to translate some old code from C# to Delphi, but I can't increase 'i'.
Increasing 'i' is not the only way to go, but I want to know where the problem is.
Of course the others are (generally) correct. What wasn't said is that the 'i' in your loop doesn't exist: Delphi uses a CPU register for it. That's why you cannot change it, and that's why you should use a 'for' loop (not a 'while'): the 'for' is way faster. Here is your code, modified (not tested, but I think you'll get the idea); also, IMHO you had some bugs - fixed them as well:
function Check(var MemoryData: Array of byte; MemorySignature: Array of byte; Position: integer): boolean;
var i: byte;
begin
  Result := True; //moved to top. Your function always returned 'True'. Is this what you wanted?
  for i := 0 to Length(MemorySignature) - 1 do //are you sure??? Perhaps you want High(MemorySignature) here...
  begin
    if MemorySignature[i] <> $FF then //speedup - '<>' evaluates faster than '='
    begin
      Result := memorydata[i + position] = MemorySignature[i];
      if not Result then
        Break; //added this! - speedup. We already know the result. So, no need to scan till the end.
    end;
  end;
end;
...also, MemorySignature should have a 'const' or 'var'. Otherwise, as it is now, the array gets copied, which means a slowdown at each call of 'Check'. With a 'var' things are much faster with the code unchanged, because AFAICS MemorySignature isn't changed.
HTH
In this case, you can just do a 'continue' instead of inc(i).
In addition to what Lasse wrote, assigning to a loop variable is generally considered a code smell. It makes code harder to read (if you want to leave the loop prematurely, you can express that a lot more clearly using break/continue), and it is often done by accident, causing all kinds of nasty side effects. So instead of jumping through hoops to make the compiler not apply its optimizing fu to any loop where the loop variable is touched, Borland (now CodeGear) bit the bullet and made assigning to the loop variable illegal.
If you really want to mess about manually with loop indices, consider using a while-loop.
If you need to alter a loop counter inside a loop, try using a while loop instead.
BTW, you need your
Result := True
line to be the first line of the function for it to work properly. As it is, it will always return True.
The problem is that the compiler has taken the original FOR-loop code and assumed it knows what is happening, and thus it can optimize the code by outputting specific CPU instructions that runs the fastest, with those assumptions.
If it allowed you to mess with the variable value, those assumptions might go out the window, and thus the code might not work, and that's why it doesn't allow you to change it.
What you should do instead is just have a separate variable that you're actually using, and only use the FOR-loop indexing variable to keep track of how many iterations you've currently executed.
As an example, a typical optimization might be to write CPU-instructions that will stop iterating when the index register drops to zero, rewriting the loop in such a way that it internally counts down, instead of up, and if you start messing with the variable, it could not rewrite the code like that.
As per Mike Sutton, what you need is a while loop, not a for loop.
function Check(var MemoryData: Array of byte;
  MemorySignature: Array of byte; Position: Integer): Boolean;
var
  i: byte;
begin
  Result := True;
  i := 0;
  while i < Length(MemorySignature) do
  begin
    while (MemorySignature[i] = $FF) do
      Inc(i);
    if (MemoryData[i + position] <> MemorySignature[i]) then
      Result := False;
    Inc(i);
  end;
end;
The Delphi implementation of "for" is optimised, but as a result it is less flexible than the C-style for loop.

Resources