I wrote this function to remove duplicates from a TList descendant. Now I was wondering: could this give me problems under certain conditions, and how does it do performance-wise?
It seems to work with object pointers.
function TListClass.RemoveDups: integer;
var
  total, i, j: integer;
begin
  total := 0;
  i := 0;
  while i < count do begin
    j := i + 1;
    while j < count do begin
      if items[i] = items[j] then begin
        remove(items[j]);
        inc(total);
      end
      else
        inc(j);
    end;
    inc(i);
  end;
  result := total;
end;
Update:
Does this work faster?
function TDrawObjectList.RemoveDups: integer;
var
  total, i: integer;
  templist: TList;
begin
  templist := TList.Create;
  total := 0;
  i := 0;
  while i < count do
    if templist.IndexOf(items[i]) = -1 then begin
      templist.Add(items[i]); // store the item itself so IndexOf can find it later
      inc(i);
    end else begin
      remove(items[i]);
      inc(total);
    end;
  result := total;
  templist.Free;
end;
You do need another List.
As noted, the solution is O(N^2), which makes it really slow on a big set of items (thousands), but as long as the count stays low it's the best bet because of its simplicity and ease of implementation, whereas pre-sorted and other solutions need more code and are more prone to implementation errors.
This may be the same code written in a different, more compact form. It runs through all elements of the list and, for each, removes the duplicates to the right of the current element. Removal is safe as long as it's done in a reverse loop.
function TListClass.RemoveDups: Integer;
var
  I, K: Integer;
begin
  Result := 0;
  for I := 0 to Count - 1 do            // compare to everything on the right
    for K := Count - 1 downto I + 1 do  // the reverse loop allows removing items safely
      if Items[K] = Items[I] then
      begin
        Delete(K); // Delete, not Remove: Remove scans from the start and would hit Items[I] first
        Inc(Result);
      end;
end;
I would suggest leaving optimizations until later, if you really do end up with a 5000-item list. Also, as noted above, if you check for duplicates while adding items to the list you gain two things:
The check for duplicates gets distributed in time, so it won't be as noticeable to the user
You can hope to quit early if a dupe is found
Just hypothetical:
Interfaces
If you have interfaced objects in a TInterfaceList and they exist only in that list, you could check the reference count of each object. Just loop through the list backwards and delete all objects with a refcount > 1.
Custom counter
If you can edit these objects, you could do the same without interfaces: increment a counter on the object when it is added to the list and decrement it when it is removed.
Of course, this only works if you can actually add a counter to these objects, but the boundaries weren't exactly clear in your question, so I don't know whether this is allowed.
The advantage is that you don't need to look for other items at all, neither when inserting nor when removing duplicates. Finding a duplicate in a sorted list could be faster (as mentioned in the comments), but not having to search at all beats even the fastest lookup.
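A minimal sketch of the custom counter idea, assuming the objects can be given an extra field (TCountedObject, AddOnce and RemoveDups are hypothetical names):
type
  TCountedObject = class
  public
    InListCount: Integer; // how many times this instance currently sits in the list
  end;

// All additions must go through this wrapper so the counter stays accurate.
procedure AddOnce(List: TList; Obj: TCountedObject);
begin
  Inc(Obj.InListCount);
  List.Add(Obj);
end;

function RemoveDups(List: TList): Integer;
var
  I: Integer;
  Obj: TCountedObject;
begin
  Result := 0;
  for I := List.Count - 1 downto 0 do // backwards, so Delete keeps remaining indices valid
  begin
    Obj := TCountedObject(List[I]);
    if Obj.InListCount > 1 then
    begin
      Dec(Obj.InListCount);
      List.Delete(I);
      Inc(Result);
    end;
  end;
end;
Each item is visited exactly once and no lookup of any kind happens, which is the whole point of the counter.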
Related
Today I met a very strange bug.
I have the following code:
var i: integer;
...
for i := 0 to FileNames.Count - 1 do
begin
  ShowMessage(IntToStr(i) + ' from ' + IntToStr(FileNames.Count - 1));
  FileName := FileNames[i];
  ...
end;
ShowMessage('all');
The FileNames list has one element, so I expect the loop to execute once and to see
0 from 0
all
It is a thing I have done thousands of times :).
But in this case I see a second loop iteration when code optimization is switched on:
0 from 0
1 from 0
all
Without code optimization, the loop iterates correctly.
For the moment I don't even know which direction to take with this issue (and the upper loop bound stays unchanged, yes).
So any suggestion will be very helpful. Thanks.
I use the Delphi 2005 (Update 2) compiler.
Considering the QC report referred to by LU RD, and my own experience with D2005, here are a few workarounds. I recall using the while-loop solution myself.
1. Rewrite the for loop as a while loop:
var
  i: integer;
begin
  i := 0;
  while i < FileNames.Count do
  begin
    ...
    inc(i);
  end;
end;
2. Leave the for loop control variable out of any other processing, and use a separate variable that you increment in the loop for string manipulation and FileNames indexing:
var
  ctrl, indx: integer;
begin
  indx := 0;
  for ctrl := 0 to FileNames.Count - 1 do
  begin
    // use indx for string manipulation and FileNames indexing
    inc(indx);
  end;
end;
3. You hinted at a workaround yourself in saying "without code optimization the loop iterates right".
Assuming you have optimization on, turn it off ({$O-}) before the procedure/function and back on ({$O+}) after. Note! The optimization directive can only be used around at least whole procedures/functions.
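A sketch of that directive placement (ProcessFileNames is a hypothetical routine standing in for the one with the misbehaving loop):
{$O-} // optimization off, starting with this routine
procedure ProcessFileNames(FileNames: TStrings);
var
  i: Integer;
begin
  for i := 0 to FileNames.Count - 1 do
  begin
    // ... work with FileNames[i] ...
  end;
end;
{$O+} // optimization back on for the rest of the unit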
OK, it seems I have solved the problem and can explain it.
Unfortunately, I cannot create a test that reproduces the bug, and I cannot show the real code, which is under NDA. So I must use a simplified example again.
The problem is in a DLL used by my code. Consider the following data structure:
type
  TData = packed record
    Count: integer;
  end;
  TPData = ^TData;
and a function defined in the DLL:
Calc: function(Data: TPData): integer; stdcall;
In my code I process data records taken from a list (TList):
var
  i: integer;
  Data: TData;
begin
  for i := 0 to List.Count - 1 do
  begin
    Data := TPData(List[i])^;
    Calc(@Data);
  end;
end;
and when optimization is on, I see a second iteration in the loop from 0 to 0.
If I rewrite the code as
var
  i: integer;
  Data, Data2: TData;
begin
  for i := 0 to List.Count - 1 do
  begin
    Data := TPData(List[i])^;
    Data2 := TPData(List[i])^;
    Calc(@Data2);
  end;
end;
all works as expected.
The DLL itself was developed by another programmer, so I asked him to take care of it.
What was unexpected for me is that a local procedure's stack can be corrupted in such an unusual way, without access violations or other similar errors. BTW, the Data and Data2 variables contain correct values.
Maybe my experience will be useful to someone. Thanks to all who helped me, and please excuse my inadvertent mistakes.
What is the fastest way to find duplicates in a TStringList? I get the data I need to search for duplicates in a string list. My current idea goes like this:
var
  TestStringList, DataStringList: TStringList;
  i: Integer;

for i := 0 to DataStringList.Count - 1 do
begin
  if TestStringList.IndexOf(DataStringList[i]) < 0 then
  begin
    TestStringList.Add(DataStringList[i])
  end
  else
  begin
    Memo1.Lines.Add('duplicate item found');
  end;
end;
....
Just for completeness (and because your code doesn't actually use the duplicate, but just indicates one has been found): Delphi's TStringList has the built-in ability to deal with duplicate entries, in its Duplicates property. Setting it to dupIgnore will simply discard any duplicates you attempt to add. Note that the destination list has to be sorted, or Duplicates has no effect.
TestStringList.Sorted := True;
TestStringList.Duplicates := dupIgnore;
for i := 0 to DataStringList.Count - 1 do
TestStringList.Add(DataStringList[i]);
Memo1.Lines.Add(Format('%d duplicates discarded',
[DataStringList.Count - TestStringList.Count]));
A quick test shows that the entire loop can be removed if you use Sorted and Duplicates:
TestStringList.Sorted := True;
TestStringList.Duplicates := dupIgnore;
TestStringList.AddStrings(DataStringList);
Memo1.Lines.Add(Format('%d duplicates discarded',
[DataStringList.Count - TestStringList.Count]));
See the TStringList.Duplicates documentation for more info.
I think that you are looking for duplicates. If so, then do the following:
Case 1: The string list is ordered
In this scenario, duplicates must appear at adjacent indices, so you simply loop from 1 to Count - 1 and check whether the element at index i is the same as the one at index i - 1.
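A minimal sketch of Case 1, in the same style as the Case 2 loop below:
// assumes List is sorted, so duplicates sit next to each other
for i := 1 to List.Count - 1 do
  if List[i] = List[i - 1] then
    // duplicate found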
Case 2: The string list is not ordered
In this scenario we need a double for loop. It looks like this:
for i := 0 to List.Count - 1 do
  for j := i + 1 to List.Count - 1 do
    if List[i] = List[j] then
      // duplicate found
There are performance considerations. If the list is ordered the search is O(N); if it is not, the search is O(N^2). Clearly the former is preferable. Since a list can be sorted with complexity O(N log N), if performance becomes a factor it will be advantageous to sort the list before searching for duplicates.
Judging by the use of IndexOf, you use an unsorted list. The scaling factor of your algorithm then is n^2. That is slow. You can optimize it as David showed, by limiting the search area of the inner loop; the average factor then becomes n^2/2, but it still scales badly.
Note: the scaling factor here makes sense for limited workloads, say dozens or hundreds of strings per list. For larger sets of data an asymptotic O(...) measure suits better. Happily, finding the O-measures for QuickSort and for hash lists is a trivial task.
Option 1: sort the list. Using quick-sort it would have scaling factor n + n*log(n), or O(n*log(n)) for large loads (see the sketch after the links below):
Set Duplicates to dupAccept
Set Sorted to True
Iterate the sorted list and check whether the next string exists and is the same
http://docwiki.embarcadero.com/Libraries/XE3/en/System.Classes.TStringList.Duplicates
http://docwiki.embarcadero.com/Libraries/XE3/en/System.Classes.TStringList.Sorted
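A sketch of Option 1, reusing DataStringList and Memo1 from the question:
var
  SL: TStringList;
  i: Integer;
begin
  SL := TStringList.Create;
  try
    SL.Duplicates := dupAccept;   // keep duplicates so we can see them
    SL.Sorted := True;            // duplicates end up adjacent
    SL.AddStrings(DataStringList);
    for i := 0 to SL.Count - 2 do
      if SL[i] = SL[i + 1] then
        Memo1.Lines.Add('duplicate item found: ' + SL[i]);
  finally
    SL.Free;
  end;
end;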
Option 2: use a hashed list helper. In modern Delphi that would be TDictionary<String,Boolean>; in older Delphi there is a class used by TMemIniFile.
You iterate your string list and check whether each string was already added to the helper collection.
If it was not, you add it with a "false" value.
If it was, you switch the value to "true".
The scaling factor would be a constant for small data chunks and O(1) per lookup for large ones - see http://docwiki.embarcadero.com/Libraries/XE2/en/System.Generics.Collections.TDictionary.ContainsKey. A sketch of this pattern follows after the links below.
For older Delphi you can use THashedStringList in a similar pattern (thanks @FreeConsulting):
http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/IniFiles_THashedStringList_IndexOf.html
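And a minimal sketch of the TDictionary pattern described above (the variable names are mine):
uses Generics.Collections;

var
  Seen: TDictionary<string, Boolean>;
  i: Integer;
  S: string;
begin
  Seen := TDictionary<string, Boolean>.Create;
  try
    for i := 0 to DataStringList.Count - 1 do
    begin
      S := DataStringList[i];
      if Seen.ContainsKey(S) then
        Seen[S] := True            // True marks "this string has duplicates"
      else
        Seen.Add(S, False);        // first sighting
    end;
  finally
    Seen.Free;
  end;
end;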
Unfortunately it is unclear what you want to do with the duplicates. Your else clause suggests you just want to know whether there is one (or more) duplicate(s). Although that could be the end goal, I assume you want more.
Extracting duplicates
The previously given answers delete or count the duplicate items. Here is an answer for keeping them.
procedure ExtractDuplicates1(List1, List2: TStringList; Dupes: TStrings);
var
  Both: TStringList;
  I: Integer;
begin
  Both := TStringList.Create;
  try
    Both.Sorted := True;
    Both.Duplicates := dupAccept;
    Both.AddStrings(List1);
    Both.AddStrings(List2);
    for I := 0 to Both.Count - 2 do
      if (Both[I] = Both[I + 1]) then
        if (Dupes.Count = 0) or (Dupes[Dupes.Count - 1] <> Both[I]) then
          Dupes.Add(Both[I]);
  finally
    Both.Free;
  end;
end;
Performance
The following alternatives were tried in order to compare performance with the above routine.
procedure ExtractDuplicates2(List1, List2: TStringList; Dupes: TStrings);
var
  Both: TStringList;
  I: Integer;
begin
  Both := TStringList.Create;
  try
    Both.AddStrings(List1);
    Both.AddStrings(List2);
    Both.Sort;
    for I := 0 to Both.Count - 2 do
      if (Both[I] = Both[I + 1]) then
        if (Dupes.Count = 0) or (Dupes[Dupes.Count - 1] <> Both[I]) then
          Dupes.Add(Both[I]);
  finally
    Both.Free;
  end;
end;
procedure ExtractDuplicates3(List1, List2, Dupes: TStringList);
var
  I: Integer;
begin
  Dupes.Sorted := True;
  Dupes.Duplicates := dupAccept;
  Dupes.AddStrings(List1);
  Dupes.AddStrings(List2);
  for I := Dupes.Count - 1 downto 1 do
    if (Dupes[I] <> Dupes[I - 1]) or (I > 1) and (Dupes[I] = Dupes[I - 2]) then
      Dupes.Delete(I);
  if (Dupes.Count > 1) and (Dupes[0] <> Dupes[1]) then
    Dupes.Delete(0);
  while (Dupes.Count > 1) and (Dupes[0] = Dupes[1]) do
    Dupes.Delete(0);
end;
Although ExtractDuplicates3 performs marginally better, I prefer ExtractDuplicates1 because it reads better and the TStrings parameter provides more usability. ExtractDuplicates2 performs noticeably worse, which demonstrates that sorting all items afterwards in a single run takes more time than continuously sorting each single item as it is added.
Note
This answer is part of this recent answer for which I was about to ask the same question: "how to keep duplicates?". I didn't, but if anyone knows or finds a better solution, please comment, add or update this answer.
This is an old thread but I thought this solution may be useful.
An option is to pump the values from one string list to another with TestStringList.Duplicates := dupError set, and then trap the exception.
var
  TestStringList, DataStringList: TStringList;
  i: Integer;

TestStringList.Sorted := True;
TestStringList.Duplicates := dupError;
for i := 0 to DataStringList.Count - 1 do
begin
  try
    TestStringList.Add(DataStringList[i])
  except
    on E: EStringListError do begin
      Memo1.Lines.Add('duplicate item found');
    end;
  end;
end;
....
Just note that trapping the exception also masks the following errors:
there is not enough memory to expand the list, the list tried to grow beyond its maximal capacity, or a non-existent element of the list was referenced (i.e. the list index was out of bounds).
function TestDuplicates(const dataStrList: TStringList): integer;
var
  it: Integer;
begin
  Result := 0;
  with TStringList.Create do begin
    {Duplicates := dupIgnore;}
    for it := 0 to dataStrList.Count - 1 do begin
      if IndexOf(dataStrList[it]) < 0 then
        Add(dataStrList[it])
      else
        inc(Result)
    end;
    Free;
  end;
end;
:)
First thing, my code:
procedure TForm1.Button3Click(Sender: TObject);
var
  tempId, i: integer;
begin
  tempId := StrToInt(edit5.Text);
  plik := TStringList.Create;
  plik.LoadFromFile('.\klienci\' + linia_klient[id + 1] + '.txt');
  if (plik.Count = 1) then
  begin
    Label6.Caption := 'then';
    if (tempId = StrToInt(plik[0])) then
    begin
      Label6.Caption := 'Zwrócono';
      plik.Delete(0);
    end
  end
  else
    for i := 0 to plik.Count - 2 do
    begin
      if (tempId = StrToInt(plik[i])) then
      begin
        Label6.Caption := 'Zwrócono';
        plik.Delete(i);
      end;
    end;
  plik.SaveToFile('.\klienci\' + linia_klient[id + 1] + '.txt');
  plik.Free;
end;
With for i := 0 to plik.Count - 2 I can delete any element except the last.
With for i := 0 to plik.Count - 1 I can delete any element, but only when going from the end to the start; otherwise I get "List index out of bounds".
What's going on? How can I safely search for and remove elements from a TStringList?
When deleting items from a list you want to use a downto loop, i.e.
for i := plik.Count - 1 downto 0 do
begin
  if (tempId = StrToInt(plik[i])) then
  begin
    Label6.Caption := 'Zwrócono';
    plik.Delete(i);
  end;
end;
This ensures that if you delete an item, the loop index stays valid as you move from the end of the list towards the beginning.
This is a classic problem. A for loop evaluates its bounds once at the start of the loop, so you run off the end, which explains your "index out of bounds" errors.
But even if for loops evaluated their bounds every time, like a while loop does, that would not really help. When you delete an element, you reduce Count by 1 and move the remaining elements down one position, which changes the indices of all the elements still to be processed.
The standard trick is to loop down the list:
for i := List.Count-1 downto 0 do
if DeleteThisItem(i) then
List.Delete(i);
When you write it this way, the call to Delete affects the indices of elements that have already been processed.
For I := stringlist.count-1 downto 0 do
Now you can delete all items without any error
In an ascending loop like for i := 0 to Count - 1 you just can't delete items from the list you are iterating over.
There are several solutions, depending on the overall logic of what you want to achieve:
You may change the for loop into a while loop that reevaluates Count and doesn't increment the index on iterations that delete
You may reverse the loop: for i := Count - 1 downto 0
Instead of deleting, you may create a temporary list, copy into it only the items you want to keep, and recopy it back (see the sketch below)
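A minimal sketch of that last option, reusing plik and tempId from the question:
var
  Keep: TStringList;
  i: Integer;
begin
  Keep := TStringList.Create;
  try
    for i := 0 to plik.Count - 1 do
      if tempId <> StrToInt(plik[i]) then  // copy only the survivors
        Keep.Add(plik[i]);
    plik.Assign(Keep);                     // recopy the kept items back
  finally
    Keep.Free;
  end;
end;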
As others have said, using a downto loop is usually the best choice. Of course, it does change the semantics of the loop so it runs backwards instead of forwards. If you want to continue looping forwards, you have to use a while loop instead, e.g.:
I := 0;
while I < plik.Count do
begin
  if (tempId = StrToInt(plik[I])) then
  begin
    ...
    plik.Delete(I);
  end else
    Inc(I);
end;
Or:
var
  I, CurIdx, Cnt: Integer;

CurIdx := 0;
Cnt := plik.Count;
for I := 0 to Cnt - 1 do
begin
  if (tempId = StrToInt(plik[CurIdx])) then
  begin
    ...
    plik.Delete(CurIdx);
  end else
    Inc(CurIdx);
end;
This is a sorted listview with 50000 items (strings) in Delphi. How do I quickly find the items that share a prefix, then break out of the loop?
The list is like:
aa.....
ab cd//from here
ab kk
ab li
ab mn
ab xy// to here
ac xz
...
I mean: how do I quickly find and copy the items with the prefix "ab", and then break out of the loop? Suppose the index of one of the "ab" items is obtained through a binary search, giving the range from "ab cd" to "ab xy".
Thank you very much.
Edit: We thank you all for your answers.
If you want something fast, don't store your data in a TListView.
Use a TStringList to store your list, then use a TListView in virtual mode.
Reading from a TStringList.Items[] is many times faster than reading from a TListView.Items[] property.
If you're sure that no empty item exists in the list, use this:
procedure Extract(List, Dest: TStrings; Char1, Char2: char);
var
  i, j: integer;
  V: cardinal;
type
  PC = {$ifdef UNICODE}PCardinal{$else}PWord{$endif};
begin
  V := ord(Char1) + ord(Char2) shl (8 * sizeof(char));
  Dest.BeginUpdate;
  try
    Dest.Clear;
    for i := 0 to List.Count - 1 do
      if PC(pointer(List[i]))^ = V then begin
        for j := i to List.Count - 1 do begin
          if PC(pointer(List[j]))^ <> V then
            break; // end of the matching range: end the for j := loop
          Dest.Add(List[j]);
        end;
        break; // range copied: end the for i := loop
      end;
  finally
    Dest.EndUpdate;
  end;
end;
You can use a binary search to get it even faster, but with the PWord() trick, on a 50,000-item list, you won't notice the difference.
Note that PC(pointer(List[i]))^=V is a faster version of copy(List[i],1,2)=Char1+Char2, because no temporary string is created during the comparison. But it works only if no List[i]='', i.e. no pointer(List[i])=nil.
I've added an {$ifdef UNICODE} and sizeof(char) in order to have this code compile with all versions of Delphi (before and after Delphi 2009).
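If you do add the binary search, here is a sketch of a lower-bound search for the first item carrying the prefix (FirstWithPrefix is my name, and it assumes the list is sorted with an ordinal, case-sensitive order matching CompareStr):
// Returns the index of the first item in the sorted List that starts with
// Prefix, or -1 if no item does (a classic lower-bound binary search).
function FirstWithPrefix(List: TStrings; const Prefix: string): Integer;
var
  Lo, Hi, Mid: Integer;
begin
  Result := -1;
  Lo := 0;
  Hi := List.Count - 1;
  while Lo <= Hi do
  begin
    Mid := (Lo + Hi) div 2;
    if CompareStr(Copy(List[Mid], 1, Length(Prefix)), Prefix) < 0 then
      Lo := Mid + 1
    else
    begin
      if Copy(List[Mid], 1, Length(Prefix)) = Prefix then
        Result := Mid;   // candidate; keep looking further left
      Hi := Mid - 1;
    end;
  end;
end;
From the returned index you copy forward while the prefix still matches and then break, exactly as in the linear version above.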
To stop running a loop, use the break command. Exit is also useful to leave an entire function, especially when you have multiple nested loops to escape. As a last resort, you can use goto to jump out of several nested loops and continue running in the same function.
If you use a while or repeat loop instead of a for loop, then you can include another conjunct in your stopping condition that you set mid-loop:
i := 0;
found := False;
while (i < count) and not found do begin
// search for items
found := True;
// more stuff
Inc(i);
end;
I have to check whether I have duplicate paths in a FileListBox (the FileListBox plays the role of a kind of job list or playlist).
Using Delphi's SameText, CompareStr, or CompareText takes 6 seconds. So I came up with my own compare function, which is (just) a bit faster but not fast enough. Any ideas how to improve it?
function SameFile(CONST Path1, Path2: string): Boolean;
VAR i: Integer;
begin
Result:= Length(Path1)= Length(Path2); { if they have different lengths then obviously they are not the same file }
if Result then
for i:= Length(Path1) downto 1 DO { start from the end because it is more likely to find the difference there }
if Path1[i]<> Path2[i] then
begin
Result:= FALSE;
Break;
end;
end;
I use it like this:
for x:= JList.Count-1 downto 1 DO
begin
sMaster:= JList.Items[x];
for y:= x-1 downto 0 DO
if SameFile(sMaster, JList.Items[y]) then
begin
JList.Items.Delete (x); { REMOVE DUPLICATES }
Break;
end;
end;
Note: The chance of having duplicates is small, so Delete is not called often. Also, the list cannot be sorted because the items are added by the user and sometimes the order may be important.
Update:
The thing is that I lose the advantage of my code because it is plain Pascal.
It would be nice if the comparison loop (Path1[i] <> Path2[i]) were optimized to use Borland's ASM code.
Delphi 7, Win XP 32 bit. Tests were done with 577 items in the list. Deleting items from the list IS NOT A PROBLEM because it happens rarely.
CONCLUSION
As Svein Bringsli pointed out, my code is slow not because of the comparison algorithm but because of TListBox. The BEST solution was provided by Marcelo Cantos. Thanks a lot, Marcelo.
I accepted Svein's answer because it directly answers my question "how do I make my comparison function faster" with "there is no point in making it faster".
For the moment I implemented the quick-and-dirty solution: when I have under 200 files, I use my slow code to check for duplicates. If there are more than 200 files I use dwrbudr's solution (which is damn fast), considering that if the user has that many files, the order is irrelevant anyway (the human brain cannot track that many items).
I want to thank you all for the ideas, and especially Svein for revealing the truth: (Borland's) visual controls are damn slow!
Don't waste time optimising the assembler. You can go from O(n^2) to O(n log(n)), bringing the time down to milliseconds, by sorting the list and then doing a linear scan for duplicates.
While you're at it, forget the SameFile function. The algorithmic improvement will dwarf anything you can achieve there.
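A sketch of that sort-plus-scan approach (CollectDuplicates is my name; it reports each duplicated string once and leaves Source untouched):
procedure CollectDuplicates(Source, Dupes: TStrings);
var
  Sorted: TStringList;
  i: Integer;
begin
  Sorted := TStringList.Create;
  try
    Sorted.Assign(Source);
    Sorted.Sort;                        // O(n log n)
    for i := 1 to Sorted.Count - 1 do   // linear scan: duplicates are now adjacent
      if (Sorted[i] = Sorted[i - 1]) and
         ((Dupes.Count = 0) or (Dupes[Dupes.Count - 1] <> Sorted[i])) then
        Dupes.Add(Sorted[i]);
  finally
    Sorted.Free;
  end;
end;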
Edit: Based on feedback in the comments...
You can perform an order-preserving O(n log(n)) de-duplication as follows:
Sort a copy of the list.
Identify and copy duplicated entries to a third list along with their duplication count minus one.
Walk the original list backwards as per your original version.
In the inner (for y := ...) loop, traverse the duplication list instead. If an outer item matches, delete it, decrement the duplication count, and delete the duplication entry if the count reaches zero.
This is obviously more complicated but it will still be orders of magnitude faster, even if you do horrible dirty things like storing duplication counts as strings, C:\path1\file1=2, and using code like:
y := dupes.IndexOfName(sMaster);
if y <> -1 then
begin
  JList.Items.Delete(x);
  c := StrToInt(dupes.ValueFromIndex[y]);
  if c > 1 then
    dupes.Values[sMaster] := IntToStr(c - 1)
  else
    dupes.Delete(y);
end;
Side note: A binary chop would be more efficient than the for y := ... loop, but given that duplicates are rare, the difference ought to be negligible.
Using your code as a starting point, I modified it to take a copy of the list before searching for duplicates. The time went from 5.5 seconds to about 0.5 seconds.
vSL := TStringList.Create;
try
  vSL.Assign(jList.Items);
  vSL.Sorted := true;
  for x := vSL.Count - 1 downto 1 DO
  begin
    sMaster := vSL[x];
    for y := x - 1 downto 0 DO
      if SameFile(sMaster, vSL[y]) then
      begin
        vSL.Delete(x); { REMOVE DUPLICATES }
        jList.Items.Delete(x);
        Break;
      end;
  end;
finally
  vSL.Free;
end;
Obviously, this is not a good way to do it, but it demonstrates that TFileListBox is in itself quite slow. I don't believe you can gain much by optimizing your compare function.
To demonstrate this, I replaced your SameFile function with the following, but kept the rest of your code:
function SameFile(CONST Path1, Path2: string): Boolean;
VAR i: Integer;
begin
Result := false; //Pretty darn fast code!!!
end;
The time went from 5.6 seconds to 5.5 seconds. I don't think there's much more to gain there :-)
Create another sorted list with sortedList.Duplicates := dupIgnore, add your strings to that list, and then copy them back.
vSL := TStringList.Create;
try
  vSL.Sorted := true;
  vSL.Duplicates := dupIgnore;
  for x := 0 to jList.Count - 1 do
    vSL.Add(jList[x]);
  jList.Clear;
  for x := 0 to vSL.Count - 1 do
    jList.Add(vSL[x]);
finally
  vSL.Free;
end;
The absolute fastest way, bar none (as alluded to before), is to use a routine that generates a unique 64/128/256-bit hash code for a string (I use the SHA256Managed class in C#). Run down the list of strings, generate the hash code for each string, and check for it in the sorted hash code list; if found, the string is a duplicate. Otherwise add the hash code to the sorted hash code list.
This will work for strings, file names, images (you can get the unique hash code for an image), etc., and I guarantee that this will be as fast or faster than any other implementation.
PS You can use a string list for the hash codes by representing the hash codes as strings. I've used a hex representation in the past (256 bits -> 64 characters) but in theory you can do it any way you like.
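A sketch of this idea in Delphi, assuming a recent version that ships the System.Hash unit (THashSHA2.GetHashString returns the 256-bit digest as a hex string):
uses Classes, SysUtils, System.Hash;

// Counts duplicates by comparing hash strings kept in a sorted list.
function CountDupesByHash(Source: TStrings): Integer;
var
  Hashes: TStringList;
  i, Dummy: Integer;
  H: string;
begin
  Result := 0;
  Hashes := TStringList.Create;
  try
    Hashes.Sorted := True;                      // keeps Find at O(log n)
    for i := 0 to Source.Count - 1 do
    begin
      H := THashSHA2.GetHashString(Source[i]);  // 64 hex characters
      if Hashes.Find(H, Dummy) then
        Inc(Result)                             // same hash: duplicate content
      else
        Hashes.Add(H);
    end;
  finally
    Hashes.Free;
  end;
end;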
4 seconds for how many calls? Great performance if you call it a billion times...
Anyway, does Length(Path1) get evaluated every time through the loop? If so, store that in an Integer variable prior to looping.
Pointers may yield some speed over the strings.
Try in-lining the function with:
function SameFile(blah blah): Boolean; Inline;
That will save some time, if this is being called thousands of times per second. I would start with that and see if it saves anything.
EDIT: I didn't realize that your list wasn't sorted. Obviously, you should do that first! Then you don't have to compare against every other item in the list - just the prior or next one.
I use a modified Ternary Search Tree (TST) to dedupe lists. You simply load the items into the tree, using the whole string as the key, and on each item you can get back an indication of whether the key is already there (and delete your visible entry). Then you throw away the tree. Our TST load function can typically load 100,000 80-byte items in well under a second, and it could not take any more than that to repaint your list, with proper use of begin- and end-update. The TST is memory-hungry, but not so that you would notice it at all if you only have on the order of 500 items. And it is much simpler than sorting, comparisons and assembler (if you have a suitable TST implementation, of course).
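A sketch of the idea; the node layout and names are mine, and freeing the tree is omitted for brevity:
type
  PTSTNode = ^TTSTNode;
  TTSTNode = record
    SplitChar: Char;
    Lo, Eq, Hi: PTSTNode;
    IsEnd: Boolean;                  // True when a complete key ends here
  end;

// Inserts Key[Index..] below Node; returns True if the key was already present.
function TSTInsert(var Node: PTSTNode; const Key: string; Index: Integer): Boolean;
begin
  if Node = nil then
  begin
    New(Node);
    Node^.SplitChar := Key[Index];
    Node^.Lo := nil;
    Node^.Eq := nil;
    Node^.Hi := nil;
    Node^.IsEnd := False;
  end;
  if Key[Index] < Node^.SplitChar then
    Result := TSTInsert(Node^.Lo, Key, Index)
  else if Key[Index] > Node^.SplitChar then
    Result := TSTInsert(Node^.Hi, Key, Index)
  else if Index < Length(Key) then
    Result := TSTInsert(Node^.Eq, Key, Index + 1)
  else
  begin
    Result := Node^.IsEnd;           // already there: duplicate
    Node^.IsEnd := True;
  end;
end;

procedure DedupeWithTST(List: TStrings);
var
  Root: PTSTNode;
  I: Integer;
begin
  Root := nil;
  I := 0;
  while I < List.Count do
    if (List[I] <> '') and TSTInsert(Root, List[I], 1) then
      List.Delete(I)                 // duplicate: remove the visible entry
    else
      Inc(I);                        // first occurrence (empty strings are kept): keep it
  // Freeing the tree (walk it and Dispose each node) is left out for brevity.
end;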
No need to use a hash table; a single sorted list gives me a result of 10 milliseconds, that's 0.01 seconds, which is about 500 times faster! Here is my test code using a TListBox:
procedure TForm1.Button1Click(Sender: TObject);
var
  lIndex1: Integer;
  lString: string;
  lIndex2: Integer;
  lStrings: TStringList;
  lCount: Integer;
  lItems: TStrings;
begin
  ListBox1.Clear;
  for lIndex1 := 1 to 577 do begin
    lString := '';
    for lIndex2 := 1 to 100 do
      if (lIndex2 mod 6) = 0 then
        lString := lString + Chr(Ord('a') + Random(2))
      else
        lString := lString + 'a';
    ListBox1.Items.Add(lString);
  end;
  CsiGlobals.AddLogMsg('Start', 'Test', llBrief);
  lStrings := TStringList.Create;
  try
    lStrings.Sorted := True;
    lCount := 0;
    lItems := ListBox1.Items;
    with lItems do begin
      BeginUpdate;
      try
        for lIndex1 := Count - 1 downto 0 do begin
          lStrings.Add(Strings[lIndex1]);
          if lStrings.Count = lCount then
            Delete(lIndex1)
          else
            Inc(lCount);
        end;
      finally
        EndUpdate;
      end;
    end;
  finally
    lStrings.Free;
  end;
  CsiGlobals.AddLogMsg('Stop', 'Test', llBrief);
end;
I'd also like to point out that your solution would take an extreme amount of time if applied to a huge list (containing, say, 100,000,000 items or more). Even constructing a hash table or sorted list would take too much time.
In cases like that you could try another approach: hash each member, but instead of populating a full-blown hash table, create a bitset (large enough to hold a comfortable multiple of the number of input items in slots) and just set the bit at the offset indicated by the hash function. If the bit was 0, change it to 1. If it was already 1, take note of the offending string index in a separate list and continue. This results in a list of string indexes that had a hash collision, so you'll have to run it a second time to find the first cause of those collisions. After that, you should sort and de-dupe the string indexes in this list (as all indexes apart from the first one will be present twice). Once that's done, sort the list again, but this time on the string contents, in order to easily spot the duplicates in a following single scan.
Granted, it could be a bit extreme to go to all this length, but at least it's a workable solution for very large volumes! (Oh, and this still won't work if the number of duplicates is very high, when the hash function has a bad spread, or when the number of slots in the 'hashtable' bitset is chosen too small, which would give many collisions that aren't really duplicates.)
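A sketch of the first pass, using Classes.TBits and a simple FNV-1a hash (both the hash choice and all names are mine):
uses Classes, SysUtils;

// FNV-1a; any hash with a decent spread would do.
function Fnv1a(const S: string): Cardinal;
var
  I: Integer;
begin
  Result := $811C9DC5;
  for I := 1 to Length(S) do
    Result := (Result xor Ord(S[I])) * 16777619;  // wraps; assumes overflow checks off
end;

// First pass: marks hash slots and records the indexes of suspect strings.
procedure FindSuspectIndexes(List: TStrings; Suspects: TList);
var
  Seen: TBits;
  I, Slot: Integer;
begin
  Seen := TBits.Create;
  try
    Seen.Size := List.Count * 8;  // several slots per item keeps collisions rare
    for I := 0 to List.Count - 1 do
    begin
      Slot := Integer(Fnv1a(List[I]) mod Cardinal(Seen.Size));
      if Seen.Bits[Slot] then
        Suspects.Add(Pointer(I))  // duplicate or mere hash collision: verify in pass two
      else
        Seen.Bits[Slot] := True;
    end;
  finally
    Seen.Free;
  end;
end;
The second pass over Suspects then does the exact-string verification described above.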