What is the fastest way to find duplicates in a TStringList? I receive the data I need to search for duplicates in a TStringList. My current idea goes like this:
var TestStringList, DataStringList : TStringList;
for i := 0 to DataStringList.Count - 1 do
begin
  if TestStringList.IndexOf(DataStringList[i]) < 0 then
  begin
    TestStringList.Add(DataStringList[i])
  end
  else
  begin
    Memo1.Lines.Add('duplicate item found');
  end;
end;
....
Just for completeness (and because your code doesn't actually use the duplicate, but just indicates one has been found): Delphi's TStringList has the built-in ability to deal with duplicate entries, in its Duplicates property. Setting it to dupIgnore will simply discard any duplicates you attempt to add. Note that the destination list has to be sorted, or Duplicates has no effect.
TestStringList.Sorted := True;
TestStringList.Duplicates := dupIgnore;
for i := 0 to DataStringList.Count - 1 do
  TestStringList.Add(DataStringList[i]);
Memo1.Lines.Add(Format('%d duplicates discarded',
[DataStringList.Count - TestStringList.Count]));
A quick test shows that the entire loop can be removed if you use Sorted and Duplicates:
TestStringList.Sorted := True;
TestStringList.Duplicates := dupIgnore;
TestStringList.AddStrings(DataStringList);
Memo1.Lines.Add(Format('%d duplicates discarded',
[DataStringList.Count - TestStringList.Count]));
See the TStringList.Duplicates documentation for more info.
I think that you are looking for duplicates. If so, then do the following:
Case 1: The string list is ordered
In this scenario, duplicates must appear at adjacent indices. In which case you simply loop from 1 to Count-1 and check whether or not the element at index i is the same as that at index i-1.
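For example, a minimal sketch of Case 1 (assuming List is the sorted TStringList, and a TMemo for output as in the question):
for i := 1 to List.Count - 1 do
  if List[i] = List[i - 1] then
    Memo1.Lines.Add('duplicate item found: ' + List[i]);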
Case 2: The string list is not ordered
In this scenario we need a double for loop. It looks like this:
for i := 0 to List.Count - 1 do
  for j := i + 1 to List.Count - 1 do
    if List[i] = List[j] then
      // duplicate found
There are performance considerations. If the list is ordered the search is O(N). If the list is not ordered the search is O(N²). Clearly the former is preferable. Since a list can be sorted with complexity O(N log N), if performance becomes a factor then it will be advantageous to sort the list before searching for duplicates.
Judging by the use of IndexOf, you use an unsorted list. The scaling factor of your algorithm is then n². That is slow. You can optimize it as David showed, by limiting the search area of the inner search, and then the average factor would be n²/2 - but that still scales badly.
Note: the scaling factor here makes sense for limited workloads, say a dozen or hundreds of strings per list. For larger data sets the asymptotic O(...) measure suits better. However, finding the O-measures for QuickSort and for hash lists is a trivial task.
Option 1: Sort the list. Using quick-sort it would have scaling factor n + n*log(n), or O(n*log(n)) for large loads.
Set Duplicates to dupAccept
Set Sorted to True
Iterate the sorted list and check whether the next string exists and is the same
http://docwiki.embarcadero.com/Libraries/XE3/en/System.Classes.TStringList.Duplicates
http://docwiki.embarcadero.com/Libraries/XE3/en/System.Classes.TStringList.Sorted
Option 2: use hashed list helper. In modern Delphi that would be TDictionary<String,Boolean>, in older Delphi there is a class used by TMemIniFile
You iterate your stringlist and check whether each string was already added to the helper collection (a sketch follows the links below).
If it was not - you add it with a "false" value.
If it was - you switch the value to "true".
The scaling factor would be a constant for small data chunks and O(1) for large ones - see http://docwiki.embarcadero.com/Libraries/XE2/en/System.Generics.Collections.TDictionary.ContainsKey
For older Delphi you can use THashedStringList in a similar pattern (thanks @FreeConsulting)
http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/IniFiles_THashedStringList_IndexOf.html
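A minimal sketch of the dictionary pattern described above, assuming a Delphi with Generics.Collections (DataStringList is from the question; Seen is an illustrative name):
var
  Seen: TDictionary<string, Boolean>;
  S: string;
begin
  Seen := TDictionary<string, Boolean>.Create;
  try
    for S in DataStringList do
      if Seen.ContainsKey(S) then
        Seen[S] := True      // seen before: mark as duplicated
      else
        Seen.Add(S, False);  // first occurrence
    // every key left with a True value occurred more than once
  finally
    Seen.Free;
  end;
end;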
Unfortunately it is unclear what you want to do with the duplicates. Your else clause suggests you just want to know whether there is one (or more) duplicate(s). Although that could be the end goal, I assume you want more.
Extracting duplicates
The previously given answers delete or count the duplicate items. Here is an answer for keeping them.
procedure ExtractDuplicates1(List1, List2: TStringList; Dupes: TStrings);
var
  Both: TStringList;
  I: Integer;
begin
  Both := TStringList.Create;
  try
    Both.Sorted := True;
    Both.Duplicates := dupAccept;
    Both.AddStrings(List1);
    Both.AddStrings(List2);
    for I := 0 to Both.Count - 2 do
      if (Both[I] = Both[I + 1]) then
        if (Dupes.Count = 0) or (Dupes[Dupes.Count - 1] <> Both[I]) then
          Dupes.Add(Both[I]);
  finally
    Both.Free;
  end;
end;
Performance
The following alternatives were tried in order to compare the performance of the above routine.
procedure ExtractDuplicates2(List1, List2: TStringList; Dupes: TStrings);
var
  Both: TStringList;
  I: Integer;
begin
  Both := TStringList.Create;
  try
    Both.AddStrings(List1);
    Both.AddStrings(List2);
    Both.Sort;
    for I := 0 to Both.Count - 2 do
      if (Both[I] = Both[I + 1]) then
        if (Dupes.Count = 0) or (Dupes[Dupes.Count - 1] <> Both[I]) then
          Dupes.Add(Both[I]);
  finally
    Both.Free;
  end;
end;
procedure ExtractDuplicates3(List1, List2, Dupes: TStringList);
var
  I: Integer;
begin
  Dupes.Sorted := True;
  Dupes.Duplicates := dupAccept;
  Dupes.AddStrings(List1);
  Dupes.AddStrings(List2);
  for I := Dupes.Count - 1 downto 1 do
    if (Dupes[I] <> Dupes[I - 1]) or (I > 1) and (Dupes[I] = Dupes[I - 2]) then
      Dupes.Delete(I);
  if (Dupes.Count > 1) and (Dupes[0] <> Dupes[1]) then
    Dupes.Delete(0);
  while (Dupes.Count > 1) and (Dupes[0] = Dupes[1]) do
    Dupes.Delete(0);
end;
Although ExtractDuplicates3 performs marginally better, I prefer ExtractDuplicates1 because it reads better and the TStrings parameter provides more usability. ExtractDuplicates2 performs noticeably worse, which demonstrates that sorting all items afterwards in a single run takes more time than keeping the list sorted with every single item added.
Note
This answer is part of this recent answer for which I was about to ask the same question: "how to keep duplicates?". I didn't, but if anyone knows or finds a better solution, please comment, add or update this answer.
This is an old thread but I thought this solution may be useful.
An option is to pump the values from one string list to another with TestStringList.Duplicates := dupError set, and then trap the exception.
var TestStringList, DataStringList : TStringList;
TestStringList.Sorted := True;
TestStringList.Duplicates := dupError;
for i := 0 to DataStringList.Count - 1 do
begin
  try
    TestStringList.Add(DataStringList[i])
  except
    on E : EStringListError do begin
      Memo1.Lines.Add('duplicate item found');
    end;
  end;
end;
....
Just note that the trapping of the exception also masks the following errors:
there is not enough memory to expand the list, the list tried to grow beyond its maximal capacity, or a non-existent element of the list was referenced (i.e. the list index was out of bounds).
function TestDuplicates(const dataStrList: TStringList): integer;
var
  it: integer;
begin
  Result := 0;
  with TStringList.Create do
  try
    {Duplicates := dupIgnore;}
    for it := 0 to dataStrList.Count - 1 do begin
      if IndexOf(dataStrList[it]) < 0 then
        Add(dataStrList[it])
      else
        inc(Result)
    end;
  finally
    Free;
  end;
end;
Related
Today I ran into a very strange bug.
I have the following code:
var i: integer;
...
for i := 0 to FileNames.Count - 1 do
begin
ShowMessage(IntToStr(i) + ' from ' + IntToStr(FileNames.Count - 1));
FileName := FileNames[i];
...
end;
ShowMessage('all');
The FileNames list has one element, so I expect the loop to be executed once and to see
0 from 0
all
It is a thing I have done thousands of times :).
But in this case I see a second loop iteration when code optimization is switched on.
0 from 0
1 from 0
all
Without code optimization the loop iterates correctly.
For the moment I don't even know which direction to take with this issue (and the upper loop bound stays unchanged, yes).
So any suggestion will be very helpful. Thanks.
I use Delphi 2005 (Upd2) compiler.
Considering the QC report referred to by LU RD, and my own experience with D2005, here are a few workarounds. I recall using the while loop solution myself.
1. Rewrite the for loop as a while loop
var
  i: integer;
begin
  i := 0;
  while i < FileNames.Count do
  begin
    ...
    inc(i);
  end;
end;
2. Leave the for loop control variable alone from any other processing, and use a separate variable, which you increment in the loop, for string manipulation and FileNames indexing.
var
  ctrl, indx: integer;
begin
  indx := 0;
  for ctrl := 0 to FileNames.Count - 1 do
  begin
    // use indx for string manipulation and FileNames indexing
    inc(indx);
  end;
end;
3. You hinted at a workaround yourself in saying "Without code optimization the loop iterates correctly."
Assuming you have optimization on, turn it off ({$O-}) before the procedure/function and back on ({$O+}) after. Note! The optimization directive can only be used around at least whole procedures/functions.
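For example (a sketch; ProcessFileNames is a placeholder name for the routine containing the affected loop):
{$O-} // optimization off for the routine below only
procedure ProcessFileNames;
begin
  // ... the affected for loop goes here ...
end;
{$O+} // optimization back on for the rest of the unit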
OK, it seems I have solved the problem and can explain it.
Unfortunately, I cannot build a test that reproduces the bug, and I cannot show the real code, which is under NDA. So I must use a simplified example again.
The problem is in a DLL which is used by my code. Consider the following data structure:
type
  TData = packed record
    Count: integer;
  end;
  TPData = ^TData;
and a function which is defined in the DLL:
Calc: function(Data: TPData): integer; stdcall;
In my code I try to process data records which are taken from a list (TList):
var
  i: integer;
  Data: TData;
begin
  for i := 0 to List.Count - 1 do
  begin
    Data := TPData(List[i])^;
    Calc(@Data);
  end;
and in the case when optimization is on, I see a second iteration in the loop from 0 to 0.
If I rewrite the code as
var
  i: integer;
  Data, Data2: TData;
begin
  for i := 0 to List.Count - 1 do
  begin
    Data := TPData(List[i])^;
    Data2 := TPData(List[i])^;
    Calc(@Data2);
  end;
all works as expected.
The DLL itself was developed by another programmer, so I asked him to take care of it.
What was unexpected for me is that a local procedure's stack can be corrupted in such an unusual way, without access violations or other similar errors. BTW, the Data and Data2 variables contain correct values.
Maybe my experience will be useful to someone. Thanks to all who helped me, and please excuse my inadvertent mistakes.
I wrote this function to remove duplicates from a TList descendant. Now I was wondering if this could give me problems in certain conditions, and how it does performance-wise.
It seems to work with object pointers.
function TListClass.RemoveDups: integer;
var
  total, i, j: integer;
begin
  total := 0;
  i := 0;
  while i < count do begin
    j := i + 1;
    while j < count do begin
      if items[i] = items[j] then begin
        remove(items[j]);
        inc(total);
      end
      else
        inc(j);
    end;
    inc(i);
  end;
  result := total;
end;
Update:
Does this work faster?
function TDrawObjectList.RemoveDups: integer;
var
  total, i: integer;
  templist: TList;
begin
  templist := TList.Create;
  total := 0;
  i := 0;
  while i < count do
    if templist.IndexOf(items[i]) = -1 then begin
      templist.Add(items[i]);
      inc(i);
    end else begin
      remove(items[i]);
      inc(total);
    end;
  result := total;
  templist.Free;
end;
You do need another List.
As noted, the solution is O(N²), which makes it really slow on a big set of items (1000s), but as long as the count stays low it's the best bet because of its simplicity and ease of implementation, whereas pre-sorted and other solutions need more code and are more prone to implementation errors.
This may be the same code written in a different, more compact form. It runs through all elements of the list, and for each one removes the duplicates to the right of the current element. Removal is safe as long as it's done in a reverse loop.
function TListClass.RemoveDups: Integer;
var
  I, K: Integer;
begin
  Result := 0;
  for I := 0 to Count - 1 do            // compare to everything on the right
    for K := Count - 1 downto I + 1 do  // reverse loop allows removing items safely
      if Items[K] = Items[I] then
      begin
        Remove(Items[K]);
        Inc(Result);
      end;
end;
I would suggest leaving optimizations to a later time, in case you really end up with a 5000-item list. Also, as noted above, if you check for duplicates while adding items to the list, you can save on the following (a sketch follows this list):
The check for duplicates gets distributed in time, so it won't be as noticeable to the user
You can hope to quit early if a dupe is found
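For instance, a hypothetical AddUnique helper on the list class could centralize that check-on-add (a sketch, not part of the original code):
function TListClass.AddUnique(Item: Pointer): Integer;
begin
  // Only add the item if it is not in the list yet; return its index.
  Result := IndexOf(Item);
  if Result = -1 then
    Result := Add(Item);
end;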
Just hypothetical:
Interfaces
If you have interfaced objects in a TInterfaceList that are only in that list, you could check the reference count of each object. Just loop through the list backwards and delete all objects with a refcount > 1.
Custom counter
If you can edit these objects, you could do the same without interfaces. Increment a counter on the object when they are added to the list and decrease it when they are removed.
Of course, this only works if you can actually add a counter to these objects, but the boundaries weren't exactly clear in your question, so I don't know if this is allowed.
The advantage is that you don't need to look for other items, neither when inserting nor when removing duplicates. Finding a duplicate in a sorted list could be faster (as mentioned in the comments), but not having to search at all will beat even the fastest lookup.
How would I read data from a text file into two arrays? One being string and the other integer?
The text file has a layout like this:
Hello
1
Test
2
Bye
3
Each number corresponds to the text above it. Can anyone perhaps help me? Would greatly appreciate it
var
  Items: TStringList;
  Strings: array of string;
  Integers: array of Integer;
  i, Count: Integer;
begin
  Items := TStringList.Create;
  try
    Items.LoadFromFile('c:\YourFileName.txt');
    // Production code should check that Items.Count is even at this point.
    // Set the array sizes once, because we already know them: growing the
    // arrays inside the loop would cause many reallocations and memory
    // fragmentation.
    Count := Items.Count div 2;
    SetLength(Strings, Count);
    SetLength(Integers, Count);
    for i := 0 to Count - 1 do
    begin
      Strings[i] := Items[i * 2];
      Integers[i] := StrToInt(Items[i * 2 + 1]);
    end;
  finally
    Items.Free;
  end;
end.
I would read the file into a string list and then process it item by item. The even ones are put into the list of strings, and the odd ones go into the numbers.
var
  lines, strings, numbers: TStringList;
...
//create the lists ("file" is a reserved word in Delphi, hence "lines")
lines.LoadFromFile(filename);
Assert(lines.Count mod 2 = 0);
for i := 0 to lines.Count - 1 do
  if i mod 2 = 0 then
    strings.Add(lines[i])
  else
    numbers.Add(lines[i]);
I'd probably use some helper functions called odd and even in my own code.
If you wanted the numbers in a list of integers, rather than a string list, then you would use TList<Integer> and add StrToInt(lines[i]) on the odd iterations.
I've used lists rather than dynamic arrays for the ease of writing this code, but GolezTrol shows you how to do it with dynamic arrays if that's what you prefer.
That said, since you state that the number is associated with the string, you may actually be better off with something like this:
type
TNameAndID = record
Name: string;
ID: Integer;
end;
var
List: TList<TNameAndID>;
Item: TNameAndID;
...
List := TList<TNameAndID>.Create;
lines.LoadFromFile(filename);
Assert(lines.Count mod 2 = 0);
for i := 0 to lines.Count - 1 do begin
  if i mod 2 = 0 then begin
    Item.Name := lines[i];
  end else begin
    Item.ID := StrToInt(lines[i]);
    List.Add(Item);
  end;
end;
The advantage of this approach is that you now have assurance that the association between name and ID will be maintained. Should you ever wish to sort, insert or remove items then you will find the above structure much more convenient than two parallel arrays.
I have to check if I have duplicate paths in a FileListBox (FileListBox has the role of some kind of job list or play list).
Using Delphi's SameText, CompareStr, or CompareText takes 6 seconds. So I came up with my own compare function, which is (just) a bit faster but not fast enough. Any ideas how to improve it?
function SameFile(CONST Path1, Path2: string): Boolean;
VAR i: Integer;
begin
  Result := Length(Path1) = Length(Path2); { if they have different lengths then obviously they are not the same file }
  if Result then
    for i := Length(Path1) downto 1 DO { start from the end because it is more likely to find the difference there }
      if Path1[i] <> Path2[i] then
      begin
        Result := FALSE;
        Break;
      end;
end;
I use it like this:
for x := JList.Count - 1 downto 1 DO
begin
  sMaster := JList.Items[x];
  for y := x - 1 downto 0 DO
    if SameFile(sMaster, JList.Items[y]) then
    begin
      JList.Items.Delete(x); { REMOVE DUPLICATES }
      Break;
    end;
end;
Note: The chance of having duplicates is small so Delete is not called often. Also the list cannot be sorted because the items are added by user and sometimes the order may be important.
Update:
The thing is that I lose the advantage of my code because it is Pascal.
It would be nice if the comparison loop (Path1[i] <> Path2[i]) were optimized to use Borland's ASM code.
Delphi 7, Win XP 32 bit, Tests were done with 577 items in the list. Deleting the items from list IS NOT A PROBLEM because it happens rarely.
CONCLUSION
As Svein Bringsli pointed out, my code is slow not because of the comparing algorithm but because of TListBox. The BEST solution was provided by Marcelo Cantos. Thanks a lot, Marcelo.
I accepted Svein's answer because it directly answers my question "how to make my comparison function faster" with "there is no point in making it faster".
For the moment I implemented the dirty and quick to implement solution: when I have under 200 files, I use my slow code to check the duplicates. If there are more than 200 files I use dwrbudr's solution (which is damn fast) considering that if the user has so many files, the order is irrelevant anyway (human brain cannot track so many items).
I want to thank you all for ideas and especially Svein for revealing the truth: (Borland's) visual controls are damn slow!
Don't waste time optimising the assembler. You can go from O(n²) to O(n log(n)) — bringing the time down to milliseconds — by sorting the list and then doing a linear scan for duplicates.
While you're at it, forget the SameFile function. The algorithmic improvement will dwarf anything you can achieve there.
Edit: Based on feedback in the comments...
You can perform an order-preserving O(n log(n)) de-duplication as follows:
Sort a copy of the list.
Identify and copy duplicated entries to a third list along with their duplication count minus one.
Walk the original list backwards as per your original version.
In the inner (for y := ...) loop, traverse the duplication list instead. If an outer item matches, delete it, decrement the duplication count, and delete the duplication entry if the count reaches zero.
This is obviously more complicated but it will still be orders of magnitude faster, even if you do horrible dirty things like storing duplication counts as strings, C:\path1\file1=2, and using code like:
y := dupes.IndexOfName(sMaster);
if y <> -1 then
begin
  JList.Items.Delete(x);
  c := StrToInt(dupes.ValueFromIndex[y]);
  if c > 1 then
    dupes.Values[sMaster] := IntToStr(c - 1)
  else
    dupes.Delete(y);
end;
Side note: A binary chop would be more efficient than the for y := ... loop, but given that duplicates are rare, the difference ought to be negligible.
Using your code as a starting point, I modified it to take a copy of the list before searching for duplicates. The time went from 5,5 seconds to about 0,5 seconds.
vSL := TStringList.Create;
try
  vSL.Assign(jList.Items);
  vSL.Sorted := true;
  for x := vSL.Count - 1 downto 1 DO
  begin
    sMaster := vSL[x];
    for y := x - 1 downto 0 DO
      if SameFile(sMaster, vSL[y]) then
      begin
        vSL.Delete(x); { REMOVE DUPLICATES }
        jList.Items.Delete(x);
        Break;
      end;
  end;
finally
  vSL.Free;
end;
Obviously, this is not a good way to do it, but it demonstrates that TFileListBox is in itself quite slow. I don't believe you can gain much by optimizing your compare-function.
To demonstrate this, I replaced your SameFile function with the following, but kept the rest of your code:
function SameFile(CONST Path1, Path2: string): Boolean;
VAR i: Integer;
begin
  Result := false; //Pretty darn fast code!!!
end;
The time went from 5,6 seconds to 5,5 seconds. I don't think there's much more to gain there :-)
Create another sorted list with sortedList.Duplicates := dupIgnore and add your strings to that list, then back.
vSL := TStringList.Create;
try
  vSL.Sorted := true;
  vSL.Duplicates := dupIgnore;
  for x := 0 to jList.Count - 1 do
    vSL.Add(jList[x]);
  jList.Clear;
  for x := 0 to vSL.Count - 1 do
    jList.Add(vSL[x]);
finally
  vSL.Free;
end;
The absolute fastest way, bar none (as alluded to before), is to use a routine that generates a unique 64/128/256-bit hash code for a string (I use the SHA256Managed class in C#). Run down the list of strings, generate the hash code for each string, and check for it in the sorted hash-code list; if found, the string is a duplicate. Otherwise add the hash code to the sorted hash-code list.
This will work for strings, file names, images (you can get the unique hash code for an image), etc., and I guarantee that this will be as fast as or faster than any other implementation.
PS You can use a string list for the hash codes by representing the hash codes as strings. I've used a hex representation in the past (256 bits -> 64 characters) but in theory you can do it any way you like.
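For reference, a rough Delphi sketch of the same idea, assuming a modern Delphi with the System.Hash unit (HashCodes is a sorted TStringList owned by the caller; the names are illustrative):
uses System.Hash;

function IsDuplicate(HashCodes: TStringList; const S: string): Boolean;
var
  Hex: string;
  Idx: Integer;
begin
  Hex := THashSHA2.GetHashString(S);  // 256-bit hash as a hex string
  Result := HashCodes.Find(Hex, Idx); // binary search in the sorted list
  if not Result then
    HashCodes.Add(Hex);               // remember this hash for later checks
end;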
4 seconds for how many calls? Great performance if you call it a billion times...
Anyway, does Length(Path1) get evaluated every time through the loop? If so, store that in an Integer variable prior to looping.
Pointers may yield some speed over the strings.
Try in-lining the function with:
function SameFile(blah blah): Boolean; Inline;
That will save some time, if this is being called thousands of times per second. I would start with that and see if it saves anything.
EDIT: I didn't realize that your list wasn't sorted. Obviously, you should do that first! Then you don't have to compare against every other item in the list - just the prior or next one.
I use a modified Ternary Search Tree (TST) to dedupe lists. You simply load the items into the tree, using the whole string as the key, and on each item you can get back an indication if the key is already there (and delete your visible entry). Then you throw away the tree. Our TST load function can typically load 100000 80-byte items in well under a second. And it could not take any more than this to repaint your list, with proper use of begin- and end-update. The TST is memory-hungry, but not so that you would notice it at all if you only have of the order of 500 items. And much simpler than sorting, comparisons and assembler (if you have a suitable TST implementation, of course).
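The TST itself is not shown in the answer; purely as an illustration, here is a minimal sketch of a TST insert that reports whether the key was already present (the node type and names are assumptions, and Key must be non-empty):
type
  PTSTNode = ^TTSTNode;
  TTSTNode = record
    SplitChar: Char;
    Lo, Eq, Hi: PTSTNode;
    IsEnd: Boolean; // True if some key ends at this node
  end;

// Returns True if Key was already in the tree (i.e. it is a duplicate).
function TSTInsert(var Node: PTSTNode; const Key: string; Pos: Integer): Boolean;
begin
  if Node = nil then
  begin
    New(Node);
    Node^.SplitChar := Key[Pos];
    Node^.Lo := nil; Node^.Eq := nil; Node^.Hi := nil;
    Node^.IsEnd := False;
  end;
  if Key[Pos] < Node^.SplitChar then
    Result := TSTInsert(Node^.Lo, Key, Pos)
  else if Key[Pos] > Node^.SplitChar then
    Result := TSTInsert(Node^.Hi, Key, Pos)
  else if Pos < Length(Key) then
    Result := TSTInsert(Node^.Eq, Key, Pos + 1)
  else
  begin
    Result := Node^.IsEnd; // was this exact string inserted before?
    Node^.IsEnd := True;
  end;
end;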
No need to use a hash table, a single sorted list gives me a result of 10 milliseconds, that's 0.01 seconds, which is about 500 times faster! Here is my test code using a TListBox:
procedure TForm1.Button1Click(Sender: TObject);
var
  lIndex1: Integer;
  lString: string;
  lIndex2: Integer;
  lStrings: TStringList;
  lCount: Integer;
  lItems: TStrings;
begin
  ListBox1.Clear;
  for lIndex1 := 1 to 577 do begin
    lString := '';
    for lIndex2 := 1 to 100 do
      if (lIndex2 mod 6) = 0 then
        lString := lString + Chr(Ord('a') + Random(2))
      else
        lString := lString + 'a';
    ListBox1.Items.Add(lString);
  end;
  CsiGlobals.AddLogMsg('Start', 'Test', llBrief);
  lStrings := TStringList.Create;
  try
    lStrings.Sorted := True;
    lCount := 0;
    lItems := ListBox1.Items;
    with lItems do begin
      BeginUpdate;
      try
        for lIndex1 := Count - 1 downto 0 do begin
          lStrings.Add(Strings[lIndex1]);
          if lStrings.Count = lCount then
            Delete(lIndex1)
          else
            Inc(lCount);
        end;
      finally
        EndUpdate;
      end;
    end;
  finally
    lStrings.Free;
  end;
  CsiGlobals.AddLogMsg('Stop', 'Test', llBrief);
end;
I'd also like to point out that your solution would take an extreme amount of time if applied to a huge list (like containing 100,000,000 items or more). Even constructing a hashtable or sorted list would take too much time.
In cases like that you could try another approach: hash each member, but instead of populating a full-blown hash table, create a bitset (with a close factor to as many slots as there are input items) and just set the bit at the offset indicated by the hash function. If the bit was 0, change it to 1. If it was already 1, take note of the offending string index in a separate list and continue. This results in a list of string indexes that had a collision in the hash, so you'll have to run it a second time to find the first cause of those collisions. After that, you should sort and de-dupe the string indexes in this list (as all indexes apart from the first one will be present twice). Once that's done, sort the list again, but this time on the string contents, in order to easily spot the duplicates in a following single scan.
Granted, it could be a bit extreme to go to all this length, but at least it's a workable solution for very large volumes! (Oh, and this still won't work if the number of duplicates is very high, when the hash function has a bad spread, or when the number of slots in the 'hashtable' bitset is chosen too small - which would give many collisions which aren't really duplicates.)
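Purely as an illustration, here is a sketch of the first pass; the names and the hash choice are assumptions (a modern Delphi where System.Hash provides THashBobJenkins, with TBits from System.Classes as the bitset):
uses System.Classes, System.Hash;

// First pass: set one bit per hash slot. Indexes whose bit was already
// set are only *candidate* duplicates (hash collisions are possible)
// and must be verified by the later passes described above.
procedure CollectCandidates(Items: TStrings; CandidateIndexes: TList);
var
  Bits: TBits;
  I, Slot: Integer;
begin
  if Items.Count = 0 then Exit;
  Bits := TBits.Create;
  try
    Bits.Size := Items.Count * 8; // spread factor: more slots, fewer false collisions
    for I := 0 to Items.Count - 1 do
    begin
      Slot := Integer(Cardinal(THashBobJenkins.GetHashValue(Items[I])) mod Cardinal(Bits.Size));
      if Bits[Slot] then
        CandidateIndexes.Add(Pointer(I)) // possible duplicate: verify later
      else
        Bits[Slot] := True;
    end;
  finally
    Bits.Free;
  end;
end;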
I implemented language translation in an application by putting all strings at runtime in a TStringList with:
procedure PopulateStringList;
begin
  EnglishStringList.Append('CAN_T_FIND_FILE=It is not possible to find the file');
  EnglishStringList.Append('DUMMY=Just a dummy record');
  // total of 2000 records appended in the same way
  EnglishStringList.Sorted := True; // Updated comment: this is USELESS!
end;
Then I get the translation using:
function GetTranslation(ResStr: String): String;
var
  iIndex: Integer;
begin
  iIndex := EnglishStringList.IndexOfName(ResStr);
  if iIndex >= 0 then
    Result := EnglishStringList.ValueFromIndex[iIndex]
  else
    Result := ResStr + ' (Translation N/A)';
end;
Anyway with this approach it takes about 30 microseconds to locate a record, is there a better way to achieve the same result?
UPDATE: For future reference I write here the new implementation that uses TDictionary as suggested (works with Delphi 2009 and newer):
procedure PopulateStringList;
begin
  EnglishDictionary := TDictionary<String, String>.Create;
  EnglishDictionary.Add('CAN_T_FIND_FILE', 'It is not possible to find the file');
  EnglishDictionary.Add('DUMMY', 'Just a dummy record');
  // total of 2000 records added in the same way
end;
function GetTranslation(ResStr: String): String;
var
  ValueFound: Boolean;
begin
  ValueFound := EnglishDictionary.TryGetValue(ResStr, Result);
  if not ValueFound then
    Result := ResStr + ' (Trans N/A)';
end;
The new GetTranslation function performs 1000 times faster (on my 2000 sample records) than the first version.
THashedStringList should be better, I think.
In Delphi 2009 or later I would use TDictionary<string, string> from Generics.Collections.
Also note that there are free tools such as http://dxgettext.po.dk/ for translating applications.
If THashedStringList works for you, that's great. Its biggest weakness is that every time you change the contents of the list, the Hash table is rebuilt. So it will work for you as long as your list remains small or doesn't change very often.
For more info on this, see: THashedStringList weakness, which gives a few alternatives.
If you have a big list that may be updated, you might want to try GpStringHash by gabr, that doesn't have to recompute the whole table at every change.
I think that you don't use EnglishStringList (a TStringList) correctly. It is a sorted list: you add elements (strings) and sort the list, but when you search, you do so by a partial string (only the name, with IndexOfName).
If you use IndexOfName on a sorted list, TStringList can't use a dichotomic (binary) search; it uses a sequential search.
(this is the implementation of IndexOfName)
for Result := 0 to GetCount - 1 do
begin
  S := Get(Result);
  P := AnsiPos('=', S);
  if (P <> 0) and (CompareStrings(Copy(S, 1, P - 1), Name) = 0) then Exit;
end;
I think that this is the reason for the poor performance.
The alternative is to use 2 TStringLists:
* The first (sorted) contains only the "Name" and a pointer to the second list that contains the value; you can implement this pointer to the second list using the Objects property.
* The second (not sorted) list contains the values.
When you search, you do it in the first list; in this case you can use the Find method. When you find the name, the pointer (implemented with the Objects property) gives you the position of the value in the second list.
In this case, the Find method on a sorted list is more efficient than a hash list (which must execute a function to get the position of a value). A sketch follows below.
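A minimal sketch of the two-list idea, storing the value-list index in the Objects property (Names and Values are illustrative TStringList variables; Names.Sorted is True):
// Population: Values keeps insertion order, Names stays sorted.
Names.AddObject('CAN_T_FIND_FILE',
  TObject(Values.Add('It is not possible to find the file')));

// Lookup via binary search on the sorted name list:
function GetTranslation2(const ResStr: string): string;
var
  I: Integer;
begin
  if Names.Find(ResStr, I) then
    Result := Values[Integer(Names.Objects[I])]
  else
    Result := ResStr + ' (Translation N/A)';
end;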
Regards.
P.S.: Excuse my mistakes with English.
You can also use a CLASS HELPER to re-program the "IndexOfName" function:
TYPE
  TStringsHelper = CLASS HELPER FOR TStrings
    FUNCTION IndexOfName(CONST Name : STRING) : INTEGER;
  END;

FUNCTION TStringsHelper.IndexOfName(CONST Name : STRING) : INTEGER;
VAR
  SL  : TStringList ABSOLUTE Self;
  S,T : STRING;
  I   : INTEGER;
BEGIN
  IF (Self IS TStringList) AND SL.Sorted THEN BEGIN
    S := Name + NameValueSeparator;
    IF SL.Find(S, I) THEN
      Result := I
    ELSE IF (I < 0) OR (I >= Count) THEN
      Result := -1
    ELSE BEGIN
      T := SL[I];
      IF CompareStrings(COPY(T, 1, LENGTH(S)), S) = 0 THEN Result := I ELSE Result := -1
    END;
    EXIT
  END;
  Result := INHERITED IndexOfName(Name)
END;
(or implement it in a descendant TStrings class if you dislike CLASS HELPERs or don't have them in your Delphi version).
This will use a binary search on a sorted TStringList and a sequential search on other TStrings classes.