This is a sorted ListView with 50,000 items (strings) in Delphi. How can I quickly find all items that share the same prefix, and then break out of the loop?
The list is like:
aa.....
ab cd // from here
ab kk
ab li
ab mn
ab xy // to here
ac xz
...
In other words: how can I quickly find and copy all items with the prefix "ab", then exit the loop? Assume the index of one of the "ab" items has already been obtained via a binary search, so the range from "ab cd" to "ab xy" can be located.
Thank you very much.
Edit: Thank you all for your answers.
If you want something fast, don't store your data in a TListView.
Use a TStringList to store your list, then use a TListView in virtual mode.
Reading from a TStringList.Items[] is many times faster than reading from a TListView.Items[] property.
If you're sure that no empty item exists in the list, use this:
procedure Extract(List, Dest: TStrings; Char1, Char2: char);
type
  PC = {$ifdef UNICODE}PCardinal{$else}PWord{$endif};
var
  i, j: integer;
  V: cardinal;
begin
  V := ord(Char1) + ord(Char2) shl (8*sizeof(char));
  Dest.BeginUpdate;
  Dest.Clear;
  for i := 0 to List.Count-1 do
    if PC(pointer(List[i]))^ = V then begin
      for j := i to List.Count-1 do begin
        if PC(pointer(List[j]))^ <> V then
          break; // end the for j := loop
        Dest.Add(List[j]);
      end;
      break; // end the for i := loop
    end;
  Dest.EndUpdate;
end;
You can use binary search to get it even faster. But with the PWord() trick, on a 50,000-item list, you won't notice the difference.
Note that PC(pointer(List[i]))^=V is a faster version of copy(List[i],1,2)=Char1+Char2, because no temporary string is created during the comparison. But it works only if no List[i]='', i.e. no pointer(List[i])=nil.
I've added {$ifdef UNICODE} and sizeof(char) so that this code compiles with all versions of Delphi (before and after Delphi 2009).
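As a sketch of the binary-search variant mentioned above (a hypothetical helper; it assumes List is sorted with CompareStr ordering, and the names are illustrative): because the bare prefix sorts before any string that starts with it, a binary search for the first item >= Prefix lands at the start of the matching run.

```pascal
// Hypothetical sketch: locate the first item >= Prefix in a sorted list,
// then copy items forward while they still start with Prefix.
procedure ExtractPrefix(List, Dest: TStrings; const Prefix: string);
var
  L, H, M, I: Integer;
begin
  L := 0;
  H := List.Count - 1;
  // Binary search: L ends up at the first index with List[L] >= Prefix.
  while L <= H do begin
    M := (L + H) shr 1;
    if CompareStr(List[M], Prefix) < 0 then
      L := M + 1
    else
      H := M - 1;
  end;
  // Copy the contiguous run of items sharing the prefix.
  I := L;
  while (I < List.Count) and
        (Copy(List[I], 1, Length(Prefix)) = Prefix) do begin
    Dest.Add(List[I]);
    Inc(I);
  end;
end;
```

This is O(log N) to find the range start, after which only the matching items are touched.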
To stop running a loop, use the break command. Exit is also useful to leave an entire function, especially when you have multiple nested loops to escape. As a last resort, you can use goto to jump out of several nested loops and continue running in the same function.
If you use a while or repeat loop instead of a for loop, then you can include another conjunct in your stopping condition that you set mid-loop:
i := 0;
found := False;
while (i < count) and not found do begin
  // search for items
  found := True;
  // more stuff
  Inc(i);
end;
Today I encountered a very strange bug.
I have the following code:
var i: integer;
...
for i := 0 to FileNames.Count - 1 do
begin
ShowMessage(IntToStr(i) + ' from ' + IntToStr(FileNames.Count - 1));
FileName := FileNames[i];
...
end;
ShowMessage('all');
The FileNames list has one element, so I expected the loop to execute once and to see:
0 from 0
all
This is something I've done thousands of times :).
But in this case, with code optimization switched on, I see a second loop iteration:
0 from 0
1 from 0
all
Without code optimization, the loop iterates correctly.
At the moment I don't even know which direction to take with this issue (and the upper loop bound stays unchanged, yes).
So any suggestion would be very helpful. Thanks.
I use the Delphi 2005 (Upd2) compiler.
Considering the QC report referred to by LU RD, and my own experience with D2005, here are a few workarounds. I recall using the while loop solution myself.
1. Rewrite the for loop as a while loop:
var
i: integer;
begin
i := 0;
while i < FileNames.Count do
begin
...
inc(i);
end;
end;
2. Leave the for loop control variable alone, and use a separate variable that you increment in the loop for string manipulation and FileNames indexing:
var
ctrl, indx: integer;
begin
indx := 0;
for ctrl := 0 to FileNames.Count-1 do
begin
// use indx for string manipulation and FileNames indexing
inc(indx);
end;
end;
3. You hinted at a workaround yourself in saying "Without code optimization the loop iterates right".
Assuming you have optimization on, turn it off with {$O-} before the procedure/function and back on with {$O+} after it. Note: the Optimization directive can only be applied around whole procedures/functions, not within one.
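For illustration, workaround 3 might look like this (the procedure name and body are hypothetical; the point is the placement of the directives):

```pascal
{$O-} // optimization off just for the affected routine
procedure ProcessFileNames(FileNames: TStrings);
var
  i: integer;
begin
  for i := 0 to FileNames.Count - 1 do
    ShowMessage(IntToStr(i) + ' from ' + IntToStr(FileNames.Count - 1));
end;
{$O+} // optimization back on for the rest of the unit
```

The rest of the unit still compiles with optimization; only the routine between the directives is affected.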
Ok, it seems to me I solved the problem and can explain it.
Unfortunately, I cannot create a test case that reproduces the bug, and I cannot show the real code, which is under NDA. So I must use a simplified example again.
The problem is in a DLL used by my code. Consider the following data structure:
type
TData = packed record
Count: integer;
end;
TPData = ^TData;
and a function defined in the DLL:
Calc: function(Data: TPData): integer; stdcall;
In my code I try to process data records taken from a list (TList):
var
i: integer;
Data: TData;
begin
for i := 0 to List.Count - 1 do
begin
Data := TPData(List[i])^;
Calc(@Data);
end;
and when optimization is on, I see a second iteration of the loop from 0 to 0.
If I rewrite the code as
var
i: integer;
Data, Data2: TData;
begin
for i := 0 to List.Count - 1 do
begin
Data := TPData(List[i])^;
Data2 := TPData(List[i])^;
Calc(@Data2);
end;
all works as expected.
The DLL itself was developed by another programmer, so I asked him to look into it.
What was unexpected to me is that a local procedure's stack can be corrupted in such an unusual way, without access violations or other similar errors. BTW, the Data and Data2 variables contain correct values.
Maybe my experience will be useful to someone. Thanks to all who helped me, and please excuse my inadvertent mistakes.
What is the fastest way to find duplicates in a TStringList? I get the data I need to search for duplicates in a TStringList. My current idea goes like this:
var
  TestStringList, DataStringList: TStringList;
  i: integer;
...
for i := 0 to DataStringList.Count-1 do
begin
  if TestStringList.IndexOf(DataStringList[i]) < 0 then
  begin
    TestStringList.Add(DataStringList[i])
  end
  else
  begin
    Memo1.Lines.Add('duplicate item found');
  end;
end;
....
Just for completeness (and because your code doesn't actually use the duplicate, but just indicates one has been found): Delphi's TStringList has the built-in ability to deal with duplicate entries, via its Duplicates property. Setting it to dupIgnore will simply discard any duplicates you attempt to add. Note that the destination list has to be sorted, or Duplicates has no effect.
TestStringList.Sorted := True;
TestStringList.Duplicates := dupIgnore;
for i := 0 to DataStringList.Count-1 do
TestStringList.Add(DataStringList[i]);
Memo1.Lines.Add(Format('%d duplicates discarded',
[DataStringList.Count - TestStringList.Count]));
A quick test shows that the entire loop can be removed if you use Sorted and Duplicates:
TestStringList.Sorted := True;
TestStringList.Duplicates := dupIgnore;
TestStringList.AddStrings(DataStringList);
Memo1.Lines.Add(Format('%d duplicates discarded',
[DataStringList.Count - TestStringList.Count]));
See the TStringList.Duplicates documentation for more info.
I think that you are looking for duplicates. If so, then do the following:
Case 1: The string list is ordered
In this scenario, duplicates must appear at adjacent indices, so you simply loop from 1 to Count-1 and check whether the element at index i is the same as the one at index i-1.
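A minimal sketch of that adjacent-index check (assuming List is an already sorted TStringList, and reusing the memo from the question for output):

```pascal
// Sorted list: equal strings sit next to each other,
// so one linear pass over adjacent pairs finds every duplicate.
for i := 1 to List.Count - 1 do
  if List[i] = List[i - 1] then
    Memo1.Lines.Add('duplicate item found: ' + List[i]);
```
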
Case 2: The string list is not ordered
In this scenario we need a double for loop. It looks like this:
for i := 0 to List.Count-1 do
for j := i+1 to List.Count-1 do
if List[i]=List[j] then
// duplicate found
There are performance considerations. If the list is ordered the search is O(N). If the list is not ordered the search is O(N²). Clearly the former is preferable. Since a list can be sorted with complexity O(N log N), if performance becomes a factor then it will be advantageous to sort the list before searching for duplicates.
Judging by the use of IndexOf, you use an unsorted list. The scaling factor of your algorithm then is n². That is slow. You can optimize it as David showed, by limiting the search area of the inner loop; then the average factor would be n²/2 - but that still scales badly.
Note: the scaling factor here makes sense for limited workloads, say dozens or hundreds of strings per list. For larger sets of data the asymptotic O(...) measure suits better. However, finding O-measures for QuickSort and for hash lists is a trivial task.
Option 1: Sort the list. Using quicksort it would have a scaling factor of n + n*log(n), or O(n*log(n)) for large loads.
Set Duplicates to dupAccept
Set Sorted to True
Iterate the sorted list and check whether the next string exists and is the same
http://docwiki.embarcadero.com/Libraries/XE3/en/System.Classes.TStringList.Duplicates
http://docwiki.embarcadero.com/Libraries/XE3/en/System.Classes.TStringList.Sorted
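Put together, Option 1 might look like this (List is the unsorted input; the variable names are illustrative):

```pascal
var
  Sorted: TStringList;
  i: Integer;
begin
  Sorted := TStringList.Create;
  try
    Sorted.Duplicates := dupAccept; // keep duplicates so they can be detected
    Sorted.Sorted := True;          // each Add inserts in sorted position
    Sorted.AddStrings(List);
    // Equal strings are now adjacent; scan the pairs.
    for i := 1 to Sorted.Count - 1 do
      if Sorted[i] = Sorted[i - 1] then
        Memo1.Lines.Add('duplicate: ' + Sorted[i]);
  finally
    Sorted.Free;
  end;
end;
```
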
Option 2: use hashed list helper. In modern Delphi that would be TDictionary<String,Boolean>, in older Delphi there is a class used by TMemIniFile
You iterate your string list and check whether each string was already added to the helper collection.
If it was not, you add it with a "false" value.
If it was, you switch the value to "true".
The scaling factor would be a constant for small data chunks and O(1) for large ones - see http://docwiki.embarcadero.com/Libraries/XE2/en/System.Generics.Collections.TDictionary.ContainsKey
For older Delphi you can use THashedStringList in a similar pattern (thanks @FreeConsulting)
http://docs.embarcadero.com/products/rad_studio/delphiAndcpp2009/HelpUpdate2/EN/html/delphivclwin32/IniFiles_THashedStringList_IndexOf.html
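Option 2 might be sketched like this in a modern Delphi (List is the input; the variable names are illustrative):

```pascal
uses System.Generics.Collections;

var
  Seen: TDictionary<string, Boolean>;
  S: string;
  i: Integer;
begin
  Seen := TDictionary<string, Boolean>.Create;
  try
    for i := 0 to List.Count - 1 do begin
      S := List[i];
      if Seen.ContainsKey(S) then
        Seen[S] := True       // seen before: mark as a duplicate
      else
        Seen.Add(S, False);   // first occurrence
    end;
    // Afterwards, every key mapped to True is a duplicated string.
  finally
    Seen.Free;
  end;
end;
```

Each ContainsKey/Add is a hash lookup, so the whole pass stays linear in the list size.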
Unfortunately it is unclear what you want to do with the duplicates. Your else clause suggests you just want to know whether there is one (or more) duplicate(s). Although that could be the end goal, I assume you want more.
Extracting duplicates
The previously given answers delete or count the duplicate items. Here is an answer that keeps them.
procedure ExtractDuplicates1(List1, List2: TStringList; Dupes: TStrings);
var
Both: TStringList;
I: Integer;
begin
Both := TStringList.Create;
try
Both.Sorted := True;
Both.Duplicates := dupAccept;
Both.AddStrings(List1);
Both.AddStrings(List2);
for I := 0 to Both.Count - 2 do
if (Both[I] = Both[I + 1]) then
if (Dupes.Count = 0) or (Dupes[Dupes.Count - 1] <> Both[I]) then
Dupes.Add(Both[I]);
finally
Both.Free;
end;
end;
Performance
The following alternatives are tried in order to compare performance of the above routine.
procedure ExtractDuplicates2(List1, List2: TStringList; Dupes: TStrings);
var
Both: TStringList;
I: Integer;
begin
Both := TStringList.Create;
try
Both.AddStrings(List1);
Both.AddStrings(List2);
Both.Sort;
for I := 0 to Both.Count - 2 do
if (Both[I] = Both[I + 1]) then
if (Dupes.Count = 0) or (Dupes[Dupes.Count - 1] <> Both[I]) then
Dupes.Add(Both[I]);
finally
Both.Free;
end;
end;
procedure ExtractDuplicates3(List1, List2, Dupes: TStringList);
var
I: Integer;
begin
Dupes.Sorted := True;
Dupes.Duplicates := dupAccept;
Dupes.AddStrings(List1);
Dupes.AddStrings(List2);
for I := Dupes.Count - 1 downto 1 do
if (Dupes[I] <> Dupes[I - 1]) or (I > 1) and (Dupes[I] = Dupes[I - 2]) then
Dupes.Delete(I);
if (Dupes.Count > 1) and (Dupes[0] <> Dupes[1]) then
Dupes.Delete(0);
while (Dupes.Count > 1) and (Dupes[0] = Dupes[1]) do
Dupes.Delete(0);
end;
Although ExtractDuplicates3 marginally performs better, I prefer ExtractDuplicates1 because it reads better and the TStrings parameter provides more usability. ExtractDuplicates2 performs noticeably worse, which demonstrates that sorting all items afterwards in a single run takes more time than inserting every single item in sorted order as it is added.
Note
This answer is part of this recent answer for which I was about to ask the same question: "how to keep duplicates?". I didn't, but if anyone knows or finds a better solution, please comment, add or update this answer.
This is an old thread but I thought this solution may be useful.
An option is to pump the values from one string list to another with TestStringList.Duplicates := dupError; set, and then trap the exception.
var
  TestStringList, DataStringList: TStringList;
  i: integer;
...
TestStringList.Sorted := True;
TestStringList.Duplicates := dupError;
for i := 0 to DataStringList.Count-1 do
begin
try
TestStringList.Add(DataStringList[i])
except
on E : EStringListError do begin
memo1.Lines.Add('duplicate item found');
end;
end;
end;
....
Just note that the trapping of the exception also masks the following errors:
There is not enough memory to expand the list; the list tried to grow beyond its maximal capacity; a non-existent element of the list was referenced (i.e. the list index was out of bounds).
function TestDuplicates(const dataStrList: TStringList): integer;
var
  it: integer;
begin
  Result := 0;
  with TStringList.Create do begin
    {Duplicates := dupIgnore;}
    for it := 0 to dataStrList.Count-1 do begin
      if IndexOf(dataStrList[it]) < 0 then
        Add(dataStrList[it])
      else
        inc(Result)
    end;
    Free;
  end;
end;
I wrote this function to remove duplicates from a TList descendant; now I was wondering whether this could give me problems under certain conditions, and how it performs.
It seems to work with object pointers.
function TListClass.RemoveDups: integer;
var
  total, i, j: integer;
begin
  total := 0;
  i := 0;
  while i < count do begin
    j := i + 1;
    while j < count do begin
      if items[i] = items[j] then begin
        remove(items[j]);
        inc(total);
      end
      else
        inc(j);
    end;
    inc(i);
  end;
  result := total;
end;
Update:
Does this work faster?
function TDrawObjectList.RemoveDups: integer;
var
  total, i: integer;
  tempList: TList;
begin
  tempList := TList.Create;
  try
    total := 0;
    i := 0;
    while i < count do
      if tempList.IndexOf(Items[i]) = -1 then begin
        tempList.Add(Items[i]); // remember pointers already seen
        inc(i);
      end else begin
        Remove(Items[i]);
        inc(total);
      end;
  finally
    tempList.Free;
  end;
  result := total;
end;
You do need another List.
As noted, the solution is O(N²), which makes it really slow on a big set of items (1000s), but as long as the count stays low it's the best bet, because of its simplicity and ease of implementation. Pre-sorted and other solutions need more code and are more prone to implementation errors.
Here is possibly the same code written in a different, more compact form. It runs through all elements of the list, and for each one removes duplicates to the right of the current element. Removal is safe as long as it's done in a reverse loop.
function TListClass.RemoveDups: Integer;
var
I, K: Integer;
begin
Result := 0;
for I := 0 to Count - 1 do //Compare to everything on the right
for K := Count - 1 downto I+1 do //Reverse loop allows to Remove items safely
if Items[K] = Items[I] then
begin
Remove(Items[K]);
Inc(Result);
end;
end;
I would suggest leaving optimizations to a later time, if you really do end up with a 5000-item list. Also, as noted above, if you check for duplicates while adding items to the list, you can save on:
The check for duplicates gets distributed in time, so it won't be as noticeable to the user
You can hope to quit early if a dupe is found
Just hypothetical:
Interfaces
If you have interfaced objects in an TInterfaceList that are only in that list, you could check the refcount of an object. Just loop through the list backwards and delete all objects with a refcount > 1.
Custom counter
If you can edit these objects, you could do the same without interfaces. Increment a counter on the object when they are added to the list and decrease it when they are removed.
Of course, this only works if you can actually add a counter to these objects, but the boundaries weren't exactly clear in your question, so I don't know if this is allowed.
Advantage is that you don't need to look for other items, not when inserting, not when removing duplicates. Finding a duplicate in a sorted list could be faster (as mentioned in the comments), but not having to search at all will beat even the fastest lookup.
I have a TStringList which is sorted and contains unique filenames. The list can be of any size (so it can be hundreds of thousands of entries). I want to check whether any of the entries start with a particular string (i.e. whether the files are in a sub-folder). It's easy enough to scan the list serially using StartsText, but that isn't an ideal solution.
Using the TStringList.Find() code as a starting point, I've created a function which I think is the solution, but I want to be sure. Don't worry about the following not being a member of the class (FList is the TStringList instance being searched), and StartsFilename works the same way as StartsText:
function ShortcutFind(const S: string): Boolean;
var
L, H, I, C: Integer;
begin
Result := False;
L := 0;
H := FList.Count - 1;
while L <= H do begin
I := (L + H) shr 1;
if TFilenameUtils.StartsFilename(FList[I], S) then begin
      Result := True;
Exit;
end;
C := FList.CompareStrings(FList[I], S);
if C < 0 then
L := I + 1
else begin
H := I - 1;
if C = 0 then begin
Result := True;
if FList.Duplicates <> dupAccept then L := I;
end;
end;
end;
end;
Basically, the only real change is that it does the check before moving onto the next entry to compare.
Note that switching from TStringList is not an option.
Would this method work?
Thanks
If TFilenameUtils.StartsFilename is the same as StartsText (and your first paragraph suggests it might be), then you can do the whole function in one statement by using TStringList.Find instead of copying it:
var
I: Integer;
begin
Assert(not FList.CaseSensitive);
Result := FList.Find(S, I) or ((I < FList.Count) and StartsText(S, FList[I]));
end;
That should work because when Find fails, it still tells you the index of where the desired string would have appeared in the list. When you search for your prefix string, its location will be before any other strings that start with that prefix, so if there are any strings with that prefix, they'll appear immediately after the hypothetical location of the prefix itself.
If you want to keep your current code, then you can simplify it by removing the conditional that checks C = 0. That condition should never occur, unless your StartsFilename function is broken. But, if the function really is broken and C can be zero, then you can at least stop executing the loop at that point since you've found what you were looking for. Either way, you don't need to check Duplicates since your function doesn't have the same requirement as Find does to return the index of the found item.
How would I read data from a text file into two arrays? One being string and the other integer?
The text file has a layout like this:
Hello
1
Test
2
Bye
3
Each number corresponds to the text above it. Can anyone perhaps help me? Would greatly appreciate it
var
Items: TStringList;
Strings: array of string;
Integers: array of Integer;
i, Count: Integer;
begin
Items := TStringList.Create;
try
Items.LoadFromFile('c:\YourFileName.txt');
// Production code should check that Items.Count is even at this point.
// Actual arrays here. Set their size once, because we know already.
// growing your arrays inside the iteration will cause many reallocations
// and memory fragmentation.
Count := Items.Count div 2;
SetLength(Strings, Count);
SetLength(Integers, Count);
for i := 0 to Count - 1 do
begin
Strings[i] := Items[i*2];
Integers[i] := StrToInt(Items[i*2+1]);
end;
finally
Items.Free;
end;
end;
I would read the file into a string list and then process it item by item. The even ones are put into the list of strings, and the odd ones go into the numbers.
var
  lines, strings, numbers: TStringList;
...
//create the lists
lines.LoadFromFile(filename);
Assert(lines.Count mod 2 = 0);
for i := 0 to lines.Count-1 do
  if i mod 2 = 0 then
    strings.Add(lines[i])
  else
    numbers.Add(lines[i]);
I'd probably use some helper functions called odd and even in my own code.
If you wanted the numbers in a list of integers, rather than a string list, then you would use TList<Integer> and add StrToInt(lines[i]) on the odd iterations.
I've used lists rather than dynamic arrays for the ease of writing this code, but GolezTrol shows you how to do it with dynamic arrays if that's what you prefer.
That said, since you state that the number is associated with the string, you may actually be better off with something like this:
type
TNameAndID = record
Name: string;
ID: Integer;
end;
var
  List: TList<TNameAndID>;
  Item: TNameAndID;
...
List := TList<TNameAndID>.Create;
lines.LoadFromFile(filename);
Assert(lines.Count mod 2 = 0);
for i := 0 to lines.Count-1 do begin
  if i mod 2 = 0 then begin
    Item.Name := lines[i];
  end else begin
    Item.ID := StrToInt(lines[i]);
    List.Add(Item);
  end;
end;
The advantage of this approach is that you now have assurance that the association between name and ID will be maintained. Should you ever wish to sort, insert or remove items then you will find the above structure much more convenient than two parallel arrays.