I'm doing the way I learned, that is:
with a FOR and taking the Index array one by one, but it is leaving too slow, would otherwise convert it to a String? that leaves quicker?
In my case it would be a Dynamic Array of ShortInt.
For example, given this input:
[0,20,-15]
I would like the following output:
0,20,-15
I suspect that your code is slow because it is performing unnecessary reallocations of the string. However, without seeing your code it's hard to be sure.
Probably the simplest way to code your algorithm is to use TStringBuilder. Whether or not that gives sufficient performance, only you can say.
sb := TStringBuilder.Create;
try
for i := 0 to high(buffer) do
begin
sb.Append(IntToStr(buffer[i]));
if i<high(buffer) then
sb.Append(',');
end;
str := sb.ToString;
finally
sb.Free;
end;
Related
I'm using code from this article:
How to Convert Numbers (Currency) to Words
I can't seem to understand how the following code works exactly.
try
sIntValue := FormatFloat('#,###', trunc(abs(Number)));
sDecValue := Copy(FormatFloat('.#########', frac(abs(Number))), 2);
if (Pos('E', sIntValue) > 0) then // if number is too big
begin
Result := 'ERROR:';
exit;
end;
except
Result := 'ERROR:';
exit;
end;
How is it checking if the number is too big using the Pos() function? Why is it searching for E in an Integer? This make no sense to me. I would apprecaite any explanation (the code works just fine, I just want to understand why and how).
The code is checking for the use of scientific notation. That is where you write a number like 1000 as '1E3'.
The code is faintly ridiculous though. Hard to know why the author did not use the > comparison operator.
How can I detect if a string contains a float. For example: '0.004'
But without using StrToFloat because that function are slow but rather by iterating through chars.
function IsInteger(const S: String): Boolean;
var
P: PChar;
begin
P := PChar(S);
Result := True;
while not (P^ = #0) do
begin
case P^ of
'0'..'9': Inc(P);
else
Result := False;
Break;
end;
end;
end;
This will check if string is a positive integer but not a float..
I would use TryStrToFloat():
if TryStrToFloat(str, value, FormatSettings) then
....
If you are prepared to use the default system wide format settings then you can omit the final parameter:
if TryStrToFloat(str, value) then
....
Can you use a RegEx here? Something like:
([+-]?[0-9]+(?:\.[0-9]*)?)
The problem with this question is that saying "is too slow" doesn't tell much. What does the profiler tells to you? Do you have an informed idea about the input data? What about different notations, for example, 6.02e23?
If your input data is mostly noise, then using regular expressions (as answered here) may improve things but only as a first filter. You could then add a second step to actually obtain your number, as explained by David's answer.
I have to check if I have duplicate paths in a FileListBox (FileListBox has the role of some kind of job list or play list).
Using Delphi's SameText, CompareStr, CompareText, takes 6 seconds. So I came with my own compare function which is (just) a bit faster but not fast enough. Any ideas how to improve it?
function SameFile(CONST Path1, Path2: string): Boolean;
VAR i: Integer;
begin
Result:= Length(Path1)= Length(Path2); { if they have different lenghts then obviously are not the same file }
if Result then
for i:= Length(Path1) downto 1 DO { start from the end because it is more likely to find the difference there }
if Path1[i]<> Path2[i] then
begin
Result:= FALSE;
Break;
end;
end;
I use it like this:
for x:= JList.Count-1 downto 1 DO
begin
sMaster:= JList.Items[x];
for y:= x-1 downto 0 DO
if SameFile(sMaster, JList.Items[y]) then
begin
JList.Items.Delete (x); { REMOVE DUPLICATES }
Break;
end;
end;
Note: The chance of having duplicates is small so Delete is not called often. Also the list cannot be sorted because the items are added by user and sometimes the order may be important.
Update:
The thing is that I lose the asvantage of my code because it is Pascal.
It would be nice if the comparison loop ( Path1[i]<> Path2[i] ) would be optimized to use Borland's ASM code.
Delphi 7, Win XP 32 bit, Tests were done with 577 items in the list. Deleting the items from list IS NOT A PROBLEM because it happens rarely.
CONCLUSION
As Svein Bringsli pointed, my code is slow not because of the comparing algorithm but because of TListBox. The BEST solution was provided by Marcelo Cantos. Thanks a lot Marcelo.
I accepted Svein's answer because it answers directly my question "how to make my comparison function faster" with "there is no point to make it faster".
For the moment I implemented the dirty and quick to implement solution: when I have under 200 files, I use my slow code to check the duplicates. If there are more than 200 files I use dwrbudr's solution (which is damn fast) considering that if the user has so many files, the order is irrelevant anyway (human brain cannot track so many items).
I want to thank you all for ideas and especially Svein for revealing the truth: (Borland's) visual controls are damn slow!
Don't waste time optimising the assembler. You can go from O(n2) to O(n log(n)) — bringing the time down to milliseconds — by sorting the list and then doing a linear scan for duplicates.
While you're at it, forget the SameFile function. The algorithmic improvement will dwarf anything you can achieve there.
Edit: Based on feedback in the comments...
You can perform an order-preserving O(n log(n)) de-duplication as follows:
Sort a copy of the list.
Identify and copy duplicated entries to a third list along with their duplication count minus one.
Walk the original list backwards as per your original version.
In the inner (for y := ...) loop, traverse the duplication list instead. If an outer item matches, delete it, decrement the duplication count, and delete the duplication entry if the count reaches zero.
This is obviously more complicated but it will still be orders of magnitude faster, even if you do horrible dirty things like storing duplication counts as strings, C:\path1\file1=2, and using code like:
y := dupes.IndexOfName(sMaster);
if y <> -1 then
begin
JList.Items.Delete(x);
c := StrToInt(dupes.ValueFromIndex(y));
if c > 1 then
dupes.Values[sMaster] = IntToStr(c - 1);
else
dupes.Delete(y);
end;
Side note: A binary chop would be more efficient than the for y := ... loop, but given that duplicates are rare, the difference ought to be negligible.
Using your code as a starting point, I modified it to take a copy of the list before searching for duplicates. The time went from 5,5 seconds to about 0,5 seconds.
vSL := TStringList.Create;
try
vSL.Assign(jList.Items);
vSL.Sorted := true;
for x:= vSL.Count-1 downto 1 DO
begin
sMaster:= vSL[x];
for y:= x-1 downto 0 DO
if SameFile(sMaster, vSL[y]) then
begin
vSL.Delete (x); { REMOVE DUPLICATES }
jList.Items.Delete (x);
Break;
end;
end;
finally
vSL.Free;
end;
Obviously, this is not a good way to do it, but it demonstrates that TFileListBox is in itself quite slow. I don't believe you can gain much by optimizing your compare-function.
To demonstrate this, I replaced your SameFile function with the following, but kept the rest of your code:
function SameFile(CONST Path1, Path2: string): Boolean;
VAR i: Integer;
begin
Result := false; //Pretty darn fast code!!!
end;
The time went from 5,6 seconds to 5,5 seconds. I don't think there's much more to gain there :-)
Create another sorted list with sortedList.Duplicates := dupIgnore and add your strings to that list, then back.
vSL := TStringList.Create;
try
vSL.Sorted := true;
vSL.Duplicates := dupIgnore;
for x:= 0 to jList.Count - 1 do
vSL.Add(jList[x]);
jList.Clear;
for x:= 0 to vSL.Count - 1 do
jList.Add(vSL[x]);
finally
vSL.Free;
end;
The absolute fastest way, bar none (as alluded to before) is to use a routine that generates a unique 64/128/256 bit hash code for a string (I use the SHA256Managed class in C#). Run down the list of strings, generate the hash code for the strings, check for it in the sorted hash code list, and if found then the string is a duplicate. Otherwise add the hash code to the sorted hash code list.
This will work for strings, file names, images (you can get the unique hash code for an image), etc, and I guarantee that this will be as fast or faster than any other impementation.
PS You can use a string list for the hash codes by representing the hash codes as strings. I've used a hex representation in the past (256 bits -> 64 characters) but in theory you can do it any way you like.
4 seconds for how many calls? Great performance if you call it a billion times...
Anyway, does Length(Path1) get evaluated every time through the loop? If so, store that in an Integer variable prior to looping.
Pointers may yield some speed over the strings.
Try in-lining the function with:
function SameFile(blah blah): Boolean; Inline;
That will save some time, if this is being called thousands of times per second. I would start with that and see if it saves anything.
EDIT: I didn't realize that your list wasn't sorted. Obviously, you should do that first! Then you don't have to compare against every other item in the list - just the prior or next one.
I use a modified Ternary Search Tree (TST) to dedupe lists. You simply load the items into the tree, using the whole string as the key, and on each item you can get back an indication if the key is already there (and delete your visible entry). Then you throw away the tree. Our TST load function can typically load 100000 80-byte items in well under a second. And it could not take any more than this to repaint your list, with proper use of begin- and end-update. The TST is memory-hungry, but not so that you would notice it at all if you only have of the order of 500 items. And much simpler than sorting, comparisons and assembler (if you have a suitable TST implementation, of course).
No need to use a hash table, a single sorted list gives me a result of 10 milliseconds, that's 0.01 seconds, which is about 500 times faster! Here is my test code using a TListBox:
procedure TForm1.Button1Click(Sender: TObject);
var
lIndex1: Integer;
lString: string;
lIndex2: Integer;
lStrings: TStringList;
lCount: Integer;
lItems: TStrings;
begin
ListBox1.Clear;
for lIndex1 := 1 to 577 do begin
lString := '';
for lIndex2 := 1 to 100 do
if (lIndex2 mod 6) = 0 then
lString := lString + Chr(Ord('a') + Random(2))
else
lString := lString + 'a';
ListBox1.Items.Add(lString);
end;
CsiGlobals.AddLogMsg('Start', 'Test', llBrief);
lStrings := TStringList.Create;
try
lStrings.Sorted := True;
lCount := 0;
lItems := ListBox1.Items;
with lItems do begin
BeginUpdate;
try
for lIndex1 := Count - 1 downto 0 do begin
lStrings.Add(Strings[lIndex1]);
if lStrings.Count = lCount then
Delete(lIndex1)
else
Inc(lCount);
end;
finally
EndUpdate;
end;
end;
finally
lStrings.Free;
end;
CsiGlobals.AddLogMsg('Stop', 'Test', llBrief);
end;
I'd also like to point out that your solution would take an extreme amount of time if applied to a huge list (like containing 100,000,000 items or more). Even constructing a hashtable or sorted list would take too much time.
In cases like that you could try another approach : Hash each member, but instead of populating a full-blown hashtable, create a bitset (large enough to contain a close factor to as many slots as there are input items) and just set each bit at the offset indicated by the hashfunction. If the bit was 0, change it to 1. If it was already 1, take note of the offending string-index in a separate list and continue. This results in a list of string-indexes that had a collision in the hash, so you'll have to run it a second time to find the first cause of those collisions. After that, you should sort & de-dupe the string-indexes in this list (as all indexes apart from the first one will be present twice). Once that's done you should sort the list again, but this time sort it on the string-contents in order to easily spot duplicates in a following single scan.
Granted it could be a bit extreme to go this all this length, but at least it's a workable solution for very large volumes! (Oh, and this still won't work if the number of duplicates is very high, when the hash-function has a bad spread or when the number of slots in the 'hashtable' bitset is chosen too small - which would give many collisions which aren't really duplicates.)
i have a function who's job is to convert an ADO Recordset into html:
class function RecordsetToHtml(const rs: _Recordset): WideString;
And the guts of the function involves a lot of wide string concatenation:
while not rs.EOF do
begin
Result := Result+CRLF+
'<TR>';
for i := 0 to rs.Fields.Count-1 do
Result := Result+'<TD>'+VarAsWideString(rs.Fields[i].Value)+'</TD>';
Result := Result+'</TR>';
rs.MoveNext;
end;
With a few thousand results, the function takes, what any user would feel, is too long to run. The Delphi Sampling Profiler shows that 99.3% of the time is spent in widestring concatenation (#WStrCatN and #WstrCat).
Can anyone think of a way to improve widestring concatenation? i don't think Delphi 5 has any kind of string builder. And Format doesn't support Unicode.
And to make sure nobody tries to weasel out: pretend you are implementing the interface:
IRecordsetToHtml = interface(IUnknown)
function RecordsetToHtml(const rs: _Recordset): WideString;
end;
Update One
I thought of using an IXMLDOMDocument, to build up the HTML as xml. But then i realized that the final HTML would be xhtml and not html - a subtle, but important, difference.
Update Two
Microsoft knowledge base article: How To Improve String Concatenation Performance
WideString are inherently slow because they were implemented for COM compatibility and go through COM calls. If you look at the code, it will keep on reallocating the string and call SysAllocStringLen() & C which are APIs from oleaut32.dll. It doesn't use the Delphi memory manager but AFAIK it uses the COM memory manager.
Because most HTML pages don't use UTF-16, you may get better result using the native Delphi string type and a string list, although you should be careful about conversion from UTF and the actual codepage, and the conversion will downgrade performance as well.
Also you're using a VarAsString() function that probably converts a variant to an AnsiString then converted to a WideString. Check if your version of Delphi has a VarAsWideString() or something alike function to avoid it, or rely on Delphi automatic conversion if you could be sure your variant will never be NULL.
Yup, your algorithm is clearly in O(n^2).
Instead of returning a string, try returning a TStringList, and replace your loop with
while not rs.EOF do
begin
Result.Add('<TR>');
for i := 0 to rs.Fields.Count-1 do
Result.Add( '<TD>'+VarAsString(rs.Fields[i].Value)+'</TD>' );
Result := Result.Add('</TR>');
rs.MoveNext;
end;
You can then save your Result using TStringList.SaveToFile
I'm unable to spend the time right now to give you the exact code.
But I think the fastest thing you can do is:
Loop through all the strings and total their length also adding for the extra table tags you'll need.
Use SetString to allocate one string of the proper length.
Loop through all the strings again and use the "Move" procedure to copy to the string to the proper place in the final string.
The key thing is that many concatenations to a string take longer and longer because of the constant allocating and freeing of memory. A single allocation will be your biggest timesaver.
i found the best solution. The open source HtmlParser for Delphi, has a helper TStringBuilder class. It is internally used to build what he calls DomStrings, which is actually an alias of WideString:
TDomString = WideString;
With a little bit of fiddling of his class:
TStringBuilder = class
public
constructor Create(ACapacity: Integer);
function EndWithWhiteSpace: Boolean;
function TailMatch(const Tail: WideString): Boolean;
function ToString: WideString;
procedure AppendText(const TextStr: WideString);
procedure Append(const value: WideString);
procedure AppendLine(const value: WideString);
property Length: Integer read FLength;
end;
The guts of the routine becomes:
while not rs.EOF do
begin
sb.Append('<TR>');
for i := 0 to rs.Fields.Count-1 do
sb.Append('<TD>'+VarAsWideString(rs.Fields[i].Value));
sb.AppendLine('</TR>');
rs.MoveNext;
end;
The code then feels to run infinitely afaster. Profiling shows much improvement; the WideString manipulation and length-counting became negligible. In its place was FastMM's own internal operations.
Notes
Nice catch on the mistaken forcing of all strings into current code-page (VarAsString rather than VarAsWideString)
Some HTML closing tags are optional; omitted ones that logically make no sense.
Widestring is not reference counted, any modification means a string manipulation. If your content is not unicode encoded, you can internally use the native string (reference counted) to concatenate string and then convert it to a Widestring. Example is as follows:
var
NativeString: string;
begin
// ...
NativeString := '';
while not rs.EOF do
begin
NativeString := NativeString + CRLF + '<TR>';
for i := 0 to rs.Fields.Count-1 do
NativeString := NativeString + '<TD>'+VarAsString(rs.Fields[i].Value) + '</TD>';
NativeString := NativeString + '</TR>';
rs.MoveNext;
end;
Result := WideString(NativeString);
I have also seen another approach: Encode Unicode to UTF8String (as reference counted), concatenate them and finally convert UTF8String to Widestring. But I am not sure, if two UTF8String can be concatenated directly. The time on encoding should also be considered.
Anyway, although Widestring concatenation is much slower than native string operations. But it is IMO still acceptable. Too much tuning on such kind of thing should be avoided. Seriously considering of performance, you should then upgrade your Delphi to at least 2009. The costs on buying a tool is for long-term cheaper than doing heavy hacks on an old Delphi.
I want to know how to increase the value in a FOR-loop statement.
This is my code.
function Check(var MemoryData:Array of byte;MemorySignature:Array of byte;Position:integer):boolean;
var i:byte;
begin
for i := 0 to Length(MemorySignature) - 1 do
begin
while(MemorySignature[i] = $FF) do inc(i); //<< ERROR <<
if(memorydata[i + position] <> MemorySignature[i]) then Result:=false;
end;
Result := True;
end;
The error is: E2081 Assignment to FOR-Loop variable 'i'.
I'm trying to translate an old code from C# to Delphi,but I can't increase 'i'.
Increasing 'i' is not the only way to go,but I want to know where the problem is.
Of course the others are (generally) correct. What wasn't said, is that 'i' in your loop doesn't exist. Delphi uses a CPU register for it. That's why you cannot change it and that's why you should use a 'for' loop (not a 'while') because the 'for' is way faster. Here is your code modified (not tested but I think that you got the idea) - also imho you had some bugs - fixed them also:
function Check(var MemoryData:Array of byte;MemorySignature:Array of byte;Position:integer):boolean;
var i:byte;
begin
Result := True; //moved at top. Your function always returned 'True'. This is what you wanted?
for i := 0 to Length(MemorySignature) - 1 do //are you sure??? Perhaps you want High(MemorySignature) here...
begin
if MemorySignature[i] <> $FF then //speedup - '<>' evaluates faster than '='
begin
Result:=memorydata[i + position] <> MemorySignature[i]; //speedup.
if not Result then
Break; //added this! - speedup. We already know the result. So, no need to scan till end.
end;
end;
end;
...also MemorySignature should have a 'const' or 'var'. Otherwise as it is now the array gets copied. Which means slowdown at each call of 'Check'. Having a 'var' the things are much faster with code unchanged because AFAIS the MemorySignature isn't changed.
HTH
in this case, you can just do a 'continue' instead of inc(i)
In addition to what Lasse wrote, assigning to a loop variable is generally considered a code smell. It makes code harder to read (if you want to leave the loop premataturely, you can express that a lot clearer using break/continue), and is often done by accident, causing all kind of nasty side-effects. So instead of jumping through hoops to make the compiler not do its optimizing fu on any loop where the loop variable is touched, Borland (now CodeGear) bit the bullet and made assigning to the loop variable illegal.
If you really want to mess about manually with loop indices, consider using a while-loop.
If you need to alter a loop counter inside a loop, try using a while loop instead.
BTW, you need your
Result := True
line to be the first line of the function for it to work properly. As it is, it will always return True.
The problem is that the compiler has taken the original FOR-loop code and assumed it knows what is happening, and thus it can optimize the code by outputting specific CPU instructions that runs the fastest, with those assumptions.
If it allowed you to mess with the variable value, those assumptions might go out the window, and thus the code might not work, and that's why it doesn't allow you to change it.
What you should do instead is just have a separate variable that you're actually using, and only use the FOR-loop indexing variable to keep track of how many iterations you've currently executed.
As an example, a typical optimization might be to write CPU-instructions that will stop iterating when the index register drops to zero, rewriting the loop in such a way that it internally counts down, instead of up, and if you start messing with the variable, it could not rewrite the code like that.
As per Mike Sutton, what you need is a while loop, not a for loop.
function Check(var MemoryData: Array of byte;
MemorySignature: Array of byte; Position: Integer):Boolean;
var
i:byte;
begin
Result := True;
i := 0;
while i < Length(MemorySignature) do
begin
while(MemorySignature[i] = $FF) do
Inc(i);
if(MemoryData[i + position] <> MemorySignature[i]) then
Result := False;
Inc(i);
end;
end;
The Delphi implementation of "for" is optimised, but as a result it is less flexible than the C-style