Delphi: faster way to convert a “dirty” string to a number

How can I make this code faster? The string can contain characters such as ", .?#" and possibly others.
Const Nums = ['0'..'9'];
function CleanNumber(s: String): Int64;
Var z: Cardinal;
begin
for z := length(s) downto 1 do
if not (s[z] in Nums) then Delete(s,z,1);
if s = '' then
Result := 0 else
Result := StrToInt64(s);
end;
Results (long loop):
CL2,CL3 = HeartWare's
32-bit, "dirty number" / "clean number"
Mine: 270ms, 165ms
CL2: 220ms, 210ms
CL3: 100ms, 110ms
DirtyStrToNum: 215ms, 90ms
64-bit, "dirty number" / "clean number"
Mine: 2280ms, 75ms
CL2: 1320ms, 130ms
CL3: 280ms, 25ms
DirtyStrToNum: 1390ms, 125ms

Here are two examples that for sure are faster than the one you have (deleting a character from a string is relatively slow):
This one works by pre-allocating a string of the maximum possible length and then filling it out with the digits as I come across them in the source string. No delete for every unsupported character, and no expansion of the target string for every supported character.
FUNCTION CleanNumber(CONST S : STRING) : Int64;
VAR
I,J : Cardinal;
C : CHAR;
T : STRING;
BEGIN
SetLength(T,LENGTH(S));
J:=LOW(T);
FOR I:=LOW(S) TO HIGH(S) DO BEGIN
C:=S[I];
IF (C>='0') AND (C<='9') THEN BEGIN
T[J]:=C;
INC(J)
END
END;
IF J=LOW(T) THEN
Result:=0
ELSE BEGIN
SetLength(T,J-LOW(T)); // or T[J]:=#0 [implementation-specific]
Result:=StrToInt64(T)
END
END;
This one works by simple multiplication of the end result by 10 and adding the corresponding digit value.
{$IFOPT Q+}
{$DEFINE OverflowEnabled }
{$ELSE }
{$Q+ If you want overflow checking }
{$ENDIF }
FUNCTION CleanNumber(CONST S : STRING) : Int64;
VAR
I : Cardinal;
C : CHAR;
BEGIN
Result:=0;
FOR I:=LOW(S) TO HIGH(S) DO BEGIN
C:=S[I];
IF (C>='0') AND (C<='9') THEN Result:=Result*10+(ORD(C)-ORD('0'))
END
END;
{$IFNDEF OverflowEnabled } {$Q-} {$ENDIF }
{$UNDEF OverflowEnabled }
Also note that I don't use IN or CharInSet as these are much slower than a simple inline >= and <= comparison.
Another comment I could make is the use of LOW and HIGH on the string variable. This makes it compatible with both 0-based strings (mobile compilers) and 1-based strings (desktop compilers).
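To sanity-check the multiply-and-add variant on a few dirty inputs, a small console program along these lines could be used. This is only a sketch: the name `DigitsToInt64` is made up for illustration, and it assumes 1-based string indexing (desktop Delphi or Free Pascal) rather than the LOW/HIGH form shown above.

```pascal
program CleanNumberDemo;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical name; same multiply-and-add idea as above.
// Assumes 1-based strings (desktop compilers, Free Pascal).
function DigitsToInt64(const S: string): Int64;
var
  I: Integer;
  C: Char;
begin
  Result := 0;
  for I := 1 to Length(S) do
  begin
    C := S[I];
    // Keep digits only; every other character is skipped
    if (C >= '0') and (C <= '9') then
      Result := Result * 10 + (Ord(C) - Ord('0'));
  end;
end;

begin
  WriteLn(DigitsToInt64('1,234.56')); // 123456
  WriteLn(DigitsToInt64('#?'));       // 0
  WriteLn(DigitsToInt64(''));         // 0
  WriteLn(DigitsToInt64('007'));      // 7
end.
```

One behavioural difference to keep in mind: with more digits than fit in an Int64, StrToInt64 raises an exception, whereas this loop wraps around silently unless overflow checking ({$Q+}) is enabled.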

Your function is slow mainly because of the Delete approach. Each call to Delete needs to move a lot of characters around.
A faster approach would be like this:
function DirtyStrToNum(const S: string): Int64;
var
tmp: string;
i, j: Integer;
const
DIGITS = ['0'..'9'];
begin
SetLength(tmp, S.Length);
j := 0;
for i := 1 to S.Length do
if CharInSet(S[i], DIGITS) then
begin
Inc(j);
tmp[j] := S[i];
end;
SetLength(tmp, j);
if tmp.IsEmpty then
Result := 0
else
Result := StrToInt64(tmp);
// Or, but not equivalent: Result := StrToInt64Def(tmp, 0);
end;
Notice I make a single allocation for a new string, and then only copy the minimum number of characters to it.

Related

How to convert hex to decimal using Delphi, and hex to octal?

function HexToDec(Str: string): Integer;
var
i, M: Integer;
begin
Result:=0;
M:=1;
Str:=AnsiUpperCase(Str);
for i:=Length(Str) downto 1 do
begin
case Str[i] of
'1'..'9': Result:=Result+(Ord(Str[i])-Ord('0'))*M;
'A'..'F': Result:=Result+(Ord(Str[i])-Ord('A')+10)*M;
end;
M:=M shl 4;
end;
end;
procedure TForm1.Button1Click(Sender: TObject);
begin
if Edit1.Text<>'' then
Label2.Caption:=IntToStr(HexToDec(Edit1.Text));
end;
How can I use it without a function, because I want to use the result again on another line? And what about hex to octal: must I convert from hex to decimal and then from decimal to octal?

Delphi can do this already, so you don't need to write a function parsing the number. It is quite simple, actually:
function HexToDec(const Str: string): Integer;
begin
if (Str <> '') and ((Str[1] = '-') or (Str[1] = '+')) then
Result := StrToInt(Str[1] + '$' + Copy(Str, 2, MaxInt))
else
Result := StrToInt('$' + Str);
end;
Note that that also handles negative hex numbers, or numbers like +$1234.
How can I use it without a function, because I want to use the result again on another line?
If you want to re-use the value, assign the result of HexToDec to a variable and use that in IntToStr.
FWIW, in your function, there is no need to call AnsiUpperCase, because all hex digits fall in the ASCII range anyway. A much simpler UpperCase should work too.

My first comment would be that you are not converting hex to decimal with your function (although you are converting to decimal as an intermediate) but rather hex to integer. IntToStr then converts integer to base 10, effectively. To generalise what you want then I would create two functions - strBaseToInt and IntToStrBase where Base is meant to imply e.g. 16 for hex, 10 for dec, 8 for octal, etc., and assuming the convention adopted by hex that A=10, and so on but to (possibly) Z = 35 making the maximum base possible 36.
I don't handle + or - but that could be added easily.
In the reverse function, again for simplicity of illustration, I have omitted support for negative values.
Edit
Thanks to Rudy for this improvement
Edit 2 - Overflow test added, as per comments
function StrBaseToInt(const Str: string; const Base : integer): Integer;
var
i, iVal, iTest: Longword;
begin
if (Base > 36) or (Base < 2) then raise Exception.Create('Invalid Base');
Result:=0;
iTest := 0;
for i:=1 to Length(Str) do
begin
case Str[i] of
'0'..'9': iVal := (Ord(Str[i])-Ord('0'));
'A'..'Z': iVal := (Ord(Str[i])-Ord('A')+10);
'a'..'z': iVal := (Ord(Str[i])-Ord('a')+10);
else raise Exception.Create( 'Illegal character found');
end;
if iVal < Base then
begin
Result:=Result * Base + iVal;
if Result < iTest then // overflow test!
begin
raise Exception.Create( 'Overflow occurred');
end
else
begin
iTest := Result;
end;
end
else
begin
raise Exception.Create( 'Illegal character found');
end;
end;
end;
Then, for example your HexToOct function would look like this
function HexToOct( Value : string ) : string;
begin
Result := IntToStrBase( StrBaseToInt( Value, 16), 8 );
end;
Additional
A general function would be
function BaseToBase( const Value : string; const FromBase, ToBase : integer ) : string;
begin
Result := IntToStrBase( StrBaseToInt( Value, FromBase ),ToBase );
end;
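The reverse function IntToStrBase is referred to above but not shown. A minimal sketch of what it might look like follows; this is an assumption about its shape, not the answerer's code. It rejects negative values (in line with the remark above that negatives are not supported) and uses the same A=10 .. Z=35 digit convention. The constant name `DigitChars` is made up for this sketch.

```pascal
program IntToStrBaseDemo;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils; // Exception

const
  // Digit alphabet for bases 2..36 (hypothetical helper constant)
  DigitChars: string = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Assumed counterpart to StrBaseToInt; negative values are rejected.
function IntToStrBase(Value: Integer; Base: Integer): string;
begin
  if (Base < 2) or (Base > 36) then
    raise Exception.Create('Invalid Base');
  if Value < 0 then
    raise Exception.Create('Negative values are not supported');
  Result := '';
  repeat
    // Peel off the least significant digit and prepend it
    Result := DigitChars[(Value mod Base) + 1] + Result;
    Value := Value div Base;
  until Value = 0;
end;

begin
  WriteLn(IntToStrBase(255, 16)); // FF
  WriteLn(IntToStrBase(255, 8));  // 377
  WriteLn(IntToStrBase(0, 2));    // 0
  WriteLn(IntToStrBase(35, 36));  // Z
end.
```

With something like this in place, HexToOct('FF') as defined above would be expected to return '377'.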

Only allow certain characters in a string

I am trying to validate a string, whereby it can contain all alphabetical and numerical characters, as well as the underscore ( _ ) symbol.
This is what I tried so far:
var
S: string;
const
Allowed = ['A'..'Z', 'a'..'z', '0'..'9', '_'];
begin
S := 'This_is_my_string_0123456789';
if Length(S) > 0 then
begin
if (Pos(Allowed, S) > 0 then
ShowMessage('Ok')
else
ShowMessage('string contains invalid symbols');
end;
end;
In Lazarus this errors with:
Error: Incompatible type for arg no. 1: Got "Set Of Char", expected
"Variant"
Clearly my use of Pos is all wrong and I am not sure if my approach is even the correct way of going about it or not?
Thanks.

You will have to check every single character of the string, if it's contained in Allowed
e.g.:
var
S: string;
const
Allowed = ['A' .. 'Z', 'a' .. 'z', '0' .. '9', '_'];
Function Valid: Boolean;
var
i: Integer;
begin
Result := Length(s) > 0;
i := 1;
while Result and (i <= Length(S)) do
begin
Result := Result AND (S[i] in Allowed);
inc(i);
end;
if Length(s) = 0 then Result := true;
end;
begin
S := 'This_is_my_string_0123456789';
if Valid then
ShowMessage('Ok')
else
ShowMessage('string contains invalid symbols');
end;

TYPE TCharSet = SET OF CHAR;
FUNCTION ValidString(CONST S : STRING ; CONST ValidChars : TCharSet) : BOOLEAN;
VAR
I : Cardinal;
BEGIN
Result:=FALSE;
FOR I:=1 TO LENGTH(S) DO IF NOT (S[I] IN ValidChars) THEN EXIT;
Result:=TRUE
END;
If you are using a Unicode version of Delphi (as you seem to be), beware that a SET OF CHAR cannot contain all valid characters in the Unicode character set. Then perhaps this function will be useful instead:
FUNCTION ValidString(CONST S,ValidChars : STRING) : BOOLEAN;
VAR
I : Cardinal;
BEGIN
Result:=FALSE;
FOR I:=1 TO LENGTH(S) DO IF POS(S[I],ValidChars)=0 THEN EXIT;
Result:=TRUE
END;
but then again, not all characters (actually Codepoints) in Unicode can be expressed by a single character, and some characters can be expressed in more than one way (both as a single character and as a multi-character).
But as long as you constrain yourself within these limitations, one of the above functions should be useful. You can even include both, if you add an "OVERLOAD;" directive to the end of each function declaration, as in:
FUNCTION ValidString(CONST S : STRING ; CONST ValidChars : TCharSet) : BOOLEAN; OVERLOAD;
FUNCTION ValidString(CONST S,ValidChars : STRING) : BOOLEAN; OVERLOAD;

Lazarus/Free Pascal doesn't overload Pos for sets, but unit strutils has "PosSet" variants for that:
http://www.freepascal.org/docs-html/rtl/strutils/posset.html
Regarding Andreas' (IMHO correct ) remark, you can use isemptystr for that. It was meant to check for strings that only contain whitespace, but it basically checks if a string only contains characters in a set.
http://www.freepascal.org/docs-html/rtl/strutils/isemptystr.html

You can use Regular Expressions:
uses System.RegularExpressions;
if not TRegEx.IsMatch(S, '^[_a-zA-Z0-9]+$') then
ShowMessage('string contains invalid symbols');

How to count number of occurrences of a certain char in string?

How can I count the number of occurrences of a certain character in a string in Delphi?
For instance, assume that I have the following string and would like to count the number of commas in it:
S := '1,2,3';
Then I would like to obtain 2 as the result.

You can use this simple function:
function OccurrencesOfChar(const S: string; const C: char): integer;
var
i: Integer;
begin
result := 0;
for i := 1 to Length(S) do
if S[i] = C then
inc(result);
end;

Even though an answer has already been accepted, I'm posting the more general function below because I find it so elegant. This solution is for counting the occurrences of a string rather than a character.
{ Returns a count of the number of occurences of SubText in Text }
function CountOccurences( const SubText: string;
const Text: string): Integer;
begin
Result := Pos(SubText, Text);
if Result > 0 then
Result := (Length(Text) - Length(StringReplace(Text, SubText, '', [rfReplaceAll]))) div Length(subtext);
end; { CountOccurences }

And for those who prefer the enumerator loop in modern Delphi versions (not any better than the accepted solution by Andreas, just an alternative solution):
function OccurrencesOfChar(const ContentString: string;
const CharToCount: char): integer;
var
C: Char;
begin
result := 0;
for C in ContentString do
if C = CharToCount then
Inc(result);
end;

This one can do the job if you're not handling large text:
...
uses RegularExpressions;
...
function CountChar(const s: string; const c: char): integer;
begin
Result:= TRegEx.Matches(s, TRegEx.Escape(c)).Count; // escape c so that '.', '*' etc. are counted literally
end;

You can also take advantage of the StringReplace function:
function OccurencesOfChar(ContentString:string; CharToCount:char):integer;
begin
// Note: rfIgnoreCase makes the count case-insensitive ('a' also counts 'A'); drop it for an exact match.
Result:= Length(ContentString)-Length(StringReplace(ContentString, CharToCount,'', [rfReplaceAll, rfIgnoreCase]));
end;

Simple solution and good performance (I wrote for Delphi 7, but should work for other versions as well):
function CountOccurences(const ASubString: string; const AString: string): Integer;
var
iOffset: Integer;
iSubStrLen: Integer;
begin
Result := 0;
if (ASubString = '') or (AString = '') then
Exit;
iOffset := 1;
iSubStrLen := Length(ASubString);
while (True) do
begin
iOffset := PosEx(ASubString, AString, iOffset);
if (iOffset = 0) then
Break;
Inc(Result);
Inc(iOffset, iSubStrLen);
end;
end;
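One detail worth knowing about the PosEx loop above: because the offset jumps past each match, it counts non-overlapping occurrences. The sketch below makes the difference visible; the function name `CountSub` and its `AllowOverlap` parameter are invented for illustration. It assumes PosEx from StrUtils (System.StrUtils in newer Delphi versions).

```pascal
program CountDemo;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  StrUtils; // PosEx

// Hypothetical helper: counts occurrences of ASub in AText.
// AllowOverlap = False advances past each match (as in the loop above),
// AllowOverlap = True advances by one character only.
function CountSub(const ASub, AText: string; AllowOverlap: Boolean): Integer;
var
  Offset, Step: Integer;
begin
  Result := 0;
  if (ASub = '') or (AText = '') then
    Exit;
  if AllowOverlap then
    Step := 1
  else
    Step := Length(ASub);
  Offset := PosEx(ASub, AText, 1);
  while Offset > 0 do
  begin
    Inc(Result);
    Offset := PosEx(ASub, AText, Offset + Step);
  end;
end;

begin
  WriteLn(CountSub(',', '1,2,3', False));  // 2
  WriteLn(CountSub('aa', 'aaaa', False));  // 2
  WriteLn(CountSub('aa', 'aaaa', True));   // 3
end.
```

For single characters the two modes give the same answer, so the distinction only matters when counting substrings.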

Ummm... Am I missing something? Why not just...
kSepChar:=',';//to count commas
bLen:=length(sLineToCheck);
bCount:=0;//The number of kSepChars seen so far.
bPosn:=1;//First character in string is at position 1
for bPosn:=1 to bLen do begin
if sLineToCheck[bPosn]=kSepChar then inc(bCount);
end;//

How can I test my WideReplace function?

OS: Hungarian Windows (Windows 1250)
Under Delphi 6 Prof there is no WideStringPos, WideStringCopy, WideStringReplace...
But in an XML based project I need to use them.
Because of that, I tried to write "something like" these functions.
But I'm not sure they are working as I want...
Because Delphi converts the Wide to Ansi and reverse in the background, I cannot be sure that my code is safe from these side effects... :-)
The code is very primitive - I need the solution quickly...
function WideStringCopy(WWhat : WideString; From, HowMany : integer) : WideString;
var
i : integer;
l : integer;
wc : WideChar;
begin
Result := '';
if WWhat = ''
then Exit;
if (HowMany <= 0)
then Exit;
if (From < 1)
then From := 1;
l := From + HowMany - 1;
if l > Length(WWhat)
then l := Length(WWhat);
for i := From to l do begin
wc := WWhat[i];
Result := Result + wc;
end;
end;
function WideStringPos(WWhere, WWhat : WideString) : integer;
var
wscomp : WideString;
i : integer;
begin
Result := 0;
for i := 1 to Length(WWhere) do begin
wscomp := WideStringCopy(WWhere, i, LengtH(WWhat));
if WideSameStr(wscomp, WWhat)
then begin
Result := i;
Exit;
end;
end;
end;
function WideStringReplace(WWhere, WFrom, WTo : WideString) : WideString;
var
actpos : integer;
wcomp : WideString;
wc : WideChar;
begin
Result := '';
actpos := 1;
while actpos <= Length(WWhere) do begin
wcomp := WideStringCopy(WWhere, actpos, Length(WFrom));
if WideSameStr(wcomp, WFrom) then begin
Result := Result + WTo;
inc(actpos, Length(WFrom));
end else begin
wc := WWhere[actpos];
Result := Result + wc;
inc(actpos);
end;
end;
end;
I have two questions about it:
Do you see any piece of code that would surely produce a bad result (silently converting Wide to Ansi and losing characters)?
Do you know some characters with which I can test this code?
For example, a chr(XXX) that survives when my functions keep to the Wide rules, but is lost if my code is wrong...
Thanks for any information you can give...

Do you know some characters with which I can test this code?
Any code page other than Win1250, for example Cyrillic Win1251, Greek or Hebrew: almost all letters there are missing from 1250/1252.
You can take Jedi CodeLibrary and use its locale conversion routines: make a string consisting of #128 till #255 in some encoding like aforementioned, convert it to Unicode from that codepage and then convert back from Unicode to Hungarian codepage.
function StringToWideStringEx(const S: AnsiString; CodePage: Word): WideString;
function WideStringToStringEx(const WS: WideString; CodePage: Word): AnsiString;
Or in one call
function TranslateString(const S: AnsiString; CP1, CP2: Word): AnsiString;
Then look which chars failed to translate and turned into ReplacementCharacter.
However in JCL you'd have your Pos function and such ready to use. And XML parser. So why bother ?
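If pulling in JCL is not an option, test data can also be built directly from code points that have no Windows-1250 mapping, and the result compared code unit by code unit. The sketch below is only an illustration under those assumptions: `DumpCodeUnits` is a made-up helper, and U+0416 (Cyrillic Zhe) and U+03A9 (Greek Omega) are just two convenient choices. Dumping hex values avoids depending on what the console can display.

```pascal
program WideTestSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils; // IntToHex

// Hypothetical helper: hex dump of the UTF-16 code units of W,
// so two WideStrings can be compared without printing the characters.
function DumpCodeUnits(const W: WideString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(W) do
    Result := Result + IntToHex(Ord(W[i]), 4) + ' ';
end;

var
  W, RoundTrip: WideString;
  A: AnsiString;
begin
  // 'a', Cyrillic Zhe, Greek Omega, 'b': the middle two are not in Windows-1250
  SetLength(W, 4);
  W[1] := WideChar($0061);
  W[2] := WideChar($0416);
  W[3] := WideChar($03A9);
  W[4] := WideChar($0062);
  WriteLn(DumpCodeUnits(W)); // 0061 0416 03A9 0062

  // Deliberate Wide -> Ansi -> Wide round trip; the result depends on the
  // system code page, so no particular output is assumed here.
  A := AnsiString(W);
  RoundTrip := WideString(A);
  WriteLn(DumpCodeUnits(RoundTrip));
  if DumpCodeUnits(RoundTrip) <> DumpCodeUnits(W) then
    WriteLn('round trip through AnsiString was lossy');
end.
```

The same comparison can be applied to the output of WideStringCopy, WideStringPos or WideStringReplace: feed them W, dump the result, and check that the 0416 and 03A9 code units are still there. If an Ansi conversion happened somewhere, they will typically have turned into 003F ('?') or some look-alike character.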

StringReplace alternatives to improve performance

I am using StringReplace to replace &gt; and &lt; by the characters themselves in a generated XML like this:
StringReplace(xml.Text,'&gt;','>',[rfReplaceAll]) ;
StringReplace(xml.Text,'&lt;','<',[rfReplaceAll]) ;
The thing is, it takes way too long to replace every occurrence of &gt;.
Do you propose any better idea to make it faster?

If you're using Delphi 2009, this operation is about 3 times faster with TStringBuilder than with StringReplace. It's Unicode safe, too.
I used the text from http://www.CodeGear.com with all occurrences of "<" and ">" changed to "&lt;" and "&gt;" as my starting point.
Including string assignments and creating/freeing objects, these took about 25ms and 75ms respectively on my system:
function TForm1.TestStringBuilder(const aString: string): string;
var
sb: TStringBuilder;
begin
StartTimer;
sb := TStringBuilder.Create;
sb.Append(aString);
sb.Replace('&gt;', '>');
sb.Replace('&lt;', '<');
Result := sb.ToString();
FreeAndNil(sb);
StopTimer;
end;
function TForm1.TestStringReplace(const aString: string): string;
begin
StartTimer;
Result := StringReplace(aString,'&gt;','>',[rfReplaceAll]) ;
Result := StringReplace(Result,'&lt;','<',[rfReplaceAll]) ;
StopTimer;
end;

Try FastStrings.pas from Peter Morris.

You should definitely look at the Fastcode project pages: http://fastcode.sourceforge.net/
They ran a challenge for a faster StringReplace (Ansi StringReplace challenge), and the 'winner' was 14 times faster than the Delphi RTL.
Several of the fastcode functions have been included within Delphi itself in recent versions (D2007 on, I think), so the performance improvement may vary dramatically depending on which Delphi version you are using.
As mentioned before, you should really be looking at a Unicode-based solution if you're serious about processing XML.

The problem is that you are iterating over the entire string twice (once to replace &gt; by > and once more to replace &lt; by <).
You should iterate with a for loop and, whenever you find a &, look ahead for gt; or lt;, do the replacement immediately, and then skip the 3 characters ((g|l)t;). This way it runs in time proportional to the size of the string xml.Text.
A simple C# example, as I do not know Delphi, but it should give you the general idea.
String s = "&lt;xml&gt;test&lt;/xml&gt;";
char[] input = s.ToCharArray();
char[] res = new char[s.Length];
int j = 0;
for (int i = 0, count = input.Length; i < count; ++i)
{
if (input[i] == '&')
{
if (i < count - 3)
{
if (input[i + 1] == 'l' || input[i + 1] == 'g')
{
if (input[i + 2] == 't' && input[i + 3] == ';')
{
res[j++] = input[i + 1] == 'l' ? '<' : '>';
i += 3;
continue;
}
}
}
}
res[j++] = input[i];
}
Console.WriteLine(new string(res, 0, j));
This outputs:
<xml>test</xml>

When you are dealing with multiline text files, you can gain some performance by processing them line by line. This approach reduced the time for the replacements by about 90% on a >1MB XML file.
procedure ReplaceMultilineString(xml: TStrings);
var
i: Integer;
line: String;
begin
for i:=0 to xml.Count-1 do
begin
line := xml[i];
line := StringReplace(line, '&gt;', '>', [rfReplaceAll]);
line := StringReplace(line, '&lt;', '<', [rfReplaceAll]);
xml[i] := line;
end;
end;

Untested conversion of the C# code written by Jorge Ferreira.
function ReplaceLtGt(const s: string): string;
var
inPtr, outPtr: integer;
begin
SetLength(Result, Length(s));
inPtr := 1;
outPtr := 1;
while inPtr <= Length(s) do begin
if (s[inPtr] = '&') and ((inPtr + 3) <= Length(s)) and
(s[inPtr+1] in ['l', 'g']) and (s[inPtr+2] = 't') and
(s[inPtr+3] = ';') then
begin
if s[inPtr+1] = 'l' then
Result[outPtr] := '<'
else
Result[outPtr] := '>';
Inc(inPtr, 4); // skip the whole 4-character entity (&lt; or &gt;)
end
else begin
Result[outPtr] := s[inPtr]; // copy from the source string, not from Result
Inc(inPtr);
end;
Inc(outPtr);
end;
SetLength(Result, outPtr - 1);
end;

Systools (Turbopower, now open source) has a ReplaceStringAllL function that does all of them in a string.

This one works like a charm and is very fast (PosEx needs StrUtils in the uses clause):
Function NewStringReplace(const S, OldPattern, NewPattern: string; Flags: TReplaceFlags): string;
var
OldPat,Srch: string; // Srch and OldPat can contain uppercase versions of S,OldPattern
PatLength,NewPatLength,P,i,PatCount,PrevP: Integer;
c,d: pchar;
begin
PatLength:=Length(OldPattern);
if PatLength=0 then begin
Result:=S;
exit;
end;
if rfIgnoreCase in Flags then begin
Srch:=AnsiUpperCase(S);
OldPat:=AnsiUpperCase(OldPattern);
end else begin
Srch:=S;
OldPat:=OldPattern;
end;
PatLength:=Length(OldPat);
if Length(NewPattern)=PatLength then begin
//Result length will not change
Result:=S;
P:=1;
repeat
P:=PosEx(OldPat,Srch,P);
if P>0 then begin
for i:=1 to PatLength do
Result[P+i-1]:=NewPattern[i];
if not (rfReplaceAll in Flags) then exit;
inc(P,PatLength);
end;
until p=0;
end else begin
//Different pattern length -> Result length will change
//To avoid creating a lot of temporary strings, we count how many
//replacements we're going to make.
P:=1; PatCount:=0;
repeat
P:=PosEx(OldPat,Srch,P);
if P>0 then begin
inc(P,PatLength);
inc(PatCount);
if not (rfReplaceAll in Flags) then break;
end;
until p=0;
if PatCount=0 then begin
Result:=S;
exit;
end;
NewPatLength:=Length(NewPattern);
SetLength(Result,Length(S)+PatCount*(NewPatLength-PatLength));
P:=1; PrevP:=0;
c:=pchar(Result); d:=pchar(S);
repeat
P:=PosEx(OldPat,Srch,P);
if P>0 then begin
for i:=PrevP+1 to P-1 do begin
c^:=d^;
inc(c); inc(d);
end;
for i:=1 to NewPatLength do begin
c^:=NewPattern[i];
inc(c);
end;
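if not (rfReplaceAll in Flags) then begin
// Single replacement: copy the rest of S after the match before leaving
inc(d,PatLength);
for i:=P+PatLength to Length(S) do begin
c^:=d^;
inc(c); inc(d);
end;
exit;
end;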
inc(P,PatLength);
inc(d,PatLength);
PrevP:=P-1;
end else begin
for i:=PrevP+1 to Length(S) do begin
c^:=d^;
inc(c); inc(d);
end;
end;
until p=0;
end;
end;
