It seems there is no Ansi overload for StrToInt. Is this right? Or maybe I am missing something.
StrToInt insists on converting my AnsiStrings to string.
You are correct. There is no ANSI version of StrToInt. The place to find ANSI versions of standard functions is the AnsiStrings unit, and there's nothing there.
Either write your own function to do the job, or accept the conversion required to use StrToInt.
It's not too hard to write your own function. It might look like this:
uses
  SysUtils, // for EConvertError
  SysConst; // for SInvalidInteger
....
{$OVERFLOWCHECKS OFF}
{$RANGECHECKS OFF}
function AnsiStrToInt(const s: AnsiString): Integer;
  procedure Error;
  begin
    raise EConvertError.CreateResFmt(@SInvalidInteger, [s]);
  end;
var
  Index, Len, Digit: Integer;
  Negative: Boolean;
begin
  Index := 1;
  Result := 0;
  Negative := False;
  Len := Length(s);
  // skip leading spaces
  while (Index <= Len) and (s[Index] = ' ') do
    inc(Index);
  if Index > Len then
    Error;
  // optional sign
  case s[Index] of
    '-', '+':
      begin
        Negative := s[Index] = '-';
        inc(Index);
        if Index > Len then
          Error;
      end;
  end;
  while Index <= Len do
  begin
    Digit := ord(s[Index]) - ord('0');
    if (Digit < 0) or (Digit > 9) then
      Error;
    // Result * 10 would overflow: reject before multiplying
    if Result > High(Integer) div 10 then
      Error;
    Result := Result * 10 + Digit;
    // Result * 10 + Digit wrapped past High(Integer)
    if Result < 0 then
      Error;
    inc(Index);
  end;
  if Negative then
    Result := -Result;
end;
This is a cut-down version of that found in StrToInt. It does not handle hexadecimal and is a bit more stringent regarding errors. Before using this code I'd want to test whether or not this really is your bottleneck.
It is quite interesting that this code, based on that in the RTL source, is incapable of returning low(Integer). It's not too hard to fix that up, but it would make the code more complex.
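One way to do that fix-up is to accumulate the magnitude in an Int64, which can hold 2147483648, and compare it against a limit that depends on the sign. This is a sketch, not the RTL's approach; the name AnsiStrToIntFull and its error message are made up for illustration, and otherwise it follows the same rules as above (leading spaces, optional sign, decimal digits only).

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // also compiles with Free Pascal
uses
  SysUtils;

// Hypothetical variant of AnsiStrToInt that can return Low(Integer).
function AnsiStrToIntFull(const s: AnsiString): Integer;

  procedure Error;
  begin
    raise EConvertError.CreateFmt('''%s'' is not a valid integer value', [string(s)]);
  end;

var
  Index, Len: Integer;
  Negative: Boolean;
  Acc, Limit: Int64;
begin
  Len := Length(s);
  Index := 1;
  // skip leading spaces
  while (Index <= Len) and (s[Index] = ' ') do
    Inc(Index);
  // optional sign
  Negative := False;
  if (Index <= Len) and ((s[Index] = '-') or (s[Index] = '+')) then
  begin
    Negative := s[Index] = '-';
    Inc(Index);
  end;
  if Index > Len then
    Error;
  // the magnitude of Low(Integer) is one larger than High(Integer)
  if Negative then
    Limit := Int64(High(Integer)) + 1
  else
    Limit := High(Integer);
  Acc := 0;
  while Index <= Len do
  begin
    if (s[Index] < '0') or (s[Index] > '9') then
      Error;
    // Acc <= Limit <= 2^31 here, so this cannot overflow an Int64
    Acc := Acc * 10 + (Ord(s[Index]) - Ord('0'));
    if Acc > Limit then
      Error;
    Inc(Index);
  end;
  if Negative then
    Acc := -Acc;
  Result := Integer(Acc);
end;
```

With this version AnsiStrToIntFull('-2147483648') returns Low(Integer), while '2147483648' still raises EConvertError. The price is 64-bit arithmetic per digit, which is unlikely to matter unless this really is the bottleneck.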
The code is actually very simple (hex strings aren't supported, but you probably don't need them):
{$OVERFLOWCHECKS OFF} // Result * 10 + Digit is allowed to wrap; it is checked below
function AnsiStrToInt(const S: RawByteString): Integer;
var
  P: PByte;
  Negative: Boolean;
  Digit: Integer;
begin
  P:= Pointer(S);
  // an empty string is a nil pointer, so it must not be dereferenced
  if P = nil then
    raise Exception.Create('No data');
  // skip leading spaces
  while (P^ = Ord(' ')) do Inc(P);
  Negative:= False;
  if (P^ = Ord('-')) then begin
    Negative:= True;
    Inc(P);
  end
  else if (P^ = Ord('+')) then Inc(P);
  if P^ = 0 then
    raise Exception.Create('No data');
  Result:= 0;
  repeat
    if Cardinal(Result) > Cardinal(High(Result) div 10) then
      raise Exception.Create('Integer overflow');
    Digit:= P^ - Ord('0');
    if (Digit < 0) or (Digit > 9) then
      raise Exception.Create('Invalid char');
    Result:= Result * 10 + Digit;
    if (Result < 0) then begin
      // only the magnitude of Low(Integer) may wrap to a negative value
      if not Negative or (Cardinal(Result) <> Cardinal(Low(Result))) then
        raise Exception.Create('Integer overflow');
    end;
    Inc(P);
  until (P^ = 0);
  if Negative then Result:= -Result;
end;
I followed this tip:
How to convert AnsiString to UnicodeString in Delphi XE4
Example:
var
  a : AnsiString;
  b : String;
  c : Integer;
begin
  a := '123';
  b := String(a); // explicit AnsiString -> UnicodeString conversion
  c := StrToInt(b);
end;
I have a string like '10011011001', and I want to convert it into a hex string. What is the best way to do that?
The OP clarified that the input string's length is <= 32. Then the problem becomes simpler.
There are many possible solutions. One of them is this:
function BinStrToHex32(const S: string): string;
begin
  var LValue: UInt32 := 0;
  for var i := 1 to S.Length do
    case S[i] of
      '0', '1':
        LValue := LValue shl 1 or Ord(S[i] = '1');
    else
      raise Exception.CreateFmt('Invalid binary number: %s', [S]);
    end;
  Result := IntToHex(LValue);
end;
which IMHO is quite readable and performs some validation. (For bonus points, you can add overflow checking.)
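For the bonus points, one way to add overflow checking is to test bit 31 before each shift, since a set top bit would be shifted out and lost. This is a sketch, not part of the original answer: BinStrToHex32Checked is a made-up name, it avoids inline variables so it also compiles on older Delphi versions, and it calls the two-argument IntToHex with an Int64 cast and a minimum of 1 digit, so the result is not zero-padded.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // also compiles with Free Pascal
uses
  SysUtils;

// Hypothetical variant of BinStrToHex32 that rejects values wider than 32 bits.
// Leading zeros are allowed, so '0' followed by 32 ones is still accepted.
function BinStrToHex32Checked(const S: string): string;
var
  i: Integer;
  LValue: Cardinal;
begin
  if S = '' then
    raise EConvertError.Create('Empty binary number');
  LValue := 0;
  for i := 1 to Length(S) do
  begin
    if (S[i] <> '0') and (S[i] <> '1') then
      raise EConvertError.CreateFmt('Invalid binary number: %s', [S]);
    // If bit 31 is already set, the next shift would lose it
    if (LValue and $80000000) <> 0 then
      raise EConvertError.CreateFmt('Binary number exceeds 32 bits: %s', [S]);
    LValue := LValue shl 1;
    if S[i] = '1' then
      LValue := LValue or 1;
  end;
  // Int64 overload of IntToHex with a minimum of 1 digit: no zero padding
  Result := IntToHex(Int64(LValue), 1);
end;
```

Checking the top bit rather than Length(S) > 32 is a deliberate choice: it still accepts inputs with redundant leading zeros.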
If there were no restriction to the input string length, then I'd do something like this:
function BinStrToHexStr(const S: string): string;
const
  HexDigits: array[0..$F] of Char = '0123456789ABCDEF';
begin
  if S.Length mod 8 <> 0 then
    raise Exception.Create('Invalid binary string.');
  SetLength(Result, S.Length div 4);
  var LNibble: Byte := 0;
  var c := 0;
  for var i := 1 to S.Length do
  begin
    LNibble := LNibble shl 1 or Ord(S[i] = '1');
    if i mod 4 = 0 then
    begin
      Inc(c);
      Result[c] := HexDigits[LNibble];
      LNibble := 0;
    end;
  end;
end;
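BinStrToHexStr above only accepts lengths that are a multiple of 8 and treats any character other than '1' as a 0. If inputs of any length should be accepted (an assumption about the desired behaviour, not something the question specified), one option is to left-pad with '0' to a whole number of nibbles and reject other characters. BinStrToHexPadded is a hypothetical name; the sketch avoids inline variables for older compilers.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // also compiles with Free Pascal
uses
  SysUtils;

// Hypothetical: convert a binary string of any length to hex by left-padding
// it with '0' to a multiple of 4 bits, then mapping each group of 4 to a digit.
function BinStrToHexPadded(const S: string): string;
const
  HexDigits: array[0..15] of Char = ('0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'A', 'B', 'C', 'D', 'E', 'F');
var
  Padded: string;
  i, j, Nibble: Integer;
begin
  if S = '' then
    raise EConvertError.Create('Empty binary string');
  // Left-pad so that the most significant group of 4 bits is complete
  Padded := StringOfChar('0', (4 - Length(S) mod 4) mod 4) + S;
  SetLength(Result, Length(Padded) div 4);
  Nibble := 0;
  j := 0;
  for i := 1 to Length(Padded) do
  begin
    case Padded[i] of
      '0': Nibble := Nibble shl 1;
      '1': Nibble := (Nibble shl 1) or 1;
    else
      raise EConvertError.CreateFmt('Invalid binary string: %s', [S]);
    end;
    // Every 4th bit completes a nibble
    if i mod 4 = 0 then
    begin
      Inc(j);
      Result[j] := HexDigits[Nibble];
      Nibble := 0;
    end;
  end;
end;
```

For the string in the question, '10011011001' is padded to '010011011001', i.e. 0100 1101 1001, giving '4D9'.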
I'm struggling with ANSI escape codes in strings. I'm getting [32m, [37m, [K, etc. in my text.
Is there a quicker way to strip the ANSI codes from the strings I get than looping through the chars and searching for the beginning and end of each code?
I know the sequence is something like this: #27'[x;y;zm'
where x, y, z... are the ANSI codes written as decimal digits. So I assume I should search for #27 and then skip until I find the final 'm'.
Are there any already made functions to achieve what I want? My search returned nothing except this article.
Thanks
You can handle this protocol very quickly with code like this (the simplest finite-state machine):
var
  s: AnsiString;
  i: Integer;
  InColorCode: Boolean;
begin
  s := 'test'#27'[5;30;47m colored text';
  InColorCode := False;
  for i := 1 to Length(s) do
    if InColorCode then
      case s[i] of
        // Collect the parameter digits here and update the current
        // attributes, e.g. 0 = normal, 47 = white background.
        'm': InColorCode := False;
      else
        ; // ';', '[' and the digits are ignored here; treat them if necessary
      end
    else if s[i] = #27 then
      InColorCode := True
    else
      Write(s[i]); // output the char with the current attributes
end;
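In the data the question describes ([32m, [37m), the parameters are ASCII decimal digits separated by ';', not raw character codes, so the "collect the parameters" step of the state machine amounts to splitting that text into integers. A minimal sketch, assuming the caller has already extracted the text between ESC'[' and the final 'm'; ParseSgrParams, TSgrParams and SgrParamsToStr are hypothetical names. An empty parameter is treated as 0, matching the convention that ESC[m means the same as ESC[0m (reset).

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // also compiles with Free Pascal
uses
  SysUtils;

type
  TSgrParams = array of Integer;

// Split the parameter part of an SGR sequence (e.g. '5;30;47') into integers.
function ParseSgrParams(const Params: AnsiString): TSgrParams;
var
  i, Count, Value: Integer;
begin
  SetLength(Result, 0);
  Count := 0;
  Value := 0;
  // One extra iteration (i = Length + 1) flushes the last parameter
  for i := 1 to Length(Params) + 1 do
  begin
    if (i > Length(Params)) or (Params[i] = ';') then
    begin
      // End of one parameter: store it and start the next one
      Inc(Count);
      SetLength(Result, Count);
      Result[Count - 1] := Value;
      Value := 0;
    end
    else if (Params[i] >= '0') and (Params[i] <= '9') then
      Value := Value * 10 + (Ord(Params[i]) - Ord('0'))
    else
      raise EConvertError.CreateFmt('Unexpected character in SGR parameters: %s',
        [string(Params)]);
  end;
end;

// Helper for display/debugging: join the parameters with ','
function SgrParamsToStr(const P: TSgrParams): string;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to High(P) do
  begin
    if i > 0 then
      Result := Result + ',';
    Result := Result + IntToStr(P[i]);
  end;
end;
```

For example, SgrParamsToStr(ParseSgrParams('5;30;47')) gives '5,30,47'. A renderer would then map each value to an attribute: 0 resets, 30..37 select a foreground colour and 40..47 a background colour.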
Stripping ESC codes from a string:
procedure StripEscCode(var s: AnsiString);
const
  StartChar: AnsiChar = #27;
  EndChar: AnsiChar = 'm';
var
  i, cnt: integer;
  InEsc: Boolean;
begin
  cnt := 0;
  InEsc := False;
  for i := 1 to Length(s) do
    if InEsc then begin
      // inside a code: drop everything up to and including EndChar
      InEsc := s[i] <> EndChar;
      Inc(cnt)
    end
    else begin
      InEsc := s[i] = StartChar;
      if InEsc then
        Inc(cnt)
      else
        s[i - cnt] := s[i]; // shift kept chars left over the removed ones
    end;
  SetLength(s, Length(s) - cnt);
end;
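Note that StripEscCode ends every code at 'm', but the question also mentions [K (erase in line), which ends with 'K'; on such input StripEscCode keeps deleting text until the next 'm'. In ECMA-48 terms these are CSI sequences: ESC '[' followed by parameter bytes ('0'..'?'), optional intermediate bytes (' '..'/') and one final byte in the range '@'..'~'. A sketch of a stripper based on that rule; StripCsiSequences is a hypothetical name, and it assumes that an ESC not followed by '[' starts a two-byte escape, dropping both bytes.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // also compiles with Free Pascal
uses
  SysUtils;

// Hypothetical: remove CSI sequences (ESC '[' ... final byte), which covers
// SGR colour codes such as ESC[32m as well as ESC[K.
function StripCsiSequences(const s: AnsiString): AnsiString;
type
  TState = (stText, stAfterEsc, stInCsi);
var
  i, Count: Integer;
  State: TState;
begin
  // One allocation for the worst case, trimmed at the end
  SetLength(Result, Length(s));
  Count := 0;
  State := stText;
  for i := 1 to Length(s) do
    case State of
      stText:
        if s[i] = #27 then
          State := stAfterEsc
        else
        begin
          Inc(Count);
          Result[Count] := s[i];
        end;
      stAfterEsc:
        if s[i] = '[' then
          State := stInCsi
        else
          State := stText; // ESC + one byte: drop this byte as well
      stInCsi:
        // parameter and intermediate bytes continue the sequence;
        // a byte in '@'..'~' is the final byte and ends it
        if (s[i] >= '@') and (s[i] <= '~') then
          State := stText;
    end;
  SetLength(Result, Count);
end;
```

Because '[' itself lies in '@'..'~', the sketch needs the separate stAfterEsc state; otherwise the introducer would be mistaken for a final byte.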
I need to strip all non-standard text characters from a string: I need to remove all non-ASCII and control characters (except line feeds/carriage returns).
And here's a variant of Cosmin's that only walks the string once, but uses an efficient allocation pattern:
function StrippedOfNonAscii(const s: string): string;
var
  i, Count: Integer;
begin
  // allocate for the worst case (nothing removed), trim at the end
  SetLength(Result, Length(s));
  Count := 0;
  for i := 1 to Length(s) do begin
    // printable ASCII (DEL, #127, is a control char) plus LF and CR
    if ((s[i] >= #32) and (s[i] <= #126)) or (s[i] = #10) or (s[i] = #13) then begin
      inc(Count);
      Result[Count] := s[i];
    end;
  end;
  SetLength(Result, Count);
end;
Something like this should do:
// For those who need a disclaimer:
// This code is meant as a sample to show you how the basic check for non-ASCII characters works.
// It will perform poorly on long strings or when called often.
// Use a TStringBuilder, or SetLength & Integer loop index to optimize.
// If you need really optimized code, pass this on to the FastCode people.
function StripNonAsciiExceptCRLF(const Value: AnsiString): AnsiString;
var
  AnsiCh: AnsiChar;
begin
  Result := ''; // a string Result may not start out empty
  for AnsiCh in Value do
    // keep printable ASCII plus CR and LF
    if ((AnsiCh >= #32) and (AnsiCh <= #126)) or (AnsiCh = #13) or (AnsiCh = #10) then
      Result := Result + AnsiCh;
end;
For UnicodeString you can do something similar.
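A UnicodeString counterpart might look like the sketch below (StripNonAsciiExceptCRLFW is a hypothetical name). It uses the single-allocation pattern from the other answers and compares ordinals directly instead of testing membership in a set, which also avoids Unicode Delphi's W1050 "WideChar reduced to byte char in set expressions" warning. Like the version above, it treats DEL (#127) as a control character and removes it.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // also compiles with Free Pascal
uses
  SysUtils;

// Hypothetical: keep printable ASCII plus CR and LF, drop everything else.
function StripNonAsciiExceptCRLFW(const Value: UnicodeString): UnicodeString;
var
  i, Count: Integer;
  Ch: WideChar;
begin
  // One allocation for the worst case, trimmed at the end
  SetLength(Result, Length(Value));
  Count := 0;
  for i := 1 to Length(Value) do
  begin
    Ch := Value[i];
    if ((Ch >= #32) and (Ch <= #126)) or (Ch = #13) or (Ch = #10) then
    begin
      Inc(Count);
      Result[Count] := Ch;
    end;
  end;
  SetLength(Result, Count);
end;
```

Characters outside the BMP arrive as surrogate pairs (both halves are >= #$D800), so they are removed by the same range test.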
If you don't need to do it in place but can generate a copy of the string, try this code:
type CharSet=Set of Char;
function StripCharsInSet(s:string; c:CharSet):string;
var i:Integer;
begin
  result:='';
  for i:=1 to Length(s) do
    if not (s[i] in c) then
      result:=result+s[i];
end;
and use it like this:
s := StripCharsInSet(s,[#0..#9,#11,#12,#14..#31,#127]);
EDIT: added #127 for DEL ctrl char.
EDIT2: this is a faster version, thanks ldsandon
function StripCharsInSet(s:string; c:CharSet):string;
var i,j:Integer;
begin
  SetLength(result,Length(s));
  j:=0;
  for i:=1 to Length(s) do
    if not (s[i] in c) then
    begin
      inc(j);
      result[j]:=s[i];
    end;
  SetLength(result,j);
end;
Here's a version that doesn't build the string by appending char by char, but allocates the whole string in one go. It requires going over the string twice, once to count the "good" chars and once to actually copy them, but it's worth it because it avoids multiple reallocations:
function StripNonAscii(s:string):string;
var Count, i:Integer;
begin
  Count := 0;
  for i:=1 to Length(s) do
    if ((s[i] >= #32) and (s[i] <= #126)) or (s[i] = #10) or (s[i] = #13) then
      Inc(Count);
  if Count = Length(s) then
    Result := s // No characters need to be removed, return the original string (no mem allocation!)
  else
  begin
    SetLength(Result, Count);
    Count := 1;
    for i:=1 to Length(s) do
      if ((s[i] >= #32) and (s[i] <= #126)) or (s[i] = #10) or (s[i] = #13) then
      begin
        Result[Count] := s[i];
        Inc(Count);
      end;
  end;
end;
My performance-oriented solution:
function StripNonAnsiChars(const AStr: String; const AIgnoreChars: TSysCharSet): string;
var
  lBuilder: TStringBuilder;
  lValid: TSysCharSet;
  I: Integer;
begin
  // printable ASCII plus the caller's extra chars, e.g. [#10, #13]; built once
  lValid := [#32..#126] + AIgnoreChars;
  lBuilder := TStringBuilder.Create;
  try
    for I := 1 to AStr.Length do
      if CharInSet(AStr[I], lValid) then
        lBuilder.Append(AStr[I]);
    Result := lBuilder.ToString;
  finally
    FreeAndNil(lBuilder);
  end;
end;
I wrote this in Delphi XE7.
My version, with the result as an array of byte:
interface
type
  TSBox = array of byte;
and the function:
function StripNonAscii(const buf: array of byte): TSBox;
var
  temp: TSBox;
  countr, countr2: integer;
const
  validchars: TSysCharSet = [#32..#126];
begin
  Result := nil;
  if Length(buf) = 0 then exit;
  countr2 := 0;
  SetLength(temp, Length(buf)); // set temp to the length of buf
  for countr := 0 to High(buf) do
    if CharInSet(chr(buf[countr]), validchars) then
    begin
      temp[countr2] := buf[countr];
      inc(countr2); // count valid chars
    end;
  SetLength(temp, countr2);
  Result := temp;
end;
What is the best way to convert a Delphi XE AnsiString containing escaped combining diacritical marks like "Fu\u0308rst" into a friendly WideString "Fürst"?
I am aware of the fact that this is not always possible for all combinations, but the common Latin blocks should be supported without building silly conversion tables on my own. I guess the solution can be found somewhere in the new Characters unit, but I don't get it.
I think you need to perform Unicode normalization on your string.
I don't know if there's a specific call in Delphi XE RTL to do this, but the WinAPI call NormalizeString should help you here, with mode NormalizationKC:
NormalizationKC
Unicode normalization form KC, compatibility composition. Transforms each base plus combining characters to the canonical precomposed equivalent and all compatibility characters to their equivalents. For example, the ligature ﬁ becomes f + i; similarly, A + ¨ + ﬁ + n becomes Ä + f + i + n.
Here is the complete code that solved my problem:
function Unescape(const s: AnsiString): string;
var
  i: Integer;
  j: Integer;
  c: Integer;
begin
  // Make result at least large enough. This prevents too many reallocs
  SetLength(Result, Length(s));
  i := 1;
  j := 1;
  while i <= Length(s) do begin
    if s[i] = '\' then begin
      if i < Length(s) then begin
        // escaped backslash?
        if s[i + 1] = '\' then begin
          Result[j] := '\';
          inc(i, 2);
        end
        // convert hex number to WideChar
        else if (s[i + 1] = 'u') and (i + 1 + 4 <= Length(s))
          and TryStrToInt('$' + string(Copy(s, i + 2, 4)), c) then begin
          inc(i, 6);
          Result[j] := WideChar(c);
        end else begin
          raise Exception.CreateFmt('Invalid code at position %d', [i]);
        end;
      end else begin
        raise Exception.Create('Unexpected end of string');
      end;
    end else begin
      Result[j] := WideChar(s[i]);
      inc(i);
    end;
    inc(j);
  end;
  // Trim result in case we reserved too much space
  SetLength(Result, j - 1);
end;
const
  NormalizationC = 1;
function NormalizeString(NormForm: Integer; lpSrcString: LPCWSTR; cwSrcLength: Integer;
  lpDstString: LPWSTR; cwDstLength: Integer): Integer; stdcall; external 'Normaliz.dll';
function Normalize(const s: string): string;
var
  newLength: integer;
begin
  if s = '' then
    Exit('');
  // NFC output can be longer than the input (e.g. for composition exclusions),
  // so first ask for an estimate of the required buffer size (cwDstLength = 0)
  newLength := NormalizeString(NormalizationC, PChar(s), Length(s), nil, 0);
  if newLength <= 0 then
    RaiseLastOSError;
  SetLength(Result, newLength);
  newLength := NormalizeString(NormalizationC, PChar(s), Length(s), PChar(Result), Length(Result));
  if newLength <= 0 then
    RaiseLastOSError;
  SetLength(Result, newLength);
end;
function UnescapeAndNormalize(const s: AnsiString): string;
begin
  Result := Normalize(Unescape(s));
end;
Thank you all! I am sure that my first experience with StackOverflow won't be my last one :-)
Are they always escaped like this? Always with 4 digits?
How is the \ character itself escaped?
Assuming the \ character is escaped as \xxxx, where xxxx is the code for the \ character, you can easily loop through the string:
function Unescape(s: AnsiString): WideString;
var
  i: Integer;
  j: Integer;
  c: Integer;
begin
  // Make result at least large enough. This prevents too many reallocs
  SetLength(Result, Length(s));
  i := 1; j := 1;
  while i <= Length(s) do
  begin
    // If a '\' is found, typecast the following 4 digit integer to widechar
    if s[i] = '\' then
    begin
      if (s[i+1] <> 'u') or not TryStrToInt(Copy(s, i+2, 4), c) then
        raise Exception.CreateFmt('Invalid code at position %d', [i]);
      Inc(i, 6);
      Result[j] := WideChar(c);
    end
    else
    begin
      Result[j] := WideChar(s[i]);
      Inc(i);
    end;
    Inc(j);
  end;
  // Trim result in case we reserved too much space
  SetLength(Result, j-1);
end;
Use it like this:
MessageBoxW(0, PWideChar(Unescape('\u0252berhaupt')), nil, MB_OK);
This code is tested in Delphi 2007, but should work in XE as well due to the explicit use of AnsiString and WideString.
If I'm not mistaken, Delphi XE now supports regular expressions. I don't use them often, but they seem a good way to parse the string and replace all escaped values. Maybe someone has a good example of how to do this in Delphi with regular expressions?
GolezTrol,
you forgot the '$':
if (s[i+1] <> 'u') or not TryStrToInt('$'+Copy(s, i+2, 4), c) then
I am using StringReplace to replace &gt; and &lt; with the char itself in a generated XML like this:
xml.Text := StringReplace(xml.Text,'&gt;','>',[rfReplaceAll]);
xml.Text := StringReplace(xml.Text,'&lt;','<',[rfReplaceAll]);
The thing is, it takes way too long to replace every occurrence of &gt;.
Can you propose a better idea to make it faster?
If you're using Delphi 2009, this operation is about 3 times faster with TStringBuilder than with StringReplace. It's Unicode safe, too.
I used the text from http://www.CodeGear.com with all occurrences of "<" and ">" changed to "&lt;" and "&gt;" as my starting point.
Including string assignments and creating/freeing objects, these took about 25ms and 75ms respectively on my system:
function TForm1.TestStringBuilder(const aString: string): string;
var
  sb: TStringBuilder;
begin
  StartTimer;
  sb := TStringBuilder.Create;
  sb.Append(aString);
  sb.Replace('&gt;', '>');
  sb.Replace('&lt;', '<');
  Result := sb.ToString();
  FreeAndNil(sb);
  StopTimer;
end;
function TForm1.TestStringReplace(const aString: string): string;
begin
  StartTimer;
  Result := StringReplace(aString,'&gt;','>',[rfReplaceAll]);
  Result := StringReplace(Result,'&lt;','<',[rfReplaceAll]);
  StopTimer;
end;
Try FastStrings.pas from Peter Morris.
You should definitely look at the Fastcode project pages: http://fastcode.sourceforge.net/
They ran a challenge for a faster StringReplace (Ansi StringReplace challenge), and the 'winner' was 14 times faster than the Delphi RTL.
Several of the fastcode functions have been included within Delphi itself in recent versions (D2007 on, I think), so the performance improvement may vary dramatically depending on which Delphi version you are using.
As mentioned before, you should really be looking at a Unicode-based solution if you're serious about processing XML.
The problem is that you are iterating over the entire string twice (once to replace &gt; with > and again to replace &lt; with <).
Instead, iterate once with a for loop: whenever you find a &, look ahead for gt; or lt;, write the replacement immediately and skip the next 3 characters ((g|l)t;). This way the work is done in a single pass, in time proportional to the length of xml.Text.
Here is a simple C# example, since I don't know Delphi, but it should give you the general idea:
String s = "&lt;xml&gt;test&lt;/xml&gt;";
char[] input = s.ToCharArray();
char[] res = new char[s.Length];
int j = 0;
for (int i = 0, count = input.Length; i < count; ++i)
{
    if (input[i] == '&')
    {
        if (i < count - 3)
        {
            if (input[i + 1] == 'l' || input[i + 1] == 'g')
            {
                if (input[i + 2] == 't' && input[i + 3] == ';')
                {
                    res[j++] = input[i + 1] == 'l' ? '<' : '>';
                    i += 3;
                    continue;
                }
            }
        }
    }
    res[j++] = input[i];
}
Console.WriteLine(new string(res, 0, j));
This outputs:
<xml>test</xml>
When you are dealing with multiline text, you can gain some performance by processing it line by line. This approach reduced the time for the replacements on a >1 MB XML file by about 90%.
procedure ReplaceMultilineString(xml: TStrings);
var
  i: Integer;
  line: String;
begin
  for i:=0 to xml.Count-1 do
  begin
    line := xml[i];
    line := StringReplace(line, '&gt;', '>', [rfReplaceAll]);
    line := StringReplace(line, '&lt;', '<', [rfReplaceAll]);
    xml[i] := line;
  end;
end;
Untested conversion of the C# code written by Jorge Ferreira.
function ReplaceLtGt(const s: string): string;
var
  inPtr, outPtr: integer;
begin
  SetLength(Result, Length(s)); // the result is never longer than s
  inPtr := 1;
  outPtr := 1;
  while inPtr <= Length(s) do begin
    if (s[inPtr] = '&') and ((inPtr + 3) <= Length(s)) and
       ((s[inPtr+1] = 'l') or (s[inPtr+1] = 'g')) and (s[inPtr+2] = 't') and
       (s[inPtr+3] = ';') then
    begin
      if s[inPtr+1] = 'l' then
        Result[outPtr] := '<'
      else
        Result[outPtr] := '>';
      Inc(inPtr, 4); // skip the whole "&lt;" / "&gt;" entity, including ';'
    end
    else begin
      Result[outPtr] := s[inPtr]; // copy from the input, not from Result
      Inc(inPtr);
    end;
    Inc(outPtr);
  end;
  SetLength(Result, outPtr - 1);
end;
Systools (Turbopower, now open source) has a ReplaceStringAllL function that does all of them in a string.
It works like a charm and is very fast:
// needs StrUtils for PosEx
Function NewStringReplace(const S, OldPattern, NewPattern: string; Flags: TReplaceFlags): string;
var
  OldPat,Srch: string; // Srch and OldPat can contain uppercase versions of S,OldPattern
  PatLength,NewPatLength,P,i,PatCount,PrevP: Integer;
  c,d: pchar;
begin
  PatLength:=Length(OldPattern);
  if PatLength=0 then begin
    Result:=S;
    exit;
  end;
  if rfIgnoreCase in Flags then begin
    Srch:=AnsiUpperCase(S);
    OldPat:=AnsiUpperCase(OldPattern);
  end else begin
    Srch:=S;
    OldPat:=OldPattern;
  end;
  PatLength:=Length(OldPat);
  if Length(NewPattern)=PatLength then begin
    //Result length will not change
    Result:=S;
    P:=1;
    repeat
      P:=PosEx(OldPat,Srch,P);
      if P>0 then begin
        for i:=1 to PatLength do
          Result[P+i-1]:=NewPattern[i];
        if not (rfReplaceAll in Flags) then exit;
        inc(P,PatLength);
      end;
    until p=0;
  end else begin
    //Different pattern length -> Result length will change
    //To avoid creating a lot of temporary strings, we count how many
    //replacements we're going to make.
    P:=1; PatCount:=0;
    repeat
      P:=PosEx(OldPat,Srch,P);
      if P>0 then begin
        inc(P,PatLength);
        inc(PatCount);
        if not (rfReplaceAll in Flags) then break;
      end;
    until p=0;
    if PatCount=0 then begin
      Result:=S;
      exit;
    end;
    NewPatLength:=Length(NewPattern);
    SetLength(Result,Length(S)+PatCount*(NewPatLength-PatLength));
    P:=1; PrevP:=0;
    c:=pchar(Result); d:=pchar(S);
    repeat
      P:=PosEx(OldPat,Srch,P);
      if P>0 then begin
        //copy the text before the match, then the new pattern
        for i:=PrevP+1 to P-1 do begin
          c^:=d^;
          inc(c); inc(d);
        end;
        for i:=1 to NewPatLength do begin
          c^:=NewPattern[i];
          inc(c);
        end;
        inc(P,PatLength);
        inc(d,PatLength);
        PrevP:=P-1;
        //only one replacement wanted: stop searching, but still copy the tail below
        if not (rfReplaceAll in Flags) then P:=0;
      end;
    until p=0;
    //copy the rest of S after the last replacement
    for i:=PrevP+1 to Length(S) do begin
      c^:=d^;
      inc(c); inc(d);
    end;
  end;
end;