OS: Hungarian Windows (Windows 1250)
Under Delphi 6 Prof there is no WideStringPos, WideStringCopy, WideStringReplace...
But in an XML based project I need to use them.
Because that I tried to write "something like" these functions.
But I'm not sure they are working as I want...
Because Delphi converts the Wide to Ansi and reverse in the background, I cannot be sure that my code is safe from these side effects... :-)
The code is very primitive - I need the solution quickly...
function WideStringCopy(WWhat : WideString; From, HowMany : integer) : WideString;
var
i : integer;
l : integer;
wc : WideChar;
begin
Result := '';
if WWhat = ''
then Exit;
if (HowMany <= 0)
then Exit;
if (From < 1)
then From := 1;
l := From + HowMany - 1;
if l > Length(WWhat)
then l := Length(WWhat);
for i := From to l do begin
wc := WWhat[i];
Result := Result + wc;
end;
end;
function WideStringPos(WWhere, WWhat : WideString) : integer;
var
wscomp : WideString;
i : integer;
begin
Result := 0;
for i := 1 to Length(WWhere) do begin
wscomp := WideStringCopy(WWhere, i, LengtH(WWhat));
if WideSameStr(wscomp, WWhat)
then begin
Result := i;
Exit;
end;
end;
end;
function WideStringReplace(WWhere, WFrom, WTo : WideString) : WideString;
var
actpos : integer;
wcomp : WideString;
wc : WideChar;
begin
Result := '';
actpos := 1;
while actpos <= Length(WWhere) do begin
wcomp := WideStringCopy(WWhere, actpos, Length(WFrom));
if WideSameStr(wcomp, WFrom) then begin
Result := Result + WTo;
inc(actpos, Length(WFrom));
end else begin
wc := WWhere[actpos];
Result := Result + wc;
inc(actpos);
end;
end;
end;
I have two questions about it:
Do you see any piece of code that surely making bad result (converting the Wide to Ansi silently, and causing character loosing)?
Do you know some character with I can test this code?
For example, chr(XXX) what is remaining when my converters are keeping the Wide rules, but loosing if I make wrong code...
Thanks for every info you will write...
Do you know some character with I can test this code?
Any codepage beyond Win1250 - for example Cyrillic Win1251, Greek, Hebrew - almost all letters there would be missed from 1250/1252
You can take Jedi CodeLibrary and use its locale conversion routines: make a string consisting of #128 till #255 in some encoding like aforementioned, convert it to Unicode from that codepage and then convert back from Unicode to Hungarian codepage.
function StringToWideStringEx(const S: AnsiString; CodePage: Word): WideString;
function WideStringToStringEx(const WS: WideString; CodePage: Word): AnsiString;
Or in one call
function TranslateString(const S: AnsiString; CP1, CP2: Word): AnsiString;
Then look which chars failed to translate and turned into ReplacementCharacter.
However in JCL you'd have your Pos function and such ready to use. And XML parser. So why bother ?
Related
How can I make this code faster? the string can contain characters such as ", .?#" and possibly others.
Const Nums = ['0'..'9'];
function CleanNumber(s: String): Int64;
Var z: Cardinal;
begin
for z := length(s) downto 1 do
if not (s[z] in Nums) then Delete(s,z,1);
if s = '' then
Result := 0 else
Result := StrToInt64(s);
end;
Results (long loop):
CL2,CL3 = HeartWare's
32-bit, "dirty number" / "clean number"
Mine: 270ms, 165ms
CL2: 220ms, 210ms
CL3: 100ms, 110ms
DirtyStrToNum: 215ms, 90ms
64-bit, "dirty number" / "clean number"
Mine: 2280ms, 75ms
CL2: 1320ms, 130ms
CL3: 280ms, 25ms
DirtyStrToNum: 1390ms, 125ms
Here are two examples that for sure are faster than the one you have (deleting a character from a string is relatively slow):
This one works by pre-allocating a string of the maximum possible length and then filling it out with the digits as I come across them in the source string. No delete for every unsupported character, and no expansion of the target string for every supported character.
FUNCTION CleanNumber(CONST S : STRING) : Int64;
VAR
I,J : Cardinal;
C : CHAR;
T : STRING;
BEGIN
SetLength(T,LENGTH(S));
J:=LOW(T);
FOR I:=LOW(S) TO HIGH(S) DO BEGIN
C:=S[I];
IF (C>='0') AND (C<='9') THEN BEGIN
T[J]:=C;
INC(J)
END
END;
IF J=LOW(T) THEN
Result:=0
ELSE BEGIN
SetLength(T,J-LOW(T)); // or T[J]:=#0 [implementation-specific]
Result:=StrToInt64(T)
END
END;
This one works by simple multiplication of the end result by 10 and adding the corresponding digit value.
{$IFOPT Q+}
{$DEFINE OverflowEnabled }
{$ELSE }
{$Q+ If you want overflow checking }
{$ENDIF }
FUNCTION CleanNumber(CONST S : STRING) : Int64;
VAR
I : Cardinal;
C : CHAR;
BEGIN
Result:=0;
FOR I:=LOW(S) TO HIGH(S) DO BEGIN
C:=S[I];
IF (C>='0') AND (C<='9') THEN Result:=Result*10+(ORD(C)-ORD('0'))
END
END;
{$IFNDEF OverflowEnabled } {$Q-} {$ENDIF }
{$UNDEF OverflowEnabled }
Also note that I don't use IN or CharInSet as these are much slower than a simple inline >= and <= comparison.
Another comment I could make is the use of LOW and HIGH on the string variable. This makes it compatible with both 0-based strings (mobile compilers) and 1-based strings (desktop compilers).
Your function is slow mainly because of the Delete approach. Each call to Delete needs to move a lot of characters around.
A faster approach would be like this:
function DirtyStrToNum(const S: string): Int64;
var
tmp: string;
i, j: Integer;
const
DIGITS = ['0'..'9'];
begin
SetLength(tmp, S.Length);
j := 0;
for i := 1 to S.Length do
if CharInSet(S[i], DIGITS) then
begin
Inc(j);
tmp[j] := S[i];
end;
SetLength(tmp, j);
if tmp.IsEmpty then
Result := 0
else
Result := StrToInt64(tmp);
// Or, but not equivalent: Result := StrToInt64Def(tmp, 0);
end;
Notice I make a single allocation for a new string, and then only copy the minimum number of characters to it.
I have two functions:
function EnCryptSymbolVij(text, key:char; abc:string):char;
var st: string;
positionText: word;
begin
if text=' ' then begin EnCryptSymbolVij:=' '; exit; end;
PositionText:=pos(text,abc);
if positionText=0 then exit;
st:=MoveVij(length(abc)-pos(Key,abc)+1, abc);
EnCryptSymbolVij:=st[positionText];
end;
function EnCryptVij(text, key, abc:string):string;
var i, n, j: longword;
st:string;
begin
setlength(st,length(text));
j:=1;
for i:=1 to length(text) do
begin
st[i]:=EnCryptSymbolVij(text[i], Key[j], abc);
inc(j);
if j>length(key) then j:=1;
end;
EnCryptVij:=st;
End;
In the first function, I add space support. But when I encrypt the text with spaces, I get a wrong encryption, because the second function counts spaces like a part of alphavit. How can I fix this and ignore encrypt spaces in the second function? Please help to resolve this trouble.
You only support alphabetic characters and spaces. If you want to do more you need to be a bit cleverer in you code. Spaces are not encrypted, and nothing encrypts to a space so you just need to add that test in your decryption. If you later support other non alphabetic characters you will need to do the same for these too, but the test is the same for both encryption and decryption. So for spaces, your decryption code for a symbol will start just as your encryption does, like this
function DeCryptSymbolVij(text, key:char; abc:string):char;
var st: string;
positionText: word;
begin
if text=' ' then begin DeCryptSymbolVij:=' '; exit; end;
Personally I prefer to use 'Result' rather than the function name, but that is just personal preference.
Edit
To cater for spaces in the second function, you could just surround the inc(j) with a test for spaces, like this
function EnCryptVij(text, key, abc:string):string;
var i, n, j: longword;
st:string;
begin
setlength(st,length(text));
j:=1;
for i:=1 to length(text) do
begin
st[i]:=EnCryptSymbolVij(text[i], Key[j], abc);
if st[ I ] <> ' ' then
begin
inc(j);
end;
if j>length(key) then j:=1;
end;
EnCryptVij:=st;
End;
but to 'future proof' it you might be better adding a Var boolean parameter to the first function and setting it if a substitution has occurred:
function EnCryptSymbolVij(text, key:char; abc:string; var changed : boolean):char;
var st: string;
positionText: word;
begin
changed := FALSE;
PositionText:=pos(text,abc);
if positionText=0 then exit;
// else
changed := TRUE;
st:=MoveVij(length(abc)-pos(Key,abc)+1, abc);
EnCryptSymbolVij:=st[positionText];
end;
function EnCryptVij(text, key, abc:string):string;
var i, n, j: longword;
iChanged : Boolean;
st:string;
begin
setlength(st,length(text));
j:=1;
for i:=1 to length(text) do
begin
st[i]:=EnCryptSymbolVij(text[i], Key[j], abc, iChanged);
if iChanged then
begin
inc(j);
end;
if j>length(key) then j:=1;
end;
EnCryptVij:=st;
End;
Note that the space test is not required in the first function (assuming there is not a space in 'abc')
I have an app that needs to do heavy text manipulation in a TStringList. Basically i need to split text by a delimiter ; for instance, if i have a singe line with 1000 chars and this delimiter occurs 3 times in this line, then i need to split it in 3 lines. The delimiter can contain more than one char, it can be a tag like '[test]' for example.
I've wrote two functions to do this task with 2 different approaches, but both are slow in big amounts of text (more then 2mbytes usually).
How can i achieve this goal in a faster way ?
Here are both functions, both receive 2 paramaters : 'lines' which is the original tstringlist and 'q' which is the delimiter.
function splitlines(lines : tstringlist; q: string) : integer;
var
s, aux, ant : string;
i,j : integer;
flag : boolean;
m2 : tstringlist;
begin
try
m2 := tstringlist.create;
m2.BeginUpdate;
result := 0;
for i := 0 to lines.count-1 do
begin
s := lines[i];
for j := 1 to length(s) do
begin
flag := lowercase(copy(s,j,length(q))) = lowercase(q);
if flag then
begin
inc(result);
m2.add(aux);
aux := s[j];
end
else
aux := aux + s[j];
end;
m2.add(aux);
aux := '';
end;
m2.EndUpdate;
lines.text := m2.text;
finally
m2.free;
end;
end;
function splitLines2(lines : tstringlist; q: string) : integer;
var
aux, p : string;
i : integer;
flag : boolean;
begin
//maux1 and maux2 are already instanced in the parent class
try
maux2.text := lines.text;
p := '';
i := 0;
flag := false;
maux1.BeginUpdate;
maux2.BeginUpdate;
while (pos(lowercase(q),lowercase(maux2.text)) > 0) and (i < 5000) do
begin
flag := true;
aux := p+copy(maux2.text,1,pos(lowercase(q),lowercase(maux2.text))-1);
maux1.add(aux);
maux2.text := copy(maux2.text,pos(lowercase(q),lowercase(maux2.text)),length(maux2.text));
p := copy(maux2.text,1,1);
maux2.text := copy(maux2.text,2,length(maux2.text));
inc(i);
end;
finally
result := i;
maux1.EndUpdate;
maux2.EndUpdate;
if flag then
begin
maux1.add(p+maux2.text);
lines.text := maux1.text;
end;
end;
end;
I've not tested the speed, but for academic purposes, here's an easy way to split the strings:
myStringList.Text :=
StringReplace(myStringList.Text, myDelimiter, #13#10, [rfReplaceAll]);
// Use [rfReplaceAll, rfIgnoreCase] if you want to ignore case
When you set the Text property of TStringList, it parses on new lines and splits there, so converting to a string, replacing the delimiter with new lines, then assigning it back to the Text property works.
The problems with your code (at least second approach) are
You are constantly using lowecase which is slow if called so many times
If I saw correctly you are copying the whole remaining text back to the original source. This is sure to be extra slow for large strings (eg files)
I have a tokenizer in my library. Its not the fastest or best but it should do (you can get it from Cromis Library, just use the units Cromis.StringUtils and Cromis.Unicode):
type
TTokens = array of ustring;
TTextTokenizer = class
private
FTokens: TTokens;
FDelimiters: array of ustring;
public
constructor Create;
procedure Tokenize(const Text: ustring);
procedure AddDelimiters(const Delimiters: array of ustring);
property Tokens: TTokens read FTokens;
end;
{ TTextTokenizer }
procedure TTextTokenizer.AddDelimiters(const Delimiters: array of ustring);
var
I: Integer;
begin
if Length(Delimiters) > 0 then
begin
SetLength(FDelimiters, Length(Delimiters));
for I := 0 to Length(Delimiters) - 1 do
FDelimiters[I] := Delimiters[I];
end;
end;
constructor TTextTokenizer.Create;
begin
SetLength(FTokens, 0);
SetLength(FDelimiters, 0);
end;
procedure TTextTokenizer.Tokenize(const Text: ustring);
var
I, K: Integer;
Counter: Integer;
NewToken: ustring;
Position: Integer;
CurrToken: ustring;
begin
SetLength(FTokens, 100);
CurrToken := '';
Counter := 0;
for I := 1 to Length(Text) do
begin
CurrToken := CurrToken + Text[I];
for K := 0 to Length(FDelimiters) - 1 do
begin
Position := Pos(FDelimiters[K], CurrToken);
if Position > 0 then
begin
NewToken := Copy(CurrToken, 1, Position - 1);
if NewToken <> '' then
begin
if Counter > Length(FTokens) then
SetLength(FTokens, Length(FTokens) * 2);
FTokens[Counter] := Trim(NewToken);
Inc(Counter)
end;
CurrToken := '';
end;
end;
end;
if CurrToken <> '' then
begin
if Counter > Length(FTokens) then
SetLength(FTokens, Length(FTokens) * 2);
FTokens[Counter] := Trim(CurrToken);
Inc(Counter)
end;
SetLength(FTokens, Counter);
end;
How about just using StrTokens from the JCL library
procedure StrTokens(const S: string; const List: TStrings);
It's open source
http://sourceforge.net/projects/jcl/
As an additional option, you can use regular expressions. Recent versions of Delphi (XE4 and XE5) come with built in regular expression support; older versions can find a free regex library download (zip file) at Regular-Expressions.info.
For the built-in regex support (uses the generic TArray<string>):
var
RegexObj: TRegEx;
SplitArray: TArray<string>;
begin
SplitArray := nil;
try
RegexObj := TRegEx.Create('\[test\]'); // Your sample expression. Replace with q
SplitArray := RegexObj.Split(Lines, 0);
except
on E: ERegularExpressionError do begin
// Syntax error in the regular expression
end;
end;
// Use SplitArray
end;
For using TPerlRegEx in earlier Delphi versions:
var
Regex: TPerlRegEx;
m2: TStringList;
begin
m2 := TStringList.Create;
try
Regex := TPerlRegEx.Create;
try
Regex.RegEx := '\[test\]'; // Using your sample expression - replace with q
Regex.Options := [];
Regex.State := [preNotEmpty];
Regex.Subject := Lines.Text;
Regex.SplitCapture(m2, 0);
finally
Regex.Free;
end;
// Work with m2
finally
m2.Free;
end;
end;
(For those unaware, the \ in the sample expression used are because the [] characters are meaningful in regular expressions and need to be escaped to be used in the regular expression text. Typically, they're not required in the text.)
I am trying to validate a string, where by it can contain all alphebetical and numerical characters, aswell as the underline ( _ ) symbol.
This is what I tried so far:
var
S: string;
const
Allowed = ['A'..'Z', 'a'..'z', '0'..'9', '_'];
begin
S := 'This_is_my_string_0123456789';
if Length(S) > 0 then
begin
if (Pos(Allowed, S) > 0 then
ShowMessage('Ok')
else
ShowMessage('string contains invalid symbols');
end;
end;
In Lazarus this errors with:
Error: Incompatible type for arg no. 1: Got "Set Of Char", expected
"Variant"
Clearly my use of Pos is all wrong and I am not sure if my approach is even the correct way of going about it or not?
Thanks.
You will have to check every single character of the string, if it's contained in Allowed
e.g.:
var
S: string;
const
Allowed = ['A' .. 'Z', 'a' .. 'z', '0' .. '9', '_'];
Function Valid: Boolean;
var
i: Integer;
begin
Result := Length(s) > 0;
i := 1;
while Result and (i <= Length(S)) do
begin
Result := Result AND (S[i] in Allowed);
inc(i);
end;
if Length(s) = 0 then Result := true;
end;
begin
S := 'This_is_my_string_0123456789';
if Valid then
ShowMessage('Ok')
else
ShowMessage('string contains invalid symbols');
end;
TYPE TCharSet = SET OF CHAR;
FUNCTION ValidString(CONST S : STRING ; CONST ValidChars : TCharSet) : BOOLEAN;
VAR
I : Cardinal;
BEGIN
Result:=FALSE;
FOR I:=1 TO LENGTH(S) DO IF NOT (S[I] IN ValidChars) THEN EXIT;
Result:=TRUE
END;
If you are using a Unicode version of Delphi (as you seem to be), beware that a SET OF CHAR cannot contain all valid characters in the Unicode character set. Then perhaps this function will be useful instead:
FUNCTION ValidString(CONST S,ValidChars : STRING) : BOOLEAN;
VAR
I : Cardinal;
BEGIN
Result:=FALSE;
FOR I:=1 TO LENGTH(S) DO IF POS(S[I],ValidChars)=0 THEN EXIT;
Result:=TRUE
END;
but then again, not all characters (actually Codepoints) in Unicode can be expressed by a single character, and some characters can be expressed in more than one way (both as a single character and as a multi-character).
But as long as you constrain yourself within these limitations, one of the above functions should be useful. You can even include both, if you add an "OVERLOAD;" directive to the end of each function declaration, as in:
FUNCTION ValidString(CONST S : STRING ; CONST ValidChars : TCharSet) : BOOLEAN; OVERLOAD;
FUNCTION ValidString(CONST S,ValidChars : STRING) : BOOLEAN; OVERLOAD;
Lazarus/Free Pascal doesn't overload pos for that but has "posset" variants in unit strutils for that;
http://www.freepascal.org/docs-html/rtl/strutils/posset.html
Regarding Andreas' (IMHO correct ) remark, you can use isemptystr for that. It was meant to check for strings that only contain whitespace, but it basically checks if a string only contains characters in a set.
http://www.freepascal.org/docs-html/rtl/strutils/isemptystr.html
You can use Regular Expressions:
uses System.RegularExpressions;
if not TRegEx.IsMatch(S, '^[_a-zA-Z0-9]+$') then
ShowMessage('string contains invalid symbols');
i need to get paper status information from a printer. I have a list of esc/pos commands.
I'm trying to send these comands with escape function
http://msdn.microsoft.com/en-us/library/windows/desktop/dd162701%28v=vs.85%29.aspx
This is my code
type
TPrnBuffRec = record
bufflength: Word;
Buff_1: array[0..255] of Char;
end;
procedure TFTestStampa.SpeedButton2Click(Sender: TObject);
var
Buff: TPrnBuffRec;
BuffOut: TPrnBuffRec;
TestInt: Integer;
cmd : string;
begin
printer.BeginDoc;
try
TestInt := PassThrough;
if Escape(Printer.Handle, QUERYESCSUPPORT, SizeOf(TESTINT),
#testint, nil) > 0 then
begin
cmd := chr(10) + chr(04) + '4';
StrPCopy(Buff.Buff_1, cmd);
Buff.bufflength := StrLen(Buff.Buff_1);
Escape(Printer.Canvas.Handle, Passthrough, 0, #buff,
#buffOut);
ShowMessage( conver(strPas(buffOut.Buff_1)) );
end
finally
printer.EndDoc;
end;
function TFTestStampa.Conver(s: string): String;
var
i: Byte;
t : String;
begin
t := '';
for i := 1 to Length(s) do
t := t + IntToHex(Ord(s[i]), 2) + ' ';
Result := t;
end;
Problem is with different cmds I obtain always the same string ....
Can you give me an example of escape function with last parameter not nill ?
Alternatives to obtain paper status ?
I suppose you are using Delphi 2009 above and you used this source for your example, so your problem might be caused by Unicode parameters. In Delphi since version 2009, string type is defined as UnicodeString whilst in Delphi 2009 below as AnsiString, the same stands also for Char which is WideChar in Delphi 2009 up and AnsiChar below.
If so, then I think you have a problem at least with your buffer data length, because Char = WideChar takes 2 bytes and you were using StrLen function which returns the number of chars what cannot correspond to the data size of number of chars * 2 bytes.
I hope this will fix your problem, but I can't verify it, because I don't have your printer :)
type
TPrinterData = record
DataLength: Word;
Data: array [0..255] of AnsiChar; // let's use 1 byte long AnsiChar
end;
function Convert(const S: AnsiString): string;
var
I: Integer; // 32-bit integer is more efficient than 8-bit byte type
T: string; // here we keep the native string data type
begin
T := '';
for I := 1 to Length(S) do
T := T + IntToHex(Ord(S[I]), 2) + ' ';
Result := T;
end;
procedure TFTestStampa.SpeedButton2Click(Sender: TObject);
var
TestInt: Integer;
Command: AnsiString;
BufferIn: TPrinterData;
BufferOut: TPrinterData;
begin
Printer.BeginDoc;
try
TestInt := PASSTHROUGH;
if Escape(Printer.Handle, QUERYESCSUPPORT, SizeOf(TestInt), #TestInt, nil) > 0 then
begin
Command := Chr(10) + Chr(04) + '4';
StrPCopy(BufferIn.Data, Command);
BufferIn.DataLength := StrLen(Command);
FillChar(BufferOut.Data, Length(BufferOut.Data), #0);
BufferOut.DataLength := 0;
Escape(Printer.Canvas.Handle, PASSTHROUGH, 0, #BufferIn, #BufferOut);
ShowMessage(Convert(StrPas(BufferOut.Data)));
end
finally
Printer.EndDoc;
end;
end;