This is the declaration of TSysCharSet in Delphi Berlin:
TSysCharSet = set of Char deprecated; // Holds Char values in the ordinal range of 0..255 only.
It is now deprecated, but what should replace it? I just need to pass my function a set of chars such as [' ', #9, #13, #10].
If all you need is to carry around a group of (Unicode) characters, then you don't need TSysCharSet. Just use a dynamic array of Char:
var
MyCharArray : TArray<char>;
begin
MyCharArray := [' ',#9,#13,#10];
end;
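A dynamic array has no `in` operator, so membership has to be tested by hand. The following is a minimal sketch of how a function could accept such a group of characters. `CharInArray` and `CountChars` are hypothetical helper names, not RTL routines, and the open-array parameter is chosen so that both a TArray<Char> variable and a bracketed literal can be passed.

```pascal
program CharArrayDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Hypothetical helper: True if C occurs anywhere in Chars.
function CharInArray(C: Char; const Chars: array of Char): Boolean;
var
  Item: Char;
begin
  for Item in Chars do
    if Item = C then
      Exit(True);
  Result := False;
end;

// Hypothetical helper: counts the characters of S that occur in Chars.
function CountChars(const S: string; const Chars: array of Char): Integer;
var
  C: Char;
begin
  Result := 0;
  for C in S do
    if CharInArray(C, Chars) then
      Inc(Result);
end;

var
  MyCharArray: TArray<Char>;
begin
  MyCharArray := TArray<Char>.Create(' ', #9, #13, #10);
  // 'a b<tab>c' contains one space and one tab.
  Writeln(CountChars('a b'#9'c', MyCharArray));
  // A bracketed literal works as an open-array argument too.
  Writeln(CountChars('a b'#9'c', [' ', #9, #13, #10]));
end.
```

A linear scan is fine for a handful of characters. For large groups, a sorted array with a binary search or a dictionary would likely be a better fit.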
TSysCharSet was primarily used with the CharInSet routine. The docs refer to TCharHelper as a replacement for CharInSet, since a TSysCharSet cannot contain Unicode characters.
uses System.Character;
var
WhiteSpace : TSysCharSet;
ac : ansichar;
c : char;
begin
// replace this....
ac := #9;
WhiteSpace := [' ',#9,#13,#10];
if CharInSet(ac, WhiteSpace) then
begin
end;
// ...with this (note: IsWhiteSpace matches every Unicode white-space character, not only these four):
c := #9;
if c.IsWhiteSpace then
begin
end;
end.
We are trying to write a UDF in Delphi (10 Seattle) for our Firebird 2.5 database which should remove some characters from the input string.
All our string fields in the database are using character set UTF8 with collation UNICODE_CI_AI.
The function should remove some characters like space, . ; : / \ and others from the string.
Our function works fine for strings containing characters with ascii value <= 127. As soon as there are characters with ascii value bigger than 127, the UDF fails.
We have tried using PChar instead of PAnsiChar parameters but without success. For now we do a check if the character has an ascii value above 127 and if so, we remove that character from the string too.
What we want though, is a UDF that returns the original string without the punctuation characters.
This is our code so far:
unit UDFs;
interface
uses ib_util;
function UDF_RemovePunctuations(InputString: PAnsiChar): PAnsiChar; cdecl;
implementation
uses SysUtils, AnsiStrings, Classes;
//FireBird declaration:
//DECLARE EXTERNAL FUNCTION UDF_REMOVEPUNCTUATIONS
// CSTRING(500)
//RETURNS CSTRING(500) FREE_IT
//ENTRY_POINT 'UDF_RemovePunctuations' MODULE_NAME 'FB_UDF.dll';
function UDF_RemovePunctuations(InputString: PAnsiChar): PAnsiChar;
const
PunctuationChars = [' ', ',', '.', ';', '/', '\', '''', '"','(', ')'];
var
I: Integer;
S, NewS: String;
begin
S := UTF8ToUnicodeString(InputString);
For I := 1 to Length(S) do
begin
If Not CharInSet(S[I], PunctuationChars)
then begin
If S[I] <= #127
then NewS := NewS + S[I];
end;
end;
Result := ib_util_malloc(Length(NewS) + 1);
NewS := NewS + #0;
AnsiStrings.StrPCopy(Result, NewS);
end;
end.
When we remove the check on ascii value <= #127 we can see that NewS contains all characters as it should be (without the punctuation characters of course) but things go wrong when doing the StrPCopy we think.
Any help would be appreciated!
Thanks to LU RD I got this working.
The answer was to declare my string variables as Utf8String instead of String, and not to convert the input string to Unicode.
I have adapted my code like this:
//FireBird declaration:
//DECLARE EXTERNAL FUNCTION UDF_REMOVEPUNCTUATIONS
// CSTRING(500)
//RETURNS CSTRING(500) FREE_IT
//ENTRY_POINT 'UDF_RemovePunctuations' MODULE_NAME 'CarfacPlus_UDF.dll';
function UDF_RemovePunctuations(InputString: PAnsiChar): PAnsiChar;
const
PunctuationChars = [' ', ',', '.', ';', '/', '\', '''', '"','(', ')', '-',
'+', ':', '<', '>', '=', '[', ']', '{', '}'];
var
I: Integer;
S: Utf8String;
begin
S := InputString;
For I := Length(S) downto 1 do
If CharInSet(S[I], PunctuationChars)
then Delete(S, I, 1);
Result := ib_util_malloc(Length(S) + 1);
AnsiStrings.StrPCopy(Result, AnsiString(S));
end;
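Filtering the Utf8String byte by byte is safe here because every byte of a multi-byte UTF-8 sequence has a value of $80 or higher, so it can never be mistaken for an ASCII punctuation character. Below is a minimal sketch of the same idea as a standalone function. `StripPunctuation` is a hypothetical name, and GetMem stands in for ib_util_malloc so the program runs without Firebird. It copies the raw bytes with Move rather than casting to AnsiString, on the assumption that such a cast may transcode to the system ANSI code page; that assumption is worth verifying for your Delphi version.

```pascal
program StripPunctDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Hypothetical helper: removes ASCII punctuation bytes from UTF-8 text.
// Bytes >= $80 (parts of multi-byte sequences) are never in the set.
function StripPunctuation(const S: UTF8String): UTF8String;
const
  PunctuationChars = [' ', ',', '.', ';', '/', '\', '''', '"', '(', ')'];
var
  I, N: Integer;
begin
  SetLength(Result, Length(S));
  N := 0;
  for I := 1 to Length(S) do
    if not (S[I] in PunctuationChars) then
    begin
      Inc(N);
      Result[N] := S[I];
    end;
  SetLength(Result, N);
end;

var
  Source: string;
  Clean: UTF8String;
  Buffer: PAnsiChar;
begin
  Source := 'caf'#$00E9'. (x)';            // contains one non-ASCII character
  Clean := StripPunctuation(UTF8String(Source));
  // In the UDF this buffer would come from ib_util_malloc.
  GetMem(Buffer, Length(Clean) + 1);
  try
    // Copy the bytes and the terminating #0 without any code page conversion.
    Move(PAnsiChar(Clean)^, Buffer^, Length(Clean) + 1);
    Writeln('Bytes kept: ', Length(Clean));
  finally
    FreeMem(Buffer);
  end;
end.
```

Building the result in one pass also avoids the repeated Delete calls, each of which shifts the remainder of the string.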
I use StrUtils to split a string into a TStringDynArray, but the output was not what I expected. I will try to explain the issue:
I have a string str: 'a'; 'b'; 'c'
Now I called StrUtils.SplitString(str, '; '); to split the string and I expected an array with three elements: 'a', 'b', 'c'
But what I got is an array with five elements: 'a', '', 'b', '', 'c'.
When I split with just ';' instead of '; ' I get three elements with a leading blank.
So why do I get empty strings in my first solution?
This function is designed not to merge consecutive separators. For instance, consider splitting the following string on commas:
foo,,bar
What would you expect SplitString('foo,,bar', ',') to return? Would you be looking for ('foo', 'bar') or should the answer be ('foo', '', 'bar')? It's not clear a priori which is right, and different use cases might want different output.
In your case, you specified two delimiters, ';' and ' '. This means that
'a'; 'b'
splits at ';' and again at ' '. Between those two delimiters there is nothing, and hence an empty string is returned in between 'a' and 'b'.
The Split method from the string helper introduced in XE3 has a TStringSplitOptions parameter. If you pass ExcludeEmpty for that parameter then consecutive separators are treated as a single separator. This program:
{$APPTYPE CONSOLE}
uses
System.SysUtils;
var
S: string;
begin
for S in '''a''; ''b''; ''c'''.Split([';', ' '], ExcludeEmpty) do begin
Writeln(S);
end;
end.
outputs:
'a'
'b'
'c'
But you do not have this available to you in XE2 so I think you are going to have to roll your own split function. Which might look like this:
function IsSeparator(const C: Char; const Separators: string): Boolean;
var
sep: Char;
begin
for sep in Separators do begin
if sep=C then begin
Result := True;
exit;
end;
end;
Result := False;
end;
function Split(const Str, Separators: string): TArray<string>;
var
CharIndex, ItemIndex: Integer;
len: Integer;
SeparatorCount: Integer;
Start: Integer;
begin
len := Length(Str);
if len=0 then begin
Result := nil;
exit;
end;
SeparatorCount := 0;
for CharIndex := 1 to len do begin
if IsSeparator(Str[CharIndex], Separators) then begin
inc(SeparatorCount);
end;
end;
SetLength(Result, SeparatorCount+1); // potentially an over-allocation
ItemIndex := 0;
Start := 1;
for CharIndex := 1 to len do begin
if IsSeparator(Str[CharIndex], Separators) then begin
if CharIndex>Start then begin
Result[ItemIndex] := Copy(Str, Start, CharIndex-Start);
inc(ItemIndex);
end;
Start := CharIndex+1;
end;
end;
if len>=Start then begin // >= so that a final single-character item is not dropped
Result[ItemIndex] := Copy(Str, Start, len-Start+1);
inc(ItemIndex);
end;
SetLength(Result, ItemIndex);
end;
Of course, all of this assumes that you want a space to act as a separator. You've asked for that in the code, but perhaps you actually want just ; to act as a separator. In that case you probably want to pass ';' as the separator, and trim the strings that are returned.
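A minimal sketch of that last suggestion, splitting on ';' only and trimming each piece, might look like this. It uses only SplitString and Trim; the program name and variable names are illustrative.

```pascal
program SplitTrimDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Types, System.StrUtils;

var
  Parts: TStringDynArray;
  I: Integer;
begin
  // Only ';' is a delimiter, so the space after each ';' stays
  // attached to the following item until Trim removes it.
  Parts := SplitString('''a''; ''b''; ''c''', ';');
  for I := Low(Parts) to High(Parts) do
    Writeln(Trim(Parts[I]));
end.
```

With this approach, spaces inside an item survive, which they would not if the space were itself a delimiter.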
SplitString is defined as
function SplitString(const S, Delimiters: string): TStringDynArray;
One might think that Delimiters denotes a single delimiter string used for splitting, but it actually denotes a set of single characters. Each character in the Delimiters string is treated as a possible delimiter.
SplitString
Splits a string into different parts delimited by the specified
delimiter characters. SplitString splits a string into different parts
delimited by the specified delimiter characters. S is the string to be
split. Delimiters is a string containing the characters defined as
delimiters.
It is because the second parameter of SplitString is a list of single character delimiters, so '; ' means split at a ';' OR split at a ' '. So the string is split at every ';' and at every space, and between the ';' and the ' ' there is nothing, hence the empty strings.
I am trying to validate a string, whereby it may contain any alphabetical or numerical characters, as well as the underscore ( _ ) symbol.
This is what I tried so far:
var
S: string;
const
Allowed = ['A'..'Z', 'a'..'z', '0'..'9', '_'];
begin
S := 'This_is_my_string_0123456789';
if Length(S) > 0 then
begin
if (Pos(Allowed, S) > 0 then
ShowMessage('Ok')
else
ShowMessage('string contains invalid symbols');
end;
end;
In Lazarus this errors with:
Error: Incompatible type for arg no. 1: Got "Set Of Char", expected
"Variant"
Clearly my use of Pos is all wrong and I am not sure if my approach is even the correct way of going about it or not?
Thanks.
You will have to check every single character of the string to see whether it is contained in Allowed,
e.g.:
var
S: string;
const
Allowed = ['A' .. 'Z', 'a' .. 'z', '0' .. '9', '_'];
Function Valid: Boolean;
var
i: Integer;
begin
Result := Length(s) > 0;
i := 1;
while Result and (i <= Length(S)) do
begin
Result := Result AND (S[i] in Allowed);
inc(i);
end;
if Length(s) = 0 then Result := true;
end;
begin
S := 'This_is_my_string_0123456789';
if Valid then
ShowMessage('Ok')
else
ShowMessage('string contains invalid symbols');
end;
TYPE TCharSet = SET OF CHAR;
FUNCTION ValidString(CONST S : STRING ; CONST ValidChars : TCharSet) : BOOLEAN;
VAR
I : Cardinal;
BEGIN
Result:=FALSE;
FOR I:=1 TO LENGTH(S) DO IF NOT (S[I] IN ValidChars) THEN EXIT;
Result:=TRUE
END;
If you are using a Unicode version of Delphi (as you seem to be), beware that a SET OF CHAR cannot contain all valid characters in the Unicode character set. Then perhaps this function will be useful instead:
FUNCTION ValidString(CONST S,ValidChars : STRING) : BOOLEAN;
VAR
I : Cardinal;
BEGIN
Result:=FALSE;
FOR I:=1 TO LENGTH(S) DO IF POS(S[I],ValidChars)=0 THEN EXIT;
Result:=TRUE
END;
but then again, not all characters (actually code points) in Unicode can be expressed by a single Char, and some characters can be expressed in more than one way (both as a single character and as a multi-character sequence).
But as long as you constrain yourself within these limitations, one of the above functions should be useful. You can even include both, if you add an "OVERLOAD;" directive to the end of each function declaration, as in:
FUNCTION ValidString(CONST S : STRING ; CONST ValidChars : TCharSet) : BOOLEAN; OVERLOAD;
FUNCTION ValidString(CONST S,ValidChars : STRING) : BOOLEAN; OVERLOAD;
Lazarus/Free Pascal doesn't overload Pos for sets, but it has "PosSet" variants in the StrUtils unit for that purpose:
http://www.freepascal.org/docs-html/rtl/strutils/posset.html
Regarding Andreas' (IMHO correct ) remark, you can use isemptystr for that. It was meant to check for strings that only contain whitespace, but it basically checks if a string only contains characters in a set.
http://www.freepascal.org/docs-html/rtl/strutils/isemptystr.html
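As a minimal Free Pascal sketch of that idea, the allowed set is passed to IsEmptyStr as its "empty" characters, so the call reports whether the string consists solely of allowed characters. This assumes the IsEmptyStr(S, EmptyChars) form described on the linked page, and note that an empty string also passes, so a length check is added here.

```pascal
program IsEmptyStrDemo;
{$mode objfpc}{$H+}
uses
  SysUtils, StrUtils;

const
  Allowed = ['A'..'Z', 'a'..'z', '0'..'9', '_'];

var
  S: string;
begin
  S := 'This_is_my_string_0123456789';
  // IsEmptyStr is True when S holds nothing but characters from Allowed.
  if (Length(S) > 0) and IsEmptyStr(S, Allowed) then
    Writeln('Ok')
  else
    Writeln('string contains invalid symbols');
end.
```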
You can use Regular Expressions:
uses System.RegularExpressions;
if not TRegEx.IsMatch(S, '^[_a-zA-Z0-9]+$') then
ShowMessage('string contains invalid symbols');
Here is another question about converting old code to D2009 and Unicode. I'm certain that there is a simple solution, but I don't see it...
CharacterSet is a set of Char and s[i] should also be a Char.
But the compiler still think there is a conflict between AnsiChar and Char.
The code:
TSetOfChar = Set of Char;
procedure aFunc;
var
CharacterSet: TSetOfChar;
s: String;
j: Integer;
CaseSensitive: Boolean;
begin
// Other code that assign a string to s
// Set CaseSensitive to a value
CharacterSet := [];
for j := 1 to Length(s) do
begin
Include(CharacterSet, s[j]); // E2010 Incompatible types: 'AnsiChar' and 'Char'
if not CaseSensitive then
begin
Include(CharacterSet, AnsiUpperCase(s[j])[1]);
Include(CharacterSet, AnsiLowerCase(s[j])[1])
end
end;
end;
Because a Pascal set can't have a range higher than 0..255, the compiler quietly converts sets of chars to sets of AnsiChars. That's what's causing trouble for you.
There is no good and simple answer to the question (the reason is already given by Mason). The good solution is to reconsider the algorithm and get rid of the "set of char" type. The quick and dirty solution is to preserve ANSI chars and strings:
TSetOfChar = Set of AnsiChar;
procedure aFunc;
var
CharacterSet: TSetOfChar;
s: String;
S1, SU, SL: Ansistring;
j: Integer;
CaseSensitive: Boolean;
begin
// Other code that assign a string to s
// Set CaseSensitive to a value
S1:= s;
SU:= AnsiUpperCase(s);
SL:= AnsiLowerCase(s);
CharacterSet := [];
for j := 1 to Length(S1) do
begin
Include(CharacterSet, S1[j]);
if not CaseSensitive then
begin
Include(CharacterSet, SU[j]);
Include(CharacterSet, SL[j]);
end
end;
end;
Delphi does not support sets of Unicode characters. You can only use AnsiChar in a set, but that's not big enough to fit all the possible characters your string might hold.
Instead of Delphi's native set type, though, you can use the TBits type.
procedure aFunc;
var
CharacterSet: TBits;
s: String;
c: Char;
CaseSensitive: Boolean;
begin
// Other code that assign a string to s
// Set CaseSensitive to a value
CharacterSet := TBits.Create;
try
for c in s do begin
CharacterSet[Ord(c)] := True;
if not CaseSensitive then begin
CharacterSet[Ord(Character.ToUpper(c))] := True;
CharacterSet[Ord(Character.ToLower(c))] := True;
end
end;
finally
CharacterSet.Free;
end;
end;
A TBits object automatically expands to accommodate the highest bit it needs to represent.
Other changes I made to your code include using the new "for-in" loop style, and the new Character unit for dealing with Unicode characters.
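The automatic expansion applies when a bit is set. Reading an index at or beyond Size is, as far as I know, an error, so a membership test should compare against Size first. Here is a minimal sketch; `BitsContains` is a hypothetical helper name, and the System.Classes unit name assumes XE2 or later (plain Classes in D2009).

```pascal
program BitsCharSetDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;

// Hypothetical helper: True if the bit for C is set.
// Checks Size first so that unset high indexes are never read.
function BitsContains(Bits: TBits; C: Char): Boolean;
begin
  Result := (Ord(C) < Bits.Size) and Bits[Ord(C)];
end;

var
  CharacterSet: TBits;
  S: string;
  C: Char;
begin
  S := 'h'#$00E9'llo';                      // includes a non-ASCII character
  CharacterSet := TBits.Create;
  try
    for C in S do
      CharacterSet[Ord(C)] := True;         // grows the TBits as needed
    Writeln(BitsContains(CharacterSet, 'h'));
    Writeln(BitsContains(CharacterSet, 'z'));
    Writeln(BitsContains(CharacterSet, #$4E2D)); // far beyond Size
  finally
    CharacterSet.Free;
  end;
end.
```

The check relies on short-circuit Boolean evaluation, which is the Delphi default.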
I have a submenu that lists departments. Behind it, each department has an action whose name is assigned as 'actPlan' + department.name.
Now I realize this was a bad idea, because the name can contain any strange character in the world, but action.name cannot contain international characters. Obviously the Delphi IDE itself calls some method to validate whether a string is a valid component name. Does anyone know more about this?
I have also an idea to use
Action.name := 'actPlan' + department.departmentID;
instead. The advantage is that departmentID has a known format, 'xxxxx-x' (where x is 1-9), so I only have to replace '-' with, for example, an underscore. The problem is that the old action names are already persisted in a personal text file, so there will be exceptions if I suddenly switch from using the department name to the ID.
I could of course eat the exception first time and then call a method that search replace that textfile with the right data and reload it.
So basically I am looking for the most elegant and future-proof way to solve this :)
I use D2007.
Component names are validated using the IsValidIdent function from SysUtils, which simply checks whether the first character is alphabetic or an underscore and whether all subsequent characters are alphanumeric or an underscore.
To create a string that fits those rules, simply remove any characters that don't qualify, and then add a qualifying character if the result starts with a number.
That transformation might yield the same result for similar names. If that's not something you want, then you can add something unique to the end of the string, such as a checksum computed from the input string, or your department ID.
function MakeValidIdent(const s: string): string;
var
x: Integer;
c: Char;
begin
SetLength(Result, Length(s));
x := 0;
for c in s do
if c in ['A'..'Z', 'a'..'z', '0'..'9', '_'] then begin
Inc(x);
Result[x] := c;
end;
SetLength(Result, x);
if x = 0 then
Result := '_'
else if Result[1] in ['0'..'9'] then
Result := '_' + Result;
// Optional uniqueness protection follows. Choose one.
Result := Result + IntToStr(Checksum(s));
Result := Result + GetDepartment(s).ID;
end;
In Delphi 2009 and later, replace the second two in operators with calls to the CharInSet function. (Unicode characters don't work well with Delphi sets.) In Delphi 8 and earlier, change the first in operator to a classic for loop and index into s.
I have written a routine
// See SysUtils.IsValidIdent:
function MakeValidIdent(const AText: string): string;
const
Alpha = ['A'..'Z', 'a'..'z', '_'];
AlphaNumeric = Alpha + ['0'..'9'];
function IsValidChar(AIndex: Integer; AChar: Char): Boolean;
begin
if AIndex = 1 then
Result := AChar in Alpha
else
Result := AChar in AlphaNumeric;
end;
var
i: Integer;
begin
Result := AText;
for i := 1 to Length(Result) do
if not IsValidChar(i, Result[i]) then
Result[i] := '_';
end;
which makes Pascal identifiers from strings.
You might also want to copy FindUniqueName from Classes.pas and apply that to the result from MakeValidIdent.
Here is my routine:
function MakeValidIdent(const s: string): string;
begin
Result := 'clm'; //Prefix
for var c in s do
if CharInSet(c, ['A'..'Z', 'a'..'z', '0'..'9', '_']) then
Result := Result + c;
end;