What is the equivalent of System.Character.TCharHelper.IsWhiteSpace / IsLetter / IsNumber for AnsiChar (UTF8)?
In general, it does not make sense to ask whether a single UTF-8 element (single byte) represents a whitespace. That's because UTF-8 is a variable length encoding and a code point may require more than a single byte to define it.
So you cannot ask whether or not a single byte is a whitespace, unless it encodes an ASCII character, i.e. < 128.
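For the ASCII-only case, a single byte can be classified directly. The following is a minimal sketch, not a replacement for TCharHelper: the function names are hypothetical, and the whitespace set (tab, LF, VT, FF, CR, space) is an assumption about what should count as whitespace. Any byte >= 128 is part of a multi-byte sequence and is simply reported as "no".

```pascal
// Sketch: classify one byte taken from a UTF-8 string.
// Only meaningful for ASCII values (< 128); lead and continuation
// bytes of multi-byte sequences never match any of these sets.
function IsAsciiWhiteSpace(B: AnsiChar): Boolean;
begin
  Result := B in [#9..#13, ' '];
end;

function IsAsciiLetter(B: AnsiChar): Boolean;
begin
  Result := B in ['A'..'Z', 'a'..'z'];
end;

function IsAsciiDigit(B: AnsiChar): Boolean;
begin
  Result := B in ['0'..'9'];
end;
```

These helpers deliberately say nothing about non-ASCII letters, digits, or spaces; for those, the conversion route described next is needed.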
What you would need to do is to take the sequence of bytes that encode the code point of interest, and convert them into a UTF-32 value in a UCS4Char variable. Then pass that to the UCS4Char overload of TCharHelper.IsWhiteSpace.
However, that approach is not well supported by the Delphi libraries. The simplest way to do what you wish in Delphi is:
Convert your UTF-8 string to be a native UTF-16 Delphi string.
Use TCharHelper.IsWhiteSpace(str, index) to query for the code point at position index.
If your question is how to check whether a UTF8String variable consists entirely of whitespace, you can use the following RECORD HELPER:
TYPE
U8StringHelper = RECORD HELPER FOR UTF8String
FUNCTION IsAllWhiteSpaces : BOOLEAN;
END;
FUNCTION U8StringHelper.IsAllWhiteSpaces : BOOLEAN;
VAR
C : CHAR;
S : UnicodeString;
BEGIN
S:=Self;
FOR C IN S DO IF NOT C.IsWhiteSpace THEN EXIT(FALSE);
Result:=TRUE
END;
Then you can use it as in:
VAR
U8 : UTF8String;
BEGIN
U8:=' '#13#10;
IF U8.IsAllWhiteSpaces THEN WRITELN('Yes') ELSE WRITELN('No');
U8:=' X'#13#10;
IF U8.IsAllWhiteSpaces THEN WRITELN('Yes') ELSE WRITELN('No');
END.
This will write out "Yes" followed by "No".
But please beware: only one helper per type is active in a given scope, so by defining your own helper for the UTF8String type you hide any helper that the system (or another unit) may have defined for it. If that is a problem, you'll have to write a standard function instead:
FUNCTION IsAllWhiteSpaces(CONST U8 : UTF8String) : BOOLEAN;
VAR
C : CHAR;
S : UnicodeString;
BEGIN
S:=U8;
FOR C IN S DO IF NOT C.IsWhiteSpace THEN EXIT(FALSE);
Result:=TRUE
END;
and use it as follows:
VAR
U8 : UTF8String;
BEGIN
U8:=' '#13#10;
IF IsAllWhiteSpaces(U8) THEN WRITELN('Yes') ELSE WRITELN('No');
U8:=' X'#13#10;
IF IsAllWhiteSpaces(U8) THEN WRITELN('Yes') ELSE WRITELN('No');
END.
I'll leave the making of the other IsXXX functions up to the reader...
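As one possible starting point, here is a sketch of an IsAllLetters counterpart built the same way. The function name is hypothetical, and it assumes a Delphi version where System.Character provides the Char helper with IsLetter. Because it inspects one UTF-16 Char at a time, letters outside the Basic Multilingual Plane (stored as surrogate pairs) would be rejected.

```pascal
uses
  System.Character;

// Sketch: True if every UTF-16 code unit of the converted string
// is a letter. An empty string yields True, like IsAllWhiteSpaces.
function IsAllLetters(const U8: UTF8String): Boolean;
var
  C: Char;
  S: UnicodeString;
begin
  S := UnicodeString(U8);   // UTF-8 -> UTF-16 conversion
  for C in S do
    if not C.IsLetter then
      Exit(False);
  Result := True;
end;
```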
Okay - after we have finally determined the proper question, the easiest way for you is to simply cast-up the AnsiChar variable to a proper UNICODE char and then do your thing.
VAR
A : AnsiChar;
BEGIN
IF CHAR(A).IsLetter THEN ...
END.
HOWEVER: working with individual characters from a UTF-8 string is not advisable, because many characters (by the very nature of UTF-8) are encoded as sequences of two to four bytes. You therefore cannot decide whether a single AnsiChar from a UTF-8 string is anything: it may merely be the lead byte or a continuation byte of a longer sequence. The cast above is only meaningful for ASCII values (< 128).
So the best way would be to have your UTF8-String and assign it to a UNICODE string variable, and then use the proper CHAR type to iterate over it.
If your question is how to "convert" an AnsiString encoded in UTF-8 into a UNICODE string, you can use the following routine:
FUNCTION AnsiUTF8toUNICODE(CONST S : AnsiString) : STRING;
BEGIN
Result:=UTF8ToUnicodeString(RawByteString(S))
END;
I have a simple Labview dll that takes a PascalString then returns the pascal string with no changes. This is just testing what we can do. The header is as follows:
void __stdcall Read_String_In_Write_String_Out(PStr String_input,
PStr String_output);
the Delphi code is as follows:
var
hbar : thandle;
str, str2 : PChar;
StringFunction : function (TestString: PChar): PChar; stdcall;
begin
hbar := LoadLibrary('C:\Interface.dll');
if hbar >= 32 then begin
StringFunction := getprocaddress(hbar, 'Read_String_In_Write_String_Out');
str := 'test';
str2 := StringFunction(str);
end;
end;
When running the program i get an Access Violation. I have no issues when doing simple math functions using dll's, but when it comes to strings everything breaks.
Can anyone help?
You say that the DLL function is taking in a Pascal String. According to Labview's documentation:
Pascal String Pointer is a pointer to the string, preceded by a length byte.
Pascal-Style Strings (PStr)
A Pascal-style string (PStr) is a series of unsigned characters. The value of the first character indicates the length of the string. A PStr can have a range of 0 to 255 characters. The following code is the type definition for a Pascal string.
typedef uChar Str255[256], Str31[32], *StringPtr, **StringHandle;
typedef uChar *PStr;
This would be equivalent to Delphi's ShortString type (well, more accurately, PShortString, ie a pointer to a ShortString).
Based on the DLL function's declaration, its 2nd parameter is not a return value, it is an input parameter taking in a pointer by value. So your use of StringFunction is wrong on 2 counts:
Getting the output in the wrong place. StringFunction should be a procedure with 2 parameters. However, the function can't modify the pointer in the 2nd parameter, all it can do is read/write data from/to whatever memory the pointer is pointing at. So, for output, you will have to pre-allocate memory for the function to write to.
Passing around the wrong kind of string data. PChar is PWideChar in Delphi 2009+, but "Pascal strings" use AnsiChar instead. And your test data is not even a Pascal string, as it lacks the leading length byte.
So, try something more like this instead:
var
hbar : THandle;
str1, str2 : ShortString;
StringFunction : procedure (String_input, String_output: PShortString); stdcall;
begin
hbar := LoadLibrary('C:\Interface.dll');
if hbar <> 0 then
begin
StringFunction := GetProcAddress(hbar, 'Read_String_In_Write_String_Out');
str1 := 'test';
StringFunction(@str1, @str2);
end;
end;
Please excuse the silly question, but I'm confused. Consider the following method (sorry for noisy comments, this is a real code under development):
function HLanguages.GetISO639LangName(Index: Integer): string;
const
MaxIso639LangName = 9; { see msdn.microsoft.com/en-us/library/windows/desktop/dd373848 }
var
LCData: array[0..MaxIso639LangName-1] of Char;
Length: Integer;
begin
{ TODO : GetLocaleStr sucks, write proper implementation }
//Result := GetLocaleStr(LocaleID[Index], LOCALE_SISO639LANGNAME, '??');
Length := GetLocaleInfo(LocaleID[Index], LOCALE_SISO639LANGNAME, @LCData, System.Length(LCData));
Win32Check(Length <> 0);
SetString(Result, @LCData, Length); // "E2008 Incompatible types" here, but why?
end;
If I remove the reference operator then implicit cast from $X+ comes to the rescue and method compiles. Why compiler refuses this code with reference operator is beyond my understanding.
This is Delphi XE2 and this behaviour might be specific to it.
And if I add a test-case dummy with equivalent prototype as intrinsic one within the scope of HLanguages.GetISO639LangName this error will magically go away:
procedure SetString(var s: string; buffer: PChar; len: Integer);
begin
{ test case dummy }
end;
You have to explicitly convert it to PChar:
SetString(result,PChar(@LCData),Length);
As you stated, SetString() is very demanding about the 2nd parameter type. It must be a PChar, a PWideChar, or a PAnsiChar, depending on the string type itself.
I suspect this is due to the fact that SetString() is defined as overloaded with either a string, a WideString, or an AnsiString as 1st parameter. So in order to validate the right signature, it needs to have exact match of all parameters types:
SetString(var s: string; buf: PChar; len: integer); overload;
SetString(var s: AnsiString; buf: PAnsiChar; len: integer); overload;
SetString(var s: WideString; buf: PWideChar; len: integer); overload;
Of course, all those are "intrinsics", so you won't find such definitions in System.pas; the compiler instead calls procedures like _LStrFromPCharLen(), _UStrFromPCharLen(), _WStrFromPWCharLen() or similar directly.
This behavior is the same since early versions of Delphi, and is not a regression in XE2.
I think there's a compiler bug in there because the behaviour with SetString differs from the behaviour with overloaded functions that you provide. What's more, there's an interaction with the Typed @ operator compiler option. I don't know how you set that. I always enable it but I suspect I'm in the minority there.
So I cannot explain the odd behaviour, and answer the precise question you ask. I suspect the only way to answer it is to look at the internals of the compiler, and very few of us can do that.
Anyway, in case it helps, I think the cleanest way to pass the parameter is like so:
SetString(Result, LCData, Length);
This compiles no matter what you set Typed @ operator to.
I know this doesn't answer the specific question regarding SetString, but I'd like to point out that you can do the same thing by simply writing
Result := LCData;
When assigning to a string, Delphi treats a static array of char with ZERO starting index, as a Null terminated string with maximum length. Consider the following:
var
IndexOneArray : array [ 1 .. 9 ] of char;
IndexZeroArray : array [ 0 .. 8 ] of char;
S : string;
T : string;
begin
IndexOneArray := 'ABCD'#0'EFGH';
IndexZeroArray := 'ABCD'#0'EFGH';
S := IndexOneArray;
T := IndexZeroArray;
ShowMessage ( 'S has ' + inttostr(length(S)) + ' chars. '
+ #13'T has ' + inttostr(length(T)) + ' chars. ' );
end;
This displays a message that S has 9 chars, while T has 4.
It will also work when the zero-index array has 9 non-null characters. The result will be 9 characters regardless of what's in the following memory locations.
Because @LCData is a pointer to the array, not to a Char. Sure, sometimes an array, a record, or a class happens to start with a Char-typed variable, but coincidences are not what a statically typed compiler should rely upon.
You have to take the pointer to a character in that array, not to the array itself.
SetString(Result, @LCData[Low(LCData)], Length);
Question One
I have
var example : array[0..15] of char;
I want to assign the value from an input to that variable
example := inputbox('Enter Name', 'Name', '');
In the highscores unit I have record and array
type
points = record
var
_MemoryName : array[0..15] of char;
_MemoryScore : integer;
end;
var
rank : array[1..3] of points;
var s: string;
a: packed array[0..15] of char;
highscoresdata.position[1]._MemoryName := StrPLCopy(a, s, Length(a)) ;
returns -> (186): E2010 Incompatible types: 'array[0..15] of Char' and 'PWideChar'
var s: string;
a: packed array[0..15] of char;
s := InputBox('caption', 'Caption', 'Caption');
FillChar(a[0], length(a) * sizeof(char), #0);
Move(s[1], a[0], length(a) * sizeof(char));
scores.rank[1]._MemoryName := <<tried both s and a>> ;
returns (189): E2008 Incompatible types
Question One
There are many ways. One is:
procedure TForm1.FormCreate(Sender: TObject);
var
s: string;
a: packed array[0..15] of char;
begin
s := InputBox(Caption, Caption, Caption);
assert(length(s) <= 16);
FillChar(a[0], length(a) * sizeof(char), #0);
Move(s[1], a[0], length(s) * sizeof(char));
end;
But there might be a more elegant solution to your original problem, I suspect.
Question Two
Every time you wish a function/procedure didn't have a particular argument, you should realize that there might be a problem with the design of the project. Nevertheless, it isn't uncommon that Sender parameters are superfluous, because they are almost omnipresent because of the design of the VCL (in particular, the TNotifyEvent). If you know that the receiving procedure doesn't care about the Sender parameter, simply give it anything, like Self or nil.
Question Three
Consider this code:
procedure TForm4.FormCreate(Sender: TObject);
var
a: packed array[0..15] of char;
b: packed array[0..15] of char;
begin
a := b;
end;
This doesn't work. You cannot treat arrays like strings; in particular, you cannot assign static arrays like this (a := b).
Instead, you have to do something like...
Move(b[0], a[0], length(a) * sizeof(char));
...or simply loop and copy one value at a time. But the above simple assignment (a := b) does work if you declare a static array type:
type
TChrArr = packed array[0..15] of char;
procedure TForm4.FormCreate(Sender: TObject);
var
a: TChrArr;
b: TChrArr;
begin
b := a;
end;
Andreas has you covered for question 1.
Question 2
I would arrange that your event handler called another method:
procedure TForm5.Edit1KeyPress(Sender: TObject; var Key: Char);
begin
RespondToEditControlKeyPress;
end;
That way you can just call RespondToEditControlKeyPress directly.
I'd guess that you want to call it with no parameters because you want code to run when the edit control's text is modified. You could perhaps use the OnChange event instead. And it may be that OnChange is more appropriate because pressing a key is not the only way to get text into an edit control.
By the way, it's better to ask one question at a time here on Stack Overflow.
For a quick way to copy string-type values into array-of-character values, I suggest a small helper function like this:
procedure StrToCharArray( inputStr:String; var output; maxlen:Integer);
type
ArrayChar = Array[0..1] of Char;
begin
StrLCopy( PChar(@ArrayChar(output)[0]),PChar(inputStr),maxlen);
end;
Each time you call it, pass in the maximum length to be copied. Remember that if the buffer length is 15, you should pass in 14 as the maxlen, so that you leave room for the terminating nul character, if you intend to always terminate your strings:
StrToCharArray( UserInputStr, MyRecord.MyField, 14 );
This function will ensure that the data you copy into the record is null terminated, assuming that's what you wanted. Remember that in a fixed length character array it's up to you to decide what the rules are. Null terminated? Fully padded with spaces or null characters.... Strings and arrays-of-characters are so different, that there exist multiple possible ways of converting between the two.
If you don't intend to terminate your strings with nul, then you should use the FillChar+Move combination shown in someone else's answer.
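For that non-terminated case, the FillChar+Move combination might be wrapped roughly as follows. This is only a sketch: TNameArray and StrToPaddedArray are hypothetical names, and the rule chosen here (pad with #0, silently truncate input that is too long) is an assumption about what the record format expects.

```pascal
type
  TNameArray = array[0..15] of Char;  // hypothetical fixed-size field type

// Sketch: copy S into Dest, padding the remainder with #0.
// Input longer than the array is truncated; no terminator is
// guaranteed when S fills the array completely.
procedure StrToPaddedArray(const S: string; out Dest: TNameArray);
var
  N: Integer;
begin
  FillChar(Dest, SizeOf(Dest), 0);
  N := Length(S);
  if N > Length(Dest) then
    N := Length(Dest);
  if N > 0 then
    Move(PChar(S)^, Dest, N * SizeOf(Char));  // byte count, not char count
end;
```

It would be called as StrToPaddedArray(UserInputStr, MyRecord.MyField), provided the field is declared with the same named array type.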
The obvious answer is of course.
Don't use a packed array of char.
Use a string instead.
If you use ansistring, 1 char will always take 1 byte.
If you use shortstring ditto.
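AnsiString is compatible with PAnsiChar (plain PChar before Delphi 2009), which is a pointer to a null-terminated array of AnsiChar.
So you can write
function inputbox(a,b,c: ansistring): ansistring;
begin
// return a string, not a pchar: a pointer to the temporary a+b+c would dangle
Result:= a+b+c;
end;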
var s: ansistring;
begin
s:= inputbox('a','b','c');
end;
Some advice
It looks like you are translating code from C to Delphi.
A packed array of char is exactly the same as the old (1995) ShortString minus the length byte at the beginning of the ShortString.
The only reason I can think of to use packed array of char is when you are reading data to and from disk, and you have legacy code that you don't want to change.
I would keep the legacy code to read and write from disk and then transfer the data into an ansistring and from there on only use ansistring.
It's soooooooo much easier, Delphi does everything for you.
And... ansistring is much faster, gets automatically created and destroyed, can have any length (up to 2GB), and can use less memory, because assigning one string to another only copies a reference and increments a reference count (which means stringa := stringb, where the string is 20 chars, is at least 5x faster using ansistrings than arrays of char).
And of course best of all, buffer overflow errors are impossible with ansistring.
What about unicodestring?
Unicodestring is fine to use, but sometimes translation of chars happens when converting between packed array of char and unicodestring, therefore I recommend using ansistring in this context.
What you try to do is impossible, indeed:
highscoresdata.position[1]._MemoryName := StrPLCopy(a, s, Length(a));
That tries to assign a pointer (the result of StrPLCopy, a PWideChar in the last few versions of Delphi) to an array, which is indeed impossible. You can't copy an array like that. I would do:
StrLCopy(highscoresdata.position[1]._MemoryName, PChar(s),
Length(highscoresdata.position[1]._MemoryName));
That should work, and is IMO the simplest solution to copy a string to an array of characters. There is no need to use a as some kind of intermediate, and using Move is, IMO, rather low level and therefore a little tricky (it is easy to forget to multiply by the size of a character, it is unchecked, it does not add a #0, etc.), especially if you don't know what exactly you are doing.
This solution should even work for versions of Delphi before Delphi 2009, as it does not rely on the size of the character.
FWIW, I would not use packed arrays. Packed doesn't have a meaning in current Delphi, but could confuse the compiler and make the types incompatible.
When I compile this code
{$WARNINGS ON}
function Test(s: string): string;
var
t: string;
d: double;
begin
if s = '' then begin
t := 'abc';
d := 1;
end;
Result := t + FloatToStr(d);
end;
I get the warning "Variable 'd' might not have been initialized", but I do not get the same warning for variable 't'. This seems inconsistent. This code is only a simple example to show the compiler warnings, but I have just found a bug in my live code which would have been caught by a compile-time warning for uninitialised string variables. Can I switch this warning on somehow in Delphi 6? Or in a newer version of Delphi?
Nope, there is no switch for this. The warning doesn't occur because a string is a compiler managed type and is always initialized by the compiler.
Yes :-)
Use shortstrings or pChars
{$WARNINGS ON}
function Test: String;
var
p: pChar;
d: double;
begin
Result := p + FloatToStr(d);
end;
//This code will give a warning.
Seriously
No. Normal Delphi strings are automatically initialized to '' (the empty string). They are so-called 'managed' types and are freed automatically, through reference counting, when they are no longer used. ShortStrings live on the stack and don't need cleanup.
PChars, the good news
pChars are just pointers. Delphi does not manage them.
However, Delphi does automatically convert them to strings and vice versa.
pChars the bad news
If you convert a pChar to a string Delphi copies the contents of the pChar into the string and you are still responsible for destroying the pChar.
Also note that this copying takes time and if you do it a lot will slow your code down.
If you cast a string to a pChar, Delphi gives you a pointer to the address the string's characters live at. And !! anything written through that pointer bypasses Delphi's bookkeeping: the string's stored length is not updated, so you have to fix it up yourself afterwards.
From: http://www.marcocantu.com/epascal/English/ch07str.htm
The following code will not work as expected:
procedure TForm1.Button2Click(Sender: TObject);
var
S1: String;
begin
SetLength (S1, 100);
GetWindowText (Handle, PChar (S1), Length (S1));
S1 := S1 + ' is the title'; // this won't work
Button1.Caption := S1;
end;
This program compiles, but when you run it, you are in for a surprise: The Caption of the button will have the original text of the window title, without the text of the constant string you have added to it. The problem is that when Windows writes to the string (within the GetWindowText API call), it doesn't set the length of the long Pascal string properly. Delphi still can use this string for output and can figure out when it ends by looking for the null terminator, but if you append further characters after the null terminator, they will be skipped altogether.
How can we fix this problem? The solution is to tell the system to convert the string returned by the GetWindowText API call back to a Pascal string. However, if you write the following code:
S1 := String (S1);
the system will ignore it, because converting a data type back into itself is a useless operation. To obtain the proper long Pascal string, you need to recast the string to a PChar and let Delphi convert it back again properly to a string:
S1 := String (PChar (S1));
Actually, you can skip the string conversion, because PChar-to-string conversions are automatic in Delphi. Here is the final code:
procedure TForm1.Button3Click(Sender: TObject);
var
S1: String;
begin
SetLength (S1, 100);
GetWindowText (Handle, PChar (S1), Length (S1));
S1 := String (PChar (S1));
S1 := S1 + ' is the title';
Button3.Caption := S1;
end;
An alternative is to reset the length of the Delphi string, using the length of the PChar string, by writing:
SetLength (S1, StrLen (PChar (S1)));
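A variation on the same idea is to use the character count that GetWindowText itself returns instead of scanning for the terminator. The sketch below is an illustration only: GetWindowTitle is a hypothetical helper name, and it assumes the Winapi.Windows unit (plain Windows in older Delphi versions).

```pascal
uses
  Winapi.Windows;

// Sketch: read a window title into a correctly sized Delphi string.
function GetWindowTitle(Wnd: HWND): string;
var
  Len: Integer;
begin
  // Reserve room for the reported title length.
  SetLength(Result, GetWindowTextLength(Wnd));
  if Result = '' then
    Exit;  // nothing to read; never write through PChar('')
  // A Delphi string buffer holds Length + 1 characters (terminator included).
  Len := GetWindowText(Wnd, PChar(Result), Length(Result) + 1);
  // Len is the number of characters copied, excluding the terminator.
  SetLength(Result, Len);
end;
```

After the final SetLength the stored length matches the text, so appending ' is the title' behaves as expected.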
I found a Windows API function that performs "natural comparison" of strings. It is defined as follows:
int StrCmpLogicalW(
LPCWSTR psz1,
LPCWSTR psz2
);
To use it in Delphi, I declared it this way:
interface
function StrCmpLogicalW(psz1, psz2: PWideChar): integer; stdcall;
implementation
function StrCmpLogicalW; external 'shlwapi.dll' name 'StrCmpLogicalW';
Because it compares Unicode strings, I'm not sure how to call it when I want to compare ANSI strings. It seems to be enough to cast strings to WideString and then to PWideChar, however, I have no idea whether this approach is correct:
function AnsiNaturalCompareText(const S1, S2: string): integer;
begin
Result := StrCmpLogicalW(PWideChar(WideString(S1)), PWideChar(WideString(S2)));
end;
I know very little about character encoding so this is the reason of my question. Is this function OK or should I first convert both the compared strings somehow?
Keep in mind that casting a string to a WideString will convert it using default system codepage which may or may not be what you need. Typically, you'd want to use current user's locale.
From WCharFromChar in System.pas:
Result := MultiByteToWideChar(DefaultSystemCodePage, 0, CharSource, SrcBytes,
WCharDest, DestChars);
You can change DefaultSystemCodePage by calling SetMultiByteConversionCodePage.
The easier way to accomplish the task would be to declare your function as:
interface
function StrCmpLogicalW(const sz1, sz2: WideString): Integer; stdcall;
implementation
function StrCmpLogicalW; external 'shlwapi.dll' name 'StrCmpLogicalW';
Because a WideString variable is a pointer to a WideChar (in the same way an AnsiString variable is a pointer to an AnsiChar.)
And this way Delphi will automatically "up-convert" an AnsiString to a WideString for you.
Update
And since we're now in the world of UnicodeString, you would make it:
interface
function StrCmpLogicalW(const sz1, sz2: UnicodeString): Integer; stdcall;
implementation
function StrCmpLogicalW; external 'shlwapi.dll' name 'StrCmpLogicalW';
Because a UnicodeString variable is still a pointer to a null-terminated string of WideChars. So you can call:
var
s1, s2: AnsiString;
nCompare: Integer;
begin
s1 := 'Hello';
s2 := 'world';
nCompare := StrCmpLogicalW(s1, s2);
end;
When you pass an AnsiString to a function that takes a UnicodeString, the compiler automatically inserts the conversion (which ends up calling MultiByteToWideChar) in the generated code.
CompareString supports numeric sorting in Windows 7
Starting in Windows 7, Microsoft added SORT_DIGITSASNUMBERS to CompareString:
Windows 7: Treat digits as numbers during sorting, for example, sort "2" before "10".
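A sketch of how that flag might be used from Delphi follows. NaturalCompare and SORT_DIGITSASNUMBERS_FLAG are hypothetical names; the constant is declared locally with the value the Windows SDK headers give for SORT_DIGITSASNUMBERS, since whether the Windows unit declares it depends on the Delphi version. The sketch assumes Delphi 2009 or later (string = UnicodeString) and Windows 7 or later.

```pascal
uses
  Winapi.Windows, System.SysUtils;

const
  // Assumption: SDK value of SORT_DIGITSASNUMBERS, declared locally.
  SORT_DIGITSASNUMBERS_FLAG = $00000008;

// Sketch: returns < 0, 0 or > 0, in the style of CompareText.
function NaturalCompare(const S1, S2: string): Integer;
var
  R: Integer;
begin
  R := CompareStringW(LOCALE_USER_DEFAULT, SORT_DIGITSASNUMBERS_FLAG,
    PWideChar(S1), Length(S1), PWideChar(S2), Length(S2));
  if R = 0 then
    RaiseLastOSError;  // CompareString returns 0 on failure
  // CompareString yields 1 (less), 2 (equal) or 3 (greater).
  Result := R - 2;
end;
```

Unlike StrCmpLogicalW, this uses the locale passed in the first argument, so the ordering of non-digit text follows the user's default locale here.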
None of this helps answer the actual question, which deals with when you have to convert or cast strings.
There might be an ANSI variant of your function too (I haven't checked). Most wide APIs are also available in an ANSI version: just change the W suffix to an A and you're set. Windows does the back-and-forth conversion transparently for you in that case.
PS: Here's an article describing the lack of StrCmpLogicalA : http://blogs.msdn.com/joshpoley/archive/2008/04/28/strcmplogicala.aspx
Use System.StringToOleStr, which is a handy wrapper around MultiByteToWideChar, see Gabr's answer:
function AnsiNaturalCompareText(const S1, S2: string): integer;
var
W1: PWideChar;
W2: PWideChar;
begin
W1 := StringToOleStr(S1);
W2 := StringToOleStr(S2);
Result := StrCmpLogicalW(W1, W2);
SysFreeString(W1);
SysFreeString(W2);
end;
But then, Ian Boyd's solution looks and is much nicer!