Is there an ANSI version for Copy? - delphi

Under Delphi XE, is there an ANSI version for Copy?
I am using Copy a lot to copy pieces of ANSI strings.

Altar, the Copy function in Delphi is an intrinsic function, which means it is handled by the compiler rather than by the run-time library. Depending on the parameters passed, this function calls the LStrCopy or the UStrCopy internal function.
Check this sample:
program Project91;
{$APPTYPE CONSOLE}
uses
  SysUtils;
var
  s : AnsiString;
  u : string;
begin
  try
    s := 'this is a ansi string';
    s := Copy(s, 1, 5);
    Writeln(s);
    u := 'this is a unicode string';
    u := Copy(u, 1, 5);
    Writeln(u);
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  Readln;
end.
Now check the generated assembly code:
Project91.dpr.12: s:='this is a ansi string';
004111DC B8787E4100 mov eax,$00417e78
004111E1 BA04134100 mov edx,$00411304
004111E6 E8314FFFFF call #LStrAsg
Project91.dpr.13: s:= Copy(s,1,5);
004111EB 68787E4100 push $00417e78
004111F0 B905000000 mov ecx,$00000005
004111F5 BA01000000 mov edx,$00000001
004111FA A1787E4100 mov eax,[$00417e78]
004111FF E8A050FFFF call #LStrCopy //call the ansi version of copy
Project91.dpr.14: Writeln(s);
00411204 A1EC2C4100 mov eax,[$00412cec]
00411209 8B15787E4100 mov edx,[$00417e78]
0041120F E84033FFFF call #Write0LString
00411214 E8DF33FFFF call #WriteLn
00411219 E8D22AFFFF call #_IOTest
Project91.dpr.15: u:='this is a unicode string';
0041121E B87C7E4100 mov eax,$00417e7c
00411223 BA28134100 mov edx,$00411328
00411228 E8534EFFFF call #UStrAsg
Project91.dpr.16: u:= Copy(u,1,5);
0041122D 687C7E4100 push $00417e7c
00411232 B905000000 mov ecx,$00000005
00411237 BA01000000 mov edx,$00000001
0041123C A17C7E4100 mov eax,[$00417e7c]
00411241 E8C654FFFF call #UStrCopy //call the unicode version of copy
Project91.dpr.17: Writeln(u);
00411246 A1EC2C4100 mov eax,[$00412cec]

Copy is a "compiler magic" routine: it is handled intrinsically by the compiler, depending on what parameters you pass it (AnsiString, string, or dynamic array). You can just use Copy; it works correctly with ANSI strings.
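A small console sketch of the point above, assuming Delphi 2009 or later (XE, as in the question). It calls the same Copy on an AnsiString, a UnicodeString and a dynamic array; SizeOf on the result's elements shows that the AnsiString result keeps its 1-byte characters. The program, type and variable names are made up for illustration.

```pascal
program CopyMagicDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TIntArray = array of Integer; // declared locally instead of relying on TArray<T>

var
  A, APart: AnsiString;
  U, UPart: string;
  Arr, ArrPart: TIntArray;
begin
  // AnsiString in, AnsiString out
  A := 'this is an ansi string';
  APart := Copy(A, 1, 4);
  Writeln(APart, ' ', SizeOf(APart[1]));      // this 1

  // UnicodeString in, UnicodeString out
  U := 'this is a unicode string';
  UPart := Copy(U, 11, 7);
  Writeln(UPart, ' ', SizeOf(UPart[1]));      // unicode 2

  // Dynamic arrays: the start index is 0-based
  SetLength(Arr, 5);
  Arr[0] := 10; Arr[1] := 20; Arr[2] := 30; Arr[3] := 40; Arr[4] := 50;
  ArrPart := Copy(Arr, 1, 3);                 // elements 1..3: 20, 30, 40
  Writeln(Length(ArrPart), ' ', ArrPart[0]);  // 3 20
end.
```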

I have the same problem, see this code:
uses
  Math; // for Min
const
  TheStart=13;
  TheEnd=69;
type
  TMyFileField = array[TheStart..TheEnd] of Char; // This is a simplification of a field type on a file
procedure WriteWideStringToArrayOfChars(TheLiteral: WideString);
var
  MyFileField: TMyFileField; // This is a simplification, it is really a Field inside a File
  MyIndex: Integer;
begin
  // Copy as many characters as fit from TheLiteral to MyFileField
  for MyIndex := 1 to Min(Length(TheLiteral), 1 + TheEnd - TheStart) do
  begin
    // In pre-Unicode Delphi (Char = AnsiChar) this gives the compile error: Incompatible types 'Char' and 'WideChar'
    MyFileField[TheStart + MyIndex - 1] := Copy(TheLiteral, MyIndex, 1)[1];
  end;
end;
The problem is that the WideString must be saved into an array of Char inside a file. So the types must be mixed, and some Unicode characters will be lost; there is no way to avoid it.
The goal: get the compiler to accept it.
Solution 1: Convert the WideString to a String before calling Copy, or inside the Copy call.
Solution 2: Convert the WideChar to a Char before assigning it.
Here are both solutions (remember that some Unicode characters could get lost):
Solution 1:
MyFileField[TheStart+MyIndex-1]:=Copy(UTF8Encode(TheLiteral),MyIndex,1)[1]; // Note: Unicode chars will not get lost, but each non-ASCII char becomes two or more bytes, so beware of accented vowels, etc.
or
MyFileField[TheStart+MyIndex-1]:=Copy(String(TheLiteral),MyIndex,1)[1]; // Note: chars the ANSI code page cannot represent will get lost; they will be converted to '?'
Solution 2:
MyFileField[TheStart+MyIndex-1]:=Char(Copy(TheLiteral,MyIndex,1)[1]); // Note: Unicode chars that do not fit in a Char will get lost
If anyone knows anything better, I would be glad to know.
I personally use Copy(String(Literal),Start,NumberOfChars), since common accented letters are preserved (on a Western code page) and, more importantly, so is the length.
Example: Length(String('BlaBlaBlá')) -> 9
Example: Length(UTF8Encode('BlaBlaBlá')) -> 10, since the final 'á' is encoded as two bytes in UTF-8.
Hope this can help someone!
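A quick way to check the two Length results above, sketched as a console program. It assumes any Delphi version that has UTF8Encode (Delphi 6 or later) and builds 'á' from its code point U+00E1, so the outcome does not depend on the encoding the source file is saved in.

```pascal
program Utf8LengthDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  W: WideString;
  U8: UTF8String;
begin
  // 'BlaBlaBlá' with the accented letter given as an explicit code point
  W := 'BlaBlaBl' + WideChar($00E1);
  U8 := UTF8Encode(W);
  Writeln(Length(W));   // 9  (UTF-16 code units)
  Writeln(Length(U8));  // 10 (bytes: U+00E1 is encoded as $C3 $A1)
end.
```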

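I wrote my own function. It may be useful, since it gives correct results on the Linux platform, where Copy(string(ansistr), i, l) does not:
function AnsiCopy(const s: AnsiString; StartIndex, Count: Integer): AnsiString;
begin
  if StartIndex < 1 then StartIndex := 1;
  // Clamp Count to the bytes that actually exist from StartIndex onwards
  if Count > Length(s) - StartIndex + 1 then Count := Length(s) - StartIndex + 1;
  if Count < 0 then Count := 0;
  SetLength(Result, Count);
  if Count > 0 then Move(s[StartIndex], Result[1], Count); // raw byte copy, no code page conversion
end;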

Related

Insert an emoji inside a string in Delphi 2007 [duplicate]

This question already has answers here:
Handling a Unicode String in Delphi Versions <= 2007
I'm trying to do exactly what the title says: insert an emoji into a string in Delphi 2007, as in the example below:
procedure TForm1.Button1Click(Sender: TObject);
var s : string;
begin
s := 'This is my original string (y)';
s := ansireplacestr(s,'(y)','👍');
showmessage(s);
end;
I can even paste the emoji into the code in the IDE, but at runtime ShowMessage shows this:
This is my original string ????
Is there a way to achieve this in Delphi 2007? For several reasons I can't upgrade Delphi right now.
Someone said my question is answered in this topic:
Handling a Unicode String in Delphi Versions <= 2007
But that topic just says to use third-party components, without explaining exactly how to do it.
EDIT: As suggested, I tried to use the functions Pos, Delete and Insert with a WideString variable:
function addEmoji(mystring : widestring) : widestring;
var r, aux : widestring;
p : integer;
begin
r := mystring;
while pos('(y)',r) > 0 do
begin
aux := r;
p := pos('(y)',aux);
Insert('👍',aux,p);
delete(aux,pos('(y)',aux),3);
r := aux;
end;
result := r;
end;
But the result is the '(y)' replaced by '????'.
In Delphi 2007, the default string type is AnsiString. Emojis require Unicode handling, as they use high Unicode codepoints that simply do not fit/exist in most commonly used Ansi encodings. So you need to use a Unicode UTF encoding instead (UTF-7, -8, -16, or -32).
You can use AnsiString for UTF-7 [1], or UTF8String [2] for UTF-8, or WideString for UTF-16, or UCS4String [3] for UTF-32.
1: UTF-7 is a 7-bit ASCII compatible encoding.
2: UTF8String does exist in Delphi 2007 (it was introduced in Delphi 6), but it is not a true UTF-8 string type, it is just an alias for AnsiString with the expectation that it always holds UTF-8 encoded data. You have to use UTF8Encode() and UTF8Decode() to ensure proper conversions to other encodings via UTF-16. UTF8String did not become a true UTF-8 string type until Delphi 2009 (UTF8Encode() and UTF8Decode() were also deprecated).
3: UCS4String also exists since Delphi 6, but it is not a true string type at all (even in modern Delphi versions). It is just an alias for array of UCS4Char.
The RTL doesn't have any native support for UTF-7 (but it is not hard to implement manually), and very little support for UTF-32 (only to facilitate conversions between UTF-16 <-> UTF-32), so you should stick with UTF-8 or UTF-16 in your code.
You are going to lose Emoji data if you convert UTF data to Ansi, such as if you pass a WideString to ShowMessage(). You can pass a WideString to the Win32 API MessageBoxW() function instead, and you won't have any data loss, however the Emoji may or may not appear correctly depending on the font used by the dialog (but it won't appear as ??, at least).
However, the native RTL in Delphi 2007 simply does not support what you are attempting, at least not for UTF-16. You would have to find a 3rd party WideString-based function, or just write your own using the RTL's Pos(), Delete(), and Insert() intrinsic functions, which are overloaded for WideString data, eg:
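function WideReplaceStr(const S, FromText, ToText: WideString): WideString;
var
  I, Start: Integer;
begin
  Result := S;
  Start := 1;
  repeat
    // Search only after the last replacement, so a ToText that
    // contains FromText cannot cause an endless loop
    I := Pos(FromText, Copy(Result, Start, MaxInt));
    if I = 0 then Break;
    I := Start + I - 1; // position of the match within Result
    Delete(Result, I, Length(FromText));
    Insert(ToText, Result, I);
    Start := I + Length(ToText);
  until False;
end;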
var
s : WideString;
begin
s := 'This is my original string (y)';
s := WideReplaceStr(s, '(y)', '👍');
MessageBoxW(0, PWideChar(s), '', MB_OK);
end;
However, using UTF-8, you can accomplish the same thing using the native RTL, but you still can't use ShowMessage() (well, you could, but it won't show non-ASCII characters correctly):
var
s : UTF8String;
begin
s := UTF8Encode('This is my original string (y)');
s := AnsiReplaceStr(s, '(y)', UTF8Encode('👍'));
MessageBoxW(0, PWideChar(UTF8Decode(s)), '', MB_OK);
end;
Either way, make sure your code editor is set to save the .pas file as UTF-8; otherwise you can't use the literal '👍', and you would have to use something like this instead:
var
  Emoji: WideString;
...
  // U+1F44D (thumbs up) as a UTF-16 surrogate pair
  SetLength(Emoji, 2);
  Emoji[1] := WideChar($D83D); // high surrogate
  Emoji[2] := WideChar($DC4D); // low surrogate
Then you can do this:
var s: WideString;
...
s := WideReplaceStr(s, '(y)', Emoji);
Or:
var s: UTF8String;
...
s := AnsiReplaceStr(s, '(y)', UTF8Encode(Emoji));
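Where the two surrogate values come from can be sketched with a small helper. CodePointToUtf16 is a hypothetical name, not an RTL function; it assumes a valid code point up to $10FFFF that is not itself in the surrogate range, and it only uses WideString, so it should also compile in Delphi 2007.

```pascal
program SurrogatePairDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not part of the RTL): returns the UTF-16 form of a
// code point. Assumes CodePoint <= $10FFFF and not in $D800..$DFFF.
function CodePointToUtf16(CodePoint: Cardinal): WideString;
var
  V: Cardinal;
begin
  if CodePoint < $10000 then
  begin
    SetLength(Result, 1);
    Result[1] := WideChar(CodePoint);            // BMP: one code unit
  end
  else
  begin
    V := CodePoint - $10000;                     // 20-bit value
    SetLength(Result, 2);
    Result[1] := WideChar($D800 + (V shr 10));   // high surrogate: top 10 bits
    Result[2] := WideChar($DC00 + (V and $3FF)); // low surrogate: bottom 10 bits
  end;
end;

var
  Emoji: WideString;
begin
  Emoji := CodePointToUtf16($1F44D);  // THUMBS UP SIGN
  Writeln(IntToHex(Ord(Emoji[1]), 4), ' ', IntToHex(Ord(Emoji[2]), 4)); // D83D DC4D
end.
```

The resulting WideString can be passed as ToText to WideReplaceStr exactly like the hand-built Emoji variable above.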

delphi modify pchar at runtime

I need to modify pchar string at runtime.
Help me with this code:
var
s:pChar;
begin
s:='123123';
s[0]:=#32; // SO HERE I HAVE EXCEPTION !!!
end.
Now I get an exception in Delphi 7!
My project does not use native Pascal strings (no Windows.pas, Classes or other units).
String literals are read only and cannot be modified. Hence the runtime error. You'll need to use a variable.
var
S: array[0..6] of Char;
....
// Populate S with your own library function
S[0] := #32;
Since you aren't using the Delphi runtime library you'll need to come up with your own functions to populate character arrays. For instance, you can write your own StrLen, StrCopy etc. You'll want to make versions that are passed destination buffer lengths to ensure that you don't overrun said buffers.
Of course, not using the built in string type will be inconvenient. You might need to come up with something a little more powerful than ad hoc character arrays.
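As a sketch of the bounded copy routine suggested above: MyStrLCopy is a hypothetical name (the RTL has its own StrLCopy in SysUtils, which the question cannot use). It assumes {$X+} extended syntax (the default), so a zero-based array of Char can be passed where a PChar is expected, and it relies only on the System unit.

```pascal
program BoundedCopyDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper: copies at most MaxLen characters from Source into
// Dest and always writes a #0 terminator, so Dest must hold MaxLen + 1 chars.
function MyStrLCopy(Dest, Source: PChar; MaxLen: Cardinal): PChar;
var
  D: PChar;
begin
  Result := Dest;
  D := Dest;
  while (MaxLen > 0) and (Source^ <> #0) do
  begin
    D^ := Source^;
    Inc(D);
    Inc(Source);
    Dec(MaxLen);
  end;
  D^ := #0;
end;

var
  Buf: array[0..6] of Char; // room for 6 characters plus the terminator
  I: Integer;
begin
  MyStrLCopy(Buf, '123123', High(Buf)); // High(Buf) = 6
  Buf[0] := #32;                        // fine: Buf is writable, unlike a literal
  I := 0;
  while Buf[I] <> #0 do                 // print without using string types
  begin
    Write(Buf[I]);
    Inc(I);
  end;
  Writeln;                              // prints " 23123"
end.
```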
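You could write your own StrCopy:
procedure StrCopy(Destination, Source: PChar);
begin
  // Copy every character of Source, then the #0 terminator.
  // The caller must allocate enough memory for Destination
  // (length of the source string plus one for the null terminator)
  // before calling StrCopy().
  while Source^ <> #0 do
  begin
    Destination^ := Source^;
    Inc(Destination);
    Inc(Source);
  end;
  Destination^ := #0;
end;
var
  str: array[0..9] of Char;
begin
  StrCopy(str, '123123');
  str[0] := #32; // fine: str is a writable buffer, not a literal
end.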

Cast from RawByteString to string does automatically invoke UTF8Decode?

I want to store arbitrary binary data as a BLOB in an SQLite database.
The data will be added as a value with this function:
procedure TSQLiteDatabase.AddParamText(name: string; value: string);
Now I want to convert a WideString into its UTF-8 representation, so it can be stored in the database. After calling UTF8Encode and storing the result in the database, I noticed that the data inside the database is not UTF-8 encoded. Instead, it is encoded as AnsiString in my computer's locale.
I ran the following test to check what happened:
type
{$IFDEF Unicode}
TBinary = RawByteString;
{$ELSE}
TBinary = AnsiString;
{$ENDIF}
procedure TForm1.Button1Click(Sender: TObject);
var
original: WideString;
blob: TBinary;
begin
original := 'ä';
blob := UTF8Encode(original);
// Delphi 6: Ã¤ (as expected)
// Delphi XE4: ä (unexpected! How did it do an automatic UTF8Decode???)
ShowMessage(blob);
end;
After the character "ä" has been converted to UTF-8, the data is correct in memory ("Ã¤"); however, as soon as I pass the TBinary value to a function (as string or AnsiString), Delphi XE4 does a "magic typecast" that invokes UTF8Decode, for some reason I don't understand.
I have already found a workaround to avoid this:
function RealUTF8Encode(AInput: WideString): TBinary;
var
tmp: TBinary;
begin
tmp := UTF8Encode(AInput);
SetLength(result, Length(tmp));
CopyMemory(@result[1], @tmp[1], Length(tmp));
end;
procedure TForm1.Button2Click(Sender: TObject);
var
original: WideString;
blob: TBinary;
begin
original := 'ä';
blob := RealUTF8Encode(original);
// Delphi 6: Ã¤ (as expected)
// Delphi XE4: Ã¤ (as expected)
ShowMessage(blob);
end;
However, this workaround with RealUTF8Encode looks dirty to me and I would like to understand why a simple call of UTF8Encode did not work and if there is a better solution.
In Ansi versions of Delphi (prior to D2009), UTF8Encode() returns a UTF-8 encoded AnsiString. In Unicode versions (D2009 and later), it returns a UTF-8 encoded RawByteString with a code page of CP_UTF8 (65001) assigned to it.
In Ansi versions, ShowMessage() takes an AnsiString as input, and the UTF-8 string is an AnsiString, so it gets displayed as-is. In Unicode versions, ShowMessage() takes a UTF-16 encoded UnicodeString as input, so the UTF-8 encoded RawByteString gets converted to UTF-16 using its assigned CP_UTF8 code page.
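The code page tag is what drives that conversion. A sketch for Delphi 2009 or later, using System's StringCodePage and SetCodePage: it decodes the same two UTF-8 bytes of 'ä' once with their CP_UTF8 tag and once after re-tagging them as Windows-1252, which is roughly how Delphi 6 on a Western system displayed them (as "Ã¤"). The program and variable names are illustrative.

```pascal
program CodePageTagDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  W, Decoded: string;
  Bytes: RawByteString;
begin
  W := #$00E4;                      // 'ä', independent of the source file encoding
  Bytes := UTF8Encode(W);           // bytes $C3 $A4, tagged CP_UTF8
  Writeln(StringCodePage(Bytes));   // 65001

  // Converting to UnicodeString honours the tag: one character, U+00E4
  Decoded := string(Bytes);
  Writeln(Length(Decoded), ' ', IntToHex(Ord(Decoded[1]), 4));   // 1 00E4

  // Re-tag the same bytes as Windows-1252 without converting them
  SetCodePage(Bytes, 1252, False);
  Decoded := string(Bytes);         // now two characters: U+00C3 U+00A4
  Writeln(Length(Decoded), ' ', IntToHex(Ord(Decoded[1]), 4), ' ',
    IntToHex(Ord(Decoded[2]), 4));  // 2 00C3 00A4
end.
```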
If you actually wrote the blob data directly to the database you would find that it may or may not be UTF-8 encoded, depending on how you are writing it. But your approach is wrong; the use of RawByteString is incorrect in this situation. RawByteString is meant to be used as a procedure parameter only. Do not use it as a local variable. That is the source of your problem. From the documentation:
The purpose of RawByteString is to reduce the need for multiple
overloads of procedures that read string data. This means that
parameters of routines that process strings without regard for the
string's code page should typically be of type RawByteString.
RawByteString should only be used as a parameter type, and only in
routines which otherwise would need multiple overloads for AnsiStrings
with different codepages. Such routines need to be written with care
for the actual codepage of the string at run time.
For Unicode versions of Delphi, instead of RawByteString, I would suggest that you use TBytes to hold your UTF-8 data, and encode it with TEncoding:
var
utf8: TBytes;
str: string;
...
str := ...;
utf8 := TEncoding.UTF8.GetBytes(str);
You are looking for a data type that does not perform implicit text encodings when passed around, and TBytes is that type.
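A round trip with TBytes, sketched for Delphi 2009 or later: the bytes carry no code page, so nothing is decoded until TEncoding.UTF8.GetString is called explicitly. The program name and variables are illustrative.

```pascal
program TBytesRoundTrip;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  Original, RoundTrip: string;
  Utf8: TBytes;
begin
  Original := #$00E4;                         // 'ä' as a code point
  Utf8 := TEncoding.UTF8.GetBytes(Original);  // plain bytes, no code page attached
  Writeln(Length(Utf8), ' ', IntToHex(Utf8[0], 2), ' ', IntToHex(Utf8[1], 2)); // 2 C3 A4
  // Decoding happens only here, because it is asked for
  RoundTrip := TEncoding.UTF8.GetString(Utf8);
  Writeln(RoundTrip = Original);              // TRUE
end.
```

Because the data stays in a TBytes until it is written, no implicit UTF8Decode like the one in the question can happen along the way.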
For Ansi versions of Delphi, you can use AnsiString, WideString and UTF8Encode exactly as you do.
Personally however, I would recommend using TBytes consistently for your UTF-8 data. So if you need a single code base that supports Ansi and Unicode compilers (ugh!) then you should create some helpers:
{$IFDEF Unicode}
function GetUTF8Bytes(const Value: string): TBytes;
begin
Result := TEncoding.UTF8.GetBytes(Value);
end;
{$ELSE}
function GetUTF8Bytes(const Value: WideString): TBytes;
var
utf8str: UTF8String;
begin
utf8str := UTF8Encode(Value);
SetLength(Result, Length(utf8str));
Move(Pointer(utf8str)^, Pointer(Result)^, Length(utf8str));
end;
{$ENDIF}
The Ansi version incurs more heap allocations than are necessary. You might well choose to write a more efficient helper that calls WideCharToMultiByte() directly.
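One possible shape for such a helper, sketched under assumptions: GetUTF8BytesDirect and TUtf8Bytes are hypothetical names (TUtf8Bytes is declared locally rather than relying on TBytes being available in every version). It calls the Win32 WideCharToMultiByte twice, first to get the required size and then to encode straight into the array; for CP_UTF8 the last two parameters must be nil. It should work in both Ansi and Unicode Delphi on Windows, but treat it as a starting point.

```pascal
program Utf8BytesDirect;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

type
  TUtf8Bytes = array of Byte; // hypothetical local stand-in for TBytes

// Hypothetical helper: UTF-16 -> UTF-8 with a single allocation for the result
function GetUTF8BytesDirect(const Value: WideString): TUtf8Bytes;
var
  Len: Integer;
begin
  Result := nil;
  if Value = '' then
    Exit;
  // First call: how many bytes does the UTF-8 form need?
  Len := WideCharToMultiByte(CP_UTF8, 0, PWideChar(Value), Length(Value),
    nil, 0, nil, nil);
  if Len <= 0 then
    RaiseLastOSError;
  SetLength(Result, Len);
  // Second call: encode directly into the byte array
  WideCharToMultiByte(CP_UTF8, 0, PWideChar(Value), Length(Value),
    PAnsiChar(@Result[0]), Len, nil, nil);
end;

var
  W: WideString;
  Bytes: TUtf8Bytes;
begin
  W := WideChar($00E4); // 'ä'
  Bytes := GetUTF8BytesDirect(W);
  Writeln(Length(Bytes), ' ', IntToHex(Bytes[0], 2), ' ', IntToHex(Bytes[1], 2)); // 2 C3 A4
end.
```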
In Unicode versions of Delphi, if for some reason you don't want to use TBytes for UTF-8 data, you can use UTF8String instead. This is a special AnsiString that always uses the CP_UTF8 code page. You can then write:
var
utf8: UTF8String;
str: string;
....
utf8 := str;
and the compiler will convert from UTF-16 to UTF-8 behind the scenes for you. I would not recommend this though, because it is not supported on mobile platforms, or in Ansi versions of Delphi (UTF8String has existed since Delphi 6, but it was not a true UTF-8 string until Delphi 2009). That is, amongst other reasons, why I suggest that you use TBytes. My philosophy is, at least in the Unicode age, that there is the native string type, and any other encoding should be held in TBytes.

Why can't we do PChar('*') in Delphi 7?

I wrote this code and I am getting an AV (access violation) in it.
procedure TForm1.Button1Click(Sender: TObject);
Var
C : Pchar;
s : string;
begin
c:= PChar('*');
s := string(c); // AV here , but code works if i put C:= PChar('**')
ShowMessage(c);
end;
I could not figure out why. Does anybody know?
Thanks in advance.
With a one-character string literal, you're type casting a Char, not a string, so it's not a pointer. When you cast it back, it's still not really a pointer, despite its declared type, so it can't be converted to a string.
If you find yourself type casting a string literal, you're probably doing something unnecessary. Although you can give it a hint which type it should use, as the other answers here demonstrate, the compiler already detects which type a literal needs to have, and it's usually correct. Just assign the literal directly to the variable without any casting.
If you omit the type cast entirely, your code would work equally well for whatever length string you want:
// All PChar assignments, no casting
c := '';
c := '*';
c := '**';
Furthermore, the cast back to string is unnecessary as well. You can directly assign a PChar, and the compiler will perform the conversion automatically:
s := c;
In fact,
c:= PChar('*');
is compiled as
mov [c],$0000002a
as if it was written:
c:= PChar(ord('*'));
Since ord('*')=$2a, it appears that the '*' character is type-cast to an integer (NativeInt), then this integer is converted to a pointer. So when you try to access the c content, you access the memory address $0000002a, which is invalid, and triggers an access violation.
When you compile:
c:= PChar('**');
It is generated as
mov eax,$00548984
mov [c],eax
In this case, a constant #0-ended text buffer (and not a Delphi string) is generated by the compiler within the executable, and c is set to its address.
The fact that PChar('*') does not behave the same way is due to an "optimization" of the Char type, which can be typecast to an integer.
But I understand it may be confusing.
If you want just a pointer to a single '*', you can write either:
c:=PChar('*'#0);
c:=PChar(string('*'));
Which will work as expected, since both will by-pass the cast to the character ordinal value.
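The behaviour described in these answers can be checked with a short console program (a sketch; the names are illustrative). It prints the value PChar('*') would hold instead of dereferencing it, then shows two forms that do produce a real #0-terminated buffer.

```pascal
program PCharLiteralDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  c: PChar;
  s: string;
begin
  // The "pointer" PChar('*') would hold: the ordinal of '*', not an address
  Writeln(IntToHex(Ord('*'), 8));   // 0000002A

  // No cast: the compiler emits a #0-terminated literal and c points at it
  c := '*';
  s := c;                           // implicit PChar -> string conversion
  Writeln(s, ' ', Length(s));       // * 1

  // Explicit terminator: a two-character literal is never treated as a Char
  c := PChar('*'#0);
  Writeln(StrLen(c));               // 1
end.
```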
An AV (access violation) means invalid memory access: reading data from, or writing data to, an address that is not valid.
The problem comes from the different types of the two literals:
'*' is a Char, but '**' is a string.
This will work fine with your code:
procedure TForm1.Button1Click(Sender: TObject);
Var
C : Pchar;
s : string;
begin
c:= PChar(string('*'));
s := string(c); // no AV now: c points to a real #0-terminated string
ShowMessage(c);
end;

Delphi Unicode and Console

I am writing a C# application that interfaces with a pair of hardware sensors. Unfortunately the only interface that is exposed on the devices requires a provided dll written in Delphi.
I am writing a Delphi executable wrapper that calls the necessary functions in the DLL and returns the sensor data over stdout. However, the return type of this data is a PWideChar (or PChar) and I have been unable to convert it to ANSI for printing on the command line.
If I directly pass the data to WriteLn, I get '?' for each character. If I loop through the array of characters and attempt to print them one at a time with an ANSI conversion, only a few of the characters print (they do confirm the data, though) and they often print out of order (printing with the index exposed simply jumps around). I also tried converting the PWideChar's characters directly to integers: 'I' corresponds to 21321. I could potentially figure out all the conversions, but some of the data has a multitude of values.
I am unsure which version of Delphi the DLL uses, but I believe it is 4. Definitely prior to 7.
Any help is appreciated!
TLDR: Need to convert UTF-16 PWideChar to AnsiString for printing.
Example application:
program SensorReadout;
{$APPTYPE CONSOLE}
{$R *.res}
uses
Windows,
SysUtils,
dllFuncUnit in 'dllFuncUnit.pas'; //This is my dll interface.
var state: integer;
answer: PChar;
I: integer;
J: integer;
output: string;
ch: char;
begin
try
for I := 0 to 9 do
begin
answer:= GetDeviceChannelInfo_HSI(1, Ord('a'), I, state); //DLL Function, returns a PChar with the results. See example in comments.
if state = HSI_NO_ERRORCODE then
begin
output:= '';
for J := 0 to length(answer) do
begin
ch:= answer[J]; //This copies the char. Originally had an AnsiChar convert here.
output:= output + ch;
end;
WriteLn(output);
end;
except
on E: Exception do
Writeln(E.ClassName, ': ', E.Message);
end;
ReadLn(I);
end.
The issue was that the function imported from the DLL needed to return PAnsiChar.
To convert PWideChar to AnsiString:
function WideCharToAnsiString(P: PWideChar): AnsiString;
begin
Result := P;
end;
The code converts from UTF-16, null-terminated PWideChar to AnsiString. If you are getting question marks in the output then either your input is not UTF-16, or it contains characters that cannot be encoded in your ANSI codepage.
My guess is that what is actually happening is that your Delphi DLL was created with a pre-Unicode Delphi and so uses ANSI text. But now you are trying to link to it from a post-Unicode Delphi where PChar has a different meaning. I'm sure Rob explained this to you in your other question. So you can simply fix it by declaring your DLL import to return PAnsiChar rather than PChar. Like this:
function GetDeviceChannelInfo_HSI(PortNumber, Address, ChNumber: Integer;
var State: Integer): PAnsiChar; stdcall; external DLL_FILENAME;
And when you have done this you can assign to a string variable in a similar vein as I describe above.
What you need to absorb is that in older versions of Delphi, PChar is an alias for PAnsiChar. In modern Delphi it is an alias for PWideChar. That mismatch would explain everything that you report.
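The 21321 reported in the question fits this explanation: 21321 is $5349, which is exactly the two ANSI bytes 'I' ($49) and 'S' ($53) read together as one little-endian UTF-16 code unit. A sketch reproducing that, assuming Delphi 2009 or later; the text 'IS' is an assumed stand-in, since the real DLL output is not shown.

```pascal
program AnsiAsWideDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  AnsiText: AnsiString;
  AsWide: PWideChar;
begin
  // ANSI text as a pre-Unicode DLL would return it: 1 byte per character
  AnsiText := 'IS';
  // Misreading the same memory as PWideChar: 2 bytes per character
  AsWide := PWideChar(Pointer(PAnsiChar(AnsiText)));
  // Little-endian: low byte 'I' = $49, high byte 'S' = $53 -> $5349
  Writeln(Ord(AsWide[0]));               // 21321
  Writeln(IntToHex(Ord(AsWide[0]), 4));  // 5349
  // Read through PAnsiChar, the same pointer gives the expected text
  Writeln(string(PAnsiChar(AnsiText)));  // IS
end.
```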
It does occur to me that writing a Delphi wrapper to the DLL and communicating via stdout with your C# app is a very roundabout approach. I'd just p/invoke the DLL directly from the C# code. You seem to think that this is not possible, but it is quite simple.
[DllImport(@"mydll.dll")]
static extern IntPtr GetDeviceChannelInfo_HSI(
int PortNumber,
int Address,
int ChNumber,
ref int State
);
Call the function like this:
IntPtr ptr = GetDeviceChannelInfo_HSI(Port, Addr, Channel, ref State);
If the function returns a UTF-16 string (which seems doubtful) then you can convert the IntPtr like this:
string str = Marshal.PtrToStringUni(ptr);
Or, if it is actually an ANSI string (which seems quite likely to me), then you do it like this:
string str = Marshal.PtrToStringAnsi(ptr);
And then of course you'll want to call into your DLL to deallocate the string pointer that was returned to you, assuming it was allocated on the heap.
Changed my mind on the comment - I'll make it an answer:)
According to that code if "state" is a code <> HSI_NO_ERRORCODE and there is no exception then it will write the uninitialised string "output" to the console. Which could be anything including accidentally showing "S" and "4" and a series of 1 or more question marks
The type of the answer variable is PChar, but the Length function is meant for string variables.
Use StrLen instead of Length:
for J := 0 to StrLen(answer)-1 do
Also, the accessible index range of a PChar (char *) holding n characters is 0..n-1.
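A sketch of the corrected loop; the 'S4' value is a made-up stand-in for the DLL result. The original bound of 0 to length(answer) runs one step past the last character and also copies the #0 terminator. StrLen returns a Cardinal, so it is cast to Integer before subtracting 1 to avoid unsigned arithmetic when the string is empty.

```pascal
program StrLenLoopDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  answer: PChar;
  output: string;
  J: Integer;
begin
  answer := 'S4';  // hypothetical stand-in for the DLL result
  output := '';
  // Indexes 0..StrLen-1 are the characters; index StrLen holds the #0 terminator
  for J := 0 to Integer(StrLen(answer)) - 1 do
    output := output + answer[J];
  Writeln(output, ' ', Length(output));  // S4 2
end.
```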
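To convert a UTF-16 PWideChar to AnsiString you can use a simple cast:
var
  WStr: WideString;
  pWStr: PWideChar;
  AStr: AnsiString;
begin
  WStr := 'test';
  pWStr := PWideChar(WStr);
  // WideString(pWStr) copies the #0-terminated UTF-16 text; the AnsiString cast
  // then converts it to the ANSI code page (unmappable characters become '?')
  AStr := AnsiString(WideString(pWStr));
end;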
