Cast from RawByteString to string does automatically invoke UTF8Decode? - delphi

I want to store arbitary binary data as BLOB into a SQlite database.
The data will be added as value with this function:
procedure TSQLiteDatabase.AddParamText(name: string; value: string);
Now I want to convert a WideString into its UTF8 representation, so it can be stored to the database. After calling UTF8Encode and storing the result into the database, I noticed that the data inside the database is not UTF8 decoded. Instead, it is encoded as AnsiString in my computer's locale.
I ran following test to check what happened:
type
{$IFDEF Unicode}
TBinary = RawByteString;
{$ELSE}
TBinary = AnsiString;
{$ENDIF}
procedure TForm1.Button1Click(Sender: TObject);
var
original: WideString;
blob: TBinary;
begin
original := 'ä';
blob := UTF8Encode(original);
// Delphi 6: ä (as expected)
// Delphi XE4: ä (unexpected! How did it do an automatic UTF8Decode???)
ShowMessage(blob);
end;
After the character "ä" has been converted to UTF8, the data is correct in memory ("ä"), however, as soon as I pass the TBinary value to a function (as string or AnsiString), Delphi XE4 does a "magic typecast" invoking UTF8Decode for some reason I don't know.
I have already found a workaround to avoid this:
function RealUTF8Encode(AInput: WideString): TBinary;
var
tmp: TBinary;
begin
tmp := UTF8Encode(AInput);
SetLength(result, Length(tmp));
CopyMemory(#result[1], #tmp[1], Length(tmp));
end;
procedure TForm1.Button2Click(Sender: TObject);
var
original: WideString;
blob: TBinary;
begin
original := 'ä';
blob := RealUTF8Encode(original);
// Delphi 6: ä (as expected)
// Delphi XE4: ä (as expected)
ShowMessage(blob);
end;
However, this workaround with RealUTF8Encode looks dirty to me and I would like to understand why a simple call of UTF8Encode did not work and if there is a better solution.

In Ansi versions of Delphi (prior to D2009), UTF8Encode() returns a UTF-8 encoded AnsiString. In Unicode versions (D2009 and later), it returns a UTF-8 encoded RawByteString with a code page of CP_UTF8 (65001) assigned to it.
In Ansi versions, ShowMessage() takes an AnsiString as input, and the UTF-8 string is an AnsiString, so it gets displayed as-is. In Unicode versions, ShowMessage() takes a UTF-16 encoded UnicodeString as input, so the UTF-8 encoded RawByteString gets converted to UTF-16 using its assigned CP-UTF8 code page.
If you actually wrote the blob data directly to the database you would find that it may or may not be UTF-8 encoded, depending on how you are writing it. But your approach is wrong; the use of RawByteString is incorrect in this situation. RawByteString is meant to be used as a procedure parameter only. Do not use it as a local variable. That is the source of your problem. From the documentation:
The purpose of RawByteString is to reduce the need for multiple
overloads of procedures that read string data. This means that
parameters of routines that process strings without regard for the
string's code page should typically be of type RawByteString.
RawByteString should only be used as a parameter type, and only in
routines which otherwise would need multiple overloads for AnsiStrings
with different codepages. Such routines need to be written with care
for the actual codepage of the string at run time.
For Unicode versions of Delphi, instead of RawByteString, I would suggest that you use TBytes to hold your UTF-8 data, and encode it with TEncoding:
var
utf8: TBytes;
str: string;
...
str := ...;
utf8 := TEncoding.UTF8.GetBytes(str);
You are looking for a data type that does not perform implicit text encodings when passed around, and TBytes is that type.
For Ansi versions of Delphi, you can use AnsiString, WideString and UTF8Encode exactly as you do.
Personally however, I would recommend using TBytes consistently for your UTF-8 data. So if you need a single code base that supports Ansi and Unicode compilers (ugh!) then you should create some helpers:
{$IFDEF Unicode}
function GetUTF8Bytes(const Value: string): TBytes;
begin
Result := TEncoding.UTF8.GetBytes(Value);
end;
{$ELSE}
function GetUTF8Bytes(const Value: WideString): TBytes;
var
utf8str: UTF8String;
begin
utf8str := UTF8Encode(Value);
SetLength(Result, Length(utf8str));
Move(Pointer(utf8str)^, Pointer(Result)^, Length(utf8str));
end;
{$ENDIF}
The Ansi version incurs more heap allocations than are necessary. You might well choose to write a more efficient helper that calls WideCharToMultiByte() directly.
In Unicode versions of Delphi, if for some reason you don't want to use TBytes for UTF-8 data, you can use UTF8String instead. This is a special AnsiString that always uses the CP_UTF8 code page. You can then write:
var
utf8: UTF8String;
str: string;
....
utf8 := str;
and the compiler will convert from UTF-16 to UTF-8 behind the scenes for you. I would not recommend this though, because it is not supported on mobile platforms, or in Ansi versions of Delphi (UTF8String has existed since Delphi 6, but it was not a true UTF-8 string until Delphi 2009). That is, amongst other reasons, why I suggest that you use TBytes. My philosophy is, at least in the Unicode age, that there is the native string type, and any other encoding should be held in TBytes.

Related

How to convert ANSI string filename to windows filename [duplicate]

This line:
TFileStream.Create(fileName, fmOpenRead or fmShareDenyNone);
drops an exception if the filename contain something like ñ
You are, ultimately calling CreateFileA, the ANSI API, and the characters you use have no ANSI encoding. The only way to get beyond this is to open the file with CreateFileW, the Unicode API.
You might not realise that you call CreateFileA, but that's how the Delphi 7 file stream is implemented.
One easy way to solve your problems is to upgrade to the latest Delphi which has good support for the native Windows Unicode API.
If you are stuck with ANSI Delphi then you still need to call CreateFileW. You can do this to create a file handle. You'll need to pass a UTF-16 string to that API. Use WideString to store it. You'll also need to get the filename from the user in UTF-16 form. Which means a call to GetOpenFileNameW or IFileDialog. Create a stream by passing the file handle to THandleStream.
To make all this possible you would use the TNT Unicode libraries. They work well but will impose a big port on you.
Frankly, the right way forward is to use modern tools that support Unicode.
You can use the TntUnicode units to have UTF8 support under Delphi 7.
Add TntClasses to your Uses and make the call like this:
TTntFileStream.Create(fileName, fmOpenRead or fmShareDenyNone);
Make sure that fileName is widestring.
Here you can get a copy of TntUnicode:
https://github.com/rofl0r/TntUnicode
UTF16 can be thought of as a codepage, just like all of the possible ANSI codepages.
As Remy mentions in his comment, assuming your ANSI codepage supports the required characters in your Unicode string you simply have to convert that Unicode version of that string to the equivalent ANSI codepage version.
The Delphi compiler can take care of a simple conversion for you automatically, which you use simply by casting a WIDEString (UTF16) to an (ANSI)String:
const
WIDE_FILENAME : WIDEString = 'fuññy.txt';
var
sFilename: String;
strm: TFileStream;
begin
sFilename := String(WIDE_FILENAME);
strm := TFileStream.Create(sFilename, fmOpenRead);
// etc
end;
This works perfectly well even on (e.g.) Delphi 7. The only caveat is that the codepage involved (the system default) must support the extended characters in the Unicode string.
NOTE: The above code uses the String type rather than ANSIString explicitly. On Delphi versions where String is ANSIString, this has the required effect but also is portable to versions where String is UnicodeString (should you upgrade your version later).
If you use ANSIString explicitly in this case, the result will be a double conversion if/when you upgrade:
// Unicode compiler using ANSIString type....
var
sFilename: ANSIString;
begin
sFilename := ANSIString(WIDE_FILENAME); // Codepage conversion from UTF16 to ANSI
strm := TFileStream.Create(sFilename, fmOpenRead); // Will implicitly convert *back* from ANSI to WIDE
versus
// Unicode compiler using String type....
var
sFilename: String;
begin
sFilename := String(WIDE_FILENAME); // String type conversion from WideString to UnicodeString
strm := TFileStream.Create(sFilename, fmOpenRead); // No further conversion necessary
Best solution is to go Unicode, but if that is not an option, you can still solve the problem.
In Windows you can set what codepage to use for non-Unicode programs. Just change it to support the correct language (Spanish?). Then the code should work.
Windows 7: Control Panel > Region and Language > Administrative > Language for non-Unicode programs
Windows XP: Control Panel > Regional and Language > Advanced > Language for non-Unicode programs

How to encode binary data with Indy Mime?

I have an Ansi string that I use to store binary data - bytes in 0-255 range (I know it should be a Byte array or so, but it is not much difference between them).
I want to pass this "binary string" through Indy MIME (TIdEncoderMIME.EncodeString / TIdDecoderMIME.DecodeString) and obtain a human-readable ANSI string.
I thought that the output of Encode/DecodeString will be a string that has only ANSI characters in it if I use IndyTextEncoding_8Bit encoding. But I was wrong!
so, how to encode binary data with Indy Mime (something similar to application/octet-stream)?
DONT use AnsiString for binary data!
AnsiString is not an appropriate container for binary data, especially in a Unicode environment like XE7. Use a proper byte container, like T(Id)Bytes or TMemoryStream instead.
You can't pass AnsiString as-is through the TId(Encoder|Decoder)MIME string methods, only UnicodeString, so implicit RTL Ansi<->Unicode conversions are likely to corrupt your binary data. Use the binary-oriented methods instead ((Encode|Decode)Bytes(), (Encode|Decode)Stream()). They exist for a reason.
That being said, Indy 10 does have a TIdMemoryBufferStream class (desktop platforms only), so if you MUST use AnsiString (and you really shouldn't), you can wrap it in a TStream interface without having to make additional copies of data in memory. For example:
var
Binary: AnsiString;
Strm: TIdMemoryBufferStream;
Base64: String;
begin
Binary := ...; // binary data
Strm := TIdMemoryBufferStream.Create(PAnsiChar(Binary), Length(Binary));
try
Base64 := TIdEncoderMIME.EncodeStream(Strm);
finally
Strm.Free;
end;
// use Base64 as needed...
end;
var
Base64: String;
Strm: TIdMemoryBufferStream;
Binary: AnsiString;
begin
Base64 := ...; // encoded data
SetLength(Binary, (Length(Base64) div 4) * 3);
Strm := TIdMemoryBufferStream.Create(PAnsiChar(Binary), Length(Binary));
try
TIdDecoderMIME.DecodeStream(Base64, Strm);
SetLength(Binary, Strm.Size);
SetCodePage(PRawByteString(#Binary)^, 28591, False);
finally
Strm.Free;
end;
// use Binary as needed...
end;

Insert an emoji inside a string in Delphi 2007 [duplicate]

This question already has answers here:
Handling a Unicode String in Delphi Versions <= 2007
(5 answers)
Closed 5 years ago.
I'm trying to do exactly what the title say, insert an emoji into a string in Delphi 2007, just like the example below :
procedure TForm1.Button1Click(Sender: TObject);
var s : string;
begin
s := 'This is my original string (y)';
s := ansireplacestr(s,'(y)','👍');
showmessage(s);
end;
I can even paste the emoji into IDE's code, but in runtime showmessage results in this :
This is my original string ????
Is there a way to achieve this task in Delphi 2007 ? Due to several reasons i can't upgrade Delphi right now.
Someone said my question is solved on this topic :
Handling a Unicode String in Delphi Versions <= 2007
But this topic just says to use third-party components, without telling exactly how to do it.
EDIT : After suggested, i tried to use the functions pos, delete and insert and a widestring var :
function addEmoji(mystring : widestring) : widestring;
var r, aux : widestring;
p : integer;
begin
r := mystring;
while pos('(y)',r) > 0 do
begin
aux := r;
p := pos('(y)',aux);
Insert('👍',aux,p);
delete(aux,pos('(y)',aux),3);
r := aux;
end;
result := r;
end;
But the result is the '(y)' replaced by '????'.
In Delphi 2007, the default string type is AnsiString. Emojis require Unicode handling, as they use high Unicode codepoints that simply do not fit/exist in most commonly used Ansi encodings. So you need to use a Unicode UTF encoding instead (UTF-7, -8, -16, or -32).
You can use AnsiString for UTF-71, or UTF8String2 for UTF-8, or WideString for UTF-16, or UCS4String3 for UTF-32.
1: UTF-7 is a 7-bit ASCII compatible encoding.
2: UTF8String does exist in Delphi 2007 (it was introduced in Delphi 6), but it is not a true UTF-8 string type, it is just an alias for AnsiString with the expectation that it always holds UTF-8 encoded data. You have to use UTF8Encode() and UTF8Decode() to ensure proper conversions to other encodings via UTF-16. UTF8String did not become a true UTF-8 string type until Delphi 2009 (UTF8Encode() and UTF8Decode() were also deprecated).
3: UCS4String also exists since Delphi 6, but it is not a true string type at all (even in modern Delphi versions). It is just an alias for array of UCS4Char.
The RTL doesn't have any native support for UTF-7 (but it is not hard to implement manually), and very little support for UTF-32 (only to facilitate conversions between UTF-16 <-> UTF-32), so you should stick with UTF-8 or UTF-16 in your code.
You are going to lose Emoji data if you convert UTF data to Ansi, such as if you pass a WideString to ShowMessage(). You can pass a WideString to the Win32 API MessageBoxW() function instead, and you won't have any data loss, however the Emoji may or may not appear correctly depending on the font used by the dialog (but it won't appear as ??, at least).
However, the native RTL in Delphi 2007 simply does not support what you are attempting, at least not for UTF-16. You would have to find a 3rd party WideString-based function, or just write your own using the RTL's Pos(), Delete(), and Insert() intrinsic functions, which are overloaded for WideString data, eg:
function WideReplaceStr(const S, FromText, ToText: WideString): WideString;
var
I: Integer;
begin
Result := S;
repeat
I := Pos(FromText, Result);
if I = 0 then Break;
Delete(Result, I, Length(FromText));
Insert(ToText, Result, I);
until False;
end;
var
s : WideString;
begin
s := 'This is my original string (y)';
s := WideReplaceStr(s, '(y)', '👍');
MessageBoxW(0, PWideChar(s), '', MB_OK);
end;
However, using UTF-8, you can accomplish the same thing using the native RTL, but you still can't use ShowMessage() (well, you could, but it won't show non-ASCII characters correctly):
var
s : UTF8String;
begin
s := UTF8Encode('This is my original string (y)');
s := AnsiReplaceStr(s, '(y)', UTF8Encode('👍'));
MessageBoxW(0, PWideChar(UTF8Decode(s)), '', MB_OK);
end;
Either way, make sure your code editor is set to save the .pas file in UTF-8, otherwise you can't use the literal '👍', you would have to use something more like this instead:
var
Emoji: WideString;
SetLength(Emoji, 2);
Emoji[1] := WideChar($D83D);
Emoji[2] := WideChar($DC4D);
Then you can do this:
var s: WideString;
...
s := WideReplaceStr(s, '(y)', Emoji);
Or:
var s: UTF8String;
...
s := AnsiReplaceStr(s, '(y)', UTF8Encode(Emoji));

Delphi Unicode and Console

I am writing a C# application that interfaces with a pair of hardware sensors. Unfortunately the only interface that is exposed on the devices requires a provided dll written in Delphi.
I am writing a Delphi executable wrapper that takes calls the necessary functions for the DLL and returns the sensor data over stout. However, the return type of this data is a PWideChar (or PChar) and I have been unable to convert it to ansi for printing on command line.
If I directly pass the data to WriteLn, I get '?' for each character. If I look through the array of characters and attempt to print them one at a time with an Ansi Conversion, only a few of the characters print (they do confirm the data though) and they will often print out of order. (printing with the index exposed simply jumps around.) I also tried converting the PWideChar's to integer straight: 'I' corresponds to 21321. I could potentially figure out all the conversions, but some of the data has a multitude of values.
I am unsure of what version of Delphi the dll uses, but I believe it is 4. Definately prior to 7.
Any help is appreciated!
TLDR: Need to convert UTF-16 PWideChar to AnsiString for printing.
Example application:
program SensorReadout;
{$APPTYPE CONSOLE}
{$R *.res}
uses
Windows,
SysUtils,
dllFuncUnit in 'dllFuncUnit.pas'; //This is my dll interface.
var state: integer;
answer: PChar;
I: integer;
J: integer;
output: string;
ch: char;
begin
try
for I := 0 to 9 do
begin
answer:= GetDeviceChannelInfo_HSI(1, Ord('a'), I, state); //DLL Function, returns a PChar with the results. See example in comments.
if state = HSI_NO_ERRORCODE then
begin
output:= '';
for J := 0 to length(answer) do
begin
ch:= answer[J]; //This copies the char. Originally had an AnsiChar convert here.
output:= output + ch;
end;
WriteLn(output);
end;
except
on E: Exception do
Writeln(E.ClassName, ': ', E.Message);
end;
ReadLn(I);
end.`
The issue was PAnsiChar needed to be the return type of the function sourced from the DLL.
To convert PWideChar to AnsiString:
function WideCharToAnsiString(P: PWideChar): AnsiString;
begin
Result := P;
end;
The code converts from UTF-16, null-terminated PWideChar to AnsiString. If you are getting question marks in the output then either your input is not UTF-16, or it contains characters that cannot be encoded in your ANSI codepage.
My guess is that what is actually happening is that your Delphi DLL was created with a pre-Unicode Delphi and so uses ANSI text. But now you are trying to link to it from a post-Unicode Delphi where PChar has a different meaning. I'm sure Rob explained this to you in your other question. So you can simply fix it by declaring your DLL import to return PAnsiChar rather than PChar. Like this:
function GetDeviceChannelInfo_HSI(PortNumber, Address, ChNumber: Integer;
var State: Integer): PAnsiChar; stdcall; external DLL_FILENAME;
And when you have done this you can assign to a string variable in a similar vein as I describe above.
What you need to absorb is that in older versions of Delphi, PChar is an alias for PAnsiChar. In modern Delphi it is an alias for PWideChar. That mismatch would explain everything that you report.
It does occur to me that writing a Delphi wrapper to the DLL and communicating via stdout with your C# app is a very roundabout approach. I'd just p/invoke the DLL directly from the C# code. You seem to think that this is not possible, but it is quite simple.
[DllImport(#"mydll.dll")]
static extern IntPtr GetDeviceChannelInfo_HSI(
int PortNumber,
int Address,
int ChNumber,
ref int State
);
Call the function like this:
IntPtr ptr = GetDeviceChannelInfo_HSI(Port, Addr, Channel, ref State);
If the function returns a UTF-16 string (which seems doubtful) then you can convert the IntPtr like this:
string str = Marshal.PtrToStringUni(ptr);
Or if it is actually an ANSI string which seems quite likely to me then you do it like this:
string str = Marshal.PtrToStringAnsi(ptr);
And then of course you'll want to call into your DLL to deallocate the string pointer that was returned to you, assuming it was allocated on the heap.
Changed my mind on the comment - I'll make it an answer:)
According to that code if "state" is a code <> HSI_NO_ERRORCODE and there is no exception then it will write the uninitialised string "output" to the console. Which could be anything including accidentally showing "S" and "4" and a series of 1 or more question marks
type of answer(variable) is PChar. use length function good for string variable.
use strlen instead of length.
for J := 0 to StrLen(answer)-1 do
also accessible range of PChar(char *) is 0..n-1
To convert UTF-16 PWideChar to AnsiString you can use simple cast:
var
WStr: WideString;
pWStr: PWideString;
AStr: AnsiString;
begin
WStr := 'test';
pWStr := PWideString(WStr);
AStr := AnsiString(WideString(pWStr));
end;

Delphi - Store WideStrings inside a program

In the past I used INI-Files to store unicode text, but now I need to store unicode text in the executable. How can I achieve this?
I want to store these letters:
āčēūīšķļņž
If you want to save the Unicode INI files then you might try the following code. The files are saved in UTF8 encoding.
Also you might take a look at this Unicode library where you can find a lot of helper functions.
uses IniFiles;
function WideStringToUTF8(const Value: WideString): AnsiString;
var
BufferLen: Integer;
begin
Result := '';
if Value <> '' then
begin
BufferLen := WideCharToMultiByte(CP_UTF8, 0, PWideChar(Value), -1, nil, 0, nil, nil);
SetLength(Result, BufferLen - 1);
if BufferLen > 1 then
WideCharToMultiByte(CP_UTF8, 0, PWideChar(Value), -1, PAnsiChar(Result), BufferLen - 1, nil, nil);
end;
end;
function UTF8ToWideString(const Value: AnsiString): WideString;
var
BufferLen: integer;
begin
Result := '';
if Value <> '' then
begin
BufferLen := MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(Value), -1, nil, 0);
SetLength(Result, BufferLen - 1);
if BufferLen > 1 then
MultiByteToWideChar(CP_UTF8, 0, PAnsiChar(Value), -1, PWideChar(Result), BufferLen - 1);
end;
end;
procedure TForm1.Button1Click(Sender: TObject);
var
IniFile: TIniFile;
const
UnicodeValue = WideString(#$0101#$010D#$0113#$016B#$012B#$0161);
begin
IniFile := TIniFile.Create('C:\test.ini');
try
IniFile.WriteString('Section', 'Key', WideStringToUTF8(UnicodeValue));
IniFile.UpdateFile;
finally
IniFile.Free;
end;
end;
procedure TForm1.Button2Click(Sender: TObject);
var
IniFile: TIniFile;
UnicodeValue: WideString;
begin
IniFile := TIniFile.Create('C:\test.ini');
try
UnicodeValue := UTF8ToWideString(IniFile.ReadString('Section', 'Key', 'Default'));
MessageBoxW(Handle, PWideChar(UnicodeValue), 'Caption', 0);
finally
IniFile.Free;
end;
end;
with Delphi 2007 on 64-bit Windows 7 Enterprise SP 1
If you definitely need to use Delphi 7 there are some variants:
Store strings in resources linked to executable file.
Store strings in big memo or same thing, located on global data module or any other visual or non-visual component and access it by index. It's possible because strings in Delphi resources stored in XML-encoded form. E.g. your symbols example āčēūīšķļņž will be stored as āčēūīšķļņž
Store XML-encoded or Base64-encoded strings in string constants inside your code.
For string conversion you can use EncdDecd.pas , xdom.pas or some functions of System.pas like UTF8Encode/UTF8Decode.
To display and edit Unicode strings in Delphi forms you can use special set of Unicode controls like TNT Unicode Controls or subclass original Delphi controls and do some other workarounds by yourself, like described in this excerpt from comments in TntControls.pas (part of TNT Unicode Controls):
Windows NT provides support for native Unicode windows. To add
Unicode support to a
TWinControl descendant, override CreateWindowHandle() and call
CreateUnicodeHandle().
One major reason this works is because the VCL only uses the ANSI
version of
SendMessage() -- SendMessageA(). If you call SendMessageA() on a
UNICODE
window, Windows deals with the ANSI/UNICODE conversion
automatically. So
for example, if the VCL sends WM_SETTEXT to a window using
SendMessageA,
Windows actually expects a PAnsiChar even if the target window
is a UNICODE
window. So caling SendMessageA with PChars causes no problems.
A problem in the VCL has to do with the TControl.Perform() method.
Perform()
calls the window procedure directly and assumes an ANSI window.
This is a
problem if, for example, the VCL calls Perform(WM_SETTEXT, ...)
passing in a
PAnsiChar which eventually gets passed downto DefWindowProcW()
which expects a PWideChar.
This is the reason for SubClassUnicodeControl(). This procedure
will subclass the
Windows WndProc, and the TWinControl.WindowProc pointer. It will
determine if the
message came from Windows or if the WindowProc was called
directly. It will then
call SendMessageA() for Windows to perform proper conversion on
certain text messages.
Another problem has to do with TWinControl.DoKeyPress(). It is
called from the WM_CHAR
message. It casts the WideChar to an AnsiChar, and sends the
resulting character to
DefWindowProc. In order to avoid this, the DefWindowProc is
subclassed as well. WindowProc
will make a WM_CHAR message safe for ANSI handling code by
converting the char code to
#FF before passing it on. It stores the original WideChar in the
.Unused field of TWMChar.
The code #FF is converted back to the WideChar before passing onto
DefWindowProc.
Do
const MyString = WideString('Teksts latvie'#$0161'u valod'#$0101);
Simple, the idea is to find a non-visual component, which can store text and store your text there. Prefer that such component can also provide you an editor to edit the text in design time.
There is a component call FormResource which can do this. I use TUniScript. I believe there are other similar components. However, I did not find a usable component from the standard library.
The approach Widestring(#$65E5#$672C) does not work, because Delphi 7 just doesn't expect more than one byte for the #, so the outcome is by far not what you expect when going above 255 or $FF.
Another approach WideChar($65E5)+ WideChar($672C) can be used to store single Unicode codepoints in your source code when knowing that you need to have a Widestring at the start of the assignment (which can also be an empty literal) so the compiler understands which datatype you want:
const
// Compiler error "Imcompatible types"
WONT_COMPILE: WideChar($65E5)+ WideChar($672C);
// 日本
NIPPON: Widestring('')+ WideChar($65E5)+ WideChar($672C);
Looks cumbersome, but surely has your UTF-16 texts in Delphi 7.
Alternatively, store your constants in UTF-8, which is ASCII safe - that way you can use # easily. One advantage is, that it's a lot less cumbersome to write in your source code. One disadvantage is, that you can never use the constant directly, but have to convert it to UTF-16 first:
const
// UTF-8 of the two graphemes 日 and 本, needing 3 bytes each
NIPPON: #$E6#$97#$A5#$E6#$9C#$AC;
var
sUtf16: Widestring;
begin
// Internally these are 2 WORDs: $65E5 and $672C
sUtf16:= UTF8ToWideString( NIPPON );

Resources