I'm having trouble with Unicode characters being displayed incorrectly on my UI. I have a resource-only DLL containing a string table used for UI localization. I create the DLL in Delphi XE3 with a DLL-only project (just has {$R 'lang.res' 'lang.rc'} in the DPR file, and gives me lang.dll). I've verified that my lang.rc file is in UTF-8 format with Windows CRLF line breaks. When I load the strings from the DLL, Unicode characters are jumbled on the interface. Here are some details.
A snippet from the string table:
STRINGTABLE
{
59,"180˚"
60,"90˚ CW"
61,"90˚ CCW"
}
Here are code snippets that illustrate the problem I'm having with Unicode characters:
// explicitly assigning the degrees character shows 180˚ properly
ImageMenu180Action.Caption := '180˚';
// getting the resource from the DLL shows some weird two-character string for the degrees character
ImageMenu90CWAction.Caption := TLangHelper.LoadStr(IDS_ImageMenuRotationCW90);
// OutputDebugString shows the degrees character in the debugger output correctly
OutputDebugString(PChar('IDS_ImageMenuRotationCW90: '+TLangHelper.LoadStr(IDS_ImageMenuRotationCW90)));
Here is my Delphi function used for loading strings from the resource DLL:
class function TLangHelper.LoadStr(ResourceId: Integer):String;
var
Buff: String;
L: Integer;
begin
Result := '';
if LangDllHandle = 0 then begin
LangDllHandle := LoadLibrary(LANGUAGE_DLL_LOCATION);
if LangDllHandle = 0 then begin
ShowMessage('Error loading language localization resources.');
end;
end;
if LangDllHandle <> 0 then begin
L := 1024;
SetLength(Buff, L+1);
LoadString(LangDllHandle, ResourceId, PChar(Buff), L);
Result := String(PChar(Buff));
end;
end;
Any suggestions?
FOLLOW-UP:
For Chinese characters, I had to add a preceding L to the string definitions in the .rc file so that the DLL compilation recognized them as Unicode. For example (English, Chinese Traditional, Chinese Simplified, French):
STRINGTABLE
{
35,"Status Bar"
1035,L"狀態欄"
2035,L"状态栏"
3035,"Barre d'état"
}
I found a reference from 2002 indicating that you need to tell the resource compiler how the .rc file is encoded. For UTF-8, that's code page 65001, so you'd run this:
brcc32 -c65001 lang.rc
Then, of course, you'd remove the 'lang.rc' part from the $R directive in your code because you no longer want the IDE to invoke the resource compiler itself.
If your Delphi version is recent enough, then you can keep the full $R directive and instead set the -c65001 option in the resource-compiler configuration of your project options.
It's hard to know the encoding of a file just by looking at it. There can be many valid guesses. The -c option is documented, but the documentation doesn't mention when you'd need to use it, or what the IDE uses when it runs the resource compiler. The IDE probably just uses the default, the same as brcc32.exe, which is the system's default ANSI code page.
Related
I have an .URL file which contains the following text which contains a German Umlaut character:
[InternetShortcut]
URL=http://edn.embarcadero.com/article/44358
[MyApp]
Notes=Special Test geändert
Icon=default
Title=Bug fix list for RAD Studio XE8
I try to load the text with TMemIniFile:
uses System.IniFiles;
//
procedure TForm1.Button1Click(Sender: TObject);
var
BookmarkIni: TMemIniFile;
begin
// The error occurs here:
BookmarkIni := TMemIniFile.Create('F:\Bug fix list for RAD Studio XE8.url',
TEncoding.UTF8);
try
// Some code here
finally
BookmarkIni.Free;
end;
end;
This is the error message text from the debugger:
Project MyApp.exe raised exception class EEncodingError with message
'No mapping for the Unicode character exists in the target multi-byte
code page'.
When I remove the word with the German Umlaut character "geändert" from the .URL file then there is NO error.
But that's why I use TMemIniFile, because TIniFile does not work here when the text in the .URL file contains Unicode characters. (There could also be other Unicode characters in the .URL file).
So why do I get an exception here in TMemIniFile.Create?
EDIT: Found the culprit: The .URL file is in ANSI format. The error does not happen when the .URL file is in UTF-8 format. But what can I do when the file is in ANSI format?
EDIT2: I've created a workaround which does work BOTH with ANSI and UTF-8 files:
procedure TForm1.Button1Click(Sender: TObject);
var
BookmarkIni: TMemIniFile;
BookmarkIni_: TIniFile;
ThisFileIsAnsi: Boolean;
begin
try
ThisFileIsAnsi := False;
BookmarkIni := TMemIniFile.Create('F:\Bug fix list for RAD Studio XE8.url',
TEncoding.UTF8);
except
BookmarkIni_ := TIniFile.Create('F:\Bug fix list for RAD Studio XE8.url');
ThisFileIsAnsi := True;
end;
try
// Some code here
finally
if ThisFileIsAnsi then
BookmarkIni_.Free
else
BookmarkIni.Free;
end;
end;
What do you think?
It is not possible, in general, to auto-detect the encoding of a file from its contents.
A clear demonstration of this is given by this article from Raymond Chen: The Notepad file encoding problem, redux. Raymond uses the example of a file containing these two bytes:
D0 AE
Raymond goes on to show that this is a well formed file with the following four encodings: ANSI 1252, UTF-8, UTF-16BE and UTF-16LE.
The take-home lesson is that you have to know the encoding of your file. Either agree on it by convention with whoever writes the file, or enforce the presence of a BOM.
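To make the ambiguity concrete, here is a minimal console sketch that decodes those same two bytes with the four encodings Raymond lists. The helper name DecodeToCodeUnits is my own, and the program prints UTF-16 code units rather than the characters themselves so that the console code page does not get in the way.

```delphi
program AmbiguousBytes;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Hypothetical helper: decodes Bytes with Enc and returns the resulting
// UTF-16 code units as 'U+XXXX' tokens separated by spaces.
function DecodeToCodeUnits(const Bytes: TBytes; Enc: TEncoding): string;
var
  S: string;
  C: Char;
begin
  S := Enc.GetString(Bytes);
  Result := '';
  for C in S do
    Result := Result + 'U+' + IntToHex(Ord(C), 4) + ' ';
  Result := Trim(Result);
end;

var
  Bytes: TBytes;
  Ansi1252: TEncoding;
begin
  Bytes := TBytes.Create($D0, $AE);
  // Encodings obtained from GetEncoding are owned by the caller.
  Ansi1252 := TEncoding.GetEncoding(1252);
  try
    Writeln('ANSI 1252: ', DecodeToCodeUnits(Bytes, Ansi1252));                   // U+00D0 U+00AE
    Writeln('UTF-8:     ', DecodeToCodeUnits(Bytes, TEncoding.UTF8));             // U+042E
    Writeln('UTF-16LE:  ', DecodeToCodeUnits(Bytes, TEncoding.Unicode));          // U+AED0
    Writeln('UTF-16BE:  ', DecodeToCodeUnits(Bytes, TEncoding.BigEndianUnicode)); // U+D0AE
  finally
    Ansi1252.Free;
  end;
end.
```

All four calls succeed, which is exactly the problem: nothing in the bytes themselves says which reading was intended.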
You need to decide what the encoding of the file is, once and for all. There's no foolproof way to auto-detect it, so you'll have to enforce it in the code that creates these files.
If the creation of this file is outside your control, then you are more or less out of luck. You can try to rely on the BOM (byte order mark) at the beginning of the file (which is often, but not always, present in a UTF-8 file). I can't tell from the documentation of TMemIniFile what the Create constructor without an encoding parameter assumes about the encoding of the file (my guess is that it follows the BOM and, if there is none, assumes ANSI, i.e. the system code page).
One thing you can do - if you decide to stick to your current method - is to change your code to:
procedure TForm1.Button1Click(Sender: TObject);
var
BookmarkIni: TCustomIniFile;
begin
// The error occurs here:
try
BookmarkIni := TMemIniFile.Create('F:\Bug fix list for RAD Studio XE8.url',
TEncoding.UTF8);
except
BookmarkIni := TIniFile.Create('F:\Bug fix list for RAD Studio XE8.url');
end;
try
// Some code here
finally
BookmarkIni.Free;
end;
end;
You don't need two separate variables, as TIniFile and TMemIniFile (as well as TRegistryIniFile) all have a common ancestor: TCustomIniFile. By declaring your variable as this common ancestor, you can instantiate (create) it as any of the class types that inherit from TCustomIniFile. The actual (run-time) type is determined by which constructor you call to create it.
But first, you should try to use
BookmarkIni := TMemIniFile.Create('F:\Bug fix list for RAD Studio XE8.url');
i.e. without any encoding specified, and see if it works with both ANSI and UTF-8 files.
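If you do keep the try-then-fall-back approach, it may be worth catching only the decoding failure, so that unrelated errors (a missing file, for instance) are not silently swallowed. The following is a rough sketch under two assumptions: that the constructor raises EEncodingError on bytes that are not valid UTF-8, as the question reports, and that the fallback should be the system ANSI code page (TEncoding.Default). The helper name OpenIniUtf8OrAnsi is made up for illustration.

```delphi
program IniFallback;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.IniFiles;

// Hypothetical helper: tries UTF-8 first and falls back to the system ANSI
// code page only when decoding fails. Any other exception propagates.
function OpenIniUtf8OrAnsi(const FileName: string): TCustomIniFile;
begin
  try
    Result := TMemIniFile.Create(FileName, TEncoding.UTF8);
  except
    on EEncodingError do
      Result := TMemIniFile.Create(FileName, TEncoding.Default);
  end;
end;

var
  Ini: TCustomIniFile;
begin
  // Pass the path of the .url file as the first command-line argument.
  Ini := OpenIniUtf8OrAnsi(ParamStr(1));
  try
    Writeln(Ini.ReadString('MyApp', 'Notes', '(missing)'));
  finally
    Ini.Free;
  end;
end.
```

This is still a heuristic: an ANSI file whose bytes happen to form valid UTF-8 will be read as UTF-8.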
EDIT: Here's a test program to verify my claim made in the comments:
program Project21;
{$APPTYPE CONSOLE}
uses
IniFiles, System.SysUtils;
const
FileName = 'F:\Bug fix list for RAD Studio XE8.url';
var
TXT : TextFile;
procedure Test;
var
BookmarkIni: TCustomIniFile;
begin
try
BookmarkIni := TMemIniFile.Create(FileName,TEncoding.UTF8);
except
BookmarkIni := TIniFile.Create(FileName);
end;
try
Writeln(BookmarkIni.ReadString('MyApp','Notes','xxx'))
finally
BookmarkIni.Free;
end;
end;
begin
try
AssignFile(TXT,FileName); REWRITE(TXT);
try
WRITELN(TXT,'[InternetShortcut]');
WRITELN(TXT,'URL=http://edn.embarcadero.com/article/44358');
WRITELN(TXT,'[MyApp]');
WRITELN(TXT,'Notes=The German a umlaut consists of the following two ANSI characters: '#$C3#$A4);
WRITELN(TXT,'Icon=default');
WRITELN(TXT,'Title=Bug fix list for RAD Studio XE8');
finally
CloseFile(TXT)
end;
Test;
ReadLn
except
on E: Exception do
Writeln(E.ClassName, ': ', E.Message);
end;
end.
The rule of thumb: to read data (a file, a stream, whatever) correctly, you must know its encoding! The best solution is to let the user choose the encoding, or to enforce one, e.g. UTF-8.
Moreover, knowing only that a file is "ANSI" does not make things any easier unless you also know the code page.
A must read - The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Another approach is to try to detect the encoding (like browsers do with sites when no encoding is specified). Detecting UTF encodings is relatively easy if a BOM exists, but more often than not it is omitted. Take a look at Mozilla's universalchardet or chsdet.
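For the easy case (a BOM is present), something like the sketch below may be enough. It relies on TEncoding.GetBufferEncoding, which inspects the start of a byte buffer for a known preamble; the wrapper name EncodingFromBom is my own. It reads the whole file for brevity, although only the first few bytes are needed.

```delphi
program DetectBom;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.IOUtils;

// Hypothetical wrapper: returns the encoding indicated by the BOM.
// BomLength is 0 when no BOM was found; in that case the returned encoding
// is only the RTL's default guess and says nothing about the real content.
function EncodingFromBom(const FileName: string; out BomLength: Integer): TEncoding;
var
  Bytes: TBytes;
begin
  Bytes := TFile.ReadAllBytes(FileName);
  Result := nil; // nil asks GetBufferEncoding to detect the encoding
  BomLength := TEncoding.GetBufferEncoding(Bytes, Result);
end;

var
  Enc: TEncoding;
  BomLength: Integer;
begin
  // Pass the file to inspect as the first command-line argument.
  Enc := EncodingFromBom(ParamStr(1), BomLength);
  if BomLength > 0 then
    Writeln('BOM found, encoding class: ', Enc.ClassName)
  else
    Writeln('No BOM found; the encoding is still unknown');
end.
```

When there is no BOM you are back to conventions, user choice, or a statistical detector such as the ones mentioned above.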
I am using Indy 10 with Delphi. Following is my code which uses EncodeString method of Indy to encode a string.
var
EncodedString : String;
StringToBeEncoded : String;
EncoderMIME: TIdEncoderMIME;
....
....
EncodedString := EncoderMIME.EncodeString(StringToBeEncoded);
I am not getting the correct value in the encoded string.
What is the purpose of IndyTextEncoding_OSDefault?
Here's the source code for IndyTextEncoding_OSDefault.
function IndyTextEncoding_OSDefault: IIdTextEncoding;
var
LEncoding: IIdTextEncoding;
begin
if GIdOSDefaultEncoding = nil then begin
LEncoding := TIdMBCSEncoding.Create;
if InterlockedCompareExchangeIntf(IInterface(GIdOSDefaultEncoding), LEncoding, nil) <> nil then begin
LEncoding := nil;
end;
end;
Result := GIdOSDefaultEncoding;
end;
Note that I stripped out the .net conditional code for simplicity. Most of this code is to arrange singleton thread-safety. The actual value returned is synthesised by a call to TIdMBCSEncoding.Create. Let's look at that.
constructor TIdMBCSEncoding.Create;
begin
Create(CP_ACP, 0, 0);
end;
Again, I've removed conditional code that does not apply to your Windows setting. Now, CP_ACP is the active code page, i.e. the current system Windows ANSI code page. So, on Windows at least, IndyTextEncoding_OSDefault is an encoding for the current system Windows ANSI code page.
Why did using IndyTextEncoding_OSDefault give the same behaviour as my Delphi 7 code?
That's because the Delphi 7 / Indy 9 code for TIdEncoderMIME.EncodeString does not perform any code page transformation; it MIME-encodes the input string as though it were a byte array. Since the Delphi 7 string is encoded in the active ANSI code page, this has the same effect as passing IndyTextEncoding_OSDefault to TIdEncoderMIME.EncodeString in your Unicode version of the code.
What is the difference between IndyTextEncoding_Default and IndyTextEncoding_OSDefault?
Here is the source code for IndyTextEncoding_Default:
function IndyTextEncoding_Default: IIdTextEncoding;
var
LType: IdTextEncodingType;
begin
LType := GIdDefaultTextEncoding;
if LType = encIndyDefault then begin
LType := encASCII;
end;
Result := IndyTextEncoding(LType);
end;
This returns an encoding that is determined by the value of GIdDefaultTextEncoding. By default, GIdDefaultTextEncoding is encASCII. And so, by default, IndyTextEncoding_Default yields an ASCII encoding.
Beyond all this, you should be asking yourself which encoding you want to be using. Relying on default values leaves you at the mercy of those defaults. What if those defaults don't do what you want? That is especially a concern because the defaults are not Unicode encodings, so they support only a limited range of characters, and what's more, they depend on system settings.
If you wish to encode international text, you would normally choose to use the UTF-8 encoding.
One other point to make is that you are calling EncodeString as though it were an instance method, but it is actually a class method. You can remove EncoderMIME and call TIdEncoderMIME.EncodeString. Or keep EncoderMIME and call EncoderMIME.Encode.
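Putting those two points together, a call might look roughly like this. It is a sketch that assumes an Indy 10 revision recent enough to provide IndyTextEncoding_UTF8 in IdGlobal (the same generation as the IndyTextEncoding_OSDefault shown above); the sample text is my own.

```delphi
program MimeUtf8;
{$APPTYPE CONSOLE}
uses
  IdGlobal, IdCoderMIME;

var
  Source: string;
begin
  // 'caf' followed by U+00E9, written as a character code so the example
  // does not depend on how the source file itself is encoded.
  Source := 'caf' + #$00E9;

  // Class method call with an explicit byte encoding: the text is converted
  // to UTF-8 bytes first and those bytes are then Base64-encoded.
  Writeln(TIdEncoderMIME.EncodeString(Source, IndyTextEncoding_UTF8)); // Y2Fmw6k=
end.
```

Whoever decodes the result has to use the same encoding, e.g. TIdDecoderMIME.DecodeString with IndyTextEncoding_UTF8.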
What is the easiest way to create and save string into .txt files?
Use TStringList.
uses
Classes, Dialogs; // Classes for TStrings, Dialogs for ShowMessage
var
Lines: TStrings;
Line: string;
FileName: string;
begin
FileName := 'test.txt';
Lines := TStringList.Create;
try
Lines.Add('First line');
Lines.Add('Second line');
Lines.SaveToFile(FileName);
Lines.LoadFromFile(FileName);
for Line in Lines do
ShowMessage(Line);
finally
Lines.Free;
end;
end;
Also SaveToFile and LoadFromFile can take an additional Encoding in Delphi 2009 and newer to set the text encoding (Ansi, UTF-8, UTF-16, UTF-16 big endian).
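As a small sketch of that overload (Delphi 2009 or newer assumed; the file name and sample text are arbitrary), saving and reloading with an explicit UTF-8 encoding could look like this:

```delphi
program StringListUtf8;
{$APPTYPE CONSOLE}
uses
  SysUtils, Classes;

const
  FileName = 'test.txt';
var
  Lines: TStrings;
  Original: string;
begin
  // 'Gr', U+00FC, U+00DF, 'e': contains two non-ASCII characters.
  Original := 'Gr' + #$00FC + #$00DF + 'e';
  Lines := TStringList.Create;
  try
    Lines.Add(Original);
    // Name the encoding explicitly on both sides instead of relying on defaults.
    Lines.SaveToFile(FileName, TEncoding.UTF8);
    Lines.Clear;
    Lines.LoadFromFile(FileName, TEncoding.UTF8);
    if Lines[0] = Original then
      Writeln('Round trip preserved the text')
    else
      Writeln('Round trip changed the text');
  finally
    Lines.Free;
  end;
end.
```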
Actually, I prefer this:
var
Txt: TextFile;
SomeFloatNumber: Double;
SomeStringVariable: string;
Buffer: Array[1..4096] of byte;
begin
SomeStringVariable := 'Text';
AssignFile(Txt, 'Some.txt');
Rewrite(Txt);
SetTextBuf(Txt, Buffer, SizeOf(Buffer));
try
WriteLn(Txt, 'Hello, World.');
WriteLn(Txt, SomeStringVariable);
SomeFloatNumber := 3.1415;
WriteLn(Txt, SomeFloatNumber:0:2); // Will save 3.14
finally
CloseFile(Txt);
end;
end;
I consider this the easiest way, since you don't need the classes or any other unit for this code. And it works for all Delphi versions including -if I'm not mistaken- all .NET versions of Delphi...
I've added a call to SetTextBuf() to this example, which is a good trick to speed up text files in Delphi considerably. Normally, text files have a buffer of only 128 bytes. I tend to increase this buffer to a multiple of 4096 bytes. In several cases, I've also implemented my own TextFile types, allowing me to use these "console" functions to write text to memo fields or even to another, external application! At this location is some example code (ZIP) I wrote in 2000 and just modified to make sure it compiles with Delphi 2007. Not sure about newer Delphi versions, though. Then again, this code is 10 years old already. These console functions have been a standard part of the Pascal language since its beginning, so I don't expect them to disappear anytime soon. The TTextRec type might be modified in the future, though, so I can't predict whether this code will keep working. Some explanations:
WA_TextCustomEdit.AssignCustomEdit allows text to be written to CustomEdit-based objects like TMemo.
WA_TextDevice.TWATextDevice is a class that can be dropped on a form, which contains events where you can do something with the data written.
WA_TextLog.AssignLog is used by me to add timestamps to every line of text.
WA_TextNull.AssignNull is basically a dummy text device. It just discards anything you write to it.
WA_TextStream.AssignStream writes text to any TStream object, including memory streams, file streams, TCP/IP streams and whatever else you have.
Code in link is hereby licensed as CC-BY
Oh, the server with the ZIP file isn't very powerful, so it tends to be down a few times every day. Sorry about that.
The IOUtils unit which was introduced in Delphi 2010 provides some very convenient functions for writing/reading text files:
// append the text 'Some text' to the file 'C:\users\documents\test.txt':
TFile.AppendAllText('C:\users\documents\test.txt', 'Some text', TEncoding.ASCII);
Or if you are using an older version of Delphi (which does not have the for line in lines method of iterating a string list):
var i : integer;
begin
...
try
Lines.Add('First line');
Lines.Add('Second line');
Lines.SaveToFile(FileName);
Lines.LoadFromFile(FileName);
for i := 0 to Lines.Count -1 do
ShowMessage(Lines[i]);
finally
Lines.Free;
end;
If you're using a Delphi version >= 2009, take a look at the TStreamWriter class.
It will also take care of text file encodings and newline characters.
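A minimal sketch of what that could look like (the file name is arbitrary, and UTF-8 is just my choice of encoding for the example):

```delphi
program StreamWriterDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils, Classes;

var
  Writer: TStreamWriter;
begin
  // False = overwrite rather than append; the encoding is stated explicitly.
  Writer := TStreamWriter.Create('test.txt', False, TEncoding.UTF8);
  try
    Writer.WriteLine('First line');
    Writer.WriteLine('Second line');
  finally
    Writer.Free; // flushes buffered text and closes the file
  end;
end.
```

TStreamReader is the matching class for reading the file back line by line.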
procedure String2File;
var s:ansiString;
begin
s:='My String';
with TFileStream.create('c:\myfile.txt',fmCreate) do
try
writeBuffer(s[1],length(s));
finally
free;
end;
end;
Care needed when using unicode strings....
I have written a program in Delphi 7 which searches for *.srt files on a hard drive. The program lists the path and name of these files in a memo. Now I need to convert these files from ANSI to UTF-8, but I haven't succeeded.
The Utf8Encode function takes a WideString string as parameter and returns a Utf-8 string.
Sample:
procedure ConvertANSIFileToUTF8File(AInputFileName, AOutputFileName: TFileName);
var
Strings: TStrings;
begin
Strings := TStringList.Create;
try
Strings.LoadFromFile(AInputFileName);
Strings.Text := UTF8Encode(Strings.Text);
Strings.SaveToFile(AOutputFileName);
finally
Strings.Free;
end;
end;
Take a look at GpTextStream, which looks like it works with Delphi 7. It can read and write Unicode files in older versions of Delphi (although it also works with Delphi 2009) and should help with your conversion.
var
Latin1Encoding: TEncoding;
begin
Latin1Encoding := TEncoding.GetEncoding(28591);
try
MyTStringList.SaveToFile('some file.txt', Latin1Encoding);
finally
Latin1Encoding.Free;
end;
end;
Please read the whole answer before you start coding.
The proper answer to the question, and it is not the easy one, basically consists of three steps:
You have to determine the ANSI code page used on your computer. You can achieve this goal by using the GetACP() function from Windows API. (Important: you have to retrieve the codepage as soon as possible after the file name retrieval, because it can be changed by the user.)
You must convert your ANSI string to Unicode by calling the MultiByteToWideChar() Windows API function with the correct CodePage parameter (retrieved in the previous step). After this step you have a UTF-16 string (in practice a WideString) containing the file name list.
You have to convert the Unicode string to UTF-8 using UTF8Encode() or the WideCharToMultiByte() Windows API function. This returns the UTF-8 string you need.
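The three steps above can be sketched as follows. This is an illustration written with Delphi 7 in mind, not a hardened implementation; the function name CodePageToUtf8 is made up, and the only error handling is a check of the first MultiByteToWideChar call.

```delphi
program AnsiToUtf8Demo;
{$APPTYPE CONSOLE}
uses
  Windows, SysUtils;

// Hypothetical helper covering steps 2 and 3: ANSI bytes in the given code
// page -> UTF-16 (WideString) -> UTF-8.
function CodePageToUtf8(const S: AnsiString; CodePage: UINT): UTF8String;
var
  Wide: WideString;
  Len: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // First call asks how many UTF-16 code units the result needs.
  Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len = 0 then
    RaiseLastOSError;
  SetLength(Wide, Len);
  // Second call performs the actual conversion into the buffer.
  MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), PWideChar(Wide), Len);
  Result := UTF8Encode(Wide);
end;

var
  CodePage: UINT;
  Utf8: UTF8String;
begin
  CodePage := GetACP; // step 1: the current system ANSI code page
  Utf8 := CodePageToUtf8('plain ASCII text', CodePage);
  Writeln(Length(Utf8)); // ASCII input keeps its length in UTF-8
end.
```

Characters above #127 grow to two or more bytes in the result, so the output length generally differs from the input length.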
However, although this solution returns a UTF-8 string containing the input ANSI string, it is probably not the best way to solve your problem: the file names may already have been corrupted when the ANSI functions returned them, so proper file names are not guaranteed.
The proper solution to your problem is way more complicated:
If you want to be sure that your file name list is completely clean, you have to make sure it never gets converted to ANSI at all. You can do this by explicitly using the "W" versions of the file handling APIs. In that case, of course, you cannot use TFileStream and other ANSI file handling objects; you have to call the Windows API directly.
It is not that hard, but if you already have a complex framework built on e.g. TFileStream, it could be a bit of a pain in the #ss. In this case the best solution is to create a TStream descendant that uses the appropriate APIs.
I hope my answer helps you or anyone who has to deal with the same problem. (I had to not so long ago.)
I did only this:
procedure TForm1.FormCreate(Sender: TObject);
begin
Strings := TStringList.Create;
end;
procedure TForm1.Button3Click(Sender: TObject);
begin
Strings.Text := UTF8Encode(Memo1.Text);
Strings.SaveToFile('new.txt');
end;
Verified with Notepad++ UTF8 without BOM
Did you mean ASCII?
ASCII is backwards compatible with UTF-8.
http://en.wikipedia.org/wiki/UTF-8
For some reason, the *.UDL files on many of my client systems have lately stopped working: they were once saved as ANSI files, which is no longer compatible with the expected Unicode file format. The end result is an error dialog which states "the file is not a valid compound file".
What is the easiest way to programmatically open these files and save them as Unicode files? I know I can do this by opening each one in Notepad and then saving it under the same name with "Unicode" selected in the encoding section of the Save As dialog, but I need to do this in the program to cut down on support calls.
This problem is very easy to reproduce: create a *.txt file in a directory, rename it to *.UDL, then edit it using the Microsoft editor. Then open it in Notepad and save the file as an ANSI-encoded file. Try to open the UDL from the UDL editor and it will tell you it's corrupt. Then save it (using Notepad) as a Unicode-encoded file and it will open properly again.
OK, using Delphi 2009, I was able to come up with the following code, which appears to work, but is it the proper way of doing this conversion?
var
sl : TStrings;
FileName : string;
begin
FileName := fServerDir+'configuration\hdconfig4.udl';
sl := TStringList.Create;
try
sl.LoadFromFile(FileName, TEncoding.Default);
sl.SaveToFile(FileName, TEncoding.Unicode);
finally
sl.Free;
end;
end;
This is very simple to do with my TGpTextFile unit. I'll put together a short sample and post it here.
It should also be very simple with the new Delphi 2009 - are you maybe using it?
EDIT: This is how you can do it using my stuff in pre-2009 Delphis.
var
strAnsi : TGpTextFile;
strUnicode: TGpTextFile;
begin
strAnsi := TGpTextFile.Create('c:\0\test.udl');
try
strAnsi.Reset; // you can also specify non-default 8-bit codepage here
strUnicode := TGpTextFile.Create('c:\0\test-out.udl');
try
strUnicode.Rewrite([cfUnicode]);
while not strAnsi.Eof do
strUnicode.Writeln(strAnsi.Readln);
finally FreeAndNil(strUnicode); end;
finally FreeAndNil(strAnsi); end;
end;
License: The code fragment above belongs to the public domain. Use it any way you like.