This line:
TFileStream.Create(fileName, fmOpenRead or fmShareDenyNone);
raises an exception if the filename contains a character such as ñ.
You are ultimately calling CreateFileA, the ANSI API, and the characters you use have no encoding in the active ANSI code page. The only way around this is to open the file with CreateFileW, the Unicode API.
You might not realise that you call CreateFileA, but that's how the Delphi 7 file stream is implemented.
One easy way to solve your problems is to upgrade to the latest Delphi which has good support for the native Windows Unicode API.
If you are stuck with ANSI Delphi, you still need to call CreateFileW to obtain a file handle. That API expects a UTF-16 string, so store the filename in a WideString. You also need to get the filename from the user in UTF-16 form, which means calling GetOpenFileNameW or using IFileDialog. Finally, create a stream by passing the file handle to THandleStream.
To do all of this you could use the TNT Unicode libraries. They work well, but adopting them is a substantial porting job.
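The CreateFileW plus THandleStream route described above could be sketched roughly as follows for Delphi 7. The class name TWideFileReadStream is made up for illustration; THandleStream does not close the handle it is given, so the descendant does that itself. The Integer cast matches the Delphi 7 declaration of THandleStream.Create and would need adjusting on newer compilers.

```pascal
uses
  Windows, SysUtils, Classes;

type
  // Hypothetical helper class, not part of the RTL/VCL.
  TWideFileReadStream = class(THandleStream)
  private
    FOpened: Boolean; // True once CreateFileW has succeeded
  public
    constructor Create(const FileName: WideString);
    destructor Destroy; override;
  end;

constructor TWideFileReadStream.Create(const FileName: WideString);
var
  H: THandle;
begin
  // Roughly what fmOpenRead or fmShareDenyNone asks for, but through
  // the Unicode API so the name never passes through an ANSI code page.
  H := CreateFileW(PWideChar(FileName), GENERIC_READ,
    FILE_SHARE_READ or FILE_SHARE_WRITE, nil, OPEN_EXISTING,
    FILE_ATTRIBUTE_NORMAL, 0);
  if H = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  inherited Create(Integer(H)); // Delphi 7: the handle parameter is an Integer
  FOpened := True;
end;

destructor TWideFileReadStream.Destroy;
begin
  // THandleStream leaves the handle open, so close it here.
  if FOpened then
    CloseHandle(THandle(Handle));
  inherited Destroy;
end;
```

The stream is then used like any other TStream, for example `Stream := TWideFileReadStream.Create(WideFileName);` followed by the usual try/finally with `Stream.Free`.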
Frankly, the right way forward is to use modern tools that support Unicode.
You can use the TntUnicode units to get Unicode (WideString) support under Delphi 7.
Add TntClasses to your uses clause and make the call like this:
TTntFileStream.Create(fileName, fmOpenRead or fmShareDenyNone);
Make sure that fileName is a WideString.
Here you can get a copy of TntUnicode:
https://github.com/rofl0r/TntUnicode
UTF16 can be thought of as a codepage, just like all of the possible ANSI codepages.
As Remy mentions in his comment, assuming your ANSI codepage supports the required characters in your Unicode string you simply have to convert that Unicode version of that string to the equivalent ANSI codepage version.
The Delphi compiler can take care of a simple conversion for you automatically, which you use simply by casting a WIDEString (UTF16) to an (ANSI)String:
const
  WIDE_FILENAME : WIDEString = 'fuññy.txt';
var
  sFilename: String;
  strm: TFileStream;
begin
  sFilename := String(WIDE_FILENAME);
  strm := TFileStream.Create(sFilename, fmOpenRead);
  // etc
end;
This works perfectly well even on (e.g.) Delphi 7. The only caveat is that the codepage involved (the system default) must support the extended characters in the Unicode string.
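Whether the system code page really supports every character of a given name can be checked before relying on the cast. The sketch below is one possible way to do that with the Win32 function WideCharToMultiByte, which reports whether it had to substitute a default character. The function name FitsAnsiCodePage and the constant NoBestFitChars are hypothetical names chosen for this example, and the code assumes a legacy (non-UTF-8) system code page.

```pascal
uses
  Windows;

const
  // Value of the Win32 flag WC_NO_BEST_FIT_CHARS, declared locally in case
  // the Windows unit of an older compiler does not provide it.
  NoBestFitChars = $00000400;

// Hypothetical helper: True when every character of S has an exact
// representation in the system ANSI code page (CP_ACP).
function FitsAnsiCodePage(const S: WideString): Boolean;
var
  Len: Integer;
  Buf: AnsiString;
  UsedDefault: BOOL;
begin
  Result := True;
  if S = '' then
    Exit;
  // First call: ask how many bytes the ANSI form needs.
  Len := WideCharToMultiByte(CP_ACP, NoBestFitChars, PWideChar(S), Length(S),
    nil, 0, nil, nil);
  if Len = 0 then
  begin
    Result := False;
    Exit;
  end;
  SetLength(Buf, Len);
  UsedDefault := False;
  // Second call: convert, and let Windows report whether it had to
  // substitute the default character for anything.
  Len := WideCharToMultiByte(CP_ACP, NoBestFitChars, PWideChar(S), Length(S),
    PAnsiChar(Buf), Len, nil, @UsedDefault);
  Result := (Len > 0) and not UsedDefault;
end;
```

If the function returns False for a filename, the String cast would silently change the name, and opening the file through the ANSI API is likely to fail.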
NOTE: The above code uses the String type rather than ANSIString explicitly. On Delphi versions where String is ANSIString, this has the required effect but also is portable to versions where String is UnicodeString (should you upgrade your version later).
If you use ANSIString explicitly in this case, the result will be a double conversion if/when you upgrade:
// Unicode compiler using ANSIString type....
var
sFilename: ANSIString;
begin
sFilename := ANSIString(WIDE_FILENAME); // Codepage conversion from UTF16 to ANSI
strm := TFileStream.Create(sFilename, fmOpenRead); // Will implicitly convert *back* from ANSI to WIDE
versus
// Unicode compiler using String type....
var
sFilename: String;
begin
sFilename := String(WIDE_FILENAME); // String type conversion from WideString to UnicodeString
strm := TFileStream.Create(sFilename, fmOpenRead); // No further conversion necessary
The best solution is to go Unicode, but if that is not an option you can still solve the problem.
In Windows you can choose which code page is used for non-Unicode programs. Change it to one that supports the required language (Spanish?) and the code should work.
Windows 7: Control Panel > Region and Language > Administrative > Language for non-Unicode programs
Windows XP: Control Panel > Regional and Language Options > Advanced > Language for non-Unicode programs
How can I write the Unicode string "Внимание" to an .ini file?
This is how I am writing it:
Escreve := TIniFile.Create(Patch + 'File.ini');
Escreve.WriteString('Informations', 'Patch', ParamStr(0));
The folder name is "Внимание", but the .ini file shows ????????
On Windows, TIniFile internally uses the Win32 PrivateProfile API (in this case, WritePrivateProfileStringA() in Delphi 2007 and earlier, and WritePrivateProfileStringW() in Delphi 2009 and later). WritePrivateProfileStringA() does not support Unicode at all, and WritePrivateProfileStringW() writes Unicode data only if the INI file already exists and was created with a UTF-16 BOM, otherwise it writes ANSI data instead.
If you are using Delphi 2009+, TMemIniFile allows you to specify a TEncoding for the desired charset, such as TEncoding.UTF8 or TEncoding.Unicode (UTF-16), e.g.:
Escreve := TMemIniFile.Create(Patch + 'File.ini', TEncoding.UTF8);
Escreve.WriteString('Informations', 'Patch', ParamStr(0));
Escreve.UpdateFile; // TMemIniFile keeps changes in memory until UpdateFile is called
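If you would rather keep TIniFile in Delphi 2009+, the behaviour of WritePrivateProfileStringW described above suggests another route: create the file with a UTF-16 BOM before the first write. The following is only a sketch under that assumption. EnsureUtf16Ini is a hypothetical helper name, and an INI file that already exists in ANSI form is deliberately left untouched.

```pascal
uses
  SysUtils, Classes, IniFiles;

// Hypothetical helper: if FileName does not exist yet, create it with
// nothing but a UTF-16 little-endian BOM (FF FE) so that later
// WritePrivateProfileStringW calls store Unicode text.
procedure EnsureUtf16Ini(const FileName: string);
const
  Utf16LeBom: array[0..1] of Byte = ($FF, $FE);
var
  Stream: TFileStream;
begin
  if FileExists(FileName) then
    Exit; // an existing file keeps whatever encoding it already has
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    Stream.WriteBuffer(Utf16LeBom, SizeOf(Utf16LeBom));
  finally
    Stream.Free;
  end;
end;
```

With the variables from the question, you would call `EnsureUtf16Ini(Patch + 'File.ini');` once before `TIniFile.Create(Patch + 'File.ini')` and then use WriteString as before.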
I am trying to replace some wildcards in HTML code so that I can send it by mail.
The problem is that when I replace the wildcard in 'España$country$' with the string 'España', the result is 'EspañaEspa?a'. I had the same problem before in Delphi 7 and solved it with UTF8Encode('España'), but that does not work in Delphi 10.
I have tried 'España', UTF8Encode('España') and AnsiToUTF8('España'). I also tried replacing StringReplace with ReplaceStr and ReplaceText, with the same result.
......
var htmlText : TStringList;
......
htmlText := TStringList.Create;
htmlText.LoadFromFile('path.html');
htmlText.Text := StringReplace(htmlText.Text, '$country$', UTF8Encode('España'), [rfReplaceAll]);
htmlText.SaveToFile('anotherpath.html');
......
This StringReplace call combined with UTF8Encode works well in Delphi 7, showing 'España', but not in Delphi 10, where anotherpath.html contains 'Espa?a'.
The Delphi 7 string type, and consequently TStrings, did not support Unicode. Which is why you needed to use UTF8Encode.
Since Delphi 2009, Unicode is supported, string maps to UnicodeString, and TStrings is a collection of such strings. Note that UnicodeString is internally encoded as UTF-16, although that is not a detail that you need to be concerned with here.
Since you are now using a Delphi that supports Unicode, your code can be much simpler. You can now write it like this:
htmlText.Text := StringReplace(htmlText.Text, '$country$', 'España', [rfReplaceAll]);
Note that if you want the file to be saved as UTF-8, you need to specify the encoding when you save it, like this:
htmlText.SaveToFile('anotherpath.html', TEncoding.UTF8);
And you may also need to specify the encoding when loading the file in case it does not include a UTF-8 BOM:
htmlText.LoadFromFile('path.html', TEncoding.UTF8);
I want to store arbitrary binary data as a BLOB in an SQLite database.
The data will be added as value with this function:
procedure TSQLiteDatabase.AddParamText(name: string; value: string);
Now I want to convert a WideString into its UTF-8 representation so that it can be stored in the database. After calling UTF8Encode and storing the result in the database, I noticed that the data inside the database is not UTF-8 encoded. Instead, it is encoded as an AnsiString in my computer's locale.
I ran following test to check what happened:
type
{$IFDEF Unicode}
TBinary = RawByteString;
{$ELSE}
TBinary = AnsiString;
{$ENDIF}
procedure TForm1.Button1Click(Sender: TObject);
var
original: WideString;
blob: TBinary;
begin
original := 'ä';
blob := UTF8Encode(original);
// Delphi 6: Ã¤ (as expected)
// Delphi XE4: ä (unexpected! How did it do an automatic UTF8Decode???)
ShowMessage(blob);
end;
After the character "ä" has been converted to UTF-8, the data is correct in memory ("Ã¤"). However, as soon as I pass the TBinary value to a function (as string or AnsiString), Delphi XE4 does a "magic typecast" that invokes UTF8Decode, for a reason I do not understand.
I have already found a workaround to avoid this:
function RealUTF8Encode(AInput: WideString): TBinary;
var
tmp: TBinary;
begin
tmp := UTF8Encode(AInput);
SetLength(result, Length(tmp));
CopyMemory(@result[1], @tmp[1], Length(tmp));
end;
procedure TForm1.Button2Click(Sender: TObject);
var
original: WideString;
blob: TBinary;
begin
original := 'ä';
blob := RealUTF8Encode(original);
// Delphi 6: Ã¤ (as expected)
// Delphi XE4: Ã¤ (as expected)
ShowMessage(blob);
end;
However, this workaround with RealUTF8Encode looks dirty to me and I would like to understand why a simple call of UTF8Encode did not work and if there is a better solution.
In Ansi versions of Delphi (prior to D2009), UTF8Encode() returns a UTF-8 encoded AnsiString. In Unicode versions (D2009 and later), it returns a UTF-8 encoded RawByteString with a code page of CP_UTF8 (65001) assigned to it.
In Ansi versions, ShowMessage() takes an AnsiString as input, and the UTF-8 string is an AnsiString, so it gets displayed as-is. In Unicode versions, ShowMessage() takes a UTF-16 encoded UnicodeString as input, so the UTF-8 encoded RawByteString gets converted to UTF-16 using its assigned CP_UTF8 code page.
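The conversion described above can be observed directly. The following is a minimal sketch for Unicode versions of Delphi (2009 and later); the procedure name is made up for illustration, and it assumes that UTF8Encode and StringCodePage behave as described in this answer.

```pascal
uses
  SysUtils;

// Hypothetical demonstration routine for Delphi 2009 and later.
procedure ShowImplicitUtf8Conversion;
var
  Blob: RawByteString;
  Decoded: string;
begin
  // U+00E4 is "ä"; it is written as a character code so that the
  // encoding of the source file does not matter.
  Blob := UTF8Encode(string(#$00E4));
  // The payload is the two UTF-8 bytes C3 A4 ...
  Assert(Length(Blob) = 2);
  Assert((Ord(Blob[1]) = $C3) and (Ord(Blob[2]) = $A4));
  // ... and the string is tagged with code page 65001 (CP_UTF8).
  Assert(StringCodePage(Blob) = 65001);
  // Converting to a UnicodeString decodes from that code page to UTF-16,
  // which is also what happens when the value is passed to ShowMessage.
  Decoded := string(Blob);
  Assert(Decoded = #$00E4);
end;
```

So the bytes in memory are still UTF-8; it is only the conversion to the native string type that makes the text look decoded again.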
If you actually wrote the blob data directly to the database you would find that it may or may not be UTF-8 encoded, depending on how you are writing it. But your approach is wrong; the use of RawByteString is incorrect in this situation. RawByteString is meant to be used as a procedure parameter only. Do not use it as a local variable. That is the source of your problem. From the documentation:
The purpose of RawByteString is to reduce the need for multiple
overloads of procedures that read string data. This means that
parameters of routines that process strings without regard for the
string's code page should typically be of type RawByteString.
RawByteString should only be used as a parameter type, and only in
routines which otherwise would need multiple overloads for AnsiStrings
with different codepages. Such routines need to be written with care
for the actual codepage of the string at run time.
For Unicode versions of Delphi, instead of RawByteString, I would suggest that you use TBytes to hold your UTF-8 data, and encode it with TEncoding:
var
utf8: TBytes;
str: string;
...
str := ...;
utf8 := TEncoding.UTF8.GetBytes(str);
You are looking for a data type that does not perform implicit text encodings when passed around, and TBytes is that type.
For Ansi versions of Delphi, you can use AnsiString, WideString and UTF8Encode exactly as you do.
Personally however, I would recommend using TBytes consistently for your UTF-8 data. So if you need a single code base that supports Ansi and Unicode compilers (ugh!) then you should create some helpers:
{$IFDEF Unicode}
function GetUTF8Bytes(const Value: string): TBytes;
begin
Result := TEncoding.UTF8.GetBytes(Value);
end;
{$ELSE}
function GetUTF8Bytes(const Value: WideString): TBytes;
var
utf8str: UTF8String;
begin
utf8str := UTF8Encode(Value);
SetLength(Result, Length(utf8str));
Move(Pointer(utf8str)^, Pointer(Result)^, Length(utf8str));
end;
{$ENDIF}
The Ansi version incurs more heap allocations than are necessary. You might well choose to write a more efficient helper that calls WideCharToMultiByte() directly.
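A more direct helper for Ansi compilers, along the lines suggested above, might look like the sketch below. It is meant as a possible replacement for the {$ELSE} branch only. The local TBytes declaration is an assumption for compilers such as Delphi 7 whose SysUtils does not declare that type; drop it if your compiler already has one.

```pascal
uses
  Windows, SysUtils;

type
  // Assumption: declared here because older Ansi compilers lack TBytes.
  TBytes = array of Byte;

function GetUTF8Bytes(const Value: WideString): TBytes;
var
  Len: Integer;
begin
  Result := nil;
  if Value = '' then
    Exit;
  // First call: ask for the size of the UTF-8 form in bytes.
  Len := WideCharToMultiByte(CP_UTF8, 0, PWideChar(Value), Length(Value),
    nil, 0, nil, nil);
  if Len = 0 then
    RaiseLastOSError;
  SetLength(Result, Len);
  // Second call: encode straight into the byte array, avoiding the
  // intermediate UTF8String and the extra copy.
  if WideCharToMultiByte(CP_UTF8, 0, PWideChar(Value), Length(Value),
    PAnsiChar(@Result[0]), Len, nil, nil) = 0 then
    RaiseLastOSError;
end;
```

This allocates the result once instead of allocating a temporary string and then copying it.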
In Unicode versions of Delphi, if for some reason you don't want to use TBytes for UTF-8 data, you can use UTF8String instead. This is a special AnsiString that always uses the CP_UTF8 code page. You can then write:
var
utf8: UTF8String;
str: string;
....
utf8 := str;
and the compiler will convert from UTF-16 to UTF-8 behind the scenes for you. I would not recommend this though, because it is not supported on mobile platforms, or in Ansi versions of Delphi (UTF8String has existed since Delphi 6, but it was not a true UTF-8 string until Delphi 2009). That is, amongst other reasons, why I suggest that you use TBytes. My philosophy is, at least in the Unicode age, that there is the native string type, and any other encoding should be held in TBytes.
I have written a program in Delphi 7 that searches for *.srt files on a hard drive. The program lists the path and name of these files in a memo. Now I need to convert these files from ANSI to UTF-8, but I haven't succeeded.
The UTF8Encode function takes a WideString as its parameter and returns a UTF-8 string.
Sample:
procedure ConvertANSIFileToUTF8File(AInputFileName, AOutputFileName: TFileName);
var
Strings: TStrings;
begin
Strings := TStringList.Create;
try
Strings.LoadFromFile(AInputFileName);
Strings.Text := UTF8Encode(Strings.Text);
Strings.SaveToFile(AOutputFileName);
finally
Strings.Free;
end;
end;
Take a look at GpTextStream, which appears to work with Delphi 7. It can read and write Unicode files in older versions of Delphi (and it also works with Delphi 2009), so it should help with your conversion.
var
Latin1Encoding: TEncoding;
begin
Latin1Encoding := TEncoding.GetEncoding(28591);
try
MyTStringList.SaveToFile('some file.txt', Latin1Encoding);
finally
Latin1Encoding.Free;
end;
end;
Please read the whole answer before you start coding.
The proper answer to the question, and it is not the easy one, basically consists of three steps:
You have to determine the ANSI code page used on your computer. You can do this with the GetACP() function from the Windows API. (Important: you have to retrieve the code page as soon as possible after retrieving the file name, because the user can change it.)
You must convert your ANSI string to Unicode by calling the MultiByteToWideChar() Windows API function with the correct CodePage parameter (retrieved in the previous step). After this step you have a UTF-16 string (in practice a WideString) containing the file name list.
You have to convert the Unicode string to UTF-8 using UTF8Encode() or the WideCharToMultiByte() Windows API function. This returns the UTF-8 string you need.
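The three steps above can be sketched as follows for Delphi 7. The function name AnsiToUtf8WithCodePage is hypothetical; the code page is passed in as a parameter so that the caller can capture GetACP at the right moment.

```pascal
uses
  Windows, SysUtils;

// Hypothetical helper: converts text stored in the given ANSI code page
// to UTF-8, going through UTF-16 as described in the steps above.
function AnsiToUtf8WithCodePage(const S: AnsiString;
  CodePage: UINT): UTF8String;
var
  Wide: WideString;
  Len: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // Step 2: ANSI -> UTF-16. The first call returns the required length
  // in WideChars, the second call performs the conversion.
  Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len = 0 then
    RaiseLastOSError;
  SetLength(Wide, Len);
  if MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S),
    PWideChar(Wide), Len) = 0 then
    RaiseLastOSError;
  // Step 3: UTF-16 -> UTF-8.
  Result := UTF8Encode(Wide);
end;
```

Step 1 then happens at the call site, for example `Utf8Name := AnsiToUtf8WithCodePage(FileName, GetACP);` right after the file name has been retrieved.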
However, this only gives you a UTF-8 version of the ANSI string you started with. It is probably not the best way to solve your problem, because the file names may already have been corrupted when the ANSI functions returned them, so correct file names are not guaranteed.
The proper solution to your problem is far more complicated:
If you want to be sure that your file name list is intact, you have to make sure it is never converted to ANSI at all. You can do this by explicitly using the "W" versions of the file handling APIs. In that case you cannot use TFileStream and other ANSI file handling objects, of course; you have to call the Windows API directly.
It is not that hard, but if you already have a complex framework built on e.g. TFileStream it could be a bit of a pain in the #ss. In this case the best solution is to create a TStream descendant that uses the appropriate APIs.
I hope my answer helps you or anyone who has to deal with the same problem. (I had to not so long ago.)
I did only this:
procedure TForm1.FormCreate(Sender: TObject);
begin
Strings := TStringList.Create;
end;
procedure TForm1.Button3Click(Sender: TObject);
begin
Strings.Text := UTF8Encode(Memo1.Text);
Strings.SaveToFile('new.txt');
end;
Verified with Notepad++: UTF-8 without BOM.
Did you mean ASCII?
ASCII is backwards compatible with UTF-8.
http://en.wikipedia.org/wiki/UTF-8