How to fetch Japanese characters in Delphi 7 - delphi

I have a problem displaying Japanese characters, specifically the Unicode character "5c", in my Delphi application. I need to save the application names into the registry and then display them in some kind of popup.
I have narrowed the problem down to this code:
var
  Str: WideString;
  Str2: WideString;
  Str3: WideString;
begin
  TntEdit5.Text := TntOpenDialog1.FileName; // correctly displayed
  Str3 := TntEdit5.Text;
  ShowMessage('Original = ' + Str3);
  Str := UTF8Encode(TntEdit5.Text);
  ShowMessage('UTF8Encode = ' + Str);
  Str2 := UTF8Decode(Str);
  ShowMessage('UTF8Decode = ' + Str2);
end;
I don't get the correct name in Str, Str2 or Str3. So how do I fetch the name into a string?
I don't just want to display the text; I want to use it to save to the registry and in other functions.
Instead of ShowMessage, I used MessageBoxW(Form1.Handle, PWChar(Str3), 'Path', MB_OK);, which gave me the correct result.
But I want to use this string internally, for example to write it to a file. How do I do that?
Thanks In Advance

The type of Str does not match the result type of UTF8Encode, so the line Str := UTF8Encode(...) damages the data. Instead of Str, declare and use a variable whose type matches the result of UTF8Encode (UTF8String in Delphi 7).
The same is true for the line Str2 := UTF8Decode(Str): the Str argument has the wrong data type. Pass a variable of the proper type instead.
Make sure Str3 is declared (Str3: WideString;); otherwise the code won't even compile.
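A minimal sketch of the type fix, assuming Delphi 7, where UTF8Encode returns a UTF8String and UTF8Decode takes one. The program name and the two sample characters (U+65E5, U+672C) are illustrative choices, not taken from the question. The UTF-8 bytes stay in a UTF8String, and only the decoded result goes back into a WideString:

```pascal
program Utf8RoundTrip;
{$APPTYPE CONSOLE}

var
  S: WideString;       // original text (UTF-16)
  Utf8: UTF8String;    // same type as the result of UTF8Encode
  Decoded: WideString; // same type as the result of UTF8Decode
begin
  // Build the text from code points so the source file encoding does not matter
  SetLength(S, 2);
  S[1] := WideChar($65E5);
  S[2] := WideChar($672C);

  Utf8 := UTF8Encode(S);        // UTF-8 bytes stay in an 8-bit string type
  Decoded := UTF8Decode(Utf8);  // converted back to UTF-16 only here

  // Each of the two characters takes 3 bytes in UTF-8
  Assert(Length(Utf8) = 6);
  Assert(Decoded = S);
  WriteLn('Round trip OK');
end.
```

Assigning the result of UTF8Encode directly to a WideString, as in the question, makes Delphi 7 reinterpret the UTF-8 bytes through the ANSI code page, which is what garbles the text.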
In Delphi 7, ShowMessage does not work with UTF-16 text, so you need to make your own popup function that does.
Make your own dialog containing a TNT Unicode-aware label to display the text. Your new ShowMessage-like function would then set the label's caption and display that dialog instead of the stock, Unicode-unaware one.
You may look at http://blog.synopse.info/post/2011/03/05/Open-Source-SynTaskDialog-unit-for-XP%2CVista%2CSeven for an example of such dialogs, but I don't know whether they are UTF-16 aware on D7.
Another option is to search the TNT sources for a ready-made Unicode-aware function like ShowMessage; there may or may not be one.
Yet another option is to use the Win32 API directly, namely the MessageBoxW function, which takes PWideChar arguments for its texts: see http://msdn.microsoft.com/en-us/library/windows/desktop/ms645505.aspx
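A minimal sketch of the MessageBoxW route, assuming Delphi 7 with the standard Windows and Forms units. The unit name UnicodeMessage and the procedure ShowMessageW are hypothetical names for this example, not part of the RTL or of TNT:

```pascal
unit UnicodeMessage;

interface

// Hypothetical helper: a Unicode-aware replacement for ShowMessage in Delphi 7
procedure ShowMessageW(const Msg: WideString);

implementation

uses
  Windows, Forms;

procedure ShowMessageW(const Msg: WideString);
var
  Caption: WideString;
begin
  // Application.Title is an AnsiString in Delphi 7; the assignment converts it to UTF-16
  Caption := Application.Title;
  // MessageBoxW takes PWideChar texts, so the UTF-16 data is passed through unchanged
  MessageBoxW(Application.Handle, PWideChar(Msg), PWideChar(Caption),
    MB_OK or MB_ICONINFORMATION);
end;

end.
```

With this unit in the uses clause, ShowMessageW(TntEdit5.Text) could replace the ShowMessage calls from the question.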
@DavidHeffernan MessageBoxW needs a lot of boilerplate, both because it uses C strings and because it offers too much flexibility. It may be a good replacement for MessageDlg, but less so for ShowMessage. Besides, I am sure that TNT has a ShowMessage conversion, and implementing your own dialog would be good both for the application's look and feel and for the original poster's experience.
You could also switch from the obsolete Delphi 7 to the modern CodeTyphon, which uses UTF-8 for strings out of the box. At the very least, give it a try.
To read and write a WideString from the registry using the Delphi 7 RTL, there are two easy options:
Convert the WideString to a UTF-8 AnsiString, save it via TRegistry.WriteString, and do the reverse conversion when reading.
Save the WideString as binary data: a Cardinal length followed by the array of WideChar, using TRegistry.WriteBinaryData.
You can also use the functions RegReadWideString(const RootKey: DelphiHKEY; const Key, Name: string): WideString; and RegWriteWideString from the JEDI Code Library (http://jcl.sf.net).
Whichever approach you choose, you should build your own class on top of TRegistry that implements new TYourNewRegistry.WriteWideString and TYourNewRegistry.ReadWideString methods uniformly, so that a string written by one method is always read back by the same method.
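A minimal sketch of such a class using the binary option, assuming Delphi 7's Registry unit. TWideRegistry is a hypothetical name; the length prefix is stored as a 32-bit Integer rather than a Cardinal, which is the same four bytes for any real string length:

```pascal
unit WideRegistry;

interface

uses
  Registry;

type
  // Hypothetical helper class, not part of the Delphi 7 RTL.
  // Stores a WideString as binary data: a 32-bit length followed by the UTF-16 characters.
  TWideRegistry = class(TRegistry)
  public
    procedure WriteWideString(const Name: string; const Value: WideString);
    function ReadWideString(const Name: string): WideString;
  end;

implementation

procedure TWideRegistry.WriteWideString(const Name: string; const Value: WideString);
var
  Buf: array of Byte;
  Len: Integer;
begin
  Len := Length(Value);
  SetLength(Buf, SizeOf(Len) + Len * SizeOf(WideChar));
  Move(Len, Buf[0], SizeOf(Len));                              // length prefix
  if Len > 0 then
    Move(Value[1], Buf[SizeOf(Len)], Len * SizeOf(WideChar));  // character data
  WriteBinaryData(Name, Buf[0], Length(Buf));
end;

function TWideRegistry.ReadWideString(const Name: string): WideString;
var
  Buf: array of Byte;
  Size, Len: Integer;
begin
  Result := '';
  Size := GetDataSize(Name);           // -1 if the value does not exist
  if Size < SizeOf(Len) then
    Exit;
  SetLength(Buf, Size);
  ReadBinaryData(Name, Buf[0], Size);  // raises if the value is not binary data
  Move(Buf[0], Len, SizeOf(Len));
  // Ignore data whose length prefix does not fit the stored size
  if (Len < 0) or (Len > (Size - SizeOf(Len)) div SizeOf(WideChar)) then
    Exit;
  SetLength(Result, Len);
  if Len > 0 then
    Move(Buf[SizeOf(Len)], Result[1], Len * SizeOf(WideChar));
end;

end.
```

It is used exactly like TRegistry: create it, set RootKey, call OpenKey, then use WriteWideString and ReadWideString on the opened key.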
However, since you already have TNT installed, look carefully inside it: there may well be a ready-made Unicode-aware class like TTntRegistry or something similar.

Related

How to correctly encode a string to UTF-8 in Delphi 10?

I am trying to replace some wildcards in HTML code to send it by email.
The problem is that when I replace the wildcard in the string 'España$country$' with the string 'España', the result is 'EspañaEspa?a'. I had the same problem before in Delphi 7 and solved it by using UTF8Encode('España'), but that does not work in Delphi 10.
I have tried 'España', UTF8Encode('España') and AnsiToUTF8('España'). I also tried replacing StringReplace with ReplaceStr and ReplaceText, with the same result.
......
var htmlText : TStringList;
......
htmlText := TStringList.Create;
htmlText.LoadFromFile('path.html');
htmlText.Text := StringReplace(htmlText.Text, '$country$', UTF8Encode('España'), [rfReplaceAll]);
htmlText.SaveToFile('anotherpath.html');
......
This StringReplace together with UTF8Encode works in Delphi 7, producing 'España', but not in Delphi 10, where anotherpath.html contains 'Espa?a'.
The Delphi 7 string type, and consequently TStrings, did not support Unicode, which is why you needed to use UTF8Encode.
Since Delphi 2009, Unicode is supported: string maps to UnicodeString, and TStrings is a collection of such strings. Note that UnicodeString is internally encoded as UTF-16, although that's not a detail you need to be concerned with here.
Since you are now using a Delphi that supports Unicode, your code can be much simpler. You can now write it like this:
htmlText.Text := StringReplace(htmlText.Text, '$country$', 'España', [rfReplaceAll]);
Note that if you want the file to be encoded as UTF-8, you need to specify that when you save it, like this:
htmlText.SaveToFile('anotherpath.html', TEncoding.UTF8);
And you may also need to specify the encoding when loading the file in case it does not include a UTF-8 BOM:
htmlText.LoadFromFile('path.html', TEncoding.UTF8);
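Putting these lines together, a minimal sketch for Delphi 10 might look like the following. The unit name and the ReplaceCountry procedure are hypothetical; the sketch assumes path.html really is UTF-8 and that the .pas file itself is saved in an encoding that preserves the 'ñ' literal (UTF-8 is the safe choice):

```pascal
unit CountryReplace;

interface

// Hypothetical helper: replaces $country$ in a UTF-8 HTML template
procedure ReplaceCountry(const InFile, OutFile: string);

implementation

uses
  System.SysUtils, System.Classes;

procedure ReplaceCountry(const InFile, OutFile: string);
var
  HtmlText: TStringList;
begin
  HtmlText := TStringList.Create;
  try
    // Read the template as UTF-8 even if it has no BOM
    HtmlText.LoadFromFile(InFile, TEncoding.UTF8);
    // string is UnicodeString here, so no UTF8Encode is needed
    HtmlText.Text := StringReplace(HtmlText.Text, '$country$', 'España', [rfReplaceAll]);
    // Write the result back out as UTF-8
    HtmlText.SaveToFile(OutFile, TEncoding.UTF8);
  finally
    HtmlText.Free;
  end;
end;

end.
```

Be aware that saving with TEncoding.UTF8 normally writes a UTF-8 BOM at the start of the file.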

How to use an arbitrary string encoding?

I'm trying to get some code working against an API published by a Chinese company. I have a spec and some sample code (in Java), enough to understand most of what's going on, but I ran across one thing I don't know how to do.
String ecodeform = "GBK";
String sm = new String(Hex.encodeHex("Insert message here".getBytes(ecodeform))); //test message
It's creating a string from the char array result of the hex representation of the original string, encoded in GBK format (the standard Chinese character encoding, equivalent to ASCII for English text). I can work out how to do most of that in Delphi, but I don't know how to encode a string to GBK, which is specifically required by this API.
In SysUtils, there's a TEncoding class that comes with a few built-in encodings, such as UTF8, UTF16, and "Default" (the system's default code page), but I don't know how to set up a TEncoding for an arbitrary encoding such as GBK.
Does anyone know how to set this up?
You can use the TEncoding.GetEncoding() method to get a TEncoding object for a specific codepage/charset, eg:
var
Enc: TEncoding;
Bytes: TBytes;
begin
Enc := TEncoding.GetEncoding(936); // or TEncoding.GetEncoding('gb2312')
try
Bytes := Enc.GetBytes('Insert message here');
finally
Enc.Free;
end;
// encode Bytes to hex string as needed...
end;
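To finish the "encode Bytes to hex string" step, here is a sketch that mirrors the Java code. It assumes the Java side uses Apache Commons Codec's Hex.encodeHex, whose default output uses lowercase hex digits. The unit and function names (GbkHex, BytesToLowerHex, EncodeGbkHex) are made up for this example, and the code assumes Delphi XE2 or later (System.SysUtils unit name) on the Windows desktop compiler:

```pascal
unit GbkHex;

interface

uses
  System.SysUtils;

function BytesToLowerHex(const Bytes: TBytes): string;
function EncodeGbkHex(const S: string): string;

implementation

function BytesToLowerHex(const Bytes: TBytes): string;
const
  HexDigits: array[0..15] of Char = '0123456789abcdef';
var
  I: Integer;
begin
  SetLength(Result, Length(Bytes) * 2);
  for I := 0 to High(Bytes) do
  begin
    // 1-based string indexing, as on the Windows desktop compiler
    Result[I * 2 + 1] := HexDigits[Bytes[I] shr 4];
    Result[I * 2 + 2] := HexDigits[Bytes[I] and $0F];
  end;
end;

function EncodeGbkHex(const S: string): string;
var
  Enc: TEncoding;
begin
  Enc := TEncoding.GetEncoding(936); // Windows code page 936 (GBK)
  try
    Result := BytesToLowerHex(Enc.GetBytes(S));
  finally
    Enc.Free;
  end;
end;

end.
```

For pure ASCII input such as the test message, the GBK bytes are identical to ASCII, so EncodeGbkHex('Insert message here') gives '496e73657274206d6573736167652068657265'.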
TEncoding has a GetEncoding method for that. Give it the encoding name or number, and it will return a TEncoding instance.
For GBK, the number I think you want is 936. See Microsoft's list of code pages for more.

Convert WideString to PWideChar

I use the Nicomsoft OCR library to OCR images in Delphi. It is good for my tasks and it comes with a Delphi wrapper unit, so it's easy to use from Delphi. However, the Delphi debugger shows a "Range Error" message when I pass an empty string as a parameter value to some OCR functions. I checked the wrapper code and found that the DLL functions accept PWideChar parameters, but the wrapper accepts WideString. Inside the wrapper unit there is the following conversion:
function CallSomeOCRFunction(a: WideString);
var b: PWideChar;
begin
  b := @a[1];
  CallSomeDLLFunction(b); //passing "b" to DLL function that accepts PWideChar
  //.....
I did some research and discovered that many FAQs offer such conversion, for example: http://www.delphibasics.co.uk/RTL.asp?Name=PWideChar
It works if "a" is not an empty string, but for an empty string it causes a "Range" error. How can I correctly get a pointer to the first character of a WideString variable, even if it is an empty string? As far as I understand, even an empty string must contain a zero character, and the PWideChar variable must point to it.
Use PWideChar() cast as described in the documentation. In your case it would be:
CallSomeDLLFunction(PWideChar(a));
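A small console sketch of why the cast works where @a[1] fails (the program name is illustrative). An empty WideString is a nil pointer internally, so a[1] indexes past the end and trips range checking, while the PWideChar cast returns a pointer to a #0 terminator:

```pascal
program EmptyPWideChar;
{$APPTYPE CONSOLE}

var
  A: WideString;
  P: PWideChar;
begin
  A := '';
  // For an empty string the cast does not return nil; it points at a #0 terminator
  P := PWideChar(A);
  Assert(P <> nil);
  Assert(P^ = #0);

  A := 'abc';
  P := PWideChar(A);   // for non-empty strings it points at the first character
  Assert(P^ = 'a');
  WriteLn('OK');
end.
```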

Is a PChar UTF-8 coded?

I'm writing a tool that uses a C DLL. The functions of the DLL expect a char* in UTF-8 format.
My question: Can I pass a PChar or do I have to use UTF8Encode(string)?
Consider a string variable named s. On an ANSI Delphi PChar(s) is ANSI encoded. On a Unicode Delphi it is UTF-16 encoded.
Therefore, either way, you need to convert s to UTF-8 encoding. And then you can use PAnsiChar(...) to get a pointer to a null terminated C string.
So, the code you need looks like this:
PAnsiChar(UTF8Encode(s))
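A console sketch (program name illustrative) that checks what such a pointer actually points at. It builds 'España' from a code point so the source file encoding does not matter, encodes it to UTF-8, and inspects the bytes through PAnsiChar; the 'ñ' becomes the two bytes $C3 $B1 and the data is null-terminated:

```pascal
program Utf8PAnsiCharDemo;
{$APPTYPE CONSOLE}

var
  W: WideString;
  U: UTF8String;
  P: PAnsiChar;
begin
  W := 'Espa a';
  W[5] := WideChar($00F1);   // U+00F1, the 'ñ' character
  U := UTF8Encode(W);
  P := PAnsiChar(U);         // what you would hand to the C DLL

  Assert(Length(U) = 7);     // 6 characters, but 'ñ' needs 2 bytes
  Assert(Ord(P[4]) = $C3);
  Assert(Ord(P[5]) = $B1);
  Assert(P[7] = #0);         // null terminator after the last byte
  WriteLn('OK');
end.
```

Keep U alive for as long as the DLL uses the pointer; PAnsiChar(U) does not copy the data.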
Please edit the question and add the tag with your target Delphi version.
Pass it as PAnsiChar. PChar is a wildcard type that may mean different data types depending on the compiler version. When you work with a DLL-style API, you bypass the compiler's safety net, which means you have to build your own. That means using explicit types, not wildcards: types that do not change regardless of which compiler settings and version are active.
But before passing the pointer, you should make sure the source data is actually encoded in UTF-8.
var
  data: string;
  buffer: UTF8String;
  buffer_ptr: PAnsiChar;
begin
  buffer := data + #0;
  // transcoding to UTF-8 from whatever charset it was, done transparently by the Delphi RTL (Delphi 2009 and later)
  // the trailing zero ensures that even for an empty string you get a valid pointer below
  buffer_ptr := Pointer(@buffer[1]); // making sure there can be no codepage bound to the datatype
  C_DLL_CALL(buffer_ptr);
end;

How can a text file be converted from ANSI to UTF-8 with Delphi 7?

I have written a program in Delphi 7 that searches for *.srt files on a hard drive. The program lists the path and name of these files in a memo. Now I need to convert these files from ANSI to UTF-8, but I haven't succeeded.
The Utf8Encode function takes a WideString as its parameter and returns a UTF-8 string.
Sample:
procedure ConvertANSIFileToUTF8File(AInputFileName, AOutputFileName: TFileName);
var
Strings: TStrings;
begin
Strings := TStringList.Create;
try
Strings.LoadFromFile(AInputFileName);
Strings.Text := UTF8Encode(Strings.Text);
Strings.SaveToFile(AOutputFileName);
finally
Strings.Free;
end;
end;
Take a look at GpTextStream, which looks like it works with Delphi 7. It can read and write Unicode files in older versions of Delphi (and it also works with Delphi 2009), so it should help with your conversion.
var
Latin1Encoding: TEncoding;
begin
Latin1Encoding := TEncoding.GetEncoding(28591);
try
MyTStringList.SaveToFile('some file.txt', Latin1Encoding);
finally
Latin1Encoding.Free;
end;
end;
Please read the whole answer before you start coding.
The proper answer to the question (and it is not an easy one) basically consists of three steps:
You have to determine the ANSI code page used on your computer. You can achieve this goal by using the GetACP() function from Windows API. (Important: you have to retrieve the codepage as soon as possible after the file name retrieval, because it can be changed by the user.)
You must convert your ANSI string to Unicode by calling MultiByteToWideChar() Windows API function with the correct CodePage parameter (retrieved in the previous step). After this step you have an UTF-16 string (practically a WideString) containing the file name list.
You have to convert the Unicode string to UTF-8 using UTF8Encode() or the WideCharToMultiByte() Windows API function. This returns the UTF-8 string you need.
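The three steps can be sketched in Delphi 7 as follows. AnsiToUtf8CP is a hypothetical name (chosen to avoid clashing with the RTL's own AnsiToUtf8); the caller is expected to obtain the code page with GetACP as early as possible, as advised above, and pass it in:

```pascal
unit AnsiToUtf8Steps;

interface

uses
  Windows;

// Hypothetical helper: converts ANSI text in the given code page to UTF-8.
// Step 1 (GetACP) is left to the caller so the code page can be captured early.
function AnsiToUtf8CP(const S: AnsiString; CodePage: UINT): UTF8String;

implementation

uses
  SysUtils;

function AnsiToUtf8CP(const S: AnsiString; CodePage: UINT): UTF8String;
var
  Len: Integer;
  W: WideString;
begin
  Result := '';
  if S = '' then
    Exit;
  // Step 2: ANSI -> UTF-16; the first call only asks for the required length
  Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len = 0 then
    RaiseLastOSError;
  SetLength(W, Len);
  MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), PWideChar(W), Len);
  // Step 3: UTF-16 -> UTF-8
  Result := UTF8Encode(W);
end;

end.
```

A call would look like Utf8Text := AnsiToUtf8CP(Memo1.Text, SavedCodePage), where SavedCodePage is a hypothetical variable holding the GetACP result captured right after the file names were retrieved.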
However, this solution returns a UTF-8 string containing only what the input ANSI string contained. This is probably not the best way to solve your problem, since the file names may already be corrupted when the ANSI functions return them, so correct file names are not guaranteed.
The proper solution to your problem is considerably more complicated:
If you want to be sure that your file name list is completely clean, you have to make sure it never gets converted to ANSI at all. You can do this by explicitly using the "W" versions of the file-handling APIs. In this case, of course, you cannot use TFileStream and other ANSI file-handling classes; you have to call the Windows API directly.
It is not that hard, but if you already have a complex framework built on, e.g., TFileStream, it could be a bit of a pain. In that case the best solution is to create a TStream descendant that uses the appropriate APIs.
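A minimal sketch of such a descendant, assuming Delphi 7. TWideFileStream is a hypothetical class that builds on THandleStream and opens the file through CreateFileW, so the file name never passes through the ANSI code page. To keep it short it only distinguishes fmCreate (create or overwrite, read/write) from everything else (open existing, read-only):

```pascal
unit WideFileStream;

interface

uses
  Windows, SysUtils, Classes;

type
  // Hypothetical TStream descendant that opens files via CreateFileW
  TWideFileStream = class(THandleStream)
  public
    // fmCreate creates/overwrites the file; any other Mode opens it read-only
    constructor Create(const FileName: WideString; Mode: Word);
    destructor Destroy; override;
  end;

implementation

constructor TWideFileStream.Create(const FileName: WideString; Mode: Word);
var
  H: THandle;
begin
  if Mode = fmCreate then
    H := CreateFileW(PWideChar(FileName), GENERIC_READ or GENERIC_WRITE, 0, nil,
      CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0)
  else
    H := CreateFileW(PWideChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
      OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if H = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  inherited Create(H);
end;

destructor TWideFileStream.Destroy;
begin
  // THandleStream does not close its handle, so do it here
  if Handle <> 0 then
    CloseHandle(THandle(Handle));
  inherited Destroy;
end;

end.
```

Because it is a TStream, existing code can keep using LoadFromStream and SaveToStream on TStringList; only the file opening changes.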
I hope my answer helps you or anyone who has to deal with the same problem. (I had to not so long ago.)
I did only this:
procedure TForm1.FormCreate(Sender: TObject);
begin
Strings := TStringList.Create;
end;
procedure TForm1.Button3Click(Sender: TObject);
begin
Strings.Text := UTF8Encode(Memo1.Text);
Strings.SaveToFile('new.txt');
end;
Verified with Notepad++: UTF-8 without BOM.
Did you mean ASCII?
UTF-8 is backwards compatible with ASCII.
http://en.wikipedia.org/wiki/UTF-8
