Chinese Symbols When Loading Text - delphi

I am loading text from text file to richedit but it displays weird Chinese symbols instead, what am I doing wrong?
ms := TMemoryStream.Create;
ms.LoadFromFile('C:\aw.txt');
ms.Seek(0, soFromEnd);
zChar:=#0;
ms.Write(zChar, 1);
ms.Seek(0, soFromBeginning);
RichEdit1.SetSelTextBuf(ms.Memory);
ms.free;

Edit: I am revising my answer in light of the comments on the question, especially the hint about Delphi 7.
RichEdit is based on richedit.dll, which comes from Microsoft and ships with Windows. After Windows ME, it is Unicode-enabled, so it determines the character set by interpreting the first 2 bytes of the file as a BOM (byte order mark). ANSI and ASCII files carry no BOM for compatibility reasons, so in some cases their first characters are misinterpreted as one. The same behavior can be seen in write.exe.
Make sure you use the right encoding when saving the file in Notepad. If the file does not have a BOM (look at the first two bytes in a binary viewer), try, if possible, adding two spaces to the front and see whether the problem persists.
Delphi 2009 and 2010
I will leave my first answer in to help people when upgrading to Delphi 2009 and up:
I would actually say that the text file has no BOM but is pure ASCII or ANSI, and that you are using Delphi 2009 or 2010, which is Unicode-enabled. The first two characters will be taken as a BOM (which tells the program which Unicode encoding is used). If they happen to form a valid BOM, the wrong encoding may be applied.
TMemoryStream does not allow enforcement of encoding.
If possible, you can use TStrings, whose LoadFromFile method has a new TEncoding parameter. This would look like:
RichEdit1.Lines.LoadFromFile('c:\test.txt', TEncoding.ASCII);
Have a look at this page as well: http://edn.embarcadero.com/article/38693
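For Delphi 2009 and later, the TStrings approach can be sketched as below. This is a minimal sketch, not a definitive fix: the helper name LoadAnsiTextFile is hypothetical, and it assumes the file is plain ANSI text in the system code page. TEncoding.Default decodes with that code page, whereas TEncoding.ASCII covers only 7-bit characters, so accented characters would not survive it.

```pascal
// Needs SysUtils (TEncoding) and ComCtrls (TRichEdit) in the uses clause.
// LoadAnsiTextFile is a hypothetical helper name, not part of the VCL.
procedure LoadAnsiTextFile(RichEdit: TRichEdit; const FileName: string);
begin
  // Treat the file as plain text rather than RTF.
  RichEdit.PlainText := True;
  // Decode the bytes with the system ANSI code page (assumption about the file).
  RichEdit.Lines.LoadFromFile(FileName, TEncoding.Default);
end;
```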

Related

Converting special characters in TStringList

I am using Delphi 7 and have a routine which takes a csv file with a series of records and imports them. This is done by loading it into a TStringList with MyStringList.LoadFromFile(csvfile) and then getting each line with line = MyStringList[i].
This has always worked fine but I have now discovered that special characters are not picked up correctly. For example, Rue François Coppée comes out as Rue FranÃ§ois CoppÃ©e - the accented French characters are the problem.
Is there a simple way to solve this?
Your file is encoded as UTF-8. For instance, consider the ç. It is encoded in UTF-8 as the two bytes 0xC3 0xA7. In Windows-1252, 0xC3 encodes Ã and 0xA7 encodes §, which is exactly the garbage you see.
Whether or not you can handle this easily using your ANSI Delphi depends on the prevailing code page under which your program runs.
If you are using Windows 1252 then you will be fine. You just need to decode the UTF-8 encoded text with a call to UTF8Decode.
If you are using a different locale then life gets more difficult. Those characters may not be present in your locale's character set and in that case you cannot represent them in a Delphi string variable which is encoded using the prevailing ANSI charset. If this is the case then you need to use Unicode.
If you care about handling international text then you need to either:
Upgrade to a modern Delphi which has Unicode support, or
Stick to Delphi 7 and use WideString and the TNT Unicode components.
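For the Windows-1252 case, the decoding step might look roughly like this in Delphi 7. It is a sketch under stated assumptions: LoadUtf8File is a hypothetical helper name, the file is assumed to be valid UTF-8, and every character in it is assumed to exist in the current ANSI code page (anything else is lost in the conversion). Utf8ToAnsi is the System routine that combines UTF8Decode with the conversion back to an ANSI string.

```pascal
// Delphi 7 sketch. Needs Classes in the uses clause.
// LoadUtf8File is a hypothetical helper name.
procedure LoadUtf8File(const FileName: string; Dest: TStrings);
const
  Utf8Bom = #$EF#$BB#$BF; // optional UTF-8 byte order mark
var
  Raw: TStringList;
  Line: string;
  I: Integer;
begin
  Raw := TStringList.Create;
  try
    Raw.LoadFromFile(FileName); // bytes are loaded unchanged
    Dest.BeginUpdate;
    try
      Dest.Clear;
      for I := 0 to Raw.Count - 1 do
      begin
        Line := Raw[I];
        // Drop the BOM if the file starts with one.
        if (I = 0) and (Copy(Line, 1, 3) = Utf8Bom) then
          Delete(Line, 1, 3);
        // UTF-8 bytes -> string in the current ANSI code page.
        Dest.Add(Utf8ToAnsi(Line));
      end;
    finally
      Dest.EndUpdate;
    end;
  finally
    Raw.Free;
  end;
end;
```

If the text has to survive on machines with other code pages, keeping the result of UTF8Decode in a WideString instead avoids the second, lossy conversion.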
Probably it's not in UTF8 encoding. Try to convert it:
Text := UTF8Encode(Text);

ANSI application and Vietnam Codepage

I've changed the code page of my system to Russian, as explained on this site.
The PC was rebooted.
Then I created a file in a directory, with a name containing special Russian characters.
Then I listed all files in this directory and tried to show the file name with typical Delphi 7 code using:
SearchRec: TSearchRec;
FindFirst
showmessage(SearchRec.Name);
FindNext(SearchRec);
FindClose(SearchRec);
The code works well.
When I redo all 4 steps with Vietnamese instead of Russian, the file name shown with ShowMessage is not correct. Some ? characters appear instead.
Please help
This is due to the way Delphi versions prior to 2009 implement their string type. It is not a font problem, but a character encoding problem.
All string variables, and also all Windows API calls made by the RTL and VCL, use ANSI encoding. With ANSI, you can use only one code page at a time. In order to mix code pages (i.e. mix Russian and Vietnamese encodings), you need to process the text as Unicode and call the Unicode Windows API.
Here is what occurred in your case:
You create a file with Russian characters in Windows, which stores the name using Unicode encoding;
When you read the file name with Vietnamese as the current code page, only the first 128 characters (i.e. 7-bit ASCII: digits, main punctuation and English letters) can be converted from Unicode into ANSI Vietnamese. During the conversion, every character that cannot be represented becomes ? in your ANSI string.
So you have several workarounds:
Upgrade to Delphi >= 2009, and your string will be UNICODE, so you will be able to mix character sets;
Use widestring for storing your text, and call directly the windows wide APIs - that is, you can't use the VCL units nor FindFirst/FindNext as defined in SysUtils, nor ShowMessage as defined in Dialogs.
Of course, the first one is the easiest!
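The second workaround could be sketched as follows for Delphi 7. This is only an illustration under assumptions: ShowFileNamesW is a hypothetical helper name, and it lists every entry in the directory, including "." and "..". It calls FindFirstFileW, FindNextFileW and MessageBoxW from the Windows unit directly, so the file name never passes through an ANSI string.

```pascal
// Delphi 7 sketch. Needs Windows in the uses clause.
// ShowFileNamesW is a hypothetical helper name.
procedure ShowFileNamesW(const Dir: WideString);
var
  FindData: TWin32FindDataW;
  SearchHandle: THandle;
  Mask, Name: WideString;
begin
  Mask := Dir + '\*';
  SearchHandle := FindFirstFileW(PWideChar(Mask), FindData);
  if SearchHandle = INVALID_HANDLE_VALUE then
    Exit;
  try
    repeat
      // cFileName is a zero-terminated array of WideChar (UTF-16).
      Name := FindData.cFileName;
      // The wide message box does not depend on the ANSI code page.
      MessageBoxW(0, PWideChar(Name), 'File', MB_OK);
    until not FindNextFileW(SearchHandle, FindData);
  finally
    // Qualified, because SysUtils declares a different FindClose.
    Windows.FindClose(SearchHandle);
  end;
end;
```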

how to know (in code) that some characters are displayed fine (or not) in the interface of a program made in Delphi

Sorry about my english...
I'm trying to make a small program in Delphi 7.
Its interface will have text in my language, which has some characters with diacritics.
If "Language for non-Unicode programs" is set to my language those characters are always displayed fine. That's normal.
If it is set to something else, they are sometimes displayed fine and sometimes not.
How can I know that they can be displayed fine or not...?
Oh, and I can't use Unicode components, only normal.
The only way I have found is to capture the image of one character into a bitmap and check it pixel by pixel. But that is a lot of work to implement, slow and imprecise.
I can use GetSystemDefaultLangID function and know that "Language for non-Unicode programs" is set to something else but still don't know if they are displayed fine or not.
Thank you for any idea.
Welcome to the joys of AnsiStrings encoded using code pages. You should not be using AnsiStrings at all, and you know that, but you say, without explaining why, that you can't use Unicode controls. This seems strange to me. You should be using either:
(a) A Unicode version of Delphi (2009,2010, XE), where String=UnicodeString.
(b) If not that, at least use Proper Unicode controls, such as TNT Controls, and internally use WideString types where you need to store accented or international characters.
Your version of Delphi has String=AnsiString, and you are relying on the locale that your system is set to (as you say in your question) to select the code page representations of accented characters, which is a problematic scheme. If you really can't move up from Delphi 7, at least start using WideStrings and TNT Unicode Controls, but I must say that this effort is WASTED: you would be better off getting Delphi XE and just porting to Unicode.
Your question asks "how can I know if they can be stored fine or not?" You can encode and decode using your code page, and check whether anything is replaced with a "?". The Windows function WideCharToMultiByte, for example, behaves like this. MBCS is a world of pain, and not worth doing, but you asked how you can find out where the floor falls out from under you, so that API will help you understand your selected encoding rule.
Use WideCharToMultiByte Function - http://msdn.microsoft.com/en-us/library/dd374130(v=vs.85).aspx and check lpUsedDefaultChar parameter.
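A check along those lines might look like this in Delphi 7. It is a sketch under assumptions: IsRepresentableInAcp is a hypothetical helper name, and it only tells you whether the characters exist in the current ANSI code page, not whether the font in use has glyphs for them. The WC_NO_BEST_FIT_CHARS flag is used so that a lookalike substitution (an accented letter silently mapped to its base letter) counts as a failure.

```pascal
// Delphi 7 sketch. Needs Windows in the uses clause.
// IsRepresentableInAcp is a hypothetical helper name.
function IsRepresentableInAcp(const S: WideString): Boolean;
const
  // Flag value from the Windows SDK; declared locally in case the
  // Windows unit of this Delphi version does not provide it.
  WC_NO_BEST_FIT_CHARS = $00000400;
var
  Buffer: AnsiString;
  Written: Integer;
  UsedDefault: BOOL;
begin
  Result := True;
  if S = '' then
    Exit;
  // Generous output buffer: up to 4 bytes per UTF-16 code unit.
  SetLength(Buffer, Length(S) * 4);
  UsedDefault := False;
  Written := WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS,
    PWideChar(S), Length(S), PAnsiChar(Buffer), Length(Buffer),
    nil, @UsedDefault);
  // UsedDefault is set when at least one character had to be replaced.
  Result := (Written > 0) and not UsedDefault;
end;
```

The interface text would have to be available as a WideString for this test, for example built from WideChar code points, since string literals in a Delphi 7 source file are themselves ANSI.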
Since this has been on my research list for a while, but didn't reach the top of that list yet, I can only help you out with a few links.
You will need to do quite a bit of experimentation :-)
When using Unicode, you can use functions ScriptGetCMap and GetGlyphIndices to test if a code point is in the font.
When not using Unicode, you can use the function GetGlyphIndices
There are few Delphi translations of these functions around. This Borland Newsgroup thread has a few hints on using GetGlyphIndices in Delphi.
Here is a search ScriptGetCMap in Delphi.
This page has a list of some interesting API calls that might help you further.
An extra handicap is that not all fonts contain all characters, so Windows can do font substitution for you.
I'm not sure how to detect that, but it is something you have to check for too.
Good luck :-)
procedure TForm1.Button2Click(Sender: TObject);
var
  ACP: Integer;
begin
  ACP := GetACP;
  Caption := 'CP' + IntToStr(ACP);
  if ACP = 1250 then
    Caption := Caption + ' is okay for Romanian language';
end;

FormatDateTime with chinese location - wrong characters... Delphi 2007

Output: Period: from 11-Ê®¶þÔÂ-10 to 13-Ê®¶þÔÂ-10
The above output is from a line like this:
FormatDateTime('dd-mmm-yy', dateValue)
The IDE is Delphi 2007 and we are trying to gear up our app to the Chinese market.
How can I display the correct characters?
With the setting turned to Hindi (India), I get "?" instead of the funny characters.
I'm trying to display the date on a report, using ReportBuilder 11.
Any help will be much appreciated.
The characters seem to be correct, only IMO they have been rendered wrong.
Here's what I've done:
copied the string as presented by the OP ("11-Ê®¶þÔÂ-10 to 13-Ê®¶þÔÂ-10");
pasted it into a blank plain-text editor window with CP 1252 (Windows Latin-1) and saved;
opened the text file in a browser;
the text showed up the same as the browser chose the same codepage, so I turned on the automatic detection of character encoding, hinting it that the contents was Chinese;
the text changed to "11-十二月-10 to 13-十二月-10" (I hope your browser displays the Chinese characters correctly here; mine does anyway) and the code page changed to GB18030 (I then tried GB2312, but the text did not change);
well, I was curious and searched for "十二月", and it turned out to stand for "December", quite suitable for the context unless the month names had been mixed up.
So, this is why I think it's a text rendering (or whatever you call it, I'm not really sure about the term) problem.
EDIT: Of course, it must have had something to do with the data type chosen for storing the string. If the function result is AnsiString and the variable is WideString, then maybe each byte gets converted to a WideChar of its own, so the bytes are no longer one-byte components of multi-byte characters but separate characters in their own right? At least that's what happened when the OP posted them here.
I don't know actually, but if it is so then I doubt if they can be rendered correctly unless converted back and rendered as part of an AnsiString.
Another solution is to use TntControls. They're a set of standard Delphi controls enhanced to support Unicode. You'll have to go through all your form files and replace
Button1: TButton
Label1: TLabel
with TTntButton, TTntLabel et cetera.
Please note, that as things stand, it's not only Chinese which will not work. Try any language using symbols other than standard European set (latin + stress marks etc), for instance Russian.
But
By replacing the controls, you'll solve one part of the problem. Another part is that everywhere where you use "string" or "AnsiString" and "char/pchar" or "AnsiChar/PAnsiChar", you can store only strings in default system encoding.
For instance, if your system encoding ("Language for non-unicode programs") is EN/US, Russian characters will be replaced with question marks when you assign them to "string" variable:
a: WideString;
b: string;
...
a := 'ЯУЭФЫЦ'; //WideString can store international characters
b := a; //string cannot, so the data is lost - you cannot restore it from just "b"
To store string data which is independent of system encoding, use WideString/WideChar/PWideChar and appropriate functions. If you have
a, b: WideString;
...
a := UpperCase(b);
then unicode information will still be lost because UpperCase() accepts "string":
function UpperCase(const S: string): string;
Your WideString will be converted to "string" (losing all international characters), given to UpperCase, then the result will be converted back to WideString but it's already too late.
Therefore you have to replace all string functions with Wide versions:
a := WideUpperCase(b);
(for some functions, their wide versions are unavailable or called differently, TntControls also contain a bunch of wide function versions)
The Chinese Market requires support for multi-byte character sets (either WideChar or Unicode).
The Delphi 2007 RTL/VCL only supports single-byte character sets (there is very limited support for WideChar in the RTL and VCL).
The easiest option for you is to upgrade to a Delphi version that supports Unicode (Delphi 2009 was the first version that supports Unicode; the current Delphi version is Delphi XE).
Or you will need to update all your components to support WideChar, and rewrite the portions of RTL/VCL for which you need WideChar support.
--jeroen
Did you install Far East charset support in Windows? In Windows versions before 7 (or Vista), those charsets are not installed by default in Western versions; you have to add them in Control Panel -> Regional Settings, IIRC.
Unfortunately, with a non-Unicode version of Delphi, which characters can be displayed depends on the current code page. If it is not one of the Chinese ones, for example, it cannot display the characters you need. What characters are actually displayed depends on how the codes you're using are mapped in the current code page. You could use a multi-lingual version of Windows to switch fully to the locale you need, or you have to use a Unicode version of Delphi (from 2009 onwards).

read unicode output of console application

I have a console application written in Delphi 2010. Its output supports Unicode (I used UTF8Encode and SetConsoleOutputCP(CP_UTF8) for this). When I run the program from the command prompt, it works fine.
Now I want to read the output from another program, which was created in Delphi 5. I use this method, but I have problems with Unicode characters.
Does anyone have a recommendation to read the unicode output of console app. from Delphi 5?
Delphi 5 does have unicode support, but only through WideStrings which are UTF-16(-LE) encoded. Natively, D5 does not have UTF-8 support.
You can read the output of your D2010 console app in the way you already do, although I would take out the OemToAnsi conversion. OEMToAnsi was superseded (even in D5 days) by OEMToChar which can be used to convert OEM characters to Ansi (single byte characters using various code pages) or WideString (UTF-16-LE Unicode), but it won't do a thing to interpret the UTF-8 bytes coming in and might just mess things up.
What you need is a set of functions that can take all the "raw" utf-8 bytes you have read from the pipe and convert them to (UTF-16-LE encoded) WideStrings which you can then feed to a control that can take in and show WideStrings. Alternatively you could look for a control that does the "raw" byte interpretation and conversion all itself, but I must admit I haven't seen any let alone one that still supports D5.
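One way to get such a conversion without a third-party library is the Windows API function MultiByteToWideChar with the UTF-8 code page. The following is a minimal sketch, assuming the bytes read from the pipe are valid UTF-8 and are held in a Delphi 5 string used as a byte container. Utf8BytesToWide and the constant name CP_UTF8_ID are hypothetical; the value 65001 is the Windows code page identifier for UTF-8, declared locally in case the Windows unit of that Delphi version does not define CP_UTF8.

```pascal
// Delphi 5 sketch. Needs Windows in the uses clause.
// Utf8BytesToWide is a hypothetical helper name.
function Utf8BytesToWide(const Bytes: string): WideString;
const
  CP_UTF8_ID = 65001; // Windows code page identifier for UTF-8
var
  Len: Integer;
begin
  Result := '';
  if Bytes = '' then
    Exit;
  // First call: ask how many UTF-16 code units are needed.
  Len := MultiByteToWideChar(CP_UTF8_ID, 0, PChar(Bytes), Length(Bytes),
    nil, 0);
  if Len <= 0 then
    Exit;
  SetLength(Result, Len);
  // Second call: perform the conversion into the WideString.
  MultiByteToWideChar(CP_UTF8_ID, 0, PChar(Bytes), Length(Bytes),
    PWideChar(Result), Len);
end;
```

Because a chunk read from the pipe can end in the middle of a multi-byte UTF-8 sequence, it is probably safer to append all chunks to one string first and convert once at the end, rather than converting each chunk separately.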
A library that can convert many different encodings and still supports D5 is DIUnicode: http://www.wikitaxi.org/delphi/doku.php/products/unicode/index
You have two problems using Delphi 5 with unicode output.
The first is that TMemo does not support Unicode characters, so you will need to find another control, such as the ones in the TMS Unicode Component Pack. However, this component pack does not support Delphi 5.
The second problem is with this part of the code:
repeat
  BytesRead := 0;
  ReadFile(ReadPipe, Buffer[0], ReadBuffer, BytesRead, nil);
  Buffer[BytesRead] := #0;
  OemToAnsi(Buffer, Buffer);
  AMemo.Text := AMemo.Text + String(Buffer);
until (BytesRead < ReadBuffer);
It reads the characters and places them into Buffer, which is a PChar (one byte per character in D5), and then typecasts this to a String, which is an AnsiString in D5.
Although I have not used D5 for years, the only type that I can remember that can handle unicode data in D5 is WideString.
I've changed a few things as follows and it works fine:
In the console application, I no longer call SetConsoleOutputCP(CP_UTF8); I only use plain string output.
In the other program (Delphi 5), I use this function without calling OemToChar(Buffer, Buffer).

Resources