FormatDateTime with Chinese locale: wrong characters (Delphi 2007)

Output: Period: from 11-Ê®¶þÔÂ-10 to 13-Ê®¶þÔÂ-10
The above output is from a line like this:
FormatDateTime('dd-mmm-yy', dateValue)
The IDE is Delphi 2007 and we are trying to prepare our app for the Chinese market.
How can I display the correct characters?
With the regional setting switched to Hindi (India), I get "?" instead of the strange characters.
I'm trying to display the date on a report, using ReportBuilder 11.
Any help will be much appreciated.

The characters seem to be correct, only IMO they have been rendered wrong.
Here's what I've done:
copied the string as presented by the OP ("11-Ê®¶þÔÂ-10 to 13-Ê®¶þÔÂ-10");
pasted it into a blank plain-text editor window with CP 1252 (Windows Latin-1) and saved;
opened the text file in a browser;
the text looked the same, because the browser chose the same code page, so I turned on automatic detection of the character encoding, hinting that the content was Chinese;
the text changed to "11-十二月-10 to 13-十二月-10" (I hope your browser displays the Chinese characters correctly here; mine does) and the code page changed to GB18030 (I then tried GB2312, but the text didn't change);
I was curious and searched for "十二月": it turned out to mean "December", which suits the context, unless the month names had been mixed up.
So this is why I think it's a rendering problem (more precisely, a character-encoding problem): the bytes are right, but they are being displayed using the wrong code page.
EDIT: Of course, it must also have had something to do with the data type chosen for storing the string. If the function result is an AnsiString and the variable is a WideString, then maybe each byte is converted to a WideChar of its own, so the bytes are no longer parts of multi-byte characters but characters on their own? At least that's what happened when the OP posted them here.
I don't actually know, but if that is the case, I doubt they can be rendered correctly unless they are converted back and rendered as part of an AnsiString.
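The experiment above can be repeated in a few lines. This is a Python sketch (Python's codecs stand in for the Windows code pages); it assumes the date was formatted under the Simplified Chinese ANSI code page 936 (GBK) and then displayed as Windows-1252. The helper names are hypothetical.

```python
# Reproduce the garbled month name: GBK (code page 936) bytes shown as Windows-1252.
# Assumption: FormatDateTime ran under code page 936; the display used code page 1252.

def as_cp1252(text: str) -> str:
    """Encode with GBK, then misread the resulting bytes as Windows-1252."""
    return text.encode("gbk").decode("cp1252")

def repair(garbled: str) -> str:
    """Undo the misreading: recover the raw bytes, then decode them as GBK."""
    return garbled.encode("cp1252").decode("gbk")

formatted = "11-十二月-10"   # what FormatDateTime('dd-mmm-yy', ...) produced
garbled = as_cp1252(formatted)
print(garbled)              # 11-Ê®¶þÔÂ-10
print(repair(garbled))      # 11-十二月-10
```

Each Chinese character is two bytes in code page 936, so every character turns into two Latin-1 look-alikes when the bytes are read with the wrong code page.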

Another solution is to use TntControls. They're a set of standard Delphi controls enhanced to support Unicode. You'll have to go through all your form files and replace
Button1: TButton
Label1: TLabel
with TTntButton, TTntLabel et cetera.
Please note that, as things stand, it's not only Chinese that won't work: any language using characters outside the standard Western European set (Latin letters plus accents etc.), for instance Russian, will fail too.
But replacing the controls solves only one part of the problem. The other part is that wherever you use "string" or "AnsiString" and "char/pchar" or "AnsiChar/PAnsiChar", you can only store strings in the default system encoding.
For instance, if your system encoding ("Language for non-Unicode programs") is English (US), Russian characters will be replaced with question marks when you assign them to a "string" variable:
a: WideString;
b: string;
...
a := 'ЯУЭФЫЦ'; //WideString can store international characters
b := a; //string cannot, so the data is lost - you cannot restore it from just "b"
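The data loss in b := a can be approximated as a conversion to the ANSI code page in which unmappable characters become '?'. A Python sketch, assuming code page 1252 (English (US)); the helper names are hypothetical, and Python's codec does no Windows "best fit" mapping, so it only approximates what WideCharToMultiByte does:

```python
# Approximate Delphi 2007's WideString -> string (AnsiString) assignment.
# Assumption: the system ANSI code page is 1252 (English (US)).

def to_ansi(wide: str, codepage: str = "cp1252") -> bytes:
    """Convert to the ANSI code page; unmappable characters become b'?'."""
    return wide.encode(codepage, errors="replace")

def from_ansi(ansi: bytes, codepage: str = "cp1252") -> str:
    """Convert back to Unicode, as when assigning a string to a WideString."""
    return ansi.decode(codepage)

a = "ЯУЭФЫЦ"              # WideString: can hold any character
b = to_ansi(a)            # string: only code page 1252 characters survive
print(b)                  # b'??????'
print(from_ansi(b) == a)  # False: the text cannot be recovered from b
```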
To store string data which is independent of system encoding, use WideString/WideChar/PWideChar and appropriate functions. If you have
a, b: WideString;
...
a := UpperCase(b);
then the Unicode information will still be lost, because UpperCase() accepts a "string":
function UpperCase(const S: string): string;
Your WideString will be converted to "string" (losing all international characters), given to UpperCase, then the result will be converted back to WideString but it's already too late.
Therefore you have to replace all string functions with Wide versions:
a := WideUpperCase(b);
(Some functions have no wide version, or have a differently named one; TntControls also contains a number of wide versions of such functions.)
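The same round trip explains the UpperCase case. In this Python sketch the two functions are hypothetical stand-ins for the Delphi calls, code page 1252 is assumed, and the narrow version mimics SysUtils.UpperCase, which only converts 'a'..'z':

```python
# Simulate a := UpperCase(b) versus a := WideUpperCase(b) in Delphi 2007.
# Hypothetical stand-ins; code page 1252 is assumed as the ANSI code page.

ASCII_UPPER = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                            "ABCDEFGHIJKLMNOPQRSTUVWXYZ")

def upper_case_via_string(b: str, codepage: str = "cp1252") -> str:
    """WideString -> string -> UpperCase -> WideString, as the compiler does."""
    ansi = b.encode(codepage, errors="replace").decode(codepage)  # implicit conversion
    return ansi.translate(ASCII_UPPER)  # like SysUtils.UpperCase: 'a'..'z' only

def wide_upper_case(b: str) -> str:
    """Uppercase the Unicode text directly, with no code page conversion."""
    return b.upper()

b = "привет, world"
print(upper_case_via_string(b))  # ??????, WORLD
print(wide_upper_case(b))        # ПРИВЕТ, WORLD
```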

The Chinese market requires support for multi-byte character sets (either double-byte ANSI code pages or Unicode).
The Delphi 2007 RTL/VCL is built on ANSI strings, so it only handles the system's current code page, and there is very limited support for WideChar in the RTL and VCL.
The easiest option is to upgrade to a Delphi version that supports Unicode (Delphi 2009 was the first version that supports Unicode; the current Delphi version at the time of writing is Delphi XE).
Otherwise you will need to update all your components to support WideChar, and rewrite the portions of the RTL/VCL for which you need WideChar support.
--jeroen

Did you install Far East character set support in Windows? In Windows versions before Vista/7, those character sets are not installed by default in Western versions; you have to add them in Control Panel -> Regional Settings, IIRC.
Unfortunately, with a non-Unicode version of Delphi, which characters can be displayed depends on the current code page. If it is not one of the Chinese ones, for example, the characters you need cannot be displayed; what actually appears depends on how the codes you're using are mapped in the current code page. You could use a multilingual version of Windows to switch fully to the locale you need, or you have to use a Unicode version of Delphi (2009 onwards).

Related

Extended ASCII characters displayed as ? (question mark)

I have a form with a bunch of flags (static images) and below each flag is a tick box. The user selects the tick box to allow them to use a particular language. At design-time, I've set the checkbox captions for each language in their localised equivalent, in this example "Español" (Spanish).
For nearly every language this is displayed just fine at runtime, but for a couple of languages this changes to "Espa?ol". Specifically, this happens when I select Lithuanian and use:
// Note: 1063 = ((SUBLANG_DEFAULT shl 10) or LANG_LITHUANIAN)
SetThreadLocale(1063);
Curiously, if I simply re-apply the caption with the following line in the form's OnShow handler, then it displays correctly as "Español".
tbLangSpanish.Caption := 'Español'; // Strange, it now corrects itself!
The above code might be improved slightly by checking to see whether the runtime caption has a "?" character in it and only then re-apply the caption. The rest of the application displays Lithuanian perfectly (with labels being set at runtime).
Note that "ñ" is extended ASCII code 241. This issue affects a couple of other extended characters such as "ç" (character 231) in "Français". Interestingly, some extended ASCII characters are displayed correctly, e.g. "¾" (character 190).
Is this a bug in the IDE (using Delphi 7) or just a fact of life with legacy ASCII (i.e. non-Unicode) characters? Is there a preferred way to detect incompatible design-time extended ASCII characters at runtime (perhaps based on locale)?
None of the searches I performed gave any explanation about why a character would display as "?". I'm assuming this is because the requested character must be missing from the current Windows codepage, but no reference I could find explicitly says what is displayed when this happens (nor how to overcome the problem if you cannot use UNICODE).
The ? character is what happens when a conversion from one code page to another fails because the target code page does not contain the required character. This is an inevitable consequence of programming against the ANSI Win32 API: you simply cannot represent all characters of all languages in a single ANSI code page.
The only realistic way forward is to use Unicode. You have two main options starting from Delphi 7:
Stick to Delphi 7 and use the TNT Unicode components.
Upgrade to a modern version of Delphi which has native support for Unicode.
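The symptoms in the question fit this explanation. Lithuanian uses ANSI code page 1257 (Baltic), which lacks 'ñ' and 'ç' but contains '¾'. A Python sketch, assuming that after SetThreadLocale(1063) the caption is converted through code page 1257; through_codepage is a hypothetical helper:

```python
# Which captions survive a conversion into the Lithuanian ANSI code page?
# Assumption: after SetThreadLocale(1063) the caption passes through code
# page 1257 (Baltic); unmappable characters become '?'.

def through_codepage(text: str, codepage: str) -> str:
    """Round-trip text through an ANSI code page, as a pre-Unicode app would."""
    return text.encode(codepage, errors="replace").decode(codepage)

for caption in ["Español", "Français", "¾", "Lietuvių"]:
    print(caption, "->", through_codepage(caption, "cp1257"))
# Español -> Espa?ol
# Français -> Fran?ais
# ¾ -> ¾
# Lietuvių -> Lietuvių
```

Which characters survive is decided by the target code page, not by whether the character is "extended ASCII", which is why '¾' displays fine while 'ñ' does not.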

Display specific regional characters

I need to display LST ISO/IEC 8859-13 code page characters in a window. Currently I'm using the ShowMessage function for this purpose. Everything is displayed fine when the Windows locale is from this region, but what can I do when the locale is, for example, English (UK)? In this case I just get "?" instead of the characters. There must be some way to show regional characters, since MS Word displays them without the matching locale. But how?
You have two viable, tractable options:
Upgrade to a Unicode version of Delphi that has built in support for international text, or
Use the TNT Unicode controls that graft that support onto pre-Unicode Delphi by using the COM WideString type which is encoded using Unicode.
Word has no problems doing this because it uses the native Unicode API of Windows. On the other hand Delphi 7 uses the ANSI API that exists solely to provide compatibility with Windows 95/98/ME.
Short version:
You must also set the Font.Charset property if you want to be (more) sure that a particular component will display characters in a given charset.
Long version (sorry, I am prone to be wordy):
Using Unicode (and you should switch to a Unicode version of Delphi, if you haven't done so yet) does not guarantee that the fonts installed on a foreign PC will contain all the symbols you want to display.
Moreover, using Unicode does nothing to force the operating system to choose a font that actually supports the charset you need: even if there is an installed font able to display Cyrillic characters, Windows will NOT choose that font just because you are asking it to render a string containing Cyrillic Unicode code points; it will still use the default system fonts.
So there is always the possibility that you will need to ask your customers to install a font supporting the charset your application needs. If this can be a serious issue, you should consider distributing the required fonts along with your binaries (be careful with font copyrights).
Second: if there are components in your application that you are SURE will always show Russian text, then in those components you MUST assign Font.Charset := RUSSIAN_CHARSET. This is the way of telling Windows "I really need to display Cyrillic characters in this component, so choose an appropriate font, regardless of which side of the planet you are running on".
It is a common misconception that the Charset property is useless in Unicode programs. It is quite the opposite.
Another common error is to assume that the "XYZ" font is identical on all Windows installations in the world, so that if I can see Cyrillic characters with Tahoma on my PC, I am safe using Tahoma to display Cyrillic in the rest of the world. It is quite the opposite: a different Unicode subset gets installed depending on the computer's locale.
And, since AFAIK ShowMessage() uses the system default font, you can't use it to display messages containing "strange" characters: you need to write your own ShowMessage dialog box.
EDIT: here is an example demonstrating what I am saying.
Just drop a TPaintBox component on a form, name it "pbox", and write this OnPaint event handler
(remember to save the source in UTF-8 format, otherwise the Russian symbols will be mangled):
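procedure TForm1.pboxPaint(Sender: TObject);
begin
  // Line 1: Fixedsys with the default charset; on a Western PC this font
  // has no Cyrillic glyphs, so the text comes out as garbage/boxes
  pbox.Canvas.Font.Name := 'Fixedsys';
  pbox.Canvas.Font.Charset := DEFAULT_CHARSET;
  pbox.Canvas.TextOut(0, 0, 'Это русский');
  // Line 2: same font name, but requesting RUSSIAN_CHARSET lets Windows
  // pick a font that actually contains Cyrillic glyphs
  pbox.Canvas.Font.Name := 'Fixedsys';
  pbox.Canvas.Font.Charset := RUSSIAN_CHARSET;
  pbox.Canvas.TextOut(0, 20, 'Это русский');
end;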
On an Italian PC (and I guess on any Western European or American PC) the Fixedsys font does not normally contain the Russian glyphs: the first TextOut will insist on using the Fixedsys font and will write garbage. On my PC I get a sequence of black square boxes, for example.
The second TextOut is made after setting Charset := RUSSIAN_CHARSET, so Windows knows that we need the Russian symbols and chooses another font. The second TextOut is not using the Fixedsys font I wanted, but at least it is readable!
On a Russian installation of Windows, both TextOut calls will correctly render the Russian text using the Fixedsys font, since Russian installations of Windows have a Russian version of the Fixedsys font, and Windows knows it.
You can install more than one locale on a Windows system. If the matching locale is the default locale, you can use a dialog with a text field that uses the correct locale/character set. On your development system, where English (UK) is installed, add the missing language(s).
Unicode is nicer, but not required to display characters from non-default character sets (computers were able to handle many character sets before Unicode was invented). Even MS WordPad was able to display characters from different code pages, including multi-byte character sets (Korean, Japanese, Chinese), before Unicode was widely used.
ShowMessage cannot be used because it sticks to the default locale, but it can easily be replaced with a custom dialog-style form.

ANSI application and Vietnam Codepage

I've changed the code page of my system to Russian as explained on this site.
I rebooted the PC.
Then I created a file in a directory, with a name containing special Russian characters.
Then I listed all files in this directory and tried to show each file name with typical Delphi 7 code:
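SearchRec: TSearchRec; // declared in the var section
// 'C:\Test\*' is a hypothetical directory mask
if FindFirst('C:\Test\*', faAnyFile, SearchRec) = 0 then
begin
  repeat
    ShowMessage(SearchRec.Name);
  until FindNext(SearchRec) <> 0;
  FindClose(SearchRec);
end;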
The code works well.
When I redo all 4 steps with Vietnamese instead of Russian, the file name shown by ShowMessage is not correct: some "?" characters appear instead (see the screenshot).
Please help
This is due to the way Delphi versions prior to 2009 implement their string type. It is not a font problem, but a character-encoding problem.
All string variables, and also all Windows API calls, use ANSI encoding. With ANSI, you can use only one code page at a time. In order to mix code pages (i.e. mix Russian and Vietnamese encodings), you'll need to process the text as Unicode and call the Unicode Windows APIs.
Here is what occurred in your case:
Windows creates the file with Russian characters in its name and stores the name as Unicode (NTFS stores file names as UTF-16);
When you read the file name with Vietnamese as the current code page, only characters that also exist in that code page (the 7-bit ASCII characters, e.g. digits, main punctuation and English letters, plus the Vietnamese letters) can be converted from Unicode into ANSI Vietnamese: during the conversion, every other character becomes ? in your ANSI Vietnamese string.
So you have several workarounds:
Upgrade to Delphi >= 2009, and your string will be UNICODE, so you will be able to mix character sets;
Use WideString to store your text, and call the Windows wide APIs directly; that is, you can't use the VCL units, nor FindFirst/FindNext as defined in SysUtils, nor ShowMessage as defined in Dialogs.
Of course, the first one is the easiest!
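The two steps above can be sketched as a round trip of the stored UTF-16 name through the current ANSI code page. In this Python approximation the file name and helper are hypothetical, and Python's codecs do no Windows best-fit mapping; it shows the same Russian name surviving under code page 1251 (Russian) but not under code page 1258 (Vietnamese):

```python
# Simulate FindFirst/ShowMessage in a pre-2009 (ANSI) Delphi program: the
# UTF-16 file name stored by NTFS is converted to the current ANSI code page.
# "Отчёт_2010.txt" is a hypothetical file name.

def ansi_file_name(unicode_name: str, acp: str) -> str:
    """What SearchRec.Name would contain under the given ANSI code page."""
    return unicode_name.encode(acp, errors="replace").decode(acp)

name = "Отчёт_2010.txt"
print(ansi_file_name(name, "cp1251"))  # Отчёт_2010.txt  (Russian code page)
print(ansi_file_name(name, "cp1258"))  # ?????_2010.txt  (Vietnamese code page)
```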

how to know (in code) that some characters are displayed fine (or not) in the interface of a program made in Delphi

Sorry about my English...
I'm trying to make a small program in Delphi 7.
Its interface will have text in my language, which has some characters with diacritics.
If "Language for non-Unicode programs" is set to my language those characters are always displayed fine. That's normal.
If it is set to something else, sometimes they are displayed fine and sometimes they are not.
How can I know whether they will be displayed fine or not?
Oh, and I can't use Unicode components, only normal ones.
The only way I found is to capture the image of a character into a bitmap and check it pixel by pixel. But it's a lot of work to implement, slow, and imprecise.
I can use the GetSystemDefaultLangID function to know that "Language for non-Unicode programs" is set to something else, but I still don't know whether the characters are displayed fine or not.
Thank you for any idea.
Welcome to the joys of AnsiStrings encoded using code pages. You should not be using AnsiStrings at all, and you know that, but you say, without explaining why, that you can't use Unicode controls. This seems strange to me. You should be using either:
(a) A Unicode version of Delphi (2009, 2010, XE), where String=UnicodeString.
(b) If not that, at least proper Unicode controls, such as TNT Controls, and internally WideString types where you need to store accented or international characters.
Your version of Delphi has String=AnsiString, and you are relying on the locale your system is set to (as you say in your question) to select the code page representation of accented characters, which is a problematic scheme. If you really can't move up from Delphi 7, at least start using WideStrings and TNT Unicode Controls, but I must say that effort is WASTED: you would be better off getting Delphi XE and just porting to Unicode.
Your question asks "how can I know if they can be stored fine or not?" You can encode and decode using your code page, and check whether anything is replaced with a "?". The Windows function WideCharToMultiByte, for example, behaves like this. MBCS is a world of pain and not worth doing, but you asked how you can find out where the floor falls out from under you, and that API will help you understand your selected encoding rule.
Use WideCharToMultiByte Function - http://msdn.microsoft.com/en-us/library/dd374130(v=vs.85).aspx and check lpUsedDefaultChar parameter.
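A rough stand-in for that check, written in Python: a strict encode fails roughly where WideCharToMultiByte would set lpUsedDefaultChar (Windows may additionally apply best-fit mappings, which Python does not). The helper names are hypothetical, and the code pages are only examples:

```python
# Rough analogue of calling WideCharToMultiByte and checking lpUsedDefaultChar:
# report which characters of a text the given ANSI code page cannot hold.
# unmappable/can_represent are hypothetical helper names.

def unmappable(text: str, codepage: str) -> list:
    """Characters of text that have no encoding in codepage."""
    missing = []
    for ch in text:
        try:
            ch.encode(codepage)  # strict: raises instead of substituting '?'
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

def can_represent(text: str, codepage: str) -> bool:
    """True if no default-character substitution would be needed."""
    return not unmappable(text, codepage)

print(can_represent("Ăă", "cp1250"))  # True: Central European code page
print(unmappable("Ăă", "cp1252"))     # ['Ă', 'ă']: Western code page
```

Checking per character, rather than only the whole string, tells you which letters of your UI text are at risk under a given "Language for non-Unicode programs" setting.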
Since this has been on my research list for a while but hasn't reached the top of that list yet, I can only help you out with a few links.
You will need to do quite a bit of experimentation :-)
When using Unicode, you can use functions ScriptGetCMap and GetGlyphIndices to test if a code point is in the font.
When not using Unicode, you can use the function GetGlyphIndices
There are few Delphi translations of these functions around. This Borland Newsgroup thread has a few hints on using GetGlyphIndices in Delphi.
Here is a search for ScriptGetCMap in Delphi.
This page has a list of some interesting API calls that might help you further.
An extra handicap is that, because not all fonts contain all characters, Windows can do font substitution for you.
I'm not sure how to detect that, but it is something you have to check for too.
Good luck :-)
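procedure TForm1.Button2Click(Sender: TObject);
var
  ACP: Integer;
begin
  // GetACP (Windows unit) returns the ANSI code page selected by
  // "Language for non-Unicode programs"
  ACP := GetACP;
  Caption := 'CP' + IntToStr(ACP);
  // 1250 (Central European) is the ANSI code page Windows uses for Romanian
  if ACP = 1250 then
    Caption := Caption + ' is okay for Romanian language';
end;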

delphi 2009 unicode + ansi problem

I'm porting an ISAPI (PageProducer) application from Delphi 7 to Delphi 2009; the pages are based on HTML files in UTF-8.
Everything goes well except when OnHTMLTag is fired and I replace a transparent tag with a value containing special characters, like accented characters (áé...). Those characters are replaced in the output with a � character.
What's wrong?
As part of your debugging procedure, you should go find out exactly what byte value(s) the browser receives for the question-mark character.
As you should know, Delphi 2009's string type is Unicode, whereas all previous versions were ANSI. Delphi 7 introduced the Utf8String type, but Delphi 2009 made that type special. If you're not using that type for holding strings that are encoded as UTF-8, then you should start doing so. Values held in Utf8String variables will be converted to UnicodeString values automatically when you assign one to the other.
If you're storing your UTF-8-encoded strings in ordinary AnsiString variables, then they will be converted to Unicode using the default system code page if you assign them to a UnicodeString. That's not what you want.
If you're assigning UTF-8-encoded literals to variables of type string, stop that. That type expects its values to be encoded as UTF-16, just like WideString always has.
If you are loading your files into a TStrings descendant with LoadFromFile, then you need to start using that method's second parameter, which tells it what encoding to use. UTF-8-encoded files should use TEncoding.UTF8. Without that parameter, LoadFromFile only recognizes UTF-8 by its byte-order mark; otherwise it falls back to TEncoding.Default, the system ANSI code page.
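The mismatches described above have recognizable symptoms. A Python sketch, assuming code page 1252 as the ANSI code page and hypothetical helper names: UTF-8 bytes read as ANSI turn 'á' into 'Ã¡', while ANSI bytes read as UTF-8 form invalid sequences that a decoder shows as the replacement character '�', which matches the symptom in the question.

```python
# Two ways an encoding mismatch corrupts accented characters.
# Assumption: the ANSI code page is 1252 (Western European).

def utf8_read_as_ansi(text: str) -> str:
    """UTF-8 bytes interpreted with the ANSI code page (classic mojibake)."""
    return text.encode("utf-8").decode("cp1252")

def ansi_read_as_utf8(text: str) -> str:
    """ANSI bytes interpreted as UTF-8: invalid sequences become U+FFFD."""
    return text.encode("cp1252").decode("utf-8", errors="replace")

print(utf8_read_as_ansi("á"))     # Ã¡
print(ansi_read_as_utf8("café"))  # caf�
```

So a lone '�' points at ANSI bytes reaching a UTF-8 reader, whereas pairs like 'Ã¡' point at UTF-8 bytes reaching an ANSI reader.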
This is probably a character encoding issue.
The Delphi IDE usually uses Windows-1252 or UTF-16 to encode source code.
HTML often uses UTF-8.
You probably need some conversion between those encodings.
For that you need to find out exactly which encodings are used (like Rob mentions).
Or fall back to HTML-escaping the accented characters (like Ralph mentions).
Can you post a small app that shows the problem? (you can email me, about anything that has jeroen in the username and pluimers.com in the domain name will arrive in my mailbox).
--jeroen
Thank you for your help. After some tests, the problem turned out to be very, very simple (or stupid):
Response.ContentType := 'text/html; charset=UTF-8';
There is no need to convert manually between UnicodeString, UTF8String, AnsiString and WideString. Delphi 2009's string handling is nearly perfect.
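In the Content-Type header, the charset is a parameter that follows the media type after a semicolon (text/html; charset=UTF-8). As an illustration only, using Python's email.message parser as a stand-in for an HTTP client (an assumption; browsers have their own parsers), the parameter is recognized only when the semicolon is present. declared_charset is a hypothetical helper:

```python
# Parse the charset parameter of a Content-Type header value.
# Python's email.message is only a stand-in for how a client reads the header.
from email.message import Message

def declared_charset(content_type: str):
    """Return the charset parameter (lowercased), or None if none is found."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

print(declared_charset("text/html; charset=UTF-8"))  # utf-8
print(declared_charset("text/html charset=UTF-8"))   # None: no ';' separator
print(declared_charset("text/html"))                 # None
```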
