Using Unicode with a TStringGrid - Delphi

I'd like to define a string constant with mixed Greek and Cyrillic symbols, something like this:
const
some_const = 'cyrillic symbols' + $03C9;
where $03C9 is the code point of the Greek lowercase letter omega.
Maybe I should make some_const a variable in a data module and initialize it:
//datamodule
var
some_const = 'cyrillic symbols' + some_function_to_make_string($03C9);
So what is the correct function to use for some_function_to_make_string(code : Word)?
Can I use this some_const with a TStringGrid.Cells[aCol, aRow]?
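No answer is quoted for this question, so the following is only a minimal sketch, assuming a Unicode-enabled Delphi (2009 or later; the unit names assume XE2 or later). There a character can be written as a #$xxxx literal and concatenated straight into a constant, so a helper function is optional. CodeToStr is a hypothetical name, and the Cyrillic text is written as code points only to keep the example independent of the source file encoding.

```delphi
program GreekConstDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

const
  // Cyrillic 'a', 'be', 've' written as code points (U+0430..U+0432).
  CyrillicPart = #$0430#$0431#$0432;
  // Character literals can be concatenated directly in a constant.
  SomeConst = CyrillicPart + #$03C9;  // U+03C9 = Greek small letter omega

// Hypothetical helper: turns a 16-bit code point into a one-character string.
function CodeToStr(Code: Word): string;
begin
  Result := Char(Code);
end;

var
  S: string;
begin
  S := CyrillicPart + CodeToStr($03C9);
  Writeln(Length(S));      // number of UTF-16 code units in S
  Writeln(S = SomeConst);  // both ways of building the text agree
end.
```

Under the same assumption, TStringGrid.Cells holds Unicode strings, so StringGrid1.Cells[aCol, aRow] := SomeConst; should display the text as long as the grid's font has the glyphs. In Delphi 7 the VCL grid is ANSI-only, so this would not work as written.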

Related

Delphi - check if a Unicode character occurs in a set of characters?

This code works fine in Delphi 7 (before Delphi had Unicode support):
Value := edit1.Text[1];
if Value in ['м', 'ж'] then ...
('м' and 'ж' are Cyrillic letters.)
But this construct doesn't work with Unicode characters.
I have tried a lot of things, but none of them work.
I also tried changing the type of Value to "Char" and to "AnsiChar".
Doesn't work:
const
MySet : set of WideChar = [WideChar('м'), WideChar('ж')];
begin
Value := edit1.Text[1];
if Value in MySet then ...
Doesn't work:
if AnsiChar(Value) in ['м', 'ж'] then ...
Doesn't work:
if CharInSet(Value, ['м', 'ж']) then ...
But this works:
if (Value = 'м') or (Value = 'ж') then ...
Is there a way to test a Unicode character against a set in modern versions of Delphi?
Or do we have to check each character individually?
My Delphi version is 10.4 Update 2 Community Edition.
A Delphi set type can only handle a maximum of 256 values, so it cannot be used for handling Unicode characters. For handling Unicode, the System.Character unit provides various methods and helpers.
For this particular case, there is an IsInArray() character helper you can use. Instead of declaring a set of characters, you will need to declare an array of characters:
var
ch: Char;
a: array of Char;
s: string;
begin
a := ['м', 'ж'];
s := 'abcж';
for ch in s do
if ch.IsInArray(a) then ...
end;
Note: Delphi XE7 introduced additional language support for initializing and working with dynamic arrays, and square brackets can also be used for simpler array initialization. In the context of the example above, ['м', 'ж'] is not a set, but an array of wide characters.
check if a Unicode character occurs in a set of characters?
Do you mean a Delphi set?
In general, it is impossible to have a set of X where the base type X has more than 256 possible distinct values. So set of Byte is fine, but set of Word isn't possible. Since there are 256 * 256 distinct wide character values, it is therefore impossible to have a set of wide characters. (If this were indeed possible, a variable of such a set type would be 8 kB in size. That would be an unusually large variable.)
Since there is no such thing as "Delphi set of Unicode characters", the question "How to see if a character belongs to a Delphi set of Unicode characters" doesn't make sense.
Or do you simply mean a mathematical set?
If so, of course this is possible, but you cannot use a Delphi set to represent the mathematical set of characters. Instead, you need to use some other data type. One possibility is a simple array, if you don't mind its O(n) characteristics.
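If the O(n) scan of an array is a concern, a hash-based container is one of the "other data types" that can stand in for the mathematical set. A minimal sketch, assuming Delphi XE2 or later with System.Generics.Collections; CountMatches is a hypothetical helper, and the dictionary values are ignored (only the keys matter):

```delphi
program CharLookupDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Generics.Collections;

// Counts how many characters of Source occur in Wanted.
// The dictionary is used as a hash set, so each lookup is O(1) on average.
function CountMatches(const Source, Wanted: string): Integer;
var
  Lookup: TDictionary<Char, Boolean>;
  Ch: Char;
begin
  Result := 0;
  Lookup := TDictionary<Char, Boolean>.Create;
  try
    for Ch in Wanted do
      Lookup.AddOrSetValue(Ch, True);  // duplicates in Wanted are harmless
    for Ch in Source do
      if Lookup.ContainsKey(Ch) then
        Inc(Result);
  finally
    Lookup.Free;
  end;
end;

begin
  // U+043C and U+0436 are the two Cyrillic letters from the question.
  Writeln(CountMatches('abc' + #$0436, #$043C#$0436));
end.
```

For a handful of characters the plain array is probably just as good; the dictionary only starts to pay off when the collection is large or the test runs very often.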

Getting a unicode, hidden symbol, as data in Delphi

I'm writing a delimiter for some Excel spreadsheet data and I need to read the rightward arrow symbol and pilcrow symbol in a large string.
The pilcrow symbol, for row ends, was fairly simple, using the Chr function and the AnsiChar code 182.
The rightward arrow has been trickier to figure out. There isn't an AnsiChar code for it. Its Unicode code point is U+2192 (hexadecimal). I can't, however, figure out how to turn this into a string or char value that I can use in my function.
Any easy ways to do this?
You can't use the U+2192 character directly. But since a string variable can't contain this value either (and thus neither can your TStringList), that doesn't matter.
How is the U+2192 character represented in your string list after you have read it in? Probably as these three separate characters: 0xE2 0x86 0x92 (its UTF-8 encoding). The simple solution, therefore, is to start by replacing these three characters with a single, unique character that you can then assign to the Delimiter property of the TStringList.
Like this:
...
// Read the file into a string variable, say S
S := ReplaceStr(S,#$E2#$86#$92,'|');
SL := TStringList.Create;
SL.Text := S;
SL.Delimiter := '|';
...
You'll have to select a single-character representation of your 3-byte UTF-8 Unicode character that doesn't occur in your data elsewhere.
You need to represent that character as a UTF-16 character. The code point is U+2192, which is a hexadecimal value, so in Unicode Delphi you would write:
Chr($2192)
which is of type WideChar.
However, you are using Delphi 7, which is a pre-Unicode Delphi. So you have to do it like this:
var
wc: WideChar;
....
wc := WideChar($2192);
Now, this might all be to no avail for you since it sounds a little like your code is working with 8 bit ANSI text. In which case that character cannot be encoded in any 8 bit ANSI character set. If you really must use that character, you'll need to use Unicode text.
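For comparison, a minimal sketch of the same splitting job in a Unicode Delphi (assumed: 2009 or later, unit names as in XE2 or later), where the arrow can serve as the delimiter directly. CountFields is a hypothetical helper, and the code point is written in hexadecimal.

```delphi
program ArrowSplitDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;

const
  Arrow = #$2192;  // U+2192 RIGHTWARDS ARROW (hexadecimal, not decimal 2192)

// Hypothetical helper: counts the fields of one row delimited by Arrow.
function CountFields(const Row: string): Integer;
var
  Fields: TStringList;
begin
  Fields := TStringList.Create;
  try
    Fields.StrictDelimiter := True;  // split on Delimiter only, not on spaces
    Fields.Delimiter := Arrow;
    Fields.DelimitedText := Row;     // Delimiter applies to DelimitedText
    Result := Fields.Count;
  finally
    Fields.Free;
  end;
end;

begin
  Writeln(CountFields('a b' + Arrow + 'c' + Arrow + 'd'));
end.
```

This only helps if the data is read as Unicode text in the first place; with 8-bit ANSI strings the caveat above still applies.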

Entering Text From Delphi To Word

I'm using Delphi XE2 and use the following code to enter the letter Y into a bookmark in a Word (2010) template.
Doc.Bookmarks.Item('NS').Range.InsertAfter('Y');
Except in the document, instead of the letter Y, the number 89 appears.
Is the fault likely to be from my code or in the Word document? Any direction gratefully received.
Your literal 'Y' is a character literal rather than a string literal. The ASCII code for Y is 89.
So, you are passing a Char rather than a string. When Word needs to get a string representation of that integer it simply converts the integer 89 to its textual representation, the string '89'.
To get around the problem you can do this:
var
Text: string;
....
Text := 'Y';
Doc.Bookmarks.Item('NS').Range.InsertAfter(Text);
The idea is that we ensure that we pass a string to InsertAfter() rather than a character. Remember that InsertAfter() receives a variant parameter and so you do need to be careful about the type of the payload stored in the variant.

Problem with ord () and string

I am having this problem. If I have:
mychr = ' ';
where the 'space' in mychr is actually the character #255 (typed manually with ALT+255), and I write:
myord = ord (mychr)
then myord gets the value 160 and not 255. The same problem occurs with ALT+254 and so on.
How can I solve this problem? I have tested this in Delphi XE in console mode.
Note: if I use:
mychar = #255;
then ord() returns the correct value.
I think the problem is that the Windows Alt+Num shortcuts insert characters according to the local code page, whereas modern Delphi uses Unicode characters, and these differ (unless the value is less than or equal to 127, I think). The solution is to enter the value #255 explicitly in code. In addition, it is a very bad habit to include 'invisible' special characters in code, because you cannot tell what character it is without copying it into an external tool, and you have to trust the text encoding of the .pas file. It is much better to use constants like #255. Even better, do
const
MY_PRECIOUS_VALUE = #255;
and use this constant every time you need it.
Update
According to the English Wikipedia article on Alt code:
If the number typed has a leading 0 (zero), the character set used is the Windows code page that matches the current input locale. For most systems using the Latin alphabet, this is Windows-1252. For a complete list, see code page. If the number does not have a leading 0 (zero), DOS compatibility is invoked. The character set used is the DOS code page for the current input locale. For systems using English, this is code page 437. For most other systems using the Latin alphabet, this is code page 850. For a complete list, see code page.
So, if you really, really want to continue entering Alt keycodes, you'd better type Alt and 0255 with the leading zero.
If you type ALT+255, the DOS code page is used; in DOS code pages 437 and 850 (one of which you probably use) the byte 255 is NBSP (non-breaking space). In Unicode, NBSP is $A0 (160). That explains why Ord returns 160.
AFAIK console mode uses the OEM character set. And under Delphi XE, you're not in the ANSI world, but in the UTF-16 / Unicode world.
var MyChar: char;
MyWideChar: WideChar;
MyAnsiChar: AnsiChar;
begin
MyChar := #255;
MyWideChar := #255;
MyAnsiChar := #255;
The first two variables are the same, i.e. a character with Unicode code 255 = $00FF, since in Delphi XE, char = WideChar. For the first Unicode Page, see this article.
But MyAnsiChar is what will be displayed on the console, after conversion from the current code page into the OEM console code page.
In the Unicode chart, this $00FF is a minuscule y with trema:
U+00FF ÿ Latin Small Letter Y with diaeresis
Under the console, you'll use the OEM char set, i.e. Code Page 437. So in your case $FF is NOT a letter, but a special code
FF NBSP Non Breaking SPace
which is converted into U+00A0 when converted back to Unicode:
U+00A0 NBSP Non Breaking SPace
It is very likely that you are in a Windows-1252 code page, so normally the Delphi XE AnsiString will map #255 into a minuscule y with trema:
FF ÿ Latin Small Letter Y with diaeresis
You can use low-level Windows functions such as CharToOemBuff to perform the conversion to or from OEM, or use an OEM AnsiString type:
type
TOemString = type AnsiString(437);
In all cases, the console is not the best way of entering accented text under modern Windows and Unicode Delphi XE.
Using the InputQuery function, for example, should be safer, since it will return a Unicode string variable. ;)
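The code page round trip described in these answers can be checked with a few lines. A minimal sketch, assuming Delphi XE2 or later on Windows and using TEncoding from System.SysUtils; DecodeByte is a hypothetical helper that decodes one byte with a given code page and returns the resulting UTF-16 value:

```delphi
program OemNbspDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

// Decodes a single byte using the given Windows code page and returns
// the numeric value of the first UTF-16 code unit of the result.
function DecodeByte(CodePage: Integer; B: Byte): Integer;
var
  Enc: TEncoding;
  S: string;
begin
  Enc := TEncoding.GetEncoding(CodePage);
  try
    S := Enc.GetString(TBytes.Create(B));
    Result := Ord(S[1]);
  finally
    Enc.Free;  // encodings obtained via GetEncoding are owned by the caller
  end;
end;

begin
  Writeln(DecodeByte(437, 255));   // DOS code page 437: byte 255 is NBSP
  Writeln(DecodeByte(1252, 255));  // Windows-1252: byte 255 is y with diaeresis
end.
```

The same byte yields two different Unicode characters depending on the code page, which is why Alt+255 and #255 do not agree.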

Wrong Unicode conversion, how to store accent characters in Delphi 2010 source code and handle character sets?

We are upgrading our project from Delphi 2006 to Delphi 2010. Old code was:
InputText: string;
InputText := SomeTEditComponent.Text;
...
for i := 1 to length(InputText) do
if InputText[i] in ['0'..'9', 'a'..'z', 'Ř' { and more special characters } ] then ...
The trouble is with accented letters: the comparison fails.
I tried switching the source file encoding from ANSI to UTF-8 and to UCS-2 LE, but without luck. Only a cast to AnsiChar works:
if CharInSet(AnsiChar(InputText[i]), ['0'..'9', 'a'..'z', 'Ř']) then
It is funny how Delphi handles these letters. Try this in Evaluate while debugging:
Ord('Ř') = Ord('Ø')
(yes, Delphi says True, on Czech Windows 7)
The question is: how can I store and compare simple strings without forcing them to AnsiString? Because if this does not work, why should we use Unicode?
Thanks to all for the replies.
Right now we are using a simple CharInSet(AnsiChar(... in some places.
The declaration of CharInSet is
function CharInSet(C: AnsiChar; const CharSet: TSysCharSet): Boolean; overload; inline;
function CharInSet(C: WideChar; const CharSet: TSysCharSet): Boolean; overload; inline;
while TSysCharSet is
TSysCharSet = set of AnsiChar;
Thus CharInSet can only compare to a set of AnsiChar. That is why your accented character is converted to AnsiChar.
There is no equivalent to a set of WideChar as sets are limited to 256 elements. You have to implement some other means to check the character.
Something like
const
specials: string = 'Ř';
if CharInSet(InputText[i], ['0'..'9', 'a'..'z']) or (Pos(InputText[I], specials) > 0) then
might be worth a try. You can add more characters to specials as needed.
Don't rely on the encoding of your Delphi source code files.
It might be mangled when using any non-Unicode tool to work on your text files (or even buggy Unicode aware tools).
The best way is to specify your characters as a 4-digit Unicode code point.
const
MyEuroSign = #$20AC;
See also my blog posting about this.
As mentioned by Uwe Raabe, the problem with Unicode chars is that they're pretty large. If Delphi allowed you to create a "set of Char" it would be 8 KB in size! A "set of AnsiChar" is only 32 bytes in size, pretty manageable.
I'd like to offer some alternatives. First is a sort of drop-in replacement for the CharInSet function, one that uses an array of Char to do the tests. Its only merit is that it can be called immediately from almost anywhere, but its benefits stop there. I'd avoid this if I can:
function UnicodeCharInSet(UniChr:Char; CharArray:array of Char):Boolean;
var i:Integer;
begin
for i:=0 to High(CharArray) do
if CharArray[i] = UniChr then
begin
Result := True;
Exit;
end;
Result := False;
end;
The trouble with this function is that it doesn't handle the x in ['a'..'z'] range syntax, and it's slow! The alternatives are faster, but aren't as close to a drop-in replacement as one might want. The first set of alternatives to investigate is the string functions from Microsoft. Among them are IsCharAlpha and IsCharAlphaNumeric, which might fix lots of issues. The problem with those is that all "alpha" chars are treated the same: you might end up accepting valid alpha chars from languages other than English and Czech. Alternatively you can use the TCharacter class from Embarcadero. The implementation is all in the Character.pas unit, and it looks efficient; I have no idea how efficient Microsoft's implementation is.
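A minimal sketch of the Character unit route, assuming a recent Delphi in which Char has helper methods such as IsLetter (in Delphi 2010 the equivalent would be the class function TCharacter.IsLetter(C)). IsAcceptable is a hypothetical name. As noted above, this kind of test accepts letters from every script, not only the Czech ones:

```delphi
program CharClassDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Character;

// Hypothetical replacement for the original "in [...]" test.
// Broader than the original: any Unicode letter or digit is accepted.
function IsAcceptable(C: Char): Boolean;
begin
  Result := C.IsLetter or C.IsDigit;
end;

begin
  Writeln(IsAcceptable(#$0158));  // Latin capital letter R with caron
  Writeln(IsAcceptable('7'));
  Writeln(IsAcceptable('!'));
  Writeln(IsAcceptable(#$03C9));  // Greek omega is a letter too
end.
```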
Another alternative is to write your own functions, using a "case" statement to get things to work. Here's an example:
function UnicodeCharIs(UniChr:Char):Boolean;
var i:Integer;
begin
case UniChr of
'ă': Result := True;
'ş': Result := False;
'Ă': Result := True;
'Ş': Result := False;
else Result := False;
end;
end;
I inspected the assembler generated for this function. While Delphi has to implement a series of "if" conditions for this, it does so very efficiently, way better than a series of IF statements written by hand. But it could still use a lot of improvement.
For tests that are used a lot you might want to look for some bit-mask based implementation.
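One way such a bit-mask implementation could look, as a minimal sketch rather than any particular library's code: one bit per UTF-16 code unit, which is exactly the 8 KB mentioned earlier. TWideCharSet, its methods and MakeValidSet are hypothetical names, and records with methods are assumed to be available (Delphi 2006 or later; unit names as in XE2 or later).

```delphi
program WideCharSetDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

type
  // One bit per UTF-16 code unit: 2048 * 32 bits = 65536 bits = 8 KB.
  TWideCharSet = record
  private
    FBits: array[0..2047] of Cardinal;
  public
    procedure Clear;
    procedure Add(C: Char);
    procedure AddRange(First, Last: Char);
    function Contains(C: Char): Boolean;
  end;

procedure TWideCharSet.Clear;
begin
  FillChar(FBits, SizeOf(FBits), 0);
end;

procedure TWideCharSet.Add(C: Char);
begin
  // Upper 11 bits select the word, lower 5 bits select the bit in it.
  FBits[Ord(C) shr 5] := FBits[Ord(C) shr 5] or (Cardinal(1) shl (Ord(C) and 31));
end;

procedure TWideCharSet.AddRange(First, Last: Char);
var
  Code: Integer;
begin
  for Code := Ord(First) to Ord(Last) do
    Add(Char(Code));
end;

function TWideCharSet.Contains(C: Char): Boolean;
begin
  Result := (FBits[Ord(C) shr 5] and (Cardinal(1) shl (Ord(C) and 31))) <> 0;
end;

// Builds the set from the question: digits, a..z and one accented letter.
function MakeValidSet: TWideCharSet;
begin
  Result.Clear;
  Result.AddRange('0', '9');
  Result.AddRange('a', 'z');
  Result.Add(#$0158);  // Latin capital letter R with caron
end;

var
  Valid: TWideCharSet;
begin
  Valid := MakeValidSet;
  Writeln(Valid.Contains(#$0158));
  Writeln(Valid.Contains(#$00D8));  // Latin capital letter O with stroke
end.
```

Membership is then a single shift, mask and compare, and ranges are supported again. Build the set once and keep it around, since copying the 8 KB record is not free.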
You should either use IFs instead of IN or find a WideCharSet implementation. This might help if you have a lot of sets: http://code.google.com/p/delphilhlplib/source/browse/trunk/Library/src/Extensions/DeHL.WideCharSet.pas.
You have stumbled onto a case where an idiom from pre-Unicode Pascal should not be translated directly into the most visually similar idiom in Unicode-era Pascal.
First, let's deal with Unicode string literals. If you can always be sure that nobody will ever use your source code with any tool that could mess up your encodings, then you could use Unicode literals. Personally, I would not like to see Unicode code points in string literals in any of my code, for various reasons. The strongest is that my code may need to be reviewed for internationalization at some point, and having literals that belong to your local language peppered through your code is even more of a problem when that language goes beyond the simple ASCII/ANSI code page symbols. Your source code will be more readable if your accented characters, and even non-accented character literals, are declared as Jeroen says to declare them: in the const section, away from the place in the code where you use them.
Consider the case where you use the same string literal thirty-three times throughout your code. Why should it be repeated instead of being a constant? And even when it is used only once, isn't the code more readable if you declare a sane constant name?
So, first you should declare constants like he shows.
Second, the CharInSet function is deprecated for all uses other than the one it was intended for, which is where you must continue to use "set of AnsiChar" types. This is no longer a recommended approach in Delphi 2009/2010, and using arrays of literal Unicode characters, in your const section, would be more readable and more up to date.
I suggest you use the JCL StrContainsChars function and avoid character sets, since you cannot declare an inline set of Unicode characters at all; the language does not allow it. Instead use this, and be sure to comment it:
implementation
uses
JclStrings;
const
myChar1 = #$2001;
myChar2 = #$2002;
myChar3 = #$2003;
myMatchList1 : Array[0..2] of Char = (myChar1,myChar2,myChar3);
function Match(s:String):Boolean;
begin
result := StrContainsChars( s, myMatchList1,false);
end;
String and character literals are bad to have peppering your code. Character and numeric literals in particular are called "magic values" and are to be avoided.
P.S. Your debug assertion shows that Ord('?') quietly downcasts the Unicode character to a byte-sized AnsiChar in the debugger. This behaviour is unexpected and should probably be logged in QC.

Resources