Delphi XE3: Chr Ansi Version? - delphi

I have my own D6 pas library with crypto functions.
Today I tried to use it under XE3, and I found many bugs in it because of unicode.
I tried to port to AnsiString, but I failed on chr(nnn) which was 8 bit limited under Delphi6.
I'm trying to explain the problem:
Str := chr(hchar);
AStr := Str;
Str - string; AStr - ansistring.
When hchar was 216 (the diameter sign), AStr changed to "O", which is ASCII 79...
And I lost the original value at this moment.
Is there any function for Ansi Chr? For example: "AChr(xxxx)"
Or do I need to change my code so that the inner section uses only bytes instead of strings, and convert those bytes to AnsiString later?
Thanks for any suggestion, help, info!
dd

You can write AnsiChar(SomeOrdinalValue) to make an AnsiChar with a specific ordinal. So your code should be:
AStr := AnsiChar(hchar);
The problem with the code in the question is that you converted to UTF-16 and back.
It would seem to me that strings are the wrong type for your crypto code. Use a byte array, TBytes.
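A minimal sketch of both suggestions, assuming Delphi 2009 or later (XE3 included), where TBytes is declared in SysUtils. The variable names are illustrative and not taken from the asker's library.

```pascal
program AnsiChrDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils; // TBytes lives here in Delphi 2009+ (assumption: XE3-era RTL)
var
  HChar: Byte;
  AStr: AnsiString;
  Buf: TBytes;
begin
  HChar := 216;

  // AnsiChar(...) builds the character straight from the ordinal,
  // so nothing goes through UTF-16 and the value survives.
  AStr := AnsiChar(HChar);
  Writeln(Ord(AStr[1])); // 216

  // For crypto code, a byte array avoids character semantics altogether.
  SetLength(Buf, 1);
  Buf[0] := HChar;
  Writeln(Buf[0]); // 216
end.
```

The byte array variant is usually the safer choice for cipher input and output, since no code page can ever touch the data.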

Related

Getting a unicode, hidden symbol, as data in Delphi

I'm writing a delimiter for some Excel spreadsheet data and I need to read the rightward arrow symbol and pilcrow symbol in a large string.
The pilcrow symbol, for row ends, was fairly simple, using the Chr function and the AnsiChar code 182.
The rightward arrow has been more tricky to figure out. There isn't an AnsiChar code for it. The Unicode value for it is '2192'. I can't, however, figure out how to make this into a string or char type for me to use in my function.
Any easy ways to do this?
You can't use the 2192 character directly. But since a STRING variable can't contain this value either (as thus your TStringList can't either), that doesn't matter.
What character(s) are the 2192 character represented as in your StringList AFTER you have read it in? Probably by these three separate characters: 0xE2 0x86 0x92 (in UTF-8 format). The simple solution, therefore, is to start by replacing these three characters with a single, unique character that you can then assign to the Delimiter field of the TStringList.
Like this:
.
.
.
// Read the file into a STRING variable, say S
S := ReplaceStr(S,#$E2#$86#$92,'|');
SL := TStringList.Create;
SL.Text := S;
SL.Delimiter := '|';
.
.
.
You'll have to select a single-character representation of your 3-byte UTF-8 Unicode character that doesn't occur in your data elsewhere.
You need to represent that character as a UTF-16 character. The code point 2192 is hexadecimal (U+2192), so it needs the $ prefix. In Unicode Delphi you would do it like this:
Chr($2192)
which is of type WideChar.
However, you are using Delphi 7 which is a pre-Unicode Delphi. So you have to do it like this:
var
wc: WideChar;
....
wc := WideChar($2192);
Now, this might all be to no avail for you since it sounds a little like your code is working with 8 bit ANSI text. In which case that character cannot be encoded in any 8 bit ANSI character set. If you really must use that character, you'll need to use Unicode text.
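As a rough sketch of what "use Unicode text" means in a pre-Unicode Delphi, the arrow can be stored in a WideString and compared there. This assumes the data has already been decoded into a WideString; the variable names are made up for illustration.

```pascal
program ArrowDemo;
{$APPTYPE CONSOLE}
var
  Arrow: WideChar;
  S: WideString;
begin
  Arrow := WideChar($2192); // U+2192 RIGHTWARDS ARROW, hexadecimal

  // Build a three-character wide string: 'a', arrow, 'b'.
  SetLength(S, 3);
  S[1] := 'a';
  S[2] := Arrow;
  S[3] := 'b';

  Writeln(Ord(Arrow));  // 8594, the decimal value of $2192
  Writeln(S[2] = Arrow); // TRUE
end.
```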

UTF8 version of WIDESTRING

I have some text that I need to store in a WideString variable. But my text is UTF-8, and WideString doesn't support UTF-8 and converts it to some Chinese characters.
So is there any UTF-8 version of WideString?
I always use UTF8string but in this case I have to use WideString
When you assign a UTF8String variable to a WideString variable, the compiler automatically inserts instructions to decode the string (in Delphi 2009 and later). It converts UTF-8 to UTF-16, which is what WideString holds. If your WideString variable holds Chinese characters, then that's because your UTF-8-encoded string holds UTF-8-encoded Chinese characters.
If you want your string ws to hold 16-bit versions of the bytes in your UTF8String s, then you can by-pass the automatic conversion with some type-casting:
var
  ws: WideString;
  i: Integer;
  c: AnsiChar;
begin
  // s is the UTF8String to be widened byte by byte, without decoding
  SetLength(ws, Length(s));
  for i := 1 to Length(s) do begin
    c := s[i];
    ws[i] := WideChar(Ord(c));
  end;
end;
If you're using Delphi 2009 or later (which includes the XE series), then you should consider using UnicodeString instead of WideString. The former is a native Delphi type, whereas the latter is more of a wrapper for the Windows BSTR type. Both types exhibit the automatic conversion behavior when assigning to and from AnsiString derivatives like UTF8String, though, so the type you use doesn't affect this answer.
In earlier Delphi versions, the compiler would attempt to decode the string using the system code page (which is never UTF-8). To make it decode the string properly, call Utf8Decode:
ws := Utf8Decode(s);
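To make the difference between decoding and byte-widening concrete, here is a small sketch. It fills the UTF8String byte by byte so that no source-file encoding is involved; the two bytes $C3 $98 are the UTF-8 encoding of U+00D8. Note that newer compilers may flag Utf8Decode as deprecated in favour of UTF8ToWideString or UTF8ToString, but it should still compile there.

```pascal
program Utf8Demo;
{$APPTYPE CONSOLE}
var
  s: UTF8String;
  ws: WideString;
begin
  // Two raw bytes: the UTF-8 encoding of U+00D8.
  SetLength(s, 2);
  s[1] := AnsiChar($C3);
  s[2] := AnsiChar($98);

  // Decoding collapses the two bytes into one UTF-16 code unit.
  ws := Utf8Decode(s);
  Writeln(Length(ws));  // 1
  Writeln(Ord(ws[1]));  // 216
end.
```

The byte-widening loop shown earlier would instead give a string of length 2 holding $00C3 and $0098.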

Delphi 2010 Blockread seems to get different data than previous version

I had my old MP3 Id3 tag reader recompiled under D2010 and it seems it won't find the tags anymore.
The code is fairly simple, but it doesn't work.
The debugger shows lots of zeros and then Chinese characters in the results!
var dat:file of char;
id3:array [0..TAGLEN] of Char; //is 0..127 for ID3 v1
begin
vValid:=True;
if FileExists(vFilename) then begin
assignfile(dat,vFilename);
If (FileGetAttr(vFilename)>32) or (FileGetAttr(vFilename)=1) then
Filemode:= 0
Else
Filemode:= 2;
reset(dat);
seek(dat,FileSize(dat)-128);
blockread(dat,id3,128);
closefile(dat);
vMP3tag:=copy(id3, 0, 3);
if vMP3Tag='TAG' then begin
vTitle:=strip(copy(id3, 4, 30),' ');
vArtist:=strip(copy(id3, 34, 30), ' ');
I heard something about Unicode, and PansiChar, but I still don't understand much what these do anyway :)
thanks for looking
Try this:
var dat:file of AnsiChar;
id3:array [0..TAGLEN] of AnsiChar; //is 0..127 for ID3 v1
That is of course if your file is ansi-based instead of unicode based. I have no idea what might be in an id3 tag of an mp3 file.
If you want to understand the difference, this white paper explained it all to me. Basically, Unicode uses more memory to store a single character (in Delphi 2009 and later a Char is a 2-byte WideChar, twice the size of an AnsiChar), but it allows characters such as Chinese and Japanese, which ANSI doesn't provide. Just read the white paper, then it'll all be clear.
In short, AnsiChar and AnsiString are what Char and string used to be in Delphi before D2009. In those days your application wouldn't be Unicode compatible (you couldn't type Chinese characters by default).
As of D2009, the definition of string changed from AnsiString to UnicodeString, and Char from AnsiChar to WideChar. That means your application will be Unicode by default. But old code that expects strings to be ANSI needs to be adapted to reflect that change.
Your code said Char, meaning AnsiChar to pre-D2009 compilers, but WideChar to D2009+ compilers. In other words, the new compilers read your code differently.
I hope that explains it a bit.
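For what it's worth, here is one way the read could be sketched with an explicit AnsiChar buffer and a TFileStream instead of a typed file. This is not the asker's code: the names TId3v1Raw and ReadId3v1 are made up, and it relies on the ID3v1 layout (the last 128 bytes of the file, starting with "TAG", then a 30-byte title).

```pascal
program Id3Demo;
{$APPTYPE CONSOLE}
uses
  SysUtils, Classes;

type
  // 128 single-byte characters, whatever Char means to the compiler.
  TId3v1Raw = array[0..127] of AnsiChar;

// Hypothetical helper: reads the last 128 bytes and checks for 'TAG'.
function ReadId3v1(const FileName: string; out Raw: TId3v1Raw): Boolean;
var
  Stream: TFileStream;
begin
  Result := False;
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    if Stream.Size < SizeOf(Raw) then
      Exit;
    Stream.Position := Stream.Size - SizeOf(Raw);
    Stream.ReadBuffer(Raw, SizeOf(Raw));
    Result := (Raw[0] = 'T') and (Raw[1] = 'A') and (Raw[2] = 'G');
  finally
    Stream.Free;
  end;
end;

var
  Raw: TId3v1Raw;
  Title: AnsiString;
begin
  if (ParamCount = 1) and ReadId3v1(ParamStr(1), Raw) then
  begin
    // The title occupies bytes 3..32 of the tag.
    SetString(Title, PAnsiChar(@Raw[3]), 30);
    Writeln('Title: ', Trim(string(Title)));
  end
  else
    Writeln('No ID3v1 tag found');
end.
```

Because the buffer is declared in bytes rather than in Char, the same source should behave identically before and after D2009.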
Oh!
it seems like AnsiCHar instead of Char is the way to go in D2010.
Ansi-char-them-all!

Wrong Unicode conversion, how to store accent characters in Delphi 2010 source code and handle character sets?

We are upgrading our project from Delphi 2006 to Delphi 2010. Old code was:
InputText: string;
InputText := SomeTEditComponent.Text;
...
for i := 1 to length(InputText) do
if InputText[i] in ['0'..'9', 'a'..'z', 'Ř' { and more special characters } ] then ...
The trouble is with accented letters: the comparison fails.
I tried switching the source code encoding from ANSI to UTF8 and LE UCS-2, but without luck. Only a cast to AnsiChar works:
if CharInSet(AnsiChar(InputText[i]), ['0'..'9', 'a'..'z', 'Ř']) then
What is funny is how Delphi handles those letters. Try this in Evaluate during debugging:
Ord('Ř') = Ord('Ø')
(yes, Delphi says True, on Windows 7 Czech)
The question is: how can I store and compare simple strings without forcing them to be AnsiStrings? Because if this does not work, why should we use Unicode?
Thanks all for reply
Right now we are using in some parts simple CharInSet(AnsiChar(...
The declaration of CharInSet is
function CharInSet(C: AnsiChar; const CharSet: TSysCharSet): Boolean; overload; inline;
function CharInSet(C: WideChar; const CharSet: TSysCharSet): Boolean; overload; inline;
while TSysCharSet is
TSysCharSet = set of AnsiChar;
Thus CharInSet can only compare to a set of AnsiChar. That is why your accented character is converted to AnsiChar.
There is no equivalent to a set of WideChar as sets are limited to 256 elements. You have to implement some other means to check the character.
Something like
const
specials: string = 'Ř';
if CharInSet(InputText[i], ['0'..'9', 'a'..'z']) or (Pos(InputText[I], specials) > 0) then
might be a try. You can add more characters to specials as needed.
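Wrapped into a function, the idea might look like the sketch below. It assumes Delphi 2009 or later, and it spells the special characters as code points (#$0158 is 'Ř', #$0159 is 'ř') so the result does not depend on the source file encoding. The names IsAllowed and Specials are illustrative.

```pascal
program SpecialsDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils; // CharInSet
const
  // Characters that do not fit in a set of AnsiChar.
  Specials: string = #$0158#$0159;

function IsAllowed(C: Char): Boolean;
begin
  // Plain ASCII goes through the set; everything else through Pos.
  Result := CharInSet(C, ['0'..'9', 'a'..'z']) or (Pos(C, Specials) > 0);
end;

begin
  Writeln(IsAllowed('a'));     // TRUE
  Writeln(IsAllowed(#$0158));  // TRUE
  Writeln(IsAllowed('#'));     // FALSE
end.
```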
Don't rely on the encoding of your Delphi source code files.
It might be mangled when using any non-Unicode tool to work on your text files (or even buggy Unicode aware tools).
The best way is to specify your characters as a 4-digit Unicode code point.
const
MyEuroSign = #$20AC;
See also my blog posting about this.
As mentioned by Uwe Raabe, the problem with Unicode chars is that they're pretty large. If Delphi allowed you to create a "set of Char" it would be 8 KB in size! A "set of AnsiChar" is only 32 bytes in size, pretty manageable.
I'd like to offer some alternatives. First is a sort of drop-in replacement for the CharInSet function, one that uses an array of Char to do the tests. Its only merit is that it can be called immediately from almost anywhere, but its benefits stop there. I'd avoid this if I can:
function UnicodeCharInSet(UniChr:Char; CharArray:array of Char):Boolean;
var i:Integer;
begin
for i:=0 to High(CharArray) do
if CharArray[i] = UniChr then
begin
Result := True;
Exit;
end;
Result := False;
end;
The trouble with this function is that it doesn't handle the x in ['a'..'z'] syntax and it's slow! The alternatives are faster, but aren't as close to a drop-in replacement as one might want. The first set of alternatives to be investigated are the string functions from Microsoft. Amongst them there's IsCharAlpha and IsCharAlphanumeric; they might fix lots of issues. The problem with those is that all "alpha" chars are treated the same: you might end up accepting valid alpha chars from languages other than English and Czech. Alternatively you can use the TCharacter class from Embarcadero. The implementation is all in the Character.pas unit, and it looks effective; I have no idea how effective Microsoft's implementation is.
Another alternative is to write your own functions, using a "case" statement to get things to work. Here's an example:
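A brief sketch of the TCharacter route, assuming a Delphi 2009 to XE3 era compiler where the Character unit declares TCharacter with class functions such as IsLetter and IsLetterOrDigit (later versions move towards a Char helper and may warn about TCharacter). As noted above, these classify by Unicode category, so they accept letters from any language.

```pascal
program CharClassDemo;
{$APPTYPE CONSOLE}
uses
  Character; // System.Character when unit scope names are required
const
  RCaron = #$0158; // 'Ř' written as a code point
begin
  Writeln(TCharacter.IsLetter(RCaron));      // TRUE
  Writeln(TCharacter.IsLetterOrDigit('7'));  // TRUE
  Writeln(TCharacter.IsLetterOrDigit('-'));  // FALSE
end.
```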
function UnicodeCharIs(UniChr:Char):Boolean;
var i:Integer;
begin
case UniChr of
'ă': Result := True;
'ş': Result := False;
'Ă': Result := True;
'Ş': Result := False;
else Result := False;
end;
end;
I inspected the assembler generated for this function. While Delphi has to implement a series of "if" conditions for this, it does it very effectively, way better than implementing the series of IF statements in code. But it could use a lot of improvement.
For tests that are used a lot you might want to look for some bit-mask based implementation.
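One point in favour of the case approach is that case labels accept ranges, so the 'a'..'z' style of the original set can be kept. A possible sketch for the asker's check, assuming Delphi 2009+ where Char is WideChar (the function name is made up):

```pascal
program CaseDemo;
{$APPTYPE CONSOLE}

// Hypothetical replacement for: C in ['0'..'9', 'a'..'z', 'Ř']
function IsAllowedChar(C: Char): Boolean;
begin
  case C of
    '0'..'9', 'a'..'z', #$0158: // #$0158 is 'Ř'
      Result := True;
  else
    Result := False;
  end;
end;

begin
  Writeln(IsAllowedChar('z'));     // TRUE
  Writeln(IsAllowedChar(#$0158));  // TRUE
  Writeln(IsAllowedChar('Z'));     // FALSE
end.
```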
You should either use IFs instead of IN or find a WideCharSet implementation. This might help if you have a lot of sets: http://code.google.com/p/delphilhlplib/source/browse/trunk/Library/src/Extensions/DeHL.WideCharSet.pas.
You have stumbled onto a case where an idiom from Pre-Unicode Pascal should not be translated directly into the most visually similar idiom in Unicode era pascal.
First, let's deal with Unicode string literals. If you can always be sure that nobody will ever use your source code with any tool that could mess up your encodings, then you could use Unicode literals. Personally, I would not like to see Unicode codepoints in string literals in any of my code, for various reasons. The strongest reason is that my code may need to be reviewed for internationalization at some point, and having literals that belong to your local language peppered through your code is even more of a problem when you use a language other than those which use the simple Ascii/Ansi codepage symbols. Your source code will be more readable if you keep in mind the assumption that your accented characters, and even non-accented character literals, would be better declared, as Jeroen says to declare them, in the const section, away from the actual place in the code where you use them.
Consider the case where you use the same string literal thirty three times throughout your code. Why should it be repeated instead of a constant? And even when it is used only once, isn't the code more readable if you declare a sane constant name?
So, first you should declare constants like he shows.
Second, the CharInSet function is deprecated for all uses other than the use it was intended for, which is where you must continue to use the "Set of AnsiChar" types. This is no longer a recommended approach in Delphi 2009/2010, and using arrays of literal Unicode characters, in your constant section, would be more readable, and more up-to-date.
I suggest you use the JCL StrContainsChars function and avoid character sets, since you cannot declare an inline SET of Unicode characters at all; the language does not allow it. Instead use this, and be sure to comment it:
implementation
uses
JclStrings;
const
myChar1 = #$2001;
myChar2 = #$2002;
myChar3 = #$2003;
myMatchList1 : Array[0..2] of Char = (myChar1,myChar2,myChar3);
function Match(s:String):Boolean;
begin
result := StrContainsChars( s, myMatchList1,false);
end;
String and character literals are bad to have peppering your code. Character and numeric literals in particular are called "magic values" and are to be avoided.
P.S. Your debug assertion shows that Ord('Ř') is quietly downcasting the Unicode character to a byte-size AnsiChar in the debugger. This behaviour is unexpected and should probably be logged in QC.

convert string character to ascii in delphi

How can I convert the characters of a string (123-jhk25) to ASCII codes in Delphi 7?
If you mean the ASCII code for the character you need to use the Ord() function which returns the Ordinal value of any "enumerable" type
In this case it works on character values, returning a byte:
var
Asc : Byte;
i : Integer;
begin
for i := 1 to Length(s) do
begin
Asc := Ord(s[i]);
// do something with Asc
end;
end;
It depends on your Delphi version. In Delphi 2007 and before, strings are automatically in ANSI string format, and anything below 128 is an ASCII character.
In D2009 and later, things become more complicated since the default string type is UnicodeString. You'll have to cast the character to AnsiChar. It'll perform a codepage conversion, and then whatever you end up with may or may not work depending on which language the character in question came from. But if it was originally an ASCII character, it should convert without trouble.
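As a quick illustration with the string from the question, the loop below prints each character next to its code. Since every character here is plain ASCII, Ord gives the same values whether string is ANSI (Delphi 7) or Unicode (D2009+).

```pascal
program OrdDemo;
{$APPTYPE CONSOLE}
var
  S: string;
  i: Integer;
begin
  S := '123-jhk25';
  for i := 1 to Length(S) do
    Writeln(S[i], ' = ', Ord(S[i])); // first line printed: 1 = 49
end.
```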
