I have two functions, one to encrypt and the other to decrypt a string, that use the Ord() function. They work great except with extended ASCII codes.
If I use the letter ê (ASCII code 136), the Ord() function returns 234, whereas I expected it to return 136.
If I run decrypt on the encrypted string, I get a different result than the original string: the ê turns into a j.
Can somebody please help me solve this?
procedure TForm1.btnEncryptClick(Sender: TObject);
var
  sTempString: string;
  iIndex, i: Integer;
begin
  sTempString := edtOriginalString.Text;
  for iIndex := 1 to Length(sTempString) do
  begin
    i := Ord(sTempString[iIndex]);
    i := i shl 1;
    sTempString[iIndex] := Char(i);
  end;
  edtEncryptedString.Text := sTempString;
end;
procedure TForm1.btnDecryptClick(Sender: TObject);
var
  sTempString: string;
  iIndex, i: Integer;
begin
  sTempString := edtEncryptedString.Text;
  for iIndex := 1 to Length(sTempString) do
  begin
    i := Ord(sTempString[iIndex]);
    i := i shr 1;
    sTempString[iIndex] := Char(i);
  end;
  edtDecryptedString.Text := sTempString;
end;
If I use the letter ê (ASCII code 136)
No, that's actually wrong. ASCII only has 128 characters (0 to 127).
However, ê is the Unicode character U+00EA: LATIN SMALL LETTER E WITH CIRCUMFLEX.
And EA (hex) is indeed 234 (dec).
Delphi characters and strings are 8-bit before Delphi 2009, and Unicode in Delphi 2009 and later.
So in your case, Delphi 6, a character is 8-bit.
Hence, your left shift will make you lose the most significant bit (MSB), and you cannot possibly hope to get it back.
Indeed, if we take the case of ê (234), we have
1110 1010 (ê)
Shifting the bits one step to the left, we obtain
1101 0100
Shifting the bits one step to the right, we obtain
0110 1010 (j).
Hence, we lost information.
However, your method will work for ASCII characters (<= 127), because they all have zero as the MSB. It will not work for any characters above 127, because they all have one as the MSB (so it wouldn't work even if Ord did indeed return 136 in your case).
Hence, you need to abandon or redesign your "encryption" method if you want to support characters above 127. For instance, you could rotate the bits instead of shifting them. Or you could invert them (using not).
If you choose to rotate instead of shift, you will get this:
1110 1010 (ê)
rotate left:
1101 0101
rotate back (right):
1110 1010 (ê)
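A minimal sketch of what rotate-based helpers could look like in Delphi 6 (RolByte and RorByte are hypothetical names of my own, not RTL functions):
function RolByte(B: Byte): Byte;
begin
  // rotate left by one bit: the MSB wraps around to bit 0, so nothing is lost
  Result := Byte((B shl 1) or (B shr 7));
end;

function RorByte(B: Byte): Byte;
begin
  // rotate right by one bit: bit 0 wraps around to the MSB
  Result := Byte((B shr 1) or (B shl 7));
end;
In the loops above you would then write sTempString[iIndex] := Char(RolByte(Ord(sTempString[iIndex]))) when encrypting, and use RorByte in the same place when decrypting.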
Although it isn't relevant to your actual issue, you might still wonder why Ord doesn't return 136 as you'd expect.
Well, before Unicode (mainly in the 1990s and earlier), there simply were many different (non-compatible) character encodings. Often, an 8-bit encoding/codepage (characters 0..255) included the ASCII characters (0..127) and then made its own choices for the remaining characters (128..255). Since ê isn't an ASCII character, this means that only some of these "extended ASCII" codepages might include ê, and among those that do include ê, the actual numeric values might very well differ.
In other words, your source claiming that ê is 136 and your Delphi program are using different 8-bit codepages.
In the modern world of Unicode, this kind of problem no longer exists.
Delphi RIO. I have built an Excel plug-in with Delphi (also using Add-in Express). I iterate through a column to read cell values. After I read a cell value, I apply the Trim function, but the Trim is not deleting the last space. Code snippet:
acctName := Trim(UpperCase(Acctname));
Before this code runs, AcctName is 'ABC Holdings '. It is the same AFTER the Trim call. It appears that Excel has added some other type of character there (a new line? a carriage return?). What is the best way to get rid of this? Is there a way I can ask the debugger to show me the hex value of this variable? I have tried the Inspect and Evaluate windows; they both just show text. Note that I have to be careful about simply deleting non-text characters, since some company names have dashes, commas, apostrophes, etc.
Additional info - based on Andreas's suggestion, I added the following:
ShowMessage(IntToHex(Ord(Acctname[Acctname.Length])));
This comes back with '00A0'. So I am thinking I can just do a simple StringReplace, so I add this BEFORE Andreas's code:
acctName := StringReplace(acctName, #13, '', [rfReplaceAll]);
acctName := StringReplace(acctName, #10, '', [rfReplaceAll]);
Yet it appears that nothing has changed. The ShowMessage still shows '00A0' as the last character. Why isn't the StringReplace removing this?
If you want to know the true identity of the last character of your string, you can display its Unicode codepoint:
ShowMessage(IntToHex(Ord(Acctname[Acctname.Length])));
Or, you can use a utility to investigate the Unicode character on the clipboard, like my own.
Yes, the character in question is U+00A0: NO-BREAK SPACE.
This is like a usual space, but it tells the rendering application not to put a line break at this space. For instance, in Swedish, at least, you want non-breaking spaces in 5 000 kWh.
By default, Trim and TStringHelper.Trim do not remove this kind of whitespace. (They also leave U+2007: FIGURE SPACE and a few other kinds of whitespace.)
The string helper method has an overload which lets you specify the characters to trim. You can use this to include U+00A0:
S.Trim([#$20, #$A0, #$9, #$D, #$A]) // space, nbsp, tab, CR, LF
// (many whitespace characters missing!)
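Applied to the variable from the question, that overload could be used like this (just a sketch; the character set is the same incomplete list as above):
acctName := acctName.Trim([#$20, #$A0, #$9, #$D, #$A]); // now the trailing U+00A0 is stripped as well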
But perhaps an even better solution is to rely on the Unicode characterisation and do
function RealTrimRight(const S: string): string;
var
  i: Integer;
begin
  i := S.Length;
  while (i > 0) and S[i].IsWhiteSpace do
    Dec(i);
  Result := Copy(S, 1, i);
end;
Of course, you can implement similar RealTrimLeft and RealTrim functions.
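For example, a combined RealTrim might look roughly like this (an untested sketch of my own, following the same pattern as RealTrimRight above):
function RealTrim(const S: string): string;
var
  i, j: Integer;
begin
  j := S.Length;
  while (j > 0) and S[j].IsWhiteSpace do
    Dec(j);                       // drop trailing whitespace
  i := 1;
  while (i <= j) and S[i].IsWhiteSpace do
    Inc(i);                       // drop leading whitespace
  Result := Copy(S, i, j - i + 1);
end;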
And of course there are many ways to see the actual string bytes in the debugger. In addition to writing things like Ord(S[S.Length]) in the Evaluate/Modify window (Ctrl+F7), my personal favourite method is to use the Memory window (Ctrl+Alt+E). When this has focus, you can press Ctrl+G and type S[1] to see the actual bytes:
Here you see the string 'test string' in memory. Since strings are Unicode (UTF-16) since Delphi 2009, each character occupies two bytes. For simple ASCII characters, this means that every second byte is null. The ASCII values for our string are 74 65 73 74 20 73 74 72 69 6E 67. You can also see, on the line above (at 02A0855C), that our string object has reference count 1 and length B (= 11).
As a demo, to show the unicode string:
program q63847533;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils;
type
array100 = array[0..99] of Byte;
parray100 = ^array100;
var
searchResult : TSearchRec;
Name : string;
display : parray100 absolute Name;
dummy : string;
begin
if findfirst('z*.mp3', faAnyFile, searchResult) = 0 then
begin
repeat
writeln('File name = '+searchResult.Name);
name := searchResult.Name;
writeln('File size = '+IntToStr(searchResult.Size));
until FindNext(searchResult) <> 0;
// Must free up resources used by these successful finds
FindClose(searchResult);
end;
readln(dummy);
end.
My directory contains two z*.mp3 files, one with an ANSI name and the other with a Unicode name.
Watching display^ as Hex or Memory Dump will display what you seem to require (the "Is there a way I can ask the debugger to show me the hex value for this variable?" part of your question).
I am trying to better understand surrogate pairs and Unicode implementation in Delphi.
If I call Length() on the Unicode string S := 'Ĥà̲V̂e' in Delphi, I will get back 8.
This is because the lengths of the individual characters [Ĥ],[à̲],[V̂], and [e] are 2, 3, 2, and 1 respectively. This is because Ĥ has a surrogate, à̲ has two additional surrogates, V̂ has a surrogate and e has no surrogates.
If I wanted to return the second element in the string including all surrogates, [à̲], how would I do that? I know I would need to do some sort of testing of the individual bytes. I ran some tests using the routine
function GetFirstCodepointSize(const S: UTF8String): Integer;
referenced in this SO Question.
but got some unusual results. For example, here are the lengths and sizes of some different codepoints. Below is a snippet of how I generated these tables.
...
UTFCRUDResultStrings.add('INPUT: '+#9#9+ DATA +#9#9+ 'GetFirstCodePointSize = ' +intToStr(GetFirstCodepointSize(DATA))
+#9#9+ 'Length =' + intToStr(length(DATA)));
...
First Set: This makes sense to me, each code point size is doubled, but these are one character each and Delphi gives me the length as just 1, perfect.
INPUT: ď GetFirstCodePointSize = 2 Length =1
INPUT: ơ GetFirstCodePointSize = 2 Length =1
INPUT: ǥ GetFirstCodePointSize = 2 Length =1
Second set: It initially looks to me like the lengths and code points are reversed? I am guessing the reason for this is that the characters + surrogates are being treated individually, hence the first codepoint size is for the 'H', which is 1, but the length is returning the lengths of 'H' plus '^'.
INPUT: Ĥ GetFirstCodePointSize = 1 Length =2
INPUT: à̲ GetFirstCodePointSize = 1 Length =3
INPUT: V̂ GetFirstCodePointSize = 1 Length =2
INPUT: e GetFirstCodePointSize = 1 Length =1
Some additional tests...
INPUT: ¼ GetFirstCodePointSize = 2 Length =1
INPUT: ₧ GetFirstCodePointSize = 3 Length =1
INPUT: 𤭢 GetFirstCodePointSize = 4 Length =2
INPUT: ß GetFirstCodePointSize = 2 Length =1
INPUT: 𨳒 GetFirstCodePointSize = 4 Length =2
Is there a reliable way in Delphi to determine where an element in a Unicode String starts and ends?
I know my terminology using the word element may be off, but I don't think codepoint and character are right either, particularly given that one element may have a codepoint size of 3, but have a length of only one.
I am trying to better understand surrogate pairs and Unicode implementation in Delphi.
Let's get some terminology out of the way.
Each "character" (known as a grapheme) that is defined by Unicode is assigned a unique codepoint.
In a Unicode Transformation Format (UTF) encoding - UTF-7, UTF-8, UTF-16, and UTF-32 - each codepoint is encoded as a sequence of codeunits. The size of each codeunit is determined by the encoding - 7 bits for UTF-7, 8 bits for UTF-8, 16 bits for UTF-16, and 32 bits for UTF-32 (hence their names).
In Delphi 2009 and later, String is an alias for UnicodeString, and Char is an alias for WideChar. WideChar is 16 bits. A UnicodeString holds a UTF-16 encoded string (in earlier versions of Delphi, the equivalent string type was WideString), and each WideChar is a UTF-16 codeunit.
In UTF-16, a codepoint can be encoded using either 1 or 2 codeunits. 1 codeunit can encode codepoint values in the Basic Multilingual Plane (BMP) range - $0000 to $FFFF, inclusive. Higher codepoints require 2 codeunits, which is also known as a surrogate pair.
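As a quick aside (my own illustration, assuming Delphi 2009 or later and the Character unit), a surrogate pair can be detected and recombined into its codepoint value like this:
uses
  Character;

procedure SurrogatePairDemo;
var
  S: string;
  CP: UCS4Char;
begin
  S := #$D852#$DF62;  // the UTF-16 surrogate pair that encodes U+24B62 (𤭢)
  if TCharacter.IsHighSurrogate(S[1]) and TCharacter.IsLowSurrogate(S[2]) then
  begin
    CP := TCharacter.ConvertToUtf32(S[1], S[2]); // recombine the two codeunits into one codepoint
    Assert(CP = $24B62);                         // Length(S) is 2, yet the string holds a single codepoint
  end;
end;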
If I call length() on the Unicode string S := 'Ĥà̲V̂e' in Delphi, I will get back, 8.
This is because the lengths of the individual characters [Ĥ],[à̲],[V̂], and [e] are 2, 3, 2, and 1 respectively.
This is because Ĥ has a surrogate, à̲ has two additional surrogates, V̂ has a surrogate and e has no surrogates.
Yes, there are 8 WideChar elements (codeunits) in your UTF-16 UnicodeString. What you are calling "surrogates" are actually known as "combining marks". Each combining mark is its own unique codepoint, and thus its own codeunit sequence.
If I wanted to return the second element in the string including all surrogates, [à̲], how would I do that?
You have to start at the beginning of the UnicodeString and analyze each WideChar until you find one that is not a combining mark attached to a previous WideChar. On Windows, the easiest way to do that is to use the CharNextW() function, eg:
var
  S: String;
  P: PChar;
begin
  S := 'Ĥà̲V̂e';
  P := CharNext(PChar(S)); // returns a pointer to à̲
end;
The Delphi RTL does not have an equivalent function. You would have to write one manually, or use a third-party library. The RTL does have a StrNextChar() function, but it only handles UTF-16 surrogates, not combining marks (CharNext() handles both). So, you could use StrNextChar() to scan through each codepoint in the UnicodeString, but you would have to look at each codepoint to know whether it is a combining mark or not, eg:
uses
  Character;

function MyCharNext(P: PChar): PChar;
begin
  if (P <> nil) and (P^ <> #0) then
  begin
    Result := StrNextChar(P);
    while GetUnicodeCategory(Result^) = ucCombiningMark do
      Result := StrNextChar(Result);
  end else begin
    Result := nil;
  end;
end;
var
  S: String;
  P: PChar;
begin
  S := 'Ĥà̲V̂e';
  P := MyCharNext(PChar(S)); // should return a pointer to à̲
end;
I know I would need to do some sort of testing of the individual bytes.
Not the bytes, but the codepoints that they represent when decoded.
I ran some tests using the routine
function GetFirstCodepointSize(const S: UTF8String): Integer
Look closely at that function signature. See the parameter type? It is a UTF-8 string, not a UTF-16 string. This was even stated in the answer you got that function from:
Here is an example of how to parse a UTF-8 string
UTF-8 and UTF-16 are very different encodings, and thus have different semantics. You cannot use UTF-8 semantics to process a UTF-16 string, and vice versa.
Is there a reliable way in Delphi to determine where an element in a Unicode String starts and ends?
Not directly. You have to parse the string from the beginning, skipping elements as needed until you reach the desired element. Remember that each codepoint may be encoded as either 1 or 2 codeunit elements, and each logical glyph may be encoded using multiple codepoints (and thus multiple codeunit sequences).
I know my terminology using the word element may be off, but I don't think codepoint and character are right either, particularly given that one element may have a codepoint size of 3, but have a length of only one.
1 glyph is comprised of 1+ codepoints, and each codepoint is encoded as 1+ codeunits.
Could someone implement the following function?
function GetElementAtIndex(S: String; StrIdx : Integer): String;
Try something like this:
uses
  SysUtils, Character;

function MyCharNext(P: PChar): PChar;
begin
  Result := P;
  if Result <> nil then
  begin
    Result := StrNextChar(Result);
    while GetUnicodeCategory(Result^) = ucCombiningMark do
      Result := StrNextChar(Result);
  end;
end;

function GetElementAtIndex(S: String; StrIdx: Integer): String;
var
  pStart, pEnd: PChar;
begin
  Result := '';
  if (S = '') or (StrIdx < 0) then Exit;
  pStart := PChar(S);
  while StrIdx > 1 do
  begin
    pStart := MyCharNext(pStart);
    if pStart^ = #0 then Exit;
    Dec(StrIdx);
  end;
  pEnd := MyCharNext(pStart);
  {$POINTERMATH ON}
  SetString(Result, pStart, pEnd - pStart);
end;
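For example, used against the question's sample string (my own illustration; the index is 1-based, as in the code above):
var
  S, Element: string;
begin
  S := 'Ĥà̲V̂e';
  Element := GetElementAtIndex(S, 2); // should yield 'à̲': the base letter plus its combining marks
end;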
Looping through the graphemes of a string can be more complicated than you might think. In Unicode 13, some graphemes are up to 14 bytes long. I advise using a third-party library for this. One of the best for this is Skia4Delphi: https://github.com/skia4delphi/skia4delphi
The code is very simple:
var LUnicode: ISkUnicode := TSkUnicode.Create;
for var LGrapheme: string in LUnicode.GetBreaks('Text', TSkBreakType.Graphemes) do
  ShowMessage(LGrapheme);
The library's own demo also includes an example of a grapheme iterator.
I am testing migration from Delphi 5 to XE. Being unfamiliar with UnicodeString, before asking my question I would like to present its background.
Delphi XE string-oriented functions: Copy, Delete and Insert have a parameter Index telling where the operation should start. Index may have any integer value starting from 1 and finishing at the length of the string to which the function is applied.
Since the string can contain multi-element characters, a function's operation can start at an element (a surrogate) belonging to a multi-element sequence that encodes a single named Unicode code-point.
Then, starting from a sensible string and using one of these functions, we can obtain a nonsensical result.
The phenomenon can be illustrated with the below cases using the function Copy with respect to strings representing the same array of named codepoints (i.e. meaningful signs)
($61, $13000, $63)
It is the concatenation of 'a', EGYPTIAN HIEROGLYPH A001, and 'c'.
Case 1. Copy of AnsiString (element = byte)
We start with the above mentioned UnicodeString #$61#$13000#$63 and we convert it to UTF-8 encoded AnsiString s0.
Then we test the function
copy (s0, index, 1)
for all possible values of index; there are 6 of them since s0 is 6 bytes long.
procedure Copy_Utf8Test;
type
  TAnsiStringUtf8 = type AnsiString(CP_UTF8);
var
  ss: string;
  s0, s1: TAnsiStringUtf8;
  ii: Integer;
begin
  ss := #$61#$13000#$63; // mem dump of ss: $61 $00 $0C $D8 $00 $DC $63 $00
  s0 := ss;              // mem dump of s0: $61 $F0 $93 $80 $80 $63
  ii := Length(s0);      // sets ii=6 (bytes)
  s1 := Copy(s0, 1, 1);  // 'a'
  s1 := Copy(s0, 2, 1);  // #$F0  F means "start of 4-byte series"; no corresponding named code-point
  s1 := Copy(s0, 3, 1);  // #$93  "trailing in multi-byte series"; no corresponding named code-point
  s1 := Copy(s0, 4, 1);  // #$80  "trailing in multi-byte series"; no corresponding named code-point
  s1 := Copy(s0, 5, 1);  // #$80  "trailing in multi-byte series"; no corresponding named code-point
  s1 := Copy(s0, 6, 1);  // 'c'
end;
The first and last results are sensible within UTF-8 codepage, while the other 4 are not.
Case 2. Copy of UnicodeString (element = word)
We start with the same UnicodeString s0 := #$61#$13000#$63.
Then we test the function
copy (s0, index, 1)
for all possible values of index; there are 4 of them since s0 is 4 words long.
procedure Copy_Utf16Test;
var
  s0, s1: string;
  ii: Integer;
begin
  s0 := #$61#$13000#$63; // mem dump of s0: $61 $00 $0C $D8 $00 $DC $63 $00
  ii := Length(s0);      // sets ii=4 (words)
  s1 := Copy(s0, 1, 1);  // 'a'
  s1 := Copy(s0, 2, 1);  // #$D80C  surrogate pair member; no corresponding named code-point
  s1 := Copy(s0, 3, 1);  // #$DC00  surrogate pair member; no corresponding named code-point
  s1 := Copy(s0, 4, 1);  // 'c'
end;
The first and last results are sensible within codepage CP_UNICODE (1200), while the other 2 are not.
Conclusion.
The string-oriented functions Copy, Delete and Insert operate perfectly on a string considered as a mere array of bytes or words. But they are not helpful if the string is seen as what it essentially is, i.e. a representation of an array of named code-points.
Both cases above deal with strings which represent the same array of 3 named code-points. They are considered representations (encodings) of the same text composed of 3 meaningful signs (to avoid abuse of the term "characters").
One may want to be able to extract (copy) any of those meaningful signs regardless of whether a particular text representation (encoding) is a mono- or multi-element one.
I've spent quite some time looking around for a satisfactory equivalent of the Copy I was used to in Delphi 5.
Question.
Do such equivalents exist or I have to write them myself?
What you have described is how Copy(), Delete(), and Insert() have ALWAYS worked, even for AnsiString. The functions operate on elements (ie codeunits in Unicode terminology), and always have.
AnsiString is a string of 8bit AnsiChar elements, which can be encoded in any 8bit ANSI/MBCS format, including UTF-8.
UnicodeString (and WideString) is a string of 16bit WideChar elements, which are encoded in UTF-16.
The functions HAVE NEVER taken encoding into account. Not for MBCS AnsiString. Not for UTF-16 UnicodeString. Indexes are absolute element indexes from the beginning of the string.
If you need encoding-aware Copy/Delete/Insert functions that operate on logical codepoint boundaries, where each codepoint may be 1+ elements in the string, then you have to write your own functions, or find third-party functions that do what you need. There are no MBCS/UTF-aware string-manipulation functions in the RTL.
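To illustrate what such a function could look like for a UTF-16 UnicodeString, here is a rough sketch of a surrogate-aware Copy (CodePointCopy is my own name; it counts codepoints rather than elements, and deliberately does not group combining marks):
uses
  Character;

// Copies CPCount codepoints starting at the CPIndex-th codepoint (1-based).
// Surrogate pairs are kept together; combining marks are NOT grouped here.
function CodePointCopy(const S: string; CPIndex, CPCount: Integer): string;
var
  i, StartElem: Integer;

  function NextElem(Idx: Integer): Integer;
  begin
    if (Idx < Length(S)) and TCharacter.IsHighSurrogate(S[Idx]) and
       TCharacter.IsLowSurrogate(S[Idx + 1]) then
      Result := Idx + 2  // skip both halves of a surrogate pair
    else
      Result := Idx + 1;
  end;

begin
  i := 1;
  while (CPIndex > 1) and (i <= Length(S)) do
  begin
    i := NextElem(i);
    Dec(CPIndex);
  end;
  StartElem := i;
  while (CPCount > 0) and (i <= Length(S)) do
  begin
    i := NextElem(i);
    Dec(CPCount);
  end;
  Result := Copy(S, StartElem, i - StartElem);
end;
With the question's string #$61#$13000#$63, CodePointCopy(s0, 2, 1) would return both halves of the surrogate pair for U+13000 instead of just one of them.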
You should parse the Unicode string yourself. Fortunately, the Unicode encodings are designed to make parsing easy. Here is an example of how to parse a UTF-8 string:
program Project9;

{$APPTYPE CONSOLE}

uses
  SysUtils;

function GetFirstCodepointSize(const S: UTF8String): Integer;
var
  B: Byte;
begin
  B := Byte(S[1]);
  if (B and $80 = 0) then
    Result := 1
  else if (B and $E0 = $C0) then
    Result := 2
  else if (B and $F0 = $E0) then
    Result := 3
  else if (B and $F8 = $F0) then
    Result := 4
  else
    Result := -1; // invalid code
end;

var
  S: string;

begin
  S := #$61#$13000#$63;
  Writeln(GetFirstCodepointSize(S));
  S := #$13000#$63;
  Writeln(GetFirstCodepointSize(S));
  S := #$63;
  Writeln(GetFirstCodepointSize(S));
  Readln;
end.
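For comparison, a UTF-16 analogue of that routine (my own sketch, not part of the answer above) only has to distinguish a lone codeunit from a surrogate pair:
uses
  Character;

function GetFirstCodepointSizeUtf16(const S: string): Integer;
begin
  if S = '' then
    Result := 0
  else if (Length(S) >= 2) and TCharacter.IsHighSurrogate(S[1]) and
          TCharacter.IsLowSurrogate(S[2]) then
    Result := 2  // a surrogate pair: two WideChar elements, one codepoint
  else
    Result := 1; // a BMP codepoint: a single WideChar element
end;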
I'm reading some data from memory, and this area of memory is in Unicode. So to build an ANSI string I need something like this:
while CharInSet(Chr(Ord(Buff[aux])), ['0'..'9', #0]) do
begin
  Target := Target + Chr(Ord(Buff[aux]));
  Inc(aux);
end;
Here Buff is an array of Bytes and Target is a string. I just want to keep reading Buff and adding to Target while the value is '0'..'9', but when it finds a NULL memory char (00), it just stops. How can I keep adding data to Target until the first letter or non-numeric character? The #0 has no effect.
I would not even bother with CharInSet() since you are dealing with bytes and not characters:
var
  b: Byte;

while aux < Length(Buff) do
begin
  b := Buff[aux];
  if ((b >= Ord('0')) and (b <= Ord('9'))) or (b = 0) then
  begin
    Target := Target + Char(Buff[aux]);
    Inc(aux);
  end else
    Break;
end;
If your data is Unicode, then I am assuming that the encoding is UTF-16. In which case you cannot process it byte by byte. A character unit is 2 bytes wide. Put the data into a Delphi string first, and then parse it:
var
str: string;
....
SetString(str, PChar(Buff), Length(Buff) div SizeOf(Char));
Do it this way and your loop can look like this:
for i := 1 to Length(str) do
  if not CharInSet(str[i], ['0'..'9']) then
  begin
    SetLength(str, i - 1);
    Break;
  end;
I believe that your confusion was caused by processing byte by byte. With UTF-16 encoded text, ASCII characters are encoded as a pair of bytes, the most significant of which is zero. I suspect that explains what you were trying to achieve with your CharInSet call.
If you want to cater for other digit characters then you can use the Character unit and test with TCharacter.IsDigit().
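For example, the test in the loop above could be changed to something like this (a sketch; TCharacter.IsDigit also accepts Unicode digit characters beyond '0'..'9'):
uses
  Character;

for i := 1 to Length(str) do
  if not TCharacter.IsDigit(str[i]) then
  begin
    SetLength(str, i - 1);
    Break;
  end;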
This code converts successfully in Delphi 2007.
For example:
I have the Chinese text 短刀. In Delphi 2007 it converts to B5 CC B5 C6, but in Delphi 2010 it converts to 77 ED 52 00.
function StringToHex(str: string): string;
var
  i: Integer;
  s: string;
begin
  s := '';
  for i := 1 to Length(str) do begin
    s := s + IntToHex(Integer(str[i]), 2);
  end;
  Result := s;
end;
But in Delphi 2010 it is wrong.
Who can edit it so that it works in Delphi 2010?
First, in Delphi 2007, String=AnsiString, and in Delphi 2010, String=UnicodeString. That is enough explanation for you to understand, if you know what AnsiString (char is 8 bits) and UnicodeString (char is 16 bits) means.
Even though you are calling "IntToHex(x,2)", each Delphi 2010 character when converted to an integer will be in the range from 0 to 65535, which means that the IntToHex call is returning between 2 and 4 hex digits, which makes it hard for you to read the results without confusion.
A minimal Unicode-aware fix is to change to IntToHex(x, 4) for Unicode versions of Delphi, and maybe put a space in there so you can at least see where the codepoints separate. Four digits (like 0000) are enough hex digits for a single Unicode character represented as hex; two digits are not enough.
Why are the values different, though? That's a good question. Let me try to make it clearer: I believe you are seeing a consequence of using Delphi 2007 and its ANSI+MBCS support (which is codepage reliant) versus Delphi 2010, which uses Unicode strings. You should not be surprised that MBCS values differ from Unicode codepoints.
Also you should know that it takes two hex digits to show a byte, and four hex digits to show a Unicode character, which is 16 bits in size.
If you really want to see the Hex of the UTF8 string, then in Delphi 2010 you must create a UTF8 string first. If you really want MBCS, then say so. The whole world is Unicode now, I suggest you let MBCS go.
Fixed code for Unicode string character codepoints (4 hex digits, 16 bits per character):
A UnicodeString=String aware version (Delphi 2009,2010,XE):
function StringToHex16(str: string): string;
var
  i: Integer;
  s: string;
begin
  s := '';
  for i := 1 to Length(str) do begin
    s := s + IntToHex(Integer(str[i]), 4);
  end;
  Result := s;
end;
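With the question's sample text, this version should give the UTF-16 codepoint values directly (短 is U+77ED and 刀 is U+5200):
s := StringToHex16('短刀');  // '77ED5200'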
UTF8 version for Delphi 2009,2010,XE:
function StringToHexUtf8(str: string): string;
var
  i: Integer;
  s: string;
  u: RawByteString;
begin
  u := Utf8String(str);
  s := '';
  for i := 1 to Length(u) do begin
    s := s + IntToHex(Integer(u[i]), 2);
  end;
  Result := s;
end;
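For the same sample text, the UTF-8 version should produce the UTF-8 byte sequence instead (E7 9F AD for 短 and E5 88 80 for 刀):
s := StringToHexUtf8('短刀');  // 'E79FADE58880'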
And finally, since probably what you want is to reproduce exactly Delphi 2007's behaviour, here is an explicit example using MBCS functions:
function StringToHexMbcs(str: string; cp: Integer): string;
var
  sz, i: Integer;
  s: string;
  u: RawByteString;
  flags: Integer;
begin
  // use cp 936 or 950 for Simplified or Traditional Chinese MBCS.
  flags := WC_COMPOSITECHECK or WC_DISCARDNS or WC_SEPCHARS or WC_DEFAULTCHAR;
  sz := Windows.WideCharToMultiByte(cp, flags, @str[1], -1, nil, 0, nil, nil); // get length.
  SetLength(u, sz + 1);
  Windows.WideCharToMultiByte(cp, flags, @str[1], Length(str), @u[1], sz - 1, nil, nil);
  s := '';
  for i := 1 to sz do begin
    s := s + IntToHex(Integer(u[i]), 2);
  end;
  Result := s;
end;
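For example, a call like the following should reproduce the codepage-dependent bytes your Delphi 2007 program produced, assuming 936 (Simplified Chinese) is the ANSI codepage that program was running under (an assumption on my part):
s := StringToHexMbcs('短刀', 936);  // GBK byte values; unlike the Unicode results above, these depend on the codepage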
For future reference though, Delphi 2007 is not the gold standard of what is "right". You have to make some effort to understand the difference between MBCS and Unicode.
To obtain the same result in D2010 as in D2007, simply change the function parameter from (Unicode)String to AnsiString. Any string value you pass in, regardless of type, will be converted by the RTL into its MBCS equivalent based on the system default codepage - the same way AnsiString has always worked in past versions and continues to work.