How can I convert a Unicode string to an AnsiString? - delphi

I'm trying to move a project from Delphi 2007 to Delphi XE4. What is the best way to convert a String to an AnsiString in Delphi XE4?

You simply assign it:
var
AnsiStr: AnsiString;
Str: string;
....
AnsiStr := Str;
The compiler will emit a warning mind you:
W1058 Implicit string cast with potential data loss from 'string' to 'AnsiString'
You can use a cast to suppress that warning:
AnsiStr := AnsiString(Str);
By default that gives no warning, although there is of course still potential for data loss. If you enable warning W1060 then the compiler says:
W1060 Explicit string cast with potential data loss from 'string' to 'AnsiString'
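As a rough sketch of how that opt-in might look in a single unit, the directive below switches W1060 on locally. The warning identifier EXPLICIT_STRING_CAST_LOSS is the one I believe corresponds to W1060; treat it as an assumption and check it against your compiler's documentation. The program name is made up for the example.

```delphi
program CastWarnings;
{$APPTYPE CONSOLE}
{$WARN EXPLICIT_STRING_CAST_LOSS ON}  // assumed identifier for W1060, off by default

var
  Str: string;
  AnsiStr: AnsiString;
begin
  Str := 'abc';
  // With the directive above, this explicit cast should be reported as W1060.
  AnsiStr := AnsiString(Str);
  // Plain ASCII survives the conversion whatever the ANSI code page is.
  Assert(Length(AnsiStr) = 3);
end.
```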
Of course, it's not expected that Delphi XE4 code has much place for the use of AnsiString. Unless you have a very specific interop requirement, then text is best held in the native data type, string. If you want to operate on byte arrays use TBytes or TArray<Byte>.
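If the goal is really to obtain bytes for interop, one possible approach is to encode explicitly with TEncoding rather than going through AnsiString. This is a minimal sketch, assuming UTF-8 is an acceptable encoding for the receiving side; the variable names are illustrative only.

```delphi
program BytesFromString;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Str, RoundTrip: string;
  Utf8Bytes: TBytes;
begin
  // 'caf' followed by U+00E9, written with a character code so the
  // example does not depend on the source file encoding.
  Str := 'caf'#$00E9;

  // Explicit, lossless encoding of the UTF-16 string into bytes.
  Utf8Bytes := TEncoding.UTF8.GetBytes(Str);

  // Decoding the same bytes gives back the original text.
  RoundTrip := TEncoding.UTF8.GetString(Utf8Bytes);
  Assert(RoundTrip = Str);
end.
```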

Related

Can we safely use ansiString in mobile with Sydney?

When I read Migrating Delphi Code to Mobile from Desktop, they say to avoid using AnsiString. Is there any reason for that? AnsiString use 2x less memory than UnicodeString, and it's a perfect container for JSON. So, can I use AnsiString safely, or do I need to stay with UnicodeString (and why)?
You can use 8-bit strings on mobile platforms. But safety depends on which kind of 8-bit strings you use.
For anything other than Windows, and even on Windows, using AnsiString is an extremely bad idea. AnsiString is a legacy type, and while it was re-enabled in 10.4 on mobile platforms, that does not mean you should use it, and even less that you can use it safely.
One of the problems with AnsiString is that sooner or later in your code it will go through conversion, because default string type used all over RTL and FMX is UTF-16 string type, and you can lose original data.
String types you can safely use on mobile (and other platforms) are string, UTF8String and RawByteString.
When it comes to RawByteString it can only be safely used in code-page agnostic operations. See more: Delphi XE - RawByteString vs AnsiString
JSON files don't support ANSI encoding, so Unicode is your only choice. UTF-8 and UTF8String will do more than fine, because that is also default encoding for any JSON data exchange.
As far as various AnsiXXX functions are concerned, the best option is to write your own routines that will work on UTF-8 strings. You can also use standard functions that work on generic string type, but they are slower because of conversions to UTF-16 and back.
Illustration of data loss when using AnsiString on mobile (Android)
The Android specification requires implementations to support only a few standard charsets. These include ISO-8859-1:
https://developer.android.com/reference/java/nio/charset/Charset
For anything else you depend on the specific device.
For instance, the following example with AnsiString works fine for the French character set, but it fails for Croatian and Chinese.
var
  s: string;
  u: UTF8String;
  a: AnsiString;
begin
  s := 'é à è ù â ê î ô û ç ë ï ü';
  a := s;
  u := s;
  Memo1.Lines.Add(s);
  Memo1.Lines.Add(u);
  Memo1.Lines.Add(a);
  s := 'š đ č ć ž Š Đ Č Ć Ž';
  a := s;
  u := s;
  Memo1.Lines.Add(s);
  Memo1.Lines.Add(u);
  Memo1.Lines.Add(a);
  s := '新年';
  u := s;
  a := s;
  Memo1.Lines.Add(s);
  Memo1.Lines.Add(u);
  Memo1.Lines.Add(a);
end;
The Delphi compiler issues a warning when you convert between string types in a way that can lose data, and it is prudent to fix all such code by using some other string type.
W1058 Implicit string cast with potential data loss from 'string' to 'AnsiString'
There is also a warning when you directly convert between UTF-8 and UTF-16 string types, but to clear those warnings you can just explicitly typecast to the string or UTF8String type, since the compiler will do the appropriate conversion in the background and all information will be retained (note: Unicode normalization may occur during that process).
W1057 Implicit string cast from 'string' to 'UTF8String'
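The explicit typecasts mentioned above could look something like this. It is only a sketch: the characters are written as character codes (U+0161 and U+0111) so the example does not depend on the source file encoding, and the program name is invented.

```delphi
program Utf8Casts;
{$APPTYPE CONSOLE}

var
  s, back: string;
  u: UTF8String;
begin
  s := #$0161#$0111;      // two Croatian letters as UTF-16 code units

  u := UTF8String(s);     // explicit cast: converts UTF-16 to UTF-8, no W1057
  back := string(u);      // explicit cast: converts UTF-8 back to UTF-16

  // The round trip keeps the text intact.
  Assert(back = s);
end.
```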

Cannot assign left side - Boolean(RecBuf[0]) := false

I'm converting old code from Delphi 5 to XE5.
It has this piece of code:
Boolean(RecBuf[0]) := False;
RecBuf is PChar.
This works in Delphi 7, but not in XE5.
In XE5, it gives a "Left side cannot be assigned to" error.
How can I implement this code in XE5?
In Delphi 7, PChar is an alias for PAnsiChar. That is, pointer to 8 bit ANSI character. Which means that RecBuf[0] has type AnsiChar. Since AnsiChar and Boolean have the same size, the cast is valid.
In Delphi XE5, PChar is an alias for PWideChar, pointer to 16 bit wide character. And so RecBuf[0] has type WideChar. Thus the cast is invalid because WideChar and Boolean have different size.
Exactly how best to fix the problem cannot be discerned from the code that you have shown here. Quite possibly you need to redeclare RecBuf. Perhaps it needs to be declared as PAnsiChar. Although one does wonder why you are casting a character to a Boolean.
Another possibility is that the reason RecBuf is declared as PChar is to allow you to use the index operator [], something that in older versions of Delphi is a special capability of pointers to character types. In modern Delphi you can use {$POINTERMATH ON} to enable that functionality for all typed pointers. So, if you did that then perhaps RecBuf should be PBoolean.
The bottom line is that whilst we can explain why the compiler complains about your code, we cannot give you a definitive solution.
Judging from the name RecBuf, it is used not as a pointer to a string but as a pointer to a buffer allocated in memory.
If my assumption is correct, you may want to redeclare RecBuf as a variable of type PByte.
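A minimal sketch of that idea, assuming RecBuf really does point at a raw byte buffer whose first byte holds a Boolean flag. The Buffer array is a hypothetical stand-in for whatever memory the original code allocates.

```delphi
program RecBufDemo;
{$APPTYPE CONSOLE}
{$POINTERMATH ON}  // allows RecBuf[0] indexing on typed pointers

var
  Buffer: array[0..15] of Byte;  // hypothetical record buffer
  RecBuf: PByte;
begin
  FillChar(Buffer, SizeOf(Buffer), $FF);
  RecBuf := @Buffer[0];

  // Byte-sized write, equivalent in effect to the old
  // Boolean(RecBuf[0]) := False on a PAnsiChar buffer.
  RecBuf[0] := Ord(False);
  Assert(Buffer[0] = 0);

  // Alternative: view the first byte through a PBoolean.
  PBoolean(RecBuf)^ := True;
  Assert(Buffer[0] <> 0);
end.
```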
You can't assign cast stuff on the left, you need to cast the right side.
Instead of
Newtype(Whatever) := NewtypeConstant;
you should use
Whatever := TUnknown(constant)
In your case, if RecBuf is an array of, say, Byte, then the code puts False (0) in it, so it should be
Recbuf[0] := Byte(False);
Or you could, you know, just put zero in there.

Appending UnicodeString to WideString in Delphi

I'm curious about what happens with this piece of code in Delphi 2010:
function foo: WideString;
var
  i: Integer;
  myUnicodeString: UnicodeString;
begin
  for i := 1 to 1000 do
  begin
    myUnicodeString := ... something ...;
    result := result + myUnicodeString; // This is where I'm interested
  end;
end;
How many string conversions are involved, and are any particularly bad performance-wise?
I know the function should just return a UnicodeString instead, but I've seen this anti-pattern in the VCL streaming code, and want to understand the process.
To answer your question about what the code is actually doing, this statement:
result := result + myUnicodeString;
Does the following:
calls System._UStrFromWStr() to convert Result to a temp UnicodeString
calls System._UStrCat() to concatenate myUnicodeString onto the temp
calls System._WStrFromUStr() to convert the temp to a WideString and assign it back to Result.
There is a System._WStrCat() function for concatenating a WideString onto a WideString (and System._UStrCat() for UnicodeString). If CodeGear/Embarcadero had been smarter about it, they could have implemented a System._WStrCat() overload that takes a UnicodeString as input and a WideString as output (and vice versa for concatenating a WideString onto a UnicodeString). That way, no temp UnicodeString conversions would be needed anymore. Both WideString and UnicodeString are encoded as UTF-16 (well mostly, but I won't get into that here), so concatenating them together is just a matter of a single allocation and move, just like when concatenating two UnicodeStrings or two WideStrings together.
The performance is poor. There's no need for any encoding conversions since everything is UTF-16 encoded. However, WideString is a wrapper around the COM BSTR type which performs worse than native UnicodeString.
Naturally you should prefer to do all your work with the native types, either UnicodeString or TStringBuilder, and convert to WideString at the last possible moment.
That is generally a good policy. You don't want to use WideString internally since it's purely an interop type. So only convert to (and from) WideString at the interop boundary.
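One way to follow that advice is to accumulate into a local UnicodeString and convert to WideString once, at the end. This is a sketch under the assumption that the function must keep its WideString return type; the appended literal is a placeholder for the elided "something" in the question.

```delphi
program WideConcat;
{$APPTYPE CONSOLE}

function Foo: WideString;
var
  I: Integer;
  Acc: UnicodeString;
begin
  Acc := '';
  for I := 1 to 1000 do
    Acc := Acc + 'ab';   // native UnicodeString concatenation, no WideString involved
  Result := Acc;         // single UnicodeString -> WideString conversion
end;

begin
  Assert(Length(Foo) = 2000);
end.
```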

Delphi XE - RawByteString vs AnsiString

I had a similar question to this here: Delphi XE - should I use String or AnsiString? After deciding that it is right to use ANSI strings in a (large) library of mine, I have realized that I can actually use RawByteString instead of ANSI. Because I mix Unicode strings with ANSI strings, my code now has quite a few places where it does conversions between them. However, it looks like if I use RawByteString I get rid of those conversions.
Please let me know your opinion about it.
Thanks.
Update:
This seems to be disappointing. It looks like the compiler still makes a conversion from RawByteString to string.
procedure TForm1.FormCreate(Sender: TObject);
var
  x1, x2: RawByteString;
  s: string;
begin
  x1 := 'a';
  x2 := 'b';
  x1 := x1 + x2;
  s := x1; { <------- Implicit string cast from 'RawByteString' to 'string' }
end;
I think it does some internal workings (such as copying data) and my code will not be much faster and I will still have to add lots of typecasts in my code in order to silence the compiler.
RawByteString is an AnsiString with no code page set by default.
When you assign another string to this RawByteString variable, you'll copy the code page of the source string. And this will include a conversion. Sorry.
But there is another use of RawByteString, which is to store plain byte content (e.g. the content of a database BLOB field, just like an array of byte).
To summarize:
RawByteString should be used as a "code page agnostic" parameter to a method or function;
RawByteString can be used as a variable type to store some BLOB data.
If you want to reduce conversions, and would rather use 8-bit strings in your application, you should:
Not use the generic AnsiString type, which depends on the current system code page, and with which you'll lose data;
Rely on UTF-8 encoding, i.e. an 8-bit encoding which won't lose any data when converted from or to a UnicodeString;
Don't let the compiler show warnings about implicit conversions: all conversion should be made explicit;
Use your own dedicated set of functions to handle your UTF-8 content.
That is exactly what we did for our framework. We wanted to use UTF-8 in its kernel because:
We rely on UTF-8 encoded JSON for data transmission;
Memory consumption will be smaller;
The used SQLite3 engine will store text as UTF-8 in its database file;
We wanted a way of handling Unicode text with no loss of data with all versions of Delphi (from Delphi 6 up to XE), and WideString was not an option because it's dead slow and you've got the same problem of implicit conversions.
But, in order to achieve the best speed, we wrote some optimized functions to handle our custom string type:
{{ RawUTF8 is an UTF-8 String stored in an AnsiString
- use this type instead of System.UTF8String, which behavior changed
between Delphi 2009 compiler and previous versions: our implementation
is consistent and compatible with all versions of Delphi compiler
- mimic Delphi 2009 UTF8String, without the charset conversion overhead
- all conversion to/from AnsiString or RawUnicode must be explicit }
{$ifdef UNICODE} RawUTF8 = type AnsiString(CP_UTF8); // Codepage for an UTF8string
{$else} RawUTF8 = type AnsiString; {$endif}
/// our fast RawUTF8 version of Trim(), for Unicode only compiler
// - this Trim() is seldom used, but this RawUTF8 specific version is needed
// by Delphi 2009/2010/XE, to avoid two unnecessary conversions into UnicodeString
function Trim(const S: RawUTF8): RawUTF8;
/// our fast RawUTF8 version of Pos(), for Unicode only compiler
// - this Pos() is seldom used, but this RawUTF8 specific version is needed
// by Delphi 2009/2010/XE, to avoid two unnecessary conversions into UnicodeString
function Pos(const substr, str: RawUTF8): Integer; overload; inline;
And we reserved the RawByteString type for handling BLOB data:
{$ifndef UNICODE}
/// define RawByteString, as it does exist in Delphi 2009/2010/XE
// - to be used for byte storage into an AnsiString
// - use this type if you don't want the Delphi compiler to do any
// code page conversions when you assign a typed AnsiString to a RawByteString,
// i.e. a RawUTF8 or a WinAnsiString
RawByteString = AnsiString;
/// pointer to a RawByteString
PRawByteString = ^RawByteString;
{$endif}
/// create a File from a string content
// - uses RawByteString for byte storage, whatever the codepage is
function FileFromString(const Content: RawByteString; const FileName: TFileName;
FlushOnDisk: boolean=false): boolean;
Source code is available in our repository. In this unit, UTF-8 related functions were deeply optimized, with both pascal and asm versions for better speed. We sometimes overloaded default functions (like Pos) to avoid conversion. More information about how we handled text in the framework is available here.
Last word:
If you are sure that you will only have 7-bit content in your application (no accented characters), you may use the default AnsiString type in your program. But in this case, you should add the AnsiStrings unit to your uses clause to get overloaded string functions, which will avoid most unwanted conversions.
RawByteString is still an "AnsiString." It is best described as a "universal receiver" which means it will take on whatever the source-string's codepage is at the point of assignment without forcing a codepage conversion. RawByteString was intended to be used only as a function parameter so that you will, as you've discovered, not incur a conversion between AnsiStrings with differing code-page affinities when calling utility functions which take AnsiStrings.
However, in the case above, you're assigning what is essentially an AnsiString to a UnicodeString which will incur a conversion. It must do a conversion because the RawByteString has a payload of 8bit-based characters, whereas a string (UnicodeString) has a payload of 16bit-based characters.
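To illustrate both points, here is a small sketch. ByteLen is a hypothetical helper, not part of the RTL; it takes a RawByteString parameter so that a UTF8String can be passed in without a code page conversion, while the later assignment to string is a real conversion that is made explicit with a cast.

```delphi
program RawParam;
{$APPTYPE CONSOLE}

// Hypothetical utility: accepts any 8-bit string as-is, whatever its code page.
function ByteLen(const S: RawByteString): Integer;
begin
  Result := Length(S);
end;

var
  U: UTF8String;
  S: string;
begin
  S := #$00E9;            // U+00E9, one UTF-16 code unit
  U := UTF8String(S);     // encoded as two bytes in UTF-8

  // Passing a UTF8String to a RawByteString parameter does not re-encode it.
  Assert(ByteLen(U) = 2);

  // Going back to string converts the 8-bit payload to UTF-16.
  S := string(U);
  Assert(S = #$00E9);
end.
```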

Is there a quick and dirty way to Cast PansiChar to Pchar in Delphi 2009

I have a very large number of apps to convert to Delphi 2009, and there are a number of external interfaces that return PAnsiChars. Does anyone have a quick and simple way to cast these back to PChars? There is a lot on string to PAnsiChar, but not much that I can find on the other way around.
Delphi 2009 has added a new string type called RawByteString. It is defined as:
type
RawByteString = type AnsiString($ffff);
If you need to save binary data coming in as PAnsiChar, you can use this. You should be able to use RawByteString the way you used AnsiString previously.
However, the recommended long term solution is still to convert your programs to Unicode.
There is no way to "cast" a PAnsiChar to a PChar. PChar is Unicode in Delphi 2009. Ansi data cannot simply be cast to Unicode, and vice versa. You have to perform an actual data conversion. If you have a PAnsiChar pointer to some data, and want to put the data into a Unicode string, then assign the PAnsiChar data to an AnsiString first, and then assign the AnsiString to the Unicode string as needed. Likewise, if you need to pass a Unicode string to a PAnsiChar, you have to assign the data to an AnsiString first. There are articles on Embarcadero's and TeamB's blog sites that talk about Delphi 2009 migration issues.
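The two-step conversion described above might be sketched as follows. The Raw variable only simulates a pointer coming back from an external interface, and the data is assumed to be null-terminated text in the system ANSI code page.

```delphi
program PAnsiToString;
{$APPTYPE CONSOLE}

var
  Raw, Tmp: AnsiString;
  P: PAnsiChar;
  S: string;
begin
  Raw := 'hello';
  P := PAnsiChar(Raw);   // stand-in for a PAnsiChar returned by an external call

  Tmp := P;              // copy the null-terminated data into an AnsiString
  S := string(Tmp);      // real conversion from ANSI to UTF-16

  Assert(S = 'hello');
end.
```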
