Compiler directive to make string <> UnicodeString - delphi

In Delphi 2009 and later versions, the string type is implicitly equal to the UnicodeString type. My discipline now is to use the explicit UnicodeString type in my recent base units to eliminate the confusion. Is there a compiler directive that will make string <> UnicodeString in the unit where it is stated?

Unfortunately, the answer is No. There is no compiler switch for this.

If you want a codebase that can be compiled under both Unicode and non-Unicode Delphi versions, you should be aware of the purpose of each occurrence of string. Is it a string being passed to a Windows API? Do you want to call the 'Delphi native' version, or do you want to call the Ansi or Wide version explicitly? Is it a string being exchanged with RTL/VCL code? Is it a string from a database? Does it need to support Unicode, Ansi or any other encoding? Etc.
In my experience, code interacting with the Delphi RTL/VCL and WinAPIs (as declared in Windows.pas and such) is best served by string itself, as it transparently means AnsiString or UnicodeString, depending on the compiler being used. If the specific purpose of the string makes the Ansi/Unicode distinction important, use AnsiString or UnicodeString explicitly. This introduces a problem with older Delphi versions, as they don't have a UnicodeString type. In practice this can largely be solved by defining UnicodeString yourself in some central unit like this:
{$IF NOT DECLARED(UnicodeString)}
type UnicodeString = WideString;
{$IFEND}
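A minimal sketch of that guideline, assuming a central unit (hypothetically named StringChoice) that carries the fallback declaration. lstrlen/lstrlenW are used only because they are tiny WinAPI calls that make the Ansi/Wide mapping visible; NativeLength and WideLength are illustrative names, not an existing API.

```pascal
unit StringChoice; // hypothetical unit name

interface

{$IF NOT DECLARED(UnicodeString)}
type
  UnicodeString = WideString; // fallback for Delphi 2007 and earlier
{$IFEND}

// Hypothetical helpers, only to show the two styles side by side
function NativeLength(const S: string): Integer;
function WideLength(const S: UnicodeString): Integer;

implementation

uses
  Windows;

// Generic string + PChar: lstrlen maps to lstrlenA in Delphi 2007 and
// earlier, and to lstrlenW in Delphi 2009 and later.
function NativeLength(const S: string): Integer;
begin
  Result := lstrlen(PChar(S));
end;

// Unicode matters here, so both the type and the API are chosen explicitly.
function WideLength(const S: UnicodeString): Integer;
begin
  Result := lstrlenW(PWideChar(S));
end;

end.
```

In Delphi 2007 and earlier, NativeLength goes through lstrlenA while WideLength uses lstrlenW on a WideString; in Delphi 2009 and later both end up in lstrlenW.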
If, on the other hand, you want your code to be configurable to use Ansi or Unicode, use your own string type as often as possible. Define it something like this:
{$IFDEF MY_APP_USE_UNICODE}
type AppString = UnicodeString;
{$ELSE}
type AppString = AnsiString;
{$ENDIF}
...and work with that in your own code.

So, you don't want to use String at all? Or revert it back to AnsiString?
Well, both things are possible through hacks, but the real answer is: DON'T DO IT.

Related

Sending TStringList between different Delphi versions

I'm migrating my Delphi 5 source code to Delphi 10 Berlin. I've got many DLLs in my project which export functions. These functions are called from other DLLs. There are two DLLs which I cannot migrate to Delphi 10, but I still want to use them in my program.
Here is an example:
function DoSomething( aList: TStringList ): Boolean; external 'Delphi5.dll';
I want to call "DoSomething" from my Delphi 10 project. But the problem is that TStringList in Delphi 5 is not compatible with TStringList in Delphi 10 Berlin (Unicode). It would work if DoSomething had a parameter like "aString: AnsiString", because AnsiString is compatible with "string" in Delphi 5.
Is there a way to send a list between these two Delphi versions? Perhaps a TList or something else? Of course I could send an AnsiString with a separator between the strings to simulate a list, but I want a clean solution, because I've got many of these export functions.
Thanks!
One should NEVER pass an object reference from an EXE to a DLL if it is meant to be used inside the DLL, or vice versa. An object reference can safely be passed to a DLL only if all the DLL does is pass the object back to the EXE (or vice versa), such as through a callback function.
As you experienced, an object reference is not valid if the EXE and DLL aren't compiled with the same version of Delphi. Even if they are compiled with the same version, I suspect some compiler options could make them incompatible ({$Align} comes to mind, though I have never verified it). And even then, some incompatibilities might still occur (such as "Cannot assign TStringList to TStringList" errors due to RTTI mismatches).
Something that could fix your issue with minimal changes to your code would be to change the declaration of your functions to pass an interface to the DLL, and create a wrapper around TStringList that supports that interface. Said interface would need to support all the functionality you need from TStringList.
function DoSomething(aList: IStringList): Boolean;
Interfaces can be passed between DLL/EXE without most of the problems related to the object reference (as long as they use the exact same interface definition when they are compiled). (Edit: You still need to ensure the data passed to the interface's method are safe to pass to/from a DLL.)
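A sketch of such an interface and wrapper, assuming the only operations needed are Count, reading an item and Add (a real IStringList would mirror whatever your DLL functions actually use). IStringList's methods and TStringListAdapter are hypothetical names and the GUID is a placeholder. Items cross the boundary as WideString, so no Delphi string or AnsiString does; stdcall is an assumption, and any convention works as long as both sides compile the same declaration.

```pascal
unit StringListIntf; // hypothetical shared unit, compiled into both EXE and DLL

interface

uses
  Classes;

type
  // Hypothetical interface: the EXE and the DLL must compile this exact
  // declaration (same GUID, same method order, same parameter types).
  IStringList = interface
    ['{6B3C7E2A-5F1D-4C8B-9A0E-2D7F4B1C9E35}'] // placeholder GUID, generate your own (Ctrl+Shift+G)
    function GetCount: Integer; stdcall;
    function GetItem(Index: Integer): WideString; stdcall;
    procedure Add(const S: WideString); stdcall;
  end;

  // Lives on the side that owns the TStrings; only IStringList crosses the boundary.
  TStringListAdapter = class(TInterfacedObject, IStringList)
  private
    FList: TStrings; // not owned by the adapter
  public
    constructor Create(AList: TStrings);
    function GetCount: Integer; stdcall;
    function GetItem(Index: Integer): WideString; stdcall;
    procedure Add(const S: WideString); stdcall;
  end;

implementation

constructor TStringListAdapter.Create(AList: TStrings);
begin
  inherited Create;
  FList := AList;
end;

function TStringListAdapter.GetCount: Integer;
begin
  Result := FList.Count;
end;

function TStringListAdapter.GetItem(Index: Integer): WideString;
begin
  Result := FList[Index]; // string -> WideString conversion happens on the owner's side
end;

procedure TStringListAdapter.Add(const S: WideString);
begin
  FList.Add(S); // WideString -> string conversion happens on the owner's side
end;

end.
```

The DLL should only use the interface during the call and not keep a reference after it returns, since the adapter does not own the underlying list.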
That said, the interface's methods (or a plain exported function that receives the list's text) should use a DLL-safe type: a null-terminated PAnsiChar, or even a WideString (which can safely be sent to/from a DLL).
function DoSomething(aListText: PAnsiChar): Boolean;
function DoSomething(aListText: WideString): Boolean;
Do not use String, which is AnsiString in Delphi 5 but UnicodeString in Delphi 10. And don't use AnsiString either, as it is not compatible between Delphi 5 and Delphi 10 due to internal structure differences.
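A sketch of the WideString variant on the Delphi 5 side, assuming the DLL only needs the list's lines: the text is passed as one WideString and a local TStringList is rebuilt from it, so no object reference crosses the boundary. The stdcall convention and the "Count > 0" body are placeholders, not the asker's real code.

```pascal
library Delphi5;

// Delphi 5 side (sketch): rebuilds a private TStringList from the text.

uses
  SysUtils,
  Classes;

function DoSomething(aListText: WideString): Boolean; stdcall; // stdcall is an assumption
var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    // Text splits on line breaks; the WideString -> AnsiString conversion
    // uses the system code page of the Delphi 5 module.
    List.Text := aListText;
    // Placeholder for the real work done on the list
    Result := List.Count > 0;
  finally
    List.Free;
  end;
end;

exports
  DoSomething;

begin
end.
```

On the Delphi 10 side the import would read function DoSomething(aListText: WideString): Boolean; stdcall; external 'Delphi5.dll'; and be called as DoSomething(MyList.Text) for any TStrings. Because WideString memory comes from the COM BSTR allocator rather than either module's memory manager, ShareMem is not needed for this parameter.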

How and where can I overwrite the default string type?

I would like to make things straight and declare/overwrite the default string type to either a wide or an ansi string.
E.g. string = WideString; under Delphi 2009
How and where is it possible to declare/set/change the default string type, so that the entire project and the IDE guarantee it has been specifically overwritten?
"Where is the best place to declare/set this, so the entire project and the IDE guarantees, that the default string is specifically declared/overwritten?"
Nowhere. string is a keyword in Delphi and cannot be redeclared (see the Delphi keywords documentation).
In Unicode Delphi versions, string is an alias for UnicodeString; in earlier versions it is an alias for AnsiString.
WideString is provided for compatibility with the COM BSTR type and, unlike string, AnsiString or UnicodeString, it is not reference counted (see the String Types documentation).
No matter which Delphi version you use (pre-Unicode or Unicode), using the generic string type is preferred.
But in places where you need to be specific and the code depends on the exact type, use AnsiString or UnicodeString, even though they may map to the generic string in a particular Delphi version:
Use AnsiString in pre-Unicode Delphi to ensure compatibility across versions and correctness in code that depends on a variable being AnsiString.
Use UnicodeString in Unicode Delphi versions to future-proof your code when its correctness depends on a variable being UnicodeString.
If you use Delphi 2009 or higher, string is defined as UnicodeString. In earlier versions it is defined as AnsiString.
There is no way to redeclare the string type.
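Although string itself cannot be redeclared, a unit or program can at least refuse to compile when string would not mean UnicodeString. A minimal sketch, relying on the UNICODE conditional that Delphi 2009 and later define; the program name and message text are made up.

```pascal
program StringAliasCheck;

{$APPTYPE CONSOLE}

// Guard: stop the build on a pre-Unicode compiler, so code that assumes
// string = UnicodeString can never silently get AnsiString instead.
{$IFNDEF UNICODE}
  {$MESSAGE FATAL 'This code requires Delphi 2009 or later (string = UnicodeString).'}
{$ENDIF}

begin
  // Char is WideChar here, so this prints 2
  Writeln('SizeOf(Char) = ', SizeOf(Char));
end.
```

The same {$IFNDEF UNICODE} guard can be placed at the top of any unit that must only ever see UnicodeString.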

Widestring to string conversion in Delphi 7

My app is a non-Unicode app written in Delphi 7.
I'd like to convert Unicode strings to ANSI with this function:
function convertU(ws : widestring) : string;
begin
result := string(ws);
end;
I also use this code to set the right code page for the conversion:
initialization
SetThreadLocale(GetSystemDefaultLCID);
GetFormatSettings;
It works great in the VCL main thread but not in a TThread, where I get some question marks '?' as the result of convertU.
Why doesn't it work in a TThread?
AFAIK SetThreadLocale does not change the current system code page, so it won't affect the WideString to AnsiString conversion in Delphi 7, which relies on the GetACP API call, i.e. the system code page.
The system code page is set, e.g. in Windows 7, in the Control Panel: Region and Language, Administrative tab, then the code page setting for non-Unicode applications. Changing it requires a system restart.
Delphi 7 uses this system Code Page, supplying 0 to all conversion API calls. So AFAIR SetThreadLocale won't affect the widestring to ansistring conversion in Delphi 7. It will change the locale (e.g. date/time and currency formatting), not the code page used by the system for its Ansi <-> Unicode conversion.
Newer versions of Delphi have a SetMultiByteConversionCodePage() function, able to set the code page to be used for all AnsiString handling.
But API calls (i.e. all the ...A() functions in Windows.pas, to which the plain ...() names are mapped in Delphi 7) will use this system code page. So you will have to call the ...W() wide API after a conversion to Unicode if you want to handle another code page. That is, the Delphi 7 VCL will work only with the system code page, not with the value specified by SetThreadLocale.
Under Delphi 7, my advice is:
Use WideString everywhere, and specific "Wide" API calls - there are several set of components for Delphi 7 which handle WideString;
Use your own types, with a dedicated charset, but you'll need an explicit conversion before using the VCL/RTL or "Ansi" API calls - e.g. MyString = type AnsiString (this is what we do in mORMot, by defining a custom RawUTF8 type for internal UTF-8 process).
This is much better handled with Delphi 2009 and up, since you can specify a code page to every AnsiString type, and properly handle conversion to/from Unicode, for API calls or VCL process.
Calling SetThreadLocale() inside of an initialization block has no effect on TThread. If you want to set a thread's locale, you have to call SetThreadLocale() inside of the TThread.Execute() method.
A better option is to not rely on SetThreadLocale() at all. Do your own conversion by calling WideCharToMultiByte() directly so you can specify the particular Ansi codepage to convert to.
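A sketch of that approach for Delphi 7, as in the question, assuming the target code page is known in advance. WideToAnsiCP is a hypothetical name. WideCharToMultiByte is called twice: first to size the buffer, then to convert. As long as an explicit code page (not CP_ACP) is passed, neither the thread locale nor GetACP is consulted, so it behaves the same in the main thread and in a TThread.

```pascal
unit AnsiConvert; // hypothetical unit name

interface

uses
  Windows, SysUtils;

// Hypothetical helper: convert to an explicitly chosen Ansi code page
function WideToAnsiCP(const WS: WideString; CodePage: UINT): AnsiString;

implementation

function WideToAnsiCP(const WS: WideString; CodePage: UINT): AnsiString;
var
  Len: Integer;
begin
  Result := '';
  if WS = '' then
    Exit;
  // First call: ask how many bytes the converted text needs
  Len := WideCharToMultiByte(CodePage, 0, PWideChar(WS), Length(WS),
    nil, 0, nil, nil);
  if Len = 0 then
    RaiseLastOSError;
  SetLength(Result, Len);
  // Second call: convert into the AnsiString's own buffer; characters that
  // do not exist in CodePage are replaced by the code page's default char
  if WideCharToMultiByte(CodePage, 0, PWideChar(WS), Length(WS),
    PAnsiChar(Result), Len, nil, nil) = 0 then
    RaiseLastOSError;
end;

end.
```

For example, WideToAnsiCP(SomeWide, 1251) yields Windows-1251 (Cyrillic) bytes regardless of the system code page. The result is still just bytes in that code page, so the Delphi 7 VCL will display it using the system code page.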

Delphi XE2 Dataset field type TStringField does not support Unicode?

I've been looking through the TDataSet class and its string fields in Delphi XE2, and noticed that AsWideString returns a UnicodeString. However, it gets the value from TField.AsString: String, which in turn calls TField.AsAnsiString: AnsiString. Wouldn't any Unicode characters therefore be lost? Also, the buffer passed to TDataSet.GetFieldData is declared as an array of AnsiChar.
Am I understanding this correctly?
No, you should be examining the TWideStringField class which is for Unicode fields and the TStringField class which is for non-Unicode strings. TField is just a base class and TField.GetAsWideString is a virtual method with a fall back implementation that is overridden by descendants that are Unicode aware.
YES, you did understand it correctly. It is the VCL and its documentation that are broken. Your confusion makes perfect sense!
In the Delphi 2009+ implementation, you have to use AsString property for AnsiString and AsWideString for string=UnicodeString.
In fact, the As*String properties are defined as such:
property AsString: string read GetAsString write SetAsString;
property AsWideString: UnicodeString read GetAsWideString write SetAsWideString;
property AsAnsiString: AnsiString read GetAsAnsiString write SetAsAnsiString;
How on earth may we be able to find out that AsString returns an AnsiString? It just does not make sense at all, when compared to the rest of the VCL/RTL.
The implementation, which uses TStringField class for AnsiString and TWideStringField for string=UnicodeString is broken.
Furthermore, the documentation is also broken:
Data.DB.TField.AsString
Represents the field's value as a string (Delphi) or an AnsiString (C++).
This does not represent a string in Delphi, but an AnsiString! The fact that the property uses a plain string=UnicodeString type is perfectly misleading.
From the database point of view, it is up to the DB driver to handle Unicode or work with a specific charset. But from the VCL point of view, in Delphi 2009+ you should only need to know about the string type, and be confident that using AsString: String will be Unicode-ready.
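To see which field class you actually get for each field type, a small console sketch can create one field of each kind and round-trip non-Ansi text through the wide one. It assumes Delphi XE2 or later (unit-scoped names), a TClientDataSet as an in-memory dataset, and MidasLib being available to link MIDAS statically; the field names are made up.

```pascal
program WideFieldDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  Data.DB,
  Datasnap.DBClient,
  MidasLib; // links MIDAS statically, so no midas.dll is needed

var
  CDS: TClientDataSet;
  Greek: string;
begin
  // Greek letters: not representable in most Western Ansi code pages
  Greek := #$03A9#$03BC#$03B5#$03B3#$03B1;
  CDS := TClientDataSet.Create(nil);
  try
    CDS.FieldDefs.Add('AnsiName', ftString, 20);     // becomes a TStringField
    CDS.FieldDefs.Add('WideName', ftWideString, 20); // becomes a TWideStringField
    CDS.CreateDataSet;

    CDS.Append;
    CDS.FieldByName('WideName').AsWideString := Greek;
    CDS.Post;

    Writeln(CDS.FieldByName('AnsiName').ClassName);
    Writeln(CDS.FieldByName('WideName').ClassName);
    Writeln(BoolToStr(CDS.FieldByName('WideName').AsWideString = Greek, True));
  finally
    CDS.Free;
  end;
end.
```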

D2009 TStringlist ansistring

The businesswise calm of the summer has started, so I picked up the migration to D2009. For every subsystem of the program I roughly determined whether it should remain ASCII or can be Unicode, and started porting.
It went pretty ok, all components were there in D2009 versions (some, like VSTView, slightly incompatible though) but I now have run into a problem, in some part that must remain ansistring, I extensively use TStringList, mostly as a basic map.
Is there already something easy to replace it with, or should I simply include a cut-down AnsiString TStringList, based on old Delphi or FPC source?
I can't imagine I'm the first to run into this?
The changes must be relatively localised, so that the code remains compilable with BDS2006 while I go through the validation-trajectory. A few ifdefs here and there are no problem.
Of course string->ansistring and char ->ansichar etc don't count as modifications in my source, since I have to do that anyway, and it is fully backwards compat.
Edit: I've been able to work away some of the stuff in the reader/writer classes. This makes going for Mason's solution easier than I originally thought. I'll keep Gabr's suggestion in mind as a fallback.
Generics is pretty much the reason I bought D2009. Pity that they made it FPC incompatible though
JCL implements TAnsiStrings and TAnsiStringList in the JclAnsiStrings unit.
If by "map" you mean "hash table", you can replace it with the generic TDictionary. Try declaring something like this:
uses
  Generics.Collections;
type
  TStringMap<T: class> = class(TDictionary<AnsiString, T>);
Then just replace your StringLists with TStringMaps of the right object type. (Better type-safety gets thrown in free.) Also, if you'd like the dictionary to own the objects and free them when you're done, change it to a TObjectDictionary and when you call the constructor, pass [doOwnsValues] to the appropriate parameter.
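A self-contained sketch of the TObjectDictionary variant, assuming Delphi 2009 Update 3 or later; TCustomer and the keys are made up for the demo.

```pascal
program StringMapDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  Generics.Collections;

type
  // Hypothetical value class, only for this demo
  TCustomer = class
  public
    Name: AnsiString;
    constructor Create(const AName: AnsiString);
  end;

constructor TCustomer.Create(const AName: AnsiString);
begin
  inherited Create;
  Name := AName;
end;

var
  Map: TObjectDictionary<AnsiString, TCustomer>;
  Customer: TCustomer;
begin
  // doOwnsValues: the dictionary frees its TCustomer values when they are
  // removed or when the dictionary itself is destroyed
  Map := TObjectDictionary<AnsiString, TCustomer>.Create([doOwnsValues]);
  try
    Map.Add('K001', TCustomer.Create('Alice'));
    Map.Add('K002', TCustomer.Create('Bob'));
    if Map.TryGetValue('K002', Customer) then
      Writeln(Customer.Name);
  finally
    Map.Free; // also frees both TCustomer objects
  end;
end.
```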
(BTW if you're going to use TDictionary, make sure you download D2009 Update 3. The original release had some severe bugs in TDictionary that made it almost unusable.)
EDIT: If it still has to compile under D2006, then you'll have to tweak things a little. Try something like this:
type
  TStringMap =
  {$IFDEF UNICODE}
    class(TDictionary<AnsiString, TObject>)
      // (Add some basic wrapper functions here.)
    end;
  {$ELSE}
    TStringList;
  {$ENDIF}
The wrapper shouldn't take too much work if you were using it as a map in the first place. You lose the extra type safety in exchange for backwards compatibility, but you gain a real hash table that does its lookups in O(1) time.
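One way to fill in the wrapper so that calling code compiles unchanged under both BDS2006 and D2009 is to make both branches real classes with the same method names. This is a sketch: TStringMap, AddObject as a procedure, and Lookup are illustrative choices, not an existing API. Two behavioral differences remain: TStringList.IndexOf is case-insensitive by default and only uses binary search when Sorted is True, whereas TDictionary's default AnsiString comparer is case-sensitive.

```pascal
unit StringMaps; // hypothetical unit name

interface

uses
  Classes{$IFDEF UNICODE}, Generics.Collections{$ENDIF};

type
{$IFDEF UNICODE}
  // D2009 and later: real hash table
  TStringMap = class(TDictionary<AnsiString, TObject>)
  public
    procedure AddObject(const Key: AnsiString; Value: TObject);
    function Lookup(const Key: AnsiString): TObject;
  end;
{$ELSE}
  // BDS2006: keep TStringList; AddObject is inherited from it
  TStringMap = class(TStringList)
  public
    function Lookup(const Key: AnsiString): TObject;
  end;
{$ENDIF}

implementation

{$IFDEF UNICODE}
procedure TStringMap.AddObject(const Key: AnsiString; Value: TObject);
begin
  AddOrSetValue(Key, Value); // replaces an existing entry instead of duplicating it
end;

function TStringMap.Lookup(const Key: AnsiString): TObject;
begin
  if not TryGetValue(Key, Result) then
    Result := nil;
end;
{$ELSE}
function TStringMap.Lookup(const Key: AnsiString): TObject;
var
  Index: Integer;
begin
  Index := IndexOf(Key); // binary search only when Sorted = True
  if Index >= 0 then
    Result := Objects[Index]
  else
    Result := nil;
end;
{$ENDIF}

end.
```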
TStringList.LoadFromFile/SaveToFile also take an optional parameter of type TEncoding, that allows you to use TStringList to store any type of string that you want.
procedure LoadFromFile(const FileName: string; Encoding: TEncoding); overload; virtual;
procedure SaveToFile(const FileName: string; Encoding: TEncoding); overload; virtual;
Also note that by default, TStringList uses ANSI as the code page, so that all existing code works as it always has.
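A minimal sketch of those overloads, assuming Delphi 2009 or later; the file names are made up. It writes the same list once in the Ansi code page and once as UTF-8, then reads the UTF-8 file back with an explicit encoding.

```pascal
program EncodingDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils,
  Classes;

var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.Add('Hello');
    // TEncoding.Default: the system Ansi code page on Windows (as of Delphi 2009)
    List.SaveToFile('demo_ansi.txt', TEncoding.Default);
    // TEncoding.UTF8: written with a UTF-8 byte order mark
    List.SaveToFile('demo_utf8.txt', TEncoding.UTF8);

    List.Clear;
    List.LoadFromFile('demo_utf8.txt', TEncoding.UTF8);
    Writeln(List[0]);
  finally
    List.Free;
  end;
end.
```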
Do these subsystems need to remain ansistring, or just how they communicate with the outside world (RS232, text files, etc...)? Just like I do with C#, I treat strings in Delphi 2009 as just strings, and only worry about conversions when someone else needs them.
This will also help avoid unintentional implicit conversions in your code and when calling Windows API methods, improving performance.
You can modify Delphi 2007(or earlier)'s TStrings and TStringList classes and rename them to TAnsiStrings and TAnsiStringList. You should find that to be a very easy modification, and that will give you the classes you need.
