Widestring to string conversion in Delphi 7 - delphi

My app is a non-Unicode app written in Delphi 7.
I'd like to convert Unicode strings to ANSI with this function:
function convertU(ws : widestring) : string;
begin
result := string(ws);
end;
I also use this code to set the right code page for the conversion:
initialization
SetThreadLocale(GetSystemDefaultLCID);
GetFormatSettings;
It works great in the VCL main thread but not in a TThread,
where I get some question marks '?' as the result of convertU.
Why doesn't it work in a TThread?

AFAIK SetThreadLocale does not change the current system code page, so it won't affect the WideString to AnsiString conversion in Delphi 7, which relies on the GetACP API call, i.e. the system code page.
The system code page is set, e.g. in Windows 7, in the Control Panel under Region and Language / Administrative tab / Code Page for non-Unicode Applications. Changing it requires a system restart.
Delphi 7 uses this system code page, supplying 0 to all conversion API calls. SetThreadLocale will change the locale (e.g. date/time and currency formatting), not the code page used by the system for its Ansi <-> Unicode conversion.
Newer versions of Delphi have a SetMultiByteConversionCodePage() function, able to set the code page to be used for all AnsiString handling.
But API calls (i.e. all ....A() functions in Windows.pas which are mapped by ...() in Delphi 7) will use this system code page. So you will have to call the ...W() wide API after a conversion to Unicode if you want to handle another code page. That is, the Delphi 7 VCL will work only with the system code page, not the value specified by SetThreadLocale.
Under Delphi 7, my advice is:
Use WideString everywhere, together with the specific "Wide" API calls - there are several sets of components for Delphi 7 which handle WideString;
Or use your own types, with a dedicated charset, but you'll need an explicit conversion before using the VCL/RTL or "Ansi" API calls - e.g. MyString = type AnsiString (this is what we do in mORMot, by defining a custom RawUTF8 type for internal UTF-8 processing).
This is much better handled with Delphi 2009 and up, since you can specify a code page to every AnsiString type, and properly handle conversion to/from Unicode, for API calls or VCL process.

Calling SetThreadLocale() inside of an initialization block has no effect on TThread. If you want to set a thread's locale, you have to call SetThreadLocale() inside of the TThread.Execute() method.
A better option is to not rely on SetThreadLocale() at all. Do your own conversion by calling WideCharToMultiByte() directly so you can specify the particular Ansi codepage to convert to.
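As a minimal sketch of the direct conversion suggested above, assuming Delphi 7 and the WideCharToMultiByte() declaration from Windows.pas. The unit name WideConv and the function name WideToAnsiCP are hypothetical, chosen only for illustration. Because the code page is passed explicitly, the result should not depend on which thread calls it:

```pascal
unit WideConv; // hypothetical unit name

interface

uses
  Windows;

// Converts ws to an AnsiString using the given Windows code page
// (for example 1251 or 1252). Characters that do not exist in that
// code page are replaced by the system default character, usually '?'.
function WideToAnsiCP(const ws: WideString; CodePage: UINT): AnsiString;

implementation

function WideToAnsiCP(const ws: WideString; CodePage: UINT): AnsiString;
var
  Len: Integer;
begin
  Result := '';
  if ws = '' then
    Exit;
  // First call: ask how many bytes the converted text needs.
  Len := WideCharToMultiByte(CodePage, 0, PWideChar(ws), Length(ws),
    nil, 0, nil, nil);
  if Len <= 0 then
    Exit;
  SetLength(Result, Len);
  // Second call: perform the conversion into the allocated buffer.
  WideCharToMultiByte(CodePage, 0, PWideChar(ws), Length(ws),
    PAnsiChar(Result), Len, nil, nil);
end;

end.
```

Inside TThread.Execute this could be called as WideToAnsiCP(ws, GetACP) to get the system code page, or with a fixed code page number if the target encoding is known in advance.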

Related

Sending TStringList between different Delphi versions

I'm migrating my Delphi 5 source code to Delphi 10 Berlin. I've got many DLLs in my project which export functions. These functions are called from other DLLs. There are two DLLs which I cannot migrate to Delphi 10, but I still want to use them in my program.
Here an example:
function DoSomething( aList: TStringList ): Boolean; external 'Delphi5.dll';
I want to call "DoSomething" from my Delphi 10 project. The problem is that TStringList in Delphi 5 is not compatible with TStringList in Delphi 10 Berlin (Unicode). It would work if DoSomething had a parameter like "aString: AnsiString", because AnsiString is compatible with "string" in Delphi 5.
Is there a way to send a list between these two Delphi versions? Perhaps a TList or something else? Of course I could send an AnsiString with a separator between the strings to simulate a list, but I want a clean solution, because I've got many of these exported functions.
Thanks!
One should NEVER pass an object reference from an EXE to a DLL if it is meant to be used inside the DLL, or vice versa. An object reference can safely be passed to a DLL only if all the DLL does is pass the object back to the EXE (or vice versa), such as through a callback function.
As you experienced, an object reference is not valid if the EXE and DLL aren't compiled with the same version of Delphi. Even if they are compiled with the same version, I suspect some compiler options could make them incompatible ({$Align} comes to mind, though I have never verified it). And even then, some incompatibilities might still occur (such as "Cannot assign TStringList to TStringList" errors due to RTTI mismatches).
Something that could fix your issue with minimal changes to your code would be to change the declaration of your functions to pass an interface to the DLL, and create a wrapper around TStringList that supports that interface. Said interface would need to support all the functionality you need from TStringList.
function DoSomething( aList: IStringList ): Boolean
Interfaces can be passed between DLL/EXE without most of the problems related to the object reference (as long as they use the exact same interface definition when they are compiled). (Edit: You still need to ensure the data passed to the interface's method are safe to pass to/from a DLL.)
That said, the interface should not use String; it should explicitly use a null-terminated PAnsiChar, or even a WideString (which can safely be sent to/from a DLL).
function DoSomething( aListText: PAnsiChar ): Boolean
function DoSomething( aListText: WideString ): Boolean
Do not use String, which is AnsiString in Delphi 5 but is UnicodeString in Delphi 10. And don't use AnsiString, as it is not compatible between Delphi 5 and Delphi 10 due to internal structure differences.
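To make the interface approach above more concrete, here is a rough sketch of what such a shared unit might look like. Only the name IStringList comes from the answer; the method set, the GUID (an arbitrary value) and the TStringListAdapter wrapper are assumptions made for illustration. It uses WideString for all text, so the same unit should compile in both Delphi 5 and Delphi 10:

```pascal
unit StringListIntf; // hypothetical unit shared by the EXE and the DLLs

interface

uses
  Classes;

type
  // Text crosses the boundary only as WideString, never as String.
  IStringList = interface(IUnknown)
    ['{6F1C2A8E-3B5D-4E7A-9C41-2D8B7A5E0F13}'] // arbitrary GUID
    function GetCount: Integer; stdcall;
    procedure GetItem(Index: Integer; out Value: WideString); stdcall;
    procedure Add(const Value: WideString); stdcall;
  end;

  // Wraps a list owned by the caller; the adapter does not free it.
  TStringListAdapter = class(TInterfacedObject, IStringList)
  private
    FList: TStrings;
  public
    constructor Create(AList: TStrings);
    function GetCount: Integer; stdcall;
    procedure GetItem(Index: Integer; out Value: WideString); stdcall;
    procedure Add(const Value: WideString); stdcall;
  end;

implementation

constructor TStringListAdapter.Create(AList: TStrings);
begin
  inherited Create;
  FList := AList;
end;

function TStringListAdapter.GetCount: Integer;
begin
  Result := FList.Count;
end;

procedure TStringListAdapter.GetItem(Index: Integer; out Value: WideString);
begin
  // Implicit conversion from the compiler's native string type.
  Value := FList[Index];
end;

procedure TStringListAdapter.Add(const Value: WideString);
begin
  FList.Add(Value);
end;

end.
```

The caller would hold the adapter in an IStringList variable and pass that to the DLL, so reference counting frees the adapter when it is no longer used. Exceptions should still not be allowed to escape across the DLL boundary, so a real version would probably return error codes or use safecall.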

The "local" directive in Delphi

I was sitting around debugging some code and I stumbled across this line in SysUtils.pas:
procedure ConvertError(ResString: PResStringRec); local;
What does the local keyword do exactly? It seems the ConvertError function is not declared in the interface section of the file, is this just a clarification that the function is indeed local, or is there a practical benefit to using this directive beyond that?
It dates back to the Linux compiler, Kylix. Here's what I can see in my Delphi 6 language guide, page 9-4:
The directive local, which marks routines as unavailable for export, is platform-specific and has no effect in Windows programming.
On Linux, the local directive provides a slight performance optimization for routines that are compiled into a library but are not exported. The directive can be specified for standalone procedures and functions, but not for methods. A routine declared with local, for example
function Contraband(I: Integer): Integer; local;
does not refresh the EBX register and hence
cannot be exported from a library.
cannot be declared in the interface section of a unit.
cannot have its address taken or be assigned to a procedural-type variable.
if it is a pure assembler routine, cannot be called from another unit unless the caller sets up EBX.

Compiler directive to make string <> UnicodeString

In Delphi 2009 and later versions, the string type is implicitly equal to the UnicodeString type. My discipline now is to use the explicit UnicodeString type in my recent base units to eliminate the confusion. Is there a compiler directive that will make string <> UnicodeString in the unit where it is stated?
Unfortunately, the answer is No. There is no compiler switch for this.
If you want to have a codebase that can be compiled under both Unicode and non-Unicode Delphi versions, you should be aware of the usage of each occurrence of string - is it a string being passed to a Windows API? Do you want to call the 'Delphi native' version, or do you want to call the Ansi version or Wide version explicitly? Is it a string being exchanged with RTL/VCL code? Is it a string from a database? Does it need to support Unicode, Ansi or any other encoding? Etc.
In my experience, code interacting with the Delphi RTL/VCL and WinAPIs (as declared in Windows.pas and such) is best served by string itself, as it transparently means AnsiString or UnicodeString, depending on the compiler being used. If the specific purpose of the string makes the distinction between Ansi and Unicode important, use AnsiString or UnicodeString explicitly. This introduces a problem with older Delphi versions, as they don't have a UnicodeString. In practice this can largely be solved by defining a UnicodeString yourself in some central unit like this:
{$IF NOT DECLARED(UnicodeString)}
type UnicodeString = WideString;
{$IFEND}
If on the other hand you want your code to be configurable to use Ansi or Unicode, use your own string type as often as possible. Define it something like this :
{$IFDEF MY_APP_USE_UNICODE}
type AppString = UnicodeString;
{$ELSE}
type AppString = AnsiString;
{$ENDIF}
.. and work with that in your own code.
So, you don't want to use String at all? Or revert it back to AnsiString?
Well, both things are possible through hacks, but the real answer is: DON'T DO IT.

D2009 TStringlist ansistring

The businesswise calm of the summer has started so I picked up the migration to D2009. I roughly determined for every subsystem of the program if they should remain ascii, or can be unicode, and started porting.
It went pretty ok, all components were there in D2009 versions (some, like VSTView, slightly incompatible though) but I now have run into a problem, in some part that must remain ansistring, I extensively use TStringList, mostly as a basic map.
Is there already something easy to replace it with, or should I simply include a cut down ansistring tstringlist, based on old Delphi or FPC source?
I can't imagine I'm the first to run into this?
The changes must be relatively localised, so that the code remains compilable with BDS2006 while I go through the validation-trajectory. A few ifdefs here and there are no problem.
Of course string->ansistring and char ->ansichar etc don't count as modifications in my source, since I have to do that anyway, and it is fully backwards compat.
Edit: I've been able to work away some of the stuff in reader/writer classes. This makes going for Mason's solution easier than I originally thought. I'll keep Gabr's suggestion in mind as a fallback.
Generics is pretty much the reason I bought D2009. Pity that they made it FPC incompatible though
JCL implements TAnsiStrings and TAnsiStringList in the JclAnsiStrings unit.
If by "map" you mean "hash table", you can replace it with the generic TDictionary. Try declaring something like this:
uses
Generics.Collections;
type
TStringMap<T: class> = class(TDictionary<AnsiString, T>);
Then just replace your StringLists with TStringMaps of the right object type. (Better type-safety gets thrown in free.) Also, if you'd like the dictionary to own the objects and free them when you're done, change it to a TObjectDictionary and when you call the constructor, pass [doOwnsValues] to the appropriate parameter.
(BTW if you're going to use TDictionary, make sure you download D2009 Update 3. The original release had some severe bugs in TDictionary that made it almost unusable.)
EDIT: If it still has to compile under D2006, then you'll have to tweak things a little. Try something like this:
type
  TStringMap =
  {$IFDEF UNICODE}
    class(TDictionary<AnsiString, TObject>)
      // Add some basic wrapper functions here.
    end;
  {$ELSE}
    TStringList;
  {$ENDIF}
The wrapper shouldn't take too much work if you were using it as a map in the first place. You lose the extra type safety in exchange for backwards compatibility, but you gain a real hash table that does its lookups in O(1) time.
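The owning-dictionary variant mentioned above might look roughly like this in Delphi 2009 or later. TItem and the program name are hypothetical, and on newer versions the unit may need to be referenced as System.Generics.Collections, depending on the project's unit scope settings:

```pascal
program AnsiMapDemo; // hypothetical demo program

{$APPTYPE CONSOLE}

uses
  SysUtils, Generics.Collections;

type
  // Placeholder for whatever objects were stored in TStringList.Objects.
  TItem = class
  public
    Value: Integer;
  end;

var
  Map: TObjectDictionary<AnsiString, TItem>;
  Item: TItem;
begin
  // doOwnsValues makes the dictionary free the stored objects itself.
  Map := TObjectDictionary<AnsiString, TItem>.Create([doOwnsValues]);
  try
    Item := TItem.Create;
    Item.Value := 42;
    Map.Add('answer', Item);

    // Hash lookup by AnsiString key, replacing IndexOf plus Objects[].
    if Map.TryGetValue('answer', Item) then
      Writeln(Item.Value);
  finally
    Map.Free; // also frees the TItem instances
  end;
end.
```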
TStringList.LoadFromFile/SaveToFile also take an optional parameter of type TEncoding, that allows you to use TStringList to store any type of string that you want.
procedure LoadFromFile(const FileName: string; Encoding: TEncoding); overload; virtual;
procedure SaveToFile(const FileName: string; Encoding: TEncoding); overload; virtual;
Also note that by default, TStringList uses ANSI as the codepage so that all existing code works as it has.
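A small sketch of the TEncoding overloads quoted above, assuming Delphi 2009 or later. The file name is made up for the example, and the program first writes the file so that it can be read back:

```pascal
program EncodingDemo; // hypothetical demo program

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.Add('first line');
    // Write the list as UTF-8 instead of the default ANSI code page.
    List.SaveToFile('demo_utf8.txt', TEncoding.UTF8);

    List.Clear;
    // Read it back, stating the encoding explicitly.
    List.LoadFromFile('demo_utf8.txt', TEncoding.UTF8);
    Writeln(List.Count);
  finally
    List.Free;
  end;
end.
```

Note that the strings held in memory are still UnicodeString in Delphi 2009; the encoding parameter only affects how the text is stored in the file.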
Do these subsystems need to remain ansistring, or just how they communicate with the outside world (RS232, text files, etc...)? Just like I do with C#, I treat strings in Delphi 2009 as just strings, and only worry about conversions when someone else needs them.
This will also help avoid unintentional implicit conversions in your code and when calling Windows API methods, improving performance.
You can take the TStrings and TStringList classes from Delphi 2007 (or earlier), modify them, and rename them to TAnsiStrings and TAnsiStringList. You should find that to be a very easy modification, and that will give you the classes you need.

AnsiString return values from a Delphi 2007 DLL in a Delphi 2009 application

I have a DLL compiled with D2007 that has functions that return AnsiStrings.
My application is compiled in D2009. When it calls the AnsiString functions, it gets back garbage.
I created a little test app/dll to experiment and discovered that if both app and dll are compiled with the same version of Delphi (either 2007 or 2009), there is no problem. But when one is compiled in 2009 and the other 2007, I get garbage.
I've tried including the latest version of FastMM in both projects, but even then the 2009 app cannot read AnsiStrings from the 2007 dll.
Any ideas of what is going wrong here? Is there a way to work around this?
The internal structure of AnsiStrings changed between Delphi 2007 and Delphi 2009. (Don't get upset; that possibility has been present since day 1.) A Delphi 2009 string maintains a number indicating what code page its data is in.
I recommend you do what every other DLL on Earth does and pass character buffers that the function can fill. The caller should pass a buffer pointer and a number indicating the size of the buffer. (Make sure you're clear about whether you're measuring the size in bytes or characters.) The DLL function fills the buffer, writing no more than the given size, counting the terminating null character.
If the caller doesn't know how many bytes the buffer should be, then you have two options:
Make the DLL behave specially when the input buffer pointer is null. In that case, have it return the required size so that the caller can allocate that much space and call the function a second time.
Have the DLL allocate space for itself, with a predetermined method available for the caller to free the buffer later. The DLL can either export a function for freeing buffers that it has allocated, or you can specify some mutually available API function for the caller to use, such as GlobalFree. Your DLL must use the corresponding allocation API, such as GlobalAlloc. (Don't use Delphi's built-in memory-allocation functions like GetMem or New; there's no guarantee that the caller's memory manager will know how to call Free or Dispose, even if it's written in the same language, even if it's written with the same Delphi version.)
Besides, it's selfish to write a DLL that can only be used by a single language. Write your DLLs in the same style as the Windows API, and you can't go wrong.
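The buffer-filling style described above, combined with the first option (return the required size so the caller can allocate and call again), could be sketched like this. The library name MyLibrary is borrowed from a later answer; the function name GetGreeting and its text are hypothetical. Sizes are measured in bytes and include the terminating null:

```pascal
library MyLibrary;

// Copies the text into Buffer if it fits and always returns the
// number of bytes required, including the terminating null.
// Passing nil as Buffer only queries the required size.
function GetGreeting(Buffer: PAnsiChar; BufSize: Integer): Integer; stdcall;
const
  Text: AnsiString = 'Hello from the DLL';
var
  Needed: Integer;
begin
  Needed := Length(Text) + 1;
  if (Buffer <> nil) and (BufSize >= Needed) then
    Move(PAnsiChar(Text)^, Buffer^, Needed); // copies the null as well
  Result := Needed;
end;

exports
  GetGreeting;

begin
end.
```

On the calling side, under the same assumptions, the two calls might look like this:

```pascal
function GetGreeting(Buffer: PAnsiChar; BufSize: Integer): Integer; stdcall;
  external 'MyLibrary.dll';

function ReadGreeting: AnsiString;
var
  Needed: Integer;
begin
  Needed := GetGreeting(nil, 0);   // first call: query the size
  SetLength(Result, Needed - 1);   // the string adds its own null
  if Needed > 1 then
    GetGreeting(PAnsiChar(Result), Needed); // second call: fill it
end;
```

No Delphi-managed string ever crosses the boundary here, so it should not matter which Delphi version or memory manager each side was built with.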
OK, so haven't tried it, so a big fat disclaimer slapped on this one.
In the help viewer, look at the topic (Unicode in RAD Studio) ms-help://embarcadero.rs2009/devcommon/unicodeinide_xml.html
Returning the Delphi 2007 string to Delphi 2009, you should get two problems.
First, the code page mentioned by Rob. You can set this by declaring another AnsiString and calling StringCodePage on the new AnsiString. Then assign that to the old AnsiString by calling SetCodePage. That should work, but if it doesn't there is hope still.
The second problem is the element size which will be something completely mad. It should be 1, so make it 1. The issue here is that there is no SetElementSize function to lean on.
Try this:
var
ElemSizeAddr: PWord; // Need a two-byte type
BrokenAnsiString: AnsiString; // The patient we are trying to cure
...
ElemSizeAddr := Pointer(PAnsiChar(BrokenAnsiString) - 10);
ElemSizeAddr^ := 1; // The size of the element
That should do it!
Now if the StringCodePage/SetCodePage thing didn't work, you can do the same as above, changing the line where we get the address to deduct 12, instead of 10.
It has hack scribbled all over it, that's why I love it.
You are going to need to port those DLLs eventually, but this makes the port more manageable.
One final word - depending on how you return the AnsiString (function result, output parameter, etc) you may need to first assign the string to a different AnsiString variable just to make sure there is no trouble with memory being overwritten.
You'll likely just need to convert the DLL to 2009. According to Embarcadero, the conversion to 2009 is 'easy' and should take you no time at all.
Your DLL should not be returning AnsiString values to begin with. The only way that would work correctly in the first place is if both DLL and EXE were compiled with the ShareMem unit, and even then only if they are compiled with the same Delphi version. D2007's memory manager is not compatible with D2009's memory manager (or any other cross-version use of memory managers), AFAIK.
I agree with Rob and Remy here: common DLLs should return PAnsiChar instead of AnsiStrings.
If the DLL works OK when compiled with D2009, why not simply stop compiling it with D2007 and start compiling it with D2009 once and for all?
Just as a quick solution here: if the actual data that you pass back from the DLL in the string does not exceed 255 chars, you can change both the in-DLL and interface declarations to use ShortString, which will work regardless of the 2007/2009 version. Since you're already using AnsiString on 2007 without a code page identifier, Unicode won't give you any trouble.
If you go this way, all you need to do is change the declarations like:
function MyStringReturningFunction : ShortString ; external 'MyLibrary.dll';
(and in dll: function MyStringReturningFunction : ShortString; respectively)
Same goes for input/output parameters of course:
procedure MyStringTakingAndReturningFunction(s1:ShortString; var s2:ShortString); external 'MyLibrary.dll';
Should be easier than changing a lot of code. But be careful, as I said, your data must not exceed 255 characters since that is the maximum size a ShortString can hold.

Resources