Sending TStringList between different Delphi versions

I'm migrating my Delphi 5 source code to Delphi 10 Berlin. My project has many DLLs that export functions, and these functions are called from other DLLs. There are two DLLs which I cannot migrate to Delphi 10, but I still want to use them in my program.
Here is an example:
function DoSomething( aList: TStringList ): Boolean; external 'Delphi5.dll';
I want to call "DoSomething" from my Delphi 10 project. The problem is that TStringList in Delphi 5 is not compatible with TStringList in Delphi 10 Berlin (Unicode). It would work if DoSomething had a parameter like "aString: AnsiString", because AnsiString is compatible with "string" in Delphi 5.
Is there a way to send a list between these two Delphi versions? Perhaps a TList or something else? Of course I could send an AnsiString with a separator between the strings to simulate a list, but I want a clean solution, because I've got many of these exported functions.
Thanks!

One should NEVER pass an object reference from an EXE to a DLL if it is meant to be used inside the DLL, or vice versa. An object reference can safely be passed to a DLL only if all the DLL does is pass the object back to the EXE (or vice versa), such as through a callback function.
As you experienced, an object reference is not valid if the EXE and DLL aren't compiled with the same version of Delphi. Even if they are compiled with the same version, I suspect some compiler options could make them incompatible ({$Align} comes to mind, though I have never verified it). And even then, some incompatibilities might still occur (such as "Cannot assign TStringList to TStringList" errors due to RTTI mismatches).
Something that could fix your issue with minimal changes to your code would be to change the declaration of your functions to pass an interface to the DLL, and create a wrapper around TStringList that supports that interface. Said interface would need to support all the functionality you need from TStringList.
function DoSomething( aList: IStringList ): Boolean
Interfaces can be passed between DLL/EXE without most of the problems related to the object reference (as long as they use the exact same interface definition when they are compiled). (Edit: You still need to ensure the data passed to the interface's method are safe to pass to/from a DLL.)
That said, the interface should explicitly use a null-terminated PAnsiChar, or even a WideString (which can safely be sent to/from a DLL).
function DoSomething( aListText: PAnsiChar ): Boolean
function DoSomething( aListText: WideString ): Boolean
Do not use String, which is AnsiString in Delphi 5 but UnicodeString in Delphi 10. And don't use AnsiString, as it is not compatible between Delphi 5 and Delphi 10 due to internal structure differences.
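A minimal sketch of the interface idea above, assuming a hypothetical IStringList with only three methods and a hypothetical adapter class TStringListAdapter (neither name comes from an existing library, and the GUID is arbitrary). Strings cross the boundary only as WideString parameters with the stdcall convention, so neither side depends on the other's native string layout. The same unit would have to be compiled into both the Delphi 5 DLL and the Delphi 10 host.

```delphi
unit StringListIntf; // hypothetical unit name

interface

uses
  Classes;

type
  // Hypothetical interface; both modules must compile this exact declaration.
  IStringList = interface(IUnknown)
    ['{6F1E5C0A-2B7D-4C3E-9A41-5D8E7B2C9F10}']
    function GetCount: Integer; stdcall;
    procedure GetItem(Index: Integer; out Value: WideString); stdcall;
    procedure Add(const S: WideString); stdcall;
  end;

  // Wraps an existing TStrings; the adapter does not own the list.
  TStringListAdapter = class(TInterfacedObject, IStringList)
  private
    FList: TStrings;
  public
    constructor Create(AList: TStrings);
    function GetCount: Integer; stdcall;
    procedure GetItem(Index: Integer; out Value: WideString); stdcall;
    procedure Add(const S: WideString); stdcall;
  end;

implementation

constructor TStringListAdapter.Create(AList: TStrings);
begin
  inherited Create;
  FList := AList;
end;

function TStringListAdapter.GetCount: Integer;
begin
  Result := FList.Count;
end;

procedure TStringListAdapter.GetItem(Index: Integer; out Value: WideString);
begin
  // The native string type of the compiling module is converted to WideString.
  Value := FList[Index];
end;

procedure TStringListAdapter.Add(const S: WideString);
begin
  FList.Add(S);
end;

end.
```

On the calling side, it is probably safest to hold the adapter in a local IStringList variable for the duration of the DoSomething call, so that reference counting keeps it alive until the DLL has returned.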

Related

Is there an easy way to work around a Delphi utf8-file flaw?

I have discovered (the hard way) that if a file has a valid UTF-8 BOM but contains any invalid UTF8 encodings, and is read by any of the Delphi (2009+) encoding-enabled methods such as LoadFromFile, then the result is a completely empty file with no error indication. In several of my applications, I would prefer to simply lose a few bad encodings, even if I get no error report in this case either.
Debugging reveals that MultiByteToWideChar is called twice, first to get the output buffer size, then to do the conversion. But TEncoding.UTF8 contains a private FMBToWCharFlags value for these calls, and this is initialized with a MB_ERR_INVALID_CHARS value. So the call to get the charcount returns 0 and the loaded file is completely empty. Calling this API without the flag would 'silently drop illegal code points'.
My question is how best to weave through the nest of classes in the Encoding area to work around the fact that this is a private value (and needs to be, because it is a class var for all threads). I think I could add a custom UTF8 encoding, using the guidance in Marco Cantu's Delphi 2009 book. And it could optionally raise an exception if MultiByteToWideChar has returned an encoding error, after calling it again without the flag. But that does not solve the problem of how to get my custom encoding used instead of TEncoding.UTF8.
If I could just set this up as a default for the application at initialization, perhaps by actually modifying the class var for TEncoding.UTF8, this would probably be sufficient.
Of course, I need a solution without waiting to lodge a QC report asking for a more robust design, getting it accepted, and seeing it changed.
Any ideas would be very welcome. And can someone confirm this is still an issue for XE4, which I have not yet installed?
I ran into the MB_ERR_INVALID_CHARS issue when I first updated Indy to support TEncoding, and ended up implementing a custom TEncoding-derived class for UTF-8 handling to avoid specifying MB_ERR_INVALID_CHARS. I didn't think to use a class helper.
However, this issue is not just limited to UTF-8. Any decoding failure of any of the TEncoding classes will result in a blank result, not an exception being raised. Why Embarcadero chose that route, when most of the RTL/VCL uses exceptions instead, is beyond me. Not raising an exception on error caused a fair amount of issues in Indy that had to be worked around.
This can be done pretty simply, at least in Delphi XE5 (have not checked earlier versions). Just instantiate your own TUTF8Encoding:
procedure LoadInvalidUTF8File(const Filename: string);
var
  FEncoding: TUTF8Encoding;
begin
  FEncoding := TUTF8Encoding.Create(CP_UTF8, 0, 0);
  // Instead of CP_UTF8, MB_ERR_INVALID_CHARS, 0
  try
    with TStringList.Create do
    try
      LoadFromFile(Filename, FEncoding);
      // ...
    finally
      Free;
    end;
  finally
    FEncoding.Free;
  end;
end;
The only issue here is that the IsSingleByte property for the newly instantiated TUTF8Encoding is then incorrectly set to False, but this property is not currently used anywhere in the Delphi sources.
A partial workaround is to force the UTF8 encoding to suppress MB_ERR_INVALID_CHARS globally. For me, this avoids the need for raising an exception, because I find it makes MultiByteToWideChar not quite 'silent': it actually inserts $fffd characters (Unicode 'replacement character') which I can then find in the cases where this is important. The following code does this:
unit fixutf8;
interface
uses System.Sysutils;
type
  TUTF8fixer = class helper for Tmbcsencoding
  public
    procedure setflag0;
  end;
implementation
procedure TUTF8fixer.setflag0;
{$if CompilerVersion = 31}
asm
  XOR ECX,ECX
  MOV Self.FMBToWCharFlags,ECX
end;
{$else}
begin
  Self.FMBToWCharFlags := 0;
end;
{$endif}
procedure initencoding;
begin
  (Tencoding.UTF8 as TmbcsEncoding).setflag0;
end;
initialization
  initencoding;
end.
A more useful and principled fix would require changing the calls to MultiByteToWideChar not to use MB_ERR_INVALID_CHARS, and to make an initial call with this flag so that an exception could be raised after the load is complete, to indicate that characters will have been replaced.
There are relevant QC reports on this issue, including 76571, 79042 and 111980. The first one has been resolved 'as designed'.
(Edited to work with Delphi Berlin)
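The "load leniently, then report" idea can also be approximated without touching the shared encoding. This is only a sketch: the unit and procedure names are hypothetical, it reuses the TUTF8Encoding.Create(CP_UTF8, 0, 0) call shown in the other answer, and it assumes (as observed above) that invalid sequences come back as $FFFD. A file that legitimately contains U+FFFD would be flagged as well.

```delphi
unit LenientUtf8; // hypothetical unit name

interface

uses
  System.SysUtils, System.Classes;

// Loads a UTF-8 file with a private encoding instance that does not pass
// MB_ERR_INVALID_CHARS, then reports whether the loaded text contains
// the Unicode replacement character U+FFFD.
procedure LoadUtf8Lenient(Lines: TStrings; const FileName: string;
  out HadReplacements: Boolean);

implementation

uses
  Winapi.Windows; // CP_UTF8

procedure LoadUtf8Lenient(Lines: TStrings; const FileName: string;
  out HadReplacements: Boolean);
var
  Enc: TEncoding;
begin
  // Flags 0, 0 instead of MB_ERR_INVALID_CHARS, 0
  Enc := TUTF8Encoding.Create(CP_UTF8, 0, 0);
  try
    Lines.LoadFromFile(FileName, Enc);
  finally
    Enc.Free;
  end;
  HadReplacements := Pos(#$FFFD, Lines.Text) > 0;
end;

end.
```

The caller can then decide whether to raise an exception, log a warning, or carry on silently when HadReplacements is True.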
Your "global" approach is not really global: it relies on the assumption that all code uses one and the same instance of TUTF8Encoding, the instance whose flags field you hacked.
It would not work if one obtains TUTF8Encoding objects by means other than TEncoding.GetUTF8. For example, in XE2 another method, TEncoding.GetEncoding(CP_UTF8), creates a new instance of TUTF8Encoding instead of reusing the shared FUTF8 one. Or some function might call TUTF8Encoding.Create directly.
So I'd suggest two more approaches.
The first approach, patching the class implementation, is somewhat hacky. You introduce your own class for the sake of obtaining a new, "fixed" constructor body.
type
  TMyUTF8Encoding = class(TUTF8Encoding)
  public
    constructor Create; override;
  end;
This constructor would be a copy of the TUTF8Encoding.Create() implementation, except that it sets the flag as you want it (in XE2 this is done by calling another, inherited Create(x, y, z), so you would not need access to the private field).
Then you can patch the stock TUTF8Encoding VMT, overriding its virtual constructor with that new constructor of yours.
You may read the Delphi documentation about "internal formats" and so forth to get the VMT layout. You would also need to call VirtualProtect (or another platform-specific function) to remove protection from the VMT memory area before patching, and then to restore it.
Examples to learn from
How to change the implementation (detour) of an externally declared function
https://stackoverflow.com/a/1482802/976391
Or you may try the Delphi Detours library; hopefully it can patch virtual constructors. Then again, it might be overkill to use that rather complex library for this single goal.
After you have hacked the TUTF8Encoding class, call TEncoding.FreeEncodings to remove the already created shared instances (if any) and thus trigger recreation of the UTF8 instances with your modifications.
The second approach: if you compile your program as a single monolithic EXE, without using runtime BPL modules, you can simply copy the SysUtils.pas source to your application folder and then include that local copy in your project explicitly.
How to patch a method in Classes.pas
There you would change the TUTF8Encoding implementation itself as you see fit in the sources, and Delphi would use it.
This dead-simple (and hence equally reliable) approach would not work, though, if your projects are built to reuse the rtlNNN.bpl runtime package instead of being monolithic.

Widestring to string conversion in Delphi 7

My app is a non-Unicode app written in Delphi 7.
I'd like to convert Unicode strings to ANSI with this function:
function convertU(ws : widestring) : string;
begin
  result := string(ws);
end;
I also use this code to set the right code page for the conversion:
initialization
  SetThreadLocale(GetSystemDefaultLCID);
  GetFormatSettings;
It works great in the VCL main thread but not in a TThread, where I get question marks '?' as the result of convertU.
Why not in a TThread?
AFAIK SetThreadLocale does not change the current system code page, so it won't affect the WideString to AnsiString conversion in Delphi 7, which relies on the GetACP API call, i.e. the system code page.
The system code page is set, e.g. in Windows 7, in the Control Panel under Region and Language / Administrative tab / code page for non-Unicode applications. Changing it requires a system restart.
Delphi 7 uses this system code page, supplying 0 to all conversion API calls. So AFAIR SetThreadLocale won't affect the WideString to AnsiString conversion in Delphi 7. It will change the locale (e.g. date/time and currency formatting), not the code page used by the system for its Ansi <-> Unicode conversion.
Newer versions of Delphi have a SetMultiByteConversionCodePage() function, able to set the code page to be used for all AnsiString handling.
But API calls (i.e. all ....A() functions in Windows.pas which are mapped by ...() in Delphi 7) will use this system code page. So you will have to call the ...W() wide API after a conversion to Unicode if you want to handle another code page. That is, the Delphi 7 VCL will work only with the system code page, not the value specified by SetThreadLocale.
Under Delphi 7, my advice is:
Use WideString everywhere, and specific "Wide" API calls - there are several sets of components for Delphi 7 which handle WideString;
Use your own types, with a dedicated charset, but you'll need an explicit conversion before using the VCL/RTL or "Ansi" API calls - e.g. MyString = type AnsiString (this is what we do in mORMot, by defining a custom RawUTF8 type for internal UTF-8 processing).
This is much better handled with Delphi 2009 and up, since you can specify a code page to every AnsiString type, and properly handle conversion to/from Unicode, for API calls or VCL process.
Calling SetThreadLocale() inside of an initialization block has no effect on TThread. If you want to set a thread's locale, you have to call SetThreadLocale() inside of the TThread.Execute() method.
A better option is to not rely on SetThreadLocale() at all. Do your own conversion by calling WideCharToMultiByte() directly so you can specify the particular Ansi codepage to convert to.
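A direct WideCharToMultiByte() conversion might look like the following sketch, written against the Delphi 7 Windows unit. The unit and function names are hypothetical; the code page is whatever the caller chooses (for example 1251 for Cyrillic), and characters that have no mapping in that code page still come out as the system default character.

```delphi
unit WideConv; // hypothetical unit name

interface

uses
  Windows;

// Converts WS to an ANSI string in the given code page, independently of
// the thread locale and of the system code page.
function WideToAnsiCP(const WS: WideString; CodePage: UINT): AnsiString;

implementation

function WideToAnsiCP(const WS: WideString; CodePage: UINT): AnsiString;
var
  Len: Integer;
begin
  Result := '';
  if WS = '' then
    Exit;
  // First call: ask for the required size in bytes.
  Len := WideCharToMultiByte(CodePage, 0, PWideChar(WS), Length(WS),
    nil, 0, nil, nil);
  if Len <= 0 then
    Exit;
  SetLength(Result, Len);
  // Second call: perform the conversion into the allocated buffer.
  WideCharToMultiByte(CodePage, 0, PWideChar(WS), Length(WS),
    PAnsiChar(Result), Len, nil, nil);
end;

end.
```

Because the code page is an explicit argument, the function behaves the same in the main thread and inside TThread.Execute.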

Where/When to free PWideChars sent to external DLL

I'm allocating memory for several PWideChars in my main executable:
var
  pwcValor: PWideChar;
begin
  pwcValor := AllocMem( sizeof(WideChar) * Succ(Length(pValor)));
  StringToWideChar(pValor, pwcValor, Succ(Length(pValor)));
  pMetodo(pCodigo, pCodigoParametro, pwcValor);
All of these variables are sent over to an external DLL using late binding. I have some questions about this situation to avoid memory leaks.
Where (on my exe or my dll) should I call the FreeMem on these variables?
Do I need to call FreeMem on these variables?
When can I (or should I) call FreeMem on these variables?
If I call FreeMem inside the external DLL (which is also mine), I get access violations when I try to unload the DLL from memory.
Thanks
EDIT
Something I forgot to ask: what about the other way around? I also have to return parameters from my DLL to the EXE, so those PWideChars are allocated in the DLL. So I would have to free them in the DLL, right? But I'll probably still be using them in the EXE. Must I pre-allocate in the EXE, send the pointer to the DLL, and have it filled in the DLL in these cases? Or just make a copy in the EXE of the returned parameter, so I can free it safely in the DLL?
Ultimately that depends on the design of the DLLs you use. However, I would say that if not documented otherwise it is safe to free the resources as soon as the DLL function returns. I would even suggest that you should do it. Anyway you must do it to avoid memory leaks.
Regarding the last sentence: a DLL and the invoking EXE, even though both are Delphi code, use different memory managers, so you cannot free in the DLL memory that was allocated in the EXE.
Regarding freeing:
There are different possible solutions here:
Your exe can allocate buffer, which then would be filled by dll;
Your dll can export one more function, say FreeString. Exe should call it every time it has finished with the string;
You can use the simple WideString type. Strings of this type use the system memory manager, which is the same for the EXE and the DLL.
Personally, I recommend the last option.
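As a rough sketch of the WideString option, assuming a hypothetical library ValorLib that exports a hypothetical procedure GetValor: the DLL assigns the result to a WideString out parameter, and the EXE receives it without either side calling FreeMem.

```delphi
library ValorLib; // hypothetical library name

// The string is allocated by the system allocator behind WideString,
// so the EXE can release it even though the DLL created it.
procedure GetValor(out Value: WideString); stdcall;
begin
  Value := 'text produced inside the DLL';
end;

exports
  GetValor;

begin
end.

// Matching declaration on the EXE side (static binding shown for brevity):
//   procedure GetValor(out Value: WideString); stdcall;
//     external 'ValorLib.dll';
```

With late binding, the same signature would be used for the procedural type that GetProcAddress is cast to.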
If you have a Delphi string variable and are using D2009 or later so that PChar maps to PWideChar then you can just call your function with pMetodo(pCodigo, pCodigoParametro, PChar(pValor)). Then inside your DLL, you take a copy of the string by declaring a string variable and simply assigning to the string. For example the DLL code would look like this:
procedure pMetodo(pwcValor: PChar);
var
  pValor: string;
begin
  pValor := pwcValor;
  ...
end;
The code as you have it is quite unnecessarily complex. Doing it the way I suggest avoids having to use any explicit memory allocation routines. If you want to write C code then why use Delphi!! ;-)
If you wanted to be more explicit then you could write PWideChar rather than PChar.

D2009 TStringlist ansistring

The businesswise calm of the summer has started so I picked up the migration to D2009. I roughly determined for every subsystem of the program if they should remain ascii, or can be unicode, and started porting.
It went pretty OK; all components were available in D2009 versions (some, like VSTView, slightly incompatible though). But I have now run into a problem: in some parts that must remain AnsiString, I extensively use TStringList, mostly as a basic map.
Is there already something easy to replace it with, or should I simply include a cut down ansistring tstringlist, based on old Delphi or FPC source?
I can't imagine I'm the first to run into this?
The changes must be relatively localised, so that the code remains compilable with BDS2006 while I go through the validation-trajectory. A few ifdefs here and there are no problem.
Of course string -> AnsiString and Char -> AnsiChar etc. don't count as modifications in my source, since I have to do that anyway, and it is fully backwards compatible.
Edit: I've been able to work away some of the stuff in reader/writer classes. This makes going for Mason's solution easier than I originally thought. I'll keep Gabr's suggestion in mind as a fallback.
Generics are pretty much the reason I bought D2009. Pity that they made them FPC-incompatible, though.
JCL implements TAnsiStrings and TAnsiStringList in the JclAnsiStrings unit.
If by "map" you mean "hash table", you can replace it with the generic TDictionary. Try declaring something like this:
uses
  Generics.Collections;
type
  // Declared as a descendant class, since a generic type cannot simply be aliased.
  TStringMap<T: class> = class(TDictionary<AnsiString, T>);
Then just replace your StringLists with TStringMaps of the right object type. (Better type-safety gets thrown in free.) Also, if you'd like the dictionary to own the objects and free them when you're done, change it to a TObjectDictionary and when you call the constructor, pass [doOwnsValues] to the appropriate parameter.
(BTW if you're going to use TDictionary, make sure you download D2009 Update 3. The original release had some severe bugs in TDictionary that made it almost unusable.)
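For illustration, an owning map keyed by AnsiString could be used roughly as follows. TItem and the program name are hypothetical; TObjectDictionary and doOwnsValues come from Generics.Collections as mentioned above (newer Delphi versions may need the scoped unit name System.Generics.Collections).

```delphi
program MapDemo; // hypothetical program name
{$APPTYPE CONSOLE}

uses
  Generics.Collections;

type
  // Hypothetical payload class stored in the map.
  TItem = class
  public
    Value: Integer;
  end;

var
  Map: TObjectDictionary<AnsiString, TItem>;
  Item: TItem;
begin
  // doOwnsValues: the dictionary frees the stored objects itself.
  Map := TObjectDictionary<AnsiString, TItem>.Create([doOwnsValues]);
  try
    Item := TItem.Create;
    Item.Value := 42;
    Map.Add('answer', Item);

    // Lookup by AnsiString key, replacing IndexOf/Objects[] on a TStringList.
    if Map.TryGetValue('answer', Item) then
      Writeln(Item.Value);
  finally
    Map.Free; // also frees the TItem instances
  end;
end.
```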
EDIT: If it still has to compile under D2006, then you'll have to tweak things a little. Try something like this:
type
{$IFDEF UNICODE}
  TStringMap = class(TDictionary<AnsiString, TObject>)
    // Add some basic wrapper functions here.
  end;
{$ELSE}
  TStringMap = TStringList;
{$ENDIF}
The wrapper shouldn't take too much work if you were using it as a map in the first place. You lose the extra type safety in exchange for backwards compatibility, but you gain a real hash table that does its lookups in O(1) time.
TStringList.LoadFromFile/SaveToFile also take an optional parameter of type TEncoding, that allows you to use TStringList to store any type of string that you want.
procedure LoadFromFile(const FileName: string; Encoding: TEncoding); overload; virtual;
procedure SaveToFile(const FileName: string; Encoding: TEncoding); overload; virtual;
Also note that by default, TStringList uses ANSI as the codepage so that all existing code works as it has.
Do these subsystems need to remain ansistring, or just how they communicate with the outside world (RS232, text files, etc...)? Just like I do with C#, I treat strings in Delphi 2009 as just strings, and only worry about conversions when someone else needs them.
This will also help avoid unintentional implicit conversions in your code and when calling Windows API methods, improving performance.
You can take the TStrings and TStringList classes from Delphi 2007 (or an earlier version), modify them, and rename them to TAnsiStrings and TAnsiStringList. You should find that to be a very easy modification, and that will give you the classes you need.

AnsiString return values from a Delphi 2007 DLL in a Delphi 2009 application

I have a DLL compiled with D2007 that has functions that return AnsiStrings.
My application is compiled in D2009. When it calls the AnsiString functions, it gets back garbage.
I created a little test app/dll to experiment and discovered that if both app and dll are compiled with the same version of Delphi (either 2007 or 2009), there is no problem. But when one is compiled in 2009 and the other 2007, I get garbage.
I've tried including the latest version of FastMM in both projects, but even then the 2009 app cannot read AnsiStrings from the 2007 dll.
Any ideas of what is going wrong here? Is there a way to work around this?
The internal structure of AnsiStrings changed between Delphi 2007 and Delphi 2009. (Don't get upset; that possibility has been present since day 1.) A Delphi 2009 string maintains a number indicating what code page its data is in.
I recommend you do what every other DLL on Earth does and pass character buffers that the function can fill. The caller should pass a buffer pointer and a number indicating the size of the buffer. (Make sure you're clear about whether you're measuring the size in bytes or characters.) The DLL function fills the buffer, writing no more than the given size, counting the terminating null character.
If the caller doesn't know how many bytes the buffer should be, then you have two options:
Make the DLL behave specially when the input buffer pointer is null. In that case, have it return the required size so that the caller can allocate that much space and call the function a second time.
Have the DLL allocate space for itself, with a predetermined method available for the caller to free the buffer later. The DLL can either export a function for freeing buffers that it has allocated, or you can specify some mutually available API function for the caller to use, such as GlobalFree. Your DLL must use the corresponding allocation API, such as GlobalAlloc. (Don't use Delphi's built-in memory-allocation functions like GetMem or New; there's no guarantee that the caller's memory manager will know how to call Free or Dispose, even if it's written in the same language, even if it's written with the same Delphi version.)
Besides, it's selfish to write a DLL that can only be used by a single language. Write your DLLs in the same style as the Windows API, and you can't go wrong.
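The two-call buffer convention described above could be sketched like this. GetName is a hypothetical exported function, shown here inside a single console program so the example is self-contained; in practice it would live in the DLL. In this sketch the size is measured in bytes and includes the terminating null.

```delphi
program BufferDemo; // hypothetical program name
{$APPTYPE CONSOLE}

// Returns the number of bytes required, including the terminating null.
// Copies the value only when Buffer is non-nil and large enough.
function GetName(Buffer: PAnsiChar; BufSize: Integer): Integer; stdcall;
const
  Value: AnsiString = 'example value';
begin
  Result := Length(Value) + 1;
  if (Buffer <> nil) and (BufSize >= Result) then
    Move(PAnsiChar(Value)^, Buffer^, Result);
end;

var
  Needed: Integer;
  Buf: AnsiString;
begin
  // First call: a nil buffer asks only for the required size.
  Needed := GetName(nil, 0);
  // An AnsiString of length Needed - 1 provides Needed bytes with its terminator.
  SetLength(Buf, Needed - 1);
  // Second call: let the function fill the caller-owned buffer.
  if Needed > 1 then
    GetName(PAnsiChar(Buf), Needed);
  Writeln(Buf);
end.
```

Since the caller allocates and frees the buffer, no memory ever changes hands between the two memory managers.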
OK, so haven't tried it, so a big fat disclaimer slapped on this one.
In the help viewer, look at the topic (Unicode in RAD Studio) ms-help://embarcadero.rs2009/devcommon/unicodeinide_xml.html
Returning the Delphi 2007 string to Delphi 2009, you should get two problems.
First, the code page mentioned by Rob. You can set this by declaring another AnsiString and calling StringCodePage on the new AnsiString. Then assign that to the old AnsiString by calling SetCodePage. That should work, but if it doesn't there is hope still.
The second problem is the element size which will be something completely mad. It should be 1, so make it 1. The issue here is that there is no SetElementSize function to lean on.
Try this:
var
  ElemSizeAddr: PWord; // Need a two-byte type
  BrokenAnsiString: AnsiString; // The patient we are trying to cure
...
  ElemSizeAddr := Pointer(PAnsiChar(BrokenAnsiString) - 10);
  ElemSizeAddr^ := 1; // The size of the element
That should do it!
Now if the StringCodePage/SetCodePage thing didn't work, you can do the same as above, changing the line where we get the address to deduct 12, instead of 10.
It has hack scribbled all over it, that's why I love it.
You are going to need to port those DLLs eventually, but this makes the port more manageable.
One final word - depending on how you return the AnsiString (function result, output parameter, etc) you may need to first assign the string to a different AnsiString variable just to make sure there is no trouble with memory being overwritten.
You'll likely just need to convert the DLL to 2009. According to Embarcadero, the conversion to 2009 is 'easy' and should take you no time at all.
Your DLL should not be returning AnsiString values to begin with. The only way that would work correctly in the first place is if both DLL and EXE were compiled with the ShareMem unit, and even then only if they are compiled with the same Delphi version. D2007's memory manager is not compatible with D2009's memory manager (or any other cross-version use of memory managers), AFAIK.
I agree with Rob and Remy here: common DLLs should return PAnsiChar instead of AnsiStrings.
If the DLL works OK when compiled with D2009, why not simply stop compiling it with D2007 and start compiling it with D2009 once and for all?
Just as a quick solution here: if the actual data that you pass back from the DLL in the string does not exceed 255 chars, you can change both the in-DLL and interface declarations to use ShortString, which will work regardless of the 2007/2009 version. Since you're already using AnsiString on 2007 without a code page identifier, Unicode won't give you any trouble.
If you go this way, all you need to do is change the declarations like this:
function MyStringReturningFunction : ShortString ; external 'MyLibrary.dll';
(and in dll: function MyStringReturningFunction : ShortString; respectively)
Same goes for input/output parameters of course:
procedure MyStringTakingAndReturningFunction(s1:ShortString; var s2:ShortString); external 'MyLibrary.dll';
Should be easier than changing a lot of code. But be careful, as I said, your data must not exceed 255 characters since that is the maximum size a ShortString can hold.

Resources