I asked a similar question about implicit interface variables not so long ago.
The source of this question was a bug in my code due to me not being aware of the existence of an implicit interface variable created by the compiler. This variable was finalized when the procedure that owned it finished. This in turn caused a bug due to the lifetime of the variable being longer than I had anticipated.
Now, I have a simple project to illustrate some interesting behaviour from the compiler:
program ImplicitInterfaceLocals;
{$APPTYPE CONSOLE}
uses
Classes;
function Create: IInterface;
begin
Result := TInterfacedObject.Create;
end;
procedure StoreToLocal;
var
I: IInterface;
begin
I := Create;
end;
procedure StoreViaPointerToLocal;
var
I: IInterface;
P: ^IInterface;
begin
P := #I;
P^ := Create;
end;
begin
StoreToLocal;
StoreViaPointerToLocal;
end.
StoreToLocal is compiled just as you would imagine. The local variable I, the function's result, is passed as an implicit var parameter to Create. The tidy up for StoreToLocal results in a single call to IntfClear. No surprises there.
However, StoreViaPointerToLocal is treated differently. The compiler creates an implicit local variable which it passes to Create. When Create returns, the assignment to P^ is performed. This leaves the routine with two local variables holding references to the interface. The tidy up for StoreViaPointerToLocal results in two calls to IntfClear.
The compiled code for StoreViaPointerToLocal is like this:
ImplicitInterfaceLocals.dpr.24: begin
00435C50 55 push ebp
00435C51 8BEC mov ebp,esp
00435C53 6A00 push $00
00435C55 6A00 push $00
00435C57 6A00 push $00
00435C59 33C0 xor eax,eax
00435C5B 55 push ebp
00435C5C 689E5C4300 push $00435c9e
00435C61 64FF30 push dword ptr fs:[eax]
00435C64 648920 mov fs:[eax],esp
ImplicitInterfaceLocals.dpr.25: P := #I;
00435C67 8D45FC lea eax,[ebp-$04]
00435C6A 8945F8 mov [ebp-$08],eax
ImplicitInterfaceLocals.dpr.26: P^ := Create;
00435C6D 8D45F4 lea eax,[ebp-$0c]
00435C70 E873FFFFFF call Create
00435C75 8B55F4 mov edx,[ebp-$0c]
00435C78 8B45F8 mov eax,[ebp-$08]
00435C7B E81032FDFF call #IntfCopy
ImplicitInterfaceLocals.dpr.27: end;
00435C80 33C0 xor eax,eax
00435C82 5A pop edx
00435C83 59 pop ecx
00435C84 59 pop ecx
00435C85 648910 mov fs:[eax],edx
00435C88 68A55C4300 push $00435ca5
00435C8D 8D45F4 lea eax,[ebp-$0c]
00435C90 E8E331FDFF call #IntfClear
00435C95 8D45FC lea eax,[ebp-$04]
00435C98 E8DB31FDFF call #IntfClear
00435C9D C3 ret
I can guess as to why the compiler is doing this. When it can prove that assigning to the result variable will not raise an exception (i.e. if the variable is a local) then it uses the result variable directly. Otherwise it uses an implicit local and copies the interface once the function has returned thus ensuring that we don't leak the reference in case of an exception.
But I cannot find any statement of this in the documentation. It matters because interface lifetime is important and as a programmer you need to be able to influence it on occasion.
So, does anybody know if there is any documentation of this behaviour? If not does anyone have any more knowledge of it? How are instance fields handled, I have not checked that yet. Of course I could try it all out for myself but I'm looking for a more formal statement and always prefer to avoid relying on implementation detail worked out by trial and error.
Update 1
To answer Remy's question, it mattered to me when I needed to finalize the object behind the interface before carrying out another finalization.
begin
AcquirePythonGIL;
try
PyObject := CreatePythonObject;
try
//do stuff with PyObject
finally
Finalize(PyObject);
end;
finally
ReleasePythonGIL;
end;
end;
As written like this it is fine. But in the real code I had a second implicit local which was finalized after the GIL was released and that bombed. I solved the problem by extracting the code inside the Acquire/Release GIL into a separate method and thus narrowed the scope of the interface variable.
If there is any documentation of this behavior, it will probably be in the area of compiler production of temporary variables to hold intermediate results when passing function results as parameters. Consider this code:
procedure UseInterface(foo: IInterface);
begin
end;
procedure Test()
begin
UseInterface(Create());
end;
The compiler has to create an implicit temp variable to hold the result of Create as it is passed into UseInterface, to make sure that the interface has a lifetime >= the lifetime of the UseInterface call. That implicit temp variable will be disposed at the end of the procedure that owns it, in this case at the end of the Test() procedure.
It's possible that your pointer assignment case may fall into the same bucket as passing intermediate interface values as function parameters, since the compiler can't "see" where the value is going.
I recall there have been a few bugs in this area over the years. Long ago (D3? D4?), the compiler didn't reference count the intermediate value at all. It worked most of the time, but got into trouble in parameter alias situations. Once that was addressed there was a follow up regarding const params, I believe. There was always a desire to move disposal of the intermediate value interface up to as soon as possible after the statement in which it was needed, but I don't think that ever got implemented in the Win32 optimizer because the compiler just wasn't set up for handling disposal at statement or block granularity.
You can not guarantee that compiler will not decide to create a temporal invisible variable.
And even if you do, the turned off optimization (or even stack frames?) may mess up your perfectly checked code.
And even if you manage to review your code under all possible combinations of project options - compiling your code under something like Lazarus or even new Delphi version will bring hell back.
A best bet would be to use "internal variables can not outlive routine" rule. We usually do not know, if compiler would create some internal variables or not, but we do know, that any such variables (if created) would be finalized when routine exists.
Therefore, if you have code like this:
// 1. Some code which may (or may not) create invisible variables
// 2. Some code which requires release of reference-counted data
E.g.:
Lib := LoadLibrary(Lib, 'xyz');
try
// Create interface
P := GetProcAddress(Lib, 'xyz');
I := P;
// Work with interface
finally
// Something that requires all interfaces to be released
FreeLibrary(Lib); // <- May be not OK
end;
Then you should just wrap "Work with interface" block into subroutine:
procedure Work(const Lib: HModule);
begin
// Create interface
P := GetProcAddress(Lib, 'xyz');
I := P;
// Work with interface
end; // <- Releases hidden variables (if any exist)
Lib := LoadLibrary(Lib, 'xyz');
try
Work(Lib);
finally
// Something that requires all interfaces to be released
FreeLibrary(Lib); // <- OK!
end;
It is a simple, but effective rule.
Related
Introduction
I encountered a problem with Currency in one of our applications. I was getting different results in Win32 and Win64. I found an article here that shows a similar problem but that one was fixed in XE6. The first thing I tried to do was create a MCVE to duplicate the problem. That's where the wheels fell off. What looks like the identical code in the MCVE produces a different result compared to the application. The generated code 64 bit is different. So my question morphed into why are they different and once I figure that out then I can create a suitable MCVE.
I have a method that is calculating a total. This method calls another method to get a value that needs to be added to the total. The method returns a single. I assign the single value to a variable and then add it to the total which is a Currency. In my main application the value for the total is used later on but adding that to the MCVE doesn't change the behavior. I made sure that the compiler options were the same.
In my main application, the result from the calculation is $2469.6001 in Win32 and 2469.6 in Win64 but I can't duplicate this in the MCVE. Everything on the Compiling options page was the same and optimizations were disabled.
Attempted MCVE
Here is the code for my attempted MCVE. This mimics the actions in the original application.
program Project4;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils;
type
TTestClass = class
strict private
FMyCurrency: Currency;
function GetTheValue: Single;
public
procedure Calculate;
property MyCurrency: Currency read FMyCurrency write FMyCurrency;
end;
procedure TTestClass.Calculate;
var
myValue: Single;
begin
FMyCurrency := 0.0;
myValue := GetTheValue;
FMyCurrency := FMyCurrency + myValue;
end;
function TTestClass.GetTheValue: Single;
var
myValueExact: Int32;
begin
myValueExact := 1159354778; // 2469.60009765625;
Result := PSingle(#myValueExact)^;
end;
var
testClass: TTestClass;
begin
testClass := TTestClass.Create;
try
testClass.Calculate;
WriteLn(CurrToStr(testClass.MyCurrency));
ReadLn;
finally
testClass.Free;
end;
end.
This code generates the following assembler for the last two lines of TTestClass.Calculate:
Project4.dpr.25: myValue := GetTheValue;
00000000004242A8 488B4D40 mov rcx,[rbp+$40]
00000000004242AC E83F000000 call TTestClass.GetTheValue
00000000004242B1 F30F11452C movss dword ptr [rbp+$2c],xmm0
Project4.dpr.26: FMyCurrency := FMyCurrency + myValue;
00000000004242B6 488B4540 mov rax,[rbp+$40]
00000000004242BA 488B4D40 mov rcx,[rbp+$40]
00000000004242BE F2480F2A4108 cvtsi2sd xmm0,qword ptr [rcx+$08]
00000000004242C4 F3480F5A4D2C cvtss2sd xmm1,qword ptr [rbp+$2c]
00000000004242CA F20F590D16000000 mulsd xmm1,qword ptr [rel $00000016]
00000000004242D2 F20F58C1 addsd xmm0,xmm1
00000000004242D6 F2480F2DC8 cvtsd2si rcx,xmm0
00000000004242DB 48894808 mov [rax+$08],rcx
Main Application
This is an extract from the main application. It's difficult to give more information but I don't think that will change the nature of the question. In this class, FBulkTotal is declared as a Currency that is strict private. UpdateTotals is public.
procedure TMainApplicationClass.UpdateTotals(aMyObject: TMyObject);
var
bulkTotal: Single;
begin
..
bulkTotal := grouping.GetTotal(aMyObject, Self);
FBulkTotal := FBulkTotal + bulkTotal;
..
end;
The generated code for these two lines is:
TheCodeUnit.pas.7357: bulkTotal := grouping.GetTotal(aMyObject, Self);
0000000006DB0804 488B4D68 mov rcx,[rbp+$68]
0000000006DB0808 488B9598000000 mov rdx,[rbp+$00000098]
0000000006DB080F 4C8B8590000000 mov r8,[rbp+$00000090]
0000000006DB0816 E8551C0100 call grouping.GetTotal
0000000006DB081B F30F114564 movss dword ptr [rbp+$64],xmm0
TheCodeUnit.pas.7358: FBulkTotal := FBulkTotal + bulkTotal;
0000000006DB0820 488B8590000000 mov rax,[rbp+$00000090]
0000000006DB0827 488B8D90000000 mov rcx,[rbp+$00000090]
0000000006DB082E F3480F2A8128010000 cvtsi2ss xmm0,qword ptr [rcx+$00000128]
0000000006DB0837 F30F104D64 movss xmm1,dword ptr [rbp+$64]
0000000006DB083C F30F590D54020000 mulss xmm1,dword ptr [rel $00000254]
0000000006DB0844 F30F58C1 addss xmm0,xmm1
0000000006DB0848 F3480F2DC8 cvtss2si rcx,xmm0
0000000006DB084D 48898828010000 mov [rax+$00000128],rcx
What's strange is that the generated code is different. The MCVE has a cvtsi2sd followed by a cvtss2sd but that main application uses a movss in place of the cvtss2sd when copying the contents of the single value into the xmm1 register. I pretty sure that is what is causing the different result but without being able to create a MCVE, I can't even confirm that it is a problem with the compiler.
Question
My question is what can cause these differences in code generation? I assumed that the optimizations could do this type of thing but I made sure those were the same.
You should not be using any floating point type values when dealing with currency.
I recommend you watch Floating Point Numbers video from Computerphile where he explains of how floating point values are handled by computers and why they should not be used when handling currency.
https://www.youtube.com/watch?v=PZRI1IfStY0
I was wondering about what happens internally when New and Disposed is called. Delphi help gives a reasonable explanation but what would have to be done if a person would write one's own New and Dispose? Which methods are called internally to the two or is it all assembly?
I am not looking to write my own New and Dispose. I am just extremely curious about both methods' internal working.
New does the following:
Allocate memory for the new object with a call to GetMem.
Initialize any managed fields such as strings, interfaces, dynamic arrays etc.
Dispose reverses this:
Finalize the managed fields.
Deallocate the memory with a call to FreeMem.
Note that both New and Dispose are intrinsic functions. This means that the compiler has extra knowledge of them and is able to vary how they are implemented depending on the type in question.
For instance, if the type has no managed fields then New is optimized into a simple call to GetMem. If the type has managed fields then New is implemented with a call to System._New, which performs the steps described above.
The implementation of Dispose is much the same. A simple call to FreeMem for non-managed types, and a call to System._Dispose otherwise.
Now, System._New is implemented like this:
function _New(Size: NativeInt; TypeInfo: Pointer): Pointer;
begin
GetMem(Result, Size);
if Result <> nil then
_Initialize(Result, TypeInfo);
end;
Note that I have just shown the PUREPASCAL variant for simplicity. The call to GetMem is simple enough. The call to System._Initialize is a lot more involved. That uses the TypeInfo argument to find all the managed types contained in the object and initialize them. This is a recursive process because, for example, a record may contain members that are themselves structure types. You can see all the gory details in the RTL source.
As for System._Dispose, it calls System._Finalize and then FreeMem. And System._Finalize is very similar to System._Initialize except that it finalizes managed types rather than initializing them.
It's long been something of a bug bear of performance sensitive Delphi users that System._Initialize and System._Finalize are implemented this way, on top of runtime type information. The types are known at compile time and the compiler could be written to inline the initialization and finalization which would result in better performance.
It is how to write your own New() and Dispose() functions, with your own record initialization/finalization:
uses
TypInfo;
procedure RecordInitialize(Dest, TypeInfo: pointer);
asm
{$ifdef CPUX64}
.NOFRAME
{$endif}
jmp System.#InitializeRecord
end;
procedure RecordClear(Dest, TypeInfo: pointer);
asm
{$ifdef CPUX64}
.NOFRAME
{$endif}
jmp System.#FinalizeRecord
end;
function NewRec1(TypeInfo: pointer): pointer;
begin
GetMem(result, GetTypeData(TypeInfo)^.RecSize);
RecordInitialize(result, TypeInfo);
end;
function NewRec2(TypeInfo: pointer): pointer;
var
len: integer;
begin
len := GetTypeData(TypeInfo)^.RecSize;
GetMem(result, len);
FillChar(result^, len, 0);
end;
procedure NewDispose(Rec: pointer; TypeInfo: Pointer);
begin
RecordClear(Rec, TypeInfo);
FreeMem(Rec);
end;
First, there is a low-level asm trick to call the hidden intrinsic functions we need.
I then propose two ways of initializing the record, one using System.#InitializeRecord, the other using FillChar. Also note that if you allocate your records within a dynamic array, the initialization/finalization of the items would be done automatically. And initialization won't call Initialize() but FillChar to fill the memory with zeros.
It is a pity that we could not define global functions/procedures with generic parameters, otherwise we may have get rid of the TypeInfo() parameter.
Some years ago, I re-wrote the RTL part of record initialization/finalization, for faster execution. Note that TObject would call FinalizeRecord, so it is a part of the RTL which gain of being optimized. The compiler should emit the code instead of using the RTTI - but at least the RTL should be optimized a bit more.
If you use our Open Source SynCommons.pas unit, and define the DOPATCHTRTL conditional for your project, you will have in-process patch of the RTL FillChar Move RecordCopy FinalizeRecord InitializeRecord TObject.CleanupInstance low-level functions, which would use optimized asssembly, and SSE2 opcodes if available.
A bit of context... I need to write a fix for a bug in the VCL's TField.CopyData routine.
Not being a huge fan of recompiling the VCL, I usually opt for method hooks.
Usually, it's pretty straightforward, write the replacement method and hook it in place of the original one:
HookProc(#TMenu.ProcessMenuChar, #TPatchMenu.ProcessMenuChar, Backup);
But this time, the method is virtual and overloaded. I can't seem to find a reliable way to get the right method address.
If it wasn't overloaded, I could simply use #TField.CopyData and it would work, but that isn't guaranteed to give the right address for an overloaded function.
If it wasn't virtual, I could extend the method described here and do the following
type
TCopyDataMethod = procedure(Source, Dest : TValueBuffer) of object;
procedure DoHook;
var vOldMethod : TCopyDataMethod;
begin
vOldMethod := TField(nil).CopyData;
HookProc(TMethod(vOldMethod).Code, #TSomeClass.NewMethod, Backup);
end;
But that gives an access violation (It is using the nil reference to lookup in the VMT for the right CopyData address).
I tried various syntax that all gave me "incompatible types" at compile time.
I did come up with a solution (posted as answer), but it is less than ideal.
Inspired from David Heffernan's answer, here's the solution I was looking/hoping for.
type
TCopyDataMethod = procedure(Source, Dest : TValueBuffer) of object;
procedure Proc;
var VMT : NativeInt;
vMethod : TCopyDataMethod;
begin
VMT := NativeInt(TField);
vMethod := TField(#Vmt).CopyData;
uOriginalAddress := #vMethod;
[...]
end;
I believe this is as straightforward as it can get.
The only solution I've found to this problem is to create an instance of the class
procedure Proc;
var vMethod : TCopyDataMethod;
vFieldOrig : TField;
[...]
begin
vFieldNew := nil;
vFieldOrig := TField.Create(nil);
try
vMethod := vFieldOrig.CopyData;
uOriginalAddress := #vMethod;
[...]
finally
vFieldOrig.Free;
end;
end;
Hopefully, there is a more elegant solution.
If you know the index in the VMT, you can get the address directly from the class. If you look at a call to CopyData under the debugger you see the following:
Project1.dpr.19: THackField(Field).CopyData(TValueBuffer(nil), TValueBuffer(nil));
0053886B 33C9 xor ecx,ecx
0053886D 33D2 xor edx,edx
0053886F A104345400 mov eax,[$00543404]
00538874 8B18 mov ebx,[eax]
00538876 FF93A4000000 call dword ptr [ebx+$000000a4]
Here we can see that in my XE7, the method point is at an offset of $000000a4 bytes from the VMT. And the class is a pointer to the VMT.
So, you can get your method address like this:
Pointer(NativeUInt(TField) + $00a4)
I expect that you could come up with a more elegant solution using enhanced RTTI. But you typically do this when looking to patch bugs in specific library versions. So you hard code values for specific versions, and throw a compiler error otherwise. At least, that's what I do!
Having said all of this, there's nothing to be ashamed of with instantiating an instance at startup as you demonstrate in your answer. So long as that has no unwanted side effects I can't see any real downside.
I've got some unexpected access violations for Delphi code that I think is correct, but seems to be miscompiled. I can reduce it to
procedure Run(Proc: TProc);
begin
Proc;
end;
procedure Test;
begin
Run(
procedure
var
S: PChar;
procedure Nested;
begin
Run(
procedure
begin
end);
S := 'Hello, world!';
end;
begin
Run(
procedure
begin
S := 'Hello';
end);
Nested;
ShowMessage(S);
end);
end;
What happens for me is that S := 'Hello, world!' is storing in the wrong location. Because of that, either an access violation is raised, or ShowMessage(S) shows "Hello" (and sometimes, an access violation is raised when freeing the objects used to implement anonymous procedures).
I'm using Delphi XE, all updates installed.
How can I know where this is going to cause problems? I know how to rewrite my code to avoid anonymous procedures, but I have trouble figuring out in precisely which situations they lead to wrong code, so I don't know where to avoid them.
It would be interesting to me to know if this is fixed in later versions of Delphi, but nothing more than interesting, upgrading is not an option at this point.
On QC, the most recent report I can find the similar #91876, but that is resolved in Delphi XE.
Update:
Based on AlexSC's comments, with a slight modification:
...
procedure Nested;
begin
Run(
procedure
begin
S := S;
end);
S := 'Hello, world!';
end;
...
does work.
The generated machine code for
S := 'Hello, world!';
in the failing program is
ScratchForm.pas.44: S := 'Hello, world!';
004BD971 B89CD94B00 mov eax,$004bd99c
004BD976 894524 mov [ebp+$24],eax
whereas the correct version is
ScratchForm.pas.45: S := 'Hello, world!';
004BD981 B8B0D94B00 mov eax,$004bd9b0
004BD986 8B5508 mov edx,[ebp+$08]
004BD989 8B52FC mov edx,[edx-$04]
004BD98C 89420C mov [edx+$0c],eax
The generated code in the failing program is not seeing that S has been moved to a compiler-generated class, [ebp+$24] is how outer local variables of nested methods are accessed how local variables are accessed.
Without seeing the whole Assembler Code for the whole (procedure Test) and only assuming on the Snippet you posted, it's probably that on the failing Snippet only a Pointer has been moved where on the correct version there is some Data moved too.
So it seems that S:=S or S:='' causes the Compiler to create a reference by it's own and could even allocate some Memory, which would explain why it works then.
I also assume that's why a Access Violation occurs without S:=S or S:='', because if there is no Memory allocated for the String (remember you only declared S: PChar) then a Access Violation is raised because non-allocated Memory was accessed.
If you simply declare S: String instead, this probably won't happen.
Additions after extended Commenting:
A PChar is only a Pointer to Data Structure, that must exist. Also another common Issue with PChar is to declare local Variables and then passing a PChar to that Variable to other Procs, because what happens is that the local Variable is freed once the routine ends, but the PChar will still point to it, which then raise Access Violations once accessed.
The only possibility that exists per Documentation is declaring something like that const S: PChar = 'Hello, world!' this works because the Compiler can resolve a relative Adresse to it. But this only works for Constants and not for Variables like on the Example above. Doing it like in the Example above needs Storage to be allocated for the string literal to which the PChar then points to like S:String; P:PChar; S:='Hello, world!'; P:=PChar(S); or similar.
If it still fails with declaring String or Integer then perhaps the Variable disappears somewhere along or suddenly isn't visible anymore in a proc, but that would be another Issue that has nothing to do with the existing PChar Issue explained already.
Final Conclusion:
It's possible to do S:PChar; S:='Hello, world!' but the Compiler then simply allocates it as a local or global Constant like const S: PChar = 'Hello, world!' does which is saved into Executable, a second S := 'Hello' then creates another one which is also saved into Executable and so on - but S then only points to the last one allocated, all others are still in the Executable but not accessible any more without knowing the exact Location, because S only points to the last one allocated.
So depending which one was the last S points either to Hello, world! or Hello. On the Example above i can only guess which one was the last and who knows perhaps the Compiler can only guess too and depending on optimizations and other unpredictable Factors S could suddenly point to unallocated Mem instead of the last one by the Time Showmessage(S) is executed which then raises a Access Violation.
How can I know where this is going to cause problems?
It's hard to tell at this point in time.
If we knew the nature of the fix in Delphi XE2 we'd be in a better position.
All you can do is refrain from using anonymous functions.
Delphi has had procedural variables, so the need for anonymous functions ready is not that dire.
See http://www.deltics.co.nz/blog/posts/48.
It would be interesting to me to know if this is fixed in later versions of Delphi
According to #Sertac Akyuz this has been fixed in XE2.
Personally I dislike anonymous methods and have had to ban people from using them in my Java projects because a sizable proportion of our code base was going anonymous (event handlers).
Used in extreme moderation I can see the use case.
But in Delphi where we have procedural variables and nested procedures... Not so much.
For some reason, trying to create a TLanguages object provided by the SysUtils header by using the singleton or by calling the constructor directly is causing trouble in the wild, where some users report this error (X varies):
Access violation at address X. Write of address X (at address X)
... when the following seemingly innocent line of code is executed:
TLanguages.Create;
To clarify, this is not related to context. I can put this line in any place I like (as the only line of code of an empty program for example), but the problem remains.
The weird part is that this class is part of Delphi's standard headers, which should not fail (right?).
constructor TLanguages.Create;
type
TCallbackThunk = packed record
POPEDX: Byte;
MOVEAX: Byte;
SelfPtr: Pointer;
PUSHEAX: Byte;
PUSHEDX: Byte;
JMP: Byte;
JmpOffset: Integer;
end;
var
Callback: TCallbackThunk;
begin
inherited Create;
Callback.POPEDX := $5A;
Callback.MOVEAX := $B8;
Callback.SelfPtr := Self;
Callback.PUSHEAX := $50;
Callback.PUSHEDX := $52;
Callback.JMP := $E9;
Callback.JmpOffset := Integer(#TLanguages.LocalesCallback) - Integer(#Callback.JMP) - 5;
EnumSystemLocales(TFNLocaleEnumProc(#Callback), LCID_SUPPORTED);
end;
The constructor attempts to use a member function as the EnumSystemLocales callback, which seems to be causing the crashes, because copying the TLanguages.LocalesCallback function to global scope and passing that to EnumSystemLocales works perfectly fine.
The struct contains the following Intel x86 assembly, where each item is given by its opcode:
pop edx
mov eax Self
push eax
push edx
jmp JmpOffset
Can anyone explain how the trick works and tell me why it's not working as expected?
It appears to be a known issue with older Delphi versions, related to DEP, as I guessed in comments to the question. It's clear that the code in the RTL cannot work when DEP is enabled.
Here's a link to confirm the theory: http://codecentral.embarcadero.com/Item/23411
Although that CodeCentral article includes code to fix the problem in Delphi 5, it looks like it will work in Delphi 7 too. The fix works by hooking the SysUtils.Languages function. So make sure you always use that rather than calling TLanguages.Create yourself, for obvious reasons.