Access local variables in a nested Delphi x64 assembly code - delphi

I want to access the local variables of a Delphi procedure from its nested assembly procedure. Although the compiler allows references to the local variables, it compiles wrong offsets, which only work if the EBP/RBP values are hacked. In the x86 environment I found a fairly elegant hack, but in x64 I haven't yet found any decent solution.
In the x86 environment the workaround below seems to work fine:
procedure Main;
var ABC: integer;
procedure Sub;
asm
mov ebp, [esp]
mov eax, ABC
end;
...
In the above code, the compiler treats the variable ABC as if it were in the body of Main, so hacking the value of EBP in the first assembly line solves the problem. However, the same trick won't work in the x64 environment:
procedure Main;
var ABC: int64;
procedure Sub;
asm
mov rbp, [rsp]
mov rax, ABC
end;
...
In the above code, the compiler adds an offset when it references the variable ABC, and that offset is correct neither with the original (Main) value of RBP nor with its new (Sub) value. Moreover, changing RBP in 64-bit code isn't recommended, so I found the workaround below:
procedure Main;
var ABC: int64;
procedure Sub;
asm
add rcx, $30
mov rax, [rcx + OFFSET ABC]
end;
...
As the compiler passes the initial value of RBP in RCX, and the reference to the variable ABC can be hacked to be RCX-based rather than RBP-based, the above code does work. However, the correction of $30 depends on the number of variables of Main, so this workaround is a last resort, and I'd very much like to find something more elegant.
Does anyone have a suggestion on how to do this in a more elegant way?
Note that:
Of course, in my real code there are a large number of local variables to be accessed from the ASM code, so solutions like passing the variables as parameters are ruled out.
I'm adding x64 compatibility to x86 code, and there are dozens of routines like this, so I need a workaround that transforms that code with small formal changes only (accessing the local variables in a fundamentally different way would become an inexhaustible source of bugs).
UPDATE:
Found a safe but relatively complicated solution: I added a local variable called Sync to find out the offset between the RBP values of Main and Sub, then I do the correction on the RBP:
procedure Main;
var Sync: int64; ABC: int64;
procedure Sub(var SubSync: int64);
asm
push rbp
lea rax, Sync
sub rdx, rax
add rbp, rdx
mov rax, ABC
pop rbp
end;
begin
ABC := 66;
Sub(Sync);
end;

So far nobody has come up with a solution, so I consider the code below to be the best known solution:
procedure Main;
var Sync: int64; ABC: int64;
procedure Sub(var SubSync: int64);
asm
push rbp
lea rax, Sync
sub rdx, rax
add rbp, rdx
mov rax, ABC
pop rbp
end;
begin
ABC := 66;
Sub(Sync);
end;
BTW, as this very much looks like a Delphi bug, I reported it to Embarcadero as a bug.

Related

Accessing the first byte AFTER a Delphi record, class etc. from assembly routine

It’s well known for assembly coders in Delphi that any fields of a record, class etc. can be accessed from an asm code routine as shown in the example below:
type
THeader = packed record
field1: uint64;
field2: uint32;
end;
(* some code here *)
asm
mov rax, [rcx + THeader.field1]
mov edx, [rcx + THeader.field2]
end;
But what if – as the name suggests – this is just a header of a big, unpredictable sized data stream and I want to access the actual start position of the data stream (that is, the first byte after the header)?
A simple solution might be the one shown below (but I prefer something less unnatural, without defining a constant):
type
THeader = packed record
field1: uint64;
field2: uint32;
end;
(* start_of_data_stream: byte; *)
const
SIZEOFTHEADER = sizeof(THeader);
(* some code here *)
asm
mov al, [rcx + SIZEOFTHEADER] (* [rcx + THeader.start_of_data_stream] *)
end;
Any better ideas, maybe?
You can use TYPE(typename) to find the size of the type in an asm expression. For example:
mov al, [rcx + TYPE(THeader)]
This (together with a number of other useful operators) is documented: http://docwiki.embarcadero.com/RADStudio/en/Assembly_Expressions#Expression_Operators
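As a rough illustration of the TYPE operator in context, here is a minimal sketch. The helper name FirstDataByte, the PHeader type and the test buffer are hypothetical and not part of the original question; the asm branch assumes the Win64 calling convention (first parameter in RCX), and other targets fall back to plain Pascal.

```pascal
program HeaderTypeDemo;
{$APPTYPE CONSOLE}

type
  THeader = packed record
    field1: UInt64;
    field2: UInt32;
  end;
  PHeader = ^THeader;   // hypothetical pointer type for this sketch

// Hypothetical helper: returns the first byte that follows the header.
function FirstDataByte(Header: PHeader): Byte;
{$IF Defined(CPUX64) and Defined(MSWINDOWS)}
asm
  // Assumption: Win64 calling convention, so RCX = Header.
  // TYPE(THeader) evaluates to the size of the record in bytes.
  movzx eax, byte ptr [rcx + TYPE(THeader)]
end;
{$ELSE}
begin
  // Pure Pascal fallback for other targets.
  Result := PByte(NativeUInt(Header) + SizeOf(THeader))^;
end;
{$IFEND}

var
  // One byte more than the header, standing in for the data stream.
  Buffer: array[0..SizeOf(THeader)] of Byte;
begin
  FillChar(Buffer, SizeOf(Buffer), 0);
  Buffer[SizeOf(THeader)] := $5A;            // first byte after the header
  Writeln(FirstDataByte(PHeader(@Buffer)));
end.
```

No separate constant is needed: the size is taken directly from the type name inside the asm expression.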

Missing "Return value of function might be undefined" if return value is record

When I write a function in Delphi 10.2.3 Pro that has a record as return value and I just leave it empty, I don't get a W1035 "Return value of function '%s' might be undefined" warning. Why don't I get a warning?
Thanks in advance!
Unfortunately the warning is also suppressed for unmanaged records. I guess the reason is that it is difficult for the compiler to keep track of the changes to all the fields. Also, if you modify one of the fields of a record, do you consider the whole record as defined or not?
Here is the code to test:
{$O-}
type
TRecordType = record
a, b: Word;
end;
function Test: TRecordType;
begin
end;
procedure TForm1.FormCreate(Sender: TObject);
var
v: TRecordType;
begin
v := Test;
ShowMessage(Format('%d, %d', [v.a, v.b]));
end;
It does give me random numbers (they do not change if run repeatedly, because the stack is almost the same).
The assembly code for calling:
0046CCA9 E8CAFFFFFF call Test
0046CCAE 8945F8 mov [ebp-$08],eax
The assembly code of Test:
0046CC78 55 push ebp
0046CC79 8BEC mov ebp,esp
0046CC7B 51 push ecx
0046CC7C 8B45FC mov eax,[ebp-$04] ; eax is the result, not initialized
0046CC7F 59 pop ecx
0046CC80 5D pop ebp
0046CC81 C3 ret
When the size of the record is not 1, 2 or 4 bytes, the result is passed by reference as a hidden parameter. For example, if a and b are integers, the assembly code for calling becomes
0046CCAA 8D45F4 lea eax,[ebp-$0c] ; address of V, V is not initialized
0046CCAD E8C6FFFFFF call Test
The assembly code of Test:
0046CC78 55 push ebp
0046CC79 8BEC mov ebp,esp
0046CC7B 51 push ecx
0046CC7C 8945FC mov [ebp-$04],eax
0046CC7F 59 pop ecx
0046CC80 5D pop ebp
0046CC81 C3 ret
So in both cases, the result is undefined without warning.
From my point of view this is clearly a bug in the warning generator of the compiler.
Keep in mind that for records the compiler has to check much more than it does for a simple type like integer or real. For example, what would you expect here:
function Test: TRecordType;
begin
result.a := 1;
end;
Because the record is partly (for b) undefined, the compiler should also report a W1035 here. On the other hand, what would you expect for records with variant parts:
TRecordType = record
ID: string;
case boolean of
true:
(a: integer);
false:
(b: string);
end;
As you can see here, it is not possible to determine whether a record is completely initialized or not. At least the Delphi compiler could warn if static fields of a record are unassigned, as we see when TRecordType is a class.
I recommend entering this bug on the Embarcadero quality reporting site at
https://quality.embarcadero.com
and referencing this thread on Stack Overflow.
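Since the compiler does not warn here, one defensive habit is to assign the whole record before touching individual fields. The sketch below is only an illustration, not part of the original answers: it reuses TRecordType from the test code and relies on the Default intrinsic to zero-initialize the result.

```pascal
program RecordResultDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TRecordType = record
    a, b: Word;
  end;

function Test: TRecordType;
begin
  // Give every field a defined value first, then set what is needed.
  Result := Default(TRecordType);
  Result.a := 1;
end;

var
  v: TRecordType;
begin
  v := Test;
  Writeln(Format('%d, %d', [v.a, v.b]));   // prints "1, 0"
end.
```

With the whole-record assignment in place, b no longer depends on whatever happened to be on the stack.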

Why is there two sequential move to EAX under optimization build?

I looked at the ASM code of a release build with all optimizations turned on, and here is one of the inlined function I came across:
0061F854 mov eax,[$00630bec]
0061F859 mov eax,[$00630e3c]
0061F85E mov edx,$00000001
0061F863 mov eax,[eax+edx*4]
0061F866 cmp byte ptr [eax],$01
0061F869 jnz $0061fa83
The code is pretty easy to understand: it builds an offset (1) into a table, compares the byte value from it to 1 and jumps if NZ. I know the pointer to my table is stored in $00630e3c, but I have no idea where $00630bec is coming from.
Why are there two moves to eax one after the other? Isn't the first one overwritten by the second one? Can this be a cache optimization thing or am I missing something unbelievably obvious/obscure?
The Delphi code for the above ASM is as follows:
if( TGameSignals.IsSet( EmitParticleSignal ) = True ) then [...]
IsSet() is an inlined class function and calls the inlined IsSet() function of TSignalManager:
class function TGameSignals.IsSet(Signal: PBucketSignal): Boolean;
begin
Result := FSignalManagerInstance.IsSet( Signal );
end;
The final IsSet of the signal manager is as such:
function TSignalManagerInstance.IsSet( Signal: PBucketSignal ): Boolean;
begin
Result := Signal.Pending;
end;
My best guess would be that $00630bec is a reference to the class TGameSignals. You can check it by doing
ShowMessage(IntToHex(NativeInt(TGameSignals), 8))
The pre-optimisation code was probably something like this
0061F854 mov eax,[$00630bec] //Move reference to class TGameSignals in EAX
0061F859 mov eax,[eax + $250] //Move Reference to FSignalManagerInstance at offset $250 in class TGameSignals in EAX
the compiler optimised [eax + $250] to [$00630e3c], but didn't realize the previous MOV wasn't required anymore.
I'm not an expert in codegen, so take it with a grain of salt...
On a side note, in Delphi, we usually write
if TGameSignals.IsSet( EmitParticleSignal ) then
because it's possible for the following IF to be true:
var vBool: Boolean;
[...]
vBool := Boolean(10);
if vBool and (vBool <> True) then
Granted, this is not good practice, but no point in comparing to TRUE either.
EDIT: As pointed out by Ped7g, I was wrong. The instruction is
0061F854 mov eax,[$00630bec]
and not
0061F854 mov eax,$00630bec
So what I wrote didn't really make sense...
The first MOV instruction serves to pass the "self" reference for the call to TGameSignals.IsSet. Now, if the function weren't inlined, it would look like this:
mov eax,[$00630bec]
call TGameSignals.IsSet
and then
*TGameSignals.IsSet
mov eax,[$00630e3c]
[...]
The first mov is still pointless, since "Self" isn't used in TGameSignals.IsSet, but it is still required to pass "self" to the function. When the routine gets inlined, it looks a lot sillier, indeed.
As mentioned by Arnaud Bouchez, making TGameSignals.IsSet static removes the implicit Self parameter and thus removes the first MOV operation.
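A minimal sketch of what the static declaration could look like. The original post does not show the type declarations, so TBucketSignal, the class var declaration of FSignalManagerInstance and the demo main block are assumptions made only to keep the example self-contained.

```pascal
program StaticClassFunctionDemo;
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-ins for the types named in the question.
  TBucketSignal = record
    Pending: Boolean;
  end;
  PBucketSignal = ^TBucketSignal;

  TSignalManagerInstance = class
  public
    function IsSet(Signal: PBucketSignal): Boolean; inline;
  end;

  TGameSignals = class
  private
    // Assumption: the manager is held in a class variable.
    class var FSignalManagerInstance: TSignalManagerInstance;
  public
    // "static" means no hidden Self (class reference) parameter is passed.
    class function IsSet(Signal: PBucketSignal): Boolean; static; inline;
  end;

function TSignalManagerInstance.IsSet(Signal: PBucketSignal): Boolean;
begin
  Result := Signal^.Pending;
end;

class function TGameSignals.IsSet(Signal: PBucketSignal): Boolean;
begin
  Result := FSignalManagerInstance.IsSet(Signal);
end;

var
  EmitParticleSignal: TBucketSignal;
begin
  TGameSignals.FSignalManagerInstance := TSignalManagerInstance.Create;
  try
    EmitParticleSignal.Pending := True;
    if TGameSignals.IsSet(@EmitParticleSignal) then
      Writeln('signal is pending');
  finally
    TGameSignals.FSignalManagerInstance.Free;
  end;
end.
```

A static class method cannot be virtual and has no Self, which is fine here because IsSet only touches the class variable.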

Accessing Delphi Class Fields in 64 bit inline assembler

I am trying to convert the Delphi TBits.GetBit to inline assembler for the 64 bit version. The VCL source looks like this:
function TBits.GetBit(Index: Integer): Boolean;
{$IFNDEF X86ASM}
var
LRelInt: PInteger;
LMask: Integer;
begin
if (Index >= FSize) or (Index < 0) then
Error;
{ Calculate the address of the related integer }
LRelInt := FBits;
Inc(LRelInt, Index div BitsPerInt);
{ Generate the mask }
LMask := (1 shl (Index mod BitsPerInt));
Result := (LRelInt^ and LMask) <> 0;
end;
{$ELSE X86ASM}
asm
CMP Index,[EAX].FSize
JAE TBits.Error
MOV EAX,[EAX].FBits
BT [EAX],Index
SBB EAX,EAX
AND EAX,1
end;
{$ENDIF X86ASM}
I started converting the 32 bit ASM code to 64 bit. After some searching, I found out that I need to change the EAX references to RAX for the 64 bit compiler. I ended up with this for the first line:
CMP Index,[RAX].FSize
This compiles but gives an access violation when it runs. I tried a few combinations (e.g. MOV ECX,[RAX].FSize) and get the same access violation when trying to access [RAX].FSize. When I look at the assembler that is generated by the Delphi compiler, it looks like my [RAX].FSize should be correct.
Unit72.pas.143: MOV ECX,[RAX].FSize
00000000006963C0 8B8868060000 mov ecx,[rax+$00000668]
And the Delphi generated code:
Unit72.pas.131: if (Index >= FSize) or (Index < 0) then
00000000006963CF 488B4550 mov rax,[rbp+$50]
00000000006963D3 8B4D58 mov ecx,[rbp+$58]
00000000006963D6 3B8868060000 cmp ecx,[rax+$00000668]
00000000006963DC 7D06 jnl TForm72.GetBit + $24
00000000006963DE 837D5800 cmp dword ptr [rbp+$58],$00
00000000006963E2 7D09 jnl TForm72.GetBit + $2D
In both cases, the resulting assembler uses [rax+$00000668] for FSize. What is the correct way to access a class field in Delphi 64bit Assembler?
This may sound like a strange thing to optimize but the assembler for the 64bit pascal version doesn't appear to be very efficient. We call this routine a large number of times and it takes anything up to 5 times as long to execute depending on various factors.
The basic problem is that you are using the wrong register. Self is passed as an implicit parameter, before all others. In the x64 calling convention, that means it is passed in RCX and not RAX.
So Self is passed in RCX and Index is passed in RDX. Frankly, I think it's a mistake to use parameter names in inline assembler because they hide the fact that the parameter was passed in a register. If you happen to overwrite RDX, then that changes the apparent value of Index.
So the if statement might be coded as
CMP EDX,[RCX].FSize
JNL TBits.Error
CMP EDX,0
JL TBits.Error
FWIW, this is a really simple function to implement and I don't believe that you will need to use any stack space. You have enough registers in x64 to be able to do this entirely using volatile registers.
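For illustration only, here is a sketch of how the whole routine might look using just volatile registers. It is not the VCL implementation: TMyBits is a hypothetical stand-in for TBits, the out-of-range case simply returns False instead of calling Error, and the asm branch assumes the Win64 calling convention (Self in RCX, Index in EDX). Other targets use a Pascal fallback.

```pascal
program GetBitDemo;
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for TBits with only the two fields used here.
  TMyBits = class
  private
    FSize: Integer;   // number of valid bits
    FBits: Pointer;   // start of the bit storage
  public
    constructor Create(ABits: Pointer; ASize: Integer);
    function GetBit(Index: Integer): Boolean;
  end;

constructor TMyBits.Create(ABits: Pointer; ASize: Integer);
begin
  FBits := ABits;
  FSize := ASize;
end;

function TMyBits.GetBit(Index: Integer): Boolean;
{$IF Defined(CPUX64) and Defined(MSWINDOWS)}
asm
  // Assumption: Win64 convention, RCX = Self, EDX = Index.
  xor   eax, eax                     // default result: False
  cmp   edx, [rcx].TMyBits.FSize
  jae   @Done                        // unsigned compare also rejects Index < 0
  mov   r8, [rcx].TMyBits.FBits      // R8 is volatile, no stack needed
  bt    [r8], edx                    // CF := selected bit
  setc  al
@Done:
end;
{$ELSE}
var
  P: PInteger;
begin
  Result := False;
  if (Index >= 0) and (Index < FSize) then
  begin
    P := PInteger(NativeUInt(FBits) + NativeUInt(Index div 32) * 4);
    Result := (P^ and (1 shl (Index mod 32))) <> 0;
  end;
end;
{$IFEND}

var
  Storage: array[0..1] of Integer;
  Bits: TMyBits;
begin
  Storage[0] := 0;
  Storage[1] := 1 shl 3;             // sets bit 35 of the 64-bit bit string
  Bits := TMyBits.Create(@Storage, 64);
  try
    Writeln(Bits.GetBit(35));
    Writeln(Bits.GetBit(34));
    Writeln(Bits.GetBit(-1));
  finally
    Bits.Free;
  end;
end.
```

Registers are named explicitly rather than through the Index parameter, which keeps it visible which register each value lives in.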

unusual behaviour in delphi assembly block

I am running into some weird behaviour with Delphi's inline assembly, as demonstrated in this very short and simple program:
program test;
{$APPTYPE CONSOLE}
uses
SysUtils;
type
TAsdf = class
public
int: Integer;
end;
TBlah = class
public
asdf: TAsdf;
constructor Create(a: TAsdf);
procedure Test;
end;
constructor TBlah.Create(a: TAsdf);
begin
asdf := a;
end;
procedure TBlah.Test;
begin
asm
mov eax, [asdf]
end;
end;
var
asdf: TAsdf;
blah: TBlah;
begin
asdf := TAsdf.Create;
blah := TBlah.Create(asdf);
blah.Test;
readln;
end.
It's just for the sake of example (moving [asdf] into eax doesn't do much, but it works for the example). If you look at the assembly for this program, you'll see that
mov eax, [asdf]
has been turned into
mov eax, ds:[4]
(as represented by OllyDbg) which obviously crashes. However, if you do this:
var
temp: TAsdf;
begin
temp := asdf;
asm
int 3;
mov eax, [temp];
end;
It changes to
mov eax, [ebp-4]
which works. Why is this? I'm usually working with C++ and I'm used to using instance vars like that, it may be that I'm using instance variables wrong.
EDIT: Yep, that was it. Changing mov eax, [asdf] to mov eax, [Self.asdf] fixes the problem. Sorry about that.
In the first case, mov eax,[asdf], the assembler will look up asdf and discover it is a field of offset 4 in the instance. Because you used an indirect addressing mode without a base address, it will only encode the offset (it looks like 0 + asdf to the assembler). Had you written it like this: mov eax, [eax].asdf, it would have been encoded as mov eax, [eax+4]. (here eax contains Self as passed in from the caller).
In the second case, the assembler will look up Temp and see that it is a local variable indexed by EBP. Because it knows the base address register to use, it can decide to encode it as [EBP-4].
A method receives the Self pointer in the EAX register. You have to use that value as the base value for accessing the object. So your code would be something like:
mov ebx, TBlah[eax].asdf
See http://www.delphi3000.com/articles/article_3770.asp for an example.
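To make the point concrete, here is a small sketch of a method written entirely in assembly that reads a field through Self. The method name GetAsdf is hypothetical; the asm branch applies only to 32-bit x86, where Self arrives in EAX and a pointer result is returned in EAX, and other targets use a Pascal fallback.

```pascal
program SelfFieldDemo;
{$APPTYPE CONSOLE}

type
  TAsdf = class
  public
    int: Integer;
  end;

  TBlah = class
  public
    asdf: TAsdf;
    constructor Create(a: TAsdf);
    function GetAsdf: TAsdf;   // hypothetical accessor for this sketch
  end;

constructor TBlah.Create(a: TAsdf);
begin
  asdf := a;
end;

function TBlah.GetAsdf: TAsdf;
{$IFDEF CPUX86}
asm
  // x86 register convention: EAX = Self on entry, result returned in EAX.
  // The base register is given explicitly, so the field offset is applied
  // to Self instead of being encoded as a bare address.
  mov eax, [eax].TBlah.asdf
end;
{$ELSE}
begin
  Result := asdf;
end;
{$ENDIF}

var
  a: TAsdf;
  b: TBlah;
begin
  a := TAsdf.Create;
  b := TBlah.Create(a);
  try
    if b.GetAsdf = a then
      Writeln('field read through Self');
  finally
    b.Free;
    a.Free;
  end;
end.
```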
