Accessing Delphi Class Fields in 64 bit inline assembler - delphi

I am trying to convert the Delphi TBits.GetBit to inline assembler for the 64 bit version. The VCL source looks like this:
function TBits.GetBit(Index: Integer): Boolean;
{$IFNDEF X86ASM}
var
LRelInt: PInteger;
LMask: Integer;
begin
if (Index >= FSize) or (Index < 0) then
Error;
{ Calculate the address of the related integer }
LRelInt := FBits;
Inc(LRelInt, Index div BitsPerInt);
{ Generate the mask }
LMask := (1 shl (Index mod BitsPerInt));
Result := (LRelInt^ and LMask) <> 0;
end;
{$ELSE X86ASM}
asm
CMP Index,[EAX].FSize
JAE TBits.Error
MOV EAX,[EAX].FBits
BT [EAX],Index
SBB EAX,EAX
AND EAX,1
end;
{$ENDIF X86ASM}
I started converting the 32 bit ASM code to 64 bit. After some searching, I found out that I need to change the EAX references to RAX for the 64 bit compiler. I ended up with this for the first line:
CMP Index,[RAX].FSize
This compiles but gives an access violation when it runs. I tried a few combinations (e.g. MOV ECX,[RAX].FSize) and get the same access violation when trying to access [RAX].FSize. When I look at the assembler that is generated by the Delphi compiler, it looks like my [RAX].FSize should be correct.
Unit72.pas.143: MOV ECX,[RAX].FSize
00000000006963C0 8B8868060000 mov ecx,[rax+$00000668]
And the Delphi generated code:
Unit72.pas.131: if (Index >= FSize) or (Index < 0) then
00000000006963CF 488B4550 mov rax,[rbp+$50]
00000000006963D3 8B4D58 mov ecx,[rbp+$58]
00000000006963D6 3B8868060000 cmp ecx,[rax+$00000668]
00000000006963DC 7D06 jnl TForm72.GetBit + $24
00000000006963DE 837D5800 cmp dword ptr [rbp+$58],$00
00000000006963E2 7D09 jnl TForm72.GetBit + $2D
In both cases, the resulting assembler uses [rax+$00000668] for FSize. What is the correct way to access a class field in Delphi 64bit Assembler?
This may sound like a strange thing to optimize but the assembler for the 64bit pascal version doesn't appear to be very efficient. We call this routine a large number of times and it takes anything up to 5 times as long to execute depending on various factors.

The basic problem is that you are using the wrong register. Self is passed as an implicit parameter, before all others. In the x64 calling convention, that means it is passed in RCX and not RAX.
So Self is passed in RCX and Index is passed in RDX. Frankly, I think it's a mistake to use parameter names in inline assembler because they hide the fact that the parameter was passed in a register. If you happen to overwrite either RDX, then that changes the apparent value of Index.
So the if statement might be coded as
CMP EDX,[RCX].FSize
JNL TBits.Error
CMP EDX,0
JL TBits.Error
FWIW, this is a really simple function to implement and I don't believe that you will need to use any stack space. You have enough registers in x64 to be able to do this entirely using volatile registers.

Related

Accessing the first byte AFTER a Delphi record, class etc. from assembly routine

It’s well known for assembly coders in Delphi that any fields of a record, class etc. can be accessed from an asm code routine as shown in the example below:
type
THeader = packed record
field1: uint64;
field2: uint32;
end;
(* some code here *)
asm
mov rax, [rcx + THeader.field1]
mov edx, [rcx + THeader.field2]
end;
But what if – as the name suggests – this is just a header of a big, unpredictable sized data stream and I want to access the actual start position of the data stream (that is, the first byte after the header)?
A simple solution might be the one shown below (but I prefer something less unnatural, without defining a constant):
type
THeader = packed record
field1: uint64;
field2: uint32;
end;
(* start_of_data_stream: byte; *)
const
SIZEOFTHEADER = sizeof(THeader);
(* some code here *)
asm
mov al, [rcx + SIZEOFTHEADER] (* [rcx + THeader.start_of_data_stream] *)
end;
Any better ideas, maybe?
You can use TYPE(typename) to find the size of the type in an asm expression. For example:
mov al, [rcx + TYPE(THeader)]
This (together with a number of other useful operators) is documented: http://docwiki.embarcadero.com/RADStudio/en/Assembly_Expressions#Expression_Operators

Why is there two sequential move to EAX under optimization build?

I looked at the ASM code of a release build with all optimizations turned on, and here is one of the inlined function I came across:
0061F854 mov eax,[$00630bec]
0061F859 mov eax,[$00630e3c]
0061F85E mov edx,$00000001
0061F863 mov eax,[eax+edx*4]
0061F866 cmp byte ptr [eax],$01
0061F869 jnz $0061fa83
The code is pretty easy to understand, it builds an offset (1) into a table, compares the byte value from it to 1 and do a jump if NZ. I know the pointer to my table is stored in $00630e3c, but I have no idea where $00630bec is coming from.
Why is there two move to eax one after the other? Isn't the first one overwritten by the second one? Can this be a cache optimization thing or am I missing something unbelievably obvious/obscure?
The Delphi code for the above ASM is as follow:
if( TGameSignals.IsSet( EmitParticleSignal ) = True ) then [...]
IsSet() is an inlined class function and calls the inlined IsSet() function of TSignalManager:
class function TGameSignals.IsSet(Signal: PBucketSignal): Boolean;
begin
Result := FSignalManagerInstance.IsSet( Signal );
end;
The final IsSet of the signal manager is as such:
function TSignalManagerInstance.IsSet( Signal: PBucketSignal ): Boolean;
begin
Result := Signal.Pending;
end;
My best guess would be that $00630bec is a reference to the class TGameSignals. You can check it by doing
ShowMessage(IntToHex(NativeInt(TGameSignals), 8))
The pre-optimisation code was probably something like this
0061F854 mov eax,[$00630bec] //Move reference to class TGameSignals in EAX
0061F859 mov eax,[eax + $250] //Move Reference to FSignalManagerInstance at offset $250 in class TGameSignals in EAX
the compiler optimised [eax + $250] to [$00630e3c], but didn't realize the previous MOV wasn't required anymore.
I'm not an expert in codegen, so take it with a grain of salt...
On a side note, in delphi, we usually write
if TGameSignals.IsSet( EmitParticleSignal ) then
As it's possible for the following IF to be true
var vBool : Boolean
[...]
vBool := Boolean(10);
if vBool and (vBool <> True) then
Granted, this is not good practice, but no point in comparing to TRUE either.
EDIT: As pointed out by Ped7g, I was wrong. The instruction is
0061F854 mov eax,[$00630bec]
and not
0061F854 mov eax,$00630bec
So what I wrote didn't really make sense...
The first MOV instruction serve to pass the "self" reference for the call to TGameSignals.IsSet. Now, if the function wasn't inline, it would look like this :
mov eax,[$00630bec]
call TGameSignals.IsSet
and then
*TGameSignals.IsSet
mov eax,[$00630e3c]
[...]
The first mov is still pointless, since "Self" isn't used in TGameSignals.IsSet but it is still required to pass "self" to the function. When the routine get inlined, it looks a lot more silly, indeed.
Like mentioned by Arnaud Bouchez, making TGameSignals.IsSet static remove the implicit Self parameter and thus, remove the first MOV operation.

Asm equivalent to a delphi procedure

I have a simple delphi function called SetCompare that compares two singles and if they are not equal then one value is set to the other.
procedure SetCompare( A : single; B : single );
begin
if( A <> B ) then
A := B;
end;
I am trying to convert this into asm as such:
procedure SetCompare( A : Single; B : Single ); register;
begin
asm
mov EAX,A
mov ECX,B
cmp EAX,ECX
jne SetValue
#SetValue:
mov EAX,ECX
end;
end;
Will this work?
Will this work?
No this will not work, because floating point comparison is not the same as binary comparison. For instance 0 and -0 have different bit patterns, but compare as equal. Similarly, NaN compares unequal to all values, including a NaN with the same bit pattern.
The simplest way to work out how to write your code is to get the compiler to compile the Pascal code, and inspect the generated assembly code.
Some asides:
Your function is pointless anyway, because it returns no value and has no side effects.
If performance matters enough to write assembler, then you should write pure assembler functions, rather than inline asm blocks in a Pascal function. Which in any case is not supported by the x64 compiler.
Your arguments are already in registers, so it makes little sense to copy them around to other registers. For x86 code, A arrives in EAX, and B arrives in EDX. Given that EAX already contains A, why would you copy it into EAX? It is already there. And B is already in EDX, why copy it to ECX? For x64 code, the two arguments are passed in floating point registers, and can be compared there directly. As soon as your start writing assembler you need to understand the register use of the calling convention.
Your jne is pointless. If execution does not take the jump, then it moves to the next line of code. Which is where you jumped to.

Inline asm (32) emulation of move (copy memory) command

I have two two-dimensional arrays with dynamic sizes (guess that's the proper wording). I copy the content of first one into the other using:
dest:=copy(src,0,4*x*y);
// src,dest:array of array of longint; x,y:longint;
// setlength(both arrays,x,y); //x and y are max 15 bit positive!
It works. However I'm unable to reproduce this in asm. I tried the following variations to no avail... Could someone enlighten me...
MOV ESI,src; MOV EDI,dest; MOV EBX,y; MOV EAX,x; MUL EBX;
PUSH DS; POP ES; MOV ECX,EAX; CLD; REP MOVSD;
Also tried with LEA (didn't expect that to work since it should fetch the pointer address not the array address), no workie, and tried with:
p1:=#src[0,0]; p2:=#dest[0,0]; //being no-type pointers
MOV ESI,p1; MOV EDI,p2... (the same asm)
Hints pls? Btw it's delphi 6. The error is, of course, access violation.
This is really a two-fold three-fold question.
What's the structure of a dynamic array.
Which instructions in asm will copy the array.
I'm throwing random assembler at the CPU, why doesn't it work?
Structure of a dynamic array
See: http://docwiki.embarcadero.com/RADStudio/Seattle/en/Internal_Data_Formats
To quote:
Dynamic Array Types
On the 32-bit platform, a dynamic-array variable occupies 4 bytes of memory (and 8 bytes on 64-bit) that contain a pointer to the dynamically allocated array. When the variable is empty (uninitialized) or holds a zero-length array, the pointer is nil and no dynamic memory is associated with the variable. For a nonempty array, the variable points to a dynamically allocated block of memory that contains the array in addition to a 32-bit (64-bit on Win64) length indicator and a 32-bit reference count. The table below shows the layout of a dynamic-array memory block.
Dynamic array memory layout (32-bit and 64-bit)
Offset 32-bit -8 -4 0
Offset 64-bit -12 -8 0
contents refcount count start of data
So the dynamic array variable is a pointer to the middle of the above structure.
How do I access this in asm
Let's assume the array holds records of type TMyRec
you'll need to run this code for every inner array in the outer array to do the deep copy. I leave this as an exercise for the reader. (you can do the other part in pascal).
type
TDynArr: array of TMyRec;
procedure SlowButBasicMove(const Source: TDynArr; var dest);
asm
//insert register pushes, see below.
mov esi,Source //esi = pointer to source data
mov edi,Dest //edi = pointer to dest
sub esi,8
mov ebx,[esi] //ebx = refcount (just in case)
mov ecx,[esi+4] //ecx = element count
mov edx,SizeOf(TMyRec) //anywhere from 1 to zillions
mul ecx,edx //==ecx=number of bytes in array.
//// now we can start moving
xor ebx,ebx //ebx =0
add eax,8 //eax = #data
#loop:
mov eax,[esi+ebx] //Get data from source
mov [edi+ebx],esi //copy it to dest
add ebx,4 //4 bytes at a time
cmp ebx,ecx //is ebx> number of bytes?
jle loop
//Done copying.
//insert register pops, see below
end;
That's the copy done, however in order for the system not to crash, you need to save and restore the non volatile registers (all but EAX, ECX, EDX), see: http://docwiki.embarcadero.com/RADStudio/Seattle/en/Program_Control
push ebx
push esi
push edi
--- insert code shown above
//restore non-volatile registers
pop edi
pop esi
pop ebx //note the restoring must happen in the reverse order of the push.
See the Jeff Dunteman's book assembly step by step if you're completely new to asm.
You will get access violations if:
you try to read from a wrong address.
you try to write to a wrong adress.
you read past the end of the array.
you write to memory you haven't claimed before using GetMem or whatever means.
if you write past the end of your buffer.
if you do not restore all non-volatile registers
Remember you're directly dealing with the CPU. Delphi will not assist you in any way.
Really fast code will use some form of SSE to move 16bytes per instruction in an unrolled loop, see the above mentioned fastcode for examples of optimized assembler.
Random assembler
In assembler you need to know exactly what you're what to do, how and what the CPU does.
Set a breakpoint and run your code. Press ctrl + alt + C and behold the CPU-debug window.
This will allow you to see the code Delphi generates.
You can single step through the code to see what the CPU does.
see: http://www.plantation-productions.com/Webster/index.html
For more reading.
Dynamic Arrays differ from Static Arrays, especially when it comes to multi-dimensionality.
Refer to this reference for internal formats.
The point is that an Array Of Array Of LongInt of dimensions X and Y (in this order!) is a pointer to an array of X pointers that point to an array of Y LongInts.
Since it seems, from your comments, that you have already allocated the space for all elements in Dest, I assume you want to do a Deep Copy.
Here a sample program, where the assembly as been made as simple as possible for the sake of clarity.
Program Test;
Var Src, Dest: Array Of Array Of LongInt;
X, Y, I, J: Integer;
Begin
X := 4;
Y := 2;
setLength(Src, X, Y);
setLength(Dest, X, Y);
For I := 0 To X-1 Do
For J := 0 To Y-1 Do
Src[I,J] := I*Y + J;
{$ASMMODE intel}
Asm
cld ;Be sure movsd increments the registers
mov esi, DWORD PTR [Src] ;Src pointer
mov edi, DWORD PTR [Dest] ;Dest pointer
mov ecx, DWORD PTR [X] ;Repeat for X times
;The number of elements in Src
#_copy:
push esi ;Save these for later
push edi
push ecx
mov ecx, DWORD PTR [Y] ;Repeat for Y times
;The number of element in a Src[i] array
mov edi, DWORD PTR [edi] ;Get the pointer to the Dest[i] array
mov esi, DWORD PTR [esi] ;Get the pointer to the Src[i] array
rep movsd ;Copy sub array
pop ecx ;Restore
pop edi
pop esi
add esi, 04h ;Go from Src[i] to Src[i+1]
add edi, 04h ;Go from Dest[i] to Dest[i+1]
loop #_copy
End;
For I := 0 To X-1 Do
Begin
WriteLn();
For J := 0 To Y-1 Do
Begin
Write(' ');
Write(Dest[I,J]);
End;
End;
End.
Note 1 This source code is intended to be compile with freepascal.
Donation of Spare Time(TM) for downloading and installing Delphi are welcome!
Note 2 This source code is for illustration purpose only, it is pretty obvious, it has already been stated above, but somehow not everybody got it.
If the OP wanted a fast way to copy the array, they should have stated so.
Note 3 I don't save the clobbered registers, this is bad practice, my bad; I forgot, as there are no subroutines, no optimizations and no reason for the compiler to pass data in the registers between the two fors.
This is left as an exercise to the reader.

unusual behaviour in delphi assembly block

I am running into some weird behaviour with Delphi's inline assembly, as demonstrated in this very short and simple program:
program test;
{$APPTYPE CONSOLE}
uses
SysUtils;
type
TAsdf = class
public
int: Integer;
end;
TBlah = class
public
asdf: TAsdf;
constructor Create(a: TAsdf);
procedure Test;
end;
constructor TBlah.Create(a: TAsdf);
begin
asdf := a;
end;
procedure TBlah.Test;
begin
asm
mov eax, [asdf]
end;
end;
var
asdf: TAsdf;
blah: TBlah;
begin
asdf := TAsdf.Create;
blah := TBlah.Create(asdf);
blah.Test;
readln;
end.
It's just for the sake of example (moving [asdf] into eax doesn't do much, but it works for the example). If you look at the assembly for this program, you'll see that
mov eax, [asdf]
has been turned into
mov eax, ds:[4]
(as represented by OllyDbg) which obviously crashes. However, if you do this:
var
temp: TAsdf;
begin
temp := asdf;
asm
int 3;
mov eax, [temp];
end;
It changes to
mov eax, [ebp-4]
which works. Why is this? I'm usually working with C++ and I'm used to using instance vars like that, it may be that I'm using instance variables wrong.
EDIT: Yep, that was it. Changing mov eax, [asdf] to mov eax, [Self.asdf] fixes the problem. Sorry about that.
In the first case, mov eax,[asdf], the assembler will look up asdf and discover it is a field of offset 4 in the instance. Because you used an indirect addressing mode without a base address, it will only encode the offset (it looks like 0 + asdf to the assembler). Had you written it like this: mov eax, [eax].asdf, it would have been encoded as mov eax, [eax+4]. (here eax contains Self as passed in from the caller).
In the second case, the assembler will look up Temp and see that it is a local variable indexed by EBP. Because it knows the base address register to use, it can decide to encode it as [EBP-4].
A method receives the Self pointer in the EAX register. You have to use that value as the base value for accessing the object. So your code would be something like:
mov ebx, TBlah[eax].asdf
See http://www.delphi3000.com/articles/article_3770.asp for an example.

Resources