Why is there two sequential move to EAX under optimization build? - delphi

I looked at the ASM code of a release build with all optimizations turned on, and here is one of the inlined function I came across:
0061F854 mov eax,[$00630bec]
0061F859 mov eax,[$00630e3c]
0061F85E mov edx,$00000001
0061F863 mov eax,[eax+edx*4]
0061F866 cmp byte ptr [eax],$01
0061F869 jnz $0061fa83
The code is pretty easy to understand, it builds an offset (1) into a table, compares the byte value from it to 1 and do a jump if NZ. I know the pointer to my table is stored in $00630e3c, but I have no idea where $00630bec is coming from.
Why is there two move to eax one after the other? Isn't the first one overwritten by the second one? Can this be a cache optimization thing or am I missing something unbelievably obvious/obscure?
The Delphi code for the above ASM is as follow:
if( TGameSignals.IsSet( EmitParticleSignal ) = True ) then [...]
IsSet() is an inlined class function and calls the inlined IsSet() function of TSignalManager:
class function TGameSignals.IsSet(Signal: PBucketSignal): Boolean;
begin
Result := FSignalManagerInstance.IsSet( Signal );
end;
The final IsSet of the signal manager is as such:
function TSignalManagerInstance.IsSet( Signal: PBucketSignal ): Boolean;
begin
Result := Signal.Pending;
end;

My best guess would be that $00630bec is a reference to the class TGameSignals. You can check it by doing
ShowMessage(IntToHex(NativeInt(TGameSignals), 8))
The pre-optimisation code was probably something like this
0061F854 mov eax,[$00630bec] //Move reference to class TGameSignals in EAX
0061F859 mov eax,[eax + $250] //Move Reference to FSignalManagerInstance at offset $250 in class TGameSignals in EAX
the compiler optimised [eax + $250] to [$00630e3c], but didn't realize the previous MOV wasn't required anymore.
I'm not an expert in codegen, so take it with a grain of salt...
On a side note, in delphi, we usually write
if TGameSignals.IsSet( EmitParticleSignal ) then
As it's possible for the following IF to be true
var vBool : Boolean
[...]
vBool := Boolean(10);
if vBool and (vBool <> True) then
Granted, this is not good practice, but no point in comparing to TRUE either.
EDIT: As pointed out by Ped7g, I was wrong. The instruction is
0061F854 mov eax,[$00630bec]
and not
0061F854 mov eax,$00630bec
So what I wrote didn't really make sense...
The first MOV instruction serve to pass the "self" reference for the call to TGameSignals.IsSet. Now, if the function wasn't inline, it would look like this :
mov eax,[$00630bec]
call TGameSignals.IsSet
and then
*TGameSignals.IsSet
mov eax,[$00630e3c]
[...]
The first mov is still pointless, since "Self" isn't used in TGameSignals.IsSet but it is still required to pass "self" to the function. When the routine get inlined, it looks a lot more silly, indeed.
Like mentioned by Arnaud Bouchez, making TGameSignals.IsSet static remove the implicit Self parameter and thus, remove the first MOV operation.

Related

Asm equivalent to a delphi procedure

I have a simple delphi function called SetCompare that compares two singles and if they are not equal then one value is set to the other.
procedure SetCompare( A : single; B : single );
begin
if( A <> B ) then
A := B;
end;
I am trying to convert this into asm as such:
procedure SetCompare( A : Single; B : Single ); register;
begin
asm
mov EAX,A
mov ECX,B
cmp EAX,ECX
jne SetValue
#SetValue:
mov EAX,ECX
end;
end;
Will this work?
Will this work?
No this will not work, because floating point comparison is not the same as binary comparison. For instance 0 and -0 have different bit patterns, but compare as equal. Similarly, NaN compares unequal to all values, including a NaN with the same bit pattern.
The simplest way to work out how to write your code is to get the compiler to compile the Pascal code, and inspect the generated assembly code.
Some asides:
Your function is pointless anyway, because it returns no value and has no side effects.
If performance matters enough to write assembler, then you should write pure assembler functions, rather than inline asm blocks in a Pascal function. Which in any case is not supported by the x64 compiler.
Your arguments are already in registers, so it makes little sense to copy them around to other registers. For x86 code, A arrives in EAX, and B arrives in EDX. Given that EAX already contains A, why would you copy it into EAX? It is already there. And B is already in EDX, why copy it to ECX? For x64 code, the two arguments are passed in floating point registers, and can be compared there directly. As soon as your start writing assembler you need to understand the register use of the calling convention.
Your jne is pointless. If execution does not take the jump, then it moves to the next line of code. Which is where you jumped to.

Inline asm (32) emulation of move (copy memory) command

I have two two-dimensional arrays with dynamic sizes (guess that's the proper wording). I copy the content of first one into the other using:
dest:=copy(src,0,4*x*y);
// src,dest:array of array of longint; x,y:longint;
// setlength(both arrays,x,y); //x and y are max 15 bit positive!
It works. However I'm unable to reproduce this in asm. I tried the following variations to no avail... Could someone enlighten me...
MOV ESI,src; MOV EDI,dest; MOV EBX,y; MOV EAX,x; MUL EBX;
PUSH DS; POP ES; MOV ECX,EAX; CLD; REP MOVSD;
Also tried with LEA (didn't expect that to work since it should fetch the pointer address not the array address), no workie, and tried with:
p1:=#src[0,0]; p2:=#dest[0,0]; //being no-type pointers
MOV ESI,p1; MOV EDI,p2... (the same asm)
Hints pls? Btw it's delphi 6. The error is, of course, access violation.
This is really a two-fold three-fold question.
What's the structure of a dynamic array.
Which instructions in asm will copy the array.
I'm throwing random assembler at the CPU, why doesn't it work?
Structure of a dynamic array
See: http://docwiki.embarcadero.com/RADStudio/Seattle/en/Internal_Data_Formats
To quote:
Dynamic Array Types
On the 32-bit platform, a dynamic-array variable occupies 4 bytes of memory (and 8 bytes on 64-bit) that contain a pointer to the dynamically allocated array. When the variable is empty (uninitialized) or holds a zero-length array, the pointer is nil and no dynamic memory is associated with the variable. For a nonempty array, the variable points to a dynamically allocated block of memory that contains the array in addition to a 32-bit (64-bit on Win64) length indicator and a 32-bit reference count. The table below shows the layout of a dynamic-array memory block.
Dynamic array memory layout (32-bit and 64-bit)
Offset 32-bit -8 -4 0
Offset 64-bit -12 -8 0
contents refcount count start of data
So the dynamic array variable is a pointer to the middle of the above structure.
How do I access this in asm
Let's assume the array holds records of type TMyRec
you'll need to run this code for every inner array in the outer array to do the deep copy. I leave this as an exercise for the reader. (you can do the other part in pascal).
type
TDynArr: array of TMyRec;
procedure SlowButBasicMove(const Source: TDynArr; var dest);
asm
//insert register pushes, see below.
mov esi,Source //esi = pointer to source data
mov edi,Dest //edi = pointer to dest
sub esi,8
mov ebx,[esi] //ebx = refcount (just in case)
mov ecx,[esi+4] //ecx = element count
mov edx,SizeOf(TMyRec) //anywhere from 1 to zillions
mul ecx,edx //==ecx=number of bytes in array.
//// now we can start moving
xor ebx,ebx //ebx =0
add eax,8 //eax = #data
#loop:
mov eax,[esi+ebx] //Get data from source
mov [edi+ebx],esi //copy it to dest
add ebx,4 //4 bytes at a time
cmp ebx,ecx //is ebx> number of bytes?
jle loop
//Done copying.
//insert register pops, see below
end;
That's the copy done, however in order for the system not to crash, you need to save and restore the non volatile registers (all but EAX, ECX, EDX), see: http://docwiki.embarcadero.com/RADStudio/Seattle/en/Program_Control
push ebx
push esi
push edi
--- insert code shown above
//restore non-volatile registers
pop edi
pop esi
pop ebx //note the restoring must happen in the reverse order of the push.
See the Jeff Dunteman's book assembly step by step if you're completely new to asm.
You will get access violations if:
you try to read from a wrong address.
you try to write to a wrong adress.
you read past the end of the array.
you write to memory you haven't claimed before using GetMem or whatever means.
if you write past the end of your buffer.
if you do not restore all non-volatile registers
Remember you're directly dealing with the CPU. Delphi will not assist you in any way.
Really fast code will use some form of SSE to move 16bytes per instruction in an unrolled loop, see the above mentioned fastcode for examples of optimized assembler.
Random assembler
In assembler you need to know exactly what you're what to do, how and what the CPU does.
Set a breakpoint and run your code. Press ctrl + alt + C and behold the CPU-debug window.
This will allow you to see the code Delphi generates.
You can single step through the code to see what the CPU does.
see: http://www.plantation-productions.com/Webster/index.html
For more reading.
Dynamic Arrays differ from Static Arrays, especially when it comes to multi-dimensionality.
Refer to this reference for internal formats.
The point is that an Array Of Array Of LongInt of dimensions X and Y (in this order!) is a pointer to an array of X pointers that point to an array of Y LongInts.
Since it seems, from your comments, that you have already allocated the space for all elements in Dest, I assume you want to do a Deep Copy.
Here a sample program, where the assembly as been made as simple as possible for the sake of clarity.
Program Test;
Var Src, Dest: Array Of Array Of LongInt;
X, Y, I, J: Integer;
Begin
X := 4;
Y := 2;
setLength(Src, X, Y);
setLength(Dest, X, Y);
For I := 0 To X-1 Do
For J := 0 To Y-1 Do
Src[I,J] := I*Y + J;
{$ASMMODE intel}
Asm
cld ;Be sure movsd increments the registers
mov esi, DWORD PTR [Src] ;Src pointer
mov edi, DWORD PTR [Dest] ;Dest pointer
mov ecx, DWORD PTR [X] ;Repeat for X times
;The number of elements in Src
#_copy:
push esi ;Save these for later
push edi
push ecx
mov ecx, DWORD PTR [Y] ;Repeat for Y times
;The number of element in a Src[i] array
mov edi, DWORD PTR [edi] ;Get the pointer to the Dest[i] array
mov esi, DWORD PTR [esi] ;Get the pointer to the Src[i] array
rep movsd ;Copy sub array
pop ecx ;Restore
pop edi
pop esi
add esi, 04h ;Go from Src[i] to Src[i+1]
add edi, 04h ;Go from Dest[i] to Dest[i+1]
loop #_copy
End;
For I := 0 To X-1 Do
Begin
WriteLn();
For J := 0 To Y-1 Do
Begin
Write(' ');
Write(Dest[I,J]);
End;
End;
End.
Note 1 This source code is intended to be compile with freepascal.
Donation of Spare Time(TM) for downloading and installing Delphi are welcome!
Note 2 This source code is for illustration purpose only, it is pretty obvious, it has already been stated above, but somehow not everybody got it.
If the OP wanted a fast way to copy the array, they should have stated so.
Note 3 I don't save the clobbered registers, this is bad practice, my bad; I forgot, as there are no subroutines, no optimizations and no reason for the compiler to pass data in the registers between the two fors.
This is left as an exercise to the reader.

Accessing Delphi Class Fields in 64 bit inline assembler

I am trying to convert the Delphi TBits.GetBit to inline assembler for the 64 bit version. The VCL source looks like this:
function TBits.GetBit(Index: Integer): Boolean;
{$IFNDEF X86ASM}
var
LRelInt: PInteger;
LMask: Integer;
begin
if (Index >= FSize) or (Index < 0) then
Error;
{ Calculate the address of the related integer }
LRelInt := FBits;
Inc(LRelInt, Index div BitsPerInt);
{ Generate the mask }
LMask := (1 shl (Index mod BitsPerInt));
Result := (LRelInt^ and LMask) <> 0;
end;
{$ELSE X86ASM}
asm
CMP Index,[EAX].FSize
JAE TBits.Error
MOV EAX,[EAX].FBits
BT [EAX],Index
SBB EAX,EAX
AND EAX,1
end;
{$ENDIF X86ASM}
I started converting the 32 bit ASM code to 64 bit. After some searching, I found out that I need to change the EAX references to RAX for the 64 bit compiler. I ended up with this for the first line:
CMP Index,[RAX].FSize
This compiles but gives an access violation when it runs. I tried a few combinations (e.g. MOV ECX,[RAX].FSize) and get the same access violation when trying to access [RAX].FSize. When I look at the assembler that is generated by the Delphi compiler, it looks like my [RAX].FSize should be correct.
Unit72.pas.143: MOV ECX,[RAX].FSize
00000000006963C0 8B8868060000 mov ecx,[rax+$00000668]
And the Delphi generated code:
Unit72.pas.131: if (Index >= FSize) or (Index < 0) then
00000000006963CF 488B4550 mov rax,[rbp+$50]
00000000006963D3 8B4D58 mov ecx,[rbp+$58]
00000000006963D6 3B8868060000 cmp ecx,[rax+$00000668]
00000000006963DC 7D06 jnl TForm72.GetBit + $24
00000000006963DE 837D5800 cmp dword ptr [rbp+$58],$00
00000000006963E2 7D09 jnl TForm72.GetBit + $2D
In both cases, the resulting assembler uses [rax+$00000668] for FSize. What is the correct way to access a class field in Delphi 64bit Assembler?
This may sound like a strange thing to optimize but the assembler for the 64bit pascal version doesn't appear to be very efficient. We call this routine a large number of times and it takes anything up to 5 times as long to execute depending on various factors.
The basic problem is that you are using the wrong register. Self is passed as an implicit parameter, before all others. In the x64 calling convention, that means it is passed in RCX and not RAX.
So Self is passed in RCX and Index is passed in RDX. Frankly, I think it's a mistake to use parameter names in inline assembler because they hide the fact that the parameter was passed in a register. If you happen to overwrite either RDX, then that changes the apparent value of Index.
So the if statement might be coded as
CMP EDX,[RCX].FSize
JNL TBits.Error
CMP EDX,0
JL TBits.Error
FWIW, this is a really simple function to implement and I don't believe that you will need to use any stack space. You have enough registers in x64 to be able to do this entirely using volatile registers.

Address of Delphi label

I'm working on a simple PIC18 MCPU mnemonic simulation in Delphi pascal. And yes, I intend to use Delphi IDE.
I'm able to simulate any asm instruction, but it stops at labels.
In some cases I need to know the address of Delphi label.
Is there any possibility to cast label in to pointer variable?
As in my example?
procedure addlw(const n:byte); //emulation of mcpu addlw instruction
begin
Carry := (wreg + n) >= 256;
wreg := wreg + n;
Zero := wreg = 0;
inc(CpuCycles);
end;
procedure bnc(p: pointer ); //emulation of mcpu bnc instruction
asm
inc CpuCycles
cmp byte ptr Carry, 0
jnz #exit
pop eax //restore return addres from stack
jmp p
#exit:
end;
//EMULATION OF MCPU ASM CODE
procedure Test;
label
Top;
var
p: pointer;
begin
//
Top:
addlw(5); //emulated mcpu addlw instruction
bnc(Top); //emulated mcpu bnc branch if not carry instruction
//
end;
No, you can't interact with labels that way. Since you're emulating everything else, you may as well emulate assembler labels, too, instead of trying to force Delphi labels to do something they're not designed for.
Suppose you could use code like this instead of the "assembler" code you wrote (without worrying for now exactly how to implement it):
procedure Test;
var
Top: TAsmLabel;
begin
//
DefineLabel(Top);
addlw(5); //emulated mcpu addlw instruction
bnc(Top); //emulated mcpu bnc branch if not carry instruction
//
end;
The syntax looks similar enough, I think. Upon running that code, you'll want Top to refer to the next instruction, which is the one that calls addlw.
Inside the hypothetical function DefineLabel, that address corresponds to the return address, so write DefineLabel to store its return address in the given parameter:
type
TAsmLabel = Pointer;
procedure DefineLabel(out Result: TAsmLabel);
asm
mov ecx, [esp] // copy return address
mov [eax], ecx // store result
end;
Beware that this code corrupts the stack. Your bcn function leaves its return address on the stack, so when the carry flag eventually gets set, you've left a trail of previous return addresses on the stack. If you don't get a stack overflow first, you'll hit strange results when you get to the end of the containing function. It will try to return, but instead of going to the caller, it will find bnc's return address instead, and jump back into the middle of your code. And that's all assuming there aren't any other stack-relative references in the code. If there are, then even calling bnc(Top) might give problems because the relative position of Top will have changed, and you'll end up reading the wrong value off the stack.

Is it necessary to assign a default value to a variant returned from a Delphi function?

Gradually I've been using more variants - they can be very useful in certain places for carrying data types that are not known at compile time. One useful value is UnAssigned ('I've not got a value for you'). I think I discovered a long time ago that the function:
function DoSomething : variant;
begin
If SomeBoolean then
Result := 4.5
end;
appeared to be equivalent to:
function DoSomething : variant;
begin
If SomeBoolean then
Result := 4.5
else
Result := Unassigned; // <<<<
end;
I presumed this reasoning that a variant has to be created dynamically and if SomeBoolean was FALSE, the compiler had created it but it was 'Unassigned' (<> nil?). To further encourage this thinking, the compiler reports no warning if you omit assigning Result.
Just now I've spotted nasty bug where my first example (where 'Result' is not explicity defaulted to 'nil') actually returned an 'old' value from somewhere else.
Should I ALWAYS assign Result (as I do when using prefefined types) when returing a variant?
Yes, you always need to initialize the Result of a function, even if it's a managed type (like string and Variant). The compiler does generate some code to initialize the future return value of a Variant function for you (at least the Delphi 2010 compiler I used for testing purposes does) but the compiler doesn't guarantee your Result is initialized; This only makes testing more difficult, because you might run into a case where your Result was initialized, base your decisions on that, only to later discover your code is buggy because under certain circumstances the Result wasn't initialized.
From my investigation, I've noticed:
If your result is assigned to a global variable, your function is called with an initialized hidden temporary variable, creating the illusion that the Result is magically initialized.
If you make two assignments to the same global variable, you'll get two distinct hidden temporary variables, re-enforcing the illusion that Result's are initialized.
If you make two assignments to the same global variable but don't use the global variable between calls, the compiler only uses 1 hidden temporary, braking the previous rule!
If your variable is local to the calling procedure, no intermediary hidden local variable is used at all, so the Result isn't initialized.
Demonstration:
First, here's the proof that a function returning a Variant receives a var Result:
Variant hidden parameter. The following two compile to the exact same assembler, shown below:
procedure RetVarProc(var V:Variant);
begin
V := 1;
end;
function RetVarFunc: Variant;
begin
Result := 1;
end;
// Generated assembler:
push ebx // needs to be saved
mov ebx, eax // EAX contains the address of the return Variant, copies that to EBX
mov eax, ebx // ... not a very smart compiler
mov edx, $00000001
mov cl, $01
call #VarFromInt
pop ebx
ret
Next, it's interesting to see how the call for the two is set up by the complier. Here's what happens for a call to a procedure that has a var X:Variant parameter:
procedure Test;
var X: Variant;
begin
ProcThatTakesOneVarParameter(X);
end;
// compiles to:
lea eax, [ebp - $10]; // EAX gets the address of the local variable X
call ProcThatTakesOneVarParameter
If we make that "X" a global variable, and we call the function returning a Variant, we get this code:
var X: Variant;
procedure Test;
begin
X := FuncReturningVar;
end;
// compiles to:
lea eax, [ebp-$10] // EAX gets the address of a HIDDEN local variable.
call FuncReturningVar // Calls our function with the local variable as parameter
lea edx, [ebp-$10] // EDX gets the address of the same HIDDEN local variable.
mov eax, $00123445 // EAX is loaded with the address of the global variable X
call #VarCopy // This moves the result of FuncReturningVar into the global variable X
If you look at the prologue of this function you'll notice the local variable that's used as a temporary parameter for the call to FuncReturningVar is initialized to ZERO. If the function doesn't contain any Result := statements, X would be "Uninitialized". If we call the function again, a DIFFERENT temporary and hidden variable is used! Here's a bit of sample code to see that:
var X: Variant; // global variable
procedure Test;
begin
X := FuncReturningVar;
WriteLn(X); // Make sure we use "X"
X := FuncReturningVar;
WriteLn(X); // Again, make sure we use "X"
end;
// compiles to:
lea eax, [ebp-$10] // first local temporary
call FuncReturningVar
lea edx, [ebp-$10]
mov eax, $00123456
call #VarCopy
// [call to WriteLn using the actual address of X removed]
lea eax, [ebp-$20] // a DIFFERENT local temporary, again, initialized to Unassigned
call FuncReturningVar
// [ same as before, removed for brevity ]
When looking at that code, you'd think the "Result" of a function returning Variant is allways initialized to Unassigned by the calling party. Not true. If in the previous test we make the "X" variable a LOCAL variable (not global), the compiler no longer uses the two separate local temporary variables. So we've got two separate cases where the compiler generates different code. In other words, don't make any assumptions, always assign Result.
My guess about the different behavior: If the Variant variable can be accessed outside the current scope, as a global variable (or class field for that matter) would, the compiler generates code that uses the thread-safe #VarCopy function. If the variable is local to the function there are no multi-threading issues so the compiler can take the liberty to make direct assignments (no-longer calling #VarCopy).
Should I ALWAYS assign Result (as I do
when using prefefined types) when
returing a variant?
Yes.
Test this:
function DoSomething(SomeBoolean: Boolean) : variant;
begin
if SomeBoolean then
Result := 1
end;
Use the function like this:
var
xx: Variant;
begin
xx := DoSomething(True);
if xx <> Unassigned then
ShowMessage('Assigned');
xx := DoSomething(False);
if xx <> Unassigned then
ShowMessage('Assigned');
end;
xx will still be assigned after the second call to DoSomething.
Change the function to this:
function DoSomething(SomeBoolean: Boolean) : variant;
begin
Result := Unassigned;
if SomeBoolean then
Result := 1
end;
And xx is not assigned after the second call to DoSomething.

Resources