64 bit inline assembly in Delphi XE3 - delphi

I ran into an issue with 64-bit inline assembler in Delphi XE3 that I don't understand.
I tried this, and it works on both 32-bit and 64-bit:
function test(a, b: integer): integer; assembler; register;
asm
mov eax, a
add eax, edx
end;
However, the following works only on 32-bit. On 64-bit it compiles, but it does not return the correct sum of the two integers.
function test(a, b: integer): integer; assembler; register;
asm
add eax, edx
end;
I know that older FPU code using instructions such as FLD and FSTP works on 32-bit but gives a compilation error with the 64-bit compiler. Any idea how to handle floating-point numbers in 64-bit?

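This is because 64-bit Windows uses its own calling convention, in which the first two integer parameters arrive in RCX and RDX rather than in EAX and EDX. The second version therefore adds EDX to whatever happens to be in EAX. More explanations are available on MSDN.
As for handling floating-point numbers, read the Intel developer documentation.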
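To make the register difference concrete, here is a minimal sketch, assuming the Windows x64 convention described above (integer parameters in ECX/EDX, floating-point parameters and results in XMM0/XMM1). The names TestAdd and AddDoubles are hypothetical and not from the question; CPUX64 is the symbol the 64-bit Delphi compiler defines.

```pascal
program Asm64Sketch;
{$APPTYPE CONSOLE}

// Hypothetical name. Adds two integers on both targets.
function TestAdd(a, b: Integer): Integer;
asm
  {$IFDEF CPUX64}
  mov eax, ecx      // Win64: a arrives in ECX, b in EDX
  add eax, edx      // the result is returned in EAX
  {$ELSE}
  add eax, edx      // Win32 register convention: a in EAX, b in EDX
  {$ENDIF}
end;

// Hypothetical name. On Win64 scalar floating-point math uses SSE2
// instructions instead of x87 instructions such as FLD/FSTP.
function AddDoubles(a, b: Double): Double;
{$IFDEF CPUX64}
asm
  addsd xmm0, xmm1  // Win64: a in XMM0, b in XMM1, result in XMM0
end;
{$ELSE}
begin
  Result := a + b;  // plain Pascal on other targets
end;
{$ENDIF}

begin
  Writeln(TestAdd(2, 3));
  Writeln(AddDoubles(1.5, 2.25):0:2);
end.
```

Referring to parameters by name (as in the first version in the question) sidesteps the problem, because the compiler substitutes the right register for each target.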

Related

Access local variables in a nested Delphi x64 assembly code

I want to access the local variables of a Delphi procedure from its nested assembly procedure. Although the compiler does allow the references of the local variables, it compiles wrong offsets which only work if the EBP/RBP values are hacked. In the x86 environment I found a fairly elegant hack, but in x64 I couldn't find yet any decent solution.
In the x86 environment the workaround below seems to work fine:
procedure Main;
var ABC: integer;
procedure Sub;
asm
mov ebp, [esp]
mov eax, ABC
end;
...
In the above code, the compiler treats the variable ABC as if it were in the body of Main, so hacking the value of EBP in the first assembly line solves the problem. However, the same trick won't work in the x64 environment:
procedure Main;
var ABC: int64;
procedure Sub;
asm
mov rbp, [rsp]
mov rax, ABC
end;
...
In the above code, the compiler adds an offset when it references the variable ABC, and that offset is correct neither with the original (Main) value of RBP nor with its new (Sub) value. Moreover, changing RBP in 64-bit code isn't recommended, so I found the workaround below:
procedure Main;
var ABC: int64;
procedure Sub;
asm
add rcx, $30
mov rax, [rcx + OFFSET ABC]
end;
...
As the compiler passes the initial value of RBP in RCX, and the reference to the variable ABC can be hacked to be RCX-based rather than RBP-based, the above code does work. However, the correction of $30 depends on the number of variables of Main, so this workaround is a last resort, and I'd very much like to find something more elegant.
Does anyone have a suggestion on how to do this in a more elegant way?
Note that:
Of course: in my real code there are a large number of local variables to be accessed from the ASM code, so solutions like passing the variables as parameters are ruled out.
I'm adding x64 compatibility to x86 code, and there are dozens of routines like this, so I need a workaround which transforms that code with small formal changes only (accessing the local variables in a fundamentally different way would become an inexhaustible source of bugs).
UPDATE:
Found a safe but relatively complicated solution: I added a local variable called Sync to find out the offset between the RBP values of Main and Sub, then I do the correction on the RBP:
procedure Main;
var Sync: int64; ABC: int64;
procedure Sub(var SubSync: int64);
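asm
push rbp // save Sub's frame pointer
lea rax, Sync // address of Sync as computed from Sub's RBP
sub rdx, rax // RDX holds the real address of Sync (SubSync); RDX := correction
add rbp, rdx // shift RBP so that Main's locals resolve correctly
mov rax, ABC // now reads the real ABC
pop rbp // restore Sub's frame pointer
end;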
begin
ABC := 66;
Sub(Sync);
end;
So far nobody has come up with a better solution, so I consider the code above to be the best known solution.
BTW, as this looks very much like a Delphi bug, I reported it to Embarcadero.

FPC BASM32 MUL bug?

I came across a problem while porting Delphi BASM32 code to FPC:
program MulTest;
{$IFDEF FPC}
{$mode delphi}
{$asmmode intel}
{$ELSE}
{$APPTYPE CONSOLE}
{$ENDIF}
function Mul(A, B: LongWord): LongWord;
asm
MUL EAX,EDX
end;
begin
Writeln(Mul(10,20));
Readln;
end.
The above code compiles in Delphi XE and works as expected; FPC reports a compile-time error on the MUL EAX,EDX line:
Error: Asm: [mul reg32,reg32] invalid combination of opcode and operands
I am using Lazarus 1.4.4/FPC2.6.4 for Win32 (the current stable version)
Any workaround or fix for the problem?
FreePascal is correct. There are only 3 forms of MUL:
MUL r/m8
MUL r/m16
MUL r/m32
Performs an unsigned multiplication of the first operand (destination operand) and the second operand (source operand) and stores the result in the destination operand. The destination operand is an implied operand located in register AL, AX or EAX (depending on the size of the operand); the source operand is located in a general-purpose register or a memory location.
In other words, the first operand (used for both input and output) is specified in AL/AX/EAX, and the second input operand is explicitly specified as either a general-purpose register or a memory address.
So, MUL EAX,EDX is indeed an invalid assembly instruction.
If you compile this code in Delphi and use the debugger to look at the generated assembly, you would see that the call to Mul(10,20) generates the following assembly code:
// Mul(10,20)
mov edx,$00000014
mov eax,$0000000a
call Mul
//MUL EAX,EDX
mul edx
So, as you can see, Delphi is actually parsing your source code, sees that the first operand is EAX and strips it off for you, thus producing the correct assembly. FreePascal is not doing that step for you.
The workaround? Write proper assembly code to begin with. Don't rely on the compiler to re-interpret your code for you.
function Mul(A, B: LongWord): LongWord;
asm
MUL EDX
end;
Or, you could simply not write assembly code directly and let the compiler do the work for you. It knows how to multiply two LongWord values together:
function Mul(A, B: LongWord): LongWord;
begin
Result := A * B;
end;
Though Delphi does use IMUL instead of MUL in this case. From Delphi's documentation:
The value of x / y is of type Extended, regardless of the types of x and y. For other arithmetic operators, the result is of type Extended whenever at least one operand is a real; otherwise, the result is of type Int64 when at least one operand is of type Int64; otherwise, the result is of type Integer. If an operand's type is a subrange of an integer type, it is treated as if it were of the integer type.
It also uses some unsightly bloated assembly unless stackframes are disabled and optimizations are enabled. By configuring those two options, it is possible to get Mul() to generate a single IMUL EDX instruction (plus the RET instruction, of course). If you don't want to change the options project-wide, you can isolate them to just Mul() by using the {$STACKFRAMES OFF}/{$W-} and {$OPTIMIZATION ON}/{$O+} compiler directives.
{$IFOPT W+}{$W-}{$DEFINE SF_Was_On}{$ENDIF}
{$IFOPT O-}{$O+}{$DEFINE O_Was_Off}{$ENDIF}
function Mul(A, B: LongWord): LongWord;
begin
Result := A * B;
end;
{$IFDEF SF_Was_On}{$W+}{$UNDEF SF_Was_On}{$ENDIF}
{$IFDEF O_Was_Off}{$O-}{$UNDEF O_Was_Off}{$ENDIF}
Generates:
imul edx
ret
MUL always multiplies by AL, AX or EAX, so you should specify only the other operand.
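The implied-operand behaviour also means that the 32-bit MUL writes its full 64-bit product to EDX:EAX. The sketch below illustrates this for 32-bit targets only, assuming the default register calling convention (A in EAX, B in EDX) and that a 64-bit integer result is returned in EDX:EAX. MulWide is a hypothetical name, not part of the original question.

```pascal
program MulWideTest;
{$IFDEF FPC}
  {$mode delphi}
  {$asmmode intel}
{$ELSE}
  {$APPTYPE CONSOLE}
{$ENDIF}

// Hypothetical helper, 32-bit x86 only.
// A arrives in EAX, B in EDX (register calling convention).
function MulWide(A, B: LongWord): UInt64;
asm
  MUL EDX   // single explicit operand: EDX:EAX := EAX * EDX
end;

begin
  // $FFFFFFFF * 2 = 8589934590, which does not fit in 32 bits
  Writeln(MulWide($FFFFFFFF, 2));
end.
```

The one-operand spelling is accepted by both Delphi and FPC, which is the point of the answers above.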

What is the Delphi equivalent to the C __builtin_clz()?

Quoted from https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html,
— Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.
What is the Delphi equivalent to the C __builtin_clz()? If there isn't one, how can it be implemented efficiently in Delphi?
Actually, I want to use it to calculate the base-2 logarithm of an integer.
If you only care about 32 bit code then it goes like this:
function __builtin_clz(x: Cardinal): Cardinal;
asm
BSR EAX,EAX // EAX := index (0..31) of the highest set bit
NEG EAX
ADD EAX,31 // leading zeros = 31 - index
end;
Or if you want to support 64 bit code as well then it would be:
function __builtin_clz(x: Cardinal): Cardinal;
{$IF Defined(CPUX64)}
asm
BSR ECX,ECX
NEG ECX
ADD ECX,31
MOV EAX,ECX
{$ENDIF}
{$IF Defined(CPUX86)}
asm
BSR EAX,EAX
NEG EAX
ADD EAX,31
{$ENDIF}
end;
It's likely that an asm guru could trim this down a little, but BSR (bit scan reverse) is the key instruction.
For the mobile compilers, I don't know how to do this efficiently.
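For targets without inline assembler, a plain Pascal fallback is possible. The following is a minimal sketch using a binary search over the bit positions; it is not claimed to be the fastest option. The names CountLeadingZeros and FloorLog2 are hypothetical, and returning 32 for an input of 0 is a convention chosen here (the GCC builtin leaves that case undefined).

```pascal
program ClzSketch;
{$APPTYPE CONSOLE}

// Hypothetical portable fallback: counts leading zero bits of a 32-bit value.
function CountLeadingZeros(x: Cardinal): Cardinal;
begin
  if x = 0 then
  begin
    Result := 32;  // chosen convention; __builtin_clz is undefined here
    Exit;
  end;
  Result := 0;
  // Each step checks whether the top half of the remaining window is empty
  // and, if so, counts it and shifts the value up.
  if (x and $FFFF0000) = 0 then begin Inc(Result, 16); x := x shl 16; end;
  if (x and $FF000000) = 0 then begin Inc(Result, 8);  x := x shl 8;  end;
  if (x and $F0000000) = 0 then begin Inc(Result, 4);  x := x shl 4;  end;
  if (x and $C0000000) = 0 then begin Inc(Result, 2);  x := x shl 2;  end;
  if (x and $80000000) = 0 then Inc(Result, 1);
end;

// Hypothetical helper: floor of the base-2 logarithm; yields -1 for x = 0.
function FloorLog2(x: Cardinal): Integer;
begin
  Result := 31 - Integer(CountLeadingZeros(x));
end;

begin
  Writeln(CountLeadingZeros(1));  // 1 has 31 leading zero bits
  Writeln(FloorLog2(1024));       // 1024 = 2^10
end.
```

Since the stated goal is a base-2 logarithm, FloorLog2 is probably the function to call directly; on x86 it corresponds to the raw BSR result.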

Converting to Delphi 64 bit?

I want to port OgUtil.pas from OnGuard to 64-bit.
The 64-bit Delphi compiler does not allow ASM blocks mixed with Pascal code. Can I convert this function to work with 64-bit Delphi?
function LockFile(Handle : THandle;
FileOffsetLow, FileOffsetHigh,
LockCountLow, LockCountHigh : Word) : Boolean;
var
Error : Word;
begin
asm
mov ax,$5C00
mov bx,Handle
mov cx,FileOffsetHigh
mov dx,FileOffsetLow
mov si,LockCountHigh
mov di,LockCountLow
int $21
jc @@001
xor ax,ax
@@001:
mov Error,ax
end;
Result := Error = 0;
end;
Can this code be converted entirely to Pascal?
function UnlockFile(Handle : THandle;
FileOffsetLow, FileOffsetHigh,
UnLockCountLow, UnLockCountHigh : Word) : Boolean;
var
Error : Word;
begin
asm
mov ax, $5C01
mov bx,Handle
mov cx,FileOffsetHigh
mov dx,FileOffsetLow
mov si,UnLockCountHigh
mov di,UnLockCountLow
int $21
jc @@001
xor ax, ax
@@001:
mov Error, ax
end;
Result := Error = 0;
end;
Please help me convert this code to Pascal.
You are calling the old DOS LockFile and UnlockFile functions via interrupt 21h. You can replace these calls with the LockFile and UnlockFile WinAPI functions, which are declared in the Windows unit.
function LockFile(hFile: THandle; dwFileOffsetLow, dwFileOffsetHigh: DWORD;
nNumberOfBytesToLockLow, nNumberOfBytesToLockHigh: DWORD): BOOL; stdcall;
function UnlockFile(hFile: THandle; dwFileOffsetLow, dwFileOffsetHigh: DWORD;
nNumberOfBytesToUnlockLow, nNumberOfBytesToUnlockHigh: DWORD): BOOL; stdcall;
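As a rough illustration of calling these API functions, here is a sketch of two thin wrappers that take 64-bit offsets and lengths and split them into the low and high DWORD halves the API expects. The unit name LockSketch and the functions LockRegion and UnlockRegion are hypothetical and are not part of OnGuard.

```pascal
unit LockSketch;

interface

uses
  Windows;

// Hypothetical wrappers around the Windows API declared above.
function LockRegion(Handle: THandle; Offset, Count: Int64): Boolean;
function UnlockRegion(Handle: THandle; Offset, Count: Int64): Boolean;

implementation

function LockRegion(Handle: THandle; Offset, Count: Int64): Boolean;
begin
  // Split each 64-bit value into its low and high 32-bit halves.
  Result := Windows.LockFile(Handle,
    DWORD(Offset and $FFFFFFFF), DWORD(Offset shr 32),
    DWORD(Count and $FFFFFFFF), DWORD(Count shr 32));
end;

function UnlockRegion(Handle: THandle; Offset, Count: Int64): Boolean;
begin
  Result := Windows.UnlockFile(Handle,
    DWORD(Offset and $FFFFFFFF), DWORD(Offset shr 32),
    DWORD(Count and $FFFFFFFF), DWORD(Count shr 32));
end;

end.
```

Note that the API takes DWORD parameters where the 16-bit code took Word parameters, so existing callers of the old signatures generally still compile when the API versions are used instead.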
The Delphi x64 compiler does indeed support inline assembler. There is nothing to stop you writing inline assembler for the x64 compiler.
However, this is 16 bit code, and you cannot port it to either the 32 bit or 64 bit compiler. I suspect that what you have here is that OnGuard supports both 16 bit and 32 bit code, and it uses conditional compilation in places where there need to be different implementations for 16 and 32 bit code. I bet that OnGuard assumes that anything that is not 32 bit code is 16 bit code.
So there will likely be a {$IFDEF WIN32} test somewhere. And the code will not define LockFile and UnlockFile if that condition evaluates to True because the functions are defined in the Windows API now. And when that condition evaluates to False, the code assumes 16 bit and defines the functions. But since you are trying to support x64, the {$IFDEF WIN32} check evaluates False and the code attempts to compile 16 bit code, obviously doomed to fail.
Frankly, the best thing you can do is to remove all of the 16 bit code from this library. That will help you see the wood for the trees. I expect there will be other places in the code which attempt to use the 16 bit code simply because Win32 is not defined in the 64 bit compiler.
Update
And a quick check of the OnGuard repo reveals this code, just as I suspected:
{$IFNDEF Win32}
function LockFile(Handle : THandle; FileOffsetLow, FileOffsetHigh,
LockCountLow, LockCountHigh : Word) : Boolean;
function UnlockFile(Handle : THandle; FileOffsetLow, FileOffsetHigh,
UnLockCountLow, UnLockCountHigh : Word) : Boolean;
function FlushFileBuffers(Handle : THandle) : Boolean;
{$ENDIF}
And there are plenty more tests of Win32 which assume that the lack of that define means that the code is 16 bit. Truly this is 20th century code!
You need to look through the library for all uses of the Win32 conditional. Each and every one that you find will present a porting problem for x64.
The basic strategy you must adopt is that you want to use the Win32 variant for both 32 bit and 64 bit. So if I were you I would simply hunt down every Win32 conditional and remove the conditional. Leave behind the Win32 branch of the conditional.

How can I get SSE rounding in a Delphi version prior to XE2?

How can I get XE2-style rounding, i.e. with SSE, in earlier Delphi versions?
Delphi's inline assembler has supported SSE instructions for a while.
Two overloaded versions are possible: one for Single and one for Double.
In addition, each comes in two variants: the input passed as a parameter or as a pointer.
This approach is faster than the native Round()/Trunc() functions.
To round you have:
Function RoundSSE(Value: Single): Integer; Overload;
Asm
// additional PUSH/POP pointer stack added automatically
CVTSS2SI EAX, Value
End;
Function RoundSSE(Value: Double): Integer; Overload;
Asm
// additional PUSH/POP pointer stack added automatically
MOVQ XMM0,Value
CVTSD2SI EAX, XMM0
End;
Function RoundMEM_SSE(Var Value: Single): Integer; Overload;
Asm
// as written, fastest version
CVTSS2SI EAX, [Value]
End;
Function RoundMEM_SSE(Var Value: Double): Integer; Overload;
Asm
// as written, fastest version
CVTSD2SI EAX, [Value]
End;
To truncate you have the same with CVTTSS2SI / CVTTSD2SI:
Function TruncSSE(Value: Single): Integer; Overload;
Asm
// additional PUSH/POP pointer stack added automatically
CVTTSS2SI EAX, Value
End;
Function TruncSSE(Value: Double): Integer; Overload;
Asm
// additional PUSH/POP pointer stack added automatically
MOVQ XMM0,Value
CVTTSD2SI EAX, XMM0
End;
Function TruncMEM_SSE(Var Value: Single): Integer; Overload;
Asm
// as written, fastest version
CVTTSS2SI EAX, [Value]
End;
Function TruncMEM_SSE(Var Value: Double): Integer; Overload;
Asm
// as written, fastest version
CVTTSD2SI EAX, [Value]
End;
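For Floor and Ceil, use TruncMEM_SSE(value) and RoundSSE(value + 0.5) respectively. Note that these are approximations: truncation rounds toward zero, so it equals Floor only for non-negative values, and RoundSSE(value + 0.5) can be off by one when value is already an integer (for example, 1.0 + 0.5 rounds to 2 under the default round-to-nearest-even mode).
In my tests these functions gave about a 20% performance gain. They were tested in loops and in a real program (with the memory cache and the instruction cache filled, so it can be considered a real-life test).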
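If exact Floor and Ceil are needed, one option is to truncate and then correct by one where necessary. The sketch below assumes a 32-bit x86 target (CPU386) for the assembler path and falls back to the native Trunc elsewhere. FloorSSE and CeilSSE are hypothetical names, and the extra comparison will cost some of the speed advantage.

```pascal
program FloorCeilSketch;
{$APPTYPE CONSOLE}

// Same idea as TruncSSE above, repeated here so the sketch is self-contained.
function TruncSSE(Value: Single): Integer;
{$IFDEF CPU386}
asm
  CVTTSS2SI EAX, Value   // convert with truncation toward zero
end;
{$ELSE}
begin
  Result := Trunc(Value);  // fallback on targets other than 32-bit x86
end;
{$ENDIF}

// Hypothetical helper: truncate, then step down for negative non-integers.
function FloorSSE(Value: Single): Integer;
begin
  Result := TruncSSE(Value);
  if Value < Result then
    Dec(Result);
end;

// Hypothetical helper: truncate, then step up for positive non-integers.
function CeilSSE(Value: Single): Integer;
begin
  Result := TruncSSE(Value);
  if Value > Result then
    Inc(Result);
end;

begin
  Writeln(FloorSSE(-1.5));  // floor(-1.5) is -2
  Writeln(CeilSSE(1.0));    // ceil(1.0) is 1
end.
```

Like the conversions above, this only makes sense for values that fit in a 32-bit Integer.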
