I came across a problem while porting Delphi BASM32 code to FPC:
program MulTest;
{$IFDEF FPC}
{$mode delphi}
{$asmmode intel}
{$ELSE}
{$APPTYPE CONSOLE}
{$ENDIF}
function Mul(A, B: LongWord): LongWord;
asm
MUL EAX,EDX
end;
begin
Writeln(Mul(10,20));
Readln;
end.
The above code compiles in Delphi XE and works as expected; FPC reports a compile-time error on the MUL EAX,EDX line:
Error: Asm: [mul reg32,reg32] invalid combination of opcode and operands
I am using Lazarus 1.4.4/FPC 2.6.4 for Win32 (the current stable version).
Is there any workaround or fix for the problem?
FreePascal is correct. There are only 3 forms of MUL:
MUL r/m8
MUL r/m16
MUL r/m32
Performs an unsigned multiplication of the first operand (destination operand) and the second operand (source operand) and stores the result in the destination operand. The destination operand is an implied operand located in register AL, AX or EAX (depending on the size of the operand); the source operand is located in a general-purpose register or a memory location.
In other words, the first operand (used for both input and output) is specified in AL/AX/EAX, and the second input operand is explicitly specified as either a general-purpose register or a memory address.
So, MUL EAX,EDX is indeed an invalid assembly instruction.
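To make the implicit operand concrete, here is a minimal sketch in plain Pascal of what MUL r/m32 computes: EAX is multiplied by the explicit operand, and the full 64-bit product is returned with the low half in EAX and the high half in EDX. The helper name MulModel and its parameters are hypothetical, chosen only for illustration; they are not part of Delphi or FPC.

```pascal
program MulSemantics;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper that models MUL r/m32:
// EDX:EAX := EAX * Operand (unsigned, full 64-bit product).
procedure MulModel(EaxIn, Operand: LongWord; out EaxOut, EdxOut: LongWord);
var
  Full: UInt64;
begin
  Full := UInt64(EaxIn) * UInt64(Operand);
  EaxOut := LongWord(Full and $FFFFFFFF);  // low 32 bits land in EAX
  EdxOut := LongWord(Full shr 32);         // high 32 bits land in EDX
end;

var
  LowPart, HighPart: LongWord;
begin
  // Small product: the high half stays zero.
  MulModel(10, 20, LowPart, HighPart);
  Writeln(LowPart, ' ', HighPart);
  // Product that overflows 32 bits: the high half becomes non-zero.
  MulModel($FFFFFFFF, 2, LowPart, HighPart);
  Writeln(LowPart, ' ', HighPart);
end.
```

Because a function returning LongWord only reads EAX, the high half in EDX is simply discarded by the caller of Mul().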
If you compile this code in Delphi and use the debugger to look at the generated assembly, you would see that the call to Mul(10,20) generates the following assembly code:
// Mul(10,20)
mov edx,$00000014
mov eax,$0000000a
call Mul
//MUL EAX,EDX
mul edx
So, as you can see, Delphi actually parses your source code, sees that the first operand is EAX, and strips it off for you, thus producing the correct assembly. FreePascal does not do that step for you.
The workaround? Write proper assembly code to begin with. Don't rely on the compiler to re-interpret your code for you.
function Mul(A, B: LongWord): LongWord;
asm
MUL EDX
end;
Or you could simply not write assembly code at all and let the compiler do the work for you. It knows how to multiply two LongWord values together:
function Mul(A, B: LongWord): LongWord;
begin
Result := A * B;
end;
Note, though, that Delphi uses IMUL instead of MUL in this case. From Delphi's documentation:
The value of x / y is of type Extended, regardless of the types of x and y. For other arithmetic operators, the result is of type Extended whenever at least one operand is a real; otherwise, the result is of type Int64 when at least one operand is of type Int64; otherwise, the result is of type Integer. If an operand's type is a subrange of an integer type, it is treated as if it were of the integer type.
The compiler also emits rather bloated assembly unless stack frames are disabled and optimizations are enabled. With both options set, Mul() compiles to a single IMUL EDX instruction (plus the RET instruction, of course). If you don't want to change the options project-wide, you can isolate them to just Mul() by using the {$STACKFRAMES OFF}/{$W-} and {$OPTIMIZATION ON}/{$O+} compiler directives.
{$IFOPT W+}{$W-}{$DEFINE SF_Was_On}{$ENDIF}
{$IFOPT O-}{$O+}{$DEFINE O_Was_Off}{$ENDIF}
function Mul(A, B: LongWord): LongWord;
begin
Result := A * B;
end;
{$IFDEF SF_Was_On}{$W+}{$UNDEF SF_Was_On}{$ENDIF}
{$IFDEF O_Was_Off}{$O-}{$UNDEF O_Was_Off}{$ENDIF}
Generates:
imul edx
ret
MUL always multiplies by AL, AX or EAX, so you should specify only the other operand.
Related
It's well known among assembly coders in Delphi that any field of a record, class, etc. can be accessed from an asm routine, as shown in the example below:
type
THeader = packed record
field1: uint64;
field2: uint32;
end;
(* some code here *)
asm
mov rax, [rcx + THeader.field1]
mov edx, [rcx + THeader.field2]
end;
But what if, as the name suggests, this is just a header of a big data stream of unpredictable size, and I want to access the actual start position of the data stream (that is, the first byte after the header)?
A simple solution might be the one shown below (but I prefer something less unnatural, without defining a constant):
type
THeader = packed record
field1: uint64;
field2: uint32;
end;
(* start_of_data_stream: byte; *)
const
SIZEOFTHEADER = sizeof(THeader);
(* some code here *)
asm
mov al, [rcx + SIZEOFTHEADER] (* [rcx + THeader.start_of_data_stream] *)
end;
Any better ideas, maybe?
You can use TYPE(typename) to find the size of the type in an asm expression. For example:
mov al, [rcx + TYPE(THeader)]
This (together with a number of other useful operators) is documented: http://docwiki.embarcadero.com/RADStudio/en/Assembly_Expressions#Expression_Operators
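For comparison, the same "first byte after the header" address can be computed without assembler. The following is a minimal sketch in plain Pascal; the names PHeader, PayloadStart and Buffer are hypothetical and exist only for this illustration.

```pascal
program HeaderPayload;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  THeader = packed record
    field1: UInt64;
    field2: UInt32;
  end;
  PHeader = ^THeader;  // hypothetical pointer type for this sketch

// Returns a pointer to the first byte that follows the header,
// i.e. the address the asm expression rcx + TYPE(THeader) refers to.
function PayloadStart(Header: PHeader): PByte;
begin
  Result := PByte(Header);
  Inc(Result, SizeOf(THeader));
end;

var
  Buffer: array[0..15] of Byte;  // stand-in for header + data stream
begin
  FillChar(Buffer, SizeOf(Buffer), 0);
  // Mark the first payload byte so we can recognise it.
  Buffer[SizeOf(THeader)] := 42;
  Writeln(SizeOf(THeader));
  Writeln(PayloadStart(PHeader(@Buffer[0]))^);
end.
```

Since the record is packed, SizeOf(THeader) is 8 + 4 = 12 bytes, so the payload starts at offset 12 and the second line printed is the marker value 42.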
I have a simple Delphi function called SetCompare that compares two singles and, if they are not equal, sets one value to the other.
procedure SetCompare( A : single; B : single );
begin
if( A <> B ) then
A := B;
end;
I am trying to convert this into asm as such:
procedure SetCompare( A : Single; B : Single ); register;
begin
asm
mov EAX,A
mov ECX,B
cmp EAX,ECX
jne SetValue
#SetValue:
mov EAX,ECX
end;
end;
Will this work?
No, this will not work, because floating-point comparison is not the same as binary comparison. For instance, 0 and -0 have different bit patterns but compare as equal. Similarly, NaN compares unequal to all values, including a NaN with the same bit pattern.
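The zero case is easy to reproduce. The sketch below is only an illustration (the variable names are made up for it): it builds -0 from its IEEE 754 single-precision bit pattern $80000000 and compares it with +0 both ways.

```pascal
program ZeroCompare;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

var
  PosZero, NegZero: Single;
  PosBits, NegBits: LongWord;
begin
  PosZero := 0.0;
  NegBits := $80000000;  // sign bit set, everything else clear: -0.0

  // Copy the raw bits across without any numeric conversion.
  Move(NegBits, NegZero, SizeOf(NegZero));
  Move(PosZero, PosBits, SizeOf(PosBits));

  // Integer comparison of the bit patterns: they differ.
  Writeln('bits equal:  ', PosBits = NegBits);
  // Floating-point comparison of the values: they are equal.
  Writeln('float equal: ', PosZero = NegZero);
end.
```

The first line reports FALSE and the second TRUE, which is exactly the case where a CMP on the raw 32-bit values gives a different answer from A <> B.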
The simplest way to work out how to write your code is to get the compiler to compile the Pascal code, and inspect the generated assembly code.
Some asides:
Your function is pointless anyway, because it returns no value and has no side effects.
If performance matters enough to write assembler, then you should write pure assembler functions rather than inline asm blocks in a Pascal function, which in any case are not supported by the x64 compiler.
Your arguments are already in registers, so it makes little sense to copy them around to other registers. For x86 code, A arrives in EAX, and B arrives in EDX. Given that EAX already contains A, why would you copy it into EAX? It is already there. And B is already in EDX, so why copy it to ECX? For x64 code, the two arguments are passed in floating-point registers and can be compared there directly. As soon as you start writing assembler, you need to understand the register use of the calling convention.
Your jne is pointless. If execution does not take the jump, then it moves to the next line of code. Which is where you jumped to.
Quoted from https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html,
Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.
What is the Delphi equivalent to the C __builtin_clz() ? If there isn't, how to implement it efficiently in Delphi?
Actually, I want to use it to calculate the base-2 logarithm of an integer.
If you only care about 32 bit code then it goes like this:
function __builtin_clz(x: Cardinal): Cardinal;
asm
BSR EAX,EAX
NEG EAX
ADD EAX,31
end;
Or if you want to support 64 bit code as well then it would be:
function __builtin_clz(x: Cardinal): Cardinal;
{$IF Defined(CPUX64)}
asm
BSR ECX,ECX
NEG ECX
ADD ECX,31
MOV EAX,ECX
{$ENDIF}
{$IF Defined(CPUX86)}
asm
BSR EAX,EAX
NEG EAX
ADD EAX,31
{$ENDIF}
end;
It's likely that an asm guru could trim this down a little, but BSR (bit scan reverse) is the key instruction.
For the mobile compilers, I don't know how to do this efficiently.
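Where inline assembler is not available, a plain Pascal fallback is possible, although it is a loop rather than a single instruction. The sketch below is one simple way to do it, not a tuned implementation; the names CountLeadingZeros32 and FloorLog2 are hypothetical, and returning 32 for an input of 0 is an assumption made here (the GCC built-in leaves that case undefined).

```pascal
program ClzFallback;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Portable count of leading zero bits in a 32-bit value.
// Assumption: returns 32 for x = 0, which __builtin_clz does not define.
function CountLeadingZeros32(x: Cardinal): Cardinal;
begin
  Result := 0;
  if x = 0 then
  begin
    Result := 32;
    Exit;
  end;
  // Shift left until the most significant bit is set.
  while (x and $80000000) = 0 do
  begin
    x := x shl 1;
    Inc(Result);
  end;
end;

// Floor of the base-2 logarithm; only meaningful for x > 0.
function FloorLog2(x: Cardinal): Cardinal;
begin
  Result := 31 - CountLeadingZeros32(x);
end;

begin
  Writeln(CountLeadingZeros32(1));          // only bit 0 set
  Writeln(CountLeadingZeros32($80000000));  // top bit set
  Writeln(FloorLog2(1024));
end.
```

For x = 1 the loop shifts 31 times, which matches the relation clz = 31 - BSR(x) used by the assembler versions.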
I am trying to convert the Delphi TBits.GetBit to inline assembler for the 64 bit version. The VCL source looks like this:
function TBits.GetBit(Index: Integer): Boolean;
{$IFNDEF X86ASM}
var
LRelInt: PInteger;
LMask: Integer;
begin
if (Index >= FSize) or (Index < 0) then
Error;
{ Calculate the address of the related integer }
LRelInt := FBits;
Inc(LRelInt, Index div BitsPerInt);
{ Generate the mask }
LMask := (1 shl (Index mod BitsPerInt));
Result := (LRelInt^ and LMask) <> 0;
end;
{$ELSE X86ASM}
asm
CMP Index,[EAX].FSize
JAE TBits.Error
MOV EAX,[EAX].FBits
BT [EAX],Index
SBB EAX,EAX
AND EAX,1
end;
{$ENDIF X86ASM}
I started converting the 32 bit ASM code to 64 bit. After some searching, I found out that I need to change the EAX references to RAX for the 64 bit compiler. I ended up with this for the first line:
CMP Index,[RAX].FSize
This compiles but gives an access violation when it runs. I tried a few combinations (e.g. MOV ECX,[RAX].FSize) and get the same access violation when trying to access [RAX].FSize. When I look at the assembler that is generated by the Delphi compiler, it looks like my [RAX].FSize should be correct.
Unit72.pas.143: MOV ECX,[RAX].FSize
00000000006963C0 8B8868060000 mov ecx,[rax+$00000668]
And the Delphi generated code:
Unit72.pas.131: if (Index >= FSize) or (Index < 0) then
00000000006963CF 488B4550 mov rax,[rbp+$50]
00000000006963D3 8B4D58 mov ecx,[rbp+$58]
00000000006963D6 3B8868060000 cmp ecx,[rax+$00000668]
00000000006963DC 7D06 jnl TForm72.GetBit + $24
00000000006963DE 837D5800 cmp dword ptr [rbp+$58],$00
00000000006963E2 7D09 jnl TForm72.GetBit + $2D
In both cases, the resulting assembler uses [rax+$00000668] for FSize. What is the correct way to access a class field in Delphi 64bit Assembler?
This may sound like a strange thing to optimize but the assembler for the 64bit pascal version doesn't appear to be very efficient. We call this routine a large number of times and it takes anything up to 5 times as long to execute depending on various factors.
The basic problem is that you are using the wrong register. Self is passed as an implicit parameter, before all others. In the x64 calling convention, that means it is passed in RCX and not RAX.
So Self is passed in RCX and Index is passed in RDX. Frankly, I think it's a mistake to use parameter names in inline assembler because they hide the fact that the parameter was passed in a register. If you happen to overwrite RDX, then that changes the apparent value of Index.
So the if statement might be coded as
CMP EDX,[RCX].FSize
JNL TBits.Error
CMP EDX,0
JL TBits.Error
FWIW, this is a really simple function to implement and I don't believe that you will need to use any stack space. You have enough registers in x64 to be able to do this entirely using volatile registers.
I ran into this issue using 64-bit inline assembler in Delphi XE3 that I don't understand.
I tried this, and it works on both 32-bit and 64-bit
function test(a, b: integer): integer; assembler; register;
asm
mov eax, a
add eax, edx
end;
However, this only works on 32-bit. On 64-bit it compiles, but it does not return the correct sum of the two integers.
function test(a, b: integer): integer; assembler; register;
asm
add eax, edx
end;
I know that older FPU code such as FLD and FSTP works on 32-bit but gives a compilation error with the 64-bit compiler. Any idea how to handle floating-point numbers in 64-bit?
That is because 64-bit Windows uses its own calling convention, in which these parameters arrive in RCX and RDX rather than EAX and EDX. The first version works on both targets because the assembler resolves the name a to whichever register holds it. MSDN explains the convention in more detail.
As for handling floating-point numbers, read the Intel developer documentation.
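Applied to the second version of test, the calling-convention difference could be handled along these lines. This is a sketch that assumes the Delphi Win32 and Win64 compilers (register convention on 32-bit, the Microsoft x64 convention on 64-bit) and Delphi's CPUX64 conditional symbol; other compilers or platforms may pass the parameters elsewhere.

```pascal
program AddTest;
{$APPTYPE CONSOLE}

function Test(a, b: Integer): Integer;
asm
{$IFDEF CPUX64}
  // Win64: a arrives in ECX, b in EDX, result is returned in EAX.
  mov eax, ecx
  add eax, edx
{$ELSE}
  // Win32 register convention: a is already in EAX, b in EDX.
  add eax, edx
{$ENDIF}
end;

begin
  Writeln(Test(10, 20));
end.
```

Naming the registers explicitly in each branch keeps it visible which register carries which parameter, at the cost of one conditional block per target.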