Quoted from https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html,
— Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.
What is the Delphi equivalent to the C __builtin_clz() ? If there isn't, how to implement it efficiently in Delphi?
Actually, I want to use it to calculate the base-2 logarithm of an integer.
If you only care about 32 bit code then it goes like this:
function __builtin_clz(x: Cardinal): Cardinal;
asm
BSR EAX,EAX
NEG EAX
ADD EAX,31 // 31 - index of the highest set bit
end;
Or if you want to support 64 bit code as well then it would be:
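function __builtin_clz(x: Cardinal): Cardinal;
asm
{$IF Defined(CPUX64)}
// Win64: x arrives in ECX, the result is returned in EAX
BSR ECX,ECX // ECX := index of the highest set bit (0..31)
NEG ECX
ADD ECX,31 // ECX := 31 - index = number of leading zeros
MOV EAX,ECX
{$ENDIF}
{$IF Defined(CPUX86)}
// Win32 register convention: x arrives in EAX, the result is returned in EAX
BSR EAX,EAX
NEG EAX
ADD EAX,31
{$ENDIF}
end;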
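It's likely that an asm guru could trim this down a little, but BSR (bit scan reverse) is the key instruction: it returns the index of the highest set bit, so the number of leading zeros is 31 minus that index. Like __builtin_clz, it gives no meaningful answer for 0, because BSR leaves its destination undefined when the source is zero.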
For the mobile compilers, I don't know how to do this efficiently.
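For targets without BSR (it is an x86-only instruction), here is a sketch of a portable fallback. It is written in C because it mirrors __builtin_clz from the quoted GCC docs; the masks and shifts translate directly to Pascal "and" and "shl". The names clz32 and ilog2_32 are hypothetical. It binary-searches the bit halves, and the base-2 logarithm from the question is then 31 minus the leading-zero count, i.e. the index BSR returns.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count of leading zero bits in a 32-bit value (hypothetical
   helper). Binary search: test the top 16, 8, 4, 2 and 1 bits in turn,
   shifting x left whenever they are all zero. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    assert(x != 0); /* as with __builtin_clz, 0 has no meaningful result */
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}

/* floor(log2(x)) for x > 0: the index of the highest set bit, which is
   exactly what BSR puts in its destination register. */
static unsigned ilog2_32(uint32_t x)
{
    return 31u - clz32(x);
}
```

For example, ilog2_32(1024) is 10 and ilog2_32(1023) is 9.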
Related
I have this function (RDRand, written by David Heffernan) that seems to work OK in 32-bit code but fails in 64-bit code:
function TryRdRand(out Value: Cardinal): Boolean;
{$IF defined(CPU64BITS)}
asm .noframe
{$else}
asm
{$ifend}
db $0f
db $c7
db $f1
jc #success
xor eax,eax
ret
#success:
mov [eax],ecx
mov eax,1
end;
The documentation for the instruction is here: https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide
In particular, it says:
Essentially, developers invoke this instruction with a single operand: the destination register where the random value will be stored. Note that this register must be a general purpose register, and the size of the register (16, 32, or 64 bits) will determine the size of the random value returned.
After invoking the RDRAND instruction, the caller must examine the carry flag (CF) to determine whether a random value was available at the time the RDRAND instruction was executed. As Table 3 shows, a value of 1 indicates that a random value was available and placed in the destination register provided in the invocation. A value of 0 indicates that a random value was not available. In current architectures the destination register will also be zeroed as a side effect of this condition.
My knowledge of assembly is quite limited. What did I miss?
I also don't quite understand these instructions:
...
xor eax,eax
ret
...
What exactly does this do?
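As for xor eax,eax followed by ret: XORing a register with itself sets it to zero, so the function returns False (a Boolean result is returned in AL), and ret returns to the caller straight away, skipping the code at #success. The original fails in 64-bit code because under Win64 the address of Value is passed in RCX, not EAX.
If you want a function that behaves exactly the same on both platforms, I think it looks like this: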
function TryRdRand(out Value: Cardinal): Boolean;
asm
{$if defined(WIN64)}
.noframe
// rdrand eax
db $0f
db $c7
db $f0
jnc #fail
mov [rcx],eax
{$elseif defined(WIN32)}
// rdrand ecx
db $0f
db $c7
db $f1
jnc #fail
mov [eax],ecx
{$else}
{$Message Fatal 'TryRdRand not implemented for this platform'}
{$endif}
mov eax,1
ret
#fail:
xor eax,eax
end;
The suggestion made by Peter Cordes of implementing a retry loop in the asm looks sensible to me. I will not attempt to implement that here, since I think it is somewhat outside the scope of your question.
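For anyone who does want the retry loop, here is a sketch of its control flow in C. So that it runs without RDRAND hardware, one RDRAND attempt is modelled as a callback that returns true when CF would be 1. The names rng_step_fn, try_rdrand_retry, flaky_step and flaky_failures are hypothetical, and the limit of 10 attempts is an assumption rather than a value from this thread. In real code the callback would wrap the instruction itself, for example via the _rdrand32_step intrinsic in C compilers that provide it.

```c
#include <stdbool.h>
#include <stdint.h>

/* One attempt of the hardware instruction: returns true (CF = 1) and
   writes *out when a random value was available, false (CF = 0) otherwise. */
typedef bool (*rng_step_fn)(uint32_t *out);

/* Assumed retry limit. */
enum { RDRAND_RETRY_LIMIT = 10 };

/* Call the step function until it succeeds or the limit is reached.
   Returns false only if every attempt failed. */
static bool try_rdrand_retry(rng_step_fn step, uint32_t *value)
{
    for (int i = 0; i < RDRAND_RETRY_LIMIT; i++) {
        if (step(value))
            return true;
    }
    return false;
}

/* Hypothetical stand-in for RDRAND, used to exercise the loop: it reports
   "no value available" for the first flaky_failures calls. */
static int flaky_failures = 0;

static bool flaky_step(uint32_t *out)
{
    if (flaky_failures > 0) {
        flaky_failures--;
        *out = 0; /* the quoted doc says the destination is zeroed on failure */
        return false;
    }
    *out = 0x12345678u;
    return true;
}
```

With flaky_failures set to 3, try_rdrand_retry(flaky_step, &v) succeeds on the fourth attempt; with it set to 100, it gives up after 10 attempts and returns false.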
Also, Peter points out that in x64 you can read a 64 bit random value with the REX.W=1 prefix. That would look like this:
function TryRdRand(out Value: NativeUInt): Boolean;
asm
{$if defined(WIN64)}
.noframe
// rdrand rax
db $48 // REX.W = 1
db $0f
db $c7
db $f0
jnc #fail
mov [rcx],rax
{$elseif defined(WIN32)}
// rdrand ecx
db $0f
db $c7
db $f1
jnc #fail
mov [eax],ecx
{$else}
{$Message Fatal 'TryRdRand not implemented for this platform'}
{$endif}
mov eax,1
ret
#fail:
xor eax,eax
end;
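The db lines are the raw encoding of RDRAND: opcode 0F C7 with the /6 extension in the ModRM reg field, mod = 11 for a register operand, and the destination register number in the low three bits. That is why $f0 means EAX (register 0) and $f1 means ECX (register 1), and why prefixing $48 (REX.W) gives the 64-bit form. A small C sketch of that encoding; encode_rdrand is a hypothetical helper and only covers the eight legacy registers, since R8 to R15 would also need REX.B.

```c
#include <stddef.h>
#include <stdint.h>

/* Build the bytes of "rdrand reg" into out (at least 4 bytes) and return
   how many were written. reg is the register number: EAX/RAX = 0,
   ECX/RCX = 1, ... up to 7. rex_w selects the 64-bit form. */
static size_t encode_rdrand(unsigned reg, int rex_w, uint8_t out[4])
{
    size_t n = 0;
    if (rex_w)
        out[n++] = 0x48; /* REX.W = 1: 64-bit operand size */
    out[n++] = 0x0F;
    out[n++] = 0xC7;
    /* ModRM: mod = 11 (register), reg field = 6 (/6), rm = destination */
    out[n++] = (uint8_t)(0xC0u | (6u << 3) | (reg & 7u));
    return n;
}
```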
I came across a problem while porting Delphi BASM32 code to FPC:
program MulTest;
{$IFDEF FPC}
{$mode delphi}
{$asmmode intel}
{$ELSE}
{$APPTYPE CONSOLE}
{$ENDIF}
function Mul(A, B: LongWord): LongWord;
asm
MUL EAX,EDX
end;
begin
Writeln(Mul(10,20));
Readln;
end.
The above code compiles in Delphi XE and works as expected, but FPC reports a compile-time error on the MUL EAX,EDX line:
Error: Asm: [mul reg32,reg32] invalid combination of opcode and operands
I am using Lazarus 1.4.4/FPC2.6.4 for Win32 (the current stable version)
Any workaround or fix for the problem?
FreePascal is correct. In 32-bit code there are only three forms of MUL:
MUL r/m8
MUL r/m16
MUL r/m32
Performs an unsigned multiplication of the first operand (destination operand) and the second operand (source operand) and stores the result in the destination operand. The destination operand is an implied operand located in register AL, AX or EAX (depending on the size of the operand); the source operand is located in a general-purpose register or a memory location.
In other words, the first input operand is implicitly AL/AX/EAX, the second input operand is given explicitly as either a general-purpose register or a memory address, and the product is written to AX, DX:AX or EDX:EAX (so the one-operand MUL EDX also overwrites EDX with the high half of the product).
So, MUL EAX,EDX is indeed an invalid assembly instruction.
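To see what the one-operand form computes, here is a small C model of MUL r/m32 (regs32 and mul32 are hypothetical names): EAX is the implicit first operand and the 64-bit product lands in EDX:EAX. In Mul, A arrives in EAX and B in EDX, so MUL EDX leaves the low 32 bits of A*B in EAX as the function result and overwrites EDX, which is a scratch register in Delphi's register calling convention.

```c
#include <stdint.h>

/* The two registers involved in a 32-bit MUL. */
typedef struct { uint32_t eax, edx; } regs32;

/* Model of "MUL src": EDX:EAX := EAX * src, unsigned. */
static void mul32(regs32 *r, uint32_t src)
{
    uint64_t product = (uint64_t)r->eax * src;
    r->eax = (uint32_t)product;         /* low 32 bits: the function result */
    r->edx = (uint32_t)(product >> 32); /* high 32 bits overwrite EDX */
}
```

Calling mul32(&r, r.edx) with r = {10, 20} mirrors Mul(10,20): EAX ends up as 200 and EDX as 0.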
If you compile this code in Delphi and use the debugger to look at the generated assembly, you would see that the call to Mul(10,20) generates the following assembly code:
// Mul(10,20)
mov edx,$00000014
mov eax,$0000000a
call Mul
//MUL EAX,EDX
mul edx
So, as you can see, Delphi actually parses your source code, sees that the first operand is EAX and strips it off for you, thus producing the correct assembly. FreePascal does not do that step for you.
The workaround? Write proper assembly code to begin with. Don't rely on the compiler to re-interpret your code for you.
function Mul(A, B: LongWord): LongWord;
asm
MUL EDX
end;
Or, you could simply not write assembly code directly and let the compiler do the work for you. It knows how to multiply two LongWord values together:
function Mul(A, B: LongWord): LongWord;
begin
Result := A * B;
end;
Note, though, that Delphi uses IMUL instead of MUL in this case. From Delphi's documentation:
The value of x / y is of type Extended, regardless of the types of x and y. For other arithmetic operators, the result is of type Extended whenever at least one operand is a real; otherwise, the result is of type Int64 when at least one operand is of type Int64; otherwise, the result is of type Integer. If an operand's type is a subrange of an integer type, it is treated as if it were of the integer type.
It also generates some unsightly, bloated assembly unless stack frames are disabled and optimizations are enabled. With those two options set, Mul() compiles to a single IMUL EDX instruction (plus the RET instruction, of course). If you don't want to change the options project-wide, you can restrict them to Mul() with the {$STACKFRAMES OFF}/{$W-} and {$OPTIMIZATION ON}/{$O+} compiler directives.
{$IFOPT W+}{$W-}{$DEFINE SF_Was_On}{$ENDIF}
{$IFOPT O-}{$O+}{$DEFINE O_Was_Off}{$ENDIF}
function Mul(A, B: LongWord): LongWord;
begin
Result := A * B;
end;
{$IFDEF SF_Was_On}{$W+}{$UNDEF SF_Was_On}{$ENDIF}
{$IFDEF O_Was_Off}{$O-}{$UNDEF O_Was_Off}{$ENDIF}
Generates:
imul edx
ret
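Using IMUL instead of MUL does not change what Mul returns: signed and unsigned 32x32 multiplication differ only in the high half of the product, and the function returns only the low 32 bits. A quick C check of that (both helper names are hypothetical; converting values above INT32_MAX to int32_t assumes two's complement, which mainstream compilers use):

```c
#include <stdint.h>

/* Low 32 bits of an unsigned (MUL-style) 32x32 multiply. */
static uint32_t low32_unsigned(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* Low 32 bits of a signed (IMUL-style) multiply of the same bit patterns:
   reinterpret as signed, multiply in 64 bits, keep the low half. */
static uint32_t low32_signed(uint32_t a, uint32_t b)
{
    int64_t p = (int64_t)(int32_t)a * (int32_t)b;
    return (uint32_t)p; /* conversion to unsigned is modulo 2^32 */
}
```

For instance, 0xFFFFFFFF * 2 is -1 * 2 = -2 when signed and 0x1FFFFFFFE when unsigned, yet both keep 0xFFFFFFFE as the low 32 bits.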
MUL always multiplies by AL, AX or EAX, so you should specify only the other operand.
I am trying to convert the Delphi TBits.GetBit to inline assembler for the 64 bit version. The VCL source looks like this:
function TBits.GetBit(Index: Integer): Boolean;
{$IFNDEF X86ASM}
var
LRelInt: PInteger;
LMask: Integer;
begin
if (Index >= FSize) or (Index < 0) then
Error;
{ Calculate the address of the related integer }
LRelInt := FBits;
Inc(LRelInt, Index div BitsPerInt);
{ Generate the mask }
LMask := (1 shl (Index mod BitsPerInt));
Result := (LRelInt^ and LMask) <> 0;
end;
{$ELSE X86ASM}
asm
CMP Index,[EAX].FSize
JAE TBits.Error
MOV EAX,[EAX].FBits
BT [EAX],Index
SBB EAX,EAX
AND EAX,1
end;
{$ENDIF X86ASM}
I started converting the 32 bit ASM code to 64 bit. After some searching, I found out that I need to change the EAX references to RAX for the 64 bit compiler. I ended up with this for the first line:
CMP Index,[RAX].FSize
This compiles but gives an access violation when it runs. I tried a few combinations (e.g. MOV ECX,[RAX].FSize) and got the same access violation when trying to access [RAX].FSize. When I look at the assembler that is generated by the Delphi compiler, it looks like my [RAX].FSize should be correct.
Unit72.pas.143: MOV ECX,[RAX].FSize
00000000006963C0 8B8868060000 mov ecx,[rax+$00000668]
And the Delphi generated code:
Unit72.pas.131: if (Index >= FSize) or (Index < 0) then
00000000006963CF 488B4550 mov rax,[rbp+$50]
00000000006963D3 8B4D58 mov ecx,[rbp+$58]
00000000006963D6 3B8868060000 cmp ecx,[rax+$00000668]
00000000006963DC 7D06 jnl TForm72.GetBit + $24
00000000006963DE 837D5800 cmp dword ptr [rbp+$58],$00
00000000006963E2 7D09 jnl TForm72.GetBit + $2D
In both cases, the resulting assembler uses [rax+$00000668] for FSize. What is the correct way to access a class field in Delphi 64bit Assembler?
This may sound like a strange thing to optimize but the assembler for the 64bit pascal version doesn't appear to be very efficient. We call this routine a large number of times and it takes anything up to 5 times as long to execute depending on various factors.
The basic problem is that you are using the wrong register. Self is passed as an implicit parameter, before all others. In the x64 calling convention, that means it is passed in RCX and not RAX.
So Self is passed in RCX and Index is passed in RDX. Frankly, I think it's a mistake to use parameter names in inline assembler, because they hide the fact that the parameter was passed in a register. If you happen to overwrite RDX, that changes the apparent value of Index.
So the if statement might be coded as
CMP EDX,[RCX].FSize
JNL TBits.Error
CMP EDX,0
JL TBits.Error
FWIW, this is a really simple function to implement and I don't believe that you will need to use any stack space. You have enough registers in x64 to be able to do this entirely using volatile registers.
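The 32-bit VCL code folds both range checks into one unsigned comparison (CMP then JAE): viewed as unsigned, a negative Index is huge, so it fails the same test as Index >= FSize. The x64 code could use the same trick with JAE instead of the two signed checks. Here is the logic of GetBit as a C sketch; get_bit is a hypothetical name, and it returns -1 where the VCL code calls TBits.Error.

```c
#include <stdint.h>

/* Bit array stored as 32-bit words, as in TBits (BitsPerInt = 32).
   Returns 1 or 0 for the bit, or -1 if index is out of range. */
static int get_bit(const uint32_t *bits, int32_t size, int32_t index)
{
    /* One unsigned compare rejects both index < 0 and index >= size. */
    if ((uint32_t)index >= (uint32_t)size)
        return -1;
    uint32_t word = bits[(uint32_t)index / 32u];    /* Index div BitsPerInt */
    uint32_t mask = 1u << ((uint32_t)index % 32u);  /* 1 shl (Index mod BitsPerInt) */
    return (word & mask) != 0;
}
```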
Using the fldcw instruction it's possible to set the precision of the x87 FPU to 24, 53 or 64 bits. However, after some testing I'm starting to think that very few x87 operations actually use that setting.
I haven't tested all operations but on this test machine so far, it looks like only fdiv and fsqrt stop computing at the selected precision, and that all other operations (fadd fsub fmul...) always compute full extended precision.
If that is the case, I would expect it to be because those two instructions (fdiv and fsqrt) are significantly slower than most other x87 instructions, so they can be sped up when lower precision is sufficient. Really, though, I'm wondering whether this has always been the case or whether it's a quirk of the very recent processor in my test machine.
Edit: here's Delphi code that demonstrates it:
program Project1;
uses
windows,dialogs,sysutils;
{$R *.res}
const
test_mul:single=1234567890.0987654321;
var
i:longint;
s:single absolute i;
s1,s2,s3:single;
procedure test_24;
asm
mov word([esp-2]),$103f // 24bit precision, round to nearest
fldcw word([esp-2])
fld [s]
fmul [test_mul]
fstp [s1]
end;
procedure test_53;
asm
mov word([esp-2]),$123f // 53bit precision, round to nearest
fldcw word([esp-2])
fld [s]
fmul [test_mul]
fstp [s2]
end;
procedure test_64;
asm
mov word([esp-2]),$133f // 64bit precision, round to nearest
fldcw word([esp-2])
fld [s]
fmul [test_mul]
fstp [s3]
end;
begin
i:=0;
repeat
test_24;
test_53;
test_64;
if (s1<>s2) or (s2<>s3) then begin
showmessage('Error at step:'+inttostr(i));
break;
end;
inc(i);
until i=0;
showmessage('No difference found between precisions');
end.
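For reference, the precision-control field of the x87 control word is bits 8-9 and the rounding-control field is bits 10-11. A small C sketch that decodes them (pc_bits and rc_mode are hypothetical helpers) shows that $103f, $123f and $133f select 24-, 53- and 64-bit precision, and that all three leave rounding control at 00, which is round to nearest. Truncation would need bits 10-11 set, for example $1c3f for 24-bit precision.

```c
#include <stdint.h>

/* Precision control, bits 8-9: 00 = 24-bit, 01 = reserved (returned as 0),
   10 = 53-bit, 11 = 64-bit significand. */
static unsigned pc_bits(uint16_t cw)
{
    static const unsigned bits[4] = { 24u, 0u, 53u, 64u };
    return bits[(cw >> 8) & 3u];
}

/* Rounding control, bits 10-11: 0 = to nearest (even), 1 = down,
   2 = up, 3 = toward zero (truncate). */
static unsigned rc_mode(uint16_t cw)
{
    return (cw >> 10) & 3u;
}
```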
Edit 2: False alarm, I was mistaken. I was storing the results as Single instead of Extended, so the difference could not show up. Here's a fixed test; thanks to Hans Passant for catching my mistake:
program Project1;
uses
windows,dialogs,sysutils;
{$R *.res}
const
test_mul:single=1234567890.0987654321;
var
i:longint;
errors:cardinal;
s:single absolute i;
s1,s2,s3:extended;
procedure test_24;
asm
mov word([esp-2]),$103f // 24bit precision, round to nearest
fldcw word([esp-2])
fld [s]
fmul [test_mul]
fstp [s1]
end;
procedure test_53;
asm
mov word([esp-2]),$123f // 53bit precision, round to nearest
fldcw word([esp-2])
fld [s]
fmul [test_mul]
fstp [s2]
end;
procedure test_64;
asm
mov word([esp-2]),$133f // 64bit precision, round to nearest
fldcw word([esp-2])
fld [s]
fmul [test_mul]
fstp [s3]
end;
begin
errors:=0;
i:=0;
repeat
test_24;
test_53;
test_64;
if (s1<>s2) or (s2<>s3) then begin
inc(errors);
end;
inc(i);
until i=0;
showmessage('Number of differences between precisions: '+inttostr(errors));
end.
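Why storing to Single hid the difference: the exact product of two Single values needs at most 48 significand bits, so at 53- or 64-bit precision it is computed exactly, and rounding it to Single afterwards gives the same result as rounding it to 24 bits directly (for results in the normal Single range). Only a wider destination such as Extended keeps the extra bits. A C sketch of this, using double as the wide type and the value 1 + 2^-23, whose square 1 + 2^-22 + 2^-46 does not fit in a float (the helper names are hypothetical):

```c
/* Product of two floats kept in a wider type: exact, because two 24-bit
   significands multiply to at most 48 bits, which fits in double's 53. */
static double product_wide(float a, float b)
{
    return (double)a * (double)b;
}

/* The same exact product, then stored to a float variable, as the first
   Delphi test did with its Single results. The volatile store forces the
   rounding to single precision. */
static float product_wide_then_stored(float a, float b)
{
    volatile float stored = (float)((double)a * (double)b);
    return stored;
}

/* Product rounded directly to single precision, the analogue of running
   with 24-bit precision control. */
static float product_narrow(float a, float b)
{
    volatile float stored = a * b;
    return stored;
}
```

With a = 1.0f + 0x1p-23f, product_wide_then_stored(a, a) and product_narrow(a, a) are identical (both 1 + 2^-22), while product_wide(a, a) still carries the extra 2^-46.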
I am running into some weird behaviour with Delphi's inline assembly, as demonstrated in this very short and simple program:
program test;
{$APPTYPE CONSOLE}
uses
SysUtils;
type
TAsdf = class
public
int: Integer;
end;
TBlah = class
public
asdf: TAsdf;
constructor Create(a: TAsdf);
procedure Test;
end;
constructor TBlah.Create(a: TAsdf);
begin
asdf := a;
end;
procedure TBlah.Test;
begin
asm
mov eax, [asdf]
end;
end;
var
asdf: TAsdf;
blah: TBlah;
begin
asdf := TAsdf.Create;
blah := TBlah.Create(asdf);
blah.Test;
readln;
end.
It's just for the sake of example (moving [asdf] into eax doesn't do much, but it works for the example). If you look at the assembly for this program, you'll see that
mov eax, [asdf]
has been turned into
mov eax, ds:[4]
(as represented by OllyDbg) which obviously crashes. However, if you do this:
var
temp: TAsdf;
begin
temp := asdf;
asm
int 3;
mov eax, [temp];
end;
It changes to
mov eax, [ebp-4]
which works. Why is this? I usually work with C++, where I'm used to accessing instance variables like that, so it may be that I'm using instance variables wrong.
EDIT: Yep, that was it. Changing mov eax, [asdf] to mov eax, [Self.asdf] fixes the problem. Sorry about that.
In the first case, mov eax,[asdf], the assembler looks up asdf and finds that it is a field at offset 4 in the instance. Because you used an indirect addressing mode without a base register, it encodes only the offset (to the assembler it looks like 0 + asdf). Had you written mov eax,[eax].asdf, it would have been encoded as mov eax,[eax+4] (here EAX contains Self, as passed in by the caller).
In the second case, the assembler will look up Temp and see that it is a local variable indexed by EBP. Because it knows the base address register to use, it can decide to encode it as [EBP-4].
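The difference between the two encodings can be modelled in C. This is a rough sketch: the struct layout assumes, as on Win32, that each instance starts with a pointer to its VMT, so asdf sits at offset 4 on a 32-bit target and 8 on a 64-bit one; the names load_asdf and address_without_base are hypothetical. [eax].asdf adds the field offset to Self, while mov eax,[asdf] uses the bare offset as an absolute address.

```c
#include <stddef.h>
#include <stdint.h>

/* Rough C model of the two Delphi classes: an instance begins with a
   pointer to its class VMT, followed by the declared fields. */
struct TAsdf { const void *vmt; int32_t value; }; /* "int" in the Delphi code */
struct TBlah { const void *vmt; struct TAsdf *asdf; };

/* What mov eax,[eax].asdf computes: base register (Self) plus the
   field offset, then a load from that address. */
static struct TAsdf *load_asdf(const struct TBlah *self)
{
    const char *base = (const char *)self;
    return *(struct TAsdf *const *)(base + offsetof(struct TBlah, asdf));
}

/* What mov eax,[asdf] addresses: the offset alone, with no base
   register, i.e. a tiny absolute address such as ds:[4]. */
static uintptr_t address_without_base(void)
{
    return (uintptr_t)offsetof(struct TBlah, asdf);
}
```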
A method receives the Self pointer in the EAX register. You have to use that value as the base value for accessing the object. So your code would be something like:
mov ecx, TBlah[eax].asdf // use a scratch register: EBX must be preserved in Delphi asm
See http://www.delphi3000.com/articles/article_3770.asp for an example.